Tactical Language and Culture Training Systems:
Using AI to Teach Foreign Languages and Cultures

W. Lewis Johnson and Andre Valente

Abstract: The Tactical Language and Culture Training System (TLCTS) helps people quickly acquire communicative skills in foreign languages and cultures. More than 40,000 learners worldwide have used TLCTS courses. TLCTS utilizes artificial intelligence technologies during the authoring process and at run time to process learner speech, engage in dialogue, and evaluate and assess learner performance. This paper describes the architecture of TLCTS and the artificial intelligence technologies that it employs and presents results from multiple evaluation studies that demonstrate the benefits of learning foreign language and culture using this approach.

Copyright © 2009, Association for the Advancement of Artificial Intelligence. All rights reserved. ISSN 0738-4602

The Tactical Language and Culture Training System (TLCTS) helps people quickly acquire functional skills in foreign languages and cultures. It includes interactive lessons that focus on particular communicative skills and interactive games that apply those skills. Heavy emphasis is placed on spoken communication: learners must learn to speak the foreign language to complete the lessons and play the games. It focuses on the language and cultural skills needed to accomplish particular types of tasks and gives learners rich, realistic opportunities to practice achieving those tasks.

Several TLCTS courses have been developed so far. Tactical Iraqi, Tactical Pashto, and Tactical French are in widespread use by U.S. marines and soldiers, and increasingly by military service members in other countries. Additional courses are being developed for use by business executives, workers for nongovernmental organizations, and high school and college students. While precise numbers are impossible to obtain (we do not control copies made by the U.S. government), somewhere between 40,000 and 60,000 people have trained so far with TLCTS courses. More than 1000 people download copies of TLCTS courses each month, either for their own use or to set up computer language labs and redistribute copies to students. Just one training site, the military advisor training center at Fort Riley, Kansas, trains approximately 10,000 people annually.

Artificial intelligence technologies play multiple essential functions in TLCTS. Speech is the primary input modality, so automated speech recognition tailored to foreign language learners is essential. TLCTS courses are populated with "virtual humans" that engage in dialogue with learners. AI techniques are used to model the decision processes of the virtual humans and to support the generation of their behavior. This makes it possible to give learners extensive conversational practice.


Figure 1. Example Skill Builder Exercise.

Learner modeling software continually monitors each learner's application of communication skills to estimate the learner's level of mastery of these skills. This helps instructors and training supervisors to monitor learners' progress and enables the software to guide learners to where they need to focus their training effort. Artificial intelligence is also integrated into the systems' content authoring tools, assisting content authors in the creation and validation of instructional content.

System Overview

TLCTS courses are currently delivered on personal computers equipped with headset microphones. Each course contains a set of interactive Skill Builder lessons, focusing on particular communicative skills. Figure 1 illustrates an exercise page from the Skill Builder of Encounters: Chinese Language and Culture, a college-level Chinese course being developed in collaboration with Yale University and Chinese International Publishing Group. The figure is an example of a minidialogue exercise where the learner practices a conversational turn in the target language. Here the learner must think of an appropriate way to say his name. His response, "Wo xìng Lǐ Dàwei" (My family name is Li Dawei), exhibits a common mistake made by beginning language learners: confusing the semantically similar words "xìng" (family name is) and "jiào" (name is). A correct response would be "Wo jiào Lǐ Dàwei." The virtual tutor (bottom left) gave appropriate corrective feedback. The spoken conversational system recognized the utterance (bottom center) and was able to recognize the error in it, so that the virtual tutor could respond accordingly. Note that in exercises there is no single correct answer: any utterance that correctly conveys the intended meaning will be accepted. Depending on the type of exercise, the system can give feedback on pronunciation, morphological and grammatical forms, cultural pragmatics, or word choice, as in this example.
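To make this concrete, here is a minimal sketch, in Python (the language in which most TLCTS components are implemented), of how a recognized utterance could be matched against authored response variants, some flagged as common errors, to choose corrective feedback. The variant strings, feedback messages, and the normalization step are illustrative assumptions, not Alelo's actual content or code.

import unicodedata

# Hypothetical exercise data: authored response variants, some marked as
# common errors; phrases and feedback messages are illustrative only.
EXERCISE_VARIANTS = [
    ("wo jiao li dawei", False, "Correct: 'jiao' introduces your full name."),
    ("wo xing li dawei", True,  "'xing' introduces only the family name; use 'jiao' with a full name."),
]

def normalize(utterance):
    # Lower-case and strip tone marks and punctuation so variants match loosely.
    decomposed = unicodedata.normalize("NFD", utterance.lower())
    return "".join(ch for ch in decomposed if ch.isalnum() or ch.isspace()).strip()

def feedback_for(asr_hypothesis):
    # Return tutor feedback for the speech recognizer's hypothesis of the learner's turn.
    hypothesis = normalize(asr_hypothesis)
    for pattern, is_error, message in EXERCISE_VARIANTS:
        if hypothesis == normalize(pattern):
            return message
    # Out-of-grammar or garbled input: ask the learner to try again.
    return "I didn't catch that. Please try introducing yourself again."

print(feedback_for("Wo xìng Lǐ Dàwei"))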


Games play an essential role in TLCTS courses, providing key practice opportunities. Each course incorporates a scenario-based Mission Game, where learners play a character in a three-dimensional virtual world that simulates the target culture. Figure 2 shows a screenshot from the Mission Game in Tactical Dari, currently being used by U.S. military service members to learn the Dari language and Afghan culture in preparation for deployment to Afghanistan. Here the player (left) is engaged in a meeting with a village leader (the malek) to discuss reconstruction plans for a local school. A transcript of the conversation to this point is shown in the top center of the screen.

The learner speaks to the malek by first clicking the microphone icon (top right) and then speaking into a headset microphone. He or she is free to discuss a range of topics (far top left) and to express each topic in a range of ways (top left, below). Although this screenshot displays help menus that make these choices explicit, in ordinary use these choices are hidden, and the learner is encouraged to engage in free spoken conversation in the foreign language on the topic at hand. This contrasts radically with typical uses of speech recognition in other language-learning systems, which either do not support spoken dialogue at all or present learners with fixed sets of choices to read off the screen.

The Alelo Architecture

The architecture underlying TLCTS (figure 3) supports a set of complementary training products built from a common set of content specifications. All content in TLCTS courses is specified in Extensible Markup Language (XML) and stored in a web-compatible content repository. A web portal named Kona1 provides access to the repository and supports collaborative editing. Kona supports a collection of web-based authoring tools that enable authors to edit and modify content specifications using their web browsers.


Figure 2. Example Mission Game Dialogue.


Figure 3. Overall Alelo Architecture. (Diagram components: web-based authoring tools, namely Hilo (lessons), Tide (dialog), Wave (audio), Hua (language model), Waihona (libraries), Paheona (art assets), and Huli (search/references), used by authors, artists, production staff, and programmers; Kona (server framework/portal) with a content repository of multimedia, XML, and databases; Kapili (build system); trainee clients/GUIs, namely Lapu (Unreal client), Keaka (multiplayer client), Wele (web client), and Uku (handheld client); Honua (social simulation); Kahu (dashboard/LMS) serving learners and instructors; and Hoahu, the user/trainee data warehouse, with data export for researchers.)

The authoring tools are integrated in the portal, but each tool applies to a different subset of the content. One tool (Hilo) edits lesson content, while another (Tide) edits dialogue specifications within lessons and game episodes. Production staff edit and manage speech recordings using the Wave audio tool. The Hua editor is responsible for managing the language model describing the target language: words and phrases, their spellings, phonetic transcriptions, reference recordings, and translation. Nonverbal gestures and character models are defined using Waihona and Paheona, respectively.

Each project is represented in Kona as a "book," containing a series of "chapters" (each specifying an individual lesson or game scene). Project members can check out individual chapters (that is, lock them for editing), update them using web-based authoring tools, and then check them back in. An underlying version-control framework (Subversion) prevents simultaneous edits and allows rollback to earlier versions if necessary. Our goals have been to make the authoring cheaper, faster, better, and easier and to eliminate the need for engineers and researchers to be involved in authoring and production.

The architecture supports several delivery platforms. Our Lapu client is the first client that we developed, and is still the most widely used. Lapu runs on a Windows PC equipped with a headset microphone.


Lapu is built on top of Epic Games' Unreal Engine 2.5, which handles scene rendering and provides a set of user interface classes. Figure 4 shows the Lapu architecture in further detail, focusing on the support for social simulations (dialogues with animated characters). The learner communicates with characters in the game, using voice and gestures selectable from a menu. The input manager interprets the speech and gestural input as a communicative act (that is, speech acts augmented with gestures). The social simulation engine (Honua) determines how the environment and the characters in it respond to the learner's actions. The character actions are realized by the action scheduler, which issues animation commands to the Unreal engine. As the learner interacts with the system, the learner model is updated to provide a current estimate of learner skill mastery. Interaction logs and learner speech recordings are saved for subsequent analysis. These components are implemented primarily in Python, with some supporting routines in C++ and UnrealScript (the scripting language for the Unreal engine). Reliance on UnrealScript is minimized to facilitate porting to other game engines.
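The following sketch illustrates, under stated assumptions, the flow just described: an ASR hypothesis and a menu-selected gesture are interpreted as a communicative act, the social simulation decides how characters react, the action scheduler realizes their behavior, and the learner model accumulates evidence. All class names, acts, and the trivial decision rules are placeholders chosen for illustration; they are not Alelo's API.

from dataclasses import dataclass, field

@dataclass
class CommunicativeAct:
    speaker: str
    act_type: str                      # for example, "greet_respectfully"
    gestures: list = field(default_factory=list)

class InputManager:
    def interpret(self, asr_hypothesis, gesture):
        # Map the utterance hypothesis plus any menu-selected gesture to an abstract act.
        act_type = "greet_respectfully" if "salaam" in asr_hypothesis.lower() else "unknown"
        return CommunicativeAct("player", act_type, [gesture] if gesture else [])

class SocialSimulationEngine:          # stands in for Honua
    def respond(self, act):
        # Decide how nonplayer characters react, as abstract character actions.
        return ["npc_return_greeting"] if act.act_type == "greet_respectfully" else ["npc_look_puzzled"]

class ActionScheduler:
    def realize(self, character_actions):
        # Issue animation and audio commands to the game engine (stubbed as print).
        for action in character_actions:
            print("animating:", action)

def game_turn(asr_hypothesis, gesture, learner_model):
    act = InputManager().interpret(asr_hypothesis, gesture)
    ActionScheduler().realize(SocialSimulationEngine().respond(act))
    # Record evidence for the learner model: was the greeting skill applied successfully?
    learner_model.setdefault("greetings", []).append(act.act_type != "unknown")

skills = {}
game_turn("as-salaamu alaykum", "hand_on_heart", skills)
print(skills)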

We have recently developed a multiplayer extension (Keaka) that enables multiple learners to interact with the virtual world at the same time. Each learner interacts with one character at a time, can overhear the conversations of other learners, and can also talk to other learners through voice over IP. Keaka is being used to support mission rehearsal exercises, in which teams of learners must speak with nonplayer characters in order to gather information necessary to complete their mission.

A new web-based client named Wele is also being increasingly used to deliver content. Wele currently supports the Skill Builder, and we are extending it to include Mission Game implementations. Wele is accessed through a web browser and is implemented in the Adobe Flex2 Internet application framework. Wele runs the speech recognizer on the client computer as a plug-in.

Figure 4. Lapu Client Simulation Architecture. (Diagram components: the speech recognizer turns the learner's verbal behavior (speech sound) into an utterance hypothesis; the input manager combines it with nonverbal behavior (gestures and other control actions) to form a parameterized communicative act; scenario logic, agents, the mission engine, and the social simulation engine produce behavior instructions for game agents; the action scheduler turns these into behavior schedules for the simulated game world (game engine), drawing on information about action status and world state; the learner model relates learner ability to a skills model and to skills/missions; system events, audio files, recordings, and logs are saved.)

Interactive dialogues and animations are realized as Flash files inserted into a web page and are controlled at run time using finite state machines.

Handheld device implementations (Uku) allow trainees to continue their training when they do not have access to a PC. One version of Uku currently under beta test delivers TLCTS Skill Builder content on iPods. Media assets (instructional texts, recordings, images, and videos) are extracted from the repository, converted into iPod-compatible formats, and organized using the iPod Notes feature3. Example dialogues between animated characters are converted into videos. We developed a prototype for the Sony PlayStation Portable, and we are evaluating other handheld platforms. All clients except for the iPod provide interactive exercises. In cases where we are able to port the speech recognizer to the handheld device, the client implementation utilizes the speech recognizer; for platforms that cannot run the speech recognizer effectively, speech-recognition input is replaced with alternative methods, such as selection from menus of possible utterances.

A lightweight learning management system called Kahu communicates with the client computers over a local network. Our users frequently organize their computers in learning labs using ad hoc local networks (using Windows shares), then move their computers around and reorganize them in different configurations. Kahu provides and manages repositories for learner profiles and supports easy reconfiguration and disconnected operation of these labs. The system provides mechanisms for an instructor or training manager to create and configure users and groups of users and produce progress reports. We also use it to help retrieve learner profiles, system logs, and recordings from learning labs and store them in a data warehouse, where they may be retrieved for further analysis. The recordings are used to retrain and improve the speech recognizer, while the logs and learner profiles are used in research to understand how the users employ the learning environment and to suggest areas for further improvement of the software.

Uses of AI Technology

Artificial intelligence technologies are employed extensively in the Alelo run-time environments and authoring tools. The following is a summary of the roles that artificial intelligence currently plays, in the authoring tools and in the run-time environment. However, in practice, authoring concerns and run-time concerns are inextricably linked; run-time processing methods cannot be employed if they place an unacceptable burden on authors to create content for them.

Authoring

Our authoring tools are designed to help authors create the rich content representations required by our AI-based system, and perform AI-based processing themselves. For example Hilo, our lesson authoring tool, supports rich utterance representations for speech and natural language processing. Utterances are modeled in different channels: the native orthography of the foreign language, an "ez-read" transliteration that is intended as a phonetic transcription in Roman characters, a phoneme sequence used by the speech recognizer, and an English translation. Each utterance is also linked to the language model library so it can be centrally managed.
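As a rough illustration of such a multichannel representation (the real content is authored in XML and managed through Hilo and Hua), a single utterance record might carry fields like the following; the field names and example values are assumptions made for this sketch.

# A single utterance record with parallel channels; field names and values are
# illustrative (the real content is authored and stored as XML).
utterance = {
    "id": "greeting_01",
    "native": "السلام عليكم",                       # native orthography of the target language
    "ez_read": "as-salaamu alaykum",                # "ez-read" Roman transliteration
    "phonemes": ["a", "s", "s", "a", "l", "aa", "m", "u", "a", "l", "ay", "k", "u", "m"],
    "english": "Peace be upon you (hello).",
    "language_model_ref": "lm/iraqi/greetings#salaam",   # link into the central language model
}
print(utterance["ez_read"], "->", utterance["english"])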

For some languages we augment the authoring tools with tools that translate between channels. Chinese and French provide two contrasting examples of how this is done. In Chinese, authors specify the written form in traditional Chinese characters and specify the ez-read form in Pinyin. Then the run-time system translates automatically from traditional characters to simplified characters as needed and translates from Pinyin to phoneme sequence for speech recognition purposes. For French we provide authors a tool that generates phoneme sequences from written French and then generates an ez-read view from the phoneme sequences. French spelling is highly irregular, so the phoneme sequence generator is not fully automatic. Instead the phoneme sequence generator proposes alternative pronunciations, based on a lexicon of French words, and lets the author choose the correct phoneme sequence.
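A lexicon-backed proposer of the kind described for French might look roughly like this sketch; the lexicon entries and phoneme symbols are invented for illustration and are not Alelo's pronunciation data.

# Toy lexicon mapping written French words to candidate phoneme sequences;
# entries and phoneme symbols are invented for illustration.
FRENCH_LEXICON = {
    "est": [["e"], ["e", "s", "t"]],              # "is" (verb) versus "east"
    "fils": [["f", "i", "s"], ["f", "i", "l"]],   # "son" versus "threads"
}

def propose_pronunciations(written_word):
    # Return candidate phoneme sequences; the author picks the correct one.
    return FRENCH_LEXICON.get(written_word.lower(), [])

for candidate in propose_pronunciations("est"):
    print("-".join(candidate))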

Our Tide authoring tool provides several editing functions and analysis tools to specify interactive dialogues. Authors specify the utterances that may arise in the course of a conversation, their meaning in terms of acts, and the possible interactions between acts in the conversation. Tide supports several dialogue models, from simple scripts to branched stories to agent-centered models. It provides graphical tools with a custom vocabulary of abstractions to make interaction authoring as easy as possible. From the specifications, it automatically generates program code, which can be incorporated into the TLCTS clients. Finally, Tide provides a model-checking tool for testing the generated dialogue modeling code against the specification. This makes it possible for authors to produce a substantial portion of the dialogue modeling work themselves.
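In the spirit of Tide's analysis tools, the sketch below shows a toy branched-dialogue specification and one simple model check, reachability of every state from the start state; the state names, acts, and the check itself are illustrative assumptions rather than Tide's actual specification format.

# Toy branched-dialogue specification: each state maps learner acts to a next state.
DIALOGUE_SPEC = {
    "start":      {"greet": "greeted", "demand": "offended"},
    "offended":   {"apologize": "greeted"},
    "greeted":    {"ask_about_school": "discussing", "farewell": "end"},
    "discussing": {"farewell": "end"},
    "end":        {},
}

def unreachable_states(spec, initial="start"):
    # Simple model check: report states that no sequence of acts can reach.
    seen, frontier = {initial}, [initial]
    while frontier:
        for target in spec[frontier.pop()].values():
            if target not in seen:
                seen.add(target)
                frontier.append(target)
    return set(spec) - seen

print(unreachable_states(DIALOGUE_SPEC))   # empty set: every authored state is reachable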

Our Hua tool manages the language model, which contains the library of words and utterances that are taught in the course. It interoperates with Hilo and Tide as needed so that as authors edit content they can maintain a consistent language model throughout.


It also provides for different roles: content developers can add utterances quickly, but that content is later verified and corrected by more senior language experts playing the role of "librarian." This helps to eliminate errors and reduces the role of natural language processing specialists in creating and maintaining linguistic resources.

These authoring tools have enabled us to increase the quality and quantity of authored content and have made it possible to maintain multiple versions of content for different classes of learners and different platforms. Table 1 shows some of the productivity improvements that these tools are yielding. It compares Tactical Iraqi 3.1, developed in 2006; Tactical Iraqi 4.0, developed in 2007; Tactical French 1.0, also developed in 2007; and Tactical Dari 1.0, developed in 2008. The authoring tools doubled the number of lesson pages, vocabulary words, dialogues, and Mission Game scenes in Tactical Iraqi 4.0 vis-à-vis 3.1. Development of Tactical French 1.0 did not start until late 2006, yet it has about the same amount of Skill Builder material as Tactical Iraqi. Tactical Dari 1.0 was developed in less time and on a considerably smaller budget than Tactical French, and yet it is comparable in size (its vocabulary is smaller because Dari is a much more complex language).

Speech Recognition

TLCTS is particularly ambitious in its reliance on speech-recognition technology. Recognition of learner speech is particularly demanding and challenging. Beginning language learners sometimes mispronounce the foreign language very badly and blame the software if it is unable to recognize their speech. The speech recognizer also needs to be very discriminating at times and accurately detect speech errors to provide feedback to learners. Typical speech recognizers, in contrast, are designed to disregard speech errors and focus on decoding speech into text. Our approach must apply equally to common languages such as French and to less commonly taught languages such as Pashto or Cherokee, for which few off-the-shelf speech-recognition resources exist. For these reasons we rejected prepackaged speech-recognition solutions and opted to develop our own tailored speech-recognition models.

We employ hidden Markov acoustic models developed using the hidden Markov model toolkit (HTK)4 and delivered using the Julius5 open source speech-recognition toolkit. Models are developed iteratively in a bootstrapping process. We construct an initial acoustic model from a combination of licensed corpora, an initial corpus of learner speech, and sometimes corpus data from similar languages, and integrate it into the first beta version of the TLCTS course. We then recruit beta testers to start training with the software. It records samples of their speech, which we use to retrain the speech recognizer. Each successive version of a TLCTS speech recognizer has improved performance over the previous versions because it is trained on speech data collected from the previous versions.

The speech recordings must be annotated to indicate the phoneme boundaries. Speech annotation is time-consuming but has a critical effect on the quality of the resulting acoustic model. Corpora licensed from other sources often have annotation errors and need to be reannotated.

Each language model includes a grammar-based model built from phrases extracted from the authored content and a default "garbage model" constructed with a fixed number of acoustic classes, states, and probability mixtures. If the learner speaks an utterance that is in the recognition grammar, the grammar-based model will usually recognize it with higher confidence than the garbage model does, and if the utterance is out of grammar the garbage model will recognize it with higher confidence. The garbage model rejects many utterances that the learners know are incorrect, and forces learners to speak the language with at least a minimum level of accuracy. By adjusting the properties of the garbage model we can adjust the recognizer's tolerance of mispronounced speech.
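A minimal sketch of that accept/reject decision, assuming the recognizer returns a confidence score for the grammar-based hypothesis and for the garbage model, might look as follows; the scores and the tolerance parameter are illustrative, not the actual Julius configuration.

def accept_utterance(grammar_confidence, garbage_confidence, tolerance=0.0):
    # Accept the utterance as in-grammar speech if the grammar-based model outscores
    # the garbage model; a positive tolerance is more forgiving of mispronunciations,
    # a negative one is stricter.
    return grammar_confidence + tolerance > garbage_confidence

print(accept_utterance(0.62, 0.55))                  # accepted
print(accept_utterance(0.50, 0.55, tolerance=-0.1))  # rejected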

Language learners normally produce several types of errors in their speech, including pragmatic errors (for example, failure to adhere to cultural norms in discourse), semantic errors (for example, word confusions), syntactic errors, morphological errors, and pronunciation errors. Where possible we use the speech recognizer to explicitly detect these errors, using language models that are designed to detect language errors.


Content Metric        Tactical Iraqi 3.1   Tactical Iraqi 4.0   Tactical French 1.0   Tactical Dari 1.0
Lessons               35                   52                   40                    51
Lesson pages          891                  2027                 1920                  2139
Words                 1072                 2214                 1820                  1199
Example dialogues     42                   85                   67                    44
Active dialogues      13                   29                   36                    31

Table 1. Content Size of TLCTS Courses.


Pronunciation error detection is handled as a special case. Earlier versions of TLCTS attempted to detect pronunciation errors on a continual basis in the Skill Builder (Mote et al. 2004). However, evaluations identified problems with this approach: it was difficult to detect pronunciation errors reliably in continuous speech (leading to user frustration), and the continual feedback tended to cause learners to focus on pronunciation to the exclusion of other language skills. We have since adopted a different approach where the system does not report specific pronunciation errors in most situations, but instead provides a number of focused exercises in which learners practice particular speech sounds they have difficulty with.

Speech-recognition performance depends in part on the authored content, and so we have developed authoring guidelines and tools that enable content authors to create learning materials suitable for speech recognition without requiring detailed knowledge of the workings of speech-recognition technology. When authors write dialogues and exercises, they specify alternative ways of phrasing these utterances, some of which are well formed and some of which may illustrate common learner errors. These are used to create recognition grammars that recognize learner speech as accurately as possible. Authors can mark phrases as "ASR-hard," meaning that the automated speech recognizer (ASR) should not be applied to them and they should be excluded from the speech-recognition grammar. Since speech-recognition accuracy tends to be better on longer phrases, individual words (which sometimes appear on lesson pages) are sometimes excluded in this way.
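The effect of the "ASR-hard" flag on grammar construction can be sketched in a few lines; the phrase records below are invented for illustration.

# Hypothetical authored phrases for one lesson page; the "asr_hard" flag excludes
# an item from the recognition grammar.
authored_phrases = [
    {"text": "sobaH il-kheer", "asr_hard": False},   # full greeting phrase
    {"text": "kheer",          "asr_hard": True},    # isolated short word: excluded
    {"text": "shloonak",       "asr_hard": False},
]

recognition_grammar = [p["text"] for p in authored_phrases if not p["asr_hard"]]
print(recognition_grammar)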

We have used learner data to evaluate speech-recognition performance for the current languages and obtained recognition rates of 95 percent or above for in-grammar utterances. However the training data is skewed toward male users and beginner users (reflecting our current user population), which makes the performance degrade for female and advanced users. Learners with a variety of accents, including regional American accents and foreign accents, have used the system and have obtained adequate recognition rates.

Despite these positive results, users have raised issues about the speech-recognition function. For example, some users complained that the recognizer did not recognize what they thought were valid utterances. These problems can arise for several reasons. First, some acoustic models (for example, for French) were built using proportionally more native data than nonnative data and as such are less forgiving of mispronunciations. Other users complained that they found the ASR too lenient. We have responded to these problems in part by allowing the learner to individually adjust the garbage model (see above) to make it more or less lenient of mistakes. A second reason for these problems is that the ASR uses grammars that change for each specific part of the system. These grammars are compiled from existing content, and thus reflect whatever subset of the language we are teaching. Therefore, a native or fluent speaker usually manages to produce an utterance that is not recognized because it is outside of the system's grammar. We have been working on techniques to extend these grammars while maintaining recognition rates (Meron, Valente, and Johnson 2007). Finally, many complaints reflected unrealistic expectations as to the amount of background noise the system could sustain while keeping recognition rates up. One user sent us system recordings with a loud TV program in the background and still complained that the recognizer was not working well. In response, we developed a speech quality assessment module that gives the learner feedback about possible indications of poor voice recording: levels too low or too high, excessive periods of silence, noise, or indications that the beginning or end of the recording was clipped. This is displayed first as the background color on a small waveform display: red for poor recording quality, green for good, yellow for marginal. If a user clicks that display, a larger window appears that compares the learner's speech with the native speaker's.
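A speech-quality check of the kind described might apply simple signal measurements such as the following sketch, which mirrors the red/yellow/green display; the thresholds and the assumption that samples arrive as floats in [-1.0, 1.0] are illustrative, not the module's actual criteria.

def assess_recording(samples):
    # Return "good", "marginal", or "poor", mirroring the green/yellow/red display.
    if not samples:
        return "poor"
    peak = max(abs(s) for s in samples)
    rms = (sum(s * s for s in samples) / len(samples)) ** 0.5
    silent_fraction = sum(1 for s in samples if abs(s) < 0.01) / len(samples)
    if peak >= 0.99 or rms < 0.01 or silent_fraction > 0.9:
        return "poor"        # clipped, far too quiet, or mostly silence
    if rms < 0.05 or silent_fraction > 0.6:
        return "marginal"
    return "good"

print(assess_recording([0.0] * 100))             # poor: all silence
print(assess_recording([0.2, -0.3, 0.25] * 40))  # good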

Modeling Dialogue

One key challenge for the dialogues in TLCTS is to manage the variability of the spoken inputs a learner may produce. On one hand, we wish to attain high variability, meaning that the system should recognize, properly interpret, and react to a large subset of the language. On the other hand, we wish to author interactions that train specific language and culture skills, not unconstrained dialogue. Furthermore, increasing the size of the language subset makes speech recognition and interpretation much harder. Our strategy to balance those needs is to manage variability in two layers. In the speech layer, we constrain the speech to be recognized in such a way that we can limit the possible utterances and improve ASR performance as described above. In the act layer, we manage the possible acts the user can perform and the mapping from speech to acts.

We originally created linear scripts and branched scripts and annotated each utterance with an act name. We then manually translated the text scripts into hard-coded programs. This translation included the logic for plot progression and character control, as well as the mapping from user input (the text output by the ASR) to actions in the game world (in the form of a list of input-action pairs). However, we found that scripts were too restrictive a medium in that they limited the act variability. We could improve the situation by manually programming more complex algorithms in software code, but that approach required a programmer to complete the scene (which was both expensive and nonscalable), and it was impossible to verify whether the code was consistent with the author's intent.


In our current methods, authors write specifications of dialogue interactions at the act level. To increase the variability of dialogue that the framework supports, we introduced utterance templates into the dialogue specifications. An utterance template is a combination of a more flexible grammar definition syntax (which allows increased variability in user inputs) with additional syntactic constructs used to specify semantic content, allowing parameterized utterances (which increases variability in possible responses). The grammar is passed directly to the speech recognizer, and when an utterance is parsed it comes with semantic annotation that indicates information such as the type of action or action-object specified in the utterance. We have developed tools for maintaining these utterance templates and have been working on mechanisms for managing libraries of utterance templates (Meron, Valente, and Johnson 2007).
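The sketch below gives a flavor of such an utterance template: a small pattern with slots that expands into recognizable phrasings, each carrying a semantic annotation. The template syntax, slot values, and act label are assumptions for illustration, not the actual Tide/Hua template format.

import itertools, re

# Hypothetical utterance template: an alternation plus two slots, tied to an act.
TEMPLATE = {
    "pattern": "please (bring|give) us <quantity> <supply>",
    "slots": {"quantity": ["two", "ten"], "supply": ["blankets", "water containers"]},
    "act": "request_supplies",
}

def expand(template):
    # Enumerate surface phrasings for the recognition grammar, each paired with
    # the semantic annotation that the parser would attach to it.
    verbs = re.search(r"\((.*?)\)", template["pattern"]).group(1).split("|")
    phrasings = []
    for verb, qty, item in itertools.product(verbs, template["slots"]["quantity"],
                                             template["slots"]["supply"]):
        surface = "please {} us {} {}".format(verb, qty, item)
        phrasings.append((surface, {"act": template["act"], "quantity": qty, "object": item}))
    return phrasings

for surface, semantics in expand(TEMPLATE)[:3]:
    print(surface, "->", semantics)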

Believable Virtual Humans

A central feature of our approach is to put the user in a social simulation with nonplayer characters (NPCs): virtual humans. Our original virtual human behavior generation pipeline was relatively simple. An input from the player would be provided as an abstract act (say, greet_respectfully) to an agent implemented in PsychSim (Si, Marsella, and Pynadath 2005). The agent system would specify an act for its NPC to perform (for example, inquire_identity). An XML file would specify the utterance and any animations that needed to be performed to realize that act; sometimes a small script in Python would also be called to perform more complex physical actions (say, walking over to a specific place). More recently, we have adopted a variant of the SAIBA framework (Vilhjalmsson and Marsella 2005), which separates intent planning (the choice of courses of action that are appropriate to the situation at hand, such as complaining, cooperating, or making a request) from the production of believable physical behavior.

We originally used the PsychSim agent system to perform intent planning (Si, Marsella, and Pynadath 2005). While PsychSim is highly expressive, its agents are extremely hard to author and computationally complex. The modeling language is not well suited to represent dialogues or world knowledge. We then turned to a finite-state machine approach that authors can use to specify moderately complex scenarios and that has proved scalable to large numbers of dialogues (for example, Tactical French has more than 50). We developed additional idioms to help us author certain conversation patterns. For example, we use range-qualified transitions to implement a simple form of the "beat" concept as used in Façade (Mateas and Stern 2005). This helped us more easily specify bits of dialogue that can be used within certain phases of the story but do not directly change the story state beyond indirect effects such as increasing rapport. We are now designing a third generation of intent planning called LightPsych, an integrated framework in which agents can operate at any of four levels: self (explicitly modeling beliefs, short- and long-term memory, and so on), culture (modeling and reasoning with politeness and social norms), physical and social environment, and dialogue (understanding dialogue openings and closings, requests, turn-taking, negotiations, and the meaning of utterances). A key challenge for LightPsych is to make it easy to create and reuse agent-behavior specifications that handle the most common interactions between trainees and NPCs, while also giving authors tools to add to and modify these specifications to handle the aspects that are unique to each scenario.
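A range-qualified transition of the kind just mentioned can be sketched as follows: an optional "beat" (small talk) is available only while rapport lies within a range and raises rapport without advancing the story phase. The phases, acts, and rapport values are invented for illustration and are not Alelo's dialogue content.

TRANSITIONS = [
    # (phase, act, rapport_range, next_phase, rapport_delta)
    ("opening",  "greet",            (0, 10), "business",  +1),
    ("business", "ask_about_family", (2, 6),  "business",  +2),   # beat: no phase change
    ("business", "propose_project",  (4, 10), "agreement", +0),
]

def step(phase, rapport, act):
    for from_phase, trigger, (low, high), to_phase, delta in TRANSITIONS:
        if from_phase == phase and trigger == act and low <= rapport <= high:
            return to_phase, rapport + delta
    return phase, rapport            # unrecognized or out-of-range act: nothing happens

phase, rapport = "opening", 2
for act in ["greet", "ask_about_family", "propose_project"]:
    phase, rapport = step(phase, rapport, act)
    print(act, "->", phase, rapport)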

A critical issue is the representation of rich communicative acts. We started by representing acts as strings that encoded some of the parameters needed by PsychSim agents. However, to accurately represent the richness of conversational speech, we need a richer representation for speech acts. We looked at frameworks such as FrameNet (Baker, Fillmore, and Lowe 1998) before creating our own ontology of communicative acts based on the typology proposed by Traum and Hinkelman (1992) and extended with new acts (for example, offering greetings, thanks, and support) based on the work by Feng and Hovy (2006). We also added representations of conversational context, such as the characteristics and roles of interlocutors, and the state of the world in which the conversation takes place. Our act representation format builds on the functional markup language (FML) and extends it in a number of ways (Samtani, Valente, and Johnson 2008).

With respect to the generation of animated behavior, we have found that our initial hand-coded solution was highly flexible but hampered our efforts to lower costs and speed development. Further, such scripting or motion-capture approaches may work if the social context is fixed and known ahead of time but break down if the social environment is dynamic. Therefore, we started working on automating the production of behavior animation from intent and are exploring the use of representations such as the behavior markup language (BML).6

Learner Modeling

As learners use TLCTS courses, it is important to track whether the learners are making progress toward learning objectives. Evidence for progress can come from multiple sources: the trainees' performance on games and quizzes, their performance in dialogues, and their performance generally in the TLCTS games.


TLCTS records the learners' performance in quizzes and dialogues, but also attempts to estimate the learners' mastery of key skills, using a technique inspired by the model-tracing technique of Corbett and Anderson (1995). Authors annotate exercises, quiz items, and dialogue exchanges to indicate the skills that they employ. Skills are drawn from a taxonomy of enabling learning objectives, encompassing linguistic skills, cultural skills, and task skills. Each correct or incorrect use of a given skill provides probabilistic evidence of mastery of that skill. This evidence is uncertain because learners may guess an answer or slip and make an unconscious mistake, or the speech recognizer may misinterpret the learner's response. However, after a relatively short amount of training time the learner model is usually able to correctly identify the individual skills that the trainee has mastered.
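A probability update in the spirit of the cited model-tracing approach (Corbett and Anderson 1995) is sketched below, with explicit guess and slip parameters to capture the sources of uncertainty just listed; the parameter values are illustrative and are not claimed to be those used in TLCTS.

def update_mastery(p_mastered, correct, p_guess=0.2, p_slip=0.1, p_learn=0.15):
    # Bayesian update of the probability that a skill is mastered, given one
    # correct or incorrect application of that skill.
    if correct:
        evidence = p_mastered * (1 - p_slip)
        posterior = evidence / (evidence + (1 - p_mastered) * p_guess)
    else:
        evidence = p_mastered * p_slip
        posterior = evidence / (evidence + (1 - p_mastered) * (1 - p_guess))
    # Allow for learning during the exchange itself.
    return posterior + (1 - posterior) * p_learn

p = 0.3
for outcome in [True, True, False, True]:
    p = update_mastery(p, outcome)
    print(round(p, 3))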

This learner model provides learners, trainers, and researchers with a rich view of trainee progress. We plan to use it as a basis for automated provision of training guidance, to advise learners about where they should focus their efforts, and to decide when to adjust the amount of scaffolding to increase or decrease the level of challenge of the game. We also plan to use it as a starting point for estimating learner progress toward achieving longer-term learning objectives such as general spoken language proficiency.

Application Development and Deployment

The TLCTS system developed over several years, starting as a research project at the University of Southern California (USC) Information Sciences Institute and continuing later at Alelo. The original project at USC spanned about four years starting in 2003, and its continuation at Alelo started in 2005. Costs for the development of the underlying science and technology have been around $5 million to date; costs for the engineering work to make it deployable have been more than $1 million, and course development costs have gone down from the $800,000 range to $300,000–$600,000, depending upon the target language and culture. The work was first funded by the Defense Advanced Research Projects Agency (DARPA) and U.S. Special Operations Command (USSOCOM) and more recently by the U.S. Marine Corps, Army, and Air Force and the Australian Defence Force.

Throughout the project we have adopted an iterative development strategy. The goal is to produce new versions of the system every several months, deploy and evaluate these new versions, and use the evaluation results to inform the next spiral. We believe that research and development benefit from use in practice and that each real-life testing stage provides critical feedback. Indeed, we believe that a good deal of the success of TLCTS has stemmed from acting upon the feedback from our users.

Some of the key development challenges have been related to delivery. The transition from a research prototype to a widely used application required significant engineering effort to develop good installers, integrate the components more closely, improve performance, add user interface polish, and so on. The early version of the architecture was intended as a platform for experimentation (for example, using agent-oriented integration mechanisms), but the architecture then had to be simplified and streamlined to be stable and fast enough for a deployed system. At several points we had to sacrifice flexibility for performance, stability, and ease of use. For example, the original implementation of the Skill Builder was built on top of a separate courseware platform (ToolBook), but users found it cumbersome to switch back and forth between separate Skill Builder and Mission Game applications. We therefore converted the Skill Builder to run on top of the game engine, which was unfortunately less flexible; for example, only later were we able to recover the ability to play videos in our lessons.

Just as we have restructured and improved the system architecture, we have progressively improved the curricula that TLCTS supports. This has often involved incremental revisions of substantial bodies of learning content, as well as the representations of that content. We have therefore invested significant effort in developing authoring and conversion tools (see the Authoring section) to support this maintenance process. Our use of XML schemas and translators has greatly facilitated this evolution in representations.

We have come to recognize the collaborative nature of content development in TLCTS. The original implementation of an authoring tool for TLCTS was based on the concept of a stand-alone desktop tool used by one author at a time, which proved to be an unrealistic assumption. The latest iterations have embraced the idea of a web application and emphasized features to manage the collaboration process and cater to the specific needs of the different types of users (linguists, artists, production staff, programmers, and so on).

Application Use and Payoff

Tactical Iraqi was first deployed in June of 2005. Since then, several expanded and improved versions have been released, and additional courses have been developed. Courses are available in multiple versions for different user communities: for the U.S. Marine Corps, the U.S. Army, and non-U.S. military forces and civilian aid workers.

The courses are used by individuals training on their own and as part of organized training programs.


Anyone with a .mil e-mail account can register and download copies either for their own use or for installation in computer labs. In January 2008, a typical month, there were 910 downloads of Tactical Iraqi, 115 downloads of Tactical Pashto, and 146 downloads of Tactical French.

Patterns of usage depend upon availability of training time and suitable computers. Many learners who download copies for their own use study them to completion; depending upon the course, this can require 200 or more hours of training. For military unit training, 20 to 40 hours of training are more the norm due to conflicting demands on training time. Some units combine TLCTS training with classroom training, whereas others rely exclusively on TLCTS for their language and culture training.

Several independent evaluations of Tactical Iraqi and Tactical Pashto have been performed by the U.S. military, the Canadian Forces, and the Australian Defence Force. Results from several rigorous evaluations have been reported in Surface and Dierdorff (2007) and Johnson and Wu (2008). Surface and Dierdorff studied several subject groups: 268 military advisors training at Fort Riley, Kansas; 113 members of the Seventh Marine Regiment; and 8 trainees at the Defense Language Institute Foreign Language Center (DLIFLC). All groups trained for a total of 40 hours, either with Tactical Iraqi alone or a mixture of Tactical Iraqi and classroom instruction. All showed significant gains in knowledge of Arabic language and culture and greater self-confidence in communicating in Arabic. The greatest learning gains were achieved by the DLIFLC group, which trained exclusively with Tactical Iraqi and followed our recommended program of instruction. Six out of 8 participants achieved an ILR proficiency level of 0+ after 40 hours of training.

The marines in the Seventh Marine Regiment were subjects of a Marine Corps Lessons Learned study. The Third Battalion, Seventh Marines (3/7 Marines) returned from their most recent tour of duty in Iraq in December of 2007. The battalion had assigned two members of each squad to undertake 40 hours of Iraqi Arabic language and culture training prior to deployment. The experience of the 3/7 Marines was particularly noteworthy because the battalion did not suffer a single combat casualty during its tour of duty. To understand the role of Tactical Iraqi in the Seventh Marines' success, members of the two battalions were asked to complete questionnaires and the officers in charge of the 3/7 were interviewed. The questionnaires are still being tabulated, but transcripts of the officer interviews are available, and their comments are strikingly positive. The marines who trained with Tactical Iraqi were able to perform many communicative tasks on their own, without reliance on interpreters. This enhanced the battalion's operational capability, enabled the battalion to operate more efficiently, and resulted in better relations with the local people. This provides indirect evidence that Tactical Iraqi contributed to a Kirkpatrick level 4 training effect (that is, impact on job performance) (Kirkpatrick 1994). Follow-on assessments of the 3/7 Marines' language retention are planned for this year.

Future Work

We continue to improve TLCTS based on experience gained with earlier versions of TLCTS courses. Current work includes broadening the architecture to support reading and writing skills and deepening the curricula and platform to help learners attain higher levels of language proficiency. Language proficiency training is particularly challenging because it helps trainees get to the point where they can construct and understand new sentences in the target language, which complicates speech processing.

We continue to develop our web-based Wele client as an option for learners with less powerful computers. The increased ease of installation and use will hopefully compensate for the reduced level of three-dimensional rendering and interaction that will result from the constraints of web applications.

We are adapting TLCTS virtual-human technologies so they can be integrated into mission rehearsal training simulations. The training simulations will be populated with nonplayer characters that speak the target language. This poses new challenges for authoring, since military trainers with no special technical training will create their own virtual worlds and populate them with nonplayer characters.

We continue to develop new courses and develop pilot versions of future courses. A Chinese course for college and high school Chinese programs is currently being developed in collaboration with Yale University. A pilot Cherokee game, intended to help preserve Native American language and culture, is being developed in collaboration with Thornton Media, Inc.


Glossary of Hawaiian Terms.

Kona: big island
Hilo: a Polynesian navigator
Hua (hua’olelo): word
Waihona: library
Lapu: ghost
Huli: search
Kahu: administrator
Honua: world
Keaka: theater
Wele (puna welewele): (spider) web
Uku (ukulele): flea
Paheona: art
Hoahu: warehouse




Acknowledgments

This work is funded in part by the Defense Advanced Research Projects Agency (DARPA), U.S. Marine Corps PM TRASYS, U.S. Army RDECOM, USSOCOM, U.S. Air Force, DLIFLC, and Yale University. Opinions expressed here are those of the authors, not the U.S. government.

Notes

1. Many of the architecture components have Hawaiian names. Hawaii, a cultural crossroads, is evocative of Alelo's mission to promote cross-cultural interchange. A glossary of Hawaiian terms is included in this article.

2. www.adobe.com/products/flex/

3. developer.apple.com/ipod

4. htk.eng.cam.ac.uk

5. julius.sourceforge.jp/en_index.php

6. www.mindmakers.org/projects/BML

References

Baker, C.; Fillmore, C.; and Lowe, J. 1998. The Berkeley FrameNet Project. In Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics. San Francisco: Morgan Kaufmann Publishers.

Corbett, A., and Anderson, J. R. 1995. Knowledge Tracing: Modeling the Acquisition of Procedural Knowledge. User Modeling and User-Adapted Interaction 4(4): 253–278.

Feng, D., and Hovy, E. 2006. Learning to Detect Conversation Focus of Threaded Discussions. In Proceedings of the Human Language Technology / North American Association of Computational Linguistics Conference (HLT-NAACL 2006). New York: Association for Computing Machinery.

Johnson, W. L., and Wu, S. 2008. Assessing Aptitude for Language Learning with a Serious Game for Learning Foreign Language and Culture. In Proceedings of the Ninth International Conference on Intelligent Tutoring Systems. Lecture Notes in Computer Science 5091. Berlin: Springer.

Kirkpatrick, D. L. 1994. Evaluating Training Programs: The Four Levels. San Francisco: Berrett-Koehler.

Mateas, M., and Stern, A. 2005. Structuring Content in the Façade Interactive Drama Architecture. In Proceedings of the First Artificial Intelligence and Interactive Digital Entertainment Conference. Menlo Park, CA: AAAI Press.

Meron, J.; Valente, A.; and Johnson, W. L. 2007. Improving the Authoring of Foreign Language Interactive Lessons in the Tactical Language Training System. Paper presented at the SLaTE Workshop on Speech and Language Technology in Education, Farmington, PA, 1–3 October.

Mote, N.; Johnson, W. L.; Sethy, A.; Silva, J.; and Narayanan, S. 2004. Tactical Language Detection and Modeling of Learner Speech Errors: The Case of Arabic Tactical Language Training for American English Speakers. Paper presented at the InSTIL/ICALL 2004 Symposium on Computer Assisted Learning, Venice, Italy, 17–19 June.

Samtani, P.; Valente, A.; and Johnson, W. L. 2008. Applying the SAIBA Framework to the Tactical Language and Culture Training System. Paper presented at the First Functional Markup Language Workshop, AAMAS 2008, Estoril, Portugal, 13 May.

Si, M.; Marsella, S. C.; and Pynadath, D. 2005. THESPIAN: An Architecture for Interactive Pedagogical Drama. In Proceedings of the Twelfth International Conference on Artificial Intelligence in Education. Amsterdam, The Netherlands: IOS Press.

Surface, E., and Dierdorff, E. 2007. Special Operations Language Training Software Measurement of Effectiveness Study: Tactical Iraqi Study Final Report. Tampa, FL: U.S. Army Special Operations Forces Language Office.

Traum, D., and Hinkelman, E. 1992. Conversation Acts in Task-Oriented Spoken Dialogue. Computational Intelligence 8(3): 575–599.

Vilhjalmsson, H., and Marsella, S. 2005. Social Performance Framework. In Modular Construction of Humanlike Intelligence: Papers from the 2005 AAAI Workshop. Technical Report WS-05-08. Menlo Park, CA: Association for the Advancement of Artificial Intelligence.

W. Lewis Johnson is cofounder, president, and chief scientist of Alelo Inc. Prior to that he was a research professor in computer science at the University of Southern California/Information Sciences Institute. Alelo realizes his vision to promote the learning of foreign languages and cultural competency worldwide. Alelo's game-based learning environments are in widespread use by military trainees in the United States and other countries. This work has been recognized by multiple awards, including the 2005 DARPATech Significant Technical Achievement Award, the 2007 I/ITSEC Serious Games Challenge, and the 2008 Los Angeles Technology Council Award. Alelo is now partnering with Yale University to develop integrated suites of learning materials for Chinese and other languages and is developing web-based learning materials for distribution worldwide by Voice of America. Johnson holds an A.B. in linguistics from Princeton University and a Ph.D. in computer science from Yale University. He is a member of the steering committees of the International Artificial Intelligence in Education Society, the International Conference on Intelligent User Interfaces, and the International Foundation for Autonomous Agents and Multi-Agent Systems.

Andre Valente is cofounder and CEO of Alelo Inc. Alelo's game-based learning environments are in widespread use by military trainees in the United States and other countries. This work has been recognized by multiple awards, including the 2005 DARPATech Significant Technical Achievement Award, the 2007 I/ITSEC Serious Games Challenge, and the 2008 Los Angeles Technology Council Award. Prior to Alelo, Valente worked as an executive for software startup companies, managed software development, and consulted for businesses in the software, manufacturing, media, and aerospace areas. He also worked as a research scientist at the University of Southern California. Valente received a Ph.D. in computer science from the University of Amsterdam and an MBA from the University of Southern California. He has published three books and more than 50 technical articles on knowledge management, knowledge systems tools, and business process management.

