Proc. Natl. Acad. Sci. USA, Vol. 92, pp. 10011-10016, October 1995. Colloquium Paper
This paper was presented at a colloquium entitled "Human-Machine Communication by Voice," organized by Lawrence R. Rabiner, held by the National Academy of Sciences at The Arnold and Mabel Beckman Center in Irvine, CA, February 8-9, 1993.
Military and government applications of human-machine communication by voice

CLIFFORD J. WEINSTEIN

Lincoln Laboratory, Massachusetts Institute of Technology, Lexington, MA 02173-9108
ABSTRACT This paper describes a range of opportunities for military and government applications of human-machine communication by voice, based on visits and contacts with numerous user organizations in the United States. The applications include some that appear to be feasible by careful integration of current state-of-the-art technology and others that will require a varying mix of advances in speech technology and in integration of the technology into applications environments. Applications that are described include (1) speech recognition and synthesis for mobile command and control; (2) speech processing for a portable multifunction soldier's computer; (3) speech- and language-based technology for naval combat team tactical training; (4) speech technology for command and control on a carrier flight deck; (5) control of auxiliary systems, and alert and warning generation, in fighter aircraft and helicopters; and (6) voice check-in, report entry, and communication for law enforcement agents or special forces. A phased approach for transfer of the technology into applications is advocated, in which integration of applications systems is pursued in parallel with advanced research to meet future needs.
This paper describes a broad range of opportunities for military and government applications of human-machine communication by voice and discusses issues to be addressed in bringing the technology into real applications. The paper draws on many visits and contacts by the author with personnel at a variety of current and potential user organizations in the United States. The paper focuses on opportunities and on what is needed to develop real applications because, despite the many opportunities that were identified and the high user interest, the military and government organizations contacted were generally not using human-machine communication by voice in operational systems (exceptions included an application in air traffic controller training and voice entry of zip codes by the U.S. Postal Service). Furthermore, the visits and discussions clearly identified a number of applications that today's state-of-the-art technology could support, as well as other applications that require major research advances.

Background for this paper is provided by a number of
previous assessments of military applications of speech technology (1-7), including prior National Research Council studies (3, 4) and studies conducted in association with the NATO RSG10 Speech Research Study Group (1, 2, 5, 7). Those prior studies provide reviews of the state of the art at the time, and each outlines a number of programs in which prototype speech recognition systems were tested in application environments, including fighter aircraft, helicopters, and ship-based command centers. These efforts, as described in the references but not detailed further here, generally yielded promising technical results but have not yet been followed by operational applications. This paper focuses on users and applications in the United States, but the general trends and conclusions could apply elsewhere as well.
This paper is organized to combine reports on the military and government visits and contacts with descriptions of the target applications most closely related to each organization. However, it is important to note that many of the applications pertain to a number of user organizations, as well as having dual use in the civilian and commercial areas. (Other papers in this volume describe applications of speech technology in general consumer products, telecommunications, and aids for people with physical and sensory disabilities.) A summary relating the classes of applications to the interests of the various military and government users is provided near the end of the paper. The paper concludes with an outline of a strategy for technology transfer to bring the technology into real applications.
TECHNOLOGY TRENDS AND NEEDS

A thorough discussion of technology trends and needs would be beyond the scope of this paper; hence, the focus here is on description of the applications. But the underlying premise is that both the performance of algorithms and the capability to implement them in real time, in off-the-shelf or compact hardware, have advanced greatly beyond what was tested in prior prototype applications. The papers and demonstrations at the recent DARPA (Defense Advanced Research Projects Agency) Speech and Natural Language Workshops (8) provide a good representation of the state of current technology for human-machine communication by voice. Updated overviews of the state of the art in speech recognition technology are presented elsewhere in this volume.

With respect to technological needs, military applications often place higher demands on robustness to acoustic noise and user stress than do civilian applications (7). But military applications can often be carried out in constrained task domains, where, for example, the vocabulary and grammar for speech recognition can be limited.
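The value of a constrained task domain can be made concrete with a small sketch. The command names and slot values below are hypothetical illustrations (not any fielded military vocabulary); the point is that a recognizer restricted to a fixed verb-object-value grammar can reject everything outside it rather than guess.

```python
# Minimal sketch of a constrained command grammar of the kind that makes
# limited-domain speech recognition tractable. All command names and slot
# values are hypothetical illustrations, not fielded vocabularies.

GRAMMAR = {
    "TUNE": {"object": {"RADIO", "BEACON"}, "value": {str(f) for f in range(118, 137)}},
    "SELECT": {"object": {"DISPLAY", "WEAPON"}, "value": {"ONE", "TWO", "THREE"}},
}

def parse_command(words):
    """Accept a token sequence only if it fits the <verb> <object> <value> grammar."""
    if len(words) != 3:
        return None
    verb, obj, val = (w.upper() for w in words)
    slots = GRAMMAR.get(verb)
    if slots and obj in slots["object"] and val in slots["value"]:
        return {"verb": verb, "object": obj, "value": val}
    return None  # out-of-grammar utterance is rejected, not guessed at

print(parse_command("tune radio 121".split()))
print(parse_command("open the pod bay doors".split()))
```

Restricting the search space this way is what allows small-vocabulary military tasks to achieve robustness that open-domain civilian dictation cannot.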
SUMMARY OF VISITS AND CONTACTS

The broad range of military and government organizations that were contacted is shown in Fig. 1. There was broad-based interest in speech recognition technology across all these organizations. The range of interests was also deep, in the sense that most organizations were interested in applications spanning a range of technical difficulty, including some applications that today's state-of-the-art technology could support and others that would require major research advances. Also, many organizations had tested speech recognition systems in
The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.
FIG. 1. Summary of military and government organizations visited or contacted: Army Armament R&D Center, Army AI Center, Army AVRADA, Army CECOM, Army Future Battle Lab; Naval Air Systems Command, Naval Ocean Systems Center, Naval Underwater Systems Center, Naval Personnel Research & Development Center; Air Force Armstrong Lab, Air Force Flight Dynamics Lab, Air Force Foreign Aerospace Science & Technology Center; DARPA/SISTO, Defense Information Systems Agency, Defense Language Institute, Foreign Broadcast Information Service; FAA Academy, FAA En-Route Center (DFW), FAA Headquarters; FBI Engineering Center, FBI Headquarters Laboratory.
prototype applications but had not integrated them into operational systems. This was generally due to a perception that "the technology wasn't ready yet." But major speech recognition tests, such as the Air Force's F-16 fighter tests (9) and the Army's helicopter tests (10), were conducted a number of years ago. In general, tests such as these have not been performed with systems that approach today's state-of-the-art recognition technology (7).
ARMY APPLICATIONS
The Army visits and contacts (see Fig. 1) pointed out many applications of human-machine communication by voice, of which three will be highlighted here: (i) Command and Control on the Move (C2OTM); (ii) the Soldier's Computer; and (iii) voice control of radios and other auxiliary systems in Army helicopters. In fact, applications for voice-actuated user interfaces are recognized by the Army to pervade its engineering development programs (E. Mettala, DARPA, unpublished presentation, Feb. 1992).
In Desert Storm the allied troops moved farther and faster than troops in any other war in history, and extraordinary efforts were needed to make command and control resources keep pace with the troops. C2OTM is an Army program aimed at ensuring the mobility of command and control for potential future needs. Fig. 2 illustrates some of the mobile force elements requiring C2OTM and some of the potential applications for speech-based systems. Typing is often a very poor input medium for mobile users, whose eyes and hands are busy with pressing tasks. Referring to Fig. 2, a foot soldier acting as a forward observer could use speech recognition to enter a stylized report that would be transmitted to command and control headquarters over a very low-rate, jam-resistant channel. Repair and maintenance in the field can be facilitated by voice access to repair information and helmet-mounted displays to show the information. In a mobile command and control vehicle, commanders need convenient access to battlefield information and convenient means for entering and updating plans. Integrated multimodal input/output (voice, text, pen, pointing, graphics) will facilitate meeting these requirements. Other applications suggested in Fig. 2 include simple voice translation (e.g., of forward observer reports), access to battlefield situation information, and weapons system selection.

The Soldier's Computer is an Army Communications and Electronics Command (CECOM) program that responds to the information needs of the modern soldier. The overall system concept is shown in Fig. 3. Voice will be a crucial input mode, since carrying and using a keyboard would be very inconvenient for the foot soldier. Functions of the Soldier's Computer are similar to those mentioned above for C2OTM. Technical issues include robust speech recognition in noise and smooth integration of the various input/output modes. The technology for both the Soldier's Computer and C2OTM has many dual-use, peacetime applications, both for everyday use and in crises such as fires or earthquakes.

Speech recognition for control of radios and other devices
in Army helicopters is an application that has been addressed in test and evaluation programs by the Army Avionics Research and Development Activity (AVRADA), as well as by groups in the United Kingdom and France. Feasibility has been demonstrated, but operational use has not been established. The Army AVRADA people I met described a tragic helicopter collision in which the fact that both pilots were tuning radios may have been the major cause of the crash. Although voice control was considered to be a viable solution, it was not established as a requirement (and therefore not implemented) because of the Army's view that speaker-independent recognition was necessary and was not yet sufficiently robust. But the state of the art of speaker-independent recognition, particularly for small vocabularies, has advanced a great deal and is now likely to be capable of meeting the needs for control of radios and similar equipment in a military helicopter.
NAVY APPLICATIONS
My Navy visits and contacts uncovered a wide range of important applications of speech technology, with support at very high levels in the Navy. The applications outlined here are (i) aircraft carrier flight deck control and information management, (ii) SONAR supervisor command and control, and (iii) combat team tactical training.

The goal in the carrier flight deck control application is to provide speech recognition for updates to aircraft launch, recovery, weapon status, and maintenance information. At the request of Vice-Admiral Jerry O. Tuttle (Assistant Chief of Operations for Space and Electronic Warfare), the Naval Ocean Systems Center (NOSC)* undertook to develop a demonstration system on board the USS Ranger. Recognition requirements included open microphone; robust, noise-resistant rec-
*The Naval Ocean Systems Center has subsequently reorganized as the Naval Research and Development Organization.
FIG. 2. Command and Control on the Move (C2OTM): force elements and example applications of human-machine communication by voice. Applications shown include forward observer report entry; translation for allies; situation awareness; weapons system selection; repair and maintenance; information access and display in a mobile C2 vehicle; and plan entry by voice, text, and pen.
ognition with out-of-vocabulary word rejection; and easy integration into the PC-based onboard system. An extremely successful laboratory demonstration, using a commercially available recognizer, was performed at NOSC for Admiral Tuttle in November 1991. Subsequent tests on board the Ranger in February 1992 identified a number of problems and needed enhancements in the overall human-machine interface system, but correction of these problems seemed to be well within the current state of the art.

The SONAR supervisor on board a surface ship needs to control displays, direct resources, and send messages while moving about the command center and looking at command and control displays. This situation creates an opportunity for application of human-machine communication by voice, and the Naval Underwater Systems Center (NUSC) has sponsored development of a system demonstrating voice activation of command and control displays at a land-based integrated test site in New London, CT. The system would be used first for training of SONAR supervisors at the test site and later for shipboard applications. Initial tests with former SONAR supervisors were promising, but the supervisors expressed dissatisfaction at having to train the speaker-dependent recognizer that was used.
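The open-microphone requirement noted for the carrier flight deck application implies that the recognizer must reject out-of-vocabulary speech rather than force every utterance onto a known command. One common approach, sketched here under stated assumptions (the scores and command phrases are made up; a real system would use recognizer likelihoods), is to accept a hypothesis only when its score clears a floor and beats the runner-up by a margin:

```python
# Hypothetical sketch of out-of-vocabulary rejection for an open-microphone
# system: an utterance is accepted only when the best in-vocabulary score
# is high enough and beats the runner-up by a margin; otherwise it is
# discarded as background speech. Scores here are illustrative log-scores.

def accept(scored_hypotheses, margin=2.0, floor=-10.0):
    """scored_hypotheses: list of (command, log_score) pairs."""
    ranked = sorted(scored_hypotheses, key=lambda h: h[1], reverse=True)
    best = ranked[0]
    second = ranked[1] if len(ranked) > 1 else (None, float("-inf"))
    if best[1] < floor or best[1] - second[1] < margin:
        return None  # low confidence: treat as out-of-vocabulary and reject
    return best[0]

print(accept([("launch status", -3.1), ("recovery status", -7.5)]))  # clear winner
print(accept([("launch status", -3.1), ("recovery status", -3.4)]))  # ambiguous, rejected
```

The margin and floor thresholds trade false acceptances of background talk against false rejections of valid commands, which is exactly the tuning problem an open-microphone deployment must solve.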
FIG. 3. The Soldier's Computer: system concept; functions that would be assisted by human-machine communication by voice; technical issues; and possible dual-use applications. Hardware elements shown include an earphone for audio output and a hand-held joystick. Functions: data access for repair; message entry and report-back; query for location and battlefield situation; C2 of weapons systems. Issues: robust recognition; integration of modes; effective display. Dual-use possibilities: repair person, medic, courier, traveller, bus or train driver.
FIG. 4. Naval combat team tactical training: system concept and applications of speech- and language-based technology. The Combat Information Center aboard a combat ship produces multichannel voice and multimodal actions, recorded as team inputs; a combat scenario supplies external events and sensor data, including simulated friends and foes. A Multimedia Data Analysis and Fusion System fuses language with multiple data sources (including wordspotting, talker ID, and gisting) to produce data for debriefing, training, and replanning, and is extendable to operational environments for user assistance, problem detection, and improved user interfaces.
A scenario for application of speech- and language-based technology to Navy combat team tactical training, based on a proposal by the Navy Personnel Research and Development Center, is illustrated in Fig. 4. The training scenario includes a mix of real forces and force elements simulated by using advanced simulation technology. Personnel in the Combat Information Center (either at sea or in a land-based test environment) must respond to developing combat scenarios using voice, typing, trackballs, and other modes and must communicate both with machines and with each other. As suggested in the figure, speech-based and language-based technology, and fusion of language with multiple data sources, can be used to correlate and analyze the data from a combat training exercise, to allow rapid feedback (e.g., what went wrong?) for debriefing, training, and replanning. These language-based technologies, first developed and applied in training applications where risk is not a major issue, can later be extended to operational applications, including detection of problems and alerting of users, and also to development of improved human-machine interfaces in the Combat Information Center.

The approach of first developing and using a system with human-machine communication by voice in a training application and then extending it to an operational application is a very important general theme. The training application is both useful in itself and provides essential data (including, for example, language models and speech data characterizing the human-machine interaction) for developing a successful operational application.
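The claim that training exercises yield language models for a later operational system can be illustrated with a minimal sketch. The transcripts below are invented stand-ins for recorded exercise speech; the estimator is a standard add-one-smoothed bigram model, not any specific Navy system:

```python
# Sketch: estimating a bigram language model from (hypothetical)
# training-exercise transcripts, of the kind that could later constrain
# an operational recognizer. Uses add-one smoothing over the vocabulary.

from collections import Counter

transcripts = [
    "designate track two two one hostile",
    "designate track two two four friendly",
    "send status report to flag",
]

bigrams, unigrams, vocab = Counter(), Counter(), set()
for line in transcripts:
    words = ["<s>"] + line.split()       # <s> marks the utterance start
    vocab.update(words)
    unigrams.update(words[:-1])          # count contexts
    bigrams.update(zip(words[:-1], words[1:]))

def p(word, prev):
    """Add-one smoothed bigram probability P(word | prev)."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + len(vocab))

# After "designate", a word seen in training is far more probable
# than an unseen one:
print(p("track", "designate"), p("flag", "designate"))
```

Even this toy model shows how exercise data sharpens the recognizer's expectations about what a team member will say next, which is the "essential data" the paragraph above refers to.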
AIR FORCE APPLICATIONS

The Air Force continues its long-term interest in speech input/output for the cockpit and has proposed to include human-machine communication by voice in the future Multi-Role Fighter. Fighter cockpit applications, ranging from voice control of radio frequency settings to an intelligent Pilot's Associate system, have been discussed elsewhere (7, 9) and will not be detailed further here. However, it is likely that the kinds of applications that were tested in the AFTI F-16 program, with promising results but not complete success, would be much more successful with today's robust speech recognition technology. Voice control of radio frequencies, displays, and gauges could have a significant effect on mission effectiveness and safety. A somewhat more advanced but technically feasible application is the use of voice recognition in entering reconnaissance reports. Such a system is currently under development at the Defence Research Agency in the United Kingdom (11). Other potential Air Force applications include human-machine voice communication in airborne command posts, similar to Army and Navy command and control applications. In particular, entry of data and log information by voice could potentially provide significant workload reduction in a large variety of command and control center operations.
AIR TRAFFIC CONTROL APPLICATIONS

The air traffic controller is taught to use constrained phraseology to communicate with pilots. This provides an opportunity, which is currently being exploited at the Federal Aviation Administration (FAA) Academy in Oklahoma City, at the Naval Air Technical Training Center in Orlando, FL, and elsewhere, to apply speech recognition and synthesis to emulate pseudo-pilots in the training of air traffic controllers. This application, illustrated in Fig. 5, is an excellent example of a military and government application of human-machine communication by voice that is currently in regular use. Advances in speech and language technology will extend the range and effectiveness of these training applications (7). As in the naval combat team tactical training application, speech recognition technology and data fusion could be used to automate training session analysis and to provide rapid feedback to trainees.

A number of automation aid applications in air traffic control are also possible via speech technology, as indicated in Fig. 5. Again, the training experience can be used to help build operational automation applications. An application of high current interest is on-line recognition of flight identification information from a controller's speech to quickly access information on that flight (12). More advanced potential applications include processing and fusion of multimodal data to evaluate the effectiveness of new automation aids for air traffic control, and gisting (13) of pilot/controller communications to detect potential airspace conflicts.
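Because controller phraseology is constrained, extracting a flight identification from a recognized utterance can be close to deterministic. The sketch below assumes a recognizer has already produced a word string; the telephony-word table and utterances are illustrative, not an FAA grammar:

```python
# Sketch of on-line extraction of a flight identification key from a
# controller's constrained-phraseology utterance (post-recognition).
# The airline telephony designators and sample phrases are illustrative.

TELEPHONY = {"AMERICAN": "AAL", "DELTA": "DAL", "UNITED": "UAL"}
DIGITS = {"ZERO": "0", "ONE": "1", "TWO": "2", "THREE": "3", "FOUR": "4",
          "FIVE": "5", "SIX": "6", "SEVEN": "7", "EIGHT": "8", "NINER": "9"}

def flight_id(utterance):
    """Map 'american three niner two descend ...' to a lookup key like 'AAL392'."""
    words = utterance.upper().split()
    if not words or words[0] not in TELEPHONY:
        return None  # utterance does not begin with a known callsign
    digits = []
    for w in words[1:]:
        if w in DIGITS:
            digits.append(DIGITS[w])
        else:
            break  # first non-digit word ends the flight number
    return TELEPHONY[words[0]] + "".join(digits) if digits else None

print(flight_id("american three niner two descend and maintain eight thousand"))
```

The extracted key could then index the flight database, which is the essence of the on-line flight-information access application described above.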
LAW ENFORCEMENT APPLICATIONS

Discussions with Federal Bureau of Investigation (FBI) personnel revealed numerous potential applications of speech and
FIG. 5. Air traffic control training and automation: system concept and applications of human-machine communication by voice. Training applications: emulation of pseudo-pilots; scenario-driven simulation; automated session analysis (including speech recognition). Automation applications: active control (e.g., flight ID recognition); processing of multichannel voice and sensor data to evaluate automation aids; gisting for conflict detection.
language technology in criminal investigations and law enforcement. For example, the Agent's Computer is envisioned as a portable device, with some similarity to the Soldier's Computer but specialized to the agent's needs. Functions of particular interest to agents include (i) voice check-in, (ii) data or report entry, (iii) rapid access to license plate or description-based data, (iv) covert communication, (v) rapid access to map and direction information, and (vi) simple translation of words or phrases. Fast access to and fusion of multimedia data, some language based and some image based (e.g., fingerprints and photos), were together a major need for aid in investigations. Voice-controlled database access could be used to facilitate this data access. As with the Navy and FAA training applications mentioned above, the FBI had high interest in training using simulation in combination with language-based technology, for both mission execution and mission diagnosis. Criminal investigations put a major burden on agents in terms of reporting and documentation; the use of human-machine communication by voice to rapidly prepare reports, ranging from structured forms to free text, was identified as an application of major interest to agents.
SUMMARY OF USERS AND APPLICATIONS

The matrix shown in Fig. 6 relates the classes of applications that have been described to the interests of the various military and government users. All the applications have dual use in the civilian area. Looking across the rows, it is evident that all the users have interest in a wide range of applications of varying technical difficulty. In fact, upon being shown this matrix, each potential user generally wanted to fill in all the boxes in his row. The most pervasive near-term application is voice data entry, which can range from entering numerical data, to creating formatted military messages, to free-form
Users                    Data Entry   Data     Command    Training   Translation
                         & Commun.    Access   & Control
Soldier                  xx           x        x          x          x
Naval CIC Officer        xx           xx       xx         xx
Pilot                    xx           x        xx
Agent                    xx           xx                  x          x
Air Traffic Controller   x            xx                  x
Diplomat                 x            x                              xx
Joint Force Commander    xx           xx       xx

(xx = primary application; x = additional application)

FIG. 6. Matrix relating classes of applications of human-machine communication by voice to the interests of military and government users.
report entry. Current speech recognition technology is capable of performing these functions usefully in a number of military environments, particularly to provide operator workload reduction in command and control centers.
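Voice entry into a formatted message can be sketched as slot filling: recognized field-value phrases are placed into a fixed report template, with unfilled fields left explicit. The field names and spoken phrases below are illustrative and follow no real military message standard:

```python
# Sketch of voice data entry into a formatted report: recognized
# field/value phrases are slotted into a fixed template. Field names
# are hypothetical, not any actual military message format.

TEMPLATE_FIELDS = ["unit", "location", "activity", "time"]

def fill_report(recognized):
    """recognized: list of 'field value...' strings from the recognizer."""
    report = dict.fromkeys(TEMPLATE_FIELDS, "UNKNOWN")
    for phrase in recognized:
        field, _, value = phrase.partition(" ")
        if field in report and value:
            report[field] = value.upper()
    return report

spoken = ["unit bravo six", "location grid one two three", "time zero nine thirty"]
print(fill_report(spoken))
```

Keeping unfilled fields marked "UNKNOWN" lets the operator see at a glance which items still need to be dictated, one reason structured voice entry can reduce workload relative to typing.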
TECHNOLOGY TRANSFER

A key conclusion of this study is that there is now a great opportunity for military and government applications of human-machine communication by voice, which will have real impact both on the users and on the development of the technology. This opportunity is due both to technical advances and to very high user interest; there has been a big increase in user interest just within the past few years (i.e., since the study reported in ref. 7).

The strategy of the technologists should be to select and push applications with a range of technical challenges, so that meaningful results can be shown soon, while researchers continue to advance the technology to address the harder problems. It is essential that technologists work with the users to narrow the gap between the user and the state of the art. Too often, users have tested speech recognition systems that are off the shelf but well behind the state of the art, and have been discouraged by the results.

With today's software-based recognition technology, and with the increased computing power in PCs, workstations, and digital signal-processing chips, it is now possible to develop and test applications with recognition algorithms that run in real time, in software, on commercially available general-purpose processors and that perform very close to the state of the art.

Technologists must work with users to understand the user requirements and provide appropriate technology. For effective technology transfer, software and hardware must be portable and adaptable to new domains or to unforeseen variations in the user's needs. Eventually the user should be able to take over and continue adapting the technology to changing needs, with little support from the technologists. Meanwhile, the technologists, having learned from each generation of operational applications, can work to develop the research advances that will enable the next generation of operational applications.
I would like to acknowledge the contributions to this study of the following individuals: Victor Zue (Massachusetts Institute of Technology, MIT), Allen Sears (MIT Research Establishment), Janet Baker (Dragon Systems), Charles Wayne (DARPA), Erik Mettala (DARPA), George Doddington (DARPA), Deborah Dahl (Paramax), David Ruppe (Army), Jim Schoening (Army), Christine Dean (Navy), Steve Nunn (Navy), Walter Rudolph (Navy), Jim Cupples (Air Force), Tim Anderson (Air Force), Dave Williamson (Air Force), Joe Kielman (FBI), John Hoyt (FBI), and Peter Sielman (Analysis and Technology, Inc.). A special acknowledgment goes to Victor Zue for many helpful discussions and contributions. This work was sponsored in part by the Advanced Research Projects Agency and in part by the Department of the Air Force.
1. Beek, B., Neuburg, E. P. & Hodge, D. C. (1977) IEEE Trans. Acoust. Speech Signal Process. ASSP-25, 310-321.
2. Cupples, E. J. & Beek, B. (1990) in Proceedings of the NATO AGARD Lecture Series No. 170, Speech Analysis and Synthesis and Man-Machine Speech Communications for Air Operations, pp. 8-1-8-10.
3. Flanagan, J. L., et al. (1984) in Automatic Speech Recognition in Severe Environments (National Research Council, Committee on Computerized Speech Recognition Technologies, Washington, DC).
4. Makhoul, J., Crystal, T. H., Green, D. M., Hogan, D., McAulay, R. J., Pisoni, D. B., Sorkin, R. D. & Stockham, T. G., Jr. (1989) in Removal of Noise from Noise-Degraded Speech Signals (National Research Council, Committee on Hearing, Bioacoustics, and Biomechanics, Washington, DC).
5. Proceedings of the NATO AGARD Lecture Series No. 170, Speech Analysis and Synthesis and Man-Machine Speech Communications for Air Operations (1990).
6. Woodard, J. P. & Cupples, E. J. (1983) IEEE Commun. Mag. 21, 35-44.
7. Weinstein, C. J. (1991) Proc. IEEE 79, 1626-1641.
8. Proceedings of the February 1992 DARPA Speech and Natural Language Workshop (1992) (Kaufmann, San Mateo, CA).
9. Howard, J. A. (1987) in Proceedings of Military Speech Tech (Arlington, VA), pp. 76-82.
10. Holden, J. M. (1989) Proceedings of the American Voice Input/Output Society (AVIOS) Conference.
11. Russell, M. J., et al. (1990) Proc. ICASSP'90 (Albuquerque, NM).
12. Austin, S., et al. (1992) in Proceedings of the February 1992 DARPA Speech and Natural Language Workshop (Kaufmann, San Mateo, CA), pp. 250-251.
13. Rohlicek, J. R., et al. (1992) Proc. ICASSP'92 (San Francisco), pp. II-113-II-116.