The Visual Computer manuscript No. (will be inserted by the editor)

Generating Animation from Natural Language Texts and Semantic Analysis for Motion Search and Scheduling

Masaki Oshita
Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan
Tel.: +81-948-29-7718; Fax: +81-948-29-7709
E-mail: [email protected]

Received: date / Accepted: date

Abstract This paper presents an animation system that generates an animation from natural language texts such as movie scripts or stories. It also proposes a framework for a motion database that stores numerous motion clips for various characters. We have developed semantic analysis methods to extract information for motion search and scheduling from script-like input texts. Given an input text, the system searches for an appropriate motion clip in the database for each verb in the input text. Temporal constraints between verbs are also extracted from the input text and are used to schedule the motion clips found. In addition, when necessary, certain automatic motions such as locomotion, taking an instrument, changing posture, and cooperative motions are searched for in the database. An animation is then generated using an external motion synthesis system. With our system, users can make use of existing motion clips. Moreover, because it takes natural language text as input, even novice users can use our system.

Keywords computer animation · motion database · natural language processing

1 Introduction

Recently, computer animation has been widely used in movies, video games, TV programs, web graphics, etc.


Because computer animation is a very powerful tool to present a story, drama, or instruction, there are demands from non-professional people to create computer animation. However, it is a difficult task because of two main issues. The first issue is the difficulty of making and reusing motion data. Currently, motion data are mainly created using motion capture or keyframe techniques. Either way, they are very time consuming and require professional skills. Although there are demands for reusing existing motion data, this is difficult because of the lack of a system for storing and searching large amounts of motion data. Because there can be various motions of various characters, it is difficult to manage them in a standard file system or database. Currently, most motion data are created from scratch for individual scenes and are thrown away without reuse. The second issue is the limitation of current animation systems. A computer animation can be created by combining a number of existing motion clips using animation software such as MotionBuilder, Maya, 3ds Max, etc. However, it is difficult for novice users to utilize such software, because handling motion data is tricky and these systems require training.

To address these issues, we developed an animation system that generates an animation from natural language texts such as movie scripts or stories (Fig. 1). We also developed a motion database that stores many motion clips for different characters. When an input text is given, the system searches for an appropriate motion clip from the database for each verb. Temporal constraints between verbs are also extracted from the input text. The searched motion clips are scheduled based on the temporal constraints. In addition, when necessary, some automatic motions such as locomotion, taking an instrument, changing posture, and cooperative motions are searched from the database. The system outputs a motion timetable which consists of motion clips and their execution timings. An animation is then generated using an external motion synthesis system. Using our system, even novice users can create animation by making use of existing motion clips.

Fig. 1 Example of our system. (a) Input text: "Neo waves to Jack. At the same time, Jack takes the red bottle. Jack hits Neo with it." (b) Searched motion clips and their execution timings. (c) Generated animation.


There are many possible applications of our system. Recently, in movie production, simple animations are created before production to check camerawork, screenplay, necessary visual effects, etc. These animations are called "previsualization" or "animatics". They are also often created for scenes in which no computer graphics are involved. Using our system, even directors or writers who are not professional animators can create an animation very quickly. Moreover, our system can be used by non-professional people who want to make an animation but do not have professional skills. It can also be used to visualize a story for children, making it interesting and easy to understand. Our system can also be used in movie production itself: even if animators want to add more details to the output of our system, our method is much easier than making animations from scratch.

In this paper, we propose a motion frame that contains meta-information about a motion clip, an object-oriented database framework for storing a number of motions of a number of characters in a hierarchical structure, natural language analysis methods that are specialized for extracting motion-related descriptions from an input text, and scheduling of multiple motions based on the temporal constraints in an input text. In addition, we have done preliminary experiments which showed that our system generates expected results from various input texts.

This paper is an extended version of our previous work [1]. As explained in Section 5, we have mainly extended our natural language analysis methods to enable our system to handle various expressions in input texts. Based on the experiments presented in Section 8, 87% of the verbs in a sample movie script can be dealt with using our methods and represented as motions, whereas 78% of these were handled by the system before the extension [1].

The rest of this paper is organized as follows. Section 2 reviews related work in the literature. Section 3 gives an overview of our system. Sections 4, 5, 6, and 7 describe our methods used in the framework of the motion database, natural language analysis, motion search and motion scheduling, respectively. In Section 8 some experimental results are presented together with a discussion thereof. Finally, Section 9 concludes the paper.

2 Related Work

Generating animation from natural language texts has been a challenge. Many research groups have tackled this problem. The SHRDLU system, which was developed by Winograd [2], is known as the pioneer. Using SHRDLU, a user can give commands to a robot using English in an interactive manner, and make it arrange objects in a scene. However, the types of commands were very limited.

Badler et al. [3][4] developed virtual agents that follow natural language interactions. They proposed the Parameterized Action Representation (PAR), which has a similar purpose to the motion frame in our research. The PAR has more complex information such as precondition and achievement. The motion generator of each PAR is programmed using a state machine. It can use motion data or any motion generation methods. However, specifying detailed information and constructing motion generators are very time consuming.

Tokunaga et al. [5] developed the K2 system, which has similar goals to Badler et al. In their system, agents are controlled via spoken language. Their research is rather focused on solving the vagueness of natural language instructions. They use case frames [6] to search for motions. Unlike our work, they use all cases that are used in linguistic analysis. The interpretation of each case is left to the user who adds the case frame handler.

The motion generator for each case frame must be manually programmed by the user.

These previous works aim at developing intelligent agents that understand natural language instructions and make plans to execute them. However, the systems are very complex, and many rules are required. On the other hand, our system aims to reuse existing motion data easily and efficiently. The motion frame in our work contains just enough information to search for appropriate motions that match natural language texts, and it is easy to describe. We believe that our system is more practical.

Lu and Zhan [7] developed an animation production system that includes story understanding, plot planning, act planning, camera planning, etc. Although their system takes simple Chinese as input, it requires a great deal of additional knowledge, including not only case frames but also many dictionaries, templates and rules.

Sumi et al. [8] developed a system for visualizing short stories for children. The system extracts keywords from an input text, and chooses an appropriate scene, characters, and motions from a database. It simply plays a motion that matches the keywords. Although a user can add motion data to the system, the system cannot select motions appropriate for the objects or characters and cannot generate interactions between characters and the scene.

There is very little research that deals with motion scheduling from natural language texts. The above systems simply execute motions as instructions are given or events happen, and no scheduling is considered. However, in order to execute multiple motions of multiple characters as instructed by an input text, the execution timing of the motions must be coordinated. Baba et al. [9] developed a system for generating an animation that satisfies temporal and spatial constraints given by natural language texts. The system determines appropriate initial positions of the agents and objects that are specified in the input text. However, the motions of the agents and motion scheduling were not considered.

Coyne and Sproat [10] developed WordsEye, which converts natural language texts to a scene. Because their purpose is to generate a still image, when a character motion is indicated in a given text, the system simply chooses a pose for the action from the database.

There have been various studies on generating a character's gestures for a monologue or conversation [11]. These methods generate motions by composing short fragments of motions based on signal processing of the input speech rather than by interpreting the meaning of the speech.

There are also animation engines that support some script language, such as Improv [12] and Alice [13]. However, it is still difficult to program the agents and to make use of a large amount of existing motion data. In addition, markup language formats for describing animation including scenes, characters and actions have been proposed [14][15]. However, they are difficult to describe by hand. The animation files should be created by using specific authoring software. Moreover, it is difficult to add and reuse motion data using such file formats and authoring software.

There are many motion synthesis methods which generate new motions from a small number of motions [17][18]. However, they require a manual setup for each motion module. It is difficult for end users to add new motion modules. Although currently our system selects one motion from the database, it is possible to extend our system to blend a number of selected motions based on quantitative motion query parameters such as contact position.

3 System Overview

In this section, we explain the overview of our system (Fig. 2) and data representation (Fig. 3).

When an input text is given to the system, natural language processes (syntax analysis and semantic analysis) are applied first. The syntax analysis is the process of converting a plain text to a tree structure with phrase tags and dependencies. Fig. 3(b) is an example of the analyzed tree which is computed from an input text (Fig. 3(a)). The type of each phrase and the dependency between phrases are determined. For example, S, NP, VP and PP in Fig. 3(b) represent sentence, noun phrase, verb phrase and preposition, respectively.

The semantic analysis extracts information about motions described in the input text from the tree structure. A query frame contains information for the motion search. One is generated for each verb in the text. The temporal constraints contain information about execution timing between verbs. For example, QF1∼QF3 and TC1∼TC2 in Fig. 3(c) represent query frames and temporal constraints, respectively.

Based on the temporal constraints, motion scheduling determines the execution order of each motion clip, which corresponds to each query frame as shown in Fig. 3(d). Note that exact execution times are not decided at this point, because the duration of each motion is not known until motion clips are searched from the database and automatic motions are added later.

The motion search is applied for each query frame. In addition, when it is necessary, automatic motions are inserted before the motion. Finally, motion clips and their execution timings are passed to the motion synthesis module as a motion timetable, as shown in Fig. 3(e).

Fig. 2 System overview. (Pipeline: Input Text → Syntax Analysis → Analyzed Tree → Semantic Analysis → Query Frames and Temporal Constraints → Motion Scheduling → Motion Search → Motion Synthesis → Animation, using the Scene Information, the Characters and Objects, and the Motion Database.)

Fig. 3 Example of data representation. (a) Example of an input text: "Neo waves to Jack. At the same time, Jack takes the red bottle. Jack hits Neo with it." (b) Analyzed tree. (c) Query frames and temporal constraints. (d) Scheduling information and query frames. (e) Output motion timetable (motion clips and their execution timings).


The motion synthesis generates an animation by smoothly connecting given motion clips. The interactions between characters and between a character and objects are handled by this module based on the information that the motion clips have.

The scene information contains characters and objects and their initial states, including postures, positions, and orientations. Each object has certain object information including names, a default contact point, and alternative contact points and their names. For example, a desk object has "desk", "table", etc. as its names. An object also has sets of pairs consisting of a part name and its position (e.g., "above", "under", "side", etc.). This information is used to search for appropriate motions and determine appropriate contact positions according to an adjective that is used in the input text. In addition, an object has a default position which is used when no adjective is specified. This kind of object information is commonly used in similar approaches [5][10]. In addition, a scene also has default entering and leaving points as the default goal locations for locomotive motions (see Section 5.5). Currently, our system assumes that the scene information is provided in advance by the user.
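For illustration, the following is a minimal Python sketch of how such scene and object information could be represented; the class and field names are our own assumptions, not the paper's file format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Vec3 = Tuple[float, float, float]  # an (x, y, z) position in the scene


@dataclass
class ObjectInfo:
    """Object information used for motion search and contact positions."""
    names: List[str]                        # e.g., ["desk", "table"]
    default_contact: Vec3                   # used when no adjective is given in the text
    named_parts: Dict[str, Vec3] = field(default_factory=dict)  # e.g., {"above": ..., "under": ...}
    size: float = 1.0                       # abstract size used when matching query frames
    weight: float = 1.0                     # abstract weight used when matching query frames


@dataclass
class SceneInfo:
    """Characters, objects, and their initial states, given by the user in advance."""
    characters: List[str]                   # names of the characters present in the scene
    objects: Dict[str, ObjectInfo]          # object name -> object information
    entering_point: Vec3 = (0.0, 0.0, 0.0)  # default goal for entering locomotion (Section 5.5)
    leaving_point: Vec3 = (0.0, 0.0, 0.0)   # default goal for leaving locomotion (Section 5.5)


def contact_position(obj: ObjectInfo, adjective: Optional[str]) -> Vec3:
    """Pick the contact position named by an adjective (e.g., "under"), or the default one."""
    if adjective and adjective in obj.named_parts:
        return obj.named_parts[adjective]
    return obj.default_contact
```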

The scope of this paper is the components in the dotted box in Fig. 2. There are many tools for syntax analysis that can be used with our system; the Stanford parser [19] is used in our implementation. For motion synthesis, our system uses an external animation system [20]. The system generates continuous motions from given motion clips and their execution timings. It determines an appropriate synthesis method for each transition based on the constraints between the foot and the ground during motions. Alternatively, another commercial animation system such as MotionBuilder, Maya, or 3ds Max can be used.

4 Motion Database

In this section, we describe the representation of motion data. We first explain the case frame that is used in natural language processing. Then, we explain our motion frame, which is inspired by the case frame. We also describe our database of characters and motions.

4.1 Case Frame

The idea of a case frame was proposed by Fillmore [6]. A case frame represents the role of a verb. Each case of a case frame is a phrase that represents an aspect of the verb.

Typically, a case frame has the following cases:

– Agent: the person who performs the motion.
– Experiencer: the person who experiences something.
– Object: the object that an effect is caused to during the motion.
– Instrument: the object that causes an effect during the motion.
– Source: the source or origin of the motion.
– Goal: the goal or target of the motion.
– Time: the time when the motion is performed.
– Location: the location where the motion is performed.

Each case needs to be a specific type of entity. Some cases are mandatory for some verbs. A verb that has different roles depending on context has multiple case frames.

In general natural language processing systems, a procedure to select a case frame for an input text is as follows. First, based on the types and dependency of phrases in the analyzed tree, candidate cases of each phrase are determined. By searching for case frames that match the candidate cases, the most appropriate case frame and all its cases are determined.

The case frame is a good way to extract and represent the meanings of texts. The case frame is widely used in many research papers such as [5][10]. However, the case frame is not suitable for representation of motion data for animation. From the viewpoint of motion representation, each case has different roles depending on the case frame. For example, the "object" case of a case frame could be an object that the character uses, or another character that the character's motion causes an effect on. Moreover, the case frame does not contain information about postures and contact positions, which are important for selecting motions.

4.2 Motion Frame

We propose a motion frame, which contains the information about a motion clip. The motion frame is inspired by the case frame. However, we define the items of the motion frame based on their importance when we search for a motion according to input texts.

There are many kinds of verbs in general English. However, our system handles only action verbs that involve a physical motion, in other words, verbs that can be visualized as an animation. Other kinds of verbs such as non-action verbs (e.g., "think", "believe") or state verbs (e.g., "know", "exist") are ignored in our system, because they are difficult to represent by a motion clip. Action verbs are categorized into intransitive, transitive, and ditransitive verbs.

Fig. 4 Example motion frame of "taking-an-object":
– Names of Motion: take, pick up, get
– Agent: human
– Target: appropriate size and weight ranges
– Instrument: NULL
– Contact Position: hand position of contact
– Target Direction: NULL
– Initial Posture: standing
– Adverbs: slowly

Intransitive verbs involve no other object (e.g., "he runs"). Transitive verbs include one target object/character/position (e.g., "he opens the door", "he hits her", "he walks to the door"). Ditransitive verbs include two target objects (e.g., "he gives her the book", "he cuts the bread with a knife"). For ditransitive verbs, one of the two target objects should be the object that the character possesses. We call such objects "instruments". Therefore, action verbs have at most one "target" object/character/position and at most one "instrument" object. We use them as items of a motion frame instead of cases in a case frame. In addition, a contact position is used to select a motion that fits the environment and previous motions.

The items of the motion frame are as follows; an example of a motion frame is shown in Fig. 4, and a data-structure sketch follows the list below. Note that some items may not have any value depending on the motion.

– Agent M_agent_ref: The reference to the character in the database who performs the motion.

– Names of motion M_motion_strings: The set of verbs that represent the motion. When a verb in the input text matches one of the motion names, the motion frame will be a candidate for the verb. To handle ambiguity, a motion frame may have multiple names. For example, a "taking-an-object" motion may have "take" and "pick up" as its names.

– Instrument M_instrument_ref, M_instrument_params: The object that the character uses in the motion. This is either a reference to an object in the database M_instrument_ref or the size and weight ranges of an object M_instrument_params. If the motion requires a specific object such as "cutting with a knife", the object should be specified as a reference to the instrument. Otherwise, abstract conditions of an object are specified. For example, if the motion is "poking something with a long object", then appropriate size and weight ranges of the object are specified.

– Target M_target_ref, M_target_params: The reference to an object M_target_ref or the size and weight ranges M_target_params are specified in the same way as the instrument. If the target is a character, the reference to the character is specified in M_target_ref.

– Contact position M_contact_vertical, M_contact_horizontal: The position of the end-effector when it makes contact with the target. A contact position is specified when the motion involves contact with a target character or object. Vertical and horizontal positions are handled differently. Because the horizontal position can be adjusted by lateral movement (see Section 7.2), the vertical position is more important for motion selection. For example, if multiple "taking an object" motions are in the database and an input text "he takes the bottle on the ground" is given, then based on the position of the bottle, the appropriate taking motion (e.g., "taking an object with squatting") will be selected. The contact position is automatically computed from the contact information (see Section 4.3) of the motion data. The contact position is expressed in the local coordinates of the motion data.

– Target direction M_target_direction: The direction of the target. For some motions, even though the motion does not involve contact with the target, the target direction is important. For example, when a "waving to a person" or "shooting a target" motion is executed, the character should face the right direction. For some motions, both a contact position and a target direction are specified. For example, a "sitting down on a sofa" motion should make contact with a sofa from the front of the sofa.

– Initial posture M_initial_posture_flag: The character's posture when the motion begins. Currently, it is represented as one of three states: standing, sitting, or lying down. The initial posture is used to select a motion that matches the terminal posture of the previous motion. In cases where no such motion is in the database, an automatic changing-posture motion will be added (see Section 7.2).

– Adverbs M_adverb_strings: The set of adverbs that represent the style of the motion, such as "slowly" or "happily".
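To make the items above concrete, here is a minimal Python sketch of a motion frame as a data structure, mirroring the M_... items; the field names, types, and the example values are our own illustrative choices, not the paper's format.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class MotionFrame:
    """Meta-information attached to one motion clip (cf. Fig. 4)."""
    agent_ref: str                                            # reference to the performing character
    motion_strings: List[str]                                 # verbs naming the motion, e.g. ["take", "pick up", "get"]
    target_ref: Optional[str] = None                          # reference to a target object/character, if any
    target_params: Optional[Tuple[float, float]] = None       # simplified (size, weight) accepted for the target
    instrument_ref: Optional[str] = None                      # reference to a required instrument object, if any
    instrument_params: Optional[Tuple[float, float]] = None   # simplified (size, weight) accepted for the instrument
    contact_vertical: Optional[float] = None                  # end-effector height at contact (local coordinates)
    contact_horizontal: Optional[float] = None                # end-effector horizontal offset at contact
    target_direction: Optional[float] = None                  # facing direction toward the target, if relevant
    initial_posture_flag: str = "standing"                    # "standing", "sitting", or "lying down"
    adverb_strings: List[str] = field(default_factory=list)   # style adverbs, e.g. ["slowly"]


# The "taking-an-object" example of Fig. 4, expressed with this sketch
# (the numeric values are placeholders for the real ranges):
taking = MotionFrame(
    agent_ref="human",
    motion_strings=["take", "pick up", "get"],
    target_params=(1.0, 1.0),
    contact_vertical=0.9,
    initial_posture_flag="standing",
    adverb_strings=["slowly"],
)
```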

Each item of motion frames must be specified by a user. However, this is not such a difficult task for users. For each motion frame (each motion clip), the user is asked to specify the agent, verbs, target, and instrument. The agent is selected from the character database. For the target and instrument, it is either an appropriate object or agent that is selected from the database or the size and weight range of an object. When the motion involves a specific object (e.g., "cutting with a sword"), the object should be selected. Otherwise, object conditions are specified (e.g., "lifting up a light object using one hand").

The contact position is automatically computed from the motion and its contact information (see Section 4.3). The initial posture is also automatically computed from the motion clip. As a result, specifying the items of a motion frame is very easy.

4.3 Motion Data

Our system supposes that each motion is short and simple. A complex motion is difficult to represent by a motion frame. If a user wants to add a long motion to the database, the motion should be divided into pieces.

Some motions involve an interaction with an object or a character. This information is very important for generating animation and for selecting motions. Therefore, it is specified on the motion frame. The contact information consists of the contact type (hold, release, or hit), contact time (local time in the motion clip) and the end-effector (e.g., right hand). This information is also necessary for generating animation in the motion synthesis module (see Section 6.2).

Some motions that interact with another character cause the reaction of the other character (e.g., "Jack hits Neo. Neo falls"). Usually such cooperative motions are captured or created at the same time but are stored as separate motion clips. In our system, such cooperative motions are specified on the motion frame. If a motion has cooperative motions and no cooperative motion is indicated in the input text, the system automatically executes a cooperative motion (see Section 7.2). In addition, when two cooperative motions include physical contact, the timings and the initial positions of these motions are coordinated (see Section 7.1).
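A minimal sketch of the contact information and cooperative-motion links attached to a motion clip could look as follows; the types and field names are our own assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ContactInfo:
    """One contact event attached to a motion clip (Section 4.3)."""
    contact_type: str   # "hold", "release", or "hit"
    time: float         # contact time, as local time within the motion clip
    end_effector: str   # e.g., "right hand"


@dataclass
class MotionClipMeta:
    """A motion clip together with the information used by motion synthesis and scheduling."""
    clip_id: str                                                   # reference to the motion data
    contacts: List[ContactInfo] = field(default_factory=list)      # contact information of the clip
    cooperative_motions: List[str] = field(default_factory=list)   # clips of the reacting character
```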

4.4 Character and Motion Database

We use an object-oriented framework for the character and motion database. As shown in Fig. 5, each character is considered to be an object that has various motions as its methods. A character inherits from a base character. A motion of the base character can be overridden by another motion. The motions that are not overridden are used as the motions for the derived character. In this way, the hierarchy of characters and their motions is efficiently managed. A character can inherit from multiple base characters. All motions that the base characters have are used for the derived character. Since the motion that most closely matches an input sentence is selected from the available motions, even if there are multiple motions with the same name, there is no problem with conflicts caused by the multiple inheritance.

Fig. 5 Example of a hierarchical database of characters (a Human base character with Walk, Take, and Hit motions; derived characters such as Male, Female, Jack, Trained Person, and Neo override some of them with specialized motions).

If a user wishes to create a new character, s/he simply adds the new character that inherits from a base character or multiple base characters to the database and adds character-specific motions to that character. Even if there are not many new motions for the new character, the motions of the base characters are used. In this way, users can add new characters very easily.

The database can be implemented in various ways. If the characters and motions are implemented using an object-oriented programming language (e.g., C++ or Java), we would represent motions as objects rather than methods and implement a mechanism of motion inheritance on the character class, because it is practically difficult to handle motions as methods in such programming languages.
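A minimal Python sketch along these lines, with motions stored as objects and inheritance resolved by walking up the base characters (clip names and the hierarchy fragment are illustrative):

```python
from typing import Dict, List


class Character:
    """A database entry: a character with its own motion clips plus inherited ones."""

    def __init__(self, name: str, bases: List["Character"] = ()):
        self.name = name
        self.bases = list(bases)                  # multiple inheritance is allowed
        self.motions: Dict[str, List[str]] = {}   # verb -> motion clips owned by this character

    def add_motion(self, verb: str, clip: str) -> None:
        self.motions.setdefault(verb, []).append(clip)

    def candidate_motions(self, verb: str) -> List[str]:
        """Collect candidate clips for a verb from this character and all base characters.

        Conflicts from multiple inheritance are harmless here: the single best
        candidate is chosen later by the evaluation function, not by lookup order.
        """
        found = list(self.motions.get(verb, []))
        for base in self.bases:
            found.extend(base.candidate_motions(verb))
        return found


# A fragment of the Fig. 5 hierarchy (illustrative clip names):
human = Character("Human")
human.add_motion("take", "human_take")
male = Character("Male", bases=[human])
male.add_motion("take", "male_take_specialized")
jack = Character("Jack", bases=[male])
print(jack.candidate_motions("take"))   # ['male_take_specialized', 'human_take']
```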

5 Natural Language Analysis

Although natural language processing techniques have advanced in recent years, it is still a challenge to understand general texts, because this requires not only language processing but also a large amount of knowledge about the world. However, our system is supposed to take script-like text, and only motion-related descriptions in the text matter. This makes the natural language analysis much easier than in general natural language processing systems such as machine translation or summarization systems. Moreover, because scene information, such as characters and objects, is given in advance, we do not need the large dictionary required by general natural language processing systems.

As explained in Section 3, the semantic analysis takes an analyzed tree and generates query frames and temporal constraints. A query frame contains information about a verb for the motion search. The temporal constraints contain information about the execution timing between verbs.

In the following subsections, we explain how the semantic analysis works.

5.1 Query Frame

To select a motion that matches an input text, we use a query frame, which has the same items as the motion frame, and whose items are determined by analyzing the syntax tree of the input text (see Fig. 3(b)). Scene information is also used to determine some items.

As explained in Section 4.2, unlike generic semantic analysis, motion searches only need a target and an instrument for each verb. Therefore, we determine these by applying the following rules to each verb in the input text.

– A verb is used as the name of motion of the query frame Q_motion_strings. If the verb is followed by a preposition or noun, then all sets of the verb and the following word are also set to Q_motion_strings, because this could represent an idiom. Therefore, Q_motion_strings can contain multiple phrases. For example, in "Jack falls back", both the phrases "fall" and "fall back" are set to Q_motion_strings.

– If a noun represents a character in the scene and the verb is dependent on the noun, the character is considered as the agent (subject) of the query frame Q_agent_ref.

– If two nouns are dependent on the subject that the verb is related to, they are considered as the target Q_target_ref and the instrument Q_instrument_ref. (E.g., in "Jack gives Neo the book", "Neo" is the target and "the book" is the instrument.)

– If only one noun is dependent on the subject, it is considered as the target Q_target_ref.

– If a preposition phrase (e.g., "to Neo") is dependent on the subject, it is considered as the target Q_target_ref or the instrument Q_instrument_ref depending on the preposition. If the preposition is "with" and the noun in the phrase represents an object, the object is used as the instrument. Otherwise, the noun is used as the target.

– If the character is holding an object, the object is also used as Q_instrument_ref, even if it is not specified in the input text.

– If a phrase considered to be an adverb in the syntax analysis is dependent on the subject, the phrase is used as one of the adverbs Q_adverb_strings, which can contain multiple phrases.

After the names of the target and instrument are determined, we obtain the reference or value of each item from the scene information. We suppose that the characters or objects in input texts always exist in the scene.

Therefore, unlike general semantic analysis, by looking up the scene information all nouns in input texts are determined.

The target character or object that is indicated in the input text is searched for in the scene information, and the reference and position are set to the query frame. When the target is a character and a body part is indicated in the text, such as "She hit him in the head", the reference and position of the body part are set. When the target is an object in the scene, the target size and weight are set in the query frame. The contact position Q_contact_vertical, Q_contact_horizontal and the target direction Q_target_direction are set based on the position and direction, respectively, of the target character or object. If an adjective is used in the input text (e.g., "top of the table", "under the table") and the object has the corresponding part (see Section 3), the position of the corresponding part assigned to the object is used as the contact position. If there is no adjective, the default position specified for the object is used. The instrument object that is indicated in the input text is also set to the query frame.
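For illustration, a query frame can be sketched with the same items as the motion frame; the example below fills one in by hand for the third sentence of Fig. 3 ("Jack hits Neo with it."), assuming the pronoun "it" has already been resolved to the red bottle. Field names are illustrative, not the paper's notation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class QueryFrame:
    """What the semantic analysis extracts for one verb, used to query the motion database."""
    agent_ref: Optional[str] = None
    motion_strings: List[str] = field(default_factory=list)   # the verb, plus verb+following word for idioms
    target_ref: Optional[str] = None
    instrument_ref: Optional[str] = None
    contact_vertical: Optional[float] = None     # from the target's position in the scene
    contact_horizontal: Optional[float] = None
    target_direction: Optional[float] = None
    initial_posture_flag: Optional[str] = None   # terminal posture of the previous motion
    adverb_strings: List[str] = field(default_factory=list)


# QF3 of Fig. 3(c), "Jack hits Neo with it.", with "it" resolved to the red bottle:
qf3 = QueryFrame(
    agent_ref="Jack",
    motion_strings=["hit"],
    target_ref="Neo",
    instrument_ref="red bottle",
)
```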

5.2 Temporal Constraints

Temporal constraints are extracted from input texts. The types of temporal constraint are serial execution and synchronized execution between two verbs. A serial execution constraint has the execution order of two motions. A synchronized execution constraint has a relative execution timing. Temporal constraints are generated from a syntax tree as follows (a sketch of the resulting constraint representation follows the list):

1. For all pairs of sequential verbs in the input text, serial execution constraints are assigned. For example, when the input text "Jack walks in the room. Neo stands up." is given to the system, a serial execution constraint (Jack, walk) to (Neo, stands up) is generated.

2. When a word that indicates a reverse order exists in the input text (e.g., "after"), the order of the serial execution constraint is reversed. If a serial execution constraint is already created, the old constraint is overridden. For example, when the input text "Jack walks in the room after Neo stands up." is given to the system, a serial execution constraint (Neo, stands up) to (Jack, walk) is generated.

3. When a word that indicates synchronization exists in the input text (e.g., "at the same time" or "while"), a synchronized execution constraint is added. If there is a conflicting constraint, it is overridden. For example, when the input text "Jack walks in the room. At the same time, Neo stands up." is given to the system, a synchronized execution constraint (Neo, stands up) and (Jack, walk) is generated. The relative timings between the two motions are set to zero so that they start at the same time.

4. When the motions of two characters are cooperative motions and they include contact with each other, a synchronized execution constraint is added and the relative execution timings of the two motions are determined based on their contact information (Section 4.3). For example, when the input text "Jack hits Neo. Neo falls" is given to the system, a synchronized execution constraint (Jack, hit) and (Neo, fall) is generated. At this point, the relative timings are not set. They will be set based on the contact times in the searched motion data, when the motions are searched later.
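The following minimal sketch shows one way the two constraint types could be represented and how the example sentence of Fig. 3 would be encoded; the types and field names are our own assumptions, not the paper's.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SerialConstraint:
    """Motion `first` must finish before motion `second` starts."""
    first: int    # index of the earlier query frame
    second: int   # index of the later query frame


@dataclass
class SynchronizedConstraint:
    """Two motions start with a given relative timing (0.0 means at the same time).

    For cooperative motions the offset is left unset here and filled in later
    from the contact times of the motion clips that are eventually selected.
    """
    a: int
    b: int
    offset: Optional[float] = 0.0


# "Neo waves to Jack. At the same time, Jack takes the red bottle. Jack hits Neo with it."
# Rule 1 first links the sequential verbs serially; rule 3 then overrides the first link
# with a synchronized constraint because of "at the same time".
constraints = [
    SynchronizedConstraint(a=0, b=1, offset=0.0),   # TC1: (Neo, wave) and (Jack, take)
    SerialConstraint(first=1, second=2),            # TC2: (Jack, take) then (Jack, hit)
]
```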

5.3 Adjectives and Pronouns

Sometimes a character or an object is referred to by a combination of adjectives and a noun instead of its name. In this case, the system has to determine to which character or object the expression refers. Moreover, when a pronoun (e.g., "he", "she", "it", "they") is used in the input text, the system has to determine to which character or object the pronoun refers. This process can be difficult, especially when ambiguous or euphemistic expressions are used. However, since our system is meant to take simple script-like texts, we handle these problems using the following method.

Each character and object in our database has a list of adjectives, nouns, and pronouns by which the character can be referred to. For example, a male soldier character would have "he", "they", "man", "guy", "soldier", etc. as its list of adjectives and pronouns. The list is inherited from the base character. If the adjective, noun, or pronoun appearing in the input text represents a single character or object (e.g., "he", "the soldier", "it"), the system searches for a character or an object that matches the words.

However, if the words are ambiguous, meaning that there are multiple characters or objects matching the given adjective, noun, or pronoun in the scene, the system has to choose an appropriate character or object. Basically, if the noun or pronoun represents a character or an object, this should be mentioned in the previous sentences. A character can be the agent Q_agent_ref or the target Q_target_ref of the query frame, while an object (e.g., "it") can be the target Q_target_ref or the instrument Q_instrument_ref thereof. By using this constraint, the noun or pronoun is determined as follows.

1. If one of the two items is clearly mentioned in the input text, the other should be a different character or object. For example, if the input text is "Neo comes to Jack. Jack gives him a book", the word "him" cannot represent Jack, since Jack is already the agent Q_agent_ref of the query frame of the second sentence. Therefore, in this case, the system searches for characters in the previous sentence and uses the other character, Neo, as the target Q_target_ref.

2. If both of the items for characters are pronouns, the agent of the previous sentence is also used as the agent Q_agent_ref. For example, if the input text is "Neo comes to Jack. He gives him a book", the system decides that the first "he" in the second sentence represents Neo. In the same way, if both the items for objects are pronouns, although this is not common, the target of the previous sentence is used as the current target and the instrument of the previous sentence is used as the current instrument.

If the noun or pronoun represents multiple characters (e.g., "they", "soldiers"), the sentence should be represented by multiple motions. Therefore, in such a case, multiple query frames are generated. For example, if the input text is "Neo hits the soldiers" and there are two soldiers A and B in the scene, two query frames are generated with all items except the target character the same. The same rule is applied when an item of the query frame represents multiple characters (e.g., "Neo and Jack walk."). However, if a motion frame matching the motion name has a target direction, but not a contact position, the center position of all characters referred to is used as the target direction, instead of generating multiple query frames. For example, with the input "Neo shoots the soldiers", the "shooting" motion frame has only a target direction, and a query frame whose target position is the center of the soldiers is generated.
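A minimal sketch of the ambiguous-reference rule 1 above: exclude characters already bound in the current query frame and fall back to characters mentioned in the previous sentence. The function and names are illustrative assumptions.

```python
from typing import Dict, Iterable, List, Optional, Set


def resolve_reference(word: str,
                      referable: Dict[str, List[str]],
                      previous_sentence: Iterable[str],
                      already_bound: Set[str]) -> Optional[str]:
    """Resolve a singular noun/pronoun such as "him" or "the soldier" to one character.

    `referable` maps each character in the scene to the adjectives, nouns, and
    pronouns it can be referred to by (inherited from its base character).
    """
    candidates = [name for name, words in referable.items()
                  if word in words and name not in already_bound]
    if len(candidates) == 1:
        return candidates[0]
    # Ambiguous: prefer a character mentioned in the previous sentence that is not
    # already bound as agent/target/instrument of the current query frame.
    for name in previous_sentence:
        if name in candidates:
            return name
    return None


# "Neo comes to Jack. Jack gives him a book."
# "him" cannot be Jack (Jack is already the agent of the second sentence),
# so the other character from the previous sentence, Neo, becomes the target.
referable = {"Neo": ["he", "him", "neo"], "Jack": ["he", "him", "jack"]}
print(resolve_reference("him", referable, previous_sentence=["Neo", "Jack"],
                        already_bound={"Jack"}))   # -> Neo
```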

5.4 Infinitives and Gerunds

Infinitives and gerunds are often used with a verb. In this case, the system generates appropriate query frames and temporal constraints depending on the verb.

– If the verb is "do", "perform", etc. (e.g., "Neo performs dancing"), the infinitive or the gerund is represented as the motion and is used to generate a corresponding query frame. In this case, the verb is not involved in the query frame.

– If the verb is "start", "try", etc., the infinitive or the gerund is used to generate a corresponding query frame in the same way. However, in this case, the next event is considered to happen before the motion finishes. Therefore, a temporal constraint is generated to execute the next motion just after this motion starts.

– If the verb is "repeat", "keep", etc., the infinitive or gerund is used to generate a corresponding query frame. The motion is to be repeated until the next motion starts, and this is therefore specified in the query frame. This information is used in motion scheduling to duplicate a motion when it can be executed more than once.

– If the verb does not fall into any of the above cases, both the verb and the infinitive or gerund are represented as motions (e.g., "Neo walks waving to Jack"). In this case, multiple query frames are generated. In addition, temporal constraints are generated to execute all motions at the same time.

The system uses a dictionary of pairs consisting of the verb and the corresponding method, to determine which method should be applied.

In addition, if a gerund exists on its own in the input text (e.g., "Neo walks to the door, waving to Jack"), a query frame is generated for the gerund and a temporal constraint is generated to execute the gerund and the verb in the sentence at the same time. In this case, the agent of the verb becomes the agent of the gerund as well.

5.5 Locomotive Motions

Locomotive motions such as walking and running require special care, because the target position and path vary depending on the situation, and appropriate motions should be generated instead of simply executing a motion in the database. How to generate locomotive motions is explained in Section 6.3. In this section, we explain how to handle locomotive motions in the natural language analysis.

As discussed in Section 8, natural language is not suited to specifying the locomotion path. Therefore, our system currently does not handle it and only determines the target position of locomotive motions. We categorize locomotive motions into the following types depending on how the target position is handled.

– Moving to a target position. If the verb is a locomotive motion (e.g., "walk", "run", "go", etc.) and a target position is explicitly specified in the input text (e.g., "Neo walks to the door."), the query frame includes the target position Q_contact_vertical, Q_contact_horizontal and a flag indicating that this is a locomotive motion.

– Entering and leaving. Sometimes the target position is not specified in the input text. In this case, if the verb is a specific verb, the locomotive motion is handled as an entering or leaving motion. For example, if the verb is "leave", "walk away", "disappear", etc., the verb is handled as a leaving motion and the leaving position specified in the scene information is used as the target position. The query frame also includes a flag indicating a locomotive motion.

– Simple walking. If a target position is not specified and the verb represents a locomotive motion excluding an entering or leaving motion (e.g., "Neo walks."), a walking motion is simply executed. In this case, the query frame does not include a flag indicating a locomotive motion. This query frame is handled in the same way as the other query frames. As a result, a motion of walking forward from the character's current position is selected and executed.

5.6 Adverbs

Adverbs are handled in different ways depending on the word. As explained in Section 5.2, if the adverb represents temporal information, an appropriate temporal constraint is generated. If the adverb represents the frequency or timing of executing a verb, the adverb is handled in the same way as infinitives and gerunds in Section 5.4. For example, if an adverb such as "repeatedly", "twice", etc. is specified, the third option in Section 5.4 is applied. The system has a dictionary of adverbs for these cases. If the adverb is not found in the dictionary, it is assigned to a query frame to search for an appropriate motion, as explained in Section 5.1.

6 Motion Search

In this section, we explain how to search for an appropriate motion for each verb in the input text. Handling multiple verbs and motions is dealt with in the next section. A query frame is generated for each verb as explained in the previous section. Based on the query frame, a motion is selected from the database.

6.1 Evaluation of Motion Frame

A motion frame that best matches the query frame is searched for in the database. This search is performed in three steps.

In the first step, all candidate motion frames in which the motion name and agent match the query frame are selected from the database. All motion frames with the agent character or its base characters are potential candidates.

In the second step, the motion frames whose items do not match the query frame are excluded as candidates. If the query frame has a target Q_target_ref or Q_target_params and/or an instrument Q_instrument_ref or Q_instrument_params but the motion frame does not, then it is excluded. Moreover, if a motion frame has target parameters, instrument parameters, or the vertical contact position, and the values of the query frame exceed the specified ranges, then that motion frame is also excluded.

In the third step, all candidate motion frames are evaluated based on the similarity between the motion frame and the query frame items using the following equation:

\begin{aligned}
E ={}& w_0\, R(M_{\text{target\_params}}, Q_{\text{target\_params}})
     + w_1\, R(M_{\text{instrument\_params}}, Q_{\text{instrument\_params}}) \\
   & + w_2\, D(M_{\text{contact\_vertical}}, Q_{\text{contact\_vertical}})
     + w_3\, D(M_{\text{contact\_horizontal}}, Q_{\text{contact\_horizontal}}) \\
   & + w_4\, D(M_{\text{target\_direction}}, Q_{\text{target\_direction}})
     + w_5\, F(M_{\text{initial\_posture\_flag}}, Q_{\text{initial\_posture\_flag}}) \\
   & + w_6\, A(M_{\text{adverb\_strings}}, Q_{\text{adverb\_strings}})
     + w_7\, H(M_{\text{agent\_ref}}, Q_{\text{agent\_ref}})
\end{aligned}
\qquad (1)

where R(M,Q), D(M,Q), F(M,Q), A(M,Q), and H(M,Q) are the functions that compute the normalized distance (0.0∼1.0) between size and weight parameters, contact positions, posture flags, adverbs, and hierarchical positions, respectively. The distances between the size and weight ranges of the motion frame and the object size and weight of the query frame are computed so that the distance becomes zero when the values are at the center of the range and becomes one when the values are at the edge of the range. The distance between posture flags is computed in such a way that the distance is zero when they match and otherwise the distance is one. The distance between adverbs is computed so that the distance is zero when there is at least one pair of matching adverbs between the motion frame and query frame and otherwise the distance is one. The distance between hierarchical positions of the characters is computed from the number of inheritances between them (see Fig. 5). The candidate motion frame whose evaluation is the smallest will be selected and used for animation. w_0 ∼ w_7 are weight parameters. They can be set for each motion frame in the case that some items are important for the motion. In our current experiments we used 1.0 for all weights on all motions.
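A Python sketch of Eq. (1) follows, with the distance functions implemented along the lines described above. The attribute names, the normalization scale, and the simplified hierarchy distance are our own assumptions; the query frame is assumed to also carry the target/instrument size values taken from the scene information.

```python
def evaluate(motion, query, w=(1.0,) * 8) -> float:
    """Evaluation E of Eq. (1); the candidate motion frame with the smallest value is selected."""

    def range_distance(rng, value):
        # 0 at the center of the accepted (min, max) range, 1 at its edge.
        if rng is None or value is None:
            return 0.0
        lo, hi = rng
        half = (hi - lo) / 2.0
        return 0.0 if half == 0 else min(abs(value - (lo + hi) / 2.0) / half, 1.0)

    def position_distance(m, q, scale=1.0):
        # Normalized distance between contact positions or directions.
        if m is None or q is None:
            return 0.0
        return min(abs(m - q) / scale, 1.0)

    def flag_distance(m, q):
        # 0 if the posture flags match, 1 otherwise.
        return 0.0 if m == q else 1.0

    def adverb_distance(m_adverbs, q_adverbs):
        # 0 if at least one adverb matches (or no adverb was requested), 1 otherwise.
        if not q_adverbs:
            return 0.0
        return 0.0 if set(m_adverbs) & set(q_adverbs) else 1.0

    def hierarchy_distance(m_agent, q_agent):
        # In the paper this counts inheritance steps between the characters;
        # simplified here to 0 (same character) or 1 (different character).
        return 0.0 if m_agent == q_agent else 1.0

    return (w[0] * range_distance(motion.target_params, query.target_params)
            + w[1] * range_distance(motion.instrument_params, query.instrument_params)
            + w[2] * position_distance(motion.contact_vertical, query.contact_vertical)
            + w[3] * position_distance(motion.contact_horizontal, query.contact_horizontal)
            + w[4] * position_distance(motion.target_direction, query.target_direction)
            + w[5] * flag_distance(motion.initial_posture_flag, query.initial_posture_flag)
            + w[6] * adverb_distance(motion.adverb_strings, query.adverb_strings)
            + w[7] * hierarchy_distance(motion.agent_ref, query.agent_ref))
```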

6.2 Motion Modification

The motion clip of the selected motion frame is used for animation. However, even if the closest motion frame is selected, the contact position may not exactly match the query frame. In that case, the motion clip is modified using inverse kinematics. The posture of the character during the motion is modified so that the contact position of the end-effector (e.g., hand) matches the target position in the query frame.

When the character is far from the target, changing the end-effector position is not enough. In addition, when the character executes the selected motion, it may need to first take an instrumental object or change its posture (e.g., standing up). These cases are handled by adding automatic motions before the selected motion instead of modifying the selected motion. Automatic motions are explained in Section 7.2.

6.3 Locomotive Motion

When a query frame indicates a locomotive motion (see Section 5.5), an appropriate motion enabling the character to move to the target position must be generated. Several methods, such as [16], have been developed to generate walking motions. In our system, the character should not only walk, but also turn and step, in order to move to the appropriate position and direction. Therefore, we generate locomotive motions based on the target position and/or target direction according to the steps below, using the set of motion data that the character has.

1. If the target direction is specified and the target position is not, an appropriate turning motion is generated. An appropriate motion based on the target direction is selected from the motions with 'turn' as their motion name in the database. If only the target direction is specified, the process stops here.

2. If the target position is specified and it is not in front of the character, a turning motion is added in the same way as in the first step so that the agent faces the target position.

3. If the target position is within one step, a stepping motion is added in the same way as the turning motion. The motion is selected from all 'step' motions.

4. If the target position is more than one step away, a walking motion is added. The walking motion is repeated until the agent reaches the target position. The step length in each walking cycle is adjusted so that the walking cycle ends at the target position. The motion is selected from 'walk' motions. Currently, our system generates a straight path to the target position even if there are obstacles.

5. If the target direction is specified and it does not match the character's direction at the end point of the walking motion, a turning motion is once again added.

As explained above, the system uses the "turn", "step" and "walk" motions that the character has. Currently the system selects an appropriate motion and modifies it if necessary. Alternatively, motion blending can be used to generate more appropriate motions by using multiple motions [16][18].
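The five steps above can be sketched in Python as follows; the geometry, angle thresholds, and the default step length are simplifying assumptions, and the selection of a specific clip from the 'turn'/'step'/'walk' sets is elided.

```python
import math
from typing import List, Optional, Tuple


def plan_locomotion(position: Tuple[float, float], facing: float,
                    target_position: Optional[Tuple[float, float]] = None,
                    target_direction: Optional[float] = None,
                    step_length: float = 0.7) -> List[str]:
    """Return the sequence of locomotive motion names for one query frame."""
    plan: List[str] = []

    # Step 1: only a target direction is given -> a single turning motion.
    if target_position is None:
        if target_direction is not None and abs(target_direction - facing) > 1e-3:
            plan.append("turn")
        return plan

    dx, dy = target_position[0] - position[0], target_position[1] - position[1]
    distance = math.hypot(dx, dy)
    heading = math.atan2(dy, dx)

    # Step 2: turn toward the target position if it is not roughly in front of the character.
    if abs(heading - facing) > math.radians(30):
        plan.append("turn")

    # Steps 3 and 4: one step if the target is within a single step, otherwise repeat
    # walking cycles (the last cycle's step length is adjusted to end at the target).
    if distance <= step_length:
        plan.append("step")
    else:
        plan.extend(["walk"] * math.ceil(distance / step_length))

    # Step 5: turn again if a target direction is given and not yet matched.
    if target_direction is not None and abs(target_direction - heading) > math.radians(10):
        plan.append("turn")

    return plan


# "Neo walks to the door.": a turn (if needed) followed by repeated walking cycles.
print(plan_locomotion((0.0, 0.0), facing=0.0, target_position=(3.0, 2.0)))
```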

7 Motion Scheduling

In this section, we explain how our system handles multiple motions from an input text. Basically, the system searches for a motion for each verb in the input text. However, in order to make an animation, the execution timing of each motion must also be determined. Moreover, the continuity of motions should be considered. For example, when a character makes contact with an object in the scene, the character must first move close to the object. Our system takes care of this kind of continuity of motions.

When multiple characters perform multiple motions, the motions should be scheduled. However, an exact execution time for each motion is not usually specified in the input text. In order to determine the motion schedule, we need information about the motions such as duration and contact information.

Our motion scheduling works as follows. First, temporal constraints are extracted from input texts in addition to query frames (Section 5). Second, query frames are roughly scheduled based on the temporal constraints (Section 7.1). Note that at this point, only the process order of the query frames is determined. Finally, by searching for a motion frame that matches each query frame in process order, the execution timing of each motion is determined. When automatic motions are required to be executed before a motion, they are added incrementally (Section 7.2). By repeating this process for all query frames, the motion clips and their execution timings are determined.

7.1 Scheduling Query Frames

Based on the temporal constraints, the query frames are scheduled roughly at first. After that, the process order of all query frames (verbs) is determined. For motions that have a synchronized execution constraint, their process orders are temporarily set as one of them being processed first.

The exact timings of all query frames are determined in the process order.

For each query frame, a motion clip is searched from the database as explained in Section 6.1. Before searching for each motion, the scene condition is set to the time when the motion is executed, because the selected motion may change depending on the position of the character or object that the motion involves. The execution timing of the motion is determined based on the duration of the selected motion. The next motion is started just after the previous motion is finished if they have a serial execution constraint. If they have a synchronized execution constraint, their execution timings are determined based on the contact timings of the selected motions.

This process is repeated from the first motion to the last. When multiple query frames are synchronized based on the temporal constraints, the motions for all query frames are searched and their execution timings are delayed until all constraints are satisfied.
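As a minimal illustration of the synchronized case, the sketch below aligns two selected clips on their contact timings by delaying whichever clip reaches its contact event first. The function and its arguments are assumptions for illustration only.

```python
def align_synchronized(start_a, contact_a, start_b, contact_b):
    """Delay whichever clip reaches its contact event first so that the two
    contact timings coincide; returns the adjusted start times (a, b).
    Contact offsets are relative to each clip's start; times are in seconds."""
    t_a = start_a + contact_a          # absolute contact time of clip A
    t_b = start_b + contact_b          # absolute contact time of clip B
    if t_a < t_b:
        return start_a + (t_b - t_a), start_b
    return start_a, start_b + (t_a - t_b)

# e.g. a punch whose fist contact occurs 0.8 s into the clip and a reaction
# whose impact frame is 0.3 s in: the reaction clip is delayed by 0.5 s
print(align_synchronized(0.0, 0.8, 0.0, 0.3))
```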

7.2 Automatic Motions

During motion scheduling and motion search, a retrieved motion sometimes cannot be executed immediately. In that case, automatic motions are generated and added before the retrieved motion. As explained earlier, the purpose of our system is to reuse motion data without complex motion planning, which may require additional programming for each motion. Therefore, our system deals with a minimal set of automatic motions. The additional motions are also selected from the database, so each character can easily be customized by adding specific kinds of motion to the database without adding any rules or modules.

7.2.1 Locomotive motion

If a motion includes interaction with another character or an object in the scene (i.e., a query frame contains a target object or character), the character has to be in the right place to make contact with the object or character. If not, the system automatically adds locomotive motions for the character to move to the right place and to face the right direction.

If the motion frame has a contact position and target direction (e.g., a “sitting on a chair” motion should be executed in the right position and direction relative to the chair), an appropriate locomotive motion is generated so that the character approaches the right point and turns in the right direction. The method for generating locomotive motions explained in Section 6.3 is used. If the motion frame has only a contact position, an appropriate locomotive motion is generated so that the character approaches the right point. If the motion frame has only a direction (e.g., a “shooting toward the target” motion), the character merely turns without walking. As explained in Section 6.3, our current system has no path planning; the character simply moves in a straight line to the target position.
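A hedged sketch of this decision is given below. The tolerance constants and the returned labels are illustrative assumptions; the actual locomotion is generated as described in Section 6.3.

```python
import math

POSITION_TOLERANCE = 0.15                 # assumed tolerance in meters
DIRECTION_TOLERANCE = math.radians(15)    # assumed tolerance in radians

def required_locomotion(char_pos, char_dir, contact_pos=None, target_dir=None):
    """Decide which automatic locomotion, if any, must precede the motion."""
    if contact_pos is not None:
        dist = math.hypot(contact_pos[0] - char_pos[0],
                          contact_pos[1] - char_pos[1])
        if dist > POSITION_TOLERANCE:
            return "walk_and_turn"        # generated as described in Section 6.3
    if target_dir is not None and abs(target_dir - char_dir) > DIRECTION_TOLERANCE:
        return "turn_only"                # e.g. a "shooting toward the target" motion
    return None                           # the character is already in place

print(required_locomotion((0.0, 0.0), 0.0, contact_pos=(1.5, 0.0)))
```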

7.2.2 Taking an instrument

When a character uses an instrument in a motion (i.e., a query frame contains an instrument and the character does not hold it), the character must pick up the instrument object before using it. When a motion to take the instrument is not explicit in the input text, a ‘take’ motion is selected from the database. When the character is away from the instrument, locomotive motions are also added before the taking motion.
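This check could be expressed as follows; the reach distance and the returned motion labels are illustrative assumptions, not part of the described system.

```python
import math

REACH_DISTANCE = 0.5   # assumed arm's reach in meters

def prerequisite_motions(instrument, held_objects, char_pos, instrument_pos):
    """Return the automatic motions to prepend when the query frame names an
    instrument that the character does not yet hold."""
    motions = []
    if instrument is not None and instrument not in held_objects:
        dist = math.hypot(instrument_pos[0] - char_pos[0],
                          instrument_pos[1] - char_pos[1])
        if dist > REACH_DISTANCE:
            motions.append("locomotion")   # move within reach of the instrument
        motions.append("take")             # a 'take' clip from the database
    return motions

print(prerequisite_motions("gun", set(), (0.0, 0.0), (2.0, 0.0)))
```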

7.2.3 Changing posture

During a motion search, if there is no candidate motion whose initial posture matches the terminal posture of the previous motion (i.e., the initial posture of the query frame does not match any of the candidate motion frames), a posture-changing motion such as standing up is added. In this case, all motions that include a state change become candidate motions.
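A possible formulation of this fallback is sketched below; the dictionary fields and the simple “pick the first state-changing clip” rule are assumptions for illustration.

```python
def resolve_posture(current_state, candidates, motion_database):
    """Return (clips_to_prepend, usable_candidates) for a query frame.
    candidates are the motion frames found for the verb; motion_database is
    the full set of the character's motion frames."""
    usable = [m for m in candidates if m["initial_state"] == current_state]
    if usable:
        return [], usable
    # no candidate starts from the current posture: prepend a state-changing
    # clip (e.g. 'stand up') and keep the original candidates for the verb
    changers = [m for m in motion_database
                if m["initial_state"] == current_state
                and m["final_state"] != current_state]
    return changers[:1], candidates

db = [{"name": "stand up", "initial_state": "sitting",  "final_state": "standing"},
      {"name": "walk",     "initial_state": "standing", "final_state": "standing"}]
print(resolve_posture("sitting", [db[1]], db))
```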

7.2.4 Cooperative motion

As explained in Section 4.3, when a motion involves interaction with another character, a cooperative motion of the other character follows. When a selected motion frame has cooperative motions and any of them is not indicated in the input text, the default cooperative motion and the corresponding temporal constraint of the motion frame are automatically added.
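This rule might be expressed as follows; the dictionary layout of a motion frame and the constraint label are illustrative assumptions.

```python
def default_cooperative_motions(selected_frame, explicit_verbs):
    """Return query-frame-like entries for cooperative motions defined by the
    selected motion frame but not already mentioned in the input text."""
    added = []
    for coop in selected_frame.get("cooperative", []):
        if coop["verb"] not in explicit_verbs:
            added.append({"verb": coop["verb"],
                          "character": coop["character"],
                          "constraint": "synchronized"})
    return added

frame = {"name": "punch",
         "cooperative": [{"verb": "be hit", "character": "opponent"}]}
print(default_cooperative_motions(frame, explicit_verbs={"punch"}))
```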

8 Experiment and Discussion

We have implemented our method and motion database. Currently, the system has six characters, as shown in Fig. 5, and about 50 motions collected from a commercially available motion capture library. We have tested our system with several short sentences and found that an appropriate motion was selected for each sentence, even when the same verb was used in different sentences. An example of the generated animation is available from the author’s web site (http://www.cg.ces.kyutech.ac.jp/research/modb/index.html).


To evaluate our framework, we tested it with a published movie script (The Matrix, 1999). Because our motion database does not yet have enough data, we checked whether our methods could handle the descriptions in the movie script and output appropriate query frames. There were about 830 actions (verbs) in the script. We found that about 87% of these were processed by our system without any problems. However, 4% were complex expressions that are difficult to handle using simple rules, such as sentences whose subject is a character’s body part (e.g., “His elbow hits the enemy.”, “His body jumps.”), vague representations (e.g., “he stares into the darkness”), indirect expressions (e.g., “He has no answer.”), and ambiguous nouns or pronouns. Another 4% were verbs that cannot be represented by a motion, including the non-action or state verbs explained in Section 4.2, such as “He feels that ∼”, and verbs representing the result of a motion, such as “miss” in “he shoots her and misses”. The remaining 5% were verbs representing initial states in the scene rather than actions (e.g., “they are dead”, “he stands in the room”). As discussed later, a non-text-based interface is more suitable for specifying initial states or the positions of locomotion. According to the above results, 9% of the verbs in the sample script were verbs that cannot be represented as motions, which means that about 95% of the verbs that can be represented as motions were handled by our methods. Although it is possible to extend our semantic analysis to support more complex expressions, this would require a great deal of knowledge and rules, which is contrary to the aims of this research. If a complex expression cannot be handled by the system, the user should rephrase it as a plain expression rather than adding more knowledge and rules to the system.

However, even if an animation is generated from a given text, a user may not be satisfied with the motions found in the database, because the information in an input text is limited. Moreover, since we use simple methods for generating locomotive motion, motion modification, and motion synthesis, a user may not be satisfied with the synthesized animation. To evaluate the effectiveness of our system, we intend to conduct a practical user study in future work. Improving motion generation and the external motion synthesis system [20] is also future work.

Our current system cannot handle object motions. However, as they are also important for animation, our system could easily be extended to handle them, since they tend to be simpler than human motions.

The fundamental principle of our framework is to make use of motion data without requiring any additional motion-specific rules. Currently, our system does not support high-level motion planning such as automatically dividing a complex motion into smaller motions or path planning with obstacle avoidance. Because we use simple rules for automatic locomotion, the resulting animations are not very natural. This could be addressed by adding more motion data and sophisticated modules that generate new motions from multiple motion data sources, such as [17][18].

Our system assumes that scene information, such as the positions of objects and characters, is provided by the user. An existing text-to-scene system [10] could be integrated with our system. However, specifying positions using natural language can be harder than using a conventional mouse-based interface, as can specifying a locomotion path. From a practical viewpoint, a hybrid of a text-based interface and a conventional interface might be more useful.

With our current system, if the user is not satisfied with an output motion or wants to change it, they must change the input text; they cannot change the output motions directly. To address this, we plan to develop a natural language-based motion editing interface with which a user can change generated motions interactively by giving instructions to agents, as real directors do with actors.

9 Conclusion

We have proposed an animation system that generates animation from natural language texts such as movie scripts or stories. Our future work includes the expansion of both the system and the motion database. Currently, creating animations is very difficult, especially for non-professionals. We believe that our system will alleviate this difficulty and provide many creators with a means of expressing their stories as animation.

References

1. Masaki Oshita. “Generating animation from natural language texts and framework of motion database”, In Proc. of International Conference on Cyberworlds 2009, pp. 146-153, Bradford, UK, September 2009.

2. Terry Winograd. Understanding Natural Language. Academic Press, 1972.

3. N. Badler, R. Bindiganavale, J. Allbeck, W. Schuler, L. Zhao, and M. Palmer. “Parameterized action representation for virtual human agents”, In Embodied Conversational Agents, pp. 256-284, 2000.

4. R. Bindiganavale, W. Schuler, J. Allbeck, N. Badler, A. Joshi, and M. Palmer. “Dynamically altering agent behaviors using natural language instructions”, In Proc. of Autonomous Agents 2000, pp. 293-300, 2000.

5. Takenobu Tokunaga, Kotaro Funakoshi, and Hozumi Tanaka. “K2: animated agents that understand speech commands and perform actions”, In Proc. of 8th Pacific Rim International Conference on Artificial Intelligence 2004, pp. 635-643, 2004.


6. Charles J. Fillmore. “The case for case”, In Universals in Linguistic Theory, pp. 1-88, 1968.

7. Ruqian Lu and Songmao Zhang. Automatic Generation of Computer Animation: Using AI for Movie Animation, Springer, 2002.

8. Kaoru Sumi and Mizue Nagata. “Animated storytelling system via text”, In Proc. of International Conference on Advances in Computer Entertainment Technology, 2006.

9. Hiromi Baba, Tsukasa Noma, and Naoyuki Okada. “Visualization of temporal and spatial information in natural language descriptions”, IEICE Transactions on Information and Systems, E79-D(5), pp. 591-599, 1996.

10. Bob Coyne and Richard Sproat. “WordsEye: an automatic text-to-scene conversion system”, In Proc. of SIGGRAPH 2001, pp. 487-496, 2001.

11. Sergey Levine, Christian Theobalt, and Vladlen Koltun. “Real-time prosody-driven synthesis of body language”, ACM Transactions on Graphics (In Proc. of ACM SIGGRAPH Asia 2009), Vol. 28, Issue 5, 2009. (to appear)

12. Ken Perlin and Athomas Goldberg. “Improv: a system for scripting interactive actors in virtual worlds”, In Proc. of SIGGRAPH ’96, pp. 205-216, 1996.

13. Matthew J. Conway. Alice: Easy-to-Learn 3D Scripting for Novices, PhD Dissertation, University of Virginia, 1997.

14. Masaki Hayashi, Hirotada Ueda, Tsuneya Kurihara, and Michiaki Yasumura. “TVML (TV Program Making Language): automatic TV program generation from text-based script”, In Proc. of Imagina ’99, pp. 84-89, 1999.

15. Hyunju Shim and Bo Gyeong Kang. “CAMEO: camera, audio and motion with emotion orchestration for immersive cinematography”, In Proc. of International Conference on Advances in Computer Entertainment Technology (ACE) 2008, pp. 115-118, 2008.

16. Sang Il Park, Hyun Joon Shin, and Sung Yong Shin. “On-line locomotion generation based on motion blending”, In Proc. of ACM SIGGRAPH Symposium on Computer Animation 2002, pp. 105-111, 2002.

17. C. Rose, M. F. Cohen, and B. Bodenheimer. “Verbs and adverbs: multidimensional motion interpolation”, IEEE Computer Graphics and Applications, vol. 18, no. 5, pp. 32-40, 1998.

18. Lucas Kovar and Michael Gleicher. “Automated extraction and parameterization of motions in large data sets”, ACM Transactions on Graphics, vol. 23, no. 3, pp. 559-568, 2004.

19. Dan Klein and Christopher D. Manning. “Fast exact inference with a factored model for natural language parsing”, In Advances in Neural Information Processing Systems 15 (NIPS 2002), pp. 3-10, 2003.

20. Masaki Oshita. “Smart motion synthesis”, Computer Graphics Forum, vol. 27, no. 7, pp. 1909-1918, 2008.

