  • When Push Comes to Shove:

    A Computational Model of the Role of Motor Control

    in the Acquisition of Action Verbs

    by

    David Robert Bailey

    B.S. (Cornell University) 1990

    A dissertation submitted in partial satisfaction of the

    requirements for the degree of

    Doctor of Philosophy

    in

    Computer Science

    in the

    GRADUATE DIVISION

    of the

    UNIVERSITY of CALIFORNIA, BERKELEY

    Committee in charge:

    Professor Jerome A. Feldman, Chair

    Professor George Lakoff

    Professor Robert Wilensky

    Fall 1997

  • The dissertation of David Robert Bailey is approved:

    Chair Date

    Date

    Date

    University of California, Berkeley

    Fall 1997

  • When Push Comes to Shove:

    A Computational Model of the Role of Motor Control

    in the Acquisition of Action Verbs

    Copyright 1997

    by

    David Robert Bailey

  •

    Abstract

    When Push Comes to Shove:

    A Computational Model of the Role of Motor Control

    in the Acquisition of Action Verbs

    by

    David Robert Bailey

    Doctor of Philosophy in Computer Science

    University of California, Berkeley

    Professor Jerome A. Feldman, Chair

    Children learn a variety of verbs for hand actions starting in their second year of life. The semantic distinctions can be subtle, and they vary across languages, yet they are learned quickly. How is this possible? This dissertation explores the hypothesis that to explain the acquisition and use of action verbs, motor control must be taken into account. It presents a model of embodied semantics, based on the principles of neural computation in general and on the human motor system in particular, which takes a set of labelled actions and learns both to label novel actions and to obey verbal commands. A key feature of the model is the executing schema, an active controller mechanism which, by actually driving behavior, allows the model to carry out verbal commands. A hard-wired mechanism links the activity of executing schemas to a set of linguistically important features including hand posture, joint motions, force, aspect and goals. The feature set is relatively small and is fixed, helping to make learning tractable. Moreover, the use of traditional feature structures facilitates the use of model merging, a Bayesian probabilistic learning algorithm which rapidly learns plausible word meanings, automatically determines an appropriate number of senses for each verb, and can plausibly be mapped to a connectionist recruitment learning architecture. The learning algorithm is demonstrated on a handful of English verbs, and also proves capable of making some interesting distinctions found crosslinguistically.

    Professor Jerome A. Feldman, Dissertation Committee Chair

  •

    Contents

    List of Figures vii

    1 Overview 1

    2 Setting the Stage 8

    2.1 A Crosslinguistic Conundrum 8
    2.2 It's the Body, Stupid! 11
    2.3 The Task in Detail 14
    2.4 A Connectionist Commitment 16
    2.5 Related Efforts 17

    3 Executing Schemas for Controlling Actions 20

    3.1 Human Motor Control 21
    3.2 A Petri Net Model 23
      3.2.1 Synergies as building blocks 24
      3.2.2 The Petri net formalism 27
      3.2.3 Durative actions 30
      3.2.4 Accessing perceived world state 31
      3.2.5 The Slide x-schema in detail 33
      3.2.6 Other x-schemas 34
      3.2.7 Multiple entry points 38
      3.2.8 What can't be encoded in x-schemas? 38
    3.3 Connectionist Account 39
      3.3.1 Petri net control 39
      3.3.2 Passing parameters 41
    3.4 Related Ideas in Artificial Intelligence 44
    3.5 Thoughts on Hierarchical X-Schemas 45

    4 Linking Actions and Verbs via Features 48

    4.1 Cognitive and Linguistic Motivation 48
    4.2 The Linking Feature Structure 50
      4.2.1 The linking feature set 51
      4.2.2 Connecting to x-schemas 53
      4.2.3 Deriving the feature set 55
      4.2.4 How many features? 56
      4.2.5 Why a separate linking structure? 57
      4.2.6 Static vs. dynamic 58
    4.3 Connectionist Account 58

    5 Word Senses and their Use 61

    5.1 Polysemy: Why Do Languages Have It? 62
    5.2 Structure of a Word Sense 64
      5.2.1 On the use of probabilities 65
      5.2.2 An illustration of two senses of push 67
    5.3 Labelling and Obeying Algorithms 69
      5.3.1 Labelling algorithm 69
      5.3.2 Obeying algorithm 72
    5.4 Connectionist Account 75
      5.4.1 Triangle units 75
      5.4.2 Complex triangle units 77
      5.4.3 Network architecture 79
      5.4.4 Labelling and obeying 82
    5.5 Some Cognitive Linguistics Issues Considered 83
      5.5.1 Prototype effects 83
      5.5.2 Radial categories 84
      5.5.3 Basic-level effects 85
      5.5.4 Pragmatics 87
    5.6 Limitations of the Model 87
    5.7 Continuous Distributions 88

    6 Verb Learning 91

    6.1 Children's Verb Acquisition 91
    6.2 Learning Word Senses via Model Merging 93
      6.2.1 An illustration of merging 94
      6.2.2 A Bayesian criterion 95
      6.2.3 Model merging 99
      6.2.4 Algorithm details 100
      6.2.5 Computational complexity 105
      6.2.6 Updating the virtual sample priors 105
      6.2.7 Summary of algorithm parameters 107
    6.3 Alternatives to Model Merging 107
    6.4 Connectionist Account 110
      6.4.1 Recruitment learning 110
      6.4.2 Merging via recruitment 113
    6.5 Overgeneralization and Contrast Sets 117

  •

    7 Adding Verb Satellites 119

    7.1 The Nature of the Problem 119
    7.2 Slots: A Provisional Solution 120
      7.2.1 Labelling and obeying 122
      7.2.2 Learning 125
    7.3 Limitations of the Model 126
    7.4 Thoughts on Construction Grammar and Learning 128

    8 Learning Results 130

    8.1 Training Procedure 130
      8.1.1 Animating actions 132
    8.2 Results for English 134
      8.2.1 Training run 134
      8.2.2 Tour of the learned lexicon 137
      8.2.3 Test results 147
    8.3 Crosslinguistic Validation 154
      8.3.1 Farsi 154
      8.3.2 Russian 156
      8.3.3 Other crosslinguistic examples 163
    8.4 Sensitivity to Parameters 166
    8.5 Unlearnable Categories 168
    8.6 Shortcomings 169

    9 Final Thoughts 171

    9.1 Summary 171
    9.2 Contributions 172
      9.2.1 Computer science 172
      9.2.2 Cognitive modelling 173
    9.3 Some Objections Considered 175
    9.4 New Questions 176
      9.4.1 Classifiers 176
      9.4.2 Reversatives 177
      9.4.3 Speech acts 178
      9.4.4 Probabilistic linking f-structs 178
      9.4.5 X-schema learning 179
      9.4.6 Integrating x-schemas with image schemas 179
    9.5 X-Schemas for Abstract Thought 180
    9.6 The Real World 182

    A Guide to the VerbLearn Software System 184

    A.1 Data Files 184
    A.2 Running the System 187
      A.2.1 The main program 187
      A.2.2 The scenario generator program 190
    A.3 Code Overview 190

  •

      A.3.1 Java Code 191
      A.3.2 Jack Lisp Code 193

    Bibliography 195

  •

    List of Figures

    1.1 Top-level architecture of the verb learning model. 2

    3.1 A taxonomy of Jack grasp synergies. Courtesy of the Center for Human Modeling and Simulation at University of Pennsylvania and Transom Technologies, Inc. 26

    3.2 Some common Petri net constructs. (a) shows the simplest case of an enabled transition firing. (b)-(d) show constructs for sequentiality, branching and concurrency. 28

    3.3 Translating durative-action transitions into the standard Petri formalism with instantaneous transitions. 30

    3.4 The Slide x-schema. 33
    3.5 The Lift x-schema. 35
    3.6 The Rotate x-schema. 36
    3.7 The Depress x-schema. 36
    3.8 The Touch x-schema. 37
    3.9 An example connectionist implementation of Petri nets including one transition and several places. 40
    3.10 A SHRUTI implementation of a Push x-schema which is similar in function to the Slide x-schema of Figure 3.4. From Shastri et al. (1997). 42
    3.11 Two x-schemas organized hierarchically. 46

    4.1 A linking feature structure (top) and its connection to the Slide x-schema. 51
    4.2 The different roles played by the linking features in different x-schemas. 54
    4.3 The connectionist representation of a linking feature such as force, and its connection to motor control. 59

    5.1 Two senses of push with full specification of their probability distributions. 67
    5.2 The full model as originally depicted in Figure 1.1 but filled in with the Slide x-schema, several linking features and two verb representations. 68
    5.3 Formulas used to determine the best label v for a given linking f-structure l. 70
    5.4 Formulas used to determine the best motor parameters p for obeying a command v given initial world state w. 73
    5.5 (a) A simple triangle unit which binds A, B and C. (b) One possible neural realization. 76

  •

    5.6 Using a triangle unit to represent the value (palm) of a feature (posture) for an entity ("push"). 77

    5.7 (a) A complex triangle unit with multiple weighted connections per side. (b) One possible neural realization. 78

    5.8 A connectionist version of the model, using a collection of triangle units for each word sense. 80

    6.1 Learning two senses of push. 94
    6.2 Formulas used to determine the posterior probability of a lexicon model m. 96
    6.3 Summary of parameters of the labelling, obeying and learning algorithms and their settings for the English training run. 108
    6.4 Recruitment of triangle unit T3 to represent the binding E-F-G. 111
    6.5 Connectionist merging of two word senses via recruitment of a new triangle unit circuit. 115

    7.1 Word senses from each of two slots are chosen and then combined to process the command push left. 121

    8.1 A typical Jack stillshot from a hand action animation. 133
    8.2 Plot of the push model's prior, likelihood and posterior probabilities, as well as the recognition rate, as merging proceeds. 152
    8.3 Plot of the recognition rate for various learned lexicons with different total numbers of word senses. 153

    A.1 The VerbLearn software system, showing the Lexicon Inspection Window. 189

  •

    Acknowledgements

    As work on this thesis comes to a close, I am reminded of this quote from Terry Regier's dissertation, which really hits the nail on the head:

      I have flipped and flopped a number of times as regards my general outlook on the work reported here. At times I've been frustrated purple by my inability to make any substantive headway through the morass; at other times I was convinced that any idiot could do it. And in the worst of times, these two outlooks co-occurred. More recently, though, I have settled in to something approaching a very modest satisfaction.

    Through it all, I've gotten by with a little help from my friends. My greatest debt is owed to my advisor Jerry Feldman, with his remarkable combination of broad knowledge, uncanny intuition, thoughtful guidance, open-mindedness, and considerable patience. Srini Narayanan, fellow graduate student, has been an invaluable resource and a convenient target to bounce ideas off; I hope I've returned the favor adequately. And my appreciation also goes to Terry Regier, whose thesis framework provided the model for my own.

    The Neural Theory of Language group at ICSI provided a stimulating working group for the duration of my graduate student days, including the above folks as well as George Lakoff (whose linguist's perspective and infectious enthusiasm got me into this territory in the first place), Nancy Chang (from whose amazing editing skills I have greatly benefited), Jonathan Segal, Dean Grannes, Andreas Stolcke, Collin Baker, Lokendra Shastri, Ben Gomes and David Stoutamire. Eve Sweetser and Len Talmy helped refine my necessarily-amateur linguistic analyses.

    I also thank my other committee members, Robert Wilensky and Dan Slobin, for taking the time to critique this work from their different perspectives.

    Much intuition has come from crosslinguistic data cheerfully provided by Masha Brodsky, James Chan, Nikki Mirghafori, D. R. Mani, and Srini Narayanan. And a special thanks to Carol Bleyle who put much effort into gathering data from her ESL students. Jane Edwards kindly provided me with child data on a number of verbs.

    Graduate studentdom is not without its bureaucratic hurdles, and Kathryn Crabtree's talents in this regard, and good humor, have smoothed the bumps quite nicely. She is an underappreciated jewel in the Computer Science Division.

    Digging back a little further, I'd like to acknowledge two professors from my Cornell undergraduate days: David Gries taught me rigorous thinking, and Devika Subramanian kindled my interest in artificial intelligence.

    I thank the Center for Human Modeling and Simulation at University of Pennsylvania for making the Jack animation package available. Kudos, too, to the rag-tag fleet of hackers around the globe who created the Linux operating system and made it possible for me to work as productively at home as at the office.

    Encouragement, some ideas, and constant demonstrations that there is more to life than a thesis, were generously provided by my wife Lisa Leinbaugh, parents Robert and Cheryl Bailey, and in-laws Mary and Dennis Leinbaugh. I'd like to thank them for their patience and remind them that they are my real measure of happiness in this life.

  •

    Chapter 1

    Overview

    "Of the above possible fields the learning of languages would be the most impressive, since it is the most human of these activities. This field seems however to depend rather too much on sense organs and locomotion to be feasible."

    - Alan Turing, 1948

    How do children learn to use verbs such as push and pull? They are able to do so after hearing just a few examples, even though different languages classify actions quite differently. The answer to this tantalizing question, we believe, is that the common substrate of the human motor control system drives children's rapid yet flexible acquisition of the lexicon of action verbs in their native language. This hypothesis is explored by building a computational model of motor control and word learning, and testing its acquisition of the relevant vocabulary from a range of the world's languages.

    As an interdisciplinary endeavor this dissertation addresses a wide audience, ranging from the artificial intelligence community to linguists, psychologists and neurobiologists. Accordingly, the material has been organized so that each chapter fully discusses one aspect of the model, including motivation, representations and algorithms, connectionist account, cognitive implications, limitations and extensions. For the sake of the computer scientist who wishes to cut to the chase, the core implemented computational ideas are always grouped into a single block within each chapter (usually in a single section). Machine learning experts may wish to skip directly to Chapter 6 and Chapter 8.

    Chapter 2 begins by motivating the verb acquisition problem in detail. We see

    that the English vocabulary for hand actions is quite rich, and furthermore that other

  • CHAPTER 1. OVERVIEW 2

    [Figure 1.1: Top-level architecture of the verb learning model. Verbs (with multiple senses) at the top and motor actions (x-schemas) at the bottom are connected through the linking feature structure: labelling and feature extraction flow upward, obeying and execution guidance flow downward.]

    languages of the world classify hand actions in significantly different ways. In that chapter we present our philosophical stance toward the question of how children deal with this variety: bodily grounding of semantics constrains possible meanings. In this framework we pose a specific computational task: given a set of pairings of actions and verbs, learn the bidirectional mapping between the two so that future actions can be labelled and future commands can be acted out.

    Figure 1.1 shows the architecture of our cognitive and computational model of action verb acquisition. As does the dissertation as a whole, the overview will proceed "bottom-up" through this diagram, and then move on to learning. For concreteness, a simplified running example will be developed during the overview.

    Given the command-obeying component of the task, it is obvious that our solution must include an active motor control mechanism, not just passive descriptions of actions. Chapter 3 presents executing schemas (x-schemas for short, shown at the bottom of Figure 1.1), our model of high-level motor control, where synchronization and parameterization of lower-level motor synergies are the key issues. Consequently, the model makes the strong claim that details of lower-level motor control are not linguistically relevant. X-schemas are described using the Petri net formalism (Murata 1989), allowing natural expression of concurrency and asynchrony. Generally there is a 1-to-1 mapping between x-schemas and goals. X-schemas for a variety of object manipulation tasks such as sliding and lifting are developed. For example:

    Example: A very simple x-schema for sliding an object on a tabletop might look like this:

        start --> [GRASP or PALM] --> MOVE(dir, force) --> done

    This x-schema begins by either grasping the object or placing the palm against it, and then proceeds to move it in a given direction with a given force.
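    The branching control flow just described can be sketched as a toy token-passing program. This is a hypothetical simplification (the class, place names and `execute` method are invented for illustration); the dissertation's actual x-schemas are full Petri nets with concurrent transitions and parameterized synergies:

```java
import java.util.*;

// Toy token-passing sketch of the Slide x-schema: a single token moves from
// "start" through either the GRASP or PALM posture synergy, then through
// MOVE(dir, force), to "done". Real x-schemas are Petri nets and can carry
// several tokens concurrently; this sketch keeps only the branch structure.
public class SlideXSchema {
    // place -> enabled successor transitions
    static final Map<String, List<String>> NEXT = Map.of(
        "start", List.of("GRASP", "PALM"),   // alternative entry synergies
        "GRASP", List.of("MOVE"),
        "PALM",  List.of("MOVE"),
        "MOVE",  List.of("done"));

    // Run the schema with a chosen posture and the MOVE parameters,
    // returning the trace of fired transitions.
    static List<String> execute(String posture, String dir, String force) {
        List<String> trace = new ArrayList<>();
        String place = "start";
        while (!place.equals("done")) {
            List<String> options = NEXT.get(place);
            // at the branch point, fire the requested posture synergy
            String next = options.contains(posture) ? posture : options.get(0);
            // the MOVE synergy consumes the direction and force parameters
            trace.add(next.equals("MOVE") ? "MOVE(" + dir + "," + force + ")" : next);
            place = next;
        }
        return trace;
    }

    public static void main(String[] args) {
        System.out.println(execute("PALM", "away", "low"));
        // [PALM, MOVE(away,low), done]
    }
}
```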

    Interfacing the execution of these x-schemas to language (bidirectionally) is accomplished by a set of special features called the linking feature-structure (linking f-struct for short, shown in the center of Figure 1.1), described in Chapter 4. The linking f-struct essentially summarizes the execution of an x-schema as a collection of features. A key consequence of the overall architecture is that all linguistically relevant aspects of x-schemas must be represented in these features, which play the crucial role of further restricting the hypothesis space so as to render verb learning tractable (in terms of both computation time and the number of training examples needed). In particular, the summarizing nature of the linking f-struct allows the verb learning algorithm to avoid dealing directly with the time-varying activity of x-schemas. Linking features include the name of the executed x-schema (indicating intention), parameters such as force or direction, control flow patterns such as loop repetition, and perceptual information necessary to guide action. For example:


    Example: Linking features, derived from our example x-schema and others like it, might include the following:

        schema           posture              direction         force
        ---------------  -------------------  ----------------  --------------
        slide, depress   palm, grasp, index   up, down, away,   low, med, high
                                              toward

    Each column represents a feature. The upper box names the feature, and the lower box lists possible values. Note that features can range from the x-schema name, to choices of hand posture, to parameters of the primitive synergies such as direction and force.

    Next we turn to the semantic representation of verbs, the topic of Chapter 5. Since the model must be able to re-create appropriate actions for a given verb, we cannot just represent the minimal abstractions needed to distinguish one verb from another. Rather we need to capture a richer "gestalt" representation, which is accomplished by a conjunction of linking features called a word sense f-struct. In a word sense f-struct each feature can be associated with multiple possible values, with varying strengths of association. This permits graded judgments and hence an ability to generalize. By treating these degrees of association as probabilities, well-known statistical techniques can be brought to bear on the problem. Sometimes, the uses of a verb are too varied to sensibly encode as a single (albeit probabilistic) conjunct. For these cases, we employ multiple senses for a single word, as shown at the top of Figure 1.1. Labelling an action involves choosing the word sense f-struct which most closely matches the linking f-struct resulting from the action, and emitting the corresponding verb. Conversely, obeying a command involves choosing, for the given verb, a word sense f-struct whose features fit best with the current world state, and then copying the word sense f-struct's features into the linking f-struct in order to guide x-schema execution. Chapter 5 evaluates this model with respect to notions of human categorization including prototype effects, basic-level effects and radial categorization. An example of the word sense representation follows:


    Example: Simplified representations for some senses of the verbs push and pull might look like this:

        PUSH: 2 senses

        sense 1:
          schema:     slide 100%, touch 0%
          posture:    palm 60%, index 30%, grasp 10%
          direction:  away 50%, down 30%, up 15%, toward 5%

        sense 2:
          schema:     touch 100%, slide 0%
          posture:    palm 85%, index 10%, grasp 5%
          force:      high 60%, med 30%, low 10%

        PULL: 1 sense
          schema:     slide 100%, touch 0%
          posture:    grasp 80%, palm 20%, index 0%
          direction:  toward 70%, up 10%, down 10%, away 10%

    In use, the first sense of push generates sliding actions which usually use the palm but in a suitable context might use the index finger, and which tend to be directed away from the body or downward. The second sense generates actions such as pushing on a wall, in which there is no motion but instead a steady application of medium to high force, almost always involving the palm posture. The single pull sense shown here generates sliding actions toward the body using a grasp. In recognition mode, an occurrence of one of these three prototypical actions would strongly activate the corresponding sense, leading to production of the appropriate verb. Other actions would weakly activate multiple senses, in which case a verb is produced only if the winner's activation exceeds a threshold.
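    The labelling decision can be mocked up as a naive product of per-feature probabilities. This is a hypothetical toy (class name, tables and the 0.01 threshold are all invented for illustration, and the probability tables follow the simplified push/pull senses above); the model's actual labelling formulas appear in Chapter 5:

```java
import java.util.*;

// Toy labelling: score each word sense by the product of the probabilities it
// assigns to the observed feature values, then emit the winning verb if its
// score clears a threshold. Probability tables follow the simplified senses
// sketched in the text.
public class ToyLabeller {
    // sense -> (feature -> (value -> probability))
    static final Map<String, Map<String, Map<String, Double>>> SENSES = Map.of(
        "push.1", Map.of(
            "schema",    Map.of("slide", 1.0, "touch", 0.0),
            "posture",   Map.of("palm", 0.6, "index", 0.3, "grasp", 0.1),
            "direction", Map.of("away", 0.5, "down", 0.3, "up", 0.15, "toward", 0.05)),
        "pull.1", Map.of(
            "schema",    Map.of("slide", 1.0, "touch", 0.0),
            "posture",   Map.of("grasp", 0.8, "palm", 0.2, "index", 0.0),
            "direction", Map.of("toward", 0.7, "up", 0.1, "down", 0.1, "away", 0.1)));

    static String label(Map<String, String> linkingFStruct) {
        String best = null;
        double bestScore = 0.0;
        for (var sense : SENSES.entrySet()) {
            double score = 1.0;
            for (var feat : linkingFStruct.entrySet())
                score *= sense.getValue().get(feat.getKey())
                              .getOrDefault(feat.getValue(), 0.0);
            if (score > bestScore) { bestScore = score; best = sense.getKey(); }
        }
        // produce a verb only if the winner's activation exceeds a threshold
        return bestScore > 0.01 ? best.split("\\.")[0] : null;
    }

    public static void main(String[] args) {
        // a grasping slide toward the body matches pull far better than push
        System.out.println(label(Map.of("schema", "slide",
                                        "posture", "grasp",
                                        "direction", "toward")));
        // pull
    }
}
```

    A palm slide directed away scores 1.0 * 0.6 * 0.5 = 0.3 for push's first sense but only 0.02 for pull, so the same comparison produces "push" for that action.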

    Finally we arrive at the core of the dissertation: the learning algorithm. Since much of the model's structure (namely the x-schemas and linking features) is specified before learning, the learning process involves only generating an appropriate set of word sense f-structs given the data. Chapter 6 begins with a review of the lexical acquisition literature, from which three important constraints are taken: children learn to label their own actions, do so with little negative evidence, and exhibit fast mapping (learning from as few as one example). This leads to the choice of Bayesian model merging for our learning algorithm. One key property of this statistical approach to achieving good generalization ability is the use of a Bayesian criterion which explicitly specifies the trade-off between (1) a preference for a small number of word senses, and (2) the ability of a larger number of senses to more accurately represent the training data. Another key property of model merging is that it captures "fast mapping" because new words are immediately modelled by a word sense which essentially copies the training example.


    Example: As instances of pushing and pulling occur, a new sense is initially created for each, closely matching the instance. In accordance with the formulas presented in Chapter 6, similar senses are then merged until only the three senses described above remain. Their probability tables reflect the instances merged to form them. The two senses of push do not merge because if they did, important correlations (e.g. that sliding pushes often use the index finger posture but touching pushes rarely do) would be lost.
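    A single merge step can be pictured as count pooling. In this hypothetical sketch (the class and representation are invented for illustration), each sense keeps counts of observed feature values, so a merged sense's probabilities are the count-weighted average of the two originals; the real algorithm in Chapter 6 additionally scores each candidate merge with a Bayesian posterior before accepting it:

```java
import java.util.*;

// Toy merge step: a word sense is a table of observed-value counts per
// feature; merging two senses pools their counts. Normalizing the pooled
// counts yields the merged sense's probability table.
public class ToyMerge {
    static Map<String, Map<String, Integer>> merge(
            Map<String, Map<String, Integer>> a,
            Map<String, Map<String, Integer>> b) {
        Map<String, Map<String, Integer>> out = new HashMap<>();
        for (var sense : List.of(a, b))
            for (var feat : sense.entrySet()) {
                var counts = out.computeIfAbsent(feat.getKey(), k -> new HashMap<>());
                // add this sense's counts into the pooled table
                feat.getValue().forEach((value, c) -> counts.merge(value, c, Integer::sum));
            }
        return out;
    }

    public static void main(String[] args) {
        // two near-identical pushing instances: a palm slide away, an index slide away
        var s1 = Map.of("posture", Map.of("palm", 1),  "direction", Map.of("away", 1));
        var s2 = Map.of("posture", Map.of("index", 1), "direction", Map.of("away", 1));
        // merged sense: direction away observed twice; posture split palm/index
        System.out.println(merge(s1, s2));
    }
}
```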

    A sketch of a connectionist network version of the model, including this learning method, is presented in parts of Chapters 3 through 6, though it is mostly unimplemented. Recruitment learning, conjunctive "triangle units" and winner-take-all connectivity provide the mechanisms needed to implement the model.

    Chapter 7 presents a way to extend the model to handle verb complexes, by which we mean a verb plus any inflections, particles or auxiliaries which help to specify the action. Essentially this involves grouping the word sense f-structs into separate slots for each grammatical position, and extending the labelling and obeying algorithms to execute within each slot. When a verb complex is given as a command and its component words specify conflicting feature values, the word with the more selective probability distribution is preferred. Interestingly, the learning algorithm can remain unchanged. However, a useful heuristic is implemented to encourage the system to learn the typically relevant features in each slot to speed learning of new words.

Example: Assume we have several x-schemas for moving objects like the Slide x-schema described earlier, and we label actions with two words: a verb and a directional modifier. The resulting representation for a modifier like up might look like this:

    UP: 1 sense

        schema            posture          direction
        slide     10%     index     30%    away      0%
        depress   10%     grasp     35%    toward    0%
        touch     10%     palm      35%    up      100%
        . . .                              down      0%

The word codes very selectively for the upward direction and hence will override any weaker directional correlations in verbs such as those shown earlier. Push up therefore generates an action directed upward, not away.
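The preference for the more selective word can be sketched in a few lines; the distributions below and the use of peak probability as the selectivity measure are illustrative assumptions, not the model's exact computation.

```python
def more_selective(dist_a, dist_b):
    """Prefer the word whose feature distribution has the sharper peak."""
    return dist_a if max(dist_a.values()) >= max(dist_b.values()) else dist_b

# Illustrative 'direction' tables for a verb and a directional modifier.
push_direction = {"away": 0.6, "toward": 0.1, "up": 0.15, "down": 0.15}
up_direction   = {"up": 1.0}

winner = more_selective(push_direction, up_direction)
chosen = max(winner, key=winner.get)  # the command moves the hand upward
```

Because up puts all of its mass on a single direction while push only weakly prefers away, the modifier wins the conflict, matching the push up example above.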

    The model has been tested on a variety of verbs from languages such as English,

Farsi, and Russian. Chapter 8 surveys these results, first with English verbs and then

    crosslinguistically. Sensitivity to the several learning parameters is discussed, as well as


    some categories which are not learnable by the model.

    Finally, Chapter 9 discusses the implications of our model, addresses some objec-

tions, points out new questions it raises, and discusses related efforts applying the x-schema

    formalism to other domains. We conclude with some thoughts on real world uses of this

    work.


    Chapter 2

    Setting the Stage

2.1 A Crosslinguistic Conundrum . . . 8
2.2 It's the Body, Stupid! . . . 11
2.3 The Task in Detail . . . 14
2.4 A Connectionist Commitment . . . 16
2.5 Related Efforts . . . 17

"It takes . . . a mind debauched by learning to carry the process of making the natural seem strange, so far as to ask for the why of any instinctive human act." (William James)

Historically, most of the effort in analyzing language has focused on its generative capacity (how words are combined), treating the meanings of individual words as a comparatively simple problem. This chapter argues that the issue of lexical semantics is itself

    subtle enough to warrant computational modelling, and proposes a methodology for the

    particular case of action verbs.

    2.1 A Crosslinguistic Conundrum

    Have you ever considered the verbs you use to describe actions you perform with

    your hands? Many people are surprised by the number of such verbs, and the subtle

    distinctions they make. Consider the following list, which is far from complete:

  • CHAPTER 2. SETTING THE STAGE 9

get, seize, snatch, grab, grasp, pick (up), take, hold, grip, clutch, put, place, lay, drop, slam, release, let go, move, push, pull, shove, yank, slide, bat, flick, tug, nudge, lift, raise, hoist, lower, pass over, lob, toss, throw, fling, whip, chuck, hit, tap, rap, bang, slap, press, poke, punch, rub, shake, pry, turn (over), flip (over), tip (over), rotate, spin, twirl, handle, squeeze, pinch, tie, twist, bend, bounce, scrape, scratch, scrub, smear, crush, smash, shatter, scatter, spread (out/on), cut, slice, clip, wipe, brush, grind, tighten, loosen, open, close, insert, remove, hook, hang, balance, peel, (un)wind, dunk, (un)zip, juggle, knead, dribble, scribble, hand, pass, salute, caress, fondle, pet, pat, stroke, wave, point, hide, stack, touch, feel, reach (for), stop, help, resist, try, bump, slip, knock (over/down)

    There are some gross distinctions in meaning (e.g. possession changes vs. object

    movement vs. object manipulation) but also considerable variation of a subtler kind which

    doesn't so easily admit qualitative characterization (e.g. grab vs. seize, or raise vs. lift vs.

hoist, or fling vs. toss).

How could children possibly learn all these fine distinctions? Maybe all these

concepts are already in the child's mind (either pre-wired, or as a result of maturation or experience) before the child begins to learn language. If so, the verb learning task would amount to a game of mix-and-match between verbs and concepts, a comparatively easy

    task. This has been proposed by Nelson (1973).

    But we will argue in this thesis that this can't be the case. A few examples of

    some conceptual distinctions made in other languages of the world should convince you. The

    following examples are from our own informal crosslinguistic survey of languages including

    Tamil, Cantonese, Farsi, Spanish, Korean, Japanese and Arabic:

• THALLU & ILU (Tamil): These correspond roughly to pushing and pulling; how-

    ever, they connote a sudden action as opposed to continuous application of force and

smooth movement. The only way to get this latter meaning is to suffix a directional specifier. Thus there is no way to indicate smooth pushing in an arbitrary direction.

• HOL-DAADAN & FESHAAR-DAADAN (Farsi): These correspond to two different senses of push. Hol-daadan refers to moving an object away from oneself. (It

    is actually closer to shove as it implies high force; there is in fact no word for gentle

or continuous pushing, other than the generic move verb with a directional specifier.)


    In contrast, feshaar-daadan refers to applying steady pressure to an unmoving object,

e.g. pushing on a wall.[1]

• PULSAR & PRESIONAR (Spanish): These verbs correspond to English press,

    but they make a distinction based on hand posture. Pulsar refers to pressing with a

single finger, while presionar refers to pressing with the entire palm.

• PUDI (Tamil): Pudi covers both obtaining an object, as well as continuing to hold

an object. It connotes quickness in the first case, and exertion of force in the second

    case. It prefers the use of either a cupped palm supporting the object, or else a closed

fist. Close English verbs are catch, clutch and restrain.

• ZADAN (Farsi): This word refers to a large number of object manipulations whose

    common character seems to be the use of quick motions. The prototypical zadan is a

    hitting action, though it can also mean to snatch (ghaap zadan), or to strum a guitar

    (or play any musical instrument, for that matter!).

• MEET (Cantonese): This verb covers both pinching and tearing. It seems to

connote forceful manipulation by two fingers, yet it is also acceptable for tearing

    larger items where two full grasps are used.

• DROP: Neither Tamil nor Cantonese has a verb for gentle dropping. Both languages

    instead possess one verb for grip-release (i.e. let go) which does not specify whether

    the released object is otherwise supported, and another verb for throwing down, which

    connotes use of force.

• VAIIE & PODU (Tamil): Both verbs can refer to putting an object down. Care-

    fully executed puts which ensure the object is placed securely use vaiie, although

    this verb is perhaps closer to English keep, since its prototypical case refers simply to

maintaining an object in a given location, without expenditure of effort. Meanwhile

podu connotes a careless put; indeed it includes throwing the object down. There

    does not seem to be an equivalent of place, connoting gentleness but focusing on the

    relocation of the object.

[1] Daadan means to give. Hol is a noun for an outward movement, but it is not used alone. Feshaar is a commonly used noun for pressure.


    2.2 It's the Body, Stupid!

    We have seen that languages are quite rich in verbs of hand action and also seem

    to vary widely. Yet children learn those verbs in their native tongue from a modest number

    of samples and quickly generalize their words correctly (or near correctly). How?

    The answer explored in this dissertation is that the potential variety of lexicalized

action categories is not infinite, but instead is constrained by virtue of being grounded in the

    workings of the human motor control system. Properly construed, this grounding greatly

    restricts the size of the hypothesis space for verb acquisition, rendering it tractable.

    In some sense it is an undeniable and obvious claim that language must ultimately

    be bodily grounded, since it is a human activity. What this dissertation attempts to do

    is to answer the question of how this grounding is important for computational models of

    language. As I see it, there are three main aspects to bodily grounding as it applies to action

    verbs: (1) neural constraints on information processing algorithms; (2) simple facts about

the structure of the body (e.g. arm, hand, five fingers, etc.); and (3) organizing principles

    of the motor control system (e.g. discrete coordination of simple synergies). These topics

    will be the theme of the dissertation.

Upon reflection, it's not surprising that details of the motor system are implicated

in semantics: while abstract representations at the level of "CAUSE(POSSESS(x))" (such

    as the conceptual dependency representation of Schank (1975)) may be useful for reasoning,

    they clearly are inadequate for actually performing the action via the arm and hand. In

    this thesis we will look at representations that do support driving actual behavior.

    The following excerpt from Webster's Ninth New Collegiate Dictionary corrobo-

    rates the view that motor control plays a central role in making some of the �ner distinctions

    in English:


"TAKE, SEIZE, GRASP, CLUTCH, SNATCH, GRAB mean to get hold of by or as if by catching up with the hand. TAKE is a general term applicable to any manner of getting something into one's possession or control; SEIZE implies a sudden and forcible movement in getting hold of something tangible or an apprehending of something fleeting or elusive when intangible; GRASP stresses a laying hold so as to have firmly in possession; CLUTCH suggests avidity or anxiety in seizing or grasping and may imply less success in holding; SNATCH suggests more suddenness or quickness but less force than SEIZE; GRAB implies more roughness or rudeness than SNATCH."

    Most of these distinctions could be made based on features such as speed, force, security of

    grip, and precision of motion. Going back to the crosslinguistic examples from the previous

    section, it is clear that these distinctions, too, can usually be made in terms of motor control

    features, broadly construed to include both goals and those aspects of world state which

    are directly relevant to carrying out actions.
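As a toy illustration of such a feature analysis (the feature values below are informal readings of the dictionary entry, not data from the model):

```python
# Informal motor-feature readings of the dictionary entry above.
# The values are illustrative judgments, not measurements.
grasp_verbs = {
    "seize":  {"speed": "fast", "force": "high"},
    "snatch": {"speed": "fast", "force": "low"},
    "grab":   {"speed": "fast", "force": "mid"},
}

# SNATCH then differs from SEIZE only in force, as the entry suggests.
differs = {feat for feat in grasp_verbs["seize"]
           if grasp_verbs["seize"][feat] != grasp_verbs["snatch"][feat]}
```

On this reading, a learner who tracks a few motor parameters can recover the dictionary's near-synonym distinctions directly.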

The influence of motor control and intentional activity in general on theories of

    conceptual representation has a long history. Piaget is perhaps the best-known advocate,

    having developed a comprehensive theory of the development of abstract concepts via in-

    teraction with the world. Pinker, also, points out that children must certainly attend to

    internal (and hence externally unobservable) variables such as goals in determining word

    choice, since often two di�erent words are uttered in the same world state. And Landau &

    Gleitman (1985) show that blind children learn language more or less normally despite the

absence of what is often assumed to be the primary semantic source: vision. For example,

    they found that blind children readily learn the verb look. But in their own behavior, it

    translates to haptic exploration (that is, using the sense of touch). The core meaning of the

verb, they suggest, is "explore with the primary sense," certainly an action-oriented mean-

    ing. More recently, brain imaging studies (Damasio & Tranel 1993) have made a strong case

    for the intimate connection between language and sensorimotor areas of the brain: verbs

    activate motor control regions, while nouns do not.

    The study of embodied cognition generally is not a new enterprise. Johnson (1987)

argues that human conceptualization is "imaginative" in the sense that our concepts tend

to reflect biases resulting from the human condition (whether perceptual or having to do with the kinds of goals which people seek to achieve) rather than purely reflecting an

    objective structure to the external world. Lako� (1987) also argues persuasively for a \non-

  • CHAPTER 2. SETTING THE STAGE 13

    objectivist" basis for semantics. Ultimately, embodiment must be explained in terms of

    neural structures. The role of connectionist modelling in this dissertation will be discussed

shortly (§2.4).

    To be sure, only a subset of human concepts are directly bodily grounded. How-

ever, as Lakoff & Johnson (1980) further argue, these bodily concepts frequently underlie

    more abstract concepts metaphorically. A computational account of metaphor consistent

with this thesis has been developed by Srini Narayanan and is discussed in §9.5.

    The best-known example of the study of embodied cognition is work on basic color

    terms, and we mention it here to make clear the kind of story we wish to tell. Berlin &

Kay (1969) show that while languages differ considerably in their "basic" terms for colors,

    there is an underlying pattern. In particular, given the number of basic color terms in a

    language, one can reliably predict what they will be. And for languages with fewer basic

    color terms (which thus cover wider ranges of the spectrum), the set of prototypes for

    each color corresponds to the basic color terms of the richer languages. Why should this

    be? The punchline of the story is that in later work (Kay & McDaniel 1978), the spectral

    characteristics of some of the prototypes were found to be predictable from the physiology

    of the visual system, which suggests why they might be so nearly universal. Recently, this

    work has been addressed in a computational framework by Lammens (1994). Lammens

    was disturbed that the earlier accounts could not explain non-spectral colors like brown

    or white (the latter would activate all the prototypes!), and built a computational model

    to investigate learning of colors. A key result of the work was that the correctness of the

    learned categories depended upon the choice of color-space with which to represent the light

    collected by the camera. With a cognitively inspired color-space, reasonable learning results

were obtained using an optimization procedure which fit a multi-dimensional Gaussian to

    each color so as to maximize response to examples of the color while minimizing response

    to examples of other colors.

    To avoid confusion, we point out that our notion of embodiment is somewhat

different from that of Brooks (1986) and others in the autonomous robotics community,

    who emphasize the need for physically realizing robots in order to make progress in robotic

    control. For them, embodiment primarily means confronting the details of the \real world"

such as sensor errors and effector failures. These concerns are certainly an important com-

    ponent of embodiment, and indeed the design of our model of motor control is partially


    driven by such concerns. However, the perspective on embodiment taken here is that these

    issues are secondary; the focus is instead upon the details of the structure of the human

    body and the principles of neural computation.

With this mindset, then, I set out to find a suitable representation for motor

    control and to see how it partially determines the course of action verb acquisition.

    2.3 The Task in Detail

    In this study, the actions under consideration are limited to those of a single hand

    by a person seated at a table, on which there may be zero or one simple geometric objects

    such as a cube or stick. The verbs studied are limited to those applicable in this world.

    The task is to build a computational model which meets the following requirement,

    for any single natural language:

• Given: a training set consisting of pairs, each containing
  - an action (represented by the motor control pattern which generates it)
  - a verb (as an atomic symbol)
• Produce: a representation for the verb lexicon which allows
  - appropriate labelling of novel actions
  - appropriate obeying of verbal commands in novel world states

To the extent possible, the trajectory of learning should reflect the child's.

The system should also be able to handle "verb complexes" such as keep pushing left.
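A toy rendering of this task interface, with invented feature names standing in for motor-control patterns (this is not the thesis's actual representation, which uses x-schemas and multi-sense probability tables):

```python
# Actions are feature dictionaries standing in for motor-control
# patterns; verbs are atomic strings. Feature names are invented.

def train(pairs):
    """Build a lexicon: per verb, per feature, value counts."""
    lexicon = {}
    for action, verb in pairs:
        tables = lexicon.setdefault(verb, {})
        for feat, val in action.items():
            counts = tables.setdefault(feat, {})
            counts[val] = counts.get(val, 0) + 1
    return lexicon

def label(lexicon, action):
    """Labelling: name a novel action with the best-matching verb."""
    def fit(tables):
        score = 0.0
        for feat, val in action.items():
            counts = tables.get(feat, {})
            total = sum(counts.values())
            score += counts.get(val, 0) / total if total else 0.0
        return score
    return max(lexicon, key=lambda verb: fit(lexicon[verb]))

def obey(lexicon, verb):
    """Obeying: turn a command into the verb's most probable features."""
    return {feat: max(counts, key=counts.get)
            for feat, counts in lexicon[verb].items()}

lex = train([({"force": "high", "motion": "away"}, "shove"),
             ({"force": "low", "motion": "away"}, "push")])
```

The same learned tables support both directions of use: `label` scores a novel action against each verb's distributions, while `obey` reads the distributions out as motor parameters.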

    Since actions are represented as motor control activity, there is a methodological

    question of how to present them to native speakers in order to collect labels for the training

    data. Ideally, an animation software package, such as Jack from Transom Technologies,

    would be used to translate model internals into an on-screen depiction of the corresponding

action. Informants viewing the animation could then confidently label actions as well as

evaluate the obeying of verbal commands. However, due to some difficulties in interfacing

    Jack to the verb-learning software system we have developed, various shortcuts were used in


    the experimental work reported here. In some cases, the author's knowledge of the internals

    of the model was used to physically demonstrate actions for speakers, or to label training

    data directly. In other cases, informants were familiarized with the internals of the model.

For more details, see §8.1.

    By its nature, this computational task forces any solution to strike a balance

    between two extremes. The learning requirement demands that any solution model strong

    innate biases. Meanwhile, the crosslinguistic requirement guards against solutions with

    excessive biases which oversimplify the task actually faced by children.

    It is critical to understand that our task de�nition speci�es that the child is as-

    sumed to be labelling his own actions, and therefore has access to his internal state during

    the performance of the action, including his intentions. This is in contrast to using visual in-

put (i.e. watching a parent perform the action). Certainly it is a simplification to pretend the

    child never hears a verb in association with someone else's action rather than his own. But

    it has been shown that the own-action case is indeed the most frequent (Tomasello 1992).

    Furthermore, there is impressive evidence that even neonates can map others' actions onto

their own motor control system (Meltzoff & Moore 1977), so even for the other's-action case,

    the language-learning child may be inclined to consider motor parameters as the primary

    semantic component.

Another simplification in our task is the pre-selection of the time window of activity

    labelled by the verb. This is done to simplify the problem, but Tomasello (1992) provides

    evidence that children may have help in this regard, too, since most verb labels are uttered

    immediately preceding the action, and for early verbs, the actions are usually short in

    duration.

    Many other simpli�cations have been made to render the task manageable. Lin-

    guistic context is not modelled, even though it could contribute to learning individual lexical

    items. (But see Goldberg (1995) for arguments on the separation of verb meaning and gram-

matical meaning, and the different nature of verb meaning from grammatical meaning.) The

    social domain is absent, restricting the vocabulary we can address. We avoid using objects

with functional significance beyond their simple spatial qualities, to avoid representing those

    other qualities. We don't deal with deformable, liquid or jointed objects. Nor do we deal

    with multiple objects or actions which involve both hands, or tools. We only consider ac-

    tions which do not involve planning, i.e. those whose \plan" is already wired as a motor


    control program (so we don't handle verbs like stack). Verbs are an open-class grammatical

    category, making these somewhat arbitrary restrictions necessary.

    Lastly, I would like to emphasize that this project explores only the motoric com-

ponent of the full semantics of action verbs. This reflects a belief that the motor component

    is central, but in no way does it represent a claim that the motor component is the full

    story.

    2.4 A Connectionist Commitment

    The role of connectionism in this work is very much tied up with the notion of

    bodily grounding. Neural plausibility provides further strong constraints on how concepts

may be represented and learned, and thus should inform serious cognitive modelling efforts.

    Yet working at the connectionist level can be cumbersome. In this thesis, it has

proved useful to define a number of computational tools which can be mapped to connec-

    tionist models, but to do most of the work at the higher, computational level. Indeed,

    our implementation of the verb learning system is done at this higher level. Throughout

    the thesis, we will provide sketches of connectionist networks which could implement our

    representations and algorithms. These sketches are intended to convince the knowledgeable

    connectionist that they are implementable, but the networks have not been simulated in

    code.

    The important question is, what are the biological constraints which one should

    respect when creating a connectionist network? What are the criteria for evaluating the

    neural plausibility of an algorithm? For high-level tasks such as language learning, precise

    neuronal modelling would be hopeless. What we consider are some broad computational

    constraints imposed by neural structures in general (Feldman & Ballard 1982), including:

• use of many parallel, simple, slow computing units
• no central controller (local rules only)
• no passing of structures between computing units (simple messages only)
• substantially less than full connectivity among computing units


One decision in designing a neural network is whether to represent the "concepts"

of the problem domain in a punctate manner (Feldman & Ballard 1982) where "grandmother cells" are assigned individually to the concepts[2], or in a distributed manner where concepts

    are represented by the activity of many neurons, each of which participates in many such

    concepts. The advantages of distributed representations (Rumelhart & McClelland 1986)

    include graceful degradation and the potential to allow learning algorithms to develop new

    features, and for these reasons they have been the focus of neural network research. Yet,

    learning algorithms for distributed representations, such as backpropagation (Rumelhart &

    McClelland 1986) and its variants, virtually always involve gradual adjustment of weights,

    rendering them useless for tasks in which one-shot learning is desired. Accordingly, we

    will focus on more punctate representations, which have the advantages of facilitating more

structured design and, as we will see, faster (if less flexible) learning.

    Within the connectionist framework, a variety of techniques have been developed

    (Hertz et al. 1991). A few selections from this toolbox will prove useful in demonstrating the

    neural plausibility of the verb learning model developed in this thesis. At a low level, we will

    use the notion that the activation level of a connectionist unit can represent an approximate

    probability, or degree of belief, that the concept it represents is currently applicable. We

    will also make use of the notion of thresholding, in which evidence for a concept must

reach a certain level before the associated unit will fire. At a higher level, winner-take-all

organization will be used to select units which best fit data. Recruitment learning (Feldman

    1982) provides a weight update rule and network pattern which proves useful for one-shot

    learning. And lastly, the notion of encoding bindings via temporal synchrony of separate

    units (Shastri & Ajjanagadde 1993) is used in one connectionist implementation of motor

    schemas.
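Two of these tools, thresholded firing and winner-take-all selection, can be sketched in a few lines (the numeric activation values are arbitrary):

```python
def fires(activation, threshold=0.5):
    """A unit fires only when its accumulated evidence crosses threshold."""
    return activation >= threshold

def winner_take_all(activations):
    """Mutual inhibition silences all but the most active unit,
    selecting the unit that best fits the current data."""
    peak = max(activations)
    return [a if a == peak else 0.0 for a in activations]

out = winner_take_all([0.2, 0.9, 0.4])  # only the middle unit survives
```

In the verb model, a winner-take-all layer of this kind is what lets one word sense claim an action while its competitors are suppressed.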

    2.5 Related E�orts

Ideas from a variety of fields have been borrowed in this work. They are ac-

    knowledged along the way. This section highlights a small number of projects which have

    attempted an overall task very similar to this one, and which therefore invite comparison.

[2] It should be stressed that the punctate model does not demand that concepts are represented by a single biological neuron, but allows for a small, functionally distinct cluster of neurons.


    This architecture could be considered an implementation of learning procedural

    semantics, as pioneered by Winograd's SHRDLU system (Winograd 1973). SHRDLU was

    a question-answering system which operated in the time-honored blocks world domain.

The user could, for example, request that the system "Pick up the red block," and the

    appropriate object identi�cation and action would occur. If the request was ambiguous,

the system might reply "Which one?" and could handle the simple response, "The big one." Later, if asked "What are you holding?" it would respond "The big red block."

The current work differs from SHRDLU in two important ways. First, it pays considerably

more attention to the fine-grained motor details which can be involved in verb semantics.

    Winograd was not concerned with distinguishing push from shove, only with providing

    a set of qualitatively distinct actions and a single verb to map to each. It's not clear,

    for example, that SHRDLU could be straightforwardly recoded for an arbitrary natural

language, since his "procedures" might not correspond to action categories encoded by

    all languages. This was entirely appropriate, for his focus was at the system level, i.e.

    on demonstrating how simple models of semantics, along with simple models of parsing

    and inference, could combine to provide a system capable of understanding full discourse

    when the domain was suitably limited. The second important di�erence is that this thesis

    addresses the problem of learning.

    Siskind (1992) built a system to learn to recognize action verbs in visual movies.

Among the important contributions of this work is the identification of contact, support and

    attachment relations as key features in understanding the scenes. A logical notion of events

was used to discretize movies into "phases," which is not unrelated to the role played by

x-schemas in the model proposed here. Yet difficulties arose from the use of necessary and

sufficient conditions as a lexical representation: neither defaults nor focus of attention could

    be expressed, and the system exhibited brittleness in the face of minor timing variations.

    Furthermore, the input was strictly visual and thus faced an unnecessarily hard problem

    compared with our model which includes internal state of the actor.

    By far, the project most closely related to this thesis is the dissertation work of

Regier (1996). Both projects are part of the Neural Theory of Language Project (formerly called "L0") (Feldman et al. 1996), and Regier essentially provided the model for the type of

    research reported here. Regier built a model of the acquisition of spatial terms based upon

    features derived from the structure of the human visual system. This structure was modelled


    as a connectionist network with subnets for computing orientation relations and center-

    surround relationships. During learning it would develop new features, such as contact or

    inclusion, that were necessary for the language it was learning. A backpropagation network

    then categorized these features into spatial terms such as in, over or through in English.

    The current thesis borrows from Regier the methodology of training a structured network

    on lexical items from a variety of languages in order to force a balance between innate

    and learned structures. However, an important limitation of Regier's system is that it can

    function only as a recognizer; there is no way for it to produce a visual scene corresponding

to a given spatial term. This deficiency, it turns out, was the inspiration for my focus on

actions: recognizing them, and carrying them out.


    Chapter 3

    Executing Schemas for Controlling

    Actions

3.1 Human Motor Control . . . 21
3.2 A Petri Net Model . . . 23
    3.2.1 Synergies as building blocks . . . 24
    3.2.2 The Petri net formalism . . . 27
    3.2.3 Durative actions . . . 30
    3.2.4 Accessing perceived world state . . . 31
    3.2.5 The Slide x-schema in detail . . . 33
    3.2.6 Other x-schemas . . . 34
    3.2.7 Multiple entry points . . . 38
    3.2.8 What can't be encoded in x-schemas? . . . 38
3.3 Connectionist Account . . . 39
    3.3.1 Petri net control . . . 39
    3.3.2 Passing parameters . . . 41
3.4 Related Ideas in Artificial Intelligence . . . 44
3.5 Thoughts on Hierarchical X-Schemas . . . 45

    This chapter presents an active representation for control of actions. The represen-

    tation is crucial because in our model it provides the foundation of action verb semantics.

In order to best understand the rationale behind the representation, we first briefly review

    some important properties of human motor control which motivated its design.

  • CHAPTER 3. EXECUTING SCHEMAS FOR CONTROLLING ACTIONS 21

    3.1 Human Motor Control

    Before focusing on the neural aspects of motor control, consider the hand and arm

    and their behavior apart from neural control. All motion is the result of joint rotation.

    Muscularly, this rotation is accomplished by adjusting tension in opposing muscles, called

flexors (e.g. biceps) and extensors (e.g. triceps), attached to each degree of freedom of each

    joint. The arm has two joints, the shoulder and elbow. The shoulder is a three-degree-of-

freedom ball-and-socket joint, while the elbow can either hinge or pivot. The four fingers

    each contain two hinge joints with one degree of freedom each, and are attached to the

metacarpals (the bones within the palm) via a bi-axial joint with two orthogonal degrees

    of freedom but no pivoting. Meanwhile the thumb, while possessing one fewer joint, enjoys

a very flexible saddle joint connecting at the proximal end of the metacarpal, facilitating

opposition. Proprioception, the perception of the body's own state, is accomplished by

muscle spindles which measure joint positions (the finger muscles have some of the body's

    most sensitive position sensors (Napier 1993)) and other types of sensors which measure

    joint velocity. Various receptors embedded in the skin detect contact, pressure and shear.

Controlling many joints at once is of course an exquisitely complex task, but there are biological design principles which manage the complexity. The key idea is the notion of the motor synergy (Bernstein 1967), which is a sub-cortical continuous feedback control circuit for a stereotyped motion, which may be modulated by parameters. The simplest example is the stretch reflex, in which the stretching of either the flexors or extensors (or simulated stretching, such as the doctor's tap on the knee) causes those muscles to contract, to counteract the stretch. This feedback loop involves just two neurons, which extend from the spinal column to the limb. The stretch reflex contributes to maintaining posture and is also a building block for higher-level synergies such as walking. Synergies can also operate across modalities. The flexor reflex rapidly retracts a limb that is experiencing pain. More amazingly, the scratch reflex responds immediately to a localized itch by choosing a suitable end effector (hand or foot), moving it to the needed location, and initiating a cyclic scratching motion.[1] Many types of grasps are encapsulated as synergies. Cutkosky & Howe (1990) catalogs these grasps according to their uses, and argues that motion planning involves discrete choices amongst them.

[1] See Kandel et al. (1991: Chapters 37-38) for a review of these and other motor reflexes.


The restriction on arbitrary joint movements implicit in the notion of synergies is evident from experiments in bimanual coordination. Franz et al. (1991) has shown, for instance, that subjects cannot draw a square with one hand while drawing a circle with the other. When the two hands do manage to engage in different activities, they often bear certain relationships to each other, such as mirror-image activity.

One consequence of the existence of such low-level synergies is that cortical control of activity needs to control many fewer degrees of freedom, since it need specify only the "name" of the synergy and its parameters. While the idea remains controversial, so-called command neurons which trigger a synergy by their activation have been located. For the simple stretch reflex, activation and parameterization are accomplished by specification of a single threshold. But even more complicated synergies can be modulated with a small number of parameters. Cat locomotion, for example, can be driven by a single labelled line (i.e. axon) from cortex to a central pattern generator (CPG) which not only controls the speed of the cat's gait, but also induces a switch to a different gait (e.g. trot to gallop) as required to achieve the commanded speed (Shik & Orlovsky 1976) (reviewed in Kandel et al. (1991)). Parameters seem to be specified separately from the coordinative structure, and often are encoded in an ensemble of neurons. For example, Georgopoulos (1993) has discovered population coding of direction and force in motor cortex of behaving monkeys. According to this scheme, a precise parameter value is specified by the sum of the varying activations of an ensemble of neurons which, individually, are only coarsely selective.
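The population-coding scheme can be illustrated with a small computational sketch (ours, not part of the dissertation's model): each simulated neuron is only coarsely tuned, via a rectified cosine, yet the activation-weighted sum of preferred-direction vectors recovers the stimulus direction precisely.

```python
import math

def population_decode(true_dir, n_neurons=16):
    """Decode a direction from a bank of coarsely tuned neurons.

    Each neuron has a preferred direction and fires in proportion to
    the cosine of the angle between its preference and the stimulus
    (rectified at zero).  The population vector -- the activation-
    weighted sum of preferred-direction unit vectors -- points in the
    stimulus direction far more precisely than any single neuron.
    """
    prefs = [2 * math.pi * i / n_neurons for i in range(n_neurons)]
    acts = [max(0.0, math.cos(true_dir - p)) for p in prefs]  # coarse tuning
    x = sum(a * math.cos(p) for a, p in zip(acts, prefs))
    y = sum(a * math.sin(p) for a, p in zip(acts, prefs))
    return math.atan2(y, x) % (2 * math.pi)
```

With sixteen evenly spaced preferred directions, the decoded direction matches the stimulus to within floating-point error, even though each individual neuron's tuning curve is ninety degrees wide.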

This modularization of low-level continuous control loops allows motor cortex to concentrate on higher-level concerns: the coordination of firing of synergies. What types of coordination are required? The list obviously includes sequentiality. Evidence that humans construct a sequential motor plan includes the work of Sternberg et al. (1978) on delays in starting or stopping typing sequences depending upon the length of the string to be typed. But the motor control system must also coordinate concurrent actions, as demonstrated by Arbib et al. (1987) in the context of preshaping the hand during the movement of the hand toward an object to be grasped. Synergy firing can be based not only on current perceptual input but also on internal state. Central pattern generators are the simplest case of this; the effect of perceptual input is modulated by the state (or phase) of the central pattern generator. Thus high-level controllers are not simple percept-to-synergy maps. The existence of ballistic synergies (for actions which must be performed too quickly to allow for feedback


control) necessitates a looping mechanism at the coordination level to implement successive refinement. And lastly, the uncertainties of the world demand coordination of "emergency" error-correcting actions with the main sequence of actions. Motor representations with these kinds of capabilities are referred to as motor programs and their study constitutes a significant subfield of neuroscience (see e.g. (Pearson 1993)). The supplementary motor area of the cortex appears to be implicated, since its activity increases with action complexity; its activity often precedes activity in primary motor cortex and initiation of movement; and cells in this area have been shown to be sensitive to ordering of actions about to be performed (Tanji 1994).

While most of motor cortex is active only when actions are being carried out (or are about to be carried out), the premotor area is active even when actions are only thought about, including mental imagery or viewing another person acting (in which case the phenomenon is called "mirroring" (Gallese et al. 1996; Grafton et al. 1996)). The accepted view is that this area is involved in planning action sequences. This fact nicely supports related work (see §9.5) employing x-schemas for reasoning.

In summary, then, the key properties of human motor control which we will capture in our representation are (1) synergies for continuous coordination of muscles during simple actions, (2) limited parameterization of these synergies, and (3) serial, concurrent and asynchronous combination of these synergies to compose complex actions. For the curious reader, Wing et al. (1996) provides a compendium of neurobiological and psychological research on hand movements. For a lighter survey on hands ranging from their evolution to left-hand taboos see Napier (1993).[2]

    3.2 A Petri Net Model

Several constraints, then, drive the representation of actions described in this section. Logical descriptions are ruled out, since the representations must be able to support real-time control of the actions described. Traditional procedural attachment is ruled out, since "black box" controllers would not support the kind of inference about actions which is required in the language task. An inspectable yet active representation is needed, suggesting

[2] In this work we won't address the issue of which portions of motor control are innate, maturational, or learned via experience. Huttenlocher et al. (1983) and Gopnik (1981) suggest, however, that lexical development can be partly explained in terms of concurrent learning of language and motor competence.


a state machine formalism.[3] In particular, the Petri net formalism has some desirable properties and is our choice for implementing x-schemas.

    3.2.1 Synergies as building blocks

The Petri net formalism requires that there be a set of actions at the lowest level which are essentially atomic. In our x-schemas these will consist of actions which are hypothesized to be controlled by motor synergies as described in the previous section.[4] This set has several properties. First, they form a limited set of distinct actions. Second, since they are atomic, the internal implementation of the primitive actions is irrelevant at the Petri net level. Generally these actions would be expected to be implemented as some form of continuous feedback controller. Third, most synergies have a small number of parameters to modulate their function. The following list includes all the synergies used in the examples in this dissertation:

[3] This idea is not new in robot control, e.g. Brooks (1986).

[4] While the "synergies" proposed here are biologically plausible in some ways (as discussed shortly), they are not taken directly from the motor control literature. Indeed, a full characterization of human motor synergies remains an elusive goal.


List of Primitive Synergies

MOVE ARM (direction, force, duration): apply force to move the arm in a feedback-controlled manner

MOVE ARM TO (dest, [via]): ballistically move the arm to a target location, passing through the via point if it is specified

PIVOT WRIST (direction, force, duration): pivot the wrist around the axis of the forearm

GRASP: preshape a circular grasp for holding round or cubic objects

WRAP: preshape a grasp for holding long, thin objects

PALM: preshape the palm for flat contact with an object

PLATFORM: preshape the palm to support an object from below

PINCH: preshape a grasp for holding objects between the thumb and index finger

APPLY HAND: close the fingers and/or move the palm until they contact an object which is in front of the hand

TIGHTEN GRIP: increment the gripping force of the fingers

RELEASE: open the hand, terminating any kind of grip

POINT: extend the index finger while closing the other fingers
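The limited parameterization of these primitives can be made concrete as a typed registry. The sketch below is purely illustrative (the Synergy class and the invoke helper are our own names, not part of the implementation described in this dissertation); it simply records each synergy's declared parameters and checks calls against them.

```python
from dataclasses import dataclass

@dataclass
class Synergy:
    """An atomic, named motor primitive with a small parameter list."""
    name: str
    params: tuple = ()    # names of required modulating parameters
    optional: tuple = ()  # parameters that may be omitted

SYNERGIES = {s.name: s for s in [
    Synergy("MOVE_ARM", ("direction", "force", "duration")),
    Synergy("MOVE_ARM_TO", ("dest",), optional=("via",)),
    Synergy("PIVOT_WRIST", ("direction", "force", "duration")),
    Synergy("GRASP"), Synergy("WRAP"), Synergy("PALM"),
    Synergy("PLATFORM"), Synergy("PINCH"),
    Synergy("APPLY_HAND"), Synergy("TIGHTEN_GRIP"),
    Synergy("RELEASE"), Synergy("POINT"),
]}

def invoke(name, **kwargs):
    """Check a synergy call against its declared parameters."""
    s = SYNERGIES[name]
    missing = [p for p in s.params if p not in kwargs]
    unknown = [k for k in kwargs if k not in s.params + s.optional]
    if missing or unknown:
        raise TypeError(f"{name}: missing={missing} unknown={unknown}")
    return (name, kwargs)
```

Note how few primitives take any parameters at all; this is the "limited parameterization" property that the x-schema level relies on.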

While the internals of these primitive synergies are not modelled here, a few points are in order. First of all, note that most of the synergies involve moving a body part into a goal position or orientation. We will assume that invocation of one of these synergies when the body is already in the goal position is allowable and simply produces no motion. This assumption is compatible with several theories of motor control, including the spring model (Latash 1993; Jeannerod 1988), which hypothesizes that movements are generated by simply informing the muscles of their desired tension levels and then allowing the system to relax to this state in accordance with the spring law. It is also compatible with comparator models (Jeannerod 1997: Chapter 6) in which cortex drives the muscles only until (or if) their perceived position differs from the goal position. The MOVE ARM TO synergy also demands a bit of explanation, since it seems to be missing some important parameters such as force or duration. The answer here is that we assume this synergy computes these parameters in accordance with Fitts' Law (Fitts 1954), which is an empirically derived rule relating force and duration to the accuracy requirements of a movement (determined by context or by the specificity of the destination).

Figure 3.1: A taxonomy of Jack grasp synergies. Courtesy of the Center for Human Modeling and Simulation at University of Pennsylvania and Transom Technologies, Inc.
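One standard formulation of Fitts' Law predicts movement time as MT = a + b * log2(2D / W), where D is the distance to the target and W its width. The sketch below uses this formulation with hypothetical coefficients; the dissertation does not specify which variant or constants its MOVE ARM TO synergy assumes.

```python
import math

def fitts_duration(distance, target_width, a=0.1, b=0.15):
    """Movement time from Fitts' Law: MT = a + b * log2(2D / W).

    a and b are empirically fitted constants (the values here are
    hypothetical).  Tighter accuracy demands -- a smaller target
    width W -- yield a higher index of difficulty and hence a
    longer, gentler movement.
    """
    index_of_difficulty = math.log2(2 * distance / target_width)
    return a + b * index_of_difficulty
```

Doubling the reach distance, or halving the target width, adds exactly one bit of difficulty and therefore a constant b seconds of movement time.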

Many of these synergies refer to types of grasps. This taxonomy roughly follows that of Cutkosky & Howe (1990) as mentioned in §3.1. It turns out that the Jack simulator also follows this taxonomy, and Figure 3.1 portrays the full taxonomy. Our synergy set uses only a subset of these grasps for simplicity.[5]

Certainly, humans possess many more motor synergies relevant to hand actions. However, this set is large enough to make the points we want to convey.

    3.2.2 The Petri net formalism

    Executing schemas, or x-schemas for short, is the name given to our motor control

    representation. In the current design they are modeled as Petri nets. Additionally, there

    is a parameter-passing mechanism which operates in conjunction with the Petri formalism.

    Each x-schema is designed to achieve a given goal (such as obtaining an object) but may

    represent multiple ways of achieving the goal, depending on the world state.

The Petri net formalism (Murata 1989; Reisig 1985) conveniently expresses most of the needed properties for coordination of synergies, including concurrency and asynchrony. A Petri net consists of places and transitions with directed connections between them. Places may represent either perceived states of the world or internal state, and the current state is indicated by the presence of a token. When all of the places with connections to a transition possess tokens[6], the transition is enabled and may fire, which involves consuming those tokens and then depositing a token in each place with connections from the transition. Figure 3.2(a) shows a before-and-after view of the firing of a single transition. Places are drawn as circles, transitions as rectangles, and tokens as solid dots.

All transitions in a Petri net operate in parallel. There is no global clock, nor do firings get serialized. Each transition fires whenever it becomes enabled. In general, the delay between enablement and firing is unpredictable, although the formalism allows specification of probability distributions on the delays. But we do not use this feature here, and in our x-schemas all delays are assumed to be zero.
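These firing semantics are simple enough to sketch directly. The following minimal interpreter is our own illustrative code, not the dissertation's implementation: places hold integer token counts, a transition is enabled when every input place holds at least the arc-weight number of tokens, and enabled transitions fire in arbitrary order with zero delay.

```python
import random

class PetriNet:
    """A minimal Petri net interpreter with zero firing delays."""

    def __init__(self):
        self.marking = {}      # place name -> token count
        self.transitions = []  # (name, inputs, outputs, action)

    def add_transition(self, name, inputs, outputs, action=None):
        # inputs/outputs: dicts mapping place name -> arc weight
        self.transitions.append((name, inputs, outputs, action))

    def enabled(self):
        return [t for t in self.transitions
                if all(self.marking.get(p, 0) >= w for p, w in t[1].items())]

    def run(self, trace=None):
        """Fire enabled transitions (in random order) until none remain."""
        while True:
            ready = self.enabled()
            if not ready:
                return trace
            name, inputs, outputs, action = random.choice(ready)
            for p, w in inputs.items():     # consume input tokens
                self.marking[p] -= w
            if action:
                action()                    # e.g. invoke a synergy
            if trace is not None:
                trace.append(name)
            for p, w in outputs.items():    # deposit output tokens
                self.marking[p] = self.marking.get(p, 0) + w

# The sequencing idiom of Figure 3.2(b): control places between
# transitions force A, B and C to fire in order.
net = PetriNet()
net.marking = {"start": 1}
net.add_transition("A", {"start": 1}, {"p1": 1})
net.add_transition("B", {"p1": 1}, {"p2": 1})
net.add_transition("C", {"p2": 1}, {"end": 1})
```

Because each intermediate place holds the single "control" token, only one transition is ever enabled at a time in this net, so the random firing order still yields the deterministic sequence A, B, C.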

In our use of Petri nets, a transition usually represents an action, namely the execution of a primitive motor synergy. The action occurs exactly when the transition fires. These transitions are depicted with the name of the action inside the rectangle. Sometimes,

[5] A more detailed model of grasping could, if necessary, be made. The grasp types above can be decomposed using the opposition space and virtual finger abstractions of MacKenzie & Iberall (1994).

[6] Generally, only one token is required at each input place. Where more than one token is required, the number is indicated next to the incoming arc.


Figure 3.2: Some common Petri net constructs. (a) A before-and-after depiction of a transition firing. (b) A Petri net which executes A, B and C in sequence. (c) A Petri net which executes A when c1 becomes true, or B when c2 becomes true (or a random choice if both c1 and c2 are already true). (d) A Petri net which executes A and B concurrently (or, at least, asynchronously).

  • CHAPTER 3. EXECUTING SCHEMAS FOR CONTROLLING ACTIONS 29

    though, transitions are needed simply to move tokens without any corresponding action.

    These transitions are left unlabelled.

When assembling these building blocks into working Petri nets, certain patterns of network structure prove particularly useful. For example, sequential actions are a common requirement. The Petri net implementing sequential firing of transitions is depicted in Figure 3.2(b). The places between each pair of transitions serve to pass "control" from one transition to the next. Placing a token in the left-most place, as shown, leads to the action sequence A, B, C and leaves a token in the right-most place. Loops are another common pattern, and are trivial to construct; if the final output arc of Figure 3.2(b) were to connect back to the left-most place, the net would generate the sequence A, B, C, A, B, C, and so on. Another pattern, branching on perceptual conditions, is slightly more complex and is shown in Figure 3.2(c). Separate places encode the mutually exclusive set of conditions, such as "c1" and "c2". When the token is deposited in the start place, only the transition connected to the currently-true condition fires. If none of the conditions is true, the net suspends until one becomes true. If the conditions are not mutually exclusive and more than one is currently true, then one of the two enabled transitions is chosen at random to fire.

Figure 3.2(d) shows how a Petri net can encode concurrency. A transition with no associated action (labelled "||" to indicate concurrency) is used to turn one token at the start place into two tokens. The two tokens simultaneously enable transitions A and B, allowing them to fire simultaneously, or at least in an arbitrary order. Once two tokens arrive in the right-most place, we have a guarantee that both transitions have fired.

Note that some places represent control flow, while others represent input from "outside" the net, i.e. perceptual information, which influences the course of action. This latter type will be discussed shortly.

A simple extension to Petri nets allows weighted connections. A weight on a connection into a transition specifies the number of tokens which must be present in order to enable the transition. A weighted output arc from a transition specifies that multiple tokens are emitted when the transition fires.


Figure 3.3: Translating durative-action transitions into the standard Petri formalism with instantaneous transitions.

    3.2.3 Durative actions

In the Petri formalism, all transition firings are instantaneous, and thus input tokens are effectively consumed at the same time that output tokens are generated. Yet many of the synergies defined in §3.2.1 do not execute instantaneously. In order to represent such durative actions as transitions, we must alter the semantics of transitions from the standard Petri model as follows. When a durative transition fires, it consumes its input tokens immediately, but does not deposit its output tokens until its action completes. Furthermore, we allow a special kind of connection from a place to a durative transition which aborts the action-in-progress should a token become available before completion. These connections are drawn with a flat bar at the "tip".

Translating durative transitions back into standard Petri transitions can be accomplished as shown in Figure 3.3. The translation assumes that the durative action can be redescribed as a pair of instantaneous actions which initiate and abort the action, along with a place which detects when the action is done. In some cases the needed detector is already explicitly modelled for other purposes (for example, detecting contact) but in other cases we would need to model new detectors (such as detecting that muscles have reached


their set point). The translated network in Figure 3.3(b) operates as follows. When a token arrives in place A the "initiate" transition fires, invoking the corresponding synergy and immediately placing a token in the unlabelled place in the center of the figure which indicates that the action is ongoing. In the usual case, the network then waits until completion of the synergy is detected. At this time the unlabelled transition, "noticing" that the action was ongoing but is now done, fires and deposits a token in place B. However, if there exists an "abort" connection and it receives a token before the synergy completes, then the "stop" transition fires instead, aborting the synergy and depositing a token in place B.
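The translation can also be paraphrased as a small state machine. The sketch below (class and method names are ours, not the dissertation's) mirrors Figure 3.3: an instantaneous initiate step, a completion detector, and an abort path, either of which releases the output token.

```python
class DurativeTransition:
    """Sketch of the Figure 3.3 translation of a durative action.

    The durative action is split into an instantaneous 'initiate'
    (consumes the input token and starts the synergy), a 'done'
    detector that releases the output token on completion, and an
    'abort' path that stops the synergy early and releases the
    output token immediately.
    """

    def __init__(self):
        self.state = "idle"  # idle -> ongoing -> finished

    def initiate(self):
        assert self.state == "idle"
        self.state = "ongoing"  # input token consumed, synergy started

    def tick(self, done_detected=False, abort_token=False):
        """One scheduler step; returns True when the output token is emitted."""
        if self.state != "ongoing":
            return False
        if abort_token:               # the abort connection fired first
            self.state = "finished"   # synergy stopped mid-flight
            return True
        if done_detected:             # the completion detector fired
            self.state = "finished"
            return True
        return False
```

A normal run ticks without output until completion is detected; an aborted run emits its token as soon as the abort arrives, which is exactly the behavior the flat-bar connection encodes.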

    3.2.4 Accessing perceived world state

A central feature of an x-schema is that its execution path can be highly context-dependent, by virtue of having special places which receive tokens from external perceptual sources. In practice, these perceptual places represent fairly high-level properties of the world, as opposed to, say, low-level visual details. Separate, unmodelled perceptual mechanisms are assumed to perform the appropriate computations over perceptual and proprioceptive inputs to generate these high-level properties. Such places have slightly non-standard behavior, in that the token is never depleted; if consumed by a transition, it is instantaneously replaced (unless, of course, the world state changes). These percepts are boolean in nature. They are listed below.


List of Perceptual Places

SMALL: object size is small compared to hand

LARGE: object size is large compared to hand

RESISTANCE: detects a very high force opposing arm motion

ELONGATED: object has a large length-to-width ratio

SLIPPING: detects a slipping grip

STABLE: detects that the object is stably supported

GRIPPING: detects that object is already held in grip (any type of grip including palm contact)

AT GOAL: detects whether object is at goal location or orientation

Naturally, x-schema transitions can be enabled by either the truth or falsity of these conditions. To allow this, the model has separate places for both the true and false cases. The need for inhibitory connections is thus removed, and the model also gains the capability of representing the "don't know" condition. It is the (unmodelled) perceptual mechanisms' responsibility to ensure consistency between the two contradictory places.

    Not all of the relevant features of the world are boolean, however. X-schemas will

    need to make calculations, such as computing the force needed to move an object given its

    weight and a desired acceleration. We posit, therefore, a small working memory where such

    values are stored. Features contained in this area include:

    List of Perceptual Features (Quantitative)

    WEIGHT an estimate of the object's weight

    OBJLOC a vector indicating object position
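As a concrete instance of such a calculation, here is a hedged sketch of the force computation mentioned above; the friction model and its constants are illustrative assumptions of ours, not values taken from this chapter.

```python
G = 9.8   # gravitational acceleration, m/s^2
MU = 0.3  # hypothetical coefficient of sliding friction

def slide_force(weight_kg, desired_accel):
    """Force (N) a MOVE ARM invocation must apply to slide an object.

    The arm must overcome tabletop friction (mu * m * g) and then
    supply m * a for the desired acceleration, so the WEIGHT entry
    in working memory directly parameterizes the synergy's force.
    """
    return weight_kg * (MU * G + desired_accel)
```

The point is not the physics but the architecture: quantitative features like WEIGHT live in working memory precisely so that x-schemas can plug them into computations like this one when filling in a synergy's parameters.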


Figure 3.4: The Slide x-schema.

    3.2.5 The Slide x-schema in detail

    Each x-schema corresponds to a single Petri net, and in this section we will examine

    the Slide x-schema in detail. It is shown in Figure 3.4.

The Slide x-schema controls actions which move an object across the surface of a tabletop. It begins when a token is deposited in the "start" place at the left-hand side of the figure. Its first transition essentially copies this token into two output places in preparation for carrying out two concurrent synergies (the || symbol is a reminder of this function). The two concurrent synergies are preshaping of the hand and moving the hand to the object. The preshaping step is conditional on the size of the object, choosing a circular five-fingered grasp if the object is small but a flat palm if the object is large.
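The Slide net's overall control flow can be paraphrased, ignoring true concurrency, as straight-line code. Every name below is a hypothetical stand-in for the corresponding place or synergy in Figure 3.4; the real net runs the preshape and reach concurrently, whereas this flattened sketch simply lists them one after the other.

```python
def slide(percepts, direction, force=None, duration=None):
    """A flattened, sequential paraphrase of the Slide x-schema.

    `percepts` is a dict of the boolean perceptual places (SMALL,
    LARGE, ...); the synergy "calls" just record what would fire.
    """
    fired = []
    # Concurrent branch 1: preshape, conditioned on object size.
    fired.append("PRESHAPE_GRASP" if percepts["SMALL"] else "PRESHAPE_PALM")
    # Concurrent branch 2: move the hand to the object.
    fired.append("MOVE_ARM_TO(objloc)")
    # Join (the arc of weight 2), then grip and slide.
    fired.append("APPLY_HAND")
    fired.append(f"MOVE_ARM({direction}, {force}, {duration})")
    return fired
```

What the paraphrase loses, of course, is exactly what the Petri net provides: the preshape and the reach proceed in parallel, and the weight-2 join guarantees both are finished before the hand is applied.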

Only when both actions have completed (enforced by an arc with weight of 2) does the x-schema proceed to the next step, actually applying the hand to the object using the preshaped grip. The APPLY HAND transition outputs two tokens. One of them fires the MOVE ARM transition which engages the arm in the continuous horizontal motion which is the central action of the x-schema. The direction of motion parameter is externally specified. The force of motion may be externally specified, or may be computed from the desired acceleration and the estimated weight of the object. The duration of the movement may be externally specified, but otherwise a value is computed which is likely to lan

