THEORY OF AUTOMATA AND ITS APPLICATION TO PSYCHOLOGY
P. Suppes
Contents
Lecture 1. Introduction (language).
Lecture 2. Register machines.
Lecture 3. Models for the performance of elementary arithmetic
           computations. Linear regression models. Automaton models.
           Register machines.
Lecture 4. Language processing. Classical model-theoretic semantics.
           Model-theoretic semantics for context-free languages.
           Application to "Erica".
Lecture 5. From theory to corpus. Procedural semantics. The
           complexity of semantics.

Institute for Mathematical Studies in the Social Sciences, Stanford
University, Stanford, Calif. 94305, U.S.A.
Lecture 1
1.1 INTRODUCTION
The first three lectures will deal with language processing. The
objective is to understand natural language, especially that of
children. The emphasis is on grammar and semantics, with the main
accent on semantics. The semantics will mainly be concerned with
model-theoretic semantics. By model-theoretic semantics is meant
the set-theoretical account of the meaning of a sentence that
makes use of standard techniques of modern logic and mathematics.
Another topic will be procedural semantics. The central question
here is how a semantic tree actually can be computed. The last
two lectures will deal with process models of young students'
learning and performing of elementary mathematics.
1.2
The ideas underlying model-theoretic semantics in mathematical
logic go back to Frege; they have been developed especially by
the work of Tarski and his students, among whom I mention Montague,
who contributed substantially to the application of model
theory to natural language. Most of model theory is not concerned
with natural language. The semantical approaches developed by
linguists or others whose viewpoint is that of generative grammar
have been lacking in the formal precision of model-theoretic
semantics. My objective is to combine the viewpoint of model-theoretic
semantics on the one hand and generative grammar, especially
the work of Chomsky, on the other. The tools of model theory
will be brought to bear on the analysis of natural language.

(With respect to the responsibility for the contents of these
lectures see page VII. Summarized by L. Noordman, H. Schouwenburg.)
A number of examples will give you an intuitive idea of the
application of model theory. Let us take as an instance the simple
noun phrase square table. That noun phrase can be represented by a
simple tree,
          NP
         /  \
      Adj    N
       |     |
   square  table          Fig. 1.1

abbreviations: NP  = noun phrase
               N   = noun
               Adj = adjective,
called a derivation tree of a particular noun-phrase grammar. This
grammar has (except for insertion of lexical items) as yet only one
production rule: NP → Adj + N. How is the semantics of this noun
phrase to be formalized? With each production rule of the grammar
there is associated a semantic function, and thus we may convert
each derivation tree for a given terminal utterance to a semantic
tree by assigning a denotation to each node of the tree. The
denotation of each node of the tree is shown after the colon
following the label of the node.
            NP : S ∩ T
           /          \
      Adj : S        N : T
        |              |
   square : S     table : T          Fig. 1.2
The semantic tree is obtained by the assignment of denotations, in
this case S for the set of square things, T for the set of tables.
The denotation for Adj or N does not change as we come up the
tree. The denotation for the noun phrase is simply the intersection
of S and T. The semantic tree uses just the intersection
and identity functions. The word order in the corresponding French
example table carrée is reversed:
            NP : S ∩ T
           /          \
        N : T       Adj : S
          |            |
    table : T   carrée : S          Fig. 1.3
The root of the tree, the denotation, is just the same. The
semantic part of the tree is more like the English than the
grammatical surface.
In order to study how to analyse denotation, an apparatus will
be introduced that is a modification of the classical semantic
theory. It is important to note that grammar and semantics go hand
in hand: the semantic analysis is not a separate analysis, it is
built on the same tree.
The noun phrase the square table is a slightly more elaborate
and complicated example. I will give an analysis of the definite
article in this example. A grammar with the following rules:

    NP   → AdjP + N
    AdjP → AdjP + Adj
    AdjP → Adj
    AdjP → Det

abbreviations: AdjP = adjectival phrase
               Det  = determiner

yields this tree:

                  NP
                 /  \
             AdjP    N
            /    \    |
        AdjP    Adj   |
          |      |    |
         Det     |    |
          |      |    |
         the  square table          Fig. 1.4
The denotation of this noun phrase presents a problem: the and
square have to be put together, but it is much more natural to
think of the modifying square table as expressed in the grammatical
tree:

               NP : φ(C, S ∩ T)
              /              \
       Det : C           NP' : S ∩ T
         |               /          \
         |          Adj : S        N : T
         |            |              |
      the : C    square : S     table : T          Fig. 1.5
The denotation of the is C: the set of objects in the circumstances,
as determined by the context in which the phrase is
used. The semantic function for putting Det and NP' together is a
function φ(C, S ∩ T) defined as follows:

    φ(C, S ∩ T) = C ∩ S ∩ T   if |C ∩ S ∩ T| = 1, i.e., the intersection of C, S and T if the cardinality of C ∩ S ∩ T is 1,
    φ(C, S ∩ T) = e           if |C ∩ S ∩ T| = 0, i.e., the function is empty if the cardinality is zero,
    φ(C, S ∩ T) = f           if |C ∩ S ∩ T| > 1, i.e., the function is too full if the cardinality is greater than 1,

where e and f are two abstract Fregean objects.
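As a concrete illustration, the definite-article function can be sketched in a few lines of Python; the sample domain below (with exactly one square table) is hypothetical, and e and f are represented by arbitrary tag strings:

```python
E = "e"  # abstract Fregean object: the function is empty
F = "f"  # abstract Fregean object: the function is too full

def phi(C, X):
    """Denotation of 'the' applied to context C and a noun-phrase
    denotation X: the unit set C ∩ X if it has exactly one member,
    otherwise one of the two abstract Fregean objects."""
    common = C & X
    if len(common) == 1:
        return common
    return E if len(common) == 0 else F

# Hypothetical model: S = square things, T = tables, C = context.
squares = {"t1", "b1"}
tables = {"t1", "t2"}
context = {"t1", "t2", "b1"}
print(phi(context, squares & tables))  # {'t1'}: the unique square table
```

Applied to an empty or many-membered intersection, the same function returns e or f respectively.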
Not all noun phrases can be analysed in this way. One can find
many counter-examples. We have to deal especially with two kinds
of counter-examples for the analysis of NP:
the alleged dictator. The denotation of this NP
is not the intersection of the set of alleged things and the
set of dictators.
two flowers. The denotation of this NP is not the intersection
of two and the set of flowers.
A third example, dealing with adverbs, is: the very tall man. The
adverb very is an intensifier. The denotation of very tall cannot
be represented by the intersection of very and tall. Another
interesting example is two red flowers. One could add a new rule
AdjP → Car to the grammar of noun phrases given earlier, Car
standing for cardinal. But this would give the same problem as in
the example the square table. The following grammar can be
devised:
    NP   → Car + NP'
    NP'  → AdjP + N
    AdjP → AdjP + Adj
    AdjP → Adj

It is to be noted that NP' is used in order to avoid recursion
on cardinal numbers that would allow phrases like two
three flowers.
To this grammar has to be added the Frege-Russell definition
of 2: 2 is the set of all pair sets. The tree for two red flowers
is:
is:
NP : 2 np(RnF)
/~ Car f\ip t
: RnF
I" AdjP : R N : F
+.R I two: 2 red: R flowers r F 1.9. 1.6
What about the semantics for the special rule NP → Car + NP'?
P(R ∩ F) is the power set of R ∩ F, i.e., the family of all subsets of
R ∩ F. The denotation of the root NP is 2 ∩ P(R ∩ F):
the intersection of the set of all pair sets and the power set
P(R ∩ F).
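Over a small finite domain this denotation can be computed directly. The sketch below, with a hypothetical domain of red things and flowers, enumerates P(R ∩ F) and keeps exactly the pair sets:

```python
from itertools import combinations

# Hypothetical domain: R = red things, F = flowers.
red = {"r1", "r2", "r3", "b1"}
flowers = {"r1", "r2", "r3", "d1"}

rf = red & flowers  # R ∩ F

# P(R ∩ F): the family of all subsets of R ∩ F.
power_set = [set(s) for n in range(len(rf) + 1)
             for s in combinations(sorted(rf), n)]

# "2" is the Frege-Russell class of all pair sets; over a finite
# domain its intersection with P(R ∩ F) is just the two-element
# subsets of R ∩ F.
denotation = [s for s in power_set if len(s) == 2]
print(denotation)  # every pair of red flowers in the domain
```

With three red flowers in the domain, the denotation contains the three possible pair sets.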
It is not suggested that this model is a model of what is in
the child's head. A child, a speaker, or a listener of English is
not examining in any sense the entire set 2 or the large set
P(R ∩ F): these rules are not meant to be a part of procedural
semantics.
The analysis given so far does not deal with all problems of
semantics. It has to be kept in mind that we are at the moment
dealing only with extensional semantics. Later on we will talk
about intensional semantics. In order to illustrate the difference
we can use a famous example of Frege's:
    premiss 1:   9 = the number of planets
    premiss 2:   it is mathematically or logically necessary that 9 > 7
    conclusion:  it is mathematically or logically necessary that
                 the number of planets is greater than 7

The problem is that what is called in premiss 1 an extensional
identity has been used for substitution into a non-extensional
context.
As an illustration of the application of the analysis to
sentences, let us take the example: the red ball hit John. The
grammar has the following production rules:

    S  → NP + VP
    NP → as before
    VP → TV + NP
    NP → PN

where S stands for sentence, VP for verb phrase, TV for transitive
verb, and PN for proper name. The derivation tree looks like:

                  S : F(φ(C, R ∩ B), H''J)
                 /                        \
        NP : φ(C, R ∩ B)              VP : H''J
         /          \                 /        \
    Det : C     NP' : R ∩ B       TV : H     NP : J
       |         /        \          |          |
       |    AdjP : R     N : B       |      PN : J
       |       |           |         |          |
       |    Adj : R        |         |          |
       |       |           |         |          |
    the : C  red : R   ball : B   hit : H   John : J          Fig. 1.7
The interesting part of the tree is the denotation of each node. H
is a binary relation denoted by the two-place predicate hit. H''J
is the converse of H, restricted to J, denoting in this way a unit
set (in the present case). The denotation of S is the Frege
function:

    F(A,B) = true   if A ⊆ B,
             false  otherwise.

(In the notation H''J, we would be more explicit if we wrote
H''{J}, but I identify unit sets with their members, i.e., {J} =
J.)
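A minimal sketch of the whole evaluation, under a hypothetical domain and hit-relation, with phi as defined earlier for the definite article:

```python
def frege_F(A, B):
    """Frege function: true iff A is a subset of B."""
    return A <= B

def phi(C, X):
    """Definite-article function: unit set C & X, or the tags e/f."""
    common = C & X
    if len(common) == 1:
        return common
    return "e" if not common else "f"

# Hypothetical model for "the red ball hit John".
red = {"ball1"}
balls = {"ball1", "ball2"}
context = {"ball1", "ball2"}
john = "John"
H = {("ball1", "John")}  # hit-relation: pairs (x, y) with x hit y

# H''J: the converse of H restricted to J -- everything that hit John.
hit_john = {x for (x, y) in H if y == john}

# The sentence is true iff the denotation of the subject NP is a
# subset of the denotation of the VP.
print(frege_F(phi(context, red & balls), hit_john))  # True
```

In this model the unique red ball is the one thing that hit John, so the Frege function returns true.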
What I now would like to do is to give a simple example from
actual data taken from a corpus collected by Roger Brown and his
associates. That corpus is known as Adam I, i.e., the early years
of a boy Adam who is about 24 months old. I shall give a noun-phrase
grammar for Adam I. Instead of using simplicity as the
criterion for a grammar of a corpus, I prefer a much more
conservative criterion. Without giving an account of the mechanisms
by which a child is generating the speech, we can probabilistically
predict the utterance types by assigning probabilistic
parameters to the various production rules. One of the byproducts
one can expect from this is a pretty good account of the length
of the utterances. The grammar for the noun phrase of Adam I has
7 production rules. The probability of using each rule in a
derivation as well as the set-theoretical semantic functions are
indicated for each rule.
Noun-Phrase Grammar for Adam I

    Production Rule        Probability   Semantic Function
1.  NP   → N               a1            Identity
2.  NP   → AdjP            a2            Identity
3.  NP   → AdjP + N        a3            Intersection
4.  NP   → Pro             a4            Identity
5.  NP   → NP + NP         a5            Choice function
6.  AdjP → AdjP + Adj      b1            Intersection
7.  AdjP → Adj             b2            Identity
Rule 5, the choice function, is a rule to express possession. Pro
stands for pronoun. From a probabilistic viewpoint, the grammar
has five independent parameters, because the a's sum to one and
the b's sum to one. These parameters may be estimated from the
data by a maximum likelihood method. The basic grammatical data
are shown in the table. The first column gives the types of noun
phrases actually occurring in the corpus in decreasing order of
frequency. Some obvious abbreviations are used to shorten notation:
A for Adj, P for Pro. The grammar defined generates an infinite
number of utterances, but, of course, all except a small
finite number have a small probability of being generated. The
second column lists the numerical observed frequencies of the
utterances (with immediate repetition of utterances deleted from
the frequency count). The third column lists the theoretical or
predicted frequencies when a maximum-likelihood estimate of the
five parameters is made. The fourth column lists the observed
frequency with which the "standard" semantic function shown above
seems to provide the correct interpretation for the five most
frequent types.
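The generative side of such a probabilistic grammar is easy to sketch. In the fragment below the parameter values are illustrative placeholders, not the maximum-likelihood estimates for Adam I; each rewrite step samples one of the production rules with its assigned probability:

```python
import random

A = [0.55, 0.05, 0.08, 0.14, 0.18]   # a1..a5, sum to one
B = [0.10, 0.90]                     # b1, b2, sum to one

def rewrite_np(rng):
    """Expand NP by sampling rules 1-5; returns a list of N/A/P symbols."""
    r = rng.choices(range(5), weights=A)[0]
    if r == 0:
        return ["N"]                                # NP -> N
    if r == 1:
        return rewrite_adjp(rng)                    # NP -> AdjP
    if r == 2:
        return rewrite_adjp(rng) + ["N"]            # NP -> AdjP + N
    if r == 3:
        return ["P"]                                # NP -> Pro
    return rewrite_np(rng) + rewrite_np(rng)        # NP -> NP + NP

def rewrite_adjp(rng):
    """Expand AdjP by sampling rules 6-7."""
    if rng.choices([0, 1], weights=B)[0] == 0:
        return rewrite_adjp(rng) + ["A"]            # AdjP -> AdjP + Adj
    return ["A"]                                    # AdjP -> Adj

rng = random.Random(0)
sample = ["".join(rewrite_np(rng)) for _ in range(10)]
print(sample)
```

With parameters like these, short types such as N and P dominate the sample, which is the qualitative pattern of the frequency table below; fitting the five free parameters to the observed type frequencies is what the maximum-likelihood step does.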
The fit with the data is far from perfect. I only want to
illustrate the method. The point is that if we can produce a grammar
that also has semantic functions that we are satisfied with,
then this fit is a direct way of evaluating the grammar.
The number of parameters in the analysis of a whole corpus is
usually very large. One thing we are trying to do is to reduce
the number of parameters. The other thing, which is linguistically
much more interesting, is the use of probabilities for probabilistic
disambiguation: to select one of two trees as the denotation
of an ambiguous sentence.
TABLE
Probabilistic Noun-Phrase Grammar for Adam I

Noun      Observed    Theoretical   Stand. semantic
phrase    frequency   frequency     function
N           1445        1555.6        1445
P            388         350.1         388
NN           231         113.7         154
AN           135         114.0          91
A            114         121.3         114
PN            31          25.6
NA            19           8.9
NNN           12           8.3
AA            10           7.1
NAN            8           8.3
AP             6           2.0
PPN            6            .4
ANN            5           8.3
AAN            4           6.6
PA             4           2.0
NNA            3            .7
APN            3            .1
AAA            2            .4
APA            2            .0
NPP            2            .4
PAA            2            .1
PAN            2           1.9
Lecture 2
2.1 REGISTER MACHINES
The formal presentation of the model-theoretic semantics which
we were talking about in the previous lecture will be given
after the presentation of register machines. Register machines,
which have a set of instructions, are intuitively much closer in
conceptual organization to actual computers than are Turing
machines. It is my hope that they can be applied in psychological
research by introducing psychologically appropriate instructions.
We begin with some notation:
<n> is the content of register n before carrying out an
instruction,
<n'> is the content after carrying out an instruction.
Definition: An Unlimited Register Machine (URM) has
(i) a denumerable sequence of registers numbered 1, 2, 3, ...,
each of which can store any natural number, but each program
uses only a finite number of registers, and
(ii) six basic instructions:
    (a) P(n)      add 1:            <n'> = <n> + 1,
    (b) D(n)      subtract 1:       <n'> = <n> - 1 if <n> ≠ 0,
    (c) O(n)      clear register:   <n'> = 0,
    (d) C(m,n)    copy from register m to n:   <n'> = <m>,
    (e) J[L1]     jump to label L1,
    (f) J(n)[L1]  jump to label L1 if <n> = 0.
In programs we use (e) and (f) in the form J[3]: jump to the line
labeled (3). If the jump is to a nonexistent labeled line, the
machine stops. Thus L1 is a variable for the labeled line to jump
to. (The six instructions (a)-(f) are not a minimal set.)
As an example, the following program puts "half" of the number that
is in register 1 in register 2, and the remainder in register 3.
1. J(1)[8]   Go to line 2 if <1> ≠ 0; else to line 8 (i.e., stop)
2. D(1)      Subtract 1 from the number in register 1
3. P(2)      Add 1 to the number in register 2
4. J(1)[8]
5. D(1)
6. P(3)
7. J[1]      Jump to line 1
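A sketch of a URM interpreter running the halving program above; encoding the instructions as tuples and using line numbers as jump targets is my own convention, not part of the definition:

```python
def run(program, registers):
    """program: list of tuples (op, *args); registers: dict {n: value}."""
    i = 0                                       # current line, 0-indexed
    while 0 <= i < len(program):
        op, args = program[i][0], program[i][1:]
        if op == "P":                           # P(n): add 1
            registers[args[0]] = registers.get(args[0], 0) + 1
        elif op == "D":                         # D(n): subtract 1 if <n> != 0
            if registers.get(args[0], 0) != 0:
                registers[args[0]] -= 1
        elif op == "O":                         # O(n): clear register
            registers[args[0]] = 0
        elif op == "C":                         # C(m,n): copy <m> into n
            registers[args[1]] = registers.get(args[0], 0)
        elif op == "J":                         # J[L]: unconditional jump
            i = args[0] - 1
            continue
        elif op == "JZ":                        # J(n)[L]: jump if <n> = 0
            if registers.get(args[0], 0) == 0:
                i = args[1] - 1
                continue
        i += 1
    return registers                            # jumping past the end halts

# The halving program from the text: "half" of <1> goes to register 2,
# the remainder to register 3.
halve = [
    ("JZ", 1, 8),   # 1. stop (jump past the end) if <1> = 0
    ("D", 1),       # 2. subtract 1 from register 1
    ("P", 2),       # 3. add 1 to register 2
    ("JZ", 1, 8),   # 4. stop if <1> = 0
    ("D", 1),       # 5.
    ("P", 3),       # 6.
    ("J", 1),       # 7. jump back to line 1
]
print(run(halve, {1: 5}))  # {1: 0, 2: 3, 3: 2}
```

Jumping to the nonexistent line 8 falls outside the program and halts the machine, exactly as the definition requires.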
The definition of a program of a URM is as follows:
A program for a URM is a finite sequence of instructions any one
of which may be labeled. It can be proved that any partial recursive
function can be computed by a URM by considering how a partial
recursive function is defined (Shepherdson and Sturgis, 1963).
Many equivalences are known:
A number-theoretic function is computable by a Turing machine
    iff it is partial recursive,
    iff it is lambda-definable,
    iff it is computable by a register machine.
We now will talk about a URM over an arbitrary finite alphabet
V and about partial recursive functions over such an alphabet.
The set V is finite, and as usual V* is the set of all finite
sequences of elements of V; in the present context, we shall call
the elements of V* words. In the following, variables a_i range
over V, and variables x, y, x_1, ..., x_n, etc., over V*. Also,
xa_i is the concatenation of x and a_i. The definition of a partial
recursive function is in this case as follows:
Definition: A function is partial recursive iff it can be obtained
from the functions of I, II, and III by a finite number of
applications of IV, V and VI.
I.   S_{a_i}(x_1) = x_1 a_i   (Successor Function)
II.  O^n(x_1, ..., x_n) = Λ   (Constant Function, Λ = empty string)
III. U^n_i(x_1, ..., x_n) = x_i   (Projection Function)
IV.  If h, g_1, ..., g_m are partial recursive, so is the function
     f such that f(x_1, ..., x_n) = h(g_1(x_1, ..., x_n), ...,
     g_m(x_1, ..., x_n))   (Composition)
V.   If g, h_i are partial recursive, so is the function f such
     that f(Λ, x_2, ..., x_n) = g(x_2, ..., x_n),
     f(z a_i, x_2, ..., x_n) = h_i(z, f(z, x_2, ..., x_n), x_2, ..., x_n),
     i = 1, ..., s.   (Primitive Recursion)
VI.  Let μ_i y mean the shortest word composed just of a_i. If g is
     partial recursive, so is the function f such that
     f(x_1, ..., x_n) = μ_i y [g(x_1, ..., x_n, y) = Λ].
In other words, the value of the function f(x_1, ..., x_n) for the
arguments x_1, ..., x_n is the shortest word y composed just of a_i
such that g(x_1, ..., x_n, y) = Λ.
We define in a fashion as before a URM over V. The basic
instructions can be reduced to three:
(a') P^(i)(n)       place a_i on the right-hand end of <n>,
(b') D(n)           delete the leftmost letter of <n> if <n> ≠ Λ,
(c') J^(i)(n)[L1]   jump to label L1 if <n> begins with a_i.
A machine like this can generate or recognize a type-0 phrase-structure
grammar.
I want to examine some simple examples that illustrate the
general remarks, and also try to say in what sense we can regard
the production rules of a grammar as the primitive instructions
of some register machine.
A simple illustration of what I have in mind in talking about
proving that a program is correct can be given by using a very
simple stack machine, proposed by a student of mine, Alex Cannara,
and being investigated by him as a teaching device. The
simple instructions will be almost self-explanatory. Thus the
first line of the program is PUSH X, which places X at the top of
the pushdown stack. The second, SQUARE, pushes X down and places on
the top of the stack X². Instructions like ADD and MULTIPLY operate
on the top two expressions on the stack, and each places the result
on the top of the stack. The instruction SUBTRACT subtracts the
second item in the stack from the top item, and places the result
on the top of the stack. Finally DIVIDE divides the top item on the
stack by the second item and of course places the result on the top
of the stack. Now let me write a simple program to compute
(X² − Y²)/2XY. In the right-hand column I show the content of the
top of the stack; for more general purposes it would be necessary
to keep track of the content at each line of all the positions in
the stack.
PROGRAM       NEXT ITEM IN STACK
PUSH X        X
SQUARE        X²
PUSH Y        Y, X²
SQUARE        Y², X²
SUBTRACT      Y² − X²
PUSH 2        2, Y² − X²
DIVIDE        2/(Y² − X²)
PUSH X        X, 2/(Y² − X²)
MULTIPLY      X(2/(Y² − X²))
PUSH Y        Y, X(2/(Y² − X²))
MULTIPLY      Y(X(2/(Y² − X²)))
PUSH -1       -1, Y(X(2/(Y² − X²)))
DIVIDE        -1/(Y(X(2/(Y² − X²))))
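The machine itself is easy to sketch. The tuple encoding of instructions below is an assumption of mine, but the operand conventions follow the text: SUBTRACT and DIVIDE take the top item first.

```python
def execute(program, env):
    """Run a list of stack-machine instructions; env maps variable
    names to numbers. Returns the top of the stack."""
    stack = []
    for line in program:
        op = line[0]
        if op == "PUSH":
            v = line[1]  # a variable name or a numeric literal
            stack.append(env.get(v, v) if isinstance(v, str) else v)
        elif op == "SQUARE":
            stack.append(stack.pop() ** 2)
        elif op == "ADD":
            stack.append(stack.pop() + stack.pop())
        elif op == "MULTIPLY":
            stack.append(stack.pop() * stack.pop())
        elif op == "SUBTRACT":              # top minus second
            top, second = stack.pop(), stack.pop()
            stack.append(top - second)
        elif op == "DIVIDE":                # top divided by second
            top, second = stack.pop(), stack.pop()
            stack.append(top / second)
    return stack[-1]

# The program from the table above.
program = [("PUSH", "X"), ("SQUARE",), ("PUSH", "Y"), ("SQUARE",),
           ("SUBTRACT",), ("PUSH", 2), ("DIVIDE",), ("PUSH", "X"),
           ("MULTIPLY",), ("PUSH", "Y"), ("MULTIPLY",), ("PUSH", -1),
           ("DIVIDE",)]
x, y = 3.0, 1.0
print(execute(program, {"X": x, "Y": y}))  # equals (x**2 - y**2)/(2*x*y)
```

Running the interpreter on the program reproduces the value of (X² − Y²)/2XY, which is exactly the identity the correctness proof below has to establish.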
To show that this simple program is correct we would need to
prove the trivial theorem:
    -1/(Y(X(2/(Y² − X²)))) = (X² − Y²)/2XY,
which is obvious to us, but well might not be to a 13-year-old
student of algebra. Moreover, in the special case of rational
functions, e.g., there is an algorithm for deciding if two
expressions denote the same function. On the other hand, in
general the proof of correctness of a program cannot be given by an
algorithm, and the proof of correctness can be non-obvious.
In several of our informal discussions I have tried to make
the point that the concept of a procedure and the concept of
proving the procedure correct have well-anchored historical roots
in Greek mathematics. In Euclid, for example, constructions are
procedures. The surprising fact is the clear and explicit
recognition by Greek mathematicians that constructions and proofs of
their correctness are given a different status - are classified
separately - from the proofs of propositions. Euclid's constructions
may be regarded as a process model of geometry. The rigorous
examination of the foundations of geometry at the end of the
nineteenth century by Hilbert and others took an unfortunate turn
in obliterating this distinction. In Hilbert's axiomatization of
geometry and all the others of the same period, constructions were
eliminated as formal objects to be studied explicitly. In other
words, no explicit axioms about constructions were given, although
intuitive talk about constructions was frequent. A good
example would be Hilbert's discussion of constructive geometry
over a Pythagorean field at the end of his book on geometry. (A
field is Pythagorean if for each a and each b in the field there
is a c in the field such that a² + b² = c².) In terms of the
modern history of procedures prior to the advent of computers, the
explicit consideration of mathematical proofs as objects, especially
in Hilbert's proof theory, was a fundamental step, one not
taken by the ancient Greeks.
I now turn to the second point I wanted to pick up from our
earlier construction. This concerns the sense in which a grammar
may be looked at as defining procedures or programs. I shall look
at the question from the standpoint of register machines and try
interpreting the production rules of the grammar as the primitive
instructions of the machine. (My examples will be simple context-free
grammars but the approach is equally applicable to any
phrase-structure grammar.) If we attempt to apply the simple
interpretation just described, we at once encounter a difficulty.
To illustrate it, let us consider the following simple grammar.
Production Rules      Instructions
S  → NP + VP          S
NP → N                NP1
NP → Adj + N          NP2
VP → TV + NP          VP
Each production we interpret as an instruction of a machine with
a single register. (We shall generate or parse only one sentence
at a time.) Thus, the first production now reads: "Replace S by
NP VP" and is denoted by S. We also have an initial LOAD instruction
for placing a nonterminal symbol in the register. Now let us
look first at the derivation in the grammar of the sentence
cows eat grass. For brevity I shall derive only the grammatical
structure and not make the final lexical substitutions. Thus we
begin the derivation on the left and the attempted program on the
right.
Derivation            Program
1. S                  LOAD S
2. NP VP              S
3. NP TV NP           VP
4. Adj N TV NP        ?
At line 4 of the derivation we do not have a correct instruction
to add to our program on the right. It is clear from the derivation
that NP2 is used, but if we write NP2 as our instruction on
the right, we do not know, looking simply at the register machine,
whether after the execution of the instruction the content of the
register is Adj N TV NP or NP TV Adj N, unless of course we apply
NP2 to both occurrences of NP, which would be an unambiguous
instruction, but not the one we want. Consequently without some
minor changes there is a difficulty with such a direct interpretation
of the production rules. The minor change we must make is to
add an index to each instruction. For example, NP1(i) means that
instruction NP1 is to be applied to the ith occurrence of NP in
the register reading from left to right. Thus, our four instructions
are now S(i), NP1(i), NP2(i) and VP(i), and the program
corresponding to the derivation reads:
LOAD S
S(1)
VP(1)
NP2(1)
NP1(1)
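The indexed instructions can be sketched as rewriting operations on a single register holding a string of symbols; the encoding below is illustrative:

```python
# Each instruction names a production rule: (left-hand symbol, expansion).
RULES = {
    "S":   ("S",  ["NP", "VP"]),
    "NP1": ("NP", ["N"]),
    "NP2": ("NP", ["Adj", "N"]),
    "VP":  ("VP", ["TV", "NP"]),
}

def apply(register, instruction, i):
    """Apply RULES[instruction] to the i-th occurrence (1-indexed,
    reading left to right) of its left-hand symbol in the register."""
    lhs, rhs = RULES[instruction]
    seen = 0
    for pos, sym in enumerate(register):
        if sym == lhs:
            seen += 1
            if seen == i:
                return register[:pos] + rhs + register[pos + 1:]
    raise ValueError(f"no {i}-th occurrence of {lhs}")

reg = ["S"]                       # LOAD S
for inst, i in [("S", 1), ("VP", 1), ("NP2", 1), ("NP1", 1)]:
    reg = apply(reg, inst, i)
print(reg)  # ['Adj', 'N', 'TV', 'N']
```

With the index, NP2(1) unambiguously rewrites the leftmost NP, which is exactly the distinction the unindexed instruction could not express.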
If we want to have a general register machine for generating or
parsing any language that is over a vocabulary V and that has a
phrase-structure grammar (not necessarily context-free), we can
use the general register machine over V described earlier and
write subroutines for each of the production rules of the grammar,
which are by definition of a phrase-structure grammar not
only partial recursive but primitive recursive.
Finally, I note that the usual definition of a language L(G)
generated by a phrase-structure grammar G (a string of terminal
words is in L(G) if it is derivable from S in G) can be replaced
by the closely associated definition: a string of terminal words
is in L(G) if there is a program with a single initial LOAD
instruction in the form LOAD S and the content of the register of
the machine is the string after the program is run.
Lecture 3
Today I shall talk about process models for students' performance
in elementary arithmetic. Process models in this domain can
be very much simpler than process models of language performance.
Elementary mathematics provides one of the best examples as an
introduction to detailed process models. Mathematics is a universal
set of skills; all children in the world are taught the
skills embodied in arithmetic. There has been historically a very
large literature on the teaching of elementary mathematics.
Beginning in about 1910 there is a very good empirical literature
on how students come to perform the algorithms of arithmetic. The
most influential single person is perhaps Edward Thorndike, who
published Educational Psychology (1913), The Psychology of
Arithmetic (1922), and The Psychology of Algebra.
Thorndike tried to give a theoretical account of what the students
are learning in terms of his concept of bonds, which is
like conditioning, and in terms of the law of effect. At that
time there was not a close connection between the psychological
theory and the formal theory of algorithms; the theory of
algorithms in a mathematical sense was not very well developed. A
very large number of empirical studies was made by many
investigators in the 1920's. They were interested in empirical
questions and not so much in psychological theory. The same was
true for Brown and others in the 1930's. Brown, e.g., did very nice
studies on subtraction. The tradition of psychologists being
interested in these topics stopped around 1930 with people like
Hull and Tolman.
At Stanford a few years ago we started the study of students'
performance in elementary mathematics in a computer setting.
Students are presented exercises at a teletype terminal. Two
measures of students' performance have been used: response
probability - the probability of a correct answer - and response
latency. I will only talk about response probability. For simplicity I
shall restrict the analysis to column addition, although the
methods are extended also to other kinds of arithmetic operations.
Three kinds of models in this domain will be reviewed: linear
regression models, automaton models and register machines.
3.1 LINEAR REGRESSION MODELS
The first question was to identify the structural features of
problems or exercises that will predict their relative difficulty.
These features are the independent variables in the regression
models. Let p_i be the observed proportion of correct
responses on exercise i; let f_ij be the value of feature j on
problem i, and let α_j be the regression coefficient for feature j. A
linear regression equation is

    p_i = Σ_j α_j f_ij + α_0

In order to preserve the probability as the structural features
are combined, the following transformation was made:

    Z_i = log((1 − p_i)/p_i)

The regression model we used is

    Z_i = Σ_j α_j f_ij + α_0
The additions had the form:

    a b c
    d e f

Lower-case letters are used for single digits. There are
approximately 10^6 such problems. These have to be reduced to a small
number of independent variables (structural features). The features
were identified and quantified independently from the data.
The following structural features were defined for column
addition.
SUMR is the number of columns in the largest addend. For
three-row exercises SUMR is defined as 1.5 times the number of
columns, plus .5 if a column sum is 20 or more. For example, a
one-column two-row exercise has SUMR = 1; a one-column three-row
exercise has SUMR = 1.5 if its column sum is less than 20, and
SUMR = 2 if the sum is 20 or more.
The second structural feature is CAR, which represents the number
of times the sum of a column exceeds nine. For example, a
one-column exercise has CAR = 0; a two-column two-row exercise
with digits b and d in the ones column has CAR = 0 if b + d ≤ 9
and CAR = 1 otherwise; and a two-column three-row exercise with
b, d, f in the ones column and a, c, e in the tens column has
CAR = 1 if b + d + f ≤ 9 and a + c + e > 9, and CAR = 2 if
b + d + f > 9 and a + c + e > 9.
The third structural feature is VF: vertical format. The vertical
exercises with a one-digit response were given the value 0.
Multicolumn exercises with multidigit responses and one-column
addition exercises with a response of 11 were given the value 1.
One-column addition exercises with a multidigit response other
than 11 were given the value 3. For example, a one-column exercise
with a one-digit answer has VF = 0; a one-column exercise with a
multidigit answer other than 11 has VF = 3; and a multicolumn
exercise such as abc + ghi has VF = 1.
The following regression equation was obtained for the mean
response data of 63 students:

    Z_i = .53 SUMR_i + .93 CAR_i + .31 VF_i − 4.06

The multiple R was .74 and R² = .54.
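A sketch that inverts the fitted equation to predict success probability, assuming the transformation Z_i = log((1 − p_i)/p_i); the example feature values and the helper name predicted_p are hypothetical:

```python
import math

def predicted_p(sumr, car, vf):
    """Predicted proportion correct, assuming Z = log((1 - p)/p)."""
    z = 0.53 * sumr + 0.93 * car + 0.31 * vf - 4.06
    return 1.0 / (1.0 + math.exp(z))   # inverts Z = log((1 - p)/p)

# A hypothetical two-column two-row exercise with one carry and a
# multidigit response:
print(round(predicted_p(sumr=2, car=1, vf=1), 3))  # 0.853
```

As the structural features grow, Z grows and the predicted probability of a correct response drops, which is the intended reading of the positive coefficients.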
The regression models are not process models. They do not provide
even a schematic analysis of the algorithmic steps the student
uses to find an answer. They are only rough-and-ready models that
can be used if one does not understand the processes very well.
Automaton models, on the other hand, are process models. Their
use represents a natural extension of the regression analysis.
3.2 AUTOMATON MODELS
For the exercises in column addition we may restrict ourselves
to finite automata. We will start with a definition of deterministic
automata.
Definition: A structure U = <A, V_I, V_O, M, Q, s_0> is a finite
(deterministic) automaton if and only if
(i) A is a finite, nonempty set (the set of the states of the
automaton),
(ii) V_I is a finite, nonempty set (the input vocabulary) and
V_O is a finite, nonempty set (the output vocabulary),
(iii) M is a function from the Cartesian product A × V_I to A
(M defines the transition table; this tells how the machine
works),
(iv) Q is a function from the Cartesian product A × V_I to V_O
(Q is the output function),
(v) s_0 ∈ A (s_0 is the initial state).
(i) and (ii) are the finiteness conditions.
As an example, we may characterize an automaton that will perform
two-row column addition:

    A = {0, 1}, where 0 = no carry and 1 = a carry,
    V_I = {(m,n): 0 ≤ m, n ≤ 9} - a column of two digits is
        processed at a time,
    V_O = {0, 1, ..., 9} - the output consists of the single digits,
    M(k,(m,n)) = 0 if m + n + k ≤ 9, and 1 if m + n + k > 9,
        where k is a state (0 or 1) and (m,n) is an input pair,
    Q(k,(m,n)) = (k + m + n) mod 10,
    s_0 = 0.

Thus the automaton operates by adding first the column of ones,
storing as internal state 0 if there is no carry, 1 if there is a
carry, with as output the sum of the column modulo 10, and then
moving on to the input of the column of tens, etc. The initial
internal state s_0 is 0, because at the beginning of the exercise
there is no carry. This model is not interesting, because it is a
deterministic model: all problems will be solved correctly. Let
us now turn to a probabilistic automaton.
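The deterministic automaton can be sketched directly; reading the columns right to left and writing out a final carry digit are the only conventions added here:

```python
def add_columns(top, bottom):
    """top, bottom: equal-length digit strings. Runs the automaton
    column by column, right to left, and returns the sum string."""
    state = 0                                   # s_0 = 0: no carry
    output = []
    for m, n in zip(reversed(top), reversed(bottom)):
        m, n = int(m), int(n)
        output.append((state + m + n) % 10)     # Q(k,(m,n))
        state = 0 if m + n + state <= 9 else 1  # M(k,(m,n))
    if state:                                   # final carry, if any
        output.append(1)
    return "".join(str(d) for d in reversed(output))

print(add_columns("58", "67"))  # '125'
```

Because M and Q are ordinary functions, every exercise is answered correctly, which is exactly why the deterministic model is uninteresting as a model of student behavior.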
Definition: A structure U = <A, V_I, V_O, p, q, s_0> is a finite
probabilistic (or stochastic) automaton if and only if
(i) A is a finite, nonempty set,
(ii) V_I and V_O are finite, nonempty sets,
(iii) p is a function on A × V_I × A to [0,1] such that for each s
in A and σ in V_I, p_{s,σ} is a probability density over A, i.e.,
    (a) for each s' in A, p_{s,σ}(s') ≥ 0,
    (b) Σ_{s'∈A} p_{s,σ}(s') = 1,
(iv) q is a function on A × V_I × V_O to [0,1] such that for each
s in A and σ in V_I, q_{s,σ} is a probability density over V_O,
(v) s_0 ∈ A.
The number of possible parameters in this case is uninterestingly
large: each transition M(k,(m,n)) may be replaced by a probabilistic
transition 1 − ε_{k,m,n}, and each output Q(k,(m,n)) by 10
probabilities, for a total of 2200 parameters.
What kind of assumptions can be made in order to reduce the
number of parameters? A three-parameter automaton model is easily
defined. The first parameter, γ, is simply the probability of an
output error. Conversely, the probability of a correct output is

    P(Q(k,(m,n)) = (k + m + n) mod 10) = 1 − γ

The second parameter, ε, is the probability of a transition error
when there is no carry:

    P(M(k,(m,n)) = 0 | k + m + n ≤ 9) = 1 − ε

The third parameter, η, is the probability of an error when there
is a carry:

    P(M(k,(m,n)) = 1 | k + m + n > 9) = 1 − η

In other words, if there is no carry, the probability of a correct
transition is 1 − ε, and if there is a carry, the probability of
such a transition is 1 − η. We assume that these errors are
constant, independently of the particular combination of k + m + n. So
this is a drastic reduction of the number of the parameters.
Consider now problem i with C_i carries and D_i digits. If we ignore the
probability of two errors leading to a correct response, then the
probability of a correct answer is just

    P(correct answer to problem i) = (1 − γ)^{D_i} (1 − η)^{C_i} (1 − ε)^{D_i − C_i − 1}

We may get a direct comparison with the linear regression model if
we take the logarithm of both sides to obtain:

    log P_i = D_i log(1 − γ) + C_i log(1 − η) + (D_i − C_i − 1) log(1 − ε)
and estimate log(1 − γ), log(1 − η) and log(1 − ε) by regression with
the additive constant set equal to zero. We also may use some other
approach to make estimates, such as minimum χ² or maximum likelihood
methods. The automaton model naturally suggests a more detailed
analysis of the data. Unlike the regression model, the automaton
provides an immediate analysis of the digit-by-digit responses.
We can find the general maximum likelihood estimates of γ,
ε and η when the response data are given in this more explicit
form. Let there be n digit-responses in a block of responses. For
1 ≤ i ≤ n let X_i be the random variable that assumes the value 1 if
the ith response is correct and 0 otherwise. It is then easy to see
that
P(X_i = 1) =
  1 − γ          if i is a ones'-column digit,
  (1−γ)(1−ε)     if it is not a ones'-column digit and there is no carry to the i-th digit,
  (1−γ)(1−η)     if there is a carry to the i-th digit.

Similarly, for the same three alternatives,

P(X_i = 0) = γ, 1 − (1−γ)(1−ε), and 1 − (1−γ)(1−η), respectively.
For a string of actual digit responses x_1, …, x_n we can write the likelihood function as:

L(x_1, …, x_n) = (1−γ)^a γ^b (1−ε)^c (1−η)^d [1 − (1−γ)(1−ε)]^e [1 − (1−γ)(1−η)]^f,

where a = the number of correct responses, b = the number of incorrect responses in the ones'-column, c = the number of correct responses not in the ones'-column when the internal state is 0, d = the number of correct responses when the internal state is 1, e = the number of incorrect responses not in the ones'-column when the internal state is 0, and f = the number of incorrect responses when the internal state is 1. (In the model, statistical independence of responses is assured by the correction procedure.)
It is more convenient to estimate γ' = 1 − γ, ε' = 1 − ε and η' = 1 − η. Making this change, taking the logarithm of both sides of the likelihood function, and differentiating with respect to each of the variables, we obtain three equations that determine the maximum likelihood estimates of γ', ε' and η':

∂L/∂γ' = a/γ' − b/(1−γ') − eε'/(1−γ'ε') − fη'/(1−γ'η') = 0,
∂L/∂ε' = c/ε' − eγ'/(1−γ'ε') = 0,
∂L/∂η' = d/η' − fγ'/(1−γ'η') = 0.

Solving these equations we obtain as estimates:

γ̂' = (a−c−d)/(a+b−c−d),
ε̂' = c(a+b−c−d)/[(c+e)(a−c−d)],
η̂' = d(a+b−c−d)/[(d+f)(a−c−d)].
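The closed-form estimates can be checked numerically. The sketch below is my own illustration; the counts a–f are invented, but the check that the three likelihood equations vanish at the closed-form solution holds for any admissible counts.

```python
def mle(a, b, c, d, e, f):
    """Closed-form maximum likelihood estimates of gamma', epsilon', eta'
    from the six response counts a-f defined in the text."""
    g = (a - c - d) / (a + b - c - d)
    eps = c * (a + b - c - d) / ((c + e) * (a - c - d))
    eta = d * (a + b - c - d) / ((d + f) * (a - c - d))
    return g, eps, eta

def gradients(g, eps, eta, a, b, c, d, e, f):
    """The three likelihood equations; all should be zero at the MLE."""
    dg = a / g - b / (1 - g) - e * eps / (1 - g * eps) - f * eta / (1 - g * eta)
    deps = c / eps - e * g / (1 - g * eps)
    deta = d / eta - f * g / (1 - g * eta)
    return dg, deps, deta

# invented counts: 940 correct responses in all, 20 errors in each category
g, eps, eta = mle(a=940, b=20, c=280, d=180, e=20, f=20)
```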
The estimates for the same data already described are:

γ = .0430
ε = .0085
η = .0578
The fit with the data is pretty good but by no means exact. The fit can be improved by replacing the parameter ε by two parameters: ε₀ in case there is no carry in the preceding column, and ε₁ in case there is a carry in the preceding column. (As an illustration, consider 457 + 135 = 592.) The estimates are

ε₀ = .0000
ε₁ = .0928

This model with ε₀ and ε₁, as well as the data analysis for both automata, is due to Alex Cannara.
The central criticism of the automaton models is that they are informationally inadequate to the problem. They do not give any account of how the student is processing the format of the problem. The way of handling the problem is abstract in the sense that the definition of the input for the automaton is not closely related to the input the student is faced with perceptually. The type of models to be discussed next, on the other hand, is more perceptually adequate.
3.3 REGISTER MACHINES*
The programs written for a register machine deal with the processing of the perceptual format of the problems. The informational part in this case is more realistic. However, what is not realistic in these models is the highly schematic character of the perceptual processing. The introduction of perceptual processes in the model will be illustrated by an example of a subroutine of the program. We have a two-dimensional array. The square at the top at the right-hand side is called (1,1). The instructions are stated in the column at the right-hand side.

* This research is a joint effort with Lindsay Flannery.
5   <OP>          Attend (1,1)
9   <SS>          Read in
14  <NSS>         Attend (+1,0)
                  Read in
                  +
The instruction "Read in" means that the content of the square (1,1), in this case, has to be stored in a stimulus-supported register SS; "Attend (+1,0)" means that one has to go to the square one row below the previous square that is in the same column as that square. The number 5 will be transferred to an operating register OP and the number 9 will be stored in a stimulus-supported register. The response 14 will be put in a non-stimulus-supported register NSS. The numbers 5 and 9 have visual support; the number 14 has no visual support.
In solving a problem, e.g. making the addition

   1516
   4932
  21543
 +  214

the student is not surveying the whole array.
the student is not surveying the whole array. The problem is that,
when he has finished a column, he has to determine whether there is
anything to the left of that column. Let me describe what we call
the vertical scan subroutine that is used in all these vertical
format problems. That subroutine is indicated as
VSCAN (0,...,9,_)
RD       Readin
         JMP (0,...,9,_) SS FIN
         Attend (+1,-1)
         Readin
         JMP (0,...,9,_) SS FIN
         Attend (+0,+1)
         JMP RD
FIN      END
The first operation is "read in" wherever we are. We jump to FIN if we get anything, because all we want to do is to find out if there is something. If we get something we put it in a stimulus-supported register. If we do not get anything we go down a row and move to the right: Attend (+1,-1). We read in; if we get something, we put it in a stimulus-supported register and do a jump to FIN. Otherwise we stay in the same row and go to the left: Attend (+0,+1), and make an unconditional jump to RD.
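The scan described above can be simulated directly. The sketch below is my own reconstruction, not the actual program: each row of the grid is stored with column 1 as its rightmost square (matching the (1,1) = top-right convention), blank squares are None, and a bound on the row index is added so the sketch terminates when nothing is found.

```python
def vscan(grid, row, col):
    """Vertical scan: starting at (row, col), return the first non-blank
    symbol found by moving down one row and one square to the right
    (Attend (+1,-1)), then one square to the left along the row
    (Attend (+0,+1)), looping back to the read-in step."""
    def readin(r, c):
        if 1 <= r <= len(grid) and 1 <= c <= len(grid[r - 1]):
            return grid[r - 1][c - 1]
        return None
    while row <= len(grid) + 1:          # termination guard, added for the sketch
        sym = readin(row, col)           # RD: Readin
        if sym is not None:              # JMP (0,...,9,_) SS FIN
            return sym, (row, col)
        row, col = row + 1, col - 1      # Attend (+1,-1)
        sym = readin(row, col)           # Readin
        if sym is not None:              # JMP (0,...,9,_) SS FIN
            return sym, (row, col)
        col += 1                         # Attend (+0,+1); JMP RD
    return None, None                    # nothing left to scan
```

For example, for the problem "16 + 2" stored as `[[6, 1], [2, None]]`, scanning from (2,2) finds nothing, which is exactly the signal that the column sum is finished.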
The basic idea of the application of the register machines is the question what meaningful instructions - in terms of behavior and internal processes of the students - to have as primitives. There is some hope of behaviorally checking these primitives. As a check of the model we can look at the eye-movements. How far from the actual eye-movements is this subroutine? If we want to account for the eye-movements, the program becomes much more elaborate. Another feature that is missing in the model are the latencies. One of the things that has been fairly extensively studied over 50 years is not only the accuracy but also the speed of children doing these exercises. This model does not give any account of the latencies. The third feature that is omitted is learning. In the case of process models with children, arithmetic offers one of the best possibilities for actually getting a detailed grip on learning. We can ask the question what kind of verbal reinforcements
can be given by the teacher in order to enlarge the internal program of the student. If in classifying geometric figures the verbal reinforcement consists of "correct" and "not correct", the learning is extremely slow. What is needed is close correspondence between the verbal instruction to the student - in this case the reinforcement - and what he is to do. Such close correspondence is practical in the case of arithmetic.
THEORY OF AUTOMATA AND ITS APPLICATION TO PSYCHOLOGY
P. Suppes

Lecture 4
Today I will turn to models for natural language processing. The studies of natural languages, especially of procedural semantics, are as yet in a much less empirical stage than the studies in arithmetic. I shall talk about classical model-theoretic semantics, about model-theoretic semantics for context-free languages, about some applications to Erica, a corpus of the speech of a young girl, and about procedural semantics.
4.1 CLASSICAL MODEL-THEORETIC SEMANTICS
I return to the semantics of natural languages by first reviewing the classical semantics of mathematical logic, because the ideas developed here grew out of this earlier development. Some linguists have claimed that the semantics of natural languages should be represented in first-order logic. According to Quine, e.g., the two essential features of natural languages are predication and quantification. I restrict myself to theories with standard formalization. (An axiomatic base for all of classical mathematics can be given by such a theory, namely, axiomatic set theory.)
Theories with standard formalization consist of the following ingredients:
(i) First-order predicate logic with identity, including the logical notation of sentential connectives, quantifiers, variables, identity symbol and parentheses.
(ii) The non-logical primitive symbols of the theory, which fall into three classes: relation or predicate symbols, operation symbols, and individual constants or names.
(iii) The axioms of the theory, when the theory is axiomatically built.
Definitions of new symbols are treated as special kinds of axioms. Roughly speaking, definitions are axioms that add no new content to the theory. (Technically they must satisfy criteria of eliminability and non-creativity.)
Here are some simple examples of theories with standard formalization:

1. Group theory
Primitive symbols: binary operation symbol ∘, unary operation symbol ⁻¹, and individual constant e. (It is important to distinguish between the binary operation symbol ∘ in a formal context and the operation ∘ in the possible realization. The symbol is a symbol of the language or theory of groups. The operation itself is a non-linguistic object. In a given case of a possible realization we think of the symbol as denoting the operation. But, and this is fundamental, the operation symbol only denotes an operation relative to a given possible realization.)
Axioms:
(∀x)(∀y)(∀z)(x ∘ (y ∘ z) = (x ∘ y) ∘ z)
(∀x)(x ∘ e = x)
(∀x)(x ∘ x⁻¹ = e)
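For a finite possible realization the three axioms can be checked mechanically. The sketch below is my own illustration, not part of the lecture: it verifies that the integers mod 5 under addition form a model of the theory of groups, while the same set under subtraction does not.

```python
from itertools import product

def is_group(A, op, inv, e):
    """Check the three group axioms in a finite possible realization <A, op, inv, e>."""
    assoc = all(op(x, op(y, z)) == op(op(x, y), z) for x, y, z in product(A, repeat=3))
    ident = all(op(x, e) == x for x in A)
    inverse = all(op(x, inv(x)) == e for x in A)
    return assoc and ident and inverse

Z5 = range(5)
print(is_group(Z5, lambda x, y: (x + y) % 5, lambda x: -x % 5, 0))  # addition mod 5
```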
2. Elementary Euclidean geometry
Primitive symbols: ternary relation symbol B for betweenness, quaternary relation symbol E for equidistance: xy E uv if the distance between x and y is equal to the distance between u and v.
Axioms: Several standard versions, which I shall omit.

3. Axiomatic set theory
Primitive symbol: binary relation symbol ∈ of membership.
Axioms: Zermelo-Fraenkel axioms, for example.
The next step is to move on to the semantics of a theory with standard formalization. We first define the concept of a possible realization of a theory. As an example, consider group theory as defined above. The primitive symbols are given in the fixed order stated above. A possible realization of group theory is a relational structure 𝔄 = <A, ∘, ⁻¹, e> where A is a non-empty set, ∘ is a binary operation on A, i.e., a function from A × A to A, ⁻¹ is a unary function from A to A, and e is an element of A.
Finally, we say that a possible realization of the theory of groups is a model of the theory of groups iff all the axioms are satisfied in the possible realization. (I omit a technical definition of satisfaction. Another intuitive way of formulating the matter is this: in a model of a theory all the axioms hold, or are true.) In usual mathematical terminology, a model of the theory of groups is a group. But such terminology is not always used; for example, it is not ordinarily so used in geometry or in set theory, although we do often speak of spaces in geometry.
Given the apparatus I have briefly described, there are many additional general concepts that have been formulated and that may be intensely studied, under the general heading of the theory of models, which is just the study of the semantics of theories with standard formalization.
For the sentences of the language of a theory, independent of the axioms of the theory, perhaps the most important relation between them is that of logical consequence. A sentence S1 in the language of a theory is a logical consequence of a sentence S2 iff S1 holds in any possible realization of the theory in which S2 holds. To show, for example, that associativity of a binary operation is not a consequence of its being commutative, we can give a possible realization in which the binary operation is commutative but not associative.
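A concrete counterexample of this kind is the arithmetic mean on the rationals. The check below is my own illustration: it confirms on a few elements that the operation is commutative but not associative, which is all the independence argument requires.

```python
from fractions import Fraction
from itertools import product

def mean(x, y):
    # arithmetic mean: commutative by symmetry, but not associative
    return (x + y) / 2

elems = [Fraction(n) for n in (0, 1, 2)]
commutative = all(mean(x, y) == mean(y, x) for x, y in product(elems, repeat=2))
associative = all(mean(mean(x, y), z) == mean(x, mean(y, z))
                  for x, y, z in product(elems, repeat=3))
print(commutative, associative)  # True False: mean(mean(0,0),1) = 1/2 but mean(0,mean(0,1)) = 1/4
```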
What is important about first-order languages, or the languages of theories with standard formalization, is that we can exactly mirror the semantic concept of logical consequence in the syntactic concept of derivation. In other words, we can give purely syntactic rules of logical inference such that S1 is derivable from S2 by means of the rules of inference if and only if S1 is a logical consequence of S2. It is generally agreed that such an intimate connection between syntax and semantics probably does not exist for natural languages. We can grant this claim,
and yet go on to use the lack of almost any study of the relation between syntax (inference) and semantics (logical consequence) in natural languages as a point of criticism of most contemporary theories of the semantics of natural language. However, this is not a topic, as seductive as it is, that I can explore here.
It might be thought that only axiomatic theories can be usefully studied in the theory of models, but the example of the concept of logical consequence, applying as it does very broadly to the language of the theory, shows that this is not necessarily the case. In many instances, rather than axiomatizing the theory, we characterize the valid sentences of the theory by stating that a valid sentence is one that holds in a particular possible realization of the theory. For example, the valid sentences of elementary Euclidean geometry may be defined to be just those that hold in the standard numerical interpretation of betweenness and equidistance in the Cartesian plane Re × Re, where Re is the set of real numbers.
Because the semantics of first-order logic is well understood, it is natural that some people have proposed using first-order logic as the "base structure" of a generative grammar. The difficulty with this approach is that we have no algorithms for translating ordinary language into logical notation, and we must repeatedly appeal directly to the intuitive meaning of the natural-language expressions in order to make the translation. For this and many other reasons it seems desirable to develop model-theoretic semantics directly for natural language.
4.2 MODEL-THEORETIC SEMANTICS FOR CONTEXT-FREE LANGUAGES
I now shall describe in a more technical way the application of model theory to natural languages. Some standard grammatical concepts are defined in the interest of completeness. First, if V is a set, V* is the set of all finite sequences whose elements are members of V. I shall often refer to these finite sequences as strings. The empty sequence, 0, is in V*; we define V+ = V* − {0}.
A structure G = <V, V_N, P, S> is a phrase-structure grammar if and only if V and P are finite, nonempty sets, V_N is a subset of V, S is in V_N, and P ⊆ V* × V+. Following the usual terminology, V_N is the nonterminal vocabulary and V_T = V − V_N the terminal vocabulary. S is the start symbol or the single axiom from which we derive strings or words in the language generated by G. The set P is the set of production or rewrite rules. If <α, β> ∈ P, we write α → β, which we read: from α we may produce or derive β (immediately).

A phrase-structure grammar G = <V, V_N, P, S> is context-free if and only if P ⊆ V_N × V+, i.e., if α → β is in P then α ∈ V_N and β ∈ V+.
The standard definition of derivations is as follows. Let G = <V, V_N, P, S> be a phrase-structure grammar. First, if α → β is a production of P, and γ and δ are strings in V*, then γαδ ⇒_G γβδ. We say that β is derivable from α in G, in symbols α ⇒*_G β, if there are strings α_1, …, α_n in V* such that α = α_1, α_1 ⇒_G α_2, …, α_{n−1} ⇒_G α_n = β. The sequence Δ = <α_1, …, α_n> is a derivation in G. The language L(G) generated by G is {α : α ∈ V_T* & S ⇒*_G α}. In other words, L(G) is the set of all strings made up of terminal vocabulary and derived from S.
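The derivability relation can be sketched directly from the definition. The code below is my own illustration, using the toy grammar S → aSb, S → ab (not a grammar from the lecture): it enumerates one-step derivations and collects the terminal strings reachable within a bounded number of steps.

```python
def one_step(productions, s):
    """All strings gamma beta delta obtainable from s by rewriting one
    occurrence of alpha as beta, for a production alpha -> beta
    (alpha a single nonterminal, as in a context-free grammar)."""
    out = set()
    for alpha, beta in productions:
        for i, sym in enumerate(s):
            if sym == alpha:
                out.add(s[:i] + beta + s[i + 1:])
    return out

def terminal_strings(productions, start, terminals, max_steps):
    """Members of L(G) reachable from the start symbol in <= max_steps steps."""
    frontier, found = {(start,)}, set()
    for _ in range(max_steps):
        nxt = set()
        for s in frontier:
            nxt |= one_step(productions, s)
        frontier = nxt
        found |= {s for s in frontier if all(t in terminals for t in s)}
    return found

P = [('S', ('a', 'S', 'b')), ('S', ('a', 'b'))]
L = terminal_strings(P, 'S', {'a', 'b'}, max_steps=3)
```

Within three steps this yields exactly the strings ab, aabb and aaabbb, the beginning of the language {aⁿbⁿ : n ≥ 1}.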
The semantic concepts developed also require use of the concept of a derivation tree of a grammar. The relevant notions are set forth in a series of definitions. Certain familiar set-theoretical notions about relations are also needed. To begin with, a binary structure is an ordered pair <T, R> such that T is a non-empty set and R is a binary relation on T, i.e., R ⊆ T × T. R is a partial ordering of T if and only if R is reflexive, antisymmetric and transitive on T. R is a strict simple ordering of T if and only if R is asymmetric, transitive and connected on T. We also need the concept of R-immediate predecessor. For x and y in T, xJy if and only if xRy, not yRx, and for every z, if z ≠ y and zRy then zRx. In the language of formal grammars, we say that if xJy then x directly dominates y, or y is the direct descendant of x.
Using these notions, we define in succession tree, ordered tree and labeled ordered tree. A binary structure <T,R> is a tree if and only if (i) T is finite, (ii) R is a partial ordering of T, (iii) there is an R-first element of T, i.e., there is an x such that for every y, xRy, and (iv) if xJz and yJz then x = y. If xRy in a tree, we say that y is a descendant of x. Also, the R-first element of a tree is called the root of the tree, and an element of T that has no descendants is called a leaf. We call any element of T a node, and we shall sometimes refer to leaves as terminal nodes.
A ternary structure <T,R,L> is an ordered tree if and only if (i) L is a binary relation on T, (ii) <T,R> is a tree, (iii) for each x in T, L is a strict simple ordering of {y : xJy}, (iv) if xLy and yRz then xLz, and (v) if xLy and xRz then zLy. It is customary to read xLy as "x is to the left of y". Having this ordering is fundamental to generating terminal strings and not just sets of terminal words. The terminal string of an ordered labeled tree is just the sequence of labels <f(x_1), …, f(x_n)> of the leaves of the tree as ordered by L. Formally, a quinary structure <T,V,R,L,f> is a labeled ordered tree if and only if (i) V is a nonempty set, (ii) <T,R,L> is an ordered tree, and (iii) f is a function from T into V. The function f is the labeling function and f(x) is the label of node x.
The definition of a derivation tree is relative to a given context-free grammar.

Definition. Let G = <V, V_N, P, S> be a context-free grammar and let 𝒯 = <T,V,R,L,f> be a labeled ordered tree. 𝒯 is a derivation tree of G if and only if
(i) if x is the root of 𝒯, then f(x) = S;
(ii) if xRy and x ≠ y, then f(x) is in V_N;
(iii) if y_1, …, y_n are all the direct descendants of x, i.e., {y : xJy} ≠ ∅, and y_i L y_j if i < j, then <f(x), <f(y_1), …, f(y_n)>> is a production in P.
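Clause (iii) is the essential one: every internal node together with its ordered children must instantiate a production. A minimal sketch of the check, in my own encoding (a nonterminal node is a pair (label, children), a leaf is just its label; the grammar S → aSb, S → ab is invented for the example):

```python
def is_derivation_tree(tree, productions, start):
    """Check that a labeled ordered tree (encoded as nested pairs) is a
    derivation tree: the root is labeled with the start symbol, and each
    internal node's label, with the tuple of its children's labels,
    is a production of the grammar."""
    def label(node):
        return node[0] if isinstance(node, tuple) else node
    def ok(node):
        if not isinstance(node, tuple):          # a leaf: nothing further to check
            return True
        lab, children = node
        rhs = tuple(label(c) for c in children)
        return (lab, rhs) in productions and all(ok(c) for c in children)
    return label(tree) == start and ok(tree)

P = {('S', ('a', 'S', 'b')), ('S', ('a', 'b'))}
t = ('S', ['a', ('S', ['a', 'b']), 'b'])         # a derivation tree for aabb
```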
We now turn to semantics proper by introducing the set Φ of set-theoretical functions. We shall let the domains of these functions be n-tuples of any sets (with some appropriate restriction understood to avoid set-theoretical paradoxes).

Definition. Let G = <V, V_N, P, S> be a context-free grammar. Let Φ be a function defined on P which assigns to each production p in P a finite, possibly empty set of set-theoretical functions, subject to the restriction that if the right member of production p has n terms of V, then any function of Φ(p) has n arguments. Then G = <V, V_N, P, S, Φ> is a potentially denoting context-free grammar. If for each p in P, Φ(p) has exactly one member, then G is said to be simple.
The simplicity and abstractness of the definition may be misleading. In the case of a formal language, e.g., a context-free programming language, the creators of the language specify the semantics by defining Φ. Matters are more complicated in applying the same idea of capturing the semantics by such a function for fragments of a natural language.
A clear separation of the generality of Φ and an evaluation function v is intended. The functions in Φ should be constant over many different uses of a word, phrase or statement. The valuation v, on the other hand, can change sharply from one occasion of use to the next. To provide for any finite composition of functions, or other ascensions in the natural hierarchy of sets and functions built up from a domain of individuals, the family ℋ'(D) of sets, with closure properties stronger than needed in any particular application, is defined. The abstract objects T (for truth) and F (for falsity) are excluded as elements of ℋ'(D). In this definition 𝒫A is the power set of A, i.e., the set of all subsets of A.
Definition. Let D be a nonempty set. Then ℋ'(D) is the smallest family of sets such that
(i) D ∈ ℋ'(D),
(ii) if A, B ∈ ℋ'(D) then A ∪ B ∈ ℋ'(D),
(iii) if A ∈ ℋ'(D) then 𝒫A ∈ ℋ'(D).

We define ℋ(D) = ℋ'(D) ∪ {T,F}, with T ∉ ℋ'(D), F ∉ ℋ'(D) and T ≠ F.
A model structure for G is defined just for terminal words and phrases. The meaning or denotation of nonterminal symbols changes from one derivation or derivation tree to another.

Definition. Let D be a nonempty set, let G = <V, V_N, P, S> be a phrase-structure grammar, and let v be a partial function on V_T+ to ℋ(D) such that if v is defined for α in V_T+ and if γ is a subsequence of α, then v is not defined for γ. Then 𝒟 = <D,v> is a model structure for G. If the domain of v is exactly V_T, then 𝒟 is simple.
We also refer to v as a valuation function for G.

I now define semantic trees that assign denotations to nonterminal symbols in a derivation tree. The definition is for simple potentially denoting grammars and for simple model structures. In other words, there is a unique function for each production, and the valuation function is defined just on V_T, and not on phrases of V_T+.

Definition. Let G = <V, V_N, P, S, Φ> be a simple, potentially denoting context-free grammar, let 𝒟 = <D,v> be a simple model structure for G, let 𝒯 = <T,V,R,L,f> be a derivation tree of <V, V_N, P, S> such that if x is a terminal node then f(x) ∈ V_T, and let ψ be a function from f to ℋ(D) such that
(i) if <x,f(x)> ∈ f and f(x) ∈ V_T, then ψ(x,f(x)) = v(f(x)),
(ii) if <x,f(x)> ∈ f, f(x) ∈ V_N, and y_1, …, y_n are all the direct descendants of x with y_i L y_j if i < j, then

ψ(x,f(x)) = φ(ψ(y_1,f(y_1)), …, ψ(y_n,f(y_n))),

where φ = Φ(p) and p is the production <f(x), <f(y_1), …, f(y_n)>>.
Then 𝒮 = <T,V,R,L,f,ψ> is a simple semantic tree of G and 𝒟.
The function ψ assigns a denotation to each node of a semantic tree. We now have two parts: on the one hand we have the model with the denotations of the terminal words or terminal phrases; on the other hand we have the semantic functions that put together the structure of the sentence by saying what the denotations are of the nodes in the tree as we come up in the structure of the tree. When each node of the tree has a denotation, the result is as illustrated in the simple description I gave in the first lecture.
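The recursion in the definition of ψ is easy to program. The sketch below is my own illustration for the noun phrase red table: the valuation v gives the terminal denotations as toy sets, and the single semantic function assigned to the production NP → Adj N is set intersection, so the root denotes R ∩ T.

```python
def psi(tree, v, phi):
    """Denotation of a node in a semantic tree.  A terminal node is a word
    (a string) and gets its denotation from the valuation v; a nonterminal
    node (label, children) gets the unique semantic function of its
    production applied to the denotations of its children."""
    if isinstance(tree, str):
        return v[tree]
    label, children = tree
    values = [psi(c, v, phi) for c in children]
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    return phi[(label, rhs)](*values)

v = {'red': {1, 2, 3}, 'table': {2, 3, 4}}        # R and T as toy sets
phi = {('NP', ('Adj', 'N')): lambda a, n: a & n,  # NP -> Adj N : intersection
       ('Adj', ('red',)): lambda a: a,            # lexical rules: identity
       ('N', ('table',)): lambda n: n}
tree = ('NP', [('Adj', ['red']), ('N', ['table'])])
print(psi(tree, v, phi))  # the denotation R ∩ T
```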
4.3 APPLICATIONS TO "ERICA"
I would now like to give some examples from the corpus Erica, taken from the dissertation of Bob Smith. Erica was about 30 months old at the time of the recordings. We have 23 hours of recordings. There are approximately 27,000 words by Erica and approximately 50,000 words by adults. We start with the domain. The subsets of the domain are adjectives, nouns, possessive adjectives, personal pronouns, proper names, pronouns. There were 2039 types in the corpus. 58% of the types fall within these categories. The most interesting part of the corpus from a semantic point of view are the verbs. There were 4 kinds of verbs in Erica: auxiliaries, modifiers, transitive and intransitive verbs, and linking verbs like to be. Leaving the linking verbs aside, 812 different verbs occurred, i.e. 26% of the different types. In order to illustrate how we write the semantic rules of utterances I shall take the following sentence form: pronoun, linking verb, noun. If [pronoun] indicates the denotation of a pronoun, [N] the denotation of a noun, and f indicates the linking verb, the semantic rule of this sentence is: if [pronoun] f [N] then True, else False.
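The truth rule for this sentence form can be stated in a line of code. The sketch below is my own illustration, with invented denotations, taking the linking verb to be interpreted as membership of the pronoun's denotation in the noun's denotation.

```python
def copula_true(pronoun_denotation, noun_denotation):
    """Semantic rule for 'pronoun + linking verb + noun':
    True if [pronoun] stands in the linking-verb relation to [N],
    here taken to be set membership."""
    return pronoun_denotation in noun_denotation

dogs = {'rex', 'fido'}            # invented denotation of the noun 'dog'
print(copula_true('rex', dogs))   # "he is a dog", with [he] = rex
```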
THEORY OF AUTOMATA AND ITS APPLICATION TO PSYCHOLOGY
P. Suppes

Lecture 5
The four topics I would like to discuss today are:
1. from theory to corpus
2. procedural semantics
3. intensional semantics and meaning
4. the complexity of semantics
The first topic concerns the question of how one actually applies the ideas developed in the earlier lectures to a corpus. The other topics are especially relevant to a theory of how psychologically we use language.
5.1 FROM THEORY TO CORPUS
How do we make the semantic analysis that we have discussed? I shall discuss in a very simplified way how we get to work on an actual corpus. Suppose that we have sampled 509 noun phrases uttered by a child in a controlled situation. The question of how noun phrases are identified will be ignored at the moment. Let us further assume that we have written a simple grammar for noun phrases, as we have discussed earlier. This grammar has, say, 35 production rules. We associate with each of these production rules a semantic function. Suppose the vocabulary consists of 105 terminal words. In the controlled experiment I have in mind, the meaning of the majority of these words is independent of the particular situation; we know the denotation of these individual words. Suppose there is a residue of 25 function words and some others - like the, not, to, of - whose meaning is not independent of the situation. So we have to postulate a meaning for these words.

* Summarized by L. Noordman.

In the kind of controlled situation I am talking about, it is
not difficult to determine the denotation of each one of the noun phrases. Two investigators will easily agree on the denotation of the noun phrases, for example, on that of red book, if there are some objects in front of the child.

We now have an ordinary type of theoretical analysis. We know the denotation of the 509 terminal phrases; we know the denotation of most of the words. We have a hypothesis concerning the grammar, the semantic functions and some of the special words. We can construct theoretical denotations from the semantic functions and from the words we agreed on. The question is: do these theoretical denotations match the denotations the investigators agreed on?

This description of the analysis of child's speech is somewhat simplified. It only indicates how we test in first approximation the grammar and semantics we have written against the agreed-upon denotations of a given collection of speech.
The work may be more complicated for several reasons. There may be discrepancies between the meaning of adult's speech and child's speech. Sometimes adults will not agree on the denotation of the utterances of the child. Moreover, the investigators may not have an appropriate analysis for many utterances in a corpus, although they agree on the denotation of these utterances. I shall come back to this issue when talking about intensional semantics.
5.2 PROCEDURAL SEMANTICS
The analysis of semantics I have given in the previous lectures was in terms of set-theoretical notions; one failure of this analysis is that it is not procedurally oriented. I would now like to talk about procedural semantics. What I have to say about semantics is not at the stage of the analysis of performance in arithmetic. I would like to make a parallel between arithmetic and semantics. In the case of arithmetic we used a register machine to describe the process; we should like to have a similar machine, to be called a list machine, for the analysis of semantics. It is very clear what the intention is in the case of the register machines for arithmetic. We have a clear formal conception of the algorithms that the student is learning. The
point is to try to get an understanding of how in detail a student has internalised these algorithms. One test of our understanding is whether we can successfully give an account of the learning and performance of the student. We would like to do the same thing in the case of semantics. Let us consider the tree for the red table or the tree for two red flowers as given earlier. The question that arises repeatedly is: in what sense are these denotations, e.g. the power set 2𝒫(R∩F) in the case of two red flowers, represented in the head? It is not suggested that the operations on sets constitute the operations in the head of the child. The model-theoretic concepts, however, represent a characterisation of the semantics of sentences which in itself is algorithmic and well understood. In fact we take a step forward in the representation of the semantics of sentences by reducing it to such a representation. The next question is how the semantics might be processed. We can talk about list representation: internally we have a list for each terminal node. The idea of a list machine is to compute the list for a node - e.g. for red table: R∩T - from the two underlying lists. In this way one computes the root of the tree in the case of speech recognition. We replace sets and operations on sets by lists. We can write a simple context-free grammar for lists:
can write a simple context-free grammar for lists:
SI-?(S)
5 -io 5S
5 ~ (5)
S ~ atom
The last rule is a lexical rewrite rule.
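A parser for this list grammar takes only a few lines. The sketch below is my own illustration: it turns a parenthesized string into nested Python lists, the kind of structure a list machine would manipulate.

```python
def read_list(s):
    """Parse an expression generated by S' -> (S), S -> SS | (S) | atom
    into nested lists, e.g. '(two (red flowers))' -> ['two', ['red', 'flowers']]."""
    tokens = s.replace('(', ' ( ').replace(')', ' ) ').split()
    def parse(i):                      # parse a '(...)' group starting at tokens[i]
        assert tokens[i] == '('
        items, i = [], i + 1
        while tokens[i] != ')':
            if tokens[i] == '(':
                sub, i = parse(i)      # S -> (S): recurse on the subgroup
                items.append(sub)
            else:                      # S -> atom
                items.append(tokens[i])
                i += 1
        return items, i + 1
    tree, end = parse(0)
    assert end == len(tokens), "trailing input"
    return tree

print(read_list('(two (red flowers))'))
```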
As a first approximation, we can say that the child is internally representing the semantics as lists. And just as we had a hierarchy ℋ(D) indicating set-theoretical denotations, we have a corresponding definition for a list hierarchy ℒ(D) indicating
operations performed on lists. We now have psychologically and conceptually a number of detailed fine-grained operations suggesting how we internally manipulate semantic representations.

In setting up procedures - lists in this case - one can work with a higher-order language such as LISP. Another approach, that is psychologically oriented, is to postulate a simple register machine and to see how closely we can tailor its instructions and its operations to the actual behavior of the child. These approaches are not mutually exclusive. We have not yet applied in detail this kind of machine to the language-learning of the child.
I would like to make an additional remark about procedural semantics. Learning skills like baseball-catching are often contrasted with language-learning. It is sometimes said that in the case of, e.g., learning to catch a baseball, in contrast to learning a language, there is a possibility of looking into the procedures the child is setting up in learning the skill. There are training sessions; there is a clear improvement in performance, while everybody learns the language without training sessions. But we have to keep in mind the following facts. A child between 2 and 3 years old produces approximately 300 utterances per hour. With 6 hours a day there are approximately 500,000 utterances a year. The number of utterances of adults to children is about 1,000,000. People learn to catch baseballs in far fewer trials. What I want to emphasize is that there is a tremendous amount of experience involved in language-learning.
5.3 INTENSIONAL SEMANTICS
A second range of problems that cannot be handled in extensional set-theoretical semantics concerns the analysis of intensional semantics: propositional attitudes, including modalities. The apparatus I talked about, however, can be adjusted to this failure. Let me give examples of propositional attitudes:
- It is possible that it will rain tomorrow (modal)
- It is necessary that Seymour will shave tomorrow (modal)
- It is necessary that the number of planets is greater than 7 (modal)
- John believes that Smith is crazy
- I am seeking a unicorn
- I want a chocolate cookie.
The semantics we need has to give an account, e.g., of the fact that the sentence "Smith is crazy" might be false while the sentence "John believes that Smith is crazy" might be true. The notion that is implied in propositional attitudes is that of possible world. Beliefs are to be characterised not by the structure of the actual world but by the structure of possible worlds. The notion can be illustrated with a familiar example. Consider an experiment in which one throws a coin one hundred times. The sample space is the set of possible experimental outcomes: 2^100 sequences.
Let us consider the following situation. Someone is betting, say on trial 15: "I am looking for a run of 15 heads in order to bet on the next trial". The analysis of this sentence is just like the analysis of the sentence "I am looking for a unicorn". That person is looking for an event that has a clear denotation in the set of possible experimental outcomes.
The introduction of the notion of possible world is not enough
to give an account of meaning. A simple example will illustrate
this point. Suppose the above-mentioned person said: "I am
looking for a run of n heads where n is the product of the
second and third prime numbers ..." This sentence is meant in
the possible world to describe the same looking-for. But
intuitively the one sentence has an easily understandable
meaning and the other has not. Even though we introduced the
notion of possible world, we still miss a very important
ingredient for the analysis of speech and the way we use
language. The meanings of the two given sentences are different
in the sense e.g. that they differ in ease of understanding.
So we have to take into account congruence of meaning. The
philosophical discussion about sameness of meaning is a mistake.
What we want to discuss in fact is strong and weak notions of
congruence of meaning. From a psychological point of view we want
to use similarity in ease of understanding as a measure of
closeness of meaning.
Let me indicate now how we can put together intensional semantics
with the concepts of meaning. We can start with an approximate
definition: the meaning of a sentence is its semantic tree.
With this definition, the two sentences "I am looking for ..."
given above do not have the same meaning, since the two sentences
have different semantic trees.
If trees are the objects that we in first approximation regard
as the meaning of sentences, then the next step is to define
various strong and weak notions of congruence of semantic trees.
"Semantic tree" here includes the denotations and the labels of
the tree; otherwise the noun phrase the red book would have the
same meaning as the noun phrase the green glass.
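Why labels and denotations must both enter the comparison can be
sketched as follows. The `SemTree` representation and the toy
one-object denotations are assumptions for illustration, not the
notation of the lectures:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SemTree:
    """A node of a semantic tree: a grammatical label, a denotation,
    and the subtrees for the constituents."""
    label: str
    denotation: object
    children: tuple = ()

# Toy denotations: sets of objects in a small model of the world.
red_book = SemTree("NP", frozenset({"b1"}), (
    SemTree("Adj", "red"),
    SemTree("N", "book"),
))
green_glass = SemTree("NP", frozenset({"g1"}), (
    SemTree("Adj", "green"),
    SemTree("N", "glass"),
))

# First approximation: identity of meaning = identity of semantic trees.
print(red_book == green_glass)  # False: labels and denotations differ

def shape(t):
    """The bare tree with labels and denotations stripped away."""
    return tuple(shape(c) for c in t.children)

# Stripped of labels and denotations, the two trees are identical --
# which is exactly why "semantic tree" must include both.
print(shape(red_book) == shape(green_glass))  # True
```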
The weakest notion of congruence that has been studied in the
literature is the notion of semantic paraphrase. The two sentences
above starting with "I am looking for ..." are paraphrases of
each other. There are two ways of defining semantic paraphrase,
which are essentially equivalent. According to Frege two sentences
are paraphrases of each other if they have the same logical
consequences. In terms of intensional semantics, two sentences are
paraphrases of each other if the denotations of the roots of their
semantic trees are the same. In intensional semantics, the
denotation of the root of the tree of a sentence is a proposition,
i.e. a function from possible worlds to truth values. It is clear
why the definition of paraphrase is given in terms of intensional
semantics: in extensional semantics every root of the tree of a
sentence denotes either true or false, so every two true sentences,
as well as every two false sentences, would be paraphrases of each
other.
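The intensional definition of paraphrase can be sketched directly: a
proposition is coded as a function from worlds to truth values, and two
sentences are paraphrases when their root denotations agree on every
possible world. The four-toss sample space and the helper names are
illustrative assumptions:

```python
from itertools import product

# Possible worlds: coin-toss outcomes, shortened to 4 tosses so that
# all worlds can be enumerated.
worlds = list(product("HT", repeat=4))

def run_of_n_heads(n):
    """The proposition 'the outcome contains a run of n heads',
    as a function from possible worlds to truth values."""
    def prop(world):
        run = best = 0
        for toss in world:
            run = run + 1 if toss == "H" else 0
            best = max(best, run)
        return best >= n
    return prop

# "a run of 15 heads" vs "a run of n heads where n is the product of
# the second and third primes" (3 * 5 = 15): different sentences,
# the same function from worlds to truth values.
p1 = run_of_n_heads(15)
p2 = run_of_n_heads(3 * 5)

def paraphrases(p, q, worlds):
    """Intensional paraphrase: the root denotations are the same,
    i.e. the propositions agree on every possible world."""
    return all(p(w) == q(w) for w in worlds)

print(paraphrases(p1, p2, worlds))  # True
```

Note that the two sentences come out as paraphrases even though their
semantic trees, and hence their meanings in the stronger sense, differ.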
To have really rich paraphrasing is a good requirement on
either a model of long-term memory or on a program that
understands speech. However, the notion of semantic paraphrase is only
the weakest notion of congruence of meaning. Stronger notions are
introduced by permitting restricted transformations on the trees.
Psychologically interesting is the question which transformations
on trees do not disturb the ability to comprehend a sentence. The
investigation of congruence of meaning is probably much more an
appropriate psychological investigation than it is a philosophical
or logical one.
5.4 THE COMPLEXITY OF SEMANTICS
The systematic theories of semantics that have been proposed
have a complex character. One can easily get the feeling that the
concepts are so complex that it seems hopeless to think of
appropriate procedural mechanisms that we can postulate for the
person speaking and learning the language. I think this is a
mistake. What is needed in semantics is an effort to construct
simple procedures that put together fragments of speech in terms
of comprehension or production. As an example of simple mechanisms
that underlie complex behavior, consider an automobile moving down
a straight road with a constant speed and a dog that is chasing
the car with a constant speed, pointing its nose always toward the
car. The question is: what is the trajectory of the motion of the
dog? No one will say that the dog is using a differential
equation. The dog uses a simple constant mechanism: it keeps its
eyes on the car. The resulting trajectory is a relatively
sophisticated curve.
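The dog's rule can be simulated to show how a constant local mechanism
generates a sophisticated global curve. The speeds, starting positions,
and step size below are illustrative assumptions:

```python
import math

def pursuit(car_speed=1.0, dog_speed=1.5, dt=0.001, t_max=20.0):
    """Simulate the dog's local rule: at every instant, move at constant
    speed directly toward the car's current position."""
    dog_x, dog_y = 0.0, 5.0          # dog starts off the road
    path = [(dog_x, dog_y)]
    t = 0.0
    while t < t_max:
        car_x, car_y = car_speed * t, 0.0   # car: uniform motion along the road
        dx, dy = car_x - dog_x, car_y - dog_y
        dist = math.hypot(dx, dy)
        if dist < dog_speed * dt:    # the dog has caught up
            break
        # The whole "mechanism": one constant-speed step along the
        # current line of sight to the car.
        dog_x += dog_speed * dt * dx / dist
        dog_y += dog_speed * dt * dy / dist
        path.append((dog_x, dog_y))
        t += dt
    return path

path = pursuit()
# The local rule is trivial, yet the trajectory traced out is a curved
# pursuit path, not a straight line, ending on the road where the dog
# catches the car.
print(len(path), path[-1])
```

No differential equation appears anywhere in the rule; the curve is
simply what the constant mechanism produces over time.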
Another example comes from astronomy and Greek geometry. The
motion of the planets observed from the earth looks hopelessly
complex. Some very clever ideas about decomposing this motion
into circular motion were developed by the Greeks and applied
quite successfully to the observations. The analysis of the data
at the time of Ptolemy, who in fact restricted himself to rather
static events that occur at a fixed time like eclipses, was a
very complex description in terms of circles. Because there were
for the Greeks two kinds of natural motion, circular and linear,
the decomposition into circular motions was a satisfactory
explanation. However, when one tries to fit the data more exactly,
and especially when one tries to give an account of all the
dynamics of these motions, the Greek explanation becomes less
satisfactory. The description in terms of circular motions was so
complex that one got discouraged as to how one could ever hope
to have any mechanisms that describe all the dynamics of these
motions. But the opposite turned out to be the case: with
Newton's hypothesis, the explanation was relatively simple. The
idea that appeared to unify and organize the observations is an
enormously simple mechanism. So in this case there has been data
reduction for centuries.
In the domain of language we are just beginning to make such
conceptual simplifications. What has to be done is to conceive
and test simple mechanisms that can give an account of the very
complex phenomena of language use.