
Proceedings of the Society for Computation in Linguistics

Volume 2 Article 23

2019

Modeling the Acquisition of Words with Multiple Meanings

Libby Barak, Princeton University, [email protected]

Sammy Floyd, Princeton University, [email protected]

Adele Goldberg, Princeton University, [email protected]

Follow this and additional works at: https://scholarworks.umass.edu/scil

Part of the Computational Linguistics Commons, and the Developmental Psychology Commons

This Paper is brought to you for free and open access by ScholarWorks@UMass Amherst. It has been accepted for inclusion in Proceedings of the Society for Computation in Linguistics by an authorized editor of ScholarWorks@UMass Amherst. For more information, please contact [email protected].

Recommended Citation
Barak, Libby; Floyd, Sammy; and Goldberg, Adele (2019) "Modeling the Acquisition of Words with Multiple Meanings," Proceedings of the Society for Computation in Linguistics: Vol. 2, Article 23.
DOI: https://doi.org/10.7275/tr21-m273
Available at: https://scholarworks.umass.edu/scil/vol2/iss1/23

Modeling the Acquisition of Words with Multiple Meanings

Libby Barak, Sammy Floyd, and Adele Goldberg
Psychology Department
Princeton University
{lbarak,sfloyd,adele}@princeton.edu

Abstract

Learning vocabulary is essential to successful communication. Complicating this task is the underappreciated fact that most common words are associated with multiple senses (are polysemous) (e.g., baseball cap vs. cap of a bottle), while other words are homonymous, evoking meanings that are unrelated to one another (e.g., baseball bat vs. flying bat). Models of human word learning have thus far failed to represent this level of naturalistic complexity. We extend a feature-based computational model to allow for multiple meanings, while capturing the gradient distinction between polysemy and homonymy by using structured sets of features. Results confirm that the present model correlates better with human data on novel word learning tasks than the existing feature-based model.

1 Introduction

Children acquire language at a remarkable rate despite many layers of complexity in their learning environment. Previous computational models of human vocabulary learning have been primarily aimed at the mapping problem or the problem of "referential indeterminacy" (Quine, 1969), namely, determining which word maps onto which object within a noisy context (Siskind, 1996; Trueswell et al., 2013; Stevens et al., 2017; Smith et al., 2014; Fazly et al., 2010; Frank et al., 2009). These models explicitly make the simplifying but counter-factual assumption that each word can map to only one meaning in order to address how it is that learners determine which meaning a word refers to from among multiple potential referents in a scene. The models further assume that each possible meaning competes with every other possible meaning. For example, in a scene depicting "a cat drinking milk", the meaning of the word cat competes with the meaning of milk, bowl, and

every other potential meaning evoked in the scene. This perspective emphasizes the richness of visual scenes, but it overlooks the complexity associated with word meanings, which very commonly refer to multiple distinct senses or meanings (Piantadosi et al., 2012). For example, a bowl can refer to a "dish used for feeding" in the cat scene, but to a "toilet bowl" within a different context. That is, the meaning of a word cannot be a winner-takes-all affair in which meanings compete with one another across contexts, because people learn to assign multiple meanings to many words in their vocabularies.

Multiple meanings of one word can typically not be subsumed under a general definition or rule. This is clearly true in the case of homonyms, which have multiple, unrelated meanings (e.g., baseball bat vs. flying bat). It is also true of many polysemes, which evoke conventional senses that are related to one another yet distinct. Natural language polysemy often involves extensions along multiple dimensions that are not completely predictable on the basis of a general definition or rule. For example, while baseball caps and bottle caps both cover something tightly, English speakers must learn that corks and lids, which also cover things tightly, are not called caps, while mushroom caps are, even though the latter do not cover anything tightly (for discussion of rule-based polysemy see, e.g., Srinivasan and Rabagliati, 2015; Srinivasan et al., 2017). Notably, polysemes are much more frequent than homonyms, insofar as 40% of frequent English words are polysemous (Durkin and Manning, 1989), while closer to 4% of words are homonyms (Dautriche, 2015).

Even though homonyms are relatively rare, children as young as 3 years old have been found to know a number of them (Backscheider and Gelman, 1995). At least for these words, preschoolers have managed to overcome their reluctance to assign a second meaning to a familiar word (Casenhiser, 2005). We also know that children readily generalize the meaning of a word to include new referents that share a single dimension, such as shape (Smith et al., 2002) or function (Gentner, 1978), and Srinivasan et al. (2017) have found that 4-5 year-old children can be taught that a word extends to other referents that share the same material.

While previous psycholinguistic work has primarily focused on learning words with a single meaning or words that can be generalized along a single dimension (rule-based polysemy), a recent study that we simulate below has investigated words with multiple distinct, conventional meanings (non-rule-based). This work has demonstrated that it is easier to learn conventional polysemy when compared with homonymy, even when the polysemy follows complex, multidimensional extension patterns as in natural language.[1]

[1] Experimental results are under submission.

We propose a computational model that allows words to be assigned multiple meanings that cannot be generated by a one-dimensional rule, but must instead be learned through exposure (Brocher et al., 2017). We use the results from the behavioral experiment in order to inform and test the proposed model. As reported below, the model not only captures the finding that people find it easier to learn polysemous words than ambiguous words, but it also closely approximates human errors. This represents a first step toward addressing the complexity involved in learning more than a single meaning of a given word.

2 Related Work

Only two recent models of human vocabulary learning allow words to evoke multiple senses. The model of Kachergis et al. (2017) implements a bias to prefer a single referent, but allows a second (unrelated) candidate meaning to be represented. Another model, Pursuit, maps each word onto a single candidate meaning per trial, and selects a new candidate meaning (at random) only when the primary meaning is disconfirmed (Stevens et al., 2017). This model retains a stipulation that only a single meaning wins. Importantly, neither of these models is evaluated on its ability to accurately represent multiple meanings. In fact, these and most other models make the simplifying assumption that each sense is represented atomically, without any internal structure or features. This precludes them from even attempting to distinguish polysemy from homonymy, since each meaning is equally (un)related to every other meaning.

It is necessary to allow word meanings to have internal structure if we are to capture relationships among meanings of a single word. The one model of human vocabulary learning that assigns such internal structure is the feature-based associative model of Fazly et al. (2010), which has been extended in multiple studies to account for patterns of learning complex naturalistic meaning (Nematzadeh et al., 2012, 2014). This model represents a cross-situational learner, acquiring the meaning of each word incrementally by aligning each feature in the context with a probabilistic association to each word. The model learns by ultimately representing each word's meaning as an associated "bag-of-features". We choose this model as a basis for our approach, given its successful application in many word learning tasks and its ability to represent fine-grained properties (features) of meanings.

But critically, we extend the NFS12 model in order to represent the learning of words with multiple distinct meanings that may share overlapping features to varying degrees. The key innovation we add to the bag-of-features model of NFS12 is the following: we assign each distinguishable object a distinct, albeit overlapping, set of features. In our version, the model learns words as associations to distinct structured collections of feature sets rather than learning independent associations of each word to each feature. We replicate the input and tasks of recent experimental multi-meaning word learning work, and compare the performance of the extended model with NFS12 and with the performance of human learners. In the following sections, we describe the original model and our modification of it.

3 Computational Models

3.1 Cross-situational Word Learning Model

We use the implementation of the cross-situational word learner of Nematzadeh et al. (2012) (NFS12) as the best-fitting basis for our model. While later versions of the model are also available, these versions encode assumptions regarding hierarchical-categorical learning that are irrelevant to this research and require hand-coded data of the categories in the input. NFS12 learns from <utterance, scene> input pairs that simulate what a language learner hears in the linguistic input: i.e., the utterance, and the features corresponding to the non-linguistic context (the scene). For example, the learner might first encounter the word cap accompanied by features that represent the scene of a parent asking a child to put a cap on a summer day, e.g.,

Utterance = "put your cap on"
Features = {sun, light, clothing, fabric, cover, animate, ...}

The features for each utterance correspond to all relevant aspects of the understood message and the witnessed scene. This is represented as a bag-of-features in the sense that there are no boundaries to indicate which features represent each object in the visual world. The model learns the probabilistic association between each feature, f, and each word, w, through a bootstrapping process. The model initializes all P_{t-1}(f|w) to a uniform distribution over all words and features. At time t, the model learns the current association of w and f as proportional to their prior learned probability:

$assoc_t(w, f) = \frac{P_{t-1}(f|w)}{\sum_{w' \in U} P_{t-1}(f|w')}$  (1)

where P_{t-1}(f|w) is the probability of f being part of the meaning of w at the previous learning step. If the association of f with some other word in the utterance is particularly high, the association of f with w will be correspondingly lower. The new evidence is then used to update the probability of all observed features in a smoothed version of:

$P_t(f|w) = \frac{assoc_t(w, f)}{\sum_{f' \in F} assoc_t(w, f')}$  (2)

where F is the set of all features observed thus far. The associations are summed over their occurrences in the input, in proportion to the time passed since the last occurrence:

$assoc_t(f, w) = \ln\left( \sum_{t'=1}^{t} \frac{a_{t'}(w|f)}{(t - t')^d} \right)$  (3)

The associations are updated with every learning step to account for past experience. The denominator represents the decay of the association over time, as memories of the input are assumed to fade. d is proportional to the strength of association, such that stronger associations fade less, even when significant time has passed since a previous encounter of w, i.e., t - t'. The learning iterations result in an association score between each feature and each word based on the observed input. The acquisition of word meaning is defined as success on a prediction task over the learned associations, as described in Section 4.
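To make the update concrete, the following is a minimal Python sketch of an NFS12-style cross-situational update for Equations (1) and (2). It is our own simplification, not the authors' code: it omits the memory decay of Equation (3), and a small smoothing constant stands in for the uniform initialization.

from collections import defaultdict

class CrossSituationalLearner:
    def __init__(self, smoothing=1e-6):
        self.smoothing = smoothing
        self.p = defaultdict(dict)  # p[w][f] approximates P(f|w)

    def _prob(self, w, f):
        # unseen (w, f) pairs fall back to a small constant
        return self.p[w].get(f, self.smoothing)

    def update(self, utterance, features):
        # Equation (1): credit feature f to word w in proportion to the
        # previously learned P_{t-1}(f|w), normalized over the utterance.
        assoc = {(w, f): self._prob(w, f) /
                 sum(self._prob(w2, f) for w2 in utterance)
                 for w in utterance for f in features}
        # Equation (2): renormalize over all features seen with w so far
        # to obtain the updated meaning probability P_t(f|w).
        for w in utterance:
            seen = set(self.p[w]) | set(features)
            scores = {f: assoc.get((w, f), self._prob(w, f)) for f in seen}
            total = sum(scores.values())
            for f in seen:
                self.p[w][f] = scores[f] / total

learner = CrossSituationalLearner()
learner.update(["put", "your", "cap", "on"],
               ["sun", "light", "clothing", "fabric", "cover", "animate"])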

3.2 Exemplar-based Learning as Sets of Features

The NFS12 model creates a bank of associations of varying strengths between features and words. It is based on the idea that over many observations of a word, the features that are actually relevant to that word will gain in probability over features that only coincidentally co-occurred with the word in some subset of contexts. To date, no version of NFS12 has been evaluated on words with multiple senses. Note that if applied to multiple meanings in its current formulation, all of the features from all of a word's meanings will become associated with the word, without regard to whether certain features tend to occur with one meaning while other features tend to occur with a different meaning. That is, a word with multiple meanings will come to be associated with a merged bag-of-features. For instance, separate occurrences of the word cap would be associated with either {plastic, cover, bottle} or {fabric, head}, but the model would predict that a combination of features such as {cover, fabric, bottle} would be a reasonable interpretation of cap.

We predict that this vague representation will not be sufficient to approach human-like performance in recognizing distinct senses. Based on evidence that people are able to remember particular instances of objects they observe (Allen and Brooks, 1991; Brooks, 1987; Thibaut and Gelaes, 2006; Nosofsky et al., 2018), we modify the input representations to include sets of features for each word in the utterance, as follows.

We propose a Structured Multi-Feature (SMF) model that extends NFS12 by associating each word with sets of features that have been learned on the basis of witnessing potential referents (as opposed to isolated features) across scenes.[2] For example, if a scene involved two potential referents (the sun and a baseball cap), the following feature sets would be candidates for association with the words in the utterance:

Utterance = "put your cap on"
Feature sets = {sun, light}, {clothing, fabric, cover}

[2] Like other models of human word learning, we focus our evaluation for now on the learning of words that correspond to referents in scenes.

We modify the learning process to estimate the association of a word, w, and a set of features, s, following the formulation of the original model:

$assoc(w, s) = \frac{P_t(s|w)}{\sum_{w' \in U} P_t(s|w')}$  (4)

Thus a set of features, s, essentially represents a hypothesized sense of a referential word. The probability P_t(s|w) is estimated from the previous occurrences of the word, where the probability of each set is proportional to the degree of overlap in features, rather than requiring a direct observation of the specific set. The degree of overlap between two sets, s_f and s_j, is calculated using the Jaccard similarity coefficient, the proportion of shared features across the two sets over all features in the two sets:

$jacc\text{-}sim(s_f, s_j) = \frac{|s_f \cap s_j|}{|s_f \cup s_j|}$  (5)

The modification, making use of coherent sets of features rather than independent features, captures a key claim about how people learn referential words. Rather than learning the degree of association between words and individual features, e.g., learning the association of cap with fabric independently of the association between cap and clothing, the model assumes that people learn from coherent exemplars. The learner eventually acquires a collection of feature sets with varying degrees of association strength. The association between fabric and cap can only be determined once other features are taken into account as well: fabric will be more strongly associated with cap in the presence of the feature clothing, and less associated with cap if the feature bottle is included and clothing is missing.[3]

[3] A very recent publication by the authors of NFS12 experiments with the use of sets, but remains limited to single-sense word representations and still learns associations of words with features rather than sets (Nematzadeh et al., 2017).
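As a rough illustration of the set-based scoring just described, the sketch below weights stored candidate senses of a word by their Jaccard overlap (Equation 5) with an observed feature set. The names and data structures are our own simplification, not the authors' implementation.

def jaccard(s1, s2):
    # proportion of shared features over all features in the two sets
    return len(s1 & s2) / len(s1 | s2)

def sense_support(stored_senses, observed_set):
    # weight each stored sense of a word by its overlap with the
    # feature set of the candidate referent observed in the scene
    return [(sorted(s), jaccard(s, observed_set)) for s in stored_senses]

cap_senses = [{"clothing", "fabric", "cover"}, {"plastic", "cover", "bottle"}]
print(sense_support(cap_senses, {"fabric", "cover", "head"}))
# the clothing-related sense (overlap 2/4) receives more support than
# the bottle-related sense (overlap 1/5)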

Figure 1: A sample of the objects used in the novel-word learning experiment. Polysemy (upper panel): pairs of exemplars share properties, as marked by arrows, with no single core feature shared by all three. Homonymy (lower panel): a scrambled selection of three objects with fewer relationships among exemplars.

4 Learning Polysemy vs. Homonymy

We evaluate the NFS12 model and the present SMF model by simulating a novel word learning task in which human participants learned several polysemous or homonymous words, as described below. The experiment compared how three populations learn words with multiple meanings: adults, typically developing children, and children with autism spectrum disorder. Since no previous computational models have attempted to capture how humans learn multi-meaning words, we focus on the adult group as a first step, as it allows us to minimize assumptions regarding learners' development of cognitive abilities. We follow the experimental design to investigate, for the first time, how words with distinct but related senses (conventional polysemy) are learned, particularly when the range of senses does not follow from any language-wide rule.

4.1 Novel Word Learning Experiment

The experimental work explicitly compared homonymy and conventional polysemy. In particular, participants learned 4 novel words, and each novel word was associated with 3 clearly distinct novel objects. Randomly interspersed among the 12 labeled objects were 20 unlabeled filler (non-target) objects accompanied by tones. Novel objects were used to avoid interference from familiar words. Half of the participants were randomly assigned to a Polysemy condition, in which the 3 objects assigned to each word were related to one another, sharing distinct features pairwise. The other half was assigned to a Homonymy condition, in which the 3 objects assigned to each word did not share any distinguishing features that would set them apart from the filler objects. (See Figure 1 for an example.) The "polysemous" meanings of words were confirmed to be more similar than the "homonymous" meanings, as intended, using a separate norming study with a new group of participants.

Figure 2: An example of the stimuli presented to participants in the Label ID task. The target object was presented along with 3 distractors, which were targets for the other novel words.

After brief exposure, participants completed the following two tasks, designed to determine whether polysemous words were easier to learn than homonymous words.

1. Label ID task - Participants were asked to select, from 4 available options, the one object that corresponded to a given label. The 3 foil objects had been labeled by each of the 3 other labels (see Figure 2). Results showed significantly higher accuracy for the polysemy condition over the homonymy condition.

2. Sense Selection task - Participants were presented with the label of one of the 4 words and shown 8 objects (see Figure 3). Three of the objects corresponded to the 3 senses of the word, and the 5 additional objects were fillers that had been witnessed during exposure. Accuracy was lower on this task, showing only a slight polysemy advantage, due to task difficulty.

Figure 3: An example of the stimuli presented to participants in the Sense Selection task. All 3 target objects associated with one of the novel words were presented along with 5 filler objects, which had also been witnessed during the exposure but had not been labeled.

The Label ID task allows a comparison of the two conditions, polysemy and homonymy, but participants may have performed well by a process of elimination, using their memory of one or two other words to recognize an object as related to a different label. The Sense Selection task allows for a more thorough error analysis; importantly, the particular objects selected by humans were made available to the computational analysis. We perform an error analysis on the selection rate of each filler object. The results of this task provide a crucial test of the bag-of-features meanings learned by the NFS12 model.

4.2 Experimental Simulation

We trained the model on input that reflected the exposure in the novel word study. In particular, two annotators hand-coded each object with 4 to 5 features to compose a joint list of 40 features that jointly described all 12 labeled and 20 unlabeled (filler) objects. The features included properties related to shape, size, color, texture, material, symmetry, etc. We trained the NFS12 and SMF models independently: the features were used as a bag-of-features for the NFS12 model, and as structured sets of features for SMF. Recall that although each input item consisted of a single word associated with observable features, the models differ in the way they learn. NFS12 learns the association of a word with each feature, while SMF learns the association of a word to a subset of features.
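To make the difference in input encoding concrete, here is a hypothetical sketch of a single training item in both formats; the label and feature names are invented stand-ins for the experiment's novel words and hand-coded features.

# One training item in the two input formats described above.
# "dax" and the feature names are hypothetical stand-ins.
word = "dax"
referent_feature_sets = [{"round", "blue", "plastic", "symmetric"}]

nfs12_input = (word, set().union(*referent_feature_sets))  # flat bag-of-features
smf_input = (word, referent_feature_sets)                  # structured sets per referent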

At the end of training, we tested the models by simulating each of the two tasks described above. We first estimate the association of each word to each of the items, using the cosine similarity between the learned associations and the feature representation of the item. For the NFS12 model, we calculated the cosine similarity between all the associated features. For the SMF model, we calculated the maximum cosine similarity score over all the sets of features associated with the word and the feature representation of the item (i.e., we considered the sense of the word most similar to the object in question).

           Polysemy   Homonymy
NFS12      0.88       0.37
SMF        0.92       0.51

Table 1: Pearson correlation between results from participants on the Label ID task and the NFS12 and proposed SMF models.

The likelihood of choosing an object as a target is measured by the proportional similarity of each object compared with the other objects presented in the task. For each stimulus set of 4 items used in the Label ID task (see Figure 2) and 8 items used for the Sense Selection task (see Figure 3), we calculate:

$P(o|w) = \frac{\cos(o, w)}{\sum_{o' \in O} \cos(o', w)}$  (6)

where w is the word presented at test, o ranges over all the objects presented at test (4 or 8 items), and O is the full set of objects for this test set.
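A minimal sketch of this test-time choice rule, assuming illustrative {feature: weight} dictionaries for senses and objects (our construction, not the authors' implementation), might look like the following; for SMF, the word-object similarity is the maximum cosine over the word's learned feature sets, as described above.

import math

def cosine(u, v):
    # cosine similarity between two sparse {feature: weight} vectors
    dot = sum(u.get(f, 0.0) * v.get(f, 0.0) for f in set(u) | set(v))
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def smf_similarity(word_senses, obj):
    # consider the sense of the word most similar to the object
    return max(cosine(sense, obj) for sense in word_senses)

def choice_probs(word_senses, objects):
    # Equation (6): normalize similarities over the stimulus set
    sims = [smf_similarity(word_senses, o) for o in objects]
    total = sum(sims)
    return [s / total for s in sims]

word_senses = [{"clothing": 1.0, "fabric": 0.8}, {"plastic": 1.0, "bottle": 0.8}]
objects = [{"clothing": 1.0, "fabric": 1.0},  # matches the first sense
           {"plastic": 1.0, "head": 0.5}]     # partial match to the second
print(choice_probs(word_senses, objects))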

5 Results

The experimental settings kept the alignment of sets of objects constant across participants, while randomizing the word labels and the order of object presentation. For example, the same set of 4 objects in Figure 2 was used to test all 4 words. We replicated the combinations of objects used to test each label in order to compare the computational models to people's choices. We used the default parameter settings included in the configuration files for NFS12.

5.1 Label ID Task

We first evaluate each model on its ability to replicate the polysemy advantage observed in the human data. We obtain the item selection probability using Equation 6 for the target items only. Following the results from the human experiments, we average the item probability over all targets to get the results for each model (see Figure 4).

Figure 4: The likelihood of choosing an object corresponding to a target sense for Homonymy vs. Polysemy conditions in the Label ID task for (1) the human results, (2) the model of NFS12, (3) our extension of that model, SMF.

As can be seen in the middle panel of Figure 4, NFS12 replicates the human polysemy advantage, identifying the target meanings in the polysemy condition more accurately than targets in the homonymy condition. However, the accuracy of NFS12 in choosing the target object for the given label is considerably lower compared with the human results in the leftmost panel. The low accuracy of NFS12 compared with human performance suggests that NFS12 is prone to selecting non-target objects which do not resemble a specific sense of the learned word. Recall that because NFS12 does not maintain each exemplar's associated set of features, the model creates a superset of weighted features without respecting the co-variance among features that are associated with a particular sense. Thus, NFS12 assigns high probability to objects which share features from distinct target meanings, whereas humans are much less likely to do this.

To illustrate, Figure 1 provides example stimuli representing three senses of a polysemous vs. a homonymous word. The middle object in the upper panel of Figure 1 shares some features with the left-most object (e.g., handle and overall shape), and other features with the right-most object (e.g., color, texture, and rectangular shape). However, the homonymous senses of the word share fewer features with one another (lower panel of Figure 1). Since almost no features overlap between pairs of homonymous objects, the number of features included in the superset for this word is higher than for the polysemous word.

As a result, NFS12 under-performs in accuracy for all of the targets in Figure 1, in both the homonymy and polysemy conditions. In the homonymy condition (lower panel of Figure 1), the bag-of-features consists of more features than in the polysemy condition. Probabilistically, this bag-of-features will generate a higher number of subsets that happen to coincide with the features associated with fillers, which lowers the accuracy of NFS12 for homonymy. For instance, NFS12 accuracy is significantly lower for the translucent yellow item in the middle of the panel because it simply aggregates the highly frequent features, learning a strong association between the word label and the feature orange. On the other hand, SMF preserves the co-occurrence statistics of features, preventing the orange feature from being incremented in isolation from the other features of that object.

The SMF model also captures the polysemy-over-homonymy advantage, with higher accuracy than NFS12. Overall, then, its accuracy matches human performance more closely than NFS12's does (see the right panel of Figure 4). To quantify the correlation of each model with the particular selections made by human participants, we calculate the Pearson correlation over all objects (targets and fillers), using the results from Equation 6. (We use the Pearson correlation as the results from both humans and models have normal distributions with kurtosis values close to 3.)

The correlations with human errors for both models are given in Table 1. SMF offers a significant improvement over NFS12 in the homonymy condition, and mirrors human errors in the polysemy condition slightly better as well. The weaker absolute correlation of the SMF model in the homonymy condition when compared with polysemy (.51 vs. .92) stems from the model over-performing on some items while under-performing on others, when compared with humans. We hypothesize that people differ from the model in the weights they give particular features, e.g., color vs. size. For example, SMF has higher accuracy than humans in selecting the leftmost item in the homonymy condition in Figure 1, possibly by forming a bias toward large-size items, while people may not attend to size to the same degree.

The models increase probability with every overlapping feature, regardless of what the feature denotes (shape, color, size, etc.). It is well known that children learn to attend to shape in learning referential novel nouns by two years of age (Smith et al., 2002). Moreover, people learn to attend to certain dimensions of meaning more closely given certain categories, e.g., using color to distinguish fruits and vegetables but not dogs and cats (Sloutsky et al., 2016). The SMF model overcomes this difficulty to some degree by having distinct memories of individual items. In order to capture these sorts of biases toward certain features for certain types of words, future models need to learn such biases over time, as we discuss further in Section 6.

Figure 5: Sense Selection task: the likelihood of choosing an object corresponding to a target sense for Homonymy vs. Polysemy conditions in (1) the human results, (2) the model of NFS12, (3) our extension of the model, SMF.

           Polysemy   Homonymy
NFS12      0.82       0.67
SMF        0.91       0.90

Table 2: Pearson correlation between errors produced by human participants and the NFS12 and SMF models on the Sense Selection task.

5.2 The Sense Selection Task

Following the results in Subsection 5.1, we aim to further our analysis of the learning pattern of each model using a second task, which challenges participants to recognize all senses of a word simultaneously and includes more distractors (filler objects). The accuracy of choosing all three target items is presented in Figure 5.

Human’s polysemy advantage was less pro-

222

nounced in the Sense Selection task comparedwith the Label ID task. As shown in Figure 5,both models also show less difference betweenpolysemy and homonymy than they did on theLabel ID task. While the polysemy advantage ishigher in NFS12, SMF actually shows closer per-formance to the human data, due to more compa-rable levels of accuracy.

We again evaluate the probability of choosing each of the objects, over both targets and fillers. That is, we compare the probability of each model selecting each object with human performance. In particular, we calculate the Pearson correlation between each of the two models and the human results; see Table 2. The correlations of SMF with the human results are much better than those of NFS12 in both the Polysemy and Homonymy conditions. These results align with our findings in the previous simulation, especially in mirroring NFS12's difficulty in learning unrelated senses (homonymy). The SMF model, on the other hand, approaches a 0.9 correlation in both conditions. Thus, although the SMF model has a lower overall probability of choosing targets compared to people, it closely mirrors human error patterns. These results support the role of distinct memories of exemplars, while taking into account the overlap among sets of features during selection. Note that the high correlations can be attributed to similarity in the relative ranking across items between the human results and SMF. At the same time, SMF still underestimates the overall probability of predicting certain items, which results in a lower accuracy compared with the human results.

6 Discussion and Future Directions

We have presented a computational analysis of the acquisition of word meaning for words with multiple senses. Despite the growing interest in computational models for analyzing human word learning, this aspect has remained under-studied due to the complexity of the problem. Our analysis is the first, to our knowledge, to directly model differences in the acquisition of multi-sense words with varying degrees of overlap across senses. The computational design enables a closer analysis of the strengths and weaknesses involved in the human learning of multi-sense words, through the analysis of human errors.

The model of Nematzadeh et al. (2012) learned the association between independent features and words. It was chosen as the benchmark for our analysis because it represents the rare model that goes beyond atomic meanings by offering feature-based representations. Results demonstrate, however, that its bag-of-features representation is not sufficient to account for human-like learning of multi-meaning words, particularly in the case of homonymy, where combining the features of unrelated senses results in a particularly noisy representation. Our modified version, the Structured Multi-Feature model, changes both the input representation and how the model learns to associate words with meanings. In particular, SMF preserves the co-occurrence statistics of the features associated with particular objects (exemplars), as motivated by evidence from human memory research (Allen and Brooks, 1991; Brooks, 1987; Thibaut and Gelaes, 2006; Nosofsky et al., 2018).

This study offers only a first step toward a computational model that fully captures the way that humans learn realistic words, which commonly evoke a range of senses, importantly including functional and metaphorical extensions that are not part of the interpretation of our novel stimuli. We recognize that our hand-coding of features makes both NFS12 and SMF impractical at scale, but insofar as words meaningfully differ on a number of distinct dimensions, the reliance on features, however they are to be determined, is reasonable. Given the quite short exposure phase in the experimental work, the current analysis has not explored the role of the memory or attention mechanisms included in the original model of NFS12 (Nematzadeh et al., 2014).

We believe that correlations with human performance could potentially be improved by letting surprisal or novelty affect the weights of features. We also know that people pay more attention to some features than others in a way that depends on linguistic cues, the domain involved, and their prior knowledge. For example, people attend to color to distinguish fruits, while color is less important when identifying dogs vs. cats.

The addition of structured sets of features offers an improvement over a general bag-of-features approach and has demonstrated strong correlations with human performance. Learning words with multiple meanings is a common occurrence in natural languages, so models that aim to capture human word learning must address this basic fact.


Future extensions of SMF should incorporate a mechanism to simulate attention, including primacy and recency effects, in order to investigate how people weight different features or dimensions of meaning in various contexts. Although NFS12 included a mechanism to encode higher attention to novel words, this only captures item-based novelty, i.e., how frequently an item is observed, which does not play a significant role within the context of our experiment.[4] The multi-meaning words, however, introduce the challenge of attending to new meanings of familiar words over a short span of time. To more fully understand the relevant mechanisms and their roles in word learning, we plan to simulate the tasks discussed here using real-world polysemes with much richer sets of features. The conclusions of this study will further be used to guide extensions of the experimental designs in order to consider the role of attention in human word learning as well.

[4] We use the original settings and keep this constant for future studies.

References

Scott W Allen and Lee R Brooks. 1991. Specializing the operation of an explicit rule. Journal of Experimental Psychology: General, 120(1):3.

Andrea G Backscheider and Susan A Gelman. 1995. Children's understanding of homonyms. Journal of Child Language, 22(1):107–127.

Andreas Brocher, Jean-Pierre Koenig, Gail Mauner, and Stephani Foraker. 2017. About sharing and commitment: the retrieval of biased and balanced irregular polysemes. Language, Cognition and Neuroscience, pages 1–24.

Lee R Brooks. 1987. Decentralized control of categorization: The role of prior processing episodes.

Devin M Casenhiser. 2005. Children's resistance to homonymy: An experimental study of pseudo-homonyms. Journal of Child Language, 32(2):319–343.

Isabelle Dautriche. 2015. Weaving an ambiguous lexicon. Ph.D. thesis, Sorbonne Paris Cité.

Kevin Durkin and Jocelyn Manning. 1989. Polysemy and the subjective lexicon: Semantic relatedness and the salience of intraword senses. Journal of Psycholinguistic Research, 18(6):577–612.

Afsaneh Fazly, Afra Alishahi, and Suzanne Stevenson. 2010. A probabilistic computational model of cross-situational word learning. Cognitive Science, 34(6):1017–1063.

Michael C Frank, Noah D Goodman, and Joshua B Tenenbaum. 2009. Using speakers' referential intentions to model early cross-situational word learning. Psychological Science, 20(5):578–585.

Dedre Gentner. 1978. A study of early word meaning using artificial objects: What looks like a jiggy but acts like a zimbo? Reading in developmental psychology.

George Kachergis, Chen Yu, and Richard M Shiffrin. 2017. A bootstrapping model of frequency and context effects in word learning. Cognitive Science, 41(3):590–622.

Aida Nematzadeh, Barend Beekhuizen, Shanshan Huang, and Suzanne Stevenson. 2017. Calculating probabilities simplifies word learning. Proceedings of the 39th Annual Conference of the Cognitive Science Society.

Aida Nematzadeh, Afsaneh Fazly, and Suzanne Stevenson. 2012. A computational model of memory, attention, and word learning. In Proceedings of the 3rd Workshop on Cognitive Modeling and Computational Linguistics, pages 80–89. Association for Computational Linguistics.

Aida Nematzadeh, Afsaneh Fazly, and Suzanne Stevenson. 2014. A cognitive model of semantic network learning. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP).

Robert M Nosofsky, Craig A Sanders, and Mark A McDaniel. 2018. Tests of an exemplar-memory model of classification learning in a high-dimensional natural-science category domain. Journal of Experimental Psychology: General, 147(3):328.

Steven T Piantadosi, Harry Tily, and Edward Gibson. 2012. The communicative function of ambiguity in language. Cognition, 122(3):280–291.

Willard V Quine. 1969. Word and object. Cambridge, Mass.

Jeffrey Mark Siskind. 1996. A computational study of cross-situational techniques for learning word-to-meaning mappings. Cognition, 61(1-2):39–91.

Vladimir M Sloutsky et al. 2016. Selective attention, diffused attention, and the development of categorization. Cognitive Psychology, 91:24–62.

Linda B Smith, Susan S Jones, Barbara Landau, Lisa Gershkoff-Stowe, and Larissa Samuelson. 2002. Object name learning provides on-the-job training for attention. Psychological Science, 13(1):13–19.

Linda B Smith, Sumarga H Suanda, and Chen Yu. 2014. The unrealized promise of infant statistical word-referent learning. Trends in Cognitive Sciences, 18(5):251–258.

Mahesh Srinivasan, Catherine Berner, and Hugh Rabagliati. 2017. Children's use of lexical flexibility to structure new noun categories. In Proceedings of the 39th Annual Conference of the Cognitive Science Society.

Mahesh Srinivasan and Hugh Rabagliati. 2015. How concepts and conventions structure the lexicon: Cross-linguistic evidence from polysemy. Lingua, 157:124–152.

Jon Scott Stevens, Lila R Gleitman, John C Trueswell, and Charles Yang. 2017. The pursuit of word meanings. Cognitive Science, 41(S4):638–676.

Jean-Pierre Thibaut and Sabine Gelaes. 2006. Exemplar effects in the context of a categorization rule: Featural and holistic influences. Journal of Experimental Psychology: Learning, Memory, and Cognition, 32(6):1403.

John C Trueswell, Tamara Nicol Medina, Alon Hafri, and Lila R Gleitman. 2013. Propose but verify: Fast mapping meets cross-situational word learning. Cognitive Psychology, 66(1):126–156.
