Source: catlab.psy.vanderbilt.edu/wp-content/uploads/File/Palmeri-PBR99.pdf

Psychonomic Bulletin & Review, 1999, 6 (3), 495-503

Learning categories at different hierarchical levels:

A comparison of category learning models

THOMAS J. PALMERI
Vanderbilt University, Nashville, Tennessee

Three formal models of category learning, the rational model (Anderson, 1990), the configural-cue model (Gluck & Bower, 1988a), and ALCOVE (Kruschke, 1992), were evaluated on their ability to account for differential learning of hierarchically structured categories. An experiment using a theoretically challenging category structure developed by Lassaline, Wisniewski, and Medin (1992) is reported. Subjects learned one of two different category structures. For one structure, diagnostic information was present along a single dimension (1-D). For the other structure, diagnostic information was distributed across four dimensions (4-D). Subjects learned these categories at a general or at a specific level of abstraction. For the 1-D structure, specific-level categories were learned more rapidly than general-level categories. For the 4-D structure, the opposite result was observed. These results proved highly diagnostic for evaluating the models: although ALCOVE provided a good account of the observed results, the rational model and the configural-cue model did not.

In recent years, there has been tremendous growth in the development of formal models of classification. These include exemplar (e.g., Estes, 1994; Kruschke, 1992; Nosofsky, 1986; Nosofsky & Palmeri, 1997; Palmeri, 1997), connectionist (e.g., Gluck & Bower, 1988a, 1988b), Bayesian statistical (e.g., Anderson, 1990), decision bound (e.g., Ashby & Maddox, 1993; Maddox & Ashby, 1993), and rule-based models (e.g., Nosofsky & Palmeri, 1998; Nosofsky, Palmeri, & McKinley, 1994; Palmeri & Nosofsky, 1995). Although these models make different assumptions about category representations and processes, many of them make similar predictions of some elementary patterns of classification data. This has caused researchers to look at more detailed aspects of classification in order to evaluate models. In the present work, rather than simply asking whether models could account for transfer data following category learning, the models were instead evaluated on whether they could account for patterns of classification throughout the entire training sequence. This work follows in a line of recent studies that have focused on understanding the details of the category learning process (e.g., Estes, 1986; Estes, Campbell, Hatsopoulos, & Hurwitz, 1989; Kruschke, 1992; Nosofsky, Gluck, Palmeri, McKinley, & Glauthier, 1994; Nosofsky, Kruschke, & McKinley, 1992; Nosofsky & Palmeri, 1996).

Surprisingly, formal models of classification have largely neglected issues surrounding category learning at different hierarchical levels (see, however, Estes, 1993).

This work was supported by Vanderbilt URC Direct Research Support Grants and by Grant PHS R01 MH48494-06 from NIMH to Indiana University. Correspondence should be addressed to T. J. Palmeri, Department of Psychology, 301 Wilson Hall, Vanderbilt University, Nashville, TN 37240 (e-mail: [email protected]).

The seminal work of Rosch and colleagues (Rosch, Mervis, Gray, Johnson, & Boyes-Braem, 1976) demonstrated that intermediate levels of a category hierarchy (the basic level) have a privileged status relative to superordinate or subordinate categories (see Lassaline, Wisniewski, & Medin, 1992, for one recent review). For example, objects are often classified most rapidly at the basic level (e.g., Murphy & Smith, 1982; Rosch et al., 1976), and categories are often learned most rapidly at the basic level (e.g., Lassaline et al., 1992). Although most models of supervised category learning (in contrast to unsupervised learning; see Fisher & Langley, 1990, and Schyns, 1991) have not been formalized with hierarchical aspects of classification in mind, it seems reasonable to evaluate whether these models can account for differences in learning categories at different hierarchical levels.

For purposes of evaluating the category learning models, an intriguing category structure reported by Lassaline et al. (1992; Experiment 3 of Lassaline, 1990) was used. In their study, subjects learned one of two different category structures at either a specific or a general level of a category hierarchy. As shown in Table 1, the two different structures primarily differed in whether the defining features fell along just a single dimension (1-D structure) or fell along all four dimensions (4-D structure). Lassaline et al. observed an interesting, and theoretically challenging, pattern of results. For the 1-D structure, a "basic-level effect" was observed: fewer errors were made when learning the specific-level categories than when learning the general-level categories. For the 4-D structure, the opposite pattern of results was observed.

These results presented formidable challenges to the three classification models that Lassaline et al. (1992) examined: a category utility measure (Gluck & Corter, 1985), the

495 Copyright 1999 Psychonomic Society, Inc.


Table 1
Category Structure Used in the Experiment (From Lassaline et al., 1992)

                                         Category Structure
              Category Label            1-D                    4-D
Stimulus     General   Specific    D1  D2  D3  D4        D1  D2  D3  D4
    1           A         C         0   0   2   3         0   1   3   2
    2           A         C         0   1   3   0         0   2   1   3
    3           A         C         0   2   0   1         0   3   2   1
    4           A         D         1   3   1   0         1   0   2   1
    5           A         D         1   0   2   1         2   0   3   1
    6           A         D         1   1   3   2         3   0   1   2
    7           B         E         2   2   0   2         3   2   0   1
    8           B         E         2   3   1   3         1   3   0   2
    9           B         E         2   0   2   0         2   1   0   3
   10           B         F         3   1   3   1         2   3   1   0
   11           B         F         3   2   1   2         3   1   2   0
   12           B         F         3   3   0   3         1   2   3   0

Note: In the original table, the feature values that are most diagnostic for category membership are italicized (the D1 values for the 1-D structure, and the 0 values for the 4-D structure).

adaptive network model (Gluck & Bower, 1988b), and the context model (Medin & Schaffer, 1978) could not account for the interaction of level and structure. Category utility predicted a specific-level advantage for both structures, whereas the adaptive network model and the context model predicted a general-level advantage for both structures. Would these results also pose challenges to more recent, and potentially more sophisticated, category learning models? In particular, could the rational model (Anderson, 1990), the configural-cue model (Gluck & Bower, 1988a), and ALCOVE (Kruschke, 1992) account for an interaction of category level and structure?

Unfortunately, two aspects of Lassaline et al. (1992) make it impossible to use their results to evaluate these models. First, they reported only average accuracy throughout training. In order to rigorously evaluate the category learning models, the present study instead reported classification data throughout the course of training. Second, they used a category verification paradigm in which a stimulus and a category label were simultaneously displayed, and the subjects' task was to decide whether that category label was the correct one or not. The present study instead used a more typical category learning paradigm in which a stimulus was displayed, and the subjects' task was to decide which category label from a number of possible category labels to apply to the stimulus. This kind of task is preferable, largely because the various category learning models have been explicitly formulated to account for data from just such classification paradigms.

Note that in using this more typical kind of category learning task, the interesting interaction of category level and structure that Lassaline et al. (1992) reported is not guaranteed to be reproduced. In the category verification task, there are just two possible responses ("yes" or "no") at both the general level and the specific level. By contrast, in the classification task used in the present study,

there were two possible responses at the general level, but there were four possible responses at the specific level. By simply guessing, subjects learning at the general level would be correct half of the time, whereas subjects learning at the specific level would be correct only one fourth of the time. In order to observe a specific-level advantage in this task, subjects would need to surmount this relatively large base rate difference; obtaining a crossover interaction between category level and the amount of training is a real empirical challenge. Moreover, this pattern of results would provide an even more difficult test for the category learning models.
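The base-rate gap can be made concrete in a few lines (a minimal sketch; the response counts come directly from the task description above):

```python
# Chance performance under pure guessing, per category level.
responses = {"general": 2, "specific": 4}

chance_accuracy = {level: 1 / n for level, n in responses.items()}
chance_error = {level: 1 - p for level, p in chance_accuracy.items()}

print(chance_accuracy)  # {'general': 0.5, 'specific': 0.25}
print(chance_error)     # {'general': 0.5, 'specific': 0.75}
```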

Because it has proven quite difficult to train individuals to classify objects at different hierarchical levels simultaneously, different groups of subjects in each of the four experimental conditions (1-D-specific, 1-D-general, 4-D-specific, and 4-D-general) were trained separately. In the studies in which subjects have been trained at multiple levels, they have typically been trained on one level at a time (e.g., Murphy, 1991; Murphy & Smith, 1982). In addition to posing difficulties for subjects, it is not obvious how to instantiate this segmented training regimen in the models that were evaluated. Although the hierarchical levels existed only virtually, in the sense that no subject learned more than one level throughout the experiment, any empirical differences between levels that are observed would still be challenging for the models to reproduce. If the models cannot account for differential learning of categories that exist in virtual hierarchies, it seems less likely that they will be able to account for naturally learned hierarchies either.

In the following experiment, a standard category learning paradigm was used. On every trial, a stimulus was presented, and the subject classified it into either one of two categories (subjects learning at the general level) or one of four categories (subjects learning at the specific level). Of particular interest was how classification accuracy would change with training as a function of which category structure subjects learned (1-D vs. 4-D) and at what level they learned to classify (specific vs. general).

METHOD

Subjects
The subjects were 100 undergraduates who voluntarily participated as part of an introductory psychology course.

Stimuli
The stimuli were computer-generated line drawings of rocketships varying in the shape of the wing, nose, porthole, and tail (closely modeled after stimuli originally used by Hoffman & Ziessler, 1983; see also Anderson, 1990). Each dimension had four possible values.

At the general level, rocketships were divided into two categories; at the specific level, rocketships were divided into four categories. As shown in Table 1, the two different category structures, 1-D and 4-D, differed in how the most relevant information was distributed across the four dimensions. For the 1-D structure, diagnostic features were contained along D1. For the 4-D structure, diagnostic features were distributed across all four dimensions. In particular, at the specific level, for the 1-D structure, Values 0, 1, 2, and 3 along D1 signaled Categories C, D, E, and F, respectively; for the 4-D structure, at the specific level, Value 0 along Dimensions D1, D2, D3, and D4 signaled Categories C, D, E, and F, respectively. Both category structure (1-D vs. 4-D) and category level (specific vs. general) were manipulated between subjects.

[Figure 1. Average probability of error as a function of the number of training blocks and condition. Specific level of classification (S) is indicated by open symbols, and general level of classification (G) is indicated by filled symbols. The 1-D category structure is indicated by circles, and the 4-D category structure is indicated by squares. Panel A displays the observed data, panel B displays the predictions by the rational model, panel C displays the predictions by the configural-cue model, and panel D displays the predictions by ALCOVE.]
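The diagnostic structure just described can be checked directly against the stimulus codes transcribed from Table 1 (a small consistency check; the tuple encoding is illustrative):

```python
# Stimulus codes transcribed from Table 1 (Lassaline et al., 1992).
# Each entry: (general label, specific label, 1-D values D1-D4, 4-D values D1-D4).
stimuli = [
    ("A", "C", (0, 0, 2, 3), (0, 1, 3, 2)),
    ("A", "C", (0, 1, 3, 0), (0, 2, 1, 3)),
    ("A", "C", (0, 2, 0, 1), (0, 3, 2, 1)),
    ("A", "D", (1, 3, 1, 0), (1, 0, 2, 1)),
    ("A", "D", (1, 0, 2, 1), (2, 0, 3, 1)),
    ("A", "D", (1, 1, 3, 2), (3, 0, 1, 2)),
    ("B", "E", (2, 2, 0, 2), (3, 2, 0, 1)),
    ("B", "E", (2, 3, 1, 3), (1, 3, 0, 2)),
    ("B", "E", (2, 0, 2, 0), (2, 1, 0, 3)),
    ("B", "F", (3, 1, 3, 1), (2, 3, 1, 0)),
    ("B", "F", (3, 2, 1, 2), (3, 1, 2, 0)),
    ("B", "F", (3, 3, 0, 3), (1, 2, 3, 0)),
]

# 1-D structure: the value on D1 alone signals the specific category.
one_d_map = {0: "C", 1: "D", 2: "E", 3: "F"}
assert all(one_d_map[d1[0]] == spec for _, spec, d1, _ in stimuli)

# 4-D structure: the dimension carrying the value 0 signals the specific category.
four_d_map = {0: "C", 1: "D", 2: "E", 3: "F"}
assert all(four_d_map[d4.index(0)] == spec for _, spec, _, d4 in stimuli)

print("Table 1 diagnostic structure confirmed")
```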

The assignments of physical dimensions and features to abstract dimensions and features were randomized for every subject. For example, a given subject learning the 1-D-specific categories might need to learn that a particular shape of the wing was associated with each category; a given subject learning the 1-D-general categories might need to learn that two different shapes of the wing were associated with each category. A given subject learning the 4-D-specific categories might need to learn that a particular shape of the wing was associated with the first category, that a particular shape of the nose was associated with the second category, that a particular shape of the porthole was associated with the third category, and that a particular shape of the tail was associated with the fourth category; a given subject learning the 4-D-general categories might need to learn that a particular shape of the wing or a particular shape of the nose was associated with the first category and that a particular shape of the porthole or a particular shape of the tail was associated with the second category.

Procedure
A standard supervised category learning procedure was used in which the subjects were supplied with corrective feedback after every response. Half of the subjects learned category structure 1-D, whereas the other half learned category structure 4-D. For the subjects learning each category structure, half of them learned to classify each of the 12 stimuli into one of two categories (general level), whereas the other half learned to classify each of the 12 stimuli into one of four categories (specific level). Each stimulus was presented once per block for a total of 25 training blocks. On every trial, a randomly chosen stimulus was presented, the subject classified that stimulus into either one of two possible categories (those learning at the general level) or one of four possible categories (those learning at the specific level), and then corrective feedback was supplied for 1 sec. The learning trials were terminated when the subject completed two error-free training blocks.
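The procedure can be sketched as a simple training loop (a hypothetical illustration, not the experiment software; `classify` stands in for the subject's response, and the termination rule is read as two consecutive error-free blocks):

```python
import random

def run_training(stimuli, classify, give_feedback, max_blocks=25):
    """Supervised category learning: each stimulus once per block in random
    order, stopping after two consecutive error-free blocks."""
    consecutive_perfect = 0
    for block in range(max_blocks):
        order = random.sample(stimuli, len(stimuli))  # one presentation per block
        errors = 0
        for stim, correct_label in order:
            response = classify(stim)
            give_feedback(stim, response, correct_label)  # corrective feedback, ~1 sec
            errors += response != correct_label
        consecutive_perfect = consecutive_perfect + 1 if errors == 0 else 0
        if consecutive_perfect == 2:  # learning criterion reached
            return block + 1
    return max_blocks

# Example with a hypothetical "subject" who always answers correctly:
items = [((i,), "A" if i < 6 else "B") for i in range(12)]
blocks_used = run_training(items, classify=lambda s: "A" if s[0] < 6 else "B",
                           give_feedback=lambda *args: None)
print(blocks_used)  # 2: two error-free blocks end training
```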

EMPIRICAL RESULTS AND DISCUSSION

Figure 1A displays classification error probabilities as a function of training. First, error rates decreased as a function of training. Second, more errors were made by subjects given the 4-D structure (square symbols) than


those given the 1-D structure (circle symbols). Third, category level interacted with category structure: For the 1-D structure, the specific level (open circles) was learned more rapidly than the general level (filled circles); for the 4-D structure, the general level (filled squares) was learned more rapidly than the specific level (open squares). Even though the learning procedure differed considerably from that used by Lassaline et al. (1992), a very similar interaction between category level and category structure was observed.

A 2 (1-D vs. 4-D) X 2 (general vs. specific) X 25 (training block) analysis of variance was conducted on the data, with category structure and category level as between-subjects factors and training block as a within-subjects factor. An alpha level of .05 was set for all statistical tests. Corroborating the above impressions, significant main effects of category structure and training block were found [F(1,89) = 15.30, MSe = 0.54, and F(24,2136) = 98.69, MSe = 0.02, respectively]. A significant two-way category structure X category level interaction reflected the more rapid learning of the specific level than the general level for the 1-D structure and the more rapid learning of the general level than the specific level for the 4-D structure [F(1,89) = 4.43, MSe = 0.54]. A significant two-way category structure X training block interaction reflected the quicker learning of the 1-D structure than the 4-D structure [F(24,2136) = 3.51, MSe = 0.02]. Finally, a significant two-way category level X training block interaction partially reflected the initial advantage of general level over specific level due to the different number of response categories [F(24,2136) = 5.63, MSe = 0.02].

OVERVIEW OF THE CATEGORY LEARNING MODELS
AND THEIR PREDICTIONS

I will summarize the key aspects of the three category learning models, discuss some possible expectations for how well the models might account for the observed data, and then summarize the actual fits of the models to the observed data. More details of the model fitting are provided in the Appendix.

Rational Model
According to the rational model (Anderson, 1990, 1992), classification involves a Bayesian statistical analysis of the environment. Internal representations of subcategories, or partitions, are created to the extent that objects in the world are divided up into disjoint sets whose members probabilistically share certain features (cf. Rosch et al., 1976). Partitions are similar to prototypes in that they may be abstractions of a number of specific instances; a single category in the world may be represented as one or more internally defined partitions. The probability of classifying an object as a member of some particular category is essentially a function of the similarity of that object to the central tendency of each partition, weighted by how likely that particular category label is associated with objects contained within the partition. Anderson (1990) applied the rational model to experiments examining basic-level effects that were similar to the present one (e.g., Hoffman & Ziessler, 1983; Murphy & Smith, 1982). Although the model was not actually fitted to experimental data, the rational model did show a strong preference for creating partitions at the experimentally determined basic level; that is, each basic-level category was represented by a single partition. Thus, it may seem reasonable to expect the model to predict a specific-level advantage in the present task. In addition, although the rational model does not assume any mechanism of selective attention to psychological dimensions, Anderson (1990, 1992) applied the model to classification tasks in which certain dimensions were highly diagnostic for determining category membership (e.g., Medin & Schaffer, 1978; Shepard, Hovland, & Jenkins, 1961). Thus, it may seem reasonable to expect the model to predict a 1-D advantage as well.
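The partition-assignment step can be sketched as follows, in the spirit of Anderson's (1990) iterative algorithm. This is a minimal illustration, not the fitted model: the coupling parameter `c`, the add-one smoothing, the shared `n_values` for every dimension, and the toy stimuli are all illustrative assumptions; the category label is treated as just another feature dimension.

```python
from collections import defaultdict

def assign(item, partitions, c=0.5, n_values=4):
    """MAP assignment of a feature tuple to an existing or a new partition."""
    n_total = sum(p["n"] for p in partitions)
    scores = []
    for p in partitions:  # posterior score for each existing partition
        prior = c * p["n"] / ((1 - c) + c * n_total)
        likelihood = 1.0
        for dim, value in enumerate(item):  # smoothed feature likelihoods
            likelihood *= (p["counts"][dim][value] + 1) / (p["n"] + n_values)
        scores.append(prior * likelihood)
    # score for creating a brand-new partition
    prior_new = (1 - c) / ((1 - c) + c * n_total) if n_total else 1.0
    scores.append(prior_new * (1 / n_values) ** len(item))
    best = max(range(len(scores)), key=scores.__getitem__)
    if best == len(partitions):
        partitions.append({"n": 0, "counts": defaultdict(lambda: defaultdict(int))})
    chosen = partitions[best]
    chosen["n"] += 1
    for dim, value in enumerate(item):
        chosen["counts"][dim][value] += 1
    return best

# Toy sequence: two clusters of items, label included as the last "feature".
partitions = []
for item in [(0, 0, "A"), (0, 1, "A"), (3, 3, "B"), (3, 2, "B")]:
    assign(item, partitions)
print(len(partitions))  # 2: one partition per cluster
```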

The best-fitting predicted learning curves of the rational model are shown in Figure 1B. Contrary to the intuitive predictions generated above, the qualitative fit of the model was quite poor. The model was unable to predict any specific-level advantage and was essentially unable to predict any difference between the 1-D and 4-D conditions. It should be noted, however, that although the model-fitting routine settled on parameters that maximized quantitative fit, the best-fitting parameter values were fairly extreme (see the Appendix). In essence, fit was maximized when the rational model created individual partitions for every stimulus, effectively reducing to a pure exemplar-based model (see Nosofsky, 1991).

Because these shortcomings of the rational model may be partially a product of the demands of maximizing the quantitative fit of the model, it seemed important to explore the qualitative predictions of the model using more reasonable parameters similar to those used in previous research (Anderson, 1990; Nosofsky, 1991). In these new simulations, the model still predicted essentially no difference between the 1-D and 4-D conditions and did not predict a crossover in learning the specific- and general-level categories. Examination of the partitions that were formed was particularly revealing. When learning categories at a general level, partition formation was fairly idiosyncratic: Depending on the particular sequence of training stimuli, between three and six partitions were formed for each category. For example, in one particular sequence, when learning the 1-D structure, Stimuli 1, 3, and 5 formed one partition, Stimuli 2 and 6 formed another partition, and Stimulus 4 formed its own partition; in one particular sequence, when learning the 4-D structure, Stimuli 3, 4, and 5 formed one partition, while Stimuli 1, 2, and 6 formed their own separate partitions. By contrast, when learning categories at a specific level, all stimuli within a given category were grouped together into a single partition; however, the formation of such partitions was insensitive to whether the diagnostic features were present along a single dimension (1-D) or present along various dimensions (4-D). So the rational model did indeed create consistent partitions when learning categories at the specific level. Unfortunately, although such partitioning may suggest some sort of an advantage for "basic-level" categories, the model did not effectively utilize this advantageous partitioning to predict these categories to be more quickly learned than categories at higher levels of the hierarchy.

To assess whether the model could possibly account for a specific-level advantage when learning only the 1-D structure, the model was fitted to the 1-D data alone; it was not possible to find parameters that permitted the model to predict the observed crossover between specific- and general-level categories.

Configural-Cue Model
The configural-cue model (Gluck & Bower, 1988a) is a two-layer connectionist model of category learning. The input layer contains a single node for every individual cue (i.e., particular values along psychological dimensions) and combination of cues (configural cues) that compose an item. The output layer contains a single node for every category. Association weights are learned between cues and categories via gradient descent on error. Gluck, Corter, and Bower (1996) applied the configural-cue model to existing data (Hoffman & Ziessler, 1983; Murphy & Smith, 1982) and new experiments examining basic-level effects in artificial categories. Thus, it may seem reasonable to expect the model to predict a specific-level advantage in the present results. Also, although the configural-cue model does not incorporate selective attention to psychological dimensions (see note 3), Gluck and Bower (1988a) provided simulations demonstrating the effectiveness of the model in accounting for classic attentional phenomena in classification. Thus, it may seem reasonable to expect the model to predict a 1-D advantage as well.
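The input encoding described above can be sketched for a four-dimensional stimulus (a minimal illustration; the node naming by `(dimension, value)` pairs is an assumption of this sketch, not the original notation):

```python
from itertools import combinations

def configural_cues(stimulus):
    """Return the active input nodes for a feature tuple.

    Each node is a tuple of (dimension, value) pairs: single cues are
    1-tuples; configural cues are combinations of 2, 3, or 4 cues."""
    cues = list(enumerate(stimulus))  # (dimension, value) pairs
    nodes = []
    for size in range(1, len(cues) + 1):
        nodes.extend(combinations(cues, size))
    return nodes

active = configural_cues((0, 1, 3, 2))
# 4 single cues + 6 pairs + 4 triples + 1 quadruple = 15 active nodes
print(len(active))  # 15
```

For four dimensions with four values each, the complete input layer would contain 5**4 - 1 = 624 distinct nodes, since in any given configuration each dimension is either absent or takes one of its four values.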

The best-fitting predictions of the configural-cue model are shown in Figure 1C. The fit was quite poor: the model failed to predict a specific-level advantage, and the predicted learning curves for the 1-D and 4-D conditions overlapped completely. But could the model account for the observed specific-level advantage if it was fitted only to the data from the 1-D condition? In contrast to what was reported by Gluck et al. (1996), using different category structures, it was not possible to find parameter values that allowed the model to predict the observed crossover of the specific- and general-level conditions.

Why did the configural-cue model fail to account for the observed difficulty of learning the 4-D category structure? According to the model, a cue is any specific feature value, such as the particular shape of the tail or the particular shape of the porthole of a rocketship stimulus. The model assumes that people learn the relevance of such cues, and combinations of cues, for making category decisions. However, no distinction is made between learning the relevance of two cues along the same psychological dimension (e.g., a circular vs. a rectangular porthole) versus two cues along different psychological dimensions (e.g., a circular porthole vs. a triangular wing). Psychological dimensions, per se, do not exist in the model, only featural cues and cue combinations. By contrast, it is likely that people might learn the relevance of psychological dimensions as well as the relevance of particular values along those dimensions.

ALCOVE
ALCOVE is a connectionist instantiation of an exemplar model of classification, the generalized context model (GCM; Medin & Schaffer, 1978; Nosofsky, 1986). According to exemplar models, categories are represented in terms of the individual remembered exemplars. ALCOVE assumes that classification decisions are determined by the similarity of a target item to each remembered exemplar and by the learned association strength between each exemplar and each category. A fundamental property of ALCOVE is that psychological dimensions can be learned to be selectively attended to according to their diagnosticity. Selective attention acts to "stretch" the psychological space along relevant dimensions and "shrink" it along irrelevant dimensions. For example, if stimuli varied in shape, size, and color, but shape was particularly diagnostic for deciding which category a stimulus belonged in, then differences along the shape dimension would be accentuated, whereas differences along the size and color dimensions would be attenuated. Because ALCOVE learns to attend to dimensions based on their diagnosticity, it may seem reasonable to expect the model to predict a 1-D advantage in the present results.

500 PALMERI

However, one generally acknowledged shortcoming of many exemplar models is that, in a typical category learning paradigm, they cannot predict classification of objects at lower levels of a nested category hierarchy to ever be superior to classification of those same objects at higher levels of a hierarchy (e.g., Lassaline et al., 1992): they fail to predict a "basic-level" advantage. To illustrate, suppose that at the general level there are two categories, A and B, and that these two categories can each be divided at the specific level into two categories: C and D for Category A, and E and F for Category B. According to the context model (Medin & Schaffer, 1978; Nosofsky, 1986), the evidence that some item belongs to a category is found by summing the similarities of that item to all exemplars of the category. The probability of classifying an item as a member of a category is given by the ratio of the evidence for that category to the total evidence for all categories. For example, when classifying at the general level, P(A) = E_A/(E_A + E_B), and when classifying at the specific level, P(C) = E_C/(E_C + E_D + E_E + E_F), where E_X is the evidence (summed similarity) for Category X (where X can be Category A, B, C, D, E, or F). If categories are represented in terms of stored exemplars, then categories at higher levels of a hierarchy are simply the union of the exemplars of categories at lower levels of the hierarchy. This means that the summed similarities to categories at higher levels of a hierarchy are simply equal to the sum of the summed similarities to categories at lower levels; for example, E_A = E_C + E_D. In computing classification response probabilities, the denominators in the ratios are identical at the general level and the specific level, and the numerator is greater at the general level than at the specific level, so classification probabilities are constrained to be less accurate at the specific level than at the general level: the context model fails to predict a "basic-level" advantage.
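This constraint can be verified numerically. In the sketch below, the summed-similarity values are hypothetical, not data from the experiment; the point is that P(A) can never fall below P(C), because E_A = E_C + E_D while the denominator is unchanged:

```python
# Hypothetical summed similarities for the four specific-level categories.
E = {"C": 3.0, "D": 1.5, "E": 0.8, "F": 0.7}

total = sum(E.values())
p_specific = E["C"] / total             # P(C) = E_C / (E_C + E_D + E_E + E_F)
p_general = (E["C"] + E["D"]) / total   # P(A), since E_A = E_C + E_D

# The general-level probability dominates whenever E_D > 0.
assert p_general >= p_specific
print(p_specific, p_general)  # 0.5 0.75
```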

ALCOVE is not necessarily constrained to predict a general-level advantage, for several reasons. First, because ALCOVE learns to attend to dimensions based on their diagnosticity, the similarity between an item and an exemplar in memory can depend on the category level that was learned. Therefore, evidence for classification at a higher level of a hierarchy is not a simple linear combination of evidences at a lower level. Second, ALCOVE learns to associate exemplars with categories via a connectionist error-driven learning algorithm. It is possible for association weights to be larger for categories learned at the specific level than at the general level. By contrast, the context model just tallies the number of times each exemplar has been associated with a given category. Finally, the response rule used by ALCOVE to map category activations to response probabilities is highly nonlinear. Unlike the context model, evidence for classification at higher levels of a hierarchy is not simply the additive combination of evidences at lower levels. In summary, ALCOVE is not subject to the same constraints as the context model. Yet, explicit simulations of the model are necessary to assess whether the model can predict the observed specific-level advantage.

The best-fitting predictions of ALCOVE are shown in Figure 1D. The quantitative fit of the model was quite good (see Appendix). More importantly, the model was able to account for all of the qualitative results: ALCOVE predicted more rapid learning of the 1-D structure over the 4-D structure and was able to predict the observed crossover interaction of category structure with category level.

Not surprisingly, dimensionalized selective attention learning is critical for allowing ALCOVE to predict the observed 1-D over 4-D category structure advantage. To demonstrate this, a restricted version of ALCOVE without learned selective attention was fitted to the observed data. As was the case for the configural-cue model, this restricted version of ALCOVE predicted absolutely no difference in learning the 1-D and 4-D category structures.6 This finding adds additional support for the theoretical claim that dimensionalized selective attention is a critical component of category learning (see Kruschke, 1992; Nosofsky, 1986; Nosofsky, Gluck, et al., 1994; Nosofsky & Palmeri, 1996). Essentially, ALCOVE captures the notion that people generally find it easier to learn to pay attention to differences along a single psychological dimension than to differences along multiple psychological dimensions.

Why can ALCOVE predict a specific-level advantage for the 1-D structure when the context model failed? As discussed above, multiple factors relieve ALCOVE from the constraint of assuming evidence for classification at the higher level of a hierarchy to be a simple linear summation of evidences at lower levels. In breaking this constraint, dimensionalized selective attention is clearly important: a restricted version of ALCOVE without attention could not predict a specific-level advantage. For the 1-D structure, when trained on the specific-level categories, ALCOVE learned to attend solely to Dimension 1 because Dimensions 2-4 were completely nondiagnostic. By contrast, when trained on the general-level categories, ALCOVE learned to attend to Dimensions 2-4 because they were somewhat diagnostic. These differences in selective attention cause the relative evidence for specific-level classification to be greater than the relative evidence for general-level classification. Error-driven learning and nonlinear response rules are also very important: restricted versions of ALCOVE with Hebbian (correlational) learning and a linear response-mapping rule could not predict a specific-level advantage either. Accordingly, when provided the 1-D structure, ALCOVE developed stronger association weights from exemplars to categories when trained at the specific level than when trained at the general level.

SUMMARY

The present article reported an extension of an experiment by Lassaline et al. (1992). For one category structure, in which diagnostic information was present along a single dimension, specific-level categories were learned more rapidly than general-level categories; for another category structure, in which diagnostic information was spread across dimensions, the reverse pattern of results was found.

Predicting this interaction of category structure and category level proved quite challenging. Neither the rational model (Anderson, 1990) nor the configural-cue model (Gluck & Bower, 1988a) was able to predict a difference in learning the two category structures, largely because neither model incorporates dimensionalized selective attention (see Nosofsky, Gluck, et al., 1994; Nosofsky & Palmeri, 1996). Also, neither model was able to predict a specific-level advantage for the 1-D category structure. Although the rational model can indeed produce partitions at the specific level when learning specific-level categories (Anderson, 1990), the model was unable to take advantage of this partitioning to predict a specific-level advantage in category learning. Although the configural-cue model showed some promise in predicting basic-level effects in previous work (Gluck et al., 1996), the model was unable to account for the present results. By contrast, ALCOVE (Kruschke, 1992) was able to account for the interaction of category structure and category level. The presence of dimensionalized selective attention allowed the model to predict faster learning of the 1-D structure than of the 4-D structure. A nonlinear response-mapping rule, differences in association weights when learning at different category levels, and differences in learned selective attention weights when learning at different category levels all contributed to ALCOVE predicting a specific-level advantage when


learning the 1-D structure. The surprising failure of the rational model and the configural-cue model and the surprising success of ALCOVE highlight the importance of carrying out explicit simulations. Hintzman (1990) noted the difficulties in predicting the behavior of complex psychological models based on some a priori understanding of the models: "Surprises are likely when the model has properties that are inherently difficult to understand, such as variability, parallelism, and nonlinearity—all, undoubtedly, properties of the brain" (p. 111).

It should again be emphasized that the present empirical and theoretical work is limited in that the category hierarchies existed only virtually, in the sense that individual subjects never learned categories at more than one level. However, the model-based analyses are still relevant in pointing out important limitations of the rational model and the configural-cue model in accounting for observed patterns of categorization behavior, irrespective of whether these particular empirical results say anything about learning natural categories at multiple levels of a hierarchy. In extending these results to the more general issue of how people might actually learn natural category hierarchies, the present work highlights the possibility that learning categories at different hierarchical levels may require attending to the psychological dimensions of stimuli in very different ways. That these patterns of selective attention to dimensions seem to vary as a function of category level may help explain why it has proved so difficult to train people to classify stimuli at multiple levels of a category hierarchy at the same time (Lassaline et al., 1992; Murphy, 1991; Murphy & Smith, 1982). But these varying patterns of selective attention to dimensions also point out an important limitation of ALCOVE. If ALCOVE is to account for natural situations in which the same individual learns categories at multiple levels, then it must include some mechanism for remembering specific patterns of selective attention weights and setting them to their appropriate values depending on the given category context. Future research will be needed to extend ALCOVE to account for learning multiple category levels at the same time and then to test whether ALCOVE can explain basic-level effects in other classification situations.

REFERENCES

ANDERSON, J. R. (1990). The adaptive character of thought. Hillsdale, NJ: Erlbaum.

ANDERSON, J. R. (1991). The adaptive nature of human categorization. Psychological Review, 98, 409-429.

ASHBY, F. G., & MADDOX, W. T. (1993). Relations between prototype, exemplar, and decision bound models of categorization. Journal of Mathematical Psychology, 37, 372-400.

ESTES, W. K. (1986). Memory storage and retrieval processes in category learning. Journal of Experimental Psychology: General, 115, 155-174.

ESTES, W. K. (1993). Models of categorization and category learning. Psychology of Learning & Motivation, 29, 15-56.

ESTES, W. K. (1994). Classification and cognition. Oxford: Oxford University Press.

LEARNING HIERARCHICAL CATEGORIES 501

ESTES, W. K., CAMPBELL, J. A., HATSOPOULOS, N., & HURWITZ, J. B. (1989). Base-rate effects in category learning: A comparison of parallel network and memory storage-retrieval models. Journal of Experimental Psychology: Learning, Memory, & Cognition, 15, 556-571.

FISHER, D., & LANGLEY, P. (1990). The structure and formation of natural categories. Psychology of Learning & Motivation, 26, 241-284.

GARNER, W. R. (1974). The processing of information and structure. New York: Wiley.

GLUCK, M. A., & BOWER, G. H. (1988a). Evaluating an adaptive network model of human learning. Journal of Memory & Language, 27, 166-195.

GLUCK, M. A., & BOWER, G. H. (1988b). From conditioning to category learning: An adaptive network model. Journal of Experimental Psychology: General, 117, 225-244.

GLUCK, M. A., & CORTER, J. E. (1985). Information, uncertainty, and the utility of categories. In Proceedings of the Seventh Annual Conference of the Cognitive Science Society (pp. 283-288). Hillsdale, NJ: Erlbaum.

GLUCK, M. A., CORTER, J. E., & BOWER, G. H. (1996). Basic levels in learning category hierarchies: An adaptive network model. Unpublished manuscript.

HINTZMAN, D. L. (1990). Human learning and memory: Connections and dissociations. Annual Review of Psychology, 41, 109-139.

HOFFMAN, J., & ZIESSLER, C. (1983). Objektidentifikation in künstlichen Begriffshierarchien [Object identification in artificial concept hierarchies]. Zeitschrift für Psychologie, 16, 243-275.

KRUSCHKE, J. K. (1992). ALCOVE: An exemplar-based connectionist model of category learning. Psychological Review, 99, 22-44.

LASSALINE, M. E. (1990). The basic level in hierarchical classification. Unpublished master's thesis, University of Illinois, Champaign.

LASSALINE, M. E., WISNIEWSKI, E. J., & MEDIN, D. L. (1992). Basic levels in artificial and natural categories: Are all basic levels created equal? In B. Burns (Ed.), Percepts, concepts, and categories: The representation and processing of information (pp. 327-378). Amsterdam: North-Holland.

MADDOX, W. T., & ASHBY, F. G. (1993). Comparing decision bound and exemplar models of categorization. Perception & Psychophysics, 53, 49-70.

MEDIN, D. L., & SCHAFFER, M. M. (1978). Context theory of classification learning. Psychological Review, 85, 207-238.

MURPHY, G. L. (1991). Parts in object concepts: Experiments with artificial categories. Memory & Cognition, 19, 423-438.

MURPHY, G. L., & SMITH, E. E. (1982). Basic level superiority in picture categorization. Journal of Verbal Learning & Verbal Behavior, 21, 1-20.

NOSOFSKY, R. M. (1986). Attention, similarity, and the identification-categorization relationship. Journal of Experimental Psychology: General, 115, 39-57.

NOSOFSKY, R. M. (1991). Relations between the rational model and the context model of categorization. Psychological Science, 2, 416-421.

NOSOFSKY, R. M., GLUCK, M. A., PALMERI, T. J., McKINLEY, S. C., & GLAUTHIER, P. (1994). Comparing models of rule-based classification learning: A replication of Shepard, Hovland, and Jenkins (1961). Memory & Cognition, 22, 352-369.

NOSOFSKY, R. M., KRUSCHKE, J. K., & McKINLEY, S. C. (1992). Combining exemplar-based category representations and connectionist learning rules. Journal of Experimental Psychology: Learning, Memory, & Cognition, 18, 211-233.

NOSOFSKY, R. M., & PALMERI, T. J. (1996). Learning to classify integral-dimension stimuli. Psychonomic Bulletin & Review, 3, 222-226.

NOSOFSKY, R. M., & PALMERI, T. J. (1997). An exemplar-based random walk model of speeded classification. Psychological Review, 104, 266-300.

NOSOFSKY, R. M., & PALMERI, T. J. (1998). A rule-plus-exception model for classifying objects in continuous-dimension spaces. Psychonomic Bulletin & Review, 5, 345-369.

NOSOFSKY, R. M., PALMERI, T. J., & McKINLEY, S. C. (1994). Rule-plus-exception model of classification learning. Psychological Review, 101, 53-79.

PALMERI, T. J. (1997). Exemplar similarity and the development of automaticity. Journal of Experimental Psychology: Learning, Memory, & Cognition, 23, 324-354.

PALMERI, T. J., & NOSOFSKY, R. M. (1995). Recognition memory for exceptions to the category rule. Journal of Experimental Psychology: Learning, Memory, & Cognition, 21, 548-568.

ROSCH, E., MERVIS, C. B., GRAY, W. D., JOHNSON, D. M., & BOYES-BRAEM, P. (1976). Basic objects in natural categories. Cognitive Psychology, 8, 382-439.

SCHYNS, P. G. (1991). A modular neural network model of concept acquisition. Cognitive Science, 15, 461-508.

SHEPARD, R. N., HOVLAND, C. L., & JENKINS, H. M. (1961). Learning and memorization of classifications. Psychological Monographs, 75(13, Whole No. 517).

NOTES

1. Two main forms of category learning have been investigated. In supervised learning tasks, explicit trial-by-trial feedback is supplied about whether particular category responses are correct or incorrect. In unsupervised learning tasks, no feedback is supplied, and subjects form their own categories on the basis of some internal criteria for category cohesiveness.

2. Unfortunately, due to a procedural error, 7 of the 25 subjects in the 4-D-general condition did not finish all 25 training blocks (none of these subjects completed fewer than 20 training blocks). Although their available data were included in the data plotted in Figure 1, which was used to assess the model predictions, our statistical package could not accommodate partial observations from individual subjects.

3. A number of categorization studies have found that the diagnosticity of certain stimulus dimensions has a profound effect on the speed of learning particular categories (e.g., Nosofsky, Gluck, et al., 1994; Shepard, Hovland, & Jenkins, 1961) and on transfer of category knowledge to new stimuli (e.g., Medin & Schaffer, 1978; Nosofsky, 1986). These results have been taken as evidence for some form of a dimensionalized selective attention mechanism in categorization (in other words, particular dimensions of stimuli are weighted more heavily because they are relatively more diagnostic for determining category membership). However, the rational model and the configural-cue model, which do not incorporate any explicit form of selective attention to dimensions, have been able to account for some of these empirical results (see, however, Nosofsky, Gluck, et al., 1994).

4. Note that psychological dimensions, such as shape, color, or size, are contrasted with particular values (or features) along those dimensions, such as circular, red, or large (see Garner, 1974).

5. Typically, the coupling parameter is set to some intermediate value, dimensional salience is significantly larger than label salience, and the response-mapping parameter is not used (Anderson, 1990; Nosofsky, 1991). These simulations were conducted with c = 0.3, s_D = 1.0, s_L = 0.1, and r = 1.0.

6. The fact that this restricted version of ALCOVE and the best-fitting version of the configural-cue model accounted for the observed data equally poorly is not simply a matter of coincidence. Rather, a version of ALCOVE without selective attention is formally identical to a version of the configural-cue model with only complete exemplar cue combinations present; the best-fitting parameters of the configural-cue model had nonzero learning rates only for complete exemplars (λ4).

APPENDIX
Details of the Category Learning Models and Model Fitting

In this Appendix, I provide additional details about the three category learning models and how they were fitted to the observed learning curves. Best-fitting predictions from each of the three category learning models were generated by adjusting free parameters of the models using a hill-climbing routine that minimized the sum of squared deviations (SSD) between observations and predictions. To guard against local minima emerging from the hill-climbing routine, a number of different starting parameter values were used to initialize the search. To fit each of the models, 100 random stimulus sequences were generated, predictions were obtained for each of these sequences, and these predictions were then averaged. These average predicted category learning curves constituted the predictions that were fitted to the observed data. The best-fitting parameters and fit values for the three models are given in Table A1.
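The fitting procedure can be illustrated with a toy example. The hill climber and the one-parameter exponential "learning curve" below are illustrative stand-ins, not the actual routine or models fitted in this paper:

```python
import math
import random

def ssd(pred, obs):
    # Sum of squared deviations between predicted and observed values.
    return sum((p - o) ** 2 for p, o in zip(pred, obs))

def hill_climb(loss, start, step=0.05, iters=2000):
    # Accept any random perturbation of the parameter that lowers the loss.
    best, best_loss = start, loss(start)
    for _ in range(iters):
        cand = best + random.uniform(-step, step)
        cand_loss = loss(cand)
        if cand_loss < best_loss:
            best, best_loss = cand, cand_loss
    return best

# Toy model: an exponential learning curve with a single rate parameter.
observed = [1 - math.exp(-0.5 * t) for t in range(10)]
def loss(rate):
    predicted = [1 - math.exp(-rate * t) for t in range(10)]
    return ssd(predicted, observed)

# Several starting values guard against local minima in the search.
random.seed(1)
candidates = [hill_climb(loss, s) for s in (0.05, 1.0, 3.0)]
best_rate = min(candidates, key=loss)
```

Here every start converges near the generating rate of 0.5; in real model fitting the loss surface is far less well behaved, which is why multiple restarts matter.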

Rational Model

According to the rational model, categories are learned by grouping objects into partitions. The probability that some object joins an existing partition is a function of both the similarity of that exemplar to the partition's central tendency and the prior probability of the partition. The prior probability of a partition is jointly determined by the size of the partition and by the value of a coupling parameter, c, which is a free parameter; large values of c produce large partitions, and small values of c produce small partitions. In most applications, the coupling parameter has an intermediate value (Anderson, 1990, used c = 0.3). The similarity of an exemplar to a partition's central tendency is found using a multiplicative similarity rule somewhat analogous to the similarity rule used in ALCOVE and the context model (Nosofsky, 1991). This similarity is jointly determined by whether the dimensions of the object sufficiently match those stored in the partition, with priors specified by salience terms for stimulus dimensions and category labels, s_D and s_L, respectively, which are free parameters; unlike other classification models, the category label is treated as just another stimulus dimension in the stored representation. In most applications, dimensional salience is significantly larger than label

Table A1
Best-Fitting Model Parameters

Model           Parameters                                                   Fit
Rational        c = 0.001, s_D = 0.125, s_L = 0.000, r = 1.486               SSD = 0.958, RMSD = 0.098, Var = 67.2
Configural-cue  λ1 = 0.000, λ2 = 0.000, λ3 = 0.000, λ4 = 0.074, φ = 1.788    SSD = 1.318, RMSD = 0.115, Var = 54.9
ALCOVE          c = 7.440, λ_w = 0.017, λ_α = 0.712, φ = 3.746               SSD = 0.228, RMSD = 0.054, Var = 90.1

Note—SSD, sum of squared deviations; RMSD, root mean squared deviation; Var, percentage of variance accounted for.




salience (Anderson, 1990; Nosofsky, 1991). The probability, p_A, that a given category label is assigned to an object is found by summing the probability that the object belongs to each partition multiplied by the probability that each partition signals that category label. A response-mapping parameter, r, transforms the internal category label probabilities, p_A, into actual response probabilities, P(A). Given a category label probability, p_A, the actual probability of a Category A response is given by

P(A) = p_A^r / Σ_C p_C^r,

where the subscript C ranges over all possible categories. This mapping function allows probabilities more or less extreme than those ordinarily predicted by the rational model (see Nosofsky et al., 1994). In the present application, there are four estimated parameters: the coupling parameter, c; the dimensional salience, s_D; the label salience, s_L; and the response-mapping parameter, r.
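The effect of the response-mapping parameter is easy to demonstrate numerically. In this sketch the internal label probabilities are hypothetical values, not model output:

```python
def response_probs(label_probs, r):
    # P(A) = p_A^r / sum_C p_C^r: r > 1 makes responses more extreme
    # than the internal label probabilities; r = 1 leaves them unchanged.
    powered = [p ** r for p in label_probs]
    z = sum(powered)
    return [p / z for p in powered]

internal = [0.7, 0.3]
unchanged = response_probs(internal, 1.0)   # [0.7, 0.3]
sharpened = response_probs(internal, 2.0)   # roughly [0.845, 0.155]
```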

Configural-Cue Model

Inputs to the configural-cue model are the individual cues and configural cues that comprise a given object. With four dimensions, each having four possible values, there are 16 single nodes (4 dimensions × 4 values), 96 possible double nodes, 256 possible triple nodes, and 256 quadruple nodes (one for each possible exemplar). The activation of each input node, a_i, is set equal to one if the relevant configuration is present within a stimulus; otherwise, it is set equal to zero. The activation of a category output node is given by

o_A = Σ_i w_iA a_i,

where w_iA is the learned association weight between cue i and Category A. Output activations are converted into response probabilities by

P(A) = exp(φ o_A) / Σ_C exp(φ o_C),

where the subscript C ranges over all possible categories and φ is a response-mapping constant. In the present application, there are five estimated parameters: the response-mapping constant (φ) and the learning rates (λ1, λ2, λ3, and λ4) for updating the association weights between singles, doubles, triples, and quadruples and the category output nodes, respectively.
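The input coding and response rule can be sketched as follows. The weights below are hypothetical, and the error-driven weight updates are omitted; only the forward computation is shown:

```python
import math
from itertools import combinations

def active_cues(stimulus):
    # Every single and configural cue present in the stimulus, encoded as
    # a frozenset of (dimension, value) pairs. With 4 dimensions this
    # yields 4 singles, 6 doubles, 4 triples, and 1 quadruple: 15 cues.
    pairs = list(enumerate(stimulus))
    return [frozenset(c)
            for k in range(1, len(pairs) + 1)
            for c in combinations(pairs, k)]

def output_activation(stimulus, weights):
    # o_A = sum over active cues i of w_iA (absent cues contribute 0).
    return sum(weights.get(cue, 0.0) for cue in active_cues(stimulus))

def response_probs(activations, phi):
    # P(A) = exp(phi * o_A) / sum_C exp(phi * o_C)
    z = sum(math.exp(phi * o) for o in activations.values())
    return {cat: math.exp(phi * o) / z for cat, o in activations.items()}

stim = ("v1", "v2", "v3", "v4")
weights_A = {frozenset([(0, "v1")]): 0.6,               # a single cue
             frozenset([(0, "v1"), (1, "v2")]): 0.4}    # a configural cue
acts = {"A": output_activation(stim, weights_A), "B": 0.0}
probs = response_probs(acts, phi=1.0)
```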

ALCOVE

Formally, ALCOVE is a three-layer feedforward network. The input layer consists of a single node for every psychological dimension of an object. The hidden layer consists of a single node for every stored exemplar. The activation of each hidden node is a function of the similarity between the current input representation and the exemplar representation of that node and is given by

a_j^hid = exp(-c Σ_i α_i d_ji),

where α_i is the learned selective attention to dimension i, and d_ji is an indicator variable equal to zero if the input stimulus and the exemplar match along dimension i and equal to one if they mismatch. The positive constant c, called the specificity, acts as a scaling factor. Every hidden node is connected to every category output node via a learned association weight. The activation of category node A is given by

o_A = Σ_j w_jA a_j^hid,

where w_jA is the learned association weight. Output activations are converted into response probabilities by

P(A) = exp(φ o_A) / Σ_C exp(φ o_C),

where the subscript C ranges over all possible categories, and φ is a response-mapping constant. In the present applications, there were four estimated parameters: the specificity parameter (c), the response-mapping constant (φ), and the learning rates (λ_w and λ_α) for updating the exemplar association weights and selective attention weights (see Kruschke, 1992, for details).
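A forward pass through the network follows directly from these equations. In this sketch the exemplars, weights, and parameter values are hypothetical, and the learning rules are omitted:

```python
import math

def hidden_activation(stimulus, exemplar, attention, c):
    # a_j^hid = exp(-c * sum_i alpha_i * d_ji), with d_ji = 1 on a mismatch.
    d = sum(a * (s != e) for a, s, e in zip(attention, stimulus, exemplar))
    return math.exp(-c * d)

def classify(stimulus, exemplars, attention, weights, c, phi):
    # Hidden layer: one node per stored exemplar.
    hid = [hidden_activation(stimulus, ex, attention, c) for ex in exemplars]
    # Output layer: o_A = sum_j w_jA * a_j^hid.
    out = {cat: sum(wj * hj for wj, hj in zip(w, hid))
           for cat, w in weights.items()}
    # Response rule: P(A) = exp(phi * o_A) / sum_C exp(phi * o_C).
    z = sum(math.exp(phi * o) for o in out.values())
    return {cat: math.exp(phi * o) / z for cat, o in out.items()}

exemplars = [(0, 0), (1, 1)]
attention = [1.0, 0.2]           # dimension 1 attended more than dimension 2
weights = {"A": [1.0, 0.0],      # first exemplar associated with Category A
           "B": [0.0, 1.0]}      # second exemplar associated with Category B
probs = classify((0, 0), exemplars, attention, weights, c=2.0, phi=2.0)
```

A probe identical to the first exemplar activates that hidden node fully, so the model favors Category A.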

(Manuscript received March 19, 1998; revision accepted for publication December 9, 1998.)

