
Cognitive Science (2016) 1–26
Copyright © 2016 Cognitive Science Society, Inc. All rights reserved.
ISSN: 0364-0213 print / 1551-6709 online
DOI: 10.1111/cogs.12367

Vocabulary, Grammar, Sex, and Aging

Fermín Moscoso del Prado Martín

University of California, Santa Barbara

Received 24 April 2015; received in revised form 18 November 2015; accepted 6 January 2016

Abstract

Understanding the changes in our language abilities along the lifespan is a crucial step for understanding the aging process both in normal and in abnormal circumstances. Besides controlled experimental tasks, it is equally crucial to investigate language in unconstrained conversation. I present an information-theoretical analysis of a corpus of dyadic conversations investigating how the richness of the vocabulary, the word-internal structure (inflectional morphology), and the syntax of the utterances evolve as a function of the speaker’s age and sex. Although vocabulary diversity increases throughout the lifetime, grammatical diversities follow a different pattern, which also differs between women and men. Women use increasingly diverse syntactic structures at least up to their late fifties, and they do not deteriorate in terms of fluency through their lifespan. However, from age 45 onward, men exhibit a decrease in the diversity of the syntactic structures they use, coupled with an increased number of speech disfluencies.

Keywords: Aging; Corpus study; Dialog; English; Information theory; Lexicon; Morphology; Sex differences; Syntax

1. Introduction

Language is perhaps the most distinctive cognitive ability of humans. Naturally, it is an ability that changes along people’s lifetimes. Beyond the evident changes in language during the early years of life (i.e., language acquisition), linguistic performance has been widely documented to be influenced by aging processes. It has long been known that the fundamental frequency of speech (F0) becomes lower with growing age (e.g., Endres, Bambach, & Flösser, 1971; Harrington, Palethorpe, & Watson, 2007), which is related to both cognitive and physiological aspects of aging (Ramig & Ringel, 1983). It has also been found that older people are slower and less accurate than younger people in recognizing and producing words (e.g., Lima, Hale, & Myerson, 1991; Mortensen, Meyer, & Humphreys, 2006), which results in significantly reduced speech tempos for older people (e.g., Quené, 2013). This slowing down is, however, counterbalanced by older people using richer vocabularies than younger people do (Hartshorne & Germine, 2015; Kavé, Knafo, & Gilboa, 2010; Kavé, Samuel-Enoch, & Adiv, 2009). This has led some researchers to argue that cognitive decline of linguistic abilities is a “myth” (Ramscar, Hendrix, Shaoul, Milin, & Baayen, 2014), as the slowing down and decreased accuracies can be attributed to having to choose from a larger lexicon or to trying to access more detailed representations (Kavé & Nussbaum, 2012), rather than to decline in actual cognitive ability. Beyond single words, the syntactic complexity of the utterances produced by people is reported to decline in the later stages of life (e.g., Kemper, Thompson, & Maquis, 2001), and so does the ease with which people understand syntactically complex sentences (e.g., Waters & Caplan, 2001).

Correspondence should be sent to Fermín Moscoso del Prado Martín, Department of Linguistics, South Hall, UCSB Santa Barbara, CA 93106. E-mail: [email protected]

Most research on aging effects on language skills is based on well-controlled experimental studies in lab (and, more recently, also online) contexts or on highly edited samples such as novels, speeches, or broadcasts (e.g., Harrington et al., 2007; Le, Lancashire, Hirst, & Jokel, 2011; Quené, 2013). However, the “ecological niche” of human language is neither picture naming or lexical decision experiments nor carefully prepared written language, but rather natural unedited conversations. Despite the evident value of experimental studies, it is also necessary to investigate the age evolution of language abilities in natural dialog situations. This is important because performance in natural dialog requires that the speakers successfully negotiate multiple social, pragmatic, and perceptual cues, imposing an additional set of constraints and conditionings on the cognitive system (e.g., Adams, Smith, Pasupathi, & Vitolo, 2002; Stine-Morrow, Soederberg Miller, & Hertzog, 2006).

Some researchers have investigated the effects of aging on linguistic performance in dialog situations. Bortfeld, Leon, Bloom, Schrober, and Brennan (2001) analyzed a corpus of conversations elicited in collaborative task situations. They report that older speakers produce more sentence-internal disfluencies than both middle-aged and young speakers do, with no difference found between the two latter groups. Closer to natural spontaneous dialog, Horton, Spieler, and Shriberg (2010) and Meylan and Gahl (2014) analyzed the Switchboard I Corpus (Godfrey, Holliman, & McDaniel, 1992), a large collection of transcribed telephone conversations between speakers of different ages and backgrounds, in which the caller chose the conversation topic from a predefined list. Horton and his colleagues found positive correlations between the age of the speakers and the lexical richness, number of filled pauses, and the length of the sentences they used (a proxy for syntactic complexity), as well as a negative correlation between the age and the rate at which people spoke. However, the degree of intercorrelation between all variables, and the likely influence of other properties of the speaker (e.g., sex, level of education, dialect), make plain correlations inadequate for assessing whether or not a factor is influenced by age. Working on the same corpus, Meylan and Gahl addressed the problem of possible confounds due to speaker properties by using linear mixed-effect model regression analyses instead of the plain correlations. Their results confirmed Horton et al.’s (2010) finding that the diversity of the lexicon increases with the age of the speaker, and further added that the older speakers are less sensitive to lexical priming than young speakers (i.e., older speakers are less likely to reuse words produced earlier by their interlocutor). Finally, Gahl, Cibelli, Hall, and Sprouse (2014) analyzed a longitudinal corpus of spontaneous speech following 10 speakers from ages 7 to 49 years, confirming the relative slowing of speech rate with age previously found by Horton and his collaborators.

F. Moscoso del Prado Martín / Cognitive Science (2016)

A further problem in the studies of Horton et al. (2010) and Meylan and Gahl (2014) concerns their measure of lexical diversity. They measured lexical diversity using the Uber Index (Dugast, 1980), a variation of the traditional type-token ratio that is claimed to be less dependent on sample size. Unfortunately, this index is far from independent of sample size (cf., Tweedie & Baayen, 1998; see also Appendix S1). Aware of this potential problem, Meylan and Gahl assumed that, as the sample sizes for each speaker are more or less similar across the Switchboard I corpus, they should not expect any systematic effects of sample size to affect the results (cf., Meylan & Gahl, 2014, p. 1007). This is, however, a problematic assumption. First, as I will show, there is considerable variability in the sample sizes contributed by different speakers in that corpus (i.e., ranging from 94 to almost 3,000 words). Second, and most important, this variability is systematic: The sample size (i.e., length of the contributions to a conversation) of each speaker is significantly related to his/her age. Any effects found are therefore suspected of being just the consequence of the differences in sample size (i.e., speakers of certain ages might just happen to talk more than those of others, but there is no real change in the properties of their speech beyond its quantity). Notice that this problem is not exclusive to the Uber Index: All measures of lexical diversity that can be estimated from a corpus suffer from sample size bias (cf., Tweedie & Baayen, 1998). It is therefore crucial that the bias of sample size is explicitly considered when evaluating the change in measures of lexical diversity.
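The sample-size dependence described above can be illustrated with a small simulation. The following is only a sketch and not part of the reported analyses: the Zipf-like vocabulary, its size, and the random seed are arbitrary assumptions, and the Uber Index is computed under the commonly cited definition U = (log N)^2 / (log N - log V), with N tokens and V types. All samples come from the same distribution, so any change in the indices reflects sample size alone.

```python
import math
import random

def sample_words(n_tokens, vocab_size=5000, seed=0):
    """Draw n_tokens word ids from a Zipf-like distribution (hypothetical parameters)."""
    rng = random.Random(seed)
    weights = [1.0 / rank for rank in range(1, vocab_size + 1)]
    return rng.choices(range(vocab_size), weights=weights, k=n_tokens)

def type_token_ratio(tokens):
    """Number of distinct types divided by the number of tokens."""
    return len(set(tokens)) / len(tokens)

def uber_index(tokens):
    """Uber Index, assuming the definition U = (log N)^2 / (log N - log V)."""
    n, v = len(tokens), len(set(tokens))
    if n == v:  # every token is a distinct type, so the index is undefined
        return float("inf")
    return math.log(n) ** 2 / (math.log(n) - math.log(v))

# Same underlying distribution, different sample sizes
# (the corpus samples range from 94 to almost 3,000 words).
for n in (100, 500, 3000):
    tokens = sample_words(n)
    print(n, round(type_token_ratio(tokens), 3), round(uber_index(tokens), 2))
```

If speakers of different ages contribute samples of systematically different lengths, differences of this kind would be confounded with any genuine age effect.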

Moscoso del Prado Martín (2014) introduces a uniform information-theoretical framework for measuring the lexical, morphological (inflectional), and syntactic diversity from corpora. In this approach, all diversities are quantified using the same measure, the entropy (Shannon, 1948) of their distribution. This approach to measuring diversity, which is dominant in biology (cf. Gotelli & Chao, 2013), presents several advantages: (a) uniformity, all aspects of diversity are measured using the same standard tool, rather than ad hoc measures for each level; (b) interpretability, in contrast with the often obscure values and units obtained from traditional lexical diversity measures (e.g., Uber Index, Herdan Index, etc.), entropy provides easily interpretable values measured in well-understood standard units (i.e., bits, nats, etc.); (c) finiteness, entropy offers valid finite values even for distributions with a potentially infinite number of types (as would be the case for syntactic structures according to many linguistic theories); (d) the raw measures exhibit faster convergence and higher consistency than type-token ratio variants and related measures (see Appendix S1); and (e) there are effective, well-studied methods for correcting the sample size bias.

It is often overlooked that the evolution of cognitive abilities along the adult lifespan is both nonlinear and multi-faceted. In a recent study, Hartshorne and Germine (2015) report that cognitive abilities do not necessarily follow monotonically increasing/decreasing patterns. Rather, they often exhibit nonlinear trends, where a certain ability improves in the early stages of adulthood, reaches a peak, and decreases thereafter. Crucially, Hartshorne and Germine found remarkable variability in the ages at which different abilities peak, ranging from the early teens for several types of short-term memory tasks, to the fifties (or perhaps later), for vocabulary tasks. In turn, language itself is far from being a monolithic system; rather, it involves a wide range of types of knowledge and skills. The studies on the evolution of linguistic performance in natural dialog have focused mostly on properties of the speech signal, on the vocabulary, and on the disfluencies (with syntax being indirectly considered as well by Horton et al., 2010). All studies assume monotonic (linear) trends of the age of the speakers on vocabulary size (Horton et al., 2010; Meylan & Gahl, 2014). This might, however, be inappropriate. Several authors have reported a possible decline of vocabulary size with very old age (e.g., Hartshorne & Germine, 2015; Kavé et al., 2010; Kemper et al., 2001), which may arise from non-monotonic (peaking) trends (Hartshorne & Germine, 2015; Kavé et al., 2010). Furthermore, beyond vocabulary, it is important to assess the evolution of higher levels of linguistic processing with age. As mentioned above, experimental evidence seems to indicate a decrease in the ability to produce and comprehend complex syntactic structures (e.g., Kemper et al., 2001; Waters & Caplan, 2001). Higher levels of linguistic structure might be subject to a different set of cognitive constraints than those affecting the lower levels.

Numerous studies have documented that aging affects men and women differently. Both sexes age differently with respect to a wide range of biological markers (e.g., Nakamura & Miyao, 2008). Anatomical changes in the brain are reported to evolve differently with age depending on sex (e.g., Cowell et al., 1994; Gur & Gur, 2002). In turn, these physiological differences result in differences in the aging pattern of behavior and cognitive performance (e.g., Costa, Santos, Cunha, Palha, & Sousa, 2013; Gur & Gur, 2002). In particular, many studies report that men are affected earlier and more pronouncedly than women by both anatomical changes (e.g., Cowell et al., 1994) and decreases in cognitive performance across a wide set of domains (e.g., Costa et al., 2013; Gur & Gur, 2002). These sex-differentiated aging patterns raise the question of whether (and how) men differ from women in terms of how aging affects their linguistic abilities. Sex differences concerning language have been widely reported at anatomical, physiological, and behavioral levels. Cerebral areas of importance for language processing have been found to be of different sizes and neural densities between males and females (Harasty, Double, Halliday, Krill, & McRitchie, 1997; Sowell et al., 2003; Witelson, Glezer, & Kilgar, 1995). Likewise, men and women differ in the brain areas they engage in language processing (see, e.g., Baxter et al., 2003, and references therein). There are significant differences in the linguistic behavior of boys and girls during language acquisition (Hartshorne & Ullman, 2006), and these differences extend to their behavior later in life (Ullman, Miranda, & Travers, 2007). Interestingly, it appears that these behavioral differences can be modulated by hormonal factors (Estabrooke, Mordecai, Maki, & Ullman, 2002; Ullman et al., 2002), and these are known to change with age. The aging literature on this question has focused almost exclusively on investigating whether men and women differ in terms of their lexical knowledge, whose changes with age do not appear to depend on the sex of the speaker (e.g., Gur & Gur, 2002). It remains, however, unclear whether sex differences could be observed at higher levels of linguistic structure, such as morphology and syntax. From the corpus analysis perspective, few studies have investigated the joint effects of sex and aging on linguistic performance. Analyzing the Switchboard I corpus, Shriberg (1996) noticed that men produced certain types of disfluencies more often than women did, but she found it difficult to disentangle these differences from socioeconomic properties of the speakers. For their part, in their analysis of task-oriented speech, Bortfeld et al. (2001) studied the effects of both the age and the sex of the speakers on the number (and type) of disfluencies they produced. Once more, they found that men produced more disfluencies than women did, and this effect could not be attributed to socioeconomic variables. Furthermore, the role that the speaker played in the joint activity (i.e., leading “director” vs. more subordinate “matcher”) increased the number of disfluencies produced by men, but not by women. Unfortunately, however, Bortfeld and her colleagues did not analyze whether the age-related patterns found in the disfluencies differed also in terms of sex.

In this study, I investigate how the age and sex of speakers affect the complexity of the lexicon (i.e., vocabulary), inflectional morphology (i.e., grammatical processes that change the form of words to fit them into particular contexts; e.g., “eat”–“eats”–“ate”–“eating”–“eaten,” “car”–“cars”), syntax (i.e., grammatical processes governing the ordering and grouping of words), and disfluencies (i.e., rephrasings, filled pauses, etc.) produced by speakers in telephone conversations. Like several previous studies (Horton et al., 2010; Meylan & Gahl, 2014; Shriberg, 1996), I use the conversations in the Switchboard I Corpus combined with syntactic parses of those conversations (from the Penn Treebank; Marcus, Santorini, Marcinkiewicz, & Taylor, 1999). I follow Moscoso del Prado Martín (2014) in using entropy for characterizing all diversity measures. I improve on previous studies by applying sample size bias correction techniques when these are available, and in all cases explicitly discounting the confounds that might be caused by sample size differences. Similar to what Meylan and Gahl (2014) did, and in contrast with Horton et al. (2010), I use mixed-effects regression models to account for possible properties of the speaker. Crucially, unlike all previous dialog studies, the regressions include nonlinear terms, which makes it possible to model any potential non-monotonicities in the relationships between the measures and the speaker ages. In addition, I also investigate how the sex of the speakers interacts with the evolution of their linguistic performance with age. Finally, the results are discussed in relation to previously reported behavioral and neurophysiological studies on the influence of sex and aging on language abilities.

2. Method

2.1. Materials

I used the Switchboard I Corpus (Godfrey et al., 1992), a collection of telephone conversations between previously unacquainted native speakers of American English of diverse ages and backgrounds, triggered by a conversational prompt (i.e., an initial conversation topic was chosen by the speaker making the call, but both speakers were free to change topics during the conversation). I crossed the conversations with the syntactic parse trees provided in the Penn Treebank (Marcus et al., 1999) for a subset of the dialogs. This resulted in 650 conversations for which syntactic parse trees were available in the Treebank. In total, the subcorpus contained 1,023,832 words (excluding punctuation, digits, and non-alphabetic characters), that is, an average of 788 words per participant in each conversation. These words were grouped into 120,414 parse trees, corresponding to an average of 93 parse trees per participant in a conversation. In total, there were 359 distinct speakers (165 women and 194 men, all born between 1924 and 1972), some of whom took part in more than one conversation (ranging from a single conversation for more than 25% of the speakers, to two speakers who took part in 12 conversations each; the median speaker took part in three conversations). As is shown in Fig. 1, the ages of the speakers were similarly distributed for men and women (i.e., the age distributions were not significantly different according to a two-sample Kolmogorov–Smirnov test: D = .100, p > .250).

Each time one of the 359 speakers took part in a conversation, I attached to that conversation’s record his/her age in years (computed as the difference in days between the birth date¹ and the date of the recording, divided by 365), sex (647 instances of women and 653 instances of men), level of education (“less than high school”: 14 cases, “less than college”: 58, “college”: 798, “more than college”: 417, and “unknown”: 13), the conversational topic chosen (with 64 different values), and the American English dialect area where the speaker resided (“New England”: 55 cases, “North Midland”: 165, “Northern”: 190, “New York City”: 76, “South Midland”: 427, “Southern”: 127, “Western”: 176, “mixed”: 83, and “unknown”: 1 case) as provided by the Switchboard I Corpus.
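The age computation described above (difference in days between birth date and recording date, divided by 365) can be sketched as follows. The function name and the example dates are hypothetical and serve only as illustration; they are not taken from the corpus.

```python
from datetime import date

def age_in_years(birth_date, recording_date):
    """Age as the number of days between the two dates divided by 365."""
    return (recording_date - birth_date).days / 365

# Hypothetical speaker born in 1960 and recorded in 1991.
print(age_in_years(date(1960, 1, 1), date(1991, 1, 1)))
```

Because leap days are counted but the divisor is fixed at 365, ages computed this way run slightly above the calendar age, by about one day for every four years of life.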

Fig. 1. Distribution of distinct speakers by their sex and their age in 1991 (when Switchboard data collection began). [Two histograms, Women and Men; x-axis: Age (in 1991), 20 to 70; y-axis: Frequency, 0 to 50.]


2.2. Corpus processing and measurements

The words in each conversation were lemmatized (e.g., “eat,” “eats,” “ate,” “eating,” and “eaten” were all considered to be instances of the lemma EAT, and both “car” and “cars” were taken as instances of the lemma CAR) using the WordNet (Miller, Beckwith, Fellbaum, Gross, & Miller, 1990) automatic lemmatizer. The frequency distribution of the lemmas was used for computing the lexical diversity (H[L]; Moscoso del Prado Martín, 2014), that is, the entropy (Shannon, 1948) of the frequency distribution of lemmas. One could compute the entropy using Shannon’s original expression,

$$H[L] = -\sum_{\ell \in L} p(\ell) \log p(\ell) \qquad (1)$$

where the p(ℓ) correspond to the relative probabilities of the lemmas used by the speaker. However, plugging corpus counts directly into this expression (the maximum-likelihood estimator) results in substantial underestimation of the entropy (i.e., the estimator is biased; Miller, 1955). In order to attenuate this problem, the entropies were computed from the frequencies using the optimal reduced bias entropy estimator described by Chao, Wang, and Jost (2013).
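To make the underestimation concrete, the sketch below computes the maximum-likelihood (“plug-in”) entropy of Eq. 1 from raw counts, together with the simple Miller–Madow correction, which adds (K - 1)/(2N) nats for K observed types and N tokens. This correction is used here only as an easily verified stand-in: it is not the Chao, Wang, and Jost (2013) estimator used in the study, and the function names and toy lemma list are illustrative assumptions.

```python
import math
from collections import Counter

def plugin_entropy(counts):
    """Maximum-likelihood entropy (in nats) of a list of type counts."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)

def miller_madow_entropy(counts):
    """Plug-in entropy plus the Miller-Madow bias correction (K - 1) / (2N)."""
    n = sum(counts)
    k = sum(1 for c in counts if c > 0)  # number of observed types
    return plugin_entropy(counts) + (k - 1) / (2 * n)

# Toy lemma sample (hypothetical, for illustration only).
lemmas = ["EAT", "CAR", "EAT", "BE", "BE", "BE", "CAR", "GO"]
counts = list(Counter(lemmas).values())
print(plugin_entropy(counts), miller_madow_entropy(counts))
```

The correction term shrinks as N grows, which reflects the fact that the plug-in bias is most severe for the short samples contributed by individual speakers.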

The entropy of the frequency distribution of unlemmatized word–lemma pairs (H[W, L]) was also computed for each participant in each conversation using the method of Chao et al. (2013). The difference between the two entropies H[W, L] and H[L] is the inflectional diversity (Moscoso del Prado Martín, 2014),

$$H[W \mid L] = H[W, L] - H[L]. \qquad (2)$$

Inflectional diversity corresponds to the average inflectional entropy (Moscoso del Prado Martín, Kostić, & Baayen, 2004) of the lemmas used by each participant. The latter is a measure of the diversity of inflected variants for each lemma in the corpus. This measure has been shown to capture the cost of recognizing and acquiring different words (Baayen & Moscoso del Prado Martín, 2005; Moscoso del Prado Martín et al., 2004; Stoll et al., 2012). Therefore, as the average inflectional entropy, inflectional diversity captures the complexity of the morphological system of a person from an information-processing perspective. The values of this measure are an index of how many distinct inflected variants are used for the average word.
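The relation in Eq. 2 can be checked on a toy sample. The sketch below uses plain maximum-likelihood entropies for simplicity (the study itself uses the Chao et al., 2013, estimator), and the word–lemma pairs are invented for illustration.

```python
import math
from collections import Counter

def entropy(items):
    """Maximum-likelihood entropy (in nats) of the empirical distribution of items."""
    counts = Counter(items)
    n = sum(counts.values())
    return -sum((c / n) * math.log(c / n) for c in counts.values())

def inflectional_diversity(word_lemma_pairs):
    """H[W|L] = H[W, L] - H[L] for a list of (word, lemma) pairs."""
    joint = entropy(word_lemma_pairs)
    lexical = entropy(lemma for _, lemma in word_lemma_pairs)
    return joint - lexical

# EAT appears in two inflected forms, CAR in only one (hypothetical sample).
pairs = [("eat", "EAT"), ("eats", "EAT"), ("car", "CAR"), ("car", "CAR")]
print(inflectional_diversity(pairs))
```

In this sample, the lemma EAT has an inflectional entropy of log 2 nats and CAR has zero; each accounts for half of the tokens, so the frequency-weighted average is 0.5 log 2, which is the value the difference of entropies returns.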

In order to measure the syntactic complexity, for each participant in each conversation, I extracted from the Penn Treebank (Marcus et al., 1999) the syntactic parse trees corresponding to all of the utterances produced by that participant. The parse trees were cleaned to remove all disfluencies marked in the tree (e.g., false starts, hesitations, “huh,” pauses). Punctuation nodes, and the tree leaves (i.e., the words themselves), were removed, so that the leaves of the new tree would be the part-of-speech tags (e.g., “singular noun,” “adverb,” “adjective,” . . .). Finally, to ensure that all trees had the same root node, a new node SS0 was added to each tree directly dominating its root. This process resulted, for each conversation participant, in a collection of parse trees like the one in Fig. 2a. From those trees, I extracted the phrase-structure production rules (see Fig. 2b). Using these productions and their frequencies of usage, for each speaker I induced a probabilistic context-free grammar (PCFG; Booth & Thompson, 1973) by maximum-likelihood estimation (i.e., using the raw frequencies of occurrence of the rules in each conversation participant’s sample). Finally, from the induced PCFG, I computed the entropy of the parse trees it generates (Chi, 1999; Grenander, 1976), the syntactic diversity (Moscoso del Prado Martín, 2014). This measures how many distinct parse trees (taking their probabilities into account) could be generated using the grammar rules provided, and it has been shown to be a relevant measure of processing difficulty (Hale, 2006). Given the small samples, this entropy is obviously an underestimate of the entropy of the trees that the speaker could have hypothetically produced (see Appendix S1). This is not, however, a problem, as I explicitly include the sample size as an independent predictor in the regression models. Therefore, any differences in entropy estimates due to sample size alone are accounted for.
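The pipeline described above (rule extraction, maximum-likelihood PCFG induction, entropy of the generated trees) can be sketched in a few lines. Several choices here are assumptions made for illustration: trees are represented as nested tuples whose bare strings are part-of-speech leaves, and the entropy is obtained by iterating the recursion H(A) = Σ_r p(r) [-log p(r) + Σ_{B in rhs(r)} H(B)] to a fixed point, which stands in for the closed-form matrix solution discussed by Grenander (1976) and Chi (1999).

```python
import math
from collections import Counter, defaultdict

def productions(tree):
    """Yield (lhs, rhs) rules from a nested-tuple tree; bare strings are POS-tag leaves."""
    label, children = tree[0], tree[1:]
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    yield (label, rhs)
    for child in children:
        if not isinstance(child, str):
            yield from productions(child)

def induce_pcfg(trees):
    """Maximum-likelihood PCFG: rule count divided by the count of its left-hand side."""
    counts = Counter(rule for t in trees for rule in productions(t))
    totals = Counter()
    for (lhs, _), n in counts.items():
        totals[lhs] += n
    grammar = defaultdict(dict)
    for (lhs, rhs), n in counts.items():
        grammar[lhs][rhs] = n / totals[lhs]
    return dict(grammar)

def tree_entropy(grammar, root="SS0", iterations=1000):
    """Entropy (in nats) of the trees generated from root, by fixed-point iteration."""
    h = {nt: 0.0 for nt in grammar}
    for _ in range(iterations):
        h = {
            nt: sum(p * (-math.log(p) + sum(h.get(sym, 0.0) for sym in rhs))
                    for rhs, p in rules.items())
            for nt, rules in grammar.items()
        }
    return h[root]

# Two hypothetical cleaned trees that differ only in the shape of the NP.
trees = [
    ("SS0", ("S", ("NP", "NN"), ("VP", "VBZ"))),
    ("SS0", ("S", ("NP", "DT", "NN"), ("VP", "VBZ"))),
]
print(tree_entropy(induce_pcfg(trees)))
```

In this toy grammar the only choice point is the NP expansion, with two equiprobable rules, so the entropy of the generated trees is log 2 nats. For recursive grammars the iteration only approximates the limit, and the fixed number of iterations is an arbitrary assumption.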

In order to obtain controls for the biases that result from the entropy estimation procedures above (see Appendix S1), for each participant in each conversation, I recorded the mean length in words of the clauses (i.e., parse trees) he/she produced (after removing disfluencies) and the total summed length in words of all the utterances he or she produced in the conversation. Finally, for each speaker in each conversation I also recorded the average number of disfluencies per clause that were labeled in the corpus. A summary of the corpus measures is provided in Table 1. Fig. 3 summarizes the marginal distributions, correlations, and nonlinear relations between each of the numerical variables considered.

Fig. 2. (a) Example of a syntactic parse tree (with disfluencies removed). The nodes in italic font are the terminals that are removed prior to rule extraction. (b) Phrase-structure rules extracted from the tree in (a).


2.3. Analyses

For each of the diversity measures (lexical, inflectional, and syntactic), and for the mean number of disfluencies per clause, I fitted a generalized additive mixed-effects model (GAMM; cf., Wood, 2006). All four models included random effects of conversation identity (with 650 possible values), topic of conversation (with 64 possible values), and dialect area of the speaker (“New England,” “North Midland,” “Northern,” “New York City,” “South Midland,” “Southern,” “Western,” “mixed,” or “unknown”).² In each model, the random effect terms were deemed necessary using Wald tests on maximum-likelihood model fits with different random effect structures. After fitting each model, the model residuals were inspected. Those models whose residuals diverged substantially from normality were refit after transforming the dependent variable using its logarithm (giving rise to log-normal regressions). The resulting log-normal model residuals were again inspected for deviations from normality, but none was found. All regression models included fixed effects of speaker’s role in the conversation (caller vs. callee), speaker sex (woman vs. man), listener sex (woman vs. man), the interaction between both sexes, and level of education (“less than high school” vs. “less than college” vs. “college” vs. “more than college” vs. “unknown”). The fixed effects that did not reach significance were removed from the model fits.

On the one hand, in the models fitting the diversity measures, it is necessary to consider the total sample size (i.e., the summed lengths in words of the speaker’s utterances in the conversation), as this will be the main factor determining the negative bias of the entropy estimators (i.e., entropy estimates generally increase with increasing sample size; Miller, 1955). To account for these biases, those three models included a nonlinear effect of the total length of utterances (modeled using a thin plate regression spline with automatically determined dimension; cf., Wood, 2006). On the other hand, the number of disfluencies per clause does not depend on the total length of the utterances, but on the length of the clauses themselves: Longer clauses offer more opportunities for disfluencies to arise and, for this reason, disfluency counts are known to be linearly related to clause lengths (Oviatt, 1995; Shriberg, 1996). This was accounted for by including a nonlinear effect (also a thin plate spline) of mean clause length in the regression fitting the number of disfluencies.

Table 1
Summary statistics for the measures extracted from the corpus

| Measure | Unit | Minimum | 1st Quartile | Median | Mean | 3rd Quartile | Maximum |
|---|---|---|---|---|---|---|---|
| Age | years | 19.71 | 28.75 | 34.73 | 37.17 | 46.42 | 67.59 |
| Age difference | years | 0 | 5 | 10 | 11.79 | 18 | 40 |
| Total length | words | 94 | 513 | 696.5 | 787.6 | 980 | 2973 |
| Clause length | words/clause | 2.31 | 6.36 | 7.81 | 8.32 | 9.77 | 26.79 |
| Lexical div. | natsᵃ/word | 3.88 | 4.86 | 4.98 | 4.97 | 5.09 | 5.58 |
| Inflectional div. | nats/word | .0199 | .1112 | .1358 | .1358 | .1627 | .2620 |
| Syntactic div. | nats/clause | 3.90 | 14.40 | 17.88 | 18.91 | 22.55 | 62.68 |
| Disfluencies | disfl./clause | 1.46 | 2.33 | 2.70 | 2.92 | 3.32 | 12.58 |

Note. ᵃNats are log_e information units, in the same way that bits are log_2-based information units; for example, 1 nat = 1/log(2) ≈ 1.4427 bits.
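For readers who prefer bits, the unit conversion stated in the note to Table 1 (1 nat = 1/log(2) bits) can be applied directly to the tabulated values. The helper name below is an illustrative assumption.

```python
import math

def nats_to_bits(nats):
    """Convert an entropy from nats (log base e) to bits (log base 2)."""
    return nats / math.log(2)

# Median lexical diversity from Table 1, expressed in bits per word.
print(nats_to_bits(4.98))
```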

To investigate the evolution of the measures with age, a nonlinear effect (thin plate spline) of age (in years) was included in the four regression models. In those models where sex was found to have a significant contribution (significantly higher or lower values for women than for men), I fitted an additional model with the same fixed and random effect structure, including a nonlinear interaction considering different effects of age for women and men (instead of the nonlinear effect of age). Both models were compared using Wald log-likelihood tests, and only the better model was kept.

Fig. 3. Distribution of the numerical predictors, correlations, and nonlinear relations among them. The solid lines denote nonlinear smoothers. [Scatterplot matrix with panels for lexical diversity, inflectional diversity, (log) grammatical diversity, (log) dysfluencies/clause, (log) clause length, (log) total length, and age.]

Finally, it is necessary to take into account that, on average, women and men tend to have different preferences on the topics about which they like to talk. Fig. 4 compares, for the 64 topics, the number of times each topic was chosen by a female or a male caller. It shows how the likelihood of a topic being chosen significantly depends on the sex of the caller (χ²(63) = 127.6, p < .001). If the conversation is about a topic one has not (or would rather not have) chosen, this might lead a speaker to have less to contribute to the conversation, and therefore use a poorer vocabulary or a less complex syntax.³ To account for this possibility, I included additional mixed effects (i.e., random slopes) of speaker sex by topic and speaker role (caller/callee) by topic. According to Wald tests on maximum-likelihood regressions, these terms did not significantly improve the models, and including them anyway did not change the pattern of results. For these reasons, in what follows, I do not discuss these mixed effects any further.
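The test of whether topic choice depends on the caller’s sex is a standard Pearson chi-square test of independence on a topics-by-sex contingency table. A minimal sketch follows; the counts are invented for illustration, and the function returns only the statistic and its degrees of freedom (obtaining the p-value would require the chi-square distribution, which is left out here).

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

# Hypothetical counts: rows are topics, columns are (female callers, male callers).
topic_by_sex = [[10, 20], [20, 10], [15, 15]]
print(chi_square(topic_by_sex))
```

With 64 topics and two sexes, the table has (64 - 1) × (2 - 1) = 63 degrees of freedom, matching the value reported above.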

3. Results

Table 2 summarizes the results of the GAMM regressions. Before moving into the specific effects found in each of the models, as a “sanity check,” it is worth examining whether the nonlinear terms succeeded in reconstructing the shape of the correction terms included in the models, that is, the sample size (i.e., fragment length) measures used for correcting the bias of the entropy estimators, and the clause length term included to account for the fact that longer clauses afford more disfluencies per clause. These are plotted in Fig. 5. Note that the estimates become too noisy beyond sample sizes of 1,500 words or mean clause lengths of more than 15 words, as there are few data points with these characteristics. Panel (a) plots the effect of the sample size on the estimated lexical diversity. The monotonically increasing concave shape plotted in the graph is the exact shape one should expect for the convergence of the Chao–Wang–Jost entropy

Fig. 4. Distribution of chosen topics by the sex of the caller. [Bar chart with one pair of bars (MEN, WOMEN) for each of the 64 conversation topics; y-axis: number of times chosen, 0 to 15.]


Table 2

Effect significance in the four GAMM models on the untransformed dependent variables

| Predictor | Lexical Diversity (a) | Inflectional Diversity (a) | Syntactic Diversity (b) | Number of Disfluencies (b) |
|---|---|---|---|---|
| Speaker’s role | F(1, 1285.96) = 10.51, p < .001 | F < 1 | F(1, 1279.233) = 2.58, p = .108 | F(1, 1283.89) = 2.10, p = .147 |
| Speaker’s sex | F(1, 1282.88) = 1.02, p > .250 | F < 1 | F(1, 1279.84) = 44.93, p < .001 | F(1, 1285.04) = 45.31, p < .001 |
| Listener’s sex | F(1, 1282.88) = 1.55, p = .214 | F(1, 1285.01) = 1.23, p > .250 | F(1, 1279.84) = 16.03, p < .001 | F < 1 |
| Sex interaction | F(2, 1282.88) = 1.68, p = .195 | F(1, 1285.01) = 1.10, p > .250 | F < 1 | F(2, 1285.04) = 5.70, p = .003 |
| Level of education | F(4, 1285.96) = 6.14, p < .001 | F(4, 1289.09) = 3.26, p < .001 | F(4, 1279.84) = 3.08, p = .015 | F(4, 1285.04) = 10.34, p < .001 |
| Length of utterances | F(7.04, 1285.96) = 42.27, p < .001 | F(2.27, 1289.09) = 71.94, p < .001 | F(6.92, 1279.84) = 224.34, p < .001 | – |
| Mean length of clauses | – | – | – | F(1.91, 1285.04) = 162.27, p < .001 |
| Age | F(1, 1285.96) = 7.96, p < .001 | F(3.64, 1289.09) = 3.82, p = .006 | Women: F(2.42, 632.84) = 15.51, p < .001; Men: F(3.81, 626.84) = 7.31, p < .001 | Women: F(1.00, 638.04) = 5.97, p = .015; Men: F(4.05, 632.04) = 6.87, p < .001 |
| Age × Sex interaction | χ²(2) = 1.557, p > .250 | χ²(2) = .018, p > .250 | χ²(2) = 5.970, p = .050 | χ²(2) = 24.869, p < .001 |

Note. The p-values and degrees of freedom of the F-tests are approximations.

(a) Using normal regression model.

(b) Using log-normal regression model.


estimator that was used (see Appendix S1). In contrast, the inflectional diversity, plotted

in panel (b), shows a quasilinear, slightly concave increase. The inflectional diversity is

the difference between two entropy estimates (see Eq. 2), the first of which is expected to

be only slightly larger than the second. With a difference of such small magnitude, the convergence

is necessarily slow, hence the almost linear (but still concave) pattern.

Panel (c), plotting the convergence of the grammatical diversity, also exhibits a concave

convergence pattern. However, as these are fully uncorrected maximum-likelihood esti-

mates (I do not know any method for correcting the bias of PCFG entropy estimates),

their convergence should be expected to be extremely slow (see Appendix S1). Finally,

[Figure 5: four panels. (a) Sample Size (Fragment Length) vs. Estimated Lexical Diversity (nats). (b) Sample Size (Fragment Length) vs. Estimated Inflectional Diversity (nats). (c) Sample Size (Fragment Length) vs. Estimated Grammatical Diversity (nats). (d) Clause Length vs. Mean Number of Disfluencies per Clause.]

Fig. 5. Reconstructed nonlinear correction terms in the four generalized additive mixed-effects models.


the number of disfluencies per clause is expected to be directly proportional to the aver-

age clause length (Oviatt, 1995; Shriberg, 1996), hence the linear pattern in panel (d).
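The concave convergence of uncorrected entropy estimates with sample size, discussed in this paragraph, can be reproduced with a small simulation. The sketch below is an illustration only: it uses the plain maximum-likelihood (plug-in) estimator rather than the Chao–Wang–Jost estimator used in the paper, and the Zipf-like vocabulary of 1,000 types, the sample sizes, and all names in the code are assumptions made for the example.

```python
import math
import random
from collections import Counter


def plugin_entropy(tokens):
    """Maximum-likelihood (plug-in) entropy estimate, in nats."""
    n = len(tokens)
    counts = Counter(tokens)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


# Hypothetical vocabulary: 1,000 types with Zipf-like probabilities.
VOCAB = list(range(1, 1001))
WEIGHTS = [1.0 / rank for rank in VOCAB]
TOTAL = sum(WEIGHTS)
TRUE_ENTROPY = -sum((w / TOTAL) * math.log(w / TOTAL) for w in WEIGHTS)


def mean_estimate(sample_size, trials=50, seed=0):
    """Average plug-in estimate over repeated samples of a given size."""
    rng = random.Random(seed)
    estimates = [
        plugin_entropy(rng.choices(VOCAB, weights=WEIGHTS, k=sample_size))
        for _ in range(trials)
    ]
    return sum(estimates) / trials


ESTIMATES = {n: mean_estimate(n) for n in (100, 500, 2000)}
for n, h in ESTIMATES.items():
    print(f"n = {n:5d}: mean plug-in entropy = {h:.3f} nats")
print(f"true entropy = {TRUE_ENTROPY:.3f} nats")
```

In this toy setting the mean estimate grows with the sample size while staying below the true entropy, which is the kind of sample-size dependence that the nonlinear correction terms in the models are meant to absorb.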

The model fitting the lexical diversities did not reveal any effect of sex, either of

the speaker or of the listener (or their interaction). However, it did show a significant

effect of the speaker’s role in the conversation, indicating that the speaker making the

call (the caller) and choosing the topic overall used a richer vocabulary than the

speaker receiving the call (the callee). A significant main effect for the speaker’s level

of education was also present, indicating that the lexical diversity was lowest for peo-

ple with education below high school, slightly higher for education below college,

higher for people educated at college level or more, and highest for the cases whose

educational level was unknown (13 datapoints). After discounting the nonlinear effect

of the length of utterances, there was a significant effect of the age of the partici-

pants. As plotted in Fig. 6a, this effect indicated that lexical diversity (i.e., vocabu-

lary) of the utterances produced increases linearly with the speaker’s age. In other

words, speakers enrich their vocabularies at a steady rate throughout their lives, with

no suggestion of decline up to advanced ages. This pattern was not significantly dif-

ferent between men and women.

The model fitting inflectional diversities revealed no effects of the speaker’s role, or of

the sex of either the speaker or the listener. It found, once again, an effect of the speaker’s

level of education (i.e., the 13 speakers whose educational level was unknown exhibited

richer inflection). After discounting the nonlinear effect of the length of the utterances,

there was a nonlinear effect of age of the participants (which was not found to differ by

sex). As shown in Fig. 6b, inflectional diversity evolves non-monotonically with speaker

[Figure 6: two panels. (a) Age vs. Lexical Diversity (nats). (b) Age vs. Inflectional Diversity (nats).]

Fig. 6. (a) Effect of speaker’s age on the lexical diversities. (b) Effect of the speaker’s age on the inflectional

diversities. Note. The unreliable estimates for ages over 63 have been clipped from the graphs.


age irrespective of his/her sex, peaking at around 45 years of age, and decreasing there-

after.

The GAMM fit to the syntactic diversities revealed main effects of the speaker’s sex

(i.e., overall men used a more diverse syntax than women did), the sex of the listener

(i.e., speakers made use of a more diverse syntax when talking to a man than when talk-

ing to a woman), and level of education (i.e., speakers with education below high school

used less varied syntactic constructions than those speakers who had education of high

school or above, and the 13 speakers whose educational level was unknown exhibited the

richest syntax). There was a nonlinear effect of the length of the utterances as before.

Interestingly, the effect of age on the syntactic diversities was different for women and

men and significant for both (the presence of the interaction was marginally significant;

p = .050).4 These effects are plotted in Fig. 7. On the one hand, the syntactic diversity of

utterances produced by women (left-hand side panel) increases throughout their lives,

with a slight attenuation in the late fifties. On the other hand, the syntactic diversity of

utterances produced by men (right-hand side panel), although overall richer than that of

women, peaks at around 45 years of age and clearly decreases thereafter. In addition, for

men, there seems to be an acceleration of the increase in syntactic complexity starting

around the mid-thirties.

Finally, the fit to the mean number of disfluencies per clause revealed main effects for

the speaker’s sex (overall, women produced fewer disfluencies per sentence than men did),

and a significant interaction between the sex of the speaker and the sex of the listener

(besides producing on average more disfluencies than women, men’s disfluencies were

further increased when talking to other men, whereas for women the sex of the listener

[Figure 7: two panels (Women, Men); x-axis: Age; y-axis: Syntactic Diversity (nats).]

Fig. 7. Effect of the speaker’s age on the syntactic diversities of female speakers (left panel) and male speak-

ers (right panel). Note. The plots have been back-transformed from the logarithmic scale in which the regres-

sions were fitted; the unreliable estimates for ages over 63 have been clipped from the graphs.


did not significantly affect their average number of disfluencies). As in the previous three

models, there was also an effect of the level of education (the speakers with education

below high school produced more disfluencies than those speakers who had education of

high school or above and, as in the previous models, the 13 speakers with unknown edu-

cational level produced the lowest number of disfluencies per clause). The mean clause

length also exhibited a significant nonlinear effect. The effect of age on the disfluencies

was clearly different for women and men, and was significant for both. These effects are

plotted in Fig. 8. As they age, women steadily produce fewer disfluencies, following a linear

trend. In contrast, men appear to follow approximately the same pattern as women

until they reach the age of 45, from which point the number of disfluencies they produce markedly

increases. The clause length measure employed has a very strong linear relationship

with the syntactic diversity measure considered above (i.e., longer clauses require more

syntax; Pearson’s r = .98, t(1298) = 165.89, p < .001). Therefore, by partialing out the

effect of clause length, I have implicitly partialed out the syntactic diversity as well. In

other words, the effect of aging on the number of disfluencies cannot be attributed to

differences in syntactic complexity.5
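The correlation and the t statistic reported in this paragraph are linked by the standard relation t = r·sqrt(df) / sqrt(1 − r²). As a quick consistency check, the sketch below inverts that relation to recover the unrounded correlation from the reported t(1298) = 165.89. The function names are hypothetical and introduced only for this example.

```python
import math


def r_from_t(t_value, dof):
    """Invert t = r * sqrt(dof) / sqrt(1 - r**2) to recover Pearson's r."""
    return t_value / math.sqrt(t_value ** 2 + dof)


def t_from_r(r, dof):
    """t statistic for testing a Pearson correlation against zero."""
    return r * math.sqrt(dof) / math.sqrt(1.0 - r ** 2)


r = r_from_t(165.89, 1298)
print(f"r = {r:.4f}")
```

The recovered value is about .977, which rounds to the reported r = .98. The 1298 degrees of freedom imply 1,300 data points (df = n − 2).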

As evidenced by the plots and correlations in Fig. 3, the lexical, inflectional, and syn-

tactic diversity and disfluency measures studied above are far from independent from

each other. It would therefore be desirable to investigate to what degree the results

obtained reflect genuinely independent components of the evolution of linguistic abilities

along the lifespan. This would normally be achieved by including multiple diversity and

disfluency measures in the same regression models. The strong relationships between the

variables, compounded with the need to include clause length, and sample size predictors

[Figure 8: two panels (Women, Men); x-axis: Age; y-axis: Number of Disfluencies.]

Fig. 8. Effect of the speaker’s age on the average number of disfluencies per clause produced by female

speakers (left panel) and male speakers (right panel). Note. The plots have been back-transformed from the

logarithmic scale in which the regressions were fitted; the unreliable estimates for ages over 63 have been

clipped from the graphs.


into the models, would lead to extremely high multicollinearity, rendering the resulting

analyses virtually useless.
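To give a sense of scale for the multicollinearity concern, consider the variance inflation factor (VIF) in the simplest textbook case of two linear predictors with correlation r. This is an illustration, not a computation reported in the paper, and it only approximates what happens with nonlinear smooth terms. Taking the correlation of r = .98 between clause length and syntactic diversity reported earlier:

```latex
\mathrm{VIF} = \frac{1}{1 - r^{2}} = \frac{1}{1 - 0.98^{2}} = \frac{1}{0.0396} \approx 25.3
```

Under these assumptions, the sampling variance of each coefficient would be inflated roughly 25-fold relative to the case of uncorrelated predictors.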

An alternative route to address the problem above is to consider whether those mea-

sures can be decomposed into a set of predictors that are uncorrelated to each other. One

assumes that the measured variables are the result of a linear mixing between multiple

originally independent source signals. In the case of linear relations between variables,

this is typically achieved using principal component analysis (PCA). However, the rela-

tions between our variables of interest are often nonlinear. It is then more adequate to

use independent component analysis (ICA; cf. Hyvärinen & Oja, 2000), which estimates

a set of original source variables (i.e., “independent components”) that are not only

uncorrelated, but also independent in a nonlinear information-theoretical sense. In order

to evaluate how the age effects represent different aspects of linguistic abilities, I per-

formed an ICA decomposition on the original dependent variables, and repeated the

GAMM regressions above, using the independent components as the dependent variables,

following the same methodology, fixed effect, and random effect structure that was used

in the analysis of the original variables. The results of those analyses, fully reported in

Appendix S2, confirm that the pattern of results reported here is not a side effect of the

multi-collinearity between the four measures used: (a) there is a nonlinear pattern on the

diversity of inflectional and grammatical constructions used by speakers of different ages,

(b) how aging affects speakers is dependent on the sex, with men showing an earlier

decay than women do, with an onset at around 45 years of age, and (c) men’s disfluen-

cies appear to increase from age 45, whereas women’s do not.
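The idea of decomposing correlated measures into uncorrelated components can be illustrated with the linear case mentioned above. The sketch below performs PCA on two synthetic variables by rotating them onto their principal axes. Note that this is PCA, not the ICA used in the paper: it removes linear correlation only and does not enforce the stronger independence that ICA targets. The data are simulated and every name in the code is an assumption made for the example.

```python
import math
import random


def covariance(xs, ys):
    """Population covariance of two equally long sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def pca_rotate(xs, ys):
    """Center two variables and rotate them onto their principal axes."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cxx, cyy, cxy = covariance(xs, xs), covariance(ys, ys), covariance(xs, ys)
    # Angle of the first principal axis of the 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2.0 * cxy, cxx - cyy)
    c, s = math.cos(theta), math.sin(theta)
    pc1 = [c * (x - mx) + s * (y - my) for x, y in zip(xs, ys)]
    pc2 = [-s * (x - mx) + c * (y - my) for x, y in zip(xs, ys)]
    return pc1, pc2


# Two hypothetical measures driven by one shared source plus noise.
rng = random.Random(1)
source = [rng.gauss(0.0, 1.0) for _ in range(500)]
measure_a = [v + rng.gauss(0.0, 0.2) for v in source]
measure_b = [2.0 * v + rng.gauss(0.0, 0.2) for v in source]

pc1, pc2 = pca_rotate(measure_a, measure_b)
print(f"covariance before rotation: {covariance(measure_a, measure_b):.3f}")
print(f"covariance after rotation:  {covariance(pc1, pc2):.3e}")
```

After the rotation the two components have zero covariance up to floating-point error, so they could enter a regression without the collinearity problem; ICA pursues the same goal under a stricter, information-theoretical notion of independence.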

4. Discussion

This study demonstrates that age-related changes in the linguistic structures produced

by speakers in natural conversations are heterogeneous; lexical diversities improve

throughout speakers’ lives, while grammatical (i.e., inflectional & syntactic) diversities

and disfluencies exhibit nonlinear patterns. Furthermore, the aging patterns in language

are differentiated with respect to the sex of the speakers. Whereas women’s performance

steadily increases until ages beyond 60 (with perhaps some decrease in their use of inflec-

tional morphology), men exhibit a clear decrease in the richness of the grammatical struc-

tures they produce from the age of 45. At this age, the complexity of the syntax and

inflectional morphology of the utterances they produce begins to recede. After age 45, the

reduction in grammatical complexity is accompanied by a sudden marked increase in the

number of disfluencies produced by male speakers, but not by female speakers.

Importantly, in contrast with previous corpus studies on aging (Bortfeld et al., 2001;

Horton et al., 2010; Meylan & Gahl, 2014; Shriberg, 1996), the use of generalized addi-

tive mixed-effect models including nonlinear terms has enabled the investigation of the

peaking patterns exhibited by the different measures with age, while simultaneously tak-

ing into account multiple properties of the speakers and the conversations that could give

rise to confounds. Furthermore, independent component analysis (see Appendix S2) was


used to argue that the patterns reported for different measures should not be attributed to

effects of a single aspect generating what appear to be multiple patterns by spreading

intercorrelations, a weakness shared by the previous studies. This is especially important

as it underlines the polyhedric nature of human language: Aging affects different aspects

of language in different ways, similarly to what has been observed for cognitive abilities

in general (e.g., Hartshorne & Germine, 2015). This also addresses the limitation

expressed by Horton et al. (2010, p. 713) that this is a “found” dataset—rather than one

elicited under controlled experimental conditions—and is therefore subject to possible

confounds arising from the properties of the speakers. This is reminiscent of the ever-pre-

sent tension in biology between the complementary fields of ethology and behavioral

experimentation, appearing in linguistics under the names of corpus linguistics and psy-

cholinguistics. The importance of well-designed, controlled experiments is beyond doubt,

but this needs to be complemented with observational data of linguistic behavior in natu-

ral contexts. As is the case in ethology, natural dialogs are often subject to additional

constraints generally difficult to recreate in the laboratory (e.g., Adams et al., 2002;

Stine-Morrow et al., 2006). In this respect, I think that Horton and his colleagues might

have underestimated the possibilities of modern statistical modeling techniques for

addressing the possible confounds that arise in observational data.

A related technical aspect advanced by this study is the demonstration that one can

draw meaningful inferences from samples that are—in the scale of corpora—extremely

small. I combined the information-theoretical framework developed in Moscoso del Prado

Martín (2014) for studying diachronic aspects of language, with appropriate non-parametric

corrections for the strong biases that arise in such small sample sizes. As I discussed,

sample size effects were a problem in all previous corpus studies of the evolution

of lexical and syntactic complexity with age (Horton et al., 2010; Meylan & Gahl, 2014).

The results reported demonstrate that one can obtain reliable comparisons about the diver-

sities represented by samples as small as the contributions of a single speaker to a short

telephone conversation. As is evidenced by the analyses in the text (and the simulations

detailed in Appendix S1), this technique is able to recover even very subtle patterns

that are initially swamped under much confounding noise and is able to discard them

when they are just spurious by-effects of other nonlinear relations. I believe that this is a

useful contribution to the study of language using corpora: For many populations and lan-

guages, obtaining sufficiently large corpora is often simply beyond reach.

The linear increase in vocabulary richness throughout the lifespan is consistent with

previous research on natural conversations (Horton et al., 2010; Meylan & Gahl, 2014),

as well as with the experimental literature (see Verhaeghen, 2003, for an extensive meta-

analysis, and Hartshorne & Germine, 2015, for a recent view). This result also supports

the argument that vocabulary learning is generally spared in aging, continuing up to an

advanced age, and that the observed slowing down of older people in vocabulary tasks is

probably a by-effect of their having to access a larger and more detailed lexicon (e.g.,

Kavé & Nussbaum, 2012; Ramscar et al., 2014). In this respect, my result should, however,

be taken with some care. The sample analyzed lacks any data beyond the age of

67 years, and in fact only six conversations were included at this age, and just a single


woman and a single man (both aged 67) in the pool of speakers were older than this. This

resulted in confidence intervals beyond age 63 too large to draw any meaningful infer-

ence.6 It remains therefore possible that a peak in lexical diversity might be reached

much later in life (cf. Singer, Verhaeghen, Ghisletta, Lindenberger, & Baltes, 2003).

The results for the grammatical components of language (inflectional morphology and

syntax) were rather different. Instead of the linear increase that was observed for the

vocabulary, both of these components exhibited significantly non-monotonic patterns. The

diversity of inflectional forms shows an increasing pattern up to the age of 45 and

decreases thereafter for both sexes. The diversity of syntactic structures used by speakers

shows a very similar pattern, peaking at 45 years of age for men and not showing clear

evidence of decline for women at least into their early sixties. In contrast with the ever-

increasing vocabulary, from 45 years of age, men use less and less diverse grammatical

constructions. It is as if the language they produced were becoming more and more “ossi-

fied” with age, making use of a more limited and predictable set of constructions. The

progressive decrease in the syntactic complexity of the utterances produced by men from

age 45 onward is consistent with the behavioral literature, which indicates that there is a

decrease in the performance of older speakers in producing (e.g., Kemper et al., 2001)

and comprehending (e.g., Waters & Caplan, 2001) syntactically complex sentences (see

Burke & Shafto, 2008, for a detailed review). In comprehension, Antonenko et al. (2013)

report that a decrease in syntactic ability—as reflected in decreased ability to understand

sentences with increasing numbers of syntactic embeddings—in older age is paired with

reduced functional connectivity within “dedicated syntax networks” in the brain. Finally,

neural atrophy in older age (i.e., loss of both gray and white matter) is well documented,

and this neural deterioration requires older speakers to recruit additional brain resources

for syntactic and semantic processing (cf., Tyler et al., 2010, and references therein).

Importantly, the changes in white matter volume are reported to be nonlinear, increasing

from ages 19 to 40, and decreasing thereafter (Sowell et al., 2003). In short, the

decreased syntactic complexity of the utterances produced by older men is fully in line

with what is reported from the behavioral and neurophysiological literature: Older persons

have more difficulties in processing syntax, and this is due to both anatomical and functional

differences in their brains, contradicting the statement by Ramscar et al. (2014) that cognitive

decline in linguistic abilities is a “myth.” Interestingly, I find that only men manifest

a decreasingly diverse syntax in their speech. To my knowledge, this has been

reported neither in the behavioral nor in the neurophysiological literature.

However, none of the studies that I have found on this topic analyzed whether the

patterns of change differed by sex, and they might have therefore overlooked it.

The question arises as to what degree the results on syntax depend on the use of

the specific grammatical formalism from the Penn Treebank, to which I make no theoretical

commitment. Indeed, the specific values of the syntactic entropy for each participant

in a conversation will, to some degree, depend on the grammatical theory used for

constructing the parses. However, one should expect the relative values of those entropies

(i.e., the shape of their relationship with age and other variables) to remain more or less

unchanged. Different syntactic structures should, in the majority of cases, receive


different parses within the same grammatical formalism (even if the specific parses differ

across formalisms), and it is precisely such variability that is measured by the entropy

values (Chi, 1999; Grenander, 1976). In fact, the current data already demonstrate this

point. The average clause lengths were almost perfectly correlated (Pearson’s r = .98)

with the syntactic complexity measure (see Fig. 3), and one can replicate the analysis on

syntactic diversity replacing it with the mean clause length and obtain the very same

results. Crucially, average clause lengths are basically identical to the proxy for syntactic

complexity—but remain completely independent of any grammatical paradigm—that is

most often used in the field of language acquisition and often also in clinical studies, the

mean length of utterance (MLU), dating as far back as Nice (1925). Although the current

standard practice—following Brown (1973)—is to measure MLUs in morphemes rather

than in words, this does not really make any difference (Parker & Brorson, 2005). In this

respect, the current results indicate that MLUs are indeed reliable measures of the aver-

age syntactic complexity of the utterances of an individual: They correlate almost per-

fectly with the productivity of the grammar the individual is using. This finding further

validates the use of MLU-like measures in studies using corpora for which syntactic

parses are not available (e.g., mean clause length was one of the proxies for syntactic

complexity used in the study by Horton et al., 2010).
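Part of the appeal of MLU-like measures is how little machinery they need. The sketch below computes a mean length in words over a list of clauses. It assumes that clause segmentation has already been done and that words are separated by whitespace; the toy clauses and the function name are invented for the example and do not come from the corpus.

```python
def mean_length_in_words(utterances):
    """Mean number of whitespace-separated words per non-empty utterance."""
    lengths = [len(u.split()) for u in utterances if u.strip()]
    if not lengths:
        raise ValueError("no non-empty utterances")
    return sum(lengths) / len(lengths)


# Hypothetical clauses from a single speaker.
toy_clauses = [
    "i think so",
    "we went fishing last summer",
    "yeah",
]
mlu = mean_length_in_words(toy_clauses)
print(mlu)
```

Unlike the entropy of a probabilistic grammar, this computation requires no parses at all, which is why it remains usable for corpora without syntactic annotation.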

The results regarding the disfluencies are also remarkable. There is disagreement in the

literature as to whether aging affects the amount of disfluencies produced by speakers.

Multiple authors (Duchin & Mysak, 1987; Juste & Furquim de Andrade, 2011; de Oli-

veira Martins & Furquim de Andrade, 2008; Shewan & Henderson, 1988) have failed to

find any difference in the number of disfluencies produced by younger and older subjects,

not even for centenarians (Caruso, McClowry, & Max, 1997; Searl, Gabel, & Fulks,

2002). In contrast, others report higher disfluency rates among older speakers (Bortfeld

et al., 2001; Horton et al., 2010; Schow, Christensen, Hutchinson, & Nerbonne, 1978). Note

that, for women, there is actually little difference in the number of disfluencies

produced by younger and older speakers (and none if one takes the result of the ICA

analyses into account), but, for men, there is a marked increase in the production of dis-

fluencies from age 45 onwards. Crucially, the results found for men mimic those reported

by Bortfeld et al. (2001); older speakers produce more disfluencies than middle-aged

ones, which are themselves in this respect no different from the younger speakers. With

respect to sex, Furquim de Andrade and Martins (2011) failed to find sex differences in

the number of disfluencies, whereas such differences are reported by other studies (Bortfeld

et al., 2001; Shriberg, 1996). My results suggest that the disagreements in the literature

stem from failing to jointly consider sex and age as interacting variables.

One possibility is that the changes in the use of complex grammar in older ages are,

per se, not an indication of cognitive decline, but rather reflect a change in speaking

styles as one matures or some sociolectal differences across generations. This explanation

would need to account for both the marked increase in the number of disfluencies pro-

duced by men above the age of 45 and the differences between the sexes. It could per-

haps be that the increasing use of disfluencies is an effect of sociolect. For instance, if a

speaker’s dialect includes words and/or constructions that are outdated or rare today, the


speaker might hesitate in using those with younger speakers, leading to increased disflu-

encies and complex syntax due to circumlocutions. Given the history of cultural gender

differences, it may not be surprising to find a lag between men and women (e.g., due to

slower entry into the workplace for older women). In such a case, one would expect to find

that the age difference between speakers influences the grammatical diversity, which was

not present in our data.7 Furthermore, one would expect variables such as the degree of

acquaintance of the speakers to play a role in these factors. However, after controlling for

such degree of acquaintance, Bortfeld et al. (2001) found a pattern of increase in disflu-

encies very much like that reported here for men. Together with the evidence for neuro-

physiological effects of aging in areas relevant to language, and even patterns of decay

also beginning at around age 40 (Sowell et al., 2003), it seems more parsimonious to

attribute the differences observed here to actual effects of aging.

The results indicate that men and women exhibit different patterns of aging with

regard to their linguistic performance. Sex differences in language processing have often

been reported in the literature (see Ullman et al., 2007, for a review). In this respect, the

findings that women use more inflectional morphology and show fewer disfluencies than

men are perhaps not very surprising. Women are known to outperform men in both of

these, even from an early age (e.g., Hartshorne & Ullman, 2006). It is more surprising

that men show an overall increased diversity over women in the use of syntax itself.

Although most studies on language abilities have found—when anything—higher perfor-

mance for women, some theories have proposed that men should in fact be better at tasks

involving “procedural” processes, such as those necessary for processing syntactic regu-

larities (Hartshorne & Ullman, 2006; Ullman et al., 2002). Most surprising of all is the

accelerated pattern of change found for men. It seems that men’s inflectional and syntac-

tic abilities and fluency peak at around age 45, decreasing from there on. One would

think that this could point toward an earlier onset of dementias for men than for women.

The literature, however, indicates that—if anything—it is women who show a higher inci-

dence of dementias (cf. Ruitenberg, Ott, van Swieten, Hofman, & Breteler, 2001). The

fact that the reduction in overall fluency (i.e., increased pauses, false starts, self-correc-

tions, etc.) in men seems to be very strong over and above any effects of syntax suggests

that there are general cognitive, not purely linguistic, mechanisms of importance for lan-

guage performance, playing an important role in the breakdown of linguistic skills. The

procedural/declarative distinction drawn by Ullman and his colleagues offers one tentative

explanation for these patterns. Whereas the declarative aspects of language (i.e., vocabulary

and knowledge) improve throughout the lifespan for both sexes, the procedural abilities

subserving grammatical processing begin to decay in middle age, with a later onset

of this decay for women than for men.

Finally, a note of caution is owed here. Anyone who has talked to—otherwise healthy

—older men knows that they often have no difficulties in either speaking or understand-

ing. The changes reported in this study do not necessarily constitute deficits. They are rel-

atively small-scale differences in performance. Even the most marked of these, the

disfluencies (i.e., from Fig. 8 one can deduce that 60-year-old men produce on average

42% more disfluencies per sentence than women do, and 27% more than 45-year-old men


do), may not be an indication of poor performance. Some authors have even found that

such disfluencies might actually be beneficial for listeners, who may be able to compen-

sate for them and actually facilitate their understanding (e.g., Brennan & Schober, 2001;

Lau & Ferreira, 2005). In sum, the differences reported here are important in offering

insights into the processes involved in mental aging, but whether they point to any form

of impairment remains an open question.

Acknowledgments

I thank John W. Du Bois, Roger Levy, Michael Ramscar, Petar Milin, and two anony-

mous reviewers for helpful suggestions on this paper, even if disagreement remains on

some aspects.

Notes

1. As only the year of birth was available, all birth dates were set to July 1 of the cor-

responding year.

2. I also considered a possible random effect of the speaker’s identity. However, con-

sidering this effect was problematic: The number of conversations per speaker was

generally small, with many speakers participating in a single conversation. This is

compounded by the overwhelming majority of the age variance being between

speakers (rather than within speakers). In such a situation, it is sometimes not

advisable to include a random effect (e.g., Clark & Linzer, 2015). In GAMMs, this

situation is aggravated, resulting in considerable shrinkage on the nonlinear effect

estimates that is not easily detected by correlations. Much of the systematic nonlin-

ear effects of age is erroneously attributed to the (non-systematic) speaker random

effect coefficients, to the point that one could then analyze those random effect

adjustments as a function of speaker sex and age and obtain the very same pattern

of results reported here. Therefore, this random effect was discarded from the

regressions.

3. Notice, however, that speaking less does not necessarily imply using a poorer

vocabulary or syntax, especially when—as is the case in this study—the amount of

speech produced by each speaker is specifically considered as a nonlinear predictor

separately from any other predictors under consideration.

4. The significance estimates for nonlinear interactions in GAMMs are obtained by model comparison of the random effects part of the model. They are only very rough approximations, testing only the difference in degrees of freedom of the smoothers and not their difference in shape. Even though the effect is only marginally significant, I decided to keep this interaction, as the models with the interaction definitely improved on the models without it. According to Akaike’s Information Criterion (AIC), the model with the interaction was indeed better. The AIC difference approached two units (1.96), generally interpreted as only weak support for the alternative model without the interaction. Further support for this choice is the clearly different shapes (which the p-value does not test) of the effects for each sex.

5. One obtains identical results—with the evident rescaling—if one analyzes the num-

ber of disfluencies per word instead of the number of disfluencies per clause.

6. All analyses were also conducted removing the six points with ages above 63. This

did not substantially change the results.

7. Additional nonlinear effects of the age difference between speakers, or of its absolute value, were added to the regressions; none approached significance (F < 1 in all cases except the effect of the absolute value of the age difference on the number of disfluencies, for which F(1, 1284.01) = 1.673, p = .196).
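As a rough illustration of the AIC comparison described in Note 4, and not part of the original analysis, an AIC difference can be converted into a relative likelihood with the standard formula exp(−Δ/2). The sketch below assumes that the reported 1.96 is the AIC of the model without the interaction minus the AIC of the model with it; the function names are hypothetical and are used only for this example.

```python
import math


def relative_likelihood(delta_aic: float) -> float:
    """Relative likelihood of the higher-AIC model versus the lower-AIC one.

    delta_aic is the non-negative difference AIC_worse - AIC_better.
    """
    if delta_aic < 0:
        raise ValueError("delta_aic must be non-negative")
    return math.exp(-delta_aic / 2.0)


def akaike_weight_of_better(delta_aic: float) -> float:
    """Akaike weight of the lower-AIC model in a two-model comparison."""
    rel = relative_likelihood(delta_aic)
    return 1.0 / (1.0 + rel)


# Assumption: 1.96 = AIC(no interaction) - AIC(with interaction), as in Note 4.
delta = 1.96
print(f"relative likelihood of the model without the interaction: "
      f"{relative_likelihood(delta):.3f}")
print(f"Akaike weight of the model with the interaction: "
      f"{akaike_weight_of_better(delta):.3f}")
```

Under this assumption, the model without the interaction is a little over a third as likely as the model with it, which agrees with the note's reading that AIC alone gives only modest grounds for choosing between the two.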

References

Adams, C., Smith, M. C., Pasupathi, M., & Vitolo, L. (2002). Social context effects on story recall in older and younger women: Does the listener make a difference? Journal of Gerontology B: Psychological Sciences & Social Sciences, 57, P28–P40.

Antonenko, D., Brauer, J., Meinzer, M., Fengler, A., Kerti, L., Friederici, A. D., & Flöel, A. (2013). Functional and structural syntax networks in aging. NeuroImage, 83, 513–523.

Baayen, R. H., & Moscoso del Prado Martín, F. (2005). Semantic density and past-tense formation in three Germanic languages. Language, 81, 666–698.

Baxter, L. C., Saykin, A. J., Flashman, L. A., Johnson, S. C., Guerin, S. J., Babcock, D. R., & Wishart, H. A. (2003). Sex differences in semantic language processing: A functional MRI study. Brain and Language, 84, 264–272.

Booth, T. L., & Thompson, R. A. (1973). Applying probability measures to abstract languages. IEEE Transactions on Computers, C-22, 442–450.

Bortfeld, H., Leon, S. D., Bloom, J. E., Schober, M. F., & Brennan, S. E. (2001). Disfluency rates in conversation: Effects of age, relationship, topic, role, and gender. Language and Speech, 44, 123–147.

Brennan, S. E., & Schober, M. F. (2001). How listeners compensate for disfluencies in spontaneous speech. Journal of Memory and Language, 44, 274–296.

Brown, R. (1973). A first language: The early stages. Cambridge, MA: Harvard University Press.

Burke, D. M., & Shafto, M. A. (2008). Language and aging. In F. I. M. Craik & T. A. Salthouse (Eds.), The handbook of aging and cognition (3rd ed., pp. 373–443). New York: Psychology Press.

Caruso, A. J., McClowry, M. T., & Max, L. (1997). Age-related effects on speech fluency. Seminars in Speech and Language, 18, 171–180.

Chao, A., Wang, Y. T., & Jost, L. (2013). Entropy and the species accumulation curve: A novel entropy estimator via discovery rates of new species. Methods in Ecology and Evolution, 4, 1091–1100.

Chi, Z. (1999). Statistical properties of probabilistic context-free grammars. Computational Linguistics, 25, 131–160.

Clark, T. S., & Linzer, D. A. (2015). Should I use fixed or random effects? Political Science Research and Methods, 3, 399–408.

Costa, P. S., Santos, N. C., Cunha, P., Palha, J. A., & Sousa, N. (2013). The use of Bayesian latent class cluster models to classify patterns of cognitive performance in healthy ageing. PLoS ONE, 8, e71940.

Cowell, P. E., Turetsky, B. I., Gur, R. C., Grossman, R. I., Shtasel, D. L., & Gur, R. E. (1994). Sex differences in aging of the human frontal and temporal lobes. The Journal of Neuroscience, 14, 4748–4755.


Duchin, S. W., & Mysak, E. D. (1987). Disfluency and rate characteristics of young adult, middle-aged, and older males. Journal of Communication Disorders, 20, 245–257.

Dugast, D. (1980). La statistique lexicale. Geneva, Switzerland: Éditions Slatkine.

Endres, W., Bambach, W., & Flösser, G. (1971). Voice spectrograms as a function of age, voice disguise, and voice imitation. Journal of the Acoustical Society of America, 49, 1842–1848.

Estabrooke, I. V., Mordecai, K., Maki, P., & Ullman, M. T. (2002). The effect of sex hormones on language processing. Brain and Language, 83, 143–146.

Furquim de Andrade, C. R., & Martins, V. d. (2011). Influence of gender and educational status on fluent adults’ speech fluency. Revista de Logopedia, Foniatría y Audiología, 31, 74–81.

Gahl, S., Cibelli, E., Hall, K., & Sprouse, R. (2014). The “Up” corpus: A corpus of speech samples across adulthood. Corpus Linguistics and Linguistic Theory, 10, 315–328.

Godfrey, J. J., Holliman, E. C., & McDaniel, J. (1992). SWITCHBOARD: Telephone speech corpus for research and development. In Proceedings of the IEEE conference on acoustics, speech, and signal processing (Vol. 1, pp. 517–520). San Francisco, CA: Institute of Electrical and Electronics Engineers.

Gotelli, N. J., & Chao, A. (2013). Measuring and estimating species richness, species diversity, and biotic similarity from sampling data. In S. A. Levin (Ed.), Encyclopedia of biodiversity (2nd ed., Vol. 5, pp. 195–211). Waltham, MA: Academic Press.

Grenander, U. (1976). Lectures in pattern theory (Vol. 1, Pattern Synthesis). New York: Springer-Verlag.

Gur, R. E., & Gur, R. C. (2002). Gender differences in aging: Cognition, emotions, and neuroimaging studies. Dialogues in Clinical Neuroscience, 4, 197–210.

Hale, J. (2006). Uncertainty about the rest of the sentence. Cognitive Science, 30, 609–642.

Harasty, J., Double, K. L., Halliday, G. M., Krill, J. J., & McRitchie, D. A. (1997). Language-associated cortical regions are proportionally larger in the female brain. Archives of Neurology, 54, 171–176.

Harrington, J., Palethorpe, S., & Watson, C. I. (2007). Age-related changes in fundamental frequency and formants: A longitudinal study of four speakers. In D. van Compernolle & L. Boves (Eds.), Proceedings of the 8th Annual Conference of the International Speech Communication Association (Vol. 1, pp. 1081–1084). Baixas, France: International Speech Communication Association.

Hartshorne, J. K., & Germine, L. T. (2015). When does cognitive functioning peak? The asynchronous rise and fall of different cognitive abilities across the lifespan. Psychological Science, 26, 433–443.

Hartshorne, J. K., & Ullman, M. T. (2006). Why girls say “holded” more than boys. Developmental Science, 9, 21–32.

Horton, W. S., Spieler, D. H., & Shriberg, E. (2010). A corpus analysis of patterns of age-related change in conversational speech. Psychology and Aging, 25, 708–713.

Hyvarinen, A., & Oja, E. (2000). Independent component analysis: Algorithms and applications. Neural Networks, 13, 411–430.

Juste, F. S., & Furquim de Andrade, C. R. (2011). Speech disfluency types of fluent and stuttering individuals: Age effects. Folia Phoniatrica et Logopaedica, 63, 57–64.

Kavé, G., Knafo, A., & Gilboa, A. (2010). The rise and fall of word retrieval across the lifespan. Psychology and Aging, 25, 719–724.

Kavé, G., & Nussbaum, S. (2012). Characteristics of noun retrieval in picture descriptions across the adult lifespan. Aphasiology, 26, 1238–1249.

Kavé, G., Samuel-Enoch, K., & Adiv, S. (2009). The association between age and the frequency of nouns selected for production. Psychology and Aging, 24, 17–27.

Kemper, S., Thompson, M., & Maquis, J. (2001). Longitudinal change in language production: Effects of aging and dementia on grammatical complexity and propositional content. Psychology and Aging, 16, 600–614.

Lau, E. F., & Ferreira, F. (2005). Lingering effects of disfluent material on comprehension of garden path sentences. Language and Cognitive Processes, 20, 633–666.


Le, X., Lancashire, I., Hirst, G., & Jokel, R. (2011). Longitudinal detection of dementia through lexical and syntactic changes in writing: A case study of three British novelists. Literary and Linguistic Computing, 26, 435–461.

Lima, S. D., Hale, S., & Myerson, J. (1991). How general is general slowing? Evidence from the lexical domain. Psychology and Aging, 6, 416–425.

Marcus, M., Santorini, B., Marcinkiewicz, M. A., & Taylor, A. (1999). Treebank-3 LDC99T42. Philadelphia, PA: Linguistic Data Consortium.

Meylan, S., & Gahl, S. (2014). The divergent lexicon: Lexical overlap decreases with age in a large corpus of conversational speech. In P. Bello, M. Guarini, M. McShane, & B. Scasselatti (Eds.), Proceedings of the 36th Annual Conference of the Cognitive Science Society (pp. 1006–1011). Austin, TX: Cognitive Science Society.

Miller, G. A. (1955). Note on the bias of information estimates. In H. Quastler (Ed.), Information theory in psychology (pp. 95–100). Glencoe, IL: Free Press.

Miller, G. A., Beckwith, R., Fellbaum, C. D., Gross, D., & Miller, K. (1990). WordNet: An online lexical database. International Journal of Lexicography, 3, 235–244.

Mortensen, L., Meyer, A. S., & Humphreys, G. W. (2006). Age-related effects on speech production: A review. Language and Cognitive Processes, 21, 238–290.

Moscoso del Prado Martín, F. (2014). Grammatical change begins within the word: Causal modeling of the co-evolution of Icelandic morphology and syntax. In P. Bello, M. Guarini, M. McShane, & B. Scasselatti (Eds.), Proceedings of the 36th Annual Conference of the Cognitive Science Society (pp. 2657–2662). Austin, TX: Cognitive Science Society.

Moscoso del Prado Martín, F., Kostić, A., & Baayen, R. H. (2004). Putting the bits together: An information theoretical perspective on morphological processing. Cognition, 94, 1–18. doi:10.1016/j.cognition.2003.10.015

Nakamura, E., & Miyao, K. (2008). Sex differences in human biological aging. Journal of Gerontology A: Biological Sciences, 63, 936–944.

Nice, M. M. (1925). Length of sentences as a criterion of a child’s progress in speech. Journal of Educational Psychology, 16, 370–379.

de Oliveira Martins, V., & Furquim de Andrade, C. R. (2008). Speech fluency developmental profile in Brazilian Portuguese speakers. Pró-Fono Revista de Atualização Científica, 20, 7–12.

Oviatt, S. (1995). Predicting spoken disfluencies during human-computer interaction. Computer Speech and Language, 9, 19–35.

Parker, M. D., & Brorson, K. (2005). A comparative study between mean length of utterance in morphemes (MLUm) and mean length of utterance in words (MLUw). First Language, 25, 365–376.

Quené, H. (2013). Longitudinal trends in speech tempo: The case of Queen Beatrix. Journal of the Acoustical Society of America, 133, EL452–EL457.

Ramig, L. A., & Ringel, R. L. (1983). Effects of physiological aging on selected acoustic characteristics of voice. Journal of Speech & Hearing Research, 26, 22–30.

Ramscar, M., Hendrix, P., Shaoul, C., Milin, P., & Baayen, R. H. (2014). The myth of cognitive decline: Non-linear dynamics of lifelong learning. Topics in Cognitive Science, 6, 5–42.

Ruitenberg, A., Ott, A., van Swieten, J. C., Hofman, A., & Breteler, M. M. B. (2001). Incidence of dementia: Does gender make a difference? Neurobiology of Aging, 22, 575–580.

Schow, R., Christensen, J., Hutchinson, J., & Nerbonne, M. (1978). Communication disorders of the aged: A guide for health professionals. Baltimore, MD: University Park Press.

Searl, J. P., Gabel, R. M., & Fulks, J. S. (2002). Speech disfluency in centenarians. Journal of Communication Disorders, 35, 382–392.

Shannon, C. E. (1948). A mathematical theory of communication. The Bell System Technical Journal, 27, 379–423, 623–656.

Shewan, C. M., & Henderson, V. L. (1988). Analysis of spontaneous language in the older normal population. Journal of Communication Disorders, 21, 139–154.



Shriberg, E. (1996). Disfluencies in Switchboard. In H. T. Bunell & R. A. Foulds (Eds.), Proceedings of the international conference on spoken language processing (ICSLP’96) (Vol. Addendum, pp. 11–14). Philadelphia, PA.

Singer, T., Verhaeghen, P., Ghisletta, P., Lindenberger, U., & Baltes, P. B. (2003). The fate of cognition in very old age: Six-year longitudinal findings in the Berlin Aging Study (BASE). Psychology and Aging, 18, 318–331.

Sowell, E. R., Peterson, B. S., Thompson, P. M., Welcome, S. E., Henkenius, A. L., & Toga, A. W. (2003). Mapping cortical change across the human life span. Nature Neuroscience, 6, 309–315.

Stine-Morrow, E. A. L., Soederberg Miller, L. M., & Hertzog, C. (2006). Aging and self-regulated language processing. Psychological Bulletin, 132, 582–606.

Stoll, S., Bickel, B., Lieven, E., Paudyal, N. P., Banjade, G., Bhatta, T. N., & Rai, N. K. (2012). Nouns and verbs in Chintang: Children’s usage and surrounding adult speech. Journal of Child Language, 39, 284–321.

Tweedie, F., & Baayen, R. H. (1998). How variable may a constant be? Measures of lexical richness in perspective. Computers and the Humanities, 32, 323–352.

Tyler, L. K., Shafto, M. A., Randall, B., Wright, P., Marslen-Wilson, W. D., & Stamakis, E. A. (2010). Preserving syntactic processing across the adult life span: The modulation of the frontotemporal language system in the context of age-related atrophy. Cerebral Cortex, 20, 352–364.

Ullman, M. T., Estabrooke, I. V., Steinhauer, K., Brovetto, C., Pancheva, R., Ozawa, K., & Maki, P. (2002). Sex differences in the neurocognition of language (abstract). Brain and Language, 83, 141–142.

Ullman, M. T., Miranda, R. A., & Travers, M. L. (2007). Sex differences in the neurocognition of language. In J. B. Becker, K. J. Berkley, N. Geary, E. Hampson, J. P. Herman, & E. Young (Eds.), Sex differences in the brain. From genes to behavior (pp. 291–310). Oxford, UK: Oxford University Press.

Verhaeghen, P. (2003). Aging and vocabulary score: A meta-analysis. Psychology and Aging, 18, 232–339.

Waters, G. S., & Caplan, D. (2001). Age, working memory, and on-line syntactic processing in sentence comprehension. Psychology and Aging, 16, 128–144.

Witelson, S. F., Glezer, I. I., & Kilgar, D. L. (1995). Women have greater density of neurons in posterior temporal cortex. Journal of Neuroscience, 15, 3418–3428.

Wood, S. N. (2006). Generalized additive models: An introduction with R. Boca Raton, FL: Taylor and Francis.

Supporting Information

Additional Supporting Information may be found in the online version of this article:

Appendix S1. Validity of the entropy bias adjustment method.

Appendix S2. Independent component analysis.

