Developing a phylogeny of Pama-Nyungan
o The Data(base)
o Bowern and Atkinson’s (2012) phylogeny
o Second generation questions: Extending the analysis:
o Bowern and Atkinson’s ‘problem’ subgroups
o Unidentified languages
o Unity of Pama-Nyungan within Australia:
o Exploratory qualitative phylogenetics
o Conclusions
Why Australia?
o Natural laboratory for language change
o Linguistically diverse: large number of languages and families
o Only continent without agriculture before the Colonial period
o Ecologically diverse
o Claimed to be exceptional
• Claims that traditional methods don’t work, but the methods are based on universals of perception, production, and human interaction
• Estimates for the age of Pama-Nyungan range from 4,000 to 40,000 years
• 20+ primary subgroups is very unusual
o Many outstanding questions; little previous work; great test case for phylogenetic methods
The CHIRILA database (Bowern submitted)
o 775,000 lexical items
o 343 Pama-Nyungan languages; 1140+ doculects
o 56 Non-Pama-Nyungan languages, 15+ families
o The entire corpus of Tasmanian
o Grammatical features for 90 languages
o Morphology collection in progress
o Data collection, curation, and processing are still very much in progress.
o First data release this Fall.
o Aim: complete lexical records for Australia
Problem: The Pama-Nyungan ‘Rake’
O’Grady, Voegelin & Voegelin (1965), Dixon (1980, 2002), Hercus (1994), Bowern & Koch (2004), etc.
o Missing data?
o Too many loans?
o Haven’t looked hard enough?
o Or indicative of how hunter-gatherer languages expand?
Research Questions:
1. Can we recover the uncontroversial lower-level groupings? [testing internal validity of model]
2. What higher-level groupings do we reconstruct?
3. What level of support do they have?
Methods/Data
o 194 languages
o 189 words of basic vocabulary, coded for cognacy
o Stochastic Dollo model [vs CTMC and Covarion]
o Relaxed clock [root fixed at 10,000 years/calibration points]
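A minimal sketch of how cognate judgments are commonly turned into binary presence/absence characters before fitting models such as Stochastic Dollo or a binary CTMC. The function name `binarize`, the language labels (LangA, LangB, LangC) and the cognate class labels are hypothetical placeholders, not CHIRILA data and not necessarily the coding pipeline used in the study.

```python
def binarize(judgments, languages):
    """Turn cognate judgments into one binary character per (meaning, class).

    judgments maps (language, meaning) -> set of cognate class labels.
    A missing key means the meaning is unattested for that language.
    """
    # Every attested (meaning, cognate class) pair becomes one column.
    characters = sorted(
        {(m, c) for (_lang, m), classes in judgments.items() for c in classes}
    )
    matrix = {}
    for lang in languages:
        row = []
        for meaning, cls in characters:
            classes = judgments.get((lang, meaning))
            if classes is None:
                row.append("?")  # missing data, not absence
            else:
                row.append("1" if cls in classes else "0")
        matrix[lang] = "".join(row)
    return characters, matrix


# Hypothetical toy data: two meanings, three languages.
judgments = {
    ("LangA", "hand"): {"h1"},
    ("LangB", "hand"): {"h1"},
    ("LangC", "hand"): {"h2"},
    ("LangA", "water"): {"w1"},
    ("LangB", "water"): {"w2"},
    # "water" is unattested for LangC
}
characters, matrix = binarize(judgments, ["LangA", "LangB", "LangC"])
for lang, row in matrix.items():
    print(lang, row)
```

Keeping "?" distinct from "0" matters here: an unattested word is not evidence that the cognate class is absent.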
1) Subgroup recovery
o Tracked 28 subgroups; recovered 24
o Problems: 4 groups appear as paraphyletic
• Western Torres (Mabuiag) has high replacement levels;
• Paman has missing data and was under-sampled;
• Ngumpin-Yapa and Yardli have very high loan levels;
• In addition, Yardli has high levels of missing data.
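One way to make "recovered" concrete: a subgroup counts as recovered in a tree if its languages are exactly the leaves under some node, and its support is the share of sampled trees in which that holds. The sketch below assumes rooted trees written as nested tuples; the function names and the leaf labels (A1, B1, ...) are hypothetical, and this is not the tooling used for the analysis reported here.

```python
def leaves(node):
    """Return the set of leaf labels under a node (a leaf is a string)."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaves(child) for child in node))


def clades(node):
    """Yield the leaf set under every node of the tree, root included."""
    yield leaves(node)
    if not isinstance(node, str):
        for child in node:
            yield from clades(child)


def is_monophyletic(tree, group):
    """True if the group's members are exactly the leaves under one node."""
    return frozenset(group) in set(clades(tree))


def clade_support(trees, group):
    """Proportion of trees in the sample that contain the group as a clade."""
    return sum(is_monophyletic(t, group) for t in trees) / len(trees)


# Two hypothetical sampled trees over six placeholder languages.
tree_1 = ((("A1", "A2"), "B1"), ("B2", ("C1", "C2")))
tree_2 = (("A1", "A2"), (("B1", "B2"), ("C1", "C2")))

subgroups = {"A": {"A1", "A2"}, "B": {"B1", "B2"}, "C": {"C1", "C2"}}
for name, members in subgroups.items():
    print(name, clade_support([tree_1, tree_2], members))
```

In `tree_1` group B is split across two branches, so it is not recovered there, which is the kind of result the problem cases above describe.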
Next stages (2012-15):
o Sample undersampled areas [Paman, Kulin]
o Extend cognate coding to additional widespread, well attested forms
o Study the effects of loans on coding [recoding solves Ngumpin, but so does adding more cognates and languages].
o Look at language and (phylo)geography [in progress]
o Use the tree to probe unidentified wordlists.
o Examine the unity of Pama-Nyungan, by coding relatives and adjacent families [Garrwan, Tangkic, Nyulnyulan, Worrorran]
o Use the tree in ancestral state reconstruction [cf. Zhou and Bowern 2015, Bowern et al. 2013, etc.].
More languages and cognate coding
o Added 105 languages
o Added 20 cognates (body parts, kin terms, ‘camp’, ‘hill’)
o Numerous minor coding changes: updating to reflect current knowledge, correcting typographical errors, etc.
o This solved the lower level Western Torres, Paman and Karnic (Yardli) problems. That is, we now recover all groups in the established classifications as monophyletic.
o Implication: Even within ‘basic vocabulary’, the wordlist matters. This needs further investigation.
Moving beyond ‘Swadesh’ lists
o Lexical replacement is a model of semantic change.
o We don’t have very good models of lexical semantic change (though cf. Urban [2014] for a start).
o BUT, we do know that there are homologous changes in body parts (Wilkins 1996), and body parts are core basic vocabulary; cf. also Bowern et al. (2013) on kinship vocab.
o We want low-loan, low-homoplasy data, not just ‘slow’ data (cf. Round [yesterday]; Dellert and Buch [this morning])
o Diagnosing family-level relationships is not the same as inferring internal tree structure
Language and Geography:
o Core-periphery model in language change:
o Centers of (dialect) areas are innovative; innovations spread to periphery
Unidentified languages/wordlists
o Poorly attested materials: do they belong to languages we already know about?
o Or are there additional languages not previously identified in classifications?
o Can we classify languages with doubtful subgroup affiliation?
o Solution: code for cognacy and investigate phylogenetically
o Relevant both for science and for revitalization/reclamation efforts
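Before a wordlist is placed phylogenetically, a crude pre-screen can suggest which known languages to compare it with first. The sketch below ranks candidates by the proportion of shared cognate classes over the meanings attested in both lists. This is a simple similarity measure swapped in for illustration, not the phylogenetic placement proposed above, and all names and data are hypothetical.

```python
def shared_cognate_proportion(a, b):
    """Share of meanings attested in both lists that have the same cognate class.

    a and b map meaning -> cognate class label. Returns None if the two
    lists have no meanings in common.
    """
    common = [m for m in a if m in b]
    if not common:
        return None
    return sum(a[m] == b[m] for m in common) / len(common)


def rank_candidates(unknown, known):
    """Rank known languages by cognate overlap with an unidentified wordlist."""
    scores = []
    for name, coded in known.items():
        p = shared_cognate_proportion(unknown, coded)
        if p is not None:  # skip languages with no comparable meanings
            scores.append((name, p))
    # Highest overlap first; ties broken alphabetically by name.
    return sorted(scores, key=lambda item: (-item[1], item[0]))


# Hypothetical coded wordlists.
unknown = {"hand": "h1", "water": "w2", "fire": "f1"}
known = {
    "LangA": {"hand": "h1", "water": "w1", "fire": "f1", "eye": "e1"},
    "LangB": {"hand": "h2", "water": "w2"},
    "LangC": {"eye": "e1"},
}
ranked = rank_candidates(unknown, known)
print(ranked)
```

High overlap can reflect borrowing as well as inheritance, so a ranking like this only narrows the comparison set; it does not settle subgroup affiliation.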
How many languages?
o Previous estimates: c. 200-250 languages [Dixon 1980, 2002; O’Grady, Voegelin and Voegelin 1966; Wurm 1972; Walsh 1991, 1997, etc.]
o Walsh (1997) notes problems with reconciling the figure of 250 languages with per-language population estimates.
o Method here: counting ‘languages’ as the reference names used in the database (that is, names that are used to group sources together that experts say belong to the same language).
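The counting method can be sketched as grouping sources by reference name and counting distinct names per family. The record layout (source id, reference name, family) and every value below are hypothetical placeholders; the actual database schema is not described here.

```python
def count_languages(sources):
    """Count distinct reference names per family.

    sources is an iterable of (source_id, reference_name, family) records.
    Several sources sharing a reference name count as one language.
    """
    names_by_family = {}
    for _source_id, reference_name, family in sources:
        names_by_family.setdefault(family, set()).add(reference_name)
    return {family: len(names) for family, names in names_by_family.items()}


# Hypothetical records: two sources are grouped under "Name1".
sources = [
    ("s1", "Name1", "Pama-Nyungan"),
    ("s2", "Name1", "Pama-Nyungan"),
    ("s3", "Name2", "Pama-Nyungan"),
    ("s4", "Name3", "non-Pama-Nyungan"),
]
counts = count_languages(sources)
print(counts, sum(counts.values()))
```

The count is only as good as the reference names: distinct languages that share a name need distinct reference names, or they are merged into one.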
More languages than we thought
o 397 Australian Languages
o 303 Pama-Nyungan
o 94 non-Pama-Nyungan
o 20 non-Pama-Nyungan families
o 30 Pama-Nyungan subgroups
o Sources of discrepancy:
• under-counting (e.g. in Yolŋu, Paman, Giimbiyu)
• treating badly attested varieties as dialects of better known varieties
• multiple (different) languages with the same name (e.g. Kungkari, Dharawal, Yugambeh/-bal)
Unity of Pama-Nyungan
o Pama-Nyungan’s nearest relatives:
• Garrwan
• Tangkic
o Classed as Pama-Nyungan in early classifications on the basis of typology (e.g. O’Grady, Voegelin & Voegelin 1965)
o Reclassified in Blake (1988) on the basis of their pronouns
Ten years ago…
o no Pama-Nyungan tree
o no consensus on how Pama-Nyungan subgroups are related
o no data repository
o and therefore, no easy way to study change in Australia
Now…
o Much better idea of macro-groupings, but still substantial issues about how they might fit together.
• the data matters
• the model matters
• [not insoluble, just work for the future]
o Much better idea of the extent of the diversity on the continent
• More than we thought…
o CHIRILA database, access to extensive data
o New ways to investigate language in space, questions of language diversification in space
o Need for further investigation of the internal composition of Pama-Nyungan.
Acknowledgments
o NSF grants BCS-0844550 and BCS-1423711
o The Aboriginal and Torres Strait Islanders who have given permission for their languages to be included in the database, and made data available.
o The 100+ linguists who have given permission for their work to be included in the database.
o The 50+ research students (undergraduates and graduates) who have been involved in the project since 2007, at Rice Univ. and Yale.
o Russell, who got me interested in this approach.