
Computational Linguistics

What is it and what (if any) are its unifying themes?

2

Computational linguistics

3

I often agree with XKCD…

4

[Figure: disciplines arranged along a scale from "more rigorous" to "less rigorous / more flakey" – physics, chemistry, biology, neuropsychology, psychology, literary criticism – with "computational linguistics" and "linguistics?" placed somewhere along the scale.]

5

What defines the rigor of a field?

• Whether results are reproducible

• Whether theories are testable/falsifiable

• Whether there is a common set of methods for similar problems

• Whether approaches to problems can yield interesting new questions/answers

6

Linguistics

7

[Figure: the same "more rigorous" to "less rigorous" scale, now with literary criticism, sociology, linguistics, and engineering placed along it.]

8

The true situation with linguistics

[Figure: subfields of linguistics placed at various points along the more-rigorous/less-rigorous scale: "theoretical" linguistics (e.g. minimalist syntax), "theoretical" linguistics (e.g. lexical-functional grammar), some areas of sociolinguistics (e.g. Bill Labov), other areas of sociolinguistics (e.g. Deborah Tannen), psycholinguistics, experimental phonetics, historical linguistics.]

9

Okay, enough already. What is computational linguistics?

• Text normalization/segmentation
• Morphological analysis
• Automatic word pronunciation prediction
• Transliteration
• Word-class prediction: e.g. part of speech tagging
• Parsing
• Semantic role labeling
• Machine translation
• Dialog systems
• Topic detection
• Summarization
• Text retrieval
• Bioinformatics
• Language modeling for automatic speech recognition
• Computer-aided language learning (CALL)

10

Computational linguistics

• Often thought of as natural language engineering

• But there is also a serious scientific component to it.

11

Why CL may seem ad hoc

• Wide variety of areas (as in linguistics)

• If it’s natural language engineering, the goal is often just to build something that works

• Techniques tend to change in somewhat faddish ways…
  – For example: machine learning approaches fall in and out of favor

12


Machine learning in CL

• In general it’s a plus since it has meant that evaluation has become more rigorous

• But it’s important that the field not turn into applied machine learning

• For this to be avoided, people need to continue to focus on what linguistic features are important

• Fortunately, this seems to be happening

17

Some interesting themes…

• Finite-state methods:
  – Many application areas
  – Raises interesting questions about how much of language is "regular" (in the sense of "finite state")

• Grammar induction:
  – Linguists have done a poor job at their stated goal of explaining how humans learn grammar

• Computational models of language change:
  – Historical evidence for language change is only partial. There are many changes in language for which we have no direct evidence.

18

Finite state methods

• Used from the 1950s onwards

• Went out of fashion a bit during the 1980s

• Then a revival in the 1990s with the advent of weighted finite-state methods

19

Some applications

• Analysis of word structure – morphology

• Analysis of sentence structure
  – Part of speech tagging
  – Parsing

• Speech recognition

• Text normalization

• Computational biology

• …

20

Regular languages

• A regular language over a finite alphabet is one that can be constructed from individual symbols using one or more of the following operations:
  – Set union
  – Concatenation
  – Transitive closure (Kleene star)
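As a minimal sketch (mine, not from the slides), the three operations can be written as regular-expression combinators, with membership tested via Python's re module; the toy language and all names below are purely illustrative.

```python
# The three regular operations as regular-expression combinators.
import re

def union(a, b):        # set union:      L(a) ∪ L(b)
    return f"(?:{a}|{b})"

def concat(a, b):       # concatenation:  L(a) · L(b)
    return f"(?:{a}{b})"

def star(a):            # Kleene star:    L(a)*
    return f"(?:{a})*"

# Example: the language (ab ∪ c)* over the alphabet {a, b, c}
lang = star(union(concat("a", "b"), "c"))

for s in ["", "ab", "cab", "abc", "ba"]:
    print(s, bool(re.fullmatch(lang, s)))
# "", "ab", "cab", and "abc" are accepted; "ba" is not
```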

21

Finite state automata: formal definition

Every regular language can be recognized by a finite-state automaton. Every finite-state automaton recognizes a regular language. (Kleene's theorem)
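In the spirit of Kleene's theorem, here is a hand-built finite-state automaton that accepts the same toy language (ab ∪ c)* as the regular-expression sketch above; again an illustrative sketch of my own, not material from the slides.

```python
# A deterministic FSA as a transition table: (state, symbol) -> next state.
# Missing entries mean "reject". State 0 is both the start and accept state.
TRANSITIONS = {
    (0, "a"): 1,
    (0, "c"): 0,
    (1, "b"): 0,
}
START, ACCEPT = 0, {0}

def accepts(s: str) -> bool:
    state = START
    for ch in s:
        state = TRANSITIONS.get((state, ch))
        if state is None:          # no transition defined: reject
            return False
    return state in ACCEPT

print([w for w in ["", "ab", "cab", "abc", "ba", "ac"] if accepts(w)])
# -> ['', 'ab', 'cab', 'abc']
```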

22

Representation of FSA’s: State Diagram

23

Regular relations: formal definition

24

Finite-state transducers

25

An FST

26

Composition

• In addition to union, concatenation and Kleene closure, regular relations are closed under composition

• Composition is to be understood here the same way as composition in algebra:
  – R1 ∘ R2 means take the output of R1 and feed it to the input of R2
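As a concrete illustration (my own sketch, with made-up toy relations), composition of two string relations represented as sets of pairs can be written directly:

```python
# Composition of relations represented as sets of (input, output) pairs.
def compose(r1, r2):
    """R1 ∘ R2: feed every output of R1 into the input side of R2."""
    return {(x, z) for (x, y) in r1 for (y2, z) in r2 if y == y2}

# Hypothetical toy relations
R1 = {("cats", "cat+s"), ("dogs", "dog+s")}   # surface -> segmented
R2 = {("cat+s", "cat"), ("dog+s", "dog")}     # segmented -> stem

print(compose(R1, R2))   # {('cats', 'cat'), ('dogs', 'dog')}
```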

27

Composition: an illustration

28

R1 as a transducer

29

R2 as a transducer

30

R1○R2

31

Some things you can do with FSTs

• Text analysis/normalization
  – Word segmentation
  – Abbreviation expansion
  – Digit-to-number-name mappings
  i.e. mapping from writing to language

• Morphological analysis

• Syntactic analysis
  – E.g. part-of-speech tagging

• (With weights) pronunciation modeling and language modeling for speech recognition

32

That’s fine for engineering but…

• Does it really account for the facts?
  – Is morphology really regular?
  – Is the mapping between writing and speech really regular?

33

What is morphology?

• scripsērunt is third person, plural, perfect, active of scrībō ('I write')

• Morphology relates word forms
  – the "lemma" of scripsērunt is scrībō

• Morphology analyzes the structure of word forms
  – scripsērunt has the structure scrīb+s+ērunt

34

Morphology is a relation

• Imagine you have a Latin morphological analyzer comprising:
  – D: a relation that maps between surface form and decomposed form
  – L: a relation that maps between decomposed form and lemma

• Then:
  – scripsērunt ○ D = scrīb+s+ērunt
  – scripsērunt ○ D ○ L = scrībō
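A toy rendering of the slide's example (the forms are the slide's; representing the two relations as Python dictionaries is my own simplification, since the real D and L would be transducers covering the whole lexicon):

```python
# D: surface form -> decomposed form; L: decomposed form -> lemma
D = {"scripsērunt": "scrīb+s+ērunt"}
L = {"scrīb+s+ērunt": "scrībō"}

word = "scripsērunt"
decomposed = D[word]        # scripsērunt ∘ D
lemma = L[decomposed]       # scripsērunt ∘ D ∘ L
print(decomposed, lemma)    # scrīb+s+ērunt scrībō
```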

35

English regular plurals

• cat + s = cats /s/

• dog + s = dogs /z/

• spouse + s = spouses /əz/

• This can be implemented by a rule that composes with the base word, inserting the relevant form of the affix at the end
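A rough sketch of such a rule (my own, working over spelling rather than phonemes, so only approximate; a real implementation would condition on the final sound of the base):

```python
# Approximate, orthography-based choice of the English plural affix.
def plural_affix(word: str) -> str:
    sibilant_endings = ("s", "z", "sh", "ch", "x", "se", "ce", "ge")
    voiceless_endings = ("p", "t", "k", "f", "th")
    if word.endswith(sibilant_endings):
        return "/əz/"
    if word.endswith(voiceless_endings):
        return "/s/"
    return "/z/"

for w in ["cat", "dog", "spouse"]:
    print(w, plural_affix(w))   # cat /s/, dog /z/, spouse /əz/
```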

36

Templatic affixes in Yowlumne

A transducer for each affix transforms the base into the required templatic form and appends the relevant string.

37

Subtractive morphology

Transducer deletes final VC of the base…

38

Bontoc infixation

• Insert a marker “>” after the first consonant (if any)

• Change “>” into the infix –um-
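The two steps can be sketched as cascaded rewrite rules; this is my own illustration, and the Bontoc form fikas 'strong' → fumikas is a standard textbook example cited from memory, not taken from the slide.

```python
# Two cascaded rewrite rules; each step is a simple regular transduction,
# so the whole mapping is their composition.
import re

def mark(word: str) -> str:
    # Step 1: insert a marker ">" after the first consonant, if any
    return re.sub(r"^([^aeiou])", r"\1>", word, count=1)

def realize(word: str) -> str:
    # Step 2: change ">" into the infix -um-
    return word.replace(">", "um")

print(realize(mark("fikas")))   # fumikas
```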

39

Side note … infixation in English

Kalamazoo

f*****g

40

Reduplication: Gothic

Problem: the mapping from a string w to its copy ww is not a regular relation

41

Factoring Reduplication

• Prosodic constraints

• Copy verification transducer C

42

Non-Exact Copies

• Dakota (Inkelas & Zoll, 1999):

43

Non-Exact Copies

• Basic and modified stems in Sye (Inkelas & Zoll, 1999):

“they will fall all over”

44

Morphological Doubling Theory(Inkelas & Zoll, 1999)

• Most linguistic accounts of reduplication assume that the copying is done as part of morphology

• In MDT:
  – Reduplication involves doubling at the morphosyntactic level – i.e. one is actually simply repeating words or morphemes
  – Phonological doubling is thus expected, but not required

45

Gothic Reduplication under Morphological Doubling Theory

46

Summary

• If Inkelas & Zoll are right then all morphology can be computed using regular relations

• This in turn suggests that computational morphology has picked the right tool for the job

47

Another Example: Linguistic analysis of text

• Maps the stuff you see on the page – e.g. text written in the standard orthography of a language – into linguistic units (words, morphemes, phonemes…)

• For example:
  – I ate a 25kg bass
  – [aɪ ɛɪt ə twɛnti faɪv kɪləgræm bæs]

• This can be done using transducers
  – But is the mapping between writing and language really regular (finite-state)?

48

Linguistic analysis of text

• Abbreviation expansion

• Disambiguation

• Number expansion

• Morphological analysis of words

• Word pronunciation

• …

49

A transducer for number names

Consider a machine that maps between digit strings and their reading as number names in English.

30,294,005,179,018,903.56 → thirty quadrillion, two hundred and ninety four trillion, five billion, one hundred seventy nine million, eighteen thousand, nine hundred three, point five six
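As a rough illustration of what such a machine computes, here is a small Python sketch of my own (an ordinary program, not a transducer, and not the author's system) producing American-style number names up to quadrillions; hyphenation and the use of "and" differ slightly from the slide's output.

```python
# Digit-string-to-number-name mapping, up to quadrillions; the fraction
# after the point is read digit by digit. Commas in the input are ignored.
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]
SCALES = ["", " thousand", " million", " billion", " trillion", " quadrillion"]

def under_1000(n: int) -> str:
    parts = []
    if n >= 100:
        parts.append(ONES[n // 100] + " hundred")
        n %= 100
    if n >= 20:
        parts.append(TENS[n // 10] + ("-" + ONES[n % 10] if n % 10 else ""))
    elif n:
        parts.append(ONES[n])
    return " ".join(parts)

def number_name(s: str) -> str:
    s = s.replace(",", "")
    integer, _, fraction = s.partition(".")
    n = int(integer)
    if n == 0:
        words = "zero"
    else:
        groups, out = [], []
        while n:                       # split into groups of three digits
            groups.append(n % 1000)
            n //= 1000
        for i in reversed(range(len(groups))):
            if groups[i]:
                out.append(under_1000(groups[i]) + SCALES[i])
        words = ", ".join(out)
    if fraction:
        words += " point " + " ".join(ONES[int(d)] for d in fraction)
    return words

print(number_name("30,294,005,179,018,903.56"))
```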

50

Mapping between speech and writing

It seems obvious on the face of it that the mapping between speech and its written form is regular. After all, the words are ordered in the same way as in speech. Even the letters tend to be ordered in the same way as the sounds they represent.

51

Some examples where it isn’t…

[Figure: Egyptian hieroglyphic spellings transliterated as twt-`nx-jmn and nb-xpr-w-r`, illustrating 'honorific inversion': the god's name is written first even though it is not first in the spoken form.]

52

Finite state methods

• In morphology they seem almost exactly correct as characterizations of the natural phenomenon

• In the mapping from writing to language, again, finite-state models seem almost exactly correct

53

Grammar induction

The common "nativist" view in linguistics…

From Gilbert Harman's review of Chomsky's New Horizons in the Study of Language and Mind (published in Journal of Philosophy, 98(5), May 2001):

Further reflection along these lines and a great deal of empirical study of particular languages has led to the "principles and parameters" framework which has dominated linguistics in the last few decades. The idea is that languages are basically the same in structure, up to certain parameters, for example, whether the head of a phrase goes at the beginning of a phrase or at the end. Children do not have to learn the basic principles, they only need to set the parameters. Linguistics aims at stating the basic principles and parameters by considering how languages differ in certain more or less subtle respects. The result of this approach has been a truly amazing outpouring of discoveries about how languages are the same yet different.

54

Similarly…

Children come equipped with a set of principles of grammar construction (i.e. Universal Grammar (UG)). The principles of UG have open parameters. Specific grammars arise once values for these open parameters are specified. Parameter values are determined on the basis of [the primary linguistic data]. A language specific grammar, then, is simply a specification [of] the values that the principles of UG leave open.

Cedric Boeckx and Norbert Hornstein. 2003. “The Varying Aims of Linguistic Theory.”

55

My “challenge” with Shalom Lappin

56

57

Automatic induction of grammars from unannotated text

• Klein, Dan and Manning, Christopher. 2004. “Corpus-based induction of syntactic structure: models of dependency and constituency”. Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics

• Lots of subsequent work…

58

Different syntactic representations

59

Dependency Model with Valence (DMV)

• Each head generates a set of non-STOP arguments to one side, then a STOP argument; then similarly on the other side

• Trained using expectation maximization
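The generative story lends itself to a small worked example. Below is a toy sketch of my own (not the authors' code) that scores one dependency analysis under hand-picked STOP and CHOOSE parameters; the real DMV also conditions STOP on adjacency (whether the head already has a dependent in that direction), which is omitted here for brevity.

```python
# A toy sketch of the DMV generative story; all probabilities are invented.
P_STOP = {                      # P(stop | head, direction)
    ("ROOT", "left"): 1.0, ("ROOT", "right"): 0.3,
    ("VBD", "left"): 0.4,  ("VBD", "right"): 0.8,
    ("NN", "left"): 0.5,   ("NN", "right"): 1.0,
    ("DT", "left"): 1.0,   ("DT", "right"): 1.0,
}
P_CHOOSE = {                    # P(dependent | head, direction)
    ("ROOT", "right", "VBD"): 0.6,
    ("VBD", "left", "NN"): 0.5,
    ("NN", "left", "DT"): 0.7,
}

def score(tree, head="ROOT"):
    """Probability of one dependency analysis under the toy parameters."""
    p = 1.0
    for direction in ("left", "right"):
        for dep in tree.get(head, {}).get(direction, []):
            p *= 1 - P_STOP[(head, direction)]     # decide to generate another argument
            p *= P_CHOOSE[(head, direction, dep)]  # choose which argument
            p *= score(tree, dep)                  # generate that argument's own subtree
        p *= P_STOP[(head, direction)]             # finally generate a STOP argument
    return p

# A toy analysis of "the dog barked": ROOT -> barked (VBD) -> dog (NN) -> the (DT)
tree = {"ROOT": {"right": ["VBD"]}, "VBD": {"left": ["NN"]}, "NN": {"left": ["DT"]}}
print(score(tree))
```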

60

Performance

61

Improvements

• Constituent structure can be induced in a similar way to inducing word classes (e.g. parts of speech) – by considering the environments in which the putative constituent finds itself.

• In Klein & Manning's constituent-context model (CCM), the probability of a bracketing is computed as follows:
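The slide's own formula did not survive transcription. As a reconstruction from Klein & Manning's published description of the CCM (a paraphrase, not the slide's notation), with α_ij the yield of span ⟨i,j⟩, x_ij its context, and B_ij indicating whether that span is bracketed as a constituent:

P(S, B) = P(B) · ∏⟨i,j⟩ P(α_ij | B_ij) · P(x_ij | B_ij)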

62

Combined DMV+CCM

Subsequent work – e.g. Rens Bod's 2006 Unsupervised Data Oriented Parsing – reports F-scores close to 83.0

For comparison, the best supervised parsers get about 91.0

63

Some objections … and a synopsis

• Children do not learn grammars from unannotated text corpora: they get a lot of guidance from the environmental situation
  – Sure

• Performance of automatic induction algorithms is still far from human performance, so they do not constitute evidence that we can do away with (nativist) linguistic theories of language acquisition
  – They do not show this. But the argument would have more weight if nativist theories had already been demonstrated to contribute to a working model of grammar induction

• But Computational Linguistics is starting to make some serious contributions to this 50-year-old debate

64

The evolution of complex structure in language

Examples from: Stump, Gregory (2001) Inflectional Morphology: A Theory of Paradigm Structure. Cambridge University Press.

65

Evolutionary Modeling (A tiny sample)

• Hare, M. and Elman, J. L. (1995) Learning and morphological change. Cognition, 56(1):61--98.

• Kirby, S. (1999) Function, Selection, and Innateness: The Emergence of Language Universals. Oxford

• Nettle, D. "Using Social Impact Theory to simulate language change". Lingua, 108(2-3):95--117, 1999.

• de Boer, B. (2001) The Origins of Vowel Systems. Oxford

• Niyogi, P. (2006) The Computational Nature of Language Learning and Evolution. Cambridge, MA: MIT Press.

66

A multi-agent simulation

• System is seeded with a grammar and a small number of agents
  – Each agent randomly selects a set of phonetic rules to apply to forms
  – Agents are assigned to one of a small number of social groups

• 2 parents "beget" child agents.
  – Children are exposed to a predetermined number of training forms combined from both parents
    • Forms are presented proportional to their underlying "frequency"
  – Children must learn to generalize to unseen slots for words
  – Learning algorithm similar to:
    • David Yarowsky and Richard Wicentowski (2000) "Minimally supervised morphological analysis by multimodal alignment." Proceedings of ACL-2000, Hong Kong, pages 207-216.
    • Features include the last n characters of the input form, plus semantic class
  – Learners select the optimal surface form to derive other forms from (optimal = requiring the simplest resulting ruleset – a Minimum Description Length criterion)

• Forms are periodically pooled among all agents and the n best forms are kept for each word and each slot

• Population grows, but is kept in check by "natural disasters" and a quasi-Malthusian model of resource limitations
  – Agents age and die according to reasonably realistic mortality statistics

67

Final states for a given initial state

68

Another example

• Kirby, Simon. 2001. “Spontaneous evolution of linguistic structure: an iterated learning model of the emergence of regularity and irregularity.” IEEE Transactions on Evolutionary Computation, 5(2):102--110.

• Assumes two meaning components each with 5 values, for 25 possible words

• Initial “speaker” randomly selects examples from the 25, producing random strings for each, and “teaches” them to the “hearer”

• Not all of the slots are filled, thus producing a “bottleneck”: the hearer must compute forms for the missing slots
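To make the setup concrete, here is a toy, heavily simplified re-imagining of my own (not Kirby's actual model, which induces grammars rather than splitting strings): 2 meaning components × 5 values gives 25 meanings, a bottleneck limits what each generation observes, and a naive learner induces a prefix for component 1 and a suffix for component 2. The midpoint-splitting heuristic and all other details are made-up stand-ins.

```python
# Toy iterated learning: 25 meanings, a transmission bottleneck, and a
# learner that induces one prefix per value of component 1 and one suffix
# per value of component 2 by splitting observed forms at their midpoint.
import random, string
from collections import Counter

MEANINGS = [(a, b) for a in range(5) for b in range(5)]

def random_syllable():
    return "".join(random.choices(string.ascii_lowercase, k=3))

def learn(observations):
    prefixes = {a: Counter() for a in range(5)}
    suffixes = {b: Counter() for b in range(5)}
    for (a, b), form in observations.items():
        mid = len(form) // 2
        prefixes[a][form[:mid]] += 1
        suffixes[b][form[mid:]] += 1
    pick = lambda c: c.most_common(1)[0][0] if c else random_syllable()
    return {m: pick(prefixes[m[0]]) + pick(suffixes[m[1]]) for m in MEANINGS}

# Seed generation: unstructured random forms for every meaning
lexicon = {m: random_syllable() + random_syllable() for m in MEANINGS}

for generation in range(20):
    observed = dict(random.sample(list(lexicon.items()), 15))  # the bottleneck
    lexicon = learn(observed)            # the hearer becomes the next speaker

print(lexicon)  # quickly collapses into a fully regular prefix+suffix system
```

That a learner like this collapses the lexicon into a perfectly regular system within a generation or two mirrors the slide's next point: the basic algorithm produces results that are too regular.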

69

The basic algorithm produces results that are too regular

[Figures: initial state vs. final state of the system]

70

A more realistic result…

• Addition of other constraints, including:
  – a random tendency for "speakers" to omit symbols
  – a frequency distribution over the 25 possible meaning combinations

71

Summary

• Evolutionary modeling is evolving slowly
  – We are a long way from being able to model the complexities of known language evolution

• Nonetheless, computational approaches promise to lend insights into how complex social systems such as language change over time, and complement discoveries in historical linguistics

72

Final thoughts

• Language is central to what it means to be human.

• Language is used to:
  – Communicate information
  – Communicate requests
  – Persuade, cajole…
  – (In written form) record history
  – Deceive

• Other animals do some or most of these things (cf. Anindya Sinha’s work on bonnet macaques)

• But humans are better at all of these

73

Final thoughts

• So the scientific study of language ought to be more central than it is

• We need to learn much more about how language works
  – How humans evolved language
  – How languages change over time
  – How humans learn language

• Computational linguistics can contribute to all of these questions.

74