13 Jan 2006
An Introduction to Natural Language Syntax
Rajat [email protected]
CS-460/IT-632
Department of Computer Science and Engineering
Indian Institute of Technology, Bombay
Outline
Grammatical Analysis
Finite State Grammar
Phrase Structure Grammar
Transformational Grammar
Natural Language Phenomena
A Ubiquitous Task for NLP
Sequence labeling tasks can be defined at different levels.
In written text:
  Words
  Phrases
  Sentences
  Paragraphs
Names for Labeling Tasks
Words: Part of Speech tagging
Phrases: Chunking
Sentences: Parsing
Paragraphs: Co-reference annotation
Example (Words: POS Tagging)
<s> The dispute shows clearly the global power of Japan's financial titans. </s>
<s> [ The/DT dispute/NN ] shows/VBZ clearly/RB [ the/DT global/JJ power/NN ] of/IN [ Japan/NNP 's/POS financial/JJ titans/NNS ] ./. </s>
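A minimal sketch of how the tagged line above could be read back into (word, tag) pairs. The function name `parse_tagged` is hypothetical, and the code assumes tokens are separated by whitespace, with `[` and `]` standing alone as chunk markers.

```python
def parse_tagged(text):
    """Split 'word/TAG' tokens into (word, tag) pairs, skipping chunk brackets."""
    pairs = []
    for token in text.split():
        if token in ("[", "]"):
            continue  # chunk boundary markers carry no word
        # Split on the LAST slash so that a token like './.' still works.
        word, _, tag = token.rpartition("/")
        pairs.append((word, tag))
    return pairs


tagged = (
    "[ The/DT dispute/NN ] shows/VBZ clearly/RB [ the/DT global/JJ power/NN ] "
    "of/IN [ Japan/NNP 's/POS financial/JJ titans/NNS ] ./."
)
pairs = parse_tagged(tagged)
print(pairs[:4])
```

Splitting on the last slash rather than the first is a design choice that keeps punctuation tokens such as `./.` intact.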
Example (Phrases: Chunking)
[The dispute] [shows clearly] [the global power] [of] [Japan's financial titans]
Example (Sentences: Parsing)
( (S (NP-SBJ The dispute)
     (VP shows
         (ADVP-MNR clearly)
         (NP (NP the global power)
             (PP of
                 (NP (NP Japan 's)
                     financial titans))))
     .))
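A small sketch of reading such a bracketed parse into nested lists and recovering the words. The helper names `read_tree` and `leaves` are hypothetical; the sketch assumes well-formed brackets and that the first string inside each bracket is the node label.

```python
def read_tree(text):
    """Parse a bracketed tree into nested lists: [label, child, child, ...]."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def parse():
        nonlocal pos
        pos += 1  # consume "("
        node = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                node.append(parse())
            else:
                node.append(tokens[pos])
                pos += 1
        pos += 1  # consume ")"
        return node

    return parse()


def leaves(node):
    """Collect the words: every string that is not in label position."""
    words = []
    for i, child in enumerate(node):
        if isinstance(child, list):
            words.extend(leaves(child))
        elif i > 0:
            words.append(child)
    return words


text = (
    "( (S (NP-SBJ The dispute) (VP shows (ADVP-MNR clearly) "
    "(NP (NP the global power) (PP of (NP (NP Japan 's) financial titans)))) .))"
)
tree = read_tree(text)
print(" ".join(leaves(tree)))
```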
Parse Tree
[Figure: parse tree for "The dispute shows the global power of Japan's financial titans", with node labels S, NP, VP, V, Det, N, JJ, and PP.]
Example (Sentences: Co-referencing)
( (S (NP-SBJ-1 The banks)(VP (ADVP-MNR badly)
want(S (NP-SBJ *-1)
(VP to(VP break
(PP into(NP (NP all aspects)
(PP of(NP the securities business))))))))
What is Grammar?
A theory of language
A theory of the competence of a native speaker (in the context of a natural language)
A finite set of rules
  that generates all and only the sentences of a language
  that assigns an appropriate structural description to each one
An explicit model of competence
What are the requirements?
An explicit model of competence:
  Should be able to generate an infinite set of grammatical sentences of the language
  Should not generate any ungrammatical ones
  Should be able to account for ambiguities (i.e., if a sentence is understood to have two meanings, the grammar should give two different structural descriptions)
  If two sentences are understood to have the same meaning, the grammar should give the same structure for both at some level
  If two sentences are understood to have different internal relationships, the grammar should assign different structural descriptions
What is Syntax?
Syntax is the study of the combination of words into phrases, clauses and sentences
Syntax describes how sentences and their constituents are structured
Grammatical Analysis Techniques
Two main devices:
Breaking up a String
  Sequential
  Hierarchical
  Transformational
Labeling the Constituents
  Morphological
  Categorial
  Functional
A grammar may combine any of these devices for grammatical analysis.
Breaking up and Labeling
Sequential Breaking up
  Sequential Breaking up and Morphological Labeling
  Sequential Breaking up and Categorial Labeling
  Sequential Breaking up and Functional Labeling
Hierarchical Breaking up
  Hierarchical Breaking up and Categorial Labeling
  Hierarchical Breaking up and Functional Labeling
Sequential Breaking up
That student solved the problems.
that + student + solve + ed + the + problem + s
Sequential Breaking up and Morphological Labeling
That student solved the problems.
that  student  solve  ed     the   problem  s
word  word     stem   affix  word  stem     affix
Sequential Breaking up and Categorial Labeling
This boy can solve the problem.
this  boy  can  solve  the  problem
Det   N    Aux  V      Det  N
They called her a taxi.
They  call  ed     her   a    taxi
Pron  V     Affix  Pron  Det  N
Sequential Breaking up and Functional Labeling
Reading 1:
They     called  her            a taxi
Subject  Verbal  Direct Object  Indirect Object
Reading 2:
They     called  her              a taxi
Subject  Verbal  Indirect Object  Direct Object
Hierarchical Breaking up
Old men and women
  [Old [men and women]]
  [[Old men] and [women]]
Hierarchical Breaking up and Categorial Labeling
Poor John ran away.
[S [NP [A Poor] [N John]] [VP [V ran] [Adv away]]]
Hierarchical Breaking up and Functional Labeling
Immediate Constituent (IC) Analysis
Construction types in terms of the function of the constituents:
  Predication (subject + predicate)
  Modification (modifier + head)
  Complementation (verbal + complement)
  Subordination (subordinator + dependent unit)
  Coordination (independent unit + coordinator)
Predication
[Birds]subject [fly]predicate
[S [Subject Birds] [Predicate fly]]
Modification
[A]modifier [flower]head
John [slept]head [in the room]modifier
[S [Subject John] [Predicate [Head slept] [Modifier in the room]]]
Complementation
He [saw]verbal [a lake]complement
[S [Subject He] [Predicate [Verbal saw] [Complement a lake]]]
Subordination
John slept [in]subordinator [the room]dependent unit
[S [Subject John] [Predicate [Head slept] [Modifier [Subordinator in] [Dependent Unit the room]]]]
Coordination
[John came in time]independent unit [but]coordinator [Mary was not ready]independent unit
[S [Independent Unit John came in time] [Coordinator but] [Independent Unit Mary was not ready]]
An Example
In the morning, the sky looked much brighter.
[S [Modifier [Subordinator In] [DU [Modifier the] [Head morning]]]
   [Head [Subject [Modifier the] [Head sky]]
         [Predicate [Verbal looked] [Complement [Modifier much] [Head brighter]]]]]
Hierarchical Breaking up and Categorial / Functional Labeling
Hierarchical breaking up coupled with categorial / functional labeling is a very powerful device.
But there are ambiguities which demand something more powerful.
E.g., Love of God
  Someone loves God
  God loves someone
Hierarchical Breaking up
Love of God
Categorial Labeling: [Noun Phrase love [Prepositional Phrase of God]]
Functional Labeling: [[Head love] [Modifier [Sub of] [DU God]]]
Types of Generative Grammar
Finite State Model (sequential)
Phrase Structure Model (sequential + hierarchical) + (categorial)
Transformational Model (sequential + hierarchical + transformational) + (categorial + functional)
Finite State Model
[Figure: two state diagrams. The first produces THE MAN COMES and THE MEN COME; the second adds a transition for OLD.]
The machine begins in the initial state, runs through a sequence of states (producing a word with each transition), and ends in the final state (producing a sentence).
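The machine described above can be sketched as a transition table. The state names (`q0` to `qF`) and the treatment of OLD as a loop on the state reached after THE are assumptions for illustration, since the original diagram is not recoverable from the text.

```python
# Transition table: state -> list of (word, next_state). State names are hypothetical.
TRANSITIONS = {
    "q0": [("the", "q1")],
    "q1": [("old", "q1"), ("man", "q2"), ("men", "q3")],  # "old" loops (assumed)
    "q2": [("comes", "qF")],
    "q3": [("come", "qF")],
}
FINAL = "qF"


def accepts(sentence):
    """Run the machine over the words; accept only if it ends in the final state."""
    state = "q0"
    for word in sentence.lower().split():
        for label, target in TRANSITIONS.get(state, []):
            if label == word:
                state = target
                break
        else:
            return False  # no transition for this word
    return state == FINAL


def generate(max_words):
    """Enumerate every sentence the machine produces with at most max_words words."""
    results = []

    def walk(state, words):
        if state == FINAL:
            results.append(" ".join(words))
            return
        if len(words) == max_words:
            return
        for label, target in TRANSITIONS[state]:
            walk(target, words + [label])

    walk("q0", [])
    return results


print(generate(4))
```

Because of the loop, the set of sentences is infinite, so `generate` needs a length bound; number agreement is enforced only because MAN and MEN lead to different states.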
Phrase Structure Model
Phrase Structure Grammar (PSG)
A phrase-structure grammar G is a four-tuple (V, T, S, P), where
  V is a finite set of symbols (the alphabet or vocabulary)
    E.g., N, V, A, Adv, P, NP, VP, AP, AdvP, PP, student, sing, etc.
  T is a finite set of terminal symbols: T ⊂ V
    E.g., student, sing, etc.
  S is a distinguished non-terminal symbol, also called the start symbol: S ∈ V
  P is a set of production rules
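One possible way to hold the four-tuple in code, as a sketch only. The class name, field names, and the toy grammar are hypothetical; the check simply restates the conditions T ⊂ V and S ∈ V (with S a non-terminal) and requires every rule to rewrite a non-terminal into symbols of V.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhraseStructureGrammar:
    vocabulary: frozenset   # V: all symbols
    terminals: frozenset    # T: a proper subset of V
    start: str              # S: a distinguished non-terminal
    productions: tuple      # P: (lhs, rhs) pairs, rhs being a tuple of symbols

    def nonterminals(self):
        return self.vocabulary - self.terminals

    def is_well_formed(self):
        nts = self.nonterminals()
        return (
            self.terminals < self.vocabulary          # T is a proper subset of V
            and self.start in nts                     # S is a non-terminal in V
            and all(
                lhs in nts and all(sym in self.vocabulary for sym in rhs)
                for lhs, rhs in self.productions
            )
        )


# A toy grammar (illustrative, not from the slides).
toy = PhraseStructureGrammar(
    vocabulary=frozenset({"S", "NP", "VP", "N", "V", "student", "sing"}),
    terminals=frozenset({"student", "sing"}),
    start="S",
    productions=(
        ("S", ("NP", "VP")),
        ("NP", ("N",)),
        ("VP", ("V",)),
        ("N", ("student",)),
        ("V", ("sing",)),
    ),
)
print(toy.is_well_formed())
```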
Noun Phrases
John: [NP [N John]]
the student: [NP [Det the] [N student]]
the intelligent student: [NP [Det the] [AdjP intelligent] [N student]]
Noun Phrase
his first five PhD students: [NP [Det his] [Ord first] [Quant five] [N PhD] [N students]]
Noun Phrase
the five best students of my class: [NP [Det the] [Quant five] [AP best] [N students] [PP of my class]]
Verb Phrases
can sing: [VP [Aux can] [V sing]]
can hit the ball: [VP [Aux can] [V hit] [NP the ball]]
Verb Phrase
can give a flower to Mary: [VP [Aux can] [V give] [NP a flower] [PP to Mary]]
Verb Phrase
may make John the chairman: [VP [Aux may] [V make] [NP John] [NP the chairman]]
Verb Phrase
may find the book very interesting: [VP [Aux may] [V find] [NP the book] [AP very interesting]]
Prepositional Phrases
in the classroom: [PP [P in] [NP the classroom]]
near the river: [PP [P near] [NP the river]]
Adjective Phrases
intelligent: [AP [A intelligent]]
very honest: [AP [Degree very] [A honest]]
fond of sweets: [AP [A fond] [PP of sweets]]
Adjective Phrase
very worried that she might have done badly in the assignment:
[AP [Degree very] [A worried] [S' that she might have done badly in the assignment]]
Phrase Structure Rules
Rewrite Rules:
  1. S → NP VP
  2. NP → Det N
  3. VP → V NP
  4. Det → the
  5. N → boy, ball
  6. V → hit
We interpret each rule X → Y as the instruction "rewrite X as Y".
The boy hit the ball.
Derivation
Sentence
NP + VP                       (1) S → NP VP
Det + N + VP                  (2) NP → Det N
Det + N + V + NP              (3) VP → V NP
The + N + V + NP              (4) Det → the
The + boy + V + NP            (5) N → boy
The + boy + hit + NP          (6) V → hit
The + boy + hit + Det + N     (2) NP → Det N
The + boy + hit + the + N     (4) Det → the
The + boy + hit + the + ball  (5) N → ball
The boy hit the ball.
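As a rough sketch, the six rewrite rules can be stored in a table and applied exhaustively to list every sentence they generate. The names `RULES` and `expand` are hypothetical, and the recursion terminates only because this particular grammar contains no recursive rule.

```python
from itertools import product

# The six rewrite rules; alternatives for N are listed separately.
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"]],
    "Det": [["the"]],
    "N": [["boy"], ["ball"]],
    "V": [["hit"]],
}


def expand(symbol):
    """Return every word sequence derivable from symbol."""
    if symbol not in RULES:
        return [[symbol]]  # a terminal rewrites to itself
    results = []
    for rhs in RULES[symbol]:
        parts = [expand(sym) for sym in rhs]
        for combo in product(*parts):
            results.append([word for part in combo for word in part])
    return results


sentences = [" ".join(words) for words in expand("S")]
print(sentences)
```

The output includes "the ball hit the boy", a first hint that rules of this kind generate more than one might want.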
PSG Parse Tree
The boy hit the ball.
[S [NP [Det the] [N boy]] [VP [V hit] [NP [Det the] [N ball]]]]
PSG Parse Tree
John wrote those words in the Book of Proverbs.
[Figure: parse tree for the sentence, with node labels S, NP, VP, PropN, V, PP, and P.]
Transformational Model
Transformational Grammar
A generative grammar that makes use of all three ways of breaking up (sequential, hierarchical, and transformational) and both kinds of labeling (categorial and functional) is called a Transformational Grammar (Universal Grammar).
Other Grammar Formalisms
Lexical Functional Grammar (LFG)
Generalised Phrase Structure Grammar (GPSG)
Tree Adjoining Grammar (TAG)
Categorial Grammar (CG)
Head-driven Phrase Structure Grammar (HPSG)
Systemic Functional Grammar (SFG)
Levels of Representation in Universal Grammar (UG)
[Figure: Lexicon → D(eep)-Structure → (Move-alpha) → S(urface)-Structure → PF (phonetic form) and LF (logical form)]
Interacting subsystems
UG consists of interacting subsystems:
  Various subcomponents of the rule system of grammar
  Subsystems of principles
Subcomponents
Subcomponents of the rule system:
  Lexicon
  Syntax
    Categorial component
    Transformational component
  PF-component
  LF-component
Principles
Subsystem of Principles:
  X-bar Theory
  Theta-theory
  Government
  Binding Principles
  Case Theory
  Control Theory
Issues in Phrase Structure Grammar
Limitation
  Overgeneration
Solutions
  Subcategorization Restrictions
  Selectional Restrictions
Overgeneration
Ungrammaticality
  The boy relied on the girl.
  *The boy relied the girl.
  *The boy relied.
Grammatically sound but semantically odd
  *The boy frightens sincerity.
  *Sincerity kicked the boy.
Ungrammaticality
Given sentences:
  The boy relied on the girl.
  *The boy relied the girl.
  *The boy relied.
PS Rules:
  VP → V (NP) (PP)
  NP → Det N
  V → rely
  Det → the
  N → boy | girl
Subcategorization Frame
Specify the categorial class of the lexical item.
Specify the environment.
Examples:
  kick: [V; _ NP]
  cry: [V; _ ]
  rely: [V; _ PP]
  put: [V; _ NP PP]
  think: [V; _ S']
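The frames listed above can be sketched as a lookup table from each verb to the complement categories it requires. The names `FRAMES` and `licensed` are hypothetical, and the sketch assumes the complements have already been labeled with their categories.

```python
# Verb -> required complement categories, in order (taken from the examples above).
FRAMES = {
    "kick": ["NP"],
    "cry": [],
    "rely": ["PP"],
    "put": ["NP", "PP"],
    "think": ["S'"],
}


def licensed(verb, complements):
    """True if the verb's frame matches the given complement categories exactly."""
    return FRAMES.get(verb) == list(complements)


print(licensed("rely", ["PP"]))   # The boy relied on the girl.
print(licensed("rely", ["NP"]))   # *The boy relied the girl.
print(licensed("rely", []))       # *The boy relied.
```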
Subcategorization Frame
forward: V, __ NP PP
  e.g., We will be forwarding our new catalogue to you
invitation: N, __ PP
  e.g., An invitation to the party
accessible: A, __ PP
  e.g., A program making science more accessible to young people
Subcategorization Rules
Subcategorization Rule:
V → y / { __ NP ] | __ ] | __ PP ] | __ NP PP ] | __ S' ] }
Applying Subcategorization Rules
1. S → NP VP
2. VP → V (NP) (PP) (S') …
3. NP → Det N
4. V → rely / __ PP]
5. P → on / __ NP]
6. Det → the
7. N → boy, girl
*The boy relied the girl.
*The boy relied.
The boy relied on the girl.
Semantically Odd Constructions
Can we exclude these two ill-formed structures?
  *The boy frightened sincerity.
  *Sincerity kicked the boy.
Necessity of a mechanism
Selectional Restrictions
Inherent properties of nouns: [+/- ABSTRACT], [+/- ANIMATE]
E.g.,
  Sincerity [+ABSTRACT]
  Boy [+ANIMATE]
Lexical information of this type can be used to set up a context-sensitive "rewrite rule".
Selectional Rules
A selectional rule specifies certain selectional restrictions associated with a verb.
V → y / [+/-ABSTRACT] ___ [+/-ANIMATE]
V → frighten / [+/-ABSTRACT] ___ [+ANIMATE]
*The boy frightened sincerity.
*Sincerity kicked the boy.
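A sketch of how such restrictions might be checked against noun features. Several things here are assumptions rather than slide content: the negative feature values for each noun, the entry for `kick` (the slides give a rule only for `frighten`), and all function and table names.

```python
# Noun features. Only [+ABSTRACT] for sincerity and [+ANIMATE] for boy come from
# the slides; the negative values are assumed.
NOUN_FEATURES = {
    "sincerity": {"ABSTRACT": True, "ANIMATE": False},
    "boy": {"ABSTRACT": False, "ANIMATE": True},
}

# Verb -> features required of its subject and object. An empty dict means
# "no restriction" (the +/- case). The entry for "kick" is an assumption.
VERB_RESTRICTIONS = {
    "frighten": {"subject": {}, "object": {"ANIMATE": True}},
    "kick": {"subject": {"ANIMATE": True}, "object": {}},
}


def satisfies(noun, required):
    """True if the noun carries every required feature value."""
    features = NOUN_FEATURES[noun]
    return all(features.get(name) == value for name, value in required.items())


def well_formed(subject, verb, obj):
    restrictions = VERB_RESTRICTIONS[verb]
    return satisfies(subject, restrictions["subject"]) and satisfies(obj, restrictions["object"])


print(well_formed("boy", "frighten", "sincerity"))   # *The boy frightened sincerity.
print(well_formed("sincerity", "kick", "boy"))       # *Sincerity kicked the boy.
print(well_formed("sincerity", "frighten", "boy"))   # Sincerity frightened the boy.
```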
Nature of Transformation
Topicalization
  Topicalized NP
  Topicalized PP
Movement
  Wh-movement
  Relative Pronoun movement
Topicalization
I can solve this problem.
This problem, I can solve.
I can solve *(this problem).
[S [NP [Pron I]] [VP [Aux can] [V solve] [NP [Det the] [N problem]]]]
Topicalization
This problem, I can solve.
[S [NP_i [Det this] [N problem]] [NP [Pron I]] [VP [Aux can] [V solve] [NP t(race)_i]]]
Topicalization
To John, Mary gave the book.
[S [PP_i [P to] [NP [N John]]] [NP [N Mary]] [VP [V gave] [NP [Det the] [N book]] [PP t(race)_i]]]
Wh-movement
John can solve this problem.
Which problem can John solve?
[S [NP [N John]] [VP [Aux can] [V solve] [NP [Det this] [N problem]]]]
Wh-movement
[S' [Comp [NP_i [Wh-Det which] [N problem]]] [S [NP [N John]] [VP [Aux can] [V solve] [NP t(race)_i]]]]
[Which problem_i can John solve t_i ?]
Relative Pronoun Movement
John heard the claim which Bill made.
[S [NP [N John]] [VP [V heard] [NP [Det the] [N claim_i] [S' …]]]]
Relative Pronoun Movement
[the claim which_i Bill made t_i]
[NP [Det the] [N claim_i] [S' [Comp [NP [Rel-Pron which_i]]] [S [NP [N Bill]] [VP [V made] [NP t(race)_i]]]]]
Relative Pronoun Movement
[The problem_i that_i he solved t_i was easy]
[S [NP [Det the] [N problem_i] [S' [Comp [NP [Rel-Pron that_i]]] [S [NP [Pron he]] [VP [V solved] [NP t(race)_i]]]]] [VP [V was] [AP [A easy]]]]
Parser Output
The problem that he solved was easy.
[S [NP [DT the] [NN problem] [SBAR [IN that] [S [NP [PRP he]] [VP [VBD solved]]]]] [VP [AUX was] [ADJP [JJ easy]]]]
X-bar Theory
It tells us how words are combined to make phrases and sentences.
It captures the commonality between different types of phrases, which PS-rules cannot.
X-bar Projection
[XP YP [X' X ZP]]
  XP: maximal projection
  X': intermediate projection
  X: zero projection
X-bar Projection
[XP YP [X' X ZP]]
  XP: X-phrase
  X: head
  YP: specifier
  ZP: complement
X-bar Projection
[XP YP [X' [X' X ZP] ZP]]
  YP: specifier
  X: head
  inner ZP: complement
  outer ZP: adjunct
X-bar Projection
[NP [NP John's] [N' [N solution] [PP to the problem]]]
X-bar Projection
[NP [Det the] [N' [N' [N discussion] [PP of the cricket match]] [PP in the cabinet meeting]]]
X-bar Theory
[Specifier-Head-Complement] SHC
[Specifier-Complement-Head] SCH
[Head-Complement-Specifier] HCS
Every phrase is endocentric. There is a specific relation between the specifier and the head, i.e., Spec-Head configuration.
C(onstituent)-command
C-command is a structural relation among the terminal and non-terminal nodes in a syntactic tree.
α c-commands β iff:
  the first branching node dominating α also dominates β
  α does not dominate β
[Figure: example tree with nodes A, B, C, D, E, F, G]
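The definition can be sketched directly over a parent/child table. The tree used here (A over B and C, B over D and E, C over F and G) is a hypothetical stand-in, since the shape of the slide's tree is not recoverable; the extra condition that a node does not c-command itself is also an assumption.

```python
# Hypothetical tree: A -> B C, B -> D E, C -> F G.
CHILDREN = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F", "G"]}
PARENT = {child: parent for parent, kids in CHILDREN.items() for child in kids}


def dominates(a, b):
    """True if a is a proper ancestor of b."""
    node = PARENT.get(b)
    while node is not None:
        if node == a:
            return True
        node = PARENT.get(node)
    return False


def first_branching_ancestor(a):
    """Nearest ancestor of a with at least two children, or None at the root."""
    node = PARENT.get(a)
    while node is not None and len(CHILDREN[node]) < 2:
        node = PARENT.get(node)
    return node


def c_commands(a, b):
    if a == b or dominates(a, b):
        return False
    anchor = first_branching_ancestor(a)
    return anchor is not None and dominates(anchor, b)


print(c_commands("B", "F"))  # B's first branching ancestor A dominates F
print(c_commands("D", "F"))  # D's first branching ancestor B does not dominate F
```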
C-command
[Figure: X-bar tree of a noun phrase headed by "discussion", with Det "the" and PPs headed by "of" containing the NPs "the cricket match" and "the meeting", used to illustrate c-command.]
Government
α governs β iff
  α is a lexical head (or tensed I)
  α c-commands β
  no barrier (VP, NP, PP, AP, or tensed IP) intervenes between α and β
Theta-Theory
Hit: <1,2> (argument structure), <Agent, Patient> (thematic structure)
Smile: <1> (argument structure), <Agent> (thematic structure)
Forward: <1,2,3> (argument structure), <Agent, Theme, Goal> (thematic structure)
Theta-Criterion
  Each argument must be assigned a theta-role
  Each theta-role must be assigned to an argument
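A sketch of the Theta-Criterion as a one-to-one pairing of arguments with the roles in a verb's thematic structure. The names `THETA_GRID` and `assign_roles` are hypothetical, and the sketch assumes arguments are supplied in the same order as the roles.

```python
# Thematic structure for the three verbs listed above.
THETA_GRID = {
    "hit": ["Agent", "Patient"],
    "smile": ["Agent"],
    "forward": ["Agent", "Theme", "Goal"],
}


def assign_roles(verb, arguments):
    """Pair each argument with one role; return None if the pairing is not one-to-one."""
    roles = THETA_GRID[verb]
    if len(arguments) != len(roles):
        return None  # an argument without a role, or a role without an argument
    return dict(zip(roles, arguments))


print(assign_roles("forward", ["the man", "the mail", "to the minister"]))
print(assign_roles("smile", ["John", "Mary"]))
```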
Thematic Roles
The man forwarded the mail to the minister.
forward: V, __ NP PP
Event FORWARD ([Agent THE MAN], [Theme THE MAIL], [Goal TO THE MINISTER])
Binding Principles
A relation, called Binding:
α binds β iff
  α c-commands β
  α and β are co-indexed
Rajiv_i likes himself_i.
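Binding
[Figure: IP tree for "Rajiv likes himself_i", with the subject NP Rajiv, I' containing I (Tense, AGR) and VP, a trace t inside VP, the verb like, and the object NP himself_i.]
Binding
[Figure: the same IP tree with the subject NP "Rajiv's brother" and the object NP himself_i.]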
Binding
Rajiv_i's brother_j likes himself_*i/j
[Rajiv's brother] is the antecedent of [himself]. [Rajiv] cannot be the antecedent of [himself].
That is, the sentence cannot mean "Rajiv_i's brother likes Rajiv_i".
A particular kind of structural relation holds between [Rajiv's brother] and [himself], but not between [Rajiv] and [himself].
This structural relation is called C(onstituent)-command.
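The contrast can be checked on a simplified tree, as a sketch only. Nodes are addressed by tuples of child indices from the root, which is a hypothetical encoding; the tree omits the trace and the internal structure of I, and every non-leaf node is binary, so the first branching node above a node is simply its parent.

```python
# Simplified tree for "Rajiv's brother likes himself" (an assumed structure).
NODES = {
    (): "IP",
    (0,): "NP: Rajiv's brother",
    (0, 0): "NP: Rajiv's",
    (0, 1): "N': brother",
    (1,): "I'",
    (1, 0): "I",
    (1, 1): "VP",
    (1, 1, 0): "V: likes",
    (1, 1, 1): "NP: himself",
}


def dominates(a, b):
    """a properly dominates b when a's address is a proper prefix of b's."""
    return len(a) < len(b) and b[:len(a)] == a


def c_commands(a, b):
    if not a or a == b or dominates(a, b):
        return False  # the root has no branching node above it
    return dominates(a[:-1], b)  # the parent is the first branching node here


def binds(a, b, index):
    """a binds b iff a c-commands b and the two carry the same index."""
    return c_commands(a, b) and index[a] == index[b]


BROTHER, RAJIV, HIMSELF = (0,), (0, 0), (1, 1, 1)
print(binds(BROTHER, HIMSELF, {BROTHER: "j", RAJIV: "i", HIMSELF: "j"}))
print(binds(RAJIV, HIMSELF, {BROTHER: "j", RAJIV: "i", HIMSELF: "i"}))
```

Even when [Rajiv] and [himself] are given the same index, binding fails because [Rajiv] sits inside the subject NP and does not c-command the object.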
Binding
For the purpose of interpretation, noun phrases are conveniently divided into three groups:
  Anaphors (reflexives and reciprocals), e.g., myself, yourself, each other, one another, etc.
  Pronouns, e.g., he, she, it, we, etc.
  R-expressions, e.g., John, Mumbai
Binding Principles
Principle A: An anaphor is bound in its governing category
  Rajiv_i likes himself_i
Principle B: A pronominal is free in its governing category
  Rajiv_i likes him_*i/j
Principle C: An R-expression is always free
  John likes Mary
Examples
  We think that nobody likes us.
  *We think that nobody likes ourselves.
Natural Language Phenomena
Agreement
  Subject-verb agreement
  Agreement in relative pronouns (English):
    The man who/*which I saw
    The book which/*who I saw
Ambiguity
  The mayor asked the police to stop drinking after midnight.
  Yesterday I saw a crane in the campus.
Negation Scope
  John did not deliberately break the glass.
  John deliberately did not break the glass.
Quantifier Scope
  Every student likes a teacher in the class.
Gapping
  John bought a story book and Mary a pen.
  Meena was crying because her mother was.
Natural Language Phenomena
Scrambling effect
Slifting
  John has robbed the bank, I believe.
Sluicing
  John bought something but I don't know what [John bought t].
Question
  Auxiliary Inversion
  Wh-fronting
  Intonation
  Wh-in situ
Control Structures
  I compelled John to read this article.
  I promised John to read this article.
Suggested Readings
Chomsky, N. 1957. Syntactic Structures. Mouton, The Hague.
Chomsky, N. 1981. Lectures on Government and Binding. MIT, Mass.
Radford, A. 1988. Transformational Grammar. CUP.
Jurafsky, D. and J. Martin. 2000. An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Prentice Hall, New Jersey.
Allen, James. 1995. Natural Language Understanding. The Benjamins/Cummings Publishing Company, Inc. UK.
Thank You