CSA2050 Introduction to Computational Linguistics
Lecture 1
Overview
Feb 2008 -- MR CSA2050 - Lecture I: What Is CL? 2
Lecture 1
Course Information
What is CL?
What is L?
Course Contents
Course Information
Web: http://www.cs.um.edu.mt/~mros/csa2050
[email protected]@[email protected]
Book (nominally): Jurafsky & Martin, Speech and Language Processing, Prentice Hall 2000, ISBN 0-13-095069-6
Natural Language Toolkit (NLTK)
Human Language Technologies
Natural Language Processing (NLP)
  Computational models of language analysis, interpretation, and generation.
  syntax/semantics interface
Natural Language Engineering
  emphasis on large-scale performance
  example: Google
Speech Technology
Computational Linguistics
  Emphasis on mechanised linguistic theories.
  Grew out of early Machine Translation efforts
CL: Two Main Disciplines
COMP SCI
LINGUISTICS
Linguistics
Phonetics: The study of speech sounds
Phonology: The study of sound systems
Morphology: The study of word structure
Syntax: The study of sentence structure
Semantics: The study of meaning
Pragmatics: The study of language use
Noam Chomsky
Noam Chomsky’s work in the 1950s radically changed linguistics, making syntax central.
Chomsky has been the dominant figure in linguistics ever since.
Chomsky invented the generative approach to grammar.
Generative Grammar: Key Points
A language is a possibly infinite set of strings.
Grammar is a finite description of that set.
Grammar is precisely defined.
Theory of Grammar is a theory of human linguistic abilities.
Grammar should generate all and only the strings of the language.
[source: Sag & Wasow]
A Simple Grammar + Lexicon
grammar:
  S → NP VP
  NP → N
  VP → V NP
lexicon:
  V → kicks
  N → John
  N → Bill
parse tree for "John kicks Bill":
  [S [NP [N John]] [VP [V kicks] [NP [N Bill]]]]
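The grammar and lexicon above are small enough to enumerate by hand, and a short program can do the same. The following is a minimal Python sketch, not part of the original slides: the dictionary encoding and the names GRAMMAR, expand and LANGUAGE are assumptions made for illustration, and the approach only works because this grammar has no recursive rules.

```python
from itertools import product

# Rules from the slide, encoded as: symbol -> list of right-hand sides.
# (This encoding is an assumption made for illustration.)
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["N"]],
    "VP": [["V", "NP"]],
    "N": [["John"], ["Bill"]],
    "V": [["kicks"]],
}


def expand(symbol):
    """Yield every token list derivable from `symbol`.

    Terminates only for non-recursive grammars such as this one.
    """
    if symbol not in GRAMMAR:
        # Anything without a rule is treated as a word.
        yield [symbol]
        return
    for rhs in GRAMMAR[symbol]:
        # Combine every expansion of each right-hand-side symbol.
        for parts in product(*(list(expand(s)) for s in rhs)):
            yield [token for part in parts for token in part]


# The (finite) set of strings this grammar generates.
LANGUAGE = {" ".join(tokens) for tokens in expand("S")}
```

Under these assumptions LANGUAGE contains "John kicks Bill" together with the three other subject/object combinations, and nothing else.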
Generative Power of a Grammar
[Figure: three diagrams comparing G, the strings generated by the grammar, with L, the language]
undergeneration: only but not all
overgeneration: all but not only
all and only
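The three cases can be read as set relations between what the grammar generates and what the language contains. A minimal Python sketch, assuming both can be listed as finite sets of strings (real languages may be infinite, so this only works on samples); the function name classify and the fourth "mixed" outcome are not from the slides.

```python
def classify(generated, language):
    """Compare the strings a grammar generates with the target language."""
    generated, language = set(generated), set(language)
    if generated == language:
        return "all and only"
    if generated < language:   # proper subset: only, but not all
        return "undergeneration"
    if generated > language:   # proper superset: all, but not only
        return "overgeneration"
    # Not on the slide: the grammar misses some strings and adds others.
    return "both undergenerates and overgenerates"


# Hypothetical two-sentence language used only for this example.
L = {"John kicks Bill", "Bill kicks John"}
```

For example, classify({"John kicks Bill"}, L) reports undergeneration, while adding a string such as "kicks John Bill" to L on the generated side reports overgeneration.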
Formal v. Natural Languages
Formal Languages
  Numbers: 3290 1 1010101
  Logic: ∀x man(x) → mortal(x)
  C: if (i > 10) exit(0);
Natural Languages
  English: John saw the dog
  German: Johann hat den Hund gesehen
  Maltese: Gianni ra kelb
Points of Similarity
A language is considered to be a (possibly infinite) set of sentences.
Sentences are sequences of tokens.
Formation rules determine which sequences are valid sentences.
Sentences have a definite structure.
Sentence structure is related to meaning.
Structure Affects Meaning
I shot an elephant in my trousers
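The sentence has two structures: "in my trousers" can attach to the verb phrase (the shooter was wearing them) or to "an elephant" (the elephant was in them). A minimal Python sketch that counts the structures with a CKY-style chart; the grammar, the category names and the function count_parses are assumptions made for illustration, not material from the slides.

```python
from collections import defaultdict

# Hypothetical toy grammar in binary form, written for this one sentence.
LEXICON = {
    "I": ["NP"], "shot": ["V"], "an": ["Det"], "my": ["Det"],
    "elephant": ["N"], "trousers": ["N"], "in": ["P"],
}
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # PP attaches to the verb phrase
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),   # PP attaches to the noun phrase
    ("PP", "P", "NP"),
]


def count_parses(tokens, start="S"):
    """Return the number of distinct parse trees for `tokens`."""
    n = len(tokens)
    chart = defaultdict(int)  # (i, j, category) -> number of trees over tokens[i:j]
    for i, token in enumerate(tokens):
        for category in LEXICON.get(token, []):
            chart[i, i + 1, category] += 1
    # Build longer spans from shorter ones.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for parent, left, right in BINARY_RULES:
                    chart[i, j, parent] += chart[i, k, left] * chart[k, j, right]
    return chart[0, n, start]


ambiguous = count_parses("I shot an elephant in my trousers".split())
unambiguous = count_parses("I shot an elephant".split())
```

With this toy grammar the full sentence receives two parses and the shorter one receives a single parse, so the extra reading comes entirely from where the prepositional phrase attaches.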
Points of Difference
Formal Languages
  The grammar defines the language
  Restricted application
  Non-ambiguous
Natural Languages
  The language defines the grammar
  Universal application
  Highly ambiguous
Ambiguity
Lexical Ambiguity: Iraqi Head Seeks Arms
Syntactic Ambiguity: small animals and children laugh
Semantic Ambiguity: every girl loves a sailor
Pragmatic Ambiguity: can you pass the salt?
The management of ambiguity is central to the success of CL
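To see why ambiguity needs managing, consider how quickly readings multiply in the lexical example. A minimal Python sketch, assuming a hand-written sense inventory: the glosses in SENSES are hypothetical and chosen only to illustrate the headline, not taken from any dictionary or from the slides.

```python
from itertools import product

# Hypothetical sense inventory for the headline's words.
SENSES = {
    "Iraqi": ["of Iraq"],
    "Head": ["leader", "body part"],
    "Seeks": ["tries to obtain"],
    "Arms": ["weapons", "limbs"],
}


def readings(tokens):
    """List every combination of word senses for the given tokens."""
    return list(product(*(SENSES[token] for token in tokens)))


headline_readings = readings("Iraqi Head Seeks Arms".split())
```

Two ambiguous words already give four combined readings; each further ambiguous word multiplies the count again, which is why a system needs some way to rank or prune readings.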
Algorithms and Linguistics
Pure linguistics deals with
  data
  grammar rules
  theories about grammar rules
Putting knowledge to some use involves processing.
Linguistic theory is silent about implementation issues
Implementation is central to Computational Linguistics
Computational Linguistics – Issues
Representation of grammar and a lexicon
How is the structure of a given sentence actually discovered?
Generation of a sentence to express a particular meaning?
Learning a language with limited exposure to grammatical sentences?
Unimplemented theories can be dangerous
Representational details omitted.
Computer memory/complexity issues omitted.
Nature of individual steps may be unclear.
Difficult to test.
Potentially unimplementable
Computational Linguistics: Twin Goals
Scientific Goal: Contribute to Linguistics by adding a computational dimension.
Technological Goal: Develop basis for machinery capable of handling human language that can support “language engineering”
Applications of Computational Linguistics
Machine Translation
Information Retrieval/Extraction
Document Classification
Question Answering
Style and Spell Checking
Dialogue Systems
Speech
The Information Food Chain
1. input format
2. tokenization
3. gross text structure
   paragraph
   sentences
   words
4. morphological analysis
5. part of speech tagging
6. syntactic analysis
   parsing
   chunking
7. Semantic Analysis
   Entities
     People
     Locations
     Organisations
   Anaphora Resolution
   Relations
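The early stages of the chain (tokenization and gross text structure) can be approximated with a few regular expressions. A minimal Python sketch using only the standard library: the function names and the splitting rules are assumptions made for illustration, and the sentence splitter is deliberately naive (it would break on abbreviations such as "Dr.").

```python
import re


def paragraphs(text):
    """Split text into paragraphs at blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def sentences(paragraph):
    """Split a paragraph after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]


def words(sentence):
    """Split a sentence into word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)


sample = "John saw the dog. The dog barked!\n\nBill left."
structure = [[words(s) for s in sentences(p)] for p in paragraphs(sample)]
```

The result is a nested list (paragraphs, then sentences, then tokens), which is the kind of input the later stages such as tagging and parsing would consume.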
LECTURES
1 Overview
2 POS [RF]
3 Tagging
4 Tagging
5 Chunking
6 Chunking
7 Syntax [RF]
8 Parsing
9 Parsing
10 Morphology [RF]
11 Finite State
12 Finite State
13 Lexicon
14 Revision