
CSA2050 Introduction to Computational Linguistics Lecture 1 Overview.

Page 1: CSA2050 Introduction to Computational Linguistics Lecture 1 Overview.

CSA2050 Introduction to Computational Linguistics

Lecture 1

Overview

Page 2: CSA2050 Introduction to Computational Linguistics Lecture 1 Overview.

Feb 2008 -- MR CSA2050 - Lecture I: What Is CL? 2

Lecture 1

Course Information
What is CL?
What is L?
Course Contents

Page 3: CSA2050 Introduction to Computational Linguistics Lecture 1 Overview.


Course Information

Web: http://www.cs.um.edu.mt/~mros/csa2050

[email protected]@[email protected]

Book (nominally): Jurafsky & Martin, Speech and Language Processing, Prentice Hall 2000, ISBN 0-13-095069-6

Natural Language Toolkit (NLTK)

Page 4: CSA2050 Introduction to Computational Linguistics Lecture 1 Overview.


Human Language Technologies

Natural Language Processing (NLP)
  Computational models of language analysis, interpretation, and generation.
  Syntax/semantics interface
Natural Language Engineering
  Emphasis on large-scale performance
  Example: Google
Speech Technology
Computational Linguistics
  Emphasis on mechanised linguistic theories.
  Grew out of early Machine Translation efforts

Page 5: CSA2050 Introduction to Computational Linguistics Lecture 1 Overview.


CL: Two Main Disciplines

COMP SCI
LINGUISTICS

Page 6: CSA2050 Introduction to Computational Linguistics Lecture 1 Overview.


Linguistics

Phonetics: The study of speech sounds
Phonology: The study of sound systems
Morphology: The study of word structure
Syntax: The study of sentence structure
Semantics: The study of meaning
Pragmatics: The study of language use

Page 7: CSA2050 Introduction to Computational Linguistics Lecture 1 Overview.


Noam Chomsky

Noam Chomsky’s work in the 1950s radically changed linguistics, making syntax central.

Chomsky has been the dominant figure in linguistics ever since.

Chomsky invented the generative approach to grammar.

Page 8: CSA2050 Introduction to Computational Linguistics Lecture 1 Overview.


Generative Grammar: Key Points

A language is a possibly infinite set of strings.
Grammar is a finite description of that set.
Grammar is precisely defined.
Theory of Grammar is a theory of human linguistic abilities.
Grammar should generate all and only the strings of the language.
[source: Sag & Wasow]

Page 9: CSA2050 Introduction to Computational Linguistics Lecture 1 Overview.


A Simple Grammar + Lexicon

grammar:

S → NP VP
NP → N
VP → V NP

lexicon:

V → kicks
N → John
N → Bill

parse tree for "John kicks Bill":

[S [NP [N John]] [VP [V kicks] [NP [N Bill]]]]
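As a minimal sketch (not part of the original slides), the grammar and lexicon above can be written as Python dictionaries and expanded exhaustively to list every sentence they generate. The names GRAMMAR, LEXICON and expand are illustrative assumptions, and the enumeration terminates only because this particular grammar has no recursive rules.

```python
from itertools import product

# Rules from the slide: S -> NP VP, NP -> N, VP -> V NP
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["N"]],
    "VP": [["V", "NP"]],
}

# Lexicon from the slide: V -> kicks, N -> John, N -> Bill
LEXICON = {
    "V": ["kicks"],
    "N": ["John", "Bill"],
}


def expand(symbol):
    """Return every word sequence that `symbol` can derive.

    This is finite here because no rule refers back to itself.
    """
    if symbol in LEXICON:
        return [[word] for word in LEXICON[symbol]]
    results = []
    for rhs in GRAMMAR[symbol]:
        # Expand each right-hand-side symbol, then combine the alternatives.
        parts = [expand(child) for child in rhs]
        for combo in product(*parts):
            results.append([word for part in combo for word in part])
    return results


sentences = sorted(" ".join(words) for words in expand("S"))
print(sentences)
```

The grammar generates exactly four strings, including "John kicks John", which illustrates that a grammar describes well-formed structure and not plausibility.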

Page 10: CSA2050 Introduction to Computational Linguistics Lecture 1 Overview.


Generative Power of a Grammar

Undergeneration (G generates a proper subset of L): only but not all
Overgeneration (G generates a proper superset of L): all but not only
G = L: all and only
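For a toy language small enough to enumerate, the three cases above reduce to set comparisons between the strings a grammar G generates and the strings of the language L. The sketch below is an illustration under that assumption; the function name classify and the sample sets are hypothetical, and real languages are usually infinite, so they cannot be compared this way.

```python
def classify(generated, language):
    """Compare the set of strings a grammar generates with the target language."""
    if generated == language:
        return "all and only"
    if generated < language:   # proper subset: only valid strings, but not all
        return "undergeneration"
    if generated > language:   # proper superset: all valid strings, but not only
        return "overgeneration"
    return "neither"           # misses some valid strings and adds invalid ones


# Toy target language L (an assumption made for illustration).
L = {"John kicks Bill", "Bill kicks John"}

under = {"John kicks Bill"}
over = L | {"kicks John Bill"}
exact = set(L)

for name, g in [("under", under), ("over", over), ("exact", exact)]:
    print(name, "->", classify(g, L))
```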

Page 11: CSA2050 Introduction to Computational Linguistics Lecture 1 Overview.


Formal v. Natural Languages

Formal Languages
  Numbers: 3290 1 1010101
  Logic: ∀x man(x) → mortal(x)
  C: if (i > 10) exit(0);

Natural Languages
  English: John saw the dog
  German: Johann hat den Hund gesehen
  Maltese: Gianni ra kelb

Page 12: CSA2050 Introduction to Computational Linguistics Lecture 1 Overview.


Points of Similarity

A language is considered to be a (possibly infinite) set of sentences.
Sentences are sequences of tokens.
Formation rules determine which sequences are valid sentences.
Sentences have a definite structure.
Sentence structure related to meaning.

Page 13: CSA2050 Introduction to Computational Linguistics Lecture 1 Overview.


Structure Affects Meaning

I shot an elephant in my trousers
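The sentence above has two structures: "in my trousers" can attach to the verb phrase (the shooting took place in my trousers) or to the noun phrase (the elephant was in my trousers). The following hedged sketch counts the parse trees with a CKY-style chart. The small grammar in Chomsky normal form is an assumption invented for this illustration, not something given in the slides, and count_parses is a hypothetical helper name.

```python
from collections import defaultdict

# Toy grammar in Chomsky normal form: (parent, left child, right child).
# This grammar is an illustrative assumption.
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # PP attaches to the verb phrase
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),   # PP attaches to the noun phrase
    ("PP", "P", "NP"),
]

# One category per word keeps the example free of lexical ambiguity.
LEXICON = {
    "I": "NP", "shot": "V", "an": "Det", "elephant": "N",
    "in": "P", "my": "Det", "trousers": "N",
}


def count_parses(words, start="S"):
    """Count distinct parse trees for `words` using a CKY chart."""
    n = len(words)
    # chart[i, j][X] = number of ways category X can span words[i:j]
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        chart[i, i + 1][LEXICON[word]] += 1
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):          # split point
                for parent, left, right in BINARY_RULES:
                    chart[i, j][parent] += chart[i, k][left] * chart[k, j][right]
    return chart[0, n][start]


print(count_parses("I shot an elephant in my trousers".split()))
```

Under this grammar the count is 2, one tree per attachment site, whereas "I shot an elephant" alone has a single parse.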

Page 14: CSA2050 Introduction to Computational Linguistics Lecture 1 Overview.


Points of Difference

Formal Languages
  The grammar defines the language
  Restricted application
  Non-ambiguous

Natural Languages
  The language defines the grammar
  Universal application
  Highly ambiguous

Page 15: CSA2050 Introduction to Computational Linguistics Lecture 1 Overview.


Ambiguity

Lexical Ambiguity
  Iraqi Head Seeks Arms
Syntactic Ambiguity
  small animals and children laugh
Semantic Ambiguity
  every girl loves a sailor
Pragmatic Ambiguity
  can you pass the salt?

The management of ambiguity is central to the success of CL

Page 16: CSA2050 Introduction to Computational Linguistics Lecture 1 Overview.


Algorithms and Linguistics

Pure linguistics deals with
  data
  grammar rules
  theories about grammar rules
Putting knowledge to some use involves processing.
Linguistic theory is silent about implementation issues
Implementation is central to Computational Linguistics

Page 17: CSA2050 Introduction to Computational Linguistics Lecture 1 Overview.


Computational Linguistics – Issues

Representation of grammar and a lexicon
How is the structure of a given sentence actually discovered?
Generation of a sentence to express a particular meaning?
Learning a language with limited exposure to grammatical sentences?

Page 18: CSA2050 Introduction to Computational Linguistics Lecture 1 Overview.


Unimplemented theories can be dangerous

Representational details omitted.
Computer memory/complexity issues omitted.
Nature of individual steps may be unclear.
Difficult to test.
Potentially unimplementable

Page 19: CSA2050 Introduction to Computational Linguistics Lecture 1 Overview.


Computational Linguistics: Twin Goals

Scientific Goal: Contribute to Linguistics by adding a computational dimension.

Technological Goal: Develop basis for machinery capable of handling human language that can support “language engineering”

Page 20: CSA2050 Introduction to Computational Linguistics Lecture 1 Overview.


Applications of Computational Linguistics

Machine Translation
Information Retrieval/Extraction
Document Classification
Question Answering
Style and Spell Checking
Dialogue Systems
Speech

Page 21: CSA2050 Introduction to Computational Linguistics Lecture 1 Overview.


The Information Food Chain

1. input format
2. tokenization
3. gross text structure
   paragraph
   sentences
   words
4. morphological analysis
5. part of speech tagging
6. syntactic analysis
   parsing
   chunking
7. Semantic Analysis
   Entities
     People
     Locations
     Organisations
   Anaphora Resolution
   Relations
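The early stages of this chain (tokenization and gross text structure) can be sketched with regular expressions. This is a deliberately naive illustration, not the course's method: the function names are hypothetical, and the sentence splitter would break on abbreviations such as "Dr." or on decimal numbers.

```python
import re


def split_paragraphs(text):
    """Split text into paragraphs at blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def split_sentences(paragraph):
    """Naively end a sentence after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph) if s]


def tokenize(sentence):
    """Return words (allowing an internal apostrophe) and single punctuation marks."""
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", sentence)


text = "John kicks Bill. Bill laughs!\n\nCan you pass the salt?"

# Nested structure: paragraphs -> sentences -> tokens
structure = [
    [tokenize(sentence) for sentence in split_sentences(paragraph)]
    for paragraph in split_paragraphs(text)
]
print(structure)
```

Later stages such as tagging, chunking, and parsing would consume the token lists produced here.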

Page 22: CSA2050 Introduction to Computational Linguistics Lecture 1 Overview.


LECTURES

1 Overview
2 POS [RF]
3 Tagging
4 Tagging
5 Chunking
6 Chunking
7 Syntax [RF]
8 Parsing
9 Parsing
10 Morphology [RF]
11 Finite State
12 Finite State
13 Lexicon
14 Revision

