
Introduction to Computational Linguistics (LIN3060) Lecture 1 Computers and Language.

Date post: 14-Dec-2015

Introduction to Computational Linguistics (LIN3060)

Lecture 1

Computers and Language

Feb 2005 -- MR CLINT - Lecture 1 2

Course Information

Web: http://www.cs.um.edu.mt/~mros/lin3060

[email protected]@um.edu.mt

Books:
    Speech and Language Processing, Jurafsky and Martin, Prentice Hall, 2000
    Algorithmics, David Harel, Addison Wesley, 2004


Computers and Language

Computational Linguistics
    Emphasis on mechanised linguistic theories.
    Grew out of early Machine Translation efforts.

Natural Language Processing
    Computational models of language analysis, interpretation, and generation.

Language Engineering
    Emphasis on large-scale performance.
    Example: Google


CL: Two Main Disciplines

COMP SCI
LINGUISTICS


Linguistics is Multi Layered

Phonetics: The study of speech sounds
Phonology: The study of sound systems
Morphology: The study of word structure
Syntax: The study of sentence structure
Semantics: The study of meaning
Pragmatics: The study of language use


Noam Chomsky

Noam Chomsky’s work in the 1950s radically changed linguistics, making syntax central.

Chomsky has been the dominant figure in linguistics ever since.

Chomsky invented the generative approach to grammar.


Generative Grammar is Descriptive, not Prescriptive

Prescriptive Grammar

Rules for and against certain uses

Proscribed forms that are in current use

“don’t end a sentence with a preposition”

Subjective

Descriptive Grammar

Rules characterizing what people actually say

Goal: to characterize all and only that which speakers find acceptable

Objective


Generative Grammar: Key Points

A language is a (possibly infinite) set of sentences.
Grammar is finite.
Grammar of a particular language expresses linguistic knowledge of that language.
Theory of Grammar includes mathematical definition of what a grammar is.
The “Theory of Grammar” is a theory of human linguistic abilities.
[source: Sag & Wasow]


Theories of Sentence and Word Structure: Rewrite Rules

Rules can be used to specify the sentences of a language.

Rules have the form LHS → RHS
    LHS may be a sequence of symbols.
    RHS may be a sequence of symbols or words.

Lexicon specifies words and their categories


A Simple Grammar/Lexicon

grammar:
    S → NP VP
    NP → N
    VP → V NP

lexicon:
    V → kicks
    N → John
    N → Bill

structure of "John kicks Bill":
    [S [NP [N John]] [VP [V kicks] [NP [N Bill]]]]
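As a rough illustration, the rewrite rules S → NP VP, NP → N, VP → V NP and the lexicon entries for kicks, John and Bill can be written down as Python dictionaries, and the sentences they specify can be enumerated. This is only a sketch: the names GRAMMAR, LEXICON and expand are invented for this example, and the enumeration terminates only because this particular grammar has no recursive rules.

```python
from itertools import product

# Rewrite rules: each left-hand symbol maps to a list of right-hand sides.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["N"]],
    "VP": [["V", "NP"]],
}

# Lexicon: each category maps to the words that belong to it.
LEXICON = {
    "V": ["kicks"],
    "N": ["John", "Bill"],
}


def expand(symbol):
    """Return every word sequence that `symbol` can be rewritten to."""
    if symbol in LEXICON:
        return [[word] for word in LEXICON[symbol]]
    results = []
    for rhs in GRAMMAR[symbol]:
        # Expand each symbol on the right-hand side, then combine the
        # alternatives in every possible way.
        for parts in product(*(expand(s) for s in rhs)):
            results.append([word for part in parts for word in part])
    return results


sentences = [" ".join(words) for words in expand("S")]
print(sentences)
```

With two nouns and one verb the grammar specifies four sentences, including "John kicks Bill". A grammar with a recursive rule would specify infinitely many, so it could not be enumerated exhaustively in this way.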


Formal v. Natural Languages

Formal Languages
    Arithmetic: 3290 1 1010101
    Logic: ∀x man(x) → mortal(x)
    URL: http://www.cs.um.edu.mt

Natural Languages
    English: John saw the dog
    German: Johann hat den Hund gesehen
    Maltese: Ġianni ra kelb


Points of Similarity

A language is considered to be a (possibly infinite) set of sentences.

Sentences are sequences of words.
Rules determine which sequences are valid sentences.
Sentences have a definite structure.
Sentence structure is related to meaning.


Points of Difference

Formal Languages
    The grammar defines the language
    Restricted application
    Non ambiguous

Natural Languages
    The language defines the grammar
    Universal application
    Highly ambiguous


Ambiguity

Morphological Ambiguity: en-large-ment
Lexical Ambiguity: the sheep is in the pen
Syntactic Ambiguity: small animals and children laugh
Semantic Ambiguity: every girl loves a sailor
Pragmatic Ambiguity: can you pass the salt?

The management of ambiguity is central to the success of CL in general and MT in particular.
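To make the syntactic case concrete, the sketch below lists the structures that a small set of rules assigns to the noun phrase "small animals and children". The rules (NP → N, NP → Adj NP, NP → NP Conj NP), the CATEGORY table and the function name np_structures are assumptions made for this illustration; they are not taken from the lecture.

```python
# Hypothetical word categories for the noun phrase in the example.
CATEGORY = {"small": "Adj", "animals": "N", "and": "Conj", "children": "N"}


def np_structures(words):
    """Return all bracketings of `words` as a noun phrase.

    Assumed rules: NP -> N | Adj NP | NP Conj NP
    """
    words = tuple(words)
    found = []
    # NP -> N
    if len(words) == 1 and CATEGORY[words[0]] == "N":
        found.append(words[0])
    # NP -> Adj NP
    if len(words) > 1 and CATEGORY[words[0]] == "Adj":
        for rest in np_structures(words[1:]):
            found.append(f"[{words[0]} {rest}]")
    # NP -> NP Conj NP: try every conjunction as the split point.
    for i in range(1, len(words) - 1):
        if CATEGORY[words[i]] == "Conj":
            for left in np_structures(words[:i]):
                for right in np_structures(words[i + 1:]):
                    found.append(f"[{left} {words[i]} {right}]")
    return found


readings = np_structures("small animals and children".split())
for reading in readings:
    print(reading)
```

Under these assumed rules there are two structures: one in which "small" applies to both animals and children, and one in which it applies only to the animals.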


Computer Science

The study of basic concepts
    Information
    Data
    Algorithm
    Program

The application of these concepts to practical tasks.

Implementation of computational models.


Information

Information is a theoretical concept invented by Shannon in 1948 to measure uncertainty.

The units of this measure are called bits.
    Length – metres
    Weight – kilos
    Information – bits

1 bit is the amount of uncertainty inherent in a situation when there are exactly two (equally likely) possible outcomes.
    Example: for breakfast I will have coffee or I will have tea (nothing else). When I tell you that I have tea, I have conveyed one bit of information.

The greater the number of possible outcomes, the more bits of information are involved in the statement that indicates the actual outcome.
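The relation between the number of outcomes and the number of bits can be sketched as follows, under the assumption that all outcomes are equally likely, in which case the measure is the base-2 logarithm of the number of outcomes. The function name bits is invented for this example.

```python
import math


def bits(outcomes):
    """Bits needed to single out one of `outcomes` equally likely outcomes."""
    if outcomes < 1:
        raise ValueError("there must be at least one possible outcome")
    return math.log2(outcomes)


# Coffee or tea is 2 outcomes; more outcomes mean more bits.
for n in (2, 4, 8, 26):
    print(n, "outcomes:", round(bits(n), 2), "bits")
```

Two outcomes give exactly one bit, as in the coffee-or-tea example, and each doubling of the number of outcomes adds one more bit. When outcomes are not equally likely the calculation is different and is not covered by this sketch.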


Data

A formalized representation of facts or concepts suitable for communication, interpretation, or processing by people or automated means.

Example: a telephone directory.

Unlike information, which is abstract, data is concrete.

Data has a certain level of structure. In the telephone directory, for example, we have the structure of a list of entries, each of which has a name, an address, and a number.
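One possible way to make the structure of the telephone directory explicit is a list of records with three fields. The sketch below is only an illustration: the class name Entry, the function look_up and the two entries are invented.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """One directory entry: a name, an address, and a number."""
    name: str
    address: str
    number: str


# Invented entries, for illustration only.
directory = [
    Entry("A. Borg", "1 Main Street", "555-0101"),
    Entry("C. Vella", "7 Harbour Road", "555-0102"),
]


def look_up(directory, name):
    """Return the numbers of all entries listed under `name`."""
    return [entry.number for entry in directory if entry.name == name]


print(look_up(directory, "C. Vella"))
```

Because every entry has the same fields, a program can process the directory without knowing anything about the people listed in it.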



Algorithm

Input: ingredients
Output: delicious chocolate cake
Method: Algorithm

Hardware: oven, pan, chef
Software: recipe


Algorithm to Add X and Y

Read X and Y (example: X = 2, Y = 3)

X = 0?
    yes: Output Y
    no: subtract 1 from X, add 1 to Y, then test X = 0? again
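The steps for adding X and Y can be sketched in Python as follows. The sketch assumes that the test X = 0? is made before each pass through the loop and that X is a non-negative integer; the function name add is chosen for this example.

```python
def add(x, y):
    """Add x to y using only 'subtract 1', 'add 1' and a test for zero.

    Assumes x is a non-negative integer; a negative x would never reach 0.
    """
    if x < 0:
        raise ValueError("x must be a non-negative integer")
    while x != 0:      # X = 0?  no: go round the loop again
        x = x - 1      # subtract 1 from X
        y = y + 1      # add 1 to Y
    return y           # X = 0?  yes: output Y


print(add(2, 3))
```

With X = 2 and Y = 3 the loop body runs twice, leaving X = 0 and Y = 5.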


Algorithm

A well-defined procedure for the solution of a given problem in a finite number of steps.

Abstract.
Designed to perform a well-defined task.
Finite description length.
Guaranteed to terminate.


Levels of Detail

Every algorithm assumes the existence of elementary instructions, e.g.
    spread the ingredients in the pan
    add 1 to Y

The idea is that these can be executed by the hardware directly.

There is nothing necessary about the particular instruction set. We could imagine greater or lesser amounts of detail.

We need to agree about the instruction set before describing an algorithm.


Abstraction

Every algorithm could be described at the lowest level of detail.

However, the process of abstracting away from the elementary details is central to efficient description – for computers as well as humans.
    Example: Prepare a sauce bordelaise and pour over the meat.

Computer programming languages embody higher levels of abstraction and allow more efficient descriptions.
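A small sketch of the same idea in a programming language: once addition has been described in terms of elementary steps, multiplication can be described in terms of addition without mentioning those steps again. The function names are invented for this example, and both functions assume non-negative integers.

```python
def add(x, y):
    """Lowest level: only 'subtract 1', 'add 1' and a test for zero."""
    while x != 0:
        x = x - 1
        y = y + 1
    return y


def multiply(x, y):
    """One level up: repeated use of add(), with no mention of 'add 1'."""
    total = 0
    while x != 0:
        x = x - 1
        total = add(y, total)
    return total


print(multiply(3, 4))
```

The description of multiply stays short because the detail is hidden inside add, much as the recipe says "prepare a sauce bordelaise" rather than listing every step of the sauce.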


Computer Program

A set of instructions, written in a specific programming language, which a computer follows in processing data, performing an operation, or solving a logical problem.


Instructions vs. Execution Steps

1. Read X

2. Read Y

3. X = X-1

4. Y = Y+1

5. If X = 0 then Print(Y) else goto 3

How many instructions?

How many execution steps?
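One way to explore the two questions is to simulate the five-instruction program and count each instruction as it is executed. The sketch below assumes X is a positive integer and counts instruction 5 (the test together with its print or goto) as a single step; the function name count_steps is invented. What is printed at instruction 5 does not affect the count, so the function simply returns the final values of X and Y alongside it.

```python
def count_steps(x, y):
    """Run the five-instruction program and count the steps executed.

    Assumes x is a positive integer. Because the test comes after the
    first subtraction, x = 0 would make the program loop forever.
    """
    if x < 1:
        raise ValueError("x must be a positive integer")
    steps = 2                  # instructions 1 and 2: Read X, Read Y
    while True:
        x = x - 1              # instruction 3
        y = y + 1              # instruction 4
        steps += 3             # instructions 3, 4 and the test in 5
        if x == 0:             # instruction 5
            return steps, x, y


print(count_steps(2, 3))
```

The program always has five instructions, but the number of execution steps depends on the input: under these assumptions it is 2 + 3X, so 8 steps for X = 2.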


Computer Program

Finite length.
Concrete (can be written down).
Implements an algorithm.
More than one program may implement the same algorithm.
Not all programs express good algorithms!


Algorithms and Linguistics

Linguistic theory provides linguistic knowledge in the form of
    grammar rules
    theories about grammar rules

Putting knowledge to some use involves processing, e.g.:
    parsing
    generation


Computational Linguistics – Issues

How are a grammar and a lexicon represented?

By what algorithm can we
    actually discover the structure of a sentence?
    actually generate a sentence to express a particular meaning?

How can we actually test a linguistic theory?

Could an artificial system acquire a grammar with limited exposure to grammatical sentences?
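As one tentative answer to the first two questions, a grammar and a lexicon can be represented as dictionaries, and the structure of a sentence can be discovered by a simple top-down parser. The sketch below uses the toy rules S → NP VP, NP → N, VP → V NP with the words John, Bill and kicks. The names GRAMMAR, LEXICON, parse and structures are invented, and this naive strategy would loop forever on a grammar with left-recursive rules.

```python
GRAMMAR = {"S": [["NP", "VP"]], "NP": [["N"]], "VP": [["V", "NP"]]}
LEXICON = {"kicks": "V", "John": "N", "Bill": "N"}


def parse(symbol, words, start):
    """Yield (tree, end) for each way `symbol` matches words[start:end]."""
    if symbol not in GRAMMAR:
        # Lexical category: it must match the next word.
        if start < len(words) and LEXICON.get(words[start]) == symbol:
            yield (symbol, words[start]), start + 1
        return
    for rhs in GRAMMAR[symbol]:
        # Match the right-hand side symbols one after another.
        partial = [([], start)]
        for child in rhs:
            partial = [
                (trees + [tree], end)
                for trees, pos in partial
                for tree, end in parse(child, words, pos)
            ]
        for trees, end in partial:
            yield (symbol, *trees), end


def structures(sentence):
    """Return every tree for S that covers the whole sentence."""
    words = sentence.split()
    return [tree for tree, end in parse("S", words, 0) if end == len(words)]


print(structures("John kicks Bill"))
```

A sentence with no tree, such as "kicks John Bill", yields an empty list, and an ambiguous sentence would yield more than one tree.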


Computers and Language: Twin Goals

Scientific Goal: Contribute to Linguistics by adding a computational dimension.

Technological Goal: Develop machinery capable of handling human language that can support “language engineering”.


Computers and Language: Applications

Information Retrieval/Extraction
Document Classification
Question Answering
Style and Spell Checking
Integrated Multimodal Tasks
Machine Translation

