Methods in Computational Linguistics II Queens College Lecture 1: Introduction.

Date post: 18-Dec-2015
Upload: barbra-pierce
Transcript
Page 1: Methods in Computational Linguistics II Queens College Lecture 1: Introduction.

Methods in Computational Linguistics II

Queens College

Lecture 1: Introduction

Methods in Computational Linguistics II

• 2nd semester of a two-semester course providing instruction in
  – The basics of computer science and programming (via Python)
  – An introduction to techniques in computational linguistics

My background

• Research
  – Speech synthesis and recognition
  – Prosody (intonation)
  – Speech segmentation
  – Non-native speech
  – Political speech and other paralinguistics

• Computer Science professor at Queens and CUNY GC.

• Worked at IBM and Google

Your Background

• Name.
• What are your research interests in linguistics?
• How do you expect computational linguistics to fit into your work?
  – Are there techniques or applications that you are particularly looking to learn?
• Programming background?
  – 1 semester? More?
• Are you simultaneously taking Language Technologies?

Outline

• NLTK
  – Overview
  – Major Capabilities
• Searching and Sorting
  – Linear (Sequential) search
  – Binary search
  – Insertion sort
  – Merge sort
• Course Policies
• Syllabus Review

NLTK

• Natural Language Toolkit.

• A set of utilities in Python that facilitate the processing of text.

NLTK Functionality

• Accessing corpora
• String processing
• Collocation discovery
• Part-of-speech tagging
• Classification and clustering
• Evaluation metrics
• Chunking
• Parsing

NLTK Functionality

• Semantic interpretation
  – First-order logic, lambda calculus, model checking
• Probability and estimation
• WordNet browsing
• Chatbots

NLTK as a resource

• This range of functionality is quite broad, and not necessarily cohesive.

• However, there are resources and tools (functions and objects) that underpin most major computational linguistics tasks.

Major Computational Linguistics Tasks

• Syntax
  – Tagging
  – Parsing
• Semantics
  – Information Extraction
  – Semantic Role Labeling
• Phonology
• Sentence Processing
• Segmentation
• Summarization
• Speech Recognition
• Speech Synthesis
• Information Retrieval
• Sentiment Analysis
• Authorship studies
• Co-reference resolution

NLTK Resources

• NLTK also contains lexical material
  – Project Gutenberg
  – WordNet
  – Penn Treebank (subset)
  – Named Entity Recognition data
  – Inaugural addresses
  – Sentiment data
  – Names corpus
  – Switchboard (subset)
  – TIMIT
  – Webtext

Quick Assignment

• Methods I used NLTK.
• Homework 0
  – Make sure that NLTK is installed and working correctly.
  – Install matplotlib to use NLTK’s graphing functions.
• “Due” asap.
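
One possible way to check Homework 0 is sketched below. It uses only the standard library to report whether the `nltk` and `matplotlib` packages can be found, so it runs even when one of them is missing. The helper name `check_installed` is made up for this illustration; it is not part of NLTK.

```python
import importlib.util


def check_installed(packages):
    """Return a dict mapping each package name to True if Python can find it."""
    status = {}
    for name in packages:
        # find_spec returns None when a top-level package is not installed.
        status[name] = importlib.util.find_spec(name) is not None
    return status


if __name__ == "__main__":
    for name, found in check_installed(["nltk", "matplotlib"]).items():
        print(name, "found" if found else "MISSING")
```

A package reported as found can still be broken, so it is worth following up with a plain `import nltk` in the interpreter.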

One Question Pop Quiz

• Solve for p

Math

• Computational Linguistics requires a not-quite-trivial amount of math.

• Statistics and probabilistic modeling form the pillars underlying these computational techniques.

• This involves counting and algebra.
• Machine learning governs the classification and clustering techniques that CL makes heavy use of.
  – Requires calculus, statistics, linear algebra.

Math in this course

• Overview of probability.
  – Next class
• Algebra for evaluation, some common features
• Statistics for Naïve Bayes classification
• Entropy in Decision Trees

Data Structures, Algorithms, etc.

• In computer science, there is a tight relationship between data structures and algorithms.
• In general, the more complex the data structure
  – the more general or flexible the data and relationships that can be represented
  – the faster algorithms can run
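
A small sketch of this trade-off, assuming a toy list of word tokens: a plain list must be scanned for every lookup, while a dictionary index takes more work to build but answers each later lookup directly. The names `positions_by_scan` and `build_index` are hypothetical helpers introduced only for this example.

```python
from collections import defaultdict


def positions_by_scan(tokens, word):
    """Simple structure (a list): every lookup scans all tokens."""
    return [i for i, t in enumerate(tokens) if t == word]


def build_index(tokens):
    """More complex structure (a dict of lists): built once, then reused."""
    index = defaultdict(list)
    for i, t in enumerate(tokens):
        index[t].append(i)
    return index


tokens = "the cat sat on the mat near the door".split()
index = build_index(tokens)

# Both approaches give the same answer; the index avoids rescanning the list.
print(positions_by_scan(tokens, "the"))
print(index["the"])
```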

Searching and Sorting

• Searching and sorting are a frequent example of the relationship between algorithm runtime and how data is structured.

• Search: identify the location of a value, x, in a list, A.

• Sort: rearrange a list A so that its values are in increasing order, i.e. A[i] <= A[i+1] for every i.
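
The two definitions can be illustrated with Python built-ins. This is only a sketch: `is_sorted` is a helper name invented here, while `list.index` and `sorted` are standard Python.

```python
def is_sorted(A):
    """True when A[i] <= A[i+1] holds for every adjacent pair."""
    return all(A[i] <= A[i + 1] for i in range(len(A) - 1))


A = [5, 2, 4, 6, 1, 3]
print(is_sorted(A))          # the raw list is not in increasing order
print(is_sorted(sorted(A)))  # sorted() returns a new, increasing list
print(A.index(4))            # built-in search: location of the value 4
```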

Sequential Search

def search(A, x):
    # Check each position in turn until x is found.
    for i in range(len(A)):
        if A[i] == x:
            return i
    # x does not occur in A.
    return -1

How long does sequential search take to run?

• Best case?

• Worst case?

• Average case?
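
One way to explore these questions is to count comparisons directly. The sketch below assumes a list of 100 distinct values and, for the average, that the target is present and equally likely to be at any position. `search_count` is an illustrative variant of sequential search, not code from the slides.

```python
def search_count(A, x):
    """Sequential search that also reports how many elements were compared."""
    comparisons = 0
    for i in range(len(A)):
        comparisons += 1
        if A[i] == x:
            return i, comparisons
    return -1, comparisons


A = list(range(1, 101))  # 100 distinct values

best = search_count(A, 1)[1]   # target is the first element
worst = search_count(A, 0)[1]  # target is absent, so every element is checked
# Average over all successful searches, each target equally likely.
average = sum(search_count(A, x)[1] for x in A) / len(A)

print(best, worst, average)
```

Under these assumptions the best case is 1 comparison, the worst case is n, and the average is about (n + 1) / 2, so the runtime grows linearly with the length of the list.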

Binary Search

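• If the list A is in increasing order, large chunks of the list can be ignored.

def search(A, x):
    # Search the half-open range A[bottom:top].
    top = len(A)
    bottom = 0
    while bottom < top:
        mid = (top + bottom) // 2  # integer division
        if A[mid] < x:
            bottom = mid + 1  # x can only be in the upper half
        elif A[mid] > x:
            top = mid  # x can only be in the lower half
        else:
            return mid
    return -1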

How long does binary search take to run?

• Best Case?

• Worst Case?

• Average Case?
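
A sketch for checking the answer empirically: the version below counts loop iterations and compares the largest count against floor(log2(n)) + 1. The function name `binary_search_count` and the choice of n = 1000 are assumptions made for the illustration.

```python
import math


def binary_search_count(A, x):
    """Binary search over a sorted list, also returning the number of loop iterations."""
    top = len(A)
    bottom = 0
    steps = 0
    while bottom < top:
        steps += 1
        mid = (top + bottom) // 2
        if A[mid] < x:
            bottom = mid + 1
        elif A[mid] > x:
            top = mid
        else:
            return mid, steps
    return -1, steps


n = 1000
A = list(range(n))
# Try every present value plus one absent value on each side.
worst = max(binary_search_count(A, x)[1] for x in range(-1, n + 1))
bound = math.floor(math.log2(n)) + 1

print(worst, bound)
```

Each iteration at least halves the remaining range, which is why the worst case grows with log n rather than n. The best case is a single iteration, when the target sits exactly at the first midpoint.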

Improvement of Binary Search

• Binary search is a significant improvement.
  – log n < n
• However, binary search requires that A is sorted.
• How long does it take to sort an array, and how does this impact the total runtime?

Insertion Sort

• Sort the list [5, 2, 4, 6, 1, 3]

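def insertionSort(A):
    # Sorts A in place; A[0:j] is already sorted at the start of each pass.
    for j in range(1, len(A)):
        key = A[j]
        i = j - 1
        # Shift larger values one slot right to open a gap for key.
        while i > -1 and A[i] > key:
            A[i + 1] = A[i]
            i = i - 1
        A[i + 1] = key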
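
To follow the algorithm on the example list, the sketch below runs the same loop on a copy and records the list after each insertion. `insertion_sort_trace` is a name introduced here for illustration.

```python
def insertion_sort_trace(A):
    """Insertion sort on a copy of A, recording the list after each insertion."""
    A = list(A)
    states = []
    for j in range(1, len(A)):
        key = A[j]
        i = j - 1
        while i > -1 and A[i] > key:
            A[i + 1] = A[i]
            i = i - 1
        A[i + 1] = key
        states.append(list(A))
    return states


for state in insertion_sort_trace([5, 2, 4, 6, 1, 3]):
    print(state)
```

Each printed line shows a sorted prefix that is one element longer than on the previous line.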

How long does Insertion sort take to run?

• Best Case?

• Worst Case?

• Average Case?
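
A rough way to see the answer is to count how many times the inner loop shifts an element. The sketch below assumes n = 100 and compares an already sorted input with a reversed one; `count_shifts` is a hypothetical helper, not part of the slides.

```python
def count_shifts(A):
    """Run insertion sort on a copy of A and count element shifts."""
    A = list(A)
    shifts = 0
    for j in range(1, len(A)):
        key = A[j]
        i = j - 1
        while i > -1 and A[i] > key:
            A[i + 1] = A[i]
            i = i - 1
            shifts += 1
        A[i + 1] = key
    return shifts


n = 100
print(count_shifts(range(n)))         # already sorted input
print(count_shifts(range(n, 0, -1)))  # reversed input
print(n * (n - 1) // 2)               # number of pairs of elements
```

Sorted input needs no shifts, so the best case is linear in n. Reversed input shifts every pair once, n(n-1)/2 shifts in total, so the worst case is quadratic. On randomly ordered input about half of the pairs are out of order on average, which is still quadratic.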

Can we sort faster?

• Yes.

• This requires recursion.
• We’ll come back to this, but here is a first example.

Merge Sort

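def mergeSort(A):
    # A list of zero or one elements is already sorted.
    if len(A) <= 1:
        return A
    mid = len(A) // 2
    # Sort each half separately; the slices together cover every element.
    Abottom = mergeSort(A[:mid])
    Atop = mergeSort(A[mid:])
    return merge(Abottom, Atop)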

Merge

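def merge(A, B):
    C = []
    i = 0
    j = 0
    # Count the real values before adding the sentinels.
    n = len(A) + len(B)
    # An infinite sentinel ends each list so neither index runs off the end.
    # Copies keep the caller's lists unchanged. Assumes numeric values.
    A = A + [float('inf')]
    B = B + [float('inf')]
    for k in range(n):
        if A[i] < B[j]:
            C.append(A[i])
            i = i + 1
        else:
            C.append(B[j])
            j = j + 1
    return C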

How long does Merge Sort take to run?

• Hint: This is a (much) harder question.
• Best Case?

• Worst Case?

• Average Case?
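
One way to approach the question is a recurrence, sketched below under two assumptions: n is a power of 2, and merging n values costs c·n steps for some constant c. The symbols T and c are introduced here for the sketch and do not appear in the slides.

```latex
\begin{align*}
T(1) &= c \\
T(n) &= 2\,T(n/2) + c\,n \\
     &= 4\,T(n/4) + 2\,c\,n \\
     &= 2^k\,T(n/2^k) + k\,c\,n \\
     &= n\,T(1) + c\,n\log_2 n && (k = \log_2 n) \\
     &= c\,n + c\,n\log_2 n = O(n \log n)
\end{align*}
```

Splitting and merging do the same amount of work whatever values the list holds, so under these assumptions the best, worst, and average cases all grow as n log n.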

Comparison of run times

Approach                         Sorting     Searching
No sorting, sequential search    0           n
Merge sort, binary search        n*log(n)    log(n)

How much searching do you need to do to make it worth sorting?
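
A back-of-the-envelope sketch of this question, using the idealized costs from the comparison and ignoring constant factors: k searches cost k*n without sorting, and n*log(n) + k*log(n) with sorting first. The function names and the sample sizes are assumptions chosen for illustration.

```python
import math


def cost_without_sorting(n, k):
    """k sequential searches over an unsorted list of n items."""
    return k * n


def cost_with_sorting(n, k):
    """One n*log(n) sort followed by k binary searches."""
    return n * math.log2(n) + k * math.log2(n)


def break_even(n):
    """Smallest number of searches k for which sorting first is cheaper."""
    k = 1
    while cost_with_sorting(n, k) >= cost_without_sorting(n, k):
        k += 1
    return k


for n in (16, 1024, 1048576):
    print(n, break_even(n))
```

Under this model the break-even point is only slightly more than log2(n) searches, so sorting pays off quickly for a list that is searched repeatedly.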

Class Structure and Policies

• Course website:
  – http://eniac.cs.qc.cuny.edu/andrew/methods2/syllabus.html
• Email list
  – Banner does not have an email function.
  – Put your email address on the sign-up sheet.

