CS 4032 Natural Language Processing · 5/1/2020

Date post: 18-Feb-2021
  • CS 4032

    Natural Language Processing

    Budditha Hettige

    Department of Computer Engineering

    Faculty of Computing

    General Sir John Kotelawala Defence University

  • Course details

    Course Code: CS 4032

    Course Title: NATURAL LANGUAGE PROCESSING

    Course Type: Elective

    Credits: 02

    Hours Allotted: Theory 30, Practical 30, Total 60

    Assignments/Tutorials

    Budditha Hettige (http://budditha.wordpress.com) 2

  • Course details

    • Assignment (30%)

    • Final Examination (70%)

    • References

    – Russell and Norvig, Artificial Intelligence: A Modern Approach, Prentice Hall, 2003

    – D. Jurafsky, J. H. Martin, Speech and Language Processing

    – S. Bird, E. Klein, Natural Language Processing with Python

    – Natural Language Toolkit, http://www.nltk.org/

    • Materials

    – https://budditha.wordpress.com/natural-language-processing/


  • Contents

    • Introduction

    • Speech Processing

    – Text-to-Speech

    – Speech Recognition

    • Words

    – Morphology

    – Part-of-Speech Tagging

    – Morphological Processing

    • Syntax

    – Word Classes

    – Context-Free Grammars

    – Parsing

    – Language and Complexity

    • Semantics

    – Representing Meaning

    – Semantic Analysis

    – Lexical Semantics, Word Sense Disambiguation and Information Retrieval

    • Pragmatics

    – Discourse, Dialogue and Conversational Agents

    • NLP Applications

    – Machine Translation System



  • Where does it fit in the CS taxonomy?

    Computers

    – Artificial Intelligence, Algorithms, Databases, Networking

    Artificial Intelligence

    – Robotics, Search, Natural Language Processing

    Natural Language Processing

    – Information Retrieval, Machine Translation, Language Analysis

    Language Analysis

    – Semantics, Parsing


  • NLP Thoughts


  • What is Natural Language Processing?

    • Natural Language Processing (NLP) is a computational treatment of natural (human) languages

    – Natural Language Understanding

    – Natural Language Generation

    • Pipeline

    Natural Language → (Understanding) → Computer → (Generation) → Natural Language


  • What is Natural Language Processing?

    • Natural Language Processing

    – Process information contained in natural language text.

    • Also known as

    – Computational Linguistics (CL),

    – Human Language Technology (HLT)

    – Natural Language Engineering (NLE)

    • Can machines understand human language?


  • Why Study NLP?

    • A hallmark of human intelligence.

    • Text is the largest repository of human knowledge and is growing quickly.

    – emails, news articles, web pages, scientific articles, insurance claims, customer complaint letters, transcripts of phone calls, technical documents, government documents, patent portfolios, court decisions, contracts, ……

    • Are we reading any faster than before?


  • Why are language technologies needed?

    • Many companies would make a lot of money if they could use computer programmes that understood text or speech. Just imagine if a computer could be used for:

    – Answering the phone, and replying to a question

    – Understanding the text on a Web page to decide who it might be of interest to

    – Translating a daily newspaper from Japanese to English (an attempt is made to do this already)

    – Understanding text in journals / books and building an expert system based on that understanding


  • Dreams?

    • NLP Applications

    – Show me Star Trek? (Talk to your TV set)

    – Will my computer talk to me like another human?

    – Will the search engine get me exactly what I am looking for?

    – Can my PC read the whole newspaper and tell me only the important news?

    – Can my palmtop translate what that Japanese lady is telling me?

    – Can my PC do my English homework?


  • NLP Applications

    • Question answering

    – Who is the first Taiwanese president?

    • Text Categorization/Routing

    – e.g., customer e-mails.

    • Text Mining

    – Find everything that interacts with BRCA1.

    • Machine Translation

    • Language Teaching/Learning

    – Usage checking

    • Spelling correction

    – Is that just dictionary lookup?


  • Application areas

    • Text-to-Speech & Speech recognition

    • Natural Language Dialogue Interfaces to Databases

    • Information Retrieval

    • Information Extraction

    • Document Classification

    • Document Image Analysis

    • Automatic Summarization

    • Text Proofreading – Spelling & Grammar

    • Machine Translation

    • Story understanding systems

    • Plagiarism detection

    • Can you think of anything else?


  • Relevant Scientific Conferences

    • Association for Computational Linguistics (ACL)

    • North American Chapter of the Association for Computational Linguistics (NAACL)

    • International Conference on Computational Linguistics (COLING)

    • Empirical Methods in Natural Language Processing (EMNLP)

    • Conference on Computational Natural Language Learning (CoNLL)

    • International Association for Machine Translation (IAMT)


  • Early days

    • How to measure the intelligence of a machine?

    • Turing test – Alan Turing (1950)

    – A machine can be accepted as intelligent if it can fool a judge into believing it is human during a teletype conversation.

    • ELIZA by Weizenbaum (1966)

    – Pretends to be a psychotherapist and converses with users about their problems.

    – Uses keyword pattern matching

    – Many users thought the machine really understood their problem.

    – Many such systems exist now, e.g. Alan, Alice, David

    • Can such tests be taken as a measure of intelligence?
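    As a rough illustration of keyword pattern matching, the Python sketch below answers by applying the first regular-expression rule that fits the input and echoing part of it back with pronouns swapped. The rules, the pronoun reflections and the default reply are hypothetical examples written for this note, not Weizenbaum's original DOCTOR script.

```python
import re

# Hypothetical keyword rules in the spirit of ELIZA (not Weizenbaum's script).
# Each rule pairs a regular expression with a response template; the first
# rule whose pattern matches the user's input is used.
RULES = [
    (re.compile(r"\bi need (.+)", re.IGNORECASE), "Why do you need {0}?"),
    (re.compile(r"\bi am (.+)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\bmy (mother|father)\b", re.IGNORECASE), "Tell me more about your {0}."),
]
DEFAULT_REPLY = "Please go on."

# Swap first- and second-person words so echoed fragments read naturally.
REFLECTIONS = {"i": "you", "me": "you", "my": "your", "am": "are",
               "you": "I", "your": "my"}


def reflect(fragment: str) -> str:
    return " ".join(REFLECTIONS.get(word, word) for word in fragment.lower().split())


def respond(utterance: str) -> str:
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(*(reflect(group) for group in match.groups()))
    return DEFAULT_REPLY


print(respond("I need a holiday."))  # Why do you need a holiday?
```

    Nothing in the program represents what a "holiday" is; any appearance of understanding comes from echoing the user's own words back.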


  • Early days

    • SHRDLU

    – Can understand natural-language commands.

    – Developed by Terry Winograd at the MIT AI Lab (1968-70) using Lisp.

    – Works on a “Blocks World”, a simulated environment in which blocks such as coloured cubes, cylinders and pyramids can be moved around, placed on top of each other, etc.

    – Understands a bit of anaphora.

    – Has a memory that stores the dialogue history.

    – A successful demonstration of AI.


  • The problem

    • When people see text, they understand its meaning

    • When computers see text, they get only character strings (and perhaps HTML tags)

    • We'd like computer agents to see meanings and be able to intelligently process text

    • These desires have led to many proposals for structured, semantically marked up formats

    • But often human beings still resolutely make use of text in human languages


  • Knowledge of language needed

    • Phonetics and Phonology – The study of linguistic sounds.

    • Morphology – The study of the meaningful components of words

    • Syntax – The study of the structural relationships between words.

    • Semantics – The study of meaning.

    • Pragmatics – The study of how language is used to accomplish goals.

    • Discourse – The study of linguistic units larger than a single utterance


  • Why is NLP difficult?

    • Computers are not brains

    – There is evidence that much of language understanding is built into the human brain

    • Computers do not socialize

    – Much of language is about communicating with people

    • Key problems:

    – Representation of meaning

    – Language presupposes knowledge about the world

    – Language only reflects the surface of meaning

    – Language presupposes communication between people


  • Why is NLP difficult?

    • The hidden structure of language is highly ambiguous

    • Structures for: Fed raises interest rates 0.5% in effort to control inflation (NYT headline 5/17/00)


  • Hidden Structure

    • English plural pronunciation

    – Toy + s → toyz ; add z

    – Book + s → books ; add s

    – Church + s → churchiz ; add iz

    – Box + s → boxiz ; add iz

    – Sheep + s → sheep ; add nothing

    • What about new words?

    – Bach + ‘s → boxs ; why not boxiz?
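    The pattern in these examples matches the usual rule for the English -s suffix: "iz" after sibilant sounds, "s" after other voiceless sounds, and "z" after everything else. A minimal Python sketch, assuming each word's final sound is already given as an ARPAbet-style symbol (a real system would look it up in a pronunciation dictionary). The phoneme sets are a simplified subset; the "X" symbol for the final sound of Bach and the zero-plural list are assumptions made for this example.

```python
# Simplified phoneme classes using ARPAbet-style symbols; "X" is a stand-in for
# the final sound of "Bach". This is an illustrative subset, not a full inventory.
SIBILANTS = {"S", "Z", "SH", "ZH", "CH", "JH"}
VOICELESS_NON_SIBILANTS = {"P", "T", "K", "F", "TH", "X"}
# Hypothetical list of nouns whose plural takes no suffix at all.
ZERO_PLURALS = {"sheep", "deer"}


def plural_suffix(word: str, final_phoneme: str) -> str:
    """Return the spoken plural suffix: 'iz', 's', 'z', or '' for no suffix."""
    if word.lower() in ZERO_PLURALS:
        return ""
    if final_phoneme in SIBILANTS:
        return "iz"  # church -> churchiz, box (ends in /ks/) -> boxiz
    if final_phoneme in VOICELESS_NON_SIBILANTS:
        return "s"   # book -> books
    return "z"       # toy -> toyz: vowels and other voiced sounds


for word, final in [("toy", "OY"), ("book", "K"), ("church", "CH"),
                    ("box", "S"), ("sheep", "P"), ("Bach", "X")]:
    print(word, "->", word + plural_suffix(word, final))
```

    Under these assumptions the rule also suggests an answer to the question above: the final sound of Bach is voiceless but not a sibilant, so the rule picks the "s" ending rather than "iz", even for a word the speaker has never pluralized before.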


  • Language subtleties

    • Adjective order and placement

    – A big black dog

    – A big black scary dog

    – A big scary dog

    – A scary big dog

    – A black big dog

    • Antonyms

    – Which sizes go together?

    • Big and little

    • Big and small

    • Large and small

    • Large and little


  • World Knowledge is subtle

    • He arrived at the lecture.

    • He chuckled at the lecture.

    • He arrived drunk.

    • He chuckled drunk.

    • He chuckled his way through the lecture.

    • He arrived his way through the lecture.


  • Words are ambiguous (have multiple meanings)

    • I know that.

    • I know that block.

    • I know that blocks the sun.

    • I know that block blocks the sun.


  • Challenges in NLP: Ambiguity

    • Words or phrases can often be understood in multiple ways.

    – Teacher Strikes Idle Kids

    – Killer Sentenced to Die for Second Time in 10 Years

    – They denied the petition for his release that was signed by over 10,000 people.

    – child abuse expert/child computer expert

    – Who does Mary love? (three-way ambiguous)


  • Where are the ambiguities?


  • Challenges in NLP: Variations

    • Syntactic Variations

    – I was surprised that Kim lost

    – It surprised me that Kim lost

    – That Kim lost surprised me.

    • The same meaning can be expressed in different ways

    – Who wrote “The Language Instinct”?

    – Steven Pinker, an MIT professor and author of “The Language Instinct”, ……


  • Parsing

    • Analyze the structure of a sentence

    The/D student/N put/V the/D book/N on/P the/D table/N

    [S [NP The student] [VP put [NP the book] [PP on [NP the table]]]]


  • Syntactic Variations contd.

    Teacher/N strikes/N idle/V kids/N

    [S [NP Teacher strikes] [VP idle [NP kids]]]

    Teacher/N strikes/V idle/A kids/N

    [S [NP Teacher] [VP strikes [NP idle kids]]]


  • How can a machine understand these differences?

    – Get the cat with the gloves.
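    The example is a prepositional-phrase attachment ambiguity: "with the gloves" can modify the verb or the noun. One way to see this mechanically is a CKY parser that counts derivations instead of building trees. The Python sketch below assumes a hypothetical five-rule grammar in Chomsky Normal Form with an imperative VP as the start symbol; real grammars are far larger.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky Normal Form: (parent, left child, right child).
BINARY_RULES = [
    ("VP", "V", "NP"),   # get [the cat]
    ("VP", "VP", "PP"),  # [get the cat] [with the gloves]: PP modifies the verb
    ("NP", "Det", "N"),  # the cat
    ("NP", "NP", "PP"),  # [the cat] [with the gloves]: PP modifies the noun
    ("PP", "P", "NP"),   # with [the gloves]
]
LEXICON = {"get": {"V"}, "the": {"Det"}, "cat": {"N"}, "gloves": {"N"}, "with": {"P"}}


def count_parses(words, start="VP"):
    """CKY: chart[i][j][X] = number of ways category X can span words[i:j]."""
    n = len(words)
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for tag in LEXICON[word]:
            chart[i][i + 1][tag] += 1
    for span in range(2, n + 1):            # shorter spans are always filled first
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):       # split point between the two children
                for parent, left, right in BINARY_RULES:
                    ways = chart[i][k].get(left, 0) * chart[k][j].get(right, 0)
                    if ways:
                        chart[i][j][parent] += ways
    return chart[0][n].get(start, 0)


print(count_parses("get the cat with the gloves".split()))  # 2
```

    The count of 2 corresponds to the two readings (use the gloves to get the cat, or get the cat that has the gloves). A parser can find both; choosing between them requires world knowledge or statistics.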


  • Natural Languages vs. Computer Languages

    • Ambiguity is the primary difference between natural and computer languages.

    • Formal programming languages are designed to be unambiguous, i.e. they can be defined by a grammar that produces a unique parse for each sentence in the language.

    • Programming languages are also designed for efficient (deterministic) parsing, i.e. they are deterministic context-free languages (DCFLs).

    – A sentence in a DCFL can be parsed in O(n) time, where n is the length of the string.


  • Natural Language Tasks

    • Processing natural language text involves many syntactic, semantic and pragmatic tasks, in addition to other problems.

    • Tasks can be divided into

    – Syntactic Tasks

    – Semantic Tasks

    – Pragmatics/Discourse Tasks

    – Other Tasks


  • Syntactic tasks:

    Word Segmentation

    • Breaking a string of characters (graphemes) into a sequence of words.

    • In some written languages (e.g. Chinese) words are not separated by spaces.

    • Even in English, characters other than white-space can be used to separate words [e.g. , ; . - : ( ) ]

    • Examples from English URLs:

    – jumptheshark.com → jump the shark .com

    – myspace.com/pluckerswingbar → myspace .com pluckers wing bar, or myspace .com plucker swing bar
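    Dictionary-based segmentation can be sketched as a recursive search that tries every prefix found in a word list. This Python version assumes a hypothetical eight-word vocabulary and returns all segmentations rather than picking one:

```python
from functools import lru_cache


def segmentations(text, vocabulary):
    """Return every way to split `text` into a sequence of vocabulary words."""

    @lru_cache(maxsize=None)
    def split_from(start):
        # All segmentations of text[start:], as tuples of words.
        if start == len(text):
            return [()]
        results = []
        for end in range(start + 1, len(text) + 1):
            word = text[start:end]
            if word in vocabulary:
                results.extend((word,) + rest for rest in split_from(end))
        return results

    return [" ".join(words) for words in split_from(0)]


# Hypothetical mini-dictionary; a real segmenter would use a large lexicon.
VOCAB = {"jump", "the", "shark", "plucker", "pluckers", "swing", "wing", "bar"}
print(segmentations("jumptheshark", VOCAB))     # ['jump the shark']
print(segmentations("pluckerswingbar", VOCAB))  # ['plucker swing bar', 'pluckers wing bar']
```

    Both splits of the second string are valid dictionary segmentations, so a practical segmenter also needs a way to rank candidates, for example by word frequencies from a corpus.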


  • Syntactic tasks:

    Morphological Analysis

    • Morphology is the field of linguistics that studies the internal structure of words. (Wikipedia)

    • A morpheme is the smallest linguistic unit that has semantic meaning (Wikipedia)

    – e.g. “carry”, “pre”, “ed”, “ly”, “s”

    • Morphological analysis is the task of segmenting a word into its morphemes:

    – carried → carry + ed (past tense)

    – independently → in + (depend + ent) + ly

    – Googlers → (Google + er) + s (plural)

    – unlockable → un + (lock + able)? or (un + lock) + able?


  • Syntactic tasks:

    Part Of Speech (POS) Tagging

    • Annotate each word in a sentence with a part-of-speech.

    • Useful for subsequent syntactic parsing and word sense disambiguation.

    I/Pro ate/V the/Det spaghetti/N with/Prep meatballs/N.

    John/PN saw/V the/Det saw/N and/Con decided/V to/Part take/V it/Pro to/Prep the/Det table/N.
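    A common baseline for POS tagging gives every word the tag it received most often in training. The Python sketch below is trained only on the two tagged sentences above, which is far too little data for real use; the default tag for unseen words is an assumption.

```python
from collections import Counter, defaultdict

# Training data: the two hand-tagged sentences from the slide above.
TAGGED_SENTENCES = [
    [("I", "Pro"), ("ate", "V"), ("the", "Det"), ("spaghetti", "N"),
     ("with", "Prep"), ("meatballs", "N")],
    [("John", "PN"), ("saw", "V"), ("the", "Det"), ("saw", "N"), ("and", "Con"),
     ("decided", "V"), ("to", "Part"), ("take", "V"), ("it", "Pro"),
     ("to", "Prep"), ("the", "Det"), ("table", "N")],
]


def train_unigram_tagger(sentences):
    """Map each word to the tag it was given most often (ties: first seen wins)."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        for word, pos in sentence:
            counts[word.lower()][pos] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def tag(words, model, default="N"):
    """Tag each word independently; unseen words get the (assumed) default tag."""
    return [(word, model.get(word.lower(), default)) for word in words]


model = train_unigram_tagger(TAGGED_SENTENCES)
print(tag("John saw the saw".split(), model))
# [('John', 'PN'), ('saw', 'V'), ('the', 'Det'), ('saw', 'V')]
```

    Because this baseline ignores context, both occurrences of "saw" get the same tag and the second one is wrong. Statistical taggers such as hidden Markov models address this by also conditioning on neighbouring tags.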


  • Syntactic tasks:

    Phrase Chunking

    • Find all non-recursive noun phrases (NPs) and verb phrases (VPs) in a sentence.

    – [NP I] [VP ate] [NP the spaghetti] [PP with] [NP meatballs].

    – [NP He ] [VP reckons ] [NP the current account deficit ] [VP will narrow ] [PP to ] [NP only # 1.8 billion ] [PP in ] [NP September ]
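    Chunkers are often written as regular expressions over part-of-speech tags. A Python sketch, assuming the tag names from the POS tagging slide and hypothetical chunk patterns (an NP is an optional determiner plus nouns, or a pronoun; a VP is a run of verbs; a PP is a single preposition):

```python
import re

# One-character codes for the slide's tags, so a regex can scan a tag sequence.
TAG_CODES = {"Det": "D", "N": "N", "PN": "N", "Pro": "O", "V": "V", "Prep": "P"}
# Hypothetical chunk patterns: NP = optional determiner + nouns, or a pronoun;
# VP = one or more verbs; PP = a single preposition.
CHUNK_PATTERN = re.compile(r"(?P<NP>D?N+|O)|(?P<VP>V+)|(?P<PP>P)")


def chunk(tagged):
    """Group (word, tag) pairs into flat, non-recursive NP/VP/PP chunks."""
    codes = "".join(TAG_CODES.get(tag, "x") for _, tag in tagged)
    pieces, pos = [], 0
    while pos < len(tagged):
        match = CHUNK_PATTERN.match(codes, pos)
        if match:
            words = " ".join(word for word, _ in tagged[match.start():match.end()])
            pieces.append(f"[{match.lastgroup} {words}]")
            pos = match.end()
        else:  # a word that belongs to no chunk is kept as-is
            pieces.append(tagged[pos][0])
            pos += 1
    return " ".join(pieces)


sentence = [("I", "Pro"), ("ate", "V"), ("the", "Det"), ("spaghetti", "N"),
            ("with", "Prep"), ("meatballs", "N")]
print(chunk(sentence))
# [NP I] [VP ate] [NP the spaghetti] [PP with] [NP meatballs]
```

    Flat regular expressions cannot nest, which is one way to read "non-recursive" here. NLTK provides a comparable tag-pattern chunker, `RegexpParser`.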


  • Syntactic tasks:

    Syntactic Parsing

    • Produce the correct syntactic parse tree for a sentence.


  • Semantic Tasks:

    Word Sense Disambiguation (WSD)

    • Words in natural language usually have a fair number of different possible meanings.

    – Ellen has a strong interest in computational linguistics.

    – Ellen pays a large amount of interest on her credit card.

    • For many tasks (question answering, translation), the proper sense of each ambiguous word in a sentence must be determined.
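    One classic knowledge-based approach is the simplified Lesk algorithm: pick the sense whose dictionary definition shares the most words with the sentence. A Python sketch for the two examples above; the glosses and the stop-word list are hypothetical (a real system would take glosses from a lexicon such as WordNet):

```python
import re

# Hypothetical, hand-written glosses for two senses of "interest".
SENSES = {
    "interest (curiosity)": "a strong feeling of curiosity about a subject or field of study",
    "interest (money)": "an amount of money paid for borrowing, charged on a loan or credit card balance",
}
STOPWORDS = {"a", "an", "the", "of", "about", "or", "for", "on", "in", "her", "has"}


def content_words(text):
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def simplified_lesk(sentence, senses):
    """Choose the sense whose gloss shares the most content words with the sentence."""
    context = content_words(sentence)
    return max(senses, key=lambda sense: len(context & content_words(senses[sense])))


print(simplified_lesk("Ellen has a strong interest in computational linguistics.", SENSES))
# interest (curiosity)
print(simplified_lesk("Ellen pays a large amount of interest on her credit card.", SENSES))
# interest (money)
```

    The first sentence is resolved by a single shared word ("strong"), which shows how fragile raw overlap counts are.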


  • Semantic Tasks:

    Semantic Role Labeling (SRL)

    • For each clause, determine the semantic role played by each noun phrase that is an argument to the verb.

    – [agent John] drove [patient Mary] from [source Austin] to [destination Dallas] in [instrument his Toyota Prius].

    – [instrument The hammer] broke [patient the window].

    • Also referred to as “case role analysis,” “thematic analysis,” and “shallow semantic parsing”


  • Semantic Tasks:

    Semantic Parsing

    • A semantic parser maps a natural-language sentence to a complete, detailed semantic representation (logical form).

    • For many applications, the desired output is immediately executable by another program.

    • Example: Mapping an English database query to Prolog:

    How many cities are there in the US?

    answer(A, count(B, (city(B), loc(B, C), const(C, countryid(USA))), A))


  • Pragmatics/Discourse Tasks:

    Anaphora Resolution/Co-Reference

    • Determine which phrases in a document refer to the same underlying entity.

    – John put the carrot on the plate and ate it.

    – Bush started the war in Iraq. But the president needed the consent of Congress.

    • Some cases require difficult reasoning.

    – Today was Jack's birthday. Penny and Janet went to the store. They were going to get presents. Janet decided to get a kite. "Don't do that," said Penny. "Jack has a kite. He will make you take it back."


  • Pragmatics/Discourse Tasks:

    Ellipsis Resolution

    • Frequently words and phrases are omitted from sentences when they can be inferred from context.

    "Wise men talk because they have something to say; fools, because they have to say something." (Plato)

    "Wise men talk because they have something to say; fools talk because they have to say something." (Plato)


  • Other Tasks:

    Information Extraction (IE)

    • Identify phrases in language that refer to specific types of entities and relations in text.

    • Named entity recognition is the task of identifying names of people, places, organizations, etc. in text.

    – [person Michael Dell] is the CEO of [organization Dell Computer Corporation] and lives in [place Austin Texas].

    • Relation extraction identifies specific relations between entities.

    – Michael Dell is the CEO of Dell Computer Corporation and lives in Austin Texas.
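    The simplest form of named entity recognition is a gazetteer lookup: match a list of known names against the text. A Python sketch, assuming a hypothetical four-entry gazetteer built for the example sentence above:

```python
import re

# Hypothetical gazetteer (a list of known names and their entity types).
GAZETTEER = {
    "Michael Dell": "PERSON",
    "Dell Computer Corporation": "ORGANIZATION",
    "Austin": "PLACE",
    "Texas": "PLACE",
}


def find_entities(text, gazetteer):
    """Return (name, type) pairs in order of appearance, trying longer names first."""
    names = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, names)) + r")\b")
    return [(m.group(0), gazetteer[m.group(0)]) for m in pattern.finditer(text)]


text = "Michael Dell is the CEO of Dell Computer Corporation and lives in Austin Texas."
print(find_entities(text, GAZETTEER))
```

    Trying longer names first keeps "Dell Computer Corporation" in one piece, but a lookup list cannot recognize names it has never seen, which is why practical systems also rely on contextual clues.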


  • Other Tasks:

    Question Answering

    • Directly answer natural language questions based on information presented in a corpus of textual documents (e.g. the web).

    – When was Barack Obama born? (factoid)

    • August 4, 1961

    – Who was president when Barack Obama was born?

    • John F. Kennedy

    – How many presidents have there been since Barack Obama was born?

    • 9


  • Text Summarization

    • Produce a short summary of a longer document or article.

    – Article: With a split decision in the final two primaries and a flurry of superdelegate endorsements, Sen. Barack Obama sealed the Democratic presidential nomination last night after a grueling and history-making campaign against Sen. Hillary Rodham Clinton that will make him the first African American to head a major-party ticket. Before a chanting and cheering audience in St. Paul, Minn., the first-term senator from Illinois savored what once seemed an unlikely outcome to the Democratic race with a nod to the marathon that was ending and to what will be another hard-fought battle, against Sen. John McCain, the presumptive Republican nominee….

    – Summary: Senator Barack Obama was declared the presumptive Democratic presidential nominee.


  • Machine Translation (MT)

    • Translate a sentence from one natural language to another.

    – Hasta la vista, bebé → Until we see each other again, baby.
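    A naive word-by-word lookup shows why this is harder than it looks. A Python sketch, assuming a hypothetical four-word Spanish-English dictionary:

```python
# Hypothetical Spanish-to-English dictionary with one gloss per word.
DICTIONARY = {"hasta": "until", "la": "the", "vista": "view", "bebé": "baby"}


def word_for_word(sentence, dictionary):
    """Translate each token independently; unknown words are copied unchanged."""
    output = []
    for token in sentence.split():
        word = token.rstrip(",.!?")
        trailing = token[len(word):]  # keep punctuation such as the comma
        output.append(dictionary.get(word.lower(), word) + trailing)
    return " ".join(output)


print(word_for_word("Hasta la vista, bebé", DICTIONARY))  # until the view, baby
```

    The result, "until the view, baby", translates every word yet misses the idiomatic meaning.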


  • Assignment 1

    • Find an NLP tool or application and demonstrate how it works (5-minute presentation + system demonstration), including:

    – What is it?

    – Technology

    – Features

    – How it works

    – What we can do with it


  • Applications

  • Applications

    • What uses of the computer involve language?

    • What language use is involved?

    • What are the main problems?

    • How successful are they?


  • Speech applications

    • Speech recognition (Speech-to-text)

    – Uses

    • As a general interface to any text-based application

    • Text dictation

    • Speech understanding

    – Not the same: computer must understand intention, not necessarily exact words

    – Uses

    • As a general interface to any application where meaning is important rather than text

    • As part of speech translation

    • Difficulties

    – Separating speech from background noise

    – Filtering of performance errors (disfluencies)

    – Recognizing individual sound distinctions (similar phonemes)

    – Variability in human speech

    – Ambiguity in language (homophones)


  • Speech applications

    • Voice recognition – Not really a linguistic issue

    – But shares some of the techniques and problems

    • Text-to-speech (Speech synthesis)

    – Uses:

    • Computer can speak to you

    • Useful where user cannot look at (or see) screen

    – Difficulties

    • Homograph disambiguation

    • Prosody determination (pitch, loudness, rhythm)

    • Naturalness (pauses, disfluencies?)


  • Word processing

    • Check and correct spelling, grammar and style

    • Types of spelling errors

    – Non-existent words

    • Easy to identify

    • But suggested correction not always appropriate

    – Accidental homographs

    • Deliberate ‘errors’

    – Foreign words

    – Proper names, neologisms

    – Illustrations of spelling errors!
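    Flagging a non-existent word only needs a lexicon lookup; proposing corrections is often done by ranking lexicon words by edit distance. A Python sketch, assuming plain Levenshtein distance and a hypothetical four-word lexicon:

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def suggest(word, lexicon, max_distance=2):
    """Rank lexicon words within max_distance edits of a (possibly misspelled) word."""
    if word in lexicon:
        return [word]  # the word exists, so nothing to correct
    candidates = sorted((edit_distance(word, w), w) for w in lexicon)
    return [w for d, w in candidates if d <= max_distance]


LEXICON = {"the", "ten", "tea", "then"}  # hypothetical tiny lexicon
print(suggest("teh", LEXICON))  # ['tea', 'ten', 'the', 'then']
```

    For "teh" the intended "the" is a transposition, which plain Levenshtein distance counts as two edits, so "tea" and "ten" are ranked ahead of it: one reason a suggested correction is not always appropriate. Variants such as Damerau-Levenshtein distance count an adjacent transposition as a single edit.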


  • Better word processing

    • Spell checking for homonyms

    • Grammar checking

    • Tuned to the user

    – You can (already) add your own auto-corrections

    – Non-native users (‘Interference checking’)

    – Dyslexics and other special needs users

    • Intelligent word processing

    – Find/replace that knows about morphology, syntax


  • Text prediction

    • Speed up word processing

    • Facilitate text dictation

    • At lexical level, already seen in SMS

    • More sophisticated versions might be based on a corpus of previously seen texts

    • Especially useful in repeated tasks

    – Translation memory

    – Authoring memory
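    Prediction based on previously seen texts can be sketched with bigram counts: suggest the words that most often followed the current word. A Python sketch with a hypothetical corpus of four earlier messages:

```python
from collections import Counter, defaultdict


def train_bigrams(corpus):
    """Count, for every word, which words followed it in the previously seen texts."""
    following = defaultdict(Counter)
    for text in corpus:
        words = text.lower().split()
        for current, nxt in zip(words, words[1:]):
            following[current][nxt] += 1
    return following


def predict_next(following, word, k=3):
    """Suggest up to k next words, most frequent first (ties: first seen wins)."""
    return [w for w, _ in following.get(word.lower(), Counter()).most_common(k)]


# Hypothetical corpus of previously typed messages.
history = ["see you later", "see you soon", "see you later tonight", "thank you"]
model = train_bigrams(history)
print(predict_next(model, "you"))  # ['later', 'soon']
```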


  • Dialogue systems

    • Computer enters a dialogue with user

    – Usually specific cooperative task-oriented dialogue

    – Often over the phone

    – Examples?

    • Usually speech-driven, but text also appropriate

    • Modern application is automatic transaction processing

    • Limited domain may simplify language aspect

    • Domain ‘model’ will play a big part

    • Simplest case: choose closest match from (hidden) menu of expected answers

    • More realistic versions involve significant problems
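    The simplest case can be sketched with Python's standard `difflib` module, which scores the character-level similarity of two strings. The menu of expected answers and the 0.6 cutoff below are hypothetical choices for illustration:

```python
import difflib

# Hypothetical hidden menu of answers expected at one step of a banking dialogue.
EXPECTED_ANSWERS = ["check balance", "transfer money", "report lost card", "speak to an agent"]


def interpret(reply, expected=EXPECTED_ANSWERS, cutoff=0.6):
    """Map a free-form reply to the closest expected answer, or None if none is close."""
    matches = difflib.get_close_matches(reply.lower(), expected, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(interpret("Check my balance"))  # check balance
```

    A reply below the cutoff returns None, at which point a system would typically re-prompt. Because the match is on characters rather than meaning, a paraphrase such as "how much money do I have" does not map to "check balance".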


  • Dialogue systems

    • Apart from speech recognition and synthesis issues, NL components include …

    • Topic tracking

    • Anaphora resolution

    – Use of pronouns, ellipsis

    • Reply generation

    – Cooperative responses

    – Appropriate use of anaphora


  • (also known as)

    Conversation machines

    • Another old AI goal (cf. Turing test)

    • Also (amazingly) for amusement

    • Mainly speech, but also text based

    • Early famous approaches include ELIZA, which showed what you could do by cheating

    • Modern versions have a lot of NLP, especially discourse modelling, and focus on the language generation component


  • QA systems

    • NL interface to knowledge database

    • Handling queries in a natural way

    • Must understand the domain

    • Even if typed, dialogue must be natural

    • Handling of anaphora, e.g.

    – When is the next flight to Sydney? (6.50)

    – And the one after? (7.50)

    – What about Melbourne then? (7.20)

    – OK I’ll take the last one.


  • IR systems

    • Like QA systems, but the aim is to retrieve information from textual sources that contain the info, rather than from a structured database

    • Two aspects

    – Understanding the query (cf Google, Ask Jeeves)

    – Processing text to find the answer

    • Named Entity Recognition


  • Named entity recognition

    • Typical textual sources involve names (people, places, corporations), dates, amounts, etc.

    • NER seeks to identify these strings and label them

    • Clues are often linguistic

    • Also involves recognizing synonyms, and processing anaphora


  • Automatic summarization

    • Renewed interest since mid 1990s, probably due to growth of WWW

    • Different types of summary

    – indicative vs. informative

    – abstract vs. extract

    – generic vs. query-oriented

    – background vs. just-the-news

    – single-document vs. multi-document


  • Automatic summarization

    • Topic identification

    – stereotypical text structure

    – cue words

    – high-frequency indicator phrases

    – intratext connectivity

    – discourse structure centrality

    • Topic fusion

    – concept generalization

    – semantic association

    • Summary generation

    – sentence planning to achieve information compaction
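    The "high-frequency indicator" idea can be illustrated with a small extractive summarizer that scores each sentence by how frequent its content words are in the whole text. This Python sketch covers only topic identification and extraction (no topic fusion or generation), and assumes naive sentence splitting on ., ! and ?, a hypothetical stop-word list, and a made-up example text:

```python
import re
from collections import Counter

# Hypothetical stop-word list; real systems use much larger ones.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "on", "for", "was", "that"}


def content_words(text):
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def summarize(text, num_sentences=1):
    """Extract the sentences whose content words are most frequent in the whole text."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(content_words(text))

    def score(sentence):
        words = content_words(sentence)
        return sum(freq[w] for w in words) / max(len(words), 1)

    best = set(sorted(sentences, key=score, reverse=True)[:num_sentences])
    # Keep the chosen sentences in their original order so the summary reads naturally.
    return " ".join(s for s in sentences if s in best)


text = "The nomination was decided. Obama secured the nomination after the nomination fight. It rained."
print(summarize(text))  # The nomination was decided.
```

    Averaging rather than summing the word scores is a design choice that keeps long sentences from winning simply because they are long.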


  • Text mining

    • Discovery by computer of new, previously unknown information, by automatically extracting information from different written resources (typically Internet)

    • Cf data mining (e.g. using consumer purchasing patterns to predict which products to place close together on shelves), but based on textual information

    • Big application area is biosciences


  • Text mining

    • preprocessing of document collections (text categorization, term extraction)

    • storage of the intermediate representations

    • techniques to analyze these intermediate representations (distribution analysis, clustering, trend analysis, association rules, etc.)

    • visualization of the results.


  • Story understanding

    • An old AI application

    • Involves …

    – Inference

    – Ability to paraphrase (to demonstrate understanding)

    • Requires access to real-world knowledge

    • Often coded in “scripts” and “frames”


  • Machine Translation

    • Oldest non-numerical application of computers

    • Involves processing of the source language as in other applications, plus …

    – Choice of target-language words and structures

    – Generation of appropriate target-language strings

    • The main difficulty is that source-language analysis and/or cross-lingual transfer implies varying levels of “understanding”, depending on similarities between the two languages

    • MT ≠ tools for translators, but some overlap


  • Machine Translation

    • First approaches perhaps most intuitive: look up words and then do local rearrangement

    • “Second generation” took linguistic approach: grammars, rule systems, elements of AI

    • Recent (since 1990) trend to use empirical (statistical) approach based on large corpora of parallel text

    – Use existing translations to “learn” translation models, either a priori (Statistical MT ≈ machine learning) or on the fly (Example-based MT ≈ case-based reasoning)

    – Convergence of empirical and rationalist (rule-based) approaches: learn models based on treebanks or similar.


  • Language teaching

    • CALL (computer-assisted language learning)

    • Grammar checking but linked to models of

    – The topic

    – The learner

    – The teaching strategy

    • Grammars (etc) can be used to create language-learning exercises and drills


  • Assistive computing

    • Interfaces for disabled

    • Many devices involve language issues, e.g.

    – Text simplification or summarization for users with low literacy (partially sighted, dyslexic, non-native speaker, illiterate, etc.)

    – Text completion (predictive or retrospective)

    • Works on basis of probabilities or previous examples


  • Conclusion

    • Many different applications

    • But also many common elements

    – Basic tools (lexicons, grammars)

    – Ambiguity resolution

    – Need for (but impossibility of having) real-world knowledge

    • Humans are really very good at language

    – Can understand noisy or incomplete messages

    – Good at guessing and inferring


  • Question

    Sophia is a social humanoid robot developed by Hong Kong-based company Hanson Robotics. Discuss the NLP techniques used by Sophia.


  • Next


    • Introduction

    • Speech Processing

    – Text-to-Speech

    – Speech Recognition

    • Words

    – Morphology

    – Part-of-Speech Tagging

    – Morphological Processing

    • Syntax

    – Word Classes

    – Context-Free Grammars

    – Parsing

    • Semantics

    – Representing Meaning

    – Semantic Analysis

    – Lexical Semantics, Word Sense Disambiguation and Information Retrieval

    • Pragmatics

    – Discourse, Dialogue and Conversational Agents

    • NLP Applications

    – Machine Translation System

