The Semantics and Pragmatics
of Natural Language
Daniela GÎFU
http://profs.info.uaic.ro/~daniela.gifu/
“ALEXANDRU IOAN CUZA” UNIVERSITY OF IAŞI
FACULTY OF COMPUTER SCIENCE
Course 1
SPNL OVERVIEW
Who am I?
“Alexandru Ioan Cuza” University of Iași
THE HALL OF THE LOST STEPS
Faculty of Computer Science
BE AMONG THE FIRST…..
Romanian Academy
What is this course about?
➢ Meaning and Natural Language Processing (NLP)
➢ Computational Semantics
➢ Computational Pragmatics
Familiarization
with relevant terminology
• Semantics
• Pragmatics
• Natural language
• Computational Linguistics
• Natural Language Processing
…
Simulation of human (natural)
intelligence by machines
Interdisciplinary field ~
Scientific study of
language from a
computational
perspective
A discipline that spans
theory and practice to
understand
computer systems and
networks at a deep level.
Computational Linguistics (CL)
vs.
Natural Language Processing (NLP)
CL = provides the theoretical background (computational
theories of language) and linguistic models.
NLP = applied CL, including:
- natural language technology (NLT)
- human language technology (HLT)
Research
Engineering techniques have to be underpinned by scientific
understanding…
Good performance on some
tasks when large amounts of
(annotated) data are available
Spoken language
- speech processing (from speech to text to syntax and
semantics to speech) - https://speechlogger.appspot.com/ro/
Ex: mobile
Written language – my area of interest
Language in correlation with other modalities
(multimodality)
- speech
- intonation
- image
Ex: GPS (Global Positioning System)
Natural Language Technology
Document segmentation and interpretation
– cleaning (elimination of dots, enhancing contrast,
etc.)
– separation of text from image, curved lines...
– recognizing printed, semi-uncial characters, etc.
• Optical Character Recognition (OCR)
~ 100% accuracy in scanning printed material
based on Latin script
Challenge in OCR
Written Language Technologies
Students?
OCR Handwriting – Why?
= presents some unique particularities
= many varieties of cursive writing
see: https://pdf.iskysoft.com/ocr-pdf/handwriting-ocr.html
OCR Handwriting very challenging
= the interpretation of physician handwriting (Rasmussen,
L.V. et al., 2012; Broda. B. & Piasecki, M., 2007)
= analysis of old handwritten documents (useful for linguists,
musicians, historians, etc.)
Document Image
Analysis
PR (pattern recognition) = a sub-topic of machine learning
(the description or classification (recognition) of
measurements).
Differences between CL Approaches
•Analysis and understanding of written language
– sub-syntactic processing
• lexical units
• sentence splitting
• clause borders
• part of speech and morphological information
• lemmas
• entity names
• groups (nominal, verbal, prepositional, etc.)
and lexical attractions (collocations)
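A minimal sketch of the first sub-syntactic steps listed above (sentence splitting and identification of lexical units), using only Python's standard library. The regular expressions and the function names split_sentences and tokenize are assumptions made for illustration; real systems treat abbreviations, clitics and multiword units far more carefully.

```python
import re

def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def tokenize(sentence):
    """Naive tokenizer: words (with internal hyphens/apostrophes) or single punctuation marks."""
    return re.findall(r"\w+(?:[-']\w+)*|[^\w\s]", sentence)

text = "I saw the gentleman with the hat. He smiled!"
sentences = split_sentences(text)
tokens = [tokenize(s) for s in sentences]
print(sentences)
print(tokens)
```

Later steps in the list (part of speech, lemmas, entity names, groups) would take these token lists as input.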
Written Language Technologies
• Language analysis and understanding
– semantic and discourse processing
• semantic disambiguation → word senses
• semantic role labeling → NLTK
• rhetorical structure of discourse and dialogue →
RST (Rhetorical Structure Theory)
• anaphora resolution → Stanford CoreNLP
• text summarization → Machine Learning
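To make semantic disambiguation concrete, here is a minimal sketch of a simplified Lesk-style heuristic: pick the sense whose gloss shares the most words with the context. The two-sense inventory for "bank", the glosses and the stop-word list are invented for this illustration and are not taken from WordNet or any other real resource.

```python
STOPWORDS = {"a", "an", "the", "of", "on", "in", "to", "and", "that", "i", "my", "at"}

# Hypothetical toy sense inventory (glosses written for this example only).
SENSES = {
    "bank": {
        "bank#finance": "an institution that accepts deposits and lends money",
        "bank#river": "sloping land beside a river or lake",
    }
}

def content_words(text):
    """Lower-cased words of the text, minus stop words."""
    return {w for w in text.lower().split() if w not in STOPWORDS}

def simplified_lesk(word, context):
    """Return the sense whose gloss overlaps most with the context words."""
    ctx = content_words(context)
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(ctx & content_words(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

print(simplified_lesk("bank", "I went to the bank to deposit money"))  # bank#finance
print(simplified_lesk("bank", "We sat on the bank of the river"))      # bank#river
```

The heuristic is deliberately crude: without lemmatization, "deposit" and "deposits" do not match, which is one reason real systems build on the sub-syntactic layer first.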
Written Language Technologies
the study of mathematical structures and methods that are
of importance to linguistics.
Phonetics → Phonology → Morphology →
Syntax → Semantics → … →
Sociolinguistics → Language Acquisition.
Mathematical Linguistics
Mathematical Linguistics before Computational Linguistics….
ML ⇔ CL?
= art of solving problems that need to analyze
(or generate) natural language text.
Find the metrics for a good solution to the
engineering problem…
NLP
Google Translate – Don’t blame!!!!
Romanian = Luceafărul de dimineață
English = The morning gentleman (bad answer)
= Morning star (good answer)
Why????
explains how human translators do their job...
Let’s try!
NLP – a subdomain of
Artificial Intelligence & Linguistics
Thematic Areas
- Linguistics - mathematical linguistics - computational
linguistics
- Formal Language
- Linguistic and Language Processing
- The grammatical structure of utterances: the sentence,
constituents, phrase, classifications and structural rules,
syntactic processing ...
- Parser or Syntax Analyzer
- Semantics & Pragmatics
= an area of Artificial Intelligence (AI) devoted to
creating computers that use NL as input and/or
output.
NLP
AI-hard problem
= machine reading
comprehension
= produces language
as output on the basis
of data input
= developing computational methods/models of human
linguistic behavior.
CL
▪ INFORMATION RETRIEVAL
▪ INFORMATION EXTRACTION
▪ MACHINE TRANSLATION
▪ QUESTION ANSWERING
▪ SUMMARIZATION
▪ MACHINE READABLE DICTIONARIES
▪ SPELLING & GRAMMAR CHECKERS
…
Let’s describe and exemplify
A discipline concerned with understanding written and spoken
language from a computational perspective.
- detecting synonymy (Grigonytė et al., 2010);
- developing WordNet (including Romanian - Gala et Mititelu,
2013), (Iftene and Balahur, 2007)...;
- WSD (Yang, H. et al. 2010), (Lefever et Hoste, 2010), (Tufiș,
2002)...;
- semantic annotation (Garcia et al., 2012)...;
- reconstructing a diachronic morphology (Cristea et al.,
2007/2012)
- diachronic text classification (Mihalcea and Năstase, 2012;
Popescu and Strapparava, 2015), etc.
- epoch detection (Gifu, 2015/2016/2017)...;
CL – Applications
Tools developed
by students…
Linguistic & Language Processing
1. Linguistics
- Science of language. Includes:
✓ Sounds (phonology)
✓ Word formation (morphology)
✓ Sentence structure (syntax)
✓ Meaning (semantics) and understanding
(pragmatics)…
2. Levels of linguistic analysis
- Higher level → Speech Recognition (SR)
- Lower levels → Natural Language Processing (NLP)
Levels of Linguistic Analysis
NLP
Letters - strings
Morphemes
Words
Phrases & sentences
Meaning out of context
Meaning in context
Phonemes
Acoustic signal
Speech
Recognition
Phonetics – production and perception of speech
Phonology – Sound patterns of language
Lexicon – Dictionary of words in a language
Morphology – Word formation and structure
Syntax – Sentence structure
Semantics – Intended meaning
Pragmatics – Understanding from external info
NLP Pipeline
Course purpose
MAIN CONCEPTS
1. Natural Language
- used by human beings for communication...
- sign, system, symbols, rule-set (or grammar)
2. Semantics
- literal meaning determined from a word, phrase,
sentence.
3. Pragmatics
- contextual meaning {situation, speaker, etc.}
Natural or ordinary language
• A system of speech symbols → (form criterion)
Types:
a) speech (spoken language)
b) writing (written language) - the representation of a spoken or
gestural language.
• The most important means of human communication →
(function criterion)
Natural Language…
• Multiplicity of languages
Formal Language_I
1. Symbol
- a character, an abstract entity that has no meaning by
itself
Ex: letters, digits and special characters
2. Alphabet
- finite set of symbols
- often denoted by Σ
Ex:
B = {0, 1} says B is an alphabet of two symbols, 0 and 1
C = {a, b, c} – C an alphabet of 3 symbols, a, b and c
* More about formal language:
http://www.its.caltech.edu/~matilde/FormalLanguageTheory.pdf
Formal Language_II
3. String or word
- a finite sequence of symbols from an alphabet
Ex: 01110 and 111 are strings over the alphabet B above
aaabccc and b are strings over the alphabet C above
4. Sentence
- a string of words.
Ex: I saw the gentleman with the hat.
String = a b c d e c f (one symbol per distinct word, so both occurrences of "the" map to c)
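A small sketch of the definitions above: an alphabet is modelled as a Python set of symbols, and a string over it is any finite sequence of those symbols. The helper name is_string_over is an assumption made for this illustration.

```python
B = {"0", "1"}          # alphabet of two symbols
C = {"a", "b", "c"}     # alphabet of three symbols

def is_string_over(s, alphabet):
    """True if every symbol of s belongs to the alphabet (the empty string qualifies)."""
    return all(symbol in alphabet for symbol in s)

print(is_string_over("01110", B))    # True
print(is_string_over("aaabccc", C))  # True
print(is_string_over("012", B))      # False: '2' is not in B

# A sentence is a string whose symbols are words rather than characters.
sentence = "I saw the gentleman with the hat".split()
vocabulary = set(sentence)
print(is_string_over(sentence, vocabulary))  # True
```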
Formal language_III
How can the relations between the parts of a string be defined?
A.
[I] saw the gentleman [with the binoculars] = [a] b c d [e c f]
B.
I saw [the gentleman with the binoculars] = a b [c d e c f]
We can represent structures with trees…
[Figure: two parse trees for "I saw the gentleman with the binoculars", one for each reading]
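The two readings can be sketched as nested Python tuples, one tree per bracketing. The category labels (S, NP, VP, PP, V) and the helper leaves are assumptions made for illustration; the point is only that two different trees yield the same string of words.

```python
# Reading A: the prepositional phrase attaches to the verb (I used the binoculars).
tree_a = ("S",
          ("NP", "I"),
          ("VP",
           ("V", "saw"),
           ("NP", "the", "gentleman"),
           ("PP", "with", ("NP", "the", "binoculars"))))

# Reading B: the prepositional phrase attaches to the noun (the gentleman has the binoculars).
tree_b = ("S",
          ("NP", "I"),
          ("VP",
           ("V", "saw"),
           ("NP", "the", "gentleman",
            ("PP", "with", ("NP", "the", "binoculars")))))

def leaves(tree):
    """Collect the words of a tree from left to right."""
    if isinstance(tree, str):
        return [tree]
    words = []
    for child in tree[1:]:   # tree[0] is the category label
        words.extend(leaves(child))
    return words

print(leaves(tree_a) == leaves(tree_b))  # True: same string, different structure
print(tree_a == tree_b)                  # False
```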
Formal Language_IV
5. Language
- a set of strings of symbols from an alphabet.
6. Natural Language or ordinary language
- open-ended = built on 3 different knowledge components: the
sound of words - phonology; the meaning of words -
semantics; the grammatical rules according to which words are
put together - syntax.
7. Formal language
- a set L of sequences/strings over some finite alphabet Σ
- described using formal grammars (a set of rules that specify
its strings).
- many applications (e.g., Prognosis wearable system)
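As a minimal sketch, a formal language can be modelled as a membership test over strings drawn from Σ*. The example language L = { a^n b^n : n ≥ 0 } and the names strings_up_to and in_language are chosen here for illustration only; since Σ* is infinite, the enumeration is cut off at a maximum length.

```python
from itertools import product

SIGMA = ("a", "b")  # finite alphabet

def strings_up_to(alphabet, max_len):
    """Yield every string over the alphabet with length 0..max_len, shortest first."""
    for n in range(max_len + 1):
        for symbols in product(alphabet, repeat=n):
            yield "".join(symbols)

def in_language(s):
    """Membership test for L = { a^n b^n : n >= 0 }."""
    half, odd = divmod(len(s), 2)
    return odd == 0 and s == "a" * half + "b" * half

sample = [s for s in strings_up_to(SIGMA, 4) if in_language(s)]
print(sample)  # ['', 'ab', 'aabb']
```

Of the 31 strings of length at most 4 over this alphabet, only three belong to L, which shows a language as a proper subset of all strings over Σ.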
Formal Language_V
Context-Free Grammars (CFG) - defined by a finite set of grammar rules
https://www.tutorialspoint.com/automata_theory/context_free_grammar_introduction.htm
= a quadruple (N, T, P, S), where:
N = a finite set of non-terminal symbols (characters or variables).
Note! Each n ∈ N = a type of phrase/clause in the sentence.
T = a finite set of terminals (the alphabet defined by the grammar), disjoint from N: N ∩ T = ∅.
P = a finite set of (rewrite) rules or productions of the grammar, each of the form
A → α, with A ∈ N and α ∈ (N ∪ T)*
Note! The left-hand side of a production rule is a single non-terminal: it does not have any right
context or left context.
* = Kleene star operation = unary operation on sets of strings or sets of symbols or
characters → applied to a set V it is written V* (also used in regular expressions).
Ex: {"a", "b", "c"}* = {ε, "a", "b", "c", "aa", "ab",
"ac", "ba", "bb", "bc", "ca", "cb", "cc", "aaa", "aab",
...}, where ε is the empty string and {ε} is the language consisting only of the empty string.
S = start symbol (S ∈ N), used to represent the whole sentence.
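A minimal sketch of a CFG written out as the quadruple (N, T, P, S), with a naive leftmost expansion that enumerates the sentences it derives. The toy grammar, its category names and the function generate are assumptions made for illustration; the grammar is deliberately non-recursive so that the enumeration terminates.

```python
N = {"S", "NP", "VP", "Det", "Noun", "Verb"}   # non-terminals
T = {"the", "gentleman", "hat", "saw"}         # terminals
P = {                                          # productions A -> alpha
    "S":    [["NP", "VP"]],
    "NP":   [["Det", "Noun"]],
    "VP":   [["Verb", "NP"]],
    "Det":  [["the"]],
    "Noun": [["gentleman"], ["hat"]],
    "Verb": [["saw"]],
}
S = "S"                                        # start symbol

# Sanity checks on the quadruple: N and T are disjoint, S is a non-terminal,
# and every left-hand side is a single non-terminal.
assert N.isdisjoint(T) and S in N and set(P) <= N

def generate(form):
    """Yield every terminal string derivable from a list of symbols (leftmost expansion)."""
    for i, symbol in enumerate(form):
        if symbol in N:
            for rhs in P[symbol]:
                yield from generate(form[:i] + rhs + form[i + 1:])
            return
    yield " ".join(form)   # no non-terminal left: a sentence of the language

sentences = sorted(generate([S]))
print(len(sentences))      # 4
print(sentences[0])        # the gentleman saw the gentleman
```

A grammar with recursive rules (for example NP → NP PP) would derive infinitely many sentences, so this exhaustive enumeration would need a depth limit.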
Main Concepts - II
CONCLUSIONS
Computational semantics and pragmatics:
➢ automatic construction of semantic representations for NL
expressions (in context).
➢ automatic inferences over the representations.
Major Issues:
➢ Ambiguity at various levels:
lexical, syntactic, semantic, pragmatic
➢ Interface between the logical form (LF) derived from linguistic form and the context of use
(essential for modelling anaphora).
Tools used include:
➢ Information: syntax, world knowledge, lexical semantics,
corpora…
➢ Inference: logic (model checkers and theorem provers), machine
learning, statistics…
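A minimal sketch of inference by model checking: a sentence is paired with a hand-written semantic representation and evaluated against a small model. The model (domain and predicate extensions) and the helper names holds, every and some are invented for this illustration.

```python
# A toy model: a domain of entities and the extensions of three predicates.
DOMAIN = {"john", "mary", "rex"}
MODEL = {
    "person": {"john", "mary"},
    "dog": {"rex"},
    "walks": {"john", "mary"},
}

def holds(pred, entity):
    """True if the entity is in the extension of the predicate."""
    return entity in MODEL[pred]

def every(restrictor, scope):
    """'Every R S' is true iff all entities satisfying R also satisfy S."""
    return all(holds(scope, x) for x in DOMAIN if holds(restrictor, x))

def some(restrictor, scope):
    """'Some R S' is true iff at least one entity satisfies both R and S."""
    return any(holds(scope, x) for x in DOMAIN if holds(restrictor, x))

print(every("person", "walks"))  # True:  "Every person walks"
print(some("dog", "walks"))      # False: "Some dog walks"
```

A theorem prover would instead ask whether a representation follows from others in every model, not just in this one.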
Semester Homework:
1. Each student has to present a paper related to
his/her SEMEVAL task, which will guide the final project
- https://aclweb.org/anthology/
published between 2018 and 2021
EMNLP (Conference on Empirical Methods in Natural Language
Processing)
ACL (Association for Computational Linguistics)
EACL (European Chapter of the Association for
Computational Linguistics)
COLING (International Conference on
Computational Linguistics) …
Final project: SEMEVAL 2022
Groups of 2-3 students:
- 1-2 humanists & 1 computer scientist prepare a paper
for SEMEVAL-2022 based on their research, under
constant supervision -
https://semeval.github.io/SemEval2022/tasks
Project steps – next time
1. Form a team...
2. Choose a task
3. Define the teamwork
4. Establish the modular structure
5. Edit the paper – a possible structure
5. Edit the paper – making an outline
* Choosing a Title
* Abstract (executive summary) & Keywords
* Introduction (the new approach; background
information; research problem/question; theoretical
framework)
* SOTA (citation tracking; content alert services;
evaluating sources; primary sources; secondary sources…)
* Methodology (qualitative methods; quantitative
methods)
* Results
* Discussion
* Conclusions and future work
* References
Thank you!