A probabilistic approach to language structure
Annarita Felici and Paul Pal
Royal Holloway, University of London
Helsinki 2-4 June 2008
2-4 June 2008 QITL3
Outline
Field of investigation
Research goals
Data
Probabilistic analysis
Information Theory
Entropy results
Field of investigation
Repetitive language structure in multilingual legal text
EU normative statements in translation
Languages of investigation: English, French, German and Italian
Field of investigation: legal norms
Deontic norms (from the Greek deon = duty): obligations, prohibitions, permissions and authorizations
Constitutive performatives: the uttering of a performative is, or is part of, the doing of a certain kind of action, i.e. a speech act (Austin 1962)
Uttering a sentence = doing things
Other norm types
Logical necessity: necessary requirements or competences
Non-binding norms: guidelines, correct procedure
Research goals
1. To evaluate the degree of prescriptive standardization in French, German and Italian with reference to English
2. To predict translation equivalents in French, German and Italian
Under the conditions that:
English legal drafting is highly standardized
The EU and the main English drafting manuals suggest modal verbs for prescriptive norms (Coode 1843, Driedger 1976, Dickerson 1975, Thornton 1996)
Text types under investigation are repetitive and reusable
Text types under investigation can be more or less binding
Data: multilingual parallel corpus
Origin: EU
Corpus size: 1,404,723 words
Text type: normative
Type of docs: Secondary Legislation (Regulations, Decisions, Directives, Recommendations)
Years: 2001-04
Languages: English, French, German, Italian
Probabilistic Analysis
Information Theory: to measure the number of linguistic alternatives when translating a repetitive normative statement from English into French, German and Italian
= quantifying information by reducing uncertainty
More alternatives = more uncertainty (high entropy)
Fewer alternatives = more standardization, certainty (low entropy)
Probabilistic Variables
Categories of expression
Linguistic forms
English modals (entry point for parallel retrieval): shall, must, may, can, should
Categories of expression
Constitutive norms and performatives
Logical necessity
Permissions and authorizations
Capability
Non-binding norms
Linguistic forms
Indicative (pres.)
Modal verbs (mv)
Verbal periphrasis (vp)
Lexicalized modal expressions (le)
Ellipses (0-correspondence)
Linguistic forms: linguistic equivalents used in constitutive and performative norms
REGULATIONS, English shall: 1382 occurrences

Form                 Italian raw no. (%)   French raw no. (%)   German raw no. (%)
Indicative - Others  1192 (86.3)           1223 (88.5)          1125 (81.4)
MV                   81 (5.86)             58 (4.2)             94 (6.8)
Verbal periphrasis   6 (0.43)              8 (0.58)             71 (5.14)
Modal expressions    51 (3.69)             43 (3.11)            46 (3.33)
Ellipsis             52 (3.76)             50 (3.62)            46 (3.33)
TOT                  1382                  1382                 1382

Realizations per form (raw no. in brackets):
MV, Italian: dovere (58), potere (23; 9 neg.)
MV, French: devoir (32), pouvoir (26; 14 neg.)
MV, German: muessen (34), sollen (3), duerfen (33; 9 neg.), koennen (24)
Verbal periphrasis, Italian: va + past part. (1), essere tenuto (5)
Verbal periphrasis, French: être à (1), être tenu (7)
Verbal periphrasis, German: sein…zu (61), haben…zu (10)
Modal expressions, Italian: vietare (1), essere obbligatorio (15), soggetto a obbligo (1), avere il potere (1), avere il diritto (12), consentire, autorizzare, occorre (3), spettare (2)
Modal expressions, French: interdire (1), être obligatoire (15), soumis à obligation (1), avoir le droit (11), il importe (1), autoriser, octroyer, il importe (1)
Modal expressions, German: untersagen (1), verbindlich sein (15), verpflichten (8), das Recht/Anspruch haben (10), befuegt sein (2), bewilligt/zugelassen/erlaubt sein, gewaehrt sein (3)
Linguistic forms: linguistic equivalents used to convey permissions and authorizations
REGULATIONS, English MV may: 294 occurrences

Form                 Italian raw no. (%)   French raw no. (%)   German raw no. (%)
Indicative - Others  34 (11.56)            36 (12.24)           31 (10.54)
MV                   246 (83.67)           250 (85.03)          247 (84.01)
Verbal periphrasis   0 (0)                 0 (0)                1 (0.34)
Oth. modal express.  6 (2.041)             2 (0.68)             7 (2.381)
Ellipsis             8 (2.721)             6 (2.041)            8 (2.721)
TOT                  294                   294                  294

Realizations per form (raw no. in brackets):
MV, Italian: potere (245), dovere (1) neg.
MV, French: pouvoir (249), devoir (1) neg.
MV, German: koennen (218), duerfen (27), sollen (1), moegen (1)
Verbal periphrasis, German: sind…zu (1)
Oth. modal express., Italian: avere facoltà (3), essere consentito/ammesso, essere abilitato (1)
Oth. modal express., French: permettre (1), être habilité (1)
Oth. modal express., German: zulaessig sein (4), berechtigt/
Given the English system of modality, what is the relative probability of choosing an equivalent modal verb in the translation of may or must, and a different linguistic form as the equivalent of shall?
Is the probability of a choice in a system affected by a choice in another?
Information Theory
The information value or content h(p) depends on the probability of occurrence (p) of an event (Shannon 1949)
h(p) = - log (p) = log (1/p)
Entropy: degree of uncertainty (= shortage of information due to the large number of alternatives)
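The definition h(p) = log (1/p) can be tried out numerically. The following is a minimal Python sketch, not part of the original talk: the function name `information_content` is hypothetical, and base-2 logarithms (bits) are assumed, in line with the entropy formula used later in these slides.

```python
import math

def information_content(p: float) -> float:
    """Return h(p) = log2(1/p), the information content of an event in bits.

    Base 2 is an assumption of this sketch; the slide's formula
    leaves the base of the logarithm open.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in the interval (0, 1]")
    return math.log2(1.0 / p)

# A certain event carries no information; rarer events carry more.
for p in (1.0, 0.5, 0.25):
    print(p, information_content(p))
```

A translation choice that is always made the same way (p = 1) therefore contributes no information, which is the sense in which standardization lowers entropy.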
Probabilistic analysis
The frequency of occurrence ($n_i$) of each linguistic form is associated with a category
A probability variable ($p_i$) is derived from the estimated proportion of a particular linguistic form
Probabilistic analysis
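In English:
$p_1 = p_{\mathrm{mv} \to \mathrm{shall}} = n_{\mathrm{shall}} / n$;
$p_2 = p_{\mathrm{mv} \to \mathrm{must}} = n_{\mathrm{must}} / n$;
$p_3 = p_{\mathrm{mv} \to \mathrm{should}} = n_{\mathrm{should}} / n$;
$p_4 = p_{\mathrm{mv} \to \mathrm{can}} = n_{\mathrm{can}} / n$;
$p_5 = p_{\mathrm{mv} \to \mathrm{may}} = n_{\mathrm{may}} / n$
In French, German and Italian:
$p_1 = p_{\mathrm{indicative}} + p_{\mathrm{mv}} + p_{\mathrm{vp}} + p_{\mathrm{me}} + p_{\mathrm{ellipses}}$;
$p_2 = p_{\mathrm{indicative}} + p_{\mathrm{mv}} + p_{\mathrm{vp}} + p_{\mathrm{me}} + p_{\mathrm{ellipses}}$
and so on.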
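The step from raw counts to probabilities can be sketched as follows. This is a minimal Python illustration rather than the authors' implementation: the helper name `relative_frequencies` is hypothetical, the counts are the French equivalents of shall from the Regulations table, and normalising by the 1382 occurrences of shall is a choice of denominator made only for this sketch.

```python
def relative_frequencies(counts: dict[str, int]) -> dict[str, float]:
    """Turn raw counts n_i into proportions p_i = n_i / n."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no observations")
    return {form: n / total for form, n in counts.items()}

# French equivalents of English "shall" in the Regulations (raw counts).
french_shall = {"indicative": 1223, "mv": 58, "vp": 8, "me": 43, "ellipsis": 50}

p = relative_frequencies(french_shall)
for form, value in p.items():
    print(f"{form:10s} {value:.4f}")
```

Dividing by a larger total, such as all modal verbs in the document, would give joint rather than within-category proportions.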
Linguistic forms and frequencies of occurrence in the EU Regulation for the selected categories of 1) constitutive norms and 2) permissions and authorizations

Form         ENGLISH   FRENCH   GERMAN   ITALIAN
a) Constitutive norms and performatives
Pres. Ind.   -         0.58     0.53     0.56
mv           0.655     0.03     0.04     0.04
vp           -         0.003    0.02     0.002
me           -         0.02     0.02     0.02
ellipses     -         0.02     0.02     0.02
c) Permission and authorization
Pres. Ind.   -         0.017    0.014    0.016
mv           0.14      0.118    0.117    0.116
vp           -         0        0.0004   0
me           -         0.0009   0.002    0.0028
ellipses     -         0.0028   0.004    0.004
Probabilistic approach
The sum of these probabilities produces different information values
The expected information content of a system is the sum of the information contents weighted by the probabilities for each possible outcome:
$$H = -\sum_{i=1}^{5} \left[ p_i \log_2(p_i) \right]$$
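The weighted sum can be written as a few lines of Python. This is a sketch under stated assumptions: the function name `entropy` is hypothetical, the example distribution is made up for illustration and is not taken from the corpus, and terms with a probability of zero are skipped (the usual convention that 0 log 0 = 0).

```python
import math
from typing import Iterable

def entropy(probabilities: Iterable[float]) -> float:
    """Shannon entropy H = -sum(p_i * log2(p_i)), in bits.

    Zero probabilities are skipped, following the 0 * log(0) = 0 convention.
    """
    return sum(p * math.log2(1.0 / p) for p in probabilities if p > 0)

# Made-up distribution over five linguistic forms (illustration only).
example = [0.5, 0.25, 0.125, 0.0625, 0.0625]
print(entropy(example))  # 1.875
```

The function does not check that its input sums to 1, so it can also be applied term by term to joint probabilities if that is what the analysis requires.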
Entropy: extrema
Variations in the language-specific p(i) values of linguistic forms produce distribution profiles reflecting the characteristics of the corresponding language.
Mathematically it can be shown that:
If all the p(i) values are equal (equi-probable situation), the profile is a uniform distribution and results in maximum entropy.
If only one p(i) is non-zero and the remaining p(i) values are zero, the entropy is minimum (e.g. English).
All other distributions lie between these two limits (e.g. French, German and Italian)
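For five linguistic forms the two limits can be written out explicitly. The worked step below is a sketch that assumes the p(i) are normalised to sum to 1 and that logarithms are taken in base 2.

```latex
% Uniform case: p_i = 1/5 for every i
H_{\max} = -\sum_{i=1}^{5} \tfrac{1}{5} \log_2 \tfrac{1}{5} = \log_2 5 \approx 2.32

% Degenerate case: p_k = 1 and p_i = 0 for i \neq k, with 0 \log_2 0 := 0
H_{\min} = -1 \cdot \log_2 1 = 0

% Every other distribution over five forms lies in between
0 \le H \le \log_2 5
```

The upper limit of about 2.32 bits is the value reported in these slides for the fictitious language, in which all five forms are equally frequent.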
A concrete example
Regulation document in English, French, German and Italian + a fictitious language.
One category of expression: e.g. the constitutive norms.
5 linguistic forms for this category.
Total number of modal verbs and alternatives: 2075.
Constitutive norm

Form   English   French   German   Italian   Fictitious
mv     1382      58       94       81        276
ind    0         1223     1125     1192      276
vp     0         8        71       6         276
me     0         43       46       51        276
el     0         50       46       52        276
Frequency of occurrences of expression modes in 4 real languages and one fictitious language
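The frequencies in this table can be fed directly into an entropy computation. The Python sketch below normalises each language column by its own total, which is an assumption of this illustration; the absolute values it prints therefore need not coincide with the entropies reported in the talk, which may rest on a different denominator. The function name `entropy_from_counts` is hypothetical.

```python
import math

# Raw counts per form in the order (mv, ind, vp, me, el), copied from the table.
counts = {
    "English":    [1382, 0, 0, 0, 0],
    "French":     [58, 1223, 8, 43, 50],
    "German":     [94, 1125, 71, 46, 46],
    "Italian":    [81, 1192, 6, 51, 52],
    "Fictitious": [276, 276, 276, 276, 276],
}

def entropy_from_counts(ns):
    """Shannon entropy in bits, normalising the counts by their own sum.

    Assumption of this sketch: each column is its own distribution.
    Forms with a count of zero contribute nothing to the sum.
    """
    total = sum(ns)
    return sum((n / total) * math.log2(total / n) for n in ns if n > 0)

entropies = {lang: entropy_from_counts(ns) for lang, ns in counts.items()}

# Print the languages from most to least standardized.
for lang, h in sorted(entropies.items(), key=lambda item: item[1]):
    print(f"{lang:10s} H = {h:.3f}")
```

Under this normalisation English comes out at exactly zero and the fictitious language at log2(5), with French, Italian and German in between, in that order.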
Histogram of 5 modes of expression
[Figure: histogram plot of frequencies (y-axis 0 to 1600) for English, French, German, Italian and Fictitious; legend: pi, mv, vp, me, el]
Comparison based on Entropy
Computed Entropy of Constitutive norm
EN: H = 0 + Hmv + 0 + 0 + 0 = 0.405
FR: H = Hind + Hmv + Hvp + Hme + Hel = 0.857
GE: H = Hind + Hmv + Hvp + Hme + Hel = 1.08
IT: H = Hind + Hmv + Hvp + Hme + Hel = 0.88
FI: H = Hind + Hmv + Hvp + Hme + Hel = 2.32
Computed Entropy of constitutive norms (English, French, German, Italian and Fictitious)
[Figure: bar chart, entropy of real and fictitious languages (constitutive norm); y-axis 0 to 2.5; bars for English, French, German, Italian, Fictitious]
Entropy results
1. In the EU Regulation according to the 5 categories of expression (1. Constitutive and performative norms, 2. Logical necessity, 3. Permissions and authorizations, 4. Capability, 5. Non-binding norms)
2. In the EU Secondary Legislation overall according to the 4 types of documents (Regulations, Decisions, Directives, Recommendations)
Entropy in the EU Regulation
[Figure: entropy (y-axis 0 to 1.2) per category of expression (const./perf. norms, logical necessity, perm./author., capability, non-binding norms); series: English, Italian, French, German]
Entropy results: EU Regulation
Logical necessity, permissions and authorizations, and capability (lower entropy)
Quite standardized in the 4 languages = almost equivalent translations
Constitutive performative norms (higher entropy)
Translation is more difficult to predict
Definitions, const. statements, obligations
FR: lower entropy than IT
DE: higher entropy (VP sein/haben…zu)
Entropy results: EU Regulation
Non-binding norms
A fair amount of variation among the 4 languages
FR/IT: higher entropy
DE: lower entropy (should is most likely translated with sollen, Soll-Vorschriften)
Entropy overall across the 4 EU document types
[Figure: entropy (y-axis 0 to 3.5) for English, Italian, French and German; series: Decisions, Directives, Regulations, Recommendations]
Entropy results: EU Secondary Legislation
Regulations and Decisions (lower entropy)
Direct applicability of the norms = more precision and standardization
FR looks more standardized than IT and DE
Directives (higher entropy than Reg. and Dec.)
Binding only as to the result to be achieved
Recommendations (higher entropy)
Non-binding: more freedom
DE: sollen
Conclusions
Given certain conditions, it is possible to predict with some certainty the occurrence of a particular factor
If applied to repetitive texts, entropy analysis can enhance research in language testing and evaluation, and in the development of automated translation tools
References
Austin, J. L. 1962. How to do things with words. Oxford: Oxford University Press.
Coode, G. 1843. Legislative Expressions. Appendix to the Report of the Poor Law Commissioners on Local Taxation. Published separately 1845, 2nd Ed. 1852.
Driedger, E. A. 1976. The Composition of Legislation. Legislative Forms and Precedents (2nd Ed.). Ottawa: The Department of Justice.
Shannon, C. and W. Weaver. 1963 (1949). The mathematical theory of communication. Urbana: University of Illinois Press.
Thornton, G. C. 1996. Legislative Drafting (4th Ed.). London: Butterworths.
http://publications.europa.eu/code/en/en-6000000.htm