Date post: | 03-Jan-2016 |
Upload: | nyssa-atkins |
The Construction of a Bilingual Knowledge Bank Based on a Bitext Synchronous Parsing Technique
Computer Aided Translation Unit
School of Computer Sciences
University Science Malaysia
Example-Based Machine Translation Based on the Synchronous SSTC Annotation Schema
Presentation Outline
Introduction
Structured String-Tree Correspondence (SSTC)
Synchronous Structured String-Tree Correspondence (SSTC)
EBMT based on synchronous SSTC
The Construction of a BKB Based on the Synchronous SSTC
Bitext Word-level Mapping (Word Alignment)
Bitext Synchronous Parsing Technique
The Structured String-Tree Correspondence (SSTC)
SSTC = string + arbitrary tree structure + correspondence
Correspondence = node(X/Y)
Tree: eat(2-3/0-4), with children cats(1-2/0-2) and mice(3-4/3-4); all(0-1/0-1) is the child of cats
String: 0 all 1 cats 2 eat 3 mice 4
For the node eat, X = 2-3 is the substring "eat" and Y = 0-4 is the substring "all cats eat mice".
X:SNODE = interval of the substring that corresponds to the node.
Y:STREE = interval of the substring that corresponds to the subtree having the node as root.
Example, for the node cats(1-2/0-2):
X:SNODE = 1-2, the substring "cats".
Y:STREE = 0-2, the substring "all cats".
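The SNODE/STREE definitions can be illustrated with a small sketch. This is not the authors' implementation: the `Node` class and the `stree` and `substring` helpers are hypothetical names, and the sketch assumes each subtree covers one contiguous interval (which holds for "all cats eat mice" but not for discontinuous nodes).

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Interval = Tuple[int, int]


@dataclass
class Node:
    label: str
    snode: Interval                      # X: substring covered by the node itself
    children: List["Node"] = field(default_factory=list)

    def stree(self) -> Interval:
        """Y: substring covered by the subtree rooted at this node.

        Assumption: the subtree covers a single contiguous interval.
        """
        spans = [self.snode] + [child.stree() for child in self.children]
        return (min(start for start, _ in spans), max(end for _, end in spans))


def substring(words: List[str], interval: Interval) -> str:
    """Return the words between the two offsets of an interval."""
    start, end = interval
    return " ".join(words[start:end])


words = ["all", "cats", "eat", "mice"]
all_ = Node("all", (0, 1))
cats = Node("cats", (1, 2), [all_])
mice = Node("mice", (3, 4))
eat = Node("eat", (2, 3), [cats, mice])

print(cats.snode, cats.stree(), substring(words, cats.stree()))
```

Here STREE is derived from the tree instead of being stored, which keeps the two intervals consistent in the contiguous case.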
Synchronous SSTC: English and Malay
English source sentence: "he picks the ball up"
Malay target sentence: "dia kutip bola itu"
English tree (E): pick[v] up[p](1-2+4-5/0-5), he[n](0-1/0-1), ball[n](3-4/2-4), the[det](2-3/2-3)
English string: 0 he 1 pick 2 the 3 ball 4 up 5
Malay tree (M): kutip[v](1-2/0-4), dia[n](0-1/0-1), bola[n](2-3/2-4), itu[det](3-4/3-4)
Malay string: 0 dia 1 kutip 2 bola 3 itu 4
Translation units:
IndexStree: (0-5,0-4) (0-1,0-1) (2-4,2-4) (2-3,3-4)
IndexSnode: (1-2+4-5,1-2) (0-1,0-1) (3-4,2-3) (2-3,3-4)
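As a rough illustration of how the translation units can be read off the two indexes, the sketch below stores each index as a mapping from a source interval to a target interval. The helper names `iv`, `text_of` and `translation_units` are hypothetical; the interval pairs are the ones listed for "he pick the ball up" / "dia kutip bola itu".

```python
def iv(text):
    """Parse an interval such as '1-2+4-5' into ((1, 2), (4, 5))."""
    return tuple(tuple(int(n) for n in piece.split("-")) for piece in text.split("+"))


def text_of(words, interval):
    """Join the words covered by a (possibly discontinuous) interval."""
    return " ".join(w for start, end in interval for w in words[start:end])


english = ["he", "pick", "the", "ball", "up"]
malay = ["dia", "kutip", "bola", "itu"]

# Node-level correspondences (IndexSnode) and subtree-level ones (IndexStree).
index_snode = {
    iv("1-2+4-5"): iv("1-2"),
    iv("0-1"): iv("0-1"),
    iv("3-4"): iv("2-3"),
    iv("2-3"): iv("3-4"),
}
index_stree = {
    iv("0-5"): iv("0-4"),
    iv("0-1"): iv("0-1"),
    iv("2-4"): iv("2-4"),
    iv("2-3"): iv("3-4"),
}


def translation_units(index, src_words, tgt_words):
    """List (source text, target text) pairs for every entry of an index."""
    return [(text_of(src_words, s), text_of(tgt_words, t)) for s, t in index.items()]


for pair in translation_units(index_snode, english, malay):
    print(pair)
```

Representing an interval as a tuple of pieces is one simple way to keep discontinuous units such as "pick ... up" (1-2+4-5) in the same structure as contiguous ones.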
Synchronous SSTC: English and French
English source sentence: "I did not give it to him"
French target sentence: "Je ne le lui ai pas donné"
English tree (E): not[neg](2-3/0-7), did[v] give[v](1-2+3-4/3-7), I[n](0-1/0-1), it[n](4-5/4-5), to[p](5-6/5-7), him[n](6-7/6-7)
English string: 0 I 1 did 2 not 3 give 4 it 5 to 6 him 7
French tree (F): ne[neg] pas[neg](1-2+5-6/0-7), ai[v] donné[v](4-5+6-7/0-1+2-5+6-7), Je[n](0-1/0-1), le[n](2-3/2-3), lui[n](3-4/3-4)
French string: 0 Je 1 ne 2 le 3 lui 4 ai 5 pas 6 donné 7
Translation units (IndexStree and IndexSnode entries):
(0-7,0-7) (0-2+3-7, 0-1+2-5+6-7) (0-1,0-1)
(2-3, 1-2+5-6) (1-2+3-4, 4-5+6-7) (0-1,0-1) (4-5,2-3) (5-6, - ) (6-7,3-4)
Synchronous SSTC: English and French
English source sentence: "hopefully Kim miss Dale"
French target sentence: "on espère que Dale manque à Kim"
English tree (E): miss[v](2-3/0-4), hopefully[adv](0-1/0-1), Kim[n](1-2/1-2), Dale[n](3-4/3-4)
English string: 0 hopefully 1 Kim 2 miss 3 Dale 4
French tree (F): manque[v] à[p](4-5+5-6/0-7), on[n] espère[v] que[c](0-1+1-2+2-3/0-3), Dale[n](3-4/3-4), Kim[n](6-7/6-7)
French string: 0 on 1 espère 2 que 3 Dale 4 manque 5 à 6 Kim 7
Translation units:
IndexStree: (0-1,0-3) (1-2,6-7) (0-4,0-7) (3-4,3-4)
IndexSnode: (3-4,3-4) (2-3,4-5+5-6) (1-2,6-7) (0-1,0-1+1-2+2-3)
Example-Based Machine Translation (EBMT)
EBMT is the case-based reasoning approach to MT
EBMT uses translated examples of similar sentences to translate a given Source sentence into the target sentence.
The general architecture for EBMT
[Figure: the source sentence is matched against the source-language side of the BKB to find the closest related SL examples; the corresponding TL examples are retrieved from the target-language side through the correspondence; combination then produces the target sentence.]
Different senses for the word "bank":
bank 1: the land beside a river.
bank 2: a place to keep money.
E.g.: The(1) man(2) keep(1) his(1) money(1) in(1) the(1) bank(2).
EBMT based on synchronous SSTC.
[Figure: the source sentence goes through the tagger to give the tagged source sentence, from which a list of sub-synchronous SSTCs is generated. A closest synchronous SSTC example is chosen from the BKB, and a list of sub-synchronous SSTCs is constructed from the chosen example. Replacement and combination give the resultant synchronous SSTC, from which the target sentence is produced.]
Example-base:
(1) English sentence: He pick the ball up. Malay translation: Dia kutip bola itu.
(2) English sentence: The lamp is off. Malay translation: Lampu itu padam.
(3) English sentence: The green signal turn on. Malay translation: Isyarat hijau itu bertukar.
(4) English sentence: The old man drink tea. Malay translation: Lelaki tua itu minum teh.
Source sentence: The old man picks the green lamp up
A set of synchronous SSTCs represents the example-base.
Example (2)
English sentence: The lamp is off.
Malay translation: Lampu itu padam.
English tree (2E): is[v](2) off(1)[adv](2-3+3-4/0-4), lamp(1)[n](1-2/0-2), the(1)[det](0-1/0-1)
English string: 0 the 1 lamp 2 is 3 off 4
Malay tree (2M): padam(1)[v](2-3/0-3), lampu(1)[n](0-1/0-2), itu(1)[det](1-2/1-2)
Malay string: 0 lampu 1 itu 2 padam 3
IndexStree: (0-4,0-4) (0-2,0-2) (0-1,1-2)
IndexSnode: (2-3+3-4,2-3) (1-2,0-1) (0-1,1-2)
Example (1)
English sentence: He pick the ball up.
Malay translation: Dia kutip bola itu.
English tree (1E): pick(1)[v] up(1)[p](1-2+4-5/0-5), he(1)[n](0-1/0-1), ball(1)[n](3-4/2-4), the(1)[det](2-3/2-3)
English string: 0 he 1 pick 2 the 3 ball 4 up 5
Malay tree (1M): kutip(1)[v](1-2/0-4), dia(1)[n](0-1/0-1), bola(1)[n](2-3/2-4), itu(1)[det](3-4/3-4)
Malay string: 0 dia 1 kutip 2 bola 3 itu 4
IndexStree: (0-5,0-4) (0-1,0-1) (2-4,2-4) (2-3,3-4)
IndexSnode: (1-2+4-5,1-2) (0-1,0-1) (3-4,2-3) (2-3,3-4)
Example (3)
English sentence: The green signal turn on.
Malay translation: Isyarat hijau itu bertukar.
English tree (3E): turn(1)[v] on(1)[adv](3-4+4-5/0-5), signal(2)[n](2-3/0-3), the(1)[det](0-1/0-1), green(1)[adj](1-2/1-2)
English string: 0 the 1 green 2 signal 3 turn 4 on 5
Malay tree (3M): bertukar(2)[v](3-4/0-4), isyarat(1)[n](0-1/0-3), hijau(1)[adj](1-2/1-2), itu(1)[det](2-3/2-3)
Malay string: 0 isyarat 1 hijau 2 itu 3 bertukar 4
IndexStree: (0-5,0-4) (0-3,0-3) (0-1,2-3) (1-2,1-2)
IndexSnode: (3-4+4-5,3-4) (2-3,0-1) (0-1,2-3) (1-2,1-2)
Example (4)
English sentence: The old man drinks tea.
Malay translation: Lelaki tua itu minum teh.
English tree (4E): drink(1)[v](3-4/0-5), man(1)[n](2-3/0-3), the(1)[det](0-1/0-1), old(1)[adj](1-2/1-2), tea(1)[n](4-5/4-5)
English string: 0 the 1 old 2 man 3 drink 4 tea 5
Malay tree (4M): minum(1)[v](3-4/0-5), lelaki(1)[n](0-1/0-3), tua(1)[adj](1-2/1-2), itu(1)[det](2-3/2-3), teh(1)[n](4-5/4-5)
Malay string: 0 lelaki 1 tua 2 itu 3 minum 4 teh 5
IndexStree: (0-5,0-5) (0-3,0-3) (0-1,2-3) (1-2,1-2) (4-5,4-5)
IndexSnode: (3-4,3-4) (2-3,0-1) (0-1,2-3) (1-2,1-2) (4-5,4-5)
Source: the old man picks the green lamp up
[Figure: the source sentence is compared with the English sides of examples (1) to (4). The matched words are highlighted in the examples: "pick ... up" in (1), "lamp" in (2), "green" in (3), and "the old man" in (4).]
Sub-synchronous SSTCs for the source sentence
Source tree: pick[v](3-4/0-8), up[p](7-8/-), man[n](2-3/0-3), the[det](0-1/0-1), old[adj](1-2/1-2), lamp[n](6-7/4-7), the[det](4-5/4-5), green[adj](5-6/5-6)
(1) English: man(1)[n](2-3/0-3), the(1)[det](0-1/0-1), old(1)[adj](1-2/1-2); string 0 the 1 old 2 man 3
Malay: lelaki(1)[n](0-1/0-3), tua(1)[adj](1-2/1-2), itu(1)[det](2-3/2-3); string 0 lelaki 1 tua 2 itu 3
IndexStree: (0-3,0-3) (0-1,2-3) (1-2,1-2)
IndexSnode: (2-3,0-1) (0-1,2-3) (1-2,1-2)
(2) English: pick(1)[v](3-4/3-4); string 3 pick 4
Malay: kutip(1)[v](3-4/3-4); string 3 kutip 4
IndexStree: (3-4,3-4)
IndexSnode: (3-4,3-4)
(3) English: lamp(1)[n](6-7/4-7), the(1)[det](4-5/4-5), green(1)[adj](5-6/5-6); string 4 the 5 green 6 lamp 7
Malay: lampu(1)[n](4-5/4-7), hijau(1)[adj](5-6/5-6), itu(1)[det](6-7/6-7); string 4 lampu 5 hijau 6 itu 7
IndexStree: (4-7,4-7) (4-5,6-7) (5-6,5-6)
IndexSnode: (6-7,4-5) (4-5,6-7) (5-6,5-6)
(4) English: up(1)[p](7-8/7-8); string 7 up 8
IndexStree: (7-8,-)
IndexSnode: (7-8,-)
Selected closest example
Example (1): English sentence "He pick the ball up." Malay translation "Dia kutip bola itu."
English tree (1E): pick(1)[v] up(1)[p](1-2+4-5/0-5), he(1)[n](0-1/0-1), ball(1)[n](3-4/2-4), the(1)[det](2-3/2-3)
Malay tree (1M): kutip(1)[v](1-2/0-4), dia(1)[n](0-1/0-1), bola(1)[n](2-3/2-4), itu(1)[det](3-4/3-4)
IndexStree: (0-5,0-4) (0-1,0-1) (2-4,2-4) (2-3,3-4)
IndexSnode: (1-2+4-5,1-2) (0-1,0-1) (3-4,2-3) (2-3,3-4)
Sub-synchronous SSTCs derived from the example
(1) English: he(1)[n](0-1/0-1); string 0 he 1
Malay: dia(1)[n](0-1/0-1); string 0 dia 1
IndexStree: (0-1,0-1)
IndexSnode: (0-1,0-1)
(2) English: pick(1)[v](1-2/0-5); string 1 pick 2
Malay: kutip(1)[v](1-2/0-4); string 1 kutip 2
IndexStree: (0-5,0-4)
IndexSnode: (1-2,1-2)
(3) English: ball(1)[n](3-4/2-4), the(1)[det](2-3/2-3); string 2 the 3 ball 4
Malay: bola(1)[n](2-3/2-4), itu(1)[det](3-4/3-4); string 2 bola 3 itu 4
IndexStree: (2-4,2-4) (2-3,3-4)
IndexSnode: (2-3,3-4) (3-4,2-3)
(4) English: up(1)[p](4-5/-); string 4 up 5
IndexStree: (-,-)
IndexSnode: (4-5,-)
Replacement
Chosen example (1E / 1M):
English string: he pick the ball up (0-1 1-2 2-3 3-4 4-5)
English tree: pick(1)[v] up(1)[p](1-2+4-5/0-5), he(1)[n](0-1/0-1), ball(1)[n](3-4/2-4), the(1)[det](2-3/2-3)
Malay string: dia kutip bola itu (0-1 1-2 2-3 3-4)
Malay tree: kutip(1)[v](1-2/0-4), dia(1)[n](0-1/0-1), bola(1)[n](2-3/2-4), itu(1)[det](3-4/3-4)
IndexStree: (0-5,0-4) (0-1,0-1) (2-4,2-4) (2-3,3-4)
IndexSnode: (1-2+4-5,1-2) (0-1,0-1) (3-4,2-3) (2-3,3-4)
Replacement (2), English | Malay
Example part: pick(1)[v](1-2/0-5) | kutip(1)[v](1-2/0-4); IndexStree (0-5,0-4); IndexSnode (1-2,1-2)
Source part: pick(1)[v](3-4/3-4) | kutip(1)[v](3-4/3-4); IndexStree (3-4,3-4); IndexSnode (3-4,3-4)
After replacement (2), the verb nodes of the example carry the source intervals: pick(1)[v] up(1)[p](3-4+4-5/3-4) and kutip(1)[v](3-4/3-4). The other nodes and the strings (0 he 1 pick 2 the 3 ball 4 up 5; 0 dia 1 kutip 2 bola 3 itu 4) are unchanged.
Replacement (1), English | Malay
Example part: he(1)[n](0-1/0-1) | dia(1)[n](0-1/0-1); IndexStree (0-1,0-1); IndexSnode (0-1,0-1)
Source part: man(1)[n](2-3/0-3), the(1)[det](0-1/0-1), old(1)[adj](1-2/1-2) | lelaki(1)[n](0-1/0-3), tua(1)[adj](1-2/1-2), itu(1)[det](2-3/2-3); IndexStree (0-3,0-3) (0-1,2-3) (1-2,1-2); IndexSnode (2-3,0-1) (0-1,2-3) (1-2,1-2)
The particle interval is updated in the same way, giving the English root node pick(1)[v] up(1)[p](3-4+7-8/3-4).
After replacement (1):
English string: 0 the 1 old 2 man 3 pick 4 the 5 ball 6 up 7
Malay string: 0 lelaki 1 tua 2 itu 3 kutip 4 bola 5 itu 6
The fragment "the ball" / "bola itu" is then replaced by the source fragment lamp(1)[n], the(1)[det], green(1)[adj] / lampu(1)[n], hijau(1)[adj], itu(1)[det].
Resultant synchronous SSTC:
English (1E): the old man pick the green lamp up (0-1 1-2 2-3 3-4 4-5 5-6 6-7 7-8), root pick(1)[v] up(1)[p](3-4+7-8/0-8)
Malay (1M): lelaki tua itu kutip lampu hijau itu (0-1 1-2 2-3 3-4 4-5 5-6 6-7)
Generation
lelaki tua itu kutip lampu hijau itu
The translation for the source sentence is generated from the Malay part of the synchronous SSTC, namely the string in the SSTC.
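The generation step can be sketched as reading the target string back from the SNODE intervals of the Malay nodes. This is a minimal illustration under the assumption that every target node covers one contiguous interval; `generate` and `malay_nodes` are hypothetical names, and the intervals are those of the resultant Malay string.

```python
def generate(target_nodes):
    """Order the target words by the start offset of their SNODE interval.

    target_nodes: list of (word, (start, end)) pairs taken from the target tree.
    """
    ordered = sorted(target_nodes, key=lambda item: item[1][0])
    return " ".join(word for word, _ in ordered)


# Nodes of the resultant Malay tree, listed in tree order rather than string order.
malay_nodes = [
    ("kutip", (3, 4)),
    ("lelaki", (0, 1)),
    ("itu", (2, 3)),
    ("tua", (1, 2)),
    ("lampu", (4, 5)),
    ("itu", (6, 7)),
    ("hijau", (5, 6)),
]

print(generate(malay_nodes))
```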
EBMT General Problems
How to utilize more than one example to translate one source sentence.
Lack of flexibility in representing translation relations between source and target substrings.
The construction of well-formed target language sentences from extracted fragments of a BKB.
The treatment of wild linguistic phenomena, which are non-standard, e.g. crossed dependencies.
Our approach overcomes these problems.
Transfer Approach to MT
[Figure: Source, Analysis, transfer, Synthesis, Target.]
How to Construct the Bilingual Knowledge Bank (BKB), or Example-Base
Substantial Reservation !!!
BiText: Text that is available in two languages.
The Construction of a BKB Based on the Synchronous SSTC
S (English): The basic idea of example-based parsing is very simple: it is to find the corresponding representation for an input sentence based on the representations of similar sentences in the example-base.
T (Malay): Idea asas bagi penghuraian berasaskan-contoh adalah mudah: iaitu untuk mencari perwakilan yang sepadan bagi suatu ayat input berdasarkan perwakilan ayat yang serupa dalam pengkalan-contoh.
Based on Bitext Synchronous Parsing Technique
Schema
[Figure: the bitext (English source, Malay target) goes through the alignment process at sentence level, phrase level and word level, using a bilingual dictionary. The Apple Pie Parser performs parsing and POS tagging for the English source text, giving bracketed output of the form ( S ( NP ... ) ( VP ... ) ). The APP output is compiled into an SSTC for the English source text. The SSTC for the Malay target text is built from the SSTC for the English source text using the word alignment. The resulting synchronous SSTC is checked in the SSTC editor and stored in the BKB.]
Bitext Word-level Mapping (Word Alignment)
Real texts are noisy:
- Fertility: a single word in the source sentence may correspond to zero, one, two or more words in the target sentence, and vice versa.
- Crossed dependencies (distortion): human translators change and rearrange material, so the target text does not follow the order of the source text.
[Figure: two plots of the word mapping between the source (x-axis, positions 0 to about 520) and the target (y-axis, positions 0 to 450).]
±n Context Window Word Alignment
S (English): 0 The 1 basic 2 idea 3 of 4 example 5 - 6 based 7 parsing 8 is 9 very 10 simple 11 : 12 It 13 is 14 to 15 find 16 the 17 corresponding 18 representation 19 for 20 an 21 input 22 sentence 23 based 24 on 25 the 26 representations 27 of 28 similar 29 sentences 30 in 31 the 32 example 33 - 34 base 35 . 36
T (Malay): 0 Idea 1 asas 2 bagi 3 penghuraian 4 berasaskan 5 - 6 contoh 7 adalah 8 mudah 9 : 10 Iaitu 11 untuk 12 mencari 13 perwakilan 14 yang 15 sepadan 16 bagi 17 suatu 18 ayat 19 input 20 berdasarkan 21 perwakilan 22 ayat 23 yang 24 serupa 25 dalam 26 pengkalan 27 - 28 contoh 29 . 30
The correspondence between the source and the target is denoted by an interval attached to each subtext according to its offset in the text.
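The interval numbering can be reproduced with a short sketch. It assumes that words and punctuation marks (including the hyphen in "example-based") each occupy one position, as in the numbered strings above; `index_tokens` is a hypothetical helper, not part of the described system.

```python
import re


def index_tokens(text):
    """Split text into words and punctuation and attach (start, end) offsets."""
    tokens = re.findall(r"\w+|[^\w\s]", text)
    return [(tok, (i, i + 1)) for i, tok in enumerate(tokens)]


source = index_tokens("The basic idea of example-based parsing is very simple:")
target = index_tokens("Idea asas bagi penghuraian berasaskan-contoh adalah mudah:")

for token, (start, end) in source:
    print(f"{token}({start}-{end})")
```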
±n Context Window Word Alignment
Find the TPCs between the source and the target, using:
Bilingual dictionary
Cognate words, e.g. Computer / Komputer
Dice coefficient: Dice = 2 prob(S,T) / [prob(S) + prob(T)]
- prob(S), prob(T): the probabilities of S and T to occur in the text.
- prob(S,T): the probability of both to co-occur in the same bitext segment.
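A minimal sketch of the Dice coefficient follows. It assumes the probabilities are estimated as relative frequencies over aligned bitext segments; the `dice` function and the toy segments are made up for illustration and are not taken from the presentation.

```python
def dice(source_word, target_word, segments):
    """Dice = 2 * prob(S,T) / (prob(S) + prob(T)), estimated over segments.

    segments: list of (source_tokens, target_tokens) aligned bitext segments.
    """
    n = len(segments)
    s_count = sum(1 for s, _ in segments if source_word in s)
    t_count = sum(1 for _, t in segments if target_word in t)
    both = sum(1 for s, t in segments if source_word in s and target_word in t)
    if s_count + t_count == 0:
        return 0.0                      # neither word occurs (also covers n == 0)
    p_s, p_t, p_st = s_count / n, t_count / n, both / n
    return 2 * p_st / (p_s + p_t)


# Toy segments, invented for this example only.
segments = [
    (["the", "example"], ["contoh", "itu"]),
    (["an", "example"], ["suatu", "contoh"]),
    (["the", "idea"], ["idea", "itu"]),
]

print(dice("example", "contoh", segments))
print(dice("example", "itu", segments))
```

A pair that always co-occurs scores 1, and a pair that never co-occurs scores 0, so the coefficient can be used to rank candidate pairs.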
±n Context Window Word Alignment
Find out the chains for all possible TPCs for a source word.
Source word: example(4-5). Candidate TPCs: contoh(6-7) and contoh(28-29).
Source context: basic(1-2) idea(2-3) of(3-4) example(4-5) -(5-6) based(6-7) parsing(7-8)
Chain for contoh(6-7): bagi(2-3) penghuraian(3-4) berasaskan(4-5) -(5-6) contoh(6-7)
Chain for contoh(28-29): -(27-28) contoh(28-29)
±n Context Window Word Alignment
For every chain, calculate the weight W:
$$ w = \frac{\mathrm{len}(seq)}{\mathrm{len}(gap) + 1} \times \log\bigl(\mathrm{len}(chain)\bigr) $$
len(seq): length of the continuous sequence of words.
len(gap): length of the gaps between the words in the chain.
len(chain): length of the chain.
Example(4-5): contoh(6-7) has W = 1.39; contoh(28-29) has W = 0.60.
Bitext Synchronous Parsing Technique
The basic idea of example-based parsing is very simple
Idea asas bagi penghuraian berasaskan – contoh adalah mudah
Apple Pie Parser (APP)
It is a bottom-up probabilistic chart parser that finds the parse tree for an input text (English).
It was developed at New York University.
It is free and available to download with the source code.
The parser generates a syntactic tree in Penn Treebank bracketing.
http://cs.nyu.edu/cs/projects/proteus/sekine
Apple Pie Parser (APP)
Input: The basic idea of example-based parsing is very simple
APP output: (S (NP (NPL The basic idea) (PP of (NPL example-based parsing))) (VP is (ADJP very simple)))
The representation structure and the POS for the English source are obtained.
Compile the APP output to SSTC structure
APP output: (S (NP (NPL The basic idea) (PP of (NPL example-based parsing))) (VP is (ADJP very simple)))
String: 0 the 1 basic 2 idea 3 of 4 example 5 - 6 based 7 parsing 8 is 9 very 10 simple 11
Tree:
S(Ø/0-11)
  NP(Ø/0-8)
    NPL(1)(Ø/0-3): The basic idea(0-3/0-3)
    PP(1)(Ø/3-8): of(3-4/3-4)
      NPL(1)(Ø/4-8): Example-based parsing(4-8/4-8)
  VP(Ø/8-11): is(8-9/8-9)
    ADJP(1)(Ø/9-11): Very simple(9-11/9-11)
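The compilation step can be sketched as a small bracket parser that assigns an STREE interval to every phrase node. This is an illustration only, not the authors' compiler: `parse`, `collect` and `stree_intervals` are hypothetical names, and the sketch assumes that a hyphenated word such as "example-based" occupies three positions, as in the numbered string.

```python
import re


def parse(bracketed):
    """Parse '(S (NP ...) ...)' into nested (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", bracketed)
    pos = 0

    def read():
        nonlocal pos
        pos += 1                         # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                # Split "example-based" into "example", "-", "based".
                children.extend(re.findall(r"\w+|[^\w\s]", tokens[pos]))
                pos += 1
        pos += 1                         # skip ")"
        return (label, children)

    return read()


def collect(tree, start, out):
    """Append (label, (start, end)) for each phrase node in pre-order; return end."""
    label, children = tree
    slot = len(out)
    out.append(None)                     # reserve the pre-order position
    end = start
    for child in children:
        if isinstance(child, tuple):
            end = collect(child, end, out)
        else:
            end += 1                     # each leaf token occupies one position
    out[slot] = (label, (start, end))
    return end


def stree_intervals(bracketed):
    out = []
    collect(parse(bracketed), 0, out)
    return out


app_output = ("(S (NP (NPL The basic idea) (PP of (NPL example-based parsing))) "
              "(VP is (ADJP very simple)))")
for label, (start, end) in stree_intervals(app_output):
    print(f"{label}(Ø/{start}-{end})")
```

The phrase nodes have no SNODE of their own (shown as Ø), so only the STREE intervals are computed here.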
Lexical Transfer
English: The basic idea of example-based parsing is very simple
Malay: Idea asas bagi penghuraian berasaskan - contoh adalah mudah
English string: 0 the 1 basic 2 idea 3 of 4 example 5 - 6 based 7 parsing 8 is 9 very 10 simple 11
English tree:
S(Ø/0-11)
  NP(Ø/0-8)
    NPL(1)(Ø/0-3): The basic idea(0-3/0-3)
    PP(1)(Ø/3-8): of(3-4/3-4)
      NPL(1)(Ø/4-8): Example-based parsing(4-8/4-8)
  VP(Ø/8-11): is(8-9/8-9)
    ADJP(1)(Ø/9-11): Very simple(9-11/9-11)
Malay string: 0 idea 1 asas 2 bagi 3 penghuraian 4 berasaskan 5 - 6 contoh 7 adalah 8 mudah 9
Malay tree:
S(Ø/0-9)
  NP(Ø/0-7)
    NPL(1)(Ø/0-2): Idea asas(0-2/0-2)
    PP(1)(Ø/2-7): bagi(2-3/2-3)
      NPL(1)(Ø/3-7): Penghuraian berasaskan-contoh(3-7/3-7)
  VP(Ø/7-9): adalah(7-8/7-8)
    ADJP(1)(Ø/8-9): mudah(8-9/8-9)
The synchronous SSTC editor.
[Figure: screenshot of the editor (menus: File, Edit, Correspondences, Windows) showing the English string and tree next to the Malay string and tree. The corresponding substrings "example - based parsing" (4-8) and "penghuraian berasaskan - contoh" (3-7) are highlighted.]
Discussion
Thank you.