Automatically Aligning English-German Parallel Texts at sentence level using linguistic knowledge
PICS
Volume 2-2004  
Series title: PICS Publications of the Institute of Cognitive Science
Volume: 2-2004
Date: September 2004
Cover design: Thorsten Hinrichs
© Institute of Cognitive Science
 
[Figure: overview of the alignment system, with processing steps numbered 1 to 13 in the original diagram. Component labels: English parallel text half (original format); plain English parallel text half; plain German parallel text half; bitext halves sentence per sentence, lemmatized, tagged (segmentation marks after headlines); English dictionary half; German dictionary half; parsing, splitting dictionary into further specialised DEPs; rejoin tagged, lemmatized halves, set up dictionary object in memory; decomposition of German compounds; find "synonyms" (PMS); find transformations from N, ADJ, V, ADV to N, ADJ, V, ADV (e.g. Lösung -> solve (stem)); cognate matching (trigrams, LCSR); alignment and distance; distance measure; result and postprocessing; aligned parallel text post editor.]
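The diagram lists cognate matching based on character trigrams and LCSR as one of the lexical cues for alignment. The following is a minimal Python sketch of those two string-similarity measures, assuming LCSR is the usual longest common subsequence ratio (LCS length divided by the length of the longer word) and assuming the trigram cue is a Dice coefficient over padded character trigrams. The function names and the thresholds `lcsr_min=0.58` and `dice_min=0.5` are hypothetical illustrative choices, not the settings used in this report.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcsr(a: str, b: str) -> float:
    """Longest common subsequence ratio: LCS length over the longer word's length."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))


def trigrams(word: str) -> set[str]:
    """Character trigrams, padded with '#' so word boundaries also count."""
    padded = f"##{word}##"
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def trigram_dice(a: str, b: str) -> float:
    """Dice coefficient over the two words' trigram sets (assumed measure)."""
    ta, tb = trigrams(a), trigrams(b)
    return 2 * len(ta & tb) / (len(ta) + len(tb))


def is_cognate(word_a: str, word_b: str,
               lcsr_min: float = 0.58, dice_min: float = 0.5) -> bool:
    """Hypothetical decision rule: cognate if either measure clears its threshold."""
    a, b = word_a.casefold(), word_b.casefold()
    return lcsr(a, b) >= lcsr_min or trigram_dice(a, b) >= dice_min


if __name__ == "__main__":
    for en, de in [("house", "Haus"), ("solve", "Lösung")]:
        a, b = en.casefold(), de.casefold()
        print(en, de, round(lcsr(a, b), 2), round(trigram_dice(a, b), 2), is_cognate(en, de))
```

"Haus"/"house" clears the LCSR threshold (3 of 5 characters in order), while "Lösung"/"solve" shares almost no surface material; pairs of the second kind would have to come from a separate resource, presumably the transformation step the diagram lists alongside cognate matching.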