Date posted: 08-Aug-2015
Category: Documents
Uploaded by: hind-abdulkhaleq
Arabic MT
In the name of God, the Most Gracious, the Most Merciful
Faculty of Computers and Information
July 2007
Agenda
• Problem Definition
• Machine Translation
• Challenge of Arabic NLP/MT
• Data Representation and processing
• Evaluation
• Application
Problem Definition
An attempt to automate part of the process of translation from Arabic to English by applying suggested Arabic NLP paradigms
[Figure: the Arabic MT system (Lexicon, Parser, Dictionary) takes the input "شرح المعلم الدرس", glossed word for word as "teacher explain lesson", and produces the output "The teacher explained the lesson".]
Giving Credit where Credit is Due
• Most of the MT notes are taken from the book
"Machine Translation: An Introductory Guide"
by Douglas Arnold, Lorna Balkan, Siety Meijer, R. Lee Humphreys, and Louisa Sadler
Department of Language & Linguistics, University of Essex
Machine Translation
• Importance
• Levels of Human involvement
• Translation Engines
• Dictionaries
• Problems
MT Importance
• Social or political: the need for translation arises in communities where more than one language is generally spoken.
• Commercial: buyers choose between products whose instruction manuals are written in different languages (English, Arabic, …).
MT Importance (2)
• Scientific: MT is a direct application and testing ground for many ideas in Computer Science, Artificial Intelligence, and Linguistics; some of the most important developments in these fields began in MT!
MT Importance (3)
• Philosophical: "getting the correct translation of 'negatively charged electrons and protons' into French depends on knowing that protons are positively charged, so the interpretation cannot be something like 'negatively charged electrons and negatively charged protons'."
Human Involvement Levels
• CAT: Computer-Aided Translation
• HAMT: Human-Aided Machine Translation
• MAHT: Machine-Aided Human Translation
• FAHQT: Fully Automatic High-Quality Translation
MT engines
•Direct Architectures
•Transformer Architectures
•Linguistic Knowledge Architectures
• The last two architectures are described below
MT engines
•Transformer Architectures
Main idea: input sentences can be transformed into output sentences by applying the simplest possible parse, replacing source words with their target-language equivalents as specified in a bilingual dictionary, and then rearranging their order to suit the rules of the target language.
•Transformer Architectures (cont.)
[Figure: Source Text → SOURCE PARSER (uses a lexicon and a small grammar to produce a source structure) → SOURCE-TARGET TRANSFORMER (source-to-target transformation rules successively transform the source structure into the target structure) → Target Text]
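The three stages named above (simple parse, dictionary replacement, reordering) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the word tags, the two small dictionaries, and the single reordering rule are hypothetical toy data, not taken from any real system.

```python
# Toy data: every entry below is an illustrative assumption.
SOURCE_LEXICON = {"شرح": "V", "المعلم": "N", "الدرس": "N"}
BILINGUAL_DICT = {"شرح": "explain", "المعلم": "teacher", "الدرس": "lesson"}

# One transformation rule: a source tag pattern mapped to the order in which
# its positions are emitted (verb-subject-object -> subject-verb-object).
REORDER_RULES = {("V", "N", "N"): (1, 0, 2)}


def parse(words):
    """Simplest possible parse: attach a part-of-speech tag to each word."""
    return [(word, SOURCE_LEXICON[word]) for word in words]


def replace_words(structure):
    """Replace each source word with its target-language equivalent."""
    return [(BILINGUAL_DICT[word], tag) for word, tag in structure]


def reorder(structure):
    """Rearrange the words if a rule matches the tag pattern."""
    tags = tuple(tag for _, tag in structure)
    order = REORDER_RULES.get(tags, tuple(range(len(tags))))
    return [structure[i] for i in order]


def translate(sentence):
    structure = parse(sentence.split())
    return " ".join(word for word, _ in reorder(replace_words(structure)))


print(translate("شرح المعلم الدرس"))  # teacher explain lesson
```

The output is still uninflected; producing "The teacher explained the lesson" needs additional morphological rules.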
MT engines (3)
•Linguistic Knowledge Architectures
Indirect or linguistic knowledge (LK) architecture
High quality MT requires linguistic knowledge of both the source and the target languages as well as the differences between them.
MT engines (3)
•Linguistic Knowledge Architectures (2)
Requirements
• Substantial grammars of both the source language and the target language.
• An additional comparative grammar
•Linguistic Knowledge Architectures (3)
[Figure: Source Text → ANALYSIS (source language grammars parse and analyze the input to produce an interlingua representation) → Interlingua Representation → SYNTHESIS (target language grammar generates target language output from the interlingua) → Target Text]
Dictionaries
•Dictionaries are the largest components of an MT system in terms of the amount of information they hold ... and may well be the most expensive components to construct.
•Their size and quality limit the scope, coverage, and quality of translation that can be expected.
Dictionaries (2)
•End users can make some additions to system dictionaries
•A basic understanding of dictionary construction and sensitivity to the issues involved in ‘describing words’ is an important asset.
Dictionaries (3)
•Morphology
Recognize the internal structure of words:
•Inflectional: inflected words (e.g. walk, walks);
•Derivational: derived words (e.g. grammar → grammatical, grammatical → grammaticality);
•Compounding (e.g. buttonhole).
MT Problems
• Ambiguity Problems
• Lexical ambiguity
• Structural ambiguity
• Problems arising from differences between languages
• Multiword units: Idioms and Collocations
MT Problems (2)
• Ambiguity Problems
• Lexical ambiguity
A word that has more than one meaning
• Structural ambiguity
A phrase or sentence that has more than one structure
MT Problems (3)
Problems arising from differences between languages
• Examples:
• Lexical holes, where one language has to use a phrase to express what another language expresses in a single word.
• Structural order
English: [subject] [verb]
Arabic: [verb] [subject]
Challenge of Arabic NLP/MT
• Arabic NLP is a relatively new area of research
• Colloquial Arabic and Modern Standard Arabic (MSA), e.g. Egyptian Colloquial
• Not many texts of spoken Arabic are available
Challenge of Arabic NLP/MT (2)
• Arabic dialects
• Dialect-specific electronic resources, such as annotated corpora, dictionaries, and parsers, rarely exist
• It is hard to develop resources for each dialect, since data transcription is expensive and time-consuming, and there is a whole continuum of Arabic dialects
Representation and Processing
•Linguistic Knowledge
•Knowledge of the source language.
•Knowledge of the target language.
•Knowledge of various correspondences between source language and target language
•Knowledge of the field
Representation and Processing (2)
•Representing Linguistic knowledge
•Grammars and Constituent Structure
•Grammatical Relations
•Processing
•Parsing
•Generation
Giving Credit where Credit is Due
• Evaluation notes are based on
"Problems of Arabic Machine Translation: Evaluation of Three Systems"
by Sattar Izwaini
Abu Dhabi University
Evaluation
•Previous slides mentioned Arabic-to-English linguistic problems, that is, lexical and syntactic problems
Three Online Systems
•Google
•Sakhr ("صخر")
•Systran
Evaluation (2)
• Lexical problems
Deletion
Non-Vocalization
Inadequate
Multiple meaning
Connotation and Collocation
Miscellaneous
Evaluation (3)
• Syntax problems
Word Order
Gender and Reference
Wrong analysis of input
Tense and aspect
Preposition
Definite Article
Arabic MT
A small model that tries to implement part of the basics of MT with Arabic as the source language (SL) and English as the target language (TL), applying the Transformer architecture with an Arabic parser and some Arabic-to-English transformation rules
[Figure: Arabic MT flowchart]
Arabic MT (3)
• Components
•Normalizer
•Tokenizer
•Stemmer
•Lexicon
• Components (2):
•Arabic Parser
•Arabic to English Translator
•Some Arabic-to-English transformer rules (morphology and syntax) are included with
Arabic MT (4)
Arabic MT (5)
•Normalizer
•Ignores non-Arabic characters
•Recognizes stored words/roots by searching the base tree through the Lexicon
•Creates a new tree, currentTree, that includes only the words of the input text
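A rough Python sketch of the Normalizer behaviour described above. It assumes a plain dictionary as a stand-in for the base tree, and the entries, the Unicode range used to detect Arabic characters, and the function name `normalize` are all illustrative assumptions rather than the project's actual code.

```python
import re

# Hypothetical stand-in for the base tree held by the Lexicon.
BASE_LEXICON = {
    "شرح": {"pos": "verb", "gloss": "explain"},
    "المعلم": {"pos": "noun", "gloss": "teacher"},
    "الدرس": {"pos": "noun", "gloss": "lesson"},
    "كتاب": {"pos": "noun", "gloss": "book"},
}

# Anything that is neither whitespace nor in the basic Arabic letter and
# diacritic range (U+0621..U+0652) is treated as non-Arabic. This range is
# an assumption made for the sketch.
NON_ARABIC = re.compile(r"[^\u0621-\u0652\s]")


def normalize(text, lexicon):
    """Drop non-Arabic characters and keep only lexicon entries for input words."""
    cleaned = NON_ARABIC.sub(" ", text)
    words = cleaned.split()
    # "currentTree": the subset of the lexicon that the input actually uses.
    current_tree = {w: lexicon[w] for w in words if w in lexicon}
    return words, current_tree


words, current_tree = normalize("شرح المعلم الدرس, ok 123!", BASE_LEXICON)
print(words)
print(sorted(current_tree))
```

Keeping only the entries needed for the current input means later stages query a much smaller structure than the full lexicon.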
Arabic MT (6)
•Tokenizer
•Produces a list of the Arabic words of the input sentence
•Combines English words to produce the output sentence
•Stemmer
•Takes an inflected word and produces its prefix, infix, and suffix
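The Tokenizer and Stemmer roles can be illustrated with a minimal Python sketch. It assumes whitespace tokenization and a light affix-stripping stemmer that handles prefixes and suffixes only (infix analysis is omitted). The affix lists and the minimum stem length of three letters are hypothetical choices made for the example.

```python
# Hypothetical affix lists: a real stemmer would use far richer tables.
PREFIXES = ("ال", "و", "ب")
SUFFIXES = ("ون", "ات", "ة")
MIN_STEM = 3  # assumed minimum number of letters left after stripping


def tokenize(sentence):
    """Produce the list of words of the input sentence."""
    return sentence.split()


def join_output(english_words):
    """Combine English words into the output sentence."""
    return " ".join(english_words)


def split_affixes(word):
    """Return (prefix, stem, suffix) using light affix stripping."""
    prefix = next(
        (p for p in PREFIXES
         if word.startswith(p) and len(word) - len(p) >= MIN_STEM),
        "",
    )
    rest = word[len(prefix):]
    suffix = next(
        (s for s in SUFFIXES
         if rest.endswith(s) and len(rest) - len(s) >= MIN_STEM),
        "",
    )
    stem = rest[:len(rest) - len(suffix)] if suffix else rest
    return prefix, stem, suffix


for token in tokenize("شرح المعلمون الدرس"):
    print(token, split_affixes(token))
```

The minimum stem length keeps short words such as "ولد" intact even though they begin with a letter that is also a prefix.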
Arabic MT (7)
•Lexicon
•Loads the dictionary of basic items of the Arabic language
•Loads the items into a binary tree
•Loads the items from XML documents
•Finds words by querying the binary tree
•Recognizes inflected words by asking the Stemmer for information
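As an illustration of a lexicon loaded from XML into a binary tree, the sketch below uses Python's standard `xml.etree.ElementTree` module and an unbalanced binary search tree. The XML layout (`<entry word=… pos=… gloss=…/>`) and the class names are assumptions invented for the example; the slides do not specify the actual schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML layout for dictionary items.
SAMPLE_XML = """<lexicon>
  <entry word="شرح" pos="verb" gloss="explain"/>
  <entry word="المعلم" pos="noun" gloss="teacher"/>
  <entry word="الدرس" pos="noun" gloss="lesson"/>
</lexicon>"""


class Node:
    def __init__(self, key, info):
        self.key, self.info = key, info
        self.left = self.right = None


class Lexicon:
    """Binary search tree keyed by the word string."""

    def __init__(self):
        self.root = None

    def insert(self, key, info):
        if self.root is None:
            self.root = Node(key, info)
            return
        node = self.root
        while True:
            if key == node.key:
                node.info = info  # overwrite a duplicate entry
                return
            side = "left" if key < node.key else "right"
            child = getattr(node, side)
            if child is None:
                setattr(node, side, Node(key, info))
                return
            node = child

    def find(self, key):
        node = self.root
        while node is not None:
            if key == node.key:
                return node.info
            node = node.left if key < node.key else node.right
        return None

    @classmethod
    def from_xml(cls, xml_text):
        lexicon = cls()
        for entry in ET.fromstring(xml_text).iter("entry"):
            attrs = dict(entry.attrib)
            lexicon.insert(attrs.pop("word"), attrs)
        return lexicon


lexicon = Lexicon.from_xml(SAMPLE_XML)
print(lexicon.find("المعلم"))
```

A lookup that returns `None` is the point at which a fuller system would hand the word to the stemmer and retry with the stripped stem.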
Giving Credit where Credit is Due
• Parser implementation is based, with some modifications, on the Arabic grammar from
"Developing Arabic Parser in a Multilingual Machine Translation System"
M.Sc. thesis submitted by Ahmed Farouk Ahmed
Cairo University
Arabic MT (8)
•Arabic Parser
•Expects a list of the supported words of the input sentence
•Asks currentTree (built by the Normalizer) for word information
•Recognizes well-formed Arabic sentences
Follows the Arabic grammar constituents (later slide)
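To make the parser's role concrete, here is a deliberately tiny Python recognizer. It accepts only one hypothetical pattern, a verbal sentence of the form verb + subject noun (+ optional object noun), which is an assumption for illustration and far simpler than the thesis grammar the implementation is based on. `CURRENT_TREE` is a made-up dictionary standing in for the tree built by the Normalizer.

```python
# Hypothetical word information, standing in for currentTree.
CURRENT_TREE = {
    "شرح": {"pos": "verb"},
    "المعلم": {"pos": "noun"},
    "الدرس": {"pos": "noun"},
}


def parse_verbal_sentence(words, current_tree):
    """Return a structure for verb + subject (+ object), or None if ill-formed."""
    tags = []
    for word in words:
        info = current_tree.get(word)
        if info is None:  # unsupported word: reject the sentence
            return None
        tags.append(info["pos"])

    well_formed = (
        2 <= len(tags) <= 3
        and tags[0] == "verb"
        and all(tag == "noun" for tag in tags[1:])
    )
    if not well_formed:
        return None

    structure = {"verb": words[0], "subject": words[1]}
    if len(words) == 3:
        structure["object"] = words[2]
    return structure


print(parse_verbal_sentence(["شرح", "المعلم", "الدرس"], CURRENT_TREE))
print(parse_verbal_sentence(["المعلم", "شرح", "الدرس"], CURRENT_TREE))  # None
```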
[Figure: Grammar constituents structure, from Ahmed Farouk's M.Sc. thesis]
Arabic MT (9)
•Translator
•Applies some Arabic-to-English transformation rules
•Morphology
•Asks the Lexicon for each word's number, type, gender, etc.
•Syntax
•Tries to rearrange the translated sentence
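The morphology and syntax steps of the Translator might look roughly like the Python sketch below. The feature names (`number`, `definite`, `tense`, `past`), the input structure with `verb`, `subject`, and `object` keys, and the naive "add s" plural rule are all assumptions made for illustration, not the system's real rules.

```python
# Hypothetical lexicon entries carrying the features the Translator asks for.
LEXICON = {
    "شرح": {"gloss": "explain", "past": "explained", "tense": "past"},
    "المعلم": {"gloss": "teacher", "number": "singular", "definite": True},
    "الدرس": {"gloss": "lesson", "number": "singular", "definite": True},
}


def noun_phrase(word, lexicon):
    """Morphology for nouns: number and definiteness (naive plural rule)."""
    entry = lexicon[word]
    noun = entry["gloss"] + ("s" if entry["number"] == "plural" else "")
    return ("the " if entry["definite"] else "") + noun


def verb_form(word, lexicon):
    """Morphology for verbs: pick the past form when the tense is past."""
    entry = lexicon[word]
    return entry["past"] if entry["tense"] == "past" else entry["gloss"]


def translate(structure, lexicon):
    """Syntax: emit subject, verb, object in English order."""
    parts = [
        noun_phrase(structure["subject"], lexicon),
        verb_form(structure["verb"], lexicon),
    ]
    if "object" in structure:
        parts.append(noun_phrase(structure["object"], lexicon))
    sentence = " ".join(parts)
    return sentence[0].upper() + sentence[1:]


parsed = {"verb": "شرح", "subject": "المعلم", "object": "الدرس"}
print(translate(parsed, LEXICON))  # The teacher explained the lesson
```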