Arabic MT Project

Date post: 08-Aug-2015
Upload: hind-abdulkhaleq
Page 1: Arabic MT Project

Arabic MT

In the name of God, the Most Gracious, the Most Merciful

Faculty of Computers and Information

July 2007

Page 2: Arabic MT Project

Agenda

• Problem Definition

• Machine Translation

• Challenge of Arabic NLP/MT

• Data Representation and processing

• Evaluation

• Application

Page 4: Arabic MT Project

Problem Definition

An attempt to automate part of the process of translating Arabic into English by applying suggested Arabic NLP paradigms

[Diagram: the Arabic MT system and its components (Lexicon, Parser, Dictionary)]

Input: شرح المعلم الدرس

Word-level translation: teacher explain lesson

Output: The teacher explained the lesson

Page 5: Arabic MT Project

Agenda

• Problem Definition

• Machine Translation

• Challenge of Arabic NLP/MT

• Data Representation and processing

• Evaluation

• Application

Page 6: Arabic MT Project

Giving Credit where Credit is Due

• Most of the MT notes are taken from the book

"Machine Translation: An Introductory Guide"

by Douglas Arnold, Lorna Balkan, Siety Meijer, R. Lee Humphreys, and Louisa Sadler

Department of Language & Linguistics, University of Essex

Page 7: Arabic MT Project

Machine Translation

• Importance

• Levels of Human involvement

• Translation Engines

• Dictionaries

• Problems

Page 9: Arabic MT Project

MT Importance

• Social or political: the need for translation arises in communities where more than one language is generally spoken.

• Commercial: buyers faced with a choice between products tend to pick the one whose instruction manual is in their own language (English, Arabic, …).

Page 10: Arabic MT Project

MT Importance (2)

• Scientific: MT is a direct application and testing ground for many ideas in Computer Science, Artificial Intelligence, and Linguistics; some of the most important developments in these fields began in MT.

Page 11: Arabic MT Project

MT Importance (3)

• Philosophical: "getting the correct translation of negatively charged electrons and protons into French depends on knowing that protons are positively charged, so the interpretation cannot be something like 'negatively charged electrons and negatively charged protons'."

Page 12: Arabic MT Project

Machine Translation

• Importance

• Levels of Human involvement

• Translation Engines

• Dictionaries

• Problems

Page 13: Arabic MT Project

Human Involvement Levels

• CAT: Computer-Aided Translation

• HAMT: Human-Aided Machine Translation

• MAHT: Machine-Aided Human Translation

• FAHQT: Fully Automatic High-Quality Translation

Page 14: Arabic MT Project

Machine Translation

• Importance

• Levels of Human involvement

• Translation Engines

• Dictionaries

• Problems

Page 15: Arabic MT Project

MT engines

•Direct Architectures

•Transformer Architectures

•Linguistic Knowledge Architectures

• The last two architectures are described below

Page 16: Arabic MT Project

MT engines

•Transformer Architectures

Main idea: input sentences can be transformed into output sentences by applying the simplest possible parse, replacing source words with their target-language equivalents as specified in a bilingual dictionary, and then rearranging their order to suit the rules of the target language.
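The main idea above can be sketched in a few lines of Python. This is only an illustration: the three dictionary entries, the category labels, and the single verb-subject reordering rule are assumptions made for the example, not the project's actual data or rules.

```python
# Toy transformer-style translation: shallow parse, word replacement, reorder.
# BILINGUAL_DICT and the one reordering rule are illustrative assumptions.

BILINGUAL_DICT = {
    "شرح": ("explain", "verb"),
    "المعلم": ("teacher", "noun"),
    "الدرس": ("lesson", "noun"),
}


def shallow_parse(words):
    """Tag each source word with its dictionary category (simplest possible parse)."""
    return [(word, BILINGUAL_DICT[word][1]) for word in words]


def transform(tagged):
    """Replace source words with target words, then move a sentence-initial
    verb behind the noun that follows it (verb-subject -> subject-verb)."""
    target = [(BILINGUAL_DICT[word][0], category) for word, category in tagged]
    if len(target) >= 2 and target[0][1] == "verb" and target[1][1] == "noun":
        target[0], target[1] = target[1], target[0]
    return [word for word, _ in target]


def translate(sentence):
    """Run the whole pipeline on a whitespace-separated source sentence."""
    return " ".join(transform(shallow_parse(sentence.split())))


print(translate("شرح المعلم الدرس"))  # teacher explain lesson
```

The output is deliberately uninflected: a transformer of this kind only swaps and reorders words, so tense and articles need further rules. Words missing from the dictionary raise a KeyError in this sketch.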

Page 17: Arabic MT Project

•Transformer Architectures (cont.)

Source Text

SOURCE PARSER: uses a Lexicon and a small Grammar to produce a Source Structure

SOURCE-TARGET TRANSFORMER: Source-to-Target transformation rules successively transform the Source Structure into the Target Structure

Target Text

Page 18: Arabic MT Project

MT engines (3)

•Linguistic Knowledge Architectures

Indirect or linguistic knowledge (LK) architecture

High quality MT requires linguistic knowledge of both the source and the target languages as well as the differences between them.

Page 19: Arabic MT Project

MT engines (3)

•Linguistic Knowledge Architectures (2)

Requirements

•A substantial grammar of both the source language and the target language

•An additional comparative grammar

Page 20: Arabic MT Project

•Linguistic Knowledge Architectures (3)

Source Text

ANALYSIS: Source Language Grammars parse and analyze the input to produce an Interlingua Representation

Interlingua Representation

SYNTHESIS: Target Language Grammars generate Target Language Output from the Interlingua Representation

Target Text
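The analysis and synthesis stages can be illustrated with a minimal sketch. Everything named here is hypothetical: the `Interlingua` fields, the concept labels, the tiny lexicons, and the hard-coded past tense are assumptions chosen to keep the example short, and the analysis step handles only a three-word verb-subject-object sentence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interlingua:
    """Language-neutral meaning of a simple sentence (assumed structure)."""
    predicate: str
    agent: str
    patient: str
    tense: str


# Hypothetical source-side lexicon: Arabic word -> concept label.
ARABIC_CONCEPTS = {"شرح": "EXPLAIN", "المعلم": "TEACHER", "الدرس": "LESSON"}

# Hypothetical target-side lexicons: concept label -> English form.
ENGLISH_VERBS = {"EXPLAIN": {"past": "explained", "present": "explains"}}
ENGLISH_NOUNS = {"TEACHER": "teacher", "LESSON": "lesson"}


def analyse(sentence):
    """ANALYSIS: map a verb-subject-object Arabic sentence to an Interlingua.
    The tense is fixed to 'past' here; a real analyser would read it off the verb."""
    verb, subject, obj = sentence.split()
    return Interlingua(
        predicate=ARABIC_CONCEPTS[verb],
        agent=ARABIC_CONCEPTS[subject],
        patient=ARABIC_CONCEPTS[obj],
        tense="past",
    )


def synthesise(meaning):
    """SYNTHESIS: generate an English sentence from the Interlingua alone."""
    verb = ENGLISH_VERBS[meaning.predicate][meaning.tense]
    agent = ENGLISH_NOUNS[meaning.agent]
    patient = ENGLISH_NOUNS[meaning.patient]
    sentence = f"the {agent} {verb} the {patient}"
    return sentence[0].upper() + sentence[1:]


print(synthesise(analyse("شرح المعلم الدرس")))  # The teacher explained the lesson
```

Note that `synthesise` never sees the Arabic words. That separation is the point of the architecture: a new target language needs only a new synthesis step.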

Page 21: Arabic MT Project

Machine Translation

• Importance

• Levels of Human involvement

• Translation Engines

• Dictionaries

• Problems

Page 22: Arabic MT Project

Dictionaries

•Dictionaries are the largest components of an MT system in terms of the amount of information they hold, and may well be the most expensive components to construct.

•The size and quality of the dictionary limit the scope, coverage, and quality of the translation that can be expected.

Page 23: Arabic MT Project

Dictionaries (2)

•End users can make some additions to system dictionaries

•A basic understanding of dictionary construction and sensitivity to the issues involved in ‘describing words’ is an important asset.

Page 24: Arabic MT Project

Dictionaries (3)

•Morphology: recognizing the internal structure of words

•Inflection: inflected words (e.g. walk, walks)

•Derivation: derived words (e.g. grammar → grammatical, grammatical → grammaticality)

•Compounding (e.g. buttonhole)
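A dictionary component might recognize two of these structures along the following lines. The stem list and suffix list are made up for the illustration. Derivation is left out because it often changes the stem (grammar, grammatical), which simple suffix stripping cannot handle.

```python
# Toy morphological analysis for English: inflection and compounding only.
# KNOWN_STEMS and INFLECTIONAL_SUFFIXES are illustrative assumptions.

KNOWN_STEMS = {"walk", "grammar", "button", "hole"}
INFLECTIONAL_SUFFIXES = ("ing", "ed", "s")


def analyse_inflection(word):
    """Return (stem, suffix) when the word is a known stem plus an
    inflectional suffix; otherwise return (word, '')."""
    for suffix in INFLECTIONAL_SUFFIXES:
        if word.endswith(suffix) and word[: -len(suffix)] in KNOWN_STEMS:
            return word[: -len(suffix)], suffix
    return word, ""


def analyse_compound(word):
    """Return (left, right) when the word splits into two known stems, else None."""
    for i in range(1, len(word)):
        left, right = word[:i], word[i:]
        if left in KNOWN_STEMS and right in KNOWN_STEMS:
            return left, right
    return None


print(analyse_inflection("walks"))     # ('walk', 's')
print(analyse_compound("buttonhole"))  # ('button', 'hole')
```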

Page 25: Arabic MT Project

Machine Translation

• Importance

• Levels of Human involvement

• Translation Engines

• Dictionaries

• Problems

Page 26: Arabic MT Project

MT Problems

• Ambiguity Problems

• Lexical ambiguity

• Structural ambiguity

• Problems arising from differences between languages

• Multiword units: Idioms and Collocations

Page 27: Arabic MT Project

MT Problems (2)

• Ambiguity Problems

• Lexical ambiguity

A word that has more than one meaning

• Structural ambiguity

A phrase or sentence that has more than one structure

Page 28: Arabic MT Project

MT Problems (3)

Problems arising from Differences between languages

• Examples:

• Lexical holes, where one language has to use a phrase to express what another language expresses in a single word.

• Structural order

English: [subject] [verb]

Arabic: [verb] [subject]

Page 29: Arabic MT Project

Agenda

• Problem Definition

• Machine Translation

• Challenge of Arabic NLP/MT

• Data Representation and processing

• Evaluation

• Application

Page 30: Arabic MT Project

Challenge of Arabic NLP/MT

• Arabic NLP is a relatively new area of research

• Colloquial Arabic versus Modern Standard Arabic (MSA), e.g. Egyptian Colloquial

• Not many texts of spoken Arabic are available

Page 31: Arabic MT Project

Challenge of Arabic NLP/MT (2)

• Arabic dialects

• Dialect-specific electronic resources, such as annotated corpora, dictionaries, and parsers, rarely exist

• It is hard to develop resources for each dialect, since data transcription is expensive and time-consuming, and there is a whole continuum of Arabic dialects

Page 32: Arabic MT Project

Agenda

• Problem Definition

• Machine Translation

• Challenge of Arabic NLP/MT

• Data Representation and processing

• Evaluation

• Application

Page 33: Arabic MT Project

Representation and Processing

•Linguistic Knowledge

•Knowledge of the source language.

•Knowledge of the target language.

•Knowledge of various correspondences between source language and target language

•Knowledge of the field

Page 34: Arabic MT Project

Representation and Processing (2)

•Representing Linguistic knowledge

•Grammars and Constituent Structure

•Grammatical Relations

•Processing

•Parsing

•Generation

Page 35: Arabic MT Project

Agenda

• Problem Definition

• Machine Translation

• Challenge of Arabic NLP/MT

• Data Representation and processing

• Evaluation

• Application

Page 36: Arabic MT Project

Giving Credit where Credit is Due

• The evaluation notes are based on

"Problems of Arabic Machine Translation: Evaluation of Three Systems"

Sattar Izwaini

Abu Dhabi University

Page 37: Arabic MT Project

Evaluation

•Previous slides mentioned the linguistic problems of Arabic-to-English translation, namely lexical and syntactic problems

Three Online Systems

•Google

•Sakhr (صخر)

•Systran

Page 38: Arabic MT Project

Evaluation (2)

• Lexical problems

Deletion

Non-Vocalization

Inadequate

Multiple meaning

Connotation and Collocation

Miscellaneous

Page 39: Arabic MT Project

Evaluation (3)

• Syntax problems

Word Order

Gender and Reference

Wrong analysis of input

Tense and aspect

Preposition

Definite Article

Page 40: Arabic MT Project

Agenda

• Problem Definition

• Machine Translation

• Challenge of Arabic NLP/MT

• Data Representation and processing

• Evaluation

• Application

Page 41: Arabic MT Project

Arabic MT

A small model that tries to implement some of the basics of MT with Arabic as the source language (SL) and English as the target language (TL). It applies the transformer architecture, with an Arabic parser and some Arabic-to-English transformation rules.

Page 42: Arabic MT Project

[Figure: Arabic MT flowchart]

Page 43: Arabic MT Project

Arabic MT (3)

• Components

•Normalizer

•Tokenizer

•Stemmer

•Lexicon

Page 44: Arabic MT Project

Arabic MT (4)

• Components (2):

•Arabic Parser

•Arabic to English Translator

•Some Arabic-to-English transformer rules (morphology and syntax) are included

Page 45: Arabic MT Project

Arabic MT (5)

•Normalizer

•Ignores non-Arabic characters

•Recognizes stored words/roots by asking the Lexicon to search the base tree

•Creates a new tree, currentTree, that includes only the words of the input text
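The first Normalizer step, ignoring non-Arabic characters, could look roughly like this. The sketch rests on assumptions the slides do not state: "Arabic character" is taken to mean the basic letters U+0621 to U+064A, and short-vowel marks and the tatweel are dropped rather than treated as word breaks. Lexicon lookup and building currentTree are not covered.

```python
# Minimal character-level normalizer (an assumed reading of
# "ignores non-Arabic characters"; not the project's actual code).

TATWEEL = "\u0640"  # elongation character, carries no meaning


def normalize(text):
    """Keep basic Arabic letters, drop vowel marks and tatweel,
    and turn every other character into a word separator."""
    kept = []
    for ch in text:
        if ch == TATWEEL or "\u064B" <= ch <= "\u0652":
            continue  # diacritics and tatweel vanish without splitting the word
        kept.append(ch if "\u0621" <= ch <= "\u064A" else " ")
    return " ".join("".join(kept).split())


print(normalize("شرح المعلم the الدرس 123!"))  # شرح المعلم الدرس
```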

Page 46: Arabic MT Project

Arabic MT (6)

•Tokenizer

•Produces the list of Arabic words in the input sentence

•Combines English words to produce the output sentence

•Stemmer

•Takes an inflected word and produces its prefix, infix, and suffix
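A rough sketch of the two components follows. The affix lists are a small assumed sample (the definite article and three common noun endings), and the minimum stem length of three letters is likewise an assumption. Infix detection is left out, since it needs root-and-pattern knowledge that does not fit in a short example.

```python
# Toy tokenizer and light stemmer. PREFIXES, SUFFIXES, and MIN_STEM_LENGTH
# are illustrative assumptions, not the project's actual affix tables.

PREFIXES = ("ال",)            # definite article
SUFFIXES = ("ون", "ات", "ة")  # masculine plural, feminine plural, feminine marker
MIN_STEM_LENGTH = 3


def tokenize(sentence):
    """Split the input sentence into a list of words."""
    return sentence.split()


def join_words(words):
    """Combine translated words into the output sentence."""
    return " ".join(words)


def stem(word):
    """Return (prefix, stem, suffix); a missing affix is an empty string."""
    prefix = suffix = ""
    for candidate in PREFIXES:
        if word.startswith(candidate) and len(word) - len(candidate) >= MIN_STEM_LENGTH:
            prefix, word = candidate, word[len(candidate):]
            break
    for candidate in SUFFIXES:
        if word.endswith(candidate) and len(word) - len(candidate) >= MIN_STEM_LENGTH:
            suffix, word = candidate, word[: -len(candidate)]
            break
    return prefix, word, suffix


print(stem("المعلمون"))  # ('ال', 'معلم', 'ون')
```

The length check keeps short words such as شرح from being cut down to fewer than three letters.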

Page 47: Arabic MT Project

Arabic MT (7)

•Lexicon

•Loads the dictionary of basic items of the Arabic language

•Loads into a binary tree

•Loads from XML documents

•Finds words by querying the binary tree

•Recognizes inflected words by asking the Stemmer for information
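The loading and lookup steps might be sketched as below, using Python's standard `xml.etree.ElementTree` parser and an unbalanced binary search tree. The XML layout (`lexicon`, `entry`, and the `word`, `type`, `gloss` attributes) is invented for the example; the slides do not show the project's real schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical dictionary file contents; the element and attribute
# names are assumptions made for this sketch.
SAMPLE_XML = """
<lexicon>
  <entry word="معلم" type="noun" gloss="teacher"/>
  <entry word="درس" type="noun" gloss="lesson"/>
  <entry word="شرح" type="verb" gloss="explain"/>
</lexicon>
"""


class Node:
    def __init__(self, key, info):
        self.key = key
        self.info = info
        self.left = None
        self.right = None


class Lexicon:
    """Dictionary entries stored in a binary search tree keyed by word."""

    def __init__(self):
        self.root = None

    def insert(self, key, info):
        if self.root is None:
            self.root = Node(key, info)
            return
        node = self.root
        while True:
            if key == node.key:
                node.info = info  # replace an existing entry
                return
            side = "left" if key < node.key else "right"
            child = getattr(node, side)
            if child is None:
                setattr(node, side, Node(key, info))
                return
            node = child

    def find(self, key):
        """Return the entry for key, or None when the word is unknown."""
        node = self.root
        while node is not None and node.key != key:
            node = node.left if key < node.key else node.right
        return node.info if node is not None else None

    @classmethod
    def from_xml(cls, xml_text):
        lexicon = cls()
        for entry in ET.fromstring(xml_text.strip()).iter("entry"):
            lexicon.insert(
                entry.get("word"),
                {"type": entry.get("type"), "gloss": entry.get("gloss")},
            )
        return lexicon


lexicon = Lexicon.from_xml(SAMPLE_XML)
print(lexicon.find("شرح"))  # {'type': 'verb', 'gloss': 'explain'}
```

A plain binary search tree degrades to a linked list when entries arrive in sorted order, so a real lexicon would shuffle its input or use a balanced tree.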

Page 48: Arabic MT Project

Giving Credit where Credit is Due

• The parser implementation is based, with some modifications, on the Arabic grammar from

"Developing Arabic Parser in a multilingual Machine Translation system"

M.Sc. thesis submitted by Ahmed Farouk Ahmed

Cairo University

Page 49: Arabic MT Project

Arabic MT (8)

•Arabic Parser

•Expects a list of the supported words of the input sentence

•Asks currentTree (built by the Normalizer) for word information

•Recognizes well-formed Arabic sentences that follow the Arabic grammar constituents (later slide)
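Recognizing a well-formed sentence can be illustrated with a pattern check over word types. The three sentence patterns below are a deliberately tiny stand-in and are not taken from the thesis grammar. `WORD_TYPES` plays the role of currentTree and is also assumed.

```python
# Toy sentence recognizer. WORD_TYPES stands in for currentTree, and
# PATTERNS is an assumed mini-grammar, not the grammar the project uses.

WORD_TYPES = {"شرح": "verb", "المعلم": "noun", "الدرس": "noun"}

PATTERNS = {
    ("verb", "noun"): "verbal sentence (verb + subject)",
    ("verb", "noun", "noun"): "verbal sentence (verb + subject + object)",
    ("noun", "noun"): "nominal sentence (subject + predicate)",
}


def recognise(words):
    """Return the name of the matching sentence pattern, or None when a
    word is unsupported or the sequence of types fits no pattern."""
    try:
        types = tuple(WORD_TYPES[word] for word in words)
    except KeyError:
        return None
    return PATTERNS.get(types)


print(recognise(["شرح", "المعلم", "الدرس"]))
# verbal sentence (verb + subject + object)
```

A full parser would build a constituent tree from grammar rules instead of matching flat patterns. The flat version only shows the accept-or-reject decision.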

Page 50: Arabic MT Project

[Figure: Grammar Constituents Structure, from Ahmed Farouk's M.Sc. thesis]

Page 51: Arabic MT Project

Arabic MT (9)

•Translator

•Applies some Arabic-to-English transformation rules

•Morphology

•Asks the Lexicon for word number, type, gender, etc.

•Syntax

•Tries to rearrange the translated sentence
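The two kinds of rule can be sketched together as follows. The lexicon entries and their feature names are hypothetical, and the English inflection is deliberately naive (regular "-ed" and "-s" only). The example uses a plural subject to show why the translator has to ask for word number: the Arabic verb stays singular before a plural subject, so the plural must come from the noun's own features.

```python
# Toy translator: morphology rules driven by lexicon features, then one
# syntax rule. LEXICON and its feature names are illustrative assumptions.

LEXICON = {
    "شرح": {"gloss": "explain", "type": "verb", "tense": "past"},
    "المعلمون": {"gloss": "teacher", "type": "noun", "number": "plural", "definite": True},
    "الدرس": {"gloss": "lesson", "type": "noun", "number": "singular", "definite": True},
}


def inflect(entry):
    """Morphology: build the English form from the entry's features
    (regular inflection only)."""
    word = entry["gloss"]
    if entry["type"] == "verb" and entry.get("tense") == "past":
        return word + "ed"
    if entry["type"] == "noun":
        if entry.get("number") == "plural":
            word += "s"
        if entry.get("definite"):
            word = "the " + word
    return word


def rearrange(entries):
    """Syntax: move a sentence-initial verb behind its subject."""
    if len(entries) >= 2 and entries[0]["type"] == "verb" and entries[1]["type"] == "noun":
        return [entries[1], entries[0]] + entries[2:]
    return entries


def translate(words):
    entries = rearrange([LEXICON[word] for word in words])
    sentence = " ".join(inflect(entry) for entry in entries)
    return sentence[:1].upper() + sentence[1:]


print(translate(["شرح", "المعلمون", "الدرس"]))  # The teachers explained the lesson
```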

Page 52: Arabic MT Project

Hind Abdulkhaleq

