Arabic MT Project

Post on 08-Aug-2015

Arabic MT

In the name of God, the Most Gracious, the Most Merciful

Faculty of Computers and Information

July 2007

Agenda

• Problem Definition

• Machine Translation

• Challenge of Arabic NLP/MT

• Data Representation and processing

• Evaluation

• Application

Problem Definition

An attempt to automate part of the process of translating Arabic into English by applying suggested Arabic NLP paradigms.

Figure: Arabic MT system (components: Lexicon, Parser, Dictionary)

Input: شرح المعلم الدرس

Word-for-word gloss: teacher explain lesson

Output: The teacher explained the lesson

Giving Credit where Credit is Due

• Most of the MT notes are taken from the book

“Machine Translation: An Introductory Guide”

By Douglas Arnold, Lorna Balkan, Siety Meijer, R. Lee Humphreys, and Louisa Sadler

Department of Language & Linguistics, University of Essex

Machine Translation

• Importance

• Levels of Human involvement

• Translation Engines

• Dictionaries

• Problems

MT Importance

• Social or political: the need for translation arises in communities where more than one language is generally spoken.

• Commercial: a customer may choose between products according to the language (English, Arabic, …) of the instruction manual.

MT Importance (2)

• Scientific: MT is a direct application and testing ground for many ideas in Computer Science, Artificial Intelligence, and Linguistics; some of the most important developments in these fields began in MT.

MT Importance (3)

• Philosophical: “getting the correct translation of negatively charged electrons and protons into French depends on knowing that protons are positively charged, so the interpretation cannot be something like ‘negatively charged electrons and negatively charged protons’.”

Human Involvement Levels

• CAT: Computer-Aided Translation

• HAMT: Human-Aided Machine Translation

• MAHT: Machine-Aided Human Translation

• FAHQT: Fully Automatic High-Quality Translation

MT engines

•Direct Architectures

•Transformer Architectures

•Linguistic Knowledge Architectures

• The last two architectures are described below

MT engines

•Transformer Architectures

Main idea: input sentences can be transformed into output sentences by applying the simplest possible parse, replacing source words with their target-language equivalents as specified in a bilingual dictionary, and then rearranging their order to suit the rules of the target language.

•Transformer Architectures (cont.)

Source Text → SOURCE PARSER → SOURCE-TARGET TRANSFORMER → Target Text

• SOURCE PARSER: uses the lexicon and a small grammar to produce a source structure.

• SOURCE-TARGET TRANSFORMER: source-to-target transformation rules successively transform the source structure into the target structure.
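The three transformer steps described above (simple parse, dictionary substitution, reordering) can be sketched roughly as follows. This is only a toy illustration, not the project's implementation: the dictionary contents, the category labels, and the function names `parse`, `substitute`, `reorder`, and `translate` are all assumptions made for the example, and the single reordering rule covers only a verb-subject-object sentence.

```python
# Hypothetical bilingual dictionary: Arabic word -> (category, English equivalent).
BILINGUAL_DICT = {
    "شرح": ("VERB", "explained"),
    "المعلم": ("NOUN", "the teacher"),
    "الدرس": ("NOUN", "the lesson"),
}


def parse(words):
    """Simplest possible parse: label each source word with its category."""
    return [(word, BILINGUAL_DICT[word][0]) for word in words]


def substitute(tagged):
    """Replace each source word with its target-language equivalent."""
    return [(BILINGUAL_DICT[word][1], tag) for word, tag in tagged]


def reorder(tagged):
    """Toy transformation rule: verb-subject-object becomes subject-verb-object."""
    if [tag for _, tag in tagged] == ["VERB", "NOUN", "NOUN"]:
        verb, subject, obj = tagged
        return [subject, verb, obj]
    return tagged  # no rule applies, keep the source order


def translate(sentence):
    """Run the three steps and capitalize the first letter of the result."""
    target = reorder(substitute(parse(sentence.split())))
    text = " ".join(word for word, _ in target)
    return text[0].upper() + text[1:]
```

With this toy dictionary, `translate("شرح المعلم الدرس")` gives "The teacher explained the lesson". A sentence containing a word missing from the dictionary raises `KeyError`, which a real system would have to handle.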

MT engines (3)

•Linguistic Knowledge Architectures

Indirect or linguistic knowledge (LK) architecture

High quality MT requires linguistic knowledge of both the source and the target languages as well as the differences between them.

MT engines (3)

•Linguistic Knowledge Architectures (2)

Requirements

• A substantial grammar of both the source language and the target language.

•An additional comparative grammar

•Linguistic Knowledge Architectures (3)

Source Text → ANALYSIS → Interlingua Representation → SYNTHESIS → Target Text

• ANALYSIS: the source language grammars parse and analyze the input to produce an interlingua representation.

• SYNTHESIS: the target language grammar generates target language output from the interlingua.
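To make the analysis/synthesis split concrete, here is a minimal sketch under strong assumptions: the `Interlingua` record, the concept names (`EXPLAIN`, `TEACHER`, `LESSON`), and the tiny word tables are invented for illustration, and the analysis step assumes a three-word verb-subject-object sentence. The point it shows is that synthesis never looks at the Arabic words, only at the language-neutral representation.

```python
from dataclasses import dataclass


@dataclass
class Interlingua:
    """Hypothetical language-neutral representation of a simple clause."""
    predicate: str
    agent: str
    theme: str
    tense: str


# Hypothetical source-side knowledge: Arabic words mapped to concepts.
AR_VERBS = {"شرح": ("EXPLAIN", "past")}
AR_NOUNS = {"المعلم": "TEACHER", "الدرس": "LESSON"}

# Hypothetical target-side knowledge: concepts mapped to English words.
EN_VERBS = {("EXPLAIN", "past"): "explained"}
EN_NOUNS = {"TEACHER": "the teacher", "LESSON": "the lesson"}


def analyse(sentence):
    """ANALYSIS: assumes a three-word verb-subject-object Arabic sentence."""
    verb, subject, obj = sentence.split()
    predicate, tense = AR_VERBS[verb]
    return Interlingua(predicate, AR_NOUNS[subject], AR_NOUNS[obj], tense)


def synthesise(rep):
    """SYNTHESIS: generate English subject-verb-object order from the interlingua."""
    verb = EN_VERBS[(rep.predicate, rep.tense)]
    text = f"{EN_NOUNS[rep.agent]} {verb} {EN_NOUNS[rep.theme]}"
    return text[0].upper() + text[1:]
```

For example, `synthesise(analyse("شرح المعلم الدرس"))` gives "The teacher explained the lesson". Adding another target language would, in this sketch, only require new target-side tables and a new synthesis function.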

Dictionaries

•Dictionaries are the largest components of an MT system in terms of the amount of information they hold, and may well be the most expensive components to construct.

•Their size and quality limit the scope, coverage, and quality of the translation that can be expected.

Dictionaries (2)

•End users can make some additions to system dictionaries

•A basic understanding of dictionary construction and sensitivity to the issues involved in ‘describing words’ is an important asset.

Dictionaries (3)

• Morphology: recognizing the internal structure of words

• Inflectional: inflected words (e.g. walk, walks)

• Derivational: derived words (e.g. grammar → grammatical, grammatical → grammaticality)

• Compounding (e.g. buttonhole)
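The three kinds of word structure listed above can be illustrated with a very small analyser. This is a sketch only: the base-word list, the two suffix rules (`-s` and `-ity`), and the function name `analyse_word` are assumptions chosen to cover the slide's English examples, not a general morphological analyser.

```python
# Hypothetical dictionary of base words, chosen to match the slide's examples.
BASE_WORDS = {"walk", "grammatical", "button", "hole"}


def analyse_word(word):
    """Return (kind, parts) describing the internal structure of an English word."""
    if word in BASE_WORDS:
        return ("base", [word])
    # Inflectional: walk -> walks
    if word.endswith("s") and word[:-1] in BASE_WORDS:
        return ("inflectional", [word[:-1], "-s"])
    # Derivational: grammatical -> grammaticality
    if word.endswith("ity") and word[:-3] in BASE_WORDS:
        return ("derivational", [word[:-3], "-ity"])
    # Compounding: button + hole -> buttonhole
    for i in range(1, len(word)):
        if word[:i] in BASE_WORDS and word[i:] in BASE_WORDS:
            return ("compound", [word[:i], word[i:]])
    return ("unknown", [word])
```

Storing only base words plus rules like these is one way a dictionary can stay smaller than a full list of word forms; irregular forms would still need their own entries.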

MT Problems

• Ambiguity Problems

• Lexical ambiguity

• Structural ambiguity

• Problems arising from differences between languages

• Multiword units: Idioms and Collocations

MT Problems (2)

• Ambiguity Problems

• Lexical ambiguity

A word that has more than one meaning

• Structural ambiguity

A phrase or sentence that has more than one structure

MT Problems (3)

Problems arising from Differences between languages

• Examples:

• Lexical holes, where one language has to use a phrase to express what another language expresses in a single word.

• Structural order

English: [subject] [verb]

Arabic: [verb] [subject]

Challenge of Arabic NLP/MT

• Arabic NLP is a relatively new area of research

• Colloquial Arabic and Modern Standard Arabic (MSA), e.g. Egyptian Colloquial

• Not many texts of spoken Arabic are available

Challenge of Arabic NLP/MT (2)

• Arabic dialects

• Dialect-specific electronic resources, such as annotated corpora, dictionaries, and parsers, rarely exist

• It is hard to develop resources for each dialect, since data transcription is expensive and time-consuming, and there is a whole continuum of Arabic dialects

Representation and Processing

•Linguistic Knowledge

•Knowledge of the source language.

•Knowledge of the target language.

•Knowledge of various correspondences between source language and target language

•Knowledge of the field

Representation and Processing (2)

•Representing Linguistic knowledge

•Grammars and Constituent Structure

•Grammatical Relations

•Processing

•Parsing

•Generation

Giving Credit where Credit is Due

• Evaluation Notes are based on

“Problems of Arabic Machine Translation: evaluation of three systems”

Sattar Izwaini

Abu Dhabi University

s.izwaini@adu.ac.ae

Evaluation

•Previous slides mentioned the linguistic problems of Arabic-to-English translation, namely lexical and syntactic problems

Three Online Systems

•Google

•Sakhr (صخر)

•Systran

Evaluation (2)

• Lexical problems

Deletion

Non-Vocalization

Inadequate

Multiple meaning

Connotation and Collocation

Miscellaneous

Evaluation (3)

• Syntax problems

Word Order

Gender and Reference

Wrong analysis of input

Tense aspect

Preposition

Definite Article

Arabic MT

A small model that implements some of the basics of MT with Arabic as the source language (SL) and English as the target language (TL). It applies the transformer architecture, with an Arabic parser and some Arabic-to-English transformation rules.

Arabic MT Flowchart

Arabic MT (3)

• Components

•Normalizer

•Tokenizer

•Stemmer

•Lexicon

Arabic MT (4)

• Components (2):

•Arabic Parser

•Arabic to English Translator

•Some Arabic-to-English transformer rules (morphology and syntax) are included

Arabic MT (5)

•Normalizer

•Ignores non-Arabic characters

•Recognizes stored words/roots by asking the Lexicon to search the base tree

•Creates a new tree, currentTree, that contains only the words of the input text
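The first Normalizer step, ignoring non-Arabic characters, might look something like the sketch below. It assumes that "Arabic character" means a code point in the basic Unicode Arabic block (U+0600 to U+06FF), which also contains Arabic punctuation and digits; the function name `normalize` and that choice of range are assumptions, not details from the project.

```python
import re

# Anything that is neither in the basic Arabic block nor whitespace.
# Assumption: U+0600-U+06FF is an adequate definition of "Arabic character".
NON_ARABIC = re.compile(r"[^\u0600-\u06FF\s]")


def normalize(text):
    """Replace non-Arabic characters with spaces, then collapse whitespace."""
    cleaned = NON_ARABIC.sub(" ", text)
    return " ".join(cleaned.split())
```

For instance, `normalize("شرح the المعلم, الدرس!")` returns "شرح المعلم الدرس". Replacing with a space instead of deleting keeps two Arabic words apart when only a Latin character or punctuation mark separated them.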

Arabic MT (6)

•Tokenizer

•Produces the list of Arabic words in the input sentence

•Combines English words to produce the output sentence

•Stemmer

•Takes an inflected word and produces its prefix, infix, and suffix
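A simple affix-stripping stemmer gives a feel for what the Stemmer returns. The sketch below is an assumption-laden simplification: it handles only a handful of prefixes and suffixes, does not attempt infixes at all, and the affix lists, the `min_stem` threshold, and the function name `stem` are invented for the example.

```python
# Hypothetical affix lists; longer prefixes come first so they are tried first.
PREFIXES = ("وال", "ال", "و")
SUFFIXES = ("ون", "ات", "ها", "ة")


def stem(word, min_stem=3):
    """Split a word into (prefix, stem, suffix); infixes are not handled."""
    prefix = next(
        (p for p in PREFIXES
         if word.startswith(p) and len(word) - len(p) >= min_stem),
        "",
    )
    rest = word[len(prefix):]
    suffix = next(
        (s for s in SUFFIXES
         if rest.endswith(s) and len(rest) - len(s) >= min_stem),
        "",
    )
    return prefix, rest[:len(rest) - len(suffix)], suffix
```

With these lists, `stem("المعلمون")` returns the prefix "ال", the stem "معلم", and the suffix "ون". The `min_stem` check keeps short words such as "ولد" from losing a first letter that only looks like a prefix.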

Arabic MT (7)

•Lexicon

•Loads the dictionary: the basic items of the Arabic language

•Loads into a binary tree

•Loads from XML documents

•Finds words by asking the binary tree

•Recognizes inflected words by asking the stemmer for information
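Loading dictionary entries from XML into a binary search tree and looking words up could be sketched as follows. The XML layout (`<lexicon>` with `<entry word=… pos=… gloss=…/>` elements), the `Node` class, and the function names are all hypothetical; the project's actual file format and tree structure are not given in the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML layout for the dictionary.
SAMPLE_XML = """<lexicon>
  <entry word="شرح" pos="verb" gloss="explain"/>
  <entry word="المعلم" pos="noun" gloss="teacher"/>
  <entry word="الدرس" pos="noun" gloss="lesson"/>
</lexicon>"""


class Node:
    """One word in the binary search tree, ordered by string comparison."""
    def __init__(self, word, info):
        self.word, self.info = word, info
        self.left = self.right = None


def insert(root, word, info):
    """Insert a word (or update its info) and return the subtree root."""
    if root is None:
        return Node(word, info)
    if word < root.word:
        root.left = insert(root.left, word, info)
    elif word > root.word:
        root.right = insert(root.right, word, info)
    else:
        root.info = info
    return root


def find(root, word):
    """Return the stored info for a word, or None if it is not in the tree."""
    while root is not None and root.word != word:
        root = root.left if word < root.word else root.right
    return None if root is None else root.info


def load_lexicon(xml_text):
    """Build the tree from every <entry> element in the XML text."""
    root = None
    for entry in ET.fromstring(xml_text).iter("entry"):
        info = {"pos": entry.get("pos"), "gloss": entry.get("gloss")}
        root = insert(root, entry.get("word"), info)
    return root
```

After `tree = load_lexicon(SAMPLE_XML)`, `find(tree, "المعلم")` returns the stored part of speech and gloss, and `find` returns `None` for a word that was never loaded. An unbalanced tree like this one degrades toward a linked list if entries are inserted in sorted order.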

Giving Credit where Credit is Due

• The parser implementation is based, with some modifications, on the Arabic grammar from

“Developing Arabic Parser in a multilingual Machine Translation system”

M. Sc. Thesis

Submitted by

Ahmed Farouk Ahmed

Cairo University

Arabic MT (8)

•Arabic Parser

•Expects the list of supported words of the input sentence

•Asks currentTree (built by the Normalizer) for word information

•Recognizes well-formed Arabic sentences that follow the Arabic grammar constituents (later slide)
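The recognition step can be pictured as matching the categories of the input words against a set of allowed sentence patterns. The sketch below uses a made-up three-pattern grammar and a plain dictionary in place of currentTree; neither reflects the grammar from the thesis the parser is based on, and the names `WORD_INFO`, `PATTERNS`, and `recognize` are assumptions.

```python
# Hypothetical word categories, standing in for what currentTree would supply.
WORD_INFO = {"شرح": "V", "المعلم": "N", "الدرس": "N", "جديد": "ADJ"}

# Hypothetical toy grammar: allowed category sequences and their sentence type.
PATTERNS = {
    ("V", "N"): "verbal sentence (verb, subject)",
    ("V", "N", "N"): "verbal sentence (verb, subject, object)",
    ("N", "ADJ"): "nominal sentence (subject, predicate)",
}


def recognize(words):
    """Return the sentence type if the words form a known pattern, else None."""
    try:
        tags = tuple(WORD_INFO[word] for word in words)
    except KeyError:
        return None  # a word the lexicon does not support
    return PATTERNS.get(tags)
```

Here `recognize(["شرح", "المعلم", "الدرس"])` identifies a verbal sentence, while an unsupported word or an unlisted category sequence yields `None`. A fuller parser would build constituents recursively instead of listing flat patterns.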

Grammar Constituents Structure

[Figure: grammar constituents structure, from Ahmed Farouk's M.Sc. thesis]

Arabic MT (9)

•Translator

•Applies some Arabic-to-English transformation rules

•Morphology

•Asks the Lexicon for the word's number, type, gender, etc.

•Syntax

•Tries to rearrange the translated sentence
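As an illustration of a morphology rule, the sketch below picks an English word form from features of the kind the Lexicon is asked for. The feature names (`type`, `number`, `tense`), the entry layout, and the function name `english_form` are assumptions, and only regular English plurals and past tenses are covered.

```python
def english_form(entry):
    """Choose an English form from hypothetical lexicon features.

    Handles regular plurals and regular past tense only; irregular
    forms would need their own dictionary entries.
    """
    gloss = entry["gloss"]
    if entry["type"] == "noun" and entry.get("number") == "plural":
        return gloss + ("es" if gloss.endswith(("s", "x", "ch", "sh")) else "s")
    if entry["type"] == "verb" and entry.get("tense") == "past":
        return gloss + ("d" if gloss.endswith("e") else "ed")
    return gloss
```

For example, an entry with gloss "teacher", type "noun", and number "plural" yields "teachers", and a past-tense verb entry with gloss "explain" yields "explained". Gender is not used here because it rarely changes the form of an English noun or verb, although it would matter when choosing pronouns.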

Hind Abdulkhaleq (habdulkhaleq@live.com)