+ All Categories
Home > Documents > COMPUTATIONAL MODELING OF YORÙBÁ SPLIT-VERBS … · COMPUTATIONAL MODELING OF YORÙBÁ...

COMPUTATIONAL MODELING OF YORÙBÁ SPLIT-VERBS … · COMPUTATIONAL MODELING OF YORÙBÁ...

Date post: 08-Dec-2018
Category:
Upload: lamliem
View: 231 times
Download: 0 times
Share this document with a friend
12
www.ijarcsa.org COMPUTATIONAL MODELING OF YORÙBÁ SPLIT-VERBS FOR ENGLISH TO YORÙBÁ MACHINE TRANSLATION SYSTEM ELUDIORA SAFIRIYU I. 1 OKUNOLA TUNDE 2 ODETUNJI A. ODEJOBI 3 1, 2 & 3 Department of Computer Science & Engineering, Obafemi Awolowo University, Ile-Ife, Nigeria. ABSTRACT Sentences that contain split verbs pose a serious challenge in the translation process achine Translation system, due to the fact that they do not follow the pattern of normal verbs in the sentential form. These split verbs differ from one another in their structure depending on the part of speech that precede them in the sentence. When translated they generate invalid sentence. We examined the computational model of the split verbs and developed a software artefact that we used for the test of the model. We modelled within the content of context free grammar. We used rule-based approach to develop the software. The results showed that split-verbs can be modelled computationally. The software codes will be used as a module in the on-going development of English to . Keywords: Keywords: split-verb, word, word-swapping, model, linguistics, rule-based I. INTRODUCTION People from different nations speak different languages to communicate, to exchange ideas, sell products, offer help, formulation and acceptance of theories etc. There is need for both parties to understand the language with which they communicate. The early researchers made use of human translators which was not efficient at this time. Later they concentrate exclusively on translation of scientific text and technical documents in textual form. This approach led to the development of machine translation. The history of machine translation could be traced backed to seventeen century when ideas of universal, philosophical languages and of mechanical dictionaries came to existence. The first practical suggestions was made in 1933 with two patents issued in France and Russia to Georges Artsrouni and Petr Trojanskij respectively both having the patent for mechanical dictionary. They later went further with proposals for coding and interpreting grammatical functions using universal (Esperanto-based) symbols in a multilingual translation device. Since then there had been machine translators for different languages. A machine translator depending on the language it might involve the integration of different structures or aspect of the language to form a complete machine translation for a language. , some verbs differ in pattern when translated following each other, they are called split verbs. Example of such verb is believed ( ). Its form in the sentence can be seen in the examples below: 1. He believed the boy ( mnáà gb ) Also, cheat - r j2. He cheats Ola ( ) Compare with the verbs like dance ( ). Dance is a non-splitting verb and the sentence below explains it. 3. Olu danced at the party (náà) INTERNATIONAL JOURNAL OF ADVANCED RESEARCH IN COMPUTER SCIENCE AND APPLICATIONS ISSN 2321-872X ONLINE ISSN 2321-8932 PRINT IMPACT FACTOR : 0.782 VOLUME 3, ISSUE 4, APRIL 2015. www.ijarcsa.org 1 [email protected]
Transcript

www.ijar

csa.o

rg

COMPUTATIONAL MODELING OF YORÙBÁ SPLIT-VERBS FOR

ENGLISH TO YORÙBÁ MACHINE TRANSLATION SYSTEM

ELUDIORA SAFIRIYU I. 1 OKUNOLA TUNDE

2 ODETUNJI A. ODEJOBI

3

1, 2 & 3

Department of Computer Science & Engineering, Obafemi Awolowo University, Ile-Ife,

Nigeria.

ABSTRACT

Sentences that contain split verbs pose a serious challenge in the translation process

achine Translation system, due to the fact that they do not follow the pattern of normal

verbs in the sentential form. These split verbs differ from one another in their structure depending on

the part of speech that precede them in the sentence. When translated they generate invalid sentence.

We examined the computational model of the split verbs and developed a software artefact that we

used for the test of the model. We modelled within the content of context free grammar. We used

rule-based approach to develop the software. The results showed that split-verbs can be modelled

computationally. The software codes will be used as a module in the on-going development of

English to .

Keywords: Keywords: split-verb, word, word-swapping, model, linguistics, rule-based

I. INTRODUCTION

People from different nations speak different

languages to communicate, to exchange ideas,

sell products, offer help, formulation and

acceptance of theories etc. There is need for both

parties to understand the language with which

they communicate. The early researchers made

use of human translators which was not efficient

at this time. Later they concentrate exclusively

on translation of scientific text and technical

documents in textual form. This approach led to

the development of machine translation.

The history of machine translation could be

traced backed to seventeen century when ideas

of universal, philosophical languages and of

mechanical dictionaries came to existence. The

first practical suggestions was made in 1933

with two patents issued in France and Russia to

Georges Artsrouni and Petr Trojanskij

respectively both having the patent for

mechanical dictionary. They later went further

with proposals for coding and interpreting

grammatical functions using universal

(Esperanto-based) symbols in a multilingual

translation device. Since then there had been

machine translators for different languages. A

machine translator depending on the language it

might involve the integration of different

structures or aspect of the language to form a

complete machine translation for a language.

, some

verbs differ in pattern when translated

following each other, they are called split verbs.

Example of such verb is believed ( ). Its

form in the sentence can be seen in the examples

below:

1. He believed the boy ( Ọmọ

náà gb )

Also, cheat - r jẹ

2. He cheats Ola ( Ọ ẹ)

Compare with the verbs like dance ( ). Dance is

a non-splitting verb and the sentence below

explains it.

3. Olu danced at the party (Ọ

náà)

INTERNATIONAL JOURNAL OF ADVANCED RESEARCH IN COMPUTER SCIENCE AND APPLICATIONSISSN 2321-872X ONLINE ISSN 2321-8932 PRINT IMPACT FACTOR : 0.782VOLUME 3, ISSUE 4, APRIL 2015.

www.ijarcsa.org 1 [email protected]

www.ijar

csa.o

rg

Also sang (kọrin)

4. Ade sang at the party ( ọrin ní patí

náà).

Other splitting verbs are rebuke ( ), scatter

( ), lost (sọ ) etc., these verbs and others

follow the same pattern as those in the examples

1 and 2 above.

Translation (MT) arise from the fact that, the

number of words in the sentence increased by

one in the equ which is

not so for the English language verbs.

In addition, if the split verb in the sentence is

followed by a preposition like in (nínú) as we

have in example 5.

5. He believed in God (Ó nínú

Ọ ).

In sentence

machine translation system development. The

development of machine translators

system, the project reported here focused on the

translation of simple sentences in the form

subject-verb-object (SVO). For instance, the

sentence 6 and 7

6. Olu<subject> damaged <verb> the

<determiner>

radio<object>(Olú<subject>ba<verb1>rédí

ó<noun> náà<determiner> j <verb2>)

7. The<determiner>tall<adjective>boy<noun>

believed<verb>in<preposition>his<prono

un>friend<noun>

(Ọmọ

<verb>nínú<pre

position> r <noun>r <pronoun>).

It should be noted that the two languages are

SVO, but word-swapping do occur in among

some

speech. For example, verb ( r ìṣ e).

We applied rule-based approach. This approach

involves linguistic information about source and

target languages in which bilingual dictionary

was used. English language is the source

language (SL) while

target language (TL). Rule based approach

provides the re-write rules to generate the output

for the TL from SL.

II. YORÙBÁ LANGUAGE

-Congo language spoken in

Nigeria and other parts of the world. According

to [1] of over

25 million (South West Nigeria only). Its loan

words are mostly from Arabic, English, Hausa

and Igbo languages. Its dialects include: gbá,

Ìj bú, y /Ìbàdàn, Èkìtì, Ìgbómìnà, Ìj sà,

Ìkál /Òndó and If [2].

According to the International African Institute

[3], the Yorùbá language is used by the media

i.e. the Press, Radio and Television. It is also

used as a language of formal instruction and a

curriculum subject in the primary, secondary and

post-secondary school (including University). It

has a standard orthography.

.

th many

other language groups in Nigeria and in some

African countries; so it has several exonyms

INTERNATIONAL JOURNAL OF ADVANCED RESEARCH IN COMPUTER SCIENCE AND APPLICATIONSISSN 2321-872X ONLINE ISSN 2321-8932 PRINT IMPACT FACTOR : 0.782VOLUME 3, ISSUE 4, APRIL 2015.

www.ijarcsa.org 2 [email protected]

www.ijar

csa.o

rg

(outside names) like Yáríbà, Yórúbáwá, Nàgó

Ànàgó, Lùkúmì, and Akú [4].

A.

By the definition of grammar in linguistics;

grammar is the set of structural rules that govern

the composition of clauses, phrases,

and words in any given natural language. It

refers to the study of such rules in the fields of

morphology, syntax, and phonology, often

complemented by phonetics, semantics,

and pragmatics.

B. Alphabets

-five alphabets, out of which

seven are vowels and the rest are consonants.

: Aa Bb Dd Ee Ẹẹ Ff Gg GBgb

Ii Hh Jj Kk Ll Mm Nn Oo Ọọ Pp Rr Ss Ṣ ṣ Tt

Uu Ww Yy

(a) - owels: a e ẹ i o ọ u

There are five Nasalised vowels (an, ọn, in, ẹn,

un) and two syllabic nasal vowels (m, n) [5] and

[6]

do)] respectively

[7]

. For example, a word that

has the same form (i.e. vowels and consonants)

can have different meanings depending on the

tones that it has:

Igba -- two hundred

-- calabash

-- time

-- the season when perennial crops

have the least production

-- garden egg

-- climbing rope

The mid tone is usually left unmarked on

vowels. Out of the three basic (high, low and

mid) tones that are attested in the language, only

the high tone cannot occur on a word initial

vowel [7]. This is why potential words such as

those given below are not possible in the

language.

orí (cf. orí) -- a head

i (cf. ) -- a bottle

) -- a curse

(cf. ) -- bitter leaves

cf (common form)

Loan words that have closed syllables in the

source languages are made to conform to the

forms acceptable in the language [7]:

tì -- shirt

k sì -- course

Here, vowel /i/ is inserted to re-syllabify the

coda from the English loan. Consonant clusters

are not allowed in either. Therefore

consonant clusters in the loan words are re-

syllabified. The most common method for

consonant cluster simplification is vowel

insertion. For example, vowel /i/ is inserted to

simplify consonant clusters in.

tì -- slate

sì -- class

dir -- driver

-- trailer

rather it ends with vowel sounds [A E Ẹ I O Ọ U

AN ẸN IN ỌN UN]

C. Morphology

Morphology is the branch of linguistics that

studied the internal structure of words and how

they are formed in a language [8]. Morphology

accounts for word formation in language. The

basic unit of analysis in morphology is called the

„morpheme‟. A morpheme is defined as the

meaningful unit of grammatical analysis, that is,

a meaningful sequence of sound which is not

divisible into smaller unit. have some

productive methods of word derivation. The

main morphological processes in the language

include: affixation, compounding and

reduplication.

D. Affixation

uses pre-fixation and in-fixation to

derive new words. Each of the Yo oral

vowels (except /u/ in the standard dialect) can be

used as a prefix to derive a new word. Each of

INTERNATIONAL JOURNAL OF ADVANCED RESEARCH IN COMPUTER SCIENCE AND APPLICATIONSISSN 2321-872X ONLINE ISSN 2321-8932 PRINT IMPACT FACTOR : 0.782VOLUME 3, ISSUE 4, APRIL 2015.

www.ijarcsa.org 3 [email protected]

www.ijar

csa.o

rg

the usable six oral vowels (a, e, ẹ, i, o, ọ) has

two forms as a prefix: mid toned and low toned.

They are attached to verbs to derive nouns as

follows:

Low toned prefixes

+ d „to be soft‟ = d -- idiot

ì + „to break‟ = ì -- poverty

+ gún „to pierce‟ = gún -- thorn

Mid toned prefixes

ẹ + „to carry‟ = ẹ -- load

ọ + dẹ „to hunt‟ = ọdẹ -- hunter

a + „to sieve‟ = a -- sieve

Infixes are (usually) inserted between two forms

of the same word to derive a new word:

„house‟ + kí + ( ) -- a bad

house / any house

ọmọ „child‟ ọmọ + kí + ọmọ (ọmọk mọ)

-- a bad child

D. Compounding

also derive new words by combining

two independent words:

ẹran „meat‟ + oko „farm‟ = ẹranko --

animal

„mother‟ + ọkọ „husband‟ = iyakọ --

mother-in-law

E. Reduplication

derive nominal items/adjectives from

verbs through a partial reduplication of verbs.

New nouns can also be derived by a total

reduplication of an existing noun.

jẹ „to eat‟ = jíjẹ -- edible

„to cook‟ = -- cooked

ọmọ „child‟ = ọmọọmọ -- grand-children

„mother‟ = ì -- grand-mother

III. ENGLISH AND YORÙBÁ

The parts of speech that are attested in

include: verbs, nouns, adjectives, prepositions,

pronoun etc as we have in English language.

A. r Orúkọ (Noun)

A noun in as we have in English

language is the name of a person, animal, place,

things etc. A noun can take two forms in

language structure; it can be the subject (the

performer of an action in) or the object (receiver

of an action) in a sentence. For example;

Olú - name of a person

Ìbàdàn – name of a place

Ewúr – Goat (animal)

B. pò r Orúkọ (Pronoun)

Pronouns are words used in place of nouns.

For example:

àwa – we

àwọn – they/them

Ó – he/she/it ( has no gender

pronoun as we have in English)

C. r Ìṣ e (Verbs)

A verb is a word in a sentence expressing the

action performed by the subject or received by

the object. It is known as “ r ìse” in .

verbs can be monosyllabic as we have in

the case of:

lọ -- to go

n -- to sleep

-- to die

-- to break

Also it can be more than one syllable as we have

in the case of:

-- to forget

t -- to follow

-- to insult

Some of the verbs are discontinuous

morphemes. They are called splitting verbs in

the traditional grammar [9].

j -- to get spoiled/ to damage

n -- to introduce

does not mark any agreement between

the verb and the number feature of the nouns:

Like/likes – f ràn

INTERNATIONAL JOURNAL OF ADVANCED RESEARCH IN COMPUTER SCIENCE AND APPLICATIONSISSN 2321-872X ONLINE ISSN 2321-8932 PRINT IMPACT FACTOR : 0.782VOLUME 3, ISSUE 4, APRIL 2015.

www.ijarcsa.org 4 [email protected]

www.ijar

csa.o

rg

D. Splitting Verbs

According to [9], “when splitting verbs used

with an object, each verb in this class is always

split into halves and the object is inserted

between them”. Reference [10] reported the split

verbs in relation to work of [11] and [12] which

analyse further the work of [9]. However,

Reference [10] observed that split verb usually

give idiomatic expression in the

equivalent meaning. Our research reported in

this paper is limited to context free grammar.

For example: Case 1

1. Ade damaged the radio

2. Ó ba rédìò náà j

Also:

1. Taye introduced

2. Táyé fi Adé hàn, according to [13].

From the example above, the English word

“damaged” was spitted to “ba” and “j ” in the

context. This makes the translation

difficult. A more complex situation is

encountered in the following examples:

Case 2

1. Taye introduced Ade to me (Táyé fi

Adé hàn mí)

Also

2. He reported Bayo to me (Ó fi ẹj

Báy sùn mí)

From the two examples above, the second

syllable does not assume the last position as in

case 1. Some splitting verbs do split to two or

more syllables in its context. Other

examples of splitting verbs include:

Helped --“ran” + “lowo”

Hungry --“n” + “pa”

Reference [14] and [15] described An example

of how verb splits can be found in the Bilingual

N-gram Statistical Machine Translation [14 and

15] implemented a translation model based on

the finite-state perspective, which is used along

with a log-linear combination of four additional

feature functions [16].

Moreover, [17] presents a lexical decomposition

of composite forms within the framework of a

lexical morphology vis-à-vis using his

knowledge as a native speaker of the language.

He found that some complex verbs have had

their forms modified, thereby causing some

disturbances in the recognition of their

component forms.

IV. Machine Translation Approaches

Statistical machine translation is a machine

translation paradigm where translations are

generated on the basis of statistical models

whose parameters are derived from the analysis

of bilingual text corpora [18].

Statistical machine translation (SMT) is

characterised by the use of machine learning

methods, for example, Hidden Markov Model

for POS tagging. In less than two decades, SMT

has come to dominate academic MT research

and has gained a share of the commercial MT

market [19].

Hybrid machine translation (HMT) leverages the

strengths of statistical and rule-based translation

methodologies [20].

Rule-based Machine Translation (RBMT) also

known as `Knowledge-based Machine

Translation', `Classical Approach' of MT is a

general term that denotes machine translation

systems based on linguistic information about

source and target languages. Basically the

linguistic information can be retrieved from

(bilingual) dictionaries and grammars covering

the main semantic, morphological and syntactic

regularities of each language [21] and [22]

Reference [23] presented rule based machine

translation (RBMT) approach for English to

Bangla. The Authors used of fuzzy rules that is,

the if- then rules and the formation of bilingual

dictionary for the languages.

Reference [24] Shallow-transfer MT system

from Swedish to Danish was developed.

Transfer-based MT model based on Apertium

INTERNATIONAL JOURNAL OF ADVANCED RESEARCH IN COMPUTER SCIENCE AND APPLICATIONSISSN 2321-872X ONLINE ISSN 2321-8932 PRINT IMPACT FACTOR : 0.782VOLUME 3, ISSUE 4, APRIL 2015.

www.ijarcsa.org 5 [email protected]

www.ijar

csa.o

rg

platform was used. The following steps were

used to develop the system 1) Resources 2)

Analysis and generation 3) Disambiguation 4)

Lexical transfer 5) Syntactic transfer 6) Status.

V. SYSTEM ARCHITECTURE

The software architecture of a system is the set

of structures needed to reason about the system,

which comprise software elements, externally

visible properties, relations among them, and

properties of both. The term also refers to

documentation of a system's "software

architecture". The architecture explains the

decomposition of system into sub-systems. This

software contains

A. The User Interface

The user interface is important in most

application software because the user will make

use of it to interact with the system. The test on

any software starts at the interface level; failure

at interface might lead to the condemning the

software. The interface contains two textboxes,

two labels, a list-box, file-menu and a button.

The labels give the description to the textboxes.

The first label contains the description: “English

sentence” and the second label: “

sentence”. One of the textbox is design to accept

the user input that is; English sentence, the

second textbox displays the translation of the

inputted sentence in and is not editable.

Also a list box which displays the word(s) in the

input sentence that is (are) not in the database. A

file menu that will allow the user to view the

words in the database, add to list of words, edit

and delete words. Finally a button labeled

“Translate” which handle the event of the

translation process.

B. Software Modules

The modules can also be called classes or

functions which perform various operations

leading to the output of the translation. They

include

1) Parts of Speech (POS) Tagging

It splits or tokenizes the input sentence to

separate words which have meaning in the

English context. For example the sentence: “He

damage the radio” will be tokenize to an array

of words,

He<Prn>damage<V>the<Det>radio<N>.

These words were tagged manually. Each word

in the word array its corresponding part of

speech then returns the pattern to the parser to

rearrange the pattern. This function also gets the

translation of each word in the array. The

translations, the part of speech and the words are

stored in a dictionary for easy referencing in

other part of the program. The structure of the

dictionary can be seen in the illustration below:

English sentence: “He damaged the radio”

He<Prn>damage<V>the<Det>radio<N>

equivalent sentence:

Ó<Prn>ba<V1>àga<N>náà<Det>j <V2>

The verb in the content of split verbs had

already qualified the split verbs as verb so there

is no need to store it part of speech in the

database rather we stored the two parts it splits

to in the database.

C. Lexicon

In computation linguistics, lexicon supplies

paradigmatic information about words, including

parts of speech, labels, irregular plurals and sub

categorization information for verbs.

Traditionally, lexicons are small and constructed

largely by hand. There is a growing concern that

effective natural language processing requires

increased amount of lexical information. A

recent trend has been the use of automatic

techniques applied to large corpora for the

purpose of acquiring lexical information from

text [25].

D. System Database

The database is responsible for storing the

English and words and their

parts of speech. This collection of words and

their meaning can also be called corpora. The

INTERNATIONAL JOURNAL OF ADVANCED RESEARCH IN COMPUTER SCIENCE AND APPLICATIONSISSN 2321-872X ONLINE ISSN 2321-8932 PRINT IMPACT FACTOR : 0.782VOLUME 3, ISSUE 4, APRIL 2015.

www.ijarcsa.org 6 [email protected]

www.ijar

csa.o

rg

database contains two tables one for the splitting

verbs and the other one for words that are used

with the splitting verbs. Figure 1 shows the

words, translation and POS used with the

splitting verbs and figure 2 shows the splitting

verbs, their translations; the views are presented

in form of screen shot of the words in a notepad.

Figure 1. Words frequently used with

splitting verbs

Figure 2 Splitting verbs table

VI. SYSTEM DESIGN

Automata is the study of abstract machine and

how they are used to solve computational

problem. It is used in application areas like;

communication systems, compiler, computer

hardware system, numerical computation etc. its

application in relative to this project work is the

compiler. A language translator is a form of

compiler that takes input sentence from a source

language and maps it to sentence in the target

language.

The automata structure of the sentence is a

context free grammar (CFG). It is the most

common way of modeling constituency.

CFG = Context-Free Grammar = Phrase

Structure Grammar = BNF = Backus-Naur Form

Grammar (G) is a mechanism to describe the

language; it is used to de

sentences with respect to

the context free grammar can be described as a

four tuple grammar given as:

G = {T, N, S, R}

Where:

T is set of terminals (lexicon)

N is set of non-terminals For NLP, we usually

distinguish out a set P N of pre-terminals

which always rewrite as terminals.

S is start symbol (one of the non-terminals)

R is rules or productions of the form X → ,γ

where X is a non-terminal and γ is a sequence

of terminals and non-terminals.

Production is used to specify how a grammar

transforms one string to another thus defining a

language associated with a grammar.

T = {a noun, a verb, a pronoun, a

preposition, determinant or article, adjective e.g.

He as a pronoun, Damaged as a verb, the as a

determinant, In as a preposition}.

N = {NP = noun phrase, VP = verb phrase, PP =

prepositional phrase also DET(determinant),

N(noun or pronoun), V(Verb), P(preposition),

INTERNATIONAL JOURNAL OF ADVANCED RESEARCH IN COMPUTER SCIENCE AND APPLICATIONSISSN 2321-872X ONLINE ISSN 2321-8932 PRINT IMPACT FACTOR : 0.782VOLUME 3, ISSUE 4, APRIL 2015.

www.ijarcsa.org 7 [email protected]

www.ijar

csa.o

rg

ADJ(adjective), which will be replaced by

corresponding terminal symbols.

and

is either N or DET in English}.

P = {this is the rule of sentence

formation}.

The table 1 below shows the sentence formation

rule, the decomposition and the arrangements of

non-terminal .

The sentence structures can also be represented

with a transition graph. It is a graph that consists

of three things:

A finite set of state

Input strings form non-terminal symbols

A finite set of transition that shows how to

move from one state to another depending

on the input string.

Figures 3 and 4 below show the transition graph

of English and automata structure

respectively. The graphical representation of the

English structure has one verb and the verb does

not split. The English structure can be used

generate many sentences by following the

transitions. It can only be used for a simple

sentence because it has only one verb. There are

two phrases: NP + VP that make a sentence.

There is provision for adjectival phrase which

can take care of noun qualifier. Figure 4

describes the -

-

. Figure 4 shows how we

can realise the translation of simple sentences

using the finite state diagram. The rules were

used to design the state diagram. It is possible to

qualify subject and object in the diagram shown

in figure 4. For example, the tall boy damaged

the red car - ọmọkùnrin gíga náà ba ọk pupa

náà j . We tested the rules and the transition

using the JFLAP shown in figure 5.

Table 1 Sentence Formation Rules

English productions productions

S> →

<NP><VP>

<VP> →

<V><NP>

<NP> →

<DET><N>

<NP>→

DET><ADJ><N>.

<NP> → <N>

<S>→ <NP><V1P><V2>

<V1P> → <V1><NP>

<NP> → <N><DET>

<NP>→<N><ADJ><DET>

<NP> → <N>

English Structure

Figure 3 Transition graph for English

sentence structure

structure

Figure 4. Transition graph for

sentence structure.

N/P

rn

De

t

V2

<S>

<VP

>

Ad

j

De

t

<NP> De

t

Adj

N N

2

6

St

op N V1 4 3 5

7 S

t

a

r

t

t

<NP>

<NP>

<S>

<VP>

Det

Stop

Adj

N Det

N V N/Prn

4 3 5 Start

2

<NP>

INTERNATIONAL JOURNAL OF ADVANCED RESEARCH IN COMPUTER SCIENCE AND APPLICATIONSISSN 2321-872X ONLINE ISSN 2321-8932 PRINT IMPACT FACTOR : 0.782VOLUME 3, ISSUE 4, APRIL 2015.

www.ijarcsa.org 8 [email protected]

www.ijar

csa.o

rg

The diagrams above show the transition graphs

for English and , it represents the

sequence of arrangement of part of speech to

form the production of a sentence. Examples

are:

1 Olu believed God →< N V N>

1(a) Ọ Ọ → < N V1 N V2

2. He damaged the radio →< N V Det N >

2(a) →< N V1 N Det V2 >

3.The tall boy damage the box

→<Det AdjNVDetN >

3(a)Ọmọ

→<NAdjDetV1NDetV2 >

4. Ade cheated the tall boy →<N V Det Adj N >

4(a) Adé r ọmọ ẹ

→< NV1NAdjDetV2 >

Pronouns are treated as noun because they

followed the same pattern in the sentence. The

examples above shows some sequence of

NV1V2, Adj, Det being combined together to

generate a valid sentence. These can be

concatenated following the transition graphs

above to generate a valid sentence for the two

languages. We checked the validity of the re-

write rules in Table 1 using the JFLAP as shown

in figure 5.

Figure 5 JFLAP sample demonstration of the

re-write rules

VII. System Implementation

System implementation deals with the process of

converting system specification to executable

system for the case of software, in the case of

hardware; it involves the construction of

components that will make up the system (gotten

from specification). The process of

implementation involves the total steps taken

from the analysis of the problem to the

production of the executable file. Parts of the

processes include gathering the requirements,

gathering the tools, managing the tools and the

requirements to obtain the result. Some of the

tools used in the implementation of this

application include: Python IDE, wxPython,

Sqlite3 and py2exe.

Python IDE provides an interface for writing

python code, wxPython is a package integrated

to python IDE for building the GUI, sqlite3 also

a package built in with python installer; it

provides the database functionality for the

application and py2exe is used in compiling

python codes to executable (.exe) file. The entire

program code was developed in line with

program design.

A. System Description

. The

system will be described in term of its data

format, requirements for populating the database

and the user interface.

B. System Usage

The user can only enter English sentence in the

space provided then strike the enter button or

click the translate button. If the user does not

enter anything in the space provided for English

sentence before he/she strike the enter button or

clicks translate button; the application will

simply popup a message box telling the user

that he/she has not entered any text.

INTERNATIONAL JOURNAL OF ADVANCED RESEARCH IN COMPUTER SCIENCE AND APPLICATIONSISSN 2321-872X ONLINE ISSN 2321-8932 PRINT IMPACT FACTOR : 0.782VOLUME 3, ISSUE 4, APRIL 2015.

www.ijarcsa.org 9 [email protected]

www.ijar

csa.o

rg

C. Requirements for Populating the Database

Populating th

words; the database was populated with

keyboard that has the capability of tone marking.

So, it will be necessary for the user to use

keyboard with this capability in case there will

be need to add to the words in the database.

D. User Interface

As it was previously mentioned, the user

interface has three textboxes; one of the textbox

accept user input (English sentence), the second

displays the translation of the inp

sentence in a well arranged format based

on the grammar. Also the user interface has a

button labeled “Translate” and a menu for

performing activities like; viewing the database,

editing the database and performing exiting

function.

Figure 6 shows the user interface for the

application, figure 7 shows the popup message it

will show when the user tries to translate an

empty text, followed by the translation of

sentences in figures 8 and 9.

Figure 6. User interface for the application

Figure 7. Popup error message for empty

string

Figure 8. Sample Sentence

All these and many more can be translated with

the translator. One important thing is just for the

user to include a split verb in his sentence;

because the structure of the system is built

around split verbs.

E. Error Handling

This is another important aspect in application

development. Error handling is important in the

development of software to prevent unnecessary

behavior such as; hanging and halting when

unnecessary input is inputted to the system. Two

approach were used in this application; the first

INTERNATIONAL JOURNAL OF ADVANCED RESEARCH IN COMPUTER SCIENCE AND APPLICATIONSISSN 2321-872X ONLINE ISSN 2321-8932 PRINT IMPACT FACTOR : 0.782VOLUME 3, ISSUE 4, APRIL 2015.

www.ijarcsa.org 10 [email protected]

www.ijar

csa.o

rg

one is the management of empty text for

translation, figure 6 already shows the nature of

the popup error message when this is done.

Another approach used was to notify the user

whenever the system comes across a string that

is not in the database. Figure 9 shows a snapshot

of the popup message. This error technique only

allow word which their part of speech is

suggested to be “Noun”; when ok is clicked the

exact word will reflect in the translates sentence

otherwise the system will give a wrong

translation or not give any translation of the

sentence. This was done to guild against the

population of database with names (of: object,

place, things).

Figure 9. Popup menu for a word not in the

database

CONCLUSION

This model will be integrated to English to

machine translation system. This

subsystem modelled can translate simple

sentences that have subject verb object. Again

simple sentence is expected to have one verb. It

cannot handle compound and complex

sentences. There are other concepts that we have

to do separately and then integrate with the

system. Some of these concepts are: tone

changing verbs. Ambiguity, numbering system

etc. These and other concepts will be handled in

future.

References

1. NPC (National Population Commission),

2006 Census. url: www.population.gov.ng,

(Accessed: 25/06/2012).

2. F. A. Fabunmi, “A GPSG Structure of

Aspect in Àkókó” in a book titled

Current Perspectives in Phono-Syntax and

Dialectology, Ghana, 2009, Pp: 159.

3. International African Institute, “Provisional

Survey of Major Languages in the

Independent States of Sub-Saharan Africa”,

P. Baker (ed.). UNESCO: International

African Institute, 1980.

4. F. A. Fabunmi, and A. S. Salawu, “Is

Yorùbá an Endangered Language?”, Nordic

Journal of African Studies 14(3): 391-408,

2005, url: http://www.njas.helsinki.fi/pdf-

files/vol14num3/fabunmi.pdf [accessed:

22/11/2014]

5. A. Bamgbose, A Short Yoruba Grammar.

Heinemann Educational Books Ltd, Ibadan,

First edition, 1967.

6. L. O. Adewole, The Yoruba Language:

Published Works and Doctoral

Dissertations. Helmut Buske, Hamburg,

1987.

7. K. Owolabi, Ìjìnl Ìtúpal Èdè Yorùbá (1)

Fòn tíìkì ati Fon lọjì, Extension

Publications Limited, Molete, Ibadan, 2009.

8. J. Lyor, “Language and Linguistics: an

introduction” United Kingdom Cambridge

University press, 1981

9. O. Awobuluyi, “Essentials of

Grammar”, Oxford University Press, Ibadan,

1978.

10. A. Howell, “Abstracting over Degrees in

Yoruba Comparison Constructions1”,

Universität Tübingen, 2012, url:

INTERNATIONAL JOURNAL OF ADVANCED RESEARCH IN COMPUTER SCIENCE AND APPLICATIONSISSN 2321-872X ONLINE ISSN 2321-8932 PRINT IMPACT FACTOR : 0.782VOLUME 3, ISSUE 4, APRIL 2015.

www.ijarcsa.org 11 [email protected]

www.ijar

csa.o

rg

http://semanticsarchive.net/sub2012/Howell.

pdf [accessed: 22/11/2014].

11. O. Awobuluyi, The Yoruba Verb Phrase. In

A. A. (Ed.), Yoruba language and literature,

University of Ife Press, 1982.

12. O. G. Bode, Yorùbá Clause Structure. Ph. D.

thesis, University of Iowa, Iowa City, 2000.

13. J. D. Atoyebi, “Complex Verbs and Valency

Classes in Yorùbá, Workshop on Valency

Classes”, MPI-EVA, Leipzig, Aug. 2010,

Pp: 1-12, [Accessed: 13/11/2014].

14. A. De Gispert, and J. B. Mari˜no, “Using X-

grams for speech-to-speech translation” In

Proceeding. of the 7th Int. Conf. on Spoken

Language Processing, 2002.

15. A. De Gispert, J. B. Mari˜no, and J.M.

Crego, “TALP: X-gram-based spoken

language translation system”, Proceeding of

the Int. Workshop on Spoken Language

Translation, Kyoto, Japan, October, 2004,

pp.:85–90.

16. J. M. Crego, J. B. Mari˜no, and A. De

Gispert, “A Ngram-based Statistical

Machine Translation Decoder”, Submitted

to INTERSPEECH, 2005.

17. J. A. Ogunwale, “Problems of Lexical

Decomposition: The Case of Yoruba

Complex Verbs”, Nordic Journal of African

Studies vol: 14, issue: 3, 2005, Pp: 318–333.

18. W. Weaver, Translation, in Machine

Translation of Languages. MIT Press,

Cambridge, 1955.

19. A. Lopez, “Hierarchical Phrase-Based

Translation with Suffix Arrays” In

Proceedings of the 2007 Joint Conference on

Empirical Methods in Natural Language

Processing and Computational Natural

Language Learning (EMNLP-CNLL,

Association for Computational Linguistics,

url: http://www.aclweb.org/anthology/ D/ D

07 /D07-1104, (Accessed: 23/06/2011).

20. A. Boretz, A. (2009). AppTek Launches

Hybrid Machine Translation Software.

Magazine, Speech Technology, 2009, url:

http://www.speechtechmag.com/Articles/Ne

ws/ News-eature/AppTek-Launches-Hybrid-

Machine-Translation-Software-52871.aspx

(Accessed:04/08/2011).

21. F. Bond, “Introduction to Machine

Translation", url:www.cs.mu.oz.au /research

/it /nlp06/ materials/Bond /mt-intro.pdf,

(Accessed: 02/ 03/ 2011), 2006.

22. M. Osborne, “Machine Translation (MT)

History and Rule-Based Systems”, School of

Informatics, University of Edinburgh, 2012.

23. C. P. Francisca, S. L. Virach, C. P. Anee,

“Improving Translation Quality of Rule

Based Machine Translation”. Proceeding of

the workshop on Machine Translation; Asia,

2002.

24. F. M. Tyres, and J. Nordfalk, “Shallow

transfer RBMT for Swedish to Danish”,

Proceeding of the First International

Workshop on free/open-source RBMT, Pp.

27-33, 2009.

25. U. Zernike, “Lexical acquisition: Exploring

on-line resources to build a lexicon

Hillsdale”, NJ: Lawrence Erlbaum

Associates, 1991.

INTERNATIONAL JOURNAL OF ADVANCED RESEARCH IN COMPUTER SCIENCE AND APPLICATIONSISSN 2321-872X ONLINE ISSN 2321-8932 PRINT IMPACT FACTOR : 0.782VOLUME 3, ISSUE 4, APRIL 2015.

www.ijarcsa.org 12 [email protected]


Recommended