
Reza Safabakhsh and Peyman Adibi

April 2005 The Arabian Journal for Science and Engineering, Volume 30, Number 1B. 95

NASTAALIGH HANDWRITTEN WORD RECOGNITION USING A CONTINUOUS-DENSITY VARIABLE-DURATION HMM

Reza Safabakhsh* and Peyman Adibi** Computational Vision/Intelligence Laboratory

Computer Engineering Department, Amirkabir University of Technology

Hafez Avenue, Tehran, Iran

ABSTRACT (translated from the Arabic)

In this research we present a complete system for the recognition of handwritten Farsi (Nastaaligh script) words using a hidden Markov model with continuous observation densities and variable state durations (CDVDHMM). In the preprocessing stage, after binarization, noise removal, and extraction of the connected components of the input image, new algorithms are used to detect the ascenders, descenders, dots, and other secondary strokes and to remove them from the main image. A new segmentation algorithm based on analysis of the upper contour and two auxiliary operations is then applied; the purpose of this algorithm is to avoid the under-segmentation problem as far as possible, and variable state durations are employed to remove the effect of over-segmentation. After the right-to-left order is obtained, the resulting sequence of sub-characters is modeled by the CDVDHMM. Eight features, including three Fourier descriptors, are used to represent these symbols in the feature space, and the feature vector is invariant to changes of scale. The states of this model consist of pure characters (without secondary strokes) together with several compound forms used in the Nastaaligh writing style; the model can therefore be trained easily, without the need for any re-estimation method. In the training stage, some components of the model are obtained from the set of training images and the rest from the dictionary, and a modified version of the Viterbi algorithm is then used for recognition. This algorithm gives the best picture for more than one path, takes the different positions and forms of a letter as sub-states, and supports variable state durations. The tests, carried out on handwriting samples with a 50-word dictionary, gave good results for the proposed method.

Key words: word recognition, handwritten, segmentation, Farsi, Arabic, Nastaaligh, continuous, hidden Markov model.

* To whom correspondence should be addressed. Fax: +98 21 6495521, E-mail: ([email protected])

** ([email protected])


ABSTRACT

This paper introduces a complete system for the recognition of Farsi Nastaaligh handwritten words using a continuous-density variable-duration hidden Markov model, CDVDHMM [1]. In the preprocessing stage, after binarization, noise reduction, and connected-component extraction, new algorithms are applied to find and eliminate ascenders, descenders, dots, and other secondary strokes from the original image. Then a new segmentation algorithm based on analyzing the upper contour, together with two other processes, is applied. The main goal of this algorithm is to avoid the under-segmentation problem; considering variable-duration states in the system covers the over-segmentation problem. After the right-to-left order is found, the sequence of obtained sub-characters is modeled by the CDVDHMM. Eight features, including three Fourier descriptors and five structural and discrete features, are used to represent symbols in the feature space. This feature vector is invariant to size and shift. The states in the model are taken to be pure characters (without secondary strokes) plus some compound forms of characters in the Nastaaligh handwriting style. Thus, training the model becomes simple and does not need any re-estimation method. In the training stage, some parameters of the model are obtained from the training image set and the others from the dictionary. At the last stage, a modified version of the Viterbi algorithm is applied for recognition. This algorithm provides more than one globally best path, considers different positions and forms of letters as sub-states, and also supports variable-duration states. Experiments on handwritten samples and a 50-word dictionary show very good performance of the system.

Key Words: OCR, handwritten, word recognition, segmentation, Farsi, Arabic, Nastaaligh, cursive, HMM, CDVDHMM.


1. INTRODUCTION

Off-line recognition of handwritten text has many applications in bank check processing, postal address and zip code recognition, and automated handwritten document entry and understanding. As a result, research interest in this field is increasing and some progress has been made. However, the performance of even the best handwritten text recognition systems is still far from human reading ability. Many papers have addressed the recognition of Latin, Japanese, and Chinese characters in recent years. But although almost one third of the people in the world use Arabic and Farsi characters for writing, only sparse efforts have been made toward the automated recognition of these characters. This is probably the result of a lack of adequate support in terms of funding and other resources, such as comprehensive and standard Arabic or Farsi text databases, dictionaries, etc.; and certainly of the cursive nature of writing in these languages [2]. More details on the state of the art in Arabic character recognition are presented in [2].

An important aspect in classifying character recognition systems is whether segmentation is used and, if so, which method. The concept and various methods of segmentation are reviewed in [3]. Three basic strategies for segmentation are proposed there, such that each segmentation method can be considered as a weighted combination of these three strategies: (1) the classic strategy, which attempts to dissect images into classifiable units; (2) the recognition-based segmentation strategy, which looks for components of the image that match classes of the system's alphabet and decides about segmentation using feedback from the recognition stage; and (3) the holistic strategy, which tries to recognize a word as a whole.

Holistic methods have the advantage that the difficult dissection stage is not required, but their drawback is that the number of words for which the system is designed is limited and cannot be very large. On the other hand, the classic and recognition-based methods are more powerful and are not limited in the number of words they can recognize.

In this paper, we have developed a system for off-line recognition of Nastaaligh handwritten words which uses a recognition-based segmentation method and applies a continuous-density variable-duration hidden Markov model for the recognition task. In Section 2, characteristics of Farsi and Arabic writings are briefly described. In Section 3, the hidden Markov model (HMM) and several word recognition systems based on HMM are discussed. Section 4 describes the operational stages of the overall system. In Section 5, experimental results for each stage of the method are presented. Section 6 concludes the paper.

2. CHARACTERISTICS OF FARSI/ARABIC CURSIVE SCRIPT

Farsi/Arabic scripts differ from Latin scripts in several ways: (1) The shape of a Farsi/Arabic letter is a function of the position of that letter in the word. For each letter, there may be up to four different shapes based on the letter position in the word, called the “first”, “middle”, “last”, and “isolated” forms of the letter (Table 1). In addition, in some Farsi writing styles, there is more than one shape for a letter in a fixed position. (2) Farsi and Arabic writings are naturally cursive. Nevertheless, some characters never connect to the next letter in the word. Because of this, a word can have more than one cursive part; these cursive parts are here referred to as “sub-words”. (3) Farsi and Arabic scripts have various styles, and each writing style can contain new and compound forms of letters. Thus, if we consider, for example, an unconstrained handwritten Farsi text, the number of separate classes that must be considered becomes very large, which makes the recognition process very difficult. (4) Farsi and Arabic characters can have zero, one, two, or three dots over or under them; sometimes, the only difference between two characters is the existence or the number of these dots. (5) Farsi and Arabic text, in contrast to Latin text, is written from right to left. A list of Farsi characters and their different forms is presented in Table 1. The Arabic alphabet is identical to the Farsi alphabet, except that Farsi has four more characters (underlined in Table 1). Characters that are similar except for their dots or other secondary strokes can be considered as one family; for example, the family of character “Be” contains the characters of rows 2 to 5 of Table 1.


Table 1. Farsi character set and shapes of each character in different positions.

No. Character  Isolated  First  Middle  Last
 1  Alef       ا (آ)     ا      ـا      ـا
 2  Be         ب         بـ     ـبـ     ـب
 3  Pe         پ         پـ     ـپـ     ـپ
 4  Te         ت         تـ     ـتـ     ـت
 5  Se         ث         ثـ     ـثـ     ـث
 6  Jim        ج         جـ     ـجـ     ـج
 7  Che        چ         چـ     ـچـ     ـچ
 8  He         ح         حـ     ـحـ     ـح
 9  Khe        خ         خـ     ـخـ     ـخ
10  Dal        د         د      ـد      ـد
11  Zal        ذ         ذ      ـذ      ـذ
12  Re         ر         ر      ـر      ـر
13  Ze         ز         ز      ـز      ـز
14  Zhe        ژ         ژ      ـژ      ـژ
15  Sin        س         سـ     ـسـ     ـس
16  Shin       ش         شـ     ـشـ     ـش
17  Sad        ص         صـ     ـصـ     ـص
18  Zad        ض         ضـ     ـضـ     ـض
19  Ta         ط         طـ     ـطـ     ـط
20  Za         ظ         ظـ     ـظـ     ـظ
21  Ayn        ع         عـ     ـعـ     ـع
22  Ghayn      غ         غـ     ـغـ     ـغ
23  Fe         ف         فـ     ـفـ     ـف
24  Ghaf       ق         قـ     ـقـ     ـق
25  Kaf        ك         آـ     ـكـ     ـك
26  Gaf        گ         گـ     ـگـ     ـگ
27  Lam        ل         لـ     ـلـ     ـل
28  Mim        م         مـ     ـمـ     ـم
29  Noon       ن         نـ     ـنـ     ـن
30  Waw        و         و      ـو      ـو
31  He         ه         هـ     ـهـ     ـه
32  Ye         ي         يـ     ـيـ     ـي

Different scripts that use the Farsi alphabet can be divided into eight groups, and each group can include different styles [4]. Some of these styles were more common in the past and some are more common at present. Most of today's handwritten Farsi texts are written in the “Nastaaligh” and “Naskh” styles; the “Nastaaligh” style, due to its special beauty, is the most popular and favorite writing style among most writers. As a result, the “Nastaaligh” writing style is considered in this paper.

The Nastaaligh style, however, despite its wide popularity, is a difficult style and has numerous rules and exceptions, which make its automatic recognition very hard. Some examples that show such rules and the difficulties of the work are presented below.


• The family of character “Be” (بـ) (rows 2 to 5 of Table 1), depending on the letter following it, is written in different forms.

• The family of character “He” (حـ) (rows 6 to 9 of Table 1), depending on the letter following it, appears in different forms.

• The family of character “Kaf” (آـ) (rows 25 and 26 of Table 1), depending on its position in the word, appears in different forms.

• The character “Mim” (مـ) is written in two different forms, rectangular and circular.

• The character “He” (ه) (row 31 of Table 1), depending on its position in the word, can appear in different forms.

• Cogs in the words sometimes appear as curves.

• The distance between a character and the middle baseline, or the vertical position of the character, can differ depending on the other characters of the word.

• Some characters and compound forms rest on the baseline while others do not. In fact, some parts of words are written at an angle of about 30 degrees to the baseline.

Even if we ignore the problems arising from the multiple shapes of some characters, the last two rules above are sufficient to indicate the difficulty of segmenting Nastaaligh handwritten words.


In order to create appropriate training and testing sets, we must select a minimal set of words that includes the various features of the handwriting style. We constrain the training and testing sets to the Nastaaligh style only, to reduce the number of patterns that must be recognized. Also, the words included in these sets are carefully selected so that they contain all characters, character shapes, and compound forms of characters in the Nastaaligh style. Figure 1 shows some samples of the words used, and Figure 2 illustrates four compound forms of characters in the Nastaaligh style which are included in our system. More details about the selected training and testing sets are presented in Section 5.1.

تخصيص باج گيرها هدهد آاآل تسريع مهمات

Figure 1. Some samples of handwritten words in the test set.

1. Compound form made by composition of the three letters “ك-م-ك”, “ك-م-ل”, or “ك-م-ا”: آمك، اآمل، آمال.

2. The composition “ط-ه”: ظهر.

3. The composition “ه-ا”: بعدها.

4. The compositions “ك-ل”, “ك-ا”, and “ك-ك”: آك، آاآل.

In the above cases, we can also consider “گ” and “ظ” in place of “ك” and “ط”, respectively.

Figure 2. Four compound forms in the Nastaaligh Farsi writing style.

3. TEXT RECOGNITION BASED ON HIDDEN MARKOV MODELS

3.1. Introduction to HMM

Hidden Markov models are based on doubly stochastic processes whose underlying random process is not directly observable (i.e., it is hidden). The transition of the system from the current state to the next state is made based on this underlying process. Observable outputs, or observations, are produced by another stochastic process, which is determined by symbol probabilities. A hidden Markov model with discrete observation symbols is represented by λ = (A, B, Π), where A is the matrix of state transition probabilities, B is the set of discrete probability distributions of the observation symbols, and Π is the vector of initial state probabilities [5].

In some applications, the distribution of observations is considered continuous, and the duration probability of states is modeled in explicit form. For example, in [1] a continuous-density variable-duration HMM (CDVDHMM) is applied. This model is represented by λ = (Π, A, Γ, B, D), whose parameters are:

Initial probability: Π = {π_i}; π_i = Pr{q_i is the first state} (1)

Transition probability: A = {a_ij}; a_ij = Pr{q_j at t+1 | q_i at t} (2)

Last-state probability: Γ = {γ_i}; γ_i = Pr{q_i is the last state} (3)


Symbol probability: B = {b_j(O_t^{t+d})}; O_t^{t+d} = (o_t o_{t+1} … o_{t+d}), b_j(O) = Pr{O | q_j} (4)

Duration probability: D = {P(d | q_i)}; P(d | q_i) = Pr{duration of q_i = d} (5)

In the study of HMMs, there are three basic problems: (1) Given an observation sequence O = {o_1, …, o_T} and a model λ, how do we compute P(O | λ) efficiently? This is the scoring problem. (2) Given an observation sequence O and a model λ, how can we find the state sequence q = {q_1, …, q_T} for O that is optimal in some sense (i.e., best explains the observation sequence)? This is the recognition problem. (3) How can we find the parameters of the model which maximize P(O | λ)? This is the training problem.

The solution to problem (1) is the forward or backward procedure. For problem (2), the most common optimization criterion is finding an optimal state sequence (an optimal path); the Viterbi algorithm yields such a sequence. To solve problem (3), one can apply the Baum–Welch re-estimation algorithm. The reader is referred to [5] for more details about these solutions.
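The Viterbi recursion for problem (2) can be sketched in a few lines. The following minimal Python example uses a hypothetical two-state, two-symbol model; the values are illustrative and are not taken from the paper:

```python
# Minimal Viterbi decoder for a discrete-observation HMM (illustrative only;
# the model values below are hypothetical, not from the paper).

def viterbi(pi, A, B, obs):
    """Return the most likely state sequence and its probability.

    pi[i]   : initial probability of state i
    A[i][j] : transition probability from state i to state j
    B[i][k] : probability of emitting symbol k in state i
    obs     : sequence of observed symbol indices
    """
    n = len(pi)
    # delta[i] = best path probability ending in state i; psi stores back-pointers.
    delta = [pi[i] * B[i][obs[0]] for i in range(n)]
    psi = []
    for o in obs[1:]:
        prev = delta
        delta, back = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: prev[i] * A[i][j])
            delta.append(prev[best_i] * A[best_i][j] * B[j][o])
            back.append(best_i)
        psi.append(back)
    # Backtrack from the most probable final state.
    last = max(range(n), key=lambda i: delta[i])
    path = [last]
    for back in reversed(psi):
        path.append(back[path[-1]])
    path.reverse()
    return path, delta[last]

# Toy 2-state model: state 0 mostly emits symbol 0, state 1 mostly emits symbol 1.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
path, p = viterbi(pi, A, B, [0, 0, 1, 1])
print(path)  # → [0, 0, 1, 1]
```

The same recursion, extended with a loop over admissible state durations, underlies the variable-duration decoding used later in the paper.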

3.2. Text Recognition with HMM

In word recognition problems, there are two main approaches to modeling the observation sequence (pseudo-characters) by an HMM [6]. The first approach is called model-discriminant HMM. In this strategy, a model is constructed for each class of the problem (each word in the lexicon). To recognize an input word, the score for matching the word to each model is computed, and the class of the model with the maximum score gives the result of the recognition. This approach is reasonable for small dictionary sizes, say up to several hundred words; but when the size of the dictionary grows to about 1000 words or more, it incurs excessive complexity in terms of computation and memory. The second approach, called path-discriminant HMM, is to build only one model for all classes and use different paths (state sequences) to distinguish one pattern from the others. A test pattern is classified into the class which has the maximum path probability over all possible paths. This approach is a better alternative for a large or variable dictionary.
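The model-discriminant strategy amounts to an argmax over per-word models. The forward scorer and the two toy word models below are illustrative assumptions, not the paper's actual lexicon models:

```python
# Model-discriminant classification sketch: one HMM per lexicon word; the word
# whose model scores the observation sequence highest wins. The forward
# recursion and the toy models are illustrative, not the paper's actual models.

def forward_score(pi, A, B, obs):
    """P(obs | model) computed by the forward algorithm."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)

def classify(models, obs):
    """Return the lexicon word whose HMM assigns obs the highest likelihood."""
    return max(models, key=lambda w: forward_score(*models[w], obs))

# Two hypothetical left-to-right word models over a 2-symbol alphabet.
models = {
    "word_a": ([1.0, 0.0], [[0.8, 0.2], [0.0, 1.0]], [[0.9, 0.1], [0.1, 0.9]]),
    "word_b": ([1.0, 0.0], [[0.8, 0.2], [0.0, 1.0]], [[0.1, 0.9], [0.9, 0.1]]),
}
print(classify(models, [0, 0, 1]))  # → word_a
```

The path-discriminant alternative replaces this loop over models with a single shared model whose best path encodes the word identity.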

Some researchers have applied path-discriminant methods to Latin handwritten word recognition [1,6–8]. In [7], a second-order HMM is also tested, which has shown better performance than the first-order model for long words. In [6], the inputs to the system are assumed to be unconstrained handwritten words; in this system, the number of states may become too large, and so the speed and precision of the system can decrease. In [1], this problem is removed by considering variable durations for states and using an over-segmentation method, which does not leave any two letters unsegmented. In addition, by considering a continuous density for the observation probabilities, the performance of the system is improved. The problem with this system is the unreliable training of state duration probabilities on limited training databases. To remove this problem, a system whose operation is independent of state duration probabilities is proposed in [8].

In [8–12], model-discriminant approaches are used. In these systems, a left-to-right (for Latin characters) or right-to-left (for Arabic characters) HMM is considered for each character, and the word model is obtained by concatenation of these HMMs. In [13] and [14], 2-D hidden Markov models are applied for the recognition of printed Arabic words, and in [15], the model is used to improve the performance of recognition of Farsi printed sub-words.

4. THE RECOGNITION SYSTEM

In this paper, a system for off-line recognition of Farsi Nastaaligh handwritten words is presented. The stages of the system are illustrated in the block diagram of Figure 3. First, the necessary preprocessing algorithms are applied to the image of the word. Then the word is dissected into its letters or pseudo-letters, a set of features is extracted from the image of each segment or combination of adjacent segments, and recognition is done based on the classification of these feature vectors. In the recognition stage, the previously trained model recognizes the word using these feature vectors. Since single segments and combinations of adjacent segments are examined for finding optimal letters, the segmentation method used in our system can be classified as recognition-based.

Because of the successful applications of hidden Markov models (HMMs) in word recognition systems [1, 6, 16, 17], an HMM-based method is selected for the recognition stage of the system. The model applied is a continuous-density variable-duration HMM [1]. The “recognition – combining adjacent segments – feature extraction” cycle, which is used by the Viterbi algorithm for the HMM, is the key factor in the optimal determination of words in this system.


Figure 3. Block diagram of the system

4.1. Preprocessing

In the preprocessing stage, the input image is first binarized by means of the iterative threshold selection method [18]. Then two morphological operations, closing with a 3 × 3 and opening with a 2 × 2 structuring element, are applied to the image to eliminate spike noise [1]. Next, connected components are found by an algorithm which starts from the top row of the image and builds bounding rectangles; by adding successive rows, the height and width of these rectangles are modified such that when the bottom of the image is reached, the final bounding rectangles, i.e., the connected components, are obtained [19]. The pen width is estimated by an iterative algorithm that computes the mean vertical run length over the columns of the image; run lengths larger than 1.5 times this mean are excluded from the computation of the mean in subsequent iterations [6].
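The iterative threshold selection step can be sketched as a Ridler–Calvard-style iteration that alternates between thresholding and recomputing the two class means; the grey-level data below are hypothetical:

```python
# Sketch of iterative threshold selection for binarization (Ridler–Calvard
# style): start at the global mean, split pixels into two classes, reset the
# threshold to the midpoint of the class means, and repeat until stable.

def iterative_threshold(pixels, eps=0.5):
    """Return a global grey-level threshold separating ink from background."""
    t = sum(pixels) / len(pixels)          # start at the global mean
    while True:
        low = [p for p in pixels if p <= t]
        high = [p for p in pixels if p > t]
        if not low or not high:
            return t
        new_t = 0.5 * (sum(low) / len(low) + sum(high) / len(high))
        if abs(new_t - t) < eps:           # converged
            return new_t
        t = new_t

# Bimodal toy "image": dark ink (~20) on a light page (~200).
grey = [18, 20, 22, 25, 19, 198, 202, 200, 205, 199, 201]
t = iterative_threshold(grey)
binary = [1 if g <= t else 0 for g in grey]   # 1 = ink pixel
print(round(t, 1), binary)
```

On real page images the same iteration is run over the image histogram rather than a raw pixel list, but the fixed point is the same.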

For handwritten Farsi/Arabic words, and especially the Nastaaligh writing style, baseline detection is a difficult and unreliable process. For the Nastaaligh style, sometimes more than one baseline, or a slanted baseline, must be considered; thus, the information provided by the horizontal histograms of words may not be sufficient for baseline detection. Some other methods for baseline detection have also been proposed (e.g., in [20]), but due to their high complexity, they should be used only when necessary. As a result, we design our system such that it works independently of the baseline.

In our system, since the model receives a sequence of segments, the right-to-left order of the segments must be specified after segmentation. In the absence of the baseline, the ascenders of characters “Kaf” and “Gaf” (Figure 4(b)) and the descenders of characters “Jim”, “Che”, “He”, “Khe”, “Ayn”, and “Ghayn” (Figure 4(a)) cause incorrect determination of the right-to-left order. Thus, it is desirable to eliminate problematic ascenders and descenders in the preprocessing stage. This elimination also prevents their incorrect segmentation, and therefore decreases the segmentation errors. Furthermore, since the recognition process is based on the pure bodies of characters, the elimination of dots and other secondary strokes from the image is also required in the preprocessing stage; these secondary strokes are processed in a post-processing stage, which completes the recognition task. To eliminate ascenders, descenders, and dots, several new algorithms are proposed and explained in the following sections.

(a) تسريع (b) بكمكش

Figure 4. Problems arising from descenders and ascenders: (a) “Ayn” will be considered before “Ye”; (b) “Kaf” will be considered before “Be”.

4.1.1. Ascender Elimination

Characters “Kaf” and “Gaf” have ascenders that should be eliminated. The characteristics that discriminate ascenders from other strokes in a word are their almost 45-degree slope, their straightness, and their relatively large length. These three features form the basis of the ascender detection and elimination algorithm. Character “Kaf” has only one long ascender, while “Gaf” has one long and one short ascender. The algorithm for the detection and elimination of ascenders is shown in Figure 5.



1. For each valid connected component (CC) do:
   1.1. Find the top-most point of the lower contour and call it SP (Starting Point). Let k = k1 = 0.
   1.2. While the stop condition is not true, starting from SP, do:
        1.2.1. Traverse the lower contour downward by going to the next point of the lower contour. k = k + 1.
        1.2.2. If (the current move is in the left-down direction): k1 = k1 + 1.
   1.3. If (a0 × PW > k > a1 × PW AND k1/k > b1): a ‘short ascender’ is detected. Mark it.
   1.4. If (k > a0 × PW AND k1/k > b0): a ‘long ascender’ is detected.
        1.4.1. Mark it, and eliminate its lower contour points from the lower contour of this CC.
        1.4.2. Go to step 1.1 to search for other ascenders that may exist in this CC.
2. For each ascender detected in step 1 do:
   2.1. Check the validity conditions.
   2.2. If (this ascender is valid): eliminate it from the image by filling it with the color ASCENDER_COLOR.

Figure 5. Ascender detection and elimination algorithm.

In this algorithm, PW is the estimated pen width. The parameter values showing the best results in experiments are 4.95, 1.4, 0.58, and 0.33 for a0, a1, b0, and b1, respectively. The lower contour of each connected component is found from the chain code of the outer contour obtained during connected component extraction: the East, North-East, and South-East directions in the chain code of the outer contour represent the lower contour. In step 1.2 of the algorithm, the stop condition becomes true if one of the following situations occurs while the lower contour is traversed:

(i) Movements in the left or left-up directions, or a combination of these two directions, continue for more than 3, 1, and 3 pixels, respectively.

(ii) A jump with a displacement of more than 1 pixel upward or to the right, more than 4 pixels downward, or more than 3 pixels to the left.

(iii) Reaching the last point of the lower contour.

In step 2, a long ascender is considered invalid if it has a long overlap with other strokes near the head of the ascender, if there is a relatively large change in the stroke width near its head, or if there exists a downward vertical part at the head of the ascender. A short ascender is valid if it approximately covers all the space of its connected component and is higher than and very close to a long ascender. Figure 6 illustrates the application of the algorithm to a word having two long ascenders and one short ascender. Figure 6(b) shows the lower contour of the word in Figure 6(a). At first, the lower contour is traversed from SP1 to EP1, where EP1 is the last pixel in the lower contour of this connected component; the variables k and k1 satisfy the conditions of a long ascender. Thus SP2 is selected as the new starting point. At EP2, a long jump to the right terminates the loop, and another long ascender is detected. Starting again from SP3, the last point of the current connected component, i.e., EP3, is reached, and the values of k and k1 denote a short ascender. The validity conditions are true for all these ascenders, and they are eliminated from the image as shown in Figure 6(c).

(a) (b) (c)

Figure 6. (a) A word with characters “Kaf” and “Gaf”. (b) Three ascenders are detected and illustrated with their starting points (SP) and end points (EP). (c) All detected ascenders are valid and are eliminated from the image.

In Figure 7, two samples of the operation of this algorithm are shown. Figure 7(a) shows a successful elimination of the ascenders of the two “Kaf” letters, while Figure 7(b) illustrates a mistake of the algorithm: character “Te”, because of its 45-degree slope and relatively large length, is incorrectly eliminated as an ascender.
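The threshold tests of steps 1.3 and 1.4 of Figure 5 can be written down directly. The sketch below uses the parameter values reported in the text (4.95, 1.4, 0.58, 0.33); the pen width and the (k, k1) counts fed to it are hypothetical inputs:

```python
# Sketch of the ascender tests of steps 1.3 and 1.4 (Figure 5): k is the
# number of lower-contour points traversed, k1 the number of left-down moves,
# pw the estimated pen width. The a0/a1/b0/b1 values are those reported in
# the text; the sample inputs below are hypothetical.

A0, A1, B0, B1 = 4.95, 1.4, 0.58, 0.33

def classify_ascender(k, k1, pw):
    """Return 'long', 'short', or None for a traversed lower-contour run."""
    if k > A0 * pw and k1 / k > B0:
        return "long"                      # step 1.4: long ascender
    if A0 * pw > k > A1 * pw and k1 / k > B1:
        return "short"                     # step 1.3: short ascender
    return None

pw = 4  # hypothetical estimated pen width in pixels
print(classify_ascender(k=30, k1=20, pw=pw))  # long: 30 > 19.8 and 0.67 > 0.58
print(classify_ascender(k=10, k1=4,  pw=pw))  # short: 19.8 > 10 > 5.6 and 0.4 > 0.33
print(classify_ascender(k=4,  k1=1,  pw=pw))  # None: run too short
```

Since the two length intervals are disjoint, the order of the two tests does not matter; a run either qualifies as long, as short, or not at all.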


(a) (b)

Figure 7. Operation of the ascender elimination algorithm: (a) correct operation; (b) incorrect operation.

4.1.2. Descender Elimination

The algorithm which detects and eliminates the descenders of characters “Jim”, “Che”, “He”, “Khe”, “Ayn”, and “Ghayn” is shown in Figure 8. These descenders cause incorrect determination of the right-to-left order.

This algorithm works as follows. It starts from the right-most column of the image which contains black pixels and selects the bottom-most black run in it, then follows this black run column by column toward the left until the run joins another black run (step 1.1.3). In step 1.1.3.1, the detected descender is considered valid if the following conditions are true:

(i) The overlap length of this stroke with upper strokes (UpRunLen) is relatively large (more than 2.5 × PW, where PW is the estimated pen width).

(ii) The length of this stroke is relatively large (more than 2.5 × PW).

(iii) At least in one column, there are more than two black runs.

1. For each valid connected component (CC) do:
   1.1. For each column of the current CC, starting from the right-most column, do:
        1.1.1. Find the black runs in the current column and let the lowest black run be the current run (CurRun). UpRunLen = 0.
        1.1.2. If (the number of black runs is greater than 1 AND the current column is not the right-most column):
               1.1.2.1. UpRunLen = UpRunLen + 1.
               1.1.2.2. If (the number of black runs adjacent to CurRun is more than 1): the current CC does not contain any descender; go to the next CC. Else: let the black run adjacent to CurRun be CurRun.
        1.1.3. If (the number of black runs in the previous column which are adjacent to CurRun is more than 1): a descender is detected:
               1.1.3.1. Check the validity conditions.
               1.1.3.2. If (this descender is valid): eliminate it from the image by filling it with the color DESCENDER_COLOR.

Figure 8. Descender detection and elimination algorithm.

These characteristics discriminate descenders from other strokes properly. Figure 9(a) shows a typical result obtained by this algorithm. Figures 9(b) and 9(c), respectively, show the situations that make the conditions in steps 1.1.2.2 and 1.1.3 true.
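The three validity conditions reduce to a single predicate. The sketch below assumes the overlap length, stroke length, and per-column run counts have already been measured during the column-by-column traversal; all values are hypothetical:

```python
# Sketch of the descender validity test of step 1.1.3.1: all three conditions
# must hold. The measurements are hypothetical; pw is the estimated pen width.

def descender_is_valid(up_run_len, stroke_len, max_runs_in_column, pw):
    return (up_run_len > 2.5 * pw          # (i) long overlap with upper strokes
            and stroke_len > 2.5 * pw      # (ii) the stroke itself is long
            and max_runs_in_column > 2)    # (iii) some column has >2 black runs

pw = 4  # hypothetical pen width in pixels
print(descender_is_valid(up_run_len=15, stroke_len=22, max_runs_in_column=3, pw=pw))  # True
print(descender_is_valid(up_run_len=6,  stroke_len=22, max_runs_in_column=3, pw=pw))  # False
```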

(a) (b) (c)

Figure 9. (a) Operation of the descender elimination algorithm. (b) Situation which satisfies the condition in step 1.1.2.2 of the algorithm. (c) Situation which satisfies the condition in step 1.1.3 of the algorithm.


4.1.3. Secondary Strokes Elimination

To detect the secondary strokes, some of their characteristics, such as their small size and their containment within a larger sub-word, can be used. We consider a connected component to be a secondary stroke, and eliminate it from the image, if its width is less than 2.5 times and its height is less than 5 times the estimated pen width (or vice versa), and it overlaps, over at least 25 percent of its width, with a larger component.
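This size-and-overlap rule can be sketched as a predicate over bounding boxes. The boxes are (x, y, width, height) tuples; reading "overlap" as horizontal overlap, like all the concrete values here, is an assumption made for illustration:

```python
# Sketch of the secondary-stroke rule: a connected component is a secondary
# stroke if its bounding box is small relative to the pen width pw (width
# < 2.5*pw and height < 5*pw, or vice versa) and it overlaps at least 25% of
# its own width with a larger component. Boxes are (x, y, w, h); data and the
# horizontal-overlap reading are hypothetical.

def horizontal_overlap(a, b):
    """Length of the horizontal overlap between two boxes."""
    return max(0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))

def is_secondary(box, larger_boxes, pw):
    w, h = box[2], box[3]
    small = (w < 2.5 * pw and h < 5 * pw) or (h < 2.5 * pw and w < 5 * pw)
    if not small:
        return False
    return any(horizontal_overlap(box, big) >= 0.25 * w for big in larger_boxes)

pw = 4
word_body = (10, 40, 120, 30)   # a large sub-word component
dot = (50, 20, 6, 6)            # small component sitting above the body
print(is_secondary(dot, [word_body], pw))              # True
print(is_secondary((200, 20, 6, 6), [word_body], pw))  # False: no overlap
```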

The above thresholds are chosen such that no other strokes are incorrectly eliminated as secondary strokes. Thus the algorithm retains some secondary strokes that are written moderately large or are distant from the character body. To enhance the performance of this algorithm, a primary classifier can be used in the preprocessing stage to discriminate secondary strokes from letters of nearly the same size (such as the single forms of the letters “Alef”, “Dal”, “Zal”, “Re”, “Waw”, and “He”) [21]. Since the number of classes is smaller in this case, the features extracted from the image can be simpler and can be optimized for recognition.
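As an illustration, the size and overlap test described above can be sketched as follows (a Python sketch; the bounding-box representation, function name, and the way the 25-percent overlap is measured are our assumptions):

```python
def is_secondary_stroke(comp, larger, pen_width):
    """Decide whether connected component `comp` is a secondary stroke.

    `comp` and `larger` are bounding boxes (x0, y0, x1, y1). The thresholds
    (2.5, 5, and 25 percent) follow the rules stated in the text.
    """
    w = comp[2] - comp[0]
    h = comp[3] - comp[1]
    small = (w < 2.5 * pen_width and h < 5 * pen_width) or \
            (h < 2.5 * pen_width and w < 5 * pen_width)
    # horizontal overlap with the larger component, measured against comp's width
    overlap = max(0, min(comp[2], larger[2]) - max(comp[0], larger[0]))
    return small and overlap >= 0.25 * w
```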

4.2. Segmentation and Determination of the Right-to-Left Order

The objective of the segmentation stage here is to achieve an over-segmentation such that each pair of connected characters is split. Then characters can be considered as states in the recognition stage [1]. When a character is segmented into more than one segment, the variable duration of the HMM states considered in this system covers this problem. After segmentation, the right-to-left order of the segments must be found for use in the recognition stage.

We studied the existing word segmentation techniques and their ability to satisfy the mentioned criteria. The methods that are based on the vertical histogram or the baseline [22–25] are not suitable for handwritten words, especially for the Nastaaligh style, because of the various vertical overlaps and horizontal slants that exist in this style. Furthermore, methods that use the vertical width of strokes [26] do not seem very appropriate for moderately free handwritten scripts. We propose two enhanced methods which are more suitable for Nastaaligh word segmentation.

The first method works based on the idea of regular and singular components, and considers the regular components as candidates for segmentation. The second method works based on analysis of the upper contour of the words. In the next sections, these segmentation methods and the technique used for finding the right-to-left order of segments are explained.

4.2.1. Segmentation using Regular and Singular Components

Segmentation based on regular and singular components (or regularities and singularities) is proposed in [27], [6], and [1] for Latin handwritten words and in [19] for Arabic handwritten words. We have implemented a segmentation method based on the same idea. In this method, first the holes in the preprocessed image (such as the loops in characters “ص”, “ط”, “ه”, etc.) are filled to avoid segmentation in these loops. Then, an opening operation is performed on the image with a vertical structural element whose height is a little (one or two pixels) larger than the estimated pen width. In this way, the moderately vertical parts of the image are obtained. Then a closing operation with a horizontal structural element having a small width (about three to five pixels) is performed to join together the vertical parts (resulting from the previous operation) that are close to each other. The results of this operation are called singularities or islands.

By subtracting these components from the original image, regularities or bridges are found. At this point, some characters may have too many regularities, which will cause an unacceptable over-segmentation. To reduce this problem, those regularities with a width smaller than a threshold (e.g., the estimated pen width) are eliminated; i.e., they are added to the singularities. Then, among the remaining regularities (bridges), those that do not join two singularities (two islands) are also eliminated, i.e., added to the singularities (these regularities are the starting or ending components of the sub-words). Finally, segmentation is performed at the middle of the final regularities. The following parameters affect the performance of this algorithm:

• The height of the structuring element for the opening operation: larger values for this parameter reduce the number of regularities, and thus increase over-segmentation and decrease under-segmentation. Since our goal here is to decrease under-segmentation, a value equal to the estimated pen width plus two is selected for this parameter. This value has experimentally shown better results.

• The width of the structuring element for the closing operation: smaller values for this parameter result in more over-segmentation and fewer under-segmentation occurrences. In [1] the value ‘5’ is proposed for this parameter. We have selected the value ‘3’.
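The opening/closing procedure described above can be sketched as follows (a pure-NumPy illustration, not the authors' implementation; the helper names are ours, and a zero margin around the foreground is assumed so that the wrap-around of `np.roll` is harmless):

```python
import numpy as np

def _erode(img, dys, dxs):
    """Binary erosion with a symmetric cross of row/column offsets."""
    out = img.copy()
    for dy in dys:
        for dx in dxs:
            out &= np.roll(np.roll(img, dy, axis=0), dx, axis=1)
    return out

def _dilate(img, dys, dxs):
    """Binary dilation with a symmetric cross of row/column offsets."""
    out = np.zeros_like(img)
    for dy in dys:
        for dx in dxs:
            out |= np.roll(np.roll(img, dy, axis=0), dx, axis=1)
    return out

def find_singularities(img, pen_width):
    """Split a binary word image into singular (island) and regular (bridge)
    pixels: vertical opening with height pen_width + 2, then horizontal
    closing of width 3, following the parameter choices in the text."""
    h = pen_width + 2
    vs = range(-(h // 2), h // 2 + 1)
    opened = _dilate(_erode(img, vs, [0]), vs, [0])        # vertical opening
    hs = [-1, 0, 1]                                        # closing width = 3
    singular = _erode(_dilate(opened, [0], hs), [0], hs)   # horizontal closing
    regular = img & ~singular                              # bridges = image - islands
    return singular, regular
```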


Figure 10 shows some words, their regularities and singularities, and the resulting segmentation.

Figure 10. Operation of the first segmentation method, which is segmentation based on regularities and singularities. (a) Binarized and noise-reduced images of the words “بي فكر”, “محجوب”, and “ياسمن”. (b) Singularities and regularities specified by black and gray colors, respectively. (c) Resulting segmentation.

4.2.2. Segmentation using Local Minima of the Word Upper Contour

In [28], the local minima of the upper contour of words have been considered as candidate positions for segmentation. Then if some conditions are satisfied, segmentation is performed in these positions. In addition, overlapping areas are detected and if required, segmentation is performed there. We have modified this method to be suitable for Nastaaligh handwritten words.

In Nastaaligh style, when the character “Re” is connected to a character before it, it is written without any upper contour minimum between it and the previous character. As a result, this method is not able to segment the character “Re”. Therefore, a new algorithm for detection and segmentation of the connected “Re” is developed. First, this algorithm is applied to the word image. Then, the overlapping areas are detected and proper segmentation is performed there. Next, the upper contour is found by a simple method, and its local minima are taken as primary segmentation points (PSPs). A validation process is performed for these PSPs, and the word is segmented at the positions of the valid PSPs. The algorithms for these steps are explained below.

• Detection and Segmentation of the Connected “Re”

The connected “Re” is detected using an idea similar to the one used for ascender detection. The special characteristics that discriminate the connected “Re” from other strokes are its almost 45-degree slope and large length. The proposed algorithm is shown in Figure 11.


1. For each valid connected component do:
   1.1. Let ncol = 0.
   1.2. For each column of this connected component, from the leftmost to the rightmost column, do:
      1.2.1. If (there exists more than one black run in the current column OR the width of some run is more than PR0×PW): exit the loop (i.e., go to step 1.3).
      1.2.2. If (the bottommost black point of the current column is more than PR1 pixels below the lowest black point of the first column in the current decreasing trend (to allow for the probable rising end of “Re”)): exit the loop (i.e., go to step 1.3).
      1.2.3. ncol = ncol + 1.
   1.3. If (ncol <= PR3×PW): the sub-word does not contain “Re”. Else:
      1.3.1. The traversed lower contour is considered as a sequence of segments of 3-pixel length, and a label ‘H’ or ‘S’ is assigned to each segment according to its horizontal or slanted form, respectively.
      1.3.2. The sequence of labels ‘H’ and ‘S’ is smoothed by a state machine (e.g., ‘SHS’ is converted to ‘SSS’, etc.).
      1.3.3. If (the number of columns considered slanted is more than PR4×PW): a cut with color SEGMENTATION_COLOR is produced at the rightmost of the traversed columns.

Figure 11. Connected “Re” detection and segmentation algorithm.

The values of the parameters are PR0=2.9, PR1=1, PR2=1, PR3=4.8, PR4=2. The performance of the proposed algorithm is very good. Figure 12 shows some results of this method.
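Steps 1.3.1 and 1.3.2 of the algorithm can be illustrated as follows (the contour representation, the slant threshold, and the function name are assumptions of this sketch; the paper's state machine may differ in detail):

```python
def label_and_smooth(contour_y, slant_thresh=2):
    """Label 3-pixel lower-contour segments as horizontal ('H') or slanted
    ('S') and smooth isolated labels, as in steps 1.3.1-1.3.2.

    `contour_y` holds the lowest black row per column; a segment is called
    slanted when its vertical drop reaches `slant_thresh` pixels.
    """
    labels = []
    for i in range(0, len(contour_y) - 2, 3):
        drop = abs(contour_y[i + 2] - contour_y[i])
        labels.append('S' if drop >= slant_thresh else 'H')
    # smooth isolated labels: 'SHS' -> 'SSS', 'HSH' -> 'HHH'
    for i in range(1, len(labels) - 1):
        if labels[i - 1] == labels[i + 1] != labels[i]:
            labels[i] = labels[i - 1]
    return ''.join(labels)
```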

Figure 12. The result of the connected “Re” detection algorithm for the three words “شرف”, “ظهر”, and “تيررس”.

• Detection and Segmentation of Overlapped Strokes

Segmentation using the local minima of the upper contour is not able to segment overlapped strokes either. For example, the form of the middle “He” (row 8 in Table 1) in Nastaaligh style, the last “Ye”, or the middle “Mim” cannot be segmented by this method. This problem can be solved by finding overlapped strokes and performing segmentation in these areas (Figure 14). We have proposed a new algorithm for detection and segmentation of overlapped strokes, shown in Figure 13. Figure 14 shows several good results obtained with this algorithm.

• Finding the Minima of Upper Contour and Segmentation

The first step in this stage is finding the upper contour. For this purpose, the outer contour found in the previous stages is traversed, its chain code is obtained, and the points with West, North-West, and South-West directions are saved as the upper contour. Then the weak noise on this contour (with one-pixel width and less than ‘ContourNoise’ height) is eliminated. However, this elimination may cause an under-segmentation problem in some cases of handwritten Nastaaligh style, e.g., for weak cogs. Thus, we set the parameter ‘ContourNoise’ to zero. Figure 15 shows the upper contour of a word resulting from this algorithm.

Then the local minima of the upper contour are found and the segmentation is performed based on them. The algorithm for these operations is shown in Figure 16.
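The minima-detection logic can be sketched as follows (illustrative; `upper` holds one contour row per column, with larger row values lower on the page, so a visual minimum of the contour is where a falling trend turns into a rising one, and plateau minima are resolved to their middle column as in step 1.1 of Figure 16):

```python
def find_psps(upper):
    """Find primary segmentation points (PSPs): columns where a falling
    trend of the upper contour is replaced by a rising one."""
    psps, falling, start = [], False, 0
    for i in range(1, len(upper)):
        if upper[i] > upper[i - 1]:          # pen moving down the page
            falling, start = True, i
        elif upper[i] < upper[i - 1] and falling:
            psps.append((start + i - 1) // 2)  # middle of the minimum plateau
            falling = False
        # equal values extend the current plateau
    return psps
```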


1. For each valid connected component do:
   1.1. For each column of this connected component, from the rightmost to the leftmost column, do:
      1.1.1. Find the number, length, and position of the black runs in the current column.
      1.1.2. For each black run, starting from the bottommost one, do:
         1.1.2.1. If (there are no black points in the right column adjacent to this run AND an overlap is found):
            1.1.2.1.1. Move to the left until the two overlapped pieces join together; the two columns between which the overlap exists are thus found.
            1.1.2.1.2. By traversing the outer contour, check that the overlap is not part of a loop.
            1.1.2.1.3. In the upper part of the overlap, find the position with the least width and mark it for segmentation.

Figure 13. Overlapped strokes detection and segmentation algorithm.

Figure 14. Operation of the overlap detection algorithm for the three words “ياسمن”, “محجوب”, and “انگليسي”.

Figure 15. Obtained upper contour for the word “تخصيص”.

1. For each valid connected component do:
   1.1. Finding PSPs: while traversing the upper contour, the points at which a falling trend is replaced by a rising one are found by a state machine, and the positions of these local minima are saved as primary segmentation points (PSPs). If the minimum value extends over more than one pixel, the PSP is placed on the pixel where the pen width is minimal; if this situation also extends over more than one pixel, the PSP is placed at the middle of this part.
   1.2. Validation of PSPs: for each found PSP, if there is no loop under it and the pen width there is lower than THR1×PW (THR1 is set to 4), the candidate point is valid and is labeled for segmentation.
   1.3. Segmentation: at the labeled pixels, a cut is made with color ‘SEGMENTATION_COLOR’ only if both the left and right sides of the cut are black. If one side of the cut is a loop, the column of the hole adjacent to the cut is filled with black to prevent this adjacency.
   1.4. After segmentation, cuts that result in small segments (less than three pixels wide) are canceled.

Figure 16. Minima of the upper contour detection and segmentation algorithm.


Figure 17 shows three samples of the operation of the complete segmentation algorithm, i.e., after running the algorithms of Figures 11, 13, and 16.

Figure 17. Operation of the second segmentation method, which is segmentation based on the minima of the upper contour.

4.2.3. Finding Right-to-Left Order

A relatively complicated algorithm for finding the right-to-left order of the segments is proposed in [1]. The algorithm proposed here is much simpler. After ascender and descender elimination, the order can be found independently of the baseline. First, the order of sub-words is found. Then, in each sub-word, the order of segments is obtained by considering the rightmost segment as the first one in the sub-word, and then traversing the outer contour and recording the order in which segments are visited. This algorithm is shown in Figure 18.
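The sub-word ordering step, sorting sub-words by their rightmost (start) columns, can be sketched as follows (the pixel-list representation of a component is an assumption of this sketch):

```python
def order_subwords(components):
    """Sort sub-words right to left by their start (rightmost) column.
    Each component is given as a list of (row, col) foreground pixels."""
    return sorted(components, key=lambda comp: max(col for _, col in comp),
                  reverse=True)
```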

4.3. Feature Extraction

The feature extraction method used in a character recognition system is probably the most important factor in achieving a good recognition rate [29]. Many different feature extraction methods have been proposed in the literature, and the most suitable ones are generally found experimentally.

After studying various feature extraction methods and testing some of them [21], we selected a mixed feature vector containing various features from binary and outer contour representations of pseudo-character images. The features we tested include geometric moments [18] extracted from binary and thinned representations, Fourier descriptors [30] extracted from outer contour, discrete and structural features including loop, height-to-width ratio, number of black points to total number of points ratio, position of connection to the right and left pseudo characters [21] extracted from binary representation, and pixel distribution features plus some other discrete features such as end points, T-joints, X-joints, and zero-crossing features [6] extracted from skeletons.

Various combinations of these features were tested on images of ideally segmented characters using a mixture-of-Gaussian classifier. Finally, eight features were selected: three Fourier descriptors (descriptors one to three), the number of loops, the height-to-width ratio, the ratio of the number of black points to the total number of points, and the positions of the right and left connections. This mixed feature vector, in addition to high discrimination power, has a short length, which increases the speed of recognition. With these features no skeletonization is required, so we avoid the complexity of such a process. The feature vector is invariant to scale and shift.

The normalization of features [6] makes their discrimination effects roughly equal. But some features may have more discrimination power than others, and normalization ignores this difference. So, instead of normalizing the features, we found a weight for each feature experimentally. These weights, which reflect the importance of each feature, resulted in more discrimination power in our experiments. The Fourier descriptors, which are normalized by the first descriptor, are used with unity weight; the loop feature with weight 40; the height-to-width ratio with weight 15; the ratio of black points to total points with weight 45; and the left and right connection positions with weight 40. This feature vector showed good performance. Table 3 compares the performance of these features with the features proposed in [6].
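As a sketch of this weighting scheme (the ordering of the vector is our assumption; the weights are those quoted above):

```python
import numpy as np

# Experimentally found weights from the text: three Fourier descriptors
# (unity weight each), number of loops (40), height-to-width ratio (15),
# black-point ratio (45), and right/left connection positions (40 each).
WEIGHTS = np.array([1, 1, 1, 40, 15, 45, 40, 40], dtype=float)

def weighted_features(raw):
    """Scale the 8-dimensional raw feature vector by per-feature weights,
    used instead of normalization before the mixture-of-Gaussian scoring."""
    return WEIGHTS * np.asarray(raw, dtype=float)
```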


1. For each connected component do:
   1.1. Fill the cut positions with black in a temporary image (a copy of the segmented image): while traversing the outer contour, each time a pixel with color ‘SEGMENTATION_COLOR’ is visited, the pixels belonging to this cut are painted black.
2. Find the right-to-left order of sub-words: the connected components of the temporary image are sorted by their start columns (i.e., rightmost columns) such that the rightmost sub-word becomes the first one in the order.
3. For each connected component of the temporary image (i.e., each sub-word), in the order obtained in step 2, do:
   3.1. For each connected component of the segmented image (i.e., each segment) do:
      3.1.1. If (the current segment belongs to the current sub-word): record it at the relevant index of an array of sub-words (i.e., record which segments each sub-word contains).
   3.2. Traverse the outer contour of the current sub-word, starting from its rightmost and topmost point.
      3.2.1. If (the segment currently being traversed is revisited AND there are some unvisited segments): this segment is moved to the end of the order of segments. Else:
         3.2.1.1. If (there are some unvisited segments): the current segment is added at the end of the order of segments; go to 3.3.
         3.2.1.2. If (a pixel with color ‘SEGMENTATION_COLOR’ is visited): continue traversing in the next segment; go to 3.2.1.
   3.3. The position of each segment (i.e., first, middle, last, or isolated) in the current sub-word is specified according to the obtained order of segments.
4. The coordinates of one point per segment are stored in an array in the obtained order.
5. Ascenders and descenders are added back to the image, new connected components are found, and their order is determined using the points stored in step 4.
6. Isolated ascenders are removed from the image.

Figure 18. Finding right-to-left order algorithm.

4.4. Recognition

A CDVDHMM model [1] is used for recognition. The characteristics which make this model fairly suitable for our system are:

(1) The sequential nature of writing: Markov models can successfully code the sequential information.

(2) Hidden states: In the handwritten word recognition task, the system tries to recover the sequence of characters (as “hidden” states) from the sequence of observed features (as observations).

(3) Continuous symbol probability distribution: there is no vector quantization error in this case, and the multi-shaped property of Farsi/Arabic characters can be fairly modeled by the mixture-of-Gaussian distributions.

(4) Variable duration of states: this aspect can handle the over-segmentation problem.

We consider the pure forms of characters (i.e., without secondary strokes) and the compound forms of characters in Nastaaligh style as the states of the model. So the number of states will be 25. As mentioned before, Farsi characters have different forms in various positions, and a character in a given position can also have different shapes (e.g., “ ” and “ ”). So, considering all forms of a character as one class will not result in a good recognition rate. To compensate for this problem, we have defined the “sub-state” idea: we consider the different shapes of each character as sub-states of the state assigned to that character. Then, in the training stage, we use training images to obtain the parameters of each sub-state separately. The role of sub-states in the performance improvement will become clearer in the following sections.

4.4.1. Training the Model

In the training stage, the goal is estimation of the model parameters λ = (Π, A, Γ, B, D) (equations (1) to (5)). As mentioned before, characters are considered as states. Therefore, the states are meaningful, which makes it possible to avoid re-estimation methods (e.g., the Baum–Welch method) for training, and so the training stage becomes simple [1]. Two training sources are used: the training images and the dictionary. Parameters B and D are obtained from the training images, and the other parameters from the dictionary.


4.4.1.1. Training with images

In this stage, the parameters are computed for each sub-state separately. In this subsection, the word “state” refers to “sub-state”. After applying the segmentation algorithm to the training images, the state duration probabilities (D) are computed by manually counting the number of segments of each character. The probability that state q_i has duration d, P(d|q_i), is equal to the number of times that the character q_i is segmented into d parts, divided by the total number of times that this character appears in the training images. In our training samples, the maximum duration of a state was four. But to be able to handle worse cases, we set the maximum duration of states to six (d = 1, 2, …, 6).
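The duration estimate can be sketched as follows (illustrative; `segment_counts` lists, for one character, the number of segments produced in each of its training occurrences):

```python
from collections import Counter

def duration_probs(segment_counts, max_d=6):
    """Estimate P(d | q_i) for one character from manual segment counts.
    Returns a list of probabilities for d = 1..max_d."""
    counts = Counter(segment_counts)
    total = len(segment_counts)
    return [counts[d] / total for d in range(1, max_d + 1)]
```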

The observation pdf (parameter B) is represented as a finite mixture of the form:

    b_j(x) = Σ_{m=1}^{M_j} c_{jm} · N[x, μ_{jm}, U_{jm}],   1 ≤ j ≤ N    (6)

where N represents a Gaussian distribution with mean vector μ_{jm} and covariance matrix U_{jm} for the m’th mixture component at state j, x is the vector being modeled, M_j is the number of Gaussian components at state j, and c_{jm} is the mixture coefficient for the m’th Gaussian component at state j. The mixture gains satisfy the stochastic constraint:

    Σ_{m=1}^{M_j} c_{jm} = 1,  1 ≤ j ≤ N;    c_{jm} ≥ 0,  1 ≤ m ≤ M_j    (7)

We used the k-means clustering algorithm with a free parameter k and a fixed SNR to find the number of Gaussian functions for each state. We used the criterion J_4 = tr[S_w] / tr[S_m] to determine a proper SNR [31]. This criterion is the trace of the within-class scattering matrix (tr[S_w]) divided by the trace of the mixture scattering matrix (tr[S_m]). The experimental value 0.9 was obtained as the optimal value for the terminating condition of the algorithm (SNR). The mixture coefficient c_{jm} is the number of training samples in H_{jm} divided by the total number of training samples for state q_j, where H_{jm} is the set of samples in cluster m of state q_j. For each cluster in state q_j, the parameters of the Gaussian distribution are estimated as follows:

    μ_{jm} = (1 / N_{jm}) Σ_{x ∈ H_{jm}} x    (8)

    U_{jm} = (1 / N_{jm}) Σ_{x ∈ H_{jm}} (x − μ_{jm})(x − μ_{jm})^T    (9)

where x is the feature vector of a training sample and N_{jm} is the number of samples in H_{jm}. The covariance matrix U_{jm} is assumed to be diagonal in our implementation. Because of the limited amount of available training data, a small constant ρ is added to the diagonal elements of the covariance matrix to prevent it from becoming singular [1]. The value 0.1 is selected for ρ in our implementation. The symbol probability density for an observation O is computed in the recognition stage as:
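Equations (8) and (9), with the diagonal-covariance assumption and the ρ regularization, can be sketched as:

```python
import numpy as np

def estimate_gaussian(samples, rho=0.1):
    """Mean and diagonal covariance of one cluster H_jm, per equations (8)
    and (9); rho = 0.1 is added to the diagonal to avoid singularity."""
    X = np.asarray(samples, dtype=float)
    mu = X.mean(axis=0)                         # equation (8)
    var = ((X - mu) ** 2).mean(axis=0) + rho    # diagonal of U_jm, eq. (9)
    return mu, var
```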

    b_j(O) = Σ_{m=1}^{M_j} c_{jm} · (2π)^{−n/2} (det U_{jm})^{−1/2} · exp[−(1/2)(O − μ_{jm})^T U_{jm}^{−1} (O − μ_{jm})]    (10)

The observation O can be composed of one or several consecutive segments. In handwritten word recognition, the shapes of consecutive segments resulting from the segmentation process are dependent on each other. Thus the symbol probability for a composite observation is defined as follows [1]:

    b_j(o_1 o_2 … o_d) = [ b_j(O_1^d) ]^d    (11)

where O_1^d is the image built by merging the segment images o_1, o_2, …, o_d together. The power d is used to balance the symbol probability for different numbers of segments. This is a necessary normalization procedure when every node in the Viterbi net is used to represent a segment [1].


Thus we will have a classifier which, given a sample, computes the score for each class by using the learned parameters of that class.

4.4.1.2. Training with dictionary

The three parameter sets of CDVDHMM, which include initial probability ( Π ), transition probability ( A ), and last-state probability ( Γ ), are obtained from the dictionary as [1]:

    π_i = (number of words that start with letter q_i) / (total number of words in the dictionary)    (12)

    a_{ij} = (number of transitions from letter q_i to letter q_j) / (total number of transitions from q_i)    (13)

    γ_i = (number of words that end with letter q_i) / (total number of words in the dictionary)    (14)
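These three estimates can be sketched as follows (illustrative; words are taken as plain letter sequences in writing order, and the results are returned as dicts):

```python
from collections import Counter

def dictionary_params(words):
    """Estimate pi (eq. 12), A (eq. 13), and gamma (eq. 14) from a word
    list; keys are letters for pi and gamma, letter pairs for A."""
    total = len(words)
    first = Counter(w[0] for w in words)
    last = Counter(w[-1] for w in words)
    trans, out = Counter(), Counter()
    for w in words:
        for a, b in zip(w, w[1:]):
            trans[(a, b)] += 1
            out[a] += 1
    pi = {q: n / total for q, n in first.items()}
    A = {pair: n / out[pair[0]] for pair, n in trans.items()}
    gamma = {q: n / total for q, n in last.items()}
    return pi, A, gamma
```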

In this manner, the system can be used in new applications with different dictionaries simply by re-computing the above probabilities from the new dictionary. This feature makes the system considerably flexible.

4.4.1.3. Recognition phase

The common method used with hidden Markov models for recognition of a sequence of input symbols is the Viterbi algorithm. In this paper, the modified Viterbi algorithm proposed in [1] is used, and, by considering the sub-state idea, some new modifications are added to make it more suitable for Farsi/Arabic script recognition. The modified version of the Viterbi algorithm used in this work is presented in the appendix. Compared with the standard version of the Viterbi algorithm, this version is generalized in three aspects:

1. It has become compatible with variable-duration states by modification of the Viterbi algorithm based on the following equation [1]:

    δ_t(j) = max_{1≤i≤N, 1≤d≤D} [ δ_{t−d}(i) · a_{ij} · P(d|q_j) · b_j(O_{t−d+1}^t)^d ]    (15)

where D is the maximum acceptable duration of a state (here D = 6), and O_{t−d+1}^t is the feature vector extracted from the image obtained by concatenating the last d segments.

2. The standard version only yields the single best global path in the Viterbi net. To find the L globally best paths, two alternatives are proposed in [6]: the serial and parallel methods. It has been shown that the serial version is more efficient, and we used it in our implementation.

3. In the algorithm used in our system, each state can contain different sub-states (which are the different forms and positions of the character corresponding to that state). The position (first, middle, last, or isolated) of the input segment is known. Thus, when computing observation probabilities, only this specific position of each state is considered. But each state in each position can have different shapes, which are also considered as sub-states. Thus, to compute the observation probability of x (where x is the feature vector extracted from the input image) for state j, b_j(x), we first find the sub-state of state j that yields the highest score, k = argmax_i b_j^i(x). Then the state duration probability P(d|q_j^k) and the corresponding score b_j^k(x) are computed based on the parameters of this sub-state.
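The variable-duration recursion of equation (15) can be sketched in the log domain as follows (a simplified illustration: backtracking, the last-state probabilities Γ, sub-states, and the serial L-best extension are omitted; `log_b(j, s, t)` is assumed to return the log score, already raised to the power d, of state j explaining segments s..t-1):

```python
import numpy as np

def viterbi_vd(T, N, D, log_pi, log_a, log_dur, log_b):
    """Variable-duration Viterbi recursion of equation (15), log domain.
    log_dur[j][d-1] is log P(d|q_j); delta[t, j] is the best log score of
    any path that ends with state j covering the segments up to t."""
    delta = np.full((T + 1, N), -np.inf)
    for t in range(1, T + 1):
        for j in range(N):
            for d in range(1, min(D, t) + 1):
                if t - d == 0:                      # path starts with state j
                    best_prev = log_pi[j]
                else:                               # best predecessor state i
                    best_prev = (delta[t - d] + log_a[:, j]).max()
                score = best_prev + log_dur[j][d - 1] + log_b(j, t - d, t)
                delta[t, j] = max(delta[t, j], score)
    return delta[T].max()                           # best full-path score
```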

The Viterbi net is generalized from two dimensions (t and i) to three dimensions (t, i, and l). In the serial version, not all elements of the third dimension are computed. Instead, only the nodes of the two-dimensional net that become winners in their columns (i.e., have the highest score) are allowed to have next-best scores in the third dimension [1]. This algorithm obtains the score of each node of the net and then finds the globally best paths, which start at t = 1 and end at t = T.

In each iteration of the algorithm, one globally optimal path is found which represents one sequence of characters (one sequence of states), i.e., a word. For example, Figure 19 shows the first best path for the input image in a 2-D net. In this example, the word is segmented into seven parts, and the best path is found considering the duration of states. In this figure, only part of the Viterbi net is shown, for the 2-D case. The actual net has dimensions N × T × L (N = 25 is the number of states, T = 7 is the number of segments, and L = 20 is the maximum number of best paths).

Figure 19. An example of finding the best path in Viterbi net for the word “چشمك”.

5. PERFORMANCE OF THE SYSTEM

5.1. Training and Testing Sets

For training and testing the system, a number of persons wrote 50 specifically selected words three different times, in moderately free Nastaaligh style. This 50-word set, which also serves as the dictionary of the system, is selected such that it includes all Farsi characters and all compound forms of characters in Nastaaligh style. A number of these words are shown in Figure 1. This is an important feature, because the system trained with these words will potentially be able to recognize other words that it has never seen. Table 2 shows the characteristics of the image sets used in our experiments. The training image set (TRN) contains 700 words (two 50-word scripts from each of seven writers). Three test sets are used for verification in various stages of the system. These include the TST1 set, containing 350 words (the third 50-word scripts from the same seven writers as the TRN set); the TST2 set, containing 100 words (two 50-word scripts from two writers whose samples are not included in the TRN set); and the TST3 set, containing 21 words not included in the dictionary. It should be noted that poorly written samples also exist in the training and testing sets; e.g., there are filled loops, damaged and split strokes, dots touching the body of letters, and unexpected curvatures. The images are digitized with a resolution of 300 dpi and 256 gray levels.

Table 2. The Used Image Sets.

Name of the set      Number of word images   Number of unique words   Description (from the system point of view)
Training set (TRN)   700                     50                       –
Test set (TST1)      350                     50                       Known words – Known writers
Test set (TST2)      100                     50                       Known words – Unknown writers
Test set (TST3)      21                      7                        Unknown words – Unknown writers

5.2. Experiments

5.2.1. Performance of Preprocessing and Segmentation Stages

To evaluate the performance of the preprocessing and segmentation stages, we use the TST1 test set. In the preprocessing stage, only the ascender elimination algorithm showed errors. This algorithm eliminates 97.69% of ascenders correctly. A rate of 0.0023% of other strokes are considered as ascenders and eliminated wrongly. This problem occurs for letters which are written slanted and are moderately long. A sample of this problem is illustrated in Figure 7(b). In segmentation, as mentioned before, the goal is to avoid under-segmentation. The first method, which is based on regularities and singularities, satisfies this objective in 77.62% of cases. The second method, which uses upper contour analysis, satisfies the objective in 95.68% of cases. Thus, the second method is selected for use in our system. The algorithm for finding the right-to-left order operated correctly for 96.85% of the words in the test set. Most of the errors at this stage appear in cases where the segmentation algorithm creates a segment in the middle of a word that is adjacent to only one other segment. This appears, for example, at the overlap area of the letter “He”, as in the examples shown in Figure 20.

Figure 20. Samples for which the right-to-left order algorithm operates incorrectly.

To solve this problem, one can find all segments that are adjacent to only one other segment; if such a segment is not the first or the last segment of the word (which can easily be determined from their positions), the cuts related to that segment are filled, the connected components of the image are found again, and the right-to-left order is determined again with these new connected components.

5.2.2. Performance of Feature Extraction Stage

To evaluate the selected features, a classifier based on mixture-of-Gaussian functions (Section 4.4.1.1) is trained with feature vectors extracted from ideally segmented characters in the training set, TRN. Then the same features are extracted from the ideally segmented characters in set TST1. The training set contains 2950 characters and the test set 1477 characters. Table 3 shows the correct classification rate of the feature vectors up to the fifth best choice, in comparison with the feature vector proposed in [6]. It can be seen that our feature vector, with considerably fewer features, shows better performance. Adding more structural features, such as open loops and their directions, could result in still better performance.

Table 3. Efficiency of Our 8 Features in Comparison with the 35 Features Proposed in [6].

No. of acceptable   Performance of 35 features         Performance of our 8 proposed
choices             proposed in [6] (615 characters)   features (1477 characters)
TOP1                51.9%                              81.11%
TOP2                71.9%                              93.22%
TOP3                82.1%                              96.47%
TOP4                88.3%                              97.63%
TOP5                91.2%                              98.17%

5.2.3. Performance of Recognition Stage and Modified Viterbi Algorithm

The modified Viterbi algorithm yields the recognition results. Table 4 shows the results of the recognition stage after 5, 10, and 20 iterations of the Viterbi algorithm for those words of the test set which do not show any error in any of the previous stages.

We also obtained the recognition results in the absence of dictionary statistics (by setting all the probabilities derived from the dictionary equal) to determine the effect of the dictionary on the recognition task. Two columns of Table 4 show recognition

Reza Safabakhsh and Peyman Adibi

April 2005 The Arabian Journal for Science and Engineering, Volume 30, Number 1B. 115

results for the test set TST1 in the presence and absence of the dictionary. It is observed that 96.8% of these words are recognized in the first 20 iterations of the Viterbi algorithm (“direct recognition” [1]) using the dictionary. Thus, without a hypothesis-generation scheme, much better recognition results are obtained in comparison with [1] (the best recognition result reported in [1] for direct recognition is 88.3% with a 10-word dictionary).

Table 4 shows that the absence of the dictionary reduces the recognition rate from 96.80% to 86.17%. Indeed, an advantage of path-discriminant methods over model-discriminant ones is this lower dependence on the dictionary. This result also shows that the modified version of the Viterbi algorithm proposed in this paper, together with the methods used in the preprocessing, segmentation, feature extraction, and classification stages, can be effectively utilized in a character recognition system without a dictionary.

Table 4. The Correct Recognition Rate of the System in Different Experiments.

No. of iterations      TST1, without   TST1, with   TST2, with   TST3, with
of Viterbi algorithm   dictionary      dictionary   dictionary   dictionary
5                      67.02%          88.65%       69.00%       52.38%
10                     78.82%          94.68%       80.00%       76.19%
20                     86.17%          96.80%       91.00%       90.48%

As mentioned before, the test set TST1 is selected from words written by writers whose other samples are used in the training set. Thus, to measure the sensitivity of the system to writers, the experiment was repeated with the test set TST2, which contains samples from new writers. The results of this experiment are shown in the fourth column of Table 4. These results show that the reduction in the performance of the system due to unknown writers is about 5.8%, which is acceptable.

To observe the performance of the system on words that do not exist in the training set (unknown words), we used the test set TST3. The recognition results in this case are shown in the fifth column of Table 4. The statistics of these words do not contradict the statistics obtained from the dictionary. If such a contradiction appears (for example, if one of the state transitions, first states, or last states of these words has probability zero in the dictionary statistics, or if the statistics of the training and testing sets are very different), the system will be unable to recognize the words.

In this experiment, the system faces not only the absence of dictionary statistics, but also the lack of training with images. Nevertheless, the results remain acceptable. An important reason for this behavior is the idea behind the selection of the training word set: including all possible Farsi characters and various combinations of them in the selected 50-word set showed its positive effect on the recognition of unknown words, and almost removes the problem of lack of training with images. Thus, to extend the system, we can increase the number of writers of this 50-word set as much as possible and use these samples for training with images, while the dictionary can be selected independently, based on the specific application.

The mean speed of operation of the system on a PC with a Pentium II 533 MHz processor is 1.09 words, or 4.6 characters, per second.

6. CONCLUSION

In this paper, a continuous-density variable-duration hidden Markov model is utilized for the recognition of naturally cursive Farsi Nastaaligh handwritten words. In the preprocessing stage, several new methods are proposed for eliminating the ascenders and descenders that destroy the right-to-left order of segments. Two enhanced segmentation methods are also proposed. The first method finds the final regular components and segments the words at the positions of these regularities. The second method considers the local minima of the upper contour, overlapping positions, and the positions where the character “Re” connects to the previous character as candidate places for segmentation, and segments the words at the proper candidate places. The second method achieves a higher correct segmentation rate and is therefore selected for use in the system. A small feature set is proposed which shows high discrimination ability and the advantage of invariance to scale and shift. The complexity of computing this feature vector is acceptably low.


In the CDVDHMM used for recognition, the continuous density of observations improves the performance of the system by avoiding vector quantization errors. Also, using the mixture-of-Gaussian-functions classifier makes it possible to consider different writing forms for each character. We have assumed completely different form sub-states for each state, and the results are very good. The variability of state durations compensates for the over-segmentation problem quite well.
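The duration-aware recursion that makes this compensation possible (the pre-recursion of the Appendix, in the log domain) can be sketched as follows. This is a minimal stand-in, not the full L-best algorithm: it scores each segment's emission independently, a simplification of the merged-segment observation b_j(O_{t−d+1} … O_t), and all names are illustrative:

```python
import numpy as np

def vdhmm_viterbi(log_b, log_a, log_pi, log_dur, D):
    """Best log-score over state paths where a state may absorb up to
    D consecutive segments.  delta[t, j] is the best score of any path
    ending at segment t in state j; log_dur[j, d-1] plays the role of
    the duration probability P(d | q_j)."""
    T, N = log_b.shape
    delta = np.full((T, N), -np.inf)
    for t in range(T):
        for j in range(N):
            for d in range(1, min(D, t + 1) + 1):
                # emission score of the d segments absorbed by state j
                emit = log_b[t - d + 1: t + 1, j].sum()
                if d == t + 1:                  # state j starts the word
                    score = log_pi[j] + log_dur[j, d - 1] + emit
                else:                           # arrive from the best state i
                    score = (delta[t - d] + log_a[:, j]).max() \
                            + log_dur[j, d - 1] + emit
                delta[t, j] = max(delta[t, j], score)
    return delta[T - 1].max()
```

Allowing d > 1 is exactly what lets one character state absorb the extra pieces produced by over-segmentation.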

In word recognition methods based on HMMs, a lexicon or dictionary is always required. Model-discriminant methods are very dependent on the dictionary: in these systems, the result of recognition is either one of the dictionary words or “rejection”. In path-discriminant methods, as in our system, only the training from the dictionary depends on the dictionary. Experiments showed that the system can recognize out-of-dictionary words with acceptable performance, provided the order of letters in these words does not contradict the parameters (Π, A, Γ).
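This recognizability condition can be checked directly. A minimal sketch, assuming the dictionary statistics are available as arrays Π (initial-state), A (state-transition), and Γ (final-state) probabilities; the function name and encoding of states as integer indices are illustrative:

```python
import numpy as np

def is_recognizable(states, Pi, A, Gamma):
    """True when a word's state (letter-shape) sequence has nonzero
    probability under the dictionary statistics: a legal first state
    (Pi), legal transitions (A), and a legal last state (Gamma)."""
    if Pi[states[0]] == 0 or Gamma[states[-1]] == 0:
        return False
    return all(A[i, j] > 0 for i, j in zip(states, states[1:]))
```

A word failing this check (a zero initial, transition, or final probability) is exactly the contradictory case described above, which the system cannot recognize.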

ACKNOWLEDGMENT

The authors would like to thank all individuals who assisted in writing the word samples, especially Mr. J. Kaboudian.

APPENDIX – Modified Viterbi Algorithm

In this algorithm, T is the number of segments of the current word, N is the number of states, and L is the number of globally best paths that must be obtained.

Step 0 – Storage:

t : time index
c : iteration index
Ψ_t(i, l), 1 ≤ t ≤ T, 1 ≤ i ≤ N, 1 ≤ l ≤ L : survivor (i*, l*, d*) terminating in (t, i)
δ_t(i, l), 1 ≤ t ≤ T, 1 ≤ i ≤ N, 1 ≤ l ≤ L : survivor score in (t, i)
count(i, t), 1 ≤ i ≤ N, 1 ≤ t ≤ T : count of passes allowed at node (t, i)

Step 1 – Initialization:

k = argmax_j { b_ij(o_1) }
δ_1(i, 1) = π_i · b_ik(o_1) · P(d = 1 | q_i),  for 1 ≤ i ≤ N
δ_1(i, l) = 0,  for 1 ≤ i ≤ N, 2 ≤ l ≤ L
Ψ_1(i, l) = (0, 0, 1),  for 1 ≤ i ≤ N, 1 ≤ l ≤ L
count(i, t) = 1,  for 1 ≤ i ≤ N, 1 ≤ t ≤ T
c = 1

Step 2 – Pre-recursion:

for 2 ≤ t ≤ T, 1 ≤ j ≤ N : {
  k = argmax_k { b_jk(O_cur) }
  δ_t(j, 1) = max_{1 ≤ i ≤ N, 1 ≤ d ≤ D} [ δ_{t−d}(i, 1) · a_ij · P(d | q_j) · b_jk(O_{t−d+1} … O_t) ]
  Ψ_t(j, 1) = argmax_{1 ≤ i ≤ N, 1 ≤ d ≤ D} [ δ_{t−d}(i, 1) · a_ij · P(d | q_j) · b_jk(O_{t−d+1} … O_t) ]
} // end of for

Step 3 – Backtracking:

P*(c) = (c-th) max_{1 ≤ i ≤ N, 1 ≤ l ≤ count(i, T)} [ δ_T(i, l) · γ_i ]
(i_T*, l_T*) = (c-th) argmax_{1 ≤ i ≤ N, 1 ≤ l ≤ count(i, T)} [ δ_T(i, l) · γ_i ]
count(i_T*, T) = count(i_T*, T) + 1

where (c-th) max[·] denotes the c-th maximum.

d_T* ← the d-component of Ψ_T(i_T*, l_T*)
t = T
while (t − d_t* ≥ 1) do {
  (i*_{t−d_t*}, l*_{t−d_t*}) ← the (i, l)-components of Ψ_t(i_t*, l_t*)
  t = t − d_t*
  d_t* ← the d-component of Ψ_t(i_t*, l_t*)
  if (d_t* < 0) break  // go to after the while loop
  count(i_t*, t) = count(i_t*, t) + 1
} // end of while

If the c-th optimal state sequence, or c-th globally best path, I* = (i_1*, i*_{1+d_1*}, …) satisfies the given criteria (i.e., the recognition is correct), or c > L, exit; otherwise continue:

Step 4 – Forward-tracking:

while (t ≤ T) {
  j* = i_t*
  l = count(j*, t)
  k = argmax_k { b_{j*k}(O_cur) }
  δ_t(j*, l) = (l-th) max_{1 ≤ i ≤ N, 1 ≤ d ≤ D, 1 ≤ m ≤ count(i, t−d)} [ δ_{t−d}(i, m) · a_{ij*} · P(d | q_{j*}) · b_{j*k}(O_{t−d+1} … O_t) ]
  Ψ_t(j*, l) = (l-th) argmax_{1 ≤ i ≤ N, 1 ≤ d ≤ D, 1 ≤ m ≤ count(i, t−d)} [ δ_{t−d}(i, m) · a_{ij*} · P(d | q_{j*}) · b_{j*k}(O_{t−d+1} … O_t) ]
} // end of while

REFERENCES

[1] M.-Y. Chen, A. Kundu, and S. N. Srihari, “Variable Duration Hidden Markov Model and Morphological Segmentation for Handwritten Word Recognition,” IEEE Trans. Image Processing, 4 (12) (1995), pp. 1675–1688.

[2] A. Amin, “Off-Line Arabic Character Recognition: The State of the Art,” Pattern Recognition, 31 (5) (1998), pp. 517–530.

[3] R. G. Casey and E. Lecolinet, “A Survey of Methods and Strategies in Character Segmentation,” IEEE Trans. Pattern Anal. Mach. Intell., 18 (7) (1996), pp. 690–706.

[4] H. Fazayeli, Calligraphy Teaching, 6th edn. Tehran: Soroush Press, 1991. (In Farsi)

[5] L. Rabiner and B. H. Juang, Fundamentals of Speech Recognition. Englewood Cliffs, NJ: Prentice-Hall, 1993.

[6] M.-Y. Chen, A. Kundu, and J. Zhou, “Off-Line Handwritten Word Recognition Using a Hidden Markov Model Type Stochastic Network,” IEEE Trans. Pattern Anal. Mach. Intell., 16 (5) (1994), pp.481–496.

[7] A. Kundu, Y. He, and P. Bahl, “Recognition of Handwritten Word: First and Second Order Hidden Markov Model Based Approach,” Pattern Recognition, 22 (3) (1989), pp. 283–297.

[8] A. Kundu, Y. He, and M.-Y. Chen, “Alternatives to Variable Duration HMM in Handwriting Recognition,” IEEE Trans. Pattern Anal. Mach. Intell., 20 (11) (1998) pp. 1275–1280.

[9] A. El-Yacoubi, M. Gilloux, R. Sabourin, and C. Y. Suen, “An HMM-Based Approach for Off-Line Unconstrained Handwritten Word Modeling and Recognition,” IEEE Trans. Pattern Anal. Mach. Intell., 21 (8) (1999), pp. 752–760.

[10] H. Miled, C. Olivier, M. Cheriet and Y. Lecourtier, “Coupling Observation/letter for a Markovian Modelisation Applied to the Recognition of Arabic Handwriting,” IEEE Proc. of 4th Int. Conf. on Document Analysis and Recognition, Ulm, Germany, 1997, vol. 2, pp. 580–583.


[11] I. Bazzi, R. Schwartz, and J. Makhoul, “An Omnifont Open-Vocabulary OCR System for English and Arabic,” IEEE Trans. Pattern Anal. Mach. Intell. 21 (6), (1999), pp. 495–504.

[12] M. Dehghan, K. Faez, M. Ahmadi, and M. Shridhar, “Handwritten Farsi (Arabic) Word Recognition: a Holistic Approach Using Discrete HMM,” Pattern Recognition, 34 (2001), pp. 1057–1065.

[13] S. Kuo and O. E. Agazzi, “Keyword Spotting in Poorly Printed Documents Using Pseudo 2-D Hidden Markov Models,” IEEE Trans. Pattern Anal. Mach. Intell., 16 (1994), pp. 842–848.

[14] N. BenAmara and A. Belaid, “Printed PAW Recognition Based on Planar Hidden Markov Models,” in 13th Int. Conf. on Pattern Recognition, Vienna, Austria, 1996, vol. B, pp. 220–224.

[15] R. Azmi, E. Kabir, and K. Badie, “Using HMM and Modified Viterbi Algorithm to Reduce the Rejection Rate in Printed Farsi Sub-Word Final Recognition,” in Proc. of 4th Int. Conf. of Computer Society of Iran, 1999, pp. 291–297. (In Farsi).

[16] H. Miled, C. Olivier, M. Cheriet, and Y. Lecourtier, “Coupling Observation/Letter for a Markovian Modelisation Applied to the Recognition of Arabic Handwriting,” IEEE Proc. of 4th Int. Conf. on Document Analysis and Recognition, (2) (1997), pp. 580–583.

[17] A. El-Yacoubi, M. Gilloux, R. Sabourin, and C. Y. Suen, “An HMM-Based Approach for Off-Line Unconstrained Handwritten Word Modeling and Recognition,” IEEE Trans. Pattern Anal. Mach. Intell., 21 (8) (1999), pp. 752–760.

[18] M. Sonka, V. Hlavac, and R. Boyle, Image Processing, Analysis and Machine Vision, 2nd edn. London: Chapman & Hall, 1993.

[19] D. Motawa, A. Amin and R. Sabourin, “Segmentation of Arabic Cursive Script,” Proc. 4th Int’l Conf. Document Analysis and Recognition, Ulm, Germany, 1997, vol. 2, pp. 625–628.

[20] J. Wang, M. K. H. Leung and S. C. Hui, “Cursive Word Reference Line Detection,” Pattern Recognition, 30 (3) (1997), pp. 503–511.

[21] P. Adibi, “Farsi Handwritten Word Recognition using A Continuous-Density Variable-Duration Hidden Markov Model”, Master of Science Thesis, Computer Engineering Department, Amirkabir University of Technology, Tehran, Iran, September 2001. (In Farsi).

[22] A. Amin and G. Masini, “Machine Recognition of Multifonts Printed Arabic Texts,” Proc. 8th Int. Conf. On Pattern Recognition, Paris, 1986, pp. 392–395.

[23] F. El-Khaly and M. Sid-Ahmed, “Machine Recognition of Optically Captured Machine Printed Arabic Text,” Pattern Recognition, 23 (1990), pp. 1207–1214.

[24] H. Almuallim and S. Yamaguchi, “A Method of Recognition of Arabic Cursive Handwriting,” IEEE Trans. Pattern Anal. Mach. Intell., PAMI-9 (1987), pp. 715–722.

[25] A. Amin and S. Al-Fedaghi, “Machine Recognition of Printed Arabic Text Utilising a Natural Language Morphology,” Int’l. J. Man–Machine Stud., 35 (1991), pp. 769–788.

[26] T. El-Sheikh and R. Guindi, “Computer Recognition of Arabic Cursive Script,” Pattern Recognition, 21 (1988), pp. 293–302.

[27] J.C. Simon, “Off-line Cursive Word Recognition,” Proc. IEEE, 1992, p. 1150.

[28] C. Olivier, H. Miled, K. Romeo, and Y. Lecourtier, “Segmentation and Coding of Arabic Handwritten Words,” IEEE Proc. of 13th Int. Conf. on Pattern Recognition, Vienna, Austria, 1996, vol. 3, pp. 264–268.

[29] O. D. Trier, A. K. Jain, and T. Taxt, “Feature Extraction Methods For Character Recognition – A Survey,” Pattern Recognition, 29 (4) (1996), pp. 641–662.

[30] R. C. Gonzalez and P. Wintz, Digital Image Processing, 2nd edn. Boston, Massachusetts: Addison-Wesley, 1987.

[31] K. Fukunaga, Introduction to Statistical Pattern Recognition, 2nd edn. Boston, Massachusetts: Academic Press, 1990.

Paper Received 7 June 2003; Revised 31 December 2003; Accepted 13 January 2004.

