Maximum-Length Comparison Method Of Automatic Word Segmentation for Myanmar Text

Htet Myet Lynn, Pankoo Kim, Junho Choi, Jenogin Kim

CAIPT2015, June 23, 2015

What is NLP?


Natural Language

Why Word Segmentation?


Contents

1 Nature of Myanmar Script

2 Maximum-Length Comparison Model

3 Experimental Result & Future Study


Nature Of Myanmar Script

Consonants

Digits

Basic Vowels

Consonant Combination Symbols

Devowelization Consonants

The lack of standard rules for a distinct word delimiter (white space) between words makes segmentation a challenge.

English: He is having a meal with three nephews.

Myanmar word order: He is nephew three persons with meal eating


Maximum-Length Comparison Model

1 Preprocessing Sentences

2 Detect First Character (Consonant)

3 Candidates Detection & Extraction

4 Maximum-Length Comparison

Preprocessing Sentences

Input Text -> Preprocessing -> Detect Consonant -> Candidate Extraction (Data Dictionary, Candidates.txt) -> Maximum Length Comparison -> Output Result

Input: He joins the army.

Each news outlet uses its own writing style and places white space differently within a sentence.

Preprocessing: remove punctuation marks and white space.
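The preprocessing step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `preprocess` is hypothetical, and it assumes that "punctuation" and "white space" can be identified through Unicode general categories (`P*` and `Z*`).

```python
import unicodedata


def preprocess(sentence: str) -> str:
    """Return the sentence with white space and punctuation removed.

    Assumption: punctuation is any character whose Unicode general
    category starts with 'P', and white space is anything str.isspace()
    accepts or whose category starts with 'Z'.
    """
    kept = []
    for ch in sentence:
        category = unicodedata.category(ch)
        if ch.isspace() or category.startswith(("P", "Z")):
            continue  # drop delimiters; keep letters, marks, and digits
        kept.append(ch)
    return "".join(kept)
```

With the English stand-in from the slide, `preprocess("He joins the army.")` gives `"Hejoinsthearmy"`. Combining vowel signs are kept because their general category is a mark, not punctuation.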

Detect First Character (Consonant)


Preprocessing: He joins the army.

1st Character Detection: He

Get Consonant:
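A possible sketch of the consonant check is shown below. The names `is_consonant` and `first_consonant` are hypothetical, and the code range is an assumption: it treats U+1000 to U+1021 of the Unicode Myanmar block as consonants, which includes the letter A (U+1021) that Unicode itself lists as an independent vowel.

```python
from typing import Optional

# Assumption: consonant letters are U+1000..U+1021 in the Myanmar block.
CONSONANT_FIRST = 0x1000
CONSONANT_LAST = 0x1021


def is_consonant(ch: str) -> bool:
    """True if ch is a single character inside the assumed consonant range."""
    return len(ch) == 1 and CONSONANT_FIRST <= ord(ch) <= CONSONANT_LAST


def first_consonant(text: str) -> Optional[str]:
    """Return the leading character if it is a consonant, otherwise None."""
    if text and is_consonant(text[0]):
        return text[0]
    return None
```

The returned consonant can then serve as the lookup key for the data dictionary in the next step.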

Candidates Detection & Extraction


Consonant:

Data Dictionary

Candidates.txt

Let length of word_#1 = 3; length of word_#2 = 5; ...; length of word_#10 = 20;

truncate_word = input_sentence truncated to the length of word_#n;

if (word_#n == truncate_word) {
    mark_as_candidate;
} else {
    ignore();
}
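The candidate test above can be sketched in Python. This is an illustration under assumptions: the dictionary is modeled as a plain list of words grouped by first character, and the names `build_index` and `find_candidates` are hypothetical. Latin stand-ins are used because the Myanmar examples did not survive in the transcript.

```python
from collections import defaultdict
from typing import Dict, List


def build_index(dictionary_words: List[str]) -> Dict[str, List[str]]:
    """Group dictionary words by their first character (the consonant key)."""
    index: Dict[str, List[str]] = defaultdict(list)
    for word in dictionary_words:
        if word:  # skip empty entries
            index[word[0]].append(word)
    return index


def find_candidates(input_sentence: str, index: Dict[str, List[str]]) -> List[str]:
    """Return every dictionary word that matches the start of the input."""
    if not input_sentence:
        return []
    candidates = []
    for word in index.get(input_sentence[0], []):
        # Truncate the input to the length of the dictionary word and compare.
        truncate_word = input_sentence[:len(word)]
        if word == truncate_word:
            candidates.append(word)  # mark as candidate
    return candidates


index = build_index(["he", "hejoins", "hello", "join"])
print(find_candidates("hejoinsthearmy", index))  # ['he', 'hejoins']
```

Indexing by first character means only words that share the detected consonant are compared, which is one plausible reading of the Consonant, Data Dictionary, Candidates.txt sequence on the slide.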


Maximum-Length Comparison


Candidates.txt

If length of candidate_#1 = 3 and length of candidate_#10 = 20:

// Get the longest word among the candidates
best_candidate = candidate_#10;
final_word = best_candidate;

Remove best_candidate from the front of the input;

Input: He joins the army.

New input:
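The selection and truncation step might look like the sketch below. The function name `take_longest` is hypothetical, and a Latin stand-in replaces the Myanmar example.

```python
from typing import List, Tuple


def take_longest(input_sentence: str, candidates: List[str]) -> Tuple[str, str]:
    """Pick the longest candidate and return it with the remaining input.

    Assumes candidates is non-empty and every candidate is a prefix of
    input_sentence; max() raises ValueError on an empty list.
    """
    best_candidate = max(candidates, key=len)
    new_input = input_sentence[len(best_candidate):]
    return best_candidate, new_input


final_word, new_input = take_longest("hejoinsthearmy", ["he", "hejoins"])
print(final_word, new_input)  # hejoins thearmy
```

Because every candidate is a prefix of the same input, two candidates of equal length are necessarily the same string, so ties need no special handling.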

Maximum-Length Comparison Model


Repeat while (length_input_sent > 0)
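Putting the steps together, the whole loop can be sketched as a greedy longest-match segmenter that keeps consuming the input until nothing is left. This is a sketch under stated assumptions, not the authors' code: `segment` is a hypothetical name, the dictionary is a plain list, and the fallback for text with no dictionary match (emit one character) is an assumption, since the slides do not say how that case is handled.

```python
from typing import List


def segment(sentence: str, dictionary_words: List[str]) -> List[str]:
    """Greedy maximum-length segmentation of a preprocessed sentence."""
    words = []
    remaining = sentence
    while len(remaining) > 0:
        # Candidates: dictionary words that match the start of the input.
        candidates = [w for w in dictionary_words if w and remaining.startswith(w)]
        if candidates:
            best = max(candidates, key=len)  # maximum-length comparison
        else:
            # Assumption: unknown text is emitted one character at a time.
            best = remaining[0]
        words.append(best)
        remaining = remaining[len(best):]  # truncate the matched word
    return words


print(segment("hejoinsthearmy", ["he", "joins", "the", "army", "them"]))
# ['he', 'joins', 'the', 'army']
```

Each pass removes at least one character, so the loop always terminates.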


Experimental Result & Future Study

Experimental Result

30147 sentences containing a total of 23,454 words were tested

21,577 of the 23,454 words were segmented correctly (92%)

Errors can occur because of gaps in the data dictionary, technical terms, and newly derived words

Future Study

Increase the size of the data dictionary

Understand the meaning of segmented words semantically for further NLP tasks
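The 92% figure in the experimental result follows directly from the reported word counts. A quick check, using only the two numbers given on the slide:

```python
tested_words = 23454   # total words tested, as reported
correct_words = 21577  # words segmented correctly, as reported

accuracy = correct_words / tested_words
wrong_words = tested_words - correct_words

print(f"accuracy = {accuracy:.1%}")       # accuracy = 92.0%
print(f"wrong words = {wrong_words}")     # wrong words = 1877
```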


Do You Have any Questions?
