CAIPT2015
Maximum-Length Comparison Method Of Automatic Word Segmentation
for Myanmar Text
Htet Myet Lynn, Pankoo Kim, Junho Choi, Jenogin Kim
CAIPT2015 – June 23, 2015
Contents
1 Nature of Myanmar Script
2 Maximum-Length Comparison Model
3 Experimental Result & Future Study
Nature of Myanmar Script
The lack of standard rules for a distinct word delimiter (white-space) between words makes segmentation a challenge
He is having a meal with three nephews.
He is nephew three persons with meal eating
Maximum-Length Comparison Model
1 Preprocessing Sentences
2 Detect First Character (Consonant)
3 Candidates Detection & Extraction
4 Maximum-Length Comparison
Input Text -> Preprocessing -> Detect Consonant -> Candidate Extraction (Data Dictionary, Candidates.txt) -> Maximum Length Comparison -> Output Result
Preprocessing Sentences
Input: He joins the army.
Each news outlet uses a different writing style and places white-space differently within a sentence
Preprocessing: remove punctuation marks and white-space
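The preprocessing step above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `preprocess` is hypothetical, and it assumes that "punctuation" means any character in a Unicode punctuation category, which includes the Myanmar section marks U+104A and U+104B.

```python
import unicodedata


def preprocess(sentence: str) -> str:
    """Return the sentence with white-space and punctuation removed.

    Assumption: a character counts as punctuation when its Unicode
    general category starts with "P" (this covers U+104A and U+104B).
    """
    kept = []
    for ch in sentence:
        if ch.isspace():
            continue  # drop spaces, tabs, and line breaks
        if unicodedata.category(ch).startswith("P"):
            continue  # drop punctuation marks
        kept.append(ch)
    return "".join(kept)
```

Because every outlet spaces its text differently, stripping all white-space first gives the later steps a uniform character stream to work on.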
Detect First Character (Consonant)
Preprocessing: He joins the army.
1st Character Detection: He
Get Consonant:
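A minimal sketch of the first-character check, under the assumption that a consonant is any code point from U+1000 to U+1021 in the Unicode Myanmar block. The slides do not state which range the authors used, and the names `is_myanmar_consonant` and `first_consonant` are hypothetical.

```python
from typing import Optional

# Assumption: Myanmar consonant letters occupy U+1000..U+1021.
CONSONANT_FIRST = "\u1000"
CONSONANT_LAST = "\u1021"


def is_myanmar_consonant(ch: str) -> bool:
    """True when ch is a single character inside the assumed consonant range."""
    return len(ch) == 1 and CONSONANT_FIRST <= ch <= CONSONANT_LAST


def first_consonant(text: str) -> Optional[str]:
    """Return the first character of text if it is a consonant, else None."""
    if text and is_myanmar_consonant(text[0]):
        return text[0]
    return None
```

The returned consonant can then serve as the lookup key into the data dictionary.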
Candidates Detection & Extraction
Consonant -> Data Dictionary -> Candidates.txt
Let
  Length of word_#1 = 3;
  Length of word_#2 = 5;
  ...
  Length of word_#10 = 20;
For each word_#n (n = 1 to 10):
  truncate_word = the first Length(word_#n) characters of input_sentence;
  if (word_#n == truncate_word) {
    mark_as_candidate;
  } else {
    ignore();
  }
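The candidate test can be sketched in Python as below. This is an illustration under stated assumptions: the dictionary is a plain list of strings, the function name `extract_candidates` is hypothetical, and Latin text stands in for Myanmar script in the usage example.

```python
from typing import Iterable, List


def extract_candidates(input_sentence: str, dictionary: Iterable[str]) -> List[str]:
    """Collect dictionary words that match the start of input_sentence."""
    if not input_sentence:
        return []
    first_char = input_sentence[0]
    candidates = []
    for word in dictionary:
        # Only words that begin with the detected first character are considered.
        if not word or word[0] != first_char:
            continue
        # Truncate the input to the length of the dictionary word and compare.
        truncate_word = input_sentence[: len(word)]
        if word == truncate_word:
            candidates.append(word)  # mark as candidate
    return candidates
```

For example, `extract_candidates("hejoinsthearmy", ["he", "hello", "join"])` keeps only `"he"`, since `"hello"` differs from the truncated input and `"join"` starts with a different character.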
Maximum-Length Comparison
Candidates.txt (candidates #1 to #10)
If
  Length of candidate_#1 = 3;
  ...
  Length of candidate_#10 = 20;
// Get the longest word among the candidates
best_candidate = candidate_#10;
final_word = best_candidate;
Remove best_candidate from the front of the input;
Input: He joins the army.
New input:
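One way to express this selection step in Python is shown below. It is a sketch, not the authors' code: the name `take_longest` is hypothetical, it assumes every candidate is a prefix of the input, and ties between equally long candidates go to the first one listed.

```python
from typing import List, Tuple


def take_longest(input_sentence: str, candidates: List[str]) -> Tuple[str, str]:
    """Pick the longest candidate and cut it off the front of the input.

    Returns (final_word, new_input). Raises ValueError if candidates is empty.
    """
    if not candidates:
        raise ValueError("no candidates to compare")
    best_candidate = max(candidates, key=len)  # longest candidate wins
    final_word = best_candidate
    new_input = input_sentence[len(best_candidate):]  # truncate from the input
    return final_word, new_input
```

With Latin stand-in text, `take_longest("joinsthearmy", ["join", "joins"])` yields the word `"joins"` and leaves `"thearmy"` as the new input.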
Maximum-Length Comparison Model
Repeat the steps until (length_input_sent <= 0)
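Putting the steps together, the whole loop might look like the sketch below. Two points are assumptions rather than something the slides state: the function name `segment`, and the fallback that emits a single character when no dictionary word matches (needed so the loop always makes progress). Latin text stands in for Myanmar script in the usage example.

```python
from typing import Iterable, List


def segment(input_sentence: str, dictionary: Iterable[str]) -> List[str]:
    """Greedy longest-match segmentation of a preprocessed sentence."""
    words = set(dictionary)
    remaining = input_sentence
    output: List[str] = []
    # Keep going while input is left, i.e. stop once its length is <= 0.
    while len(remaining) > 0:
        candidates = [w for w in words if w and remaining.startswith(w)]
        if candidates:
            best = max(candidates, key=len)  # maximum-length comparison
        else:
            best = remaining[0]  # assumed fallback: emit one character
        output.append(best)
        remaining = remaining[len(best):]
    return output
```

With the toy dictionary `{"he", "join", "joins", "the", "army", "arm"}`, the input `"hejoinsthearmy"` is split into `he`, `joins`, `the`, `army`. Because the choice is greedy, a long dictionary word can win even when a shorter split would have been correct, so the result depends heavily on dictionary coverage.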
Experimental Result & Future Study
30147 sentences, comprising a total of 23,454 words, were tested
21,577 of the 23,454 words were segmented correctly (92%)
Errors can occur because of gaps in the data dictionary, such as missing technical terms and newly derived words
Increase the size of the data dictionary
Understand the meaning of segmented words semantically for further NLP tasks