On using context for automatic correction of non-word misspellings in student essays
Michael Flor, Yoko Futagi
Educational Testing Service
2012 ACL

Outline

1. Introduction
2. Corpus
3. Annotation
4. Spelling correction systems (ConSpel system)
5. Comparative evaluation
6. Discussion
7. Conclusions

1. Introduction

Non-word misspellings, e.g.:

Busineesinthemor efun

2. Corpus

High-stakes standardized tests:

- TOEFL
- GRE

The corpus includes 3000 essays, for a total of 963,428 words.

                   TOEFL essays   GRE essays
ELL                98.73%         57.86%
English speakers   1.27%          42.14%

3. Annotation

Annotators were asked to identify all non-word misspellings.

Two annotators:
- native English speakers
- experienced in linguistic annotation

Annotators agreed in 82.6% of the cases (Cohen’s kappa = 0.8, p < .001).

All disagreements were resolved by a third annotator (adjudicator).
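The kappa statistic reported above corrects raw agreement for the agreement expected by chance. A minimal sketch of the computation, assuming each annotator's decisions are available as parallel lists of labels (the function name and the toy labels are illustrative, not from the annotation study):

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label proportions.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    categories = set(freq_a) | set(freq_b)
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Toy data: 1 = "misspelling", 0 = "not a misspelling" (hypothetical labels).
annotator_1 = [1, 1, 0, 0]
annotator_2 = [1, 0, 0, 0]
kappa = cohens_kappa(annotator_1, annotator_2)
```

In the toy data the annotators agree on 3 of 4 items (0.75), chance agreement is 0.5, so kappa is 0.5.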

The annotated corpus of 3,000 essays has the following statistics:

- Average essay length is 321 words (the range is 28-798 words)

- 148 essays turned out to have no misspellings at all

- 2.24% of the words in the corpus are non-word misspellings

4. Spelling correction systems (ConSpel system)

ConSpel detects and corrects spelling errors, with a focus on non-word misspellings.

By default, the system will ignore:
- numbers
- dates
- web and email addresses
- mixed alpha-numeric strings (e.g. ‘RV400’)
- capitalized words (e.g. ‘London’)
- all-uppercase words (e.g. ‘ROME’)
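The ignore list above can be approximated with a few pattern checks. This is only a sketch under stated assumptions: the function name `should_ignore` and all regular expressions are hypothetical, and the patterns ConSpel actually uses are not given in the slides. It also does not treat sentence-initial capitals specially.

```python
import re

# Hypothetical patterns; the real system's rules are not described here.
NUMBER = re.compile(r"^[+-]?\d+([.,]\d+)*$")
DATE = re.compile(r"^\d{1,4}[/.-]\d{1,2}[/.-]\d{1,4}$")
WEB = re.compile(r"^(https?://|www\.)\S+$", re.IGNORECASE)
EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def should_ignore(token):
    """Return True if the token belongs to a class skipped by default."""
    if NUMBER.match(token) or DATE.match(token):
        return True
    if WEB.match(token) or EMAIL.match(token):
        return True
    has_alpha = any(ch.isalpha() for ch in token)
    has_digit = any(ch.isdigit() for ch in token)
    if has_alpha and has_digit:
        return True  # mixed alpha-numeric string such as 'RV400'
    if token[:1].isupper():
        return True  # capitalized ('London') or all uppercase ('ROME')
    return False
```

For instance, `should_ignore("RV400")` and `should_ignore("London")` are true, while `should_ignore("interacte")` is false, so that token would go on to the dictionary check.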

ConSpel spelling dictionaries include about 360,000 entries.

- all inflectional variants (e.g. ‘love’, ‘loved’, ‘loves’, ‘loving’)
- international spelling variants (e.g. American and British English)

The core set includes 245,000 entries (modern English vocabulary)

Additional dictionaries include about 120,000 entries.

- international surnames and first names
- names for geographical places

Detection of Misspellings

A token is flagged as a misspelling when the string is not in the system dictionaries.
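Dictionary-based detection reduces to a membership test. A minimal sketch, assuming the dictionaries are loaded into a Python set of lowercase strings; the function name, the lowercasing step, and the toy dictionary are assumptions for illustration:

```python
def find_nonwords(tokens, dictionary, ignore=lambda token: False):
    """Return (position, token) pairs for tokens missing from the dictionary.

    `ignore` is an optional predicate for tokens to skip (numbers, names, ...).
    """
    flagged = []
    for position, token in enumerate(tokens):
        if ignore(token):
            continue
        if token.lower() not in dictionary:
            flagged.append((position, token))
    return flagged


# Toy dictionary; the real dictionaries hold about 360,000 entries.
toy_dictionary = {"they", "interacted", "with", "other", "youth"}
tokens = ["They", "interacte", "with", "other", "youth"]
flagged = find_nonwords(tokens, toy_dictionary)
```

Here only `interacte` is flagged, because every other token is found in the toy dictionary after lowercasing.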

Correction of Misspellings

Dictionaries are also the source of suggested corrections.

Candidate suggestions: use edit distance with a default threshold of 5.

Problem: this can easily yield hundreds of correction candidates.
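Candidate generation by edit distance can be sketched as follows. The slides do not say which edit-distance variant is used, so plain Levenshtein distance is assumed here, and the function names and toy dictionary are hypothetical. A real system would likely use an indexed search rather than scanning every entry.

```python
def levenshtein(a, b):
    """Levenshtein distance (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def candidates(word, dictionary, max_distance=5):
    """Return (distance, entry) pairs within max_distance, closest first."""
    scored = []
    for entry in dictionary:
        # The length difference is a lower bound on the distance.
        if abs(len(entry) - len(word)) > max_distance:
            continue
        distance = levenshtein(word, entry)
        if distance <= max_distance:
            scored.append((distance, entry))
    return sorted(scored)


toy_dictionary = {"interact", "interacted", "interface", "zebra"}
close = candidates("interacte", toy_dictionary, max_distance=1)
```

With a threshold of 1 the toy example already returns two candidates, `interact` and `interacted`; with a threshold of 5 over a large dictionary the list grows quickly, which is why ranking is needed.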

Candidate suggestions are ranked using a set of algorithms:

- edit distance
- phonetic similarity
- word frequency
- local context
- context-sensitive
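One simple way to combine several ranking signals is a weighted sum of per-candidate scores. The slides do not state how ConSpel combines its algorithms, so the weighted sum, the function name, the weights, and the toy scores below are all assumptions made for illustration:

```python
def rank_candidates(candidates, scorers, weights):
    """Sort candidates by a weighted sum of scorer outputs, best first.

    scorers: dict mapping a signal name to a function candidate -> score
    weights: dict mapping the same names to numeric weights
    """
    def total(candidate):
        return sum(weights[name] * scorer(candidate)
                   for name, scorer in scorers.items())
    return sorted(candidates, key=total, reverse=True)


# Made-up scores standing in for real frequency and context models.
frequency_score = {"interact": 0.6, "interacted": 0.4}
context_score = {"interact": 0.1, "interacted": 0.9}
scorers = {
    "frequency": lambda c: frequency_score[c],
    "context": lambda c: context_score[c],
}

with_context = rank_candidates(["interact", "interacted"], scorers,
                               {"frequency": 1.0, "context": 1.0})
without_context = rank_candidates(["interact", "interacted"], scorers,
                                  {"frequency": 1.0, "context": 0.0})
```

With these invented numbers, frequency alone prefers `interact`, while adding the context signal moves `interacted` to the top.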

5. Comparative evaluation

All evaluations were performed in “full context” (rather than word-by-word).

Error Detection

Error Correction

Error Detection (native and non-native English speakers)

Error Correction (native and non-native English speakers)

6. Discussion

Absence of grammatical errors. For example:

“They received fresh air, interacte with other youth their age, solved problems...”

7. Conclusions

Results with the ConSpel system demonstrate that using contextual information improves automatic correction of non-word misspellings, for both native and non-native speakers of English.

