After the Deadline at NAACL CL&W - June 2010

  • 8/9/2019 After the Deadline at NAACL CL&W - June 2010

    The Design of a Proofreading Software Service

    Raphael Mudge
    NLP Hacker, Automattic

    Overview

    - What is After the Deadline?
    - Where can you use it?
    - How it works
    - Where to get it

    What is AtD?

    - A software service that checks:
      - Spelling
      - Real-word errors
      - Style
      - Grammar

    A Software Service?

    Where can you use it?

    - In your browser: Google Chrome and Firefox
    - With your blog: WordPress and IntenseDebate
    - On your site: TinyMCE and jQuery

    Google Chrome

    Firefox

    OpenOffice.org

    How much use?

    May 2010:

    - 3.5 million requests
    - 100-140K requests/day
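If the 3.5 million requests are the total for May 2010 (an assumption: the slide does not say whether the figure is monthly or cumulative), the implied daily average can be checked against the quoted 100-140K requests/day:

```python
# Assumption: 3.5 million is the request total for May 2010 (31 days).
MAY_2010_REQUESTS = 3_500_000
DAYS_IN_MAY = 31

average_per_day = MAY_2010_REQUESTS / DAYS_IN_MAY
print(f"{average_per_day:,.0f} requests/day")  # 112,903 requests/day
```

Under that reading the average falls inside the quoted daily range, so the two figures are consistent.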

    Design Goals

    - Speed
    - Simplicity
    - A working solution

    Spell Checking

    Is word in dictionary?
      No :(  ->  Generate suggestions  ->  Sort suggestions
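The flowchart can be sketched in Python as below. The slide does not say how AtD generates suggestions; this sketch assumes candidates are the dictionary words one edit away (a Norvig-style generator), and the toy `DICTIONARY`, `edits1` and `check_word` names are hypothetical. Sorting is left to a caller-supplied key, since ranking is a separate step.

```python
import string

# Hypothetical toy dictionary; AtD's real dictionary is not shown on the slide.
DICTIONARY = {"the", "written", "word", "ward", "is", "in"}

def edits1(word):
    """All strings one deletion, transposition, substitution or insertion away."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def check_word(word, sort_key):
    """Return [] for a known word, else dictionary suggestions sorted by sort_key."""
    if word in DICTIONARY:                      # Is word in dictionary? Yes: done.
        return []
    suggestions = edits1(word) & DICTIONARY     # No: generate suggestions
    return sorted(suggestions, key=sort_key)    # Sort suggestions
```

For example, `check_word("wrd", sort_key=str)` returns `["ward", "word"]`; the order is alphabetical only because `str` is a placeholder key.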

    Sorting Suggestions

    - Compare suggestion, error
      - Do the first letters match?
      - Edit distance
      - Probability of suggestion in context

    Sorting Suggestions

    - The written wrd
      - Suggestions: ward, word
      - First letters match
      - Edit distance = 1
      - Pn(ward | written) = 0.00%
      - Pn(word | written) = 0.17%
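The three comparisons in this example can be computed as a feature tuple per suggestion. This is a hypothetical sketch: edit distance is assumed to be plain Levenshtein distance, and the context probability is passed in rather than computed (the values are the slide's rounded 0.00% and 0.17%, written as fractions).

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]

def suggestion_features(suggestion, error, p_in_context):
    """(first letters match, edit distance, probability in context).

    p_in_context is assumed to come from a language model,
    e.g. Pn(suggestion | previous word).
    """
    return (
        suggestion[:1] == error[:1],
        edit_distance(suggestion, error),
        p_in_context,
    )

# "The written wrd", with the context probabilities reported on the slide.
features = {
    "ward": suggestion_features("ward", "wrd", 0.0),
    "word": suggestion_features("word", "wrd", 0.0017),
}
```

Both suggestions tie on the first two features, so only the context probability separates them, which is why "word" is the better suggestion here.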

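    Language Model

    $P(\text{word}) = \frac{\text{count}(\text{word})}{\text{total}}$

    $P_n(\text{word} \mid \text{previous}) = \frac{\text{count}(\text{previous word})}{\text{count}(\text{previous})}$

    $P_p(\text{word} \mid \text{next}) = \frac{P_n(\text{next} \mid \text{word}) \, P(\text{word})}{P(\text{next})}$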
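A direct, unsmoothed transcription of these formulas over a token list might look like the following. The `BigramModel` class and its method names are hypothetical, and a production model would need smoothing for unseen words and word pairs.

```python
from collections import Counter

class BigramModel:
    """Count-based unigram/bigram model following the formulas above (no smoothing)."""

    def __init__(self, tokens):
        self.unigrams = Counter(tokens)
        self.bigrams = Counter(zip(tokens, tokens[1:]))
        self.total = len(tokens)

    def p(self, word):
        # P(word) = count(word) / total
        return self.unigrams[word] / self.total

    def p_given_previous(self, word, previous):
        # Pn(word | previous) = count(previous word) / count(previous)
        if self.unigrams[previous] == 0:
            return 0.0
        return self.bigrams[(previous, word)] / self.unigrams[previous]

    def p_given_next(self, word, next_word):
        # Pp(word | next) = Pn(next | word) * P(word) / P(next)  (Bayes' rule)
        if self.unigrams[next_word] == 0:
            return 0.0
        return (self.p_given_previous(next_word, word) * self.p(word)
                / self.p(next_word))
```

On the toy corpus "the cat sat on the mat", `p("the")` is 2/6, `p_given_previous("cat", "the")` is 0.5, and `p_given_next("the", "cat")` is 1.0, because "cat" is always preceded by "the".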

    Sorting Suggestions

    - We want to calculate:
      score(suggestion, error, context)
    - Answer? Neural networks
      - Trained with misspelled words
      - Returns a value from 0.0 to 1.0
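As a sketch of such a scorer, the code below trains a single logistic neuron, the smallest possible neural network, by stochastic gradient descent. The slide does not describe AtD's network architecture; the one-neuron design, the `TinyScorer` name, the learning rate and the training examples are all assumptions, with features ordered as on the Sorting Suggestions slide (first letters match, edit distance, probability in context).

```python
import math

def sigmoid(x):
    """Numerically stable logistic function; maps any real input into 0.0 to 1.0."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

class TinyScorer:
    """One logistic neuron: score = sigmoid(bias + weights . features)."""

    def __init__(self, n_features, learning_rate=0.5):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def score(self, features):
        z = self.bias + sum(w * x for w, x in zip(self.weights, features))
        return sigmoid(z)

    def train(self, examples, epochs=1000):
        """Stochastic gradient descent on log loss.

        examples: (features, label) pairs; label is 1.0 if the suggestion was
        the intended word for a known misspelling, else 0.0.
        """
        for _ in range(epochs):
            for features, label in examples:
                # For a sigmoid output, d(log loss)/dz = prediction - label.
                error = self.score(features) - label
                self.bias -= self.learning_rate * error
                self.weights = [w - self.learning_rate * error * x
                                for w, x in zip(self.weights, features)]

# Hypothetical training data: (first letters match, edit distance, P in context).
TRAINING = [
    ((1.0, 1.0, 0.9), 1.0),
    ((1.0, 1.0, 0.8), 1.0),
    ((0.0, 2.0, 0.1), 0.0),
    ((0.0, 3.0, 0.0), 0.0),
]

scorer = TinyScorer(n_features=3)
scorer.train(TRAINING)
```

The sigmoid keeps every score within 0.0 to 1.0, so scores are comparable across suggestions and can serve directly as a sort key.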

    Spell Checker Evaluation

    Method and numbers from: Sebastian Deorowicz and Marcin G. Ciura. 2005. Correcting spelling errors by modeling their causes. International Journal of Applied Mathematics and Computer Science, 15(2):275-285.

    Real-Word Errors

    Is word part of a confusion set?
      Yes  ->  Sort confusion set in context
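The "Yes" branch of this flowchart can be sketched as follows. The confusion sets shown are hypothetical examples, not AtD's actual data, and the context score is passed in as a function (the features such a score might use are listed on the following slide).

```python
# Hypothetical confusion sets, for illustration only.
CONFUSION_SETS = [
    {"their", "there", "they're"},
    {"your", "you're"},
    {"to", "too", "two"},
]

def confusion_set_for(word):
    """Is word part of a confusion set? Return that set, or None."""
    for confusion_set in CONFUSION_SETS:
        if word in confusion_set:
            return confusion_set
    return None

def check_real_word(word, score_in_context):
    """Return word's confusion set sorted best first by score_in_context,
    or None if word is in no set. A top candidate other than word
    flags a possible real-word error."""
    confusion_set = confusion_set_for(word)
    if confusion_set is None:
        return None
    return sorted(confusion_set, key=score_in_context, reverse=True)
```

Here the scorer is any function from a candidate word to a number, so a trained model can be dropped in without changing the lookup logic.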

    Sorting Confusion Set

    - Features
      - P(word)
      - Pp(word | previous)
      - Pn(word | next)
      - Pp(word | previous, previous2)
      - Pn(word | next, next2)
    - Score function: Neural network
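The five features can be computed from raw n-gram counts. This sketch is unsmoothed and uses a hypothetical toy corpus and hypothetical names; it labels each feature by its conditioning context rather than by Pp/Pn. Probabilities given the following words are relative frequencies such as count(word next) / count(next), which is what the Bayes form on the Language Model slide reduces to.

```python
from collections import Counter

def ratio(numerator, denominator):
    """Relative frequency, defined as 0.0 when the context was never seen."""
    return numerator / denominator if denominator else 0.0

class NgramCounts:
    """Unsmoothed unigram, bigram and trigram counts over a token list."""

    def __init__(self, tokens):
        self.uni = Counter(tokens)
        self.bi = Counter(zip(tokens, tokens[1:]))
        self.tri = Counter(zip(tokens, tokens[1:], tokens[2:]))
        self.total = len(tokens)

    def features(self, word, previous2, previous, next_word, next2):
        """Context features for one confusion-set candidate, in slide order."""
        return [
            # P(word)
            ratio(self.uni[word], self.total),
            # word | previous
            ratio(self.bi[(previous, word)], self.uni[previous]),
            # word | next
            ratio(self.bi[(word, next_word)], self.uni[next_word]),
            # word | previous, previous2
            ratio(self.tri[(previous2, previous, word)],
                  self.bi[(previous2, previous)]),
            # word | next, next2
            ratio(self.tri[(word, next_word, next2)],
                  self.bi[(next_word, next2)]),
        ]

# Hypothetical toy corpus; score "their" vs "there" in "they parked ___ car ."
corpus = "they parked their car . their car is red . there is a car .".split()
counts = NgramCounts(corpus)
their = counts.features("their", "they", "parked", "car", ".")
there = counts.features("there", "they", "parked", "car", ".")
```

Each candidate's feature vector would then go to the score function; in this toy corpus "their" beats "there" on every feature.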

    Real-Word Error Evaluation

    The evaluation covers both the grammar checker and the statistical method. Details at: http://wp.me/pCBVi-a1

    Grammar Checker

    - The error:
      - I wonder if this is your companies way of providing support?

    - The rule:
      - Pattern: your .*/NNS
      - Suggestion: your \1:possessive
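Applying this rule to a tagged sentence can be sketched with a regular expression over word/TAG tokens, using the tagging shown on the following slide. The regex encoding of "your .*/NNS" and the `possessive` helper are hypothetical; in particular the helper only handles regular -s and -ies plurals.

```python
import re

# The example sentence as word/TAG tokens.
TAGGED = ("I/PRP wonder/VBP if/IN this/DT is/VBZ your/PRP$ companies/NNS "
          "way/NN of/IN providing/VBG support/NN")

# Hypothetical encoding of "your .*/NNS": "your" followed by any plural noun.
RULE = re.compile(r"\byour/PRP\$ (\S+)/NNS\b")

def possessive(plural):
    """Hypothetical, naive ':possessive' transform to the singular owner
    (companies -> company's). Irregular and -es plurals need a lexicon."""
    if plural.endswith("ies"):
        return plural[:-3] + "y's"
    if plural.endswith("s"):
        return plural[:-1] + "'s"
    return plural + "'s"

def apply_rule(tagged):
    """Return (matched text, suggestion) pairs for every match of RULE."""
    return [(f"your {m.group(1)}", f"your {possessive(m.group(1))}")
            for m in RULE.finditer(tagged)]
```

`apply_rule(TAGGED)` yields `[("your companies", "your company's")]`.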

    Grammar Checker

    - How AtD sees the sentence:
      - I/PRP wonder/VBP if/IN this/DT is/VBZ your/PRP$ companies/NNS way/NN of/IN providing/VBG support/NN
    - How AtD sees it:
      - your companies way = 0.000004%
      - your company's way = 0.000030%
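These probabilities suggest a final check: keep the rule's suggestion only when the language model finds it more likely than the original text. The slide does not state AtD's exact decision criterion, so the simple comparison below, and the `accept_suggestion` name, are assumptions.

```python
# Phrase probabilities reported on the slide, converted from percent to fractions.
PHRASE_PROBABILITY = {
    "your companies way": 0.000004 / 100,
    "your company's way": 0.000030 / 100,
}

def accept_suggestion(original, suggestion, probability):
    """Assumed criterion: keep the suggestion only if it is more probable."""
    return probability(suggestion) > probability(original)

print(accept_suggestion("your companies way", "your company's way",
                        PHRASE_PROBABILITY.get))  # True
```

Filtering rule output through the language model this way lets a broad pattern like "your .*/NNS" fire often while only surfacing suggestions the statistics support.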

    Design Principles

    - Speed over accuracy
    - Simplicity over complexity
    - Do what works

    Why now?

    - Cheap hardware
    - Persistent internet
    - Lots of data

    Open Source

    - Server technology is GPL
    - Bootstrap data available
    - Front-end technology is LGPL

    Where to get it

    http://open.afterthedeadline.com
    AtD Technology (GPL)

    http://www.afterthedeadline.com
    Homepage

    mailto: [email protected]
    My email. I don't bite.

