Date posted: 07-Dec-2014
Uploaded by: anabela-barreiro
SPIDER: A SYSTEM FOR PARAPHRASING
IN DOCUMENT EDITING AND REVISION
APPLICABILITY IN MACHINE TRANSLATION PRE-EDITING
Anabela Barreiro
CICLing 2011 February 20-26, 2011 Anabela Barreiro Tokyo, Japan
OUTLINE
INTRODUCTION
    • PARAPHRASES IN NLP
    • PARAPHRASES IN PEDAGOGICAL AND PROFESSIONAL CONTEXTS
SPIDER
    • FIRST STEPS
    • IMPORTANT FEATURES
    • PARAPHRASES COVERED BY SPIDER
    • INTERFACE
    • LINGUISTIC RESOURCES
    • EVALUATION RESULTS
THE FUTURE
    • FUTURE APPLICATIONS?
    • FUTURE RESEARCH
IMPORTANCE OF PARAPHRASES IN NLP TASKS
Question Answering [Ibrahim et al., 2003], [Paşca, 2003], [Duboué & Chu-Carroll, 2006]
Information Extraction and Text Mining [Ibrahim et al., 2003], [Shinyama et al., 2002], [Shinyama & Sekine, 2003], [Sekine, 2005], [Paşca, 2005], [Paşca & Dienes, 2005]
Summarization [McKeown et al., 2002], [Barzilay, 2001, 2003], [Hirao et al., 2004], [Zhou et al., 2006b]
Natural Language Generation [Iordanskaja et al., 1991]
Plagiarism Detection [Potthast et al., 2010], [Vila et al., 2010]
Machine Translation [Zhou et al., 2006], [Callison-Burch et al., 2006a, 2006b, 2007, 2008], [Barreiro, 2008, 2009, 2011]
THE PRACTICAL NEED FOR PARAPHRASES
IN PEDAGOGICAL CONTEXTS
Text Processing and Authoring Aids
Writing and revision of original/creative/customized texts
Learning Tools
Native and second language learning
Creation of clear and understandable text content
e.g. students learning language and writing skills
Style Editors
Uniformity/consistency of style
THE PRACTICAL NEED FOR PARAPHRASES
IN PROFESSIONAL CONTEXTS
Technical Writing
Professional high quality documentation and domain-specific texts
Controlled language
Linguistic Quality Assurance
Linguistic quality of generic texts and specialized documentation
Verification/validation of meaningful content
Text Optimization
Readable / publishable texts (business-oriented or purpose-oriented content)
Terminology
Search for the “exact” term or relevant keywords
Translation
Indispensable for human and machine translation (pre-editing and post-editing)
SPIDER PARAPHRASING SYSTEM
FIRST STEPS
Initially developed for Portuguese
• 1st version – ReEscreve: publicly available service at http://www.linguateca.pt/ReEscreve/
• 2nd version – eSPERTo (Portuguese for “the smart/clever one; expert”): currently being integrated into a cyber school project within the scope of an educational program
    • Writing exercises – students learning how to improve their writing skills in Portuguese
English SPIDER: prototype to assist the writing of domain-specific texts
SPIDER: IMPORTANT FEATURES
Applies linguistic knowledge to recognize and generate paraphrases automatically, preserving the semantics and grammaticality (inflectional features) of the source text in the suggestions provided, including transformations of multi-word units
Uses text-editing mechanisms that provide a variety of alternatives for each expression and let users choose among them (according to personal preferences, style, idiomaticity, etc.)
Allows users to suggest new expressions that can be immediately applied to their text, making the text editing process easier, more flexible, and upgradable
Designed to help with writing optimization, understandability, and translatability (improving the quality of the source text so that it has a positive impact on translation)
PARAPHRASES COVERED BY SPIDER
Synonyms in context (e.g., phrasal verbs into equivalent expressions): to clear up (weather) = (weather) to become better/brighter
Support verb constructions into single verbs and stylistic variants: to make a decision = to decide; to make an audit = to perform an audit
Aspectual constructions into single verbs: to launch an attack = to attack
Adverbials (compounds into single adverbs): in a constructive way = constructively
Relatives into participial adjectives: the president that was elected = the president elect
Relatives into possessives: the role that Europe plays/has = the role of Europe
Relatives into compound nouns (and vice versa): a container for the milk = a milk container; a bottle made of plastic = a plastic bottle
Agentive passives into actives: the man was released by the police officer = the police officer released the man
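SPIDER implements these transformations with NooJ dictionaries and grammars, which are not reproduced here. As a rough illustration only, the Python sketch below covers one family, support verb constructions, using tiny hand-written tables (`SVC_TO_VERB`, `SUPPORT_FORMS`, `VERB_FORMS`, all hypothetical). It carries the tense and person of the support verb over to the single verb, which is the kind of inflection preservation SPIDER's suggestions are meant to respect.

```python
# Hypothetical toy tables; SPIDER itself relies on NooJ dictionaries and grammars.
# (support verb lemma, predicate noun) -> equivalent single verb lemma
SVC_TO_VERB = {
    ("make", "decision"): "decide",
    ("launch", "attack"): "attack",
}

# Inflected support verb form -> (lemma, inflection tag)
SUPPORT_FORMS = {
    "make": ("make", "PRES"), "makes": ("make", "PRES3S"), "made": ("make", "PAST"),
    "launch": ("launch", "PRES"), "launches": ("launch", "PRES3S"),
    "launched": ("launch", "PAST"),
}

# Single verb lemma -> inflected forms, keyed by the same tags
VERB_FORMS = {
    "decide": {"PRES": "decide", "PRES3S": "decides", "PAST": "decided"},
    "attack": {"PRES": "attack", "PRES3S": "attacks", "PAST": "attacked"},
}

DETERMINERS = {"a", "an", "the"}


def paraphrase_svc(tokens: list[str]) -> list[str]:
    """Rewrite '<support verb> <determiner> <noun>' as the single verb, same inflection."""
    out: list[str] = []
    i = 0
    while i < len(tokens):
        if i + 2 < len(tokens) and tokens[i + 1] in DETERMINERS:
            analysis = SUPPORT_FORMS.get(tokens[i])
            if analysis:
                lemma, tag = analysis
                verb = SVC_TO_VERB.get((lemma, tokens[i + 2]))
                if verb:
                    out.append(VERB_FORMS[verb][tag])  # keep the original tense/person
                    i += 3  # consume verb, determiner, and noun
                    continue
        out.append(tokens[i])
        i += 1
    return out
```

For instance, `paraphrase_svc("the board made a decision yesterday".split())` gives `["the", "board", "decided", "yesterday"]`, while "she makes a cake" is left unchanged because ("make", "cake") is not listed. A real system would need morphological analysis instead of closed form lists, and would have to handle modifiers inside the noun phrase.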
INTERFACE
SUGGESTIONS FOR EXAMPLE SENTENCES
Suggestions for general-language linguistic phenomena:
Compound adverbs > single adverbs
Support verb constructions > single verbs
Relatives > participial adjectives
INTERFACE
SELECTION OF PARAPHRASING GRAMMARS FOR SPECIFIC LINGUISTIC PHENOMENA
Users can select among general and technical dictionaries (more than one selection allowed) and among grammars for specific linguistic transformations (one, several, or all grammars can be selected). The interface provides sample texts for testing.
Sample LEGAL text
Informative details about the linguistic resources selected
Identification of legal terms in the text
Suggestions for the term “breach of law”
Users can select one term from the list of suggestions or provide a new suggestion
INTERFACE
SELECTION OF A DOMAIN DICTIONARY
Text rewritten:
• In red, the expressions in the source text
• In green, the suggestions provided by SPIDER and selected by the user
The user can suggest new words or expressions (synonyms or paraphrases)
INTERFACE
SUGGESTIONS PROVIDED AND USER’S CAPABILITY TO ADD NEW REWRITING OPTIONS
Users can go back and change their choice as many times as necessary
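The editing loop described on these interface slides (flag an expression, offer alternatives, let the user pick one or type a new one, and allow the choice to be revisited) can be modeled roughly as below. This is a hypothetical data model, not SPIDER's code; the class names and the sample suggestions for "breach of law" are invented for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Flagged:
    """An expression recognized in the source text, with its rewriting options."""
    source: str
    suggestions: list[str]
    choice: str | None = None  # None: keep the source expression


@dataclass
class RevisionSession:
    """Hypothetical model of the suggest / choose / revise-again loop."""
    segments: list[str | Flagged] = field(default_factory=list)

    def _flagged(self, index: int) -> Flagged:
        item = self.segments[index]
        if not isinstance(item, Flagged):
            raise TypeError(f"segment {index} is plain text, not a flagged expression")
        return item

    def choose(self, index: int, option: str) -> None:
        """Apply an option; a new user-supplied option is stored with the suggestions."""
        item = self._flagged(index)
        if option not in item.suggestions:
            item.suggestions.append(option)
        item.choice = option  # may be changed again at any time

    def revert(self, index: int) -> None:
        """Go back to the original expression."""
        self._flagged(index).choice = None

    def text(self) -> str:
        """Render the current version of the text."""
        parts = []
        for seg in self.segments:
            if isinstance(seg, Flagged):
                parts.append(seg.source if seg.choice is None else seg.choice)
            else:
                parts.append(seg)
        return "".join(parts)
```

Keeping the original expression alongside the current choice is what makes unlimited back-and-forth cheap: nothing is overwritten until the text is rendered.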
LINGUISTIC RESOURCES
Eng4NooJ – linguistic knowledge system
• OpenLogos dictionary (http://logos-os.dfki.de/)
• converted into NooJ format and enhanced with new properties, including derivational, morpho-syntactic, and semantic relations
• Morphological system
• Contextual rules and grammars
• Domain specific dictionary (sample “legal terms”)
LINGUISTIC RESOURCES
NDRV04 = <B>ion/Npred+Nom
ADRV02 = <B>icable
AVDRV01 = <E>ly/ADV
AVDRV04 = <B>tically/ADV
impress,V+FLX=POLISH+SAL=PVPCpleasetype+PT=impressionar+DRV=NDRV01:BOOK+VSUP=make+VSUP=cause+NPREP=on
aesthetic,A+FLX=NATURAL+SAL=AVstate+PT=aesthetically+DRV=AVDRV03
skepticism,N+FLX=BOOK+SAL=ABcause+PT=cepticismo+DRV=NAVDRV02
Grammar to recognize adverbial compounds and transform them into equivalent single adverbs
Rules to transform morpho-syntactically and semantically related words of different parts of speech
General language dictionary entries
Morpho-syntactic and semantic relations
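The derivational paradigms above use NooJ's morphological notation. Assuming the usual reading of its operators, where `<E>` is the empty string and `<B>` deletes the last character (with an optional count, as in `<B2>`), the Python sketch below applies such a paradigm to a lemma. The example lemmas are chosen for illustration only; the slides do not say which Eng4NooJ entries use these paradigms, and the category tags after "/" are dropped.

```python
import re

# Suffix parts of three paradigms from the slide (category tags after "/" dropped).
PARADIGMS = {
    "NDRV04": "<B>ion",
    "ADRV02": "<B>icable",
    "AVDRV01": "<E>ly",
}

# Assumed semantics: <E> = empty string; <B> or <Bn> = delete 1 or n final characters.
OPERATOR = re.compile(r"<([BE])(\d*)>")


def apply_paradigm(lemma: str, rule: str) -> str:
    """Derive a word by running the rule's operators and suffix text on the lemma."""
    word = lemma
    pos = 0
    for m in OPERATOR.finditer(rule):
        word += rule[pos:m.start()]  # literal text preceding the operator
        if m.group(1) == "B":
            count = int(m.group(2) or 1)
            word = word[: max(0, len(word) - count)]
        pos = m.end()
    return word + rule[pos:]  # literal text after the last operator


for lemma, name in [("create", "NDRV04"), ("apply", "ADRV02"), ("constructive", "AVDRV01")]:
    print(lemma, "->", apply_paradigm(lemma, PARADIGMS[name]))
```

Under these assumptions the loop prints `create -> creation`, `apply -> applicable`, and `constructive -> constructively`. The last case is the derivation the compound-adverb grammar needs for "in a constructive way = constructively".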
LINGUISTIC RESOURCES
Contextual rules
Rules to improve precision in specific contexts: [bring(vt) N(charge; action) > present(vt) N(idem)]
Sample of terms classified as Information + Instructional/legal
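The contextual rule above replaces "bring" with "present" only when its object is "charge" or "action". A token-level Python sketch of that condition follows. The form table, the determiner list, and the plural forms are assumptions added for the example, and a real grammar would match on dictionary annotations rather than on raw strings.

```python
# Inflected forms of "bring" mapped to the same form of "present" (hypothetical table).
BRING_TO_PRESENT = {
    "bring": "present",
    "brings": "presents",
    "brought": "presented",
    "bringing": "presenting",
}
OBJECT_NOUNS = {"charge", "charges", "action", "actions"}
DETERMINERS = {"a", "an", "the"}


def apply_bring_rule(tokens: list[str]) -> list[str]:
    """bring(vt) N(charge; action) > present(vt) N(idem), ignoring determiners."""
    out = list(tokens)
    for i, token in enumerate(tokens):
        replacement = BRING_TO_PRESENT.get(token.lower())
        if replacement is None:
            continue
        j = i + 1
        while j < len(tokens) and tokens[j].lower() in DETERMINERS:
            j += 1  # skip determiners to reach the object noun
        if j < len(tokens) and tokens[j].lower() in OBJECT_NOUNS:
            out[i] = replacement  # context matched: rewrite the verb, keep the noun
    return out
```

The point of the rule is precision. "She brought the book" is left alone because the context does not license the substitution.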
EVALUATION RESULTS: PARAPHRASING PRECISION
| Support verb | SVC recognition precision | SVC recognition recall | SVC paraphrasing precision |
|---|---|---|---|
| Pôr (put) | 73/73 (100%) | 73/100 (73%) | 72/73 (98.6%) |
| Tomar (take) | 75/75 (100%) | 75/100 (75%) | 68/73 (93.1%) |
| Ter (have) | 65/65 (100%) | 65/100 (65%) | 59/65 (90.7%) |
| Dar (give) | 57/60 (95%) | 57/100 (57%) | 46/51 (90.1%) |
| Fazer (make/do) | 43/45 (95.5%) | 43/100 (43%) | 40/45 (88.8%) |
| Average | 62.6/63.6 (98.4%) | 62.6/100 (62.6%) | 57/61 (93.4%) |
Evaluation of recognition and paraphrasing of support verb constructions
Corpus: 500 sentences
100 sentences for each of 5 elementary support verbs
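The Average row can be recomputed from the per-verb counts by pooling them (summing correct answers and totals over the five verbs). The Python sketch below does this. It reproduces the recognition figures (313/318 ≈ 98.4% and 313/500 = 62.6%). For paraphrasing it gives 285/307 ≈ 92.8%, slightly below the 93.4% in the table, which divides the average of 57 by 61 rather than by the exact averaged denominator 61.4.

```python
# (correct, total) counts transcribed from the table above.
COUNTS = {
    "pôr": {"rec_precision": (73, 73), "rec_recall": (73, 100), "par_precision": (72, 73)},
    "tomar": {"rec_precision": (75, 75), "rec_recall": (75, 100), "par_precision": (68, 73)},
    "ter": {"rec_precision": (65, 65), "rec_recall": (65, 100), "par_precision": (59, 65)},
    "dar": {"rec_precision": (57, 60), "rec_recall": (57, 100), "par_precision": (46, 51)},
    "fazer": {"rec_precision": (43, 45), "rec_recall": (43, 100), "par_precision": (40, 45)},
}


def pooled_percentage(metric: str) -> float:
    """Micro-average: total correct over total attempted, across all five verbs."""
    correct = sum(row[metric][0] for row in COUNTS.values())
    total = sum(row[metric][1] for row in COUNTS.values())
    return 100 * correct / total


for metric in ("rec_precision", "rec_recall", "par_precision"):
    print(f"{metric}: {pooled_percentage(metric):.1f}%")
```

Pooling weights each verb by how many constructions it contributed, which is equivalent to the table's "average numerator over average denominator" form.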
EVALUATION RESULTS: IMPACT ON TRANSLATABILITY (MT)
Same corpus, 50 sentences selected randomly
(i) automated pre-processing of support verb constructions with SPIDER and conversion into equivalent single verbs
(ii) the pre-processed sentences (automatically generated paraphrases) and the original sentences were submitted to MT, and the translations of both were compared
• In 29 cases (58%), the better translation was that of the automatically generated paraphrase
• In 9 cases (18%), it was that of the original support verb construction
• In 12 cases (24%), both translations were equally good or equally bad
CONCLUSION
The experiment indicates that paraphrases such as those generated by SPIDER help improve the quality of MT output
• In this test, automated paraphrasing of support verb constructions with SPIDER led to a clear improvement in the quality of the MT results
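The two-step protocol of this experiment can be outlined in Python as follows. `paraphrase`, `translate`, and `judge` are hypothetical callables standing in for SPIDER, the MT system, and the comparison of the two outputs (in the experiment, a quality judgment rather than a function); only the tallying is concrete.

```python
from collections import Counter
from typing import Callable, Dict, Iterable


def compare_pre_editing(
    sentences: Iterable[str],
    paraphrase: Callable[[str], str],
    translate: Callable[[str], str],
    judge: Callable[[str, str], str],
) -> Dict[str, float]:
    """Translate each sentence with and without pre-editing and tally the verdicts.

    judge(original_mt, paraphrase_mt) is expected to return "paraphrase",
    "original", or "tie". Returns the percentage of sentences per verdict.
    """
    verdicts: Counter = Counter()
    for sentence in sentences:
        original_mt = translate(sentence)                # step (ii) on the original
        paraphrase_mt = translate(paraphrase(sentence))  # step (i), then step (ii)
        verdicts[judge(original_mt, paraphrase_mt)] += 1
    total = sum(verdicts.values())
    if total == 0:
        return {}
    return {label: 100 * n / total for label, n in verdicts.items()}
```

With the counts reported on the slide (29 paraphrase wins, 9 original wins, 12 ties out of 50 sentences), the function returns 58%, 18%, and 24%.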
FUTURE APPLICATIONS?
• Writing / authoring aid (word processing applications)
• Language composition tool - general and technical language (e.g., student texts or legal texts)
• Text production and style editor
• Terminology verification tool - professional use of terminology in technical domains (elimination of informal, idiomatic, slang use of language)
• Empirical testbed for linguistic quality assurance (source and target texts)
• Text editing (machine translation pre-editing and post-editing) and translation aid
• Controlled language tool
    • Consistent, direct, and simple language
    • Restricted grammar (avoid certain types of construction)
    • Avoid complex reasoning, figures of speech, metaphors, etc.
    • Elimination of wordiness
• “Revision memory” tool (≈ “translation memory”) - recycling of validated, reviewed sentences, structures, or phrases
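A revision memory of this kind could work like a translation memory: store sentences whose revision has been validated, and propose the stored revision when the same or a very similar sentence appears again. The Python sketch below is one hypothetical design. It does an exact lookup on a normalized key and falls back to `difflib.get_close_matches` for near matches; the 0.85 similarity cutoff is an arbitrary choice.

```python
import difflib
import re
from typing import Dict, Optional


def normalize(sentence: str) -> str:
    """Lookup key: lowercase, with whitespace collapsed."""
    return re.sub(r"\s+", " ", sentence.strip().lower())


class RevisionMemory:
    """Stores validated (original sentence -> revised sentence) pairs for reuse."""

    def __init__(self, cutoff: float = 0.85) -> None:
        self._store: Dict[str, str] = {}
        self.cutoff = cutoff  # minimum similarity ratio accepted for a fuzzy match

    def add(self, original: str, revised: str) -> None:
        self._store[normalize(original)] = revised

    def lookup(self, sentence: str) -> Optional[str]:
        key = normalize(sentence)
        if key in self._store:
            return self._store[key]  # exact match
        close = difflib.get_close_matches(key, list(self._store), n=1, cutoff=self.cutoff)
        return self._store[close[0]] if close else None  # best fuzzy match, if any
```

For example, after `add("The decision was made by the board.", "The board decided.")`, a lookup of "The decision was made by the Board" (no final period) still returns "The board decided.".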
FUTURE RESEARCH: FROM SPIDER TO MACHINE TRANSLATION
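Portuguese support verb constructions with “dar” (give) aligned with English single verbs (SVC / English verb), with English glosses:
a fazer um estágio para dar aulas de / tutor Religião (“doing an internship to teach Religion”)
a fazer um estágio para dar aulas de / lecture Religião
a fazer um estágio para dar aulas de / teach Religião
começa a dar exemplos / exemplify : (“starts giving examples:”)
sentia-se capaz de dar um murro em / punch quem quisesse detê-lo (“felt able to punch whoever tried to stop him”)
gostávamos de lhe dar uma palavrinha / speak . (“we would like to have a word with you”)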
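One way to read these examples is as a bilingual lexicon that maps Portuguese support verb constructions with “dar” to candidate English single verbs, which an MT system could consult. The Python sketch below stores the pairs shown above and finds them in a sentence. It is a hypothetical illustration: it only matches the infinitive “dar” as written, so inflected forms such as “deu” or “dá” would need morphological analysis.

```python
import re
from typing import List, Tuple

# Pairs taken from the examples above: Portuguese SVC -> candidate English verbs.
DAR_SVC_TO_EN = {
    "dar aulas de": ["tutor", "lecture", "teach"],
    "dar exemplos": ["exemplify"],
    "dar um murro em": ["punch"],
    "dar uma palavrinha": ["speak"],
}


def find_svcs(sentence: str) -> List[Tuple[str, List[str]]]:
    """Return every listed SVC occurring in the sentence, with its English candidates."""
    lowered = sentence.lower()
    hits = []
    for svc, verbs in DAR_SVC_TO_EN.items():
        # \b prevents matches inside longer words (e.g. "mudar aulas de")
        if re.search(r"\b" + re.escape(svc) + r"\b", lowered):
            hits.append((svc, verbs))
    return hits
```

Keeping several candidates per construction mirrors the first example, where “dar aulas de” is aligned with tutor, lecture, and teach; choosing among them is left to the user or to the MT system.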