34016665 translation-and-paraphrasing

Date post: 28-Nov-2014
Upload: khanhhoa-tran
Transcript
Page 1: 34016665 translation-and-paraphrasing

Translated and Paraphrased Plagiarism

The cat and mouse game continues...

One man’s rigor is another man’s mortis.

- CF Bohren and DR Huffman, 1983

Page 2

Overview

The ever-changing counter-detection landscape

Paraphrasing versus textual entailment

Ways to paraphrase

Tools of the trade

Page 3

Many Roads to Plagiarism

The ‘old-fashioned’ way

Page 4

Many Roads to Plagiarism

Translated plagiarism.

Page 5

Many Roads to Plagiarism

Translated plagiarism.

Page 6

Many Roads to Plagiarism

Paraphrased plagiarism

Paraphrased plagiarism is not new either. However, there are new tools to aid in automatically paraphrasing text which are accelerating this form of detection avoidance.

Back-translation: the latest form of plagiarism. Michael Jones, University of Wollongong, Australia

4th Asia Pacific Conference on Educational Integrity (4APCEI) 28–30 September 2009

French translation of the passage above (literal English gloss): “Paraphrase plagiarism is not new either. However, there exist new tools for aid in the text paraphrase automatically which are the acceleration of this form of evasion of detection.”

Back-translated into English: Paraphrase plagiarism is not new either. However, there are new tools to help in paraphrasing the text automatically, which are accelerating this form of escape detection.

Page 7

Many Roads to Plagiarism

Paraphrased plagiarism

Page 8

Many Roads to Plagiarism

Paraphrased plagiarism

Page 9

Paraphrasing vs Textual Entailment

Two sentences are paraphrases of each other if they “mean the same thing”:

1) Similarity: they share a substantial amount of information

2) Dissimilarities are extraneous: if either sentence contains extra information, removing it does not significantly change the meaning.

Page 10

Paraphrasing vs Textual Entailment

A paraphrase is a special case of textual entailment. Paraphrase is symmetric (each sentence entails the other), whereas textual entailment is directional: the two sentences overlap to a degree, with one sentence subsumed by the other.
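A deliberately crude Python sketch of this distinction, assuming a toy definition of entailment (every content word of the hypothesis also appears in the premise) and a made-up stopword list. Real entailment requires semantics rather than word sets; the sketch only shows the structural point that a paraphrase is entailment in both directions.

```python
import re

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"a", "an", "the", "of", "is", "was", "to", "in", "on", "and"}


def content_words(sentence: str) -> set[str]:
    """Lowercase, extract word tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return {t for t in tokens if t not in STOPWORDS}


def entails(premise: str, hypothesis: str) -> bool:
    """Toy entailment: the hypothesis adds no content words beyond the premise."""
    return content_words(hypothesis) <= content_words(premise)


def is_paraphrase(a: str, b: str) -> bool:
    """A paraphrase is entailment in both directions."""
    return entails(a, b) and entails(b, a)


print(entails("They met on Tuesday in Boston.", "They met on Tuesday."))  # True
print(entails("They met on Tuesday.", "They met on Tuesday in Boston."))  # False
print(is_paraphrase("Tuesday they met.", "They met Tuesday."))            # True
```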

Page 11

Ways to Paraphrase

Lexical substitution/synonymy

Hyponym/synonym/hypernym replacement: article vs paper; red vs crimson

Abbreviation replacement: Mr. vs mister

Contractions: do not vs don’t

Compounding/decompounding: ballgame vs ball game

Numerals vs spelled-out numbers: 11 vs eleven; 12/1/2010 vs December first, two thousand ten
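One way a detector might neutralize these substitutions is to map each variant onto a canonical form before comparing texts. The Python sketch below assumes small, hypothetical lookup tables (`CONTRACTIONS`, `ABBREVIATIONS`, `NUMBER_WORDS`, `SYNONYMS`, `COMPOUNDS`); a real system would draw on much larger resources, and collapsing a hyponym onto its hypernym (crimson to red) can also merge words that are not interchangeable.

```python
import re

# Hypothetical, hand-written tables for illustration only.
CONTRACTIONS = {"don't": "do not", "didn't": "did not", "can't": "cannot"}
ABBREVIATIONS = {"mr.": "mister", "dr.": "doctor"}
NUMBER_WORDS = {"eleven": "11", "twelve": "12"}
SYNONYMS = {"crimson": "red", "paper": "article"}  # also collapses hyponyms
COMPOUNDS = {"ball game": "ballgame"}


def normalize(text: str) -> list[str]:
    """Lowercase the text and map known lexical variants onto one canonical form."""
    text = text.lower().replace("\u2019", "'")  # curly apostrophe -> straight
    for phrase, canonical in COMPOUNDS.items():
        text = re.sub(rf"\b{re.escape(phrase)}\b", canonical, text)
    tokens: list[str] = []
    for raw in text.split():
        if raw in ABBREVIATIONS:  # check before stripping the abbreviation's period
            tokens.extend(ABBREVIATIONS[raw].split())
            continue
        word = raw.strip('.,;:!?"')
        if word in CONTRACTIONS:
            tokens.extend(CONTRACTIONS[word].split())
        else:
            word = NUMBER_WORDS.get(word, word)
            tokens.append(SYNONYMS.get(word, word))
    return tokens


a = "Mr. Smith didn't see eleven crimson cars at the ball game."
b = "Mister Smith did not see 11 red cars at the ballgame."
print(normalize(a) == normalize(b))  # True
```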

Page 12

Ways to Paraphrase

Active and passive exchange

The gangster killed 3 innocent people. vs Three innocent people were killed by the gangster.

Re-ordering of sentence components

Tuesday they met. vs They met Tuesday.

Zhou, Ming and Niu, Cheng. “Principled Approach to Paraphrasing.” U.S. Patent 11,934,010. 1 Nov 2007.
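Reordering and voice changes leave a sentence's vocabulary largely intact, which is why set-based similarity measures tolerate them. A minimal Python sketch using word-level Jaccard similarity (a common choice assumed here, not one named in the cited patent):

```python
import re


def token_set(sentence: str) -> set[str]:
    """Lowercased word tokens, ignoring order and punctuation."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def jaccard(a: str, b: str) -> float:
    """Size of the shared word set divided by the size of the combined word set."""
    sa, sb = token_set(a), token_set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


print(jaccard("Tuesday they met.", "They met Tuesday."))  # 1.0
print(jaccard("The gangster killed 3 innocent people.",
              "Three innocent people were killed by the gangster."))  # 5/9, about 0.556
```

Because order is discarded entirely, a scrambled, meaningless rearrangement of the same words would also score 1.0; order-sensitive measures such as LCS avoid that at the cost of some tolerance.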

Page 13

Ways to Paraphrase

Realization in different syntactic components

Palestinian leader Arafat vs Arafat, Palestinian leader

Prepositional phrase attachment

The Alabama plant vs A plant in Alabama

Page 14

Ways to Paraphrase

Change into different sentence types

Who drew this picture? vs Tell me who drew this picture.

Morphological derivation

He is a good teacher. vs He teaches well. vs He is good at teaching.

Page 15

Ways to Paraphrase

Light verb construction

The film impressed him. vs The film made an impression on him.

Comparatives vs. superlatives

He is smarter than everyone else. vs He is the smartest one.

Page 16

Ways to Paraphrase

Converse word substitution

John is Mary's husband. vs Mary is John's wife.

Verb nominalization

He wrote the book. vs He was the author of the book.

Page 17

Ways to Paraphrase

Substitution using words with overlapping meanings

Bob excels at mathematics. vs Bob studies mathematics well.

Inference

He died of cancer. vs Cancer killed him.

Page 18

Ways to Paraphrase

Different semantic role realization

He enjoyed the game. vs The game pleased him.

Subordinate clauses vs separate sentences linked by anaphoric pronouns

The tree healed its wounds by growing new bark. vs The tree healed its wounds. It grew new bark.

Page 19

Tools of the Trade

Microsoft Research Paraphrase Corpus

Used to evaluate paraphrase-detection algorithms

WordNet: English only :(

Synonyms, hypernyms, hyponyms, and antonyms

Algorithms: Finite State Transducers (FSTs) and/or iterative Longest Common Subsequence (LCS) on sets
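The slide does not spell out what "LCS on sets" means; one plausible reading, assumed here, is an LCS over word sequences in which two words count as equal when they share a synonym set. The Python sketch below uses a tiny hypothetical `SYNSETS` list in place of WordNet synsets, together with the textbook dynamic program.

```python
# Hypothetical synonym sets standing in for WordNet synsets.
SYNSETS = [
    {"film", "movie"},
    {"impressed", "moved"},
    {"excels", "shines"},
]


def same_meaning(a: str, b: str) -> bool:
    """Two words match if identical or if some synonym set contains both."""
    return a == b or any(a in s and b in s for s in SYNSETS)


def lcs_length(x: list[str], y: list[str]) -> int:
    """Length of the longest common subsequence, O(len(x) * len(y)) time."""
    dp = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i, xi in enumerate(x, start=1):
        for j, yj in enumerate(y, start=1):
            if same_meaning(xi, yj):
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]


x = "the film impressed him".split()
y = "the movie moved him deeply".split()
print(lcs_length(x, y))  # 4: the, film~movie, impressed~moved, him
```

With exact string matching only, the same pair would share just "the" and "him"; the synonym sets are what let the subsequence survive lexical substitution.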

Page 20

Tools of the Trade

Stemming or lemmatization

am, are, is → be

car, cars, car's, cars' → car
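A toy Python sketch of both ideas for the examples above: a lookup table stands in for lemmatization of irregular forms (am, are, is → be), and a crude suffix rule stands in for stemming. The `IRREGULAR` table and the suffix rules are assumptions for illustration; real stemmers (such as Porter's) and dictionary-based lemmatizers are far more thorough.

```python
import re

# Hypothetical irregular-form table (lemmatization by lookup).
IRREGULAR = {"am": "be", "are": "be", "is": "be", "was": "be", "were": "be"}


def normalize_word(token: str) -> str:
    """Map a word to a base form: table lookup first, then crude suffix stripping."""
    t = token.lower().replace("\u2019", "'")
    if t in IRREGULAR:
        return IRREGULAR[t]
    t = re.sub(r"'s$|'$", "", t)  # possessives: car's -> car, cars' -> cars
    # Crude plural rule; skips endings such as "ss", "us", "is" (glass, bonus, this).
    if len(t) > 3 and t.endswith("s") and not t.endswith(("ss", "us", "is")):
        t = t[:-1]
    return t


print([normalize_word(w) for w in ["am", "are", "is"]])               # ['be', 'be', 'be']
print([normalize_word(w) for w in ["car", "cars", "car's", "cars'"]])  # ['car', 'car', 'car', 'car']
```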

Page 21

Word Alignment Examples

According to the Microsoft Research Paraphrase Corpus:

This is a paraphrase.

Not a paraphrase (however, the first sentence is textually entailed by the second, and Turnitin would currently match this).

12/14 = 86%, 12/16 = 75%

18/19 = 95%, 18/26 = 69%
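The arithmetic behind these ratios, sketched in Python under two assumptions: each ratio is the number of aligned words divided by the length of one sentence, and the ratio pairs follow the labels in the order listed. The 0.7 threshold in `classify` is hypothetical; it happens to separate the two labelled examples and is not taken from the corpus or from Turnitin.

```python
def coverage(aligned: int, length: int) -> float:
    """Share of one sentence's words that align with the other sentence."""
    return aligned / length


def classify(aligned: int, len1: int, len2: int, threshold: float = 0.7) -> str:
    """Label a sentence pair from its two coverage ratios (hypothetical threshold)."""
    c1, c2 = coverage(aligned, len1), coverage(aligned, len2)
    if c1 >= threshold and c2 >= threshold:
        return "paraphrase"
    if c1 >= threshold:
        return "sentence 1 entailed by sentence 2"
    if c2 >= threshold:
        return "sentence 2 entailed by sentence 1"
    return "undecided"


for aligned, len1, len2 in [(12, 14, 16), (18, 19, 26), (14, 48, 34), (13, 24, 21)]:
    c1, c2 = coverage(aligned, len1), coverage(aligned, len2)
    print(f"{aligned}/{len1} = {c1:.0%}, {aligned}/{len2} = {c2:.0%}: "
          f"{classify(aligned, len1, len2)}")
# 12/14 = 86%, 12/16 = 75%: paraphrase
# 18/19 = 95%, 18/26 = 69%: sentence 1 entailed by sentence 2
# 14/48 = 29%, 14/34 = 41%: undecided
# 13/24 = 54%, 13/21 = 62%: undecided
```

Under this threshold both Slippery Slope pairs come out undecided, which is the question those slides raise: where to draw the line is a policy choice that the ratios alone do not settle.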

Page 22

Slippery Slope

When does textual entailment become arbitrary/noise?

14/48 = 29%, 14/34 = 41%

Page 23

Slippery Slope

When does textual entailment become arbitrary/noise?

13/24 = 54%, 13/21 = 62%

Page 24

Translated Plagiarism

Non-English markets, in particular, are concerned about students for whom English is a second language submitting work in their native language that has been translated from English sources.

Page 25

Translated Plagiarism

Initial approach:

Non-English documents are searched as they are now

Additional search: translate the document into English, search English documents, then display the English matches alongside their translations (or vice versa)
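The two-step approach can be sketched in Python as follows. Everything here is hypothetical: `translate` is a stand-in for whatever machine-translation service is used (a dictionary lookup in the example), the indexes are plain dicts, and matching is naive substring containment rather than a real plagiarism search.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Match:
    doc_id: str
    matched_text: str        # text found in the indexed document
    submitted_sentence: str  # original-language sentence, shown alongside
    via: str                 # "original" or "translated"


def check_submission(
    sentences: list[str],
    native_index: dict[str, str],
    english_index: dict[str, str],
    translate: Callable[[str], str],
) -> list[Match]:
    matches: list[Match] = []
    for sentence in sentences:
        # Step 1: search the non-English text as it is searched now.
        for doc_id, text in native_index.items():
            if sentence.lower() in text.lower():
                matches.append(Match(doc_id, sentence, sentence, "original"))
        # Step 2: translate into English and search English documents.
        english = translate(sentence)
        for doc_id, text in english_index.items():
            if english.lower() in text.lower():
                matches.append(Match(doc_id, english, sentence, "translated"))
    return matches


# Toy data: a one-entry "translator" and two tiny indexes.
TOY_DE_EN = {"Zeitgeist ist ein deutsches Wort.": "Zeitgeist is a German word."}
native_idx = {"de-001": "Ein ganz anderer Text."}
english_idx = {"en-042": "Zeitgeist is a German word. English borrowed it."}

results = check_submission(["Zeitgeist ist ein deutsches Wort."], native_idx,
                           english_idx, lambda s: TOY_DE_EN.get(s, s))
for m in results:
    print(m.doc_id, m.via, "|", m.submitted_sentence, "->", m.matched_text)
```

Exact matching after translation only succeeds when the machine translation reproduces the source's wording closely; different translators rarely do, so in practice the English-side search would need paraphrase-tolerant matching.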

Page 26

Translated Plagiarism

Our new strategic partner:

On-demand SaaS statistical machine translation

Page 27

Translated Plagiarism: Need for Paraphrasing?

Machines and humans translate text in many different ways.

Paraphrase detection allows us to match the variations.

Google Translate: The zeitgeist is thinking and feeling one age. The term describes the characteristics of a particular period, or an attempt to remind us it. The German word Zeitgeist is transferred through English as a loanword into numerous other languages been.

Bing Translator: Zeitgeist is thinking and feeling how an age. Is the nature of a particular era or trying to understand them. The German word Zeitgeist is taken from English as a loanword in many other languages.

http://de.wikipedia.org/wiki/Zeitgeist
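To see why exact matching struggles here, the Python sketch below compares the two machine translations above using word n-gram "shingles" and Jaccard resemblance, a common fingerprinting-style measure chosen for illustration (it is not described as Turnitin's method). The strings are copied verbatim from the slide; `shingles` and `resemblance` are hypothetical helper names.

```python
import re

GOOGLE = ("The zeitgeist is thinking and feeling one age. The term describes the "
          "characteristics of a particular period, or an attempt to remind us it. "
          "The German word Zeitgeist is transferred through English as a loanword "
          "into numerous other languages been.")
BING = ("Zeitgeist is thinking and feeling how an age. Is the nature of a particular "
        "era or trying to understand them. The German word Zeitgeist is taken from "
        "English as a loanword in many other languages.")


def shingles(text: str, n: int) -> set[tuple[str, ...]]:
    """All runs of n consecutive lowercased words."""
    words = re.findall(r"[a-z]+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def resemblance(a: str, b: str, n: int) -> float:
    """Jaccard resemblance of the two texts' n-word shingle sets."""
    sa, sb = shingles(a, n), shingles(b, n)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


for n in (1, 3, 5):
    print(n, round(resemblance(GOOGLE, BING, n), 2))
```

The single-word score comes out much higher than the five-word score: the two renderings of the same German passage share vocabulary but few long word runs, so a detector keyed to long exact runs would miss most of this pair.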

Page 28

Possible Turnitin UI

Page 29

Finis

Thank you for listening!

Questions?

