Building a parallel corpus for translation research and much more, by Ana Frankenberg-Garcia

Date post: 21-Apr-2015
Page 1

Building a parallel corpus for translation research and much more

Ana Frankenberg-Garcia

Page 2

The study of human translation

Traditionally not a hard science

Difficult to be systematic

With the advances of corpus linguistics, things can change …

Page 3

What is a corpus?

large

specific criteria

text-retrieval software

machine-readable

naturally occurring texts

Page 4

Advantages of using corpora to study human translation

An enormous amount of translated texts

Systematic analyses

Quantifiable results

Page 5

Corpora used in translation practice and research

1. Bilingual comparable corpora, e.g. Farmhouse holidays (EN) & Agroturismo (IT)

2. Monolingual comparable corpora, e.g. Translational English Corpus (EN)

3. Simple parallel corpora, e.g. Tectra (EN-GL)

4. Bidirectional parallel corpora, e.g. COMPARA (PT-EN and EN-PT)

Page 6

Building parallel corpora: text selection

• Genre (scientific, imaginative, technical, etc.)

• Mode (oral? written?)

• Variety (standard? regional?)

• Time (contemporary? older?)

• Languages (which? just two or more?)

• Translations (professional? native speakers? different translators?)

• Simple or bidirectional?

Are there translations?

Page 7

Building parallel corpora: example of interrelated factors

Languages: PT-EN

Genre: oral, popular, scientific, academic, tourism, literature, politics (EP)

PT-EN or EN-PT; PT-EN ↔ EN-PT

Page 8

Building parallel corpora

Personal use: copyright: no hassle

Shared use: copyright permissions, results verifiable, more users and uses

Page 9

Building parallel corpora: copyright

• Two permissions, double the work

• Publishers, authors and translators generally don’t know what a corpus is

• Protect

• Advertise

Page 10

Building parallel corpora: alignment

Text?

Paragraph?

Sentence?

Clause?

Word?

Which parts of ST and TT match?

Page 11

Building parallel corpora: tags

What do we want tags for? More pre-processing, less post-processing

Optional tags, e.g. textual, grammatical, semantic

Alignment tags

<id=EBJT1 1845>Joe watched Robin climb into the trailer and man-handle the calves one by one towards the ramp, their winglike ears pierced with plastic identity tags.

<id=EBJT1 1845>Joe ficou a ver Robin subir para o atrelado e encaminhar as vitelas uma a uma para a rampa, com as suas orelhas, que faziam lembrar asas, furadas e umas etiquetas de plástico a identificá-las.
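
A minimal sketch of how a shared alignment tag lets a script pair a source-text unit with its translation. It assumes the line format of the two example lines above (an `<id=TEXT UNIT>` tag followed by the sentence). The names `parse_unit` and `pair_units` are illustrative only and are not part of COMPARA.

```python
import re

# Assumed format, modelled on the example lines: "<id=TEXTCODE UNITNUMBER>sentence"
ID_TAG = re.compile(r"^<id=(\S+)\s+(\d+)>(.*)$")


def parse_unit(line):
    """Split one tagged line into ((text_code, unit_number), sentence)."""
    match = ID_TAG.match(line.strip())
    if match is None:
        raise ValueError(f"no alignment tag found in: {line!r}")
    text_code, unit_number, sentence = match.groups()
    return (text_code, int(unit_number)), sentence.strip()


def pair_units(source_lines, target_lines):
    """Pair each source line with the target line carrying the same id.

    A source unit with no matching target is paired with None.
    """
    targets = dict(parse_unit(line) for line in target_lines)
    pairs = []
    for line in source_lines:
        key, source_sentence = parse_unit(line)
        pairs.append((key, source_sentence, targets.get(key)))
    return pairs


source = ["<id=EBJT1 1845>Joe watched Robin climb into the trailer and man-handle the calves one by one towards the ramp, their winglike ears pierced with plastic identity tags."]
target = ["<id=EBJT1 1845>Joe ficou a ver Robin subir para o atrelado e encaminhar as vitelas uma a uma para a rampa, com as suas orelhas, que faziam lembrar asas, furadas e umas etiquetas de plástico a identificá-las."]

for key, st, tt in pair_units(source, target):
    print(key, st, tt, sep="\n")
```

Keying both sides on the same identifier means the pairing does not depend on line order, which is the kind of post-processing that tagging in advance is meant to save.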

Page 12

Our options for COMPARA

A bidirectional parallel corpus of English and Portuguese

Funding: Portuguese Government and European Union (FEDER and FSE), contract ref. POSC/339/1.3/C/NAC

Project leaders: Ana Frankenberg-Garcia & Diana Santos

Research assistants: Pedro Sousa, Rosário Silva & Susana Inácio

Page 13

Corpus structure

PT source texts ↔ EN translations (parallel)

EN source texts ↔ PT translations (parallel)

Both directions together: bi-directional
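
One way to picture the bidirectional structure is as a list of text records in which every translation points back to its source text. The sketch below is only an illustration of that idea. The `Text` class, its field names and the text codes are hypothetical and do not describe how COMPARA is stored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Text:
    code: str                         # hypothetical identifier
    language: str                     # "PT" or "EN"
    translates: Optional[str] = None  # code of the source text; None for a source text


def source_texts(texts, language):
    """Untranslated (original) texts in one language."""
    return [t for t in texts if t.language == language and t.translates is None]


def translations(texts, language):
    """Translated texts in one language."""
    return [t for t in texts if t.language == language and t.translates is not None]


def parallel_pairs(texts, source_language):
    """(source text, translation) pairs for one translation direction."""
    by_code = {t.code: t for t in texts}
    return [
        (by_code[t.translates], t)
        for t in texts
        if t.translates is not None
        and by_code[t.translates].language == source_language
    ]


# Invented placeholder records, one pair per direction.
corpus = [
    Text("PT-SRC-1", "PT"),
    Text("EN-TT-1", "EN", translates="PT-SRC-1"),
    Text("EN-SRC-1", "EN"),
    Text("PT-TT-1", "PT", translates="EN-SRC-1"),
]

pt_to_en = parallel_pairs(corpus, "PT")
en_to_pt = parallel_pairs(corpus, "EN")
```

With both directions present, `source_texts(corpus, "PT")` and `translations(corpus, "PT")` give untranslated and translated Portuguese side by side, a comparison that a one-direction parallel corpus cannot offer.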

Page 14

[Diagram with the labels ST, TT, PT, EN, PT1, PT2, EN1 and EN2]

Page 15

Language varieties

PORTUGUESE: Portugal, Brazil, Angola, Mozambique

ENGLISH: UK, US, South Africa

Unbalanced distribution!

Page 16

Publication dates

1837

2002

1880

1997

1988

1914

Page 17

Genre

Published fiction other genres

EXTENSIBLE

Page 18

Portuguese authors

Portugal: Camilo Castelo Branco, Eça de Queirós, José Cardoso Pires, José Saramago, Jorge de Sena, Lídia Jorge, Mário de Carvalho, Sá Carneiro

Brazil: Aluísio Azevedo, Autran Dourado, Chico Buarque, Jô Soares, José de Alencar, Machado de Assis, Manuel Antônio de Almeida, Marcos Rey, Patrícia Melo, Paulo Coelho, Rubem Fonseca

Mozambique: Mia Couto

Angola: José Eduardo Agualusa

Page 19

English authors

British Isles: David Lodge, Ian McEwan, Julian Barnes, Joseph Conrad, Joanna Trollope, Kazuo Ishiguro, Lewis Carroll, Mary Shelley, Oscar Wilde

United States: Henry James, Edgar Allan Poe, Richard Zimler

South Africa: Nadine Gordimer

Page 20

Portuguese translators

Ana Maria Amador, Ana Falcão Bastos, Ana Luísa Faria, Aníbal Fernandes, Carlos Grifo Babo, Cristina Ferreira de Almeida, Cristina Rodriguez, Eduardo Guerra Carneiro, Fernanda Pinto Rodrigues, Geraldo Galvão Ferraz, Helena Cardoso, Januário Leite, José Viera Lima, J. Teixeira de Aguilar, Lídia Cavalcante-Luther, Lucinda Santos Silva, Luís Lobo, Manuel João Gomes, M. F. Gonçalves de Azevedo, Maria Carlota Pracana, Maria do Carmo Figueira, Mário Martins de Carvalho, Nina Videira, Paula Reis, Yolanda Artiaga.

Page 21

English translators

Adria Frizzi, Alan Clarke, Alexis Levitin, Alice Clemente, Cliff Landers, David Brookshaw, David Rosenthal, Elizabeth Lowe, Ellen Watson, Helen Caldwell, Giovanni Pontiero, Graeme Mac Nicoll, Gregory Rabassa, Isabel Burton, John Gledson, John Parker, John Byrne, John Vetch, Margaret Jull Costa, Mary Fitton, Natália Costa, Peter Bush, Richard Zenith, Ronald W. Sousa.

Page 22

Can any text be included in the corpus?

Only published source texts and translations

Only English translated directly from Portuguese

Portuguese translated directly from English

Only human translations!

Page 23

Texts

72 source texts (extracts)

75 translations

Page 24

Size

1,549,551 words in English

1,436,493 words in Portuguese

Possibly the largest existing edited parallel corpus

Page 25

Interface

Free

Easy to use by people who have never heard of corpora before

Powerful and flexible tool for experienced corpus users

Results good for research and education

www.linguateca.pt/COMPARA/

Page 27

“nodded”

Page 30

[Chart: Distribution of “nodded” in source texts and translations; series ST and TT, axis from 0 to 14, labelled “100 K words”]
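
The “100 K words” axis label suggests that the chart reports frequencies normalised per 100,000 words, which is what makes subcorpora of different sizes comparable. A minimal sketch of that calculation follows. The counts in the usage lines are invented placeholders, not figures from COMPARA, and `per_100k` is a hypothetical helper name.

```python
def per_100k(hits, corpus_words):
    """Return the number of occurrences per 100,000 words of running text."""
    if corpus_words <= 0:
        raise ValueError("corpus_words must be positive")
    return hits * 100_000 / corpus_words


# Invented placeholder counts, for illustration only.
rate_in_source_texts = per_100k(hits=50, corpus_words=500_000)
rate_in_translations = per_100k(hits=30, corpus_words=600_000)
print(rate_in_source_texts, rate_in_translations)
```

Comparing the two rates rather than the raw hit counts avoids mistaking a larger subcorpus for a more frequent word.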

Page 31

Users and uses

Language learners and anyone working with PT-EN: bilingual dictionary with examples

Language teachers: exercises and tests

Translators: language equivalents

Translation lecturers: exercises & problems

Translation theorists: test translation hypotheses

Lexicographers: bilingual dictionaries

Computational linguists and language engineers: machine translation and other applications

Page 32

Backstage options

Page 33

Text tags

EBJB1.pt ele revelou-me o seu interesse por Gosse <tnote> Edmund William Gosse (1849-1928), crítico inglês </tnote> e pela sociedade literária inglesa dos finais do século passado.

EBDL2T1.en When we sat on the sofa together to watch <title>News at Ten</title>

Page 34

Text tags

EBDL1T1.pt passou-me uma receita de <named> Valium </named>

EBJB1.en the white bear, <foreign> thalassarctos maritimus </foreign>, is the aristocrat of bears...

EBDL1T1.pt acaba por se esquecer de ter medo, até que acaba por verificar que não há <emph> de que </emph> ter medo.
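
A small sketch of how text tags of this kind could be read back out of a line, assuming simple, non-nested `<tag> ... </tag>` pairs as in the examples on these two slides. The tag list is taken from those examples only, and `extract_tagged` and `untag` are hypothetical helper names.

```python
import re

# Tag names seen in the slide examples; a real tag set may be larger.
TEXT_TAGS = ("tnote", "title", "named", "foreign", "emph")

# Matches <tag>content</tag> where the closing tag repeats the opening name.
TAG_PATTERN = re.compile(r"<(%s)>(.*?)</\1>" % "|".join(TEXT_TAGS), re.DOTALL)


def extract_tagged(text):
    """Return (tag name, tagged content) for every tagged span in the text."""
    return [(name, content.strip()) for name, content in TAG_PATTERN.findall(text)]


def untag(text):
    """Return the text with the tags removed and their content kept."""
    return TAG_PATTERN.sub(lambda match: match.group(2).strip(), text)


line = "passou-me uma receita de <named> Valium </named>"
print(extract_tagged(line))
print(untag(line))
```

Whether a translator's note marked with `<tnote>` should be kept or dropped when untagging depends on the research question; this sketch keeps it.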

Page 37

Alignment options and tags

1 alignment unit = 1 source-text sentence

[Diagram: ST and TT columns pairing source-text sentences (S) with target-text units; labels S, S2, S(+S) and Ø]

Page 40

Grammar tags

Portuguese: PALAVRAS

Petrus/PROP pediu/V_fmc a/DETartd especialidade/N da/PRP+DETartd casa/N --/PU uma/DETarti paella/N valenciana/ADJ --/PU que/SPECrel comemos/V em/PRP silêncio/N ,/PU acompanhados/V apenas/ADV do/PRP+DETartd saboroso/ADJ vinho/N Rioja/PROP ./PU

Page 41

[pos="V.*"] "silêncio"
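
The query on this slide constrains two consecutive tokens: one whose part-of-speech tag begins with V, immediately followed by the word silêncio. The sketch below is a toy matcher over the word/TAG format of the PALAVRAS example, not the COMPARA search engine. The names `parse_tagged` and `find_sequences` are hypothetical, and requiring each regular expression to match the whole attribute value is an assumption.

```python
import re


def parse_tagged(sentence):
    """Turn 'word/TAG word/TAG ...' into a list of {'word', 'pos'} dicts."""
    tokens = []
    for item in sentence.split():
        # Split on the last slash so that tags such as PRP+DETartd stay whole.
        word, _, pos = item.rpartition("/")
        tokens.append({"word": word, "pos": pos})
    return tokens


def find_sequences(tokens, query):
    """Return the words of every run of tokens that satisfies the query.

    Each query position is a dict of attribute -> regular expression that
    must match the whole attribute value. An empty dict matches any token.
    """
    if not query:
        return []
    hits = []
    for start in range(len(tokens) - len(query) + 1):
        window = tokens[start:start + len(query)]
        if all(
            re.fullmatch(pattern, token[attr])
            for token, constraint in zip(window, query)
            for attr, pattern in constraint.items()
        ):
            hits.append([token["word"] for token in window])
    return hits


sentence = (
    "Petrus/PROP pediu/V_fmc a/DETartd especialidade/N da/PRP+DETartd casa/N "
    "--/PU uma/DETarti paella/N valenciana/ADJ --/PU que/SPECrel comemos/V "
    "em/PRP silêncio/N ,/PU acompanhados/V apenas/ADV do/PRP+DETartd "
    "saboroso/ADJ vinho/N Rioja/PROP ./PU"
)
tokens = parse_tagged(sentence)

# Verb immediately followed by "silêncio": no hit here, because "em" intervenes.
adjacent = find_sequences(tokens, [{"pos": "V.*"}, {"word": "silêncio"}])

# Allowing one unconstrained token in between finds "comemos em silêncio".
with_gap = find_sequences(tokens, [{"pos": "V.*"}, {}, {"word": "silêncio"}])
```

Searching on tags rather than word forms is what lets one query cover every verb at once instead of listing each inflected form.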

Page 43

Grammar tags

English: CLAWS (coming soon)

Petrus/NP1 asked/VVD for/IF the/AT specialty/NN1 of/IO the/AT house/NN1 --a/AT1 Valencia/NP1 paella/NN1 --which/DDQ we/PPIS2 ate/VVD in/II silence/NN1 ./.

Page 44

Other tags

e.g. semantic tag for colour (Inácio et al. 2007)

I did, too -- changed over to the knitted tie at a <sem="cor"> red </sem> light.

People interested in creating specific tags for their research can do so, as long as they do the tag insertion and revision work.

Specific tag revision interface underway (Sousa, in preparation)

Page 46

Research work

1. Observing source texts and translations

2. Contrasting Portuguese and English

3. Comparing translated and untranslated language

4. Examining the characteristics of translated texts

Studies unthinkable before corpora

Many other studies possible!

www.linguateca.pt/COMPARA/ComparaPublications.html

