Using the Corpógrafo

Date post: 01-Feb-2016

USP workshop

Using the Corpógrafo

Belinda Maia & Luís Sarmento

PoloFLUP

LINGUATECA

First steps

• Get a username and password

• You will receive one automatically

Working with the Corpógrafo

• Corpógrafo is a suite of integrated tools for INDIVIDUAL or GROUP research

• All research done ONLINE

• Each username/password = separate space on our server

• At present > anyone can work with it using 10 MB space for FREE

• BUT - you get an empty space + tools + tutorial!

Help Files

• Introdução à utilização do Corpógrafo - um pequeno tutorial: a tutorial (to be translated into English) describing the whole process of terminology research using the Corpógrafo. Available in PDF.

• Corpógrafo Roadmap: in English and Portuguese, a panoramic view of the Corpógrafo and how it works. Available in PDF.

• The Corpógrafo in Easy Stages: in English and Portuguese, a user’s guide to the Corpógrafo and FAQ. Available in PDF.

• Also note: the entry page has a glossary of terms and instructions (PT > EN)

File Manager

Area where each individual or group can:
– upload texts to space on server
– convert various text formats to .txt
– ‘clean’ them of unnecessary material
– check tokenization and sentence divisions
– register full information on source, domain and text type
– group (and re-group) texts into corpora
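
The cleaning, tokenization and sentence-division checks listed above can be illustrated with a minimal Python sketch. This is not the Corpógrafo's own code: the function names `clean`, `split_sentences` and `tokenize` and the simple rules they apply are assumptions made for illustration only.

```python
import re


def clean(text: str) -> str:
    """Collapse runs of whitespace left over from format conversion."""
    return re.sub(r"\s+", " ", text).strip()


def split_sentences(text: str) -> list[str]:
    """Naive rule: a sentence ends at '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence: str) -> list[str]:
    """Return words (accented letters and internal hyphens kept) and
    single punctuation marks as separate tokens."""
    return re.findall(r"\w+(?:-\w+)*|[^\w\s]", sentence)


raw = "O corpus  é\n pequeno. It has two sentences!"
for sentence in split_sentences(clean(raw)):
    print(tokenize(sentence))
```

A rule this simple will split wrongly after abbreviations such as "Dr.", which is one reason a manual check of sentence divisions is useful.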

File Manager

1. Files
• > List Files on Server
• > Add Files
• > Add Files from URL (Experimental!)

2. Corpora
• > List Corpora
• > Compile New Corpus

General corpus analysis

Corpora analysis area:
• Concordancing tools for regular expressions
– at sentence level
– KWIC concordancing
– Collocations

• N-gram tool
– Case-sensitive
– Alphabetical or frequency ordering
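
As an illustration of regular-expression concordancing at sentence level, here is a short Python sketch of a KWIC (keyword in context) search. It is a generic example, not the Corpógrafo's implementation; the function names `kwic` and `format_kwic` and the `width` parameter are assumptions.

```python
import re


def kwic(sentences, pattern, width=30, ignore_case=False):
    """Return (left context, keyword, right context) for every match of
    the regular expression `pattern`, searching one sentence at a time."""
    rx = re.compile(pattern, re.IGNORECASE if ignore_case else 0)
    hits = []
    for sent in sentences:
        for m in rx.finditer(sent):
            left = sent[max(0, m.start() - width):m.start()]
            right = sent[m.end():m.end() + width]
            hits.append((left, m.group(0), right))
    return hits


def format_kwic(hits, width=30):
    """Right-align the left context so the keywords line up in a column."""
    return [f"{left:>{width}} [{kw}] {right}" for left, kw, right in hits]


sentences = ["The corpus is small.", "Two corpora were compiled."]
for line in format_kwic(kwic(sentences, r"corp(us|ora)", width=10), width=10):
    print(line)
```

Because each sentence is searched separately, a match never spans a sentence boundary, which is what "at sentence level" is taken to mean here.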

Corpora + TDB

• Choose corpus

• Choose related TDB (terminology database)

= All terms, examples, definitions extracted from corpus (semi) automatically transferred to TDB

= All metadata on texts in corpus can be automatically transferred to TDB

Term extraction

• N-grams
– Unfiltered
– Filtered with restrictions on term in PT, EN, FR, IT, ES, DE
– Filtered with restrictions on term and context in PT, EN, FR, IT, ES, DE
– Singular + plural terms can be combined
– Existing terms in TDB need not appear
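
The n-gram options above can be sketched in Python as follows. The slides do not say what the restrictions are, so this example assumes one common choice: a candidate may not begin or end with a stopword. The tiny `STOPWORDS` table, the function names and the "strip a final s" plural rule are all hypothetical and only meant to show the idea.

```python
from collections import Counter

# Hypothetical, very small stopword list; a real one is per language and much longer.
STOPWORDS = {"en": {"the", "of", "a", "an", "in", "and", "is", "to"}}


def ngrams(tokens, n):
    """All contiguous n-token sequences, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_candidates(tokens, n, lang="en", known_terms=frozenset()):
    """Count n-grams, skipping those that start or end with a stopword
    and those already present in the term database."""
    stop = STOPWORDS.get(lang, set())
    counts = Counter()
    for gram in ngrams(tokens, n):
        if gram[0].lower() in stop or gram[-1].lower() in stop:
            continue
        term = " ".join(gram)
        if term not in known_terms:
            counts[term] += 1
    return counts


def merge_plurals(counts):
    """Naive English-only rule: add the count of 'x s' to 'x' when both occur."""
    merged = Counter()
    for term, c in counts.items():
        singular = term[:-1] if term.endswith("s") and term[:-1] in counts else term
        merged[singular] += c
    return merged


def ordered(counts, by="frequency"):
    """Alphabetical or frequency ordering of (term, count) pairs."""
    if by == "alphabetical":
        return sorted(counts.items())
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


tokens = "the term bank stores term banks and the term bank".split()
print(ordered(merge_plurals(count_candidates(tokens, 2))))
```

Counting is case-sensitive, as on the n-gram tool slide; only the stopword check lowercases the tokens.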

Term selection from n-grams

• Consultation of list of n-grams

• Check term status of each n-gram via underlying concordances

• Check sources

• Send to TDB

Search for definition candidates

• Already possible via TDB

• Under development

• Research area for Master's (Mestrado) dissertations and research grant holders (bolseiros)
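
The slides do not describe how definition candidates are found. One widely used approach, shown here purely as an assumption, is to look for sentences where the term is followed by a definitional pattern such as "is a" or "refers to". The pattern list and the function name `definition_candidates` are hypothetical.

```python
import re

# Hypothetical English patterns; {term} is replaced by the escaped term.
DEFINITION_PATTERNS = [
    r"\b{term}\s+(?:is|are)\s+(?:a|an|the)\b",
    r"\b{term}\s+(?:is|are)\s+defined\s+as\b",
    r"\b{term}\s+refers?\s+to\b",
]


def definition_candidates(sentences, term):
    """Return the sentences in which `term` is followed by one of the
    definitional patterns (case-insensitive)."""
    escaped = re.escape(term)
    regexes = [
        re.compile(p.format(term=escaped), re.IGNORECASE)
        for p in DEFINITION_PATTERNS
    ]
    return [s for s in sentences if any(rx.search(s) for rx in regexes)]


sentences = [
    "A corpus is a collection of texts.",
    "We built the corpus in 2004.",
    "Concordance refers to a list of occurrences.",
]
print(definition_candidates(sentences, "corpus"))
```

The results are only candidates: each sentence still has to be judged by the researcher before it is stored as a definition.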

TDB - Terminology database

Databases are designed to be multilingual
– Terms listed alphabetically + language tag
– General data
– Morphological data
– Source metadata: Authors, texts etc
– Definitions + search for candidates
– Translation equivalents
– Semantic relations
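
To make the kinds of data listed above concrete, here is one possible shape for a single term record, written as Python dataclasses. The field names and types are assumptions for illustration; the actual TDB schema is not given in the slides.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Source:
    """Source metadata for a term (hypothetical fields)."""
    author: str
    text_title: str


@dataclass
class TermEntry:
    term: str
    language: str                       # language tag, e.g. "PT" or "EN"
    domain: str = ""                    # general data
    part_of_speech: str = ""            # morphological data
    sources: list[Source] = field(default_factory=list)
    definitions: list[str] = field(default_factory=list)
    # translation equivalents, keyed by language tag
    translations: dict[str, list[str]] = field(default_factory=dict)
    # semantic relations as (relation name, related term)
    relations: list[tuple[str, str]] = field(default_factory=list)


def listing_order(entry: TermEntry) -> tuple[str, str]:
    """Sort key: alphabetical by term, then by language tag."""
    return (entry.term.casefold(), entry.language)


entries = [
    TermEntry("corpus", "PT", translations={"EN": ["corpus"]}),
    TermEntry("concordância", "PT", translations={"EN": ["concordance"]}),
    TermEntry("corpus", "EN", relations=[("hyponym", "parallel corpus")]),
]
for e in sorted(entries, key=listing_order):
    print(e.term, e.language)
```

Keeping the language tag on every entry is what lets terms from several languages share one alphabetical list.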

Future developments – general policy

• General testing and improvement

• Development of new ideas or functions – using isomorphic relationships between researchers’ needs and our possibilities

• Coordination of individual corpus projects into bigger projects, when possible or necessary

