EUROCALL September 2006 — Universidad de Granada

Contextualising concordances for corpusCALL

Hans Paulussen & Piet Desmet
K.U.Leuven / KULAK

ALT Research Center on CALL

Overview

• Corpora for CALL: samples

• Types of sample rendition

• XML: new opportunities

Corpora for CALL /1

• Corpora for learning activities
  – before: preparation of exercises
  – during: corpus material as part of the learning activity
  – after: corpus material for feedback

Corpora for CALL /2

• Corpora as reference material
  – learner dictionaries
  – learner grammars

Corpora during learning activities

• corpus is part of learning activity
  – Mariana (Vordingborg Gymnasium, Denmark)
    • http://www.vordingbg-gym.dk/km/ict4lt/

• corpus supports learning activity
  – NEDERLEX (FUNDP Namur)
    • http://obelix.droit.fundp.ac.be/droit1/index.php

REBECA

• Ressources Electroniques Bilingues Extraites de Corpus Alignés (bilingual electronic resources extracted from aligned corpora)

• Parallel corpus: 5,000,000 Dutch, 5,000,000 French

• automatic corpus selection

• sentence alignment

Resource links

REBECA

aligned bilingual corpus

filtered KWIC files

course texts

lexicon

http://corpora.informatik.uni-leipzig.de/

Drinking glasses /1

• At the winery, Evxinograd's director, Ivan Penkov, is pouring out glasses of his 20-year-old brandy. (source: Wall Street Journal 1991)

• Drinking glasses are plastic and are weighted on the bottom so that they are tough to knock over. (source: Wall Street Journal 1991)

• Noting that Winston Churchill regularly drank several glasses of whiskey, brandy, champagne, and at least one high-ball during the working day, the Economist observed on March 4, "he could never have been trusted to run the Pentagon." (source: Wall Street Journal 1989)

• The two drain their glasses in one gulp. (source: Wall Street Journal 1991)

Drinking glasses /2

• Yet, consumerism lived, even if it didn't fill too many champagne glasses. (source: Wall Street Journal 1988)

• After Investcorp took over, Tiffany played down its $10 wine glasses to concentrate on the high-priced diamonds and gold jewelry that had made it famous. (source: Wall Street Journal 1990)

• At a black-tie benefit hosted by ARA Services Inc. a few weeks ago, Chairman Joseph Neubauer and members of his management team exuded confidence as they moved from one dinner table to the next, shaking hands, patting backs and clinking glasses. (source: Wall Street Journal 1988)

Spectacles

• Mr. Brown wears tinted aviator glasses, combat boots, and a Soldier of Fortune cap on his closely shaved head. (source: Wall Street Journal 1991)

• She must wear prism glasses to correct double vision caused by the accident. (source: Wall Street Journal 1991)

• In meetings, he often can be seen chewing on the end of his reading glasses; sometimes, he speaks so softly that he can't be heard. (source: Wall Street Journal 1990)

• His horn-rimmed glasses and rakish beret were irresistibly photogenic. (source: Wall Street Journal 1987)

Problematic glasses

• And its replaceable filters are good for only about 100 glasses. (source: Wall Street Journal 1991)

• I still have to find his glasses and keys for him. (source: Wall Street Journal 1988)

• The police confiscated her watch and glasses. (source: Wall Street Journal 1989)

• He plans to hand out 100 glasses when he performs in Washington, D.C., in December at the Kennedy Center's Mozart Festival. (source: Wall Street Journal 1991)

• He often hands out glasses to his audience and has them play chords. (source: Wall Street Journal 1991)

• The glasses were my idea. (source: Wall Street Journal 1988)

Rendering authentic text samples

• Linked samples

• Extracted samples

• Embedded samples

Linked samples

• The sample is linked to the original document (e.g. pdf document)
  – Original context & layout
  – Full context
  – Problem: sample skimming

Extracted samples

• The example is extracted from the original document (e.g. KWIC concordance)
  – Sample shown in immediate context
  – Layout: not authentic
  – Context: limited
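
A KWIC (keyword in context) extraction of this kind can be sketched in a few lines of Python. This is only an illustration: the function name `kwic`, the `width` parameter and the two-sentence sample text are assumptions made here, not the tooling used in the project.

```python
import re

def kwic(text, keyword, width=30):
    """Return one line per hit: left context, [keyword], right context."""
    lines = []
    # Match the keyword as a whole word, ignoring case.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so that all keywords share one column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}".rstrip())
    return lines

# Illustrative input: two of the sentences quoted in the slides.
sample = (
    "The two drain their glasses in one gulp. "
    "She must wear prism glasses to correct double vision."
)
for line in kwic(sample, "glasses"):
    print(line)
```

The fixed `width` is what makes the context "limited": everything beyond the window is cut off, and the original layout is lost.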

Embedded samples

• The example is embedded in the original document
  – Sample shown in full context
  – Layout: authentic
  – Problem: recreating and indexing the document
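
One way to picture an embedded sample is to keep the full text and mark each hit in place. The sketch below is an assumption-laden illustration: `embed_samples` and the `hit` class name are hypothetical, and a real document would also need its layout recreated.

```python
import html
import re

def embed_samples(document, keyword):
    """Return the whole text as an XHTML paragraph with every hit wrapped in a span."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    parts = []
    last = 0
    for match in pattern.finditer(document):
        # Escape the untouched text so it stays valid inside XHTML.
        parts.append(html.escape(document[last:match.start()]))
        parts.append('<span class="hit">' + html.escape(match.group(0)) + "</span>")
        last = match.end()
    parts.append(html.escape(document[last:]))
    return "<p>" + "".join(parts) + "</p>"

text = "I still have to find his glasses and keys for him."
print(embed_samples(text, "glasses"))
```

Unlike a KWIC line, nothing is cut away here: the reader sees the sample inside the complete text, and a stylesheet can decide how the `hit` spans are displayed.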

XML -> XHTML

• XML: extensible markup language

• Stylesheets:
  – CSS: Cascading Style Sheets
  – XSLT: Extensible Stylesheet Language Transformations

• XHTML
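
The XML to XHTML step would normally be written as an XSLT stylesheet. As a rough stand-in, the sketch below does the same mapping with Python's standard `xml.etree.ElementTree` module rather than XSLT. The function name `poem_to_xhtml`, the choice of `h1`, `div` and `p` as target elements, and the shortened poem are all assumptions for illustration.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Shortened sample using the poem markup from the slides.
POEM = """<poème>
  <préambule>
    <titre>Chanson d'automne</titre>
    <auteur>Paul Verlaine</auteur>
  </préambule>
  <corps>
    <stance>
      <ligne>Les sanglots longs</ligne>
      <ligne>Des violons</ligne>
      <ligne><r/>De l'automne</ligne>
    </stance>
  </corps>
</poème>"""

def poem_to_xhtml(xml_text):
    """Map the poem elements onto XHTML: titre -> h1, stance -> div, ligne -> p."""
    poem = ET.fromstring(xml_text)
    page = ET.Element("html", {"xmlns": XHTML_NS})
    head = ET.SubElement(page, "head")
    ET.SubElement(head, "title").text = poem.findtext("préambule/titre")
    body = ET.SubElement(page, "body")
    ET.SubElement(body, "h1").text = poem.findtext("préambule/titre")
    for stance in poem.iter("stance"):
        block = ET.SubElement(body, "div", {"class": "stance"})
        for ligne in stance.findall("ligne"):
            # itertext() also collects the text that follows an empty <r/> marker.
            ET.SubElement(block, "p").text = "".join(ligne.itertext())
    return ET.tostring(page, encoding="unicode")

print(poem_to_xhtml(POEM))
```

The visual styling is left to CSS, which can target the `stance` class without touching the transformation.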

Web reinvents standardisation

• SGML: standard generalized markup language (1968; ISO in 1986)

• HTML: hypertext markup language (1993)

• XML: extensible markup language (1998)

• XHTML: extensible HTML

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE poème SYSTEM "poemfr.dtd">
<poème>
  <préambule>
    <titre>Chanson d'automne</titre>
    <recueil>Poèmes saturniens</recueil>
    <date>1866</date>
    <auteur>Paul Verlaine</auteur>
  </préambule>
  <corps>
    <stance>
      <ligne>Les sanglots longs</ligne>
      <ligne>Des violons</ligne>
      <ligne><r/>De l'automne</ligne>
      <ligne>Blessent mon coeur</ligne>
      <ligne>D'une langueur</ligne>
      <ligne><r/>Monotone.</ligne>
    </stance>

poemfr.dtd

<?xml version="1.0" encoding="ISO-8859-1"?>
<!-- poemfr.dtd: DTD for poetry (M. Goossens) -->
<!ELEMENT poème (préambule, corps)>
<!ELEMENT préambule (titre, recueil?, date?, auteur)>
<!ELEMENT titre (#PCDATA)>
<!ELEMENT recueil (#PCDATA)>
<!ELEMENT date (#PCDATA)>
<!ELEMENT auteur (#PCDATA)>
<!ELEMENT corps (stance|ligne)+>
<!ELEMENT stance (ligne)+>
<!ELEMENT ligne (#PCDATA|r)*>
<!ELEMENT r EMPTY>

xpath

$ xpath -e '//*/stance[contains(., "langueur")]' Verlaine1.xml
Found 1 nodes in Verlaine1.xml:
-- NODE --
<stance>
<ligne>Les sanglots longs</ligne>
<ligne>Des violons</ligne>
<ligne><r />De l'automne</ligne>
<ligne>Blessent mon coeur</ligne>
<ligne>D'une langueur</ligne>
<ligne><r />Monotone.</ligne>
</stance>
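
The same lookup can be approximated in Python. The standard `xml.etree.ElementTree` module supports only a subset of XPath and has no `contains()` function, so this sketch filters the stanzas in plain Python instead. The name `find_stanzas` is hypothetical, and the second stanza is added here only so that the filter has something to reject.

```python
import xml.etree.ElementTree as ET

# First stanza as on the slide; the opening of the second stanza is added for contrast.
FRAGMENT = """<corps>
  <stance>
    <ligne>Les sanglots longs</ligne>
    <ligne>Des violons</ligne>
    <ligne><r/>De l'automne</ligne>
    <ligne>Blessent mon coeur</ligne>
    <ligne>D'une langueur</ligne>
    <ligne><r/>Monotone.</ligne>
  </stance>
  <stance>
    <ligne>Tout suffocant</ligne>
    <ligne>Et blême, quand</ligne>
    <ligne><r/>Sonne l'heure,</ligne>
  </stance>
</corps>"""

def find_stanzas(xml_text, word):
    """Return every <stance> element whose text content contains `word`."""
    root = ET.fromstring(xml_text)
    # Joining itertext() plays the role of XPath's string value of the node.
    return [s for s in root.iter("stance") if word in "".join(s.itertext())]

for stance in find_stanzas(FRAGMENT, "langueur"):
    for ligne in stance.findall("ligne"):
        print("".join(ligne.itertext()))
```

Because the hit is returned as an element rather than as a string, its position in the document is preserved, which is what allows a sample to be shown back in its full context.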

Conclusion

• Recreating an authentic document containing indexed samples is feasible

• At what cost?
  – Full control of production cycle
  – Text and images?
  – Optimisation of on-the-fly rendition

