The only option is open: Why should language technology and resources be free?

Francis M. Tyers

Departament de Llenguatges i Sistemes Informàtics, Universitat d’Alacant, E-03071 Alacant, Spain

11th May 2011

Introduction

  Introduction to who I am and why I am here

  Description of some problems we face when developing and working with language resources and technology

  Introduction to software and resource pools

  Description of how these solve many of the problems described

  Discussion of some commercial aspects of pools

  Conclusions


Who am I?

I ...

  am a PhD student at the Universitat d’Alacant

  have been working with free software for approximately ten years and with language technology for around five years

  spend most of my time working on machine translation, but also (necessarily) work on other language technology


Why am I here?

I was asked to give a talk about the importance of

  free dissemination and access to resources

  both within a single language and multilingually

Additionally,

  I would like to describe my experiences as a developer of free/open language technology


 ... I hope it doesn’t get too tiresome


What are language resources and technology?

The terms are often used interchangeably, but they can be distinguished:

  Language resources: Data with which language processing applications are made.

    Ex.: A machine-readable dictionary, a treebank or a parallel corpus

  Language technology: Software with which language processing applications are made.

    Ex.: A machine translation engine, a parser or a spellchecking engine

Sometimes in natural language processing it is difficult to separate data from software.


What is free and open?

These are the four essential freedoms published by the Free Software Foundation (FSF).

  Freedom 0: The freedom to run the program, for any purpose

  Freedom 1: The freedom to study how the program works, and change it to make it do what you wish

  Freedom 2: The freedom to redistribute copies so you can help your friends and neighbours

  Freedom 3: The freedom to distribute copies of your modified versions to others

Open access to source code is a precondition for freedoms 1 and 3, which is why free software is also called open source.


Language      Keyboard  Lemma list  Mor. an.  Mor. guess.  Rule dis.  Stat. dis.  Rule parse  Stat. parse  Tag. corp.  Treebank  Para. corp.  WordNet
Finnish       X         X           X         -            X          -           (X)         -            -           X         X            X
Swedish       X         X           X         -            -          -           -           -            -           -         X            -
N. Norwegian  X         X           X         -            X          -           (X)         -            -           -         -            -
B. Norwegian  X         X           X         -            X          -           X           -            -           -         -            -
Greenlandic   X         X           X         -            X          -           X           -            -           -         -            -
North Sámi    X         X           X         -            X          -           X           -            -           -         -            -
Lule Sámi     X         X           X         -            X          -           X           -            -           -         -            -
South Sámi    -         X           X         -            X          -           X           -            -           -         -            -
Faroese       X         -           X         -            X          -           X           -            -           -         -            -
Icelandic     X         -           -         X            X          X           (X)         -            X           (X)       -            -
Danish        X         -           -         -            -          X           -           X            -           X         X            X
Estonian      X         -           -         -            -          -           -           -            -           -         X            -
Latvian       X         -           -         -            -          -           -           -            -           -         X            -
Lithuanian    X         -           -         -            -          -           -           -            -           -         X            -


Problems for language resources (the ones not on the previous list)

Major problems for language resources


  Visibility: “When I go looking for a resource, can I find it?”

    Ex.: Swedish WordNet

  Availability: “When I find the resource I’m looking for, can I use it?”

    Ex.: Beygingarlýsing íslensks nútímamáls (BÍN)

  Sustainability: “Will the resource I’m using still be there next year?”

    Ex.: METIS-II project

Problem: Visibility


So, why is visibility such a problem?

  As researchers, we would like to be sure that our work is known

    For example, through people citing us

  As users, we would like to be able to find resources easily

    The longer something takes to find, the more likely we are to assume that it doesn’t exist

  As developers, we would like the work that we produce to be used

    Usage means feedback, and feedback means improvements

  As funding bodies, we want to spend our money effectively

    That means not unnecessarily duplicating something that already exists

Problem: Availability


And availability?

  As researchers, we want to be able to reproduce the results of others

    Many papers are difficult or impossible to reproduce; cf. the “Zigglebottom tagger” (Pedersen, 2008)

  As users, we want to be able to use what we find with the minimum of hassle

    Both standalone and in combination with other software

  As developers, we don’t want to have to recreate something that has already been made

    Especially if it has been made with public money

  As funding bodies, we don’t want to fund the same thing twice

    We also want the results of one project to be usable in the development of another

Problem: Sustainability


And sustainability?

  As researchers, we want to be able to reproduce our work (and that of others), even after fifteen years

    Our results are highly dependent on specific versions of data and code

  As users, we want to be able to depend on software

    If we build a service or software around it, we don’t want it to disappear overnight

  As developers, we want our code to keep working

    We might not have time to keep up with changes in library versions, but someone else will

  As funding bodies, we want the results of our investments to be relevant for as long as possible

    That is cost-effective

Particular issues for M-languages


In my opinion, of the three problems outlined, the biggest one for marginalised and minority languages (M-languages) is availability. Visibility and sustainability tend to matter less:

  Less “noise”: when there is only one of something, it sticks out more.

  Projects tend to be “labours of love” for the participants, meaning that they are more likely to have staying power.

Why is availability a problem?

  M-languages often barely have the base to make their own resources, let alone to duplicate the resources of a major language as well.

  While major languages can often afford to rewrite resources several times, this is rarely the case for M-languages.

Why do these problems occur ?


These problems occur as a result of how language resources and technology are developed and published. Here are some common experiences:

  Commercial: A company makes a resource and sells licences for it in the usual fashion.

  Big research: A big consortium or group develops a resource and charges for its use, sometimes with public money, sometimes without.

  Small research: A small group develops something and publishes it on their university website. They mark it as research only/non-commercial.

  Single person: A lecturer or student develops something and publishes it in the same way.


Free/open-source pools

The pool


What is a pool (Scannell, 2006)?

“A pool is a multilingual collection of resources of the same form and function under a free/open licence”

Features of a pool:

  Multilingual: To be a pool, the collection must be multilingual

  Community maintained: The collection must have an active community of unpaid users, developers and maintainers

  Open to contribution: The collection must allow external contribution

  Uniform: The collection should be homogeneous as far as possible: dictionaries with dictionaries, taggers with taggers

  Free / open: Everything in the collection should be released under free/open licences, ideally the same one [1]

[1] More on this later
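The five features above can be read as a checklist. The following is only a rough sketch of that checklist in Python, not something from the talk: the class names, the field names, the small set of licence identifiers and the example entries are all assumptions made for illustration, and the "uniform" test is stricter than the "as far as possible" wording of the definition.

```python
from dataclasses import dataclass

# Illustrative allow-list only; a real check would need a fuller list of free/open licences.
FREE_LICENCES = {"GPL-3.0", "LGPL-3.0", "CC-BY-SA-3.0"}


@dataclass
class Resource:
    language: str  # hypothetical field: a language code such as "sme"
    kind: str      # hypothetical field: e.g. "dictionary", "tagger"
    licence: str   # hypothetical field: a licence identifier


@dataclass
class Collection:
    resources: list[Resource]
    community_maintained: bool  # has an active community of users, developers and maintainers
    open_to_contribution: bool  # accepts external contributions


def pool_criteria(c: Collection) -> dict[str, bool]:
    """Evaluate each feature of a pool separately, so it is clear which one fails."""
    languages = {r.language for r in c.resources}
    kinds = {r.kind for r in c.resources}
    return {
        "multilingual": len(languages) > 1,
        "community maintained": c.community_maintained,
        "open to contribution": c.open_to_contribution,
        # Simplification: require exactly one kind of resource.
        "uniform": len(kinds) == 1,
        "free/open": bool(c.resources)
        and all(r.licence in FREE_LICENCES for r in c.resources),
    }


def is_pool(c: Collection) -> bool:
    return all(pool_criteria(c).values())


# Made-up example entries, not a description of any real collection.
example = Collection(
    resources=[
        Resource("sme", "morphological analyser", "GPL-3.0"),
        Resource("fao", "morphological analyser", "GPL-3.0"),
    ],
    community_maintained=True,
    open_to_contribution=True,
)
print(is_pool(example))
```

Returning the criteria one by one, rather than a single yes/no, makes it easy to see which feature a given collection is missing.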

Existing pools


Here are some examples of existing pools:

  Corpora:

    OPUS: Open Parallel Corpus

    EuroParl: Corpus of European Parliament Proceedings

  Grammars:

    DELPH-IN: Collection of HPSG grammars

    MOLTO: Grammars based on Grammatical Framework (GF)

  Morphological analysers:

    Giellatekno: Morphological analysers for the Sámi languages, and others

    Apertium: Morphological analysers for a range of languages

  Spellcheckers:

    {A,I,Hun,My}spell: Ubiquitous free spellcheckers

  Machine translation systems:

    Apertium: Simple RBMT systems for many languages

    MOLTO: Interlingua-based MT systems

What is not a pool?


The definition of pool is quite wide, but does not include the following:

  ELRA / LDC: Sparse databases of some existing language resources

  CORPORA-LIST: Mailing list for corpora and language resources in general

  ACLWIKI: A collaboratively maintained wiki of pointers to language resources


The solution

Solution: Visibility


When you add your resource to an existing collection, you gain visibility.

  The resource sits alongside similar resources, so it is more likely to be referred to in passing.

    Higher search engine rankings

  The more languages a collection has, the more visible it is, so it is in the interest of the maintainers to have more languages.

Solution: Availability


All of the resources in the pool have compatible licences

  Resources and software can be shared between projects

    No need to call in the lawyers

  No need to wait years for someone to finally make a decision

  Results can be reproduced and new results published without issue

And they are all hosted on the same project-independent (and mirrored) infrastructure

Solution: Sustainability


The pool is project-independent and community maintained

  It can use freely available infrastructure, for example GNU Savannah, SourceForge, Google Code, La Farga, ...

    No problem of the servers being turned off when the project ends

  A lot of maintenance is done by people who aren’t tied to project funding, although they may still be paid

  Data conversion happens automatically

  Feedback about one language can be used to improve all languages

Particular issues for M-languages


M-languages can gain a lot by being pooled, whether with major languages, with other M-languages, or both.

  No need to build their own infrastructure, so more time can be spent on linguistic matters

  Sharing of infrastructure and expertise

  To make major language ↔ M-language applications, both free M-language and free major language resources are necessary.


Commercial aspects


How does the pool (i.e. being free/open) affect commercial use and sales?

  Commercial use: Explicitly allowed by the first freedom (Freedom 0).

  Commercial sales: Allowed with permission of all authors.

Many companies dual-license their software or resources, with

  A free licence for use with other free/open software, and

  A commercial licence for use with proprietary/commercial software

Brass tacks: You can still sell licences (if you own all the rights).

Licences and licensing


There are a number of things to consider when choosing a licence:

  Type: Is the thing to be licensed code, or data?

  Copyleft: Should all changes be released under the same licence?

  Compatibility: Is the licence compatible with other software?

And there is one thing that shouldn’t be considered:

  Non-commercial: Limiting the commercial use of your resource, or limiting it to research only

Why?

What about non-commercial ?


What other people think of when they think of commercial use:

  Microsoft™ is going to come along and take my Finnish-North Sámi wordlist, make a dictionary and sell it for a million euros. I will die poor and penniless.

What I think of:

  A company offering bundled versions of OpenOffice and a spellchecker to schools on easy-to-install DVDs. The DVD price covers costs. The company makes its main profit offering services.

  A minority language newspaper wants to produce bilingual editions of some of its articles. It downloads an MT system and uses it in its commercial operation.

Outside academia, almost everything can be classified as commercial.

Which licence should I choose?


M-language perspective


  Do not hesitate to dual-license

  The language comes first!

    If the only way to get your software onto the desktops of M-language users is to make a pact with the devil, do it.

Summary


I see two main avenues for the development of language technology and resources:

  Open / free

    Sharing

    Easy collaboration

    Linguistically rich applications

    Inclusion of all languages

  Closed / proprietary

    Duplicated work

    Reduced collaboration

    Dependence on linguistically poor techniques

      Spellcheckers working on simple wordlists

      “Basic” phrase-based statistical MT

    Inclusion of profitable languages

Further reading


  Pedersen, T. (2008) ‘Empiricism Is Not a Matter of Faith’. Computational Linguistics 34(3), 465–470.

  Scannell, K., Streiter, O. and Stuflesser, M. (2006) ‘Implementing NLP Projects for Non-Central Languages: Instructions for Funding Bodies, Strategies for Developers’. Machine Translation 20(4), 267–289.


Pieldes! · Tack! · Tak! · Takk fyrir! · Takk fyri! · Kiitti! · Gæhjtoe! · Aciu! · Giitu!

