Mātāpuna Dictionary Database System

The Open Source Multi-user Web-based Dictionary Writing System

Dave Moskovitz
DWS 2004, Brno

Mātāpuna – Dave Moskovitz – www.thinktank.co.nz

Outline

Background info

Design criteria

Functions

Database structure

Future development

Lab

Call for collaboration

Background – New Zealand / Aotearoa / Māori

4 million people; 268,000 km²

15% Māori; 1 in 4 of those speak Māori

Median age 22; median income NZD 14,000 (compared with 35 and NZD 18,500 for Pākehā)

Māori is an official language of New Zealand and belongs to the Polynesian language group

Written in the standard Roman alphabet, with macrons marking long vowels

Background – New Zealand in the Pacific

Background – The Mātāpuna project

First monolingual dictionary of Māori, written from a Māori cultural perspective

Target of 20,000 entries (1 entry = 1 definition)

Designed for language learners with some proficiency

A project of three or more years under the auspices of Te Taura Whiri i te Reo Māori / The Māori Language Commission

Four writers, one editor, one lexicographer, one project manager, administrative support and one geek

Background – The Mātāpuna team

Sharon Armstrong, Wiha Te Rakihawea

Te Waireka Walker, Pou Temara, Phil Matthews

Ruka Broughton

Hēni Jacobs

Not in photo: Te Awanuiārangi Black, Te Haumihiata Mason, Dave Moskovitz

Background – Dave

BA (Hons) in Computer Science, University of California, Berkeley

Began a PhD in Applied Linguistics on NZ Sign Language phonology

25 years in the IT industry

Background in application development, systems architecture, system performance and the Internet

Third lexicography project, after the Dictionary of NZ Sign Language and the Oxford NZ Dictionary

Open Source bigot

Background – Software

Free Software – Open Source – GPL

Uses Linux, Apache, mod_perl and PostgreSQL; runs on any old hardware (e.g. a Pentium 600)

Browser-based

About 4,000 lines of Perl code

Won a Computerworld Excellence Award for use of IT in Government

Open Source is Good for Lexicography

Free

Market is too small to support proprietary software

Everyone’s needs are unique – and you can modify the source code to suit

Open source programmers are not hard to find

Low risk and futureproof: no vendor lock-in

Everyone helps each other

Software is open, but data is not (necessarily)

Design Criteria

Easy for untrained lexicographers to use

Support workflow and management as well as entry

End-to-end processing

Produce printed output as well as web access

Multiuser

Multilingual interface, easy to add languages

Unicode-based, so any character set can be used

Sample Output

Functions

Add

Search

Edit

Corpus search

Reports

Functions - Add

Functions - Search

Functions - Edit

Functions – Corpus search

Functions – Reports

Functions – Validation

Field-based validation, including:
- orthography
- punctuation
- blank fields
- undefined words (not in the defining vocabulary)
- synonym rules
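As a rough illustration of field-based checks, the Python sketch below validates a single definition field. It is not the Mātāpuna Perl code: the function and constant names are hypothetical, the orthography rule assumes a Māori word is a run of open syllables built from h, k, m, n, p, r, t, w, ng and wh plus a vowel (with or without a macron), and the punctuation rule (a definition ends with a full stop) is an invented example. Synonym rules are left out here.

```python
import re
import unicodedata

# Hypothetical orthography rule: a Māori word is a run of open syllables,
# each an optional consonant (including the digraphs "ng" and "wh") plus a vowel.
SYLLABLES = re.compile(r"(?:(?:ng|wh|[hkmnprtw])?[aeiouāēīōū])+")
# A token is a run of letters; macron vowels count as letters.
TOKEN = re.compile(r"[^\W\d_]+")


def validate_definition(text, defining_vocab):
    """Return a list of validation problems for one definition field (empty if valid)."""
    # NFC keeps macron vowels as single code points ("ā", not "a" + U+0304).
    text = unicodedata.normalize("NFC", text).strip()
    if not text:
        return ["blank: definition is empty"]
    problems = []
    # Hypothetical punctuation rule: a definition ends with a full stop.
    if not text.endswith("."):
        problems.append("punctuation: definition should end with a full stop")
    for word in TOKEN.findall(text):
        lower = word.lower()
        if not SYLLABLES.fullmatch(lower):
            problems.append(f"orthography: '{word}' is not a well-formed Māori word")
        elif lower not in defining_vocab:
            problems.append(f"undefined: '{word}' is not in the defining vocabulary")
    return problems
```

For example, `validate_definition("He whare pai.", {"he", "whare"})` flags only `pai` as outside the defining vocabulary.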

Functions – Workflow

Basic workflow: Add → Self-check → Editor 1 → Editor 2

The editor can make minor changes or send the entry back to its owner

The owner is notified of any changes by email

You can always view the history of an entry
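One way to model this workflow is as a small state machine. The sketch below is a hypothetical Python illustration, not the actual implementation: the stage names, the final `approved` state, the rule that a returned entry goes back to the owner's self-check stage, and the `outbox` list standing in for email are all assumptions. Every action appends to the entry's history, which is what keeps the full history viewable.

```python
from dataclasses import dataclass, field

# Stage names follow the slide; "approved" as a final state is an assumption.
STAGES = ["add", "self_check", "editor_1", "editor_2", "approved"]
EDITOR_STAGES = {"editor_1", "editor_2"}

outbox = []  # stands in for the email notifications sent to owners


@dataclass
class Entry:
    headword: str
    definition: str
    owner: str
    stage: str = "add"
    history: list = field(default_factory=list)  # (user, action, definition) tuples


def notify(entry, message):
    outbox.append((entry.owner, message))


def advance(entry, user):
    """Move an entry to the next workflow stage and record it in the history."""
    i = STAGES.index(entry.stage)
    if i == len(STAGES) - 1:
        raise ValueError("entry is already approved")
    entry.stage = STAGES[i + 1]
    entry.history.append((user, f"advanced to {entry.stage}", entry.definition))


def minor_edit(entry, editor, new_definition):
    """An editor corrects the text in place; the owner is notified."""
    if entry.stage not in EDITOR_STAGES:
        raise ValueError("only entries at an editor stage can be edited by an editor")
    entry.definition = new_definition
    entry.history.append((editor, "minor edit", new_definition))
    notify(entry, f"{editor} edited '{entry.headword}'")


def send_back(entry, editor, reason):
    """An editor returns the entry to its owner, who reworks it from self-check."""
    if entry.stage not in EDITOR_STAGES:
        raise ValueError("only entries at an editor stage can be sent back")
    entry.stage = "self_check"
    entry.history.append((editor, f"sent back: {reason}", entry.definition))
    notify(entry, f"{editor} returned '{entry.headword}': {reason}")
```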

Functions – Synonym handling

Entries allow for synonym ‘families’

Master – slave (tuakana – teina) relationship

Masters can’t have masters and slaves can’t have slaves

Slave definitions are printed from the master

All cross-references are managed by the system
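The two-level rule can be enforced whenever a link is made. The following Python sketch is a hypothetical illustration (class and method names are invented, and it assumes each slave has exactly one master): it rejects links that would give a master a master or a slave a slave, prints a slave with its master's definition, and derives cross-references from the family.

```python
class SynonymFamilies:
    """Tuakana-teina (master-slave) synonym links, one level deep."""

    def __init__(self, definitions):
        self.definitions = definitions  # headword -> definition text
        self.master_of = {}             # slave headword -> its master headword

    def link(self, master, slave):
        """Make `slave` a teina of `master`, enforcing the two-level rule."""
        if master == slave:
            raise ValueError("an entry cannot be its own master")
        if master in self.master_of:
            raise ValueError(f"{master} is a slave, and slaves can't have slaves")
        if slave in self.master_of.values():
            raise ValueError(f"{slave} is a master, and masters can't have masters")
        if slave in self.master_of:
            # Assumption: each slave belongs to exactly one family.
            raise ValueError(f"{slave} already has a master")
        self.master_of[slave] = master

    def printed_definition(self, headword):
        """A slave is printed with its master's definition."""
        return self.definitions[self.master_of.get(headword, headword)]

    def family(self, headword):
        """Other members of the headword's family, used for cross-references."""
        master = self.master_of.get(headword, headword)
        members = {master} | {s for s, m in self.master_of.items() if m == master}
        return sorted(members - {headword})
```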

Functions – Multilingual interface

186 text snippets make up the interface

Additional languages can be added
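A minimal sketch of how translated interface snippets might be looked up, assuming a table keyed by snippet ID and language code, with English as the fallback language. The snippet IDs, the fallback behaviour and the helper names are hypothetical; only the idea of translatable snippets comes from the slide. The Māori strings Rapu (search) and Tāpiri (add) and the Czech Hledat (search) are example translations.

```python
# Hypothetical snippet table: (snippet_id, language code) -> interface text.
SNIPPETS = {
    ("search_button", "en"): "Search",
    ("search_button", "mi"): "Rapu",
    ("add_button", "en"): "Add",
    ("add_button", "mi"): "Tāpiri",
}


def snippet(snippet_id, lang, default_lang="en"):
    """Look up an interface string, falling back to the default language."""
    text = SNIPPETS.get((snippet_id, lang))
    if text is None:
        text = SNIPPETS[(snippet_id, default_lang)]
    return text


def add_language(lang, translations):
    """Register (possibly partial) translations for a new interface language."""
    for snippet_id, text in translations.items():
        SNIPPETS[(snippet_id, lang)] = text


def missing_snippets(lang, default_lang="en"):
    """List snippet IDs that still need translating into `lang`."""
    ids = {sid for sid, code in SNIPPETS if code == default_lang}
    return sorted(sid for sid in ids if (sid, lang) not in SNIPPETS)
```

Listing the untranslated snippets for a new language gives translators a checklist, which is one way to keep adding a language cheap.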

Functions – Multilingual interface

Database Structure

wordclass

category

examplesource

headword

qastatus

hwarchive

matapunauser

activityjournal
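The slide lists only the table names. To show one plausible way they could relate, the sketch below creates them with Python's built-in `sqlite3` module rather than PostgreSQL. Every column and foreign key is a hypothetical guess, not the real Mātāpuna schema (for example, storing the synonym master in `headword.master_id` and earlier versions of an entry in `hwarchive`).

```python
import sqlite3

# Table names come from the slide; every column is a hypothetical guess.
SCHEMA = """
CREATE TABLE wordclass     (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE category      (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE examplesource (id INTEGER PRIMARY KEY, citation TEXT NOT NULL);
CREATE TABLE qastatus      (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE matapunauser  (id INTEGER PRIMARY KEY, login TEXT UNIQUE NOT NULL,
                            email TEXT, language TEXT DEFAULT 'mi');
CREATE TABLE headword (
    id INTEGER PRIMARY KEY,
    word TEXT NOT NULL,
    definition TEXT,
    wordclass_id INTEGER REFERENCES wordclass(id),
    category_id INTEGER REFERENCES category(id),
    examplesource_id INTEGER REFERENCES examplesource(id),
    qastatus_id INTEGER REFERENCES qastatus(id),
    owner_id INTEGER REFERENCES matapunauser(id),
    master_id INTEGER REFERENCES headword(id)   -- synonym family link
);
CREATE TABLE hwarchive (                        -- earlier versions of a headword
    id INTEGER PRIMARY KEY,
    headword_id INTEGER REFERENCES headword(id),
    definition TEXT,
    changed_by INTEGER REFERENCES matapunauser(id),
    changed_at TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE activityjournal (                  -- who did what, and when
    id INTEGER PRIMARY KEY,
    user_id INTEGER REFERENCES matapunauser(id),
    description TEXT,
    logged_at TEXT DEFAULT CURRENT_TIMESTAMP
);
"""


def connect():
    """Create an in-memory database with the sketched schema."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db


def history(db, headword_id):
    """Return the archived definitions of one headword, oldest first."""
    rows = db.execute(
        "SELECT definition FROM hwarchive WHERE headword_id = ? ORDER BY id",
        (headword_id,),
    )
    return [row[0] for row in rows]
```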

Future development

Multiple citations

Bilingual / multilingual

More corpus material (and better corpus performance)

Advanced search

Better user administration

XML / SGML export

More languages

… what do you want or need?

Lab – Words from the Olympics

15 users, 15 categories of words

Rawiri is the editor

Practise entering definitions, linking synonyms, playing with major and minor senses, searching, breaking validation rules …

Be nice to Rawiri, he can send work back to you to get fixed

Call for collaboration

This is Free Software

Use it and contribute enhancements

It’s robust and capable of producing a major lexicographical work

We are interested in your feedback and participation

Call for collaboration

Contact:

Dave Moskovitz
Thinktank Consulting Limited
PO Box 15-212
Wellington, New Zealand

[email protected]

+64 27 220 2202

