
Google Summer of Code 2011: UOC & Apertium

Date post: 18-Nov-2014
Upload: office-of-learning-technologies-universitat-oberta-de-catalunya
Description:
Summary of UOC's participation in Google Summer of Code 2011 together with Apertium.
Transcript
Page 1: Google Summer of Code 2011: UOC & Apertium

Lluís Villarejo
Learning Technologies

March 2012

Pre and post editing environment for Apertium

Page 2: Google Summer of Code 2011: UOC & Apertium

What is GSoC?

• It's a global program that offers student developers stipends to write code for various open source software projects.
• Running since 2005.

• Inspire young developers to participate in OSS projects.
• Give students more exposure to real-world software development scenarios.
• Get more open source code created and released.
• Help open source projects identify and bring in new developers.


Page 3: Google Summer of Code 2011: UOC & Apertium

Some participants

• Apache Software Foundation
• Debian
• Facebook
• Drupal
• Creative Commons
• DocBook project
• GCC
• Gnome
• ...

• Sakai Foundation
• Mozilla
• Inclusive Design Institute
• The Linux Foundation
• The GNU project
• Wikimedia Foundation
• WordPress
• ...


Page 4: Google Summer of Code 2011: UOC & Apertium

How does it work?

• Organizations apply as mentoring agents.
• Organizations present a list of potential projects and mentors.
• Accepted organizations should try to attract students' interest.
• Students write project proposals.
• Google finances slots for each organization (5,000 + 500 USD per slot).
• The project community decides how students are assigned to slots.
• Work takes place between the end of May and the end of August.


Page 5: Google Summer of Code 2011: UOC & Apertium

GSoC'11 statistics


• $7.2M budget

• 1115 students accepted from 68 countries

• 2096 mentors and co-mentors from 55 countries

• 175 Open Source organizations

• 18.1% of students have participated in previous years

• 97 countries with student applicants

• 88% overall success rate

Page 6: Google Summer of Code 2011: UOC & Apertium

Accepted Students GSoC'11


Page 7: Google Summer of Code 2011: UOC & Apertium

Why participate with Apertium?

• Strategically:
– Apertium is a strategic agent inside UOC.
– Developing Apertium means further developing internationalization aids for UOC.
– Attract and onboard new developers for Apertium.
– Collaboration with Google's Open Source initiatives.

• Functionally:
– Opportunity to further develop specific UOC needs with external funding.
– Capitalize on specific user feedback on translation quality.


Page 8: Google Summer of Code 2011: UOC & Apertium

The Apertium case

• 20 proposed tasks
• 17 tasks got interest from students [1-9]
– Pre and post-editing environment gets 11 students interested.

• Apertium community ranks the 17 tasks
– Pre and post-editing environment ranks 4th

• Google assigns 9 slots to Apertium (49,500 USD)
– Our task goes through and Camille Mougey is selected from the Grenoble Institute of Technology.
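The 49,500 USD figure is consistent with the per-slot amounts quoted earlier in the deck (5,000 + 500 USD). A quick check of the arithmetic, where the constant and function names are illustrative only and the per-slot split is read directly from the slides:

```python
# Figures taken from the slides; names are illustrative, not from any GSoC tooling.
FIRST_AMOUNT_USD = 5_000   # first amount quoted per slot
SECOND_AMOUNT_USD = 500    # second amount quoted per slot
SLOTS_FOR_APERTIUM = 9     # slots Google assigned to Apertium


def total_funding(slots: int) -> int:
    """Return the total funding in USD for a given number of slots."""
    return slots * (FIRST_AMOUNT_USD + SECOND_AMOUNT_USD)


print(total_funding(SLOTS_FOR_APERTIUM))
```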


Page 9: Google Summer of Code 2011: UOC & Apertium

Pre and post-editing, why?

• A significant share of the errors you get when translating a document is due to deficiencies in the original.
• The integration of existing resources can help to ease this burden:
– Digital knowledge sources (digital dictionaries...)
– Automatic tools (spell-checker, grammar checker, translation memory generation, search & replace...)
• These processes should be integrated naturally in the translation workflow → the need for an integrated web interface to Apertium.

• To improve the system we need to have access to the human post-editing process.


Page 10: Google Summer of Code 2011: UOC & Apertium

Pre and post-editing, features

• Pre and post-editing web interface integrated with the Apertium translation toolbox.
• Spell checking on source and target languages. Integration with Aspell.
• Grammar checking on source and target languages. Integration with LanguageTool.
• Integration with several external dictionaries.
• Search & replace functionalities on source and target languages.
• Ability to deal with formatted text.
• Logging system. All events are logged as they happen, i.e. at the very moment the user inserts or deletes text. This allows a further data mining process to be run on the logs to detect commonly modified structures or vocabulary.
• Translation memory generation. Integration of Maligna.
• PDF translation through pdftohtml.
• Image translation through tesseract.

Final report 2010 | Final report 2011
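The logging feature described above records each insertion and deletion as it happens. The slides do not show the actual log format, so the following is only a minimal sketch under assumed names: `EditEvent`, `EditLog` and `replay` are hypothetical, as is the choice of one JSON object per line.

```python
import json
import time
from dataclasses import asdict, dataclass


@dataclass
class EditEvent:
    """One logged edit (hypothetical schema, not the project's real format)."""
    timestamp: float
    side: str       # "source" (pre-editing) or "target" (post-editing)
    action: str     # "insert" or "delete"
    position: int   # character offset where the edit applies
    text: str       # text that was inserted or deleted


class EditLog:
    """Collects edit events in the order they occur."""

    def __init__(self):
        self.events = []

    def record(self, side, action, position, text, timestamp=None):
        if action not in ("insert", "delete"):
            raise ValueError("action must be 'insert' or 'delete'")
        stamp = time.time() if timestamp is None else timestamp
        event = EditEvent(stamp, side, action, position, text)
        self.events.append(event)
        return event

    def to_json_lines(self):
        """Serialize the log as one JSON object per line."""
        return "\n".join(
            json.dumps(asdict(e), ensure_ascii=False) for e in self.events
        )


def replay(initial, events):
    """Re-apply logged events to the initial text to rebuild the edited text."""
    text = initial
    for e in events:
        if e.action == "insert":
            text = text[:e.position] + e.text + text[e.position:]
        else:  # delete: drop len(e.text) characters starting at e.position
            text = text[:e.position] + text[e.position + len(e.text):]
    return text


log = EditLog()
log.record("target", "delete", 4, "casa", timestamp=1.0)
log.record("target", "insert", 4, "house", timestamp=2.0)
print(replay("the casa is big", log.events))
```

Storing the character offset and the affected text with every event keeps the log replayable, which is what makes later mining of the post-editing process possible.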


Page 11: Google Summer of Code 2011: UOC & Apertium

Results & lessons learned

• Fully functional environment, goals accomplished.
• Automatic availability of feedback on human post-editing behaviour.

• Jointly defined task (flexible framework provided).
• Interest in developing great empathy with the student.
• Motivated and proactive student.
• Student engagement.
• Very frequent feedback.
• Mentoring team with access to ABSOLUTELY ALL the information regarding the project.


Page 12: Google Summer of Code 2011: UOC & Apertium

Further work

• Proof of concept accomplished.
• Base platform developed so further work can be easily added.
• Integration of other resources (more external dictionaries).
• Extension of currently used resources (addition of grammar rules, improvement of dictionaries, wider range of formats).

• Mining the logged information to gain deeper knowledge of the human post-editing process.

• Use of this mining process to improve the Apertium translation engine.
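One simple way to picture the mining step is to compare machine output with its post-edited version and count which words get replaced most often. The slides do not describe the method actually planned, so this is a sketch under stated assumptions: the function names and sample sentences are invented for illustration, and token alignment uses Python's standard `difflib.SequenceMatcher` rather than anything specific to Apertium.

```python
from collections import Counter
from difflib import SequenceMatcher


def replaced_spans(mt_output, post_edited):
    """Yield (machine_text, human_text) pairs for each replaced token span."""
    mt_tokens = mt_output.split()
    pe_tokens = post_edited.split()
    matcher = SequenceMatcher(a=mt_tokens, b=pe_tokens, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            yield " ".join(mt_tokens[i1:i2]), " ".join(pe_tokens[j1:j2])


def most_common_corrections(pairs, n=5):
    """Count replacements across many (machine output, post-edited) pairs."""
    counts = Counter()
    for mt_output, post_edited in pairs:
        counts.update(replaced_spans(mt_output, post_edited))
    return counts.most_common(n)


# Invented sample data, for illustration only.
sample_pairs = [
    ("the house is very big", "the house is really big"),
    ("it is very cold", "it is really cold"),
    ("a big car", "a large car"),
]

for (machine, human), count in most_common_corrections(sample_pairs):
    print(f"{machine!r} -> {human!r}: {count}")
```

Corrections that recur across many documents are candidates for new dictionary entries or rules in the translation engine.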


Page 13: Google Summer of Code 2011: UOC & Apertium

GSoC 2012

• Mining the logged information to gain deeper knowledge of the human post-editing process.

• Use of this mining process to improve the Apertium translation engine.

• Post-editing of formatted text.


Page 14: Google Summer of Code 2011: UOC & Apertium

Thanks

Questions & answers


