
Implementation Basket

Date post: 25-Feb-2016
Upload: kura
The MultilingualWeb-LT Working Group receives funding by the European Commission (project name LT-Web) through the Seventh Framework Programme (FP7) in the area of Language Technologies. Grant Agreement No. 287815. Implementation Basket Moderator: Felix Sasaki (DFKI / W3C Fellow)
Transcript
Page 1: Implementation Basket

Implementation Basket

Moderator: Felix Sasaki (DFKI / W3C Fellow)

Page 2: Implementation Basket

What is in the basket?

• Tools to support work with W3C ITS 2.0
  ① ITS 2.0 in editing environments
  ② Generate and validate ITS 2.0
  ③ (Automatically) process ITS 2.0 enhanced content
• What the audience should do
  – Think about the area that interests you
  – Remember faces and use META-FORUM for hallway conversations

Page 3: Implementation Basket

W3C ITS 2.0 in editing environments

• In the CMS 1: Adobe. Presenter: Felix Sasaki
• In the CMS 2: Cocomore. Presenter: Clemens Weins
• In a word processor: ]init[. Presenter: Steffen Haller
• In a Web content editor: Disruptive Innovations. Presenter: Daniel Glazman

Page 4: Implementation Basket

Adobe’s ITS2 Implementation

Adobe’s fully open source implementation imports and exports content enabled with ITS2 metadata to/from a JCR Content Repository.

Diagram labels: CMS; REST Framework; XML (xliff); html5

Implemented Data Categories

• Translate
• Localization Note
• Id Value
• Target Pointer

Accessible via ‘selector’ REST URLs. E.g.:

To access content:
GET http://myhost/my/content/file.html
To access the same content, ITS enabled:
GET http://myhost/my/content/file.its.html
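
The selector convention from this slide can be sketched in a few lines of Python. This is only an illustration of the URL pattern shown above (file.html becomes file.its.html); the helper name `with_selector` is hypothetical and the code is not taken from Adobe's implementation.

```python
from urllib.parse import urlsplit, urlunsplit
import posixpath


def with_selector(url: str, selector: str = "its") -> str:
    """Insert a selector before the final extension of the URL path.

    Hypothetical helper: it only mirrors the pattern on the slide
    (file.html -> file.its.html) and is not part of Adobe's code.
    """
    parts = urlsplit(url)
    directory, filename = posixpath.split(parts.path)
    stem, ext = posixpath.splitext(filename)
    if not ext:
        raise ValueError("URL path has no extension to place a selector before")
    new_path = posixpath.join(directory, f"{stem}.{selector}{ext}")
    return urlunsplit(
        (parts.scheme, parts.netloc, new_path, parts.query, parts.fragment)
    )


# Plain content URL from the slide, rewritten to its ITS-enabled form.
print(with_selector("http://myhost/my/content/file.html"))
```

Run against the slide's own example, the helper yields the ITS-enabled URL, which a client would then request with an ordinary GET.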

Page 5: Implementation Basket

Build the bridge Web CMS <> TMS

• Drupal ITS 2.0 integration https://drupal.org/project/its

• JavaScript ITS 2.0 parser http://plugins.jquery.com/its-parser/

• Real life ITS 2.0 showcase with a customer (VDMA) and Language Service Provider (Linguaserve)

Diagram labels: XHTML + ITS 2.0; LSP

Page 6: Implementation Basket

W3C ITS Libre Office Extension
]init[ AG für Digitale Kommunikation

Downloadable at Libre Office Extension Centre: http://extensions.libreoffice.org/extension-center

• Open Source GPL v3
• free to use and to be developed further

More on: http://www.init.de/en/libreofficeWriter

Page 7: Implementation Basket

http://bluegriffon.org

Page 8: Implementation Basket

Generate and validate ITS 2.0

• Generate Terminology: Tilde. Presenter: Andrejs Vasiļjevs
• Generate Text Analysis information: Institut “Jožef Stefan”. Presenter: Felix Sasaki
• Transform HTML5+ITS2 to NIF (NLP Interchange Format): Univ. of Leipzig. See the NIF poster by Sebastian Hellmann
• Validate all ITS 2.0 data categories: University of Economics Prague. Presenter: Jirka Kosek

Page 9: Implementation Basket

W3C ITS 2.0 Enriched Terminology Annotation Showcase

taws.tilde.com

Diagram labels:
• Human users (e.g., translators, terminologists): Showcase Web Page; export / visualisation
• Machine users: CAT Tools, MT Systems; Web Service API
• Services: Terminology Annotation; TaaS Terminology Services
• Content exchanged: Plaintext; ITS 2.0 enriched content; Term-annotated content; ITS 2.0 term-annotated content

Page 10: Implementation Basket

Creating translation context with disambiguation

Problem: Localizing content containing proper names without sufficient context
• ITS 2.0 markup provides the key information about which entities are mentioned, so they can be correctly processed within translation
• Data category: Text Analysis

Solution: use natural language processing techniques to provide context for ambiguous content.
• Implemented and demonstrated with the Enrycher NLP tool
• Demo: enrycher.ijs.si/mlw/
• Questions: [email protected]
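
To picture what Text Analysis markup looks like: in HTML5, ITS 2.0 uses the `its-ta-ident-ref` attribute to point an entity mention at an identifier. The sketch below wraps the first occurrence of a mention in such a span. The function name, the sample sentence and the DBpedia URI are illustrative assumptions; this is not Enrycher's output or API.

```python
from html import escape


def annotate_entity(text: str, mention: str, ident_ref: str) -> str:
    """Wrap the first occurrence of `mention` in a span with its-ta-ident-ref.

    Hypothetical helper for illustration; a real NLP tool would decide
    which mention maps to which identifier.
    """
    start = text.find(mention)
    if start < 0:
        return escape(text)  # nothing to annotate
    end = start + len(mention)
    return (
        escape(text[:start])
        + f'<span its-ta-ident-ref="{escape(ident_ref, quote=True)}">'
        + escape(mention)
        + "</span>"
        + escape(text[end:])
    )


# Sample sentence and identifier are assumptions, not data from the demo.
print(annotate_entity(
    "Welcome to Dublin in Ireland!",
    "Dublin",
    "http://dbpedia.org/resource/Dublin",
))
```

With the identifier attached, a downstream translation step can tell which "Dublin" is meant without having to guess from the surrounding text.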

Page 11: Implementation Basket

W3C ITS 2.0 Support in Modern Document Formats

• HTML5 support
  – Native support (its-* attributes)
  – Supported by validators – validator.w3.org and validator.nu
  – You can use ITS markup right now in your pages and get them validated
• DocBook support
  – Supported by standard schema and stylesheets
• DITA support
  – Coming soon
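
As a rough illustration of the native HTML5 support, the following sketch uses Python's standard `html.parser` to list the ITS-related attributes in a page: the HTML5 `translate` attribute plus anything starting with `its-`. The class name and the sample markup are assumptions made for this example; it only collects local attributes and does not evaluate global ITS rules or inheritance.

```python
from html.parser import HTMLParser


class ItsAttributeCollector(HTMLParser):
    """Collect (tag, attribute, value) for `translate` and its-* attributes."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # html.parser delivers tag and attribute names in lower case.
        for name, value in attrs:
            if name == "translate" or name.startswith("its-"):
                self.found.append((tag, name, value))


# Sample markup is an assumption, written only to exercise the collector.
sample = (
    '<p its-loc-note="Keep the product name in English">'
    'Install <span translate="no">BlueGriffon</span> and open the '
    '<span its-term="yes">style sheet</span>.</p>'
)

collector = ItsAttributeCollector()
collector.feed(sample)
for tag, name, value in collector.found:
    print(f"<{tag}> {name}={value!r}")
```

A full ITS 2.0 processor would also resolve global rules and defaults; this sketch stops at showing that the local markup is ordinary HTML5 attributes any parser can read.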

Page 12: Implementation Basket

(Automatically) process ITS 2.0 enhanced content (1/2)

• Machine translation statistical: Dublin City University. Presenter: Felix Sasaki
• Machine translation rule based: Lucy Software. See presentation from Pedro Díez Orzas later
• Building localization processes: ENLASO. Presenter: Felix Sasaki
• Building localization Web services: University of Limerick, Moravia. Presenter: David Filip
• Workflow for creating global content: Trinity College Dublin. Presenter: Dave Lewis
• Preview in the browser: Logrus. Presenter: Serge Gladkoff

Page 13: Implementation Basket

ITS 2.0 & MACHINE TRANSLATION

Translation Web Service
• Translation of HTML / XLIFF documents tagged with ITS 2.0 metadata
  – Domain, Lang Info, Locale Filter
  – Terminology, Translate
  – MT Confidence, Provenance
• Demonstrates that pre/post-processing wrapper scripts are sufficient to adapt a pre-existing MT system to the ITS 2.0 standard
• Benefits include integration of the MT system into the larger localization pipeline

Training Web Service
• Use of metadata info to train Statistical MT components (Translation & Lang Models)
  – Translate, Terminology
• Extract do-not-translate and named entity terms, and force-feed them into the training cycle
  – Significant improvement observed in translation accuracy
• Benefits include added consistency in translation across multiple documents

Web Service located at: http://srv-cngl.computing.dcu.ie/mlwlt/
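
The wrapper idea can be sketched as follows: a pre-processing step hides spans marked `translate="no"` behind placeholders, an ITS-unaware MT function translates the rest, and a post-processing step restores the spans. This is a minimal sketch under stated assumptions, not the DCU service's code: all function names are hypothetical, only simple non-nested `<span translate="no">` elements are handled, and upper-casing stands in for a real MT system.

```python
import re
from typing import Callable, Dict, Tuple

# Assumption: do-not-translate content appears as simple, non-nested spans.
NO_TRANSLATE = re.compile(r'<span translate="no">(.*?)</span>', re.DOTALL)


def preprocess(html: str) -> Tuple[str, Dict[str, str]]:
    """Replace do-not-translate spans with opaque placeholders."""
    protected: Dict[str, str] = {}

    def stash(match):
        key = f"__ITS{len(protected)}__"
        protected[key] = match.group(0)
        return key

    return NO_TRANSLATE.sub(stash, html), protected


def postprocess(translated: str, protected: Dict[str, str]) -> str:
    """Put the protected spans back into the MT output."""
    for key, original in protected.items():
        translated = translated.replace(key, original)
    return translated


def translate_with_its(html: str, mt: Callable[[str], str]) -> str:
    """Wrap an ITS-unaware MT function with pre- and post-processing."""
    masked, protected = preprocess(html)
    return postprocess(mt(masked), protected)


# Toy stand-in for an MT system: upper-casing leaves the placeholders intact.
result = translate_with_its(
    'Click <span translate="no">Save</span> now.', str.upper
)
print(result)
```

The MT function itself never sees ITS markup, which is the point of the slide: the adaptation lives entirely in the two wrapper steps.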

Page 14: Implementation Basket

W3C ITS in the Okapi Framework

• Open-source and cross-platform set of libraries and tools for building localization processes.
• Offers ITS support for XML, HTML5 and XLIFF, as well as in many components: Quality Check, Term Extraction, Microsoft Batch Translation, Enrycher, LanguageTool, etc.
• Makes adoption of ITS easy for developers and immediate for Okapi’s tools users.
• Continuing work after the MLW-LT project.

Page 15: Implementation Basket

ITS and XLIFF in a full roundtrip test bed

Diagram labels:
• Components: Source CMS; Target CMS; RDF provenance store; Named Entity Recogniser; Term Annotator; Web-based PE; CAT; XLIFF store; Parse, filter, segment; QA viewer; MT - Matrex; MT - Bing; MT – M4LOC; Workflow Management; Services Brokers (MT, TA, CAT, …)
• Formats and interfaces: ITS+HTML5+CMIS; ITS+XLIFF 1.2 & 2.0; ITS+XLIFF; ITS+SPARQL; XLIFF/PROV-O

Page 16: Implementation Basket

ITS 2.0 for Global Intelligent Content

Linked Data and Multilingual Content Processing

Multilingual Content Interoperability

New FP7: FALCON (www.falcon-project.eu)

New FP7: LIDER (www.lider-project.eu)

Diagram: the roundtrip test bed from Page 15, repeated with the same component and format labels.

Page 17: Implementation Basket

Preview of ITS 2.0 Metadata in Web Browsers
(Part of the Multilingual Web-LT Program)

COMPLEX METADATA AT YOUR FINGERTIPS:
Part of Work in Context Solution (WICS) from Logrus

Page 18: Implementation Basket

(Automatically) process W3C ITS 2.0 enhanced content (2/2)

• Capturing ITS 2.0 metadata: VistaTEC. Presenter: Phil Ritchie, separate slot

• Localization CMS / TMS / MT integration: Linguaserve. Presenter: Pedro Díez Orzas, separate slot

Page 19: Implementation Basket

WHAT WILL OR MAY COME NEXT?

Page 20: Implementation Basket

What will or may come next?

• Standardization break – let’s use W3C ITS 2.0 and gather experience!
• Outreach involving ordinary Web (content) developers – “ITS 2.0 for everybody”
• Strengthen the bridge to the Semantic Web: via e.g. ITS2<>NIF conversion (Sebastian Hellmann poster), FALCON (Dave Lewis poster), LIDER (Asunción Gómez Pérez presentation)

Page 21: Implementation Basket

What will or may come next?

• Further contributions to the development of multilingual services and data analytics technologies – a long and open list of ideas
  – Mining provenance information for business analytics, “Terminology-Translation-Web technology” triangle, multilingual technologies for multimedia content, ...
• We are looking for your ideas & thoughts – let’s discuss here at META-FORUM

Page 22: Implementation Basket

Implementation Basket

Moderator: Felix Sasaki (DFKI / W3C Fellow)

