FREME Webinar for GALA – April 2016 WWW.FREME-PROJECT.EU 1
Co-funded by the Horizon 2020 Framework Programme of the European Union. Grant Agreement Number 644771
FREME WEBINAR HELD FOR GALA, 28 APRIL 2016
A FRAMEWORK FOR MULTILINGUAL AND SEMANTIC ENRICHMENT OF DIGITAL CONTENT (NEW L10N BUSINESS OPPORTUNITIES)
www.freme-project.eu Presented by Tatjana Gornostaja (Tilde) and Felix Sasaki (DFKI / W3C Fellow)
OVERVIEW
• Introduction
• Technological aspects of the framework
• Localization and other FREME business cases
• Q&A
Coupling
Knowledge and Language
via e-Service Ecosystem
Knowledge Language
FREME
Picture: coloringpageswallpaper.com
THE FREME PROJECT
• Two-year H2020 Innovation Action; start February 2015
• Industry partners leading four business cases around digital content and (linked) data
• Technology development bridging language and data
• Outreach and business modelling demonstrating monetization of the multilingual data value chain
CURRENT STATE OF SOLUTIONS
Machine translation, terminology
annotation, ...
Linked data creation & processing
GAPS THAT HINDER BUSINESS:
• Plethora of formats
• Adaptability and platform dependency
• Language coverage
• Usability
“The right tool for the right person in given and new enterprises”: technology influences job profiles
FREME TO THE RESCUE: ENRICHING DIGITAL CONTENT
Machine translation, terminology
annotation, ...
Linked data creation & processing
LT and LD as first class citizens on the Web
A SET OF INTERFACES* - DESIGN DRIVEN BY BUSINESS CASES
LT and LD for various user types: (application) developer, content architect, content author, …
* Graphical interfaces
* Software interfaces
FREME FROM A TECHNICAL PERSPECTIVE
A framework for multilingual and semantic enrichment of digital content that provides access via a set of APIs and GUIs to six e-Services:
• e-Entity for enriching content with information on named entities;
• e-Link for enrichment with linked data sources;
• e-Terminology for detecting terms and enriching them with term related information;
• e-Translation for providing custom machine translation systems;
• e-Internationalisation for processing a variety of digital content formats; and
• e-Publishing for exporting the outcome of enrichment processes in the ePub format.
FREME FROM A TECHNICAL PERSPECTIVE
How to access FREME – several options:
• A live version 0.5 (0.6 soon to be released!) including documentation at http://api.freme-project.eu/doc/current/
• A development version at http://api-dev.freme-project.eu/doc/
• A Java / Maven software package; see the documentation for installation instructions
• Source code in a GitHub project https://github.com/freme-project/
• The framework is available under Apache 2.0 license to ease commercial use
• Underlying services have various licensing conditions
LINGUISTIC LINKED DATA AND OTHER STANDARDS PUT IN ACTION VIA FREME
• NIF (Natural Language Processing Interchange Format) for representing digital content and enrichment information in a format agnostic manner, based on the linked data stack;
• OntoLex lemon for representing lexical information, to be used e.g. for improving machine translation output;
• Internationalization Tag Set 2.0 for representing various types of enrichment information in a standardized manner, related e.g. to terminology and named entities; and
• The general linked data technology stack (RDF, SPARQL etc.)
FREME is built on outcomes of standards-driving FP7 projects in the area of linguistic linked data: LIDER and FALCON
Cf. http://lider-project.eu/ and http://falcon-project.eu/
EXAMPLE API CALL
• The request is made to the API for the e-Entity service, a service that enriches content with named entities.
• The input format of the content is plain text; the output format is Turtle.
• The content to enrich is “Welcome to the city of Prague”.
• The language of the content is English.
• The dataset used for the enrichment is DBpedia.
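The request described above can be sketched in Python. This is a minimal illustration, not the documented call: the base URL, the endpoint path and the parameter names (informat, outformat, language, dataset) are assumptions and should be checked against the FREME API documentation. The sketch only prepares the request and does not send it.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumptions for illustration only: base URL and endpoint path are not
# taken from the slides; consult the FREME API documentation.
BASE_URL = "http://api.freme-project.eu/current"
E_ENTITY_PATH = "/e-entity/freme-ner/documents"


def build_e_entity_request(text: str, language: str, dataset: str,
                           informat: str = "text",
                           outformat: str = "turtle") -> Request:
    """Prepare (but do not send) a POST request for the e-Entity service."""
    # Parameter names are assumed; they mirror the four settings on the slide.
    query = urlencode({
        "informat": informat,
        "outformat": outformat,
        "language": language,
        "dataset": dataset,
    })
    return Request(
        url=f"{BASE_URL}{E_ENTITY_PATH}?{query}",
        data=text.encode("utf-8"),  # the content to enrich is the request body
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )


request = build_e_entity_request("Welcome to the city of Prague", "en", "dbpedia")
print(request.get_method(), request.full_url)
```

Passing the prepared object to urllib.request.urlopen would perform the actual call; the response body would then be the Turtle output discussed on the following slides.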
EXAMPLE OUTPUT: USING NIF TO STORE CONTENT …
(1) <http://freme-project.eu/#char=0,29>
(2) a nif:String , nif:Context , nif:RFC5147String ;
(3) nif:beginIndex "0"^^xsd:int ;
(4) nif:endIndex "29"^^xsd:int ;
(5) nif:isString "Welcome to the city of Prague"^^xsd:string .
1) Identifying the content via a URI
2) Adding certain types from NIF*
3) Identifying the start offset of the content
4) Identifying the end offset of the content
5) Providing the string content itself
* For more on NIF, see a dedicated tutorial: http://de.slideshare.net/m1ci/nif-tutorial
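The relationship between the URI, the offsets and the string can be checked with a short Python sketch. The helper name nif_context and the dictionary layout are hypothetical; only the property names and the #char=begin,end URI pattern come from the example above.

```python
def nif_context(text: str, prefix: str = "http://freme-project.eu/") -> dict:
    """Describe a whole string as a NIF context (hypothetical helper)."""
    begin, end = 0, len(text)  # the context always spans the full string
    return {
        "uri": f"{prefix}#char={begin},{end}",  # fragment pattern as in line (1)
        "nif:beginIndex": begin,
        "nif:endIndex": end,
        "nif:isString": text,
    }


context = nif_context("Welcome to the city of Prague")
print(context["uri"])
```

The end index equals the length of the string in characters, which is why the example content of 29 characters is identified by char=0,29.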
… AND ENRICHMENT INFORMATION
(1) <http://freme-project.eu/#char=23,29> …
(2) nif:anchorOf "Prague"^^xsd:string ;
(3) nif:beginIndex "23"^^xsd:int ;
(4) nif:endIndex "29"^^xsd:int ;
(5) nif:referenceContext <http://freme-project.eu/#char=0,29> ;
(6) itsrdf:taClassRef <http://dbpedia.org/ontology/City>.
1) Identifying the annotation via a URI
2) Providing the string content of the annotation
3) Identifying the start offset of the annotation
4) Identifying the end offset of the annotation
5) Relating the annotation to the content it occurs in
6) Enrichment with ITS 2.0 class information (“Prague” = a city)
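How the annotation offsets follow from the content can be sketched as below. The function nif_annotation is a hypothetical helper that writes Turtle-like lines by plain string formatting; it is not how FREME produces its output, it only locates the first occurrence of the anchor, and it does not escape special characters in literals.

```python
def nif_annotation(context_text: str, anchor: str, class_ref: str,
                   prefix: str = "http://freme-project.eu/") -> str:
    """Format one annotation in the style of the slide (hypothetical helper)."""
    begin = context_text.index(anchor)  # first occurrence; ValueError if absent
    end = begin + len(anchor)           # end offset is exclusive
    context_uri = f"{prefix}#char=0,{len(context_text)}"
    lines = [
        f"<{prefix}#char={begin},{end}>",
        f'    nif:anchorOf "{anchor}"^^xsd:string ;',
        f'    nif:beginIndex "{begin}"^^xsd:int ;',
        f'    nif:endIndex "{end}"^^xsd:int ;',
        f"    nif:referenceContext <{context_uri}> ;",
        f"    itsrdf:taClassRef <{class_ref}> .",
    ]
    return "\n".join(lines)


turtle = nif_annotation(
    "Welcome to the city of Prague",
    "Prague",
    "http://dbpedia.org/ontology/City",
)
print(turtle)
```

Slicing the context string with the two offsets returns the anchor text again, which is a quick consistency check a consumer of NIF output can apply.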
SIMPLIFIED OUTPUT HELPS API DEVELOPERS TO CONSUME LINKED DATA
• FREME provides a user-specified filter mechanism to simplify the output
• Supports CSV, XML or JSON
• Example output as CSV
http://dbpedia.org/resource/Prague,50.0878367932108,14.4241322001241
For more information on filtering, see
http://api.freme-project.eu/doc/current/knowledge-base/filtering.html
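Consuming the simplified CSV output needs only the standard library. The sketch below assumes that the three columns are a resource URI followed by latitude and longitude, and that there is no header row; neither is stated on the slide, and the actual columns depend on the filter that was defined.

```python
import csv
import io

# The example line from the slide, as it might arrive in a response body.
csv_output = "http://dbpedia.org/resource/Prague,50.0878367932108,14.4241322001241\n"


def parse_filtered_csv(payload: str) -> list:
    """Parse rows of (resource, latitude, longitude); column meaning is assumed."""
    rows = []
    for resource, latitude, longitude in csv.reader(io.StringIO(payload)):
        rows.append((resource, float(latitude), float(longitude)))
    return rows


rows = parse_filtered_csv(csv_output)
for resource, latitude, longitude in rows:
    print(resource, latitude, longitude)
```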
FORMAT COVERAGE
• Processing of various content formats
◦ NIF, RDF, Text, HTML, OpenOffice, XLIFF 1.2, various XML formats, …
• Many formats are processed via the e-Internationalisation service
• The format is specified in the API call for the input and (partially supported) for the output
USING E-TERMINOLOGY WITH HTML OUTPUT
<!DOCTYPE html> …
<body>
<p>Welcome to the city of Prague.</p>
</body> … </html>
<!DOCTYPE html> …
<p>Welcome to the <span its-term="yes">city</span> of Prague.…</html>
Call of e-Terminology
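A client that receives HTML enriched in this way can collect the marked terms with the standard library parser. The sketch below is illustrative: the class name TermCollector is hypothetical, the sample document is a completed version of the elided slide snippet, and void elements such as br inside a term are not handled.

```python
from html.parser import HTMLParser


class TermCollector(HTMLParser):
    """Collect the text of elements carrying its-term="yes" (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.terms = []
        self._depth = 0    # nesting depth inside the current term element
        self._buffer = []  # text pieces of the current term

    def handle_starttag(self, tag, attrs):
        if self._depth:
            self._depth += 1  # nested element inside a term
        elif dict(attrs).get("its-term") == "yes":
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if self._depth:
            self._depth -= 1
            if self._depth == 0:
                self.terms.append("".join(self._buffer))

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


# Assumed complete form of the enriched output shown on the slide.
enriched_html = (
    "<!DOCTYPE html><html><body>"
    '<p>Welcome to the <span its-term="yes">city</span> of Prague.</p>'
    "</body></html>"
)

collector = TermCollector()
collector.feed(enriched_html)
collector.close()
print(collector.terms)
```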
TRANSLATING XLIFF CONTENT WITH E-TRANSLATION
...<trans-unit>
<source>This is a car</source>
</trans-unit> ...
<http://freme-project.eu/#char=0,13>
nif:isString "This is a car"@en ;
itsrdf:target "Dies ist ein Auto"@de .
Call of e-Translation
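The round trip from XLIFF source segments to translated targets can be sketched as follows. The translation step is a stand-in lookup table, not a call to e-Translation, and the sample file (attributes, id values, file name) is invented for illustration; only the XLIFF 1.2 namespace and the element names are standard. Segments containing inline markup would need more careful handling than source.text.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
ET.register_namespace("", XLIFF_NS)  # serialize without a namespace prefix

# Invented sample document built around the segment from the slide.
SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="demo.txt" source-language="en" target-language="de" datatype="plaintext">
    <body>
      <trans-unit id="1">
        <source>This is a car</source>
      </trans-unit>
    </body>
  </file>
</xliff>"""


def add_targets(xliff_text: str, translate) -> str:
    """Append a <target> to every trans-unit using the given translate function."""
    root = ET.fromstring(xliff_text)
    for unit in root.iter(f"{{{XLIFF_NS}}}trans-unit"):
        source = unit.find(f"{{{XLIFF_NS}}}source")
        target = ET.SubElement(unit, f"{{{XLIFF_NS}}}target")
        target.text = translate(source.text)
    return ET.tostring(root, encoding="unicode")


def fake_translate(text: str) -> str:
    """Stand-in for a machine translation call (hypothetical)."""
    return {"This is a car": "Dies ist ein Auto"}[text]


translated = add_targets(SAMPLE, fake_translate)
print(translated)
```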
IMPROVING E-TRANSLATION OUTPUT VIA E-TERMINOLOGY
“The EU in brief. The EU is a unique economic and political partnership between 28 European countries that together cover much of the
continent.”
continent, partnership, briefing, economics, covering
Call of e-Terminology: detection of translation suggestions
De voorschriften in DE EU. De EU is een uniek partnerschap tussen politiek en economie in de Europese landen, die gezamenlijk 28
verpakking van het continent.
Call of e-Translation: improved output!
MOTIVATION
• Aid translators
◦ Supplement typical linguistic support tools like glossary look-up with entity recognition and term disambiguation
◦ Possibility to introduce proprietary and domain-specific semantic datasets
• Provide “Value-Add” to customers
◦ Make their content more interactive, compelling and discoverable
◦ Open up service offerings to new customers from existing and new channels
TRANSLATOR SUPPORT
• Automatic machine translation suggestions
• Automatic terminology look-up
◦ Includes definitions
• Automatic Entity Recognition
◦ Includes many textual and visual contextual properties: descriptions, images, links to other resources…
CUSTOMER VALUE-ADD
• Relationships can be formed between new content and existing knowledge resources
• Utilize open and private Multilingual Linked Data Cloud
DBpedia
Proprietary dataset
Translated content
BUSINESS BENEFITS
• Technological Support to Content Authors and Localizers
◦ Aid with the cognitive and physical tasks of finding and employing the most appropriate terminology
• Opens up Conversations with New Customers
• Deliver semantically richer, more interactive, highly sociable and discoverable content
◦ Through integration, enrichment added automatically can be validated by a human and saved with the content
• Demonstrates Vistatec thought leadership to customers looking for service differentiators and value add
CHALLENGE AND OPPORTUNITY: BIG DATA IS GROWING ACROSS LANGUAGES, SECTORS AND DOMAINS
• BC: Digital publishing
• BC: Translation and localisation
• BC: Agriculture and food domain data
• BC: Web site personalisation
Agriculture metadata, user content, news
content, …
WHAT LIES AHEAD FOR SEVERAL INDUSTRIES? SEE THE FREME BUSINESS CASES
EN
ES
JA, ZH, ...
AR
DIGITAL PUBLISHING
With a simple click you can fetch extra information from a dataset and use it to annotate content.
AGRICULTURE AND FOOD DATA
Domain experts can automatically extract terms from title, description, abstracts and full text.
PERSONALISATION OF WEB CONTENT
Businesses can identify the topics their customers are engaging with, focusing their global content strategy.
CONTACTS
E-mail:
CONSORTIUM