ECM and Semantic Web - Meetupfiles.meetup.com/1336198/semantic-meetup-boston-nuxeo...3 Nuxeo, in...

Post on 14-Aug-2020

1

ECM and Semantic Web

En route to Semantic ECM

Roland Benedetti, Olivier Grisel, Stefane Fermigier

2

Agenda

• Nuxeo in short

• Nuxeo platform, overview

• Why integrate the semantic web with Nuxeo?

• The Scribo & IKS projects

• Demo & work delivered

• What’s next?

3

Nuxeo, in short

• Open Source Enterprise Content Management provider with a global install base

• Our focus is Enterprise Content Management, not an on-ramp to another core offering

• ECM as a platform for content applications: technically superior, extensible, plug-in friendly architecture

• Current ECM platform fully based on the Java environment (technology refresh in 2005)

• Open Source as an efficient development model: open development process; freely available ECM platform and components to download, deploy and extend; trick-free, business-friendly licensing; innovation driven by a community of customers, partners and our core developers

• 10 years old; Paris, Boston, San Francisco; 50+ employees

Nuxeo ECM - Our Approach

[Diagram: layered product stack, listed here from the bottom layer up]

• Content Infrastructure: Nuxeo Core, a lightweight, scalable, embeddable content repository

• Platform: Nuxeo Enterprise Platform, a complete set of components covering all aspects of ECM

• Horizontal Packages: Document Management, Digital Asset Management, Case Management Framework, Structured Document Server, Content Aggregator

• Business Solutions: Correspondence Management, Contracts Management, Invoice Processing, Records Management

• Industries: Construction, Media, Government, Life Sciences

5

Nuxeo, from ECM ...

... to Semantic ECM

7

In 2010, semantic technologies and data are available, but Enterprise Content Management providers do not really use them.

Let’s put them to use!

8

Goals for Semantic ECM

• Repurpose existing content better

• Improve search and collaboration

• Make information more contextual

• Extract and use information from content

• Leverage Open and Linked Data, contribute

• Make ECM users’ content smarter!

• Outcome: gain efficiency, effectiveness and strategic positioning in the ECM market

9

Demo

Scribo project

• Project under the French FUI program, with 9 partners and a budget of 4.7 MEUR

• Goal: to develop algorithms and collaborative tools for extracting knowledge from unstructured documents and images

• Started in 2008, finishing in Dec. 2010, with results already integrated as a Nuxeo plugin

10

11

IKS project

• European project under FP7, with 13 partners (6 SMEs) and an 8.5 MEUR budget

• Goal: create a semantic software “stack” that will be used by CMS vendors to add semantic features to their products

• Started in Jan. 2009, will last until Dec. 2012

• First tangible result: FISE, already integrated in a Nuxeo plugin

12

The Semantic Engine

• From unstructured content to Knowledge

• Language guessing

• Topic classification (Business, Sports, Media, ...)

• Named Entities extraction and linking

• Relationships and properties extraction
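To make the idea of chained engines concrete, here is a toy sketch of how two of the steps listed above (language guessing and named entity linking) could each add annotations to a piece of content. It is not the FISE implementation: the stop-word lists, the gazetteer and its URIs, and the names ContentItem, guess_language, link_entities and enhance are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical stop-word lists; a real language guesser relies on
# statistical models rather than a handful of words.
STOPWORDS = {
    "en": {"the", "and", "of", "is", "in", "to"},
    "fr": {"le", "la", "et", "de", "est", "les"},
}

# Hypothetical gazetteer: surface form -> (entity type, illustrative URI).
GAZETTEER = {
    "Paris": ("Place", "http://dbpedia.org/resource/Paris"),
    "Nuxeo": ("Organisation", "http://dbpedia.org/resource/Nuxeo"),
}


@dataclass
class ContentItem:
    """A piece of text plus the annotations added by each engine."""
    text: str
    annotations: dict = field(default_factory=dict)


def guess_language(item):
    """Pick the language whose stop words occur most often in the text."""
    words = [w.strip(".,!?").lower() for w in item.text.split()]
    scores = {
        lang: sum(w in stops for w in words)
        for lang, stops in STOPWORDS.items()
    }
    item.annotations["language"] = max(scores, key=scores.get)


def link_entities(item):
    """Find gazetteer names in the text and attach their type and URI."""
    found = []
    for name, (entity_type, uri) in GAZETTEER.items():
        if name in item.text:
            found.append({"mention": name, "type": entity_type, "uri": uri})
    item.annotations["entities"] = found


def enhance(text, engines=(guess_language, link_entities)):
    """Run every engine in turn over the same content item."""
    item = ContentItem(text)
    for engine in engines:
        engine(item)
    return item


item = enhance("Nuxeo is based in Paris and in Boston.")
print(item.annotations)
```

Each engine only reads the text and writes its own annotation key, so further engines (topic classification, relationship extraction) could be appended to the tuple without changing the others.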

13

14

15

RESTful is Beautiful
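As a rough illustration of what a RESTful call to a semantic engine can look like from client code, the sketch below builds (but does not send) an HTTP POST carrying plain text. The host, port, the /engines path and the requested RDF media type are assumptions made for this example and should be checked against the documentation of the server actually deployed; build_enhance_request is a hypothetical helper name.

```python
import urllib.request


def build_enhance_request(text, base_url="http://localhost:8080/engines"):
    """Build a POST request that submits plain text for enhancement.

    base_url is an assumed endpoint, not a documented one.
    """
    return urllib.request.Request(
        base_url,
        data=text.encode("utf-8"),
        headers={
            "Content-Type": "text/plain; charset=utf-8",
            # Ask for the extracted annotations as RDF (assumed media type).
            "Accept": "application/rdf+xml",
        },
        method="POST",
    )


request = build_enhance_request("Nuxeo has offices in Paris and Boston.")
print(request.get_method(), request.full_url)

# Sending the request requires a running server, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     rdf = response.read().decode("utf-8")
```

Because the exchange is just text in and RDF out over HTTP, any client that can issue a POST request can use the engine, whatever language it is written in.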

16

17

18

Apache Stanbol = FISE + fast local Linked Data index + semantic rule engine + more?

19

[Diagram: local IT infrastructure (LAN)]

1. Nuxeo DM with the semantic addon

2. Apache Stanbol, running Engine 1, Engine 2, Engine 3

3. Data sources: DBpedia, Freebase, Geonames, LDAP

20

Next?

21

Mining Wikipedia in the cloud with Hadoop and Pig to improve Natural Language Processing efficiency and get better results when extracting Named Entities

• http://blogs.nuxeo.com/dev/2011/01/mining-wikipedia-with-hadoop-and-pig-for-natural-language-processing.html

22

Mining Wikipedia in the cloud with Hadoop and Pig to improve Natural Language Processing efficiency and get better results when extracting Named Entities

• Wikipedia as a knowledge base on which to train our NLP system

• DBpedia to identify, for the NLP system, the entities mentioned in Wikipedia

• OpenNLP to turn this into training material

• Apache Hadoop for distributed processing

• Apache Pig for scripting the processing, and Apache Whirr for deployment and management on an Amazon EC2 cluster
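The core idea of the pipeline above, turning Wikipedia links into training sentences, can be sketched on a single sentence. The example below is a simplified, single-machine illustration and not the code from the blog post: it converts wiki links into the `<START:type> ... <END>` markup used by OpenNLP name finder training data. The ENTITY_TYPES table is a hypothetical stand-in for the type information that DBpedia provides, and to_opennlp_sentence is a name invented for this sketch.

```python
import re

# Hypothetical lookup from Wikipedia page title to entity type; in the
# pipeline described above, this information would come from DBpedia.
ENTITY_TYPES = {
    "Paris": "location",
    "Victor Hugo": "person",
}

# Matches [[Target]] and [[Target|anchor text]] wiki links.
WIKI_LINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")


def to_opennlp_sentence(wiki_sentence, entity_types=ENTITY_TYPES):
    """Rewrite wiki links as OpenNLP name finder training annotations.

    Links whose target has a known type become <START:type> ... <END>;
    all other links are reduced to their visible text.
    """
    def replace(match):
        target = match.group(1).strip()
        anchor = (match.group(2) or target).strip()
        entity_type = entity_types.get(target)
        if entity_type is None:
            return anchor
        return f"<START:{entity_type}> {anchor} <END>"

    return WIKI_LINK.sub(replace, wiki_sentence)


print(to_opennlp_sentence(
    "[[Victor Hugo|Hugo]] lived in [[Guernsey]] and died in [[Paris]] ."
))
```

At Wikipedia scale the same per-sentence transformation would be applied in parallel across the whole dump, which is where Hadoop and Pig come in.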

23

Resources

• http://iks-project.eu

• http://stanbol.demo.nuxeo.com

• http://incubator.apache.org/stanbol

• http://blogs.nuxeo.com/dev

• http://hadoop.apache.org/

• http://incubator.apache.org/opennlp/

• http://incubator.apache.org/projects/whirr.html