Post on 26-Dec-2015

Copyright Antidot™ 1

Linked Enterprise Data

LEVERAGING THE SEMANTIC WEB STACK IN A CORPORATE ENVIRONMENT

ISWC 2012 – BOSTON
FABRICE LACROIX – LACROIX@ANTIDOT.NET

Antidot – who we are

French-based software vendor
Since 1999 | Paris, Lyon, Aix-en-Provence
Information access | Data management

Mission: Provide our customers with innovative customizable solutions that help them create value with their data, and make their employees more aware and efficient.

Unstructured documents

files, ECM, collaborative spaces
intranet, extranet, Web sites
e-mails, instant messaging

Structured data

CRM, ERP, directory
knowledge bases
business applications (production, support)

IS are bloated

1 practice => 1 need => 1 application => 1 silo
The information system is driven by the process
Data are numerous, varied and scattered

Solutions or workarounds?

BI MDM

SOA Search

Solutions and workarounds

Enterprise Search brings little value to users
  Document oriented
  Does not solve real business problems

Google-like | Verity-like

What we want

(Diagram labels: LDAP, CRM, Production, ERP, ECM, Files, Support)

Changing the paradigm

Switching from an application view to a data-centric way of thinking.

Bring out the implicit

Build the Giant Enterprise Graph

LED

Linked Enterprise Data
  application of Semantic Web technologies and Linked Data principles to the enterprise infrastructure

What works for the Web…

Federating silos on the Web

http://www.w3.org/People/Ivan/CorePresentations/RDFTutorial/Slides.html#(102)

…can’t always be used in corporate IS

Legacy apps can’t be "Sparql’ed"
80% of the data is un- or semi-structured and doesn’t fit in the model as such
Defining vocabularies/ontologies for silos is too complex and expensive
People don’t want RDF per se but valuable information
External data is available in XML/JSON through Web Services
Staff are trained for RDB, XML, Web apps
No-risk and stability strategy: SemWeb technology is considered new and immature

The RDF/storage approach

Setting up a global RDF repository does not work either
IT departments are afraid of the "RDF everywhere" activists

Semantic Web technology

still is the right solution in a corporate environment

BUT it is not an aim

JUST use it as a means

Just do it

Think of it as a stream paradigm
  build new objects using existing data
  without interfering with the existing infrastructure
  with SemWeb somewhere under the hood

Enterprise Graph HowTo

Construct the graph
  generate triples from data
  create triples from documents

Leverage the graph
  enrich
  infer

Browse the graph
  select resources
  build objects

Trash the graph

How: extract & normalize

Harvest and normalize as in an ETL
  fetch, clean, transform…
  normalize records (names, IDs) to prepare the linking step

For databases
  db2triples: an RDB2RDF implementation by Antidot (open source, W3C validated)
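
As an illustration of the normalization step, here is a minimal Python sketch that brings names and IDs from two silos to a comparable form before linking. The record layout, the company name and the normalization rules are assumptions made for the example, not part of the original slides or of db2triples.

```python
import re
import unicodedata


def normalize_name(raw: str) -> str:
    """Strip accents, collapse whitespace and lower-case a name."""
    decomposed = unicodedata.normalize("NFKD", raw)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"\s+", " ", no_accents).strip().lower()


def normalize_id(raw: str) -> str:
    """Keep only letters and digits, upper-cased."""
    return re.sub(r"[^0-9A-Za-z]", "", raw).upper()


# Two hypothetical records describing the same company in different silos.
crm_record = {"id": "cust-0042 ", "name": "  Étoile   Conseil "}
erp_record = {"id": "CUST0042", "name": "Etoile Conseil"}

same_id = normalize_id(crm_record["id"]) == normalize_id(erp_record["id"])
same_name = normalize_name(crm_record["name"]) == normalize_name(erp_record["name"])
print(same_id, same_name)
```

Once both silos agree on the normalized key, the later linking step can rely on plain equality instead of fuzzy matching.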

How: semantize

Don’t transform everything into RDF
  cherry-pick a subset of interesting fields for each object and create their RDF triple counterparts
  interesting == needed for linking or inferring

Semantize
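
The cherry-picking idea can be sketched as follows: only the fields declared as interesting become triples, everything else stays in the source system. The base URI, the field names and the predicates are hypothetical and chosen only for this example.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical base URI and field-to-predicate choices, for illustration only.
BASE = "http://example.org/"
INTERESTING_FIELDS = {
    "name": BASE + "vocab/name",
    "account_manager": BASE + "vocab/accountManager",
}


def semantize(kind: str, record: Dict[str, str]) -> List[Triple]:
    """Emit triples only for the fields needed for linking or inferring."""
    subject = f"{BASE}{kind}/{record['id']}"
    return [
        (subject, predicate, record[field])
        for field, predicate in INTERESTING_FIELDS.items()
        if record.get(field)
    ]


customer = {
    "id": "CUST0042",
    "name": "etoile conseil",
    "account_manager": "jdoe",
    "fax": "n/a",               # ignored: not needed for linking
    "notes": "long free text",  # ignored: stays in the source system
}

customer_triples = semantize("customer", customer)
for triple in customer_triples:
    print(triple)
```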

How: semantize

Triples generation
  Be smart: avoid upfront ontology design, use small vocabularies
  Be pragmatic: transform XML tags and field names to predicates
  Be agile: only insert what you need. And when you need more, add more.

Semantic Web fuels the modeling, linking and information building process
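
A possible reading of "transform XML tags to predicates" and "only insert what you need", sketched with Python's standard XML parser. The namespace, the sample ticket and the `wanted` parameter are assumptions for the example.

```python
import xml.etree.ElementTree as ET

BASE = "http://example.org/"  # hypothetical namespace

XML_RECORD = """
<ticket id="T-17">
  <customer>CUST0042</customer>
  <product>P-9</product>
  <status>open</status>
</ticket>
"""


def xml_to_triples(xml_text, wanted=None):
    """Turn each child element into a (subject, predicate, object) triple.

    The predicate is the tag name appended to a namespace; `wanted`
    restricts the output to the tags needed right now (None keeps all).
    """
    root = ET.fromstring(xml_text)
    subject = f"{BASE}{root.tag}/{root.get('id')}"
    triples = []
    for child in root:
        if wanted is not None and child.tag not in wanted:
            continue
        value = (child.text or "").strip()
        triples.append((subject, f"{BASE}vocab/{child.tag}", value))
    return triples


# Start small...
first_pass = xml_to_triples(XML_RECORD, wanted={"customer"})
# ...and add more predicates when a new need appears.
second_pass = xml_to_triples(XML_RECORD, wanted={"customer", "product"})
print(first_pass)
print(second_pass)
```

No ontology is designed up front here: the vocabulary grows one tag at a time, as the application asks for it.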

How: semantize

Unstructured documents
  Extract metadata and transform them as needed to RDF.
    ➡ Ex: author => dc:creator
  Use text mining to extract named entities: people, organizations, products…
    ➡ generate those entity lists using the data sources: directory for employees, CRM for companies and people, ERP for products
    ➡ create triples like doc_URI quotes entity_URI
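
The two steps above can be sketched in a few lines: metadata keys are mapped to predicates, and a list of known entities is matched against the text. The `quotes` predicate URI, the entity list and the sample document are hypothetical, and the whole-word lookup is a deliberately naive stand-in for real text mining.

```python
import re

DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"
QUOTES = "http://example.org/vocab/quotes"  # hypothetical predicate

# Mapping of document metadata keys to predicates (author => dc:creator).
METADATA_MAP = {"author": DC_CREATOR}

# Hypothetical entity list built from the data sources: surface form -> URI.
ENTITIES = {
    "jane doe": "http://example.org/employee/jdoe",            # directory
    "etoile conseil": "http://example.org/customer/CUST0042",  # CRM
    "afs": "http://example.org/product/AFS",                   # ERP
}


def semantize_document(doc_uri, metadata, text):
    """Return metadata triples plus one 'quotes' triple per entity found."""
    triples = [
        (doc_uri, METADATA_MAP[key], value)
        for key, value in metadata.items()
        if key in METADATA_MAP
    ]
    lowered = text.lower()
    for surface, entity_uri in ENTITIES.items():
        # Whole-word match only; real text mining would be far more robust.
        if re.search(r"\b" + re.escape(surface) + r"\b", lowered):
            triples.append((doc_uri, QUOTES, entity_uri))
    return triples


doc_triples = semantize_document(
    "http://example.org/doc/42",
    {"author": "jsmith", "pages": "3"},
    "Meeting notes: Jane Doe presented the AFS roadmap.",
)
for triple in doc_triples:
    print(triple)
```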

How: semantize

Unstructured documents
  Compare documents using various dedicated algorithms
    ➡ is the same
    ➡ is included
    ➡ is similar
    ➡ is related
  Generate new triples
    ➡ create triples like <docA> is_sub_version_of <docB>
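
The slides do not say which comparison algorithms are used. As one possible illustration, the sketch below uses a hash for "is the same", word-shingle containment for "is included" and Jaccard similarity for "is similar" ("is related" is left out). The predicate names, the threshold and the sample texts are assumptions.

```python
import hashlib

BASE = "http://example.org/vocab/"  # hypothetical predicates


def shingles(text, size=3):
    """Set of overlapping word n-grams used as a cheap document fingerprint."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)}
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def compare(uri_a, text_a, uri_b, text_b, similar_at=0.5):
    """Return at most one triple describing how document A relates to B."""
    digest_a = hashlib.sha256(text_a.encode("utf-8")).digest()
    digest_b = hashlib.sha256(text_b.encode("utf-8")).digest()
    if digest_a == digest_b:
        return [(uri_a, BASE + "is_same_as", uri_b)]
    a, b = shingles(text_a), shingles(text_b)
    if a < b:  # every shingle of A appears in B, and B has more
        return [(uri_a, BASE + "is_included_in", uri_b)]
    jaccard = len(a & b) / len(a | b)
    if jaccard >= similar_at:
        return [(uri_a, BASE + "is_similar_to", uri_b)]
    return []


draft = "the quarterly report covers sales in france"
final = "the quarterly report covers sales in france and germany"
variant = "the quarterly report covers sales in spain"

print(compare("ex:draft", draft, "ex:final", final))
print(compare("ex:draft", draft, "ex:variant", variant))
```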

How: enrich

Enrich the graph
  run specific algorithms to generate more links and triples (classifiers, topic detection, …)
  insert external data gathered from the LOD or other external datasets or APIs

How: infer

Create new knowledge
  add rules according to your needs

IF a coworker is quoted in documents
AND this coworker belongs to a business unit
THEN the business unit is bound to the documents
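
The rule on this slide can be written as a tiny forward-chaining step over a set of triples. This is only a sketch: the predicate names (`ex:quotes`, `ex:memberOf`, `ex:boundTo`) and the sample graph are hypothetical, and a real deployment would more likely use the rule language of its triplestore.

```python
QUOTES = "ex:quotes"
MEMBER_OF = "ex:memberOf"
BOUND_TO = "ex:boundTo"


def infer_business_unit_links(triples):
    """IF doc quotes coworker AND coworker memberOf unit THEN unit boundTo doc."""
    triples = set(triples)
    units_of = {}  # coworker -> set of business units
    for subject, predicate, obj in triples:
        if predicate == MEMBER_OF:
            units_of.setdefault(subject, set()).add(obj)
    inferred = set()
    for subject, predicate, obj in triples:
        if predicate == QUOTES:
            for unit in units_of.get(obj, ()):
                inferred.add((unit, BOUND_TO, subject))
    # Only report triples that were not already in the graph.
    return inferred - triples


graph = [
    ("ex:doc1", QUOTES, "ex:jdoe"),
    ("ex:doc2", QUOTES, "ex:jdoe"),
    ("ex:doc3", QUOTES, "ex:asmith"),  # no known business unit: nothing inferred
    ("ex:jdoe", MEMBER_OF, "ex:sales"),
]

new_triples = infer_business_unit_links(graph)
for triple in sorted(new_triples):
    print(triple)
```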

How: build

Build
  select resources corresponding to object seeds (using SPARQL queries)
  for each seed, follow links smartly in order to create basic objects

Build
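
A minimal sketch of the build step, assuming an in-memory list of triples. The `select_seeds` function is a plain Python stand-in for the SPARQL query mentioned on the slide, and "following links smartly" is reduced here to a whitelist of predicates and a depth limit. The graph content and predicate names are invented for the example.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"

GRAPH = [
    ("ex:cust42", RDF_TYPE, "ex:Customer"),
    ("ex:cust42", "ex:name", "Etoile Conseil"),
    ("ex:cust42", "ex:accountManager", "ex:jdoe"),
    ("ex:jdoe", RDF_TYPE, "ex:Employee"),
    ("ex:jdoe", "ex:name", "Jane Doe"),
    ("ex:jdoe", "ex:memberOf", "ex:sales"),
    ("ex:sales", "ex:name", "Sales"),
    ("ex:doc1", RDF_TYPE, "ex:Document"),
]


def index_by_subject(triples):
    """Group (predicate, object) pairs under their subject."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        index[subject].append((predicate, obj))
    return index


def select_seeds(triples, rdf_class):
    """Stand-in for: SELECT ?s WHERE { ?s rdf:type <rdf_class> }"""
    return sorted(s for s, p, o in triples if p == RDF_TYPE and o == rdf_class)


def build_object(index, seed, follow, depth=1):
    """Copy the seed's properties; expand resources reached via `follow`."""
    obj = {"uri": seed}
    for predicate, value in index.get(seed, []):
        if predicate in follow and depth > 0 and value in index:
            value = build_object(index, value, follow, depth - 1)
        obj.setdefault(predicate, []).append(value)
    return obj


index = index_by_subject(GRAPH)
customers = [
    build_object(index, seed, follow={"ex:accountManager"})
    for seed in select_seeds(GRAPH, "ex:Customer")
]
print(customers[0])
```

The result is an ordinary nested object (a customer with its account manager embedded), which is what downstream applications consume instead of raw triples.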

How: build

Finalize
  decorate the new knowledge objects with data set apart (not loaded in the triplestore)
  now we have rich user-actionable objects

Build Finalize
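
The finalize step could look like the sketch below, where fields that were never turned into triples are merged back into the built object. The `SIDE_STORE` dictionary, its keys and the sample values are hypothetical; in practice the data set apart could live in any store.

```python
def finalize(obj, side_store):
    """Return a copy of `obj` decorated with fields kept outside the triplestore."""
    extra = side_store.get(obj["uri"], {})
    decorated = dict(obj)
    # Never overwrite what was built from the graph.
    decorated.update({k: v for k, v in extra.items() if k not in decorated})
    return decorated


# Hypothetical side store: bulky or display-only fields never loaded as triples.
SIDE_STORE = {
    "ex:cust42": {"address": "1 rue de l'Exemple, Paris", "notes": "long free text"},
}

knowledge_object = {"uri": "ex:cust42", "ex:name": ["Etoile Conseil"]}
result = finalize(knowledge_object, SIDE_STORE)
print(result)
```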

How: expose

Make the new information available to users and to the entire IS

(Diagram labels: Harvest, Normalize, Semantize, Classify, Annotate, Enrich; Indexation (AFS search engine), RDF Triplestore (Linked Data), Relational DB)

Conclusion

It works!
  The triples we create and the inference rules we add are dictated by the goal / application
    ➡ usage and value oriented
  We benefit from the lazy-flexible-dynamic modeling of RDF-RDFS-OWL
    ➡ we are agile
  What matters is the graph. But the graph is not the triplestore
    ➡ storage independent

There’s an app for that

Antidot Information Factory
  a software solution designed specifically to
    leverage structured and unstructured data
    enable large-scale processing of existing data
    automate publishing of enriched or newly created information

Harvest Normalize Semantize Enrich Build Expose

The Giant Enterprise Graph

Now we have a path to let SemWeb enter the enterprise

THANKS FOR YOUR ATTENTION
QUESTIONS?

Discuss | Understand | Learn | Exchange

www.antidot.net
info@antidot.net