
Copyright Antidot™

Linked Enterprise Data

LEVERAGING THE SEMANTIC WEB STACK IN A CORPORATE ENVIRONMENT

ISWC 2012 – BOSTON
FABRICE LACROIX – LACROIX@ANTIDOT.NET

Antidot – who we are

French-based software vendor
Since 1999 | Paris, Lyon, Aix-en-Provence
Information access | Data management

Mission: Provide our customers with innovative customizable solutions that help them create value with their data, and make their employees more aware and efficient.

Clients
  Publishing
  Healthcare
  Enterprises
  E-commerce

Unstructured documents
  files, ECM, collaborative spaces
  intranet, extranet, Web sites
  e-mails, instant messaging

Structured data
  CRM, ERP, directory
  knowledge bases
  business applications (production, support)

Information systems (IS) are bloated

1 practice => 1 need => 1 application => 1 silo
The information system is driven by the process
Data are numerous, varied and scattered

Solutions or workarounds?

BI | MDM | SOA | Search

Solutions and workarounds

Enterprise Search brings little value to users
  Document oriented
  Does not solve real business problems

Google-like | Verity-like

What we want

[Figure labels: Production, Files, Support]

Changing the paradigm

Switching from an application view to a data-centric way of thinking.

Bring out the implicit

Build the Giant Enterprise Graph

Linked Enterprise Data: the application of Semantic Web technologies and Linked Data principles to the enterprise infrastructure

What works for the Web…

Federating silos on the Web

http://www.w3.org/People/Ivan/CorePresentations/RDFTutorial/Slides.html#(102)

…can’t always be used in corporate IS

Legacy apps can’t be "Sparql’ed"
80% of the data is un- or semi-structured and doesn’t fit in the model as such
Defining vocabularies/ontologies for silos is too complex and expensive
Companies don’t want RDF per se but valuable information
External data is available in XML/JSON through Web Services
Staff are trained for RDB, XML, Web apps
No-risk and stability strategy: SemWeb technology is considered new and immature

The RDF/storage approach

Setting up a global RDF repository does not work either
IT departments are afraid of the "RDF everywhere" activists

Semantic Web technology is still the right solution in a corporate environment

BUT it is not an aim

JUST use it as a means

Just do it

Think of it as a stream paradigm
  build new objects using existing data
  without interfering with the existing infrastructure
  with SemWeb somewhere under the hood

Enterprise Graph HowTo

Construct the graph
  generate triples from data
  create triples from documents
Leverage the graph
  enrich
  infer
Browse the graph
  select resources
  build objects
Trash the graph

How: extract & normalize

Harvest and normalize as in an ETL
  fetch, clean, transform…
  normalize records (names, IDs) to prepare the linking step

For databases
  db2triples: an RDB2RDF implementation by Antidot (open source, W3C validated)
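
The normalization of names and IDs mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, the sample values and the helper names normalize_name and normalize_id are hypothetical and do not come from the talk or from db2triples.

```python
import re
import unicodedata


def normalize_name(raw: str) -> str:
    """Return a lowercase, accent-free key with the name's words sorted."""
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", raw)
    ascii_only = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Treat punctuation and whitespace runs as a single separator.
    cleaned = re.sub(r"[^a-z0-9]+", " ", ascii_only.lower())
    # Sort the words so "Last, First" and "First Last" give the same key.
    return " ".join(sorted(cleaned.split()))


def normalize_id(raw: str) -> str:
    """Return an identifier with separators removed and letters uppercased."""
    return re.sub(r"[^A-Za-z0-9]", "", raw).upper()


# The same person as two hypothetical silos might store her.
crm_record = {"name": "  Dupont,  Hélène ", "employee_id": "emp-0042"}
directory_record = {"name": "Helene DUPONT", "employee_id": "EMP 0042"}

same_person = (
    normalize_name(crm_record["name"]) == normalize_name(directory_record["name"])
    and normalize_id(crm_record["employee_id"])
    == normalize_id(directory_record["employee_id"])
)
```

Once both silos yield the same key, the later linking step can treat the two records as one resource.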

How: semantize

Don’t transform everything into RDF
  cherry-pick a subset of interesting fields for each object and create their RDF triple counterparts
  interesting == needed for linking or inferring


How: semantize

Triples generation
  Be smart: avoid upfront ontology design, use small vocabularies
  Be pragmatic: transform XML tags and field names to predicates
  Be agile: only insert what you need. And when you need more, add more.
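
As a rough illustration of cherry-picking fields and turning field names into predicates, the sketch below represents triples as plain Python tuples. The namespace http://example.org/, the field selection and the function record_to_triples are assumptions made for the example, not part of the talk.

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical namespace and field selection; neither comes from the talk.
BASE = "http://example.org/"
INTERESTING_FIELDS = {"name", "business_unit", "manager_id"}


def record_to_triples(kind: str, key: str, record: Dict[str, str],
                      fields: Iterable[str] = INTERESTING_FIELDS) -> List[Triple]:
    """Turn the selected fields of one record into (subject, predicate, object) triples."""
    subject = f"{BASE}{kind}/{key}"
    triples = []
    for field in sorted(fields):
        value = record.get(field)
        if value:  # skip missing or empty fields
            # The field name itself becomes the predicate: no upfront ontology.
            triples.append((subject, f"{BASE}vocab/{field}", value))
    return triples


employee = {
    "name": "Helene Dupont",
    "business_unit": "R&D",
    "phone": "555-0100",  # not needed for linking or inferring, so ignored
    "manager_id": "",
}
triples = record_to_triples("employee", "EMP0042", employee)
```

Adding a field later only means adding its name to the selection, which matches the "when you need more, add more" approach.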

Semantic Web fuels the modeling, linking and information building process


How: semantize

Unstructured documents
  Extract metadata and transform it as needed to RDF.
  ➡ Ex: author => dc:creator
  Use text mining to extract named entities: people, organizations, products…
  ➡ generate those entity lists using the data sources: directory for employees, CRM for companies and people, ERP for products
  ➡ create triples like doc_URI quotes entity_URI
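
A minimal sketch of the entity-list idea: a dictionary of known labels, as it might be generated from the directory, CRM or ERP, is matched against document text to produce doc_URI quotes entity_URI triples. Plain label matching stands in here for real text mining, and the URIs, labels and the function quote_triples are hypothetical.

```python
import re
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical entity list, as it might be generated from a directory, CRM or ERP.
ENTITIES: Dict[str, str] = {
    "helene dupont": "http://example.org/employee/EMP0042",
    "acme corp": "http://example.org/company/ACME",
    "widget 3000": "http://example.org/product/W3000",
}
QUOTES = "http://example.org/vocab/quotes"


def quote_triples(doc_uri: str, text: str,
                  entities: Dict[str, str] = ENTITIES) -> List[Triple]:
    """Emit (doc_URI, quotes, entity_URI) for every known entity label found in the text."""
    # Lowercase and collapse whitespace so labels match across line breaks.
    haystack = re.sub(r"\s+", " ", text.lower())
    found = []
    for label, uri in sorted(entities.items()):
        # Word boundaries keep "acme corp" from matching inside "acme corporate".
        if re.search(r"\b" + re.escape(label) + r"\b", haystack):
            found.append((doc_uri, QUOTES, uri))
    return found


doc = "Meeting notes: Helene Dupont presented the Widget 3000\nroadmap to the team."
triples = quote_triples("http://example.org/doc/17", doc)
```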

How: semantize

Unstructured documents
  Compare documents using various dedicated algorithms
  ➡ is the same
  ➡ is included
  ➡ is similar
  ➡ is related
  Generate new triples
  ➡ create triples like <docA> is_sub_version_of <docB>
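
The talk does not name the comparison algorithms. As one possible stand-in, the sketch below compares documents by their sets of word 3-grams (shingles) and maps the result to a relation. The predicate names is_same_as, is_included_in and is_similar_to and the 0.5 threshold are assumptions chosen for the example.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


def shingles(text: str, size: int = 3) -> Set[Tuple[str, ...]]:
    """Return the set of overlapping word n-grams of a text."""
    words = text.lower().split()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def relate(a: str, b: str, similar_at: float = 0.5) -> Optional[str]:
    """Name the relation from text a to text b, or None if they look unrelated."""
    sa, sb = shingles(a), shingles(b)
    if not sa or not sb:
        return None  # too short to compare
    if sa == sb:
        return "is_same_as"
    if sa < sb:  # every shingle of a also appears in b
        return "is_included_in"
    # Jaccard index: shared shingles divided by all distinct shingles.
    jaccard = len(sa & sb) / len(sa | sb)
    return "is_similar_to" if jaccard >= similar_at else None


doc_a = "the quarterly report covers sales in europe"
doc_b = "the quarterly report covers sales in europe and asia this year"

relation = relate(doc_a, doc_b)
triples: List[Triple] = [("<docA>", relation, "<docB>")] if relation else []
```

Each detected relation becomes one more triple in the graph, so later steps can follow document-to-document links like any other link.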


How: enrich

Enrich the graph
  run specific algorithms to generate more links and triples (classifiers, topic detection, …)
  insert external data gathered from the LOD or other external datasets or APIs

How: infer

Create new knowledge
  add rules according to your needs

IF a coworker is quoted in documents
AND this coworker belongs to a business unit
THEN the business unit is bound to the documents
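
The coworker and business unit rule can be written as a single pass over a set of triples. This is a sketch, not how any particular reasoner works: the predicate names quotes, belongs_to and is_bound_to and the sample graph are hypothetical.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical predicate names standing in for whatever vocabulary is in use.
QUOTES, BELONGS_TO, BOUND_TO = "quotes", "belongs_to", "is_bound_to"


def infer_business_unit_links(graph: Set[Triple]) -> Set[Triple]:
    """Apply one rule and return only the triples that are new.

    doc quotes person AND person belongs_to unit => unit is_bound_to doc
    """
    derived = {
        (unit, BOUND_TO, doc)
        for doc, p1, person in graph if p1 == QUOTES
        for who, p2, unit in graph if p2 == BELONGS_TO and who == person
    }
    return derived - graph


graph: Set[Triple] = {
    ("doc:17", QUOTES, "emp:helene"),
    ("doc:18", QUOTES, "emp:helene"),
    ("doc:18", QUOTES, "company:acme"),  # not a coworker, so no unit is derived
    ("emp:helene", BELONGS_TO, "unit:rnd"),
}
inferred = infer_business_unit_links(graph)
graph |= inferred  # the new knowledge joins the graph
```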


How: build

Build
  select the resources that serve as object seeds (using SPARQL queries)
  for each seed, follow links smartly in order to create basic objects
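
To make the seed and link-following idea concrete, the sketch below builds nested objects from a small in-memory set of triples. A plain Python filter stands in for the SPARQL seed query, and "following links smartly" is reduced to a whitelist of predicates plus a depth limit. All names and data are hypothetical.

```python
from typing import Any, Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

GRAPH: Set[Triple] = {
    ("cust:1", "type", "Customer"),
    ("cust:1", "name", "Acme Corp"),
    ("cust:1", "has_ticket", "ticket:7"),
    ("ticket:7", "type", "Ticket"),
    ("ticket:7", "about_product", "prod:3"),
    ("prod:3", "name", "Widget 3000"),
    ("cust:2", "type", "Customer"),
    ("cust:2", "name", "Globex"),
}
# Only these predicates are followed to other resources.
FOLLOW = {"has_ticket", "about_product"}


def select_seeds(graph: Set[Triple], kind: str) -> List[str]:
    """Return the subjects of the given type (a stand-in for a SPARQL SELECT)."""
    return sorted(s for s, p, o in graph if p == "type" and o == kind)


def build_object(graph: Set[Triple], node: str, depth: int = 2) -> Dict[str, Any]:
    """Collect a node's own values and nest linked nodes up to the depth limit."""
    obj: Dict[str, Any] = {"uri": node}
    for s, p, o in sorted(graph):
        if s != node:
            continue
        if p in FOLLOW:
            if depth > 0:  # the depth limit also protects against cycles
                obj.setdefault(p, []).append(build_object(graph, o, depth - 1))
        else:
            obj[p] = o
    return obj


objects = [build_object(GRAPH, seed) for seed in select_seeds(GRAPH, "Customer")]
```

The first object ends up as a customer carrying its ticket, which in turn carries the product it is about.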

How: build

Finalize
  decorate the new knowledge objects with data set apart (not loaded in the triplestore)
  now we have rich user-actionable objects
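
Decoration can be as simple as a lookup by URI into whatever store holds the data that was set apart. In this sketch the side store is a plain dictionary, and its contents and the function decorate are hypothetical.

```python
from typing import Any, Dict

# Hypothetical side store: bulky fields that were never loaded as triples.
SIDE_STORE: Dict[str, Dict[str, Any]] = {
    "ticket:7": {"body": "Full text of the support ticket", "attachments": 2},
}


def decorate(obj: Dict[str, Any],
             side_store: Dict[str, Dict[str, Any]] = SIDE_STORE) -> Dict[str, Any]:
    """Return a copy of obj with the side-store fields for its URI merged in."""
    extra = side_store.get(obj["uri"], {})
    # Graph-derived fields win if a key appears in both places.
    return {**extra, **obj}


ticket = decorate({"uri": "ticket:7", "type": "Ticket"})
```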


How: expose

Make the new information available to users and to the entire IS

[Figure labels: Harvest, Normalize, Annotate, Classify, Semantize, Enrich; Indexation (AFS search engine); RDF Triplestore (Linked Data); Relational DB]

Conclusion

It works!
  The triples we create and the inference rules we add are dictated by the goal / application
  ➡ usage and value oriented
  We benefit from the lazy-flexible-dynamic modeling of RDF-RDFS-OWL
  ➡ we are agile
  What matters is the graph. But the graph is not the triplestore
  ➡ storage independent

There’s an app for that

Antidot Information Factory
a software solution designed specifically to
  leverage structured and unstructured data
  enable large-scale processing of existing data
  automate publishing of enriched or newly created information

Harvest Normalize Semantize Enrich Build Expose

The Giant Enterprise Graph

Now we have a path to let SemWeb enter the enterprise

THANKS FOR YOUR ATTENTION
QUESTIONS?

Discuss | Understand | Learn | Exchange

www.antidot.net
info@antidot.net
