
Not All Graph Databases are Created Equally

Date post: 17-Jul-2015
Upload: ontotext
Transcript
Page 1: Not All Graph Databases are Created Equally

Not All Graph Databases are Created Equally

A webinar with Atanas Kiryakov, CEO and Founder of Ontotext

September 30th, 2014

Not All Graph Databases are Created Equally #1Sept 2014

Page 2: Not All Graph Databases are Created Equally

Today’s Topics

• An overview of graph databases
• Triplestores: advantages and design choices
• Reasoning: best practices and pitfalls
• Essential features of triplestores
  – owl:sameAs optimization
  – Full-Text Search and NoSQL Connectors
• Enterprise resilience and scalability
• Text mining pipeline and triplestores
• Success stories

Page 3: Not All Graph Databases are Created Equally

About Ontotext

• Information management company providing text analysis, data management and state-of-the-art semantic technology
• 70 employees, headquartered in Sofia, Bulgaria
• Sales presence in London, Washington, DC, and Boston
• Clients include BBC, AstraZeneca, US DoD, Wiley & Sons, Getty
• Over 400 person-years in R&D to create a one-stop shop for:
  – Content enrichment
  – Data management
  – Graph database engine
• Open and standards-compliant technology:
  – RDF(S), OWL, GATE, Sesame

Page 4: Not All Graph Databases are Created Equally

Some of our clients

[Figure: client logos. One caption reads: The most popular financial newspaper]

Page 5: Not All Graph Databases are Created Equally

How are RDF Databases Different?

• Standards compliance
  – Unlike most of the NoSQL and proprietary graph databases
  – Based on a mature set of W3C standards: RDF, RDFS, OWL, SPARQL
• Schema agility, easy querying of diverse data
  – Unlike SQL databases
  – RDF facilitates dealing with multiple schemata and schema evolution
• Allow for complex queries
  – Unlike the typical NoSQL databases
  – SPARQL allows for comprehensive structured queries, similar to SQL
  – Queries that are not possible in SQL (e.g. unknown relation types)
• Linked Data ready
  – RDF is the standard for linked data publication
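To illustrate the "unknown relation types" point, here is a minimal sketch of matching a single triple pattern in which the predicate itself is a variable. The `match` helper, the toy triples and the `?rel` variable name are assumptions made for this example; a real triplestore would evaluate the equivalent SPARQL pattern against its indexes.

```python
# Toy triples; the identifiers mirror the myd:/myo: style used in this deck.
triples = [
    ("myd:Maria", "myo:parent", "myd:Ivan"),
    ("myd:Maria", "myo:spouse", "myd:John"),
]


def match(pattern, triples):
    """Yield variable bindings for one triple pattern.

    Terms starting with '?' are variables; any position, including the
    predicate, may be a variable.
    """
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A repeated variable must bind to the same value each time.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield binding


# "How are Maria and Ivan related?" without knowing the relation type up front.
relations = [b["?rel"] for b in match(("myd:Maria", "?rel", "myd:Ivan"), triples)]
```

The same pattern in a relational schema would require knowing in advance which tables and columns can connect two people.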

Page 6: Not All Graph Databases are Created Equally

Visual Representation - Graph Database


Page 7: Not All Graph Databases are Created Equally

What It Takes to Make It Work?

• Have a database of locations, with part-of info
• Have a database of companies, with dependencies
• Define semantics for the relevant relationships:
  – sub-region and control are transitive relationships
  – located-in is transitive over sub-region
• Define the semantics of suspicious relationships:

  CONSTRUCT { ?orgA my:suspiciousLink ?orgB } WHERE {
    ?orgA ptop:locatedIn ?x ; fibo:controls ?y .
    ?y fibo:controls ?orgB ; ptop:locatedIn ?z .
    ?orgB ptop:locatedIn ?x .
    ?z a ptop:OffshoreZone .
  }
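The pattern on this slide can be sketched in plain Python to show how the transitive semantics feed the query. This is an illustration under stated assumptions, not how a triplestore evaluates it: the company and place names are invented, and the relations are held in simple sets of pairs.

```python
from itertools import product

# Hypothetical toy data; none of these names come from the webinar.
located_in = {("AcmeLtd", "London"), ("ShellCo", "Tortuga"), ("BetaLtd", "London")}
sub_region = {("London", "UK")}
controls = {("AcmeLtd", "ShellCo"), ("ShellCo", "BetaLtd")}
offshore_zones = {"Tortuga"}


def transitive_closure(pairs):
    """Return the transitive closure of a binary relation."""
    closure = set(pairs)
    while True:
        new = {(a, d) for (a, b), (c, d) in product(closure, closure) if b == c}
        if new <= closure:
            return closure
        closure |= new


def suspicious_links(located_in, sub_region, controls, offshore_zones):
    """Find (orgA, orgB) pairs matching the CONSTRUCT pattern on the slide."""
    regions = transitive_closure(sub_region)
    # located-in is transitive over sub-region: org in X, X within Y => org in Y.
    loc = set(located_in) | {
        (org, big) for org, small in located_in for s, big in regions if s == small
    }
    ctrl = transitive_closure(controls)  # control is transitive
    links = set()
    for a, y in ctrl:
        for y2, b in ctrl:
            if y2 != y:
                continue
            y_offshore = any(o == y and z in offshore_zones for o, z in loc)
            shared = {x for o, x in loc if o == a} & {x for o, x in loc if o == b}
            if y_offshore and shared:
                links.add((a, b))
    return links


links = suspicious_links(located_in, sub_region, controls, offshore_zones)
```

In this toy data, AcmeLtd and BetaLtd share a location and are connected through a company located in an offshore zone, so that pair is the one flagged.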

Page 8: Not All Graph Databases are Created Equally

Sample RDF Graph: Data and Schema

[Figure: RDF graph of sample data and schema, with inferred edges marked. Labels shown: myData:Maria, myData:Ivan, ptop:Agent, ptop:Person, ptop:Woman, ptop:childOf, ptop:parentOf, owl:relativeOf, owl:SymmetricProperty, rdf:type, rdfs:range, rdfs:subPropertyOf, owl:inverseOf]

Page 9: Not All Graph Databases are Created Equally

Data Representation: RDBMS vs. RDF

Relational Tables

Person
ID   Name       Gender
1    Maria P.   F
2    Ivan Jr.   M
3    …

Parent
ParID   ChiID
1       2

Spouse
S1ID   S2ID   From   To
1      3

RDF Representation

Statement
Subject      Predicate    Object
myo:Person   rdf:type     rdfs:Class
myo:gender   rdf:type     rdf:Property
myo:parent   rdfs:range   myo:Person
myo:spouse   rdfs:range   myo:Person
myd:Maria    rdf:type     myo:Person
myd:Maria    rdfs:label   “Maria P.”
myd:Maria    myo:gender   “F”
myd:Ivan     rdfs:label   “Ivan Jr.”
myd:Ivan     myo:gender   “M”
myd:Maria    myo:parent   myd:Ivan
myd:Maria    myo:spouse   myd:John
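As a rough sketch of how the relational rows on this slide map to statements, the snippet below turns the Person and Parent tables into (subject, predicate, object) tuples. The `rows_to_triples` helper and the ID-to-identifier mapping are assumptions for illustration, and names are stored under `rdfs:label`, the standard RDFS property for labels.

```python
# Rows from the Person and Parent tables on the slide.
persons = [
    {"ID": 1, "Name": "Maria P.", "Gender": "F"},
    {"ID": 2, "Name": "Ivan Jr.", "Gender": "M"},
]
parents = [(1, 2)]  # (ParID, ChiID)

# Assumed mapping from primary keys to resource identifiers.
iri_by_id = {1: "myd:Maria", 2: "myd:Ivan"}


def rows_to_triples(persons, parents, iri_by_id):
    """Flatten table rows into (subject, predicate, object) statements."""
    triples = []
    for row in persons:
        subject = iri_by_id[row["ID"]]
        triples.append((subject, "rdf:type", "myo:Person"))
        triples.append((subject, "rdfs:label", row["Name"]))
        triples.append((subject, "myo:gender", row["Gender"]))
    for parent_id, child_id in parents:
        # A join-table row becomes a single edge between two resources.
        triples.append((iri_by_id[parent_id], "myo:parent", iri_by_id[child_id]))
    return triples


triples = rows_to_triples(persons, parents, iri_by_id)
```

Each column value becomes one statement and each join-table row becomes one edge, so adding a new kind of relationship means adding statements with a new predicate rather than a new table.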

Page 10: Not All Graph Databases are Created Equally

What is RDF Good for?

• Metadata-based content management
  – Metadata represents a re-usable result of content analytics
  – It can be repurposed, allowing for a wide range of applications
  – Most search engines do analytics, but the results are not explicit, so they cannot be validated, refined and used by other applications
• Linking text and structured data
  – Allows structured, uniform and efficient access to diverse domain models, taxonomies, dictionaries, reference databases
• Reference data management
  – E.g. product catalogs and taxonomies that are too structured to be managed with NoSQL, but too diverse and interconnected for SQL
• Using linked open data (LOD)
  – A growing amount of diverse public data can be used in enterprise Knowledge Management applications

Page 11: Not All Graph Databases are Created Equally

Interlinking Text and Data


Page 12: Not All Graph Databases are Created Equally

Why is Inference Important?

• Intelligent mapping of queries to data
  – Rather than query 10+ different sources, an application can resolve queries using inferred facts. These facts can evolve independently.
• Cheaper data integration, lower costs
  – Data integration costs are reduced
  – Developers don’t need to maintain multiple schemas
• Finding patterns and inferring new relationships
  – Users can use inferred facts to discover patterns, connections and relationships that they previously did not know existed
• Database depth, accurate complete results
  – With tens of billions of triples, users get complete, accurate results
• Faster query evaluation
  – One can look at materialization as a specific type of indexing

Page 13: Not All Graph Databases are Created Equally

Lightweight Inference - Simple Rules

The database will return ‘Ivan’ as the result of a query for

  Maria relativeOf ?x

when the fact asserted was

  Ivan childOf Maria

This type of “intelligence” can be achieved in many ways, but graph databases offer the cleanest approach, delivering the best efficiency and lowest cost through the entire data lifecycle.

[Figure: the sample RDF graph of data and schema, repeated from the earlier slide]
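The Maria/Ivan example can be reproduced with two simple rules, sketched below. The schema statements (childOf as a sub-property of relativeOf, relativeOf as symmetric) are an assumed reading of the graph figure, and the `infer` and `query` helpers are illustrative names, not GraphDB APIs.

```python
# One asserted fact plus two assumed schema statements.
explicit = {
    ("Ivan", "childOf", "Maria"),
    ("childOf", "subPropertyOf", "relativeOf"),
    ("relativeOf", "type", "SymmetricProperty"),
}


def infer(triples):
    """Apply the sub-property and symmetric-property rules until nothing new appears."""
    closure = set(triples)
    while True:
        sub_props = {(s, o) for s, p, o in closure if p == "subPropertyOf"}
        symmetric = {s for s, p, o in closure
                     if p == "type" and o == "SymmetricProperty"}
        new = set()
        for s, p, o in closure:
            # <s, sub, o> and <sub, subPropertyOf, sup>  =>  <s, sup, o>
            new |= {(s, sup, o) for sub, sup in sub_props if sub == p}
            # <s, p, o> with p symmetric  =>  <o, p, s>
            if p in symmetric:
                new.add((o, p, s))
        if new <= closure:
            return closure
        closure |= new


def query(triples, subject, predicate):
    """Answer the pattern: subject predicate ?x."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


answers = query(infer(explicit), "Maria", "relativeOf")
```

Running the same query against the explicit statements alone returns nothing, which is the gap the inferred statements fill.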

Page 14: Not All Graph Databases are Created Equally

Forward-Chaining and Materialization

<C1,rdfs:subClassOf,C2>, <C2,rdfs:subClassOf,C3>  =>  <C1,rdfs:subClassOf,C3>

<I,rdf:type,C1>, <C1,rdfs:subClassOf,C2>  =>  <I,rdf:type,C2>

<P1,owl:inverseOf,P2>, <I1,P1,I2>  =>  <I2,P2,I1>

<P1,rdf:type,owl:SymmetricProperty>  =>  <P1,owl:inverseOf,P1>

<P1,rdfs:range,C1>, <I1,P1,I2>  =>  <I2,rdf:type,C1>

The rule entailment language used by GraphDB is a simplification of Datalog, used in DBMS since the 1980s.

[Figure: the sample RDF graph of data and schema, repeated from the earlier slide]
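A minimal sketch of forward-chaining materialization over the five rules on this slide follows. It is a naive fixpoint loop written for readability, not GraphDB's rule engine, and the explicit statements are an assumed reading of the sample graph.

```python
def materialize(explicit):
    """Forward-chain the five rules to a fixpoint and return the inferred closure."""
    closure = set(explicit)
    while True:
        new = set()
        for s, p, o in closure:
            # <P1,rdf:type,owl:SymmetricProperty> => <P1,owl:inverseOf,P1>
            if p == "rdf:type" and o == "owl:SymmetricProperty":
                new.add((s, "owl:inverseOf", s))
            for s2, p2, o2 in closure:
                # <C1,subClassOf,C2>, <C2,subClassOf,C3> => <C1,subClassOf,C3>
                if p == "rdfs:subClassOf" and p2 == "rdfs:subClassOf" and o == s2:
                    new.add((s, "rdfs:subClassOf", o2))
                # <I,rdf:type,C1>, <C1,subClassOf,C2> => <I,rdf:type,C2>
                if p == "rdf:type" and p2 == "rdfs:subClassOf" and o == s2:
                    new.add((s, "rdf:type", o2))
                # <P1,inverseOf,P2>, <I1,P1,I2> => <I2,P2,I1>
                if p == "owl:inverseOf" and p2 == s:
                    new.add((o2, o, s2))
                # <P1,rdfs:range,C1>, <I1,P1,I2> => <I2,rdf:type,C1>
                if p == "rdfs:range" and p2 == s:
                    new.add((o2, "rdf:type", o))
        if new <= closure:
            return closure
        closure |= new


# Assumed explicit statements, loosely following the sample graph.
explicit = {
    ("ptop:Woman", "rdfs:subClassOf", "ptop:Person"),
    ("ptop:Person", "rdfs:subClassOf", "ptop:Agent"),
    ("myData:Maria", "rdf:type", "ptop:Woman"),
    ("ptop:childOf", "owl:inverseOf", "ptop:parentOf"),
    ("myData:Ivan", "ptop:childOf", "myData:Maria"),
    ("ptop:parentOf", "rdfs:range", "ptop:Person"),
}
closure = materialize(explicit)
inferred = closure - explicit
```

Here the inverse-property rule derives that Maria is a parent of Ivan, which in turn lets the range rule type Ivan as a person; chains like this are why the loop runs until no rule produces anything new.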

Page 15: Not All Graph Databases are Created Equally

Reasoning Strategies

Two principal strategies for rule-based inference:
• Forward-chaining: start from the known facts (the explicit statements) and perform inference in an inductive fashion. Typically, the goal is to compute the inferred closure.
• Backward-chaining: start from a particular fact or a query, and verify it or get all possible results. In a nutshell, the reasoner decomposes (or transforms) the query, or the fact, into simpler, or alternative, facts, which are available in the KB or can be proven through further recursive transformations.

Inferred closure: the extension of an RDF graph with all implicit facts (triples) that can be inferred from it

Materialization: keeping an up-to-date inferred closure
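To contrast with forward-chaining, here is a small backward-chaining sketch restricted to the two subclass rules. The `prove` function and the sample facts are assumptions for illustration; a production reasoner would cover many more rules and would also return variable bindings rather than a yes/no answer.

```python
TYPE, SUB = "rdf:type", "rdfs:subClassOf"


def prove(goal, facts, seen=frozenset()):
    """Backward-chain: is `goal` explicit, or derivable via the subclass rules?"""
    if goal in facts:
        return True
    if goal in seen:  # guard against cyclic class hierarchies
        return False
    seen = seen | {goal}
    s, p, o = goal
    if p in (TYPE, SUB):
        # Decompose <s, p, o> into <s, p, mid> plus an explicit <mid, subClassOf, o>.
        for mid, p2, sup in facts:
            if p2 == SUB and sup == o and prove((s, p, mid), facts, seen):
                return True
    return False


# Assumed explicit facts; only these are stored, nothing is materialized.
facts = {
    ("ptop:Woman", SUB, "ptop:Person"),
    ("ptop:Person", SUB, "ptop:Agent"),
    ("myData:Maria", TYPE, "ptop:Woman"),
}
maria_is_agent = prove(("myData:Maria", TYPE, "ptop:Agent"), facts)
```

No inferred statement is stored here; the work happens at query time, which is the trade-off against materialization.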

Page 16: Not All Graph Databases are Created Equally

Inference - Retraction

• Inferences materialized at load time
  – Fast query answering, as no inference is done at query time
  – Alternative approaches (backward-chaining) harm query optimization
• Query optimization requires statistics about the “selectivity” of the constraints in the query, in order to reorder them for optimal execution
• Retracting statements using a custom algorithm
  – Does not require re-computation of the full closure
  – Forward chaining to find potentially affected inferences
  – Backward chaining to test which inferences are still supported
  – No truth maintenance; pending patent application
  – Fast (same order of magnitude as inserting)
• Result: allows massive query loads along with huge update (insert + delete) rates
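The two-phase idea on this slide (forward chaining to find potentially affected inferences, backward chaining to test which are still supported) can be sketched for the two subclass rules only. This is a toy reconstruction from the bullet points; the slide does not detail the actual GraphDB algorithm, and every function name below is an assumption.

```python
TYPE, SUB = "rdf:type", "rdfs:subClassOf"


def consequences(a, b):
    """Conclusion of the two subclass rules for the ordered premise pair (a, b)."""
    (s1, p1, o1), (s2, p2, o2) = a, b
    if p1 in (TYPE, SUB) and p2 == SUB and o1 == s2:
        return {(s1, p1, o2)}
    return set()


def materialize(explicit):
    """Compute the full closure (used here only to set up and check the example)."""
    closure = set(explicit)
    while True:
        new = {c for a in closure for b in closure for c in consequences(a, b)}
        if new <= closure:
            return closure
        closure |= new


def prove(goal, facts, seen=frozenset()):
    """Backward-chain over explicit facts to test whether goal is still supported."""
    if goal in facts:
        return True
    if goal in seen:
        return False
    seen = seen | {goal}
    s, p, o = goal
    return p in (TYPE, SUB) and any(
        p2 == SUB and sup == o and prove((s, p, mid), facts, seen)
        for mid, p2, sup in facts
    )


def affected_by(removed, closure):
    """Forward-chain from the removed statement to find what may depend on it."""
    affected, frontier = {removed}, {removed}
    while frontier:
        new = set()
        for f in frontier:
            for other in closure:
                new |= consequences(f, other) | consequences(other, f)
        frontier = new - affected
        affected |= frontier
    return affected


def retract(removed, explicit, closure):
    """Delete one explicit statement without recomputing the full closure."""
    remaining = explicit - {removed}
    affected = affected_by(removed, closure)
    still_supported = {t for t in affected if prove(t, remaining)}
    return remaining, (closure - affected) | still_supported


explicit = {
    ("myData:Maria", TYPE, "ptop:Woman"),
    ("myData:Maria", TYPE, "ptop:Person"),
    ("ptop:Woman", SUB, "ptop:Person"),
    ("ptop:Person", SUB, "ptop:Agent"),
}
closure = materialize(explicit)
remaining, new_closure = retract(("myData:Maria", TYPE, "ptop:Woman"), explicit, closure)
```

Only statements reachable from the deleted one are re-checked; everything else in the closure is left untouched, which is what keeps deletion cost close to insertion cost in this toy setting.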

Page 17: Not All Graph Databases are Created Equally

Essential Features

• Geo-spatial indexing

• Ranking graph nodes

• Optimized handling of owl:sameAs

• Integration with FTS and NoSQL engines


Page 18: Not All Graph Databases are Created Equally

The Honey and the Sting of owl:sameAs

[Figure: diagram of entities E11, E12, E21, E22 and E23]

Page 19: Not All Graph Databases are Created Equally

The Honey and the Sting of owl:sameAs

[Figure: diagram of entities E11, E12, E21, E22 and E23]

Page 20: Not All Graph Databases are Created Equally

GraphDB Connectors

• Adapters and configuration interfaces capable of connecting GraphDB to external stores/engines
• Based on GraphDB’s Plug-in and Notification APIs
• Integrate IR engines and NoSQL databases
• IR engines
  – Full-text search, faceted search, real-time synchronization, property paths
  – Lucene, Solr, Elasticsearch (end of June)
• NoSQL databases
  – Access Big Data from SPARQL

Page 21: Not All Graph Databases are Created Equally

Integration with FTS Engines

• Limited query expressivity

• Extreme performance and scalability

[Figure: integration architecture. Labels shown: GraphDB database, IR engine, Query Processor, Graph indexes, Internal indexes, Replication, SPARQL queries, IR queries]

Page 22: Not All Graph Databases are Created Equally

Enterprise Resilience

GraphDB Enterprise has two design goals:

• Improved resilience
  – failover, dynamic configuration
• Improved query bandwidth
  – larger cluster means more queries per unit time

Page 23: Not All Graph Databases are Created Equally

Replication Cluster

• Two types of nodes - flexible topologies

• Resilient to failure of workers and masters

[Figure: replication cluster topology]
• Master: dispatches queries and updates to workers (read/write)
• Master (hot standby): dispatches queries to workers (read only)
• Worker 1, Worker 2, Worker 3: GraphDB-Enterprise worker nodes
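The division of roles in this topology can be modelled with a toy sketch. Everything here is an assumption for illustration (the class names, round-robin query dispatch, broadcasting each update to all live workers); the slide does not describe how GraphDB Enterprise actually routes or replicates requests.

```python
class Worker:
    """A worker node holding a full copy of the data."""

    def __init__(self, name):
        self.name = name
        self.triples = set()
        self.alive = True


class Master:
    """Dispatches requests to workers; a hot-standby master is read-only."""

    def __init__(self, workers, read_only=False):
        self.workers = workers
        self.read_only = read_only
        self._next = 0

    def update(self, triple):
        if self.read_only:
            raise PermissionError("hot-standby master accepts queries only")
        # Assumed replication model: every live worker applies every update.
        for worker in self.workers:
            if worker.alive:
                worker.triples.add(triple)

    def query(self, predicate):
        # Assumed load balancing: round-robin, skipping failed workers.
        for _ in range(len(self.workers)):
            worker = self.workers[self._next % len(self.workers)]
            self._next += 1
            if worker.alive:
                rows = sorted(t for t in worker.triples if t[1] == predicate)
                return worker.name, rows
        raise RuntimeError("no live workers")


workers = [Worker("worker-1"), Worker("worker-2"), Worker("worker-3")]
master = Master(workers)
standby = Master(workers, read_only=True)

master.update(("Ivan", "childOf", "Maria"))
workers[0].alive = False  # simulate a worker failure
served_by, rows = standby.query("childOf")
```

Because every worker holds the full data set in this model, adding workers raises query throughput and losing one does not lose data. The sketch ignores how a recovered worker would catch up on missed updates.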

Page 24: Not All Graph Databases are Created Equally

High Availability Cluster


Page 25: Not All Graph Databases are Created Equally

Integration with Text Mining Pipelines


Page 26: Not All Graph Databases are Created Equally

Technology Portfolio


Page 27: Not All Graph Databases are Created Equally

Ontotext and BBC

Profile
• Mass media broadcaster founded in 1922
• 23,000 employees and over 5 billion pounds in annual revenue

Goals
• Create a dynamic semantic publishing platform that assembles web pages on-the-fly using a variety of data sources
• Deliver highly relevant data to web site visitors with sub-second response

Challenges
• BBC journalists author and publish content which is then statically rendered. The costs and time to do this were high.
• Diverse content was difficult to navigate, content re-use was not flexible
• User experience needed to be improved with relevant content

"The goal is to be able to more easily and accurately aggregate content, find it and share it across many sources. From these simple relationships and building blocks you can dynamically build up incredibly rich sites and navigation on any platform."

John O’Donovan, Chief Technical Architect

Page 28: Not All Graph Databases are Created Equally

Ontotext and AstraZeneca

Profile
• Global bio-pharma company
• $28 billion in sales in 2012
• $4 billion in R&D across three continents

Goals
• Efficient design of new clinical studies
• Quick access to all of the data
• Improved evidence-based decision-making
• Strengthen the knowledge feedback loop
• Enable predictive science

Challenges
• Over 7,000 studies and 23,000 documents are difficult to obtain
• Searches returning 1,000 – 10,000 results
• Document repositories not designed for reuse
• Tedious process to arrive at evidence-based decisions

Page 29: Not All Graph Databases are Created Equally

Context-based Disambiguation


Page 30: Not All Graph Databases are Created Equally

Ontotext and LMI

Profile
• Established in 1961 to enable federal agencies
• Specializes in logistics, financial, infrastructure & information management

Goals
• Unlock large collections of complex documents
• Improve analyst productivity
• Create an application they can sell to US Federal agencies

Challenges
• Analysts taking hours to find, download and search documents, using inaccurate keyword searches
• Needed a knowledge base to search quickly and guide the analysts – highly relevant searches

• Extracts knowledge from collection of documents
• Uses GraphDB to intuitively search and filter
• Knowledge base used to suggest searches
• Hyper speed performance
• Huge savings in analyst time
• Accurate results

Page 31: Not All Graph Databases are Created Equally

Ontotext and Euromoney

Profile
• Euromoney Institutional Investor PLC, the international online information and events group

Goals
• Create a horizontal platform to serve 100 different publications
• Create a new publishing and information platform which would include the latest authoring, storing, and display technologies, including semantic annotation, search and a triple store repository

Challenges
• Different domains covered
• Sophisticated content analytics incl. relation, template and scenario extraction
• Analytics of reports and news of various domains
• Extraction of sophisticated macroeconomic views on markets and market conditions; trades, condition and trade horizons, assets, asset allocations, etc.
• Multi-faceted search
• Completely new content and data infrastructure

Page 32: Not All Graph Databases are Created Equally

GraphDB Essentials

• Enterprise grade (resilience, scale, management)

• Geo-spatial, ranking and full-text search

• Scales to tens of billions of RDF statements

• Expressive inference (from RDFS to OWL2-RL)

• SPARQL 1.1 (query, update, federation, graph store)

• Pure Java implementation (portable)

• Sesame openRDF framework (Jena also in GraphDB-SE)


Page 33: Not All Graph Databases are Created Equally

Additional Resources: Ontotext.com


Page 34: Not All Graph Databases are Created Equally

Thank you!

A link to the recording and responses to any of your unanswered questions will be sent out shortly.