MarkLogic - Semantic Technology in the Real World - Big Data Tech Con - Oct 2014

Date post: 23-Jun-2015
Upload: marklogic
Description:
Semantic Technologies such as RDF and SPARQL have become mainstream ways to manage and model graph-like relationships. Unlike generic graph databases, RDF triple stores treat the links between nodes as first-class citizens with their own unique identities. And they also provide advanced capabilities such as inferencing and ontology management. This presentation will explore the Semantic technology stack, including where and how to use it effectively as part of your information access solution, using real-world examples from several industries.
Transcript
Page 1: MarkLogic - Semantic Technology in the Real World - Big Data Tech Con - Oct 2014

© COPYRIGHT 2014 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.

Semantic Technology in the Real World Presented by: Stephen Buxton October 2014

Page 2: MarkLogic - Semantic Technology in the Real World - Big Data Tech Con - Oct 2014

We Are The New Generation Database

Hierarchical Era: "For your application data!"
• Application- and hardware-specific

Relational Era: "For all your structured data!"
• Normalized, tabular model
• Application-independent query
• User control

Any Structure Era: "For all your data!"
• Schema-agnostic
• Massive scale
• Query and search
• Analytics
• Application services
• Faster time-to-results

Page 3: MarkLogic - Semantic Technology in the Real World - Big Data Tech Con - Oct 2014

Harnessing Data & Reimagining Applications

Reduce Risk

Manage Compliance

Create New Value from Data

Optimize Operations

Lower TCO / Better IT Economics

Better Decision-making

Page 4: MarkLogic - Semantic Technology in the Real World - Big Data Tech Con - Oct 2014

The Only Enterprise NoSQL Database

Search & Query

ACID Transactions

High Availability / Disaster Recovery

Replication

Government-grade Security

Scalability & Elasticity

On-premise or Cloud Deployment

Hadoop for Storage & Compute

Semantics

Faster Time-to-Results

SEARCH DATABASE

APPLICATION SERVICES

Page 5: MarkLogic - Semantic Technology in the Real World - Big Data Tech Con - Oct 2014

Agenda

Semantic Technologies – a whirlwind tour
• Triple Stores (Graph Databases)
• NLP – deriving structured from unstructured

Semantics in the context of a database
• You also need documents … and scalars, and geospatial, and bitemporal, and …
• … and an Enterprise database

Semantics use cases
• Semantics and search
• Semantics and data integration

Page 6: MarkLogic - Semantic Technology in the Real World - Big Data Tech Con - Oct 2014

SEMANTIC TECHNOLOGIES

Page 7: MarkLogic - Semantic Technology in the Real World - Big Data Tech Con - Oct 2014

Semantics Is: A New Way to Organize Data

Data is stored in Triples, expressed as: Subject : Predicate : Object
• John Smith : livesIn : London
• London : isIn : England

Page 8: MarkLogic - Semantic Technology in the Real World - Big Data Tech Con - Oct 2014

Semantics Is: A New Way to Organize Data

Data is stored in Triples, expressed as: Subject : Predicate : Object
• John Smith : livesIn : London
• London : isIn : England

Query with SPARQL gives us simple lookup .. and more!
• Find people who live in (a place that's in) England

[Diagram: RDF triples. "John Smith" --livesIn--> "London" --isIn--> "England", plus a second edge labelled livesIn.]
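The lookup on this slide can be sketched in a few lines of plain Python. This is only an illustration of the idea of matching triple patterns and joining on a shared variable, not how a triple store or SPARQL engine is implemented; the helper names `match` and `people_living_in`, and the Jane Doe / Paris facts, are made up for the example.

```python
# Toy triple store: each fact is a (subject, predicate, object) tuple.
TRIPLES = {
    ("John Smith", "livesIn", "London"),
    ("London", "isIn", "England"),
    ("Jane Doe", "livesIn", "Paris"),   # hypothetical extra facts
    ("Paris", "isIn", "France"),
}


def match(pattern, triples=TRIPLES):
    """Yield every triple that fits the pattern; None acts as a wildcard."""
    for triple in triples:
        if all(p is None or p == t for p, t in zip(pattern, triple)):
            yield triple


def people_living_in(region):
    """Roughly: ?person livesIn ?place . ?place isIn <region>"""
    # First pattern: which places are in the region?
    places = {s for s, _, _ in match((None, "isIn", region))}
    # Second pattern, joined on ?place: who lives in one of those places?
    return sorted(s for s, _, o in match((None, "livesIn", None)) if o in places)


print(people_living_in("England"))
```

The two patterns share the variable ?place, which is what turns two simple lookups into a graph traversal.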

Page 10: MarkLogic - Semantic Technology in the Real World - Big Data Tech Con - Oct 2014

Triple Stores and Graph Databases

Triple Store
• Triples lookup + graph traversal
• Standard data model, standard language
• SPARQL queries over RDF
• Example: MarkLogic

Graph Database
• Graph analytics (consider the whole graph)
• "Show me the shortest [weighted] path between .."
• "Show me the node with highest degree"
• Proprietary data model, proprietary language
• Example: Neo4j

Page 11: MarkLogic - Semantic Technology in the Real World - Big Data Tech Con - Oct 2014

Triple Stores and Graph Databases

Triple Store
• Triples lookup + graph traversal
• Standard data model, standard language
• SPARQL queries over RDF
• Example: MarkLogic

Property Graph Database
• Graph analytics
• "Show me the shortest [weighted] path between .."
• "Show me the node with highest degree"
• Proprietary data model, proprietary language
• Example: Neo4j

Graph Database
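To make the two graph-analytics questions concrete, here is a minimal sketch in plain Python. It treats edges as undirected and unweighted (the slide's "[weighted]" variant would need something like Dijkstra's algorithm instead of breadth-first search), and the `EDGES` data and function names are invented for illustration rather than taken from any product.

```python
from collections import deque

# Hypothetical edges, written as (subject, predicate, object).
EDGES = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("carol", "knows", "dave"),
    ("alice", "knows", "carol"),
]


def neighbours(edges):
    """Build an undirected adjacency map from the edge list."""
    adj = {}
    for s, _, o in edges:
        adj.setdefault(s, set()).add(o)
        adj.setdefault(o, set()).add(s)
    return adj


def highest_degree(edges):
    """Return the node with the most neighbours (ties broken alphabetically)."""
    adj = neighbours(edges)
    return max(sorted(adj), key=lambda node: len(adj[node]))


def shortest_path(edges, start, goal):
    """Breadth-first search: the first path that reaches the goal is a shortest one."""
    adj = neighbours(edges)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in sorted(adj.get(path[-1], ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no route between the two nodes


print(highest_degree(EDGES), shortest_path(EDGES, "alice", "dave"))
```

Both functions have to look at the whole graph, which is the distinction the slide draws from triple lookup and traversal.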

Page 12: MarkLogic - Semantic Technology in the Real World - Big Data Tech Con - Oct 2014

RDF – What is RDF?

Resource Description Framework
• W3C Spec with a defined vocabulary for representing facts/relationships: http://www.w3.org/RDF/
• Facts expressed as triples: (subject, predicate, object)
• Abstract data model facilitates data sharing/merging even if the underlying representations are different
• Example: ingest RDBMS data into a triple store as RDF triples
• Example: map entities using predicates such as sameAs, subClassOf

Page 13: MarkLogic - Semantic Technology in the Real World - Big Data Tech Con - Oct 2014

RDF – What is a triple?

A single fact/relationship consisting of (subject, predicate, object)

Subject
• An IRI representing a resource, for example a person or a company.

Predicate
• An IRI representing a property or characteristic of the subject, or of the relationship between the subject and the object.
• Also known as an arc or edge.

Object
• An IRI or a typed literal.
• Typed literal: xsd:double, xsd:string, xsd:date, …
• IRI: may be the subject of other triples.

Page 14: MarkLogic - Semantic Technology in the Real World - Big Data Tech Con - Oct 2014

RDF – What is an IRI?

Internationalized [Unique] Resource Identifier
• A string used to uniquely identify a resource in the Universe.
• An IRI may contain characters from the Universal Character Set (Unicode/ISO 10646).
• Allows for Chinese characters, Japanese Kanji, etc.

IRI vs. URI
• URIs (uniform resource identifiers) are limited to ASCII characters

IRI/URI vs. URL
• A URL is a uniform resource locator
• There's an expectation that if you follow a URL, you'll find something useful
• An IRI is an identifier – it may or may not also be a locator
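The IRI/URI distinction can be shown with Python's standard library: an IRI path containing non-ASCII characters can be mapped to an ASCII-only URI by percent-encoding the UTF-8 bytes. This is a simplified sketch; the function name `iri_to_uri` and the example IRI are hypothetical, and internationalized host names (which need separate handling) are ignored.

```python
from urllib.parse import quote


def iri_to_uri(iri):
    """Percent-encode non-ASCII characters as UTF-8, leaving ASCII URI syntax alone."""
    # Characters listed in `safe` are reserved URI punctuation and are kept as-is.
    return quote(iri, safe=":/?#[]@!$&'()*+,;=%")


# A hypothetical IRI whose last path segment is written in Kanji.
print(iri_to_uri("http://example.org/resource/東京"))
# An ASCII-only IRI is already a valid URI and passes through unchanged.
print(iri_to_uri("http://dbpedia.org/resource/London"))
```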

Page 15: MarkLogic - Semantic Technology in the Real World - Big Data Tech Con - Oct 2014

RDF - Examples

[Diagram: David Bowie --birthPlace--> London; London --latitude--> 51.5072]

Page 16: MarkLogic - Semantic Technology in the Real World - Big Data Tech Con - Oct 2014

RDF - Examples

Example 1: The object can itself be a subject, so it is represented as an IRI (David Bowie --birthPlace--> London)
  Subject    <http://dbpedia.org/resource/David_Bowie>
  Predicate  <http://dbpedia.org/ontology/birthPlace>
  Object     <http://dbpedia.org/resource/London>

Example 2: The object is a typed literal (London --latitude--> 51.5072)
  Subject    <http://dbpedia.org/resource/London>
  Predicate  <http://www.w3.org/2003/01/geo/wgs84_pos#lat>
  Object     "51.5072"^^<http://www.w3.org/2001/XMLSchema#float>

Page 17: MarkLogic - Semantic Technology in the Real World - Big Data Tech Con - Oct 2014

RDF – A Serialization

Turtle = Terse RDF Triple Language (*.ttl)

Natural, easy to read:

  <http://dbpedia.org/resource/David_Bowie> <http://dbpedia.org/ontology/birthPlace> <http://dbpedia.org/resource/London> .
  <http://dbpedia.org/resource/David_Bowie> <http://dbpedia.org/ontology/birthDate> "1947-01-08"^^<http://www.w3.org/2001/XMLSchema#date> .

Namespace prefixes for brevity; a semicolon indicates a repeated subject:

  @prefix db: <http://dbpedia.org/resource/> .
  @prefix onto: <http://dbpedia.org/ontology/> .
  @prefix xs: <http://www.w3.org/2001/XMLSchema#> .

  db:David_Bowie onto:birthPlace db:London ;
                 onto:birthDate "1947-01-08"^^xs:date .
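A prefixed name is plain string concatenation: the prefix is replaced by its namespace IRI and the local part is appended. The sketch below illustrates that expansion in Python; it is not a Turtle parser, and the `expand` helper is hypothetical. It also shows why the XML Schema namespace has to end in `#`: without it, `xs:date` would expand to `...XMLSchemadate`.

```python
# Prefix table mirroring the @prefix declarations on the slide.
PREFIXES = {
    "db": "http://dbpedia.org/resource/",
    "onto": "http://dbpedia.org/ontology/",
    "xs": "http://www.w3.org/2001/XMLSchema#",
}


def expand(name, prefixes=PREFIXES):
    """Expand a prefixed name such as 'db:London' into a full IRI."""
    prefix, sep, local = name.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown prefix in {name!r}")
    return prefixes[prefix] + local


for name in ("db:David_Bowie", "onto:birthPlace", "xs:date"):
    print(name, "->", expand(name))
```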

Page 18: MarkLogic - Semantic Technology in the Real World - Big Data Tech Con - Oct 2014

SPARQL – What is SPARQL?

SPARQL – the SPARQL Protocol and RDF Query Language
• "an RDF query language, that is, a query language for databases, able to retrieve and manipulate data stored in Resource Description Framework format" (Wikipedia)
• Looks a lot like SQL
• Based on pattern matching
• 4 kinds of SPARQL queries: SELECT, CONSTRUCT, ASK, DESCRIBE
• Plus SPARQL Update (part of SPARQL 1.1)

Page 19: MarkLogic - Semantic Technology in the Real World - Big Data Tech Con - Oct 2014

SPARQL – Example

  PREFIX rnews: <http://iptc.org/std/rNews/2011-10-07#>
  PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

  # Find all news item headlines newer than July 11 2013.
  SELECT ?s ?headline ?date
  WHERE {
    ?s a rnews:NewsItem ;
       rnews:headline ?headline ;
       rnews:datePublished ?date .
    FILTER (?date > "2013-07-11T00:00:00"^^xsd:dateTime )
  }
  ORDER BY DESC(?date)

The parts of the query:
• Prefixes – make for less typing (a bit like namespaces in XML)
• Comments – lines starting with #
• Projection – variables are bound in the pattern match (or externally bound)
• Selection – select triples matching these patterns
• Filter – return only triples that match these conditions
• Order by – order the results

In words: return the ?s, ?headline, and ?date where ?s is a news item AND ?s has the headline ?headline AND ?s was published on ?date, but only return results where ?date is after July 11 2013. Order the results by date, descending.
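For readers more at home in a general-purpose language, the same selection, filter, and ordering can be sketched in Python over a small in-memory list of triples. The sample news items and the helper names are invented for the illustration; a real SPARQL engine works from indexes rather than scanning a list.

```python
from datetime import datetime

RNEWS = "http://iptc.org/std/rNews/2011-10-07#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"  # what "a" abbreviates

# Hypothetical sample data: three news items with headline and publication date.
TRIPLES = [
    ("ex:n1", RDF_TYPE, RNEWS + "NewsItem"),
    ("ex:n1", RNEWS + "headline", "Old story"),
    ("ex:n1", RNEWS + "datePublished", datetime(2013, 6, 1)),
    ("ex:n2", RDF_TYPE, RNEWS + "NewsItem"),
    ("ex:n2", RNEWS + "headline", "Newer story"),
    ("ex:n2", RNEWS + "datePublished", datetime(2013, 8, 2)),
    ("ex:n3", RDF_TYPE, RNEWS + "NewsItem"),
    ("ex:n3", RNEWS + "headline", "Newest story"),
    ("ex:n3", RNEWS + "datePublished", datetime(2013, 9, 10)),
]


def objects(subject, predicate):
    """All objects of triples with the given subject and predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def recent_headlines(after):
    rows = []
    for s, p, o in TRIPLES:
        # Selection: ?s a rnews:NewsItem
        if p == RDF_TYPE and o == RNEWS + "NewsItem":
            for headline in objects(s, RNEWS + "headline"):
                for date in objects(s, RNEWS + "datePublished"):
                    # Filter: ?date > cutoff
                    if date > after:
                        # Projection: ?s ?headline ?date
                        rows.append((s, headline, date))
    # Order by: DESC(?date)
    return sorted(rows, key=lambda row: row[2], reverse=True)


for row in recent_headlines(datetime(2013, 7, 11)):
    print(row)
```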

Page 27: MarkLogic - Semantic Technology in the Real World - Big Data Tech Con - Oct 2014

Inference – What is inference?

We infer new facts/relationships based on:
• Facts/relationships in the database
• Rules that we "know"

Page 28: MarkLogic - Semantic Technology in the Real World - Big Data Tech Con - Oct 2014

Inference – Example [1]

We know:
• prod001 is a Henley
• prod001 is blue
• Henley is a subclass of Shirt

We can infer:
• prod001 is a Shirt

We can ask "find me all blue Shirts", and find prod001

Page 29: MarkLogic - Semantic Technology in the Real World - Big Data Tech Con - Oct 2014

Inference – Rules [1]

We said "Henley is a subclass of Shirt"
We need a formal definition (rule) for subclass

  rule "subClassOf rdfs9"
  construct { ?x a ?c2 }
  {
    ?x a ?c1 .
    ?c1 rdfs:subClassOf ?c2 .
    filter(?c1 != ?c2)
  }
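The subclass rule can be read as "whenever ?x has type ?c1 and ?c1 is a subclass of ?c2, add the fact that ?x has type ?c2". A minimal sketch of that reading in Python follows; it applies the rule repeatedly until nothing new appears. The `colour` predicate and the function names are assumptions made for the example, and a real rule engine would not materialize triples this naively.

```python
RDF_TYPE = "rdf:type"            # what "a" abbreviates in the rule
SUBCLASS = "rdfs:subClassOf"

FACTS = {
    ("prod001", RDF_TYPE, "Henley"),
    ("prod001", "colour", "blue"),      # "colour" is a made-up predicate name
    ("Henley", SUBCLASS, "Shirt"),
}


def infer_types(triples):
    """Apply the subclass rule until no new triples appear (a fixed point)."""
    facts = set(triples)
    while True:
        new = {
            (x, RDF_TYPE, c2)
            for x, p1, c1 in facts if p1 == RDF_TYPE
            for c, p2, c2 in facts if p2 == SUBCLASS and c == c1 and c1 != c2
        } - facts
        if not new:
            return facts
        facts |= new


def find(facts, cls, colour):
    """Find every ?x where ?x a <cls> and ?x colour <colour>."""
    return sorted(
        x for x, p, o in facts
        if p == RDF_TYPE and o == cls and (x, "colour", colour) in facts
    )


print(find(FACTS, "Shirt", "blue"))               # without inference
print(find(infer_types(FACTS), "Shirt", "blue"))  # with the subclass rule applied
```

Looping to a fixed point means chains of subclasses (for example a hypothetical Shirt subClassOf Clothing) are followed as well.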

Page 30: MarkLogic - Semantic Technology in the Real World - Big Data Tech Con - Oct 2014

Inference – Example [2]

We know:
• prod001 is a Henley
• ID_001 is blue
• prod001 is the same as ID_001

We can infer:
• prod001 is a blue Henley

We can ask "find me all blue Henleys", and find prod001

Page 31: MarkLogic - Semantic Technology in the Real World - Big Data Tech Con - Oct 2014

Inference – Rules [2]

We said "prod001 is the same as ID_001"
We need a formal definition (rule) for same as

  rule "sameAs rdfp11b"
  construct { ?u2 ?p ?v }
  {
    ?u1 ?p ?v .
    ?u1 owl:sameAs ?u2 .
    filter(?u1 != ?u2)
  }
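As a rough illustration, the sameAs rule copies every statement about ?u1 onto ?u2. Read literally, it only copies in one direction, so the sketch below also adds the symmetric sameAs statement. That symmetry step is an assumption of this example (OWL rule sets normally supply it as a separate rule), and the `colour` predicate and function name are made up.

```python
SAME_AS = "owl:sameAs"

FACTS = {
    ("prod001", "rdf:type", "Henley"),
    ("ID_001", "colour", "blue"),       # "colour" is a made-up predicate name
    ("prod001", SAME_AS, "ID_001"),
}


def apply_same_as(triples):
    """Copy statements across owl:sameAs links until nothing new appears."""
    facts = set(triples)
    while True:
        new = set()
        for u1, p, u2 in facts:
            if p != SAME_AS or u1 == u2:
                continue
            # Assumption: sameAs is symmetric (a separate rule in OWL rule sets).
            new.add((u2, SAME_AS, u1))
            # The rule shown above: every statement about ?u1 also holds for ?u2.
            new.update((u2, q, v) for s, q, v in facts if s == u1)
        new -= facts
        if not new:
            return facts
        facts |= new


inferred = apply_same_as(FACTS)
print(("prod001", "colour", "blue") in inferred)
print(("ID_001", "rdf:type", "Henley") in inferred)
```

With both directions in place, prod001 picks up "blue" from ID_001 and ID_001 picks up "Henley" from prod001, which is what the "blue Henley" query needs.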

Page 32: MarkLogic - Semantic Technology in the Real World - Big Data Tech Con - Oct 2014

Share data + share rules and vocabularies
• Make use of the Linked Open Data web
• Make use of standard Ontologies
• Generalized rules: owl, rdf, rdfs
• Domain-specific rules and vocabularies: foaf, FIBO, …

Page 33: MarkLogic - Semantic Technology in the Real World - Big Data Tech Con - Oct 2014

NLP – What is it?

Natural Language Processing (NLP) – A field of computer science and linguistics enabling computers to derive meaning from human or natural language input. NLP technology can extract meaning from text or speech.

Text Analytics – Analysis to derive high-quality information from text, usually involving the process of structuring the input text, deriving patterns within the structured data, and interpreting the output.

Entity Extraction – A subset of Text Analytics, where you run a tool over some text to identify "entities" in the text. "Entities" may be people, places, companies, organizations, phone numbers, and so on. MarkLogic Partners that do entity extraction include Temis, NetOwl, Smart Logic, and SAP. Some can return entities in the form of RDF.

Event Extraction – An emerging enhancement to Entity Extraction that extracts events ("John went to China") as well as entities ("John is a person, China is a place") from text.

Page 34: MarkLogic - Semantic Technology in the Real World - Big Data Tech Con - Oct 2014

NLP – Where does it fit?

Page 35: MarkLogic - Semantic Technology in the Real World - Big Data Tech Con - Oct 2014

Where do Triples come from?

Triples are used to express
• Facts
• Relationships

Page 36: MarkLogic - Semantic Technology in the Real World - Big Data Tech Con - Oct 2014

DOMAIN WORLD AT LARGE

DOCUMENTS

Page 37: MarkLogic - Semantic Technology in the Real World - Big Data Tech Con - Oct 2014

Facts from the World at Large

Linked Open Data
• Facts that are freely available
• In a form that's easily consumed

DBpedia (Wikipedia as structured information)
• Einstein was born in Germany
• Ireland's currency is the Euro

GeoNames
• Doha is the capital of Qatar
• Doha has these lat/long coordinates

"Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/"

Page 38: MarkLogic - Semantic Technology in the Real World - Big Data Tech Con - Oct 2014

Facts from your domain

Proprietary Semantic Facts (Facts and Taxonomies in your organization or industry)

Like Open Data, but domain specific
• Might be proprietary within a company
• Or shared across an industry
• Includes data and ontologies

Some Examples
• A bank's proprietary reference data
• A pharmaceutical company's drug ontology
• An industry-wide ontology such as FIBO

Page 39: MarkLogic - Semantic Technology in the Real World - Big Data Tech Con - Oct 2014

Facts from Documents

Document metadata
• Ex: Categories, author, publish date, source

Facts in free-flowing text
• Entities: this document mentions the person Richard Nixon, the product Advil, the company IBM
• Events: this document says that Nixon went to China, John Smith met Jane Doe, Barclays acquired Lehman Brothers

Found automatically or provided at authoring time

Page 40: MarkLogic - Semantic Technology in the Real World - Big Data Tech Con - Oct 2014

The World of Triples

• Linked Open Data (Free semantic facts available to anyone)
• Facts from Free-Flowing Text (Derived from semantic enrichment)
• Proprietary Semantic Facts (Facts and Taxonomies in your organization)
• Facts in Documents (Part of metadata or added with authoring tools)

[Diagram axis labels: Semantic World, Document World]

Page 42: MarkLogic - Semantic Technology in the Real World - Big Data Tech Con - Oct 2014

Relationships

Triples can model many kinds of relationships:
• Relationships between resources: customer123 is the same as cus_id_456
• Relationships between values: "John Smith" is the same as "John Smythe"
• Relationships between classes: "Henley" is a sub class of "Shirt"
• Relationships between a predicate and its subject or object: the object of "lives in" is a place
• Relationships between entities and documents: "Merrill Lynch" was mentioned in reportABC (which mentioned "rogue trader")

Page 43: MarkLogic - Semantic Technology in the Real World - Big Data Tech Con - Oct 2014

Relationships

Relationships make it easy to:
• Integrate data from disparate sources: customer123 (from source1) is the same as cus_id_456 (from source2)
• Reconcile data: "John Smith" is the same as "John Smythe"
• Infer new facts about the data: "Henley" is a sub class of "Shirt"; the object of "lives in" is a place
• Link entities with documents: "Merrill Lynch" was mentioned in reportABC (which mentioned "rogue trader")

Page 44: MarkLogic - Semantic Technology in the Real World - Big Data Tech Con - Oct 2014

Data Modelling - You can model everything in RDF!

Pattern
• Temptation: go "all-in" on RDF and SPARQL
• Pragmatism: back off to use a mix of RDF and XML/JSON

Example: Customer record
• Always delivered as a whole record
• Don't shred it into RDF and reconstitute for every query!
• Store as XML/JSON, return as a single object

Recommendation
• Be pragmatic from the start
• Ask for requirements (what), not implementation (how)
• Use RDF and XML/JSON as appropriate

Page 45: MarkLogic - Semantic Technology in the Real World - Big Data Tech Con - Oct 2014

Why Semantic Technologies?

Because …
• Triples are atomic – easy to create, manage, combine
• The Linked Open Data Web shares data as triples
• A natural choice for metadata and real-world facts .. and facts embedded in a document
• Adds relationships between facts, between documents
• Standards encourage tools and sharing
• Graph model – easy to follow links
• Ontologies – share information, infer new facts

Page 46: MarkLogic - Semantic Technology in the Real World - Big Data Tech Con - Oct 2014

Why Semantics and Search?

Because …
• Many use cases need documents, triples, and data together
• One database means a simple, efficient, powerful architecture
• Combination queries – query documents, triples, data in a single query – open up new possibilities

Page 47: MarkLogic - Semantic Technology in the Real World - Big Data Tech Con - Oct 2014

SEMANTICS AND … Better Together

Page 48: MarkLogic - Semantic Technology in the Real World - Big Data Tech Con - Oct 2014

SEMANTICS AND .. DOCUMENTS

Page 49: MarkLogic - Semantic Technology in the Real World - Big Data Tech Con - Oct 2014

Triples and Documents

Documents can contain triples:

  <article>
    <meta>
      <title>Man bites dog</title>
      <sem:triple>
        <sem:subject>http://example.org/news/42</sem:subject>
        <sem:predicate>http://example.org/published</sem:predicate>
        <sem:object>2013-09-10</sem:object>
      </sem:triple>
      …

Triples are persisted in documents:

  <sem:triple>
    <sem:subject>http://example.org/news/Nixon</sem:subject>
    <sem:predicate>http://example.org/wentTo</sem:predicate>
    <sem:object>China</sem:object>
  </sem:triple>

Triples can be annotated in documents:

  <source>AP Newswire</source>
  <sem:triple date="1972-02-21" confidence="100">
    <sem:subject>http://example.org/news/Nixon</sem:subject>
    <sem:predicate>http://example.org/wentTo</sem:predicate>
    <sem:object>China</sem:object>
  </sem:triple>

Page 53: MarkLogic - Semantic Technology in the Real World - Big Data Tech Con - Oct 2014

  import module namespace sem = "http://marklogic.com/semantics"
    at "/MarkLogic/semantics.xqy";

  sem:sparql('
    SELECT ?country
    WHERE {
      <http://example.org/news/Nixon> <http://example.org/wentTo> ?country
    }
    ',
    (), (),
    (: restrict the SPARQL query to triples in documents matching this cts query :)
    cts:and-query((
      cts:path-range-query("//sem:triple/@confidence", ">=", 80),
      cts:path-range-query("//sem:triple/@date", "<", xs:date("1974-01-01")),
      cts:or-query((
        cts:element-value-query(xs:QName("source"), "AP Newswire"),
        cts:element-value-query(xs:QName("source"), "BBC")
      ))
    ))
  )

• Which countries did Nixon visit?
• .. before 1974?
• .. only show me answers where I have at least 80% confidence
• .. and the source is AP Newswire OR BBC
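The shape of this combination query, a triple pattern plus conditions on the annotations that sit alongside each triple, can be sketched in plain Python. This is only an illustration of the logic, not MarkLogic's API: the `AnnotatedTriple` class and `countries_visited` function are invented, and every fact except the 1972 China visit from the slide is hypothetical sample data.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class AnnotatedTriple:
    subject: str
    predicate: str
    obj: str
    when: date          # the triple's date annotation
    confidence: int     # the triple's confidence annotation, in percent
    source: str         # the <source> element of the enclosing document


FACTS = [
    AnnotatedTriple("Nixon", "wentTo", "China", date(1972, 2, 21), 100, "AP Newswire"),
    # Hypothetical rows, chosen so that each one fails exactly one condition.
    AnnotatedTriple("Nixon", "wentTo", "Egypt", date(1974, 6, 12), 95, "BBC"),
    AnnotatedTriple("Nixon", "wentTo", "Atlantis", date(1973, 1, 1), 40, "BBC"),
    AnnotatedTriple("Nixon", "wentTo", "Narnia", date(1973, 1, 1), 90, "Rumour blog"),
]


def countries_visited(facts, person, before, min_confidence, sources):
    """Triple pattern (person wentTo ?country) plus the three annotation filters."""
    return sorted({
        f.obj for f in facts
        if f.subject == person and f.predicate == "wentTo"
        and f.when < before
        and f.confidence >= min_confidence
        and f.source in sources
    })


print(countries_visited(FACTS, "Nixon", date(1974, 1, 1), 80, {"AP Newswire", "BBC"}))
```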

Page 54: MarkLogic - Semantic Technology in the Real World - Big Data Tech Con - Oct 2014

SEMANTICS AND .. DOCUMENTS .. AND GEOSPATIAL .. AND SCALAR (DATETIME) .. AND BITEMPORAL

Page 55: MarkLogic - Semantic Technology in the Real World - Big Data Tech Con - Oct 2014

Two Hemispheres, One Brain

Page 56: MarkLogic - Semantic Technology in the Real World - Big Data Tech Con - Oct 2014

Two Hemispheres, One Brain

Triples:
• Highly structured
• Atomic
• Do one thing well

XML and JSON:
• Flexible structure
• Rich documents
• Rich applications

Page 57: MarkLogic - Semantic Technology in the Real World - Big Data Tech Con - Oct 2014

Combination query - scenario

You work in an Incident Call Center. A call comes in:
• "some maniac in a blue van just tried to run me down"
• "I got the first three letters of his license plate: ABC"

You could look up "ABC*" in the license plate database, or … look for similar incident reports:
• Reports that mention a "blue van" …
• around the same time …
• around the same place …
• with a license plate that starts with "ABC"

Page 58: MarkLogic - Semantic Technology in the Real World - Big Data Tech Con - Oct 2014

An XML or JSON document can represent many information types:

  <SAR>
    <title>Suspicious vehicle near airport</title>
    <date>2012-11-12Z</date>
    <type>observation/surveillance</type>
    <threat>
      <type>suspicious activity</type>
      <category>suspicious vehicle</category>
    </threat>
    <location>
      <lat>37.497075</lat>
      <long>-122.363319</long>
    </location>
    <triple>
      <subject>IRIID</subject>
      <predicate>isa</predicate>
      <object>license-plate</object>
    </triple>
    <triple>
      <subject>IRIID</subject>
      <predicate>value</predicate>
      <object>ABC 123</object>
    </triple>
    <description>A blue van with license plate ABC 123 was observed parked behind the airport sign…</description>
  </SAR>

Page 59: MarkLogic - Semantic Technology in the Real World - Big Data Tech Con - Oct 2014

Combination Query: Example

[Diagram: the <SAR> document from the previous slide, showing its title, date, type, threat, location, description, and triple elements.]
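The call-center scenario combines four kinds of condition in one query: free text ("blue van"), a date range, geographic distance, and a triple lookup (a license plate starting with "ABC"). The Python sketch below shows that combination over in-memory records. It is a hedged illustration only: the `Report` class, the thresholds, and the second sample report are assumptions, and a database would answer this from indexes rather than a loop.

```python
from dataclasses import dataclass, field
from datetime import date
from math import asin, cos, radians, sin, sqrt


@dataclass
class Report:
    title: str
    day: date
    lat: float
    lon: float
    description: str
    triples: list = field(default_factory=list)


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points (haversine formula)."""
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def similar_reports(reports, phrase, day, max_days, lat, lon, max_km, plate_prefix):
    hits = []
    for r in reports:
        # Triple lookup: subjects that are license plates, then their values.
        plate_ids = {s for s, p, o in r.triples if p == "isa" and o == "license-plate"}
        plates = [o for s, p, o in r.triples if p == "value" and s in plate_ids]
        if (phrase in r.description.lower()                       # text search
                and abs((r.day - day).days) <= max_days           # around the same time
                and distance_km(lat, lon, r.lat, r.lon) <= max_km  # around the same place
                and any(pl.startswith(plate_prefix) for pl in plates)):
            hits.append(r.title)
    return hits


REPORTS = [
    Report("Suspicious vehicle near airport", date(2012, 11, 12), 37.497075, -122.363319,
           "A blue van with license plate ABC 123 was observed parked behind the airport sign",
           [("IRIID", "isa", "license-plate"), ("IRIID", "value", "ABC 123")]),
    # Hypothetical second report that should not match.
    Report("Abandoned car", date(2012, 11, 13), 40.7128, -74.0060,
           "A red car with license plate XYZ 789 was left on the bridge",
           [("IRIID2", "isa", "license-plate"), ("IRIID2", "value", "XYZ 789")]),
]

print(similar_reports(REPORTS, "blue van", date(2012, 11, 14), 7, 37.5, -122.36, 10, "ABC"))
```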

Page 60: MarkLogic - Semantic Technology in the Real World - Big Data Tech Con - Oct 2014

SEMANTICS AND .. ENTERPRISE DATABASE FEATURES

Page 61: MarkLogic - Semantic Technology in the Real World - Big Data Tech Con - Oct 2014

Semantics Database Architecture

[Diagram labels: XQuery, XSLT, SQL, JavaScript, SPARQL; GRAPH SPARQL; TRIPLE]

Page 62: MarkLogic - Semantic Technology in the Real World - Big Data Tech Con - Oct 2014

MarkLogic Semantics Use Cases

Page 63: MarkLogic - Semantic Technology in the Real World - Big Data Tech Con - Oct 2014

SEMANTIC SEARCH

Page 64: MarkLogic - Semantic Technology in the Real World - Big Data Tech Con - Oct 2014

Semantic Search

User searches and queries refined by topics and semantic relationships
• Refine search with topics and concepts
• Geo-location of research institutions, Semantic Visualization & Tag Clouds

Page 65: MarkLogic - Semantic Technology in the Real World - Big Data Tech Con - Oct 2014

Suggested / Related Content

Topic, semantic relationships and content used to find related and suggested content
• Related articles
• Suggestions
• Augmented topic browsing

Page 66: MarkLogic - Semantic Technology in the Real World - Big Data Tech Con - Oct 2014

Linked Open Data

Semantic data augmenting user search and queries
• Return concepts and facts in addition to results
• Leverage context from all sources

Page 67: MarkLogic - Semantic Technology in the Real World - Big Data Tech Con - Oct 2014

Dynamic Semantic Publishing

Present content, data and information to users
• Relationships power content presentation
• Taxonomy browsing
• Beyond WebCMS to Dynamic Publishing
• Efficiently tag articles

Page 68: MarkLogic - Semantic Technology in the Real World - Big Data Tech Con - Oct 2014

Master (Meta)data Management

Flexible model to manage metadata

• Metadata master
• Digital Supply Chain powered with semantic relationships
• Captures the complexity of information needed to deliver digital assets and products

Page 69: MarkLogic - Semantic Technology in the Real World - Big Data Tech Con - Oct 2014

DATA INTEGRATION

Page 70: MarkLogic - Semantic Technology in the Real World - Big Data Tech Con - Oct 2014

Use Case: Investment Research

Environment:

• SEC Filings
• Analyst Briefing Transcripts
• News Feeds
• Press Releases

70 2003-2014 Brook Path Partners, Inc. All Rights Reserved. www.brookpath.com

Challenge:

Provide a simple search solution for investment analysts to quickly identify opportunities

Page 71: MarkLogic - Semantic Technology in the Real World - Big Data Tech Con - Oct 2014

Investment Research Using Semantic Technology

SEC Filings

News Feeds

Analyst Briefings

Press Releases

Research Ontology

Page 72: MarkLogic - Semantic Technology in the Real World - Big Data Tech Con - Oct 2014

Use Case: Reference Data

Environment:

• Hundreds of Business Units
• Hundreds of Products
• Thousands of Applications
• Multiple Data Formats (Structured, Unstructured)
• Multiple Identifiers

Challenge:

Aggregate all data across business units and geographies.

Page 73: MarkLogic - Semantic Technology in the Real World - Big Data Tech Con - Oct 2014

Reference Data Using Semantic Technology

Entity classes:

• UltimateParent
• JointVenture
• WhollyOwnedSubsidiary
• MajorityOwnedSubsidiary
• SignificantlyOwnedSubsidiary

Example entities:

• Customer
• Customer APAC Subsidiary
• Customer Japanese Subsidiary

Relationships:

• ultimateParentOf, whollyOwnsAndControls
• majorityOwnsAndControls

Page 74: MarkLogic - Semantic Technology in the Real World - Big Data Tech Con - Oct 2014

Reference Data Using Semantic Technology

Customer

Customer APAC Subsidiary

Customer Japanese Subsidiary

ultimateParentOf, whollyOwnsAndControls

majorityOwnsAndControls

Advantages of using Semantics:

• Clearly define relationships between entities
• Query entities and relationships together
• Use graph traversal to find and discover facts/relationships
• Queries can infer data using standard rules
• Run queries / serve queries from standard SPARQL endpoints
• Aggregate and report using SPARQL
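
Graph traversal and inference over ownership data can be sketched as follows. This is a minimal Python illustration under stated assumptions: the predicate names come from the slide, but which predicate connects which pair of entities is a guess, the entity identifiers are shortened for illustration, and the rule "every entity reached through a chain of control has the starting entity as its ultimate parent" is a deliberate simplification. In SPARQL 1.1 the traversal would typically be written as a property path over the two control predicates.

```python
# Predicates treated as "control" links; the names are taken from the slide.
CONTROL_PREDICATES = {"whollyOwnsAndControls", "majorityOwnsAndControls"}

# Assumed edges: the slide does not state which predicate joins which entities.
TRIPLES = [
    ("Customer", "whollyOwnsAndControls", "CustomerAPACSubsidiary"),
    ("CustomerAPACSubsidiary", "majorityOwnsAndControls", "CustomerJapaneseSubsidiary"),
]


def controlled_entities(parent, triples):
    """Follow control predicates transitively from parent (breadth-first)."""
    found, frontier = [], [parent]
    while frontier:
        current = frontier.pop(0)
        for s, p, o in triples:
            is_new = o not in found and o != parent
            if s == current and p in CONTROL_PREDICATES and is_new:
                found.append(o)
                frontier.append(o)
    return found


def infer_ultimate_parent_of(parent, triples):
    """Derive ultimateParentOf facts that are not stored explicitly."""
    return [(parent, "ultimateParentOf", e) for e in controlled_entities(parent, triples)]


for fact in infer_ultimate_parent_of("Customer", TRIPLES):
    print(fact)
```

Only two facts are stored, yet the query also reports the indirect relationship between the customer and its Japanese subsidiary, which is the kind of discovered fact the list above refers to.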

Page 75: MarkLogic - Semantic Technology in the Real World - Big Data Tech Con - Oct 2014

Use Case: Customer Insight

Challenge: Progress from transaction flow analysis to person-centric analytics, combining data from many diverse sources

Environment:

• Dozens of transactional systems, each with its own analytics
• Interaction records
• External data sources
• Connections among customers and other entities

Page 76: MarkLogic - Semantic Technology in the Real World - Big Data Tech Con - Oct 2014

Customer Insight Using Semantic Technology

Customer-centric view

• Marketing
• Fraud and Financial Crime
• Profile Configuration Tools
• Profile data extracted from multiple sources
• Profiles include social graphs

Page 77: MarkLogic - Semantic Technology in the Real World - Big Data Tech Con - Oct 2014

Use Case: Regulatory Compliance

Environment:

• Thousands of rules, millions of accounts and onboarding documents
• Impossible to pre-define dimensions, relationships

Challenge:

Provide a scalable map of regulations to internal policies and drive automated workflow.

Page 78: MarkLogic - Semantic Technology in the Real World - Big Data Tech Con - Oct 2014

Regulatory Compliance Using Semantic Technology

Documents

MarkLogic Workflow

Policies Ontology

Regulations
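
Mapping regulations to policies and documents can be sketched as a two-hop join over triples. This is a rough Python illustration only: the `ex:implements` and `ex:governedBy` predicates and every identifier below are hypothetical, and a real deployment would hand the result to a workflow engine rather than print it.

```python
# Hypothetical triples linking documents to policies and policies to regulations.
TRIPLES = [
    ("policy:KYC-01", "ex:implements", "reg:AML-Rule-7"),
    ("policy:KYC-02", "ex:implements", "reg:AML-Rule-7"),
    ("policy:TAX-01", "ex:implements", "reg:Tax-Rule-2"),
    ("doc:onboarding-123", "ex:governedBy", "policy:KYC-01"),
    ("doc:onboarding-456", "ex:governedBy", "policy:KYC-02"),
    ("doc:onboarding-789", "ex:governedBy", "policy:TAX-01"),
]


def subjects(triples, predicate, obj):
    """Return all subjects that have the given predicate and object."""
    return {s for s, p, o in triples if p == predicate and o == obj}


def review_queue(regulation, triples):
    """Documents to re-review when a regulation changes.

    Walks regulation <- policy <- document, so no dimension or
    relationship has to be defined up front beyond the triples themselves.
    """
    docs = set()
    for policy in subjects(triples, "ex:implements", regulation):
        docs |= subjects(triples, "ex:governedBy", policy)
    return sorted(docs)


print(review_queue("reg:AML-Rule-7", TRIPLES))
```

Adding a new regulation or policy means adding triples, not changing a schema, which is why the approach suits an environment where relationships cannot be defined in advance.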

Page 79: MarkLogic - Semantic Technology in the Real World - Big Data Tech Con - Oct 2014

Use Case: Data Provenance

Challenge: Provide a consistent way to identify the source, timeliness and accuracy of the data

Environment:

• Regulations requiring data lineage
• Complex data lifecycle, which makes it hard to keep track of data elements and their changes

Page 80: MarkLogic - Semantic Technology in the Real World - Big Data Tech Con - Oct 2014

Data Provenance Using Semantic Technology

<Trade>
  <provenance>
    <triple>
      <subject> TradeID </subject>
      <predicate> wasDerivedFrom </predicate>
      <object> CDS_xyz </object>
    </triple>
  </provenance>
</Trade>

<Cashflows>
  <triple>
    <subject> Cashflows </subject>
    <predicate> wasAttributedTo </predicate>
    <object> System_123 </object>
  </triple>
  <PartyIdentifier> <TradeID> 123456 </TradeID> </PartyIdentifier>
</Cashflows>
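
Reading provenance triples out of a document that carries them alongside its ordinary data can be sketched with Python's standard XML parser. The document below is an assumption modeled on the Cashflows example on this slide; the exact element nesting is not recoverable from the slide text, so treat the layout as illustrative.

```python
import xml.etree.ElementTree as ET

# Illustrative document; the element layout is an assumption modeled on the slide.
CASHFLOWS_XML = """<Cashflows>
  <triple>
    <subject>Cashflows</subject>
    <predicate>wasAttributedTo</predicate>
    <object>System_123</object>
  </triple>
  <PartyIdentifier><TradeID>123456</TradeID></PartyIdentifier>
</Cashflows>"""


def extract_triples(xml_text):
    """Return every embedded <triple> as a (subject, predicate, object) tuple."""
    root = ET.fromstring(xml_text)
    return [
        tuple((t.findtext(tag) or "").strip() for tag in ("subject", "predicate", "object"))
        for t in root.iter("triple")
    ]


def attributed_to(xml_text):
    """Return the agents named by wasAttributedTo triples in the document."""
    return [o for _, p, o in extract_triples(xml_text) if p == "wasAttributedTo"]


def trade_id(xml_text):
    """Read the ordinary (non-triple) payload from the same document."""
    return ET.fromstring(xml_text).findtext("PartyIdentifier/TradeID")


print(extract_triples(CASHFLOWS_XML))
print(attributed_to(CASHFLOWS_XML), trade_id(CASHFLOWS_XML))
```

Because the triple travels inside the document, the lineage statement and the data it describes are stored and queried together.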

Page 81: MarkLogic - Semantic Technology in the Real World - Big Data Tech Con - Oct 2014

Thank you! www.marklogic.com

