
Applying large scale text analytics with graph databases

Date posted: 05-Apr-2017
Upload: data-ninja-api
Copyright © 2017 Oracle and/or its affiliates and Data Ninja Services. All rights reserved.

Applying Large-Scale Text Analytics with Graph Databases to Visualize Entity and Relationship Inferences

Trung Diep, Architect, Docomo Innovations
Ronald Sujithan, Software Architect, Docomo Innovations
Zhe Wu, Architect, Oracle
Page 1: Applying large scale text analytics with graph databases


Page 2

Outline


• Introduction and overview of graph technologies and graph databases

• RDF semantic graph

• Property graph

• Overview of text analytics offered by Data Ninja Services

• Case Study #1: news mining application

• Case Study #2: insights from analyzing Amazon product reviews

• Summary

Page 3

Relational Model vs. Graph Model

[Figure: side-by-side comparison of the relational model and the graph model. Courtesy: Tom Sawyer 2016]

Page 4

Two Graph Models: RDF and Property Graph

RDF Data Model
• Data federation
• Knowledge representation
• Inferencing

Property Graph Model
• Graph search & analysis
• Big Data analytics
• Entity analytics

[Figure: the two graph models mapped to application areas (Social Network Analysis; Linked Data / Semantic Mediation) and industry domains (National Intelligence, Public Safety, Social Media search, Marketing - Sentiment, Life Sciences, Health Care, Publishing, Finance)]

Release 2 (12.2) in Oracle Cloud

Page 5

World’s Fastest Big Data Graph Benchmark: 1 Trillion Triple RDF Benchmark with Oracle Spatial and Graph

• World’s fastest data loading performance
• World’s fastest query performance
• World’s fastest inference performance
• Massive scalability: 1.08 trillion edges
• Platform: Oracle Exadata X4-2 Database Machine
• Source: w3.org/wiki/LargeTripleStores, 9/26/2014

Oracle Database 12c can load, query, and infer millions of RDF graph edges per second.

[Chart: query, load, and inference throughput in millions of triples per second; the plotted values are 1.13, 1.42, and 1.52]

Page 6

What is RDF?

RDF is a graph data model for web resources and their relationships.

The graph can be serialized as RDF/XML, N3, N-Triples, …

Construction unit: the triple (also called an assertion or fact), for example
<http://foobar> <:produces> <:mp3>

Quads (named graphs) add context, provenance, identification, etc. to assertions, for example
<http://foobar> <:produces> <:mp3> <:ProductGraph>

[Figure: example RDF graph of subject-predicate-object edges connecting http://www.foobar.com, http://www.foobar.com/products/mp3, http://www.oracle.com, http://www.oracle.com/products/RDF, and the literal "CA" through the predicates http://…/locatedIn, http://…/produce, and http://…/uses]
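To make the triple/quad distinction concrete, here is a minimal Python sketch that serializes statements as N-Triples and N-Quads lines. It uses plain tuples rather than an RDF library; the `http://example.org/` namespace (standing in for the slide's `:` prefix), the `locatedIn` edge, and the helper names `iri`, `literal`, and `serialize` are hypothetical.

```python
# Minimal sketch: RDF statements as tuples of already-formatted terms.
EX = "http://example.org/"  # hypothetical namespace standing in for the slide's ":" prefix


def iri(value: str) -> str:
    """Wrap an IRI in angle brackets, as N-Triples and N-Quads require."""
    return f"<{value}>"


def literal(text: str) -> str:
    """Quote a plain string literal, escaping backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def serialize(statement: tuple) -> str:
    """One N-Triples line for (s, p, o), or one N-Quads line for (s, p, o, g)."""
    if len(statement) not in (3, 4):
        raise ValueError("expected a triple (3 terms) or a quad (4 terms)")
    return " ".join(statement) + " ."


triple = (iri("http://foobar"), iri(EX + "produces"), iri(EX + "mp3"))
quad = triple + (iri(EX + "ProductGraph"),)  # 4th term: the named graph
located = (iri("http://www.foobar.com"), iri(EX + "locatedIn"), literal("CA"))

for statement in (triple, quad, located):
    print(serialize(statement))
```

The only difference between the two formats is the optional fourth term, which is why a quad store can hold the same triple in several named graphs.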

Page 8

Why Oracle Spatial & Graph for Linked Data?

The advantages of the Oracle RDF triple store:
– Greater flexibility than single-purpose triple stores
– SPARQL and SQL interaction with relationally stored data
– Use of SQL hints, indexes, and caching to improve performance
– Standard database administration: backup/recovery, replication, etc.
– PL/SQL or Java programming
– Support for large volumes of data (hundreds of billions to over a trillion triples)
– Good integration with standard RDF client tools such as Jena and Sesame

Source: "Oracle Semantic Graph in a scientific knowledge portal", 16-09-2013

Page 9

GeoSPARQL Support for Spatial Data

[Figure: enterprise data servers (a spatial database and a population statistics database, with a relational schema and a 2D feature schema) published as the linked data graphs Spatial_Graph and Pop_Stat_Graph using spatial vocabularies, and queried by Web Analyst 1 and Web Analyst 2 via SPARQL/GeoSPARQL over REST]

Page 10

Enriching Text Using NLP and Domain Ontologies

[Figure: text is enriched using NLP, machine learning, and Genzyme ontologies to support search, presentation, reports, visualization, and queries]

Page 11

Data Ninja Text Analytics Cloud Services

[Figure: unstructured data (e.g. from Oracle Social Cloud) passes through text analytics and a semantic extractor to produce an ontology (RDF) and relational tables; together with structured data, these are loaded into Oracle Spatial and Graph for graph analytics and graph visualization]

New business insights from graph inferences that could not be queried in a relational database

Page 12

RDF Graph Roadmap

• SPARQL optimization within the RDBMS kernel
• Social network analysis (SNA): clustering, path analysis, community detection, PageRank, …
• Manageability: Enterprise Developer integration
• R2RML enhancements: geospatial (vector) features
• Deeper RDBMS kernel: graph computation
• Standards based: OWL QL
• Multi-type support: graph, relational, JSON, text, geospatial, …
• Visualization: richer graph visualization options

Page 13

Property Graph


Page 14

The Property Graph Data Model

• A set of vertices (or nodes) – each vertex has a unique identifier.

– each vertex has a set of in/out edges.

– each vertex has a collection of key-value properties.

• A set of edges (or links) – each edge has a unique identifier.

– each edge has a head/tail vertex.

– each edge has a label denoting the type of relationship between its two vertices.

– each edge has a collection of key-value properties.

https://github.com/tinkerpop/blueprints/wiki/Property-Graph-Model

[Figure: an example property graph with six vertices, marko (age 29), vadas (age 27), josh (age 32), peter (age 35), lop (lang java), and ripple (lang java), connected by edges labeled knows or created; each edge carries a weight property (values from 0.2 to 1.0)]
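The data model above can be sketched in Python as plain classes: every vertex and edge has a unique ID and a property map, every edge has a tail (source) and head (target) vertex plus a label, and each vertex keeps its in/out edge lists. The edge endpoints are assumed from the standard TinkerPop "classic" toy graph, which has the same vertices, labels, and weights as the figure; the class names are illustrative, not the Blueprints API.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity equality avoids recursive comparisons
class Vertex:
    id: int
    props: dict = field(default_factory=dict)      # key-value properties
    out_edges: list = field(default_factory=list)
    in_edges: list = field(default_factory=list)


@dataclass(eq=False)
class Edge:
    id: int
    tail: Vertex   # source vertex (the edge leaves it)
    head: Vertex   # target vertex (the edge enters it)
    label: str     # relationship type, e.g. "knows"
    props: dict = field(default_factory=dict)


class PropertyGraph:
    def __init__(self):
        self.vertices = {}  # unique vertex id -> Vertex
        self.edges = {}     # unique edge id -> Edge

    def add_vertex(self, vid, **props):
        self.vertices[vid] = Vertex(vid, props)
        return self.vertices[vid]

    def add_edge(self, eid, tail_id, head_id, label, **props):
        tail, head = self.vertices[tail_id], self.vertices[head_id]
        edge = Edge(eid, tail, head, label, props)
        tail.out_edges.append(edge)
        head.in_edges.append(edge)
        self.edges[eid] = edge
        return edge


g = PropertyGraph()
for vid, props in [(1, {"name": "marko", "age": 29}), (2, {"name": "vadas", "age": 27}),
                   (3, {"name": "lop", "lang": "java"}), (4, {"name": "josh", "age": 32}),
                   (5, {"name": "ripple", "lang": "java"}), (6, {"name": "peter", "age": 35})]:
    g.add_vertex(vid, **props)

# Edge endpoints assumed from the TinkerPop "classic" toy graph
for eid, tail, head, label, weight in [(7, 1, 2, "knows", 0.5), (8, 1, 4, "knows", 1.0),
                                       (9, 1, 3, "created", 0.4), (10, 4, 5, "created", 1.0),
                                       (11, 4, 3, "created", 0.4), (12, 6, 3, "created", 0.2)]:
    g.add_edge(eid, tail, head, label, weight=weight)

# Traverse incoming "created" edges: who created "lop" (vertex 3)?
creators = sorted(e.tail.props["name"] for e in g.vertices[3].in_edges if e.label == "created")
print(creators)
```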

Page 15

Graph Analysis in Business

• Product recommendation: recommend the most similar items purchased by similar people.
• Influencer identification: find the people who are central in a given network, e.g. for influencer marketing.
• Community detection: identify groups of people who are close to each other, e.g. for target-group marketing.
• Graph pattern matching: find all sets of entities that match a given pattern, e.g. for fraud detection.

Example input graphs: purchase records (customers and the items they bought) and communication streams (e.g. tweets).
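As one illustration of product recommendation on a customer-item graph, the sketch below uses user-based collaborative filtering with Jaccard similarity, a swapped-in technique rather than the method behind the slide: walk from a customer to their items, to other customers who bought them, and on to items those customers bought. The purchase data and function names are hypothetical.

```python
from collections import Counter

# Hypothetical purchase records: customer -> set of items bought
purchases = {
    "alice": {"tent", "stove", "lantern"},
    "bob": {"tent", "stove", "sleeping bag"},
    "carol": {"tent", "kayak"},
    "dave": {"novel", "bookmark"},
}


def jaccard(a, b):
    """Overlap of two item sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(customer, purchases, k=2):
    """Score unseen items by the similarity of the customers who bought them."""
    mine = purchases[customer]
    scores = Counter()
    for other, items in purchases.items():
        if other == customer:
            continue
        sim = jaccard(mine, items)
        if sim == 0:
            continue  # no shared items, so no path through the graph
        for item in items - mine:
            scores[item] += sim
    return [item for item, _ in scores.most_common(k)]


print(recommend("alice", purchases))
```

Here bob shares two of alice's three items (similarity 0.5) and carol shares one (0.25), so bob's sleeping bag ranks ahead of carol's kayak.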

Page 16

Oracle Big Data Spatial and Graph: Architecture of Existing Property Graph Support

[Figure: a data access layer (Apache Blueprints & Lucene/SolrCloud, Java APIs) over Oracle NoSQL Database, Apache HBase, and Oracle Spatial and Graph (Oracle Database 12.2); graph analytics through Parallel In-Memory Graph Analytics (PGX); access via Java APIs, a Java SDK, JDBC/SQL/PLSQL, a REST/Web Service, and Java, Groovy, Python, …; supported formats include RDF (RDF/XML, N-Triples, N-Quads, TriG, N3, JSON) and the property graph formats GraphML, GML, GraphSON, flat files, CSV, and relational data sources]

Page 17

Support for Cytoscape Open Source Visualization

Page 18

Integration with Tom Sawyer Perspectives via property graph REST APIs

Page 19

In-Memory Analyst on 1 node is up to 2 orders of magnitude faster than Spark GraphX distributed execution on 2 to 16 nodes

Oracle’s In-Memory Analyst vs Spark GraphX 1.1

[Chart: execution time in seconds (log scale) of Pagerank and Single-Source Shortest Path on the Twitter and Web graphs, for Oracle versus Spark with 2, 4, 8, and 16 nodes]
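Single-source shortest path, one of the two benchmarked algorithms, computes the minimum-weight distance from one source vertex to every reachable vertex. A sequential Dijkstra sketch on a small hypothetical graph shows what is being computed; it says nothing about how PGX or GraphX parallelize the algorithm.

```python
import heapq


def dijkstra(adj, source):
    """Shortest distances from source, assuming non-negative edge weights.

    adj maps each vertex to a list of (neighbor, weight) pairs.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry; a shorter path was already found
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Hypothetical weighted directed graph
adj = {
    "a": [("b", 1.0), ("c", 4.0)],
    "b": [("c", 2.0), ("d", 5.0)],
    "c": [("d", 1.0)],
}
print(dijkstra(adj, "a"))
```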

Page 20

Oracle Big Data Spatial and Graph: Roadmap for Property Graph Support

[Figure: the planned architecture includes a data access layer based on Apache TinkerPop3 & Lucene/SolrCloud/ElasticSearch; Oracle NoSQL Database, Apache HBase, Apache Cassandra, and Oracle Spatial and Graph (Oracle Database 12.2) as storage; graph analytics through Parallel In-Memory Graph Analytics (PGX), Apache Spark integration (MLlib, Spark SQL), and deep learning (neural networks); access via Java APIs, a Java SDK, JDBC/SQL/PLSQL, a REST/Web Service, and Java, Groovy, Python, …; supported formats include RDF (RDF/XML, N-Triples, N-Quads, TriG, N3, JSON) and the property graph formats GraphML, GML, GraphSON, flat files, CSV, and relational data sources]

Page 21

Case Study: News Mining Application


Page 22

Text Analytics API in N-Triples Format

[Figure: free-form texts (documents, messages, news) are processed by the text analytics service into structured data: concepts, categories, entities, and sentiments]

• Cloud-based web services
• Daily updated knowledge base
• Support for customization
• Scalable performance

Page 23

Text Analytics API for Constructing RDF Graphs

[Figure: free-form texts (documents, tweets, news) are processed by the text analytics API into concepts, categories, entities, and sentiments, emitted as N-Triples to build RDF graphs that link texts, concepts, categories, entities, and entity categories to an ontology]

Page 24

News Mining Overview

newsID | newsArticle | newsSource
20160902_555 | A new study says that parts of Africa and the Asia-Pacific region may be vulnerable to outbreaks of the Zika virus, including some of the world's most populous countries and many with limited resources to identify and respond to the mosquito-borne disease. [more] | http://www.newkerala.com/news/2016/fullnews-113309.html
20160903_1317 | Hurricane Hermine, set to cause flooding and damage when it hits Florida overnight, will make it harder for the state to fight Zika, a mosquito-borne virus shown to cause birth defects, experts in infectious diseases and mosquitoes said on Thursday. [more] | http://kelo.com/news/articles/2016/sep/01/hurricane-hermine-will-complicate-floridas-zika-fight-experts/
20160904_2209 | Singapore confirmed 26 more cases of locally transmitted Zika infections, the health ministry and National Environment Agency (NEA) said in a joint statement on Saturday, bringing the tally to 215. Of the 26 new cases, 24 were linked to existing clusters while two cases have no known links to any existing cluster, they said. [more] | https://www.yahoo.com/news/singapore-says-confirms-26-more-local-transmission-zika-052937119--finance.html
… | … | …

• Domain-specific, health-related news crawling
• English language only
• Worldwide coverage
• Healthcare-related keywords in news titles

Page 25

RDF Graph Example of Extracted Entities

Subject | Predicate | Object
http://www.newkerala.com/news/2016/fullnews-113309.html | http://dataninja.net/occurrence | urn:uuid:e47c4916-e7c1-4a3b-b650-f243e0d7ba33
urn:uuid:e47c4916-e7c1-4a3b-b650-f243e0d7ba33 | http://dataninja.net/entity | http://dataninja.net/entity/Zika+virus
urn:uuid:e47c4916-e7c1-4a3b-b650-f243e0d7ba33 | http://dataninja.net/occurrence/entity/sentiment | http://dataninja.net/entity/sentiment/negative
urn:uuid:e47c4916-e7c1-4a3b-b650-f243e0d7ba33 | http://dataninja.net/occurrence/entity/count | "12"^^xsd:integer
urn:uuid:e47c4916-e7c1-4a3b-b650-f243e0d7ba33 | http://dataninja.net/occurrence/entity/sentiment_score | "-1.0"^^xsd:float
urn:uuid:e47c4916-e7c1-4a3b-b650-f243e0d7ba33 | http://dataninja.net/occurrence/entity/score | "1.0"^^xsd:float
urn:uuid:e47c4916-e7c1-4a3b-b650-f243e0d7ba33 | http://dataninja.net/occurrence/entity/text_locations | "(135,145) (565,575) (777,787) (950,960) (1142,1152) (1535,1545) (1696,1706) (1755,1765) (1887,1891) (2191,2195) (2352,2362) (2376,2386)"

(265 more triples for the same news article)
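The triples in this table can be generated mechanically from one extracted-entity occurrence. The sketch below assumes a hypothetical dict layout for the analytics output (not the actual Data Ninja response schema), mints a `urn:uuid:` occurrence node, and writes full XSD datatype IRIs, since N-Triples does not allow the `xsd:` prefix used on the slide. The function names `typed` and `occurrence_triples` are illustrative.

```python
import uuid

DN = "http://dataninja.net/"
XSD = "http://www.w3.org/2001/XMLSchema#"


def typed(value, xsd_type):
    """Typed literal with a full XSD datatype IRI (N-Triples has no prefixes)."""
    return f'"{value}"^^<{XSD}{xsd_type}>'


def occurrence_triples(article_url, occ, occurrence_id=None):
    """N-Triples lines for one extracted-entity occurrence.

    `occ` uses a hypothetical dict layout, not the real API response format.
    """
    node = f"<urn:uuid:{occurrence_id or uuid.uuid4()}>"  # one node per occurrence
    entity = occ["entity"].replace(" ", "+")               # mirrors ".../entity/Zika+virus"
    rows = [
        (f"<{article_url}>", f"<{DN}occurrence>", node),
        (node, f"<{DN}entity>", f"<{DN}entity/{entity}>"),
        (node, f"<{DN}occurrence/entity/sentiment>", f"<{DN}entity/sentiment/{occ['sentiment']}>"),
        (node, f"<{DN}occurrence/entity/count>", typed(occ["count"], "integer")),
        (node, f"<{DN}occurrence/entity/sentiment_score>", typed(occ["sentiment_score"], "float")),
        (node, f"<{DN}occurrence/entity/score>", typed(occ["score"], "float")),
    ]
    return [" ".join(row) + " ." for row in rows]


url = "http://www.newkerala.com/news/2016/fullnews-113309.html"
occ = {"entity": "Zika virus", "sentiment": "negative", "count": 12,
       "sentiment_score": -1.0, "score": 1.0}
lines = occurrence_triples(url, occ, "e47c4916-e7c1-4a3b-b650-f243e0d7ba33")
for line in lines:
    print(line)
```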

Page 26

RDF Graphs for Extracted Entities (one news article)

[Figure: the article http://www.newkerala.com/news/2016/fullnews-113309.html links via http://dataninja.net/occurrence to the occurrence node urn:uuid:e47c4916-e7c1-4a3b-b650-f243e0d7ba33, which links via http://dataninja.net/entity to http://dataninja.net/entity/Zika+virus, via http://dataninja.net/occurrence/entity/sentiment to negative, and via http://dataninja.net/occurrence/entity/count to 12, …; other entities extracted from the same article include http://dataninja.net/entity/Philippines, http://dataninja.net/entity/Thailand, and http://dataninja.net/entity/Nigeria]

One occurrence blank node for each extracted entity

Page 27

RDF Graphs for Extracted Entities (multiple articles)

[Figure: occurrence nodes from several articles all link to the same entity http://dataninja.net/entity/Zika+virus: urn:uuid:e47c4916-e7c1-4a3b-b650-f243e0d7ba33 and urn:uuid:68282cbb-b70c-4f6e-8157-5ef6b1d34d31 (both shown under http://www.newkerala.com/news/2016/fullnews-113309.html) and urn:uuid:ab7b9e43-710f-436e-b6ff-15abad71ca15 (under https://www.yahoo.com/news/singapore-says-confirms-26-more-local-transmission-zika-052937119--finance.html)]

Same URI for the same entity
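Because every occurrence points at the same entity URI, finding all articles that mention an entity is a two-hop join: article to occurrence node, then occurrence node to entity. A minimal Python sketch over in-memory triples follows; in Oracle Spatial and Graph this would normally be a SPARQL query. The Singapore occurrence (`urn:example:singapore-occurrence`) and the function name are hypothetical.

```python
DN = "http://dataninja.net/"
NEWKERALA = "http://www.newkerala.com/news/2016/fullnews-113309.html"
YAHOO = ("https://www.yahoo.com/news/singapore-says-confirms-26-more-local-"
         "transmission-zika-052937119--finance.html")

# (subject, predicate, object) triples; the Singapore occurrence is hypothetical
triples = [
    (NEWKERALA, DN + "occurrence", "urn:uuid:e47c4916-e7c1-4a3b-b650-f243e0d7ba33"),
    ("urn:uuid:e47c4916-e7c1-4a3b-b650-f243e0d7ba33", DN + "entity", DN + "entity/Zika+virus"),
    (YAHOO, DN + "occurrence", "urn:uuid:ab7b9e43-710f-436e-b6ff-15abad71ca15"),
    ("urn:uuid:ab7b9e43-710f-436e-b6ff-15abad71ca15", DN + "entity", DN + "entity/Zika+virus"),
    (YAHOO, DN + "occurrence", "urn:example:singapore-occurrence"),
    ("urn:example:singapore-occurrence", DN + "entity", DN + "entity/Singapore"),
]


def articles_mentioning(triples, entity_uri):
    """Join article -occurrence-> node -entity-> entity on the shared entity URI."""
    article_of = {o: s for s, p, o in triples if p == DN + "occurrence"}
    return sorted({article_of[s] for s, p, o in triples
                   if p == DN + "entity" and o == entity_uri and s in article_of})


print(articles_mentioning(triples, DN + "entity/Zika+virus"))
```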

Page 28

Ontology for Extracted Entities

[Figure: the entities http://dataninja.net/entity/Philippines, http://dataninja.net/entity/Thailand, and http://dataninja.net/entity/Nigeria are linked to the entity categories http://dataninja.net/entity/category/Location, http://dataninja.net/entity/category/Country, and http://dataninja.net/entity/category/Kingdom, which are related to one another by rdfs:subClassOf]

Ontology extracted for categories of entities

Page 29

Ontology for Extracted Entities (with more categories)

[Figure: the same entities and categories, extended with the categories http://dataninja.net/category/Southeast+Asia, http://dataninja.net/category/Regions+of+Asia, and http://dataninja.net/category/Africa, all connected by rdfs:subClassOf edges]

Additional categories of entities added to the ontology
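The payoff of the category hierarchy is RDFS inference: an entity typed with a category is also an instance of every superclass reachable through `rdfs:subClassOf` (RDFS entailment rules rdfs9 and rdfs11). The sketch below applies those two rules in plain Python; the specific type and subclass edges are hypothetical, since the figure shows the categories but not these exact edges.

```python
# Hypothetical assertions (the figure shows these categories, not these exact edges)
asserted_types = {
    "Thailand": {"Kingdom", "Southeast Asia"},
    "Philippines": {"Country", "Southeast Asia"},
    "Nigeria": {"Country", "Africa"},
}
sub_class_of = {
    "Kingdom": {"Country"},
    "Country": {"Location"},
    "Southeast Asia": {"Regions of Asia"},
}


def superclasses(cls, sub_class_of):
    """All transitive superclasses of cls (rdfs11: subClassOf is transitive)."""
    seen, stack = set(), [cls]
    while stack:
        for parent in sub_class_of.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def inferred_types(entity, asserted_types, sub_class_of):
    """Asserted types plus all their superclasses (rdfs9: type propagates upward)."""
    result = set(asserted_types.get(entity, ()))
    for cls in list(result):
        result |= superclasses(cls, sub_class_of)
    return result


def instances_of(cls, asserted_types, sub_class_of):
    return sorted(e for e in asserted_types
                  if cls in inferred_types(e, asserted_types, sub_class_of))


print(instances_of("Location", asserted_types, sub_class_of))
```

No entity is asserted to be a Location, yet all three are returned, which is the kind of inference a plain relational query would not make without extra joins.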

Page 30

RDF Graphs for Extracted Concepts (one news article)

[Figure: the article http://www.newkerala.com/news/2016/fullnews-113309.html links via http://dataninja.net/occurrence to the occurrence node urn:uuid:3f365159-2572-4c91-99ea-0f7ec7c0b7bc, which links via http://dataninja.net/concept to http://dataninja.net/concept/Zika+virus and via http://dataninja.net/occurrence/concept/score to 0.33; owl:sameAs edges relate the concept to entity URIs (http://dataninja.net/entity/Zika+virus and http://dataninja.net/entity/Zika+fever are shown)]

Same URI for the same concept, but not for entities with the same names

Page 31

RDF Graphs for Extracted Concepts (with categories)

[Figure: the concepts http://dataninja.net/concept/Zika+virus and http://dataninja.net/concept/Zika+fever connected by rdfs:subClassOf edges to the categories http://dataninja.net/category/Flaviviruses, http://dataninja.net/category/Zoonoses, http://dataninja.net/category/Viral+diseases, and http://dataninja.net/category/Infectious+diseases]

More categories of concepts added to improve the richness of the ontology

Page 32

RDF Graphs for Extracted Relationships

[Figure: for the article https://www.yahoo.com/news/singapore-says-confirms-26-more-local-transmission-zika-052937119--finance.html, the entities http://dataninja.net/entity/Zika+virus and http://dataninja.net/entity/Singapore (reached via http://dataninja.net/occurrence and http://dataninja.net/entity) are connected through the relationships http://dataninja.net/relationship/Outbreak, http://dataninja.net/relationship/Mosquitoes, and http://dataninja.net/relationship/Infections; an owl:intersectionOf construct is also shown]

New relationships discovered over time enrich the ontology further

Page 33

Semantic Search using RDF Graphs

[Figure: documents and news articles are loaded into an RDF graph in Oracle Spatial and Graph that captures concepts, related concepts, categories, entities, entity categories, keywords, and relationships; queries run through Oracle Graph Analytics over this graph return relevant matched news articles]
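One way semantic matching can go beyond keywords is to expand a query term through the concept categories, so that a search for a category also matches articles about its member concepts. The sketch ranks articles by the sum of their matching concept scores (compare `http://dataninja.net/occurrence/concept/score`); the data and the scoring rule are hypothetical, not the ranking used by Oracle Graph Analytics.

```python
# Hypothetical concept scores per article
article_concepts = {
    "article-1": {"Zika virus": 0.33, "Mosquito": 0.20},
    "article-2": {"Zika fever": 0.50},
    "article-3": {"Hurricane": 0.60},
}
# Hypothetical concept -> category edges
concept_categories = {
    "Zika virus": {"Flaviviruses", "Viral diseases"},
    "Zika fever": {"Viral diseases"},
    "Mosquito": {"Insects"},
    "Hurricane": {"Weather"},
}


def semantic_search(terms, article_concepts, concept_categories):
    """Rank articles whose concepts match a query term directly or via a category."""
    scores = {}
    for article, concepts in article_concepts.items():
        score = sum(s for c, s in concepts.items()
                    if c in terms or concept_categories.get(c, set()) & terms)
        if score > 0:
            scores[article] = score
    return sorted(scores, key=scores.get, reverse=True)


print(semantic_search({"Viral diseases"}, article_concepts, concept_categories))
```

Neither matching article contains the phrase "Viral diseases"; both are found through the category edges.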

Page 34

Case Study: Insights from analyzing Amazon Product Reviews


Page 35

Amazon Product Reviews – PG Data Model

[Figure: reviewer vertices (A, B, C, D) with name properties such as "John", "Sue", "buy1", and "shopper" are connected by Review edges to product vertices (1, 2, 3, 5) with asin properties such as "0000078", "10467328", "00675434", and "20794378"; each Review edge carries the properties helpful, reviewText, overall, summary, and reviewTime]

Raw JSON format:
{"reviewerID": "A3AF8FFZAZYNE5",
 "asin": "0000000078",
 "helpful": [1, 1],
 "reviewText": "…",
 "overall": 5.0,
 "summary": "Impactful!",
 "unixReviewTime": 1092182400,
 "reviewTime": "08 11, 2004"}
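Mapping one raw review record onto this model is straightforward: the reviewer and the product become vertices, and the review becomes an edge carrying the rating and text. The sketch assumes an optional `reviewerName` field (absent from the record shown), falling back to the reviewer ID, and derives an ISO date from `unixReviewTime` interpreted as UTC; the function name and dict layout are illustrative.

```python
import json
from datetime import datetime, timezone

raw = ('{"reviewerID": "A3AF8FFZAZYNE5", "asin": "0000000078", "helpful": [1, 1], '
       '"reviewText": "...", "overall": 5.0, "summary": "Impactful!", '
       '"unixReviewTime": 1092182400, "reviewTime": "08 11, 2004"}')


def review_to_graph(line):
    """Map one review record to a reviewer vertex, a product vertex, and a Review edge."""
    r = json.loads(line)
    reviewer = {"id": r["reviewerID"],
                # "reviewerName" is an assumed optional field; fall back to the ID
                "name": r.get("reviewerName", r["reviewerID"])}
    product = {"id": r["asin"], "asin": r["asin"]}
    edge = {"label": "Review", "tail": reviewer["id"], "head": product["id"],
            "helpful": r["helpful"], "reviewText": r["reviewText"],
            "overall": r["overall"], "summary": r["summary"],
            # ISO date derived from the Unix timestamp, in UTC
            "reviewTime": datetime.fromtimestamp(r["unixReviewTime"],
                                                 tz=timezone.utc).date().isoformat()}
    return reviewer, product, edge


reviewer, product, edge = review_to_graph(raw)
print(reviewer, product, edge, sep="\n")
```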

Page 36

Amazon Product Reviews – Data Ninja Enrichment

[Figure: the same reviewer/product graph, with each Review edge enriched by two additional properties, sentiment and sentimentScore. Pipeline: JSON parser → fetch sentiment → create nodes → create relationships → Oracle connector → Oracle Big Data Spatial and Graph, with Oracle NoSQL Database and Apache HBase as storage options]

Page 37

Demo — Data Ninja Integration

# Please sign up at https://market.mashape.com/dataninja/smart-content
# and obtain your free Data Ninja API key.
# Alternatively, you can use the Amazon Web Services API Gateway
# (using your AWS account): https://auth.dataninja.net/cart

smartcontent_url = 'https://smartcontent.dataninja.net/smartcontent/tag'
user_name = 'YOUR_USER_NAME_HERE'  # placeholder: your Mashape user name
mashape_key = 'YOUR_API_KEY_HERE'
headers = {'Content-Type': 'application/json',
           'Accept': 'application/json',
           'X-Mashape-User': user_name,
           'X-Mashape-Key': mashape_key}

Page 38

Demo — Data Ninja Integration

import json
import requests

def getSmartSentiment(text):
    # Send the text to the Data Ninja Smart Content endpoint
    payload = {'text': text}
    r = requests.post(smartcontent_url, headers=headers,
                      data=json.dumps(payload))
    data = r.json()
    # Extract the sentiment and sentiment_score from the output,
    # falling back to defaults if either field is missing
    sentiment = data.get('sentiment', '')
    sentScore = data.get('sentiment_score', 0.0)
    return sentiment, sentScore

Page 39

Demo — Initialization

# Log in to the Oracle Big Data Lite VM
cd /opt/oracle/oracle-spatial-graph/property_graph/dal/groovy/sh
gremlin-opg-nosql.sh

// Connect to the Oracle NoSQL Database store "kvstore" on localhost:5000
server = new ArrayList<String>();
server.add("localhost:5000");
// Declare the vertex and edge properties to load, with default values
cfg = GraphConfigBuilder.forPropertyGraphNosql() \
  .setName("aws_review").setStoreName("kvstore") \
  .setHosts(server) \
  .addVertexProperty("name", PropertyType.STRING, "EMPTY_NAME") \
  .addEdgeProperty("overall", PropertyType.DOUBLE, "0.0") \
  .addEdgeProperty("sentimentScore", PropertyType.DOUBLE, "0.0") \
  .addEdgeProperty("sentiment", PropertyType.STRING, "NO_SENTIMENT") \
  .addEdgeProperty("reviewText", PropertyType.STRING, "NO_REVIEW") \
  .setMaxNumConnections(2).build();

Page 40

Demo — Create session

// Get an OraclePropertyGraph instance for the graph stored in
// Oracle NoSQL Database, using the configuration from the previous step
opg = OraclePropertyGraph.getInstance(cfg);

// Create a new Analyst session and read the graph from the database
// into memory; this lets us run PGQL queries efficiently
// and run built-in graph algorithms
session = Pgx.createSession("session1");
analyst = session.createAnalyst();
pgxGraph = session.readGraphWithProperties(cfg);

Page 41

Demo — PGQL queries

// PGQL is a SQL-like query language for property graphs
// http://pgql-lang.org/
query1 = "SELECT n, e, e.overall, e.sentimentScore, m " +
         "WHERE (n) -[e]-> (m) LIMIT 10";
pgxResultSet = pgxGraph.queryPgql(query1);
pgxResultSet.print(10);

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------

| n | e | m |

===============================================================================================

| PgxVertex[ID=-7340878287527889238] | PgxEdge[ID=5762] | PgxVertex[ID=-9102601091582098129] |

| PgxVertex[ID=-3177690238472796119] | PgxEdge[ID=16300] | PgxVertex[ID=-9064039503677645533] |

| PgxVertex[ID=4519911688218637303] | PgxEdge[ID=17019] | PgxVertex[ID=-8952286227085815033] |

| PgxVertex[ID=-519930175215930092] | PgxEdge[ID=10178] | PgxVertex[ID=-8670116947875050439] |

| PgxVertex[ID=-3248157193225014577] | PgxEdge[ID=10818] | PgxVertex[ID=-8450344604270036796] |

| PgxVertex[ID=1160440609280744779] | PgxEdge[ID=11251] | PgxVertex[ID=-8079550817648245886] |

| PgxVertex[ID=6181033568449534264] | PgxEdge[ID=8948] | PgxVertex[ID=-7996993222650009100] |

| PgxVertex[ID=8061500766289030429] | PgxEdge[ID=3605] | PgxVertex[ID=-7826585563510228947] |

| PgxVertex[ID=-6856157354094250528] | PgxEdge[ID=5813] | PgxVertex[ID=-7593018979011067527] |

| PgxVertex[ID=862019015540675002] | PgxEdge[ID=1018] | PgxVertex[ID=-7556968917107238591] |

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Page 42

Demo — Aggregate queries (1)

// Example 1: disagreement in polarity: high rating and low sentiment score
query2 = "SELECT n.name, e.overall, e.sentimentScore, e.reviewText, m " +
         "WHERE (n) -[e with overall > 4.0 and sentimentScore < -0.9]-> (m) " +
         "order by e.sentimentScore LIMIT 10";
pgxResultSet = pgxGraph.queryPgql(query2);
pgxResultSet.print(10);

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

| n.name | e.overall | e.sentimentScore | e.reviewText | m |

===============================================================================================================================================================================================

| Kiwi | 5.0 | -0.90514827 | She climbed out of the cockpit of her Fairey Barracuda and became instantly famo | PgxVertex[ID=1878509548385937579] |

| Gary Selikow | 5.0 | -0.90514827 | The Holocaust A History of the Jews of Europe During the Second World War , by p | PgxVertex[ID=9122138607977681669] |

| Miss Calculation "Mathbaby" | 5.0 | -0.90514827 | There I was. Probably the only one in the movie theater above the age of thirtee | PgxVertex[ID=-611636155378504919] |

| Srinivas P. Ganti "prasad" | 5.0 | -0.90514827 | In a very exhaustive account of Middle Eastern politics, Friedman narrates, base | PgxVertex[ID=7872217753950946849] |

| Bluestalking Reader "Bluestalking Reader" | 5.0 | -0.90514827 | I guess the only way to do this is just plunge right in, though of all the books | PgxVertex[ID=4467821667800686818] |

| Bonnie Brody "Book Lover and Knitter" | 5.0 | -0.90514827 | Joyce Carol Oates has written a deeply felt memoir, ̀ A Widow's Story', following | PgxVertex[ID=4467821667800686818] |

| Stephen Frater | 5.0 | -0.90514827 | Book reviewBy STEPHEN FRATER, author of HELL ABOVE EARTHLOST IN SHANGRI-LA: | PgxVertex[ID=5830558107292558467] |

| Cy B. Hilterman "Cy. Hilterman" | 5.0 | -0.90514827 | A true historic story of survival in the jungles of New Guinea amidst natives wh | PgxVertex[ID=5830558107292558467] |

| Cy B. Hilterman "Cy. Hilterman" | 5.0 | -0.90514827 | What a delightful read! Water for Elephants has got to be one of the best reads | PgxVertex[ID=5894498295248166816] |

| John Umland | 5.0 | -0.90514827 | I read Unbroken in two days. I will summarize the story, mention the author's ef | PgxVertex[ID=-5439053811866244671] |

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Page 43

Demo — Aggregate queries (2)

// Example 2: disagreement in polarity: low rating and high sentiment score
query3 = "SELECT n.name, e.overall, e.sentimentScore, e.reviewText, m " +
         "WHERE (n) -[e with overall < 2.0 and sentimentScore > 0.9]-> (m) " +
         "order by e.sentimentScore LIMIT 10";
pgxResultSet = pgxGraph.queryPgql(query3);
pgxResultSet.print(10);

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

| n.name | e.overall | e.sentimentScore | e.reviewText | m |

=====================================================================================================================================================================================

| Amazon Customer | 1.0 | 0.90086615 | This title is deceptive-- no one knows what an "Annual" is, except for t | PgxVertex[ID=-5918987544460979951] |

| Galina | 1.0 | 0.90110934 | This book takes many, many pages to say in a remarkably roundabout and flowery w | PgxVertex[ID=-4968252386747415161] |

| Elizebeth Neumann | 1.0 | 0.90114343 | Unless you enjoy reading a book as interesting as the dictionary this book isnt | PgxVertex[ID=5415848389720693761] |

| Doug Rice | 1.0 | 0.90118825 | A dictionary should demonstrate good lexicographic technique and have an up-to-d | PgxVertex[ID=-8360052157946045560] |

| Doug Rice | 1.0 | 0.9011979 | A dictionary should demonstrate good lexicographic technique and have an up-to-d | PgxVertex[ID=-5498908216507816124] |

| Kindle Reader "Kindle Reader" | 1.0 | 0.90166533 | This was positively the most frustrating book I have ever read. Where others mi | PgxVertex[ID=-4463070554159192016] |

| Alessandro Bruno | 1.0 | 0.90180194 | I felt compelled to review this book in order to shake off that feeling of intel | PgxVertex[ID=3280190210596483762] |

| Hiwaycruzer | 1.0 | 0.9018065 | This book is a must read for all teenagers considering a career at nearby Disney | PgxVertex[ID=3280190210596483762] |

| Jackal | 1.0 | 0.9019033 | This is a boring book about traditional Russian cooking. If you want current Rus | PgxVertex[ID=2190854144543979320] |

| Amazon Customer "Sci-reader" | 1.0 | 0.90192723 | I just finished this book and I must ay that it was a spectacularly boring coll | PgxVertex[ID=-6659798236378008734] |

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
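Before running query2 and query3 against the full graph, the same two polarity filters can be prototyped on a handful of hypothetical review edges in Python. The thresholds mirror the queries (rating above 4.0 with sentiment below -0.9; rating below 2.0 with sentiment above 0.9); the function name and data are illustrative.

```python
# Hypothetical review edges: (reviewer, overall rating, sentiment score in [-1, 1])
reviews = [
    ("r1", 5.0, -0.95),
    ("r2", 5.0, 0.80),
    ("r3", 1.0, 0.93),
    ("r4", 1.0, -0.70),
    ("r5", 4.5, -0.91),
]


def polarity_disagreements(reviews, high=4.0, low=2.0, threshold=0.9, limit=10):
    """Mirror the two PGQL filters, each ordered by ascending sentiment score."""
    high_rating_negative_text = sorted(
        (r for r in reviews if r[1] > high and r[2] < -threshold), key=lambda r: r[2])
    low_rating_positive_text = sorted(
        (r for r in reviews if r[1] < low and r[2] > threshold), key=lambda r: r[2])
    return high_rating_negative_text[:limit], low_rating_positive_text[:limit]


first, second = polarity_disagreements(reviews)
print([r[0] for r in first], [r[0] for r in second])
```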

Page 44: Applying large scale text analytics with graph databases

Demo: Graph Algorithms

// Personalized PageRank: score vertices by their relevance to a seed set
vertexSet = pgxGraph.createVertexSet();
vertex = pgxGraph.getVertex(4681900072665192241L);
vertexSet.add(vertex);
ppr = analyst.personalizedPagerank(pgxGraph, vertexSet);
// iterate over the 10 vertices with the highest scores
for (entry in ppr.getTopKValues(10)) { println("${entry.key.id} ${entry.value}") }

// Community detection, path analysis, clustering, …
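To make the personalizedPagerank call above less of a black box, here is a minimal stand-alone Java sketch of the underlying technique: power iteration in which the random walk follows an out-edge with probability damping and otherwise restarts at a uniformly chosen seed vertex (rank held by dead-end vertices is also sent back to the seeds). It is not PGX's implementation; the class name, the damping factor of 0.85, the fixed iteration count, and the four-vertex graph are illustrative assumptions.

```java
import java.util.Arrays;

/** Illustrative personalized PageRank via power iteration (a sketch, not the PGX implementation). */
final class PersonalizedPageRank {

    /**
     * @param out     out[v] lists the targets of v's outgoing edges
     * @param seeds   vertices the walk restarts from (the personalization set)
     * @param damping probability of following an edge instead of restarting (0.85 is an assumed, common choice)
     * @param iters   number of power-iteration steps
     */
    static double[] compute(int[][] out, int[] seeds, double damping, int iters) {
        int n = out.length;
        double[] restart = new double[n];
        for (int s : seeds) restart[s] += 1.0 / seeds.length;   // uniform restart distribution over the seeds
        double[] rank = restart.clone();                        // start the walk at the seeds
        for (int it = 0; it < iters; it++) {
            double[] next = new double[n];
            double dangling = 0.0;
            for (int v = 0; v < n; v++) {
                if (out[v].length == 0) {          // dead end: its rank is redistributed to the seeds below
                    dangling += rank[v];
                    continue;
                }
                double share = damping * rank[v] / out[v].length;
                for (int w : out[v]) next[w] += share;
            }
            // restart mass plus dead-end mass goes back to the seeds, so total rank stays 1
            double toSeeds = (1.0 - damping) + damping * dangling;
            for (int v = 0; v < n; v++) next[v] += toSeeds * restart[v];
            rank = next;
        }
        return rank;
    }

    /** Vertex ids sorted by descending score, truncated to k (plays the role of getTopKValues). */
    static Integer[] topK(double[] rank, int k) {
        Integer[] ids = new Integer[rank.length];
        for (int i = 0; i < ids.length; i++) ids[i] = i;
        Arrays.sort(ids, (a, b) -> Double.compare(rank[b], rank[a]));
        return Arrays.copyOf(ids, Math.min(k, ids.length));
    }

    public static void main(String[] args) {
        // hypothetical graph: 0->1, 0->2, 1->2, 2->0, 3->2 (nothing links to vertex 3)
        int[][] out = { {1, 2}, {2}, {0}, {2} };
        double[] rank = compute(out, new int[] {0}, 0.85, 100);
        for (int v : topK(rank, 3)) System.out.printf("vertex %d: %.4f%n", v, rank[v]);
    }
}
```

With vertex 0 as the only seed, vertex 0 ranks first, followed by 2 and then 1; vertex 3 scores 0 because nothing links to it and it is not a seed. This is the behavior that distinguishes personalized PageRank from global PageRank: scores reflect proximity to the seed set rather than overall centrality.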

Page 45: Applying large scale text analytics with graph databases

Summary

• Introduction and overview of graph technologies and graph databases

  • RDF Semantic Graph

  • Property Graph

• Integrating text analytics with graph technologies

  • Construct graphs from text using Natural Language Understanding technologies

  • Enrich graph data with text analytics

• Data Ninja Services Java client for Oracle Spatial and Graph, available with the Oracle Big Data Lite Virtual Machine

  • Please try it and give us your feedback!

• Contact us at [email protected] or [email protected]

Page 46: Applying large scale text analytics with graph databases

Resources

• Oracle Spatial and Graph

oracle.com/technetwork/database/options/spatialandgraph

• Oracle Big Data Spatial and Graph

oracle.com/database/big-data-spatial-and-graph/index.html

• Data Ninja Services

https://dataninja.net

• Java SDK for Oracle Spatial and Graph

https://github.com/DataNinjaAPI/dataninja-api-oracle-sdk-java

Page 47: Applying large scale text analytics with graph databases

BACKUP

Page 48: Applying large scale text analytics with graph databases

Semantic Alignment of Enterprise Metadata Powering Enterprise Federation and Integration

Benefits:

– Existing relational data stays in place and corresponding applications do not need to change

– Use of virtual mapping eliminates synchronization issues

– Common vocabulary helps with data integration issues

[Diagram: a Database Server holding the HR, Inventory, and Sales schemas and a Mid-Tier Server running Applications 1 to 3; applications use SQL and SPARQL, with RDF graphs (Inventory Graph, Sales Graph) aligned through Shared Ontologies over the HR, Inventory, and Sales databases.]

Page 49: Applying large scale text analytics with graph databases

The National Statistics Center (NSTAC), an incorporated administrative agency, is part of Japan's central statistical organization.

Since 2013, the IMISOS database has run on an Exadata X2-2 Half Rack with the Active Data Guard option and Database Firewall. Oracle Japan has published a customer case study.

NSTAC also bought an Exadata X3-2 Eighth Rack for tabulation work (FY14Q4).

Another Exadata opportunity, for the population census, will close by FY15Q3.

http://www.nstac.go.jp/en/index.html

Page 50: Applying large scale text analytics with graph databases

RDF Semantic Graph: RDF Views on Relational Tables

• Pattern matching on relational tables

• Supports the W3C RDF and SPARQL standards

• Automatic and custom mapping

  • Direct Mapping: automatic

  • R2RML: express customized mappings

• RDF views on tables, views, and SQL query results

• No duplication of data or storage

| EmpNo | Ename | Job | Mgr | DeptNo |
| 7521 | Ward | Salesman | 7698 | 10 |
| 7698 | Blake | Manager | 7839 | 10 |
| 7839 | King | President | | 30 |

| DeptNo | LOC |
| 10 | NYC |
| 30 | CHI |

[Figure: the same rows viewed as an RDF graph. Nodes :emp7521, :emp7698, :emp7839, :dept10, and :dept30 are linked by :name (Ward, Blake, King), :job (Salesman, Manager, President), :hasMgr, :worksAt, and :location (NYC, CHI) edges.]
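An RDF view exposes rows as triples without copying them. The following hand-written Java sketch shows the row-to-triple translation for the two tables above, using the slide's :emp/:dept naming and its :name, :job, :hasMgr, :worksAt, and :location predicates. It only approximates what a custom (R2RML-style) mapping would declare and is not Oracle's RDF view engine; the class and record names are hypothetical. The W3C Direct Mapping, by contrast, would derive IRIs and predicates from the table and column names automatically.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical custom mapping of the EMP/DEPT rows to triples (illustration only, not an RDF view engine). */
final class EmpDeptMapping {

    record Emp(int empNo, String ename, String job, Integer mgr, int deptNo) {}  // mgr is NULL for the president
    record Dept(int deptNo, String loc) {}

    static List<String> toTriples(List<Emp> emps, List<Dept> depts) {
        List<String> triples = new ArrayList<>();
        for (Emp e : emps) {
            String s = ":emp" + e.empNo();                                   // subject IRI built from the key
            triples.add(s + " :name \"" + e.ename() + "\" .");
            triples.add(s + " :job \"" + e.job() + "\" .");
            if (e.mgr() != null) triples.add(s + " :hasMgr :emp" + e.mgr() + " .");  // NULL Mgr yields no triple
            triples.add(s + " :worksAt :dept" + e.deptNo() + " .");              // foreign key becomes a link
        }
        for (Dept d : depts) triples.add(":dept" + d.deptNo() + " :location \"" + d.loc() + "\" .");
        return triples;
    }

    public static void main(String[] args) {
        List<Emp> emps = List.of(
            new Emp(7521, "Ward", "Salesman", 7698, 10),
            new Emp(7698, "Blake", "Manager", 7839, 10),
            new Emp(7839, "King", "President", null, 30));
        List<Dept> depts = List.of(new Dept(10, "NYC"), new Dept(30, "CHI"));
        toTriples(emps, depts).forEach(System.out::println);
    }
}
```

Running main prints 13 triples, matching the edge counts in the figure: three :name, three :job, two :hasMgr, three :worksAt, and two :location. In a real RDF view these triples are produced on demand from the tables at query time, which is why no data is duplicated.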

Page 51: Applying large scale text analytics with graph databases

Text Search through Apache Lucene/SolrCloud

• Integration with Apache Lucene and SolrCloud

• Supports manual and automatic indexing of graph elements

• Manual index:

  • oraclePropertyGraph.createIndex("my_index", Vertex.class);

  • indexVertices = oraclePropertyGraph.getIndex("my_index", Vertex.class);

  • indexVertices.put("key", "value", myVertex);

• Auto index:

  • oraclePropertyGraph.createKeyIndex("name", Edge.class);

  • oraclePropertyGraph.getEdges("name", "*hello*world");

  • Enables queries with wildcard syntax such as "*oracle* or *graph*"
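Lucene evaluates wildcard queries against its term index. As a rough illustration of the auto-index query semantics only, here is a toy in-memory Java key index (class name hypothetical) that maps a property value to element IDs and answers * and ? patterns by translating them into an anchored regular expression and scanning every stored value. It matches against the whole value, whereas Lucene matches individual indexed terms, so tokenization by the configured analyzer can change which elements a real query returns.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;
import java.util.regex.Pattern;

/** Toy key index: property value -> element ids, queried with * and ? wildcards by linear scan. */
final class WildcardKeyIndex {
    private final Map<String, TreeSet<Long>> postings = new HashMap<>();

    /** Index an element under a property value (compare indexVertices.put("key", "value", myVertex)). */
    void put(String value, long elementId) {
        postings.computeIfAbsent(value, v -> new TreeSet<>()).add(elementId);
    }

    /** '*' matches any run of characters, '?' exactly one; all other characters are literal. */
    static Pattern toRegex(String wildcard) {
        StringBuilder re = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*' || c == '?') {
                if (literal.length() > 0) {
                    re.append(Pattern.quote(literal.toString()));   // escape regex metacharacters such as '.'
                    literal.setLength(0);
                }
                re.append(c == '*' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) re.append(Pattern.quote(literal.toString()));
        return Pattern.compile(re.toString(), Pattern.DOTALL);
    }

    /** Ids of all elements whose whole indexed value matches the pattern. */
    TreeSet<Long> query(String wildcard) {
        Pattern p = toRegex(wildcard);
        TreeSet<Long> hits = new TreeSet<>();
        for (Map.Entry<String, TreeSet<Long>> e : postings.entrySet()) {
            if (p.matcher(e.getKey()).matches()) hits.addAll(e.getValue());
        }
        return hits;
    }

    public static void main(String[] args) {
        WildcardKeyIndex nameIndex = new WildcardKeyIndex();
        nameIndex.put("hello big world", 1L);
        nameIndex.put("say hello to the world", 2L);
        nameIndex.put("hello world!", 3L);
        System.out.println(nameIndex.query("*hello*world"));   // prints [1, 2]
    }
}
```

Element 3 is not returned because "*hello*world" must match the entire value, and "hello world!" ends with "!". A production index avoids the linear scan; Lucene, for example, intersects the pattern with its sorted term dictionary.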
