Technological White Paper

sones Graph Database Technology

Contents

1. Introduction
1.1. Components of the sones graph database
1.2. Data model
2. USPs
2.1. Index-free adjacency
2.2. Handling semi-structured data
2.3. Dynamic type extension
2.4. Graph query language
2.5. Solving the object-relational depiction problem
2.6. HTTP/REST API
2.7. Traverser API
3. Technical case study
Bibliography
Glossary

1. Introduction

This white paper is intended for IT professionals needing more in-depth information on the sones

graph database technology.

Many problems encountered in everyday (IT) life (e.g., hyperlinks, navigation, who-knows-who) can be modeled as graphs. A graph is a pair G = (V, E) consisting of a set V of nodes (vertices) and a set E of edges. The edges describe the relationships between the elements of V.

A graph-oriented database uses this structure to represent and manage information.
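
As a minimal sketch of this definition (not part of the sones API), a directed graph can be written down in Python as a pair of sets. The node names and the neighbors helper are illustrative assumptions.

```python
# A directed graph G = (V, E): V is a set of nodes, E a set of (source, target) pairs.
V = {"Alice", "Bob", "Carol"}
E = {("Alice", "Bob"), ("Alice", "Carol"), ("Bob", "Carol")}

def neighbors(node, edges):
    """Return the set of nodes reachable from `node` over one outgoing edge."""
    return {target for source, target in edges if source == node}

# Every edge must relate two elements of V.
assert all(s in V and t in V for s, t in E)

print(sorted(neighbors("Alice", E)))
```

A graph database stores exactly this kind of structure, but persistently and with properties attached to nodes and edges.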

In the following, we first explain the sones graph database technology in greater detail and then compare it with the relational database model. This is followed by an overview of the technology's unique features and a technical case study.

1.1. Components of the sones graph database

The diagram below illustrates the components of the sones graph database. It consists of four layers: the storage medium, the GraphFS, the GraphDB and the GraphDS.

Illustration 1 Components of the sones graph database

The bottom layer consists of an interface to a number of storage media. These include both local

file systems such as NTFS or Ext4 as well as storage service providers such as Microsoft Azure

or Amazon S3. In addition to using the persistent varieties mentioned above, it is possible to store

data in an in-memory-only structure. (Edlich, Friedland, Hampe, & Brauer, 2010)

Based on this first layer, we then have the GraphFS, which provides object management. This

includes, among other things, management of the following aspects:

Object namespace

Object identities

Object data streams

Object editions

Object revisions

Unlike traditional file systems, the GraphFS manages all of an object's information as metadata of that object and controls how the data is distributed across the integrated storage media. This enables a hybrid approach that combines fast local storage media with network storage providers. In addition, the MVCC (multiversion concurrency control) principle is applied when objects are accessed. This principle enables concurrent access without blocking. (Edlich, Friedland, Hampe, & Brauer, 2010)
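
The idea behind MVCC can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not the GraphFS implementation: the class VersionedObject and its methods are hypothetical names, and a real system would also handle garbage collection and write conflicts.

```python
import threading

class VersionedObject:
    """Toy multiversion store: writers append revisions, readers never block."""

    def __init__(self, initial):
        self._revisions = [initial]          # revision 0
        self._write_lock = threading.Lock()  # serializes writers only

    def snapshot(self):
        """Return the number of the latest committed revision."""
        return len(self._revisions) - 1

    def read(self, revision):
        """Read the value as of a given revision; takes no lock."""
        return self._revisions[revision]

    def write(self, value):
        """Commit a new revision and return its number."""
        with self._write_lock:
            self._revisions.append(value)
            return len(self._revisions) - 1

obj = VersionedObject({"Age": 21})
rev = obj.snapshot()             # a reader pins revision 0
obj.write({"Age": 22})           # a writer commits revision 1 in the meantime
print(obj.read(rev))             # the reader still sees its own snapshot
print(obj.read(obj.snapshot()))  # a new reader sees the latest revision
```

Because old revisions stay readable, a reader never has to wait for a writer, which is the non-blocking behavior described above.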

The purpose of the GraphDB is to manage property hypergraphs. This includes type, node, index and plugin administration. GraphDB type administration (create, alter, drop) manages node types hierarchically in an ontology. Both the definition of abstract types and the inheritance of attributes are possible. Node administration governs the manipulation (insert, update, delete, …) of nodes within the property hypergraph.

The GraphDB also manages indexes and the manipulation of these (insert, update, rebuilding,

reorganization). The GraphDB can be extended modularly in many areas (aggregates, functions,

settings, …). This is handled by an administration system that takes care of coordinating the

different components. The GraphDB's main function is to provide logic for presenting and

manipulating the property hypergraph. This includes implementing projection and selection as well

as manipulating nodes and dynamically extending the type schema. An integrated subgraph

matching engine is necessary to enable the selection of nodes. This engine generates subgraphs

using Boolean expressions. In addition to logic and administration, the graph query language can

also be used to make ad hoc queries to the database. (Edlich, Friedland, Hampe, & Brauer, 2010)

The GraphDS module combines the above-mentioned components into a whole. The GraphDS is

an interface for user applications and offers an entire spectrum of access options such as REST,

.Net, Java and WebDAV. (Edlich, Friedland, Hampe, & Brauer, 2010)

1.2. Data model

Reference was made to the property hypergraph model in the last section. In this section it will be

discussed in further detail.

The property hypergraph

The property hypergraph is an extension of the property graph data model, which has become established in the past few years. A property graph is a directed multi-relational graph. The nodes and edges of this graph consist of objects and the semi-structured properties embedded in them. These properties are key/value pairs whose keys and values can be specified by the relevant node and edge type. In this model, an edge is a special case of a property value.

The extension to a hypergraph is based on the use of hyperedges, which act as carriers of additional information in the context of the edges.

Illustration 2 Diagram of a property hypergraph

Illustration 2 gives an example of the structure of a property hypergraph. The user nodes Alice, Bob and Carol are shown. Each has been assigned the property "Age" in addition to a unique ID. Alice has also been assigned the attribute "Friends," which is realized as a hyperedge to a number of other nodes of the "User" vertex type. This shows that it is possible to store information in the hyperedge as well as in the individual edges it contains.
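
The structure described for Illustration 2 could be sketched in Python as follows. This is an illustrative data model only: the classes Vertex and HyperEdge, the concrete ages and the properties "Since" and "Weight" are assumptions made for the example and do not come from the sones API or the illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Vertex:
    uuid: str
    properties: dict = field(default_factory=dict)   # basic properties, e.g. Age
    edges: dict = field(default_factory=dict)        # edge properties by name

@dataclass
class HyperEdge:
    """One named edge property that points to several target vertices."""
    properties: dict = field(default_factory=dict)   # information on the hyperedge itself
    targets: dict = field(default_factory=dict)      # target uuid -> per-edge properties

alice = Vertex("Alice", {"Age": 21})
bob = Vertex("Bob", {"Age": 23})
carol = Vertex("Carol", {"Age": 25})

# Alice's "Friends" attribute connects her to Bob and Carol at once.
alice.edges["Friends"] = HyperEdge(
    properties={"Since": 2010},
    targets={bob.uuid: {"Weight": 0.8}, carol.uuid: {"Weight": 0.3}},
)

print(sorted(alice.edges["Friends"].targets))
```

Information can thus be attached at two levels: to the "Friends" hyperedge as a whole and to each individual connection inside it.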

Node definition (vertex type)

The set of properties defined for a node is specified via the vertex type. Just like the node instances in the sones graph database, the vertex type is an object in the GraphFS and therefore contains metadata such as ID, name and position. In addition, the defined indexes and a reference to the super vertex type are stored in the vertex type. The latter reference realizes the ontology, which allows properties to be inherited. The vertex type also defines the structured set of node properties, for which constraints such as uniqueness or mandatory presence can be specified. The properties of the sones graph database can generally be divided into two categories.

1. Basic properties…

are properties whose values are instances of basic data types such as string, integer or Boolean. Collections (lists, sets) of basic values can also be stored as a property.

2. Edge properties…

are properties that connect a sones graph database node with a number of other nodes. These realize the hyperedges of the property hypergraph mentioned above.
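
To make the role of a vertex type more concrete, the following Python sketch models attribute inheritance from a super vertex type together with mandatory and uniqueness constraints. The class VertexType and its methods are hypothetical and only mirror the description above; they are not the sones type system.

```python
class VertexType:
    """Toy vertex type: named attributes, optional parent type, simple constraints."""

    def __init__(self, name, attributes=None, parent=None, mandatory=(), unique=()):
        self.name = name
        self.parent = parent
        self.own_attributes = dict(attributes or {})   # attribute name -> Python type
        self.mandatory = set(mandatory)
        self.unique = set(unique)
        self._seen = {attr: set() for attr in self.unique}

    def attributes(self):
        """Own attributes plus everything inherited from the super vertex type."""
        inherited = self.parent.attributes() if self.parent else {}
        return {**inherited, **self.own_attributes}

    def validate(self, values):
        """Check mandatory attributes, declared types and uniqueness for a new node."""
        schema = self.attributes()
        for attr in self.mandatory:
            if attr not in values:
                raise ValueError(f"missing mandatory attribute {attr!r}")
        for attr, value in values.items():
            # Attributes outside the schema are allowed (semi-structured data).
            if attr in schema and not isinstance(value, schema[attr]):
                raise TypeError(f"{attr!r} must be {schema[attr].__name__}")
        for attr in self.unique:
            if attr in values and values[attr] in self._seen[attr]:
                raise ValueError(f"duplicate value for unique attribute {attr!r}")
        for attr in self.unique:
            if attr in values:
                self._seen[attr].add(values[attr])
        return True

entity = VertexType("Entity", {"Name": str})
user = VertexType("User", {"Age": int}, parent=entity,
                  mandatory=["Name"], unique=["Name"])

print(sorted(user.attributes()))             # Name is inherited from Entity
user.validate({"Name": "Alice", "Age": 21})
```

In this sketch the constraints apply only to the type on which they are declared; whether constraints are inherited in the sones graph database is not stated in this paper.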

Node instance (vertex)

Node instances act as containers for their properties. They consist of multiple streams, which are shown and briefly explained in Illustration 3. Each stream holds a dynamic number of editions and revisions. This enables both semantic (editions) and temporal (revisions) information management. Individual streams can also be accessed separately.

Illustration 3 Schematic diagram of a node instance

2. USPs

Now that the components of the sones graph database have been described and the basic data model has been illustrated, this section discusses the unique selling points (USPs) of the technology compared with state-of-the-art relational databases.

2.1. Index-free adjacency

Graph databases like the sones GraphDB follow the paradigm of index-free adjacency. This means that it is not necessary to maintain a global index for relationships between nodes/entities. The linked objects contain direct references to their neighboring nodes. There is no need to search a relation table in which only a tiny fraction of the entries, down to a thousandth of a percent, is relevant. This allows the hypergraph to scale well, since no extensive relation tables have to be managed.
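
The difference between a global relation table and index-free adjacency can be sketched in Python. The sample data, the Node class and both helper functions are assumptions for illustration; a relational system would normally use an index instead of the full scan shown here, but that index is again a global structure that has to be maintained.

```python
# Relational style: one global relation table that every lookup has to search.
relation_table = [("Alice", "Bob"), ("Bob", "Carol"), ("Alice", "Carol"), ("Dave", "Alice")]

def friends_via_table(user):
    # Full scan: the cost grows with the total number of relationships stored.
    return [target for source, target in relation_table if source == user]

# Index-free adjacency: each node holds direct references to its neighbors.
class Node:
    def __init__(self, name):
        self.name = name
        self.friends = []   # direct references to adjacent Node objects

alice, bob, carol, dave = (Node(n) for n in ("Alice", "Bob", "Carol", "Dave"))
alice.friends = [bob, carol]
bob.friends = [carol]
dave.friends = [alice]

def friends_via_adjacency(node):
    # The cost depends only on the number of neighbors of this one node.
    return [neighbor.name for neighbor in node.friends]

print(friends_via_table("Alice"))
print(friends_via_adjacency(alice))
```

Both calls return the same neighbors, but only the first has to look at relationships that belong to other nodes.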

2.2. Handling semi-structured data

In the 1970s and 1980s, structured data administration was the dominant paradigm. To this day, this approach is implemented in almost all state-of-the-art RDBMSs. The semi-structured data approach was established in the mid-1990s. It is based on the observation that in many application areas the information is too complex to fit a rigid table structure. Examples of such problem domains are bioinformatics and the Semantic Web. The sones graph database is able to store and retrieve unstructured properties in any node of the graph. It is also intended that unstructured data can be converted into structured data and vice versa.

2.3. Dynamic type extension

Another advantage is that the structured data in nodes and edges can be extended dynamically and efficiently at runtime. Properties can be added to or removed from vertex types quickly, regardless of the number of nodes. By contrast, changing a relational data schema after the fact is very time- and resource-intensive.

2.4. Graph query language

The sones GraphQL is a user-friendly domain-specific language and can be thought of as an "SQL for graphs." The similarity to SQL is intentional and makes the transition much easier for developers and consultants. It enables queries on the property hypergraph of the sones graph database and can be extended dynamically at runtime using plugins such as functions or aggregates. Where an SQL query on an RDBMS grows very long (see complex JOINs), the GraphQL equivalent is usually much shorter and more intuitive. Here is an example of this type of query:

“FROM User SELECT Enemies.Enemies.Name WHERE Name = 'JohnSmith'”

This query returns the names of all enemies of the enemies of the user "JohnSmith." The same query on an RDBMS could look like this:

“SELECT u_end.Name
FROM User u_start
JOIN Enemies e1 ON u_start.user_id = e1.user_id
JOIN Enemies e2 ON e1.enemy_id = e2.user_id
JOIN User u_end ON e2.enemy_id = u_end.user_id
WHERE u_start.Name = 'JohnSmith'”

In addition to the GraphQL, other DSLs can also be used with the sones graph database, since the query language and the database logic are separate from one another.
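
The contrast between a relational two-hop join and direct traversal can be tried out with Python's built-in sqlite3 module. This is a sketch under stated assumptions: the table layout (User, Enemies with user_id and enemy_id) follows the SQL example in this section, while the sample rows and the dictionary-based adjacency structure are invented for illustration and do not use the sones GraphQL engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE User (user_id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Enemies (user_id INTEGER, enemy_id INTEGER);
    INSERT INTO User VALUES (1, 'JohnSmith'), (2, 'Ann'), (3, 'Ben'), (4, 'Cid');
    INSERT INTO Enemies VALUES (1, 2), (2, 3), (2, 4);
""")

# Relational version: every hop along the Enemies relation costs one JOIN.
rows = conn.execute("""
    SELECT u_end.Name
    FROM User u_start
    JOIN Enemies e1 ON u_start.user_id = e1.user_id
    JOIN Enemies e2 ON e1.enemy_id = e2.user_id
    JOIN User u_end ON e2.enemy_id = u_end.user_id
    WHERE u_start.Name = 'JohnSmith'
    ORDER BY u_end.Name
""").fetchall()
sql_names = [name for (name,) in rows]

# Graph version: the same two hops as direct traversal over adjacency lists.
enemies = {"JohnSmith": ["Ann"], "Ann": ["Ben", "Cid"], "Ben": [], "Cid": []}
graph_names = sorted(n2 for n1 in enemies["JohnSmith"] for n2 in enemies[n1])

print(sql_names, graph_names)
```

Adding a third hop would require another pair of JOINs in SQL, whereas the traversal only gains one more nested step.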

2.5. Solving the object-relational depiction problem

Mapping the objects of an object-oriented programming language to an RDBMS calls for what is known as an object-relational mapper (see Illustration 4). This is because the OOP and RDBMS paradigms are conceptually fundamentally different. Objects encapsulate their state behind an interface and have a unique identity.

Illustration 4 Object-relational mapper

RDBMSs, by contrast, are based on the mathematical concept of relational algebra. In the 1990s, this conflict became known as the object-relational depiction problem (more commonly called the object-relational impedance mismatch).

Illustration 5 No O/R mapper

The sones graph database solves this dilemma by implementing an object-oriented concept (see

the highly simplified diagram in Illustration 5). This results in better integration into object-oriented

languages, since no O/R mapper is required.

2.6. HTTP/REST API

In addition to a number of other interfaces (e.g., Java, C#, WebShell, WebDAV), the sones graph database also offers a REST API. This enables straightforward interaction with state-of-the-art web technologies. A single REST request is all that is required to execute CRUD operations directly on the database.

2.7. Traverser API

Another important feature of the sones graph database is the Traverser API. This feature makes it possible to analyze data locally. Starting from a set of nodes, neighboring nodes can be searched recursively (breadth-first or depth-first). With this method, local rankings, recommendations or (path) searches, for example, can be realized. The result of a traversal can be a set of paths, a set of nodes or an aggregate result.

Realizing this with an RDBMS is highly resource-intensive, since each step to a neighboring node has to be expressed as a JOIN. In contrast, the sones property hypergraph concept allows direct access to neighboring nodes by resolving the edge attribute.
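
A breadth-first traversal of the kind described here can be sketched in a few lines of Python. The function traverse_breadth_first, its parameters and the sample graph are assumptions for illustration and do not reflect the actual Traverser API.

```python
from collections import deque

def traverse_breadth_first(start_nodes, neighbors, max_depth):
    """Yield (node, path) pairs reachable from the start nodes within max_depth hops."""
    visited = set(start_nodes)
    queue = deque((node, [node]) for node in start_nodes)
    while queue:
        node, path = queue.popleft()
        yield node, path
        if len(path) - 1 >= max_depth:
            continue                      # depth limit reached, do not expand further
        for nxt in neighbors.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, path + [nxt]))

graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"]}
for node, path in traverse_breadth_first(["A"], graph, max_depth=2):
    print(node, path)
```

Each step follows a direct reference to a neighbor, so no JOIN is needed; the yielded paths could feed a ranking or an aggregate, as mentioned above.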

3. Technical case study

This section illustrates a technical case study on implementing a keyword recommendation

engine. The purpose of this engine is to generate relevant keywords based on a click-path

analysis. 200,000 paths were analyzed during the study. Together, these paths contained around

5,000 keywords. An individual path contained around 10-30 keywords.

Entering data

Before a keyword can be recommended, the data set needs to be loaded into the sones graph database. This process is divided into two steps, which are explained below:

1. Generating vertex types

The first step is to define a schema for the data set. This includes creating the vertex types Keyword and Path in statements a and b. The latter contains a hyperedge to nodes of the vertex type Keyword. In c, what is known as a backward edge attribute is added to the Keyword type created in a. This makes it possible to select and project the implicit incoming edges of Keyword instances. The new property is named UsedInPaths and specifies which incoming edges are used. As a result, every node referenced by a Keywords hyperedge has an explicit incoming edge from the corresponding path.

a. “CREATE VERTEX Keyword”

b. “CREATE VERTEX Path ATTRIBUTES (SET<Keyword> Keywords)”

c. “CHANGE VERTEX Keyword
   ADD BACKWARDEDGES (Path.Keywords UsedInPaths)”

2. Generating nodes

Once the vertex types have been created, the data set itself can be loaded.

a. Generating keywords

First, the keywords are loaded into the sones graph database. Only the unique node IDs (UUID) are set during this process. Other properties are not necessary.

“INSERT INTO Keyword VALUES (UUID='keyWordID1')”

b. Generating paths

Once the keywords have been inserted, the path nodes can be generated. As with the keyword instances, the UUID is set. The Keywords hyperedge is also filled in by referring to the keywords' unique IDs. As soon as this step has been completed, it is possible to traverse from a path node to its adjacent keyword nodes. Each of these has an explicit incoming edge, which enables "backward" movement.

“INSERT INTO Path VALUES (UUID = 'Path1',
   Keywords = SETOF(
      UUID='keyWordID1',
      UUID='keyWordID2',
      UUID='keyWordIDn'))”
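
With 200,000 paths, the INSERT statements would in practice be generated by a program. The following Python sketch builds statement strings in the syntax shown above. The helper functions and the sample click path are hypothetical, the IDs are assumed to contain no quote characters, and how the statements are sent to the database is deliberately left out.

```python
def insert_keyword(keyword_id):
    """Build the INSERT statement for one Keyword node."""
    return f"INSERT INTO Keyword VALUES (UUID='{keyword_id}')"

def insert_path(path_id, keyword_ids):
    """Build the INSERT statement for one Path node and its Keywords hyperedge."""
    members = ", ".join(f"UUID='{k}'" for k in keyword_ids)
    return f"INSERT INTO Path VALUES (UUID='{path_id}', Keywords = SETOF({members}))"

# Hypothetical input: path ID -> keywords clicked along that path.
click_paths = {"Path1": ["keyWordID1", "keyWordID2"]}

statements = []
# Keywords first, so that every path can refer to existing keyword nodes.
for kw in sorted({k for kws in click_paths.values() for k in kws}):
    statements.append(insert_keyword(kw))
for path_id, kws in click_paths.items():
    statements.append(insert_path(path_id, kws))

for statement in statements:
    print(statement)
```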

Generating recommendations

Once the data set for the recommendations has been created, the following query is used to calculate the top 10 potentially interesting keywords.

“FROM Keyword
SELECT FindMatching(UsedInPaths)
WHERE UUID IN ['keyWordID1', 'keyWordID2', 'keyWordIDn']”

The query consists of three parts:

1. FROM Keyword

Selects the vertex type Keyword, which serves as the reference type for the subsequent projection and selection.

2. SELECT FindMatching(UsedInPaths)

The actual recommendation is generated in this step. This is done with the help of the FindMatching aggregate function, which works on the UsedInPaths hyperedge of the keywords selected in part 3. As mentioned above, this edge provides the path instances that contain the selected keywords. A frequency analysis of the keywords in these paths is conducted in order to make a recommendation. The order of the keyword nodes selected in part 3 is also important, since their influence on the result decreases as the index increases ('keyWordID1' is more important than 'keyWordID2').

3. WHERE UUID IN ['keyWordID1', 'keyWordID2', 'keyWordIDn']

The third and final part of the query provides the relevant keyword instances for the projection in part 2.
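
The frequency analysis behind such a recommendation could look roughly like the following Python sketch. The internals of FindMatching are not described in this paper, so everything here is an assumption: the function recommend, the weight 1/(index + 1) used to let earlier keywords count more, and the sample paths are illustrative only.

```python
from collections import Counter

def recommend(paths, selected, top_n=10):
    """Rank keywords that co-occur in paths with the selected keywords."""
    # Backward edge: keyword -> paths it is used in (UsedInPaths).
    used_in_paths = {}
    for path_id, keywords in paths.items():
        for kw in keywords:
            used_in_paths.setdefault(kw, []).append(path_id)

    scores = Counter()
    for index, kw in enumerate(selected):
        weight = 1.0 / (index + 1)        # assumed decay: earlier keywords count more
        for path_id in used_in_paths.get(kw, []):
            for other in paths[path_id]:
                if other not in selected:
                    scores[other] += weight
    return [kw for kw, _ in scores.most_common(top_n)]

paths = {
    "Path1": {"k1", "k2", "k3"},
    "Path2": {"k1", "k3", "k4"},
    "Path3": {"k2", "k4", "k5"},
}
print(recommend(paths, ["k1", "k2"]))
```

In this example k3 ranks first because it shares two paths with the most important keyword k1 and one path with k2.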

Benchmark

The measurements below were taken with the open source edition of the sones graph database, using the sones embedded C# API.

                            .NET                          Mono
Number of paths             10,000   100,000   200,000    10,000   100,000   200,000
Import duration (in sec)    220      1,840     2,940      140      1,560     3,300
Recommendations / sec       2,827    2,539     2,339      2,022    2,030     1,979

Table 1 Recommendation engine performance

Table 1 shows the performance on a DELL Latitude E6400 notebook (Core2Duo 2.60GHz, 4.00GB RAM) using the .NET framework and its Linux equivalent, Mono. The second line ("Number of paths") shows the number of paths that were analyzed as a basis for the engine. The third line gives the time needed to load that number of paths into the sones graph database. The last line provides the number of recommendations per second.
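
For easier comparison, the import durations in Table 1 can be converted into import rates (paths per second). The following Python snippet only restates the figures from the table; the variable names are chosen for this example.

```python
paths = [10_000, 100_000, 200_000]
import_seconds = {".NET": [220, 1_840, 2_940], "Mono": [140, 1_560, 3_300]}

# Import rate = number of paths / import duration, rounded to one decimal place.
import_rates = {
    runtime: [round(n / s, 1) for n, s in zip(paths, seconds)]
    for runtime, seconds in import_seconds.items()
}

for runtime, rates in import_rates.items():
    print(runtime, rates)
```

On these figures the import rate rises with the data volume under .NET (about 45 to 68 paths per second) and falls under Mono (about 71 to 61 paths per second).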

Bibliography

Edlich, S., Friedland, A., Hampe, J., & Brauer, B. (2010). NoSQL: Einstieg in die Welt nichtrelationaler Web 2.0 Datenbanken (NoSQL: Introduction to the world of non-relational Web 2.0 databases). Hanser Fachbuchverlag.

Glossary

Backward edge: Backward edges are incoming edges on nodes.

CRUD: CRUD stands for create, read, update, delete, i.e., the fundamental database operations.

DSL: DSL is an abbreviation for domain-specific language, a formal language developed for a specific problem domain.

GraphQL: Graph query language (GraphQL) is a query language developed by sones that can be thought of as an "SQL for graphs."

GraphDB: The GraphDB (graph database) is a component of the sones graph database technology and handles administration of the property hypergraphs.

GraphDS: The GraphDS (graph data storage) combines the GraphDB and the GraphFS into a whole. The GraphDS is an interface for user applications and offers an entire spectrum of access options such as REST, .Net, Java and WebDAV.

GraphFS: The GraphFS (graph file system) is a component of the sones graph database technology and provides abstract object management.

Hyperedge: A hyperedge is able to connect a node to more than one other node.

JOIN: A JOIN is the combined execution of a Cartesian product and a selection.

Edge: An edge is a structural element in any graph. It specifies the connection of nodes. A special case is what is referred to as a hyperedge.

Edge type: The set of properties defined for an edge is specified via the edge type.

Nodes: Nodes are elementary components of a graph and act as containers for properties in the sones graph database.

Node type: The set of properties defined for a node is specified via the node type.

MVCC: Multiversion concurrency control (MVCC) refers to a technology that is used to avoid conflicts between read-only and write access to the same object. In the context of a database, MVCC makes simultaneous access possible.

Ontology: An ontology is a formally ordered set of concepts and their relationships to one another.

OOP: OOP stands for object-oriented programming and denotes a programming paradigm.

Property graph: A property graph is a directed multi-relational graph. The nodes and edges of this graph consist of objects and the semi-structured properties embedded in them. These properties are key/value pairs whose keys and values can be specified by the relevant node and edge schema. In this model, an edge is a special case of a property value.

Property hypergraph: The extension of a property graph to a hypergraph is based on the use of hyperedges, which act as carriers of additional information in the context of the edges.

RDBMS: A relational database management system (RDBMS) manages a relational database.

REST: Representational state transfer (REST) refers to a software architecture style for distributed hypermedia information systems such as the World Wide Web. It suggests that each resource be addressed with its own unique identifier.

SQL: Structured query language (SQL) is a query language used to define, access and manipulate data in relational databases.

Traverser: A traverser enables local data analysis. Starting from a set of nodes, neighboring nodes can be searched recursively (breadth-first or depth-first). With this method, local rankings, recommendations or (path) searches, for example, can be realized.

WebShell: WebShell is a console displayed in the browser, which enables interaction with the sones graph database.

sones GmbH
November 2010

sones GmbH
Eugen-Richter-Str. 44
99085 Erfurt
Germany
Tel.: +49(0) 361 - 30 26 25 0
Fax.: +49(0) 361 - 244 500 8

© 2010 sones GmbH. All rights reserved. sones and its logos are registered trademarks of sones GmbH. All other names of products and services are trademarks of the associated company. The information contained in this publication is non-binding and is intended for informational purposes only. Products may vary depending on the country. Information contained in this publication may be modified without prior notice. The information contained herein has been provided by sones and is intended for informational purposes only. sones does not assume any liability or guarantee for errors or inconsistencies in this publication. No further liability may ensue from information contained in this publication.

