Date post: | 20-Feb-2017 |
Category: |
Software |
Upload: | roberto-franchini |
Where are your vertexes and what are they talking about?
Roberto Franchini
whoami(1)
More than 15 years of experience, proud to be a programmer
Member of OrientDB team, tech lead for full-text & spatial indexes, JDBC driver and Docker images
Wrote software for NLP and opinion mining on fast data/big data
JUG-Torino co-lead
#orientdb at #jugmi
Meet OrientDB
The First Ever Multi-Model Database Combining Flexibility of Documents with Connectedness of Graphs
The Five Imperatives
1. Availability and Integrity
2. Scalability and Performance
3. Relationships and Connections
4. Data Model Complexity
5. Agility and Ease of Use
Availability and Integrity
• Atomic, Consistent, Isolated and Durable (ACID) multi-statement transactions
[Diagram: Multi-master Replication between two Master Nodes]
Scalability and Performance
• Multi-Master Replication, Sharding and Auto-Discovery to Simplify Ops
[Diagram: two Master Nodes and an Auto-Discovered Node]
Complex Relationships
No costs to traverse relationships:
• Recommendation engines
• Master Data Management
• Information Clustering
• Social Applications
• Spatial Apps
[Diagram: graph linking John to genres (Thriller, Comedy), movies (Pulp Fiction, Mr Bean), theaters (Theater A, Theater B, Theater C) and cities (NYC, San José) through edges such as "Lives in" and "Likes"]
Flexible Data Model
{
  "@rid": "12:382",
  "@class": "people",
  "first": "John",
  "last": "Power",
  "details": {
    "city": "London",
    "tags": "millennial"
  }
}
[Diagram: John connected to Comedy by a Likes edge]
General purpose solution:
• Schema-less
• Nested documents
• Rich indexing and querying
• Developer friendly
Agility and Ease of Use
• Flexible data model supports rapid iterations
• Schema-hybrid or schema-full modes guarantee data quality
• Graph model allows natural modeling of complex relationships
{
  "@rid": "12:382",
  "@class": "people",
  "first": "John",
  "last": "Power",
  "details": {
    "city": "London",
    "tags": "millennial"
  }
}
Developers are more productive and programming is easier
API & Standards
• Support for TinkerPop standard for Graph DB: Gremlin language and Blueprints API
• SQL + extensions for graphs
• JDBC driver to connect any BI tool
• HTTP/JSON support
• Drivers in Java, Node.js, Python, PHP, .NET, Perl, C/C++ and more
A multi-model operational database can be the system of records for modern enterprises and the database of choice for ISV/OEMs
Graphs
[Diagram: Snow Patrol (Band), Luca (Account), Indie (Genre), 123, 1st Street Austin, TX (Location), Jill (Account)]
{
  "@rid": "12:382",
  "@class": "Customer",
  "name": "Jill",
  "surname": "Raggio",
  "phone": "+39 33123212",
  "details": {
    "city": "London",
    "tags": "millennial"
  }
}
Schema-less structures
Object Oriented
Key-Value pairs
Geo-Spatial
Full-Text
[Diagram: Multi-model at the intersection of Graph, Document, Object, Key/Value, Full-Text and Spatial]
Multi-Model represents the intersection of multiple models in just one product
Graph databases
Just Data
[Diagram: unconnected records: Order #134 (Order), John (Provider), Commodore Amiga 1200 (Product), Frank (Customer), Monitor 40” (Product), Mouse (Product), Bruno (Provider)]
Data by itself has little value; it is the relationships between data that give it incredible value
Data and relationships
[Diagram: the same records (Order #134, John, Commodore Amiga 1200, Frank, Monitor 40”, Mouse, Bruno) now connected by (Sells), (Has) and (Makes) edges]
Every developer knows the Relational Model, but who knows the Graph one?
Back to school: Graph Theory crash course
Basic Graph
[Diagram: Roberto connected to Milan by a Visited edge]
Vertices and Edges can have properties
Edges are directed
* https://github.com/tinkerpop/blueprints/wiki/Property-Graph-Model
Property Graph Model*
[Diagram: vertex Roberto (company: OrientDB) connected to vertex Milan (country: Italy) by edge Visited (on: 2016)]
An Edge connects only 2 vertices
Use multiple edges to represent 1-N and N-M relationships
[Diagram: an additional edge, Worked (on: 2016)]
1-N and N-M relationships
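The property graph rules above (vertices and edges carry properties, edges are directed, one edge joins exactly two vertices, several edges express 1-N and N-M) can be sketched in a few lines of Python. This is only an illustrative in-memory model, not the OrientDB or Blueprints API: the `Vertex`, `Edge` and `Graph` names are invented for the example, and the vertex Turin is a hypothetical addition to show a 1-N relationship.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Vertex:
    name: str
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    label: str
    out: Vertex  # vertex the edge leaves from
    in_: Vertex  # vertex the edge points to
    properties: Dict[str, Any] = field(default_factory=dict)


class Graph:
    """Hypothetical toy graph: a plain list of directed, labelled edges."""

    def __init__(self) -> None:
        self.edges: List[Edge] = []

    def add_edge(self, label: str, out: Vertex, in_: Vertex, **properties: Any) -> Edge:
        # One edge always connects exactly two vertices.
        edge = Edge(label, out, in_, dict(properties))
        self.edges.append(edge)
        return edge

    def out_neighbours(self, vertex: Vertex, label: str) -> List[Vertex]:
        # Follow only edges that start at `vertex` (direction matters).
        return [e.in_ for e in self.edges if e.out is vertex and e.label == label]


roberto = Vertex("Roberto", {"company": "OrientDB"})
milan = Vertex("Milan", {"country": "Italy"})
turin = Vertex("Turin", {"country": "Italy"})  # hypothetical extra vertex

graph = Graph()
graph.add_edge("Visited", roberto, milan, on=2016)
graph.add_edge("Visited", roberto, turin, on=2016)  # 1-N: a second Visited edge
graph.add_edge("Worked", roberto, milan, on=2016)   # a different relationship

visited = [v.name for v in graph.out_neighbours(roberto, "Visited")]
print(visited)
```

Because edges are directed, asking for Milan's outgoing Visited edges returns nothing, while Roberto's returns both cities.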
[Diagram: vertex Rob (#13:55) has out = #22:11; vertex Milan (#15:99) has in = #22:11; edge Visited (#22:11, on: 2016) has out = #13:55 and in = #15:99]
Connections use persistent pointers
Each element in the Graph has its own immutable Record ID
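To make the pointer idea concrete, here is a minimal sketch that reuses the Record IDs from the diagram (#13:55, #15:99, #22:11). It assumes a plain Python dictionary as a stand-in for the storage layer, which is not how OrientDB stores records; the helper names `out_vertices` and `in_vertices` are invented for the example. The point is that a traversal only follows stored Record IDs and never searches for matching values.

```python
# Toy record store keyed by Record ID (a stand-in for the real storage).
records = {
    "#13:55": {"@class": "V", "name": "Rob", "out": ["#22:11"], "in": []},
    "#15:99": {"@class": "V", "name": "Milan", "out": [], "in": ["#22:11"]},
    "#22:11": {"@class": "Visited", "on": 2016, "out": "#13:55", "in": "#15:99"},
}


def out_vertices(rid):
    """Follow each outgoing edge pointer, then the edge's `in` pointer."""
    return [records[records[edge_rid]["in"]] for edge_rid in records[rid]["out"]]


def in_vertices(rid):
    """Follow each incoming edge pointer, then the edge's `out` pointer."""
    return [records[records[edge_rid]["out"]] for edge_rid in records[rid]["in"]]


visited_by_rob = [v["name"] for v in out_vertices("#13:55")]
visitors_of_milan = [v["name"] for v in in_vertices("#15:99")]
print(visited_by_rob, visitors_of_milan)
```

Each hop is two direct lookups by Record ID, so the cost of a traversal depends on the number of edges followed rather than on the size of the dataset.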
Congrats! This is your diploma in «Graph Theory»
Searching for something
Vertices and Edges are Documents
{
  "@rid": "12:382",
  "@class": "Customer",
  "name": "Frank",
  "surname": "Raggio",
  "phone": "+39 33123212",
  "details": {
    "city": "London",
    "tags": "millennial"
  }
}
[Diagram: Frank connected to Order by a Makes edge]
General purpose solution:
• JSON
• Schema-less
• Schema-full
• Schema-hybrid
• Nested documents
• Rich indexing and querying
• Developer friendly
Schema
• Property types
– STRING, DATE, DATETIME, BYTE, BOOLEAN, SHORT, BINARY
• Constraints
– MANDATORY, NOTNULL, MIN, MAX, READONLY, REGEX
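As a rough illustration of what such constraints mean for a document, the sketch below checks a dictionary against a list of property definitions. It is not OrientDB's validation code: `PropertyDef` and `validate` are hypothetical names, MIN and MAX are applied to numeric values only, and READONLY is left out. Fields that are not declared are accepted, which mirrors the schema-hybrid mode.

```python
import re
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class PropertyDef:
    """Hypothetical property definition with a subset of the constraints."""
    name: str
    type_: type
    mandatory: bool = False
    not_null: bool = False
    min_: Optional[float] = None
    max_: Optional[float] = None
    regex: Optional[str] = None


def validate(document: Dict[str, Any], schema: List[PropertyDef]) -> List[str]:
    errors = []
    for prop in schema:
        if prop.name not in document:
            if prop.mandatory:
                errors.append(f"{prop.name}: missing mandatory property")
            continue
        value = document[prop.name]
        if value is None:
            if prop.not_null:
                errors.append(f"{prop.name}: null not allowed")
            continue
        if not isinstance(value, prop.type_):
            errors.append(f"{prop.name}: expected {prop.type_.__name__}")
            continue
        if prop.min_ is not None and value < prop.min_:
            errors.append(f"{prop.name}: below MIN {prop.min_}")
        if prop.max_ is not None and value > prop.max_:
            errors.append(f"{prop.name}: above MAX {prop.max_}")
        if prop.regex is not None and isinstance(value, str):
            if not re.fullmatch(prop.regex, value):
                errors.append(f"{prop.name}: does not match REGEX")
    return errors


# Assumed example schema for a Product-like class.
schema = [
    PropertyDef("name", str, mandatory=True, not_null=True),
    PropertyDef("qty", int, min_=0),
]

good = validate({"name": "Mouse", "qty": 3, "color": "black"}, schema)
bad = validate({"qty": -1}, schema)
print(good, bad)
```

The undeclared `color` field passes untouched, while the second document fails both the MANDATORY and the MIN check.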
Schema
• Define indexes on single property or multiple properties
– UNIQUE
– NOT UNIQUE
– FULL TEXT (Lucene)
– SPATIAL (Lucene)
Polymorphic domain schema
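[Diagram: class schema]
Vertex classes:
• Actor (name: string, surname: string), inherited by Customer and Provider
• Product (name: string, qty: int)
• Order (number: int, date: datetime)
Edge classes:
• Sells (price: decimal)
• Has (price: decimal)
• Makes
Legend: V Vertex, Edge, Inherits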
Who
A Vertex is just a Document
We can define indexes on fields
CREATE CLASS User EXTENDS V
CREATE PROPERTY User.userId LONG
CREATE INDEX User.userId ON User(userId) UNIQUE
SELECT FROM User WHERE userId = 1024
What?
OK, but my Users can describe themselves with free text. How can I find users describing themselves as programmers?
CREATE PROPERTY User.description STRING
CREATE INDEX User.description ON User(description) FULLTEXT ENGINE LUCENE
SELECT FROM User WHERE description LUCENE "programmer"
Where?
Users write articles that carry geolocation data. I want all the articles posted from the Milan area.
CREATE CLASS Article EXTENDS V
CREATE PROPERTY Article.geo EMBEDDED OPoint
CREATE INDEX Article.geo ON Article (geo) SPATIAL ENGINE LUCENE
SELECT * FROM Article WHERE ST_WITHIN(geo,
ST_Buffer(ST_GeomFromText('POINT(8.959091 46.005473)'), 1)) = true
Twitter Graph
[Diagram: Twitter graph. Vertices: User, Tweet, Source, Hashtag. Edges: Posts, Follows, Retweets, ReplyTo, Using, Tags]
User schema
CREATE CLASS User EXTENDS V
CREATE PROPERTY User.userId LONG
CREATE INDEX User.userId ON User(userId) UNIQUE
CREATE PROPERTY User.description STRING
CREATE PROPERTY User.screenName STRING
CREATE PROPERTY User.lang STRING
CREATE PROPERTY User.location STRING
Tweet schema
CREATE CLASS Tweet EXTENDS V
CREATE PROPERTY Tweet.tweetId LONG
CREATE INDEX Tweet.tweetId ON Tweet(tweetId) UNIQUE
CREATE PROPERTY Tweet.text STRING
CREATE PROPERTY Tweet.lang STRING
CREATE PROPERTY Tweet.location STRING
CREATE PROPERTY Tweet.createdAt DATETIME
CREATE PROPERTY Tweet.isRetweeted BOOLEAN
CREATE PROPERTY Tweet.isRetweet BOOLEAN
Indexes
CREATE INDEX User.description ON User(description) FULLTEXT ENGINE LUCENE
CREATE INDEX Tweet.text ON Tweet(text) FULLTEXT ENGINE LUCENE
CREATE PROPERTY Tweet.geo EMBEDDED OPoint
CREATE INDEX Tweet.geo ON Tweet (geo) SPATIAL ENGINE LUCENE
Relations
CREATE CLASS Posts EXTENDS E
CREATE CLASS Hashtag EXTENDS V
CREATE PROPERTY Hashtag.label STRING
CREATE CLASS Tags EXTENDS E
CREATE CLASS Source EXTENDS V
CREATE PROPERTY Source.name STRING
CREATE CLASS Using EXTENDS E
CREATE CLASS Follows EXTENDS E
CREATE CLASS Retweets EXTENDS E
CREATE CLASS ReplyTo EXTENDS E
CREATE CLASS Mentions EXTENDS E
It’s demo time
docker run --name jugmi16 -d \
-v ~/local/orientdb/jugmi16/config:/orientdb/config \
-v ~/local/orientdb/jugmi16/databases:/orientdb/databases \
-p 2424:2424 -p 2480:2480 \
-e ORIENTDB_ROOT_PASSWORD=rootpwd \
-e ORIENTDB_NODE_NAME=ved1 \
orientdb/orientdb-spatial:latest server.sh
Run the Docker!
Full text
Based on Lucene
Configurable
Analyzers
Stopwords
Access type
Full text
CREATE INDEX City.name ON City(name) FULLTEXT ENGINE LUCENE METADATA {
  "directory_type": "nio",
  "use_compound_file": false,
  "ram_buffer_MB": "16",
  "max_buffered_docs": "-1",
  "max_buffered_delete_terms": "-1",
  "ram_per_thread_MB": "1024",
  "default": "org.apache.lucene.analysis.standard.StandardAnalyzer",
  "description_index": "org.apache.lucene.analysis.standard.StandardAnalyzer",
  "description_index_stopwords": [ "the", "is" ]
}
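What an analyzer and a stop-word list do can be shown with a tiny inverted index. The sketch below is not Lucene and does not reproduce StandardAnalyzer: the tokenizer is a simple lowercase regex, queries use AND semantics as an assumption of this example, and the class name, Record IDs and sample texts are hypothetical. Only the stop words "the" and "is" come from the metadata above.

```python
import re
from collections import defaultdict

STOPWORDS = {"the", "is"}  # the stop words listed in the index metadata


def analyze(text):
    """Lowercase, split into alphanumeric tokens, drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


class FullTextIndex:
    """Hypothetical inverted index: term -> set of Record IDs."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, rid, text):
        for term in analyze(text):
            self.postings[term].add(rid)

    def search(self, query):
        terms = analyze(query)
        if not terms:
            return set()  # the query contained only stop words
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


index = FullTextIndex()
index.add("#12:0", "Milan is the city of fashion")  # hypothetical records
index.add("#12:1", "The city of Turin")

print(sorted(index.search("the city")))
print(sorted(index.search("Milan")))
print(sorted(index.search("the is")))
```

Stop words never reach the index, so a query made only of "the" and "is" matches nothing, while "the city" behaves like a search for "city" alone.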
Spatial
Lucene, Spatial4J, JTS
Geometry data
Point, line, polygon, multiline, multipolygon
Functions
Follows the Open Geospatial Consortium (OGC) specification for extending SQL to support spatial data.
Implements a subset of SQL-MM functions with ST prefix (Spatial Type)
Spatial
Functions
ST_AsText(geom)
ST_GeomFromText(text)
ST_Equals(geom1,geom2)
ST_Within(geom1,geom2)
ST_Contains(geom1,geom2)
….
Spatial
SELECT ST_Intersects(ST_GeomFromText('POINT(0 0)'),
ST_GeomFromText('LINESTRING ( 2 0, 0 2 )'));
Result → (false)
SELECT ST_Disjoint(ST_GeomFromText('POINT(0 0)'),
ST_GeomFromText('LINESTRING ( 2 0, 0 2 )'));
Result → (true)
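The two results above can be checked by hand with a little geometry. The following sketch tests whether a point lies on any segment of a linestring; it is a plain-Python illustration under the assumption of planar coordinates, not the JTS or Spatial4J implementation, and the function names are invented for the example.

```python
def point_on_segment(p, a, b, eps=1e-9):
    """True if point p lies on the segment from a to b (planar coordinates)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    squared_length = (bx - ax) ** 2 + (by - ay) ** 2
    if squared_length <= eps:
        # Degenerate segment: a and b coincide, so compare p with a.
        return abs(px - ax) <= eps and abs(py - ay) <= eps
    # A non-zero cross product means p is not on the line through a and b.
    cross = (bx - ax) * (py - ay) - (by - ay) * (px - ax)
    if abs(cross) > eps:
        return False
    # The dot product tells whether p falls between a and b.
    dot = (px - ax) * (bx - ax) + (py - ay) * (by - ay)
    return -eps <= dot <= squared_length + eps


def point_intersects_linestring(point, line):
    return any(point_on_segment(point, a, b) for a, b in zip(line, line[1:]))


def point_disjoint_linestring(point, line):
    return not point_intersects_linestring(point, line)


line = [(2, 0), (0, 2)]  # LINESTRING ( 2 0, 0 2 )
print(point_intersects_linestring((0, 0), line))
print(point_disjoint_linestring((0, 0), line))
print(point_intersects_linestring((1, 1), line))
```

The origin is off the line through (2 0) and (0 2), so intersects is false and disjoint is true, matching the slide; the midpoint (1 1) lies on the segment and would intersect.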
OrientDB Features
First Multi-Model DBMS with a Graph-Engine
Open Source Apache2 license
Data Models are built into the core engine
Schema-less, Schema-full and Schema-mixed
Written in Java (runs on every platform)
Zero-config HA
Get Started for Free
OrientDB Community Edition is FREE for any purpose (Apache 2 license)
Udemy Getting Started Training is ★★★★★ and Free
http://www.orientechnologies.com/getting-started
OrientDB Enterprise is Free for Development
OrientDB At a Glance
70,000 downloads per month from 200+ countries
100+ code contributors on GitHub and 15,000+ commits
1,000s of users, from SMBs to Fortune 10 companies
17+ years of product research
Global Coverage and 24x7 Support
B-Side
How the demo is made
Roberto Franchini
How it’s made
Twitter4J for fetching the stream
RxJava for stream processing (not so much processing)
OrientDB graph API
OrientDB Docker image (custom)