Date post: | 05-Jul-2015 |
Category: | Software |
Upload: | rishikese-mr |
Welcome
School of Engineering, CUSAT
A SEMINAR ON
NEO4J
Presented by: Vishnu Sanker
Project guide: Dr. Sudheep Elayidom
Contents
• Trends in big data
• NoSQL
• Graphs
• Neo4j
• Brief introduction to Cypher
• Pros and Cons of Neo4j
TRENDS IN BIG DATA
1. Increasing data size (big data)
• “Every 2 days we create as much information as we did up to 2003”
- Eric Schmidt
2. Increasingly connected data (graph data)
• For example, the move from plain text documents to hyperlinked HTML
3. Semi-structured data
• Individualized data that shares only a common subset of fields
4. Architecture
• From monolithic to modular, distributed applications
NOSQL
• Carlo Strozzi used the term NoSQL in 1998 to name his lightweight,
open-source relational database that did not expose the standard
SQL interface
• A NoSQL database provides a mechanism for storing and retrieving data that is
modeled by means other than the tabular relations used in relational
databases.
BENEFITS OF NOSQL
• Large volumes of structured, semi-structured and unstructured data
• Agile sprints, quick iteration, and frequent code pushes
• Flexible, easy-to-use object-oriented programming
• Efficient, scale-out architecture instead of expensive, monolithic architecture
TYPES OF NOSQL
• Column
- a column of a distributed data store is the lowest-level NoSQL object in a keyspace. It is a tuple (a key-value pair) consisting of three elements:
Unique name: used to reference the column
Value: the content of the column
Timestamp: used to determine which content is valid
• Document oriented
- designed for storing, retrieving, and managing document-oriented information, also known as semi-structured data
• Key value pairs
- stores data as a collection of key-value pairs, with each value looked up by its key
• Graph
- database that uses graph structures with nodes, edges, and properties to represent and store data
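The three-element column described above can be sketched in a few lines. This is a minimal illustration, assuming that the timestamp resolves competing writes by letting the newest one win; the `Column` class, the `resolve` helper, and the sample data are hypothetical and not taken from any particular column store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str       # unique name used to reference the column
    value: str      # content of the column
    timestamp: int  # used to decide which content is valid


def resolve(a: Column, b: Column) -> Column:
    """Pick the valid column out of two writes to the same name (assumed: newest wins)."""
    if a.name != b.name:
        raise ValueError("columns must share a name")
    return a if a.timestamp >= b.timestamp else b


# Two hypothetical writes to the same column at different times.
old = Column("email", "a@example.com", timestamp=100)
new = Column("email", "b@example.com", timestamp=200)
current = resolve(old, new)
```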
GRAPHS
A GRAPH DATABASE...
NO: not for charts & diagrams, or vector artwork
YES: for storing data that is structured as a graph
Graphs Everywhere
๏Relationships in
•Politics, Economics, History, Science, Transportation
๏Biology, Chemistry, Physics, Sociology
•Body, Ecosphere, Reaction, Interactions
๏Internet
•Hardware, Software, Interaction
๏Social Networks
•Family, Friends
•Work, Communities
•Neighbours, Cities, Society
Good Relationships
๏The world is full of rich, messy, and related data
๏Relationships are at least as important as the things they connect
๏Complex interactions
๏Always changing, in structure as well as in content
๏Graph: Relationships are part of the data
๏RDBMS: Relationships are part of the fixed schema
HOW AN RDB IS REPRESENTED BY A GRAPH
[Figure: two panels, RDB and PROPERTY GRAPH]
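One way to picture the mapping from an RDB to a property graph is sketched below: each row of an entity table becomes a node, and each row of a join table becomes a relationship that carries its own properties. The table contents, the `FRIEND_OF` relationship type, and the `rows_to_graph` helper are all assumptions made for illustration.

```python
# Hypothetical relational rows: an entity table and a join table of foreign keys.
persons = [
    {"id": 1, "name": "Alice"},
    {"id": 2, "name": "Bob"},
]
friendships = [
    {"person_id": 1, "friend_id": 2, "since": 2010},
]


def rows_to_graph(person_rows, friendship_rows):
    """Turn entity rows into nodes and join-table rows into relationships."""
    nodes = {
        row["id"]: {"label": "Person", "props": {"name": row["name"]}}
        for row in person_rows
    }
    relationships = [
        {
            "start": row["person_id"],
            "type": "FRIEND_OF",
            "end": row["friend_id"],
            # Extra join-table columns become relationship properties.
            "props": {"since": row["since"]},
        }
        for row in friendship_rows
    ]
    return nodes, relationships


graph_nodes, graph_rels = rows_to_graph(persons, friendships)
```

The foreign-key pair disappears as data to be joined at query time and reappears as a stored connection between two nodes.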
NEO4J - A GRAPH DATABASE
Neo4j is a Graph Database
๏A Graph Database:
•a schema-free Property Graph
•perfect for complex, highly connected data
๏Why NEO4J:
•reliable, with real ACID transactions
•fast, with more than 1M traversals / second
•runs as a server with a REST API, or embedded on the JVM
•scales out for higher-performance reads with High Availability
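To make "schema-free property graph" and "traversal" concrete, here is a minimal in-memory sketch in plain Python. It does not use Neo4j's API; the `PropertyGraph` class, its method names, and the sample data are hypothetical.

```python
from collections import deque


class PropertyGraph:
    """Tiny in-memory property graph: nodes and relationships both carry properties."""

    def __init__(self):
        self.nodes = {}     # node id -> property dict (no fixed schema)
        self.outgoing = {}  # node id -> list of (rel_type, rel_props, target id)

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props
        self.outgoing.setdefault(node_id, [])

    def add_relationship(self, start, rel_type, end, **props):
        self.outgoing[start].append((rel_type, props, end))

    def traverse(self, start, rel_type, max_depth):
        """Breadth-first walk along rel_type; returns (node id, depth) pairs."""
        seen = {start}
        queue = deque([(start, 0)])
        found = []
        while queue:
            node, depth = queue.popleft()
            if depth == max_depth:
                continue
            for rtype, _props, target in self.outgoing[node]:
                if rtype == rel_type and target not in seen:
                    seen.add(target)
                    found.append((target, depth + 1))
                    queue.append((target, depth + 1))
        return found


g = PropertyGraph()
g.add_node("alice", name="Alice", age=34)
g.add_node("bob", name="Bob")           # different property set, same graph
g.add_node("carol", name="Carol")
g.add_node("acme", kind="company")
g.add_relationship("alice", "KNOWS", "bob", since=2010)
g.add_relationship("bob", "KNOWS", "carol")
g.add_relationship("alice", "WORKS_AT", "acme")

friends_of_friends = g.traverse("alice", "KNOWS", max_depth=2)
```

Each hop follows references already stored on the node, which is the intuition behind fast traversals over connected data.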
DATA MODELING FOR NEO4J
SAMPLE CODE
CYPHER - QUERY LANGUAGE FOR NEO4J
• Declarative query language
• Describe what you want, not how
• Based on pattern matching
• declarative grammar with clauses (like SQL)
• aggregation, ordering, limits
• create, update, delete
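The combination of pattern matching, aggregation, ordering, and limits can be imitated over a list of relationships. The sketch below is plain Python, not Cypher; the data is hypothetical, and the Cypher in the comment is only an approximate legacy-style equivalent.

```python
from collections import Counter

# Roughly what a legacy-style query such as
#   START a=node(*) MATCH a-[:KNOWS]->b
#   RETURN a.name, count(b) ORDER BY count(b) DESC LIMIT 2
# asks for, spelled out step by step over hypothetical data.
relationships = [
    ("Alice", "KNOWS", "Bob"),
    ("Alice", "KNOWS", "Carol"),
    ("Bob", "KNOWS", "Carol"),
    ("Carol", "WORKS_WITH", "Alice"),
]

# Pattern matching: keep only relationships of type KNOWS.
# Aggregation: count matches per start node.
knows_counts = Counter(
    start for (start, rel_type, _end) in relationships if rel_type == "KNOWS"
)

# Ordering and limit: most connected first, top two only.
top_two = sorted(knows_counts.items(), key=lambda item: item[1], reverse=True)[:2]
```

The declarative query states the result; the Python version has to spell out each step by hand.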
Cypher: START + RETURN
๏START <lookup> RETURN <expressions>
๏START binds terms using simple look-up
•directly using known ids
•or based on indexed Property
๏RETURN expressions specify result set
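The two look-up styles can be mimicked in plain Python. This is a sketch only: the node data, the `people` index, and the helper functions are hypothetical, and the Cypher lines in the comments use the legacy START syntax the slide describes.

```python
# Legacy Cypher that this sketch mirrors:
#   START n=node(1) RETURN n.name
#   START n=node:people(name = "Bob") RETURN n.name

nodes = {
    1: {"name": "Alice", "age": 34},
    2: {"name": "Bob", "age": 28},
}

# Exact-match index: index name -> (property, value) -> node ids.
indexes = {"people": {("name", "Alice"): [1], ("name", "Bob"): [2]}}


def start_by_id(node_id):
    """Bind a node directly by its known id."""
    return [nodes[node_id]]


def start_by_index(index_name, key, value):
    """Bind nodes through an indexed property look-up."""
    return [nodes[i] for i in indexes[index_name].get((key, value), [])]


def return_expr(bound, prop):
    """Project one property from each bound node, like RETURN n.<prop>."""
    return [node[prop] for node in bound]


by_id = return_expr(start_by_id(1), "name")
by_index = return_expr(start_by_index("people", "name", "Bob"), "name")
```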
Cypher: MATCH
๏START <lookup> MATCH <pattern> RETURN <expr>
๏MATCH describes a pattern of nodes+relationships
•node terms in optional parentheses
•lines with arrows for relationships
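A pattern of one start node, one typed arrow, and one end node can be matched against stored relationships as follows. The data and the `match_outgoing` helper are hypothetical; the comment shows the legacy query the sketch stands in for.

```python
# Legacy Cypher that this sketch mirrors:
#   START a=node(1) MATCH a-[:KNOWS]->b RETURN b.name

nodes = {1: {"name": "Alice"}, 2: {"name": "Bob"}, 3: {"name": "Carol"}}

# Each relationship is (start id, type, end id); direction is start -> end.
relationships = [
    (1, "KNOWS", 2),
    (1, "WORKS_WITH", 3),
    (2, "KNOWS", 3),
]


def match_outgoing(start_id, rel_type):
    """Return ids of nodes b such that start_id -[:rel_type]-> b."""
    return [
        end
        for (start, rtype, end) in relationships
        if start == start_id and rtype == rel_type
    ]


known_names = [nodes[b]["name"] for b in match_outgoing(1, "KNOWS")]
```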
Cypher: WHERE
๏START <lookup> [MATCH <pattern>]
๏WHERE <condition> RETURN <expr>
๏WHERE filters nodes or relationships
•uses expressions to constrain elements
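Filtering matched nodes with a condition can be sketched as a predicate applied after the pattern match. The data, the age threshold, and the `match_where` helper are assumptions for illustration.

```python
# Legacy Cypher that this sketch mirrors:
#   START a=node(1) MATCH a-[:KNOWS]->b WHERE b.age > 30 RETURN b.name

nodes = {
    1: {"name": "Alice", "age": 34},
    2: {"name": "Bob", "age": 28},
    3: {"name": "Carol", "age": 41},
}
relationships = [(1, "KNOWS", 2), (1, "KNOWS", 3)]


def match_where(start_id, rel_type, condition):
    """Match start_id -[:rel_type]-> b, then keep only b that satisfy condition."""
    matched = [
        nodes[end]
        for (start, rtype, end) in relationships
        if start == start_id and rtype == rel_type
    ]
    return [node for node in matched if condition(node)]


older_friends = [n["name"] for n in match_where(1, "KNOWS", lambda b: b["age"] > 30)]
```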
Cypher: SET
๏SET [<node property>] [<relationship property>]
•update a property on a node or relationship
•must follow a START in the legacy syntax shown here (Neo4j 2.0 and later also allow a MATCH)
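Updating a property on a bound node or relationship amounts to a look-up followed by an assignment. The sketch below uses hypothetical data and a hypothetical `set_property` helper.

```python
# Legacy Cypher that this sketch mirrors:
#   START n=node(1) SET n.age = 35 RETURN n

nodes = {1: {"name": "Alice", "age": 34}}
relationships = {10: {"start": 1, "type": "KNOWS", "end": 2, "props": {}}}


def set_property(props, key, value):
    """Create or overwrite one property on a node or relationship property map."""
    props[key] = value
    return props


bound_node = nodes[1]                   # START: bind first
set_property(bound_node, "age", 35)     # SET on a node

bound_rel = relationships[10]["props"]
set_property(bound_rel, "since", 2010)  # SET on a relationship
```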
Cypher: DELETE
๏DELETE [<node>|<relationship>|<property>]
•delete a node, relationship or property
•must follow a START in the legacy syntax shown here (Neo4j 2.0 and later also allow a MATCH)
•to delete a node, all of its relationships must be deleted first
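The rule that a node's relationships must go before the node itself can be sketched as a guard in the delete routine. The data and both helper functions are hypothetical; raising an error on a still-connected node is an assumption about how the refusal surfaces.

```python
# Legacy Cypher that this sketch mirrors:
#   START n=node(1) MATCH n-[r]-() DELETE r, n

nodes = {1: {"name": "Alice"}, 2: {"name": "Bob"}}
relationships = {10: (1, "KNOWS", 2)}  # rel id -> (start id, type, end id)


def delete_relationship(rel_id):
    del relationships[rel_id]


def delete_node(node_id):
    """Delete a node, refusing while any relationship still touches it."""
    attached = [
        rel_id
        for rel_id, (start, _rel_type, end) in relationships.items()
        if node_id in (start, end)
    ]
    if attached:
        raise ValueError(f"node {node_id} still has relationships: {attached}")
    del nodes[node_id]


# Deleting the node first is refused because relationship 10 is still attached.
try:
    delete_node(1)
    refused = False
except ValueError:
    refused = True

# Deleting the relationship first makes the node deletable.
delete_relationship(10)
delete_node(1)
```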
PROS AND CONS OF NEO4J
PROS
• Powerful data model - as general as an RDBMS
• Connected data is locally indexed
• Easy to query
CONS
• Sharding (a highly connected graph is hard to partition across machines)
• Needs a new way of thinking
Concluding...
• Neo4j is a property graph database
• It is scalable, flexible, and implemented in Java
• Cypher is the query language for Neo4j; it is highly declarative and flexible as well