Date post: 15-May-2015
Category: Technology
Upload: david-funaro
NoSQL
David Funaro
Turin, 11 July 2011
PHP.TO.START
What about me?
• Software engineer
• PHP developer (2002)
• Symfony Framework developer (2009)
• Mobile developer (iOS / Symbian)
• Senior developer @ dnsee
• Founder of PHP User Group Rome
• Open Source contributor
RDBMS
NOSQL
Other
Database - logical model
Relational DB
• Born in the 1970s
• SQL, relational algebra & set theory
• Excellent for management applications (accounting, reservations, staff management)
ACID
Transactions behave correctly only if the database satisfies these four properties:
• Atomicity
• Consistency
• Isolation
• Durability
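To make atomicity concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `accounts` table, the account names and the `transfer` helper are assumptions made up for this illustration; they are not part of the talk. The second UPDATE violates a CHECK constraint, so the first UPDATE is rolled back too.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; either both updates apply or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # Fails with IntegrityError when the balance would go negative.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))

# Hypothetical data, kept in memory only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # fits: both rows change
failed = transfer(conn, "alice", "bob", 500)  # overdraft: nothing changes
print(ok, failed, balances(conn))
```

The failed transfer leaves no trace of the credit to `bob`, which is the "all or nothing" behaviour the A in ACID refers to.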
NOSql
• Key Value
• Document Oriented
• Column Oriented
• Graph DB
NOSql !=
Not Only SQL
One size fits all
Historical Intro
The concept of a “non-relational database” is older than the “relational model”, but it has been revived and improved
technology comes back
New Requirements
Mid-1990s
With the new internet-based systems, consistency and security of data are no longer enough
The new need is high availability
• distributed storage system
• scales data size up to petabytes
Wide applicability
Scalability
High performance
High availability
Google BigTable
• Web indexing
• Google Earth
• Google Finance
• Orkut
• Custom Search
• Google Docs
Column-oriented DB
Amazon
• Relational model doesn’t fit requirements
• Tens of thousands of servers around the world
• 10 million customers
High Reliability, High Scale
Amazon Dynamo
• High Reliability
• High Scale
Key-Value Store Database
New Trends
Web Company
• Startup with explosive growth:
• DBMS open source
• v1.0: a single node, which soon becomes inadequate
• next version:
• Horizontal Partitioning (sharding)
• implement the node routing inside the application logic
Web Company
• Re-implement inter-node query
• Handle inter-node transaction
• Node failure becomes increasingly likely: less reliability, less availability
• Restructuring “hot” data and redistributing data become hard
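The sharding pain described above can be sketched in a few lines. This is an illustration only: the `ShardRouter` class, the node names and the keys are hypothetical, and real systems often use consistent hashing rather than the plain modulo shown here. It shows routing done inside the application and how adding one node relocates most keys.

```python
import hashlib

class ShardRouter:
    """Application-side routing: hash the key, pick a node by modulo."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def node_for(self, key):
        # md5 is used only as a stable hash, not for security.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return self.nodes[int(digest, 16) % len(self.nodes)]

def moved_keys(keys, before, after):
    """Keys whose owning node changes when the cluster layout changes."""
    return [k for k in keys if before.node_for(k) != after.node_for(k)]

keys = [f"user_{i}" for i in range(1000)]
three_nodes = ShardRouter(["node-a", "node-b", "node-c"])
four_nodes = ShardRouter(["node-a", "node-b", "node-c", "node-d"])

moved = moved_keys(keys, three_nodes, four_nodes)
print(f"{len(moved)} of {len(keys)} keys change node after adding one shard")
```

With modulo routing, going from three to four nodes is expected to move roughly three quarters of the keys, which is why data redistribution gets expensive as the cluster grows.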
Solution
• Scalability, very simple operations, but on many nodes
• Performance, low latency
• Productivity
• Flexibility (data structure)
• Ability to distribute data across many nodes
(these are the needs of a web application)
Compromise
• Giving up SQL
• Less strict transactions
Query Language
• SQL like
• map-reduce
• SPARQL
• ...
Give up a standard query language such as SQL and adopt a query language specific to the chosen product
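As an idea of what a map-reduce style query looks like compared with SQL, here is a minimal in-memory sketch. The `map_reduce` helper, the sample documents and the `city` field are assumptions for illustration; each product exposes its own API for this. The query is roughly the equivalent of `SELECT city, SUM(salary) ... GROUP BY city`.

```python
from collections import defaultdict

def map_reduce(documents, mapper, reducer):
    """Run mapper on every document, group emitted values by key, reduce each group."""
    groups = defaultdict(list)
    for doc in documents:
        for key, value in mapper(doc):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}

# Hypothetical documents; the city field is invented for the example.
users = [
    {"name": "ale", "city": "Rome", "salary": 200},
    {"name": "marco", "city": "Turin", "salary": 230},
    {"name": "david", "city": "Rome", "salary": 340},
    {"name": "sergio", "city": "Turin", "salary": 349},
    {"name": "andre", "city": "Rome", "salary": 200},
]

def salary_by_city(doc):
    # Map step: emit one (key, value) pair per document.
    yield doc["city"], doc["salary"]

def total(key, values):
    # Reduce step: collapse all values emitted for one key.
    return sum(values)

totals = map_reduce(users, salary_by_city, total)
print(totals)
```

Because the map step looks at one document at a time, it can run on many nodes in parallel, which is what makes this style attractive for distributed stores.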
CAP Theorem (2000)
• Consistency
• Availability
• Partition Tolerance
It’s impossible to have all three at the same time in a distributed system. You can choose at most two.
Eric Brewer
Consistency
[Figure: nodes N1, N2, N4, N5 and N6, with an update reaching them at time tk]
• Strong: after the update completes, any subsequent access will return the updated value.
• Weak: the system does not guarantee that subsequent accesses will return the updated value.
• Eventual: the storage system guarantees that, if no new updates are made to the object, eventually (after the inconsistency window closes) all accesses will return the last updated value.
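The inconsistency window can be simulated with a toy model. This is a sketch under simplifying assumptions: a single writer, version numbers instead of real clocks, and hypothetical `Replica` and `EventuallyConsistentStore` classes that do not come from any product. A write lands on one replica first and reaches the others only when `propagate()` runs.

```python
from collections import deque

class Replica:
    def __init__(self, name):
        self.name = name
        self.value = None
        self.version = 0

    def apply(self, version, value):
        # Keep only the newest version seen so far.
        if version > self.version:
            self.version, self.value = version, value

class EventuallyConsistentStore:
    def __init__(self, names):
        self.replicas = [Replica(n) for n in names]
        self.pending = deque()  # updates not yet delivered to a replica
        self.clock = 0

    def write(self, value):
        self.clock += 1
        self.replicas[0].apply(self.clock, value)  # acknowledged immediately
        for replica in self.replicas[1:]:
            self.pending.append((replica, self.clock, value))

    def read(self, index):
        return self.replicas[index].value

    def propagate(self):
        # Deliver every queued update: this closes the inconsistency window.
        while self.pending:
            replica, version, value = self.pending.popleft()
            replica.apply(version, value)

store = EventuallyConsistentStore(["N1", "N2", "N4"])
store.write("v1")
during_window = [store.read(i) for i in range(3)]  # replicas disagree here
store.propagate()
after_window = [store.read(i) for i in range(3)]   # replicas agree here
print(during_window, after_window)
```

A strongly consistent store would not acknowledge the write until every replica had applied it; here the write returns early and readers may briefly see the old value.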
Facebook Cassandra
• Key-Value store
• data model: BigTable
• infrastructure: Amazon-Dynamo
• Eventual Consistency
• High Availability
Just find the right way to manage your data-set
Search for the best solution
Technology Focus
• context
• purpose
• cost of implementation
choose bike => (climb the mountain)
Know available tools
NOSql Families
Key Value Store
One Key -> One Value
It’s like a hash
The DB knows the type of the “key” (integer, float, ...) but nothing about the value
Very fast
‘name’ => ‘david’
key => value
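A minimal sketch of the idea, assuming a hypothetical in-memory `KeyValueStore` class (not the API of Redis, memcached or any other product): the store checks the key type but treats the value as opaque bytes, so the only possible lookup is by key.

```python
class KeyValueStore:
    """Toy key-value store: typed keys, opaque byte values."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        if not isinstance(key, (str, int, float)):
            raise TypeError("key must be a string or a number")
        if not isinstance(value, bytes):
            raise TypeError("value must be bytes; the store never inspects it")
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        # Returns True if the key existed.
        return self._data.pop(key, None) is not None

store = KeyValueStore()
store.set("name", b"david")
print(store.get("name"))
```

Since the store cannot look inside values, any query such as "all users named david" has to be answered by the application, for example by maintaining its own secondary keys.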
Key Value Store
• redis
• memcached
• dynamo
• voldemort
Performance: high
Scalability: high
Flexibility: high
Complexity: none
Functionality: variable (none)
Document Oriented
• key -> document
• structured document
• schema-less
user_13 => { name: ‘david’, surname: ‘funaro’, age: ’18’, mail: { home: ‘[email protected]’, office: ‘[email protected]’ } }
key => document
Document Oriented
Performance: high
Scalability: variable (high)
Flexibility: high
Complexity: low
Functionality: variable (low)
Graph DB
• Composed of Vertices and Edges
• Vertices are connected by Edges
• Each Edge has a Label and a Direction
• Edges and Vertices have Properties
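The four points above describe what is usually called a property graph. Here is a minimal sketch of that data structure; the `Vertex`, `Edge` and `Graph` classes and the sample data are hypothetical and are not the API of Neo4j, OrientDB or any other product.

```python
from dataclasses import dataclass, field

@dataclass
class Vertex:
    id: str
    properties: dict = field(default_factory=dict)

@dataclass
class Edge:
    source: str   # edges are directed: source -> target
    target: str
    label: str
    properties: dict = field(default_factory=dict)

class Graph:
    def __init__(self):
        self.vertices = {}
        self.outgoing = {}  # vertex id -> list of edges leaving it

    def add_vertex(self, vertex_id, **properties):
        self.vertices[vertex_id] = Vertex(vertex_id, properties)
        self.outgoing.setdefault(vertex_id, [])

    def add_edge(self, source, target, label, **properties):
        self.outgoing[source].append(Edge(source, target, label, properties))

    def neighbours(self, vertex_id, label):
        """Ids reached by following outgoing edges with the given label."""
        return [e.target for e in self.outgoing[vertex_id] if e.label == label]

g = Graph()
g.add_vertex("user_1", name="David", surname="Funaro")
g.add_vertex("user_2")
g.add_vertex("dnsee")
g.add_edge("user_1", "user_2", "friend")
g.add_edge("user_1", "dnsee", "work")
print(g.neighbours("user_1", "friend"))
```

Because each vertex keeps its own list of outgoing edges, following a relationship does not need a global index lookup.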
Graph DB
[Figure: example graph with vertices User_1, User_2, User_3, David, Funaro and dnsee, connected by edges labeled friend, name, surname and work]
Graph DB
• Neo4j
• OrientDB
• InfoGrid
• VertexDB
Performance: variable
Scalability: variable
Flexibility: high
Complexity: high
Functionality: graph theory
Why NOSql
Some example cases
A Graph RDBMS
Users
id   name     salary
1    ale      200
2    marco    230
3    david    340
4    sergio   349
5    andre    200
Followee
id_1   id_2
2      4
3      1
3      4
3      2
1      5
5      3
5      2
Both tables are handled as BTrees
A Graph RDBMS
N = number of users
Look up David’s id: Log(N)
Look up his K followees: Log(N)
Get their names: K*Log(N)
Graph DB
[Figure: graph with vertices Marco, Sergio, Andrea, Ale and David]
Look up David: Log(N)
Look up his followees: O(K)
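The two access patterns can be compared on the Users and Followee rows from the earlier slide. This is a sketch under stated assumptions: sorted lists searched with `bisect` stand in for BTree indexes, and a plain dict stands in for the graph store (so the first vertex lookup here is a hash lookup, not the Log(N) index lookup mentioned above). The function names are invented for the example.

```python
from bisect import bisect_left

# Rows copied from the Users and Followee tables.
USERS = [(1, "ale"), (2, "marco"), (3, "david"), (4, "sergio"), (5, "andre")]
FOLLOWEE = [(2, 4), (3, 1), (3, 4), (3, 2), (1, 5), (5, 3), (5, 2)]

# Sorted lists play the role of BTree indexes: each search costs O(log N).
NAME_INDEX = sorted((name, uid) for uid, name in USERS)
ID_INDEX = sorted(USERS)
FOLLOWEE_INDEX = sorted(FOLLOWEE)

def followees_relational(name):
    i = bisect_left(NAME_INDEX, (name,))             # look up the id
    if i == len(NAME_INDEX) or NAME_INDEX[i][0] != name:
        return []
    uid = NAME_INDEX[i][1]
    j = bisect_left(FOLLOWEE_INDEX, (uid,))          # find the first followee row
    names = []
    while j < len(FOLLOWEE_INDEX) and FOLLOWEE_INDEX[j][0] == uid:
        fid = FOLLOWEE_INDEX[j][1]
        k = bisect_left(ID_INDEX, (fid,))            # one index search per followee
        names.append(ID_INDEX[k][1])
        j += 1
    return names

# Graph style: every vertex holds direct references to its followees.
NAMES = dict(USERS)
GRAPH = {name: [] for name in NAMES.values()}
for src, dst in FOLLOWEE:
    GRAPH[NAMES[src]].append(NAMES[dst])

def followees_graph(name):
    return list(GRAPH.get(name, []))                 # O(K): just walk the edges

print(followees_relational("david"), followees_graph("david"))
```

Both functions return the same followees; the difference is that the relational version pays one index search per followee, while the graph version only walks the K edges attached to the vertex.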
Benchmark
• 1 million vertices
• 4 million edges
• Scale-free topology
• Postgres vs Neo4j
• Both Hash and BTree
Depth   RDBMS      Graph
1       100ms      30ms
2       1000ms     500ms
3       10000ms    3000ms
4       100000ms   50000ms
5       N/A        100000ms
http://markorodriguez.com/2011/02/18/mysql-vs-neo4j-on-a-large-scale-graph-traversal/
Schema
RDBMS | NOSql - Document-oriented
CREATE TABLE `pma_bookmark` (
  `id` int(11) NOT NULL auto_increment,
  `name` varchar(255) NOT NULL default '',
  `surname` varchar(255) NOT NULL default '',
  `mobile` varchar(255) NOT NULL default '',
  `url` text NOT NULL,
  ...
  `name` varchar(255) NOT NULL default '',
  ...
  `telex` varchar(255) NOT NULL default '',
  `fax` varchar(255) NOT NULL default '',
  `office` text NOT NULL,
  PRIMARY KEY (`id`)
);
Schema Less
Schema 2
id   name         surname   mobile   url               ...    telex   office    telex    ...
1    david        funaro    3548     davidfunaro.com   null   null    3548631   null     null
2    alessandro   nadalin   3257     null              null   null    32458     5456     null
3    marco        rossi     3548     null              null   null    null      515648   null
Too many values set to NULL
user: { name: david, surname: funaro, mobile: 3454, url: davidfunaro.com, office: 3423423 }
user: { name: alessandro, surname: nadalin, mobile: 6262, office: 342343, telex: 3434 }
user: { name: marco, surname: rossi, telex: 3434 }
Each Document has only the required fields
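The difference between the two layouts can be checked with a small sketch. The three documents are the ones shown above; the fixed `COLUMNS` list and the helper names are assumptions made for the illustration. Forcing the documents into a fixed schema produces NULL cells, while the documents themselves store only the fields they have.

```python
# Hypothetical fixed schema covering every field used by at least one document.
COLUMNS = ["name", "surname", "mobile", "url", "telex", "office"]

DOCUMENTS = [
    {"name": "david", "surname": "funaro", "mobile": 3454,
     "url": "davidfunaro.com", "office": 3423423},
    {"name": "alessandro", "surname": "nadalin", "mobile": 6262,
     "office": 342343, "telex": 3434},
    {"name": "marco", "surname": "rossi", "telex": 3434},
]

def to_row(document):
    """Fixed-schema row: every column is present, missing ones become None (NULL)."""
    return tuple(document.get(column) for column in COLUMNS)

def to_document(row):
    """Schema-less document: keep only the columns that hold a value."""
    return {c: v for c, v in zip(COLUMNS, row) if v is not None}

rows = [to_row(d) for d in DOCUMENTS]
null_cells = sum(value is None for row in rows for value in row)
stored_fields = sum(len(d) for d in DOCUMENTS)
print(f"{null_cells} NULL cells in the table, {stored_fields} fields in the documents")
```

Adding a new field to one document changes nothing for the others, whereas the table would need a new column that is NULL for every existing row.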
Schema less
• flexibility to handle the data model fields
• the model can grow easily
Performance
====== SET ======
  100007 requests completed in 0.88 seconds
  50 parallel clients
  3 bytes payload
  keep alive: 1
====== GET ======
  100000 requests completed in 1.23 seconds
  50 parallel clients
  3 bytes payload
  keep alive: 1
http://redis.io/topics/benchmarks
http://research.yahoo.com/files/ycsb-v4.pdf
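As a quick worked example, the SET and GET figures above can be turned into throughput. The helper name is invented for the illustration; the numbers are the ones reported in the benchmark output.

```python
def requests_per_second(requests, seconds):
    """Throughput implied by a benchmark run."""
    return requests / seconds

set_rps = requests_per_second(100007, 0.88)
get_rps = requests_per_second(100000, 1.23)
print(f"SET: {set_rps:.0f} req/s, GET: {get_rps:.0f} req/s")
```

That is roughly 114 thousand SET and 81 thousand GET operations per second, measured with 50 parallel clients and a 3-byte payload, so it should be read as an upper bound for tiny values rather than a general figure.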
NOSql for PHP
✓Redis
✓MongoDB
✓CouchDB
✓Cassandra
✓Memcached
✴OrientDB
OrientDB library for PHP
https://github.com/congow/Orient
A Set of tools to use and manage any OrientDB instance from PHP.
Orient includes:
• the HTTP protocol binding
• the query builder
• the data mapper (Object Graph Mapper)
credits
http://www.slideshare.net/ClaudioMartella/presentation-7398682?from=ss_embed
http://www.slideshare.net/harrikauhanen/nosql-3376398
http://www.slideshare.net/ingdavidino/cmf-a-pain-in-the-f-phpday-05142011
http://it.wikipedia.org/wiki/Modello_relazionale
http://www.slideshare.net/gabriele.lana/nosql-7405964
http://blog.indigenidigitali.com/l-ecosistema-nosql/
http://www.dia.uniroma3.it/~torlone/bd2/noSQL-1.pdf
http://nosql-database.org/