Post on 23-Dec-2015
A Comparison of SQL and NoSQL Databases
Keith W. Hare
JCC Consulting, Inc.
Convenor, ISO/IEC JTC1 SC32 WG3
April 19, 2023 Metadata Open Forum 1
ISO/IEC JTC1/SC32/WG2 N1537
Abstract
NoSQL databases (either no-SQL or Not Only SQL) are currently a hot topic in some parts of computing. In fact, one website lists over a hundred different NoSQL databases.
This presentation reviews the features common to the NoSQL databases and compares those features to the features and capabilities of SQL databases.

Who Am I?
Muskingum College, 1980, BS in Biology and Computer Science
Senior Consultant with JCC Consulting, Inc. since 1985 – high performance database systems
Ohio State – Masters in Computer & Information Science, 1985
SQL Standards committees since 1988
Vice Chair, INCITS H2 since 2003
Convenor, ISO/IEC JTC1 SC32 WG3 since 2005

Topics
SQL Databases
  SQL Standard
  SQL Characteristics
  SQL Database Examples
NoSQL Databases
  NoSQL Definition
  General Characteristics
  NoSQL Database Types
  NoSQL Database Examples

Standard SQL
The following is a short, incomplete history of the SQL Standards – ISO/IEC 9075
1987 – Initial ISO/IEC Standard
1989 – Referential Integrity
1992 – SQL2
  1995 – SQL/CLI (ODBC)
  1996 – SQL/PSM – Procedural Language extensions
1999 – User Defined Types
2003 – SQL/XML
2008 – Expansions and corrections
2011 (or 2012) – System Versioned and Application Time Period Tables

SQL Characteristics
Data stored in columns and tables
Relationships represented by data
Data Manipulation Language
Data Definition Language
Transactions
Abstraction from physical layer

SQL Physical Layer Abstraction
Applications specify what, not how
Query optimization engine
Physical layer can change without modifying applications
  Create indexes to support queries
  In Memory databases

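The point that the physical layer can change without modifying applications can be sketched with Python's built-in sqlite3 module. The orders table, its rows, and the index name below are hypothetical and chosen only for illustration. The query text stays the same before and after the index is created; only the optimizer's plan changes.

```python
import sqlite3

# Hypothetical table and data, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer, total) VALUES (?, ?)",
    [("alice", 10.0), ("bob", 25.5), ("alice", 7.25)],
)

# The application states WHAT it wants; this text never changes.
QUERY = "SELECT SUM(total) FROM orders WHERE customer = ?"


def plan(sql, params):
    """Return the optimizer's plan description for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each plan row holds the human-readable detail.
    return " | ".join(row[-1] for row in rows)


before_result = conn.execute(QUERY, ("alice",)).fetchone()[0]
before_plan = plan(QUERY, ("alice",))

# Change the physical layer only: add an index to support the query.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")

after_result = conn.execute(QUERY, ("alice",)).fetchone()[0]
after_plan = plan(QUERY, ("alice",))

print(before_plan)
print(after_plan)
```

The two results are identical, while the second plan refers to the new index. The exact plan wording depends on the SQLite version.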
Data Manipulation Language (DML)
Data manipulated with Select, Insert, Update, & Delete statements
  Select T1.Column1, T2.Column2 …
  From Table1, Table2 …
  Where T1.Column1 = T2.Column1 …
Data Aggregation
Compound statements
Functions and Procedures
Explicit transaction control

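A runnable version of the Select statement sketched on this slide, using Python's built-in sqlite3 module. The column types, the sample rows, and the table aliases T1 and T2 are assumptions filled in for illustration; the slide leaves them elided.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema and rows; the slide only names the tables and columns.
conn.executescript("""
    CREATE TABLE Table1 (Column1 INTEGER, Column2 TEXT);
    CREATE TABLE Table2 (Column1 INTEGER, Column2 TEXT);
    INSERT INTO Table1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO Table2 VALUES (1, 'x'), (3, 'z'), (4, 'w');
""")

# Join: rows are matched on Column1; the engine decides how to execute it.
joined = conn.execute("""
    SELECT T1.Column1, T2.Column2
    FROM Table1 T1, Table2 T2
    WHERE T1.Column1 = T2.Column1
    ORDER BY T1.Column1
""").fetchall()

# Data aggregation over one table.
row_count, max_key = conn.execute(
    "SELECT COUNT(*), MAX(Column1) FROM Table2"
).fetchone()

print(joined)              # [(1, 'x'), (3, 'z')]
print(row_count, max_key)  # 3 4
```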
Data Definition Language
Schema defined at the start
  Create Table (Column1 Datatype1, Column2 Datatype2, …)
Constraints to define and enforce relationships
  Primary Key
  Foreign Key
  Etc.
Triggers to respond to Insert, Update, & Delete
Stored Modules
Alter …
Drop …
Security and Access Control

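A minimal sketch of a schema defined up front, with primary and foreign key constraints enforced by the database rather than by the application. It uses Python's built-in sqlite3 module; the author and abstract tables are hypothetical. Note that SQLite, unlike most SQL products, enforces foreign keys only after the PRAGMA shown below.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable FK enforcement

# Hypothetical schema, declared before any data is stored.
conn.executescript("""
    CREATE TABLE author (
        author_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL
    );
    CREATE TABLE abstract (
        abstract_id INTEGER PRIMARY KEY,
        author_id   INTEGER NOT NULL REFERENCES author (author_id),
        title       TEXT NOT NULL
    );
""")

conn.execute("INSERT INTO author VALUES (1, 'Keith W. Hare')")
conn.execute("INSERT INTO abstract VALUES (10, 1, 'SQL Standard and NoSQL Databases')")

# A row that points at a non-existent author violates the foreign key.
try:
    conn.execute("INSERT INTO abstract VALUES (11, 99, 'Orphan row')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

abstract_count = conn.execute("SELECT COUNT(*) FROM abstract").fetchone()[0]
print(orphan_rejected, abstract_count)  # True 1
```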
Transactions – ACID Properties
Atomic – All of the work in a transaction completes (commit) or none of it completes
Consistent – A transaction transforms the database from one consistent state to another consistent state. Consistency is defined in terms of constraints.
Isolated – The results of any changes made during a transaction are not visible until the transaction has committed.
Durable – The results of a committed transaction survive failures

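The Atomic and Consistent properties can be illustrated with a small transfer between two rows, again using Python's built-in sqlite3 module. The account table and the CHECK constraint are hypothetical. When the second update violates the constraint, the whole transaction is rolled back, including the first update that had already succeeded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"  # the consistency rule
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()


def transfer(amount):
    """Move `amount` from a to b; both updates commit or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = 'b'", (amount,)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = 'a'", (amount,)
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances():
    """Return the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM account"))


too_large_ok = transfer(150)      # would drive account a below zero
after_failed = balances()         # unchanged: the credit to b was rolled back
valid_ok = transfer(40)
after_valid = balances()

print(too_large_ok, after_failed)  # False {'a': 100, 'b': 0}
print(valid_ok, after_valid)       # True {'a': 60, 'b': 40}
```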
SQL Database Examples
Commercial
  IBM DB2
  Oracle RDBMS
  Microsoft SQL Server
  Sybase SQL Anywhere
Open Source (with commercial options)
  MySQL
  Ingres
Significant portions of the world’s economy use SQL databases!

NoSQL Definition
From www.nosql-database.org:
Next Generation Databases mostly addressing some of the points: being non-relational, distributed, open-source and horizontal scalable. The original intention has been modern web-scale databases. The movement began early 2009 and is growing rapidly. Often more characteristics apply as: schema-free, easy replication support, simple API, eventually consistent / BASE (not ACID), a huge data amount, and more.

NoSQL Products/Projects
http://www.nosql-database.org/ lists 122 NoSQL Databases
Cassandra
CouchDB
Hadoop & HBase
MongoDB
StupidDB
Etc.

NoSQL Distinguishing Characteristics
Large data volumes
  Google’s “big data”
Scalable replication and distribution
  Potentially thousands of machines
  Potentially distributed around the world
Queries need to return answers quickly
Mostly query, few updates
Asynchronous Inserts & Updates
Schema-less
ACID transaction properties are not needed – BASE
CAP Theorem
Open source development

BASE Transactions
Acronym contrived to be the opposite of ACID
  Basically Available,
  Soft state,
  Eventually Consistent
Characteristics
  Weak consistency – stale data OK
  Availability first
  Best effort
  Approximate answers OK
  Aggressive (optimistic)
  Simpler and faster

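A toy model of "weak consistency, stale data OK": a write is acknowledged once one replica has it and reaches the other replicas later. The class and method names are hypothetical, and the explicit propagate() call stands in for background replication; no real product is being modeled.

```python
from collections import deque


class EventuallyConsistentStore:
    """Toy store: writes hit one replica and propagate to the others later."""

    def __init__(self, n_replicas=3):
        self.replicas = [{} for _ in range(n_replicas)]
        self.pending = deque()  # queued (replica_index, key, value) updates

    def write(self, key, value):
        # Acknowledge as soon as the first replica holds the value.
        self.replicas[0][key] = value
        for index in range(1, len(self.replicas)):
            self.pending.append((index, key, value))

    def read(self, key, replica=0):
        # A read served by a lagging replica may return stale data.
        return self.replicas[replica].get(key)

    def propagate(self):
        """Deliver all queued updates (stand-in for asynchronous replication)."""
        while self.pending:
            index, key, value = self.pending.popleft()
            self.replicas[index][key] = value


store = EventuallyConsistentStore()
store.write("title", "v1")
fresh = store.read("title", replica=0)      # replica that took the write
stale = store.read("title", replica=2)      # not yet replicated
store.propagate()
converged = store.read("title", replica=2)  # replicas now agree

print(fresh, stale, converged)  # v1 None v1
```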
Brewer’s CAP Theorem
A distributed system can support only two of the following characteristics:
  Consistency
  Availability
  Partition tolerance
The slides from Brewer’s July 2000 talk do not define these characteristics.

Consistency
all nodes see the same data at the same time – Wikipedia
client perceives that a set of operations has occurred all at once – Pritchett
  More like Atomic in ACID transaction properties

Availability
node failures do not prevent survivors from continuing to operate – Wikipedia
Every operation must terminate in an intended response – Pritchett

Partition Tolerance
the system continues to operate despite arbitrary message loss – Wikipedia
Operations will complete, even if individual components are unavailable – Pritchett

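The trade-off between the three characteristics can be sketched with a deliberately simplified two-node model. This is an illustration under strong assumptions, not a definition of the theorem: the class, the "CP" and "AP" mode labels, and the partitioned flag are all hypothetical. While the nodes cannot talk to each other, the system either refuses writes (stays consistent, loses availability) or accepts them on one node (stays available, loses consistency).

```python
class TwoNodeSystem:
    """Toy model of two replicas that may be cut off from each other."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.values = [None, None]  # one stored value per node
        self.partitioned = False

    def write(self, node_index, value):
        """Return True if the write was accepted."""
        if not self.partitioned:
            # Healthy network: both nodes are updated together.
            self.values = [value, value]
            return True
        if self.mode == "CP":
            return False  # give up availability to stay consistent
        self.values[node_index] = value  # give up consistency to stay available
        return True

    def consistent(self):
        return self.values[0] == self.values[1]


cp = TwoNodeSystem("CP")
cp.write(0, "x")
cp.partitioned = True
cp_accepted = cp.write(0, "y")
cp_consistent = cp.consistent()

ap = TwoNodeSystem("AP")
ap.write(0, "x")
ap.partitioned = True
ap_accepted = ap.write(0, "y")
ap_consistent = ap.consistent()

print(cp_accepted, cp_consistent)  # False True
print(ap_accepted, ap_consistent)  # True False
```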
NoSQL Database Types
Discussing NoSQL databases is complicated because there are a variety of types:
Column Store – Each storage block contains data from only one column
Document Store – stores documents made up of tagged elements
Key-Value Store – Hash table of keys

Other Non-SQL Databases
XML Databases
Graph Databases
CODASYL Databases
Object Oriented Databases
Etc…
Will not address these today

NoSQL Example: Column Store
Each storage block contains data from only one column
Example: Hadoop/HBase
  http://hadoop.apache.org/
  Yahoo, Facebook
Example: Ingres VectorWise
  Column Store integrated with an SQL database
  http://www.ingres.com/products/vectorwise

Column Store Comments
More efficient than row (or document) store if:
  Multiple row/record/documents are inserted at the same time so updates of column blocks can be aggregated
  Retrievals access only some of the columns in a row/record/document

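The two conditions above can be made concrete with a small in-memory sketch. A batch of rows is pivoted into one list per column, so each column is written once per batch, and a retrieval that needs one column touches only that list. The sample rows and the to_columns helper are hypothetical, and the sketch assumes every row carries the same columns.

```python
# Hypothetical batch of rows arriving together.
rows = [
    {"id": 1, "author": "Hare", "year": 2011},
    {"id": 2, "author": "Brewer", "year": 2000},
    {"id": 3, "author": "Dean", "year": 2004},
]


def to_columns(records):
    """Pivot a batch of rows into one list (a stand-in storage block) per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


column_store = to_columns(rows)

# A retrieval that needs one column reads one block; "id" and "author" are untouched.
years = column_store["year"]
latest_year = max(years)

print(years, latest_year)  # [2011, 2000, 2004] 2011
```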
NoSQL Example: Document Store
Example: CouchDB
  http://couchdb.apache.org/
  BBC
Example: MongoDB
  http://www.mongodb.org/
  Foursquare, Shutterfly
JSON – JavaScript Object Notation

CouchDB JSON Example
{
  "_id": "guid goes here",
  "_rev": "314159",
  "type": "abstract",
  "author": "Keith W. Hare",
  "title": "SQL Standard and NoSQL Databases",
  "body": "NoSQL databases (either no-SQL or Not Only SQL) are currently a hot topic in some parts of computing.",
  "creation_timestamp": "2011/05/10 13:30:00 +0004"
}

CouchDB JSON Tags
"_id"
  GUID – Global Unique Identifier
  Passed in or generated by CouchDB
"_rev"
  Revision number
  Versioning mechanism
"type", "author", "title", etc.
  Arbitrary tags
  Schema-less
  Could be validated after the fact by user-written routine

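A sketch of what a user-written, after-the-fact validation routine might look like, using only Python's standard json module. The set of required tags and the string check on "author" are hypothetical application rules, not something CouchDB defines. The routine is plain Python and does not use any CouchDB API.

```python
import json

# Hypothetical application rule: tags every document is expected to carry.
REQUIRED_TAGS = {"_id", "type", "author", "title"}


def validate(document_text):
    """Return a list of problems found in a JSON document; empty means OK."""
    doc = json.loads(document_text)
    problems = [
        f"missing tag: {tag}" for tag in sorted(REQUIRED_TAGS - set(doc))
    ]
    if "author" in doc and not isinstance(doc["author"], str):
        problems.append("author must be a string")
    return problems


good_doc = json.dumps({
    "_id": "guid goes here",
    "_rev": "314159",
    "type": "abstract",
    "author": "Keith W. Hare",
    "title": "SQL Standard and NoSQL Databases",
})
bad_doc = json.dumps({"_id": "guid goes here", "author": 42})

good_problems = validate(good_doc)
bad_problems = validate(bad_doc)

print(good_problems)  # []
print(bad_problems)
```

Because nothing in the store enforces these rules, a document like the second one is accepted on insert and is flagged only when the routine runs.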
NoSQL Examples: Key-Value Store
Hash tables of Keys
Values stored with Keys
Fast access to small data values
Example – Project-Voldemort
  http://www.project-voldemort.com/
  LinkedIn
Example – MemCacheDB
  http://memcachedb.org/
  Backend storage is Berkeley-DB

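A minimal sketch of the key-value idea: a key is hashed to pick a partition, and the value is stored with the key inside that partition. The class, its method names, and the choice of SHA-256 with four partitions are assumptions made for illustration; this does not reflect how Project-Voldemort or MemCacheDB are implemented.

```python
import hashlib


class KeyValueStore:
    """Toy store that hashes each key to one of several in-memory partitions."""

    def __init__(self, n_partitions=4):
        self.partitions = [{} for _ in range(n_partitions)]

    def _partition_for(self, key):
        # A stable hash keeps a key on the same partition across runs.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.partitions)
        return self.partitions[index]

    def put(self, key, value):
        self._partition_for(key)[key] = value

    def get(self, key, default=None):
        return self._partition_for(key).get(key, default)

    def delete(self, key):
        self._partition_for(key).pop(key, None)


kv = KeyValueStore()
kv.put("user:1", "Keith")
kv.put("user:2", "Eric")
kv.put("user:1", "Keith W. Hare")  # same key: the value is overwritten

first = kv.get("user:1")
missing = kv.get("user:3", "not found")
kv.delete("user:2")
stored_keys = sum(len(partition) for partition in kv.partitions)

print(first, missing, stored_keys)  # Keith W. Hare not found 1
```

Access is by key only; there is no way to ask for "all users named Keith" without scanning every partition.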
Map Reduce
Technique for indexing and searching large data volumes
Two Phases, Map and Reduce
  Map
    Extract sets of Key-Value pairs from underlying data
    Potentially in Parallel on multiple machines
  Reduce
    Merge and sort sets of Key-Value pairs
    Results may be useful for other searches

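The two phases can be sketched as a single-process word count. The function names and sample documents are hypothetical, and the sketch runs the map calls one after another; in a real deployment each map call could run on a different machine because it depends only on its own input.

```python
from collections import defaultdict


def map_phase(document):
    """Map: extract (key, value) pairs, here (word, 1), from one document."""
    return [(word.lower(), 1) for word in document.split()]


def reduce_phase(pairs):
    """Reduce: merge pairs by key, sum the values, and sort by key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())


documents = [
    "SQL and NoSQL",
    "NoSQL databases",
    "SQL databases and SQL standards",
]

# Each document is mapped independently of the others.
intermediate = [pair for doc in documents for pair in map_phase(doc)]
word_counts = reduce_phase(intermediate)

print(word_counts)
# [('and', 2), ('databases', 2), ('nosql', 2), ('sql', 3), ('standards', 1)]
```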
Map Reduce
Map Reduce techniques differ across products
Implemented by application developers, not by underlying software

Map Reduce Patent
Google granted US Patent 7,650,331, January 2010
System and method for efficient large-scale data processing
A large-scale data processing system and method includes one or more application-independent map modules configured to read input data and to apply at least one application-specific map operation to the input data to produce intermediate data values, wherein the map operation is automatically parallelized across multiple processors in the parallel processing environment. A plurality of intermediate data structures are used to store the intermediate data values. One or more application-independent reduce modules are configured to retrieve the intermediate data values and to apply at least one application-specific reduce operation to the intermediate data values to provide output data.

Storing and Modifying Data
Syntax varies
  HTML
  JavaScript
  Etc.
Asynchronous – Inserts and updates do not wait for confirmation
Versioned
Optimistic Concurrency

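Versioning and optimistic concurrency can be sketched as follows: every stored value carries a revision number, and an update is accepted only if the writer names the revision it last read. The class, the exception, and the integer revision scheme are hypothetical; products differ in how they represent revisions.

```python
class ConflictError(Exception):
    """Raised when a writer's revision is older than the stored one."""


class VersionedStore:
    def __init__(self):
        self._docs = {}  # key -> (revision, value)

    def get(self, key):
        """Return (revision, value); revision 0 means the key does not exist."""
        return self._docs.get(key, (0, None))

    def put(self, key, value, expected_rev):
        current_rev, _ = self.get(key)
        if expected_rev != current_rev:
            raise ConflictError(f"expected rev {expected_rev}, found {current_rev}")
        self._docs[key] = (current_rev + 1, value)
        return current_rev + 1


docs = VersionedStore()
docs.put("doc", "first draft", expected_rev=0)

# Two writers read the same revision; no locks are taken.
rev_a, _ = docs.get("doc")
rev_b, _ = docs.get("doc")

docs.put("doc", "edit from A", expected_rev=rev_a)  # succeeds

try:
    docs.put("doc", "edit from B", expected_rev=rev_b)  # stale revision
    b_conflicted = False
except ConflictError:
    b_conflicted = True

# Writer B re-reads the current revision and retries.
rev_b, _ = docs.get("doc")
final_rev = docs.put("doc", "edit from B", expected_rev=rev_b)

print(b_conflicted, final_rev, docs.get("doc"))  # True 3 (3, 'edit from B')
```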
Retrieving Data
Syntax Varies
  No set-based query language
  Procedural program languages such as Java, C, etc.
Application specifies retrieval path
No query optimizer
Quick answer is important
May not be a single “right” answer

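The contrast between "application specifies retrieval path" and a set-based query can be sketched by answering the same question both ways. The sample records are hypothetical, and the in-memory list merely stands in for a document store; the declarative side uses Python's built-in sqlite3 module.

```python
import sqlite3

# Hypothetical sample records.
documents = [
    {"type": "abstract", "author": "Author A", "title": "SQL Standard and NoSQL Databases"},
    {"type": "paper", "author": "Author B", "title": "Towards Robust Distributed Systems"},
    {"type": "abstract", "author": "Author C", "title": "Column Stores"},
]


def titles_by_type(docs, wanted_type):
    """Procedural retrieval: the application walks the data step by step."""
    result = []
    for doc in docs:
        if doc.get("type") == wanted_type:
            result.append(doc["title"])
    return sorted(result)


procedural_titles = titles_by_type(documents, "abstract")

# Declarative retrieval: the application states the result it wants and
# the query optimizer chooses the access path.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE document (type TEXT, author TEXT, title TEXT)")
conn.executemany("INSERT INTO document VALUES (:type, :author, :title)", documents)
declarative_titles = [
    row[0]
    for row in conn.execute(
        "SELECT title FROM document WHERE type = ? ORDER BY title", ("abstract",)
    )
]

print(procedural_titles == declarative_titles)  # True
```

Both versions return the same titles. The difference is where the access logic lives: in the loop written by the programmer, or in the database engine.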
Open Source
Small upfront software costs
Suitable for large scale distribution on commodity hardware

NoSQL Summary
NoSQL databases reject:
  Overhead of ACID transactions
  “Complexity” of SQL
  Burden of up-front schema design
  Declarative query expression
  Yesterday’s technology
Programmer responsible for
  Step-by-step procedural language
  Navigating access path

Summary
SQL Databases
  Predefined Schema
  Standard definition and interface language
  Tight consistency
  Well defined semantics
NoSQL Databases
  No predefined Schema
  Per-product definition and interface language
  Getting an answer quickly is more important than getting a correct answer

Questions?

Web References
“NoSQL -- Your Ultimate Guide to the Non-Relational Universe!” http://nosql-database.org/links.html
“NoSQL (RDBMS)” http://en.wikipedia.org/wiki/NoSQL
PODC Keynote, July 19, 2000. Towards Robust Distributed Systems. Dr. Eric A. Brewer. Professor, UC Berkeley. Co-Founder & Chief Scientist, Inktomi. www.eecs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf
“Brewer's CAP Theorem” posted by Julian Browne, January 11, 2009. http://www.julianbrowne.com/article/viewer/brewers-cap-theorem
“How to write a CV” Geek & Poke Cartoon http://geekandpoke.typepad.com/geekandpoke/2011/01/nosql.html

Web References
“Exploring CouchDB: A document-oriented database for Web applications”, Joe Lennon, Software developer, Core International. http://www.ibm.com/developerworks/opensource/library/os-couchdb/index.html
“Graph Databases, NOSQL and Neo4j” Posted by Peter Neubauer on May 12, 2010 at: http://www.infoq.com/articles/graph-nosql-neo4j
“Cassandra vs MongoDB vs CouchDB vs Redis vs Riak vs HBase comparison”, Kristóf Kovács. http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
“Distinguishing Two Major Types of Column-Stores” Posted by Daniel Abadi on March 29, 2010 http://dbmsmusings.blogspot.com/2010/03/distinguishing-two-major-types-of_29.html

Web References
“MapReduce: Simplified Data Processing on Large Clusters”, Jeffrey Dean and Sanjay Ghemawat, December 2004. http://labs.google.com/papers/mapreduce.html
“Scalable SQL”, ACM Queue, Michael Rys, April 19, 2011 http://queue.acm.org/detail.cfm?id=1971597
“a practical guide to noSQL”, Posted by Denise Miura on March 17, 2011 at http://blogs.marklogic.com/2011/03/17/a-practical-guide-to-nosql/

Books
“CouchDB: The Definitive Guide”, J. Chris Anderson, Jan Lehnardt and Noah Slater. O’Reilly Media Inc., Sebastopol, CA, USA. 2010
“Hadoop: The Definitive Guide”, Tom White. O’Reilly Media Inc., Sebastopol, CA, USA. 2011
“MongoDB: The Definitive Guide”, Kristina Chodorow and Michael Dirolf. O’Reilly Media Inc., Sebastopol, CA, USA. 2010