Is NoSQL the Future of Data Storage?

By Gary Short

Developer Express

Introduction

• Gary Short

• Technical Evangelist for Developer Express

• C# MVP

• www.garyshort.org

• @garyshort

2

What About You Guys?

3

Breadth-First Look @ NoSQL

4

We’ll Be Doing 3 Things

1. Define NoSQL databases

2. Look at scenarios where you can use NoSQL

3. Drill into a specific use case.

5

6

Where Does NoSQL Originate?

• 1998

– Open-source relational database

• Created by Carlo Strozzi

• Didn’t expose an SQL interface

• Called NoSQL

• The author said:

• “departs from the relational model altogether...”

• “...should have been called ‘NoREL’”.

7

More Recently...

• Eric Evans reintroduced the term in 2009

– For an event organised by Johan Oskarsson (Last.fm)

• The event was held to discuss open-source distributed databases

• The term now labels a growing number of datastores that are:

– Open source

– Non-relational

– Distributed

– Often without ACID guarantees.

8

Atlanta 2009

• No:sql(east) conference

• Billed as a “conference of no-rel datastores”

• Worst tagline ever: SELECT fun, profit FROM real_world WHERE rel=false.

9

Not Anti-RDBMS

10

Let’s Talk a Bit About What NoSQL DBs Look Like...

11

Key Attributes of NoSQL Databases

• Don’t require fixed table schemas

• Non-relational

• (Usually) avoid join operations

• Scale horizontally

– Adding more nodes to a storage system.

12
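
Scaling horizontally means spreading the data over more machines, so something has to decide which node owns which key. One common technique is consistent hashing (Cassandra’s token ring is based on it). The sketch below is a minimal, hypothetical illustration of that idea, not any product’s implementation: `HashRing`, `ring_position`, the node names and the keys are all invented for the example. Adding a fourth node moves only the keys that now fall to the new node, roughly a quarter of them.

```python
import bisect
import hashlib


def ring_position(value: str) -> int:
    """Map a string to a point on the ring with a stable hash (MD5, not Python's hash())."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hash ring: a key belongs to the first node point at or after it."""

    def __init__(self, nodes, points_per_node=100):
        self.points_per_node = points_per_node  # virtual points smooth out the distribution
        self._points = []                       # sorted ring positions
        self._owner = {}                        # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        for i in range(self.points_per_node):
            point = ring_position(f"{node}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = node

    def node_for(self, key):
        # First point at or after the key's position, wrapping round to the start of the ring.
        idx = bisect.bisect_left(self._points, ring_position(key)) % len(self._points)
        return self._owner[self._points[idx]]


if __name__ == "__main__":
    keys = [f"user:{n}" for n in range(10_000)]  # made-up keys
    ring = HashRing(["node-a", "node-b", "node-c"])
    before = {k: ring.node_for(k) for k in keys}

    ring.add_node("node-d")                       # scale out by one node
    moved = [k for k in keys if ring.node_for(k) != before[k]]
    print(f"{len(moved) / len(keys):.0%} of keys moved, all of them to node-d")
```

For comparison, naive `hash(key) % number_of_nodes` placement would move about three quarters of the keys when going from three nodes to four.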

What Does the Taxonomy Look Like?

13

Document Store

• RavenDB

• Apache Jackrabbit

• CouchDB

• MongoDB

• SimpleDB

• XML Databases

– MarkLogic Server

– eXist.

14

Document What?

15
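
A document store keeps self-describing records (often JSON) instead of rows in a fixed table schema, so two documents in the same collection can carry different fields. The sketch below is a hypothetical in-memory “collection” written only to illustrate that idea; `Collection`, its methods and the product data are invented, and this is not the API of RavenDB, CouchDB, MongoDB or any other store listed above.

```python
from __future__ import annotations

import json
from typing import Any


class Collection:
    """Hypothetical schema-free collection: documents are plain dicts keyed by an id string."""

    def __init__(self) -> None:
        self._docs: dict[str, dict[str, Any]] = {}

    def put(self, doc_id: str, doc: dict[str, Any]) -> None:
        # Round-trip through JSON: stores an independent copy and rejects non-JSON values.
        self._docs[doc_id] = json.loads(json.dumps(doc))

    def get(self, doc_id: str) -> dict[str, Any] | None:
        return self._docs.get(doc_id)

    def find(self, **criteria: Any) -> list[dict[str, Any]]:
        # A document matches if every named field exists and has the given value.
        return [doc for doc in self._docs.values()
                if all(field in doc and doc[field] == value
                       for field, value in criteria.items())]


if __name__ == "__main__":
    products = Collection()
    # No fixed schema: these two (made-up) documents have different fields.
    products.put("products/1", {"name": "Widget", "price": 9.99, "colours": ["red", "blue"]})
    products.put("products/2", {"name": "Gadget", "price": 19.99, "weight_kg": 1.2})
    print(products.find(name="Gadget"))
    # [{'name': 'Gadget', 'price': 19.99, 'weight_kg': 1.2}]
```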

Graph Storage

• Trinity

• AllegroGraph

• Core Data

• Neo4j

• DEX

• FlockDB.

16

Which Means?

• Graph consists of

– Nodes (‘stations’ of the graph)

– Edges (lines between them)

• FlockDB

– Created by the Twitter folks

– Nodes = Users

– Edges = Nature of relationship between nodes.

17
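
In a social graph, a node is a user and an edge records what kind of relationship links two users. A minimal sketch, assuming edges are stored as (source, relationship, destination) entries indexed in both directions; `SocialGraph`, the relationship names “follows” and “blocks”, and the user names are hypothetical illustrations, not FlockDB’s interface.

```python
from collections import defaultdict


class SocialGraph:
    """Hypothetical graph: nodes are user ids, each edge carries the kind of relationship."""

    def __init__(self):
        self.outgoing = defaultdict(set)  # (source, relation) -> set of destinations
        self.incoming = defaultdict(set)  # (destination, relation) -> set of sources

    def add_edge(self, source, relation, destination):
        self.outgoing[(source, relation)].add(destination)
        self.incoming[(destination, relation)].add(source)

    def remove_edge(self, source, relation, destination):
        self.outgoing[(source, relation)].discard(destination)
        self.incoming[(destination, relation)].discard(source)

    def followers(self, user):
        return self.incoming[(user, "follows")]

    def following(self, user):
        return self.outgoing[(user, "follows")]


if __name__ == "__main__":
    g = SocialGraph()
    g.add_edge("alice", "follows", "bob")
    g.add_edge("carol", "follows", "bob")
    g.add_edge("bob", "blocks", "mallory")
    print(sorted(g.followers("bob")))    # ['alice', 'carol']
    print(sorted(g.following("alice")))  # ['bob']
```

Keeping both an outgoing and an incoming index costs a second write per edge but makes “who follows me” as cheap to answer as “who do I follow”.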

Social Graph

18

Key/Value Stores

• On disk

• Cache in RAM

• Eventually Consistent

– Weak Definition

• “If no updates occur for a period, eventually all updates will propagate through the system and all replicas will be consistent”

– Strong Definition

• “For a given update and a given replica, eventually either the update reaches the replica or the replica retires”

• Ordered

– Keys are kept in sorted order, which allows lexicographical (range) processing.

19
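
To make the weak definition of eventual consistency concrete, here is a minimal, hypothetical sketch of replicated key/value data. It assumes a last-writer-wins rule (each value carries a timestamp and the newer one wins) and a pairwise anti-entropy sync between replicas; that conflict rule is one common choice, not something the slide prescribes, and `Replica` and the cart data are invented. Once updates stop and the replicas have exchanged data, they all agree.

```python
import itertools


class Replica:
    """Hypothetical copy of a key/value store; each value is tagged with its write time."""

    def __init__(self, name):
        self.name = name
        self.data = {}  # key -> (timestamp, value)

    def write(self, key, value, timestamp):
        self._apply(key, timestamp, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None

    def _apply(self, key, timestamp, value):
        # Last-writer-wins: keep whichever version has the larger timestamp.
        # (Equal timestamps would need a tie-breaker such as a replica id; omitted here.)
        current = self.data.get(key)
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def sync_with(self, other):
        # Anti-entropy exchange: each side adopts any newer versions the other holds.
        for key, (ts, value) in list(other.data.items()):
            self._apply(key, ts, value)
        for key, (ts, value) in list(self.data.items()):
            other._apply(key, ts, value)


if __name__ == "__main__":
    a, b, c = Replica("a"), Replica("b"), Replica("c")
    a.write("cart:42", ["book"], timestamp=1)
    c.write("cart:42", ["book", "pen"], timestamp=2)  # a later update on another replica
    print([r.read("cart:42") for r in (a, b, c)])     # replicas disagree for now

    # No further updates arrive: let every pair of replicas sync once.
    for x, y in itertools.combinations((a, b, c), 2):
        x.sync_with(y)
    print([r.read("cart:42") for r in (a, b, c)])     # all three now agree
```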

Object Databases

• Db4o

• GemStone/S

• InterSystems Caché

• Objectivity/DB

• ZODB.

20

How the &*$% do You Index That?!

21

Okay, Got It. Now Let’s Compare Some Real-World Scenarios

22

You Need Constant Consistency

• You’re dealing with financial transactions

• You’re dealing with medical records

• You’re dealing with bonded goods

• Best you use an RDBMS ☺.

23
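
What constant consistency buys you from an RDBMS is the transaction: a group of changes that is applied completely or not at all. A minimal sketch using Python’s built-in `sqlite3` module with an in-memory database; the `accounts` table, the balances and the `transfer` helper are hypothetical, invented for this example. An overdraft violates the CHECK constraint, so the whole transfer is rolled back, including the credit that had already been written.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, "
                 "balance INTEGER NOT NULL CHECK (balance >= 0))")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move money between accounts: both updates commit, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes the block
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False  # the CHECK constraint rejected an overdraft; nothing was applied


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


if __name__ == "__main__":
    db = make_db()
    print(transfer(db, "alice", "bob", 30), balances(db))   # True {'alice': 70, 'bob': 80}
    print(transfer(db, "alice", "bob", 500), balances(db))  # False {'alice': 70, 'bob': 80}
```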

You Need Horizontal Scalability

• You’re working across defined geographic regions

• You’re working with large quantities of data

• Game server sharding

• Use NoSQL

– Something like Cassandra.

24

Up in the Clouds Baby

25

26

Frequently Written, Rarely Read

• Think web counters and the like

• Every time a user visits a page: ctr++

• But it’s only read when the report is run

• Use NoSQL (key-value storage/memcache).

27
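
The web-counter pattern is a cheap blind increment on every request and an occasional read when the report runs. A minimal in-process sketch of that access pattern, assuming a lock-protected `collections.Counter` stands in for the key/value store (with memcached the equivalent write would be its atomic `incr` command); `PageCounter`, `record_hit`, `report` and the page paths are hypothetical.

```python
import threading
from collections import Counter


class PageCounter:
    """Hypothetical write-heavy, read-rarely counters: one increment per page view."""

    def __init__(self):
        self._counts = Counter()
        self._lock = threading.Lock()  # increments may arrive from many request threads

    def record_hit(self, page):
        with self._lock:
            self._counts[page] += 1    # the ctr++ on every visit

    def report(self):
        with self._lock:
            return dict(self._counts.most_common())  # snapshot, busiest pages first


if __name__ == "__main__":
    counter = PageCounter()
    pages = ["/home", "/pricing", "/home", "/blog", "/home"]

    def visitor():
        for page in pages:
            counter.record_hit(page)

    threads = [threading.Thread(target=visitor) for _ in range(8)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(counter.report())  # {'/home': 24, '/pricing': 8, '/blog': 8}
```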

I Got Big Data!

28

Binary Baby!

• If you are YouTube

• Flickr

• Twitpic

• Spotify

• NoSQL (Amazon S3).

29

Here Today, Gone Tomorrow

• Transient data like...

– Web Sessions

– Locks

– Short-Term Stats

– Shopping cart contents

• Use NoSQL (Memcache).

30
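
Transient data is usually stored with an expiry time so it disappears on its own; memcached, for instance, accepts an expiration time with each item. A minimal in-process sketch of that idea; the `TTLCache` class, its parameters and the session data are hypothetical, and the clock is injectable so expiry can be shown without actually waiting.

```python
import time


class TTLCache:
    """Hypothetical key/value store where each entry expires `ttl` seconds after being set."""

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._items = {}  # key -> (expires_at, value)

    def set(self, key, value, ttl=None):
        lifetime = self.ttl if ttl is None else ttl
        self._items[key] = (self.clock() + lifetime, value)

    def get(self, key, default=None):
        entry = self._items.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._items[key]  # lazily evict expired data when it is read
            return default
        return value


if __name__ == "__main__":
    now = [0.0]                                   # fake clock for the demo
    sessions = TTLCache(ttl=1800, clock=lambda: now[0])
    sessions.set("session:abc", {"user": "alice", "cart": ["book"]})
    print(sessions.get("session:abc"))            # {'user': 'alice', 'cart': ['book']}
    now[0] += 1801                                # 30 minutes and a second later
    print(sessions.get("session:abc"))            # None: the session has expired
```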

Data Replication

• Same data in two or more locations

– Music Library

• Web browser

• iPhone App

• NoSQL (CouchDB).

31

Hit me Baby One More Time!

• High Availability

– High number of important transactions

• Online gambling

• Pay-per-view

– Ahem!

• Online Auction

• NoSQL (Cassandra – automatic clustering).

32

Give Me a Real-World Example

• Twitter

– The challenges

• Needs to store many graphs

– Who you are following

– Who’s following you

– Who you receive phone notifications from, etc.

• Delivering a tweet requires rapidly paging through followers

• Heavy write load as followers are added and removed

• Set arithmetic for @mentions (intersecting sets of users).

33
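
Set arithmetic over follower lists is what the @mention requirement comes down to; for example, finding the users who follow both of two accounts is an intersection. A minimal sketch, assuming follower lists are held as Python sets keyed by hypothetical user names, with results handed back a page at a time because the slide stresses paging through followers; `common_followers`, `pages` and the data are invented for illustration.

```python
from itertools import islice


def common_followers(followers, a, b):
    """Users who follow both a and b: the intersection of their follower sets."""
    return followers.get(a, set()) & followers.get(b, set())


def pages(user_ids, page_size):
    """Yield results in fixed-size pages (sorted first so paging is stable)."""
    it = iter(sorted(user_ids))
    while page := list(islice(it, page_size)):
        yield page


if __name__ == "__main__":
    followers = {  # hypothetical data: user -> set of their followers
        "author": {"ann", "ben", "cat", "dan", "eve"},
        "mentioned": {"ben", "dan", "eve", "fay"},
    }
    both = common_followers(followers, "author", "mentioned")
    for page in pages(both, page_size=2):
        print(page)  # ['ben', 'dan'] then ['eve']
```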

What Did They Try?

• Relational Databases

• Key-Value storage of denormalized lists

34

Did it Work?

35

What Did They Need?

• Simplest possible thing that would work

• Allow for horizontal partitioning

• Allow write operations to

– Arrive out of order

– Or be processed more than once

• Failures should result in redundant work

– Not lost work!

36

The Result was FlockDB

• Stores graph data

• Not optimised for graph traversal operations

• Optimised for large adjacency lists

– For each node, the list of edges connecting it to other nodes

• Each edge is a set of end points (or a tuple if directed)

• Optimised for fast reads and writes

• Optimised for pageable set arithmetic.

37
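
Large adjacency lists with pageable reads mean a client never loads a node’s whole edge list at once. A minimal sketch of that idea, assuming each node’s neighbours are kept in a sorted list of integer ids and a client pages through them with a cursor (the last id it saw); `AdjacencyList`, `page` and the ids are hypothetical, not FlockDB’s interface.

```python
import bisect


class AdjacencyList:
    """Hypothetical store: for each node, a sorted list of neighbour ids, read a page at a time."""

    def __init__(self):
        self._edges = {}  # node id -> sorted list of neighbour ids

    def add(self, node, neighbour):
        row = self._edges.setdefault(node, [])
        i = bisect.bisect_left(row, neighbour)
        if i == len(row) or row[i] != neighbour:  # ignore duplicate edges
            row.insert(i, neighbour)

    def page(self, node, cursor=None, limit=100):
        """Return up to `limit` neighbours after `cursor`, plus the cursor for the next page."""
        row = self._edges.get(node, [])
        start = 0 if cursor is None else bisect.bisect_right(row, cursor)
        chunk = row[start:start + limit]
        next_cursor = chunk[-1] if start + limit < len(row) else None
        return chunk, next_cursor


if __name__ == "__main__":
    followers = AdjacencyList()
    for follower_id in (42, 7, 19, 3, 88):  # made-up follower ids of node 1
        followers.add(1, follower_id)
    cursor = None
    while True:
        chunk, cursor = followers.page(1, cursor, limit=2)
        print(chunk)                        # [3, 7], then [19, 42], then [88]
        if cursor is None:
            break
```

Using the last seen id as the cursor, rather than a numeric offset, keeps pages stable when new edges are added while a client is still paging.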

How Does it Work?

• Stores graphs as sets of edges between nodes

• Data is partitioned by node

– All queries can be answered by a single partition

• Write operations are idempotent

– Can be applied multiple times without changing the result

• And commutative

– Changing the order of operands doesn’t change the result.

38
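
Partitioning by node means every edge is filed on the shard that owns the node it belongs to, so a question like “whom does user 1 follow?” touches exactly one shard. A minimal sketch, assuming a fixed, hypothetical shard count chosen by hashing the node id, and writing each edge twice (once under each end point) so both directions are single-shard reads; `PartitionedGraph`, `shard_for` and `NUM_SHARDS` are invented names, and a real sharding layer is considerably more involved. Because edges live in sets, replaying a write is harmless, which is the idempotency described above.

```python
import zlib

NUM_SHARDS = 4  # hypothetical shard count


def shard_for(node_id):
    """Pick a shard from a stable hash of the node id (crc32, so it survives restarts)."""
    return zlib.crc32(str(node_id).encode("utf-8")) % NUM_SHARDS


class PartitionedGraph:
    """Hypothetical sketch: edges are filed on the shard that owns the node they belong to."""

    def __init__(self):
        # One dict per shard: (node, direction) -> set of neighbouring node ids.
        self.shards = [{} for _ in range(NUM_SHARDS)]

    def _store(self, node, direction, neighbour):
        rows = self.shards[shard_for(node)]
        rows.setdefault((node, direction), set()).add(neighbour)  # set.add is idempotent

    def _read(self, node, direction):
        return set(self.shards[shard_for(node)].get((node, direction), set()))

    def add_edge(self, source, destination):
        # Write the edge twice, once under each end point, so that both
        # questions below are answered by a single shard.
        self._store(source, "forward", destination)
        self._store(destination, "backward", source)

    def following(self, node):
        return self._read(node, "forward")

    def followers(self, node):
        return self._read(node, "backward")


if __name__ == "__main__":
    graph = PartitionedGraph()
    for _ in range(3):        # a retried write does no harm
        graph.add_edge(1, 2)  # user 1 follows user 2 (made-up ids)
    graph.add_edge(3, 2)
    print(sorted(graph.following(1)), sorted(graph.followers(2)))  # [2] [1, 3]
```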

A Little More About Idempotency

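• An operation that can be applied several times with no change to the result

• A binary operation $\circ$ on a set $S$ is called idempotent if, for all $x \in S$, $x \circ x = x$

• Set union

– $A \cup B = \{x : x \in A \text{ or } x \in B\}$, so $A \cup A = A$

• Set intersection

– $A \cap B = \{x : x \in A \text{ and } x \in B\}$, so $A \cap A = A$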

39

A Little More About Commutative

• Changing the order of operands doesn’t change the result

– e.g. 3 + 2 = 2 + 3 = 5

• Can be combined with idempotency

• Let’s look at the follow command in Twitter

• Let X = follow person X

• Let Y = follow person Y

• Then 3X + 2Y = 2Y + 3X (commutativity: the order doesn’t matter)

• And 2X + 3Y = 3X + 2Y (idempotency: both reduce to X + Y)

• Note: this only holds when all the writes are the same operation (e.g. all follows).

40

Commutative Writes Help Bring Up Partitions

• A new partition can receive write traffic immediately

• It receives a dump of the existing data in the background

• It goes live for reads as soon as the dump is complete.

41
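
One way to make writes both idempotent and commutative, even when follows and unfollows are mixed (the case the note on the follow example excludes), is to timestamp every write and keep, for each edge, only the write with the latest timestamp. The sketch below assumes exactly that; the write format, `apply`, `replay` and the data are hypothetical, and equal timestamps are not handled. It brings up a new partition the way the slide describes: live writes arrive first, the old partition’s data is dumped in afterwards in shuffled order, some writes arrive twice, and the result still matches the old partition.

```python
import random

# A write is (timestamp, source, destination, state), with state "follow" or "unfollow".


def apply(partition, write):
    """Keep, per edge, the write with the latest timestamp: idempotent and order-independent."""
    timestamp, source, destination, state = write
    edge = (source, destination)
    current = partition.get(edge)
    if current is None or timestamp > current[0]:  # equal timestamps are not handled here
        partition[edge] = (timestamp, state)


def replay(writes):
    partition = {}
    for write in writes:
        apply(partition, write)
    return partition


if __name__ == "__main__":
    history = [(1, "ann", "X", "follow"), (2, "ann", "Y", "follow"),
               (3, "ann", "X", "unfollow"), (4, "ben", "X", "follow")]
    old_partition = replay(history)

    # New partition: live writes arrive first, the dump of old data follows in a
    # shuffled order, and the live writes are delivered a second time.
    live = [(5, "ann", "Y", "unfollow"), (4, "ben", "X", "follow")]
    dump = history[:]
    random.shuffle(dump)
    new_partition = replay(live + dump + live)

    for write in live:  # the old partition sees the same live traffic
        apply(old_partition, write)
    print(new_partition == old_partition)  # True
```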

Performance?

• Currently stores 13 billion edges

• 20K writes / second

• 100K reads / second.

42

Punchline?

• Under all the bells and whistles...

– It’s MySQL ☺.

43

So is this the Future?

• Yes!

• And No!

44

What?! How Can That be?!

45

