
1

Introduction to Big Data and NoSQL

SQL Azure Saturday

April 21, 2012

Don Demsak

Advisory Solutions Architect

EMC Consulting

www.donxml.com

2

Meet Don

• Advisory Solutions Architect – EMC Consulting
• Application Architecture, Development & Design
• DonXml.com, Twitter: donxml
• Email – [email protected]
• SlideShare – http://www.slideshare.net/dondemsak

3

The era of Big Data

4

How did we get here?
• Expensive
  – Processors
  – Disk space
  – Memory
  – Operating Systems
  – Software
  – Programmers
• Monoculture
  – Limit CPU cycles
  – Limit disk space
  – Limit memory
  – Limited OS Development
  – Limited Software
  – Programmers
    • Mono-lingual
    • Mono-persistence

5

Typical RDBMS Implementations
• Fixed table schemas
• Small but frequent reads/writes
• Large batch transactions
• Focus on ACID
  – Atomicity
  – Consistency
  – Isolation
  – Durability

6

How we scale RDBMS implementations

7

1st Step – Build a relational database

Database

8

2nd Step – Table Partitioning

Database

p1 p2 p3
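A minimal sketch of table partitioning, assuming a range partition on a numeric key. The partition names p1 to p3 come from the slide; the `orderId` column and the boundary values are invented for illustration.

```javascript
// Hypothetical range boundaries: p1 holds ids below 1000, p2 ids below 2000, p3 the rest.
const partitions = [
  { name: "p1", upperBound: 1000 },
  { name: "p2", upperBound: 2000 },
  { name: "p3", upperBound: Infinity },
];

// Return the name of the first partition whose upper bound is above the key.
function partitionFor(orderId) {
  return partitions.find((p) => orderId < p.upperBound).name;
}

// Group rows by partition so that each partition can be stored and scanned on its own.
function partitionRows(rows) {
  const byPartition = { p1: [], p2: [], p3: [] };
  for (const row of rows) {
    byPartition[partitionFor(row.orderId)].push(row);
  }
  return byPartition;
}

const partitioned = partitionRows([{ orderId: 5 }, { orderId: 1500 }, { orderId: 9000 }]);
console.log(partitioned);
```

A query that filters on the partition key only has to touch one of the three partitions, which is the point of this step.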

9

3rd Step – Database Partitioning

[Diagram: three separate stacks, one each for Customer #1, Customer #2, and Customer #3, each made up of Browser, Web Tier, B/L Tier, and Database]

10

4th Step – Move to the cloud?

[Diagram: three separate stacks, one each for Customer #1, Customer #2, and Customer #3, each made up of Browser, Web Tier, B/L Tier, and SQL Azure Federation]

11

There have to be other ways

12

Polyglot Persistence

13

Polyglot Programmer

14

15

Where Did NoSQL Originate?
• 1998 - Carlo Strozzi
  – NoSQL project - lightweight open-source relational DB with no SQL interface
• 2009 - Johan Oskarsson of Last.fm wanted to organize an event to discuss open-source distributed databases
  – Eric Evans (then at Rackspace) suggested the name "NoSQL" for it

16

NoSQL (loose) Definition

• (often) Open source

• Non-relational

• Distributed

• (often) don’t guarantee ACID

17

Atlanta 2009
• No:sql(east) conference
  – select fun, profit from real_world where relational=false
• Billed as “conference of no-rel datastores”

18

Types Of NoSQL Data Stores

19

5 Groups of Data Models

Relational

Document

Key Value

Graph

Column Family

20

Document Store

• Apache Jackrabbit

• CouchDB

• MongoDB

• SimpleDB

• XML Databases
  – MarkLogic Server
  – eXist

21

Document?
• Okay, think of a web page...
  – Relational model requires a column for every tag
  – Lots of empty columns
  – Wasted space
• Document model just stores the pages as is
  – Saves on space
  – Very flexible
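The contrast above can be sketched in a few lines. This is only an illustration: the two page documents and their field names are made up, and the "relational layout" is simulated with plain arrays rather than a real database.

```javascript
// Two hypothetical "web page" documents with different sets of fields.
const pages = [
  { url: "/home", title: "Home", tags: ["intro"] },
  { url: "/about", title: "About", author: "Ann", lastReviewed: "2012-04-21" },
];

// Relational view: every field seen in any page becomes a column.
const columns = [...new Set(pages.flatMap((page) => Object.keys(page)))];

// Each row must fill every column, so fields a page lacks turn into nulls.
const rows = pages.map((page) => columns.map((column) => page[column] ?? null));

// Count the cells that exist only to hold "nothing".
const emptyCells = rows.flat().filter((cell) => cell === null).length;

console.log(columns);
console.log(`empty cells in the relational layout: ${emptyCells}`);
```

The document form stores only the fields each page actually has; the table form grows a column for every field any page has ever used.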

22

Graph Storage

• AllegroGraph

• Core Data

• Neo4j

• DEX

• FlockDB

• Microsoft Trinity (research project)
  – http://research.microsoft.com/en-us/projects/trinity/

23

What’s a graph?
• Graph consists of
  – Nodes (‘stations’ of the graph)
  – Edges (lines between them)
• FlockDB
  – Created by the Twitter folks
  – Nodes = Users
  – Edges = Nature of relationship between nodes
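A small sketch of the node/edge idea, using users as nodes and "follows" relationships as edges. The user names and the edge shape (`from`, `type`, `to`) are assumptions made for this example and are not FlockDB's data model.

```javascript
// Edges as (source, relationship, destination) records; the user names are made up.
const edges = [
  { from: "ann", type: "follows", to: "bob" },
  { from: "bob", type: "follows", to: "ann" },
  { from: "cat", type: "follows", to: "ann" },
];

// The nodes are every user that appears at either end of an edge.
const nodes = new Set(edges.flatMap((edge) => [edge.from, edge.to]));

// Followers of a user: the sources of all "follows" edges pointing at that user.
function followersOf(user) {
  return edges
    .filter((edge) => edge.type === "follows" && edge.to === user)
    .map((edge) => edge.from);
}

console.log([...nodes]);
console.log(followersOf("ann"));
```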

24

Key/Value Stores
• On disk
• Cache in RAM
• Eventually Consistent
  – Weak Definition
    • “If no updates occur for a period, eventually all updates will propagate through the system and all replicas will be consistent”
  – Strong Definition
    • “for a given update and a given replica eventually either the update reaches the replica or the replica retires”
• Ordered
  – Distributed Hash Table allows lexicographical processing
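To make the "Ordered" point concrete, here is a sketch of a key/value store that can answer lexicographic range queries. It is a single in-memory `Map`, not a distributed hash table, and the `OrderedStore` class and the key naming scheme are invented for the example.

```javascript
// A tiny in-memory ordered key/value store (a stand-in, not a distributed hash table).
class OrderedStore {
  constructor() {
    this.data = new Map();
  }

  put(key, value) {
    this.data.set(key, value);
  }

  get(key) {
    return this.data.get(key);
  }

  // Return [key, value] pairs with start <= key < end, in lexicographic key order.
  range(start, end) {
    return [...this.data.entries()]
      .filter(([key]) => key >= start && key < end)
      .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  }
}

const store = new OrderedStore();
store.put("user:bob", 2);
store.put("order:17", 99);
store.put("user:ann", 1);

// ";" is the character right after ":", so this range covers every "user:" key.
const users = store.range("user:", "user;");
console.log(users);
```

A real ordered store keeps keys sorted on disk so a range scan does not need the filter-and-sort pass used here.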

25

Key/Value Examples
• Azure AppFabric Cache
• Memcached
• VMware vFabric GemFire

26

Object Databases

• db4o

• GemStone/S

• InterSystems Caché

• Objectivity/DB

• ZODB

27

Tabular

• BigTable

• Mnesia

• HBase

• Hypertable

• Azure Table Storage

• SQL Server 2012

28

Azure Table Storage Demo

29

Big Data

30

Big Data Definition

• Volumes & volumes of data

• Unstructured

• Semi-structured

• Not suited for Relational Databases

• Often utilizes MapReduce frameworks

31

Big Data Examples

• Cassandra

• Hadoop

• Greenplum

• Azure Storage

• EMC Atmos

• Amazon S3

• SQL Azure (with Federations support)

32

Real World Example
• Twitter
  – The challenges
    • Needs to store many graphs
      – Who you are following
      – Who’s following you
      – Who you receive phone notifications from, etc.
    • To deliver a tweet requires rapid paging of followers
    • Heavy write load as followers are added and removed
    • Set arithmetic for @mentions (intersection of users)

33

What did they try?
• Started with Relational Databases
• Tried Key-Value storage of denormalized lists
• Did it work?
  – Nope
• Either good at
  – Handling the write load
  – Or paging large amounts of data
  – But not both

34

What did they need?
• Simplest possible thing that would work
• Allow for horizontal partitioning
• Allow write operations to
  – Arrive out of order
  – Or be processed more than once
• Failures should result in redundant work
  – Not lost work!

35

The Result was FlockDB
• Stores graph data
• Not optimized for graph traversal operations
• Optimized for large adjacency lists
  – List of all edges in a graph
    • Key is the edge; value is a set of the node end points
• Optimized for fast read and write
• Optimized for page-able set arithmetic
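"Page-able set arithmetic" can be pictured as an intersection of two follower lists that is then read one page at a time, for example to find who should see an @mention. The sketch below works on plain arrays; the follower ids and the `intersect` and `page` helpers are hypothetical and say nothing about how FlockDB implements this internally.

```javascript
// Hypothetical follower lists (user ids).
const followersOfAnn = [3, 5, 8, 13, 21, 34];
const followersOfBob = [5, 8, 9, 21, 40];

// Intersection of two id lists: users who follow both accounts.
function intersect(a, b) {
  const inB = new Set(b);
  return a.filter((id) => inB.has(id));
}

// Return one page of results plus the cursor to pass in for the next page.
function page(ids, cursor, pageSize) {
  const items = ids.slice(cursor, cursor + pageSize);
  const nextCursor = cursor + pageSize < ids.length ? cursor + pageSize : null;
  return { items, nextCursor };
}

const audience = intersect(followersOfAnn, followersOfBob);
const firstPage = page(audience, 0, 2);
const secondPage = page(audience, firstPage.nextCursor, 2);
console.log(audience, firstPage, secondPage);
```

A `nextCursor` of `null` signals that the last page has been read.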

36

How Does it Work?
• Stores graphs as sets of edges between nodes
• Data is partitioned by node
  – All queries can be answered by a single partition
• Write operations are idempotent
  – Can be applied multiple times without changing the result
• And commutative
  – Changing the order of operands doesn’t change the result
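One common way to get writes that are both idempotent and commutative is to stamp each write with a timestamp and let the newest one win. The sketch below illustrates that idea only; it is not FlockDB's code, the field names are invented, and it assumes no two writes to the same edge share a timestamp.

```javascript
// Each write carries its own timestamp; the newest write for an edge wins.
function applyWrite(store, write) {
  const key = `${write.from}->${write.to}`;
  const current = store.get(key);
  if (current === undefined || write.at > current.at) {
    store.set(key, { state: write.state, at: write.at });
  }
  return store;
}

// Apply a sequence of writes to an empty store and return the result.
function replay(sequence) {
  return sequence.reduce(applyWrite, new Map());
}

// Serialize a store so two results can be compared.
const snapshot = (store) => JSON.stringify([...store]);

// Hypothetical history: ann follows bob, unfollows, then follows again.
const writes = [
  { from: "ann", to: "bob", state: "added", at: 1 },
  { from: "ann", to: "bob", state: "removed", at: 2 },
  { from: "ann", to: "bob", state: "added", at: 3 },
];

const inOrder = snapshot(replay(writes));
const reversed = snapshot(replay([...writes].reverse())); // out-of-order delivery
const duplicated = snapshot(replay([...writes, ...writes])); // every write delivered twice
console.log(inOrder === reversed, inOrder === duplicated);
```

Because replays and reorderings converge on the same state, a failed batch can simply be sent again, which is the "redundant work, not lost work" requirement from the earlier slide.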

37

Working With Big Data

38

ACID
• Atomicity
  – All or Nothing
• Consistency
  – Valid according to all defined rules
• Isolation
  – No transaction should be able to interfere with another transaction
• Durability
  – Once a transaction has been committed, it will remain so, even in the event of power loss, crashes, or errors
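A toy illustration of the first two properties, using an in-memory object in place of a database. The `transact`, `withdraw`, and `deposit` helpers and the account data are hypothetical; isolation and durability need a real storage engine and are not shown.

```javascript
// Atomicity: apply every step to a copy and only hand the copy back if all steps succeed.
function transact(accounts, steps) {
  const draft = { ...accounts };
  for (const step of steps) {
    step(draft); // a step may throw, which aborts the whole transaction
  }
  return draft;
}

// Consistency rule for this example: a balance may never go negative.
function withdraw(name, amount) {
  return (balances) => {
    if (balances[name] < amount) throw new Error("insufficient funds");
    balances[name] -= amount;
  };
}

function deposit(name, amount) {
  return (balances) => {
    balances[name] += amount;
  };
}

let accounts = { ann: 100, bob: 20 };

// Both steps succeed, so the transaction commits.
accounts = transact(accounts, [withdraw("ann", 30), deposit("bob", 30)]);

// The second step fails, so the first step's deposit is discarded as well.
let aborted = false;
try {
  accounts = transact(accounts, [deposit("ann", 50), withdraw("bob", 500)]);
} catch (err) {
  aborted = true;
}
console.log(accounts, aborted);
```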

39

BASE
• Basically Available
  – High availability but not always consistent
• Soft state
  – Background cleanup mechanism
• Eventual consistency
  – Given a sufficiently long period of time over which no changes are sent, all updates can be expected to propagate eventually through the system and all the replicas will be consistent.
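Eventual consistency can be simulated with a handful of replicas and a queue of undelivered updates. Everything here (the `Replica` class, the version numbers, the `propagate` step standing in for a background process) is an assumption made for the sketch, not a description of any particular product.

```javascript
// A replica keeps the value with the highest version it has seen.
class Replica {
  constructor(name) {
    this.name = name;
    this.value = null;
    this.version = 0;
  }

  apply(update) {
    if (update.version > this.version) {
      this.value = update.value;
      this.version = update.version;
    }
  }
}

const replicas = [new Replica("a"), new Replica("b"), new Replica("c")];
const pending = []; // updates that have not reached their target replica yet

// A write is applied to one replica right away and queued for the others.
function write(replica, value, version) {
  const update = { value, version };
  replica.apply(update);
  for (const other of replicas) {
    if (other !== replica) pending.push({ target: other, update });
  }
}

// Background propagation: deliver everything that is still queued.
function propagate() {
  while (pending.length > 0) {
    const { target, update } = pending.shift();
    target.apply(update);
  }
}

const isConsistent = () => replicas.every((r) => r.value === replicas[0].value);

write(replicas[0], "v1", 1);
const consistentBefore = isConsistent(); // replicas b and c have not seen the write yet
propagate();
const consistentAfter = isConsistent();
console.log(consistentBefore, consistentAfter);
```

Between the write and the propagation step, a read from replica b or c returns stale data; that window is the price paid for staying available.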

40

Traditional (relational) Approach

Extract

Transform

Load

Transactional Data Store

Data Warehouse

41

Big Data Approach
• MapReduce Pattern/Framework
  – an Input Reader
  – Map Function – To transform to a common shape (format)
  – a partition function
  – a compare function
  – Reduce Function
  – an Output Writer
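The six parts listed above can be wired together in plain JavaScript to show how they relate. This is a single-process word count written for illustration; a real framework runs the map and reduce steps on many machines, and every function body here is an assumption rather than any framework's API.

```javascript
// Input reader: split raw text into records (one per non-empty line).
const inputReader = (text) => text.split("\n").filter((line) => line.length > 0);

// Map: turn each record into [key, value] pairs with a common shape.
const map = (line) =>
  line.toLowerCase().split(/\s+/).filter(Boolean).map((word) => [word, 1]);

// Partition: decide which reducer receives a key (here by first character code).
const partition = (key, reducerCount) => key.charCodeAt(0) % reducerCount;

// Compare: order the keys inside a partition before they are reduced.
const compare = (a, b) => (a < b ? -1 : a > b ? 1 : 0);

// Reduce: collapse all values for one key into a single result.
const reduce = (key, values) => values.reduce((sum, value) => sum + value, 0);

// Output writer: collect the [key, result] pairs into a plain object.
const outputWriter = (results) => Object.fromEntries(results);

function mapReduce(text, reducerCount = 2) {
  // One bucket per reducer, each mapping key -> list of emitted values.
  const buckets = Array.from({ length: reducerCount }, () => new Map());
  for (const record of inputReader(text)) {
    for (const [key, value] of map(record)) {
      const bucket = buckets[partition(key, reducerCount)];
      if (!bucket.has(key)) bucket.set(key, []);
      bucket.get(key).push(value);
    }
  }
  const results = [];
  for (const bucket of buckets) {
    for (const key of [...bucket.keys()].sort(compare)) {
      results.push([key, reduce(key, bucket.get(key))]);
    }
  }
  return outputWriter(results);
}

const counts = mapReduce("big data\nbig tables\nnosql data");
console.log(counts);
```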

42

MongoDB Example

> // map function
> m = function(){
...     this.tags.forEach(
...         function(z){
...             emit( z , { count : 1 } );
...         }
...     );
... };

> // reduce function
> r = function( key , values ){
...     var total = 0;
...     for ( var i=0; i<values.length; i++ )
...         total += values[i].count;
...     return { count : total };
... };

> // execute
> res = db.things.mapReduce(m, r, { out : "myoutput" } );

43

MongoDB Demo

44

Big Data on Azure
• Azure Table Storage
  – Azure Service Bus
• SQL Azure Federations
• MongoDB on Azure
  – http://www.mongodb.org/display/DOCS/MongoDB+on+Azure
• Hadoop on Azure
  – https://www.hadooponazure.com/

45

Using Azure for Computing

[Diagram: Client, Master (Job/Task Scheduler), three Workers, and Data stores, connected via Sockets]

46

Moving to Event Based Architecture

[Diagram: requests (Req) reach Web Roles, which hand work to Worker Roles through a Queue]

Monitor queue length against user’s expectations
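One way to read "monitor queue length against user's expectations" is as a sizing rule: given the backlog and how long users are willing to wait, how many worker roles are needed? The formula, the function name, and the sample numbers below are assumptions for illustration and do not come from any Azure API.

```javascript
// Estimate how many workers are needed to drain the current backlog
// within the wait time users are prepared to accept.
function workersNeeded(queueLength, secondsPerMessage, targetWaitSeconds) {
  const totalWorkSeconds = queueLength * secondsPerMessage;
  // Always keep at least one worker running (an assumption of this sketch).
  return Math.max(1, Math.ceil(totalWorkSeconds / targetWaitSeconds));
}

// 600 queued messages, half a second each, users tolerate a one-minute wait.
const busy = workersNeeded(600, 0.5, 60);

// 10 queued messages under the same assumptions.
const quiet = workersNeeded(10, 0.5, 60);

console.log(busy, quiet);
```

A monitoring loop would compare this figure with the number of running worker roles and scale out or in accordingly.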

47

Aggregate Stores

48

Visualizing Aggregates

ID: 1001

Customer: Ann

Line Items

32411234 2 $48 $96

707423234 1 $56 $56

125145 1 $24 $24

Payment Details

Card: AmEx

CC#: 12343

Expiration: 07/2015

Orders

Customers

Order Lines

Credit Cards

49

Visualizing Aggregates

ID: 1001

Customer: Ann

Line Items

32411234 2 $48 $96

707423234 1 $56 $56

125145 1 $24 $24

Payment Details

Card: AmEx

CC#: 12343

Expiration: 07/2015

{“SalesOrdersView”:{ ID: 1001, Customer: Ann, LineItems: []……………..…………….……………..}}
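The step from four relational tables to one aggregate can be sketched as follows. The order, customer, line-item, and card values are taken from the slide; the internal ids (`customerId`, `cardId`), the field names, and the `buildSalesOrderView` function are hypothetical.

```javascript
// Normalized rows, one array per table in the relational picture.
const customers = [{ id: 1, name: "Ann" }];
const orders = [{ id: 1001, customerId: 1, cardId: 7 }];
const orderLines = [
  { orderId: 1001, productId: 32411234, quantity: 2, price: 48 },
  { orderId: 1001, productId: 707423234, quantity: 1, price: 56 },
  { orderId: 1001, productId: 125145, quantity: 1, price: 24 },
];
const creditCards = [{ id: 7, card: "AmEx", number: "12343", expiration: "07/2015" }];

// Join the tables once and emit a single self-contained document.
function buildSalesOrderView(orderId) {
  const order = orders.find((o) => o.id === orderId);
  const customer = customers.find((c) => c.id === order.customerId);
  const card = creditCards.find((c) => c.id === order.cardId);
  return {
    SalesOrdersView: {
      ID: order.id,
      Customer: customer.name,
      LineItems: orderLines
        .filter((line) => line.orderId === orderId)
        .map(({ productId, quantity, price }) => ({
          productId,
          quantity,
          price,
          total: quantity * price,
        })),
      PaymentDetails: { Card: card.card, CC: card.number, Expiration: card.expiration },
    },
  };
}

const aggregate = buildSalesOrderView(1001);
console.log(JSON.stringify(aggregate, null, 2));
```

Once stored in this shape, the whole order can be read or written with a single key lookup instead of a four-table join.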

50

MongoDB on Azure Demo

51

Next Steps

• Learn a NoSQL product
  – Great places to start: AppFabric Cache, Azure Table Storage, MongoDB
• Pick a new programming language to learn
  – Not Java or C#/VB
  – Node.js, JavaScript, F#

52

THANK YOU

