Page 1: introducing_in_mongodb

Introducing: MongoDB

David J. C. Beach

Sunday, August 1, 2010

Page 2: introducing_in_mongodb

David Beach

Software Consultant (past 6 years)

Python since v1.4 (late 90’s)

Design, Algorithms, Data Structures

Sometimes Database stuff

not a “frameworks” guy

Organizer: Front Range Pythoneers

Page 3: introducing_in_mongodb

Outline

Part I: Trends in Databases

Part II: Mongo Basic Usage

Part III: Advanced Features

Page 4: introducing_in_mongodb

Part I: Trends in Databases

Page 5: introducing_in_mongodb

Database Trends

Past: “Relational” (RDBMS)

Data stored in Tables, Rows, Columns

Relationships designated by Primary, Foreign keys

Data is controlled & queried via SQL

WARNING: extreme oversimplification

Page 6: introducing_in_mongodb

Trends: Criticisms of RDBMS

Rigid data model

Hard to scale / distribute

Slow (transactions, disk seeks)

SQL not well standardized

Awkward for modern/dynamic languages

Lots of disagreement over this

There are points & counterpoints from both sides

The debate is not over

Not here to deliver a verdict

POINT: This is why we see an explosion of new databases.

Page 7: introducing_in_mongodb

Trends: Fragmentation

Relational with ORM (Hibernate, SQLAlchemy)

ODBMS / ORDBMS (push OO-concepts into database)

Key-Value Stores (MemcacheDB, Redis, Cassandra)

Graph (neo4j)

Document Oriented (Mongo, Couch, etc...)

categories are incomplete

some don’t fit neatly into categories

As with so many things in technology, we’re seeing... FRAGMENTATION!

some examples of DB categories

Page 8: introducing_in_mongodb

Where Mongo Fits

“The Best Features of Document Databases, Key-Value Stores, and RDBMSes.”

Mongo’s Tagline (taken from website)

Page 9: introducing_in_mongodb

What is Mongo

Document-Oriented Database

Produced by 10gen / Implemented in C++

Source Code Available

Runs on Linux, Mac, Windows, Solaris

Database: GNU AGPL v3.0 License

Drivers: Apache License v2.0

Page 10: introducing_in_mongodb

Mongo Advantages

json-style documents (dynamic schemas)

flexible indexing (B-Tree)

replication and high-availability (HA)

automatic sharding support (v1.6)*

easy-to-use API

fast queries (auto-tuning planner)

fast insert & deletes (sometimes trade-offs)

sharding support available as of v1.6 (late July 2010)

many of these taken straight from home page

Page 11: introducing_in_mongodb

Mongo Language Bindings

C, C++, Java

Python, Ruby, Perl

PHP, JavaScript

(many more community supported ones)

Page 12: introducing_in_mongodb

Mongo Disadvantages

No Relational Model / SQL

No Explicit Transactions / ACID

Limited Query API

You can do a lot more with MapReduce and JavaScript!

Operations are atomic only within a single document. (Generally)

Can mimic with foreign IDs, but referential integrity not enforced.

Page 13: introducing_in_mongodb

When to use Mongo

Rich semistructured records (Documents)

Transaction isolation not essential

Humongous amounts of data

Need for extreme speed

You hate schema migrations

My personal take on this...

Caveat: I’ve never used Mongo in Production!

Page 14: introducing_in_mongodb

Part II: Mongo Basic Usage

BRIEFLY cover:

- Download, Install, Configure
- connection, creating DB, creating Collection
- CRUD operations (Insert, Query, Update, Delete)

Page 15: introducing_in_mongodb

Installing Mongo

Use a 64-bit OS (Linux, Mac, Windows)

Get Binaries: www.mongodb.org

Run “mongod” process

32-bit available; not for production

MongoDB uses memory-mapped files.

32-bit builds limit the database to about 2 GB!

Page 16: introducing_in_mongodb

Installing PyMongo

Download: http://pypi.python.org/pypi/pymongo/1.7

Build with setuptools

(includes C extension for speed)

# python setup.py install

# python setup.py --no-ext install

(to compile without extension)

Page 17: introducing_in_mongodb

Mongo Anatomy

Database

Collection

Document

Mongo Server

Page 18: introducing_in_mongodb

>>> import pymongo

>>> connection = pymongo.Connection("localhost")

Getting a Connection

Connection required for using Mongo

Page 19: introducing_in_mongodb

>>> db = connection.mydatabase

Finding a Database

Databases = logically separate stores

Navigation using properties

Will create DB if not found

Page 20: introducing_in_mongodb

>>> blog = db.blog

Using a Collection

Collection is analogous to Table

Contains documents

Will create collection if not found

Page 21: introducing_in_mongodb

>>> entry1 = {"title": "Mongo Tutorial", "body": "Here's a document to insert."}

>>> blog.insert(entry1)

ObjectId('4c3a12eb1d41c82762000001')

Inserting

collection.insert(document) => document_id

document

Page 22: introducing_in_mongodb

>>> entry1

{'_id': ObjectId('4c3a12eb1d41c82762000001'), 'body': "Here's a document to insert.", 'title': 'Mongo Tutorial'}

Inserting (contd.)

Documents must have ‘_id’ field

Automatically generated unless assigned

12-byte unique binary value

You can also assign your own ‘_id’; it can be any unique value.

Mongo’s IDs are designed to be unique...

...even if hundreds of thousands of documents are generated per second, on numerous clustered machines.

ID generated by driver. No waiting on DB.
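A rough sketch of how a driver could build such an ID without asking the server. It assumes the 12-byte layout historically documented for ObjectId (4-byte timestamp, 3-byte machine id, 2-byte process id, 3-byte counter), which matches the sample value shown earlier. The function name `make_object_id` and the random machine id are illustrative stand-ins, not PyMongo's actual code, and the sketch is written for Python 3.

```python
import os
import struct
import threading
import time

_counter_lock = threading.Lock()
_counter = int.from_bytes(os.urandom(3), "big")  # random starting point
_machine = os.urandom(3)  # stand-in for a real 3-byte machine identifier


def make_object_id() -> bytes:
    """Return 12 bytes: 4-byte time, 3-byte machine, 2-byte pid, 3-byte counter."""
    global _counter
    with _counter_lock:
        _counter = (_counter + 1) % 0x1000000  # wrap around at 3 bytes
        count = _counter
    timestamp = struct.pack(">I", int(time.time()))  # seconds since the epoch
    pid = struct.pack(">H", os.getpid() % 0x10000)   # low 2 bytes of the pid
    return timestamp + _machine + pid + count.to_bytes(3, "big")


# Generate a batch locally; no round trip to the database is involved.
ids = [make_object_id() for _ in range(1000)]
print(ids[0].hex())
```

Because every part is available on the client, two processes on different machines can generate IDs at the same instant and still be very unlikely to collide.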

Page 23: introducing_in_mongodb

>>> entry2 = {"title": "Another Post", "body": "Mongo is powerful", "author": "David", "tags": ["Mongo", "Power"]}

>>> blog.insert(entry2)

ObjectId('4c3a1a501d41c82762000002')

Inserting (contd.)

Documents may have different properties

Properties may be atomic, lists, dictionaries

another document

Page 24: introducing_in_mongodb

>>> blog.ensure_index("author")

>>> blog.ensure_index("tags")

Indexing

May create index on any field

If field is list => index associates all values

index by single value

by multiple values
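To illustrate what "index associates all values" means for a list field, here is a minimal pure-Python sketch. Mongo's real indexes are B-Trees inside the server; the dictionary below is only a stand-in, and `build_index` is a hypothetical helper written for Python 3.

```python
from collections import defaultdict


def build_index(docs, field):
    """Map each indexed value to the set of _ids of documents containing it."""
    index = defaultdict(set)
    for doc in docs:
        if field not in doc:
            continue
        value = doc[field]
        # A list field contributes one index entry per element.
        keys = value if isinstance(value, list) else [value]
        for key in keys:
            index[key].add(doc["_id"])
    return index


docs = [
    {"_id": 1, "author": "David", "tags": ["Mongo", "Power"]},
    {"_id": 2, "author": "Robot", "tags": ["Mongo"]},
    {"_id": 3, "author": "David"},
]
by_author = build_index(docs, "author")  # index by single value
by_tag = build_index(docs, "tags")       # index by multiple values
print(sorted(by_tag["Mongo"]))
```

A lookup for one tag then finds every document whose list contains that tag, without scanning the collection.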

Page 25: introducing_in_mongodb

import random

bulk_entries = []
for i in range(100000):
    entry = {
        "title": "Bulk Entry #%i" % (i + 1),
        "body": "What Content!",
        "author": random.choice(["David", "Robot"]),
        "tags": ["bulk", random.choice(["Red", "Blue", "Green"])],
    }
    bulk_entries.append(entry)

Bulk Insert

Let’s produce 100,000 fake posts

Page 26: introducing_in_mongodb

>>> blog.insert(bulk_entries)

[ObjectId(...), ObjectId(...), ...]

Bulk Insert (contd.)

collection.insert(list_of_documents)

Inserts 100,000 entries into blog

Returns in 2.11 seconds

Page 27: introducing_in_mongodb

>>> blog.remove() # clear everything

>>> blog.insert(bulk_entries, safe=True)

Bulk Insert (contd.)

returns in 7.90 seconds (vs. 2.11 seconds)

driver returns early; DB is still working

...unless you specify “safe=True”

Page 28: introducing_in_mongodb

>>> blog.find_one({"title": "Bulk Entry #12253"})

{u'_id': ObjectId('4c3a1e411d41c82762018a89'), u'author': u'Robot', u'body': u'What Content!', u'tags': [u'bulk', u'Green'], u'title': u'Bulk Entry #99999'}

Querying

collection.find_one(spec) => document

spec = document of query parameters

presumably, need more entries to effectively test index performance...

returned in 0.04s - extremely fast

No index created for “title”!

Page 29: introducing_in_mongodb

>>> blog.find_one({"title": "Bulk Entry #12253", "tags": "Green"})

{u'_id': ObjectId('4c3a1e411d41c82762018a89'), u'author': u'Robot', u'body': u'What Content!', u'tags': [u'bulk', u'Green'], u'title': u'Bulk Entry #99999'}

Querying (Specs)

Multiple conditions on document => “AND”

Value for tags is an “ANY” match
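The two rules above (conditions are ANDed, and a scalar condition on a list field matches if any element matches) can be sketched in plain Python. This is only an illustration of the matching logic, not how the server evaluates queries; `matches` is a hypothetical helper written for Python 3.

```python
def matches(doc, spec):
    """Return True if doc satisfies every key/value condition in spec."""
    for field, wanted in spec.items():
        if field not in doc:
            return False
        actual = doc[field]
        if isinstance(actual, list) and not isinstance(wanted, list):
            # Scalar condition against a list field: any element may match.
            if wanted not in actual:
                return False
        elif actual != wanted:
            return False
    return True


entry = {"title": "Bulk Entry #12253", "author": "Robot",
         "tags": ["bulk", "Green"]}

print(matches(entry, {"title": "Bulk Entry #12253", "tags": "Green"}))
print(matches(entry, {"tags": "Red"}))
```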

Page 30: introducing_in_mongodb

>>> green_items = []
>>> for item in blog.find({"tags": "Green"}):
...     green_items.append(item)

- or -

>>> green_items = list(blog.find({"tags": "Green"}))

Querying (Multiple)

collection.find(spec) => cursor

new items are fetched in bulk (behind the scenes)

Page 31: introducing_in_mongodb

>>> blog.find({"tags": "Green"}).count()

16646

Querying (Counting)

Use the find() method + count()

Returns number of matches found

Page 32: introducing_in_mongodb

>>> item = blog.find_one({"title": "Bulk Entry #12253"})
>>> item["tags"].append("New")
>>> blog.update({"_id": item["_id"]}, item)

Updating

collection.update(spec, document)

updates single document matching spec

“multi=True” => updates all matching docs

Page 33: introducing_in_mongodb

>>> blog.remove({"author":"Robot"}, safe=True)

Deleting

use remove(...)

it works like find(...)

Example removed approximately 50% of records.

Took 2.48 seconds

Page 34: introducing_in_mongodb

Part III: Advanced Features

Page 35: introducing_in_mongodb

Advanced Querying

Regular Expressions

{"tags": re.compile(r"^(Green|Blue)$")}

Nested Values

{"foo.bar.x": 3}

$where Clause (JavaScript)
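The nested-value syntax reads as a dotted path into embedded documents. A minimal Python 3 sketch of that lookup follows; `get_path` and `path_equals` are hypothetical helpers, and the sketch ignores arrays, which the real query engine also traverses.

```python
_MISSING = object()  # sentinel for "path not present"


def get_path(doc, path):
    """Follow a dotted path such as "foo.bar.x" through nested dicts."""
    current = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return _MISSING
        current = current[part]
    return current


def path_equals(doc, path, wanted):
    """True if the value at the dotted path exists and equals wanted."""
    return get_path(doc, path) == wanted


doc = {"foo": {"bar": {"x": 3, "y": 4}}}
print(path_equals(doc, "foo.bar.x", 3))
```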

Page 36: introducing_in_mongodb

>>> blog.find({"$or": [{"tags": "Green"}, {"tags": "Blue"}]})

Advanced Querying

$lt, $gt, $lte, $gte, $ne

$in, $nin, $mod, $all, $size, $exists, $type

$or, $not

$elemMatch
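To show how operator documents compose, here is a small Python 3 evaluator covering only a handful of the operators listed ($lt, $lte, $gt, $gte, $ne, $in, $nin, $or). It is a simplified sketch: a missing field simply fails to match, which differs from the server in some cases, and the helper names are hypothetical.

```python
import operator

COMPARISONS = {
    "$lt": operator.lt, "$lte": operator.le,
    "$gt": operator.gt, "$gte": operator.ge,
    "$ne": operator.ne,
}


def field_matches(value, condition):
    """Check one field value against a literal or an operator document."""
    if not isinstance(condition, dict):
        return value == condition
    for op, operand in condition.items():
        if op in COMPARISONS:
            if not COMPARISONS[op](value, operand):
                return False
        elif op == "$in":
            if value not in operand:
                return False
        elif op == "$nin":
            if value in operand:
                return False
        else:
            raise ValueError("operator not covered by this sketch: " + op)
    return True


def matches(doc, spec):
    """AND all conditions together; "$or" takes a list of sub-specs."""
    for key, condition in spec.items():
        if key == "$or":
            if not any(matches(doc, sub) for sub in condition):
                return False
        elif key not in doc or not field_matches(doc[key], condition):
            return False
    return True


post = {"author": "David", "views": 42}  # "views" is an invented sample field
print(matches(post, {"views": {"$gt": 10, "$lte": 42}}))
```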

Page 37: introducing_in_mongodb

>>> blog.find().limit(50)                  # find 50 articles
>>> blog.find().sort("title").limit(30)    # 30 titles
>>> blog.find().distinct("author")         # unique author names

Advanced Querying

collection.find(...)

sort(“name”) - sorting

limit(...) & skip(...) [like LIMIT & OFFSET]

distinct(...) [like SQL’s DISTINCT]

collection.group(...) - like SQL’s GROUP BY

won’t be showing detailed examples of all these...

there are good tutorials online for all of this

let’s move on to something even more interesting

Page 38: introducing_in_mongodb

Map/Reduce

collection.map_reduce(mapper, reducer)

ultimate in querying power

distribute across multiple nodes

Most powerful querying mechanism
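As a mental model, map/reduce can be sketched in a few lines of Python 3: the mapper emits key/value pairs per document, values are grouped by key, and the reducer folds each group into one result. In Mongo the mapper and reducer are JavaScript functions run by the server, and the reducer may be called more than once per key on partial results; the single-process `map_reduce` helper below is hypothetical and skips those details.

```python
from collections import defaultdict


def map_reduce(docs, mapper, reducer):
    """Run mapper over docs, group emitted values by key, reduce each group."""
    groups = defaultdict(list)

    def emit(key, value):
        groups[key].append(value)

    for doc in docs:
        mapper(doc, emit)
    return {key: reducer(key, values) for key, values in groups.items()}


def tag_mapper(doc, emit):
    # Emit a count of 1 for every tag on the document.
    for tag in doc["tags"]:
        emit(tag, 1)


def sum_reducer(key, values):
    return sum(values)


posts = [
    {"tags": ["bulk", "Red"]},
    {"tags": ["bulk", "Green"]},
    {"tags": ["bulk", "Green"]},
]
counts = map_reduce(posts, tag_mapper, sum_reducer)
print(counts)
```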

Page 39: introducing_in_mongodb

Map/Reduce Visualized

[Diagram: Map/Reduce visualized]

Diagram Credit: Hadoop: The Definitive Guide by Tom White; O’Reilly Books

Chapter 2, page 20

also see: Map/Reduce : A Visual Explanation

Page 40: introducing_in_mongodb

mySQL:

SELECT
  Dim1, Dim2,
  SUM(Measure1) AS MSum,
  COUNT(*) AS RecordCount,
  AVG(Measure2) AS MAvg,
  MIN(Measure1) AS MMin,
  MAX(CASE WHEN Measure2 < 100 THEN Measure2 END) AS MMax
FROM DenormAggTable
WHERE (Filter1 IN ('A','B'))
  AND (Filter2 = 'C')
  AND (Filter3 > 123)
GROUP BY Dim1, Dim2
HAVING (MMin > 0)
ORDER BY RecordCount DESC
LIMIT 4, 8

MongoDB:

db.runCommand({
  mapreduce: "DenormAggCollection",
  query: {
    filter1: { '$in': [ 'A', 'B' ] },
    filter2: 'C',
    filter3: { '$gt': 123 }
  },
  map: function() {
    emit(
      { d1: this.Dim1, d2: this.Dim2 },
      { msum: this.measure1, recs: 1, mmin: this.measure1,
        mmax: this.measure2 < 100 ? this.measure2 : 0 }
    );
  },
  reduce: function(key, vals) {
    var ret = { msum: 0, recs: 0, mmin: 0, mmax: 0 };
    for(var i = 0; i < vals.length; i++) {
      ret.msum += vals[i].msum;
      ret.recs += vals[i].recs;
      if(vals[i].mmin < ret.mmin) ret.mmin = vals[i].mmin;
      if((vals[i].mmax < 100) && (vals[i].mmax > ret.mmax))
        ret.mmax = vals[i].mmax;
    }
    return ret;
  },
  finalize: function(key, val) {
    val.mavg = val.msum / val.recs;
    return val;
  },
  out: 'result1',
  verbose: true
});

db.result1.
  find({ mmin: { '$gt': 0 } }).
  sort({ recs: -1 }).
  skip(4).
  limit(8);

Notes:

1. Grouped dimension columns are pulled out as keys in the map function, reducing the size of the working set.

2. Measures must be manually aggregated.

3. Aggregates depending on record counts must wait until finalization.

4. Measures can use procedural logic.

5. Filters have an ORM/ActiveRecord-looking style.

6. Ascending: 1; Descending: -1

7. Aggregate filtering must be applied to the result set, not in the map/reduce.

Revision 4, Created 2010-03-07

Rick Osborne, rickosborne.org

http://rickosborne.org/download/SQL-to-MongoDB.pdf

Page 41: introducing_in_mongodb

Map/Reduce Examples

This is me, playing with Map/Reduce

Page 42: introducing_in_mongodb

Health Clinic Example

Person registers with the Clinic

Weighs in on the scale

1 year => comes in 100 times

Page 43: introducing_in_mongodb

Health Clinic Example

person = {
    "name": "Bob",
    "weighings": [
        {"date": date(2009, 1, 15), "weight": 165.0},
        {"date": date(2009, 2, 12), "weight": 163.2},
        ...
    ]
}

Page 44: introducing_in_mongodb

import datetime
import random

N = 1000  # number of people; value assumed for illustration
all_people = []

for i in range(N):
    person = {'name': 'person%04i' % i}
    weighings = person['weighings'] = []
    std_weight = random.uniform(100, 200)
    for w in range(100):
        date = (datetime.datetime(2009, 1, 1) +
                datetime.timedelta(days=random.randint(0, 365)))
        weight = random.normalvariate(std_weight, 5.0)
        weighings.append({'date': date, 'weight': weight})
    # keep each person's weighings in date order
    weighings.sort(key=lambda x: x['date'])
    all_people.append(person)

Map/Reduce Insert Script

Page 45: introducing_in_mongodb

Insert Data Performance

[Chart, log-log scale: insert time vs. data set size]

1k: 3.14s

10k: 29.5s

100k: 292s

Linear scaling
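On log-log axes, linear scaling shows up as a straight line with slope close to 1. The short Python 3 check below uses the three insert timings from the chart and assumes the horizontal axis counts inserted person documents (1k, 10k, 100k).

```python
import math

sizes = [1_000, 10_000, 100_000]   # assumed: number of person documents
seconds = [3.14, 29.5, 292.0]      # insert timings read from the chart

# Slope between the first and last points on log-log axes.
slope = (math.log10(seconds[-1]) - math.log10(seconds[0])) / (
    math.log10(sizes[-1]) - math.log10(sizes[0])
)

# Cost per document in milliseconds at each size.
per_record_ms = [1000.0 * t / n for t, n in zip(seconds, sizes)]

print(round(slope, 2), [round(ms, 2) for ms in per_record_ms])
```

The slope comes out at roughly 0.98 and the per-document cost stays near 3 ms at every size, which is what "linear scaling" means here.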

Page 46: introducing_in_mongodb

map_fn = Code("""
function () {
    this.weighings.forEach(function(z) {
        emit(z.date, z.weight);
    });
}
""")

reduce_fn = Code("""
function (key, values) {
    var total = 0;
    for (var i = 0; i < values.length; i++) {
        total += values[i];
    }
    return total;
}
""")

result = people.map_reduce(map_fn, reduce_fn)

Map/Reduce: Total Weight by Day

Page 47: introducing_in_mongodb

>>> for doc in result.find(): print doc

{u'_id': datetime.datetime(2009, 1, 1, 0, 0), u'value': 39136.600753163315}
{u'_id': datetime.datetime(2009, 1, 2, 0, 0), u'value': 41685.341024046182}
{u'_id': datetime.datetime(2009, 1, 3, 0, 0), u'value': 38232.326554504165}

... lots more ...

Map/Reduce: Total Weight by Day

Page 48: introducing_in_mongodb

Total Weight by Day Performance

[Chart: MapReduce run time vs. data set size]

1k: 4.29s

10k: 38.8s

100k: 384s

Page 49: introducing_in_mongodb

map_fn = Code("""
function () {
    var target_date = new Date(2009, 9, 5);
    var pos = bsearch(this.weighings, "date", target_date);
    var recent = this.weighings[pos];
    emit(this._id, { name: this.name,
                     date: recent.date,
                     weight: recent.weight });
};
""")

reduce_fn = Code("""
function (key, values) {
    return values[0];
};
""")

result = people.map_reduce(map_fn, reduce_fn, scope={"bsearch": bsearch})

Map/Reduce: Weight on Day

Page 50: introducing_in_mongodb

bsearch = Code("""
function(array, prop, value) {
    var min, max, mid, midval;
    for(min = 0, max = array.length - 1; min <= max; ) {
        mid = min + Math.floor((max - min) / 2);
        midval = array[mid][prop];
        if(value === midval) {
            break;
        } else if(value > midval) {
            min = mid + 1;
        } else {
            max = mid - 1;
        }
    }
    return (midval > value) ? mid - 1 : mid;
};
""")

Map/Reduce: bsearch() function

Page 51: introducing_in_mongodb

Weight on Day Performance

[Chart: MapReduce run time vs. data set size]

1k: 1.23s

10k: 10s

100k: 108s

Page 52: introducing_in_mongodb

import bisect
import datetime

target_date = datetime.datetime(2009, 10, 5)

for person in people.find():
    dates = [w['date'] for w in person['weighings']]
    # index of the last weighing on or before target_date (-1 if none)
    pos = bisect.bisect_right(dates, target_date) - 1
    if pos >= 0:
        val = person['weighings'][pos]

Weight on Day (Python Version)
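The lookup can also be packaged as a standalone function that needs no database, which makes the boundary cases easy to check. This Python 3 sketch assumes the weighings are already sorted by date, as the insert script arranges; `weight_on_day` is a hypothetical helper, and the third sample weighing is invented for the example.

```python
import bisect
import datetime


def weight_on_day(weighings, target_date):
    """Return the latest weighing on or before target_date, or None."""
    dates = [w["date"] for w in weighings]  # assumed sorted ascending
    pos = bisect.bisect_right(dates, target_date) - 1
    return weighings[pos] if pos >= 0 else None


weighings = [
    {"date": datetime.datetime(2009, 1, 15), "weight": 165.0},
    {"date": datetime.datetime(2009, 2, 12), "weight": 163.2},
    {"date": datetime.datetime(2009, 11, 3), "weight": 160.5},  # invented sample
]
found = weight_on_day(weighings, datetime.datetime(2009, 10, 5))
print(found["weight"])
```

Subtracting one from `bisect_right` lands on the most recent weighing at or before the target date, and the guard returns None for a person who had not yet been weighed.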

Page 53: introducing_in_mongodb

Map/Reduce Performance

[Chart: run time vs. data set size, MapReduce compared with Python]

Size    MapReduce   Python
1k      1.23s       0.37s
10k     10s         2.2s
100k    108s        26s

Page 54: introducing_in_mongodb

Summary

Page 55: introducing_in_mongodb

Resources

www.10gen.com

www.mongodb.org

MongoDB: The Definitive Guide (O’Reilly)

PyMongo: api.mongodb.org/python

Page 56: introducing_in_mongodb

END OF SLIDES

Page 57: introducing_in_mongodb

Chalkboard is not Comic Sans

This is Chalkboard, not Comic Sans.

This isn’t Chalkboard, it’s Comic Sans.

does it matter, anyway?
