Date post: | 18-May-2015 |
Category: |
Technology |
Upload: | richard-schneeman |
Scaling the Web: Databases & NoSQL
Richard Schneeman, @schneems, works for @Gowalla
Wed Nov 10 2011
whoami
• @Schneems
• BSME with Honors from Georgia Tech
• 5+ years experience with Ruby & Rails
• Work for @Gowalla
• Rails 3.1 contributor : )
• 3+ years technical teaching
Traffic
Compounding Traffic
ex. Wikipedia
Gowalla
Gowalla
• 50 best websites NYTimes 2010
• Founded 2009 @ SXSW
• 1 million+ Users
• Undisclosed Visitors
• Loves/highlights/comments/stories/guides
• Facebook/Foursquare/Twitter integration
• iPhone/Android/web apps
• public API
Gowalla Backend
• Ruby on Rails
• Uses the Ruby Language
• Rails is the Framework
The Web is Data
• Username => String
• Birthday => Int/ Int/ Int
• Blog Post => Text
• Image => Binary-file/blob
Data needs to be stored to be useful
Database
Gowalla Database
• PostgreSQL
• Relational (RDBMS)
• Open Source
• Competitor to MySQL
• ACID compliant
• Running on a Dedicated Managed Server
Need for Speed
• Throughput:
• The number of operations per minute that can be performed
• Pure Speed:
• How long an individual operation takes.
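The difference between the two measures can be sketched with a little arithmetic. The helper `ops_per_minute` below is hypothetical and assumes an idealized pool of identical workers that each handle one operation at a time with no contention; real systems will fall short of these numbers.

```ruby
# Hypothetical helper: idealized throughput for a pool of identical workers.
# Assumes each worker handles one operation at a time with no contention.
def ops_per_minute(latency_ms, workers)
  ops_per_worker = 60_000.0 / latency_ms # milliseconds in a minute / time per op
  (ops_per_worker * workers).round
end

single = ops_per_minute(50, 1) # one worker, 50 ms per operation
pool   = ops_per_minute(50, 4) # four workers, same 50 ms per operation

puts "1 worker:  #{single} ops/min"
puts "4 workers: #{pool} ops/min"
```

Each individual operation still takes 50 ms in both cases (pure speed is unchanged); only the number of operations completed per minute goes up.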
Potential Problems
• Hardware
• Slow Network
• Slow hard-drive
• Insufficient CPU
• Insufficient Ram
• Software
• too many Reads
• too many Writes
Scaling Up versus Out
• Scale Up:
• More CPU, Bigger HD, More Ram etc.
• Scale Out:
• More machines
• More machines
• More machines
• ...
Scale Up
• Bigger faster machine
• More Ram
• More CPU
• Bigger ethernet bus
• ...
• Moore's Law
• Diminishing returns
Scale Out
• Forget Moore's law...
• Add more nodes
• Master/ Slave Database
• Sharding
[Diagram: Master/Slave. Writes go to the Master DB; data is copied to four Slave DBs, which serve reads.]
Master & Slave +/-
• Pro
• Increased read speed
• Takes read load off of master
• Allows us to Join across all tables
• Con
• Doesn’t buy increased write throughput
• Single Point of Failure in Master Node
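The read/write split described above can be sketched in plain Ruby. `ReplicatedStore` is a hypothetical in-memory stand-in, not a real database client, and it copies writes to the replicas immediately, which is a simplifying assumption (real replication usually lags behind the master).

```ruby
# Hypothetical in-memory stand-in for a master database with read replicas.
class ReplicatedStore
  def initialize(replica_count)
    @master   = {}
    @replicas = Array.new(replica_count) { {} }
    @next     = 0
  end

  # Every write goes to the single master, then is copied to each replica.
  def write(key, value)
    @master[key] = value
    @replicas.each { |replica| replica[key] = value }
    value
  end

  # Reads rotate across the replicas (round-robin), never touching the master.
  def read(key)
    replica = @replicas[@next]
    @next = (@next + 1) % @replicas.size
    replica[key]
  end
end

store = ReplicatedStore.new(4)
store.write("user:3", "schneems")
puts store.read("user:3")
```

Adding replicas spreads the reads, but every write still funnels through the one master, which is why write throughput does not improve.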
[Diagram: Sharding. Reads and writes are split across separate shards for users in the USA, Europe, Asia, and Africa.]
Sharding +/-
• Pro
• Increased Write & Read throughput
• No Single Point of failure
• Individual features can fail
• Con
• Cannot Join queries between shards
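A minimal sketch of region-based sharding, assuming one independent store per region. `ShardedUsers` and its method names are hypothetical, and each shard is just a Hash standing in for a separate database.

```ruby
# Hypothetical sharded user store: one independent Hash per region.
class ShardedUsers
  REGIONS = %w[usa europe asia africa].freeze

  def initialize
    @shards = REGIONS.map { |region| [region, {}] }.to_h
  end

  # Writes and reads that know the region touch exactly one shard.
  def save(region, id, name)
    @shards.fetch(region)[id] = name
  end

  def find(region, id)
    @shards.fetch(region)[id]
  end

  # Without the region there is no cross-shard join: ask every shard in turn.
  def find_anywhere(id)
    @shards.each_value do |shard|
      return shard[id] if shard.key?(id)
    end
    nil
  end
end

users = ShardedUsers.new
users.save("usa", 3, "schneems")
puts users.find("usa", 3)
puts users.find("europe", 3).inspect
puts users.find_anywhere(3)
```

The last call shows the cost of the "Con": a question that spans shards has to be answered in application code, one shard at a time.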
What is a Database?
• Relational Database Management System (RDBMS)
• Stores Data Using Schema
• A.C.I.D. compliant
• Atomic
• Consistent
• Isolated
• Durable
RDBMS
• Relational
• Matches data on common characteristics in data
• Enables “Join” & “Union” queries
• Makes data modular
Relational +/-
• Pros
• Data is modular
• Highly flexible data layout
• Cons
• Getting desired data can be tricky
• Over modularization leads to many join queries
• Trade off performance for search-ability
Schema Storage
• Blueprint for data storage
• Break data into tables/columns/rows
• Give data types to your data
• Integer
• String
• Text
• Boolean
• ...
Schema +/-
• Pros
• Regularize our data
• Helps keep data consistent
• Converts to programming “types” easily
• Cons
• Must separately manage schema
• Adding columns & indexes to existing large tables can be painful & slow
ACID
• Properties that guarantee database transactions are processed reliably
• Atomic
• Consistent
• Isolated
• Durable
ACID
• Atomic
• Any database Transaction is all or nothing.
• If one part of the transaction fails it all fails
“An Incomplete Transaction Cannot Exist”
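The all-or-nothing rule can be sketched with a toy in-memory store. `TinyStore`, the account names, and the amounts are hypothetical; a real RDBMS does this with its own transaction machinery, not by copying the data.

```ruby
# Hypothetical in-memory "database" of account balances.
class TinyStore
  attr_reader :data

  def initialize(data)
    @data = data
  end

  # Run the block against a working copy; keep the changes only if the
  # whole block finishes without raising.
  def transaction
    working_copy = @data.dup
    yield working_copy
    @data = working_copy
    true
  rescue StandardError
    false # the working copy is discarded, so nothing was written
  end
end

store = TinyStore.new({ "alice" => 100, "bob" => 0 })

ok = store.transaction do |accounts|
  accounts["alice"] -= 150
  raise "insufficient funds" if accounts["alice"] < 0
  accounts["bob"] += 150
end

puts ok                 # false
puts store.data.inspect # balances are unchanged
```

Because the second step never ran, the first step is thrown away too: the store never holds a half-finished transfer.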
ACID
• Consistent
• Any transaction will take the database from one consistent state to another
“Only Consistent data is allowed to be written”
ACID
• Isolated
• No transaction should be able to interfere with another transaction
“the same field cannot be updated by two sources at the exact same time”
a = 0
a += 1 and a += 2 at the same time => a = ??
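A rough in-process illustration of the same question, using Ruby threads in place of database transactions (an assumption made only for the sketch). Without isolation, both updates could read `a = 0` and the last writer would win, leaving 1 or 2; serializing the read-modify-write steps with a `Mutex` means neither update is lost.

```ruby
# Two threads update the same counter. The Mutex serializes the
# read-modify-write steps so neither update is lost.
a    = 0
lock = Mutex.new

threads = [1, 2].map do |amount|
  Thread.new do
    lock.synchronize { a += amount }
  end
end
threads.each(&:join)

puts a # 3
```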
ACID
• Durable
• Once a transaction is committed it will stay that way
“Save it once, read it forever”
What is a Database?
• RDBMS
• Relational
• Flexible
• Has a schema
• Most likely ACID compliant
• Typically fast under low load or when optimized
What is SQL?
• Structured Query Language
• The language databases speak
• Based on relational algebra
• Insert
• Query
• Update
• Delete
“SELECT Company, Country FROM Customers WHERE Country = 'USA' ”
Why people <3 SQL
• Relational algebra is powerful
• SQL is proven
• well understood
• well documented
Why people </3 SQL
• Relational algebra is hard
• Different databases support different SQL syntax
• Yet another programming language to learn
SQL != Database
• SQL is used to talk to an RDBMS (database)
• SQL is not an RDBMS
What is NoSQL?
Not A Relational Database
RDBMS
Types of NoSQL
• Distributed Systems
• Document Store
• Graph Database
• Key-Value Store
• Eventually Consistent Systems
Mix And Match ↑
Key Value Stores
• Non Relational
• Typically No Schema
• Map one Key (a string) to a Value (some object)
Example: Redis
Key Value Example
require "redis"
redis = Redis.new
redis.set("foo", "bar")
redis.get("foo")
>> "bar"
[Diagram: Key Value. Each key maps to one value.]
Key Value
• Like a database that can only ever use the primary key (id)
YES: select * from users where id = '3';
NO: select * from users where name = 'schneems';
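The YES/NO contrast can be sketched with a plain Ruby Hash standing in for a key-value store. The records and the second username are hypothetical.

```ruby
# Hypothetical user records stored by primary key, as a key-value store would.
users = {
  "3" => { name: "schneems" },
  "4" => { name: "rails_fan" }
}

# YES: direct lookup by key.
by_id = users["3"]

# NO (not directly): finding by name means scanning every value.
by_name = users.values.find { |user| user[:name] == "schneems" }

puts by_id[:name]
puts by_name[:name]
```

The key lookup is a single step; the name lookup has to walk the whole data set because the store keeps no index on anything but the key.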
NoSQL @ Gowalla
• Redis (key-value store)
• Store “Likes” & Analytics
• Memcache (key-value store)
• Cache Database results
• Cassandra
• (eventually consistent, with-schema, key value store)
• Store “feeds” or “timelines”
• Solr (search index)
Memcache
• Key-Value Store
• Open Source
• Distributed
• In memory (ram) only
• fast, but volatile
• Not ACID
• Memory object caching system
Memcache Example
memcache = Memcache.new
memcache.set("foo", "bar")
memcache.get("foo")
>> "bar"
Memcache
• Can store whole objects
memcache = Memcache.new
user = User.where(:username => "schneems").first
memcache.set("user:3", user)
user_from_cache = memcache.get("user:3")
user_from_cache == user
>> true
user_from_cache.username
>> "schneems"
Memcache @ Gowalla
• Cache Common Queries
• Decreases Load on DB (postgres)
• Enables higher throughput from DB
• Faster response than DB
• Users see quicker page load time
What to Cache?
• Objects that change infrequently
• users
• spots (places)
• etc.
• Expensive(ish) sql queries
• Friend ids for users
• User ids for people visiting spots
• etc.
[Diagram: Memcache Distributed. Nodes A, B, and C share the cache; easily add more nodes, such as D.]
Memcache <3’s DB
• We use them Together
• If memcache doesn’t have a value
• Fetch from the database
• Set the key from database
• Hard
• Cache Invalidation : (
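The fetch-then-set flow above, plus the invalidation step that makes it hard, can be sketched as follows. `CachedUsers` is hypothetical: one Hash plays memcache and another plays the database, so this shows only the control flow, not a real client.

```ruby
# Hypothetical stand-ins: a Hash plays the cache, another plays the database.
class CachedUsers
  attr_reader :db_reads

  def initialize(database)
    @database = database
    @cache    = {}
    @db_reads = 0
  end

  def fetch(id)
    key = "user:#{id}"
    return @cache[key] if @cache.key?(key) # cache hit

    @db_reads += 1
    value = @database[id] # cache miss: fetch from the database
    @cache[key] = value   # set the key from the database result
    value
  end

  # Invalidate on write so later reads do not serve a stale value.
  def update(id, value)
    @database[id] = value
    @cache.delete("user:#{id}")
  end
end

users = CachedUsers.new({ 3 => "schneems" })
users.fetch(3) # miss, reads the database
users.fetch(3) # hit, served from the cache
puts users.db_reads

users.update(3, "Schneems")
puts users.fetch(3)
```

Forgetting the `delete` in `update` is the classic invalidation bug: the cache would keep answering with the old value.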
Redis
• Key Value Store
• Open Source
• Not Distributed (yet)
• Extremely Quick
• “Data structure server”
Redis Example, again
require "redis"
redis = Redis.new
redis.set("foo", "bar")
redis.get("foo")
>> "bar"
Redis - Has Data Types
• Strings
• Hashes
• Lists
• Sets
• Sorted Sets
Redis Example, sets
redis = Redis.new
redis.sadd("foo", "bar")
redis.smembers("foo")
>> ["bar"]
redis.sadd("foo", "fly")
redis.smembers("foo")
>> ["bar", "fly"]
Redis => Likeable
• Very Fast response
• ~ 50 queries per page view
• ~ 1 ms per query
• http://github.com/Gowalla/likeable
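One way "likes" could map onto a set data type, sketched with Ruby's own `Set` as a stand-in for a store whose values are sets. The key name `spot:42:likes` is hypothetical, and this is not a description of how the likeable gem is actually implemented.

```ruby
require "set"

# Hypothetical stand-in for a key-value store whose values are sets.
likes = Hash.new { |hash, key| hash[key] = Set.new }

# Adding a member is idempotent: liking twice still counts once.
likes["spot:42:likes"].add("schneems")
likes["spot:42:likes"].add("schneems")
likes["spot:42:likes"].add("fly")

puts likes["spot:42:likes"].size
puts likes["spot:42:likes"].include?("schneems")
```

Counting likes and checking whether a given user liked something are both single set operations, which suits a page that issues many small queries.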
Cassandra
• Open Source
• Distributed
• Key Value Store
• Eventually Consistent
• Sort of not ACID
• Uses A Schema
• ColumnFamilies
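"Eventually Consistent" can be sketched with a toy cluster in which a write lands on one node and reaches the others only when replication runs. `EventualCluster`, the node names, and the key are hypothetical, and the explicit `replicate!` call stands in for background replication; this is not how Cassandra is implemented.

```ruby
# Hypothetical cluster: a write lands on one node and is queued for the rest.
class EventualCluster
  def initialize(node_names)
    @nodes   = node_names.map { |name| [name, {}] }.to_h
    @pending = [] # replication work not yet applied
  end

  def write(node, key, value)
    @nodes.fetch(node)[key] = value
    (@nodes.keys - [node]).each { |other| @pending << [other, key, value] }
  end

  def read(node, key)
    @nodes.fetch(node)[key]
  end

  # Apply queued copies; until this runs, other nodes may return stale data.
  def replicate!
    @pending.each { |node, key, value| @nodes.fetch(node)[key] = value }
    @pending.clear
  end
end

cluster = EventualCluster.new(%w[A B C D])
cluster.write("A", "feed:3", "checked in")
puts cluster.read("B", "feed:3").inspect # nil: the copy has not arrived yet
cluster.replicate!
puts cluster.read("B", "feed:3")         # "checked in"
```

A reader that hits node B too early sees nothing, which is acceptable for something like an activity feed but not for data that must be correct on every read.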
[Diagram: Cassandra Distributed, Eventual Consistency. Data comes in to one of the nodes A, B, C, D and is copied to extra nodes ... eventually.]
Cassandra @ Gowalla
Activity Feeds
Cassandra @ Gowalla
• Chronologic
• http://github.com/Gowalla/chronologic
Should I use NoSQL?
Which One?
Pick the right tool
Tradeoffs
• Every data store has them
• Know your data store
• Strengths
• Weaknesses
NoSQL vs. RDBMS
• No Magic Bullet
• Use Both!!!
• Model data in a datastore you understand
• Switch when/if you need to
• Understand Your Options
Questions?