
No sql findings

Date post: 13-Jun-2015
Upload: christian-van-der-leeden
NoSQL Findings Christian van der Leeden Thursday, September 23, 2010
Transcript
Page 1: No sql findings

NoSQL Findings

Christian van der Leeden

Thursday, September 23, 2010

Page 2: No sql findings

Our problem

• Growth is not linear and not predictable

• e.g. History::Session table now > 30 million entries

• Activities > 26 million entries

• Postgres will be the performance bottleneck


Page 3: No sql findings

Criteria

• Allow us to scale from 100k Daily Active Users (DAU) to 1 million DAU and up to 10 million DAU

• Scale horizontally (“Just add servers”)

• Good Ruby performance

• Good transition from Rails/Postgres -> Rails/NoSQL

• Actively developed


Page 4: No sql findings

Goal

• Scores (@ 10 million Daily Active Users)

• 10 million scores/day == 350 inserts/second (the daily average is about 116/second, so 350 presumably allows for peak load)

• around same read rate for Leaderboards

• Game with 10 million players

• Leaderboard with 10 million entries

• Session (@ 10 million DAU)

• > 10 million session handshakes/day
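The load figures above can be checked with a little arithmetic. The sketch below only divides the daily volume by the number of seconds in a day and compares the result with the 350 inserts/second on the slide. The method names are hypothetical, and reading 350 as a peak target is an assumption, not something the slides state.

```ruby
SECONDS_PER_DAY = 24 * 60 * 60 # 86,400

# Average events per second for a given daily volume.
def average_per_second(events_per_day)
  events_per_day / SECONDS_PER_DAY.to_f
end

# How far a per-second target sits above the daily average.
def headroom_factor(events_per_day, target_per_second)
  target_per_second / average_per_second(events_per_day)
end

scores_per_day = 10_000_000
puts format("average: %.1f inserts/second", average_per_second(scores_per_day))
# Assumption: 350 inserts/second is meant as a peak figure.
puts format("headroom at 350/second: %.1fx", headroom_factor(scores_per_day, 350))
```

Under that assumption the target leaves roughly a factor of three between average and peak load.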


Page 5: No sql findings

Data Patterns

• Most data is accessed time based (the most recent data is accessed the most often)

• Write and read rates are about the same

• Eventual consistency is good enough most of the time


Page 6: No sql findings

Rating criteria

• Type (Document Store, Key/Value Store, Big Table)

• Deployment

• How easy is it to scale?

• Existing installations

• How big are known installations?

• Heritage and activity

• Where does the solution come from, who develops it, and how actively?


Page 7: No sql findings

Products evaluated

• MongoDB

• Redis

• Cassandra

• HBase

• Membase


Page 8: No sql findings

MongoDB

• document store

• “SQL DB” without relations

• easy transition with MongoMapper, Mongoid

• supports sharding over replica sets (since August 2010)

• Haven’t found a big sharded server installation


Page 9: No sql findings

Experience with Mongo

• nice/easy to program with

• deployment woes we’ve encountered (1.6.0)

• segmentation fault

• cannot read because: invalid BSON object

• when the index is larger than RAM, performance degrades (queries went from 20 ms to 200 ms)

• Global write lock makes data migrations slow


Page 10: No sql findings

Cassandra

• Big Table data store

• Was developed by Facebook and is actively maintained

• Easy to add servers and to set up (peer to peer concept)

• Thrift API to Ruby was slow in tests (Our tests: around 150 write ops/second)

• Avro API promises to be faster (will be an option in 0.7)

• Used by Facebook

• Not using it because it is too slow with Ruby


Page 11: No sql findings

Redis

• Memcache with simple persistence

• Supports many different data types and atomic operations on them

• Sharding is done client side (difficult to add new servers)

• We’re using it for indexes on SQL data

• Very fast (Our tests: 4000 write operations/second)
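To illustrate why client-side sharding makes adding servers difficult, here is a minimal sketch that assumes the simplest possible scheme: hash the key and take it modulo the number of servers. The function names and the "score:N" key format are hypothetical, and real client libraries may use other schemes (such as consistent hashing) precisely to reduce this effect.

```ruby
require "zlib"

# Pick a shard by hashing the key modulo the number of servers.
# Assumption: a naive scheme chosen for illustration only.
def shard_for(key, server_count)
  Zlib.crc32(key) % server_count
end

# Fraction of keys that land on a different shard after resizing.
def moved_fraction(keys, old_count, new_count)
  moved = keys.count { |k| shard_for(k, old_count) != shard_for(k, new_count) }
  moved.to_f / keys.size
end

keys = (1..10_000).map { |i| "score:#{i}" }
puts format("moved going from 4 to 5 servers: %.0f%%", moved_fraction(keys, 4, 5) * 100)
```

With modulo placement most keys change shards when one server is added, so the data has to be migrated or the caches refilled.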


Page 12: No sql findings

HBase

• Big Table Database

• Complex to set up and to maintain

• Very often used for analytics jobs with Hadoop/Hive, e.g. on Amazon Elastic MapReduce

• For Analytics also look at Scribe for data collection


Page 13: No sql findings

Membase

• Key-Value Store

• Distributed, persistent Memcache

• Easy to add nodes

• Used by Zynga


Page 14: No sql findings

Example Leaderboards

• User has many scores

• Each score has one result (integer)

• Game has many scores

• Query: the leaderboard for one game

• Insert one score into the leaderboard

• What is my rank?

• Give me 10 scores starting at position 100,000
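The three operations above can be pinned down with a store-independent reference model. This is a minimal in-memory sketch for one game, not how any of the evaluated products implement it. The class and method names are hypothetical, ranks are 1-based here by choice, and every read scans or sorts all scores, which is exactly the cost a real store has to avoid at 10 million entries.

```ruby
# One score: an integer result that belongs to a user and a game.
Score = Struct.new(:id, :user_id, :game_id, :result)

# Leaderboard for a single game, kept in a plain array.
class NaiveLeaderboard
  def initialize
    @scores = []
  end

  # Insert one score into the leaderboard.
  def insert(score)
    @scores << score
    self
  end

  # "What is my rank?": 1 + the number of strictly better results.
  def rank(score)
    @scores.count { |s| s.result > score.result } + 1
  end

  # "Give me `limit` scores starting at position `offset`" (0-based).
  def page(offset, limit)
    @scores.sort_by { |s| -s.result }.slice(offset, limit) || []
  end
end

board = NaiveLeaderboard.new
second = Score.new(96877, 2541, 57142, 99)
board.insert(Score.new(2563, 52345, 57142, 100))
     .insert(second)
     .insert(Score.new(6752, 3652, 57142, 96))
puts board.rank(second)              # rank of the score with result 99
puts board.page(1, 2).map(&:id).inspect
```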


Page 15: No sql findings

SQL vs NoSQL

SQL

• Think about Data

• Redundancy is bad

• Indexes are managed by the DB

• Query over relations

• Always exact results

NoSQL

• Think about Queries

• Redundancy is ok

• Roll your own indexes depending on queries

• No Joins and connecting entities

• Query results don’t have to return latest write operation
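"Roll your own indexes" means the application writes the redundant lookup structure itself, shaped by the query it needs. Here is a minimal sketch, assuming a key/value store modelled as a plain Ruby Hash and one query, "all scores of a user". The class and method names are hypothetical.

```ruby
# Key/value store with a hand-maintained secondary index.
class ScoreStore
  def initialize
    @scores = {}                                 # score_id => attributes (primary data)
    @ids_by_user = Hash.new { |h, k| h[k] = [] } # user_id => [score_id, ...] (our index)
  end

  # Every write has to update the primary data and the index.
  def save(id, attributes)
    @scores[id] = attributes
    ids = @ids_by_user[attributes[:user_id]]
    ids << id unless ids.include?(id)
    attributes
  end

  # Primary key lookup.
  def find(id)
    @scores[id]
  end

  # The query the index was built for: all scores of one user.
  def scores_for_user(user_id)
    @ids_by_user.fetch(user_id, []).map { |id| @scores[id] }
  end
end

store = ScoreStore.new
store.save(2563, { result: 100, user_id: 52345, game_id: 57142 })
store.save(96877, { result: 99, user_id: 2541, game_id: 57142 })
puts store.scores_for_user(52345).inspect
```

The sketch does not handle a score changing its user. Keeping the index correct across updates and deletes is part of the work the database would otherwise do.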


Page 16: No sql findings

SQL vs NoSQL

SQL

• standardized query language and DDL

• All DBs are “the same”

NoSQL

• some solutions share standards

• Many different approaches

• Document store

• Big Table

• Key Value


Page 17: No sql findings

Postgres

• Create new score:

score = Score.new(attributes)
score.save  # => insert into scores;

• What is my rank?

select count(*) from scores inner join games on (games.id = scores.game_id) where result > #{my_score.result} and games.name = '#{game_name}';

• Give me 10 scores in leaderboard from position 100000

select * from scores inner join games on (games.id = scores.game_id) where games.name = '#{game_name}' order by result desc offset 100000 limit 10;

Data model: User 1:n Score n:1 Game


Page 18: No sql findings

Redis

• New Score

redis.zadd("Jewels", result, score_id)

• My Rank?

redis.zrevrank("Jewels", score_id)

• 10 scores from position 100000

redis.zrevrange("Jewels", 100000, 100009)

SortedSet

key: game_name, score: result, value: score_id

key: "Jewels"

100<2563>

99<96877>

96<6752>

...

key: "Bug Landing"

key: "Toss It"

...

KeyValue Store

key: score_id, value: marshalled score object

2563: { result : 100, user_id : 52345, game_id: 57142 }
96877: { result : 99, user_id : 2541, game_id: 57142 }
6752: { result : 96, user_id : 3652, game_id: 57142 }
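The argument conventions of the sorted set commands are easy to get wrong: ZREVRANK takes a member (here the score_id) and returns a 0-based rank, and ZREVRANGE takes a start and a stop index, both 0-based and inclusive, not an offset and a count. The sketch below is an in-memory stand-in that mimics those conventions for a single key. The class name TinySortedSet is hypothetical, it does not talk to Redis, and it ignores how Redis orders members with equal scores.

```ruby
# In-memory stand-in for one Redis sorted set (illustration only).
class TinySortedSet
  def initialize
    @score_by_member = {}
  end

  # Like ZADD key score member: add a member or update its score.
  def zadd(score, member)
    @score_by_member[member] = score
  end

  # Like ZREVRANK key member: 0-based rank, highest score first; nil if absent.
  def zrevrank(member)
    ordered_members.index(member)
  end

  # Like ZREVRANGE key start stop: both indexes are 0-based and inclusive.
  def zrevrange(start, stop)
    ordered_members[start..stop] || []
  end

  private

  # Members ordered by score, highest first (re-sorted on every read).
  def ordered_members
    @score_by_member.sort_by { |_member, score| -score }.map(&:first)
  end
end

jewels = TinySortedSet.new
jewels.zadd(100, 2563)
jewels.zadd(99, 96877)
jewels.zadd(96, 6752)
puts jewels.zrevrank(96877)          # 0-based rank of score_id 96877
puts jewels.zrevrange(0, 1).inspect  # the top two score_ids
```

A page of 10 entries starting at position 100000 is therefore the range from 100000 to 100009.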


Page 19: No sql findings

Mongo

• New Score

Score.create!(attributes)
db.scores.insert( { result: 100, user_id: 52345, game_id: 57142 } )

• What is my rank?

db.scores.count( { game_id: #{my_score.game_id}, result: { $gt: #{my_score.result} } } )

• 10 scores from position 100000

db.scores.find( { game_id: #{my_score.game_id} } ).sort({ result: -1 }).skip(100000).limit(10)

Collection

key: Scores

{ _id: 6752, result : 96, user_id : 3652, game_id: 57142 }
{ _id: 96877, result : 99, user_id : 2541, game_id: 57142 }
{ _id: 2563, result : 100, user_id : 52345, game_id: 57142 }


Page 20: No sql findings

Cassandra

ColumnFamily: Leaderboards

row_key: game_name

row_key: "Jewels"

100: 2563
99: 96877
96: 6752

row_key: "Bug Landing"

row_key: "Toss It"

...

ColumnFamily: Scores

row_key: score_id

row_key: 2563

result: 100, game_id: 57142, user_id: 6325

row_key: 96877

result: 99, game_id: 57142, user_id: 2375

row_key: 6752

result: 96, game_id: 57142, user_id: 2311

...


Page 21: No sql findings

Cassandra

• Insert new score:

client.insert("Leaderboards", "Jewels", { result => id })
client.insert("Scores", id, { :result => result, :user_id => user_id, :game_id => game_id })

• What is my rank?

=> not easy, need help from other tools

• Give me the next 10 scores starting at score X

client.get("Leaderboards", "Jewels", :start => X.result, :count => 10)

ColumnFamily: Leaderboards

row_key: game_name

row_key: "Jewels"

100: 2563
99: 96877
96: 6752

row_key: "Bug Landing"

row_key: "Toss It"

...


Page 22: No sql findings

Findings

• Use and test the tools you want to use at the scale at which you are going to use them

• There is no “Best NoSQL” solution

• Mix and match the tools you need

• NoSQL requires a lot of rethinking and changes in your Ruby code.


Page 23: No sql findings

Links

• Cassandra: http://cassandra.apache.org/

• Cassandra API: http://wiki.apache.org/cassandra/API

• Twitter on Cassandra: http://github.com/ericflo/twissandra

• Redis: http://code.google.com/p/redis/

• Redis API: http://code.google.com/p/redis/wiki/CommandReference

• Membase: http://www.membase.org/

• HBase: http://hbase.apache.org/

• Scribe: http://github.com/facebook/scribe

• Mongo: http://www.mongodb.org/


