Membase is an open source, distributed key-value database management system optimized for storing data behind interactive web applications.
All aspects of membase are simple, fast and elastic by design.
Value
Image courtesy http://www.flickr.com/photos/vintagedept/3617706196/
Simple
Image courtesy http://www.flickr.com/photos/brenda-starr/3509344100/sizes/m/in/photostream/
Simple (with a replica)
Fast
• Original use case: speed up access to authoritative data as a distributed hashtable
• Must be at least as fast as a highly tuned DBMS
• Designed for modern datacenter substrate
  – Designed for VM and cloud deployments
Elastic
• Add nodes without losing access to data
• Maintain consistency when accessing data
  – membase is a CP type system
• Scale linearly by just adding more nodes
Before: Application scales linearly, data hits wall
Application Scales Out: Just add more commodity web servers
Database Scales Up: Get a bigger, more complex server
Membase is a distributed database
Membase Servers
In the data center
Web application server
Application user
On the administrator console
Built-in Memcached Caching Layer
[Figure: Memcached Mode and Membase Mode; diagram labels: Memcached, Membase Database]
Fact: The Membase development team has also contributed over half of the code to the Memcached project.
Proven at small, and extra large scale
Leading cloud service (PAAS) provider
• Over 65,000 hosted applications
• Over 2,000 users to date
• Membase Server serving over 3,000 Heroku customers
Social game leader – FarmVille, Mafia Wars, Café World
• Over 230 million monthly users
• Membase Server is the 500,000 ops-per-second database behind FarmVille and Café World
After: Data layer scales like application logic layer
Data layer now scales with linear cost and constant performance.
Application Scales Out: Just add more commodity web servers
Database Scales Out: Just add more commodity data servers
Scaling out flattens the cost and performance curves.
Membase Servers
Who?
Fault-tolerant memcached cluster at NHN, the biggest web portal in Korea
What is Project Arcus?
• Memcached
  – Common protocol across PHP, Java, C applications
• Moxi (Memcached proxy) based
• In-house automatic fault-detection and failover solution
• Collectd-based monitoring
• Proxy and cache server administration UI
• Private cloud service
Previous Deployments
• A few individual memcached installations
• Problems
  – No fault-tolerance
    • Hardware failures are common (heat, network switch failure, etc.)
  – No automatic scalability
    • To add or remove a memcached server, they need to rebuild code, distribute it, and restart all clients
Today
• Memcached clusters
  – Fault-tolerance transparent to clients
    • Consistent hashing in moxi (memcached proxy)
  – Cache As A Service (CaaS)
    • All major services in NHN started using cache
    • Multitenancy across cache services
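The consistent hashing mentioned above can be illustrated with a small hash ring. This is a minimal sketch, not moxi's actual implementation: the `HashRing` class, the MD5-based hash, the 100 ring points per server, and the `cache-a` through `cache-d` server names are all assumptions made for illustration. It shows why adding a cache server remaps only a fraction of the keys, and only onto the new server.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hash ring (illustrative; not moxi's code)."""

    def __init__(self, servers, points_per_server=100):
        # Each server is placed on the ring at many pseudo-random points.
        ring = []
        for server in servers:
            for i in range(points_per_server):
                ring.append((self._hash(f"{server}#{i}"), server))
        ring.sort()
        self._hashes = [h for h, _ in ring]
        self._servers = [s for _, s in ring]

    @staticmethod
    def _hash(value):
        digest = hashlib.md5(value.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def server_for(self, key):
        # Walk clockwise to the first ring point at or after the key's hash.
        idx = bisect.bisect(self._hashes, self._hash(key)) % len(self._hashes)
        return self._servers[idx]


keys = [f"user:{i}" for i in range(10000)]
three = HashRing(["cache-a", "cache-b", "cache-c"])
four = HashRing(["cache-a", "cache-b", "cache-c", "cache-d"])

moved = [k for k in keys if three.server_for(k) != four.server_for(k)]
print(f"keys remapped after adding one server: {len(moved) / len(keys):.0%}")
```

With a plain `hash(key) % number_of_servers` scheme, most keys would change servers when the server count changes; on the ring, only the keys that fall just before the new server's points move.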
Performance impact
[Figure: Performance: throughput ×16.6, response time ×10; DB load: 50 %, 34 %]
Membase-Cloudera Partnership
“AOL serves more than 5 billion impressions per day from our ad serving platforms, and any incremental improvement in processing time translates to huge benefits in our ability to more effectively serve the ads needed to meet our contractual commitments. Traditional databases like MySQL lack the scalability required to support our goal of five milliseconds per read/write. Creating user profiles with Hadoop, then serving them from Membase, reduces profile read and write access to under a millisecond, leaving the bulk of the processing time budget for improved targeting and customization.”
Pero Subasic, Chief Architect, AOL
Joint development of bi-directional software integration between Membase and Hadoop
• Membase NodeCode Module streaming interface to Cloudera Distribution for Hadoop via Flume interface
• Sqoop-derived command line utility for bi-directional batch movement of data between Membase and Cloudera Distribution for Hadoop
Joint marketing and sales of integrated distributed OLTP-OLAP solution
• Membase – the distributed OLTP solution
• Cloudera – the distributed OLAP solution
Cloudera to distribute integration
Customer use case – Ad targeting
[Figure: ad-targeting data flow in three numbered steps; diagram labels: events; profiles, campaigns; profiles, real time campaign statistics]
40 milliseconds to come up with an answer.
Demo
The Guts
Photo Courtesy http://www.flickr.com/photos/pellis/76804760/
Clustering
• Underlying cluster functionality based on Erlang OTP
• Have a custom, vector clock based way of storing and propagating...
  – Cluster topology
  – vBucket mapping
• Collect statistics from many nodes of the cluster
  – Identify hot keys, resource utilization
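The slide says the cluster uses a custom, vector clock based scheme but gives no details of it. The following is therefore a generic textbook vector clock, not Membase's implementation: the function names `vc_increment`, `vc_compare`, and `vc_merge` and the node names `n1` and `n2` are hypothetical. It shows how nodes can tell whether one version of a piece of configuration supersedes another or whether the two were changed concurrently.

```python
def vc_increment(clock, node):
    """Return a copy of the clock with this node's counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def vc_compare(a, b):
    """Return 'equal', 'before', 'after', or 'concurrent' for clocks a and b."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"


def vc_merge(a, b):
    """Combine two clocks by taking the per-node maximum."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}


v1 = vc_increment({}, "n1")    # n1 writes the config
v2 = vc_increment(v1, "n2")    # n2 updates it after seeing v1
v3 = vc_increment(v1, "n1")    # n1 updates it again without seeing v2

print(vc_compare(v1, v2))      # v1 happened before v2
print(vc_compare(v2, v3))      # neither saw the other: a conflict to resolve
print(vc_merge(v2, v3))
```

A version that compares as "before" can simply be replaced; "concurrent" versions need some resolution rule, which the slide does not describe.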
vBucket mapping
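A vBucket map adds one level of indirection between keys and servers: a key always hashes to the same vBucket, and a separate table says which server currently owns that vBucket. The sketch below illustrates the idea only. The CRC32 hash, the count of 1024 vBuckets, the round-robin initial assignment, and the `node-a`, `node-b`, `node-c` names are assumptions for illustration, not details taken from the slides.

```python
import zlib

NUM_VBUCKETS = 1024  # assumed count, for illustration only


def vbucket_for(key, num_vbuckets=NUM_VBUCKETS):
    """Hash a key (bytes) to a vBucket id; this never depends on the servers."""
    return zlib.crc32(key) % num_vbuckets


def build_map(servers, num_vbuckets=NUM_VBUCKETS):
    """Assign vBuckets to servers round-robin (a hypothetical initial layout)."""
    return [servers[i % len(servers)] for i in range(num_vbuckets)]


def server_for(key, vbucket_map):
    """Look up the server that currently owns the key's vBucket."""
    return vbucket_map[vbucket_for(key, len(vbucket_map))]


vbucket_map = build_map(["node-a", "node-b"])
key = b"user:42"
vb = vbucket_for(key)
print(vb, server_for(key, vbucket_map))

# Moving one vBucket to a new node only changes one entry in the map.
new_map = list(vbucket_map)
new_map[vb] = "node-c"
print(vb, server_for(key, new_map))
```

Because the key-to-vBucket step never changes, adding a node means moving some vBuckets and publishing an updated map rather than rehashing every key.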
TAP
• A generic, scalable method of streaming mutations from a given server
  – As data operations arrive, they can be sent to arbitrary TAP receivers
• Leverages the existing memcached engine interface, and the non-blocking IO interfaces to send data
• Three modes of operation
[Figure: three modes of operation; diagram labels: Working set, Data, Mutations]
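The idea of streaming mutations to arbitrary receivers can be sketched with an in-process store that forwards every change to registered callbacks. This is a conceptual illustration only and not the TAP wire protocol: the `Store` class, the `register_tap` method, and its `backfill` flag are hypothetical names, and the sketch does not attempt to model the three modes shown on the slide.

```python
class Store:
    """Toy key-value store that streams its mutations to registered receivers."""

    def __init__(self):
        self._data = {}
        self._receivers = []

    def register_tap(self, receiver, backfill=False):
        # Optionally replay existing data first, then stream future mutations.
        if backfill:
            for key, value in self._data.items():
                receiver(("set", key, value))
        self._receivers.append(receiver)

    def set(self, key, value):
        self._data[key] = value
        self._publish(("set", key, value))

    def delete(self, key):
        if key in self._data:
            del self._data[key]
            self._publish(("delete", key, None))

    def _publish(self, mutation):
        for receiver in self._receivers:
            receiver(mutation)


replica = {}


def apply_to_replica(mutation):
    """Example receiver: keep a copy of the store up to date."""
    op, key, value = mutation
    if op == "set":
        replica[key] = value
    else:
        replica.pop(key, None)


store = Store()
store.set("a", 1)                                   # written before the tap exists
store.register_tap(apply_to_replica, backfill=True)
store.set("b", 2)
store.delete("a")
print(replica)
```

The same stream could feed other receivers, for example one that only logs mutations, without the store knowing what each receiver does with them.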
Disk > Memory
Bucket Configuration
mem_high_wat
mem_low_wat
memory quota
A dataset may have many infrequently accessed items. However, memcached's LRU behavior differs from what is wanted with membase.
Still, traditional RDBMS implementations (most of them) are not 100% correct for us either. The speed of a miss is very, very important.
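One way to read the `mem_high_wat` and `mem_low_wat` labels is as a pair of watermarks: once memory use rises above the high mark, values are dropped from memory until use falls to the low mark, while the data itself stays on disk. The sketch below assumes exactly that behavior, plus least-recently-used ejection order and an in-memory set of all known keys so that a miss never touches disk. These are assumptions for illustration, and the `Bucket` class is hypothetical, not Membase's engine.

```python
from collections import OrderedDict


class Bucket:
    """Toy store that keeps every value on 'disk' and a bounded set in memory."""

    def __init__(self, mem_high_wat, mem_low_wat):
        self.mem_high_wat = mem_high_wat
        self.mem_low_wat = mem_low_wat
        self.disk = {}                 # stand-in for persistent storage
        self.resident = OrderedDict()  # key -> value, least recently used first
        self.keys = set()              # all known keys stay in memory
        self.mem_used = 0

    def set(self, key, value):
        if key in self.resident:
            self.mem_used -= len(self.resident.pop(key))
        self.disk[key] = value
        self.keys.add(key)
        self._make_resident(key, value)

    def get(self, key):
        if key not in self.keys:
            return None                # fast miss: no disk access needed
        if key in self.resident:
            self.resident.move_to_end(key)
            return self.resident[key]
        value = self.disk[key]         # slower path: value was ejected earlier
        self._make_resident(key, value)
        return value

    def _make_resident(self, key, value):
        self.resident[key] = value
        self.mem_used += len(value)
        if self.mem_used > self.mem_high_wat:
            # Eject least recently used values until below the low watermark.
            while self.mem_used > self.mem_low_wat and self.resident:
                _, ejected = self.resident.popitem(last=False)
                self.mem_used -= len(ejected)


bucket = Bucket(mem_high_wat=30, mem_low_wat=20)
for i in range(4):
    bucket.set(f"k{i}", bytes(10))     # four 10-byte values
print(list(bucket.resident), bucket.mem_used)
print(bucket.get("k0") is not None, bucket.get("missing"))
```

Unlike a pure LRU cache, nothing is lost when a value leaves memory: a later read fetches it from disk, and a read for a key that was never stored is answered from memory.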
Clients, nodes and other nodes
[Figure: architecture diagram; labels: ns_server; membase (memcached + membase engine); moxi; vbucketmigrator; TAP; memcached operations with tap commands; memcached operations; Client, port 11211, memcached operations; moxi + Client, port 11210, memcached operations; REST/comet; cluster topology and vbucket map]
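The diagram labels both client ports as carrying memcached operations. As a rough illustration of what such operations look like on the wire, the sketch below builds and parses messages in the standard memcached ASCII protocol. The helper names `build_set`, `build_get`, and `parse_get_response` are hypothetical, and the sketch only handles a single-key `get`; a real client library would cover far more.

```python
def build_set(key, value, flags=0, exptime=0):
    """Encode a memcached ASCII 'set' request; value must be bytes."""
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode("ascii")
    return header + value + b"\r\n"


def build_get(key):
    """Encode a memcached ASCII 'get' request for one key."""
    return f"get {key}\r\n".encode("ascii")


def parse_get_response(data):
    """Return the value from a single-key 'get' response, or None on a miss."""
    if data == b"END\r\n":
        return None
    header, rest = data.split(b"\r\n", 1)
    # Header layout: VALUE <key> <flags> <bytes>
    _, _key, _flags, length = header.split(b" ")
    return rest[: int(length)]


request = build_set("user:42", b"hello")
print(request)
print(build_get("user:42"))
print(parse_get_response(b"VALUE user:42 0 5\r\nhello\r\nEND\r\n"))
print(parse_get_response(b"END\r\n"))
```

These byte strings are what a client would write to, and read from, a TCP connection to the memcached port of a server or proxy.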