Growing in the Wild: The Story by CUBRID Database Developers
Esen Sagynov (@CUBRID), NHN Corporation, Service Platform Development Center
Monday, April 2, 2012
Eugen Stoianovici, NHN Corporation, CUBRID Development Lab
Who are we?
• Eugen Stoianovici
  – CUBRID Engine Team
  – [email protected]
• Esen Sagynov @CUBRID
  – CUBRID Project Manager
  – [email protected]
Purpose of this presentation
This is what I remember from every presentation that I’ve attended. Not the details.
1. “Some guys talked about some cool stuff they encountered in applications (don't remember what)”
2. “There's a database that they use for this type of application, it's open source and saves you from a lot of trouble (don't remember what trouble exactly).”
3. “They're really keen on doing things right.”
You will learn…
Reasons behind CUBRID development.
What CUBRID has to offer. Benefits & advantages.
What we have learnt so far. Where we are heading to.
CUBRID Facts
• RDBMS
• True Open Source @ www.cubrid.org
• Optimized for Web services
• High performance
• 3-tier architecture
• Large DB support
• High-Availability feature
• DB Sharding support
• MySQL compatible SQL syntax
• ACID Transactions
• Online Backup
Reasons Behind CUBRID Development
[Figure: NHN services in Korea, Japan, USA, and China. 150+ Web services, 30,000+ Web servers, iOS & Android. Databases in use: Oracle, MSSQL, MySQL, CUBRID, NoSQL.]
Disadvantages of existing solutions
1. High License Cost
   1. Over 10,000 servers @ NHN
2. Third-party solution
   1. No ownership of the code base
   2. Additional $$$ for customizations
   3. Branch tech support is not enough
   4. Communication barriers w/ vendors
   5. Slow updates & fixes
Fork or Start from Scratch?
Fork:
• No full ownership
• Time to learn the code base
• Fixed architecture
• Understand the design philosophy
Start from scratch:
• Full ownership
• Time to develop
• Custom, more advanced architecture and design
Benefits of in-house solution
Existing solutions:
1. High License Cost
   1. Over 10,000 servers @ NHN
2. Third-party solution
   1. No ownership of the code base
   2. Additional $$$ for customizations
   3. Communication barriers w/ vendors
   4. Slow updates & fixes
In-house solution:
1. No License Cost
2. Core Technological Asset
   1. Complete control of the code base
   2. No additional $$$ for customizations
   3. No communication barriers
   4. Fast updates & fixes
3. Key Storage Technology Skills
   1. Grow our developers
   2. Export developers
4. New Database Solution Service
   1. Provide CUBRID service to other platforms
   2. Instant reaction to customer issues
5. Recurring Key Technology
   1. High-Availability
   2. Sharding
   3. Rebalancing
   4. Cluster
   5. etc.
CUBRID Goal
Stability
• Human vs. DB Errors
• # of customers
Performance
• Smart Index Optimizations
• Shared Query Caching
• Web Optimized Features
• Load Balancer
Scalability
• High-Availability w/ auto fail-over
• Sharding
• Data Rebalancer
• Cluster
Ease of Use
• SQL & API Compatibility
• Native Migration Tool
• Native GUI DB Management Tools
• Monitoring Tools
#1 Performance
Client Requests: Performance UP!
Types of Web Services
Main operations | Example
READ > 95% | News, Wiki, Blog, etc.
READ:WRITE = 70:30% | SNS, Push services, etc.
WRITE > 90% | Log monitoring, Analytics
90% of Web Services
How & What to improve
CRUD | WHY?
SELECT | Fast searching, avoid sequential scan and ORDER BY
INSERT | Concurrent WRITE performance, reduce I/O, and fast searching
UPDATE | Fast searching, improve lock mechanism
DELETE | Fast searching
Phase 1 (v1.0 ~ 2.0): SELECT Performance +; Shared Query Plan Caching
Phase 2 (v8.2.2): INSERT & DELETE Performance +; Space Reusability Improvement
Phase 3 (v8.4.0): SELECT Performance ++; Covering Index, Key limit, etc.
Phase 4 (v8.4.1): INSERT & UPDATE Performance ++; Memory Buffer Mgmt. Improvements
Phase 5 (Apricot): INSERT Performance +++; Filter index, Skip index, etc.
Phase 6 (Banana): SELECT Performance ++++; Optimize JOINs
Also listed: DB & Index Volume Optimizations; API Performance +; Windows Performance +
TPS: 15%, 10%, 270%, 70%
Smart Indexing
MySQL SELECT performance < CUBRID SELECT performance
MySQL INSERT performance < CUBRID INSERT performance
CREATE TABLE forum_posts(
  user_id INTEGER,
  post_moment INTEGER,
  post_text VARCHAR(64)
);
CREATE INDEX i_forum_posts_post_moment ON forum_posts (post_moment);
CREATE INDEX i_forum_posts_post_moment_user_id ON forum_posts (post_moment, user_id);
Random INSERT Performance
SELECT username FROM users WHERE id = ?;
INSERT INTO forum_posts (user_id, post_moment, post_text) VALUES (?, ?, ?);
UPDATE users SET last_posted = ? WHERE id = ?;
CREATE TABLE users(
  id INTEGER UNIQUE,
  username VARCHAR(255),
  last_posted INTEGER
);
• Users
  – 100,000 rows prepopulated
• Test
  – CUBRID vNext (code name Apricot)
  – MySQL 5.5.21
  – 40 workers
  – 1 hour
  – Record QPS every 2 minutes
[Chart: CUBRID QPS decrease with DataSet size. X axis: data set size, 0 to 13,085,280. Y axis: queries per second, 0 to 5,000.]
CUBRID: Average = 3685, Max = 4469, Min = 2821
[Chart: MySQL QPS decrease with DataSet size. X axis: data set size, 0 to 6,334,749. Y axis: queries per second, 0 to 10,000.]
MySQL: Average = 1796, Max = 8951, Min = 1122
[Chart: PostgreSQL QPS decrease with DataSet size. X axis: data set size, 0 to 2,117,610. Y axis: queries per second, 0 to 7,000.]
PostgreSQL: Average = 594, Max = 6217, Min = 181
[Chart: QPS decline over one hour. Series: MySQL QPS, CUBRID QPS, PostgreSQL QPS. X axis: data set size, 0 to 13,085,280. Y axis: queries per second, 0 to 10,000.]
CUBRID Optimizations
Index Features
• Reverse Index
• Prefix Index
• Function Index
• Filter Index
• Unique Index
• Primary Key
• Foreign Key
Query Features
• Multi-range key limit
• Index skip scan
• Skip order by
• Skip group by
• Range Scan optimizations
• Query rewrites
• Covering Index
• Descending Index
Server level optimizations
• Log compression
• Shared Query Plan cache
• Locking Optimizations
• Transaction concurrency
Filter Index
• Interesting (open) tickets fit into a very small index.
• No overhead for INSERT/UPDATE
• Very fast results for open tickets
CREATE INDEX ON tickets (component, assignee)
WHERE status = 'open';
SELECT title, component, assignee FROM users
WHERE register_date > '2008-01-01' AND status = 'open';
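The idea of indexing only the open tickets can be tried out with SQLite's partial indexes, which play a similar role to the filter index described here. The sketch below uses Python's built-in sqlite3 module rather than CUBRID, so the syntax and planner behavior are SQLite's. The index name `i_open_tickets`, the table layout, and the sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tickets (title TEXT, component TEXT, assignee TEXT, status TEXT)"
)
# Partial index: only rows with status = 'open' get an index entry.
conn.execute(
    "CREATE INDEX i_open_tickets ON tickets (component, assignee) "
    "WHERE status = 'open'"
)
rows = [
    ("crash on start", "engine", "alice", "open"),
    ("typo in manual", "docs", "bob", "closed"),
    ("slow join", "engine", "alice", "closed"),
]
conn.executemany("INSERT INTO tickets VALUES (?, ?, ?, ?)", rows)

# The literal status = 'open' matches the index predicate, so the planner
# is allowed to answer this query from the small partial index.
query = (
    "SELECT title FROM tickets "
    "WHERE component = 'engine' AND status = 'open'"
)
open_titles = [r[0] for r in conn.execute(query)]
# The last column of EXPLAIN QUERY PLAN output is a human-readable description.
plan = " ".join(str(r[-1]) for r in conn.execute("EXPLAIN QUERY PLAN " + query))
print(open_titles)
print(plan)
```

Closed tickets never enter the index, which is why it stays small no matter how many tickets have been closed over time.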
[Chart: QPS Filter vs. Full index. Series: QPS Full Index, QPS Filter Index. X axis: data set size, 0 to 10,000,000. Y axis: queries per second, 0 to 7,000.]
CUBRID Architecture
• API: CCI, JDBC, ADO.NET, OLEDB, ODBC, PHP, Perl, Python, Ruby
• Broker: Query Parser, Query Optimizer, Query Planner
• Server: Query Manager, Query Executor, Transaction Manager, Lock Manager, Log Manager, Storage Manager, File Manager
Parameterized Queries & Filter Index
• PostgreSQL: Will not use partial index
• MS SQL Server: Provides workaround
• ORACLE: Less flexible, has to be the exact expression
• CUBRID: “Shared” Query Plan Cache
SELECT title, component, assignee FROM users
WHERE register_date > ? AND status = ?;
SELECT name, email FROM users
WHERE register_date > ? AND age < ? AND age < 18;
Query Plan Cache
• PostgreSQL: Caches a plan for the life span of a driver-level prepared statement
• MySQL: No query plan cache
• CUBRID: “Shared” Query Plan Cache
Query execution without plan cache: Parse SQL → Name Resolving → Semantic check → Query Optimize → Query Plan → Query Execution
Query execution with plan cache: Parse SQL → Get Cached Plan → Query Execution
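The difference between the two paths can be sketched with a small cache shared by all sessions. This is a toy illustration in Python, not CUBRID's implementation. The `PlanCache` class, its `compilations` counter, and the choice to key the cache on the raw SQL text are assumptions, and `_compile` merely stands in for name resolving, semantic check, and optimization.

```python
class PlanCache:
    """Toy shared plan cache keyed by SQL text (an illustration only)."""

    def __init__(self):
        self._plans = {}
        self.compilations = 0

    def _compile(self, sql):
        # Stand-in for name resolving, semantic check, and query optimization.
        self.compilations += 1
        return ("PLAN", sql)

    def get_plan(self, sql):
        plan = self._plans.get(sql)
        if plan is None:
            # Cache miss: take the long path once, then remember the plan.
            plan = self._compile(sql)
            self._plans[sql] = plan
        return plan


shared_cache = PlanCache()  # one cache shared by every session below


def run_session(queries):
    """Simulate one client session asking for plans."""
    return [shared_cache.get_plan(q) for q in queries]


q = "SELECT name, email FROM users WHERE register_date > ? AND age < ?"
run_session([q, q])  # the first session compiles once, then hits the cache
run_session([q])     # a different session reuses the same plan
print(shared_cache.compilations)
```

Because the cache is not tied to one prepared statement or one connection, a plan compiled for one session is available to all the others.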
Auto Parameterization
SELECT title, component, assignee FROM users
WHERE register_date > '2008-01-01' AND status = 'open';
SELECT title, component, assignee FROM users
WHERE register_date > ? AND status = ?;
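The rewrite from the first query to the second can be sketched with a regular expression that swaps literals for `?` and collects the values. This is a simplified illustration, not how CUBRID does it (a real engine works on the parsed query, not on text). The `auto_parameterize` function and the `_LITERAL` pattern are hypothetical names, and the pattern only handles single-quoted strings and plain numbers.

```python
import re

# A single-quoted string literal (with '' as an escaped quote) or a number.
_LITERAL = re.compile(r"'(?:[^']|'')*'|\b\d+(?:\.\d+)?\b")


def auto_parameterize(sql):
    """Replace literals with ? and return (parameterized_sql, values)."""
    values = []

    def _swap(match):
        token = match.group(0)
        if token.startswith("'"):
            # Strip the surrounding quotes and unescape doubled quotes.
            values.append(token[1:-1].replace("''", "'"))
        else:
            values.append(float(token) if "." in token else int(token))
        return "?"

    return _LITERAL.sub(_swap, sql), values


sql = (
    "SELECT title, component, assignee FROM users "
    "WHERE register_date > '2008-01-01' AND status = 'open';"
)
parameterized, values = auto_parameterize(sql)
print(parameterized)
print(values)
```

Queries that differ only in their literal values collapse to the same parameterized text, so they can share one cached plan.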
#2 Scalability
Scalability challenges
• How to synchronize?
  – Async
• Load balancing?
  – Third-party solution
• Who handles Fail-over?
  – Application
  – Third-party solution
• Cost?
HA solutions
DBMS | Cost | Disk-shared | Replication | Consistency | Auto-Failover
Oracle RAC | +++++ | Shared everything | N/A | N/A | O
MS-SQL Cluster | +++ | Shared everything | N/A | N/A | O
MySQL Cluster | ++ | Shared nothing | Log Based | Async, Sync | O
MySQL Replication + Third-party | Free | Shared nothing | Statement Based | Async | O
CUBRID | Free | Shared nothing | Log Based | Sync, Semi-sync, Async | O
Client Requests
1. Non-stop 24/7 service uptime
2. No missing data between nodes
Phase 1 (v8.1.0): Replication
Phase 2 (v8.2.x): HA Support
Phase 4 (v8.3.x): Extended HA features
Phase 5 (v8.4.x): HA Monitoring +
Phase 6 (Apricot): Easy Admin Scripts
Features listed across the phases:
• Async
• Auto Fail-over
• HA Status Monitoring
• HA Performance +
• Reduce Replication Delay Time
• CUBRID Heartbeat
• HA + Replica
• Admin Scripts
• Read-Write Service during DB maintenance
• Async, Semi-sync, Sync
• Broker Modes (RW, RO)
• N:N Master:Slave
http://www.cubrid.org/cubrid_ha_oscon
Master:Slave configurations:
• 1:1 M:S
• 1:N M:S
• 1:1:N M:S:R
• N:N M:S
• N:1 M:S
CUBRID HA: Benefits
• Non-stop maintenance
• Auto Fail-over
• Large Installations are Easy
• Load balancing
• Accurate and reliable Failure detection
• Various Master-Slave Configurations:
  – 3 replication modes
  – 3 broker modes
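Auto fail-over combined with broker modes can be pictured with a small routing sketch. This is a generic illustration in Python and does not describe CUBRID's actual heartbeat or broker logic. The `Node` class, the `fail_over` and `route` functions, the promotion rule (first live slave), and the rule that RO brokers prefer a slave are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    role: str          # "master" or "slave"
    alive: bool = True


def fail_over(nodes):
    """If the master is down, promote the first live slave; return the master."""
    master = next((n for n in nodes if n.role == "master"), None)
    if master is not None and master.alive:
        return master
    for node in nodes:
        if node.role == "slave" and node.alive:
            if master is not None:
                master.role = "slave"  # the old master would rejoin as a slave
            node.role = "master"
            return node
    raise RuntimeError("no live node available")


def route(nodes, broker_mode):
    """RW brokers go to the master; RO brokers prefer a live slave."""
    master = fail_over(nodes)
    if broker_mode == "RW":
        return master
    slaves = [n for n in nodes if n.role == "slave" and n.alive]
    return slaves[0] if slaves else master


cluster = [Node("db1", "master"), Node("db2", "slave"), Node("db3", "slave")]
cluster[0].alive = False  # simulate a failed heartbeat on the master
print(route(cluster, "RW").name, route(cluster, "RO").name)
```

The application keeps talking to the broker and never needs to know which node is currently the master.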
Database Sharding
• Partitioning: Divide the data between multiple tables within one Database Instance
• Sharding: Divide the data between multiple tables created in separate Database Instances
[Diagram: one DB holding X, Y, Z versus three DBs holding X, Y, and Z separately, each one a shard]
Without Database Sharding
[Diagram: App, Broker, DB, with tables Tbl1, Tbl2, Tbl3, Tbl4]
With Database Sharding
[Diagram: App, Broker, DB, with tables Tbl1, Tbl2, Tbl3, Tbl4, plus a Metadata Directory]
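The role of a metadata directory in the sharded setup can be sketched as a lookup from shard key to database instance. The Python sketch below is an illustration under assumptions, not CUBRID SHARD's configuration or algorithm. The `METADATA_DIRECTORY` contents, the host names, and the CRC32-modulo strategy are hypothetical.

```python
import zlib

# Hypothetical metadata directory: shard ID -> database instance address.
METADATA_DIRECTORY = {
    0: "db-x.example:33000",
    1: "db-y.example:33000",
    2: "db-z.example:33000",
}


def shard_id_for(shard_key, n_shards):
    """Map a shard key to a shard ID with a stable hash (CRC32) modulo."""
    return zlib.crc32(str(shard_key).encode("utf-8")) % n_shards


def route(shard_key):
    """Return the database instance that holds rows for this shard key."""
    return METADATA_DIRECTORY[shard_id_for(shard_key, len(METADATA_DIRECTORY))]


for user_id in (1, 2, 3, 42):
    print(user_id, "->", route(user_id))
```

With the routing done in the middle tier, the application sees a single database and carries no sharding logic of its own. A plain modulo is only one possible strategy, and it makes adding shards expensive because most keys move.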
CUBRID SHARD
Phase 1: Apricot
Phase 2: Banana
• Unlimited Shards
• Data Rebalancing
• Multiple Shard ID Gen. Algorithm
• Connection & Statement Pooling
• Load Balancing
• HA Support
• CUBRID, MySQL, Oracle Support
Sharding: Benefits
• Developer friendly
  – Single database view
  – No more application logic
  – No application changes
• Multiple sharding strategies
• Native scale-out support
• Load balancing
• Support for heterogeneous databases
#3 Ease of Use
Phase 1 (v.8.2.x): Oracle
Phase 2 (v.8.3.x): MySQL
Phase 4 (v8.4.x): MySQL
Phase 6 (Apricot): MySQL, Oracle
Features listed across the phases:
• Hierarchical Query
• SQL: 60+, PHP: 20+
• SQL: 70+, PHP: 20+
• Currency SQL
• LOB, API++
• Implicit Type Conversion+
• Usability+
• Usability+++
• RegExpr
• MSSQL win-back
• MySQL, Oracle win-back: Monitoring system
• Oracle: Ads, Shopping
Client Requests: SQL Compatibility
> 90% MySQL SQL Compatibility
Client Requests
1. API Support
2. Ease of Migration
3. Usability
Tools
Phase 1 (v.8.1.x): CM
Phase 2 (v.8.3.x): CM, CQB, CMT
Phase 3 (v.8.4.x): CUNITOR
Phase 3 (Apricot): Web manager
Also listed: CM Monitoring ++
APIs
Phase 1 (v.8.1.x): CCI, JDBC, OLEDB
Phase 2 (v.8.2.x): PHP, Python, Ruby
Phase 3 (v.8.3.x): ODBC
Phase 4 (v.8.4.x): Perl, ADO.NET
MSSQL Win-Back in 2010
[Step 1] Dual Write: Application → Dual Read/Writer → MS SQL and CUBRID
[Step 2] Dual Write and Read: Application → Dual Read/Writer → MS SQL and CUBRID
[Step 3] Win-back Complete: Application → CUBRID (Read, Write)
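The three steps can be sketched as a thin layer between the application and the two databases. This Python sketch is an illustration only, with plain dictionaries standing in for MS SQL and CUBRID. The `DualReadWriter` class, the `step` attribute, and the mismatch log are hypothetical. The slides do not say which database serves reads in Steps 1 and 2, so serving them from the old database here is an assumption.

```python
class DualReadWriter:
    """Toy dual read/writer for a three-step migration (illustrative only)."""

    def __init__(self, old_db, new_db):
        self.old_db = old_db  # stands in for the MS SQL side
        self.new_db = new_db  # stands in for the CUBRID side
        self.step = 1
        self.mismatches = []

    def write(self, key, value):
        # Steps 1 and 2 write to both stores; step 3 writes to the new one only.
        if self.step in (1, 2):
            self.old_db[key] = value
        self.new_db[key] = value

    def read(self, key):
        if self.step == 1:
            return self.old_db.get(key)  # assumption: the old DB serves reads
        if self.step == 2:
            old, new = self.old_db.get(key), self.new_db.get(key)
            if old != new:
                self.mismatches.append(key)  # record divergence for review
            return old
        return self.new_db.get(key)  # step 3: the new DB only


old_db, new_db = {"a": 1}, {}
rw = DualReadWriter(old_db, new_db)
rw.write("b", 2)  # step 1: both stores receive the write
rw.step = 2
rw.read("a")      # "a" predates dual writes, so it shows up as a mismatch
rw.step = 3
rw.write("c", 3)  # step 3: only the new store is written
print(new_db, old_db, rw.mismatches)
```

Comparing reads in Step 2 is one way to find rows that still need a backfill before the old database is switched off.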
• 16 Master/Slave servers and 1 Archive server
• DB size: 0.4~0.5 billion records/DB, Total 4 billion records
• Total 3.2 TB
• Total 4,000 ~ 5,000 QPS
• Save money for MSSQL License and SAN Storage
Oracle Win-Back in 2011
System Monitoring Service
[Diagram: ORACLE Enterprise and ORACLE Standard servers replaced by CUBRID servers. Labels: 40 servers, 25 servers, 1 server.]
• DB size: 1.5 ~ 2.0 TB/DB, Total 40 TB
• 10~100K Inserts per second
• Save money for Oracle License and SAN Storage
What we have learnt so far and where we are heading
What we have learnt so far
• Not easy to break users’ habits.
• Need time.
• Technical support is the key to acceptance!
• Some services don’t deserve Oracle.
CUBRID Deployment in NHN
Period | ∑ services | ∑ deployments
~2009 | 42 | 166
2010-1Q | 50 | 181
2010-2Q | 60 | 208
2010-3Q | 69 | 259
2010-4Q | 77 | 273
2011-1Q | 82 | 283
2011-2Q | 94 | 312
2011-3Q | 100 | 326
2011-4Q | 107 | 346
2012-1Q | 117 | 500
CUBRID Achievements
Stability
• Human vs. DB Errors
• # of customers
Performance
• Smart Index Optimizations
• Shared Query Caching
• Web Optimized Features
• Load Balancer
Scalability
• High-Availability w/ auto fail-over
• Sharding
• Data Rebalancer
• Cluster
Ease of Use
• > 90% MySQL SQL Compatibility
• Native Migration Tool
• Native GUI DB Management Tools
• Monitoring Tools
CUBRID Roadmap
8.4.x
• Performance++: Covering index, Key limit, Range scan
• SQL Compatibility+: 70+ new syntax
• HA++: Monitoring tools

• I18N, L10N: 2~3 European charsets
• SQL Compatibility++: Cursor holdability, Mass table UPDATE & DELETE
• DB SHARDING

• I18N, L10N+: more charsets
• Performance+++
• SQL monitoring performance+
• SQL Compatibility+++
• Table Partitioning Improvements
• DB SHARDING+

• Performance++++
• CUBRID Lite
• SQL Compatibility++++
• DB Monitoring Improvements
• Arcus Caching Integration
CUBRID is Big now.
What can you do?
1. Keep watching it
2. Consider using it
3. Discuss, talk, write about CUBRID
4. Support CUBRID in your apps
5. Contribute to CUBRID
6. Provide CUBRID service
. . .
• How do CUBRID developers cope with stress?
  – Join MySQL issue tracker ;)
• Want more?
  – Follow us to the next room. We’ll have more discussions!