Database sharding the right way: easy, reliable, and open source (Esen Sagynov)


Database Sharding the Right Way: Easy, Reliable, and Open Source

Esen Sagynov (@CUBRID), NHN Corporation, Service Platform Development Center

Monday, October 22, 2012

Who am I?

• Esen Sagynov (NHN Corp.)
  – CUBRID Project Manager
  – MVB at DZone
  – @CUBRID
  – fb.com/cubrid
  – esen@cubrid.org

Growing in the Wild. The Story by CUBRID Database Developers.

Esen Sagynov (@CUBRID), NHN Corporation, Service Platform Development Center

Monday, April 2, 2012

Eugen Stoianovici, NHN Corporation, CUBRID Development Lab

View on Slideshare

http://profyclub.ru/docs/439

We talked about…

• Who is NHN
• Reasons behind CUBRID development
• What CUBRID has to offer. Benefits & advantages.
• What we have learnt so far. Where we are heading to.

CUBRID Facts

• RDBMS
• True Open Source @ www.cubrid.org
• Optimized for Web services
• High performance
• Large DB support
• High-Availability feature
• DB Sharding support
• 90+% MySQL compatible SQL syntax + Oracle statistics functions
• ACID Transactions
• Online Backup
• Supported by NHN Corporation

• Do we use external libraries such as STL or Boost in CUBRID?

• How will CUBRID execute complex SQL queries and support transactions and ACID across multiple servers when scaling horizontally?

=Big Business Opportunity

How to manage Big Data?

NoSQL
- Open Source
- Scalable
- Non-standard API

- Enterprise
- Vendor dependency
- Scalability constraints
- Common interface

Big Data = NoSQL?

• Uses RDBMS with Sharding
• Data is stored as simple Key-Value.

• Uses RDBMS with Sharding
• Sharding and Replication is abstracted through Gizzard

• Uses RDBMS with Sharding
• HBase usage is limited

• Uses RDBMS to store data
• Data caching in a variety of ways

• Uses RDBMS with Sharding
• ACID is the reason to use RDBMS

• Uses RDBMS with Sharding
• Easier to implement, best suits their needs

• Uses RDBMS with Sharding and HA
• Data consistency and relationship are the reason

What NoSQL lacks?

Transactions

NoSQL => NoACID

Standard Interface

Experts

Oracle DB Market Share

[Chart: Oracle DB market share, 2009 to 2011, Korea vs. Worldwide; vertical axis from 40% to 70%]

DBMS Market ($MM)   2009     2010     2011     CAGR 2009-2011
Worldwide           21,359   23,252   26,701   11.8%
Korea               349      395      478      17%
Korea / Worldwide   1.6%     1.7%     1.8%

Source: Gartner, 2012

RDBMS is still the best choice for mission-critical data

How to manage Big data with RDBMS?

Database Sharding

• Partitioning
  Divide the data between multiple tables within one Database Instance

• Sharding
  Divide the data between multiple tables created in separate Database Instances

“User” Table
1 Jackie
2 Bruce
3 Chuck
4 Billy

DB Sharding

Shard #1: “User” Table
1 Jackie
2 Bruce

Shard #2: “User” Table
3 Chuck
4 Billy
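The split in the diagram can be sketched as a simple range rule on the row id. This is only an illustration of the picture, not how CUBRID SHARD assigns rows: the function name shard_for_user and the constant ROWS_PER_SHARD are hypothetical, and the "two rows per shard" rule is read off the diagram.

```c
/* Assumption taken from the diagram: each shard holds two consecutive ids. */
#define ROWS_PER_SHARD 2

/* Hypothetical helper: returns the 1-based shard number for a 1-based user id.
 * ids 1..2 map to shard #1, ids 3..4 map to shard #2, and so on. */
int shard_for_user(int user_id)
{
    return (user_id - 1) / ROWS_PER_SHARD + 1;
}
```

With this rule Jackie (1) and Bruce (2) land on Shard #1, and Chuck (3) and Billy (4) land on Shard #2, matching the diagram.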

Database Sharding

Sharding Solutions

Name                Type                          Requirements: DB             Requirements: ETC   Interface
Hibernate shards    AS framework                  DBMS w/ Hibernate support    Hibernate, JVM      Java
dbShards            AS & Middleware               MySQL                                            Java, C
Gizzard (Twitter)   Middleware                    Any storage                  JVM                 Java
Spider for MySQL    Middleware & Storage Engine   MySQL                                            Any
CUBRID SHARD        Middleware                    CUBRID, MySQL, Oracle

Disadvantages of Existing Solutions

• Third-party
• Separate installation/hardware
• Painful configuration and management
• Additional dependency
  – RDBMS upgrade?
  – Sharding solution upgrade?
• Generating Unique ID?
• Auto-rebalancing?
• HA?

The Ideal Solution

• A single open source RDBMS which provides native support for scalability:
  – database sharding
  – connection pooling
  – load balancing
  – data auto-rebalancing
  – high-availability

CUBRID 9.0

Scalability with CUBRID

CUBRID 8.4.1
  – connection pooling
  – load balancing
  – high-availability

CUBRID 9.0 (code name Apricot)
  – database sharding

CUBRID next (code name Banana)
  – data auto-rebalancing

Universal Sharding

• Supports:

coming soon…

Native DB Sharding

User Apps

CUBRID SHARD middleware

shard #0 shard #N……

CUBRID

How to install CUBRID SHARD?

Install CUBRID 9.0
http://www.cubrid.org/downloads

CUBRID SHARD Configuration

• shard.conf
  – Default configuration file for CUBRID SHARD.

• shard_connection.txt
  – Predefined list of shard IDs, database and host names for CUBRID/MySQL.

• shard_keys.txt
  – A list of shard_key_columns and their mapping with shard_id

shard.conf

SHARD_KEY_MODULAR = 256
SHARD_KEY_LIBRARY_NAME = ''
SHARD_KEY_FUNCTION_NAME = ''

shard_connection.txt

shard_keys.txt

Shard Key Column name id user_id order_no …

Custom Library

int user_get_shard_key(int type, void *val)
{
    int mod = 2;    /* modulus applied to integer shard keys */

    if (val == NULL) {
        return ERROR_ON_ARGUMENT;
    }

    switch (type) {
    case SHARD_U_TYPE_INT: {
        int ival = *(int *) val;
        return ival % mod;
    }
    case SHARD_U_TYPE_STRING:
        /* string keys are not handled by this example */
        return ERROR_ON_MAKE_SHARD_KEY;
    default:
        return ERROR_ON_ARGUMENT;
    }
    return ERROR_ON_MAKE_SHARD_KEY;
}

Configuring CUBRID SHARD is very easy!

#1) Create Shards
• Host 1..N:

$> cubrid createdb shard1
$> csql -S -u dba shard1 -c "create user shard password 'shard123'"
$> cubrid server start shard1

#2) Create same tables
• Host 1..N:

$> csql -C -u shard -p 'shard123' shard1@localhost -c "CREATE TABLE users (id BIGINT PRIMARY KEY, name VARCHAR(20), age SMALLINT)"

#3) Start CUBRID SHARD

$> cubrid shard start
@ cubrid shard start
++ cubrid shard start: success

Standard Connection URL

connectionURL = "jdbc:cubrid:localhost:45511:shard1:shard:shard123:";

– Shard broker port: 45511
– DB name: shard1
– username: shard
– password: shard123

String query = "SELECT name FROM student WHERE student_no = /*+ shard_key */ ?;";
PreparedStatement query_stmt = connection.prepareStatement(query);
query_stmt.setInt(1, 100);
ResultSet rs = query_stmt.executeQuery();
// fetch resultset

1. Execute query (Client app)
2. Query analysis
3. shard_key hashing
4. Passing the query to the selected shard

Shards: shard #0, shard #1, shard #2, shard #3

Shard selection

key_column    range (hash result)    shard_id
              min      max
student_no    0        63            0
student_no    64       127           1
student_no    128      191           2
student_no    192      255           3
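The shard selection table can be read as a two-step lookup: hash the shard key into the 0 to 255 range, then find the row whose min/max range contains the result. The sketch below is a minimal illustration of that lookup, not CUBRID SHARD's actual source. It assumes the hash of an integer key is simply the key modulo SHARD_KEY_MODULAR (256, as in shard.conf); the names shard_range, RANGES and select_shard are hypothetical.

```c
#include <stddef.h>

#define SHARD_KEY_MODULAR 256   /* value from shard.conf */

/* One row of the shard selection table (hypothetical in-memory form). */
struct shard_range {
    int min;
    int max;
    int shard_id;
};

/* The four ranges listed for key_column student_no. */
static const struct shard_range RANGES[] = {
    {0, 63, 0},
    {64, 127, 1},
    {128, 191, 2},
    {192, 255, 3},
};

/* Returns the shard_id for an integer shard key, or -1 if no range matches.
 * Assumption: the hash result is shard_key modulo SHARD_KEY_MODULAR. */
int select_shard(int shard_key)
{
    int hash = shard_key % SHARD_KEY_MODULAR;
    if (hash < 0) {
        hash += SHARD_KEY_MODULAR;  /* keep the result in 0..255 for negative keys */
    }
    for (size_t i = 0; i < sizeof RANGES / sizeof RANGES[0]; i++) {
        if (hash >= RANGES[i].min && hash <= RANGES[i].max) {
            return RANGES[i].shard_id;
        }
    }
    return -1;
}
```

Under this assumption, the student_no value 100 bound in the JDBC example hashes to 100, which falls in the 64 to 127 range, so the query would be passed to shard_id 1.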

Querying Shards

SELECT name FROM student WHERE student_no = /*+ shard_key */ ?;

SQL hint

Shard key column

• bind variable• fixed value

Types of SQL Hints

SQL Hints                    Description

/*+ shard_key */             A hint that marks the location of the bind variable or literal value which corresponds to the shard key column.

/*+ shard_val(value) */      A hint that explicitly specifies the shard key when the column that corresponds to the shard key does not exist in the query.

/*+ shard_id(shard_id) */    A hint that can be used to process user queries directly on a particular shard.

How did we tackle the unique ID problem?

Generating Unique IDs

• DB Ticket Server
  – SERIAL (max = 10^37 for ascending serial)
  – Sortable
  – 64 bits (BIGINT)
  – SERIAL CACHE (ON/OFF) for improved performance
  – Auto fail-over through HA

Other ways

• Create a generic table with BIGINT AUTO_INCREMENT PRIMARY KEY column (also “fail-over”-able)
• Generate IDs in Web app
• Dedicated Service like Twitter’s Snowflake
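For the last option, a Snowflake-style ID packs a timestamp, a worker number and a per-millisecond sequence into one sortable 64-bit value that fits a BIGINT column. The sketch below is a minimal illustration assuming Snowflake's published layout of 41 timestamp bits, 10 worker bits and 12 sequence bits. The function make_id and its parameters are hypothetical names, not part of CUBRID or of Twitter's service, and the caller is assumed to supply the clock value and the sequence counter.

```c
#include <stdint.h>

#define WORKER_BITS   10   /* assumption: Snowflake-style worker id width */
#define SEQUENCE_BITS 12   /* assumption: Snowflake-style sequence width */

/* Hypothetical helper: composes a sortable 64-bit ID.
 * ms_since_epoch occupies the high bits, so IDs created later compare greater. */
uint64_t make_id(uint64_t ms_since_epoch, uint32_t worker_id, uint32_t sequence)
{
    uint64_t worker = worker_id & ((1u << WORKER_BITS) - 1);   /* keep low 10 bits */
    uint64_t seq    = sequence & ((1u << SEQUENCE_BITS) - 1);  /* keep low 12 bits */

    return (ms_since_epoch << (WORKER_BITS + SEQUENCE_BITS))
         | (worker << SEQUENCE_BITS)
         | seq;
}
```

Because the timestamp sits in the most significant bits, IDs generated on different workers still sort by creation time at millisecond granularity, without a round trip to a central ticket server.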

Does CUBRID SHARD support HA?

CUBRID SHARD in HA

[Diagram: several client apps connect to the CUBRID SHARD middleware, which routes to the Master shards #0 to #3; each Master shard is paired (=) with a second copy of the same shard through sync / async / semi-sync replication with auto fail-over.]

CUBRID SHARD Performance

[Diagram: test architecture with an nGrinder Controller, nGrinder Agents and nDrive App Simulators (DBGW), 8 agents and simulators in total, a Shard Proxy, a Broker with CAS processes (CCI connections), Meta DB 1 to 4, and a User DB.]

Description                                        Quantity   OS (64bit) / CPU / MEM
Agent to generate load and nDrive App Simulator    8          CentOS 5.3 / Xeon 2G-8core / 8G
CUBRID Shard                                       1          CentOS 5.3 / Xeon 2.27G-16core / 24G
CUBRID Broker                                      1          CentOS 5.3 / Xeon 2.27G-16core / 24G
Meta DB                                            4          CentOS 5.x / Xeon 2.33G-4core / 8G
User DB                                            1          CentOS 5.3 / Xeon 2.5G-8core / 8G


Performance Test Scenario

Test environment and hardware configurations

Test scenario #1: Maximum performance test

- Check if it can handle the load of at least 60,000 RPS

RPS = Requests Per Second (Throughput)
Vuser = # of concurrent users

[Chart: Load Generator Performance; x-axis: # of concurrent users (32 to 512); y-axis: 0 to 100,000]

Increase in Vuser leads to an increase:
- of response time
- of CPU load on Proxy hardware

After 320 Vuser, RPS decreases
- Proxy server load
(CPU load of the Simulator hardware is maintained under 40%)

[Chart: Performance trend when load is increased; x-axis: 64 to 320 Vuser; series: proxy cpu, Mean Time (ms), RPS, metadb TPS]

Max performance conditions
- 256 Vuser
- 13981.98 RPS
- 4 Proxy processes => 200 CAS
(If # of Proxy processes is increased, performance can be increased)

Test Results (1)

Test scenario #2: Performance results comparison when:
- SHARD is used
- SHARD is NOT used

- When SHARD is not used, Broker is used.

Test Results (2)

[Chart: SHARD vs. Broker Performance Comparison; x-axis: 64 to 320 Vuser; series: Broker - broker CPU, Shard - proxy CPU, Broker - Mean Time (ms), Shard - Mean Time (ms), Broker - RPS, Shard - RPS]

- Similar performance until 128 Vuser
- When SHARD is not used, 128 Vuser is the maximum
- When SHARD is used, as the # of Vuser increases, maximum performance can be achieved along with shorter response time and lower CPU utilization.

TPC-C Performance Test

Test Scenario

• 9 tables
  – customer (300000 records)
  – district (100 records)
  – history (300000 records)
  – item (100000 records)
  – new_order (0 records)
  – order_line (3000000 records)
  – orders (300000 records)
  – stock (1000000 records)
  – warehouse (10 records)

• The total number of records > 5 million.

• AWS Xlarge instance
  – 7GB RAM
  – 20 EC2 units

• Ubuntu 12.04 64-bit

• CUBRID 9.0 (beta)
• MySQL 5.5.28

• Buffer
  – 2.8GB data_buffer_size
  – 2.8GB innodb_buffer_pool_size

• Default configurations

Test Results

[Chart: TPC-C Index, MySQL vs. CUBRID]

CUBRID SHARD is very stable and easy to use!

CUBRID SHARD Features

• Single database view
• No application change
• Unlimited DB shards
• Multiple Sharding Strategies
• Parameterized queries
• Shard targeted query (SQL Hints)
• Shared Query Plan Caching
• Connection and statement pooling
• Load balancing
• Read only Sharding Broker
• 1+ Sharding Brokers/Proxies
• Native HA support
• Master-Slave DB
• Generic (non-sharded) Tables
• Supports CUBRID and MySQL

CUBRID SHARD

• Easy
  – No configuration hassle
  – No “moving parts”

• Reliable
  – High performance
  – Auto fail-over

• Open source
  – Supported by NHN

Big data is easy!

What’s next for CUBRID?

CUBRID vNext (Banana)

Auto-rebalancing in CUBRID SHARD
Performance +++
SQL Compatibility +++
SQL Monitoring ++

Esen Sagynov
CUBRID Project Manager

esen@cubrid.org

www.cubrid.org

www.facebook.com/cubrid www.twitter.com/cubrid

CUBRID Q&A www.cubrid.org/questions
