MariaDB: Connect Storage Engine

© MariaDB. Company Confidential.

© MariaDB Corporation Ab. Company Confidential.

MariaDB & CONNECT Storage Engine

Serge Frezefond

[email protected] @sfrezefond


MariaDB Server
•  RDBMS open source project based on MySQL, under GPLv2
•  Backed by the MariaDB Foundation
•  Enterprise product (MariaDB Enterprise)
•  Goal is to be the best DB for DevOps while remaining compatible with MySQL


MariaDB Versions

•  MariaDB 5.1 based on MySQL CE 5.1
   –  MariaDB 5.2 based on MariaDB 5.1
   –  MariaDB 5.3 based on MariaDB 5.2
•  MariaDB 5.5 based on MySQL CE 5.5
•  MariaDB 10.0 based on MariaDB 5.5
   –  Plus features from MySQL 5.6
•  MariaDB 10.1 based on MariaDB 10.0
   –  Plus features from MySQL 5.7


MariaDB 10.0

Scalability
●  Advanced parallel replication
●  Sharding
●  MaxScale proxy

Performance
●  Enhanced optimization
●  Improved and special-purpose storage engines
●  Carefully tuned and enhanced server internals
●  Advanced performance monitoring

Availability
●  HA clustering: integrating Galera Cluster
●  More online operations, less planned downtime

NoSQL
●  Interoperable storage engines such as Cassandra and CONNECT
●  Dynamic columns and JSON processing
●  HandlerSocket API

Operations
●  Comprehensive diagnostics built into the DB
●  APIs and open architecture for easier integration

Security
●  Role-based access control
●  Authentication plugins
●  Sophisticated auditing capabilities

© 2014, MariaDB Corp.

Global Transaction ID (GTID)
●  New MariaDB-exclusive global event ID, unique across multiple independent replication streams.
   ○  DomainID added to SeqNum-ServerID to uniquely label replication events.
   ○  Slaves save their replication status in a crash-safe table, transactionally synced to the slave's binlog.
   ○  Replication streams are always strictly ordered, but independent streams may be interleaved on the slave.
●  Much simpler failover to a new master in complex topologies.
●  Supports multi-source and parallel replication.

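[Figure: two independent replication streams, A1 to A5 and B1 to B5, arrive interleaved on the slave (A1, A2, B1, A3, B2, B3, A4); the crash-safe replication state records the last event applied from each stream (A4, B3). MariaDB GTID layout: DomainID (32-bit), SeqNum (64-bit), ServerID (32-bit).]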
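The GTID slide says the slave keeps one position per independent stream and that each stream is strictly ordered. As a minimal Python sketch (not MariaDB code), assuming the textual domain-server-seqno form that MariaDB uses for GTIDs such as 0-1-100, and a comma-separated list with one GTID per domain for a slave position, the following tracks the last applied event per domain while two streams interleave. The helper names are hypothetical; the ordering check simply enforces the "strictly ordered within a stream" property stated above.

```python
from typing import Dict, Iterable, NamedTuple


class Gtid(NamedTuple):
    domain_id: int  # 32-bit: identifies an independent replication stream
    server_id: int  # 32-bit: server that originally wrote the event
    seq_no: int     # 64-bit: position of the event within its domain


def parse_gtid(text: str) -> Gtid:
    """Parse a GTID written as 'domain-server-seqno', e.g. '0-1-100'."""
    domain, server, seq = (int(part) for part in text.strip().split("-"))
    if not (0 <= domain < 2**32 and 0 <= server < 2**32 and 0 <= seq < 2**64):
        raise ValueError(f"GTID component out of range: {text!r}")
    return Gtid(domain, server, seq)


def apply_events(state: Dict[int, Gtid], events: Iterable[str]) -> Dict[int, Gtid]:
    """Record the last applied GTID per domain (hypothetical helper).

    Domains may interleave freely; inside one domain the sequence numbers
    must increase, otherwise the event is rejected.
    """
    for text in events:
        gtid = parse_gtid(text)
        last = state.get(gtid.domain_id)
        if last is not None and gtid.seq_no <= last.seq_no:
            raise ValueError(f"out-of-order event in domain {gtid.domain_id}: {text}")
        state[gtid.domain_id] = gtid
    return state


def format_position(state: Dict[int, Gtid]) -> str:
    """Render the per-domain state as a comma-separated GTID list."""
    return ",".join(f"{g.domain_id}-{g.server_id}-{g.seq_no}"
                    for _, g in sorted(state.items()))


if __name__ == "__main__":
    # Stream "A" is domain 1, stream "B" is domain 2, interleaved as in the figure.
    stream = ["1-10-1", "1-10-2", "2-20-1", "1-10-3", "2-20-2", "2-20-3", "1-10-4"]
    print(format_position(apply_events({}, stream)))
```

With the interleaving from the figure, the recorded position ends at the fourth event of stream A and the third event of stream B, which is what the crash-safe state in the figure shows.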


Parallel Slave Replication

●  Sponsored by Google.
●  Allows slaves to process update events in parallel.
●  Uses MariaDB 10's improved Global Transaction ID (GTID).
●  Preliminary benchmarks: almost 10x faster at 12 threads.


Multi-Source Replication

●  Collects data for analytics using built-in replication.
●  Aids in administration, for example consolidated backups of multiple databases.
●  Uses MariaDB 10's improved Global Transaction ID (GTID).

[Figure: three masters, each with its own slaves (Online E-Commerce Application, Content Management System, Click-stream data), all replicating into a single Data Warehouse slave used for ETL.]


MariaDB Galera Cluster

●  Clustered MariaDB nodes cooperate to remain in sync.
●  With multiple master nodes, reads and updates both scale.
●  Synchronous replication with optimistic locking delivers high availability with little overhead.
●  Fast failover because all nodes remain synchronized.
●  Integrated and tested binaries.

[Figure: Application / App Server connecting to the cluster through load balancing and failover.]


Optimizer Improvements

●  Enhancements include:
   ○  Disk access optimizations.
   ○  JOIN optimizations.
   ○  Subquery optimizations.
   ○  Optimized derived tables and views.
   ○  Execution control.
   ○  Optimizer control.
   ○  EXPLAIN improvements.

Lower I/O, CPU and memory requirements. Faster execution.


How Does Fusion-io Flash Storage Accelerate Databases?

[Figure: latency scale from CPU cache and DRAM down to disks and SANs, spanning nanoseconds, microseconds and milliseconds (6 orders of magnitude).]


How Much Faster Is MariaDB 10 With Fusion-io?

[Figure: transactions per second over 24 hours (marks at 1, 12 and 24 hours), MariaDB 10 on Fusion-io flash vs. HDD. Annotations:]
●  Fusion-io fills the buffer pool in less than an hour.
●  Never dips below 25,000 tx/sec.
●  Not all the data fits in the buffer pool, so performance dips.
●  HDD performance rises for much longer, as it takes a LOT longer to fill the buffer pool.
●  HDD: performance dips as I/O increases.
●  About 24 times faster.


MariaDB 10 Interoperability: Cassandra Storage Engine

●  Window into a Cassandra ring: read/write it like a table in MariaDB.
●  Use standard SQL queries.
●  JOIN Cassandra data to MariaDB tables.
●  Use a MariaDB cluster for high-availability access.
●  Bring data from Cassandra into OLTP applications.

[Figure: Application on top of the MariaDB Parser/Optimizer/Connection Pool, which drives the Cassandra Engine, Spider, other engines and database tables.]


Dynamic (& Virtual) Columns

●  Store unstructured data in MariaDB tables with a simple API.
●  Use MariaDB's indexing and transactions to manipulate "document"-style data fast and consistently.
●  Nest sets of dynamic columns inside other dynamic columns for hierarchical structuring.
●  Include multiple rows with dynamic columns in transactions.
●  Virtual columns let you create function-based (computed) columns.

Cust ID Account Balance Dyn_Col_BLOBs

2035 $154.04 NAME: John Smith|LOC: 45.35243, -74.98348|IMAGE: x27A8B8C ...

2036 $929.10 NAME: Jane Doe|LOC: 45.35243, -74.98348|AGE: 32| GENDER: F...

2037 $377.53 NAME: Carol Jones|AGE: 43|GENDER: F||IMAGE: xA9674DE678 ...


Introducing MariaDB MaxScale, Web Scale Database Proxy

MaxScale hides complexity, making clusters of systems look like a single server to a client.

●  Simplifies complex replication schemes for massive scale and high availability.
●  Manages performance with logging.
●  Safeguards data through firewall filtering.
●  Connects diverse clients and databases with multiple protocols and query transformations.

[Figure: clients send simple requests to MaxScale.]


MaxScale Use Case 1: Read Scalability

19.06.2014

MySQL Replication + Connection Load Balancing

Each application server uses 2 connections: 1 R/W, 1 R.

MaxScale connects the R/W client connections to the master, and the R connections are load-balanced across all slaves.


MaxScale Use Case 2: R/W Routing

MySQL Replication + R/W Split Routing

Each application server uses only 1 connection.

MaxScale creates 2 connections: one for R/W on the master node and one for R/O, load-balanced across the slave nodes.

[Figure: MaxScale R/W splitting]

MaxScale monitors the state of each node and only routes operations to available slaves.
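To make the R/W split idea concrete, here is a toy Python router. It is a hypothetical class, not MaxScale's implementation or configuration: it assumes a statement counts as a read when it starts with SELECT or SHOW and is not a locking SELECT ... FOR UPDATE, sends everything else to the master, round-robins reads over slaves marked available by a monitor, and (as a design choice of this sketch) falls back to the master when no slave is up. Real routers also track transactions and session state, which this sketch ignores.

```python
import itertools
import re
from typing import Dict, List

READ_ONLY = re.compile(r"^\s*(SELECT|SHOW)\b", re.IGNORECASE)
LOCKING_READ = re.compile(r"\bFOR\s+UPDATE\b", re.IGNORECASE)


class RWSplitRouter:
    """Toy read/write splitter (hypothetical, for illustration only)."""

    def __init__(self, master: str, slaves: List[str]) -> None:
        self.master = master
        self.slaves = list(slaves)
        self.available: Dict[str, bool] = {s: True for s in self.slaves}
        self._turn = itertools.count()  # round-robin counter for reads

    def set_available(self, node: str, up: bool) -> None:
        """Called by a (hypothetical) monitor when a slave goes down or comes back."""
        self.available[node] = up

    def route(self, sql: str) -> str:
        """Return the node a statement should be sent to."""
        is_read = READ_ONLY.match(sql) is not None and not LOCKING_READ.search(sql)
        if not is_read:
            return self.master                  # writes and locking reads
        healthy = [s for s in self.slaves if self.available[s]]
        if not healthy:
            return self.master                  # no slave up: fall back to master
        return healthy[next(self._turn) % len(healthy)]


if __name__ == "__main__":
    router = RWSplitRouter("master", ["slave1", "slave2"])
    print(router.route("INSERT INTO t VALUES (1)"))  # master
    print(router.route("SELECT * FROM t"))           # slave1
    print(router.route("SELECT * FROM t"))           # slave2
```

Marking a slave unavailable simply removes it from the read rotation, which mirrors the monitoring behavior described on the slide.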


MariaDB 10 Interoperability: CONNECT Storage Engine

●  From a 3rd-party developer.
●  Maps diverse data to tables.
●  JOIN mapped data to DB tables.
●  Flat files including CSV.
●  Tables in external DBs.
●  Generated tables (PIVOT etc.).
●  Plug-in API for your own mappings.

Powerful tool for data integration and federation.

[Figure: Application on top of the MariaDB Parser/Optimizer/Connection Pool, which drives the CONNECT Engine (.log, XML and CSV sources), Spider, other engines and database tables.]


The CONNECT Storage Engine

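[Figure: MySQL Server / MariaDB hosts storage engines (MyISAM, InnoDB, Memory, Connect, Federated, Merge, CSV, ...); CONNECT provides table types (ODBC, MySQL, XML, CSV, DIR, TBL, JSON, ...) that map to external sources (XML, CSV, ODBC, MySQL, DIR, ...).]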


CONNECT Storage Engine: ODBC table type

●  Allows access to any ODBC data source.
   –  Excel, Access, Firebird, SQLite
   –  SQL Server, Oracle, DB2
●  Supports INSERT, UPDATE, DELETE and any other command.
●  Multi-file ODBC tables: e.g. consolidating monthly Excel spreadsheets.
●  On Linux, access to ODBC data sources through unixODBC.
●  WHERE conditions are pushed down to the ODBC source.


DSN for ODBC target

●  Standard DSN syntax is used.
●  Auto-discovery of columns takes place.
●  If specified, the column set can be a subset of the target table.

create table toto ENGINE=CONNECT TABLE_TYPE=ODBC tabname='EMP'
  CONNECTION='DSN=orcl;UID=scott;PWD=manager1';


Remote execution of SQL on ODBC target

●  To execute non-MySQL syntax
●  To compute aggregates remotely

create table emp_hierarchy ENGINE=CONNECT TABLE_TYPE=ODBC tabname='EMP'
  CONNECTION='DSN=orcl;UID=scott;PWD=manager'
  srcdef='SELECT empno, ename, mgr, LEVEL FROM emp CONNECT BY PRIOR empno = mgr';


Any command on the ODBC target

create table crlite (
  command varchar(128) not null,
  number int(5) not null flag=1,
  message varchar(255) flag=2)
engine=connect table_type=odbc
  connection='Driver=SQLite3 ODBC Driver;Database=test.sqlite3;NoWCHAR=yes'
  option_list='Execsrc=1';

select * from crlite where command = 'CREATE TABLE lite (ID integer primary key, …)';


JSON is cool for developers, so ...

●  Hierarchical, with simple types
●  JSON is used as the storage format of document stores: Couchbase, MongoDB (BSON)
●  Used by REST APIs: Facebook, AWS, Azure …
●  JSON is at the heart of JavaScript
●  Used by many cool dev frameworks: AngularJS …


Not much so far for JSON & MySQL

●  JSON UDFs to do basic operations:
   ○  To manipulate JSON (search, contains, extract, set, append, remove, replace ..)
   ○  To generate JSON (value, object, arrays, member …)
   ○  They are UDFs, so do not expect top performance
●  Import / export in JSON
   ○  Mysql2json
   ○  Mysqljson
●  EXPLAIN output in JSON


JSON output of MariaDB dynamic columns

MariaDB JSON export of dynamic columns

> select item_name, COLUMN_JSON(dynamic_cols) from assets;
+-----------------+----------------------------------------+
| item_name       | COLUMN_JSON(dynamic_cols)              |
+-----------------+----------------------------------------+
| MariaDB T-shirt | {"size":"XL","color":"blue"}           |
| Thinkpad Laptop | {"color":"black","warranty":"3 years"} |
+-----------------+----------------------------------------+


Not much so far for JSON and MySQL, but ...

●  A native JSON type with content indexing and a compact format would be great. Now it is coming!
   ○  Facebook introduces DocStore with native JSON support
      ■  Indexing on mixed column + document path
   ○  MySQL 5.7 lab release with native JSON support:
      ■  Validation, fast access
      ■  Indexing through virtual columns
●  It was about time: PostgreSQL already has it (JSONB, indexing, functions …)


JSON is simple

{
  "ISBN": "9782212090819",
  …
  "AUTHOR": [
    {
      "FIRSTNAME": "Jean-Christophe",
      "LASTNAME": "Bernadac"
    },
    ..
  ],
  "TITLE": "Construire une application XML",
  "PUBLISHER": {
    "NAME": "Eyrolles", ...
  },
}


Create a table on JSON file

●  JSON paths are used to map column names to JSON properties

create table jsampall (
  ISBN char(15),
  …
  Author char(128) field_format='AUTHOR:[" and "]',
  Title char(32) field_format='TITLE',
  Publisher char(20) field_format='PUBLISHER:NAME'
)
engine=CONNECT table_type=JSON File_name='biblio3.jsn';


JSON Table Type Query Result

select title, author, publisher, location from jsampall;

The result is:

Title                           Author                                      Publisher        Location
Construire une application XML  Jean-Christophe Bernadac and François Knab  Eyrolles         Paris
XML en Action                   William J. Pardi                            Microsoft Press  Paris


Create a table on JSON file

●  Possibility to define the starting point
●  For example, with this Facebook JSON file:

{
  "data": [
    {
      "id": "X999_Y999",
      "from": { "name": "Tom Brady", "id": "X12" },
      "message": "Looking forward to 2010!",
      "actions": [
        {
          "name": "Comment",
          "link": "http://www.facebook.com/X999/posts/Y999"


Create a table on JSON file

●  Possibility to define the starting point

create table jfacebook (
  `ID` char(10) field_format='id',
  `Name` char(32) field_format='from:name',
  `Action` char(16) field_format='actions::name',
  `Link` varchar(256) field_format='actions::link')
engine=connect table_type=JSON file_name='facebook.json'
option_list='Object=data,Expand=actions';
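The Object and Expand options change where rows come from. A minimal Python sketch of that mapping, under the assumption that Object=data means "start at the data array" and Expand=actions means "emit one row per element of each post's actions array, repeating the other columns". This is an illustration, not CONNECT's code; the column names follow jfacebook, and the "Like" action is a hypothetical addition so that the expansion produces more than one row.

```python
import json
from typing import Any, Dict, List

# Fragment of the Facebook file from the slide; the "Like" action is
# hypothetical, added so the expansion yields two rows.
DOC = json.loads("""
{"data": [
  {"id": "X999_Y999",
   "from": {"name": "Tom Brady", "id": "X12"},
   "message": "Looking forward to 2010!",
   "actions": [
     {"name": "Comment", "link": "http://www.facebook.com/X999/posts/Y999"},
     {"name": "Like", "link": "http://www.facebook.com/X999/posts/Y999"}
   ]}
]}
""")


def rows(doc: Dict[str, Any], obj: str, expand: str) -> List[Dict[str, Any]]:
    """Mimic Object=<obj>, Expand=<expand>: start at doc[obj] and emit one
    row per element of the expanded array; post-level columns repeat."""
    out = []
    for post in doc[obj]:
        for action in post[expand]:
            out.append({
                "ID": post["id"],              # field_format='id'
                "Name": post["from"]["name"],  # field_format='from:name'
                "Action": action["name"],      # field_format='actions::name'
                "Link": action["link"],        # field_format='actions::link'
            })
    return out


for row in rows(DOC, "data", "actions"):
    print(row)
```

Without Expand, only one value per post would be returned for the actions columns; expansion is what turns the nested array into extra rows.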


JSON file formats supported

●  2 JSON file formats supported
●  Format of exported MongoDB files (one object per line):

{ "_id" : "01001", "city" : "AGAWAM", "loc" : [ -72.622739, 42.070206 ], "pop" : 15338, "state" : "MA" }
{ "_id" : "01002", "city" : "CUSHMAN", "loc" : [ -72.51564999999999, 42.377017 ], "pop" : 36963, "state" : "MA" }
…

create table cities (
  `_id` char(5) key,
  `city` char(32),
  `long` double(12,6) field_format='loc:[1]',
  `lat` double(12,6) field_format='loc:[2]')
engine=CONNECT table_type=JSON file_name='cities.json' lrecl=128 option_list='pretty=0';
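With pretty=0 each line of the file is a complete JSON object, so it can be read line by line. The Python sketch below reproduces the cities mapping on the two records shown, assuming (as the loc:[1] / loc:[2] paths above suggest) that array positions are 1-based, and rounding to 6 decimals to mirror double(12,6). The helper names are hypothetical and this is not how CONNECT is implemented.

```python
import json
from typing import Any, Dict, Iterator, List, Optional

# The two MongoDB-export records shown on the slide: one JSON object per line.
CITIES = """\
{ "_id" : "01001", "city" : "AGAWAM", "loc" : [ -72.622739, 42.070206 ], "pop" : 15338, "state" : "MA" }
{ "_id" : "01002", "city" : "CUSHMAN", "loc" : [ -72.51564999999999, 42.377017 ], "pop" : 36963, "state" : "MA" }
"""


def nth(array: List[Any], n: int) -> Optional[Any]:
    """1-based access as in loc:[1] / loc:[2]; out-of-range n yields None here."""
    return array[n - 1] if 1 <= n <= len(array) else None


def read_cities(text: str) -> Iterator[Dict[str, Any]]:
    """Yield one row per non-empty line, mapped like the cities table."""
    for line in text.splitlines():
        if not line.strip():
            continue
        doc = json.loads(line)  # each line is a complete JSON object
        yield {
            "_id": doc["_id"],
            "city": doc["city"],
            "long": round(nth(doc["loc"], 1), 6),  # double(12,6): 6 decimals
            "lat": round(nth(doc["loc"], 2), 6),
        }


for row in read_cities(CITIES):
    print(row)
```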


JSON Table Type: The Jpath Specification

Specification   Array Type   Description
[n]             All          Take the nth value of the array. Ignore it if n is 0.
[X] or [x]      All          Expand. Generate one row for each array value.
["string"]      String       Concatenate all values, separated by the specified string.
[+]             Numeric      Make the sum of all the array values.
[*]             Numeric      Make the product of all the array values.
[!]             Numeric      Make the average of all the array values.
[>] or [<]      All          Return the greatest or least value of the array.
[#]             All          Return the number of values in the array.
[]              All          Sum if numeric, else concatenation separated by ", ".
(blank)         All          Take the first value if an array.
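The table above is essentially a small set of reductions over an array. As a sketch of those semantics (not CONNECT's implementation), the Python function below applies one specification to a list of scalar values. It assumes 1-based [n] indexing, consistent with the loc:[1] / loc:[2] example, returns None for [0] to stand for "ignored", uses None for the blank (no-bracket) case and "" for [], and returns the list itself for [x] to stand for "one row per value". The sample amounts in the test are hypothetical.

```python
import math
from typing import Any, List, Optional


def _is_number(v: Any) -> bool:
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def apply_spec(values: List[Any], spec: Optional[str]) -> Any:
    """Apply one Jpath array specification to a list of scalar values.

    `spec` is the text between the brackets ('' for []), or None when the
    path has no bracket at that step (take the first value).
    """
    if spec is None:                                   # blank: first value
        return values[0] if values else None
    if spec.isdigit():                                 # [n]: 1-based, 0 ignored
        n = int(spec)
        return values[n - 1] if 1 <= n <= len(values) else None
    if len(spec) >= 2 and spec[0] == spec[-1] == '"':  # ["sep"]: concatenate
        return spec[1:-1].join(str(v) for v in values)
    if spec in ("x", "X"):                             # expand: one row per value
        return list(values)
    if spec == "#":
        return len(values)
    if spec == ">":
        return max(values)
    if spec == "<":
        return min(values)
    if spec == "+":
        return sum(values)
    if spec == "*":
        return math.prod(values)
    if spec == "!":
        return sum(values) / len(values) if values else None
    if spec == "":                                     # []: sum, else ", "-join
        if all(_is_number(v) for v in values):
            return sum(values)
        return ", ".join(str(v) for v in values)
    raise ValueError(f"unsupported array specification: [{spec}]")
```

For example, with hypothetical amounts [18.0, 12.0, 24.0], [+] gives 54.0 and [!] gives 18.0, while [", "] over ["Beer", "Food", "Car"] gives "Beer, Food, Car", the same shape as the WHAT column in the aggregation example that follows.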


JSON facts table: aggregation of property values

The facts file:

[
  {
    "WHO": "Joe",
    "WEEK": 3,
    "EXPENSE": [
      { "WHAT": "Beer", "AMOUNT": 18.00 },
      …
    ]
  },
  {
    "WHO": "Joe",
    "WEEK": 4,
    "EXPENSE": …


Aggregation through SQL/JSON Path

create table jexpw (
  WHO char(12) not null,
  WEEK int(2) not null field_format='WEEK:[x]:NUMBER',
  WHAT char(32) not null field_format='WEEK::EXPENSE:[", "]:WHAT',
  SUM double(8,2) not null field_format='WEEK::EXPENSE:[+]:AMOUNT',
  AVERAGE double(8,2) not null field_format='WEEK::EXPENSE:[!]:AMOUNT')
engine=CONNECT table_type=JSON File_name='expense.json';

The result is:

WHO  WEEK  WHAT                          SUM    AVERAGE
Joe  3     Beer, Food, Food, Car         69.00  17.25
Joe  4     Beer, Beer, Food, Food, Beer  83.00  16.60


JSON facts table: aggregation through SQL

CREATE TABLE `jexpall` (
  `WHO` char(12) DEFAULT NULL,
  `WEEK` int(2) DEFAULT NULL `field_format`='WEEK:[x]:NUMBER',
  `WHAT` char(32) DEFAULT NULL `field_format`='WEEK:[x]:EXPENSE:[x]:WHAT',
  `AMOUNT` double(8,2) DEFAULT NULL `field_format`='WEEK:[x]:EXPENSE:[x]:AMOUNT'
) ENGINE=CONNECT DEFAULT CHARSET=latin1 `table_type`=JSON `File_name`='/var/lib/mysql/json/expense.jsn';

select who, week, group_concat(what), sum(amount), avg(amount)
from jexpall group by who, week;
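The two approaches reach the same kind of result by different routes: jexpw aggregates inside the Jpath, while jexpall expands every array with [x] and leaves the aggregation to GROUP BY. A Python sketch of the second route, using a hypothetical facts file shaped the way jexpall's Jpaths imply (WHO, then WEEK[] of {NUMBER, EXPENSE[] of {WHAT, AMOUNT}}); the names and amounts are invented for illustration. GROUP_CONCAT's default "," separator is mimicked; its ordering in SQL is unspecified without ORDER BY.

```python
from typing import Dict, List, Tuple

# Hypothetical facts with the shape jexpall's Jpaths imply.
FACTS = [
    {"WHO": "Ann", "WEEK": [
        {"NUMBER": 1, "EXPENSE": [{"WHAT": "Beer", "AMOUNT": 10.0},
                                  {"WHAT": "Food", "AMOUNT": 20.0}]},
        {"NUMBER": 2, "EXPENSE": [{"WHAT": "Car", "AMOUNT": 30.0}]},
    ]},
]

Row = Tuple[str, int, str, float]


def expand(facts: List[dict]) -> List[Row]:
    """One row per WEEK:[x] x EXPENSE:[x] combination, like the jexpall table."""
    return [(p["WHO"], w["NUMBER"], e["WHAT"], e["AMOUNT"])
            for p in facts for w in p["WEEK"] for e in w["EXPENSE"]]


def group_by_who_week(rows: List[Row]) -> List[Tuple[str, int, str, float, float]]:
    """Python analogue of: select who, week, group_concat(what), sum(amount),
    avg(amount) from jexpall group by who, week."""
    groups: Dict[Tuple[str, int], List[Tuple[str, float]]] = {}
    for who, week, what, amount in rows:
        groups.setdefault((who, week), []).append((what, amount))
    result = []
    for (who, week), items in sorted(groups.items()):
        amounts = [amount for _, amount in items]
        result.append((who, week, ",".join(what for what, _ in items),
                       sum(amounts), sum(amounts) / len(amounts)))
    return result


for row in group_by_who_week(expand(FACTS)):
    print(row)
```

The trade-off: Jpath aggregation keeps one row per week in the table itself, while the expanded table keeps the detail rows and lets SQL choose the grouping at query time.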


Simple JSON creation with autodiscovery

●  CONNECT automatically discovers the structure of a JSON file (as with ODBC)
   ○  Automatic column naming
      ■  Column name = propertyname1_..._propertyname5
●  Use a depth argument to flatten the hierarchy
   ○  Anything deeper remains JSON
●  Returns the first array element (as it does not know what to do with the other array elements)


Simple creation with autodiscovery

●  SHOW CREATE TABLE shows what has been autodiscovered

create table jsampall2 engine=connect table_type=JSON file_name='biblio3.json' option_list='level=1';

CREATE TABLE `jsampall2` (
  `ISBN` char(13) NOT NULL,
  `AUTHOR_FIRSTNAME` char(15) NOT NULL `FIELD_FORMAT`='AUTHOR::FIRSTNAME',
  …
) ENGINE=CONNECT `TABLE_TYPE`='JSON' `FILE_NAME`='biblio3.json' `OPTION_LIST`='level=1';
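The naming rule can be sketched in a few lines of Python. This is an approximation of what the slides describe, not CONNECT's discovery code: property names are joined with "_", Jpath steps with ":", an array of objects contributes an empty step (take its first element, which yields AUTHOR::FIRSTNAME), and anything nested deeper than the level argument is kept as one JSON column. The document follows the biblio structure used on these slides; the TRANSLATED values are "?" placeholders.

```python
from typing import Any, Dict, List, Tuple


def discover(doc: Dict[str, Any], level: int) -> List[Tuple[str, str]]:
    """Return (column_name, jpath) pairs for one JSON object (sketch)."""
    columns: List[Tuple[str, str]] = []

    def walk(value: Any, names: List[str], path: List[str], depth: int) -> None:
        if isinstance(value, list) and value and isinstance(value[0], dict) and depth < level:
            walk(value[0], names, path + [""], depth)  # first element only: '::'
            return
        if isinstance(value, dict) and depth < level:
            for key, sub in value.items():
                walk(sub, names + [key], path + [key], depth + 1)
            return
        # Scalars, scalar arrays and anything deeper than `level` end here.
        columns.append(("_".join(names), ":".join(path)))

    for key, value in doc.items():
        walk(value, [key], [key], 0)
    return columns


BOOK = {
    "ISBN": "9782212090819",
    "AUTHOR": [{"FIRSTNAME": "Jean-Christophe", "LASTNAME": "Bernadac"},
               {"FIRSTNAME": "François", "LASTNAME": "Knab"}],
    "TITLE": "Construire une application XML",
    "TRANSLATED": {"PREFIX": "?", "TRANSLATOR": {"FIRSTNAME": "?", "LASTNAME": "?"}},
    "PUBLISHER": {"NAME": "Eyrolles", "PLACE": "Paris"},
}

for name, jpath in discover(BOOK, 1):
    print(name, jpath)
```

With level=1 the translator stays a single JSON column (TRANSLATED_TRANSLATOR); with level=2 it is flattened into TRANSLATED_TRANSLATOR_FIRSTNAME, matching the catalog output below.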


Catalog of a JSON File

create table bibcol engine=connect table_type=JSON file_name='biblio3.json' option_list='level=2' catfunc=columns;

select column_name, type_name type, column_size size, jpath from bibcol;

column_name                      type  size  jpath
ISBN                             CHAR  13
TITLE                            CHAR  30
...
TRANSLATED_TRANSLATOR_FIRSTNAME  CHAR  5     TRANSLATED:TRANSLATOR:FIRSTNAME


JSON representation of an existing table

●  With CONNECT it is very easy to get the JSON representation of a table
●  CREATE ... LIKE or CREATE ... SELECT works fine
●  Insert … Select json_object(…) from

create table xj1 (row varchar(500) field_format='*')
engine=connect table_type=JSON file_name='biblio3.json' option_list='jmode=2';


Build JSON to populate file

●  UDFs used to build the JSON representation of the table

insert into xj1
select json_object_nonull(ISBN, language LANG, SUBJECT,
         json_array_grp(json_object(authorfn FIRSTNAME, authorln LASTNAME)) json_AUTHOR,
         TITLE,
         json_object(translated PREFIX,
                     json_object(tranfn FIRSTNAME, tranln LASTNAME) json_TRANSLATOR) json_TRANSLATED,
         json_object(publisher NAME, location PLACE) json_PUBLISHER,
         date DATEPUB)
from xsampall2 group by isbn;


JSON representation of existing tables

What about master-detail tables based on a foreign key?
-  You can create a view to hide the join
-  Use UDFs to build the hierarchical JSON structure

Classic impedance mismatch between a hierarchical (object) model and the relational model.
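The master-detail case boils down to regrouping flat join rows under their master key, which is what GROUP BY isbn with json_array_grp(json_object(...)) does in the xsampall2 query. A minimal Python sketch of that regrouping, assuming hypothetical flat rows with isbn, title, authorfn and authorln columns (the values come from the book used on these slides); it illustrates the idea and is not the UDFs' code.

```python
import json
from typing import Any, Dict, List

# Flat rows, as a join of a book table with an author table might return them.
ROWS = [
    {"isbn": "9782212090819", "title": "Construire une application XML",
     "authorfn": "Jean-Christophe", "authorln": "Bernadac"},
    {"isbn": "9782212090819", "title": "Construire une application XML",
     "authorfn": "François", "authorln": "Knab"},
]


def nest(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Group detail rows under their master key, like GROUP BY isbn with
    json_array_grp(json_object(authorfn FIRSTNAME, authorln LASTNAME))."""
    books: Dict[str, Dict[str, Any]] = {}
    for r in rows:
        book = books.setdefault(r["isbn"], {"ISBN": r["isbn"], "TITLE": r["title"], "AUTHOR": []})
        book["AUTHOR"].append({"FIRSTNAME": r["authorfn"], "LASTNAME": r["authorln"]})
    return list(books.values())


print(json.dumps(nest(ROWS), ensure_ascii=False, indent=2))
```

The master columns are repeated on every joined row, so the sketch takes them from the first row of each group and only accumulates the detail columns.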


Modify JSON File

●  Update a JSON table

update jsampex set authorfn = 'John' where authorln = 'Knab';

update jsample2 set json_author = json_array_add(json_author, json_object('Charles' FIRSTNAME, 'Dickens' LASTNAME)) where isbn = '9782840825685';


Some useful UDFs

●  JSON-building UDFs, including aggregate functions

Name                Type       Return  Description
Json_Value          Function   STRING  Make a JSON value from its unique argument
Json_Array          Function   STRING  Make a JSON array containing its arguments
Json_Array_Add      Function   STRING  Add all following arguments to its first (array) argument
Json_Object         Function   STRING  Make a JSON object containing its arguments
Json_Object_nonull  Function   STRING  Make a JSON object containing its non-null arguments
Json_Array_Grp      Aggregate  STRING  Make a JSON array from the values of its argument in the group
Json_Object_Grp     Aggregate  STRING  Make a JSON object from its arguments in the group
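To show how a few of these functions differ, here are Python analogues. They are hypothetical stand-ins, not the UDFs themselves: keyword names play the role of the SQL aliases that become object keys (as in `authorfn FIRSTNAME`), None plays the role of SQL NULL, and the output spacing is Python's json.dumps default rather than whatever the UDFs emit.

```python
import json
from typing import Any


def json_object(**members: Any) -> str:
    """Analogue of Json_Object(v1 KEY1, v2 KEY2, ...): every member is kept."""
    return json.dumps(members, ensure_ascii=False)


def json_object_nonull(**members: Any) -> str:
    """Analogue of Json_Object_nonull: NULL (None) members are dropped."""
    return json.dumps({k: v for k, v in members.items() if v is not None},
                      ensure_ascii=False)


def json_array_add(array_text: str, *values: Any) -> str:
    """Analogue of Json_Array_Add: append the following arguments to the array."""
    array = json.loads(array_text)
    array.extend(values)
    return json.dumps(array, ensure_ascii=False)


print(json_object_nonull(ISBN="9782212090819", LANG=None))
print(json_array_add('[{"FIRSTNAME": "Jean-Christophe"}]',
                     {"FIRSTNAME": "Charles", "LASTNAME": "Dickens"}))
```

Json_Object_nonull is what keeps optional columns (such as a missing translator) out of the generated documents in the xj1 example above.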


What is next? Evolutions

●  It would be great to be able to query a REST API over HTTP as if it were a JSON file

create table jsampall (…)
engine=CONNECT table_type=JSON
File_name='https://../RESTAPI';


Takeaways

●  The CONNECT storage engine can make your life simpler with heterogeneous data sources
●  CONNECT offers complementary JSON technology:
   ○  CONNECT/JSON to read, produce and modify JSON files
   ○  Native JSON for strict, efficient JSON
   ○  JSON UDFs to manipulate JSON stored as text


Resources

●  bugs: mariadb.org/jira
●  mailing lists:
   ○  [email protected]
   ○  [email protected]
●  fb.com/MariaDB.dbms
●  twitter: @mariadb
●  #maria on irc.freenode.net
●  https://mariadb.com/kb/en
●  downloads: https://downloads.mariadb.org
   ○  apt and yum repositories available
●  Default in RHEL7, SuSE 12, Fedora, Slackware, Archlinux, etc.


Questions? Serge Frezefond

[email protected] @sfrezefond

Date post:	06-Aug-2015