MariaDB ColumnStore and Use Cases
Maria Luisa Raviol, Senior Solution Architect, MariaDB Corp.
Massimiliano Pinto, Senior Software Solutions Engineer, MariaDB Corp.
“We should be talking about the analytics of things, not the internet of things.”
Jim Davis, CMO, SAS
Current State of Analytics
• Traditional OLAP
o Cost to perform
o Appliances or Proprietary Solutions
• Big-Data Analytics
o Scale to perform
o Non-SQL Interfaces
• Analytics and Transaction Separation
Why MariaDB ColumnStore
Price to Performance at Scale
Data Analytics using SQL or SPARK
Unified Simplicity (Transaction and Analytics under the same roof)
Open-Source GPL2
Row-Oriented vs Column-Oriented
Row-oriented: rows stored sequentially in a file
Key  Fname     Lname  State  Zip    Phone           Age  Sales
1    Bugs      Bunny  NJ     11217  (123) 938-3235  34   100
2    Yosemite  Sam    CT     95389  (234) 375-6572  52   500
3    Daffy     Duck   IA     10013  (345) 227-1810  35   200
4    Elmer     Fudd   CT     04578  (456) 882-7323  43   10
5    Witch     Hazel  CT     01970  (567) 744-0991  57   250
Column-oriented: each column is stored in a separate file. Each column for a given row is at the same offset.
Key    1               2               3               4               5
Fname  Bugs            Yosemite        Daffy           Elmer           Witch
Lname  Bunny           Sam             Duck            Fudd            Hazel
State  NJ              CT              IA              CT              CT
Zip    11217           95389           10013           04578           01970
Phone  (123) 938-3235  (234) 375-6572  (345) 227-1810  (456) 882-7323  (567) 744-0991
Age    34              52              35              43              57
Sales  100             500             200             10              250
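The two layouts can be sketched in a few lines of Python using the sample table from this slide. This is an illustrative in-memory model only: the names `FIELDS`, `rows`, `columns`, and the two `total_sales_*` functions are hypothetical and do not describe ColumnStore's on-disk format.

```python
# Sample data from the slide, held row by row (row-oriented).
FIELDS = ("key", "fname", "lname", "state", "zip", "phone", "age", "sales")
rows = [
    (1, "Bugs", "Bunny", "NJ", "11217", "(123) 938-3235", 34, 100),
    (2, "Yosemite", "Sam", "CT", "95389", "(234) 375-6572", 52, 500),
    (3, "Daffy", "Duck", "IA", "10013", "(345) 227-1810", 35, 200),
    (4, "Elmer", "Fudd", "CT", "04578", "(456) 882-7323", 43, 10),
    (5, "Witch", "Hazel", "CT", "01970", "(567) 744-0991", 57, 250),
]

# Column-oriented: one list per column; position i in every list is row i.
columns = {name: [row[i] for row in rows] for i, name in enumerate(FIELDS)}


def total_sales_row_store(state):
    """Sum sales for a state by visiting every full row."""
    return sum(r[7] for r in rows if r[3] == state)


def total_sales_column_store(state):
    """Sum sales for a state by reading only the state and sales columns."""
    return sum(
        sales
        for st, sales in zip(columns["state"], columns["sales"])
        if st == state
    )
```

Both functions return the same total; the column version simply never touches the six columns the query does not need, which is the property analytic queries benefit from.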
Why Customers Choose MariaDB ColumnStore
SCALE
● Massively parallel architecture designed for big data, scaling to process petabytes of data
● Read performance scales linearly with data growth
SPEED
● Exceptional performance
● Real-time response to analytics queries and high-speed data loading
SECURITY and RELIABILITY
● Encryption for data in motion, role-based access, and audit features of MariaDB Enterprise
● Built-in high availability at access and data layers
SIMPLICITY with POWER
● Simplified management and maintenance, easy installation and scaling
● Same interface as MariaDB and MySQL, attaches to a wide range of BI tools
MariaDB ColumnStore Architecture
▪ User Module: Processes SQL Requests
▪ Performance Module: Distributed Processing Engine
[Architecture diagram: Clients, User Connections, User Modules (MariaDB SQL Front End, Query Engine), Performance Modules 1 ... N, Columnar Distributed Data Storage]
Data Storage - Extents and PMs
[Diagram: Extents 1 to 8 distributed across PM 1 and PM 2, then the same eight extents distributed across PM 1 to PM 4]
● Extent Map
○ In-memory metadata for each extent: the min and max values for a column, the extent's physical block offset, and the PM on which the extent resides
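One use such metadata lends itself to is skipping extents whose min/max range cannot satisfy a range filter. The slide does not describe the lookup itself, so the sketch below is an assumption: `ExtentEntry`, `extents_to_scan`, and all sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ExtentEntry:
    extent_id: int
    pm: int            # PM on which the extent resides
    block_offset: int  # physical block offset of the extent
    min_value: int     # smallest column value stored in the extent
    max_value: int     # largest column value stored in the extent


def extents_to_scan(extent_map, low, high):
    """Return the entries whose [min, max] range overlaps [low, high]."""
    return [
        e for e in extent_map
        if e.max_value >= low and e.min_value <= high
    ]


# Hypothetical extent map for one column, spread over two PMs.
extent_map = [
    ExtentEntry(1, pm=1, block_offset=0, min_value=1, max_value=100),
    ExtentEntry(2, pm=2, block_offset=0, min_value=101, max_value=200),
    ExtentEntry(3, pm=1, block_offset=8192, min_value=201, max_value=300),
    ExtentEntry(4, pm=2, block_offset=8192, min_value=301, max_value=400),
]

# A filter such as "value BETWEEN 150 AND 250" only needs extents 2 and 3.
candidates = extents_to_scan(extent_map, 150, 250)
```

Because the map also records the owning PM, the same lookup tells the caller which nodes need to take part in the scan at all.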
Data Ingestion
● Bulk data load
○ cpimport: CSV and Binary
○ LOAD DATA INFILE: CSV
● Apache Sqoop Integration
○ Integration with cpimport and SQL interface
● Future Release
○ Data Streaming from MariaDB/MySQL database to MariaDB ColumnStore cluster
• via Kafka
• Avro data record
Data Ingestion - Bulk Data Load
● cpimport
○ Fastest way to load data
• Load data from CSV file
• Load data from Standard Input
• Load data from Binary Source file
○ Multiple tables can be loaded in parallel by launching multiple jobs
○ Read queries continue without being blocked
○ Successful cpimport is auto-committed
○ In case of errors, the entire load is rolled back
● LOAD DATA INFILE
○ Traditional way of importing data into any MariaDB storage engine table
○ Up to 2 times slower than cpimport for large imports
○ The operation can be rolled back whether it succeeds or fails
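A CSV load with cpimport might be scripted roughly as follows. The sketch only builds the command line and does not run it. The positional arguments (database, table, optional file) and the `-s` column delimiter option reflect cpimport's documented usage as understood here and should be checked against the installed version; the database, table, and file names are hypothetical.

```python
import shlex


def build_cpimport_command(database, table, csv_path=None, delimiter=","):
    """Build a cpimport argument list.

    With csv_path=None no file argument is passed, so the data is
    expected on standard input instead.
    """
    cmd = ["cpimport", database, table]
    if csv_path is not None:
        cmd.append(csv_path)
    cmd += ["-s", delimiter]  # column delimiter of the input data
    return cmd


# Hypothetical names, for illustration only.
cmd = build_cpimport_command("sales_db", "orders", "/tmp/orders.csv")
print(shlex.join(cmd))
```

To execute the load, the list can be passed to `subprocess.run(cmd, check=True)`; launching several such jobs for different tables corresponds to the parallel loading mentioned above.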
MariaDB ColumnStore 1.0
Analytics
• In-database analytics with complex joins, windowing functions and UDFs
• Out-of-the-box BI tools connectivity
• Analytics integration with R
Scale
• Columnar, Massively Parallel
• Linear scalability
Performance
• High performance ad hoc analysis
• Consistent query response time
High Availability
• Built-in redundancy and high availability
Ease of Use
• ANSI SQL compatible
• ACID compliant
• No indexes, no materialized views
• No manual partitioning
Data Ingestion
• CONNECT Engine
• Create Table as Select
• High speed parallel data load and extract
Security
• SSL support, Audit Plugin, Authentication Plugin, Role Based Access
Deployment Options
• On premise, AWS, Hadoop
Use Case: Scaling Big Data Analytics
• Harvest new value from large historical datasets by deriving new insights
• Support growth in your business while continuing to deliver high service levels for data analytics
[Chart: Rows/Data Size Scope. Rows from 1 to 100,000,000,000; data size from 10-100 GB through 100-1000 GB, 1-10 TB and 10-100 TB up to PB. Regions: MariaDB Enterprise OLTP, MariaDB Enterprise OLAP]
Use Case: Simplifying Big Data Management
● Improved DBA productivity
● Familiar SQL interfaces democratize access to big data for a larger user base
● Reduced operational complexity
● Getting the most value out of big data while minimizing DBA Opex cost
Use Case: Simplifying Big Data Management
Business Challenge
Complexity of data management increases as data volume grows
● Tedious to keep up with indexes and partitioning as data grows
● Managing scale-out or scale-up
● Moving operational data to a big data analytics platform in real time
MariaDB Solution
● MariaDB ColumnStore
● Liberation from index management
● Automatic partitioning
● Easy to grow
● Micro-batch bulk load for real-time data flow
[Diagram: three sources, cpimport, one UM node, three PM nodes]
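The micro-batch idea can be sketched as follows: incoming records are grouped into small batches and each batch is serialized as CSV, ready to be handed to a bulk loader such as cpimport. The function names `micro_batches` and `to_csv` and the batch size are hypothetical; the slide does not say how batches are formed.

```python
import csv
import io
from itertools import islice


def micro_batches(records, batch_size):
    """Yield successive lists of at most batch_size records."""
    it = iter(records)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def to_csv(batch):
    """Serialize one batch as CSV text, one record per line."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(batch)
    return buf.getvalue()


# Hypothetical stream of (id, value) records arriving from a source.
stream = [(1, "a"), (2, "b"), (3, "c"), (4, "d"), (5, "e")]
payloads = [to_csv(batch) for batch in micro_batches(stream, 2)]
```

Smaller batches lower the delay before data becomes queryable, at the cost of more load jobs; the right size depends on the workload.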
Use Case: Scaling Big Data Analytics
Business Challenge
● An organization is generating a large amount of operational data
● Multiple terabytes of historical data
● With growth in business and in operational data
○ Analytics query performance degrades
○ Impractical to do analytics
MariaDB Solution
● Put past data into MariaDB ColumnStore
● As data grows
● Perform analytics without performance degradation
● Linear scalability with data growth
[Diagram: MariaDB ColumnStore 1.0 with nodes 1, 2, 3; add new node(s)]
Use Case: Discover Insight
● Uncover new business opportunities with data exploration and analytics on petabyte data volumes
● Generate real-time insights to inform and enhance live customer interactions
Use Case: Discover Insight
Global Commercial Aviation Manufacturer
Challenges
● Need to analyze real-time and historical flight parameter data
● Too time-consuming to perform analytics with the current toolset
● Most data analysts have a SQL background
Objectives:
● Maintain flight safety - accurately predict part replacement
● Provide high service levels and minimize cost - proactively plan equipment maintenance and retirement
[Diagram: historical data and real-time in-flight performance data; micro-batch upload of real-time flight performance into MariaDB ColumnStore (complex joins, aggregation and windowing functions; high speed real-time performance); familiar SQL interface for analytics and data scientists; output: timely maintenance forecast, part replacement, flight retirement]
The company plans to sell this solution as a service to commercial airlines
Use Case: Accelerated Analytics with SQL & SPARK
● Familiar SQL interfaces democratize access to big data for a larger user base
● Attach a wide range of BI tools via MariaDB/MySQL connectors
● Getting the most value out of big data while minimizing Opex cost
● Leverage Hadoop deployments
Use Case: Accelerated Analytics with Hadoop
Business Challenge
● Large amount of data in Hadoop
● Hadoop is suitable for
○ Batch processing
○ Transforms via Map-Reduce programming
● Real-time analytics on Hadoop
○ Speed cannot meet business requirements with the Hadoop tool set
● Shortage of Hadoop skills among data scientists/BAs
○ SQL interfaces on Hadoop tools are not mature
MariaDB Solution
● MariaDB ColumnStore OLAP can run on premise, in the cloud or on a Hadoop cluster
● Ingest data from Hadoop
● Mature ANSI-SQL compliance
● Stellar performance: 70 to 80 times faster than SQL-on-Hadoop counterparts Hive, HBase and Impala
● Mature interfaces
[Diagram: Map Reduce, HBase, Pig/Hive and MariaDB ColumnStore on top of the Hadoop Distributed File System; labels: Batch Processing, High Performance analytics]
MariaDB to Hadoop Replication – Coming Soon
MariaDB MaxScale Binlog-Avro translator
● Avro files: an Object Container File consists of:
  • A file header
    • 4 bytes: ASCII 'O', 'b', 'j', followed by 1
    • File metadata, including the schema
    • The 16-byte, randomly generated sync marker for this file
  • One or more file data blocks
    • A long indicating the count of objects in this block
    • A long indicating the size in bytes of the serialized objects in the current block
    • The file's 16-byte sync marker
Note: each Avro file contains data related to ONE table only.
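The header magic and the two leading longs of a data block can be read with a few lines of Python. The slide does not say how a long is encoded; the sketch relies on the Avro specification's variable-length zig-zag encoding. The names `has_avro_magic`, `read_long`, and `read_block_header` are hypothetical, and the sketch deliberately skips the file metadata map and the serialized objects.

```python
import io

AVRO_MAGIC = b"Obj\x01"  # ASCII 'O', 'b', 'j', followed by the byte 1
SYNC_SIZE = 16           # length of the sync marker in bytes


def has_avro_magic(data):
    """Check whether a byte string starts with the container file magic."""
    return data[:4] == AVRO_MAGIC


def read_long(stream):
    """Decode one Avro long (variable-length, zig-zag encoded)."""
    shift = 0
    accum = 0
    while True:
        byte = stream.read(1)
        if not byte:
            raise EOFError("truncated long")
        b = byte[0]
        accum |= (b & 0x7F) << shift  # low 7 bits carry payload
        if not (b & 0x80):            # high bit clear: last byte
            break
        shift += 7
    return (accum >> 1) ^ -(accum & 1)  # undo zig-zag


def read_block_header(stream):
    """Read the object count and serialized size that start a data block."""
    count = read_long(stream)
    size = read_long(stream)
    return count, size


# Hand-built bytes: count 2 (0x04) and size 64 (0x80 0x01).
block = io.BytesIO(b"\x04\x80\x01")
count, size = read_block_header(block)
```

After the header, a reader would consume `size` bytes of serialized objects and then compare the next `SYNC_SIZE` bytes with the sync marker from the file header.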
MariaDB to Hadoop Replication – Coming Soon
MariaDB MaxScale Binlog-Avro translator
○ Replicates binlog events from MariaDB to a Kafka producer
○ Kafka consumers ingest the data into Hadoop or any other custom data warehouse or application
[Diagram: Master and Slaves; binary log events flow to MaxScale (Binlog to Avro); Avro or JSON events; Amazon EMR, Amazon Redshift]
Booking.com: a MaxScale solution
Based in Amsterdam since 1996
• 150 offices worldwide
• 590,000+ properties in 212 countries
• 42 languages (website and customer service)
• >3000 servers, ~90% replicating, around 100 masters with 10 to >50 slaves; 4 have >100 slaves
Problem?
• With so many slaves it is easy to saturate the network interface of the master
Solution?
• MariaDB MaxScale Binlog Server, a daemon that:
○ Downloads binary logs from the master
○ Saves them in the same structure as the master
○ Serves the binary logs to slaves
Booking.com: a MaxScale solution
MariaDB MaxScale Binlog Server:
○ Horizontal scaling of slaves without master overload
○ Crash-safe disaster recovery
○ Master switch/failover without reconfiguring any slave
[Diagram: Master feeding two MaxScale instances, each with a Binlog Cache and its own set of Slaves]
MariaDB ColumnStore Roadmap
First release
• MariaDB ColumnStore (port of InfiniDB to MariaDB 10.1)
• Amazon EBS support
• Create Table Like/As Select
Future Releases
• Spark Integration
• Data Streaming integration with MaxScale
• Native API for columnar file
• Join and Filter performance optimization
• ROLLUP, CUBE in MariaDB ColumnStore
• AS OF implementation in MariaDB Server
• CONNECT Engine support in MariaDB Server
• SQL Editor (OSS or 3rd party partner)
Subscription offering
Learn more about MariaDB ColumnStore
● BETA release in May 2016
● Sign up for notification of BETA availability today
● Product page: https://mariadb.com/products/mariadb-columnstore
Q&A
Thank You
Maria Luisa Raviol, Senior Sales Engineer
Massimiliano Pinto, Senior Software Solutions Engineer [email protected]