Date post: | 02-Jul-2015 |
Category: | Technology |
Upload: | inside-analysis |
Grab some coffee and enjoy the pre-show banter before the top of the hour!
The Briefing Room
The Distributed Enterprise – How a Flexible Foundation Opens Doors
Twitter Tag: #briefr
Mission
• Reveal the essential characteristics of enterprise software, good and bad
• Provide a forum for detailed analysis of today’s innovative technologies
• Give vendors a chance to explain their product to savvy analysts
• Allow audience members to pose serious questions... and get answers!
Topics
2014 Editorial Calendar at www.insideanalysis.com/webcasts/the-briefing-room
This Month: BIG DATA ECOSYSTEM
September: INTEGRATION & DATA FLOW
October: ANALYTIC PLATFORMS
Executive Summary
• Information architectures are changing
• Hybrid architectures will dominate
• Flexibility will be increasingly critical
• Embedded database systems will expand
• Distributed computing is here to stay
Analyst: Robin Bloor
Robin Bloor is Chief Analyst at The Bloor Group
[email protected] @robinbloor
InfiniDB
• InfiniDB is a scalable, columnar database built for big data analytics, business intelligence and data warehousing
• It is 100% open source and offers a MySQL interface
• InfiniDB for Apache Hadoop integrates with Hadoop’s file system (HDFS) to enable real-time analytics within the Hadoop cluster
Guest: Jim Tommaney
Jim Tommaney is the Chief Technical Officer for InfiniDB, bringing 20+ years of enterprise data architecture and performance tuning experience to the team. His data warehouse architecture experience includes clustered, large SMP, and distributed/partitioned systems for verticals including retail, web, and telecom. At InfiniDB he is responsible for delivering architecture and design for the InfiniDB product: a high-performance, horizontally scalable and cost-effective solution purpose-built for data warehousing and analytics. He holds a Master's in Management Information Systems from the University of Texas at Dallas.
The Briefing Room with InfiniDB
InfiniDB Design Principles
Scalable
Fast
Simple
Copyright © 2014 InfiniDB. All Rights Reserved.
Segment Overview
[Diagram: market segment map. Axes: structured vs. unstructured data; OLTP vs. analytics.]
• Business Intelligence and Visualization Products; Specialty Analytics Applications
• InfiniDB, Vertica, Impala w/ Parquet
• NoSQL Products
• Specialty Products (Splunk, etc.)
• Traditional RDBMS Products
• ETL, MDM Products
• Infrastructure Products – Hadoop, Cloud/Hosting, Virtualization, Storage, etc.
The Big Data Ecosystem
InfiniDB enables companies to analyze massive amounts of data in real time, in both Hadoop and non-Hadoop environments, to discover deep and wide insights.
Radiant Advisors Open-Source SQL-on-Hadoop Benchmark Summary
[Chart not reproduced; axis label: Query]
What is InfiniDB?
Open Source Core
• GPLv2 licensed scalable, MPP core
• No restrictions on performance, syntax or scale
MySQL Compatible
• “Drop-in” replacement for other MySQL storage engines
• Full SQL syntax and capabilities regardless of platform
Apache Hadoop Friendly
• Native HDFS integration leverages existing Hadoop deployments
• Best in class SQL analytics over Hadoop
(My)SQL for Hadoop
InfiniDB uses standard “Engine=InfiniDB” syntax:
CREATE TABLE `game_warehouse`.`dim_title` (
  `id` INT,
  `name` VARCHAR(45),
  `publisher` VARCHAR(45),
  `release_date` DATE,
  `language` INT,
  `platform_name` VARCHAR(45),
  `version` VARCHAR(45)
) ENGINE=InfiniDB;
InfiniDB Architecture
• User Module – Understands SQL Requests
• Performance Module – Distributed Processing Engine
Single Server or MPP
InfiniDB for Hadoop
• User Module – maps to Name Node
• Performance Module – maps to Data Nodes
[Diagram: MPP layer over the Hadoop Distributed File System, showing the Hadoop Name Node and Hadoop Data Nodes]
InfiniDB DoW Differentiation
• User Module – Understands SQL Requests
• Performance Module – Distributed Processing Engine
Unique Distribution of Work (DoW):
• Move the processing to the data
  - Including complex queries, joins
• Primitives sent to distributed queues
• Primitives complete in sub-second time
• C++, purpose built
• High, standard, low priority queues
A primitive is a unit of work a single thread can accomplish without waits.
Deployment shown: Single Server
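The distribution-of-work idea on this slide can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not InfiniDB's C++ implementation: the `Primitive` class, the integer priority levels, and the `run_primitives` helper are hypothetical names introduced here. It only shows small, wait-free units of work being queued by priority and drained by a pool of worker threads.

```python
import queue
import threading
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical priority levels mirroring the slide's high/standard/low queues.
HIGH, STANDARD, LOW = 0, 1, 2


@dataclass(order=True)
class Primitive:
    """A small unit of work a single thread can finish without waiting."""
    priority: int
    seq: int  # tie-breaker: keeps submission order within one priority
    work: Callable[[], int] = field(compare=False)


def run_primitives(primitives: List[Primitive], workers: int = 4) -> List[int]:
    """Drain a shared priority queue with a pool of worker threads."""
    pending = queue.PriorityQueue()
    for prim in primitives:
        pending.put(prim)

    results: List[int] = []
    lock = threading.Lock()

    def worker() -> None:
        while True:
            try:
                prim = pending.get_nowait()
            except queue.Empty:
                return  # queue drained, this thread is done
            value = prim.work()
            with lock:
                results.append(value)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


# Example: sum a column block by block; each block scan is one primitive.
blocks = [list(range(i, i + 1000)) for i in range(0, 8000, 1000)]
prims = [Primitive(STANDARD, n, (lambda b=b: sum(b))) for n, b in enumerate(blocks)]
total = sum(run_primitives(prims))
```

Because no primitive blocks, the scheduler never has to decide how many resources a whole query deserves; it only has to keep the queues drained in priority order.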
Relational Foundations
• Can run in-memory, but not required
• Pure column storage for vertical partitioning
• Automatic horizontal partitioning (grid storage)
• Columnar compression
• Column-aware optimizer
• Transactional support
• Hadoop is a deployment option, not required
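As a rough illustration of the storage bullets above, the following Python sketch splits rows into columns (vertical partitioning), cuts a column into fixed-size chunks (horizontal partitioning), and compresses each chunk. The function names and sample data are hypothetical, and run-length encoding is only a simple stand-in: the slides do not say which compression scheme InfiniDB uses.

```python
from itertools import groupby
from typing import Dict, List, Tuple

Row = Dict[str, object]


def to_columns(rows: List[Row]) -> Dict[str, List[object]]:
    """Vertical partitioning: store each column's values together.

    Assumes every row has the same keys.
    """
    columns: Dict[str, List[object]] = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def split_extents(values: List[object], extent_size: int) -> List[List[object]]:
    """Horizontal partitioning: cut one column into fixed-size chunks."""
    return [values[i:i + extent_size] for i in range(0, len(values), extent_size)]


def rle_encode(values: List[object]) -> List[Tuple[object, int]]:
    """Run-length encoding: collapse repeats into (value, run length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


# Hypothetical sample rows loosely based on the dim_title columns.
rows = [
    {"id": 1, "publisher": "Acme", "language": 1},
    {"id": 2, "publisher": "Acme", "language": 1},
    {"id": 3, "publisher": "Acme", "language": 1},
    {"id": 4, "publisher": "Zenith", "language": 1},
    {"id": 5, "publisher": "Zenith", "language": 2},
    {"id": 6, "publisher": "Acme", "language": 2},
]
columns = to_columns(rows)
publisher_extents = split_extents(columns["publisher"], extent_size=4)
compressed = [rle_encode(extent) for extent in publisher_extents]
```

A query that only needs `publisher` reads that one column's chunks and never touches `id` or `language`, which is the main appeal of column storage for analytics.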
Trans-Relational Features
• Shared meta-data layer allows for relational algebra to be applied at the plan level, and recursively to deliver partition elimination
• Primitive DoW avoids complexity of assigning “right” number of resources to a given query
The processing model is more like a storage device than a traditional database
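Partition elimination can be pictured with a small Python sketch. It assumes the shared metadata holds a minimum and maximum value per partition; the slides do not say what the metadata actually contains, so treat that, along with the `Extent` class and `scan_range` function, as hypothetical. The point is that a range predicate can skip whole partitions by looking at metadata alone.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Extent:
    """One horizontal partition of a column plus its (assumed) metadata."""
    values: List[int]
    min_value: int
    max_value: int


def make_extent(values: List[int]) -> Extent:
    """Record min/max once, when the partition is written."""
    return Extent(values, min(values), max(values))


def scan_range(extents: List[Extent], low: int, high: int) -> Tuple[List[int], int]:
    """Return matching values and the number of extents actually scanned."""
    matches: List[int] = []
    scanned = 0
    for extent in extents:
        if extent.max_value < low or extent.min_value > high:
            continue  # eliminated using metadata only, no data read
        scanned += 1
        matches.extend(v for v in extent.values if low <= v <= high)
    return matches, scanned


# Hypothetical release years, loaded roughly in order.
extents = [
    make_extent([2001, 2002, 2003]),
    make_extent([2004, 2005, 2006]),
    make_extent([2007, 2008, 2009]),
]
matches, scanned = scan_range(extents, 2005, 2007)
```

Here the first partition is skipped without being read, so only two of the three partitions are scanned for the range 2005 to 2007.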
Trans-Relational Features
• Distributed-system aware optimizer moves the processing to the data even in complex SQL operations
• N-Way join operation in primitive structure
• Flexible join/aggregation behavior can be PM based, UM based, or on disk
• Handles nested query constructs while still “moving the processing to the data”
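One way to picture PM-based versus UM-based aggregation is a two-phase sum, sketched below in Python. This is a generic distributed-aggregation pattern offered as an assumption about the general shape of the work, not a description of InfiniDB internals; `pm_partial_sum`, `um_merge`, and the sample data are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, int]  # (publisher, units_sold): hypothetical columns


def pm_partial_sum(local_rows: Iterable[Row]) -> Counter:
    """Runs on each Performance Module: aggregate only locally stored rows."""
    partial: Counter = Counter()
    for key, amount in local_rows:
        partial[key] += amount
    return partial


def um_merge(partials: Iterable[Counter]) -> Dict[str, int]:
    """Runs on the User Module: combine the small partial results."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts key by key
    return dict(total)


# Rows as they might be spread across three Performance Modules.
pm_data: List[List[Row]] = [
    [("Acme", 10), ("Zenith", 5)],
    [("Acme", 7)],
    [("Zenith", 1), ("Orbit", 4)],
]
result = um_merge(pm_partial_sum(rows) for rows in pm_data)
```

Only the per-key partial sums travel to the merging step, not the raw rows, which is the sense in which the processing moves to the data.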
Trans-Relational Hadoop Features
• Parallel/scalable bulk loads – linear scale for load operations
• Parallel/scalable transform/aggregate operations
• Parallel/scalable extract
• Parallel/scalable input into R or other tools delivers open access to predictive analytics
InfiniDB Design Principles
Scalable
Fast
Simple
InfiniDB Customers
InfiniDB Deployment Options
Cloud
• Amazon® machine image
Source Code
• https://github.com/infinidb/infinidb
Apache Hadoop Distros:
• Cloudera®
• Hortonworks®
• Apache Hadoop®
• Also MapR®, IBM Big Insights®
Perceptions & Questions
Analyst: Robin Bloor
Built To Scale?
Robin Bloor, PhD
Hadoop and the Data Warehouse
Hadoop and its multitude of components will not supersede the data warehouse:
• The HDFS is suited to be a database data store (for column-stored data, but with row stores there’s a problem)
• MapReduce is NOT an appropriate algorithm for database optimization
• YARN is a useful capability for scheduling resource sharing
• What is required is a database architecture AND an optimizer AND a SQL capability
The “Old” Data Warehouse
Data wrangling is also a workload!
The “New” Data Warehouse
Data wrangling is a much more significant workload! Analytics is also a significant workload!
Data Wrangling
The Central Data Engine
At this point in time it looks reasonably certain that the CENTRAL DATA ENGINE will be a scale-out column-store SQL DBMS
• In general, what is the DBA overhead of an InfiniDB database compared with, say, Oracle?
• How does InfiniDB organize its data on HDFS? Is that different from the way it uses MySQL as a store?
• Is there any qualitative difference between the HDFS and MySQL versions of InfiniDB?
• Please explain the open source arrangement with InfiniDB.
• What do you see as the sweet spot for this database?
• With respect to scale, what is your largest implementation by data volume?
• Does InfiniDB have specific support for analytical applications?
Upcoming Topics
www.insideanalysis.com
2014 Editorial Calendar at www.insideanalysis.com/webcasts/the-briefing-room
This Month: BIG DATA ECOSYSTEM
September: INTEGRATION & DATA FLOW
October: ANALYTIC PLATFORMS
THANK YOU for your ATTENTION!
Opening slide image courtesy of Wikimedia Commons