The Anywhere Enterprise – How a Flexible Foundation Opens Doors

Grab some coffee and enjoy the pre-show banter before the top of the hour!

The Briefing Room

The Distributed Enterprise – How a Flexible Foundation Opens Doors

Twitter Tag: #briefr

Welcome

Host: Eric Kavanagh

@eric_kavanagh


Mission

•  Reveal the essential characteristics of enterprise software, good and bad
•  Provide a forum for detailed analysis of today’s innovative technologies
•  Give vendors a chance to explain their product to savvy analysts
•  Allow audience members to pose serious questions... and get answers!


Topics

2014 Editorial Calendar at www.insideanalysis.com/webcasts/the-briefing-room

This Month: BIG DATA ECOSYSTEM

September: INTEGRATION & DATA FLOW

October: ANALYTIC PLATFORMS


Executive Summary

•  Information architectures are changing
•  Hybrid architectures will dominate
•  Flexibility will be increasingly critical
•  Embedded database systems will expand
•  Distributed computing is here to stay


Analyst: Robin Bloor

Robin Bloor is Chief Analyst at The Bloor Group

@robinbloor


InfiniDB

•  InfiniDB is a scalable, columnar database built for big data analytics, business intelligence and data warehousing
•  It is 100% open source and offers a MySQL interface
•  InfiniDB for Apache Hadoop integrates with Hadoop’s file system (HDFS) to enable real-time analytics within the Hadoop cluster


Guest: Jim Tommaney

Jim Tommaney is the Chief Technical Officer for InfiniDB, bringing 20+ years of enterprise data architecture and performance tuning experience to the team. His data warehouse architecture experience includes clustered, large SMP, and distributed/partitioned systems for verticals including retail, web, and telecom. At InfiniDB he is responsible for delivering architecture and design for the InfiniDB product: a high-performance, horizontally scalable and cost-effective solution purpose-built for data warehousing and analytics. He holds a Master's in Management Information Systems from the University of Texas at Dallas.

The Briefing Room with InfiniDB

InfiniDB Design Principles

Scalable

Fast

Simple

Copyright © 2014 InfiniDB. All Rights Reserved.

Segment Overview


[Diagram axes: structured to unstructured; OLTP to analytics]

•  Business Intelligence and Visualization Products
•  Specialty Analytics Applications
•  InfiniDB, Vertica, Impala w/ Parquet
•  Infrastructure Products – Hadoop, Cloud/Hosting, Virtualization, Storage, etc.
•  NoSQL Products
•  Specialty Products (Splunk, etc.)
•  Traditional RDBMS Products
•  ETL, MDM Products

The Big Data Ecosystem

InfiniDB enables companies to analyze massive amounts of data in real-time on both Hadoop and non-Hadoop environments to discover deep and wide insights.

Radiant Advisors Open-Source SQL-on-Hadoop Benchmark Summary

[Chart; axis label: Query]

What is InfiniDB?

Open Source Core
•  GPLv2-licensed, scalable MPP core
•  No restrictions on performance, syntax or scale

MySQL Compatible
•  “Drop-in” replacement for other MySQL storage engines
•  Full SQL syntax and capabilities regardless of platform

Apache Hadoop Friendly
•  Native HDFS integration leverages existing Hadoop deployments
•  Best-in-class SQL analytics over Hadoop

(My)SQL for Hadoop

InfiniDB uses standard “Engine=InfiniDB” syntax:

CREATE TABLE `game_warehouse`.`dim_title` (
  `id` INT,
  `name` VARCHAR(45),
  `publisher` VARCHAR(45),
  `release_date` DATE,
  `language` INT,
  `platform_name` VARCHAR(45),
  `version` VARCHAR(45)
) ENGINE=InfiniDB;


InfiniDB Architecture

•  User Module – Understands SQL Requests
•  Performance Module – Distributed Processing Engine

[Diagram labels: Single Server or MPP]


InfiniDB for Hadoop

•  User Module – maps to Name Node
•  Performance Module – maps to Data Nodes

[Diagram labels: MPP; Hadoop Distributed File System; Hadoop Name Node; Hadoop Data Nodes]


InfiniDB DoW Differentiation

•  User Module – Understands SQL Requests
•  Performance Module – Distributed Processing Engine

Unique Distribution of Work (DoW):
•  Move the processing to the data
   - Including complex queries, joins
•  Primitives sent to distributed queues
•  Primitives complete in under a second
•  C++, purpose built
•  High, standard, low priority queues

A primitive is a unit of work a single thread can accomplish without waits.
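
The queueing idea above can be sketched roughly in Python. This is only an illustration under stated assumptions, not InfiniDB's C++ implementation: the names PrimitiveQueue and make_scan_primitive and the numeric priority constants are hypothetical, and a single in-process heap stands in for the distributed queues.

```python
import heapq
import itertools

# Hypothetical priority levels mirroring the high/standard/low queues on the slide.
HIGH, STANDARD, LOW = 0, 1, 2


class PrimitiveQueue:
    """Toy queue: lower priority number runs first, FIFO within one level."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker so callables are never compared

    def submit(self, priority, primitive):
        heapq.heappush(self._heap, (priority, next(self._seq), primitive))

    def drain(self):
        """Run every queued primitive in priority order and collect the results."""
        results = []
        while self._heap:
            _, _, primitive = heapq.heappop(self._heap)
            results.append(primitive())
        return results


def make_scan_primitive(block, predicate):
    """Build a primitive: filter one in-memory block, with no waits inside."""
    return lambda: [value for value in block if predicate(value)]


def run_demo():
    queue = PrimitiveQueue()
    queue.submit(LOW, make_scan_primitive([1, 2, 3], lambda v: v > 1))
    queue.submit(HIGH, make_scan_primitive([10, 20], lambda v: v >= 20))
    queue.submit(STANDARD, make_scan_primitive([5, 6], lambda v: v % 2 == 0))
    return queue.drain()
```

Because each primitive is small and self-contained, a worker thread can pick up the next one as soon as it finishes, which is presumably why the slide stresses sub-second completion.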

Single Server


Relational Foundations

✓ Can run in-memory, but not required
✓ Pure column storage for vertical partitioning
✓ Automatic horizontal partitioning (grid storage)
✓ Columnar compression
✓ Column-aware optimizer
✓ Transactional support
✓ Hadoop is a deployment option, not required
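
To make the combination of vertical and horizontal partitioning concrete, here is a minimal Python sketch. The function name to_column_extents and the extent_rows parameter are hypothetical, and the in-memory dictionaries only stand in for on-disk files; the real storage layout is not described in these slides.

```python
def to_column_extents(rows, columns, extent_rows):
    """Split row-oriented records into per-column lists (vertical partitioning),
    each cut into fixed-size chunks of rows (horizontal partitioning)."""
    store = {column: [] for column in columns}
    for start in range(0, len(rows), extent_rows):
        chunk = rows[start:start + extent_rows]
        for column in columns:
            store[column].append([row[column] for row in chunk])
    return store


def demo_store():
    rows = [
        {"id": 1, "name": "a"},
        {"id": 2, "name": "b"},
        {"id": 3, "name": "c"},
    ]
    return to_column_extents(rows, ["id", "name"], extent_rows=2)
```

A query that touches only the id column reads only the id chunks, which is the usual argument for column storage in analytic workloads.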

Trans-Relational Features

✓ Shared meta-data layer allows for relational algebra to be applied at the plan level, and recursively to deliver partition elimination
✓ Primitive DoW avoids complexity of assigning “right” number of resources to a given query

The processing model is more like a storage device than a traditional database
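
Partition elimination can be illustrated with a short Python sketch. The slides do not say what the shared metadata contains, so this example assumes each extent records its minimum and maximum value; the names Extent and scan_range are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Extent:
    """One horizontal chunk of a column, with assumed min/max metadata."""
    values: List[int]

    def __post_init__(self):
        self.min_value = min(self.values)
        self.max_value = max(self.values)


def scan_range(extents, low, high):
    """Return values in [low, high] and the number of extents actually scanned."""
    matches = []
    scanned = 0
    for extent in extents:
        # Metadata alone proves this extent cannot contain a match: skip it.
        if extent.max_value < low or extent.min_value > high:
            continue
        scanned += 1
        matches.extend(v for v in extent.values if low <= v <= high)
    return matches, scanned


def demo_scan():
    extents = [Extent([1, 2, 3, 4]), Extent([5, 6, 7, 8]), Extent([9, 10, 11, 12])]
    return scan_range(extents, 6, 7)
```

In this sketch only one of the three extents is read for the range 6 to 7; the other two are ruled out from metadata without touching their values.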

Trans-Relational Features

✓ Distributed-system aware optimizer moves the processing to the data even in complex SQL operations
✓ N-Way join operation in primitive structure
✓ Flexible join/aggregation behavior can be PM based, UM based, or on disk
✓ Handles nested query constructs while still “moving the processing to the data”
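
One common way to "move the processing to the data" for an aggregation is a two-phase plan, sketched below in Python. This is an assumption about the general technique, not a description of InfiniDB internals; pm_partial_sum and um_merge are hypothetical names for work done on a Performance Module (PM) and on the User Module (UM).

```python
from collections import Counter


def pm_partial_sum(rows):
    """Runs where the data lives: sum amounts per key for one node's rows."""
    partial = Counter()
    for key, amount in rows:
        partial[key] += amount
    return partial


def um_merge(partials):
    """Runs centrally: combine the small per-node partial results."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds values key by key
    return dict(total)


def demo_aggregate():
    node_a = [("retail", 10), ("web", 5)]
    node_b = [("retail", 7), ("telecom", 3)]
    return um_merge([pm_partial_sum(node_a), pm_partial_sum(node_b)])
```

Only the per-key partial sums travel between nodes, not the underlying rows, so the data shipped grows with the number of groups rather than the number of rows.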

Trans-Relational Hadoop Features

✓ Parallel/scalable bulk loads – linear scale for load operations
✓ Parallel/scalable transform/aggregate operations
✓ Parallel/scalable extract
✓ Parallel/scalable input into R or other tools delivers open access to predictive analytics

InfiniDB Design Principles

Scalable

Fast

Simple


InfiniDB Customers


InfiniDB Deployment Options

Cloud
•  Amazon® machine image

Source Code
•  https://github.com/infinidb/infinidb

Apache Hadoop Distros:
•  Cloudera®
•  Hortonworks®
•  Apache Hadoop®
•  Also MapR®, IBM BigInsights®


Perceptions & Questions

Analyst: Robin Bloor

Built To Scale?

Robin Bloor, PhD

Hadoop and the Data Warehouse

Hadoop and its multitude of components will not supersede the data warehouse:

•  The HDFS is suited to be a database data store (for column-stored data, but with row stores there’s a problem)
•  MapReduce is NOT an appropriate algorithm for database optimization
•  YARN is a useful capability for scheduling resource sharing
•  What is required is a database architecture AND an optimizer AND a SQL capability

The “Old” Data Warehouse

Data wrangling is also a workload!

The “New” Data Warehouse

Data wrangling is a much more significant workload! Analytics is also a significant workload!

Data Wrangling

The Central Data Engine

At this point in time it looks reasonably certain that the CENTRAL DATA ENGINE will be a scale-out column-store SQL DBMS

•  In general what is the DBA overhead to an InfiniDB database compared with, say, Oracle?
•  How does InfiniDB organize its data on HDFS? Is that different from the way it uses MySQL as a store?
•  Is there any qualitative difference between the HDFS and MySQL versions of InfiniDB?
•  Please explain the open source arrangement with InfiniDB.
•  What do you see as the sweet spot for this database?
•  In respect to scale, what is your largest implementation by data volume?
•  Does InfiniDB have specific support for analytical applications?



Upcoming Topics

www.insideanalysis.com


THANK YOU for your ATTENTION!

Opening slide image courtesy of Wikimedia Commons

Date post:	02-Jul-2015
Category:	Technology
Upload:	inside-analysis