
Introduction to Apache Drill 1.0

Date post: 19-Jul-2015
Upload: mapr-technologies
© 2015 MapR Technologies 1

Data Is Doubling Every Two Years

Unstructured data will account for more than 80% of the data collected by organizations.

Source: Human-Computer Interaction & Knowledge Discovery in Complex Unstructured, Big Data

[Chart: Total Data Stored (Structured Data, Semi-Structured Data) and IT Resources, 1980 to 2020]


Data Increasingly Stored in Non-Relational Datastores

[Timeline: 1980 to 2020]

RELATIONAL DATABASES

• Volume: GBs-TBs

• Structure: Structured

• Development: Planned (release cycle = months-years)

• Database: Fixed schema; DBA controls structure

NON-RELATIONAL DATASTORES

• Volume: TBs-PBs

• Structure: Structured, semi-structured and unstructured

• Development: Iterative (release cycle = days-weeks)

• Database: Dynamic / flexible schema; application controls structure


How To Bring SQL Into An Unstructured Future?

Familiarity of SQL

• SQL

• BI (Tableau, MicroStrategy, etc.)

• Low latency

• Scalability

Flexibility of NoSQL

• No schema management

– HDFS (Parquet, JSON, etc.)

– HBase

– …

• No transform or silos of data


Industry's First Schema-free SQL Engine for Big Data


Apache Drill Brings Flexibility & Performance

Access to any data type, any data source

• Relational

• Nested data

• Schema-less

Rapid time to insights

• Query data in-situ

• No schemas required

• Easy to get started

Integration with existing tools

• ANSI SQL

• BI tool integration

Scale in all dimensions

• TB-PB of scale

• 1000s of users

• 1000s of nodes

Granular Security

• Authentication

• Row/column level controls

• Decentralized
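
As a rough illustration of the "query data in-situ" and "no schemas required" points on this slide, here is a minimal Python sketch that builds a SQL request against a raw JSON file. It assumes Drill's REST endpoint `/query.json` on the default web port 8047 and the `dfs` storage plugin; the host, the file path and the helper name `build_drill_request` are hypothetical and not taken from the slides.

```python
import json

# Assumption: a Drillbit reachable locally; 8047 is Drill's default web port.
DRILL_QUERY_URL = "http://localhost:8047/query.json"


def build_drill_request(file_path: str, limit: int = 10) -> dict:
    """Build the JSON body for a SQL query over a raw file (hypothetical helper)."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    # The file is addressed by path through the dfs plugin; backticks quote the
    # path, and no table or schema is declared before querying.
    sql = f"SELECT * FROM dfs.`{file_path}` LIMIT {limit}"
    return {"queryType": "SQL", "query": sql}


# Hypothetical path to a JSON file sitting in the file system as-is.
payload = build_drill_request("/data/logs/clicks.json", limit=5)
print(json.dumps(payload, indent=2))
```

The sketch only constructs the request body. Sending it would be an HTTP POST of `payload` as JSON to `DRILL_QUERY_URL`, which is left out so the example runs without a cluster.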


Extending Self-Service to Schema-Free Data

[Chart: use cases for BI over time; vertical axis: Agility & Business Value]

• 1980s-1990s: IT-Driven BI (IT-created reports, spreadsheets)

• 2000s: IT-Driven BI, plus Self-Service BI (analyst-driven with IT support for ETL)

• Now: IT-Driven BI and Self-Service BI, plus Schema-Free Data Exploration (analyst-driven with no IT dependency)


Enabling “As-It-Happens” Business with Instant Analytics

New business questions

Source data evolution

Governed approach

• Hadoop data -> Data modeling -> Transformation -> Data movement (optional) -> Users

• Total time to insight: weeks to months

Exploratory approach

• Hadoop data -> Users

• Total time to insight: minutes


Drill’s Role in the Enterprise Data Architecture

Raw data

• JSON, CSV, ...

“Optimized” data

• Parquet, …

Centrally-structured data

• Schemas in Hive Metastore

Relational data

• Highly-structured data

Hive, Impala, Spark SQL

Oracle, Teradata

Exploration (known and unknown questions)


MapR with Drill is Top-Ranked SQL-on-Hadoop

Source: Gigaom Research, 2015

Key:

• Number indicates company’s relative strength across all vectors

• Size of ball indicates company’s relative strength along individual vector

Like other vendors’ offerings, Drill handles BI and interactive queries with great aplomb, but it is designed to serve these workloads with data complexity that goes well beyond the flat structured data that other SQL-on-Hadoop systems deal with.


Drill Benefits

Business users (Business Analyst, Data scientists)

• Self-service access to Hadoop data from BI tools

• Agility with no IT intervention

• Interactive performance

Technical IT (VP of Hadoop Dev., Director of BI & Analytics, Enterprise architect)

• Drive Hadoop adoption in company

• Enable better/new BI on raw, real-time and new data types

• Reduce cost of traditional systems


Additional Resources

Resource Hub: MapR.com/Drill

Tutorial: Apache Drill in 10 Minutes

Whitepaper: Faster Time to Value

