Date posted: 22-Nov-2014
Category: Technology
Uploaded by: kognitio
Big Data and MicroStrategy: Building a Bridge for the Elephant
Jan 2013
Paul Groom, Chief Innovation Officer
Let’s start at…
The End.
Panacea
You…built the DWE
You…built the BICC
and yes you built… lots of cool reports and dashboards
Epilogue: A comfortable status quo
How are you really judged?
• Fast?
• Consistent?
• All users?
Rrrrrriiiiiiinnnnnngggggg!
Back to the real world
Disruption
Disruptor: New Data
Disruptor: Social Media & Sentiment Data?
Disruptor: More Connected Users
Disruptor: Data Discovery Tools
Choices for engaging quickly with data
Business users’ heads distracted from core BI!
BI Wild West
Where it matters
Lots of variety of DW and EDW
analytical workload
The Reality of the DW
EDW says no, or not now! …and the CFO says no big upgrades
Pragmatism
…ok, so you enable plenty of caching, limit drill-anywhere, and add Intelligent Cubes
And then came…
Boon? Distraction?
or
Scalable, resilient bit bucket?
Experimenting
The Hadoop stack
• HDFS
• HBase
• MapReduce
• Oozie
• ZooKeeper / Ambari
• HCatalog
• Pig
• Hive
Hadoop Performance Reality
• Hadoop is batch oriented
• HDFS access is fast but crude
• MapReduce is powerful but has overheads
– ~30 second base response time
– Too much latency in stack and processing model
– Trade-off in optimization and latency
• MapReduce is complex
– Typically multiple Java routines
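The overhead the bullets describe comes from the model itself: every job, however small, passes through map, shuffle, and reduce phases. A toy in-process sketch of that model in Python (illustration only, not Hadoop itself):

```python
from collections import defaultdict

def map_phase(records):
    """Emit (word, 1) pairs, like a Hadoop Mapper."""
    for line in records:
        for word in line.split():
            yield (word, 1)

def shuffle_phase(pairs):
    """Group values by key, like Hadoop's shuffle/sort step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Sum each key's values, like a Hadoop Reducer."""
    return {key: sum(values) for key, values in groups.items()}

records = ["big data big bridge", "data elephant"]
counts = reduce_phase(shuffle_phase(map_phase(records)))
```

In-process this is instant; on a cluster each phase adds scheduling, spill-to-disk, and coordination costs, which is where the ~30-second floor comes from.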
https://www.facebook.com/notes/facebook-engineering/under-the-hood-scheduling-mapreduce-jobs-more-efficiently-with-corona/10151142560538920
SQL to the Rescue
• So MapReduce is complicated
– use Hive (SQL) as the easy way out
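Hive’s appeal is that one declarative statement replaces hand-written mapper/reducer classes. A rough illustration using Python’s built-in sqlite3 as a stand-in for Hive (HiveQL is near-identical for simple aggregates):

```python
import sqlite3

# Illustration only: sqlite3 stands in for Hive. A GROUP BY like the one
# below is the kind of query Hive compiles into MapReduce jobs behind
# the scenes, sparing the user multiple Java routines.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clicks (page TEXT, user TEXT)")
conn.executemany("INSERT INTO clicks VALUES (?, ?)",
                 [("home", "a"), ("home", "b"), ("cart", "a")])

# One declarative statement instead of map/shuffle/reduce code.
rows = conn.execute(
    "SELECT page, COUNT(*) FROM clicks GROUP BY page ORDER BY page"
).fetchall()
```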
Hive
• Simplifies access
“Hive is great, but Hadoop’s execution engine
makes even the smallest queries take minutes!”
• Only basic SQL support
• Concurrency needs careful system admin
• It’s not a silver bullet for interactive BI usage
Hadoop just too slow for interactive BI!
…loss of train-of-thought
Conclusion
“while hadoop shines as a processing
platform, it is painfully slow as a query tool”
Hive is based on Hadoop which is a batch processing system. Accordingly, this system does not and cannot promise low latencies on queries. The paradigm here is strictly of submitting jobs and being notified when the jobs are completed as opposed to real time queries. As a result it should not be compared with systems like Oracle where analysis is done on a significantly smaller amount of data but the analysis proceeds much more iteratively with the response times between iterations being less than a few minutes. For Hive queries response times for even the smallest jobs can be of the order of 5-10 minutes and for larger jobs this may even run into hours.
I remain skeptical on the practical performance of the Hive query approach and have yet to talk to any beta customers. A more practical approach is loading some of the Hadoop data into the in-memory cube with the new Hadoop connector.
Why can’t Hadoop be in-memory?
Why can’t I have giant iCubes?
Lots of these (disks)
Not so many of these (CPU cores)
Remember…
Hadoop inherently disk oriented
Typically low ratio of CPU to Disk
Larger cubes
Issues: Time to Populate, Proliferation
Analytics requires CPU; RAM keeps the data close
Alternative - In-memory Processing
Cores do the work!
Scale with the data
Goals: Minimise Disruption, Cut Latency
• Don’t change the existing BI and analytics
• Support more creative and dynamic BI
• Don’t introduce yet more slow disk
– Help the DW investment
• No complex ETL, just pull data as required
• Pull data simply and intelligently from Hadoop
• Simplify – fewer cubes and caches
• Improve sharing of data
• Increase concurrency and throughput
– It’s all about queries per hour!
• Minimal DBA requirement
Kognitio Hadoop Connectors
HDFS Connector
• Connector defines access to the HDFS file system
• External table accesses row-based data in HDFS
• Dynamic access, or “pin” data into memory
• Selected HDFS file(s) loaded into memory
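The external-table idea can be sketched as follows. This is a hypothetical Python simulation (the class and method names are illustrative, not the real Kognitio API, which is SQL DDL):

```python
# Sketch: an external table reads rows from files on demand ("dynamic
# access"), or pins a copy in RAM so repeated queries skip the file
# system entirely. Names here are illustrative assumptions.

class ExternalTable:
    def __init__(self, files):
        self.files = files          # row-based files, as in HDFS
        self.pinned = None          # in-memory copy, if pinned

    def _read(self):
        for f in self.files:
            yield from f            # each file is an iterable of rows

    def pin(self):
        """Load the selected files into memory once."""
        self.pinned = list(self._read())

    def scan(self):
        """Serve rows from memory if pinned, else re-read the files."""
        return list(self.pinned) if self.pinned else list(self._read())

files = [[("r1",), ("r2",)], [("r3",)]]
t = ExternalTable(files)
dynamic = t.scan()   # reads through to the files every time
t.pin()
pinned = t.scan()    # served from memory
```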
Filter Agent Connector
• Connector uploads an agent to the Hadoop nodes
• Query passes selections and relevant predicates to the agent
• Data filtering and projection take place locally on each Hadoop node
• Only data of interest is loaded into memory via parallel load streams
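The filter-agent idea is predicate pushdown: filter and project on the node that holds the data, so only rows of interest cross the network. A hypothetical sketch (function and field names are illustrative, not the actual agent protocol):

```python
# Sketch: each Hadoop node runs an agent that applies the query's
# predicate and column projection locally; only the surviving rows
# and columns are shipped into memory.

def agent(node_rows, predicate, columns):
    """Runs on one node: filter rows, then project columns."""
    return [tuple(row[c] for c in columns)
            for row in node_rows if predicate(row)]

nodes = [
    [{"region": "EU", "sales": 10, "notes": "x"},
     {"region": "NA", "sales": 7,  "notes": "y"}],
    [{"region": "EU", "sales": 3,  "notes": "z"}],
]

# Query: SELECT region, sales WHERE region = 'EU'
wanted = lambda row: row["region"] == "EU"
loaded = [r for node in nodes
          for r in agent(node, wanted, ("region", "sales"))]
```

In the real connector each node’s result would arrive over its own parallel load stream; here they are simply concatenated.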
• Centrally defined data models
• Persist data in its natural store
• Fetch when needed, agile
• Available to all tools
Analytical power
BI – Central Governance
Engineering for Success
connect
www.kognitio.com
twitter.com/kognitio
linkedin.com/companies/kognitio
tinyurl.com/kognitio
youtube.com/kognitio
NA: +1 855 KOGNITIO
EMEA: +44 1344 300 770