Date posted: 22-Nov-2014
Category: Technology
Uploaded by: kognitio
Big Data and MicroStrategy: Building a Bridge for the Elephant
Jan 2013
Paul Groom, Chief Innovation Officer
Let’s start at…
The End.
Panacea
You…built the DWE
You…built the BICC
and yes you built… lots of cool reports and dashboards
Epilogue: A comfortable status quo
How are you really judged?
• Fast?
• Consistent?
• All users?
Rrrrrriiiiiiinnnnnngggggg!
Back to the real world
Disruption
Disruptor: New Data
Disruptor: Social Media & Sentiment Data?
Disruptor: More Connected Users
Disruptor: Data Discovery Tools
Choices for engaging quickly with data
Business users’ heads distracted from core BI!
BI Wild West
Where it matters
Lots of variety of DW and EDW
analytical workload
The Reality of the DW
EDW says no, or not now! …and the CFO says no big upgrades
Pragmatism
…ok, so you enable plenty of caching, limit drill-anywhere, and add Intelligent Cubes
And then came…
Boon? Distraction?
or
Scalable, resilient bit bucket?
Experimenting
The Hadoop stack
• HDFS
• HBase
• MapReduce
• Oozie
• ZooKeeper / Ambari
• HCatalog
• Pig
• Hive
Hadoop Performance Reality
• Hadoop is batch oriented
• HDFS access is fast but crude
• MapReduce is powerful but has overheads
– ~30 second base response time
– Too much latency in stack and processing model
– Trade-off in optimization and latency
• MapReduce is complex
– Typically multiple Java routines
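The overhead the bullets describe comes from the model itself: every job, however small, passes through map, shuffle, and reduce phases. A toy in-process sketch of that model in Python (illustration only, not Hadoop itself):

```python
from collections import defaultdict

def map_phase(records):
    """Emit (word, 1) pairs, like a Hadoop Mapper."""
    for line in records:
        for word in line.split():
            yield (word, 1)

def shuffle_phase(pairs):
    """Group values by key, like Hadoop's shuffle/sort step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Sum each key's values, like a Hadoop Reducer."""
    return {key: sum(values) for key, values in groups.items()}

records = ["big data big bridge", "data elephant"]
counts = reduce_phase(shuffle_phase(map_phase(records)))
```

In-process this is instant; on a cluster each phase adds scheduling, spill-to-disk, and coordination costs, which is where the ~30-second floor comes from.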
https://www.facebook.com/notes/facebook-engineering/under-the-hood-scheduling-mapreduce-jobs-more-efficiently-with-corona/10151142560538920
SQL to the Rescue
• So MapReduce is complicated
– use Hive (SQL) as the easy way out
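Hive’s appeal is that one declarative statement replaces hand-written mapper/reducer classes. A rough illustration using Python’s built-in sqlite3 as a stand-in for Hive (HiveQL is near-identical for simple aggregates):

```python
import sqlite3

# Illustration only: sqlite3 stands in for Hive. A GROUP BY like the one
# below is the kind of query Hive compiles into MapReduce jobs behind
# the scenes, sparing the user multiple Java routines.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clicks (page TEXT, user TEXT)")
conn.executemany("INSERT INTO clicks VALUES (?, ?)",
                 [("home", "a"), ("home", "b"), ("cart", "a")])

# One declarative statement instead of map/shuffle/reduce code.
rows = conn.execute(
    "SELECT page, COUNT(*) FROM clicks GROUP BY page ORDER BY page"
).fetchall()
```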
Hive
• Simplifies access
“Hive is great, but Hadoop’s execution engine
makes even the smallest queries take minutes!”
• Only basic SQL support
• Concurrency needs careful system admin
• It’s not a silver bullet for interactive BI usage
Hadoop just too slow for interactive BI!
…loss of train-of-thought
Conclusion
“while hadoop shines as a processing
platform, it is painfully slow as a query tool”
Hive is based on Hadoop which is a batch processing system. Accordingly, this system does not and cannot promise low latencies on queries. The paradigm here is strictly of submitting jobs and being notified when the jobs are completed as opposed to real time queries. As a result it should not be compared with systems like Oracle where analysis is done on a significantly smaller amount of data but the analysis proceeds much more iteratively with the response times between iterations being less than a few minutes. For Hive queries response times for even the smallest jobs can be of the order of 5-10 minutes and for larger jobs this may even run into hours.
I remain skeptical on the practical performance of the Hive query approach and have yet to talk to any beta customers. A more practical approach is loading some of the Hadoop data into the in-memory cube with the new Hadoop connector.
Why can’t Hadoop be in-memory?
Why can’t I have giant iCubes?
Lots of these (disks)
Not so many of these (CPU cores)
Remember…
Hadoop inherently disk oriented
Typically low ratio of CPU to Disk
Larger cubes
Issues: Time to Populate, Proliferation
Analytics requires CPU; RAM keeps the data close
Alternative - In-memory Processing
Cores do the work!
Scale with the data
Goals: Minimise Disruption, Cut Latency
• Don’t change the existing BI and analytics
• Support more creative and dynamic BI
• Don’t introduce yet more slow disk
– Help the DW investment
• No complex ETL, just pull data as required
• Pull data simply and intelligently from Hadoop
• Simplify – fewer cubes and caches
• Improve sharing of data
• Increase concurrency and throughput
– It’s all about queries per hour!
• Minimal DBA requirement
Kognitio Hadoop Connectors
HDFS Connector
• Connector defines access to the HDFS file system
• External table accesses row-based data in HDFS
• Dynamic access, or “pin” data into memory
• Selected HDFS file(s) loaded into memory
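The external-table idea can be sketched as follows. This is a hypothetical Python simulation (the class and method names are illustrative, not the real Kognitio API, which is SQL DDL):

```python
# Sketch: an external table reads rows from files on demand ("dynamic
# access"), or pins a copy in RAM so repeated queries skip the file
# system entirely. Names here are illustrative assumptions.

class ExternalTable:
    def __init__(self, files):
        self.files = files          # row-based files, as in HDFS
        self.pinned = None          # in-memory copy, if pinned

    def _read(self):
        for f in self.files:
            yield from f            # each file is an iterable of rows

    def pin(self):
        """Load the selected files into memory once."""
        self.pinned = list(self._read())

    def scan(self):
        """Serve rows from memory if pinned, else re-read the files."""
        return list(self.pinned) if self.pinned else list(self._read())

files = [[("r1",), ("r2",)], [("r3",)]]
t = ExternalTable(files)
dynamic = t.scan()   # reads through to the files every time
t.pin()
pinned = t.scan()    # served from memory
```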
Filter Agent Connector
• Connector uploads an agent to the Hadoop nodes
• Query passes selections and relevant predicates to the agent
• Data filtering and projection take place locally on each Hadoop node
• Only data of interest is loaded into memory via parallel load streams
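The filter-agent idea is predicate pushdown: filter and project on the node that holds the data, so only rows of interest cross the network. A hypothetical sketch (function and field names are illustrative, not the actual agent protocol):

```python
# Sketch: each Hadoop node runs an agent that applies the query's
# predicate and column projection locally; only the surviving rows
# and columns are shipped into memory.

def agent(node_rows, predicate, columns):
    """Runs on one node: filter rows, then project columns."""
    return [tuple(row[c] for c in columns)
            for row in node_rows if predicate(row)]

nodes = [
    [{"region": "EU", "sales": 10, "notes": "x"},
     {"region": "NA", "sales": 7,  "notes": "y"}],
    [{"region": "EU", "sales": 3,  "notes": "z"}],
]

# Query: SELECT region, sales WHERE region = 'EU'
wanted = lambda row: row["region"] == "EU"
loaded = [r for node in nodes
          for r in agent(node, wanted, ("region", "sales"))]
```

In the real connector each node’s result would arrive over its own parallel load stream; here they are simply concatenated.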
• Centrally defined data models
• Persist data in its natural store
• Fetch when needed, agile
• Available to all tools
Analytical power
BI – Central Governance
Engineering for Success
connect
www.kognitio.com
twitter.com/kognitio
linkedin.com/companies/kognitio
tinyurl.com/kognitio
youtube.com/kognitio
NA: +1 855 KOGNITIO
EMEA: +44 1344 300 770