Date posted: 10-Nov-2014
Category: Technology
Uploaded by: clintongreen
HADOOP: NOT DEAD YET
Clint Green (+clintgreen)
Volume Velocity Variety
“Big Data” is when the size of the data itself becomes part of the
problem.
- Big Data Now, O’Reilly
PERCEPTION IS REALITY
Hadoop is Flawed
“You can’t install it without an expert.”
“Fine for R&D, but not for real production.”
“Hadoop is just for batch processing.”
“The dirty-little-secret with Hadoop is…”
Hadoop isn’t for RealWork™
1. Adopt Hadoop for pilot projects.
2. Scale Hadoop to production use.
3. Observe an unacceptable performance penalty.
4. Morph to a real parallel DBMS.
- Michael Stonebraker, CACM, May 2012 (Vertica, VoltDB, SciDB)
Availability
Partition Tolerance
Consistency
Atomicity
Consistency
Isolation
Durability
ACID
“4. Morph to a real parallel DBMS.”
REALITY IS RELATIVE
Evolve
“Hadoop has become the kernel of the distributed operating system for Big Data…
No one uses the kernel alone.”
- Doug Cutting, Strata 2012 (Cloudera, ASF)
Hadoop + MapReduce
“There is nothing really embarrassing about embarrassingly parallel applications.”
- Luiz André Barroso, ACM 2011 (Distinguished Engineer, Google)
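To make the "embarrassingly parallel" point concrete, here is a minimal sketch of the map, shuffle, and reduce phases as a word count in plain Python. It does not use Hadoop at all: the names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are made up for illustration, and a thread pool stands in for the cluster nodes that a real job would use.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(mapped):
    """Group emitted values by key, as a framework would between the two phases."""
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values seen for one key into a single count."""
    return key, sum(values)


def word_count(lines, workers=4):
    # Each line is mapped independently of every other line; that independence
    # is what makes the job embarrassingly parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, lines))
    groups = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in groups.items())


if __name__ == "__main__":
    print(word_count(["big data big problems", "Big Data Now"]))
```

Because no mapper needs another mapper's output, adding workers (or machines) requires no coordination beyond the shuffle step.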
Not Just for Batch Anymore…
Apache Hama
Apache Drill
Apache Hadoop YARN
The per-application ApplicationMaster is, in effect, a framework-specific library and is tasked with negotiating resources from the ResourceManager and working with the NodeManager(s) to execute and monitor the tasks.
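The division of labor described above can be sketched as a toy model. This is not the YARN API: the three classes below only borrow the role names, and their methods (`allocate`, `run`, `execute`) and the node names are hypothetical, chosen to show who talks to whom.

```python
class ResourceManager:
    """Toy stand-in for the cluster-wide scheduler; tracks free containers per node."""

    def __init__(self, free_containers):
        self.free = dict(free_containers)  # node name -> free container count

    def allocate(self, wanted):
        """Grant up to `wanted` containers, returning one node name per grant."""
        grants = []
        for node in self.free:
            while self.free[node] > 0 and len(grants) < wanted:
                self.free[node] -= 1
                grants.append(node)
        return grants


class NodeManager:
    """Toy stand-in for the per-node agent that launches tasks in containers."""

    def __init__(self, name):
        self.name = name

    def run(self, task):
        return f"{task} finished on {self.name}"


class ApplicationMaster:
    """Per-application coordinator: negotiates with the RM, then works with the NMs."""

    def __init__(self, rm, node_managers):
        self.rm = rm
        self.nms = node_managers  # node name -> NodeManager

    def execute(self, tasks):
        pending, results = list(tasks), []
        while pending:
            # Step 1: negotiate containers from the ResourceManager.
            grants = self.rm.allocate(len(pending))
            if not grants:
                raise RuntimeError("no containers available")
            # Step 2: hand one task to the NodeManager behind each granted container.
            for node in grants:
                results.append(self.nms[node].run(pending.pop(0)))
                self.rm.free[node] += 1  # the container is released when the task ends
        return results


if __name__ == "__main__":
    rm = ResourceManager({"node1": 2, "node2": 1})
    nms = {name: NodeManager(name) for name in ("node1", "node2")}
    for line in ApplicationMaster(rm, nms).execute(["t1", "t2", "t3", "t4"]):
        print(line)
```

The point of the sketch is that the scheduler knows nothing about the application's tasks; all framework-specific logic lives in the ApplicationMaster, which is why engines other than MapReduce can share the same cluster.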
No Secret Here
Help is on the Way
ACTUAL PROBLEMS
THANK YOU
Clint Green (+clintgreen)