
Hadoop Summit Keynote 2014

Date post: 19-Jun-2015
Upload: merv-adrian
Description:
Keynote presentation from 2014 Hadoop Summit
© 2014 Gartner, Inc. and/or its affiliates. All rights reserved. Gartner is a registered trademark of Gartner, Inc. or its affiliates. This publication may not be reproduced or distributed in any form without Gartner's prior written permission. If you are authorized to access this publication, your use of it is subject to the Usage Guidelines for Gartner Services posted on gartner.com. The information contained in this publication has been obtained from sources believed to be reliable. Gartner disclaims all warranties as to the accuracy, completeness or adequacy of such information and shall have no liability for errors, omissions or inadequacies in such information. This publication consists of the opinions of Gartner's research organization and should not be construed as statements of fact. The opinions expressed herein are subject to change without notice. Although Gartner research may include a discussion of related legal issues, Gartner does not provide legal advice or services and its research should not be construed or used as such. Gartner is a public company, and its shareholders may include firms and funds that have financial interests in entities covered in Gartner research. Gartner's Board of Directors may include senior managers of these firms or funds. Gartner research is produced independently by its research organization without input or influence from these firms, funds or their managers. For further information on the independence and integrity of Gartner research, see "Guiding Principles on Independence and Objectivity ." Merv Adrian Research Vice President, Information Management @merv Blogs.gartner.com/merv-adrian Waiting for Hadoop (Apologies to Samuel Beckett)
Transcript
  • 1. Waiting for Hadoop (Apologies to Samuel Beckett). Merv Adrian, Research Vice President, Information Management. @merv, blogs.gartner.com/merv-adrian

  • 2. What Is "Big Data"? "Big data" is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making. Source: The Importance of 'Big Data': A Definition, Mark Beyer, Douglas Laney, G00235055
  • 3. "Let's go." "We can't." (Waiting for Hadoop)
  • 4. "Why not?" "Let's wait till we know exactly how we stand." (Waiting for Hadoop)
  • 5. Big Data Plans? Many Find Themselves Waiting
  • 6. Investments Are on the Rise, And Deployments Are Beginning. [Chart. Legend: Have invested in big data technology; Plan to within next year; Plan to within 2 years; No plans at this time; Don't know. 2013: 64% investing or planning (N = 720). 2012: 58% investing or planning (N = 473).] Source: Gartner Research Circle Surveys, 2012, 2013
  • 7. But They Know the Leading Opportunities. [Chart, 2014 vs. 2013, axis 0% to 35%. Categories: Monetizing Data (Directly/Indirectly); Marketing & Sales Growth; New Products & Services Innovation; Risk & Fraud Detection; Operational & Financial Performance.]
  • 8. "I wouldn't even know him if I saw him." "Who is he?"
  • 9. So We Search on Gartner.com, 2nd Highest Term. [Chart: searches per month, axis 0 to 1,600, by month from January. Labels: Big Data + Hadoop; Magic Quadrant. Callout: over 1,000 searches per month.]
  • 10. Starting With What You Need to Do, We See Pieces of a Solution: Analyze; Compute; Persist; Ingest; Monitor, Administer; Describe
  • 11. How to Begin?
  • 12. "It's the start that's difficult." "You can start from anything."
  • 13. "Yes, but you have to decide."
  • 14 to 18. [No transcript text.]
  • 19. The Complexity of Stack Composition Is Rising: Ingest/Propagate; Describe, Develop; Compute, Search; Persist; Monitor, Administer; Analytics, Machine Learning
  • 20. And Usage Moves - From Pilot to Production. [Chart. Categories: Piloting on premise; Piloting in the cloud; Production on premise with cluster; Production on premise with appliance; Production in the cloud. Values shown: 10%, 15%, 57%, 14%, 4% (mapping to categories not recoverable from the transcript).] Source: Gartner Webinar, n = 127
  • 21. And Production Means Growth. [Chart. Categories: None, haven't started yet; Fewer than 10 nodes; Between 11 and 50 nodes; Between 51 and 100 nodes; Over 100 nodes. Values shown: 1%, 15%, 4%, 18%, 62% (mapping to categories not recoverable from the transcript).] Source: Gartner Webinar, April 2014, n = 145
  • 22. What Is Your Secondary Processing Mode for Hadoop? [Chart. Categories: Stream processing; Interactive analytics; Graph applications; Database Management Systems; Search. Values shown: 18%, 14%, 53%, 6%, 9% (mapping to categories not recoverable from the transcript).] Source: Gartner Webinar, April 2014, n = 120
  • 23. "Then all we have to do is wait on here." "It's not certain."
  • 24. "No, nothing is certain..."
  • 25. So, After Batch, What's Next?
  • 26. YARN Changes the Game: It All Starts Here. [Diagram: HDFS (Distributed Storage) at the base; YARN (Cluster Resource Management) above it; workloads on top: SQL, Interactive, Streaming and events, DBMSs (Graph, others), Batch, In-Memory, Search.]
  • 27. SQL-on-Hadoop Is The Most Typical Addition in 2014. Which SQL-on-Hadoop Approach Are You MOST Likely to Use in 2014? [Chart. Categories: Creating your own SQL queries via Hive; Using a distribution-specific SQL solution (e.g., Cloudera Impala, Pivotal HAWQ); Using interfaces to HDFS/HBase from analytics tool providers (e.g., Cognos, SAS, Tableau); Using Hadoop BI specialists (e.g., Platfora, Datameer); Getting to HDFS/HBase data from your DBMS external table capability (e.g., Kognitio HDFS Connector, Teradata SQL-H). Values shown: 9%, 27%, 23%, 9%, 32% (mapping to categories not recoverable from the transcript).] Source: Gartner Webinars 2014, n = 164
  • 28. HBase Is The Default Hadoop Database, But Not Alone. In every distribution. Not just the "Valleybase" anymore: Bloomberg, Nielsen, others adopt. Becoming more secure: cell level is coming. But there are alternatives: NoSQL (Accumulo, Apache Cassandra, MongoDB...); RDBMS on cluster and off.
  • 29. "Let's go." "We can't."
  • 30. "Why not?" "We're waiting for [Hadoop]."
  • 31. Spark Powers Machine Learning, Other Iterative Uses In-Memory. Unifies batch, streaming, interactive comp. Easy to build sophisticated applications. Supports iterative, graph-parallel algorithms. Powerful APIs in Scala, Python, Java. In-memory execution engine (richer alternative to MapReduce) for multiple reuse of data, to support iterative algorithms (machine learning, graphs) and interactive data mining. Directed acyclic graphs, function pipelining, partition aware (minimize shuffle). Used with HDFS, HBase. Streaming applications. [Diagram: Spark with BlinkDB, Spark Streaming, Shark SQL, GraphX, MLlib. Labels: Sophisticated algos.; Streaming; Batch, Interactive; Interactive; Data-parallel, Iterative.]
  • 32. Storm: Do-It-Yourself Stream Processing. Storm processes streams. Spouts emit tuples: k/v tuples representing events. Bolts consume tuples and pass them through the rest of the topology. Logic and topology are up to you. Apache: Incubating. [Diagram: two spouts feeding six bolts.]
  • 33. Tackling the Limitations of Search. [Labels: Finding Stuff; Shifting Schemas; On-the-fly Aggregations; Distributed Computing.] Iterating over a large number of results. Doing calculations on field values for lots of documents. Joining values from multiple indexes. Does not do complex analytic chains well. You must precalculate answers to facilitate responsiveness. If new data changes stored answers, you must reindex. Indexes are HUGE.
  • 34. Hadoop to the Rescue? Maybe. Scalable, reliable, fault-tolerant data processing. Very good for batch processing of lots of data. Can do very complex analysis. Can work on data from multiple records at once. But it's hands-on. Much assembly required.
  • 35. "...we'll come back tomorrow." "And then the day after tomorrow." "And so on." "He should be here." "And if he doesn't come?"
  • 36. So Now We Wait For What's Next. But First...
  • 37. Securing HDFS: There's No DBMS There. [Diagram labels: Supported Distribution; Access Restriction (Physical and Logical); Configuration & Vulnerability Management; Identity & Access Management; Network traffic encryption; Audit & Protection; Data masking; Tokenization, encryption; Data Protection; Monitoring For Sensitive Data; Data Anonymization; Admin. Privilege Management; Change Management; Log Management; Operations Hygiene; HDFS Data.]
  • 38. Data lake or reservoir?
  • 39. "Big Data Replacing the Data Warehouse?" Not a Relevant Notion. It Joins the Warehouse. Data warehouses are collections of data, not technology platforms. A data warehouse can be made out of anything that manages data. The key point is that when we find value, it is indeed managed. [Chart: What Are Organizations Planning for Their DWs?] Source: MQ DW DBMS Survey, Nov. 2013 and Nov. 2012
  • 40. A reservoir contains water that is: Managed; Transformed; Filtered; Secured (somewhat); Portable; Potable (fit for consumption)
  • 41. "And it's not over." "Apparently not."
  • 42. "It's only beginning."
  • 43. The Journey from Pilot to Production to Platform Begins here. Thank you! Image: http://www.flickr.com/photos/orinrobertjohn/3267286885/sizes/o/in/photostream/
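
The spout-and-bolt model described on the Storm slide (slide 32) can be illustrated with a minimal sketch. This is plain Python using generators, not the Storm API: the names `click_spout`, `filter_bolt`, `count_bolt` and `run_topology`, and the sample page-click events, are all hypothetical and chosen only for illustration. It runs in a single process, so it shows the dataflow idea only, not Storm's distribution or fault tolerance.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, Iterator, Tuple

Event = Tuple[str, int]  # a key/value tuple representing one event


def click_spout() -> Iterator[Event]:
    """Spout: emits key/value tuples (hypothetical page-click events)."""
    for page in ["home", "pricing", "home", "docs", "home"]:
        yield (page, 1)


def filter_bolt(stream: Iterable[Event], keep: Callable[[str], bool]) -> Iterator[Event]:
    """Bolt: consumes tuples and passes on only those whose key is accepted."""
    for key, value in stream:
        if keep(key):
            yield (key, value)


def count_bolt(stream: Iterable[Event]) -> Iterator[Event]:
    """Bolt: keeps a running total per key and emits the updated total."""
    counts: Counter = Counter()
    for key, value in stream:
        counts[key] += value
        yield (key, counts[key])


def run_topology() -> Dict[str, int]:
    """Wire spout -> filter bolt -> count bolt; the wiring is up to the author."""
    stream = filter_bolt(click_spout(), keep=lambda key: key != "docs")
    latest: Dict[str, int] = {}
    for key, total in count_bolt(stream):
        latest[key] = total  # remember the most recent total for each key
    return latest


if __name__ == "__main__":
    print(run_topology())
```

Each function only knows about the tuples it receives, which mirrors the slide's point that both the logic and the topology are left to the developer.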

