Tuesday, June 8, 2010
Evolving a New Analytical Platform: What Works and What’s Missing
Jeff Hammerbacher, Chief Scientist and Vice President of Products, Cloudera
June 8, 2010
My Background: Thanks for Asking
▪ [email protected]
▪ Studied Mathematics at Harvard
▪ Worked as a Quant on Wall Street
▪ Conceived, built, and led Data team at Facebook
▪ Nearly 30 amazing engineers and data scientists
▪ Several open source projects and research papers
▪ Founder of Cloudera
▪ Chief Scientist
▪ Also, check out the book “Beautiful Data”
Presentation Outline
▪ BI: Science for Profit
▪ Need tools for whole research cycle
▪ SQL Server 2008 R2: defining the platform
▪ State of the Platform Ecosystem
▪ New Foundations: Hadoop
▪ Boiling the Frog
▪ Future developments
▪ Questions and Discussion
BI is looking more like science (for profit)
Jim Gray: Science entering Fourth Paradigm
“We have to do better at producing tools to support the whole research cycle”
RDBMS only a small part of this tool set
Example: SQL Server 2008 R2
RDBMS: SQL Server
ETL: SQL Server Integration Services
Reporting: SQL Server Reporting Services
Analysis: SQL Server Analysis Services
Search: Full-Text Search
CEP: StreamInsight
OLAP: PowerPivot
MDM: Master Data Services
Collaboration: SharePoint
What do we call this unified suite?
For today: Analytical Data Platform
Who makes up the platform ecosystem?
Platform Providers
Infrastructure Providers
Application Developers
End Users
Content Providers
What is new about the ecosystem today?
Content Providers
1. > 95% of enterprise data is unstructured
2. Data volumes growing rapidly
Infrastructure Providers
1. Cloud
2. Warehouse-Scale Computers
Platform Providers
1. Open source
2. Driven by consumer web properties
Application Developers
1. Data Scientists
2. Diversity of languages
End Users
1. Move beyond reporting to analytics
2. Make use of all enterprise data
New foundations: HDFS and MapReduce
(This is what boiling a frog feels like)
2005: Doug/Mike start project inside Nutch
2006: Doug joins Yahoo!
2007: Make Hadoop scale
Jim Gray’s “Fourth Paradigm” lecture
Yahoo! makes Pig open source
Randy Bryant’s “DISC” lecture
Powerset makes HBase open source
2008: Make Hadoop fast
First Hadoop Summit
Yahoo! wins Daytona terabyte sort benchmark
Yahoo! builds production webmap with Hadoop
Facebook makes Hive open source
“MapReduce: A Major Step Backwards”
2009: Insert Hadoop into the enterprise
Cloudera releases CDH
First Hadoop World NYC
Yahoo! sorts a petabyte with Hadoop
Cloudera adds training, support, services
“The Unreasonable Effectiveness of Data”
2010: Integrate Hadoop into the enterprise
IBM announces InfoSphere BigInsights
Yahoo! completes enterprise-class security
Datameer and Karmasphere funded
Teradata, Pentaho, and others integrate
Hive adds JDBC and ODBC
Hadoop will be an Analytical Data Platform
What’s Next?
Capture: Log collection and CEP
Curate: Workflow and Scheduling
Curate: Secondary and Full-Text Indexing
Curate: Learn Structure from Data
Analyze: Mesos-enabled frameworks
Analyze: Link local and global data
All behind a single pane of glass
Cloudera Desktop: Making Many Computers Feel Like One
(c) 2009 Cloudera, Inc. or its licensors. "Cloudera" is a registered trademark of Cloudera, Inc. All rights reserved.