20100608sigmod

Tuesday, June 8, 2010

Evolving a New Analytical Platform: What Works and What’s Missing

Jeff Hammerbacher
Chief Scientist and Vice President of Products, Cloudera
June 8, 2010

My Background: Thanks for Asking

▪ Studied Mathematics at Harvard
▪ Worked as a Quant on Wall Street
▪ Conceived, built, and led the Data team at Facebook
  ▪ Nearly 30 amazing engineers and data scientists
  ▪ Several open source projects and research papers

▪ Founder of Cloudera
  ▪ Chief Scientist
▪ Also, check out the book “Beautiful Data”

Presentation Outline
▪ BI: Science for Profit
▪ Need tools for the whole research cycle
▪ SQL Server 2008 R2: defining the platform
▪ State of the Platform Ecosystem
▪ New Foundations: Hadoop
▪ Boiling the Frog
▪ Future developments
▪ Questions and Discussion

BI is looking more like science (for profit)

Jim Gray: Science entering the Fourth Paradigm
“We have to do better at producing tools to support the whole research cycle”

RDBMS only a small part of this tool set

Example: SQL Server 2008 R2

RDBMS: SQL Server
ETL: SQL Server Integration Services
Reporting: SQL Server Reporting Services
Analysis: SQL Server Analysis Services
Search: Full-Text Search
CEP: StreamInsight
OLAP: PowerPivot
MDM: Master Data Services
Collaboration: SharePoint

What do we call this unified suite?

For today: Analytical Data Platform

Who makes up the platform ecosystem?

Platform Providers
Infrastructure Providers
Application Developers
End Users
Content Providers

What is new about the ecosystem today?

Content Providers
1. More than 95% of enterprise data is unstructured
2. Data volumes growing rapidly

Infrastructure Providers
1. Cloud
2. Warehouse-Scale Computers

Platform Providers
1. Open source
2. Driven by consumer web properties

Application Developers
1. Data Scientists
2. Diversity of languages

End Users
1. Move beyond reporting to analytics
2. Make use of all enterprise data

New foundations: HDFS and MapReduce

(This is what boiling a frog feels like)

2005: Doug Cutting and Mike Cafarella start the project inside Nutch

2006: Doug joins Yahoo!

2007: Make Hadoop scale
Jim Gray’s “Fourth Paradigm” lecture
Yahoo! makes Pig open source
Randy Bryant’s “DISC” lecture
Powerset makes HBase open source

2008: Make Hadoop fast
First Hadoop Summit
Yahoo! wins Daytona terabyte sort benchmark
Yahoo! builds production webmap with Hadoop
Facebook makes Hive open source
DeWitt and Stonebraker’s “MapReduce: A Major Step Backwards”

2009: Insert Hadoop into the enterprise
Cloudera releases CDH
First Hadoop World NYC
Yahoo! sorts a petabyte with Hadoop
Cloudera adds training, support, and services
Halevy, Norvig, and Pereira’s “The Unreasonable Effectiveness of Data”

2010: Integrate Hadoop into the enterprise
IBM announces InfoSphere BigInsights
Yahoo! completes enterprise-class security
Datameer and Karmasphere funded
Teradata, Pentaho, and others integrate with Hadoop
Hive adds JDBC and ODBC

Hadoop will be an Analytical Data Platform

What’s Next?

Capture: Log collection and CEP

Curate: Workflow and Scheduling

Curate: Secondary and Full-Text Indexing

Curate: Learn Structure from Data

Analyze: Mesos-enabled frameworks

Analyze: Link local and global data

All behind a single pane of glass

Cloudera Desktop: Making Many Computers Feel Like One

(c) 2009 Cloudera, Inc. or its licensors. "Cloudera" is a registered trademark of Cloudera, Inc. All rights reserved. 1.0
