JAZOON'13 - Benoit Perroud - Realtime Queries

Post on 15-Jan-2015


description

http://guide13.jazoon.com/#/submissions/133

transcript

Enabling Real-time Queries to End Users

Benoit Perroud

• Benoit Perroud

• Software Engineer @Verisign

• Leading Hadoop Team

• Apache Committer

• @killerwhile

About me

|

• What’s going on

• Batch and Realtime

• Hadoop Deployments

• Next steps

Agenda

|

• Mainframes are obsolete, replaced by clusters of commodity hardware

• TenG (10Gb/s) links are the new standard

• RESTful APIs are everywhere

• Everybody wants to visit Paxos island

• Firehoses do not only carry water

• Asynchronous non-blocking functional programming is taught at primary school

• NoSQL is the new way to store data at scale

• API management startups are rising (and raising)

• Hadoop keywords boost your LinkedIn profile by 2000%

• Public clouds are responsible for more than 50% of the global Internet traffic

• … and counting …

What’s going on

|


Source: http://dev.datasift.com/blog/high-scalability

Note: the diagram dates from 2009; it is probably partially or even completely outdated today

A Possible Deployment

Batch and Realtime

|

Batch Processing

[Timeline diagram]

• Batch 1 starts processing at t1 and is ready to be served at t2

• Batch 2 starts processing at t3 and is ready to be served at t4

• Batch 3 starts processing at t5

• Once batch 1 is served, queries return data from t1; once batch 2 is served, queries return data from t3

• Each served batch leaves a data gap
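The timeline above can be sketched in a few lines of Python. This is only an illustration: the `Batch` class, the function names and the numeric values standing in for t1..t4 are assumptions, not something taken from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Batch:
    name: str
    snapshot_at: float  # when the batch started processing (t1, t3, ...)
    ready_at: float     # when its results became servable (t2, t4, ...)


def served_batch(batches, query_time):
    """Return the most recently ready batch at query_time, or None."""
    ready = [b for b in batches if b.ready_at <= query_time]
    return max(ready, key=lambda b: b.ready_at) if ready else None


def data_gap(batches, query_time):
    """Time between the served snapshot and the query, or None if nothing is served."""
    batch = served_batch(batches, query_time)
    return None if batch is None else query_time - batch.snapshot_at


# Hypothetical values: t1=1, t2=2, t3=3, t4=4
batches = [Batch("batch 1", 1, 2), Batch("batch 2", 3, 4)]
```

A query issued at time 3.5 is still answered from batch 1, even though batch 2 has already started processing, so everything that arrived after t1 is invisible to it.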

|

Batch Processing in Detail

[Timeline diagram]

• Batch with data from yesterday

• New batch granularity period

• Leave some time for data to finish uploading

• Processing time

• Load results into a data store

• Notify the retrieval system that a new batch is ready to be served

|

Query data from the day before yesterday?
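To see why a daily batch can end up serving data from the day before yesterday, the delays on the timeline can simply be added up. The sketch below assumes hypothetical durations (1 hour of upload grace, 3 hours of processing, 1 hour of loading); none of these numbers come from the talk.

```python
from datetime import timedelta


def pipeline_delay(upload_grace, processing, load):
    """Time between the end of a batch period and the batch being served."""
    return upload_grace + processing + load


def worst_case_lag(granularity, upload_grace, processing, load):
    """Age of the newest served data just before the next batch becomes ready."""
    return granularity + pipeline_delay(upload_grace, processing, load)


def oldest_data_age(granularity, upload_grace, processing, load):
    """Age of the oldest data in the served batch at that same moment."""
    return 2 * granularity + pipeline_delay(upload_grace, processing, load)


# Hypothetical durations for a daily batch
lag = worst_case_lag(
    granularity=timedelta(days=1),
    upload_grace=timedelta(hours=1),
    processing=timedelta(hours=3),
    load=timedelta(hours=1),
)
```

Under these assumptions the newest data a user can query is up to 29 hours old, and the oldest data in the same batch is up to 53 hours old.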

• Interactive query

• REST-like request/response queries

And

• Query the latest version of the data

• “Latest” meaning n seconds ago, with n known and fixed

Realtime Query

|

Hybrid Approach

[Timeline diagram]

• Batch 1 starts processing at t1 and is ready to be served at t2

• Batch 2 starts processing at t3 and is ready to be served at t4

• Complementary data is collected for batch 1, then for batch 2

• Queries first read the t1 snapshot AND complementary data, then the t2 snapshot AND complementary data

|
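One way to picture the hybrid approach is a view that answers each query from the latest batch snapshot plus the events received since that snapshot. The following is a minimal in-memory sketch, not the speaker's implementation: the `HybridView` class, its method names and the per-key counting use case are all assumptions, and a batch is assumed to cover every event with a timestamp strictly before its snapshot time.

```python
from collections import Counter


class HybridView:
    """Serve counts from a batch snapshot plus complementary recent events."""

    def __init__(self):
        self.batch_counts = Counter()  # results of the latest served batch
        self.recent = []               # (timestamp, key) events not yet in a batch

    def record(self, timestamp, key):
        """Realtime path: keep the event until a batch covers it."""
        self.recent.append((timestamp, key))

    def load_batch(self, snapshot_time, counts):
        """Batch path: swap in new results and drop events the batch already covers."""
        self.batch_counts = Counter(counts)
        self.recent = [(t, k) for t, k in self.recent if t >= snapshot_time]

    def query(self, key):
        """Snapshot value AND complementary data."""
        complementary = sum(1 for _, k in self.recent if k == key)
        return self.batch_counts[key] + complementary
```

The point of the design is that the answer stays the same when a batch is swapped in: events move from the complementary side to the snapshot side without being lost or counted twice.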

Hadoop Deployments

|

Naïve Hadoop Deployment

[Diagram labels]

• Gateway

• NameNode

• JobTracker

• DataNodes (Processing)

Commands shown on the diagram:

hdfs dfs -put

mapred job …jar

hdfs dfs -get

|

Industry Hadoop Deployment

[Diagram labels]

• Data In GW

• Data Out GW

• Metadata Store

• Monitoring

• Gateway

• Several Hadoop clusters (NameNode, JobTracker, DataNodes), labelled Processing and Research, Data Science

|

Realtime Hadoop Deployment

[Diagram labels]

• Data In GW

• Gateway

• NameNode, JobTracker and DataNodes (Processing)

• NameNode, JobTracker (RT processing)

• RT Data Out GW

|

Realtime Search with Hadoop

[Diagram labels]

• Data In GW

• Gateway

• NameNode, JobTracker and DataNodes (Generate Indexes)

• NameNode, JobTracker

• RT Data Out GW (Update indexes)

• Coordinator

|
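The split between generating indexes in batch and updating them in real time can be illustrated with a toy inverted index. This is a plain in-memory sketch under stated assumptions: the function names are hypothetical, tokenization is a naive whitespace split, and it does not use the API of any real search engine.

```python
from collections import defaultdict


def build_index(docs):
    """Batch step: build an inverted index (term -> set of doc ids) from all documents."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def update_index(index, doc_id, text):
    """Realtime step: add one newly arrived document to the existing index."""
    for term in text.lower().split():
        index[term].add(doc_id)


def search(index, term):
    """Return the sorted ids of the documents containing the term."""
    return sorted(index.get(term.lower(), ()))
```

A document added through `update_index` becomes searchable immediately, without waiting for the next batch run to regenerate the whole index.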

Next Steps

|

… is moving … really fast

• Interactive queries: Cloudera Impala, Apache Drill, Tez, …

• Search: SolrCloud, ElasticSearch, Cloudera Search

• Hybrid layer: Twitter Summingbird

• … and counting …

Hadoop Ecosystem

|

Thanks for the attention!

Follow @killerwhile

bperroud@verisign.com

“Copyright © 2013 VeriSign, Inc.  All rights reserved.  The VERISIGN word mark, the Verisign logo, and other Verisign trademarks, service marks, and designs that may appear herein are registered or unregistered trademarks or service marks of VeriSign, Inc., and its subsidiaries in the United States and foreign countries.  All other trademarks, service marks, and designs are property of their respective owners.  Verisign has made efforts to ensure the accuracy and completeness of the information in this document.  However, Verisign makes no warranties of any kind (whether express, implied or statutory) with respect to the information contained herein. Verisign assumes no liability to any party for any loss or damage (whether direct or indirect) caused by any errors, omissions, or statements of any kind contained in this document.  Further, Verisign assumes no liability arising from the application or use of the products, services, or materials described or referenced herein and specifically disclaims any representation that any such products, services, or materials do not infringe upon any existing or future intellectual property rights.”