
JAZOON'13 - Benoit Perroud - Realtime Queries

Date post: 15-Jan-2015
Description:
http://guide13.jazoon.com/#/submissions/133
Transcript
Page 1: JAZOON'13 - Benoit Perroud - Realtime Queries

Enabling Real-time Queries to End Users

Benoit Perroud

Page 2

About me

• Benoit Perroud

• Software Engineer @Verisign

• Leading Hadoop Team

• Apache Committer

• @killerwhile

Page 3

Agenda

• What’s going on

• Batch and Realtime

• Hadoop Deployments

• Next steps

Page 4

What’s going on

• Mainframes are obsolete, replaced by clusters of commodity hardware

• 10 Gb/s (TenG) links are the new standard

• RESTful APIs are everywhere

• Everybody wants to visit Paxos island

• Firehoses do not only carry water

• Asynchronous non-blocking functional programming is taught at primary school

• NoSQL is the new way to store data at scale

• API management startups are rising (and raising)

• Hadoop keywords boost your LinkedIn profile by 2000%

• Public clouds are responsible for more than 50% of the global Internet traffic

• … and counting …

Page 5

A Possible Deployment

[Architecture diagram]

Source: http://dev.datasift.com/blog/high-scalability

Note: the diagram dates from 2009; it is probably partially or even completely outdated today.

Page 6

Batch and Realtime

Page 7

Batch Processing

[Timeline diagram]

• Batch 1 starts processing at t1 and is ready to be served at t2

• Batch 2 starts processing at t3 and is ready to be served at t4

• Batch 3 starts processing at t5

• Once batch 1 is served, queries see data from t1; once batch 2 is served, data from t3

• Data newer than the served snapshot is missing: a data gap
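The timeline can be read as a simple rule: a query is answered from the most recently served batch, and everything that arrived after that batch started processing is the data gap. A minimal Python sketch of that rule; Batch and serving_snapshot are hypothetical names, times are plain numbers, and batch 3's ready time is assumed because the slide only shows when it starts.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Batch:
    """Hypothetical batch descriptor (times are plain numbers)."""
    started: float  # snapshot time: the batch holds data up to here (t1, t3, ...)
    ready: float    # time the batch is loaded and can be served (t2, t4, ...)


def serving_snapshot(batches: List[Batch], t: float) -> Optional[Tuple[float, float]]:
    """Return (snapshot_time, data_gap) for a query issued at time t.

    The query is answered by the most recent batch that is already ready;
    anything that arrived after that batch's snapshot is invisible (the gap).
    Returns None while no batch has been served yet.
    """
    served = [b for b in batches if b.ready <= t]
    if not served:
        return None
    latest = max(served, key=lambda b: b.started)
    return latest.started, t - latest.started


# Batch 3's ready time (6) is an assumption; the slide only shows t5.
batches = [Batch(1, 2), Batch(3, 4), Batch(5, 6)]
print(serving_snapshot(batches, 3.5))  # (1, 2.5): still served from the t1 snapshot
print(serving_snapshot(batches, 5))    # (3, 2): batch 2 is live, batch 3 still processing
```

The gap never drops to zero: even right after a batch is served, it is as old as that batch's processing time.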

Page 8

Batch Processing in details

[Timeline diagram: a batch with data from yesterday]

• New batch granularity period

• Leave some time for the data to finish uploading

• Load the results into a data store

• Notify the retrieval system that a new batch is ready to be served

• Processing time

Query data from the day before yesterday?
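The delays on this slide add up. Assuming batches run back to back and each one is served before the next granularity period closes, the newest data visible to queries is between delay and granularity + delay old, where delay is the upload grace time plus processing, load and notification time. A back-of-the-envelope Python sketch; staleness_bounds is a hypothetical helper and the durations are illustrative assumptions, not figures from the talk.

```python
from datetime import timedelta
from typing import Tuple


def staleness_bounds(granularity: timedelta, upload_grace: timedelta,
                     processing: timedelta, load: timedelta,
                     notify: timedelta) -> Tuple[timedelta, timedelta]:
    """Return (min, max) age of the newest data visible to queries.

    Assumes batches run back to back and each one is served before the
    next granularity period closes (total delay < granularity).
    """
    delay = upload_grace + processing + load + notify
    if delay >= granularity:
        raise ValueError("pipeline cannot keep up with the batch granularity")
    # Right after a batch is served, its newest data is `delay` old;
    # just before the next batch is served, it is one full period older.
    return delay, granularity + delay


# Illustrative (assumed) durations for a daily batch
lo, hi = staleness_bounds(
    granularity=timedelta(days=1),
    upload_grace=timedelta(hours=2),
    processing=timedelta(hours=3),
    load=timedelta(minutes=45),
    notify=timedelta(minutes=15),
)
print(lo, hi)  # 6:00:00 1 day, 6:00:00
```

With a daily granularity this matches the question on the slide: just before yesterday's batch is served, the newest visible data ends with the day before yesterday.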

Page 9

Realtime Query

• Interactive query

• REST-like request/response query type

And

• Query the latest version of the data

• Latest meaning n seconds ago, with n known and fixed

Page 10

Hybrid Approach

[Timeline diagram]

• Batch 1 starts processing at t1 and is ready to be served at t2

• Batch 2 starts processing at t3 and is ready to be served at t4

• Once batch 1 is served: query data from the t1 snapshot AND the complementary data for batch 1

• Once batch 2 is served: query data from the t3 snapshot AND the complementary data for batch 2
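One way to sketch the hybrid approach: keep the batch view computed from the last snapshot, keep the complementary events that arrived since, and combine both at query time. The minimal Python sketch below does this for per-key event counts; HybridCounter and its methods are hypothetical names, and plain addition works as the merge only because counts over disjoint time ranges are additive (other aggregates need their own merge function).

```python
from typing import Dict, List, Tuple


class HybridCounter:
    """Hypothetical hybrid store: a batch snapshot plus complementary realtime data."""

    def __init__(self) -> None:
        self.snapshot_time = float("-inf")          # batch view covers data up to here
        self.batch_view: Dict[str, int] = {}        # per-key counts from the last batch
        self.recent: List[Tuple[float, str]] = []   # events not yet covered by a batch

    def record(self, timestamp: float, key: str) -> None:
        """Realtime path: keep every incoming event until a batch covers it."""
        self.recent.append((timestamp, key))

    def install_batch(self, snapshot_time: float, view: Dict[str, int]) -> None:
        """Batch path: swap in the new batch view and drop events it already covers.

        Assumption: the batch contains every event up to snapshot_time,
        so the realtime copies of those events are no longer needed.
        """
        self.batch_view = dict(view)
        self.snapshot_time = snapshot_time
        self.recent = [(ts, k) for ts, k in self.recent if ts > snapshot_time]

    def query(self, key: str) -> int:
        """Batch snapshot AND complementary data since the snapshot."""
        complementary = sum(1 for ts, k in self.recent
                            if k == key and ts > self.snapshot_time)
        return self.batch_view.get(key, 0) + complementary


store = HybridCounter()
for ts, key in [(0.5, "a"), (0.8, "b"), (1.2, "a"), (1.7, "a")]:
    store.record(ts, key)
print(store.query("a"))  # 3: no batch served yet, everything comes from realtime data
# Batch 1, which started at t1 = 1.0, becomes ready: it counted the events before t1.
store.install_batch(1.0, {"a": 1, "b": 1})
print(store.query("a"))  # 3: 1 from the t1 snapshot + 2 complementary events
print(store.query("b"))  # 1
```

The answer stays the same across the swap, which is the point of the hybrid approach: the batch takes over history while the realtime side only has to cover the data gap.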

Page 11

Hadoop Deployments

Page 12

Naïve Hadoop Deployment

[Diagram: a Gateway in front of a single Hadoop cluster (NameNode, JobTracker, DataNodes) used for Processing]

• Commands: hdfs dfs -put, mapred job …jar, hdfs dfs -get

Page 13

Industry Hadoop Deployment

[Diagram: several Hadoop clusters, each with a NameNode, a JobTracker and DataNodes, one used for Processing and one for Research, Data Science; surrounding components: Data In GW, Data Out GW, Gateway, Metadata Store, Monitoring]

Page 14

Realtime Hadoop Deployment

[Diagram components: Data In GW, Gateway, a Processing cluster (NameNode, JobTracker, DataNodes), RT processing (NameNode, JobTracker), RT Data Out GW]

Page 15

Realtime Search with Hadoop

[Diagram components: Data In GW, Gateway, a Hadoop cluster (NameNode, JobTracker, DataNodes), a second NameNode and JobTracker, RT Data Out GW, Coordinator; labelled steps: Generate Indexes, Update indexes]
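A toy, in-memory sketch of the same hybrid idea applied to search: the batch side generates a full inverted index, the realtime side makes new documents searchable immediately through a small delta index, and a coordinator (its exact role is not spelled out on the slide; here it is assumed to swap in each new batch index) drops the deltas the batch already covers. SearchCoordinator, build_index, the (doc_id, timestamp, text) document shape and whitespace tokenization are all hypothetical simplifications; a real deployment would rely on a search engine such as SolrCloud or Elasticsearch.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, List, Set, Tuple

Doc = Tuple[int, float, str]  # hypothetical document shape: (doc_id, timestamp, text)


def build_index(docs: Iterable[Doc]) -> DefaultDict[str, Set[int]]:
    """Inverted index: lower-cased whitespace token -> ids of documents containing it."""
    index: DefaultDict[str, Set[int]] = defaultdict(set)
    for doc_id, _, text in docs:
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


class SearchCoordinator:
    """Hypothetical coordinator between a batch-generated index and realtime updates."""

    def __init__(self) -> None:
        self.snapshot_time = float("-inf")   # batch index covers docs up to here
        self.batch_index: Dict[str, Set[int]] = {}
        self.recent: List[Doc] = []          # docs not yet in the batch index
        self.delta_index: DefaultDict[str, Set[int]] = defaultdict(set)

    def update(self, doc: Doc) -> None:
        """Realtime path: a new document is searchable immediately via the delta index."""
        self.recent.append(doc)
        doc_id, _, text = doc
        for term in text.lower().split():
            self.delta_index[term].add(doc_id)

    def swap_batch(self, snapshot_time: float, index: Dict[str, Set[int]]) -> None:
        """Batch path: install a freshly generated index and rebuild the delta
        from the documents that arrived after its snapshot."""
        self.batch_index = index
        self.snapshot_time = snapshot_time
        self.recent = [d for d in self.recent if d[1] > snapshot_time]
        self.delta_index = build_index(self.recent)

    def search(self, term: str) -> Set[int]:
        """Union of batch hits and realtime hits for a single term."""
        term = term.lower()
        return self.batch_index.get(term, set()) | self.delta_index.get(term, set())


docs = [(1, 0.5, "hadoop batch"), (2, 0.9, "realtime queries"), (3, 1.4, "hadoop realtime")]
coord = SearchCoordinator()
for d in docs:
    coord.update(d)
# The batch job indexed every document up to t = 1.0
coord.swap_batch(1.0, build_index(d for d in docs if d[1] <= 1.0))
print(sorted(coord.search("hadoop")))    # [1, 3]
print(sorted(coord.search("realtime")))  # [2, 3]
```

Keeping the delta index small is what makes the swap cheap: only documents newer than the batch snapshot are re-indexed on the realtime side.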

Page 16

Next Steps

Page 17

Hadoop Ecosystem

… is moving … really fast

• Interactive Queries: Cloudera Impala, Apache Drill, Apache Tez, …

• Search: SolrCloud, Elasticsearch, Cloudera Search

• Hybrid layer: Twitter Summingbird

• … and counting …

Page 18

Thanks for your attention!

Follow @[email protected]

“Copyright © 2013 VeriSign, Inc.  All rights reserved.  The VERISIGN word mark, the Verisign logo, and other Verisign trademarks, service marks, and designs that may appear herein are registered or unregistered trademarks or service marks of VeriSign, Inc., and its subsidiaries in the United States and foreign countries.  All other trademarks, service marks, and designs are property of their respective owners.  Verisign has made efforts to ensure the accuracy and completeness of the information in this document.  However, Verisign makes no warranties of any kind (whether express, implied or statutory) with respect to the information contained herein. Verisign assumes no liability to any party for any loss or damage (whether direct or indirect) caused by any errors, omissions, or statements of any kind contained in this document.  Further, Verisign assumes no liability arising from the application or use of the products, services, or materials described or referenced herein and specifically disclaims any representation that any such products, services, or materials do not infringe upon any existing or future intellectual property rights.”

