1

Large-scale Incremental Processing Using Distributed Transactions and Notifications

Written By Daniel Peng and Frank Dabek

Presented By Michael Over

2

Abstract

Task: Updating an index of the web as documents are crawled
  Requires continuously transforming a large repository of existing documents as new documents arrive

One example of a class of data processing tasks that transform a large repository of data via small, independent mutations

3

Abstract

These tasks lie in a gap between the capabilities of existing infrastructure:
  Databases: cannot meet the storage/throughput requirements
  MapReduce: creates large batches for efficiency

Percolator
  A system for incrementally processing updates to a large data set
  Deployed to create the Google web search index
  Now processes the same number of documents per day while reducing the average age of documents in Google search results by 50%

4

Outline

Introduction

Design
  Bigtable
  Transactions
  Timestamps
  Notifications

Evaluation

Related Work

Conclusion and Future Work

5

Task

Task: Build an index of the web that can be used to answer search queries.

Approach:
  Crawl every page on the web and process them
  Maintain a set of invariants, such as clustering pages with the same content and inverting links so that each page carries the anchor text of the links pointing to it
  Could be done using a series of MapReduce operations
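As an illustration of the link-inversion invariant, here is a minimal single-pass map/reduce sketch in Python. It runs in memory, and the function names (`map_links`, `reduce_inlinks`, `run_link_inversion`) and the toy pages are hypothetical; the real indexing pipeline chained many MapReduces over the full repository.

```python
from collections import defaultdict

def map_links(url, outlinks):
    """Map: for every link on page `url`, emit (target, (source, anchor text))."""
    for target, anchor in outlinks:
        yield target, (url, anchor)

def reduce_inlinks(target, values):
    """Reduce: collect all (source, anchor text) pairs that point at `target`."""
    return target, sorted(values)

def run_link_inversion(pages):
    """Run one in-memory map/shuffle/reduce pass over a dict url -> outlinks."""
    groups = defaultdict(list)  # shuffle: group intermediate values by key
    for url, outlinks in pages.items():
        for key, value in map_links(url, outlinks):
            groups[key].append(value)
    return dict(reduce_inlinks(key, values) for key, values in groups.items())

pages = {
    "a.com": [("b.com", "B site"), ("c.com", "see C")],
    "b.com": [("c.com", "C again")],
    "c.com": [],
}
inlinks = run_link_inversion(pages)
print(inlinks["c.com"])  # [('a.com', 'see C'), ('b.com', 'C again')]
```

The entry for c.com combines anchor text from a.com and b.com. If only b.com were recrawled, recomputing c.com's inlinks would still need a.com's links, which is why the pipeline cannot simply be rerun over the recrawled pages alone.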

6

Challenge

Challenge: Update the index after recrawling some small portion of the web.
  Could we run MapReduce over just the recrawled pages?
    No, there are links between the new pages and the rest of the web
  Could we run MapReduce over the entire repository?
    Yes, this is how Google’s web search index was produced prior to this work
    What are some effects of this?

7

Challenge

What about a DBMS?
  Cannot handle the sheer volume of data

What about distributed storage systems like Bigtable?
  Scalable, but does not provide tools to maintain data invariants in the face of concurrent updates

Ideally, the data processing system for the task of maintaining the web search index would be optimized for incremental processing and able to maintain invariants

8

Percolator

Provides the user with random access to a multi-petabyte repository
  Processes documents individually
  Many concurrent threads transform the repository, so Percolator provides ACID-compliant transactions

Observers: invoked when a user-specified column changes

Designed specifically for incremental processing

9

Percolator

Google uses Percolator to prepare web pages for inclusion in the live web search index

Can now process documents as they are crawled
  Reducing the average document processing latency by a factor of 100
  Reducing the average age of a document appearing in a search result by nearly 50%

10

Outline




11

Design

Two main abstractions for performing incremental processing at large scale:
  ACID-compliant transactions over a random access repository
  Observers: a way to organize an incremental computation

A Percolator system consists of three binaries that run on every machine in the cluster:
  A Percolator worker
  A Bigtable tablet server
  A GFS chunkserver

12

Outline




13

Bigtable Overview

Percolator is built on top of the Bigtable distributed storage system

Multi-dimensional sorted map
  Keys: (row, column, timestamp) tuples

Provides lookup and update operations on each row

Row transactions enable atomic read-modify-write operations on individual rows

Runs reliably on a large number of unreliable machines handling petabytes of data
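The data model above can be sketched as a toy in-memory class, assuming keys are kept in a sorted list with timestamps negated so that the newest version of a cell sorts first. `ToyBigtable` and its methods are hypothetical and are not Bigtable's client API; a single re-entrant lock stands in for Bigtable's per-row transactions.

```python
import bisect
import threading

class ToyBigtable:
    """In-memory toy of Bigtable's data model: a sorted map from
    (row, column, timestamp) to value. Not Bigtable's real client API."""

    def __init__(self):
        self._keys = []                 # sorted (row, column, -timestamp) tuples
        self._values = {}
        self._lock = threading.RLock()  # one lock stands in for per-row atomicity

    def write(self, row, column, timestamp, value):
        with self._lock:
            key = (row, column, -timestamp)  # negated so newer versions sort first
            if key not in self._values:
                bisect.insort(self._keys, key)
            self._values[key] = value

    def read(self, row, column, max_timestamp):
        """Return (timestamp, value) of the newest version at or before max_timestamp, else None."""
        with self._lock:
            i = bisect.bisect_left(self._keys, (row, column, -max_timestamp))
            if i < len(self._keys) and self._keys[i][:2] == (row, column):
                key = self._keys[i]
                return -key[2], self._values[key]
            return None

    def read_modify_write(self, row, column, timestamp, fn):
        """Atomically read the newest value, apply fn, and write the result at `timestamp`."""
        with self._lock:
            current = self.read(row, column, timestamp)
            new_value = fn(None if current is None else current[1])
            self.write(row, column, timestamp, new_value)
            return new_value

table = ToyBigtable()
table.write("com.cnn.www", "contents:", 3, "<html>v3")
table.write("com.cnn.www", "contents:", 5, "<html>v5")
print(table.read("com.cnn.www", "contents:", 4))  # (3, '<html>v3')
```

Reading "as of" a timestamp is what later lets Percolator give each transaction a consistent snapshot.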

14

Bigtable Overview

A running Bigtable consists of a collection of tablet servers

Each tablet server is responsible for serving several tablets (contiguous regions of the key space)

Percolator maintains the gist of Bigtable’s interface
  Percolator’s API closely resembles Bigtable’s

Challenge: Provide the additional features of multirow transactions and the observer framework

15

Outline


Bigtable
Transactions
Timestamps
Notifications


16

Transactions

Percolator provides cross-row, cross-table transactions with ACID snapshot-isolation semantics

Stores multiple versions of each data item using Bigtable’s timestamp dimension

Provides snapshot isolation, which protects against write-write conflicts

Percolator must explicitly maintain locks: it is a client library on top of Bigtable rather than the owner of storage, so it keeps its locks in a special lock column in Bigtable

Example of a transaction involving bank accounts
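To make snapshot isolation concrete, here is a hypothetical single-process sketch (`SnapshotStore`, `Txn`) of the generic first-committer-wins rule: each transaction reads the newest version committed at or before its start timestamp, buffers its writes, and aborts at commit if another transaction committed one of the same cells after its start. This illustrates only the semantics Percolator provides, not its lock-based implementation, and it is not thread-safe.

```python
import itertools

class SnapshotStore:
    """Single-process sketch of snapshot isolation over multi-versioned cells."""

    def __init__(self):
        self._clock = itertools.count(1)  # stands in for a timestamp oracle
        self.versions = {}                # cell -> [(commit_ts, value), ...] in commit order

    def begin(self):
        return Txn(self, next(self._clock))

class Txn:
    def __init__(self, store, start_ts):
        self.store, self.start_ts, self.writes = store, start_ts, {}

    def get(self, cell):
        """Read own buffered write, else the newest version committed at or before start_ts."""
        if cell in self.writes:
            return self.writes[cell]
        for commit_ts, value in reversed(self.store.versions.get(cell, [])):
            if commit_ts <= self.start_ts:
                return value
        return None

    def set(self, cell, value):
        self.writes[cell] = value  # buffered; invisible to others until commit

    def commit(self):
        """First committer wins: abort if any written cell was committed after start_ts."""
        for cell in self.writes:
            history = self.store.versions.get(cell, [])
            if history and history[-1][0] > self.start_ts:
                return False
        commit_ts = next(self.store._clock)
        for cell, value in self.writes.items():
            self.store.versions.setdefault(cell, []).append((commit_ts, value))
        return True

store = SnapshotStore()
setup = store.begin()
setup.set("Bob", 10)
setup.set("Joe", 2)
setup.commit()

t1, t2 = store.begin(), store.begin()  # overlapping transactions
t1.set("Bob", t1.get("Bob") - 4)
t1.set("Joe", t1.get("Joe") + 4)
t2.set("Bob", t2.get("Bob") - 5)      # also writes Bob: a write-write conflict
results = (t1.commit(), t2.commit())
print(results)  # (True, False)
```

Both transactions read Bob's balance from the same snapshot, but only the first to commit succeeds; the second would otherwise overwrite the transfer.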

17

Transactions

Bank account example (Bob and Joe), state after the transaction commits:

Key   Bal:Data    Bal:Lock   Bal:Write
Bob   8:          8:         8: data @ 7
      7: $6       7:         7:
      6:          6:         6: data @ 5
      5: $10      5:         5:
Joe   8:          8:         8: data @ 7
      7: $6       7:         7:
      6:          6:         6: data @ 5
      5: $2       5:         5:

While the transaction is in progress, Bob’s Bal:Lock cell holds "I am Primary" and Joe’s holds "Primary @ Bob.bal".
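The lock and write columns in the example can be reproduced with a simplified sketch of the two-phase commit described in the Percolator paper. Prewrite locks every written cell (the first one as primary) and stores tentative data at the start timestamp. Commit then adds write records at the commit timestamp, primary first, and removes the locks. All names here (`Oracle`, `Table`, `Transaction`, `versions`) are hypothetical. In the real system each per-cell step is a Bigtable row transaction, and readers clean up locks left by crashed clients; that cleanup is omitted.

```python
import itertools

class Oracle:
    """Stand-in for the timestamp oracle: hands out strictly increasing integers."""
    def __init__(self, first):
        self._counter = itertools.count(first)

    def next(self):
        return next(self._counter)

def versions(column, cell):
    """The timestamp -> value map for `cell` in one column, created if missing."""
    return column.setdefault(cell, {})

class Table:
    """Three columns per cell, as in the Bal:Data / Bal:Lock / Bal:Write example."""
    def __init__(self):
        self.data, self.lock, self.write = {}, {}, {}

class Transaction:
    def __init__(self, table, oracle):
        self.table, self.oracle = table, oracle
        self.start_ts = oracle.next()
        self.writes = {}  # cell -> value, buffered until commit

    def get(self, cell):
        if any(ts <= self.start_ts for ts in versions(self.table.lock, cell)):
            # Simplified: a real client backs off, then may clean up a dead client's lock.
            raise RuntimeError(f"pending lock on {cell}")
        committed = [ts for ts in versions(self.table.write, cell) if ts <= self.start_ts]
        if not committed:
            return None
        data_ts = self.table.write[cell][max(committed)]  # write record -> data timestamp
        return self.table.data[cell][data_ts]

    def set(self, cell, value):
        self.writes[cell] = value

    def _prewrite(self, cell, value, primary):
        if any(ts >= self.start_ts for ts in versions(self.table.write, cell)):
            return False  # another transaction committed this cell after our snapshot
        if versions(self.table.lock, cell):
            return False  # locked by another transaction, at any timestamp
        versions(self.table.data, cell)[self.start_ts] = value
        self.table.lock[cell][self.start_ts] = (
            "I am Primary" if cell == primary else f"Primary @ {primary}")
        return True

    def commit(self):
        if not self.writes:
            return True
        cells = list(self.writes)
        primary = cells[0]
        for i, cell in enumerate(cells):  # phase 1: prewrite (lock + tentative data)
            if not self._prewrite(cell, self.writes[cell], primary):
                for done in cells[:i]:    # undo our own prewrites before aborting
                    del self.table.lock[done][self.start_ts]
                    del self.table.data[done][self.start_ts]
                return False
        commit_ts = self.oracle.next()
        for cell in cells:  # phase 2: primary first; its write record is the commit point
            self.table.write[cell][commit_ts] = self.start_ts
            del self.table.lock[cell][self.start_ts]
        return True

table = Table()
for cell, balance in (("Bob.bal", 10), ("Joe.bal", 2)):
    table.data[cell] = {5: balance}  # data written at timestamp 5
    table.write[cell] = {6: 5}       # committed at 6, pointing at data @ 5

oracle = Oracle(first=7)             # start_ts = 7, commit_ts will be 8
txn = Transaction(table, oracle)
txn.set("Bob.bal", txn.get("Bob.bal") - 4)
txn.set("Joe.bal", txn.get("Joe.bal") + 4)
ok = txn.commit()
print(table.data["Bob.bal"], table.write["Bob.bal"], table.lock["Bob.bal"])
# {5: 10, 7: 6} {6: 5, 8: 7} {}
```

A transaction whose snapshot predates timestamp 8 and tries to write Bob.bal again fails in prewrite, because it finds a write record at or after its start timestamp.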

18

Outline




19

Timestamps

A server, the timestamp oracle, hands out timestamps in strictly increasing order

Every transaction requires contacting the timestamp oracle twice (once for its start timestamp and once for its commit timestamp), so this server must scale well

For failure recovery, the timestamp oracle needs to write the highest allocated timestamp to disk before responding to a request.

For efficiency, it batches writes and "pre-allocates" a whole block of timestamps.

How many timestamps do you think Google’s timestamp oracle serves per second from 1 machine?

Answer: 2,000,000 (2 million) per second
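The pre-allocation idea can be sketched as follows, assuming a `persist` callback that durably records a number (here a Python list stands in for disk). Only the high-water mark of each block is written, so one write covers many timestamps, and after a restart the oracle resumes above the last persisted mark, keeping timestamps strictly increasing. The class and parameter names are hypothetical, and batching of client requests is not shown.

```python
import threading

class TimestampOracle:
    """Sketch: hand out strictly increasing timestamps, durably recording only
    a high-water mark per pre-allocated block. `persist` stands in for a disk write."""

    def __init__(self, persist, recovered_high_water=0, block_size=10_000):
        self._persist = persist
        self._block_size = block_size
        self._high_water = recovered_high_water  # highest timestamp known to be on disk
        self._next = recovered_high_water + 1    # after a crash, skip anything that may have been issued
        self._lock = threading.Lock()

    def get_timestamp(self):
        with self._lock:
            if self._next > self._high_water:
                new_high_water = self._high_water + self._block_size
                self._persist(new_high_water)    # one durable write covers a whole block
                self._high_water = new_high_water
            ts = self._next
            self._next += 1
            return ts

disk = []                                        # pretend stable storage
oracle = TimestampOracle(disk.append, block_size=3)
issued = [oracle.get_timestamp() for _ in range(7)]
print(issued, disk)  # [1, 2, 3, 4, 5, 6, 7] [3, 6, 9]

restarted = TimestampOracle(disk.append, recovered_high_water=disk[-1], block_size=3)
print(restarted.get_timestamp())  # 10
```

Timestamps 8 and 9 are never handed out after the simulated crash; skipping them is the price of writing to disk only once per block.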

20

Outline




21


Transactions let the user mutate the table while maintaining invariants, but users also need a way to trigger and run the transactions.

In Percolator, the user writes “observers” that are triggered by changes to the table

Percolator invokes an observer’s function after data is written to one of the columns that observer registered for

22


Percolator applications are structured as a series of observers: each observer completes a task and creates work for downstream observers by writing to the table

Notifications are similar to database triggers or events in active databases, but unlike triggers they cannot be used to maintain data invariants, because an observer runs in a separate transaction from the write that triggered it

Percolator needs to efficiently find dirty cells with observers that need to be run

To do so, it maintains a special “notify” Bigtable column, containing an entry for each dirty cell
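Here is a minimal sketch of the notify-column idea, assuming a single scanner and no transactions (the real system also keeps an acknowledgment column and scans in a distributed way). Writing an observed column marks the cell dirty, and a scan runs the registered observers, which may in turn dirty downstream columns. `ObservedTable`, the two observers, and column names such as `raw:html` are hypothetical.

```python
from collections import defaultdict

class ObservedTable:
    """Hypothetical sketch of Percolator-style notifications (no transactions,
    no acknowledgment column, single scanner)."""

    def __init__(self):
        self.cells = {}                     # (row, column) -> value
        self.notify = set()                 # the "notify" column: dirty (row, column) cells
        self.observers = defaultdict(list)  # observed column -> observer functions

    def register(self, column, observer):
        self.observers[column].append(observer)

    def write(self, row, column, value):
        self.cells[(row, column)] = value
        if column in self.observers:        # only observed columns produce notifications
            self.notify.add((row, column))

    def scan_once(self):
        """Run observers for every cell currently marked dirty; return how many ran."""
        runs = 0
        for row, column in sorted(self.notify):
            self.notify.discard((row, column))
            for observer in self.observers[column]:
                observer(self, row)         # may write other observed columns
                runs += 1
        return runs

def parse(table, row):
    table.write(row, "parsed:text", table.cells[(row, "raw:html")].lower())

def count_words(table, row):
    table.write(row, "index:words", len(table.cells[(row, "parsed:text")].split()))

table = ObservedTable()
table.register("raw:html", parse)
table.register("parsed:text", count_words)
table.write("doc1", "raw:html", "Hello Percolator World")
while table.scan_once():                    # keep scanning until nothing is dirty
    pass
print(table.cells[("doc1", "index:words")])  # 3
```

The two observers form a small chain: `parse` dirties `parsed:text`, which the next scan hands to `count_words`.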

23

Outline




24

Evaluation

Percolator lies somewhere in the performance space between MapReduce and DBMSs

Converting from MapReduce: Percolator was built to create Google’s large “base” index, a task previously done by MapReduce

With MapReduce, each day several billion documents were crawled and fed through a series of 100 MapReduces, resulting in an index which answered user queries

25

Evaluation

Using MapReduce, each document spent 2-3 days being indexed before it could be returned as a search result

Percolator crawls the same number of documents, but each document is sent through Percolator as it is crawled

The immediate advantage is a reduction in latency (the median document moves through Percolator over 100x faster than through MapReduce)

26


Percolator freed Google from needing to process the entire repository each time documents were indexed

Therefore, Google can increase the size of the repository (and has: it is now 3x its previous size)

Percolator is easier to operate because there are fewer moving parts: just tablet servers, Percolator workers, and chunkservers

27


Question: How do you think Percolator performs in comparison to MapReduce if:
  1% of the repository needs to be updated per hour?
  30% of the repository needs to be updated per hour?
  60% of the repository needs to be updated per hour?
  90% of the repository needs to be updated per hour?

28


29


Comparing Percolator versus “raw” Bigtable

Percolator introduces overhead relative to Bigtable: a factor of four overhead on writes, due to 4 round trips:
  1. Percolator -> Timestamp Server -> Percolator (start timestamp)
  2. Percolator -> Tentative Write -> Percolator
  3. Percolator -> Timestamp Server -> Percolator (commit timestamp)
  4. Percolator -> Commit -> Percolator

30

Outline




31

Related Work

Batch processing systems like MapReduce are well suited for efficiently transforming or analyzing an entire repository

DBMSs satisfy many of the requirements of an incremental system but do not scale like Percolator

Bigtable is a scalable, distributed, and fault-tolerant storage system, but is not designed to be a data transformation system

CloudTPS builds an ACID-compliant datastore on top of distributed storage but is intended to be a backend for a website (stronger focus on latency and partition tolerance than Percolator)

32

Outline




33

Conclusion and Future Work

Percolator has been deployed to produce Google’s web search index since April 2010

Its goals were reducing the latency of indexing a single document with an acceptable increase in resource usage

Scaling the architecture costs a very significant 30-fold overhead compared to traditional database architectures
  How much of this is fundamental to distributed storage systems and how much could be optimized away?
