Transactions Over Apache HBase

Post on 16-Apr-2017

TRANSACTIONS OVER HBASE

Alex Baranau @abaranau

Gary Helmling @gario

Continuuity

WHO WE ARE

• We’ve built Continuuity Reactor: the world’s first scale-out application server for Hadoop

• Fast, easy development, deployment and management of Hadoop and HBase apps

• The Continuuity team has years of experience using and contributing to open source, and we intend to continue doing so

AGENDA

• Transactions in stream processing: Why? What?

• Implementation: How?

• Omid-style transactions explained

• Transaction Manager

• What’s next?

THE REACTOR

• Continuuity Reactor is an app platform built on Hadoop and HBase

• Collect, Process, Store, and Query data.

• A Flow is a real-time processor with exactly-once guarantee

• A flow is composed of flowlets, connected via queues

• All processing happens with ACID guarantees in transactions

PROCESSING IN A FLOW

[Diagram, three animation frames: a Flowlet connected to Queues and an HBase Table]


TRANSACTIONS: WHAT?

• Atomic - Entire transaction is committed as one

• Consistent - No partial state change due to failure

• Isolated - No dirty reads, transaction is only visible after commit

• Durable - Once committed, data is persisted reliably

WHAT ABOUT HBASE?

• Atomic operations on cell value:

checkAndPut, checkAndDelete, increment, append

• Atomic batch of operations on rows within region

• No cross region atomic operations support

• No cross table atomic operations support

• No multi-RPC atomic operations support

IMPLEMENTATION OVERVIEW

OMID-STYLE TRANSACTIONS

• Multi-Version Concurrency Control

• Cell version (timestamp) = transaction ID

• All writes in the same transaction use the transaction ID as timestamp

• Reads exclude other, uncommitted transactions (for isolation)

• Optimistic Concurrency Control

• Conflict detection at commit of transaction

• Write Conflict: two overlapping transactions write the same row

• Rollback of one transaction in case of conflict (whichever commits later)
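
The read rule in the bullets above can be sketched in a few lines. This is a minimal illustration, not the Reactor implementation: the `Transaction` class, the `visible_value` helper, and the choice to let a transaction see its own writes are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    write_pointer: int                   # timestamp stamped on every write of this transaction
    read_pointer: int                    # upper bound on timestamps this transaction may read
    excludes: frozenset = frozenset()    # transaction IDs to skip when reading


def visible_value(versions, tx):
    """Return the newest value of one cell that `tx` may see, or None.

    `versions` maps timestamp (the writing transaction's ID) to the stored value.
    """
    for ts in sorted(versions, reverse=True):
        if ts == tx.write_pointer:
            return versions[ts]          # assumption: a transaction sees its own writes
        if ts <= tx.read_pointer and ts not in tx.excludes:
            return versions[ts]
    return None


# Version 1002 was written by a transaction that has not committed yet.
cell = {1001: 10, 1002: 11}
reader = Transaction(write_pointer=1003, read_pointer=1001, excludes=frozenset({1002}))
writer = Transaction(write_pointer=1002, read_pointer=1001)

reader_sees = visible_value(cell, reader)   # skips 1002, falls back to 1001
writer_sees = visible_value(cell, writer)   # sees its own uncommitted write
```

Because the cell version is the transaction ID, isolation reduces to a timestamp filter on reads; no lock on the row is involved.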

OPTIMISTIC CONCURRENCY CONTROL

• Avoids cost of locking rows and tables

• No deadlocks or lock escalations

• Cost of conflict detection and possible rollback is higher

• Good if conflicts are rare: short transaction, disjoint partitioning of work
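
Conflict detection at commit time can be sketched as follows. This is a toy model under stated assumptions: the `ConflictDetector` class, its logical clock, and the use of row keys as the change set are hypothetical names chosen for the example, not part of the presented system.

```python
class ConflictDetector:
    """Toy optimistic concurrency check based on row-level change sets."""

    def __init__(self):
        self.clock = 0           # logical time, advanced on every begin and commit
        self.committed = []      # list of (commit_time, frozenset_of_rows)

    def begin(self):
        self.clock += 1
        return self.clock        # the transaction's start time

    def try_commit(self, start_time, rows):
        rows = frozenset(rows)
        for commit_time, other_rows in self.committed:
            # Overlapping transaction: it committed after we started.
            if commit_time > start_time and rows & other_rows:
                return False     # write conflict, caller must roll back
        self.clock += 1
        self.committed.append((self.clock, rows))
        return True


detector = ConflictDetector()
t1 = detector.begin()
t2 = detector.begin()
second_commits = detector.try_commit(t2, {"row1"})   # first to commit wins
first_commits = detector.try_commit(t1, {"row1"})    # overlaps with t2 on row1
t3 = detector.begin()
later_commits = detector.try_commit(t3, {"row1"})    # started after t2 committed
```

No work is done while the transactions run; the whole cost is paid at commit, which is why the approach suits short transactions over disjoint rows.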

TRANSACTIONS IN CONTEXT

[Diagram: Client 1, Client 2, through Client N; an active Tx Manager and a standby Tx Manager; ZooKeeper; and an HBase cluster with Master 1, Master 2, and RegionServers RS 1 to RS 4]


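TRANSACTION LIFE CYCLE

[State diagram spanning the Client, the Tx Manager (RPC API), and HBase. States: none, in progress, complete, invalid]

• start tx: the Tx Manager starts the transaction (none → in progress)

• do work: the client writes to HBase

• try commit: the Tx Manager checks conflicts; if the commit succeeds, the transaction is complete

• commit failed: the client rolls back in HBase and tries to abort

• abort succeeded: the transaction is aborted

• abort failed, or time out: the transaction is invalidated (→ invalid)
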

CLIENT SIDE: TX AWARE

[Animated diagram, 19 frames: an HBase table (columns Cell, TS, Value), the Tx Manager, Client 1, and Client 2. The frames step through the sequence below.]

• Initial state: HBase holds row1:col1 at TS 1001 with value 10. Tx Manager: write = 1002, read = 1001.

• Client 1 starts a transaction and receives write = 1002, read = 1001. Tx Manager: write = 1003, read = 1001, inprogress=[1002].

• Client 1 increments row1:col1. HBase gains a version at TS 1002 with value 11.

• Client 2 starts a transaction and receives write = 1003, read = 1001, excluded=[1002]. Tx Manager: write = 1004, read = 1001, inprogress=[1002, 1003].

• Client 2 increments row1:col1. Because 1002 is excluded, it reads value 10 at TS 1001 and writes a version at TS 1003 with value 11.

• Client 2 commits. Tx Manager: inprogress=[1002].

• Client 1 tries to commit and the Tx Manager reports a conflict.

• Client 1 rolls back: the version at TS 1002 is removed from HBase.

• Client 1 aborts. Tx Manager: inprogress=[], then read = 1003.

• Client 1 starts a new transaction and receives write = 1004, read = 1003. Tx Manager: write = 1005, read = 1003.

• Client 1 reads row1:col1 and sees the committed version at TS 1003.

CLIENT SIDE: TX AWARE

[Animated diagram, 7 frames: the same scenario replayed from the conflict, for the case where the rollback fails.]

• Starting state: HBase holds row1:col1 at TS 1001 (value 10), TS 1002 (value 11), and TS 1003 (value 11). Client 1 holds write = 1002, read = 1001. Tx Manager: write = 1004, read = 1001, inprogress=[1002]. Client 1's commit hits a conflict.

• Client 1 attempts the rollback, but the rollback fails, so the version at TS 1002 stays in HBase.

• Client 1 invalidates the transaction. Tx Manager: write = 1004, read = 1003, inprogress=[], invalid=[1002].

• Client 1 starts a new transaction and receives write = 1004, read = 1003, exclude=[1002]. Tx Manager: write = 1005, read = 1003, invalid=[1002].

• Client 1 reads row1:col1. The version at TS 1002 is invisible because 1002 is in the exclude list.
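
The client-side handling of a conflict in the frames above (roll back, then abort, or invalidate if the rollback fails) might look like the sketch below. Everything here is hypothetical: `TxAwareTable`, `UnreachableTable`, and `finish_after_conflict` are illustrative names, a dictionary stands in for HBase, and `OSError` stands in for a failed RPC.

```python
class TxAwareTable:
    """Toy client-side wrapper; `store` maps cell -> {timestamp: value}."""

    def __init__(self, store):
        self.store = store
        self.written = []            # cells written by the current transaction

    def put(self, cell, value, write_pointer):
        # Every write is stamped with the transaction's write pointer.
        self.store.setdefault(cell, {})[write_pointer] = value
        self.written.append(cell)

    def rollback(self, write_pointer):
        # Undo by deleting exactly the versions this transaction wrote.
        for cell in self.written:
            self.store[cell].pop(write_pointer, None)
        self.written.clear()


class UnreachableTable(TxAwareTable):
    def rollback(self, write_pointer):
        raise OSError("simulated rollback failure")


def finish_after_conflict(table, write_pointer, invalid):
    """Return 'aborted' if the rollback worked, otherwise mark the tx invalid."""
    try:
        table.rollback(write_pointer)
        return "aborted"
    except OSError:
        invalid.add(write_pointer)   # readers will exclude this timestamp from now on
        return "invalid"


invalid = set()

clean_store = {"row1:col1": {1001: 10, 1003: 11}}
clean = TxAwareTable(clean_store)
clean.put("row1:col1", 11, 1002)
clean_result = finish_after_conflict(clean, 1002, invalid)

dirty_store = {"row1:col1": {1001: 10, 1003: 11}}
dirty = UnreachableTable(dirty_store)
dirty.put("row1:col1", 11, 1002)
dirty_result = finish_after_conflict(dirty, 1002, invalid)
```

In the failure case the stale version stays in the store, which is why later transactions must carry the invalid ID in their exclude list.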

TRANSACTION MANAGER

• Create new transactions

• Provides monotonically increasing write pointers

• Maintains all in-progress, committed, and invalid transactions

• Detect conflicts

• Transaction consists of:

• Write pointer: timestamp for HBase writes

• Read pointer: upper bound timestamp for reads

• Excludes: list of timestamps to exclude from reads
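
A toy model of the bookkeeping described on this slide is sketched below. It is not the actual Tx Manager: the `TxManager` class and its method names are hypothetical, conflict detection is left out, and the rule for advancing the read pointer (newest committed pointer below the oldest in-progress one) is inferred from the walk-through frames rather than stated by the authors.

```python
class TxManager:
    """Toy in-memory transaction manager (no conflict detection, no persistence)."""

    def __init__(self, read_pointer=1001):
        self.read_pointer = read_pointer
        self.next_write = read_pointer + 1   # monotonically increasing write pointers
        self.in_progress = set()
        self.committed = {read_pointer}
        self.invalid = set()

    def start(self):
        """Return (write pointer, read pointer, excludes) for a new transaction."""
        excludes = frozenset(self.in_progress | self.invalid)
        write_pointer = self.next_write
        self.next_write += 1
        self.in_progress.add(write_pointer)
        return write_pointer, self.read_pointer, excludes

    def _advance_read_pointer(self):
        # Assumption: only move past pointers that are no longer in progress.
        floor = min(self.in_progress, default=None)
        visible = [p for p in self.committed if floor is None or p < floor]
        self.read_pointer = max(visible)

    def commit(self, write_pointer):
        self.in_progress.remove(write_pointer)
        self.committed.add(write_pointer)
        self._advance_read_pointer()

    def abort(self, write_pointer):
        self.in_progress.remove(write_pointer)
        self._advance_read_pointer()

    def invalidate(self, write_pointer):
        self.in_progress.remove(write_pointer)
        self.invalid.add(write_pointer)
        self._advance_read_pointer()


manager = TxManager()
tx1 = manager.start()            # Client 1
tx2 = manager.start()            # Client 2, must exclude Client 1's writes
manager.commit(tx2[0])
read_after_commit = manager.read_pointer
manager.invalidate(tx1[0])       # Client 1's rollback failed
tx3 = manager.start()
```

With the default starting pointer the three transactions reproduce the pointers and exclude lists shown in the walk-through frames.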

TRANSACTION MANAGER

• Simple & Fast

• All required state is in-memory

• Single point of failure?

• Persist all state to a write-ahead log

• Secondary Tx Manager watches for failure of Primary

• Failover can happen quickly

TRANSACTION MANAGER

[Animated diagram, 3 frames: the Tx Manager's current state (in progress, committed, invalid, read point, write point) and a Tx Log stored in HDFS.]

• start(): the write point is incremented, the new transaction is added to in progress, and a "started, <write pt>" entry is appended to the Tx Log.

• commit(): the transaction is removed from in progress and added to committed, and a "commit, <write pt>" entry is appended to the Tx Log.
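
The write-ahead logging shown for start() and commit() can be illustrated with a small sketch. It rests on assumptions: a local file stands in for the log in HDFS, entries are JSON lines, and the `LoggedState` class with its `record` and `recover` methods is a hypothetical name for the example.

```python
import json
import os
import tempfile


class LoggedState:
    """Toy transaction state that logs every edit before applying it."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.in_progress = set()
        self.committed = set()

    def _apply(self, op, write_pointer):
        if op == "started":
            self.in_progress.add(write_pointer)
        elif op == "commit":
            self.in_progress.discard(write_pointer)
            self.committed.add(write_pointer)

    def record(self, op, write_pointer):
        # Write ahead: the edit reaches durable storage before memory changes.
        with open(self.log_path, "a") as log:
            log.write(json.dumps([op, write_pointer]) + "\n")
            log.flush()
            os.fsync(log.fileno())
        self._apply(op, write_pointer)

    @classmethod
    def recover(cls, log_path):
        # A standby rebuilds the in-memory state by replaying the log in order.
        state = cls(log_path)
        with open(log_path) as log:
            for line in log:
                state._apply(*json.loads(line))
        return state


path = os.path.join(tempfile.mkdtemp(), "tx.log")
primary = LoggedState(path)
primary.record("started", 1002)
primary.record("started", 1003)
primary.record("commit", 1003)

standby = LoggedState.recover(path)
```

Replay cost grows with the number of log entries, so recovery time depends on how long the log is allowed to get.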

TRANSACTION SNAPSHOTS

• Write-ahead log provides persistence

• Guarantees point-in-time recovery

• The longer the log grows, the longer recovery takes

• Periodically write snapshot of full transaction state

• Snapshot + all new logs provides full state

TRANSACTION SNAPSHOTS

[Animated diagram, 5 frames: the Tx Manager's current state (in progress, committed, invalid, read point, write point) and files in HDFS, with numbered steps.]

• Start: the Tx Manager writes to Tx Log A in HDFS.

• 1: a new log, Tx Log B, appears alongside Tx Log A.

• 2: a State Snapshot with the same fields is shown next to the current state.

• 3, 4: a Tx Snapshot with the same fields is shown in HDFS next to Tx Log B.
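
Recovery from "snapshot + all new logs" can be sketched as below. This is an in-memory illustration only: `SnapshottingState` and `apply_edit` are hypothetical names, Python lists and dictionaries stand in for the files in HDFS, and dropping the old log once the snapshot exists is an assumption rather than something the slides state.

```python
import copy


def apply_edit(state, op, write_pointer):
    if op == "started":
        state["in_progress"].add(write_pointer)
    elif op == "commit":
        state["in_progress"].discard(write_pointer)
        state["committed"].add(write_pointer)


class SnapshottingState:
    """Toy state with a periodic snapshot and a log of edits made since then."""

    def __init__(self):
        self.state = {"in_progress": set(), "committed": set()}
        self.snapshot = copy.deepcopy(self.state)
        self.log = []                         # edits recorded after the last snapshot

    def record(self, op, write_pointer):
        self.log.append((op, write_pointer))
        apply_edit(self.state, op, write_pointer)

    def take_snapshot(self):
        self.log = []                                # roll over to a fresh log
        self.snapshot = copy.deepcopy(self.state)    # copy the state, then persist it

    def recover(self):
        # Full state = last snapshot + replay of the edits logged after it.
        state = copy.deepcopy(self.snapshot)
        for op, write_pointer in self.log:
            apply_edit(state, op, write_pointer)
        return state


tracked = SnapshottingState()
tracked.record("started", 1002)
tracked.record("started", 1003)
tracked.take_snapshot()
tracked.record("commit", 1003)
recovered = tracked.recover()
```

After the snapshot only one edit has to be replayed, regardless of how many edits came before it.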

TRANSACTION CLEANUP

[Diagram: the state left behind when a rollback fails. HBase holds row1:col1 at TS 1001 (value 10), TS 1002 (value 11), and TS 1003 (value 11). Client 1, holding write = 1002, read = 1001, reports "rollback failed". Tx Manager: write = 1004, read = 1001, inprogress=[1002].]


TRANSACTION CLEANUP: DATA JANITOR

• RegionObserver coprocessor

• Maintains in-memory snapshot of recent invalid & in-progress sets

• Periodically updates from transaction snapshot in HDFS

• Purges data from invalid transactions and older versions on flush & compaction
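
The purge of invalid data on flush can be sketched as a simple filter. This is a plain-Python stand-in, not a real RegionObserver: the `DataJanitor` class and its `refresh` and `pre_flush` methods are hypothetical, and pruning of older versions is left out because the slides do not state the rule for it.

```python
class DataJanitor:
    """Toy flush filter driven by a periodically refreshed transaction snapshot."""

    def __init__(self):
        self.invalid = frozenset()
        self.in_progress = frozenset()

    def refresh(self, snapshot):
        # Called periodically with the latest transaction snapshot.
        self.invalid = frozenset(snapshot["invalid"])
        self.in_progress = frozenset(snapshot["in_progress"])

    def pre_flush(self, cells):
        # Drop versions written by invalid transactions. In-progress versions
        # are kept because those transactions may still commit.
        return [(cell, ts, value) for cell, ts, value in cells
                if ts not in self.invalid]


janitor = DataJanitor()
janitor.refresh({
    "read_point": 1003,
    "write_point": 1005,
    "in_progress": [1004],
    "committed": [],
    "invalid": [1002],
})

memstore = [
    ("row1:col1", 1004, 12),
    ("row1:col1", 1003, 11),
    ("row1:col1", 1002, 11),
]
flushed = janitor.pre_flush(memstore)
```

Filtering during flush and compaction means the cleanup rides on work HBase already does, instead of issuing separate delete operations.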

TRANSACTION CLEANUP: DATA JANITOR

[Animated diagram, 7 frames: a Tx Snapshot, the Data Janitor (RegionObserver) with its Custom RegionScanner, and a MemStore being flushed to an HFile inside HBase.]

• Tx Snapshot: read point = 1003, write point = 1005, in progress = [1004], committed = [], invalid = [1002].

• refresh: the Data Janitor loads the same values into its in-memory copy.

• MemStore holds row1:col1 at TS 1004 (value 12), TS 1003 (value 11), and TS 1002 (value 11).

• preFlush(): the Custom RegionScanner passes the versions at TS 1004 (value 12) and TS 1003 (value 11) through to the HFile. The version at TS 1002, which belongs to an invalid transaction, is not written.

WHAT’S NEXT?

• Open Source

• Continue Scaling Tx Manager

• Transaction Groups?

• Integration across other transactional stores

QS?

Looking for the chance to work with a team that is defining a new category within Big Data?

We are hiring! http://continuuity.com/careers

careers@continuuity.com
