Secondary Indexing in Phoenix

Jesse Yates, HBase Committer, Software Engineer

SF HBase User Group – September 26, 2013

James Taylor, Phoenix Lead, Software Engineer

Agenda

• About

• Indexes In Phoenix

• Immutable Indexes

• Mutable Indexes

• Demo!

• Roadmap

SF HUG – Sept 2013

Phoenix

• Open Source
  – https://github.com/forcedotcom/phoenix

• “SQL-skin” on HBase
  – Everyone knows SQL!

• JDBC Driver
  – Plug-and-play

• Faster than HBase
  – in some cases

Secondary Indexes

• Sort on ‘orthogonal’ axis

• Save full-table scan

• Expected database feature

• Hard in HBase b/c of ACID considerations

Indexes In Phoenix

• Creating an index
  – DDL statement
  – Creates another HBase table behind the scenes

• Deciding when an index is used
  – Transparent to the user
  – (but user can override through hint)
  – No stats yet

• Knowing which table was used
  – EXPLAIN <query>

Creating Indexes In Phoenix

• CREATE INDEX <index_name>
    ON <table_name> (<columns_to_index>…)
    INCLUDE (<columns_to_cover>…);

• Optionally add IMMUTABLE_ROWS=true property to CREATE TABLE statement

Creating Indexes In Phoenix

CREATE TABLE baby_names (
    name VARCHAR PRIMARY KEY,
    occurrences BIGINT);

CREATE INDEX baby_names_idx ON baby_names (occurrences DESC, name);

Deciding When To Use

• Transparent to the user
• Query optimizer does the following:
  – Compiles query against data and index tables
  – Chooses “best” one (not yet stats driven)
• Can index even be used?
  – Active, using columns contained in index (no join back to data table)
• Can ORDER BY be removed?
• Which plan forms the longest start/stop scan key?

Deciding When To Use

SELECT name, occurrences FROM baby_names ORDER BY occurrences DESC LIMIT 10;

is effectively run as:

SELECT name, occurrences FROM baby_names_idx LIMIT 10;

ORDER BY is not necessary, since rows in the index table are already ordered this way.
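A minimal Java sketch of why the sort can be skipped, using an in-memory sorted set as a stand-in for the index table. The names here (OrderedIndexSketch, IndexKey, buildIndex, topNames) are hypothetical and are not Phoenix APIs; the only thing taken from the slides is that index rows are kept sorted by (occurrences DESC, name).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** Toy model only: a sorted set stands in for the HBase index table. */
class OrderedIndexSketch {
    /** Index row key: the indexed column first, then the data table's primary key. */
    static final class IndexKey {
        final long occurrences;
        final String name;

        IndexKey(long occurrences, String name) {
            this.occurrences = occurrences;
            this.name = name;
        }
    }

    // Mirrors baby_names_idx: occurrences descending, then name ascending.
    static final Comparator<IndexKey> INDEX_ORDER = (a, b) -> {
        int byCount = Long.compare(b.occurrences, a.occurrences);
        return byCount != 0 ? byCount : a.name.compareTo(b.name);
    };

    /** Builds the toy index from a data table of name -> occurrences. */
    static TreeSet<IndexKey> buildIndex(Map<String, Long> dataTable) {
        TreeSet<IndexKey> index = new TreeSet<>(INDEX_ORDER);
        for (Map.Entry<String, Long> row : dataTable.entrySet()) {
            index.add(new IndexKey(row.getValue(), row.getKey()));
        }
        return index;
    }

    /** ORDER BY occurrences DESC LIMIT n becomes "read the first n index rows". */
    static List<String> topNames(TreeSet<IndexKey> index, int limit) {
        List<String> names = new ArrayList<>();
        for (IndexKey key : index) {
            if (names.size() == limit) {
                break;
            }
            names.add(key.name);
        }
        return names;
    }

    public static void main(String[] args) {
        Map<String, Long> data = new HashMap<>();
        data.put("Emma", 300L);
        data.put("Liam", 500L);
        data.put("Ava", 100L);
        System.out.println(topNames(buildIndex(data), 2));
    }
}
```

The scan stops after `limit` rows and never sorts, which is the property the optimizer relies on when it removes the ORDER BY.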

Deciding When To Use

SELECT name, occurrences FROM baby_names WHERE occurrences > 100;

is effectively run as:

SELECT name, occurrences FROM baby_names_idx WHERE occurrences > 100;

Uses the index, since we can form the start row for the scan from the filter on occurrences.

Deciding When To Use

Override the optimizer by telling it not to use any indexes:

SELECT /*+ NO_INDEX */ name FROM baby_names WHERE occurrences > 100;

Tell the optimizer the priority in which it should consider using indexes:

SELECT /*+ INDEX(baby_names baby_names_idx other_baby_names_idx) */ name, occurrences
FROM baby_names WHERE occurrences > 100;

Knowing which table was used

EXPLAIN SELECT name, occurrences FROM baby_names ORDER BY occurrences DESC LIMIT 10;

CLIENT PARALLEL 1-WAY FULL SCAN OVER BABY_NAMES_IDX
    SERVER FILTER BY PageFilter 10
CLIENT 10 ROW LIMIT

Immutable Indexes

• Immutable Rows

• Much easier to implement

• Client-managed

• Bulk-loadable

Mutable Indexes

• Global Index

• Change row state
  – Common use-case
  – “expected” implementation

• Covered Columns/Join Index

1.5 years*

Internals

• Index Management
  – Build index updates
  – Ensures index is ‘cleaned up’

• Recovery Mechanism
  – Ensures index updates are “ACID”

“There is no magic”

- Every programming hipster (chipster)

Mutable Indexing: Standard Write Path

[Diagram: the standard HBase write path, showing Client, HRegion, RegionCoprocessorHost, WAL, RegionCoprocessorHost, and MemStore]

Mutable Indexing

[Diagram: the write path with indexing added, showing RegionCoprocessorHost, WAL, RegionCoprocessorHost, Indexer, Builder, Codec, WAL Updater (labeled “Durable!”), and several Index Tables]

Index Management

• Lives within a RegionCoprocessorObserver
• Access to the local HRegion
• Specifies the mutations to apply to the index tables

public interface IndexBuilder {
  public void setup(RegionCoprocessorEnvironment env);
  public Map<Mutation, String> getIndexUpdate(Put put);
  public Map<Mutation, String> getIndexUpdate(Delete delete);
}

Why not write my own?

• Managing Cleanup
  – Efficient point-in-time correctness
  – Performance tricks

• Abstract access to HRegion
  – Minimal network hops

• Sorting correctness
  – Phoenix typing ensures correct index sorting

Example: Managing Cleanup

• Updates can arrive out of order
  – Client-managed timestamps

ROW   FAMILY  QUALIFIER  TS  VALUE
Row1  Fam     Qual       10  val1
Row1  Fam2    Qual2      12  val2
Row1  Fam     Qual       13  val3

Example: Managing Cleanup

Index Table

ROW             FAMILY  QUALIFIER             TS
Val1|Row1       Index   Fam:Qual              10
Val1|Val2|Row1  Index   Fam:Qual, Fam2:Qual2  12
Val3|Val2|Row1  Index   Fam:Qual, Fam2:Qual2  13

Example: Managing Cleanup

ROW   FAMILY  QUALIFIER  TS  VALUE
Row1  Fam     Qual       10  val1
Row1  Fam2    Qual2      12  val2
Row1  Fam     Qual       13  val3
Row1  Fam     Qual       11  val4

Example: Managing Cleanup

ROW   FAMILY  QUALIFIER  TS  VALUE
Row1  Fam     Qual       10  val1
Row1  Fam     Qual       11  val4
Row1  Fam2    Qual2      12  val2
Row1  Fam     Qual       13  val3

Example: Managing Cleanup

ROW             FAMILY  QUALIFIER             TS
Val1|Row1       Index   Fam:Qual              10
Val4|Row1       Index   Fam:Qual              11
Val4|Val2|Row1  Index   Fam:Qual, Fam2:Qual2  12
Val1|Val2|Row1  Index   Fam:Qual, Fam2:Qual2  12
Val3|Val2|Row1  Index   Fam:Qual, Fam2:Qual2  13

Managing Cleanup

• History “roll up”
• Out-of-order Updates
• Point-in-time correctness
• Multiple Timestamps per Mutation
• Delete vs. DeleteColumn vs. DeleteFamily

Surprisingly hard!
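The out-of-order example from the preceding tables can be reproduced with a small Java sketch. This is a toy model under stated assumptions, not the Phoenix implementation: the class and method names (IndexCleanupSketch, Cell, indexEntries, valueAsOf, staleEntries) are hypothetical, and the index row key is assumed to be the indexed values visible at a timestamp, joined by "|", followed by the data row key. Deletes and history roll-up are not modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy model of point-in-time index entries; deletes are not modeled. */
class IndexCleanupSketch {
    /** One timestamped cell in the data table; column stands for "family:qualifier". */
    static final class Cell {
        final String column;
        final long ts;
        final String value;

        Cell(String column, long ts, String value) {
            this.column = column;
            this.ts = ts;
            this.value = value;
        }
    }

    /** Latest value of a column at or before ts, or null if not yet written. */
    static String valueAsOf(List<Cell> cells, String column, long ts) {
        Cell best = null;
        for (Cell c : cells) {
            if (c.column.equals(column) && c.ts <= ts && (best == null || c.ts > best.ts)) {
                best = c;
            }
        }
        return best == null ? null : best.value;
    }

    /** For every timestamp at which the row changed, the index row key valid at that time. */
    static SortedMap<Long, String> indexEntries(String rowKey, List<String> indexedColumns,
                                                List<Cell> cells) {
        TreeSet<Long> timestamps = new TreeSet<>();
        for (Cell c : cells) {
            timestamps.add(c.ts);
        }
        SortedMap<Long, String> entries = new TreeMap<>();
        for (long ts : timestamps) {
            StringBuilder key = new StringBuilder();
            for (String column : indexedColumns) {
                String value = valueAsOf(cells, column, ts);
                if (value != null) {
                    key.append(value).append('|');
                }
            }
            entries.put(ts, key.append(rowKey).toString());
        }
        return entries;
    }

    /** Entries written before a late update that are no longer correct after it. */
    static Set<String> staleEntries(SortedMap<Long, String> before, SortedMap<Long, String> after) {
        Set<String> stale = new TreeSet<>();
        for (Map.Entry<Long, String> e : before.entrySet()) {
            if (!e.getValue().equals(after.get(e.getKey()))) {
                stale.add(e.getValue() + "@" + e.getKey());
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        List<String> indexed = Arrays.asList("Fam:Qual", "Fam2:Qual2");
        List<Cell> cells = new ArrayList<>();
        cells.add(new Cell("Fam:Qual", 10, "val1"));
        cells.add(new Cell("Fam2:Qual2", 12, "val2"));
        cells.add(new Cell("Fam:Qual", 13, "val3"));
        SortedMap<Long, String> before = indexEntries("Row1", indexed, cells);

        cells.add(new Cell("Fam:Qual", 11, "val4")); // the late, out-of-order update
        SortedMap<Long, String> after = indexEntries("Row1", indexed, cells);

        System.out.println("after: " + after);
        System.out.println("stale: " + staleEntries(before, after));
    }
}
```

In this model the late write at timestamp 11 changes what the index entry at timestamp 12 should have been, so the entry built from val1 at timestamp 12 must be removed. That is the cleanup work the index manager takes on.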

Phoenix Index Builder

• Much simpler than full index management
• Hides cleanup considerations
• Abstracted access to local state

public interface IndexCodec {
  public void initialize(RegionCoprocessorEnvironment env);
  public Iterable<IndexUpdate> getIndexDeletes(TableState state);
  public Iterable<IndexUpdate> getIndexUpserts(TableState state);
}

Phoenix Index Codec

Dude, where’s my data?

Ensuring Correctness

HBase ACID

• Does NOT give you:
  – Cross-row consistency
  – Cross-table consistency

• Does give you:
  – Durable data on success
  – Visibility on success without partial rows

Key Observation

“Secondary indexing is inherently an easier problem than full transactions… secondary index updates are idempotent.”

- Lars Hofhansl

Idempotent Index Updates

• Doesn’t need full transactions

• Replay as many times as needed

• Can tolerate a little lag
  – As long as we get the order right
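A minimal Java sketch of why replay is safe, assuming each index update carries an explicit timestamp so that a replayed update lands on the same cell it wrote the first time. The names (IdempotentReplaySketch, IndexPut, replay) are hypothetical, and a sorted set of "rowKey@timestamp" strings stands in for the index table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

/** Toy model only: shows that replaying timestamped index writes does not change the result. */
class IdempotentReplaySketch {
    /** A hypothetical index update: one cell at (indexRowKey, timestamp). */
    static final class IndexPut {
        final String indexRowKey;
        final long timestamp;

        IndexPut(String indexRowKey, long timestamp) {
            this.indexRowKey = indexRowKey;
            this.timestamp = timestamp;
        }
    }

    /** Applies the same batch of updates the given number of times to an empty index table. */
    static SortedSet<String> replay(List<IndexPut> updates, int times) {
        SortedSet<String> indexTable = new TreeSet<>();
        for (int i = 0; i < times; i++) {
            for (IndexPut put : updates) {
                // Same row key and timestamp address the same cell, so a repeat is a no-op.
                indexTable.add(put.indexRowKey + "@" + put.timestamp);
            }
        }
        return indexTable;
    }

    public static void main(String[] args) {
        List<IndexPut> updates = new ArrayList<>();
        updates.add(new IndexPut("val4|Row1", 11));
        updates.add(new IndexPut("val4|val2|Row1", 12));
        System.out.println(replay(updates, 1).equals(replay(updates, 3)));
    }
}
```

Because applying the batch once or several times leaves the same state, recovery can simply replay from the WAL without tracking which index updates already succeeded.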

Failure Recovery

• Custom WALEditCodec
  – Encodes index updates
  – Supports compressed WAL

• Custom WAL Reader
  – Replay index updates from WAL

<property>
  <name>hbase.regionserver.wal.codec</name>
  <value>o.a.h.hbase.regionserver.wal.IndexedWALEditCodec</value>
</property>
<property>
  <name>hbase.regionserver.hlog.reader.impl</name>
  <value>o.a.h.hbase.regionserver.wal.IndexedHLogReader</value>
</property>

Failure Situations

• Any time before WAL, client replay

• Any time after WAL, HBase replay

• All-or-nothing

Failure #1: Before WAL

[Diagram: the write path (Client, HRegion, RegionCoprocessorHost, WAL, RegionCoprocessorHost, MemStore) with a failure before the WAL]

No problem! No data is stored in the WAL, so the client just retries the entire update.

Failure #2: After WAL

[Diagram: the write path (Client, HRegion, RegionCoprocessorHost, WAL, RegionCoprocessorHost, MemStore) with a failure after the WAL]

WAL replayed via usual replay mechanisms

“Magic”

• Server short-circuit
• Lazy load columns
• Skip-scan for cache
• Parallel Writing
• Custom MemStore in Indexer
• Caching HTables
• Pluggable Index Writing/Failure Policy
• Minimize byte[] copy (ImmutableBytesPtr)

Demo

Roadmap

• Next release of Phoenix

• Performance improvements

• Functional Indexes

• Other indexing approaches (Huawei, SEP)

Open Source!

• Main: https://github.com/forcedotcom/phoenix

• Indexing: https://github.com/forcedotcom/phoenix/tree/mutable-si

(obligatory hiring slide)

We’re Hiring!

Questions? Comments?

[email protected], @jamesplusplus

[email protected], @jesse_yates

Appendix

• AsyncHBaseWriter
  – github.com/jyates/phoenix/tree/async-hbase
  – 2x+ slower*

* Written in 2hrs, not 100% correct either
