
Secondary Indexing in Phoenix

Date post: 25-Feb-2016
Page 1: Secondary Indexing in Phoenix

Secondary Indexing in Phoenix

Jesse Yates, HBase Committer, Software Engineer

LA HBase User Group – September 4, 2013

Page 2

Agenda

• About

• Other Indexing Frameworks

• Immutable Indexes

• Mutable Indexes in Phoenix

• Mutable Indexing Internals

• Roadmap

LA HUG – Sept 2013

Page 3

About me

• Developer at Salesforce
  – System of Record, Phoenix

• Open Source
  – Phoenix
  – HBase
  – Accumulo

Page 4

Phoenix

• Open Source
  – https://github.com/forcedotcom/phoenix

• “SQL skin” on HBase
  – Everyone knows SQL!

• JDBC driver
  – Plug-and-play

• Faster than hand-written HBase client code
  – in some cases

Page 5

Why Index?

• HBase is sorted on only one “axis”: the row key

• Great for searches that follow a single access pattern

Example!

Page 6

Example

(figure: example table with the fields name, type, subtype, date, major, minor and quantity)

Page 7

Secondary Indexes

• Sort on an ‘orthogonal’ axis

• Avoid a full-table scan

• An expected database feature

• Hard in HBase because of ACID considerations
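A minimal in-memory sketch of the “one axis” problem, using plain Java collections rather than the HBase API: the data table is a `TreeMap` sorted by row key, so finding rows by a non-key column means touching every row, while a second map keyed by `value|rowKey` turns the same lookup into a short range scan. The class name and the `|` separator are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Illustration only: a "table" sorted by row key plus an index sorted by value.
final class SecondaryIndexSketch {
    // Data table: row key -> value of a non-key column (e.g. a name).
    final NavigableMap<String, String> table = new TreeMap<>();
    // Index table: "value|rowKey" -> rowKey, so equal values sit next to each other.
    final NavigableMap<String, String> index = new TreeMap<>();
    int rowsExamined; // rows touched by the most recent lookup

    void put(String rowKey, String value) {
        // Assumes rows are never rewritten; otherwise a stale index entry would remain.
        table.put(rowKey, value);
        index.put(value + "|" + rowKey, rowKey);
    }

    // No index: a full-table scan along the row-key axis.
    List<String> scanFor(String value) {
        rowsExamined = 0;
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> e : table.entrySet()) {
            rowsExamined++;
            if (e.getValue().equals(value)) hits.add(e.getKey());
        }
        return hits;
    }

    // With the index: range scan over keys in ["value|", "value}"); '}' follows '|' in ASCII.
    List<String> indexLookup(String value) {
        rowsExamined = 0;
        List<String> hits = new ArrayList<>();
        for (String rowKey : index.subMap(value + "|", true, value + "}", false).values()) {
            rowsExamined++;
            hits.add(rowKey);
        }
        return hits;
    }

    public static void main(String[] args) {
        SecondaryIndexSketch t = new SecondaryIndexSketch();
        t.put("row1", "Jesse");
        t.put("row2", "Lars");
        t.put("row3", "Jesse");
        t.put("row4", "James");
        System.out.println(t.scanFor("Jesse") + " after examining " + t.rowsExamined + " rows");
        System.out.println(t.indexLookup("Jesse") + " after examining " + t.rowsExamined + " rows");
    }
}
```

Both lookups return `[row1, row3]`, but the scan examines all 4 rows while the index lookup examines only 2; on a real table the gap grows with table size.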

Page 8

Agenda

• About

• Other Indexing Frameworks

• Immutable Indexes

• Mutable Indexes in Phoenix

• Mutable Indexing Internals

• Roadmap

Page 9

(image source: http://www.wired.com/wiredenterprise/2011/10/microsoft-and-hadoop/)

Page 10

Other (Major) Indexing Frameworks

• HBase SEP
  – Side-Effects Processor
  – Replication-based
  – https://github.com/NGDATA/hbase-sep

• Huawei
  – Server-local indexes
  – Buddy regions
  – https://github.com/Huawei-Hadoop/hindex

Page 11

Agenda

• About

• Other Indexing Frameworks

• Immutable Indexes

• Mutable Indexes in Phoenix

• Mutable Indexing Internals

• Roadmap

Page 12

Immutable Indexes

• Immutable Rows

• Much easier to implement

• Client-managed

• Bulk-loadable
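Because an immutable row is never updated, no old index entry can exist that would need deleting, so the client can compute every index row itself at write time. A minimal sketch of that idea in plain Java (16+, for records); the `Row` and `Mutation` types and the `DATA`/`IDX_*` table names are hypothetical, not Phoenix's client code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: with immutable rows the client derives every index row
// from the data row alone, with no read of existing state and no cleanup.
final class ImmutableIndexClient {
    record Row(String key, String name, int year) {}
    record Mutation(String table, String rowKey) {}

    // One batch: the data row plus one row per index. Table names are made up.
    static List<Mutation> mutationsFor(Row r) {
        List<Mutation> batch = new ArrayList<>();
        batch.add(new Mutation("DATA", r.key()));
        batch.add(new Mutation("IDX_NAME", r.name() + "|" + r.key()));
        batch.add(new Mutation("IDX_YEAR", r.year() + "|" + r.key()));
        return batch;
    }

    public static void main(String[] args) {
        for (Mutation m : mutationsFor(new Row("r1", "Jesse", 2012))) {
            System.out.println(m);
        }
    }
}
```

Since `mutationsFor` is a pure function of its input row, the same logic could run inside a bulk-load job, which is what makes immutable indexes bulk-loadable.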

Page 13

Bulk Loading

phoenix-hbase.blogspot.com

Page 14

Index Bulk Loading

Identity Mapper → Custom Phoenix Reducer → HFile Output Format

Page 15

Index Bulk Loading

PreparedStatement statement = conn.prepareStatement(dmlStatement);
statement.execute();

String upsertStmt = "upsert into core.entity_history(organization_id, key_prefix, entity_history_id, created_by, created_date)\n"
        + "values(?,?,?,?,?)";

statement = conn.prepareStatement(upsertStmt);
… // set values

// Collect the KeyValues Phoenix generated for the still-uncommitted upserts
// (this relies on conn not auto-committing them)
Iterator<Pair<byte[], List<KeyValue>>> dataIterator = PhoenixRuntime.getUncommittedDataIterator(conn);
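Whatever the custom reducer emits has to reach the HFile output format in HBase's cell order, since an HFile is written in sorted key order. The sketch below shows that ordering in simplified form (Java 16+): row, family and qualifier ascending, newest timestamp first. It compares `String`s, which matches HBase's unsigned byte order only for ASCII keys, and it ignores the cell type that HBase also orders by; the `Cell` record is hypothetical, not HBase's `KeyValue`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical simplification of HBase cell ordering for ASCII keys.
final class HFileOrderSketch {
    record Cell(String row, String family, String qualifier, long ts, String value) {}

    // Row, family, qualifier ascending; then the newest version first.
    static final Comparator<Cell> HBASE_ORDER =
        Comparator.comparing(Cell::row)
                  .thenComparing(Cell::family)
                  .thenComparing(Cell::qualifier)
                  .thenComparing(Comparator.comparingLong(Cell::ts).reversed());

    // Sorts one reducer's cells so they could be appended to an HFile in order.
    static List<Cell> sortForHFile(List<Cell> cells) {
        List<Cell> sorted = new ArrayList<>(cells);
        sorted.sort(HBASE_ORDER);
        return sorted;
    }

    public static void main(String[] args) {
        List<Cell> cells = List.of(
            new Cell("Row1", "Fam", "Qual", 10, "val1"),
            new Cell("Row1", "Fam2", "Qual2", 12, "val2"),
            new Cell("Row1", "Fam", "Qual", 13, "val3"));
        for (Cell c : sortForHFile(cells)) {
            System.out.println(c);
        }
    }
}
```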

Page 16

Agenda

• About

• Other Indexing Frameworks

• Immutable Indexes

• Mutable Indexes in Phoenix

• Mutable Indexing Internals

• Roadmap

Page 17

The “fun” stuff…

Page 18

1.5 years

Page 19

Mutable Indexes

• Global Index

• Change row state
  – Common use-case
  – “expected” implementation

• Covered Columns

Page 20

Usage

• Just SQL!

• Baby name popularity

• Mock demo

Page 21

Usage

• Selects the most popular name for a given year
SELECT name, occurrences FROM baby_names WHERE year = 2012 ORDER BY occurrences DESC LIMIT 1;

• Selects the total occurrences of a given name across all years, bypassing any index
SELECT /*+ NO_INDEX */ name, sum(occurrences) FROM baby_names WHERE name = 'Jesse' GROUP BY name;

• Selects the total occurrences of a given name across all years, allowing an index to be used
SELECT name, sum(occurrences) FROM baby_names WHERE name = 'Jesse' GROUP BY name;
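Why an index that covers `occurrences` can answer the last query on its own, as a hypothetical in-memory sketch (plain Java, not Phoenix's index format): index rows are keyed by `name|year|sex` and carry the covered `occurrences` value, so summing for one name is a contiguous range scan that never touches `baby_names`. The key layout is an assumption.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch of an index on baby_names(name) that also stores
// ("covers") occurrences, so SUM(occurrences) WHERE name = ? needs no data-table lookup.
final class CoveredIndexSketch {
    // Index row key "name|year|sex" -> covered value (occurrences).
    final NavigableMap<String, Long> index = new TreeMap<>();

    // Re-upserting the same (year, sex, name) overwrites the covered value.
    void upsert(int year, String sex, String name, long occurrences) {
        index.put(name + "|" + year + "|" + sex, occurrences);
    }

    long sumOccurrences(String name) {
        long total = 0;
        // All entries for one name are contiguous: keys in ["name|", "name}").
        for (long n : index.subMap(name + "|", true, name + "}", false).values()) {
            total += n;
        }
        return total;
    }

    public static void main(String[] args) {
        CoveredIndexSketch idx = new CoveredIndexSketch();
        idx.upsert(2011, "M", "Jesse", 100);
        idx.upsert(2012, "M", "Jesse", 150);
        idx.upsert(2012, "F", "Jessica", 999);
        System.out.println(idx.sumOccurrences("Jesse")); // 250
        // A census-style correction for one row: the covered value changes in place.
        idx.upsert(2012, "M", "Jesse", 150 + 3000);
        System.out.println(idx.sumOccurrences("Jesse")); // 3250
    }
}
```

The flip side is that every data update must also update the covered copy, which is exactly what mutable indexing has to guarantee.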

Page 22

Usage

• Update rows due to census inaccuracy
  – Will only work if mutable indexing is working
UPSERT INTO baby_names SELECT year, occurrences + 3000, sex, name FROM baby_names WHERE name = 'Jesse';

• Selects the now-updated data (from the index table)
SELECT name, sum(occurrences) FROM baby_names WHERE name = 'Jesse' GROUP BY name;

• The index table is still used in scans
EXPLAIN SELECT name, sum(occurrences) FROM baby_names WHERE name = 'Jesse' GROUP BY name;

Page 23

Agenda

• About

• Other Indexing Frameworks

• Immutable Indexes

• Mutable Indexes in Phoenix

• Mutable Indexing Internals

• Roadmap

Page 24

Internals

• Index Management
  – Builds index updates
  – Ensures the index is ‘cleaned up’

• Recovery Mechanism
  – Ensures index updates are “ACID”
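In the simplest case, building index updates and cleaning up after them is a read-before-write against the row's current state, which a coprocessor can do locally because it runs inside the region. A hypothetical sketch (plain Java 16+, not Phoenix's `Indexer`): on each put it looks up the old value of the indexed column, emits a delete for the stale index row, and emits a put for the new one. It ignores timestamps, which is where the real cleanup logic gets hard.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: index updates needed when a row's indexed column changes.
final class MutableIndexSketch {
    record IndexUpdate(boolean isDelete, String indexRowKey) {}

    // Stand-in for local region state: data row key -> current indexed value.
    final Map<String, String> localRows = new HashMap<>();

    // Returns the index updates for a put of newValue on rowKey, then applies the put.
    List<IndexUpdate> put(String rowKey, String newValue) {
        List<IndexUpdate> updates = new ArrayList<>();
        String oldValue = localRows.get(rowKey); // local read, no network hop
        if (oldValue != null && !oldValue.equals(newValue)) {
            // Cleanup: the old index row points at a value the row no longer has.
            updates.add(new IndexUpdate(true, oldValue + "|" + rowKey));
        }
        if (!newValue.equals(oldValue)) {
            updates.add(new IndexUpdate(false, newValue + "|" + rowKey));
        }
        localRows.put(rowKey, newValue);
        return updates;
    }

    public static void main(String[] args) {
        MutableIndexSketch s = new MutableIndexSketch();
        System.out.println(s.put("row1", "Jesse")); // only an index put
        System.out.println(s.put("row1", "James")); // delete Jesse|row1, put James|row1
    }
}
```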

Page 25

“There is no magic”

- Every programming hipster (chipster)

Page 26

Mutable Indexing: Standard Write Path

(diagram: Client → HRegion → RegionCoprocessorHost → WAL → RegionCoprocessorHost → MemStore)

Page 27

Mutable Indexing: Standard Write Path

(diagram: Client → HRegion → RegionCoprocessorHost → WAL → RegionCoprocessorHost → MemStore)

Page 28

Mutable Indexing

(diagram: RegionCoprocessorHost → WAL → RegionCoprocessorHost, with the Indexer coprocessor (Builder, Codec) adding index updates to the WAL (“Durable!”) and a WAL Updater writing them to several Index Tables)

Page 29

Index Management

• Lives within a RegionObserver coprocessor

• Has access to the local HRegion

• Specifies the mutations to apply to the index tables

public interface IndexBuilder {
    public void setup(RegionCoprocessorEnvironment env);
    // Each index mutation is mapped to the index table it should be written to
    public Map<Mutation, String> getIndexUpdate(Put put);
    public Map<Mutation, String> getIndexUpdate(Delete delete);
}

Page 30

Why not write my own?

• Managing cleanup
  – Efficient point-in-time correctness
  – Performance tricks

• Abstract access to HRegion
  – Minimal network hops

• Sorting correctness
  – Phoenix typing ensures correct index sorting
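The sorting-correctness point in miniature: HBase compares keys as unsigned bytes, so a plain big-endian `int` places negative numbers after positive ones. Phoenix's data type documentation describes `INTEGER` as a 4-byte integer with the sign bit flipped so negatives sort first; this sketch reproduces that idea with `ByteBuffer` and `Arrays.compareUnsigned` (Java 9+) and is not Phoenix's own type code.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Sketch of an order-preserving int encoding: big-endian bytes with the sign
// bit flipped, so unsigned byte comparison matches numeric order.
final class SortableIntSketch {
    static byte[] naive(int v) {
        return ByteBuffer.allocate(4).putInt(v).array(); // big-endian two's complement
    }

    static byte[] sortable(int v) {
        return ByteBuffer.allocate(4).putInt(v ^ Integer.MIN_VALUE).array(); // flip the sign bit
    }

    // Unsigned lexicographic comparison, the order HBase keeps row keys in.
    static int compareKeys(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }

    public static void main(String[] args) {
        // Naive: -1 is FF FF FF FF, which sorts after 1 (00 00 00 01).
        System.out.println(compareKeys(naive(-1), naive(1)) > 0);       // true: wrong order
        System.out.println(compareKeys(sortable(-1), sortable(1)) < 0); // true: correct order
    }
}
```

An index built on the naive encoding would return range scans in the wrong order, which is why index keys go through the column's type encoding rather than raw Java bytes.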

Page 31

Example: Managing Cleanup

• Updates can arrive out of order
  – Client-managed timestamps

ROW   FAMILY  QUALIFIER  TS  VALUE
Row1  Fam     Qual       10  val1
Row1  Fam2    Qual2      12  val2
Row1  Fam     Qual       13  val3

Page 32

Example: Managing Cleanup

Index Table

ROW             FAMILY  QUALIFIER             TS
Val1|Row1       Index   Fam:Qual              10
Val1|Val2|Row1  Index   Fam:Qual, Fam2:Qual2  12
Val3|Val2|Row1  Index   Fam:Qual, Fam2:Qual2  13

Page 33

Example: Managing Cleanup

ROW   FAMILY  QUALIFIER  TS  VALUE
Row1  Fam     Qual       10  val1
Row1  Fam2    Qual2      12  val2
Row1  Fam     Qual       13  val3
Row1  Fam     Qual       11  val4

Page 34

Example: Managing Cleanup

ROW   FAMILY  QUALIFIER  TS  VALUE
Row1  Fam     Qual       10  val1
Row1  Fam     Qual       11  val4
Row1  Fam2    Qual2      12  val2
Row1  Fam     Qual       13  val3

Page 35

Example: Managing Cleanup

ROW             FAMILY  QUALIFIER             TS
Val1|Row1       Index   Fam:Qual              10
Val4|Row1       Index   Fam:Qual              11
Val4|Val2|Row1  Index   Fam:Qual, Fam2:Qual2  12
Val1|Val2|Row1  Index   Fam:Qual, Fam2:Qual2  12
Val3|Val2|Row1  Index   Fam:Qual, Fam2:Qual2  13

Page 36

Example: Managing Cleanup

ROW             FAMILY  QUALIFIER             TS
Val1|Row1       Index   Fam:Qual              10
Val4|Row1       Index   Fam:Qual              11
Val4|Val2|Row1  Index   Fam:Qual, Fam2:Qual2  12
Val1|Val2|Row1  Index   Fam:Qual, Fam2:Qual2  12
Val3|Val2|Row1  Index   Fam:Qual, Fam2:Qual2  13
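One way to see which index rows the out-of-order write at TS 11 invalidates is to rebuild the point-in-time index for Row1 before and after it and diff the two sets. The hypothetical sketch below (plain Java 16+) builds, for every timestamp at which the row changed, an index key from the latest value of each indexed column as of that timestamp. It is a brute-force illustration, not how Phoenix actually computes cleanup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch of point-in-time index rows for a single data row.
final class PointInTimeIndexSketch {
    record Cell(String column, long ts, String value) {}

    // One index entry per timestamp at which the row changed, written as "key@ts".
    static TreeSet<String> indexRows(List<Cell> history, String rowKey, List<String> indexedColumns) {
        TreeSet<Long> timestamps = new TreeSet<>();
        for (Cell c : history) timestamps.add(c.ts());
        TreeSet<String> rows = new TreeSet<>();
        for (long t : timestamps) {
            StringBuilder key = new StringBuilder();
            for (String col : indexedColumns) {
                String v = latestAsOf(history, col, t);
                if (v != null) key.append(v).append('|');
            }
            rows.add(key + rowKey + "@" + t);
        }
        return rows;
    }

    // Latest value of a column with timestamp <= t, or null if not yet written.
    static String latestAsOf(List<Cell> history, String column, long t) {
        Cell best = null;
        for (Cell c : history) {
            if (c.column().equals(column) && c.ts() <= t && (best == null || c.ts() > best.ts())) {
                best = c;
            }
        }
        return best == null ? null : best.value();
    }

    public static void main(String[] args) {
        List<String> cols = List.of("Fam:Qual", "Fam2:Qual2");
        List<Cell> history = new ArrayList<>(List.of(
            new Cell("Fam:Qual", 10, "Val1"),
            new Cell("Fam2:Qual2", 12, "Val2"),
            new Cell("Fam:Qual", 13, "Val3")));
        TreeSet<String> before = indexRows(history, "Row1", cols);
        history.add(new Cell("Fam:Qual", 11, "Val4")); // the out-of-order update
        TreeSet<String> after = indexRows(history, "Row1", cols);

        TreeSet<String> toDelete = new TreeSet<>(before);
        toDelete.removeAll(after);
        TreeSet<String> toAdd = new TreeSet<>(after);
        toAdd.removeAll(before);
        System.out.println("delete " + toDelete); // delete [Val1|Val2|Row1@12]
        System.out.println("add " + toAdd);       // add [Val4|Row1@11, Val4|Val2|Row1@12]
    }
}
```

The diff matches the index table above: `Val4|Row1` at 11 and `Val4|Val2|Row1` at 12 are new, while `Val1|Val2|Row1` at 12 is now wrong and has to be deleted.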

Page 37

Managing Cleanup

• History “roll up”
• Out-of-order Updates
• Point-in-time correctness
• Multiple Timestamps per Mutation
• Delete vs. DeleteColumn vs. DeleteFamily

Surprisingly hard!

Page 38

Phoenix Index Builder

• Much simpler than full index management
• Hides cleanup considerations
• Abstracted access to local state

public interface IndexCodec {
    public void initialize(RegionCoprocessorEnvironment env);
    public Iterable<IndexUpdate> getIndexDeletes(TableState state);
    public Iterable<IndexUpdate> getIndexUpserts(TableState state);
}

Page 39

Phoenix Index Codec

Page 40

Dude, where’s my data?

Ensuring Correctness

Page 41

HBase ACID

• Does NOT give you:
  – Cross-row consistency
  – Cross-table consistency

• Does give you:
  – Durable data on success
  – Visibility on success without partial rows

Page 42

Key Observation

“Secondary indexing is inherently an easier problem than full transactions… secondary index updates are idempotent.”

- Lars Hofhansl

Page 43

Idempotent Index Updates

• Doesn’t need full transactions

• Replay as many times as needed

• Can tolerate a little lag
  – As long as we get the order right
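Why idempotence makes replay safe, in a hypothetical model (plain Java 16+, not HBase code): each index update carries an explicit row key and timestamp, so applying it again rewrites the same cell with the same value. Replaying a batch once or three times leaves the index in the same state.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of an index table as cells keyed by "rowKey@ts".
final class IdempotentReplaySketch {
    record IndexPut(String rowKey, long ts, String value) {}

    // Writing the same cell again just overwrites it with the same value.
    static void apply(Map<String, String> indexTable, List<IndexPut> batch) {
        for (IndexPut p : batch) {
            indexTable.put(p.rowKey() + "@" + p.ts(), p.value());
        }
    }

    public static void main(String[] args) {
        List<IndexPut> batch = List.of(
            new IndexPut("Val4|Row1", 11, "Row1"),
            new IndexPut("Val4|Val2|Row1", 12, "Row1"));
        Map<String, String> once = new TreeMap<>();
        apply(once, batch);
        Map<String, String> replayed = new TreeMap<>();
        for (int i = 0; i < 3; i++) {
            apply(replayed, batch); // e.g. the same WAL entries replayed after repeated failures
        }
        System.out.println(once.equals(replayed)); // true
    }
}
```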

Page 44

Failure Recovery

• Custom WALEditCodec
  – Encodes index updates
  – Supports compressed WAL

• Custom WAL Reader
  – Replays index updates from the WAL

<property>
  <name>hbase.regionserver.wal.codec</name>
  <value>org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec</value>
</property>
<property>
  <name>hbase.regionserver.hlog.reader.impl</name>
  <value>org.apache.hadoop.hbase.regionserver.wal.IndexedHLogReader</value>
</property>

Page 45

Failure Situations

• Failure any time before the WAL append: the client replays (retries) the update

• Failure any time after the WAL append: HBase replays it from the WAL

• All-or-nothing
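The two failure cases can be simulated in a few lines. In this hypothetical sketch (plain Java 16+, not HBase code) the primary edit and its index updates are appended to the WAL as one entry, so after a crash either both are replayed or neither was ever durable. The `FailAt` switch and all class names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simulation of the "before WAL" and "after WAL" failure cases.
final class WalRecoverySketch {
    enum FailAt { NOWHERE, BEFORE_WAL, AFTER_WAL }

    record WalEntry(String rowKey, String value, List<String> indexRowKeys) {}

    final List<WalEntry> wal = new ArrayList<>();           // durable, survives a crash
    Map<String, String> memstore = new HashMap<>();         // in memory, lost on a crash
    final Map<String, String> indexTable = new HashMap<>(); // index row -> data row key

    // Returns true only if the write completed and was acknowledged to the client.
    boolean write(String rowKey, String value, FailAt failAt) {
        WalEntry entry = new WalEntry(rowKey, value, List.of(value + "|" + rowKey));
        if (failAt == FailAt.BEFORE_WAL) { simulateCrash(); return false; } // client retries
        wal.add(entry); // primary edit and index updates become durable together
        if (failAt == FailAt.AFTER_WAL) { simulateCrash(); return false; } // replay recovers it
        applyEntry(entry);
        return true;
    }

    void simulateCrash() { memstore = new HashMap<>(); }

    // On recovery every WAL entry is re-applied; idempotent writes make this safe.
    void replay() {
        for (WalEntry e : wal) applyEntry(e);
    }

    private void applyEntry(WalEntry e) {
        memstore.put(e.rowKey(), e.value());
        for (String indexRow : e.indexRowKeys()) indexTable.put(indexRow, e.rowKey());
    }

    public static void main(String[] args) {
        WalRecoverySketch region = new WalRecoverySketch();
        // Failure #1: nothing reached the WAL, so replay restores nothing; the client retries.
        region.write("row1", "Jesse", FailAt.BEFORE_WAL);
        region.replay();
        System.out.println(region.indexTable.isEmpty()); // true
        region.write("row1", "Jesse", FailAt.NOWHERE);
        // Failure #2: the entry is in the WAL, so replay restores both data and index rows.
        region.write("row2", "Lars", FailAt.AFTER_WAL);
        region.replay();
        System.out.println(region.memstore.get("row2") + " " + region.indexTable.get("Lars|row2")); // Lars row2
    }
}
```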

Page 46

Failure #1: Before WAL

(diagram: Client → HRegion → RegionCoprocessorHost → WAL → RegionCoprocessorHost → MemStore, failing before the WAL)

Page 47

Failure #1: Before WAL

(diagram: Client → HRegion → RegionCoprocessorHost → WAL → RegionCoprocessorHost → MemStore, failing before the WAL)

No problem! Nothing has been written to the WAL yet, so the client simply retries the entire update.

Page 48

Failure #2: After WAL

(diagram: Client → HRegion → RegionCoprocessorHost → WAL → RegionCoprocessorHost → MemStore, failing after the WAL)

Page 49

Failure #2: After WAL

(diagram: Client → HRegion → RegionCoprocessorHost → WAL → RegionCoprocessorHost → MemStore, failing after the WAL)

The WAL, including its index updates, is replayed through HBase's usual replay mechanism.

Page 50

Agenda

• About

• Other Indexing Frameworks

• Immutable Indexes

• Mutable Indexes

• Roadmap

Page 51

Roadmap

• Next release of Phoenix

• Performance testing

• Increased adoption

• Adding to HBase (?)

Page 52

Open Source!

• Main: https://github.com/forcedotcom/phoenix

• Indexing: https://github.com/forcedotcom/phoenix/tree/mutable-si

Page 53

(obligatory hiring slide)

We’re Hiring!

Page 54

Questions? Comments?

@jesse_yates

