Old and New Tricks with GIN Heikki Linnakangas / VMware March 20, 2014
Page 1: Old and New Tricks with GIN - iki.fi

Old and New Tricks with GIN

Heikki Linnakangas / VMware

March 20, 2014

Page 2: Old and New Tricks with GIN - iki.fi

What is GIN?

Generalized Inverted iNdex

Used to index things like

- full-text search
- arrays
- key/value pairs (hstore)
- json, xml (with expression indexes)

Page 3: Old and New Tricks with GIN - iki.fi

GIN example: Arrays

create table int_arrays (intarr integer[]);

create index intarr_gin on int_arrays using GIN (intarr);

insert into int_arrays

select array[g, random() * 1000, random() * 1000]

from generate_series(1,10000) g;

Page 4: Old and New Tricks with GIN - iki.fi

GIN example: Arrays

select * from int_arrays where intarr @> array[29, 95];

intarr

---------------

{4399,95,29}

{34355,29,95}

{59742,29,95}

{94927,95,29}

(4 rows)

Page 5: Old and New Tricks with GIN - iki.fi

GIN example: Array operators

At index creation / insertion:

1. Extract elements from array

2. Index the elements

Page 6: Old and New Tricks with GIN - iki.fi

GIN example: Array operators

At search:

1. Extract elements from query

2. Search the index for the elements

3. Return rows that contain all of them

@> - “contains”, must contain all elements

&& - “overlap”, must contain at least one element
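The insert-time and search-time steps for arrays can be sketched as a toy inverted index. This is only a minimal Python illustration of the idea, not PostgreSQL's implementation; the names `ToyArrayIndex`, `insert`, `contains` and `overlap` are made up for the example, and an empty query array is not handled the way PostgreSQL handles it.

```python
from collections import defaultdict


class ToyArrayIndex:
    """Toy inverted index: maps each array element to the rows containing it."""

    def __init__(self):
        self.postings = defaultdict(set)  # element -> set of row ids
        self.next_row_id = 0

    def insert(self, array):
        # Extract the elements and index each one under the new row id.
        row_id = self.next_row_id
        self.next_row_id += 1
        for element in set(array):
            self.postings[element].add(row_id)
        return row_id

    def contains(self, query):
        # Like @>: a row must be listed under every query element.
        sets = [self.postings.get(e, set()) for e in set(query)]
        if not sets:
            return set()  # simplification: empty queries are not modelled
        return set.intersection(*sets)

    def overlap(self, query):
        # Like &&: a row must be listed under at least one query element.
        return set().union(*(self.postings.get(e, set()) for e in set(query)))


index = ToyArrayIndex()
index.insert([4399, 95, 29])   # row 0
index.insert([17, 29, 500])    # row 1
index.insert([8, 640, 95])     # row 2
index.insert([3, 4, 5])        # row 3

contains_result = index.contains([29, 95])  # rows holding both 29 and 95
overlap_result = index.overlap([29, 95])    # rows holding 29 or 95
```

Here `contains_result` holds only row 0, while `overlap_result` holds rows 0, 1 and 2.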

Page 7: Old and New Tricks with GIN - iki.fi

Operator classes

PostgreSQL is extensible.

The operations to extract elements, search, and combine results are defined by an operator class

Built-in operator classes for arrays, full-text search, etc.

Page 8: Old and New Tricks with GIN - iki.fi

Three fundamental GIN operations

1. Extract keys from a value to insert or query

System calls the opclass’ extractQuery / extractValue function

2. Index them

System stores the extracted keys in a B-tree, using the opclass’ compare function.

3. Combine matches of several keys efficiently

System calls the opclass’ consistent function to determine if the item with a combination of keys matches the overall query.

Page 9: Old and New Tricks with GIN - iki.fi

GIN examples: Full-text search (1/2)

At insert:

1. Extract words from text:

‘PostgreSQL - The world’s most advanced open source database’ -> “postgresql”, “world”, “advanc”, “open”, “sourc”, “databas”

2. Index the words in the B-tree within the GIN index.

Page 10: Old and New Tricks with GIN - iki.fi

GIN examples: Full-text search (2/2)

At search:

1. Extract words from query

2. Fetch all items containing any of the words

3. Determine which items match the overall query

Full-text search has a mini parser and syntax of its own:

select plainto_tsquery('an advanced open source database');

plainto_tsquery

-----------------------------------------

'advanc' & 'open' & 'sourc' & 'databas'

(1 row)

Page 11: Old and New Tricks with GIN - iki.fi

GIN examples: Trigrams (1/2)

At insert:

1. Extract trigrams from text:

foobar -> ‘  f’, ‘ fo’, ‘foo’, ‘oob’, ‘oba’, ‘bar’, ‘ar ’ (the word is padded with two spaces before and one space after)

2. Index them
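The extraction step can be sketched in a few lines of Python. This mimics the padding used by pg_trgm (two spaces before each word, one after) for plain ASCII text only; the function name `trigrams` is made up for the example, and the real extension handles more cases.

```python
import re


def trigrams(text):
    """Return the set of padded trigrams of each word in text."""
    result = set()
    # Words are runs of letters and digits; case is ignored.
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "  # two spaces before, one after
        for i in range(len(padded) - 2):
            result.add(padded[i:i + 3])
    return result


foobar_trigrams = trigrams("foobar")
```

For "foobar" this yields the seven trigrams listed on the slide, including the three that contain padding spaces.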

Page 12: Old and New Tricks with GIN - iki.fi

GIN examples: Trigrams (2/2)

At search:

1. Extract trigrams from query

2. Fetch all items containing any of the trigrams.

3. Determine which items match the overall query

must have at least N common trigrams.

- Can speed up LIKE searches!
- Also regular expressions!
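As a rough illustration of why trigrams help with LIKE, the sketch below answers a `LIKE '%fragment%'` style search against a toy trigram index: it fetches the rows that contain every trigram of the fragment, then rechecks the candidates against the real value. All names (`rows`, `index`, `like_substring`) are hypothetical, and the sketch assumes lowercase single-word values and fragments of at least three characters.

```python
from collections import defaultdict


def padded_trigrams(word):
    # Trigrams of one lowercase word, padded with two spaces before and one after.
    padded = "  " + word + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def inner_trigrams(fragment):
    # Trigrams of a substring pattern: no padding, because the fragment
    # may occur in the middle of a word.
    return {fragment[i:i + 3] for i in range(len(fragment) - 2)}


rows = ["foobar", "foobaz", "barfoo", "quux"]

# Build the index: trigram -> set of row numbers.
index = defaultdict(set)
for row_id, value in enumerate(rows):
    for trigram in padded_trigrams(value):
        index[trigram].add(row_id)


def like_substring(fragment):
    """Emulate: WHERE value LIKE '%fragment%' (fragment of 3+ characters)."""
    wanted = inner_trigrams(fragment)
    # Candidates must contain every trigram of the fragment.
    candidates = set.intersection(*(index.get(t, set()) for t in wanted))
    # Recheck: having all the trigrams is necessary but not sufficient.
    return sorted(r for r in candidates if fragment in rows[r])


matches = like_substring("ooba")
```

Only rows 0 and 1 contain both "oob" and "oba", so the recheck touches two values instead of all four.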

Page 13: Old and New Tricks with GIN - iki.fi

Three fundamental GIN operations

1. Extract keys from a value to insert or query

2. Index them

System stores the extracted keys in a B-tree, using the opclass’ compare function.

3. Determine which rows match, based on the keys present

Page 14: Old and New Tricks with GIN - iki.fi

Refresher: Regular B-tree

advanc: (0,8)
advanc: (0,14)
advanc: (0,22)
advanc: (0,17)
advanc: (0,26)
...
databas: (0,3)
databas: (2,10)
open: (0,11)
postgresql: (0,8)
postgresql: (0,41)
...

Page 15: Old and New Tricks with GIN - iki.fi

GIN on-disk format

Page 16: Old and New Tricks with GIN - iki.fi

Posting list

- A posting list contains pointers to the physical tuples in the table

- Each pointer consists of the page number and the offset within the page

(0,8) (0,14) (0,17) (0,22) (0,26) (0,33) (0,34) (0,35) (0,45) (0,47) (0,48) (1,3) (1,4) (1,6) (1,8)

Can be stored in-line in the entry tree, or as a whole separate B-tree (posting tree)

Page 17: Old and New Tricks with GIN - iki.fi

Posting tree page format

9.3 format

(0,8) (0,14) (0,17) (0,22) (0,26) (0,33) (0,34) (0,35) (0,45) (0,47) (0,48) (1,3) (1,4) (1,6) (1,8)

Each pointer takes 6 bytes (4 bytes for block number and 2 for offset): 90 bytes in total.

Page 18: Old and New Tricks with GIN - iki.fi

Posting tree page format

9.4 format

(0,8) +6 +3 +5 +4 +7 +1 +1 +10 +2 +1 +2051 +1 +2 +2

Stores the pointers in compressed format, as a difference from the previous item: 21 bytes in total!
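The two byte counts can be reproduced with a short calculation. The sketch below rests on several assumptions that are not stated on the slides: each pointer is packed into one integer with 11 bits for the offset, the first pointer is kept in full (6 bytes), each delta is stored as a variable-length integer with 7 payload bits per byte, and any header bytes are ignored. With this packing the delta across the page boundary comes out as 2003 rather than the 2051 printed on the slide; either value needs two bytes, so the total is the same.

```python
OFFSET_BITS = 11  # assumption: bits reserved for the offset within a page


def to_integer(block, offset):
    # Pack (block, offset) into one integer so that pointers sort naturally.
    return (block << OFFSET_BITS) | offset


def varbyte_length(value):
    # Bytes needed when each byte carries 7 bits of payload
    # plus one continuation bit.
    length = 1
    while value >= 0x80:
        value >>= 7
        length += 1
    return length


pointers = [(0, 8), (0, 14), (0, 17), (0, 22), (0, 26), (0, 33), (0, 34),
            (0, 35), (0, 45), (0, 47), (0, 48), (1, 3), (1, 4), (1, 6), (1, 8)]

# 9.3 format: every pointer stored in full, 6 bytes each.
uncompressed_size = 6 * len(pointers)

# 9.4 format: first pointer in full, the rest as varbyte-encoded deltas.
values = [to_integer(block, offset) for block, offset in pointers]
deltas = [cur - prev for prev, cur in zip(values, values[1:])]
compressed_size = 6 + sum(varbyte_length(d) for d in deltas)
```

Thirteen of the fourteen deltas fit in one byte and one needs two, which gives 6 + 13 + 2 = 21 bytes against 90 bytes uncompressed.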

Page 19: Old and New Tricks with GIN - iki.fi

9.4 Posting tree format - btree_gin example

(The btree_gin extension is a “dummy” opclass implementation that emulates a normal B-tree.)

create extension btree_gin;

create table numbers (n int4);

insert into numbers

select g % 10 from generate_series(1, 10000000) g;

create index numbers_btree on numbers (n);

create index numbers_gin on numbers using gin (n);

Page 20: Old and New Tricks with GIN - iki.fi

9.4 Posting tree format - btree_gin example

9.4

postgres=# \di+

List of relations

Schema | Name | ... | Size | ...

--------+---------------+-----+--------+-----

public | numbers_btree | | 214 MB |

public | numbers_gin | | 11 MB |

(2 rows)

9.3

Schema | Name | ... | Size | ...

--------+---------------+-----+--------+-----

public | numbers_btree | | 214 MB |

public | numbers_gin | | 58 MB |

(2 rows)

Page 21: Old and New Tricks with GIN - iki.fi

Wow!

Table 346 MB

B-tree index 214 MB

GIN (9.3) 58 MB

GIN (9.4) 11 MB

Page 22: Old and New Tricks with GIN - iki.fi

New posting list format in 9.4

- Much more compact
- The new code can still read old-format pages
  - pg_upgrade works
  - but you won’t get the benefit until you REINDEX.
- More expensive to do random updates
  - GIN isn’t very fast with random updates anyway...

Page 23: Old and New Tricks with GIN - iki.fi

Recap: Three fundamental GIN operations

1. Extract keys from a value to insert or query

2. Index them

3. Combine matches of several keys efficiently, and determine which items match the overall query

Page 24: Old and New Tricks with GIN - iki.fi

Consistent function

select plainto_tsquery(

'an advanced PostgreSQL open source database');

plainto_tsquery

--------------------------------------------------------

'postgresql' & 'advanc' & 'open' & 'sourc' & 'databas'

(1 row)

select * from foo where col @@ plainto_tsquery(

'an advanced PostgreSQL open source database'

)

Page 25: Old and New Tricks with GIN - iki.fi

3. Combine matches efficiently (0/4)

The query returns the following matches from the index:

advanc | databas | open   | postgresql | sourc
-------+---------+--------+------------+-------
(0,8)  | (0,3)   | (0,2)  | (0,8)      | (0,1)
(0,14) | (0,8)   | (0,8)  | (0,41)     | (0,2)
(0,17) | (0,43)  | (0,30) |            | (0,8)
(0,22) | (0,47)  | (0,33) |            | (0,12)
(0,26) | (1,32)  | (0,36) |            | (0,13)
(0,33) |         | (0,44) |            | (0,18)
(0,34) |         | (0,46) |            | (0,19)
(0,35) |         | (0,56) |            | (0,20)
(0,45) |         | (1,4)  |            | (0,26)
(0,47) |         | (1,22) |            | (0,34)
(0,48) |         | (1,24) |            | (0,35)
(1,3)  |         | (1,32) |            | (0,50)
(1,4)  |         | (1,39) |            | (1,1)
(1,6)  |         |        |            | (1,5)
(1,8)  |         |        |            | (1,6)
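The combining step over these posting lists can be sketched as follows. This is a simplified Python model, not GIN's actual code: it merges all five lists in pointer order, records for each item which keys were found, and asks a stand-in consistent function (here a plain AND, named `consistent_and` for the example) whether the item matches. Every entry of every list is visited.

```python
import heapq
import itertools

# Posting lists from the slide, as (block, offset) pairs in sorted order.
postings = {
    "advanc": [(0, 8), (0, 14), (0, 17), (0, 22), (0, 26), (0, 33), (0, 34),
               (0, 35), (0, 45), (0, 47), (0, 48), (1, 3), (1, 4), (1, 6),
               (1, 8)],
    "databas": [(0, 3), (0, 8), (0, 43), (0, 47), (1, 32)],
    "open": [(0, 2), (0, 8), (0, 30), (0, 33), (0, 36), (0, 44), (0, 46),
             (0, 56), (1, 4), (1, 22), (1, 24), (1, 32), (1, 39)],
    "postgresql": [(0, 8), (0, 41)],
    "sourc": [(0, 1), (0, 2), (0, 8), (0, 12), (0, 13), (0, 18), (0, 19),
              (0, 20), (0, 26), (0, 34), (0, 35), (0, 50), (1, 1), (1, 5),
              (1, 6)],
}


def consistent_and(check):
    # Stand-in for the consistent function of an all-AND query:
    # the item matches only if every key was found for it.
    return all(check)


def scan_all(postings, consistent):
    keys = list(postings)
    # Merge all posting lists into one stream ordered by pointer.
    stream = heapq.merge(*(
        [(tid, i) for tid in postings[key]] for i, key in enumerate(keys)
    ))
    matches = []
    for tid, group in itertools.groupby(stream, key=lambda pair: pair[0]):
        present = {i for _, i in group}
        check = [i in present for i in range(len(keys))]
        if consistent(check):
            matches.append(tid)
    return matches


matches = scan_all(postings, consistent_and)
```

With the data above, (0,1), (0,2) and (0,3) are rejected because some keys are missing, and (0,8) is the only item present in all five lists.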

Page 26: Old and New Tricks with GIN - iki.fi

3. Combine matches efficiently (1/4)

(0,1) contains only the word “sourc” -> no match

Page 27: Old and New Tricks with GIN - iki.fi

3. Combine matches efficiently (2/4)

(0,2) contains the words “open” and “sourc” -> no match

Page 28: Old and New Tricks with GIN - iki.fi

3. Combine matches efficiently (3/4)

(0,3) contains only the word “databas” -> no match

Page 29: Old and New Tricks with GIN - iki.fi

3. Combine matches efficiently (4/4)

(0,8) contains all the words -> match

Page 30: Old and New Tricks with GIN - iki.fi

Fast Scan

Instead of scanning through the posting lists of all the keywords, only scan through the list with fewest items, and skip the other lists to the next possible match.

- Big improvement for “frequent-term AND rare-term” style queries
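The skipping idea can be sketched in Python for the simplest case, a query where all keys are ANDed together. The function name `fast_scan_and` is made up, the binary search via `bisect_left` stands in for whatever skipping the index structure allows, and the real implementation is more general than this; treat it as an illustration under those assumptions.

```python
from bisect import bisect_left


def fast_scan_and(posting_lists):
    """Intersect sorted posting lists, driven by the shortest one.

    Expects at least one list.
    """
    lists = sorted(posting_lists, key=len)
    shortest, others = lists[0], lists[1:]
    positions = [0] * len(others)  # scan position in each longer list
    matches = []
    for tid in shortest:
        found_everywhere = True
        for n, other in enumerate(others):
            # Skip straight to the first entry >= tid; the entries before
            # it are never examined one by one.
            positions[n] = bisect_left(other, tid, positions[n])
            if positions[n] == len(other) or other[positions[n]] != tid:
                found_everywhere = False
                break
        if found_everywhere:
            matches.append(tid)
    return matches


# Posting lists for the five query words, as (block, offset) pairs.
postgresql = [(0, 8), (0, 41)]
databas = [(0, 3), (0, 8), (0, 43), (0, 47), (1, 32)]
open_ = [(0, 2), (0, 8), (0, 30), (0, 33), (0, 36), (0, 44), (0, 46),
         (0, 56), (1, 4), (1, 22), (1, 24), (1, 32), (1, 39)]
advanc = [(0, 8), (0, 14), (0, 17), (0, 22), (0, 26), (0, 33), (0, 34),
          (0, 35), (0, 45), (0, 47), (0, 48), (1, 3), (1, 4), (1, 6), (1, 8)]
sourc = [(0, 1), (0, 2), (0, 8), (0, 12), (0, 13), (0, 18), (0, 19),
         (0, 20), (0, 26), (0, 34), (0, 35), (0, 50), (1, 1), (1, 5), (1, 6)]

matches = fast_scan_and([advanc, databas, open_, postgresql, sourc])
```

Only the two entries of the “postgresql” list drive the scan: (0,8) is found in every other list, and (0,41) is rejected as soon as one list skips past it.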

Page 31: Old and New Tricks with GIN - iki.fi

Fast scan example

(0,8) contains all the words -> match

postgresql | databas | open   | advanc | sourc
-----------+---------+--------+--------+-------
(0,8)      | (0,3)   | (0,2)  | (0,8)  | (0,1)
(0,41)     | (0,8)   | (0,8)  | (0,14) | (0,2)
           | (0,43)  | (0,30) | (0,17) | (0,8)
           | (0,47)  | (0,33) | (0,22) | (0,12)
           | (1,32)  | (0,36) | (0,26) | (0,13)
           |         | (0,44) | (0,33) | (0,18)
           |         | (0,46) | (0,34) | (0,19)
           |         | (0,56) | (0,35) | (0,20)
           |         | (1,4)  | (0,45) | (0,26)
           |         | (1,22) | (0,47) | (0,34)
           |         | (1,24) | (0,48) | (0,35)
           |         | (1,32) | (1,3)  | (0,50)
           |         | (1,39) | (1,4)  | (1,1)
           |         |        | (1,6)  | (1,5)
           |         |        | (1,8)  | (1,6)

Page 32: Old and New Tricks with GIN - iki.fi

Summary: Improvements in 9.4

More compact posting list format

- 2x-10x smaller indexes, yay!

Fast scan

- Big speedup for queries with some frequent and some rare items

Thanks to Alexander Korotkov for these improvements!

Page 33: Old and New Tricks with GIN - iki.fi

Final GIN tip

GIN indexes are efficient at storing duplicates

- Use a GIN index with the btree_gin extension for status fields etc.

postgres=# \di+

List of relations

Schema | Name | ... | Size | ...

--------+---------------+-----+--------+-----

public | numbers_btree | | 214 MB |

public | numbers_gin | | 11 MB |

(2 rows)

Page 34: Old and New Tricks with GIN - iki.fi

Questions?

