Key/Value Pair versus hstore - Benchmarking Entity-Attribute-Value Structures in PostgreSQL.
Michel Ott
June 17, 2011, University of Applied Science Rapperswil
What is KVP?
"A key-value pair (KVP) is a set of two linked data items: a key, which is a unique identifier for some item of data, and the value, which is either the data that is identified or a pointer to the location of that data."
(Source: Techtarget, http://searchenterprisedesktop.techtarget.com/definition/key-value-pair)
"... an open-ended data structure that allows for future extension without modifying existing code or data."
(Source: Wikipedia, http://en.wikipedia.org/wiki/Attribute-value_pair)
{ "data": [
    { "amenity": "restaurant", "name": "Godman" },
    { "amenity": "university", "name": "Harvard University" }
] }
Agenda
Introduction
What is hstore?
KVP Schema in PostgreSQL
Setup / Environment
Performance Benchmark Design
Test Environment
Benchmark (May 2011)
Results
Findings
Conclusion
INTRODUCTION
What is hstore?
KVP Schema in PostgreSQL
What is hstore?
Hstore in PostgreSQL
Storage for semi-structured data (à la Perl hash)
Stores associative arrays in an attribute of a table
Is an abstract data type in PostgreSQL
Provides a set of PostgreSQL functions for querying, transforming, manipulating, ...
Usage of hstore
CREATE TABLE bench_hstore (
    id BIGINT PRIMARY KEY,
    -- (...) further columns elided on the slide
    kvp_hstore HSTORE
);
What is hstore?
Usage of hstore
INSERT INTO bench_hstore(id, kvp_hstore) VALUES (
    1, 'amenity=>restaurant, name=>Godman'::hstore
);
id : BIGINT    kvp_hstore : HSTORE
1              amenity=>restaurant, name=>Godman
SELECT kvp_hstore->'name' AS name
FROM bench_hstore
WHERE kvp_hstore->'amenity' = 'restaurant';
KVP Schema in PostgreSQL
Schema
Two tables are needed: one for the unforeseen, arbitrary data (the KVP table) and one for the additional data
Usage
CREATE TABLE bench_kvp_info (
    id BIGINT PRIMARY KEY
    -- (...) further columns elided on the slide
);
CREATE TABLE bench_kvp (
    id BIGINT REFERENCES bench_kvp_info(id),
    key TEXT NOT NULL,
    value TEXT
);
KVP Schema in PostgreSQL
Usage
INSERT INTO bench_kvp_info(id)
    VALUES (1);
INSERT INTO bench_kvp(id, key, value)
    VALUES (1, 'amenity', 'restaurant');
INSERT INTO bench_kvp(id, key, value)
    VALUES (1, 'name', 'Godman');
-- The scalar subquery assumes that at most one entity matches the pair.
SELECT * FROM bench_kvp WHERE id = (
    SELECT id FROM bench_kvp
    WHERE key = 'amenity' AND value = 'restaurant');
id : BIGINT    key : TEXT    value : TEXT
1              amenity       restaurant
1              name          Godman
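The KVP usage above can be tried end to end without a PostgreSQL server. The following is a minimal sketch, assuming SQLite (from the Python standard library) as a stand-in for PostgreSQL and INTEGER in place of BIGINT; the helper name find_entity is hypothetical and not part of the benchmark.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL here; INTEGER replaces BIGINT.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bench_kvp_info (id INTEGER PRIMARY KEY);
    CREATE TABLE bench_kvp (
        id    INTEGER REFERENCES bench_kvp_info(id),
        key   TEXT NOT NULL,
        value TEXT
    );
""")
conn.execute("INSERT INTO bench_kvp_info(id) VALUES (1)")
conn.executemany(
    "INSERT INTO bench_kvp(id, key, value) VALUES (?, ?, ?)",
    [(1, "amenity", "restaurant"), (1, "name", "Godman")],
)


def find_entity(conn, key, value):
    """Hypothetical helper: return all pairs of the entity owning (key, value)."""
    rows = conn.execute(
        """
        SELECT key, value FROM bench_kvp
        WHERE id = (SELECT id FROM bench_kvp WHERE key = ? AND value = ?)
        """,
        (key, value),
    ).fetchall()
    return dict(rows)


restaurant = find_entity(conn, "amenity", "restaurant")
print(restaurant)
```

Note that every lookup touches the KVP table twice (once to find the id, once to collect the pairs), which is the access pattern the EXPLAIN plans for KVP show as an InitPlan plus an outer scan.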
SETUP / ENVIRONMENT
Performance Benchmark Design
Table Schemas
Test Data
Test Environment
Performance Benchmark Design
Table Schema
CREATE TABLE bench_kvp_info (
    id BIGINT PRIMARY KEY
    -- (...) further columns elided on the slide
);
CREATE TABLE bench_kvp (
    id BIGINT REFERENCES bench_kvp_info(id),
    key TEXT NOT NULL,
    value TEXT
);
CREATE TABLE bench_hstore (
    id BIGINT PRIMARY KEY,
    -- (...) further columns elided on the slide
    kvp_hstore HSTORE
);
Performance Benchmark Design
Test Data
12 different data sets with the following numbers of records:
10       100      500
1000     2500     5000
10000    20000    35000
50000    100000   250000
Each data set is tested twice (once with an index and once without)
GiST (Generalized Search Tree) is used for the hstore index (GiST is also a basis for B-Tree and R-Tree style indexes)
Hence
12 [# of lengths] × 2 [# of types] × 2 [# of indices] × 3 [warm start] = 144 [cycles]
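The cycle count is simply the size of the cross product of the four benchmark dimensions. A minimal sketch that enumerates them, assuming the two types are the KVP and hstore schemas, the two index settings are "without" and "with", and the warm start means three repetitions per configuration (the labels are illustrative, not taken from the benchmark scripts):

```python
from itertools import product

# The 12 data set sizes listed on the slide.
lengths = [10, 100, 500, 1000, 2500, 5000,
           10000, 20000, 35000, 50000, 100000, 250000]
types = ["kvp", "hstore"]    # assumption: the two schemas under test
indices = [False, True]      # without index / with index
warm_starts = range(3)       # assumption: 3 repetitions per configuration

# One tuple per benchmark cycle: (length, type, indexed, repetition).
cycles = list(product(lengths, types, indices, warm_starts))
print(len(cycles))
```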
Test Data
Test Data Schema
Column                    Description
id : integer, sequence    Mandatory. A unique sequence identifier.
surname : Text            Mandatory. A fancy name.
forename : Text           Optional. A fancy name. Can be empty to have a variable KVP length.
zip : Integer             Optional. A number between 1000 and 9000.
comment : Text            Optional. A dummy text.
Example
1,cucyp,ecnalehad,6593,lorem ipsum dolor sit amet
2,kasarzyc,,6593,
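Test data of this shape is easy to generate. The sketch below is one possible generator, not the one used for the benchmark: the function names, the name lengths (5 to 9 letters), and the 50% chance of leaving an optional column empty are all assumptions chosen for illustration.

```python
import csv
import io
import random
import string


def fancy_name(rng, min_len=5, max_len=9):
    """Random lowercase word; the length bounds are an assumption."""
    length = rng.randint(min_len, max_len)
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def make_rows(n, seed=0, optional_prob=0.5):
    """Build n rows: id, surname (mandatory), forename, zip, comment (optional)."""
    rng = random.Random(seed)
    rows = []
    for i in range(1, n + 1):
        surname = fancy_name(rng)
        # Optional columns are left empty at random to vary the KVP length.
        forename = fancy_name(rng) if rng.random() < optional_prob else ""
        zip_code = rng.randint(1000, 9000) if rng.random() < optional_prob else ""
        comment = "lorem ipsum dolor sit amet" if rng.random() < optional_prob else ""
        rows.append([i, surname, forename, zip_code, comment])
    return rows


def to_csv(rows):
    """Serialize rows in the comma-separated layout of the example above."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


rows = make_rows(10)
text = to_csv(rows)
print(text)
```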
Test Environment
Technical Specification
Intel(R) Xeon(R) CPU E5520 @ 2.27 GHz, 64-bit
3 CPUs, 4 cores and 8 threads
24 GB RAM
Software
Ubuntu 10.04.2 LTS
PostgreSQL 9.0.4
Python 2.6.5, NumPy, SciPy, Matplotlib
No tuning of software
BENCHMARK MAY 2011
Results
Findings
Analyze statements
Functionality of hstore
Conclusion
Results
(Slides 16 to 21 showed the benchmark result charts; the figures are not reproduced in this text.)
Findings
Table size (formulas not recoverable from the source; surviving terms: tuples, entries, array, value, null)
hstore: ...
KVP: ..., whereas ...
Explain Analyze for KVP and hstore
Variant                  Cost         Runtime     Scans
KVP, without index       0..437.03    3.607 ms    Seq scan
KVP, index on key        0..406.88    2.770 ms    1 heap & 1 index scan
KVP, combined index      0..215.91    2.028 ms    1 index scan
hstore, without index    0..213.72    1.883 ms    Seq scan
hstore, GiST index       0..11.33     0.721 ms    1 heap & 1 index scan
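To put the runtimes side by side, each one can be expressed as a multiple of the fastest variant. The sketch below uses only the five runtimes reported above; since each is a single EXPLAIN ANALYZE measurement, the ratios are indicative rather than statistically robust.

```python
# Runtimes in milliseconds, copied from the Explain Analyze summary.
runtimes_ms = {
    "KVP, without index": 3.607,
    "KVP, index on key": 2.770,
    "KVP, combined index": 2.028,
    "hstore, without index": 1.883,
    "hstore, GiST index": 0.721,
}

baseline = runtimes_ms["hstore, GiST index"]
# How many times slower each variant is than hstore with a GiST index.
slowdown = {name: ms / baseline for name, ms in runtimes_ms.items()}

for name, factor in sorted(slowdown.items(), key=lambda item: item[1]):
    print(f"{name:24s} {factor:.2f}x")
```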
Functionality of hstore
Buffers the whole hstore
Each hstore key-value pair knows:
its position in the string
its length
its value and the value's length
Hstore data type
CREATE TYPE hstore (
    INTERNALLENGTH = -1,
    INPUT = hstore_in,
    OUTPUT = hstore_out,
    RECEIVE = hstore_recv,
    SEND = hstore_send,
    STORAGE = extended
);
Functionality of hstore
-> as an example of the available operators
Procedure is linked to a PostgreSQL function
CREATE OPERATOR -> (
    LEFTARG = hstore,
    RIGHTARG = text,
    PROCEDURE = fetchval
);
CREATE OR REPLACE FUNCTION fetchval(hstore, text)
RETURNS text
AS 'MODULE_PATHNAME', 'hstore_fetchval'
LANGUAGE C STRICT IMMUTABLE;
Functionality of hstore
PostgreSQL function is linked to a C function
The hstore_fetchval function returns the value by calling the get_val function, which loops over the buffer and returns the position
Example
id : BIGINT    kvp_hstore : HSTORE
1              zip=>8000, surname=>ebsaveq
2              zip=>6489, surname=>epofod
3              zip=>8000, surname=>kjuefs
SELECT kvp_hstore->'surname' AS surname
FROM bench_hstore
WHERE kvp_hstore->'zip' = '8000';
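The lookup described on this slide can be modeled in a few lines. The following is a conceptual sketch only, based on the slide's description (one buffer, entries that know position and lengths, a loop that finds the pair); it is not a transcription of the hstore C source, and the Entry layout, the pack helper, and the return type of get_val are assumptions made for illustration.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Entry(NamedTuple):
    pos: int      # offset of the key inside the buffer
    keylen: int   # length of the key
    vallen: int   # length of the value, stored right after the key


def pack(pairs: Dict[str, str]) -> Tuple[str, List[Entry]]:
    """Pack all pairs into one string buffer plus an entry table (assumed layout)."""
    chunks, entries, pos = [], [], 0
    for key, val in pairs.items():
        entries.append(Entry(pos, len(key), len(val)))
        chunks.append(key + val)
        pos += len(key) + len(val)
    return "".join(chunks), entries


def get_val(buf: str, entries: List[Entry], key: str) -> Optional[Entry]:
    """Loop over the entries and return the one whose key matches, if any."""
    for entry in entries:
        if entry.keylen == len(key) and buf[entry.pos:entry.pos + entry.keylen] == key:
            return entry
    return None


def fetchval(buf: str, entries: List[Entry], key: str) -> Optional[str]:
    """Model of the -> operator: slice the value out of the buffer."""
    entry = get_val(buf, entries, key)
    if entry is None:
        return None
    start = entry.pos + entry.keylen
    return buf[start:start + entry.vallen]


# The three example rows from the slide.
table = [pack({"zip": "8000", "surname": "ebsaveq"}),
         pack({"zip": "6489", "surname": "epofod"}),
         pack({"zip": "8000", "surname": "kjuefs"})]
surnames = [fetchval(buf, entries, "surname")
            for buf, entries in table
            if fetchval(buf, entries, "zip") == "8000"]
print(surnames)
```

Without an index, a query has to run this per-row lookup for every row in the table, which matches the sequential scan in the hstore plan without an index.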
Conclusion
For small data sets (< 500 records) KVP is preferable
However
500 records is easily exceeded
Changing schema involves huge effort
Transposing data
Changing database table schema
Possibly refactoring software to new schema
If unsure about size, use hstore as data type
KVP is only 0.45 ms faster at 500 records
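The "transposing data" step of a later KVP-to-hstore migration amounts to grouping the KVP rows by id and rendering each group as one hstore value. A minimal sketch, assuming rows arrive as (id, key, value) tuples; the function names are hypothetical, and the output follows the hstore text format of double-quoted "key"=>"value" pairs with backslash escaping and an unquoted NULL for missing values:

```python
from collections import defaultdict


def transpose(kvp_rows):
    """Group (id, key, value) rows into one {key: value} dict per id."""
    entities = defaultdict(dict)
    for entity_id, key, value in kvp_rows:
        entities[entity_id][key] = value
    return dict(entities)


def quote(text):
    """Double-quote a key or value, escaping backslashes and quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_hstore_literal(pairs):
    """Render a dict in hstore text format; None becomes an unquoted NULL."""
    parts = []
    for key, value in sorted(pairs.items()):
        rendered = "NULL" if value is None else quote(value)
        parts.append(f"{quote(key)}=>{rendered}")
    return ", ".join(parts)


kvp_rows = [(1, "amenity", "restaurant"), (1, "name", "Godman")]
entities = transpose(kvp_rows)
literal = to_hstore_literal(entities[1])
print(literal)
```

The resulting string can be passed as a parameter and cast with ::hstore on insert; the table and application changes listed above still have to be made separately.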
Thank You
Merci
Grazie
Gracias
Obrigado
Danke
BACKUP
Findings
Explain Analyze for KVP
Seq Scan on bench_kvp  (cost=229.38..437.03 rows=3 width=60) (actual time=3.125..3.579 rows=2 loops=1)
  Filter: (bench_id = $0)
  InitPlan 1 (returns $0)
    ->  Seq Scan on bench_kvp  (cost=0.00..229.38 rows=1 width=8) (actual time=1.406..2.162 rows=1 loops=1)
          Filter: ((key = 'id'::text) AND (value = '1735'::text))
Total runtime: 3.607 ms
Findings
Explain Analyze for KVP with index on attribute key
Seq Scan on bench_kvp  (cost=199.48..406.88 rows=3 width=60) (actual time=2.268..2.730 rows=2 loops=1)
  Filter: (bench_id = $0)
  InitPlan 1 (returns $0)
    ->  Bitmap Heap Scan on bench_kvp  (cost=62.99..199.48 rows=1 width=8) (actual time=0.925..1.227 rows=1 loops=1)
          Recheck Cond: (key = 'id'::text)
          Filter: (value = '1735'::text)
          ->  Bitmap Index Scan on kvpidx  (cost=0.00..62.99 rows=2499 width=0) (actual time=0.373..0.373 rows=2500 loops=1)
                Index Cond: (key = 'id'::text)
Total runtime: 2.770 ms
Findings
Explain Analyze for KVP with combined index
Seq Scan on bench_kvp  (cost=8.27..215.91 rows=3 width=60) (actual time=1.376..1.954 rows=5 loops=1)
  Filter: (bench_id = $0)
  InitPlan 1 (returns $0)
    ->  Index Scan using kvpidx2 on bench_kvp  (cost=0.00..8.27 rows=1 width=8) (actual time=0.048..0.049 rows=1 loops=1)
          Index Cond: ((key = 'id'::text) AND (value = '1735'::text))
Total runtime: 2.028 ms
(6 rows)
Findings
Explain Analyze for hstore
Seq Scan on bench_hstore  (cost=0.00..213.72 rows=45 width=40) (actual time=1.318..1.778 rows=1 loops=1)
  Filter: ((bench_hstore -> 'id'::text) = '1735'::text)
Total runtime: 1.883 ms
Explain Analyze for hstore with index
Bitmap Heap Scan on bench_hstore  (cost=4.27..11.33 rows=2 width=218) (actual time=0.481..0.534 rows=1 loops=1)
  Recheck Cond: (bench_hstore @> '"id"=>"1735"'::hstore)
  ->  Bitmap Index Scan on hidx_2_5k  (cost=0.00..4.27 rows=2 width=0) (actual time=0.308..0.308 rows=70 loops=1)
        Index Cond: (bench_hstore @> '"id"=>"1735"'::hstore)
Total runtime: 0.721 ms