Using the PostgreSQL Extension Ecosystem for Advanced Analytics

Date post: 15-Jan-2017
Upload: chartio
Transcript
Page 1: Using the PostgreSQL Extension Ecosystem for Advanced Analytics

[email protected] (855) 232-0320

Using the PostgreSQL Extension Ecosystem for Advanced Analytics

Page 2: Using the PostgreSQL Extension Ecosystem for Advanced Analytics

Agenda

- The problem
  - The prevailing view vs. the practical reality
- A possible solution
  - Or just building blocks?
- Nearness
  - Near at hand, near to our skill set, near to our capabilities
- A more complete solution
  - The PostgreSQL extension ecosystem

Page 3: Using the PostgreSQL Extension Ecosystem for Advanced Analytics

The Problem
The Prevailing View vs. The Practical Reality

Page 4: Using the PostgreSQL Extension Ecosystem for Advanced Analytics

The Prevailing View - Logical

- Schema objects
  - Relational: Structured rows and columns; Schema on write; Referential integrity; Painful migrations
  - Non-Relational: Unstructured files, docs, etc.; Schema on read; No referential integrity; No migrations
- Query languages
  - Relational: SQL; Declarative; Easy enough for non-tech users
  - Non-Relational: Various; Procedural; Requires some programming skills
- Exploratory analysis
  - Relational: Native support for joins; Interactive/low execution overhead
  - Non-Relational: No native support for joins; OLAP - Batch processing
- Data science and ML
  - Relational: Only descriptive statistics; Requires exporting dumps/samples
  - Non-Relational: Robust ecosystem; Does not require exports

Page 5: Using the PostgreSQL Extension Ecosystem for Advanced Analytics

The Prevailing View - Physical

- Parallel query processing
  - Relational: Single node system; Single process per query
  - Non-Relational: Multiple node system; Multiple processes per query
- Concurrency
  - Relational: High concurrency; Single process per connection
  - Non-Relational: OLAP - low concurrency/high scheduling overhead
- High Availability & Replication
  - Relational: Async and sync replication; HA may not be native
  - Non-Relational: Async and sync replication; HA likely to be native
- Sharding
  - Relational: Sharding may not be native; Difficult to manage
  - Non-Relational: Sharding likely to be native; Easy to manage

Page 6: Using the PostgreSQL Extension Ecosystem for Advanced Analytics

The Prevailing View - Summary

- RDBMS have nice properties for producing rich data
  - ACID, relational integrity, constraints, strong data types
  - Easier for non-tech users and exploratory analysis
- Probably don’t meet the needs of today’s analysts
  - Data science & Machine Learning
  - Parallel processing
- Definitely don’t meet the needs of today’s apps
  - Schema migrations
  - Replication and sharding

Page 7: Using the PostgreSQL Extension Ecosystem for Advanced Analytics

The Practical Reality

Page 8: Using the PostgreSQL Extension Ecosystem for Advanced Analytics

The Practical Reality

But we still want more advanced functionality.

Page 9: Using the PostgreSQL Extension Ecosystem for Advanced Analytics

A Possible Solution
Or Just Building Blocks?

Page 10: Using the PostgreSQL Extension Ecosystem for Advanced Analytics

Modern SQL

- Many people still think of SQL in terms of SQL-92
- Since then we’ve had: SQL:1999, SQL:2003, SQL:2006, SQL:2008, SQL:2011
  - http://use-the-index-luke.com/blog/2015-02/modern-sql
- Common Table Expressions (CTEs) / Recursive CTEs
- Window Functions
- Ordered-set Aggregates
- Lateral joins
- Temporal support
- The list goes on...
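
To make two of these features concrete, here is a minimal sketch of a recursive CTE and a window function. It is not from the slides: the `sales` table and its rows are invented for illustration, and the queries run against SQLite through Python's standard `sqlite3` module only so that the example is self-contained (window functions need SQLite 3.25 or later). Both SELECT statements use standard syntax and should carry over to PostgreSQL unchanged.

```python
import sqlite3

# SQLite stands in for PostgreSQL here so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, month_no INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",  # '?' placeholders are sqlite3-specific
    [("east", 1, 10), ("east", 2, 20), ("east", 3, 5),
     ("west", 1, 7), ("west", 2, 3)],
)

# Recursive CTE: generate the integers 1..5 without a helper table.
RECURSIVE_CTE = """
WITH RECURSIVE counter(i) AS (
    SELECT 1
    UNION ALL
    SELECT i + 1 FROM counter WHERE i < 5
)
SELECT i FROM counter ORDER BY i
"""

# Window function: per-region running total, with no self-join.
RUNNING_TOTAL = """
SELECT region, month_no, amount,
       SUM(amount) OVER (PARTITION BY region ORDER BY month_no) AS running_total
FROM sales
ORDER BY region, month_no
"""

counter_rows = [row[0] for row in conn.execute(RECURSIVE_CTE)]
running_rows = conn.execute(RUNNING_TOTAL).fetchall()

print(counter_rows)
for row in running_rows:
    print(row)
```

The running total resets for each region because of `PARTITION BY`, which is the kind of result that would need a self-join or application code in SQL-92.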

Page 11: Using the PostgreSQL Extension Ecosystem for Advanced Analytics

Procedural Languages

- Native: PL/pgSQL, PL/Tcl, PL/Perl, PL/Python
- Community: Java, PHP, R, JavaScript, Ruby, Scheme, sh

Page 12: Using the PostgreSQL Extension Ecosystem for Advanced Analytics

Building Blocks

These solve some problems. For others, they are just building blocks.

Page 13: Using the PostgreSQL Extension Ecosystem for Advanced Analytics

Nearness
Near at Hand
Near to Our Skill Set
Near to Our Capabilities

Page 14: Using the PostgreSQL Extension Ecosystem for Advanced Analytics

Nearness

- http://www.infoq.com/presentations/Simple-Made-Easy

Page 15: Using the PostgreSQL Extension Ecosystem for Advanced Analytics

Nearness Drives Adoption

- Near at hand
  - Easily installable
- Near to our skill set
  - Familiar tool/language/abstraction
  - Modular and composable
- Near to our capabilities
  - Capable of solving a problem in our domain

Page 16: Using the PostgreSQL Extension Ecosystem for Advanced Analytics

A More Complete Solution
The PostgreSQL Extension Ecosystem

Page 17: Using the PostgreSQL Extension Ecosystem for Advanced Analytics

Postgres Extension Ecosystem Examples

- PostgreSQL Extension Network: http://pgxn.org/
- UDFs & operators: https://github.com/eulerto/pg_similarity
- UDAs & data types: https://github.com/aggregateknowledge/postgresql-hll
- Foreign Data Wrappers: http://multicorn.org/, https://github.com/shish/pgosquery
- Indexes: https://github.com/zombodb/zombodb
- Composing Extension Methods: http://doc.madlib.net/
- MPP: https://www.citusdata.com/, https://github.com/greenplum-db/gpdb
- Composing Extensions
  - Custom Background Workers: https://github.com/no0p/alps
  - Record linking: http://no0p.github.io/2015/10/20/record_linking.html#/

Page 19: Using the PostgreSQL Extension Ecosystem for Advanced Analytics

The PostgreSQL Extension Network

- Package Manager: pgxn
- Index/Network: http://pgxn.org/
- Comparable to PyPI, RubyGems, CPAN, CRAN

Page 20: Using the PostgreSQL Extension Ecosystem for Advanced Analytics

The PostgreSQL Extension Network

- Near at hand
  - pgxn search semver
  - pgxn info semver
  - pgxn install semver
  - pgxn load -d somedb semver
  - pgxn unload -d somedb semver
  - pgxn uninstall semver
- Without pgxn
  - Search GitHub? Google? Mailing list?
  - GitHub README?
  - git clone; make; make install;
  - psql -c "CREATE EXTENSION IF NOT EXISTS"
  - psql -c "DROP EXTENSION IF EXISTS"
  - make uninstall?

Page 22: Using the PostgreSQL Extension Ecosystem for Advanced Analytics

UDFs & Operators: pg_similarity

- Near to our capabilities
  - Similarity coefficient algorithms
    - L1 Distance
    - Cosine Distance
    - Dice Coefficient
    - Euclidean Distance
    - Hamming Distance
    - Jaccard Coefficient
    - Jaro Distance
    - Jaro-Winkler Distance
    - Levenshtein Distance
    - Matching Coefficient
    - Monge-Elkan Coefficient
    - Needleman-Wunsch Coefficient
    - Overlap Coefficient
    - Q-Gram Distance
    - Smith-Waterman Coefficient
    - Smith-Waterman-Gotoh Coefficient
    - Soundex Distance
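
As a rough illustration of what two of the listed measures compute, here is a plain-Python sketch of Levenshtein distance and the Jaccard coefficient. It is not pg_similarity's code: the function names, the whitespace tokenizer, and the choice to return a raw edit count are assumptions made for the example, and the extension's own tokenization and normalization may differ.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character inserts, deletes, and substitutions."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]                   # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


def jaccard(a: str, b: str) -> float:
    """|A & B| / |A | B| over lower-cased, whitespace-separated tokens."""
    tokens_a = set(a.lower().split())
    tokens_b = set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty strings are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


print(levenshtein("kitten", "sitting"))     # 3
print(jaccard("big data tools", "big data"))
```

The appeal of the extension is that measures like these are exposed as SQL functions and operators, so the comparison runs next to the data instead of in application code.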

Page 23: Using the PostgreSQL Extension Ecosystem for Advanced Analytics

UDFs & Operators: pg_similarity

- Near to our skill set

Page 24: Using the PostgreSQL Extension Ecosystem for Advanced Analytics

UDFs & Operators: pg_similarity

- Implementation

Page 26: Using the PostgreSQL Extension Ecosystem for Advanced Analytics

UDAs & Data Types: postgresql-hll

- Near to our capabilities & near to our skill set
  - Data type
  - Estimates count distinct with tunable precision
  - A 1280-byte value estimates tens of billions of distinct values with an error of a few percent
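
The idea behind such a data type can be sketched in a few lines of Python. This is a simplified HyperLogLog, not postgresql-hll's implementation: the register layout (2^11 registers of 5 bits each, which is one way to arrive at 1280 bytes, since 2048 × 5 / 8 = 1280), the BLAKE2b hash, and all function names are assumptions chosen for the example.

```python
import hashlib
import math

LOG2M = 11                        # assumption: 2**11 = 2048 registers
NUM_REGISTERS = 1 << LOG2M
REGISTER_BITS = 5                 # assumption: 2048 * 5 bits = 1280 bytes
MAX_RANK = (1 << REGISTER_BITS) - 1


def hash64(value) -> int:
    """Hash any value to a 64-bit integer (BLAKE2b is an arbitrary choice)."""
    digest = hashlib.blake2b(repr(value).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def hll_add(registers: list, value) -> None:
    """Fold one value into the sketch; duplicates leave it unchanged."""
    hashed = hash64(value)
    index = hashed & (NUM_REGISTERS - 1)   # low bits choose a register
    rest = hashed >> LOG2M                 # remaining bits feed the rank
    rank = 1                               # 1-based position of lowest set bit
    while rest & 1 == 0 and rank < MAX_RANK:
        rest >>= 1
        rank += 1
    registers[index] = max(registers[index], rank)


def hll_estimate(registers: list) -> float:
    """Estimate the number of distinct values added so far."""
    m = len(registers)
    alpha = 0.7213 / (1 + 1.079 / m)       # bias constant for m >= 128
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros:           # small counts: linear counting
        return m * math.log(m / zeros)
    return raw


registers = [0] * NUM_REGISTERS
for user_id in range(50_000):
    hll_add(registers, user_id)
    hll_add(registers, user_id)            # a repeat visit changes nothing
estimate = hll_estimate(registers)
print(f"estimated distinct: {estimate:.0f} (true: 50000)")
print(f"packed size: {NUM_REGISTERS * REGISTER_BITS // 8} bytes")
```

The sketch keeps only the largest rank seen per register, so its size stays fixed no matter how many values are added, and two sketches can be merged by taking the register-wise maximum.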

Page 27: Using the PostgreSQL Extension Ecosystem for Advanced Analytics

UDAs & Data Types: postgresql-hll

Page 28: Using the PostgreSQL Extension Ecosystem for Advanced Analytics

UDAs & Data Types: postgresql-hll

- Implementation

Page 30: Using the PostgreSQL Extension Ecosystem for Advanced Analytics

Foreign Data Wrappers: API

Page 31: Using the PostgreSQL Extension Ecosystem for Advanced Analytics

Foreign Data Wrappers: multicorn

- Near to our skill set

Page 32: Using the PostgreSQL Extension Ecosystem for Advanced Analytics

Foreign Data Wrappers: pgosquery

- Near at hand

Page 34: Using the PostgreSQL Extension Ecosystem for Advanced Analytics

Indexes: ZomboDB

- Index Access Method API
  - http://www.postgresql.org/docs/9.4/static/indexam.html

Page 35: Using the PostgreSQL Extension Ecosystem for Advanced Analytics

Postgres Extension Ecosystem Examples

- PostgreSQL Extension Network: http://pgxn.org/
- UDFs & Operators: https://github.com/eulerto/pg_similarity
- UDAs & Data Types: https://github.com/aggregateknowledge/postgresql-hll
- Foreign Data Wrappers: http://multicorn.org/, https://github.com/shish/pgosquery
- Indexes (GiST, GIN): https://github.com/zombodb/zombodb
- Composing Extension Methods: http://doc.madlib.net/
- MPP: https://www.citusdata.com/, https://github.com/greenplum-db/gpdb
- Composing Extensions
  - Custom Background Workers: https://github.com/no0p/alps
  - Record linking: http://no0p.github.io/2015/10/20/record_linking.html#/

Page 36: Using the PostgreSQL Extension Ecosystem for Advanced Analytics

Composing Extension Methods: MADlib

- Near to our capabilities

Page 37: Using the PostgreSQL Extension Ecosystem for Advanced Analytics

Composing Extension Methods: MADlib

- Near to our skill set

Page 38: Using the PostgreSQL Extension Ecosystem for Advanced Analytics

Composing Extension Methods: MADlib

Page 40: Using the PostgreSQL Extension Ecosystem for Advanced Analytics

Parallel Processing

- Parallel sequential scan
  - http://rhaas.blogspot.com/2015/11/parallel-sequential-scan-is-committed.html
- Columnar FDW
  - https://github.com/citusdata/cstore_fdw

Page 42: Using the PostgreSQL Extension Ecosystem for Advanced Analytics

Composing Extensions: Alps

Page 43: Using the PostgreSQL Extension Ecosystem for Advanced Analytics

Composing Extensions: Record Linking

Page 44: Using the PostgreSQL Extension Ecosystem for Advanced Analytics

Beyond Analytics

- Web app framework
  - http://blog.aquameta.com/
- REST API
  - https://github.com/begriffs/postgrest
- Unit testing framework
  - http://pgtap.org/
- Firewall
  - https://github.com/uptimejp/sql_firewall
- More every week!

Page 45: Using the PostgreSQL Extension Ecosystem for Advanced Analytics

Conclusion

- With PostgreSQL, you get
  - more than rows and columns
  - more than SELECT, FROM, WHERE, GROUP BY, ORDER BY
  - more than a single machine
- Make sure you get the full return on your investment!

Get your Chartio free trial!

[email protected]

(855) 232-0320

