On snakes and elephants
Using Python inside PostgreSQL

Jan Urbański

New Relic

PyWaw Summit 2015, Warsaw, May 26

Jan Urbanski (New Relic) On snakes and elephants PyWaw Summit 1 / 32

For those following at home

Getting the slides

$ wget http://wulczer.org/pywaw-summit.pdf

1 Introduction

Stored procedures

PostgreSQL’s specifics

2 The PL/Python language

Implementation

Examples

3 Using PL/Python

Real-life applications

Best practices

Outline

1 Introduction

Stored procedures

PostgreSQL’s specifics

2 The PL/Python language

3 Using PL/Python

What are stored procedures

- procedural code callable from SQL
- used to implement operations that are not easily expressed in SQL
- encapsulate business logic

Stored procedure examples

Calling stored procedures

    SELECT purge_user_records(142);

    SELECT lower(username) FROM users;

    CREATE TRIGGER notify_user_trig
        AFTER UPDATE ON users
        EXECUTE PROCEDURE notify_user();

Stored procedure languages

- most RDBMSs have one blessed language in which stored procedures can be written
  - Oracle has PL/SQL
  - MS SQL Server has T-SQL
- but Postgres is better

Stored procedures in Postgres

- a stored procedure in Postgres is
  - information about input and output types
  - metadata like the name, the owner, additional modifiers
  - finally, a bunch of text
- a procedural language in Postgres is just a C extension module exposing a single function
  - its job is to execute that piece of text, accepting and producing the prescribed types

Outline

1 Introduction

Stored procedures

PostgreSQL’s specifics

2 The PL/Python language

3 Using PL/Python

Extensibility is king

- stored procedures in Postgres can be written in any language...
- ... as long as a handler has been defined for it
- several languages are officially supported
  - PL/pgSQL, a PL/SQL look-alike
  - PL/Tcl, PL/Perl and PL/Python
- and there are a lot of unofficial ones
  - Ruby, Lua, PHP, Java, Scheme, V8, R...

Trusted vs untrusted languages

- once installed, trusted languages are available to all users
  - for example, PL/pgSQL or PL/V8
  - they need to provide a sandboxed execution environment for arbitrary user code
- the ability to create untrusted language functions is limited to database superusers

Outline

1 Introduction

2 The PL/Python language

Implementation

Examples

3 Using PL/Python

What PL/Python actually is

- the ability to run a Python interpreter inside the backend
- runs as the backend’s OS user, so untrusted
- can run arbitrary Python code, including doing very nasty or really crazy things
- but that’s the fun of it!

How does it work

- the first time a PL/Python function is run, a Python interpreter is initialised inside the backend process
  - preload plpython.so to avoid initial slowdown
  - use long-lived connections to only pay the overhead once
- Postgres types are transformed into Python types and vice versa
  - only works for built-in types, the rest gets passed using the string representation
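
A minimal sketch of the string fallback, in plain Python. It assumes a `point` argument reaches the function as its text form `(x,y)`, as the last bullet describes for types without a direct conversion. `parse_point` is a hypothetical helper, not part of PL/Python.

```python
def parse_point(text):
    """Parse the text form of a Postgres point, e.g. '(1.5,2)', into floats."""
    x, y = text.strip().strip("()").split(",")
    return float(x), float(y)


# Hypothetical value, as a function body might receive it.
coords = parse_point("(1.5,2)")
print(coords)
```

The same pattern applies to any type passed as a string: parse it at the top of the function, then work with ordinary Python values.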

How does it work (cont.)

- function arguments are visible as global variables
- the function has access to various magic globals that describe the execution environment
  - the plpy module providing database access and utility functions
  - a dictionary with the old and new tuples if called as a trigger
  - dictionaries kept in memory between queries, useful for caches
- the module path depends on the server process’s PYTHONPATH

Outline

1 Introduction

2 The PL/Python language

Implementation

Examples

3 Using PL/Python

PL/Python examples

Using Python modules

    create function histogram(a float[], bins int = 10)
        returns int[]
    as $$
        import numpy
        return numpy.histogram(a, bins)[0]
    $$ language plpythonu;

PL/Python examples

Using Python modules (cont.)

    create function get_host(url text) returns text as $$
        import urlparse
        return urlparse.urlparse(url).netloc
    $$ language plpythonu;
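
The `urlparse` module used above exists only in Python 2; in Python 3 the same function lives in `urllib.parse`. A sketch of the function body as plain Python that runs on either version (the standalone `get_host` wrapper is an illustration, not the slide's code):

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2, as on the slide


def get_host(url):
    # netloc is the "host[:port]" part of the URL
    return urlparse(url).netloc


host = get_host("http://wulczer.org/pywaw-summit.pdf")
print(host)
```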

PL/Python utility functions

Utility functions

    create function find_table(name text)
        returns text[] as $$
        import difflib
        sql = 'select tablename from pg_tables'
        result = plpy.execute(sql)
        all_names = [table['tablename'] for table in result]
        return difflib.get_close_matches(name, all_names)
    $$ language plpythonu;
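
The same logic can be tried outside the database. This sketch assumes `plpy.execute` returns rows that can be indexed by column name, as the slide's code does; `FakePlpy` and the table names are hypothetical stand-ins.

```python
import difflib


class FakePlpy(object):
    """Hypothetical stand-in for the plpy module; returns canned rows."""

    def __init__(self, rows):
        self.rows = rows

    def execute(self, sql):
        return self.rows


def find_table(plpy, name):
    result = plpy.execute("select tablename from pg_tables")
    all_names = [row["tablename"] for row in result]
    # get_close_matches keeps candidates with a similarity ratio >= 0.6
    return difflib.get_close_matches(name, all_names)


fake = FakePlpy([{"tablename": "users"}, {"tablename": "orders"}])
matches = find_table(fake, "usres")  # misspelled on purpose
print(matches)
```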

PL/Python utility functions

Utility functions (cont.)

    create function add_unique_user(name text, email text)
        returns text as $$
        # arguments are globals; copy them to locals before reassigning
        lname, lemail = name, email
        plan = plpy.prepare(
            'insert into users(name, email) values ($1, $2)',
            ('text', 'text'))
        while True:
            try:
                plpy.execute(plan, (lname, lemail))
            except plpy.spiexceptions.UniqueViolation:
                lname = lname + '_'
            else:
                return lname
    $$ language plpythonu;
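
A sketch of the retry loop on its own, with the database replaced by an in-memory set. `UniqueViolation` and `FakeUsersTable` here are hypothetical stand-ins for `plpy.spiexceptions.UniqueViolation` and the `users` table.

```python
class UniqueViolation(Exception):
    """Hypothetical stand-in for plpy.spiexceptions.UniqueViolation."""


class FakeUsersTable(object):
    """In-memory stand-in for a users table with a unique name column."""

    def __init__(self, names):
        self.names = set(names)

    def insert(self, name, email):
        if name in self.names:
            raise UniqueViolation(name)
        self.names.add(name)


def add_unique_user(table, name, email):
    while True:
        try:
            table.insert(name, email)
        except UniqueViolation:
            name = name + "_"  # taken: try the next candidate
        else:
            return name


table = FakeUsersTable(["jan", "jan_"])
assigned = add_unique_user(table, "jan", "jan@example.com")
print(assigned)
```

The loop ends as soon as one insert succeeds, so the returned name is the one that was actually stored.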

PL/Python global dictionaries

Global dictionaries

    create function get_mx(domain text) returns text as $$
        import DNS, time
        mx, expires = GD.get(domain, (None, 0))
        if mx and time.time() < expires:
            return mx
        GD[domain] = DNS.mxlookup(domain)[0][1], time.time() + 5
        return GD[domain][0]
    $$ language plpythonu;
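
The caching pattern above, sketched as plain Python so it can be exercised without a DNS server. The resolver is injected; `cached_lookup` and `fake_mx_lookup` are hypothetical names, and an ordinary dict plays the role of `GD`.

```python
import time


def cached_lookup(cache, key, lookup, ttl=5, now=time.time):
    """Return lookup(key), reusing a cached value for up to ttl seconds."""
    value, expires = cache.get(key, (None, 0))
    if value is not None and now() < expires:
        return value
    value = lookup(key)
    cache[key] = (value, now() + ttl)
    return value


calls = []


def fake_mx_lookup(domain):
    # Hypothetical resolver: records each call instead of querying DNS.
    calls.append(domain)
    return "mx." + domain


GD = {}  # plays the role of PL/Python's global dictionary
first = cached_lookup(GD, "example.com", fake_mx_lookup)
second = cached_lookup(GD, "example.com", fake_mx_lookup)  # served from cache
```

Passing the clock as `now` makes expiry testable without sleeping.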

PL/Python advanced examples

Advanced examples

    create function check_mx() returns trigger as $$
        import DNS
        domain = TD['new']['email'].split('@', 1)[1]
        try:
            DNS.mxlookup(domain) or plpy.error('no mx')
        except DNS.ServerError:
            plpy.error('lookup failed for domain %s' % domain)
    $$ language plpythonu;
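
The trigger takes the domain with `split('@', 1)[1]`, which raises `IndexError` when the value contains no `@`. A sketch of a stricter helper that could be checked first; `email_domain` is a hypothetical name, not part of the talk's code.

```python
def email_domain(email):
    """Return the part after the first '@', or None if there is none."""
    local, sep, domain = email.partition("@")
    if not sep or not domain:
        return None
    return domain


print(email_domain("jan@example.com"))
print(email_domain("no-at-sign"))
```

A trigger using it could call `plpy.error` with a clear message when the helper returns `None`, instead of failing with a traceback.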

PL/Python advanced examples

Advanced examples (cont.)

    create function schedule(source text,
        summary out text, location out text,
        start out timestamptz)
        returns setof record as $$
        import icalendar, requests
        resp = requests.get(source)
        cal = icalendar.Calendar.from_ical(resp.content)
        for event in cal.walk('VEVENT'):
            yield (event['SUMMARY'], event['LOCATION'],
                   event['DTSTART'].dt.isoformat())
    $$ language plpythonu;
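
The set-returning part is an ordinary Python generator: each yielded tuple becomes one row, with fields in the order of the OUT parameters. A sketch with the calendar download replaced by hypothetical, already-parsed events:

```python
def schedule_rows(events):
    """Yield one (summary, location, start) tuple per event."""
    for event in events:
        yield (event["SUMMARY"], event["LOCATION"], event["DTSTART"])


# Hypothetical events; the slide obtains them from icalendar instead.
events = [
    {"SUMMARY": "Opening", "LOCATION": "Main hall",
     "DTSTART": "2015-05-26T09:00:00"},
    {"SUMMARY": "Lunch", "LOCATION": "Lobby",
     "DTSTART": "2015-05-26T13:00:00"},
]
rows = list(schedule_rows(events))
```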

Outline

1 Introduction

2 The PL/Python language

3 Using PL/Python

Real-life applications

Best practices

Where to use PL/Python

- writing a particular piece of logic in a nicer language than PL/pgSQL
- doing numerical computations in the database with NumPy
- doing text analysis with NLTK
- writing a constraint that checks if a column contains JSON
  - or a protobuf stream
  - or a PNG image
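
A sketch of what the Python side of such checks might look like, using only the standard library. `is_json` and `is_png` are hypothetical helpers; each would be wrapped in a PL/Python function returning boolean before being used in a constraint. The PNG check only looks at the file signature, not the rest of the image.

```python
import json

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file


def is_json(text):
    """True if text parses as JSON."""
    try:
        json.loads(text)
    except ValueError:
        return False
    return True


def is_png(data):
    """True if data starts with the PNG signature."""
    return bytes(data[:8]) == PNG_SIGNATURE


print(is_json('{"a": 1}'), is_json("{a: 1}"))
print(is_png(PNG_SIGNATURE + b"rest"), is_png(b"GIF89a"))
```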

Crazier ideas

- implementing a simple cache layer right in the database
- connecting to other Postgres instances and doing things to them
- communicating with external services to invalidate caches or trigger actions

Multicorn

- not really PL/Python, but similar idea under the hood
- uses the foreign data wrapper mechanism
  - foreign data wrappers are a way to present data residing in other storages as if they were local tables
- can use arbitrary Python code to provide unified access to disparate sources

Multicorn example

Filesystem access

    create foreign table python_fs (
        package text, module text, content bytea)
    server filesystem_srv options (
        root_dir '/usr/lib/python2.7/dist-packages',
        pattern '{package}/{module}.py',
        content_column 'content');

Multicorn modules

- out of the box, Multicorn provides modules for filesystem, IMAP, LDAP, SQLAlchemy and a few more
- but it’s easy to write your own!
- perfect for prototyping production-grade foreign data wrappers
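
A minimal sketch of a custom wrapper, assuming Multicorn's `ForeignDataWrapper` base class with an `execute(quals, columns)` method that yields rows. `SquaresFdw` and its `limit` option are made up for illustration. The fallback base class exists only so the sketch can be run outside Postgres.

```python
try:
    from multicorn import ForeignDataWrapper
except ImportError:
    # Outside Postgres: minimal stand-in so the sketch can be run directly.
    class ForeignDataWrapper(object):
        def __init__(self, options, columns):
            pass


class SquaresFdw(ForeignDataWrapper):
    """Hypothetical wrapper exposing rows (n, square) for n below a limit."""

    def __init__(self, options, columns):
        super(SquaresFdw, self).__init__(options, columns)
        self.limit = int(options.get("limit", 5))  # "limit" is a made-up option

    def execute(self, quals, columns):
        # Called for each scan of the foreign table; every yielded dict is a row.
        for n in range(self.limit):
            yield {"n": n, "square": n * n}


fdw = SquaresFdw({"limit": "3"}, {})
rows = list(fdw.execute([], ["n", "square"]))
```

This version ignores `quals`, so any WHERE clause would have to be applied by Postgres after the rows are produced.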

Outline

1 Introduction

2 The PL/Python language

3 Using PL/Python

Real-life applications

Best practices

Organising PL/Python code

- keep your PL/Python code in a module
- make all your SQL functions two-liners

    create function the_func(arg1 text, arg2 text)
        returns integer as $$
        from myapp.plpython import functions
        return functions.the_func(locals())
    $$ language plpythonu;

- test the Python code by mocking out magic variables
- it’s a sharp tool, be careful
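
One way the module side and its test might look. This is a sketch under assumptions: the function body, the `users` query and the explicit `plpy` parameter are all hypothetical (the slide passes only the arguments), and `unittest.mock` needs Python 3.

```python
from unittest import mock  # Python 3 standard library


def the_func(args, plpy):
    """Count users whose name equals arg1; all SQL goes through plpy."""
    plan = plpy.prepare(
        "select count(*) as n from users where name = $1", ["text"])
    rows = plpy.execute(plan, [args["arg1"]])
    return rows[0]["n"]


# Test without a database: plpy is replaced by a mock with a canned result.
fake_plpy = mock.Mock()
fake_plpy.execute.return_value = [{"n": 2}]
result = the_func({"arg1": "jan", "arg2": "unused"}, fake_plpy)
```

Taking `plpy` as a parameter keeps the module importable outside the server, which is what makes the mock-based test possible.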

Questions?
