

Date post: 30-Oct-2019
How (and why!) to build a Django based project with SQLAlchemy Core for data analysis
Page 1: How (and why!) to build a Django based project with ... · How (and why!) to build a Django based project with SQLAlchemy Core for data analysis

How (and why!) to build a Django based project with SQLAlchemy Core for data analysis

Hi! I’m Gleb Pushkov

Software developer, 6+ years (Python & Django)

Kyiv, Ukraine

[email protected]

https://github.com/glebtor


Link to slides

Why do we need SQLAlchemy Core in a Django app?


You’re building some kind of data-analysis app, e.g.:

your application mostly works with aggregations

you have a lot of data

you need precise and performant queries

you’re building advanced queries dynamically

you’re transforming complex queries from SQL to Python

the database is not natively supported by Django (e.g. SQL Azure, Sybase, Firebird)


Cool new features:

Subqueries

Window functions

FilteredRelation

Conditional Expressions

Date, Math, Text functions

Custom db constraints

You have fewer reasons to switch to raw SQL!


But... the Django ORM has its specifics

Property.objects.filter(city__startswith='K').select_related('owner')[:5]

SELECT "properties"."id", "users"."username" ...
FROM "properties"
LEFT OUTER JOIN "users" ON ("properties"."owner_id" = "users"."id")
WHERE "properties"."city" LIKE 'K%'
LIMIT 5


SQL getting simpler - Python query getting more complex

Property.objects
    .filter(city__startswith='K')
    .annotate(owner_name=F('owner__username'))
    .values('id', 'owner_name')[:5]

SELECT "properties"."id", "users"."username" AS "owner_name"
FROM "properties"
LEFT OUTER JOIN "users" ON ("properties"."owner_id" = "users"."id")
WHERE "properties"."city" LIKE 'K%'
LIMIT 5



We JOIN all rows, and only then we have a LIMIT.

Explain


What we want to get

SELECT "properties_by_city"."id", "users"."username"
FROM (
    SELECT "properties"."id" AS "id", "properties"."owner_id" AS "owner_id"
    FROM "properties"
    WHERE "properties"."city" LIKE 'K%'
    LIMIT 5
) AS "properties_by_city"
LEFT OUTER JOIN "users" ON "users"."id" = "properties_by_city"."owner_id"
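To make the difference between the two query shapes concrete, here is a minimal sketch that runs both against an in-memory SQLite database using only the standard library. The table contents are made up for illustration, and an ORDER BY was added (it is not on the slides) so that LIMIT picks the same rows every time. It only shows that the two shapes return the same rows; the planner behaviour discussed in the talk is Postgres-specific and is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE properties (
        id INTEGER PRIMARY KEY,
        city TEXT,
        owner_id INTEGER REFERENCES users(id)
    );
    -- Hypothetical sample data
    INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO properties VALUES
        (1, 'Kyiv', 1), (2, 'Kharkiv', 2), (3, 'Lviv', 1), (4, 'Kherson', NULL);
    """
)

# Shape produced by the ORM: JOIN first, LIMIT applied to the joined result.
join_then_limit = """
    SELECT p.id, u.username
    FROM properties p
    LEFT OUTER JOIN users u ON p.owner_id = u.id
    WHERE p.city LIKE 'K%'
    ORDER BY p.id
    LIMIT 2
"""

# Shape we want: LIMIT inside a subquery in FROM, then JOIN only those rows.
limit_then_join = """
    SELECT pc.id, u.username
    FROM (
        SELECT id, owner_id FROM properties
        WHERE city LIKE 'K%'
        ORDER BY id
        LIMIT 2
    ) AS pc
    LEFT OUTER JOIN users u ON u.id = pc.owner_id
    ORDER BY pc.id
"""

rows_join_first = conn.execute(join_then_limit).fetchall()
rows_limit_first = conn.execute(limit_then_join).fetchall()
print(rows_join_first)   # [(1, 'alice'), (2, 'bob')]
print(rows_limit_first)  # [(1, 'alice'), (2, 'bob')]
```

Both queries are interchangeable in terms of results, so the choice between them is purely about how much work the database has to do before the LIMIT applies.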


Looks good!

Explain


How it will look for SQLAlchemy Core

properties_by_city = (
    select([properties.c.uuid, properties.c.owner_id])
    .select_from(properties)
    .where(properties.c.city.like('K%'))
    .limit(5)
    .alias()
)

query = (
    select([properties_by_city.c.uuid, users.c.username])
    .select_from(properties_by_city.outerjoin(users))
)
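As a self-contained sketch of the same idea, the snippet below builds a subquery-in-FROM query and prints the SQL it compiles to. It assumes SQLAlchemy 1.4 or newer, where columns are passed to `select()` positionally and `.subquery()` replaces `.alias()`; the slides use the older `select([...])` list style. The table definitions (with an `id` column instead of `uuid`) are simplified stand-ins, not the author's schema.

```python
from sqlalchemy import Column, ForeignKey, Integer, MetaData, String, Table, select

metadata = MetaData()

# Simplified, hypothetical table definitions
users = Table(
    "users", metadata,
    Column("id", Integer, primary_key=True),
    Column("username", String(150), nullable=False),
)
properties = Table(
    "properties", metadata,
    Column("id", Integer, primary_key=True),
    Column("city", String(100)),
    Column("owner_id", Integer, ForeignKey("users.id")),
)

# Inner query: filter and LIMIT first, then expose it as a FROM-able subquery.
properties_by_city = (
    select(properties.c.id, properties.c.owner_id)
    .where(properties.c.city.like("K%"))
    .limit(5)
    .subquery("properties_by_city")
)

# Outer query: join only the limited rows to users.
query = select(properties_by_city.c.id, users.c.username).select_from(
    properties_by_city.outerjoin(
        users, users.c.id == properties_by_city.c.owner_id
    )
)

sql = str(query)  # compiled with the default dialect, parameters left as placeholders
print(sql)
```

The printed statement has the LIMIT inside the parenthesised subquery and the LEFT OUTER JOIN outside it, which is exactly the shape the ORM cannot express.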


How it will look for Django ORM


You can’t build such queries!

In the ORM world everything is tied to models (in Python) and tables (in the db)

We can use `Subquery` in SELECT, WHERE and HAVING, but not in FROM. The “root” of a query is always a model/table.


How a similar query will look in the Django ORM

properties_by_city = (
    Property.objects
    .filter(city__startswith='K')[:5]
    .values('pk')
)

Property.objects
    .filter(pk__in=Subquery(properties_by_city))
    .annotate(owner_name=F('owner__username'))
    .values('pk', 'owner_name')


Looks good!

Explain


With the Django ORM you get everything; with SQLAlchemy Core you get only what you asked for - they’re on different layers

Django: Django ORM / Non-public API / Raw SQL

SQLAlchemy: SQLAlchemy ORM / SQLAlchemy Core / Raw SQL


Django ORM: Model.objects.filter(...).values(...).annotate(...).filter(...)

SQLAlchemy: select([...]).select_from(...).where(...).group_by(...).having(...)

SQL: SELECT ... FROM ... WHERE ... GROUP BY ... HAVING ...

There is a distance between the SQL and ORM layers, so sometimes it’s not clear which query will be generated

There is no freedom at the ORM level!


SQLAlchemy example


Select properties by criteria

level1 = (
    select([
        properties.c.building_id,
        properties.c.sale_price,
        properties.c.owner_id,
    ])
    .select_from(properties)
    .where(properties.c.selling_status == 'for_sale')
    .where(properties.c.sale_price != None)  # renders as IS NOT NULL
    .alias()
)


Join usernames

level2 = (
    select([
        level1.c.building_id,
        level1.c.sale_price,
        users.c.username,
    ])
    .select_from(level1.outerjoin(users))
    .alias()
)


Group by

level3 = (
    select([
        level2.c.building_id,
        func.count(level2.c.building_id).label('apartments_count'),
        func.sum(level2.c.sale_price).label('sum_price'),
        func.array_agg(level2.c.username).label('users'),
    ])
    .select_from(level2)
    .group_by(level2.c.building_id)
    .alias()
)


One more join at the top

level4 = (
    select([
        properties.c.total_apartments,
        level3.c.apartments_count,
        level3.c.sum_price,
        level3.c.users,
    ])
    .select_from(
        level3.join(properties, properties.c.uuid == level3.c.building_id)
    )
)


-- level4 - join
SELECT properties.number_of_units, anon_1.apartments_count, anon_1.sum_price, anon_1.users
FROM (
    -- level3 - group by
    SELECT anon_2.building_id AS building_id,
           count(anon_2.building_id) AS apartments_count,
           sum(anon_2.sale_price) AS sum_price,
           array_agg(anon_2.username) AS users
    FROM (
        -- level2 - join
        SELECT anon_3.building_id AS building_id,
               anon_3.sale_price AS sale_price,
               users.username AS username
        FROM (
            -- level1
            SELECT properties.building_id AS building_id,
                   properties.sale_price AS sale_price,
                   properties.owner_id AS owner_id
            FROM properties
            WHERE properties.selling_status = :selling_status_1
              AND properties.sale_price IS NOT NULL
        ) AS anon_3
        LEFT OUTER JOIN users ON users.uuid = anon_3.owner_id
    ) AS anon_2
    GROUP BY anon_2.building_id
) AS anon_1
JOIN properties ON properties.uuid = anon_1.building_id

Wow! SQLAlchemy <3


Aggregation in subquery

Situation: we want to find the lowest price above 1,000,000, and then get all properties with exactly that price.

So we need a MIN aggregation.

SQL we want to get

SELECT * FROM "properties"
WHERE "properties"."sale_price" = (
    SELECT MIN("properties"."sale_price")
    FROM "properties"
    WHERE "properties"."sale_price" > 1000000
)


Aggregation in subquery

min_price = (
    Property.objects
    .filter(sale_price__gte=1000000)
    .aggregate(Min('sale_price'))['sale_price__min']  # evaluated :(
)

Property.objects.filter(sale_price=min_price)

SELECT * FROM "properties" WHERE "properties"."sale_price" = 1234567

We performed two separate queries


Aggregation in subquery - take 2 - subquery

min_price = (
    Property.objects
    .filter(sale_price__gte=1000000)
    .values('sale_price')
    .order_by('sale_price')[:1]
)

Property.objects.filter(sale_price=Subquery(min_price))

... WHERE "properties"."sale_price" = (
    SELECT U0."sale_price" FROM "properties" U0
    WHERE U0."sale_price" >= 1000000
    ORDER BY U0."sale_price" ASC LIMIT 1
)

Now we have ORDER BY and LIMIT... too complicated for both the code and the db
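The ORDER BY ... LIMIT 1 subquery and the MIN subquery are two spellings of the same question. A minimal sketch with the standard-library `sqlite3` module (sample prices are made up, and the outer ORDER BY is added only to make the output deterministic) shows that both return the same rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE properties (id INTEGER PRIMARY KEY, sale_price INTEGER)")
conn.executemany(
    "INSERT INTO properties (id, sale_price) VALUES (?, ?)",
    [(1, 900000), (2, 1200000), (3, 1200000), (4, 2000000)],  # hypothetical data
)

# The query we actually want: a MIN aggregation in the subquery.
with_min = """
    SELECT id FROM properties
    WHERE sale_price = (
        SELECT MIN(sale_price) FROM properties WHERE sale_price >= 1000000
    )
    ORDER BY id
"""

# The query the ORM workaround produces: sort and take the first row.
with_order_limit = """
    SELECT id FROM properties
    WHERE sale_price = (
        SELECT sale_price FROM properties
        WHERE sale_price >= 1000000
        ORDER BY sale_price ASC LIMIT 1
    )
    ORDER BY id
"""

ids_min = [row[0] for row in conn.execute(with_min)]
ids_order_limit = [row[0] for row in conn.execute(with_order_limit)]
print(ids_min)          # [2, 3]
print(ids_order_limit)  # [2, 3]
```

The results match, so the objection to the workaround is about readability and the extra sort, not correctness.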


Aggregation in subquery - take 3 - not recommended

min_price_queryset = Property.objects.filter(sale_price__gte=1000000)
min_price_queryset.query.add_annotation(
    Min('sale_price'), 'min_price', is_summary=True
)
Property.objects.filter(
    sale_price=Subquery(min_price_queryset.values('min_price'))
)

WHERE "properties"."sale_price" = (
    SELECT MIN(U0."sale_price") AS "min_price" FROM "properties" U0
    WHERE U0."sale_price" >= 1000000
)


Aggregation in subquery - take 4 - template

class MinSalePrice(Subquery):
    template = "(SELECT MIN(sale_price) FROM (%(subquery)s) _subq)"
    output_field = models.IntegerField()

filtered_properties = Property.objects.filter(sale_price__gte=1000000)

Property.objects.filter(sale_price=MinSalePrice(filtered_properties))

The generated SQL is fine, but this approach is a kind of hack and ....


Aggregation in subquery - SQLAlchemy

min_price = (
    select([func.min(properties.c.sale_price)])
    .select_from(properties)
    .where(properties.c.sale_price > 1000000)
)

query = (
    select([properties.c.id])
    .select_from(properties)
    .where(properties.c.sale_price == min_price)
)
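For a version that can be run end to end, here is a sketch against an in-memory SQLite database. It assumes SQLAlchemy 1.4 or newer, so it uses positional `select()` arguments and an explicit `.scalar_subquery()` rather than the 1.x list style on the slide; the table definition and sample prices are illustrative only.

```python
from sqlalchemy import Column, Integer, MetaData, Table, create_engine, func, select

metadata = MetaData()
properties = Table(  # simplified, hypothetical table
    "properties", metadata,
    Column("id", Integer, primary_key=True),
    Column("sale_price", Integer),
)

engine = create_engine("sqlite://")  # in-memory database
metadata.create_all(engine)

# Scalar subquery: the lowest price above 1,000,000.
min_price = (
    select(func.min(properties.c.sale_price))
    .where(properties.c.sale_price > 1000000)
    .scalar_subquery()
)

# Outer query: every property with exactly that price, in a single statement.
query = (
    select(properties.c.id)
    .where(properties.c.sale_price == min_price)
    .order_by(properties.c.id)
)

with engine.begin() as conn:
    conn.execute(
        properties.insert(),
        [
            {"id": 1, "sale_price": 900000},
            {"id": 2, "sale_price": 1200000},
            {"id": 3, "sale_price": 1200000},
            {"id": 4, "sale_price": 2000000},
        ],
    )
    ids = [row[0] for row in conn.execute(query)]

print(ids)  # [2, 3]
```

Only one statement reaches the database, and the Python expression mirrors the target SQL clause by clause.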


Joins:

You can't join tables of non-related models

You can't perform a RIGHT OUTER JOIN… yes, it’s very rare :)

Django decides for you which join type to apply (INNER or LEFT OUTER) and always generates

JOIN "table2" ON ("table1"."table2_id" = "table2"."id")

which can be customized a bit with FilteredRelation


Recursive CTE

Not supported by Django (yet)

Could be done via:

raw SQL

django-cte-forest (implemented via ‘extra’, has limitations)

SQLAlchemy
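As an illustration of the raw SQL option, here is a minimal recursive CTE run through the standard-library `sqlite3` module. The `categories` table and its rows are hypothetical; the same `WITH RECURSIVE` syntax is also accepted by Postgres.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical self-referencing table
    CREATE TABLE categories (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT);
    INSERT INTO categories VALUES
        (1, NULL, 'root'),
        (2, 1, 'child'),
        (3, 2, 'grandchild'),
        (4, NULL, 'other');
    """
)

# Walk the tree downwards from a given root, tracking the depth of each node.
subtree_sql = """
    WITH RECURSIVE tree(id, name, depth) AS (
        SELECT id, name, 0 FROM categories WHERE id = ?
        UNION ALL
        SELECT c.id, c.name, tree.depth + 1
        FROM categories c
        JOIN tree ON c.parent_id = tree.id
    )
    SELECT id, name, depth FROM tree ORDER BY depth
"""

subtree = conn.execute(subtree_sql, (1,)).fetchall()
print(subtree)  # [(1, 'root', 0), (2, 'child', 1), (3, 'grandchild', 2)]
```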


Combining multiple aggregations © Django docs <3

>>> book = Book.objects.first()
>>> book.authors.count()
2
>>> book.store_set.count()
3
>>> q = Book.objects.annotate(Count('authors'), Count('store'))
>>> q[0].authors__count
6
>>> q[0].store__count
6

Count('field', distinct=True) will fix this query, but other aggregations will not work as expected!
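The 6/6 result comes from the two joins multiplying rows: 2 authors times 3 stores gives 6 joined rows per book. A small sketch with the standard-library `sqlite3` module reproduces it; the table and column names are simplified stand-ins for the tables Django would generate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE book (id INTEGER PRIMARY KEY);
    CREATE TABLE book_authors (book_id INTEGER, author_id INTEGER);
    CREATE TABLE store_books (store_id INTEGER, book_id INTEGER);
    INSERT INTO book VALUES (1);
    INSERT INTO book_authors VALUES (1, 10), (1, 11);                -- 2 authors
    INSERT INTO store_books VALUES (100, 1), (101, 1), (102, 1);     -- 3 stores
    """
)

joined = """
    FROM book b
    LEFT JOIN book_authors a ON a.book_id = b.id
    LEFT JOIN store_books s ON s.book_id = b.id
    GROUP BY b.id
"""

# Both joins in one query: every author row is paired with every store row.
naive = conn.execute(
    "SELECT COUNT(a.author_id), COUNT(s.store_id)" + joined
).fetchone()

# DISTINCT collapses the duplicates again, which is what distinct=True does.
distinct = conn.execute(
    "SELECT COUNT(DISTINCT a.author_id), COUNT(DISTINCT s.store_id)" + joined
).fetchone()

print(naive)     # (6, 6)
print(distinct)  # (2, 3)
```

DISTINCT rescues COUNT, but an aggregation such as SUM over the same joined rows would still see each value several times, which is why the other aggregations do not work as expected.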


To sum up...

Hard to read advanced queries

Hard to understand what’s going on at the SQL level

Takes time & effort to convert SQL to Python

Can’t control / change some parts of the generated SQL

Queries can be inefficient


Usually all of the above is not a problem in 95%* of cases

* Just a number from my head

It becomes one only when you’re building some kind of data-analysis app, e.g.:

your application mostly works with aggregations

you have a lot of data

you need precise and performant queries

you’re transforming complex queries from SQL to Python

you’re building advanced queries dynamically

the database is not natively supported by Django (e.g. SQL Azure, Sybase, Firebird)

OK, how to start?


1. Create `Engine` as a global variable and describe your connection


[Diagram: Engine, Pool, Dialect, DBAPI, Database; entry point connect()]

QueuePool is the default; to disable pooling, use NullPool

sa_engine = create_engine(
    settings.DB_CONNECTION_URL,
    pool_recycle=settings.POOL_RECYCLE,
)


Pooling

[Diagram: 4 uWSGI workers with 8 threads; Django connection; SQLAlchemy (NullPool); pgbouncer; Postgres]

One instance produces up to 64 connections

One connection to Postgres takes ~10 MB of RAM

124 connections == ~1.2 GB of RAM
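The numbers on this slide can be reproduced with a few lines of arithmetic. The sketch below assumes that each thread may hold two connections at once (one opened by Django and one by SQLAlchemy), which is one way to arrive at the 64 on the slide; the figure of 124 connections is taken from the slide as is.

```python
UWSGI_WORKERS = 4
THREADS_PER_WORKER = 8
CONNECTIONS_PER_THREAD = 2  # assumption: one Django + one SQLAlchemy connection
MB_PER_CONNECTION = 10      # rough per-connection cost quoted on the slide

# Worst case for a single application instance.
connections_per_instance = (
    UWSGI_WORKERS * THREADS_PER_WORKER * CONNECTIONS_PER_THREAD
)
ram_per_instance_mb = connections_per_instance * MB_PER_CONNECTION

# Memory for the total number of connections mentioned on the slide.
ram_for_124_connections_gb = 124 * MB_PER_CONNECTION / 1000

print(connections_per_instance)    # 64
print(ram_per_instance_mb)         # 640
print(ram_for_124_connections_gb)  # 1.24
```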


2. Define tables


If you have models for tables:

Re-use Django models (aldjemy)

Table reflection (django-sabridge)

No models:

Table reflection (messages = Table('messages', meta, autoload=True))

Define explicitly

Define inline with expressions


Define explicitly

users = Table('users', metadata,
    Column('id', Integer, primary_key=True),
    Column('username', String(150), nullable=False),
    Column('email', String(254)),
    Column('role', String(64), nullable=False),
)


Define explicitly

Or keep it even simpler, but describe ForeignKeys to simplify `join`:

users = Table('users', metadata,
    Column('id'),
    Column('username'),
    Column('email'),
    Column('role'),
)
properties = Table('properties', metadata,
    Column('owner_id', None, ForeignKey('users.id')),
    ...
)


Usage

Each returned row is a RowProxy:

all_users = engine.execute(
    select([users.c.username, users.c.email]).select_from(users)
).fetchall()

[('johndoe', '[email protected]'), ('janedoe', '[email protected]')]

all_users[0].username / all_users[0]['username'] / all_users[0][0]


Define inline with expressions

from sqlalchemy import column, select, table

engine.execute(
    select([column('username'), column('email')])
    .select_from(table('users'))
).fetchall()

# Or if you need columns to be associated with tables:
user = table(
    'user',
    column('id'),
    column('username'),
)
# queries like: select([user.c.username, ...


That’s all, start building your fancy queries!


But what about tests?


Switch connection!

def create_sa_engine(connection_url):
    extra = {...}
    if "pytest" in sys.modules:
        connection_url = _get_test_db_url(connection_url)
    return create_engine(connection_url, **extra)

engine = create_sa_engine(settings.REMOTE_DB_CONNECTION_URL)

As `engine` is a global variable, it is evaluated earlier than any call to django.test.override_settings or similar approaches.

The test db will be created by Django (if it’s listed in settings).
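The slides do not show `_get_test_db_url`, so the following is only a guess at what such a helper could look like. It assumes the test database follows Django's default naming (the production name prefixed with `test_`) and that the connection URL carries the database name as its path; `get_test_db_url` and the example URL are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def get_test_db_url(connection_url: str, prefix: str = "test_") -> str:
    """Return the URL pointing at the test database (hypothetical helper)."""
    parts = urlsplit(connection_url)
    db_name = parts.path.lstrip("/")
    # Leave the URL alone if there is no database name or it is already a test one.
    if not db_name or db_name.startswith(prefix):
        return connection_url
    return urlunsplit(parts._replace(path="/" + prefix + db_name))


test_url = get_test_db_url("postgresql://user:secret@localhost:5432/analytics")
print(test_url)  # postgresql://user:secret@localhost:5432/test_analytics
```

Making the helper idempotent means it is safe to call even if the settings already point at a test database.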


Pytest + ResultProxy cursor issue

return self.process_rows(result_proxy) # iterates over ResultProxy

If an exception is raised while iterating over a cursor (ResultProxy), pytest will hang forever

We have to close the cursor explicitly


from functools import wraps
from itertools import chain

def close_cursors(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            # Close every ResultProxy passed in, then re-raise the original error
            for arg in chain(args, kwargs.values()):
                if isinstance(arg, ResultProxy):
                    arg.close()
            raise
    return wrapper

@close_cursors
def process_rows(result_proxy: ResultProxy) ...
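To see the decorator's behaviour without a database, here is a runnable variation. It differs from the slide in one labeled way: instead of checking `isinstance(arg, ResultProxy)` it closes any argument that has a callable `close` attribute, so a stand-in `FakeResult` class (hypothetical, for the demo only) can play the role of the cursor.

```python
from functools import wraps
from itertools import chain


def close_cursors(func):
    """Close cursor-like arguments if the wrapped function raises."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            for arg in chain(args, kwargs.values()):
                close = getattr(arg, "close", None)  # duck-typed check (assumption)
                if callable(close):
                    close()
            raise
    return wrapper


class FakeResult:
    """Stand-in for a ResultProxy: iterable and closable."""
    def __init__(self, rows):
        self.rows = rows
        self.closed = False

    def __iter__(self):
        return iter(self.rows)

    def close(self):
        self.closed = True


@close_cursors
def process_rows(result):
    return [row["price"] * 2 for row in result]


good = FakeResult([{"price": 1}])
doubled = process_rows(good)

bad = FakeResult([{"price": 1}, {}])  # second row triggers a KeyError
try:
    process_rows(bad)
except KeyError:
    pass

print(doubled, good.closed, bad.closed)  # [2] False True
```

The cursor is closed only on the failure path, and the original exception still reaches the test runner.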


TestCases & connections

TestCase - wraps each test in a transaction and performs a rollback

TransactionTestCase - code is not wrapped in a transaction; all tables are truncated

[Diagram: Application; Django connection; SQLAlchemy connections; Database; isolation level Read Committed]


When to use TestCase

if you write tests for code which works with only one of the connections, and test data is populated via the same connection

Keep in mind

if tables are populated via the SQLAlchemy connection, you have to clean them up yourself;

it’s possible to share data between connections, but it requires changing the transaction isolation level (READ UNCOMMITTED), which I would not recommend.
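The reason TestCase data is invisible to the other connection is that the test's transaction never commits. The sketch below illustrates the principle with two standard-library `sqlite3` connections to the same file; SQLite is only a stand-in here, the slides are about Postgres with Read Committed, and the table name is made up.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.sqlite3")
    writer = sqlite3.connect(path)  # plays the role of the Django connection
    reader = sqlite3.connect(path)  # plays the role of the SQLAlchemy connection

    writer.execute("CREATE TABLE properties (id INTEGER PRIMARY KEY)")
    writer.commit()

    # The INSERT opens a transaction that is not committed yet,
    # like test data created inside a TestCase.
    writer.execute("INSERT INTO properties VALUES (1)")
    seen_before_commit = reader.execute(
        "SELECT COUNT(*) FROM properties"
    ).fetchall()[0][0]

    writer.commit()
    seen_after_commit = reader.execute(
        "SELECT COUNT(*) FROM properties"
    ).fetchall()[0][0]

    writer.close()
    reader.close()

print(seen_before_commit, seen_after_commit)  # 0 1
```

A TestCase rolls back instead of committing, so the second connection never gets to see the row at all.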


When to use TransactionTestCase

to test code which works with both connections (no issues because of autocommit behavior)

Keep in mind

these tests are slower

tables related to models are flushed automatically

other tables (if you have any) have to be cleaned up by yourself


Drawbacks

A bit hard to start

Can't easily get a final SQL query with parameters

Slower tests

More connections to database

Can't reuse libraries which work with querysets (e.g. django-filter, pagination)


Benefits

Full control over SQL

Faster to express SQL in Python code

Easier to build application-specific SQL-generation layer

Readability & maintainability

Performance


Questions?

Thank you for your attention!

Link to slides

