Writing infinite scalability web applications with PHP and
PostgreSQL
PHPDay 2013 - VeronaGabriele Bartolini
2ndQuadrant [email protected]
@_GBartolini_
sabato 18 maggio 13
Gabriele Bartolini•Co-Founder and Manager of 2ndQuadrant Italia
• Data Architect, Business critical environments (OLTP)
• Data warehousing
•Co-Founder Italian PostgreSQL Users Group
•Co-Founder PostgreSQL Europe
•PostgreSQL Contributor and Advocate
sabato 18 maggio 13
Gabriele Bartolini 1.0 (1995-2005)•Web mining
•Web programmer
• C, C++ and PHP
• Web accessibility
• IWA
• W3C (EARL)
•Open source developer
• ht://Dig (C++)
• ht://Check (C++, MySQL, PHP)
• ht://Miner (C++, PostgreSQL)
sabato 18 maggio 13
PostgreSQL = Postgres100% open source, available at www.postgresql.org
sabato 18 maggio 13
Who’s using Postgres?... in a production environment
sabato 18 maggio 13
Who’s planning to use Postgres?
sabato 18 maggio 13
Who’s familiar with sharding?
sabato 18 maggio 13
Traditional PHP application
HTML 5Javascript
CSS 3PHP
Postgres
JSON
ResultSet
sabato 18 maggio 13
Postgres 9.2+ PHP Application
HTML 5Javascript
CSS 3PHP
Postgres
JSON
sabato 18 maggio 13
HTML 5Javascript
CSS 3PHP
Postgres
PL/Proxy
Postgres PostgresPostgres
“NoSQL”
......
NoSQL with Postgres
sabato 18 maggio 13
Sharding•Horizontal partitioning
technique
•Logically and physically split a table by distributing data across different databases/servers
• Reduced number of rows in each shard
• Reduced index size
•Some DBMS have the "DISTRIBUTED BY" clause for "CREATE TABLE"
•Query distribution
• Parallel processing
• Shared nothing architecture
sabato 18 maggio 13
Table 1
Table 1Shard 1
Table 1Shard 2
Table 1Shard 3
Table 1Shard ...
Table 1Shard n
Table 1
SELECT * FROM Table1;
Parallel execution on the ‘n’ shards
sabato 18 maggio 13
Part IPostgre(s|SQL)
sabato 18 maggio 13
PostgreSQL•Started in 1995 (Postgres started in 1986)
•Current major version is 9.2
• Supported major versions are 8.4, 9.0, 9.1 and 9.2
•The PostgreSQL License (BSD style)
• Not subject to monopoly nor acquisitions
•Platform independent (C language)
sabato 18 maggio 13
Some major features•Client/Server
•Concurrency (MVCC)
•Referential integrity
•ACID transactions
•Views
•Functions
•Triggers
•Tablespaces
•Partitioning
•Schemas
•Point In Time Recovery
•Master/Slave Replication
sabato 18 maggio 13
Features for developers•Standard compliant (e.g.
SQL, XML, UTF8, JSON, ...)
•Client interfaces: C/C++, PHP, JDBC, ODBC, .Net, Python, Perl, Ruby, ...
•Vast gamma of ORMs
•Several procedural languages
•Versatile and extensible
• custom data types
• intelligence in the database
• extensions (from 9.1)
• PostGIS
•Database Unit testing
sabato 18 maggio 13
PostgreSQL 9.3•Autumn 2013
•JSON operators and functions
•Updatable views
•Materialised views
•Writable foreign tables
•Support for LATERAL
• ...
•http://wiki.postgresql.org/wiki/What%27s_new_in_PostgreSQL_9.3
sabato 18 maggio 13
NO ... DISTRIBUTED BY•PostgreSQL does not support transparent distribution of
data and queries (yet)
•Sharding can be implemented through PL/Proxy using functions
sabato 18 maggio 13
Part IIArchitecture and scalability
sabato 18 maggio 13
ScalabilityProperty of a database to grow according to both
technical and business requirements
sabato 18 maggio 13
Server
dbdb
SingleSMP Server
Vertical scalability
sabato 18 maggio 13
Server
db
CommodityHardware
Server Server
db
Horizontal scalability
sabato 18 maggio 13
DisclaimerFor the sake of simplicity, every database server described in the following architecture diagrams, deliberately represents a single point of failure.
Due to its high availability/disaster recovery features and open source tools, PostgreSQL is widely used in mission
critical environments.
sabato 18 maggio 13
Server
postgres
Physical/Virtualserver
Single server architecture
apache/php
Pros: costCons: performance (CPU, RAM, I/O), scalability, business continuity, security, no db connection pooling
sabato 18 maggio 13
Web server
Two-server architecture
apache/php
Pros: flexibility (hardware, OS), cost optimisation, security, performance (CPU, RAM, I/O), db connection poolingCons: business continuity, scalability
DB Server
postgres
pgbouncer
Virtual, +CPU, +RAM, -I/O
Physical, =CPU, =RAM, +I/O
sabato 18 maggio 13
Benchmark•2 x Intel Xeon E5630 CPUs
• 48 GB RAM
• Storage:
• 2 SAS 7.2k rpm, RAID 1
• 2 SAS 15k rpm, RAID 1
• 6 SAS 15k rpm, RAID 1+0
•850 MB/s seq read
•13k UPDATE/sec 32 clients
• 1.1k/sec for “large” dbs
•PostgreSQL 9.2/CentOS 6
•Rental costs:
• 1.8k € setup + 350 €/month
sabato 18 maggio 13
Web server
Multiple web servers architecture
apache/php
DB Server
postgres
pgbouncer
pgbouncer
Web server
apache/php
pgbouncer
HA Proxy
HTTP Balancing
Pros: scalability and continuity (Web)
Cons: scalability and continuity (DB)
sabato 18 maggio 13
Web server
Read only database servers
apache/php
DB Master
postgres
pgbouncer
pgbouncer
Web server
apache/php
pgbouncer
HA Proxy
HTTP Balancing
DB Hot Standby
postgres
pgbouncer
StreamingReplication
RO
RORW RW
sabato 18 maggio 13
Web server
Database sharding architecture
DB Access Server
postgres
pgbouncer
Web server
HA Proxy
Web layer
pl/proxyDatabase Access Layer
DB Shard ...
postgres
pgbouncer
DB Shard n
postgres
pgbouncer
DB Shard 1
postgres
pgbouncerDatabase Data Layer
“NoSQL”“NoSQL”
“Infinite” horizontal scalability
Shared nothing architecture
sabato 18 maggio 13
Architecture recap•Do not get carried away by current trends and buzzwords
•Make your decisions about architecture based on:
• business continuity requirements
• results of both database and application benchmarks
•Always start with a simple architecture (two-server)
•Enhance gradually up to sharding for infinite scalability
sabato 18 maggio 13
Part III(Open Source) Technologies
sabato 18 maggio 13
PgBouncer•Lightweight connection
pooler for PostgreSQL
•Written in C by Skype
•Recycling connections on:
• sessions
• transactions
• statements (for PL/Proxy)
• “Virtual databases”
• Real databases can be on different backend servers
•Online reconfiguration, restart and upgrade
• Online statistics
• Supports PAUSE/RESUME
sabato 18 maggio 13
Configuration example[databases]db1_ro = host=pg2 dbname=db1 port=5432* = host=pg1 port=6432
[pgbouncer]listen_addr = *listen_port = 6432pool_mode = session...
sabato 18 maggio 13
Statement
Transaction
Session/Transaction/StatementConnection/Session
Transaction
INSERT
UPDATE
Transaction
INSERT
SELECT
UPDATE
BEGIN
COMMIT
sabato 18 maggio 13
hstore•Data type in PostgreSQL
•Key/value pairs as a single value
•Useful for storing:
• unstructured data
• many attributes for a record
•Keys and values are stored as text
•Set of operators
• ->, @>, ?, ...
• Indexes
• Integrates with JSON in 9.3
sabato 18 maggio 13
JSON Support• Introduced in PostgreSQL 9.2
•Data type
•Generators (9.3)
•Operators (9.3)
• Integration with hstore (9.3)
sabato 18 maggio 13
CREATE TABLE pictures ( id SERIAL PRIMARY KEY, user_id INTEGER NOT NULL REFERENCES users(id), title TEXT, metadata JSON, thumbnail BYTEA, picture BYTEA, created_at TIMESTAMP DEFAULT now(), updated_at TIMESTAMP DEFAULT now());
sabato 18 maggio 13
INSERT INTO pictures (user_id, title, metadata)VALUES ( 100, 'A picture of my Stratocaster', '{"latitude":43.883,"longitude":11.1,"width":2240,"height":1680}');
Storing a JSON value
sabato 18 maggio 13
WITH m AS ( SELECT 43.883 AS latitude, 11.1 AS longitude, 2240 AS width, 1680 AS height)INSERT INTO pictures (user_id, title, metadata) SELECT 100, 'Another picture of my Stratocaster', to_json(m) FROM m;
Storing a JSON value #2
sabato 18 maggio 13
WITH p AS ( SELECT title, metadata, created_at FROM pictures WHERE id = 1)SELECT to_json(p) AS json FROM p;
{"title":"A picture of my Stratocaster","metadata":{"latitude":43.883,"longitude":11.1,"width":2240,"height":1680},"created_at":"2013-05-11 13:11:37.788636"}
9.3\ Returning a JSON object
sabato 18 maggio 13
WITH p AS ( SELECT id, title, metadata, created_at FROM pictures WHERE user_id = 100)SELECT json_agg(p) AS json FROM p;
[{"id":1,"title":"A picture of my Stratocaster","metadata":{"latitude":43.883,"longitude":11.1,"width":2240,"height":1680},"created_at":"2013-05-11 13:15:21.078747"}, + {"id":2,"title":"Another picture of my Stratocaster","metadata":{"latitude":43.883,"longitude":11.1,"width":2240,"height":1680},"created_at":"2013-05-11 13:15:21.078747"}]
9.3\ Returning an array of JSONs
sabato 18 maggio 13
9.3\ A taste of JSON operatorsSELECT id, (metadata->'longitude', metadata->'latitude') AS coords FROM pictures WHERE user_id = 100;
id | coords ----+--------------- 1 | (11.1,43.883) 2 | (11.1,43.883)...
sabato 18 maggio 13
PL/Javascript•Procedural Language
•Allows to implement functions inside PostgreSQL using the Javascript Language
•Available as an extension
•Powered by V8 JavaScript Engine
sabato 18 maggio 13
CREATE FUNCTION plv8_test(keys TEXT[], vals TEXT[]) RETURNS TEXT LANGUAGE plv8 IMMUTABLE STRICT AS$BODY$ var o = {}; for(var i=0; i<keys.length; i++) { o[keys[i]] = vals[i]; } return JSON.stringify(o);$BODY$;
SELECT plv8_test(ARRAY['name', 'age'], ARRAY['Angus', '58']);
sabato 18 maggio 13
PL/Proxy•Procedural language for
defining proxy functions
•Written in C by Skype
•The body specifies:
• remote function (target)
• destination (shard)
•Execution can happen on one, some or all shards
• Execution is parallel if shards > 1
•Queries are run in auto-commit mode
• PgBouncer (statement)
sabato 18 maggio 13
The need for an API•PL/Proxy forces to work
with functions
•One database cluster with:
• access/proxy layer
• data layer (shards)
•Database cluster ~ Singleton class
•Business logic in the database
•Programmers need to define an API with access/authorisation rules
• Proxy functions (PL/Proxy)
• Remote functions (any)
•Supports SQL/MED
sabato 18 maggio 13
Database sharding architecture
pg-access-1
postgres
pl/proxy
pg-data-2
postgres
pg-data-1
postgres
Database Access
Layer
Database Data
Layers00 s01 s02 s03
- Definition of the cluster- Data/Queries distributed by user (hash)- Functions using the ‘plproxy’ language
- User data tables- Functions (any language)
sabato 18 maggio 13
Define the cluster with SQL/MEDCREATE SERVER mycluster FOREIGN DATA WRAPPER plproxy OPTIONS ( connection_lifetime '1800', s0 'dbname=shard00 host=pg-data-1', s1 'dbname=shard01 host=pg-data-1', s2 'dbname=shard02 host=pg-data-2', s3 'dbname=shard03 host=pg-data-2');
sabato 18 maggio 13
Proxy function for GETCREATE OR REPLACE FUNCTION get_pictures(i_uid INTEGER) RETURNS json LANGUAGE plproxy AS$BODY$ CLUSTER 'mycluster'; RUN ON hashint4(i_uid);$BODY$;
sabato 18 maggio 13
Data layer function for GETCREATE OR REPLACE FUNCTION get_pictures(i_uid INTEGER) RETURNS json LANGUAGE SQL AS$BODY$WITH p AS ( SELECT id, title, metadata, created_at FROM pictures WHERE user_id = $1) SELECT json_agg(p) AS json FROM p;$BODY$;
sabato 18 maggio 13
Proxy function for INSERTCREATE OR REPLACE FUNCTION insert_picture(i_uid INTEGER, i_title TEXT, i_metadata JSON, i_thumbnail BYTEA, i_picture BYTEA) RETURNS INTEGER LANGUAGE plproxy AS $BODY$ CLUSTER 'mycluster'; RUN ON hashint4(i_uid);$BODY$;
sabato 18 maggio 13
Data function for INSERTCREATE OR REPLACE FUNCTION insert_picture(i_uid INTEGER, i_title TEXT, i_metadata JSON, i_thumbnail BYTEA, i_picture BYTEA) RETURNS INTEGER LANGUAGE SQLAS $BODY$INSERT INTO pictures (user_id, title, metadata, thumbnail, picture) VALUES ($1, $2, $3, $4, $5) RETURNING id;$BODY$;
sabato 18 maggio 13
Data representation in PHP// Obviously all of this will come from a form$user_id = 100;$title = 'A picture of my Stratocaster';$metadata = array ( 'latitude' => 43.883, 'longitude' => 11.1, 'width' => 2240, 'height' => 1680, // ... EXIF?);$thumbnail = ... // Binary stream$picture = ... // Binary stream
sabato 18 maggio 13
INSERT using PDO// $access_layer_conninfo = 'pgsql:...'$connection = new PDO($access_layer_conninfo);$stmt = $connection->prepare('SELECT insert_picture(?, ?, ?, ?, ?)'); // NOSQL!$stmt->bindParam(1, $user_id, PDO::PARAM_INT);$stmt->bindParam(2, $title, PDO::PARAM_STR);$stmt->bindParam(3, json_encode($metadata), PDO::PARAM_STR);$stmt->bindParam(4, $thumbnail, PDO::PARAM_LOB);$stmt->bindParam(5, $picture, PDO::PARAM_LOB);$stmt->execute(); // TODO: Check and get the ID
sabato 18 maggio 13
Memcached•Distributed system for caching of objects in memory
• key/value pairs
• volatile data
•pgmemcache extension
• Integration of database and applications (read scaling)
sabato 18 maggio 13
PostgreSQL
Memcachedm[k] = v PHP
trigger+
pgmemcache
sabato 18 maggio 13
Part IVTesting and deployment
sabato 18 maggio 13
Test-Driven DB development•Database development
technique
•Write tests before actual features
•pgTAP is an extension for TDDD in PostgreSQL
•Allows to check for objects, privileges, functions, triggers, data, queries, etc.
•Transactional
•Provides with statistics
• via SQL or pg_prove
•Supports “Test Anything Protocol” (TAP)
• Continuous integration
sabato 18 maggio 13
BEGIN;SELECT plan(3);-- insert test 1 here-- insert test 2 here-- insert test 3 hereSELECT * FROM finish();ROLLBACK;
pgTAP group test example
sabato 18 maggio 13
SELECT results_eq( 'SELECT * FROM active_users()', 'SELECT * FROM users WHERE active', 'active_users() should return active users');
pgTAP example with ‘results_eq’
sabato 18 maggio 13
Deployment with EXTENSION•Extensibility
•Deployment is a crucial part of any ICT organisation
•PostgreSQL 9.1 introduces EXTENSION for packaging in-database applications
•Fully transactional (atomic)
•Allows for upgrades and downgrades
•Simplifies logical dump/restore of objects
•DevOps with EXTENSION:
• Devs write extensions
• Ops deploy them
sabato 18 maggio 13
Extensions in detail•Extensions require developer to write:
• Control file (metadata, settings, dependencies, ...)
• Installation scripts (SQL statements)
• Update scripts (SQL statements, optional)
• Source code / binaries (optional)
sabato 18 maggio 13
Extension management-- INSTALLATIONCREATE EXTENSION pictures_app;
-- REMOVALDROP EXTENSION pictures_app;
-- UPDATE (fully atomic)ALTER EXTENSION pictures_app UPDATE;
sabato 18 maggio 13
Part VConclusions
sabato 18 maggio 13
Conclusions•Postgres is much more than a “traditional” database
•PHP integrates perfectly with it, in multiple flavours (native interface, PDO, Doctrine, etc.)
•Start exploiting “vertical scalability”
•Always perform benchmark!
•Evaluate costs and complexity of “horizontal scalability”
•Postgres is “business friendly” in terms of licensing
sabato 18 maggio 13
Feel free to approach me about•Open Source Community
•License
•Total Cost of Ownership
•Development life cycle
•Quality of code and documentation
•Business Continuity
•Security and data protection
•PostgreSQL Eco-system
•Performance and scalability
•Software development
sabato 18 maggio 13
Community•Next events:
• PGDay.IT (Autumn 2013)
• PGConf.EU (Dublin, Oct 29 - Nov 1, 2013)
• Italian PostgreSQL Users Group (ITPUG)
• www.itpug.org
•PostgreSQL Europe
• www.postgresql.eu
sabato 18 maggio 13
References•PostgreSQL: www.postgresql.org
•PgBouncer: http://wiki.postgresql.org/wiki/PgBouncer
•PL/Proxy: http://wiki.postgresql.org/wiki/PL/Proxy
•PL/V8: http://pgxn.org/dist/plv8/
•pgTAP: www.pgtap.org
sabato 18 maggio 13
Thank you!Copyright (c) 2013 2ndQuadrant Italia
License: Creative Commons BY-NC-SA 3.0http://creativecommons.org/licenses/by-nc-sa/3.0/it/deed.en
sabato 18 maggio 13