Unlocking Proprietary Data with PostgreSQL Foreign Data WrappersPat PattersonPrincipal Developer Evangelist
[email protected]@metadaddy
Agenda
Foreign Data Wrappers
Writing FDW’s in C
Multicorn
Database.com FDW for PostgreSQL
FDW in action
Why Foreign Data Wrappers?
External data sources look like local tables!– Other SQL database
• MySQL, Oracle, SQL Server, etc– NoSQL database
• CouchDB, Redis, etc– File– LDAP– Web services
• Twitter!
Why Foreign Data Wrappers?
Make the database do the work– SELECT syntax
• DISTINCT, ORDER BY etc– Functions
• COUNT(), MIN(), MAX() etc– JOIN external data to internal tables– Use standard apps, libraries for data analysis,
reporting
Foreign Data Wrappers
2003 - SQL Management of External Data (SQL/MED) 2011 – PostgreSQL 9.1 implementation
– Read-only– SELECT-clause optimization– WHERE-clause push-down
• Minimize data requested from external source
Future Improvements– JOIN push-down
• Where two foreign tables are in the same server
– Support cursors
FDW’s in PostgreSQL
‘Compiled language’ (C) interface Implement a set of callbacks
typedef struct FdwRoutine{ NodeTag type; /* These functions are required. */ GetForeignRelSize_function GetForeignRelSize; GetForeignPaths_function GetForeignPaths; GetForeignPlan_function GetForeignPlan; ExplainForeignScan_function ExplainForeignScan; BeginForeignScan_function BeginForeignScan; IterateForeignScan_function IterateForeignScan; ReScanForeignScan_function ReScanForeignScan; EndForeignScan_function EndForeignScan; /* These functions are optional. */ AnalyzeForeignTable_function AnalyzeForeignTable;} FdwRoutine;
FDW’s in PostgreSQL
Much work!• CouchDB FDW• https://github.com/ZhengYang/couchdb_fdw/• couchdb_fdw.c > 1700 LoC
Multicorn
http://multicorn.org/ PostgreSQL 9.1+ extension Python framework for FDW’s Implement two methods…
Multicornfrom multicorn import ForeignDataWrapper
class ConstantForeignDataWrapper(ForeignDataWrapper):
def __init__(self, options, columns): super(ConstantForeignDataWrapper, self).__init__(options, columns) self.columns = columns
def execute(self, quals, columns): for index in range(20): line = {} for column_name in self.columns: line[column_name] = '%s %s' % (column_name, index) yield line
Database.com FDW for PostgreSQL
OAuth login to Database.com / Force.com– Refresh on token expiry
Force.com REST API– SOQL query
• SELECT firstname, lastname FROM Contact
Request thread puts records in Queue, execute() method gets them from Queue
JSON parsing – skip embedded metadat < 250 lines code
Demo
Conclusion
Foreign Data Wrappers make the whole world look like tables!
Writing FDW’s in C is hard!– Or, at least, time consuming!
Writing FDW’s in Python via Multicorn is easy!– Or, at least, quick!
Try it for yourself!
Resources
http://wiki.postgresql.org/wiki/SQL/MED
http://wiki.postgresql.org/wiki/Foreign_data_wrappers
http://multicorn.org/
https://github.com/metadaddy-sfdc/Database.com-FDW-for-PostgreSQL