Database Migrations with Gradle and Liquibase
Dan Stine
Copyright Clearance Center
www.copyright.com
Gradle Summit
June 12, 2015
About Me
• Software Architect
• Library & Framework Developer
• Platform Engineering Lead & Product Owner
• Gradle User Since 2011
• Enemy of Inefficiency & Needless Inconsistency
dstine at copyright.com
sw at stinemail.com
github.com/dstine
6/12/2015
About Copyright Clearance Center
• Global licensing solutions that make © work for everyone
– Get, share and manage content
– Rights broker for the world’s most sought-after materials
– Global company (US, Europe, Asia) – HQ in Danvers, MA
• Industry-specific software systems
– Internal and external user base
– Applications, services, databases
– Organic growth over many years
• In 2011, CCC adopted a Product Platform strategy for growing its software portfolio
Agenda
• Context
• Liquibase
• Development Process
• Deploy Time
• Extensibility
• Wrap Up
CONTEXT
Database Migrations
• Database structure changes
– Tables, constraints, indexes, etc.
– Schema changes (DDL, not DML)
• Reference data
– List of countries, user types, order status, etc.
– Set of allowed values
• Database logic
– Functions, procedures, triggers
– (Very little of this)
Our Historical Approach
• DB migrations handled in relatively ad-hoc fashion
• Various flavors of “standard” practice
– Framework copied and modified from project to project
– Framework not always used (“small” projects)
• Development teams shared a DEV database
– Conflicts between code and database
Development Pain Points
• Intra-team collaboration was difficult
• Forced synchronous updates within development team
• Learn variations when switching between projects
• Project startup was costly
Deployment Pain Points
• Manual process
– Where are the scripts for this app?
– Which scripts should be run and how?
• Recurring difficulties
– Hours spent resolving mismatches between app and database
– Testing activities frequently delayed or even restarted
• Impossible to automate
– Too many variations
• Self-service deployment was a pipe dream
Standard Software Platform
• Started platform definition in 2011
– Homogenous by default
• Tools
– Java, Spring, Tomcat, Postgres
– Git / GitHub, Gradle, Jenkins, Artifactory, Liquibase, Chef
• Process
– Standard development workflow
– Standard application shape & operational profile
Vision for Database Script Management
• Integrated into developer workflow
• Feeds cleanly into deployment workflow
• Developer commits scripts and the process takes over
– Just like with application code
A Plan For Pain Relief
• Manage scripts as first-class citizens
– Same repo as application code
– Standard location in source tree
• Standard execution engine
– No more variations
– Automatic tracking of applied migrations
• Prevent conflicts and mismatches
– Introduce developer workstation databases (LOCAL)
– Dedicated sandbox
– Commit database and associated application change together
A Plan For Pain Relief
• Liquibase
– Database described as code
– Execution engine & migration tracking
• Gradle
– Provide conventions
– Tasks for invoking Liquibase
– Already familiar to developers from existing build process
– Flexibility to integrate into deployment process
– Flexibility to handle emergent requirements
LIQUIBASE
Liquibase Basics
• Provides vocabulary of database changes
– Create Table, Add PK, Add FK, Add Column, Add Index, …
– Drop Table, Drop PK, Drop FK, Drop Column, Drop Index, …
– Insert, Update, Delete, …
• Changes are grouped into changesets
– Change(s) that should be applied atomically
• Changesets are grouped into changelogs
– Files managed in version control
Liquibase Basics
• Changesets uniquely identified by [Author, ID, File]
– Liquibase tracks changeset execution in a special table
– Lock table to prevent concurrent Liquibase invocations
– Modified changesets are detected via checksums
• Supported databases
– MySQL, PostgreSQL, Oracle, SQL Server, …
• Groovy DSL
– Liquibase v2 supported only XML
– https://github.com/tlberglund/groovy-liquibase
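The tracking behavior described above can be sketched in plain Groovy. This is a simplified illustration, not Liquibase's implementation: the `ChangeSet` class, the `plan` method, and the in-memory `applied` map (standing in for the tracking table) are hypothetical names, and the checksum here is a plain MD5 of the changeset body rather than Liquibase's own normalized checksum.

```groovy
import java.security.MessageDigest

// Hypothetical stand-in for a changeset; its identity is [author, id, file].
class ChangeSet {
    String author, id, file, body

    String getKey() { "${author}::${id}::${file}" }

    // Simplified checksum: MD5 over the raw body text
    String getChecksum() {
        MessageDigest.getInstance('MD5').digest(body.getBytes('UTF-8')).encodeHex().toString()
    }
}

// Decides what to do with each changeset.
// `applied` maps a changeset key to the checksum recorded when it was executed.
def plan(List<ChangeSet> changelog, Map<String, String> applied) {
    changelog.collectEntries { cs ->
        String recorded = applied[cs.key]
        String action
        if (recorded == null) {
            action = 'RUN'            // never executed against this database
        } else if (recorded == cs.checksum) {
            action = 'SKIP'           // already executed and unchanged
        } else {
            action = 'FAIL_MODIFIED'  // edited after it was applied
        }
        [(cs.key): action]
    }
}
```

Under this model, editing an already-applied changeset surfaces as a checksum mismatch instead of being silently ignored.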
Example Changeset
changeSet(id: '2015-01-23', author: 'John Doe <[email protected]>') {
    createTable(schemaName: 'apps', tableName: 'myapp_version', tablespace: 'ccc_data') {
        column(name: 'version_uid', type: 'VARCHAR(128)')
        column(name: 'type', type: 'VARCHAR(10)')
        column(name: 'owner_uid', type: 'VARCHAR(128)')
        column(name: 'version', type: 'VARCHAR(20)')
        column(name: 'start_date', type: 'TIMESTAMPTZ')
        column(name: 'end_date', type: 'TIMESTAMPTZ')
    }
    addPrimaryKey(constraintName: 'PK_myapp_version',
        schemaName: 'apps', tableName: 'myapp_version',
        tablespace: 'ccc_index', columnNames: 'version_uid')
    addForeignKeyConstraint(constraintName: 'FK_myapp_version_2_owner',
        baseTableSchemaName: 'apps', baseTableName: 'myapp_version',
        baseColumnNames: 'owner_uid',
        referencedTableSchemaName: 'apps', referencedTableName: 'myapp_owner',
        referencedColumnNames: 'owner_uid')
}
Liquibase @ CCC
• Learning curve
– Team needs to understand the underlying model
– Don’t edit changesets once they’ve been applied
• Our standards
– Schema name and tablespace are required
– Parameterize schema name and tablespace
createTable(
    schemaName: dbAppsSchema,
    tableName: 'myapp_version',
    tablespace: dbDataTablespace)
DEVELOPMENT PROCESS
Development Workflow
• Gradle is our SCM hub
– Workstation builds
– LOCAL app servers via command line
– IDE integration
– CI and release builds on Jenkins
• Maintain Gradle-centric workflow
– Integrated database development
Standard Project Structure
• Single Git repo with multi-project Gradle build
myapp
    myapp-db
    myapp-rest
    myapp-service
    myapp-ui
group = com.copyright.myapp
• UI and REST service published as WARs
• DB published as JAR
Custom Gradle Plugin
• Created custom plugin: ccc-postgres
• Standard script location
– Main source set: src/main/liquibase
– Package: com.copyright.myapp.db
• Standard versions
– Liquibase itself
– Postgres JDBC driver
Plugin Extension
• Custom DSL via Gradle extension
cccPostgres {
    mainChangelog = 'com/copyright/myapp/db/main.groovy'
}
• Main changelog includes other changelogs
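A main changelog of this kind might look like the sketch below. `databaseChangeLog` and `include(file: ...)` come from the Liquibase Groovy DSL; the included file names are hypothetical and only illustrate one possible layout (one changelog per release).

```groovy
// com/copyright/myapp/db/main.groovy
databaseChangeLog {
    // Hypothetical per-release changelogs, applied in the order listed
    include(file: 'com/copyright/myapp/db/release-1.0.groovy')
    include(file: 'com/copyright/myapp/db/release-1.1.groovy')
}
```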
Development Lifecycle Tasks
• Provided by ccc-postgres
• Easy to manage LOCAL development database
– Isolated from other developers and deployments
– To pull in new schema changes, run a task
• Built on Gradle Liquibase plugin
https://github.com/tlberglund/gradle-liquibase-plugin
Development Lifecycle Tasks
• Typical developer loop
– gradlew update
– gradlew tomcatRun and/or IDE
• Not just for product development teams
– Simple to run any app
– Architects, QA, Platform Engineering
Development Lifecycle Tasks
Task                 Runs As    Description
createDatabase       postgres   Creates ccc user and database; creates data and index tablespaces
createSchema         ccc        Creates apps schema
update               ccc        Runs main changelog
dropDatabase         postgres   Drops ccc user and database
resetBaseChangelog   postgres   Truncates postgres.public.databasechangelog
• resetBaseChangelog
– Must clear all traces of Liquibase to start over
Plugin Configuration
• Override default library versions
cccPostgres.standardDependencies.postgresDriver
• Defaults point to LOCAL development database
– Can override property values
dbHost, dbPort, dbName
dbUsername, dbPassword
dbDataTablespace, dbIndexTablespace
dbBaseUsername, dbBasePassword
Standardization and Compliance
• So all our teams are authoring DB code
• But Liquibase is new to many
• And we have company standards
• Let’s automate!
Static Analysis
• CodeNarc
– Static analysis of Groovy code
– Allows custom rule sets
• Created a set of custom CodeNarc rules
– Analyze our Liquibase Groovy DSL changelogs
• Apply to our db projects via the Gradle codenarc plugin
– Fail build if violations are found
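One way this wiring could look in a db project's build.gradle is sketched below. The `codenarc` extension and its `configFile` and `ignoreFailures` properties come from Gradle's codenarc plugin. The rule set path is hypothetical, and the sketch assumes that the custom rule classes are already on CodeNarc's classpath and that the changelog directory is registered as a Groovy source directory.

```groovy
apply plugin: 'groovy'
apply plugin: 'codenarc'

// Assumption: changelogs live in src/main/liquibase, as described earlier
sourceSets {
    main {
        groovy { srcDir 'src/main/liquibase' }
    }
}

codenarc {
    // Hypothetical location of the rule set that lists the custom Liquibase rules
    configFile = file('config/codenarc/liquibase-rules.groovy')
    // false is already the plugin default; stated here to make the policy explicit
    ignoreFailures = false
}
```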
Static Analysis – Required Attributes
• Our rule categorizes all change attributes
– Required by Liquibase
  • createTable requires tableName
– Required by CCC
  • createTable requires schemaName and tablespace
– Optional
• Unintended positive consequence!
– Catches typos that otherwise would not be detected until farther downstream
– constrainttName or tablspace
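The categorization idea, and why it catches typos, can be shown without CodeNarc. The sketch below is a plain Groovy illustration, not the actual rule: the `RequiredAttributes` class and its catalog are hypothetical and cover only `createTable` with a single optional attribute.

```groovy
class RequiredAttributes {
    // Hypothetical catalog; the real rule categorizes the attributes of every change
    static final Map CATALOG = [
        createTable: [
            requiredByLiquibase: ['tableName'],
            requiredByCompany  : ['schemaName', 'tablespace'],
            optional           : ['remarks'],
        ],
    ]

    // Returns one message per missing required attribute and per unknown attribute.
    static List<String> violations(String change, Collection<String> attrs) {
        def spec = CATALOG[change]
        if (spec == null) {
            return ["${change}: no attribute catalog defined".toString()]
        }
        def required = spec.requiredByLiquibase + spec.requiredByCompany
        def known = required + spec.optional
        def missing = required.findAll { !(it in attrs) }
            .collect { "${change}: missing required attribute '${it}'".toString() }
        // Because every attribute is categorized, a misspelled name matches no category
        def unknown = attrs.findAll { !(it in known) }
            .collect { "${change}: unknown attribute '${it}'".toString() }
        missing + unknown
    }
}
```

A misspelled `tablspace` is reported twice in this sketch: once because `tablespace` is missing, and once because `tablspace` is not a known attribute.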
Static Analysis – Required Parameterization
• Ensure that schemaName & tablespace are parameterized for future flexibility
@Override
void visitMapExpression(MapExpression mapExpression) {
    mapExpression.mapEntryExpressions
        .findAll { it.keyExpression instanceof ConstantExpression }
        .findAll { ['schemaName', 'tablespace'].contains(it.keyExpression.value) }
        .findAll { it.valueExpression instanceof ConstantExpression }
        .each { addViolation(it, "${it.keyExpression.value} should not be hard-coded") }
    super.visitMapExpression(mapExpression)
}
Schema Spy
• Generates visual representation of database structure
– Requires running database instance
– Requires GraphViz installation
• Custom task runSchemaSpy
– By default, points at LOCAL database
Continuous Integration for DB Scripts
• Compile Groovy
– Catches basic syntax errors
• CodeNarc analysis
– Catches policy and DSL violations
• Integration tests
– Apply Liquibase scripts to H2 in-memory database
– Catches additional classes of error
Release Build
• Publish JAR
– Liquibase Groovy scripts from src/main/liquibase
• META-INF/MANIFEST.MF contains entry point
Name: ccc-postgres
MainChangelog: com/copyright/myapp/db/main.groovy
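How the plugin writes this entry is not shown. One way to produce a named manifest section from a Gradle build is sketched below, using the two-argument `attributes(Map, String)` method of Gradle's manifest API, where the second argument is the section name. The changelog path is hard-coded here for illustration; the plugin presumably takes it from the `cccPostgres` extension.

```groovy
apply plugin: 'java'

jar {
    manifest {
        // Writes "MainChangelog: ..." into a section headed "Name: ccc-postgres"
        attributes(['MainChangelog': 'com/copyright/myapp/db/main.groovy'], 'ccc-postgres')
    }
}
```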
DEPLOY TIME
Deployment Automation
• Early efforts focused on applications themselves
– Jenkins orchestrating Chef runs
– Initial transition from prose instructions to Infrastructure as Code
• Database deployments remained manual
– Better than ad-hoc approach
– But still error prone and time-consuming
Automated Application Deployments
• Chef environment file
– Cookbook versions: which instructions are used
• Chef data bags
– Configuration values for each environment
– Encrypted data bags for (e.g.) database credentials
• Jenkins deploy jobs (a.k.a. “the button”)
– Parameters = environment, application version
Initial Delivery Pipeline
[Diagram label: Manual Deploy]
Initial Delivery Pipeline (DB Deployments)
• Clone Git repo and checkout tag
• Manually configure & run Gradle task from ccc-postgres
gradlew update -PdbHost=testdb.copyright.com -PdbPort=5432 -PdbDatabase=ccc -PdbUsername=ccc -PdbPassword=******
• Many apps × many versions × multiple environments = TIME & EFFORT & ERROR
Target Delivery Pipeline
[Diagram label: Full Stack Automated Deploy]
Target Delivery Pipeline
• Automated process should also update database
– Single Jenkins job for both apps and database scripts
• Maintain data-driven design
– Environment file lists database artifacts
– Controlled flow down the pipeline
• Gradle database deployment task
– Retrieve scripts from Artifactory
– Harvest information already in Chef data bags (URL, password)
– Execute Liquibase
Automated Database Deployment
Jenkins Deploy Job
• One job per application group, per set of deployers
– E.g. myapp.qa allows QA to deploy to environments they own
– Typically contains multiple deployables (apps, db artifacts)
– Typical deployer sets = DEV, QA, OPS
• Executes Liquibase via Gradle for database deployments
– Invokes deployDbArtifact task for each db artifact
• (Executes Chef for application deployments)
Gradle deployDbArtifact Task
• Parameterized via Gradle project properties
– appGroup = myapp
– artifactName = myapp-db
– artifactVersion = 2.1.12
– environment = TEST
• Downloads JAR from Artifactory
– com.copyright.myapp:myapp-db:2.1.12
– Extract MainChangelog value from manifest
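The coordinate assembly and manifest lookup can be sketched with the standard `java.util.jar` API. The method names below are hypothetical, and the `com.copyright.<appGroup>` group convention is inferred from the example coordinates above; the download from Artifactory itself is not shown.

```groovy
import java.util.jar.JarFile
import java.util.jar.Manifest

// Builds the artifact coordinates from the task's project properties.
String coordinates(String appGroup, String artifactName, String artifactVersion) {
    "com.copyright.${appGroup}:${artifactName}:${artifactVersion}"
}

// Reads MainChangelog from the manifest section named 'ccc-postgres'; null if absent.
String mainChangelog(Manifest manifest) {
    manifest.getAttributes('ccc-postgres')?.getValue('MainChangelog')
}

// Convenience wrapper for a JAR that has already been downloaded.
String mainChangelogOf(File jar) {
    def jarFile = new JarFile(jar)
    try {
        mainChangelog(jarFile.manifest)
    } finally {
        jarFile.close()
    }
}
```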
Gradle deployDbArtifact Task
• Retrieves DB URL from Chef data bag item for TEST
"myapp.db.url": "jdbc:postgresql://testdb:5432/ccc"
• Retrieves password from encrypted Chef data bag
– myapp.db.password
• Executes Liquibase
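The URL lookup can be sketched as follows, assuming the data bag item has already been fetched as JSON text. The `dbUrl` method is a hypothetical name, and the `<appGroup>.db.url` key convention is inferred from the single example above.

```groovy
import groovy.json.JsonSlurper

// Looks up "<appGroup>.db.url" in the JSON text of an environment's data bag item.
String dbUrl(String dataBagItemJson, String appGroup) {
    def item = new JsonSlurper().parseText(dataBagItemJson)
    String key = "${appGroup}.db.url"
    def url = item[key]
    if (!url) {
        // Fail early instead of handing Liquibase an empty URL
        throw new IllegalStateException("No ${key} entry in data bag item")
    }
    url
}
```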
Data Bag Access
• Built on top of Chef Java bindings from jclouds
• jclouds has no support for encrypted data bags
• Implemented with Java Cryptography Extensions and the following libs:
compile 'org.apache.jclouds.api:chef:1.7.2'
compile 'org.apache.jclouds.provider:enterprisechef:1.7.2'
compile 'commons-codec:commons-codec:1.9'
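The decryption code itself is not shown. The sketch below is one possible shape, under stated assumptions: it targets the version 1 encrypted data bag format (AES-256-CBC, key derived as the SHA-256 digest of the shared secret, Base64-encoded `iv` and `encrypted_data` fields, plaintext wrapped in a `json_wrapper` JSON object), and it uses `java.util.Base64` from Java 8 where the talk's implementation used commons-codec. The `decryptValue` method is a hypothetical name.

```groovy
import groovy.json.JsonSlurper
import java.security.MessageDigest
import java.util.Base64
import javax.crypto.Cipher
import javax.crypto.spec.IvParameterSpec
import javax.crypto.spec.SecretKeySpec

// Decrypts a single encrypted value (a map with 'encrypted_data' and 'iv' entries).
def decryptValue(Map encrypted, String secret) {
    // Assumption: the AES key is the SHA-256 digest of the shared secret
    byte[] key = MessageDigest.getInstance('SHA-256').digest(secret.getBytes('UTF-8'))
    // The MIME decoder tolerates the line breaks that Base64 encoders may insert
    def decoder = Base64.getMimeDecoder()
    def cipher = Cipher.getInstance('AES/CBC/PKCS5Padding')
    cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, 'AES'),
            new IvParameterSpec(decoder.decode(encrypted.iv as String)))
    byte[] plain = cipher.doFinal(decoder.decode(encrypted.encrypted_data as String))
    // Assumption: the plaintext is JSON of the form {"json_wrapper": <value>}
    new JsonSlurper().parseText(new String(plain, 'UTF-8')).json_wrapper
}
```

On older JVMs, 256-bit AES keys may require the unlimited-strength JCE policy files.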
Push-Button Deploys
[Screenshot: Deploy History, dated 6/10/2015, showing DEV, TEST, PROD]
Automated Deployments By Role
[Chart annotations: Initial Rollout; QA Rising; OPS Falling; QA Overtakes OPS]
EXTENSIBILITY
Additional Scenarios
• Framework originally designed to handle migrations for the schema owned by each application
• Achieved further ROI by handling additional database deployment types with little effort
Roles and Permissions
• An application that manages user roles and permissions (RP) for all other applications
– Has rp-db project to manage its schema, of course
– But every consuming app (e.g. myapp) needs to manage the particular roles and permissions known to it
– Reference data that lives in tables owned by another app
• myapp now has multiple db projects
– myapp-db to manage its schema
– myapp-rp-db to manage its RP reference data
– Both are deployed with new versions of myapp
Roles and Permissions
• Minor addition of conditional logic
if (artifactName.endsWith('-rp-db')) {
    // e.g. myapp-rp-db
    // deploy to RP database
} else {
    // e.g. myapp-db
    // deploy to application's own database
}
• Easy to implement because … Gradle & Groovy
• Conceptual integrity of framework is maintained
WRAP UP
Observations
• Power of convention and consistency
– Once first schemas were automated, dominoes toppled quickly
• Power of flexible tools and building blocks
– Handle legacy complexities, special cases, acquisitions, strategy changes, evolving business conditions
– New database project types fell easily into place
Observations
• Know your tools
– Knowledge (how) has to propagate through the organization
– Ideally, so does the underlying model (why)
• Schema changes no longer restrained by process
“If it hurts, do it more often”
“If it’s easy, do it more often”
Reduced technical debt
Dirty Work …
• Database development and deployment processes are often considered to be unexciting
• But sometimes you need to roll up your sleeves and do the dirty work to realize a vision
• And relational databases are still the bedrock of most of today’s information systems
Dirty Work … Can Be Exciting!
• Efficient processes
• Reliable and extensible automation
• CONTINUOUS DELIVERY
Full Stack Automated Self-Service Deployments
• Reduced workload of Operations team
• Safely empowered individual product teams
• Significantly reduced the DEV-to-TEST time delay
• Reinvested the recouped bandwidth
– More reliable & frequent software releases
– Additional high-value initiatives
Resources
• Liquibase
http://www.liquibase.org
https://github.com/tlberglund/groovy-liquibase
https://github.com/tlberglund/gradle-liquibase-plugin
• Refactoring Databases: Evolutionary Database Design, Ambler and Sadalage (2006)
• Jenkins and Chef: Infrastructure CI and Application Deployment
http://www.slideshare.net/dstine4/jenkins-and-chef-infrastructure-ci-and-automated-deployment
https://www.youtube.com/watch?v=PQ6KTRgAeMU
The word and design marks which appear in this presentation are the trademarks of their respective companies.
Thank You:
Copyright Clearance Center Engineering Team
Gradle Summit Organizers