November 2006 Sreenivas Ananthakrishna 1
Database Refactoring
An introduction to
Refactoring Databases: Evolutionary Database Design (Ambler and Sadalage)
Agenda
► What is database refactoring about?
► Evolutionary database development techniques
► Refactoring Strategies
► Classification of refactorings and examples
What is database refactoring about?
► Improving database design
► Making small and incremental changes to the schema
► Maintain existing information and behaviour
► Functionality is not added/removed
► Not just limited to the database, but also the applications that use it
A simple example…
[Diagram: the balance column exists on both Customer and Account during a transition period and is kept in sync by triggers]
Customer: customerId <<PK>>, name, balance {drop date = <date>}
Account: accountId <<PK>>, customerId <<FK>>, balance
Relationship: Customer accesses Account
Triggers: SynchronizeAccountBalance and SynchronizeCustomerBalance, each {event = on update | on delete | on insert, drop date = <date>}
App A and App B: each calls maintainbalance()
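The transition period in this example can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions, not the presenter's implementation: it assumes one account per customer, covers only the "on update" event (not insert or delete), and the column types are invented. The table, column and trigger names come from the diagram; `build_transition_schema` is a hypothetical helper name.

```python
import sqlite3


def build_transition_schema() -> sqlite3.Connection:
    """Create both tables plus the two synchronisation triggers (update only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Customer (
            customerId INTEGER PRIMARY KEY,
            name       TEXT,
            balance    NUMERIC   -- old location, removed on the drop date
        );
        CREATE TABLE Account (
            accountId  INTEGER PRIMARY KEY,
            customerId INTEGER REFERENCES Customer(customerId),
            balance    NUMERIC   -- new location
        );

        -- Old applications still write Customer.balance: copy it forward.
        CREATE TRIGGER SynchronizeAccountBalance
        AFTER UPDATE OF balance ON Customer
        BEGIN
            UPDATE Account SET balance = NEW.balance
            WHERE customerId = NEW.customerId
              AND balance IS NOT NEW.balance;  -- guard stops the triggers cycling
        END;

        -- New applications write Account.balance: copy it back.
        CREATE TRIGGER SynchronizeCustomerBalance
        AFTER UPDATE OF balance ON Account
        BEGIN
            UPDATE Customer SET balance = NEW.balance
            WHERE customerId = NEW.customerId
              AND balance IS NOT NEW.balance;  -- guard stops the triggers cycling
        END;
        """
    )
    return conn


if __name__ == "__main__":
    conn = build_transition_schema()
    conn.execute("INSERT INTO Customer VALUES (1, 'Ann', 100)")
    conn.execute("INSERT INTO Account VALUES (10, 1, 100)")
    # "App A" (not yet refactored) updates the old column.
    conn.execute("UPDATE Customer SET balance = 150 WHERE customerId = 1")
    print(conn.execute("SELECT balance FROM Account").fetchone())
```

The `IS NOT` guard in each trigger means the second trigger finds nothing left to change, which is one way to avoid the cyclic-trigger problem. On the drop date, both triggers and the old column would be removed.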
Why refactor?
► Data models built upfront tend to be complex and need cleaning
► Maintain consistency between application domain and data model
► Address performance requirements
► Identify and eliminate db smells
Database Smells
► Multipurpose Column – e.g. one column holding a customer's date of birth and an employee's start date
► Multipurpose Table – e.g. Customer table holding both persons and corporations
► Redundant Data – same information in different tables
► Table with too many columns – e.g. Customer with many addresses
► Table with too many rows
► Smart columns – e.g. data has positional context
► Fear of change – too risky to change schema, time to refactor!
Evolutionary Database Development
► Evolve data models vs upfront design
► Database regression testing
► Configuration management of database artifacts
► Developer Sandboxes
Database regression testing
► Test the schema
    - Check logic in stored procedures and triggers
    - Test check and referential constraints
    - View definitions
    - Default Values and Invariants
► Test application code
    - Unit tests around application code which queries the db.
► Test data migration
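As a rough illustration of schema-level regression tests, the sketch below checks a default value, a check constraint and a referential constraint against an in-memory SQLite database. The schema (the `status` column, the non-negative balance rule) and all helper names are assumptions made up for this example.

```python
import sqlite3

# Hypothetical schema under test.
SCHEMA = """
CREATE TABLE Customer (
    customerId INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    status     TEXT NOT NULL DEFAULT 'active'
);
CREATE TABLE Account (
    accountId  INTEGER PRIMARY KEY,
    customerId INTEGER NOT NULL REFERENCES Customer(customerId),
    balance    NUMERIC NOT NULL DEFAULT 0 CHECK (balance >= 0)
);
"""


def fresh_db() -> sqlite3.Connection:
    """Each test gets its own empty database, like a tiny sandbox."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    return conn


def violates_constraint(conn, sql, params=()) -> bool:
    """Return True if the statement is rejected by a constraint."""
    try:
        conn.execute(sql, params)
    except sqlite3.IntegrityError:
        return True
    return False


def test_default_value():
    conn = fresh_db()
    conn.execute("INSERT INTO Customer (customerId, name) VALUES (1, 'Ann')")
    assert conn.execute("SELECT status FROM Customer").fetchone()[0] == "active"


def test_check_constraint():
    conn = fresh_db()
    conn.execute("INSERT INTO Customer (customerId, name) VALUES (1, 'Ann')")
    assert violates_constraint(
        conn, "INSERT INTO Account VALUES (10, 1, ?)", (-5,)
    )


def test_referential_constraint():
    conn = fresh_db()
    # Customer 99 does not exist, so the foreign key must reject this row.
    assert violates_constraint(conn, "INSERT INTO Account VALUES (10, 99, 0)")
```

The functions follow the `test_` naming convention, so a runner such as pytest would pick them up; they can also be called directly.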
Config management of DB Artifacts
► Schema creation scripts
► Data loading/migration scripts
► Reference data
► Stored procedures
► View definitions
► Test data
► Regression Tests
Developer Sandboxes
Database Refactoring Strategies
► Apply small changes
    - Small changes allow easy/early detection of errors
► Identify Individual Refactorings
    - Instead of doing “move column” and “rename column” in one go, version each individually.
► Create database configuration table
    - Helps identify current version of the database and can be used in migrations.
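One possible shape for a database configuration table, with each small refactoring versioned individually, is sketched below. The `DatabaseConfiguration` table, its `schemaVersion` column, the `MIGRATIONS` list and both function names are assumptions for illustration, not something the slides prescribe.

```python
import sqlite3

# Hypothetical migrations: one small refactoring per version number.
MIGRATIONS = [
    (1, "CREATE TABLE Customer (customerId INTEGER PRIMARY KEY, name TEXT)"),
    (2, "ALTER TABLE Customer ADD COLUMN email TEXT"),
    (3, "CREATE INDEX idx_customer_email ON Customer (email)"),
]


def current_version(conn: sqlite3.Connection) -> int:
    """Read the schema version, creating the configuration table on first use."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS DatabaseConfiguration "
        "(schemaVersion INTEGER NOT NULL)"
    )
    row = conn.execute("SELECT schemaVersion FROM DatabaseConfiguration").fetchone()
    if row is None:
        conn.execute("INSERT INTO DatabaseConfiguration VALUES (0)")
        return 0
    return row[0]


def migrate(conn: sqlite3.Connection, target=None) -> int:
    """Apply every pending migration up to `target` (default: all of them)."""
    version = current_version(conn)
    for number, sql in MIGRATIONS:
        if number <= version or (target is not None and number > target):
            continue
        conn.execute(sql)
        # Record the new version right after each individual change.
        conn.execute(
            "UPDATE DatabaseConfiguration SET schemaVersion = ?", (number,)
        )
        conn.commit()
        version = number
    return version


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(migrate(conn, target=2))  # apply versions 1 and 2 only
    print(migrate(conn))            # apply whatever is still pending
```

Because the version is stored in the database itself, running `migrate` again is a no-op, and different sandboxes can sit at different versions.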
Database Refactoring Strategies (contd.)
► Determine synchronization strategies during transition period
    - Triggers do real-time updates but might have performance impacts.
    - Views might not support updates but do not move data
    - Batch sync can be used during non-peak loads but might have to deal with multiple updates
► Encapsulate Database Access
    - Abstract database access, e.g. by using persistence frameworks
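The view-based strategy can be illustrated with a Rename Table refactoring: the table gets its new name, and a view under the old name keeps existing readers working without copying any data. This is a sketch under assumptions: `rename_table_with_view` and the table names are hypothetical, and identifiers are interpolated directly, so they must come from trusted code.

```python
import sqlite3


def rename_table_with_view(conn: sqlite3.Connection, old_name: str, new_name: str) -> None:
    """Rename a table and leave a read-only view behind under the old name."""
    # Identifiers cannot be bound as parameters; they are assumed to be trusted.
    conn.execute(f'ALTER TABLE "{old_name}" RENAME TO "{new_name}"')
    conn.execute(f'CREATE VIEW "{old_name}" AS SELECT * FROM "{new_name}"')


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Cust (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO Cust VALUES (1, 'Ann')")

    rename_table_with_view(conn, "Cust", "Customer")

    # Old readers keep working through the view; no data was moved.
    print(conn.execute("SELECT name FROM Cust").fetchall())

    # Writes through the plain view are rejected in SQLite, which is the
    # "views might not support updates" trade-off.
    try:
        conn.execute("INSERT INTO Cust VALUES (2, 'Bob')")
    except sqlite3.Error as exc:
        print("write via old name failed:", exc)
```

If old applications also need to write during the transition, the view is not enough on its own and a trigger-based or batch approach would be needed instead.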
Database Refactoring Classification
► Structural
► Data Quality
► Referential
► Architectural
► Method
Structural Refactorings
Related to structure of Tables, Views
e.g. Move Column, Rename Table, Split Table, Merge Column
Issues to consider when implementing:
► Cyclic Triggers
► Broken Views, Procedures, Triggers
► Transition period in multi-application setup
Introduce Surrogate Key
“Replace an existing natural key with a surrogate key”
► Motivations
    - Reduce coupling between schema and business domain
    - Increase consistency by having a uniform key strategy
    - Improve performance by having index based on simpler key
► Potential Tradeoffs
    - Surrogate keys are not suitable for all situations
    - Introducing a new key might require further key consolidation and more effort
Introduce Surrogate Key (contd.)
[Diagram: Order contains OrderItem; a surrogate orderId is introduced alongside the natural keys]
Order: customerNumber <<PK>> <<FK>> <<Natural>>, storeId <<PK>> <<Natural>>, orderId <<PK>> <<surrogate>>
OrderItem: customerNumber <<PK>> <<FK>> <<Natural>>, storeId <<PK>> <<Natural>>, orderItemNumber <<PK>>, orderId <<FK>> <<surrogate>>
Trigger: PopulateOrderId {event = on insert, drop date = <date>}
Elements scheduled for removal are marked {drop date = <date>}
Data Quality Refactorings
Related to improving quality of information in db
e.g. Add Lookup Table, Introduce column constraint, Introduce common format
Issues to consider when implementing:
► Constraint violations
► Broken logic in procedures
► Broken where clauses in Views
► Updating large amounts of data
Add Lookup Table
“Create a lookup table for an existing column”
► Motivations
    - Introduce referential integrity for a column
    - Provide code lookup (move enum to the db)
    - Replace column constraint with set of expected values in lookup table
► Potential Tradeoffs
    - Identifying the data to populate (especially for multiple apps)
    - Possible performance impact due to additional joins
Add Lookup Table (contd.)
[Diagram: Address (Street, State <<FK>>, PostCode) references the new lookup table State (State <<PK>>, Name)]
1. Identify the column
2. Create Lookup Table
3. Populate Data
4. Introduce FK constraint
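The four steps can be walked through against SQLite as a rough sketch. Table and column names follow the diagram, while the column types and the `add_state_lookup` helper are assumptions. SQLite cannot add a foreign key to an existing table, so step 4 is done here by rebuilding `Address`; other databases would typically use an `ALTER TABLE ... ADD CONSTRAINT` statement instead.

```python
import sqlite3


def add_state_lookup(conn: sqlite3.Connection) -> None:
    """Add a State lookup table for the existing Address.State column (step 1)."""
    conn.executescript(
        """
        -- Step 2: create the lookup table.
        CREATE TABLE State (
            State TEXT PRIMARY KEY,
            Name  TEXT              -- left NULL; real names come from reference data
        );

        -- Step 3: populate it from the values already in use.
        INSERT INTO State (State)
            SELECT DISTINCT State FROM Address WHERE State IS NOT NULL;

        -- Step 4: introduce the FK constraint (table rebuild, SQLite-specific).
        CREATE TABLE Address_new (
            Street   TEXT,
            State    TEXT REFERENCES State(State),
            PostCode TEXT
        );
        INSERT INTO Address_new SELECT Street, State, PostCode FROM Address;
        DROP TABLE Address;
        ALTER TABLE Address_new RENAME TO Address;
        """
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # the caller must enable FK checks
    conn.execute("CREATE TABLE Address (Street TEXT, State TEXT, PostCode TEXT)")
    conn.executemany(
        "INSERT INTO Address VALUES (?, ?, ?)",
        [("1 Main St", "VIC", "3000"), ("2 High St", "NSW", "2000")],
    )
    add_state_lookup(conn)
    print(conn.execute("SELECT State FROM State ORDER BY State").fetchall())
```

Populating from existing values only works if the current data is already clean; with several applications writing to the column, the set of valid codes usually has to be agreed first.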
Referential Integrity Refactorings
Changes that improve referential integrity of data
e.g. Add Foreign Key Constraint, Introduce cascading delete, Introduce trigger for history
Issues to consider when implementing:
► Fix broken CRUD logic in procedure
► Data cleansing to make new constraints work
Introduce Cascading Delete
“Delete the child record(s) when the parent is deleted”
► Motivations
    - Preserve referential integrity of the parent/child rows
    - Remove responsibility for child deletion in the application
► Potential Tradeoffs
    - Deadlock?
    - Trigger accidental mass deletion when deleting root nodes
    - Duplicate functionality is introduced when using persistence frameworks like Hibernate/Toplink
Introduce Cascading Delete (contd.)
[Diagram: Policy (PolicyId <<PK>>) is the parent of Claim (ClaimId <<PK>>, PolicyId <<FK>>); trigger DeleteClaim {event = on delete}]
1. Identify the column
2. Choose cascading mechanism (triggers or using cascade clause during constraint creation)
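Of the two mechanisms, the cascade clause is the shorter one to sketch. The example below uses the Policy and Claim tables from the diagram with SQLite's `ON DELETE CASCADE`; the column types and the `create_policy_schema` helper are assumptions for illustration. A trigger such as DeleteClaim would be the alternative where the cascade clause is not available.

```python
import sqlite3


def create_policy_schema() -> sqlite3.Connection:
    """Create Policy and Claim with a cascading foreign key."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # required for cascades in SQLite
    conn.executescript(
        """
        CREATE TABLE Policy (
            PolicyId INTEGER PRIMARY KEY
        );
        CREATE TABLE Claim (
            ClaimId  INTEGER PRIMARY KEY,
            PolicyId INTEGER NOT NULL
                     REFERENCES Policy(PolicyId) ON DELETE CASCADE
        );
        """
    )
    return conn


if __name__ == "__main__":
    conn = create_policy_schema()
    conn.executemany("INSERT INTO Policy VALUES (?)", [(1,), (2,)])
    conn.executemany("INSERT INTO Claim VALUES (?, ?)", [(10, 1), (11, 1), (12, 2)])

    # Deleting the parent removes its claims; the application no longer has to.
    conn.execute("DELETE FROM Policy WHERE PolicyId = 1")
    print(conn.execute("SELECT ClaimId FROM Claim").fetchall())
```

The same convenience is what makes accidental mass deletion possible: a single delete on a root row silently removes every dependent row.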
Architectural Refactorings
Changes that improve performance, portability and define the architecture within the database
e.g. Encapsulate Table with View, Introduce Calculation Method, Replace Method(s) with View, Introduce trigger for history
Issues to consider when implementing:
► Performance vs Data redundancy
► Keeping business logic in the application vs database
Introduce Index
“Introduce a unique or non-unique Index”
► Motivations
    - Increase performance of read queries
► Potential Tradeoffs
    - Too many indexes degrade performance during insert/update/deletes
    - Existing data containing duplicates might need cleansing when introducing unique indexes
Introduce Index (contd.)
[Diagram: Customer (CustomerId <<PK>>, TFN <<AK>>, Name) with an <<index>> on TFN]
1. Determine type of index – unique vs non-unique
2. Eliminate duplicate rows when using unique index
3. Add a new index
4. Add more disk space for index maintenance
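Steps 1 to 3 might look like the following sketch on the Customer table from the diagram. The helper names, the index name `idx_customer_tfn` and the column types are assumptions. Rather than deleting duplicates automatically, the sketch reports them and refuses to build a unique index until they have been cleansed.

```python
import sqlite3


def find_duplicate_tfns(conn: sqlite3.Connection):
    """Step 2 (detection): list TFN values that occur more than once."""
    return conn.execute(
        """
        SELECT TFN, COUNT(*) FROM Customer
        WHERE TFN IS NOT NULL
        GROUP BY TFN
        HAVING COUNT(*) > 1
        ORDER BY TFN
        """
    ).fetchall()


def introduce_tfn_index(conn: sqlite3.Connection, unique: bool = True) -> None:
    """Steps 1 and 3: choose the index type, then add the index on Customer.TFN."""
    if unique:
        duplicates = find_duplicate_tfns(conn)
        if duplicates:
            # Cleansing is a data decision, so it is left to the caller.
            raise ValueError(f"cleanse duplicate TFNs first: {duplicates}")
    kind = "UNIQUE INDEX" if unique else "INDEX"
    conn.execute(f"CREATE {kind} idx_customer_tfn ON Customer (TFN)")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Customer (CustomerId INTEGER PRIMARY KEY, TFN TEXT, Name TEXT)"
    )
    conn.executemany(
        "INSERT INTO Customer VALUES (?, ?, ?)",
        [(1, "111", "Ann"), (2, "111", "Ann B"), (3, "222", "Bob")],
    )
    print(find_duplicate_tfns(conn))
    conn.execute("DELETE FROM Customer WHERE CustomerId = 2")  # manual cleansing
    introduce_tfn_index(conn, unique=True)
```

Once the unique index exists, the database itself rejects any later insert that repeats a TFN.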
Method Refactorings
Changes that improve code representing stored procedures, functions and triggers
e.g. Rename Method, Reorder Parameters, Replace literal with Table Lookup
Issues to consider when implementing:
► Broken triggers, procedures, functions
► Tool support
Refactoring Tools
► Schema Migration – Rails Migration, Sundog
► Unit Testing – JUnit, DBUnit
► Refactor Stored Procedures – SQLRefactor (SQL Server only)