NON-FUNCTIONAL DEPENDENCIES
Sets, Surrogates, Normalisation, Referential Integrity - the Theory with example Scaling considerations in SQL Server
Tony Rogerson, SQL Server MVP
Tech Articles: http://sqlblogcasts.com/blogs/tonyrogerson
Commentary: http://twitter.com/tonyrogerson
SQL User Group: http://sqlserverfaq.com
Hire Me: http://www.sql-server.co.uk
Who Am I?
• July 86 started in IT on IBM mainframe – nearly 24 years – was 40 last Sat (10th!)
• Went freelance in 98
• PL/1, CICS, System W, VB, VB.NET, C++, Application System
• DB2, SQL Server since 4.21
• Studying on the MSc Business Intelligence course at Dundee Uni (see Mark W ;))
What I wanted to cover
• Surrogates
• Repeating Groups
• Relation Valued Attributes
• Multi Valued Groups
• Normalisation: 1NF, 2NF, 3NF / BCNF, 4NF
• Relations (Tables)
• Set Theory
• Functional Dependencies
• Join Dependencies
• Candidate Keys
• Row by Row processing
• Sub Queries
• Indexing
• Index Intersection
• Indexing the FACT schema
• Indexing the Relational schema
• NULLs
• Referential Integrity (Declarative and Procedural)
• Foreign Keys
• INSTEAD OF triggers
• Inserting into Views
• Relationships
• Decompose
• Domains
This 50 mins…
• Thinking in Sets
• Surrogate Keys
    • What they are
    • Comparison NEWID, NEWSEQUENTIALID, IDENTITY
    • Fragmentation
• Normalisation
    • An introduction – what is it? Why use it?
    • Joins – Pre-filter problems, index intersection
    • Fragmentation again
• Referential Integrity
    • Optimiser -> Query rewrite
    • Locking considerations around Foreign Keys (Declarative RI) and Procedural RI (using Triggers)
What I think you need to know
Thinking in Sets
Functional Dependencies for Performance
Thinking in Sets (Relational Model)
Correct Database Design (Normalisation and Denormalisation)
Indexing Strategy
Thinking in Sets (SQL)
Hardware (Less IO, cheaper kit)
Relational Theory V Objects
Real World has Objects which have Attributes; Attributes themselves can be Objects, which may themselves have Attributes (which may be Objects etc…)
Pub
    has SaleItems
        Drinks has Type (Lager, Wine, Spirit), Price
        Food has Type (Snacks, Starter, Main), Price
    has Identity
        Location … Address … Town etc…
        Name
        Operator (Tenant, Owner, Licensee etc…)
Modelled in .NET
    ' A pub with its identity and the items it has sold
    Public Class Pub
        Dim PubID As Integer
        Dim PubName As String
        Dim PubSales() As SaleItem
    End Class

    Public Class SaleItem
        Dim priv_ItemSold As New Object ' Drink or Food
    End Class

    Public Class Drink
    End Class

    Public Class Food
    End Class
Modelled Relationally
Pubs: PubID, PubName, Town
PubSales: PubSaleID, PubID, SaleTypeItemID, SalePrice
SaleTypeItems: SaleTypeItemID, SaleTypeID, SaleTypeItem
SaleTypes: SaleTypeID, SaleType
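The four tables above can be sketched as a schema. This is a minimal illustration using Python's built-in sqlite3 module as a stand-in for SQL Server; the data types, constraints and sample rows (pub name, town, prices) are assumptions, as is reading the PubSales column as SaleTypeItemID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
CREATE TABLE Pubs (
    PubID   INTEGER PRIMARY KEY,
    PubName TEXT NOT NULL,
    Town    TEXT NOT NULL
);
CREATE TABLE SaleTypes (
    SaleTypeID INTEGER PRIMARY KEY,
    SaleType   TEXT NOT NULL UNIQUE
);
CREATE TABLE SaleTypeItems (
    SaleTypeItemID INTEGER PRIMARY KEY,
    SaleTypeID     INTEGER NOT NULL REFERENCES SaleTypes (SaleTypeID),
    SaleTypeItem   TEXT NOT NULL
);
CREATE TABLE PubSales (
    PubSaleID      INTEGER PRIMARY KEY,
    PubID          INTEGER NOT NULL REFERENCES Pubs (PubID),
    SaleTypeItemID INTEGER NOT NULL REFERENCES SaleTypeItems (SaleTypeItemID),
    SalePrice      REAL NOT NULL
);
-- Hypothetical sample rows, for illustration only
INSERT INTO Pubs VALUES (1, 'DogAndDuck', 'Harpenden');
INSERT INTO SaleTypes VALUES (1, 'Drinks'), (2, 'Food');
INSERT INTO SaleTypeItems VALUES (1, 1, 'Lager'), (2, 2, 'Snacks');
INSERT INTO PubSales VALUES (1, 1, 1, 3.50), (2, 1, 2, 1.20);
""")

# Reassemble the "object" view of a pub's sales by joining the tables.
sales = conn.execute("""
    SELECT p.PubName, t.SaleType, i.SaleTypeItem, s.SalePrice
    FROM PubSales AS s
    JOIN Pubs AS p          ON p.PubID = s.PubID
    JOIN SaleTypeItems AS i ON i.SaleTypeItemID = s.SaleTypeItemID
    JOIN SaleTypes AS t     ON t.SaleTypeID = i.SaleTypeID
    ORDER BY s.PubSaleID
""").fetchall()
```

Where the .NET model nests objects inside objects, the relational model keeps each kind of thing in its own table and rebuilds the nesting with joins.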
What’s a Relation (Table)?
Header:  Attribute      Attribute      Attribute
Body:
  Tuple  Atomic Value   Atomic Value   Atomic Value
  Tuple  Atomic Value   Atomic Value   Atomic Value

Projection:
         Attribute      Attribute
         Atomic Value   Atomic Value
         Atomic Value   Atomic Value
• To be a table it must have at least one candidate key (a key being an attribute, or set of attributes, whose value is unique for every tuple)
• Attribute we know as a Column
• Tuple we know as a Row
• Projection we know as the SELECT clause (the columns you’ve chosen)
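The terms above can be made concrete with a small sketch in Python. The relation is modelled as a header plus a set of tuples; `project` and `is_key` are hypothetical helper names, the sample rows are made up, and `is_key` only checks uniqueness against the rows currently present (a real candidate key must hold for all possible data and be irreducible).

```python
# A relation: a header (attribute names) and a body (a set of tuples).
header = ("PubID", "PubName", "Town")
body = {
    (1, "DogAndDuck", "Harpenden"),
    (2, "RabbitAndCarrot", "Harpenden"),
}

def project(header, body, attributes):
    """Projection: keep only the chosen attributes (the SELECT clause)."""
    positions = [header.index(name) for name in attributes]
    # The result is again a set, so duplicate tuples collapse.
    return tuple(attributes), {tuple(row[p] for p in positions) for row in body}

def is_key(header, body, attributes):
    """True if no two tuples share the same values for these attributes."""
    _, projected = project(header, body, attributes)
    return len(projected) == len(body)

towns = project(header, body, ("Town",))
pub_id_is_key = is_key(header, body, ("PubID",))
town_is_key = is_key(header, body, ("Town",))
```

Projecting on Town leaves a single tuple because a relation is a set, which is also why Town cannot be a key here while PubID can.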
Surrogate Keys
• What they are
• Comparison NEWID, NEWSEQUENTIALID, IDENTITY
• Fragmentation
Surrogate Keys
System Generated value used as the primary key.
NOT a replacement for natural candidate keys.
Value is not exposed outside the system boundary – the user never sees it, because they have no way to verify it.
Aids concurrency with key updates.
Removes overhead with composite key joins.
Surrogate Key Scope (Verification!)
Surrogate key scope: Application Plumbing
Natural key scope: External World
Surrogate Usage
[Diagram: Web Server, Web Server, SQL Server]
Surrogate Keys - Choices
• Random: NEWID, Application generated GUID
• Sequential: IDENTITY, MAX(x)+1, NEWSEQUENTIALID
• Random keys cause poor performance because of disk latency: every insert lands on an arbitrary 8KB page. Yes, the index nodes are probably in the buffer pool, but the leaf page probably isn’t.
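The effect of random versus sequential keys on page splits can be illustrated with a toy model in Python. This is not SQL Server's storage engine: the leaf level is a list of sorted pages, `PAGE_CAPACITY` is a made-up figure, and the model assumes that an insert past the end of the last page simply starts a new page while any other overflow splits the page in half.

```python
import bisect
import random

PAGE_CAPACITY = 8  # hypothetical number of rows per leaf page

def load(keys, capacity=PAGE_CAPACITY):
    """Insert keys one by one; return (page_count, splits, average_fill)."""
    pages = [[]]
    splits = 0
    for key in keys:
        # The first key of every page after the first acts as a separator.
        separators = [page[0] for page in pages[1:]]
        i = bisect.bisect_right(separators, key)
        page = pages[i]
        if len(page) < capacity:
            bisect.insort(page, key)
        elif i == len(pages) - 1 and key > page[-1]:
            # Ever-increasing key: start a fresh page at the end.
            pages.append([key])
        else:
            # Key lands in the middle of a full page: 50/50 split.
            bisect.insort(page, key)
            half = len(page) // 2
            pages[i:i + 1] = [page[:half], page[half:]]
            splits += 1
    fill = sum(len(p) for p in pages) / (len(pages) * capacity)
    return len(pages), splits, fill

rng = random.Random(42)
sequential_keys = list(range(1000))                      # IDENTITY-like
random_keys = [rng.getrandbits(128) for _ in range(1000)]  # NEWID-like

seq_pages, seq_splits, seq_fill = load(sequential_keys)
rnd_pages, rnd_splits, rnd_fill = load(random_keys)
```

In this model the sequential load never splits and leaves every page full, while the random load splits repeatedly and ends up with more, partly empty pages, which is the fragmentation the comparison is about.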
Demo
INSERT comparison: NEWID, NEWSEQUENTIALID, IDENTITY
Normalisation
• An Introduction – what is it? Why use it?
• Joins – Pre-filter problems, index intersection
• Fragmentation
Normalisation (Dependency Theory)
• Removes redundancy – reduces size of stored data
• Joins, Joins and more damn joins!
• Referential Integrity – more tables, more FK’s: that helps the Optimiser, but slows INSERTS/UPDATES and causes fragmentation
• The Join (lookup) problem – filter then Join (try and get it to Merge Join)
1NF
To be a table it needs a key to make the tuples (rows) unique
All attribute (column) names need to be unique
The row/column intersection value needs to be Atomic
Get rid of Repeating Groups
What’s a Repeating Group?
An Attribute (column) that occurs multiple times in the table, where every occurrence draws on the same Domain (set of values) and a given domain value can appear in any of the occurrences.
Example of Repeating Group
Session#         Attendee1  Attendee2  Attendee3  Attendee4
NonFuncDepend    Rosie      Ester      Hazel      Joe
DeNormalisation  Ester      Poppy      Hazel      Joe

The attendee Ester in session NonFuncDepend can be recorded in any of Attendee1 to 4 – it doesn’t matter which one

Session#         Attendee#
NonFuncDepend    Rosie
NonFuncDepend    Ester
DeNormalisation  Ester
Collapse the repeated attributes (the group) into its own table.
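Collapsing the repeating group can be sketched as follows, using Python's sqlite3 module as a stand-in for SQL Server. The table names `SessionWide` and `SessionAttendee` are made up for the illustration; the rows are the ones from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The repeating group: four interchangeable attendee columns
CREATE TABLE SessionWide (
    Session   TEXT PRIMARY KEY,
    Attendee1 TEXT, Attendee2 TEXT, Attendee3 TEXT, Attendee4 TEXT
);
INSERT INTO SessionWide VALUES
    ('NonFuncDepend',   'Rosie', 'Ester', 'Hazel', 'Joe'),
    ('DeNormalisation', 'Ester', 'Poppy', 'Hazel', 'Joe');

-- The group collapsed into its own table: one row per attendee per session
CREATE TABLE SessionAttendee (
    Session  TEXT NOT NULL,
    Attendee TEXT NOT NULL,
    PRIMARY KEY (Session, Attendee)
);
INSERT INTO SessionAttendee (Session, Attendee)
    SELECT Session, Attendee1 FROM SessionWide WHERE Attendee1 IS NOT NULL
    UNION
    SELECT Session, Attendee2 FROM SessionWide WHERE Attendee2 IS NOT NULL
    UNION
    SELECT Session, Attendee3 FROM SessionWide WHERE Attendee3 IS NOT NULL
    UNION
    SELECT Session, Attendee4 FROM SessionWide WHERE Attendee4 IS NOT NULL;
""")

row_count = conn.execute("SELECT COUNT(*) FROM SessionAttendee").fetchone()[0]

# One predicate finds Ester, whichever "slot" she was originally stored in.
ester_sessions = [row[0] for row in conn.execute(
    "SELECT Session FROM SessionAttendee WHERE Attendee = ? ORDER BY Session",
    ("Ester",),
)]
```

Against the wide table the same question needs a predicate per Attendee column, and a fifth attendee would need a schema change rather than an extra row.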
Repeating Group?? / RVA??
Session#         DBA’s   Developer  Analyst  Project Manager
NonFuncDepend    Rosie   Ester      Hazel    Joe
DeNormalisation  Ester   Poppy      Hazel    Joe

The Attributes modelled are Attendees and Job Roles; domain values exist for both (attendees) and (job roles), and each Session has {JobRole, Attendee}

Session#         Job Role#   Attendee#
NonFuncDepend    DBA’s       Rosie
NonFuncDepend    Developer   Ester
DeNormalisation  DBA’s       Ester
Collapse into its own table.
NOT a Repeating Group
Attendee#  Address1  Address2   Address3  Town
Rosie      Torver    26 Moorla            Harpenden
Ester      Torver    26 Moorla            Harpenden

Address 1 – 3 are separate domains, each with a completely different set of values

Attendee#  AddressLineNumb#  AddressLine
Rosie      1                 Torver
Rosie      2                 26 Moorla
Ester      1                 Torver
Collapse the repeated attributes (the group) into its own table???
2NF – Sort the Keys out
Remove non-key columns not related to the entire primary key
Check Functional Dependencies to find columns to decompose
2NF - Example
Session#         Presenter#  Attendee#  SessionRoom
NonFuncDepend    Tony        Rosie      DogAndDuck
NonFuncDepend    Tony        Ester      DogAndDuck
DeNormalisation  Mark        Ester      RabbitAndCarrot

Table: SessionJobRoles
Session#         Presenter#  Attendee#
NonFuncDepend    DBA’s       Rosie
NonFuncDepend    Developer   Ester
DeNormalisation  DBA’s       Ester

Table: Session
Session#         SessionRoom
NonFuncDepend    DogAndDuck
DeNormalisation  RabbitAndCarrot

• SessionRoom is functionally dependent on Session# (Session# -> SessionRoom)
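Checking a functional dependency against sample data can be sketched in a few lines of Python. `fd_holds` is a hypothetical helper, and the rows are the 2NF example rows from the slide; a check like this can only disprove a dependency from the data at hand, it cannot prove that the business rule holds.

```python
def fd_holds(rows, determinant, dependent):
    """Return True if determinant -> dependent holds for every row given."""
    seen = {}
    for row in rows:
        key = tuple(row[column] for column in determinant)
        value = tuple(row[column] for column in dependent)
        # The first value seen for a key must match every later one.
        if seen.setdefault(key, value) != value:
            return False
    return True

rows = [
    {"Session": "NonFuncDepend", "Presenter": "Tony", "Attendee": "Rosie",
     "SessionRoom": "DogAndDuck"},
    {"Session": "NonFuncDepend", "Presenter": "Tony", "Attendee": "Ester",
     "SessionRoom": "DogAndDuck"},
    {"Session": "DeNormalisation", "Presenter": "Mark", "Attendee": "Ester",
     "SessionRoom": "RabbitAndCarrot"},
]

# SessionRoom depends on only part of the key, so it belongs in its own table.
session_determines_room = fd_holds(rows, ["Session"], ["SessionRoom"])
attendee_determines_room = fd_holds(rows, ["Attendee"], ["SessionRoom"])
```

Session alone determines SessionRoom, whereas Attendee does not (Ester appears in two rooms), which is the signal to decompose SessionRoom out to a table keyed on Session.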
3NF/BCNF – Sort the Body out
• Columns must be dependent on the whole key and nothing but the key, so help me Codd.
• Decompose FD’s in the table Body
3NF - Example
Attendee#  Registered  From     URL                      Email
Rosie      2010-04-01  SQLUG    http://sqlserverfaq.com
Ester      2010-03-21  SQLUG    http://sqlserverfaq.com
Poppy      2010-04-10  SQLBITS  http://sqlbits.com       [email protected]

From#    URL
SQLUG    http://sqlserverfaq.com
SQLBITS  http://sqlbits.com

• Attendee# -> {Registered, From, URL, Email}
• From -> {URL}

Attendee#  Registered  From     Email
Rosie      2010-04-01  SQLUG    [email protected]
Ester      2010-03-21  SQLUG    [email protected]
Poppy      2010-04-10  SQLBITS
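The 3NF decomposition above can be sketched with Python's sqlite3 module as a stand-in for SQL Server. The table names are made up, the From column is renamed `Source` because FROM is a reserved word, and the Email column is left out of the illustration.

```python
import sqlite3

# The rows before decomposition: URL depends on Source, not on Attendee.
original = [
    ("Rosie", "2010-04-01", "SQLUG", "http://sqlserverfaq.com"),
    ("Ester", "2010-03-21", "SQLUG", "http://sqlserverfaq.com"),
    ("Poppy", "2010-04-10", "SQLBITS", "http://sqlbits.com"),
]

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE RegisteredFrom (
    Source TEXT PRIMARY KEY,
    URL    TEXT NOT NULL
);
CREATE TABLE Attendee (
    Attendee   TEXT PRIMARY KEY,
    Registered TEXT NOT NULL,
    Source     TEXT NOT NULL REFERENCES RegisteredFrom (Source)
);
""")

# Source -> URL is stored once per source; duplicates are ignored.
conn.executemany(
    "INSERT OR IGNORE INTO RegisteredFrom VALUES (?, ?)",
    [(source, url) for _, _, source, url in original],
)
conn.executemany(
    "INSERT INTO Attendee VALUES (?, ?, ?)",
    [(attendee, registered, source) for attendee, registered, source, _ in original],
)

source_count = conn.execute("SELECT COUNT(*) FROM RegisteredFrom").fetchone()[0]

# Joining the two tables gives back exactly the original rows.
rejoined = conn.execute("""
    SELECT a.Attendee, a.Registered, a.Source, f.URL
    FROM Attendee AS a
    JOIN RegisteredFrom AS f ON f.Source = a.Source
""").fetchall()
```

The URL for SQLUG is now held in one row rather than once per attendee, and the join shows that nothing was lost by splitting the table.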
Denormalisation
Deliberately left NULL
See Mark Whitehorn and Yasmin Ahmad session “Denormalisation – having your cake and eating it”
This room after lunch
Demo
• Joins
• Pre-Filter problem
• Index intersection
Referential Integrity
• Introduction
• Optimiser: Query Rewrite
• Locking Considerations around Foreign Keys (Declarative RI) and Procedural RI (using Triggers)
Referential Integrity Is..
• Declarative RI: Done with CONSTRAINTS e.g. FK, CHECK
• Procedural RI: Done with Triggers (coded in a procedure)
• DRI gives information to the optimiser, such as the permitted values (CHECK CONSTRAINT) or whether a row exists in another table (FK CONSTRAINT)
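The two styles of RI can be put side by side in a small sketch. It uses Python's sqlite3 module rather than SQL Server, so the trigger syntax differs from T-SQL and nothing here says anything about SQL Server's locking or optimiser behaviour; the table, trigger and function names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
CREATE TABLE Session (SessionName TEXT PRIMARY KEY);
INSERT INTO Session VALUES ('NonFuncDepend');

-- Declarative RI: the rule is a FOREIGN KEY constraint
CREATE TABLE AttendeeDRI (
    Attendee    TEXT NOT NULL,
    SessionName TEXT NOT NULL REFERENCES Session (SessionName)
);

-- Procedural RI: the same rule, hand-coded in a trigger
CREATE TABLE AttendeePRI (
    Attendee    TEXT NOT NULL,
    SessionName TEXT NOT NULL
);
CREATE TRIGGER AttendeePRI_check
BEFORE INSERT ON AttendeePRI
FOR EACH ROW
WHEN NOT EXISTS (SELECT 1 FROM Session WHERE SessionName = NEW.SessionName)
BEGIN
    SELECT RAISE(ABORT, 'unknown session');
END;
""")

def try_insert(table, attendee, session_name):
    """Return True if the row is accepted, False if RI rejects it."""
    try:
        conn.execute(
            f"INSERT INTO {table} (Attendee, SessionName) VALUES (?, ?)",
            (attendee, session_name),
        )
        return True
    except sqlite3.DatabaseError:
        return False

dri_valid = try_insert("AttendeeDRI", "Rosie", "NonFuncDepend")
dri_orphan = try_insert("AttendeeDRI", "Rosie", "NoSuchSession")
pri_valid = try_insert("AttendeePRI", "Rosie", "NonFuncDepend")
pri_orphan = try_insert("AttendeePRI", "Rosie", "NoSuchSession")
```

Both tables reject the orphan row, but only the declarative version states the rule in the schema where an optimiser could read it; the trigger is just code as far as the engine is concerned.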
Demo
• RI Optimiser rewrite
• Foreign Keys and Locking
• Triggers and Locking – using for RI and READ_COMMITTED_SNAPSHOT
DRI (FK’s) - Locking
• Be wary of blocking from FK look ups
• Unaffected by READ_COMMITTED_SNAPSHOT
• Uses Serialisable ISOLATION
• FK’s have a benefit - Optimiser can use them
PRI (Triggers) - Locking
• As per the rest of your queries – affected by the default isolation level for your database
• READ_COMMITTED_SNAPSHOT causes a problem
• Remember: Last good committed value returned rather than reader being blocked by the writer.
• DRI is somewhat limited when faced with today’s more complex business data relationship rules
Where are we with this then?
Summary
Further Reading
Mark Whitehorn and Yasmin Ahmad session after lunch
C J Date Edinburgh seminar on 13th/14th May http://www.justsql.co.uk/chris_date/cjd_edin_may_2010.htm
Live Meetings starting May – see http://sqlserverfaq.com