Non-Functional Dependencies

Post on 25-Feb-2016


Tony Rogerson, SQL Server MVP
Tech Articles: http://sqlblogcasts.com/blogs/tonyrogerson
Commentary: http://twitter.com/tonyrogerson
SQL User Group: http://sqlserverfaq.com
Hire Me: http://www.sql-server.co.uk


NON-FUNCTIONAL DEPENDENCIES

Sets, Surrogates, Normalisation, Referential Integrity - the Theory with example Scaling considerations in SQL Server


Who Am I?

• July 86 started in IT on IBM mainframe - nearly 24 years - was 40 last Sat (10th!)
• Went freelance in 98
• PL/1, CICS, System W, VB, VB.NET, C++, Application System
• DB2, SQL Server since 4.21
• Studying on the MSc Business Intelligence course at Dundee Uni (see Mark W ;))

What I wanted to cover

• Surrogates
• Repeating Groups
• Relation Valued Attributes
• Multi Valued Groups
• Normalisation
• 1NF 2NF 3NF / BCNF 4NF
• Relations (Tables)
• Set Theory
• Functional Dependencies
• Join Dependencies
• Candidate Keys
• Row by Row processing
• Sub Queries
• Indexing
• Index Intersection
• Indexing the FACT schema
• Indexing the Relational schema
• NULLs
• Referential Integrity (Declarative and Procedural)
• Foreign Keys
• INSTEAD OF triggers
• Inserting into Views
• Relationships
• Decompose
• Domains

This 50 mins…

• Thinking in Sets
• Surrogate Keys
  • What they are
  • Comparison NEWID, NEWSEQUENTIALID, IDENTITY
  • Fragmentation
• Normalisation
  • An introduction - what is it? Why use it?
  • Joins - Pre-filter problems, index intersection
  • Fragmentation again
• Referential Integrity
  • Optimiser -> Query rewrite
  • Locking considerations around Foreign Keys and Procedural RI (using Triggers)

What I think you need to know

Thinking in Sets

Functional Dependencies for Performance

• Thinking in Sets (Relational Model)
• Correct Database Design (Normalisation and Denormalisation)
• Indexing Strategy
• Thinking in Sets (SQL)
• Hardware (less IO, cheaper kit)

Relational Theory V Objects

Real World has Objects which have Attributes. Attributes themselves can be Objects, which may themselves have Attributes (which may be Objects etc…)

Pub
• has SaleItems
  • Drinks has Type (Lager, Wine, Spirit), Price
  • Food has Type (Snacks, Starter, Main), Price
• has Identity
  • Location … Address … Town etc…
  • Name
  • Operator (Tenant, Owner, Licensee etc…)

Modelled in .NET

    Public Class Pub
        Dim PubID As Integer
        Dim PubName As String
        Dim PubSales() As SaleItem
    End Class

    Public Class SaleItem
        ' Holds either a Drink or a Food instance
        Dim priv_ItemSold As New Object
    End Class

    Public Class Drink
    End Class

    Public Class Food
    End Class

Modelled Relationally

Pubs: PubID, PubName, Town

PubSales: PubSaleID, PubID, SaleTypeItemID, SalePrice

SaleTypeItems: SaleTypeItemID, SaleTypeID, SaleTypeItem

SaleTypes: SaleTypeID, SaleType
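The four tables above could be declared roughly as follows. This is a minimal sketch, not the schema from the talk: it uses Python's built-in `sqlite3` module as a stand-in for SQL Server, and the column types, the foreign keys and the sample rows (including the prices) are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
CREATE TABLE Pubs (
    PubID   INTEGER PRIMARY KEY,
    PubName TEXT NOT NULL,
    Town    TEXT NOT NULL
);
CREATE TABLE SaleTypes (
    SaleTypeID INTEGER PRIMARY KEY,
    SaleType   TEXT NOT NULL UNIQUE
);
CREATE TABLE SaleTypeItems (
    SaleTypeItemID INTEGER PRIMARY KEY,
    SaleTypeID     INTEGER NOT NULL REFERENCES SaleTypes (SaleTypeID),
    SaleTypeItem   TEXT NOT NULL
);
CREATE TABLE PubSales (
    PubSaleID      INTEGER PRIMARY KEY,
    PubID          INTEGER NOT NULL REFERENCES Pubs (PubID),
    SaleTypeItemID INTEGER NOT NULL REFERENCES SaleTypeItems (SaleTypeItemID),
    SalePrice      REAL NOT NULL
);
-- Sample rows are invented for illustration only.
INSERT INTO Pubs VALUES (1, 'DogAndDuck', 'Harpenden');
INSERT INTO SaleTypes VALUES (1, 'Drinks'), (2, 'Food');
INSERT INTO SaleTypeItems VALUES (1, 1, 'Lager'), (2, 2, 'Snacks');
INSERT INTO PubSales VALUES (1, 1, 1, 3.50), (2, 1, 2, 1.20);
""")

# Rebuild the "object" view of a sale by joining the tables back together.
rows = conn.execute("""
    SELECT p.PubName, t.SaleType, i.SaleTypeItem, s.SalePrice
    FROM PubSales AS s
    JOIN Pubs AS p ON p.PubID = s.PubID
    JOIN SaleTypeItems AS i ON i.SaleTypeItemID = s.SaleTypeItemID
    JOIN SaleTypes AS t ON t.SaleTypeID = i.SaleTypeID
    ORDER BY s.PubSaleID
""").fetchall()
print(rows)
```

Where the .NET model nests objects inside objects, the relational model keeps each fact in one table and reassembles it with joins.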

What’s a Relation (Table)?

Header: Attribute | Attribute | Attribute
Body (Tuple): Atomic Value | Atomic Value | Atomic Value
Body (Tuple): Atomic Value | Atomic Value | Atomic Value

Projection:
Attribute | Attribute
Atomic Value | Atomic Value
Atomic Value | Atomic Value

• To be a table it must have at least one candidate key (a key being an attribute, or set of attributes, whose values are unique)
• Attribute we know as a Column
• Tuple we know as a Row
• Projection we know as the SELECT clause (the columns you’ve chosen)
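The terms above can be made concrete with a small sketch in plain Python. The sample relation and the helper names `project` and `is_candidate_key` are hypothetical. Note the limit of the check: a real key is a rule about every value the table may ever hold, so testing a sample can only show that an attribute is not a key.

```python
# A relation: a header (attribute names) plus a body (a set of tuples).
header = ("PubID", "PubName", "Town")
body = {
    (1, "DogAndDuck", "Harpenden"),       # sample rows invented for illustration
    (2, "RabbitAndCarrot", "Harpenden"),
}


def project(header, body, attributes):
    """Return the projection of a relation onto the named attributes."""
    positions = [header.index(a) for a in attributes]
    return tuple(attributes), {tuple(row[i] for i in positions) for row in body}


def is_candidate_key(header, body, attributes):
    """True when no two tuples share the same values for the attributes.

    Checks uniqueness in this sample only; it does not check minimality.
    """
    _, projected = project(header, body, attributes)
    return len(projected) == len(body)


print(project(header, body, ["Town"]))             # duplicates collapse: it is a set
print(is_candidate_key(header, body, ["PubID"]))   # unique in this sample
print(is_candidate_key(header, body, ["Town"]))    # both pubs share a town
```

Because the body is a set, projecting onto `Town` yields one tuple, not two; SQL only behaves this way when you write `SELECT DISTINCT`.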

Surrogate Keys
• What they are
• Comparison NEWID, NEWSEQUENTIALID, IDENTITY
• Fragmentation

Surrogate Keys

• System generated value used as the primary key.
• NOT a replacement for natural candidate keys.
• Value is not exposed outside the system boundary: the user doesn’t see it because the user has no means of validating it.
• Aids concurrency with key updates.
• Removes the overhead of composite key joins.

Surrogate Key Scope (Verification!)

Surrogate key scope: Application Plumbing

Natural key scope: External World

Surrogate Usage

[Diagram: two Web Servers and a SQL Server]

Surrogate Keys - Choices

Random
• NEWID
• Application generated GUID

Sequential
• IDENTITY, MAX(x)+1
• NEWSEQUENTIALID

Random causes poor performance because of disk latency (8KB reads). Yes, the index nodes are probably in the buffer pool, but the leaf probably isn’t.
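The effect of key order on an index can be illustrated with a toy model in plain Python. This is a sketch under loud assumptions, not SQL Server's storage engine: a "page" here is just a sorted list holding at most `PAGE_CAPACITY` keys (a hypothetical figure), a key added past the end of the index starts a fresh page, and any other insert into a full page splits it 50/50. Seeded random 128-bit integers stand in for NEWID values, and a plain counter stands in for IDENTITY.

```python
import bisect
import random

PAGE_CAPACITY = 8  # hypothetical number of rows that fit on one leaf page


def load(keys, capacity=PAGE_CAPACITY):
    """Insert keys one at a time into sorted leaf pages.

    Returns (page_count, mid_page_splits).
    """
    pages = [[]]      # each page is a sorted list of keys
    mid_splits = 0
    for key in keys:
        lows = [p[0] for p in pages if p]             # lowest key on each page
        idx = max(bisect.bisect_right(lows, key) - 1, 0)
        page = pages[idx]
        if len(page) < capacity:
            bisect.insort(page, key)
        elif idx == len(pages) - 1 and key > page[-1]:
            # Append at the end of the index: start a fresh page, move nothing.
            pages.append([key])
        else:
            # Full page and the key belongs in the middle: split it 50/50.
            bisect.insort(page, key)
            half = len(page) // 2
            pages[idx:idx + 1] = [page[:half], page[half:]]
            mid_splits += 1
    return len(pages), mid_splits


ROWS = 2000
rng = random.Random(42)                                      # seeded so runs repeat
sequential_keys = range(ROWS)                                # IDENTITY-style keys
random_keys = [rng.getrandbits(128) for _ in range(ROWS)]    # NEWID-style keys

seq_pages, seq_splits = load(sequential_keys)
rnd_pages, rnd_splits = load(random_keys)
print("sequential:", seq_pages, "pages,", seq_splits, "mid-page splits")
print("random:    ", rnd_pages, "pages,", rnd_splits, "mid-page splits")
```

In this model sequential keys fill every page and never split one, while random keys leave pages part empty and so need more pages for the same rows. The model says nothing about disk latency itself, only about how many pages get touched and how full they end up.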

Demo

INSERT comparison: NEWID, NEWSEQUENTIALID, IDENTITY

Normalisation
• An Introduction - what is it? Why use it?
• Joins - Pre-filter problems, index intersection
• Fragmentation

Normalisation (Dependency Theory)
• Removes redundancy - reduces size of stored data
• Joins, Joins and more damn joins!
• Referential Integrity - more tables, more FKs; that helps the Optimiser but slows INSERTs/UPDATEs and causes fragmentation
• The Join (lookup) problem - filter then Join (try and get it to Merge Join)

1NF

To be a table it needs a key to make the tuples (rows) unique

All attribute (column) names need to be unique

The row/column intersection value needs to be Atomic

Get rid of Repeating Groups

What’s a Repeating Group?

An Attribute (column) occurs multiple times in the table, each occurrence drawing on the same Domain (set of values), and a given value from that domain can appear in any of the occurrences.

Example of Repeating Group

Session# | Attendee1 | Attendee2 | Attendee3 | Attendee4
NonFuncDepend | Rosie | Ester | Hazel | Joe
DeNormalisation | Ester | Poppy | Hazel | Joe

The attendee Ester in session NonFuncDepend can be recorded in any of Attendee1 to Attendee4; it doesn’t matter which one.

Session# | Attendee#
NonFuncDepend | Rosie
NonFuncDepend | Ester
DeNormalisation | Ester

Collapse the repeated attributes (the group) into its own table.
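The collapse can be sketched with Python's built-in `sqlite3` module standing in for SQL Server. The table names `SessionsWide` and `SessionAttendees` are hypothetical, and the `#` has been dropped from the column names to keep the SQL simple.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The repeating group: four interchangeable attendee columns.
CREATE TABLE SessionsWide (
    Session   TEXT PRIMARY KEY,
    Attendee1 TEXT, Attendee2 TEXT, Attendee3 TEXT, Attendee4 TEXT
);
INSERT INTO SessionsWide VALUES
    ('NonFuncDepend',   'Rosie', 'Ester', 'Hazel', 'Joe'),
    ('DeNormalisation', 'Ester', 'Poppy', 'Hazel', 'Joe');

-- The group collapsed into its own table: one row per session and attendee.
CREATE TABLE SessionAttendees (
    Session  TEXT NOT NULL,
    Attendee TEXT NOT NULL,
    PRIMARY KEY (Session, Attendee)
);
INSERT INTO SessionAttendees (Session, Attendee)
    SELECT Session, Attendee1 FROM SessionsWide WHERE Attendee1 IS NOT NULL
    UNION
    SELECT Session, Attendee2 FROM SessionsWide WHERE Attendee2 IS NOT NULL
    UNION
    SELECT Session, Attendee3 FROM SessionsWide WHERE Attendee3 IS NOT NULL
    UNION
    SELECT Session, Attendee4 FROM SessionsWide WHERE Attendee4 IS NOT NULL;
""")

# "Which sessions is Ester in?" has to search every column in the wide table...
wide = conn.execute("""
    SELECT Session FROM SessionsWide
    WHERE 'Ester' IN (Attendee1, Attendee2, Attendee3, Attendee4)
    ORDER BY Session
""").fetchall()

# ...but is a single-column predicate once the group has its own table.
narrow = conn.execute("""
    SELECT Session FROM SessionAttendees
    WHERE Attendee = 'Ester'
    ORDER BY Session
""").fetchall()
print(wide, narrow)
```

Both queries return the same sessions, but only the second can be answered from one index on `Attendee`, and only the second still works unchanged when a fifth attendee turns up.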

Repeating Group?? / RVA??

Session# | DBA’s | Developer | Analyst | Project Manager
NonFuncDepend | Rosie | Ester | Hazel | Joe
DeNormalisation | Ester | Poppy | Hazel | Joe

The Attributes modelled are Attendees and Job Roles. Each has its own domain of values (attendees and job roles), and each Session has a set of {JobRole, Attendee} pairs.

Session# | Job Role# | Attendee#
NonFuncDepend | DBA’s | Rosie
NonFuncDepend | Developer | Ester
DeNormalisation | DBA’s | Ester

Collapse into its own table.

NOT a Repeating Group

Attendee# | Address1 | Address2 | Address3 | Town
Rosie | Torver | 26 Moorla | | Harpenden
Ester | Torver | 26 Moorla | | Harpenden

Address1 to Address3 are separate domains, each with a completely different set of values

Attendee# | AddressLineNumb# | AddressLine
Rosie | 1 | Torver
Rosie | 2 | 26 Moorla
Ester | 1 | Torver

Collapse the repeated attributes (the group) into its own table???

2NF – Sort the Keys out

Remove non-key columns not related to the entire primary key

Check Functional Dependencies to find columns to decompose

2NF - Example

Session# | Presenter# | Attendee# | SessionRoom
NonFuncDepend | Tony | Rosie | DogAndDuck
NonFuncDepend | Tony | Ester | DogAndDuck
DeNormalisation | Mark | Ester | RabbitAndCarrot

Table: SessionJobRoles
Session# | Presenter# | Attendee#
NonFuncDepend | DBA’s | Rosie
NonFuncDepend | Developer | Ester
DeNormalisation | DBA’s | Ester

Table: Session
Session# | SessionRoom
NonFuncDepend | DogAndDuck
DeNormalisation | RabbitAndCarrot

• SessionRoom is functionally dependent on Session# (Session# -> {SessionRoom})
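Checking functional dependencies to find columns to decompose can be sketched in plain Python against the sample rows of the 2NF example. The helper names `fd_holds` and `project` are hypothetical, and the sketch assumes the columns marked `#` together form the key. A sample can refute a dependency but never prove one; whether SessionRoom really depends on the session alone is a business rule.

```python
rows = [
    {"Session": "NonFuncDepend", "Presenter": "Tony", "Attendee": "Rosie",
     "SessionRoom": "DogAndDuck"},
    {"Session": "NonFuncDepend", "Presenter": "Tony", "Attendee": "Ester",
     "SessionRoom": "DogAndDuck"},
    {"Session": "DeNormalisation", "Presenter": "Mark", "Attendee": "Ester",
     "SessionRoom": "RabbitAndCarrot"},
]


def fd_holds(rows, lhs, rhs):
    """True when equal lhs values always come with equal rhs values in rows."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        if seen.setdefault(left, right) != right:
            return False
    return True


def project(rows, attributes):
    """Distinct combinations of the named attributes, in sorted order."""
    return sorted({tuple(row[a] for a in attributes) for row in rows})


# SessionRoom is determined by only part of the key, so it can move out.
print(fd_holds(rows, ["Session"], ["SessionRoom"]))
# Ester attends two sessions, so Attendee does not determine Session.
print(fd_holds(rows, ["Attendee"], ["Session"]))
# The decomposed Session table: one row per session, room stated once.
print(project(rows, ["Session", "SessionRoom"]))
```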

3NF/BCNF - Sort the Body out
• Columns must be dependent on the whole key and nothing but the key, so help me Codd.
• Decompose FDs in the table Body

3NF - Example

Attendee# | Registered | From | URL | Email
Rosie | 2010-04-01 | SQLUG | http://sqlserverfaq.com | rosie@rabbits.com
Ester | 2010-03-21 | SQLUG | http://sqlserverfaq.com | ester@carrots.co.uk
Poppy | 2010-04-10 | SQLBITS | http://sqlbits.com | poppy@toast.co.uk

From# | URL
SQLUG | http://sqlserverfaq.com
SQLBITS | http://sqlbits.com

• Attendee# -> {Registered, From, URL, Email}
• From -> {URL}

Attendee# | Registered | From | Email
Rosie | 2010-04-01 | SQLUG | rosie@rabbits.com
Ester | 2010-03-21 | SQLUG | ester@carrots.co.uk
Poppy | 2010-04-10 | SQLBITS | poppy@toast.co.uk
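That the 3NF decomposition loses nothing can be checked by joining the two tables back together. This is a sketch using Python's built-in `sqlite3` as a stand-in for SQL Server; the table names `Sources` and `Attendees` are hypothetical, and the From column is renamed `RegisteredFrom` here only because FROM is a reserved word.

```python
import sqlite3

# The original, un-decomposed rows from the 3NF example.
original = [
    ("Rosie", "2010-04-01", "SQLUG", "http://sqlserverfaq.com", "rosie@rabbits.com"),
    ("Ester", "2010-03-21", "SQLUG", "http://sqlserverfaq.com", "ester@carrots.co.uk"),
    ("Poppy", "2010-04-10", "SQLBITS", "http://sqlbits.com", "poppy@toast.co.uk"),
]

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
CREATE TABLE Sources (
    RegisteredFrom TEXT PRIMARY KEY,
    URL            TEXT NOT NULL
);
CREATE TABLE Attendees (
    Attendee       TEXT PRIMARY KEY,
    Registered     TEXT NOT NULL,
    RegisteredFrom TEXT NOT NULL REFERENCES Sources (RegisteredFrom),
    Email          TEXT NOT NULL
);
""")

with conn:
    # From -> {URL}: each source and its URL is stored exactly once.
    conn.executemany(
        "INSERT OR IGNORE INTO Sources VALUES (?, ?)",
        [(r[2], r[3]) for r in original],
    )
    conn.executemany(
        "INSERT INTO Attendees VALUES (?, ?, ?, ?)",
        [(r[0], r[1], r[2], r[4]) for r in original],
    )

# Joining on the determinant rebuilds the original rows exactly.
rebuilt = conn.execute("""
    SELECT a.Attendee, a.Registered, a.RegisteredFrom, s.URL, a.Email
    FROM Attendees AS a
    JOIN Sources AS s ON s.RegisteredFrom = a.RegisteredFrom
""").fetchall()
source_count = conn.execute("SELECT COUNT(*) FROM Sources").fetchone()[0]
print(sorted(rebuilt) == sorted(original), source_count)
```

The SQLUG URL is now stored once, not once per attendee, so a change of URL is a single-row update.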

Denormalisation

Denormalisation

Deliberately left NULL

See Mark Whitehorn and Yasmin Ahmad session “Denormalisation – having your cake and eating it”

This room after lunch

Demo

• Joins
• Pre-Filter problem
• Index intersection

Referential Integrity
• Introduction
• Optimiser: Query Rewrite
• Locking Considerations around Foreign Keys and Procedural RI (using Triggers)

Referential Integrity Is..
• Declarative RI: done with CONSTRAINTS, e.g. FK, CHECK
• Procedural RI: done with Triggers (coded in a procedure)
• DRI gives information to the optimiser, such as permitted values (CHECK CONSTRAINT) or whether a row exists in another table (FK CONSTRAINT)
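Declarative RI can be sketched as follows, with Python's built-in `sqlite3` standing in for SQL Server. The table names, the list of permitted rooms and the helper `rejected` are assumptions made for illustration. The sketch shows only that the constraints are declared once and enforced by the engine; the optimiser rewrites mentioned above are SQL Server behaviour that SQLite does not reproduce.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
CREATE TABLE Sessions (
    Session     TEXT PRIMARY KEY,
    SessionRoom TEXT NOT NULL
        CHECK (SessionRoom IN ('DogAndDuck', 'RabbitAndCarrot'))  -- permitted values
);
CREATE TABLE SessionAttendees (
    Session  TEXT NOT NULL REFERENCES Sessions (Session),         -- row must exist
    Attendee TEXT NOT NULL,
    PRIMARY KEY (Session, Attendee)
);
INSERT INTO Sessions VALUES ('NonFuncDepend', 'DogAndDuck');
INSERT INTO SessionAttendees VALUES ('NonFuncDepend', 'Rosie');
""")


def rejected(sql):
    """Run one statement; return True when a constraint rejects it."""
    try:
        with conn:
            conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


# FK: there is no such session to attend.
print(rejected("INSERT INTO SessionAttendees VALUES ('NoSuchSession', 'Ester')"))
# CHECK: the room is not one of the permitted values.
print(rejected("INSERT INTO Sessions VALUES ('DeNormalisation', 'BroomCupboard')"))
# A valid row passes both constraints.
print(rejected("INSERT INTO SessionAttendees VALUES ('NonFuncDepend', 'Ester')"))
```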

Demo

• RI Optimiser rewrite
• Foreign Keys and Locking
• Triggers and Locking - using for RI and READ_COMMITTED_SNAPSHOT

DRI (FK’s) - Locking

• Be wary of blocking from FK lookups
• Unaffected by READ_COMMITTED_SNAPSHOT
• Uses Serialisable ISOLATION
• FKs have a benefit - the Optimiser can use them

PRI (Triggers) - Locking

• As per the rest of your queries - affected by the default isolation level for your database
• READ_COMMITTED_SNAPSHOT causes a problem
• Remember: the last good committed value is returned rather than the reader being blocked by the writer.
• DRI is somewhat limited when faced with today’s more complex business data relationship rules
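A rule that a plain FK or CHECK constraint cannot express is where Procedural RI comes in. The sketch below uses Python's built-in `sqlite3` and an invented rule (a session holds at most four attendees) purely for illustration; the trigger name and the helper `try_insert` are hypothetical. SQLite cannot demonstrate the READ_COMMITTED_SNAPSHOT problem: in SQL Server, the SELECT inside such a trigger reads under the session's isolation level, so under snapshot it may count the last committed rows and miss another writer's uncommitted insert.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SessionAttendees (
    Session  TEXT NOT NULL,
    Attendee TEXT NOT NULL,
    PRIMARY KEY (Session, Attendee)
);

-- Procedural RI: the rule is code that runs on every insert.
CREATE TRIGGER SessionAttendees_MaxFour
BEFORE INSERT ON SessionAttendees
WHEN (SELECT COUNT(*) FROM SessionAttendees
      WHERE Session = NEW.Session) >= 4
BEGIN
    SELECT RAISE(ABORT, 'session is full');
END;
""")


def try_insert(session, attendee):
    """Insert one attendee; return False when the trigger rejects the row."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO SessionAttendees VALUES (?, ?)", (session, attendee)
            )
    except sqlite3.DatabaseError:
        return False
    return True


names = ["Rosie", "Ester", "Hazel", "Joe", "Poppy"]
results = [try_insert("NonFuncDepend", name) for name in names]
print(results)  # the fifth attendee is turned away by the trigger
```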

Where are we with this then?

Summary

Further Reading

Mark Whitehorn and Yasmin Ahmad session after lunch

C J Date Edinburgh seminar on 13th/14th May http://www.justsql.co.uk/chris_date/cjd_edin_may_2010.htm

Live Meetings starting May – see http://sqlserverfaq.com