TSQL Coding Guidelines

Post on 14-Dec-2014

T-SQL programming guidelines, in terms of:-

1. Commenting code
2. Code readability
3. General good practice
4. Defensive coding and error handling
5. Coding for performance and scalability

Chris Adkin

August 2011

Commenting code

Code readability

General best practice

Comments and exception handling have been purposely omitted from the code fragments in the interest of brevity, so that each fragment fits onto one slide.

All code should be self-documenting.

T-SQL code artefacts (triggers, stored procedures and functions) should have a standard comment banner.

Comment code at all points of interest, describe why and not what.

Avoid in line comments.

Comment banners should include:-

Author details.

A brief description of what the code does.

Narrative comments for all arguments.

Narrative comments for return types.

Change control information.

An example is provided on the next slide.

CREATE PROCEDURE
/*===================================================================================*/
/*                                                                                   */
/* Name        :                                                                     */
uspMyProc
/*                                                                                   */
/* Description : Stored procedure to demonstrate what a specimen comment banner      */
/*               should look like.                                                   */
/*                                                                                   */
/* Parameters  :                                                                     */
/*                                                                                   */
(
    @Parameter1 int, /* First parameter passed into procedure.  */
    /* ----------------------------------------------------------------------------- */
    @Parameter2 int  /* Second parameter passed into procedure. */
)
/*                                                                                   */
/* Change History                                                                    */
/* ~~~~~~~~~~~~~~                                                                    */
/*                                                                                   */
/* Version Author               Date     Ticket Description                          */
/* ------- -------------------- -------- ------ ------------------------------------ */
/* 1.0     C. J. Adkin          09/08/11 3525   Initial version created.             */
/*                                                                                   */
/*===================================================================================*/
AS
BEGIN
    .
    .
    -- This is an example of an inline comment

Why are these bad?

Because a careless backspace can turn a useful statement into a commented out one.

“But my code is always thoroughly tested.”

NO EXCUSE, always code defensively.

Use /* */ comments instead.

Use and adhere to naming conventions.

Use meaningful object names.

Never prefix application stored procedures with sp_

SQL Server will always scan through the system catalogue first, before executing such procedures.

Bad for performance.

Use ANSI SQL join syntax over non-ANSI syntax.

Be consistent when using case:-

Camel case

Pascal case

Use of upper case for reserved key words

Be consistent when indenting and stacking text.

BEST PRACTICES

Never blindly take technical hints and tips written in a blog or presentation as gospel.

Test your assumptions using “Scientific method”, i.e.:-

Use test cases which use consistent test data across all tests, production realistic data is preferable.

If the data is commercially sensitive, e.g. bank account details, keep the volume and distribution the same, obfuscate the sensitive parts out.

Only change one thing at a time, so as to be able to gauge the impact of each change accurately and know what caused it.

The “Scientific Method” Approach

For performance related tests, always clear the procedure and buffer caches out, so that results are not skewed between tests, using the following:-

CHECKPOINT

DBCC FREEPROCCACHE

DBCC DROPCLEANBUFFERS
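The three commands above combine into a small reset script. For test environments only, never production, since flushing the plan cache and buffer pool penalises every workload on the instance:

```sql
-- Flush dirty pages to disk so DROPCLEANBUFFERS can empty the buffer pool
CHECKPOINT;

-- Remove all compiled plans from the plan cache
DBCC FREEPROCCACHE;

-- Remove all clean pages from the buffer pool, forcing physical reads on the next run
DBCC DROPCLEANBUFFERS;
```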

Furnish code with a facility that allows its execution to be traced:-

Write to a tracking table.

And / or use xp_logevent to write to the event log.

DO NOT make the code a “Black box” which has to be dissected statement by statement in production if it starts to fail.
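A minimal sketch of the tracking table approach; the table, column and procedure names here (dbo.ProcTrace, uspTracedExample) are illustrative, not from the original deck:

```sql
CREATE TABLE dbo.ProcTrace (
    TraceId  int IDENTITY(1, 1) PRIMARY KEY,
    ProcName sysname,
    Step     varchar(128),
    LoggedAt datetime DEFAULT GETDATE()
);
GO

CREATE PROCEDURE dbo.uspTracedExample AS
BEGIN
    /* Record progress at each point of interest, so a production
     * failure can be located without dissecting the procedure. */
    INSERT INTO dbo.ProcTrace (ProcName, Step)
    VALUES (OBJECT_NAME(@@PROCID), 'Starting main update');

    /* ... real work here ... */

    INSERT INTO dbo.ProcTrace (ProcName, Step)
    VALUES (OBJECT_NAME(@@PROCID), 'Finished main update');
END;
```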

A term coined by Jeff Moden, an MVP and frequent poster on SQLServerCentral.com.

Alludes to:-

Coding in a procedural 3GL way instead of a set based way.

Chronic performance of row by row oriented processing.

Abbreviated to RBAR (“Row By Agonizing Row”), pronounced Ree-bar.

Code whereby result sets and table contents are processed line by line, typically using cursors.

Correlated subqueries.

User Defined Functions.

Iterating through results sets as ADO objects in SQL Server Integration Services looping containers.

Two simple, but contrived queries written against the AdventureWorks2008R2 database.

The first query will use nested subqueries.

The second will use derived tables.

SELECT ProductID,
       Quantity
FROM AdventureWorks.Production.ProductInventory Pi
WHERE LocationID = (SELECT TOP 1
                           LocationID
                    FROM AdventureWorks.Production.Location Loc
                    WHERE Pi.LocationID = Loc.LocationID
                    AND CostRate = (SELECT MAX(CostRate)
                                    FROM AdventureWorks.Production.Location))

SELECT ProductID,
       Quantity
FROM (SELECT TOP 1
             LocationID
      FROM AdventureWorks.Production.Location Loc
      WHERE CostRate = (SELECT MAX(CostRate)
                        FROM AdventureWorks.Production.Location)) dt,
     AdventureWorks.Production.ProductInventory Pi
WHERE Pi.LocationID = dt.LocationID

What is the difference between the two queries?

Query 1, cost = 0.299164

Query 2, cost = 0.0202938

What is the crucial difference ?

Table spool operation in the first plan has been executed 1069 times.

This happens to be the number of rows in the ProductInventory table.

Row oriented processing may be unavoidable under certain circumstances:-

The processing of one row depends on the state of one or more previous rows in a result set.

The row processing logic involves a change to the global state of the database and therefore cannot be encapsulated in a function.

In this case there are ways to use cursors in a very efficient manner, as per the next three slides.

Elapsed time 00:22:27.892

DECLARE @MaxRownum int,
        @OrderId   int,
        @i         int;

SET @i = 1;

CREATE TABLE #OrderIds (
    rownum  int IDENTITY (1, 1),
    OrderId int
);

INSERT INTO #OrderIds
SELECT SalesOrderID
FROM Sales.SalesOrderDetail;

SELECT @MaxRownum = MAX(rownum)
FROM #OrderIds;

WHILE @i < @MaxRownum
BEGIN
    SELECT @OrderId = OrderId
    FROM #OrderIds
    WHERE rownum = @i;

    SET @i = @i + 1;
END;

Elapsed time 00:00:03.106

DECLARE @s int;

DECLARE c CURSOR FOR
SELECT SalesOrderID
FROM Sales.SalesOrderDetail;

OPEN c;
FETCH NEXT FROM c INTO @s;

WHILE @@FETCH_STATUS = 0
BEGIN
    FETCH NEXT FROM c INTO @s;
END;

CLOSE c;
DEALLOCATE c;

Elapsed time 00:00:01.555

DECLARE @s int;

DECLARE c CURSOR FAST_FORWARD FOR
SELECT SalesOrderID
FROM Sales.SalesOrderDetail;

OPEN c;
FETCH NEXT FROM c INTO @s;

WHILE @@FETCH_STATUS = 0
BEGIN
    FETCH NEXT FROM c INTO @s;
END;

CLOSE c;
DEALLOCATE c;

No T-SQL language feature is a “Panacea to all ills”.

For example:-

Avoid RBAR logic where possible

Avoid nesting cursors

But cursors do have their uses.

Be aware of the FAST_FORWARD optimisation, applicable when:-

The data being retrieved is not being modified

The cursor is being scrolled through in a forward only direction

When using SQL Server 2005 onwards:-

Use TRY CATCH blocks.

Make the event logged in CATCH block verbose enough to allow the exceptional event to be easily tracked down.

NEVER use exceptions for control flow, as illustrated with an upsert example in the next four slides.

NEVER ‘Swallow’ exceptions, i.e. catch them and do nothing with them.

DECLARE @p int;

DECLARE c CURSOR FAST_FORWARD FOR
SELECT ProductID
FROM Sales.SalesOrderDetail;

OPEN c;
FETCH NEXT FROM c INTO @p;

WHILE @@FETCH_STATUS = 0
BEGIN
    /* Place the stored procedure to be tested
     * on the line below.
     */
    EXEC dbo.uspUpsert_V1 @p;

    FETCH NEXT FROM c INTO @p;
END;

CLOSE c;
DEALLOCATE c;

CREATE TABLE SalesByProduct (
    ProductID int,
    Sold      int,
    CONSTRAINT [PK_SalesByProduct] PRIMARY KEY CLUSTERED
    (
        ProductID
    ) ON [USERDATA]
) ON [USERDATA]

Execution time = 00:00:51.200

CREATE PROCEDURE uspUpsert_V1 (@ProductID int) AS
BEGIN
    SET NOCOUNT ON;

    BEGIN TRY
        INSERT INTO SalesByProduct
        VALUES (@ProductID, 1);
    END TRY
    BEGIN CATCH
        IF ERROR_NUMBER() = 2627
        BEGIN
            UPDATE SalesByProduct
            SET Sold += 1
            WHERE ProductID = @ProductID;
        END
    END CATCH;
END;

Execution time = 00:00:20.080

CREATE PROCEDURE uspUpsert_V2 (@ProductID int) AS
BEGIN
    SET NOCOUNT ON;

    UPDATE SalesByProduct
    SET Sold += 1
    WHERE ProductID = @ProductID;

    IF @@ROWCOUNT = 0
    BEGIN
        INSERT INTO SalesByProduct
        VALUES (@ProductID, 1);
    END;
END;

With SQL Server 2008 onwards, consider using the MERGE statement for upserts, execution time = 00:00:20.904

CREATE PROCEDURE uspUpsert_V3 (@ProductID int) AS
BEGIN
    SET NOCOUNT ON;

    MERGE SalesByProduct AS target
    USING (SELECT @ProductID) AS source (ProductID)
    ON (target.ProductID = source.ProductID)
    WHEN MATCHED THEN
        UPDATE SET Sold += 1
    WHEN NOT MATCHED THEN
        INSERT (ProductID, Sold)
        VALUES (source.ProductID, 1);
END;

Understand and use the full power of T-SQL.

Most people know how to UNION result sets together, but do not know about INTERSECT and EXCEPT.
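As a brief illustration (the staging and live table names here are hypothetical), INTERSECT returns the rows common to both sets, while EXCEPT returns the rows in the first set that are absent from the second:

```sql
-- Customers that exist in both the staging and live tables
SELECT CustomerID FROM dbo.CustomerStaging
INTERSECT
SELECT CustomerID FROM dbo.Customer;

-- Customers present in staging but not yet in the live table
SELECT CustomerID FROM dbo.CustomerStaging
EXCEPT
SELECT CustomerID FROM dbo.Customer;
```

Both operators, like UNION, also remove duplicate rows from the result.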

Also a lot of development effort can be saved by using T-SQL’s analytics extensions where appropriate:-

RANK()

DENSE_RANK()

NTILE()

ROW_NUMBER()

LEAD() and LAG() (introduced in Denali)
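A sketch of one of these extensions in use, against the AdventureWorks tables used elsewhere in this deck; ROW_NUMBER() replaces what would otherwise be a correlated subquery or a loop:

```sql
-- Rank each product's order lines by quantity, largest first.
-- ROW_NUMBER() assigns a unique rank within each partition;
-- RANK() or DENSE_RANK() would allow ties instead.
SELECT ProductID,
       SalesOrderID,
       OrderQty,
       ROW_NUMBER() OVER (PARTITION BY ProductID
                          ORDER BY OrderQty DESC) AS QtyRank
FROM Sales.SalesOrderDetail;
```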

Scalar functions are another example of RBAR; consider this function:-

CREATE FUNCTION udfMinProductQty (@ProductID int)
RETURNS int
AS
BEGIN
    RETURN (SELECT MIN(OrderQty)
            FROM Sales.SalesOrderDetail
            WHERE ProductId = @ProductID)
END;

Now let's call the function from an example query:-

SELECT ProductId,
       dbo.udfMinProductQty(ProductId)
FROM Production.Product

Elapsed time = 00:00:00.746

Now doing the same thing, but using an inline table valued function:-

CREATE FUNCTION tvfMinProductQty (
    @ProductId int
)
RETURNS TABLE
AS
RETURN (
    SELECT MIN(s.OrderQty) AS MinOrdQty
    FROM Sales.SalesOrderDetail s
    WHERE s.ProductId = @ProductId
)

Invoking the inline TVF from a query:-

SELECT ProductId,
       (SELECT MinOrdQty
        FROM dbo.tvfMinProductQty(ProductId)) MinOrdQty
FROM Production.Product
ORDER BY ProductId

Elapsed time 00:00:00.330

Leverage functionality already in SQL Server and never reinvent it; this will lead to:-

More robust code

Less development effort

Potentially faster code

Code with better readability

Easier to maintain code

A scenario that actually happened:-

A row is inserted into the customer table

Customer table has a primary key based on an identity column

@@IDENTITY is used to obtain the key value of the customer row inserted for the creation of an order row with a foreign key linking back to customer.

The identity value obtained is nothing like the one for the inserted row – why ?

@@IDENTITY obtains the latest identity value irrespective of the session it came from.

In the example the replication merge agent inserted a row in the customer table just before @@IDENTITY was used.

The solution: always use SCOPE_IDENTITY() instead of @@IDENTITY.
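A sketch of the fix for the scenario above; the customer and order table shapes are illustrative, not from the original deck:

```sql
DECLARE @CustomerId int;

INSERT INTO dbo.Customer (CustomerName)
VALUES ('New customer');

-- SCOPE_IDENTITY() returns the last identity value generated in THIS
-- scope, so a concurrent insert (e.g. by the replication merge agent)
-- cannot pollute it, unlike @@IDENTITY
SET @CustomerId = SCOPE_IDENTITY();

INSERT INTO dbo.[Order] (CustomerId, OrderDate)
VALUES (@CustomerId, GETDATE());
```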

Developing applications that use a database and perform well depends on good:-

Schema design

Compiled statement plan reuse.

Connection management.

Minimizing the number of network round trips between the database and the tier above.

Parameterise your queries in order to minimize compiling.

BUT, watch out for “Parameter sniffing”.

At runtime the database engine will sniff the values of the parameters a query is compiled with and create a plan accordingly.

Unfortunate when the sniffed values produce plans with table scans, whereas the ‘popular’ values would lead to plans with index seeks.

Use the RECOMPILE hint to force the creation of a new plan.

Use the OPTIMIZE FOR hint in order for a plan to be created for the ‘popular’ values you specify.

Use the OPTIMIZE FOR UNKNOWN hint to cause a “general purpose” plan to be created.

Alternatively, copy parameters passed into a stored procedure to local variables and use those in your query.
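The hints attach to a query like so; the fragments below assume a parameterised stored procedure, and @LocationID and the ‘popular’ value 1 are illustrative:

```sql
-- Force a fresh plan on every execution
SELECT ProductID, Quantity
FROM Production.ProductInventory
WHERE LocationID = @LocationID
OPTION (RECOMPILE);

-- Compile the plan as if @LocationID were the 'popular' value 1
SELECT ProductID, Quantity
FROM Production.ProductInventory
WHERE LocationID = @LocationID
OPTION (OPTIMIZE FOR (@LocationID = 1));

-- Compile a general purpose plan from average density statistics,
-- ignoring the sniffed parameter value (SQL Server 2008 onwards)
SELECT ProductID, Quantity
FROM Production.ProductInventory
WHERE LocationID = @LocationID
OPTION (OPTIMIZE FOR UNKNOWN);
```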

For OLTP style applications:-

Transactions will be short

Number of statements will be finite

SQL will only affect a few rows for each execution.

The SQL will be simple.

Plans will be skewed towards using index seeks over table scans.

Recompiles could more than double query execution time.

Therefore recompiles are undesirable for OLTP applications.

For OLAP style applications:-

Complex queries that may involve aggregation and analytic SQL.

Queries may change constantly due to the use of reporting and BI tools.

May involve WHERE clauses with potentially lots of combinations of parameters.

Taking the hit of a recompile via OPTION (RECOMPILE) may be worth it for the benefit of a significant reduction in total execution time.

This is the exception to the rule.

Be careful when using table variables:-

Statistics cannot be gathered on these.

The optimizer will always assume they only contain one row.

This can lead to unexpected execution plans.

Table variables will always inhibit parallelism in execution plans.
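A sketch of the difference; joins against the table variable are costed at one row, whereas the temp table gets statistics and realistic estimates:

```sql
-- Table variable: no statistics, estimated at one row,
-- and it inhibits parallelism in the surrounding plan
DECLARE @Ids TABLE (SalesOrderID int PRIMARY KEY);

INSERT INTO @Ids
SELECT DISTINCT SalesOrderID
FROM Sales.SalesOrderDetail;

-- Temp table: statistics are created on it, so the optimizer
-- can cost subsequent joins against it realistically
CREATE TABLE #Ids (SalesOrderID int PRIMARY KEY);

INSERT INTO #Ids
SELECT DISTINCT SalesOrderID
FROM Sales.SalesOrderDetail;
```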

This applies to conditions in WHERE clauses:-

If a WHERE clause condition can use an index, it is said to be ‘sargable’,

i.e. it contains a search argument (SARG).

As a general rule of thumb, the use of a function on a column will suppress index usage,

e.g. WHERE ufn(MyColumn1) = <somevalue>
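A common example of restoring sargability by rewriting a function on a column as a range predicate, using the AdventureWorks SalesOrderHeader table seen elsewhere in this deck (the year is illustrative):

```sql
-- Non-sargable: YEAR() applied to the column suppresses index usage,
-- forcing a scan of every row
SELECT SalesOrderID
FROM Sales.SalesOrderHeader
WHERE YEAR(OrderDate) = 2004;

-- Sargable: a range predicate on the bare column can seek an index
SELECT SalesOrderID
FROM Sales.SalesOrderHeader
WHERE OrderDate >= '20040101'
  AND OrderDate <  '20050101';
```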

Constructs that will always force a serial plan:-

All T-SQL user defined functions.

All CLR user defined functions with data access.

Built in functions, including @@TRANCOUNT, ERROR_NUMBER() and OBJECT_ID().

Dynamic cursors.

Constructs that will always force a serial region within a plan:-

Table valued functions

TOP

Recursive queries

Multi consumer spool

Sequence functions

System table scans

“Backwards” scans

Global scalar aggregate

Make stored procedures and functions relatively single minded in what they do.

Stored procedures and functions with lots of arguments are a “Code smell” of code that:-

Is difficult to unit test with a high degree of confidence.

Does not lend itself to code reuse.

Smacks of poor design.

An ‘ordinal’ in the context of the ORDER BY clause is when a number is used to represent a column position.

If new columns are added or their order is changed in the SELECT list, this query will return different results, potentially breaking the application using it.

SELECT TOP 5
       [SalesOrderNumber]
      ,[OrderDate]
      ,[DueDate]
      ,[ShipDate]
      ,[Status]
FROM [AdventureWorks].[Sales].[SalesOrderHeader]
ORDER BY 2 DESC
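The robust form simply names the column; a minimal rewrite of the query above:

```sql
SELECT TOP 5
       [SalesOrderNumber]
      ,[OrderDate]
      ,[DueDate]
      ,[ShipDate]
      ,[Status]
FROM [AdventureWorks].[Sales].[SalesOrderHeader]
ORDER BY [OrderDate] DESC; -- immune to column reordering in the SELECT list
```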

SELECT * retrieves all columns from a table:-

Bad for performance if only a subset of the columns is required.

Using columns by their names explicitly leads to improved code readability.

Code is easier to maintain, as the developer can see in situ what columns a query is using.