SQL Performance Tuning for Developers

transcript

SQL SERVER 2005/2008

Performance tuning for

the developer

Michelle Gutzait

gutzait@pythian.com

michelle.gutzait@gmail.com

Blog: http://michelle-gutzait.spaces.live.com/default.aspx

Whoami?

SQL Server Team Lead @ www.pythian.com

24/7 Remote DBA services

I live in Montreal


Agenda – Part I

General concepts of performance and tuning

• Performance bottlenecks

• Optimization tools

• Table and index

• The data page

• The optimizer

• Execution plans

Agenda – Part II

Development performance tips

• T-SQL commands

• Views

• Cursors

• User-defined functions

• Working with temporary tables and table variables

• Stored Procedures and functions

• Data Manipulation

• Transactions

• Dynamic SQL

• Triggers

• Locks

• Table and database design issues

“The fact that I can does not mean that I should!”

Kimberly Tripp (?)

Always treat your code as if it's running:

Frequently

On a large amount of data

In a very busy environment

The goal

Min response time and Max

throughput

Reduce network traffic, disk I/O

and CPU time

Start optimizing as early as

possible as it will be harder

later.

Design and Tuning Tradeoffs

Network Communication

Database Applications

Presentation Layer

Application Logic

Client OS

Network

OS/IO Subsystem

SQL Server

Operating

System and

Hardware

Client

Server

Client/Server Tuning Levels

The Typical Performance

Pyramid

Application / Query / Database Design

Operating Environment

Hardware

Beware: In certain environments this pyramid may be upside down!

Application & performance

The result

“Ugly” code may perform

much better

Performance bottlenecks - tools

Windows Performance Monitor

SQL Server Profiler

SQL Server Management Studio

Performance bottlenecks – tools

(Cont…)

Database Engine Tuning Advisor

DMVs and statistics

SQL Server 2008 Activity Monitor


Performance bottlenecks - tools

Third-party tools

Let's remember a few basic concepts…

Tables and Indexes

Possible

bottleneck

Possible bottleneck

Possible

bottleneck

Rows On A Page

Page Header

Data rows

Row Offset Table: 2 bytes each

The Data Row

Header Fixed data NB VB Variable data

Variable

4 bytes


Data access methods

• Helps locate data more rapidly

Index Structure: Nonclustered Index

Structure of Clustered Index


Covering Index


Heap table

• A table with no clustered index

RID is built from file:page:row

Table Scan

Will usually be faster using a clustered index

Parsing

Normalization

Sequence Tree

Is SQL?

Trivial Plan

Optimization

Syntactic

Transformation

Optimization

Execution Plan

Yes

Is Cheap Enough?

SARG Selection

Index Selection

JOIN Selection

Caching

Memory Allocation

Execution

Plan – cost

optimization

Optimizer hints

View optimizer info


A few concepts in the Execution Plan algorithm…


Search ARGuments

SARG: always isolate columns

SARG | NOT SARG
where MonthlySalary > 600000/12 | where MonthlySalary * 12 > 600000
where ID in (select ID from vw_Person) | where dbo.fu_IsPerson(ID) = 1234
where firstname like 'm%' | where SUBSTRING(firstname,1,1) = 'm'

SARGable: =, BETWEEN, >, <, LIKE 'x%', EXISTS

Not SARGable: LIKE '%x', NOT LIKE, NOT EXISTS, FUNCTION(column)

AND creates a single SARG

OR creates multiple SARGs
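The SARG rules can be tried out with a small script. This is only a sketch: the table dbo.Employee_Demo and its indexes are hypothetical names invented for illustration, and the plan you actually get depends on data volume and statistics.

```sql
-- Hypothetical demo table; not part of the original deck.
CREATE TABLE dbo.Employee_Demo (
    ID            INT IDENTITY(1,1) PRIMARY KEY,
    FirstName     VARCHAR(100) NOT NULL,
    MonthlySalary MONEY        NOT NULL
);
CREATE INDEX IX_Employee_Demo_MonthlySalary ON dbo.Employee_Demo (MonthlySalary);
CREATE INDEX IX_Employee_Demo_FirstName     ON dbo.Employee_Demo (FirstName);

-- Not SARGable: the column is inside an expression, so an index seek
-- on MonthlySalary is not possible.
SELECT ID FROM dbo.Employee_Demo WHERE MonthlySalary * 12 > 600000;

-- SARGable: the column is isolated; the arithmetic is done on the constant.
SELECT ID FROM dbo.Employee_Demo WHERE MonthlySalary > 600000 / 12;

-- Not SARGable: function applied to the column.
SELECT ID FROM dbo.Employee_Demo WHERE SUBSTRING(FirstName, 1, 1) = 'm';

-- SARGable: a leading-prefix LIKE can use the index on FirstName.
SELECT ID FROM dbo.Employee_Demo WHERE FirstName LIKE 'm%';
```

Compare the execution plans of each pair (seek vs. scan) once the table holds a realistic number of rows.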

Table, column and index statistics

[Figure: the values of a state column in a Sales table (AL, AK, CA, CA, CA, CT, IL, … WA, WI, WY) are sampled into histogram steps (Step #: AL, CA, IL, IL, OR, TX, WA, WY); the result is stored as statblob in sys.sysobjvalues (internal)]


Update statistics - “Rules of thumb”

Use auto create and auto update statistics

5% of the table changes

Still bad query:

Create statistics

Update statistics with FULLSCAN

Use multi-column statistics when queries have multi-column conditions

Use the AUTO_UPDATE_STATISTICS_ASYNC database option

No stats for temporary objects and functions
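The rules of thumb above map to a handful of statements. A minimal sketch, assuming a database called MyDatabase and a table dbo.MyTable with columns Col1 and Col2 (all placeholder names):

```sql
-- Let the engine create and refresh statistics on its own.
ALTER DATABASE MyDatabase SET AUTO_CREATE_STATISTICS ON;
ALTER DATABASE MyDatabase SET AUTO_UPDATE_STATISTICS ON;

-- Optional: refresh statistics in the background so the triggering
-- query does not wait for the update.
ALTER DATABASE MyDatabase SET AUTO_UPDATE_STATISTICS_ASYNC ON;

-- Query still bad? Create multi-column statistics for a
-- multi-column condition ...
CREATE STATISTICS ST_MyTable_Col1_Col2 ON dbo.MyTable (Col1, Col2);

-- ... and rebuild the table's statistics from every row, not a sample.
UPDATE STATISTICS dbo.MyTable WITH FULLSCAN;
```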


Join selection

JOIN Types

NESTED LOOP

Factors:

• JOIN strategies

• JOIN order

• SARG

• Indexes

HASH Joins are used when no useful index

exists on one or both of the JOIN inputs.

These can be converted to MERGE or LOOP

joins through effective indexing.

Joins - Optimization tip

Index intersection

SELECT *

FROM authors

WHERE au_fname = 'Fox' AND au_lname = 'Mulder'


Tuning with indexes…

Index tips

MORE indexes – for queries, FEWER indexes – for updates

More indexes – more possibilities for optimizer

Having a CLUSTERED INDEX is almost always a good

idea…

Sort operations: TOP, DISTINCT, GROUP BY, ORDER BY

and JOIN; WHERE

As narrow as possible to avoid excessive I/O

Use integer values rather than character values

Values with low selectivity

covering index - faster than a clustered index

Index tips 2

CLUSTERED index key in all non-clustered indexes (otherwise RID is used)

Frequently updated column and clustered index

Drop costly UNUSED indexes

High volume inserts – incremental Clustered index

Surrogate integer primary key (identity ?)

Clustered index for random modifications and index bottleneck

CLUSTERED index on non-unique columns – 16 bytes added (uniqueidentifier)

Creating index before rare heavy operations

When changing/dropping a CLUSTERED index, drop all NON-CLUSTERED indexes first. Don't forget to recreate them later

Indexes are almost always in cache, therefore are faster

Column referenced by OR and no index on the column → table scan.

PRIMARY KEY and UNIQUE CONSTRAINTS create

indexes

Foreign Keys do NOT create indexes

Index tips 3

Wide and fewer indexes are sometimes better

than many and narrower indexes

INCLUDE columns for covering index

Indexes are used to reduce the number of rows

fetched, otherwise they are not necessary

If TEMPDB resides on different physical disk,

you may use SORT_IN_TEMPDB
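A covering index with INCLUDE columns, built with SORT_IN_TEMPDB, could look like the sketch below. The table and column names are borrowed from the Employees design example later in the deck; the index name is made up.

```sql
-- The key column serves the WHERE clause; the INCLUDE columns let the
-- query be answered from the index alone (no lookup into the base table).
CREATE NONCLUSTERED INDEX IX_Employees_ContractEndDate
ON dbo.Employees (ContractEndDate)
INCLUDE (FirstName, LastName, HireDate)
WITH (SORT_IN_TEMPDB = ON);  -- worthwhile mainly when tempdb is on separate disks

-- A query this index covers (EmployeeID comes along as the clustered key):
SELECT EmployeeID, FirstName, LastName, HireDate
FROM dbo.Employees
WHERE ContractEndDate >= GETDATE();
```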

Index tips 4


Analyze execution plans and

statistics

Demo - Indexes


Fill Factor and PAD_INDEX

Default Fillfactor 0 – data pages 100% full
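To leave free space on pages for future inserts and updates, a fill factor can be set when an index is created or rebuilt. A sketch only: the index name, the CustomerID column and the 80 percent figure are illustrative assumptions, not recommendations from the talk.

```sql
-- Leaf pages are filled to about 80%; PAD_INDEX applies the same
-- fill factor to the intermediate (non-leaf) pages.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
ON dbo.Orders (CustomerID)
WITH (FILLFACTOR = 80, PAD_INDEX = ON);

-- The fill factor is only applied at build time, so it has to be
-- re-applied by rebuilding the index later.
ALTER INDEX IX_Orders_CustomerID ON dbo.Orders
REBUILD WITH (FILLFACTOR = 80, PAD_INDEX = ON);
```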


Data modifications…

Page Header

Data modifications

Data rows

In-place

direct

Row A Ver 2

Page Header

Data modifications

Data rows

In-place

indirect

Page Header

Data modifications

Data rows

Deferred update – forwarded

In a heap – rows are forwarded leaving old address in place


Index fragmentation

INDEXES - fragmentation

DBCC SHOWCONTIG ('Orders')

DBCC SHOWCONTIG scanning 'Orders' table...
Table: 'Orders' (21575115); index ID: 1, database ID: 6
TABLE level scan performed.
- Pages Scanned................................: 20
- Extents Scanned..............................: 5
- Extent Switches..............................: 4
- Avg. Pages per Extent........................: 4.0
- Scan Density [Best Count:Actual Count].......: 60.00% [3:5]
- Logical Scan Fragmentation ..................: 0.00%
- Extent Scan Fragmentation ...................: 40.00%
- Avg. Bytes Free per Page.....................: 146.5
- Avg. Page Density (full).....................: 98.19

SELECT *

FROM sys.dm_db_index_physical_stats

(DatabaseID, TableId, IndexId, NULL, Mode)
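With concrete arguments, the DMV call above might be written as follows. This is a sketch that assumes the current database and the dbo.Orders table from the DBCC SHOWCONTIG example; 'SAMPLED' is just one of the available scan modes.

```sql
DECLARE @db_id INT, @object_id INT;
SET @db_id     = DB_ID();                    -- current database
SET @object_id = OBJECT_ID(N'dbo.Orders');   -- table to inspect

SELECT OBJECT_NAME(ips.object_id)          AS TableName,
       i.name                              AS IndexName,
       ips.index_type_desc,
       ips.avg_fragmentation_in_percent,   -- logical fragmentation
       ips.avg_page_space_used_in_percent, -- page density
       ips.page_count
FROM sys.dm_db_index_physical_stats(@db_id, @object_id, NULL, NULL, 'SAMPLED') AS ips
INNER JOIN sys.indexes AS i
        ON i.object_id = ips.object_id
       AND i.index_id  = ips.index_id
ORDER BY ips.avg_fragmentation_in_percent DESC;
```

Fragmented indexes can then be handled with ALTER INDEX ... REORGANIZE or ALTER INDEX ... REBUILD.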

Indexed Views

SELECT t1.Col2, t2.Col3,

count(*) as Cnt

FROM Table_1 t1

INNER JOIN Table_2 t2

ON t1.Col1 = t2.Col1

GROUP BY t1.Col2, t2.Col3
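Turning the query above into an indexed view takes a few extra steps. A hedged sketch: the view and index names are invented, the tables are assumed to live in the dbo schema, and the session SET options required for indexed views are assumed to be in place.

```sql
-- Indexed views need SCHEMABINDING, two-part object names,
-- and COUNT_BIG(*) instead of COUNT(*) when GROUP BY is used.
CREATE VIEW dbo.vw_Table1_Table2_Counts
WITH SCHEMABINDING
AS
SELECT t1.Col2, t2.Col3,
       COUNT_BIG(*) AS Cnt
FROM dbo.Table_1 AS t1
INNER JOIN dbo.Table_2 AS t2
        ON t1.Col1 = t2.Col1
GROUP BY t1.Col2, t2.Col3;
GO

-- The unique clustered index is what materializes the view.
CREATE UNIQUE CLUSTERED INDEX IX_vw_Table1_Table2_Counts
ON dbo.vw_Table1_Table2_Counts (Col2, Col3);
GO
```

The aggregated result is now maintained on every modification of the base tables, which is the trade-off to weigh against faster reads.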

Possible bottleneck

“Performance tuning SQL Statements

involves doing things to allow the optimizer

make better decisions”

Your options for performance

tuning are indexing or rewriting

Questions

End of Part I…

Agenda – Part II

Development performance tips

• T-SQL commands

• Views

• Cursors

• User-defined functions

• Working with temporary tables and table variables

• Stored Procedures and functions

• Data Manipulation

• Transactions

• Dynamic SQL

• Triggers

• Locks

• Table and database design issues


Returning/processing too much

data…


What could possibly be “wrong”

with this query ?

SELECT * FROM MyTable WHERE Col1 = 'x'

SELECT Col1 FROM MyTable1, MyTable2

SELECT TOP 2000000 Col1 FROM MyTable1

Looping on the client side:

WHILE @i < 10000

Update tb1 WHERE Col = @i

@i = @i + 1
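The client-side loop sends up to 10,000 separate statements over the network. One possible set-based replacement is sketched below; the Processed column and the starting value of @i are assumptions, since the slide does not show what the UPDATE sets.

```sql
DECLARE @i INT;
SET @i = 0;  -- assumed starting value of the loop counter

-- One statement, one round trip, one plan:
-- covers every value the loop would have visited.
UPDATE dbo.tb1
SET Processed = 1            -- hypothetical column
WHERE Col >= @i
  AND Col < 10000;
```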

What could possibly be wrong

with this query (cont) ?

SELECT *
FROM MyTable t1
INNER JOIN MyTable_2 t2 on t1.Col1 = t2.Col1
INNER JOIN MyTable_3 t3 on t1.Col1 = t3.Col1
LEFT JOIN MyTable_4 t4 on t1.Col1 = t4.Col1
LEFT JOIN MyTable_5 t5 on t1.Col1 = t5.Col1
LEFT JOIN MyTable_6 t6 on t1.Col1 = t6.Col1
LEFT JOIN MyTable_7 t7 on t1.Col1 = t7.Col1
LEFT JOIN MyTable_8 t8 on t1.Col1 = t8.Col1
LEFT JOIN MyTable_9 t9 on t1.Col1 = t8.Col1
LEFT JOIN MyTable_10 t10 on t1.Col1 = t8.Col1
……

What is the difference?

Short Long(er) ?

IF EXISTS

(SELECT 1 FROM MyTable)

SELECT @rc=COUNT(*)

FROM MyTable

IF @rc > 0

IF EXISTS

(SELECT * FROM MyTable)

IF EXISTS

IF NOT EXISTS

SELECT MyTable1.Col1,
MyTable1.Col2
FROM MyTable1
INNER JOIN MyTable2
ON MyTable1.Col1 = MyTable2.Col1

SELECT MyTable1.Col1,
MyTable1.Col2
FROM MyTable1
WHERE MyTable1.Col1 IN
(SELECT MyTable2.Col1
FROM MyTable2)


Short Long(er) ?

SELECT MyTable1.Col1,
MyTable1.Col2
FROM MyTable1
WHERE MyTable1.Col1 IN
(SELECT MyTable2.Col1
FROM MyTable2)

SELECT MyTable1.Col1,
MyTable1.Col2
FROM MyTable1
WHERE EXISTS
(SELECT 1
FROM MyTable2
WHERE MyTable2.Col1 =
MyTable1.Col1)


Sorting the data…

Sort:
SELECT Col1
FROM Table1
UNION
SELECT Col2
FROM Table2

No sort:
SELECT Col1
FROM Table1
UNION ALL
SELECT Col2
FROM Table2

Sort:
SELECT DISTINCT Col1
FROM Table1

No sort:
SELECT Col1
FROM Table1

Sort:
SELECT Col1
FROM Table1
WHERE col2 IN (SELECT DISTINCT Col3
FROM Table2)

No sort:
SELECT Col1
FROM Table1
WHERE col2 IN (SELECT Col3
FROM Table2)

Sort:
CREATE VIEW VW1 AS
SELECT * FROM DB2..Table1
ORDER BY Col1

No sort:
CREATE VIEW VW1 AS
SELECT * FROM DB2..Table1

Which one is BETTER ?

Sort:
SELECT Col1
FROM Table1
WHERE ModifiedDate
IN (SELECT TOP 1 ModifiedDate
FROM Table1
ORDER BY ModifiedDate DESC)

No sort:
SELECT Col1
FROM Table1
WHERE ModifiedDate =
(SELECT MAX(ModifiedDate)
FROM Table1)

The OR operator


What is the difference?

OR:
SELECT Col1
FROM Table1
WHERE Col1 = 'x'
OR Col2 = 'y'

No OR:
SELECT Col1
FROM Table1
WHERE Col1 = 'x'

SELECT Col1
FROM Table1
WHERE Col2 = 'y'

SELECT Col1

FROM Table1

WHERE Col1 IN

(SELECT C1 FROM Table2)

OR Col1 IN

SELECT Col1

FROM Table1

WHERE EXISTS (SELECT 1 FROM Table2

WHERE Col1 = C1)

UNION ALL

SELECT 1 FROM Table2

WHERE Col1 = C2)

SELECT *

FROM Table1

WHERE Col1 IN

OR Col2 IN

SELECT *

FROM Table1
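One common way to remove an OR across two different columns is to split the query into two branches, each of which can use its own index. The sketch below is an assumption about what the slide intended, not a transcription of it; the extra predicate in the second branch keeps rows that match both conditions from being returned twice.

```sql
-- Original form: an OR across two columns.
SELECT Col1
FROM dbo.Table1
WHERE Col1 = 'x'
   OR Col2 = 'y';

-- Rewritten form: each branch can seek on its own index.
SELECT Col1
FROM dbo.Table1
WHERE Col1 = 'x'
UNION ALL
SELECT Col1
FROM dbo.Table1
WHERE Col2 = 'y'
  AND (Col1 <> 'x' OR Col1 IS NULL);  -- skip rows the first branch already returned
```

Plain UNION would also remove the overlap, but it adds a sort or hash step and collapses genuine duplicate rows.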

•Row Locks

•Page Locks

•Table Locks

Lock granularity

Row Locks

Page Locks

Table Locks

Lock granularity

> 5000

Principal lock types

Dirty Read

•WITH (NOLOCK)

• SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED

Nonrepeatable Read

• Default

Phantom Read

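ANSI Isolation Level | Name | Dirty Reads | Nonrepeatable Reads | Phantom Reads
Level 0 | Read uncommitted | Possible | Possible | Possible
Level 1 | Read committed (DEFAULT) | No | Possible | Possible
Level 2 | Repeatable reads | No | No | Possible
Level 3 | Serializable | No | No | No
(row versioning) | SNAPSHOT | No | No | No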

Programming with isolation

level locks

Database

Transaction

Statement/table
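The three scopes can be illustrated as follows. A sketch under stated assumptions: MyDatabase is a placeholder name and the pubs authors table is reused from the example in this deck.

```sql
-- Database scope: switch READ COMMITTED to row versioning and allow SNAPSHOT.
-- (Changing READ_COMMITTED_SNAPSHOT needs the database free of other active connections.)
ALTER DATABASE MyDatabase SET READ_COMMITTED_SNAPSHOT ON;
ALTER DATABASE MyDatabase SET ALLOW_SNAPSHOT_ISOLATION ON;

-- Session/transaction scope: applies to the statements that follow on this connection.
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN TRANSACTION;

    SELECT au_lname FROM dbo.authors;                -- runs under REPEATABLE READ

    -- Statement/table scope: a table hint overrides the session level
    -- for this one table reference.
    SELECT au_lname FROM dbo.authors WITH (NOLOCK);  -- dirty reads allowed here

COMMIT TRANSACTION;
```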


Isolation levels - example

USE pubs

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE

BEGIN TRANSACTION

SELECT au_lname FROM authors WITH (NOLOCK)

The locks generated are:

EXEC sp_lock


EXEC Sp_lock

SELECT object_name(85575343)

-----------------------------

authors

spid dbid ObjId IndId Type Resource Mode Status

51 5 0 DB S GRANT

51 10 85575343 2 KEY (a802b526c101) RangeS-S GRANT

51 10 85575343 2 KEY (54013f7c6be5) RangeS-S GRANT

51 10 85575343 2 KEY (b200dbb63a8d) RangeS-S GRANT

51 10 85575343 2 KEY (49014dc93755) RangeS-S GRANT

51 10 85575343 2 KEY (170130366f3d) RangeS-S GRANT

51 10 85575343 2 PAG 1:1482 IS GRANT

51 10 85575343 2 KEY (c300d27116cf) RangeS-S GRANT

51 10 85575343 0 TAB IS GRANT

51 10 85575343 2 KEY (1101ed75c8f8) RangeS-S GRANT

51 10 85575343 2 KEY (2802f6d3696b) RangeS-S GRANT

51 10 85575343 2 KEY (0701fdd03550) RangeS-S GRANT

51 10 85575343 2 KEY (7f00d0d5506b) RangeS-S GRANT

Temporary Objects


Temporary objects

##GlobalTmp

Tempdb..StaticTmp

@TableVariable

Table-valued functions

Common Table Expression (CTE)

View ?

FROM (SELECT …)
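Several of the options listed above, side by side. The table and column names are made up for illustration only.

```sql
-- Local temporary table: lives in tempdb, has statistics, can be indexed.
CREATE TABLE #LocalTmp (ID INT PRIMARY KEY, Name VARCHAR(100));

-- Table variable: scoped to the batch, no statistics.
DECLARE @TableVariable TABLE (ID INT PRIMARY KEY, Name VARCHAR(100));

INSERT INTO #LocalTmp (ID, Name) VALUES (1, 'a');
INSERT INTO @TableVariable (ID, Name)
SELECT ID, Name FROM #LocalTmp;

-- Common Table Expression: a named query, nothing is stored.
WITH Names_CTE AS (
    SELECT ID, Name FROM #LocalTmp
)
SELECT ID, Name FROM Names_CTE;

-- Derived table: FROM (SELECT ...)
SELECT d.ID
FROM (SELECT ID FROM @TableVariable) AS d;

DROP TABLE #LocalTmp;
```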


Stored Procedures…

What are the benefits of Stored Procedures?

Reduce network traffic

Reusable execution plans

Efficient Client execution requests

Code reuse

Encapsulation of logic

Client independence

Security implementation

As a general rule of thumb, all Transact-SQL code should be called from stored procedures.

Stored Procedures tips

SET NOCOUNT ON

No sp_

Owned by DBO

Exec databaseowner.objectname

Select from databaseowner.objectname

Break down large SPs

SP Recompilations

#temp instead of @Temp table variables

DDL statements

Some set commands

Use SQL Server Profiler to check recompilations

Which one is better and why?

IF @P = 0
SQL Statement Block1
ELSE
SQL Statement Block2

IF @P = 0
Exec sp_Block1
ELSE
Exec sp_Block2

What could be problematic here?

CREATE PROC MySP
@p_FROM INT, @p_TO INT
AS
SELECT count(*) FROM MyTable
WHERE PK between @p_FROM and @p_TO

198,739

3,898,787

CREATE … WITH RECOMPILE

EXECUTE … WITH RECOMPILE

sp_recompile objname

MyTable

7 million rows

Dynamic SQL… sp_executesql vs. EXECUTE

Which one is better and why?

EXEC ('SELECT Col1 FROM Table1 ' +

'WHERE ' + @WhereClause)

Exec sp_executesql @SQLString

Exec sp_executesql @SQLString,

@ParmDefinition, @PK = @IntVariable

Reusable Execution (Query) Plan -

generated by sp_executesql
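A complete parameterized call might look like this. The variable names follow the slide; the query text and the value 42 are illustrative assumptions.

```sql
DECLARE @SQLString      NVARCHAR(500),
        @ParmDefinition NVARCHAR(500),
        @IntVariable    INT;

-- The parameter stays a parameter inside the string: the same plan is
-- reused for every value, and the value cannot inject SQL.
SET @SQLString      = N'SELECT Col1 FROM dbo.Table1 WHERE PK = @PK';
SET @ParmDefinition = N'@PK INT';
SET @IntVariable    = 42;

EXEC sp_executesql @SQLString, @ParmDefinition, @PK = @IntVariable;
```

With EXEC('...' + @WhereClause) the statement text changes with every value, so a plan is rarely reused and the concatenated text is open to SQL injection.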


Cursors…

Cursors - implications

Resources Required at Each Stage

What could possibly replace cursors?

Loops ?

Temp tables

Local variables (!)

CASE statements

Multiple queries

AND…

Replacing cursor

Tip #1

Select Seq=identity(int,1,1),

……

Into #TmpTable

From Table1

Order by …

Seq Fld1 Fld2 …..

1 Aaa 45.7

2 Absb 555.0

3 Adasd 12.8

4 oioiooi 0.0

….. ….. ….. …..
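Once the rows are numbered in #TmpTable, a plain WHILE loop can walk them without a cursor. A sketch only: the data types of Fld1 and Fld2, the ORDER BY column and the per-row action (a PRINT) are assumptions. Note that the order in which IDENTITY values are assigned by SELECT ... INTO is not strictly guaranteed to follow the ORDER BY.

```sql
SELECT Seq = IDENTITY(INT, 1, 1), Fld1, Fld2
INTO #TmpTable
FROM dbo.Table1
ORDER BY Fld1;

-- Make each lookup by Seq a seek.
CREATE UNIQUE CLUSTERED INDEX IX_TmpTable_Seq ON #TmpTable (Seq);

DECLARE @Seq INT, @MaxSeq INT, @Fld1 VARCHAR(50), @Fld2 DECIMAL(10, 1);
SELECT @MaxSeq = MAX(Seq) FROM #TmpTable;
SET @Seq = 1;

WHILE @Seq <= @MaxSeq   -- never entered when the table is empty (@MaxSeq is NULL)
BEGIN
    SELECT @Fld1 = Fld1, @Fld2 = Fld2
    FROM #TmpTable
    WHERE Seq = @Seq;

    -- Per-row work goes here (placeholder action).
    PRINT @Fld1 + ' ' + CAST(@Fld2 AS VARCHAR(20));

    SET @Seq = @Seq + 1;
END

DROP TABLE #TmpTable;
```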

Replacing cursor

Tip #2

declare @var int

set @var = 0

Update Table1
set @Var = Fld2 = Fld2 + @Var
From Table1 with (index=pk_MyExampleTable)
option (maxdop 1)
go

Cursor Example…

TRY ME….

Optimizer Hints…

Optimizer Hints

Most common

WITH (ROWLOCK)

WITH (NOLOCK)

WITH (INDEX = IX_INDEX_NAME)

WITH (HOLDLOCK)

SET FORCEPLAN ON

OPTION (MAXDOP 1)

Join hints (MERGE/HASH/LOOP)

Isolation levels: WITH (SERIALIZABLE), WITH (READCOMMITTED)

Granularity level (UPDLOCK, TABLOCK, TABLOCKX)


What is possibly wrong here?

BEGIN TRAN

UPDATE MyTable SET Col1 = 'x'

WHERE Col1 IN

(SELECT Col1 from MyTable_2)

COMMIT TRAN

MyTable

BEGIN TRAN

UPDATE MyTable SET Col1 = 'x'

WHERE Col1 IN

(SELECT Col1 from MyTable_2 WITH (NOLOCK) )

COMMIT TRAN

Tip…

If your database is Read Only in

nature, change it to be as such!

The Transaction Log…


What is wrong here?

BEGIN TRAN

WHERE Col1 = 'y'

IF @@ROWCOUNT <> 10

ROLLBACK TRAN

COMMIT TRAN

MyTable

1000 rows with Col1 = 'y'


What could be possibly

wrong here?

BEGIN TRAN

DELETE MyTable

COMMIT TRAN

MyTable

7 million rows

T-Log size

Concurrency

How do we “solve” this ?

What if we have a WHERE clause in the DELETE ?
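One way to approach both questions is sketched below. The batch size of 10,000 and the Col1 = 'y' filter are assumptions for illustration.

```sql
-- No WHERE clause and every row must go: TRUNCATE TABLE deallocates pages
-- instead of logging each row (not allowed if foreign keys reference the table).
-- TRUNCATE TABLE dbo.MyTable;

-- With a WHERE clause: delete in small batches, one short transaction each,
-- so locks are released and the log can be truncated or backed up between batches.
DECLARE @BatchSize INT, @Rows INT;
SET @BatchSize = 10000;
SET @Rows = 1;

WHILE @Rows > 0
BEGIN
    BEGIN TRAN;

    DELETE TOP (@BatchSize)
    FROM dbo.MyTable
    WHERE Col1 = 'y';

    SET @Rows = @@ROWCOUNT;   -- 0 when nothing is left to delete

    COMMIT TRAN;
END
```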

Transaction Habits

As short as possible

Long transactions:

Reduce concurrency

Blocking and deadlocks more likely

Transaction log space cannot be reused while the transaction is open.

T-log IO

No “logical” ROLLBACKS!


Triggers…


What is wrong here?

CREATE TRIGGER TRG_MyTable_UP
ON MyTable
AFTER INSERT
AS
UPDATE MyTable
SET InsertDate = getdate()
FROM MyTable
INNER JOIN inserted ON MyTable.PK = inserted.PK

PK Insert

345667

MyTable

Typical Trigger Applications

• Cascading modifications through related tables

• Rolling back changes that violate data integrity

• Enforcing restrictions that are too complex for rules or constraints

• Maintaining duplicate data

• Maintaining columns with derived data

• Performing custom recording

• Try to use constraints instead of triggers, whenever possible.
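For the InsertDate trigger shown earlier, a DEFAULT constraint is one possible replacement. A sketch: the constraint name is made up, and it assumes the application never needs to supply its own InsertDate value.

```sql
-- The date is filled in as part of the INSERT itself:
-- no second UPDATE, no trigger firing, no extra log records.
ALTER TABLE dbo.MyTable
ADD CONSTRAINT DF_MyTable_InsertDate DEFAULT (GETDATE()) FOR InsertDate;
```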


Tables Design Issues…

Column name | Type | Property | Key/index
Employee ID | Int | NOT NULL, Identity (values are unique) | Clustered
First Name | Char(100) | NOT NULL |
Last Name | Char(100) | NOT NULL |
Hire Date | Datetime | NULL |
Description | Varchar(8000) | NULL |
ContractEndDate | Char(8) | NOT NULL | Index
SelfDescription | Varchar(8000) | NOT NULL default '' |
Picture | Image | NULL |
Comments | Text | NULL |

Application rules:

All queries fetch EmployeeID, FirstName, LastName and HireDate WHERE EmployeeID equals or is BETWEEN two values, and where ContractEndDate >= getdate()

All other column are fetched only when user drills down from application

FirstName, LastName, HireDate and ContractEndDate rarely change

Comments , Description And SelfDescription are rarely filled up and they never appear in the WHERE clause

Picture column is ALWAYS updated after row already exists.

Once the contract ends, the data should be saved but will not be queried by application

Employees


First…

Column name | Type | Property | Key/index | Change to
Employee ID | Int | NOT NULL, Identity (values are unique) | Clustered | Clustered UNIQUE
First Name | Char(100) | NOT NULL | | Varchar(100)
Last Name | Char(100) | NOT NULL | | Varchar(100)
Hire Date | Datetime | NULL | |
Description | Varchar(8000) | NULL | |
ContractEndDate | Char(8) | NOT NULL | Index | Datetime
SelfDescription | Varchar(8000) | NOT NULL default '' | |
Picture | Image | NULL | | Varbinary(MAX)
Comments | Text | NULL | | Varchar(MAX)


Column name Key/index

Employee ID Clustered PK

First Name

Last Name

Hire Date

ContractEndDate Index

Description

SelfDescription

Picture

Comments

Employees (active

employees)

First Name

Last Name

Hire Date

Description

ContractEndDate

SelfDescription

Picture

Comments

OldEmployees (inactive

employees)

4 different tables?

Employees details

This is vertical

partitioning…

Column name Type

Employee ID INT

First Name Varchar(100)

Last Name Varchar(100)

Hire Date Datetime

ContractEndDate Datetime

Column name Type

Employee ID INT

Hire Date Datetime

Column name Type

Employee ID INT

Hire Date Datetime

Contract Date < 2008-01-01

Contract Date >= 2008-01-01

and < 2009-01-01

Contract Date >= 2009-01-01

Horizontal partitioning
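The three date ranges above can be expressed with a partition function and scheme. A minimal sketch: all object names are invented, every partition is placed on the PRIMARY filegroup for simplicity, and table partitioning is assumed to be available in the edition in use (an Enterprise feature in SQL Server 2005/2008).

```sql
-- RANGE RIGHT: each boundary value belongs to the partition on its right, giving
--   partition 1: ContractEndDate <  2008-01-01
--   partition 2: ContractEndDate >= 2008-01-01 and < 2009-01-01
--   partition 3: ContractEndDate >= 2009-01-01
CREATE PARTITION FUNCTION pf_ContractDate (DATETIME)
AS RANGE RIGHT FOR VALUES ('20080101', '20090101');

CREATE PARTITION SCHEME ps_ContractDate
AS PARTITION pf_ContractDate ALL TO ([PRIMARY]);

CREATE TABLE dbo.Employees_Partitioned (
    EmployeeID      INT      NOT NULL,
    HireDate        DATETIME NULL,
    ContractEndDate DATETIME NOT NULL,
    -- The partitioning column has to be part of the unique clustered key.
    CONSTRAINT PK_Employees_Partitioned
        PRIMARY KEY CLUSTERED (EmployeeID, ContractEndDate)
) ON ps_ContractDate (ContractEndDate);
```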


Tips for the application side…

Server-side cursors prior to .NET 2.0

Sorts and grouping on the client

End-user reporting

Default Transaction isolation levels

Intensive communication with database

Connection pooling

Long transactions

Ad-hoc T-SQL

SQL injection…

Beware of…


Performance Audit Checklist

Does the Transact-SQL code return more data than needed?

Is the interaction between the application and the database server too frequent?

Are cursors being used when they don't need to be? Does the application use server-side cursors?

Are UNION and UNION ALL properly used?

Is SELECT DISTINCT being used properly?

Is the WHERE clause SARGable?

Are temp tables being used when they don't need to be?

Are hints being properly used in queries?

Are views unnecessarily being used?

Are stored procedures and sp_executesql being used whenever possible?

Inside stored procedures, is SET NOCOUNT ON being used?

Do any of your stored procedures start with sp_?

Are all stored procedures owned by DBO, and referred to in the form of databaseowner.objectname?

Are you using constraints or triggers for referential integrity?

Are transactions being kept as short as possible? Does the application keep transactions open when the user is modifying data?

Is the application properly opening, reusing, and closing connections?

Questions/

Autographs

End of Part II…
