Page 2: Large Data Operations Joe Chang jchang6@yahoo.com .

Large Data Operations Overview

Updates & Deletes
Modifying large row counts can be very slow?
Dropping indexes improves performance?

Inserts – See SQLDev.Net
Covered in various presentations by Gert Drapers

Execution Plan with Indexes

1. Insert multiple rows into table with clustered index
2. Rows are spooled
3. Nonclustered indexes are modified from the spooled data

Operations with indexes in place should be faster
Exception - large inserts where bulk log requirements are met

Execution Plan Cost Formula Review

Table Scan or Index Scan

I/O: 0.0375785 + 0.000740741 per page
CPU: 0.0000785 + 0.0000011 per row

Index Seek – Plan Formula

I/O Cost = 0.006328500 + 0.000740741 per additional page (≤1GB)
         = 0.003203425 + 0.000740741 per additional page (>1GB)
CPU Cost = 0.000079600 + 0.000001100 per additional row

Bookmark Lookup

I/O Cost = multiple of 0.006250000 (≤1GB)
         = multiple of 0.003124925 (>1GB)
CPU Cost = 0.0000011 per row

Insert, Update & Delete

IUD I/O Cost ~ 0.01002 – 0.01010 (>100 rows)

IUD CPU Cost = 0.000001 per row
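
The formulas above can be written out as small functions. This is a minimal Python sketch, not SQL Server's actual costing code: the function names and the large_memory flag (standing in for the slide's ≤1GB / >1GB split) are assumptions made for illustration, and the bookmark lookup assumes one I/O unit per row, whereas the slide only says "multiple of".

```python
# Sketch of the plan cost formulas listed above (constants copied from the slide).
# Function names and the large_memory flag are illustrative, not SQL Server APIs.

def scan_cost(pages, rows):
    """Table scan or index scan: returns (io_cost, cpu_cost)."""
    io = 0.0375785 + 0.000740741 * pages
    cpu = 0.0000785 + 0.0000011 * rows
    return io, cpu

def seek_cost(pages, rows, large_memory=False):
    """Index seek: the base covers the first page and row, extras are added on top."""
    base_io = 0.003203425 if large_memory else 0.006328500
    io = base_io + 0.000740741 * max(pages - 1, 0)
    cpu = 0.000079600 + 0.000001100 * max(rows - 1, 0)
    return io, cpu

def bookmark_lookup_cost(rows, large_memory=False):
    """Bookmark lookup: assumes one I/O unit per row (the slide says 'multiple of')."""
    per_lookup = 0.003124925 if large_memory else 0.006250000
    return per_lookup * rows, 0.0000011 * rows

single_seek = seek_cost(pages=1, rows=1)
one_lookup = bookmark_lookup_cost(rows=1)
full_scan = scan_cost(pages=101_012, rows=10_000_000)
```

Keeping I/O and CPU as separate return values mirrors how the two components are listed separately on the slide.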

Plan Cost – Unit of Measure

Time in seconds? CPU time?

0.0062500 sec -> 160/sec
0.000740741 sec -> 1,350/sec (8KB) -> 169/sec (64KB) -> 10.8MB/sec

S2K BOL: Administering SQL Server, Managing Servers, Setting Configuration Options: cost threshold for parallelism Option
"Query cost refers to the estimated elapsed time, in seconds, required to execute a query on a specific hardware configuration."

Too fast for 7200RPM disk random I/Os.

About right for 1997 sequential disk transfer rate?
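
The rates quoted on this slide follow from reading each cost constant as seconds per operation and inverting it. A minimal Python sketch of that arithmetic, assuming 8KB pages and the slide's convention of 1MB = 1000KB (variable names are illustrative):

```python
# Interpret each cost constant as seconds per operation and invert it.
PAGE_KB = 8  # SQL Server page size in KB

random_io_cost = 0.0062500          # bookmark lookup I/O unit
sequential_page_cost = 0.000740741  # per-page scan cost

random_ios_per_sec = 1 / random_io_cost
pages_per_sec = 1 / sequential_page_cost
reads_64k_per_sec = pages_per_sec * PAGE_KB / 64
mb_per_sec = pages_per_sec * PAGE_KB / 1000  # 1 MB taken as 1000 KB

print(round(random_ios_per_sec), round(pages_per_sec),
      round(reads_64k_per_sec), round(mb_per_sec, 1))
# 160 1350 169 10.8
```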

Test Table

CREATE TABLE M3C_00 (
  ID int NOT NULL, ID2 int NOT NULL, ID3 int NOT NULL,
  ID4 int NOT NULL, ID5 int NOT NULL, ID6 int NOT NULL,
  SeqID int NOT NULL, DistID int NOT NULL,
  Value char(10) NOT NULL,
  rDecimal decimal (9,4) NOT NULL,
  rMoney money NOT NULL,
  rDate datetime NOT NULL, sDate datetime NOT NULL )

CREATE CLUSTERED INDEX IX_M3C_00 ON M3C_00 (ID) WITH SORT_IN_TEMPDB

10M rows in table, 99 rows per page, 101,012 pages, 808MB
100K rows for each distinct value of SeqID and DistID
Common SeqID values are in adjacent rows
Common DistID values are in separate 8KB pages (100 rows apart)

Data Population Script

DECLARE @BatchStart int, @BatchEnd int, @BatchTotal int, @BatchSize int,
        @BatchRow int, @RowTotal int, @I int, @p int, @sc1 int, @dv1 int
SELECT @BatchStart = 1, @BatchEnd = 1000, @BatchTotal = 1000, @BatchSize = 10000
SELECT @RowTotal = @BatchTotal*@BatchSize, @p = 100, @sc1 = 100000
SELECT @I = (@BatchStart-1)*@BatchSize+1, @dv1 = @RowTotal/@sc1
WHILE @BatchStart <= @BatchEnd
BEGIN
  BEGIN TRANSACTION
  SELECT @BatchRow = @BatchStart*@BatchSize
  WHILE @I <= @BatchRow
  BEGIN
    INSERT M3C_00 (ID,ID2,ID3,ID4,ID5,ID6,SeqID,DistID,Value,rDecimal,rMoney,rDate,sDate)
    VALUES ( @I, @I,
      1+(@I-1)*@p/@RowTotal+((@I-1)*@p)%@RowTotal,
      (@I-1)%(@sc1)+1, (@I-1)/2+1, (@I-1)%320+1,
      (@I-1)/@sc1+1, (@I-1)%(@dv1)+1,
      CHAR(65+26*rand())+CHAR(65+26*rand())+CHAR(65+26*rand())
        +CONVERT(char(6),CONVERT(int,100000*(9.0*rand()+1.0)))+CHAR(65+26*rand()),
      10000*rand(), 10000*rand(),
      DATEADD(hour,100000*rand(),'1990-01-01'),
      DATEADD(hour,@I/5,'1990-01-01') )
    SET @I = @I+1
  END
  COMMIT TRANSACTION
  CHECKPOINT
  PRINT CONVERT(char,GETDATE(),121)+' row ' + CONVERT(char,@BatchRow)+' Complete'
  SET @BatchStart = @BatchStart+1
END
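
The SeqID and DistID expressions in the script determine how the test rows are laid out. The Python sketch below re-evaluates those two expressions with integer arithmetic to show the resulting pattern. The page_of helper is hypothetical: it assumes rows fill pages strictly in ID order at 99 rows per page, which the slides imply but do not state.

```python
# Column formulas copied from the INSERT above, using Python integer division.
ROW_TOTAL = 10_000_000   # @RowTotal = @BatchTotal * @BatchSize
SC1 = 100_000            # @sc1
DV1 = ROW_TOTAL // SC1   # @dv1 = 100
ROWS_PER_PAGE = 99       # from the test table slide

def seq_id(i):
    """SeqID: (@I-1)/@sc1+1, constant over runs of 100,000 adjacent rows."""
    return (i - 1) // SC1 + 1

def dist_id(i):
    """DistID: (@I-1)%@dv1+1, repeating every 100 rows."""
    return (i - 1) % DV1 + 1

def page_of(i):
    """Hypothetical helper: page number if rows fill pages in ID order."""
    return (i - 1) // ROWS_PER_PAGE

first_dist_91 = [i for i in range(1, 1001) if dist_id(i) == 91][:3]
pages_dist_91 = [page_of(i) for i in first_dist_91]
first_seq_91 = (91 - 1) * SC1 + 1   # first ID with SeqID = 91
rows_per_dist_value = ROW_TOTAL // DV1
```

Under these assumptions, rows sharing a DistID sit 100 IDs apart and therefore on different pages, while rows sharing a SeqID form one contiguous run.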

Data Population Script Notes

Double While Loop

Each Insert/Update/Delete statement is an implicit transaction

Gets separate transaction log entry

Explicit transaction – generates a single transaction log write (max 64KB per IO)

Single TRAN for entire loop requires excessively large log file

Inserts are grouped into intermediate size batches
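
The batching pattern described in these notes can be sketched outside SQL Server. The example below uses Python's built-in sqlite3 module with an in-memory database purely as a stand-in; the table name demo, the function name, and the batch sizes are assumptions for illustration, and SQLite's logging behaviour differs from SQL Server's. It only shows the control flow: one commit per intermediate-size batch rather than one per row or one for the whole load.

```python
import sqlite3

def populate_in_batches(conn, total_rows, batch_size):
    """Insert total_rows rows, committing once per batch; returns the commit count."""
    cur = conn.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS demo (id INTEGER PRIMARY KEY, seq_id INTEGER)")
    commits = 0
    next_id = 1
    while next_id <= total_rows:                      # outer loop: one pass per batch
        batch_end = min(next_id + batch_size - 1, total_rows)
        rows = [(i, (i - 1) // batch_size + 1)        # inner loop: rows of this batch
                for i in range(next_id, batch_end + 1)]
        cur.executemany("INSERT INTO demo (id, seq_id) VALUES (?, ?)", rows)
        conn.commit()                                 # one commit per batch
        commits += 1
        next_id = batch_end + 1
    return commits

conn = sqlite3.connect(":memory:")
commits = populate_in_batches(conn, total_rows=10_000, batch_size=1_000)
row_count = conn.execute("SELECT COUNT(*) FROM demo").fetchone()[0]
```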

Indexes

CREATE INDEX IX_M3C_01_Seq ON M3C_01 (SeqID) WITH SORT_IN_TEMPDB
CHECKPOINT

CREATE INDEX IX_M3C_01_Dist ON M3C_01 (DistID) WITH SORT_IN_TEMPDB
CHECKPOINT

UPDATE STATISTICS M3C_01 (IX_M3C_01_Seq) WITH FULLSCAN
UPDATE STATISTICS M3C_01 (IX_M3C_01_Dist) WITH FULLSCAN

Common SeqID values are in adjacent rows
Common DistID values are in separate 8KB pages (100 rows apart)

Test Queries

-- Sequential rows, table scan
SELECT AVG(rMoney) FROM M3C_01 WHERE SeqID = 91

-- Sequential rows, index seek and bookmark lookup
SELECT AVG(rMoney) FROM M3C_01 WITH(INDEX(IX_M3C_01_Seq)) WHERE SeqID = 91

-- Distributed rows, table scan
SELECT AVG(rMoney) FROM M3C_01 WHERE DistID = 91

-- Distributed rows, index seek and bookmark lookup
SELECT AVG(rMoney) FROM M3C_01 WITH(INDEX(IX_M3C_01_Dist)) WHERE DistID = 91

Execution Plans - Select

Table scan involves 101,012 pages
Bookmark Lookup involves 100,000 rows
1 BL ~3.6X more expensive than 1 page in Table Scan

Table Scan Cost Detail

Table Scan Formula

I/O: 0.0375785 + 0.000740741 x 101,012 = 74.86
CPU: 0.0000785 + 0.0000011 x 10M = 11.0

I/O and CPU cost occasionally show ½ the expected value, but combined cost shows the expected value

Index and Bookmark Details

Bookmark Lookup

I/O: 0.003124925 x 100K x 0.998 = 311.87
CPU: 0.0000011 x 100K = 0.11
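
Putting the table scan and bookmark lookup figures side by side shows where the "~3.6X" comparison may come from. This Python sketch is one possible reading, not the author's stated derivation: it compares total plan costs (I/O plus CPU), takes the 0.998 factor from the slide as-is, and ignores the cost of the index seek itself. The break-even row count is a derived illustration, not a figure from the slides.

```python
ROWS = 100_000           # rows touched by the bookmark lookup
PAGES = 101_012          # pages in the table
TABLE_ROWS = 10_000_000  # rows examined by the table scan

# Table scan plan cost
scan_io = 0.0375785 + 0.000740741 * PAGES
scan_cpu = 0.0000785 + 0.0000011 * TABLE_ROWS
scan_total = scan_io + scan_cpu

# Bookmark lookup plan cost (0.998 factor copied from the slide)
lookup_io = 0.003124925 * ROWS * 0.998
lookup_cpu = 0.0000011 * ROWS
lookup_total = lookup_io + lookup_cpu

total_ratio = lookup_total / scan_total                          # plan vs plan
per_unit_ratio = (lookup_total / ROWS) / (scan_total / PAGES)    # lookup vs page
break_even_rows = scan_total / (lookup_total / ROWS)             # derived estimate
```

Both ratios land between 3.6 and 3.7, and the estimated break-even is on the order of 27,500 rows, beyond which the cost model favors the table scan.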

Measured Query Times

SELECT query, 100K rows

                      Sequential rows            Distributed rows
                      Index + BL   Table Scan    Index + BL   Table Scan
256M server mem
Query time (sec)      0.3          10.5          167          10.5
Rows or Pages/sec     333,333 (R)  9,620 (P)     599 (R)      9,620 (P)
Disk IO/sec           Low          ~1,200        ~600         ~1,200
Avg. Byte/Read        N/A          64K           8K           64K

1154MB server mem
Query time (sec)      0.266        1.076         0.373        1.090
Rows or Pages/sec     376,000      93,877        268,000      92,672

Test System: 2x2.4GHz Xeon, data on 2 15K disk drives
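
The throughput figures in the table follow from the query times: rows per second for the index plus bookmark lookup plans, pages per second for the table scans. A short Python sketch of that calculation for the 256M case, with dictionary keys chosen here for illustration:

```python
ROWS = 100_000   # rows returned by each test query
PAGES = 101_012  # pages read by a full table scan

# Query times in seconds at 256M server memory, from the table above
measured = {
    ("sequential", "index+BL"): 0.3,
    ("sequential", "table scan"): 10.5,
    ("distributed", "index+BL"): 167,
    ("distributed", "table scan"): 10.5,
}

def throughput(plan, seconds):
    """Rows/sec for index plans, pages/sec for table scans."""
    units = ROWS if plan == "index+BL" else PAGES
    return units / seconds

rates = {key: round(throughput(key[1], secs)) for key, secs in measured.items()}

# How the two plans compare in elapsed time for each row layout
seq_scan_vs_index = measured[("sequential", "table scan")] / measured[("sequential", "index+BL")]
dist_index_vs_scan = measured[("distributed", "index+BL")] / measured[("distributed", "table scan")]
```

With sequential rows the index plan is about 35 times faster than the scan; with distributed rows and limited memory it is about 16 times slower.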

Disk Bound Select Query Cost

Performance limited by disk capability

Random 300/disk (small portion of 18GB drive & high queue depth)

Sequential 38MB/sec (Seagate ST318451, first generation 15K drive)

Disk drive random I/O ~2X gain since mid-1990s
Sequential I/O ~5X

Cost formulas underestimate current generation disk drive sequential performance relative to random
However, SQL Server cost formulas do not reflect in-memory costs
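
The per-disk figures can be related back to the measured 256M query times, and the measured sequential-to-random ratio can be set against the ratio built into the cost constants. This Python sketch assumes the two data disks share the load evenly and that each distributed-row lookup costs one random read; both are assumptions, not statements from the slides.

```python
DISKS = 2
TABLE_MB = 808          # table size from the test table slide
SCAN_SECONDS = 10.5     # disk-bound table scan
LOOKUP_ROWS = 100_000
LOOKUP_SECONDS = 167    # disk-bound index + bookmark lookup, distributed rows
PAGES = 101_012

seq_mb_per_disk = TABLE_MB / SCAN_SECONDS / DISKS
random_ios_per_disk = LOOKUP_ROWS / LOOKUP_SECONDS / DISKS

# Pages read sequentially per second vs random lookups per second
measured_ratio = (PAGES / SCAN_SECONDS) / (LOOKUP_ROWS / LOOKUP_SECONDS)

# The same ratio implied by the cost constants (random I/O unit / per-page cost)
model_ratio_small_mem = 0.006250000 / 0.000740741   # ≤1GB constant
model_ratio_large_mem = 0.003124925 / 0.000740741   # >1GB constant
```

The measured ratio of roughly 16 sits well above the 4.2 to 8.4 implied by the formulas, which is consistent with the point that the cost model underestimates sequential performance relative to random.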

Page 16: Large Data Operations Joe Chang jchang6@yahoo.com .

Update OperationUpdate Operation

Page 17: Large Data Operations Joe Chang jchang6@yahoo.com .

Update DetailsUpdate Details

Page 18: Large Data Operations Joe Chang jchang6@yahoo.com .

Actual Cost - Update Actual Cost - Update

UPDATE query - 100K rows

                        Sequential rows          Distributed rows
                        Index      Table Scan    Index      Table Scan
256M server mem
Query time (sec)        1.3        12.6          476.6      28
Checkpoint time (sec)   0.4        0.6           14.5       8
Rows/sec                57,471     7,576         203        2,778

1154MB server mem
Query time (sec)        0.8        1.3           0.9        1.5
Checkpoint time (sec)   0.2        0.1           23         23
Rows/sec                100,000    71,429        4,184      4,082
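
The Rows/sec figures appear to be 100K rows divided by query time plus checkpoint time. That is an inference from the numbers, not something the slide states. A Python sketch checking it, with dictionary keys chosen here for illustration:

```python
ROWS = 100_000

# (query seconds, checkpoint seconds) from the UPDATE table above
update_256mb = {
    "seq index": (1.3, 0.4),
    "seq table scan": (12.6, 0.6),
    "dist index": (476.6, 14.5),
    "dist table scan": (28, 8),
}
update_1154mb = {
    "seq index": (0.8, 0.2),
    "seq table scan": (1.3, 0.1),
    "dist index": (0.9, 23),
    "dist table scan": (1.5, 23),
}

def rows_per_sec(query_sec, checkpoint_sec):
    """Throughput counting the checkpoint as part of the operation."""
    return ROWS / (query_sec + checkpoint_sec)

rates_256mb = {k: rows_per_sec(*v) for k, v in update_256mb.items()}
rates_1154mb = {k: rows_per_sec(*v) for k, v in update_1154mb.items()}
```

Seven of the eight results match the table to within rounding. The 256M sequential index case comes out near 58,800 rather than 57,471, which would be explained if the displayed times were rounded from about 1.74 seconds in total.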

Update Variation

Default plan is now a table scan
Column value is not in the index, so a bookmark lookup is required
However – data page must be loaded into buffer cache before it can be modified regardless!

Delete Operation

Delete Details

Delete Details (2)

Delete - Actual Costs

Delete query - 100K rows

                        Sequential rows          Distributed rows
                        Index      Table Scan    Index      Table Scan
256M server mem
Query time (sec)        4.8        88.52         282        41
Checkpoint time (sec)   8.4        4.52          8.4        14
Rows/sec                7,576      1,075         340        1,800

1154MB server mem
Query time (sec)        4.1        6.4           4.2        5.3
Checkpoint time (sec)   3.7        3.9           28.6       28.6
Rows/sec                12,821     9,708         3,048      2,949

Delete – no indexes

Delete query, no index - 100K rows

                        Sequential rows   Distributed rows
                        Table Scan        Table Scan
256M server mem
Query time (sec)        11.5              26
Checkpoint time (sec)   0.1               4
Rows/sec                8,621             3,300

1154MB server mem
Query time (sec)        1.9               1.5
Checkpoint time (sec)   0.2               22
Rows/sec                47,619            4,255
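
Comparing these figures with the table scan columns of the earlier Delete - Actual Costs table gives a rough measure of what the indexes cost during a delete. The Python sketch below sums query and checkpoint time for each case; the dictionary keys are illustrative. It does not include the time to drop and rebuild the indexes, which the slides do not report.

```python
# Total seconds (query + checkpoint) for the table scan delete plan
with_indexes = {
    "seq 256M": 88.52 + 4.52,
    "dist 256M": 41 + 14,
    "seq 1154MB": 6.4 + 3.9,
    "dist 1154MB": 5.3 + 28.6,
}
no_indexes = {
    "seq 256M": 11.5 + 0.1,
    "dist 256M": 26 + 4,
    "seq 1154MB": 1.9 + 0.2,
    "dist 1154MB": 1.5 + 22,
}

# How many times longer the delete takes with the indexes in place
slowdown = {case: with_indexes[case] / no_indexes[case] for case in no_indexes}
```

On these numbers the delete with indexes in place takes from about 1.4 to about 8 times as long, with the largest gap for sequential rows at 256M.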

Delete with Foreign Keys

Summary

When large updates and deletes are slow

Examine the execution plan

Look for nonclustered index seeks on modified tables with high row count

Use index hint to force table scan

Additional Information

www.sql-server-performance.com/joe_chang.asp

SQL Server Quantitative Performance Analysis
Server System Architecture
Processor Performance
Direct Connect Gigabit Networking
Parallel Execution Plans
Large Data Operations
Transferring Statistics
SQL Server Backup Performance with Imceda LiteSpeed

jchang6@yahoo.com