Project Management Database and SQL Server Katmai New Features

Qingsong Yao


Disclaimer

• The content in these slides is for demonstration only. Please do your own research before applying either sparse columns or data compression to your production server.

Topics

• Katmai-Related New Features
• A Project Management System
• Experimental Results
• References

Sparse Columns

• Sparse is purely a storage attribute
• DDL support for specifying a column as SPARSE:
  – CREATE TABLE t (id int, sparseProp1 int SPARSE NULL);
  – ALTER TABLE t ALTER COLUMN sparseProp1 DROP SPARSE;
• No query or DML behavior changes for a sparse column
• Column metadata has a bit that indicates the sparse attribute
• Storage optimization:
  – Sparse vector design: 0 bytes are stored for a NULL value
  – Non-NULL values carry overhead (4 bytes per non-NULL column, plus a 4-byte header)
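The storage trade-off above can be sketched with a little arithmetic. The Python below is a simplified model that uses the slide's figures: 0 bytes per NULL and 4 bytes of overhead per non-NULL sparse value. It ignores the shared 4-byte header. The function names are hypothetical. The model estimates the NULL fraction at which a fixed-length column starts to benefit from SPARSE.

```python
# Simplified model of the slide's sparse-storage rule (an assumption, not the
# exact on-disk format): a regular fixed-length column always costs its full
# width; a sparse column costs 0 bytes when NULL and width + 4 bytes otherwise.
# The 4-byte sparse header is ignored because all sparse columns in a row share it.

SPARSE_OVERHEAD_BYTES = 4  # per non-NULL sparse value, per the slide


def expected_bytes_regular(width: int) -> float:
    """Average bytes per row for a regular fixed-length column (NULL or not)."""
    return float(width)


def expected_bytes_sparse(width: int, null_fraction: float) -> float:
    """Average bytes per row for the same column declared SPARSE."""
    return (1.0 - null_fraction) * (width + SPARSE_OVERHEAD_BYTES)


def break_even_null_fraction(width: int) -> float:
    """NULL fraction above which SPARSE saves space in this model."""
    # Solve width == (1 - p) * (width + overhead) for p.
    return SPARSE_OVERHEAD_BYTES / (width + SPARSE_OVERHEAD_BYTES)


if __name__ == "__main__":
    for name, width in [("int", 4), ("datetime", 8), ("float", 8)]:
        p = break_even_null_fraction(width)
        print(f"{name}: sparse pays off above {p:.0%} NULL")
```

The model ignores the header and other row-format details, so the real break-even points are likely somewhat higher than these figures. The 67% threshold used in the experiments is above all of them.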

Sparse Columns Restrictions

• Sparse columns cannot be part of a clustered index key, a primary key index, or a partition key.
• A sparse column cannot be the key column of a unique index.
• Unique constraints are also not allowed on sparse columns.
• Sparse columns cannot be defined as NOT NULL and cannot have DEFAULT values.
• Rules are not supported on sparse columns.
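The restriction list above can be turned into a pre-flight check for a conversion script. This is a minimal Python sketch under stated assumptions. ColumnDef and sparse_violations are hypothetical names, and the sketch does not populate the flags (a real script might read them from catalog views). It encodes only the rules listed on this slide, not every restriction SQL Server enforces.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ColumnDef:
    """Hypothetical summary of a column's current definition and usage."""
    name: str
    nullable: bool = True
    has_default: bool = False
    has_rule: bool = False
    in_clustered_or_pk_key: bool = False
    in_partition_key: bool = False
    in_unique_index_key: bool = False
    in_unique_constraint: bool = False


def sparse_violations(col: ColumnDef) -> List[str]:
    """List the slide's restrictions that would block marking `col` as SPARSE."""
    checks = [
        (col.in_clustered_or_pk_key, "part of a clustered index key or primary key index"),
        (col.in_partition_key, "part of a partition key"),
        (col.in_unique_index_key, "key column of a unique index"),
        (col.in_unique_constraint, "covered by a unique constraint"),
        (not col.nullable, "declared NOT NULL"),
        (col.has_default, "has a DEFAULT value"),
        (col.has_rule, "has a rule bound to it"),
    ]
    return [message for failed, message in checks if failed]


if __name__ == "__main__":
    print(sparse_violations(ColumnDef("Fld10004")))  # no violations
    print(sparse_violations(ColumnDef("Id", nullable=False, in_clustered_or_pk_key=True)))
```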

Filtered Indices

• A filtered index is a regular index with an optional simple filter (predicate) on the table. Only the rows that satisfy the predicate are included in the index.
• Example:
  – Create Index filtered_index on WSS.List(Author) where ListId = 5
• Examples of valid filter expressions:
  – Listid = 10 and folderid > 20
  – Listid = 10 and folderid > 20 and folderid < 50
  – Listid in (10, 20, 30)
  – Listid in (10, 20) and folderid in (15, 25)
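The phrase "only qualifying rows are indexed" can be modeled in Python as a toy. build_filtered_index is a hypothetical helper, and the rows are made-up data whose column names are borrowed from the examples. A real filtered index is a B-tree that SQL Server maintains, not a sorted list. The sketch shows only that the filter decides which rows get index entries.

```python
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]  # hypothetical row: column name -> value


def build_filtered_index(rows: List[Row], key: str,
                         predicate: Callable[[Row], bool]) -> List[Tuple[Any, int]]:
    """Return (key value, row position) entries, sorted by key, for qualifying rows only."""
    entries = [(row[key], pos) for pos, row in enumerate(rows) if predicate(row)]
    # Sort NULL (None) keys last so None is never compared with other values.
    return sorted(entries, key=lambda e: (e[0] is None, e[0] if e[0] is not None else 0, e[1]))


rows = [
    {"ListId": 10, "FolderId": 25, "Author": "ann", "Fld": None},
    {"ListId": 10, "FolderId": 5, "Author": "bob", "Fld": 7},
    {"ListId": 20, "FolderId": 30, "Author": "cy", "Fld": None},
    {"ListId": 10, "FolderId": 40, "Author": "al", "Fld": None},
]

# Filter: ListId = 10 AND FolderId > 20
idx = build_filtered_index(rows, "Author", lambda r: r["ListId"] == 10 and r["FolderId"] > 20)
# Filter: ListId IN (10, 20) AND FolderId IN (15, 25)
idx2 = build_filtered_index(rows, "Author",
                            lambda r: r["ListId"] in (10, 20) and r["FolderId"] in (15, 25))
# NOT NULL filter over a mostly-NULL column: the index stays tiny.
idx3 = build_filtered_index(rows, "Fld", lambda r: r["Fld"] is not None)

if __name__ == "__main__":
    print(idx, idx2, idx3)
```

The last case mirrors the "not-null filtered index on a sparse column" pattern. In the sketch, the index size tracks the number of non-NULL values rather than the table size.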

Filtered Indices

Technology | Index maintenance cost | Filtering capability | Suitable for a large number of indices?
Regular index | High for I/O; low for CPU | None | No (due to high maintenance costs)
Indexed view | Low for I/O; high for CPU | Allows complex filtering | No (due to high maintenance costs)
Filtered index | Low for I/O; low for CPU | Simple filters | Yes (due to lower maintenance costs)

• Supports online operations, ALTER INDEX, partitioned tables, index hints, and the Database Engine Tuning Advisor (DTA)
• Has a side effect on query parameterization, because the query predicate must be matched against the index filter
• Combining sparse columns with NOT NULL filtered indices is very helpful for querying and storing sparse columns, with no impact on query parameterization. For example:
  – CREATE INDEX idx_c1 ON t1(c1) WHERE c1 IS NOT NULL

Data Compression

• Different compression types:
  – Vardecimal compression (SQL Server 2005 SP2)
  – Row compression, which includes vardecimal compression
  – Page compression, which includes row compression
• The main focus was data warehouse scenarios, but compression is also very useful for certain OLTP scenarios
• Main goal: enabling compression does not require application changes
• Compression is only supported in Enterprise Edition

Microsoft Confidential 8

Row Compression

• Lightweight compression
  – Useful for certain OLTP scenarios
• All columns are stored as variable-length data in a new record format
• Reduces per-column overhead (4 bits instead of 2 bytes)
• Stores the minimal number of bytes per value:
  – Leading zero bytes are removed for int, smallint, tinyint, bigint, datetime, smalldatetime, money, smallmoney, real, float
  – Trailing spaces are removed for char, nchar, binary
  – Decimal/numeric values use vardecimal compression (the same as vardecimal compression)
• NULL and 0 values take no space (besides the overhead)
• No compression for varchar, nvarchar, varbinary, text, ntext, image
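The "store the minimal number of bytes" rule can be sketched in Python as a simplified model. The sketch counts only value bytes and ignores the 4-bit per-column overhead. It does not attempt SQL Server's actual encoding of negative numbers or of datetime and float values. min_int_bytes and trim_fixed_char are hypothetical helpers.

```python
def min_int_bytes(value: int) -> int:
    """Bytes left after dropping leading zero bytes (non-negative values only)."""
    if value < 0:
        raise ValueError("this sketch only models non-negative integers")
    return (value.bit_length() + 7) // 8  # 0 -> 0 bytes, matching "0 takes no space"


def trim_fixed_char(value: str) -> str:
    """A fixed-length char/nchar value with its trailing spaces removed."""
    return value.rstrip(" ")


if __name__ == "__main__":
    for v in (0, 5, 300, 70000, 2**31 - 1):
        print(v, "needs", min_int_bytes(v), "of 4 bytes for int")
    padded = "abc".ljust(10)  # as a value would be padded in a char(10) column
    print(len(padded), "->", len(trim_fixed_char(padded)))
```

Small values in wide integer columns are where the savings come from. An int holding 5 needs 1 byte instead of 4.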


Page Compression

• Compresses all data on a single data page
• Compresses "similar" column values by storing each value only once on the page instead of multiple times
• Two page compression algorithms:
  – Column prefix
  – Dictionary
• The user cannot choose the algorithm; both algorithms are always applied
• Page compression includes row compression
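The two page-level steps can be illustrated with a toy Python model on one column of string values. First, a column-prefix pass stores each value as (bytes shared with an anchor, remaining suffix). Then a dictionary pass stores repeated entries once. The anchor-selection rule, the helper names, and the single-column scope are all assumptions for illustration. SQL Server's real anchor choice and dictionary layout operate on bytes across all columns of the page, and the slides do not describe them in detail.

```python
import os
from collections import Counter
from typing import List, Tuple

Entry = Tuple[int, str]  # (length of prefix shared with the anchor, remaining suffix)


def shared_prefix_len(a: str, b: str) -> int:
    return len(os.path.commonprefix([a, b]))


def column_prefix_encode(values: List[str]) -> Tuple[str, List[Entry]]:
    """Pick the anchor that shares the most prefix characters with the column, then
    store each value as (shared length, suffix). Assumes a non-empty column."""
    anchor = max(values, key=lambda cand: sum(shared_prefix_len(cand, v) for v in values))
    encoded = []
    for v in values:
        n = shared_prefix_len(anchor, v)
        encoded.append((n, v[n:]))
    return anchor, encoded


def dictionary_encode(entries: List[Entry]):
    """Move entries that occur more than once into a dictionary; keep the rest inline."""
    counts = Counter(entries)
    dictionary = [e for e in dict.fromkeys(entries) if counts[e] > 1]
    slot = {e: i for i, e in enumerate(dictionary)}
    refs = [("dict", slot[e]) if e in slot else ("inline", e) for e in entries]
    return dictionary, refs


values = ["Project-Alpha", "Project-Beta", "Project-Alpha", "Proj"]
anchor, prefixed = column_prefix_encode(values)
dictionary, refs = dictionary_encode(prefixed)

if __name__ == "__main__":
    print(anchor, prefixed)
    print(dictionary, refs)
```

The two passes compose in the order the slide describes. The prefix pass shrinks similar values, and the dictionary pass then collapses the repeats that remain.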


Topics

• Katmai-Related New Features
• A Project Management System
• Experimental Results
• References

Data Sources

• Three tables store the main work item information
• Each table has 17 regular columns (with meaningful names and never NULL); all other columns have predefined generic names (such as Fld10004) and arbitrary data types
• Views are defined on these columns to assign meaningful names to them

Table | No. Columns | No. Rows | No. Pages | Min Rec Size (bytes) | Max Rec Size (bytes) | Avg. Rec Size (bytes)
WorkItemsWere | 360 | 1609413 | 521450 | 813 | 3055 | 1910.389
WorkItemsLatest | 360 | 171331 | 56526 | 823 | 3111 | 2037.278

WorkItemsAre (the third table) has identity and timestamp columns.

Column Distribution Information

• Column data type distribution:

Data Type | Count
int | 31
datetime | 40
float | 63
binary | 1
nvarchar(256) | 226

• NULL value distribution (number of columns whose NULL ratio meets each threshold):

Table | =100% NULL | >99% NULL | >95% NULL | >80% NULL | >60% NULL
WorkItemsWere | 64 | 266 | 293 | 313 | 322
WorkItemsLatest | 64 | 265 | 291 | 317 | 327

Topics

• Katmai-Related New Features
• SQL Project Management Database Overview
• Experimental Results
• References

Summary

• Using WorkItemsWere as the source table, we tried the following cases:
  – Compress the table using page compression:
      ALTER TABLE WorkItemsWere REBUILD PARTITION = ALL WITH (DATA_COMPRESSION = PAGE)
  – Compress the table using row compression:
      ALTER TABLE WorkItemsWere REBUILD PARTITION = ALL WITH (DATA_COMPRESSION = ROW)
  – Find all columns that are more than 67% NULL and change them to sparse:
      1. Copy the data into a temp table:
         SELECT * INTO WorkItemsWere_Temp FROM WorkItemsWere
      2. Truncate the data:
         TRUNCATE TABLE WorkItemsWere
      3. Run the sparse conversion script, for example:
         ALTER TABLE WorkItemsWere ALTER COLUMN [Fld10004] ADD SPARSE
      4. Insert the data back:
         INSERT INTO WorkItemsWere SELECT * FROM WorkItemsWere_Temp
      5. Rebuild the index:
         ALTER INDEX [PK_WorkItemsWere] ON [dbo].[WorkItemsWere] REBUILD
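The five sparse-conversion steps above can be generated from per-column NULL fractions. The Python below is a minimal sketch that only emits T-SQL text and runs nothing. sparse_conversion_script is a hypothetical helper, and the fractions in the example are made up. In practice, a NULL fraction could be computed from COUNT(column) versus COUNT(*), since COUNT(column) skips NULLs. The [dbo] schema on the index rebuild follows the slide.

```python
from typing import Dict, List, Optional


def sparse_conversion_script(table: str, null_fractions: Dict[str, float],
                             threshold: float = 0.67,
                             pk_index: Optional[str] = None) -> List[str]:
    """Emit the five-step sequence for columns whose NULL fraction exceeds `threshold`."""
    temp = f"{table}_Temp"
    sparse_cols = [c for c, frac in null_fractions.items() if frac > threshold]
    script = [
        f"SELECT * INTO [{temp}] FROM [{table}];",  # 1. copy rows aside
        f"TRUNCATE TABLE [{table}];",               # 2. empty the table
    ]
    # 3. one ALTER ... ADD SPARSE per qualifying column
    script += [f"ALTER TABLE [{table}] ALTER COLUMN [{c}] ADD SPARSE;" for c in sparse_cols]
    script.append(f"INSERT INTO [{table}] SELECT * FROM [{temp}];")  # 4. reload the data
    if pk_index:
        script.append(f"ALTER INDEX [{pk_index}] ON [dbo].[{table}] REBUILD;")  # 5. rebuild
    return script


if __name__ == "__main__":
    fractions = {"Fld10004": 0.99, "Fld10005": 0.40, "Fld10006": 0.70}  # hypothetical
    for stmt in sparse_conversion_script("WorkItemsWere", fractions, pk_index="PK_WorkItemsWere"):
        print(stmt)
```

ALTER COLUMN ... ADD SPARSE does not change column order. Therefore the INSERT ... SELECT * in step 4 still lines up with the temp table's columns.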

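Result

Case | No. Pages | Min Rec Size (bytes) | Max Rec Size (bytes) | Avg. Rec Size (bytes) | Estimated Size (pages) | Time (mins)
Before | 521450 | 813 | 3055 | 1910.4 | n/a | n/a
Sparse | 164503 (31.5%) | 217 | 1863 | 697.3 (36%) | n/a | 20
Page Compression | 85116 (16%) | 196 | 1336 | 286.57 (15%) | 143388 (27%) | 3
Row Compression | 191047 (36%) | 356 | 1739 | 817 | 238909 (46%) | 3
Heap Rebuild | 422092 (81%) | 1140 | 2874 | 1881 | n/a | 5

Percentages are relative to the Before row.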

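• Space saving: Page Compression > Sparse > Row Compression > normal
• The stored procedure sp_estimate_data_compression_savings can estimate the space saving without performing the actual compression, but its estimates are not very accurate. Here it predicted 27% (page) and 46% (row) of the original pages, versus the actual 16% and 36%.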
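The ranking and the estimate error above can be recomputed from the result table. This short Python sketch uses only the page counts reported there. The dictionary keys and variable names are illustrative.

```python
BEFORE_PAGES = 521450  # uncompressed WorkItemsWere

actual_pages = {
    "Sparse": 164503,
    "Page Compression": 85116,
    "Row Compression": 191047,
    "Heap Rebuild": 422092,
}
# Sizes predicted by sp_estimate_data_compression_savings, per the result table.
estimated_pages = {"Page Compression": 143388, "Row Compression": 238909}


def pct_of_before(pages: int) -> float:
    """Size as a percentage of the uncompressed table."""
    return 100.0 * pages / BEFORE_PAGES


# Smallest table first = largest space saving.
ranking = sorted(actual_pages, key=actual_pages.get)

# How much larger the estimate is than the size actually achieved.
overshoot = {k: estimated_pages[k] / actual_pages[k] for k in estimated_pages}

if __name__ == "__main__":
    for case in ranking:
        print(f"{case}: {pct_of_before(actual_pages[case]):.1f}% of original pages")
    for case, ratio in overshoot.items():
        print(f"{case}: estimate is {ratio:.2f}x the actual size")
```

The estimate overshoots the actual size by about 1.68x for page compression and about 1.25x for row compression. That gap is what "not very accurate" amounts to for this table.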

CPU Overhead

• The database server has 16 GB of memory, while WorkItemsWere (the largest table) is about 4 GB. All of its data can therefore be cached, and physical I/Os are likely to be close to 0.
• Sparse columns and compression can reduce logical I/O because fewer pages need to be read.
• Sparse columns and compression can increase CPU time because the data needs to be decompressed.
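The "about 4 GB" figure follows from the page count, because SQL Server stores data in 8 KB pages. This Python sketch converts the page counts from the data-source table to GiB. It treats the server's 16 GB as 16 GiB, which is an approximation.

```python
PAGE_BYTES = 8 * 1024  # SQL Server data pages are 8 KB


def pages_to_gib(pages: int) -> float:
    """Convert a page count to GiB."""
    return pages * PAGE_BYTES / 2**30


if __name__ == "__main__":
    were = pages_to_gib(521450)   # WorkItemsWere, uncompressed
    latest = pages_to_gib(56526)  # WorkItemsLatest
    print(f"WorkItemsWere ~{were:.2f} GiB, WorkItemsLatest ~{latest:.2f} GiB")
    print(f"WorkItemsWere needs ~{were / 16:.0%} of 16 GiB of memory")
```

At roughly a quarter of server memory, the whole uncompressed table fits in the buffer pool. This is why the warm runs isolate CPU cost.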

Table Scan CPU Time Result

Case | Cold Run (ms) | Warm Run (ms) | Logical I/O | Physical I/O
Normal | 11950 | 7831 | 523427 | 1647
Rebuild | 8829 | 7644 (97%) | 423193 | 161
Sparse | 8502 | 8081 (108%) | 164936 | 71
Page Compression | 14321 | 13993 (178%) | 85350 | 30
Row Compression | 11310 | 10701 (136%) | 191567 | 75

Query: selects the 17 regular columns, 27 predefined columns (each with at least 20% non-NULL values), and 10 random predefined columns.

The result shows that the sparse case has less CPU overhead than the compression cases. The next slide shows the reason.
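As a sanity check, the table above can be related to the page counts from the space-saving results. The Python sketch below assumes that "Normal" corresponds to "Before" and that "Rebuild" corresponds to "Heap Rebuild". It checks that a full scan reads each page about once, and it computes warm-run CPU per 1,000 logical reads.

```python
# Page counts from the space-saving results; scan metrics from the table above.
# Assumption: "Normal" = "Before" and "Rebuild" = "Heap Rebuild".
pages = {"Normal": 521450, "Rebuild": 422092, "Sparse": 164503,
         "Page Compression": 85116, "Row Compression": 191047}
logical_io = {"Normal": 523427, "Rebuild": 423193, "Sparse": 164936,
              "Page Compression": 85350, "Row Compression": 191567}
warm_ms = {"Normal": 7831, "Rebuild": 7644, "Sparse": 8081,
           "Page Compression": 13993, "Row Compression": 10701}

# A full scan should touch each page about once, so this ratio should be close to 1.
reads_per_page = {k: logical_io[k] / pages[k] for k in pages}

# Warm-run CPU milliseconds per 1,000 logical reads.
ms_per_1k_reads = {k: warm_ms[k] / (logical_io[k] / 1000) for k in pages}

if __name__ == "__main__":
    for k in pages:
        print(f"{k}: {reads_per_page[k]:.4f} reads/page, "
              f"{ms_per_1k_reads[k]:.1f} ms per 1,000 reads")
```

Page compression reads the fewest pages but spends the most CPU per page: roughly 164 ms per 1,000 reads, versus about 15 ms for the uncompressed table. Part of that increase simply reflects more rows being packed into each page.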

Table Scan CPU Time Result (2)

Regular = select the 17 regular columns; Sparse = select 10 random sparse columns.

Case | Regular: Cold Run (ms) | Regular: Warm Run (ms) | Sparse: Cold Run (ms) | Sparse: Warm Run (ms)
Normal | 10234 | 5959 | 3714 | 1872
Sparse | 6848 | 5336 (89%) | 2450 | 1888 (101%)
Page Compression | 11388 | 11169 (187%) | 1809 | 1607 (85%)
Row Compression | 9126 | 8362 (149%) | 2247 | 1638 (87.5%)

• Sparse columns have no negative impact on queries over regular columns.
• Extracting NULL values from sparse columns has higher CPU overhead than in the page compression and row compression cases.
• Page and row compression apply at the table level, so the CPU overhead of decompressing non-NULL values (including those in regular columns) is higher.
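One way to picture why reading NULLs from sparse columns costs CPU is a toy model of a row's sparse vector. Only non-NULL sparse columns are stored, as (column id, value) pairs, so every read is a search, and a NULL is known only after the search fails. SparseVector is a hypothetical class. SQL Server's actual sparse-vector format and lookup code are not described in these slides.

```python
from bisect import bisect_left
from typing import Any, Dict, Optional


class SparseVector:
    """Toy model of one row's sparse storage: non-NULL sparse columns only,
    kept as (column id, value) pairs sorted by column id."""

    def __init__(self, values: Dict[int, Any]):
        # NULLs are simply absent, which is why they cost 0 bytes.
        self.ids = sorted(c for c, v in values.items() if v is not None)
        self.vals = [values[c] for c in self.ids]

    def get(self, column_id: int) -> Optional[Any]:
        # Every read is a binary search; a NULL is only known after the search
        # fails, unlike a regular column found directly through the record layout.
        i = bisect_left(self.ids, column_id)
        if i < len(self.ids) and self.ids[i] == column_id:
            return self.vals[i]
        return None


if __name__ == "__main__":
    row = SparseVector({10004: 5, 10010: None, 10020: "x"})
    print(row.ids, row.get(10004), row.get(10010), row.get(10020))
```

In this model, regular columns live outside the sparse vector and are untouched. That matches the observation that sparse columns do not slow down queries over the 17 regular columns.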
