Microsoft SQL Server Filtered Indexes & Sparse Columns Feb 2011

Microsoft SQL Server
Filtered Indexes and Sparse Columns: Together, Separately

Speaker: Don Vilen, Chief Scientist, BuySight

February 2011

Mark Ginnebaugh, User Group Leader
www.bayareasql.org

15 Feb 2011

Filtered Indexes and Sparse Columns: Together, Separately

Don Vilen, Chief Scientist, Buysight
[email protected]

Agenda
◦ Filtered Indexes
◦ Filtered Statistics
◦ Wide Tables
◦ Sparse Columns
◦ Together …
◦ … and Separately
◦ Everything is SQL Server 2008 (and later), in all editions

The Scenario
◦ 100,000 rows in the table
    - 99,500 rows are historical, remaining 500 rows are current
    - Indicated by NULL EndDate column or IsActive bit, etc.
◦ All queries on current data use index
◦ But why index all the historical 99.5% of the table?
◦ 1,000 columns in a table
◦ BikeColor column is relevant only if ItemType is ‘Bicycle’
    - For 0.5% of the rows; remainder are NULL
◦ But why index all the rows regardless of ItemType value?

Filtered Indexes
◦ Indexes only rows with values that match WHERE clause
    - CREATE INDEX xyz ON table (columns, …) followed by a filter such as:
        - WHERE EndDate IS NULL
        - WHERE IsActive = 1
        - WHERE ItemType = 'Bicycle'
◦ Uses:
    - Ranges of values for smaller portion of large table
        - Avoid the common 80-90% of data where the index wouldn’t be helpful
    - For categories of row data
        - Index on Column120 and Column121 only useful when C1 = 37
    - Table partitions, where index is needed only on the ‘current’ partition(s)
        - Each partition will have the index structure, but only ‘current’ partitions will have any rows in the index
◦ Benefits
    - Better query performance
    - Reduction in storage costs
    - Reduction in maintenance cost/time
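A minimal T-SQL sketch of the scenario above, assuming a hypothetical table dbo.DemoSubscriptions in which current rows are marked by a NULL EndDate. The table, column, and index names are illustrative only and do not come from the slides.

```sql
-- Hypothetical table: 'current' rows have EndDate IS NULL.
CREATE TABLE dbo.DemoSubscriptions
(
    SubscriptionID int IDENTITY(1,1) PRIMARY KEY,
    CustomerID     int      NOT NULL,
    StartDate      datetime NOT NULL,
    EndDate        datetime NULL      -- NULL means the row is still current
);

-- Index only the current rows; historical rows never enter the index.
CREATE NONCLUSTERED INDEX IX_DemoSubscriptions_Current
    ON dbo.DemoSubscriptions (CustomerID)
    INCLUDE (StartDate)
    WHERE EndDate IS NULL;

-- The query repeats the filter predicate, so the filtered index is a candidate.
SELECT CustomerID, StartDate
FROM dbo.DemoSubscriptions
WHERE EndDate IS NULL
  AND CustomerID = 42;
```

Whether the optimizer actually picks the index depends on data volume and statistics; on a near-empty table it may simply scan the clustered index.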

Filtered Index – Allowed Syntax
◦ WHERE <filter_predicate> [from BOL: CREATE INDEX]
    <filter_predicate> ::= <conjunct> [ AND <conjunct> ]
    <conjunct> ::= <disjunct> | <comparison>
    <disjunct> ::= column_name IN (constant, …)
    <comparison> ::= column_name <comparison_op> constant
    <comparison_op> ::= { IS | IS NOT | = | <> | != | > | >= | !> | < | <= | !< }
◦ No BETWEEN, no LIKE, no subquery, no variables
◦ So must be simple and deterministic
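As a sketch of how the grammar above is typically worked with, the following expresses a date range as two comparisons joined by AND (since BETWEEN is not allowed) and a category list with IN. The table dbo.DemoOrders and its columns are assumptions made for illustration.

```sql
-- Hypothetical table used only for this illustration.
CREATE TABLE dbo.DemoOrders
(
    OrderID   int         NOT NULL PRIMARY KEY,
    OrderDate datetime    NOT NULL,
    Status    varchar(10) NOT NULL,
    Amount    money       NOT NULL
);

-- A range written as <comparison> AND <comparison> instead of BETWEEN.
CREATE NONCLUSTERED INDEX IX_DemoOrders_2011
    ON dbo.DemoOrders (OrderDate)
    WHERE OrderDate >= '20110101' AND OrderDate < '20120101';

-- A <disjunct>: one column compared against a list of constants.
CREATE NONCLUSTERED INDEX IX_DemoOrders_Open
    ON dbo.DemoOrders (Amount)
    WHERE Status IN ('New', 'Pending');
```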

Filtered Indexes – Requirements
◦ Always some comparison involved, so must agree on how operations work, so requires standard SET options
    - ON for ANSI_NULLS, ANSI_PADDING, ANSI_WARNINGS, ARITHABORT, CONCAT_NULL_YIELDS_NULL, QUOTED_IDENTIFIER
    - OFF for NUMERIC_ROUNDABORT
◦ Else:
    - If not set when index is created, won’t create the index
    - If not set when INSERT, UPDATE, DELETE, MERGE affects the data, gives error and rolls back
    - If not set when the index might be used to optimize the query, it will not be considered
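The required settings can be stated explicitly at the top of a script. This is a small sketch: it sets each option to the value listed above and then reads the session's current values back with SESSIONPROPERTY (1 = ON, 0 = OFF).

```sql
-- Options that must be ON for creating, modifying through, or using a filtered index.
SET ANSI_NULLS ON;
SET ANSI_PADDING ON;
SET ANSI_WARNINGS ON;
SET ARITHABORT ON;
SET CONCAT_NULL_YIELDS_NULL ON;
SET QUOTED_IDENTIFIER ON;
-- The one option that must be OFF.
SET NUMERIC_ROUNDABORT OFF;

-- Read the settings back for the current session.
SELECT SESSIONPROPERTY('ANSI_NULLS')              AS ansi_nulls,
       SESSIONPROPERTY('ANSI_PADDING')            AS ansi_padding,
       SESSIONPROPERTY('ANSI_WARNINGS')           AS ansi_warnings,
       SESSIONPROPERTY('ARITHABORT')              AS arithabort,
       SESSIONPROPERTY('CONCAT_NULL_YIELDS_NULL') AS concat_null_yields_null,
       SESSIONPROPERTY('QUOTED_IDENTIFIER')       AS quoted_identifier,
       SESSIONPROPERTY('NUMERIC_ROUNDABORT')      AS numeric_roundabort;
```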

Filtered Indexes – Applicability
◦ Non-clustered indexes only (rather obviously)
◦ For UNIQUE indexes, only the indexed rows must have unique index values
    - Duplicates in the non-indexed rows are not checked, but be careful that an update to a qualifying column doesn’t cause a duplicate to occur
    - CREATE UNIQUE INDEX ix1 ON xyz (c3) WHERE c2 = 10
    - So now there is a way to create a unique index on column with multiple NULL values; create index WHERE ColY IS NOT NULL
◦ Filtered indexes do not apply to:
    - XML indexes
    - Full-text indexes
    - Spatial indexes
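The unique-index-with-multiple-NULLs point can be sketched as follows. The table dbo.DemoEmployees and the BadgeNumber column are hypothetical names chosen for the example.

```sql
-- Hypothetical table: BadgeNumber is optional but must be unique when present.
CREATE TABLE dbo.DemoEmployees
(
    EmployeeID  int         NOT NULL PRIMARY KEY,
    BadgeNumber varchar(20) NULL
);

-- Uniqueness is enforced only over rows that pass the filter.
CREATE UNIQUE NONCLUSTERED INDEX UX_DemoEmployees_BadgeNumber
    ON dbo.DemoEmployees (BadgeNumber)
    WHERE BadgeNumber IS NOT NULL;

-- Several NULLs are accepted because those rows are not in the index.
INSERT INTO dbo.DemoEmployees (EmployeeID, BadgeNumber) VALUES (1, NULL);
INSERT INTO dbo.DemoEmployees (EmployeeID, BadgeNumber) VALUES (2, NULL);
INSERT INTO dbo.DemoEmployees (EmployeeID, BadgeNumber) VALUES (3, 'B-100');

-- A second 'B-100' would be rejected with a duplicate key error:
-- INSERT INTO dbo.DemoEmployees (EmployeeID, BadgeNumber) VALUES (4, 'B-100');
```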

Filtered Indexes – Getting Them Used 1
◦ The query optimizer (QO) can only use the index when it knows the index will match the conditions in the query’s WHERE clause
◦ Assume Column120 and Column121 useful only when C1 = 37
    - So CREATE INDEX i1 ON dbo.t1 (Column120, Column121) WHERE C1 = 37
    - SELECT Column121 FROM dbo.t1 WHERE Column120 = 13
        - Cannot use the index even if Column120 and Column121 only appear for C1 = 37
        - As far as the QO knows, there may be other Column120 or Column121 values that are not in the index
◦ Help the QO by adding more limiting predicates to WHERE clause
    - Make it WHERE Column120 = 13 AND C1 = 37
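The two queries from this slide can be put side by side in a runnable form. The definition of dbo.t1 below is an assumption (the slides name only the columns C1, Column120, and Column121); the index and queries follow the slide.

```sql
-- Assumed shape for dbo.t1; only the three named columns come from the slide.
CREATE TABLE dbo.t1
(
    ID        int IDENTITY(1,1) PRIMARY KEY,
    C1        int NOT NULL,
    Column120 int NULL,
    Column121 int NULL
);

CREATE NONCLUSTERED INDEX i1
    ON dbo.t1 (Column120, Column121)
    WHERE C1 = 37;

-- Nothing here tells the optimizer that only C1 = 37 rows are wanted,
-- so i1 cannot be used.
SELECT Column121
FROM dbo.t1
WHERE Column120 = 13;

-- Repeating the filter predicate makes i1 a candidate.
SELECT Column121
FROM dbo.t1
WHERE Column120 = 13
  AND C1 = 37;
```

The two queries return the same rows only if the data really does keep Column120 NULL outside C1 = 37; the extra predicate changes the result otherwise.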


Filtered Indexes – Getting Them Used 2
◦ WHERE with a variable rather than a literal
◦ Assume index is on WHERE IsActive > 0
    - DECLARE @IsActive int; SET @IsActive = 1;
    - SELECT xyz FROM table WHERE IsActive = @IsActive
◦ QO doesn’t know value of variable, so doesn’t know if index fits
    - So shouldn’t use variables as if they were constants
◦ Again, help the QO by adding more limiting predicates to WHERE clause
    - Make it WHERE IsActive = @IsActive AND IsActive > 0
    - But perhaps that doesn’t really make sense here
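A runnable sketch of the variable case, assuming a hypothetical table dbo.DemoAccounts with an int IsActive column. The names are illustrative; the predicates mirror the slide.

```sql
-- Hypothetical table for the variable-versus-literal case.
CREATE TABLE dbo.DemoAccounts
(
    AccountID int         NOT NULL PRIMARY KEY,
    IsActive  int         NOT NULL,
    Name      varchar(50) NOT NULL
);

CREATE NONCLUSTERED INDEX IX_DemoAccounts_Active
    ON dbo.DemoAccounts (IsActive)
    INCLUDE (Name)
    WHERE IsActive > 0;

DECLARE @IsActive int;
SET @IsActive = 1;

-- The variable's value is unknown when the plan is built,
-- so the optimizer cannot prove the rows fall inside the filter.
SELECT Name
FROM dbo.DemoAccounts
WHERE IsActive = @IsActive;

-- The redundant literal predicate lets the optimizer match the filter.
SELECT Name
FROM dbo.DemoAccounts
WHERE IsActive = @IsActive
  AND IsActive > 0;
```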


Filtered Indexes – Getting Them Used 3
◦ WHERE with a function or conversion on the filter predicate
    - Obvious: WHERE ABS(C1) = 37
        - Cannot use index on WHERE C1 = 37
        - Could change it to WHERE C1 = ABS(37) if same meaning .. but not in this case
    - Implicit conversions: Assume index is WHERE c3 > 100
        - DECLARE @varR real; SET @varR = 1000.5;
        - SELECT * FROM tv2 WHERE c3 = @varR
            - Requires conversion of c3 to real before comparison, so can’t use index
        - SELECT * FROM tv2 WHERE c3 = cast(@varR as int)
            - At least it requires no conversion of c3, but is unknown value at optimization time, so can’t use index
        - So add a limiting predicate … assuming you know it will always be right
            - SELECT * FROM tv2 WHERE c3 = cast(@varR as int) AND c3 > 100

A Mis-Application of Filtered Indexes
◦ Create a filtered index on c and b with WHERE on c
◦ Attempt to use the index as a validation table
◦ In code, use the index in a hint and expect to get no row back for a b where c is a match, but it gets an error instead because the hint prevents a plan from being created
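One way this failure can be reproduced is sketched below, under the assumption of a hypothetical table dbo.DemoCodes with columns b and c. Forcing a filtered index that does not cover every row the query could select is documented to raise error 8622 ("Query processor could not produce a query plan because of the hints defined in this query").

```sql
-- Hypothetical table; only the column names b and c come from the slide.
CREATE TABLE dbo.DemoCodes
(
    ID int IDENTITY(1,1) PRIMARY KEY,
    b  int NOT NULL,
    c  int NOT NULL
);

CREATE NONCLUSTERED INDEX IX_DemoCodes_c_b
    ON dbo.DemoCodes (c, b)
    WHERE c = 10;

-- The query does not restrict c, so the forced index cannot answer it.
-- Instead of returning an empty result, this statement is expected to fail
-- with error 8622 because no valid plan exists under the hint.
SELECT b
FROM dbo.DemoCodes WITH (INDEX (IX_DemoCodes_c_b))
WHERE b = 5;
```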

Filtered Indexes – And Views
◦ Cannot create a filtered index on a view, not even a non-clustered index on an indexed view
    - But a filtered index can be chosen by the QO for the query formed from a view .. or function

Filtered Indexes – Considerations 1
◦ Storage size differences
    - Fewer index rows take less space
    - Less IO, more information fits in memory
    - 4,000 pages vs. 1 page
◦ Limits auto-parameterization
    - QO will not auto-parameterize if predicate is used in a filtered index (“in most cases”, per BOL)
    - Otherwise would inhibit use of filtered index
    - So can affect plan reuse
◦ Index maintenance – same rebuild and reorganize as regular index
    - But hopefully much less work to do
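The storage difference can be checked per index with the partition-stats DMV. This sketch assumes a table named dbo.t1 exists in the current database; substitute any table that has both filtered and non-filtered indexes. If the table does not exist, the query simply returns no rows.

```sql
-- Compare page and row counts of every index on one table.
SELECT i.name                  AS index_name,
       i.has_filter,
       SUM(ps.used_page_count) AS used_pages,
       SUM(ps.row_count)       AS index_rows
FROM sys.indexes AS i
JOIN sys.dm_db_partition_stats AS ps
    ON ps.object_id = i.object_id
   AND ps.index_id  = i.index_id
WHERE i.object_id = OBJECT_ID(N'dbo.t1')   -- assumed table name
GROUP BY i.name, i.has_filter
ORDER BY used_pages DESC;
```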

Filtered Indexes – Considerations 2
◦ Covering index
    - Consider INCLUDEing other columns so more likely to be selected by QO
◦ DTA can suggest a filtered index
    - ColX IS NOT NULL – only of this form
    - But the missing-indexes functionality does not flag them as missing
◦ When not to use:
    - When non-filtered index already exists, or another access path is likely better or adequate
    - Avoid the extra index maintenance

Filtered Statistics
◦ CREATE STATISTICS stats1 ON table (cols) WHERE <condition>
◦ Uses:
    - Can create filtered statistics on skewed data to assist QO
    - Filtered statistics will likely be more precise because they cover only the data in the filtered subset (or filtered index)
    - Table partitions, where statistics are needed only on ‘current’ partition(s)
◦ Cannot reference a computed column, a UDT column, a spatial data type column, or a hierarchyID data type column
◦ AutoCreateStats will create statistics on Filtered Index key columns
◦ AutoCreateStats will not create filtered statistics on other columns
    - You have to create them yourself
◦ AutoUpdateStats will keep them updated once they are created
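A short sketch of creating filtered statistics by hand, using the BikeColor / ItemType scenario from earlier in the deck. The table dbo.DemoItems and the statistics name are hypothetical.

```sql
-- Hypothetical table based on the BikeColor scenario.
CREATE TABLE dbo.DemoItems
(
    ItemID    int         NOT NULL PRIMARY KEY,
    ItemType  varchar(20) NOT NULL,
    BikeColor varchar(20) NULL
);

-- Statistics on BikeColor built only from the rows where it is meaningful.
CREATE STATISTICS st_DemoItems_BikeColor
    ON dbo.DemoItems (BikeColor)
    WHERE ItemType = 'Bicycle';

-- Refresh manually after a large load, then inspect the header,
-- which reports the filter expression and the unfiltered row count.
UPDATE STATISTICS dbo.DemoItems st_DemoItems_BikeColor WITH FULLSCAN;
DBCC SHOW_STATISTICS ('dbo.DemoItems', st_DemoItems_BikeColor) WITH STAT_HEADER;
```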

Metadata for Indexes, Statistics
◦ sys.indexes
    - has_filter, filter_definition
◦ sys.stats
    - has_filter, filter_definition
◦ SSMS
    - Indexes and Statistics Properties have a Filter tab
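The two catalog views can be queried directly to list every filtered index and filtered statistics object in the current database. A minimal sketch:

```sql
-- Filtered indexes in the current database.
SELECT OBJECT_NAME(i.object_id) AS table_name,
       i.name                   AS index_name,
       i.filter_definition
FROM sys.indexes AS i
WHERE i.has_filter = 1;

-- Filtered statistics in the current database.
SELECT OBJECT_NAME(s.object_id) AS table_name,
       s.name                   AS stats_name,
       s.filter_definition
FROM sys.stats AS s
WHERE s.has_filter = 1;
```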

Questions on Filtered Indexes, Statistics
◦ Any questions?
◦ Now we’ll move on to Wide Tables, Sparse Columns

Wide Tables
◦ Up to 30,000 Columns
    - Great for Sharepoint-like “a row is an object, some attributes depend on other attributes”
◦ Some limits:
    - Columns per non-wide table: 1,024
    - Columns per wide table: 30,000
    - Columns per SELECT statement: 4,096
    - Columns per INSERT statement: 4,096
    - Indexes per table: 1,000
    - Statistics per table: 30,000
    - BOL: Maximum Capacity Specifications for SQL Server

Wide Table
◦ A wide table has defined a column set, using sparse columns
    - New row structure for sparse columns
        - {column, value}, {column, value} …
    - Can create flexible schemas within an application
    - Can add or drop columns whenever you want without having to touch each row
◦ The maximum size of a wide table row is 8,018 bytes, so most of the data in a row has to be NULL
    - Or has to be varchar-type columns so it can overflow to another page
◦ Limit is still 1,024 for number of non-sparse columns plus computed columns, even in a wide table

Wide Tables – Performance Impact
◦ Performance considerations:
    - Increased run-time and compile-time memory requirements
    - Wide tables can have up to 30,000 columns defined; this can increase compile time
    - There can be up to 1,000 indexes on a wide table, which increases the index maintenance time
    - Nonclustered indexes should be filtered indexes to minimize their impact
◦ For more information, see BOL: Performance Considerations for Wide Tables

Sparse Columns
◦ CREATE TABLE … (…, c1 int SPARSE NULL, …)
◦ New row format for sparse columns
◦ Column:
    - Must be NULLable
    - Cannot be part of a clustered index
    - Cannot be part of a primary key index
    - Cannot have a DEFAULT
    - Cannot be a computed column
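A minimal sketch of a table with sparse columns that respects the rules above: the sparse columns are nullable, have no DEFAULT, and stay out of the primary key. The table dbo.DemoCatalog and its columns are hypothetical.

```sql
-- Hypothetical table: the bike-specific attributes are NULL for most rows.
CREATE TABLE dbo.DemoCatalog
(
    ItemID      int         NOT NULL PRIMARY KEY,   -- key column is not sparse
    ItemType    varchar(20) NOT NULL,
    BikeColor   varchar(20) SPARSE NULL,
    FrameSizeCm int         SPARSE NULL
);

-- Sparse columns are read and written like ordinary columns.
INSERT INTO dbo.DemoCatalog (ItemID, ItemType, BikeColor, FrameSizeCm)
VALUES (1, 'Bicycle', 'Red', 54);

INSERT INTO dbo.DemoCatalog (ItemID, ItemType)
VALUES (2, 'Helmet');                               -- sparse columns stay NULL

SELECT ItemID, ItemType, BikeColor, FrameSizeCm
FROM dbo.DemoCatalog;
```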

Sparse Columns – Some More Cannots
◦ Some types cannot be sparse:
    - geography
    - geometry
    - image
    - ntext
    - text
    - timestamp
    - User-defined data types
◦ Some attributes cannot be on sparse columns
    - No Filestream
    - Not Identity
    - Not RowGuidCol

Sparse Columns – Types and Size
◦ Size impact
    - An important consideration but not the only one
◦ At what percentage of NULLs does a sparse column take less space than a non-sparse column?

    Type      Non-Sparse    Sparse           NULL Estimate
    BIT       1/8th byte    4 1/8th bytes    98%
    BIGINT    8 bytes       12 bytes         52%

◦ See BOL: Using Sparse Columns for a complete table of types

Column Sets
◦ How do you know which columns ‘exist’ for a row?
◦ You could just SELECT them; those that don’t exist are NULL
◦ Can define a “Column set”
    - Optional, only one per table
◦ Include a column: MyColSet XML COLUMN_SET FOR ALL_SPARSE_COLUMNS
◦ Selecting from MyColSet returns an XML description of the sparse columns in that row
    - <c25>ABC</c25><c34>599</c34>
◦ Can INSERT / UPDATE sparse columns by
    - Referring to them by name as usual, or
    - Specifying the XML for the Column_Set column
◦ See BOL: Using Column Sets for more details
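Both ways of writing to sparse columns can be sketched against a small table with a column set. The table dbo.DemoCatalogCS and its column names are hypothetical; MyColSet follows the slide.

```sql
-- Hypothetical table with two sparse columns and a column set over them.
CREATE TABLE dbo.DemoCatalogCS
(
    ItemID      int         NOT NULL PRIMARY KEY,
    ItemType    varchar(20) NOT NULL,
    BikeColor   varchar(20) SPARSE NULL,
    FrameSizeCm int         SPARSE NULL,
    MyColSet    xml COLUMN_SET FOR ALL_SPARSE_COLUMNS
);

-- Write the sparse columns by name ...
INSERT INTO dbo.DemoCatalogCS (ItemID, ItemType, BikeColor, FrameSizeCm)
VALUES (1, 'Bicycle', 'Red', 54);

-- ... or through the column set, as XML keyed by column name.
INSERT INTO dbo.DemoCatalogCS (ItemID, ItemType, MyColSet)
VALUES (2, 'Bicycle', '<BikeColor>Blue</BikeColor>');

-- SELECT * returns the column set in place of the individual sparse columns.
SELECT * FROM dbo.DemoCatalogCS;

-- Sparse columns are still available when named explicitly.
SELECT ItemID, BikeColor, FrameSizeCm
FROM dbo.DemoCatalogCS;
```

A single INSERT or UPDATE should target either the column set or the individual sparse columns, not both.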

Feature / Technology Support
◦ Sparse columns and column sets are not fully supported by some SQL Server technologies
◦ Sparse Columns not supported by:
    - Merge Replication
◦ Column Sets not supported by:
    - Replication, Distributed Query, Change Data Capture
◦ See BOL: Using Column Sets for more details

Meta Data for Sparse Columns
◦ sys.columns – is_sparse, is_column_set
    - And in:
        - sys.system_columns
        - sys.all_columns
        - sys.computed_columns
        - sys.identity_columns
◦ Do not confuse with sparse files as used for Database Snapshots
    - The is_sparse in sys.database_files, sys.master_files
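A small query over sys.columns lists the sparse columns and column sets in the current database. This is a sketch using only the catalog columns named above.

```sql
-- Sparse columns and column sets defined in user tables of the current database.
SELECT OBJECT_NAME(c.object_id) AS table_name,
       c.name                   AS column_name,
       c.is_sparse,
       c.is_column_set
FROM sys.columns AS c
WHERE c.is_sparse = 1
   OR c.is_column_set = 1
ORDER BY table_name, c.column_id;
```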

Together
◦ Sparse Columns together with Filtered Index
◦ On Sparse column, filtered index with xx IS NOT NULL avoids indexing all the rows with no value
◦ Makes a lot of sense, and likely the driving force behind filtered indexes
◦ But not needed on every sparse column
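The combination can be sketched as a sparse column carrying a filtered index that skips the NULL rows. The table dbo.DemoProducts and the index name are hypothetical.

```sql
-- Hypothetical table: BikeColor is populated for only a small share of rows.
CREATE TABLE dbo.DemoProducts
(
    ProductID int         NOT NULL PRIMARY KEY,
    ItemType  varchar(20) NOT NULL,
    BikeColor varchar(20) SPARSE NULL
);

-- Index only the rows that actually carry a value.
CREATE NONCLUSTERED INDEX IX_DemoProducts_BikeColor
    ON dbo.DemoProducts (BikeColor)
    WHERE BikeColor IS NOT NULL;

-- An equality match can never return NULL rows, and the IS NOT NULL predicate
-- states the filter explicitly, so the filtered index is a candidate.
SELECT ProductID
FROM dbo.DemoProducts
WHERE BikeColor = 'Red'
  AND BikeColor IS NOT NULL;
```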

Separately
◦ Filtered Index without Sparse Column
    - Filtered indexes on skewed data
    - Filtered statistics on skewed data
◦ Sparse Column without Filtered Index
    - Sparse columns on sparse data, perhaps no index to go with it

Summary
◦ Filtered Indexes
◦ Filtered Statistics
◦ Wide Tables
◦ Sparse Columns
◦ Together …
◦ … and Separately
◦ Don Vilen, Chief Scientist, Buysight
    - DVilen@buysight.com

To learn more or inquire about speaking opportunities, please contact:

Mark Ginnebaugh, User Group Leader [email protected]
