@ITCAMPRO #ITCAMP17 Community Conference for IT Professionals
Columnstore indexes – best practices for the ETL process
Damian Widera
Microsoft Data Platform MVP
EUVIC
@damianwidera
http://sqlblog.com/blogs/damian_widera/default.aspx
Visit Poland this autumn – 16th September
Damian Widera
Project Manager & Technical Lead | EUVIC (www.euvic.pl)
MVP | MCT | MCSE | MCITP
+48 665-229-227
@damian.widera
facebook.com/damian.widera.10
http://sqlblog.com/blogs/damian_widera/default.aspx
Channel9
MVA courses (Microsoft Virtual Academy)
EUVIC
PALO ALTO
NEW YORK
WARSAW
KATOWICE
GLIWICE
BIELSKO BIAŁA
WROCŁAW
CZĘSTOCHOWA
GDYNIA
KRAKÓW
BYDGOSZCZ
VIENNA
BIAŁYSTOK
What and how?
• Introduction to the columnstore index (CI)
• Three important views of the clustered columnstore index:
– How to load data efficiently
– How to use the index efficiently
– How to maintain it efficiently
• Internals
Anatomy of a columnstore index
• Traditional (rowstore) clustered index
Saledate    Product       Amt  GrossPrice  SalesTax  NetPrice  ...
2012-03-08  Candy bar      50       75.00     14.25     89.25  ...
2012-03-10  Smart phone     1      349.50     66.41    419.91  ...
2012-03-11  Apple (bag)     7       31.57      1.89     33.46  ...
2012-03-12  Smart phone     1      349.50     66.41    419.91  ...
2012-03-19  Chair           1      599.50    113.91    713.41  ...
2012-03-20  Chair           3    1,798.50    341.72  2,140.22  ...
2012-03-20  Laptop          2    2,860.00    543.40  3,403.40  ...
2012-03-20  Toy car         3       29.97      5.69     35.66  ...
2012-03-21  Apple (bag)    14       63.14      3.79     66.93  ...
2012-03-24  Pocket knife    1       12.95      2.46     15.41  ...
2012-03-27  Apple (bag)     2        9.02      0.54      9.56  ...
2012-03-28  Candy bar       5        7.50      1.43      8.93  ...
Anatomy of a columnstore index
• Traditional (rowstore) nonclustered index
Saledate Amt NetPrice
2012-03-08 50 89.25
2012-03-10 1 419.91
2012-03-11 7 33.46
2012-03-12 1 419.91
2012-03-19 1 713.41
2012-03-20 3 2,140.22
2012-03-20 2 3,403.40
2012-03-20 3 35.66
2012-03-21 14 66.93
2012-03-24 1 15.41
2012-03-27 2 9.56
2012-03-28 5 8.93
Saledate Amt NetPrice
2012-04-08 50 89.25
2012-04-10 1 419.91
2012-04-11 7 33.46
2012-04-12 1 419.91
2012-04-19 1 713.41
2012-04-20 3 2,140.22
2012-04-20 2 3,403.40
2012-04-20 3 35.66
2012-04-21 14 66.93
2012-04-24 1 15.41
2012-04-27 2 9.56
2012-04-28 5 8.93
How Row Mode Works
• Each operator calls its child once per row to “pull” the next row
• Works fine for smaller queries
• Each operator transition can cause L2 cache misses while instructions and data are reloaded
• When databases were new, I/O was MUCH slower than the CPU, so this never mattered
• Now the equation has changed
(Diagram: Project on top of Filter on top of Table Scan; each operator calls GetRow() on its child and receives one row per call)
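A minimal Python sketch of the “pull” (iterator) model described above. The operator names mirror the slide's Project, Filter and Table Scan, and `get_row()` stands in for GetRow(); the classes are hypothetical illustrations, not SQL Server's implementation. The point to watch is that every row costs at least one function call at every operator boundary.

```python
from typing import Callable, Iterable, List, Optional

Row = dict


class TableScan:
    """Leaf operator: returns one stored row per get_row() call."""

    def __init__(self, rows: Iterable[Row]) -> None:
        self._rows = iter(rows)
        self.calls = 0  # how many times the parent pulled from this operator

    def get_row(self) -> Optional[Row]:
        self.calls += 1
        return next(self._rows, None)  # None signals "no more rows"


class Filter:
    """Keeps pulling from its child until a row satisfies the predicate."""

    def __init__(self, child, predicate: Callable[[Row], bool]) -> None:
        self.child = child
        self.predicate = predicate

    def get_row(self) -> Optional[Row]:
        while (row := self.child.get_row()) is not None:
            if self.predicate(row):
                return row
        return None


class Project:
    """Returns only the requested columns of the row it pulls."""

    def __init__(self, child, columns: List[str]) -> None:
        self.child = child
        self.columns = columns

    def get_row(self) -> Optional[Row]:
        row = self.child.get_row()
        return None if row is None else {c: row[c] for c in self.columns}


def run(root) -> List[Row]:
    """The client pulls from the root operator until it reports end of data."""
    result = []
    while (row := root.get_row()) is not None:
        result.append(row)
    return result


# Three rows taken from the slide's sample sales table.
sales = [
    {"Saledate": "2012-03-08", "Product": "Candy bar", "Amt": 50},
    {"Saledate": "2012-03-10", "Product": "Smart phone", "Amt": 1},
    {"Saledate": "2012-03-21", "Product": "Apple (bag)", "Amt": 14},
]
scan = TableScan(sales)
plan = Project(Filter(scan, lambda r: r["Amt"] > 5), ["Product", "Amt"])
rows = run(plan)
print(rows)
print(scan.calls)  # one call per stored row plus a final call that hits end of data
```

Three stored rows cost four calls into the scan, and every qualifying row then climbs through its own chain of per-row calls. That per-row call overhead is exactly what the batch model amortizes.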
Anatomy of a columnstore index
• Columnstore index
Saledate
2012-03-08
2012-03-10
2012-03-11
2012-03-12
2012-03-19
2012-03-20
2012-03-20
2012-03-20
2012-03-21
2012-03-24
2012-03-27
2012-03-28
1 million row chunks
Storage in LOB pages
Anatomy of a columnstore index
• Nonclustered columnstore index
Saledate    Product       Amt  GrossPrice  SalesTax  NetPrice
2012-03-08  Candy bar      50       75.00     14.25     89.25
2012-03-10  Smart phone     1      349.50     66.41    419.91
2012-03-11  Apple (bag)     7       31.57      1.89     33.46
2012-03-12  Smart phone     1      349.50     66.41    419.91
2012-03-19  Chair           1      599.50    113.91    713.41
2012-03-20  Chair           3    1,798.50    341.72  2,140.22
2012-03-20  Laptop          2    2,860.00    543.40  3,403.40
2012-03-20  Toy car         3       29.97      5.69     35.66
2012-03-21  Apple (bag)    14       63.14      3.79     66.93
2012-03-24  Pocket knife    1       12.95      2.46     15.41
2012-03-27  Apple (bag)     2        9.02      0.54      9.56
2012-03-28  Candy bar       5        7.50      1.43      8.93
An aside… how CPUs work
(Diagram: two CPU cores, each with its own L1 instruction cache (32 KB), L1 data cache (32 KB) and Level 2 cache (100s of KB), sharing one Level 3 cache (megabytes))
• Modern CPUs have multiple cores
• Cache hierarchies: L1, L2, L3
– Small L1 and L2 per core; L3 shared by all cores on the die
– L1 is faster than L2, L2 is faster than L3
– CPUs can stall waiting for caches to load
• Access time increases with each level you need to touch!
Batch Model
• Move from a “pull” model to a “push” model
• Group rows into batches
– Re-use instructions while they are in cache
– Touch all “close” data in each operator
• This model reduces L2 cache misses
• It works best for queries that process many rows
(Diagram: Table Scan, Filter and Project, each operator handing a whole batch of rows to the next via ProcessBatch())
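The same plan as a batch-style Python sketch: the scan pushes fixed-size batches upward and each operator handles a whole batch per call (`process_batch()` stands in for the slide's ProcessBatch()). The batch size is a free parameter; 900 is used because SQL Server batches are commonly described as holding roughly that many rows. All class names are hypothetical.

```python
from typing import Callable, List, Sequence

Batch = List[dict]


class Sink:
    """Collects the final rows produced by the plan."""

    def __init__(self) -> None:
        self.rows: Batch = []

    def process_batch(self, batch: Batch) -> None:
        self.rows.extend(batch)


class Project:
    """Keeps only the requested columns for every row in the batch."""

    def __init__(self, columns: Sequence[str], parent) -> None:
        self.columns = list(columns)
        self.parent = parent

    def process_batch(self, batch: Batch) -> None:
        self.parent.process_batch([{c: row[c] for c in self.columns} for row in batch])


class Filter:
    """Applies one predicate to a whole batch in a single tight loop."""

    def __init__(self, predicate: Callable[[dict], bool], parent) -> None:
        self.predicate = predicate
        self.parent = parent
        self.calls = 0  # one call per batch, not per row

    def process_batch(self, batch: Batch) -> None:
        self.calls += 1
        kept = [row for row in batch if self.predicate(row)]
        if kept:  # don't push empty batches downstream
            self.parent.process_batch(kept)


class TableScan:
    """Leaf operator that pushes the stored rows upward in fixed-size batches."""

    def __init__(self, rows: Sequence[dict], batch_size: int, parent) -> None:
        self.rows = rows
        self.batch_size = batch_size
        self.parent = parent

    def run(self) -> None:
        for start in range(0, len(self.rows), self.batch_size):
            self.parent.process_batch(list(self.rows[start:start + self.batch_size]))


rows = [{"Id": i, "Amt": i % 10} for i in range(2000)]
sink = Sink()
flt = Filter(lambda r: r["Amt"] > 4, Project(["Id"], sink))
TableScan(rows, batch_size=900, parent=flt).run()
print(len(sink.rows), flt.calls)
```

With 2,000 rows and a batch size of 900 the filter is entered three times instead of 2,000, so its instructions stay hot in cache while it loops over a whole batch.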
Columnstore refresher => how is it different?
(Diagram: the same data stored as rows vs. stored as columns C1 … C5)
Benefits:
• Improved compression: data from the same domain compresses better
• Reduced I/O: fetch only the columns needed
• Improved performance: more data fits in memory
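A small Python sketch of the difference, using four rows from the sample sales table: `to_columns` pivots rows into one list per column, and run-length encoding (`rle`) stands in for columnstore compression. Real columnstore compression combines several encodings; RLE is used here only because it is the simplest one to show. Both function names are hypothetical.

```python
from itertools import groupby
from typing import Dict, List, Sequence, Tuple


def to_columns(rows: Sequence[dict]) -> Dict[str, list]:
    """Pivot a row-oriented table into one list per column."""
    columns: Dict[str, list] = {name: [] for name in rows[0]}
    for row in rows:
        for name, value in row.items():
            columns[name].append(value)
    return columns


def rle(values: Sequence) -> List[Tuple[object, int]]:
    """Run-length encode a column as (value, repeat count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


rows = [
    {"Saledate": "2012-03-20", "Product": "Chair", "Amt": 3},
    {"Saledate": "2012-03-20", "Product": "Laptop", "Amt": 2},
    {"Saledate": "2012-03-20", "Product": "Toy car", "Amt": 3},
    {"Saledate": "2012-03-21", "Product": "Apple (bag)", "Amt": 14},
]
columns = to_columns(rows)

# Reduced I/O: a query on Saledate touches one column instead of every value.
values_read_rowstore = sum(len(row) for row in rows)
values_read_columnstore = len(columns["Saledate"])

# Improved compression: repeated values from one domain sit next to each other.
saledate_encoded = rle(columns["Saledate"])
print(values_read_rowstore, values_read_columnstore, saledate_encoded)
```

The Saledate column shrinks from four values to two runs, while a rowstore page interleaves dates with products and prices and offers no such runs.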
ColumnStore Terminology
(Diagram: columns C1–C6 split horizontally into row groups; each column within a row group forms a column segment)
• Column segment – contains the values from one column for a set of rows
• Row group – the segments for the same set of rows make up a row group
• Segments are compressed
• Each segment is stored in a separate LOB
• A segment is the unit of transfer between disk and memory
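The terminology can be modelled in a few lines of Python: every column is cut at the same row boundaries, each slice is a segment, and the segments for one slice form a row group. Keeping each segment's min/max next to it is what later makes segment elimination possible. `max_rows` is a hypothetical parameter standing in for the row group size limit (1,048,576 rows in SQL Server); the sample values are the OrderDateKey/ProductKey data from the segment elimination example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Segment:
    """Values of one column for one row group, plus min/max kept as metadata."""
    values: list
    min_value: object
    max_value: object


@dataclass
class RowGroup:
    """One segment per column, all covering the same set of rows."""
    segments: Dict[str, Segment]

    @property
    def row_count(self) -> int:
        return len(next(iter(self.segments.values())).values)


def build_row_groups(columns: Dict[str, list], max_rows: int) -> List[RowGroup]:
    """Cut every column at the same boundaries into row groups of at most max_rows rows."""
    total = len(next(iter(columns.values())))
    groups = []
    for start in range(0, total, max_rows):
        segments = {}
        for name, values in columns.items():
            chunk = values[start:start + max_rows]
            segments[name] = Segment(chunk, min(chunk), max(chunk))
        groups.append(RowGroup(segments))
    return groups


columns = {
    "OrderDateKey": [20101107] * 5 + [20101108] * 4 + [20101109] * 3,
    "ProductKey": [106, 103, 109, 103, 106, 106, 102, 106, 109, 106, 106, 103],
}
groups = build_row_groups(columns, max_rows=5)
print([g.row_count for g in groups])
print([(g.segments["OrderDateKey"].min_value, g.segments["OrderDateKey"].max_value) for g in groups])
```

A deliberately small `max_rows` of 5 shows that the last row group can be only partly filled.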
First – quick recap of the CCI
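Columnstore Index – segment elimination
SELECT ProductKey, SUM(SalesAmount)
FROM dbo.FactInternetSales
WHERE OrderDateKey < 20101108
GROUP BY ProductKey;
Row group 1:
OrderDateKey  ProductKey  StoreKey  RegionKey  Quantity  SalesAmount
20101107      106         01        1          6         30.00
20101107      103         04        2          1         17.00
20101107      109         04        2          2         20.00
20101107      103         03        2          1         17.00
20101107      106         05        3          4         20.00
20101108      106         02        1          5         25.00
Row group 2:
OrderDateKey  ProductKey  StoreKey  RegionKey  Quantity  SalesAmount
20101108      102         02        1          1         14.00
20101108      106         03        2          5         25.00
20101108      109         01        1          1         10.00
20101109      106         04        2          4         20.00
20101109      106         04        2          5         25.00
20101109      103         01        1          1         17.00
Column elimination: only OrderDateKey, ProductKey and SalesAmount are needed, so the StoreKey, RegionKey and Quantity segments are never read.
Segment elimination: the OrderDateKey segment of row group 2 holds values from 20101108 to 20101109, so no row in it can satisfy OrderDateKey < 20101108 and the whole row group is skipped.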
How to load data into the CCI without getting into trouble
Initial situation: the table is a heap
– (1) Use INSERT ... SELECT and then create the CCI
– (2) Use BULK LOAD and then create the CCI
– (3) Use SELECT * INTO and then create the CCI
Initial situation: the table already has a CCI
– (1) Use INSERT ... SELECT
– (2) Use BULK LOAD
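How to load data into the CCI – BONUS!!!
• The “magic number” described by Niko Neugebauer: 102,400 (the minimum bulk-load batch size for rows to go directly into a compressed row group instead of the delta store)
• There is also another magic number: 1,048,576 (the maximum number of rows in a row group)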
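A rough Python model of how one bulk-load batch is split using the two magic numbers: batches below 102,400 rows go to the delta store, larger ones are cut into row groups of up to 1,048,576 rows, and a leftover of fewer than 102,400 rows also lands in the delta store. This assumes the behaviour described in Microsoft's columnstore data loading guidance and deliberately ignores memory pressure, dictionary size limits and parallel loads, all of which can produce smaller row groups. The function name is hypothetical.

```python
from typing import List, Tuple

MIN_COMPRESSED_ROWS = 102_400    # smallest batch that goes straight into a compressed row group
MAX_ROWS_PER_GROUP = 1_048_576   # largest possible row group


def route_bulk_batch(batch_rows: int) -> Tuple[List[int], int]:
    """Return (sizes of compressed row groups, rows left in the delta store) for one batch."""
    if batch_rows < MIN_COMPRESSED_ROWS:
        return [], batch_rows  # small batch: everything goes to the delta store
    full_groups, remainder = divmod(batch_rows, MAX_ROWS_PER_GROUP)
    groups = [MAX_ROWS_PER_GROUP] * full_groups
    if remainder >= MIN_COMPRESSED_ROWS:
        groups.append(remainder)  # the tail is big enough to be compressed as well
        remainder = 0
    return groups, remainder


for size in (102_000, 145_000, 1_048_577, 2_252_152):
    print(size, route_bulk_batch(size))
```

The practical ETL takeaway is to size bulk-load batches at 102,400 rows or more, ideally close to 1,048,576, so rows skip the delta store entirely.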
How to use the index
• Don't use it in OLTP scenarios – but WHY NOT?
• Update, or insert + delete?
• What about transaction support?
• Partitioning
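To make the “update, or insert + delete?” question concrete, here is a hypothetical Python model of a clustered columnstore: compressed rows are never changed in place, a delete only sets a bit in a delete bitmap, new rows land in a delta store, and an update therefore becomes a delete plus an insert. This mirrors how SQL Server is generally documented to handle updates on compressed row groups, but the class and its methods are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ToyColumnstore:
    """Hypothetical CCI model: compressed rows are read-only once written."""
    compressed: List[dict]                                  # rows in compressed row groups
    deleted: Set[int] = field(default_factory=set)          # delete bitmap: positions in `compressed`
    delta_store: List[dict] = field(default_factory=list)   # rowstore area for new rows

    def scan(self) -> List[dict]:
        """Visible rows = compressed rows not flagged as deleted + delta store rows."""
        live = [r for pos, r in enumerate(self.compressed) if pos not in self.deleted]
        return live + self.delta_store

    def insert(self, row: dict) -> None:
        self.delta_store.append(row)

    def delete(self, key: int) -> None:
        for pos, row in enumerate(self.compressed):
            if row["Id"] == key and pos not in self.deleted:
                self.deleted.add(pos)  # only flag the row; compressed data stays untouched
                return
        # Rows that still live in the delta store can simply be removed.
        self.delta_store = [r for r in self.delta_store if r["Id"] != key]

    def update(self, key: int, **changes) -> None:
        """UPDATE modelled as DELETE of the old version + INSERT of the new one."""
        old = next((r for r in self.scan() if r["Id"] == key), None)
        if old is None:
            raise KeyError(key)
        self.delete(key)
        self.insert({**old, **changes})


table = ToyColumnstore(compressed=[{"Id": 1, "Amt": 50}, {"Id": 2, "Amt": 1}])
table.update(2, Amt=7)
print(table.deleted, table.delta_store)
print(table.scan())
```

Every update of a compressed row leaves a dead row behind in the bitmap and a new row in the delta store, which is why update-heavy OLTP workloads tend to fragment a CCI.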
How to maintain the index
• Tuple mover revealed
• Reorganize or rebuild the index?
• Extended Events – a great monitoring “tool”
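A toy Python model of the tuple mover, assuming the commonly documented behaviour: trickle inserts fill an OPEN delta row group until it reaches 1,048,576 rows and becomes CLOSED, the background tuple mover compresses only CLOSED groups, and a REORGANIZE with `COMPRESS_ALL_ROW_GROUPS = ON` also forces OPEN ones into compressed form. The class is a hypothetical illustration; REBUILD, deleted rows and row group merging are not modelled.

```python
from dataclasses import dataclass, field
from typing import List

MAX_ROWS_PER_GROUP = 1_048_576


@dataclass
class DeltaRowGroup:
    rows: int = 0
    state: str = "OPEN"  # becomes "CLOSED" once it holds MAX_ROWS_PER_GROUP rows


@dataclass
class ToyCci:
    delta: List[DeltaRowGroup] = field(default_factory=list)
    compressed: List[int] = field(default_factory=list)  # sizes of COMPRESSED row groups

    def trickle_insert(self, n: int) -> None:
        """Add n small-batch rows: fill the open delta group, closing it when full."""
        while n > 0:
            if not self.delta or self.delta[-1].state == "CLOSED":
                self.delta.append(DeltaRowGroup())
            group = self.delta[-1]
            take = min(n, MAX_ROWS_PER_GROUP - group.rows)
            group.rows += take
            n -= take
            if group.rows == MAX_ROWS_PER_GROUP:
                group.state = "CLOSED"

    def tuple_mover(self, compress_all: bool = False) -> None:
        """Compress CLOSED delta groups; with compress_all, non-empty OPEN ones too."""
        keep = []
        for group in self.delta:
            if group.state == "CLOSED" or (compress_all and group.rows > 0):
                self.compressed.append(group.rows)
            else:
                keep.append(group)
        self.delta = keep


table = ToyCci()
table.trickle_insert(1_100_000)
table.tuple_mover()  # background pass: only the CLOSED group is compressed
print(table.compressed, [(g.rows, g.state) for g in table.delta])
table.tuple_mover(compress_all=True)  # like REORGANIZE WITH (COMPRESS_ALL_ROW_GROUPS = ON)
print(table.compressed, table.delta)
```

Forcing the OPEN group into a compressed row group makes it immediately scannable in columnar form, at the price of an undersized row group (51,424 rows here).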
Internals
• How to make use of DBCC commands for the CCI?
• Where is my memory?
• What about memory grants?
• What about memory pressure?
• What about transaction log usage?
Resources
• Niko Neugebauer: http://www.nikoport.com/columnstore/
• Benjamin Nevarez: http://www.benjaminnevarez.com/
• Paul White: http://sqlblog.com/blogs/paul_white/
• Remus Rusanu: http://rusanu.com/
• Hugo Kornelis: http://sqlblog.com/blogs/hugo_kornelis/
• Joe Sack: http://www.sqlskills.com/blogs/joe
• Sunil Agarwal: http://blogs.msdn.microsoft.com/sqlserverstorageengine
Q & A