Informix Chat with the Labs

Page 1: Informix Chat with the Labs

© IBM Corporation 2006

1

Informix Chat with the Labs

John F. Miller III

Unlocking the Mysteries Behind Update Statistics

STSM

Page 2: Informix Chat with the Labs

The Dice Problem

• If you throw dice, how many will be 1?

Page 3: Informix Chat with the Labs

Questions about the Dice

• How many dice are you throwing?

• How many sides does each die have?

• Are all the dice the same?

The better the information, the more accurate the estimate.

Page 4: Informix Chat with the Labs

What does Update Statistics do?

• Collects information for the optimizer
  – Statistics (LOW)
  – Distributions (MEDIUM and HIGH)

• Drops distributions

• Compiles stored procedures

Page 5: Informix Chat with the Labs

Statistics Collected

• systables
  – Number of rows
  – Number of pages used to store the data

• syscolumns
  – Second-largest value for a column
  – Second-smallest value for a column

• sysindexes
  – Number of unique values for the lead key
  – How highly clustered the values for the lead key are

Page 6: Informix Chat with the Labs

Update Statistics Low: Basic Algorithm

• Walk the leaf pages in each index

• Submit B-tree cleaner requests when deleted items are found, causing the indexes to be rebalanced

• Collect the following information
  – Number of unique items
  – Number of leaf pages
  – How clustered the data is
  – Second-highest and second-lowest values
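The single leaf-page walk above can be sketched in Python. This is a toy model, not Informix code: the hypothetical `low_statistics` function takes leaf pages as lists of `(key, data_page)` pairs already in key order. The clustering figure is an assumption: it counts how often consecutive index entries point to a different data page, so fewer switches means better-clustered data. B-tree cleaner requests are not modeled.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical leaf entry: (key value, data page the row lives on).
LeafEntry = Tuple[int, int]


def low_statistics(leaf_pages: List[List[LeafEntry]]) -> Dict[str, Optional[int]]:
    """Walk index leaf pages once, in key order, and collect LOW-style statistics."""
    unique_keys = 0
    page_switches = 0  # assumed clustering measure (see lead-in)
    distinct: List[int] = []  # distinct keys seen, in ascending order
    prev_key: Optional[int] = None
    prev_page: Optional[int] = None

    for leaf in leaf_pages:
        for key, data_page in leaf:
            if key != prev_key:
                unique_keys += 1
                distinct.append(key)
            if data_page != prev_page:
                # Each jump to a different data page counts against clustering.
                page_switches += 1
            prev_key, prev_page = key, data_page

    has_two = len(distinct) > 1
    return {
        "leaf_pages": len(leaf_pages),
        "unique_keys": unique_keys,
        "clustering": page_switches,
        "second_lowest": distinct[1] if has_two else None,
        "second_highest": distinct[-2] if has_two else None,
    }
```

Skipping the smallest and largest keys mirrors the slide's "second-highest and second-lowest" values, which keep a single outlier from stretching the optimizer's view of the column's range.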

Page 7: Informix Chat with the Labs

    --- DISTRIBUTION ---
    (     -1)
     1: ( 868317,     70,     75)
     2: ( 868317,     24,    100)
     3: ( 868317,     12,    116)
     4: ( 868317,     30,    147)
     5: ( 868317,     39,    194)
     6: ( 868317,     28,    222)
    --- OVERFLOW ---
     1: ( 779848,     43)
     2: ( 462364,     45)

How to Read Distributions

• The first line, (-1), is the starting value for bin 1

• Each distribution bin reads ( # of rows represented in this bin, # of unique values, highest value in this bin )

• To get the range of values for a bin, look at the highest value in the previous bin

• Each overflow entry reads ( # of rows for this value, the value )
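To make the reading rules concrete, here is a small Python parser for the simplified listing on this slide. It assumes an integer column and exactly this layout; distribution output produced by the server's tools may be laid out differently, so treat the regular expressions and the `parse_distribution` / `bin_ranges` helpers as hypothetical illustrations.

```python
import re
from typing import List, Optional, Tuple

# Patterns for the simplified listing on this slide (integer column assumed).
MIN_RE = re.compile(r"^\s*\(\s*(-?\d+)\s*\)\s*$")
BIN_RE = re.compile(r"^\s*\d+:\s*\(\s*(-?\d+),\s*(-?\d+),\s*(-?\d+)\s*\)\s*$")
OVF_RE = re.compile(r"^\s*\d+:\s*\(\s*(-?\d+),\s*(-?\d+)\s*\)\s*$")


def parse_distribution(text: str):
    """Return (minimum, bins, overflow).

    bins:     list of (rows in bin, unique values, highest value in bin)
    overflow: list of (rows for this value, the value)
    """
    minimum: Optional[int] = None
    bins: List[Tuple[int, int, int]] = []
    overflow: List[Tuple[int, int]] = []
    section = None
    for line in text.splitlines():
        if "DISTRIBUTION" in line:
            section = "dist"
        elif "OVERFLOW" in line:
            section = "overflow"
        elif section == "dist" and (m := MIN_RE.match(line)):
            minimum = int(m.group(1))
        elif section == "dist" and (m := BIN_RE.match(line)):
            bins.append((int(m.group(1)), int(m.group(2)), int(m.group(3))))
        elif section == "overflow" and (m := OVF_RE.match(line)):
            overflow.append((int(m.group(1)), int(m.group(2))))
    return minimum, bins, overflow


def bin_ranges(minimum: int, bins) -> List[Tuple[int, int]]:
    """Each bin starts at the previous bin's highest value (or at the minimum)."""
    ranges, low = [], minimum
    for _, _, high in bins:
        ranges.append((low, high))
        low = high
    return ranges


SAMPLE = """\
--- DISTRIBUTION ---
(     -1)
 1: ( 868317,     70,     75)
 2: ( 868317,     24,    100)
 3: ( 868317,     12,    116)
 4: ( 868317,     30,    147)
 5: ( 868317,     39,    194)
 6: ( 868317,     28,    222)
--- OVERFLOW ---
 1: ( 779848,     43)
 2: ( 462364,     45)
"""
```

Calling `bin_ranges(*parse_distribution(SAMPLE)[:2])` yields one `(low, high)` pair per bin, starting with `(-1, 75)` and `(75, 100)`.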

Page 8: Informix Chat with the Labs

Example - Approximating a Value

• There are 868,317 rows containing a value between -1 and 75

• There are 70 unique values in this range

• The optimizer will therefore estimate 868,317 / 70 ≈ 12,404 rows for each value between -1 and 75
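The arithmetic on this slide fits in a tiny Python function. It assumes, as the slide does, that rows are spread evenly over the unique values in a bin, and it deliberately ignores the overflow section. `estimate_equal_rows` is a hypothetical helper for illustration, not an Informix API.

```python
from typing import List, Optional, Tuple

# (rows in bin, unique values in bin, highest value in bin)
Bin = Tuple[int, int, int]

# Bins from the example distribution; the starting value is -1.
MINIMUM = -1
BINS: List[Bin] = [
    (868317, 70, 75), (868317, 24, 100), (868317, 12, 116),
    (868317, 30, 147), (868317, 39, 194), (868317, 28, 222),
]


def estimate_equal_rows(value: int, minimum: int, bins: List[Bin]) -> Optional[float]:
    """Estimate rows matching `column = value`, assuming a uniform spread per bin."""
    low = minimum
    for rows, unique, high in bins:
        # A bin covers values from the previous bin's highest value up to `high`.
        if low <= value <= high:
            return rows / unique
        low = high
    return None  # value lies outside the distribution


print(int(estimate_equal_rows(50, MINIMUM, BINS)))  # 12404
```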

Page 9: Informix Chat with the Labs

Example - Dealing with Data Skew

• Data skew: for the value 43, how many rows will the optimizer estimate exist?

• Answer: 779,848 rows, taken from the overflow entry for 43

• Any value whose row count exceeds 25% of the bin size is placed in an overflow bin
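Checking the overflow section before falling back to the bin average shows why 43 gets 779,848 rows rather than roughly 12,404. In this Python sketch, the lookup order (overflow first, then bins) is an assumption consistent with the answer on this slide, and the hypothetical `goes_to_overflow` simply encodes the 25% rule as stated.

```python
from typing import List, Optional, Tuple

Bin = Tuple[int, int, int]   # (rows in bin, unique values, highest value)
Overflow = Tuple[int, int]   # (rows for this value, the value)

MINIMUM = -1
BINS: List[Bin] = [
    (868317, 70, 75), (868317, 24, 100), (868317, 12, 116),
    (868317, 30, 147), (868317, 39, 194), (868317, 28, 222),
]
OVERFLOW: List[Overflow] = [(779848, 43), (462364, 45)]


def goes_to_overflow(rows_for_value: int, bin_size: int) -> bool:
    """Slide rule: a value exceeding 25% of the bin size gets an overflow entry."""
    return rows_for_value > 0.25 * bin_size


def estimate_equal_rows(value: int, minimum: int, bins: List[Bin],
                        overflow: List[Overflow]) -> Optional[float]:
    # Skewed values are looked up exactly in the overflow section first.
    for rows, overflow_value in overflow:
        if overflow_value == value:
            return float(rows)
    # Otherwise use the uniform estimate for the bin that holds the value.
    low = minimum
    for rows, unique, high in bins:
        if low <= value <= high:
            return rows / unique
        low = high
    return None
```

Without the overflow check, value 43 would fall into bin 1 and get the same estimate as every other value between -1 and 75.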

Page 10: Informix Chat with the Labs

Basic Algorithm for Distributions

• Develop a scan plan based on available resources

• Scan the table
  – HIGH = all rows
  – MEDIUM = a sample of rows

• Sort each column

• Build distributions

• Begin transaction
  – Delete the old column distributions
  – Insert the new column distributions

• Commit transaction
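The scan, sort, and build steps can be sketched for a single column in Python. This is an illustration under stated assumptions, not the server's algorithm: `build_distribution` is hypothetical, MEDIUM is modeled as a uniform random sample, bins hold roughly `resolution` percent of the rows, values above 25% of the bin size are pulled into overflow before binning, and every copy of a value is kept in one bin. The transaction that replaces the stored distributions is not modeled.

```python
import math
import random
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def build_distribution(values: Iterable[int], resolution_pct: float,
                       sample_size: Optional[int] = None, seed: int = 0):
    """Return (minimum, bins, overflow) for one column.

    bins:     (rows in bin, unique values, highest value in bin)
    overflow: (rows for this value, the value)
    """
    data = list(values)
    # MEDIUM: work on a random sample; HIGH (sample_size=None): use every row.
    if sample_size is not None and sample_size < len(data):
        data = random.Random(seed).sample(data, sample_size)
    data.sort()  # "sort each column"
    if not data:
        return None, [], []

    bin_size = max(1, math.ceil(len(data) * resolution_pct / 100))
    counts = Counter(data)
    threshold = 0.25 * bin_size  # values above this go to overflow
    overflow = [(c, v) for v, c in sorted(counts.items()) if c > threshold]
    rest = [v for v in data if counts[v] <= threshold]

    bins: List[Tuple[int, int, int]] = []
    i = 0
    while i < len(rest):
        j = min(i + bin_size, len(rest))
        # Keep every copy of a value in the same bin so ranges stay well defined.
        while j < len(rest) and rest[j] == rest[j - 1]:
            j += 1
        chunk = rest[i:j]
        bins.append((len(chunk), len(set(chunk)), chunk[-1]))
        i = j
    return data[0], bins, overflow
```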

Page 11: Informix Chat with the Labs

Sample Size

• HIGH
  – All rows in the table

• MEDIUM
  – A common misconception is that the number of rows sampled is based on the number of rows in the table; this is incorrect.
  – The number of samples depends on the confidence and the resolution.
  – If the sample size is greater than the number of rows in the table, MEDIUM turns into HIGH mode.

Page 12: Informix Chat with the Labs

Update Statistics Medium Sample Size

Resolution   Confidence   Samples
2.5          .95          2,963
2.5          .99          4,273
1.0          .95          18,516
1.0          .99          26,569
0.5          .95          74,064
0.5          .99          106,276
0.25         .95          296,255
0.25         .99          425,104
0.1          .95          1,851,593
0.1          .99          2,656,900
0.05         .95          7,406,375
0.05         .99          10,627,600
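The table's sample counts grow with the inverse square of the resolution (halving the resolution roughly quadruples the samples), which a two-constant fit reproduces. The Python sketch below is an assumption fitted to the table, not IBM's documented formula: the constants are roughly 1.36 squared and exactly 1.63 squared, close to the familiar Kolmogorov–Smirnov critical values for 95% and 99%. It matches most rows to within a few samples; the 2.5 / .99 row comes out as 4,251 rather than 4,273, so treat it as an approximation.

```python
# Constants fitted to the sample-size table (assumption, not IBM's formula).
FITTED_C2 = {0.95: 1.851593, 0.99: 2.6569}


def medium_sample_size(resolution_pct: float, confidence: float) -> int:
    """Approximate MEDIUM sample size: c^2 / (resolution as a fraction)^2."""
    return round(FITTED_C2[confidence] * 10_000 / resolution_pct ** 2)


def effective_mode(table_rows: int, resolution_pct: float, confidence: float) -> str:
    """MEDIUM turns into HIGH when the sample would exceed the table size."""
    if medium_sample_size(resolution_pct, confidence) > table_rows:
        return "HIGH"
    return "MEDIUM"


for res in (2.5, 1.0, 0.5, 0.25, 0.1, 0.05):
    print(res, medium_sample_size(res, 0.95), medium_sample_size(res, 0.99))
```

Note that the table size appears only in `effective_mode`, which matches the slide's point that the sample size itself does not depend on the number of rows.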

Page 13: Informix Chat with the Labs

How Much Information Is Enough?

The better the information, the more accurate the estimate.

Page 14: Informix Chat with the Labs

Examining the Running Query: No Statistics vs. Medium Statistics

No Statistics
QUERY:
------
select * from t1 where c1 > 20200

Estimated Cost: 20888
Estimated # of Rows Returned: 6760

1) miller3.t1: SEQUENTIAL SCAN
    Filters: miller3.t1.c1 > 20200

Medium Statistics
QUERY:
------
select * from t1 where c1 > 20200

Estimated Cost: 21
Estimated # of Rows Returned: 19

1) miller3.t1: INDEX PATH
    (1) Index Keys: c1 (Serial, fragments: ALL)
        Lower Index Filter: t1.c1 > 20250

• Overall performance improved

• The estimates were more accurate

• The query plan changed

Page 15: Informix Chat with the Labs

Examining the Running Query: Medium Statistics vs. High Statistics

Medium Statistics
QUERY:
------
select * from t1 where c1 > 20200

Estimated Cost: 21
Estimated # of Rows Returned: 19

1) miller3.t1: INDEX PATH
    (1) Index Keys: c1
        Lower Index Filter: t1.c1 > 20250

High Statistics
QUERY:
------
select * from t1 where c1 > 20200

Estimated Cost: 33
Estimated # of Rows Returned: 30

1) miller3.t1: INDEX PATH
    (1) Index Keys: c1
        Lower Index Filter: t1.c1 > 20250

• Overall performance did not change

• The estimates were slightly more accurate

• The query plan did not change

Page 16: Informix Chat with the Labs

Versions Containing the Update Statistics Improvements

• All versions of 9.40 and 10.00

• 9.30.UC3

• 9.21: not fixed

• 7.31.UD2

Page 17: Informix Chat with the Labs

Improvements in Update Statistics

• Update statistics cannot allocate between 4 MB and 100 MB of sort memory
  – The default has been raised from 4 MB to 15 MB
  – Users can now configure the amount of memory

• The DBUPSPACE environment variable has been augmented to include memory
  – Format: DBUPSPACE={max disk space}:{default memory}
  – To increase the memory to 35 MB, set DBUPSPACE=0:35

• Allow update statistics to use light scans when scanning a table
  – Light scans implemented
  – Set-oriented reads

Page 18: Informix Chat with the Labs

Improvements in Update Statistics

• Information about building data distributions is not viewable by the DBA
  – SET EXPLAIN will now print the scan path and resource usage when building data distributions

• Update statistics low on fragmented tables does not run in parallel
  – With PDQ turned on, each index fragment will be scanned in parallel
  – PDQ at 1 means 10% of the index fragments are scanned in parallel, while PDQ at 10 means all the index fragments are scanned in parallel

Page 19: Informix Chat with the Labs

Improvements in Update Statistics

• Various errors (126, 312, 100, …) when executing update statistics
  – Errors occurred when inserting the distributions because SET LOCK MODE TO WAIT was not handled properly inside update statistics

• Range scanning a fragmented index is slow
  – Replaced the next loop merge with a binary search merge when ordering items from index fragments
  – Most noticeable when the number of fragments in an index is large
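One reading of the "binary search merge" fix, sketched in Python: keep the current head key of each fragment in a sorted list and use `bisect.insort` to place each fragment's next key, so choosing the next smallest item does not require comparing against every fragment head. This is an illustration of the idea, not the server's code; `merge_fragments` is a hypothetical name.

```python
import bisect
from typing import Iterator, List, Sequence, Tuple

_END = object()  # sentinel for an exhausted fragment


def merge_fragments(fragments: Sequence[Sequence[int]]) -> Iterator[int]:
    """Yield keys from several individually sorted index fragments in global order."""
    iters = [iter(f) for f in fragments]
    heads: List[Tuple[int, int]] = []  # (current key, fragment number), kept sorted
    for frag_no, it in enumerate(iters):
        first = next(it, _END)
        if first is not _END:
            bisect.insort(heads, (first, frag_no))
    while heads:
        key, frag_no = heads.pop(0)  # smallest key across all fragments
        yield key
        nxt = next(iters[frag_no], _END)
        if nxt is not _END:
            # Binary search finds where this fragment's next key belongs,
            # instead of rescanning every fragment head.
            bisect.insort(heads, (nxt, frag_no))
```

The binary search cuts comparisons to about log2(fragments) per item, which is why the gain shows up mainly with many fragments. Python's list insert still shifts elements, so in real Python code `heapq.merge` is the idiomatic k-way merge.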

Page 20: Informix Chat with the Labs

Update Statistics Medium Memory Requirements

Confidence .99

                 Resolution
Row Size (bytes) 2.5       2.0       1.5       1.0
100              .96 MB    1.2 MB    1.8 MB    3.5 MB
200              1.3 MB    1.9 MB    3.0 MB    6.1 MB
300              1.8 MB    2.5 MB    4.2 MB    8.7 MB
400              2.2 MB    3.2 MB    5.3 MB    11.3 MB
500              2.6 MB    3.9 MB    6.4 MB    13.9 MB
600              3.0 MB    4.5 MB    7.6 MB    16.5 MB
700              3.4 MB    5.1 MB    8.7 MB    19.1 MB
800              3.8 MB    5.8 MB    9.9 MB    21.7 MB
900              4.3 MB    6.4 MB    11.1 MB   24.2 MB
1000             4.7 MB    7.1 MB    12.2 MB   26.9 MB

Page 21: Informix Chat with the Labs

Update Statistics High Memory Requirements

• In-memory sort
  – Approximate memory = number of rows * sum(column widths + 2 * sizeof(pointer))
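The formula can be evaluated directly in Python. This sketch reads it as "rows times the sum, over the sorted columns, of width plus two pointers" and assumes 8-byte pointers (a 64-bit build); `high_sort_memory` is a hypothetical helper. Setting `pointer_size=0` gives the raw data size, which is what the Sort Memory column in the Customer table example reflects (for instance, 500,000 rows * 4 bytes = 2 MB for Cust_id).

```python
from typing import Sequence

MB = 1_000_000  # decimal megabytes, matching the round figures on the slides


def high_sort_memory(num_rows: int, column_widths: Sequence[int],
                     pointer_size: int = 8) -> int:
    """Approximate in-memory sort size in bytes for UPDATE STATISTICS HIGH.

    pointer_size=8 is an assumption (64-bit build).
    """
    return num_rows * sum(width + 2 * pointer_size for width in column_widths)


# Column widths of the Customer table example (500,000 rows).
widths = {"Cust_id": 4, "Fname": 50, "Lname": 50, "Address1": 200,
          "Address2": 200, "State": 2, "zipcode": 4}

print(high_sort_memory(500_000, [widths["Cust_id"]], pointer_size=0) / MB)  # 2.0
print(high_sort_memory(500_000, [widths["Cust_id"]]) / MB)  # 10.0
```

The second line shows how much the pointer overhead matters for narrow columns: with 8-byte pointers, a 4-byte integer column needs five times its raw size.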

Page 22: Informix Chat with the Labs

Memory Rules

• Estimated update statistics memory is below 100 MB
  – Hard-coded limit of 4 MB
  – Attempts to minimize the scans by fitting as many columns as possible into 4 MB

• Estimated update statistics memory is above 100 MB
  – Memory is requested from the MGM (Memory Grant Manager)
  – Attempts to minimize the scans by fitting as many columns as possible into the MGM memory

Page 23: Informix Chat with the Labs

Examples

• Customer table
    Cust_id    integer
    Fname      char(50)
    Lname      char(50)
    Address1   char(200)
    Address2   char(200)
    State      char(2)
    zipcode    integer

• Number of rows: 500,000

Page 24: Informix Chat with the Labs

Examples: Memory for In-Core Sort

Column     Data Type    Size         Sort Memory
Cust_id    Integer      4 bytes      2 MB
Fname      Char(50)     50 bytes     25 MB
Lname      Char(50)     50 bytes     25 MB
Address1   Char(200)    200 bytes    100 MB
Address2   Char(200)    200 bytes    100 MB
State      Char(2)      2 bytes      1 MB
Zipcode    Integer      4 bytes      2 MB

Page 25: Informix Chat with the Labs

Examples: Number of Table Scans

PDQPRIORITY 0
  Scan #1  Cust_id, State
  Scan #2  Fname
  Scan #3  Lname
  Scan #4  Address1
  Scan #5  Address2
  Scan #6  ZipCode

PDQPRIORITY 0, with 100 MB of memory
  Scan #1  Cust_id, Fname, Lname, State, ZipCode
  Scan #2  Address1
  Scan #3  Address2
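Both scan plans above can be reproduced with a simple first-fit packing, sketched in Python: walk the columns in table order and place each one in the first scan whose sort memory still has room, using the Sort Memory figures from the in-core sort example and budgets of 4 MB (the old hard-coded limit) and 100 MB. The first-fit rule and the `plan_scans` helper are assumptions that happen to match this example; the jmiller.t9 traces list columns in a different order, so the server's real ordering heuristic may differ.

```python
from typing import List, Sequence, Tuple

# (column, sort memory in MB) for the Customer table example, in column order.
CUSTOMER = [("Cust_id", 2), ("Fname", 25), ("Lname", 25), ("Address1", 100),
            ("Address2", 100), ("State", 1), ("ZipCode", 2)]


def plan_scans(columns: Sequence[Tuple[str, int]], budget_mb: int) -> List[List[str]]:
    """First-fit: put each column into the first scan that still has room.

    A column larger than the budget still gets a table scan of its own.
    """
    used: List[int] = []          # memory already assigned to each scan
    scans: List[List[str]] = []   # columns sorted during each scan
    for name, size in columns:
        for k in range(len(scans)):
            if used[k] + size <= budget_mb:
                used[k] += size
                scans[k].append(name)
                break
        else:
            used.append(size)
            scans.append([name])
    return scans


for budget in (4, 100):
    for scan_no, cols in enumerate(plan_scans(CUSTOMER, budget), start=1):
        print(f"{budget} MB  Scan #{scan_no}", ", ".join(cols))
```

Raising the budget from 4 MB to 100 MB cuts the plan from six table scans to three, which is the effect the new memory defaults and DBUPSPACE setting aim for.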

Page 26: Informix Chat with the Labs

Confidence

• A factor in the number of samples used by update statistics medium

Page 27: Informix Chat with the Labs

Resolution

• Percentage of data that is represented in a distribution bin

• Example
  – 100,000 rows in the table
  – Resolution of 2%
  – Each bin will represent 2,000 rows
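The resolution arithmetic in the example can be written out in Python. Rows per bin follows directly from the definition above; computing the number of bins as 100 / resolution, rounded up, is an assumption of this sketch, and `distribution_bins` is a hypothetical helper.

```python
import math
from typing import Tuple


def distribution_bins(table_rows: int, resolution_pct: float) -> Tuple[int, float]:
    """Return (number of bins, rows represented by each bin)."""
    rows_per_bin = table_rows * resolution_pct / 100
    num_bins = math.ceil(100 / resolution_pct)  # assumption: round up
    return num_bins, rows_per_bin


print(distribution_bins(100_000, 2))  # (50, 2000.0)
```

Under this reading, lowering the resolution to 1.5 raises the bin count from 50 to 67, with each bin covering fewer rows.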

Page 28: Informix Chat with the Labs

Example

• The following example uses
  – Table size: 215,000 rows
  – Row size: 445 bytes
  – Uniprocessor

Page 29: Informix Chat with the Labs

Example of the current update statistics

Table: jmiller.t9

Mode: HIGH

Number of Bins: 267 Bin size 1082

Sort data 101.4 MB

Sort memory granted 4.0 MB

Estimated number of table scans 10

PASS #1 c9

PASS #2 c5

PASS #3 c7

PASS #4 c6

…..

PASS #10 c4

Completed pass 1 in 0 minutes 24 seconds

Completed pass 2 in 0 minutes 20 seconds

Completed pass 3 in 0 minutes 17 seconds

Completed pass 4 in 0 minutes 17 seconds

Completed pass 5 in 0 minutes 17 seconds

Completed pass 6 in 0 minutes 15 seconds

Completed pass 7 in 0 minutes 14 seconds

Completed pass 8 in 0 minutes 15 seconds

Completed pass 9 in 0 minutes 16 seconds

Completed pass 10 in 0 minutes 14 seconds

Total Time 146 seconds

Page 30: Informix Chat with the Labs

The New Defaults

New Memory Default

Table: jmiller.t9

Mode: HIGH

Number of Bins: 267 Bin size 1082

Sort data 101.4 MB

Sort memory granted 15.0 MB

Estimated number of table scans 7

PASS #1 c9,c8,c10,c5,c7

PASS #2 c6,c1

PASS #3 c3

PASS #4 c2

PASS #5 c4

Completed pass 1 in 0 minutes 34 seconds

Completed pass 2 in 0 minutes 19 seconds

Completed pass 3 in 0 minutes 16 seconds

Completed pass 4 in 0 minutes 14 seconds

Completed pass 5 in 0 minutes 15 seconds

Total Time 98 seconds

Page 31: Informix Chat with the Labs

Enabling PDQ with Update Statistics

Table: jmiller.t9

Mode: HIGH

Number of Bins: 267 Bin size 1082

Sort data 101.4 MB

PDQ memory granted 106.5 MB

Estimated number of table scans 1

PASS #1 c1,c2,c3,c4,c5,c6,c7,c8,c9,c10

Index scans disabled

Light scans enabled

Completed pass 1 in 0 minutes 29 seconds

Total Time 29 seconds

PDQ Memory

Features Enabled

Page 32: Informix Chat with the Labs

Tuning with the New Statistics

• Turn on PDQ when running update statistics, but only for tables
  – Avoid PDQ when updating statistics for procedures

• When running HIGH or MEDIUM, increase the memory that update statistics has to work with

• Enable parallel sorting (i.e., PSORT_NPROCS)

Page 33: Informix Chat with the Labs

Considerations

• Change the RESOLUTION to 1.5
  – Increases the number of bins for the distributions
  – Increases the sample size for update statistics MEDIUM

Page 34: Informix Chat with the Labs

Old Recommendations

• Start one update statistics for each column of a table
  – Fname, Lname, Address: three sequential scans of the table

Page 35: Informix Chat with the Labs

New Recommendations

• Start one update statistics for ALL columns, giving it more resources (memory)

• Requires only one scan of the table to produce distributions on several columns
  – Fname, Lname, Address: one scan of the table

Page 36: Informix Chat with the Labs

Other Information

• An Overview of the IBM Informix Dynamic Server Optimizer: www.ibm.com/developerworks/db2/zones/informix/library/techarticle/0211desai/0211desai.html

• Understanding and Tuning Update Statistics: www.ibm.com/developerworks/db2/zones/informix/library/techarticle/miller/0203miller.html

• Predicate Inference in Informix Dynamic Server: www.ibm.com/developerworks/db2/zones/informix/library/techarticle/0206goswami/0206goswami.html

• IBM Informix Performance Manual

• IBM Informix SQL Reference Manual

Page 37: Informix Chat with the Labs

Questions

