Short Queries and Indexes
Local Adaptation from: Henrietta Dombrovskaya, Boris Novikov
“System Tuning” Saint Petersburg, Russia 2006
Henrietta Dombrovskaya – Enova Financial
What will be covered:
• Which queries are considered short.• Short queries and indexes• Indexes and mass data update• Choosing selection criteria• Excessive selection conditions• How to avoid using indexes• Joins order• Impact of indexes on nested loops• Other index types
2012
Henrietta Dombrovskaya – Enova Financial
Which Queries are Considered Short?
• The query is considered short, when result can be obtained processing the small number of records, even if the tables are large.
• For short queries sorting, grouping, and even joins are not time consuming.
• Optimization goal for short queries: to avoid full scan of large tables (for small tables full scan may be still OK)
• Typically for short queries tuning is necessary to improve throughput, not response time – the user does not care whether it takes 50 ms or 150 ms.
2012
Henrietta Dombrovskaya – Enova Financial
Short Queries and Indexes
• If we want to avoid full scan, some sort of index for the table should exist
• Any index is a part of the database schema; we can create new indexes only if schema changes are allowed. Sometimes they can’t be implemented right away, and sometimes delayed index creation may be almost impossible
• Be aware of potential implication of index creation on other queries and data modifications
2012
Henrietta Dombrovskaya – Enova Financial
Indexes and Bulk Data Update
drop index index_1;drop index index_2;…Bulk INSERT …
create index index_1(….);create index index_2(….);….
2012
Henrietta Dombrovskaya – Enova Financial
Choosing Selection Criteria
• Index selectivity• Unique indexes• Selection criteria and indexes• Compound indexes• Using index for data retrieval
2012
Henrietta Dombrovskaya – Enova Financial
Index Selectivity
• Do not use an index with small number of distinct values (exception - bitmap indexes, where applicable)
• Index usage order: make sure that the index with the highest selectivity level will be used first
2012
Henrietta Dombrovskaya – Enova Financial
Index Selectivity - Exampleselect customer_id from customer_sources cs where incoming_brand_id
=11 AND type_cd IN ('lead_reject_import', 'import',
'pass_active_customer') AND cs.received_time > current_date - interval '24 hours' AND cs.received_time < current_timestamp - interval ' 15
minutes' AND cs.source_type_cd not in
('yesloans1stgbi','yesloansgbi','noworries1stgbi','noworriesgbi','aspiregbi','aspire-cpfgbi')
We have indexes for type_cd, source_type_cd and received_time, the optimizer may choose condition with =, while in this case received_time has higher selectivity level.
2012
Henrietta Dombrovskaya – Enova Financial
Unique Indexes
• If a column is described as a primary key, the unique index will be created automatically
• You may need extra unique constraints for data integrity purposes (each UNIQUE constraint will generate unique index, too)
• Unique indexes make nested loops efficient.
2012
Nested LoopsFOR row1 in table1 loop For row2 in table2 loop If match(row1,row2) insert output row end loopend loop
• When to use: joins and products
• Cost: proportional (Size_of_T1)X(Size_of_T2)
5863413
312
1
Table 1 Table 2
Henrietta Dombrovskaya – Enova Financial
Selection Criteria and Indexes – Columns Transformation
• Any column transformations will prevent from using an index:
where lower(last_name)=‘smith’• If we need to search by transformed value, we need
to create additional index:CREATE INDEX people_m13 ON private.people USING btree (lower(last_name::text));
2012
Henrietta Dombrovskaya – Enova Financial
Selection Criteria and Indexes- Using like Operator
• Using like operator:– WHERE (lower(people.first_name) like E'chaman%‘ will not use the indexPossible rewriting, if for some reason we can’t create an
index:– WHERE (lower(people.first_name) >=E'chaman' and
lower(people.first_name) <E'chamam') will use the index
2012
Henrietta Dombrovskaya – Enova Financial
Pattern Indexes
CREATE INDEX people__last_name_pattern ON private.people (lower(last_name::text) text_pattern_ops);
2012
Henrietta Dombrovskaya – Enova Financial
Compound Indexes• Compound index is build for several columns of one table.Note: if an index was built for columns (X,Y,Z), you can use it
to search X, XY and XYZ, not Y and not YZ. Postgres : equality on leading columns, inequality on other columns, which will still be scanned, but may save additional trip to the table.
• Why to create compound index?– Additional selectivity– Additional data storage
• Index-organized tables – not currently available in Postgres
2012
Henrietta Dombrovskaya – Enova Financial
Using Indexes for Data RetrievalWhen all the columns from select statement are included
into compound index, they may be retrieved without accessing the table.
Example:
CREATE INDEX loans_m2 ON cnu.loans (customer_id, funding_date, status_cd);
SELECT funding_date, status_cd FROM loans WHERE customer_id =1111111
2012
Henrietta Dombrovskaya – Enova Financial
Using Multiple Indexes in Postgres
Postgres can use the search results from multiple indexes by creating a bitmap of matching rows in main memory and then OR-ing or AND-ing themIn this case the records will be scanned in the physical order, so the index-based ordering will be lost. Usage of compound indexes vs. sets of single-column indexes should be justified on case-by-case basis.
2012
Henrietta Dombrovskaya – Enova Financial
Excessive Selection Criteria
We can add redundant selection criteria to pre-select small records subset from the big table:– prompt to use specific indexes – reduce the sizes of join arguments.
2012
Henrietta Dombrovskaya – Enova Financial
Excessive Selection Criteria – Example
SELECT <…> FROM reports r, expense_items iWHERE r.rep_id=i.rep_id AND (r.proc_date=’1-jul-2003’ AND i.charge<0 OR r.proc_date=’15-jun-2003 AND i.charge=0 );
In this case the complex selection criteria can’t be applied before the tables are joined. (expense_items may contain over several million records)
2012
Henrietta Dombrovskaya – Enova Financial
Excessive Selection Criteria – Example (2)
We will modify this select statement the following way:
SELECT <…>FROM reports r,
expense_items IWHERE r.rep_id=i.rep_idAND (r.proc_date IN (‘1-jul-2003’, ’15-jun-2003’) )AND (r.proc_date=’1-jul-2003’ AND i.charge<0 OR r.proc_date=’15-jun-2003’ AND i.charge=0 );
This will allow to restrict reports to the ones processed on Jul 1 and Jul 15, and then join only those reports with expense_items.
2012
Henrietta Dombrovskaya – Enova Financial
How to Avoid Using Indexes
• Use system-specific mechanisms, like optimizer hints
• When optimizer hints can’t be used - modify selection criteria.
Examples:– attr1+0=p_value– COALESCE (t1.attr2, 0)=t2.attr2
2012
Henrietta Dombrovskaya – Enova Financial
Order of Joins
In short queries the size of join result may be small because of:– Restrictions on the joined tables (reduce the
number of records in join arguments)– Semi-join (one argument significantly restricts the
result size)If the optimizer generates an execution plan
with large intermediate results size, the order of joins should be changed
2012
Henrietta Dombrovskaya – Enova Financial
Order of Joins - ExampleSELECT DISTINCT c.* FROM customers c LEFT OUTER JOIN loans l on l.customer_id = c.id INNER JOIN customer_sources cs ON cs.id = (SELECT MAX(id) FROM customer_sources cs1 WHERE cs1.customer_id = c.id AND incoming_brand_id = c.brand_id AND type_cd IN ('lead_reject_import', 'import', 'pass_active_customer')) LEFT OUTER JOIN work_items w ON w.customer_id = c.id and w.created_by in ('generate_ast_workitems','generate_ast_workitems_instl') and w.created_on > current_date - interval '24 hours‘WHERE
c.brand_id = 11AND c.status_cd = 'active'AND c.fraud_flg = FALSEAND cs.received_time > current_date - interval '24 AND cs.received_time < current_timestamp - interval ' 15 minutes' AND cs.source_type_cd not in
('yesloans1stgbi','yesloansgbi','noworries1stgbi','noworriesgbi','aspiregbi','aspire-cpfgbi')
2012
Henrietta Dombrovskaya – Enova Financial
Impact of Indexes on Nested Loops• For small queries nested loops will be efficient, when
the inner table is indexed by join attribute• No transformation should be applied to the indexed
attribute.• Note: this may not work for long queries
2012
Henrietta Dombrovskaya – Enova Financial
Other Index Types• Function-based indexes - use when the queries WHERE clause often
contains the same attribute(s) transformation(like end_date - start_date)
• Bitmap indexes - use for combination on several low-cardinality attributes, mostly for Data Warehouse type applications.
• Partial indexes:CREATE INDEX payment_transactions_m16c ON cnu.payment_transactions_committed (credit_account_cd, status_cd, eff_date, payment_method_cd) WHERE status_cd::text = 'created'::text AND payment_method_cd::text = 'bank_account_ach'::text AND (credit_account_cd::text = 'disbursement_account'::text OR credit_account_cd::text = 'cso_disbursement_account'::text);
2012
Henrietta Dombrovskaya – Enova Financial
More Complex Index ExampleCREATE INDEX customers__email_alt_pattern_idx ON cnu.customers (lower(email_alt::text) text_pattern_ops) WHERE email_alt IS NOT NULL;
Selection criteria example:
lower(email) like ‘johns%' OR (email_alt IS NOT NULL AND lower(email_alt) like ‘johns%')
This condition will use bitmap OR’ing.
2012