L7 DBPerformance QO

Date post: 23-Nov-2015
Upload: ashi-sharma
Database Performance Issues & Query Optimisation
  • Database Performance Issues & Query Optimisation

    CT004-3.5-3 Advanced Database Systems: Database Performance & QO

    Topic & Structure of Lesson
    In this lecture we will look at:
    - Database performance issues
    - Indexing
    - Steps involved in query optimization (most of this has already been covered)
    - Tips for SQL performance tuning
    - An overview of distributed query processing

    Learning Outcomes
    By the end of this lesson you should be able to:
    - Discuss database performance issues
    - Explain how a query can be optimized
    - Use tips for tuning SQL

    Introduction to Performance Tuning
    You need an understanding/awareness of:
    - Transforming models to implementation
    - The relational model (for a number of reasons)
    - Technology factors

    Introduction to Performance Tuning
    Database design starts with modeling the requirement and producing a conceptual model.
    In implementation, a number of trade-offs need to be considered in order to:
    - Satisfy today's needs for information
    - Satisfy the above in reasonable time (performance requirements)
    - Satisfy anticipated or unanticipated user demands, e.g. ad-hoc queries
    - Be capable of being extended
    - Be easy to modify in changing hardware & software environments

    Tuning
    - Application (Analyst/Programmer)
    - DBA: systems-level tuning
    - Vendor: product specific
    Investigation
    - Monitoring DB statistics, e.g. logs, transaction times
    - Simulation
    - Facilities that describe how the system evaluates the query: Explain facility (Oracle), Execution Plan facility (SQL Server)
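
    As an illustration of a facility that describes how the system evaluates a query, here is a minimal sketch using SQLite's EXPLAIN QUERY PLAN from Python. SQLite is only a stand-in for the Oracle and SQL Server facilities named on the slide; the student table and the query_plan helper are hypothetical, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

def query_plan(conn, sql):
    """Return the human-readable 'detail' column of SQLite's EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[3] for row in rows]

# Hypothetical table used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (sid INTEGER, name TEXT, gpa REAL)")

# With no index on gpa, the plan reports a scan of the whole table.
plan = query_plan(conn, "SELECT name FROM student WHERE gpa > 3.3")
for step in plan:
    print(step)
```

    Reading the plan before and after a change (a new index, a rewritten WHERE clause) is the basic loop of query-level tuning.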

  • Indexing

    Database Management Systems - TMC Computer School, August 1999

    Tuning: Table Indexing
    - An index is a means of expediting retrieval of data
    - Without one, a query such as "find all students with gpa > 3.3" may need to scan the entire table
    - An index enables finding data quickly without having to scan the whole table
    - Indexes are built on a search key: one column or a combination of columns of a table
      - By default, the primary key field is indexed
      - Fields marked as 'unique' are indexed
    - An index consists of a set of entries pointing to the locations of each search key

    B-Tree Index Example
    - Commonly used with attribute tables as well as graphic-attribute tables (CAD data structures)
    - Binary coding reduces the search list by streaming down the tree
    - A balanced tree is best

    Index
    Indexes take up storage, so choose carefully:
    - Index frequently queried columns
    - Indexes can be added after understanding query patterns
    - WHERE conditions should be tuned to take advantage of the indexes
    - Rethink indexes on attributes with a high "badness factor" (few distinct values), e.g. gender
    - A full table scan may be better if the hit rate is > 20%

    Indexes need maintenance:
    - Indexes need to be periodically refreshed
    - Indexes are normally refreshed during downtime
    - Indexes aren't technically necessary for operation
    - Indexes must be maintained by the DB administrator: pruned, refreshed, analyzed...

    Create Index
    Syntax:
    CREATE INDEX index_name ON table_name (column_name);
    Examples:
    CREATE INDEX IDX_CUSTOMER_LAST_NAME ON CUSTOMER (Last_Name);
    CREATE INDEX IDX_CUSTOMER_LOCATION ON CUSTOMER (City, Country);

    An index can also be declared inline when the table is created (MySQL-style syntax):
    CREATE TABLE employee_records (name VARCHAR(50), employeeID INT, INDEX (employeeID));

    Related statements: ALTER, DROP
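
    The CREATE INDEX statements above can be tried out with a small sketch like the following. It assumes SQLite (via Python's sqlite3 module) rather than the DBMS used in the lecture, and the CUSTOMER column types and the index_names helper are hypothetical. PRAGMA index_list is SQLite-specific.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical CUSTOMER table; only the indexed columns are included.
conn.execute("CREATE TABLE CUSTOMER (Last_Name TEXT, City TEXT, Country TEXT)")

# A single-column index and a composite (two-column) index, as on the slide.
conn.execute("CREATE INDEX IDX_CUSTOMER_LAST_NAME ON CUSTOMER (Last_Name)")
conn.execute("CREATE INDEX IDX_CUSTOMER_LOCATION ON CUSTOMER (City, Country)")

def index_names(conn, table):
    """List the names of the indexes defined on a table (SQLite PRAGMA)."""
    return sorted(row[1] for row in conn.execute(f"PRAGMA index_list({table})"))

before_drop = index_names(conn, "CUSTOMER")

# DROP removes an index without touching the table's data.
conn.execute("DROP INDEX IDX_CUSTOMER_LAST_NAME")
after_drop = index_names(conn, "CUSTOMER")

print(before_drop)
print(after_drop)
```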

    Types of Indexes
    - Clustered vs. Unclustered
      - Clustered: ordering of data records is the same as the ordering of data entries in the index
      - Unclustered: data records are in a different order from the index
    - Primary vs. Secondary
      - Primary: index on fields that include the primary key
      - Secondary: other indexes
    - Unique vs. Non-unique
      - Non-unique: e.g. Lastname

    Example: Clustered Index
    Sorted by sid

    sid    name     gpa
    50000  Dave     3.3
    53650  Smith    3.8
    53666  Jones    3.4
    53688  Smith    3.2
    53831  Madayan  1.8
    53832  Guldu    2.0

    Index entries (on sid): 50000, 53600, 53800

    Example: Unclustered Index
    Sorted by sid but Index on gpa

    sid    name     gpa
    50000  Dave     3.3
    53650  Smith    3.8
    53666  Jones    3.4
    53688  Smith    3.2
    53831  Madayan  1.8
    53832  Guldu    2.0

    Index entries (on gpa): 1.8, 2.0, 3.2, 3.3, 3.4, 3.8
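
    The unclustered example can be mimicked in a few lines of Python. This is only a sketch of the idea, not how a DBMS stores an index: a sorted list of (gpa, record position) entries stands in for the index, binary search stands in for the tree descent, and the names gpa_index and students_with_gpa_above are made up for the illustration.

```python
from bisect import bisect_right

# Records stored in sid order, as in the slide's table.
records = [
    (50000, "Dave", 3.3),
    (53650, "Smith", 3.8),
    (53666, "Jones", 3.4),
    (53688, "Smith", 3.2),
    (53831, "Madayan", 1.8),
    (53832, "Guldu", 2.0),
]

# Unclustered index on gpa: entries sorted by gpa, each pointing at a record position.
gpa_index = sorted((gpa, pos) for pos, (_, _, gpa) in enumerate(records))
gpa_keys = [gpa for gpa, _ in gpa_index]

def students_with_gpa_above(threshold):
    """Answer 'gpa > threshold' from the index instead of scanning every record."""
    start = bisect_right(gpa_keys, threshold)  # first index entry with gpa > threshold
    return [records[pos] for _, pos in gpa_index[start:]]

matches = students_with_gpa_above(3.3)
print(matches)
```

    The matching records come back in gpa order, so their positions in the sid-ordered table jump around; that scattered access is what distinguishes an unclustered index from a clustered one.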

    Using Indexes
    - Need to choose attributes to index wisely!
    - Examine transaction requirements
      - Update vs. query?
      - Volume of rows updated or queried
      - Frequency of a query
      - Transaction rates
      - User priorities
    - Index usage (Oracle)
      - Not used with NULLs in the WHERE clause
      - Nor used if mathematics is applied to the indexed attribute
    - What queries could benefit most from an index?
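
    The point about mathematics on an indexed attribute can be observed directly. The slide describes Oracle; the sketch below assumes SQLite instead, which behaves the same way for this simple case. The student table, the index name and the plan helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (sid INTEGER, name TEXT, gpa REAL)")
conn.execute("CREATE INDEX idx_student_gpa ON student (gpa)")

def plan(sql):
    """Return SQLite's plan description for a query as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)

# Arithmetic applied to the indexed column: the index on gpa is not used.
plan_math = plan("SELECT name FROM student WHERE gpa * 10 > 33")

# Same condition with the arithmetic moved to the constant side: the index is used.
plan_plain = plan("SELECT name FROM student WHERE gpa > 3.3")

print(plan_math)
print(plan_plain)
```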

  • Query Optimization

    Query Optimization
    Objective: find the optimum set of access paths to retrieve the required data. This applies to updates as well as queries.
    Reasons for automating the optimization process:
    - The machine can use more information
    - Re-optimization is easier following data re-organization
    - The optimizer can evaluate more solutions than a non-automated process (i.e. the user)
    - Automation makes expertise more widely available

    Query Optimization
    - There is still scope for human intervention
    - Results are still very dependent on programmer skills, because query syntax dramatically affects the access path choices, e.g. whether or not an index is used
    - The term "optimization" can be an overclaim

    Example
    Get names of suppliers who supply part P2:
    SELECT DISTINCT s.sname FROM s, sp WHERE s.s# = sp.s# AND sp.p# = 'P2';
    - The database contains 100 suppliers and 10,000 shipments, 50 of which are for part P2
    - Consider how the query would be evaluated without optimization

    Unoptimized
    1. Compute the Cartesian product of S and SP
       - Involves reading the 10,000 SP tuples 100 times, resulting in 1,000,000 tuple reads
       - The product will contain 1,000,000 tuples, which will need to be written back to disk
    2. Apply the restriction in the WHERE clause
       - Involves 1,000,000 tuple reads but gives a 50-row result, which can stay in memory
    3. Project the result of step 2 over sname to give the final result, containing at most 50 tuples, which again can remain in memory

    Optimized
    1. Restrict SP to those tuples containing P2
       - Involves 10,000 tuple reads, but the result has 50 rows, which stay in memory
    2. Join the result of step 1 to relation S over s#
       - Involves 100 tuple reads and results in 50 tuples, still in memory
    3. Project the result of step 2 over sname to give a final result of at most 50 tuples
    The optimized version is about 300 times faster in terms of tuple I/O: the unoptimized version needs about 3,000,000 I/Os, whereas the optimized version needs around 10,100.
    A restriction followed by a join, instead of a product followed by a restriction, has produced a dramatic improvement.
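
    The two evaluation strategies can be compared with a small simulation. This is a sketch under stated assumptions: the supplier and shipment data are generated to match the slide's counts (100 suppliers, 10,000 shipments, 50 of them for P2), every tuple read or write is charged as one I/O, and the Cartesian product is streamed rather than stored, although it is still charged as if it were written to disk and read back.

```python
# 100 suppliers (s#, sname) and 10,000 shipments (s#, p#), 50 of them for P2.
suppliers = [(f"S{i}", f"Supplier{i}") for i in range(1, 101)]
shipments = [
    (f"S{i % 100 + 1}", "P2" if i < 50 else f"P{3 + i % 7}")
    for i in range(10_000)
]

def unoptimized():
    """Cartesian product, then restriction, then projection over sname."""
    io = 0
    names = set()
    for s_no, sname in suppliers:
        for sp_s_no, p_no in shipments:
            io += 1  # read one SP tuple while building the product
            io += 1  # write the product tuple back to disk
            io += 1  # read the product tuple again to apply the restriction
            if s_no == sp_s_no and p_no == "P2":
                names.add(sname)
    return names, io

def optimized():
    """Restriction on SP first, then join with S, then projection over sname."""
    io = 0
    p2_suppliers = set()
    for sp_s_no, p_no in shipments:
        io += 1  # read each SP tuple once
        if p_no == "P2":
            p2_suppliers.add(sp_s_no)
    names = set()
    for s_no, sname in suppliers:
        io += 1  # read each S tuple once for the join
        if s_no in p2_suppliers:
            names.add(sname)
    return names, io

slow_names, slow_io = unoptimized()
fast_names, fast_io = optimized()
print(slow_io, fast_io, round(slow_io / fast_io))
```

    Both plans return the same set of names; only the amount of tuple I/O differs.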

    Example: Indexing
    - If SP were indexed or hashed on p#, the tuples read in step 1 would be 50 rather than 10,000, and the optimized version would be around 20,000 times faster than the unoptimized one (50 + 100 = 150 tuple reads against 3,000,000)
    - Also, the I/Os in step 2 could be reduced to at most 50 (for example with an index or hash on S.s#)
    - In practice, block I/Os are what count
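
    The "20,000 times" figure can be checked with a line of arithmetic. The sketch below assumes the same tuple I/O counting as the previous slides; the second ratio (step 2 also reduced to 50 reads) is derived here and is not stated on the slide.

```python
UNOPTIMIZED_IO = 3_000_000        # tuple I/Os for the unoptimized plan

# Step 1 through an index or hash on SP.p# (50 reads), step 2 still reads all 100 S tuples.
io_step1_indexed = 50 + 100

# Step 2 also reduced to at most 50 reads.
io_both_steps_indexed = 50 + 50

speedup_step1_indexed = UNOPTIMIZED_IO / io_step1_indexed
speedup_both_indexed = UNOPTIMIZED_IO / io_both_steps_indexed
print(speedup_step1_indexed, speedup_both_indexed)
```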

  • SQL Tuning

    SQL Optimizations: Basics
    - Use column names instead of * in SELECT
    - Try to minimize the number of subquery blocks in your query
    - Try to use UNION ALL in place of UNION
    - Avoid !=, <> or NOT: the query is unable to use indexes even if one exists
    - Avoid DISTINCT
    - Avoid HAVING where a WHERE clause will do, e.g. write the query as:
      SELECT subject, count(subject) FROM student_details
      WHERE subject != 'Science' AND subject != 'Maths'
      GROUP BY subject;
      instead of:
      SELECT subject, count(subject) FROM student_details
      GROUP BY subject
      HAVING subject != 'Science' AND subject != 'Maths';

    http://beginner-sql-tutorial.com/sql-query-tuning.htm
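
    The WHERE and HAVING forms of the query return the same rows; the difference is that WHERE discards unwanted rows before grouping. A minimal sketch, assuming SQLite and a handful of made-up student_details rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student_details (name TEXT, subject TEXT)")
# Hypothetical sample rows for the illustration.
conn.executemany(
    "INSERT INTO student_details VALUES (?, ?)",
    [("Ann", "Science"), ("Bob", "Maths"), ("Cy", "History"),
     ("Di", "History"), ("Ed", "Art")],
)

# Preferred form: rows are filtered before they are grouped.
with_where = conn.execute(
    "SELECT subject, count(subject) FROM student_details "
    "WHERE subject != 'Science' AND subject != 'Maths' "
    "GROUP BY subject ORDER BY subject"
).fetchall()

# Form to avoid: every subject is grouped first, then groups are discarded.
with_having = conn.execute(
    "SELECT subject, count(subject) FROM student_details "
    "GROUP BY subject "
    "HAVING subject != 'Science' AND subject != 'Maths' "
    "ORDER BY subject"
).fetchall()

print(with_where)
print(with_having)
```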

    - Use IN instead of OR for a non-indexed column
      Better: ... WHERE column_name IN (val1, val2, val3)
      Than:   ... WHERE column_name = val1 OR column_name = val2 ...
    - Use LIMIT 1 or EXISTS instead of IN or TOP
    - Preferably use 'prefix' pattern matches
    - Avoid use of functions on the left-hand side of a comparison
      Better: ... WHERE first_name LIKE 'Cha%';
      Than:   ... WHERE SUBSTR(first_name, 1, 3) = 'Cha';
    - Avoid functions on indexed columns
      Better: ... WHERE event_date >= '2011/03/15' - INTERVAL 7 DAY
      Than:   ... WHERE TO_DAYS(CURRENT_DATE) - TO_DAYS(event_date) <= 7

    Prepared statements: get a performance benefit
    - Does not need to be 're-parsed' each time
    - Protects the application against SQL injection attacks
    - (In MySQL) Transmitted in a native binary form -> more efficient & helps reduce network delays
    - BUT cannot be used by the query cache (in MySQL)

    // Assumes an open java.sql.Connection named connection
    PreparedStatement updateemp = connection.prepareStatement("insert into emp values(?,?)");
    updateemp.setInt(1, 23);           // bind the first placeholder
    updateemp.setString(2, "Roshan");  // bind the second placeholder
    updateemp.executeUpdate();

    http://www.roseindia.net/jdbc/jdbc-mysql/TwicePreparedStatement.shtml
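
    The same idea can be sketched outside JDBC. The example below assumes Python's sqlite3 module, where ? placeholders play the role of the PreparedStatement parameters; the emp table layout and the hostile input string are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER, name TEXT)")

# The SQL text stays fixed; only the bound values change between executions.
insert_emp = "INSERT INTO emp VALUES (?, ?)"
conn.execute(insert_emp, (23, "Roshan"))

# A hostile-looking value is stored as plain data, not executed as SQL.
hostile = "x'); DROP TABLE emp; --"
conn.execute(insert_emp, (24, hostile))

rows = conn.execute("SELECT id, name FROM emp ORDER BY id").fetchall()
print(rows)
```

    Because the values are bound rather than pasted into the SQL string, the emp table survives the second insert and the odd string is simply stored as a name.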

    Demos & Walkthrus
    - View queries executed, check the query plan, change the query and re-execute:
      http://www.mysql.com/products/enterprise/demo.html
    - View Query Execution Plan:
      http://www.codeproject.com/Articles/9990/SQL-Tuning-Tutorial-Understanding-a-Database-Execu
    - How to find if a query is worth optimizing?
      http://www.mysqlperformanceblog.com/2012/09/11/how-to-find-mysql-queries-worth-optimizing/
    - Slow Query Log and indexing walkthru:
      http://www.dreamhost.com/dreamscape/2013/08/19/mysql-checking-the-slow-query-log-and-simple-indexing/

  • Distributed Query Processing

    Distributed Query Processing
    Consider "get London suppliers of red parts":
    - The user is at the New York site; the data is in London
    - n suppliers satisfy the criteria
    - A relational system involves 2 messages

    A non-relational system

    Distributed Query Processing
    - Optimization is an important issue
    - There may be many ways of moving the data around, e.g. with Rx at X and Ry at Y:
      - Rx -> Y
      - Ry -> X
      - Rx, Ry -> Z

    Distributed Query Processing

    Relation       Attributes    Tuples     Site
    Suppliers (S)  (S#, CITY)    10,000     A
    Parts (P)      (P#, COLOUR)  100,000    B
    Supplies (SP)  (S#, P#)      1,000,000  A

    - Every tuple is 200 bits long
    - There are 10 red parts
    - There are 100,000 shipments by London suppliers
    - Data transfer rate is 50,000 bps
    - Access delay is 0.1 second
    Query: Find London suppliers of red parts.
    Total time (t) = access delay + (data volume / data rate)

    Distributed Query Processing
    Distributed Query Optimization:
    1. Move relation P to site A and process:
       0.1 + (100,000 x 200) / 50,000 = 400.1 s (about 6.67 minutes)
    2. Move relations S and SP to site B and process:
       0.1 + ((10,000 + 1,000,000) x 200) / 50,000 = 4040.1 s (about 1.12 hours)
    3. Restrict P at site B (to give 10 red parts) and move the result to site A:
       0.1 + (10 x 200) / 50,000 = 0.14 second!
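
    The three strategies can be recomputed from the formula t = access delay + (data volume / data rate). The sketch below uses the figures from the slides; the function and variable names are made up, and a single access delay is charged per strategy, as in the slide's arithmetic.

```python
TUPLE_BITS = 200          # every tuple is 200 bits long
DATA_RATE_BPS = 50_000    # data transfer rate in bits per second
ACCESS_DELAY_S = 0.1      # access delay in seconds

def transfer_time(tuples_moved):
    """Total time = access delay + (data volume / data rate)."""
    return ACCESS_DELAY_S + (tuples_moved * TUPLE_BITS) / DATA_RATE_BPS

move_p_to_a = transfer_time(100_000)                     # all of P to site A
move_s_and_sp_to_b = transfer_time(10_000 + 1_000_000)   # S and SP to site B
move_red_parts_to_a = transfer_time(10)                  # only the 10 red parts

for label, seconds in [
    ("Move P to A", move_p_to_a),
    ("Move S and SP to B", move_s_and_sp_to_b),
    ("Restrict P at B, move result to A", move_red_parts_to_a),
]:
    print(f"{label}: {seconds:.2f} s")
```

    Restricting at the remote site before shipping anything cuts the transfer from minutes or hours to a fraction of a second, which is the same "restrict early" principle as in the centralized example.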
