07.Overview of Query Processing

Date post: 14-Apr-2018
Upload: rukayat-gbemisola-adebayo
  • 7/30/2019 07.Overview of Query Processing

    1/35

    Distributed Database Systems, Autumn 2008

    Chapter 7

    Overview of Query Processing

    SQL: Non-Procedural Language of RDB

    Tuple calculus

    $\{\, t \mid F(t) \,\}$ where:

    t: tuple variable

    F(t): well-formed formula

    Example

    Get the number and name of all managers:

    $\{\, t[\mathrm{ENO}, \mathrm{ENAME}] \mid t \in \mathrm{EMP} \land t[\mathrm{TITLE}] = \text{"MANAGER"} \,\}$

    Domain calculus

    $\{\, x_1, x_2, \ldots, x_n \mid F(x_1, x_2, \ldots, x_n) \,\}$ where:

    $x_i$: domain variables

    $F(x_1, x_2, \ldots, x_n)$: well-formed formula

    Example

    $\{\, x, y \mid E(x, y, \text{"manager"}) \,\}$

    Variables are position sensitive!

    SQL is a tuple calculus language

    SELECT ENO, ENAME
    FROM EMP
    WHERE TITLE = 'manager'

    End users use non-procedural languages to express queries.
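As a minimal sketch of what "non-procedural" means here, the query above can be run against an in-memory SQLite database and compared with a set-style comprehension that mirrors the tuple calculus form. The sample rows and the function names `managers_sql` and `managers_calculus` are illustrative assumptions, not part of the slides.

```python
import sqlite3

# Hypothetical sample rows (ENO, ENAME, TITLE); not taken from the slides.
SAMPLE_EMP = [
    ("E1", "J. Doe", "manager"),
    ("E2", "M. Smith", "analyst"),
    ("E3", "A. Lee", "manager"),
]


def managers_sql(rows):
    """Run the SQL query from the slide against an in-memory EMP table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE EMP (ENO TEXT, ENAME TEXT, TITLE TEXT)")
    conn.executemany("INSERT INTO EMP VALUES (?, ?, ?)", rows)
    result = conn.execute(
        "SELECT ENO, ENAME FROM EMP WHERE TITLE = 'manager' ORDER BY ENO"
    ).fetchall()
    conn.close()
    return result


def managers_calculus(rows):
    """Comprehension analogue: keep t[ENO, ENAME] for tuples t with TITLE 'manager'."""
    return sorted((eno, ename) for eno, ename, title in rows if title == "manager")


if __name__ == "__main__":
    print(managers_sql(SAMPLE_EMP))
    print(managers_calculus(SAMPLE_EMP))
```

Both functions state which tuples are wanted and leave the access method to the underlying engine.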

    Query Processor

    The query processor transforms queries into procedural operations that access data.

    A distributed query processor has to deal with

    query decomposition, and

    data localization

    7.1 Query Processing Problems

    A centralized query processor must transform the calculus query into algebra operations and choose the best execution plan.

    Example:

    SELECT ENAME
    FROM E, G
    WHERE E.ENO = G.ENO
    AND RESP = 'manager'

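    Relational Algebra 1

    $\Pi_{ENAME}(\sigma_{RESP = \text{"Manager"} \land E.ENO = G.ENO}(E \times G))$

    Relational Algebra 2

    $\Pi_{ENAME}(E \bowtie_{ENO} \sigma_{RESP = \text{"Manager"}}(G))$

    Execution plan 2 is better because it avoids the Cartesian product and consumes fewer resources.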
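The difference between the two plans can be illustrated by counting the intermediate tuples each one materializes. This is only a sketch: the relations `E` and `G`, their contents, and the function names `plan1` and `plan2` are hypothetical, and the hash lookup in `plan2` assumes ENO is unique in E.

```python
from itertools import product

# Hypothetical sample data: E(ENO, ENAME) and G(ENO, RESP).
E = [(f"E{i}", f"name{i}") for i in range(1, 9)]
G = [(f"E{i}", "Manager" if i % 4 == 0 else "Engineer") for i in range(1, 9)]


def plan1(e_rel, g_rel):
    """Cartesian product first, then selection, then projection on ENAME."""
    pairs = list(product(e_rel, g_rel))          # |E| * |G| intermediate tuples
    selected = [(e, g) for e, g in pairs
                if g[1] == "Manager" and e[0] == g[0]]
    return sorted({e[1] for e, _ in selected}), len(pairs)


def plan2(e_rel, g_rel):
    """Selection on G first, then join on ENO, then projection on ENAME."""
    managers = [g for g in g_rel if g[1] == "Manager"]   # small intermediate result
    by_eno = {e[0]: e for e in e_rel}                    # assumes ENO is unique in E
    joined = [(by_eno[g[0]], g) for g in managers if g[0] in by_eno]
    return sorted({e[1] for e, _ in joined}), len(managers)


if __name__ == "__main__":
    print(plan1(E, G))   # same names, large intermediate result
    print(plan2(E, G))   # same names, small intermediate result
```

Both functions return the same ENAME list; only the size of the intermediate result differs.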

    In a DDB, the query processor must consider the communication cost and select the best site!

    Same query as in the last example, but G and E are distributed.

    Simple plan:

    Transport all fragments to the query site and execute there. This causes too much network traffic and is very costly.
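A rough way to see why the simple plan is costly is to count the tuples that cross the network. The sketch below compares it with one possible alternative that filters G locally before shipping anything. The fragment layout (E and G split on the same ENO ranges, each pair of fragments on neighbouring sites), the sample data, and the function names are all assumptions made for illustration.

```python
# Hypothetical fragments: E(ENO, ENAME) and G(ENO, RESP), split on the same ENO ranges.
E_FRAGMENTS = [
    [(i, f"name{i}") for i in range(1, 5)],
    [(i, f"name{i}") for i in range(5, 9)],
]
G_FRAGMENTS = [
    [(i, "Manager" if i % 4 == 0 else "Engineer") for i in range(1, 5)],
    [(i, "Manager" if i % 4 == 0 else "Engineer") for i in range(5, 9)],
]


def ship_everything(e_fragments, g_fragments):
    """Tuples transferred when every fragment is moved to the query site."""
    return sum(len(f) for f in e_fragments) + sum(len(f) for f in g_fragments)


def filter_then_ship(e_fragments, g_fragments):
    """Tuples transferred when each G fragment is filtered at its own site,
    the manager tuples go to the site of the matching E fragment, and only
    the joined result travels to the query site."""
    shipped = 0
    for e_frag, g_frag in zip(e_fragments, g_fragments):
        managers = [g for g in g_frag if g[1] == "Manager"]
        shipped += len(managers)                  # G site -> E site
        enos = {g[0] for g in managers}
        result = [e for e in e_frag if e[0] in enos]
        shipped += len(result)                    # E site -> query site
    return shipped


if __name__ == "__main__":
    print(ship_everything(E_FRAGMENTS, G_FRAGMENTS))
    print(filter_then_ship(E_FRAGMENTS, G_FRAGMENTS))
```

Counting tuples ignores tuple width and per-message overhead, so this is a comparison of volume only, not a full cost model.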

    Distributed Query Example

    Distribution of E and G

    Query

    $\Pi_{ENAME}(E \bowtie_{ENO} \sigma_{RESP = \text{"Manager"}}(G))$

    Optimized Processing

    7.2 Objectives of Query Processing

    Two-fold objectives:

    Transformation, and

    Optimization

    Costs to be considered for optimization:

    CPU time

    I/O time, and

    Communication time

    WAN: communication time is dominant

    LAN: all three are of comparable weight
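The three cost components are commonly combined into a weighted sum, with communication time split into a per-message part and a per-byte part. The sketch below assumes that form; the unit costs for `WAN` and `LAN`, the workload figures, and all names are hypothetical values chosen only to show how the balance shifts.

```python
from dataclasses import dataclass


@dataclass
class UnitCosts:
    cpu: int   # time per CPU instruction
    io: int    # time per disk I/O
    msg: int   # fixed time to initiate and receive one message
    tr: int    # time to transmit one byte


def time_breakdown(c, insts, ios, msgs, nbytes):
    """Return (CPU time, I/O time, communication time) for one execution plan."""
    cpu_time = c.cpu * insts
    io_time = c.io * ios
    comm_time = c.msg * msgs + c.tr * nbytes
    return cpu_time, io_time, comm_time


def total_time(c, insts, ios, msgs, nbytes):
    """Total time is the sum of the three components."""
    return sum(time_breakdown(c, insts, ios, msgs, nbytes))


# Hypothetical unit costs in arbitrary time units.
WAN = UnitCosts(cpu=1, io=100, msg=10000, tr=10)
LAN = UnitCosts(cpu=1, io=100, msg=100, tr=1)

if __name__ == "__main__":
    workload = dict(insts=1000, ios=10, msgs=2, nbytes=1000)
    print(time_breakdown(WAN, **workload), total_time(WAN, **workload))
    print(time_breakdown(LAN, **workload), total_time(LAN, **workload))
```

With these assumed numbers the communication term dwarfs the others on the WAN, while on the LAN all three terms are of similar size.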

    7.3 Complexity of Relational Algebra Operations

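    Complexity is measured in terms of n (relation cardinality), assuming tuples are sorted on the comparison attributes.

    Operation                                           Complexity
    Select, Project (without duplicate elimination)     O(n)
    Project (with duplicate elimination), Group         O(n log n)
    Join, Semi-join, Division, Set operators            O(n log n)
    Cartesian product                                   O(n^2)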
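To illustrate where the O(n log n) bound for a join comes from, here is a sketch of a sort-merge equi-join next to a naive nested-loop join that compares every pair. The relation contents and function names are assumptions. Note that the merge phase is linear only apart from producing the output, which can itself be large when many tuples share a key.

```python
def nested_loop_join(r, s):
    """Compare every pair of tuples: O(n^2) comparisons."""
    return [(a, b) for a in r for b in s if a[0] == b[0]]


def sort_merge_join(r, s):
    """Sort both inputs on the join key (O(n log n)), then merge them."""
    r = sorted(r, key=lambda t: t[0])
    s = sorted(s, key=lambda t: t[0])
    out, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        if r[i][0] < s[j][0]:
            i += 1
        elif r[i][0] > s[j][0]:
            j += 1
        else:
            key = r[i][0]
            # Find the end of the run of equal keys on the s side.
            j_end = j
            while j_end < len(s) and s[j_end][0] == key:
                j_end += 1
            # Pair every r tuple with this key against that run.
            while i < len(r) and r[i][0] == key:
                out.extend((r[i], s[k]) for k in range(j, j_end))
                i += 1
            j = j_end
    return out


# Hypothetical inputs; the first attribute is the join key.
R = [(3, "c"), (1, "a"), (2, "b"), (2, "bb")]
S = [(2, "x"), (3, "y"), (2, "z"), (4, "w")]

if __name__ == "__main__":
    print(sort_merge_join(R, S))
```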

    7.4 Characterization of Query Processor

    7.4.1 Languages

    For users:

    calculus- or algebra-based languages.

    For the query processor:

    map the input into an internal form of algebra augmented with communication primitives.

    7.4.2 Types of Optimization

    Exhaustive search

    Workable for a small solution space

    Heuristics

    Perform selection ($\sigma$) and projection ($\Pi$) first, use semi-joins, etc.; suited to a large solution space
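The trade-off between the two approaches can be sketched on join ordering. The example enumerates every left-deep order exhaustively and compares the result with a greedy rule that always adds the relation giving the smallest intermediate result. The greedy rule is a stand-in chosen for brevity, not the selection-first heuristic named above. The cardinalities, selectivities, and cost model (sum of intermediate result sizes) are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, permutations

# Hypothetical statistics: relation cardinalities and pairwise join selectivities.
CARD = {"A": 1000, "B": 100, "C": 10}
SEL = {
    frozenset("AB"): Fraction(1, 100),
    frozenset("BC"): Fraction(1, 10),
    frozenset("AC"): Fraction(1),
}


def size(rels):
    """Estimated cardinality of the join of the given relations."""
    n = Fraction(1)
    for r in rels:
        n *= CARD[r]
    for a, b in combinations(sorted(rels), 2):
        n *= SEL[frozenset((a, b))]
    return n


def plan_cost(order):
    """Cost of a left-deep plan: sum of all intermediate result sizes."""
    return sum(size(order[:k]) for k in range(2, len(order) + 1))


def exhaustive(rels):
    """Try every ordering; n! plans, so only workable for few relations."""
    return min(permutations(sorted(rels)), key=plan_cost)


def greedy(rels):
    """Start from the smallest relation, then add the cheapest next join."""
    remaining = set(rels)
    order = [min(sorted(remaining), key=lambda r: CARD[r])]
    remaining.remove(order[0])
    while remaining:
        nxt = min(sorted(remaining), key=lambda r: size(order + [r]))
        order.append(nxt)
        remaining.remove(nxt)
    return tuple(order)


if __name__ == "__main__":
    print(exhaustive(CARD), plan_cost(exhaustive(CARD)))
    print(greedy(CARD), plan_cost(greedy(CARD)))
```

On this small input the greedy order happens to cost the same as the exhaustive optimum; in general a heuristic gives no such guarantee.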

    7.4.3 Optimization Timing

    Static

    Done at compile time using statistics; appropriate for exhaustive search; the query is optimized once but executed many times.

    Dynamic

    Done at execution time; accurate, but repeated for every execution and therefore expensive.

    7.4.4 Statistics

    Facts about the database, such as

    Cardinalities

    Attribute value distribution

    Size of relation, etc.

    Provided to the query optimizer and periodically updated.

    7.4.5 Decision Site

    Query optimization may be done by

    a single site (centralized approach), or

    all the sites involved (distributed approach), or

    a hybrid approach:

    one site makes the major decisions in cooperation with other sites, which make local decisions

    7.4.6 Exploitation of the Network Topology

    WAN

    communication cost is dominant

    LAN

    communication cost is comparable to I/O cost. Broadcasting capability and special topologies (star network, satellite network) should be considered.

    7.4.7 Exploitation of Replicated Fragments

    Use replicas to minimize communication costs.

    7.4.8 Use of Semi-joins

    Semi-joins reduce the size of operand relations and so cut communication costs, provided their overhead is not significant.
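A semi-join based join between two sites can be sketched as follows: the join-attribute values of one relation are sent to the other site, the second relation is reduced there, and only the reduced relation is sent back. The relations `R` and `S`, their placement on two sites, and the function name are assumptions. The count treats a key value and a full tuple as one item each, which understates the saving when tuples are wide.

```python
def semijoin_transfer(r, s):
    """Join r (at site 1) with s (at site 2) on the first attribute.

    Returns (join_result, items_shipped).
    """
    keys = {t[0] for t in r}                      # projection of r on the join attribute, sent to site 2
    s_reduced = [t for t in s if t[0] in keys]    # s semi-join r, computed at site 2
    shipped = len(keys) + len(s_reduced)          # keys one way, reduced s back to site 1
    result = [(a, b) for a in r for b in s_reduced if a[0] == b[0]]
    return result, shipped


# Hypothetical relations; the first attribute is the join key.
R = [(1, "a"), (2, "b"), (3, "c")]
S = [(k, f"s{k}") for k in range(1, 11)]

if __name__ == "__main__":
    result, shipped = semijoin_transfer(R, S)
    print(len(result), shipped, "versus shipping all of S:", len(S))
```

The semi-join pays off here because few S tuples match; if most of S matched, the extra step of sending the keys would be pure overhead.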

    7.5 Layers of Query Processing

    7.5.1 Query Decomposition

    Decompose the calculus query into an algebra query using global conceptual schema information.

    Step 1: calculus normalization

    Step 2: semantic analysis to reject incorrect queries

    Step 3: simplification to eliminate redundant components

    Step 4: translation of the calculus query into an optimized algebra query.
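Step 3 relies on logical equivalences: a redundant qualification can be replaced by a simpler one only if both give the same truth value for every tuple. The sketch below does not implement a simplifier. It only checks, by brute force over truth assignments, that a candidate simplification is equivalent to the original qualification. The predicates and the function name `equivalent` are illustrative assumptions.

```python
from itertools import product


def equivalent(f, g, nvars):
    """True if predicates f and g agree on every truth assignment."""
    return all(f(*v) == g(*v) for v in product([False, True], repeat=nvars))


# p stands for a simple predicate such as TITLE = 'manager', q for another one.
def original(p, q):
    return (p and q) or (p and not q)   # redundant qualification


def simplified(p, q):
    return p                            # candidate simplification


if __name__ == "__main__":
    print(equivalent(original, simplified, 2))
```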

    7.5.2 Data Localization

    The distributed query is mapped into a fragment query, which is then simplified to produce a good one.
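As a sketch of localization followed by simplification, assume a relation E is horizontally fragmented on ENO ranges. The global relation is first replaced by the union of its fragments, and fragments whose defining range cannot satisfy the query predicate are then dropped. The fragment names, ranges, and function names are hypothetical.

```python
# Hypothetical horizontal fragmentation of E on inclusive ENO ranges.
FRAGMENT_RANGES = {"E1": (1, 3), "E2": (4, 6), "E3": (7, 9)}


def localize(fragment_ranges):
    """Replace the global relation by the union of all its fragments."""
    return sorted(fragment_ranges)


def reduce_for_point_query(fragment_ranges, eno):
    """Keep only the fragments whose defining range can contain `eno`."""
    return [
        name
        for name in localize(fragment_ranges)
        if fragment_ranges[name][0] <= eno <= fragment_ranges[name][1]
    ]


if __name__ == "__main__":
    print(localize(FRAGMENT_RANGES))                    # fragments in the union
    print(reduce_for_point_query(FRAGMENT_RANGES, 5))   # fragments left for ENO = 5
```

For a selection on ENO = 5 only one fragment survives, so the other sites need not be contacted at all.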

    7.5.3 Global Query Optimization

    Find an execution strategy close to optimal.

    Find the best ordering of operations in the fragment query, including communication operations.

    A cost function, defined in terms of time, is required.

    7.5.4 Local Query Optimization

    Centralized system algorithms

    (to be discussed in chapter 9)

    7.6 Conclusions

    The query processor must be able to find a good execution plan for a calculus query such that CPU time, I/O time, and communication time are minimized.

    Method: layering of

    decomposition

    localization

    global query optimization

    local query optimization

