Querying Large Databases Rukmini Kaushik. Purpose Research for efficient algorithms and software...

Post on 01-Jan-2016

214 views 0 download

transcript

Querying Large Databases

Rukmini Kaushik

Purpose

• Research for efficient algorithms and software architectures of query engines.

Query Execution Engine Architecture

• Query processing algorithms – physical algebra

• Data Model – logical algebra

Sorting & Hashing

• Both are memory intensive.

• Memory Concerns

- Merge Efficiency & memory

management.

- Hash table overflow

Aggregation and Duplicate Removal

• Aggregation Concept

Describes a set of objects with one value.• Algorithms

Three Types

- Nested Loops

- Sorting

- Hashing

Aggregation & Duplicate Removal

• Nested Loops - Easiest of the three - Doesn’t work well for large inputs• Sorting - Sort for common elements which results

in a simple duplicate removal. - Should remove duplicates as early as

possible.

Aggregation & Duplicate Removal

• Hashing

- Hash on group attributes.

- Can perform duplicate removal when creating hash table.

• Algorithm Analysis

Sorting and hashing functions are logarithmic with input size

Complex Query Execution Plan

• Purpose

- To schedule a query with several operations optimally

• Ideas

- Right-deep plans

- Left-deep plans

Complex Query Execution Plan

• Prediction

- Use a decision tree of sub-plans

- Done by using choose-plan operators

• Major Concern

- Optimal resource allocation

Parallel Query Execution Mechanism

• Goal

Obtain speed-up & scale-up

• Speed-up

- Uses extra hardware for constant size problem

- Linear speed-up is optimal

- Can be expressed as parallel efficiency

Parallel Query Execution Mechanism

• Scale-up

- Uses same resources with altered problem size

- Can be expressed as parallel efficiency.

Parallel Query Execution Mechanism

• Parallel Vs Distributed Systems

• Distributed

- Locally Autonomous

- Also uses Parallelism

Parallel Query Execution Mechanism

• Parallel

- One center of control

- Three types

Shared memory

Shared Disk

Distributed Memory

Parallel Query Execution Mechanism

• Three forms of parallelism

- Inter Query: Servicing multiple requests at the same time

- Inter Operator: Pipelining

- Intra Operator: Execute a single operator in multiple processors

Parallel Query Execution Mechanism

• Implementation

Bracket Models

Operator Models

• Bracket Model

Goal: Generic process template that receives and sends data and performs one operation at a time

Parallel Query Execution Mechanism

Number of inputs is limited to two

Can be run in parallel by having many templates in the system running simultaneously.

• Operator Model

Goal: Insert parallel operators in an ordered plan

Parallel Query Execution Mechanism

• Uses the exchange operator

• Exchange operator

- Does not manipulate data

- Provides capabilities for parallel query processing

- Changes a complex query into a single process

Parallel Algorithms

• Idea: More focus on algorithms and parallel execution

• Parallel selections and updates

- Disk input and output should be made parallel

- Selection: Maintain indices near stored data

- Updates: Use keys for partitioning attributes

Parallel Algorithms

• Parallel Sorting:

-classified by

- number of parallel inputs

- number of parallel outputs

- Algorithms consists of local sort and a data exchange step

Parallel Algorithms

- Major Concern

- Deadlock can be avoided by using range partitioning

- having a sufficient size data exchange buffer

- using a modified sort algorithm

Query Optimization

• Uses the differences between logical and physical aspects

• Must keep track of the properties of the inputs

• Cost models focus on throughput measures

Tuning query performance

• Focus

- Guidelines for improving query performance

• Guidelines for three points of view

- implementor and vendor

- database administrator

- application programmer

Tuning Query Performance

• Implementor

System should support indexing and clustering

Query optimizer should be reliable and accurate

• Administrator

Ensure usage of system facilities

Tuning Query Performance

carefully choose physical database design

provide available and efficient processing resources

• Application Programmer

Provide high level queries