CS405G: Introduction to Database Systems
Final Review
04/22/23 Jinze Liu @ University of Kentucky 2
Database Design
E-R model
• E-R model• Entities
• Attributes
• Relationships
04/22/23 Jinze Liu @ University of Kentucky 4
04/22/23 5
Database Design
504/22/23
From E-R Diagram to Relations
• Relations• Schemas• Converting E-R diagram to relations• Keys
• Super keys• Candidate keys• Primary keys
• Relational integrity constraints
Key Constraints
• Superkey: – (Uniqueness constraints) A set of attributes where
no two distinct tuples can have the same values– Every relation has at least one superkey:
• The set of all attributes.
• Key: A minimal superkey– Uniqueness constraint (superkey)– Minimum Constraint
• No attribute can be removed and still satisfy the uniqueness constraints.
04/22/23 7Jinze Liu @ University of Kentucky
04/22/23 8
Relational Integrity Constraints
• Constraints are conditions that must hold on all valid relation instances. There are four main types of constraints:
1. Domain constraints1. The value of a attribute must come from its domain
2. Key constraints3. Entity integrity constraints4. Referential integrity constraints
804/22/23
Database Normalization
• Functional Dependency• Functional Closure• Keys
– Redefined– Based on functional dependency
• DB Norm Form– 1st, 2nd, 3rd, BCNF
04/22/23 Luke Huan Univ. of Kansas 10
Database Query
Relational Algebra and SQL
• Relational algebra• SQL query
SFW Group by …, Having Subqueries
Relationship between R.A. and SQL
04/22/23 Jinze Liu @ University of Kentucky 12
Relational algebra
• Core set of operators:– Selection, projection, cross product, union, difference, and
renaming• Additional, derived operators:
– Join, natural join, intersection, etc.• Compose operators to make complex queries
RelOp
RelOp
A language for querying relational databases based on operators:
04/22/23 Jinze Liu @ University of Kentucky 13
Summary of core operators• Selection:• Projection:• Cross product:• Union:• Difference: • Renaming:
– Does not really add “processing” power
σp R
πL R
R X S
R SR - Sρ S(A1, A2, …) R
04/22/23 Jinze Liu @ University of Kentucky 14
Summary of derived operators
• Join: • Natural join:• Intersection:
R p S
R S
R S
04/22/23 Jinze Liu @ University of Kentucky 15
Classification of relational operators
• Selection: σp R
• Projection: πL R
• Cross product: R X S• Join: R p S
• Natural join: R S• Union: R U S• Difference: R - S• Intersection: R ∩ S
MonotoneMonotoneMonotoneMonotoneMonotoneMonotoneMonotone w.r.t. R; non-monotone w.r.t S
Monotone
04/22/23 Jinze Liu @ University of Kentucky 16
Update Operations on Relations
• Update operations– INSERT a tuple.– DELETE a tuple.– MODIFY a tuple.
• Constraints should not be violated in updates
04/22/23 Luke Huan Univ. of Kansas 17
Basic queries: SFW statement
• SELECT A1, A2, …, An
FROM R1, R2, …, Rm
WHERE condition;• Also called an SPJ (select-project-join) query• (almost) Equivalent to relational algebra query
π A1, A2, …, An (σ condition (R1 X R2 X … X Rm))
04/22/23 Luke Huan Univ. of Kansas 18
Semantics of SFW
• SELECT E1, E2, …, En
FROM R1, R2, …, Rm
WHERE condition;• For each t1 in R1:
For each t2 in R2: … … For each tm in Rm: If condition is true over t1, t2, …, tm: Compute and output E1, E2, …, En as a row
• t1, t2, …, tm are often called tuple variables
• Not 100% correct, we will see
04/22/23 Jinze Liu @ University of Kentucky 19
Operational semantics of GROUP BY
SELECT … FROM … WHERE … GROUP BY …;
• Compute FROM • Compute WHERE • Compute GROUP BY: group rows according to
the values of GROUP BY columns• Compute SELECT for each group
– For aggregation functions with DISTINCT inputs, first eliminate duplicates within the group
Number of groups = number of rows in the final output
04/22/23 Jinze Liu @ University of Kentucky 20
Database Design
21
physical data organization• Storage hierarchy (DC vs. Pluto) ! count I/O’s• Disk geometry: three components of access cost;
random vs. sequential I/O• Data layout
– Record layout (handling variable-length fields, NULL’s)– Block layout (NSM, PAX) ! inter-/intra-record locality
• Access paths– Primary versus secondary indexes– Tree-based indexes: ISAM, B+-tree! Again, reintroduce redundancy to improve performance! Fundamental trade-off: query versus update cost
Performance Issues on Indexes
• Indexes– ISAM– B+ Tree
• Metrics – Storage– IO-costs
• Operations– Single value query & range query– Insertion and deletion
22
Query Processing Implementation
• Typical Query Processings– Selection– Join– Set operations.
• Typical approaches– Sequential scans in unsorted database– Sorted database– What are the tradeoffs.
23