1JYP
Database System ConceptsDatabase System Concepts
Instructor:
Jin Yuanping [email protected] Professor of College of Software Engineering, Southeast University
The reference book:
Database System Concepts (4th Edition),
A. Silberschatz , et al, McGraw-Hill, 2002
2JYP
The Content of the Course1 Introduction
2 Entity-Relationship Model
3 Relational Model
4 SQL
5 Integrity Constraints
6 Transactions
7 Concurrency Control
8 Database System Architecture
9 New Applications
10 XML
3JYP
Topic 1: IntroductionTopic 1: Introduction
Database System
View of Data
Data Models
Data Definition Language
Data Manipulation Language
Transaction Management
Storage Management
Database Administrator
Database Users
Overall System Structure
4JYP
A Database System A Database System
A collection of interrelated data (DB)
A set of programs to access the data (DBMS)
A set of application programs
A group of people to administrate it (DBA)
DB contains information about a particular enterprise
DBMS provides an environment that is both convenient and efficient to use.
5JYP
View of DataView of Data
An architecture for a database system
6JYP
Levels of AbstractionLevels of Abstraction
Physical level describes how a record (e.g., customer) is stored.
Logical level: describes data stored in database, and the relationships among the data.
type customer = recordname : string;street : string;city : integer;
end;
View level: application programs hide details of data types. Views can also hide information (e.g., salary) for security purposes.
7JYP
Instances and SchemasInstances and Schemas
Similar to types, variables and values in programming languages
Schema – the logical structure of the database (e.g., set of customers and accounts and the relationship between them)
Instance – the actual content of the database at a particular point in time
8JYP
Data IndependenceData Independence
Ability to modify a schema definition in one level without affecting a schema definition in the next higher level.
The interfaces between the various levels and components should be well defined so that changes in some parts do not seriously influence others.
Two levels of data independence: Physical data independence
Logical data independence
9JYP
Data ModelsData Models
A collection of concepts and tools for describing data
data relationships
data semantics
data constraints
Entity-relationship model
Object-oriented model
Relational model (Oracle, DB2)
Semi-structured model (XML)
10JYP
Entity-Relationship ModelEntity-Relationship Model
Example of entity-relationship model
11JYP
Relational ModelRelational Model
Example of tabular data in the relational model
JonesSmithCurry
Lindsay
customer-name
MainNorthNorthPark
customer-street
HarrisonRyeRye
Pittsfield
customer-city
customer
12JYP
XML DataXML Data
<bank> <account>
<account-number> A-101 </account-number> <branch-name> Downtown </branch-name> <balance> 500 </balance>
</account> <depositor>
<account-number> A-101 </account-number> <customer-name> Johnson </customer-name>
</depositor> </bank>
13JYP
Data Definition Language (DDL)Data Definition Language (DDL)
Specification notation for defining the database schema
DDL compiler generates a set of tables
Data dictionary contains metadata (i.e., data about data)
Data storage and definition language – special type of DDL in which the storage structure and access methods used by the database system are specified
14JYP
Data Manipulation Language (DML)Data Manipulation Language (DML)
Language for accessing and manipulating the data organized by the appropriate data model
Two classes of languages Procedural – user specifies what data is required and how to get
those data
Nonprocedural – user specifies what data is required without specifying how to get those data
15JYP
Transaction ManagementTransaction Management
A transaction is a collection of operations that performs a single logical function in a database application
Transaction-management component ensures that the database remains in a consistent (correct) state despite system failures (e.g., power failures and operating system crashes) and transaction failures.
Concurrency-control manager controls the interaction among the concurrent transactions, to ensure the consistency of the database.
16JYP
Storage ManagementStorage Management
Storage manager is a program module that provides the interface between the low-level data stored in the database and the application programs and queries submitted to the system.
The storage manager is responsible for the following tasks: interaction with the file manager
efficient storing, retrieving and updating of data
17JYP
Database AdministratorDatabase Administrator
Coordinates all the activities of the database system; the database administrator has a good understanding of the enterprise’s information resources and needs.
Database administrator's duties include: Schema definition
Storage structure and access method definition
Schema and physical organization modification
Granting user authority to access the database
Specifying integrity constraints
Acting as liaison with users
Monitoring performance and responding to changes in requirements
18JYP
Database UsersDatabase Users
Users are differentiated by the way they expect to interact with the system
Application programmers – interact with system through DML calls
Sophisticated users – form requests in a database query language
Specialized users – write specialized database applications that do not fit into the traditional data processing framework
Naïve users – invoke one of the permanent application programs that have been written previously
19JYP
Overall System StructureOverall System Structure
20JYP
Overall System Structure (Cont.)Overall System Structure (Cont.)
The storage manager components also include:
Authorization and integrity manager
21JYP
Topic 2: Entity-Relationship ModelTopic 2: Entity-Relationship Model
Entity Sets
Relationship Sets
Mapping Constraints
Keys
E-R Diagram
Design of an E-R Database Schema
Reduction of an E-R Schema to Tables
22JYP
Entity SetsEntity Sets
A database can be modeled as: a collection of entities,
relationship among entities.
An entity is an object that exists and is distinguishable from other objects.
Example: specific person, company, event, plant
An entity set is a set of entities of the same type that share the same properties.
Example: set of all persons, companies, trees, holidays
23JYP
Entity Sets Customer and LoanEntity Sets Customer and Loan
24JYP
AttributesAttributes
An entity is represented by a set of attributes, that is descriptive properties possessed by all members of an entity set.
Example:
customer = (customer-name, social-security, customer-street, customer-city)
account = (account-number, balance)
Domain – the set of permitted values for each attribute
Attribute types: Simple and composite attributes.
Single-valued and multi-valued attributes.
Null attributes.
Derived attributes.
25JYP
Composite Attributes Composite Attributes customer-namecustomer-name and and customer-addresscustomer-address
26JYP
Relationship SetsRelationship Sets
A relationship is an association among several entitiesExample:
Hayes (Hayes, A-102) A-102customer depositor account
entity set relationship set entity set
A relationship set is a mathematical relation among n 2 entity sets.
{(e1, e2, … en) | e1 E1, e2 E2, …, en En}
where (e1, e2, …, en) is a relationship Example:
(Hayes, A-102) depositor
27JYP
Relationship Sets (Cont.)Relationship Sets (Cont.)
an attribute can also be property of a relationship set. For instance, the depositor relationship set between entity sets customer and account may have the attribute access-date
E-R Diagram with an Attribute Attached to a Relationship
28JYP
Relationship set Relationship set borrowerborrower
29JYP
Degree of a Relationship SetDegree of a Relationship Set
Refers to number of entity sets that participate in a relationship set.
Relationship sets that involve two entity sets are binary (or degree two). Generally, most relationship sets in a database system are binary.
Relationship sets may involve more than two entity sets. The entity sets customer, loan, and branch may be linked by the ternary (degree three) relationship set CLB.
30JYP
RolesRoles
The labels “manager” and “worker” are called roles; they specify how employee entities interact via the works-for relationship set.
Roles are indicated in E-R diagrams by labeling the lines that connect diamonds to rectangles.
Role labels are optional, and are used to clarify semantics of the relationship
Entity sets of a relationship need not be distinct
31JYP
Mapping CardinalitiesMapping Cardinalities
Express the number of entities to which another entity can be associated via a relationship set.
Most useful in describing binary relationship sets.
For a binary relationship set the mapping cardinality must be one of the following types: One to one
One to many
Many to one
Many to many
We distinguish among these types by drawing either a directed line (), signifying “one,” or an undirected line (—), signifying “many,” between the relationship set and the entity set.
32JYP
Mapping CardinalitiesMapping Cardinalities
One to one One to many
33JYP
Mapping Cardinalities Mapping Cardinalities
Many to one Many to many
34JYP
One-To-One RelationshipOne-To-One Relationship
A customer is associated with at most one loan via the relationship borrower
A loan is associated with at most one customer via borrower
35JYP
One-To-Many and Many-To-One One-To-Many and Many-To-One RelationshipsRelationships
36JYP
One-To-Many and Many-To-One (Cont.)One-To-Many and Many-To-One (Cont.)
In the one-to-many relationship (a), a loan is associated with at most one customer via borrower, a customer is associated with several (including 0) loans via borrower
In the many-to-one relationship (b), a loan is associated with several (including 0) customers via borrower, a customer is associated with at most one loan via borrower
37JYP
Many-To-Many RelationshipMany-To-Many Relationship
A customer is associated with several (possibly 0) loans via borrower
A loan is associated with several (possibly 0) customers via borrower
38JYP
Existence DependenciesExistence Dependencies
If the existence of entity x depends on the existence of entity y, then x is said to be existence dependent on y. y is a dominant entity (in example below, loan)
x is a subordinate entity (in example below, payment)
loan-payment paymentloan
If a loan entity is deleted, then all its associated payment entities must be deleted also.
39JYP
KeysKeys
A super key of an entity set is a set of one or more attributes whose values uniquely determine each entity.
A candidate key of an entity set is a minimal super key social-security is candidate key of customer
account-number is candidate key of account
Although several candidate keys may exist, one of the candidate keys is selected to be the primary key.
The combination of primary keys of the participating entity sets forms a candidate key of a relationship set. must consider the mapping cardinality and the semantics of the
relationship set when selecting the primary key.
(social-security, account-number) is the primary key of depositor)
40JYP
E-R Diagram ComponentsE-R Diagram Components
Rectangles represent entity sets.
Ellipses represent attributes
Diamonds represent relationship sets.
Lines link attributes to entity sets and entity sets to relationship sets.
Double ellipses represent multivalued attributes.
Dashed ellipses denote derived attributes.
Primary key attributes are underlined.
41JYP
Weak Entity SetsWeak Entity Sets
An entity set that does not have a primary key is referred to as a weak entity set.
The existence of a weak entity set depends on the existence of a strong entity set, it must relate to the strong set via a one-to-many relationship set.
The discriminator (or partial key) of a weak entity set is the set of attributes that distinguishes among all the entities of a weak entity set that depend on one particular strong entity.
The primary key of a weak entity set is formed by the primary key of the strong entity set on which the weak entity set is existence dependent, plus the weak entity set’s discriminator.
42JYP
Weak Entity Sets (Cont.)Weak Entity Sets (Cont.) We depict a weak entity set by double rectangles.
We underline the discriminator of a weak entity set with a dashed line.
payment-number – discriminator of the payment entity set
Primary key for payment – (loan-number, payment-number)
43JYP
SpecializationSpecialization
Top-down design process, we designate subgroupings within an entity set that are distinctive from other entities in the set.
These subgroupings become lower-level entity sets that have attributes or participate in relationships that do not apply to the higher-level entity set.
Depicted by a triangle component labeled ISA (i.e., saving-account “is an” account).
44JYP
Specialization ExampleSpecialization Example
45JYP
GeneralizationGeneralization
A bottom-up design process – combine a number of entity sets that share the same features into a higher-level entity set.
Specialization and generalization are simple inversions of each other; they are represented in an E-R diagram in the same way.
Attribute inheritance – a lower-level entity set inherits all the attributes and relationship participation of the higher-level entity set to which it is linked.
46JYP
Design Constraints on a GeneralizationDesign Constraints on a Generalization
Constraint on which entities can be members of a given lower-level entity set. condition-defined user-defined
Constraint on whether or not entities may belong to more than one lower-level entity set within a single generalization. disjoint overlapping
Completeness constraint – specifies whether or not an entity in the higher-level entity set must belong to at least one of the lower-level entity sets within a generalization.
total partial
47JYP
E-R Diagram for Banking EnterpriseE-R Diagram for Banking Enterprise
48JYP
Reduction of an E-R Schema to TablesReduction of an E-R Schema to Tables
Primary keys allow entity sets and relationship sets to be expressed uniformly as tables which represent the contents of the database.
A database which conforms to an E-R diagram can be represented by a collection of tables.
For each entity set and relationship set there is a unique table which is assigned the name of the corresponding entity set or relationship set.
Each table has a number of columns (generally corresponding to attributes), which have unique names.
Converting an E-R diagram to a table format is the basis for deriving a relational database design from an E-R diagram.
49JYP
Representing Entity Sets as TablesRepresenting Entity Sets as Tables
A strong entity set reduces to a table with the same attributes.
JonesSmithHayes
321-12-3123019-28-3746677-89-9011
MainNorthMain
HarrisonRye
Harrison
customer-name social-security c-street c-city
A weak entity set becomes a table that includes a column for the primary key of the identifying strong entity set.
L-17L-23L-15
51122
10 May 199617 May 199612 May 1996
loan-number payment-number payment-date
5075
300
payment-amount
50JYP
Representing Relationship Sets as TablesRepresenting Relationship Sets as Tables A many-to-many relationship set is represented as a table with
columns for the primary keys of the two participating entity sets, and any descriptive attributes of the relationship set.
social-security account-number access-date
... ... ...
The depositor table
The table corresponding to a relationship set linking a weak entity set to its identifying strong entity set is redundant.The payment table already contains the information that would appear in the loan-payment table (i.e., the columns loan-number and payment-number).
51JYP
Representing Generalization as TablesRepresenting Generalization as Tables Method 1: Form a table for the generalized entity account.
Form a table for each entity set that is specialized (include primary key of generalized entity set)
table table attributesaccount account-number, balance, account-typesavings-account account-number, interest-ratechecking-account account-number, overdraft-amount
Method 2: If the generalization is disjoint and total, form a table for each entity set that is specialized
table table attributessavings-account account-number, balance, interest-ratechecking-account account-number, balance, overdraft-
amountMethod 2 has no table for generalized entity account
52JYP
Topic 3: Relational ModelTopic 3: Relational Model
Structure of Relational Databases
Relational Algebra
Extended Relational-Algebra-Operations
Modification of the Database
Views
53JYP
Basic StructureBasic Structure
Given sets A1, A2, …. An, a relation r is a subset of
A1 x A2 x … x An
Thus a relation is a set of n-tuples (a1, a2, …, an)
where ai Ai
Example: if
customer-name = {Jones, Smith, Curry, Lindsay}customer-street = {Main, North, Park}customer-city = {Harrison, Rye, Pittsfield}
Then
r = {(Jones, Main, Harrison), (Smith, North, Rye), (Curry, North, Rye),
(Lindsay, Park, Pittsfield)}
is a relation over
customer-name x customer-street x customer-city
54JYP
Relation SchemaRelation Schema
A1, A2, …, An are attributes and also as their domains
R = (A1, A2, …, An ) is a relation schema
Customer-schema = (customer-name, customer-street, customer-city)
r(R) is a relation on the relation schema R
customer (Customer-schema)
55JYP
Relation InstanceRelation Instance
The current values (relation instance) of a relation are specified by a table
An element t of r is a tuple, represented by a row in a table
JonesSmithCurry
Lindsay
customer-name
MainNorthNorthPark
customer-street
HarrisonRyeRye
Pittsfield
customer-city
customer
56JYP
KeysKeys
Let K R
K is a superkey of R if values for K are sufficient to identify a unique tuple of each possible relation r(R) by “possible r” we mean a relation r that could exist in the enterprise we are modeling.Example: {customer-name, customer-street} and {customer-name} are both superkeys of Customer, if no two customers can possibly have the same name.
K is a candidate key if K is minimalExample: {customer-name} is a candidate key for Customer, since it is a superkey (assuming no two customers can possibly have the same name), and no subset of it is a superkey.
57JYP
Determining Keys from E-R SetsDetermining Keys from E-R Sets
Strong entity set. The primary key of the entity set becomes the primary key of the relation.
Weak entity set. The primary key of the relation consists of the union of the primary key of the strong entity set and the discriminator of the weak entity set.
Relationship set. The union of the primary keys of the related entity sets becomes a super key of the relation.For binary many-to-one relationship sets, the primary key of the “many” entity set becomes the relation’s primary key.For one-to-one relationship sets, the relation’s primary key can be that of either entity set.
58JYP
Query LanguagesQuery Languages
Language in which user requests information from the database.
Typically a level higher than a standard programming language.
Categories of languages procedural
non-procedural
“Pure” languages---concise: Relational Algebra
Tuple Relational Calculus
Domain Relational Calculus
Pure languages form underlying basis of query languages that people use.
59JYP
Relational AlgebraRelational Algebra
Procedural language
Six basic operators select
project
union
set difference
Cartesian product
rename
The operators take one or two relations as inputs and give a new relation as a result.
60JYP
Select OperationSelect Operation
Notation: p(r)
Defined as:
p(r) = {t | t r and p(t)}
Where P is a formula in propositional calculus, dealing with terms of the form:
<attribute> = <attribute> or <constant>
>
<
“connected by” : (and), (or), (not)
61JYP
Select Operation – ExampleSelect Operation – Example
• Relation r A B C D
15
1223
773
10
A=B ^ D > 5 (r)A B C D
123
710
62JYP
Project OperationProject Operation
Notation:
A1, A2, …, Ak (r)
where A1, A2 … Ak are attribute names and r is a relation name.
The result is defined as the relation of k columns obtained by erasing the columns that are not listed
Duplicate rows removed from result, since relations are sets
63JYP
Project Operation – ExampleProject Operation – Example
Relation r: A B C
10203040
1112
A C
1112
=
A C
112
)( , rCA
64JYP
Union OperationUnion Operation
Notation: r s
Defined as:
r s = {t | t r or t s}
For r s to be valid.
1. r, s must have the same number of attributes
2. The attribute domains must be compatible (e.g., 2nd column of r deals with the same type of values as does the 2nd column of s)
65JYP
Union Operation – ExampleUnion Operation – Example
Relations r, s:
r s:
A B
121
A B
23
rs
A B
1213
66JYP
Set Difference OperationSet Difference Operation
Notation r – s
Defined as:
r – s = {t | t r and t s}
Set differences must be taken between compatible relations. r and s must have the same number of attributes
attribute domains of r and s must be compatible
67JYP
Set Difference Operation – ExampleSet Difference Operation – Example
Relations r, s:
r – s:
A B
121
A B
23
rs
A B
11
68JYP
Cartesian-Product OperationCartesian-Product Operation
Notation r x s
Defined as:
r x s = {t q| t r and q s}
Assume that attributes of r(R) and s(S) are disjoint. (That is, R S = ).
If attributes of r(R) and s(S) are not disjoint, then renaming must be used.
69JYP
Cartesian-Product Operation – Cartesian-Product Operation – ExampleExample
Relations r, s:
r x s:
A B
12
A B
11112222
C D
1019201010102010
E
++––++––
C D
10102010
E
++––r
s
70JYP
Composition of OperationsComposition of Operations
Can build expressions using multiple operations
Example: A=C(r x s)
Notation: r s (natural join) Let r and s be relations on schemas R and S respectively. The
result is a relation on schema R S which is obtained by considering each pair of tuples tr from r and ts from s.
If tr and ts have the same value on each of the attributes in R S, a tuple t is added to the result, where
t has the same value as tr on r
t has the same value as ts on s
71JYP
Composition of Operations (Cont.)Composition of Operations (Cont.)
Example:
R = (A, B, C, D)
S = (E, B, D)
Result schema = (A, B, C, D, E)
r s is defined as:
r.A, r.B, r.C ,r.D, s.E (c.B=s.B ^ r.D=s.D(r x s))
72JYP
Natural Join Operation – ExampleNatural Join Operation – Example
Relations r, s:
A B
12412
C D
aabab
B
13123
D
aaabb
E
r
A B
11112
C D
aaaab
E
s
r s
73JYP
Banking ExampleBanking Example
branch (branch-name, branch-city, assets)
customer (customer-name, customer-street, customer-city)
account (branch-name, account-number, balance)
loan (branch-name, loan-number, amount)
depositor (customer-name, account-number)
borrower (customer-name, loan-number)
74JYP
Extended Relational-Algebra-OperationsExtended Relational-Algebra-Operations
Generalized Projection
Outer Join
Aggregate Functions
75JYP
Generalized ProjectionGeneralized Projection
Extends the projection operation by allowing arithmetic functions to be used in the projection list.
P F1, F2, …, Fn(E)
E is any relational-algebra expression
Each of F1, F2, …, Fn are arithmetic expressions involving constants and attributes in the schema of E.
Given relation credit-info(customer-name, limit, credit-balance), find how much more each person can spend:
P customer-name, limit – credit-balance (credit-info)
76JYP
Outer JoinOuter Join
An extension of the join operation that avoids loss of information.
Computes the join and then adds tuples form one relation that does not match tuples in the other relation to the result of the join.
Uses null values: null signifies that the value is unknown or does not exist
All comparisons involving null are false by definition.
77JYP
Outer Join – ExampleOuter Join – Example
Relation loan
branch-name loan-number amount
DowntownRedwoodPerryridge
L-170L-230L-260
300040001700
Relation borrower
customer-name loan-number
JonesSmithHayes
L-170L-230L-155
78JYP
Outer Join – ExampleOuter Join – Example
loan borrower
branch-name loan-number amount
DowntownRedwood
L-170L-230
30004000
customer-name
JonesSmith
branch-name loan-number amount
DowntownRedwoodPerryridge
L-170L-230L-260
300040001700
customer-name
JonesSmithnull
loan borrower
79JYP
Outer Join – ExampleOuter Join – Example
loan borrower
branch-name loan-number amount
DowntownRedwoodnull
L-170L-230L-155
30004000null
customer-name
JonesSmithHayes
branch-name loan-number amount
DowntownRedwoodPerryridgenull
L-170L-230L-260L-155
300040001700null
customer-name
JonesSmithnullHayes
loan borrower
80JYP
Aggregate FunctionsAggregate Functions
Aggregation operator takes a collection of values and returns a single value as a result.
avg: average valuemin: minimum valuemax: maximum valuesum: sum of valuescount: number of values
G1, G2, …, Gn F1, A1, F2, A2, …, Fm, Am (E)
E is any relational-algebra expression
G1, G2 …, Gn is a list of attributes on which to group
Fi is an aggregate function
Ai is an attribute name
81JYP
Aggregate Function – ExampleAggregate Function – Example
Relation r:
A B
C
773
10
sumC(r)sum-C
27
82JYP
Aggregate Function – ExampleAggregate Function – Example
Relation account grouped by branch-name:
branch-name sum balance (account)
branch-name account-number balance
PerryridgePerryridgeBrightonBrightonRedwood
A-102A-201A-217A-215A-222
400900750750700
branch-name sum of balance
PerryridgeBrightonRedwood
13001500700
83JYP
Modification of the DatabaseModification of the Database
The content of the database may be modified using the following operations: Deletion
Insertion
Updating
All these operations are expressed using the assignment operator().
84JYP
DeletionDeletion
A delete request is expressed similarly to a query, except instead of displaying tuples to the user, the selected tuples are removed from the database.
Can delete only whole tuples; cannot delete values on only particular attributes
A deletion is expressed in relational algebra by:
r r – E
where r is a relation and E is a relational algebra query.
85JYP
Deletion ExamplesDeletion Examples
Delete all account records in the Perryridge branch.
account account – branch-name = “Perryridge” (account)
Delete all loan records with amount in the range of 0 to 50
loan loan – amount 0and amount 50 (loan)
Delete all accounts at branches located in Needham.
r1 branch-city = “Needham” (account branch)
r2 branch-name, account-number, balance (r1)
r3 customer-name, account-number (r2 depositor)
account account – r2
depositor depositor – r3
86JYP
InsertionInsertion
To insert data into a relation, we either: specify a tuple to be inserted
write a query whose result is a set of tuples to be inserted
in relational algebra, an insertion is expressed by:
r r E
where r is a relation and E is a relational algebra expression.
The insertion of a single tuple is expressed by letting E be a constant relation containing one tuple.
87JYP
Insertion ExamplesInsertion Examples
Insert information in the database specifying that Smith has $1200 in account A-973 at the Perryridge branch.
account account {(“Perryridge”, A-973, 1200)}
depositor depositor {(“Smith”, A-973)}
Provide as a gift for all loan customers in the Perryridge branch, a $200 savings account. Let the loan number serve as the account number for the new savings account.
r1 (branch-name = “Perryridge” (borrower loan))
account account ( branch-name, loan-number (r1) x{(200)})
depositor depositor customer-name, loan-number (r1)
88JYP
UpdatingUpdating
A mechanism to change a value in a tuple without changing all values in the tuple
Use the generalized projection operator to do this task
r F1, F2, …, Fn (r)
Each Fi, is either the ith attribute of r, if the ith attribute is not updated, or, if the attribute is to be updated Fi is an expression, involving only constants and the attributes of r, which gives the new value for the attribute
89JYP
Update ExamplesUpdate Examples
Make interest payments by increasing all balances by 5 percent.
account branch-name, account-number, balance * 1.05 (account)
Pay all accounts with balances over $10,0006 percent interest and pay all others 5 percent
account branch-name, account-number, balance * 1.06 ( balance 10000 (account)) branch-name, account-number, balance * 1.05 ( balance 10000 (account))
90JYP
ViewsViews
In some cases, it is not desirable for all users to see the entire logical model (i.e., all the actual relations stored in the database.)
Consider a person who needs to know a customer’s loan number and branch name but has no need to see the loan amount. This person should see a relation described, in the relational algebra, by
customer-name, loan-number, branch-name (borrower loan)
Any relation that is not of the conceptual model but is made visible to a user as a “virtual relation” is called a view.
91JYP
View DefinitionView Definition
A view is defined using the create view statement which has the form
create view v as <query expression>
where <query expression> is any legal relational algebra query expression. The view name is represented by v.
Once a view is defined, the view name can be used to refer to the virtual relation that the view generates.
View definition is not the same as creating a new relation by evaluating the query expression. Rather, a view definition causes the saving of an expression to be substituted into queries using the view.
92JYP
View ExamplesView Examples
Consider the view (named all-customer) consisting of branches and their customers.
create view all-customer as
branch-name, customer-name (depositor account)
branch-name, customer-name (borrower loan)
We can find all customers of the Perryridge branch by writing:
customer-name( branch-name = “Perryridge” (all-customer))
93JYP
Updates Through ViewUpdates Through View
Database modifications expressed as views must be translated to modifications of the actual relations in the database.
Consider the person who needs to see all loan data in the loan relation except amount. The view given to the person, branch-loan, is defined as:
create view branch-loan as
branch-name, loan-number (loan) Since we allow a view name to appear wherever a relation name
is allowed, the person may write:
branch-loan branch-loan {(“Perryridge”, L-37)}
94JYP
Updates Through Views (Cont.)Updates Through Views (Cont.)
The previous insertion must be represented by an insertion into the actual relation loan from which the view branch-loan is constructed.
An insertion into loan requires a value for amount. The insertion can be dealt with by either. rejecting the insertion and returning an error message to the user.
inserting a tuple (“Perryridge”, L-37, null) into the loan relation
95JYP
Views Defined Using Other ViewsViews Defined Using Other Views
One view may be used in the expression defining another view
A view relation v1 is said to depend directly on a view relation v2 if v2 is used in the expression defining v1 , this is depicted by a directed edge < v2 v1 >
A view relation v1 is said to depend on view relation v2 , if there is a path from v2 to v1 .
A view relation v is said to be recursive if it depends on itself.
96JYP
View ExpansionView Expansion
A way to define the meaning of views defined in terms of other views.
Let view v1 be defined by an expression e1 that may itself contain uses of view relations.
View expansion of an expression repeats the following replacement step:
repeatFind any view relation vi in e1
Replace the view relation vi by the expression defining vi until no more view relations are present in e1
As long as the view definitions are not recursive, this loop will terminate
97JYP
Topic 4: SQLTopic 4: SQL
SQL: Structured Query Language for commercial DBMS---more user-friendly
Basic Structure Set Operations Aggregate Functions Null Values Nested Subqueries Derived Relations Views Modification of the Database Joined Relations Data Definition Language Embedded SQL
98JYP
Basic Structure Basic Structure
SQL is based on set and relational operations with certain modifications and enhancements
A typical SQL query has the form:select A1, A2, ..., An
from r1, r2, ..., rm
where P
Ais represent attributes
ris represent relations
P is a predicate.
This query is equivalent to the relational algebra expression.
A1, A2, ..., An (P (r1 x r2 x ... x rm))
The result of an SQL query is a relation.
99JYP
The select ClauseThe select Clause
The select clause corresponds to the projection operation of the relational algebra. It is used to list the attributes desired in the result of a query.
Q1:Find the names of all branches in the loan relation
select branch-namefrom loan
the query is equivalent to:
branch-name(loan)
An asterisk in the select clause denotes “all attributes”
select * from loan
100JYP
The select Clause (Cont.)The select Clause (Cont.)
SQL allows duplicates in relations as well as in query results.
To force the elimination of duplicates, insert the keyword distinct after select.
Q2: Find the names of all branches in the loan relations, and remove duplicates
select distinct branch-namefrom loan
The keyword all specifies that duplicates not be removed.
select all branch-namefrom loan
101JYP
The select Clause (Cont.)The select Clause (Cont.)
The select clause can contain arithmetic expressions involving the operation +, –, , and /, and operating on constants or attributes of tuples---generalized projection.
Q3:
select branch-name, loan-number, amount 100 from loan
would return a relation which is the same as the loan relations, except that the attribute amount is multiplied by 100.
102JYP
The where ClauseThe where Clause
The where clause corresponds to the selection predicate of the relational algebra. It consists of a predicate involving attributes of the relations that appear in the from clause.
Q4: Find all loan number for loans made at the Perryridge branch with loan amounts greater than $1300.
select loan-numberfrom loanwhere branch-name = “ Perryridge” and amount >
1300
SQL uses the logical connectives and, or, and not. It allows the use of arithmetic expressions as operands to the comparison operators.
103JYP
The where Clause (Cont.)The where Clause (Cont.)
SQL Includes a between comparison operator in order to simplify where clauses that specify that a value be less than or equal to some value and greater than or equal to some other value.
Q5:Find the loan number of those loans with loan amounts between $900 and $1000 (that is, $900 and $1000)
select loan-numberfrom loanwhere amount between 900 and 1000
104JYP
The from ClauseThe from Clause
The from clause corresponds to the Cartesian product operation of the relational algebra. It lists the relations to be scanned in the evaluation of the expression.
Q6:Find the Cartesian product borrower x loan
select from borrower, loan
Q7:Find the name and loan number of all customers having a loan at the Perryridge branch.
select distinct customer-name, borrower.loan-numberfrom borrower, loanwhere borrower.loan-number = loan.loan-number and
branch-name = “Perryridge”
105JYP
The Rename OperationThe Rename Operation
The SQL mechanism for renaming attributes is accomplished through the as clause:
old-name as new-name
Q8: Find the name and loan number of all customers having a loan at the Perryridge branch; replace the column name loan-number with the name loan-id.
select distinct customer-name, borrower.loan-number as loan-idfrom borrower, loan
where borrower.loan-number = loan.loan-number and branch-name = “Perryridge”
106JYP
Tuple VariablesTuple Variables Tuple variables are defined in the from clause via the use of the
as clause. Q9:Find the customer names, their loan numbers and amount for
all customers having a loan at some branch.
select distinct customer-name, T.loan-number, S.amount from borrower as T, loan as S where T.loan-number = S.loan-number
Q10:Find the names of all branches that have greater assets than some branch located in Brooklyn.
select distinct T.branch-namefrom branch as T, branch as SWhere T.assets > S.assets and S.branch-city = “Brooklyn”
107JYP
String OperationsString Operations
SQL includes a string-matching operator for comparisons on character strings. Patterns are described using two special characters: percent (%). The % character matches any substring.
underscore (_). The _ character matches any character.
Find the names of all customers whose street includes the substring “Main”.
select customer-namefrom customerwhere customer-street like “%Main%”
Match the name “Main%”
like “Main\%” escape “\”
Note in ACCESS, “%” is replaced by “*”, see query X1.
108JYP
Ordering the Display of TuplesOrdering the Display of Tuples
Q11:List in alphabetic order the names of all customers having a loan in Perryridge branch
select distinct customer-namefrom borrower, loanwhere borrower.loan-number = loan.loan-number and branch-name = “Perryridge”order by customer-name
We may specify desc for descending order or asc for ascending order, for each attribute; ascending order is the default.
SQL must perform a sort to fulfil an order by request. Since sorting a large number of tuples may be costly. It is desirable to sort only when necessary.
109JYP
DuplicatesDuplicates
In relations with duplicates, SQL can define how many copies of tuples appear in the result.
Multiset versions of some of the relational algebra operators – given multiset relations r1 and r2:
1. If there are c1 copies of tuple t1 in r1, and t1 satisfies selection , then there are c1 copies of t1 in (r1).
2. For each copy of tuple t1 in r1, there is a copy of tuple A(t1) in A(r1) where A(t1) denotes the projection of the single tuple t1.
3. If there are c1 copies of tuple t1 in r1 and c2 copies of tuple t2 in r2, there are c1 x c2 copies of the tuple t1. t2 in r1 x r2
110JYP
Duplicates (Cont.)Duplicates (Cont.)
Suppose relations r1 with schema (A, B) and r2 with schema (C) are the following multisets:
r1 = {(1, a) (2,a)} r2 = {(2), (3), (3)}
Then B(r1) would be {(a), (a)}, while B(r1) x r2 would be
{(a,2), (a,2), (a,3), (a,3), (a,3), (a,3)}
SQL duplicate semantics:
select A1,, A2, ..., An
from r1, r1, ..., rm
where P
is equivalent to the multiset version of the expression:
A1,, A2, ..., An(P(r1 x r2 x ... x rm))
111JYP
Set OperationsSet Operations
The set operations union, intersect, and except operate on relations and correspond to the relational algebra operations
Each of the above operations automatically eliminates duplicates; to retain all duplicates use the corresponding multiset versions union all, intersect all and except all. Suppose a tuple occurs m times in r and n times in s, then, it occurs:
m + n times in r union all s
min (m, n) times in r intersect all s
112JYP
Set OperationsSet Operations
Q12: Find all customers who have a loan, an account, or both:
(select customer-name from depositor)union(select customer-name from borrower)
Find all customers who have both a loan and an account.
(select customer-name from depositor)intersect(select customer-name from borrower)
Find all customers who have an account but no loan.
select customer-name from depositor)except(select customer-name from borrower)
113JYP
Aggregate FunctionsAggregate Functions
These functions operate on the multiset of values of a column of a relation, and return a value
avg: average valuemin: minimum valuemax: maximum valuesum: sum of valuescount: number of values
114JYP
Aggregate Functions (Cont.)Aggregate Functions (Cont.)
Q13:Find the average account balance at the Brighton branch.
select avg (balance)from accountwhere branch-name = “Brighton”
Q14:Find the number of tuples in the customer relation.
select count (*)from customer
Find the number of depositors in the bank.
select count (distinct customer-name)
from depositor
115JYP
Aggregate Functions – Group ByAggregate Functions – Group By
Find the number of depositors for each branch.
select branch-name, count (distinct customer-name)
from depositor, accountwhere depositor.account-number =
account.account-numbergroup by branch-name
Note: Attributes in select clause outside of aggregate functions must appear in group by list
116JYP
Aggregate Functions – Having ClauseAggregate Functions – Having Clause
Q15: Find the names of all branches where the average account balance is more than $400.
select branch-name, avg (balance)from accountgroup by branch-namehaving avg (balance) > 400
Note: predicates in the having clause are applied after the formation of groups.
117JYP
Null ValuesNull Values
It is possible for tuples to have a null value, denoted by null, for some of their attributes, null signifies an unknown value or that a value does not exist.
The result of any arithmetic expression involving null is null.
Roughly speaking, all comparisons involving null return false. More precisely. Any comparison with null returns unknown
(unknown or unknown) = unknown,(true and unknown) = unknown, (false and unknown) = false, (unknown and unknown) = unknown
Result of where clause predicate is treated as false if it evaluates to unknown
“P is unknown” evaluates to true if predicate P evaluates to unknown
118JYP
Null Values (Cont.)Null Values (Cont.)
Find all loan number which appear in the loan relation with nulll values for amount.
select loan-numberfrom loanwhere amount is null
Total all loan amounts
select sum (amount)from loan
Above statement ignores null amounts; result is null if there is no non-null amount.
All aggregate operations except count(*) ignore tuples with null values on the aggregated attributes.
119JYP
Nested SubqueriesNested Subqueries
SQL provides a mechanism for the nesting of subqueries.
A subquery is a select-from-where expression that is nested within another query.
A common use of subqueries is to perform tests for set membership, set comparisons, and set cardinality.
120JYP
Set MembershipSet Membership
F in r t r (t = F)04
5
(5 in ) = true
04
6
(5 in ) = false
04
6
(5 not in ) = true
121JYP
Example QueryExample Query
Q16:Find all customers who have both an account and a loan at the bank.
select distinct customer-namefrom borrowerwhere customer-name in (select customer-name
from depositor)
Q17:Find all customers who have a loan at the bank but do not have an account at the bank
select distinct customer-namefrom borrowerwhere customer-name not in (select customer-name
from depositor)
122JYP
Example QueryExample Query
Find all customers who have both an account and a loan at the Perryridge branch.
select distinct customer-namefrom borrower, loanwhere borrower.loan-number = loan.loan-number and
branch-name = “Perryridge” and (branch-name, customer-name) in
(select branch-name, customer-namefrom depositor, accountwhere depositor.account-number =
account.account-number)
123JYP
The Some ClauseThe Some Clause
F <comp> some r t(t r [F <comp> t])Where <comp> can be:
056
(5< some ) = true
05
0
) = false
5
05(5 some ) = true (since 0 5)
(read: 5 < some tuple in the relation)
(5< some
) = true(5 = some
(= some) inHowever, ( some) not in
124JYP
Example QueryExample Query
Q18:Find all branches that have greater assets than some branch located in Brooklyn.
select branch-namefrom branchwhere assets >some
(select assetsfrom branchwhere branch-city = “Brooklyn”)
125JYP
The All ClauseThe All Clause
F <comp> all r t(t r [F <comp> t])
056
(5< all ) = false
610
4
) = true
5
46(5 all ) = true (since 5 4 and 5 6)
(5< all
) = false(5 = all
( all) not inHowever, (= all) in
126JYP
Example QueryExample Query
Q19:Find the names of all branches that have greater assets than all branches located in Horsenneck.
select branch-namefrom branchwhere assets > all
(select assetsfrom branchwhere branch-city = “Horsenneck”)
127JYP
Test for Empty RelationsTest for Empty Relations
The exists construct returns the value true if the argument subquery is nonempty.
exists r r Ø
not exists r r = Ø
128JYP
Example QueryExample Query
Find all customers who have an account at all branches located in Brooklyn.
select distinct S.customer-namefrom depositor as Swhere not exists (
(select branch-namefrom branchwhere branch-city = “Brooklyn”)
except(select R.branch-namefrom depositor as T, account as Rwhere T.account-number = R.account-number and
S.customer-name=T.customer-name))
Note that X – Y = Ø X Y
129JYP
Test for Absence of Duplicate TuplesTest for Absence of Duplicate Tuples
The unique construct tests whether a subquery has any duplicate tuples in its result.
Q20:Find all customers who have only one account at the Perryridge branch.
select T.customer-namefrom depositor as Twhere unique (
select R.customer-namefrom account, depositor as Rwhere T.customer-name = R.customer-name and
R.account-number = account.account-number
andaccount.branch-name = “Perryridge”)
Note: in ACCESS, it does not work when the subquery return more than one tuple.
130JYP
Example QueryExample Query
Find all customers who have at least two accounts at the Perryridge branch.
select distinct T.customer-namefrom depositor Twhere not unique (
select R.customer-namefrom account, depositor as Rwhere T.customer-name = R.customer-name and
R.account-number - account.account-number andaccount.branch-name = “Perryridge”)
131JYP
Derived RelationsDerived Relations
Find the average account balance of those branches where the average account balance is greater than $1200.
select branch-name, avg-balancefrom (select branch-name, avg (balance)
from accountgroup by branch-name)
as result (branch-name, avg-balance)where avg-balance > 1200
Note that we do not need to use the having clause, since we compute in the from clause the temporary relation result, and the attributes of result can be used directly in the where clause.
132JYP
ViewsViews
Provide a mechanism to hide certain data from the view of certain users. To create a view we use the command:
create view v as <query expression>
where: <query expression> is any legal expression
The view name is represented by v
133JYP
Example QueriesExample Queries
A view consisting of branches and their customerscreate view all-customer as
(select branch-name, customer-namefrom depositor, accountwhere depositor.account-number = account.account-number)
union(select branch-name, customer-namefrom borrower, loanwhere borrower.loan-number = loan.loan-number)
Find all customers of the Perryridge branch
select customer-namefrom all-customerwhere branch-name = “Perryridge”
134JYP
Modification of the Database – DeletionModification of the Database – Deletion
Delete all account records at the Perryridge branch
delete from depositorwhere account-number in (select account-number
from account where branch-name =
“Perryridge”)
delete from accountwhere branch-name = “Perryridge”
135JYP
Example DeletionExample Deletion
Delete all accounts at every branch located in Needham.
delete from depositorwhere account-number in (select account-number
from branch, accountwhere branch-city = “Needham”and branch.branch-name = account.branch-
name)
delete from accountwhere branch-name in (select branch-name
from branchwhere branch-city = “Needham”)
136JYP
Example Deletion (Cont.)Example Deletion (Cont.)
Delete the record of all accounts with balances below the average at the bank.
delete from accountwhere balance < (select avg (balance)
from account) Problem: as we delete tuples from account, the average balance
changes
Solutions used in SQL:
1. First, compute avg balance and find all tuples to delete
2. Next, delete all tuples found above (without recomputing avg or retesting the tuples)
137JYP
Modification of the Database – Modification of the Database – InsertionInsertion
Add a new tuple to account
insert into accountvalues (“Perryridge”, “A-9732”, 1200)
or equivalently
insert into account (branch-name, balance, account-number)values (“Perryridge”, 1200, “A-9732”)
Add a new tuple to account with balance set to null
insert into accountvalues (“Perryridge”, “A-777”, null)
138JYP
Modification of the Database – Modification of the Database – InsertionInsertion
Provide as a gift for all loan customers of the Perryridge branch, a $200 savings account. Let the loan number serve as the account number for the new savings account
insert into accountselect branch-name, loan-number, 200from loanwhere branch-name = “Perryridge”
insert into depositorselect customer-name, loan-numberfrom loan, borrowerwhere branch-name = “Perryridge”and loan.account-number = borrower.account-number
139JYP
Modification of the Database – UpdatesModification of the Database – Updates
Increase all accounts with balances over $10,000 by 6%, all other accounts receive 5%. Write two update statements:
update accountset balance = balance 1.06where balance > 10000
update accountset balance = balance 1.05where balance 10000
The order is important
140JYP
Update of a ViewUpdate of a View
Create a view of all loan data in loan relation, hiding the amount attribute
create view branch-loan asselect branch-name, loan-numberfrom loan
Add a new tuple to branch-loan
insert into branch-loanvalues (“Perryridge”,” L-307”)
This insertion must be represented by the insertion of the tuple
(“Perryridge”, “L-307”, null)
into the loan relation
Updates on more complex views are difficult or impossible to translate, and hence are disallowed.
141JYP
Joined RelationsJoined Relations
Join operations take two relations and return as a result another relation.
These additional operations are typically used as subquery expressions in the from clause.
Join condition – defines which tuples in the two relations match, and what attributes are present in the result of the join.
Join type – defines how tuples in each relation that do not match any tuple in the other relation (based on the join condition) are treated.
Join Types
inner joinleft outer joinright outer joinfull outer join
Join Conditions
naturalon <predicate>using (A1, A2, ..., An)
142JYP
Joined Relations – Datasets for Joined Relations – Datasets for ExamplesExamples
Relation loan
branch-name loan-number amount
Downtown
Redwood
Perryridge
L-170
L-230
L-260
3000
4000
1700
Relation borrower
customer-name loan-number
Jones
Smith
Hayes
L-170
L-230
L-155
143JYP
Joined Relations – Examples Joined Relations – Examples
Q21:loan inner join borrower onloan.loan-number = borrower.loan-number
branch-name loan-number amount
Downtown
Redwood
L-170
L-230
3000
4000
customer-name loan-number
Jones
Smith
L-170
L-230
branch-name loan-number amount
Downtown
Redwood
Perryridge
L-170
L-230
L-260
3000
4000
1700
customer-name loan-number
Jones
Smith
null
L-170
L-230
null
Q22: loan left outer join borrower onloan.loan-number = borrower.loan-number
144JYP
Joined Relations – ExamplesJoined Relations – Examples
loan natural inner join borrower
branch-name loan-number amount
Downtown
Redwood
L-170
L-230
3000
4000
customer-name
Jones
Smith
loan natural right outer join borrower
branch-name loan-number amount
Downtown
Redwood
null
L-170
L-230
L-155
3000
4000
null
customer-name
Jones
Smith
Hayes
145JYP
Joined Relations – ExamplesJoined Relations – Examples
loan natural full outer join borrower using (loan-number)
branch-name loan-number amount
Downtown
Redwood
Perryridge
null
L-170
L-230
L-260
L-155
3000
4000
1700
null
customer-name
Jones
Smith
null
Hayes
Find all customers who have either an account or a loan (but not both) at the bank.
select customer-namefrom (depositor natural full outer join borrower)where account-number is null or loan-number is null
146JYP
Data Definition Language (DDL)Data Definition Language (DDL)
The schema for each relation.
The domain of values associated with each attribute.
Integrity constraints
The set of indices to be maintained for each relations.
Security and authorization information for each relation.
The physical storage structure of each relation on disk.
Allows the specification of not only a set of relations but also information about each relation, including:
147JYP
Domain Types in SQLDomain Types in SQL
char(n). Fixed length character string, with user-specified length n.
varchar(n). Variable length character strings, with user-specified maximum length n.
integer. Integer (a finite subset of the integers that is machine-dependent).
smallint. Small integer (a machine-dependent subset of the integer domain type).
numeric(p,d). Fixed point number, with user-specified precision of p digits, with d digits to the right of decimal point.
148JYP
Domain Types in SQL (Cont.)Domain Types in SQL (Cont.)
real, double precision. Floating point and double-precision floating point numbers, with machine-dependent precision.
float(n). Floating point number, with user-specified precision of at least n digits.
date. Dates, containing a (4 digit) year, month and date.
time. Time of day, in hours, minutes and seconds. Null values are allowed in all the domain types. Declaring an
attribute to be not null prohibits null values for that attribute.
create domain construct in SQL-92 creates user-defined domain types
create domain person-name char(20) not null
149JYP
Create Table ConstructCreate Table Construct
An SQL relation is defined using the create table command:
create table r (A1 D1, A2 D2, ..., An Dn,(integrity-constraint1),...,(integrity-constraintk))
r is the name of the relation
each Ai is an attribute name in the schema of relation r
Di is the data type of values in the domain of attribute Ai
Example:
create table branch(branch-name char(15) not null,branch-city char(30),assets integer)
150JYP
Integrity Constraints in Create TableIntegrity Constraints in Create Table
not null
primary key (A1, ..., An)
check (P) , where P is a predicate
Example: Declare branch-name as the primary key for branch and ensure that the values of assets are non-negative.
create table branch(branch-name char(15) not null,branch-city char(30),assets integer,primary key (branch-name),check (assets >=0))
primary key declaration on an attribute automatically ensures not null in SQL-92
151JYP
Drop and Alter Table ConstructsDrop and Alter Table Constructs
The drop table command deletes all information about the dropped relation from the database.
The alter table command is used to add attributes to an existing relation. All tuples in the relation are assigned null as the value for the new attribute. The form of the alter table command is
alter table r add A D
where A is the name of the attribute to be added to relation r and D is the domain of A.
The alter table command can also be used to drop attributes of a relation
alter table r drop Awhen A is the name of an attribute of relation r.
152JYP
Embedded SQLEmbedded SQL
The SQL standard defines embeddings of SQL in a variety of programming languages such as Pascal, PL/I, Fortran, C, and Cobol.
A language to which SQL queries are embedded is referred to as a host language, and the SQL structures permitted in the host language comprise embedded SQL.
The basic form of these languages follows that of the system R embedding of SQL into PL/I.
EXEC SQL statement is used to identify embedded SQL request to the preprocessor
EXEC SQL <embedded SQL statement > END-EXEC
153JYP
Example QueryExample Query
Specify the query in SQL and declare a cursor for it
EXEC SQL
declare c cursor for select customer-name, account-numberfrom depositor, accountwhere depositor account-number = account.account-number
and account.balance > :amount
END-EXEC
From within a host language, find the names and account numbers of customers with more than the variable amount dollars in some account.
154JYP
Embedded SQL (Cont.)Embedded SQL (Cont.)
The open statement causes the query to be evaluated
EXEC SQL open c END-EXEC
The fetch statement causes the values of one tuple in the query result to be placed on host language variables.
EXEC SQL fetch c into :cn , :an END-EXECRepeated calls to fetch get successive tuples in the query result; a variable in the SQL communication area indicates when end-of-file is reached.
The close statement causes the database system to delete the temporary relation that holds the result of the query.
EXEC SQL close c END-EXEC
155JYP
Dynamic SQLDynamic SQL
Allows programs to construct and submit SQL queries at run time.
Example of the use of dynamic SQL from within a C program.
char * sqlprog = “ update account set balance = balance * 1.05 where account-number = ? ”EXEC SQL prepare dynprog from :sqlprog;char account [10] = “A-101”;EXEC SQL execute dynprog using :account;
The dynamic SQL program contains a ?, which is a place holder for a value that is provided when the SQL program is executed.
156JYP
Other SQL FeaturesOther SQL Features
Fourth-generation languages – special language to assist application programmers in creating templates on the screen for a user interface, and in formatting data for report generation, available in most commercial database products
SQL sessions – provide the abstraction of a client and a server (possibly remote) client connects to an SQL server, establishing a session
executes a series of statements
disconnects the session
can commit or rollback the work carried out in the session
157JYP
An SQL environment contains several components, including a user identifier, and a schema, which identifies which of several schemas a session is using.
the instructions appear in each individual transaction.
serializable schedules to be generated, while others allow view-serializable schedules that are not conflict-serializable.
158JYP
Topic 5: Integrity ConstraintsTopic 5: Integrity Constraints
Domain Constraints
Referential Integrity
Assertions
Triggers
Functional Dependencies
159JYP
Domain ConstraintsDomain Constraints
Integrity constraints guard against accidental damage to the database, by ensuring that authorized changes to the database do not result in a loss of data consistency.
Domain constraints are the most elementary form of integrity constraint.
They test values inserted in the database, and test queries to ensure that the comparisons make sense.
160JYP
Domain Constraints (Cont.)Domain Constraints (Cont.)
The check clause in SQL-92 permits domains to be restricted: Use check clause to ensure that an hourly-wage domain allows only
values greater than a specified value.
create domain hourly-wage numeric(5,2)constraint value-test check(value > = 4.00)
The domain hourly-wage is declared to be a decimal number with 5 digits, 2 of which are after the decimal point
The domain has a constraint that ensures that the hourly-wage is greater than or equal to 4.00
The clause constraint value-test is optional; useful to indicate which constraint an update violated.
161JYP
Referential IntegrityReferential Integrity
Ensures that a value that appears in one relation for a given set of attributes also appears for a certain set of attributes in another relation. Example: If “Perryridge” is a branch name appearing in one of the
tuples in the account relation, then there exists a tuple in the branch relation for branch “Perryridge”.
Formal Definition
Let r1(R1) and r2(R2) be relations with primary keys K1 and K2 respectively.
The subset of R2 is a foreign key referencing K1 in relation r1, if for every t2 in r2 there must be a tuple t1 in r1 such that t1[K1] = t2[].
Referential integrity constraint: (r2) K1 (r1)
162JYP
referential Integrity in the E-R Modelreferential Integrity in the E-R Model
Consider relationship set R between entity sets E1 and E2. The relational schema for R includes the primary keys K1 of E1 and K2 of E2.Then K1 and K2 form foreign keys on the relational schemas for E1 and E2 respectively.
Weak entity sets are also a source of referential integrity constraints. For the relation schema for a weak entity set must include the primary key of the entity set on which it depends.
163JYP
Database ModificationDatabase Modification
The following tests must be made in order to preserve the following referential integrity constraint:
(r2) K1 (r1)
Insert. If a tuple t2 is inserted into r2, the system must ensure that there is a tuple t1 in r1 such that t1[K1] = t2[]. That is
t2 [] K1 (r1)
Delete. If a tuple, t1 is deleted from r1, the system must compute the set of tuples in r2 that reference t1:
= t1[K1] (r2)
If this set is not empty, either the delete command is rejected as an error, or the tuples that reference t1 must themselves be deleted (cascading deletions are possible).
164JYP
Database Modification (Cont.)Database Modification (Cont.)
Update. There are two cases:
If a tuple t2 is updated in relation r2 and the update modifies values
for foreign key , then a test similar to the insert case is made. Let t2’ denote the new value of tuple t2. The system must ensure that
t2’[] K1(r1)
If a tuple t1 is updated in r1, and the update modifies values for the
primary key (K1), then a test similar to the delete case is made. The
system must compute
= t1[K1] (r2)
using the old value of t1 (the value before the update is applied). If
this set is not empty, the update may be rejected as an error, or the update may be cascaded to the tuples in the set, or the tuples in the set may be deleted.
165JYP
Referential Integrity in SQLReferential Integrity in SQL
Primary and candidate keys and foreign keys can be specified as part of the SQL create table statement: The primary key clause of the create table statement includes a
list of the attributes that comprise the primary key.
The unique key clause of the create table statement includes a list of the attributes that comprise a candidate key.
The foreign key clause of the create table statement includes both a list of the attributes that comprise the foreign key and the name of the relation referenced by the foreign key.
166JYP
Referential Integrity in SQL – ExampleReferential Integrity in SQL – Example
create table customer(customer-name char(20) not null,customer-street char(30),customer-city char(30),primary key (customer-name))
create table branch(branch-name char(15) not null,branch-city char(30),assets integer,primary key (branch-name))
167JYP
Referential Integrity in SQL – Example (Cont.)Referential Integrity in SQL – Example (Cont.)
create table account(branch-name char(15),account-number char(10) not null,balance integer,primary key (account-number), foreign key (branch-name) references branch)
create table depositor(customer-name char(20) not null,account-number char(10) not null,primary key (customer-name, account-number),foreign key (account-number) references account,foreign key (customer-name) references customer)
168JYP
Cascading Actions in SQLCascading Actions in SQL
create table account
. . .foreign key(branch-name) references branch
on delete cascadeon update cascade.
. . . )
Due to the on delete cascade clauses, if a delete of a tuple in branch results in referential-integrity constraint violation, the delete “cascades” to the account relation, deleting the tuple that refers to the branch that was deleted.
Cascading updates are similar.
169JYP
Cascading Actions in SQL (Cont.)Cascading Actions in SQL (Cont.)
If there is a chain of foreign-key dependencies across multiple relations, with on delete cascade specified for each dependency, a deletion or update at one end of the chain can propagate across the entire chain.
If a cascading update to delete causes a constraint violation that cannot be handled by a further cascading operation, the system aborts the transaction. As a result, all the changes caused by the transaction and its cascading actions are undone.
170JYP
AssertionsAssertions
An assertion is a predicate expressing a condition that we wish the database always to satisfy.
An assertion in SQL-92 takes the form
create assertion <assertion-name> check <predicate>
When an assertion is made, the system tests it for validity. This testing may introduce a significant amount of overhead; hence assertions should be used with great care.
171JYP
Assertion ExampleAssertion Example
The sum of all loan amounts for each branch must be less than the sum of all account balances at the branch.
create assertion sum-constraint check(not exists (select * from branch
where (select sum(amount) from loanwhere loan.branch-name = branch.branch-name)
>=(select sum(balance) from accountwhere account.branch-name = branch.branch-name)))
172JYP
Assertion ExampleAssertion Example Every loan has at least one customer who maintains an account
with a minimum balance of $1000.
create assertion balance-constraint check
(not exists (select * from loan
where not exists ( select *
from borrower, depositor, account
where loan.loan-number = borrower.loan-number
and borrower.customer-name = depositor.customer-name
and depositor.account-number = account.account-number
and account.balance >=1000)))
Note if a loan has no customer who maintains an account with a
minimum balance of $1000, the sub-query will evaluate to .
173JYP
Assertion ExampleAssertion Example All borrower of every loan should maintain an account with a
minimum balance of $1000.
create assertion balance-constraint check
(not exists (select * from loan
where exists ( select *
from borrower, depositor, account
where loan.loan-number = borrower.loan-number
and borrower.customer-name = depositor.customer-name
and depositor.account-number = account.account-number
and account.balance <1000))
and not exists ( select * from loan
where not exists ( select *
from borrower, depositor
where borrower.customer-name = depositor.customer-name
and borrower.loan-number = loan.loan-number)))
174JYP
TriggersTriggers
A trigger is a statement that is executed automatically by the system as a side effect of a modification to the database.
To design a trigger mechanism, we must: Specify the conditions under which the trigger is to be executed.
Specify the actions to be taken when the trigger executes.
The SQL-92 standard does not include trigger, but many implementations support triggers.
175JYP
Trigger ExampleTrigger Example
Suppose that instead of allowing negative account balances, the bank deals with overdrafts by setting the account balance to zero
creating a loan in the amount of the overdraft
giving this loan a loan number identical to the account number of the overdrawn account
The condition for executing the trigger is an update to the account relation that results in a negative balance value.
176JYP
Trigger Example (Cont.)Trigger Example (Cont.)
define trigger overdraft on update of account T(if new T.balance < 0then (insert into loan values
(T.branch-name, T.account-number, –new T.balance)insert into borrower
(select customer-name, account-numberfrom depositorwhere T.account-number = depositor account-
number)update account Sset S.balance = 0where S.account-number = T.Account-number))
The keyword new used before T.balance indicates that the value of T.balance after the update should be used; if it is omitted, the value before the update is used.
177JYP
Functional DependenciesFunctional Dependencies
Constraints on the set of legal relations.
Require that the value for a certain set of attributes determines uniquely the value for another set of attributes.
A functional dependency is a generalization of the notion of a key.
178JYP
Functional Dependencies (Cont.)Functional Dependencies (Cont.)
Let R be a relation schema
R R
The functional dependency
holds on R if and only if for any legal relations r(R), whenever any two tuples t1 and t2 of r agree on the attributes , they also agree on the attributes . That is,
t1[] = t2 [] t1[] = t2 []
K is a superkey for relation schema R if and only if K R
K is a candidate key for R if and only if K R, and
There is no K, R
179JYP
Functional Dependencies (Cont.)Functional Dependencies (Cont.)
Functional dependencies allow us to express constraints that cannot be expressed using superkeys. Consider the schema:
Loan-info-schema = (branch-name, loan-number,customer-name, amount).
We expect this set of functional dependencies to hold:
loan-number amountloan-number branch-name
but would not expect the following to hold:
loan-number customer-name
180JYP
Use of Functional DependenciesUse of Functional Dependencies
We use functional dependencies to: test relations to see if they are legal under a given set of functional
dependencies. If a relation r is legal under a set F of functional dependencies, we say that r satisfies F.
specify constraints on the set of legal relations; we say that F holds on R if all legal relations on R satisfy the set of functional dependencies F.
Note: A specific instance of a relation schema may satisfy a functional dependency even if the functional dependency does not hold on all legal instances. For example, a specific instance of Loan-schema may, by chance, satisfy loan-number customer-name.
181JYP
Topic 6: TransactionsTopic 6: Transactions
Transaction Concept
Transaction State
Implementation of Atomicity and Durability
Concurrent Executions
Serializability
Recoverability
Implementation of Isolation
Transaction Definition in SQL
Testing for Serializability.
182JYP
Transaction ConceptTransaction Concept
A transaction is a unit of program execution that accesses and possibly updates various data items.
A transaction must see a consistent database.
During transaction execution the database may be inconsistent.
When the transaction is committed, the database must be consistent.
Two main issues to deal with: Failures of various kinds, such as hardware failures and
system crashes
Concurrent execution of multiple transactions
183JYP
ACID PropertiesACID Properties
Atomicity. Either all operations of the transaction are properly reflected in the database or none are.
Consistency. Execution of a transaction in isolation preserves the consistency of the database.
Isolation. Although multiple transactions may execute concurrently, each transaction must be unaware of other concurrently executing transactions. Intermediate transaction results must be hidden from other concurrently executed transactions. That is, for every pair of transactions Ti and Tj, it appears to Ti that either Tj finished execution before Ti started, or Tj started execution after Ti finished.
Durability. After a transaction completes successfully, the changes it has made to the database persist, even if there are system failures.
To preserve integrity of data, the database system must ensure:
184JYP
Example of Fund TransferExample of Fund Transfer
Transaction to transfer $50 from account A to account B:
1. read(A)
2. A := A – 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)
Consistency requirement — the sum of A and B is unchanged by the execution of the transaction.
Atomicity requirement — if the transaction fails after step 3 and before step 6, the system should ensure that its updates are not reflected in the database, otherwise an inconsistency will result.
185JYP
Example of Fund Transfer (Cont.)Example of Fund Transfer (Cont.)
Durability requirement — once the user has been notified that the transaction has completed (i.e., the transfer of the $50 has taken place), the updates to the database by the transaction must persist despite failures.
Isolation requirement — if between steps 3 and 6, another transaction is allowed to access the partially updated database, it will see an inconsistent database (the sum A + B will be less than it should be).Can be ensured trivially by running transactions serially, that is one after the other. However, executing multiple transactions concurrently has significant benefits, as we will see.
186JYP
Transaction StateTransaction State
Active, the initial state; the transaction stays in this state while it is executing
Partially committed, after the final statement has been executed.
Failed, after the discovery that normal execution can no longer proceed.
Aborted, after the transaction has been rolled back and the database restored to its state prior to the start of the transaction.
Committed, after successful completion.
187JYP
Transaction State (Cont.)Transaction State (Cont.)
188JYP
Implementation of Atomicity and Implementation of Atomicity and DurabilityDurability
The recovery-management component of a database system implements the support for atomicity and durability.
189JYP
Concurrent ExecutionsConcurrent Executions
Multiple transactions are allowed to run concurrently in the system. Advantages are: increased processor and disk utilization, leading to better
transaction throughput: one transaction can be using the CPU while another is reading from or writing to the disk
reduced average response time for transactions: short transactions need not wait behind long ones.
Concurrency control schemes – mechanisms to control the interaction among the concurrent transactions in order to prevent them from destroying the consistency of the database
190JYP
SchedulesSchedules
Schedules – sequences that indicate the chronological order in which instructions of concurrent transactions are executed a schedule for a set of transactions must consist of all instructions of
those transactions
must preserve the order in which the instructions appear in each individual transaction.
191JYP
Example SchedulesExample Schedules Let T1 transfer $50 from A to B, and T2 transfer 10% of
the balance from A to B. The following is a serial schedule (Schedule 1), in which T1 is followed by T2.
T1 T2
read(A)A := A – 50write(A)read(B)B := B + 50write(B)
read(A)temp := A*0.1;A := A – tempwrite(A)read(B)B := B + tempwrite(B)
192JYP
Let T1 and T2 be the transactions defined previously. The following schedule (Schedule 3) is not a serial schedule, but it is equivalent to Schedule 1.
T1 T2
read(A)A := A – 50write(A)
read (A)temp := A*0.1A = A – tempwrite(A)
read(B)B := B + 50write(B)
read(B)B := B + tempwrite(B)
In both Schedule 1 and 3, the sum A + B is preserved.
193JYP
Example Schedules (Cont.)Example Schedules (Cont.) The following concurrent schedule (Schedule 4) does not preserve the value of the sum A + B.
T1 T2
read(A)A := A – 50
read (A)temp := A*0.1
A = A – tempwrite(A)read(B)
write(A)read(B)B := B + 50write(B)
B := B + tempwrite(B)
A=90 A=40
temp=4 A=36
B=80
A=40
B=130
B=84
194JYP
SerializabilitySerializability
Basic Assumption – Each transaction preserves database consistency.
Thus serial execution of a set of transactions preserves database consistency.
A (possibly concurrent) schedule is serializable if it is equivalent to a serial schedule. Different forms of schedule equivalence give rise to the notions of:
1. conflict serializability
2. view serializability
We ignore operations other than read and write instructions, and we assume that transactions may perform arbitrary computations on data in local buffers in between reads and writes. Our simplified schedules consist of only read and write instructions.
195JYP
Conflict SerializabilityConflict Serializability
Instructions li and lj of transactions Ti and Tj respectively, conflict if and only if there exists some item Q accessed by both li and lj, and at least one of these instructions wrote Q.
1. li = read(Q), lj = read(Q). li and lj don’t conflict.2. li = read(Q), lj = write(Q). They conflict.3. li = write(Q), lj = read(Q). They conflict4. li = write(Q), lj = write(Q). They conflict
Intuitively, a conflict between li and lj forces a (logical) temporal order between them. If li and lj are consecutive in a schedule and they do not conflict, their results would remain the same even if they had been interchanged in the schedule.
196JYP
Conflict Serializability (Cont.)Conflict Serializability (Cont.)
If a schedule S can be transformed into a schedule S´ by a series of swaps of non-conflicting instructions, we say that S and S´ are conflict equivalent.
We say that a schedule S is conflict serializable if it is conflict equivalent to a serial schedule
Example of a schedule that is not conflict serializable:
T3 T4
read(Q)write(Q)
write(Q)
We are unable to swap instructions in the above schedule to obtain either the serial schedule < T3, T4 >, or the serial schedule < T4, T3 >.
197JYP
Conflict Serializability (Cont.)Conflict Serializability (Cont.)
Schedule 3 below can be transformed into Schedule 1, a serial schedule where T2 follows T1, by series of swaps of non-conflicting instructions. Therefore Schedule 3 is conflict serializable.
T1 T2
read(A)write(A)
read (A)write(A)
read (B) write(B)
read (B) write(B)
198JYP
View Serializability*View Serializability* Let S and S´ be two schedules with the same set of transactions. S
and S´ are view equivalent if the following three conditions are met:
1. For each data item Q, if transaction Ti reads the initial value of Q in schedule S, then transaction Ti must, in schedule S´, also read the initial value of Q.
2. For each data item Q if transaction Ti executes read(Q) in schedule S, and that value was produced by transaction Tj (if any), then transaction Ti must in schedule S´ also read the value of Q that was produced by transaction Tj.
3. For each data item Q, the transaction (if any) that performs the final write(Q) operation in schedule S must perform the final write(Q) operation in schedule S´.
As can be seen, view equivalence is also based purely on reads and writes alone.
199JYP
View Serializability *(Cont.)View Serializability *(Cont.)
A schedule S is view serializable if it is view equivalent to a serial schedule.
Every conflict serializable schedule is also view serializable.
Schedule 9 — a schedule which is view-serializable but not conflict serializable.
T3 T4 T6
read(Q)write(Q)
write(Q)write (Q)
Every view serializable schedule which is not conflict serializable has bind writes ( no read before write).
200JYP
RecoverabilityRecoverability
Recoverable schedule — if a transaction Tj reads a data items previously written by a transaction Ti , the commit operation of Ti appears before the commit operation of Tj.
The following schedule (Schedule 11) is not recoverable if T9 commits immediately after the read
T8 T9
read(A)write(A)
read(A) read(B)
If T8 should abort, T9 would have read (and possibly shown to the user) an inconsistent database state. Hence database must ensure that schedules are recoverable.
Need to address the effect of transaction failures on concurrently running transactions.
201JYP
Recoverability (Cont.)Recoverability (Cont.)
Cascading rollback – a single transaction failure leads to a series of transaction rollbacks. Consider the following schedule where none of the transactions has yet committed (so the schedule is recoverable)
T10 T11 T12
read(A)read(B)write(A)
read(A) write(A)
read(A)
If T10 fails, T11 and T12 must also be rolled back.
Can lead to the undoing of a significant amount of work
202JYP
Recoverability (Cont.)Recoverability (Cont.)
Cascadeless schedules — cascading rollbacks cannot occur; for each pair of transactions Ti and Tj such that Tj reads a data item previously written by Ti, the commit operation of Ti appears before the read operation of Tj.
Every cascadeless schedule is also recoverable
It is desirable to restrict the schedules to those that are cascadeless
203JYP
Implementation of IsolationImplementation of Isolation
Schedules must be conflict or view serializable, and recoverable, for the sake of database consistency, and preferably cascadeless.
A policy in which only one transaction can execute at a time generates serial schedules, but provides a poor degree of concurrency.
Concurrency-control schemes tradeoff between the amount of concurrency they allow and the amount of overhead that they incur.
Some schemes allow only conflict-serializable schedules to be generated, while others allow view-serializable schedules that are not conflict-serializable.
204JYP
Transaction Definition in SQLTransaction Definition in SQL
Data manipulation language must include a construct for specifying the set of actions that comprise a transaction.
In SQL, a transaction begins implicitly.
A transaction in SQL ends by: Commit work commits current transaction and begins a new one.
Rollback work causes current transaction to abort.
Levels of consistency specified by SQL-92: Serializable — default
Repeatable read
Read committed
Read uncommitted
205JYP
Levels of Consistency in SQL-92Levels of Consistency in SQL-92
Serializable — default
Repeatable read —only committed records to be read, repeated reads of same record must return same value. However, a transaction may not be serializable – it may find some records inserted by a transaction but not find others that have not yet been commited.
Read committed — only committed records can be read, but successive reads of record may return different (but committed) values.
Read uncommitted — even uncommitted records may be read.
Lower degrees of consistency useful for gathering approximateinformation about the database, e.g., statistics for query optimizer.
206JYP
Testing for SerializabilityTesting for Serializability
Consider some schedule of a set of transactions T1, T2, ..., Tn
Precedence graph — a direct graph where the vertices are the transactions (names).
We draw an arc from Ti to Tj if the two transaction conflict, and Ti accessed the data item on which the conflict arose earlier.
We may label the arc by the item that was accessed.
Example 1
x
y
207JYP
Example Schedule (Schedule A)Example Schedule (Schedule A)T1 T2 T3 T4 T5
read(X)read(Y)read(Z)read(V)read(W)read(W)read(Y)write(Y)write(Z)read(U)read(Y)write(Y)read(Z)write(Z)
read(U)write(U)
208JYP
Precedence Graph for Schedule APrecedence Graph for Schedule A
T3
T4T1
T2
T5
209JYP
Test for Conflict SerializabilityTest for Conflict Serializability
A schedule is conflict serializable if and only if its precedence graph is acyclic.
Cycle-detection algorithms exist which take order n2 time, where n is the number of vertices in the graph. (Better algorithms take order n + e where e is the number of edges.)
If precedence graph is acyclic, the serializability order can be obtained by a topological sorting of the graph. This is a linear order consistent with the partial order of the graph.For example, a serializability order for Schedule A would beT5 T1 T3 T2 T4 .
210JYP
Test for View SerializabilityTest for View Serializability
The precedence graph test for conflict serializability must be modified to apply to a test for view serializability.
Construct a labeled precedence graph. Look for an acyclic graph which is derived from the labeled precedence graph. Schedule is view serializable if and only if such an acyclic graph can be found.
The problem of looking for such an acyclic graph falls in the class of NP-complete problems. Thus existence of an efficient algorithm is unlikely.
However practical algorithms that just check some sufficient conditions for view serializability can still be used.
211JYP
Concurrency Control vs. Serializability TestsConcurrency Control vs. Serializability Tests
Testing a schedule for serializability after it has executed is a little too late!
Goal – to develop concurrency control protocols that will assure serializability. They will generally not examine the precedence graph as it is being created; instead a protocol will impose a discipline that avoids nonseralizable schedules.Will study such protocols later.
Tests for serializability help understand why a concurrency control protocol is correct.
212JYP
Topic 7: Concurrency ControlTopic 7: Concurrency Control
Lock-Based Protocols
Multiple Granularity
Deadlock Handling
Insert and Delete Operations
213JYP
Lock-Based ProtocolsLock-Based Protocols
A lock is a mechanism to control concurrent access to a data item
Data items can be locked in two modes :
1. exclusive (X) mode. Data item can be both read as well as written. X-lock is requested using lock-X instruction.
2. shared (S) mode. Data item can only be read. S-lock is requested using lock-S instruction.
Lock requests are made to concurrency-control manager. Transaction can proceed only after request is granted.
214JYP
Lock-Based Protocols (Cont.)Lock-Based Protocols (Cont.)
Lock-compatibility matrix
A transaction may be granted a lock on an item if the requested lock is compatible with locks already held on the item by other transactions
The matrix allows any number of transactions to hold shared locks on an item, but if any transaction holds an exclusive on the item no other transaction may hold any lock on the item.
S X
falsetrueS
X true false
215JYP
Lock-Based Protocols (Cont.)Lock-Based Protocols (Cont.)
If a lock cannot be granted, the requesting transaction is made to wait till all incompatible locks held by other transactions have been released. The lock is then granted.
216JYP
Lock-Based Protocols (Cont.)Lock-Based Protocols (Cont.)
Example of a transaction performing locking:
T2: lock-S(A);
read (A);
unlock(A);
lock-S(B);
read (B);
unlock(B);
display(A+B)
Locking as above is not sufficient to guarantee serializability — if A and B get updated in-between the read of A and B, the displayed sum would be wrong.
A locking protocol is a set of rules followed by all transactions while requesting and releasing locks. Locking protocols restrict the set of possible schedules.
217JYP
Pitfalls of Lock-Based ProtocolsPitfalls of Lock-Based Protocols
Consider the partial schedule
Neither T3 nor T4 can make progress — executing lock-S(B) causes T4 to wait for T3 to release its lock on B, while executing lock-X(A) causes T3 to wait for T4 to release its lock on A.
Such a situation is called a deadlock. To handle a deadlock one of T3 or T4 must be rolled back and its locks released.
T3 T4
lock-X(B)read(B)B:= B-50write(B)
lock-S(A)read(A)lock-S(B)
lock-X(A)
218JYP
Pitfalls of Lock-Based Protocols (Cont.)Pitfalls of Lock-Based Protocols (Cont.)
The potential for deadlock exists in most locking protocols. Deadlocks are a necessary evil.
Starvation is also possible if concurrency control manager is badly designed. For example: A transaction may be waiting for an X-lock on an item, while a
sequence of other transactions request and are granted an S-lock on the same item.
The same transaction is repeatedly rolled back due to deadlocks.
Concurrency control manager can be designed to prevent starvation.
219JYP
The Two-Phase Locking ProtocolThe Two-Phase Locking Protocol
This is a protocol which ensures conflict-serializable schedules.
Phase 1: Growing Phase transaction may obtain locks
transaction may not release locks
Phase 2: Shrinking Phase transaction may release locks
transaction may not obtain locks
The protocol assures serializability. It can be proved that the
transactions can be serialized in the order of their lock points
(i.e. the point where a transaction acquired its final lock).
220JYP
The Two-Phase Locking Protocol (Cont.)The Two-Phase Locking Protocol (Cont.)
Two-phase locking does not ensure freedom from deadlocks
Cascading roll-back is possible under two-phase locking. To avoid this, follow a modified protocol called strict two-phase locking. Here a transaction must hold all its exclusive locks till it commits/aborts.
Rigorous two-phase locking is even stricter: here all locks are held till commit/abort. In this protocol transactions can be serialized in the order in which they commit.
221JYP
The Two-Phase Locking Protocol (Cont.)The Two-Phase Locking Protocol (Cont.)
There can be conflict serializable schedules that cannot be obtained if two-phase locking is used.
e.g., T1 T2
lock-S(X)read (X)unlock-S(X)
lock-X(Y)write (Y)unlock-X(Y)
lock-X(X)write (X)unlock-X(X)
222JYP
The Two-Phase Locking Protocol (Cont.)The Two-Phase Locking Protocol (Cont.)
However, in the absence of extra information (e.g., ordering of access to data), two-phase locking is needed for conflict serializability in the following sense:
Given a transaction Ti that does not follow two-phase locking, we can find a transaction Tj that uses two-phase locking, and a schedule for Ti and Tj that is not conflict serializable.
223JYP
Lock ConversionsLock Conversions
Two-phase locking with lock conversions:
– First Phase: can acquire a lock-S on item
can acquire a lock-X on item
can convert a lock-S to a lock-X (upgrade)
– Second Phase: can release a lock-S
can release a lock-X
can convert a lock-X to a lock-S (downgrade)
This protocol assures serializability. But still relies on the
programmer to insert the various locking instructions.
224JYP
Automatic Acquisition of LocksAutomatic Acquisition of Locks
A transaction Ti issues the standard read/write instruction,
without explicit locking calls.
The operation read(D) is processed as:
if Ti has a lock on D
then
read(D)
else
begin
if necessary wait until no other
transaction has a lock-X on D
grant Ti a lock-S on D;
read(D) end
225JYP
Automatic Acquisition of Locks (Cont.)Automatic Acquisition of Locks (Cont.)
write(D) is processed as:
if Ti has a lock-X on D
then write(D) else begin if necessary wait until no other trans. has any lock on D,
if Ti has a lock-S on D then upgrade lock on D to lock-X else grant Ti a lock-X on D
write(D) end; All locks are released after commit or abort
226JYP
Multiple GranularityMultiple Granularity
Allow data items to be of various sizes and define a hierarchy of data granularities, where the small granularities are nested within larger ones
Can be represented graphically as a tree
when a transaction locks a node in the tree explicitly, it implicitly locks all the node's descendents in the same mode.
Granularity of locking (level in tree where locking is done): fine granularity (lower in tree): high concurrency, high locking
overhead
coarse granularity (higher in tree): low locking overhead, low concurrency
227JYP
Example of Granularity HierarchyExample of Granularity Hierarchy
The highest level in the example hierarchy is the entire database.
The levels below are of type area, file and record in that order.
228JYP
Intention Lock ModesIntention Lock Modes
In addition to S and X lock modes, there are three additional lock modes with multiple granularity: intention-shared (IS): indicates explicit locking at a lower level of the
tree but only with shared locks.
intention-exclusive (IX): indicates explicit locking at a lower level with exclusive or shared locks
shared and intention- exclusive (SIX): the subtree rooted by that node is locked explicitly in shared mode and indicates explicit locking at a lower level with exclusive-mode locks.
intention locks allow a higher level node to be locked in S or X mode without having to check all descendent nodes.
229JYP
Compatibility Matrix with Intention Lock ModesCompatibility Matrix with Intention Lock Modes
The compatibility matrix for all lock modes is:
IS IX S S IX X
IS
IX
S
S IX
X
230JYP
Multiple Granularity Locking SchemeMultiple Granularity Locking Scheme
Transaction Ti can lock a node Q, using the following rules:
1. The lock compatibility matrix must be observed.
2. The root of the tree must be locked first, and may be locked in
any mode.
3. A node Q can be locked by Ti in IS or S mode only if the parent
of Q is currently locked by Ti in either IX or IS mode.
4. A node Q can be locked by Ti in X, SIX, or IX mode only if the
parent of Q is currently locked by Ti in either IX or SIX mode.
5. Ti can lock a node only if it has not previously unlocked any node
(that is, Ti is two-phase).
6. Ti can unlock a node Q only if none of the children of Q are
currently locked by Ti.
Observe that locks are acquired in root-to-leaf order, whereas they are released in leaf-to-root order.
231JYP
Deadlock HandlingDeadlock Handling
Consider the following two transactions:
T1: write (X) T2: write(Y)
write(Y) write(X)
Schedule with deadlock
T1 T2
lock-X on Xwrite (X)
lock-X on Ywrite (Y) wait for lock-X on X
wait for lock-X on Y
232JYP
Deadlock HandlingDeadlock Handling
System is deadlocked if there is a set of transactions such that every transaction in the set is waiting for another transaction in the set.
Deadlock prevention protocols ensure that the system will never enter into a deadlock state. Some prevention strategies : Require that each transaction locks all its data items before it begins
execution (predeclaration).
Impose partial ordering of all data items and require that a transaction can lock data items only in the order specified by the partial order (graph-based protocol).
233JYP
More Deadlock Prevention StrategiesMore Deadlock Prevention Strategies
Following schemes use transaction timestamps for the sake of deadlock prevention alone.
wait-die scheme — non-preemptive older transaction may wait for younger one to release data item.
Younger transactions never wait for older ones; they are rolled back instead.
a transaction may die several times before acquiring needed data item
wound-wait scheme — preemptive older transaction wounds (forces rollback) younger transaction
instead of waiting for it. Younger transactions may wait for older ones.
may be fewer rollbacks than wait-die scheme.
234JYP
Deadlock prevention (Cont.)Deadlock prevention (Cont.)
Both in wait-die and in wound-wait schemes, a rolled back transactions is restarted with its original timestamp. Older transactions thus have precedence over newer ones, and starvation is hence avoided.
Timeout-Based Schemes : a transaction waits for a lock only for a specified amount of time.
After that, the wait times out and the transaction is rolled back.
thus deadlocks are not possible
simple to implement; but starvation is possible. Also difficult to determine good value of the timeout interval.
235JYP
Deadlock Detection*Deadlock Detection*
Deadlocks can be described as a wait-for graph, which consists of a pair G = (V,E), V is a set of vertices (all the transactions in the system)
E is a set of edges; each element is an ordered pair Ti Tj.
If Ti Tj is in E, then there is a directed edge from Ti to Tj, implying that Ti is waiting for Tj to release a data item.
When Ti requests a data item currently being held by Tj, then the edge Ti Tj is inserted in the wait-for graph. This edge is removed only when Tj is no longer holding a data item needed by Ti.
The system is in a deadlock state if and only if the wait-for graph has a cycle. Must invoke a deadlock-detection algorithm periodically to look for cycles.
236JYP
Deadlock Detection* (Cont.)Deadlock Detection* (Cont.)
Wait-for graph with a cycle
Wait-for graph without a cycle
237JYP
Deadlock Recovery*Deadlock Recovery*
When deadlock is detected : Some transaction will have to rolled back (made a victim) to break
deadlock. Select that transaction as victim that will incur minimum cost.
Rollback -- determine how far to roll back transaction
Total rollback: Abort the transaction and then restart it.
More effective to roll back transaction only as far as necessary to break deadlock.
Starvation happens if same transaction is always chosen as victim. Include the number of rollbacks in the cost factor to avoid starvation
238JYP
Topic 8: Database System ArchitecturesTopic 8: Database System Architectures
Centralized Systems
Client--Server Systems
Distributed Systems
239JYP
Centralized SystemsCentralized Systems
Run on a single computer system and do not interact with other computer systems.
General-purpose computer system: one to a few CPUs and a number of device controllers that are connected through a common bus that provides access to shared memory.
Single-user system (e.g., personal computer or workstation): desk-top unit, single user, usually has only one CPU and one or two hard disks; the OS may support only one user.
Multi-user system: more disks, more memory, multiple CPUs, and a multi-user OS. Serve a large number of users who are connected to the system via terminals. Often called server systems.
240JYP
Client-Server SystemsClient-Server Systems
Server systems satisfy requests generated at m client systems, whose general structure is shown below:
Client Client Client Client….
network
Server
241JYP
Client-Server Systems (Cont.)Client-Server Systems (Cont.)
Database functionality can be divided into: Back-end: manages access structures, query evaluation and
optimization, concurrency control and recovery.
Front-end: consists of tools such as forms, report-writers, and graphical user interface facilities.
The interface between the front-end and the back-end is through SQL or through an application program interface.
242JYP
Client-Server Systems (Cont.)Client-Server Systems (Cont.)
Advantages of replacing mainframes with networks of workstations or personal computers connected to back-end server machines: better functionality for the cost
flexibility in locating resources and expanding facilities
better user interfaces
easier maintenance
One kind of server systems is transaction servers which are widely used in relational database systems.
243JYP
Transaction ServersTransaction Servers
Also called query server systems or SQL server systems; clients send requests to the server system where the transactions are executed, and results are shipped back to the client.
Requests specified in SQL, and communicated to the server through a remote procedure call (RPC) mechanism.
Transactional RPC allows many RPC calls to collectively form a transaction.
Open Database Connectivity (ODBC) is an application program interface standard from Microsoft for connecting to a server, sending SQL requests, and receiving results.
244JYP
Distributed SystemsDistributed Systems
Data spread over multiple machines (also referred to as sites or nodes.
Network interconnects the machines
Data shared by users on multiple machines
Differentiate between local and global transactions A local transaction accesses data in the single site at which the
transaction was initiated.
A global transaction either accesses data in a site different from the one at which the transaction was initiated or accesses data in several different sites.
245JYP
Trade-offs in Distributed SystemsTrade-offs in Distributed Systems
Sharing data – users at one site able to access the data residing at some other sites.
Autonomy – each site is able to retain a degree of control over data stored locally.
Higher system availability through redundancy — data can be replicated at remote sites, and system can function even if a site fails.
Disadvantage: added complexity required to ensure proper coordination among sites. Software development cost.
Greater potential for bugs.
Increased processing overhead.
246JYP
Topic 9: New ApplicationsTopic 9: New Applications
Decision-Support Systems
Data Analysis
Data Mining
Data Warehousing
Spatial and Geographic Databases
Multimedia Databases
Mobility and Personal Databases
Information-Retrieval Systems
247JYP
Decision Support SystemDecision Support System
Decision-Support systems are used to make business decisions often based on data collected by on-line transaction-processing systems.
Examples of business decisions: what items to stock?
What insurance premium to change?
Who to send advertisements to?
Examples of data used for making decisions Retail sales transaction details
Customer profiles (income, age, sex, etc.)
248JYP
Decision-Support Systems: OverviewDecision-Support Systems: Overview
Data analysis tasks are simplified by SQL extensions.
Statistical analysis packages (e.g., S++) can be interfaced with databases Statistical analysis is a large field, will not study it here
Data mining seeks to discover knowledge automatically in the form of statistical rules and patterns from Large databases.
A data warehouse archives information gathered from multiple sources, and stores it under a unified schema, at a single site.
249JYP
Data AnalysisData Analysis
Aggregate functions summarize large volumes of data
A histogram partitions the values taken by an attribute into ranges, and computes an aggregate over the values in each range; cumbersome to use standard SQL to construct a histogram. Extension proposed by Red Brick:
select percentile, avg (balance)
from account
group by N_tile (balance, 10) as percentile
Here, N_tile(balance,10) divides the values for balance into 10 consecutive ranges, with an equal number of values in each range, duplicates are not eliminated.
250JYP
Small Medium Large Total
LightDark
820
3510
10 5
5335
Total 28 45 15 88
Cross-tabulation of number by size and color of sample relation sales with the schema Sales(color, size, number).
Data Analysis (Cont.)Data Analysis (Cont.)
251JYP
Data Analysis (Cont.)Data Analysis (Cont.)Color Size Number
LightLightLightLightDarkDarkDarkDark
allallallall
SmallMediumLarge
allSmall
MediumLarge
allSmall
MediumLarge
all
835105320105
3528451588
Can represent subtotals in relational form by using the value all
E.g. : obtain (Light, all, 53) and (Dark, all, 35) by aggregating individual tuples with different values for size for each color.
252JYP
Data Analysis (Cont.)Data Analysis (Cont.)
Rollup: Moving from finer-granularity data to a coarser granularity by means of aggregation.
Drill down: Moving from coarser-granularity data to finer-granularity data.
Proposed extensions to SQL, such as the cube operation help to support generation of summary data
The following query generates the previous table.
select color, size, sum (number)
from sales
groupby color, size with cube
253JYP
Data Analysis (Cont.)Data Analysis (Cont.)
Figure shows the combinations of dimensions size, color, price
In general computing cube operation with n groupby columns gives 2n different groupby combinations.
254JYP
Data MiningData Mining
Like knowledge discovery in artificial intelligence data mining discovers statistical rules and patterns it differs from machine learning in that it deals with large volumes of data stored primarily on disk.
Knowledge discovered from a database can be represented by a set of rules.
e.g.,: “Young women with annual incomes greater than $50,000 are most likely to buy sports cars”
Discover rules using one of two models:
1. The user is involved directly in the process of knowledge discovery.
2. The system is responsible for automatically discovering knowledge from the database by detecting patterns and correlation's in the data.
255JYP
Knowledge Representation Using RulesKnowledge Representation Using Rules
General form of rules X antecedent consequent X is a list of one or more variables with associated ranges.
The rule transactions T, buys (T, bread) buys(T, milk) states: if there is a tuple (t1, bread) in the relation buys, there must also be a tuple (t1, milk) in the relation buys.
Population: Cross-product of the ranges of the variables in the rule. In the above example, the set of all transactions.
Support: Measure of what fraction of the population satisfies both the antecedent and the consequent of the rule. e.g., 10% of transactions buy bread and milk.
Confidence : Measure of how often the consequent is true when the antecedent is true. e.g., 80% of transactions that buy bread also buy milk
256JYP
Some Classes of Data-Mining ProblemsSome Classes of Data-Mining Problems
Classification : Finding rules that partition the given data into disjoint groups (classes) that are relevant for making a decision (e.g.,: which of several factors help classify a person’s credit
worthiness).
Associations: Useful to determine associations between different items (e.g.,: someone who buys bread is quite likely also to buy
milk).
Sequence correlations: determine correlations between related sequence data. (e.g., : when bond rates go up stock prices go down within
two days.)
257JYP
User-Guided Data MiningUser-Guided Data Mining
In user-guided data mining, primary responsibility for discovering rules is with the user.
User may runs tests on the database to verify or refute a hypothesis. Confidence and support for rules expressing a hypothesis are derived from the database.
An iterative process of forming and refining rules is used. Example: Test the hypothesis “People who hold master’s degrees are the most likely to have an excellent credit rating.” If confidence of rule is low, may refine it into the rule:
people P, P.degree = Masters and P.income 75, 000
P.credit = excellent
Data-visualization though graphical representations like maps, charts, and color-coding, helps detect patterns in data
258JYP
Classification RulesClassification Rules
Classification rules help assign new objects to a set of classes. E.g., given a new automobile insurance applicant, should he or she be classified as low risk, medium risk or high risk?
Classification rules for above example could use a variety of knowledge, such as educational level of applicant, salary of applicant, age of applicant, etc.
Classification rules can be compactly shown as a Classification tree.
259JYP
Part of Credit Risk Classification TreePart of Credit Risk Classification Tree
260JYP
Discovery of Classification Rules*Discovery of Classification Rules*
Training set: a data sample in which the grouping for each tuple is already known.
Top down generation of classification tree. Each internal node of the tree partitions the data into groups
based on the attribute.
The data at a node is not partitioned further if either all (or most) of the items at the node belong to the same class, or all attributes have been considered. Such a node is a leaf node.
Otherwise the data at the node is partitioned further by picking an attribute for partitioning data at the node.
261JYP
Discovery of Classification Rules *(Cont.)Discovery of Classification Rules *(Cont.)
Consider credit risk example: Suppose degree is chosen to partition the data at the root. Since degree has a small number of possible values, one child is created for each value.
At each child node of the root, further classification is done if required. Here, partitions are defined by income. Since income is a continuous attribute, some number of intervals are chosen, and one child created for each interval.
Different classification algorithms use different ways of choosing which attribute to partition on at each node, and what the intervals, if any, are.
In general, different branches of the tree could grow to different levels. Different nodes at the same level may use different partitioning attributes.
262JYP
Discovery of Association RulesDiscovery of Association Rules
Example: transactions T, buys (T, bread) buys (T, milk)
In general: notion of transaction , and its itemset, the set of items contained in the transaction
General form of rule:
transactions T, c(T, i1) and . . . and c(T, in) c(T, i0)
where c(T, ik) denotes that transaction T contains item ik.
Above can be represented as A b where A = {i1, i2, . . . , in}
and b = i0.
Support of rule = number of transactions whose itemsets contain A {b}
Usually desire rules with strong support, which will involve only items purchased in a significant percentage of the transactions.
263JYP
Discovery of Association Rules (Cont.)Discovery of Association Rules (Cont.)
Consider all possible sets of relevant items.
For each set find its support (i.e. , how many transactions purchase all items in the set).
Use sets with sufficiently high support to generate association rules. From set A generate the rules A - {b} b for each b A.
Support of each of the rules is support of A.
Confidence of a rule is support of A divided by support of A - {b}.
264JYP
Finding SupportFinding Support
Few sets: Determine level of support via a single pass. A count is maintained for each set, initially set to 0.
When a transaction is fetched, the count is incremented for each set of items which contained in the itemset of the transaction.
Sets with a high count at the end of the pass correspond to items with a high degree of association.
Many sets: If memory not enough to hold all counts for all sets Use multiple passes, considering only some sets in each pass.
Optimization: Once a set is eliminated because it occurs in too small a fraction of the transactions, none of its supersets needs to be considered.
265JYP
Data WarehousingData Warehousing
A data warehouse is a repository of information gathered from multiple sources.
266JYP
Data Warehousing (Cont.)Data Warehousing (Cont.)
Provides a single consolidated interface to data
Data stored for an extended period, providing access to historical data
Data/updates are periodically downloaded from online transaction processing (OLTP) systems. Typically, download happens each night.
Data may not be completely up-to-date, but is recent enough for analysis.
Running large queries at the warehouse ensures that OLTP systems are not affected by the decision-support workload.
267JYP
Issues in Building a WarehouseIssues in Building a Warehouse
When and how to gather data. Source driven: data source initiates data transfer
Destination driven: warehouse initiates data transfer
What schema to use. Schema integration
Cleaning and conversion of incoming data
What data to summarize. Raw data may be too large to store on-line
Aggregate values (totals/subtotals) often suffice
Queries on raw data can often be transformed by query optimizer to use aggregate values
How to propagate updates. Date at warehouse is a view on source data
Efficient view maintenance techniques required
268JYP
Spatial and Geographic DatabasesSpatial and Geographic Databases
Spatial databases store information related to spatial locations, and support efficient storage, indexing and querying of spatial data.
Special purpose index structures are important for accessing spatial data, and for processing spatial join queries.
Design databases (or CAD databases) store design information about how objects are constructed e.g.: designs of buildings, aircraft, layouts of integrated-circuits
Geographic databases store geographic information (e.g., maps): often called geographic information systems or GIS.
269JYP
Representation of Geometric InformationRepresentation of Geometric Information
Various geometric constructs can be represented in a database in a normalized fashion.
Represent a line segment by the coordinates of its endpoints.
Approximate a curve by partitioning it into a sequence of segments; represents each segment as a separate tuple that also carries with it the identifier of the curve.
Closed polygons: list its vertices in order, starting vertex is the same as the ending vertex.
Alternative: triangulation — divide polygon into triangles; give the polygon an identifier with each of its
triangles.
270JYP
Representation of Geometric Constructs (Cont.)Representation of Geometric Constructs (Cont.)
271JYP
Representation of Geometric Information (Cont.)Representation of Geometric Information (Cont.)
Representation of points and line segment in 3-D similar to 2-D, except that points have an extra z component
Represent arbitrary polyhedra by dividing them into tetrahedrons, like triangulating polygons.
Alternative: List their faces, each of which is a polygon, along with an indication of which side of the face is inside the polyhedron.
272JYP
Design DatabasesDesign Databases
Represent design components as objects (generally geometric objects); the connections between the objects indicate how the design is structured.
Simple two-dimensional objects: points, lines, triangles, rectangles, polygons.
Complex two-dimensional objects: formed from simple objects via union, intersection, and difference operations.
Complex three-dimensional objects: formed from simpler objects such as spheres, cylinders, and cuboids, by union, intersection, and difference operations.
Wireframe models represent three-dimensional surfaces as a set of simpler objects.
273JYP
Representation of Geometric ConstructsRepresentation of Geometric Constructs
Design databases also store non-spatial information about objects (e.g., construction material, color, etc.)
Spatial integrity constraints are important.
E.g., pipes should not intersect, wires should not be too close to each other, etc.
(a) Difference of cylinders (b) Union of cylinders
274JYP
Geographic DataGeographic Data
Raster data consist of bit maps or pixel maps, in two or more dimensions. Example 2-D raster image: satellite image of cloud cover, where
each pixel stores the cloud visibility in a particular area.
Additional dimensions might include the temperature at different altitudes at different regions, or measurements taken at different points in time.
Design databases generally do not store raster data.
275JYP
Geographic Data (Cont.)Geographic Data (Cont.)
Vector data are constructed from basic geometric objects: points, line segments, triangles, and other polygons in two dimensions, and cylinders, speheres, cuboids, and other polyhedrons in three dimensions.
Vector format often used to represent map data. Roads can be considered as two-dimensional and represented
by lines and curves.
Some features, such as rivers, may be represented either as complex curves or as complex polygons, depending on whether their width is relevant.
Features such as regions and lakes can be depicted as polygons.
276JYP
Applications of Geographic DataApplications of Geographic Data Examples of geographic data
map data for vehicle navigation
distribution network information for power, telephones, water supply, and sewage
Vehicle navigation systems store information about roads and services for the use of drivers: Spatial data: e.g, road/restaurant/gas-station coordinates
Non-spatial data: e.g., one-way streets, speed limits, traffic congestion
Global Positioning System or GPS unit - utilizes information broadcast from GPS satellites to find the current location of user with an accuracy of tens of meters. increasingly used in vehicle navigation systems as well as utility maintenance
applications.
277JYP
Spatial QueriesSpatial Queries
Nearness queries request objects that lie near a specified location.
Nearest neighbor queries, given a point or an object, find the nearest object that satisfies given conditions.
Region queries deal with spatial regions. e.g., ask for objects that lie partially or fully inside a specified region.
Queries that compute intersections or unions of regions.
Spatial join of two spatial relations with the location playing the role of join attribute.
278JYP
Spatial Queries (Cont.)Spatial Queries (Cont.)
Spatial data is typically queried using a graphical query language; results are also displayed in a graphical manner.
Graphical interface constitutes the front-end
Extensions of SQL with abstract data types, such as lines, polygons and bit maps, have been proposed to interface with back-end. allows relational databases to store and retrieve spatial
information
queries can mix spatial and nonspatial conditions
extensions also include and allowing spatial conditions (contains or overlaps).
279JYP
Multimedia DatabasesMultimedia Databases
To provide such database functions as indexing and consistency, it is desirable to store multimedia data in a database (rather than storing them outside the database, in a file system).
The database must handle large object representation.
Similarity-based retrieval must be provided by special index structures.
Must provide guaranteed steady retrieval rates for continuos-media data.
280JYP
Similarity-Based RetrievalSimilarity-Based Retrieval
Pictorial data - Two pictures or images that are slightly different as represented in the database may be considered the same by a user.
E.g., identify similar designs for registering a new trademark.
Audio data- Speech-based user interfaces allow the user to give a command or identify a data item by speaking.
E.g., test user input against stored commands.
Handwritten data -- Identify a handwritten data item or command stored in the database (requires similarity testing).
281JYP
Continuos-Media DataContinuos-Media Data
Most important types are video and audio data.
Characterized by high data volumes and real-time information-delivery requirements. Data must be delivered sufficiently fast that there are no
gaps in the audio or video.
Data must be delivered at a rate that does not cause overflow of system buffers.
Synchronization among distinct data streams must be maintained (video of a person speaking must show lips moving synchronously with the audio).
282JYP
Multimedia Data FormatsMultimedia Data Formats
Store and transmit multimedia data in compressed form;JPEG is the most widely used format for image data.
Encoding each frame of a video using JPEG is wasteful since, successive frames of a video are often nearly the same.
MPEG standards use commonalties among a sequence of frames to achieve a greater degree of compression.
MPEG-1 stores a minute of 30-frame-per-second video and audio in approximately 12.5 MB (compares with 75 MB for video using only JPEG); quality comparable to VHS video tape.
MPEG-2 designed for digital broadcast systems and DVD; negligible loss of video quality. Compresses 1 minute of audio-video to approximately 17 MB.
MPEG-4
283JYP
Video ServersVideo Servers
Current video-on-demand servers are based on file systems; existing database systems do not meet real-time response requirements.
Multimedia data are stored on several disks (RAID configuration), or on tertiary storage for less frequently accessed data.
Head-end terminals - used to view multimedia data; PCs or TVs attached to a small, inexpensive computer called a set-top box.
Network - Transmission of multimedia data from a server to multiple head-end terminals requires a high-capacity network, such as asynchronous-transfer-mode (ATM) network.
284JYP
Mobility and Personal DatabasesMobility and Personal Databases
A mobile computing environment consists of mobile computers, referred to as mobile hosts, and a wired network of computers.
Mobile host may be able to communicate with wired network through a wireless digital communication network.
A model for mobile communication Mobile hosts communicate to the wired network via
computers referred to as mobile support stations.
Each mobile support station manages those mobile hosts within its cell.
When mobile hosts move between cells, there is a handoff of control from one mobile support station to another.
285JYP
Database Issues in Mobile ComputingDatabase Issues in Mobile Computing
New issues for query optimization. energy (battery power) is a scarce resource and its usage
must be minimized
mobile user’s locations may be a parameter of the query.
Users may need to be able to perform database updates even while the mobile computer is disconnected. e.g., mobile salesman records sale of products on (local
copy of) database.
Can result in conflicts detected on reconnection, which may need to be resolved manually.
286JYP
Disconnectivity and ConsistencyDisconnectivity and Consistency
A mobile host may remain in operation during periods of disconnection.
Problems created if the user of the mobile host issues queries and updates on data that resides or is cached locally: Recoverability: Updates entered on a disconnected machine
may be lost if the mobile host fails. Since the mobile host represents a single point of failure, stable storage cannot be simulated well.
Consistency : Cached data may become out of date, but the mobile host cannot discover this until it is reconnected.
287JYP
Information Retrieval SystemsInformation Retrieval Systems
Information retrieval (IR) systems use a simpler data model than database systems, but provide more powerful querying capabilities within the restricted model.
Queries attempt to locate documents that are of interest by specifying, for example, sets of keywords. e.g., find documents containing the words “database
systems”
Information retrieval systems order answers based on their estimated relevance. e.g., user may really only want documents about
database systems, but the system may retrieve all documents that mention the phrase “database systems”.
Documents may be ordered by, for example, how many times the phrase appears in the document.
288JYP
Differences From Database SystemsDifferences From Database Systems
Information retrieval systems, unlike traditional database systems, handle: Unstructured documents
Searching using keywords and relevance ranking
Most information retrieval systems do not handle: High update rates
Concurrency control
Data structured using more complex data models (e.g., relational or object oriented data models)
Complex queries written in, e.g., SQL
289JYP
Topic 10: XMLTopic 10: XML
XML: Extensible Markup Language
Defined by the WWW Consortium (W3C)
Originally intended as a document markup language not a database language Documents have tags giving extra information about sections of the
document
E.g. <title> XML </title> <slide> Introduction …</slide>
Derived from SGML (Standard Generalized Markup Language), but simpler to use than SGML
Extensible, unlike HTML
Users can add new tags, and separately specify how the tag should be handled for display
Goal was (is?) to replace HTML as the language for publishing documents on the Web
290JYP
XML Introduction (Cont.)XML Introduction (Cont.)
The ability to specify new tags, and to create nested tag structures made XML a great way to exchange data, not just documents. Much of the use of XML has been in data exchange applications, not as a
replacement for HTML
Tags make data (relatively) self-documenting E.g.
<bank> <account>
<account-number> A-101 </account-number> <branch-name> Downtown </branch-name> <balance> 500 </balance>
</account> <depositor>
<account-number> A-101 </account-number> <customer-name> Johnson </customer-name>
</depositor>
</bank>
291JYP
XML: MotivationXML: Motivation
Data interchange is critical in today’s networked world Examples:
Banking: funds transfer
Order processing (especially inter-company orders)
Scientific data
– Chemistry: ChemML, …
– Genetics: BSML (Bio-Sequence Markup Language), …
Paper flow of information between organizations is being replaced by electronic flow of information
Each application area has its own set of standards for representing information
XML has become the basis for all new generation data interchange formats
292JYP
XML Motivation (Cont.)XML Motivation (Cont.) Earlier generation formats were based on plain text with line
headers indicating the meaning of fields Similar in concept to email headers
Does not allow for nested structures, no standard “type” language
Tied too closely to low level document structure (lines, spaces, etc)
Each XML based standard defines what are valid elements, using XML type specification languages to specify the syntax
DTD (Document Type Descriptors)
XML Schema
Plus textual descriptions of the semantics
XML allows new tags to be defined as required However, this may be constrained by DTDs
A wide variety of tools is available for parsing, browsing and querying XML documents/data
293JYP
Structure of XML DataStructure of XML Data
Tag: label for a section of data
Element: section of data beginning with <tagname> and ending with matching </tagname>
Elements must be properly nested Proper nesting
<account> … <balance> …. </balance> </account>
Improper nesting
<account> … <balance> …. </account> </balance>
Formally: every start tag must have a unique matching end tag, that is in the context of the same parent element.
Every document must have a single top-level element
294JYP
Example of Nested ElementsExample of Nested Elements
<bank-1> <customer>
<customer-name> Hayes </customer-name> <customer-street> Main </customer-street> <customer-city> Harrison </customer-city> <account>
<account-number> A-102 </account-number> <branch-name> Perryridge </branch-name> <balance> 400 </balance>
</account> <account> … </account>
</customer> . .
</bank-1>
295JYP
Motivation for NestingMotivation for Nesting
Nesting of data is useful in data transfer Example: elements representing customer-id, customer name, and
address nested within an order element
Nesting is not supported, or discouraged, in relational databases With multiple orders, customer name and address are stored
redundantly
normalization replaces nested structures in each order by foreign key into table storing customer name and address information
Nesting is supported in object-relational databases
But nesting is appropriate when transferring data External application does not have direct access to data referenced
by a foreign key
296JYP
Structure of XML Data (Cont.)Structure of XML Data (Cont.)
Mixture of text with sub-elements is legal in XML. Example:
<account> This account is seldom used any more. <account-number> A-102</account-number> <branch-name> Perryridge</branch-name> <balance>400 </balance></account>
Useful for document markup, but discouraged for data representation
297JYP
AttributesAttributes
Elements can have attributes <account acct-type = “checking” >
<account-number> A-102 </account-number> <branch-name> Perryridge </branch-name> <balance> 400 </balance>
</account>
Attributes are specified by name=value pairs inside the starting tag of an element
An element may have several attributes, but each attribute name can only occur once
<account acct-type = “checking” monthly-fee=“5”>
298JYP
Attributes Vs. SubelementsAttributes Vs. Subelements
Distinction between subelement and attribute In the context of documents, attributes are part of markup, while
subelement contents are part of the basic document contents
In the context of data representation, the difference is unclear and may be confusing
Same information can be represented in two ways
– <account account-number = “A-101”> …. </account>
– <account> <account-number>A-101</account-number> … </account>
Suggestion: use attributes for identifiers of elements, and use subelements for contents
299JYP
More on XML SyntaxMore on XML Syntax
Elements without subelements or text content can be abbreviated by ending the start tag with a /> and deleting the end tag <account number=“A-101” branch=“Perryridge” balance=“200 />
To store string data that may contain tags, without the tags being interpreted as subelements, use CDATA as below
<![CDATA[<account> … </account>]]>
Here, <account> and </account> are treated as just strings
300JYP
NamespacesNamespaces
XML data has to be exchanged between organizations
Same tag name may have different meaning in different organizations, causing confusion on exchanged documents
Specifying a unique string as an element name avoids confusion
Better solution: use unique-name:element-name
Avoid using long unique names all over document by using XML Namespaces
<bank Xmlns:FB=‘http://www.FirstBank.com’> …
<FB:branch> <FB:branchname>Downtown</FB:branchname>
<FB:branchcity> Brooklyn </FB:branchcity> </FB:branch>…
</bank>
301JYP
XML Document SchemaXML Document Schema
Database schemas constrain what information can be stored, and the data types of stored values
XML documents are not required to have an associated schema
However, schemas are very important for XML data exchange Otherwise, a site cannot automatically interpret data received from
another site
Two mechanisms for specifying XML schema Document Type Definition (DTD)
Widely used
XML Schema
Newer, increasing use
302JYP
Document Type Definition (DTD)Document Type Definition (DTD)
The type of an XML document can be specified using a DTD
DTD constraints structure of XML data What elements can occur
What attributes can/must an element have
What subelements can/must occur inside each element, and how many times.
DTD does not constrain data types All values represented as strings in XML
DTD syntax <!ELEMENT element (subelements-specification) >
<!ATTLIST element (attributes) >
303JYP
Element Specification in DTDElement Specification in DTD
Subelements can be specified as names of elements, or #PCDATA (parsed character data), i.e., character strings EMPTY (no subelements) or ANY (anything can be a subelement)
Example<! ELEMENT depositor (customer-name account-number)>
<! ELEMENT customer-name (#PCDATA)><! ELEMENT account-number (#PCDATA)>
Subelement specification may have regular expressions <!ELEMENT bank ( ( account | customer | depositor)+)>
Notation:
– “|” - alternatives
– “+” - 1 or more occurrences
– “*” - 0 or more occurrences
304JYP
Bank DTDBank DTD
<!DOCTYPE bank [<!ELEMENT bank ( ( account | customer | depositor)+)><!ELEMENT account (account-number branch-name balance)><! ELEMENT customer(customer-name customer-street customer-city)><! ELEMENT depositor (customer-name account-number)><! ELEMENT account-number (#PCDATA)><! ELEMENT branch-name (#PCDATA)><! ELEMENT balance(#PCDATA)><! ELEMENT customer-name(#PCDATA)><! ELEMENT customer-street(#PCDATA)><! ELEMENT customer-city(#PCDATA)>
]>
305JYP
Attribute Specification in DTDAttribute Specification in DTD
Attribute specification : for each attribute Name Type of attribute
CDATA ID (identifier) or IDREF (ID reference) or IDREFS (multiple IDREFs)
– more on this later Whether
mandatory (#REQUIRED) has a default value (value), or neither (#IMPLIED)
Examples <!ATTLIST account acct-type CDATA “checking”> <!ATTLIST customer
customer-id ID # REQUIREDaccounts IDREFS # REQUIRED >
306JYP
IDs and IDREFsIDs and IDREFs
An element can have at most one attribute of type ID
The ID attribute value of each element in an XML document must be distinct Thus the ID attribute value is an object identifier
An attribute of type IDREF must contain the ID value of an element in the same document
An attribute of type IDREFS contains a set of (0 or more) ID values. Each ID value must contain the ID value of an element in the same document
307JYP
Bank DTD with AttributesBank DTD with Attributes
Bank DTD with ID and IDREF attribute types. <!DOCTYPE bank-2[
<!ELEMENT account (branch, balance)> <!ATTLIST account account-number ID # REQUIRED
owners IDREFS # REQUIRED> <!ELEMENT customer(customer-name, customer-street,
customer-city)> <!ATTLIST customer
customer-id ID # REQUIRED accounts IDREFS # REQUIRED>
… declarations for branch, balance, customer-name, customer-street and customer-city]>
308JYP
XML data with ID and IDREF attributesXML data with ID and IDREF attributes
<bank-2><account account-number=“A-401” owners=“C100 C102”>
<branch-name> Downtown </branch-name> <balance> 500 </balance>
</account><customer customer-id=“C100” accounts=“A-401”>
<customer-name>Joe </customer-name> <customer-street> Monroe </customer-street> <customer-city> Madison</customer-city>
</customer><customer customer-id=“C102” accounts=“A-401 A-402”>
<customer-name> Mary </customer-name> <customer-street> Erin </customer-street> <customer-city> Newark </customer-city>
</customer></bank-2>
309JYP
Limitations of DTDsLimitations of DTDs
No typing of text elements and attributes All values are strings, no integers, reals, etc.
Difficult to specify unordered sets of subelements Order is usually irrelevant in databases
(A | B)* allows specification of an unordered set, but
Cannot ensure that each of A and B occurs only once
IDs and IDREFs are untyped The owners attribute of an account may contain a reference to
another account, which is meaningless
owners attribute should ideally be constrained to refer to customer elements
310JYP
XML SchemaXML Schema
XML Schema is a more sophisticated schema language which addresses the drawbacks of DTDs. Supports Typing of values
E.g. integer, string, etc
Also, constraints on min/max values
User defined types
Is itself specified in XML syntax, unlike DTDs
More standard representation, but verbose
Is integrated with namespaces
Many more features
List types, uniqueness and foreign key constraints, inheritance ..
BUT: significantly more complicated than DTDs, not yet widely used.
311JYP
XML Schema Version of Bank DTDXML Schema Version of Bank DTD<xsd:schema xmlns:xsd=http://www.w3.org/2001/XMLSchema>
<xsd:element name=“bank” type=“BankType”/>
<xsd:element name=“account”><xsd:complexType> <xsd:sequence> <xsd:element name=“account-number” type=“xsd:string”/> <xsd:element name=“branch-name” type=“xsd:string”/> <xsd:element name=“balance” type=“xsd:decimal”/> </xsd:squence></xsd:complexType>
</xsd:element>….. definitions of customer and depositor ….
<xsd:complexType name=“BankType”><xsd:squence>
<xsd:element ref=“account” minOccurs=“0” maxOccurs=“unbounded”/><xsd:element ref=“customer” minOccurs=“0” maxOccurs=“unbounded”/><xsd:element ref=“depositor” minOccurs=“0” maxOccurs=“unbounded”/>
</xsd:sequence></xsd:complexType></xsd:schema>
312JYP
Querying and Transforming XML DataQuerying and Transforming XML Data
Translation of information from one XML schema to another
Querying on XML data
Above two are closely related, and handled by the same tools
Standard XML querying/translation languages XPath
Simple language consisting of path expressions
XQuery
An XML query language with a rich set of features
Wide variety of other languages have been proposed, and some served as basis for the Xquery standard XML-QL, Quilt, XQL, …
313JYP
Tree Model of XML DataTree Model of XML Data
Query and transformation languages are based on a tree model of XML data
An XML document is modeled as a tree, with nodes corresponding to elements and attributes Element nodes have children nodes, which can be attributes or
subelements Text in an element is modeled as a text node child of the element Children of a node are ordered according to their order in the XML
document Element and attribute nodes (except for the root node) have a single
parent, which is an element node The root node has a single child, which is the root element of the
document
We use the terminology of nodes, children, parent, siblings, ancestor, descendant, etc., which should be interpreted in the above tree model of XML data.
314JYP
XPathXPath
XPath is used to address (select) parts of documents using path expressions
A path expression is a sequence of steps separated by “/” Think of file names in a directory hierarchy
Result of path expression: set of values that along with their containing elements/attributes match the specified path
E.g. /bank-2/customer/customer-name evaluated on the bank-2 data we saw earlier returns
<customer-name>Joe</customer-name>
<customer-name>Mary</customer-name>
E.g. /bank-2/customer/customer-name/text( )
returns the same names, but without the enclosing tags
315JYP
XPath (Cont.)XPath (Cont.)
The initial “/” denotes root of the document (above the top-level tag)
Path expressions are evaluated left to right Each step operates on the set of instances produced by the previous step
Selection predicates may follow any step in a path, in [ ] E.g. /bank-2/account[balance > 400]
returns account elements with a balance value greater than 400
/bank-2/account[balance] returns account elements containing a balance subelement
Attributes are accessed using “@” E.g. /bank-2/account[balance > 400]/@account-number
returns the account numbers of those accounts with balance > 400
IDREF attributes are not dereferenced automatically (more on this later)
316JYP
Functions in XPathFunctions in XPath
XPath provides several functions The function count() at the end of a path counts the number of
elements in the set generated by the path E.g. /bank-2/account[customer/count() > 2]
– Returns accounts with > 2 customers Also function for testing position (1, 2, ..) of node w.r.t. siblings
Boolean connectives and and or and function not() can be used in predicates
IDREFs can be referenced using function id() id() can also be applied to sets of references such as IDREFS and
even to strings containing multiple references separated by blanks E.g. /bank-2/account/id(@owner)
returns all customers referred to from the owners attribute of account elements.
317JYP
More XPath FeaturesMore XPath Features Operator “|” used to implement union
E.g. /bank-2/account/id(@owner) | /bank-2/loan/id(@borrower)
gives customers with either accounts or loans
However, “|” cannot be nested inside other operators.
“//” can be used to skip multiple levels of nodes E.g. /bank-2//customer-name
finds any customer-name element anywhere under the /bank-2 element, regardless of the element in which it is contained.
A step in the path can go to:
parents, siblings, ancestors and descendants
of the nodes generated by the previous step, not just to the children “//”, described above, is a short from for specifying “all descendants”
“..” specifies the parent.
We omit further details,
318JYP
XQueryXQuery
XQuery is a general purpose query language for XML data
Currently being standardized by the World Wide Web Consortium (W3C) The textbook description is based on a March 2001 draft of the standard.
The final version may differ, but major features likely to stay unchanged.
Alpha version of XQuery engine available free from Microsoft
XQuery is derived from the Quilt query language, which itself borrows from SQL, XQL and XML-QL
XQuery uses a for … let … where .. result … syntax for SQL from where SQL where result SQL select let allows temporary variables, and has no equivalent in SQL
319JYP
FLWR Syntax in XQuery FLWR Syntax in XQuery
For clause uses XPath expressions, and variable in for clause ranges over values in the set returned by XPath
Simple FLWR expression in XQuery
find all accounts with balance > 400, with each result enclosed in an <account-number> .. </account-number> tag
for $x in /bank-2/account
let $acctno := $x/@account-number where $x/balance > 400 return <account-number> $acctno </account-number>
Let clause not really needed in this query, and selection can be done In XPath. Query can be written as:
for $x in /bank-2/account[balance>400]return <account-number> $x/@account-number
</account-number>
320JYP
Path Expressions and FunctionsPath Expressions and Functions
Path expressions are used to bind variables in the for clause, but can also be used in other places E.g. path expressions can be used in let clause, to bind variables to
results of path expressions
The function distinct( ) can be used to removed duplicates in path expression results
The function document(name) returns root of named document
E.g. document(“bank-2.xml”)/bank-2/account
Aggregate functions such as sum( ) and count( ) can be applied to path expression results
XQuery does not support group by, but the same effect can be got by nested queries, with nested FLWR expressions within a result clause More on nested queries later
321JYP
JoinsJoins Joins are specified in a manner very similar to SQL
for $a in /bank/account, $c in /bank/customer, $d in /bank/depositor
where $a/account-number = $d/account-number and $c/customer-name = $d/customer-name
return <cust-acct> $c $a </cust-acct> The same query can be expressed with the selections specified
as XPath selections:
for $a in /bank/account $c in /bank/customer
$d in /bank/depositor[ account-number = $a/account-number and customer-name = $c/customer-name] return <cust-acct> $c $a</cust-acct>
322JYP
Changing Nesting StructureChanging Nesting Structure
The following query converts data from the flat structure for bank information into the nested structure used in bank-1
<bank-1> for $c in /bank/customer return
<customer> $c/* for $d in /bank/depositor[customer-name = $c/customer-name], $a in /bank/account[account-number=$d/account-number] return $a
</customer> </bank-1>
$c/* denotes all the children of the node to which $c is bound, without the enclosing top-level tag
Exercise for reader: write a nested query to find sum of accountbalances, grouped by branch.
323JYP
XQuery Path ExpressionsXQuery Path Expressions
$c/text() gives text content of an element without any subelements/tags
XQuery path expressions support the “–>” operator for dereferencing IDREFs Equivalent to the id( ) function of XPath, but simpler to use
Can be applied to a set of IDREFs to get a set of results
June 2001 version of standard has changed “–>” to “=>”
324JYP
Sorting in XQuery Sorting in XQuery
Sortby clause can be used at the end of any expression. E.g. to return customers sorted by name for $c in /bank/customer return <customer> $c/* </customer> sortby(name)
Can sort at multiple levels of nesting (sort by customer-name, and by account-number within each customer)
<bank-1> for $c in /bank/customer return
<customer> $c/* for $d in /bank/depositor[customer-name=$c/customer-name], $a in /bank/account[account-number=$d/account-number]
return <account> $a/* </account> sortby(account-number)</customer> sortby(customer-name)
</bank-1>
325JYP
Functions and Other XQuery FeaturesFunctions and Other XQuery Features
User defined functions with the type system of XMLSchema function balances(xsd:string $c) returns list(xsd:numeric) { for $d in /bank/depositor[customer-name = $c], $a in /bank/account[account-number=$d/account-number] return $a/balance
}
Types are optional for function parameters and return values
Universal and existential quantification in where clause predicates
some $e in path satisfies P
every $e in path satisfies P
XQuery also supports If-then-else clauses
326JYP
Application Program InterfaceApplication Program Interface
There are two standard application program interfaces to XML data:SAX (Simple API for XML)
Based on parser model, user provides event handlers for parsing events
– E.g. start of element, end of element
– Not suitable for database applicationsDOM (Document Object Model)
XML data is parsed into a tree representation Variety of functions provided for traversing the DOM tree E.g.: Java DOM API provides Node class with methods
getParentNode( ), getFirstChild( ), getNextSibling( ) getAttribute( ), getData( ) (for text node) getElementsByTagName( ), …
Also provides functions for updating DOM tree
327JYP
The end of the courseThe end of the course
Thank you for your cooperation! Thank you for your cooperation!