IELM 511: Information System design
Introduction
Part I: ISD for well-structured data – relational and other DBMS
Part II: ISD for systems with non-uniformly structured data
Part III: (one out of)
Basics of web-based IS (www, web2.0, …)
Markups, HTML, XML
Design tools for Info Sys: UML
APIs for mobile apps
Security, Cryptography
IS product lifecycles
Algorithm analysis, P, NP, NPC
Info storage (modeling, normalization)
Info retrieval (Relational algebra, Calculus, SQL)
DB integrated APIs
Agenda
Relational Algebra
Relational Calculus
Structured Query Language (SQL)
DB APIs
[Figure: ER diagram of the Bank DB, with 1:n and m:n relationship cardinalities]
Recall our Bank DB design
BRANCH( b-name, city, assets)
CUSTOMER( cssn, c-name, street, city, banker, banker-type)
LOAN( l-no, amount, br-name)
PAYMENT( l-no, pay-no, date, amount)
EMPLOYEE( e-ssn, e-name, tel, start-date, mgr-ssn)
ACCOUNT( ac-no, balance)
SACCOUNT( ac-no, int-rate)
CACCOUNT( ac-no, od-amt)
BORROWS( cust-ssn, loan-num)
DEPOSIT( c-ssn, ac-num, access-date)
DEPENDENT( emp-ssn, dep-name)
Background: Algebra
What is an algebra?
Study of systems of mathematical objects and operations defined on the objects
Examples of algebras:
Integers, with operations: +, -, ×, /, % …
Real numbers, with operations: +, -, ×, /, …
Vectors, with operations: +, -, ·, ×, …
Booleans, with operations: ∧, ∨, ¬, …
Relational Algebra
Relational Algebra:
objects: instances of relational schemas (namely, tables)
operations: σ, π, ×; set-theoretic operations: ∪, ∩, -, ÷
Key concepts:
Operator arguments: the arguments of operators are instances of schemas (tables)
Operation closure: the outcome of an operator is an instance of a schema
Expressions: a sequence of operations can be written as an expression
Operator precedence: the order in which the operations in an expression are applied is fixed
Compare these concepts to those in other algebras
Relational Algebra: select, σ
σ: unary operator, input: one table; output: table
Notation: in remainder, we will refer to an instance of a schema as a table
LOAN
loan_number amount branch_name
L17 1000 Downtown
L23 2000 Redwood
L15 1500 Pennyridge
L93 500 Mianus
L11 900 Round Hill
L16 1300 Pennyridge
σ[amount > 1200] (LOAN)
loan_number amount branch_name
L23 2000 Redwood
L15 1500 Pennyridge
L16 1300 Pennyridge
Relational Algebra: select, σ
Conditions of the σ operator:
- denote the criterion for selecting a given tuple
- must be evaluated one tuple at a time
- must evaluate to ‘true’ or ‘false’
- output = the set of tuples for which the condition is ‘true’
LOAN
loan_number amount branch_name
L17 1000 Downtown
L23 2000 Redwood
L15 1500 Pennyridge
L93 500 Mianus
L11 900 Round Hill
L16 1300 Pennyridge
σ[(amount > 1200) ∧ (branch_name = ‘Pennyridge’)] (LOAN)
loan_number amount branch_name
L15 1500 Pennyridge
L16 1300 Pennyridge
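Both selections can be sketched in Python. This is a minimal illustration, assuming a table is modelled as a tuple of attribute names plus a Python set of row tuples (a set, so duplicate rows cannot exist); the helper name `select` is hypothetical. The condition is a function that sees one tuple at a time and returns True or False, mirroring the rules listed for σ.

```python
# Table model (illustrative assumption): (attribute names, set of rows).
LOAN = (
    ("loan_number", "amount", "branch_name"),
    {
        ("L17", 1000, "Downtown"), ("L23", 2000, "Redwood"),
        ("L15", 1500, "Pennyridge"), ("L93", 500, "Mianus"),
        ("L11", 900, "Round Hill"), ("L16", 1300, "Pennyridge"),
    },
)


def select(table, condition):
    """sigma[condition](table): keep the rows for which condition is True.

    The condition is evaluated one tuple at a time; each tuple is passed as a
    dict keyed by attribute name. The schema is unchanged (closure).
    """
    attrs, rows = table
    return attrs, {row for row in rows if condition(dict(zip(attrs, row)))}


# sigma[amount > 1200](LOAN)
big_loans = select(LOAN, lambda t: t["amount"] > 1200)

# sigma[(amount > 1200) AND (branch_name = 'Pennyridge')](LOAN)
big_pennyridge = select(
    LOAN, lambda t: t["amount"] > 1200 and t["branch_name"] == "Pennyridge"
)

for row in sorted(big_pennyridge[1]):
    print(row)
```

Returning the attribute names together with the rows keeps the output a table, so the result can be fed straight into another operator.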
Relational Algebra: project, π
LOAN
loan_number amount branch_name
L17 1000 Downtown
L23 2000 Redwood
L15 1500 Pennyridge
L93 500 Mianus
L11 900 Round Hill
L16 1300 Pennyridge
π: unary operator, input: one table; output: table
π[list of attributes] (TABLE)
π[loan_number, amount] (LOAN)
loan_number amount
L17 1000
L23 2000
L15 1500
L93 500
L11 900
L16 1300
Relational Algebra: project, π
π[branch_name] (LOAN)
Project returns a set of tuples, so duplicate tuples are removed; the output may have fewer rows than the input
LOAN
loan_number amount branch_name
L17 1000 Downtown
L23 2000 Redwood
L15 1500 Pennyridge
L93 500 Mianus
L11 900 Round Hill
L16 1300 Pennyridge
branch_name
Downtown
Redwood
Pennyridge
Mianus
Round Hill
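A minimal Python sketch of π, under the same illustrative table model (attribute names plus a set of rows); `project` is a hypothetical helper. Because the result rows are collected in a set, the two Pennyridge rows collapse into one, which is why π[branch_name] (LOAN) has 5 rows while LOAN has 6.

```python
# Table model (illustrative assumption): (attribute names, set of rows).
LOAN = (
    ("loan_number", "amount", "branch_name"),
    {
        ("L17", 1000, "Downtown"), ("L23", 2000, "Redwood"),
        ("L15", 1500, "Pennyridge"), ("L93", 500, "Mianus"),
        ("L11", 900, "Round Hill"), ("L16", 1300, "Pennyridge"),
    },
)


def project(table, keep):
    """pi[keep](table): keep only the listed attributes, in the given order.

    Result rows are collected in a set, so rows that become identical after
    projection are kept once and the output may have fewer rows than the input.
    """
    attrs, rows = table
    positions = [attrs.index(name) for name in keep]
    return tuple(keep), {tuple(row[i] for i in positions) for row in rows}


loan_amounts = project(LOAN, ["loan_number", "amount"])  # pi[loan_number, amount](LOAN)
branches = project(LOAN, ["branch_name"])                 # pi[branch_name](LOAN)
print(len(LOAN[1]), len(loan_amounts[1]), len(branches[1]))  # prints: 6 6 5
```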
Example: Find the names of all branches that have given loans
Relational Algebra: combining operations
LOAN
loan_number amount branch_name
L17 1000 Downtown
L23 2000 Redwood
L15 1500 Pennyridge
L93 500 Mianus
L11 900 Round Hill
L16 1300 Pennyridge
branch_name
Redwood
Pennyridge
Example: Find the names of all branches that have given loans larger than 1200
π[branch_name] (σ[amount > 1200] (LOAN))
Relational Algebra: combining operations
LOAN
loan_number amount branch_name
L17 1000 Downtown
L23 2000 Redwood
L15 1500 Pennyridge
L93 500 Mianus
L11 900 Round Hill
L16 1300 Pennyridge
branch_name
Redwood
Pennyridge
Example: Find the names of all branches that have given loans larger than 1200
X = σ[amount > 1200] (LOAN)
Y = π[branch_name] (X)
Note: expressions impose a sequence in which operations are performed
X
loan_number amount branch_name
L23 2000 Redwood
L15 1500 Pennyridge
L16 1300 Pennyridge
Y
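Because every operator returns a table (closure), operators compose like the RA expression. The sketch below is illustrative only: it redefines the hypothetical `select` and `project` helpers so the block stands alone, then computes the nested form and the two-step X, Y form and checks that they agree.

```python
# Table model (illustrative assumption): (attribute names, set of rows).
LOAN = (
    ("loan_number", "amount", "branch_name"),
    {
        ("L17", 1000, "Downtown"), ("L23", 2000, "Redwood"),
        ("L15", 1500, "Pennyridge"), ("L93", 500, "Mianus"),
        ("L11", 900, "Round Hill"), ("L16", 1300, "Pennyridge"),
    },
)


def select(table, condition):
    """sigma[condition](table): the condition sees one tuple (as a dict) at a time."""
    attrs, rows = table
    return attrs, {row for row in rows if condition(dict(zip(attrs, row)))}


def project(table, keep):
    """pi[keep](table): duplicates vanish because the rows form a set."""
    attrs, rows = table
    positions = [attrs.index(name) for name in keep]
    return tuple(keep), {tuple(row[i] for i in positions) for row in rows}


# One nested expression: pi[branch_name](sigma[amount > 1200](LOAN))
nested = project(select(LOAN, lambda t: t["amount"] > 1200), ["branch_name"])

# The same query as a sequence of named steps
X = select(LOAN, lambda t: t["amount"] > 1200)
Y = project(X, ["branch_name"])

print(nested == Y, sorted(Y[1]))  # prints: True [('Pennyridge',), ('Redwood',)]
```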
Relational Algebra: join, ×
Join is useful when the information required is in two (or more) tables.
Tables are sets of tuples, and the join of two tables produces the cartesian product of the two sets
Background (set theory): cartesian product, A × B = { (x, y) | x ∈ A, y ∈ B }
Example: A = { 1, 2, 3 }, B = { a, s }
A × B = { (1, a), (1, s), (2, a), (2, s), (3, a), (3, s) }
Relational Algebra: join, ×
Cartesian product, BORROWS × LOAN
BORROWS
customer loan_no
111-12-0000 L17
222-12-0000 L23
333-12-0000 L15
444-00-0000 L93
666-12-0000 L17
111-12-0000 L11
999-12-0000 L17
777-12-0000 L16
LOAN
loan_number amount branch_name
L17 1000 Downtown
L23 2000 Redwood
L15 1500 Pennyridge
L93 500 Mianus
L11 900 Round Hill
L16 1300 Pennyridge
5 columns
customer loan_no loan_number amount branch_name
111-12-0000 L17 L17 1000 Downtown
111-12-0000 L17 L23 2000 Redwood
111-12-0000 L17 L15 1500 Pennyridge
…
…
777-12-0000 L16 L16 1300 Pennyridge
48 rows
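The row and column counts above can be reproduced with a short Python sketch, assuming the same illustrative table model; `cartesian` is a hypothetical helper built on the standard `itertools.product`. The schema is the concatenation of both schemas (2 + 3 = 5 attributes) and every BORROWS row is paired with every LOAN row (8 × 6 = 48 rows).

```python
from itertools import product

# Table model (illustrative assumption): (attribute names, set of rows).
BORROWS = (
    ("customer", "loan_no"),
    {
        ("111-12-0000", "L17"), ("222-12-0000", "L23"), ("333-12-0000", "L15"),
        ("444-00-0000", "L93"), ("666-12-0000", "L17"), ("111-12-0000", "L11"),
        ("999-12-0000", "L17"), ("777-12-0000", "L16"),
    },
)
LOAN = (
    ("loan_number", "amount", "branch_name"),
    {
        ("L17", 1000, "Downtown"), ("L23", 2000, "Redwood"),
        ("L15", 1500, "Pennyridge"), ("L93", 500, "Mianus"),
        ("L11", 900, "Round Hill"), ("L16", 1300, "Pennyridge"),
    },
)


def cartesian(left, right):
    """left x right: pair every row of left with every row of right."""
    (left_attrs, left_rows), (right_attrs, right_rows) = left, right
    rows = {lrow + rrow for lrow, rrow in product(left_rows, right_rows)}
    return left_attrs + right_attrs, rows


attrs, rows = cartesian(BORROWS, LOAN)
print(len(attrs), "columns,", len(rows), "rows")  # prints: 5 columns, 48 rows
```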
Relational Algebra: join, ×
Usually, a cartesian product produces many tuples that combine unrelated information.
A θ-join specifies a θ-condition (same as a selection criterion) to restrict the output of a join to meaningful tuples only.
Example: Find the loan number, amount and branch name of the loans of all customers.
BORROWS ×[loan_no = loan_number] LOAN
customer loan_no loan_number amount branch_name
111-12-0000 L17 L17 1000 Downtown
222-12-0000 L23 L23 2000 Redwood
333-12-0000 L15 L15 1500 Pennyridge
444-00-0000 L93 L93 500 Mianus
666-12-0000 L17 L17 1000 Downtown
111-12-0000 L11 L11 900 Round Hill
999-12-0000 L17 L17 1000 Downtown
777-12-0000 L16 L16 1300 Pennyridge
5 columns
8 rows [Why?]
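A θ-join is a selection applied to a cartesian product, and a Python sketch can be written exactly that way. As before, the table model and the helper names `cartesian` and `theta_join` are illustrative assumptions; the θ-condition is a function of one combined tuple.

```python
from itertools import product

# Table model (illustrative assumption): (attribute names, set of rows).
BORROWS = (
    ("customer", "loan_no"),
    {
        ("111-12-0000", "L17"), ("222-12-0000", "L23"), ("333-12-0000", "L15"),
        ("444-00-0000", "L93"), ("666-12-0000", "L17"), ("111-12-0000", "L11"),
        ("999-12-0000", "L17"), ("777-12-0000", "L16"),
    },
)
LOAN = (
    ("loan_number", "amount", "branch_name"),
    {
        ("L17", 1000, "Downtown"), ("L23", 2000, "Redwood"),
        ("L15", 1500, "Pennyridge"), ("L93", 500, "Mianus"),
        ("L11", 900, "Round Hill"), ("L16", 1300, "Pennyridge"),
    },
)


def cartesian(left, right):
    """left x right: every row of left paired with every row of right."""
    (left_attrs, left_rows), (right_attrs, right_rows) = left, right
    rows = {lrow + rrow for lrow, rrow in product(left_rows, right_rows)}
    return left_attrs + right_attrs, rows


def theta_join(left, right, condition):
    """left x[condition] right: the cartesian product, then a selection."""
    attrs, rows = cartesian(left, right)
    return attrs, {row for row in rows if condition(dict(zip(attrs, row)))}


# BORROWS x[loan_no = loan_number] LOAN
C = theta_join(BORROWS, LOAN, lambda t: t["loan_no"] == t["loan_number"])
print(len(C[1]))  # prints: 8
```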
Relational Algebra: dot-notation in join, ×
Two tables being joined may have the same attribute name (possibly denoting two different things). To distinguish the columns in the θ-join, the names of attributes use dot-notation.
The following are all equivalent:
C = BORROWS ×[BORROWS.loan_no = LOAN.loan_number] LOAN
C = BORROWS ×[loan_no = loan_number] LOAN (valid here, since the two attribute names differ)
A = BORROWS, B = LOAN, C = A ×[A.loan_no = B.loan_number] B
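Dot-notation can be mimicked in a Python sketch by prefixing each attribute with the table name before joining. This is illustrative only: `qualify` and `theta_join` are hypothetical helpers, and the table model is the same assumption as before. In this model the condition receives a dict, so two columns with the same unqualified name would silently overwrite each other, which is the ambiguity dot-notation avoids.

```python
from itertools import product

# Table model (illustrative assumption): (attribute names, set of rows).
BORROWS = (
    ("customer", "loan_no"),
    {
        ("111-12-0000", "L17"), ("222-12-0000", "L23"), ("333-12-0000", "L15"),
        ("444-00-0000", "L93"), ("666-12-0000", "L17"), ("111-12-0000", "L11"),
        ("999-12-0000", "L17"), ("777-12-0000", "L16"),
    },
)
LOAN = (
    ("loan_number", "amount", "branch_name"),
    {
        ("L17", 1000, "Downtown"), ("L23", 2000, "Redwood"),
        ("L15", 1500, "Pennyridge"), ("L93", 500, "Mianus"),
        ("L11", 900, "Round Hill"), ("L16", 1300, "Pennyridge"),
    },
)


def theta_join(left, right, condition):
    """left x[condition] right: cartesian product followed by a selection."""
    (left_attrs, left_rows), (right_attrs, right_rows) = left, right
    attrs = left_attrs + right_attrs
    joined = {lrow + rrow for lrow, rrow in product(left_rows, right_rows)}
    return attrs, {row for row in joined if condition(dict(zip(attrs, row)))}


def qualify(name, table):
    """Prefix every attribute with the table name, e.g. loan_no -> A.loan_no."""
    attrs, rows = table
    return tuple(f"{name}.{attr}" for attr in attrs), rows


# A = BORROWS, B = LOAN, C = A x[A.loan_no = B.loan_number] B
A = qualify("A", BORROWS)
B = qualify("B", LOAN)
C_dotted = theta_join(A, B, lambda t: t["A.loan_no"] == t["B.loan_number"])

# C = BORROWS x[loan_no = loan_number] LOAN (the names happen to be distinct)
C_plain = theta_join(BORROWS, LOAN, lambda t: t["loan_no"] == t["loan_number"])

print(C_dotted[0])
print(C_dotted[1] == C_plain[1])  # prints: True
```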
Relational Algebra: set-theoretic operations, ∪
Since a table is a set of tuples, it is possible to take the union of two tables. BUT: we require closure (the union of two tables should be a table).
Union is defined only for two tables with compatible schemas (the same number of attributes, with matching domains).
Example: Find the names of customers who have either a deposit or a loan with the bank
A = π[customer] (BORROWS) ∪ π[c_ssn] (DEPOSIT)
RESULT = π[name] (A ×[A.customer = CUSTOMER.ssn] CUSTOMER)
name
Jones
Smith
Hayes
Curry
Turner
Williams
Adams
Johnson
Brooks
Lindsay
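Step A of this query can be sketched in Python with the built-in set union, assuming the same illustrative table model; `project` and `union` are hypothetical helpers. The compatibility check below only compares the number of attributes (a simplification of "matching domains"), and the result keeps the left table's attribute name, customer. The name lookup in RESULT would reuse the θ-join shown earlier.

```python
# Table model (illustrative assumption): (attribute names, set of rows).
BORROWS = (
    ("customer", "loan_no"),
    {
        ("111-12-0000", "L17"), ("222-12-0000", "L23"), ("333-12-0000", "L15"),
        ("444-00-0000", "L93"), ("666-12-0000", "L17"), ("111-12-0000", "L11"),
        ("999-12-0000", "L17"), ("777-12-0000", "L16"),
    },
)
DEPOSIT = (
    ("c_ssn", "ac_num", "access_date"),
    {
        ("888-12-0000", "A101", "Jan 1, 09"), ("222-12-0000", "A215", "Feb 1, 09"),
        ("333-12-0000", "A102", "Feb 28, 09"), ("555-00-0000", "A305", "Mar 10, 09"),
        ("888-12-0000", "A201", "Mar 1, 98"), ("111-12-0000", "A217", "Mar 1, 09"),
        ("000-12-0000", "A101", "Feb 25, 09"),
    },
)


def project(table, keep):
    """pi[keep](table), with duplicates removed."""
    attrs, rows = table
    positions = [attrs.index(name) for name in keep]
    return tuple(keep), {tuple(row[i] for i in positions) for row in rows}


def union(left, right):
    """left UNION right; only the number of attributes is checked here."""
    (left_attrs, left_rows), (right_attrs, right_rows) = left, right
    if len(left_attrs) != len(right_attrs):
        raise ValueError("union requires tables with the same number of attributes")
    return left_attrs, left_rows | right_rows


# A = pi[customer](BORROWS) UNION pi[c_ssn](DEPOSIT)
A = union(project(BORROWS, ["customer"]), project(DEPOSIT, ["c_ssn"]))
print(A[0], len(A[1]))  # prints: ('customer',) 10
```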
Relational Algebra: set-theoretic operations, ∩
Other set-theoretic operations can be applied with the same rules.
Example: Find the names of customers who have both a deposit and a loan with the bank
A = π[customer] (BORROWS) ∩ π[c_ssn] (DEPOSIT)
RESULT = π[name] (A ×[A.customer = CUSTOMER.ssn] CUSTOMER)
name
Jones
Smith
Hayes
c_ssn
888-12-0000
222-12-0000
333-12-0000
555-00-0000
111-12-0000
000-12-0000
customer
111-12-0000
222-12-0000
333-12-0000
444-00-0000
666-12-0000
999-12-0000
777-12-0000
customer
111-12-0000
222-12-0000
333-12-0000
=
RESULT
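The full intersection query, including the name lookup, can be sketched in Python. As before, the table model and the helpers `project`, `intersect` and `theta_join` are illustrative assumptions; CUSTOMER is reduced to π[ssn, name] (CUSTOMER), the only two columns the query needs.

```python
from itertools import product

# Table model (illustrative assumption): (attribute names, set of rows).
BORROWS = (
    ("customer", "loan_no"),
    {
        ("111-12-0000", "L17"), ("222-12-0000", "L23"), ("333-12-0000", "L15"),
        ("444-00-0000", "L93"), ("666-12-0000", "L17"), ("111-12-0000", "L11"),
        ("999-12-0000", "L17"), ("777-12-0000", "L16"),
    },
)
DEPOSIT = (
    ("c_ssn", "ac_num", "access_date"),
    {
        ("888-12-0000", "A101", "Jan 1, 09"), ("222-12-0000", "A215", "Feb 1, 09"),
        ("333-12-0000", "A102", "Feb 28, 09"), ("555-00-0000", "A305", "Mar 10, 09"),
        ("888-12-0000", "A201", "Mar 1, 98"), ("111-12-0000", "A217", "Mar 1, 09"),
        ("000-12-0000", "A101", "Feb 25, 09"),
    },
)
# pi[ssn, name](CUSTOMER)
CUSTOMER = (
    ("ssn", "name"),
    {
        ("111-12-0000", "Jones"), ("222-12-0000", "Smith"), ("333-12-0000", "Hayes"),
        ("444-12-0000", "Curry"), ("555-12-0000", "Turner"), ("666-12-0000", "Williams"),
        ("777-12-0000", "Adams"), ("888-12-0000", "Johnson"), ("999-12-0000", "Brooks"),
        ("000-12-0000", "Lindsay"),
    },
)


def project(table, keep):
    """pi[keep](table), with duplicates removed."""
    attrs, rows = table
    positions = [attrs.index(name) for name in keep]
    return tuple(keep), {tuple(row[i] for i in positions) for row in rows}


def intersect(left, right):
    """left INTERSECT right; only the number of attributes is checked here."""
    (left_attrs, left_rows), (right_attrs, right_rows) = left, right
    if len(left_attrs) != len(right_attrs):
        raise ValueError("intersection requires tables with the same number of attributes")
    return left_attrs, left_rows & right_rows


def theta_join(left, right, condition):
    """left x[condition] right: cartesian product followed by a selection."""
    (left_attrs, left_rows), (right_attrs, right_rows) = left, right
    attrs = left_attrs + right_attrs
    joined = {lrow + rrow for lrow, rrow in product(left_rows, right_rows)}
    return attrs, {row for row in joined if condition(dict(zip(attrs, row)))}


A = intersect(project(BORROWS, ["customer"]), project(DEPOSIT, ["c_ssn"]))
joined = theta_join(A, CUSTOMER, lambda t: t["customer"] == t["ssn"])
RESULT = project(joined, ["name"])
print(sorted(name for (name,) in RESULT[1]))  # prints: ['Hayes', 'Jones', 'Smith']
```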
Relational Algebra: set-theoretic operations, -
Other set-theoretic operations (same rules).
Example: Find the names of customers who have a deposit but no loan.
A = π[c_ssn] (DEPOSIT) - π[customer] (BORROWS)
RESULT = π[name] (A ×[A.c_ssn = CUSTOMER.ssn] CUSTOMER)
name
Johnson
Turner
Lindsay
c_ssn
888-12-0000
222-12-0000
333-12-0000
555-00-0000
111-12-0000
000-12-0000
customer
111-12-0000
222-12-0000
333-12-0000
444-00-0000
666-12-0000
999-12-0000
777-12-0000
c_ssn
888-12-0000
555-12-0000
000-12-0000
=
RESULT
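Set difference is not symmetric, so the order of the operands decides which question is answered. The Python sketch below (same illustrative table model; `project` and `difference` are hypothetical helpers) computes both orders: DEPOSIT minus BORROWS gives the SSNs of depositors without a loan, while BORROWS minus DEPOSIT gives borrowers without a deposit.

```python
# Table model (illustrative assumption): (attribute names, set of rows).
BORROWS = (
    ("customer", "loan_no"),
    {
        ("111-12-0000", "L17"), ("222-12-0000", "L23"), ("333-12-0000", "L15"),
        ("444-00-0000", "L93"), ("666-12-0000", "L17"), ("111-12-0000", "L11"),
        ("999-12-0000", "L17"), ("777-12-0000", "L16"),
    },
)
DEPOSIT = (
    ("c_ssn", "ac_num", "access_date"),
    {
        ("888-12-0000", "A101", "Jan 1, 09"), ("222-12-0000", "A215", "Feb 1, 09"),
        ("333-12-0000", "A102", "Feb 28, 09"), ("555-00-0000", "A305", "Mar 10, 09"),
        ("888-12-0000", "A201", "Mar 1, 98"), ("111-12-0000", "A217", "Mar 1, 09"),
        ("000-12-0000", "A101", "Feb 25, 09"),
    },
)


def project(table, keep):
    """pi[keep](table), with duplicates removed."""
    attrs, rows = table
    positions = [attrs.index(name) for name in keep]
    return tuple(keep), {tuple(row[i] for i in positions) for row in rows}


def difference(left, right):
    """left - right: rows of left that are not rows of right.

    Only the number of attributes is checked (a simplification); the result
    keeps the left table's attribute names.
    """
    (left_attrs, left_rows), (right_attrs, right_rows) = left, right
    if len(left_attrs) != len(right_attrs):
        raise ValueError("difference requires tables with the same number of attributes")
    return left_attrs, left_rows - right_rows


with_loan = project(BORROWS, ["customer"])
with_deposit = project(DEPOSIT, ["c_ssn"])

deposit_but_no_loan = difference(with_deposit, with_loan)  # pi[c_ssn](DEPOSIT) - pi[customer](BORROWS)
loan_but_no_deposit = difference(with_loan, with_deposit)  # pi[customer](BORROWS) - pi[c_ssn](DEPOSIT)
print(deposit_but_no_loan[0], sorted(deposit_but_no_loan[1]))
print(loan_but_no_deposit[0], sorted(loan_but_no_deposit[1]))
```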
Relational Algebra: set-theoretic operations, ÷
Set division extends the meaning of integer division, in the sense that it ‘cancels away’ common multiples. It is useful in answering ‘for all’ queries.
Example: Do all the loan officers have the same manager?
A solution: Find the ssn of the person who manages all the loan officers.
A = π[banker] (σ[b_type = ‘LO’] (CUSTOMER))
B = π[mgr_ssn, e_ssn] (EMPLOYEE)
RESULT = B ÷ A
A
banker
333-11-4444
123-45-6789
B
mgr_ssn e_ssn
321-32-4321 111-22-3333
111-22-3333 333-11-4444
111-22-3333 123-45-6789
321-32-4321 555-66-8888
888-99-9999 987-65-4321
777-77-7777 888-99-9999
777-77-7777 321-32-4321
null 777-77-7777
RESULT = B ÷ A
mgr_ssn
111-22-3333
Note: for this example, we have to specify that the common divisor in B is e_ssn (which plays the role of banker in A).
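The division example can be checked with a short Python sketch, assuming the same illustrative table model; `divide` is a hypothetical helper. It matches the common attributes by position (the last columns of the dividend), which is one way to express the note that e_ssn in B plays the role of banker in A. CUSTOMER is reduced to π[ssn, banker, b_type] (CUSTOMER), and A is computed with a comprehension that stands in for π[banker] (σ[b_type = ‘LO’] (CUSTOMER)).

```python
# pi[ssn, banker, b_type](CUSTOMER)
CUSTOMER = (
    ("ssn", "banker", "b_type"),
    {
        ("111-12-0000", "321-32-4321", "CRM"), ("222-12-0000", "321-32-4321", "CRM"),
        ("333-12-0000", "321-32-4321", "CRM"), ("444-12-0000", "333-11-4444", "LO"),
        ("555-12-0000", "888-99-9999", "DO"), ("666-12-0000", "333-11-4444", "LO"),
        ("777-12-0000", "123-45-6789", "LO"), ("888-12-0000", "888-99-9999", "DO"),
        ("999-12-0000", "123-45-6789", "LO"), ("000-12-0000", "888-99-9999", "DO"),
    },
)
# B = pi[mgr_ssn, e_ssn](EMPLOYEE); None stands for null
B = (
    ("mgr_ssn", "e_ssn"),
    {
        ("321-32-4321", "111-22-3333"), ("111-22-3333", "333-11-4444"),
        ("111-22-3333", "123-45-6789"), ("321-32-4321", "555-66-8888"),
        ("888-99-9999", "987-65-4321"), ("777-77-7777", "888-99-9999"),
        ("777-77-7777", "321-32-4321"), (None, "777-77-7777"),
    },
)

# A = pi[banker](sigma[b_type = 'LO'](CUSTOMER))
A = (("banker",), {(banker,) for (_ssn, banker, b_type) in CUSTOMER[1] if b_type == "LO"})


def divide(dividend, divisor):
    """dividend(R, C) / divisor(C).

    Returns each R-value r of the dividend such that, for every row c of the
    divisor, the row r + c is in the dividend. C is matched by position: the
    last k columns of the dividend, where k is the divisor's width.
    """
    (d_attrs, d_rows), (s_attrs, s_rows) = dividend, divisor
    k = len(s_attrs)
    r_attrs = d_attrs[:-k]
    candidates = {row[:-k] for row in d_rows}
    return r_attrs, {r for r in candidates if all(r + c in d_rows for c in s_rows)}


RESULT = divide(B, A)
print(RESULT)  # prints: (('mgr_ssn',), {('111-22-3333',)})
```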
Relational Algebra: set theoretic operations, ÷
Generic definition of ÷
Attribute restrictions: A ÷ B is defined only for A(R, C) and B(C), where R, C are sets of attributes.
Output: the output contains each ti[R] such that ∀ tuples tj[C] ∈ B, ∃ a tuple t ∈ A in which t[C] = tj[C] and t[R] = ti[R].
[Figure: A has attribute set R = (r1 … rm) and common attribute set C = (c1 … ck), with tuples t1 … tn; A ÷ B(c1 … ck) = OUTPUT(r1 … rm)]
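The definition above can be written in set-builder form. Division can also be expressed with the other RA operators; the second line is a standard textbook identity (stated here for reference, not taken from these slides), with R the attributes of A that are not in B:

```latex
A \div B = \{\, t_i[R] \mid t_i \in A \;\wedge\; \forall\, t_j \in B \;\exists\, t \in A :\;
           t[C] = t_j[C] \,\wedge\, t[R] = t_i[R] \,\}

A \div B = \pi_{R}(A) \;-\; \pi_{R}\big( (\pi_{R}(A) \times B) - A \big)
```

Here π_R(A) × B lists every candidate pairing, subtracting A leaves the pairings that are missing, and removing their R-values from π_R(A) keeps exactly the R-values paired with all of B.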
Relational Algebra: concluding remarks
RA provides a formal language to get information from the database
RA can potentially answer any query, as long as the query pertains to exactly one row of some table derivable using expressions.
Limitations of RA: aggregation and summary information. Examples:
find the average amount of assets in the branches
find the total assets of the bank, …
RA is procedural, namely, an expression of RA specifies a step-by-step procedure for computing the result.
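The operators introduced here only filter, combine and rearrange existing values, so totals and averages have to be computed outside them (SQL later provides aggregate functions such as SUM and AVG for this). A minimal Python sketch over the BRANCH data from the "Bank tables" slide, using the same illustrative table model:

```python
# Table model (illustrative assumption): (attribute names, set of rows).
BRANCH = (
    ("branch_name", "city", "assets"),
    {
        ("Downtown", "Brooklyn", 9000000), ("Redwood", "Palo Alto", 2100000),
        ("Pennyridge", "Horseneck", 1700000), ("Mianus", "Horseneck", 400000),
        ("Round Hill", "Horseneck", 8000000), ("Pownal", "Bennington", 300000),
        ("North Town", "Rye", 3700000), ("Brighton", "Brooklyn", 7100000),
    },
)

attrs, rows = BRANCH
pos = attrs.index("assets")
# Collect assets in a list, not a set: a set-based projection would merge
# branches that happen to have equal assets and distort the sum.
assets = [row[pos] for row in rows]
total_assets = sum(assets)
average_assets = total_assets / len(assets)
print(total_assets, average_assets)  # prints: 32300000 4037500.0
```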
Relational Calculus (RC)
Background: what is a calculus ?
RC is based on a formal system in logic, first order predicate calculus (fopc)
A formal system has:
a set of symbols;
rules about how the symbols can be arranged in well-formed formulae (wff);
a (logical) mechanism to derive whether a wff is true or false.
Additionally, fopc allows wff with ‘variables’ and quantifiers (∀, ∃).
A query in RC takes the form: {t | P(t) }
Meaning: the set of all tuples, t, for which some proposition, P(t), is true. P is also called a predicate.
Relational Calculus (RC) examples
1. Report the loans that exceed $1200:
{ t | t ∈ LOAN ∧ t[amount] > 1200 }
2. Find the names of customers who took a loan from the Pennyridge branch.
{ t[name] | t ∈ CUSTOMER ∧ ∃ s ∈ BORROWS ( s[customer] = t[ssn] ∧ ∃ u ∈ LOAN ( u[loan_number] = s[loan_no] ∧ u[branch_name] = ‘Pennyridge’ ) ) }
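Python comprehensions read much like the set-builder notation of RC, with `any(...)` playing the role of ∃. The sketch below is illustrative only: it stores tables as lists of dicts so that `t["amount"]` mirrors t[amount], and CUSTOMER is reduced to its ssn and name columns. Unlike RC, a comprehension does fix an evaluation order (loop nesting), which is exactly the procedural detail RC leaves open.

```python
# Tables as lists of dicts (illustrative assumption).
LOAN = [
    {"loan_number": n, "amount": a, "branch_name": b}
    for n, a, b in [
        ("L17", 1000, "Downtown"), ("L23", 2000, "Redwood"),
        ("L15", 1500, "Pennyridge"), ("L93", 500, "Mianus"),
        ("L11", 900, "Round Hill"), ("L16", 1300, "Pennyridge"),
    ]
]
BORROWS = [
    {"customer": c, "loan_no": l}
    for c, l in [
        ("111-12-0000", "L17"), ("222-12-0000", "L23"), ("333-12-0000", "L15"),
        ("444-00-0000", "L93"), ("666-12-0000", "L17"), ("111-12-0000", "L11"),
        ("999-12-0000", "L17"), ("777-12-0000", "L16"),
    ]
]
CUSTOMER = [
    {"ssn": s, "name": n}
    for s, n in [
        ("111-12-0000", "Jones"), ("222-12-0000", "Smith"), ("333-12-0000", "Hayes"),
        ("444-12-0000", "Curry"), ("555-12-0000", "Turner"), ("666-12-0000", "Williams"),
        ("777-12-0000", "Adams"), ("888-12-0000", "Johnson"), ("999-12-0000", "Brooks"),
        ("000-12-0000", "Lindsay"),
    ]
]

# 1. { t | t ∈ LOAN ∧ t[amount] > 1200 }
q1 = [t for t in LOAN if t["amount"] > 1200]

# 2. { t[name] | t ∈ CUSTOMER ∧ ∃ s ∈ BORROWS (... ∧ ∃ u ∈ LOAN (...)) }
q2 = {
    t["name"]
    for t in CUSTOMER
    if any(
        s["customer"] == t["ssn"]
        and any(
            u["loan_number"] == s["loan_no"] and u["branch_name"] == "Pennyridge"
            for u in LOAN
        )
        for s in BORROWS
    )
}

print(sorted(t["loan_number"] for t in q1), sorted(q2))
# prints: ['L15', 'L16', 'L23'] ['Adams', 'Hayes']
```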
Relational Calculus (RC) remarks
RC is non-procedural: any way in which the predicate P can be evaluated is valid.
RC is the formal basis for Structured Query Language (SQL)
SQL is the standard query language for relational DBMSs (an ANSI/ISO standard, supported by virtually all RDBMSs)
In terms of functionality (i.e. the power to get some information from any DB), RA and RC (restricted to safe expressions) are equivalent. Namely, any query that can be written in safe RC has an equivalent RA expression, and vice versa.
Advantage of RC (over RA): conceptually, it is better to allow the user to define the logic of the query, but leave the procedure for computing it to the program [why?].
Bank tables
BRANCH
branch_name city assets
Downtown Brooklyn 9000000
Redwood Palo Alto 2100000
Pennyridge Horseneck 1700000
Mianus Horseneck 400000
Round Hill Horseneck 8000000
Pownal Bennington 300000
North Town Rye 3700000
Brighton Brooklyn 7100000
EMPLOYEE
e_ssn e_name tel start_date mgr_ssn
111-22-3333 Jones 12345 Nov-2005 321-32-4321
333-11-4444 Smith 54321 Mar-1998 111-22-3333
123-45-6789 Lee 54321 Mar-1998 111-22-3333
555-66-8888 Turner 55555 Aug-2002 321-32-4321
987-65-4321 Jones 87621 Mar-1995 888-99-9999
888-99-9999 Chan 87654 Feb-1980 777-77-7777
321-32-4321 Adams 77777 Feb-1990 777-77-7777
777-77-7777 Black 99111 Jan-1980 null
CUSTOMER
ssn name street city banker b_type
111-12-0000 Jones Main Harrison 321-32-4321 CRM
222-12-0000 Smith North Rye 321-32-4321 CRM
333-12-0000 Hayes Main Harrison 321-32-4321 CRM
444-12-0000 Curry North Rye 333-11-4444 LO
555-12-0000 Turner Putnam Stamford 888-99-9999 DO
666-12-0000 Williams Nassau Princeton 333-11-4444 LO
777-12-0000 Adams Spring Pittsfield 123-45-6789 LO
888-12-0000 Johnson Alma Palo Alto 888-99-9999 DO
999-12-0000 Brooks Senator Brooklyn 123-45-6789 LO
000-12-0000 Lindsay Park Pittsfield 888-99-9999 DO
DEPOSIT
c_ssn ac_num accessDate
888-12-0000 A101 Jan 1, 09
222-12-0000 A215 Feb 1, 09
333-12-0000 A102 Feb 28, 09
555-00-0000 A305 Mar 10, 09
888-12-0000 A201 Mar 1, 98
111-12-0000 A217 Mar 1, 09
000-12-0000 A101 Feb 25, 09
BORROWS
customer loan_no
111-12-0000 L17
222-12-0000 L23
333-12-0000 L15
444-00-0000 L93
666-12-0000 L17
111-12-0000 L11
999-12-0000 L17
777-12-0000 L16
LOAN
loan_number amount branch_name
L17 1000 Downtown
L23 2000 Redwood
L15 1500 Pennyridge
L93 500 Mianus
L11 900 Round Hill
L16 1300 Pennyridge
References and Further Reading
Silberschatz, Korth, Sudarshan, Database System Concepts, McGraw-Hill
Next: SQL and DB APIs