IELM 511: Information System design

Date post: 19-Mar-2016
Upload: faraji
Page 1: IELM 511: Information System design

IELM 511: Information System design

Introduction

Part I. ISD for well structured data – relational and other DBMS
- Info storage (modeling, normalization)
- Info retrieval (Relational algebra, Calculus, SQL)
- DB integrated APIs

Part II. ISD for systems with non-uniformly structured data
- Basics of web-based IS (www, web2.0, …)
- Markups, HTML, XML
- Design tools for Info Sys: UML

Part III: (one out of)
- APIs for mobile apps
- Security, Cryptography
- IS product lifecycles
- Algorithm analysis, P, NP, NPC

Page 2: IELM 511: Information System design

Agenda

Relational Algebra

Relational Calculus

Structured Query Language (SQL)

DB APIs

Page 3: IELM 511: Information System design

Recall our Bank DB design

BRANCH( b-name, city, assets)

CUSTOMER( cssn, c-name, street, city, banker, banker-type)

LOAN( l-no, amount, br-name)

PAYMENT( l-no, pay-no, date, amount)

EMPLOYEE( e-ssn, e-name, tel, start-date, mgr-ssn)

ACCOUNT( ac-no, balance)

SACCOUNT( ac-no, int-rate)

CACCOUNT( ac-no, od-amt)

BORROWS( cust-ssn, loan-num)

DEPOSIT( c-ssn, ac-num, access-date)

DEPENDENT( emp-ssn, dep-name)

Page 4: IELM 511: Information System design

Background: Algebra

What is an algebra ?

Study of systems of mathematical objects and operations defined on the objects

Examples of algebras:

Integers, with operations: +, -, ×, /, % …

Real numbers, with operations: +, -, ×, /, …

Vectors, with operations: +, -, ·, ×, …

Booleans, with operations: ∧, ∨, ¬, …

Page 5: IELM 511: Information System design

Relational Algebra

Relational Algebra:
- objects: instances of relational schemas (namely, tables)
- operations: σ (select), π (project), × (join)
- set-theoretic operations: ∪, ∩, -, ÷

Key concepts:

Operator arguments: Arguments of operators are instances of schemas (tables)

Operation closure: The outcome of an operator is an instance of a schema

Expressions: A sequence of operations can be written as an expression

Operator precedence: The sequence of application of operations in an expression is fixed.

Compare these concepts to those in other algebras

Page 6: IELM 511: Information System design

Relational Algebra: select, σ

σ: unary operator, input: one table; output: table

Notation: in the remainder, we will refer to an instance of a schema as a table

LOAN
loan_number  amount  branch_name
L17          1000    Downtown
L23          2000    Redwood
L15          1500    Pennyridge
L93          500     Mianus
L11          900     Round Hill
L16          1300    Pennyridge

σ[amount > 1200] (LOAN)

loan_number  amount  branch_name
L23          2000    Redwood
L15          1500    Pennyridge
L16          1300    Pennyridge

Page 7: IELM 511: Information System design

Relational Algebra: select, σ

Conditions of the σ operator:
- Denote the criterion for selection of a given tuple
- Must be evaluated one tuple at a time
- Must evaluate to ‘true’ or ‘false’
- Output = set of tuples for which the σ-conditions are ‘true’

LOAN
loan_number  amount  branch_name
L17          1000    Downtown
L23          2000    Redwood
L15          1500    Pennyridge
L93          500     Mianus
L11          900     Round Hill
L16          1300    Pennyridge

σ[(amount > 1200) ∧ (branch_name = ‘Pennyridge’)] (LOAN)

loan_number  amount  branch_name
L15          1500    Pennyridge
L16          1300    Pennyridge
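
As a rough illustration of select, the sketch below treats a table as a Python list of dicts and a selection condition as a function that is applied to one tuple at a time and returns true or false. The names LOAN and select are hypothetical helpers written for this example; this is not how a DBMS stores or scans tables.

```python
# Illustrative in-memory copy of the LOAN table: one dict per tuple.
LOAN = [
    {"loan_number": "L17", "amount": 1000, "branch_name": "Downtown"},
    {"loan_number": "L23", "amount": 2000, "branch_name": "Redwood"},
    {"loan_number": "L15", "amount": 1500, "branch_name": "Pennyridge"},
    {"loan_number": "L93", "amount": 500, "branch_name": "Mianus"},
    {"loan_number": "L11", "amount": 900, "branch_name": "Round Hill"},
    {"loan_number": "L16", "amount": 1300, "branch_name": "Pennyridge"},
]


def select(table, condition):
    """Return the tuples of `table` for which `condition` is true.

    `condition` is called once per tuple and must return a bool.
    """
    return [row for row in table if condition(row)]


# Single condition: amount > 1200
big_loans = select(LOAN, lambda t: t["amount"] > 1200)

# Two conditions combined with a logical AND
big_pennyridge = select(
    LOAN,
    lambda t: t["amount"] > 1200 and t["branch_name"] == "Pennyridge",
)

print([t["loan_number"] for t in big_loans])       # ['L23', 'L15', 'L16']
print([t["loan_number"] for t in big_pennyridge])  # ['L15', 'L16']
```

The result of select is again a list of dicts with the same attributes as the input, which is the closure property mentioned earlier.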

Page 8: IELM 511: Information System design

Relational Algebra: project, π

π: unary operator, input: one table; output: table

π[list of attributes] (TABLE)

LOAN
loan_number  amount  branch_name
L17          1000    Downtown
L23          2000    Redwood
L15          1500    Pennyridge
L93          500     Mianus
L11          900     Round Hill
L16          1300    Pennyridge

π[loan_number, amount] (LOAN)

loan_number  amount
L17          1000
L23          2000
L15          1500
L93          500
L11          900
L16          1300

Page 9: IELM 511: Information System design

Relational Algebra: project, π

Project returns a set of tuples; the number of rows may be smaller than in the input

Example: Find the names of all branches that have given loans

π[branch_name] (LOAN)

LOAN
loan_number  amount  branch_name
L17          1000    Downtown
L23          2000    Redwood
L15          1500    Pennyridge
L93          500     Mianus
L11          900     Round Hill
L16          1300    Pennyridge

branch_name
Downtown
Redwood
Pennyridge
Mianus
Round Hill
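
A minimal sketch of project, assuming the same list-of-dicts representation of a table. The helper name project is hypothetical. Because the output is a set of tuples, duplicates are dropped, which is why projecting LOAN on branch_name gives 5 rows from 6.

```python
# Illustrative in-memory copy of the LOAN table: one dict per tuple.
LOAN = [
    {"loan_number": "L17", "amount": 1000, "branch_name": "Downtown"},
    {"loan_number": "L23", "amount": 2000, "branch_name": "Redwood"},
    {"loan_number": "L15", "amount": 1500, "branch_name": "Pennyridge"},
    {"loan_number": "L93", "amount": 500, "branch_name": "Mianus"},
    {"loan_number": "L11", "amount": 900, "branch_name": "Round Hill"},
    {"loan_number": "L16", "amount": 1300, "branch_name": "Pennyridge"},
]


def project(table, attributes):
    """Keep only `attributes` and drop duplicate tuples (first-seen order)."""
    seen = set()
    result = []
    for row in table:
        key = tuple(row[a] for a in attributes)
        if key not in seen:  # set semantics: each tuple appears once
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result


branches = project(LOAN, ["branch_name"])
loan_amounts = project(LOAN, ["loan_number", "amount"])

print(len(LOAN), len(branches), len(loan_amounts))  # 6 5 6
```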

Page 10: IELM 511: Information System design

Relational Algebra: combining operations

Example: Find the names of all branches that have given loans larger than 1200

π[branch_name] (σ[amount > 1200] (LOAN))

LOAN
loan_number  amount  branch_name
L17          1000    Downtown
L23          2000    Redwood
L15          1500    Pennyridge
L93          500     Mianus
L11          900     Round Hill
L16          1300    Pennyridge

branch_name
Redwood
Pennyridge

Page 11: IELM 511: Information System design

Relational Algebra: combining operations

Example: Find the names of all branches that have given loans larger than 1200

X = σ[amount > 1200] (LOAN)
Y = π[branch_name] (X)

Note: expressions impose a sequence in which operations are performed

LOAN
loan_number  amount  branch_name
L17          1000    Downtown
L23          2000    Redwood
L15          1500    Pennyridge
L93          500     Mianus
L11          900     Round Hill
L16          1300    Pennyridge

X
loan_number  amount  branch_name
L23          2000    Redwood
L15          1500    Pennyridge
L16          1300    Pennyridge

Y
branch_name
Redwood
Pennyridge
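
The two-step evaluation can be sketched in Python as follows. The tuple layout and the helper names select_amount_over and project_branch are assumptions made for this example. The nested call and the step-by-step version give the same result, because the output of the inner operation is itself a table that the outer operation can consume.

```python
# LOAN as (loan_number, amount, branch_name) tuples; illustrative copy of the data.
LOAN = [
    ("L17", 1000, "Downtown"),
    ("L23", 2000, "Redwood"),
    ("L15", 1500, "Pennyridge"),
    ("L93", 500, "Mianus"),
    ("L11", 900, "Round Hill"),
    ("L16", 1300, "Pennyridge"),
]


def select_amount_over(table, limit):
    """Selection: keep tuples whose amount exceeds `limit`."""
    return [t for t in table if t[1] > limit]


def project_branch(table):
    """Projection on branch_name; dict.fromkeys drops duplicates, keeps order."""
    return list(dict.fromkeys(t[2] for t in table))


# Step by step, as in X = ..., Y = ...
X = select_amount_over(LOAN, 1200)
Y = project_branch(X)

# The same query as a single nested expression
nested = project_branch(select_amount_over(LOAN, 1200))

print(Y)            # ['Redwood', 'Pennyridge']
print(Y == nested)  # True
```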

Page 12: IELM 511: Information System design

Relational Algebra: join, ×

Join is useful when the information required is in two (or more) tables.

Tables are sets of tuples, and the join of two tables produces a cartesian product of the two sets

Background (set theory): cartesian product, A × B = { (x, y) | x ∈ A, y ∈ B }

Example: A = { 1, 2, 3 }, B = { a, s }

A × B = { (1, a), (1, s), (2, a), (2, s), (3, a), (3, s) }
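
For readers who want to check the example, Python's standard itertools.product enumerates exactly these pairs. This is only an illustration of the set-theoretic definition, with lists standing in for the sets A and B.

```python
from itertools import product

A = [1, 2, 3]
B = ["a", "s"]

# Every element of A paired with every element of B
pairs = list(product(A, B))

print(pairs)       # [(1, 'a'), (1, 's'), (2, 'a'), (2, 's'), (3, 'a'), (3, 's')]
print(len(pairs))  # 6, i.e. len(A) * len(B)
```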

Page 13: IELM 511: Information System design

Relational Algebra: join, ×

Cartesian product, BORROWS × LOAN

BORROWS
customer     loan_no
111-12-0000  L17
222-12-0000  L23
333-12-0000  L15
444-12-0000  L93
666-12-0000  L17
111-12-0000  L11
999-12-0000  L17
777-12-0000  L16

LOAN
loan_number  amount  branch_name
L17          1000    Downtown
L23          2000    Redwood
L15          1500    Pennyridge
L93          500     Mianus
L11          900     Round Hill
L16          1300    Pennyridge

BORROWS × LOAN: 5 columns, 48 rows (8 × 6)

customer     loan_no  loan_number  amount  branch_name
111-12-0000  L17      L17          1000    Downtown
111-12-0000  L17      L23          2000    Redwood
111-12-0000  L17      L15          1500    Pennyridge
…
777-12-0000  L16      L16          1300    Pennyridge

Page 14: IELM 511: Information System design

Relational Algebra: join, ×

Usually, a cartesian product produces several tuples with unrelated information. A θ-join specifies a θ-condition (same as a selection criterion) to restrict the output of a join to meaningful tuples only.

Example: Find the loan no, amount and branch name for all customers.

BORROWS ×[loan_no = loan_number] LOAN

customer     loan_no  loan_number  amount  branch_name
111-12-0000  L17      L17          1000    Downtown
222-12-0000  L23      L23          2000    Redwood
333-12-0000  L15      L15          1500    Pennyridge
444-12-0000  L93      L93          500     Mianus
666-12-0000  L17      L17          1000    Downtown
111-12-0000  L11      L11          900     Round Hill
999-12-0000  L17      L17          1000    Downtown
777-12-0000  L16      L16          1300    Pennyridge

5 columns

8 rows [Why ?]
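
A small sketch, using plain Python tuples as an assumed stand-in for the two tables, that builds the full cartesian product and then keeps only the tuples satisfying loan_no = loan_number. It suggests one way to answer the row-count question: every loan_number occurs exactly once in LOAN, so each of the 8 BORROWS tuples matches exactly one LOAN tuple.

```python
from itertools import product

# (customer, loan_no); customer values written as in the CUSTOMER table
BORROWS = [
    ("111-12-0000", "L17"), ("222-12-0000", "L23"),
    ("333-12-0000", "L15"), ("444-12-0000", "L93"),
    ("666-12-0000", "L17"), ("111-12-0000", "L11"),
    ("999-12-0000", "L17"), ("777-12-0000", "L16"),
]

# (loan_number, amount, branch_name)
LOAN = [
    ("L17", 1000, "Downtown"), ("L23", 2000, "Redwood"),
    ("L15", 1500, "Pennyridge"), ("L93", 500, "Mianus"),
    ("L11", 900, "Round Hill"), ("L16", 1300, "Pennyridge"),
]

# Cartesian product: concatenate every BORROWS tuple with every LOAN tuple
cartesian = [b + l for b, l in product(BORROWS, LOAN)]

# Join condition: loan_no (index 1) equals loan_number (index 2)
joined = [row for row in cartesian if row[1] == row[2]]

print(len(cartesian), len(joined))  # 48 8
print(joined[0])  # ('111-12-0000', 'L17', 'L17', 1000, 'Downtown')
```

A real DBMS would normally avoid materialising all 48 tuples first; the two-step form is used here only because it mirrors the definition.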

Page 15: IELM 511: Information System design

Relational Algebra: dot-notation in join, ×

Two tables being joined may have the same attribute name (possibly denoting two different things). To distinguish the columns in the θ-join, the names of attributes use dot-notation.

The following are all equivalent:

C = BORROWS ×[loan_no = loan_number] LOAN

C = BORROWS ×[BORROWS.loan_no = LOAN.loan_number] LOAN

A = BORROWS
B = LOAN
C = A ×[A.loan_no = B.loan_number] B

Page 16: IELM 511: Information System design

Relational Algebra: set theoretic operations, ∪

Since a table is a set of tuples, it is possible to make a union of two tables. BUT: we require closure (union of two tables should be a table).

Union is defined for two tables with identical schemas.

Example: Find the names of customers who have either a deposit, or a loan with the bank

A = π[customer] (BORROWS) ∪ π[c_ssn] (DEPOSIT)

RESULT = π[name] (A ×[A.customer = CUSTOMER.ssn] CUSTOMER)

RESULT
name
Jones
Smith
Hayes
Curry
Turner
Williams
Adams
Johnson
Brooks
Lindsay

Page 17: IELM 511: Information System design

Relational Algebra: set theoretic operations, ∩

Other set theoretic operations can be applied with the same rules.

Example: Find the names of customers who have both a deposit and a loan with the bank

A = π[customer] (BORROWS) ∩ π[c_ssn] (DEPOSIT)

RESULT = π[name] (A ×[A.customer = CUSTOMER.ssn] CUSTOMER)

π[customer] (BORROWS)
customer
111-12-0000
222-12-0000
333-12-0000
444-12-0000
666-12-0000
999-12-0000
777-12-0000

∩

π[c_ssn] (DEPOSIT)
c_ssn
888-12-0000
222-12-0000
333-12-0000
555-12-0000
111-12-0000
000-12-0000

=

A
customer
111-12-0000
222-12-0000
333-12-0000

RESULT
name
Jones
Smith
Hayes

Page 18: IELM 511: Information System design

Relational Algebra: set theoretic operations, -

Other set theoretic operations (same rules).

Example: Find the names of customers who have a loan but no deposits.

A = π[customer] (BORROWS) - π[c_ssn] (DEPOSIT)

RESULT = π[name] (A ×[A.customer = CUSTOMER.ssn] CUSTOMER)

π[customer] (BORROWS)
customer
111-12-0000
222-12-0000
333-12-0000
444-12-0000
666-12-0000
999-12-0000
777-12-0000

-

π[c_ssn] (DEPOSIT)
c_ssn
888-12-0000
222-12-0000
333-12-0000
555-12-0000
111-12-0000
000-12-0000

=

A
customer
444-12-0000
666-12-0000
999-12-0000
777-12-0000

RESULT
name
Curry
Williams
Brooks
Adams
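
The three set-theoretic examples (either, both, a loan but no deposit) can be mirrored with Python's built-in set type. This is a sketch under assumptions: the two sets stand for the projections of BORROWS and DEPOSIT on the customer ssn, the ssn values are written as they appear in the CUSTOMER table, and the dict NAME_OF is a hypothetical shortcut for the final join with CUSTOMER.

```python
# Projection of BORROWS on customer, and of DEPOSIT on c_ssn (duplicates vanish in a set)
borrowers = {
    "111-12-0000", "222-12-0000", "333-12-0000", "444-12-0000",
    "666-12-0000", "999-12-0000", "777-12-0000",
}
depositors = {
    "888-12-0000", "222-12-0000", "333-12-0000",
    "555-12-0000", "111-12-0000", "000-12-0000",
}

# Hypothetical lookup standing in for the join with CUSTOMER on ssn
NAME_OF = {
    "111-12-0000": "Jones", "222-12-0000": "Smith", "333-12-0000": "Hayes",
    "444-12-0000": "Curry", "555-12-0000": "Turner", "666-12-0000": "Williams",
    "777-12-0000": "Adams", "888-12-0000": "Johnson", "999-12-0000": "Brooks",
    "000-12-0000": "Lindsay",
}


def names(ssns):
    """Map a set of ssn values to a sorted list of customer names."""
    return sorted(NAME_OF[s] for s in ssns)


either = borrowers | depositors     # union
both = borrowers & depositors       # intersection
loan_only = borrowers - depositors  # difference

print(len(either))       # 10
print(names(both))       # ['Hayes', 'Jones', 'Smith']
print(names(loan_only))  # ['Adams', 'Brooks', 'Curry', 'Williams']
```

Both operands hold values of the same kind (a single ssn attribute), which is the identical-schema requirement in miniature.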

Page 19: IELM 511: Information System design

Relational Algebra: set theoretic operations, ÷

Set division extends the meaning of integer division, in the sense that it ‘cancels away’ common multiples. It is useful in answering ‘for all’ queries.

Example: Do all the loan officers have the same manager ?
A solution: Find the ssn of the person who manages all the loan officers.

A = π[banker] (σ[b_type = ‘LO’] (CUSTOMER))

B = π[mgr_ssn, e_ssn] (EMPLOYEE)

RESULT = B ÷ A

B
mgr_ssn      e_ssn
321-32-4321  111-22-3333
111-22-3333  333-11-4444
111-22-3333  123-45-6789
321-32-4321  555-66-8888
888-99-9999  987-65-4321
777-77-7777  888-99-9999
777-77-7777  321-32-4321
null         777-77-7777

÷

A
banker
333-11-4444
123-45-6789

=

RESULT
mgr_ssn
111-22-3333

Note: for this example, we have to specify that the common divisor in B is e_ssn.

Page 20: IELM 511: Information System design

Relational Algebra: set theoretic operations, ÷

Generic definition of ÷

Attribute restrictions: A ÷ B is defined only for A(R, C) and B(C), where R, C are sets of attributes.

Output: The output contains each ti[R] such that ∀ tuples tj[C] ∈ B, ∃ a tuple t ∈ A in which t[C] = tj[C] and t[R] = ti[R].

[Figure: A, with tuples t1 … tn over attribute set R (r1 … rm) and common attribute set C (c1 … ck), divided by B over C (c1 … ck), gives OUTPUT over R (r1 … rm)]
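
The generic definition can be tried out with a short Python sketch. The function divide and its parameter names are hypothetical; dividend plays the role of the table with attributes (R, C) and divisor the role of the table with attributes C. The data is the loan-officer example, with the divisor's single attribute already named e_ssn so that it lines up with the common attribute.

```python
def divide(dividend, r_attrs, c_attrs, divisor):
    """Return each R-value of `dividend` that is paired with every C-tuple of `divisor`."""
    needed = {tuple(d[c] for c in c_attrs) for d in divisor}
    paired = {}  # R-value -> set of C-values it appears with
    for row in dividend:
        r = tuple(row[a] for a in r_attrs)
        c = tuple(row[a] for a in c_attrs)
        paired.setdefault(r, set()).add(c)
    # Keep R-values whose C-values cover everything in the divisor
    return [r for r, cs in paired.items() if needed <= cs]


# Projection of EMPLOYEE on (mgr_ssn, e_ssn); None stands for null
managed_by = [
    {"mgr_ssn": "321-32-4321", "e_ssn": "111-22-3333"},
    {"mgr_ssn": "111-22-3333", "e_ssn": "333-11-4444"},
    {"mgr_ssn": "111-22-3333", "e_ssn": "123-45-6789"},
    {"mgr_ssn": "321-32-4321", "e_ssn": "555-66-8888"},
    {"mgr_ssn": "888-99-9999", "e_ssn": "987-65-4321"},
    {"mgr_ssn": "777-77-7777", "e_ssn": "888-99-9999"},
    {"mgr_ssn": "777-77-7777", "e_ssn": "321-32-4321"},
    {"mgr_ssn": None, "e_ssn": "777-77-7777"},
]

# The loan officers (bankers with b_type 'LO'), renamed to e_ssn
loan_officers = [{"e_ssn": "333-11-4444"}, {"e_ssn": "123-45-6789"}]

result = divide(managed_by, ["mgr_ssn"], ["e_ssn"], loan_officers)
print(result)  # [('111-22-3333',)]
```

Only 111-22-3333 appears together with both loan officers, so it is the only value that survives the division.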

Page 21: IELM 511: Information System design

Relational Algebra: concluding remarks

RA provides a formal language to get information from the database

RA can potentially answer any query, as long as the query pertains to exactly one row of some table derivable using expressions.

Limitations of RA: aggregation and summary information
Examples:
- find the average amount of assets in the branches
- find the total assets of the bank, …

RA is procedural, namely, an expression of RA specifies a step-by-step procedure for computing the result.
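
To make the limitation concrete: the two example queries need a whole column folded into a single number, which none of the operators presented here does. In a host language this is a short computation. The sketch below assumes the branch assets from the BRANCH table are held in a plain Python dict; it is not relational algebra.

```python
# branch_name -> assets, copied from the BRANCH table
ASSETS = {
    "Downtown": 9000000,
    "Redwood": 2100000,
    "Pennyridge": 1700000,
    "Mianus": 400000,
    "Round Hill": 8000000,
    "Pownal": 300000,
    "North Town": 3700000,
    "Brighton": 7100000,
}

total_assets = sum(ASSETS.values())         # one value summarising all rows
average_assets = total_assets / len(ASSETS)

print(total_assets)    # 32300000
print(average_assets)  # 4037500.0
```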

Page 22: IELM 511: Information System design

Relational Calculus (RC)

Background: what is a calculus ?

RC is based on a formal system in logic, first order predicate calculus (fopc)

A formal system has:
- a set of symbols;
- rules about how the symbols can be arranged in well formed formulae (wff);
- a (logical) mechanism to derive if a wff is true/false.
Additionally, fopc allows wff with ‘variables’ and quantifiers (∀, ∃).

A query in RC takes the form: { t | P(t) }

Meaning: the set of all tuples, t, for which some Proposition, P(t), is true. P is also called a predicate.

Page 23: IELM 511: Information System design

Relational Calculus (RC) examples

1. Report the loans that exceed $1200:

{ t | t ∈ LOAN ∧ t[amount] > 1200 }

2. Find the names of customers who took a loan from the Pennyridge branch.

{ t[name] | t ∈ CUSTOMER ∧ ∃ s ∈ BORROWS (s[customer] = t[ssn] ∧ ∃ u ∈ LOAN (u[loan_number] = s[loan_no] ∧ u[branch_name] = ‘Pennyridge’)) }
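
Python set comprehensions have nearly the same shape as { t | P(t) }, so the two queries can be sketched as below. The namedtuple types and the trimmed CUSTOMER rows (ssn and name only) are assumptions made for this example; any() plays the role of the existential quantifier.

```python
from collections import namedtuple

Loan = namedtuple("Loan", "loan_number amount branch_name")
Borrows = namedtuple("Borrows", "customer loan_no")
Customer = namedtuple("Customer", "ssn name")  # trimmed to two attributes

LOAN = [
    Loan("L17", 1000, "Downtown"), Loan("L23", 2000, "Redwood"),
    Loan("L15", 1500, "Pennyridge"), Loan("L93", 500, "Mianus"),
    Loan("L11", 900, "Round Hill"), Loan("L16", 1300, "Pennyridge"),
]
BORROWS = [
    Borrows("111-12-0000", "L17"), Borrows("222-12-0000", "L23"),
    Borrows("333-12-0000", "L15"), Borrows("444-12-0000", "L93"),
    Borrows("666-12-0000", "L17"), Borrows("111-12-0000", "L11"),
    Borrows("999-12-0000", "L17"), Borrows("777-12-0000", "L16"),
]
CUSTOMER = [
    Customer("111-12-0000", "Jones"), Customer("222-12-0000", "Smith"),
    Customer("333-12-0000", "Hayes"), Customer("444-12-0000", "Curry"),
    Customer("666-12-0000", "Williams"), Customer("777-12-0000", "Adams"),
    Customer("999-12-0000", "Brooks"),
]

# 1. Loans that exceed 1200: the predicate is a membership test plus a comparison
big_loans = {t for t in LOAN if t.amount > 1200}

# 2. Names of customers with a loan from the Pennyridge branch
pennyridge_names = {
    t.name
    for t in CUSTOMER
    if any(
        s.customer == t.ssn
        and any(
            u.loan_number == s.loan_no and u.branch_name == "Pennyridge"
            for u in LOAN
        )
        for s in BORROWS
    )
}

print(sorted(t.loan_number for t in big_loans))  # ['L15', 'L16', 'L23']
print(sorted(pennyridge_names))                  # ['Adams', 'Hayes']
```

The comprehension states which tuples qualify without fixing an order of operations, which is the non-procedural flavour of RC; Python still evaluates it in one particular way, so the analogy is only approximate.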

Page 24: IELM 511: Information System design

Relational Calculus (RC) remarks

RC is non-procedural – any way that the predicate P can be evaluated is valid.

RC is the formal basis for Structured Query Language (SQL)

SQL is the de facto standard language for all RDBMSs

In terms of functionality (i.e. the power to get some information from any DB), RA and RC are equivalent. Namely, any query that can be written in RC has an equivalent RA formula, and vice versa.

Advantage of RC (over RA): conceptually, it is better to allow the user to define the logic of the query, but leave the procedure for computing it to the program [why ?].

Page 25: IELM 511: Information System design

Bank tables

BRANCH
branch_name  city        assets
Downtown     Brooklyn    9000000
Redwood      Palo Alto   2100000
Pennyridge   Horseneck   1700000
Mianus       Horseneck   400000
Round Hill   Horseneck   8000000
Pownal       Bennington  300000
North Town   Rye         3700000
Brighton     Brooklyn    7100000

EMPLOYEE
e_ssn        e_name  tel    start_date  mgr_ssn
111-22-3333  Jones   12345  Nov-2005    321-32-4321
333-11-4444  Smith   54321  Mar-1998    111-22-3333
123-45-6789  Lee     54321  Mar-1998    111-22-3333
555-66-8888  Turner  55555  Aug-2002    321-32-4321
987-65-4321  Jones   87621  Mar-1995    888-99-9999
888-99-9999  Chan    87654  Feb-1980    777-77-7777
321-32-4321  Adams   77777  Feb-1990    777-77-7777
777-77-7777  Black   99111  Jan-1980    null

Page 26: IELM 511: Information System design

CUSTOMER
ssn          name      street   city        banker       b_type
111-12-0000  Jones     Main     Harrison    321-32-4321  CRM
222-12-0000  Smith     North    Rye         321-32-4321  CRM
333-12-0000  Hayes     Main     Harrison    321-32-4321  CRM
444-12-0000  Curry     North    Rye         333-11-4444  LO
555-12-0000  Turner    Putnam   Stamford    888-99-9999  DO
666-12-0000  Williams  Nassau   Princeton   333-11-4444  LO
777-12-0000  Adams     Spring   Pittsfield  123-45-6789  LO
888-12-0000  Johnson   Alma     Palo Alto   888-99-9999  DO
999-12-0000  Brooks    Senator  Brooklyn    123-45-6789  LO
000-12-0000  Lindsay   Park     Pittsfield  888-99-9999  DO

DEPOSIT
c_ssn        ac_num  accessDate
888-12-0000  A101    Jan 1, 09
222-12-0000  A215    Feb 1, 09
333-12-0000  A102    Feb 28, 09
555-12-0000  A305    Mar 10, 09
888-12-0000  A201    Mar 1, 98
111-12-0000  A217    Mar 1, 09
000-12-0000  A101    Feb 25, 09

BORROWS
customer     loan_no
111-12-0000  L17
222-12-0000  L23
333-12-0000  L15
444-12-0000  L93
666-12-0000  L17
111-12-0000  L11
999-12-0000  L17
777-12-0000  L16

LOAN
loan_number  amount  branch_name
L17          1000    Downtown
L23          2000    Redwood
L15          1500    Pennyridge
L93          500     Mianus
L11          900     Round Hill
L16          1300    Pennyridge

Page 27: IELM 511: Information System design

References and Further Reading

Silberschatz, Korth, Sudarshan, Database System Concepts, McGraw Hill

Next: SQL and DB APIs

