Database Design and the
Entity/Relationship Model
You’ve just been hired by Bank of America as their DBA for their online banking web site.
You are asked to create a database that
monitors:
◦ customers
◦ accounts
◦ loans
◦ branches
◦ transactions, …
Now what??!!!
1. Requirements Specification
◦ Determine the requirements of clients
2. Conceptual Design
◦ Express client requirements in terms of some high-level model (E/R model).
◦ Confirm with clients that requirements are correct.
3. Functional Requirements
◦ Specify required data operations
◦ Specify priorities, response times
4. Logical Design
◦ Convert E/R model to relational, object-based, XML-based, …
5. Physical Design
◦ Specify file organizations, build indexes
Conceptual Design: The E/R Data Model
What is a Data Model?
Framework for organizing and interpreting data
Example: E/R Data Model
[Figure: generic E/R diagram. Entity1 (Attribute1a, Attribute1b) is connected through Relationship to Entity2 (Attribute2a, Attribute2b, Attribute2c).]
E/R Data Model: Basics
Entities
◦ noun phrases (e.g., Bob Smith, Thayer St. Branch)
◦ contained in entity sets (e.g., Employee, Branch)
◦ have attributes (e.g., Employee = (essn, ename, …))
Relationships
◦ verb phrases (e.g., works_at, works_for)
◦ relate 2 (binary) or more (n-ary) entities
◦ relationship sets characterize relationships among entity sets
◦ e.g., (Bob Smith, Thayer St. Branch) ∈ Works_At
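The vocabulary above can be made concrete with a minimal Python sketch. It assumes entities are plain records stored by key and a relationship set is a set of key pairs. The essn and bcity values, and the helper names `relates` and `is_well_formed`, are invented for illustration and do not come from the slides.

```python
# Entity sets: collections of entities, each described by attributes.
# The essn and bcity values below are invented purely for illustration.
employee = {
    "111-11-1111": {"ename": "Bob Smith"},
}
branch = {
    "Thayer St. Branch": {"bcity": "Providence"},
}

# A binary relationship set: a set of (essn, bname) pairs, one per relationship.
works_at = {("111-11-1111", "Thayer St. Branch")}


def relates(rel, essn, bname):
    """Return True if (essn, bname) is a member of the relationship set."""
    return (essn, bname) in rel


def is_well_formed(rel, left, right):
    """Every relationship must connect entities that exist in the entity sets."""
    return all(e in left and b in right for e, b in rel)
```

Here `relates(works_at, "111-11-1111", "Thayer St. Branch")` plays the role of the membership statement (Bob Smith, Thayer St. Branch) ∈ Works_At.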
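E/R Data Model: An Example
[Figure: E/R diagram. Entity set Employee (essn, ename, phone, children, seniority) is connected by relationship set Works_At (since) to entity set Branch (bname, bcity). Relationship set Works_For connects Employee to itself with roles manager and worker.]
Legend:
◦ Entity Set (e.g., Employee)
◦ Relationship Set (e.g., Works_At)
◦ Attribute (e.g., phone)
Lots of notation to come.
Color is irrelevant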
E/R Data Model: Types of Attributes
[Figure: the Employee / Works_At / Branch / Works_For diagram from the example.]
◦ Default (e.g., ename)
◦ Multivalued (e.g., children)
◦ Derived (e.g., seniority)
E/R Data Model: Types of Relationships
[Figure: the Employee / Works_At / Branch / Works_For diagram from the example.]
◦ Many-to-One (n:1): Works_At
◦ Many-to-Many (n:m): Works_For
E/R Data Model: Recursive Relationships
[Figure: the Employee / Works_At / Branch / Works_For diagram from the example, with Works_For connecting Employee to itself through the roles manager and worker.]
Recursive relationships: Must be declared with roles
E/R Data Model: Design Issue #1: Entity Sets vs. Attributes
(a) Employee with attributes phone_no, phone_loc
vs.
(b) Employee related by Uses to entity set Phone (no, loc)
To resolve, determine how phones are used:
1. Can many employees share a phone? (If yes, then (b))
2. Can employees have multiple phones? (If yes, then (b), or (a) with multivalued attributes)
3. Else (a)
E/R Data Model: Design Issue #2: Entity Sets vs. Relationship Sets
An Example: How to model bank loans
(a) Customer (cssn, cname) related by Borrows to entity set Loan (lno, amt)
vs.
(b) Customer (cssn, cname) related to Branch (bname, bcity) by relationship set Loans with attributes lno, amt
To resolve, determine how loans are issued:
1. Can there be more than one customer per loan?
If yes, then (a); under (b), loan info must be replicated for each customer (wasteful, potential update anomalies)
2. Is loan a noun or a verb?
Both, but more of a noun to a bank (hence (a) is probably more appropriate)
E/R Data Model: Design Issue #3: Relationship Cardinalities
An Example:
[Figure: Customer and Loan connected by Borrows, with the cardinality at each end left open (?).]
Variations on Borrows:
1. Can a customer hold multiple loans?
2. Can a loan be jointly held by more than 1 customer?
Cardinalities of Borrows:
Type                 Multiple Loans?   Joint Loans?
One-to-one (1:1)     No                No
Many-to-one (n:1)    No                Yes
One-to-many (1:n)    Yes               No
Many-to-many (n:m)   Yes               Yes
[Each type was illustrated with the corresponding Borrows diagram.]
E/R Data Model: Design Issue #3: Relationship Cardinalities (cont.)
In general...
[Figure: generic diagrams illustrating 1:1, n:1, 1:n and n:m relationship sets.]
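E/R Data Model: Design Issue #4: N-ary vs. Binary Relationship Sets
An Example: Works_At
Ternary: a single relationship set Works_At relates Employee, Branch and Dept.
e.g., (Joe, Thayer, Acct) ∈ Works_At
vs.
Binary: an entity set WA is related to Employee, Branch and Dept by three binary relationship sets WAE, WAB and WAD.
e.g., (Joe, w3) ∈ WAE, (Thayer, w3) ∈ WAB, (Acct, w3) ∈ WAD
Choose n-ary when possible!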
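To see why the binary encoding is heavier, here is a small Python sketch of both representations. It assumes each ternary tuple is replaced by one artificial WA entity. The identifiers w1, w2, … and the function names are hypothetical; the slide's example happens to use w3.

```python
# Ternary relationship set: one tuple per (employee, branch, dept) fact.
works_at = {("Joe", "Thayer", "Acct")}


def to_binary(ternary):
    """Replace each ternary tuple with an artificial WA entity (w1, w2, ...)
    and record it in three binary relationship sets WAE, WAB and WAD."""
    wae, wab, wad = set(), set(), set()
    for i, (emp, br, dept) in enumerate(sorted(ternary), start=1):
        w = f"w{i}"  # hypothetical identifier for the artificial entity
        wae.add((emp, w))
        wab.add((br, w))
        wad.add((dept, w))
    return wae, wab, wad


def to_ternary(wae, wab, wad):
    """Rebuild the ternary tuples by joining the three binary sets on w."""
    emp_of = {w: e for e, w in wae}
    br_of = {w: b for b, w in wab}
    dept_of = {w: d for d, w in wad}
    return {(emp_of[w], br_of[w], dept_of[w]) for w in emp_of}
```

One fact becomes three pairs plus an extra entity, and reading it back requires a three-way join, which is one way to motivate the advice to prefer the n-ary form.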
E/R Data Model: Keys
Key = set of attributes identifying individual entities or relationships
Example: Employee (essn, ename, eaddress, ephone)
A. Superkey: any attribute set that distinguishes identities
e.g., {essn}, {essn, ename, eaddress}
B. Candidate Key: “minimal superkey” (can’t remove attributes and preserve “keyness”)
e.g., {essn}, {ename, eaddress}
C. Primary Key: candidate key chosen as the key by a DBA
e.g., {essn} (denoted by underline)
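The superkey and candidate key definitions can be checked mechanically on a sample instance. The sketch below is only illustrative: the three employee rows are invented, and a key is really a constraint on every legal instance, so a test like this can refute a proposed key but never prove one.

```python
from itertools import combinations


def is_superkey(rows, attrs):
    """attrs is a superkey of this instance if no two rows agree on all of attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def candidate_keys(rows, all_attrs):
    """Return the minimal superkeys, smallest first."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for attrs in combinations(all_attrs, size):
            # Skip any set that contains an already-found (smaller) key.
            if any(set(k) <= set(attrs) for k in keys):
                continue
            if is_superkey(rows, attrs):
                keys.append(attrs)
    return keys


# Invented sample data: two employees share a name, two share an address,
# and all three share an office phone.
employees = [
    {"essn": "1", "ename": "Ann", "eaddress": "1 Main", "ephone": "555-0100"},
    {"essn": "2", "ename": "Ann", "eaddress": "2 Oak", "ephone": "555-0100"},
    {"essn": "3", "ename": "Bob", "eaddress": "1 Main", "ephone": "555-0100"},
]
ATTRS = ["essn", "ename", "eaddress", "ephone"]
```

On this instance `candidate_keys(employees, ATTRS)` finds {essn} and {ename, eaddress}, matching the slide's examples, while {essn, ename, eaddress} is a superkey but not minimal.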
E/R Data Model: Relationship Set Keys
[Figure: Employee (essn, ename, ...) and Branch (bname, bcity, ...) connected by Works_At (since); employees e1, e2, e3 are linked to branches b1, b2.]
Q: What attributes are needed to represent relationships in Works_At?
A: {essn, bname, since}
E/R Data Model: Relationship Set Keys (cont.)
[Figure: Employee (essn, ename, ...) and Branch (bname, bcity, ...) connected by Works_At (since); employees e1, e2, e3 are linked to branches b1, b2.]
Q: What are the candidate keys of Works_At?
A: {essn}
E/R Data Model: Relationship Set Keys (cont.)
[Figure: Employee (essn, ename, ...) and Branch (bname, bcity, ...) connected by Works_At (since), with the cardinality at each end left open (?).]
Q: What are the candidate keys if Works_At is...?
a. 1:n  A: {bname}
b. n:m  A: {essn, bname}
c. 1:1  A: {essn} or {bname}
E/R Data Model: Relationship Set Keys (cont.)
General Rules for Relationship Set Keys
[Figure: E1 with primary key P(E1) and E2 with primary key P(E2), connected by R.]
If R is:   Candidate Keys
1:1        P(E1) or P(E2)
1:n        P(E2)
n:1        P(E1)
n:m        P(E1) ∪ P(E2)
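The general rules amount to a lookup on the cardinality of R. A minimal sketch, assuming primary keys are given as tuples of attribute names (the function name and the string encoding of cardinalities are choices made here, not part of the slides):

```python
def relationship_candidate_keys(cardinality, p_e1, p_e2):
    """Candidate keys of a binary relationship set R between E1 and E2.

    p_e1 and p_e2 are the primary keys of E1 and E2, as tuples of
    attribute names. Returns a list of candidate keys.
    """
    rules = {
        "1:1": [p_e1, p_e2],     # either side identifies the relationship
        "1:n": [p_e2],           # each E2 entity takes part at most once
        "n:1": [p_e1],           # each E1 entity takes part at most once
        "n:m": [p_e1 + p_e2],    # need both keys together
    }
    return rules[cardinality]
```

For Works_At between Employee (key essn) and Branch (key bname), the n:1 case yields {essn} and the n:m case yields {essn, bname}, as in the earlier answers.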
E/R Data Model: Existence Dependencies and Weak Entity Sets
Idea:
◦ Existence of one entity depends on another
Example: Loans and Loan Payments
[Figure: Loan (lno, lamt) connected by Loan_Pmt to Payment (pno, pdate, pamt). Payment is marked as a Weak Entity Set, Loan_Pmt as its Identifying Relationship, and Payment's participation in Loan_Pmt as Total Participation.]
E/R Data Model: Existence Dependencies and Weak Entity Sets
Weak Entity Sets
◦ existence of payments depends upon loans
◦ have no superkeys: different payment records (for different loans) can be identical
◦ instead of keys, discriminators: discriminate between payments for a given loan (e.g., pno)
[Figure: Loan (lno, lamt) connected by Loan_Pmt to Payment (pno, pdate, pamt).]
E/R Data Model: Existence Dependencies and Weak Entity Sets
Identifying Relationships
We say:
◦ Loan is dominant in Loan_Pmt
◦ Payment is subordinate in Loan_Pmt
◦ Payment is existence dependent on Loan
[Figure: Loan (lno, lamt) connected by Loan_Pmt to Payment (pno, pdate, pamt).]
E/R Data Model: Existence Dependencies and Weak Entity Sets
Total Participation
All elements of Payment appear in Loan_Pmt
[Figure: Loan (lno, lamt) connected by Loan_Pmt to Payment (pno, pdate, pamt).]
E/R Data Model: Existence Dependencies and Weak Entity Sets
[Figure: E1 (atta1, ..., attam) connected by R to weak entity set E2 (attb1, ..., attbn).]
Q. Is {attb1, …, attbn} a superkey of E2?
A: No
Q. Name a candidate key of E2
A: {atta1, attb1}
Q. Does total participation of E2 in R imply that E2 is existence-dependent?
A: No
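The loan payment example shows why a weak entity set needs its owner's key. In the sketch below, payments are stored under (lno, pno), the loan's key plus the discriminator. The dates and amounts are invented so that two payments on different loans look identical, and the function names are hypothetical.

```python
# Payments keyed by (lno, pno): owning loan's key plus the discriminator pno.
# All dates and amounts are invented for illustration.
payments = {
    ("L-17", 1): {"pdate": "2024-01-05", "pamt": 100},
    ("L-17", 2): {"pdate": "2024-02-05", "pamt": 100},
    ("L-23", 1): {"pdate": "2024-01-05", "pamt": 100},
}


def own_attributes_form_superkey(payments):
    """Check whether (pno, pdate, pamt) alone distinguishes every payment."""
    own = [(pno, p["pdate"], p["pamt"]) for (_lno, pno), p in payments.items()]
    return len(own) == len(set(own))


def payments_for(payments, lno):
    """Within one loan, the discriminator pno is enough to tell payments apart."""
    return sorted(pno for (owner, pno) in payments if owner == lno)
```

Payment 1 of L-17 and payment 1 of L-23 agree on every attribute of Payment, so the weak entity set's own attributes are not a superkey here; adding lno fixes that.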
E/R Data Model: Extensions to the Model: Specialization and Generalization
An Example: Customers can have checking and savings accts
Checking ~ Savings (many of the same attributes)
Old Way:
[Figure: Customer connected by Has1 to Savings Acct (acct_no, balance, interest) and by Has2 to Checking Acct (acct_no, balance, overdraft).]
E/R Data Model: Extensions to the Model: Specialization and Generalization
An Example: Customers can have checking and savings accts
Checking ~ Savings (many of the same attributes)
New Way:
[Figure: Customer connected by Has to Account (acct_no, balance), the superclass. Isa links Account to its subclasses Savings Acct (interest) and Checking Acct (overdraft).]
E/R Data Model: Extensions to the Model: Specialization and Generalization
Subclass Distinctions:
1. User-Defined vs. Condition-Defined
User: Membership in subclasses explicitly determined (e.g., Employee, Manager < Person)
Condition: Membership predicate associated with subclasses, e.g.:
[Figure: Person with Isa subclasses Child (age < 18), Adult (18 ≤ age), Senior (age > 65).]
E/R Data Model: Extensions to the Model: Specialization and Generalization
Subclass Distinctions:
2. Overlapping vs. Disjoint
Overlapping: Entities can belong to >1 entity set (e.g., Adult, Senior)
Disjoint: Entities belong to exactly 1 entity set (e.g., Child)
[Figure: Person with Isa subclasses Child (age < 18), Adult (18 ≤ age), Senior (age > 65).]
E/R Data Model: Extensions to the Model: Specialization and Generalization
Subclass Distinctions:
3. Total vs. Partial Membership
Total: Every entity of superclass belongs to a subclass (e.g., the figure below)
Partial: Some entities of superclass do not belong to any subclass (e.g., suppose the Adult condition is age ≥ 21)
[Figure: Person with Isa subclasses Child (age < 18), Adult (age ≥ 18), Senior (age ≥ 65).]
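The three subclass distinctions can be tested on condition-defined subclasses with a few lines of Python. This is a sketch under assumptions: the predicates follow the Child / Adult / Senior figure with Adult read as age ≥ 18 and Senior as age > 65 (the comparison symbols were partly lost in the slides), and the function names are made up.

```python
# Condition-defined subclasses of Person: one membership predicate each.
# The exact thresholds are an assumed reading of the slide's figure.
SUBCLASSES = {
    "Child": lambda age: age < 18,
    "Adult": lambda age: age >= 18,
    "Senior": lambda age: age > 65,
}


def memberships(age, subclasses):
    """Names of all subclasses whose predicate accepts this age."""
    return [name for name, belongs in subclasses.items() if belongs(age)]


def is_total(ages, subclasses):
    """Total membership: every person falls into at least one subclass."""
    return all(memberships(a, subclasses) for a in ages)


def is_disjoint(ages, subclasses):
    """Disjoint: nobody falls into more than one subclass."""
    return all(len(memberships(a, subclasses)) <= 1 for a in ages)
```

With these predicates a 70-year-old is both Adult and Senior (overlapping), and membership is total over ages 0 to 99. Changing the Adult condition to `age >= 21` leaves ages 18 to 20 in no subclass, which makes membership partial.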
E/R Data Model: Extensions to the Model: Aggregation
E/R: No relationships between relationships
E.g.: Associate loan officers with Borrows relationship set
[Figure: Customer and Loan connected by Borrows; Employee connected by Loan_Officer to an open target (?).]
Associate Loan Officer with Loan?
What if we want a loan officer for every (customer, loan) pair?
E/R Data Model: Summary
◦ Entities, Relationships (sets)
◦ Both can have attributes (simple, multivalued, derived, composite)
◦ Cardinality of relationship sets (1:1, n:1, n:m)
◦ Keys: superkeys, candidate keys, primary key
◦ DBA chooses primary key for entity sets
◦ Automatically determined for relationship sets
◦ Weak Entity Sets, Existence Dependence, Total/Partial Participation
◦ Specialization and Generalization (E/R + inheritance)
E/R → Relational Schema
Entity Sets
[Figure: entity set E with attributes a1, …, an.]
E = (a1, …, an)
E/R → Relational Schema
Entity Sets
E = (a1, …, an)
Relationship Sets
[Figure: E1 (a1, …, an) and E2 (b1, …, bm) connected by R, which has attributes c1, …, ck.]
R = (a1, b1, c1, …, ck)
◦ a1: E1’s key
◦ b1: E2’s key
◦ c1, …, ck: attributes of R
Not the whole story for Relationship Sets …
What about…
[Figure: E1 (a1, …, an) and E2 (b1, …, bm) connected by R (c1, …, ck), with R many-to-one from E1 to E2.]
Could have: R = (a1, b1, c1, …, ck) but…
• a1 is a key for R
• a1 also a key for E1 = (a1, …, an)
Instead:
• Ignore R
• Add b1, c1, …, ck to E1 instead (i.e., E1 = (a1, …, an, b1, c1, …, ck))
Rule of Thumb: Fewer tables good, as long as no redundancy
[Figure: E1 (a1, …, an) and E2 (b1, …, bm) connected by R (c1, …, ck), with the cardinality at each end left open (?).]
Relationship Cardinality   Relational Schema
n:m                        E1 = (a1, …, an)
                           E2 = (b1, …, bm)
                           R = (a1, b1, c1, …, ck)
n:1                        E1 = (a1, …, an, b1, c1, …, ck)
                           E2 = (b1, …, bm)
1:n                        E1 = (a1, …, an)
                           E2 = (b1, …, bm, a1, c1, …, ck)
1:1                        Treat as n:1 or 1:n
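The translation rules for relationship sets can be written as a single function. This is a sketch under assumptions: each entity set is a list of attribute names with its (single-attribute) key first, and the 1:1 case is arbitrarily treated as n:1, which is one of the two options the rules allow. The function name `translate` is made up.

```python
def translate(cardinality, e1, e2, r_attrs):
    """Map E1, E2 and relationship set R to relational schemas.

    e1 and e2 list each entity set's attributes with the key first;
    r_attrs lists R's own attributes (c1, ..., ck).
    """
    a1, b1 = e1[0], e2[0]
    if cardinality == "n:m":
        # R needs its own table holding both keys.
        return {"E1": e1, "E2": e2, "R": [a1, b1] + r_attrs}
    if cardinality in ("n:1", "1:1"):
        # Fold R into E1: each E1 entity has at most one partner in E2.
        return {"E1": e1 + [b1] + r_attrs, "E2": e2}
    if cardinality == "1:n":
        # Fold R into E2 instead.
        return {"E1": e1, "E2": e2 + [a1] + r_attrs}
    raise ValueError(f"unknown cardinality: {cardinality}")
```

For example, `translate("n:1", ["acct_no", "balance"], ["bname", "bcity", "assets"], [])` puts bname into the account schema and creates no separate table for the relationship.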
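[Figure: bank E/R diagram. Entity sets: Account (acct_no, balance), Branch (bname, bcity, assets), Customer (cname, cstreet, ccity), Loan (lno, amt). Relationship sets: Acct-Branch (Account, Branch), Depositor (Customer, Account), Borrower (Customer, Loan), Loan-Branch (Loan, Branch).]
Q. How many tables does this get translated into?
A. 6 (account, branch, customer, loan, depositor, borrower)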
Q. What are the schemas?
A.
Account = (bname, acct_no, balance)
Depositor = (cname, acct_no)
Customer = (cname, cstreet, ccity)
Branch = (bname, bcity, assets)
Borrower = (cname, lno)
Loan = (bname, lno, amt)
Account
bname      acct_no   balance
Downtown   A-101     500
Mianus     A-215     700
Perry      A-102     400
R.H.       A-305     350
Brighton   A-201     900
Redwood    A-222     700
Brighton   A-217     750
Depositor
cname     acct_no
Johnson   A-101
Smith     A-215
Hayes     A-102
Turner    A-305
Johnson   A-201
Jones     A-217
Lindsay   A-222
Customer
cname      cstreet     ccity
Jones      Main        Harrison
Smith      North       Rye
Hayes      Main        Harrison
Curry      North       Rye
Lindsay    Park        Pittsfield
Turner     Putnam      Stanford
Williams   Nassau      Princeton
Adams      Spring      Pittsfield
Johnson    Alma        Palo Alto
Glenn      Sand Hill   Woodside
Brooks     Senator     Brooklyn
Green      Walnut      Stanford
Branch
bname      bcity        assets
Downtown   Brooklyn     9M
Redwood    Palo Alto    2.1M
Perry      Horseneck    1.7M
Mianus     Horseneck    0.4M
R.H.       Horseneck    8M
Pownel     Bennington   0.3M
N. Town    Rye          3.7M
Brighton   Brooklyn     7.1M
Borrower
cname      lno
Jones      L-17
Smith      L-23
Hayes      L-15
Jackson    L-14
Curry      L-93
Smith      L-11
Williams   L-17
Adams      L-16
Loan
bname      lno    amt
Downtown   L-17   1000
Redwood    L-23   2000
Perry      L-15   1500
Downtown   L-14   1500
Mianus     L-93   500
R.H.       L-11   900
Perry      L-16   1300
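To show how the translated tables are used together, here is a small sketch that loads the Loan and Borrower data above into an in-memory SQLite database through Python's standard `sqlite3` module and joins them. The primary key and foreign key declarations are assumptions (the slides give only the column lists), and `borrowers_at` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Key choices are assumptions; the slides only list the columns.
    CREATE TABLE loan (
        bname TEXT,
        lno   TEXT PRIMARY KEY,
        amt   INTEGER
    );
    CREATE TABLE borrower (
        cname TEXT,
        lno   TEXT REFERENCES loan(lno),
        PRIMARY KEY (cname, lno)
    );
""")

loans = [
    ("Downtown", "L-17", 1000), ("Redwood", "L-23", 2000),
    ("Perry", "L-15", 1500), ("Downtown", "L-14", 1500),
    ("Mianus", "L-93", 500), ("R.H.", "L-11", 900), ("Perry", "L-16", 1300),
]
borrowers = [
    ("Jones", "L-17"), ("Smith", "L-23"), ("Hayes", "L-15"),
    ("Jackson", "L-14"), ("Curry", "L-93"), ("Smith", "L-11"),
    ("Williams", "L-17"), ("Adams", "L-16"),
]
conn.executemany("INSERT INTO loan VALUES (?, ?, ?)", loans)
conn.executemany("INSERT INTO borrower VALUES (?, ?)", borrowers)


def borrowers_at(branch_name):
    """Names of customers holding a loan at the given branch, sorted."""
    rows = conn.execute(
        "SELECT DISTINCT b.cname FROM borrower AS b "
        "JOIN loan AS l ON b.lno = l.lno "
        "WHERE l.bname = ? ORDER BY b.cname",
        (branch_name,),
    )
    return [row[0] for row in rows]
```

Because Loan-Branch is folded into the loan table as the bname column, finding the borrowers at a branch takes one join rather than two. Loan L-17 appears twice in Borrower, which is the jointly held loan case from the cardinality discussion.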
E/R → Relational Schema
Weak Entity Sets
[Figure: E1 (a1, …, an) connected by identifying relationship IR to weak entity set E2 (b1, …, bm).]
E1 = (a1, …, an)
E2 = (a1, b1, …, bm)
E/R → Relational Schema
Multivalued Attributes
[Figure: entity set Emp with attributes ssn, name and multivalued attribute dept.]
Emp = (ssn, name)
Emp-Depts = (ssn, dept)

Emp
ssn   name
001   Smith
…     …

Emp-Depts
ssn   dept
001   Acct
001   Sales
…     …
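The multivalued attribute rule can be sketched as a function that splits each record into a base row and one row per value. The function name `split_multivalued` and the dictionary representation are assumptions made for this illustration.

```python
def split_multivalued(records, key, multi_attr):
    """Split records holding a list-valued attribute into two tables.

    Returns (base_rows, pair_rows): base_rows drop the multivalued
    attribute; pair_rows hold one (key, value) row per value.
    """
    base_rows, pair_rows = [], []
    for rec in records:
        base_rows.append({k: v for k, v in rec.items() if k != multi_attr})
        for value in rec[multi_attr]:
            pair_rows.append({key: rec[key], multi_attr: value})
    return base_rows, pair_rows


# The slide's example: employee 001 (Smith) belongs to two departments.
emp_records = [{"ssn": "001", "name": "Smith", "dept": ["Acct", "Sales"]}]
emp, emp_depts = split_multivalued(emp_records, "ssn", "dept")
```

The result mirrors the Emp and Emp-Depts tables: one row for Smith in `emp`, and two rows in `emp_depts`, one per department.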
E/R → Relational Schema
Subclasses
[Figure: superclass E (a1, …, an) with Isa subclasses E1 (b1, …, bm) and E2 (c1, …, ck).]
Method 1:
E = (a1, …, an)
E1 = (a1, b1, …, bm)
E2 = (a1, c1, …, ck)
Method 2:
E1 = (a1, …, an, b1, …, bm)
E2 = (a1, …, an, c1, …, ck)
Subclasses example:
Method 1:
Account = (acct_no, balance)
SAccount = (acct_no, interest)
CAccount = (acct_no, overdraft)
Method 2:
SAccount = (acct_no, balance, interest)
CAccount = (acct_no, balance, overdraft)
Q: When is method 2 not possible?
A: When subclassing is partial
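The difference between the two methods, and the reason method 2 fails for partial subclassing, shows up in a short sketch. The account numbers, balances and rates below are invented, and an account is assumed to be a savings account if it has an interest attribute and a checking account if it has an overdraft attribute.

```python
def method1(accounts):
    """Superclass table plus one table per subclass, all keyed by acct_no."""
    account = [(a["acct_no"], a["balance"]) for a in accounts]
    saccount = [(a["acct_no"], a["interest"]) for a in accounts if "interest" in a]
    caccount = [(a["acct_no"], a["overdraft"]) for a in accounts if "overdraft" in a]
    return account, saccount, caccount


def method2(accounts):
    """One table per subclass, each repeating the superclass attributes.

    Also returns the accounts that fit in neither table.
    """
    saccount = [(a["acct_no"], a["balance"], a["interest"])
                for a in accounts if "interest" in a]
    caccount = [(a["acct_no"], a["balance"], a["overdraft"])
                for a in accounts if "overdraft" in a]
    unrepresented = [a["acct_no"] for a in accounts
                     if "interest" not in a and "overdraft" not in a]
    return saccount, caccount, unrepresented


# Invented sample: the third account belongs to neither subclass.
accounts = [
    {"acct_no": "A-1", "balance": 500, "interest": 0.02},
    {"acct_no": "A-2", "balance": 700, "overdraft": 100},
    {"acct_no": "A-3", "balance": 400},
]
```

Method 1 keeps all three accounts in the Account table. Method 2 has no table in which to store A-3, which is the partial subclassing case from the Q and A above.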