Fundamentals/ICY: Databases 2012/13
Week 4
John Barnden
Professor of Artificial Intelligence
School of Computer Science
University of Birmingham, UK
Reminder of Week 3
ENTITIES, RELATIONSHIPS & ATTRIBUTES (Introduction)
Entities
Basically, entities are just things of the “important types” that we judged above to merit tables. So we had entity types such as: People; Employing Organizations; Phone Stations (as opposed to just phone numbers as such).
So what the entity types are in a given working environment is partly a matter of judgment, as explained earlier.
But we’ll see that in designing a DB we may need to introduce new, not immediately obvious, entity types.
“Entities” are, or should be, the things of a type: e.g., individual people. An entity is represented by a row in the appropriate table.
Entity Terminology
Unfortunately: “entity” is often used to mean entity type. “entity set” is often used for entity type. “entity occurrence” is often used to mean individual entity.
New for Week 4
Relationships
These are the relationships between entity types, such as: a person being employed by an organization; a person having a phone station.
Have to think about both directions of a relationship: e.g., both employed-by and employs.
CAUTION: Tables are also called “relations” [hence “relational” DB] (much more on this later). This is to do with the internals of tables/entities rather than with “relationships” between entities.
Relationship Connectivity
Relationships are importantly categorized as to uniqueness or multiplicity of entities at either end – “connectivity.”
Has big effect on DB design.
1:1 (“one to one”): e.g., the people/phone-stations relationship, if each person has at most one phone station and each phone station is assigned to at most one person.
M:N (“many to many”): e.g., the employs relationship, assuming a person may have more than one employing organization (or none) and an organization may have more than one employee (or none). (Don’t take “many” seriously – just means possibly more than one.)
1:M (“one to many”): e.g., the employs relationship, if an organization may have more than one employee (or none) but a person has at most one employing-org.
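The three connectivity categories can be illustrated by counting how often each entity recurs at either end of a set of linked pairs. This is only a minimal sketch: the function name `connectivity` and the sample pairs are made up for illustration, and in a real design connectivity is a decision about what is allowed, not something read off whatever data happens to be present.

```python
from collections import Counter

def connectivity(pairs):
    """Classify a set of (a, b) pairs as '1:1', '1:M', 'M:1' or 'M:N'.

    '1:M' here means: one a may be linked to several b's, but each b
    is linked to at most one a.
    """
    pairs = set(pairs)                        # ignore duplicate pairs
    a_counts = Counter(a for a, _ in pairs)   # number of b's per a
    b_counts = Counter(b for _, b in pairs)   # number of a's per b
    a_repeats = any(n > 1 for n in a_counts.values())
    b_repeats = any(n > 1 for n in b_counts.values())
    if a_repeats and b_repeats:
        return "M:N"
    if a_repeats:
        return "1:M"
    if b_repeats:
        return "M:1"
    return "1:1"

# Hypothetical (organization, person) pairs for the "employs" relationship.
employs = [("E22561", "Chopples"), ("E22561", "Rumpel"), ("E85704", "Blurp")]
print(connectivity(employs))  # 1:M
```

Reading the same pairs in the other direction (employed-by) would give M:1, which is why both directions of a relationship need to be considered.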
Relationship Cardinality
Relationships can be further specified as to “how many entities allowed or required at either end” – cardinality.
Also has significant effect on DB design.
In a relationship from entity type A to entity type B, a minimum and a maximum can be specified for the number of B entities for each A entity.
A maximum greater than 1 can only be specified if the relationship from A to B is 1:M or M:N. (So the notions of connectivity and cardinality are not properly separated.)
E.g., could be specified that a person can only be employed by up to five organizations.
Most commonly, the important choice for the minimum is between none and one. E.g., the minimum for employed-by could be none, but the minimum for employs could be one. But the minimum number of wheels for a car could be specified to be three.
If the minimum is none, then B is optional for A. Otherwise, it is mandatory for A.
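A minimum and maximum of this kind can be checked mechanically. The sketch below assumes a relationship stored as a dictionary from each A entity to the set of B entities it is linked to; the names `check_cardinality` and `is_optional`, and the sample data (including Blurp having two employers), are hypothetical.

```python
def check_cardinality(links, a_entities, minimum, maximum=None):
    """Return the A entities whose number of linked B entities falls
    outside [minimum, maximum]. maximum=None means no upper limit."""
    violations = []
    for a in a_entities:
        count = len(links.get(a, ()))
        if count < minimum or (maximum is not None and count > maximum):
            violations.append(a)
    return violations

def is_optional(minimum):
    """B is optional for A exactly when the minimum is none (zero)."""
    return minimum == 0

# employed-by: person -> set of employing organizations (made-up data)
employed_by = {"Chopples": {"E22561"}, "Blurp": {"E85704", "E22561"}}
people = ["Chopples", "Blurp", "Biggles"]

# "A person can be employed by up to five organizations, possibly none."
print(check_cardinality(employed_by, people, minimum=0, maximum=5))  # []
# If employment were mandatory, Biggles would violate the minimum.
print(check_cardinality(employed_by, people, minimum=1, maximum=5))  # ['Biggles']
```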
Attributes
Attributes of entities of a given type are the names of the different pieces of information that need to be stored for entities of that type. So they’re just the column names for the table for the entity type.
E.g., entities of the type “people” could have the following attributes: person ID number, last name, first name, phone number, age.
Note: Attributes include artificial ones like the employer identity numbers (EMPL. NUM.) that we introduced in an example above. These may have no significance outside the DB itself.
Relationships are represented by associative linking by means of shared attributes. (For now, will always assume that the same attribute name is used in each of the tables involved.)
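Associative linking through a shared attribute can be sketched in a few lines. The helper name `join_on_shared` is invented here, the rows are a cut-down version of the People and Organizations tables that appear later in these slides, and rows are modelled as plain dictionaries purely for illustration.

```python
def join_on_shared(left_rows, right_rows):
    """Pair up rows (dicts) that agree on every attribute name they share."""
    joined = []
    for left in left_rows:
        for right in right_rows:
            shared = left.keys() & right.keys()
            if shared and all(left[k] == right[k] for k in shared):
                joined.append({**left, **right})
    return joined

people = [
    {"PERS-ID": 9568876, "NAME": "Chopples", "EMPL ID": "E22561"},
    {"PERS-ID": 2544799, "NAME": "Blurp", "EMPL ID": "E85704"},
]
organizations = [
    {"EMPL ID": "E22561", "EMPL NAME": "University of Birmingham"},
    {"EMPL ID": "E85704", "EMPL NAME": "Monmouth School for Girls"},
]

# The shared attribute EMPL ID is what links a person to an organization.
joined = join_on_shared(people, organizations)
for row in joined:
    print(row["NAME"], "is employed by", row["EMPL NAME"])
```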
Attribute Determination
REMEMBER: Rows in a table are uniquely determined (picked out) by the values in some set of columns, i.e. the values of some collection of attributes.
That is, given some values for those attributes, there is at most one entity that has those values for those attributes.
Hence, that collection of attributes determines all the other attributes.
That is, given some values for the determining attributes, there’s at most one value for each of the other attributes.
Attribute Determination, contd.
More generally, a collection of one or more attributes determines another attribute A if only one value for A is possible given the values for the former attributes.
E.g., the collection DAY-NUMBER, MONTH and YEAR specifying birth-date in a table about people could determine DAY-NAME,
even though it doesn’t determine other attributes such as NATIONALITY: several people could have the same birth-date but be of different nationalities.
We alternatively say that DAY-NAME is functionally dependent on DAY-NUMBER, MONTH and YEAR.
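Whether a determination is contradicted by a given set of rows can be tested directly. The sketch below uses a hypothetical helper `determines` and made-up people. Note the limitation: sample rows can refute a determination, but they cannot prove that it holds for every possible row.

```python
def determines(rows, determinant, dependent):
    """True if no two rows agree on `determinant` but differ on `dependent`."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in determinant)
        value = row[dependent]
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True

people = [
    {"NAME": "A", "DAY-NUMBER": 1, "MONTH": 1, "YEAR": 2000,
     "DAY-NAME": "Saturday", "NATIONALITY": "British"},
    {"NAME": "B", "DAY-NUMBER": 1, "MONTH": 1, "YEAR": 2000,
     "DAY-NAME": "Saturday", "NATIONALITY": "French"},
    {"NAME": "C", "DAY-NUMBER": 18, "MONTH": 1, "YEAR": 2018,
     "DAY-NAME": "Thursday", "NATIONALITY": "British"},
]
birth_date = ["DAY-NUMBER", "MONTH", "YEAR"]

print(determines(people, birth_date, "DAY-NAME"))     # True
print(determines(people, birth_date, "NATIONALITY"))  # False
```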
Attribute Determination, contd.
The determining collection of attributes is called the determinant in the determination.
Write determination as: DAY-NUMBER, MONTH, YEAR → DAY-NAME
Determination does not mean that you have at hand an algorithm for working out a dependent attribute from the determinant, although you may do.
E.g. Consider a “father” attribute.
Keys
A key for a table is a collection of one or more attributes that determines at least some other attribute(s) in that same table.
CAUTION: But sometimes people use “key” as a sloppy abbreviation for “primary key” (see below) or other notions.
A superkey for a table is a collection of one or more attributes that determines all the other attributes in the table, i.e. determines a whole row.
Trivially, the collection of all the attributes is a superkey.
A candidate key is a minimal superkey (i.e., you can’t remove attributes from it and still have a superkey.)
It does NOT necessarily mean a numerically smallest superkey.
Superkeys & Candidate Keys: Example
Suppose a “day” entity type has attributes DAY-NAME, DAY-NUMBER, MONTH, YEAR, IS-HOLIDAY, …
Then DAY-NAME, DAY-NUMBER, MONTH, YEAR would be a superkey for the day type.
But it’s not a candidate key because DAY-NUMBER, MONTH, YEAR is also a superkey.
This smaller collection is a candidate key because no sub-collection of it uniquely identifies a day.
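The day example can be worked through mechanically if the determinations are written down explicitly. The sketch below assumes just one determination (the date determines DAY-NAME and IS-HOLIDAY) and only the five attributes listed above. The function names are invented for illustration, and the brute-force search is only sensible for small attribute sets.

```python
from itertools import combinations

def closure(attrs, determinations):
    """All attributes determined by `attrs` under the given determinations."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for determinant, dependents in determinations:
            if set(determinant) <= result and not set(dependents) <= result:
                result |= set(dependents)
                changed = True
    return result

def is_superkey(attrs, all_attrs, determinations):
    """A superkey determines every attribute of the table."""
    return closure(attrs, determinations) == set(all_attrs)

def candidate_keys(all_attrs, determinations):
    """Minimal superkeys, found by trying smaller attribute sets first."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(all_attrs, size):
            if any(set(k) <= set(combo) for k in keys):
                continue  # contains a smaller superkey, so not minimal
            if is_superkey(combo, all_attrs, determinations):
                keys.append(combo)
    return keys

DAY_ATTRS = ["DAY-NAME", "DAY-NUMBER", "MONTH", "YEAR", "IS-HOLIDAY"]
# Assumed determination: the date fixes the day name and the holiday flag.
DAY_DETERMINATIONS = [
    (("DAY-NUMBER", "MONTH", "YEAR"), ("DAY-NAME", "IS-HOLIDAY")),
]

four = ["DAY-NAME", "DAY-NUMBER", "MONTH", "YEAR"]
print(is_superkey(four, DAY_ATTRS, DAY_DETERMINATIONS))  # True
print(candidate_keys(DAY_ATTRS, DAY_DETERMINATIONS))
```

Under these assumptions the four-attribute collection is a superkey but only DAY-NUMBER, MONTH, YEAR comes out as a candidate key.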
Primary Keys
A primary key for a table (entity type) is a candidate key that the DB designer has chosen as being the main way of uniquely identifying a row (entity).
Extra restriction: Its attributes are not allowed to have null values.
It could be that there’s only one candidate key in practice anyway, such as a person’s ID number.
Primary keys are the main way of identifying target entities in entity relationships, e.g., the way to identify someone’s employing organization.
For efficiency reasons, the simpler that primary keys are, the better.
Identity numbers (of people, companies, products, courses, etc.), or combinations of them with one or two other attributes, are the typical primary keys in examples in the textbook and handouts.
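The two properties of a primary key (unique, and never null) can be seen in action with Python's built-in `sqlite3` module. This is a sketch under assumptions: the table layout and the helper `try_insert` are invented for illustration, the identifier is stored as text, and NOT NULL is declared explicitly because SQLite does not reject nulls in every kind of primary key column by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE people (
        pers_id TEXT NOT NULL PRIMARY KEY,  -- single-attribute primary key
        name    TEXT
    )
    """
)

def try_insert(conn, pers_id, name):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO people VALUES (?, ?)", (pers_id, name))
        return True
    except sqlite3.IntegrityError:
        return False

first = try_insert(conn, "9568876", "Chopples")      # accepted
duplicate = try_insert(conn, "9568876", "Impostor")  # same primary key value
missing = try_insert(conn, None, "Nameless")         # null primary key
print(first, duplicate, missing)  # True False False
```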
Relationships and Foreign Keys
Standardly, a relationship is represented by means of foreign keys.
A foreign key in a table TT is a chosen collection of attributes intended to match the attributes that constitute (usually) the primary key in another table, UU, and thereby to refer to entities in UU.
Intuitively, the foreign key in TT is UU’s “ambassador” [my word] in TT.
Primary & Foreign Keys

PERS-ID  NAME      PHONE          EMPL. ID  AGE
9568876  Chopples  0121-414-3816  E22561    37
2544799  Blurp     01600-719975   E85704    21
1698674  Rumpel    07970-852657   E22561    88

EMPL. ID  EMPL. NAME                 ADDRESS                   NUM. EMPLS  SECTOR
E48693    BT                         BT House, London, …       1,234,5678  Private TCOM
E85704    Monmouth School for Girls  Hereford Rd, Monmouth, …  245         Private 2E
E22561    University of Birmingham   Edgbaston Park Rd, ….     4023        Public HE

PHONE          TYPE    STATUS
0121-414-3816  office  OK
01600-719975   home    FAULT
0121-440-5677  home    OK
07970-852657   mobile  UNPAID

Foreign keys are in italics.
Primary keys are underlined.
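A foreign key of this kind can be tried out with Python's built-in `sqlite3` module. The sketch below is a cut-down version of the People and Organizations tables. The column names, the text type for identifiers, the helper `dangling_reference_rejected` and the non-existent employer `E99999` are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.executescript(
    """
    CREATE TABLE organizations (
        empl_id   TEXT NOT NULL PRIMARY KEY,
        empl_name TEXT
    );
    CREATE TABLE people (
        pers_id TEXT NOT NULL PRIMARY KEY,
        name    TEXT,
        empl_id TEXT REFERENCES organizations (empl_id)  -- foreign key
    );
    INSERT INTO organizations VALUES
        ('E22561', 'University of Birmingham'),
        ('E85704', 'Monmouth School for Girls');
    INSERT INTO people VALUES
        ('9568876', 'Chopples', 'E22561'),
        ('2544799', 'Blurp',    'E85704'),
        ('1698674', 'Rumpel',   'E22561');
    """
)

# Follow the foreign key from each person to the employing organization.
employers = conn.execute(
    """
    SELECT people.name, organizations.empl_name
    FROM people JOIN organizations ON people.empl_id = organizations.empl_id
    ORDER BY people.name
    """
).fetchall()

def dangling_reference_rejected(conn):
    """Try to add a person whose employer does not exist in organizations."""
    try:
        with conn:
            conn.execute("INSERT INTO people VALUES ('0000001', 'Nobody', 'E99999')")
    except sqlite3.IntegrityError:
        return True
    return False

for name, employer in employers:
    print(name, "->", employer)
print(dangling_reference_rejected(conn))  # True
```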
Composite Primary and Foreign Keys
People

PERS-ID  NAME      AREA CODE  PHONE BODY  EMPL ID  AGE
9568876  Chopples  0121       414-3816    E22561   37
2544799  Blurp     01600      719975      E85704   21
1698674  Rumpel    07970      852657      E22561   88

Phones

AREA CODE  PHONE BODY  TYPE    STATUS
0121       414-3816    office  OK
01600      719975      home    FAULT
0121       440-5677    home    OK
07970      852657      mobile  UNPAID
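A composite key pair like this could be declared as follows, again using Python's built-in `sqlite3` module. This is a sketch only: the column names and the helper `phone_status` are invented, and the EMPL ID and AGE columns are left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.executescript(
    """
    CREATE TABLE phones (
        area_code  TEXT NOT NULL,
        phone_body TEXT NOT NULL,
        type       TEXT,
        status     TEXT,
        PRIMARY KEY (area_code, phone_body)      -- composite primary key
    );
    CREATE TABLE people (
        pers_id    TEXT NOT NULL PRIMARY KEY,
        name       TEXT,
        area_code  TEXT,
        phone_body TEXT,
        FOREIGN KEY (area_code, phone_body)      -- composite foreign key
            REFERENCES phones (area_code, phone_body)
    );
    INSERT INTO phones VALUES
        ('0121',  '414-3816', 'office', 'OK'),
        ('01600', '719975',   'home',   'FAULT'),
        ('0121',  '440-5677', 'home',   'OK'),
        ('07970', '852657',   'mobile', 'UNPAID');
    INSERT INTO people VALUES
        ('9568876', 'Chopples', '0121',  '414-3816'),
        ('2544799', 'Blurp',    '01600', '719975'),
        ('1698674', 'Rumpel',   '07970', '852657');
    """
)

def phone_status(conn, name):
    """Look up a person's phone status by matching both key attributes."""
    row = conn.execute(
        """
        SELECT phones.status
        FROM people
        JOIN phones
          ON people.area_code = phones.area_code
         AND people.phone_body = phones.phone_body
        WHERE people.name = ?
        """,
        (name,),
    ).fetchone()
    return row[0] if row else None

print(phone_status(conn, "Blurp"))  # FAULT
```

Neither attribute would do as a key alone: two phones share the area code 0121, so only the pair picks out a single row.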
1:1 Connectivity between Tables
People

PERS-ID  NAME      PHONE          EMPL ID  AGE
9568876  Chopples  0121-414-3816  E22561   37
2544799  Blurp     01600-719975   E85704   21
1698674  Rumpel    07970-852657   E22561   88
5099235  Biggles                  E22561   29

Phones

PHONE          TYPE    STATUS
0121-414-3816  office  OK
01600-719975   home    FAULT
0121-440-5677  home    OK
07970-852657   mobile  UNPAID

Note: the representation is still asymmetric in that the People table mentions phones but not vice versa; symmetry would create extra redundancy.
NB: Biggles has no phone listed, and 0121-440-5677 has no person recorded. Suggests a possible reason for not combining such tables.
1:1: that is, no more than one phone allowed per person, and vice versa.
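One way to have the database itself enforce a 1:1 connectivity is to make the foreign key column unique as well. The sketch below uses Python's built-in `sqlite3` module. The column names, the helper `can_assign` and the extra people are hypothetical, and only two of the phones are loaded to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.executescript(
    """
    CREATE TABLE phones (
        phone  TEXT NOT NULL PRIMARY KEY,
        type   TEXT,
        status TEXT
    );
    CREATE TABLE people (
        pers_id TEXT NOT NULL PRIMARY KEY,
        name    TEXT,
        phone   TEXT UNIQUE REFERENCES phones (phone)  -- may be null, never shared
    );
    INSERT INTO phones VALUES
        ('0121-414-3816', 'office', 'OK'),
        ('0121-440-5677', 'home',   'OK');
    INSERT INTO people VALUES
        ('9568876', 'Chopples', '0121-414-3816'),
        ('5099235', 'Biggles',  NULL);
    """
)

def can_assign(conn, pers_id, name, phone):
    """Try to add a person with the given phone; False if a constraint objects."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO people VALUES (?, ?, ?)", (pers_id, name, phone))
        return True
    except sqlite3.IntegrityError:
        return False

# A second person cannot be given the phone already assigned to Chopples.
shared = can_assign(conn, "0000002", "Newcomer", "0121-414-3816")
print(shared)  # False
```

People without a phone, such as Biggles, remain possible because the column accepts nulls, and a phone with no person is simply a Phones row that no People row refers to.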
1:M Connectivity between Tables
People

PERS-ID  NAME      PHONE          EMPL ID  AGE
9568876  Chopples  0121-414-3816  E22561   37
2544799  Blurp     01600-719975   E85704   21
1698674  Rumpel    07970-852657   E22561   88
1800748  Dunston   0121-414-3886  E22561   29

Organizations

EMPL ID  EMPL NAME                 ADDRESS                   NUM EMPLS   SECTOR
E48693   BT                        BT House, London, …       1,234,5678  Private TCOM
E85704   Monmouth School           Hereford Rd, Monmouth, …  245         Private 2E
E22561   University of Birmingham  Edgbaston Park Rd, ….     3023        Public HE
More than one employee allowed per organization, but no more than one employer per person.
NOTE direction of use of the foreign key. Why so??
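One way to approach the question, offered as a sketch and not as the intended answer: each cell holds a single value, so a person row can carry its one EMPL ID, whereas an organization row would need a variable number of cells to list its employees. With the foreign key on the "many" side, the employees of an organization are recovered by following the key backwards. The helper name `employees_by_org` is invented here and rows are modelled as plain dictionaries.

```python
from collections import defaultdict

people = [
    {"PERS-ID": "9568876", "NAME": "Chopples", "EMPL ID": "E22561"},
    {"PERS-ID": "2544799", "NAME": "Blurp",    "EMPL ID": "E85704"},
    {"PERS-ID": "1698674", "NAME": "Rumpel",   "EMPL ID": "E22561"},
    {"PERS-ID": "1800748", "NAME": "Dunston",  "EMPL ID": "E22561"},
]

def employees_by_org(people_rows):
    """Follow the foreign key backwards: organization id -> employee names."""
    grouped = defaultdict(list)
    for person in people_rows:
        grouped[person["EMPL ID"]].append(person["NAME"])
    return dict(grouped)

staff = employees_by_org(people)
print(staff["E22561"])  # ['Chopples', 'Rumpel', 'Dunston']
```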