
Introduction to Database


Time: 19:00-20:30 (Tue, Thu), Term 2, Nov-30-2014


CONTENTS

1 Entity Relationship (ER) Modeling 01

1.1 The Entity Relationship Model (ERM) 01

1.1.1 Entities 01

1.1.2 Attributes 02

1.1.3 Relationships 06

1.1.4 Connectivity and Cardinality 08

1.1.5 Existence Dependence 09

1.1.6 Relationship Strength 09

1.1.7 Weak Entities 11

1.1.8 Relationship Participation 14

1.1.9 Relationship Degree 17

1.1.10 Recursive Relationships 18

1.1.11 Associative (Composite) Entities 22

1.2 Developing an ER Diagram 24

1.3 Database Design Challenges: Conflicting Goals 29

2 Normalization of Database Tables 35

2.1 Database Tables and Normalization 35

2.2 The Need for Normalization 35

2.3 The Normalization Process 39

2.3.1 Conversion to First Normal Form 41

2.3.2 Conversion to Second Normal Form 44

2.3.3 Conversion to Third Normal Form 45

2.4 Improving the Design 47

2.5 Surrogate Key Considerations 51

2.6 Higher-Level Normal Forms 52

2.6.1 The Boyce-Codd Normal Form (BCNF) 52

2.6.2 Fourth Normal Form (4NF) 56

2.7 Normalization and Database Design 57

2.8 Denormalization 60

2.9 Data-Modeling Checklist 64


3 Introduction to Structured Query Language (SQL) 67

3.1 Introduction to SQL 67

3.2 Data Definition Commands 69

3.2.1 The Database Model 69

3.2.2 Creating the Database 71

3.2.3 The Database Schema 71

3.2.4 Data Types 72

3.2.5 Creating Table Structures 75

3.2.6 SQL Constraints 78

3.2.7 SQL Indexes 81

3.3 Data Manipulation Commands 83

3.3.1 Adding Table Rows 83

3.3.2 Saving Table Changes 84

3.3.3 Listing Table Rows 84

3.3.4 Updating Table Rows 86

3.3.5 Restoring Table Contents 86

3.3.6 Deleting Table Rows 87

3.3.7 Inserting Table Rows with a Select Subquery 88

3.4 SELECT Queries 88

3.4.1 Selecting Rows with Conditional Restrictions 88

3.4.2 Arithmetic Operators: The Rule of Precedence 93

3.4.3 Logical Operators: AND, OR, and NOT 93

3.4.4 Special Operators 95

4 Views 100

4.1 Practical Learning: Introducing Views 100

4.2 Fundamentals of Creating and Using a View 116

4.2.1 Visually Creating a View 116

4.2.2 The Name of a View 118

4.3 Practical Learning: Visually Creating a View 120

4.4 Programmatically Creating and Using a View 142

4.4.1 Creating a View 142

4.4.2 Practical Learning: Programmatically Creating a View 142

4.4.3 Executing a View 143

4.5 View Maintenance 143

4.5.1 The Properties of a View 143

4.5.2 Modifying a View 143

4.5.3 Deleting a View 144

4.6 Using a View 145

4.6.1 Data Entry with a View 145

4.6.2 Practical Learning: Performing Data Entry Using a View 145

4.6.3 Views and Functions 147

4.6.4 A View with Alias Names 148

4.6.5 Views and Conditional Statements 149


5 Stored Procedures 150

5.1 Creating a Stored Procedure 150

5.2 Managing Procedures 151

5.2.1 Modifying a Procedure 151

5.2.2 Deleting a Procedure 151

5.3 Exploring Procedures 152

5.3.1 Introduction 152

5.3.2 Practical Learning: Creating a Stored Procedure 153

5.3.3 Executing a Procedure 153

5.4 Practical Learning: Executing a Stored Procedure 154

5.4.1 Using Expressions and Functions 154

5.4.2 Practical Learning: Using Expressions and Functions 155

5.4.3 Introduction to Arguments of a Stored Procedure 156

5.4.4 Executing an Argumentative Stored Procedure 157

5.4.5 In SQL 158

5.4.6 Default Arguments 161

5.4.7 Output Parameters 167

6 References 169


Entity Relationship (ER) Modeling

In this chapter, you will learn:

The main characteristics of entity relationship components

How relationships between entities are defined, refined, and incorporated into the database

design process

How ERD components affect database design and implementation

That real-world database design often requires the reconciliation of conflicting goals

This chapter expands coverage of the data-modeling aspect of database design. Data

modeling is the first step in the database design journey, serving as a bridge between

real-world objects and the database model that is implemented in the computer. Therefore,

the importance of data-modeling details, expressed graphically through entity relationship

diagrams (ERDs), cannot be overstated.

Most of the basic concepts and definitions used in the entity relationship model (ERM) were

introduced in Chapter 2, Data Models. For example, the basic components of entities and

relationships and their representation should now be familiar to you. This chapter goes

much deeper and further, analyzing the graphic depiction of relationships among the entities

and showing how those depictions help you summarize the wealth of data required to

implement a successful design.

Finally, the chapter illustrates how conflicting goals can be a challenge in database design,

possibly requiring you to make design compromises.



Note

Because this book generally focuses on the relational model, you might be tempted to conclude that the ERM is

exclusively a relational tool. Actually, conceptual models such as the ERM can be used to understand and

design the data requirements of an organization. Therefore, the ERM is independent of the database type.

Conceptual models are used in the conceptual design of databases, while relational models are used in the

logical design of databases. However, because you are now familiar with the relational model from the previous

chapter, the relational model is used extensively in this chapter to explain ER constructs and the way they are

used to develop database designs.

1.1 THE ENTITY RELATIONSHIP MODEL (ERM)

You should remember from Chapter 2, Data Models, and Chapter 3, The Relational Database Model, that the ERM

forms the basis of an ERD. The ERD represents the conceptual database as viewed by the end user. ERDs depict the

database’s main components: entities, attributes, and relationships. Because an entity represents a real-world object, the

words entity and object are often used interchangeably. Thus, the entities (objects) of the Tiny College database design

developed in this chapter include students, classes, teachers, and classrooms. The order in which the ERD

components are covered in the chapter is dictated by the way the modeling tools are used to develop ERDs that can

form the basis for successful database design and implementation.

In Chapter 2, you also learned about the various notations used with ERDs—the original Chen notation and the newer

Crow’s Foot and UML notations. The first two notations are used at the beginning of this chapter to introduce some

basic ER modeling concepts. Some conceptual database modeling concepts can be expressed only using the Chen

notation. However, because the emphasis is on design and implementation of databases, the Crow’s Foot and UML class

diagram notations are used for the final Tiny College ER diagram example. Because of its implementation emphasis,

the Crow’s Foot notation can represent only what could be implemented. In other words:

The Chen notation favors conceptual modeling.

The Crow’s Foot notation favors a more implementation-oriented approach.

The UML notation can be used for both conceptual and implementation modeling.

Online Content

To learn how to create ER diagrams with the help of Microsoft Visio, see the Premium Website for this book:

Appendix A, Designing Databases with Visio Professional: A Tutorial shows you how to create Crow’s

Foot ERDs.

Appendix H, Unified Modeling Language (UML), shows you how to create UML class diagrams.

1.1.1 Entities

Recall that an entity is an object of interest to the end user. In Chapter 2, you learned that at the ER modeling level, an

entity actually refers to the entity set and not to a single entity occurrence. In other words, the word entity in the ERM

corresponds to a table—not to a row—in the relational environment. The ERM refers to a table row as an entity

instance or entity occurrence. In both the Chen and Crow’s Foot notations, an entity is represented by a rectangle

containing the entity’s name. The entity name, a noun, is usually written in all capital letters.


1.1.2 Attributes

Attributes are characteristics of entities. For example, the STUDENT entity includes, among many others, the

attributes STU_LNAME, STU_FNAME, and STU_INITIAL. In the original Chen notation, attributes are represented

by ovals and are connected to the entity rectangle with a line. Each oval contains the name of the attribute it represents.

In the Crow’s Foot notation, the attributes are written in the attribute box below the entity rectangle. (See Figure 4.1.)

Because the Chen representation is rather space-consuming, software vendors have adopted the Crow’s Foot attribute

display.

FIGURE 1.1 The attributes of the STUDENT entity: Chen and Crow’s Foot models (attributes shown: STU_LNAME, STU_FNAME, STU_INITIAL, STU_EMAIL, STU_PHONE)

Required and Optional Attributes

A required attribute is an attribute that must have a value; in other words, it cannot be left empty. As shown in

Figure 4.1, there are two boldfaced attributes in the Crow’s Foot notation. This indicates that a data entry will be

required. In this example, STU_LNAME and STU_FNAME require data entries because of the assumption that all

students have a last name and a first name. But students might not have a middle name, and perhaps they do not (yet)

have a phone number and an e-mail address. Therefore, those attributes are not presented in boldface in the entity

box. An optional attribute is an attribute that does not require a value; therefore, it can be left empty.

Domains

Attributes have a domain. As you learned in Chapter 3, a domain is the set of possible values for a given attribute.

For example, the domain for the grade point average (GPA) attribute is written (0,4) because the lowest possible GPA

value is 0 and the highest possible value is 4. The domain for the gender attribute consists of only two possibilities: M

or F (or some other equivalent code). The domain for a company’s date of hire attribute consists of all dates that fit

in a range (for example, company startup date to current date).
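The ideas of required attributes and domains map directly onto column constraints once the design is implemented. The following is a minimal SQL sketch of that mapping; the table layout and data types are assumptions for illustration, not the chapter's definitive STUDENT design.

-- Required attributes become NOT NULL columns; domains become CHECK constraints.
CREATE TABLE STUDENT (
    STU_NUM     INT          PRIMARY KEY,
    STU_LNAME   VARCHAR(30)  NOT NULL,                             -- required attribute
    STU_FNAME   VARCHAR(30)  NOT NULL,                             -- required attribute
    STU_INITIAL CHAR(1),                                           -- optional attribute (may be NULL)
    STU_GPA     DECIMAL(3,2) CHECK (STU_GPA BETWEEN 0 AND 4),      -- domain (0,4)
    STU_GENDER  CHAR(1)      CHECK (STU_GENDER IN ('M', 'F'))      -- coded domain
);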

Attributes may share a domain. For instance, a student address and a professor address share the same domain of all

possible addresses. In fact, the data dictionary may let a newly declared attribute inherit the characteristics of an existing

attribute if the same attribute name is used. For example, the PROFESSOR and STUDENT entities may each have an

attribute named ADDRESS and could therefore share a domain.

Identifiers (Primary Keys)

The ERM uses identifiers, that is, one or more attributes that uniquely identify each entity instance. In the relational

model, such identifiers are mapped to primary keys (PKs) in tables. Identifiers are underlined in the ERD. Key attributes

are also underlined in a frequently used table structure shorthand notation using the format:

TABLE NAME (KEY_ATTRIBUTE 1, ATTRIBUTE 2, ATTRIBUTE 3, . . . ATTRIBUTE K)



For example, a CAR entity may be represented by:

CAR (CAR_VIN, MOD_CODE, CAR_YEAR, CAR_COLOR)

(Each car is identified by a unique vehicle identification number, or CAR_VIN.)

Composite Identifiers

Ideally, an entity identifier is composed of only a single attribute. For example, the table in Figure 4.2 uses a

single-attribute primary key named CLASS_CODE. However, it is possible to use a composite identifier, that is, a

primary key composed of more than one attribute. For instance, the Tiny College database administrator may decide

to identify each CLASS entity instance (occurrence) by using a composite primary key composed of the combination

of CRS_CODE and CLASS_SECTION instead of using CLASS_CODE. Either approach uniquely identifies each entity

instance. Given the current structure of the CLASS table shown in Figure 4.2, CLASS_CODE is the primary key, and

the combination of CRS_CODE and CLASS_SECTION is a proper candidate key. If the CLASS_CODE attribute is

deleted from the CLASS entity, the candidate key (CRS_CODE and CLASS_SECTION) becomes an acceptable

composite primary key.

FIGURE 1.2 The CLASS table (entity) components and contents

Note

Remember that Chapter 3 made a commonly accepted distinction between COURSE and CLASS. A CLASS

constitutes a specific time and place of a COURSE offering. A class is defined by the course description and its

time and place, or section. Consider a professor who teaches Database I, Section 2; Database I, Section 5;

Database I, Section 8; and Spreadsheet II, Section 6. That instructor teaches two courses (Database I and

Spreadsheet II), but four classes. Typically, the COURSE offerings are printed in a course catalog, while the

CLASS offerings are printed in a class schedule for each semester, trimester, or quarter.

If the CLASS_CODE in Figure 4.2 is used as the primary key, the CLASS entity may be represented in shorthand

form by:

CLASS (CLASS_CODE, CRS_CODE, CLASS_SECTION, CLASS_TIME, ROOM_CODE, PROF_NUM)

On the other hand, if CLASS_CODE is deleted, and the composite primary key is the combination of CRS_CODE and CLASS_SECTION, the CLASS entity may be represented by:

CLASS (CRS_CODE, CLASS_SECTION, CLASS_TIME, ROOM_CODE, PROF_NUM)

Note that both key attributes are underlined in the entity notation.
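In SQL terms, the two CLASS designs differ only in which candidate key is promoted to primary key. The sketch below shows both alternatives (only one would exist in an actual schema); the data types are assumptions.

-- Alternative 1: single-attribute primary key CLASS_CODE.
CREATE TABLE CLASS (
    CLASS_CODE    INT         PRIMARY KEY,
    CRS_CODE      VARCHAR(10) NOT NULL,
    CLASS_SECTION VARCHAR(4)  NOT NULL,
    CLASS_TIME    VARCHAR(20),
    ROOM_CODE     VARCHAR(8),
    PROF_NUM      INT,
    UNIQUE (CRS_CODE, CLASS_SECTION)          -- the remaining candidate key
);

-- Alternative 2: composite primary key CRS_CODE + CLASS_SECTION.
CREATE TABLE CLASS (
    CRS_CODE      VARCHAR(10),
    CLASS_SECTION VARCHAR(4),
    CLASS_TIME    VARCHAR(20),
    ROOM_CODE     VARCHAR(8),
    PROF_NUM      INT,
    PRIMARY KEY (CRS_CODE, CLASS_SECTION)
);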

Composite and Simple Attributes

Attributes are classified as simple or composite. A composite attribute, not to be confused with a composite key, is

an attribute that can be further subdivided to yield additional attributes. For example, the attribute ADDRESS can be

subdivided into street, city, state, and zip code. Similarly, the attribute PHONE_NUMBER can be subdivided into area

code and exchange number. A simple attribute is an attribute that cannot be subdivided. For example, age, sex, and

marital status would be classified as simple attributes. To facilitate detailed queries, it is wise to change composite

attributes into a series of simple attributes.

Single-Valued Attributes

A single-valued attribute is an attribute that can have only a single value. For example, a person can have only one

Social Security number, and a manufactured part can have only one serial number. Keep in mind that a single-valued

attribute is not necessarily a simple attribute. For instance, a part’s serial number, such as SE-08-02-189935, is single-

valued, but it is a composite attribute because it can be subdivided into the region in which the part was produced (SE), the

plant within that region (08), the shift within the plant (02), and the part number (189935).

Multivalued Attributes

Multivalued attributes are attributes that can have many values. For instance, a person may have several college

degrees, and a household may have several different phones, each with its own number. Similarly, a car’s color may be

subdivided into many colors (that is, colors for the roof, body, and trim). In the Chen ERM, the multivalued attributes are

shown by a double line connecting the attribute to the entity. The Crow’s Foot notation does not identify

multivalued attributes. The ERD in Figure 4.3 contains all of the components introduced thus far. In Figure 4.3, note that

CAR_VIN is the primary key, and CAR_COLOR is a multivalued attribute of the CAR entity.

FIGURE 1.3 A multivalued attribute in an entity: Chen and Crow’s Foot models (CAR entity with CAR_VIN, MOD_CODE, CAR_YEAR, and the multivalued attribute CAR_COLOR)


Note

In the ERD models in Figure 4.3, the CAR entity’s foreign key (FK) has been typed as MOD_CODE. This attribute was

manually added to the entity. Actually, proper use of database modeling software will automatically produce

the FK when the relationship is defined. In addition, the software will label the FK appropriately and write the FK’s

implementation details in a data dictionary. Therefore, when you use database modeling software like Visio

Professional, never type the FK attribute yourself; let the software handle that task when the relationship

between the entities is defined. (You can see how that's done in Appendix A, Designing Databases with Visio

Professional: A Tutorial, in the Premium Website.)

Implementing Multivalued Attributes

Although the conceptual model can handle M:N relationships and multivalued attributes, you should not implement

them in the RDBMS. Remember from Chapter 3 that in the relational table, each column/row intersection represents a

single data value. So if multivalued attributes exist, the designer must decide on one of two possible courses of action:

1. Within the original entity, create several new attributes, one for each of the original multivalued attribute’s

components. For example, the CAR entity’s attribute CAR_COLOR can be split to create the new attributes

CAR_TOPCOLOR, CAR_BODYCOLOR, and CAR_TRIMCOLOR, which are then assigned to the CAR

entity. (See Figure 4.4.)

FIGURE 1.4 Splitting the multivalued attribute into new attributes: Chen and Crow’s Foot models (CAR entity with CAR_TOPCOLOR, CAR_BODYCOLOR, and CAR_TRIMCOLOR)

Although this solution seems to work, its adoption can lead to major structural problems in the table. For

example, if additional color components—such as a logo color—are added for some cars, the table structure

must be modified to accommodate the new color section. In that case, cars that do not have such color sections

generate nulls for the nonexisting components, or their color entries for those sections are entered as N/A to

indicate “not applicable.” (Imagine how the solution in Figure 4.4—splitting a multivalued attribute into new

attributes—would cause problems if it were applied to an employee entity containing employee degrees and

certifications. If some employees have 10 degrees and certifications while most have fewer or none, the

number of degree/certification attributes would number 10, and most of those attribute values would be null for

most of the employees.) In short, although you have seen solution 1 applied, it is not an acceptable solution.

2. Create a new entity composed of the original multivalued attribute’s components. This new entity allows the

designer to define color for different sections of the car. (See Table 4.1.) Then, this new CAR_COLOR entity

is related to the original CAR entity in a 1:M relationship.

TABLE 1.1 Components of the Multivalued Attribute

SECTION    COLOR
Top        White
Body       Blue
Trim       Gold
Interior   Blue

Using the approach illustrated in Table 4.1, you even get a

fringe benefit: you are now able to assign as many colors as

necessary without having to change the table structure. Note

that the ERM shown in Figure 4.5 reflects the components

listed in Table 4.1. This is the preferred way to deal with

multivalued attributes. Creating a new entity in a 1:M relationship with the original entity yields several benefits: it’s a

more flexible, expandable solution, and it is compatible with

the relational model!
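A minimal SQL sketch of this preferred solution follows; the data types are assumptions. Each color section of a car becomes one row in CAR_COLOR, so new sections can be added without changing the table structure.

CREATE TABLE CAR (
    CAR_VIN  VARCHAR(17) PRIMARY KEY,
    MOD_CODE VARCHAR(10),
    CAR_YEAR SMALLINT
);

CREATE TABLE CAR_COLOR (
    CAR_VIN  VARCHAR(17) NOT NULL REFERENCES CAR (CAR_VIN),
    SECTION  VARCHAR(15) NOT NULL,          -- Top, Body, Trim, Interior, ...
    COLOR    VARCHAR(15) NOT NULL,
    PRIMARY KEY (CAR_VIN, SECTION)          -- one row per car per color section
);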


FIGURE 1.5 A new entity set composed of a multivalued attribute’s components

Derived Attributes

Finally, an attribute may be classified as a derived attribute. A derived attribute is an attribute whose value is

calculated (derived) from other attributes. The derived attribute need not be physically stored within the database;

instead, it can be derived by using an algorithm. For example, an employee’s age, EMP_AGE, may be found by

computing the integer value of the difference between the current date and the EMP_DOB. If you use Microsoft

Access, you would use the formula INT((DATE() - EMP_DOB)/365). In Microsoft SQL Server, you would use SELECT

DATEDIFF(YEAR, EMP_DOB, GETDATE()), where DATEDIFF is a function that computes the difference between

dates. The first parameter indicates the unit of measurement, in this case, years.

If you use Oracle, you would use SYSDATE instead of DATE(). (You are assuming, of course, that the EMP_DOB was

stored in the Julian date format.) Similarly, the total cost of an order can be derived by multiplying the quantity ordered

by the unit price. Or the estimated average speed can be derived by dividing trip distance by the time spent en route.
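Because a derived attribute need not be stored, it is often computed in the query itself. A minimal SQL Server-style sketch using the EMPLOYEE attributes from this example is shown below; like the formulas above, it only approximates age because it counts year boundaries rather than full birthdays.

SELECT EMP_NUM,
       EMP_LNAME,
       DATEDIFF(YEAR, EMP_DOB, GETDATE()) AS EMP_AGE   -- derived at query time, not stored
FROM   EMPLOYEE;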

A derived attribute is indicated in the Chen notation by a dashed line connecting the attribute and the entity. (See

Figure 4.6.) The Crow’s Foot notation does not have a method for distinguishing the derived attribute from other

attributes.

Derived attributes are sometimes referred to as computed attributes. A derived attribute computation can be as simple as

adding two attribute values located on the same row, or it can be the result of aggregating the sum of values located on

many table rows (from the same table or from a different table). The decision to store derived attributes in database tables

depends on the processing requirements and the constraints placed on a particular application. The designer should be

able to balance the design in accordance with such constraints. Table 4.2 shows the advantages and disadvantages of

storing (or not storing) derived attributes in the database.

1.1.3 Relationships

Recall from Chapter 2 that a relationship is an association between entities. The entities that participate in a

relationship are also known as participants, and each relationship is identified by a name that describes the

relationship. The relationship name is an active or passive verb; for example, a STUDENT takes a CLASS, a

PROFESSOR teaches a CLASS, a DEPARTMENT employs a PROFESSOR, a DIVISION is managed by an

EMPLOYEE, and an AIRCRAFT is flown by a CREW.


FIGURE 1.6 Depiction of a derived attribute: Chen and Crow’s Foot models (EMPLOYEE entity with the derived attribute EMP_AGE)

TABLE 1.2 Advantages and Disadvantages of Storing Derived Attributes

                DERIVED ATTRIBUTE STORED                        DERIVED ATTRIBUTE NOT STORED
Advantage       Saves CPU processing cycles                     Saves storage space
                Saves data access time                          Computation always yields current value
                Data value is readily available
                Can be used to keep track of historical data
Disadvantage    Requires constant maintenance to ensure         Uses CPU processing cycles
                derived value is current, especially if any     Increases data access time
                values used in the calculation change           Adds coding complexity to queries

Relationships between entities always operate in both directions. That is, to define the relationship between the entities

named CUSTOMER and INVOICE, you would specify that:

A CUSTOMER may generate many INVOICEs.

Each INVOICE is generated by one CUSTOMER.

Because you know both directions of the relationship between CUSTOMER and INVOICE, it is easy to see that this

relationship can be classified as 1:M.

The relationship classification is difficult to establish if you know only one side of the relationship. For example, if you

specify that:

A DIVISION is managed by one EMPLOYEE.

You don’t know if the relationship is 1:1 or 1:M. Therefore, you should ask the question “Can an employee manage

more than one division?” If the answer is yes, the relationship is 1:M, and the second part of the relationship is then

written as:

An EMPLOYEE may manage many DIVISIONs.

If an employee cannot manage more than one division, the relationship is 1:1, and the second part of the relationship is

then written as:

An EMPLOYEE may manage only one DIVISION.


1.1.4 Connectivity and Cardinality


You learned in Chapter 2 that entity relationships may be classified as one-to-one, one-to-many, or many-to-many. You

also learned how such relationships were depicted in the Chen and Crow’s Foot notations. The term connectivity is

used to describe the relationship classification.

Cardinality expresses the minimum and maximum number of entity occurrences associated with one occurrence of the

related entity. In the ERD, cardinality is indicated by placing the appropriate numbers beside the entities, using the format

(x,y). The first value represents the minimum number of associated entities, while the second value represents the

maximum number of associated entities. Many database designers who use Crow’s Foot modeling notation do not depict

the specific cardinalities on the ER diagram itself because the specific limits described by the cardinalities cannot be

implemented directly through the database design. Correspondingly, some Crow’s Foot ER modeling tools do not print

the numeric cardinality range in the diagram; instead, you can add it as text if you want to have it shown. When the

specific cardinalities are not included on the diagram in Crow’s Foot notation, cardinality is implied by the use of the

symbols shown in Figure 4.7, which describe the connectivity and participation (discussed below). The numeric

cardinality range has been added using the Visio text drawing tool.

FIGURE 1.7 Connectivity and cardinality in an ERD

Knowing the minimum and maximum number of entity occurrences is very useful at the application software level.

For example, Tiny College might want to ensure that a class

is not taught unless it has at least 10 students enrolled.

Similarly, if the classroom can hold only 30 students, the

application software should use that cardinality to limit

enrollment in the class. However, keep in mind that the

DBMS cannot handle the implementation of the cardinalities

at the table level—that capability is provided by the application software or by triggers. You will learn how to create and

execute triggers in Chapter 8, Advanced SQL.
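As a preview of the trigger approach, here is a minimal T-SQL (SQL Server) sketch that rejects an enrollment once a class already holds 30 students. The ENROLL table, its columns, and the fixed limit of 30 are assumptions used only to illustrate the idea; Chapter 8 covers triggers properly.

CREATE TRIGGER TRG_ENROLL_LIMIT
ON ENROLL
AFTER INSERT
AS
BEGIN
    IF EXISTS (SELECT 1
               FROM ENROLL E
               WHERE E.CLASS_CODE IN (SELECT CLASS_CODE FROM inserted)
               GROUP BY E.CLASS_CODE
               HAVING COUNT(*) > 30)               -- maximum cardinality check
    BEGIN
        RAISERROR ('Class enrollment limit exceeded.', 16, 1);
        ROLLBACK TRANSACTION;
    END
END;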

As you examine the Crow’s Foot diagram in Figure 4.7,

keep in mind that the cardinalities represent the number of occurrences in the related entity. For example, the

cardinality (1,4) written next to the CLASS entity in the “PROFESSOR teaches CLASS” relationship indicates that

each professor teaches up to four classes, which means that the PROFESSOR table’s primary key value occurs at least

once and no more than four times as foreign key values in the CLASS table. If the cardinality had been written as (1,N),

there would be no upper limit to the number of classes a professor might teach. Similarly, the cardinality (1,1) written

next to the PROFESSOR entity indicates that each class is taught by one and only one professor. That is, each CLASS

entity occurrence is associated with one and only one entity occurrence in PROFESSOR.

Connectivities and cardinalities are established by very concise statements known as business rules, which were

introduced in Chapter 2. Such rules, derived from a precise and detailed description of an organization’s data

environment, also establish the ERM’s entities, attributes, relationships, connectivities, cardinalities, and constraints.

Because business rules define the ERM’s components, making sure that all appropriate business rules are identified is a

very important part of a database designer’s job.

Note

The placement of the cardinalities in the ER diagram is a matter of convention. The Chen notation places the

cardinalities on the side of the related entity. The Crow’s Foot and UML diagrams place the cardinalities next to the

entity to which the cardinalities apply.


Online Content

Because the careful definition of complete and accurate business rules is crucial to good database design, their

derivation is examined in detail in Appendix B, The University Lab: Conceptual Design. The modeling skills you

are learning in this chapter are applied in the development of a real database design in Appendix B. The initial

design shown in Appendix B is then modified in Appendix C, The University Lab: Conceptual Design

Verification, Logical Design, and Implementation. (Both appendixes are found in the Premium Website.)

1.1.5 Existence Dependence

An entity is said to be existence-dependent if it can exist in the database only when it is associated with another

related entity occurrence. In implementation terms, an entity is existence-dependent if it has a mandatory foreign

key—that is, a foreign key attribute that cannot be null. For example, if an employee wants to claim one or more

dependents for tax-withholding purposes, the relationship “EMPLOYEE claims DEPENDENT” would be appropriate. In

that case, the DEPENDENT entity is clearly existence-dependent on the EMPLOYEE entity because it is impossible for

the dependent to exist apart from the EMPLOYEE in the database.

If an entity can exist apart from all of its related entities (it is existence-independent), then that entity is referred to as

a strong entity or regular entity. For example, suppose that the XYZ Corporation uses parts to produce its

products. Furthermore, suppose that some of those parts are produced in-house and other parts are bought from

vendors. In that scenario, it is quite possible for a PART to exist independently from a VENDOR in the relationship

“PART is supplied by VENDOR,” because at least some of the parts are not supplied by a vendor. Therefore, PART is

existence-independent from VENDOR.

Note

The relationship strength concept is not part of the original ERM. Instead, this concept applies directly to Crow’s Foot

diagrams. Because Crow’s Foot diagrams are used extensively to design relational databases, it is important to

understand relationship strength as it affects database implementation. The Chen ERD notation is oriented toward

conceptual modeling and therefore does not distinguish between weak and strong relationships.

1.1.6 Relationship Strength

The concept of relationship strength is based on how the primary key of a related entity is defined. To implement a

relationship, the primary key of one entity appears as a foreign key in the related entity. For example, the 1:M

relationship between VENDOR and PRODUCT in Chapter 3, Figure 3.3, is implemented by using the VEND_CODE

primary key in VENDOR as a foreign key in PRODUCT. There are times when the foreign key also is a primary key

component in the related entity. For example, in Figure 4.5, the CAR entity primary key (CAR_VIN) appears as both a

primary key component and a foreign key in the CAR_COLOR entity. In this section, you will learn how various

relationship strength decisions affect primary key arrangement in database design.


Weak (Non-identifying) Relationships


A weak relationship, also known as a non-identifying relationship, exists if the PK of the related entity does not

contain a PK component of the parent entity. By default, relationships are established by having the PK of the parent

entity appear as an FK on the related entity. For example, suppose that the COURSE and CLASS entities are

defined as:

COURSE(CRS_CODE, DEPT_CODE, CRS_DESCRIPTION, CRS_CREDIT)

CLASS(CLASS_CODE, CRS_CODE, CLASS_SECTION, CLASS_TIME, ROOM_CODE, PROF_NUM)

In this case, a weak relationship exists between COURSE and CLASS because the CLASS_CODE is the CLASS

entity’s PK, while the CRS_CODE in CLASS is only an FK. In this example, the CLASS PK did not inherit the PK

component from the COURSE entity.

Figure 4.8 shows how the Crow’s Foot notation depicts a weak relationship by placing a dashed relationship line

between the entities. The tables shown below the ERD illustrate how such a relationship is implemented.

FIGURE 1.8 A weak (non-identifying) relationship between COURSE and CLASS (Table names: COURSE, CLASS; Database name: Ch04_TinyCollege)
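A sketch of how this weak (non-identifying) relationship might be implemented in SQL follows. CLASS keeps its own single-attribute primary key, and CRS_CODE is only a foreign key; the data types are assumptions.

CREATE TABLE COURSE (
    CRS_CODE        VARCHAR(10) PRIMARY KEY,
    DEPT_CODE       VARCHAR(10),
    CRS_DESCRIPTION VARCHAR(50),
    CRS_CREDIT      SMALLINT
);

CREATE TABLE CLASS (
    CLASS_CODE    INT         PRIMARY KEY,    -- PK contains no part of the parent's PK
    CRS_CODE      VARCHAR(10) NOT NULL REFERENCES COURSE (CRS_CODE),
    CLASS_SECTION VARCHAR(4),
    CLASS_TIME    VARCHAR(20),
    ROOM_CODE     VARCHAR(8),
    PROF_NUM      INT
);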


Online Content

All of the databases used to illustrate the material in this chapter are found in the Premium Website.

Note

If you are used to looking at relational diagrams such as the ones produced by Microsoft Access, you expect to see

the relationship line in the relational diagram drawn from the PK to the FK. However, the relational diagram convention

is not necessarily reflected in the ERD. In an ERD, the focus is on the entities and the relationships between them,

rather than on the way those relationships are anchored graphically. You will discover that the placement of the

relationship lines in a complex ERD that includes both horizontally and vertically placed entities is largely

dictated by the designer’s decision to improve the readability of the design. (Remember that the ERD is used for

communication between the designer(s) and end users.)

Strong (Identifying) Relationships

A strong relationship, also known as an identifying relationship, exists when the PK of the related entity contains a

PK component of the parent entity. For example, the definitions of the COURSE and CLASS entities

COURSE(CRS_CODE, DEPT_CODE, CRS_DESCRIPTION, CRS_CREDIT)

CLASS(CRS_CODE, CLASS_SECTION, CLASS_TIME, ROOM_CODE, PROF_NUM)

indicate that a strong relationship exists between COURSE and CLASS, because the CLASS entity’s composite PK is

composed of CRS_CODE + CLASS_SECTION. (Note that the CRS_CODE in CLASS is also the FK to the

COURSE entity.)

The Crow’s Foot notation depicts the strong (identifying) relationship with a solid line between the entities, shown in

Figure 4.9. Whether the relationship between COURSE and CLASS is strong or weak depends on how the CLASS

entity’s primary key is defined.

Keep in mind that the order in which the tables are created and loaded is very important. For example, in the

“COURSE generates CLASS” relationship, the COURSE table must be created before the CLASS table. After all, it

would not be acceptable to have the CLASS table’s foreign key reference a COURSE table that did not yet exist. In fact,

you must load the data of the “1” side first in a 1:M relationship to avoid the possibility of referential

integrity errors, regardless of whether the relationships are weak or strong.
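For contrast, a sketch of the strong (identifying) version is shown below, reusing the COURSE table from the earlier weak-relationship sketch. Here the foreign key CRS_CODE is also part of the CLASS primary key, and COURSE must be created and loaded first; the data types are assumptions.

CREATE TABLE CLASS (
    CRS_CODE      VARCHAR(10) REFERENCES COURSE (CRS_CODE),   -- FK and part of the PK
    CLASS_SECTION VARCHAR(4),
    CLASS_TIME    VARCHAR(20),
    ROOM_CODE     VARCHAR(8),
    PROF_NUM      INT,
    PRIMARY KEY (CRS_CODE, CLASS_SECTION)                     -- identifying relationship
);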

As you examine Figure 4.9 you might wonder what the O symbol next to the CLASS entity signifies. You will discover the

meaning of this cardinality in Section 4.1.8, Relationship Participation.

Remember that the nature of the relationship is often determined by the database designer, who must use professional

judgment to determine which relationship type and strength best suit the database transaction, efficiency, and

information requirements. That point will often be emphasized in detail!

1.1.7 Weak Entities

In contrast to the strong or regular entity mentioned in Section 4.1.5, a weak entity is one that meets two conditions:

1. The entity is existence-dependent; that is, it cannot exist without the entity with which it has a relationship.

2. The entity has a primary key that is partially or totally derived from the parent entity in the relationship.

FIGURE 1.9 A strong (identifying) relationship between COURSE and CLASS (Table names: COURSE, CLASS; Database name: Ch04_TinyCollege_Alt)

For example, a company insurance policy insures an employee and his/her dependents. For the purpose of describing

an insurance policy, an EMPLOYEE might or might not have a DEPENDENT, but the DEPENDENT must be

associated with an EMPLOYEE. Moreover, the DEPENDENT cannot exist without the EMPLOYEE; that is, a person

cannot get insurance coverage as a dependent unless s(he) happens to be a dependent of an employee. DEPENDENT

is the weak entity in the relationship “EMPLOYEE has DEPENDENT.” This relationship is shown in Figure 4.10.

Note that the Chen notation in Figure 4.10 identifies the weak entity by using a double-walled entity rectangle. The Crow’s

Foot notation generated by Visio Professional uses the relationship line and the PK/FK designation to indicate whether the

related entity is weak. A strong (identifying) relationship indicates that the related entity is weak. Such a relationship means

that both conditions for the weak entity definition have been met—the related entity is existence-dependent, and the PK

of the related entity contains a PK component of the parent entity. (Some versions of the Crow’s Foot ERD depict the

weak entity by drawing a short line segment in each of the four corners of the weak entity box.)

Remember that the weak entity inherits part of its primary key from its strong counterpart. For example, at least part of

the DEPENDENT entity’s key shown in Figure 4.10 was inherited from the EMPLOYEE entity:

EMPLOYEE (EMP_NUM, EMP_LNAME, EMP_FNAME, EMP_INITIAL, EMP_DOB, EMP_HIREDATE)

DEPENDENT (EMP_NUM, DEP_NUM, DEP_FNAME, DEP_DOB)
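A SQL sketch of this weak entity is shown below. DEPENDENT inherits EMP_NUM as part of its primary key and cannot exist without its EMPLOYEE; the ON DELETE CASCADE clause and the data types are assumptions added for illustration.

CREATE TABLE EMPLOYEE (
    EMP_NUM      INT PRIMARY KEY,
    EMP_LNAME    VARCHAR(30) NOT NULL,
    EMP_FNAME    VARCHAR(30) NOT NULL,
    EMP_INITIAL  CHAR(1),
    EMP_DOB      DATE,
    EMP_HIREDATE DATE
);

CREATE TABLE DEPENDENT (
    EMP_NUM   INT NOT NULL REFERENCES EMPLOYEE (EMP_NUM) ON DELETE CASCADE,
    DEP_NUM   INT NOT NULL,
    DEP_FNAME VARCHAR(30),
    DEP_DOB   DATE,
    PRIMARY KEY (EMP_NUM, DEP_NUM)      -- key partially inherited from the parent entity
);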


FIGURE 1.10 A weak entity in an ERD: Chen and Crow’s Foot models of the 1:M relationship “EMPLOYEE has DEPENDENT” with cardinalities (0,N) and (1,1)

Figure 4.11 illustrates the implementation of the relationship between the weak entity (DEPENDENT) and its parent or

strong counterpart (EMPLOYEE). Note that DEPENDENT’s primary key is composed of two attributes, EMP_NUM and

DEP_NUM, and that EMP_NUM was inherited from EMPLOYEE.

FIGURE 1.11 A weak entity in a strong relationship (Table names: EMPLOYEE, DEPENDENT; Database name: Ch04_ShortCo)


Given this scenario, and with the help of this relationship, you can determine that:

Jeanine J. Callifante claims two dependents, Annelise and Jorge.

Keep in mind that the database designer usually determines whether an entity can be described as weak based on the

business rules. An examination of the relationship between COURSE and CLASS in Figure 4.8 might cause you to

conclude that CLASS is a weak entity to COURSE. After all, in Figure 4.8, it seems clear that a CLASS cannot exist

without a COURSE; so there is existence dependence. For example, a student cannot enroll in the Accounting I class

ACCT-211, Section 3 (CLASS_CODE 10014) unless there is an ACCT-211 course. However, note that the CLASS

table’s primary key is CLASS_CODE, which is not derived from the COURSE parent entity. That is, CLASS may be

represented by:

CLASS (CLASS_CODE, CRS_CODE, CLASS_SECTION, CLASS_TIME, ROOM_CODE, PROF_NUM)

The second weak entity requirement has not been met; therefore, by definition, the CLASS entity in Figure 4.8 may

not be classified as weak. On the other hand, if the CLASS entity’s primary key had been defined as a composite key,

composed of the combination CRS_CODE and CLASS_SECTION, CLASS could be represented by:

CLASS (CRS_CODE, CLASS_SECTION, CLASS_TIME, ROOM_CODE, PROF_NUM)

In that case, illustrated in Figure 4.9, the CLASS primary key is partially derived from COURSE because CRS_CODE

is the COURSE table’s primary key. Given this decision, CLASS is a weak entity by definition. (In Visio Professional

Crow’s Foot terms, the relationship between COURSE and CLASS is classified as strong, or identifying.) In any case,

CLASS is always existence-dependent on COURSE, whether or not it is defined as weak.

1.1.8 Relationship Participation

Participation in an entity relationship is either optional or mandatory. Recall that relationships are bidirectional; that

is, they operate in both directions. If COURSE is related to CLASS, then by definition, CLASS is related to COURSE.

Because of the bidirectional nature of relationships, it is necessary to determine the connectivity of the relationship

from COURSE to CLASS and the connectivity of the relationship from CLASS to COURSE. Similarly, the specific

maximum and minimum cardinalities must be determined in each direction for the relationship. Once again, you must

consider the bidirectional nature of the relationship when determining participation.

Optional participation means that one entity occurrence does not require a corresponding entity occurrence in a

particular relationship. For example, in the “COURSE generates CLASS” relationship, you noted that at least some

courses do not generate a class. In other words, an entity occurrence (row) in the COURSE table does not necessarily

require the existence of a corresponding entity occurrence in the CLASS table. (Remember that each entity is

implemented as a table.) Therefore, the CLASS entity is considered to be optional to the COURSE entity. In Crow’s

Foot notation, an optional relationship between entities is shown by drawing a small circle (O) on the side of the

optional entity, as illustrated in Figure 4.9. The existence of an optional entity indicates that the minimum cardinality

is 0 for the optional entity. (The term optionality is used to label any condition in which one or more optional

relationships exist.)

Note

Remember that the burden of establishing the relationship is always placed on the entity that contains the

foreign key. In most cases, that will be the entity on the “many” side of the relationship.

Mandatory participation means that one entity occurrence requires a corresponding entity occurrence in a

particular relationship. If no optionality symbol is depicted with the entity, the entity is assumed to exist in a mandatory

relationship with the related entity. If the mandatory participation is depicted graphically, it is typically shown as a small


hash mark across the relationship line, similar to the Crow’s Foot depiction of a connectivity of 1. The existence of a

mandatory relationship indicates that the minimum cardinality is at least 1 for the mandatory entity.
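In the implementation, mandatory participation of the "1" side usually shows up as a NOT NULL foreign key, while a nullable foreign key leaves that side optional; a minimum cardinality greater than zero on the "many" side cannot be enforced by the foreign key alone. A minimal sketch, assuming a PROFESSOR table with primary key PROF_NUM and assumed data types:

CREATE TABLE CLASS (
    CLASS_CODE    INT PRIMARY KEY,
    CRS_CODE      VARCHAR(10) NOT NULL,
    CLASS_SECTION VARCHAR(4),
    PROF_NUM      INT NOT NULL REFERENCES PROFESSOR (PROF_NUM)  -- PROFESSOR is mandatory to CLASS
);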

Note

You might be tempted to conclude that relationships are weak when they occur between entities in an optional

relationship and that relationships are strong when they occur between entities in a mandatory relationship.

However, this conclusion is not warranted. Keep in mind that relationship participation and relationship

strength do not describe the same thing. You are likely to encounter a strong relationship when one entity is

optional to another. For example, the relationship between EMPLOYEE and DEPENDENT is clearly a strong one,

but DEPENDENT is clearly optional to EMPLOYEE. After all, you cannot require employees to have dependents.

And it is just as possible for a weak relationship to be established when one entity is mandatory to another. The

relationship strength depends on how the PK of the related entity is formulated, while the relationship

participation depends on how the business rule is written. For example, the business rules “Each part must be

supplied by a vendor” and “A part may or may not be supplied by a vendor” create different optionalities for

the same entities! Failure to understand this distinction may lead to poor design decisions that cause major

problems when table rows are inserted or deleted.

When you create a relationship in MS Visio, the default relationship will be mandatory on the “1” side and optional on

the “many” side. Table 4.3 shows the various connectivity and participation combinations that are supported by the

Crow’s Foot notation. Recall that these combinations are often referred to as cardinality in Crow’s Foot notation when

specific cardinalities are not used.

TABLE 1.3 Crow’s Foot Symbols

CARDINALITY    COMMENT
(0,N)          Zero or many. Many side is optional.
(1,N)          One or many. Many side is mandatory.
(1,1)          One and only one. 1 side is mandatory.
(0,1)          Zero or one. 1 side is optional.

Because relationship participation turns out to be a very important component of the database design process, let’s

examine a few more scenarios. Suppose that Tiny College employs some professors who conduct research without

teaching classes. If you examine the “PROFESSOR teaches CLASS” relationship, it is quite possible for a

PROFESSOR not to teach a CLASS. Therefore, CLASS is optional to PROFESSOR. On the other hand, a CLASS

must be taught by a PROFESSOR. Therefore, PROFESSOR is mandatory to CLASS. Note that the ERD model in

Figure 4.12 shows the cardinality next to CLASS to be (0,3), thus indicating that a professor may teach no classes at

all or as many as three classes. And each CLASS table row will reference one and only one PROFESSOR

row—assuming each class is taught by one and only one professor—represented by the (1,1) cardinality next to the

PROFESSOR table.

Failure to understand the distinction between mandatory and optional participation in relationships might yield

designs in which awkward (and unnecessary) temporary rows (entity instances) must be created just to accommodate

the creation of required entities. Therefore, it is important that you clearly understand the concepts of mandatory and

optional participation.

It is also important to understand that the semantics of a problem might determine the type of participation in a

relationship. For example, suppose that Tiny College offers several courses; each course has several classes. Note again the distinction between class and course in this discussion: a CLASS constitutes a specific offering (or section) of a COURSE. (Typically, courses are listed in the university’s course catalog, while classes are listed in the class schedules that students use to register for their classes.)

FIGURE 1.12 An optional CLASS entity in the relationship “PROFESSOR teaches CLASS”

Analyzing the CLASS entity’s contribution to the “COURSE generates CLASS” relationship, it is easy to see that a

CLASS cannot exist without a COURSE. Therefore, you can conclude that the COURSE entity is mandatory in the

relationship. But two scenarios for the CLASS entity may be written, shown in Figures 4.13 and 4.14.

FIGURE 1.13 CLASS is optional to COURSE

FIGURE 1.14 COURSE and CLASS in a mandatory relationship

The different scenarios are a function of the semantics of the problem; that is, they depend on how the relationship

is defined.

1. CLASS is optional. It is possible for the department to create the entity COURSE first and then create the

CLASS entity after making the teaching assignments. In the real world, such a scenario is very likely; there may

be courses for which sections (classes) have not yet been defined. In fact, some courses are taught only once

a year and do not generate classes each semester.

2. CLASS is mandatory. This condition is created by the constraint that is imposed by the semantics of the

statement “Each COURSE generates one or more CLASSes.” In ER terms, each COURSE in the “generates”

relationship must have at least one CLASS. Therefore, a CLASS must be created as the COURSE is created,

in order to comply with the semantics of the problem.

Keep in mind the practical aspects of the scenario presented in Figure 4.14. Given the semantics of this relationship,

the system should not accept a course that is not associated with at least one class section. Is such a rigid environment


desirable from an operational point of view? For example, when a new COURSE is created, the database first updates

the COURSE table, thereby inserting a COURSE entity that does not yet have a CLASS associated with it. Naturally,

the apparent problem seems to be solved when CLASS entities are inserted into the corresponding CLASS table.

However, because of the mandatory relationship, the system will be in temporary violation of the business rule

constraint. For practical purposes, it would be desirable to classify the CLASS as optional in order to produce a more

flexible design.

Finally, as you examine the scenarios presented in Figures 4.13 and 4.14, keep in mind the role of the DBMS. To

maintain data integrity, the DBMS must ensure that the “many” side (CLASS) is associated with a COURSE through

the foreign key rules.

1.1.9 Relationship Degree

A relationship degree indicates the number of entities or participants associated with a relationship. A unary

relationship exists when an association is maintained within a single entity. A binary relationship exists when two

entities are associated. A ternary relationship exists when three entities are associated. Although higher degrees exist,

they are rare and are not specifically named. (For example, an association of four entities is described simply as a four-

degree relationship.) Figure 4.15 shows these types of relationship degrees.

Unary Relationships

In the case of the unary relationship shown in Figure 4.15, an employee within the EMPLOYEE entity is the manager

for one or more employees within that entity. In this case, the existence of the “manages” relationship means that

EMPLOYEE requires another EMPLOYEE to be the manager—that is, EMPLOYEE has a relationship with itself. Such

a relationship is known as a recursive relationship. The various cases of recursive relationships will be explored in

Section 4.1.10.

Binary Relationships

A binary relationship exists when two entities are associated in a relationship. Binary relationships are most common. In

fact, to simplify the conceptual design, whenever possible, most higher-order (ternary and higher) relationships are

decomposed into appropriate equivalent binary relationships. In Figure 4.15, the relationship “a PROFESSOR teaches one

or more CLASSes” represents a binary relationship.

Ternary and Higher-Degree Relationships

Although most relationships are binary, the use of ternary and higher-order relationships does allow the designer some

latitude regarding the semantics of a problem. A ternary relationship implies an association among three different

entities. For example, note the relationships (and their consequences) in Figure 4.16, which are represented by the

following business rules:

A DOCTOR writes one or more PRESCRIPTIONs.

A PATIENT may receive one or more PRESCRIPTIONs.

A DRUG may appear in one or more PRESCRIPTIONs. (To simplify this example, assume that the business

rule states that each prescription contains only one drug. In short, if a doctor prescribes more than one drug, a

separate prescription must be written for each drug.)

As you examine the table contents in Figure 4.16, note that it is possible to track all transactions. For instance, you

can tell that the first prescription was written by doctor 32445 for patient 102, using the drug DRZ.

FIGURE 1.15 Three types of relationship degree

1.1.10 Recursive Relationships


As was previously mentioned, a recursive relationship is one in which a relationship can exist between occurrences of the

same entity set. (Naturally, such a condition is found within a unary relationship.) For example, a 1:M unary

relationship can be expressed by “an EMPLOYEE may manage many EMPLOYEEs, and each EMPLOYEE is

managed by one EMPLOYEE.” And as long as polygamy is not legal, a 1:1 unary relationship may be expressed by “an

EMPLOYEE may be married to one and only one other EMPLOYEE.” Finally, the M:N unary relationship may be

expressed by “a COURSE may be a prerequisite to many other COURSEs, and each COURSE may have many other

COURSEs as prerequisites.” Those relationships are shown in Figure 4.17.

The 1:1 relationship shown in Figure 4.17 can be implemented in the single table shown in Figure 4.18. Note that

you can determine that James Ramirez is married to Louise Ramirez, who is married to James Ramirez. And Anne

Jones is married to Anton Shapiro, who is married to Anne Jones.


FIGURE 1.16 The implementation of a ternary relationship (Table names: DOCTOR, PATIENT, DRUG, PRESCRIPTION; Database name: Ch04_Clinic)

FIGURE 1.17 An ER representation of recursive relationships

FIGURE 1.18 The 1:1 recursive relationship “EMPLOYEE is married to EMPLOYEE” (Table name: EMPLOYEE_V1; Database name: Ch04_PartCo)

Unary relationships are common in manufacturing

industries. For example, Figure 4.19 illustrates that a rotor

assembly (C-130) is composed of many parts, but each part

is used to create only one rotor assembly. Figure 4.19

indicates that a rotor assembly is composed of four 2.5-cm

washers, two cotter pins, one 2.5-cm steel shank, four

10.25-cm rotor blades, and two 2.5-cm hex nuts. The

relationship implemented in Figure 4.19 thus enables you to

track each part within each rotor assembly.

FIGURE 1.19 Another unary relationship: “PART contains PART” (Table name: PART_V1; Database name: Ch04_PartCo)

If a part can be used to assemble several different kinds of other parts and is itself composed of many parts, two tables are required to implement the “PART contains PART” relationship. Figure 4.20 illustrates such an environment. Parts

tracking is increasingly important as managers become more aware of the legal ramifications of producing more

complex output. In fact, in many industries, especially those involving aviation, full parts tracking is required by law.

FIGURE 1.20 Implementation of the M:N recursive relationship “PART contains PART” (Table names: PART, COMPONENT; Database name: Ch04_PartCo)
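A sketch of the two-table implementation is shown below. The COMPONENT bridge table references PART twice, once for the assembly and once for the part it contains; the column names beyond the keys and the data types are assumptions.

CREATE TABLE PART (
    PART_CODE        VARCHAR(10) PRIMARY KEY,
    PART_DESCRIPTION VARCHAR(50)
);

CREATE TABLE COMPONENT (
    PART_CODE         VARCHAR(10) NOT NULL REFERENCES PART (PART_CODE),  -- the assembly
    COMP_CODE         VARCHAR(10) NOT NULL REFERENCES PART (PART_CODE),  -- the contained part
    COMP_PARTS_NEEDED INT,                                               -- quantity used
    PRIMARY KEY (PART_CODE, COMP_CODE)
);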

The M:N recursive relationship might be more familiar in a school environment. For instance, note how the M:N

“COURSE requires COURSE” relationship illustrated in Figure 4.17 is implemented in Figure 4.21. In this example,

MATH-243 is a prerequisite to QM-261 and QM-362, while both MATH-243 and QM-261 are prerequisites to

QM-362.

Finally, the 1:M recursive relationship “EMPLOYEE manages EMPLOYEE,” shown in Figure 4.17, is implemented in

Figure 4.22.

One common pitfall when working with unary relationships is to confuse participation with referential integrity. In

theory, participation and referential integrity are very different concepts and are normally easy to distinguish in binary

relationships. In practical terms, however, participation and referential integrity are very similar because they are

both implemented through constraints on the same set of attributes. This similarity often leads to confusion when the

concepts are applied within the limited structure of a unary relationship. Consider the unary 1:1 relationship described

in Figure 4.18 of a spousal relationship between employees. Participation, as described above, is bidirectional,


FIGURE 1.21 Implementation of the M:N recursive relationship “COURSE requires COURSE” (Table names: COURSE, PREREQ; Database name: Ch04_TinyCollege)

FIGURE 1.22 Implementation of the 1:M recursive relationship “EMPLOYEE manages EMPLOYEE” (Table name: EMPLOYEE_V2; Database name: Ch04_PartCo)

meaning that it must be addressed in both directions along the relationship. Participation in Figure 4.18 addresses the

questions:

Must every employee have a spouse who is an employee?

Must every employee be a spouse to another employee?

For the data shown in Figure 4.18, the correct answer to both of those questions is “No.” It is possible to be an

employee and not have another employee as a spouse. Also, it is possible to be an employee and not be the spouse of

another employee.

Referential integrity deals with the correspondence of values in the foreign key with values in the related primary key.

Referential integrity is not bidirectional, and therefore has only one question that it answers.

Must every employee spouse be a valid employee?

For the data shown in Figure 4.18, the correct answer is “Yes.” Another way to frame this question is to consider

whether or not every value provided for the EMP_SPOUSE attribute must match some value in the EMP_NUM

attribute.

In practical terms, both participation and referential integrity involve the values used as primary key/foreign key to

implement the relationship. Referential integrity requires that the values in the foreign key correspond to values in the

primary key. In one direction, participation considers whether or not the foreign key can contain a null. In Figure 4.18,


for example, employee Robert Delaney is not required to have a value in EMP_SPOUSE. In the other direction,

participation considers whether or not every value in the primary key must appear as a value in the foreign key. In

Figure 4.18, for example, employee Robert Delaney’s value for EMP_NUM (348) is not required to appear as a value in

EMP_SPOUSE for any other employee.
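The distinction shows up clearly in a rough SQL sketch of the table in Figure 4.18 (apart from EMP_NUM and EMP_SPOUSE, the column names and data types are assumptions). The FOREIGN KEY clause enforces referential integrity, while leaving EMP_SPOUSE nullable, and not requiring every EMP_NUM to appear in EMP_SPOUSE, reflects the optional participation just described.

CREATE TABLE EMPLOYEE (
    EMP_NUM     INTEGER      PRIMARY KEY,
    EMP_LNAME   VARCHAR(25),
    EMP_SPOUSE  INTEGER      UNIQUE,    -- nullable: an employee need not have an employee spouse
    FOREIGN KEY (EMP_SPOUSE) REFERENCES EMPLOYEE (EMP_NUM)   -- any non-null value must match an EMP_NUM
);

The UNIQUE constraint is simply one way to keep the connectivity at 1:1; nothing in the structure forces an EMP_NUM value to appear in EMP_SPOUSE, which is why participation remains optional in that direction as well.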

1.1.11 Associative (Composite) Entities

In the original ERM described by Chen, relationships do not contain attributes. You should recall from Chapter 3 that

the relational model generally requires the use of 1:M relationships. (Also, recall that the 1:1 relationship has its place,

but it should be used with caution and proper justification.) If M:N relationships are encountered, you must create a

bridge between the entities that display such relationships. The associative entity is used to implement an M:N

relationship between two or more entities. This associative entity (also known as a composite or bridge entity) is

composed of the primary keys of each of the entities to be connected. An example of such a bridge is shown in Figure

4.23. The Crow’s Foot notation does not identify the composite entity as such. Instead, the composite entity is

identified by the solid relationship line between the parent and child entities, thereby indicating the presence of a strong

(identifying) relationship.

FIGURE 1.23 Converting the M:N relationship into two 1:M relationships

Table name: STUDENT  Database name: Ch04_CollegeTry

Table name: ENROLL

Table name: CLASS

Note that the composite ENROLL entity in Figure 4.23 is existence-dependent on the other two entities; the

composition of the ENROLL entity is based on the primary keys of the entities that are connected by the composite

entity. The composite entity may also contain additional attributes that play no role in the connective process. For

example, although the entity must be composed of at least the STUDENT and CLASS primary keys, it may also

include such additional attributes as grades, absences, and other data uniquely identified by the student’s performance

in a specific class.

Finally, keep in mind that the ENROLL table’s key (CLASS_CODE and STU_NUM) is composed entirely of the

primary keys of the CLASS and STUDENT tables. Therefore, no null entries are possible in the ENROLL table’s key

attributes.
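A minimal SQL sketch of the ENROLL bridge follows. The table names, the key attributes CLASS_CODE and STU_NUM, and the idea of an extra grade attribute come from the discussion above; the remaining column names and data types are assumptions.

CREATE TABLE STUDENT (
    STU_NUM    INTEGER      PRIMARY KEY,
    STU_LNAME  VARCHAR(25)
);

CREATE TABLE CLASS (
    CLASS_CODE     VARCHAR(10)  PRIMARY KEY,
    CLASS_SECTION  VARCHAR(4)
);

CREATE TABLE ENROLL (
    CLASS_CODE    VARCHAR(10)  NOT NULL,
    STU_NUM       INTEGER      NOT NULL,
    ENROLL_GRADE  CHAR(1),                  -- an attribute that plays no role in the connective process
    PRIMARY KEY (CLASS_CODE, STU_NUM),      -- composed entirely of the parents' primary keys, so no nulls are possible
    FOREIGN KEY (CLASS_CODE) REFERENCES CLASS (CLASS_CODE),
    FOREIGN KEY (STU_NUM)    REFERENCES STUDENT (STU_NUM)
);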

Implementing the small database shown in Figure 4.23 requires that you define the relationships clearly. Specifically,

you must know the “1” and the “M” sides of each relationship, and you must know whether the relationships are

mandatory or optional. For example, note the following points:


A class may exist (at least at the start of registration) even though it contains no students. Therefore, if you

examine Figure 4.24, an optional symbol should appear on the STUDENT side of the M:N relationship

between STUDENT and CLASS.

You might argue that to be classified as a STUDENT, a person must be enrolled in at least one CLASS.

Therefore, CLASS is mandatory to STUDENT from a purely conceptual point of view. However, when a

student is admitted to college, that student has not (yet) signed up for any classes. Therefore, at least initially,

CLASS is optional to STUDENT. Note that the practical considerations in the data environment help dictate the

use of optionalities. If CLASS is not optional to STUDENT—from a database point of view—a class

assignment must be made when the student is admitted. But that’s not how the process actually works, and the

database design must reflect this. In short, the optionality reflects practice.

FIGURE 1.24 The M:N relationship between STUDENT and CLASS

Because the M:N relationship between STUDENT and CLASS is decomposed into two 1:M relationships

through ENROLL, the optionalities must be transferred to ENROLL. (See Figure 4.25.) In other words, it now

becomes possible for a class not to occur in ENROLL if no student has signed up for that class. Because a class

need not occur in ENROLL, the ENROLL entity becomes optional to CLASS. And because the ENROLL

entity is created before any students have signed up for a class, the ENROLL entity is also optional to

STUDENT, at least initially.

FIGURE 1.25 A composite entity in an ERD

As students begin to sign up for their classes, they will be entered into the ENROLL entity. Naturally, if a

student takes more than one class, that student will occur more than once in ENROLL. For example, note that in

the ENROLL table in Figure 4.23, STU_NUM = 321452 occurs three times. On the other hand, each

student occurs only once in the STUDENT entity. (Note that the STUDENT table in Figure 4.23 has only one

STU_NUM = 321452 entry.) Therefore, in Figure 4.25, the relationship between STUDENT and ENROLL is

shown to be 1:M, with the M on the ENROLL side.


As you can see in Figure 4.23, a class can occur more than once in the ENROLL table. For example,

CLASS_CODE = 10014 occurs twice. However, CLASS_CODE = 10014 occurs only once in the CLASS

table to reflect that the relationship between CLASS and ENROLL is 1:M. Note that in Figure 4.25, the M is

located on the ENROLL side, while the 1 is located on the CLASS side.

1.2 DEVELOPING AN ER DIAGRAM

The process of database design is an iterative rather than a linear or sequential process. The verb iterate means “to

do again or repeatedly.” An iterative process is, thus, one based on repetition of processes and procedures. Building

an ERD usually involves the following activities:

Create a detailed narrative of the organization’s description of operations.

Identify the business rules based on the description of operations.

Identify the main entities and relationships from the business rules.

Develop the initial ERD.

Identify the attributes and primary keys that adequately describe the entities.

Revise and review the ERD.

During the review process, it is likely that additional objects, attributes, and relationships will be uncovered. Therefore,

the basic ERM will be modified to incorporate the newly discovered ER components. Subsequently, another round of

reviews might yield additional components or clarification of the existing diagram. The process is repeated until the end

users and designers agree that the ERD is a fair representation of the organization’s activities and functions.

During the design process, the database designer does not depend simply on interviews to help define entities,

attributes, and relationships. A surprising amount of information can be gathered by examining the business forms and

reports that an organization uses in its daily operations.

To illustrate the use of the iterative process that ultimately yields a workable ERD, let’s start with an initial interview

with the Tiny College administrators. The interview process yields the following business rules:

1. Tiny College (TC) is divided into several schools: a school of business, a school of arts and sciences, a school

of education, and a school of applied sciences. Each school is administered by a dean who is a professor. Each

professor can be the dean of only one school, and a professor is not required to be the dean of any school.

Therefore, a 1:1 relationship exists between PROFESSOR and SCHOOL. Note that the cardinality can be

expressed by writing (1,1) next to the entity PROFESSOR and (0,1) next to the entity SCHOOL.

2. Each school comprises several departments. For example, the school of business has an accounting

department, a management/marketing department, an economics/finance department, and a computer

information systems department. Note again the cardinality rules: The smallest number of departments

operated by a school is one, and the largest number of departments is indeterminate (N). On the other hand,

each department belongs to only a single school; thus, the cardinality is expressed by (1,1). That is, the

minimum number of schools that a department belongs to is one, as is the maximum number. Figure 4.26

illustrates these first two business rules.


FIGURE 1.26 The first Tiny College ERD segment

Note

It is again appropriate to evaluate the reason for maintaining the 1:1 relationship between PROFESSOR and

SCHOOL in the PROFESSOR is dean of SCHOOL relationship. It is worth repeating that the existence of 1:1

relationships often indicates a misidentification of attributes as entities. In this case, the 1:1 relationship could easily

be eliminated by storing the dean’s attributes in the SCHOOL entity. This solution would also make it easier to

answer the queries, “Who is the dean?” and “What are that dean’s credentials?” The downside of this solution is that it

requires the duplication of data that are already stored in the PROFESSOR table, thus setting the stage for

anomalies. However, because each school is run by a single dean, the problem of data duplication is rather minor.

The selection of one approach over another often depends on information requirements, transaction speed,

and the database designer’s professional judgment. In short, do not use 1:1 relationships lightly, and make sure

that each 1:1 relationship within the database design is defensible.
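If the 1:1 relationship is kept, one possible implementation is to place a unique foreign key in SCHOOL, since every school must have a dean while not every professor is a dean. The column names and data types in this sketch are assumptions made for illustration.

CREATE TABLE PROFESSOR (
    EMP_NUM     INTEGER      PRIMARY KEY,
    PROF_LNAME  VARCHAR(25)
);

CREATE TABLE SCHOOL (
    SCHOOL_CODE  VARCHAR(10)  PRIMARY KEY,
    SCHOOL_NAME  VARCHAR(50),
    EMP_NUM      INTEGER      NOT NULL UNIQUE,   -- the dean: NOT NULL because each school must have one,
                                                 -- UNIQUE because a professor can be dean of only one school
    FOREIGN KEY (EMP_NUM) REFERENCES PROFESSOR (EMP_NUM)
);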

3. Each department may offer courses. For example, the management/marketing department offers courses such

as Introduction to Management, Principles of Marketing, and Production Management. The ERD segment for

this condition is shown in Figure 4.27. Note that this relationship is based on the way Tiny College operates.

If, for example, Tiny College had some departments that were classified as “research only,” those departments

would not offer courses; therefore, the COURSE entity would be optional to the DEPARTMENT entity.

4. The relationship between COURSE and CLASS was illustrated in Figure 4.9. Nevertheless, it is worth

repeating that a CLASS is a section of a COURSE. That is, a department may offer several sections (classes)

of the same database course. Each of those classes is taught by a professor at a given time in a given place.

In short, a 1:M relationship exists between COURSE and CLASS. However, because a course may exist in

Tiny College’s course catalog even when it is not offered as a class in a current class schedule, CLASS is

optional to COURSE. Therefore, the relationship between COURSE and CLASS looks like Figure 4.28.


FIGURE 1.27 The second Tiny College ERD segment

FIGURE 1.28 The third Tiny College ERD segment


5. Each department should have one or more professors assigned to it. One and only one of those professors

chairs the department, and no professor is required to accept the chair position. Therefore, DEPARTMENT

is optional to PROFESSOR in the “chairs” relationship. Those relationships are summarized in the ER segment

shown in Figure 4.29.

FIGURE 1.29 The fourth Tiny College ERD segment

6. Each professor may teach up to four classes; each class is a section of a course. A professor may also be on

a research contract and teach no classes at all. The ERD segment in Figure 4.30 depicts those conditions.

7. A student may enroll in several classes but takes each class only once during any given enrollment period. For

example, during the current enrollment period, a student may decide to take five classes—Statistics,

Accounting, English, Database, and History—but that student would not be enrolled in the same Statistics class

five times during the enrollment period! Each student may enroll in up to six classes, and each class may have

up to 35 students, thus creating an M:N relationship between STUDENT and CLASS. Because a CLASS can


FIGURE 1.30 The fifth Tiny College ERD segment

initially exist (at the start of the enrollment period) even though no students have enrolled in it, STUDENT is

optional to CLASS in the M:N relationship. This M:N relationship must be divided into two 1:M relationships

through the use of the ENROLL entity, shown in the ERD segment in Figure 4.31. But note that the optional

symbol is shown next to ENROLL. If a class exists but has no students enrolled in it, that class doesn’t occur

in the ENROLL table. Note also that the ENROLL entity is weak: it is existence-dependent, and its (composite)

PK is composed of the PKs of the STUDENT and CLASS entities. You can add the cardinalities (0,6) and

(0,35) next to the ENROLL entity to reflect the business rule constraints, as shown in Figure 4.31. (Visio

Professional does not automatically generate such cardinalities, but you can use a text box to accomplish

that task.)

FIGURE 1.31 The sixth Tiny College ERD segment

8. Each department has several (or many) students whose major is offered by that department. However, each

student has only a single major and is, therefore, associated with a single department. (See Figure 4.32.)

However, in the Tiny College environment, it is possible—at least for a while—for a student not to declare a

major field of study. Such a student would not be associated with a department; therefore, DEPARTMENT is

optional to STUDENT. It is worth repeating that the relationships between entities and the entities themselves

reflect the organization’s operating environment. That is, the business rules define the ERD components.

9. Each student has an advisor in his or her department; each advisor counsels several students. An advisor is also

a professor, but not all professors advise students. Therefore, STUDENT is optional to PROFESSOR in the

“PROFESSOR advises STUDENT” relationship. (See Figure 4.33.)

10. As you can see in Figure 4.34, the CLASS entity contains a ROOM_CODE attribute. Given the naming

conventions, it is clear that ROOM_CODE is an FK to another entity. Clearly, because a class is taught in a

room, it is reasonable to assume that the ROOM_CODE in CLASS is the FK to an entity named ROOM. In

turn, each room is located in a building. So the last Tiny College ERD is created by observing that a BUILDING


FIGURE 1.32 The seventh Tiny College ERD segment

FIGURE 1.33 The eighth Tiny College ERD segment


can contain many ROOMs, but each ROOM is found in a single BUILDING. In this ERD segment, it is clear

that some buildings do not contain (class) rooms. For example, a storage building might not contain any named

rooms at all.

FIGURE 1.34 The ninth Tiny College ERD segment

Using the preceding summary, you can identify the following entities:

SCHOOL
DEPARTMENT
PROFESSOR
BUILDING
COURSE
CLASS
STUDENT
ROOM
ENROLL (the associative entity between STUDENT and CLASS)


Once you have discovered the relevant entities, you can define the initial set of relationships among them. Next, you

describe the entity attributes. Identifying the attributes of the entities helps you to better understand the relationships

among entities. Table 4.4 summarizes the ERM’s components, and names the entities and their relations.

TABLE 1.4 Components of the ERM

ENTITY RELATIONSHIP CONNECTIVITY ENTITY

SCHOOL operates 1:M DEPARTMENT

DEPARTMENT has 1:M STUDENT

DEPARTMENT employs 1:M PROFESSOR

DEPARTMENT offers 1:M COURSE

COURSE generates 1:M CLASS

PROFESSOR is dean of 1:1 SCHOOL

PROFESSOR chairs 1:1 DEPARTMENT

PROFESSOR teaches 1:M CLASS

PROFESSOR advises 1:M STUDENT

STUDENT enrolls in M:N CLASS

BUILDING contains 1:M ROOM

ROOM is used for 1:M CLASS

Note: ENROLL is the composite entity that implements the M:N relationship “STUDENT enrolls in CLASS.”

You must also define the connectivity and cardinality for the just-discovered relations based on the business rules.

However, to avoid crowding the diagram, the cardinalities are not shown. Figure 4.35 shows the Crow’s Foot ERD for

Tiny College. Note that this is an implementation-ready model. Therefore it shows the ENROLL composite entity.

Figure 4.36 shows the conceptual UML class diagram for Tiny College. Note that this class diagram depicts the M:N

relationship between STUDENT and CLASS. Figure 4.37 shows the implementation-ready UML class diagram for

Tiny College (note that the ENROLL composite entity is shown in this class diagram).

1.3 DATABASE DESIGN CHALLENGES: CONFLICTING GOALS

Database designers often must make design compromises that are triggered by conflicting goals, such as adherence to

design standards (design elegance), processing speed, and information requirements.

Design standards. The database design must conform to design standards. Such standards have guided you in

developing logical structures that minimize data redundancies, thereby minimizing the likelihood that

destructive data anomalies will occur. You have also learned how standards prescribe avoiding nulls to the

greatest extent possible. In fact, you have learned that design standards govern the presentation of all

components within the database design. In short, design standards allow you to work with well-defined

components and to evaluate the interaction of those components with some precision. Without design

standards, it is nearly impossible to formulate a proper design process, to evaluate an existing design, or to

trace the likely logical impact of changes in design.

Processing speed. In many organizations, particularly those generating large numbers of transactions, high

processing speeds are often a top priority in database design. High processing speed means minimal access

time, which may be achieved by minimizing the number and complexity of logically desirable relationships. For

example, a “perfect” design might use a 1:1 relationship to avoid nulls, while a higher transaction-speed design

might combine the two tables to avoid the use of an additional relationship, using dummy entries to avoid the

nulls. If the focus is on data-retrieval speed, you might also be forced to include derived attributes in the design.

Information requirements. The quest for timely information might be the focus of database design. Complex

information requirements may dictate data transformations, and they may expand the number of entities and


FIGURE 1.35 The completed Tiny College ERD


attributes within the design. Therefore, the database may have to sacrifice some of its “clean” design structures

and/or some of its high transaction speed to ensure maximum information generation. For example, suppose


FIGURE 1.36 The conceptual UML class diagram for Tiny College

that a detailed sales report must be generated periodically. The sales report includes all invoice subtotals, taxes, and

totals; even the invoice lines include subtotals. If the sales report includes hundreds of thousands (or even millions)

of invoices, computing the totals, taxes, and subtotals is likely to take some time. If those computations had been

made and the results had been stored as derived attributes in the INVOICE and LINE tables at the time of the

transaction, the real-time transaction speed might have declined. But that loss of speed would only be noticeable

if there were many simultaneous transactions. The cost of a slight loss of transaction speed at the front end and the

addition of multiple derived attributes is likely to pay off when the sales reports are generated (not to mention the

fact that it will be simpler to generate the queries).
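The trade-off can be illustrated with two query sketches. The INVOICE and LINE table and column names used here are assumptions, since those tables are not defined in this chapter.

-- Without stored derived attributes, every run of the report must recompute the subtotals:
SELECT   INV_NUMBER, SUM(LINE_UNITS * LINE_PRICE) AS INV_SUBTOTAL
FROM     LINE
GROUP BY INV_NUMBER;

-- With derived attributes stored at transaction time, the report simply reads the stored values:
SELECT INV_NUMBER, INV_SUBTOTAL, INV_TAX, INV_TOTAL
FROM   INVOICE;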

A design that meets all logical requirements and design conventions is an important goal. However, if this perfect

design fails to meet the customer’s transaction speed and/or information requirements, the designer will not have done a

proper job from the end user’s point of view. Compromises are a fact of life in the real world of database design.

Even while focusing on the entities, attributes, relationships, and constraints, the designer should begin thinking about

end-user requirements such as performance, security, shared access, and data integrity. The designer must consider

processing requirements and verify that all update, retrieval, and deletion options are available. Finally, a design is of little

value unless the end product is capable of delivering all specified query and reporting requirements.


FIGURE 1.37 The implementation-ready UML class diagram for Tiny College

You are quite likely to discover that even the best design process produces an ERD that requires further changes

mandated by operational requirements. Such changes should not discourage you from using the process. ER modeling

is essential in the development of a sound design that is capable of meeting the demands of adjustment and growth.

Using ERDs yields perhaps the richest bonus of all: a thorough understanding of how an organization really functions.

There are occasional design and implementation problems that do not yield “clean” implementation solutions. To get a

sense of the design and implementation choices a database designer faces, let’s revisit the 1:1 recursive relationship

“EMPLOYEE is married to EMPLOYEE” first examined in Figure 4.18. Figure 4.38 shows three different ways of

implementing such a relationship.

Note that the EMPLOYEE_V1 table in Figure 4.38 is likely to yield data anomalies. For example, if Anne Jones

divorces Anton Shapiro, two records must be updated—by setting the respective EMP_SPOUSE values to null—to

properly reflect that change. If only one record is updated, inconsistent data occur. The problem becomes even worse if

several of the divorced employees then marry each other. In addition, that implementation also produces undesirable

nulls for employees who are not married to other employees in the company.

Another approach would be to create a new entity shown as MARRIED_V1 in a 1:M relationship with EMPLOYEE.

(See Figure 4.38.) This second implementation does eliminate the nulls for employees who are not married to

somebody working for the same company. (Such employees would not be entered in the MARRIED_V1 table.)

However, this approach still yields possible duplicate values. For example, the marriage between employees 345 and


FIGURE 1.38 Various implementations of the 1:1 recursive relationship

Table name: EMPLOYEE_V1

First implementation

Table name: EMPLOYEE Table name: MARRIED_V1

Database name: Ch04_PartCo

Second implementation

Table name: MARRIAGE Table name: MARPART Table name: EMPLOYEE

The relational diagram for the third implementation

Third implementation

347 may still appear twice, once as 345,347 and once as 347,345. (Since each of those permutations is unique the

first time it appears, the creation of a unique index will not solve the problem.)

As you can see, the first two implementations yield several problems:

Both solutions use synonyms. The EMPLOYEE_V1 table uses EMP_NUM and EMP_SPOUSE to refer to an

employee. The MARRIED_V1 table uses the same synonyms.

Both solutions are likely to produce inconsistent data. For example, it is possible to enter employee 345 as

married to employee 347 and to enter employee 348 as married to employee 345.

Both solutions allow data entries to show one employee married to several other employees. For example, it is

possible to have data pairs such as 345,347 and 348,347 and 349,347, none of which will violate entity

integrity requirements, because they are all unique.

A third approach would be to have two new entities, MARRIAGE and MARPART, in a 1:M relationship. MARPART

contains the EMP_NUM foreign key to EMPLOYEE. (See the relational diagram in Figure 4.38.) But even this

approach has issues. It requires the collection of additional data regarding the employees’ marriage—the marriage


date. If the business users do not need this data, then requiring them to collect it would be inappropriate. To ensure

that an employee occurs only once in any given marriage, you would have to create a unique index on the EMP_NUM

attribute in the MARPART table. Another potential problem with this solution is that the database implementation will

allow more than two employees to “participate” in the same marriage.
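A rough SQL sketch of this third implementation follows. The table names MARRIAGE and MARPART and the unique index on EMP_NUM come from the discussion above; the remaining column names and data types are assumptions.

-- Minimal EMPLOYEE table, included only so that the sketch is self-contained.
CREATE TABLE EMPLOYEE (
    EMP_NUM    INTEGER      PRIMARY KEY,
    EMP_LNAME  VARCHAR(25)
);

CREATE TABLE MARRIAGE (
    MAR_NUM   INTEGER  PRIMARY KEY,
    MAR_DATE  DATE                       -- the additional data this design forces you to collect
);

CREATE TABLE MARPART (
    MAR_NUM  INTEGER  NOT NULL,
    EMP_NUM  INTEGER  NOT NULL,
    PRIMARY KEY (MAR_NUM, EMP_NUM),
    FOREIGN KEY (MAR_NUM) REFERENCES MARRIAGE (MAR_NUM),
    FOREIGN KEY (EMP_NUM) REFERENCES EMPLOYEE (EMP_NUM)
);

-- The unique index mentioned in the text: an employee number may appear in MARPART only once.
CREATE UNIQUE INDEX MARPART_EMP_NDX ON MARPART (EMP_NUM);

Note that nothing in this structure limits a MAR_NUM value to two MARPART rows, which is the remaining weakness pointed out above.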

As you can see, a recursive 1:1 relationship yields many different solutions with varying degrees of effectiveness and

adherence to basic design principles. Any of the above solutions would likely involve the creation of program code to

help ensure the integrity and consistency of the data. In a later chapter, we will examine the creation of database

triggers that can do exactly that. Your job as a database designer is to use your professional judgment to yield a solution

that meets the requirements imposed by business rules, processing requirements, and basic design principles.

Finally, document, document, and document! Put all design activities in writing. Then review what you’ve written.

Documentation not only helps you stay on track during the design process, but also enables you (or those following

you) to pick up the design thread when the time comes to modify the design. Although the need for documentation

should be obvious, one of the most vexing problems in database and systems analysis work is that the “put it in writing”

rule is often not observed in all of the design and implementation stages. The development of organizational

documentation standards is a very important aspect of ensuring data compatibility and coherence.


2 Normalization of Database Tables

In this chapter, you will learn:

What normalization is and what role it plays in the database design process

About the normal forms 1NF, 2NF, 3NF, BCNF, and 4NF

How normal forms can be transformed from lower normal forms to higher normal forms

That normalization and ER modeling are used concurrently to produce a good database

design

That some situations require denormalization to generate information efficiently

Good database design must be matched to good table structures. In this chapter, you will

learn to evaluate and design good table structures to control data redundancies, thereby

avoiding data anomalies. The process that yields such desirable results is known as

normalization.

In order to recognize and appreciate the characteristics of a good table structure, it is useful

to examine a poor one. Therefore, the chapter begins by examining the characteristics of a

poor table structure and the problems it creates. You then learn how to correct a poor

table structure. This methodology will yield important dividends: you will know how to

design a good table structure and how to repair an existing poor one.

You will discover not only that data anomalies can be eliminated through normalization, but

also that a properly normalized set of table structures is actually less complicated to use

than an unnormalized set. In addition, you will learn that the normalized set of table

structures more faithfully reflects an organization’s real operations.



2.1 DATABASE TABLES AND NORMALIZATION

Having good relational database software is not enough to avoid the data redundancy discussed in Chapter 1, Database

Systems. If the database tables are treated as though they are files in a file system, the relational database management

system (RDBMS) never has a chance to demonstrate its superior data-handling capabilities.

The table is the basic building block of database design. Consequently, the table’s structure is of great interest. Ideally, the

database design process explored in Chapter 4, Entity Relationship (ER) Modeling, yields good table structures. Yet it is

possible to create poor table structures even in a good database design. So how do you recognize a poor table

structure, and how do you produce a good table? The answer to both questions involves normalization.

Normalization is a process for evaluating and correcting table structures to minimize data redundancies, thereby

reducing the likelihood of data anomalies. The normalization process involves assigning attributes to tables based on the

concept of determination you learned about in Chapter 3, The Relational Database Model.

Normalization works through a series of stages called normal forms. The first three stages are described as first normal

form (1NF), second normal form (2NF), and third normal form (3NF). From a structural point of view, 2NF is better than

1NF, and 3NF is better than 2NF. For most purposes in business database design, 3NF is as high as you need to go in

the normalization process. However, you will discover that properly designed 3NF structures also meet the

requirements of fourth normal form (4NF).

Although normalization is a very important database design ingredient, you should not assume that the highest level

of normalization is always the most desirable. Generally, the higher the normal form, the more relational join

operations are required to produce a specified output and the more resources are required by the database system to

respond to end-user queries. A successful design must also consider end-user demand for fast performance. Therefore,

you will occasionally be expected to denormalize some portions of a database design in order to meet performance

requirements. Denormalization produces a lower normal form; that is, a 3NF will be converted to a 2NF through

denormalization. However, the price you pay for increased performance through denormalization is greater data

redundancy.

Note

Although the word table is used throughout this chapter, formally, normalization is concerned with relations. In Chapter

3 you learned that the terms table and relation are frequently used interchangeably. In fact, you can say that a table is

the implementation view of a logical relation that meets some specific conditions. (See Table 3.1.) However, being

more rigorous, the mathematical relation does not allow duplicate tuples, whereas duplicate tuples could exist in

tables (see Section 6.5). Also, in normalization terminology, any attribute that is at least part of a key is known as a

prime attribute instead of the more common term key attribute, which was introduced earlier. Conversely, a

nonprime attribute, or a nonkey attribute, is not part of any candidate key.

2.2 THE NEED FOR NORMALIZATION

Normalization is typically used in conjunction with the entity relationship modeling that you learned in the previous

chapters. There are two common situations in which database designers use normalization. When designing a new

database structure based on the business requirements of the end users, the database designer will construct a data

model using a technique such as Crow’s Foot notation ERDs. After the initial design is complete, the designer can use

normalization to analyze the relationships that exist among the attributes within each entity, to determine if the

structure can be improved through normalization. Alternatively, database designers are often asked to modify existing

data structures that can be in the form of flat files, spreadsheets, or older database structures. Again, through an

analysis of the relationships among the attributes or fields in the data structure, the database designer can use the

normalization process to improve the existing data structure to create an appropriate database design. Whether

designing a new database structure or modifying an existing one, the normalization process is the same.


To get a better idea of the normalization process, consider the simplified database activities of a construction company

that manages several building projects. Each project has its own project number, name, employees assigned to it, and

so on. Each employee has an employee number, name, and job classification, such as engineer or computer

technician.

The company charges its clients by billing the hours spent on each contract. The hourly billing rate is dependent on the

employee’s position. For example, one hour of computer technician time is billed at a different rate than one hour of

engineer time. Periodically, a report is generated that contains the information displayed in Table 2.1.

The total charge in Table 2.1 is a derived attribute and, at this point, is not stored in the table.

The easiest short-term way to generate the required report might seem to be a table whose contents correspond to

the reporting requirements. (See Figure 2.1.)

FIGURE 2.1 Tabular representation of the report format

Table name: RPT_FORMAT Database name: Ch06_ConstructCo

Online Content

The databases used to illustrate the material in this chapter are found in the Premium Website for this book.

Note that the data in Figure 2.1 reflect the assignment of employees to projects. Apparently, an employee can be

assigned to more than one project. For example, Darlene Smithson (EMP_NUM = 112) has been assigned to two

projects: Amber Wave and Starflight. Given the structure of the dataset, each project includes only a single occurrence of

any one employee. Therefore, knowing the PROJ_NUM and EMP_NUM value will let you find the job classification and

its hourly charge. In addition, you will know the total number of hours each employee worked on each project. (The

total charge—a derived attribute whose value can be computed by multiplying the hours billed and the charge per hour—

has not been included in Figure 2.1. No structural harm is done if this derived attribute is included.)


Unfortunately, the structure of the dataset in Figure 2.1 does not conform to the requirements discussed in Chapter 3, nor

does it handle data very well. Consider the following deficiencies:

1. The project number (PROJ_NUM) is apparently intended to be a primary key or at least a part of a PK, but it

contains nulls. (Given the preceding discussion, you know that PROJ_NUM + EMP_NUM will define each row.)

2. The table entries invite data inconsistencies. For example, the JOB_CLASS value “Elect. Engineer” might be

entered as “Elect.Eng.” in some cases, “El. Eng.” in others, and “EE” in still others.

3. The table displays data redundancies. Those data redundancies yield the following anomalies:

a. Update anomalies. Modifying the JOB_CLASS for employee number 105 requires (potentially) many

alterations, one for each EMP_NUM = 105.

b. Insertion anomalies. Just to complete a row definition, a new employee must be assigned to a project. If

the employee is not yet assigned, a phantom project must be created to complete the employee data entry.

c. Deletion anomalies. Suppose that only one employee is associated with a given project. If that employee

leaves the company and the employee data are deleted, the project information will also be deleted. To

prevent the loss of the project information, a fictitious employee must be created just to save the project

information.

In spite of those structural deficiencies, the table structure appears to work; the report is generated with ease.

Unfortunately, the report might yield varying results depending on what data anomaly has occurred. For example, if you

want to print a report to show the total “hours worked” value by the job classification “Database Designer,” that report

will not include data for “DB Design” and “Database Design” data entries. Such reporting anomalies cause a multitude

of problems for managers—and cannot be fixed through applications programming.
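The effect is easy to see with a simple aggregate query against the table in Figure 2.1; the table and column names come from that figure, and the query itself is only an illustration.

SELECT   JOB_CLASS, SUM(HOURS) AS TOTAL_HOURS
FROM     RPT_FORMAT
GROUP BY JOB_CLASS;
-- “Database Designer,” “DB Design,” and “Database Design” each form their own group,
-- so no single TOTAL_HOURS row reflects all of the database designers' hours.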

Even if very careful data-entry auditing can eliminate most of the reporting problems (at a high cost), it is easy to

demonstrate that even a simple data entry becomes inefficient. Given the existence of update anomalies, suppose

Darlene M. Smithson is assigned to work on the Evergreen project. The data-entry clerk must update the PROJECT

file with the entry:

15 Evergreen 112 Darlene M Smithson DSS Analyst $45.95 0.0

to match the attributes PROJ_NUM, PROJ_NAME, EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR, and

HOURS. (When Ms. Smithson has just been assigned to the project, she has not yet worked, so the total number of

hours worked is 0.0.)

Note

Remember that the naming convention makes it easy to see what each attribute stands for and what its likely

origin is. For example, PROJ_NAME uses the prefix PROJ to indicate that the attribute is associated with the

PROJECT table, while the NAME component is self-documenting, too. However, keep in mind that name length is

also an issue, especially in the prefix designation. For that reason, the prefix CHG was used rather than

CHARGE. (Given the database’s context, it is not likely that that prefix will be misunderstood.)

Each time another employee is assigned to a project, some data entries (such as PROJ_NAME, EMP_NAME, and

CHG_HOUR) are unnecessarily repeated. Imagine the data-entry chore when 200 or 300 table entries must be made!

Note that the entry of the employee number should be sufficient to identify Darlene M. Smithson, her job description,

and her hourly charge. Because there is only one person identified by the number 112, that person’s characteristics

(name, job classification, and so on) should not have to be typed in each time the main file is updated. Unfortunately, the

structure displayed in Figure 2.1 does not make allowances for that possibility.


The data redundancy evident in Figure 2.1 leads to wasted disk space. What’s more, data redundancy produces data

anomalies. For example, suppose the data-entry clerk had entered the data as:

15 Evergeen 112 Darla Smithson DCS Analyst $45.95 0.0

At first glance, the data entry appears to be correct. But is Evergeen the same project as Evergreen? And is DCS

Analyst supposed to be DSS Analyst? Is Darla Smithson the same person as Darlene M. Smithson? Such confusion is

a data integrity problem that was caused because the data entry failed to conform to the rule that all copies of

redundant data must be identical.

The possibility of introducing data integrity problems caused by data redundancy must be considered when a database

is designed. The relational database environment is especially well suited to help the designer overcome those

problems.

2.3 THE NORMALIZATION PROCESS

In this section, you will learn how to use normalization to produce a set of normalized tables to store the data that will be

used to generate the required information. The objective of normalization is to ensure that each table conforms to the

concept of well-formed relations—that is, tables that have the following characteristics:

Each table represents a single subject. For example, a course table will contain only data that directly pertain to

courses. Similarly, a student table will contain only student data.

No data item will be unnecessarily stored in more than one table (in short, tables have minimum controlled

redundancy). The reason for this requirement is to ensure that the data are updated in only one place.

All nonprime attributes in a table are dependent on the primary key—the entire primary key and nothing but

the primary key. The reason for this requirement is to ensure that the data are uniquely identifiable by a primary

key value.

Each table is void of insertion, update, or deletion anomalies. This is to ensure the integrity and consistency of

the data.

To accomplish the objective, the normalization process takes you through the steps that lead to successively higher

normal forms. The most common normal forms and their basic characteristics are listed in Table 2.2. You will learn the

details of these normal forms in the indicated sections.

TABLE 2.2 Normal Forms

First normal form (1NF): Table format, no repeating groups, and PK identified (Section 2.3.1)

Second normal form (2NF): 1NF and no partial dependencies (Section 2.3.2)

Third normal form (3NF): 2NF and no transitive dependencies (Section 2.3.3)

Boyce-Codd normal form (BCNF): Every determinant is a candidate key (special case of 3NF) (Section 2.6.1)

Fourth normal form (4NF): 3NF and no independent multivalued dependencies (Section 2.6.2)

The concept of keys is central to the discussion of normalization. Recall from Chapter 3 that a candidate key is a

minimal (irreducible) superkey. The primary key is the candidate key that is selected to be the primary means used to

identify the rows in the table. Although normalization is typically presented from the perspective of candidate keys, for the

sake of simplicity while initially explaining the normalization process, we will make the assumption that for each table

there is only one candidate key, and therefore, that candidate key is the primary key.

From the data modeler’s point of view, the objective of normalization is to ensure that all tables are at least in third

normal form (3NF). Even higher-level normal forms exist. However, normal forms such as the fifth normal form (5NF)


and domain-key normal form (DKNF) are not likely to be encountered in a business environment and are mainly of

theoretical interest. More often than not, such higher normal forms increase joins (slowing performance) without

adding any value in the elimination of data redundancy. Some very specialized applications, such as statistical research,

might require normalization beyond the 4NF, but those applications fall outside the scope of most business operations.

Because this book focuses on practical applications of database techniques, the higher-level normal forms are not

covered.

Functional Dependence

Before outlining the normalization process, it’s a good idea to review the concepts of determination and functional

dependence that were covered in detail in Chapter 3. Table 2.3 summarizes the main concepts.

TABLE 2.3 Functional Dependence Concepts

Functional dependence: The attribute B is functionally dependent on the attribute A if each value of A determines one and only one value of B. Example: PROJ_NUM → PROJ_NAME (read as “PROJ_NUM functionally determines PROJ_NAME”). In this case, the attribute PROJ_NUM is known as the “determinant” attribute, and the attribute PROJ_NAME is known as the “dependent” attribute.

Functional dependence (generalized definition): Attribute A determines attribute B (that is, B is functionally dependent on A) if all of the rows in the table that agree in value for attribute A also agree in value for attribute B.

Fully functional dependence (composite key): If attribute B is functionally dependent on a composite key A but not on any subset of that composite key, the attribute B is fully functionally dependent on A.

It is crucial to understand these concepts because they are used to derive the set of functional dependencies for a given

relation. The normalization process works one relation at a time, identifying the dependencies on that relation and

normalizing the relation. As you will see in the following sections, normalization starts by identifying the dependencies of a

given relation and progressively breaking up the relation (table) into a set of new relations (tables) based on the

identified dependencies.

Two types of functional dependencies that are of special interest in normalization are partial dependencies and

transitive dependencies. A partial dependency exists when there is a functional dependence in which the

determinant is only part of the primary key (remember we are assuming there is only one candidate key). For example, if

(A, B) → (C, D), B → C, and (A, B) is the primary key, then the functional dependence B → C is a partial dependency

because only part of the primary key (B) is needed to determine the value of C. Partial dependencies tend to be rather

straightforward and easy to identify.

A transitive dependency exists when there are functional dependencies such that X → Y, Y → Z, and X is the primary key. In that case, the dependency X → Z is a transitive dependency because X determines the value of Z via Y. Unlike partial dependencies, transitive dependencies are more difficult to identify among a set of data. Fortunately, there is an easier way to identify transitive dependencies. A transitive dependency will occur only when a functional dependence exists among nonprime attributes. In the previous example, the actual transitive dependency is X → Z. However, the dependency Y → Z signals that a transitive dependency exists. Hence, throughout the discussion of the

normalization process, the existence of a functional dependence among nonprime attributes will be considered a sign

of a transitive dependency. To address the problems related to transitive dependencies, changes to the table structure

are made based on the functional dependence that signals the transitive dependency’s existence. Therefore, to simplify

the description of normalization, from this point forward we will refer to the signaling dependency as the transitive

dependency.


2.3.1 Conversion to First Normal Form

Because the relational model views data as part of a table or a collection of tables in which all key values must be

identified, the data depicted in Figure 2.1 might not be stored as shown. Note that Figure 2.1 contains what is known

as repeating groups. A repeating group derives its name from the fact that a group of multiple entries of the same

type can exist for any single key attribute occurrence. In Figure 2.1, note that each single project number

(PROJ_NUM) occurrence can reference a group of related data entries. For example, the Evergreen project

(PROJ_NUM = 15) shows five entries at this point—and those entries are related because they each share the

PROJ_NUM = 15 characteristic. Each time a new record is entered for the Evergreen project, the number of entries

in the group grows by one.

A relational table must not contain repeating groups. The existence of repeating groups provides evidence that the

RPT_FORMAT table in Figure 2.1 fails to meet even the lowest normal form requirements, thus reflecting data

redundancies.

Normalizing the table structure will reduce the data redundancies. If repeating groups do exist, they must be eliminated by

making sure that each row defines a single entity. In addition, the dependencies must be identified to diagnose the

normal form. Identification of the normal form will let you know where you are in the normalization process. The

normalization process starts with a simple three-step procedure.

Step 1: Eliminate the Repeating Groups

Start by presenting the data in a tabular format, where each cell has a single value and there are no repeating groups. To

eliminate the repeating groups, eliminate the nulls by making sure that each repeating group attribute contains an

appropriate data value. That change converts the table in Figure 2.1 to 1NF in Figure 2.2.

FIGURE 2.2 A table in first normal form

Table name: DATA_ORG_1NF  Database name: Ch06_ConstructCo

Step 2: Identify the Primary Key

The layout in Figure 2.2 represents more than a mere cosmetic change. Even a casual observer will note that

PROJ_NUM is not an adequate primary key because the project number does not uniquely identify all of the remaining

entity (row) attributes. For example, the PROJ_NUM value 15 can identify any one of five employees. To maintain a


proper primary key that will uniquely identify any attribute value, the new key must be composed of a combination of

PROJ_NUM and EMP_NUM. For example, using the data shown in Figure 2.2, if you know that PROJ_NUM = 15 and

EMP_NUM = 103, the entries for the attributes PROJ_NAME, EMP_NAME, JOB_CLASS, CHG_HOUR, and HOURS

must be Evergreen, June E. Arbough, Elect. Engineer, $84.50, and 23.8, respectively.
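In SQL terms, the composite key means that supplying both values pins down exactly one row of the 1NF table (the table name DATA_ORG_1NF comes from Figure 2.2; the query is illustrative only).

SELECT PROJ_NAME, EMP_NAME, JOB_CLASS, CHG_HOUR, HOURS
FROM   DATA_ORG_1NF
WHERE  PROJ_NUM = 15
  AND  EMP_NUM  = 103;
-- Returns the single row: Evergreen, June E. Arbough, Elect. Engineer, 84.50, 23.8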

Step 3: Identify All Dependencies

The identification of the PK in Step 2 means that you have already identified the following dependency:

PROJ_NUM, EMP_NUM → PROJ_NAME, EMP_NAME, JOB_CLASS, CHG_HOUR, HOURS

That is, the PROJ_NAME, EMP_NAME, JOB_CLASS, CHG_HOUR, and HOURS values are all dependent on—that is,

they are determined by—the combination of PROJ_NUM and EMP_NUM. There are additional dependencies. For

example, the project number identifies (determines) the project name. In other words, the project name is dependent on

the project number. You can write that dependency as:

PROJ_NUM → PROJ_NAME

Also, if you know an employee number, you also know that employee’s name, that employee’s job classification, and that

employee’s charge per hour. Therefore, you can identify the dependency shown next:

EMP_NUM → EMP_NAME, JOB_CLASS, CHG_HOUR

However, given the previous dependency components, you can see that knowing the job classification means knowing the

charge per hour for that job classification. In other words, you can identify one last dependency:

JOB_CLASS → CHG_HOUR

This dependency exists between two nonprime attributes; therefore it is a signal that a transitive dependency exists,

and we will refer to it as a transitive dependency. The dependencies you have just examined can also be depicted with

the help of the diagram shown in Figure 6.3. Because such a diagram depicts all dependencies found within a given

table structure, it is known as a dependency diagram. Dependency diagrams are very helpful in getting a bird’s-eye

view of all of the relationships among a table’s attributes, and their use makes it less likely that you will overlook an

important dependency.

As you examine Figure 2.3, note the following dependency diagram features:

1. The primary key attributes are bold, underlined, and shaded in a different color.

2. The arrows above the attributes indicate all desirable dependencies, that is, dependencies that are based on the

primary key. In this case, note that the entity’s attributes are dependent on the combination of PROJ_NUM

and EMP_NUM.

3. The arrows below the dependency diagram indicate less desirable dependencies. Two types of such

dependencies exist:

a. Partial dependencies. You need to know only the PROJ_NUM to determine the PROJ_NAME; that is, the

PROJ_NAME is dependent on only part of the primary key. And you need to know only the EMP_NUM

to find the EMP_NAME, the JOB_CLASS, and the CHG_HOUR. A dependency based on only a part of

a composite primary key is a partial dependency.

b. Transitive dependencies. Note that CHG_HOUR is dependent on JOB_CLASS. Because neither

CHG_HOUR nor JOB_CLASS is a prime attribute—that is, neither attribute is at least part of a key—the

condition is a transitive dependency. In other words, a transitive dependency is a dependency of one

nonprime attribute on another nonprime attribute. The problem with transitive dependencies is that they

still yield data anomalies.


FIGURE 2.3 First normal form (1NF) dependency diagram

(The diagram shows the attributes PROJ_NUM, PROJ_NAME, EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR, and HOURS, with arrows marking the partial and transitive dependencies listed below.)

1NF (PROJ_NUM, EMP_NUM, PROJ_NAME, EMP_NAME, JOB_CLASS, CHG_HOUR, HOURS)

PARTIAL DEPENDENCIES: (PROJ_NUM → PROJ_NAME) (EMP_NUM → EMP_NAME, JOB_CLASS, CHG_HOUR)

TRANSITIVE DEPENDENCY: (JOB_CLASS → CHG_HOUR)

Note that Figure 2.3 includes the relational schema for the table in 1NF and a textual notation for each identified

dependency.

Note

The term first normal form (1NF) describes the tabular format in which:

All of the key attributes are defined.

There are no repeating groups in the table. In other words, each row/column intersection contains one and

only one value, not a set of values.

All attributes are dependent on the primary key.

All relational tables satisfy the 1NF requirements. The problem with the 1NF table structure shown in Figure 2.3 is that it

contains partial dependencies—that is, dependencies based on only a part of the primary key.

While partial dependencies are sometimes used for performance reasons, they should be used with caution. (If the

information requirements seem to dictate the use of partial dependencies, it is time to evaluate the need for a data

warehouse design, discussed in Chapter 13, Business Intelligence and Data Warehouses.) Such caution is warranted

because a table that contains partial dependencies is still subject to data redundancies, and therefore, to various

anomalies. The data redundancies occur because every row entry requires duplication of data. For example, if Alice

K. Johnson submits her work log, then the user would have to make multiple entries during the course of a day. For

each entry, the EMP_NAME, JOB_CLASS, and CHG_HOUR must be entered each time, even though the attribute

values are identical for each row entered. Such duplication of effort is very inefficient. What’s more, the duplication of

effort helps create data anomalies; nothing prevents the user from typing slightly different versions of the employee

name, the position, or the hourly pay. For instance, the employee name for EMP_NUM = 102 might be entered as

Dave Senior or D. Senior. The project name might also be entered correctly as Evergreen or misspelled as Evergeen.

Such data anomalies violate the relational database’s integrity and consistency rules.


2.3.2 Conversion to Second Normal Form

Converting to 2NF is done only when the 1NF has a composite primary key. If the 1NF has a single-attribute primary

key, then the table is automatically in 2NF. The 1NF-to-2NF conversion is simple. Starting with the 1NF format

displayed in Figure 2.3, you do the following:

Step 1: Make New Tables to Eliminate Partial Dependencies

For each component of the primary key that acts as a determinant in a partial dependency, create a new table with a

copy of that component as the primary key. While these components are placed in the new tables, it is important that they also remain in the original table, because they will serve as the foreign keys for the relationships that are needed to relate these new tables to the original

table. For the construction of our revised dependency diagram, write each key component on a separate line; then

write the original (composite) key on the last line. For example:

PROJ_NUM

EMP_NUM

PROJ_NUM EMP_NUM

Each component will become the key in a new table. In other words, the original table is now divided into three tables

(PROJECT, EMPLOYEE, and ASSIGNMENT).

Step 2: Reassign Corresponding Dependent Attributes

Use Figure 2.3 to determine those attributes that are dependent in the partial dependencies. The dependencies for the

original key components are found by examining the arrows below the dependency diagram shown in Figure 2.3. The

attributes that are dependent in a partial dependency are removed from the original table and placed in the new table

with its determinant. Any attributes that are not dependent in a partial dependency will remain in the original table. In

other words, the three tables that result from the conversion to 2NF are given appropriate names (PROJECT,

EMPLOYEE, and ASSIGNMENT) and are described by the following relational schemas:

PROJECT (PROJ_NUM, PROJ_NAME)

EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR)

ASSIGNMENT (PROJ_NUM, EMP_NUM, ASSIGN_HOURS)

Because the number of hours spent on each project by each employee is dependent on both PROJ_NUM and

EMP_NUM in the ASSIGNMENT table, you leave those hours in the ASSIGNMENT table as ASSIGN_HOURS.

Notice that the ASSIGNMENT table contains a composite primary key composed of the attributes PROJ_NUM and

EMP_NUM. Notice that by leaving the determinants in the original table as well as making them the primary keys of the

new tables, primary key/foreign key relationships have been created. For example, in the EMPLOYEE table,

EMP_NUM is the primary key. In the ASSIGNMENT table, EMP_NUM is part of the composite primary key

(PROJ_NUM, EMP_NUM) and is a foreign key relating the EMPLOYEE table to the ASSIGNMENT table.
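To make these primary key/foreign key relationships concrete, the following is a minimal SQL sketch of the three 2NF tables. Only the attribute names and keys come from the discussion above; the data types and column sizes are illustrative assumptions.

-- 2NF result: three tables linked through PROJ_NUM and EMP_NUM.
CREATE TABLE PROJECT (
    PROJ_NUM   CHAR(3)     NOT NULL,
    PROJ_NAME  VARCHAR(50) NOT NULL,
    PRIMARY KEY (PROJ_NUM)
);

CREATE TABLE EMPLOYEE (
    EMP_NUM    CHAR(3)      NOT NULL,
    EMP_NAME   VARCHAR(50)  NOT NULL,
    JOB_CLASS  VARCHAR(30),
    CHG_HOUR   DECIMAL(7,2),
    PRIMARY KEY (EMP_NUM)
);

-- The composite primary key repeats the determinants left in the original table;
-- each component is also a foreign key to its new parent table.
CREATE TABLE ASSIGNMENT (
    PROJ_NUM     CHAR(3) NOT NULL,
    EMP_NUM      CHAR(3) NOT NULL,
    ASSIGN_HOURS DECIMAL(5,1),
    PRIMARY KEY (PROJ_NUM, EMP_NUM),
    FOREIGN KEY (PROJ_NUM) REFERENCES PROJECT (PROJ_NUM),
    FOREIGN KEY (EMP_NUM)  REFERENCES EMPLOYEE (EMP_NUM)
);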

The results of Steps 1 and 2 are displayed in Figure 2.4. At this point, most of the anomalies discussed earlier have

been eliminated. For example, if you now want to add, change, or delete a PROJECT record, you need to go only to

the PROJECT table and make the change to only one row.

Because a partial dependency can exist only when a table’s primary key is composed of several attributes, a table

whose primary key consists of only a single attribute is automatically in 2NF once it is in 1NF.

Figure 2.4 still shows a transitive dependency, which can generate anomalies. For example, if the charge per hour

changes for a job classification held by many employees, that change must be made for each of those employees. If you forget to update some of the employee records that are affected by the charge per hour change, different employees with the same job description will generate different hourly charges.

FIGURE 2.4 Second normal form (2NF) conversion results
Table name: PROJECT
PROJECT (PROJ_NUM, PROJ_NAME)
Table name: EMPLOYEE
EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR)    (transitive dependency: JOB_CLASS → CHG_HOUR)
Table name: ASSIGNMENT
ASSIGNMENT (PROJ_NUM, EMP_NUM, ASSIGN_HOURS)

Note

A table is in second normal form (2NF) when:

It is in 1NF.

and

It includes no partial dependencies; that is, no attribute is dependent on only a portion of the primary key.

Note that it is still possible for a table in 2NF to exhibit transitive dependency; that is, the primary key may

rely on one or more nonprime attributes to functionally determine other nonprime attributes, as is indicated by a

functional dependence among the nonprime attributes.

2.3.3 Conversion to Third Normal Form

The data anomalies created by the database organization shown in Figure 2.4 are easily eliminated by completing the

following two steps:

Step 1: Make New Tables to Eliminate Transitive Dependencies

For every transitive dependency, write a copy of its determinant as a primary key for a new table. A determinant is any

attribute whose value determines other values within a row. If you have three different transitive dependencies, you will

have three different determinants. As with the conversion to 2NF, it is important that the determinant remain in the

original table to serve as a foreign key. Figure 2.4 shows only one table that contains a transitive dependency.


Therefore, write the determinant for this transitive dependency as:

JOB_CLASS

Step 2: Reassign Corresponding Dependent Attributes

Using Figure 2.4, identify the attributes that are dependent on each determinant identified in Step 1. Place the

dependent attributes in the new tables with their determinants and remove them from their original tables. In this

example, eliminate CHG_HOUR from the EMPLOYEE table shown in Figure 2.4 to leave the EMPLOYEE table

dependency definition as:

EMP_NUM → EMP_NAME, JOB_CLASS

Draw a new dependency diagram to show all of the tables you have defined in Steps 1 and 2. Name the table to reflect its

contents and function. In this case, JOB seems appropriate. Check all of the tables to make sure that each table has a

determinant and that no table contains inappropriate dependencies. When you have completed these steps, you will see

the results in Figure 2.5.

FIGURE 2.5 Third normal form (3NF) conversion results
Table name: PROJECT
PROJECT (PROJ_NUM, PROJ_NAME)
Table name: EMPLOYEE
EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS)
Table name: JOB
JOB (JOB_CLASS, CHG_HOUR)
Table name: ASSIGNMENT
ASSIGNMENT (PROJ_NUM, EMP_NUM, ASSIGN_HOURS)

In other words, after the 3NF conversion has been completed, your database will contain four tables:

PROJECT (PROJ_NUM, PROJ_NAME)

EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS)

JOB (JOB_CLASS, CHG_HOUR)

ASSIGNMENT (PROJ_NUM, EMP_NUM, ASSIGN_HOURS)

Note that this conversion has eliminated the original EMPLOYEE table’s transitive dependency; the tables are now said to

be in third normal form (3NF).
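Expressed as a rough SQL sketch (data types assumed, as before), the 3NF conversion moves the charge per hour into its own JOB table and leaves JOB_CLASS in EMPLOYEE as a foreign key; PROJECT and ASSIGNMENT are unchanged from the 2NF sketch.

CREATE TABLE JOB (
    JOB_CLASS VARCHAR(30)  NOT NULL,
    CHG_HOUR  DECIMAL(7,2) NOT NULL,
    PRIMARY KEY (JOB_CLASS)
);

-- The transitive dependency JOB_CLASS -> CHG_HOUR is gone: CHG_HOUR now lives
-- only in JOB, and EMPLOYEE simply references the job classification.
CREATE TABLE EMPLOYEE (
    EMP_NUM   CHAR(3)     NOT NULL,
    EMP_NAME  VARCHAR(50) NOT NULL,
    JOB_CLASS VARCHAR(30),
    PRIMARY KEY (EMP_NUM),
    FOREIGN KEY (JOB_CLASS) REFERENCES JOB (JOB_CLASS)
);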


Note

A table is in third normal form (3NF) when:

It is in 2NF.

and

It contains no transitive dependencies.

It is interesting to note the similarities between resolving 2NF and 3NF problems. To convert a table from 1NF to 2NF, it

is necessary to remove the partial dependencies. To convert a table from 2NF to 3NF, it is necessary to remove the

transitive dependencies. No matter whether the “problem” dependency is a partial dependency or a transitive

dependency, the solution is the same. Create a new table for each problem dependency. The determinant of the

problem dependency remains in the original table and is placed as the primary key of the new table. The dependents of

the problem dependency are removed from the original table and placed as nonprime attributes in the new table.

Be aware, however, that while the technique is the same, it is imperative that 2NF be achieved before moving on to

3NF; be certain to resolve the partial dependencies before resolving the transitive dependencies. Recall, however, the

assumption that was made at the beginning of the discussion of the normalization process—that each table has only

one candidate key, which is the primary key. If a table has multiple candidate keys, then the overall process remains the

same, but there are additional considerations.

For example, if a table has multiple candidate keys and one of those candidate keys is a composite key, the table can

have partial dependencies based on this composite candidate key, even when the primary key chosen is a single

attribute. In those cases, following the process described above, those dependencies would be perceived as transitive

dependencies and would not be resolved until 3NF. The simplified process described above will allow the designer to

achieve the correct result, but through practice, you should recognize all candidate keys and their dependencies as

such, and resolve them appropriately. The existence of multiple candidate keys can also influence the identification of

transitive dependencies. Previously, a transitive dependency was defined to exist when one nonprime attribute

determined another nonprime attribute. In the presence of multiple candidate keys, the definition of a nonprime

attribute as an attribute that is not a part of any candidate key is critical. If the determinant of a functional dependence

is not the primary key but is a part of another candidate key, then it is not a nonprime attribute and does not signal

the presence of a transitive dependency.

2.4 IMPROVING THE DESIGN

The table structures are cleaned up to eliminate the troublesome partial and transitive dependencies. You can now

focus on improving the database’s ability to provide information and on enhancing its operational characteristics. In the

next few paragraphs, you will learn about the various types of issues you need to address to produce a good

normalized set of tables. Please note that for space issues, each section presents just one example—the designer must

apply the principle to all remaining tables in the design. Remember that normalization cannot, by itself, be relied on to

make good designs. Instead, normalization is valuable because its use helps eliminate data redundancies.

Evaluate PK Assignments

Each time a new employee is entered into the EMPLOYEE table, a JOB_CLASS value must be entered. Unfortunately, it

is too easy to make data-entry errors that lead to referential integrity violations. For example, entering DB Designer

instead of Database Designer for the JOB_CLASS attribute in the EMPLOYEE table will trigger such a violation.

Therefore, it would be better to add a JOB_CODE attribute to create a unique identifier. The addition of a JOB_CODE

attribute produces the dependency:

JOB_CODE → JOB_CLASS, CHG_HOUR


If you assume that the JOB_CODE is a proper primary key, this new attribute does produce the dependency:

JOB_CLASS → CHG_HOUR

However, this dependency is not a transitive dependency because the determinant is a candidate key. Further, the

presence of JOB_CODE greatly decreases the likelihood of referential integrity violations. Note that the new JOB table

now has two candidate keys—JOB_CODE and JOB_CLASS. In this case, JOB_CODE is the chosen primary key as

well as a surrogate key. A surrogate key, as you should recall, is an artificial PK introduced by the designer with the

purpose of simplifying the assignment of primary keys to tables. Surrogate keys are usually numeric, they are often

automatically generated by the DBMS, they are free of semantic content (they have no special meaning), and they are

usually hidden from the end users.

Evaluate Naming Conventions

It is best to adhere to the naming conventions outlined in Chapter 2, Data Models. Therefore, CHG_HOUR will be

changed to JOB_CHG_HOUR to indicate its association with the JOB table. In addition, the attribute name JOB_CLASS

does not quite describe entries such as Systems Analyst, Database Designer, and so on; the label JOB_DESCRIPTION fits

the entries better. Also, you might have noticed that HOURS was changed to ASSIGN_HOURS in the conversion from

1NF to 2NF. That change lets you associate the hours worked with the ASSIGNMENT table.

Refine Attribute Atomicity

It is generally good practice to pay attention to the atomicity requirement. An atomic attribute is one that cannot be

further subdivided. Such an attribute is said to display atomicity. Clearly, the use of the EMP_NAME in the

EMPLOYEE table is not atomic because EMP_NAME can be decomposed into a last name, a first name, and an initial. By

improving the degree of atomicity, you also gain querying flexibility. For example, if you use EMP_LNAME,

EMP_FNAME, and EMP_INITIAL, you can easily generate phone lists by sorting last names, first names, and initials.

Such a task would be very difficult if the name components were within a single attribute. In general, designers prefer to

use simple, single-valued attributes as indicated by the business rules and processing requirements.
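For example, once the name is stored atomically, the phone list described above reduces to a simple sorted query. This is only a sketch: EMP_PHONE is a hypothetical attribute added here so the example reads like a phone list, and it is not part of the design developed in this chapter.

SELECT   EMP_LNAME, EMP_FNAME, EMP_INITIAL, EMP_PHONE
FROM     EMPLOYEE
ORDER BY EMP_LNAME, EMP_FNAME, EMP_INITIAL;   -- sort by last name, then first name, then initial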

Identify New Attributes

If the EMPLOYEE table were used in a real-world environment, several other attributes would have to be added. For

example, year-to-date gross salary payments, Social Security payments, and Medicare payments would be desirable. An

employee hire date attribute (EMP_HIREDATE) could be used to track an employee’s job longevity and serve as a

basis for awarding bonuses to long-term employees and for other morale-enhancing measures. The same principle must

be applied to all other tables in your design.

Identify New Relationships

According to the original report, the users need to track which employee is acting as the manager of each project. This can

be implemented as a relationship between EMPLOYEE and PROJECT. From the original report, it is clear that each

project has only one manager. Therefore, the system’s ability to supply detailed information about each project’s

manager is ensured by using the EMP_NUM as a foreign key in PROJECT. That action ensures that you can access the

details of each PROJECT’s manager data without producing unnecessary and undesirable data duplication. The

designer must take care to place the right attributes in the right tables by using normalization principles.

Refine Primary Keys as Required for Data Granularity

Granularity refers to the level of detail represented by the values stored in a table’s row. Data stored at their lowest level

of granularity are said to be atomic data, as explained earlier. In Figure 2.5, the ASSIGNMENT table in 3NF uses the

ASSIGN_HOURS attribute to represent the hours worked by a given employee on a given project. However, are those

values recorded at their lowest level of granularity? In other words, does ASSIGN_HOURS represent the hourly total,

daily total, weekly total, monthly total, or yearly total? Clearly, ASSIGN_HOURS requires more careful


definition. In this case, the relevant question would be as follows: For what time frame—hour, day, week, month, and so

on—do you want to record the ASSIGN_HOURS data?

For example, assume that the combination of EMP_NUM and PROJ_NUM is an acceptable (composite) primary key in

the ASSIGNMENT table. That primary key is useful in representing only the total number of hours an employee

worked on a project since its start. Using a surrogate primary key such as ASSIGN_NUM provides lower granularity

and yields greater flexibility. For example, assume that the EMP_NUM and PROJ_NUM combination is used as the

primary key, and then an employee makes two “hours worked” entries in the ASSIGNMENT table. That action violates

the entity integrity requirement. Even if you add the ASSIGN_DATE as part of a composite PK, an entity integrity

violation is still generated if any employee makes two or more entries for the same project on the same day. (The

employee might have worked on the project a few hours in the morning and then worked on it again later in the day.)

The same data entry yields no problems when ASSIGN_NUM is used as the primary key.
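One payoff of the finer granularity is that any coarser total can still be derived with a simple aggregate query; the reverse is not true. A sketch, assuming the ASSIGNMENT structure with ASSIGN_DATE and ASSIGN_HOURS discussed here:

-- Daily totals per employee per project, rolled up from the atomic entries.
SELECT   EMP_NUM, PROJ_NUM, ASSIGN_DATE, SUM(ASSIGN_HOURS) AS DAY_HOURS
FROM     ASSIGNMENT
GROUP BY EMP_NUM, PROJ_NUM, ASSIGN_DATE;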

Note

In an ideal (database design) world, the level of desired granularity is determined at the conceptual design or at the

requirements-gathering phase. However, as you have already seen in this chapter, many database designs involve

the refinement of existing data requirements, thus triggering design modifications. In a real-world

environment, changing granularity requirements might dictate changes in primary key selection, and those

changes might ultimately require the use of surrogate keys.

Maintain Historical Accuracy

Writing the job charge per hour into the ASSIGNMENT table is crucial to maintaining the historical accuracy of the data

in the ASSIGNMENT table. It would be appropriate to name this attribute ASSIGN_CHG_HOUR. Although this attribute

would appear to have the same value as JOB_CHG_HOUR, this is true only if the JOB_CHG_HOUR value remains

the same forever. However, it is reasonable to assume that the job charge per hour will change over time. But suppose

that the charges to each project were figured (and billed) by multiplying the hours worked on the project, found in the

ASSIGNMENT table, by the charge per hour, found in the JOB table. Those charges would always show the current

charge per hour stored in the JOB table, rather than the charge per hour that was in effect at the time of the

assignment.

Evaluate Using Derived Attributes

Finally, you can use a derived attribute in the ASSIGNMENT table to store the actual charge made to a project. That

derived attribute, to be named ASSIGN_CHARGE, is the result of multiplying ASSIGN_HOURS by ASSIGN_CHG_HOUR. This creates a transitive dependency such that

(ASSIGN_HOURS + ASSIGN_CHG_HOUR) → ASSIGN_CHARGE

From a strictly database point of view, such derived attribute values can be calculated when they are needed to write

reports or invoices. However, storing the derived attribute in the table makes it easy to write the application software to

produce the desired results. Also, if many transactions must be reported and/or summarized, the availability of the

derived attribute will save reporting time. (If the calculation is done at the time of data entry, it will be completed when the

end user presses the Enter key, thus speeding up the process.)
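Both options can be written directly in SQL. The following sketch assumes the ASSIGNMENT attribute names used in this section; it is meant only to contrast the two approaches.

-- Option 1: derive the charge when it is needed, without storing it.
SELECT ASSIGN_HOURS * ASSIGN_CHG_HOUR AS ASSIGN_CHARGE
FROM   ASSIGNMENT;

-- Option 2: store the derived value so that reports can read it directly.
UPDATE ASSIGNMENT
SET    ASSIGN_CHARGE = ASSIGN_HOURS * ASSIGN_CHG_HOUR;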

The enhancements described in the preceding sections are illustrated in the tables and dependency diagrams shown in

Figure 2.6.


FIGURE 2.6 The completed database (Database name: Ch06_ConstructCo)
Table name: PROJECT (PROJ_NUM, PROJ_NAME, EMP_NUM)
Table name: JOB (JOB_CODE, JOB_DESCRIPTION, JOB_CHG_HOUR)
Table name: ASSIGNMENT (ASSIGN_NUM, ASSIGN_DATE, PROJ_NUM, EMP_NUM, ASSIGN_HOURS, ASSIGN_CHG_HOUR, ASSIGN_CHARGE)

Figure 2.6 is a vast improvement over the original database design. If the application software is designed properly, the

most active table (ASSIGNMENT) requires the entry of only the PROJ_NUM, EMP_NUM, and ASSIGN_HOURS values.

FIGURE 2.6 The completed database (continued)
Table name: EMPLOYEE (EMP_NUM, EMP_LNAME, EMP_FNAME, EMP_INITIAL, EMP_HIREDATE, JOB_CODE)

The values for the attributes ASSIGN_NUM and ASSIGN_DATE can be generated by the application. For

example, the ASSIGN_NUM can be created by using a counter, and the ASSIGN_DATE can be the system date read by

the application and automatically entered into the ASSIGNMENT table. In addition, the application software can

automatically insert the correct ASSIGN_CHG_HOUR value by writing the appropriate JOB table’s JOB_CHG_HOUR value into the ASSIGNMENT table. (The JOB and ASSIGNMENT tables are related through the JOB_CODE

attribute.) If the JOB table’s JOB_CHG_HOUR value changes, the next insertion of that value into the ASSIGNMENT

table will reflect the change automatically. The table structure thus minimizes the need for human intervention. In fact, if

the system requires the employees to enter their own work hours, they can scan their EMP_NUM into the

ASSIGNMENT table by using a magnetic card reader that enters their identity. Thus, the ASSIGNMENT table’s

structure can set the stage for maintaining some desired level of security.
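The step of copying the current JOB_CHG_HOUR into the new ASSIGNMENT row can be sketched with an INSERT that pulls the rate through the employee's job code. The literal values are placeholders that the application would supply, and ASSIGN_NUM is assumed to be generated automatically, so this is an illustration rather than prescribed code.

INSERT INTO ASSIGNMENT (ASSIGN_DATE, PROJ_NUM, EMP_NUM, ASSIGN_HOURS, ASSIGN_CHG_HOUR)
SELECT '2014-11-30', '15', E.EMP_NUM, 2.6, J.JOB_CHG_HOUR
FROM   EMPLOYEE E
       JOIN JOB J ON E.JOB_CODE = J.JOB_CODE
WHERE  E.EMP_NUM = '103';   -- the scanned employee number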

2.5 SURROGATE KEY CONSIDERATIONS

Although this design meets the vital entity and referential integrity requirements, the designer must still address some

concerns. For example, a composite primary key might become too cumbersome to use as the number of attributes

grows. (It becomes difficult to create a suitable foreign key when the related table uses a composite primary key. In

addition, a composite primary key makes it more difficult to write search routines.) Or a primary key attribute might

simply have too much descriptive content to be usable—which is why the JOB_CODE attribute was added to the JOB

table to serve as that table’s primary key. When, for whatever reason, the primary key is considered to be unsuitable,

designers use surrogate keys, as discussed in the previous chapter.


At the implementation level, a surrogate key is a system-defined attribute generally created and managed via the

DBMS. Usually, a system-defined surrogate key is numeric, and its value is automatically incremented for each new

row. For example, Microsoft Access uses an AutoNumber data type, Microsoft SQL Server uses an identity column,

and Oracle uses a sequence object.

Recall from Section 2.4 that the JOB_CODE attribute was designated to be the JOB table’s primary key. However,

remember that the JOB_CODE does not prevent duplicate entries from being made, as shown in the JOB table in

Table 2.4.

TABLE 2.4 Duplicate Entries in the Job Table

JOB_CODE JOB_DESCRIPTION JOB_CHG_HOUR

511 Programmer $35.75

512 Programmer $35.75

Clearly, the data entries in Table 2.4 are inappropriate because they duplicate existing records—yet there has been no

violation of either entity integrity or referential integrity. This “multiple duplicate records” problem was created when

the JOB_CODE attribute was added as the PK. (When the JOB_DESCRIPTION was initially designated to be the PK,

the DBMS would ensure unique values for all job description entries when it was asked to enforce entity integrity. But

that option created the problems that caused the use of the JOB_CODE attribute in the first place!) In any case, if

JOB_CODE is to be the surrogate PK, you still must ensure the existence of unique values in the JOB_DESCRIPTION

through the use of a unique index.
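A hedged sketch of that idea follows. The index name is invented, and the exact syntax for auto-generating the surrogate value differs by DBMS (Access AutoNumber, SQL Server identity columns, Oracle sequences, as mentioned above); the sketch shows only the structural intent.

CREATE TABLE JOB (
    JOB_CODE        INTEGER      NOT NULL,  -- surrogate PK, normally generated by the DBMS
    JOB_DESCRIPTION VARCHAR(35)  NOT NULL,
    JOB_CHG_HOUR    DECIMAL(7,2) NOT NULL,
    PRIMARY KEY (JOB_CODE)
);

-- The unique index blocks the duplicate descriptions shown in Table 2.4.
CREATE UNIQUE INDEX JOB_DESC_NDX ON JOB (JOB_DESCRIPTION);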

Note that all of the remaining tables (PROJECT, ASSIGNMENT, and EMPLOYEE) are subject to the same limitations. For

example, if you use the EMP_NUM attribute in the EMPLOYEE table as the PK, you can make multiple entries for the

same employee. To avoid that problem, you might create a unique index for EMP_LNAME, EMP_FNAME, and

EMP_INITIAL. But how would you then deal with two employees named Joe B. Smith? In that case, you might use

another (preferably externally defined) attribute to serve as the basis for a unique index.

It is worth repeating that database design often involves trade-offs and the exercise of professional judgment. In a

real-world environment, you must strike a balance between design integrity and flexibility. For example, you might

design the ASSIGNMENT table to use a unique index on PROJ_NUM, EMP_NUM, and ASSIGN_DATE if you want to

limit an employee to only one ASSIGN_HOURS entry per date. That limitation would ensure that employees

couldn’t enter the same hours multiple times for any given date. Unfortunately, that limitation is likely to be undesirable from

a managerial point of view. After all, if an employee works several different times on a project during any given day, it

must be possible to make multiple entries for that same employee and the same project during that day. In that case, the

best solution might be to add a new externally defined attribute—such as a stub, voucher, or ticket number—to

ensure uniqueness. In any case, frequent data audits would be appropriate.

2.6 HIGHER-LEVEL NORMAL FORMS

Tables in 3NF will perform suitably in business transactional databases. However, there are occasions when higher

normal forms are useful. In this section, you will learn about a special case of 3NF, known as Boyce-Codd normal form

(BCNF), and about fourth normal form (4NF).

2.6.1 The Boyce-Codd Normal Form (BCNF)

A table is in Boyce-Codd normal form (BCNF) when every determinant in the table is a candidate key. (Recall from

Chapter 3 that a candidate key has the same characteristics as a primary key, but for some reason, it was not chosen to

be the primary key.) Clearly, when a table contains only one candidate key, the 3NF and the BCNF are equivalent.


Putting that proposition another way, BCNF can be violated only when the table contains more than one

candidate key.

Note

A table is in Boyce-Codd normal form (BCNF) when every determinant in the table is a candidate key.

Most designers consider the BCNF to be a special case of the 3NF. In fact, if the techniques shown here are used, most

tables conform to the BCNF requirements once the 3NF is reached. So how can a table be in 3NF and not be in

BCNF? To answer that question, you must keep in mind that a transitive dependency exists when one nonprime

attribute is dependent on another nonprime attribute.

In other words, a table is in 3NF when it is in 2NF and there are no transitive dependencies. But what about a case in

which a nonkey attribute is the determinant of a key attribute? That condition does not violate 3NF, yet it fails to meet

the BCNF requirements because BCNF requires that every determinant in the table be a candidate key. The situation

just described (a 3NF table that fails to meet BCNF requirements) is shown in Figure 2.7.

FIGURE 2.7 A table that is in 3NF but not in BCNF (dependency diagram for attributes A, B, C, D)

Note these functional dependencies in Figure 2.7:

A + B → C, D
A + C → B, D
C → B

Notice that this structure has two candidate keys: (A + B) and

(A + C). The table structure shown in Figure 2.7 has no partial

dependencies, nor does it contain transitive dependencies.

(The condition C → B indicates that a nonkey attribute

determines part of the primary key—and that dependency

is not transitive or partial because the dependent is a prime

attribute!) Thus, the table structure in Figure 2.7 meets the

3NF requirements. Yet the condition C → B causes the table to fail to meet the BCNF requirements.

To convert the table structure in Figure 2.7 into table structures that are in 3NF and in BCNF, first change the primary key to A + C. That is an appropriate action because the dependency C → B means that C is, in effect, a superset of B. At this point, the table is in 1NF because it contains a partial dependency, C → B. Next, follow the standard

decomposition procedures to produce the results shown in Figure 2.8.

To see how this procedure can be applied to an actual problem, examine the sample data in Table 2.5.

TABLE 2.5 Sample Data for a BCNF Conversion

STU_ID   STAFF_ID   CLASS_CODE   ENROLL_GRADE
125      25         21334        A
125      20         32456        C
135      20         28458        B
144      25         27563        C
144      20         32456        B


FIGURE 2.8 Decomposition to BCNF
(The structure A B C D is in 3NF but not in BCNF. Rewriting it with the primary key A + C exposes the partial dependency C → B; the standard decomposition then yields the structures A C D and C B, each of which is in both 3NF and BCNF.)

Table 2.5 reflects the following conditions:

Each CLASS_CODE identifies a class uniquely. This condition illustrates the case in which a course might

generate many classes. For example, a course labeled INFS 420 might be taught in two classes (sections), each

identified by a unique code to facilitate registration. Thus, the CLASS_CODE 32456 might identify INFS 420,

class section 1, while the CLASS_CODE 32457 might identify INFS 420, class section 2, and the CLASS_CODE 28458 might identify QM 362, class section 5.

A student can take many classes. Note, for example, that student 125 has taken both 21334 and 32456,

earning the grades A and C, respectively.

A staff member can teach many classes, but each class is taught by only one staff member. Note that staff

member 20 teaches the classes identified as 32456 and 28458.

The structure shown in Table 2.5 is reflected in Panel A of Figure 2.9:

STU_ID + STAFF_ID → CLASS_CODE, ENROLL_GRADE

CLASS_CODE → STAFF_ID

Panel A of Figure 2.9 shows a structure that is clearly in 3NF, but the table represented by this structure has a major

problem, because it is trying to describe two things: staff assignments to classes and student enrollment information.

Such a dual-purpose table structure will cause anomalies. For example, if a different staff member is assigned to teach

class 32456, two rows will require updates, thus producing an update anomaly. And if student 135 drops class 28458,


FIGURE 2.9 Another BCNF decomposition
Panel A: 3NF, but not BCNF (STU_ID, STAFF_ID, CLASS_CODE, ENROLL_GRADE)
Panel B: 3NF and BCNF (STU_ID, CLASS_CODE, ENROLL_GRADE) and (CLASS_CODE, STAFF_ID)

information about who taught that class is lost, thus producing a deletion anomaly. The solution to the problem is to

decompose the table structure, following the procedure outlined earlier. Note that the decomposition of Panel B shown in

Figure 2.9 yields two table structures that conform to both 3NF and BCNF requirements.
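A SQL sketch of the Panel B decomposition follows. The table names ENROLLMENT and CLASS_ASSIGNMENT are invented for the sketch (the figure does not name them), and the data types are assumptions.

-- Each class is taught by exactly one staff member, so CLASS_CODE alone is the key.
CREATE TABLE CLASS_ASSIGNMENT (
    CLASS_CODE CHAR(5) NOT NULL,
    STAFF_ID   CHAR(3) NOT NULL,
    PRIMARY KEY (CLASS_CODE)
);

CREATE TABLE ENROLLMENT (
    STU_ID       CHAR(3) NOT NULL,
    CLASS_CODE   CHAR(5) NOT NULL,
    ENROLL_GRADE CHAR(1),
    PRIMARY KEY (STU_ID, CLASS_CODE),
    FOREIGN KEY (CLASS_CODE) REFERENCES CLASS_ASSIGNMENT (CLASS_CODE)
);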

Remember that a table is in BCNF when every determinant in that table is a candidate key. Therefore, when a table

contains only one candidate key, 3NF and BCNF are equivalent.


2.6.2 Fourth Normal Form (4NF)

You might encounter poorly designed databases, or you might be asked to convert spreadsheets into a database format in

which multiple multivalued attributes exist. For example, consider the possibility that an employee can have multiple

assignments and can also be involved in multiple service organizations. Suppose employee 10123 does volunteer work for

the Red Cross and United Way. In addition, the same employee might be assigned to work on three projects: 1, 3, and

4. Figure 2.10 illustrates how that set of facts can be recorded in very different ways.

FIGURE 2.10 Tables with multivalued dependencies (Database name: Ch06_Service)
Table names: VOLUNTEER_V1, VOLUNTEER_V2, VOLUNTEER_V3

There is a problem with the tables in Figure 2.10. The attributes ORG_CODE and ASSIGN_NUM each may have

many different values. In normalization terminology, this situation is referred to as a multivalued dependency. A

multivalued dependency occurs when one key determines multiple values of two other attributes and those attributes are

independent of each other. (One employee can have many service entries and many assignment entries. Therefore, one

EMP_NUM can determine multiple values of ORG_CODE and multiple values of ASSIGN_NUM; however,

ORG_CODE and ASSIGN_NUM are independent of each other.) The presence of a multivalued dependency means that

if versions 1 and 2 are implemented, the tables are likely to contain quite a few null values; in fact, the tables do not even

have a viable candidate key. (The EMP_NUM values are not unique, so they cannot be PKs. No combination of the

attributes in table versions 1 and 2 can be used to create a PK because some of them contain nulls.) Such a condition

is not desirable, especially when there are thousands of employees, many of whom may have multiple job assignments

and many service activities. Version 3 at least has a PK, but it is composed of all of the attributes in the table. In fact,

version 3 meets 3NF requirements, yet it contains many redundancies that are clearly undesirable.

The solution is to eliminate the problems caused by the multivalued dependency. You do this by creating new tables for

the components of the multivalued dependency. In this example, the multivalued dependency is resolved by creating the

ASSIGNMENT and SERVICE_V1 tables depicted in Figure 2.11. Note that in Figure 2.11, neither the

ASSIGNMENT nor the SERVICE_V1 table contains a multivalued dependency. Those tables are said to be in 4NF.
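A minimal SQL sketch of that 4NF resolution follows; the column types and sizes are assumptions made for illustration.

-- One row per project assignment for an employee.
CREATE TABLE ASSIGNMENT (
    EMP_NUM    CHAR(5) NOT NULL,
    ASSIGN_NUM CHAR(3) NOT NULL,
    PRIMARY KEY (EMP_NUM, ASSIGN_NUM)
);

-- One row per service organization for an employee; the two multivalued facts are now stored independently.
CREATE TABLE SERVICE_V1 (
    EMP_NUM  CHAR(5) NOT NULL,
    ORG_CODE CHAR(5) NOT NULL,
    PRIMARY KEY (EMP_NUM, ORG_CODE)
);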

If you follow the proper design procedures illustrated in this book, you shouldn’t encounter the previously described

problem. Specifically, the discussion of 4NF is largely academic if you make sure that your tables conform to the

following two rules:

1. All attributes must be dependent on the primary key, but they must be independent of each other.

2. No row may contain two or more multivalued facts about an entity.


FIGURE 2.11 A set of tables in 4NF (Database name: Ch06_Service)
Table names: PROJECT, EMPLOYEE, ORGANIZATION, ASSIGNMENT, SERVICE_V1 (shown with the relational diagram)

Note

A table is in fourth normal form (4NF) when it is in 3NF and has no multivalued dependencies.

2.7 NORMALIZATION AND DATABASE DESIGN

The tables shown in Figure 2.6 illustrate how normalization procedures can be used to produce good tables from poor

ones. You will likely have ample opportunity to put this skill into practice when you begin to work with real-world

databases. Normalization should be part of the design process. Therefore, make sure that proposed entities meet

the required normal form before the table structures are created. Keep in mind that if you follow the design procedures

discussed in Chapter 3 and Chapter 4, the likelihood of data anomalies will be small. But even the best database

designers are known to make occasional mistakes that come to light during normalization checks. However, many of

the real-world databases you encounter will have been improperly designed or burdened with anomalies if they were

improperly modified over the course of time. And that means you might be asked to redesign and modify existing

databases that are, in effect, anomaly traps. Therefore, you should be aware of good design principles and procedures

as well as normalization procedures.

First, an ERD is created through an iterative process. You begin by identifying relevant entities, their attributes, and

their relationships. Then you use the results to identify additional entities and attributes. The ERD provides the big

picture, or macro view, of an organization’s data requirements and operations.


Second, normalization focuses on the characteristics of specific entities; that is, normalization represents a micro view

of the entities within the ERD. And as you learned in the previous sections of this chapter, the normalization process

might yield additional entities and attributes to be incorporated into the ERD. Therefore, it is difficult to separate the

normalization process from the ER modeling process; the two techniques are used in an iterative and incremental

process.

To illustrate the proper role of normalization in the design process, let’s reexamine the operations of the contracting

company whose tables were normalized in the preceding sections. Those operations can be summarized by using the

following business rules:

The company manages many projects.

Each project requires the services of many employees.

An employee may be assigned to several different projects.

Some employees are not assigned to a project and perform duties not specifically related to a project. Some

employees are part of a labor pool, to be shared by all project teams. For example, the company’s executive

secretary would not be assigned to any one particular project.

Each employee has a single primary job classification. That job classification determines the hourly billing rate.

Many employees can have the same job classification. For example, the company employs more than one

electrical engineer.

Given that simple description of the company’s operations, two entities and their attributes are initially defined:

PROJECT (PROJ_NUM, PROJ_NAME)

EMPLOYEE (EMP_NUM, EMP_LNAME, EMP_FNAME, EMP_INITIAL, JOB_DESCRIPTION,

JOB_CHG_HOUR)

Those two entities constitute the initial ERD shown in Figure 2.12.

FIGURE 2.12 Initial contracting company ERD

After creating the initial ERD shown in Figure 2.12, the normal forms are defined:

PROJECT is in 3NF and needs no modification at

this point.

EMPLOYEE requires additional scrutiny. The JOB_DESCRIPTION attribute defines job classifications

such as Systems Analyst, Database Designer, and

Programmer. In turn, those classifications determine

the billing rate, JOB_CHG_HOUR. Therefore,

EMPLOYEE contains a transitive dependency.

The removal of EMPLOYEE’s transitive dependency yields three entities:

PROJECT (PROJ_NUM, PROJ_NAME)

EMPLOYEE (EMP_NUM, EMP_LNAME, EMP_FNAME, EMP_INITIAL, JOB_CODE)

JOB (JOB_CODE, JOB_DESCRIPTION, JOB_CHG_HOUR)

Because the normalization process yields an additional entity (JOB), the initial ERD is modified as shown in

Figure 2.13.

To represent the M:N relationship between EMPLOYEE and PROJECT, you might think that two 1:M relationships

could be used—an employee can be assigned to many projects, and each project can have many employees assigned to

it. (See Figure 2.14.) Unfortunately, that representation yields a design that cannot be correctly implemented.

Because the M:N relationship between EMPLOYEE and PROJECT cannot be implemented, the ERD in Figure 2.14

must be modified to include the ASSIGNMENT entity to track the assignment of employees to projects, thus yielding


FIGURE 2.13 Modified contracting company ERD

FIGURE 2.14 Incorrect M:N relationship representation

the ERD shown in Figure 2.15. The ASSIGNMENT entity in Figure 2.15 uses the primary keys from the entities

PROJECT and EMPLOYEE to serve as its foreign keys. However, note that in this implementation, the ASSIGNMENT

entity’s surrogate primary key is ASSIGN_NUM, to avoid the use of a composite primary key. Therefore, the “enters”

relationship between EMPLOYEE and ASSIGNMENT and the “requires” relationship between PROJECT and

ASSIGNMENT are shown as weak or nonidentifying.

Note that in Figure 2.15, the ASSIGN_HOURS attribute is assigned to the composite entity named ASSIGNMENT.

Because you will likely need detailed information about each project’s manager, the creation of a “manages” relationship is useful. The “manages” relationship is implemented through the foreign key in PROJECT. Finally, some

additional attributes may be created to improve the system’s ability to generate additional information. For example,


FIGURE 2.15 Final contracting company ERD

you may want to include the date on which the employee was hired (EMP_HIREDATE) to keep track of worker

longevity. Based on this last modification, the model should include four entities and their attributes:

PROJECT (PROJ_NUM, PROJ_NAME, EMP_NUM)

EMPLOYEE (EMP_NUM, EMP_LNAME, EMP_FNAME, EMP_INITIAL, EMP_HIREDATE, JOB_CODE)

JOB (JOB_CODE, JOB_DESCRIPTION, JOB_CHG_HOUR)

ASSIGNMENT (ASSIGN_NUM, ASSIGN_DATE, PROJ_NUM, EMP_NUM, ASSIGN_HOURS, ASSIGN_CHG_HOUR, ASSIGN_CHARGE)

The design process is now on the right track. The ERD represents the operations accurately, and the entities now

reflect their conformance to 3NF. The combination of normalization and ER modeling yields a useful ERD, whose

entities may now be translated into appropriate table structures. In Figure 2.15, note that PROJECT is optional to

EMPLOYEE in the “manages” relationship. This optionality exists because not all employees manage projects. The

final database contents are shown in Figure 2.16.

2.8 DENORMALIZATION

It’s important to remember that the optimal relational database implementation requires that all tables be at least in

third normal form (3NF). A good relational DBMS excels at managing normalized relations; that is, relations void of any

unnecessary redundancies that might cause data anomalies. Although the creation of normalized relations is an

important database design goal, it is only one of many such goals. Good database design also considers processing (or

reporting) requirements and processing speed. The problem with normalization is that as tables are decomposed to

conform to normalization requirements, the number of database tables expands. Therefore, in order to generate

information, data must be put together from various tables. Joining a large number of tables takes additional

input/output (I/O) operations and processing logic, thereby reducing system speed. Most relational database systems are

able to handle joins very efficiently. However, rare and occasional circumstances may allow some degree of

denormalization so processing speed can be increased.


FIGURE 2.16 The implemented database (Database name: Ch06_ConstructCo)
Table names: EMPLOYEE, JOB, PROJECT, ASSIGNMENT

Keep in mind that the advantage of higher processing speed must be carefully weighed against the disadvantage of data

anomalies. On the other hand, some anomalies are of only theoretical interest. For example, should people in a real-

world database environment worry that a ZIP_CODE determines CITY in a CUSTOMER table whose primary key is the

customer number? Is it really practical to produce a separate table for

ZIP (ZIP_CODE, CITY)

to eliminate a transitive dependency from the CUSTOMER table? (Perhaps your answer to that question changes if you

are in the business of producing mailing lists.) As explained earlier, the problem with denormalized relations and

redundant data is that the data integrity could be compromised due to the possibility of data anomalies (insert, update,

and deletion anomalies). The advice is simple: use common sense during the normalization process.


Furthermore, the database design process could, in some cases, introduce some small degree of redundant data in the

model (as seen in the previous example). This, in effect, creates “denormalized” relations. Table 2.6 shows some

common examples of data redundancy that are generally found in database implementations.

TABLE 2.6 Common Denormalization Examples

CASE: Redundant data
EXAMPLE: Storing ZIP and CITY attributes in the CUSTOMER table when ZIP determines CITY. (See Table 1.4.)
RATIONALE AND CONTROLS: Avoid extra join operations. The program can validate the city (drop-down box) based on the zip code.

CASE: Derived data
EXAMPLE: Storing STU_HRS and STU_CLASS (student classification) when STU_HRS determines STU_CLASS. (See Figure 3.29.)
RATIONALE AND CONTROLS: Avoid extra join operations. The program can validate the classification (lookup) based on the student hours.

CASE: Preaggregated data (also derived data)
EXAMPLE: Storing the student grade point average (STU_GPA) aggregate value in the STUDENT table when it can be calculated from the ENROLL and COURSE tables. (See Figure 3.29.)
RATIONALE AND CONTROLS: Avoid extra join operations. The program computes the GPA every time a grade is entered or updated; STU_GPA can be updated only via an administrative routine.

CASE: Information requirements
EXAMPLE: Using a temporary denormalized table to hold report data. This is required when creating a tabular report in which the columns represent data that are stored in the table as rows. (See Figure 2.17 and Figure 2.18.)
RATIONALE AND CONTROLS: It is impossible to generate the data required by the report using plain SQL. There is no need to maintain the table; the temporary table is deleted once the report is done, and processing speed is not an issue.

A more comprehensive example of the need for denormalization due to reporting requirements is the case of a faculty

evaluation report in which each row lists the scores obtained during the last four semesters taught. (See Figure 2.17.)

FIGURE 2.17 The faculty evaluation report

Although this report seems simple enough, the problem arises from the fact that the data are stored in a normalized

table in which each row represents a different score for a given faculty member in a given semester. (See Figure 2.18.)

The difficulty of transposing multirow data to multicolumnar data is compounded by the fact that the last four semesters

taught are not necessarily the same for all faculty members (some might have taken sabbaticals, some might have had research appointments, some might be new faculty with only two semesters on the job, etc.).

FIGURE 2.18 The EVALDATA and FACHIST tables (Database name: Ch06_EVAL)
Table name: EVALDATA (normalized)
Table name: FACHIST (denormalized, with a repeating group)

To generate this report, the

two tables you see in Figure 2.18 were used. The EVALDATA table is the master data table containing the

evaluation scores for each faculty member for each semester taught; this table is normalized. The FACHIST table

contains the last four data points—that is, evaluation score and semester—for each faculty member. The FACHIST

table is a temporary denormalized table created from the EVALDATA table via a series of queries. (The FACHIST table is

the basis for the report shown in Figure 2.17.)
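The mechanics can be sketched in SQL, although the real report would need a series of queries to pivot the last four semesters into columns. The column names FAC_ID, EVAL_SEMESTER, and EVAL_SCORE are assumptions, since the figure does not list the EVALDATA attributes.

-- Build the temporary working table from the normalized master data.
CREATE TABLE FACHIST AS
SELECT FAC_ID, EVAL_SEMESTER, EVAL_SCORE
FROM   EVALDATA;

-- ...a series of queries then reshapes FACHIST into one row per faculty member...

-- Drop the table once the report has been produced, so no anomalies can accumulate.
DROP TABLE FACHIST;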

As seen in the faculty evaluation report, the conflicts between design efficiency, information requirements, and

performance are often resolved through compromises that may include denormalization. In this case, and assuming

there is enough storage space, the designer’s choices could be narrowed down to:

Store the data in a permanent denormalized table. This is not the recommended solution, because the

denormalized table is subject to data anomalies (insert, update, and delete). This solution is viable only if

performance is an issue.

Create a temporary denormalized table from the permanent normalized table(s). Because the denormalized

table exists only as long as it takes to generate the report, it disappears after the report is produced. Therefore,

there are no data anomaly problems. This solution is practical only if performance is not an issue and there are

no other viable processing options.

As shown, normalization purity is often difficult to sustain in the modern database environment. You will learn

in Chapter 13, Business Intelligence and Data Warehouses, that lower normalization forms occur (and are even

required) in specialized databases known as data warehouses. Such specialized databases reflect the ever-growing

demand for greater scope and depth in the data on which decision support systems increasingly rely. You will discover

that the data warehouse routinely uses 2NF structures in its complex, multilevel, multisource data environment. In

short, although normalization is very important, especially in the so-called production database environment, 2NF is

no longer disregarded as it once was.


Although 2NF tables cannot always be avoided, the problem of working with tables that contain partial and/or

transitive dependencies in a production database environment should not be minimized. Aside from the possibility of

troublesome data anomalies being created, unnormalized tables in a production database tend to suffer from these

defects:

Data updates are less efficient because programs that read and update tables must deal with larger tables.

Indexing is more cumbersome. It is simply not practical to build all of the indexes required for the many

attributes that might be located in a single unnormalized table.

Unnormalized tables yield no simple strategies for creating virtual tables known as views. You will learn how to

create and use views in Chapter 4, Views.

Remember that good design cannot be created in the application programs that use a database. Also keep in mind that

unnormalized database tables often lead to various data redundancy disasters in production databases such as the ones

examined thus far. In other words, use denormalization cautiously and make sure that you can explain why the

unnormalized tables are a better choice in certain situations than their normalized counterparts.

2.9 DATA-MODELING CHECKLIST

In the chapters of Part II, you have learned how data modeling translates a specific real-world environment into a data

model that represents the real-world data, users, processes, and interactions. The modeling techniques you have

learned thus far give you the tools needed to produce successful database designs. However, just as any good pilot uses a

checklist to ensure that all is in order for a successful flight, the data-modeling checklist shown in Table 2.7 will help

ensure that you perform data-modeling tasks successfully based on the concepts and tools you have learned in this text.


TABLE 2.7 Data-Modeling Checklist

BUSINESS RULES

Properly document and verify all business rules with the end users.

Ensure that all business rules are written precisely, clearly, and simply. The business rules must help identify

entities, attributes, relationships, and constraints.

Identify the source of all business rules, and ensure that each business rule is justified, dated, and signed off by an

approving authority.

DATA MODELING

Naming Conventions: All names should be limited in length (database-dependent size).

Entity Names:

Should be nouns that are familiar to business and should be short and meaningful

Should document abbreviations, synonyms, and aliases for each entity

Should be unique within the model

For composite entities, may include a combination of abbreviated names of the entities linked through

the composite entity

Attribute Names:

Should be unique within the entity

Should use the entity abbreviation as a prefix

Should be descriptive of the characteristic

Should use suffixes such as _ID, _NUM, or _CODE for the PK attribute

Should not be a reserved word

Should not contain spaces or special characters such as @, !, or &

Relationship Names:

Should be active or passive verbs that clearly indicate the nature of the relationship

Entities:

Each entity should represent a single subject.

Each entity should represent a set of distinguishable entity instances.

All entities should be in 3NF or higher. Any entities below 3NF should be justified.

The granularity of the entity instance should be clearly defined.

The PK should be clearly defined and support the selected data granularity.

Attributes:

Should be simple and single-valued (atomic data)

Should document default values, constraints, synonyms, and aliases

Derived attributes should be clearly identified and include source(s)

Should not be redundant unless this is required for transaction accuracy, performance, or

maintaining a history

Nonkey attributes must be fully dependent on the PK attribute

Relationships:

Should clearly identify relationship participants

Should clearly define participation, connectivity, and document cardinality

ER Model:

Should be validated against expected processes: inserts, updates, and deletes

Should evaluate where, when, and how to maintain a history

Should not contain redundant relationships except as required (see attributes)

Should minimize data redundancy to ensure single-place updates

Should conform to the minimal data rule: “All that is needed is there, and all that is there is needed.”


Summary

◗ Normalization is a technique used to design tables in which data redundancies are minimized. The first three normal

forms (1NF, 2NF, and 3NF) are most commonly encountered. From a structural point of view, higher normal forms

are better than lower normal forms, because higher normal forms yield relatively fewer data redundancies in the

database. Almost all business designs use 3NF as the ideal normal form. A special, more restricted 3NF known as

Boyce-Codd normal form, or BCNF, is also used.

◗ A table is in 1NF when all key attributes are defined and when all remaining attributes are dependent on the primary

key. However, a table in 1NF can still contain both partial and transitive dependencies. (A partial dependency is one in

which an attribute is functionally dependent on only a part of a multiattribute primary key. A transitive

dependency is one in which one attribute is functionally dependent on another nonkey attribute.) A table with a

single-attribute primary key cannot exhibit partial dependencies.

◗ A table is in 2NF when it is in 1NF and contains no partial dependencies. Therefore, a 1NF table is automatically

in 2NF when its primary key is based on only a single attribute. A table in 2NF may still contain transitive

dependencies.

◗ A table is in 3NF when it is in 2NF and contains no transitive dependencies. Given that definition of 3NF, the

Boyce-Codd normal form (BCNF) is merely a special 3NF case in which all determinant keys are candidate keys.

When a table has only a single candidate key, a 3NF table is automatically in BCNF.

◗ A table that is not in 3NF may be split into new tables until all of the tables meet the 3NF requirements.

◗ Normalization is an important part—but only a part—of the design process. As entities and attributes are defined

during the ER modeling process, subject each entity (set) to normalization checks and form new entity (sets) as

required. Incorporate the normalized entities into the ERD and continue the iterative ER process until all entities

and their attributes are defined and all equivalent tables are in 3NF.

◗ A table in 3NF might contain multivalued dependencies that produce either numerous null values or redundant data.

Therefore, it might be necessary to convert a 3NF table to the fourth normal form (4NF) by splitting the table to

remove the multivalued dependencies. Thus, a table is in 4NF when it is in 3NF and contains no multivalued

dependencies.

◗ The larger the number of tables, the more additional I/O operations and processing logic required to join them.

Therefore, tables are sometimes denormalized to yield less I/O in order to increase processing speed.

Unfortunately, with larger tables, you pay for the increased processing speed by making the data updates less efficient,

by making indexing more cumbersome, and by introducing data redundancies that are likely to yield data anomalies.

In the design of production databases, use denormalization sparingly and cautiously.

◗ The Data-Modeling Checklist provides a way for the designer to check that the ERD meets a set of minimum

requirements.

Key Terms

atomic attribute, atomicity, Boyce-Codd normal form (BCNF), denormalization, dependency diagram, determinant, first normal form (1NF), fourth normal form (4NF), granularity, key attribute, nonkey attribute, nonprime attribute, normalization, partial dependency, prime attribute, repeating group, second normal form (2NF), third normal form (3NF), transitive dependency


3 Introduction to Structured Query Language (SQL)

In this chapter, you will learn:

The basic commands and functions of SQL

How to use SQL for data administration (to create tables, indexes, and views)

How to use SQL for data manipulation (to add, modify, delete, and retrieve data)

How to use SQL to query a database for useful information

In this chapter, you will learn the basics of Structured Query Language (SQL). SQL,

pronounced S-Q-L by some and “sequel” by others, is composed of commands that enable

users to create database and table structures, perform various types of data manipulation

and data administration, and query the database to extract useful information.All relational

DBMS software supports SQL, and many software vendors have developed extensions to

the basic SQL command set.

Because SQL’s vocabulary is simple, the language is relatively easy to learn. Its simplicity is

enhanced by the fact that much of its work takes place behind the scenes. For example, a

single command creates the complex table structures required to store and manipulate data

successfully. Furthermore, SQL is a nonprocedural language; that is, the user specifies what

must be done, but not how it is to be done. To issue SQL commands, end users and

programmers do not need to know the physical data storage format or the complex

activities that take place when a SQL command is executed.

Although quite useful and powerful, SQL is not meant to stand alone in the applications

arena. Data entry with SQL is possible but awkward, as are data corrections and additions.

SQL itself does not create menus, special report forms, overlays, pop-ups, or any of the

other utilities and screen devices that end users usually expect. Instead, those features are

available as vendor-supplied enhancements. SQL focuses on data definition (creating tables,

indexes, and views) and data manipulation (adding, modifying, deleting, and retrieving data);

we will cover these basic functions in this chapter. In spite of its limitations, SQL is a

powerful tool for extracting information and managing data.



3.1 INTRODUCTION TO SQL

Ideally, a database language allows you to create database and table structures, to perform basic data management

chores (add, delete, and modify), and to perform complex queries designed to transform the raw data into useful

information. Moreover, a database language must perform such basic functions with minimal user effort, and its

command structure and syntax must be easy to learn. Finally, it must be portable; that is, it must conform to some basic

standard so that an individual does not have to relearn the basics when moving from one RDBMS to another. SQL

meets those ideal database language requirements well.

SQL functions fit into two broad categories:

It is a data definition language (DDL): SQL includes commands to create database objects such as tables,

indexes, and views, as well as commands to define access rights to those database objects. The data definition

commands you will learn in this chapter are listed in Table 3.1.

TABLE 3.1 SQL Data Definition Commands

COMMAND OR OPTION DESCRIPTION

CREATE SCHEMA AUTHORIZATION Creates a database schema

CREATE TABLE Creates a new table in the user's database schema

NOT NULL Ensures that a column will not have null values

UNIQUE Ensures that a column will not have duplicate values

PRIMARY KEY Defines a primary key for a table

FOREIGN KEY Defines a foreign key for a table

DEFAULT Defines a default value for a column (when no value is given)

CHECK Validates data in an attribute

CREATE INDEX Creates an index for a table

CREATE VIEW Creates a dynamic subset of rows/columns from one or more tables

ALTER TABLE Modifies a table's definition (adds, modifies, or deletes attributes or constraints)

CREATE TABLE AS Creates a new table based on a query in the user's database schema

DROP TABLE Permanently deletes a table (and its data)

DROP INDEX Permanently deletes an index

DROP VIEW Permanently deletes a view

It is a data manipulation language (DML): SQL includes commands to insert, update, delete, and retrieve

data within the database tables. The data manipulation commands you will learn in this chapter are listed in Table 3.2.

TABLE 3.2 SQL Data Manipulation Commands

COMMAND OR OPTION DESCRIPTION

INSERT Inserts row(s) into a table

SELECT Selects attributes from rows in one or more tables or views

WHERE Restricts the selection of rows based on a conditional expression

GROUP BY Groups the selected rows based on one or more attributes

HAVING Restricts the selection of grouped rows based on a condition

ORDER BY Orders the selected rows based on one or more attributes

UPDATE Modifies an attribute's values in one or more table's rows

TABLE 3.2 SQL Data Manipulation Commands (continued)

COMMAND OR OPTION DESCRIPTION

DELETE Deletes one or more rows from a table

COMMIT Permanently saves data changes

ROLLBACK Restores data to their original values

COMPARISON OPERATORS =, <, >, <=, >=, <> Used in conditional expressions

LOGICAL OPERATORS AND/OR/NOT Used in conditional expressions

SPECIAL OPERATORS Used in conditional expressions

BETWEEN Checks whether an attribute value is within a range

IS NULL Checks whether an attribute value is null

LIKE Checks whether an attribute value matches a given string pattern

IN Checks whether an attribute value matches any value within a value list

EXISTS Checks whether a subquery returns any rows

DISTINCT Limits values to unique values

AGGREGATE FUNCTIONS Used with SELECT to return mathematical summaries on columns

COUNT Returns the number of rows with non-null values for a given column

MIN Returns the minimum attribute value found in a given column

MAX Returns the maximum attribute value found in a given column

SUM Returns the sum of all values for a given column

AVG Returns the average of all values for a given column

You will be happy to know that SQL is relatively easy to learn. Its basic command set has a vocabulary of fewer than 100 words. Better yet, SQL is a nonprocedural language: you merely command what is to be done; you don't have to

worry about how it is to be done. The American National Standards Institute (ANSI) prescribes a standard SQL—the

current fully approved version is SQL-2003. The ANSI SQL standards are also accepted by the International

Organization for Standardization (ISO), a consortium composed of national standards bodies of more than 150

countries. Although adherence to the ANSI/ISO SQL standard is usually required in commercial and government contract

database specifications, many RDBMS vendors add their own special enhancements. Consequently, it is seldom

possible to move a SQL-based application from one RDBMS to another without making some changes.

However, even though there are several different SQL “dialects,” the differences among them are minor. Whether you

use Oracle, Microsoft SQL Server, MySQL, IBM’s DB2, Microsoft Access, or any other well-established RDBMS, a

software manual should be sufficient to get you up to speed if you know the material presented in this chapter.

At the heart of SQL is the query. In Chapter 1, Database Systems, you learned that a query is a spur-of-the-moment

question. Actually, in the SQL environment, the word query covers both questions and actions. Most SQL queries are

used to answer questions such as these: “What products currently held in inventory are priced over $100, and what is

the quantity on hand for each of those products?” “How many employees have been hired since January 1, 2008 by

each of the company’s departments?” However, many SQL queries are used to perform actions such as adding or

deleting table rows or changing attribute values within tables. Still other SQL queries create new tables or indexes. In

short, for a DBMS, a query is simply a SQL statement that must be executed. But before you can use SQL to query a

database, you must define the database environment for SQL with its data definition commands.


3.2 DATA DEFINITION COMMANDS

Before examining the SQL syntax for creating and defining tables and other elements, let’s first examine the simple

database model and the database tables that will form the basis for the many SQL examples you’ll explore in this

chapter.

3.2.1 The Database Model

A simple database composed of the following tables is used to illustrate the SQL commands in this chapter:

CUSTOMER, INVOICE, LINE, PRODUCT, and VENDOR. This database model is shown in Figure 3.1.

FIGURE 3.1 The database model

The database model in Figure 3.1 reflects the following business rules:

A customer may generate many invoices. Each invoice is generated by one customer.

An invoice contains one or more invoice lines. Each invoice line is associated with one invoice.

Each invoice line references one product. A product may be found in many invoice lines. (You can sell more

than one hammer to more than one customer.)

A vendor may supply many products. Some vendors do not (yet?) supply products. (For example, a vendor list

may include potential vendors.)

If a product is vendor-supplied, that product is supplied by only a single vendor.

Some products are not supplied by a vendor. (For example, some products may be produced in-house or

bought on the open market.)

As you can see in Figure 3.1, the database model contains many tables. However, to illustrate the initial set of data

definition commands, the focus of attention will be the PRODUCT and VENDOR tables. You will have the opportunity to

use the remaining tables later in this chapter and in the problem section.


O n l i n e C o n t e n t

The database model in Figure 3.1 is implemented in the Microsoft Access Ch07_SaleCo database located in the

Premium Website for this book. (This database contains a few additional tables that are not reflected in Figure

3.1. These tables are used for discussion purposes only.) If you use MS Access, you can use the database supplied

online. However, it is strongly suggested that you create your own database structures so you can practice the

SQL commands illustrated in this chapter.

SQL script files for creating the tables and loading the data in Oracle and MS SQL Server are also located in the

Premium Website. How you connect to your database depends on how the software was installed on your

computer. Follow the instructions provided by your instructor or school.

So that you have a point of reference for understanding the effect of the SQL queries, the contents of the PRODUCT and

VENDOR tables are listed in Figure 3.2.

FIGURE 3.2 The VENDOR and PRODUCT tables (Table name: VENDOR; Table name: PRODUCT; Database name: Ch07_SaleCo)

Note the following about these tables. (The features correspond to the business rules reflected in the ERD shown in

Figure 3.1.)

The VENDOR table contains vendors who are not referenced in the PRODUCT table. Database designers note

that possibility by saying that PRODUCT is optional to VENDOR; a vendor may exist without a reference to a

product. You examined such optional relationships in detail in Chapter 4, Entity Relationship (ER) Modeling.


Existing V_CODE values in the PRODUCT table must (and do) have a match in the VENDOR table to ensure

referential integrity.

A few products are supplied factory-direct, a few are made in-house, and a few may have been bought in a

warehouse sale. In other words, a product is not necessarily supplied by a vendor. Therefore, VENDOR is

optional to PRODUCT.

A few of the conditions just described were made for the sake of illustrating specific SQL features. For example, null

V_CODE values were used in the PRODUCT table to illustrate (later) how you can track such nulls using SQL.

3.2.2 Creating the Database

Before you can use a new RDBMS, you must complete two tasks: first, create the database structure, and second,

create the tables that will hold the end-user data. To complete the first task, the RDBMS creates the physical files that will

hold the database. When you create a new database, the RDBMS automatically creates the data dictionary tables in

which to store the metadata and creates a default database administrator. Creating the physical files that will hold the

database means interacting with the operating system and the file systems supported by the operating system.

Therefore, creating the database structure is the one feature that tends to differ substantially from one RDBMS to

another. The good news is that it is relatively easy to create a database structure, regardless of which RDBMS you use.

If you use Microsoft Access, creating the database is simple: start Access, select File > New > Blank Database, specify the

folder in which you want to store the database, and then name the database. However, if you work in a database

environment typically used by larger organizations, you will probably use an enterprise RDBMS such as Oracle, SQL

Server, MySQL, or DB2. Given their security requirements and greater complexity, those database products require a

more elaborate database creation process. (See Appendix N, Creating a New Database using Oracle 11g, for an

illustration of specific instructions to create a database structure in Oracle.)

You will be relieved to discover that, with the exception of the database creation process, most RDBMS vendors use SQL

that deviates little from the ANSI standard SQL. For example, most RDBMSs require that each SQL command ends with

a semicolon. However, some SQL implementations do not use a semicolon. Important syntax differences among

implementations will be highlighted in the Note boxes.

If you are using an enterprise RDBMS, before you can start creating tables you must be authenticated by the RDBMS.

Authentication is the process through which the DBMS verifies that only registered users may access the database. To

be authenticated, you must log on to the RDBMS using a user ID and a password created by the database

administrator. In an enterprise RDBMS, every user ID is associated with a database schema.

3.2.3 The Database Schema

In the SQL environment, a schema is a group of database objects—such as tables and indexes—that are related to

each other. Usually, the schema belongs to a single user or application. A single database can hold multiple schemas

belonging to different users or applications. Think of a schema as a logical grouping of database objects, such as tables,

indexes, and views. Schemas are useful in that they group tables by owner (or function) and enforce a first level of

security by allowing each user to see only the tables that belong to that user.

ANSI SQL standards define a command to create a database schema:

CREATE SCHEMA AUTHORIZATION {creator};

Therefore, if the creator is JONES, use the command:

CREATE SCHEMA AUTHORIZATION JONES;


Most enterprise RDBMSs support that command. However, the command is seldom used directly—that is, from the

command line. (When a user is created, the DBMS automatically assigns a schema to that user.) When the DBMS is

used, the CREATE SCHEMA AUTHORIZATION command must be issued by the user who owns the schema. That is,

if you log on as JONES, you can only use CREATE SCHEMA AUTHORIZATION JONES.

For most RDBMSs, the CREATE SCHEMA AUTHORIZATION is optional. That is why this chapter focuses on the

ANSI SQL commands required to create and manipulate tables.

3.2.4 Data Types

In the data dictionary in Table 3.3, note particularly the data types selected. Keep in mind that data-type selection is

usually dictated by the nature of the data and by the intended use. For example:

P_PRICE clearly requires some kind of numeric data type; defining it as a character field is not acceptable. Just

as clearly, a vendor name is an obvious candidate for a character data type. For example, VARCHAR2(35) fits well

because vendor names are “variable-length” character strings, and in this case, such strings may be up to 35

characters long.

At first glance, it might seem logical to select a numeric data type for V_AREACODE because it contains only

digits. However, adding and subtracting area codes does not yield meaningful results. Therefore, selecting a

character data type is more appropriate. This is true for many common attributes found in business data models. For

example, even though zip codes contain all digits, they must be defined as character data because some zip codes

begin with the digit zero (0), and a numeric data type would cause the leading zero to be dropped.

U.S. state abbreviations are always two characters, so CHAR(2) is a logical choice.

Selecting P_INDATE to be a (Julian) DATE field rather than a character field is desirable because the Julian

dates allow you to make simple date comparisons and to perform date arithmetic. For instance, if you have

used DATE fields, you can determine how many days have elapsed between any two dates.

If you use DATE fields, you can also determine what the date will be in say, 60 days from a given P_INDATE by using

P_INDATE + 60. Or you can use the RDBMS’s system date—SYSDATE in Oracle, GETDATE() in MS SQL Server,

and Date() in Access—to determine the answer to questions such as, “What will be the date 60 days from today?” For

example, you might use SYSDATE + 60 (in Oracle), GETDATE() + 60 (in MS SQL Server), or Date() + 60 (in Access).

Date arithmetic capability is particularly useful in billing. Perhaps you want your system to start charging interest on a

customer balance 60 days after the invoice is generated. Such simple date arithmetic would be impossible if you used a

character data type.
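To make this concrete, here is a minimal sketch of such date arithmetic against the PRODUCT table, written in Oracle syntax (substitute GETDATE() in MS SQL Server or Date() in Access); the DAYS_ON_HAND alias is purely illustrative.

-- How many days has each product been in inventory?
SELECT P_CODE, P_INDATE, SYSDATE - P_INDATE AS DAYS_ON_HAND
FROM PRODUCT;

-- Which products arrived 60 or more days ago?
SELECT P_CODE, P_DESCRIPT, P_INDATE
FROM PRODUCT
WHERE P_INDATE <= SYSDATE - 60;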

Data-type selection sometimes requires professional judgment. For example, you must make a decision about the

V_CODE’s data type as follows:

If you want the computer to generate new vendor codes by adding 1 to the largest recorded vendor code, you

must classify V_CODE as a numeric attribute. (You cannot perform mathematical procedures on character

data.) The designation INTEGER will ensure that only the counting numbers (integers) can be used. Most SQL

implementations also permit the use of SMALLINT for integer values up to six digits.

If you do not want to perform mathematical procedures based on V_CODE, you should classify it as a character

attribute, even though it is composed entirely of numbers. Character data are “quicker” to process in queries.

Therefore, when there is no need to perform mathematical procedures on the attribute, store it as a character

attribute.

The first option is used to demonstrate the SQL procedures in this chapter.


When you define the attribute’s data type, you must pay close attention to the expected use of the attributes for sorting and

data-retrieval purposes. For example, in a real estate application, an attribute that represents the numbers of

bathrooms in a home (H_BATH_NUM) could be assigned the CHAR(3) data type because it is highly unlikely the

application will do any addition, multiplication, or division with the number of bathrooms. Based on the CHAR(3)

data-type definition, valid H_BATH_NUM values would be '2','1','2.5','10'. However, this data-type decision creates

potential problems. For example, if an application sorts the homes by number of bathrooms, a query would “see” the

value '10' as less than '2', which is clearly incorrect. So you must give some thought to the expected use of the data in

order to properly define the attribute data type.
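The following minimal sketch illustrates the sorting problem; the HOME table, its column names, and the CAST conversion are hypothetical and used only for illustration (CAST is ANSI SQL, but support varies slightly across RDBMSs).

-- Character sort: '10' collates before '2'
SELECT H_CODE, H_BATH_NUM
FROM HOME
ORDER BY H_BATH_NUM;

-- Converting to a numeric value restores the expected order
SELECT H_CODE, H_BATH_NUM
FROM HOME
ORDER BY CAST(H_BATH_NUM AS DECIMAL(4,1));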

The data dictionary in Table 3.3 contains only a few of the data types supported by SQL. For teaching purposes, the

selection of data types is limited to ensure that almost any RDBMS can be used to implement the examples. If your

RDBMS is fully compliant with ANSI SQL, it will support many more data types than the ones shown in Table 3.4.

And many RDBMSs support data types beyond the ones specified in ANSI SQL.

TABLE 3.4 Some Common SQL Data Types

DATA TYPE FORMAT COMMENTS

Numeric NUMBER(L,D) The declaration NUMBER(7,2) indicates numbers that will be stored with two decimal places and may be up to seven digits long, including the sign and the decimal place. Examples: 12.32, -134.99.

INTEGER May be abbreviated as INT. Integers are (whole) counting numbers, so they cannot be used if you want to store numbers that require decimal places.

SMALLINT Like INTEGER but limited to integer values up to six digits. If your integer values are relatively small, use SMALLINT instead of INT.

DECIMAL(L,D) Like the NUMBER specification, but the storage length is a minimum specification. That is, greater lengths are acceptable, but smaller ones are not. DECIMAL(9,2), DECIMAL(9), and DECIMAL are all acceptable.

Character CHAR(L) Fixed-length character data for up to 255 characters. If you store strings that are not as long as the CHAR parameter value, the remaining spaces are left unused. Therefore, if you specify CHAR(25), strings such as Smith and Katzenjammer are each stored as 25 characters. However, a U.S. area code is always three digits long, so CHAR(3) would be appropriate if you wanted to store such codes.

VARCHAR(L) or VARCHAR2(L) Variable-length character data. The designation VARCHAR2(25) will let you store characters up to 25 characters long. However, VARCHAR will not leave unused spaces. Oracle automatically converts VARCHAR to VARCHAR2.

Date DATE Stores dates in the Julian date format.

In addition to the data types shown in Table 3.4, SQL supports several other data types, including TIME, TIMESTAMP,

REAL, DOUBLE, FLOAT, and intervals such as INTERVAL DAY TO HOUR. Many RDBMSs have also expanded the list

to include other types of data, such as LOGICAL, CURRENCY, AutoNumber (Access), and sequence (Oracle).

However, because this chapter is designed to introduce the SQL basics, the discussion is limited to the data types

summarized in Table 3.4.


3.2.5 Creating Table Structures

Now you are ready to implement the PRODUCT and VENDOR table structures with the help of SQL, using the

CREATE TABLE syntax shown next.

CREATE TABLE tablename (

column1 data type [constraint] [,

column2 data type [constraint] ] [,

PRIMARY KEY (column1 [, column2]) ] [,

FOREIGN KEY (column1 [, column2]) REFERENCES tablename] [,

CONSTRAINT constraint ] );

O n l i n e C o n t e n t

All the SQL commands you will see in this chapter are located in script files in the Premium Website for this

book. You can copy and paste the SQL commands into your SQL program. Script files are provided for Oracle

and SQL Server users.

To make the SQL code more readable, most SQL programmers use one line per column (attribute) definition. In

addition, spaces are used to line up the attribute characteristics and constraints. Finally, both table and attribute names are

fully capitalized. Those conventions are used in the following examples that create VENDOR and PRODUCT tables and

throughout the book.

Note

SQL SYNTAX

Syntax notation for SQL commands used in this book:

CAPITALS Required SQL command keywords

italics An end-user-provided parameter (generally required)

{a | b | ..} A mandatory parameter; use one option from the list separated by |

[ ] An optional parameter—anything inside square brackets is optional

Tablename The name of a table

Column The name of an attribute in a table

data type A valid data-type definition

constraint A valid constraint definition

condition A valid conditional expression (evaluates to true or false)

columnlist One or more column names or expressions separated by commas

tablelist One or more table names separated by commas

conditionlist One or more conditional expressions separated by logical operators

expression A simple value (such as 76 or Married) or a formula (such as P_PRICE − 10)


CREATE TABLE VENDOR (

V_CODE INTEGER NOT NULL UNIQUE,

V_NAME VARCHAR(35) NOT NULL,

V_CONTACT VARCHAR(15) NOT NULL,

V_AREACODE CHAR(3) NOT NULL,

V_PHONE CHAR(8) NOT NULL,

V_STATE CHAR(2) NOT NULL,

V_ORDER CHAR(1) NOT NULL,

PRIMARY KEY (V_CODE));

Note

Because the PRODUCT table contains a foreign key that references the VENDOR table, create the

VENDOR table first. (In fact, the M side of a relationship always references the 1 side. Therefore, in a 1:M

relationship, you must always create the table for the 1 side first.)

If your RDBMS does not support the VARCHAR2 and FCHAR format, use CHAR.

Oracle accepts the VARCHAR data type and automatically converts it to VARCHAR2.

If your RDBMS does not support SINT or SMALLINT, use INTEGER or INT. If INTEGER is not supported,

use NUMBER.

If you use Access, you can use the NUMBER data type, but you cannot use the number delimiters at the

SQL level. For example, using NUMBER(8,2) to indicate numbers with up to eight characters and two

decimal places is fine in Oracle, but you cannot use it in Access—you must use NUMBER without the

delimiters.

If your RDBMS does not support primary and foreign key designations or the UNIQUE specification,

delete them from the SQL code shown here.

If you use the PRIMARY KEY designation in Oracle, you do not need the NOT NULL and UNIQUE

specifications.

The ON UPDATE CASCADE clause is part of the ANSI standard, but it may not be supported by your

RDBMS. In that case, delete the ON UPDATE CASCADE clause.

CREATE TABLE PRODUCT (

P_CODE VARCHAR(10) NOT NULL UNIQUE,

P_DESCRIPT VARCHAR(35) NOT NULL,

P_INDATE DATE NOT NULL,

P_QOH SMALLINT NOT NULL,

P_MIN SMALLINT NOT NULL,

P_PRICE NUMBER(8,2) NOT NULL,

P_DISCOUNT NUMBER(5,2) NOT NULL,

V_CODE INTEGER,

PRIMARY KEY (P_CODE),

FOREIGN KEY (V_CODE) REFERENCES VENDOR ON UPDATE CASCADE);

As you examine the preceding SQL table-creating command sequences, note the following features:

The NOT NULL specifications for the attributes ensure that a data entry will be made. When it is crucial to have the

data available, the NOT NULL specification will not allow the end user to leave the attribute empty (with no

data entry at all). Because this specification is made at the table level and stored in the data dictionary,

application programs can use this information to create the data dictionary validation automatically.

The UNIQUE specification creates a unique index in the respective attribute. Use it to avoid having duplicated

values in a column.


The primary key attributes contain both a NOT NULL and a UNIQUE specification. Those specifications

enforce the entity integrity requirements. If the NOT NULL and UNIQUE specifications are not supported, use

PRIMARY KEY without the specifications. (For example, if you designate the PK in MS Access, the NOT

NULL and UNIQUE specifications are automatically assumed and are not spelled out.)

The entire table definition is enclosed in parentheses. A comma is used to separate each table element

(attributes, primary key, and foreign key) definition.

Note

If you are working with a composite primary key, all of the primary key’s attributes are contained within the

parentheses and are separated with commas. For example, the LINE table in Figure 3.1 has a primary key that

consists of the two attributes INV_NUMBER and LINE_NUMBER. Therefore, you would define the primary key

by typing:

PRIMARY KEY (INV_NUMBER, LINE_NUMBER),

The order of the primary key components is important because the indexing starts with the first-mentioned

attribute, then proceeds with the next attribute, and so on. In this example, the line numbers would be ordered

within each of the invoice numbers:

INV_NUMBER LINE_NUMBER

1001 1
1001 2
1002 1
1003 1
1003 2

The ON UPDATE CASCADE specification ensures that if you make a change in any VENDOR’s V_CODE, that

change is automatically applied to all foreign key references throughout the system (cascade) to ensure that

referential integrity is maintained. (Although the ON UPDATE CASCADE clause is part of the ANSI standard,

some RDBMSs, such as Oracle, do not support ON UPDATE CASCADE. If your RDBMS does not support the

clause, delete it from the code shown here.)

An RDBMS will automatically enforce referential integrity for foreign keys. That is, you cannot have an invalid

entry in the foreign key column; at the same time, you cannot delete a vendor row as long as a product row

references that vendor.

The command sequence ends with a semicolon. (Remember, your RDBMS may require that you omit the

semicolon.)

Note

NOTE ABOUT COLUMN NAMES

Do not use mathematical symbols such as +, −, and / in your column names; instead, use an underscore to

separate words, if necessary. For example, PER-NUM might generate an error message, but PER_NUM is

acceptable. Also, do not use reserved words. Reserved words are words used by SQL to perform specific functions. For

example, in some RDBMSs, the column name INITIAL will generate the message invalid column name.


Note

NOTE TO ORACLE USERS

When you press the Enter key after typing each line, a line number is automatically generated as long as you do not

type a semicolon before pressing the Enter key. For example, Oracle's execution of the CREATE TABLE

command will look like this:

CREATE TABLE PRODUCT (

2 P_CODE VARCHAR2(10)

3 CONSTRAINT PRODUCT_P_CODE_PK PRIMARY KEY,

4 P_DESCRIPT VARCHAR2(35) NOT NULL,

5 P_INDATE DATE NOT NULL,

6 P_QOH NUMBER NOT NULL,

7 P_MIN NUMBER NOT NULL,

8 P_PRICE NUMBER(8,2) NOT NULL,

9 P_DISCOUNT NUMBER(5,2) NOT NULL,

10 V_CODE NUMBER,

11 CONSTRAINT PRODUCT_V_CODE_FK

12 FOREIGN KEY (V_CODE) REFERENCES VENDOR)

13 ;

In the preceding SQL command sequence, note the following:

The attribute definition for P_CODE starts in line 2 and ends with a comma at the end of line 3.

The CONSTRAINT clause (line 3) allows you to define and name a constraint in Oracle. You can name the

constraint to meet your own naming conventions. In this case, the constraint was named PRODUCT_P_CODE_PK.

Examples of constraints are NOT NULL, UNIQUE, PRIMARY KEY, FOREIGN KEY, and CHECK. For

additional details about constraints, see below.

To define a PRIMARY KEY constraint, you could also use the following syntax: P_CODE VARCHAR2(10)

PRIMARY KEY,.

In this case, Oracle would automatically name the constraint.

Lines 11 and 12 define a FOREIGN KEY constraint named PRODUCT_V_CODE_FK for the attribute

V_CODE. The CONSTRAINT clause is generally used at the end of the CREATE TABLE command

sequence.

If you do not name the constraints yourself, Oracle will automatically assign a name. Unfortunately, the Oracle-

assigned name makes sense only to Oracle, so you will have a difficult time deciphering it later. You should assign a

name that makes sense to human beings!

3.2.6 SQL Constraints

In Chapter 3, The Relational Database Model, you learned that adherence to rules on entity integrity and referential

integrity is crucial in a relational database environment. Fortunately, most SQL implementations support both integrity

rules. Entity integrity is enforced automatically when the primary key is specified in the CREATE TABLE command

sequence. For example, you can create the VENDOR table structure and set the stage for the enforcement of entity

integrity rules by using:

PRIMARY KEY (V_CODE)

In the PRODUCT table’s CREATE TABLE sequence, note that referential integrity has been enforced by specifying in

the PRODUCT table:

FOREIGN KEY (V_CODE) REFERENCES VENDOR ON UPDATE CASCADE


That foreign key constraint definition ensures that:

You cannot delete a vendor from the VENDOR table if at least one product row references that vendor. This is

the default behavior for the treatment of foreign keys.

On the other hand, if a change is made in an existing VENDOR table’s V_CODE, that change must be reflected

automatically in any PRODUCT table V_CODE reference (ON UPDATE CASCADE). That restriction makes it

impossible for a V_CODE value to exist in the PRODUCT table pointing to a nonexistent VENDOR table

V_CODE value. In other words, the ON UPDATE CASCADE specification ensures the preservation of

referential integrity. (Oracle does not support ON UPDATE CASCADE.)

In general, ANSI SQL permits the use of ON DELETE and ON UPDATE clauses to cover CASCADE, SET NULL, or

SET DEFAULT.
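For example, the following hedged sketch (a shortened, hypothetical variant of the PRODUCT table, not the design used in this chapter) declares the foreign key so that deleting a vendor sets the matching V_CODE values to null instead of blocking the deletion; support for these actions varies by RDBMS, as the Note below explains.

-- Hypothetical variant for illustration only; most PRODUCT columns are omitted.
-- V_CODE must be allowed to accept nulls for SET NULL to work.
CREATE TABLE PRODUCT (
P_CODE VARCHAR(10) PRIMARY KEY,
P_DESCRIPT VARCHAR(35) NOT NULL,
V_CODE INTEGER,
FOREIGN KEY (V_CODE) REFERENCES VENDOR ON DELETE SET NULL);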

O n l i n e C o n t e n t

For a more detailed discussion of the options for the ON DELETE and ON UPDATE clauses, see Appendix D,

Converting an ER Model into a Database Structure, Section D.2, General Rules Governing Relationships

Among Tables. Appendix D is in the Premium Website.

Note

NOTE ABOUT REFERENTIAL CONSTRAINT ACTIONS

The support for the referential constraints actions varies from product to product. For example:

MS Access, SQL Server, and Oracle support ON DELETE CASCADE.

MS Access and SQL Server support ON UPDATE CASCADE.

Oracle does not support ON UPDATE CASCADE.

Oracle supports SET NULL.

MS Access and SQL Server do not support SET NULL.

Refer to your product manuals for additional information on referential constraints.

While MS Access does not support ON DELETE CASCADE or ON UPDATE CASCADE at the SQL

command-line level, it does support them through the relationship window interface. In fact, whenever you try

to establish a relationship between two tables in Access, the relationship window interface will automatically

pop up.

Besides the PRIMARY KEY and FOREIGN KEY constraints, the ANSI SQL standard also defines the following

constraints:

The NOT NULL constraint ensures that a column does not accept nulls.

The UNIQUE constraint ensures that all values in a column are unique.

The DEFAULT constraint assigns a value to an attribute when a new row is added to a table. The end user may, of

course, enter a value other than the default value.

The CHECK constraint is used to validate data when an attribute value is entered. The CHECK constraint does

precisely what its name suggests: it checks to see that a specified condition exists. Examples of such constraints

include the following:

- The minimum order value must be at least 10.
- The date must be after April 15, 2010.


If the CHECK constraint is met for the specified attribute (that is, the condition is true), the data are accepted for that

attribute. If the condition is found to be false, an error message is generated and the data are not accepted.
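Expressed in SQL, those two example rules might look like the following sketch; the ORDERS table and its column names are hypothetical, and the TO_DATE function is Oracle-specific (other RDBMSs use their own date literals).

-- Hypothetical table used only to illustrate CHECK constraints
CREATE TABLE ORDERS (
ORD_NUM NUMBER PRIMARY KEY,
ORD_VALUE NUMBER(9,2) NOT NULL CHECK (ORD_VALUE >= 10),
ORD_DATE DATE NOT NULL CHECK (ORD_DATE > TO_DATE('15-APR-2010','DD-MON-YYYY')));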

Note that the CREATE TABLE command lets you define constraints in two different places:

When you create the column definition (known as a column constraint).

When you use the CONSTRAINT keyword (known as a table constraint).

A column constraint applies to just one column; a table constraint may apply to many columns. Those constraints are

supported at varying levels of compliance by enterprise RDBMSs.

In this chapter, Oracle is used to illustrate SQL constraints. For example, note that the following SQL command

sequence uses the DEFAULT and CHECK constraints to define the table named CUSTOMER.

CREATE TABLE CUSTOMER (

CUS_CODE NUMBER PRIMARY KEY,

CUS_LNAME VARCHAR(15) NOT NULL,

CUS_FNAME VARCHAR(15) NOT NULL,

CUS_INITIAL CHAR(1),

CUS_AREACODE CHAR(3) DEFAULT '615' NOT NULL

CHECK(CUS_AREACODE IN ('615','713','931')),

CUS_PHONE CHAR(8) NOT NULL,

CUS_BALANCE NUMBER(9,2) DEFAULT 0.00,

CONSTRAINT CUS_UI1 UNIQUE (CUS_LNAME, CUS_FNAME));

In this case, the CUS_AREACODE attribute is assigned a default value of '615'. Therefore, if a new CUSTOMER table row is

added and the end user makes no entry for the area code, the '615' value will be recorded. Also note that the CHECK

condition restricts the values for the customer’s area code to 615, 713, and 931; any other values will be rejected.

It is important to note that the DEFAULT value applies only when new rows are added to a table and then only when no

value is entered for the customer’s area code. (The default value is not used when the table is modified.) In contrast, the

CHECK condition is validated whether a customer row is added or modified. However, while the CHECK

condition may include any valid expression, it applies only to the attributes in the table being checked. If you want to

check for conditions that include attributes in other tables, you must use triggers. (See Chapter 8, Advanced SQL.)

Finally, the last line of the CREATE TABLE command sequence creates a unique index constraint (named CUS_UI1) on

the customer’s last name and first name. The index will prevent the entry of two customers with the same last name and

first name. (This index merely illustrates the process. Clearly, it should be possible to have more than one person named

John Smith in the CUSTOMER table.)

Note

NOTE TO MS ACCESS USERS

MS Access does not accept the DEFAULT or CHECK constraints. However, MS Access will accept the

CONSTRAINT CUS_UI1 UNIQUE (CUS_LNAME, CUS_FNAME) line and create the unique index.

In the following SQL command to create the INVOICE table, the DEFAULT constraint assigns a default date to a new

invoice, and the CHECK constraint validates that the invoice date is greater than January 1, 2010.

CREATE TABLE INVOICE (

INV_NUMBER NUMBER PRIMARY KEY,

CUS_CODE NUMBER NOT NULL REFERENCES CUSTOMER(CUS_CODE),

INV_DATE DATE DEFAULT SYSDATE NOT NULL,

CONSTRAINT INV_CK1 CHECK (INV_DATE > TO_DATE('01-JAN-2010','DD-MON-YYYY')));


In this case, notice the following:

The CUS_CODE attribute definition contains REFERENCES CUSTOMER (CUS_CODE) to indicate that the

CUS_CODE is a foreign key. This is another way to define a foreign key.

The DEFAULT constraint uses the SYSDATE special function. This function always returns today’s date.

The invoice date (INV_DATE) attribute is automatically given today’s date (returned by SYSDATE) when a new

row is added and no value is given for the attribute.

A CHECK constraint is used to validate that the invoice date is greater than 'January 1, 2010'. When

comparing a date to a manually entered date in a CHECK clause, Oracle requires the use of the TO_DATE

function. The TO_DATE function takes two parameters: the literal date and the date format used.

The final SQL command sequence creates the LINE table. The LINE table has a composite primary key (INV_NUMBER, LINE_NUMBER) and uses a UNIQUE constraint on INV_NUMBER and P_CODE to ensure that the same

product is not ordered twice in the same invoice.

CREATE TABLE LINE (

INV_NUMBER NUMBER NOT NULL,

LINE_NUMBER NUMBER(2,0) NOT NULL,

P_CODE VARCHAR(10) NOT NULL,

LINE_UNITS NUMBER(9,2) DEFAULT 0.00 NOT NULL,

LINE_PRICE NUMBER(9,2) DEFAULT 0.00 NOT NULL,

PRIMARY KEY (INV_NUMBER, LINE_NUMBER),

FOREIGN KEY (INV_NUMBER) REFERENCES INVOICE ON DELETE CASCADE,

FOREIGN KEY (P_CODE) REFERENCES PRODUCT(P_CODE),

CONSTRAINT LINE_UI1 UNIQUE(INV_NUMBER, P_CODE));

In the creation of the LINE table, note that a UNIQUE constraint is added to prevent the duplication of an invoice line.

A UNIQUE constraint is enforced through the creation of a unique index. Also note that the ON DELETE CASCADE

foreign key action enforces referential integrity. The use of ON DELETE CASCADE is recommended for weak entities

to ensure that the deletion of a row in the strong entity automatically triggers the deletion of the corresponding rows

in the dependent weak entity. In that case, the deletion of an INVOICE row will automatically delete all of the LINE

rows related to the invoice. In the following section, you will learn more about indexes and how to use SQL commands

to create them.

3.2.7 SQL Indexes

You learned in Chapter 3 that indexes can be used to improve the efficiency of searches and to avoid duplicate column

values. In the previous section, you saw how to declare unique indexes on selected attributes when the table is created. In

fact, when you declare a primary key, the DBMS automatically creates a unique index. Even with this feature, you often

need additional indexes. The ability to create indexes quickly and efficiently is important. Using the CREATE INDEX

command, SQL indexes can be created on the basis of any selected attribute. The syntax is:

CREATE [UNIQUE] INDEX indexname ON tablename(column1 [, column2])

For example, based on the attribute P_INDATE stored in the PRODUCT table, the following command creates an

index named P_INDATEX:

CREATE INDEX P_INDATEX ON PRODUCT(P_INDATE);

SQL does not let you write over an existing index without warning you first, thus preserving the index structure within

the data dictionary. Using the UNIQUE index qualifier, you can even create an index that prevents you from using a

value that has been used before. Such a feature is especially useful when the index attribute is a candidate key whose

values must not be duplicated:


CREATE UNIQUE INDEX P_CODEX ON PRODUCT(P_CODE);

If you now try to enter a duplicate P_CODE value, SQL produces the error message “duplicate value in index.” Many

RDBMSs, including Access, automatically create a unique index on the PK attribute(s) when you declare the PK.

A common practice is to create an index on any field that is used as a search key, in comparison operations in a

conditional expression, or when you want to list rows in a specific order. For example, if you want to create a report of

all products by vendor, it would be useful to create an index on the V_CODE attribute in the PRODUCT table.

Remember that a vendor can supply many products. Therefore, you should not create a UNIQUE index in this case.

Better yet, to make the search as efficient as possible, using a composite index is recommended.
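For instance, a nonunique composite index on the vendor code and the product code might support such a vendor-by-vendor product report; the index name PROD_VENDX below is illustrative only.

-- Nonunique composite index to speed vendor-based product lookups
CREATE INDEX PROD_VENDX ON PRODUCT(V_CODE, P_CODE);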

Unique composite indexes are often used to prevent data duplication. For example, consider the case illustrated in

Table 3.5, in which required employee test scores are stored. (An employee can take a test only once on a given date.)

Given the structure of Table 3.5, the PK is EMP_NUM + TEST_NUM. The third test entry for employee 111 meets

entity integrity requirements—the combination 111,3 is unique—yet the WEA test entry is clearly duplicated.

TABLE 3.5 A Duplicated Test Record

EMP_NUM TEST_NUM TEST_CODE TEST_DATE TEST_SCORE

110 1 WEA 15-Jan-2010 93

110 2 WEA 12-Jan-2010 87

111 1 HAZ 14-Dec-2009 91

111 2 WEA 18-Feb-2010 95

111 3 WEA 18-Feb-2010 95

112 1 CHEM 17-Aug-2009 91

Such duplication could have been avoided through the use of a unique composite index, using the attributes

EMP_NUM, TEST_CODE, and TEST_DATE:

CREATE UNIQUE INDEX EMP_TESTDEX ON TEST(EMP_NUM, TEST_CODE, TEST_DATE);

By default, all indexes produce results that are listed in ascending order, but you can create an index that yields output in

descending order. For example, if you routinely print a report that lists all products ordered by price from highest to

lowest, you could create an index named PROD_PRICEX by typing:

CREATE INDEX PROD_PRICEX ON PRODUCT(P_PRICE DESC);

To delete an index, use the DROP INDEX command:

DROP INDEX indexname

For example, if you want to eliminate the PROD_PRICEX index, type:

DROP INDEX PROD_PRICEX;

After creating the tables and some indexes, you are ready to start entering data. The following sections use two tables

(VENDOR and PRODUCT) to demonstrate most of the data manipulation commands.


3.3 DATA MANIPULATION COMMANDS

In this section, you will learn how to use the basic SQL data manipulation commands INSERT, SELECT, COMMIT,

UPDATE, ROLLBACK, and DELETE.

3.3.1 Adding Table Rows

SQL requires the use of the INSERT command to enter data into a table. The INSERT command’s basic syntax looks

like this:

INSERT INTO tablename VALUES (value1, value2, ... , valuen)

Because the PRODUCT table uses its V_CODE to reference the VENDOR table’s V_CODE, an integrity violation will

occur if those VENDOR table V_CODE values don’t yet exist. Therefore, you need to enter the VENDOR rows before

the PRODUCT rows. Given the VENDOR table structure defined earlier and the sample VENDOR data shown in

Figure 3.2, you would enter the first two data rows as follows:

INSERT INTO VENDOR

VALUES (21225,'Bryson, Inc.','Smithson','615','223-3234','TN','Y');

INSERT INTO VENDOR

VALUES (21226,'Superloo, Inc.','Flushing','904','215-8995','FL','N');

and so on, until all of the VENDOR table records have been entered.

(To see the contents of the VENDOR table, use the SELECT * FROM VENDOR; command.)

The PRODUCT table rows would be entered in the same fashion, using the PRODUCT data shown in Figure 3.2. For

example, the first two data rows would be entered as follows, pressing the Enter key at the end of each line:

INSERT INTO PRODUCT

VALUES ('11QER/31','Power painter, 15 psi., 3-nozzle','03-Nov-09',8,5,109.99,0.00,25595);

INSERT INTO PRODUCT

VALUES ('13-Q2/P2','7.25-in. pwr. saw blade','13-Dec-09',32,15,14.99, 0.05, 21344);

(To see the contents of the PRODUCT table, use the SELECT * FROM PRODUCT; command.)

Note

Date entry is a function of the date format expected by the DBMS. For example, March 25, 2010 might be

shown as 25-Mar-2010 in Access and Oracle, or it might be displayed in other presentation formats in another

RDBMS. MS Access requires the use of # delimiters when performing any computations or comparisons based on

date attributes, as in P_INDATE >= #25-Mar-10#.

In the preceding data entry lines, observe that:

The row contents are entered between parentheses. Note that the first character after VALUES is a parenthesis

and that the last character in the command sequence is also a parenthesis.

Character (string) and date values must be entered between apostrophes (').

Numerical entries are not enclosed in apostrophes.

Attribute entries are separated by commas.

A value is required for each column in the table.

This version of the INSERT commands adds one table row at a time.


Inserting Rows with Null Attributes

Thus far, you have entered rows in which all of the attribute values are specified. But what do you do if a product does

not have a vendor or if you don’t yet know the vendor code? In those cases, you would want to leave the vendor code

null. To enter a null, use the following syntax:

INSERT INTO PRODUCT

VALUES ('BRT-345','Titanium drill bit','18-Oct-09', 75, 10, 4.50, 0.06, NULL);

Incidentally, note that the NULL entry is accepted only because the V_CODE attribute is optional—the NOT NULL

declaration was not used in the CREATE TABLE statement for this attribute.

Inserting Rows with Optional Attributes

There might be occasions when more than one attribute is optional. Rather than declaring each attribute as NULL in the

INSERT command, you can indicate just the attributes that have required values. You do that by listing the attribute names

inside parentheses after the table name. For the purpose of this example, assume that the only required attributes for

the PRODUCT table are P_CODE and P_DESCRIPT:

INSERT INTO PRODUCT(P_CODE, P_DESCRIPT) VALUES ('BRT-345','Titanium drill bit');

3.3.2 Saving Table Changes

Any changes made to the table contents are not saved on disk until you close the database, close the program you are

using, or use the COMMIT command. If the database is open and a power outage or some other interruption occurs

before you issue the COMMIT command, your changes will be lost and only the original table contents will be retained.

The syntax for the COMMIT command is:

COMMIT [WORK]

The COMMIT command permanently saves all changes—such as rows added, attributes modified, and rows

deleted—made to any table in the database. Therefore, if you intend to make your changes to the PRODUCT table

permanent, it is a good idea to save those changes by using:

COMMIT;

Note

NOTE TO MS ACCESS USERS

MS Access doesn't support the COMMIT command because it automatically saves changes after the execution of

each SQL command.

However, the COMMIT command’s purpose is not just to save changes. In fact, the ultimate purpose of the COMMIT

and ROLLBACK commands (see Section 3.3.5) is to ensure database update integrity in transaction management.

(You will see how such issues are addressed in Chapter 10, Transaction Management and Concurrency Control.)

3.3.3 Listing Table Rows

The SELECT command is used to list the contents of a table. The syntax of the SELECT command is as follows:

SELECT columnlist FROM tablename


The columnlist represents one or more attributes, separated by commas. You could use the * (asterisk) as a wildcard

character to list all attributes. A wildcard character is a symbol that can be used as a general substitute for other

characters or commands. For example, to list all attributes and all rows of the PRODUCT table, use:

SELECT * FROM PRODUCT;

Figure 3.3 shows the output generated by that command. (Figure 3.3 shows all of the rows in the PRODUCT table

that serve as the basis for subsequent discussions. If you entered only the PRODUCT table’s first two records, as shown

in the preceding section, the output of the preceding SELECT command would show only the rows you entered. Don’t

worry about the difference between your SELECT output and the output shown in Figure 3.3. When you complete the

work in this section, you will have created and populated your VENDOR and PRODUCT tables with the correct rows

for use in future sections.)

FIGURE 3.3 The contents of the PRODUCT table

Note

Your listing may not be in the order shown in Figure 3.3. The listings shown in the figure are the result of

system-controlled primary-key-based index operations. You will learn later how to control the output so that it

conforms to the order you have specified.

Note

NOTE TO ORACLE USERS

Some SQL implementations (such as Oracle's) cut the attribute labels to fit the width of the column. However,

Oracle lets you set the width of the display column to show the complete attribute name. You can also change the

display format, regardless of how the data are stored in the table. For example, if you want to display dollar symbols

and commas in the P_PRICE output, you can declare:

COLUMN P_PRICE FORMAT $99,999.99

to change the output 12347.67 to $12,347.67.

In the same manner, to display only the first 12 characters of the P_DESCRIPT attribute, use:

COLUMN P_DESCRIPT FORMAT A12 TRUNCATE


Although SQL commands can be grouped together on a single line, complex command sequences are best shown on

separate lines, with space between the SQL command and the command’s components. Using that formatting

convention makes it much easier to see the components of the SQL statements, making it easy to trace the SQL logic, and

if necessary, to make corrections. The number of spaces used in the indention is up to you. For example, note the

following format for a more complex statement:

SELECT P_CODE, P_DESCRIPT, P_INDATE, P_QOH, P_MIN, P_PRICE, P_DISCOUNT, V_CODE

FROM PRODUCT;

When you run a SELECT command on a table, the RDBMS returns a set of one or more rows that have the same

characteristics as a relational table. In addition, the SELECT command lists all rows from the table you specified in the

FROM clause. This is a very important characteristic of SQL commands. By default, most SQL data manipulation

commands operate over an entire table (or relation). That is why SQL commands are said to be set-oriented

commands. A SQL set-oriented command works over a set of rows. The set may include one or more columns and

zero or more rows from one or more tables.

3.3.4 Updating Table Rows

Use the UPDATE command to modify data in a table. The syntax for this command is:

UPDATE tablename

SET columnname = expression [, columnname = expression]

[WHERE conditionlist ];

For example, if you want to change P_INDATE from December 13, 2009, to January 18, 2010, in the second row of

the PRODUCT table (see Figure 3.3), use the primary key (13-Q2/P2) to locate the correct (second) row. Therefore, type:

UPDATE PRODUCT

SET P_INDATE = '18-JAN-2010'

WHERE P_CODE = '13-Q2/P2';

If more than one attribute is to be updated in the row, separate the corrections with commas:

UPDATE PRODUCT

SET P_INDATE = '18-JAN-2010', P_PRICE = 17.99, P_MIN = 10

WHERE P_CODE = '13-Q2/P2';

What would have happened if the previous UPDATE command had not included the WHERE condition? The

P_INDATE, P_PRICE, and P_MIN values would have been changed in all rows of the PRODUCT table. Remember,

the UPDATE command is a set-oriented operator. Therefore, if you don’t specify a WHERE condition, the UPDATE

command will apply the changes to all rows in the specified table.
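For example, the following hedged sketch omits the WHERE clause on purpose and therefore sets the discount of every product to 10 percent; issue such a statement only when a table-wide change is exactly what you intend.

-- No WHERE clause: every row in PRODUCT is affected
UPDATE PRODUCT
SET P_DISCOUNT = 0.10;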

Confirm the correction(s) by using this SELECT command to check the PRODUCT table’s listing:

SELECT * FROM PRODUCT;

3.3.5 Restoring Table Contents

If you have not yet used the COMMIT command to store the changes permanently in the database, you can restore the

database to its previous condition with the ROLLBACK command. ROLLBACK undoes any changes since the last

COMMIT command and brings the data back to the values that existed before the changes were made. To restore the

data to their “prechange” condition, type:

ROLLBACK;


and then press the Enter key. Use the SELECT statement again to see that the ROLLBACK did, in fact, restore the

data to their original values.

COMMIT and ROLLBACK work only with data manipulation commands that are used to add, modify, or delete table

rows. For example, assume that you perform these actions:

1. CREATE a table called SALES.

2. INSERT 10 rows in the SALES table.

3. UPDATE two rows in the SALES table.

4. Execute the ROLLBACK command.

Will the SALES table be removed by the ROLLBACK command? No, the ROLLBACK command will undo only the

results of the INSERT and UPDATE commands. All data definition commands (CREATE TABLE) are automatically

committed to the data dictionary and cannot be rolled back. The COMMIT and ROLLBACK commands are examined in

greater detail in Chapter 10.
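The following sketch mirrors that scenario on a smaller scale; the SALES table and its columns are hypothetical, and the exact auto-commit behavior depends on your RDBMS.

CREATE TABLE SALES (
SALE_NUM NUMBER PRIMARY KEY,
SALE_AMOUNT NUMBER(9,2));
INSERT INTO SALES VALUES (1001, 150.00);
INSERT INTO SALES VALUES (1002, 75.50);
UPDATE SALES SET SALE_AMOUNT = 80.00 WHERE SALE_NUM = 1002;
ROLLBACK;
-- The SALES table still exists (CREATE TABLE was committed automatically),
-- but the inserted and updated rows are gone.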

Note

NOTE TO MS ACCESS USERS

MS Access does not support the ROLLBACK command.

Some RDBMSs, such as Oracle, automatically COMMIT data changes when issuing data definition commands. For

example, if you had used the CREATE INDEX command after updating the two rows in the previous example, all

previous changes would have been committed automatically; doing a ROLLBACK afterward wouldn’t have undone

anything. Check your RDBMS manual to understand these subtle differences.

3.3.6 Deleting Table Rows

It is easy to delete a table row using the DELETE statement; the syntax is:

DELETE FROM tablename

[WHERE conditionlist ];

For example, if you want to delete from the PRODUCT table the product that you added earlier whose code (P_CODE)

is 'BRT-345', use:

DELETE FROM PRODUCT

WHERE P_CODE = 'BRT-345';

In that example, the primary key value lets SQL find the exact record to be deleted. However, deletions are not limited to

a primary key match; any attribute may be used. For example, in your PRODUCT table, you will see that there are

several products for which the P_MIN attribute is equal to 5. Use the following command to delete all rows from the

PRODUCT table for which the P_MIN is equal to 5:

DELETE FROM PRODUCT

WHERE P_MIN = 5;

Check the PRODUCT table’s contents again to verify that all products with P_MIN equal to 5 have been deleted.

Finally, remember that DELETE is a set-oriented command. And keep in mind that the WHERE condition is optional.

Therefore, if you do not specify a WHERE condition, all rows from the specified table will be deleted!


3.3.7 Inserting Table Rows with a Select Subquery

You learned in Section 3.3.1 how to use the INSERT statement to add rows to a table. In that section, you added rows one

at a time. In this section, you will learn how to add multiple rows to a table, using another table as the source of the

data. The syntax for the INSERT statement is:

INSERT INTO tablename SELECT columnlist FROM tablename;

In that case, the INSERT statement uses a SELECT subquery. A subquery, also known as a nested query or an

inner query, is a query that is embedded (or nested) inside another query. The inner query is always executed first by the

RDBMS. Given the previous SQL statement, the INSERT portion represents the outer query, and the SELECT portion

represents the subquery. You can nest queries (place queries inside queries) many levels deep; in every case, the output

of the inner query is used as the input for the outer (higher-level) query. In Chapter 8 you will learn more about the

various types of subqueries.

The values returned by the SELECT subquery should match the attributes and data types of the table in the INSERT

statement. If the table into which you are inserting rows has one date attribute, one number attribute, and one

character attribute, the SELECT subquery should return one or more rows in which the first column has date values,

the second column has number values, and the third column has character values.
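As a hedged illustration, the following sketch copies the code, description, and price of every product into a hypothetical PRODUCT_ARCHIVE table whose three columns have matching data types:

-- PRODUCT_ARCHIVE is a hypothetical table with compatible columns
INSERT INTO PRODUCT_ARCHIVE (P_CODE, P_DESCRIPT, P_PRICE)
SELECT P_CODE, P_DESCRIPT, P_PRICE
FROM PRODUCT;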

O n l i n e C o n t e n t

Before you execute the commands in the following sections, you MUST do the following:

If you are using Oracle, run the sqlintrodbinit.sql script file in the Premium Website to create all tables and

load the data in the database.

If you are using Access, copy the original Ch07_SaleCo.mbd file from the Premium Website.

3.4 SELECT QUERIES

In this section, you will learn how to fine-tune the SELECT command by adding restrictions to the search criteria. SELECT,

coupled with appropriate search conditions, is an incredibly powerful tool that enables you to transform data into

information. For example, in the following sections, you will learn how to create queries that can be used to answer

questions such as these: “What products were supplied by a particular vendor?” “Which products are priced below $10?”

“How many products supplied by a given vendor were sold between January 5, 2010 and March 20, 2010?”

3.4.1 Selecting Rows with Conditional Restrictions

You can select partial table contents by placing restrictions on the rows to be included in the output. This is done by

using the WHERE clause to add conditional restrictions to the SELECT statement. The following syntax enables you

to specify which rows to select:

SELECT columnlist

FROM tablelist

[WHERE conditionlist ];

The SELECT statement retrieves all rows that match the condition(s)—also known as the conditional criteria—specified in the WHERE clause. The conditionlist in the WHERE clause of the SELECT statement is

represented by one or more conditional expressions, separated by logical operators. The WHERE clause is optional.


If no rows match the specified criteria in the WHERE clause, you see a blank screen or a message that tells you that no

rows were retrieved. For example, the query:

SELECT P_DESCRIPT, P_INDATE, P_PRICE, V_CODE

FROM PRODUCT

WHERE V_CODE = 21344;

returns the description, date, and price of products with a vendor code of 21344, as shown in Figure 3.4.

(Figure 3.4 Selected PRODUCT table attributes for VENDOR code 21344)

MS Access users can use the Access QBE (query by example) query generator. Although the Access QBE generates its own “native” version of SQL, you can also elect to type standard SQL in the Access SQL window, as shown at the bottom of Figure 3.5. Figure 3.5 shows the Access QBE screen, the SQL window’s QBE-generated SQL, and the listing of the modified SQL.

(Figure 3.5 The Microsoft Access QBE and its SQL)

Numerous conditional restrictions can be placed on the selected table contents. For example, the comparison

operators shown in Table 3.6 can be used to restrict output.


Note

NOTE TO MS ACCESS USERS

The MS Access QBE interface automatically designates the data source by using the table name as a prefix. You

will discover later that the table name prefix is used to avoid ambiguity when the same column name appears

in multiple tables. For example, both the VENDOR and the PRODUCT tables contain the V_CODE attribute.

Therefore, if both tables are used (as they would be in a join), the source of the V_CODE attribute must be

specified.
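As a brief sketch of what that qualification looks like (the join itself is covered later), each copy of V_CODE is prefixed with its table name:

SELECT PRODUCT.P_DESCRIPT, VENDOR.V_NAME
FROM PRODUCT, VENDOR
WHERE PRODUCT.V_CODE = VENDOR.V_CODE;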

TABLE 3.6 Comparison Operators

SYMBOL    MEANING

= Equal to

< Less than

<= Less than or equal to

> Greater than

>= Greater than or equal to

<> or != Not equal to

The following example uses the “not equal to” operator:

SELECT P_DESCRIPT, P_INDATE, P_PRICE, V_CODE

FROM PRODUCT

WHERE V_CODE <> 21344;

The output, shown in Figure 3.6, lists all of the rows for which the vendor code is not 21344.

Note that, in Figure 3.6, rows with nulls in the V_CODE column (see Figure 3.3) are not included in the SELECT

command’s output.

(Figure 3.6 Selected PRODUCT table attributes for VENDOR codes other than 21344)
(Figure 3.7 Selected PRODUCT table attributes with a P_PRICE restriction)

The command sequence:

SELECT P_DESCRIPT, P_QOH, P_MIN, P_PRICE

FROM PRODUCT

WHERE P_PRICE <= 10;

yields the output shown in Figure 3.7.

Using Comparison Operators on Character Attributes

Because computers identify all characters by their (numeric)

American Standard Code for Information Interchange

(ASCII) codes, comparison operators may even be used to

place restrictions on character-based attributes. Therefore,

the command:

SELECT P_CODE, P_DESCRIPT, P_QOH, P_MIN, P_PRICE

FROM PRODUCT

WHERE P_CODE < '1558-QW1';

would be correct and would yield a list of all rows in which the

P_CODE is alphabetically less than 1558-QW1. (Because the


ASCII code value for the letter B is greater than the value of the letter A, it follows that A is less than B.) Therefore, the

output will be generated as shown in Figure 3.8.

(Figure 3.8 Selected PRODUCT table attributes: the ASCII code effect)

String (character) comparisons are made from left to right.

This left-to-right comparison is especially useful when

attributes such as names are to be compared. For example,

the string “Ardmore” would be judged greater than the

string “Aarenson” but less than the string “Brown”; such

results may be used to generate alphabetical listings like

those found in a phone directory. If the characters 0−9 are

stored as strings, the same left-to-right string comparisons

can lead to apparent anomalies. For example, the ASCII

code for the character “5” is, as expected, greater than the

ASCII code for the character “4.” Yet the same “5” will also be judged greater than the string “44” because the first

character in the string “44” is less than the character “5.” For that reason, you may get some unexpected results from

comparisons when dates or other numbers are stored in character format. This also applies to date comparisons. For

example, the left-to-right ASCII character comparison would force the conclusion that the date “01/01/2010”

occurred before “12/31/2009.” Because the leftmost character “0” in “01/01/2010” is less than the leftmost

character “1” in “12/31/2009,” “01/01/2010” is less than “12/31/2009.” Naturally, if date strings are stored in

a yyyy/mm/dd format, the comparisons will yield appropriate results, but this is a nonstandard date presentation.

That’s why all current RDBMSs support “date” data types; you should use them. In addition, using “date” data types

gives you the benefit of date arithmetic.

Using Comparison Operators on Dates

Date procedures are often more software-specific than other SQL procedures. For example, the query to list all of the

rows in which the inventory stock dates occur on or after January 20, 2010 will look like this:

SELECT P_DESCRIPT, P_QOH, P_MIN, P_PRICE, P_INDATE

FROM PRODUCT

WHERE P_INDATE >= '20-Jan-2010';

(Remember that MS Access users must use the # delimiters for dates. For example, you would use #20-Jan-10# in the

above WHERE clause.) The date-restricted output is shown in Figure 3.9.

(Figure 3.9 Selected PRODUCT table attributes: date restriction)

Using Computed Columns and Column Aliases

Suppose that you want to determine the total value of each

of the products currently held in inventory. Logically, that

determination requires the multiplication of each product’s

quantity on hand by its current price. You can accomplish

this task with the following command:

SELECT P_DESCRIPT, P_QOH, P_PRICE, P_QOH *

P_PRICE

FROM PRODUCT;


(Figure 3.10 SELECT statement with a computed column)

Entering that SQL command in Access generates the output shown in Figure 3.10.

SQL accepts any valid expressions (or formulas) in the

computed columns. Such formulas can contain any valid

mathematical operators and functions that are applied to

attributes in any of the tables specified in the FROM clause

of the SELECT statement. Note also that Access automatically adds an Expr label to all computed columns. (The first

computed column would be labeled Expr1; the second,

Expr2; and so on.) Oracle uses the actual formula text as the

label for the computed column.

To make the output more readable, the SQL standard

permits the use of aliases for any column in a SELECT

statement. An alias is an alternative name given to a

column or table in any SQL statement.

For example, you can rewrite the previous SQL statement as:

SELECT P_DESCRIPT, P_QOH, P_PRICE, P_QOH * P_PRICE AS TOTVALUE

FROM PRODUCT;

The output of that command is shown in Figure 3.11.

(Figure 3.11 SELECT statement with a computed column and an alias)

You could also use a computed column, an alias, and date arithmetic in a single query. For example, assume that you

want to get a list of out-of-warranty products that have been

stored more than 90 days. In that case, the P_INDATE is at

least 90 days less than the current (system) date. The MS

Access version of this query is:

SELECT P_CODE, P_INDATE, DATE() - 90 AS CUTDATE

FROM PRODUCT

WHERE P_INDATE <= DATE() - 90;

The Oracle version of the same query is shown here:

SELECT P_CODE, P_INDATE, SYSDATE - 90 AS CUTDATE

FROM PRODUCT

WHERE P_INDATE <= SYSDATE - 90;

Note that DATE() and SYSDATE are special functions that return the current date in MS Access and Oracle,

respectively. You can use the DATE() and SYSDATE functions anywhere a date literal is expected, such as in the value list

of an INSERT statement, in an UPDATE statement when changing the value of a date attribute, or in a SELECT

statement as shown here. Of course, the previous query output would change based on the current date.


Suppose that a manager wants a list of all products, the dates they were received, and the warranty expiration date (90

days from when the product was received). To generate that list, type:

SELECT P_CODE, P_INDATE, P_INDATE + 90 AS EXPDATE

FROM PRODUCT;

Note that you can use all arithmetic operators with date attributes as well as with numeric attributes.

3.4.2 Arithmetic Operators: The Rule of Precedence

As you saw in the previous example, you can use arithmetic operators with table attributes in a column list or in a

conditional expression. In fact, SQL commands are often used in conjunction with the arithmetic operators shown in

Table 3.7.

TABLE 3.7 The Arithmetic Operators

ARITHMETIC OPERATOR DESCRIPTION

+ Add

- Subtract

* Multiply

/ Divide

^ Raise to the power of (some applications use ** instead of ^)

Do not confuse the multiplication symbol (*) with the wildcard symbol used by some SQL implementations, such as MS

Access; the latter is used only in string comparisons, while the former is used in conjunction with mathematical

procedures.

As you perform mathematical operations on attributes, remember the rules of precedence. As the name suggests, the

rules of precedence are the rules that establish the order in which computations are completed. For example, note the

order of the following computational sequence:

1. Perform operations within parentheses.

2. Perform power operations.

3. Perform multiplications and divisions.

4. Perform additions and subtractions.

The application of the rules of precedence will tell you that 8 + 2 * 5 = 8 + 10 = 18, but (8 + 2) * 5 = 10 * 5 = 50.

Similarly, 4 + 5^2 * 3 = 4 + 25 * 3 = 79, but (4 + 5)^2 * 3 = 81 * 3 = 243, while the operation expressed by (4

+ 5^2) * 3 yields the answer (4 + 25) * 3 = 29 * 3 = 87.
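The same rules apply inside a computed column. As a small sketch against the PRODUCT table (the alias names NO_PARENS and WITH_PARENS are used only for this example), the two expressions below return different values for the same row:

SELECT P_CODE, P_QOH + 10 * P_PRICE AS NO_PARENS, (P_QOH + 10) * P_PRICE AS WITH_PARENS
FROM PRODUCT;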

3.4.3 Logical Operators: AND, OR, and NOT

In the real world, a search of data normally involves multiple conditions. For example, when you are buying a new

house, you look for a certain area, a certain number of bedrooms, bathrooms, stories, and so on. In the same way,

SQL allows you to include multiple conditions in a query through the use of logical operators. The logical operators

are AND, OR, and NOT. For example, if you want a list of the table contents for either the V_CODE = 21344 or the

V_CODE = 24288, you can use the OR operator, as in the following command sequence:

SELECT P_DESCRIPT, P_INDATE, P_PRICE, V_CODE

FROM PRODUCT

WHERE V_CODE = 21344 OR V_CODE = 24288;


That command generates the six rows shown in Figure 3.12 that match the logical restriction.

(Figure 3.12 Selected PRODUCT table attributes: the logical OR)

The logical AND has the same SQL syntax requirement. The following command generates a list of all rows for which P_PRICE is less than $50 and for which P_INDATE is a date occurring after January 15, 2010:

SELECT P_DESCRIPT, P_INDATE, P_PRICE, V_CODE
FROM PRODUCT
WHERE P_PRICE < 50
AND P_INDATE > '15-Jan-2010';

This command will produce the output shown in Figure 3.13.

(Figure 3.13 Selected PRODUCT table attributes: the logical AND)

You can combine the logical OR with the logical AND to place further restrictions on the output. For example, suppose that you want a table listing for the following conditions:

The P_INDATE is after January 15, 2010, and the P_PRICE is less than $50.

Or the V_CODE is 24288.

The required listing can be produced by using:

SELECT P_DESCRIPT, P_INDATE, P_PRICE, V_CODE
FROM PRODUCT
WHERE (P_PRICE < 50 AND P_INDATE > '15-Jan-2010')
OR V_CODE = 24288;

Note the use of parentheses to combine logical restrictions. Where you place the parentheses depends on how you want the logical restrictions to be executed. Conditions listed within parentheses are always executed first. The preceding query yields the output shown in Figure 3.14.

(Figure 3.14 Selected PRODUCT table attributes: the logical AND and OR)

Note that the three rows with the V_CODE = 24288 are included regardless of the P_INDATE and P_PRICE entries for those rows.

The use of the logical operators OR and AND can become

quite complex when numerous restrictions are placed on the

query. In fact, a specialty field in mathematics known as

Boolean algebra is dedicated to the use of logical

operators.

The logical operator NOT is used to negate the result of a

conditional expression. That is, in SQL, all conditional

expressions evaluate to true or false. If an expression is true,


the row is selected; if an expression is false, the row is not selected. The NOT logical operator is typically used to find the

rows that do not match a certain condition. For example, if you want to see a listing of all rows for which the

vendor code is not 21344, use the command sequence:

SELECT *

FROM PRODUCT

WHERE NOT (V_CODE = 21344);

Note that the condition is enclosed in parentheses; that practice is optional, but it is highly recommended for clarity.

The logical NOT can be combined with AND and OR.

Note

If your SQL version does not support the logical NOT, you can generate the required output by using the

condition:

WHERE V_CODE <> 21344

If your version of SQL does not support <>, use:

WHERE V_CODE != 21344
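As a small sketch of combining NOT with AND, the following query lists the products that are not supplied by vendor 21344 and that are priced below $50:

SELECT P_DESCRIPT, P_PRICE, V_CODE
FROM PRODUCT
WHERE NOT (V_CODE = 21344)
AND P_PRICE < 50;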

3.4.4 Special Operators

ANSI-standard SQL allows the use of special operators in conjunction with the WHERE clause. These special operators

include:

BETWEEN: Used to check whether an attribute value is within a range
IS NULL: Used to check whether an attribute value is null

LIKE: Used to check whether an attribute value matches a given string pattern

IN: Used to check whether an attribute value matches any value within a value list

EXISTS: Used to check whether a subquery returns any rows

The BETWEEN Special Operator

If you use software that implements a standard SQL, the operator BETWEEN may be used to check whether an

attribute value is within a range of values. For example, if you want to see a listing for all products whose prices are

between $50 and $100, use the following command sequence:

SELECT *

FROM PRODUCT

WHERE P_PRICE BETWEEN 50.00 AND 100.00;

Note

NOTE TO ORACLE USERS

When using the BETWEEN special operator, always specify the lower range value first. If you list the higher range value

first, Oracle will return an empty result set.


If your DBMS does not support BETWEEN, you can use:

SELECT *

FROM PRODUCT

WHERE P_PRICE >= 50.00 AND P_PRICE <= 100.00;

The IS NULL Special Operator

Standard SQL allows the use of IS NULL to check for a null attribute value. For example, suppose that you want to list

all products that do not have a vendor assigned (V_CODE is null). Such a null entry could be found by using the

command sequence:

SELECT P_CODE, P_DESCRIPT, V_CODE

FROM PRODUCT

WHERE V_CODE IS NULL;

Similarly, if you want to check a null date entry, the command sequence is:

SELECT P_CODE, P_DESCRIPT, P_INDATE

FROM PRODUCT

WHERE P_INDATE IS NULL;

Note that SQL uses a special operator to test for nulls. Why? Couldn’t you just enter a condition such as V_CODE =

NULL ? No. Technically, NULL is not a “value” the way the number 0 (zero) or the blank space is, but instead a

NULL is a special property of an attribute that represents precisely the absence of any value.
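For instance, in most SQL implementations the following sketch returns no rows at all, even when some V_CODE entries are null, because a comparison with NULL never evaluates to true; the IS NULL form shown above must be used instead:

SELECT P_CODE, V_CODE
FROM PRODUCT
WHERE V_CODE = NULL;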

The LIKE Special Operator

The LIKE special operator is used in conjunction with wildcards to find patterns within string attributes. Standard SQL

allows you to use the percent sign (%) and underscore (_) wildcard characters to make matches when the entire string is

not known:

% means any and all following or preceding characters are eligible. For example,

'J%' includes Johnson, Jones, Jernigan, July, and J-231Q.

'Jo%' includes Johnson and Jones.

'%n' includes Johnson and Jernigan.

_ means any one character may be substituted for the underscore. For example,
'_23-456-6789' includes 123-456-6789, 223-456-6789, and 323-456-6789.
'_23-_56-678_' includes 123-156-6781, 123-256-6782, and 823-956-6788.
'_o_es' includes Jones, Cones, Cokes, totes, and roles.

Note

Some RDBMSs, such as Microsoft Access, use the wildcard characters * and ? instead of % and _.

For example, the following query would find all VENDOR rows for contacts whose last names begin with Smith.

SELECT V_NAME, V_CONTACT, V_AREACODE, V_PHONE

FROM VENDOR

WHERE V_CONTACT LIKE 'Smith%';

If you check the original VENDOR data in Figure 3.2 again, you’ll see that this SQL query yields three records: two

Smiths and one Smithson.


Keep in mind that most SQL implementations yield case-sensitive searches. For example, Oracle will not yield a result

that includes Jones if you use the wildcard search delimiter 'jo%' in a search for last names. The reason is that Jones

begins with a capital J, and your wildcard search starts with a lowercase j. On the other hand, MS Access searches are

not case sensitive.

For example, suppose that you typed the following query in Oracle:

SELECT V_NAME, V_CONTACT, V_AREACODE, V_PHONE

FROM VENDOR

WHERE V_CONTACT LIKE 'SMITH%';

No rows will be returned because character-based queries may be case sensitive. That is, an uppercase character has

a different ASCII code than a lowercase character, causing SMITH, Smith, and smith to be evaluated as different

(unequal) entries. Because the table contains no vendor whose last name begins with (uppercase) SMITH, the

(uppercase) 'SMITH%' used in the query cannot be matched. Matches can be made only when the query entry is written

exactly like the table entry.

Some RDBMSs, such as Microsoft Access, automatically make the necessary conversions to eliminate case sensitivity.

Others, such as Oracle, provide a special UPPER function to convert both table and query character entries to

uppercase. (The conversion is done in the computer’s memory only; the conversion has no effect on how the value is

actually stored in the table.) So if you want to avoid a no-match result based on case sensitivity, and if your RDBMS

allows the use of the UPPER function, you can generate the same results by using the query:

SELECT V_NAME, V_CONTACT, V_AREACODE, V_PHONE

FROM VENDOR

WHERE UPPER(V_CONTACT) LIKE 'SMITH%';

The preceding query produces a list that includes all rows containing a last name that begins with Smith, regardless

of uppercase or lowercase letter combinations such as Smith, smith, and SMITH.

The logical operators may be used in conjunction with the special operators. For instance, the query:

SELECT V_NAME, V_CONTACT, V_AREACODE, V_PHONE

FROM VENDOR

WHERE V_CONTACT NOT LIKE 'Smith%';

will yield an output of all vendors whose names do not start with Smith.

Suppose that you do not know whether a person’s name is spelled Johnson or Johnsen. The wildcard character _ lets

you find a match for either spelling. The proper search would be instituted by the query:

SELECT *

FROM VENDOR

WHERE V_CONTACT LIKE 'Johns_n';

Thus, the wildcards allow you to make matches when only approximate spellings are known. Wildcard characters may be

used in combinations. For example, the wildcard search based on the string '_l%' can yield the strings Al, Alton, Elgin,

Blakeston, blank, bloated, and eligible.
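As one more sketch against the VENDOR table, the following query combines the two wildcards to list contacts whose name starts with any single character followed by the letter o:

SELECT V_NAME, V_CONTACT
FROM VENDOR
WHERE V_CONTACT LIKE '_o%';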


The IN Special Operator

Many queries that would require the use of the logical OR can be more easily handled with the help of the special

operator IN. For example, the query:

SELECT *

FROM PRODUCT

WHERE V_CODE = 21344

OR V_CODE = 24288;

can be handled more efficiently with:

SELECT *

FROM PRODUCT

WHERE V_CODE IN (21344, 24288);

Note that the IN operator uses a value list. All of the values in the list must be of the same data type. Each of the values in

the value list is compared to the attribute—in this case, V_CODE. If the V_CODE value matches any of the values in

the list, the row is selected. In this example, the rows selected will be only those in which the V_CODE is either

21344 or 24288.

If the attribute used is of a character data type, the list values must be enclosed in single quotation marks. For instance, if

the V_CODE had been defined as CHAR(5) when the table was created, the preceding query would have read:

SELECT *

FROM PRODUCT

WHERE V_CODE IN ('21344', '24288');

The IN operator is especially valuable when it is used in conjunction with subqueries. For example, suppose that you

want to list the V_CODE and V_NAME of only those vendors who provide products. In that case, you could use a

subquery within the IN operator to automatically generate the value list. The query would be:

SELECT V_CODE, V_NAME

FROM VENDOR

WHERE V_CODE IN (SELECT V_CODE FROM PRODUCT);

The preceding query will be executed in two steps:

1. The inner query or subquery will generate a list of V_CODE values from the PRODUCT table. Those

V_CODE values represent the vendors who supply products.

2. The IN operator will compare the values generated by the subquery to the V_CODE values in the VENDOR

table and will select only the rows with matching values—that is, the vendors who provide products.

The IN special operator will receive additional attention in Chapter 8, where you will learn more about subqueries.

The EXISTS Special Operator

The EXISTS special operator can be used whenever there is a requirement to execute a command based on the result of

another query. That is, if a subquery returns any rows, run the main query; otherwise, don’t. For example, the

following query will list all vendors, but only if there are products to order:

SELECT *

FROM VENDOR

WHERE EXISTS (SELECT * FROM PRODUCT WHERE P_QOH <= P_MIN);


The EXISTS special operator is used in the following example to list all vendors, but only if there are products whose quantity on hand is less than double the minimum quantity:

SELECT *

FROM VENDOR

WHERE EXISTS (SELECT * FROM PRODUCT WHERE P_QOH < P_MIN * 2);

The EXISTS special operator will receive additional attention in Chapter 8, where you will learn more about

subqueries.


4 Views

Overview of Views

When studying data analysis, we saw that a query is a technique of isolating a series of columns and/or records of a table. Although this is usually done for the purpose of data analysis, it can also be done to create a new list of items for any particular reason. Most of the time, a query is created temporarily, such as during data analysis while using a table, a form, or a web page; after such a temporary list has been used, it is dismissed. Many database applications, including Microsoft SQL Server, allow you to create a query and save it for later use, or even to use it as if it were its own table. This is the idea behind a view.

Definition

A view is a list of columns or a series of records retrieved from one or more existing tables, or a combination of one or more views and one or more tables. Based on this, before creating a view, you must first decide where its columns and records will come from. Obviously, the easiest view is one whose columns and records come from a single table.
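Behind the scenes, a view is essentially a SELECT statement saved under a name. As a preview of the kind of T-SQL the view designer generates (the table and column names below are those of the EmployeeIdentification view built in the exercise later in this chapter, so treat this only as a sketch), such a view could be written as:

CREATE VIEW dbo.EmployeeIdentification
AS
SELECT EmployeeID, EmployeeNumber,
LastName + ', ' + FirstName AS EmployeeName,
CanCreateNewAccount
FROM dbo.Employees;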


4.1 Practical Learning: Introducing Views

1. Start Microsoft Visual C# and create a new Windows Application named YugoNationalBank1

2. In the Solution Explorer, right-click Form1.cs and click Rename

3. Type Central.cs and press Enter

4. Double-click the middle of the form and implement the Load event as follows:

using System;

using System.Collections.Generic;

using System.ComponentModel;

using System.Data;

using System.Drawing;

using System.Linq;

using System.Text;

using System.Windows.Forms;

using System.Data.SqlClient;

namespace YugoNationalBank1

{

public partial class Central : Form

{

public Central()

{

InitializeComponent();

}

void CreateDatabase()

{

string strAction = "";

SqlConnection cnnYNB = null;

SqlCommand cmdYNB = null;

using (cnnYNB = new SqlConnection("Data

Source=(local); " +

"Integrated

Security='SSPI';"))

{

strAction = "IF EXISTS ( " +

"SELECT name " +

"FROM sys.databases " +

"WHERE name = N'YugoNationalBank2') "

+

"DROP DATABASE YugoNationalBank2; " +

"CREATE DATABASE YugoNationalBank2";

cmdYNB = new SqlCommand(strAction, cnnYNB);

cnnYNB.Open();

cmdYNB.ExecuteNonQuery();

MessageBox.Show("A database named

YugoNationalBank2 " +

"has been created.");

}

using (cnnYNB = new SqlConnection("Data

Source=(local); " +


"Database='YugoNationalBank2'; " +

"Integrated

Security='SSPI';"))

{

strAction = "CREATE TABLE dbo.AccountTypes( " +

"AccountTypeID int Identity(1,1) NOT

NULL, " +

"AccountType nvarchar(40) NOT NULL, "

+

"Notes text NULL, " +

"CONSTRAINT PK_AccountTypes PRIMARY "

+

" KEY (AccountTypeID));";

cmdYNB = new SqlCommand(strAction, cnnYNB);

cnnYNB.Open();

cmdYNB.ExecuteNonQuery();

MessageBox.Show("A table named AccountTypes " +

"has been added to the

database.");

}

using (cnnYNB = new SqlConnection("Data

Source=(local); " +

"Database='YugoNationalBank2'; " +

"Integrated

Security='SSPI';"))

{

strAction = "CREATE TABLE dbo.Employees( " +

"EmployeeID int identity(1,1) NOT

NULL, " +

"EmployeeNumber char(6), " +

"FirstName nvarchar(32), " +

"LastName nvarchar(32) NOT NULL, " +

"Title nvarchar(50), " +

"CanCreateNewAccount bit, " +

"HourlySalary nvarchar(50), " +

"Username nvarchar(20), " +

"Password nvarchar(20), " +

"EmailAddress nvarchar(100), " +

"Notes text, " +

"CONSTRAINT PK_Employees PRIMARY KEY

(EmployeeID));";

cmdYNB = new SqlCommand(strAction, cnnYNB);

cnnYNB.Open();

cmdYNB.ExecuteNonQuery();

MessageBox.Show("A table named Employees has " +

"been added to the database.");

}

using (cnnYNB = new SqlConnection("Data

Source=(local); " +

"Database='YugoNationalBank2'; " +

"Integrated

Security='SSPI';"))

{

strAction = "CREATE TABLE dbo.Customers( " +

"CustomerID int Identity(1,1) NOT

NULL, " +

"EmployeeID int Constraint FK_Employee

" +


" References Employees(EmployeeID), " +

"DateCreated nvarchar(50), " +

"AccountTypeID int Constraint

FK_TypeOfAccount " +

" References

AccountTypes(AccountTypeID), " +

"AccountNumber nvarchar(12), " +

"CustomerName nvarchar(50) NOT NULL, "

+

"Address nvarchar(100), " +

"City nvarchar(50), " +

"State nvarchar(50), " +

"ZIPCode nvarchar(50), " +

"AccountStatus nvarchar(50), " +

"Username nvarchar(20), " +

"Password nvarchar(20), " +

"EmailAddress nvarchar(100), " +

"Notes text, " +

"CONSTRAINT PK_Customers PRIMARY KEY

(CustomerID));";

cmdYNB = new SqlCommand(strAction, cnnYNB);

cnnYNB.Open();

cmdYNB.ExecuteNonQuery();

MessageBox.Show("A table named Customers has " +

"been added to the database.");

}

using (cnnYNB = new SqlConnection("Data

Source=(local); " +

"Database='YugoNationalBank2'; " +

"Integrated

Security='SSPI';"))

{

strAction = "CREATE TABLE

dbo.AccountsTransactions( " +

"AccountTransactionID int identity(1,

1) NOT NULL, " +

"EmployeeID int Constraint FK_Clerk "

+

" References Employees(EmployeeID),

" +

"CustomerID int Constraint

FK_Depositor " +

" References Customers(CustomerID)

NOT NULL, " +

"TransactionDate nvarchar(50), " +

"TransactionType nvarchar(50), " +

"CurrencyType nvarchar(50), " +

"DepositAmount nvarchar(50), " +

"WithdrawalAmount nvarchar(50), " +

"ChargeAmount nvarchar(50), " +

"ChargeReason nvarchar(50), " +

"Notes text, " +

"CONSTRAINT PK_AccountTransactions

PRIMARY KEY " +

" (AccountTransactionID));";

cmdYNB = new SqlCommand(strAction, cnnYNB);

cnnYNB.Open();

cmdYNB.ExecuteNonQuery();

MessageBox.Show("A table named AccountTransactions


" +

"has been added to the

database.");

}

using (SqlConnection cnnTimesheets =
    new SqlConnection("Data Source=(local);" +
                      "Database='YugoNationalBank2';" +
                      "Integrated Security=SSPI;"))

{

string strTimesheets = "CREATE TABLE

dbo.Timesheets ( " +

"TimesheetID int identity(1, 1) NOT NULL, " +

"EmployeeNumber nvarchar(5), " +

"StartDate nvarchar(50), " +

"TimesheetCode nvarchar(15), " +

"Week1Monday nvarchar(6), " +

"Week1Tuesday nvarchar(6), " +

"Week1Wednesday nvarchar(6), " +

"Week1Thursday nvarchar(6), " +

"Week1Friday nvarchar(6), " +

"Week1Saturday nvarchar(6), " +

"Week1Sunday nvarchar(6), " +

"Week2Monday nvarchar(6), " +

"Week2Tuesday nvarchar(6), " +

"Week2Wednesday nvarchar(6), " +

"Week2Thursday nvarchar(6), " +

"Week2Friday nvarchar(6), " +

"Week2Saturday nvarchar(6), " +

"Week2Sunday nvarchar(6), " +

"Notes text, " +

"CONSTRAINT PK_Timesheets PRIMARY KEY

(TimesheetID));";

SqlCommand cmdTimesheets =

new SqlCommand(strTimesheets,

cnnTimesheets);

cnnTimesheets.Open();

cmdTimesheets.ExecuteNonQuery();

MessageBox.Show("A table named Timesheets has

been created.");

}

}

private void Central_Load(object sender, EventArgs e)

{

CreateDatabase();

}

}

}


5. Execute the application to create the database

6. Close the form and return to your programming environment

7. To create a data source, on the main menu, click Data -> Add New Data Source...

8. In the first page of the wizard, make sure Database is selected and click Next

9. In the combo box

a. If you see a YugoNationalBank2, select it

b. If you do not have YugoNationalBank2, click New Connection... In the Server combo box,

select the server or type (local). In the Select Or Enter A Database Name combo box, select

YugoNationalBank2. Click Test Connection. Click OK twice. In the Data Source

Configuration Wizard, make sure the new connection is selected and click Next. Change the

Connection String to csYugoNationalBank and click Next. Click the check box of Tables.

Change the DataSet Name to dsYugoNationalBank

10. Click Finish

11. To create a new form, on the main menu, click Project -> Add Windows Form...

12. Set the Name to AccountTypes and click Add

13. From the Data Sources window, drag the AccountTypes node and drop it on the form

14. Design the form as follows:


Control    Text    Name    Other Properties
DataGridView        dgvProperties    Anchor: Top, Bottom, Left, Right
Button    Close    btnClose    Anchor: Bottom, Right

15. Double-click the Close button and implement its event as follows:

private void btnClose_Click(object sender, EventArgs e)

{

Close();

}

16. Access the Central form, add a button and change its properties as follows:

(Name): btnAccountTypes

Text: Account Types...

17. Double-click the Account Types button and implement its event as follows:

private void Central_Load(object sender, EventArgs e)

{

// CreateDatabase();

}

private void btnAccountTypes_Click(object sender, EventArgs e)

{

AccountTypes types = new AccountTypes();

types.ShowDialog();

}

18. Execute the application and open the Account Types form

19. Create the following records:


AccountType

Saving

Checking

Certificate of Deposit

20. Close the forms and return to your programming environment

21. To create a new form, on the main menu, click Project -> Add Windows Form...

22. Set the Name to Employees and click Add

23. In the Data Sources window, click Employees and click the arrow on its right side to drop the

combo box

24. Select Details

25. Drag the Employees node and drop it on the form

26. Design the form as follows:


27. Double-click the Close button and implement its event as follows:

private void btnClose_Click(object sender, EventArgs e)

{

Close();

}

28. Access the Central form, add a button and change its properties as follows:

(Name): btnEmployees

Text: Employees...

29. Double-click the Employees button and implement its event as follows:

private void btnEmployees_Click(object sender, EventArgs e)

{

Employees staff = new Employees();

staff.ShowDialog();

}

30. Execute the application and open the Employees form


31. Create the following records:

32. Close the forms and return to your programming environment

33. To create a new form, on the main menu, click Project -> Add Windows Form...

34. Set the Name to Timesheet and click Add

35. Design the form as follows:


Control Text Name Other Properties

Label Employee #:

MaskedTextBox txtEmployeeNumber Mask: 00000

Label . lblEmployeeName

Label Start Date:

DateTimePicker dtpStartDate

Label End Date:

Label . lblEndDate

Label Mon

Label Tue

Label Wed

Label Thu

Label Fri

Label Sat

Label Sun

Label Week 1:

TextBox 0.00 txtWeek1Monday TextAlign: Right

TextBox 0.00 txtWeek1Tuesday TextAlign: Right

TextBox 0.00 txtWeek1Wednesday TextAlign: Right

TextBox 0.00 txtWeek1Thursday TextAlign: Right

TextBox 0.00 txtWeek1Friday TextAlign: Right

TextBox 0.00 txtWeek1Saturday TextAlign: Right

TextBox 0.00 txtWeek1Sunday TextAlign: Right

Label Week 2:

TextBox 0.00 txtWeek2Monday TextAlign: Right

TextBox 0.00 txtWeek2Tuesday TextAlign: Right

TextBox 0.00 txtWeek2Wednesday TextAlign: Right

TextBox 0.00 txtWeek2Thursday TextAlign: Right

TextBox 0.00 txtWeek2Friday TextAlign: Right

TextBox 0.00 txtWeek2Saturday TextAlign: Right

TextBox 0.00 txtWeek2Sunday TextAlign: Right

Label Notes

TextBox txtNotes Multiline: true

Button Submit btnSubmit

Button Reset btnReset

Button Close btnClose


36. Double-click the middle of the form and implement the event as follows:

37. Make the following changes:

using System;

using System.Collections.Generic;

using System.ComponentModel;

using System.Data;

using System.Drawing;

using System.Linq;

using System.Text;

using System.Windows.Forms;

using System.Data.SqlClient;

namespace YugoNationalBank1

{

public partial class Timesheet : Form

{

int EmployeeID;

bool bNewRecord;

bool ValidTimesheet;

string strTimesheetCode;

public Timesheet()

{

InitializeComponent();

}

private void Timesheet_Load(object sender,

EventArgs e)

{

EmployeeID = 0;

bNewRecord = true;

ValidTimesheet = false;

strTimesheetCode = "";

}

}

}

38. Return to the form, click the EmployeeNumber text box and, on the Properties window, click the

Events button

39. In the Events section, double-click Leave and implement the event as follows:

private void txtEmployeeNumber_Leave(object sender,

EventArgs e)

{

if (this.txtEmployeeNumber.Text == "")

{

ValidTimesheet = false;

return;

}

string strSelect = "SELECT * FROM Employees " +

"WHERE EmployeeNumber = '" +

txtEmployeeNumber.Text + "';";

SqlConnection conDatabase =

new SqlConnection("Data Source=(local); " +
                  "Database='YugoNationalBank2';" +
                  "Integrated Security=true");

SqlCommand cmdDatabase = new SqlCommand(strSelect,

conDatabase);

DataSet dsEmployees = new DataSet();

SqlDataAdapter sda = new SqlDataAdapter();

sda.SelectCommand = cmdDatabase;

sda.Fill(dsEmployees);

try

{

DataRow recEmployee =

dsEmployees.Tables[0].Rows[0];

if (recEmployee.IsNull("EmployeeNumber"))

{

ValidTimesheet = false;

throw new

System.IndexOutOfRangeException("Bad Employee

Number!");

}

else

{

ValidTimesheet = true;

EmployeeID =

(int)recEmployee["EmployeeID"];

string strFullName =

(string)recEmployee["FirstName"] +

" " +

(string)recEmployee["LastName"];

lblEmployeeName.Text = "Welcome " +

strFullName;

}

}

catch (IndexOutOfRangeException)

{

MessageBox.Show("There is no employee with that

number!");

ValidTimesheet = false;

lblEmployeeName.Text = "";

txtEmployeeNumber.Text = "";

}

dtpStartDate.Value = DateTime.Today;

txtWeek1Monday.Text = "0.00";

txtWeek1Tuesday.Text = "0.00";

txtWeek1Wednesday.Text = "0.00";

txtWeek1Thursday.Text = "0.00";

txtWeek1Friday.Text = "0.00";

txtWeek1Saturday.Text = "0.00";

txtWeek1Sunday.Text = "0.00";

txtWeek2Monday.Text = "0.00";

txtWeek2Tuesday.Text = "0.00";

txtWeek2Wednesday.Text = "0.00";

txtWeek2Thursday.Text = "0.00";

txtWeek2Friday.Text = "0.00";

txtWeek2Saturday.Text = "0.00";

txtWeek2Sunday.Text = "0.00";

conDatabase.Close();

}


40. Return to the form and double-click the Reset button

41. Implement its event as follows:

private void btnReset_Click(object sender, EventArgs e)

{

txtWeek1Monday.Text = "0.00";

txtWeek1Tuesday.Text = "0.00";

txtWeek1Wednesday.Text = "0.00";

txtWeek1Thursday.Text = "0.00";

txtWeek1Friday.Text = "0.00";

txtWeek1Saturday.Text = "0.00";

txtWeek1Sunday.Text = "0.00";

txtWeek2Monday.Text = "0.00";

txtWeek2Tuesday.Text = "0.00";

txtWeek2Wednesday.Text = "0.00";

txtWeek2Thursday.Text = "0.00";

txtWeek2Friday.Text = "0.00";

txtWeek2Saturday.Text = "0.00";

txtWeek2Sunday.Text = "0.00";

bNewRecord = true;

}

To implement the electronic time sheet, two pieces of information are required: an employee's number and a starting period. After an employee has opened a time sheet:

1. The employee must first provide an employee number, which we will check in the Employees table. If

the employee provides a valid employee number, we can continue with the time sheet. If the employee

number is invalid, we will let the user know and we cannot continue with the time sheet

2. After the employee has provided a valid employee number, we will request the starting period. After

entering a (valid) date, we will check the time. If there is a record that holds both the employee number

and the start date, this means that the employee had previously worked on a time sheet and we will

open that existing time sheet.

After the employee or contractor has entered a valid employee number and a start date, we will create a number called a time sheet code, represented in the Timesheets table as the TimesheetCode column. This number is

created as follows:

0000000000000

The first 5 digits represent the employee's number. The next 4 digits represent the year of the start date. The

next 2 digits represent the month, and the last 2 digits represent the day. This number must be unique so that

there would not be a duplicate number throughout the time sheet.

To make sure the value of the TimeSheetCode is unique for each record, after the employee has provided a

valid employee number and a start date, we will create the time sheet code and check if that number exists in

the TimeSheet table already:

o If that number exists already, this means that the employee has previously worked on that time

sheet and he or she simply wants to verify or update it. We will then open the time values for that

record and let the user view or change it

o If there is no record with the specified time sheet code, we will conclude that the employee is

working on a new time sheet


42. Return to the Timesheet form and click the Start Date control

1. In the Events section of the Properties window, double-click CloseUp and implement the event as

follows:

private void dtpStartDate_CloseUp(object sender,

EventArgs e)

{

lblEndDate.Text =

dtpStartDate.Value.AddDays(14).ToString();

if (txtEmployeeNumber.Text.Equals(""))

{

ValidTimesheet = false;

return;

}

string strMonth;

string strDay;

int iMonth;

int iDay;

DateTime dteStart;

dteStart = dtpStartDate.Value;

iMonth = dteStart.Month;

iDay = dteStart.Day;

if (iMonth < 10)

strMonth = dteStart.Year + "0" +

iMonth.ToString();

else

strMonth = dteStart.Year +

iMonth.ToString();

if (iDay < 10)

strDay = strMonth + "0" +

iDay.ToString();

else

strDay = strMonth + iDay.ToString();

strTimesheetCode = txtEmployeeNumber.Text +

strDay;

MessageBox.Show(strTimesheetCode);

SqlConnection conTimeSheet = null;

string strSQL =

String.Concat("SELECT * FROM dbo.Timesheets

WHERE TimeSheetCode = '",

strTimesheetCode, "';");

conTimeSheet =

new SqlConnection("Data Source=(local); " +
                  "Database='YugoNationalBank2';" +
                  "Integrated Security=true");

SqlCommand cmdTimeSheet = new

SqlCommand(strSQL, conTimeSheet);


DataSet dsTimeSheet = new

DataSet("TimeSheetSet");

SqlDataAdapter sdaTimeSheet = new

SqlDataAdapter();

sdaTimeSheet.SelectCommand = cmdTimeSheet;

sdaTimeSheet.Fill(dsTimeSheet);

conTimeSheet.Close();

try

{

DataRow recTimeSheet =

dsTimeSheet.Tables[0].Rows[0];

strTimesheetCode =

(string)(recTimeSheet["TimeSheetCode"]);

if

(recTimeSheet.IsNull("TimeSheetCode"))

{

bNewRecord = true;

throw new

System.IndexOutOfRangeException(

"No TimeSheet with that number

exists!");

}

else

{

txtWeek1Monday.Text =

(string)(recTimeSheet["Week1Monday"]);

txtWeek1Tuesday.Text =

(string)(recTimeSheet["Week1Tuesday"]);

txtWeek1Wednesday.Text =

(string)(recTimeSheet["Week1Wednesday"]);

txtWeek1Thursday.Text =

(string)(recTimeSheet["Week1Thursday"]);

txtWeek1Friday.Text =

(string)(recTimeSheet["Week1Friday"]);

txtWeek1Saturday.Text =

(string)(recTimeSheet["Week1Saturday"]);

txtWeek1Sunday.Text =

(string)(recTimeSheet["Week1Sunday"]);

txtWeek2Monday.Text =

(string)(recTimeSheet["Week2Monday"]);

txtWeek2Tuesday.Text =

(string)(recTimeSheet["Week2Tuesday"]);

txtWeek2Wednesday.Text =

(string)(recTimeSheet["Week2Wednesday"]);

txtWeek2Thursday.Text =

(string)(recTimeSheet["Week2Thursday"]);

txtWeek2Friday.Text =

(string)(recTimeSheet["Week2Friday"]);

txtWeek2Saturday.Text =

(string)(recTimeSheet["Week2Saturday"]);

txtWeek2Sunday.Text =

(string)(recTimeSheet["Week2Sunday"]);

bNewRecord = false;

}

}

catch (IndexOutOfRangeException)

{

btnReset_Click(sender, e);

}

}


2. Access the Central form, add a button and change its properties as follows:

(Name): btnTimesheet

Text: Employee's Time Sheet...

3. Double-click the Employee's Time Sheet button and implement its event as follows:

private void btnTimesheet_Click(object sender, EventArgs e)

{

Timesheet sheet = new Timesheet();

sheet.ShowDialog();

}

4. Save all


4.2 Fundamentals of Creating and Using a View

4.2.1 Visually Creating a View

To create a view, you can use the Object Explorer (Microsoft SQL Server Management Studio), a query

window (Microsoft SQL Server Management Studio), or the Server Explorer (Microsoft Visual Studio).

Before starting the view, you would have to specify the table(s) that would be involved. To create a view from

the Object Explorer or the Server Explorer, you can expand the database, right-click Views and click New

View or Add New View. This would open the Add Table dialog box:

The basic functionality of this dialog box is exactly the same as we reviewed for data analysis in the previous

lesson:

To specify the table that would be used as the source, you can click it in the list box of the Tables

property page

If you will be using another existing view, from the Views property page, you can click the name of

the desired view

If a function will be used to generate the records, you can locate it in the Functions property page.

After selecting the source object, you can either double-click it or you can click it once and click Add.

In the previous lesson, we saw that you could add more than one existing table. In the same way, you

can add more than one view or function

After selecting the source(s), you can click Close on the Add Table dialog box

After selecting the objects, as we saw in the previous lesson, they would display in the window

As seen in the previous lesson, if you are using more than one table and they are not (yet) related, you

can drag a column from one table and drop it on another table to create a JOIN between them

As we saw in previous lessons, to select a column, you can click its check box in the top list. This

would display it in the first empty box under the Column column and would add its name to the

SELECT statement. Alternatively, you can click an empty box in the Column column to reveal its

combo box, then click the arrow of the combo box and select the desired column from the list


After selecting the column, its check box would be checked in the top section of the window, its name

would be displayed in the Column column, and it would be added to the SELECT statement. If you

know the name of the column you want to add, you can manually type it in the SELECT statement.

The structure of a view can be considered complete when the SELECT statement is as complete as possible.

At any time, to test the results of a view, you can run it. To do this, in the Microsoft SQL Server Management

Studio you can click the Execute SQL button or in Microsoft Visual Studio, you can right-click the view

and click Execute SQL. This would cause the bottom section of the view to display the results of the query.

Here is an example:

As reviewed during data analysis and when creating joins in previous lessons, you can add conditions in a

view to make it isolate only some records.


Here is an example:

4.2.2 The Name of a View

As stated already, one of the reasons for creating a view is to be able to use it over and over again. To achieve

this, the view must be saved. Like most objects in Microsoft SQL Server, a view must have a name and it is

saved as its own object. To save a view from the view window, you can click the Save button on the toolbar.

You can also attempt to close the window. You would then be prompted to save it. When saving a view, you

should give it a name that follows the rules and suggestions of SQL. In our lessons, here are the rules we will

use to name our views:

A name will start with a letter

After the first letter, the name will have combinations of underscores, letters, and digits. Examples are

n24, act_52_t

A name will not include special characters such as !, @, #, $, %, ^, &, or *


A name will not have spaces

If the name is a combination of words, each word will start in uppercase

After saving a view, it becomes part of the Views node of its database: a node would be created for it and its

name would appear in the Views node of its database.

Opening a View

As stated already, a view is a technique of selecting records to view or use over and over again. After a view

has been created, you can open it. You have two main options.

To see the structure of a view, such as the table(s) on which it is based and the relationships, if any, that

compose it, in the Object Explorer, right-click the view and click Design

To see the SQL code that makes up a view, in the Object Explorer, right-click the view and click Edit

Executing a View

Executing a view consists of seeing its results. To do this, you have various options. To view the results of a

view:

Open an empty query window associated with the database that contains the view. In the query

window, write a SELECT statement using the same formulas and rules we saw for tables. Here is an

example:
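A minimal sketch, assuming the EmployeeIdentification view created in the exercise that follows:

SELECT EmployeeName, CanCreateNewAccount
FROM dbo.EmployeeIdentification;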


From the Object Explorer, expand the database and its Views node. Right-click the name of the view

and click Open View

4.3 Practical Learning: Visually Creating a View

1. In the Server Explorer, expand the YugoNationalBank2 if necessary.

Right-click Views and click Add New View

2. In the Add Table dialog box, click Employees, click Add, and click Close

3. In the Diagram section of the view, click the check boxes of EmployeeID and EmployeeNumber

4. In the Criteria section, click the empty box under EmployeeNumber and type LastName + ', ' +

FirstName

5. Set its Alias to EmployeeName


6. In the Diagram section, click the check box of CanCreateNewAccount

7. Close the view

8. When asked whether you want to save it, click Yes

9. In the Choose Name dialog box, set the name to EmployeeIdentification and click OK

10. In the Data Sources window, right-click dsYugoNationalBank and click Configure DataSet With

Wizard...

11. Click the check box of Views

12. Click Finish

13. To create a new form, on the main menu, click Project -> Add Windows Form...

14. Set the Name to Customers and click Add

15. In the Data Sources window, click Customers and click the arrow on its right side to drop the combo

box

16. Select Details

17. Drag the Customers node and drop it on the form

18. Under the form, click the objects and, using the Properties window, change their names as follows:

Object Name

customersBindingSource bsCustomers

customersTableAdapter taCustomers

customersBindingNavigator bnCustomers

19. Once again, from the Data Sources window, drag EmployeeIdentification and drop it on the form

20. While the data grid view is still selected, press Delete to remove it

21. Under the form, click the objects and, using the Properties window, change their names as follows:

Object Name Filter

employeeIdentificationBindingSource    bsEmployeeIdentification    CanCreateNewAccount = True

employeeIdentificationTableAdapter taEmployeeIdentification

22. Once again, from the Data Sources window, drag AccountTypes and drop it on the form

23. While the data grid view is still selected, press Delete to remove it


24. Under the form, click the objects and, using the Properties window, change their names as follows:

Object Name

accountTypesBindingSource bsAccountTypes

accountTypesTableAdapter taAccountTypes

25. On the form, click the text box on the right side of Employee ID and press Delete

26. On the form, click the text box on the right side of Date Created and press Delete

27. On the form, click the text box on the right side of Account Type ID and press Delete

28. On the form, click the text box on the right side of Account Number and press Delete

29. On the form, click the text box on the right side of Account Status and press Delete

30. Design the form as follows:

New Control Text Name Other Properties

ComboBox cbxEmployeeID

DropDownStyle: DropDownList

DataSource: bsEmployeeIdentification

DisplayMember: EmployeeName

ValueMember: EmployeeID

(DataBindings) -> Selected Value:

bsCustomers - EmployeeID

ComboBox cbxAccountTypeID

DropDownStyle: DropDownList

DataSource: bsAccountTypes

DisplayMember: AccountType

ValueMember: AccountTypeID

(DataBindings) -> Selected Value:

bsCustomers - AccountTypeID

MaskedTextBox txtAccountNumber

Mask: 00-000000-00

(DataBindings) -> Text:

bsCustomers - AccountNumber

ComboBox cbxAccountStatus

DropDownStyle: DropDownList

(DataBindings) -> Text:

bsCustomers - AccountStatus

Items:

Active

Closed

Suspended

Button Close btnClose Anchor: Bottom, Right


31. Double-click the Close button and implement its event as follows:

private void btnClose_Click(object sender, EventArgs e)

{

Close();

}

32. Access the Central form, add a button and change its properties as follows:

(Name): btnCustomers

Text: Customers...

33. Double-click the Customers button and implement its event as follows:

private void btnCustomers_Click(object sender, EventArgs e)

{

Customers clients = new Customers();

clients.ShowDialog();

}

34. Execute the application and open the Customers form

35. Create the following records:

36. Close the forms and return to your programming environment

37. To create a new form, on the main menu, click Project -> Add Windows Form...

38. Set the Name to NewDeposit and click Add

39. Design the form as follows:

Control Text Name Other Properties

Label Transaction Date:

DateTimePicker dtpTransactionDate

Label Processed By:

MaskedTextBox txtEmployeeNumber Mask: 00000

TextBox txtEmployeeName

Label Processed For:

MaskedTextBox txtAccountNumber Mask: 00-000000-00

TextBox txtCustomerName

Label Currency Type:

ComboBox cbxCurrencyTypes DropDownStyle: DropDownList

Items: Cash, Check, Money Order

Label Amount Deposited:

TextBox txtAmount TextAlign: Right

Label Notes

TextBox txtNotes Multiline: True

ScrollBars: Vertical

Button Submit btnSubmit

Button Close btnClose


40. Double-click the middle of the form and implement the event as follows:

41. Make the following changes:

using System;

using System.Collections.Generic;

using System.ComponentModel;

using System.Data;

using System.Drawing;

using System.Linq;

using System.Text;

using System.Windows.Forms;

using System.Data.SqlClient;

namespace YugoNationalBank1

{

public partial class NewDeposit : Form

{

int EmployeeID;

int CustomerID;

public NewDeposit()

{

InitializeComponent();

}

private void NewDeposit_Load(object sender,

EventArgs e)

{

EmployeeID = 0;

CustomerID = 0;

}

}

}

42. On the form, click the EmployeeNumber text box and, on the Properties window, click the Events

button

43. In the Events section, double-click Leave and implement the event as follows:

private void txtEmployeeNumber_Leave(object sender, EventArgs

e)

{

if (txtEmployeeNumber.Text.Length == 0)

{

MessageBox.Show("You must specify the employee number "

+

"of the clerk who is processing the

deposit.");

return;

}

else

{

using (SqlConnection cnnYNB =

new SqlConnection("Data Source=(local);" +

"Database='YugoNationalBank2';" +

"Integrated Security=SSPI;"))

{

string strYNB = "SELECT EmployeeID, FirstName,

LastName " +


"FROM Employees WHERE EmployeeNumber = '" +

txtEmployeeNumber.Text + "';";

SqlCommand cmdYNB = new SqlCommand(strYNB, cnnYNB);

cnnYNB.Open();

SqlDataReader rdrEmployees =

cmdYNB.ExecuteReader();

while (rdrEmployees.Read())

{

EmployeeID =

int.Parse(rdrEmployees.GetSqlInt32(0).ToString());

txtEmployeeName.Text =

rdrEmployees.GetString(1) + " " +

rdrEmployees.GetString(2);

}

if (EmployeeID == 0)

{

MessageBox.Show("The employee number you

entered " +

"is not recognized in our

database.");

txtEmployeeNumber.Text = "";

}

}

}

}

44. Return to the form, click the AccountNumber text box and, in the Events section of the Properties

window, double-click Leave

45. Implement the event as follows:

private void txtAccountNumber_Leave(object sender,

EventArgs e)

{

if( txtAccountNumber.Text.Length == 0)

{

MessageBox.Show("You must specify the account

number " +

"of the customer whose deposit you

are entering.");

return;

}

else

{

using (SqlConnection cnnYNB =

new SqlConnection("Data Source=(local);" +

"Database='YugoNationalBank2';" +

"Integrated

Security=SSPI;"))

{

string strYNB = "SELECT CustomerID,

CustomerName FROM " +

"Customers WHERE AccountNumber

= '" +

txtAccountNumber.Text + "';";

SqlCommand cmdYNB = new SqlCommand(strYNB,

cnnYNB);


SqlDataAdapter daYNB = new SqlDataAdapter();

daYNB.SelectCommand = cmdYNB;

DataSet dsCustomers = new

DataSet("CustomersSet");

daYNB.Fill(dsCustomers);

cnnYNB.Open();

foreach (DataRow rowCustomer in

dsCustomers.Tables[0].Rows)

{

CustomerID =

int.Parse(rowCustomer["CustomerID"].ToString());

txtCustomerName.Text =

rowCustomer["CustomerName"].ToString();

break;

}

if (CustomerID == 0)

{

MessageBox.Show("The account number you

entered " +

"is not recognized in our

database.");

txtAccountNumber.Text = "";

}

}

}

}

46. Return to the form and double-click the Submit button

47. Implement the event as follows:

private void btnSubmit_Click(object sender, EventArgs e)
{
    DateTime dteTransaction = DateTime.Today;
    string strCurrencyType = "Unknown";
    double Amount = 0.00;

    if (EmployeeID == 0)
    {
        MessageBox.Show("You must specify the employee number " +
                        "of the clerk who is processing the deposit.");
        return;
    }

    if (CustomerID == 0)
    {
        MessageBox.Show("You must enter an account number " +
                        "for the new customer.");
        return;
    }

    strCurrencyType = cbxCurrencyTypes.Text;

    try
    {
        Amount = double.Parse(txtAmount.Text);
    }
    catch (FormatException)
    {
        MessageBox.Show("Invalid Amount.");
        return;
    }

    using (SqlConnection cnnYNB =
        new SqlConnection("Data Source=(local);" +
                          "Database='YugoNationalBank2';" +
                          "Integrated Security=SSPI;"))
    {
        string strEmployees = "INSERT INTO AccountsTransactions(" +
                              "EmployeeID, CustomerID, " +
                              "TransactionDate, TransactionType, " +
                              "CurrencyType, DepositAmount, Notes) " +
                              "VALUES('" + EmployeeID + "', '" +
                              CustomerID + "', '" +
                              dtpTransactionDate.Value.ToString("d") +
                              "', 'Deposit', '" + cbxCurrencyTypes.Text +
                              "', '" + Amount + "', '" +
                              txtNotes.Text + "');";
        SqlCommand cmdEmployees = new SqlCommand(strEmployees, cnnYNB);

        cnnYNB.Open();
        cmdEmployees.ExecuteNonQuery();

        dtpTransactionDate.Value = DateTime.Today;
        txtEmployeeNumber.Text = "";
        txtEmployeeName.Text = "";
        txtAccountNumber.Text = "";
        txtCustomerName.Text = "";
        cbxCurrencyTypes.SelectedIndex = 0;
        txtAmount.Text = "0.00";
        txtNotes.Text = "";
    }
}

48. Return to the form and double-click the Close button

49. Implement its event as follows:

private void btnClose_Click(object sender, EventArgs e)

{

Close();

}

50. Access the Central form, add a button and change its properties as follows:

(Name): btnNewDeposit

Text: New Deposit...


51. Double-click the New Deposit button and implement its event as follows:

private void btnNewDeposit_Click(object sender, EventArgs e)

{

NewDeposit deposit = new NewDeposit();

deposit.ShowDialog();

}

52. To create a new form, on the main menu, click Project -> Add Windows Form...

53. Set the Name to NewWithdrawal and click Add

54. Design the form as follows:

Control          Text               Name                 Other Properties
Label            Transaction Date:
DateTimePicker                      dtpTransactionDate
Label            Processed By:
MaskedTextBox                       txtEmployeeNumber    Mask: 00000
TextBox                             txtEmployeeName
Label            Processed For:
MaskedTextBox                       txtAccountNumber     Mask: 00-000000-00
TextBox                             txtCustomerName
Label            Currency Type:
ComboBox                            cbxCurrencyTypes     DropDownStyle: DropDownList
                                                         Items: Cash, Check, Money Order
Label            Amount Withdrawn:
TextBox                             txtAmount            TextAlign: Right
Label            Notes
TextBox                             txtNotes             Multiline: True
                                                         ScrollBars: Vertical
Button           Submit             btnSubmit
Button           Close              btnClose

55. Double-click the middle of the form and implement the event as follows:


56. Make the following changes:

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Windows.Forms;
using System.Data.SqlClient;

namespace YugoNationalBank1
{
    public partial class NewWithdrawal : Form
    {
        int EmployeeID;
        int CustomerID;

        public NewWithdrawal()
        {
            InitializeComponent();
        }

        private void NewWithdrawal_Load(object sender, EventArgs e)
        {
            EmployeeID = 0;
            CustomerID = 0;
        }
    }
}

57. Return to the form, click the EmployeeNumber text box and, on the Properties window, click the Events button
58. In the Events section, double-click Leave and implement the event as follows:

private void txtEmployeeNumber_Leave(object sender, EventArgs e)
{
    if (txtEmployeeNumber.Text.Length == 0)
    {
        MessageBox.Show("You must specify the employee number " +
                        "of the clerk who is processing the transaction.");
        return;
    }
    else
    {
        using (SqlConnection cnnYNB =
            new SqlConnection("Data Source=(local);" +
                              "Database='YugoNationalBank2';" +
                              "Integrated Security=SSPI;"))
        {
            string strYNB = "SELECT EmployeeID, FirstName, LastName " +
                            "FROM Employees WHERE EmployeeNumber = '" +
                            txtEmployeeNumber.Text + "';";
            SqlCommand cmdYNB = new SqlCommand(strYNB, cnnYNB);

            cnnYNB.Open();
            SqlDataReader rdrEmployees = cmdYNB.ExecuteReader();

            while (rdrEmployees.Read())
            {
                EmployeeID = int.Parse(rdrEmployees.GetSqlInt32(0).ToString());
                txtEmployeeName.Text = rdrEmployees.GetString(1) + " " +
                                       rdrEmployees.GetString(2);
            }

            if (EmployeeID == 0)
            {
                MessageBox.Show("The employee number you entered " +
                                "is not recognized in our database.");
                txtEmployeeNumber.Text = "";
            }
        }
    }
}

59. Return to the form, click the AccountNumber text box and, in the Events section of the Properties window, double-click Leave
60. Implement the event as follows:

private void txtAccountNumber_Leave(object sender, EventArgs e)
{
    if (txtAccountNumber.Text.Length == 0)
    {
        MessageBox.Show("You must specify the account number " +
                        "of the customer whose withdrawal you are processing.");
        return;
    }
    else
    {
        using (SqlConnection cnnYNB =
            new SqlConnection("Data Source=(local);" +
                              "Database='YugoNationalBank2';" +
                              "Integrated Security=SSPI;"))
        {
            string strYNB = "SELECT CustomerID, CustomerName FROM " +
                            "Customers WHERE AccountNumber = '" +
                            txtAccountNumber.Text + "';";
            SqlCommand cmdYNB = new SqlCommand(strYNB, cnnYNB);
            SqlDataAdapter daYNB = new SqlDataAdapter();

            daYNB.SelectCommand = cmdYNB;
            DataSet dsCustomers = new DataSet("CustomersSet");
            daYNB.Fill(dsCustomers);
            cnnYNB.Open();

            foreach (DataRow rowCustomer in dsCustomers.Tables[0].Rows)
            {
                CustomerID = int.Parse(rowCustomer["CustomerID"].ToString());
                txtCustomerName.Text = rowCustomer["CustomerName"].ToString();
                break;
            }

            if (CustomerID == 0)
            {
                MessageBox.Show("The account number you entered " +
                                "is not recognized in our database.");
                txtAccountNumber.Text = "";
            }
        }
    }
}

61. Return to the form and double-click the Submit button

62. Implement the event as follows:

private void btnSubmit_Click(object sender, EventArgs e)
{
    DateTime dteTransaction = DateTime.Today;
    string strCurrencyType = "Unknown";
    double Amount = 0.00;

    if (EmployeeID == 0)
    {
        MessageBox.Show("You must specify a valid employee number " +
                        "of the clerk who is processing the withdrawal.");
        return;
    }

    if (CustomerID == 0)
    {
        MessageBox.Show("You must enter a valid account number " +
                        "for the new customer.");
        return;
    }

    strCurrencyType = cbxCurrencyTypes.Text;

    try
    {
        Amount = double.Parse(txtAmount.Text);
    }
    catch (FormatException)
    {
        MessageBox.Show("Invalid Amount.");
        return;
    }

    using (SqlConnection cnnYNB =
        new SqlConnection("Data Source=(local);" +
                          "Database='YugoNationalBank2';" +
                          "Integrated Security=SSPI;"))
    {
        string strEmployees = "INSERT INTO AccountsTransactions(" +
                              "EmployeeID, CustomerID, " +
                              "TransactionDate, TransactionType, " +
                              "CurrencyType, WithdrawalAmount, Notes) " +
                              "VALUES('" + EmployeeID + "', '" +
                              CustomerID + "', '" +
                              dtpTransactionDate.Value.ToString("d") +
                              "', 'Withdraw', '" + cbxCurrencyTypes.Text +
                              "', '" + Amount + "', '" +
                              txtNotes.Text + "');";
        SqlCommand cmdEmployees = new SqlCommand(strEmployees, cnnYNB);

        cnnYNB.Open();
        cmdEmployees.ExecuteNonQuery();

        dtpTransactionDate.Value = DateTime.Today;
        txtEmployeeNumber.Text = "";
        txtEmployeeName.Text = "";
        txtAccountNumber.Text = "";
        txtCustomerName.Text = "";
        cbxCurrencyTypes.SelectedIndex = 0;
        txtAmount.Text = "0.00";
        txtNotes.Text = "";
    }
}

63. Return to the form and double-click the Close button

64. Implement its event as follows:

private void btnClose_Click(object sender, EventArgs e)

{

Close();

}

65. Access the Central form, add a button and change its properties as follows:

(Name): btnNewWithdrawal

Text: New Withdrawal...

66. Double-click the New Withdrawal button and implement its event as follows:

private void btnNewWithdrawal_Click(object sender, EventArgs e)

{

NewWithdrawal withdraw = new NewWithdrawal();

withdraw.ShowDialog();
}


67. To create a new form, on the main menu, click Project -> Add Windows Form...

68. Set the Name to NewCharge and click Add

69. Design the form as follows:

Control          Text               Name                 Other Properties
Label            Transaction Date:
DateTimePicker                      dtpTransactionDate
Label            Processed For:
MaskedTextBox                       txtAccountNumber     Mask: 00-000000-00
TextBox                             txtCustomerName
Label            Charge Reason:
ComboBox                            cbxChargeReasons     DropDownStyle: DropDownList
                                                         Items: Overdraft, Money Order,
                                                         Check Stopping, Monthly Charge
Label            Amount Charged:
TextBox                             txtAmount            TextAlign: Right
Label            Notes
TextBox                             txtNotes             Multiline: True
                                                         ScrollBars: Vertical
Button           Submit             btnSubmit
Button           Close              btnClose

70. Double-click the middle of the form and implement the event as follows:

71. Make the following changes:

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Windows.Forms;
using System.Data.SqlClient;

namespace YugoNationalBank1
{
    public partial class NewCharge : Form
    {
        int CustomerID;

        public NewCharge()
        {
            InitializeComponent();
        }

        private void NewCharge_Load(object sender, EventArgs e)
        {
            CustomerID = 0;
        }
    }
}

72. On the form, click the AccountNumber text box and, in the Events section of the Properties window, double-click Leave
73. Implement the event as follows:

private void txtAccountNumber_Leave(object sender, EventArgs e)
{
    if (txtAccountNumber.Text.Length == 0)
    {
        MessageBox.Show("You must specify the account number " +
                        "of the customer whose charge you are processing.");
        return;
    }
    else
    {
        using (SqlConnection cnnYNB =
            new SqlConnection("Data Source=(local);" +
                              "Database='YugoNationalBank2';" +
                              "Integrated Security=SSPI;"))
        {
            string strYNB = "SELECT CustomerID, CustomerName FROM " +
                            "Customers WHERE AccountNumber = '" +
                            txtAccountNumber.Text + "';";
            SqlCommand cmdYNB = new SqlCommand(strYNB, cnnYNB);
            SqlDataAdapter daYNB = new SqlDataAdapter();

            daYNB.SelectCommand = cmdYNB;
            DataSet dsCustomers = new DataSet("CustomersSet");
            daYNB.Fill(dsCustomers);
            cnnYNB.Open();

            foreach (DataRow rowCustomer in dsCustomers.Tables[0].Rows)
            {
                CustomerID = int.Parse(rowCustomer["CustomerID"].ToString());
                txtCustomerName.Text = rowCustomer["CustomerName"].ToString();
                break;
            }

            if (CustomerID == 0)
            {
                MessageBox.Show("The account number you entered " +
                                "is not recognized in our database.");
                txtAccountNumber.Text = "";
            }
        }
    }
}

74. Return to the form and double-click the Submit button

75. Implement the event as follows:

private void btnSubmit_Click(object sender, EventArgs e)
{
    DateTime dteTransaction = DateTime.Today;
    double Amount = 0.00;

    if (CustomerID == 0)
    {
        MessageBox.Show("You must enter a valid account number " +
                        "for the new customer.");
        return;
    }

    try
    {
        Amount = double.Parse(txtAmount.Text);
    }
    catch (FormatException)
    {
        MessageBox.Show("Invalid Amount.");
        return;
    }

    using (SqlConnection cnnYNB =
        new SqlConnection("Data Source=(local);" +
                          "Database='YugoNationalBank2';" +
                          "Integrated Security=SSPI;"))
    {
        string strCharges = "INSERT INTO AccountsTransactions(" +
                            "CustomerID, " +
                            "TransactionDate, TransactionType, " +
                            "ChargeAmount, ChargeReason, Notes) " +
                            "VALUES('" + CustomerID + "', '" +
                            dtpTransactionDate.Value.ToString("d") +
                            "', 'Charge', '" + Amount + "', '" +
                            cbxChargeReasons.Text + "', '" +
                            txtNotes.Text + "');";
        SqlCommand cmdCharges = new SqlCommand(strCharges, cnnYNB);

        cnnYNB.Open();
        cmdCharges.ExecuteNonQuery();

        dtpTransactionDate.Value = DateTime.Today;
        txtAccountNumber.Text = "";
        txtCustomerName.Text = "";
        cbxChargeReasons.SelectedIndex = 0;
        txtAmount.Text = "0.00";
        txtNotes.Text = "";
    }
}

76. Return to the form and double-click the Close button

77. Implement its event as follows:

private void btnClose_Click(object sender, EventArgs e)

{

Close();

}

78. Access the Central form, add a button and change its properties as follows:

(Name): btnNewCharge

Text: New Charge...

79. Double-click the New Charge button and implement its event as follows:

private void btnNewCharge_Click(object sender, EventArgs e)

{

NewCharge charge = new NewCharge();

charge.ShowDialog();

}

80. Execute the application and open the Employees form

81. Create a few records

82. Close the forms and return to your programming environment

83. To create a new view, in the Server Explorer, under YugoNationalBank2, right-click Views and click

Add New View

84. In the Add Table dialog box, double-click Customers and AccountTypes

85. Click Close

86. In the Diagram section, click the check boxes of CustomerID, CustomerName, AccountNumber,

AccountType, DateCreated, and AccountStatus


87. Close the view

88. When asked whether you want to save it, click Yes

89. Set the Name to CustomerIdentification and click OK

90. In the Data Sources window, right-click dsYugoNationalBank and click Configure DataSet With

Wizard...

91. Click the check box of Views to remove the check mark

92. Click it again to put the check mark and click Finish

93. To create a new view, in the Server Explorer, under YugoNationalBank2, right-click Views and click

Add New View

94. In the Add Table dialog box, double-click Customers and AccountsTransactions

95. Click Close

96. In the Diagram section, click the check boxes of AccountNumber, TransactionDate, TransactionType,

CurrencyType, DepositAmount, WithdrawalAmount, ChargeAmount, ChargeReason, and Balance

97. Close the view

98. When asked whether you want to save it, click Yes

99. Set the Name to AccountTransactions and click OK

100. In the Data Sources window, right-click dsYugoNationalBank and click Configure DataSet With

Wizard...

101. Click the check box of View to remove the check mark

102. Click it again to put the check mark

103. Click Finish

104. To create a new form, on the main menu, click Project -> Add Windows Form...

105. Set the Name to AccountTransactions and click Add

106. From the Data Sources window, drag AccountTransactions and drop it on the form

107. Under the form, click accountTransactionsBindingNavigator and press Delete

108. Under the form, click the objects and, using the Properties window, change their names as follows:

Object                               Name
accountTransactionsBindingSource     bsAccountTransactions
accountTransactionsTableAdapter      taAccountTransactions


109. Design the form as follows:

Control          Text               Name                  Other Properties
Label            Account Number:
MaskedTextBox                       txtAccountNumber      Mask: 00-000000-00
Button           Locate             btnLocate
Label            Customer Name:
TextBox                             txtCustomerName
Label            Account Type:
TextBox                             txtAccountType
Label            Account Status:
TextBox                             txtAccountStatus
Label            Date Created:
DateTimePicker                      dtpDateCreated
DataGridView                        dgvAccountProperties
Label            Total Deposits
TextBox                             txtTotalDeposits      Text: 0.00
                                                          TextAlign: Right
Label            Total Charges
TextBox                             txtTotalCharges       Text: 0.00
                                                          TextAlign: Right
Button           Close              btnClose
Label            Total Withdrawals
TextBox                             txtTotalWithdrawals   Text: 0.00
                                                          TextAlign: Right
Label            Balance
TextBox                             txtBalance            Text: 0.00
                                                          TextAlign: Right

110. On the form, double-click the Locate button and make the following changes:

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Windows.Forms;
using System.Data.SqlClient;

namespace YugoNationalBank1
{
    public partial class AccountTransactions : Form
    {
        public AccountTransactions()
        {
            InitializeComponent();
        }

        private void AccountTransactions_Load(object sender, EventArgs e)
        {
            // TODO: This line of code loads data into the
            // 'dsYugoNationalBank.AccountTransactions' table.
            // You can move, or remove it, as needed.
            taAccountTransactions.Fill(dsYugoNationalBank.AccountTransactions);
            bsAccountTransactions.Filter = "AccountNumber = '00-000000-00'";
        }

        private void btnLocate_Click(object sender, EventArgs e)
        {
            int CustomerID = 0;

            if (txtAccountNumber.Text.Length == 0)
            {
                MessageBox.Show("You must specify the account number " +
                                "of the customer whose transactions you want to view.");
                return;
            }
            else
            {
                using (SqlConnection cnnYNB =
                    new SqlConnection("Data Source=(local);" +
                                      "Database='YugoNationalBank2';" +
                                      "Integrated Security=SSPI;"))
                {
                    string strYNB = "SELECT * FROM " +
                                    "CustomerIdentification WHERE AccountNumber = '" +
                                    txtAccountNumber.Text + "';";
                    SqlCommand cmdYNB = new SqlCommand(strYNB, cnnYNB);
                    SqlDataAdapter daYNB = new SqlDataAdapter();

                    daYNB.SelectCommand = cmdYNB;
                    DataSet dsCustomers = new DataSet("CustomersSet");
                    daYNB.Fill(dsCustomers);
                    cnnYNB.Open();

                    foreach (DataRow rowCustomer in dsCustomers.Tables[0].Rows)
                    {
                        CustomerID = int.Parse(rowCustomer["CustomerID"].ToString());
                        txtCustomerName.Text = rowCustomer["CustomerName"].ToString();
                        txtAccountType.Text = rowCustomer["AccountType"].ToString();
                        txtAccountStatus.Text = rowCustomer["AccountStatus"].ToString();
                        dtpDateCreated.Value =
                            DateTime.Parse(rowCustomer["DateCreated"].ToString());
                        break;
                    }

                    bsAccountTransactions.Filter =
                        "AccountNumber = '" + txtAccountNumber.Text + "'";
                }
            }

            if (CustomerID != 0)
            {
                double Deposits = 0.00, Withdraws = 0.00,
                       Charges = 0.00, Balance = 0.00;

                using (SqlConnection cnnYNB =
                    new SqlConnection("Data Source=(local);" +
                                      "Database='YugoNationalBank2';" +
                                      "Integrated Security=SSPI;"))
                {
                    string strYNB = "SELECT SUM(CAST(DepositAmount AS money)), " +
                                    "SUM(CAST(WithdrawalAmount AS money)), " +
                                    "SUM(CAST(ChargeAmount AS money)) FROM " +
                                    "AccountsTransactions WHERE CustomerID = '" +
                                    CustomerID.ToString() + "';";
                    SqlCommand cmdYNB = new SqlCommand(strYNB, cnnYNB);

                    cnnYNB.Open();
                    SqlDataReader rdrTransactions = cmdYNB.ExecuteReader();

                    while (rdrTransactions.Read())
                    {
                        try
                        {
                            Deposits = double.Parse(rdrTransactions[0].ToString());
                        }
                        catch (FormatException)
                        {
                        }

                        try
                        {
                            Withdraws = double.Parse(rdrTransactions[1].ToString());
                        }
                        catch (FormatException)
                        {
                        }

                        try
                        {
                            Charges = double.Parse(rdrTransactions[2].ToString());
                        }
                        catch (FormatException)
                        {
                        }

                        txtTotalDeposits.Text = Deposits.ToString("F");
                        txtTotalWithdrawals.Text = Withdraws.ToString("F");
                        txtTotalCharges.Text = Charges.ToString("F");

                        Balance = Deposits - (Withdraws + Charges);
                        txtBalance.Text = Balance.ToString("F");
                    }
                }
            }
        }
    }
}

111. Return to the form and double-click the Close button

112. Implement its event as follows:

private void btnClose_Click(object sender, EventArgs e)

{

Close();

}

113. Access the Central form, add a button and change its properties as follows:

(Name): btnAccountTransactions

Text: View an Account's Transactions...

114. Double-click the View An Account's Transactions button and implement its event as follows:

private void btnAccountTransactions_Click(object sender, EventArgs e)

{

AccountTransactions transactions = new AccountTransactions();

transactions.ShowDialog();

}

115. Execute the application

116. Open the Account's Transactions form, enter an account number and click Locate

117. Close the forms and return to your programming environment


4.4 Programmatically Creating and Using a View

4.4.1 Creating a View

To programmatically create a view, you use the following SQL syntax:

CREATE VIEW ViewName

AS

SELECT Statement

If you are using Microsoft SQL Server Management Studio, it can generate skeleton code of a view for you.

To use it, first create an empty query window. Display the Template Explorer. In the Template Explorer,

expand the View node. From the View node, drag Create View and drop it in the query window.

The creation of a view starts with the CREATE VIEW expression followed by a name. The name of a view

follows the rules and suggestions we reviewed for view names. After the name of the view, use the AS

keyword to indicate that you are ready to define the view.

Because a view is primarily a SQL statement, it is defined using a SELECT statement, using the same rules

we studied for data analysis. Here is an example of a view:

CREATE VIEW dbo.ListOfMen
AS
SELECT dbo.Sexes.Sex,
       dbo.Persons.FirstName, dbo.Persons.LastName
FROM dbo.Sexes INNER JOIN dbo.Persons
     ON dbo.Sexes.SexID = dbo.Persons.SexID
WHERE (dbo.Sexes.Sex = 'Male');
GO

After creating the SQL statement that defines the view, you must execute the statement. If using a query

window in Microsoft SQL Server Management Studio, you can do this by pressing F5. Once the statement is

executed, its name is automatically added to the Views node of its database even if you do not save its code.

4.4.2 Practical Learning: Programmatically Creating a View

1. Display the Central form and double-click the Close button

2. To create a new view, change the event as follows:

private void btnClose_Click(object sender, EventArgs e)
{
    using (SqlConnection cnnTimesheet =
        new SqlConnection("Data Source=(local);" +
                          "Database='YugoNationalBank9';" +
                          "Integrated Security=SSPI;"))
    {
        string strTimesheet = "CREATE VIEW dbo.Timesheet " +
                              "AS " +
                              "SELECT EmployeeID, StartDate, " +
                              "       TimesheetCode, Week1Monday, " +
                              "       Week1Tuesday, Week1Wednesday, " +
                              "       Week1Thursday, Week1Friday, " +
                              "       Week1Saturday, Week1Sunday, " +
                              "       Week2Monday, Week2Tuesday, " +
                              "       Week2Wednesday, Week2Thursday, " +
                              "       Week2Friday, Week2Saturday, " +
                              "       Week2Sunday, Notes " +
                              "FROM dbo.Timesheets;";
        SqlCommand cmdTimesheet = new SqlCommand(strTimesheet, cnnTimesheet);

        cnnTimesheet.Open();
        cmdTimesheet.ExecuteNonQuery();

        MessageBox.Show("A view named Timesheet has been created.");
    }

    Close();
}

3. Execute the application and click the Close button

4. Click OK

4.4.3 Executing a View

After creating a view, it shares many of the characteristics of a table. For example, a view has its own columns

although the columns are actually tied to the table(s) that hold(s) the original data. Treated as a table, you can

access the columns of a view using a SELECT statement. This means that you can access one, a few, or all of

the columns. Here is an example that accesses all columns of a view:

SELECT PayrollPreparation.* FROM PayrollPreparation;
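For instance, using the dbo.ListOfMen view created earlier in this lesson, a sketch of a statement that retrieves only some of its columns could be:

SELECT FirstName, LastName FROM dbo.ListOfMen;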

4.5 View Maintenance

4.5.1 The Properties of a View

In Transact-SQL, a view is considered an object. As such, it can be viewed, changed, or deleted. Like any

regular object, a view has its own characteristics. To see them in Microsoft SQL Server Management Studio,

you can right-click the view and click Properties. A View Properties dialog box would come up. It can give

you information such as the name of the database the view belongs to, the date the view was created, etc.

4.5.2 Modifying a View

After a view has been created, either by you or someone else, you may find out that it has an unnecessary

column, it needs a missing column, it includes unnecessary records, or some records are missing. Fortunately,

you can change the structure or the code of a view. This is referred to as altering a view. You have various

options:

To visually change a view, in the Object Explorer of Microsoft SQL Server Management Studio, you

can right-click the view and click Design. In the Server Explorer of Microsoft Visual Studio, you can

right-click the view and click Open View Definition.

From the view window, you can add or remove the columns. You can also change any options in one

of the sections of the window. After modifying the view, save it and close it

To change the code of a view, in the Object Explorer of Microsoft SQL Server Management Studio,

right-click it and click Edit. After editing the view's code, you can save it

From the Object Explorer of Microsoft SQL Server Management Studio, you can right-click the view,

position the mouse on Script View As -> ALTER To -> New Query Editor Window


The basic formula to programmatically modify a view is:

ALTER VIEW ViewName

AS

SELECT Statement

You start the alteration with the ALTER VIEW expression followed by the name of the view. After the name

of the view, use the AS keyword to specify that you are ready to show the change. After the AS keyword, you

can then define the view as you see fit. For example, you can create a SELECT statement that includes a

modification of the existing code or a completely new statement.

In the view we created to show a list of men of a table, we included a column for the sex. This column is

useless or redundant because we already know that the list includes only men. Here is an example of altering

the view to remove (or rather omit) the Sex column of the Persons table:

ALTER VIEW dbo.ListOfMen

AS

SELECT dbo.Persons.FirstName, dbo.Persons.LastName

FROM dbo.Sexes INNER JOIN dbo.Persons

ON dbo.Sexes.SexID = dbo.Persons.SexID

WHERE (dbo.Sexes.Sex = 'Male');

4.5.3 Deleting a View

Instead of modifying a view, if you find it altogether useless, you can remove it from its database. You have

various options. To delete a view:

In the Object Explorer in Microsoft SQL Server Management Studio, right-click the name of the view

and click Delete. The Delete Object dialog box would display to give you the opportunity to confirm

your intention or to change your mind

In the Object Explorer in Microsoft SQL Server Management Studio, right-click the view, position the

mouse on Script View As -> DROP To -> New Query Editor Window

In Microsoft SQL Server Management Studio, you can open an empty query window associated with

the database that has the undesired view. From the Template Explorer, in the View node, drag Drop

View and drop it in the query window

In the Server Explorer in Microsoft Visual Studio, under the Views node of the database, you can

right-click the view and click Delete. A message box would display, asking you whether you are sure

you want to delete the view. You can decide to continue or change your mind

The formula to programmatically delete a view is:

DROP VIEW ViewName

On the right side of the DROP VIEW expression, enter the name of the undesired view and execute the

statement. You will not be warned before the interpreter deletes the view. If you are programmatically

creating a Windows application, of course you can use a conditional statement to assist the user with deciding

whether to continue deleting the view or not.
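As a sketch, if the dbo.ListOfMen view created earlier became unnecessary, you could first check that it exists and then drop it (the existence test is only a precaution, not a requirement):

IF OBJECT_ID('dbo.ListOfMen', 'V') IS NOT NULL
    DROP VIEW dbo.ListOfMen;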


4.6 Using a View

4.6.1 Data Entry With a View

As seen so far, a view is a selected list of records from a table. As you may suspect, the easiest view is

probably one created from one table. Imagine you have a table of employees and you want to create a view

that lists only their names. You may create a view as follows:

CREATE VIEW dbo.EmployeesNames

AS

SELECT FirstName,

LastName,

LastName + ', ' + FirstName AS FullName

FROM Persons;

GO

On such a view that is based on one table, you can perform data entry, using the view, rather than the table. To

do this, you follow the same rules we reviewed for table data entry. Here is an example:

INSERT INTO dbo.EmployeesNames(FirstName, LastName)

VALUES('Peter', 'Justice');

If you perform data entry using a view, the data you provide would be entered on the table from which the

view is based. This means that the table would be updated automatically. Based on this feature, you can create

a view purposely intended to update a table so that, in the view, you would include only the columns that need

to be updated.

4.6.2 Practical Learning: Performing Data Entry Using a View

1. Display the Timesheet form and double-click the Submit button

2. To create a new view, change the event as follows:

private void btnSubmit_Click(object sender, EventArgs e)
{
    string strTimeSheet = "";

    // If this is a new record, then create a new time sheet
    if (bNewRecord == true)
    {
        strTimeSheet = "INSERT INTO dbo.Timesheet " +
                       "VALUES('" + txtEmployeeNumber.Text + "', '" +
                       dtpStartDate.Value.ToString("MM/dd/yyyy") + "', '" +
                       strTimesheetCode + "', '" +
                       txtWeek1Monday.Text + "', '" +
                       txtWeek1Tuesday.Text + "', '" +
                       txtWeek1Wednesday.Text + "', '" +
                       txtWeek1Thursday.Text + "', '" +
                       txtWeek1Friday.Text + "', '" +
                       txtWeek1Saturday.Text + "', '" +
                       txtWeek1Sunday.Text + "', '" +
                       txtWeek2Monday.Text + "', '" +
                       txtWeek2Tuesday.Text + "', '" +
                       txtWeek2Wednesday.Text + "', '" +
                       txtWeek2Thursday.Text + "', '" +
                       txtWeek2Friday.Text + "', '" +
                       txtWeek2Saturday.Text + "', '" +
                       txtWeek2Sunday.Text + "', '" +
                       txtNotes.Text + "');";
    }

    // If this is an existing record, then only update it
    if (bNewRecord == false)
    {
        strTimeSheet = "UPDATE dbo.Timesheets SET Week1Monday = '" +
                       txtWeek1Monday.Text + "', Week1Tuesday = '" +
                       txtWeek1Tuesday.Text + "', Week1Wednesday = '" +
                       txtWeek1Wednesday.Text + "', Week1Thursday = '" +
                       txtWeek1Thursday.Text + "', Week1Friday = '" +
                       txtWeek1Friday.Text + "', Week1Saturday = '" +
                       txtWeek1Saturday.Text + "', Week1Sunday = '" +
                       txtWeek1Sunday.Text + "', Week2Monday = '" +
                       txtWeek2Monday.Text + "', Week2Tuesday = '" +
                       txtWeek2Tuesday.Text + "', Week2Wednesday = '" +
                       txtWeek2Wednesday.Text + "', Week2Thursday = '" +
                       txtWeek2Thursday.Text + "', Week2Friday = '" +
                       txtWeek2Friday.Text + "', Week2Saturday = '" +
                       txtWeek2Saturday.Text + "', Week2Sunday = '" +
                       txtWeek2Sunday.Text + "', Notes = '" + txtNotes.Text +
                       "' WHERE TimeSheetCode = '" + strTimesheetCode + "';";
    }

    if (ValidTimesheet == true)
    {
        SqlConnection conTimeSheet =
            new SqlConnection("Data Source=(local);" +
                              "Database='YugoNationalBank9';" +
                              "Integrated Security=true");
        SqlCommand cmdTimeSheet = new SqlCommand(strTimeSheet, conTimeSheet);

        conTimeSheet.Open();
        cmdTimeSheet.ExecuteNonQuery();
        conTimeSheet.Close();

        MessageBox.Show("Your time sheet has been submitted");

        // Reset the timesheet
        txtEmployeeNumber.Text = "";
        dtpStartDate.Value = DateTime.Today;
        btnReset_Click(sender, e);
    }
    else
    {
        MessageBox.Show("The time sheet is not valid\n" +
                        "either you didn't enter a valid employee number, " +
                        "or you didn't select a valid start date\n" +
                        "The time sheet will not be saved");
    }
}

3. Execute the application

4. Open an employee's time sheet and create a few entries
5. Close the form and return to your programming environment

4.6.3 Views and Functions

To create more complex or advanced views, you can involve functions. As always, probably the easiest

functions to use are those built-in. If there is no built-in function that performs the operation you want, you

can create your own. Here is an example:

USE People;
GO
CREATE FUNCTION dbo.GetFullName
(
    @FName varchar(20),
    @LName varchar(20)
)
RETURNS varchar(41)
AS
BEGIN
    RETURN @LName + ', ' + @FName;
END
GO


Once you have a function you want to use, you can call it in the body of your view as you judge it necessary.

Here is an example:

CREATE VIEW dbo.MyPeople

AS

SELECT dbo.GetFullName(FirstName, LastName) AS [Full Name],

dbo.Genders.Gender

FROM Genders INNER JOIN dbo.Persons

ON dbo.Genders.GenderID = dbo.Persons.GenderID;

4.6.4 A View With Alias Names

It is important to know that a view is more of a table type than any other object. This means that a view is not

a function but it can use a function. The word argument here only means that some values can be passed to a

view but these values can be specified only when creating the view. They are not real arguments.

When structuring a view, you can create placeholders for columns and pass them in the parentheses of the

view. This would be done as follows:

CREATE VIEW CarIdentifier([Tag #], Manufacturer, [Type of Car], Available)

. . .

If you use this technique, the names passed in the parentheses of the view are the captions that would be

displayed in place of the columns of the view. This technique allows you to specify the strings of your choice

for the columns. If you want a column header to display the actual name of the column, write it the same.

Otherwise, you can use any string you want for the column. If the name is in one word, you can just type it. If

the name includes various words, include them between an opening square bracket "[" and a closing square

bracket "]".

After listing the necessary strings as the captions of columns, in your SELECT statement of the view, you

must use the exact same number of columns as the number of arguments of the view. In fact, each column of

your SELECT statement should correspond to an argument of the same order.

Here is an example:

CREATE VIEW dbo.MenAndWomen([First Name], [Last Name], Gender)

AS

SELECT dbo.Persons.FirstName,

dbo.Persons.LastName,

dbo.Genders.Gender

FROM dbo.Genders INNER JOIN dbo.Persons

ON dbo.Genders.GenderID = dbo.Persons.GenderID;

GO

Because, as we stated already, a view is not a function and the values passed to the view are not real

arguments, when executing the view, do not specify the names of arguments. Simply create a SELECT

statement and specify the name of the view as the source.
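For instance, a sketch of a statement that executes the dbo.MenAndWomen view created above, referring to its alias names in square brackets, could be:

SELECT [First Name], [Last Name], Gender FROM dbo.MenAndWomen;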


4.6.5 Views and Conditional Statements

Besides its querying characteristics that allow it to perform data analysis, probably the most important feature

of a query is its ability to be as complex as possible by handling conditional statements. This makes it possible

to use a view instead of a table in operations and expressions that would complicate the code or structure of a

table. When creating a view, in its SELECT statement, you can perform column selections, order them, and

set criteria to exclude some records.

For example, the SELECT statement of a view can name only some columns and use a WHERE condition so that only the records that satisfy a criterion are kept.
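A sketch of such a view, reusing the Persons and Genders tables from the earlier examples, follows; the view name and the 'Unknown' value are only illustrative:

CREATE VIEW dbo.IdentifiedPeople
AS
SELECT dbo.Persons.FirstName,
       dbo.Persons.LastName,
       dbo.Genders.Gender
FROM dbo.Genders INNER JOIN dbo.Persons
     ON dbo.Genders.GenderID = dbo.Persons.GenderID
WHERE dbo.Genders.Gender <> 'Unknown';
GO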


5 Stored Procedures

We had an introduction to some types of actions that could be performed on a database. These actions were called functions. SQL provides another type of action called a stored procedure. If you have developed applications in some other languages such as Pascal or Visual Basic, you are probably familiar with the idea of a procedure. Like a function, a stored procedure is used to perform an action on a database.

Introduction

A benefit of stored procedures is that you can centralize data access logic into a single place that is then easy for DBAs to optimize. Stored procedures also have a security benefit in that you can grant execute rights to a stored procedure while the user does not need read/write permissions on the underlying tables. This is a good first step against SQL injection.


5.1 Creating a Stored Procedure

To create a procedure:

In Microsoft SQL Server Management Studio:

o In the Object Explorer, expand the database for which you want to create the procedure,

expand its Programmability node, right-click Stored Procedures, and click New Stored

Procedure... A query window with a skeleton syntax would be displayed. You can then modify

that code using the techniques we will learn in this lesson

o Open an empty query window associated with the database for which you want to create the
stored procedure and display the Template Explorer. In the Template Explorer, expand the
Stored Procedure node. Drag Create Stored Procedure and drop it in the query window

o Open an empty query window associated with the database for which you want to create the

stored procedure and enter the necessary code

In Microsoft Visual Studio, in the Server Explorer, under the database connection, right-click Stored

Procedure and click Add New Stored Procedure. An empty window would open in the Code Editor,

waiting for you to do your thing

In SQL, to create a procedure, you start with the CREATE PROCEDURE expression. You can also use

CREATE PROC. Both expressions produce the same result. Like everything in your database, you must

name your procedure:

The name of a procedure can be any string that follows the rules we reviewed for naming the functions

Refrain from starting the name of a procedure with sp_ because it may conflict with some of the stored

procedures that already ship with Microsoft SQL Server

After the name of the procedure, type the keyword AS. The section, group of words, or group of lines after the

AS keyword is called the body of the procedure. It states what you want the procedure to do or what you want

it to produce.

Based on this, the simplest syntax of creating a procedure is:

CREATE PROCEDURE ProcedureName

AS

Body of the Procedure

You can also start the body of the stored procedure with BEGIN and end it with END. The formula to use

would be:

CREATE PROCEDURE ProcedureName

AS

BEGIN

Body of the Procedure

END

It is important to keep in mind that there are many other issues related to creating a procedure but for now, we

will consider that syntax.
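As a minimal sketch of that syntax, assuming a table named Employees with FirstName and LastName columns, a procedure that simply lists the employees could be written as follows:

CREATE PROCEDURE GetEmployees
AS
BEGIN
    SELECT FirstName, LastName FROM Employees;
END
GO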

After creating the procedure, you must store it as an object in your database. To do this:

1. If you are working in Microsoft SQL Server Management Studio, on the SQL Editor toolbar, you can

click the Execute button. If the code of the procedure is right, a new node and a name for the stored

procedure would be added to the Stored Procedures section of the database


2. If you are working in Microsoft Visual Studio, first save the stored procedure. If it is already closed,

open it from the Server Explorer. Then, you can right-click the Code Editor and click Execute. The

result would show in the Output window

5.2 Managing Procedures

5.2.1 Modifying a Procedure

As a regular SQL Server database object, you can modify a stored procedure without recreating it. To do this:

In Microsoft SQL Server Management Studio:

o In the Object Explorer, you can right-click the procedure and click Modify

o In the Object Explorer, you can right-click the procedure, position the mouse on Script Stored

Procedure As -> ALTER To -> New Query Editor Window

o Open an empty query window associated with the database that contains the stored procedure.

From the Template Explorer, expand Stored Procedure. Drag the Drop Stored Procedure node

and drop it in the empty query window

In Microsoft Visual Studio, in the Server Explorer, under the database connection, under the Stored

Procedures node, double-click the stored procedure or right-click it and click Open

In each case, the code of the stored procedure would open and you can modify it as you see fit. After editing

the code, you can execute the SQL statement to update the stored procedure.

In SQL, the basic formula to modify a stored procedure is:

ALTER PROCEDURE ProcedureName

AS

BEGIN

Body of the Procedure

END
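For instance, assuming the GetStudentIdentification procedure used as an example later in this lesson, a sketch of an alteration that reduces its SELECT statement to two columns could be:

ALTER PROCEDURE GetStudentIdentification
AS
BEGIN
    SELECT FirstName, LastName
    FROM Students
END
GO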

5.2.2 Deleting a Procedure

One of the biggest characteristics of a stored procedure is that it is treated like an object in its own right.

Therefore, after creating it, if you do not need it anymore, you can get rid of it.

There are various types of procedures, some of which are considered temporary. Those types of procedures

delete themselves when not needed anymore, such as when the person who created the procedure disconnects

from the database or shuts down the computer. Otherwise, to delete a procedure, you can use either the Object
Explorer of Microsoft SQL Server Management Studio or the Server Explorer of Microsoft Visual Studio. As

mentioned with tables, even if you create a procedure using the Object Explorer or the Server Explorer, you

can delete it using SQL code.

To delete a stored procedure:

In the Object Explorer of Microsoft SQL Server Management Studio, after expanding the database, its

Programmability, and its Stored Procedure nodes, you can right-click the procedure and click Delete.

You can also click it in the Object Explorer to select it and then press Delete. The Delete Object dialog

box would come up to let you make a decision

In the Server Explorer of Microsoft Visual Studio, you can right-click the stored procedure and click

Delete. A message box would warn and allow you to confirm your decision or change your mind

Pages 151

Page 161: Introduction to Database

To delete a procedure in SQL, the syntax to use is:

DROP PROCEDURE ProcedureName

Of course, you should make sure you are in the right database and also that the ProcedureName exists.
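As a sketch, one common precaution is to test that the procedure exists before dropping it:

IF OBJECT_ID('dbo.GetStudentIdentification', 'P') IS NOT NULL
    DROP PROCEDURE dbo.GetStudentIdentification;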

5.3 Exploring Procedures

5.3.1 Introduction

Probably the simplest procedure you can write would consist of selecting columns from a table. To do this in

Microsoft Visual Studio, in the Code Editor, you can right-click somewhere after the AS operator and click

Insert SQL:

This action would display the Query Builder. From there, you can select the tables of the database and the

desired columns of the table(s). You can then build a SQL expression as you see fit. After building it, you can

click OK. A SQL SELECT expression would be generated for you.

To manually create a SQL expression, you can enter a SELECT expression after AS and apply the techniques

we reviewed for data analysis. For example, to create a stored procedure that would hold a list of students

from a table named Students, you would create the procedure as follows:

CREATE PROCEDURE GetStudentIdentification
AS
BEGIN
    SELECT FirstName, LastName, DateOfBirth, Sex
    FROM Students
END
GO

Besides SELECT operations, in a stored procedure, you can perform any of the database operations we have

applied so far. These include creating and maintaining records, etc.


5.3.2 Practical Learning: Creating a Stored Procedure

1. Open the YugoNationalBank1 application from the previous lesson

2. In the Server Explorer, expand the YugoNationalBank1 connection, right-click its Stored Procedures

node and click Add New Stored Procedure

3. Change the text in the Code Editor as follows:

-- =============================================
-- Author:        FunctionX
-- Creation date: Monday, January 28, 2008
-- Description:   This stored procedure assigns a
--                default password to each employee.
-- =============================================
CREATE PROCEDURE dbo.AssignDefaultPassword
AS
BEGIN
    UPDATE dbo.Employees
    SET Password = 'Password1' FROM dbo.Employees;
END

4. To save the stored procedure, on the Standard toolbar, click the Save button

5.3.3 Executing a Procedure

To get the results of creating a procedure, you must execute it (in other words, to use a stored procedure, you

must call it). To execute a stored procedure in Microsoft Visual Studio:

If the stored procedure is opened in the Code Editor, you can right-click anywhere in the window and

click Execute

In the Server Explorer (even if the stored procedure is displaying its text in the Code Editor), you can

right-click the name of the stored procedure and click Execute

To execute a procedure in SQL, you use the EXECUTE keyword followed by the name of the procedure.

Although there are some other issues related to executing a procedure, for now, we will consider that the

simplest syntax to call a procedure is:

EXECUTE ProcedureName

Alternatively, instead of EXECUTE, you can use the EXEC keyword:

EXEC ProcedureName

For example, if you have a procedure named GetStudentIdentification, to execute it, you would type:

EXECUTE GetStudentIdentification

You can also precede the name of the procedure with its schema, such as dbo. Here is an example:

EXECUTE dbo.GetStudentIdentification;

You can also precede the name of the schema with the name of the database. Here is an example:

EXECUTE ROSH.dbo.GetStudentIdentification;


5.4 Practical Learning: Executing a Stored Procedure

1. (You should open the Employees table from the Server Explorer and verify that the Password fields

are empty or NULL, then close it).

While the contents of the stored procedure is still displaying in the Code Editor, right-click anywhere

in the window and click Execute.

In the Output window, you should receive various lines of code that indicate success:

Running [dbo].[AssignDefaultPassword].

(8 row(s) affected)

(0 row(s) returned)

@RETURN_VALUE = 0

Finished running [dbo].[AssignDefaultPassword].

2. (You should open the Employees table from the Server Explorer and verify that the Password fields

now have Password1 each).

Close the stored procedure tab

5.4.1 Using Expressions and Functions

One of the advantages of using stored procedures is that not only can they produce the same expressions as we

saw during analysis but also they can store such expressions to be recalled any time without having to re-write

them. Based on this, you can create an expression that combines a first and a last name to produce and store a

full name. Here is an example:

CREATE PROCEDURE GetStudentIdentification

AS

BEGIN

SELECT FullName = FirstName + ' ' + LastName,

DateOfBirth, Sex

FROM Students

END

A stored procedure can also call a function in its body. To do this, follow the same rules we reviewed for

calling functions during data analysis. Here is an example of a procedure that calls a function:

USE ROSH;

GO

CREATE PROCEDURE GetStudentsAges

AS

BEGIN

SELECT FullName = FirstName + ' ' + LastName,

DATEDIFF(year, DateOfBirth, GETDATE()) AS Age,

Sex

FROM Students

END


Here is an example of executing the procedure:
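A sketch of such a call, using the procedure just created in the ROSH database, would simply be:

EXECUTE ROSH.dbo.GetStudentsAges;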

5.4.2 Practical Learning: Using Expressions and Functions

1. In the Server Explorer, under the YugoNationalBank1 connection, right-click its Stored Procedures

node and click Add New Stored Procedure

2. Change the text in the Code Editor as follows:

-- =============================================
-- Author:      FunctionX
-- Create date: Friday, May 25, 2007
-- Description: This stored procedure creates a
--              username for each employee.
--              It also assigns an email to the employee.
-- =============================================
CREATE PROCEDURE dbo.CreateUsername
AS
BEGIN
    UPDATE dbo.Employees
    SET Username = LOWER(LEFT(FirstName, 1) + LEFT(LastName, 5))
    FROM dbo.Employees;

    UPDATE dbo.Employees
    SET EmailAddress = LOWER(LEFT(FirstName, 1) + LEFT(LastName, 5)) +
                       '@yugonationalbank.com'
    FROM dbo.Employees;
END

3. To save the stored procedure, on the Standard toolbar, click the Save button

4. To execute the stored procedure, in the Server Explorer and under the YugoNationalBank2 database,

right-click CreateUsername and click Execute

5. Close the stored procedure tab

5.4.3 Introduction to Arguments of a Stored Procedure

Introduction

Like a method of a class in the C# language, and like a function in Transact-SQL, a stored procedure can take

0, 1, 2, or more arguments. An argument allows a procedure to receive values from the code that is accessing

the procedure. For example, if you decide to calculate the age of a student, because there are many students,

you can create a stored procedure that receives a student's date of birth, the procedure can then use that value

and it would produce a number that represents the corresponding age.

All of the procedures we used in the previous sections of this lesson assumed that the values they needed were

already in a table of the database. In some cases, you may need to create a procedure that involves values that

are not part of the database. On such a scenario, for the procedure to carry its assignment, you would supply it

with one or more values.

An external value that is provided to a stored procedure is called a parameter. When you create a procedure,

you must also create the parameter if you judge it necessary.

When it comes to arguments, most, if not all, of the rules used in the methods of a class also apply to a

procedure. For example, when you execute a procedure that takes one or more arguments, you must provide a

value for each argument.

Passing Arguments

To create a procedure that takes an argument, type the formula CREATE PROCEDURE or CREATE

PROC followed by the name of the procedure. Then, type the name of an argument starting with the @

symbol. The parameter is created like a column of a table. That is, a parameter must have a name, a data type

and an optional length depending on the type. Here is the syntax you would use:

CREATE PROCEDURE ProcedureName
@ParameterName DataType
AS
BEGIN
    Body of the Procedure
END


When implementing the procedure, you can define what you want to do with the parameter(s), in the body of

the procedure. One way you can use a parameter is to run a query whose factor the user would provide. For

example, imagine you want to create a procedure that, whenever executed, would be supplied with a student's

sex, then it would display the list of students of that sex. Since you want the user to specify the sex of students

to display, you can create a procedure that receives the sex. Here is an example:

CREATE PROC dbo.GetListOfStudentsBySex

@sx VARCHAR(12)

AS

SELECT FirstName, LastName,

DateOfBirth, HomePhone, Sex

FROM Students

WHERE Sex = @sx

5.4.4 Executing an Argumentative Stored Procedure

In Microsoft Visual Studio

As seen in previous sections, to call a stored procedure in Microsoft Visual Studio, you can right-click it in the

Server Explorer and click Execute. If the procedure takes at least one argument, you must supply it. If you are

working visually, a dialog box would come up to allow you to provide a value for the argument. Here is an

example after right-clicking the above stored procedure and clicking Execute:

As you can see, you can enter the desired value in the Value column, then click OK. The result would then

appear in the Output window:

Pages 157

Page 167: Introduction to Database

In the same way, if you execute a stored procedure that takes more than one argument, in the Run Stored

Procedure dialog box, specify the desired but right value for each argument. The value must be of the

appropriate type.

5.4.5 In SQL

As mentioned already, when executing a procedure that takes a parameter, make sure you provide a value for

the parameter. The syntax used is:

EXEC ProcedureName ParameterValue

If the parameter is Boolean or numeric, make sure you provide an appropriate numeric value. If the parameter

is a character or a string, type its value in single-quotes. Here is an example:

EXEC ROSH.dbo.GetListOfStudentsBySex 'Male';


When you execute it, the result lists only the male students. Notice that we could have omitted the Sex column from the SELECT statement since its value is already implied to the user.

Another type of stored procedure can be made to take more than one parameter. In this case, create the

parameters in the section before the AS keyword. Separate the parameters by a comma. The syntax you would

use is:

CREATE PROCEDURE ProcedureName

@ParameterName1 DataType, @ParameterName2 DataType, @ParameterName_n DataType

AS

Body of the Procedure


Here is an example:

CREATE PROCEDURE IdentifyStudentsByState
    @Gdr varchar(20),
    @StateOrProvince char(2)
AS
BEGIN
    SELECT FullName = LastName + ', ' + FirstName,
           DATEDIFF(year, DateOfBirth, GETDATE()) AS Age,
           Sex
    FROM Students
    WHERE (Sex = @Gdr) AND (State = @StateOrProvince)
END

When calling a procedure that takes more than one parameter, you must still provide a value for each

parameter but you have two alternatives. The simplest technique consists of providing a value for each

parameter in the exact order they appear in the procedure. Here is an example:

USE ROSH;

GO

EXEC ROSH.dbo.IdentifyStudentsByState 'Female', 'MD';

GO

This would produce the list of female students from Maryland.


Alternatively, you can provide the value for each parameter in the order of your choice. Consider the

following procedure that takes 3 arguments:

CREATE PROCEDURE IdentifySomeStudents
    @Gdr varchar(20),
    @StateOrProvince char(2),
    @HomeStatus bit
AS
BEGIN
    SELECT FullName = LastName + ', ' + FirstName,
           DATEDIFF(year, DateOfBirth, GETDATE()) AS Age,
           Sex
    FROM Students
    WHERE (Sex = @Gdr) AND
          (State = @StateOrProvince) AND
          (SPHome = @HomeStatus)
END

If you visually execute this type of procedure, in the Run Stored Procedure dialog box, enter the desired but appropriate value under the Value column corresponding to each argument, then click OK to see the result.


If you are programmatically executing this type of procedure, you can type the name of each parameter and

assign it the corresponding value. Here is an example:

EXEC IdentifySomeStudents @HomeStatus=1, @StateOrProvince='MD', @Gdr='Female';

5.4.6 Default Arguments

Imagine you create a database for a department store and a table that holds the list of items sold in the store:


Suppose you have filled the table with a few items as follows:

ItemNumber  ItemCategoryID  ItemName                              ItemSize  UnitPrice
264850      2               Long-Sleeve Jersey Dress              Petite    39.95
930405      4               Solid Crewneck Tee                    Medium    12.95
293004      1               Cotton Comfort Open Bottom Pant       XLarge    17.85
924515      1               Hooded Full-Zip Sweatshirt            S         69.95
405945      3               Plaid Pinpoint Dress Shirt            22 35-36  35.85
294936      2               Cool-Dry Soft Cup Bra                 36D       15.55
294545      2               Ladies Hooded Sweatshirt              Medium    45.75
820465      2               Cotton Knit Blazer                    M         295.95
294694      2               Denim Blazer - Natural Brown          Large     75.85
924094      3               Texture-Striped Pleated Dress Pants   44x30     32.85
359405      3               Iron-Free Pleated Khaki Pants         32x32     39.95
192004      3               Sunglasses                                      15.85


Imagine you want to create a mechanism of calculating the price of an item after a discount has been applied

to it. Such a procedure can be created as follows:

CREATE PROCEDURE dbo.CalculateNetPrice

@discount Decimal

AS

SELECT ItemName, UnitPrice - (UnitPrice * @discount / 100)

FROM StoreItems

This can be executed as follows:
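For instance, a sketch of a call that applies a 15 percent discount (the value is only an illustration) would be:

EXEC dbo.CalculateNetPrice 15;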


If you are planning to create a procedure that takes an argument and you know that the argument will likely

have the same value most of the time, you can provide that value as a parameter but leave a room for other

values of that argument. A value given to an argument is referred to as default. What this implies is that, when

the user calls that stored procedure, if the user does not provide a value for the argument, the SQL interpreter

would use the default value.

To create a stored procedure that takes an argument that carries a default value, after declaring the value, on

its right side, type = followed by the desired value. Here is an example applied to the above database:

CREATE PROCEDURE dbo.CalculateDiscountedPrice

@discount decimal = 10.00

AS

SELECT ItemName, UnitPrice - (UnitPrice * @discount / 100)

FROM StoreItems;

When executing a procedure that takes a default argument, you do not have to provide a value for the

argument if the default value suits you. If you are executing the procedure in Microsoft Visual Studio, when

the Run Stored Procedure dialog box comes up, you can leave the Value with the <DEFAULT> option:
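In SQL, a sketch of the equivalent call simply omits the argument so that the default 10.00 discount is applied:

EXEC dbo.CalculateDiscountedPrice;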


Based on this, the above procedure call would list each item with its price reduced by the default 10 percent discount.

If the default value does not apply to your current calculation, you can provide a value for the argument.

You can create a procedure that takes more than one argument with default values. To provide a default value

for each argument, after declaring it, type the desired value to its right side. Here is an example of a procedure

that takes two arguments, each with a default value:

CREATE PROCEDURE dbo.CalculateSalePrice

@Discount Decimal = 20.00,

@TaxRate Decimal = 7.75

AS

SELECT ItemName As [Item Description],

UnitPrice As [Marked Price],

UnitPrice * @Discount / 100 As [Discount Amt],

UnitPrice - (UnitPrice * @Discount / 100) As [After Discount],

UnitPrice * @TaxRate / 100 As [Tax Amount],

(UnitPrice * @TaxRate / 100) + UnitPrice -

(UnitPrice * @Discount / 100) + (@TaxRate / 100) As [Net Price]

FROM StoreItems;

RETURN


Here is an example of executing the procedure:
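A sketch of a call that provides a value for both arguments (the 25.00 and 5.75 values are only illustrations) would be:

EXEC dbo.CalculateSalePrice 25.00, 5.75;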

When calling a procedure that takes more than one argument with all arguments having default values, you do

not need to provide a value for each argument, you can provide a value for only one or some of the arguments.

The above procedure can be called with one argument as follows:

EXEC CalculateSalePrice 55.00

In this case, the other argument uses its default value. We saw that, when calling a procedure that takes more than one argument, you do not have to provide the values in the exact order in which the arguments appear in the procedure; you can type the name of each argument and assign it the desired value. In the same way, if a procedure takes more than one argument and some of the arguments have default values, you can provide the values in the order of your choice by typing the name of each argument and assigning it the desired value. Based on this, the above procedure can be called with only the value of the second argument as follows:

EXEC CalculateSalePrice @TaxRate = 8.55

In this case, the first argument would use its default value.
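You can also pass both arguments by name, in whatever order you prefer; the values below are only illustrative:

EXEC dbo.CalculateSalePrice @TaxRate = 8.55, @Discount = 15.00;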

5.4.7 Output Parameters

Many programming languages, including C#, support passing an argument by reference. Such an argument is passed to a procedure, but its real purpose is to return a value. Transact-SQL uses the same technique. In other words, you can create a stored procedure that takes a parameter whose purpose is to carry a new value when the procedure ends, so that you can use that value as you see fit.

To create a parameter that returns a value from a stored procedure, declare the procedure's regular input arguments (if any) as usual after the procedure name. In addition, declare at least one parameter whose name starts with the @ symbol, specify its data type, and type the OUTPUT keyword on its right. Based on this, the basic syntax you can use is:

CREATE PROCEDURE ProcedureName

@ParameterName DataType OUTPUT

AS

Body of the Procedure

or

CREATE PROCEDURE ProcedureName

@ParameterName DataType OUTPUT

AS

BEGIN

Body of the Procedure

END

In the body of the stored procedure, you can perform the assignment as you see fit. The primary rule is that, before the end of the procedure, you must have assigned a value to the OUTPUT argument; that value is what the procedure returns. Here is an example:

CREATE PROCEDURE dbo.CreateFullName

@FName varchar(20),

@LName varchar(20),

@FullName varchar(42) OUTPUT

AS

SELECT @FullName = @LName + ', ' + @FName

When calling the procedure, you must pass an argument for the OUTPUT parameter and, once again, type OUTPUT on the right side of that argument. Remember that the procedure returns its result through this argument; after the call, you can read the OUTPUT argument and use it as you see fit. Here is an example:

DECLARE @FirstName varchar(20),

@LastName varchar(20),

@Full varchar(42)

SET @FirstName = 'Melanie';

SET @LastName = 'Johanssen';

EXECUTE dbo.CreateFullName @FirstName, @LastName, @Full OUTPUT

SELECT @Full;
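Since @LastName holds 'Johanssen' and @FirstName holds 'Melanie', the final SELECT displays Johanssen, Melanie.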


One of the advantages of using a function or a stored procedure is that it has access to the tables and records of

its database. This means that you can access the columns and records as long as you specify the table or the

view, which is done with a FROM clause associated with a SELECT statement. Consider the following

stored procedure created in a database that contains a table named Students:

USE ROSH;

GO

CREATE PROCEDURE ShowStudentsFullNames

@FullName varchar(42) OUTPUT

AS

SELECT @FullName = LastName + ', ' + FirstName FROM Students;

GO

When you execute this procedure, it works on the records of the table. One particularity of a procedure that takes an OUTPUT argument is that it can return only one value. Consider the following example of executing the above procedure:
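A call similar to the earlier one can be used; the variable name @StudentName is only illustrative:

DECLARE @StudentName varchar(42);
EXEC ShowStudentsFullNames @StudentName OUTPUT;
SELECT @StudentName AS [Full Name];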

When calling such a procedure, if you do not specify a condition that narrows the result to a single row, the SQL interpreter assigns the value produced by the last record it processes. This means that you should always make sure that a procedure that takes an OUTPUT parameter has a way to isolate a single result. If the procedure processes a SELECT statement, you can use a WHERE condition. Here is an example of such a procedure:

USE ROSH;

GO

ALTER PROCEDURE ShowStudentsFullNames

@FullName varchar(42) OUTPUT

AS

SELECT @FullName = LastName + ', ' + FirstName FROM Students

WHERE StudentID = 8;

GO

When this procedure is executed, it returns the full name of the student whose StudentID is 8.
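If you want the caller to choose the student, you can combine a regular input parameter with the OUTPUT parameter. The following sketch assumes the same Students table; the procedure name GetStudentFullName is only illustrative:

USE ROSH;
GO
CREATE PROCEDURE GetStudentFullName
    @StudentID int,
    @FullName  varchar(42) OUTPUT
AS
    -- Assign the concatenated name of the requested student to the OUTPUT parameter
    SELECT @FullName = LastName + ', ' + FirstName
    FROM   Students
    WHERE  StudentID = @StudentID;
GO
-- Example call; the ID value 8 is illustrative:
DECLARE @Name varchar(42);
EXEC GetStudentFullName @StudentID = 8, @FullName = @Name OUTPUT;
SELECT @Name AS [Full Name];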


References:

Websites
http://www.yevol.com/en/vcsharp/databasedesign/Lesson40.htm
http://www.ucl.ac.uk/archaeology/cisp/database/manual/node1.html
http://webcache.googleusercontent.com/search?q=cache:1MKqsSHXUIoJ:www.cs.umb.edu/cs630/hd1.pdf+&cd=4&hl=en&ct=clnk&gl=kh
http://www.studytonight.com/dbms/database-normalization.php
https://support.microsoft.com/en-us/kb/283878/
http://holowczak.com/database-normalization/
http://www.1keydata.com/database-normalization/

Books
Introducing Microsoft SQL Server 2012. Microsoft Press, Redmond, WA, 2012.
Carlos Coronel, Steven Morris, and Peter Rob. Database Systems: Design, Implementation, and Management, Ninth Edition. Cengage Learning, 2010.
