
Partial values in a tabular database model


Volume 9, number 2 INFORMATION PROCESSING LETTERS 17 August 1979

PARTIAL VALUES IN A TABULAR DATABASE MODEL

John GRANT
Computer and Information Sciences, University of Florida, Gainesville, FL 32611, U.S.A.

Received 22 August 1978; revised version received 12 June 1979

Partial values, relational database

In a previous paper, [3], we discussed null values in a relational database. The purpose of this paper is to indicate how partial values may be introduced in the framework of a relational database. In a forthcoming paper, [4], we investigate further the handling of incomplete information. We also refer the reader to [5] for important results and additional references about incomplete information in databases.

We motivate our discussion by considering a situation where one of the attributes in a relation is AGE. We would expect a data entry for AGE to be a single number. If the value of AGE is unknown, the relational model allows an entry called the null value to indicate this situation. However, no allowance is made for the case where the value of AGE is known partially (say it is between 30 and 40).

One of the fundamental properties of the relational model of data is that all the entries in a table are (nondecomposable) atomic elements [2, p. 76]. We need to modify this property but we try to keep the other features of the relational model and continue to deal strictly with tables. In the relational model, given a table with $n$ columns whose domains are $D_1, D_2, \ldots, D_n$ respectively, an $n$-tuple $\langle a_1, \ldots, a_n \rangle$ may be a row if $a_i \in D_i$, $1 \le i \le n$. Now assume that for each $i$, $1 \le i \le n$, $P_i$ is a set of (nonempty) subsets of $D_i$. Then in our tabular model an $n$-tuple $\langle A_1, \ldots, A_n \rangle$ may be a row if $A_i \in P_i$, $1 \le i \le n$. If $P_i$ = (the set of singletons of $D_i$) $\cup$ $\{D_i\}$, then we get back to the relational model with a singleton representing the corresponding element. The null value in column $i$ is represented by $D_i$.

For our applications in this paper we use numerical entries. It seems reasonable to restrict partial values to intervals, where an interval can be represented by its endpoints. Thus for each $i$, $D_i = Z$ and $P_i$ = (the set of singletons) $\cup$ (the set of intervals) $\cup$ $\{Z\}$. There are now three types of data entries: a single number $n$ representing the singleton $\{n\}$, a pair of numbers $(i, j)$ where $i < j$ representing the interval $\{n \in Z \mid i \le n \le j\}$, and $Z$ (usually called the null value).

In dealing with partial values it is useful to define the concept of the range, or the set of possible values of an entry, a row, and a table (a column is a special kind of table). The range of an entry is the set represented by the entry and is denoted by $a'$ (for entry $a$).

Definition 1. (i) Let $r$ be a row: $r = \langle a_1, \ldots, a_n \rangle$. Then $r' = a_1' \times \cdots \times a_n'$.

(ii) Let $V$ be a table (set of rows): $V = \{r_1, \ldots, r_k\}$. Then $V' = \{\{s_1, \ldots, s_k\} \mid s_i \in r_i',\ i = 1, \ldots, k\}$.
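
Definition 1 can be illustrated with a minimal Python sketch. The encoding is an assumption made for illustration and is not part of the paper: a singleton is an `int`, an interval is a pair `(i, j)` with both endpoints included, the null value is `None`, and a small finite `DOMAIN` stands in for $Z$ so that ranges can be enumerated.

```python
from itertools import product

# Assumption: a small finite stand-in for the domain Z, so ranges are enumerable.
DOMAIN = range(0, 11)

def entry_range(entry):
    """Range a' of an entry: the set of values the entry may stand for."""
    if entry is None:                 # null value: any element of the domain
        return set(DOMAIN)
    if isinstance(entry, tuple):      # interval (i, j), endpoints included
        lo, hi = entry
        return set(range(lo, hi + 1))
    return {entry}                    # single number n stands for {n}

def row_range(row):
    """Range r' of a row: the Cartesian product of its entry ranges."""
    return set(product(*(entry_range(e) for e in row)))

def table_range(table):
    """Range V' of a table: every set of rows obtained by picking one
    possible row from the range of each row of the table."""
    return {frozenset(choice)
            for choice in product(*(row_range(r) for r in table))}
```

For instance, `row_range((3, (4, 5)))` yields the two complete rows `(3, 4)` and `(3, 5)`.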

For each predicate $P$ over tables $U_1, \ldots, U_n$ we can define the corresponding true-predicate (the case where the predicate's value is definitely true) and the corresponding maybe-predicate (the case where the predicate's value is true or unknown). We use the subscripts T and M for this purpose. Rows and columns are special kinds of tables.

Definition 2. $P$ is a predicate over tables $V_1, \ldots, V_n$:

(i) $P_T(V_1, \ldots, V_n)$ if for all $U_1 \in V_1', \ldots, U_n \in V_n'$, $P(U_1, \ldots, U_n)$,

(ii) $P_M(V_1, \ldots, V_n)$ if there exist $U_1 \in V_1', \ldots, U_n \in V_n'$ such that $P(U_1, \ldots, U_n)$.
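
Restricted to single entries, Definition 2 amounts to a universal and an existential check over the ranges. The sketch below is a brute-force illustration under assumptions of our own: the helper names `holds_true` and `holds_maybe`, the finite `DOMAIN` standing in for $Z$, and the `int` / pair / `None` encoding of entries are all hypothetical.

```python
from itertools import product
import operator

# Assumption: a small finite stand-in for the domain Z.
DOMAIN = range(0, 11)

def entry_range(entry):
    """Set of values an entry may stand for (None = null, pair = interval)."""
    if entry is None:
        return set(DOMAIN)
    if isinstance(entry, tuple):
        lo, hi = entry
        return set(range(lo, hi + 1))
    return {entry}

def holds_true(pred, *entries):
    """P_T: pred holds for every choice of values from the entry ranges."""
    return all(pred(*vals)
               for vals in product(*(entry_range(e) for e in entries)))

def holds_maybe(pred, *entries):
    """P_M: pred holds for at least one choice of values from the ranges."""
    return any(pred(*vals)
               for vals in product(*(entry_range(e) for e in entries)))

x, y = (3, 6), (6, 9)
print(holds_true(operator.le, x, y))   # True: every value of x is <= every value of y
print(holds_true(operator.lt, x, y))   # False: 6 < 6 fails
print(holds_maybe(operator.eq, x, y))  # True: both may be 6
```

Enumeration is only practical for small ranges; it is used here because it mirrors the quantifiers of the definition directly.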


We obtain the following general properties for true-predicates and maybe-predicates.

Proposition 1. Let $P$ and $Q$ be predicates:

(i) $(\text{not } P)_T$ iff $P_M$ does not hold; $(\text{not } P)_M$ iff $P_T$ does not hold,

(ii) $(P \text{ and } Q)_T$ iff $P_T$ and $Q_T$; $(P \text{ or } Q)_M$ iff $P_M$ or $Q_M$,

(iii) the statements $(P \text{ and } Q)_M$ iff $P_M$ and $Q_M$, $(P \text{ or } Q)_T$ iff $P_T$ or $Q_T$ are not necessarily true.

The following is a counterexample to the statements in (iii): $r$ is a row of a table consisting of one column, $r = \langle (3,7) \rangle$, $P$: $a = 5$, $Q$: not $a = 5$. Here $P_M$ and $Q_M$ both hold but $(P \text{ and } Q)_M$ does not, and $(P \text{ or } Q)_T$ holds although neither $P_T$ nor $Q_T$ does. It follows from Proposition 1 that $a \le_T b$ is not necessarily the same as $a <_T b$ or $a =_T b$. In particular if $a = (3,6)$ and $b = (6,9)$, then $a \le_T b$ but neither $a <_T b$ nor $a =_T b$. Proposition 1 also implies that $P_T$ and $P_M$ cannot be defined by starting with atomic predicates and applying the connectives in the usual way.

If we restrict Definition 2 to rows and the atomic predicates $=$ and $<$, we obtain the following result.

Proposition 2. Let $a$ and $b$ be entries:

(i) $a =_T b$ iff $a$ and $b$ are singleton entries and $a = b$; $a =_M b$ iff $a' \cap b' \ne \emptyset$,

(ii) $a <_T b$ iff $\sup a' < \inf b'$; $a <_M b$ iff $\inf a' < \sup b'$.

We next apply Definition 2 to the subset relation.

Definition 3. Let $A$ and $B$ be columns:

(i) $A \subseteq_T B$ if for all $A_1 \in A'$ and $B_1 \in B'$, $A_1 \subseteq B_1$,

(ii) $A \subseteq_M B$ if there exist $A_1 \in A'$ and $B_1 \in B'$ such that $A_1 \subseteq B_1$.

Consider now the example on page 26 of [1] where, using our notation, $R = \{Z, 1\}$, $S = \{Z, 1, 2\}$. In [1] $S \subseteq R$ is given the truth-value false. However the null value, $Z$, can be any value; it can be 2. Using the substitution 2 for $Z$ in both $R$ and $S$ to obtain $R_1$ and $S_1$ we find that $S_1 \subseteq R_1$. Thus $S \subseteq_M R$, so that $S \subseteq R$ should be given the truth-value maybe instead of false.

Now we consider dependency statements. The most important kind of dependency is functional dependency, which is used to obtain third normal form for a relational database. We deal with the simplest form of such dependency containing two columns only, but the discussion can be extended to the case of many columns. Our database model is not even in first normal form since partial (nonatomic) values are allowed.

Definition 4. $V$ is a table two of whose columns are $A$ and $B$; $r_1$ and $r_2$ are arbitrary rows of $V$ with values $a_1$, $a_2$ for $A$ and $b_1$, $b_2$ for $B$ respectively. For any two rows of $V$ either $a_1 = a_2$ or not $a_1 =_M a_2$.

(i) $A \to_1 B$ if: $A$ and $B$ have single entries only and if $a_1 = a_2$, then $b_1 = b_2$,

(ii) $A \to_2 B$ if: $A$ has single entries only and if $a_1 = a_2$, then $b_1 = b_2$,

(iii) $A \to_3 B$ if: $B$ has single entries only and if $a_1 = a_2$, then $b_1 = b_2$,

(iv) $A \to_4 B$ if: if $a_1 = a_2$, then $b_1 = b_2$.

$A \to_1 B$ is functional dependency in our model since it means that an element of $A$ determines an element of $B$. We may interpret the other three dependencies as follows: $A \to_2 B$ means that an element of $A$ determines a range of $B$; $A \to_3 B$ means that a range of $A$ determines an element of $B$; $A \to_4 B$ means that a range of $A$ determines a range of $B$. Note that $\to_2$ seems similar to multivalued dependency. But $A$ multidetermines $B$ means that a specific set of elements of $B$ is associated with an element of $A$. By associating a partial value of $B$ with an element of $A$ we mean that one element of $B$ is associated with an element of $A$ but we do not know which element it is, although we know that it falls in a certain range. For an EMPLOYEE table where AGE is known only within a certain range we may have EMPLOYEE-NUMBER $\to_2$ AGE.

Dependencies $\to_3$ and $\to_4$ are different in that the determining column may have partial values. The following example of $\to_4$ seems reasonable. Assume that a table INSURANCE is given including columns for AGE and PREMIUM. We may have a range of AGE determining a range of PREMIUM so that for example if AGE is (21,30), then PREMIUM is (100,120). We obtain AGE $\to_4$ PREMIUM. We feel that such a dependency should be considered in deciding on database structure.

When a relational database is modified to change its structure to a better normal form, it is decomposed into more relations. Now consider a situation where the table is in a form analogous to second normal form with attributes $A$, $B$, $C$ where $A \to_1 B$, $A \to_2 C$, $B \to_2 C$ and only column $C$ contains partial values. We recommend the decomposition of this table into two tables with columns $A$, $B$ and $B$, $C$ just as if we had $A \to B$, $A \to C$, $B \to C$ in a relation. Otherwise we run into the insertion, deletion, and updating anomalies characterising second normal form.
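
The four dependencies of Definition 4 can be checked mechanically on a two-column table. The following is a minimal sketch, not the paper's procedure: the function `dependency_kinds`, the encoding of entries (`int` for a single number, a pair for an interval with endpoints included, `None` for the null value) and every row other than AGE (21,30) with PREMIUM (100,120) are assumptions made for illustration.

```python
import math

def bounds(entry):
    """Lower and upper end of an entry's range."""
    if entry is None:                      # null value: unbounded
        return (-math.inf, math.inf)
    if isinstance(entry, tuple):           # interval (i, j)
        return entry
    return (entry, entry)                  # single number

def is_single(entry):
    return isinstance(entry, int)

def maybe_equal(a, b):
    """a =_M b: the two ranges share at least one value."""
    (alo, ahi), (blo, bhi) = bounds(a), bounds(b)
    return alo <= bhi and blo <= ahi

def dependency_kinds(rows):
    """rows: list of (a, b) entry pairs for columns A and B.
    Returns the set of i in {1, 2, 3, 4} for which A ->_i B holds."""
    # Precondition of Definition 4: A entries are identical or disjoint.
    for a1, _ in rows:
        for a2, _ in rows:
            if a1 != a2 and maybe_equal(a1, a2):
                raise ValueError("overlapping but non-identical A entries")
    # Common condition: equal A entries carry equal B entries.
    if not all(b1 == b2 for a1, b1 in rows for a2, b2 in rows if a1 == a2):
        return set()
    a_single = all(is_single(a) for a, _ in rows)
    b_single = all(is_single(b) for _, b in rows)
    kinds = {4}
    if a_single:
        kinds.add(2)
    if b_single:
        kinds.add(3)
    if a_single and b_single:
        kinds.add(1)
    return kinds

# Hypothetical INSURANCE rows (AGE, PREMIUM); only the first pair is from the text.
insurance = [((21, 30), (100, 120)), ((31, 40), (120, 150)), ((21, 30), (100, 120))]
# Hypothetical EMPLOYEE rows (EMPLOYEE-NUMBER, AGE).
employee = [(1, (30, 40)), (2, 25)]

print(dependency_kinds(insurance))  # only the range-to-range dependency holds
print(dependency_kinds(employee))   # kinds 2 and 4 hold
```

Under this reading the four kinds are nested: whenever $\to_1$ holds so do the other three, and $\to_4$ holds whenever any of them does.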

Decomposition for a relational database is done in such a way that no information is lost: the result of joining the decomposed relations is the original rela- tion. We now show that this can be done for a table with partial values.

Definition 5. $U$ and $V$ are tables with columns $A_1, \ldots, A_n$ and $B_1, \ldots, B_m$ respectively. Then JOIN $U$ and $V$ over $A_n$ and $B_1$ = $\{\langle a_1, \ldots, a_n, b_2, \ldots, b_m \rangle \mid \langle a_1, \ldots, a_n \rangle \in U,\ \langle b_1, \ldots, b_m \rangle \in V,\ a_n = b_1\}$.

We note that this JOIN operation is just one of several possibilities. We deal with it here because it allows the recovery of the original relation after decomposition. This is due to the assumption in Definition 4 that any column mentioned on the left side of a dependency must have identical or disjoint entries. The decomposition of a table is done by the projection of the table over specified columns.

Proposition 3. $U$ is a table with columns $A$, $B$, $C$. Assume that $A \to_i B$, $A \to_j C$, $B \to_k C$ with $i, j, k \in \{1, 2, 3, 4\}$. If $U$ is decomposed along $B$ into two tables $U_1$ (columns $A$ and $B$) and $U_2$ (columns $B$ and $C$), then JOIN $U_1$ and $U_2$ over $B$ and $B$ = $U$.
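
Proposition 3 can be tried out on a small instance. The sketch below assumes that rows are Python tuples, that tables are sets of rows, and that entries are compared by identity of their representation, as in Definition 5; the helper names `project` and `join` and the sample table are hypothetical.

```python
def project(table, columns):
    """Projection over the given column positions; duplicate rows collapse."""
    return {tuple(row[c] for c in columns) for row in table}

def join(u, v):
    """JOIN of Definition 5: pair rows whose last entry in u is identical
    to the first entry in v, keeping the shared column once."""
    return {ru + rv[1:] for ru in u for rv in v if ru[-1] == rv[0]}

# Hypothetical table with columns A, B, C; only C holds partial values,
# so that A ->_1 B, A ->_2 C and B ->_2 C.
U = {
    (1, 10, (30, 40)),
    (2, 10, (30, 40)),
    (3, 20, 55),
}

U1 = project(U, (0, 1))   # columns A, B
U2 = project(U, (1, 2))   # columns B, C

print(join(U1, U2) == U)  # True: the decomposition loses no information
```

The comparison `ru[-1] == rv[0]` is plain entry identity, which is sufficient here because the joining column $B$ has identical or disjoint entries.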

In Definition 4 and the subsequent discussion we assume that for any two rows of $U$ either $a_1 = a_2$ or not $a_1 =_M a_2$. The reason is that we do not want to allow a situation where $A$ determines $B$ with (1,5) and (3,10) as entries in the $A$ column and 6 and 7 the corresponding entries in the $B$ column.

Another approach to this problem allows $a_1 =_M a_2$ and not $a_1 = a_2$ with the assumption that if $A \to_i B$, $i \in \{2, 4\}$ and $a_1 =_M a_2$, then $b_1 =_M b_2$. Recall now our general definition where the $i$-th column has domain $D_i$, $P_i$ is a set of nonempty subsets of $D_i$ and if $\langle A_1, \ldots, A_n \rangle$ is a row, then $A_i \in P_i$, $1 \le i \le n$. In this framework a dependency $D_i \to D_j$ is a function $f : P_i \to P_j$ which preserves union and intersection, i.e. $f(A_i \cup B_i) = A_j \cup B_j$ (if $A_i \cup B_i \in P_i$) and $f(A_i \cap B_i) = A_j \cap B_j$ (if $A_i \cap B_i \in P_i$).

We conclude by pointing out that the introduction of partial values considerably complicates the handling of the database. It has the virtue however of allowing the storage and retrieval of incompletely specified information. Going back to our second paragraph, if the partial value for the AGE of a particular EMPLOYEE is (30, 40) and a retrieval request is made for all employees whose AGE < 50 (an instance of the predicate $<_T$), this employee would be included.
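
The retrieval request can be sketched with the closed forms of Proposition 2, which compare only the ends of the ranges. Everything concrete below is an assumption for illustration: the employee numbers, all ages other than (30, 40), and the encoding of entries as `int`, pair, or `None` for the null value.

```python
import math

def bounds(entry):
    """(inf a', sup a') for an entry a."""
    if entry is None:                      # null value: unbounded range
        return (-math.inf, math.inf)
    if isinstance(entry, tuple):           # interval, endpoints included
        return entry
    return (entry, entry)                  # single number

def less_true(a, b):
    """a <_T b iff sup a' < inf b'."""
    return bounds(a)[1] < bounds(b)[0]

def less_maybe(a, b):
    """a <_M b iff inf a' < sup b'."""
    return bounds(a)[0] < bounds(b)[1]

# Hypothetical EMPLOYEE data: employee number -> AGE entry.
ages = {101: (30, 40), 102: 62, 103: (45, 55), 104: None}

definitely_under_50 = [n for n, age in ages.items() if less_true(age, 50)]
possibly_under_50 = [n for n, age in ages.items() if less_maybe(age, 50)]

print(definitely_under_50)  # [101]
print(possibly_under_50)    # [101, 103, 104]
```

A query answered with $<_T$ returns only rows that qualify whatever the unknown values turn out to be, while $<_M$ also returns rows that might qualify.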

Acknowledgment

I wish to thank the referees for very valuable comments and suggestions.

References

[1] E.F. Codd, Understanding relations, FDT Bull. ACM-SIGMOD 7 (3,4) (1975) 23-28.

[2] C.J. Date, An Introduction to Database Systems (Addison-Wesley, Reading, MA, 2nd ed., 1977).

[3] J. Grant, Null values in a relational database, Information Processing Lett. 6 (1977) 156-157.

[4] J. Grant, Incomplete information in a relational database, Fund. Informaticae, to appear.

[5] W. Lipski, Jr., On semantic issues connected with incomplete information databases, ACM Trans. Database Systems, to appear.
