
    Algorithms for analysing related constraint business rules

    Gaihua Fu a,*, Jianhua Shao a, Suzanne M. Embury b, W. Alex Gray a

a Department of Computer Science, Cardiff University, Newport Road, Cardiff CF24 3XF, UK
b Department of Computer Science, University of Manchester, Oxford Road, Manchester M13 9PL, UK

    Available online 7 February 2004

    Abstract

Constraints represent a class of business rules that describe the conditions under which an organisation operates. It is common for organisations to implement a large number of constraints in their supporting information systems. To remain competitive in today's ever-changing business environment, organisations increasingly recognise the need to evolve the implemented constraints in a timely and correct manner. While many techniques have been proposed to assist constraint specification and enforcement in information systems, little has been done so far to help constraint evolution. In this paper, we introduce a form of constraint analysis that is particularly geared towards constraint evolution. More specifically, we propose several algorithms for determining which constraints collectively restrict a specified set of business objects, and we study their performance. Since the constraints contained in an information system are typically large in number and tend to be fragmented during implementation, this type of analysis is desirable and valuable in the process of their evolution.

© 2004 Elsevier B.V. All rights reserved.

    Keywords: Constraints; Business rules; Constraint analysis

    1. Introduction

Constraints are commonly employed in information systems to ensure that the data stored and produced by the systems are valid and represent the intended semantics of the real world [15,17,25]. Such constraints are typically derived from business requirements, and can therefore be thought of as representing a class of business rules that describe the conditions under which an

    * Corresponding author. Tel.: +44-29-20874000x7329; fax: +44-29-20874598.

E-mail addresses: [email protected] (G. Fu), [email protected] (J. Shao), [email protected] (S.M. Embury), [email protected] (W.A. Gray).

0169-023X/$ - see front matter © 2004 Elsevier B.V. All rights reserved.
doi:10.1016/j.datak.2004.01.006

    www.elsevier.com/locate/datak

Data & Knowledge Engineering 50 (2004) 215–240


organisation operates [18,26]. For example, one constraint might define the criteria that make a customer eligible for a certain kind of discount, which corresponds closely to the organisation's policy on customer discounts.

Constraints are typically very volatile, perhaps due to their business origin. New restrictions are frequently imposed on organisations as a result of changes to statutory regulations, government policy and economic/market conditions. Many organisations also choose to modify their business rules on an on-going basis, in order to remain competitive in today's ever-changing business environment. These changes frequently trigger the need to update the constraints implemented in the supporting information systems.

However, updating an implemented set of constraints is not easy. Typically, a constraint begins its life as a high-level (natural language) statement of some condition or policy that the organisation wishes to enforce, and then gets fragmented and dispersed throughout the system in its implementation. For example, a constraint might state that all customers whose total business within the last year is greater than 5000 are eligible for a 5% discount on all orders. The code relating to its implementation might need to be inserted into the application programs and the database management system which manage: the creation of new customer orders; the modification of existing orders; the maintenance of information on current discounts; and the archiving of historical customer data. If the mapping from the high-level constraint statement to the software artefacts were accurately documented and maintained, then it would not be very difficult to update this constraint. Unfortunately, many constraints are introduced into an information system as and when they are needed. Even today, constraints are rarely documented comprehensively during system development, and any documentation that is created quickly becomes out of date and unreliable.

Therefore, system engineers are often forced to sift through the entire system in order to identify the constraints that are affected by a change made at the high level. This is undesirable because:

First, an information system typically implements a large number of constraints. For example, Herbst et al. reported the existence of 627 business rules in a 12,000-line COBOL application [19], and a similar result was shown in [10], where 809 business rules were found in 30,000 lines of COBOL code. Many of these rules are constraints. The large number of constraints makes it a very time-consuming process for the system engineer to manually identify the subset of constraints that is actually relevant to the task in hand.

Second, without an accurate knowledge of how the statement is mapped to the corresponding software artefacts, trying to determine manually which implemented constraints are affected by the change made to the high-level statement during evolution becomes a rather error-prone process, particularly when the set of implemented constraints is large.

Thus, there is a need for techniques which can help system engineers to locate and piece together the relevant constraints at a higher level automatically. While many theories and techniques have been proposed to assist constraint specification and enforcement in information systems, little has been done so far to help their evolution. In this paper, we introduce a form of constraint analysis that supports constraint evolution. More specifically, we assume that the set of constraints implemented in an information system has already been identified or recovered, either from available documentation, from domain experts or through reverse engineering [10,12], and we propose several algorithms for determining which subset of the constraints collectively restricts


the state of a specified set of business objects. This is based on our observation that the majority of constraints are defined in terms of business concepts or objects. For example, our example constraint on customer discounts is stated in terms of business concepts such as customer, discount and order. Thus, by using business objects as glue, we can help the system engineer to locate a coherent set of constraints that is relevant to the evolution task in hand.

The paper is organised as follows. Section 2 introduces some basic concepts and notations used in the paper. The algorithms for determining which constraints collectively restrict a specified set of business objects are given in Section 3. In Section 4 we study the performance of the proposed algorithms. We show that our algorithms have polynomial complexity and can therefore scale to non-trivial applications. Conclusions and future work are given in Section 5.

    2. Preliminaries

In this section we introduce some basic notations and concepts. We define constraints in terms of structures and briefly describe the formalism that we use to express our constraints.

Definition 1. A structure defines an intension for a set of data. A structure can be primitive or composite:

a primitive structure defines an intension for a set of atomic data, and is denoted by r, the name of the structure;

a composite structure defines an intension for a set of composite data, and is denoted by r(r1, r2, …, rn), where r is the name of the structure and each ri (1 ≤ i ≤ n) is a component of r and is a structure itself.

So, in contrast to the majority of work on constraint analysis for relational database systems, which assumes a flat structure [7,23], we allow a structure to be nested. This is necessary because the nature of constraint evolution is such that we often need to work with constraints that are recovered from legacy systems. These systems, such as COBOL or IDMS, typically do not assume flat structures in their constraint definitions and implementations. Allowing nested structures is also useful for analysing constraints that are specified for complex objects in modern information systems [27].

Example 1. The following is a structure that a mobile phone service provider uses to record orders received from its customers:

order(customer(id, name, status),
      service(network, freeTime),
      recommender(id, name, status))

Here, order, customer, service and recommender are composite structures, and id, name, status, network and freeTime are primitive structures.


Given a structure S, we say that S has a depth k if it has k levels of nested components. For example, in Example 1, order has depth 2, customer has depth 1 and id has depth 0. We require a structure to be acyclic; that is, a structure is not allowed to be a component of itself.

Definition 2. Each structure has a domain. The domain of a primitive structure is a set of values from which the structure draws its instances. The domain of a composite structure is the Cartesian product of its components' domains. Let S be a structure and Dom(S) be its domain. The state of S at time t, denoted by S^t, is the subset of Dom(S) that S has as its instances at time t. A constraint on S is a statement that specifies the criteria that a valid S^t must satisfy.

For example, for the structure customer, if a constraint states that each customer must have a unique ID, then customer^t = {(c101, Kelly, current), (c102, Frank, current)} is a valid state and customer^t = {(c101, Frank, current), (c101, Kelly, current)} is not.
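A state-validity check of this kind is mechanical; a minimal Python sketch, assuming states are encoded as sets of (id, name, status) tuples (an encoding of ours, not the paper's):

```python
def unique_id(state):
    """A customer state is valid under the unique-ID constraint
    iff no id value appears in more than one instance."""
    ids = [instance[0] for instance in state]
    return len(ids) == len(set(ids))

valid   = {("c101", "Kelly", "current"), ("c102", "Frank", "current")}
invalid = {("c101", "Frank", "current"), ("c101", "Kelly", "current")}

print(unique_id(valid))    # True
print(unique_id(invalid))  # False
```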

One important factor that affects constraint analysis is the formalism in which the constraints are represented [6]. We express our constraints in a formalism based on predicate logic [14]. Unlike other logic-based constraint languages [2,29], however, our formalism is designed with two restrictions. First, only a (small) set of pre-defined predicates may be used. Second, we use meta-level elements in predicates. These restrictions are introduced to ensure that our formalism supports efficient constraint analysis and at the same time is expressive enough to capture the semantics of various constraints implemented in information systems. Example 2 below gives examples of constraints expressed in our formalism, which we will use throughout the paper to illustrate the proposed algorithms. The reader is referred to Appendix A for the syntax and semantics, and to [14] for a detailed discussion of the formalism.

Example 2. The following constraints are enforced by the mobile phone service provider that we introduced in Example 1:

C1. The company uses the following networks: Vodafone, Orange, O2 and T-Mobile.
C2. The company offers two categories of free talk time: 300 and 600 min.
C3. Only Vodafone customers are entitled to 600 min of free talk time.
C4. The maximum number of O2 services available for ordering is 10,000.
C5. The status of a customer is one of the following: current, temporary or historic.
C6. A recommender can recommend at most three customers.
C7. A customer can subscribe to up to three services.
C8. A recommender must be an existing customer.

Based on the structure given in Example 1, these constraints are expressed in our formalism as follows:

C1. ENUMERATE(network, {Vodafone, Orange, O2, T-Mobile})
C2. ENUMERATE(freeTime, {300, 600})


C3. EQUAL(service.freeTime, 600) → EQUAL(service.network, Vodafone)
C4. EQUAL(order.service.network, O2) → ATMOST(order.customer, 10,000)
C5. ENUMERATE(status, {current, temporary, historic})
C6. ATMOST(recommender, 3, customer)
C7. ATMOST(customer, 3, service)
C8. SUBSUME(customer, recommender)

Here, ENUMERATE, EQUAL, ATMOST and SUBSUME are all pre-defined predicates. The terms used in the predicates, e.g. network and freeTime, are meta-level elements that represent structures rather than instances.

To express nested structures in our predicates, we introduce the concept of path expression.

Definition 3. A path expression has the form Sm.….Si+1.Si.….S0, where each Si is a structure name and the dot stands for the component-of relationship between the structures appearing on either side of it. Given a path expression Sm.….Si+1.Si.….S0, we call S0 the target and Sm the source.

For example, service.freeTime in C3 is an example of a path expression. Constraints expressed in our formalism are considered to restrict the target structures of path expressions. For example, C3 in Example 2 is considered to restrict freeTime and network, because both appear as targets in its path expressions. Since a structure can be a component of more than one structure, a full path expression is often necessary to avoid ambiguity. For example, status is a component of both customer and recommender according to Example 1. If a constraint applies only to the status of customer, then customer must be used to qualify status in the path expression. However, if a constraint applies to a structure regardless of its higher-level structures, then a full path expression is unnecessary. For example, C5 applies to the status of both customer and recommender, so the parent structures are omitted.

    3. Determining related constraints

Constraint analysis has been studied extensively in the literature (see Section 3.4 for a discussion of related work). In this section, we introduce a form of analysis that has not been considered previously: determining which set of constraints collectively restricts the state of a specified set of business objects. We call such a set of constraints related constraints. In the following, we first explain precisely what we mean by related constraints and outline the difficulties associated with their derivation. We then introduce three algorithms for deriving them.

3.1. SC-graphs and related constraints

To illustrate what we mean by related constraints, we first introduce the concept of an SC-graph.


Definition 4. An SC-graph, G = ⟨S ∪ C, SS ∪ CS⟩, is defined as follows. The vertex set is the union of S and C, where S is a set of structures and C is a set of constraints. The arc set is the union of SS and CS. An ss-arc in SS has the form (Si, Sj), where Si, Sj ∈ S, and it indicates that Sj is a component of Si. A cs-arc in CS has the form (Ci, Sj), where Ci ∈ C and Sj ∈ S, and it specifies that Ci constrains Sj.

Given a set of constraints expressed in our formalism, we construct a corresponding SC-graph as follows. We assume that structure definitions are available, and S and SS are derived directly from those definitions. Each constraint in the given set becomes a vertex in C, and a cs-arc (Ci, Sj) is added to the graph if Sj is the target of a path expression in Ci. For example, the SC-graph for the constraints introduced in Example 2 is shown in Fig. 1, where a rectangular vertex stands for a structure and is labelled with the structure name, a round vertex stands for a constraint and is labelled with the constraint identifier, an arc connecting two rectangular vertices represents an ss-arc, and an arc connecting a round vertex to a rectangular vertex represents a cs-arc.

From an SC-graph, three kinds of relatedness can be identified. We say that:

A constraint Ci is directly related to a structure Sj if Ci is connected to Sj in the SC-graph. For example, C5 is directly related to status according to Fig. 1.

A constraint Ci is indirectly related to a structure Sj if Ci is connected to any component of Sj. For example, C5 is indirectly related to customer and recommender, since status is a component of both customer and recommender.

A constraint is implicitly related to a structure if it does not exist in C, but can be deduced from C. For example, from the constraints given in Example 2, we can infer a new constraint:

C9. ATMOST(order.recommender, 9, order.service)

This constraint is implicitly related to recommender, service and order.

Fig. 1. The SC-graph for Example 2.


It is worth noting the need to consider both indirectly and implicitly related constraints in our analysis. For a nested structure S, we consider the constraints declared for its components to be related to S too (as Dom(S) is derived from the domains of its components), and hence we identify indirectly related constraints. Implicit constraints, on the other hand, will typically not be found in the actual implementation, since they are automatically enforced by the constraints that imply them. However, to help complete system engineers' knowledge about all the constraints that are effectively enforced in the system, making implicit constraints explicitly available to them during constraint evolution is useful.

    Based on the above observations, we define related constraints as follows.

Definition 5. Given an SC-graph G = ⟨S ∪ C, SS ∪ CS⟩, a structure Sj ∈ S and a depth requirement n, a set of related constraints for Sj within depth n, denoted by Rel⟨Sj, n⟩, consists of the following: ¹

(1) constraints that are directly related to Sj,
(2) constraints that are indirectly related to Sj within depth n,
(3) constraints that are deducible from C and satisfy (1) or (2).

For example, for customer, we have Rel⟨customer, 0⟩ = {C4, C6, C7, C8} and Rel⟨customer, 1⟩ = {C4, C5, C6, C7, C8}, assuming no implicitly related constraints are derivable from Fig. 1.
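Over the explicitly stated constraints, Rel⟨Sj, n⟩ can be read directly off the SC-graph once each cs-arc records how far the structure is from a path-expression target; the sketch below hard-codes our transcription of the weighted arcs for Example 2 (weight 0 means the structure is a target, weight w that it is the wth structure above a target):

```python
# (structure, weight) pairs per constraint, after full path expansion.
CS_ARCS = {
    "C1": {("network", 0), ("service", 1), ("order", 2)},
    "C2": {("freeTime", 0), ("service", 1), ("order", 2)},
    "C3": {("freeTime", 0), ("network", 0), ("service", 1), ("order", 2)},
    "C4": {("network", 0), ("service", 1), ("order", 2),
           ("customer", 0), ("order", 1)},
    "C5": {("status", 0), ("customer", 1), ("recommender", 1), ("order", 2)},
    "C6": {("recommender", 0), ("customer", 0), ("order", 1)},
    "C7": {("customer", 0), ("service", 0), ("order", 1)},
    "C8": {("customer", 0), ("recommender", 0), ("order", 1)},
}

def rel(structure, n):
    """Constraints directly (w = 0) or indirectly (w <= n) related to structure."""
    return sorted(c for c, arcs in CS_ARCS.items()
                  if any(s == structure and w <= n for s, w in arcs))

print(rel("customer", 0))  # ['C4', 'C6', 'C7', 'C8']
print(rel("customer", 1))  # ['C4', 'C5', 'C6', 'C7', 'C8']
```

This covers clauses (1) and (2) of Definition 5 only; clause (3), the deducible constraints, is what the algorithms of Section 3.3 add.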

    3.2. Difficulties of deriving related constraints

Deriving Rel⟨Sj, n⟩ is not a straightforward traversal of an SC-graph. One of the difficulties arises from the fact that a structure can be a component of several structures; thus, a constraint connected to a component reachable from a structure Si may not actually be related to Si. Consider the following two constraints, for example:

Ci. ENUMERATE(customer.status, {e1, e2})
Cj. ENUMERATE(recommender.status, {e3, e4})

According to Section 3.1, both Ci and Cj are connected to status in the SC-graph. However, if we are interested in deriving Rel⟨customer, 1⟩, then Cj must not be returned, since it applies to the status of recommender only. We must therefore ensure that we traverse an SC-graph correctly with respect to Sj and n.

Second, since a path expression may not be fully specified (as discussed in Section 2), in deducing implicit constraints we need to consider term compatibility. For example, suppose that we have the following two constraints:

SUBSUME(P1, P2) and SUBSUME(P3, P4)

¹ This definition can be generalised to include multiple structures as discussed in [13]: given S1, …, Sk ∈ S and a set of depth requirements n1, …, nk, the set of related constraints for S1, …, Sk, denoted by Rel(⟨S1, n1⟩, …, ⟨Sk, nk⟩), is the intersection of Rel⟨S1, n1⟩, …, Rel⟨Sk, nk⟩.


and the axiom

SUBSUME(X, Y) ∧ SUBSUME(Y, Z) → SUBSUME(X, Z)

Whether SUBSUME(P1, P4) is deducible from the above two constraints depends on the formation of P2 and P3. If they are identical, then SUBSUME(P1, P4) is deducible. However, if they are not, this does not necessarily mean that SUBSUME(P1, P4) is not deducible. This is rather different from performing logical deduction in the conventional sense, and we must address the problem of term compatibility when dealing with path expressions in deduction.

Finally, since not every deducible constraint is actually related to ⟨Sj, n⟩, it is desirable that only the constraints that are related to the given Sj and n are deduced. For example, if we are interested in deriving Rel⟨customer, 1⟩, then although C9 (see Section 3.1) is deducible from the constraints given in Example 2, it should not be generated, since it does not qualify as a member of Rel⟨customer, 1⟩. Given the large sets of constraints we typically need to handle in evolving real-world applications, how to derive Rel⟨Sj, n⟩ efficiently is a challenge.

    3.3. Algorithms for deriving related constraints

We now introduce algorithms for computing Rel⟨Sj, n⟩. We assume that the constraints are already expressed in our formalism, and that the corresponding SC-graph has already been constructed. All the algorithms are based on a combination of graph traversal and logical deduction, and perform, in essence, the following three functions:

Normalisation. Constraints may not have their path expressions fully specified, and they are only connected to the structures to which they are directly related. Normalisation completes the path expressions by adding into them the omitted higher-level structures, and completes the SC-graph by inserting additional cs-arcs that connect structures to their indirectly related constraints.

Deduction. This deduces implicit constraints from the given ones. How deduction is performed depends on the type of constraint in hand and the axioms applicable. For constraints expressible in our formalism, we consider two groups of axioms in deduction: transitivity axioms and inheritance axioms. A transitivity axiom has the form

Pred(X, Y) ∧ Pred(Y, Z) → Pred(X, Z)

where Pred is a predicate that has the transitivity property. For example, the following is a transitivity axiom:

SUBSUME(X, Y) ∧ SUBSUME(Y, Z) → SUBSUME(X, Z)

An inheritance axiom has the form

SUBSUME(X, Y) ∧ Pred(X, Z) → Pred(Y, Z)

where Pred is any predicate. For example, the following is an inheritance axiom:

SUBSUME(X, Y) ∧ ATMOST(X, n) → ATMOST(Y, n)
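Both groups of axioms can be applied by iterating to a fixed point over a set of ground predicate facts; the following sketch (our tuple encoding; the paper's actual deduction operates on path expressions) derives an ATMOST fact for recommender from C8 and C7 via the inheritance axiom:

```python
def apply_axioms(facts):
    """Close a set of (Pred, X, Y) facts under the transitivity axiom for
    SUBSUME and the inheritance axiom SUBSUME(X,Y) and Pred(X,Z) -> Pred(Y,Z)."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        subsumes = [(x, y) for p, x, y in facts if p == "SUBSUME"]
        new = set()
        # Transitivity: SUBSUME(X,Y) and SUBSUME(Y,Z) -> SUBSUME(X,Z)
        for x, y in subsumes:
            for y2, z in subsumes:
                if y == y2:
                    new.add(("SUBSUME", x, z))
        # Inheritance: SUBSUME(X,Y) and Pred(X,Z) -> Pred(Y,Z)
        for x, y in subsumes:
            for p, x2, z in list(facts):
                if x2 == x and p != "SUBSUME":
                    new.add((p, y, z))
        if not new <= facts:
            facts |= new
            changed = True
    return facts

facts = apply_axioms({
    ("SUBSUME", "customer", "recommender"),  # C8
    ("ATMOST", "customer", "3 services"),    # C7, second argument abbreviated
})
print(("ATMOST", "recommender", "3 services") in facts)  # True
```

The derived fact reads: since every recommender is a customer (C8), a recommender inherits the subscription limit stated for customers (C7).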


It is possible to have other types of axiom. For example, the axiom

ONEOF(P, {P1, …, Pm}) ⇒ SUBSUME(P, P1) ∨ ⋯ ∨ SUBSUME(P, Pm)

is neither a transitivity nor an inheritance axiom. In this paper, we consider deduction using transitivity and inheritance axioms only. The set of axioms we use is given in Appendix B.

Identification. This traverses an SC-graph to retrieve the constraints that satisfy Rel⟨Si, n⟩. How the SC-graph is traversed to derive related constraints for ⟨Si, n⟩ depends on how Normalisation and Deduction are performed. This will be explained later.

It is important to note that the three functions outlined above are not necessarily performed in the sequence given. For example, we can perform Normalisation as part of Identification. In the following sections, we introduce three algorithms, each of which performs these functions in a different way.

    3.3.1. The CNCD algorithm

An intuitive approach to computing Rel⟨Sj, n⟩ from an SC-graph is to completely expand the SC-graph first, and then identify the constraints that satisfy the given criteria. That is, we augment the initial SC-graph with new cs-arcs and new constraints that are derivable from the given set of constraints, so that indirect and implicit constraints are explicitly connected to the structures they relate to. This is then followed by a straightforward identification of Rel⟨Sj, n⟩.

The Complete Normalisation, Complete Deduction or CNCD algorithm we introduce in this section is based on this approach. Given an SC-graph G, a set of axioms A = {A1, A2, …, Ak} and the relatedness criteria ⟨Sj, n⟩, the CNCD algorithm works as follows:

Algorithm CNCD(G, A, Sj, n)
1. G ← NormaliseGraph(G)
2. for every Ai ∈ A
3.   if Ai is a transitivity axiom
4.     G ← CNCD-DeduceByTransitivity(G, Ai)
5.   else
6.     G ← CNCD-DeduceByInheritance(G, Ai)
7. Rel ← IdentifyRelated(G, Sj, n)
8. return Rel

Normalising an SC-graph (line 1) is straightforward. For every path expression P in a constraint Ck in G, NormaliseGraph expands P with all the higher-level structures that have been omitted from it, and then connects Ck to every structure in the expanded P. To indicate how closely a constraint is related to a structure, we assign each cs-arc a weight w. An arc from Ck to Sj has a weight w if Sj is the wth structure from the target in a path expression in Ck. For example, for C7 in Example 2, we have (C7, customer, 0), (C7, service, 0) and (C7, order, 1) after normalisation.

It is worth noting that when a path expression in a constraint is expanded to its full extent, it may be necessary to create multiple copies of the constraint. For example, when the path expression status in C5 is expanded, we need to create the following:


ENUMERATE(order.customer.status, {current, temporary, historic})
ENUMERATE(order.recommender.status, {current, temporary, historic})

This is necessary because status has multiple higher-level structures. Fig. 2 shows the normalised version of the SC-graph given in Fig. 1.

Following Normalisation, the CNCD algorithm proceeds to perform Deduction (lines 2–6).

Since the axioms in the same group are handled similarly, we essentially have two deductive functions to perform: one for handling transitivity axioms (CNCD-DeduceByTransitivity), and the other for inheritance axioms (CNCD-DeduceByInheritance). Before we describe these two functions, observe the following first.

To deduce a constraint using a transitivity axiom, say deducing EQUAL(P1, Pm) using EQUAL(X, Y) ∧ EQUAL(Y, Z) → EQUAL(X, Z), it is necessary that the following series exists:

EQUAL(P1, P2)
EQUAL(P2, P3)
…
EQUAL(Pm−1, Pm)

On the other hand, to deduce a constraint using an inheritance axiom, say deducing ATMOST(Pm, n) using SUBSUME(X, Y) ∧ ATMOST(X, n) → ATMOST(Y, n), we must have the following:

SUBSUME(P1, P2)
…
SUBSUME(Pm−1, Pm)
ATMOST(P1, n)

Fig. 2. The SC-graph for Example 2 after normalisation.


It is easy to see that the two types of axiom are handled rather similarly: both essentially involve a reachability computation (from P1 to Pm) [7,24], and they differ only in the actual generation of the derived constraint.² For this reason, we will only describe how transitivity axioms are handled in this paper; the reader is referred to [13] for the handling of inheritance axioms.

The following function illustrates how implicit constraints are derived using a transitivity axiom, Ai. Given a predicate, say EQUAL(P1, P2), ComputeClosure performs a reachability computation and finds all the path expressions that are reachable from the left path expression in the predicate (P1 in EQUAL(P1, P2)). Since path expressions have been fully expanded by NormaliseGraph, deduction can be carried out here without having to resolve term compatibility. Once the closure is obtained, new constraints are generated and added to G by GenerateConstraint.

Function CNCD-DeduceByTransitivity(G, Ai)
1. for every constraint Ck ∈ G
2.   if Ai applies to Ck
3.     P ← Ck's left path expression
4.     closure ← ComputeClosure(P, Ai, G)
5.     G ← G ∪ GenerateConstraint(closure, Ck)
6. return G

For example, Axiom 2 in Appendix B applies to constraint C6 in Example 2. Using the ComputeClosure function, we obtain closure = {order.customer, order.service}, starting from the left path expression order.recommender (after normalisation) in C6. GenerateConstraint then generates the following new constraint and adds it to G accordingly:

C9. ATMOST(order.recommender, 9, order.service)

After the SC-graph is fully expanded, Rel⟨Sj, n⟩ may be identified according to the following: a constraint Ck ∈ G belongs to Rel⟨Sj, n⟩ if and only if there exists a cs-arc (Ck, Sj, w) ∈ G and w ≤ n. This is performed by IdentifyRelated (line 7 in CNCD) and is a straightforward traversal of the graph. For example, to derive Rel⟨customer, 1⟩, we retrieve the constraints that are connected to customer with a weight less than or equal to 1, and we find C4, C5, C6, C7 and C8. Note that the newly deduced C9 is not retrieved, because it is not relevant to Rel⟨customer, 1⟩.

    3.3.2. The CNPD algorithm

The CNCD algorithm is easy to understand, but may not be efficient. For applications with a large number of constraints, attempting to completely expand an SC-graph can be both time- and space-consuming, and may not scale. In this section, we introduce the Complete Normalisation, Partial Deduction or CNPD algorithm for computing Rel⟨Sj, n⟩. The idea is that we still expand the SC-graph with new cs-arcs that are derivable from path expressions, but we try to deduce only the new constraints that are related to ⟨Sj, n⟩. The algorithm, which takes the same input as CNCD, is given below.

² The EQUAL predicate has the symmetry property (EQUAL(P1, P2) ↔ EQUAL(P2, P1)) that may be exploited in reachability computation. However, as our focus in this paper is to consider how transitivity axioms may be handled in general, we will not utilise this property in our algorithms.


Algorithm CNPD(G, A, Sj, n)
1. G ← NormaliseGraph(G)
2. Rel ← IdentifyRelated(G, Sj, n)
3. for every Ai ∈ A
4.   if Ai is a transitivity axiom
5.     Rel ← CNPD-DeduceByTransitivity(G, Ai, Sj, n, Rel)
6.   else
7.     Rel ← CNPD-DeduceByInheritance(G, Ai, Sj, n, Rel)
8. return Rel

Here we perform Deduction after Identification. This means that our deduction must now return only the constraints that are actually related to ⟨Sj, n⟩. Before we describe how this is done, observe the following first. If a deduced constraint Ck, say EQUAL(P1, Pm), is to be related to ⟨Sj, n⟩, then Sj must appear either in P1 or in Pm, and must appear in a position that is no more than n places away from the target. That is, Ck must be in either of the following forms in order for it to be related to ⟨Sj, n⟩ (where j ≤ n):

Ck1. EQUAL(P1, Sh.….Sj.….S0)
Ck2. EQUAL(Sh.….Sj.….S0, Pm)

    Now, ifCk1 is to be deduced from other constraints using the relevant transitivity axiom, then thefollowing series must exist:

    EQUALP1; P2

    EQUALP2; P3

    EQUALPm1; Sh Sj S0

    Similarly, to deduce Ck2 , the following must exist:

    EQUALSh Sj S0; P2

    EQUALP2; P3

    EQUALPm1; Pm

    This suggests that to deduce implicit constraints for hSj; ni, we can start with a constraint Ckthatis directly or indirectly related to hSj; ni, then perform reachability computation. That is, given aconstraint Ck, say EQUALPL; PR, ifPL is in the form ofSh Sj S0, we search for all the pathexpressions that are reachable from PL (we call it left reachability computation); ifPR is in the form

    ofSh Sj S0, we search for all the path expressions that can reach PR (we call it right reach-ability computation); if both conditions are satisfied, then both are performed. The function givenbelow is based on this observation, where subfunctions Member and Index are used to test the

    position ofSj in PL and PR.
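A minimal sketch of the two subfunctions, assuming path expressions are encoded as dot-separated strings (our own encoding, not the paper's graph representation):

```python
def member(sj, path_expr):
    # True iff structure sj occurs in the (dot-separated) path expression.
    return sj in path_expr.split(".")

def index(sj, path_expr):
    # Number of places between sj and the target (last) structure.
    parts = path_expr.split(".")
    return len(parts) - 1 - parts.index(sj)

print(member("recommender", "order.recommender"))  # True
print(index("recommender", "order.recommender"))   # 0 -- sj is the target
print(index("order", "order.recommender"))         # 1
```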


    Function CNPD-DeduceByTransitivity(G, Ai, Sj, n, Rel)
    1. for every Ck ∈ Rel
    2.   if Ai applies to Ck
    3.     PL ← Ck's left path expression
    4.     if Member(Sj, PL) ∧ Index(Sj, PL) ≤ n
    5.       lclosure ← CNPD-ComputeLeftClosure(PL, Ai, G)
    6.       Rel ← Rel ∪ CNPD-GenerateLeftConstraint(lclosure, Ck)
    7.     PR ← Ck's right path expression
    8.     if Member(Sj, PR) ∧ Index(Sj, PR) ≤ n
    9.       rclosure ← CNPD-ComputeRightClosure(PR, Ai, G)
    10.      Rel ← Rel ∪ CNPD-GenerateRightConstraint(rclosure, Ck)
    11. return Rel

    Again, since all the path expressions are normalised, there is no need to consider term compatibility in CNPD. Note that CNPD performs reachability computation only for the constraints identified by IdentifyRelated, rather than for every constraint in the original set. This ensures that only the set of constraints related to ⟨Sj, n⟩ is deduced by CNPD-DeduceByTransitivity.

    For example, to derive Rel⟨recommender, 1⟩ using CNPD, we first obtain {C5, C6}, the constraints directly or indirectly related to ⟨recommender, 1⟩, through Normalisation and Identification. We then apply the above function to the set {C5, C6}, and deduce the following (through the only qualifying path expression in this case, the left path expression in C6):

    C9 = ATMOST(order.recommender, 9, order.service)

    So, Rel⟨recommender, 1⟩ = {C5, C6, C9}. Note that if the requirement was to compute Rel⟨customer, 1⟩, for example, then C9 would not be deduced at all using CNPD.
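The reachability computation used here is an ordinary transitive closure over the binary constraints to which the axiom applies. A hedged sketch, encoding each such constraint as a (left, right) pair of path expressions:

```python
def compute_left_closure(p, pairs):
    # All path expressions reachable from p by chaining (left, right) pairs.
    closure, frontier = set(), [p]
    while frontier:
        cur = frontier.pop()
        for left, right in pairs:
            if left == cur and right not in closure:
                closure.add(right)
                frontier.append(right)
    return closure

# e.g. EQUAL(P1, P2) and EQUAL(P2, P3) make P2 and P3 reachable from P1
pairs = [("P1", "P2"), ("P2", "P3")]
print(sorted(compute_left_closure("P1", pairs)))  # ['P2', 'P3']
```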

    3.3.3. The PNPD algorithm

    The CNPD algorithm normalises path expressions for all constraints. This helps deduction. However, the side-effect of performing path expression expansion is that the number of constraints may increase (see Section 3.3.1). This could affect performance, because deduction is heavily dependent on the number of constraints to be handled [9]. So rather than expanding path expressions for all constraints, our final algorithm, the Partial Normalisation, Partial Deduction or PNPD algorithm, derives Rel⟨S, n⟩ without expanding the initial SC-graph.

    Since the given graph is not normalised, the PNPD algorithm, as shown below, effectively has to perform Normalisation as part of Identification and Deduction.

    Algorithm PNPD(G, A, Sj, n)
    1. Rel ← PNPD-IdentifyRelated(G, Sj, n)
    2. for every Ai ∈ A
    3.   if Ai is a transitivity axiom
    4.     Rel ← PNPD-DeduceByTransitivity(G, Ai, Sj, n, Rel)
    5.   else
    6.     Rel ← PNPD-DeduceByInheritance(G, Ai, Sj, n, Rel)
    7. return Rel


    The PNPD-IdentifyRelated function recursively drills down, in a depth-first manner, to the components that are reachable from Sj within depth n, in order to identify the constraints that are related to ⟨Sj, n⟩. This is performed according to the following criteria:

    (1) If Sj appears in a path expression P in Ck, and Sj appears in such a position that it is no more than n places from the target, then Ck ∈ Rel⟨Sj, n⟩.

    (2) If Sj does not appear in a path expression P in Ck, but the source of P is reachable from Sj within depth n − l, where l is the number of structures in P, then Ck ∈ Rel⟨Sj, n⟩.

    For example, to derive Rel⟨recommender, 1⟩, we examine all the constraints in Example 2. The only constraint that satisfies (1) above is C6, and the only constraint that satisfies (2) is C5. By performing PNPD-IdentifyRelated, we derive {C5, C6} for ⟨recommender, 1⟩.

    The deduction performed by PNPD is similar to that of CNPD. However, since path expressions are unnormalised here, the following must be observed. First, for any constraint Ck ∈ Rel, we perform reachability computation if and only if it satisfies the two conditions given above. Second, we must check term compatibility during deduction. For example, whether EQUAL(P1, P4) is deducible from EQUAL(P1, P2) and EQUAL(P3, P4) depends on whether P2 is compatible with P3. The possible compatibility relationships between two path expressions are shown in Table 1. As we can see, two path expressions are compatible if and only if they are identical or one is a suffix of the other.

    The function given below is performed by PNPD to deduce implicit constraints, and is largely identical to the one given in Section 3.3.2. The only difference is that we must now perform TestRelevance according to the two conditions given earlier (instead of relying on normalised path expressions), and we must perform a term compatibility test in both PNPD-ComputeRightClosure and PNPD-ComputeLeftClosure according to Table 1.

    Function PNPD-DeduceByTransitivity(G, Ai, Sj, n, Rel)
    1. for every Ck ∈ Rel
    2.   if Ai applies to Ck
    3.     PL ← Ck's left path expression
    4.     if TestRelevance(Ck, PL, Sj, n) = TRUE
    5.       lclosure ← PNPD-ComputeLeftClosure(PL, Ai, G)
    6.       Rel ← Rel ∪ PNPD-GenerateLeftConstraint(lclosure, Ck)
    7.     PR ← Ck's right path expression
    8.     if TestRelevance(Ck, PR, Sj, n) = TRUE
    9.       rclosure ← PNPD-ComputeRightClosure(PR, Ai, G)
    10.      Rel ← Rel ∪ PNPD-GenerateRightConstraint(rclosure, Ck)
    11. return Rel

    Table 1
    Compatibility relationships between P2 and P3

    Case  Relationship         P2          P3          Compatible
    1     Identical            S1.S2.S3    S1.S2.S3    Yes
    2     Different            S1.S2.S3    S4.S5.S6    No
    3     Head-overlapping     S1.S2.S3    S1.S2.S4    No
    4     Tail-overlapping     S1.S2.S3    S0.S2.S3    No
    5     Prefix-containment   S1.S2.S3    S1.S2       No
    6     Suffix-containment   S1.S2.S3    S2.S3       Yes
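The compatibility test of Table 1 reduces to a suffix check on path expressions. A small sketch, again assuming dot-separated strings:

```python
def compatible(p2, p3):
    # Compatible iff identical or one path expression is a suffix of the other.
    a, b = p2.split("."), p3.split(".")
    return a[-len(b):] == b or b[-len(a):] == a

print(compatible("S1.S2.S3", "S2.S3"))     # True  (suffix-containment)
print(compatible("S1.S2.S3", "S1.S2"))     # False (prefix-containment)
print(compatible("S1.S2.S3", "S0.S2.S3"))  # False (tail-overlapping)
```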

    For example, to deduce implicit constraints for ⟨recommender, 1⟩, we start with {C5, C6}, the set identified by PNPD-IdentifyRelated. By performing TestRelevance, only the left path expression recommender in C6 is found to be relevant to ⟨recommender, 1⟩; therefore we perform PNPD-ComputeLeftClosure to compute lclosure for C6. Since the right path expression customer in C6 is compatible with the left path expression customer in C7 according to Table 1, we derive {customer, service} as lclosure, and generate the following constraint for ⟨recommender, 1⟩ using PNPD-GenerateLeftConstraint:

    C9 = ATMOST(recommender, 9, service)
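The relevance conditions used by PNPD-IdentifyRelated and TestRelevance can be sketched as follows. The dot-separated path expressions and the child map standing in for the ss arcs are illustrative encodings, not the paper's data structures.

```python
def is_related(path_expr, sj, n, children):
    parts = path_expr.split(".")
    if sj in parts:
        # Criterion (1): sj is no more than n places from the target.
        return len(parts) - 1 - parts.index(sj) <= n
    # Criterion (2): the source of the path expression must be reachable
    # from sj within depth n - l, where l is the number of structures in it.
    budget = n - len(parts)
    frontier = {sj}
    for _ in range(budget):
        frontier = {c for s in frontier for c in children.get(s, ())}
        if parts[0] in frontier:
            return True
    return False

print(is_related("order.recommender", "recommender", 1, {}))  # True, by (1)
print(is_related("b.c", "a", 3, {"a": ["b"]}))                # True, by (2)
```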

    3.4. Comparison with other constraint analyses

    Constraint analysis has been studied extensively in the literature. The earlier research concentrated on deriving axiomatic properties for constraints. This has resulted in a rich collection of axioms for characterising, for example, inclusion dependency [7,24], functional dependency [7,24], numerical dependency [16,28] and multi-valued dependency [3]. These axioms have then been used in a variety of constraint analyses, including cardinality constraint satisfiability for ER models [21], predicate satisfiability and subsumption in FOL [20], schema equivalence in meta-level FOL [8], and constraint subsumption and satisfiability in Description Logic [1,4]. Very generally speaking, what is common to these analyses is that they all attempt to perform C ∪ A ⊢ c. That is, given a set of constraints C and a set of axioms A, they determine whether a constraint c is deducible from C and A. In contrast, our algorithms may be considered as performing a kind of constrained deduction: C ∪ A ⊢⟨Sj,n⟩ c, i.e., we deduce c from C and A only if the relatedness requirement ⟨Sj, n⟩ is satisfied. For example, given

    C = {ATMOST(recommender, 3, customer), ATMOST(customer, 3, service)}

    and

    A = {ATMOST(P1, n, P2) ∧ ATMOST(P2, m, P3) → ATMOST(P1, n × m, P3)}

    whether the constraint ATMOST(order.recommender, 9, order.service) is derivable from C and A depends, in our work, on the given relatedness criteria. If ⟨customer, 1⟩ is given, then the constraint will not be deduced, despite the fact that it is logically implied by C and A. This type of deduction requires some additional optimisation issues to be considered, as we have discussed in the previous sections.
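The constrained-deduction example can be made concrete. The sketch below applies the ATMOST transitivity axiom but keeps a deduced constraint only when the relatedness requirement ⟨Sj, n⟩ holds; the tuple encoding and the helper names are our own.

```python
def within(path_expr, sj, n):
    # sj occurs in the path expression, no more than n places from the target.
    parts = path_expr.split(".")
    return sj in parts and len(parts) - 1 - parts.index(sj) <= n

def constrained_deduce(constraints, sj, n):
    # ATMOST(P1, k, P2) and ATMOST(P2, m, P3) imply ATMOST(P1, k*m, P3),
    # but the implied constraint is kept only if it is related to <sj, n>.
    deduced = set(constraints)
    for (p1, k, p2) in constraints:
        for (q1, m, q2) in constraints:
            if p2 == q1 and (within(p1, sj, n) or within(q2, sj, n)):
                deduced.add((p1, k * m, q2))
    return deduced

C = {("recommender", 3, "customer"), ("customer", 3, "service")}
print(constrained_deduce(C, "recommender", 1) - C)
# {('recommender', 9, 'service')}
print(constrained_deduce(C, "customer", 1) - C)
# set() -- logically implied, but not related to <customer, 1>
```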

    Another difference between the algorithms we propose here and other constraint analysis techniques is the way in which constraints are expressed. The majority of constraint analysis techniques assume a flat structure (such as normalised relations) and deal with the instances of such structures in their analysis [5,7,21]. Our algorithms, on the other hand, allow nested structures to be used in constraints. This is motivated by the fact that in constraint evolution, we often


    need to deal with constraints that are defined in terms of non-flat structures in legacy systems. The use of nested structures in constraints makes it necessary to consider how they may be dealt with in analysis. In our algorithms, we have extended normal reachability computation [22,23] to handle the path expressions (representing nested structures) used in predicates.

    Finally, we point out the difference between our constraint analysis and techniques for constraint checking in general. Constraint checking is concerned with checking constraints against a

    database state, so that we can establish whether the constraints are violated by that state. Since the number of constraints to be checked is usually large, a constraint checking method will typically try to select, for evaluation, only the ones that are affected by a database update [11]. So, in terms of efficiently identifying a subset of constraints according to some relevance criteria, constraint checking and our analysis can be considered as sharing the same objective. However, there are two main differences between the two. First, in our analysis, we identify all the constraints that are related to a given structure, including indirectly and implicitly related ones. Constraint checking, on the other hand, only examines the constraints that are directly related to a structure and does not attempt to deduce implicit constraints. Second, the two use different relevance criteria. In our relatedness analysis, constraints are related through structures, whereas in constraint checking, constraints can be related through structures as well as through other conditions, such as the triggering relationship among the constraints [5]. Thus, constraint checking techniques cannot be directly applied to analysing the relatedness property of constraints, and vice versa.

    3.5. Time complexity analysis

    In this section, we analyse the time complexity of the proposed algorithms. The size of the input, i.e., an SC-graph, can be characterised by the parameters given in Table 2.

    The maximum number of cs arcs, NCS, is given by NCS = NC × NS, and this happens when every constraint is connected to every structure. The maximum number of ss arcs, NSS, is determined by NSS = NS × Nout, and this happens when every structure has Nout components.

    Table 3 shows the worst-case time complexity of the proposed algorithms, where the cost of each algorithm is divided into several parts, each corresponding to one step of an algorithm. 3

    All three algorithms have a polynomial complexity with respect to NS and NC, the numbers of structures and constraints. ND could affect the performance of the proposed algorithms significantly, but a large ND is unlikely in real-world applications. We therefore expect our algorithms to scale to non-trivial applications. In the following we analyse the algorithms in detail.

    The CNCD Algorithm. In CNCD, we go through all the path expressions in all the constraints to normalise an SC-graph. The worst case occurs when every constraint is connected to every structure, which means that we must iterate NS × NC times to normalise the path expressions. During each iteration, we need Nin^ND steps to expand a path expression with the structures that have been omitted from it, and then need NS steps to connect the constraint to the structures. Thus we need a total of NS × Nin^ND steps to connect a constraint to the structures specified in it. Therefore, the time complexity of CNCD's Normalisation is NS^2 × NC × Nin^ND.

    3 For Deduction, time complexity is calculated for handling transitivity axioms only. The complexity for handling inheritance axioms in each algorithm is similar and is given in [13].


    The complexity of deduction in CNCD is mainly determined by the complexity of ComputeClosure, which is known to be n^2 × m [22], where n is the number of constraints and m is the number of elements in the closure. Note that after normalisation, the number of constraints may increase; the maximum, N′C, is given by N′C = NC × Nin^ND. The number of elements in the closure is determined by the possible number of path expressions, which can be calculated as NP = NS × Nin^ND. Thus the time complexity of ComputeClosure in CNCD is NS × NC^2 × Nin^(3ND). In CNCD we compute a closure for every available constraint. However, if different constraints share the same path expression, then they will have the same closure. The maximum number of distinct closures to be computed depends on the available path expressions, and is NS × Nin^ND. Therefore, the time complexity of CNCD's Deduction is NS^2 × NC^2 × Nin^(4ND).

    To identify related constraints using CNCD, we go through all the existing cs arcs, of which we have, after normalisation, N′CS = NS × NC × Nin^ND. Therefore, the complexity of CNCD's Identification is NS × NC × Nin^ND.

    The CNPD Algorithm. CNPD uses the same functions as CNCD for normalising an SC-graph and identifying related constraints. Therefore, CNPD's Normalisation and Identification have the same complexity as those of CNCD. For Deduction, however, CNPD computes closures for the relevant structures only. For the given ⟨Sj, n⟩, the maximum number of closures Nclosure we need to compute is Nclosure = Nout^ND. Therefore, the time complexity of CNPD's Deduction is NS × NC^2 × Nout^ND × Nin^(3ND).

    The PNPD Algorithm. PNPD does not normalise an SC-graph. It derives Rel⟨Sj, n⟩ directly during Identification and Deduction. To identify constraints according to ⟨Sj, n⟩, PNPD-IdentifyRelated recursively traverses n steps to locate the constraints that are related to Sj. Since Sj can have up to ND levels of components and each component can have at most Nout direct components, in the worst case PNPD needs to check Nout^ND structures for Rel⟨Sj, n⟩. Each structure can have NC constraints connected to it. To determine whether each constraint Ci is actually related to ⟨Sj, n⟩, we need to perform the TestRelevance function, which traverses the SC-graph to see if Sj is a predecessor of a path expression in Ci within depth n. This requires Nin^ND steps. Therefore, the time complexity of PNPD's Identification is Nout^ND × NC × Nin^ND.

    For Deduction, we need to go through all the constraints in Rel, and in the worst case the size of Rel is NC. To decide whether to compute the closure for a constraint in Rel, we need to perform the TestRelevance function, which needs Nin^ND steps as discussed earlier. However, PNPD does not normalise an SC-graph, so the number of constraints remains unchanged. Thus NC^2 × NS × Nin^ND steps are required to compute a closure. Therefore, the time complexity of PNPD's Deduction is NS × NC^3 × Nin^(2ND).

    Table 2
    Parameters characterising an SC-graph

    NS    the number of structure vertices
    NC    the number of constraint vertices
    Nin   the maximum in-degree of a structure vertex
    Nout  the maximum out-degree of a structure vertex
    ND    the maximum nesting depth of a structure

    Table 3
    Time complexity of the three algorithms

    Function         CNCD                        CNPD                                  PNPD
    Normalisation    NS^2 × NC × Nin^ND          NS^2 × NC × Nin^ND                    –
    Deduction        NS^2 × NC^2 × Nin^(4ND)     NS × NC^2 × Nout^ND × Nin^(3ND)       NS × NC^3 × Nin^(2ND)
    Identification   NS × NC × Nin^ND            NS × NC × Nin^ND                      Nout^ND × NC × Nin^ND

    4. Performance study

    In this section, we study the empirical performance of the proposed algorithms. We report on the experimental results, study the impact of the various characteristics of an SC-graph on performance, and indicate suitable application areas for the algorithms.

    4.1. Experimental setup

    All experiments were carried out on a Pentium III PC with a 700 MHz processor and 128 MB of memory. The system ran Microsoft Windows NT 4.0, and the algorithms were implemented in SICStus Prolog 3.8.6.

    To test the algorithms in a controlled manner, we use synthetic SC-graphs in our experiments. The characteristics of an SC-graph are controlled by NS, NC and ND, introduced in Table 2, and by two additional parameters given below:

    Ncom  the out-degree and the maximum in-degree of a structure vertex.
    R     the structure and constraint distribution ratio.

    NS and NC are needed for obvious reasons: they determine the number of structures and constraints in an SC-graph. Ncom decides the number of direct components that a structure has, as well as the maximum number of direct parents that it may have. Ncom thus combines Nin and Nout introduced in Table 2. NS and Ncom together determine the number of ss arcs, NSS = (NS − NPS) × Ncom, where NPS is the number of primitive structures.

    Since inheritance axioms are handled in a similar way to transitivity axioms, we expect both to exhibit similar performance. Thus, only the performance of handling transitivity axioms is studied here. In our experiments, only one type of constraint, SUBSUME(P1, P2), is generated. The fact that we generate only one type of constraint does not affect the validity of the test results reported here. This is because when using a particular axiom in deduction, for example SUBSUME(X, Y) ∧ SUBSUME(Y, Z) → SUBSUME(X, Z), only the type of constraint included in the axiom (SUBSUME(P1, P2) in our example) will actually be processed. Other types of constraint will not have any effect on the performance of deduction. This means that all the constraints involve two path expressions, so the total number of cs arcs in an SC-graph is NCS = NC × 2.


    ND controls the maximum depth of nested structures, i.e., how many levels of components they may have. In our experiments, we divide the structures in an SC-graph into ND + 1 disjoint groups and organise the groups into layers as shown in Fig. 3. Each layer i has NSi structures. The structures at level i draw their direct components from level i + 1, and level ND + 1 contains the primitive structures. We ensure that NS = Σ_{i=1}^{ND+1} NSi.

    The distribution of structures across the levels is controlled by the ratio parameter R. For 1 ≤ i ≤ ND, the number of structures at level i (NSi) and the number of structures at level i + 1 (NSi+1) satisfy the relationship NSi+1 = R × NSi. The same parameter R is also used to control the distribution of constraints over the structures at all levels. That is, for 1 ≤ i ≤ ND, we have NCi+1 = R × NCi, where NCi and NCi+1 are the numbers of constraints associated with the structures at levels i and i + 1, respectively.

    By setting R > 1, we distribute structures and constraints in a triangular form as shown in Fig. 3, i.e., an SC-graph has more lower-level structures than higher-level ones, and more constraints are associated with the lower-level structures. This corresponds to applications in the real world.
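As an aside, the layer sizes implied by R follow a geometric distribution and are easy to compute; the function below is a sketch of that bookkeeping (the rounding strategy is our own choice).

```python
def level_sizes(ns, nd, r):
    # Split ns structures over nd+1 levels so that adjacent levels satisfy
    # N_{S,i+1} = r * N_{S,i}, level 1 being the top layer.
    weights = [r ** i for i in range(nd + 1)]
    scale = ns / sum(weights)
    return [round(w * scale) for w in weights]

print(level_sizes(50, 3, 3))  # [1, 4, 11, 34] -- most structures at the bottom
```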

    4.2. Experimental results

    We first tested the execution of the three algorithms using fairly small SC-graphs. The results are shown in Table 4. These results were obtained by fixing ND = 3, Ncom = 3 and R = 3 for the SC-graphs, and running the algorithms to compute Rel⟨Sj, 3⟩, where Sj is an arbitrary structure selected from the structures at level 4.

    The CPU time in the left part of Table 4 was recorded by varying NC while setting NS = 50, and the right part of Table 4 shows the CPU time obtained by varying NS while fixing NC = 80. From Table 4, we can observe the following:

    Fig. 3. Distribution of structures.

    Table 4
    CPU time (s) with varying NC and NS

             NC (NS = 50)                   NS (NC = 80)
             70     80     90     100       70     80     90     100
    CNCD     70.9   94.4   145.1  227.0     79.6   67.4   55.8   48.6
    CNPD     2.40   2.75   2.96   3.81      2.01   1.67   1.30   1.07
    PNPD     0.33   0.34   0.35   0.35      0.25   0.26   0.24   0.23


    • CNCD was considerably slower than the other two algorithms.
    • All algorithms increased their CPU time when we increased NC and fixed the other parameters.
    • All algorithms decreased their CPU time when we increased NS and fixed the other parameters.

    We will discuss the last two observations later in this section. But first, we analyse the performance of CNCD in more detail by breaking its CPU time down into three parts: Normalisation, Deduction and Identification. This is shown in Table 5.

    It is easy to see that Deduction consumed most of the CPU time and Identification the least. This suggests that the CNCD algorithm is better suited to a multi-user environment: although it is expensive to expand an SC-graph fully, once expanded, the graph can efficiently support multiple users deriving related constraints for different sets of structures.

    We now turn our attention to the performance of CNPD and PNPD. We carried out two sets

    of experiments to test the impact of NS and NC on the two algorithms. In the first set we varied NC and fixed the other parameters. The results are given in Fig. 4(a).

    As can be seen, both algorithms increased their response time as NC increased. This is within our expectation, since both algorithms are heavily dependent on NC. However, comparing the two, PNPD had a much better response time. There are two reasons for this.

    First, CNPD expands the SC-graph with all the cs arcs that are derivable from path expressions. This increases the search space for Identification. The potential number of cs arcs that can be added to an SC-graph by the Normalisation step (N′CS) can be calculated by the following formula:

    Table 5
    CPU time distribution for CNCD

    NS    NC     Total    Normalisation    Deduction    Identification
    50    70     70.9     0.21             70.67        0.02
    50    80     94.4     0.26             94.11        0.03
    50    90     145.0    0.33             144.64       0.03
    50    100    227.0    0.36             226.4        0.04

    Fig. 4. Impact of NC (a) and NS (b).


    N′CS = Σ_{i=2}^{ND+1} ( NCi × Σ_{j=1}^{i} Ncom^j ),  where NCi = ( NC / Σ_{n=1}^{ND+1} R^(n−1) ) × R^(i−1)    (1)

    So, increasing NC will increase N′CS and the search space.

    Second, CNPD may increase the number of constraints during the process of expanding path expressions (see Section 3.3.1). This can affect the performance of its deduction process. After normalisation, the potential number of constraints (N′C) is

    N′C = Σ_{i=2}^{ND+1} NCi × Ncom^(i−1)    (2)

    So increasing NC will increase N′C, and the larger N′C is, the slower the deduction becomes.

    The second set of experiments tested the performance of the two algorithms by varying NS. The results are shown in Fig. 4(b). Again, PNPD performed better. It is interesting to observe that both algorithms start to decrease their response time after NS has reached a certain value. The main reason for this is the following. When NS is relatively small, increasing it tends to spread the fixed number of constraints over more structures, and hence increases the search space for computing closures in deduction. However, when NS becomes sufficiently large, the constraints tend to be distributed over the structures sparsely. This reduces the chances for the constraints to be transitively reachable from each other, and therefore helps the computation of closures in deduction.

    To examine the impact of the other parameters on the performance of the two algorithms, the following sets of experiments were conducted. The first set tested the impact of ND. Note that if n is fixed and ND ≥ n, then increasing ND will have no effect on computing Rel⟨Si, n⟩. Thus, in this set of experiments, we computed Rel⟨Si, ND⟩. Fig. 5(a) shows the results of our experiments. In the second set of experiments, we examined the effect of Ncom on the two algorithms. The results are given in Fig. 5(b).

    In both cases PNPD performed considerably better than CNPD. The reason for both sets of experiments can be explained using Eq. (2). The numbers of constraints that CNPD and PNPD handle in deduction are N′C and NC, respectively. It is easy to see that N′C increases significantly with ND and Ncom, and therefore has a marked effect on the performance of CNPD.

    Fig. 5. Impact of ND (a) and Ncom (b).


    In the final set of experiments, we tested the performance of the two algorithms by varying R. Interestingly, we found that varying R does not have an obvious effect on PNPD, but can dramatically reduce the response time of CNPD. The results are shown in Fig. 6.

    The main reason for the decrease in response time for CNPD is that, for a given ⟨Sj, n⟩, increasing R will decrease NCi according to Eq. (1), which in turn reduces the number of constraints that CNPD has to handle in deduction. R has little effect on the performance of PNPD because in its deduction PNPD involves only the initial set of constraints.

    4.3. Summary

    In summary, CNPD and PNPD are much more efficient than CNCD, and can be expected to work well with large real-world applications. PNPD performed considerably better in our experiments, but one has to bear in mind that this is for computing a single Rel⟨S, n⟩ only. In a multi-user environment, PNPD may not deliver the best performance, because every single Rel⟨S, n⟩ is computed on its own, directly from the initial set of constraints. CNPD, on the other hand, may offer better performance in this respect. CNPD is also particularly suited to applications where structures do not have deeply nested components, for example, analysing the constraints that are declared for a relational database.

    5. Conclusions and future work

    In this paper, we introduced a form of analysis that can be used to determine which set of constraints is related to a specified set of business objects. This form of analysis is important and useful in constraint evolution: it helps system engineers to identify, from the many constraints implemented in an information system, a coherent subset that must be examined and updated in order to implement a change made to the high-level policy statements or business rules. It also helps system designers and business users to comprehend how constraint business rules are actually implemented and enforced in the supporting information systems. So far, there has been little work on developing techniques to support this type of constraint analysis. In this paper, we proposed three algorithms for identifying constraints that are directly, indirectly and implicitly

    Fig. 6. Impact of R.


    related to a specified set of business objects. Both theoretical analysis and experimental results show that the proposed algorithms are efficient and have a performance that can scale to non-trivial applications.

    There are a number of issues that we plan to investigate in the future. First, while this paper has focused on analysing the relatedness property of constraints, there are other properties that constraints may have, and identifying them can be beneficial to both constraint evolution and comprehension. For example, for a given group of business objects, we may wish to determine whether there exists a conflict among the constraints that constrain them. This sort of analysis is useful when we wish to check whether we have updated the set of constraints correctly for the set of business objects concerned. Second, in order to help system engineers in the task of evolving constraint business rules, it is desirable to present the constraints identified by our algorithms to the user as human-comprehensible business rules. This requires (1) an ontological mapping from the system-oriented terms used in the implemented constraints to those preferred by the user in the business context; and (2) the ability to assemble several implemented constraints into a single, high-level constraint statement. Both are non-trivial and require substantial research effort.

    Acknowledgements

    This work is part of the BRULEE project which was funded by Grant GR/M66219 from theUK Engineering and Physical Sciences Research Council. We would like to thank the anonymousreferees for their constructive comments which have helped to improve the paper.

    Appendix A. Constraint representation formalism

    The constraint representation formalism we used in this research is a predicate-logic-based language. Its syntax in BNF is given below.

    ⟨cons expr⟩ ::= ¬⟨cons expr⟩ | ⟨cons expr⟩ ∧ ⟨cons expr⟩ |
        ⟨cons expr⟩ ∨ ⟨cons expr⟩ | ⟨cons expr⟩ → ⟨cons expr⟩ |
        ∀x ⟨cons expr⟩ | ∃x ⟨cons expr⟩ |
        TYPEOF(⟨path expr⟩, ⟨built in⟩) |
        ENUMERATE(⟨path expr⟩, {⟨object⟩, ..., ⟨object⟩}) |
        GREATERTHAN(⟨path expr⟩, ⟨arith expr⟩) |
        EQUAL(⟨path expr⟩, ⟨arith expr⟩) |
        SUBSUME(⟨path expr⟩, ⟨path expr⟩) |
        DISJOINT(⟨path expr⟩, ⟨path expr⟩) |
        ONEOF(⟨path expr⟩, {⟨path expr⟩, ..., ⟨path expr⟩}) |
        ATMOST(⟨path expr⟩, ⟨positive integer⟩) |
        ATLEAST(⟨path expr⟩, ⟨non negative integer⟩) |
        ATLEAST(⟨path expr⟩, ⟨non negative integer⟩, ⟨path expr⟩) |
        ATMOST(⟨path expr⟩, ⟨positive integer⟩, ⟨path expr⟩)
    ⟨arith expr⟩ ::= ⟨path expr⟩ | ⟨object⟩ | sum(⟨path expr⟩) |
        count(⟨path expr⟩) | ave(⟨path expr⟩) | max(⟨path expr⟩) |
        min(⟨path expr⟩) | ⟨arith expr⟩ + ⟨arith expr⟩ |
        ⟨arith expr⟩ − ⟨arith expr⟩ | ⟨arith expr⟩ × ⟨arith expr⟩ |
        ⟨arith expr⟩ / ⟨arith expr⟩
    ⟨path expr⟩ ::= ⟨structure⟩.⟨structure⟩
    ⟨built in⟩ ::= real | integer | boolean | string | char
    ⟨structure⟩ ::= ⟨symbol⟩
    ⟨object⟩ ::= ⟨symbol⟩

The logic connectives have their usual interpretations. The semantics of the built-in predicates are intuitive. TYPEOF(S, T) specifies that structure S has T as a type. ENUMERATE(S, {a1, a2, ..., am}) is another predicate for specifying a type of S: S can have any ai as an instance. GREATERTHAN and EQUAL have their usual arithmetic interpretations. SUBSUME(S1, S2) specifies that S1 subsumes, or is a super-class of, S2. DISJOINT(S1, S2) is the opposite of SUBSUME(S1, S2) and specifies that the domains of S1 and S2 do not overlap. ONEOF(S, {S1, S2, ..., Sk}) specifies that an instance of S must be one and only one instance of some Si. Finally, ATMOST(S, n) (ATLEAST(S, n)) specifies that S can have a maximum (minimum) of n instances, whereas ATMOST(S1, n, S2) (ATLEAST(S1, n, S2)) specifies that for each instance of S1, we can have a maximum (minimum) of n instances of S2. For a more formal interpretation of these predicates, the reader is referred to [13].
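The informal semantics above can be sketched as executable checks against instance data. The following is a minimal illustration under our own assumptions (the mapping of structures to Python sets and the function names are hypothetical, not the paper's implementation):

```python
# Each structure is represented here by the collection of its instances.

def typeof_ok(value, built_in):
    # TYPEOF(S, T): a value of S must belong to the built-in type T
    py_types = {"integer": int, "real": float, "boolean": bool, "string": str}
    return isinstance(value, py_types[built_in])

def enumerate_ok(instances, allowed):
    # ENUMERATE(S, {a1, ..., am}): S may only take the listed instances
    return set(instances) <= set(allowed)

def subsume_ok(s1_instances, s2_instances):
    # SUBSUME(S1, S2): S1 subsumes S2, so every instance of S2 is in S1
    return set(s2_instances) <= set(s1_instances)

def atmost_ok(instances, n):
    # ATMOST(S, n): S has at most n instances
    return len(instances) <= n

def atleast_ok(instances, n):
    # ATLEAST(S, n): S has at least n instances
    return len(instances) >= n
```

For example, `subsume_ok({"e1", "e2"}, {"e1"})` holds because the single instance of S2 also belongs to S1, while `atmost_ok(["e1", "e2"], 1)` fails.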

    Appendix B. Axioms for deriving implicit constraints

The following is the set of axioms that are used by our algorithms in deduction, where P represents a path expression, A an arithmetic expression, and b a built-in type. Axioms 1–5 are transitivity axioms and Axioms 6–13 are inheritance axioms. Note that Axiom 5 is different from the other transitivity axioms in that it involves two predicates, GREATERTHAN and EQUAL. We classify it as a transitivity axiom since it is handled in the same way as the other transitivity axioms are handled.

Axiom 1: SUBSUME(P1, P2) ∧ SUBSUME(P2, P3) → SUBSUME(P1, P3)
Axiom 2: ATMOST(P1, n, P2) ∧ ATMOST(P2, m, P3) → ATMOST(P1, n × m, P3)
Axiom 3: EQUAL(P1, P2) ∧ EQUAL(P2, P3) → EQUAL(P1, P3)
Axiom 4: GREATERTHAN(P1, P2) ∧ GREATERTHAN(P2, P3) → GREATERTHAN(P1, P3)
Axiom 5: GREATERTHAN(P1, P2) ∧ EQUAL(P2, P3) → GREATERTHAN(P1, P3)
Axiom 6: SUBSUME(P2, P1) ∧ ENUMERATE(P2, {O1, ..., Om}) → ENUMERATE(P1, {O1, ..., Om})
Axiom 7: SUBSUME(P2, P1) ∧ GREATERTHAN(P2, A) → GREATERTHAN(P1, A)
Axiom 8: SUBSUME(P2, P1) ∧ EQUAL(P2, A) → EQUAL(P1, A)
Axiom 9: SUBSUME(P2, P1) ∧ TYPEOF(P2, b) → TYPEOF(P1, b)
Axiom 10: SUBSUME(P2, P1) ∧ ATMOST(P2, m) → ATMOST(P1, m)
Axiom 11: SUBSUME(P2, P1) ∧ ATLEAST(P2, n, P3) → ATLEAST(P1, n, P3)
Axiom 12: SUBSUME(P2, P1) ∧ ATMOST(P2, m, P3) → ATMOST(P1, m, P3)
Axiom 13: SUBSUME(P2, P1) ∧ DISJOINT(P2, P3) → DISJOINT(P1, P3)
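Axioms of this form lend themselves to forward chaining: implicit constraints are derived by repeatedly applying the axioms until no new facts emerge. The sketch below is our own illustration (assuming a hypothetical tuple encoding and acyclic path relationships, so the process terminates); only Axioms 1 and 2 are shown.

```python
# SUBSUME constraints are 3-tuples ("SUBSUME", P1, P2);
# ternary ATMOST constraints are 4-tuples ("ATMOST", P1, n, P2).

def derive(constraints):
    """Close a set of constraints under Axioms 1 and 2."""
    facts = set(constraints)
    changed = True
    while changed:
        changed = False
        new = set()
        for a in facts:
            for b in facts:
                # Axiom 1: SUBSUME(P1,P2) ∧ SUBSUME(P2,P3) → SUBSUME(P1,P3)
                if a[0] == b[0] == "SUBSUME" and a[2] == b[1]:
                    new.add(("SUBSUME", a[1], b[2]))
                # Axiom 2: ATMOST(P1,n,P2) ∧ ATMOST(P2,m,P3)
                #          → ATMOST(P1, n × m, P3)
                if (a[0] == b[0] == "ATMOST" and len(a) == len(b) == 4
                        and a[3] == b[1]):
                    new.add(("ATMOST", a[1], a[2] * b[2], b[3]))
        if not new <= facts:
            facts |= new
            changed = True
    return facts
```

For instance, from ATMOST(Dept, 5, Team) and ATMOST(Team, 10, Emp) the closure contains the implicit constraint ATMOST(Dept, 50, Emp).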


    References

[1] A. Artale, E. Franconi, Temporal ER modeling with description logics, in: Proceedings of 18th International Conference on Conceptual Modeling, 1999, pp. 81–95.
[2] N. Bassiliades, P.M.D. Gray, CoLan: a functional constraint language and its implementation, Data and Knowledge Engineering 14 (3) (1994) 203–249.
[3] C. Beeri, R. Fagin, J.H. Howard, A complete axiomatization for functional and multivalued dependencies in database relations, in: Proceedings of the 1977 ACM SIGMOD International Conference on Management of Data, 1977, pp. 47–61.
[4] A. Borgida, Description logics in data management, IEEE Transactions on Knowledge and Data Engineering 7 (5) (1995) 671–682.
[5] F. Bry, H. Decker, R. Manthey, A uniform approach to constraint satisfaction and constraint satisfiability in deductive databases, in: Proceedings of the 1st International Conference on Extending Database Technology, 1988, pp. 488–505.
[6] D. Calvanese, M. Lenzerini, On the interaction between ISA and cardinality constraints, in: Proceedings of the 10th International Conference on Data Engineering, 1994, pp. 204–213.
[7] M.A. Casanova, R. Fagin, C.H. Papadimitriou, Inclusion dependencies and their interaction with functional dependencies, Journal of Computer and System Sciences 28 (1) (1984) 29–59.
[8] J.A. Chudziak, H. Rybinski, J. Vorbach, Towards a unifying logic formalism for semantic data models, in: Proceedings of 12th International Conference on the Entity-Relationship Approach, 1993, pp. 492–507.

[9] S.S. Cosmadakis, P.C. Kanellakis, M.Y. Vardi, Polynomial-time implication problems for unary inclusion dependencies, Journal of the ACM 37 (1) (1990) 15–46.

[10] A.B. Earls, S.M. Embury, N.H. Turner, A method for the manual extraction of business rules from large source code, BT Technology Journal 20 (4) (2002) 127–143.
[11] S.M. Embury, S.M. Brandt, J.S. Robinson, I. Sutherland, F.A. Bisby, W.A. Gray, A.C. Jones, R.J. White, Adapting integrity enforcement techniques for data reconciliation, Information Systems 26 (8) (2001) 657–689.
[12] S.M. Embury, J. Shao, G. Fu, X.K. Liu, W.A. Gray, A theoretical framework for the recovery of business rules from legacy systems, Brulee project technical report, Department of Computer Science, Cardiff University, 2003.
[13] G. Fu, Comprehension of constraint business rules extracted from legacy systems, Brulee project technical report, Department of Computer Science, Cardiff University, 2002.
[14] G. Fu, J. Shao, S.M. Embury, W.A. Gray, Representing constraint business rules extracted from legacy systems, in: Proceedings of 13th International Conference on Database and Expert System Application, 2002.
[15] P. Godfrey, J. Grant, J. Gryz, J. Minker, Integrity constraints: semantics and applications, in: C.J. Date (Ed.), Logics for Databases and Information Systems, Kluwer, 1998, pp. 265–306.
[16] J. Grant, J. Minker, Inferences for numerical dependencies, Theoretical Computer Science 41 (1985) 271–287.
[17] P. Grefen, P. Apers, Integrity control in relational database systems: an overview, Data and Knowledge Engineering 10 (2) (1993) 187–223.
[18] D. Hay, K.A. Healy, Defining Business Rules: What Are They Really?, 1996. Available online.
[19] H. Herbst, T. Myrach, A repository system for business rules, in: Proceedings of the Sixth IFIP TC-2 Working Conference on Data Semantics (DS-6), 1995, pp. 119–139.
[20] M. Lenzerini, Covering and disjointness constraints in type networks, in: Proceedings of the 3rd International Conference on Data Engineering, 1987, pp. 386–393.
[21] M. Lenzerini, P. Nobili, On the satisfiability of dependency constraints in entity-relationship schemata, in: Proceedings of 13th International Conference on Very Large Data Bases, 1987, pp. 147–154.
[22] D. Maier, The Theory of Relational Databases, Pitman, London, 1983.
[23] H. Mannila, K. Raiha, Algorithms for inferring functional dependencies from relations, Data and Knowledge Engineering 12 (1) (1994) 83–99.
[24] J.C. Mitchell, The implication problem for functional and inclusion dependencies, Information and Control 56 (3) (1983) 154–173.
[25] R. Reiter, What should a database know?, Journal of Logic Programming 14 (1–2) (1992) 127–153.


[26] J. Shao, C. Pound, Reverse engineering business rules from information systems, BT Technology Journal 17 (4) (1999) 179–186.
[27] A.H.M. ter Hofstede, H.A. Proper, Th.P. van der Weide, A conceptual language for the description and manipulation of complex information models, in: Seventeenth Annual Computer Science Conference, 1994.
[28] B. Thalheim, Fundamentals of cardinality constraints, in: Proceedings of the 11th International Conference on the Entity-Relationship Approach, 1992, pp. 7–23.
[29] S.D. Urban, ALICE: an assertion language for integrity constraint expression, in: Proceedings of the 13th Annual Conference on Computer Software and Application, 1989, pp. 292–299.

Gaihua Fu is a Research Associate and Ph.D. candidate at the Department of Computer Science of Cardiff University, United Kingdom. She received her B.Sc. (Hons) in Computer Science from Inner Mongolia University of P.R. China. Her major research interests include business rules, database semantics, reverse engineering, ontology, knowledge representation, the semantic web, and spatial representation and reasoning.

Dr. Jianhua Shao received his B.Sc. degree from Shanghai University, China, and his Ph.D. from the University of Ulster, UK, all in Computer Science. He is currently a lecturer in Computer Science at Cardiff University. His main research interests include database systems, metadata management, and the extraction and analysis of business rules in legacy information systems.

Suzanne M. Embury is a lecturer in the Department of Computer Science at the University of Manchester. She graduated from the University of Kent at Canterbury with a B.Sc. in Computer Science in 1990, and then obtained her Ph.D. from the University of Aberdeen in 1994. After working for several years as a research fellow at Aberdeen, she became a lecturer in the Department of Computer Science at Cardiff University, before moving to her present post in 2001. Her previous work focused on data semantics (usually in the form of integrity constraints) and data reconciliation. Her current research interests are in the areas of information quality (in particular, practical ways of assessing and improving IQ over the long term) and the evolution of data-intensive systems (in particular, the evolution and management of business rules in database applications).

Alex Gray was educated at Edinburgh University and the University of Newcastle upon Tyne and is now a professor in Computing at Cardiff University. His current research interests are in metadata and its role in the integration and interoperation of heterogeneous distributed information systems, the design and architecture of distributed systems to support distributed concurrent working over networks, and the role of metadata in constructing the information and knowledge layers of the Information GRID.
