+ All Categories
Home > Documents > Formal approach for reengineering fuzzy XML in fuzzy object-oriented databases

Formal approach for reengineering fuzzy XML in fuzzy object-oriented databases

Date post: 04-Dec-2016
Category:
Upload: xue
View: 212 times
Download: 0 times
Share this document with a friend
12
Appl Intell DOI 10.1007/s10489-012-0386-4 Formal approach for reengineering fuzzy XML in fuzzy object-oriented databases Jian Liu · Z.M. Ma · Xue Feng © Springer Science+Business Media, LLC 2012 Abstract Since semi-structured documents (e.g., XML) could benefit greatly from database support and more specif- ically from object-oriented (OO) database management sys- tems, we study the methodology of reengineering XML to object-oriented databases when database migration occurs in this paper. In particular, considering the need of process- ing the imprecise and uncertain information existing in prac- tical applications, we investigate the problem of migrating fuzzy XML to fuzzy object-oriented databases. To find the object-oriented schema that best describes the existing fuzzy XML schema (DTD), we devise a comprehensive approach centering on a set of mapping rules. Such reengineering practices could not only provide a significant consolidation of the interoperability between fuzzy OO and fuzzy XML modeling techniques, but also develop the practical design methodology for fuzzy OO databases. Keywords Object-oriented databases · XML · Imprecise and uncertain information · Mapping 1 Introduction In the current computer science field, since the notion of ob- ject and class is ubiquitous in real-world applications, there is a paradigm shift to object orientation in the formalisms for the representation of structured knowledge used both in J. Liu ( ) · Z.M. Ma School of Information Science & Engineering, Northeastern University, Shenyang 110004, China e-mail: [email protected] X. Feng The General Hospital of Shenyang Military Command, Shenyang, China knowledge representation and in databases. Object-oriented modeling is an effective way to represent static and dynamic data semantics in the form of objects, links/associations, and methods, a great number of works [5, 7, 8, 14, 21, 27] are therefore based on this topic. In [7], Fong developed an approach to reengineer extended entity-relationship model- ing to object modeling techniques. In [21], Soutou proposed an approach for designing an object-relational database and provided solutions for modeling semantic relationships such as one-to-one, one-to-many, many-to-many, etc. in such database systems. In [27], Zhang and Fong proposed an ap- proach for mapping relational update operations including update, insert and delete operations to their object-oriented equivalents. 1.1 Data migration As a significant research issue for information exchange, migrating from a legacy database to another paradigm has received extensive attention from both academic and indus- trial communities. Previous efforts have been mainly made on the problem of data migration between relational and object-oriented databases, for example, Fong [7, 8], Soutou [21], and Zhang and Fong [27]. Nowadays, with the rapid development of web-based applications, a style of flexi- bly publishing information by using semi-structured docu- ments becomes popular [3]. Not only that, more and more applications, such as software engineering, libraries, tech- nical documentation, etc. focus on semi-structured docu- ments, in which, the eXtensible Markup Language (XML), which is the lingua franca for data exchange on the Internet, has become the de-facto standard for semi-structured doc- uments creation. XML is noted for its simplicity and flex- ibility, however, these advantages also brings limitations to present its structure using semantics to store data [9]. Rela- tional and object-oriented database systems are praised for
Transcript

Appl IntellDOI 10.1007/s10489-012-0386-4

Formal approach for reengineering fuzzy XML in fuzzyobject-oriented databases

Jian Liu · Z.M. Ma · Xue Feng

© Springer Science+Business Media, LLC 2012

Abstract Since semi-structured documents (e.g., XML)could benefit greatly from database support and more specif-ically from object-oriented (OO) database management sys-tems, we study the methodology of reengineering XML toobject-oriented databases when database migration occursin this paper. In particular, considering the need of process-ing the imprecise and uncertain information existing in prac-tical applications, we investigate the problem of migratingfuzzy XML to fuzzy object-oriented databases. To find theobject-oriented schema that best describes the existing fuzzyXML schema (DTD), we devise a comprehensive approachcentering on a set of mapping rules. Such reengineeringpractices could not only provide a significant consolidationof the interoperability between fuzzy OO and fuzzy XMLmodeling techniques, but also develop the practical designmethodology for fuzzy OO databases.

Keywords Object-oriented databases · XML · Impreciseand uncertain information · Mapping

1 Introduction

In the current computer science field, since the notion of ob-ject and class is ubiquitous in real-world applications, thereis a paradigm shift to object orientation in the formalismsfor the representation of structured knowledge used both in

J. Liu (�) · Z.M. MaSchool of Information Science & Engineering, NortheasternUniversity, Shenyang 110004, Chinae-mail: [email protected]

X. FengThe General Hospital of Shenyang Military Command, Shenyang,China

knowledge representation and in databases. Object-orientedmodeling is an effective way to represent static and dynamicdata semantics in the form of objects, links/associations,and methods, a great number of works [5, 7, 8, 14, 21, 27]are therefore based on this topic. In [7], Fong developed anapproach to reengineer extended entity-relationship model-ing to object modeling techniques. In [21], Soutou proposedan approach for designing an object-relational database andprovided solutions for modeling semantic relationships suchas one-to-one, one-to-many, many-to-many, etc. in suchdatabase systems. In [27], Zhang and Fong proposed an ap-proach for mapping relational update operations includingupdate, insert and delete operations to their object-orientedequivalents.

1.1 Data migration

As a significant research issue for information exchange,migrating from a legacy database to another paradigm hasreceived extensive attention from both academic and indus-trial communities. Previous efforts have been mainly madeon the problem of data migration between relational andobject-oriented databases, for example, Fong [7, 8], Soutou[21], and Zhang and Fong [27]. Nowadays, with the rapiddevelopment of web-based applications, a style of flexi-bly publishing information by using semi-structured docu-ments becomes popular [3]. Not only that, more and moreapplications, such as software engineering, libraries, tech-nical documentation, etc. focus on semi-structured docu-ments, in which, the eXtensible Markup Language (XML),which is the lingua franca for data exchange on the Internet,has become the de-facto standard for semi-structured doc-uments creation. XML is noted for its simplicity and flex-ibility, however, these advantages also brings limitations topresent its structure using semantics to store data [9]. Rela-tional and object-oriented database systems are praised for

J. Liu

their abilities in storing a large set of semantic informa-tion, therefore, the choice of storing data using relationalor object-oriented databases instead of native XML is nat-urally presented. Effectively migrating from XML docu-ments to relational or object-oriented databases will facili-tate platform independent exchange of data and satisfy theneeds of storing (fuzzy) XML data in (fuzzy) relational orobject-oriented databases. As such, some transformation ap-proaches in this area were proposed in previous papers, forexample, approaches on transformation between XML andrelational databases such as Fong and Cheung [9], Gau-rav and Alhajj [10], Hollander and Keulen [11], and Maand Yan [17], and approaches on transformation betweenXML and object-oriented databases such as the both-waytransformation approach introduced in [18]. Moreover, sinceobject-oriented database technology can bring a lot of ben-efits to documents management, e.g., recovery, concurrencycontrol and high level query capabilities [6], we focus onhow to effectively migrate XML to the object-oriented plat-form.

1.2 Fuzzy representation in XML

In practical applications, information is often vague or am-biguous. The objects of reengineering usually involve agreat deal of imprecise and uncertain data [23, 26] be-sides accurate data. Most of the existing works on migrat-ing from a legacy database are processed under the assump-tion that the stored data is accurate. In fact, this assump-tion is often not valid for many of the next generation in-formation systems since they may involve complex infor-mation with uncertainty. In many domains, such as environ-mental surveillance, market analysis and quantitative eco-nomics research [20], it is difficult to state all informationwith one hundred percent certainty. Imprecise and uncer-tain data in those domains are generally caused by datarandomness and incompleteness, limitations of measuringequipment, delayed data updates, etc. Currently, the studyof combining XML and imprecision/uncertainty becomesan emerging topic for various applications on the Web, andthere have been some achievements in this area, includingseveral proposed combination frameworks [2, 10, 12, 13, 16,17, 19, 22]. A framework for querying and updating prob-abilistic XML information over two specific models is in-troduced in [2]. Some representations of probabilistic datain XML were proposed in previous research papers, suchas Abiteboul et al. [1], Nierrman and Jagadish [19], andHung et al. [12]. Gaurav and Alhajj [10] incorporate fuzzydata in XML documents extending the XML Schema as-sociated to these documents. They define a mechanism torepresent fuzzy data along with crisp data in XML formatby introducing some new constructors. Ma and Yan [17] in-vestigate fuzzy XML data modeling. They identified mul-

tiple granularity of data fuzziness in XML based on possi-bility theory. A fuzzy XML data model that addresses typesof fuzziness is also developed. Unfortunately, a fuzzy XMLdocument leaves the preceding limitation (i.e., fuzzy XMLstill has limitations on presenting its structure using seman-tics to store data) unchanged as it stands in crisp scenar-ios. Surprisingly, although a fuzzy object-oriented databasehas the advantage of presenting its structure using seman-tics to store data, the study of reengineering fuzzy XMLin fuzzy object-oriented databases has received little atten-tion.

1.3 Outline and contributions

To fill in this gap which hitherto has existed in the literature,in this paper, we study the methodology of reengineeringXML schema (XML schema indicates XML DTD through-out the paper) to object-oriented schema when database mi-gration occurs. In particular, considering the need of pro-cessing the imprecise and uncertain information existing inpractical applications, we investigate the problem of migrat-ing fuzzy XML to fuzzy object-oriented databases.

The contributions of this paper are summarized as fol-lows:

• To find the object-oriented schema that best describes theexisting fuzzy XML schema (DTD), we devise a set ofreengineering rules.

• We develop effective approaches to reengineer flat andnested XML DTDs in a fuzzy OO database.

The rest of the paper is organized as follows. Section 2introduces preliminary knowledge. The fuzzy XML datamodel and its DTD representation are described in Sect. 3.Section 4 and 5 present the transforming principles, themapping construction algorithms, as well as reengineeringexamples for flat and nested fuzzy XML, respectively. Fi-nally, Sect. 6 summarizes the results of this work and sug-gests future directions.

2 Preliminaries

2.1 Characteristics of XML

XML is a subset of the Standard General Mark-up Lan-guage (SGML) [6] with an intent on being the format forthe use on the Internet. In order to define a document’s log-ical structure, XML often adds descriptive mark-up (tags)in document instances, e.g., DTD (Document Type Defini-tion). The purpose of a DTD is to define the legal buildingblocks of an XML document, which is not compulsory forevery XML document. But with a DTD, each XML docu-ments can carry a description of its own format. In addition,

Formal approach for reengineering fuzzy XML in fuzzy object-oriented databases

with a DTD, independent groups of people can agree to usea standard DTD for interchanging data. In this scenario, ap-plications can use a standard DTD to verify that the datareceived from the outside world is valid. Therefore, in viewof the advantage of using a DTD, practical applications of-ten use valid XML documents conforming to the rules of aDTD. Each XML document has (i) a prologue including aDTD, i.e., a set of grammar rules specifying the documentgeneric logical structure; and (ii) a document instance con-taining the information content as well as the tags e.g., thespecific logical structure of the document.

In XML (or SGML) jargon, the logical components of adocument are called elements, and element names are usedas tags in the document. The cross references between ele-ments are specified using keywords ID (for the element thatwill be referenced) and IDREF (for the element referenc-ing it). The specification of an element in the DTD gives itsname, its structure and indications. The element structure isbuilt using other elements or basic types such as #PCDATA,EMPTY, etc. and connectors that can be further qualifiedwith occurrence indicators. In particular, the following indi-cators could be used:

(1) the aggregation connector “,” implies an order betweenelements, whereas aggregation connector “&” does notimply an order between elements;

(2) the choice connector “|” provides an alternative in thetype definition;

(3) the optional indicator “?” indicates zero or one occur-rence of an element, the plus sign “+” indicates one ormore occurrences of an element, and the asterisk “∗”indicates zero or more occurrences.

2.2 Characteristics of object-oriented databases

Object-oriented databases combine ideas from traditionaldatabase management systems, semantic data models, knowl-edge representation in artificial intelligence and object-oriented programming [4]. The object-oriented paradigmbased on the notion of object and class has gained wide ac-ceptance as a unifying paradigm for the design of databasesystems.

Objects represent things or concepts from an applicationenvironment. Each object has a state and behavior. The stateof an object is the set of attributes. An attribute refers toinstance variables in object-oriented terminology. It corre-sponds to a column of a relation in relational databases. Thebehavior of an object is the set of operations which oper-ate upon the state of the object. Every object has a uniqueidentity that does not change throughout its lifetime. Theobject identifier is independent of values of attributes. Classdenotes a set of objects with common characteristics. Theconcept of a class serves two purposes: first, it describes thestructure of objects and operations that are used to access the

object; second, it represents a collection of all objects whichhave the same type. Objects that belong to a class are calledinstances of the class.

The classes are organized in a hierarchy called class hier-archy. The notion of class hierarchy and inheritance providesa convenient way of organizing and describing relationshipsbetween objects. A class can be created from an existingclass. The new class is called a subclass of the existing class,which is called a superclass. A subclass inherits all prop-erties (attributes and operations) from the superclass and itcan have additional properties, operators and redefined op-erations. Object-oriented databases allow single inheritanceor multiple inheritance. With single inheritance, a subclassinherits properties from only one superclass, whereas multi-ple inheritance allows a class to inherit properties from morethan one class. Object-oriented databases also allow the stateand behavior of an object to be accessed or invoked onlythrough messages. For each message understood by an ob-ject there is a corresponding method that executes the mes-sage. An object reacts to a message by executing the corre-sponding method and returning an object or performing anaction.

2.3 Fuzzy set and possibility distribution

Fuzzy sets are sets whose elements have degrees of mem-bership. The fuzzy set was originally introduced by Zadehin [24, 25] as an extension of the classical notion of set. Inclassical set theory, the membership of elements in a set isassessed in binary terms according to a bivalent condition—an element either belongs or does not belong to the set. Bycontrast, fuzzy set theory permits the gradual assessment ofthe membership of elements in a set; this is described withthe aid of a membership function valued in the real unit in-terval [0,1]. Fuzzy sets generalize classical sets, since theindicator functions of classical sets are special cases of themembership functions of fuzzy sets, if the latter only takevalues 0 or 1. In fuzzy set theory, classical bivalent sets areusually called crisp sets. Possibility distribution is an im-portant concept used in possibility theory (possibility the-ory is a mathematical theory for dealing with certain typesof uncertainty and is an alternative to probability theory).A possibility distribution is a function (or a measure) thatdescribes the possibility of a variable taking crisp values (orlinks each outcome with its possibility of occurrence). Insummary, a fuzzy set is a representation of a concept whilepossibility distribution relates the possibility of a value oc-curring within a distribution.

3 Fuzzy XML data model

In this section, the fuzziness in XML and the representationmodel of the fuzzy XML are introduced.

J. Liu

3.1 Fuzziness in XML documents

Two kinds of fuzziness can be found in a relational model:one is to associate membership degrees with individual tu-ples, and the other is to represent attribute values with possi-bility distributions. A membership degree associated with atuple is interpreted to mean the possibility of the tuple beinga member of the corresponding relation. A possibility distri-bution represented as an attribute value means that we do notknow a crisp value of the attribute but we know the range ofvalues that the attribute may take and the possibility of eachvalue being true.

XML data are structured, and XML can naturally rep-resent imprecise and uncertain information [15, 17]. In thecase of XML, when elements (or attributes) are uncertain,then we can associate them with membership degrees to rep-resent them and their possibilities. Similarly, when values ofelements (or attributes) are uncertain, we can use possibil-ity distributions to express them and their possibilities. Thetheoretical foundation of the suitability of using the aboverepresentations to describe fuzzy data and their possibilitiesis that XML has flexible format and the character of self-definition. XML restricts attributes to only a unique singlevalue. We modify the schema in XML to make any XMLattribute with multiple possible values associated with pos-sibility distributions into a sub-element.

Now let us interpret what a membership degree asso-ciated with an element means, given that the element cannest under other elements, and more than one of these el-ements may have an associated membership degree. Theexistential membership degree associated with an elementshould be the possibility that the state of the world includesthis element and the sub-tree rooted at it. For an elementwith the sub-tree rooted at it, not each node in the sub-tree is treated as independent but dependent upon its root-to-node chain. Each possibility in the source XML docu-ment is assigned conditioned on the fact that the parent el-ement exists certainly. In other words, this possibility is arelative one based upon the assumption that the possibil-ity the parent element exists is exactly 1.0. In order to cal-culate the absolute possibility, we must consider the rela-tive possibility in the parent element. In general, the abso-lute possibility of an element e can be obtained by multi-plying the relative possibilities found in the source XMLalong the path from e to the root. It should be pointed outthat, using multiply operator is just a method to evaluatethe absolute possibility, but it is not the only way to ag-gregate the relative possibilities (i.e., other t-norm operatorscan be also used). Of course, each of these relative possi-bilities will be available in the source XML document. Bydefault, relative possibilities are therefore regarded as 1.0(i.e., the relative possibility is equal to 1.0 when correspond-ing to the certain cases). Consider a chain A→B→C from

the node A to C. Assume that the source XML documentcontains the relative possibilities Poss(C|B), Poss(B|A), andPoss(A), associated with the nodes C, B, and A, respec-tively. Then we have Poss(B) = Poss(B|A)×Poss(A) andPoss(C) = Poss(C|B)×Poss(B|A)×Poss(A). Here Poss(A),Poss(C|B) and Poss(B|A) can be obtained from the sourceXML document. For example, consider the (partial) chainuniversity (depicted as U for short) → department (de-picted as D for short) → Professor (depicted as P for short)from node university to Professor (data content) as shownin Fig. 1, from Fig. 1, we can see that Poss(P|D) = 0.8,Poss(D|U) = 0.6 and Poss(U) = 1, associated with nodes U,D and P respectively. Then we have Poss(D) = Poss(U)×Poss(D|U) = 0.8 and Poss(D) = Poss(U)×Poss(D|U)×Poss(P|D) = 0.48.

For values of elements (or attributes), XML restricts ele-ments (or attributes) to a unique precise value. It is not dif-ficult to find that this restriction does not always hold true.It is often the case that some data items may be completelyunknown and their possible values can be specified with apossibility distribution. For instance, the age (see Fig. 1, ageis a sub-element of element student) of a person in years isa unique nonnegative integer (i.e., its possible values belongto a continuous domain). If such a value is unknown so far,we can use a predefined triangular distribution by specifyingits value range (i.e., its start value and end value) to expressit.

As summarized in [17], we have two kinds of fuzziness inan XML document: (i) the first is the fuzziness in elements,and we use membership degrees associated with such ele-ments; (ii) the second is the fuzziness in values of elements(or attributes), and we use possibility distribution to repre-sent such values. Let us take the material originally providedin [17] as an example. Figure 1 shows a fragment of an XMLdocument with fuzzy information talking about the univer-sities in an area of a given cite, say Detroit, in the USA,in which a possibility attribute, denoted Poss, which takesa value of [0,1], should be introduced first. This possibilityattribute is applied together with a fuzzy construct called Valto specify the possibility of a given element existing in theXML document.

Consider Line 3 of Figure 1, <Val Poss = 0.8>, whereit is stated that the possibility of the given element univer-sity being Oakland University located in Detroit is equal to0.8. In our discussions, fuzzy XML indicates an XML doc-ument containing imprecise and uncertain information, aswell as deterministic information. For an element with pos-sibility 1.0, pair <Val Poss = 1.0> and </Val> is omittedfrom the XML document. Based on pair <Val Poss> and</Val>, possibility distribution being disjunctive for an ele-ment can be expressed. Also, possibility distribution can beused to express fuzzy elements and values. For this purpose,we introduce another fuzzy construct called Dist to spec-ify a possibility distribution. Typically, a Dist element has

Formal approach for reengineering fuzzy XML in fuzzy object-oriented databases

multiple Val elements as children, each with an associatedpossibility. Again consider Fig. 1. Lines 31–36 and 40–46are the Dist constructs for the age and email of student “TomSmith”, respectively. It should be pointed out that, the possi-bility distributions in lines 31–36 and lines 40–46 are all forleaf nodes. In fact, we can also have possibility distributionsand values over non-leaf nodes. Observe the disjunctive Distconstruct in lines 7–24, which express the three possible sta-tuses for the possibilistic variable employees. Lines 7–12imply the possibility of the possibilistic variable employeeis equal to 0.8, lines 13–18 express the possibility of the pos-sibilistic variable employee is equal to 0.6, and lines 19–24imply the possibility of the possibilistic variable employeeis equal to 0.5.

3.2 DTD modification in fuzzy XML

As soon as fuzzy information is accommodated in XML, theDTD of the source XML document has to be correspond-ingly modified. In this section, we focus on DTD modifica-tion for fuzzy XML data modeling.

First, the fuzzy construct Val is defined as follows:

<!ELEMENT Val (#PCDATA | original-definition)>

<!ATTLIST Val Poss CDATA “1.0”>

Then the fuzzy construct Dist can be defined as follows:

<!ELEMENT Dist (Val+)>

<!ATTLIST Dist type (disjunctive)>

Now we modify the element definition in the classicalDTD so that all of the elements can use possibility distri-butions (Dist). For a leaf element which only contains textor #PCDATA, say, leafElment, its definition in the DTD ischanged from

<!ELEMENT leafElement(#PCDATA)> to

<!ELEMENT leafElement(#PCDATA | Dist)>

that is, leaf element leafElement may be a deterministic one(e.g., sname of student in Fig. 1), and then could be definedas

<!ELEMENT leafElement(#PCDATA)>

Also, it is possible that leaf element leafElement may be afuzzy one, taking a value represented by a possibility distri-bution, then it may be defined as

<!ELEMENT leafElement(Dist)>

Furthermore, we have the following definition for theleaf elements with discrete values (e.g., email of student inFig. 1):

<!ELEMENT Dist (Val+)>

<!ATTLIST Dist type (disjunctive)>

<!ELEMENT Val (#PCDATA)>

<!ATTLIST Val Poss CDATA “1.0”>

1. <universities>2. <university UName = “Oakland University”>3. <Val Poss = 0.8>4. <department DName = “Computer Science andEngineering”>5. <employee FID = “85431095”>6. <Dist type = “disjunctive”>7. <Val Poss = 0.8>8. <fname>Frank Yager</name>9. <position>Associate Professor</position>10. <office>B1024</office>11. <course>Advances in Database Systems</course>12. </Val >13. <Val Poss = 0.6>14. <fname>Frank Yager</name>15. <position>Professor</position>16. <office>B1024</office>17. <course>Advances in Database Systems</course>18. </Val >19. <Val Poss = 0.5>20. <fname>Frank Yager</name>21. <position>Lecturer</position>22. <office>B1025</office>23. <course>Artificial Intelligence</course>24. </Val >25. . . .

26. </Dist>27. </employee>28. <student SID = “96421027”>29. <sname>Tom Smith</name>30. <age>31. <Dist type = “disjunctive”>32. <Val type = “triangular distribution”>33. <startValue>24</startValue>34. <endValue>25</endValue>35. </Val>36. </Dist>37. </age>38. <sex>Male</sex>39. <email>40. <Dist type = “disjunctive” >41. <Val Poss = 0.60>[email protected]</Val>42. <Val Poss = 0.85>[email protected]</Val>43. <Val Poss = 0.85>[email protected]</Val>44. <Val Poss = 0.55>[email protected]</Val>45. <Val Poss = 0.45>[email protected]</Val>46. </Dist>47. </email>48. </student>49. </department >50. </Val>51. <Val Poss = 0.5>52. <department DName = “Biology”>53. . . .

54. </department>55. </Val>56. . . .

57. </university>58. <university Uname = “Wayne State University”>59. . . .

60. </university>61. . . .

62. </universities >

Fig. 1 A fragment of an XML document with fuzzy data

J. Liu

And for the leaf elements with continuous values (e.g., ageof student in Fig. 1), we have the following definition:

<!ELEMENT Dist (Val+)>

<!ATTLIST Dist type (disjunctive)>

<!ELEMENT Val (startRange, endRange)>

<!ATTLIST Val type (triangular distribution)>

<!ELEMENT startRange (#PCDATA)>

<!ELEMENT endRange (#PCDATA)>

For the non-leaf element, say nonleafElement, first weshould change the element definition form

<!ELEMENT nonleafElement(original-definition)>

to

<!ELEMENT nonleafElement(original-definition |Val+ | Dist)>

and then add

<!ELEMENT Val(original-definition)>

that is, the non-leaf element nonleafElement may be deter-ministic (e.g., student in Fig. 1) and then may be defined as

<!ELEMENT nonleafElement(original-definition)>

When the non-leaf element nonleafElement is a fuzzyone, we differentiate two situations: the element takes avalue connected with a possibility degree (e.g., universityin Fig. 1), and second, the element takes a set of values andeach value is connected with a possibility degree (e.g., em-ployee in Fig. 1). The former is defined as follows:

<!ELEMENT nonleafElement(Val+)>

<!ELEMENT Val (original-definition)>

<!ATTLIST Val Poss CDATA “1.0”>

The latter is defined as follows:

<!ELEMENT nonleafElement(Dist)>

<!ELEMENT Dist (Val+)>

<!ATTLIST Dist type (disjunctive)>

<!ELEMENT Val (original-definition)>

<!ATTLIST Val Poss CDATA “1.0”>

4 Reengineering flat fuzzy XML in fuzzy OO databases

The flat fuzzy XML document is a very simple XML for-mat that only has simple leaf elements (or attributes) as sub-elements of non-leaf elements (e.g., the fragment of fuzzytree rooted at non-leaf element student in Fig. 1). In this sec-tion, we present the proposed process for reengineering flatfuzzy XML in a fuzzy OO database. The detailed reengi-neering rules are depicted as follows.

Algorithm 1 Reengineering Flat Fuzzy XML

Input: flat fuzzy XML DTD F

Output: corresponding fuzzy OO database schema O

01 Create a class in O by applying Rule 1

02 Define the OID for each class identified in step 1 byapplying Rule 2

03 Map the reference attribute R in F to the attributewithin the generated class by applying Rule 3

04 Transform the deterministic leaf elements L in F intoattributes within the generated class by applying Rule 4

05 Transform the deterministic leaf elements L in F intoattributes within the generated class by applying Rule 5and 6

Rule 1. For the flat fuzzy XML F , a class with the samename as the root of F and corresponding method are created.

Rule 2. For the key attribute with the name K in the flatfuzzy XML F , an attribute with the same name as K is usedfor creating the object identifier (OID) for class F .

Rule 3. For the reference attribute with the name R in theflat fuzzy XML F , an attribute with the same name as R iscreated and then put inside class F .

Rule 4. For each deterministic leaf element L in the flatfuzzy XML F , an attribute with the same name as L is cre-ated and then put inside class F .

Rule 5. For each fuzzy leaf element FL in the flat fuzzyXML F , if FL has a possibility to the root of F , i.e., the ele-ment takes a value connected with a possibility degree in theXML instance, then an attribute with the same name as FLfollowed by a pair of words WITH mem DEGREE, where0 ≤ mem ≤ 1 is a scalar and used to indicate the degree thatthe attribute belongs to the class, is created and then put in-side class F .

Rule 6. For each fuzzy leaf element FL in the flat fuzzyXML F , if L has a set of values and each value is connectedwith a possibility degree in the XML instance, then a key-word Fuzzy followed by an attribute with the same name asFL is created and then put inside class F .

Based on the reengineering roles above, we present thealgorithm for reengineering flat fuzzy XML in fuzzy OOdatabases. The pseudo-code is depicted in Algorithm 1.

To understand the steps of Algorithm 1, we present moredetails with the following supporting examples. Figure 2(a)shows a flat fuzzy XML DTD, where the root element SUVhas a key attribute OID, a reference attribute RID, and threeleaf elements Name, 4WD (4 wheel driven) and Displace-ment. According to Algorithm 1, a class named Car is firstlycreated, and then OID is translated into the object identi-fier of the created class. RID is mapped into one of the at-tributes of the created class. From Fig. 2(a), we know thatName is deterministic leaf element, therefore, it could be di-

Formal approach for reengineering fuzzy XML in fuzzy object-oriented databases

<!ELEMENT SUV (Name+, Val+, Displacement+)>

<!ATTLIST SUV OID ID #REQUIRED)>

<!ATTLIST SUV RID IDREF #REQUIRED)>

<!ELEMENT Name (#PCDATA)>

<!ELEMENT Val (4WD+)>

<!ELEMENT 4WD (#PCDATA)>

<!ELEMENT Displacement (Dist)>

<!ELEMENT Dist (Val+)>

<!ATTLIST Dist type (disjunctive)>

<!ELEMENT Val (#PCDATA)>

<!ATTLIST Val Poss CDATA “1.0”>

(a) Flat fuzzy XML DTD

(b) Fuzzy OO database schema

Fig. 2 Reengineering flat fuzzy XML in fuzzy OO database

rectly mapped into one of the attributes of the created class.From Fig. 2(a), we could also know that, 4WD may belongto SUV, since SUVs may or may not have 4WD in the realworld. So when class SUV is designed, it is unknown forsure if attribute 4WD should be included in class SUV. Asa result, we use attribute 4WD WITH mem DEGREE to de-scribe it in the class definition. Moreover, leaf element Dis-placement may take a set of fuzzy values, that is, its domainis fuzzy, therefore, an attribute whose name is Fuzzy Dis-placement is translated inside the created class. Figure 2(b)shows the final reengineering result.

After the schema reengineering, flat fuzzy XML datacould be effectively reorganized under the fuzzy OO schema.Migrating data from the flat fuzzy XML to OO platformconsists of uploading XML data, restructuring and upload-ing data to the OO platform. Since this migrating processingis quite straightforward, we will not go into the detailed pro-cess but only give the data migration rules so that we havean overall picture of data migration. We should first populatethe classes which are transformed from flat fuzzy XML, andthen map the key attribute, reference attribute, deterministicleaf elements and fuzzy leaf elements.

5 Reengineering nested fuzzy XML in fuzzy OOdatabases

In fuzzy XML, semantic association between data is ex-pressed as nested constraints between data values. To bemore specific, nested elements are used to define the rela-tionship between two elements, and reference attributes areused to express inheritance and association between XMLtrees (an XML tree consists of a set of elements and at-tributes, and is the basic unit for organizing XML infor-mation). In OO databases, attribute values between objects,and the semantic associations between objects, inheritanceor association are directly expressed as inheritance or ref-erences between objects by OIDs. In view of the differencebetween fuzzy XML and OO database, in the following, wewill introduce our solutions which could smoothly reengi-neer nested fuzzy XML in OO database. We begin with in-troducing the basic notion used in our approach.

Given any element e in fuzzy XML, a child element clue,denoted as CE(e) = {c1, c2, . . . , cn}, n = 1,2, . . . , n, is alldistinct deterministic child elements (or the nearest deter-ministic descendants) of e. This clue is easily derived fromDTD constraint. Two elements ei and ej apply to a commonclass if and only if they denote the objects of the domainof interest with common characteristics. Here, user involve-ment is required to identify these characteristics. In particu-lar, given that two elements ei and ej belonging to a com-mon class in fuzzy XML, a common child element clue, de-noted as CCE(ei, ej ), is all common deterministic child el-ements (or the nearest deterministic descendants) betweenei and ej . For example, consider the DTD in Fig. 3(a), allchildren of element SUV are OID, RID, Name, 4WD andDisplacement, therefore we have CE(SUV) = {OID, RID,Name, 4WD, Displacement}. Likewise, we have CE(Truck)= {OID, RID, Name, Carrying Capacity, Displacement}.Since SUV and Truck are both vehicles, that is to say, theyhave some common characteristics and belong to a com-mon vehicle class. As a result, we have CCE(SUV, Truck)= {OID, RID, Name, Displacement}.

In the following, we will give the reengineering rules ofthe relationships such as fuzzy generalizations, fuzzy asso-ciations and fuzzy aggregations in a fuzzy OO database.

Generalization in fuzzy OO database defines a sub-class/superclass relationship between classes: one class,which is called a superclass, is a more general description ofa set of other classes, which are called subclasses. In partic-ular, we have the following rule:

Rule 7. Given two non-leaf elements ei and ej in fuzzyXML, CE(ei) = {c1, c2, . . . , ci}, n = 1,2, . . . , i, CE(ei) ={c1, c2, . . . , cj }, n = 1,2, . . . , j , if CCE(ei, ej ) is not null,assume that CCE(ei, ej ) = {c1, c2, . . . , ck}, k ≤ i, and k ≤ j ,then a fuzzy generalization relationship is created whenreengineering nested fuzzy XML in fuzzy OO database. In

J. Liu

<!ELEMENT Vehicle (SUV+, Truck+)>

<!ELEMENT SUV (Name+, Val+, Displacement+)>

<!ATTLIST SUV OID ID #REQUIRED)>

<!ATTLIST SUV RID IDREF #REQUIRED)>

<!ELEMENT Name (#PCDATA)>

<!ELEMENT Val (4WD+)>

<!ELEMENT 4WD (#PCDATA)>

<!ELEMENT Displacement (Dist)>

<!ELEMENT Dist (Val+)>

<!ATTLIST Dist type (disjunctive)>

<!ELEMENT Val (#PCDATA)>

<!ATTLIST Val Poss CDATA “1.0”>

<!ELEMENT Truck (Name+, Carrying Capacity+,Displacement+)>

<!ELEMENT Name (#PCDATA)>

<!ATTLIST Truck OID ID #REQUIRED)>

<!ATTLIST Truck RID IDREF #REQUIRED)>

<!ELEMENT Carrying Capacity (#PCDATA)>

(a) Nested fuzzy XML DTD

(b) Fuzzy OO database schema

Fig. 3 Reengineering fuzzy generalization in fuzzy OO database

particular, a superclass with attributes with the same namesas c1, c2, . . . , ck is firstly created, and then two subclasseswith attributes with the same names as ck+1, ck+2, . . . , ci

and ck+1, ck+2, . . . , cj are created.For generalization, the variances among nested fuzzy

XML trees are suppressed and their commonalities are iden-tified by generalizing them into one single class. The orig-inal nested fuzzy XML tree with each of its unique differ-ences is the special subclass(es). For example, consideringthe DTD shown in Fig. 3(a), the root element Vehicle ofthe nested fuzzy XML has two non-leaf elements Truck andSUV as its descendants, where Truck has a key attribute

<!ELEMENT Owner (Address+, Salary+, Own+)>

<!ELEMENT Address (Dist)>

<!ELEMENT Salary (#PCDATA)>

<!ELEMENT Own (SUV+)>

<!ELEMENT SUV (Name+, Val+, Displacement+)>

<!ATTLIST SUV OID ID #REQUIRED)>

<!ATTLIST SUV RID IDREF #REQUIRED)>

<!ELEMENT Name (#PCDATA)>

<!ELEMENT Val (4WD+)>

<!ELEMENT 4WD (#PCDATA)>

<!ELEMENT Displacement (Dist)>

<!ELEMENT Dist (Val+)>

<!ATTLIST Dist type (disjunctive)>

<!ELEMENT Val (#PCDATA)>

<!ATTLIST Val Poss CDATA “1.0”>

(a) Nested fuzzy XML DTD

(b) Fuzzy OO database schema

Fig. 4 Reengineering fuzzy association in fuzzy OO databases

OID, a reference attribute RID, and three leaf elementsName, Carrying Capacity and Displacement, and SUV hasa key attribute OID, a reference attribute RID, and threeleaf elements Name, 4WD, and Displacement. Accordingto Rule 7, a superclass named Vehicle is firstly created (re-call the flat fuzzy XML reengineering shown in Sect. 4),where OID is translated into the object identifier of classVehicle, element RID, Name, and Displacement are mappedinto the attributes of class Vehicle. Then, two subclassesnamed Truck and SUV are created, where class Truck hasthe unique attribute Carrying Capacity and class SUV hasthe unique attribute 4WD inside the subclasses. Figure 3(b)shows the final reengineering result.

Formal approach for reengineering fuzzy XML in fuzzy object-oriented databases

<!ELEMENT Computer (Price, Host, Display+)>

<!ELEMENT Price (Dist)>

<!ELEMENT Host (Manufacturers+, Date+)>

<!ELEMENT Manufacturers (#PCDATA)>

<!ELEMENT Date (#PCDATA)>

<!ELEMENT Display (Name, Val+)>

<!ELEMENT Name (#PCDATA)>

<!ELEMENT Val (Specifications+)>

<!ELEMENT Specifications (#PCDATA)>

<!ELEMENT Dist (Val+)>

<!ATTLIST Dist type (disjunctive)>

<!ELEMENT Val (#PCDATA)>

<!ATTLIST Val Poss CDATA “1.0”>

(a) Nested fuzzy XML DTD

(b) Fuzzy OO database schema

Fig. 5 Reengineering fuzzy aggregation in fuzzy OO databases

In fuzzy XML, associations among XML trees are repre-sented by using links. Link is a physical conceptual connec-tion between object instances. Associations in fuzzy XMLdescribe a group of links with common structure and se-mantics and pointer as an attribute in an object that com-bines an explicit reference to another object. In practical ap-plications, these relationships could be effectively identifiedby user involvement. In particular, we have the followingrule:

Rule 8. Given two non-leaf elements ei and ej in fuzzyXML, assume that ei exists as an explicit reference to ej ,then a fuzzy association relationship is created when reengi-neering these two non-leaf elements of fuzzy XML in thefuzzy OO database. In particular, two classes with the samenames as ei and ej are firstly created, and a fuzzy associa-tion on an n : m basis with its corresponding multiplicity oflinks and pointers is then created. Here, n and m are the con-

straints constrained by cardinality, which could be directlyobtained by scanning the XML data beforehand.

Algorithm 2 Reengineering Nested Fuzzy XML

Input: nested fuzzy XML DTD F

Output: corresponding fuzzy OO database schema O

01 for each non-leaf elements in F

02 create a class whose attributes have the same namesas its child element clue in O by applying Algorithm 1

03 end for

04 for non-leaf elements with fuzzy generalizations

05 transform fuzzy generalizations applying Rule 7

06 end for

07 for the non-leaf elements with fuzzy associations

08 transform fuzzy associations by applying Rule 8

09 end for

10 for the non-leaf elements with fuzzy aggregations

11 transform fuzzy aggregations by applying Rule 9

12 end for

Figure 4 shows an example of reengineering fuzzy asso-ciation in fuzzy OO databases. According to Rule 8, twoclasses named Owner and SUV are created based on theDTD shown in Fig. 4(a). Assume that the ratio of cardinal-ity constraints is 1 : n, i.e., an owner might have at least oneSUV. Then a fuzzy association on a 1 : n basis is establishedbetween classes Owner and SUV. Figure 4(b) shows the fi-nal reengineering result.

Fuzzy aggregations are relationships that describe com-binations among class instances. In a fuzzy OO database,fuzzy aggregations permit the combination of class(es) thatare related into a higher level of a composite class. In partic-ular, we have the following rule:

Rule 9. Given three non-leaf elements ei , ej and ek infuzzy XML, assume that ei is the composite element ofej and ek , then a fuzzy aggregation relationship is createdwhen reengineering these three non-leaf elements of fuzzyXML in the fuzzy OO database. In particular, three classeswith the same names as ei , ej and ek are firstly created, andthen a fuzzy aggregation among classes ei , ej and ek is cre-ated.

Figure 5 shows an example of reengineering fuzzy ag-gregation in a fuzzy OO database. According to Rule 9,three classes named Computer, Host, and Display are cre-ated based on the DTD shown in Fig. 5(a). Then a fuzzyaggregation is established among classes Computer, Hostand Display. Figure 5(b) shows the final reengineering re-sult.

J. Liu

<!ELEMENT Vehicle (SUV+, Truck+)>

<!ELEMENT SUV (Name+, Val+, Displacement+)>

<!ELEMENT Name (#PCDATA)>

<!ELEMENT Val (4WD+)>

<!ELEMENT 4WD (#PCDATA)>

<!ELEMENT Displacement (Dist)>

<!ELEMENT Dist (Val+)>

<!ELEMENT Truck (Name+, Carrying Capacity +, Displacement+)>

<!ELEMENT Carrying Capacity (#PCDATA)>

<!ELEMENT Owner (Address+, Salary+, Own+)>

<!ELEMENT Address (Dist)>

<!ELEMENT Salary (#PCDATA)>

<!ELEMENT Own (SUV+, Computer+)>

<!ELEMENT Computer (Price, Host, Display+)>

<!ELEMENT Price (Dist)>

<!ELEMENT Host (Manufacturers+, Date+)>

<!ELEMENT Manufacturers (#PCDATA)>

<!ELEMENT Date (#PCDATA)>

<!ELEMENT Display (Name, Val+)>

<!ELEMENT Name (#PCDATA)>

<!ELEMENT Val (Specifications+)>

<!ELEMENT Specifications (#PCDATA)>

<!ELEMENT Dist (Val+)>

<!ATTLIST Dist type (disjunctive)>

<!ELEMENT Val (#PCDATA)>

<!ATTLIST Val Poss CDATA “1.0”>

(a) Nested fuzzy XML DTD

(b) Fuzzy OO database schema

Fig. 6 An example of reengineering nested fuzzy XML in fuzzy OO databases

Based on the reengineering roles above, we now presentthe algorithm for reengineering nested fuzzy XML in thefuzzy OO database. The pseudo-code is depicted in Algo-rithm 2.

To understand the steps of Algorithm 2, we present moredetails with the following supporting example. Accordingto Algorithm 2, non-leaf elements Owner, Vehicle, SUV,Truck, Computer, Host and Display are firstly transformed

Formal approach for reengineering fuzzy XML in fuzzy object-oriented databases

into corresponding classes. Since the relationships amongVehicle, SUV and Truck are fuzzy generation, then theycould be reengineered based on Rule 7 in a fuzzy OOdatabases. Moreover, the relationships among Owner, SUVand Computer are fuzzy association, then they could bereengineered based on Rule 8 in a fuzzy OO databases. Therelationships among Computer, Host and Display are fuzzyaggregation, they are therefore reengineered based on Rule 9in a fuzzy OO databases. Finally, the nested DTD shownin Fig. 6(a) could be reengineered as the schema shown inFig. 6(b).

6 Conclusion

In this paper, we study the methodology of reengineeringXML to object-oriented databases when database migra-tion occurs. In particular, considering the need of processingthe imprecise and uncertain information existing in prac-tical applications, we investigate the problem of migrat-ing fuzzy XML to fuzzy object-oriented databases. In or-der to achieve the formal transformation from fuzzy XMLto fuzzy OO database, we developed a set of rules, whichsuccessfully handles the reengineering process. Further-more, we develop comprehensive approaches to reengineerflat and nested XML DTDs in fuzzy OO databases. Theproposed approaches successfully help us find the object-oriented schema that best describes the existing fuzzy XMLschema.

Future research is both geared towards applicability andenhancement. We are currently working on a prototypeshowing the feasibility of the approach for large practicalapplications. We furthermore work on automatically reengi-neering fuzzy XML in fuzzy OO databases without hu-man intervention, as well as devising the reengineering ap-proach with the use of XML Schema, as enhancements ofour methodology.

Acknowledgements The authors thank the anonymous referees fortheir valuable comments and suggestions, which improved the techni-cal content and the presentation of the paper. The work is supportedby the National Natural Science Foundation of China (60873010)and the Fundamental Research Funds for the Central Universities(N090504005 and N100604017).

References

1. Abiteboul S, Segoufin L, Vianu V (2001) Representing and query-ing XML with incomplete information. In: Proceedings of PODS,pp 150–161

2. Abiteboul S, Senellart P (2006) Querying and updating probabilis-tic information in XML. In: Proceedings of EDBT, pp 1059–1068

3. An Y, Borgida A, Mylopoulos J (2005) Constructing complex se-mantic mappings between XML data and ontologies. In: Proceed-ings of international semantic web conference, pp 6–20

4. Bhalla N (1991) Object-oriented data models: a perspective andcomparative review. J Inf Sci 17:145–160

5. Calvanese D, Lenzerini M, Nardi D (1999) Unifying class-basedrepresentation formalisms. J Artif Intell Res 11:199–240

6. Christophides V, Abiteboul S, Cluet S, Scholl M (1994) Fromstructured documents to novel query facilities. In: Proceedings ofSIGMOD, pp 313–324

7. Fong J (1995) Mapping extended entity relationship model to ob-ject modeling technique. ACM SIGMOD Rec 24(3):18–22

8. Fong J (2002) Translating object-oriented database transactionsinto relational transactions. Inf Softw Technol 44(1):41–51

9. Fong J, Cheung SK (2005) Translating relational schema intoXML schema definition with data semantic preservation and XSDgraph. Inf SoftwTechnol 47:437–462

10. Gaurav A, Alhajj R (2006) Incorporating fuzziness in XML andmapping fuzzy relational data into fuzzy XML. In: Proceedings ofthe 2006 ACM symposium on applied computing, pp 456–460

11. Hollander ES, van Keulen M (2010) Storing and querying proba-bilistic XML using a probabilistic relational DBMS. In: Proceed-ings of the 4th international workshop on management of uncer-tain data (MUD 2010), pp 35–49

12. Hung E, Getoor L, Subrahmanian VS (2003) PXML: a proba-bilistic semistructured data model and algebra. In: Proceedings ofICDE, pp 467–478

13. Liu J, Ma ZM, Yan L (2009) Efficient processing of twig patternmatching in fuzzy XML. In: Proceedings of CIKM, pp 193–204

14. Ma ZM (2004) Advances in fuzzy object-oriented databases, mod-eling and applications. Idea Group Publishing, Hershey

15. Ma ZM, Liu J, Yan L (2010) Fuzzy data modeling and algebraicoperations in XML. Int J Intell Syst 25(9):925–947

16. Ma ZM, Liu J, Yan L (2011) Matching twigs in fuzzy XML. InfSci 181(1):184–200

17. Ma ZM, Yan L (2007) Fuzzy XML data modeling with the UMLand relational data models. Data Knowl Eng 63:972–996

18. Naser T, Alhajj R, Ridley MJ (2009) Two-way mapping betweenobject-oriented databases and XML. Informatica 33(3):297–308

19. Nierrman A, Jagadish HV (2002) ProTDB: probabilistic data inXML. In: Proc VLDB, pp 646–657

20. Pei J et al (2007) Probabilistic skylines on uncertain data. In: Pro-ceedings of VLDB, pp 15–26

21. Soutou C (2001) Modeling relationships in object-relationaldatabases. Data Knowl Eng 36:79–107

22. Turowski K, Weng U (2002) Representing and processing fuzzyinformation an XML-based approach. J Knowl Based Syst 15:67–75

23. Valova I, Milano G, Bowen K, Gueorguieva N (2011) Bridgingthe fuzzy, neural and evolutionary paradigms for automatic targetrecognition. Appl Intell 35(2):211–225

24. Zadeh LA (1965) Fuzzy sets. Inf Control 8(3):338–35325. Zadeh LA (1978) Fuzzy sets as a basis for a theory of possibility.

Fuzzy Sets Syst 1(1):3–2826. Zajaczkowski J, Verma B (2012) Selection and impact of different

topologies in multi-layered hierarchical fuzzy systems. Appl Intell36(3):564–584

27. Zhang X, Fong J (2000) Translating update operations from rela-tional to object-oriented databases. Inf Softw Technol 42:197–210

J. Liu

Jian Liu was born in 1984. Since2009, he has been a Ph.D. candi-date in computer application tech-nology from the Northeastern Uni-versity, Liaoning, China. His cur-rent research interests include un-certain databases and XML datamanagement.

Z.M. Ma is a Full Professor in Col-lege of Information Science and En-gineering at Northeastern Univer-sity, China. His current research in-terests include intelligent databasesystems, knowledge management,the Semantic Web and XML, knowledge-based systems, and engineeringdatabase modeling. He has pub-lished over 150 papers in interna-tional journals, conferences, editedbooks and encyclopedias in theseareas since 1998. He also edited andauthored several scholarly bookspublished by Idea Group Publishing

and Springer-Verlag, respectively. He is currently a Senior Member ofIEEE.

Xue Feng was born in 1982. Since2010, she has been a Ph.D. candi-date in computer application tech-nology from the Northeastern Uni-versity, Liaoning, China. Her cur-rent research is XML data manage-ment.


Recommended