[ICIST 2013] A Probabilistic Relational Data Model for Uncertain Information

A Probabilistic Relational Data Model for Uncertain Information

Nguyen Hoa and Tran Duc Hieu

IEEE 2013 the 3rd International Conference on Information Science and Technology (ICIST 2013)

March 23-25, Yangzhou, Jiangsu, China & March 27-28, Phuket, Thailand

Reporter: Tran Duc Hieu

Department for Computational and Knowledge Engineering

Institute of Applied Mechanics and Informatics

Vietnam Academy of Science and Technology

Contents

Introduction

Uncertain Attribute Values

Probabilistic Relational Data Base Model

Selection Operation

PRDB Management System

Conclusions and Future Works

March 28th, 2013 Nguyen Hoa & Tran Duc Hieu 2

Introduction

• Motivation

Traditional relational databases (RDB) are limited in representing and handling uncertain and imprecise information

Uncertain or imprecise information is important and common in daily life

• Objectives

Build a new Probabilistic Relational Data Base (PRDB) model to represent and handle uncertain information in the real world

Build an initial PRDB-SQLite management system to demonstrate that PRDB can be applied and processed in practice

Some Probabilistic Combination Strategies

• Given prob(e1) = [L1, U1] and prob(e2) = [L2, U2], the intervals prob(e1 ∧ e2), prob(e1 ∨ e2), and prob(e1 ∧ ¬e2) are calculated as follows

Strategy | Operator | Result
Independence | [L1, U1] ⊗in [L2, U2] | [L1 · L2, U1 · U2]
Independence | [L1, U1] ⊕in [L2, U2] | [L1 + L2 – (L1 · L2), U1 + U2 – (U1 · U2)]
Independence | [L1, U1] ⊖in [L2, U2] | [L1 · (1 – U2), U1 · (1 – L2)]
Mutual Exclusion | [L1, U1] ⊗me [L2, U2] | [0, 0]
Mutual Exclusion | [L1, U1] ⊕me [L2, U2] | [min(1, L1 + L2), min(1, U1 + U2)]
Mutual Exclusion | [L1, U1] ⊖me [L2, U2] | [L1, min(U1, 1 – L2)]
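
The combination strategies listed above can be sketched in Python as follows. This is only an illustration, not the authors' PRDB-SQLite code: the function names (conj_in, disj_in, diff_in, conj_me, disj_me, diff_me) and the representation of an interval as a (lower, upper) tuple are assumptions made for this example.

```python
from typing import Tuple

Interval = Tuple[float, float]  # (lower bound, upper bound), hypothetical representation


def conj_in(a: Interval, b: Interval) -> Interval:
    """Conjunction under independence: [L1*L2, U1*U2]."""
    return (a[0] * b[0], a[1] * b[1])


def disj_in(a: Interval, b: Interval) -> Interval:
    """Disjunction under independence: [L1+L2-L1*L2, U1+U2-U1*U2]."""
    return (a[0] + b[0] - a[0] * b[0], a[1] + b[1] - a[1] * b[1])


def diff_in(a: Interval, b: Interval) -> Interval:
    """Difference under independence: [L1*(1-U2), U1*(1-L2)]."""
    return (a[0] * (1 - b[1]), a[1] * (1 - b[0]))


def conj_me(a: Interval, b: Interval) -> Interval:
    """Conjunction under mutual exclusion: always [0, 0]."""
    return (0.0, 0.0)


def disj_me(a: Interval, b: Interval) -> Interval:
    """Disjunction under mutual exclusion: [min(1, L1+L2), min(1, U1+U2)]."""
    return (min(1.0, a[0] + b[0]), min(1.0, a[1] + b[1]))


def diff_me(a: Interval, b: Interval) -> Interval:
    """Difference under mutual exclusion: [L1, min(U1, 1-L2)]."""
    return (a[0], min(a[1], 1 - b[0]))


e1 = (0.5, 0.5)
e2 = (0.25, 0.5)
print(conj_in(e1, e2), disj_in(e1, e2), diff_in(e1, e2))
print(conj_me(e1, e2), disj_me(e1, e2), diff_me(e1, e2))
```

Keeping each operator as a separate pure function makes it easy to pass the chosen strategy as a parameter when evaluating a selection expression.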

Uncertain Attribute Values

• In PRDB, the value of each attribute is a probabilistic triple

A: ⟨V, α, β⟩

V: a set of values of the attribute A (V = {v1, v2, …, vk})

α, β: lower-bound and upper-bound probability distributions on V

The attribute A takes a value v ∈ V with a probability in [α(v), β(v)]
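
A probabilistic triple as described above could be represented along these lines. This is a sketch under stated assumptions: the class name ProbTriple and the helper uniform_triple are hypothetical, and the helper assumes that u in the example relations denotes the uniform distribution on V, with factors such as 0.8 and 1.2 scaling it.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Hashable, Iterable, Tuple


@dataclass
class ProbTriple:
    """A probabilistic triple <V, alpha, beta> for one attribute value."""
    values: FrozenSet[Hashable]
    alpha: Dict[Hashable, float]  # lower-bound distribution on V
    beta: Dict[Hashable, float]   # upper-bound distribution on V

    def interval(self, v: Hashable) -> Tuple[float, float]:
        """Return [alpha(v), beta(v)], or [0, 0] when v is not in V."""
        if v not in self.values:
            return (0.0, 0.0)
        return (self.alpha[v], self.beta[v])


def uniform_triple(values: Iterable[Hashable],
                   lo_scale: float = 1.0,
                   hi_scale: float = 1.0) -> ProbTriple:
    """Build <V, lo_scale*u, hi_scale*u>, assuming u is uniform on V."""
    vs = frozenset(values)
    u = 1.0 / len(vs)
    alpha = {v: lo_scale * u for v in vs}
    beta = {v: min(1.0, hi_scale * u) for v in vs}  # keep probabilities <= 1
    return ProbTriple(vs, alpha, beta)


# A DISEASE value with two candidates and bounds 0.8u and 1.2u.
disease = uniform_triple(["lung cancer", "tuberculosis"], 0.8, 1.2)
print(disease.interval("lung cancer"))
print(disease.interval("hepatitis"))
```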

PRDB Model

The PRDB model extends the RDB model by integrating uncertain attribute values

Each tuple of a relation is a list of probabilistic triples

t = (⟨V1, α1, β1⟩, ⟨V2, α2, β2⟩, …, ⟨Vk, αk, βk⟩)

Probabilistic Relations

A probabilistic relation r over a probabilistic relational schema R(A1, A2, …, Ak) is

r = {t | t = (⟨V1, α1, β1⟩, ⟨V2, α2, β2⟩, …, ⟨Vk, αk, βk⟩)}

Example

PATIENT_ID | PHYSICIAN_ID | DISEASE | DURATION
⟨{PT0421}, u, u⟩ | ⟨{DT005}, u, u⟩ | ⟨{lung cancer, tuberculosis}, 0.8u, 1.2u⟩ | ⟨{400, 500}, u, u⟩
⟨{PT3829}, u, u⟩ | ⟨{DT093}, u, u⟩ | ⟨{hepatitis, cirrhosis}, u, u⟩ | ⟨{30, 40}, u, u⟩
⟨{PT2938}, u, u⟩ | ⟨{DT102}, u, u⟩ | ⟨{hepatitis}, u, u⟩ | ⟨{30}, u, u⟩

Probabilistic Functional Dependencies

The probabilistic measure for equal attribute values

prob(t1.A = t2.A) = [Σ_{v ∈ W} α(v), min(1, Σ_{v ∈ W} β(v))], if W ≠ ∅

prob(t1.A = t2.A) = [0, 0], if W = ∅

where t1.A = ⟨V1, α1, β1⟩, t2.A = ⟨V2, α2, β2⟩, and

[α(v), β(v)] = [α1(v1), β1(v1)] ⊗ [α2(v2), β2(v2)],

∀v ∈ W = {(v1, v2) ∈ V1 × V2 | v1 = v2}
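
The measure above can be illustrated with a short Python sketch. It assumes the independence conjunction strategy for combining the two intervals (the slide does not fix a particular strategy) and represents a triple as a plain (V, alpha, beta) tuple; the function name prob_equal is hypothetical.

```python
from typing import Dict, Hashable, Set, Tuple

Triple = Tuple[Set[Hashable], Dict[Hashable, float], Dict[Hashable, float]]


def prob_equal(a: Triple, b: Triple) -> Tuple[float, float]:
    """Interval for t1.A = t2.A, assuming independence between the two values."""
    v1, alpha1, beta1 = a
    v2, alpha2, beta2 = b
    # W holds the pairs (v, v) with v in both V1 and V2, so the common
    # values are enough to enumerate it.
    common = v1 & v2
    if not common:
        return (0.0, 0.0)
    lower = sum(alpha1[v] * alpha2[v] for v in common)
    upper = min(1.0, sum(beta1[v] * beta2[v] for v in common))
    return (lower, upper)


# DISEASE values of the second and third example tuples, with u read as uniform.
d2 = ({"hepatitis", "cirrhosis"},
      {"hepatitis": 0.5, "cirrhosis": 0.5},
      {"hepatitis": 0.5, "cirrhosis": 0.5})
d3 = ({"hepatitis"}, {"hepatitis": 1.0}, {"hepatitis": 1.0})
print(prob_equal(d2, d3))
```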

The probabilistic functional dependency in PRDB extends the functional dependency in RDB

∀t1, t2 ∈ r, prob(t1[X] = t2[X]) ≤ prob(t1[Y] = t2[Y])

Selection Expressions

x.A θ v: x ∈ X, A is an attribute in R, θ is a binary relation from {=, ≠, ≤, <, >, ≥}, and v is a value

x.A1 =⊗ x.A2: ⊗ is a probabilistic conjunction strategy for combining the probabilities of x.A1 = v1 and x.A2 = v2 such that v1 = v2

E1 ⊗ E2: E1 and E2 are selection expressions

E1 ⊕ E2: ⊕ is a probabilistic disjunction strategy

Example Relation DIAGNOSE

Selection expression

(x.DURATION 40) (x.COST 60)

PATIENT_ID | PHYSICIAN_ID | DISEASE | DURATION | COST
⟨{PT0421}, u, u⟩ | ⟨{DT005}, u, u⟩ | ⟨{lung cancer, tuberculosis}, 0.8u, 1.2u⟩ | ⟨{400, 500}, u, u⟩ | ⟨{300, 350}, u, u⟩
⟨{PT3829}, u, u⟩ | ⟨{DT093}, u, u⟩ | ⟨{hepatitis, cirrhosis}, u, u⟩ | ⟨{30, 40}, u, u⟩ | ⟨{60, 70}, u, u⟩
⟨{PT2938}, u, u⟩ | ⟨{DT102}, u, u⟩ | ⟨{hepatitis}, u, u⟩ | ⟨{30}, u, u⟩ | ⟨{60}, u, u⟩

Selection Conditions

(E)[L, U]: E is a selection expression and [L, U] is a probabilistic interval

¬φ, (φ ∧ ψ), (φ ∨ ψ): φ and ψ are selection conditions

Example

((x.DURATION 40) (x.COST 60))[0.4, 0.6])

Probabilistic Interpretation of Selection Expressions

prob_t(E) is the probabilistic interval with which a tuple t satisfies the selection expression E

prob_t(x.A θ d) = [Σ_{v ∈ W} α(v), min(1, Σ_{v ∈ W} β(v))]

prob_t(x.A1 =⊗ x.A2) = [Σ_{v ∈ W} α(v), min(1, Σ_{v ∈ W} β(v))]

prob_t(E1 ⊗ E2) = prob_t(E1) ⊗ prob_t(E2)

prob_t(E1 ⊕ E2) = prob_t(E1) ⊕ prob_t(E2)
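
The interpretation rules above can be sketched in Python. Several points are assumptions made for the example: for an atomic expression, W is taken to be the set of values in V that satisfy the comparison with d; the independence strategies are used for ⊗ and ⊕; and the names prob_atomic, conj_in, disj_in, and uniform are hypothetical.

```python
import operator
from typing import Callable, Dict, Hashable, Iterable, Set, Tuple

Interval = Tuple[float, float]
Triple = Tuple[Set[Hashable], Dict[Hashable, float], Dict[Hashable, float]]


def uniform(values: Iterable[Hashable]) -> Triple:
    """Triple <V, u, u>, reading u as the uniform distribution on V."""
    vs = set(values)
    p = 1.0 / len(vs)
    return (vs, {v: p for v in vs}, {v: p for v in vs})


def prob_atomic(triple: Triple, theta: Callable, d) -> Interval:
    """Interval for x.A theta d: sum the bounds over W = {v in V | v theta d}."""
    values, alpha, beta = triple
    w = [v for v in values if theta(v, d)]
    lower = sum((alpha[v] for v in w), 0.0)
    upper = min(1.0, sum((beta[v] for v in w), 0.0))
    return (lower, upper)


def conj_in(a: Interval, b: Interval) -> Interval:
    """E1 (x) E2 under the independence strategy."""
    return (a[0] * b[0], a[1] * b[1])


def disj_in(a: Interval, b: Interval) -> Interval:
    """E1 (+) E2 under the independence strategy."""
    return (a[0] + b[0] - a[0] * b[0], a[1] + b[1] - a[1] * b[1])


# One tuple with two uncertain attributes.
t = {"DISEASE": uniform(["hepatitis", "cirrhosis"]), "COST": uniform([60, 70])}
p_disease = prob_atomic(t["DISEASE"], operator.eq, "hepatitis")
p_cost = prob_atomic(t["COST"], operator.lt, 70)
print(conj_in(p_disease, p_cost))
print(disj_in(p_disease, p_cost))
```

Passing the comparison as a function from the standard operator module keeps prob_atomic independent of the particular relation θ.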

Satisfaction of Selection Conditions

prob_t ⊨ (E)[L, U] if and only if prob_t(E) ⊆ [L, U]

prob_t ⊨ ¬φ if and only if prob_t ⊨ φ does not hold

prob_t ⊨ (φ ∧ ψ) if and only if prob_t ⊨ φ and prob_t ⊨ ψ

prob_t ⊨ (φ ∨ ψ) if and only if prob_t ⊨ φ or prob_t ⊨ ψ
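
The four satisfaction rules can be read as a small recursive check, sketched below. Two assumptions are made here: the first rule is read as interval containment of prob_t(E) in [L, U], and a condition is encoded as a nested tuple whose atomic case already carries the computed interval prob_t(E). The encoding and the name satisfies are hypothetical.

```python
from typing import Tuple, Union

Interval = Tuple[float, float]
# Hypothetical encoding of a selection condition:
#   ("atom", prob_t_of_E, (L, U))   for (E)[L, U]
#   ("not", c), ("and", c1, c2), ("or", c1, c2)
Condition = Union[tuple]


def satisfies(cond: Condition) -> bool:
    """Check a selection condition for one tuple, following the four rules."""
    kind = cond[0]
    if kind == "atom":
        (lower, upper), (big_l, big_u) = cond[1], cond[2]
        # Containment: [lower, upper] must lie inside [L, U].
        return big_l <= lower and upper <= big_u
    if kind == "not":
        return not satisfies(cond[1])
    if kind == "and":
        return satisfies(cond[1]) and satisfies(cond[2])
    if kind == "or":
        return satisfies(cond[1]) or satisfies(cond[2])
    raise ValueError(f"unknown condition kind: {kind!r}")


inside = ("atom", (0.5, 0.5), (0.4, 0.6))
outside = ("atom", (1.0, 1.0), (0.4, 0.6))
print(satisfies(inside), satisfies(outside))
print(satisfies(("and", inside, ("not", outside))))
```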

Selection Operation

The selection on a relation r with respect to a selection condition φ

σ_φ(r) = {t ∈ r | prob_{R,r,t} ⊨ φ}
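
The selection operation can be sketched end to end in Python on the DIAGNOSE data. The sketch rests on several assumptions: u is read as the uniform distribution; the condition used is (x.DISEASE = hepatitis ⊗in x.COST ≤ 70)[0.4, 0.6], where the comparison ≤ and the independence conjunction are guesses, since the operators are not legible in the slide; satisfaction is read as interval containment; and each tuple keeps a plain "id" string instead of full triples for the identifier columns. All function names are hypothetical.

```python
import operator


def triple(values, lo_scale=1.0, hi_scale=1.0):
    """Build (V, alpha, beta), assuming u is the uniform distribution on V."""
    vs = list(values)
    u = 1.0 / len(vs)
    return (set(vs),
            {v: lo_scale * u for v in vs},
            {v: min(1.0, hi_scale * u) for v in vs})


def prob_atomic(tr, theta, d):
    """Interval for x.A theta d, summing the bounds over the matching values."""
    values, alpha, beta = tr
    w = [v for v in values if theta(v, d)]
    return (sum((alpha[v] for v in w), 0.0),
            min(1.0, sum((beta[v] for v in w), 0.0)))


def conj_in(a, b):
    """Conjunction under the independence strategy."""
    return (a[0] * b[0], a[1] * b[1])


def select(relation, prob_expr, low, high):
    """Keep the tuples t whose interval prob_expr(t) lies inside [low, high]."""
    result = []
    for t in relation:
        lower, upper = prob_expr(t)
        if low <= lower and upper <= high:
            result.append(t)
    return result


DIAGNOSE = [
    {"id": "PT0421", "DISEASE": triple(["lung cancer", "tuberculosis"], 0.8, 1.2),
     "COST": triple([300, 350])},
    {"id": "PT3829", "DISEASE": triple(["hepatitis", "cirrhosis"]),
     "COST": triple([60, 70])},
    {"id": "PT2938", "DISEASE": triple(["hepatitis"]), "COST": triple([60])},
]


def expr(t):
    """Assumed expression: (x.DISEASE = hepatitis) combined with (x.COST <= 70)."""
    return conj_in(prob_atomic(t["DISEASE"], operator.eq, "hepatitis"),
                   prob_atomic(t["COST"], operator.le, 70))


print([t["id"] for t in select(DIAGNOSE, expr, 0.4, 0.6)])  # prints ['PT3829']
```

Under these assumptions only the second tuple qualifies: its interval is [0.5, 0.5], while the first tuple yields [0, 0] and the third yields [1, 1].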

Example Relation DIAGNOSE

Selection operation on DIAGNOSE with the selection condition

(x.DISEASE = hepatitis in x.COST 70)[0.4, 0.6])

is t =

PATIENT_ID | PHYSICIAN_ID | DISEASE | DURATION | COST
⟨{PT0421}, u, u⟩ | ⟨{DT005}, u, u⟩ | ⟨{lung cancer, tuberculosis}, 0.8u, 1.2u⟩ | ⟨{400, 500}, u, u⟩ | ⟨{300, 350}, u, u⟩
⟨{PT3829}, u, u⟩ | ⟨{DT093}, u, u⟩ | ⟨{hepatitis, cirrhosis}, u, u⟩ | ⟨{30, 40}, u, u⟩ | ⟨{60, 70}, u, u⟩
⟨{PT2938}, u, u⟩ | ⟨{DT102}, u, u⟩ | ⟨{hepatitis}, u, u⟩ | ⟨{30}, u, u⟩ | ⟨{60}, u, u⟩

PRDB-SQLite Architecture

PRDB-SQLite Schema

PRDB-SQLite Relation

PRDB-SQLite Query

Conclusions & Future Works

• Conclusions

Built a new PRDB model that extends the RDB model

Uncertain values in the PRDB model are represented by probabilistic triples

The notions of schema, relation, functional dependency, and selection are defined for PRDB

Implemented a simple visual management system for PRDB

• Future Works

Build all other relational algebra operations on PRDB

Build a complete database management system for PRDB

Integrate fuzzy set values into attribute values to build a fuzzy and probabilistic relational database

Any questions for us?

March 27-28th 2013 in Kathu, Phuket, Thailand Sea Pearl Villa Resort
