Post on 26-Jan-2015
A Probabilistic Relational Data Model for Uncertain Information
Nguyen Hoa and Tran Duc Hieu
2013 IEEE 3rd International Conference on Information Science and Technology (ICIST 2013)
March 23-25, Yangzhou, Jiangsu, China & March 27-28, Phuket, Thailand
Presenter: Tran Duc Hieu, Department for Computational and Knowledge Engineering
Institute of Applied Mechanics and Informatics
Vietnam Academy of Science and Technology
Contents
• Introduction
• Uncertain Attribute Values
• Probabilistic Relational Data Base Model
• Selection Operation
• PRDB Management System
• Conclusions and Future Works
March 28th, 2013 Nguyen Hoa & Tran Duc Hieu 2
Introduction
• Motivation
  - Traditional relational databases (RDB) are restricted in representing and handling uncertain and imprecise information
  - Uncertain or imprecise information is important and very common in daily life
Introduction
• Objectives
  - Build a new Probabilistic Relational Data Base (PRDB) model to represent and handle uncertain information in the real world
  - Build an initial PRDB-SQLite management system to demonstrate that the PRDB model can be applied and processed in practice
Some Probabilistic Combination Strategies
• Let prob(e1) = [L1, U1] and prob(e2) = [L2, U2]. Then prob(e1 ∧ e2), prob(e1 ∨ e2) and prob(e1 ∧ ¬e2) are calculated as follows:

Strategy | Operator | Result
Independence | [L1, U1] ⊗in [L2, U2] | [L1 · L2, U1 · U2]
Independence | [L1, U1] ⊕in [L2, U2] | [L1 + L2 − (L1 · L2), U1 + U2 − (U1 · U2)]
Independence | [L1, U1] ⊖in [L2, U2] | [L1 · (1 − U2), U1 · (1 − L2)]
Mutual Exclusion | [L1, U1] ⊗me [L2, U2] | [0, 0]
Mutual Exclusion | [L1, U1] ⊕me [L2, U2] | [min(1, L1 + L2), min(1, U1 + U2)]
Mutual Exclusion | [L1, U1] ⊖me [L2, U2] | [L1, min(U1, 1 − L2)]
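The six operators in the table translate directly into interval arithmetic. The sketch below is a minimal Python rendering; the function names (`conj_in`, `disj_in`, `diff_in`, `conj_me`, `disj_me`, `diff_me`) and the `(L, U)` tuple encoding are hypothetical choices for illustration, not part of PRDB-SQLite.

```python
from typing import Tuple

Interval = Tuple[float, float]  # (L, U) with 0 <= L <= U <= 1


# Independence ("in"): e1 and e2 are assumed probabilistically independent.
def conj_in(a: Interval, b: Interval) -> Interval:
    """[L1, U1] ⊗in [L2, U2]: bounds for prob(e1 AND e2)."""
    (l1, u1), (l2, u2) = a, b
    return (l1 * l2, u1 * u2)


def disj_in(a: Interval, b: Interval) -> Interval:
    """[L1, U1] ⊕in [L2, U2]: bounds for prob(e1 OR e2)."""
    (l1, u1), (l2, u2) = a, b
    return (l1 + l2 - l1 * l2, u1 + u2 - u1 * u2)


def diff_in(a: Interval, b: Interval) -> Interval:
    """[L1, U1] ⊖in [L2, U2]: bounds for prob(e1 AND NOT e2)."""
    (l1, u1), (l2, u2) = a, b
    return (l1 * (1 - u2), u1 * (1 - l2))


# Mutual exclusion ("me"): e1 and e2 are assumed never to occur together.
def conj_me(a: Interval, b: Interval) -> Interval:
    """[L1, U1] ⊗me [L2, U2]: the conjunction is impossible."""
    return (0.0, 0.0)


def disj_me(a: Interval, b: Interval) -> Interval:
    """[L1, U1] ⊕me [L2, U2]: bounds add up, capped at 1."""
    (l1, u1), (l2, u2) = a, b
    return (min(1.0, l1 + l2), min(1.0, u1 + u2))


def diff_me(a: Interval, b: Interval) -> Interval:
    """[L1, U1] ⊖me [L2, U2]: bounds for prob(e1 AND NOT e2)."""
    (l1, u1), (l2, u2) = a, b
    return (l1, min(u1, 1 - l2))


e1, e2 = (0.4, 0.6), (0.5, 0.5)
print(conj_me(e1, e2))  # (0.0, 0.0)
print(diff_me(e1, e2))  # (0.4, 0.5)
```

Floating-point bounds are used here for brevity; exact `fractions.Fraction` values work with the same functions.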
Uncertain Attribute Values
• In PRDB, the value of each attribute A is a probabilistic triple ⟨V, α, β⟩, where:
  - V = {v1, v2, …, vk} is a set of values of the attribute A
  - α and β are the lower-bound and upper-bound probability distributions on V
• The attribute A takes a value v ∈ V with a probability in [α(v), β(v)]
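A probabilistic triple can be encoded as a small record holding V and the two bound functions. The sketch assumes, for illustration only, that α and β are stored as dictionaries of exact `Fraction` values and that `u` in the example tables denotes the uniform distribution on V (so `0.8u` gives every value the lower bound 0.8/|V|); `ProbTriple` and `scaled_uniform` are hypothetical names.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Dict, FrozenSet, Hashable, Tuple


@dataclass
class ProbTriple:
    """Hypothetical encoding of a probabilistic triple <V, alpha, beta>."""
    values: FrozenSet[Hashable]       # V: the candidate values of attribute A
    alpha: Dict[Hashable, Fraction]   # lower-bound probability for each v in V
    beta: Dict[Hashable, Fraction]    # upper-bound probability for each v in V

    def __post_init__(self) -> None:
        # Every v in V needs bounds with 0 <= alpha(v) <= beta(v) <= 1.
        for v in self.values:
            lo, hi = self.alpha[v], self.beta[v]
            if not 0 <= lo <= hi <= 1:
                raise ValueError(f"invalid bounds for {v!r}: [{lo}, {hi}]")

    def interval(self, v: Hashable) -> Tuple[Fraction, Fraction]:
        """[alpha(v), beta(v)]: the probability interval for A taking the value v."""
        return (self.alpha[v], self.beta[v])


def scaled_uniform(values, factor="1") -> Dict[Hashable, Fraction]:
    """factor * u, ASSUMING u is the uniform distribution on V."""
    return {v: Fraction(factor) / len(values) for v in values}


V = frozenset({"lung cancer", "tuberculosis"})
disease = ProbTriple(V, scaled_uniform(V, "0.8"), scaled_uniform(V, "1.2"))
print(disease.interval("tuberculosis"))  # (Fraction(2, 5), Fraction(3, 5))
```

Under this reading, the DISEASE value ⟨{lung cancer, tuberculosis}, 0.8u, 1.2u⟩ says each disease holds with a probability between 0.4 and 0.6.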
PRDB Model
The PRDB model extends the RDB model by integrating uncertain attribute values.
Each tuple of a relation is a list of probabilistic triples:
$$t = (\langle V_1, \alpha_1, \beta_1\rangle, \langle V_2, \alpha_2, \beta_2\rangle, \ldots, \langle V_k, \alpha_k, \beta_k\rangle)$$
Probabilistic Relations
A probabilistic relation r over a probabilistic relational schema R(A1, A2, …, Ak) is
$$r = \{\, t \mid t = (\langle V_1, \alpha_1, \beta_1\rangle, \langle V_2, \alpha_2, \beta_2\rangle, \ldots, \langle V_k, \alpha_k, \beta_k\rangle) \,\}$$
Example

PATIENT_ID | PHYSICIAN_ID | DISEASE | DURATION
⟨{PT0421}, u, u⟩ | ⟨{DT005}, u, u⟩ | ⟨{lung cancer, tuberculosis}, 0.8u, 1.2u⟩ | ⟨{400, 500}, u, u⟩
⟨{PT3829}, u, u⟩ | ⟨{DT093}, u, u⟩ | ⟨{hepatitis, cirrhosis}, u, u⟩ | ⟨{30, 40}, u, u⟩
⟨{PT2938}, u, u⟩ | ⟨{DT102}, u, u⟩ | ⟨{hepatitis}, u, u⟩ | ⟨{30}, u, u⟩
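The tuple and relation definitions map onto plain Python containers. In this hypothetical encoding a triple is `(V, alpha, beta)`, a tuple is a dict from attribute name to triple, and a relation is a list of tuples. The sketch rebuilds the example relation, assuming `u` is the uniform distribution on V and reading each value list (for example 400, 500) as the set V; `u_triple` and `conforms` are illustrative helpers.

```python
from fractions import Fraction

# Hypothetical encoding: triple = (frozenset V, dict alpha, dict beta);
# tuple = dict attribute -> triple; relation = list of tuples.


def u_triple(values, lo="1", hi="1"):
    """Build <V, lo*u, hi*u>, ASSUMING u is the uniform distribution on V."""
    V = frozenset(values)
    alpha = {v: Fraction(lo) / len(V) for v in V}
    beta = {v: Fraction(hi) / len(V) for v in V}
    return (V, alpha, beta)


SCHEMA = ("PATIENT_ID", "PHYSICIAN_ID", "DISEASE", "DURATION")

relation = [
    dict(zip(SCHEMA, (u_triple({"PT0421"}), u_triple({"DT005"}),
                      u_triple({"lung cancer", "tuberculosis"}, "0.8", "1.2"),
                      u_triple({400, 500})))),
    dict(zip(SCHEMA, (u_triple({"PT3829"}), u_triple({"DT093"}),
                      u_triple({"hepatitis", "cirrhosis"}),
                      u_triple({30, 40})))),
    dict(zip(SCHEMA, (u_triple({"PT2938"}), u_triple({"DT102"}),
                      u_triple({"hepatitis"}),
                      u_triple({30})))),
]


def conforms(r, schema):
    """True if every tuple assigns exactly one triple to each attribute of the schema."""
    return all(set(t) == set(schema) for t in r)


print(conforms(relation, SCHEMA))                 # True
print(relation[0]["DISEASE"][1]["tuberculosis"])  # 2/5
```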
Probabilistic Functional Dependencies
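The probabilistic measure for equal attribute values:
$$
\mathrm{prob}(t_1.A = t_2.A) =
\begin{cases}
\left[\sum_{v \in W} \alpha(v),\ \min\!\left(1, \sum_{v \in W} \beta(v)\right)\right], & \text{if } W \neq \emptyset\\
[0, 0], & \text{if } W = \emptyset
\end{cases}
$$
where t1.A = ⟨V1, α1, β1⟩, t2.A = ⟨V2, α2, β2⟩, W = {(v1, v2) ∈ V1 × V2 | v1 = v2}, and for every v = (v1, v2) ∈ W, with ⊗ a probabilistic conjunction strategy,
$$[\alpha(v), \beta(v)] = [\alpha_1(v_1), \beta_1(v_1)] \otimes [\alpha_2(v_2), \beta_2(v_2)]$$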
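The equality measure can be computed by enumerating W, combining the bounds of each matching pair, and summing. The slide leaves the conjunction strategy ⊗ open; this sketch assumes the independence strategy ⊗in, and `prob_equal` and `uniform_triple` are hypothetical helpers using the `(V, alpha, beta)` encoding with `u` read as the uniform distribution.

```python
from fractions import Fraction


def conj_in(a, b):
    """Independence conjunction strategy: [L1*L2, U1*U2]."""
    return (a[0] * b[0], a[1] * b[1])


def prob_equal(tr1, tr2, conj=conj_in):
    """prob(t1.A = t2.A) for triples tr1 = (V1, alpha1, beta1), tr2 = (V2, alpha2, beta2)."""
    V1, a1, b1 = tr1
    V2, a2, b2 = tr2
    # W = {(v1, v2) in V1 x V2 | v1 = v2}
    W = [(v1, v2) for v1 in V1 for v2 in V2 if v1 == v2]
    if not W:
        return (Fraction(0), Fraction(0))
    # [alpha(v), beta(v)] = [alpha1(v1), beta1(v1)] ⊗ [alpha2(v2), beta2(v2)]
    parts = [conj((a1[v1], b1[v1]), (a2[v2], b2[v2])) for v1, v2 in W]
    return (sum(p[0] for p in parts), min(Fraction(1), sum(p[1] for p in parts)))


def uniform_triple(values):
    """<V, u, u>, ASSUMING u is the uniform distribution on V."""
    V = frozenset(values)
    dist = {v: Fraction(1, len(V)) for v in V}
    return (V, dist, dict(dist))


d3829 = uniform_triple({"hepatitis", "cirrhosis"})        # DISEASE of PT3829
d2938 = uniform_triple({"hepatitis"})                     # DISEASE of PT2938
d0421 = uniform_triple({"lung cancer", "tuberculosis"})   # W is empty, bounds irrelevant
print(prob_equal(d3829, d2938))  # (Fraction(1, 2), Fraction(1, 2))
print(prob_equal(d0421, d2938))  # (Fraction(0, 1), Fraction(0, 1))
```

Passing a different `conj` function (for example a mutual-exclusion strategy) changes only how the per-pair bounds are combined.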
Probabilistic Functional Dependencies
The probabilistic functional dependency in PRDB extends the functional dependency in RDB:
$$\forall t_1, t_2 \in r:\ \mathrm{prob}(t_1[X] = t_2[X]) \le \mathrm{prob}(t_1[Y] = t_2[Y])$$
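The deck does not spell out how prob(t1[X] = t2[X]) is obtained for a set of attributes X, nor how two intervals are compared. The sketch below makes both choices explicit as assumptions: per-attribute equality intervals are combined with ⊗in, and [L1, U1] ≤ [L2, U2] is read as L1 ≤ L2 and U1 ≤ U2. All function names, the dependency PATIENT_ID → PHYSICIAN_ID, and the small relations `ok` and `bad` are hypothetical.

```python
from fractions import Fraction
from functools import reduce
from itertools import product


def conj_in(a, b):
    """Independence conjunction strategy: [L1*L2, U1*U2]."""
    return (a[0] * b[0], a[1] * b[1])


def prob_equal(tr1, tr2):
    """prob(t1.A = t2.A) for triples (V, alpha, beta), combining with conj_in."""
    (V1, a1, b1), (V2, a2, b2) = tr1, tr2
    parts = [conj_in((a1[v], b1[v]), (a2[v], b2[v])) for v in V1 if v in V2]
    # An empty W gives sum() == 0, i.e. the interval [0, 0].
    return (sum(p[0] for p in parts), min(Fraction(1), sum(p[1] for p in parts)))


def prob_equal_on(t1, t2, attrs):
    """ASSUMPTION: prob(t1[X] = t2[X]) combines the per-attribute intervals with conj_in."""
    start = (Fraction(1), Fraction(1))
    return reduce(conj_in, (prob_equal(t1[A], t2[A]) for A in attrs), start)


def interval_leq(p, q):
    """ASSUMPTION: [L1, U1] <= [L2, U2] iff L1 <= L2 and U1 <= U2."""
    return p[0] <= q[0] and p[1] <= q[1]


def satisfies_pfd(r, X, Y):
    """Check X -> Y for all t1, t2 in r (ordered pairs, t1 = t2 included)."""
    return all(interval_leq(prob_equal_on(t1, t2, X), prob_equal_on(t1, t2, Y))
               for t1, t2 in product(r, repeat=2))


def crisp(v):
    """A certain value: <{v}, alpha, beta> with alpha(v) = beta(v) = 1."""
    return (frozenset({v}), {v: Fraction(1)}, {v: Fraction(1)})


ok = [{"PATIENT_ID": crisp("PT3829"), "PHYSICIAN_ID": crisp("DT093")},
      {"PATIENT_ID": crisp("PT2938"), "PHYSICIAN_ID": crisp("DT102")}]
bad = [{"PATIENT_ID": crisp("PT3829"), "PHYSICIAN_ID": crisp("DT093")},
       {"PATIENT_ID": crisp("PT3829"), "PHYSICIAN_ID": crisp("DT102")}]
print(satisfies_pfd(ok, ["PATIENT_ID"], ["PHYSICIAN_ID"]))   # True
print(satisfies_pfd(bad, ["PATIENT_ID"], ["PHYSICIAN_ID"]))  # False
```

With crisp values the check reduces to the classical functional dependency: `bad` fails because two tuples agree on PATIENT_ID with probability 1 but on PHYSICIAN_ID with probability 0.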
Selection Expressions
• x.A θ v, where x ∈ X, A is an attribute in R, θ is a binary relation from {=, ≠, <, ≤, >, ≥}, and v is a value
• x.A1 =⊗ x.A2, where ⊗ is a probabilistic conjunction strategy for combining the probabilities of x.A1 = v1 and x.A2 = v2 such that v1 = v2
• E1 ⊗ E2, where E1 and E2 are selection expressions
• E1 ⊕ E2, where ⊕ is a probabilistic disjunction strategy
Selection Expressions
Example Relation DIAGNOSE
Selection expression
(x.DURATION 40) (x.COST 60)

PATIENT_ID | PHYSICIAN_ID | DISEASE | DURATION | COST
⟨{PT0421}, u, u⟩ | ⟨{DT005}, u, u⟩ | ⟨{lung cancer, tuberculosis}, 0.8u, 1.2u⟩ | ⟨{400, 500}, u, u⟩ | ⟨{300, 350}, u, u⟩
⟨{PT3829}, u, u⟩ | ⟨{DT093}, u, u⟩ | ⟨{hepatitis, cirrhosis}, u, u⟩ | ⟨{30, 40}, u, u⟩ | ⟨{60, 70}, u, u⟩
⟨{PT2938}, u, u⟩ | ⟨{DT102}, u, u⟩ | ⟨{hepatitis}, u, u⟩ | ⟨{30}, u, u⟩ | ⟨{60}, u, u⟩
Selection Conditions
• (E)[L, U], where E is a selection expression and [L, U] is a probabilistic interval
• ¬φ, (φ ∧ ψ) and (φ ∨ ψ), where φ and ψ are selection conditions
Example
((x.DURATION 40) (x.COST 60))[0.4, 0.6])
Probabilistic Interpretation of Selection Expressions
prob_t(E) is the probabilistic interval with which a tuple t satisfies the selection expression E.
Probabilistic Interpretation of Selection Expressions
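$$\mathrm{prob}_t(x.A \,\theta\, d) = \left[\sum_{v \in W} \alpha(v),\ \min\!\left(1, \sum_{v \in W} \beta(v)\right)\right]$$
where t.A = ⟨V, α, β⟩ and W = {v ∈ V | v θ d}
$$\mathrm{prob}_t(x.A_1 =_{\otimes} x.A_2) = \left[\sum_{v \in W} \alpha(v),\ \min\!\left(1, \sum_{v \in W} \beta(v)\right)\right]$$
where W, α and β are defined as in prob(t1.A = t2.A), applied to t.A1 and t.A2
$$\mathrm{prob}_t(E_1 \otimes E_2) = \mathrm{prob}_t(E_1) \otimes \mathrm{prob}_t(E_2)$$
$$\mathrm{prob}_t(E_1 \oplus E_2) = \mathrm{prob}_t(E_1) \oplus \mathrm{prob}_t(E_2)$$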
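The interpretation rules can be evaluated recursively over an expression tree. In this sketch expressions are nested tuples (a hypothetical encoding), ⊗ and ⊕ are fixed to the independence strategies ⊗in and ⊕in, and `u` is again assumed to be the uniform distribution; the example tuple reuses the DISEASE and COST values of PT3829 from relation DIAGNOSE, and the expression `E` is hypothetical.

```python
import operator
from fractions import Fraction

# Comparison operators theta allowed in x.A theta d.
OPS = {"=": operator.eq, "!=": operator.ne, "<": operator.lt,
       "<=": operator.le, ">": operator.gt, ">=": operator.ge}


def conj_in(a, b):
    return (a[0] * b[0], a[1] * b[1])


def disj_in(a, b):
    return (a[0] + b[0] - a[0] * b[0], a[1] + b[1] - a[1] * b[1])


def bounded_sum(pairs):
    """[sum of alpha, min(1, sum of beta)]; an empty W yields [0, 0]."""
    pairs = list(pairs)
    return (sum(p[0] for p in pairs), min(Fraction(1), sum(p[1] for p in pairs)))


def prob_t(t, E):
    """Interval for tuple t (attribute -> (V, alpha, beta)) to satisfy expression E."""
    kind = E[0]
    if kind == "cmp":                      # x.A theta d
        _, A, op, d = E
        V, alpha, beta = t[A]
        return bounded_sum((alpha[v], beta[v]) for v in V if OPS[op](v, d))
    if kind == "eqattr":                   # x.A1 =⊗ x.A2, with ⊗ = ⊗in here
        _, A1, A2 = E
        (V1, a1, b1), (V2, a2, b2) = t[A1], t[A2]
        return bounded_sum(conj_in((a1[v1], b1[v1]), (a2[v2], b2[v2]))
                           for v1 in V1 for v2 in V2 if v1 == v2)
    if kind == "and":                      # E1 ⊗ E2
        return conj_in(prob_t(t, E[1]), prob_t(t, E[2]))
    if kind == "or":                       # E1 ⊕ E2
        return disj_in(prob_t(t, E[1]), prob_t(t, E[2]))
    raise ValueError(f"unknown expression kind: {kind!r}")


def u(values):
    """<V, u, u>, ASSUMING u is the uniform distribution on V."""
    V = frozenset(values)
    dist = {v: Fraction(1, len(V)) for v in V}
    return (V, dist, dict(dist))


t = {"DISEASE": u({"hepatitis", "cirrhosis"}), "COST": u({60, 70})}   # PT3829
E = ("and", ("cmp", "DISEASE", "=", "hepatitis"), ("cmp", "COST", "<=", 70))
print(prob_t(t, E))  # (Fraction(1, 2), Fraction(1, 2))
```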
Satisfaction of Selection Conditions
prob_t ⊨ (E)[L, U] if and only if prob_t(E) ⊆ [L, U]
prob_t ⊨ ¬φ if and only if prob_t ⊨ φ does not hold
prob_t ⊨ (φ ∧ ψ) if and only if prob_t ⊨ φ and prob_t ⊨ ψ
prob_t ⊨ (φ ∨ ψ) if and only if prob_t ⊨ φ or prob_t ⊨ ψ
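Satisfaction is a plain recursive check once prob_t(E) is available. To keep the sketch focused, the hypothetical function `satisfies` takes a callable that returns prob_t(E) for an expression; the named expressions `E1`, `E2` and their intervals are hypothetical inputs, not values from the deck. Conditions are nested tuples tagged `"sel"`, `"not"`, `"and"` and `"or"`.

```python
from fractions import Fraction
from typing import Callable, Hashable, Tuple

Interval = Tuple[Fraction, Fraction]


def satisfies(prob: Callable[[Hashable], Interval], phi) -> bool:
    """prob |= phi, where prob maps a selection expression E to prob_t(E)."""
    kind = phi[0]
    if kind == "sel":                  # (E)[L, U]: prob_t(E) must lie inside [L, U]
        _, E, L, U = phi
        lo, hi = prob(E)
        return L <= lo and hi <= U
    if kind == "not":                  # not phi
        return not satisfies(prob, phi[1])
    if kind == "and":                  # (phi and psi)
        return satisfies(prob, phi[1]) and satisfies(prob, phi[2])
    if kind == "or":                   # (phi or psi)
        return satisfies(prob, phi[1]) or satisfies(prob, phi[2])
    raise ValueError(f"unknown condition kind: {kind!r}")


# Hypothetical precomputed intervals prob_t(E) for two named expressions of one tuple t.
intervals = {"E1": (Fraction(1, 2), Fraction(1, 2)), "E2": (Fraction(1), Fraction(1))}
c1 = ("sel", "E1", Fraction(2, 5), Fraction(3, 5))   # (E1)[0.4, 0.6]
c2 = ("sel", "E2", Fraction(2, 5), Fraction(3, 5))   # (E2)[0.4, 0.6]
print(satisfies(intervals.__getitem__, c1))                       # True
print(satisfies(intervals.__getitem__, ("and", c1, c2)))          # False
print(satisfies(intervals.__getitem__, ("or", c1, ("not", c2))))  # True
```

Note that (E)[L, U] requires containment of the whole interval: [1, 1] fails [0.4, 0.6] even though it is "more likely" than the bounds.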
Selection Operation
The selection on a relation r with respect to a selection condition φ is
$$\sigma_{\varphi}(r) = \{\, t \in r \mid \mathrm{prob}_{R,r,t} \models \varphi \,\}$$
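Putting the pieces together, σφ(r) filters a relation tuple by tuple. This end-to-end sketch supports only atomic comparisons and ⊗in conjunction, encodes DIAGNOSE with just the PATIENT_ID, DISEASE and COST columns, and assumes `u` is the uniform distribution and each value list is the set V. The functions and the condition `phi` are hypothetical and chosen only for illustration.

```python
import operator
from fractions import Fraction

OPS = {"=": operator.eq, "!=": operator.ne, "<": operator.lt,
       "<=": operator.le, ">": operator.gt, ">=": operator.ge}


def u(values, lo="1", hi="1"):
    """<V, lo*u, hi*u>, ASSUMING u is the uniform distribution on V."""
    V = frozenset(values)
    return (V, {v: Fraction(lo) / len(V) for v in V}, {v: Fraction(hi) / len(V) for v in V})


def prob_t(t, E):
    """Interval for tuple t to satisfy E; only 'cmp' and 'and' (⊗in) are supported."""
    if E[0] == "cmp":
        _, A, op, d = E
        V, alpha, beta = t[A]
        W = [v for v in V if OPS[op](v, d)]
        return (sum(alpha[v] for v in W), min(Fraction(1), sum(beta[v] for v in W)))
    if E[0] == "and":
        (l1, u1), (l2, u2) = prob_t(t, E[1]), prob_t(t, E[2])
        return (l1 * l2, u1 * u2)
    raise ValueError(f"unsupported expression: {E[0]!r}")


def satisfies(t, phi):
    """prob_t |= phi for conditions (E)[L, U], not, and, or."""
    kind = phi[0]
    if kind == "sel":
        _, E, L, U = phi
        lo, hi = prob_t(t, E)
        return L <= lo and hi <= U
    if kind == "not":
        return not satisfies(t, phi[1])
    if kind == "and":
        return satisfies(t, phi[1]) and satisfies(t, phi[2])
    if kind == "or":
        return satisfies(t, phi[1]) or satisfies(t, phi[2])
    raise ValueError(f"unknown condition: {kind!r}")


def select(r, phi):
    """sigma_phi(r) = {t in r | t satisfies phi}."""
    return [t for t in r if satisfies(t, phi)]


DIAGNOSE = [
    {"PATIENT_ID": u({"PT0421"}), "DISEASE": u({"lung cancer", "tuberculosis"}, "0.8", "1.2"),
     "COST": u({300, 350})},
    {"PATIENT_ID": u({"PT3829"}), "DISEASE": u({"hepatitis", "cirrhosis"}), "COST": u({60, 70})},
    {"PATIENT_ID": u({"PT2938"}), "DISEASE": u({"hepatitis"}), "COST": u({60})},
]

phi = ("sel", ("and", ("cmp", "DISEASE", "=", "hepatitis"), ("cmp", "COST", "<=", 70)),
       Fraction(2, 5), Fraction(3, 5))
result = select(DIAGNOSE, phi)
print([next(iter(t["PATIENT_ID"][0])) for t in result])  # ['PT3829']
```

Only PT3829 is selected: its interval is [0.5, 0.5], while PT0421 gets [0, 0] and PT2938 gets [1, 1], both outside [0.4, 0.6].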
Selection Operation
Example Relation DIAGNOSE
Selection operation on DIAGNOSE with the selection condition
(x.DISEASE = hepatitis in x.COST 70)[0.4, 0.6])
is t =
PRDB-SQLite Architecture
PRDB-SQLite Schema
PRDB-SQLite Relation
PRDB-SQLite Query
Conclusions & Future Works
• Conclusions
  - Built a new PRDB model that extends the RDB model
  - Uncertain values in the PRDB model are represented by probabilistic triples
  - The notions of schema, relation, functional dependency, and selection on PRDB are defined
  - Implemented a simple visual management system for PRDB
Conclusions & Future Works
• Future Works
  - Build the remaining relational algebra operations on PRDB
  - Build a complete database management system for PRDB
  - Integrate fuzzy set values into attribute values to build a fuzzy and probabilistic relational data base
Any questions?
March 27-28, 2013, Sea Pearl Villa Resort, Kathu, Phuket, Thailand