Post on 07-May-2015
October 29th, 2010
Contact: Shou Matsumoto (cardialfly@[yahoo|gmail].com)
On Implementing Probabilistic Relational Models
*Project page: http://sourceforge.net/projects/unbbayes/

Content
Purpose
Contextualization
  • E/R
  • PRM
  • Link Uncertainty
A Java implementation
  • UnBBayes-PRM*

Objectives
What is this presentation for?
– Overview of PRM and its underlying concepts
– Overview of extensions of PRM
  • Link uncertainty
– To present a simple implementation of PRM
  • UnBBayes-PRM

Motivations
E/R models are heavily used
– Most commercial databases are based on E/R models
PRM allows E/R with uncertainty
– PRM is compatible with optimizations of BN and E/R
Implementations of PRM are rare

Target
For whom is this presentation intended?
– People interested in PRM
  • E.g. database architects willing to incorporate probabilistic reasoning
  • People looking for a BN extension with the expressiveness of relational calculus
– People looking for a PRM tool
  • E.g. developers looking for a sample implementation
  • Learners willing to exercise PRM
Note: We assume you have basic knowledge of Bayesian Networks.

What is PRM?
BN + E/R = PRM

What is E/R?
E/R = Entity-Relationship
Abstract conceptual representation of data
– Often used in relational database models
  • E.g. Oracle, MySQL, PostgreSQL...
Entities = “nouns”
– A set of elements in a domain
Relationships = “verbs”
– Capture how 2 or more entities are related
Attributes = “characteristics”
Note: Attributes hold the actual data content.

What is E/R?
Constraints
– Cardinality
  • 1-1, 1-many, many-1, many-many
– Primary Key (PK):
  • Minimal set of uniquely identifying attributes
– Foreign Key (FK):
  • Attributes that refer to other attributes (PK)
    – This is used to establish relationships
– Allowed values
– Etc.

What is E/R?
E/R can be represented as a set of tables
– Entities → tables
– Attributes → columns
– Values of attributes → content of a cell
– 1-1 and 1-many (many-1) relationships → FK
– Many-many relationships → table + FK
Problem
– Classic E/R models do not handle uncertainty
Note: UnBBayes-PRM sees E/R as a set of tables.

So, what is PRM?
Probabilistic Relational Models
– Template for a probability distribution over a database (E/R model)
  • Compact graphical probabilistic model
    – Well-defined semantics
  • Natural domain modeling
    – Objects, properties, relations...
  • Attributes can depend on attributes of related entities
  • Generalization over a variety of situations

So, what is PRM?
PRM's learning algorithms
– Capture relationships in Bayesian learning algorithms
  • There's no need to “flatten” the database
PRMs are composed of:
– Relational schema,
– Relational skeleton,
– Probability distribution.
Note: Machine learning is a major concern in PRM.

Schema
Static part
– Entities + Relationships + Attributes
– PK, FK, possible (allowed) values...
[Figure: entity Person with attributes ID (PK), Father (FK to Person), Mother (FK to Person) and BloodType (any of {A, B, AB, O}); Person is related to itself through hasFather and hasMother.]

Skeleton
Dynamic part
– Instantiation of a Schema
– Actual objects
  • Attributes are filled with some values
[Figure: example objects of Person]
ID          Father      Mother   BloodType
Augustine   NULL        NULL     O
George      Augustine   Mary     NULL
Mary        NULL        NULL     A
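
The split between schema (static) and skeleton (dynamic) can be sketched with plain Python dictionaries. This is only an illustration under stated assumptions: the names SCHEMA, SKELETON and check_skeleton are hypothetical and are not part of UnBBayes-PRM, which edits both parts as tables.

```python
# Hypothetical in-memory representation of the Person example (not UnBBayes-PRM code).
SCHEMA = {
    "Person": {
        "pk": "ID",
        "fks": {"Father": "Person", "Mother": "Person"},     # FK column -> referenced entity
        "attributes": {"BloodType": ["A", "B", "AB", "O"]},  # allowed values
    }
}

# Skeleton: the actual objects; None plays the role of NULL.
SKELETON = {
    "Person": [
        {"ID": "Augustine", "Father": None, "Mother": None, "BloodType": "O"},
        {"ID": "George", "Father": "Augustine", "Mother": "Mary", "BloodType": None},
        {"ID": "Mary", "Father": None, "Mother": None, "BloodType": "A"},
    ]
}


def check_skeleton(schema, skeleton):
    """Return a list of constraint violations (empty if the skeleton fits the schema)."""
    errors = []
    for entity, rows in skeleton.items():
        spec = schema[entity]
        keys = [row[spec["pk"]] for row in rows]
        if len(keys) != len(set(keys)):
            errors.append(f"{entity}: duplicate primary key")
        for row in rows:
            # Every non-NULL FK must point to an existing PK of the referenced entity.
            for fk, target in spec["fks"].items():
                target_keys = {r[schema[target]["pk"]] for r in skeleton[target]}
                if row[fk] is not None and row[fk] not in target_keys:
                    errors.append(f"{entity}.{fk}: dangling reference {row[fk]!r}")
            # Every non-NULL attribute must hold one of the allowed values.
            for attr, allowed in spec["attributes"].items():
                if row[attr] is not None and row[attr] not in allowed:
                    errors.append(f"{entity}.{attr}: value {row[attr]!r} not allowed")
    return errors
```

Calling check_skeleton(SCHEMA, SKELETON) on the three objects above reports no violations; a NULL BloodType is simply an attribute whose value is not observed yet.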

PRM's structure
Schema + probabilistic dependencies
Attributes have path expressions describing the parents of that attribute.
– Path expression = slot chain
  • List of FKs
– If the slot chain contains a 1-many relationship, the number of parents is unknown
Conditional Probability Distribution (CPD)
– Conditional Probability Table (CPT)
– Functions + parameters
Note: (Slot chain = empty) := no parents | parents reside in the same table
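
A slot chain can be read as "follow these FKs, one after another, starting from the current object". The sketch below assumes objects stored in a dictionary keyed by ID; OBJECTS and follow_slot_chain are hypothetical names used for illustration only.

```python
# Objects of the Person example, keyed by ID (None plays the role of NULL).
OBJECTS = {
    "Augustine": {"Father": None, "Mother": None},
    "George": {"Father": "Augustine", "Mother": "Mary"},
    "Mary": {"Father": None, "Mother": None},
}


def follow_slot_chain(objects, start_id, slot_chain):
    """Follow a list of FK names from one object and return the IDs that are reached."""
    current = [start_id]
    for fk in slot_chain:
        reached = []
        for obj_id in current:
            target = objects[obj_id].get(fk)
            if target is not None:  # a NULL FK ends the chain for this object
                reached.append(target)
        current = reached
    return current
```

For example, follow_slot_chain(OBJECTS, "George", ["Father"]) reaches Augustine, while an empty slot chain stays on George itself, which corresponds to parents that reside in the same table.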

PRM's structure
CPD of BloodType
Father   A     A     A     ...
Mother   A     B     AB    ...
A        75%   25%   50%   ...
B        0%    25%   25%   ...
AB       0%    25%   25%   ...
O        25%   25%   0%    ...
[Figure: entity Person with PK, two FKs (FK1 and FK2, for Father and Mother) and BloodType. BloodType receives one edge from the BloodType of the object referenced by FK1 and one edge from the BloodType of the object referenced by FK2. Instantiation: the BloodType of “Me” depends on the BloodType of John Doe and of Jane Doe.]
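
A CPT such as the one above can be sketched as a lookup table indexed by the parents' values. The code below encodes only the three columns shown on the slide (the remaining columns are elided there and are left out here too); CPT and p_blood_type are hypothetical names.

```python
BLOOD_TYPES = ["A", "B", "AB", "O"]

# Key: (father's blood type, mother's blood type) -> distribution of the child's blood type.
# Only the columns shown on the slide are included.
CPT = {
    ("A", "A"): {"A": 0.75, "B": 0.00, "AB": 0.00, "O": 0.25},
    ("A", "B"): {"A": 0.25, "B": 0.25, "AB": 0.25, "O": 0.25},
    ("A", "AB"): {"A": 0.50, "B": 0.25, "AB": 0.25, "O": 0.00},
}


def p_blood_type(child, father, mother):
    """P(BloodType = child | Father.BloodType = father, Mother.BloodType = mother)."""
    return CPT[(father, mother)][child]
```

Each column is a probability distribution, so its entries must add up to 100%.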

CPD with aggregation
How do we declare the CPD if the number of parents is unknown?
Approach 1: special-purpose scripts
– E.g. UnBBayes-MEBN's CPD scripts
  • A set of IF-THEN-ELSE statements
Approach 2: aggregation
– E.g. Mode, Max, Min, Average...
  • Equivalent to an intermediate “deterministic” node
Note: UnBBayes-PRM uses approach 2.
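
Aggregation collapses a variable-length list of parent values into a single value, so the CPD can be declared once at schema level over that single value. A minimal sketch, assuming the function name aggregate and a tie-breaking rule for the mode (smallest value wins) that the slides do not specify:

```python
from collections import Counter


def aggregate(values, function):
    """Collapse a variable-length list of parent values into one value."""
    if function == "mode":
        counts = Counter(values)
        best = max(counts.values())
        # Assumption: ties are broken by taking the smallest of the most frequent values.
        return min(v for v, c in counts.items() if c == best)
    if function == "min":
        return min(values)
    if function == "max":
        return max(values)
    raise ValueError(f"unknown aggregation function: {function}")
```

Because the result is a deterministic function of the parents, it behaves like the intermediate “deterministic” node mentioned above: the child depends on the aggregate, and the aggregate depends on however many parents exist.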

Inference
Instantiation of a BN from the skeleton
Descriptive attributes become random variables
Once generated, further inference is done as in a normal BN (evidence propagation)
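
The instantiation step can be sketched as follows: every descriptive attribute of every object becomes a node, every dependency becomes an edge, and known values become evidence. This is a simplified illustration (slot chains are limited to a single FK, and instantiate_bn is a hypothetical name), not the UnBBayes-PRM algorithm.

```python
OBJECTS = {
    "Augustine": {"Father": None, "Mother": None, "BloodType": "O"},
    "George": {"Father": "Augustine", "Mother": "Mary", "BloodType": None},
    "Mary": {"Father": None, "Mother": None, "BloodType": "A"},
}

# Schema-level dependencies: attribute -> list of (parent attribute, FK to follow).
DEPENDENCIES = {"BloodType": [("BloodType", "Father"), ("BloodType", "Mother")]}


def instantiate_bn(objects, dependencies):
    """Build nodes, edges and evidence of the ground BN from the objects."""
    nodes, edges, evidence = [], [], {}
    for obj_id, row in objects.items():
        for attr, parents in dependencies.items():
            node = f"{obj_id}.{attr}"
            nodes.append(node)
            if row[attr] is not None:  # a known value is entered as evidence
                evidence[node] = row[attr]
            for parent_attr, fk in parents:
                target = row[fk]
                if target is not None:  # a NULL FK contributes no parent
                    edges.append((f"{target}.{parent_attr}", node))
    return nodes, edges, evidence
```

With the skeleton above, George.BloodType gets two parents (Augustine.BloodType and Mary.BloodType), both of which are observed; any standard BN algorithm can then propagate that evidence.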

Does the instantiated BN have cycles?
Case 1: check at PRM schema level
– Schema has no cycle → instances have no cycle
Case 2: schema contains cycles, but the instantiated BN does not
[Figure: at schema level, Person.BloodType depends on Person.BloodType through the (Father) and (Mother) slots, which is a cycle; at instance level, the BloodType of George Washington depends on the BloodType of Augustine and of Mary, which is not a cycle.]
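
Both cases can be checked with the same routine applied to two different graphs: the schema-level dependency graph and the instantiated BN. The sketch below uses Kahn's topological-sort algorithm; has_cycle and the edge lists are hypothetical illustrations of the BloodType example.

```python
def has_cycle(edges):
    """Return True if the directed graph given as (source, target) pairs has a cycle."""
    nodes = {n for edge in edges for n in edge}
    indegree = {n: 0 for n in nodes}
    for _, dst in edges:
        indegree[dst] += 1
    ready = [n for n in nodes if indegree[n] == 0]
    removed = 0
    while ready:
        node = ready.pop()
        removed += 1
        for src, dst in edges:
            if src == node:
                indegree[dst] -= 1
                if indegree[dst] == 0:
                    ready.append(dst)
    # Nodes that could never be removed lie on (or behind) a cycle.
    return removed != len(nodes)


# Schema level: BloodType depends on BloodType (via Father and via Mother).
SCHEMA_EDGES = [("Person.BloodType", "Person.BloodType")]

# Instance level: the same dependency, grounded on actual objects.
INSTANCE_EDGES = [
    ("Augustine.BloodType", "George.BloodType"),
    ("Mary.BloodType", "George.BloodType"),
]
```

Here has_cycle(SCHEMA_EDGES) is true while has_cycle(INSTANCE_EDGES) is false, which is exactly Case 2: the schema-level check alone would reject a model whose instances are acyclic.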

Extension: link uncertainty
So far we have only mentioned distributions over attributes of the objects in a model
– Only the values of the attributes were uncertain
Uncertainty over the relational structure of the domain has not been addressed yet
– Structure uncertainty
  • Values of FK are uncertain
    – Slot chains are uncertain
Reference uncertainty & existence uncertainty
Note: Link uncertainty is not implemented in UnBBayes-PRM.

Reference uncertainty
Slots' (FK) values become random variables
– Problem
  • Unknown number of possible values
    – It's difficult to declare a CPD at schema level
– Solution
  • Create partitions based on “other attributes”
    – Assuming that ordinal attributes have a known number of possible values

Reference uncertainty
[Figure, without selector: Entity1 (PK, FKToEntity2) and Entity2 (PK, BooleanAttrib). Possible values of FKToEntity2: PKs of Entity2 (unknown number).]
[Figure, with selector: Entity1 (PK, Selector, FKToEntity2) and Entity2 (PK, BooleanAttrib). Possible values of Selector: 2 (true/false). Selector links to a set (partition) of instances of Entity2, based on the current value of BooleanAttrib. FKToEntity2 links to a single instance of Entity2, based on the current value of PK.]
Note: We can now specify the parents of FKs and their CPDs.
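
The selector idea can be sketched numerically. The selector first picks a partition of Entity2 (here by BooleanAttrib), and the FK then picks one instance inside that partition. The sketch assumes a uniform choice within a partition and uses made-up selector probabilities; fk_distribution and the instance names are hypothetical.

```python
def fk_distribution(instances, partition_attr, selector_probs):
    """Distribution over the FK value, given a distribution over partitions.

    instances: PK -> attribute values of each Entity2 instance.
    selector_probs: partition value -> probability that the selector picks it.
    Assumption: within a partition, every instance is equally likely.
    """
    partitions = {}
    for pk, row in instances.items():
        partitions.setdefault(row[partition_attr], []).append(pk)
    distribution = {}
    for value, members in partitions.items():
        for pk in members:
            distribution[pk] = selector_probs[value] / len(members)
    return distribution


# Hypothetical Entity2 instances and selector distribution.
ENTITY2 = {
    "e1": {"BooleanAttrib": True},
    "e2": {"BooleanAttrib": True},
    "e3": {"BooleanAttrib": False},
}
SELECTOR = {True: 0.8, False: 0.2}
```

The selector has only two possible values no matter how many instances of Entity2 exist, which is what makes its CPD declarable at schema level.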

Reference uncertainty: instantiating BN
Edge types:
– I: within a single object
– II: between objects
– III: from FKs of a slot chain
– IV: from partition attributes to selectors
– V: from selectors to FK
Note: Extracted from Probabilistic Relational Models (Getoor et al., SRL07).

Existence uncertainty
Creation of a Boolean attribute “Exists” in tables
– Technically, entities also contain “Exists”
  • But we assume instances (objects) of entities “do exist” if they were instantiated
    – So, this mechanism is mainly for relationships
– Because “Exists” is not a FK, we can use it as a normal random variable.
  • No major changes to BN instantiation
Note: Objects are related to every possible object, with a probability of 0% ~ 100%.

UnBBayes-PRM
Open-source Java software
– GUI & inference machine
Features
– Edit Schema and Skeleton as tables
– Edit probabilistic dependencies as CPT
– Edit constraints (PK, FK and allowed values)
– Generate BN from Skeleton
– Save/load projects from file
Developed as a plug-in for UnBBayes:
– Alpha version (for internal use)

UnBBayes-PRM
Note: A plugin descriptor is the main and minimal content of a plugin.

UnBBayes-PRM - I/O
/* Table and PK declaration */
CREATE TABLE "Person" (
    "id" VARCHAR2(300) not null,
    "Father" VARCHAR2(300),
    "Mother" VARCHAR2(300),
    "BloodType" VARCHAR2(300)
);
ALTER TABLE "Person" ADD CONSTRAINT PK_Person
    PRIMARY KEY ("id");
/* Possible values */
ALTER TABLE "Person" ADD CONSTRAINT CK_BloodType
    CHECK ( "BloodType" IN ('A', 'B', 'AB', 'O'));
/* Foreign keys (relationships) */
ALTER TABLE "Person" ADD CONSTRAINT FK_Person_Father
    FOREIGN KEY ("Father") REFERENCES "Person" ("id");
ALTER TABLE "Person" ADD CONSTRAINT FK_Person_Mother
    FOREIGN KEY ("Mother") REFERENCES "Person" ("id");
Note: PRM is currently stored as a SQL script. This is a temporary solution.

UnBBayes-PRM - I/O
COMMENT ON COLUMN Person.BloodType IS 'Person.BloodType()[ FK_Person_Father ] , Person.BloodType()[ FK_Person_Mother ] ; { 0.75 0.0 0.0 0.25 0.25 0.25 0.25 0.25 (...) }';
Dependencies are stored as in-table comments
Basic format:
– <listOfParents>;{<listOfProbabilities>}
<listOfParents> := comma-separated list
– <parentClass>.<parentColumn>(<aggregateFunction>){<listOfForeignKeys>}
  • <listOfForeignKeys> represents a slot chain
Note: This is also a temporary solution.
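
A reader for such comments can be sketched with a regular expression. The sketch follows the delimiters of the COMMENT ON COLUMN example above (square brackets around the FK list, whitespace between FK names and between probabilities); these details, and the names PARENT and parse_dependency, are assumptions for illustration and not the actual UnBBayes-PRM parser.

```python
import re

# One parent: <parentClass>.<parentColumn>(<aggregateFunction>)[<listOfForeignKeys>]
PARENT = re.compile(r"\s*(\w+)\.(\w+)\((\w*)\)\[([^\]]*)\]\s*")


def parse_dependency(comment):
    """Split a dependency comment into a list of parents and a flat list of probabilities."""
    parents_part, probs_part = comment.split(";", 1)
    parents = []
    for item in parents_part.split(","):
        match = PARENT.fullmatch(item)
        if match is None:
            raise ValueError(f"malformed parent declaration: {item!r}")
        parent_class, column, function, foreign_keys = match.groups()
        parents.append({
            "class": parent_class,
            "column": column,
            "aggregate": function or None,       # empty parentheses: no aggregation
            "slot_chain": foreign_keys.split(),  # FK names in the order they are followed
        })
    probabilities = [float(p) for p in probs_part.strip().strip("{}").split()]
    return parents, probabilities
```

For instance, parsing the string "Person.BloodType()[ FK_Person_Father ] ; { 0.75 0.0 0.0 0.25 }" yields one parent whose slot chain is the single FK FK_Person_Father, followed by four probabilities.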

UnBBayes-PRM: limitations
No support for link uncertainty
– But existence uncertainty can be “simulated”
Only 1 attribute as PK
Only String types allowed
– Thus, no sequences are allowed
No marginalization
– Cannot delete dependencies
  • We must re-create the attribute or edit the SQL script

UnBBayes-PRM: limitations
2 edges (dependencies) to the same attribute are not allowed
– Even using different slot chains
3 aggregation functions:
– mode, min, max.
No machine learning
No direct access to an actual database (yet)
– Only by means of a SQL script.

UnBBayes-PRM: (possible) future works
Add extension points for plug-ins
Integration with DBMS
– Constraints/rules can be delegated to DBMS
  • Some of the limitations may be automatically fixed
Implement machine learning and link uncertainty
Edit E/R models as diagrams
PRM → MSBN compilation
Note: DBMS = DataBase Management System

UnBBayes-PRM: (possible) future works
Implement Dynamic PRM
– Dynamic BN + E/R
Integration with PROXIMITY¹
– RDN - Relational Dependency Network
  • Generalization of BN + E/R + Relational Markov Network
¹ A Java open-source tool from University of Massachusetts Amherst

Finally
PRM looks practical
– Uncertainty on relational data
  • Immediate applicability in databases
– Advanced DBMS can add advanced features
Machine learning seems to be PRM's major concern
– It was not addressed by this presentation

Finally
PRM cannot specify advanced rules and constraints on conditional probabilities
– Some conditions must be fulfilled “manually”
– Some may be fulfilled by DBMS' features
UnBBayes-PRM provides an editor and inference engine for basic PRM

Project page: http://sourceforge.net/projects/unbbayes/
Questions?