“Ideal Parent” Structure Learning


Gal Elidan
with Iftach Nachman and Nir Friedman

School of Engineering & Computer Science
The Hebrew University, Jerusalem, Israel

Learning Structure

Input: variables and data instances (e.g., a network over S, C, E, D and a matrix of instance values).
Output: a network structure.

Search:
Init: Start with an initial structure.
1. Consider local changes.
2. Score each candidate (e.g., -17.23, -19.19, -23.13).
3. Apply the best modification.

Problems:
- Need to score many candidates.
- Each one requires costly parameter optimization.
⇒ Structure learning is often impractical.

The “Ideal Parent” Approach:
- Approximate the improvement of changes (fast).
- Optimize & score only promising candidates (slow).

Linear Gaussian Networks

[Figure: example network over A, B, C, D, E, with the CPD P(E | C) attached to the edge C → E.]

Each variable is a linear Gaussian function of its parents: P(X | u₁, …, uₖ) = N(Σᵢ θᵢuᵢ, σ²).
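To make the cost concrete: scoring a single candidate family means fitting a linear Gaussian CPD by maximum likelihood. A minimal numpy sketch of that fit (my illustration; the function name is not from the paper):

```python
import numpy as np

def fit_linear_gaussian(x, U):
    """Maximum-likelihood fit of P(X | U) = N(sum_i theta_i * u_i, sigma^2).

    x: (M,) child values over M instances; U: (M, k) parent values.
    Returns the coefficients, noise variance, and log-likelihood.
    """
    theta, *_ = np.linalg.lstsq(U, x, rcond=None)   # least squares = MLE mean
    resid = x - U @ theta
    M = len(x)
    sigma2 = resid @ resid / M                      # MLE noise variance
    loglik = -0.5 * M * (np.log(2 * np.pi * sigma2) + 1)
    return theta, sigma2, loglik
```

Structure search repeats an optimization of this kind for every candidate parent set, which is exactly the cost the ideal parent approximation sidesteps.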

The “Ideal Parent” Idea

Goal: Score only promising candidates.

Step 1: Compute the optimal hypothetical parent. Compare the child profile X with the current prediction Pred(X|U) across the instances; the ideal profile Y is the parent that would make the prediction Pred(X|U,Y) exact.

Step 2: Search for a “similar” parent. Compare the ideal profile against the profiles of the potential parents Z1, Z2, Z3, Z4.

Step 3: Add the most similar parent (here Z2) and optimize its parameters, yielding Predicted(X|U,Z).
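In the linear Gaussian case, Step 1 has a closed form: if the current prediction is Σᵢ θᵢuᵢ, the ideal profile is simply the per-instance residual. A sketch under that reading of the slide:

```python
import numpy as np

def ideal_profile(x, U, theta):
    """Ideal parent profile y: the parent that, added with coefficient 1,
    would make the linear Gaussian prediction exact, i.e.
        x[m] = sum_i theta[i] * U[m, i] + y[m]   for every instance m.
    x: (M,) child values; U: (M, k) current parents; theta: (k,) coefficients.
    """
    return x - U @ theta
```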

Choosing the best parent Z

Our goal: choose the Z that maximizes the score gain — the likelihood of Z, U → X minus the likelihood of U → X.

Theorem: when only the coefficient of the new parent z is optimized (all other parameters held fixed), the likelihood improvement is

C₁(y, z) = ⟨y, z⟩² / (2σ² ⟨z, z⟩)

where y is the ideal profile. We define C₁ as our first similarity measure.

Similarity vs. Score

[Scatter plots: true candidate score vs. C₁ similarity, and vs. C₂ similarity.]

C₂(y, z) = −(M/2) · log(1 − ⟨y, z⟩² / (⟨y, y⟩⟨z, z⟩)) is the improvement when the variance is optimized as well.

C₂ is more accurate: the effect of keeping the variance fixed is large.
C₁ will be useful later.

We now have an efficient approximation for the score.
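Both similarity measures are cheap inner-product computations over the profiles. A sketch of how I read the two formulas (C₁: only z's coefficient optimized; C₂: coefficient and variance optimized; treat the exact constants as my reconstruction):

```python
import numpy as np

def c1(y, z, sigma2):
    """Likelihood gain when only z's coefficient is optimized (variance fixed)."""
    return (y @ z) ** 2 / (2.0 * sigma2 * (z @ z))

def c2(y, z):
    """Likelihood gain when the variance is re-optimized as well; larger and
    more accurate, matching the scatter plots above."""
    M = len(y)
    cos2 = (y @ z) ** 2 / ((y @ y) * (z @ z))   # squared cosine of the angle
    return -0.5 * M * np.log(1.0 - cos2)        # diverges as z aligns with y
```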

Ideal Parent in Search

Structure search involves:
- Add parent: O(N²) candidates
- Replace parent: O(N·E)
- Delete parent: O(E)
- Reverse edge: O(E)

[Figure: local modifications of the network over S, C, E, D, with candidate scores -17.23, -19.19, -23.13.]

The vast majority of evaluations are replaced by the ideal approximation; only K candidates per family are optimized and scored.
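Putting the pieces together, each family's “add parent” moves can be shortlisted by similarity before any optimization runs. A hypothetical sketch of that filtering step (function and variable names are mine; inputs are numpy arrays):

```python
def shortlist_parents(x, U, theta, sigma2, candidates, K=3):
    """Rank candidate parents for one child by C1 similarity to the ideal
    profile and keep only the top K; only these are then fully optimized
    and scored, replacing the vast majority of evaluations."""
    y = x - U @ theta                                   # ideal profile
    sim = {name: (y @ z) ** 2 / (2 * sigma2 * (z @ z))  # C1(y, z)
           for name, z in candidates.items()}
    return sorted(sim, key=sim.get, reverse=True)[:K]
```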

Gene Expression Experiment

4 gene expression datasets, with 44 (Amino), 89 (Metabolism), and 173 (2 × Conditions) variables.

[Plots: test log-likelihood relative to greedy, and speedup, vs. K = 1…5, for Amino, Metabolism, Conditions (AA), and Conditions (Met).]

Only 0.4%–3.6% of the changes are evaluated.
Speedup over greedy: 1.8–2.7×.

Scope

Conditional probability distributions (CPDs) of the form

P(X | u₁, …, uₖ) = N(g(u₁, …, uₖ : θ), σ²)

i.e., a link function g plus white noise.

General requirement: g(U) can be any function that is invertible w.r.t. each uᵢ.

Examples: Linear Gaussian, Chemical Reaction, Sigmoid Gaussian.

Problem: for a nonlinear g there is no simple form for the similarity measures.

Sigmoid Gaussian CPD

[Figure: the sigmoid link g(z) with output levels X = 0.5 and X = 0.85, and the likelihoods P(X = 0.5 | Z) and P(X = 0.85 | Z) as functions of Z — exact vs. a linear approximation around Y = 0.]

Solution: approximate the likelihood linearly around the ideal value. The sensitivity to Z depends on the gradient of g at the specific instance.

[Figure: equi-likelihood potentials for X = 0.5 and X = 0.85 after the gradient correction — rescaling Z by g′, i.e., Z × 0.25 for g = 0.5 and Z × 0.1275 for g = 0.85 — the potentials align.]

After the gradient correction we can use the same similarity measures as in the linear case.
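To see why the gradient matters, compare the exact sigmoid Gaussian likelihood with its linear approximation around a point: where g is flat (X = 0.85, slope 0.1275) the likelihood is much less sensitive to Z than where g is steep (X = 0.5, slope 0.25). A small sketch of the approximation shown in the figure (my illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def exact_loglik(x, z, sigma2=0.01):
    """Exact log-likelihood of a sigmoid Gaussian CPD, P(x | z) = N(g(z), sigma^2)."""
    return -0.5 * np.log(2 * np.pi * sigma2) - (x - sigmoid(z)) ** 2 / (2 * sigma2)

def linear_loglik(x, z, z0, sigma2=0.01):
    """Same likelihood with g linearized around z0.  The slope
    g'(z0) = g(z0) * (1 - g(z0)) is the per-instance gradient the slide's
    correction rescales by (0.25 at g = 0.5, 0.1275 at g = 0.85)."""
    g0 = sigmoid(z0)
    mu = g0 + g0 * (1.0 - g0) * (z - z0)   # first-order Taylor expansion of g
    return -0.5 * np.log(2 * np.pi * sigma2) - (x - mu) ** 2 / (2 * sigma2)
```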

Sigmoid Gene Expression

4 gene expression datasets, with 44 (Amino), 89 (Metabolism), and 173 (2 × Conditions) variables.

[Plots: test log-likelihood relative to greedy, and speedup, vs. K = 0…20, for Amino, Metabolism, Conditions (AA), and Conditions (Met).]

Only 2.2%–6.1% of the moves are evaluated; learning is 18–30 times faster.

Adding New Hidden Variables

Idea: introduce a hidden parent for nodes with similar ideal profiles.

[Figure: ideal profiles Y1…Y5 of X1…X5 over the instances; a hidden parent H is added above the similar nodes X1, X2, X4.]

Scoring a parent: for the linear Gaussian case, the benefit of a hidden parent with profile h for a cluster A is the summed similarity Σ_{i∈A} C₁(yᵢ, h), a lower bound on the score improvement.

Challenge: find the profile h* that maximizes this bound.

This sum is a Rayleigh quotient of the matrix Y·Yᵀ and h, where Y is the matrix whose columns are the ideal profiles yᵢ (scaled by each child's variance); h* is therefore the eigenvector of Y·Yᵀ with the largest eigenvalue, and it must lie in the span of the yᵢ. Setting h = Y·v, finding h* amounts to solving an |A| × |A| eigenvector problem, where |A| = size of the cluster.
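A sketch of that eigenvector reduction (assuming the columns of Y already carry any per-child variance scaling):

```python
import numpy as np

def best_hidden_profile(Y):
    """Profile h* maximizing the Rayleigh quotient h^T (Y Y^T) h / (h^T h).

    Y: (M, n) matrix whose columns are the ideal profiles of the cluster.
    Since h* lies in span(Y), set h = Y v and solve the small n x n
    eigenproblem on Y^T Y instead of the M x M one on Y Y^T.
    """
    vals, vecs = np.linalg.eigh(Y.T @ Y)   # symmetric, eigenvalues ascending
    v = vecs[:, -1]                        # eigenvector of the largest one
    h = Y @ v                              # lift back to profile space
    return h / np.linalg.norm(h)
```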

Finding the best Cluster

[Figure: ideal profiles of X1, X2, X3, X4 over the instances, computed only once.]

Compute pairwise cluster scores from the profiles:
X1, X2 → 12.35
X1, X3 → 14.12
X3, X4 → 3.11

Start from the best pair (X1, X3 → 14.12) and greedily grow it:
X1, X3, X2 → 18.45
X1, X3, X2, X4 → 16.79

Select the cluster with the highest score (X1, X3, X2), add the hidden parent, and continue with the search.
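The table suggests a simple greedy agglomeration over the precomputed ideal profiles. A sketch of the control flow; the actual scoring rule is abstracted behind score_fn, since the slides do not spell it out beyond the example numbers:

```python
def grow_best_cluster(profiles, score_fn):
    """Start from the best-scoring pair, then keep adding the member that
    most improves the score, stopping when no addition helps (as above,
    where X1,X3,X2 scores 18.45 but adding X4 drops it to 16.79).

    profiles: dict name -> ideal profile; score_fn: scores a list of profiles.
    """
    names = list(profiles)
    score = lambda members: score_fn([profiles[m] for m in members])
    pairs = {(a, b): score([a, b])                      # computed only once
             for i, a in enumerate(names) for b in names[i + 1:]}
    cluster = list(max(pairs, key=pairs.get))
    best = max(pairs.values())
    while True:
        rest = [n for n in names if n not in cluster]
        gains = {n: score(cluster + [n]) for n in rest}
        if not gains or max(gains.values()) <= best:
            return cluster, best
        pick = max(gains, key=gains.get)
        cluster, best = cluster + [pick], gains[pick]
```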

Bipartite Network

Instances sampled from a biological expert network with 7 (hidden) parents and 141 (observed) children.

[Plots: train and test log-likelihood vs. number of instances (10–100), for Greedy, Ideal K=2, Ideal K=5, and the Gold network.]

Speedup is roughly ×10; greedy takes over 2.5 days!

Summary
- A new method for significantly speeding up structure learning in continuous variable networks
- Offers a promising time vs. performance tradeoff
- Guided insertion of new hidden variables

Future work
- Improve cluster identification for the non-linear case
- Explore additional distributions and the relation to GLMs
- Combine the ideal parent approach as a plug-in with other search procedures