Running Probabilistic Programs Backwards

Neil Toronto * Jay McCarthy † David Van Horn *

* University of Maryland † Vassar College

ESOP 2015

2015/04/14

Roadmap

• Probabilistic inference, and why it’s hard

• Limitations of current probabilistic programming languages (PPLs)

• Contributions

    Uncomputable, compositional ways to not limit language

    Computable, compositional ways to not limit language

Programming Coin Flips

(let ([x (flip 0.5)]) x)

[Figure: outcome tree for one flip; each of the two branches is labeled 0.5]

(let ([x (flip 0.5)]
      [y (flip 0.5)])
  (cons x y))

[Figure: outcome tree for two flips; every branch is labeled 0.5 and the leaves are pairs ⟨x, y⟩]

(let* ([x (flip 0.5)]
       [y (flip (if (equal? x heads) 0.5 0.3))])
  (cons x y))

[Figure: outcome tree; the first flip branches 0.5 and 0.5, and the second flip branches 0.5 and 0.5 on one side and 0.3 and 0.7 on the other]

[Figure: the same tree with branches removed step by step; the last build keeps only the branch labels 0.5, 0.5, and 0.3]

Stochastic Ray Tracing

sto·cha·stic /stō-ˈkas-tik/ adj. fancy word for "randomized"

ap·er·ture /ˈap-ə(r)-chər/ n. fancy word for "opening"

Simulate projecting rays onto a sensor...

... and collect them to form an image

Programming Stochastic Ray Tracing

• Normally thousands of lines of code

• Bears little resemblance to the physical process

• In DrBayes, it’s simple physics simulation:

(define/drbayes (ray-plane-intersect p0 v n d)
  (let ([denom (- (dot v n))])
    (if (> denom 0)
        (let ([t (/ (+ d (dot p0 n)) denom)])
          (if (> t 0)
              (collision t (vec+ p0 (vec* v t)) n)
              #f))
        #f)))

• Other PPLs really aren’t up to this yet

• The issue is one of theory, not engineering effort
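For readers more at home outside Racket, the same intersection test can be sketched in plain Python. This is a transliteration of `ray-plane-intersect`, not DrBayes code: the tuple-based vectors and the helper names `dot`, `vec_add`, and `vec_scale` are assumptions standing in for the slide's `dot`, `vec+`, and `vec*`, and a `(t, point, normal)` tuple stands in for `collision`.

```python
def dot(a, b):
    """Dot product of two equal-length tuples."""
    return sum(x * y for x, y in zip(a, b))


def vec_add(a, b):
    """Componentwise sum (stand-in for the slide's vec+)."""
    return tuple(x + y for x, y in zip(a, b))


def vec_scale(a, s):
    """Scale a vector by a number (stand-in for the slide's vec*)."""
    return tuple(x * s for x in a)


def ray_plane_intersect(p0, v, n, d):
    """Intersect the ray p0 + t*v (t > 0) with the plane {p : dot(p, n) + d = 0}.

    Returns (t, point, normal) on a hit, or None when the ray points away
    from the plane or the hit lies behind the ray origin.
    """
    denom = -dot(v, n)
    if denom > 0:
        t = (d + dot(p0, n)) / denom
        if t > 0:
            return (t, vec_add(p0, vec_scale(v, t)), n)
    return None
```

A ray starting one unit above the plane z = 0 and pointing straight down hits it at the origin after travelling distance 1; the same ray pointing up misses.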

Simpler Example

• Assume (random) returns a value uniformly in [0, 1]

Density function for value of (random):

[Plot: p(x) against x; both axes are ticked from 0 to 1.25 in steps of 0.25]

Density function for value of (max 0.5 (random)):

[Plot: pₘ(x) against x; both axes are ticked from 0 to 1.25 in steps of 0.25]

pₘ(x) = ???

pₘ(x) doesn't exist
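A quick simulation suggests why no density exists: about half of all runs return exactly 0.5, so a single point carries probability 0.5, whereas integrating any density over a single point gives 0. The sketch below assumes Python's `random.Random.random` as a stand-in for `(random)`; the function names are illustrative.

```python
import random


def sample_max_half(rng):
    """One run of (max 0.5 (random)), with rng.random() standing in for (random)."""
    return max(0.5, rng.random())


def estimate_atom(n=100_000, seed=0):
    """Fraction of n runs whose result is exactly 0.5."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n) if sample_max_half(rng) == 0.5)
    return hits / n
```

Calling `estimate_atom()` should return a value close to 0.5, since every draw at or below 0.5 is mapped to the same output.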

What Can’t Densities Model?

• Results of discontinuous functions (bounded measuring devices)

(let ([temperature (normal 99 1)])
  (min 100 temperature))

• Variable-dimensional things (union types)

(if test? none (just x))

• Infinite-dimensional things (recursion)

• In general: the distributions of program values

Probability Measures

• Like already-integrated densities, but a primitive concept

• Measure of (random) is […], defined by […]

• Measure of (max 0.5 (random)) defined by […]

    This term assigns probability […]

• Need a way to derive measures from code
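Assuming (random) is uniform on $[0, 1]$, the two measures described on this slide can be written out as follows. The names $P$ and $P_m$ and the restriction to intervals $[a, b]$ are notation chosen for this sketch, with $P$ of an empty set or a single point taken to be 0.

```latex
\begin{align*}
P([a, b]) &= b - a, \qquad 0 \le a \le b \le 1 \\
P_m([a, b]) &= P\bigl([a, b] \cap [0.5, 1]\bigr) + 0.5 \cdot \mathbf{1}\{0.5 \in [a, b]\}
\end{align*}
```

The second summand is a point mass: it gives probability 0.5 to the single value 0.5, which is the part a density cannot express. As a check, $P_m([0, 1]) = 0.5 + 0.5 = 1$ and $P_m([0.5, 0.5]) = 0 + 0.5 = 0.5$.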

Probability Measures Via Preimages

• Interpret (max 0.5 (random)) as […], defined […]

• Derive measure of (max 0.5 (random)) as […]

    where […]

• Factored into random and deterministic parts: […]

• In other words, compute measures of expressions by running them backwards
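To make "running backwards" concrete, here is a minimal sketch for this one expression. It assumes the interpretation f(r) = max(0.5, r) for r in [0, 1] with (random) uniform, so the probability of an output interval is the length of its preimage. The function names and the restriction to closed output intervals are assumptions of the sketch, not part of DrBayes.

```python
def preimage_max_half(a, b):
    """Preimage of the output interval [a, b] under f(r) = max(0.5, r), r in [0, 1].

    Returns a closed interval (lo, hi) of inputs, or None when the preimage is empty.
    """
    hi = min(b, 1.0)
    if a > b or hi < 0.5:
        return None               # f never returns anything below 0.5
    # If 0.5 is in [a, b], every r <= 0.5 is included because f maps it to 0.5.
    lo = 0.0 if a <= 0.5 else a
    if lo > hi:
        return None               # e.g. a > 1: no input reaches [a, b]
    return (lo, hi)


def measure_max_half(a, b):
    """Probability that (max 0.5 (random)) lands in [a, b]: length of the preimage."""
    pre = preimage_max_half(a, b)
    return 0.0 if pre is None else pre[1] - pre[0]
```

For instance, `measure_max_half(0.5, 0.5)` is 0.5 because the preimage of the single point 0.5 is the whole interval [0, 0.5], while `measure_max_half(0.0, 0.4)` is 0.0 because no input produces an output below 0.5.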

Crazy Idea is Feasible If...

• Seems like we need:

    Standard interpretation of programs as pure functions from a random source

    Efficient way to compute preimage sets

    Efficient representation of arbitrary sets

    Efficient way to compute areas of preimage sets

    Proof of correctness w.r.t. standard interpretation

• WAT

What About Approximating?

Conservative approximation with rectangles: [figure]

Restricting preimages to rectangular subdomains: [figure sequence]

Sampling: exponential to quadratic (e.g. days to minutes)

Contribution: Making This Crazy Idea Feasible

• Standard interpretation of programs as pure functions from a random source

• Efficient way to compute abstract preimage subsets

• Efficient representation of abstract sets

• Efficient way to sample uniformly in preimage sets

    Efficient domain partition sampling

    Efficient way to determine whether a domain sample is actually in the preimage (just use standard interpretation)

• Proof of correctness w.r.t. standard interpretation
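The last two sub-points can be illustrated with a toy example. The sketch below is not DrBayes's algorithm: it assumes a made-up program x + y over two uniform inputs, a fixed uniform grid in place of adaptive partitioning, and interval arithmetic as the abstract interpretation. It keeps every grid cell whose interval image overlaps the target (an overapproximation of the preimage), samples a kept cell, and then runs the program forwards to check that the point really is in the preimage.

```python
import random


def image_of_sum(cell):
    """Interval image of the rectangle cell under the toy program (x, y) -> x + y."""
    (x0, x1), (y0, y1) = cell
    return (x0 + y0, x1 + y1)


def overlaps(iv, target):
    """True when two closed intervals share at least one point."""
    return iv[0] <= target[1] and target[0] <= iv[1]


def covering_cells(target, k):
    """Cells of a k-by-k grid on the unit square that may contain preimage points."""
    step = 1.0 / k
    cells = []
    for i in range(k):
        for j in range(k):
            cell = ((i * step, (i + 1) * step), (j * step, (j + 1) * step))
            if overlaps(image_of_sum(cell), target):
                cells.append(cell)
    return cells


def sample_preimage(target, k, n, seed=0):
    """Draw n points uniformly from {(x, y) in [0,1]^2 : x + y in target}."""
    rng = random.Random(seed)
    cells = covering_cells(target, k)
    samples = []
    while len(samples) < n:
        # All cells have equal area, so a uniform cell choice is area-weighted.
        (x0, x1), (y0, y1) = rng.choice(cells)
        x, y = rng.uniform(x0, x1), rng.uniform(y0, y1)
        # Standard (forward) interpretation decides actual membership.
        if target[0] <= x + y <= target[1]:
            samples.append((x, y))
    return samples
```

With target (1.8, 2.0) and k = 10, only a handful of the 100 cells survive, so rejection sampling wastes far fewer draws than sampling the whole unit square, where the event has probability 0.02.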

How Many Meanings?

• Start with pure programs, then lift by threading a random store

• Nonrecursive, nonprobabilistic programs: 3 semantic functions

• Add 3 semantic functions for recursion and probabilistic choice

• Full development needs 2 more to transfer theorems from measure theory...

• ... oh, and 1 more to collect information for Monte Carlo integration

Tally: 3+3+2+1 = 9 semantic functions, 11 or 12 rules each

Enter Category Theory

• Moggi (1989): Introduces monads for interpreting effects

• Other kinds of categories: idioms, arrows

• Arrow defined by type constructor […] and these combinators: […]

• Arrows are always function-like
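As a rough illustration of "function-like", the sketch below builds the plain function arrow in Python, where an arrow from X to Y is simply a function from X to Y. The combinator set shown (lifting, composition, pairing, conditional) is an assumption based on the usual presentation of arrows, and all Python names are illustrative.

```python
def arr(f):
    """Lift a pure function into the function arrow (here: the function itself)."""
    return f


def compose(f, g):
    """Sequential composition: run f, then feed its result to g."""
    return lambda x: g(f(x))


def pair(f, g):
    """Run f and g on the same input and return both results."""
    return lambda x: (f(x), g(x))


def ifte(cond, then_arrow, else_arrow):
    """Conditional: choose a branch based on cond applied to the input."""
    return lambda x: then_arrow(x) if cond(x) else else_arrow(x)


# (max 0.5 r) expressed with combinators only, as a function of the random input r.
max_half = ifte(arr(lambda r: r > 0.5), arr(lambda r: r), arr(lambda r: 0.5))
```

The point of such a design is that a program written against these combinators can be reinterpreted by swapping in a different arrow (for example, one that computes preimages) without rewriting the program.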

Reducing Complexity

Function arrow: […] is just […]

Pair Preimages

[Figure sequence; labels not recoverable]
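Reading the title as preimages under a pairing of two functions, the standard identity is that the preimage of a product A × B under x ↦ (g1(x), g2(x)) is the intersection of the preimage of A under g1 and the preimage of B under g2. The sketch below checks this on finite sets; the helper names and the example functions are assumptions made for illustration.

```python
def preimage(f, domain, outputs):
    """All points of domain that f maps into outputs."""
    return {x for x in domain if f(x) in outputs}


def pair(g1, g2):
    """Pairing of two functions on a shared input."""
    return lambda x: (g1(x), g2(x))


def product(A, B):
    """Cartesian product of two finite sets."""
    return {(a, b) for a in A for b in B}


def pair_preimage(g1, g2, domain, A, B):
    """Preimage of A x B under pair(g1, g2), computed componentwise."""
    return preimage(g1, domain, A) & preimage(g2, domain, B)
```

On the domain 0..9 with g1(x) = x mod 2 and g2(x) = x div 5, both routes give {6, 8} for A = {0} and B = {1}.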

Correctness Theorems For Low, Low Prices

• Define […]

• Derive […] and others so that […] distributes; e.g. […]

• Distributive properties make proving this very easy:

Theorem (correctness). For all […], […].

In English: […] computes preimages under […].

• Other correctness proofs are similarly easy: prove 5 distributive properties

• Can add (random) and recursion to all semantics in one shot

Abstraction

Rectangle: An interval or union of intervals, a subset of […], or […] for rectangles […] and […]

• Easy representation; easy intersection, join (which overapproximates union), empty test, etc.

• Define […] (and therefore […]) by replacing sets and set operations with rectangles and rectangle operations

• Recursion is somewhat tricky: it requires fine control over recursion depth or if choices
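A minimal sketch of the interval case shows why these operations are cheap. It covers only single closed intervals and pairs of them, represents the empty set as an interval with lo > hi, and uses class and method names invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi]; any interval with lo > hi denotes the empty set."""
    lo: float
    hi: float

    def is_empty(self):
        return self.lo > self.hi

    def intersect(self, other):
        """Exact intersection of two intervals."""
        return Interval(max(self.lo, other.lo), min(self.hi, other.hi))

    def join(self, other):
        """Smallest interval containing both; may include points in neither."""
        if self.is_empty():
            return other
        if other.is_empty():
            return self
        return Interval(min(self.lo, other.lo), max(self.hi, other.hi))


@dataclass(frozen=True)
class PairRect:
    """Rectangle A x B built from two component rectangles."""
    fst: Interval
    snd: Interval

    def is_empty(self):
        return self.fst.is_empty() or self.snd.is_empty()

    def intersect(self, other):
        return PairRect(self.fst.intersect(other.fst), self.snd.intersect(other.snd))

    def join(self, other):
        if self.is_empty():
            return other
        if other.is_empty():
            return self
        return PairRect(self.fst.join(other.fst), self.snd.join(other.snd))
```

Joining [0, 1] with [2, 3] yields [0, 3], which contains the gap (1, 2) that the true union does not; that is the overapproximation the slide mentions.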

In Theory...

Theorem (sound). […] computes overapproximations of the preimages computed by […].

• Consequence: Sampling in abstract preimages doesn’t leave anything out

Theorem (decreasing). […] never returns preimages larger than the given subdomain.

• Consequence: Refining abstract preimage sets never results in a worse approximation

Theorem (monotone). […] is monotone.

• Consequence: Partitioning and then refining never results in a worse approximation

In Practice...

Theorems prove this always works: [figure sequence]

Program Domain Values

• Program inputs are infinite binary trees: [figure]

• Every expression in a program is assigned a node

• Implemented using lazy trees of random values

• No probability density for domain, but there is a measure
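One way to picture a lazy tree of random values is the sketch below, which creates nodes and draws values only when they are first requested and then remembers them. It is an illustration under assumed names (`LazyRandomTree`, `value`, `left`, `right`), not the DrBayes data structure.

```python
import random


class LazyRandomTree:
    """Infinite binary tree of uniform random values, materialized on demand."""

    def __init__(self, rng):
        self._rng = rng
        self._value = None
        self._left = None
        self._right = None

    @property
    def value(self):
        # Draw the value on first access, then reuse it so the node is stable.
        if self._value is None:
            self._value = self._rng.random()
        return self._value

    @property
    def left(self):
        if self._left is None:
            self._left = LazyRandomTree(self._rng)
        return self._left

    @property
    def right(self):
        if self._right is None:
            self._right = LazyRandomTree(self._rng)
        return self._right
```

A program that assigns each expression a path such as left, right, left reads its random number from the node at that path, and only the finitely many nodes it touches are ever built. Because the nodes share one generator, the value a node receives depends on the order in which nodes are first read; once read, it never changes.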

Example: Stochastic Ray Tracing

[Figure]

Example: Probabilistic Verification

(struct/drbayes float-any ())
(struct/drbayes float (value error))

(define/drbayes (flsqrt x)
  (if (float-any? x)
      x
      (let ([v (float-value x)]
            [e (float-error x)])
        (cond [(negative? v) (float-any)]
              [(zero? v) (float 0 0)]
              [else (float (sqrt v)
                           (+ (- 1 (sqrt (- 1 e)))
                              (* 1/2 epsilon)))]))))

• Idea: sample e where (> (float-error e) threshold)

• Verified flhypot, flsqrt1pm1, flsinh in Racket’s math library, as well as others

Examples: Other Inference Tasks

• Typical Bayesian inference

    Hierarchical models

    Bayesian regression

    Model selection

• Atypical

    Programs that halt with probability < 1, or never halt

    Probabilistic context-free grammars with context-sensitive constraints

Summary

• Probabilistic inference is hard, so PPLs have been popping up

• Interpreting every program requires measure theory

• Defined a semantics that computes preimages

• Measuring abstract preimages or sampling in them carries out inference

• Can do a lot of cool stuff that’s normally inaccessible

https://github.com/ntoronto/drbayes
