Inferring Generative Model Structure with Static Analysis

Paroma Varma, Bryan He, Payal Bajaj, Nishith Khandwala, Imon Banerjee, Daniel Rubin, Christopher Ré

Summary

• Generative Models to Label Training Data: Use generative models to combine sources of weak supervision and assign noisy labels

• Complex Dependencies among Sources: Sources of labels are rarely independent

• Inferring Model Structure: Use static analysis to infer dependencies and encode them in the generative model structure

Heuristic Structure

• Domain Specific Primitives (DSPs): Interpretable characteristics of raw data

• Heuristic Functions (HFs): Programmatic rules that output noisy labels

Static Analysis

• Shared Input: Sharing primitives as inputs leads to explicit dependencies

• Compositions: Primitives composed of others can lead to implicit dependencies
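The two kinds of dependency above can be illustrated with a minimal sketch that parses heuristic source code using Python's standard `ast` module. This is not the authors' implementation: the names `hf_1`, `hf_2`, `hf_3`, `HF_SOURCE`, `COMPOSED_OF`, and the helper functions are hypothetical, and the composition table (area built from width and height) is an assumption made for illustration.

```python
import ast
from itertools import combinations

# Hypothetical HF source, adapted from the poster's examples. P is assumed
# to be an object whose attributes are the domain specific primitives.
HF_SOURCE = '''
def hf_1(P):
    return P.y["person"] >= P.y["bike"]

def hf_2(P):
    return P.h["person"] >= P.h["bike"]

def hf_3(P):
    return P.area["person"] <= 2 * P.area["bike"]
'''

# Assumed composition table: P.area is taken to be built from P.w and P.h.
COMPOSED_OF = {"area": {"w", "h"}}


def primitives_read(func):
    """Return the attribute names read from the function's first argument."""
    arg = func.args.args[0].arg
    return {
        node.attr
        for node in ast.walk(func)
        if isinstance(node, ast.Attribute)
        and isinstance(node.value, ast.Name)
        and node.value.id == arg
    }


def expand(prims):
    """Add the base primitives that each composed primitive is built from."""
    out = set(prims)
    for p in prims:
        out |= COMPOSED_OF.get(p, set())
    return out


def infer_dependencies(source):
    """Classify HF pairs as explicitly or implicitly dependent."""
    funcs = [n for n in ast.parse(source).body if isinstance(n, ast.FunctionDef)]
    reads = {f.name: primitives_read(f) for f in funcs}
    explicit, implicit = [], []
    for a, b in combinations(sorted(reads), 2):
        if reads[a] & reads[b]:
            explicit.append((a, b))      # same primitive used as input
        elif expand(reads[a]) & expand(reads[b]):
            implicit.append((a, b))      # linked only through a composition
    return reads, explicit, implicit


reads, explicit, implicit = infer_dependencies(HF_SOURCE)
print(reads, explicit, implicit)
```

Under the assumed composition table, no pair shares an input directly, while `hf_2` and `hf_3` are flagged as implicitly dependent because `P.area` is composed of `P.h`. No labeled or unlabeled data is touched at any point.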

Statistical Modeling

• HF Dependency: Represents the dependencies found using static analysis

• DSP Similarity: Represents the learned correlations among the DSPs

Theoretical Results

• Learning Dependencies: Learning k-degree dependencies among $n$ heuristics requires $O(n^{k+1} \log n)$ samples

• Inferring Dependencies: Analyzing the code can infer the dependencies among heuristics without data. Given dependencies, learning heuristic accuracies requires $O(n \log n)$ samples
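To give a feel for the gap between the two bounds, the short sketch below compares their growth. It assumes the bounds read as n^(k+1) log n for learning dependencies and n log n once dependencies are given, and it ignores all constant factors, so only the ratio is meaningful. The function names are hypothetical.

```python
import math


def samples_learning_structure(n, k):
    """Growth term n^(k+1) * log(n), constants ignored (assumed reading)."""
    return n ** (k + 1) * math.log(n)


def samples_given_structure(n):
    """Growth term n * log(n), constants ignored."""
    return n * math.log(n)


# Ratio of the two growth terms for degree-2 dependencies.
for n in (10, 50, 100):
    ratio = samples_learning_structure(n, 2) / samples_given_structure(n)
    print(n, round(ratio))
```

The ratio works out to n^k, so under these assumptions the saving from inferring dependencies rather than learning them grows polynomially with the number of heuristics.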

Experimental Results

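[Figure: generative model over the DSPs p1, p2, p3, the HFs λ1, λ2, λ3, and the true label Y, with Accuracy, HF Dependency, and DSP Similarity factors.]

[Figure: dependency graph linking the primitives P.w, P.h, P.y to the HFs λ1, λ2, λ3. Legend: DSP, HF; Shared Input, Explicit Dep.; Composition, Implicit Dep.]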

def λ_1(P):
    # Reads only the primitive P.y
    if P.y['person'] >= P.y['bike']:
        return True
    else:
        return False

def λ_2(P):
    # Reads only the primitive P.h
    if P.h['person'] >= P.h['bike']:
        return True
    else:
        return False

def λ_3(P):
    # Reads only the primitive P.area
    if P.area['person'] <= 2*P.area['bike']:
        return True
    else:
        return False
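As a usage illustration, the sketch below applies three heuristics of this shape to one set of primitives. The ASCII names `hf_1` to `hf_3` stand in for λ_1 to λ_3, the primitive values are made up, and computing `P.area` as width times height is an assumption about how that primitive is composed.

```python
from types import SimpleNamespace

# Hypothetical primitive values for a single image (numbers are made up).
P = SimpleNamespace(
    y={"person": 120, "bike": 150},
    h={"person": 80, "bike": 60},
    w={"person": 30, "bike": 70},
)
# Assumed composition: area is derived from width and height.
P.area = {name: P.w[name] * P.h[name] for name in P.w}


def hf_1(P):
    return P.y["person"] >= P.y["bike"]


def hf_2(P):
    return P.h["person"] >= P.h["bike"]


def hf_3(P):
    return P.area["person"] <= 2 * P.area["bike"]


# Each heuristic casts one noisy vote for this data point.
labels = [hf(P) for hf in (hf_1, hf_2, hf_3)]
print(labels)
```

The heuristics disagree on this example, which is the situation the generative model is meant to resolve when it combines their votes.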

• Inferring dependencies outperforms learning dependencies

• Outperforms fully supervised model with additional noisy training labels

* F1 score; all other results are accuracy (%)