
Post on 06-Oct-2020


Inferring Generative Model Structure with Static Analysis

Paroma Varma, Bryan He, Payal Bajaj, Nishith Khandwala, Imon Banerjee, Daniel Rubin, Christopher Ré

Summary

• Generative Models to Label Training Data: Use generative models to combine sources of weak supervision and assign noisy labels.

• Complex Dependencies among Sources: Sources of labels are rarely independent.

• Inferring Model Structure: Use static analysis to infer dependencies and encode them in the generative model structure.

Heuristic Structure

• Domain Specific Primitives (DSPs): Interpretable characteristics of raw data.

• Heuristic Functions (HFs): Programmatic rules that output noisy labels.

Static Analysis

• Shared Input: Sharing primitives as inputs leads to explicit dependencies.

• Compositions: Primitives composed of other primitives can lead to implicit dependencies.
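The two kinds of dependency above can be illustrated with a minimal sketch that uses Python's standard `ast` module to read which primitives each heuristic function takes as input. This is not the authors' implementation. The function names `hf_1` to `hf_4`, the parameter naming convention (one parameter per primitive), and the `COMPOSITIONS` table stating that `P_area` is built from `P_w` and `P_h` are all assumptions made for illustration.

```python
import ast

# Hypothetical HF source code; each parameter name stands for one primitive.
HF_SOURCE = '''
def hf_1(P_y):
    return P_y["person"] >= P_y["bike"]

def hf_2(P_h):
    return P_h["person"] >= P_h["bike"]

def hf_3(P_area):
    return P_area["person"] <= 2 * P_area["bike"]

def hf_4(P_y):
    return P_y["person"] < P_y["bike"]
'''

# Assumption for illustration: P_area is composed of P_w and P_h.
COMPOSITIONS = {"P_area": {"P_w", "P_h"}}


def hf_inputs(source):
    """Map each function name to the set of parameter names it declares."""
    tree = ast.parse(source)
    return {
        node.name: {arg.arg for arg in node.args.args}
        for node in ast.walk(tree)
        if isinstance(node, ast.FunctionDef)
    }


def base_primitives(name, compositions):
    """Expand a primitive into the base primitives it is composed of."""
    if name not in compositions:
        return {name}
    expanded = set()
    for part in compositions[name]:
        expanded |= base_primitives(part, compositions)
    return expanded


def infer_dependencies(source, compositions):
    """Return sorted (hf_a, hf_b, kind) triples without looking at any data."""
    inputs = hf_inputs(source)
    names = sorted(inputs)
    deps = []
    for i, a in enumerate(names):
        base_a = set().union(*(base_primitives(p, compositions) for p in inputs[a]))
        for b in names[i + 1:]:
            base_b = set().union(*(base_primitives(p, compositions) for p in inputs[b]))
            if inputs[a] & inputs[b]:
                deps.append((a, b, "explicit"))   # same primitive passed to both
            elif base_a & base_b:
                deps.append((a, b, "implicit"))   # overlap only through a composition
    return deps


dependencies = infer_dependencies(HF_SOURCE, COMPOSITIONS)
print(dependencies)
```

Here `hf_1` and `hf_4` share `P_y` directly, while `hf_2` and `hf_3` overlap only because `P_area` is assumed to contain `P_h`.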

Statistical Modeling

• HF Dependency: Represents the dependencies found using static analysis.

• DSP Similarity: Represents the learned correlations among the DSPs.
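As a rough sketch of how these factor types might be enumerated once the inputs of each heuristic function are known, the snippet below lists one accuracy factor and one HF dependency factor per heuristic, plus one DSP similarity factor per pair of primitives. The function `build_factors`, the tuple layout, and the example input map are hypothetical. The poster does not specify the exact factor definitions, so this only shows the bookkeeping, not the model itself.

```python
from itertools import combinations


def build_factors(hf_inputs):
    """List factor specifications from a map of HF name -> primitives it reads.

    Assumed layout (illustrative only):
      ("accuracy", (hf, "Y"))             links an HF to the true label Y
      ("hf_dependency", (hf, *inputs))    links an HF to its input primitives
      ("dsp_similarity", (p, q))          links a pair of primitives
    """
    factors = []
    primitives = sorted({p for inputs in hf_inputs.values() for p in inputs})
    for hf in sorted(hf_inputs):
        factors.append(("accuracy", (hf, "Y")))
        factors.append(("hf_dependency", (hf, *sorted(hf_inputs[hf]))))
    for p, q in combinations(primitives, 2):
        factors.append(("dsp_similarity", (p, q)))
    return factors


# Hypothetical input map, e.g. produced by analyzing the HF source code.
example_inputs = {"hf_1": ["P_y"], "hf_2": ["P_h"], "hf_3": ["P_w", "P_h"]}
factors = build_factors(example_inputs)
for factor in factors:
    print(factor)
```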

Theoretical Results

• Learning Dependencies: Learning k-degree dependencies among n heuristics requires a number of samples that is polynomial in n, with an exponent that grows with k, times a log n factor.

• Inferring Dependencies: Analyzing the code can infer the dependencies among heuristics without data. Given dependencies, learning heuristic accuracies requires O(n log n) samples.
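One informal way to see why learning dependencies from data gets expensive is to count how many candidate k-degree dependencies exist among n heuristics, which is the binomial coefficient C(n, k). The sketch below compares that count with n log n. It illustrates the size of the search space only and is not the sample-complexity bound itself. The helper names are hypothetical.

```python
from math import comb, log


def candidate_dependencies(n, k):
    """Number of distinct groups of k heuristics among n (binomial coefficient)."""
    return comb(n, k)


def quasilinear(n):
    """n * ln(n), the growth rate quoted for learning accuracies given dependencies."""
    return n * log(n)


for n in (10, 100, 1000):
    print(
        f"n={n:5d}  pairs={candidate_dependencies(n, 2):8d}  "
        f"triples={candidate_dependencies(n, 3):10d}  n*log(n)={quasilinear(n):10.0f}"
    )
```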

Experimental Results

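[Figure: DSPs (p1, p2, p3; for example P.w, P.h, P.y) feeding HFs (λ1, λ2, λ3). Shared inputs give explicit dependencies; compositions give implicit dependencies. In the factor graph, the true label Y, the HFs, and the DSPs are connected by 𝜙 factors of three types: Accuracy, HF Dependency, and DSP Similarity.]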

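# Heuristic functions (HFs). Each primitive P.x is passed in as P_x,
# since "P.y" is not a valid Python parameter name.

def λ_1(P_y):
    if P_y['person'] >= P_y['bike']:
        return True
    else:
        return False

def λ_2(P_h):
    if P_h['person'] >= P_h['bike']:
        return True
    else:
        return False

def λ_3(P_area):
    if P_area['person'] <= 2 * P_area['bike']:
        return True
    else:
        return False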
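To make the role of the heuristic functions concrete, here is a small usage sketch that applies three such functions to one example and combines their noisy votes. The primitive values are invented, the names `hf_1` to `hf_3` are hypothetical, and the simple majority vote is only a stand-in: the poster combines heuristics with a generative model, which is not reproduced here.

```python
def hf_1(P_y):
    return P_y["person"] >= P_y["bike"]


def hf_2(P_h):
    return P_h["person"] >= P_h["bike"]


def hf_3(P_area):
    return P_area["person"] <= 2 * P_area["bike"]


def majority_vote(votes):
    """Stand-in combiner: True when strictly more than half of the votes are True."""
    return sum(votes) * 2 > len(votes)


# Hypothetical primitive values for a single image.
primitives = {
    "P_y": {"person": 40, "bike": 35},
    "P_h": {"person": 120, "bike": 80},
    "P_area": {"person": 6000, "bike": 4000},
}

votes = [
    hf_1(primitives["P_y"]),
    hf_2(primitives["P_h"]),
    hf_3(primitives["P_area"]),
]
noisy_label = majority_vote(votes)
print(votes, noisy_label)
```

A majority vote treats every heuristic as equally accurate and independent, which is exactly the assumption the dependency structure is meant to relax.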

• Inferring dependencies outperforms learning dependencies

• With additional noisy training labels, the approach outperforms a fully supervised model

* Reported as F1 score; all other results are accuracy (%)