Adversarially Regularising Neural NLI Models to Integrate Logical Background Knowledge
Pasquale Minervini, Sebastian Riedel
Presented by: Tiantian Feng
Contributions
The authors explore the use of adversarial examples to:
1. Identify cases where models violate existing background knowledge, expressed in the form of logic rules.
2. Train models that are robust to such violations.
What is NLI (Natural Language Inference)?
In NLI, an input consists of two sentences, a premise p and a hypothesis h, with three possible relationships:
1. Entailment – h is definitely true given p (p entails h)
2. Contradiction – h is definitely not true given p (p contradicts h)
3. Neutral – h might be true given p
The NLI model is asked to classify the relationship between p and h.
An example of NLI
Figure from: Samuel R. Bowman, Gabor Angeli, Christopher Potts, and Christopher D. Manning, "A large annotated corpus for learning natural language inference", in Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP).
NLI Model - Background
Input sentences a = (a_1, …, a_{ℓ_a}) and b = (b_1, …, b_{ℓ_b}), with word embeddings of size k, are encoded as fixed-length vectors:
  h_a = encode_θ(a),  h_b = encode_θ(b)
The conditional probability distribution over the three classes is computed with a softmax over a model-dependent score function score_θ with parameters θ:
  p_θ(y | a, b) = softmax(score_θ(h_a, h_b))_y,  y ∈ {entailment, contradiction, neutral}
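The encode-then-classify pipeline above can be sketched in a few lines. This is a minimal illustration only: the mean-pool encoder and the per-class bilinear score are hypothetical stand-ins for the paper's actual cBiLSTM/DAM/ESIM architectures.

```python
import numpy as np

def encode(embeddings):
    """Toy sentence encoder: mean-pool the word embeddings
    (a stand-in for the BiLSTM-style encoders used in the paper)."""
    return np.mean(embeddings, axis=0)

def softmax(z):
    z = z - np.max(z)           # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def predict_proba(a_emb, b_emb, W):
    """Score each of the 3 classes with a per-class bilinear form
    (a hypothetical parameterisation), then normalise with a softmax.
    W has shape (3, k, k): one score matrix per class."""
    h_a, h_b = encode(a_emb), encode(b_emb)
    scores = np.array([h_a @ W[c] @ h_b for c in range(3)])
    return softmax(scores)      # distribution over {ent, con, neu}
```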
NLI Model - Background
State-of-the-art models:
1. cBiLSTM (Rocktäschel et al., 2016)
2. Decomposable Attention Model (DAM) (Parikh et al., 2016)
3. Enhanced LSTM model (ESIM) (Chen et al., 2017)
Evaluation datasets:
1. Stanford Natural Language Inference (SNLI) (Bowman et al., 2015)
2. MultiNLI (Williams et al., 2017)
Training uses the cross-entropy loss over the labelled pairs:
  J(θ) = − Σ_{(a, b, y)} log p_θ(y | a, b)
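The per-example cross-entropy term is just the negative log-likelihood of the gold label under the predicted distribution; a minimal sketch:

```python
import numpy as np

def cross_entropy(probs, y):
    """Cross-entropy (negative log-likelihood) of the gold label y
    under the predicted class distribution `probs`."""
    return -np.log(probs[y])
```

For example, a model that assigns probability 0.5 to the correct class incurs a loss of log 2 ≈ 0.693.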
Background Knowledge
In this paper, background knowledge is defined as a set of First-Order Logic (FOL) rules, each written in body ⇒ head form: if the body holds, the head must hold. For example, Rule 2 states that contradiction is symmetric: con(X1, X2) ⇒ con(X2, X1).
Background Knowledge Violation
Violation example for the First-Order Logic rule con(X1, X2) ⇒ con(X2, X1) (Rule 2):
Consider two sentences s1 and s2. A violation occurs when, according to the NLI model,
1. Sentence s1 contradicts s2, but
2. Sentence s2 does not contradict s1.
Background Knowledge - Inconsistency Loss
To measure the degree of violation of Rule 2, we define the inconsistency loss as:
  J_I(S) = [ p_θ(con | s1, s2) − p_θ(con | s2, s1) ]_+
Here S = {X1/s1, X2/s2} is a substitution set that maps the variables X1 and X2 in Rule 2 to the sentences s1 and s2, p_θ(con | s1, s2) is the conditional probability that s1 contradicts s2, and [x]_+ = max(0, x).
Background Knowledge - Inconsistency Loss
We can generalise the inconsistency loss to any body ⇒ head rule:
  J_I(S) = [ p_θ(body{S}) − p_θ(head{S}) ]_+
To compute the probability of a body with multiple conjoined atoms (as in Rule 5), the authors apply the Gödel t-norm (Gupta and Qi, 1991), under which the truth of a conjunction is the minimum of its conjuncts:
  p(a1 ∧ a2) = min{ p(a1), p(a2) }
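The generalised inconsistency loss with the Gödel t-norm can be sketched directly from the two definitions above (function names are illustrative):

```python
def godel_and(probs):
    """Gödel t-norm: the truth value of a conjunction is the minimum
    of the truth values of its conjuncts."""
    return min(probs)

def inconsistency_loss(p_body_atoms, p_head):
    """[p(body) - p(head)]_+ : positive exactly when the model believes
    the body of a rule more strongly than its head, i.e. the rule is
    violated; zero when the rule is satisfied."""
    return max(0.0, godel_and(p_body_atoms) - p_head)
```

For Rule 2, the body is the single atom con(s1, s2), so the call would be `inconsistency_loss([p_con_12], p_con_21)`.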
Generating Adversarial Examples - Constraining
Adversarial examples based only on the inconsistency loss can lead NLI models to violate the available background knowledge, but the generated sentences may not be well-formed or meaningful.
Solution: constrain the perplexity of the generated sentences using a language model.
Generating Adversarial Examples - Summary
The search for adversarial examples can be formalised as the following optimisation problem: maximise the inconsistency loss J_I(S) over substitution sets S, subject to the language-model perplexity of every sentence in S staying low.
The goals of the optimisation problem:
1. Maximise the inconsistency loss described in Eq. (4).
2. Compose sentences with a low perplexity.
Generating Adversarial Examples With Low Perplexity
Generating a low-perplexity sentence set S:
1. Sample sentences close to the data manifold (i.e., with a low perplexity).
2. Make small variations to the sentences:
   a. Change one word in one of the input sentences.
   b. Remove one parse subtree from one of the input sentences.
   c. Insert one parse subtree from one sentence into the parse tree of another sentence.
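The first perturbation (change one word) combined with language-model filtering can be sketched as follows. This is a simplified illustration: `lm_logprob` is a placeholder for a real language-model scorer, and the vocabulary-sampling strategy is an assumption, not the paper's exact procedure.

```python
import random

def change_one_word(sentence, vocabulary, rng):
    """Perturbation (a): replace one randomly chosen word
    with a random vocabulary word."""
    words = sentence.split()
    i = rng.randrange(len(words))
    words[i] = rng.choice(vocabulary)
    return " ".join(words)

def low_perplexity_candidates(sentence, vocabulary, n, lm_logprob, rng=None):
    """Propose n one-word edits and sort them by language-model
    log-probability, so the lowest-perplexity (most fluent)
    candidates come first."""
    rng = rng or random.Random()
    candidates = [change_one_word(sentence, vocabulary, rng) for _ in range(n)]
    return sorted(candidates, key=lm_logprob, reverse=True)
```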
Adversarial Regularisation
Instead of minimising the data loss alone, we use the adversarial examples to regularise the training process:
  minimise_θ  J_data(θ) + λ · max_S J_I(S; θ)
λ specifies the trade-off between the data loss and the inconsistency loss, measured on the substitution set S; the perplexity constraint on S ensures the generated sentences remain fluent.
Adversarial Regularisation
The paper solves this optimisation problem using mini-batch gradient descent.
Experiment - Background Knowledge Violations
The results show violations on the SNLI dataset produced by the originally published cBiLSTM, DAM, and ESIM models.
Observations:
The model tends to detect entailment relationships between longer (i.e., possibly more specific) and shorter (i.e., possibly more general) sentences.
Experiment - Adversarial Regularisation
The authors did not specify which λ produces the accuracy figures in this table.
Experiment - Adversarial Regularisation
Experiment - Generating Adversarial Examples
To further validate the robustness of adversarial regularisation, the authors crafted a series of datasets for evaluation.
Each generated dataset is identified by m, the model used to select the sentences, and k, the number of samples in the generated dataset; it is evaluated alongside the original dataset.
Experiment - Generating Adversarial Examples, Continued
For each sentence pair (s1, s2) in the original dataset, we consider the substitution sets mapping the rule variables to s1 and s2, and compute the summed inconsistency loss over all rules.
1. We rank the sentence pairs by their summed inconsistency loss and select the top k instances with the highest loss.
2. For each selected pair (s1, s2), we create the instances (s1, s2) and (s2, s1).
3. We add both (s1, s2) and (s2, s1) to the dataset, where the label of (s1, s2) is known and the label of (s2, s1) is annotated by human annotators.
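Step 1 of this construction is a simple top-k selection by summed inconsistency loss; a minimal sketch (the `summed_loss` callable stands in for the per-pair loss computed from the trained model):

```python
def top_k_by_inconsistency(pairs, summed_loss, k):
    """Rank sentence pairs by their summed inconsistency loss over all
    rules and keep the k highest-scoring ones, i.e. the pairs on which
    the model violates the background knowledge most strongly."""
    return sorted(pairs, key=summed_loss, reverse=True)[:k]
```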
Experiment - Generating Adversarial Examples, Continue
Conclusions
1. Results show that the proposed method consistently yields significant increases in predictive accuracy on adversarially-crafted datasets – up to a 79.6% relative improvement.
2. It drastically reduces the number of background knowledge violations.
3. Adversarial examples transfer across model architectures, and the proposed adversarial training procedure produces generally more robust models.
References
1. Tim Rocktäschel, Edward Grefenstette, Karl Moritz Hermann, Tomas Kocisky, and Phil Blunsom. 2016. Reasoning about entailment with neural attention. In International Conference on Learning Representations (ICLR).
2. Ankur P. Parikh, Oscar Täckström, Dipanjan Das, and Jakob Uszkoreit. 2016. A decomposable attention model for natural language inference. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 2249–2255.
3. Qian Chen, Xiaodan Zhu, Zhen-Hua Ling, Si Wei, Hui Jiang, and Diana Inkpen. 2017. Enhanced LSTM for natural language
inference. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017, pages
1657–1668. Association for Computational Linguistics.
4. Samuel R. Bowman, Gabor Angeli, Christopher Potts, and Christopher D. Manning. 2015. A large annotated corpus for
learning natural language inference. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language
Processing, EMNLP 2015, pages 632–642. The Association for Computational Linguistics.
5. Adina Williams, Nikita Nangia, and Samuel R. Bowman. 2017. A broad-coverage challenge corpus for sentence
understanding through inference. CoRR, abs/1704.05426.
6. M. M. Gupta and J. Qi. 1991. Theory of t-norms and fuzzy inference methods. Fuzzy Sets Syst., 40(3):431–450.
End
Thank you!