Slav Petrov on behalf of the Language Team @ Google Research Natural Language Representations and Challenges
Page 1: Representations Natural Language and Challengeslxmls.it.pt/2019/Natural_Language_Representations... · Natural Language Representations ... Natural Questions - Kwiatkowski et al.,

Slav Petrov on behalf of the Language Team @ Google Research

Natural Language Representations and Challenges

Page 2:

Outline


Page 3:

State-of-the-art in Natural Language Understanding in 2017

→ → Custom (Recurrent) Architectures

Page 4:

Oct. 2018: One Model with Task-specific Tuning in Minutes

Page 5:

Question Answering (SQuAD 1.1)

Page 6:

Pre-Training in NLP

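[Figure: static word embeddings. king → [-0.5, -0.9, 1.4, …]; queen → [-0.6, -0.8, -0.2, …]. Inner products are shown for the contexts "king wore a crown" and "queen wore a crown". A third vector, [0.3, 0.2, -0.8, …], appears with the example below.]

open a bank account on the river bank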
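As a rough illustration of what this slide is pointing at, the sketch below looks words up in a fixed embedding table and compares them with an inner product. The vectors are cut to the three components visible on the slide, pairing the third vector with "bank" is an assumption, and the function names are hypothetical. The point it shows is that a context-free lookup returns the identical vector for both occurrences of "bank".

```python
# Toy static embedding table. Values are truncated to three components;
# assigning the third vector to "bank" is an assumption for illustration.
EMBEDDINGS = {
    "king": [-0.5, -0.9, 1.4],
    "queen": [-0.6, -0.8, -0.2],
    "bank": [0.3, 0.2, -0.8],
}
UNKNOWN = [0.0, 0.0, 0.0]  # placeholder for words outside the toy table


def inner_product(u, v):
    """Sum of element-wise products of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def embed(tokens):
    """Context-free lookup: each word maps to one fixed vector."""
    return [EMBEDDINGS.get(tok, UNKNOWN) for tok in tokens]


tokens = "open a bank account on the river bank".split()
vectors = embed(tokens)
bank_positions = [i for i, tok in enumerate(tokens) if tok == "bank"]

# Both senses of "bank" receive exactly the same representation.
same_vector = vectors[bank_positions[0]] == vectors[bank_positions[1]]
king_queen = inner_product(EMBEDDINGS["king"], EMBEDDINGS["queen"])
```

Contextual models, covered on the following pages, are meant to give the two occurrences different vectors.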

Page 7:

History of Contextual Representations

Train LSTM Language Model

[Figure: a chain of LSTM cells reads "<s> open a" and predicts "open a bank".]

Fine-tune on Classification Task

[Figure: a chain of LSTM cells reads "... very funny movie" and outputs the label POSITIVE.]

Page 8:

History of Contextual Representations

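Train Separate Left-to-Right and Right-to-Left LMs

[Figure: a left-to-right LSTM LM reads "<s> open a" and predicts "open a bank"; a right-to-left LSTM LM reads "open a bank" and predicts "<s> open a".]

Apply as “Pre-trained Embeddings”

[Figure: the LM outputs feed into an Existing Model Architecture.]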

Page 9:

History of Contextual Representations

Train Deep (12-layer) Transformer LM

[Figure: stacked Transformer blocks read "<s> open a" and predict "open a bank".]

Fine-tune on Classification Task

[Figure: the same Transformer stack reads "<s> open a" and outputs the label POSITIVE.]

Page 10:

Unidirectional vs. Bidirectional Models

[Figure: two Layer 2 diagrams over the inputs "<s> open a" with outputs "open a bank", one per context type.]

Unidirectional context: Build representation incrementally

Bidirectional context: Words can “see themselves”
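One way to make the two context types concrete is as a visibility mask over positions. The following is a minimal sketch with a hypothetical helper name, not the deck's own code: in the unidirectional case position i may only read positions up to i, while in the bidirectional case every position reads every other, which is how information about a word can flow back to itself through its neighbours in deeper layers.

```python
def visibility_mask(n, bidirectional):
    """mask[i][j] is True when position i may read position j."""
    return [[bidirectional or j <= i for j in range(n)] for i in range(n)]


tokens = ["<s>", "open", "a", "bank"]
left_to_right = visibility_mask(len(tokens), bidirectional=False)
both_ways = visibility_mask(len(tokens), bidirectional=True)

# Which tokens can "open" (position 1) read under each scheme?
visible_unidirectional = [t for t, ok in zip(tokens, left_to_right[1]) if ok]
visible_bidirectional = [t for t, ok in zip(tokens, both_ways[1]) if ok]
```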

Page 11:

Masked Language Model (Fill-in-the-blank)

● Solution: Mask out k% of the input words, and then predict the masked words
  ○ We always use k = 15%
● Too little masking: Too expensive to train
● Too much masking: Not enough context

the man went to the [MASK] to buy a [MASK] of milk

store gallon
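The fill-in-the-blank setup can be sketched in a few lines. This is an illustrative simplification under stated assumptions: the function name and seed handling are hypothetical, every chosen word is replaced by [MASK], and the refinement in the BERT paper of sometimes keeping or randomising a chosen token is left out.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, k=0.15, seed=0):
    """Replace roughly a fraction k of the tokens with [MASK].

    Returns the masked sequence and a dict mapping each masked
    position to the original word the model should predict.
    """
    rng = random.Random(seed)
    n_mask = max(1, round(len(tokens) * k))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    masked = list(tokens)
    targets = {}
    for pos in positions:
        targets[pos] = tokens[pos]
        masked[pos] = MASK
    return masked, targets


sentence = "the man went to the store to buy a gallon of milk".split()
masked, targets = mask_tokens(sentence)
```

With 12 input words and k = 15%, two positions are masked; which two depends on the seed, so they will generally differ from the slide's "store" and "gallon".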

Page 12:

Next Sentence Prediction

● To learn relationships between sentences, predict whether Sentence B is the actual sentence that follows Sentence A, or a random sentence
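A minimal sketch of how such training pairs might be built is shown below. The 50/50 split and the IsNext / NotNext labels follow the BERT paper rather than this slide, the function name is hypothetical, and drawing the random sentence from the same short list (instead of another document) is a simplification.

```python
import random


def make_nsp_pairs(sentences, seed=0):
    """Build (sentence_a, sentence_b, label) triples.

    About half the time B is the true next sentence ("IsNext");
    otherwise B is some other sentence from the list ("NotNext").
    Needs at least three sentences so a distractor always exists.
    """
    rng = random.Random(seed)
    pairs = []
    for i in range(len(sentences) - 1):
        a = sentences[i]
        if rng.random() < 0.5:
            pairs.append((a, sentences[i + 1], "IsNext"))
        else:
            # Exclude A itself and the true next sentence.
            candidates = [s for j, s in enumerate(sentences) if j not in (i, i + 1)]
            pairs.append((a, rng.choice(candidates), "NotNext"))
    return pairs


# Invented example sentences.
corpus = [
    "The man went to the store.",
    "He bought a gallon of milk.",
    "Penguins are flightless birds.",
    "They live mostly in the Southern Hemisphere.",
]
pairs = make_nsp_pairs(corpus, seed=1)
```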

Page 13:

Next Sentence Prediction

● Use 30,000 WordPiece vocabulary on input
● Each token is sum of three embeddings
● Single sequence is much more efficient
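The "sum of three embeddings" bullet can be sketched as follows. The slide does not name the three tables; the token, segment and position embeddings used here are taken from the BERT paper, and the tiny tables and function names are invented for illustration.

```python
def add_vectors(*vectors):
    """Element-wise sum of equal-length vectors."""
    return [sum(components) for components in zip(*vectors)]


def input_representation(token_ids, segment_ids,
                         token_table, segment_table, position_table):
    """Each position's input is token + segment + position embedding."""
    return [
        add_vectors(token_table[tok], segment_table[seg], position_table[pos])
        for pos, (tok, seg) in enumerate(zip(token_ids, segment_ids))
    ]


# Toy 2-dimensional tables (hypothetical values).
token_table = {0: [1.0, 0.0], 1: [0.0, 1.0]}
segment_table = [[10.0, 10.0], [20.0, 20.0]]      # sentence A, sentence B
position_table = [[0.0, 0.0], [100.0, 100.0], [200.0, 200.0]]

inputs = input_representation(
    token_ids=[0, 1, 0],
    segment_ids=[0, 0, 1],
    token_table=token_table,
    segment_table=segment_table,
    position_table=position_table,
)
```

Because the segment embedding marks which sentence a token belongs to, both sentences can be packed into a single sequence.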

Page 14:

Transformer Architecture

● Multi-headed self attention
● Feed-forward layers
● Layer norm and residuals
● Positional embeddings

Page 15:

From One-Hot Vectors to Word Embeddings & Self-Attention

[Figure: the words "on ... river bank" shown as one-hot vectors, each mapped to an embedding row (1.4 … 3.7; 4.9 … 6.4; 2.5 … 8.0).]

The Annotated Transformer, The Illustrated Transformer, The Illustrated BERT

Page 16:

From One-Hot Vectors to Word Embeddings & Self-Attention

Page 17:

query, key, value

From One-Hot Vectors to Word Embeddings & Self-Attention

Page 18:

query, key, value

From One-Hot Vectors to Word Embeddings & Self-Attention

[Figure: the one-hot and embedding diagram for "on ... river bank", now with (self-)attention weights 0.1, 0.2, 0.7 over the three words.]

The Annotated Transformer, The Illustrated Transformer, The Illustrated BERT

Page 19:

query, key, value

From One-Hot Vectors to Word Embeddings & Self-Attention
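The query, key, value labels and the (self-)attention weights on these pages correspond to scaled dot-product attention. The sketch below is a single-head version in plain Python under simplifying assumptions: the toy embeddings are invented, and the same vectors are used directly as queries, keys and values, whereas a real Transformer first applies separate learned projections.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def self_attention(queries, keys, values):
    """For each query, weight every value by softmax(q . k / sqrt(d_k))."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        weights = softmax([dot(q, k) / math.sqrt(d_k) for k in keys])
        mixed = [
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights


# Three toy 2-d embeddings standing in for "on", "river", "bank".
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(x, x, x)
```

Each row of `weights` sums to 1, playing the role of the 0.1 / 0.2 / 0.7 column in the figure; the output for a word is the corresponding weighted mix of all value vectors.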

Page 20:

Transformer vs LSTM

● Self-attention == no locality bias
  ○ Long-distance context has “equal opportunity”
● Single multiplication per layer == efficiency on TPU
  ○ Effective batch size is number of words, not sequences

[Figure: two panels, Transformer and LSTM, each showing the inputs X_0_0 … X_0_3 and X_1_0 … X_1_3 multiplied by W.]
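The "effective batch size" point can be illustrated with a small shape-level sketch. It assumes two sequences of four words each, as in the X_b_t grid, with invented values; it only contrasts the access patterns and does not implement an actual LSTM, whose steps additionally depend on the previous hidden state.

```python
def matmul(a, b):
    """Plain-Python matrix product of a (n x d) and b (d x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


# batch[b][t] is the vector X_b_t: 2 sequences, 4 words, 2 dimensions (toy values).
batch = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]],
    [[0.0, 2.0], [1.0, 2.0], [3.0, 0.0], [0.0, 0.0]],
]
W = [[1.0, 2.0], [3.0, 4.0]]
seq_len = len(batch[0])

# Transformer-style: flatten to 8 rows (all words) and multiply by W once.
flat = [word for seq in batch for word in seq]
all_at_once = matmul(flat, W)

# Recurrent-style: one time step at a time, only 2 rows (one per sequence) each.
step_by_step = []
for t in range(seq_len):
    rows_at_t = [seq[t] for seq in batch]
    step_by_step.append(matmul(rows_at_t, W))
```

The results agree entry for entry; the difference is that the first form presents the hardware with one 8-row multiplication instead of four 2-row ones.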

Page 21:

Basic BERT Recipe

Page 22:

Basic BERT Recipe

Page 23:

Basic BERT Recipe

Page 24:

GLUE Benchmark

MultiNLI
Premise: Hills and mountains are especially sanctified in Jainism.
Hypothesis: Jainism hates nature.
Label: Contradiction

CoLA
Sentence: The wagon rumbled down the road.
Label: Acceptable

Sentence: The car honked down the road.
Label: Unacceptable

Page 25:

SWAG: Zellers, Bisk, Schwartz, Choi, EMNLP 2018
SQuAD: Rajpurkar, Zhang, Lopyrev, Liang, EMNLP 2016

Results: Commonsense Reasoning and Question Answering

Page 26:

Ablation Experiments

● Masked LM (compared to left-to-right LM) is very important on some tasks, Next Sentence Prediction is important on other tasks.

● Left-to-right model does very poorly on word-level task (SQuAD), although this is mitigated by BiLSTM

Page 27:

More is Better

● More data (and training longer) helps => not yet asymptoted

● Bigger model helps a lot

Page 28:

Try It Out, Get Faster Training with TPUs

Page 29:

Do I Need Full BERT Models for All My Tasks?

Houlsby, Giurgiu, Jastrzebski, Morrone, de Laroussilhe, Gesmundo, Attariyan, Gelly, arXiv, Feb 2019

Page 30:

Recently: Even Better Pretraining

Page 31:

BERT vs. XLNet

                    BERT                        XLNet
Objective           Masked LM + Next Sentence   Autoregressive LM
Masking             Random 15%                  Last ⅙ in permutation order
Position Encoding   Absolute                    Relative
Data                13 GB of text               126 GB of text

Page 32:

Other BERT-inspired work

What does BERT learn - Tenney et al., ACL 2019
Relation learning - Baldini-Soares et al., ACL 2019
Passage representations - Lee et al., ACL 2019

Page 33:

What does BERT know about language?

Tenney et al., ICLR 2019, ACL 2019

● What linguistic relationships?
● Where in the model are they computed?

Classifier-based probing: project model activations into space of linguistic annotations (graph edges)

Page 34:

BERT Rediscovers the Classical NLP Pipeline

Probing Representations

High weights for POS in lower layers, then constituents, dependencies, and SRL, followed by entities and coreference as we move up the stack!

BERT improves coref over ELMo (84->91, or 39% relative)

We can trace hypotheses on individual sentences!

“he smoked toronto in the playoffs with six hits, seven walks …”

Page 35:

Representing Entities and Relations

[Figure: BERT encoders; labels include BLANK, BLANK.]

Baldini-Soares et al., ACL 2019

Page 36:

Matching The Blanks Results

few-shot relation extraction

Baldini-Soares et al., ACL 2019

Page 37:

Open Retrieval Question Answering

Input: What does the zip in zip code stand for?
Latent Retrieval: Wikipedia
Output: Zone Improvement Plan

Input: How many districts are in Alabama?
Latent Retrieval: Wikipedia
Output: 7

Goal: Learn to efficiently read Wikipedia without any retrieval data.

Motivation: Best known recipe for latent retrieval is TF-IDF filtering + brute force. We can do better.

Key Insight: Pre-train an unsupervised ScaM neural retriever. This enables efficient end-to-end fine-tuning with standard latent variable learning methods.

Lee et al., ACL 2019

Page 38:

Zebras have four gaits: walk, trot, canter and gallop. They are generally slower than horses, but their great stamina helps them outrun predators. When chased, a zebra will zig-zag from side to side, making it more difficult for the predator to attack...

Pseudo Evidence: Zebras have four gaits: walk, trot, canter and gallop. When chased, a zebra will zig-zag from side to side, making it more difficult for the predator to attack...

Pseudo Query: They are generally slower than horses, but their great stamina helps them outrun predators.

In-batch negative example: Poe capitalized on the success of "The Raven" by following it up with his essay "The Philosophy of Composition" (1846), in which he detailed the poem's creation….

In-batch negative example: Gagarin was further selected for an elite training group known as the Sochi Six, from which the first cosmonauts of the Vostok programme would be chosen...

Open Retrieval Question Answering

Inverse Cloze Task (ICT): Given a sentence (pseudo-query), predict the context (pseudo-evidence)
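The construction of an ICT training example can be sketched as below: one sentence is pulled out of a passage as the pseudo-query, the remaining sentences form the pseudo-evidence, and the evidence from the other examples in the batch serves as negatives. The function names are hypothetical, the second and third passages are invented, and always removing the query sentence from its context is a simplification.

```python
import random


def inverse_cloze_example(sentences, rng):
    """Pick one sentence as the pseudo-query; the rest is the pseudo-evidence."""
    idx = rng.randrange(len(sentences))
    pseudo_query = sentences[idx]
    pseudo_evidence = " ".join(sentences[:idx] + sentences[idx + 1:])
    return pseudo_query, pseudo_evidence


def with_in_batch_negatives(examples):
    """Attach every other example's evidence as a negative for example i."""
    return [
        (query, evidence, [e for j, (_, e) in enumerate(examples) if j != i])
        for i, (query, evidence) in enumerate(examples)
    ]


passages = [
    [
        "Zebras have four gaits: walk, trot, canter and gallop.",
        "They are generally slower than horses, but their great stamina helps them outrun predators.",
        "When chased, a zebra will zig-zag from side to side, making it more difficult for the predator to attack...",
    ],
    # The two passages below are invented placeholders.
    ["The river rose overnight.", "By morning the bridge was closed."],
    ["The library opens at nine.", "It closes at five on weekdays."],
]

rng = random.Random(0)
examples = [inverse_cloze_example(p, rng) for p in passages]
batch = with_in_batch_negatives(examples)
```

A retriever trained on such batches would be asked to score each pseudo-query higher against its own pseudo-evidence than against the negatives.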

Can you be charged for the same crime in two different states?

Wikipedia ?

Progress towards one of the hardest binary text classification tasks today:

Page 39:

Realistic Challenge Sets

Natural Questions - Kwiatkowski et al., TACL 2019
Yes/No Questions - Clark et al., NAACL 2019
Identifying Commands - Elkahky et al., EMNLP 2018

Page 40:

Natural Questions Motivation

Kwiatkowski et al. TACL 2019

Question: The success of the Britain Can Make It exhibition led to the planning of what exhibition in 1951?

Evidence: ... The success of this exhibition led to the planning of the Festival of Britain (1951). By 1948 most of the collections had been returned to the museum.

Answer: Festival of Britain

Question: Can you make and receive calls in airplane mode?

Evidence: Airplane mode, …. suspends radio-frequency signal transmission by the device, thereby disabling Bluetooth, telephony, and Wi-Fi. GPS may or may not be disabled, because it does not involve transmitting radio waves.

Answer: No

Goal: Provide academia with the first question answering dataset that represents a real question answering problem.

Previous question answering datasets are contrived. E.g. SQuAD's questions often paraphrase evidence text.

Answering real user queries requires much deeper language understanding and world knowledge.

Many questions have multiple acceptable answers: last hurricane in Massachusetts has a formal meaning (eye of the storm in MA) and a different colloquial meaning (hurricane force winds in MA). NQ embraces this acceptable variability. Solutions should model the full distribution of possible answers.

SQuAD dataset

Natural Questions

Page 41:

Challenge: Many Correct Answers

NQ annotators are encouraged to pick the first good answer.

In practice we sometimes get many different answer locations for the same question.

Question: name the substance used to make the filament of bulb

Page 42:

Defining Correctness

Wrong annotations are often the result of annotators trying to find an answer when the evidence isn't sufficient.

When all annotators agree that there is enough evidence available to answer a question, the annotations are overwhelmingly correct.

[Figure: two panels, Long answers and Short answers.]

X-axis: proportion of annotations that are non-null for question.
Y-axis: expectation that a non-null annotation's question is in this bucket.

Also broken down, conditioned on annotation being: Correct (C); Correct but debatable (C_d); or Wrong (W).

Page 43:

Natural Questions - Status

● First ever release of Google queries.

● 300k training items, 16k for evaluation. Upper bound of 87% on long answers, 76% on short.

● Leaderboard seeing good activity; task is quite a bit harder than SQuAD.

Page 44:

Williams, Nangia, Bowman, 2018

Prior Approaches to Testing Inference/Reasoning Abilities

Page 45:

Clark, Lee, Chang, Kwiatkowski, Collins, Toutanova, NAACL 2019

BoolQ: Naturally Occurring Yes-No Questions

Page 46:

Real Problems that Naturally Require Inference to Solve

Question

Passage

Answer

Page 47:

Collecting Passages

BoolQ

Document Selection → Paragraph Selection → Answer Selection

Are there blue whales in the Atlantic Ocean?

Yes / No

Pipeline from Natural Questions (Kwiatkowski et al., 2019)

Page 48:

Test Set Results

Page 49:

Sample Efficiency: MultiNLI > BERT for Small Data

Page 50:

Noun-Verb Ambiguity

“lives” / Noun → /laIvz/

“lives” / Verb → /lIvz/

flies NOUN

Mark VERB

Elkahky, Webster, Andor, Pitler, EMNLP 2018

Page 51:

Certain insects can damage plumerias, such as mites, flies, or aphids. NOUN
Mark which area you want to distress. VERB

“A Challenge Set and Methods for Noun-Verb Ambiguity”, EMNLP 2018

Page 52:

Accuracy on Noun-Verb Disambiguation

Page 53:

Pronunciation of Homographs Accuracy

Page 54:

Webster, Recasens, Axelrod, Baldridge, TACL 2019
Kwiatkowski, Palomaki, Redfield, Collins, Parikh, Alberti, Epstein, Polosukhin, Kelcey, Devlin, Lee, Toutanova, Jones, Chang, Dai, Uszkoreit, Le, Petrov, TACL 2019

Released Datasets with “In-the-Wild” Natural Challenges


Mark VERB

Are there blue whales in the Atlantic Ocean? YES

Page 55:

Summary

Page 56:

Thank you for your attention!
