Date post: 17-Jul-2020
Page 1: NSCaching: Simple and Efficient Negative Sampling for …qyaoaa/papers/NSCaching-quanming.pdf · 2019-03-08 · NSCaching: Simple and Efficient Negative Sampling for Knowledge Graph

NSCaching: Simple and Efficient Negative Sampling for Knowledge Graph Embedding

Dr. Quanming YAO

Researcher@4Paradigm. Inc

Accepted by ICDE 2019: https://arxiv.org/pdf/1812.06410.pdf

Code: https://github.com/yzhangee/NSCaching

Email: [email protected]

Page 2

About This Talk

• Knowledge Graph Embedding

• Negative Sampling

• NSCaching: faster and better negative sampling

• Experiments

Page 3

Knowledge Graph Embedding

Page 4

Knowledge Graph (KG)

Knowledge structure as a graph

• Each node = an entity

• Each edge = a relation

Fact (triplet):

• (head, relation, tail)

Typical KGs:

• WordNet: Linguistic KG

• Freebase, DBpedia, YAGO: World KG

Applications:

• Structured search [Dong et al., KDD 2014]

• Question answering [Lukovnikov et al., WWW 2017]

• Recommendation [Zhang et al., KDD 2016]

Example query: (Michelle, hasChild, ?)

Page 5

KG Embedding (what & why)

Encode entities and relations in a KG into a low-dimensional vector space $\mathbb{R}^d$, while capturing the connection properties of nodes and edges

Once triplets are processed into vectors, they can be used for subsequent learning tasks

Page 6

KG Embedding (how)

A scoring function $f(h, r, t)$ captures the interaction (similarity) between two entities under a relation, computed from their embeddings

• TransE [Bordes et al., 2013]: $f(h, r, t) = -\|\mathbf{h} + \mathbf{r} - \mathbf{t}\|_1$

• DistMult [Yang et al., 2017]: $f(h, r, t) = \mathbf{h}^\top \operatorname{diag}(\mathbf{r})\, \mathbf{t}$

In $f(h, r, t)$:

• $\mathbf{h}$: embedding vector of the head entity

• $\mathbf{r}$: embedding vector of the relation

• $\mathbf{t}$: embedding vector of the tail entity
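The two scoring functions can be written out directly. The following is a minimal sketch in plain Python, assuming embeddings are lists of floats; the function names and the toy vectors are illustrative and not taken from the released code.

```python
def transe_score(h, r, t):
    """TransE: f(h, r, t) = -||h + r - t||_1 (closer to zero is better)."""
    return -sum(abs(hi + ri - ti) for hi, ri, ti in zip(h, r, t))


def distmult_score(h, r, t):
    """DistMult: f(h, r, t) = h^T diag(r) t = sum_i h_i * r_i * t_i."""
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))


# Toy 2-dimensional embeddings (hypothetical values).
h = [1.0, 0.0]
r = [0.0, 1.0]
t_close = [1.0, 1.0]   # h + r lands exactly on this tail
t_far = [-1.0, 0.0]    # h + r is far from this tail

print(transe_score(h, r, t_close))
print(transe_score(h, r, t_far))
print(distmult_score([1.0, 2.0], [0.5, 1.0], [2.0, 1.0]))
```

Under TransE the tail that matches `h + r` gets the larger (less negative) score, which is the behaviour the objective on the next slide relies on.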

Page 7

KG Embedding (how)

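Target:

• maximize $f$ on a set of positive triplets $\mathcal{S} = \{(h, r, t)\}$

• minimize $f$ on a set of negative triplets $\bar{\mathcal{S}} = \{(\bar{h}, r, \bar{t})\}$

Using TransE as an example:

• positive: $-\|\mathbf{Obama} + \mathbf{MarriedTo} - \mathbf{Michelle}\|_1$

• negative: $-\|\mathbf{Obama} + \mathbf{MarriedTo} - \mathbf{Trump}\|_1$

Objective: minimize, e.g.,

• $L(\mathcal{E}, \mathcal{R}) = \sum_{(h,r,t) \in \mathcal{S},\ (\bar{h},r,\bar{t}) \in \bar{\mathcal{S}}} \left[\gamma - f(h, r, t) + f(\bar{h}, r, \bar{t})\right]_+$, with $\gamma > 0$

• $L(\mathcal{E}, \mathcal{R}) = \sum_{(h,r,t) \in \mathcal{S},\ (\bar{h},r,\bar{t}) \in \bar{\mathcal{S}}} \ell(+1, f(h, r, t)) + \ell(-1, f(\bar{h}, r, \bar{t}))$

where $\ell$ is a loss function for binary classification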
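Both objectives reduce to a per-pair loss on the two scores. The sketch below illustrates them in plain Python; the slide does not fix $\ell$, so the logistic loss used here is an assumption, and the function names are hypothetical.

```python
import math


def margin_loss(f_pos, f_neg, gamma=1.0):
    """Margin-based ranking loss: [gamma - f(pos) + f(neg)]_+."""
    return max(0.0, gamma - f_pos + f_neg)


def logistic_loss(label, score):
    """One possible choice of l(label, score): log(1 + exp(-label * score)).

    Assumption: the slide only says l is a binary classification loss.
    Not numerically guarded against very large |score|.
    """
    return math.log1p(math.exp(-label * score))


def classification_pair_loss(f_pos, f_neg):
    """l(+1, f(pos)) + l(-1, f(neg)) for one positive/negative pair."""
    return logistic_loss(+1, f_pos) + logistic_loss(-1, f_neg)


print(margin_loss(f_pos=-1.0, f_neg=-1.5, gamma=2.0))  # margin violated
print(margin_loss(f_pos=-1.0, f_neg=-5.0, gamma=2.0))  # margin satisfied
print(classification_pair_loss(0.0, 0.0))
```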

Page 8

Negative Sampling (why)

A KG only contains observed facts (positive triplets)

Non-observed ones are assumed to be negative with large probability

Positive: (Obama, marriedTo, Michelle)

Negative: (Obama, marriedTo, Sasha), (Sasha, marriedTo, Michelle), (Obama, bornOn, Michelle)

Positive: (Michelle, hasChild, Malia)

Negative: (Michelle, hasChild, Obama), (Sasha, hasChild, Malia), (Michelle, bornOn, Malia)

Page 9

Negative Sampling (why)

$L(\mathcal{E}, \mathcal{R}) = \sum_{(h,r,t) \in \mathcal{S},\ (\bar{h},r,\bar{t}) \in \bar{\mathcal{S}}} \ell(+1, f(h, r, t)) + \ell(-1, f(\bar{h}, r, \bar{t}))$

• Performance: not all negative samples are equally good; bad ones can hurt performance

Positive: (Steve Jobs, FounderOf, Apple Inc.)

Low-quality: (Baseball, FounderOf, Apple Inc.)

High-quality: (Bill Gates, FounderOf, Apple Inc.)

Page 10

Negative Sampling (why)

$L(\mathcal{E}, \mathcal{R}) = \sum_{(h,r,t) \in \mathcal{S},\ (\bar{h},r,\bar{t}) \in \bar{\mathcal{S}}} \ell(+1, f(h, r, t)) + \ell(-1, f(\bar{h}, r, \bar{t}))$

• Computation: the number of negative samples (unobserved triplets) is very large; considering all of them is computationally infeasible

Sampling a few high-quality negative samples is therefore important for both performance and efficiency

Page 11

Negative Sampling (why)

Negative sampling is not specific to KG embedding; it also appears in word2vec [Mikolov et al., 2013] and click-through rate (CTR) prediction

The need for negative sampling is the same in each case

[Figure: word2vec]

Page 12

Negative Sampling (how)

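Given a positive triplet $(h, r, t)$, the set of negative triplets is

$\bar{\mathcal{S}}_{(h,r,t)} = \{(\bar{h}, r, t) \notin \mathcal{S} \mid \bar{h} \in \mathcal{E}\} \cup \{(h, r, \bar{t}) \notin \mathcal{S} \mid \bar{t} \in \mathcal{E}\}$

A few negative samples are drawn from $\bar{\mathcal{S}}_{(h,r,t)}$.

Note that $\{(h, \bar{r}, t) \notin \mathcal{S} \mid \bar{r} \in \mathcal{R}\}$ is not included, since such triplets are more likely to be false negatives.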

Page 13

Negative Sampling (problems)

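Given the negative triplet set $\bar{\mathcal{S}}_{(h,r,t)} = \{(\bar{h}, r, t) \notin \mathcal{S} \mid \bar{h} \in \mathcal{E}\} \cup \{(h, r, \bar{t}) \notin \mathcal{S} \mid \bar{t} \in \mathcal{E}\}$, uniform sampling from the set is widely used in the literature.

The quality of negative samples matters!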
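As a baseline for comparison, uniform sampling from $\bar{\mathcal{S}}_{(h,r,t)}$ can be sketched as follows. This is an illustrative plain-Python version, assuming triplets are tuples and the observed set $\mathcal{S}$ is a Python set; the 50/50 choice between corrupting the head and the tail is also an assumption.

```python
import random


def uniform_negative(triple, entities, positives, rng):
    """Corrupt the head or the tail with a uniformly drawn entity,
    rejecting any candidate that is an observed (positive) triplet.

    Note: loops forever if every corruption is already in `positives`.
    """
    h, r, t = triple
    while True:
        e = rng.choice(entities)
        candidate = (e, r, t) if rng.random() < 0.5 else (h, r, e)
        if candidate not in positives:
            return candidate


entities = ["Obama", "Michelle", "Sasha", "Malia"]
positives = {("Obama", "marriedTo", "Michelle"),
             ("Michelle", "marriedTo", "Obama")}
rng = random.Random(0)
print(uniform_negative(("Obama", "marriedTo", "Michelle"), entities, positives, rng))
```

Every entity is equally likely to be drawn regardless of how plausible the corrupted triplet is, which is why most of the resulting negatives are low quality.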

Page 14

Negative Sampling (problems)

Low-quality negative samples gradually become less informative [Wang et al., AAAI 2018]

• Positive: (Steve Jobs, FounderOf, Apple Inc.)

• Low-quality: (Baseball, FounderOf, Apple Inc.)

• High-quality: (Bill Gates, FounderOf, Apple Inc.)

Vanishing gradient [Wang et al., AAAI 2018]:

$L(\mathcal{E}, \mathcal{R}) = \sum_{(h,r,t) \in \mathcal{S},\ (\bar{h},r,\bar{t}) \in \bar{\mathcal{S}}} \left[\gamma - f(h, r, t) + f(\bar{h}, r, \bar{t})\right]_+$

We need to adaptively generate high-quality negative triplets as training goes on.

High-quality negative triplets should have large scores. We need to capture their dynamic distribution and sample from it.
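The vanishing gradient follows from the hinge in the margin loss: once $\gamma - f(h, r, t) + f(\bar{h}, r, \bar{t}) \le 0$, the term is clipped to zero and the negative triplet contributes no gradient. A minimal illustration, with made-up scores for the two example negatives:

```python
def margin_grad_wrt_negative(f_pos, f_neg, gamma):
    """Subgradient of [gamma - f_pos + f_neg]_+ with respect to f_neg:
    1 while the margin is violated, 0 once it is satisfied."""
    return 1.0 if gamma - f_pos + f_neg > 0 else 0.0


# Hypothetical scores for illustration only.
f_pos = -1.0   # (Steve Jobs, FounderOf, Apple Inc.)
gamma = 2.0
negatives = {
    "(Baseball, FounderOf, Apple Inc.)": -9.0,    # low score: easy negative
    "(Bill Gates, FounderOf, Apple Inc.)": -1.5,  # high score: hard negative
}
for name, f_neg in negatives.items():
    print(name, margin_grad_wrt_negative(f_pos, f_neg, gamma))
```

Only the high-scoring negative still drives learning, which is why the sampler should favour negatives with large scores.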

Page 15

GAN-based Method (existing solutions)

Key idea

• Use a generator to model the dynamic negative triplet distribution

• High-quality negative triplets are sampled by the generator

• Joint optimization (reinforcement learning is used):

  • The discriminator is trained on negative triplets provided by the generator;

  • The generator obtains its reward from the discriminator.

[Figure label: target KG embedding]

Methods:

• IGAN [Wang et al., AAAI 2018]

• KBGAN [Cai et al., NAACL 2018]

• Self-paced NE [Gao et al., KDD 2018]

Page 16

NSCaching: faster and better negative sampling

Page 17

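Key Observations

High-quality: large score evaluated from the scoring function

Recall that:

$L(\mathcal{E}, \mathcal{R}) = \sum_{(h,r,t) \in \mathcal{S},\ (\bar{h},r,\bar{t}) \in \bar{\mathcal{S}}} \left[\gamma - f(h, r, t) + f(\bar{h}, r, \bar{t})\right]_+$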

Page 18

Key Observations

The score distribution of negative triplets is highly skewed

Properties: dynamic, rare, complex

Page 19

Key Observations

Word2vec has similar observations on negative samples (on words)

- “While NCE can be shown to approximately maximize the log probability of the softmax, the Skip-gram model is only concerned with learning high-quality vector representations, so we are free to simplify NCE as long as the vector representations retain their quality.”

Can we design a sampling scheme that fully exploits the above properties?

Page 20

GAN-based Method (existing solutions)

GAN-based vs. NSCaching

• GAN-based: increased number of training parameters. NSCaching: no extra parameters introduced.

• GAN-based: sampling is not efficient. NSCaching: efficient sampling through the cache.

• GAN-based: training suffers from instability and degeneracy. NSCaching: stable without pre-training.

Page 21

NSCaching (overview)

Challenges:

• How to model the dynamic distribution of negative triplets

• How to sample high-quality negative triplets in an efficient way

Motivation:

The KG embedding itself contains information about triplet quality

• Use a small amount of extra memory, which caches negative samples with large scores for each triplet in 𝒮 during training

• Sample the negative triplet directly from the cache

Page 22

NSCaching (design issues)

Core idea: cache high-quality negative samples for each observed triplet

• How to construct & update the cache?

• How to sample from the cache?

Recall that the distribution of negative samples is dynamic during training, and high-quality ones are rare

Page 23

NSCaching (design issues)

Sampling from the cache

• The negative triplets in the cache may not be accurate enough;

• There are false negative triplets in the negative sample sets.

Update the cache

• The cache needs to be dynamically changed during the iterations of the algorithm;

• Should be able to explore all the possible high-quality negative samples;

• The update procedure should be efficient.

[Figure: $\bar{t}$ is sampled from the tail cache $\mathcal{T}_{(h,r)}$; $\bar{h}$ is sampled from the head cache $\mathcal{H}_{(r,t)}$]

Recall that all possible choices are: $\bar{\mathcal{S}}_{(h,r,t)} = \{(\bar{h}, r, t) \notin \mathcal{S} \mid \bar{h} \in \mathcal{E}\} \cup \{(h, r, \bar{t}) \notin \mathcal{S} \mid \bar{t} \in \mathcal{E}\}$

Page 24

NSCaching (update & construct cache)

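• Randomly sample candidates from all possible negative triplets

• Evaluate the score of each candidate

• Compare them with the existing ones in the cache and keep the top ones

Recall that all possible choices are: $\bar{\mathcal{S}}_{(h,r,t)} = \{(\bar{h}, r, t) \notin \mathcal{S} \mid \bar{h} \in \mathcal{E}\} \cup \{(h, r, \bar{t}) \notin \mathcal{S} \mid \bar{t} \in \mathcal{E}\}$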
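The three steps above can be sketched for a single cache entry as follows. This is an illustrative top-k version in plain Python, not the released implementation: `score_fn` stands in for $f$ evaluated with the current embeddings, `n2` is the number of random candidates, and all names and the toy score are assumptions.

```python
import random


def update_cache_topk(cache, entities, score_fn, n2, rng):
    """One cache update: draw n2 random candidates, merge them with the
    current cache, score everything, and keep the len(cache) best."""
    n1 = len(cache)
    candidates = rng.sample(entities, n2)
    # Merge and drop duplicates while keeping a deterministic order.
    pool = list(dict.fromkeys(cache + candidates))
    pool.sort(key=score_fn, reverse=True)
    return pool[:n1]


# Toy setup: entity 7 is the hardest negative, scores fall off with distance.
entities = list(range(20))
score_fn = lambda e: -abs(e - 7)
rng = random.Random(0)

cache = [0, 1, 2]
for _ in range(200):
    cache = update_cache_topk(cache, entities, score_fn, n2=5, rng=rng)
print(sorted(cache))
```

Each update scores only `n1 + n2` entities rather than all of $\mathcal{E}$, and because the candidates are redrawn every time, every entity eventually gets a chance to enter the cache.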

Page 25

Cache update

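Possible choices:

• Compute the score over all $h' \in \mathcal{E}$, $t' \in \mathcal{E}$ and select among them.

• Sample a subset $\mathcal{R}_m$ from $\mathcal{E}$, and select among them.

• Sample a subset $\mathcal{R}_m$ from $\mathcal{E}$, concatenate it with the cache, and select among the new set.

[Figure: the entity set $\mathcal{E}$, the caches $\mathcal{T}_{(h,r)}$ and $\mathcal{H}_{(r,t)}$, and the sampled subset $\mathcal{R}_m$]

Design requirements: capture the dynamic distribution; explore all possible candidates; efficient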

Page 26

NSCaching (update & construct cache)

[Figure: a random sample $\mathcal{R}_m$ drawn from the entity set $\mathcal{E}$ is used to update the caches $\mathcal{T}_{(h,r)}$ and $\mathcal{H}_{(r,t)}$]

Update scheme

• top-k

• importance sampling

Connection to self-paced learning:

• As training goes on, easy samples gradually get small scores and are removed from the cache. Thus, hard samples are gradually stored.
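The slide names importance sampling as an alternative to top-k but does not spell it out. One plausible reading, sketched below as an assumption, keeps `n1` entries drawn without replacement from the merged pool with probability proportional to $\exp(f)$, so that lower-scoring candidates still have a chance to stay. All names are hypothetical.

```python
import math
import random


def update_cache_is(cache, entities, score_fn, n2, rng):
    """Importance-sampling cache update (illustrative sketch).

    Merge the cache with n2 random candidates, then draw len(cache)
    entries without replacement, each with probability proportional to
    exp(score) among the entries still in the pool.
    """
    n1 = len(cache)
    pool = list(dict.fromkeys(cache + rng.sample(entities, n2)))
    scores = [score_fn(e) for e in pool]
    new_cache = []
    for _ in range(n1):
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        i = rng.choices(range(len(pool)), weights=weights)[0]
        new_cache.append(pool.pop(i))
        scores.pop(i)
    return new_cache


entities = list(range(20))
score_fn = lambda e: -0.5 * abs(e - 7)
rng = random.Random(0)
print(update_cache_is([0, 1, 2], entities, score_fn, n2=5, rng=rng))
```

Compared with top-k, this trades some exploitation for exploration: the highest-scoring entries are still favoured, but the cache is less likely to lock onto a fixed set.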

Page 27

NSCaching (sample from cache)

Since the negative samples in the cache are almost equally good, we sample uniformly from them
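A sketch of this sampling step, assuming the head cache is a dict keyed by $(r, t)$ and the tail cache a dict keyed by $(h, r)$. The 50/50 choice between corrupting the head and the tail is an assumption, as are the names and the toy cache contents.

```python
import random


def sample_from_cache(triple, head_cache, tail_cache, rng):
    """Draw one negative triplet for `triple` uniformly from its caches."""
    h, r, t = triple
    h_bar = rng.choice(head_cache[(r, t)])   # candidate replacement heads
    t_bar = rng.choice(tail_cache[(h, r)])   # candidate replacement tails
    if rng.random() < 0.5:
        return (h_bar, r, t)
    return (h, r, t_bar)


# Hypothetical cache contents for one positive triplet.
triple = ("Steve Jobs", "FounderOf", "Apple Inc.")
head_cache = {("FounderOf", "Apple Inc."): ["Bill Gates", "Larry Page"]}
tail_cache = {("Steve Jobs", "FounderOf"): ["Microsoft", "Google"]}

rng = random.Random(0)
print(sample_from_cache(triple, head_cache, tail_cache, rng))
```

Sampling is a dictionary lookup plus one uniform draw, so it adds almost no cost per training step.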

Page 28

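NSCaching (overview)

[Figure: for a positive triplet $(h, r, t)$, the tail cache $\mathcal{T}_{(h,r)}$ is indexed by $(h, r)$ and the head cache $\mathcal{H}_{(r,t)}$ by $(r, t)$. Sampling from the caches yields $(h, r, \bar{t})$ and $(\bar{h}, r, t)$, from which the negative triplet $(\bar{h}, r, \bar{t})$ is formed. The KGE model computes $f(h, r, t)$ and $f(\bar{h}, r, \bar{t})$ for the loss, and the caches are then updated.]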

Page 29

NSCaching (detailed design concerns)

There are other design possibilities for NSCaching, e.g.,

• Sample the top-1 in the cache

• Keep the top ones in the cache

• etc.

Please check our paper for a detailed discussion. The main principle is balancing exploration and exploitation.

Page 30

Experiments

Page 31

Effectiveness

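Measurements

• Given a triplet $(h, r, t)$;

• Compute the score of $(h', r, t)$, $\forall h' \in \mathcal{E}$;

• Get the rank of $h$ among all $h'$;

• Same for $t$.

Metrics

• MRR (mean reciprocal rank): $\frac{1}{|\mathcal{S}|} \sum_{i=1}^{|\mathcal{S}|} \frac{1}{\mathrm{rank}_i}$

• MR (mean rank): $\frac{1}{|\mathcal{S}|} \sum_{i=1}^{|\mathcal{S}|} \mathrm{rank}_i$

• Hit@10: $\frac{1}{|\mathcal{S}|} \sum_{i=1}^{|\mathcal{S}|} \mathbb{I}(\mathrm{rank}_i < 10)$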
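The three metrics are simple averages over the list of ranks. A minimal sketch follows; the helper names are illustrative, and `hit_at_10` uses the strict inequality exactly as written on the slide.

```python
def rank_of(true_entity, entities, score_fn):
    """Rank of the true entity among all candidates (1 = best score)."""
    s = score_fn(true_entity)
    return 1 + sum(1 for e in entities if score_fn(e) > s)


def mrr(ranks):
    """Mean reciprocal rank: higher is better."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def mean_rank(ranks):
    """Mean rank: lower is better."""
    return sum(ranks) / len(ranks)


def hit_at_10(ranks):
    """Fraction of test triplets with rank_i < 10, as on the slide."""
    return sum(1 for r in ranks if r < 10) / len(ranks)


ranks = [1, 2, 10, 50]  # hypothetical ranks for four test triplets
print(mrr(ranks), mean_rank(ranks), hit_at_10(ranks))
```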

Page 32

Efficiency

We measure convergence by plotting testing performance vs. training time.

Page 33

Efficiency

We measure convergence by plotting testing performance vs. training time.

Page 34

Sampling and Updating schemes

• Sampling from cache: uniform, importance sampling (IS), top-1

• Cache update: importance sampling (IS), top-k

[Figure panels: different sampling schemes; different updating schemes]

Page 35

Stability

We vary the cache size $N_1$ over {10, 30, 50, 70, 90} with $N_2 = 50$ fixed, and the random subset size $N_2$ over {10, 30, 50, 70, 90} with $N_1 = 50$ fixed.

[Figure panels: different $N_1$; different $N_2$]

Page 36

Visualization

Given the positive triplet (manorama, profession, actor), we randomly select and visualize some entities in the tail cache $\mathcal{T}_{(\text{manorama},\, \text{profession})}$ during training.

Page 37

Summary

A novel negative sampling method.

Why it works

• It can dynamically hold high-quality negative samples;

• Sampling is efficient and extra memory is small;

• Both sampling and updating schemes are carefully designed to balance exploration and exploitation;

• The cache scheme is connected to self-paced learning.

Page 38

Future work

• Use an advanced index structure for the cache to further improve efficiency and reduce cache sizes;

• Adapt to negative sampling in other tasks such as word embedding, network embedding, and PU learning;

• Theoretical analysis of the convergence;

• Use AutoML to make NSCaching adapt better to other datasets.

