SuperGlue:
Learning Feature Matching
with Graph Neural Networks
Report by Ilia Shipachev
Paper Authors:
Paul-Edouard Sarlin
Daniel DeTone
Tomasz Malisiewicz
Andrew Rabinovich
A minimal matching pipeline
image pair → local features (detection, description) → feature matching → outlier filtering → pose estimation
> Local features, classical: SIFT, ORB
> Local features, learned: SuperPoint [DeTone et al, 2018], D2-Net
> Feature matching: Nearest Neighbor Matching
> Outlier filtering, heuristics: ratio test, mutual check
> Outlier filtering, learned: classifier on set [Yi et al, 2018]
SuperGlue (deep net): context aggregation + matching + filtering
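The classical matching and filtering stages that SuperGlue replaces fit in a few lines. This is a minimal sketch assuming float-vector descriptors compared with Euclidean distance; the function names `l2` and `nn_match` and the 0.8 ratio threshold are illustrative choices, not taken from the slides.

```python
import math
from typing import List, Sequence, Tuple


def l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two descriptors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nn_match(
    desc_a: Sequence[Sequence[float]],
    desc_b: Sequence[Sequence[float]],
    ratio: float = 0.8,   # hypothetical threshold for the ratio test
    mutual: bool = True,
) -> List[Tuple[int, int]]:
    """Nearest Neighbor Matching filtered by a ratio test and a mutual check."""
    if not desc_a or not desc_b:
        return []
    # Pairwise distances: dist[i][j] between descriptor i in A and j in B.
    dist = [[l2(a, b) for b in desc_b] for a in desc_a]
    # For the mutual check: nearest neighbor in A of every descriptor in B.
    nn_b_to_a = [
        min(range(len(desc_a)), key=lambda i: dist[i][j]) for j in range(len(desc_b))
    ]
    matches = []
    for i, row in enumerate(dist):
        order = sorted(range(len(row)), key=lambda j: row[j])
        best = order[0]
        # Ratio test: reject if the best distance is not clearly below the second best.
        if len(order) > 1 and row[best] >= ratio * row[order[1]]:
            continue
        # Mutual check: i and best must be each other's nearest neighbor.
        if mutual and nn_b_to_a[best] != i:
            continue
        matches.append((i, best))
    return matches
```

Both heuristics only look at distances between individual descriptors; neither uses the other keypoints in the image, which is the context SuperGlue aggregates.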
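Problem formulation
Inputs
● Images A and B
● 2 sets of M, N local features
○ Keypoints:
- Coordinates
- Confidence
○ Visual descriptors
Outputs
A single match per keypoint
+ occlusion and noise
→ a soft partial assignment $\mathbf{P} \in [0,1]^{M \times N}$ whose rows and columns each sum to ≤ 1:
$\mathbf{P}\mathbf{1}_N \le \mathbf{1}_M$ and $\mathbf{P}^\top\mathbf{1}_M \le \mathbf{1}_N$
Solving a partial assignment problem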
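A small checker makes the output constraints concrete. This is a sketch; the function name `is_soft_partial_assignment` and the tolerance `tol` are hypothetical. A row (or column) summing to less than 1 leaves room for that keypoint to stay unmatched.

```python
from typing import Sequence


def is_soft_partial_assignment(P: Sequence[Sequence[float]], tol: float = 1e-9) -> bool:
    """Check that P is in [0,1]^{M x N} with every row sum and column sum <= 1."""
    if not P:
        return True
    n = len(P[0])
    if any(len(row) != n for row in P):
        raise ValueError("P must be an M x N matrix")
    entries_ok = all(-tol <= p <= 1 + tol for row in P for p in row)
    # Row i sums to < 1 when keypoint i in A may be unmatched (occluded or noisy).
    rows_ok = all(sum(row) <= 1 + tol for row in P)
    # Column j sums to < 1 when keypoint j in B may be unmatched.
    cols_ok = all(sum(row[j] for row in P) <= 1 + tol for j in range(n))
    return entries_ok and rows_ok and cols_ok
```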
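A Graph Neural Network with attention
[Figure: SuperGlue architecture. Local features (position + visual descriptor) → Keypoint Encoder → Attentional Graph Neural Network (Attentional Aggregation with Self and Cross layers, repeated L times) → matching descriptors → Optimal Matching Layer (score matrix + dustbin score, size (M+1)×(N+1); Sinkhorn Algorithm alternating row and column normalization for T iterations) → partial assignment (each keypoint's row or column sums to 1)]
● Attentional Graph Neural Network: encodes contextual cues & priors; reasons about the 3D scene
● Optimal Matching Layer: differentiable solver; enforces the assignment constraints = domain knowledge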
Keypoint Encoder
● Initial representation for each keypoint i: ${}^{(0)}\mathbf{x}_i$
● Combines visual appearance $\mathbf{d}_i$ and position $\mathbf{p}_i$ with a Multi-Layer Perceptron (MLP):
$${}^{(0)}\mathbf{x}_i = \mathbf{d}_i + \mathrm{MLP}_{\mathrm{enc}}(\mathbf{p}_i)$$
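The encoder can be sketched with plain lists. This assumes the keypoint is passed as (x, y, c), that the MLP uses ReLU between layers, and that its last layer outputs as many values as the descriptor has; the weights are hypothetical stand-ins for learned parameters.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Layer = Tuple[Matrix, List[float]]  # (W, b) of one hypothetical MLP layer


def linear(W: Matrix, b: Sequence[float], x: Sequence[float]) -> List[float]:
    """y = W x + b, with W stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def keypoint_encoder(d: Sequence[float], p: Sequence[float], layers: Sequence[Layer]) -> List[float]:
    """Initial representation x0_i = d_i + MLP_enc(p_i).

    d: visual descriptor of length D.
    p: keypoint (x, y, c): coordinates and detection confidence.
    layers: (W, b) pairs; ReLU after every layer except the last,
            whose output size must equal D.
    """
    h = list(p)
    for k, (W, b) in enumerate(layers):
        h = linear(W, b, h)
        if k < len(layers) - 1:
            h = [max(0.0, v) for v in h]  # ReLU
    if len(h) != len(d):
        raise ValueError("MLP output size must match the descriptor size")
    # Appearance and position are fused by addition.
    return [di + hi for di, hi in zip(d, h)]
```

Adding the position embedding rather than concatenating it keeps the feature size equal to the descriptor size, so every later layer works in the same space.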
Attentional Graph Neural Network
Update the representation based on other keypoints:
- in the same image: “self” edges
- in the other image: “cross” edges
→ A complete graph with two types of edges
Update the representation using a Message Passing Neural Network:
$${}^{(\ell+1)}\mathbf{x}^A_i = {}^{(\ell)}\mathbf{x}^A_i + \mathrm{MLP}\left(\left[{}^{(\ell)}\mathbf{x}^A_i \,\|\, \mathbf{m}_{\mathcal{E}\to i}\right]\right)$$
where ${}^{(\ell)}\mathbf{x}^A_i$ is the feature of keypoint i in image A at layer ℓ, $\mathbf{m}_{\mathcal{E}\to i}$ is the message aggregated over the edges $\mathcal{E}$ (self or cross), and $[\cdot \,\|\, \cdot]$ denotes concatenation
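One layer of this residual update can be sketched as follows. Assumptions, all hypothetical: a single linear layer stands in for the MLP, the same weights are applied to both images, layers alternate self and cross edges starting with self at layer 0, and a plain average replaces the attention-based message.

```python
from typing import Callable, List, Sequence, Tuple

Vec = List[float]


def linear(W: Sequence[Sequence[float]], b: Sequence[float], x: Sequence[float]) -> Vec:
    """y = W x + b, with W stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def mean_message(query: Sequence[float], sources: Sequence[Sequence[float]]) -> Vec:
    """Placeholder aggregator: uniform average of the source features.
    SuperGlue computes this message with attention instead."""
    return [sum(col) / len(sources) for col in zip(*sources)]


def mpnn_update(x_i: Sequence[float], m_i: Sequence[float], W, b) -> Vec:
    """Residual update x_i + MLP([x_i || m_i]); one linear layer stands in for the MLP."""
    delta = linear(W, b, list(x_i) + list(m_i))  # [x_i || m_i] is concatenation
    return [xi + di for xi, di in zip(x_i, delta)]


def gnn_layer(
    xA: List[Vec], xB: List[Vec], layer: int, W, b,
    aggregate: Callable = mean_message,
) -> Tuple[List[Vec], List[Vec]]:
    """One layer on the complete graph: self edges on even layers, cross edges on odd ones."""
    self_edges = layer % 2 == 0
    src_a = xA if self_edges else xB  # sources for keypoints of image A
    src_b = xB if self_edges else xA  # sources for keypoints of image B
    # Both images are updated from the features of the previous layer.
    new_a = [mpnn_update(x, aggregate(x, src_a), W, b) for x in xA]
    new_b = [mpnn_update(x, aggregate(x, src_b), W, b) for x in xB]
    return new_a, new_b
```

The residual form means a layer that learns nothing useful can leave the features unchanged, which helps when stacking L layers.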
Attentional Aggregation
● Compute the message $\mathbf{m}_{\mathcal{E}\to i}$ using self and cross attention
● Soft database retrieval: query $\mathbf{q}_i$, key $\mathbf{k}_j$, and value $\mathbf{v}_j$ [Vaswani et al, 2017]:
$$\mathbf{m}_{\mathcal{E}\to i} = \sum_{j:(i,j)\in\mathcal{E}} \alpha_{ij}\,\mathbf{v}_j, \qquad \alpha_{ij} = \mathrm{Softmax}_j\left(\mathbf{q}_i^\top \mathbf{k}_j\right)$$
Example: query = [tile, position (70,100)]; keys = [tile, pos. (80,110)], [corner, pos. (60, 90)], [grid, pos. (400,600)]
Self-attention (A → A) = intra-image information flow: the query attends to neighbors and salient points
Cross-attention (A → B) = inter-image: the query attends to candidate matches and distinctive points
Attention builds a soft, dynamic, sparse graph
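The soft retrieval can be sketched for a single query. In practice q, k and v are learned projections of the keypoint features and several attention heads are used; here they are passed in directly, with one head and no score scaling, so the function `attention_message` is a hypothetical simplification.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_message(
    q_i: Sequence[float],
    keys: Sequence[Sequence[float]],
    values: Sequence[Sequence[float]],
) -> Tuple[List[float], List[float]]:
    """m_i = sum_j alpha_ij v_j with alpha_ij = softmax_j(q_i . k_j).

    Self-attention: keys/values come from the same image as the query.
    Cross-attention: keys/values come from the other image.
    Returns (message, attention weights).
    """
    logits = [sum(q * k for q, k in zip(q_i, k_j)) for k_j in keys]
    alpha = softmax(logits)
    dim = len(values[0])
    message = [sum(a * v_j[d] for a, v_j in zip(alpha, values)) for d in range(dim)]
    return message, alpha
```

Because the weights come from a softmax, most of the mass concentrates on a few keys, which is what makes the resulting graph soft yet effectively sparse.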
Optimal Matching Layer
● Compute a score matrix $\mathbf{S} \in \mathbb{R}^{M \times N}$ for all matches:
$$\mathbf{S}_{i,j} = \langle \mathbf{f}^A_i, \mathbf{f}^B_j \rangle, \quad \forall (i,j) \in \mathcal{A} \times \mathcal{B}$$
where $\mathbf{f}^A_i$ and $\mathbf{f}^B_j$ are the matching descriptors produced by the Attentional Graph Neural Network
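The score matrix is a dense table of inner products between the two sets of matching descriptors. A minimal sketch; the name `score_matrix` is hypothetical.

```python
from typing import List, Sequence


def score_matrix(f_a: Sequence[Sequence[float]], f_b: Sequence[Sequence[float]]) -> List[List[float]]:
    """S[i][j] = <f_a[i], f_b[j]> for every pair (i, j) in A x B (an M x N matrix)."""
    return [[sum(x * y for x, y in zip(a, b)) for b in f_b] for a in f_a]
```

Every one of the M·N pairs is scored, so no candidate match is discarded before the assignment step.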
● Occlusion and noise: unmatched keypoints are assigned to a dustbin
● Augment the scores with a learnable dustbin score $z \in \mathbb{R}$, giving $\bar{\mathbf{S}} \in \mathbb{R}^{(M+1)\times(N+1)}$:
$$\bar{\mathbf{S}}_{i,N+1} = \bar{\mathbf{S}}_{M+1,j} = \bar{\mathbf{S}}_{M+1,N+1} = z$$
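Augmenting the scores amounts to appending one column and one row filled with the same scalar. A sketch with 0-based indices, so the dustbin column is index N and the dustbin row is index M; `add_dustbins` is a hypothetical name.

```python
from typing import List, Sequence


def add_dustbins(S: Sequence[Sequence[float]], z: float) -> List[List[float]]:
    """Build the (M+1) x (N+1) matrix S_bar: the extra column (index N), the extra
    row (index M) and the corner all hold the single learnable dustbin score z."""
    n = len(S[0]) if S else 0
    S_bar = [list(row) + [z] for row in S]  # dustbin column for keypoints of A
    S_bar.append([z] * (n + 1))             # dustbin row for keypoints of B
    return S_bar
```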
● Compute the assignment $\bar{\mathbf{P}}$ that maximizes $\sum_{i,j} \bar{\mathbf{S}}_{i,j}\,\bar{\mathbf{P}}_{i,j}$
● Solve an optimal transport problem with marginals $\bar{\mathbf{P}}\mathbf{1}_{N+1} = \mathbf{a}$ and $\bar{\mathbf{P}}^\top\mathbf{1}_{M+1} = \mathbf{b}$, where $\mathbf{a} = [\mathbf{1}_M^\top \;\; N]^\top$ and $\mathbf{b} = [\mathbf{1}_N^\top \;\; M]^\top$
● With the Sinkhorn algorithm: a differentiable, soft version of the Hungarian algorithm [Sinkhorn & Knopp, 1967]
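The alternating row and column normalization can be sketched in the log domain, which avoids overflow when exponentiating large scores. This assumes the marginals a = [1,…,1, N] and b = [1,…,1, M] for the augmented matrix and entropy-regularized transport with unit temperature; the function names and the iteration count are hypothetical.

```python
import math
from typing import List, Sequence


def logsumexp(xs: Sequence[float]) -> float:
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def sinkhorn(S_bar: Sequence[Sequence[float]], iters: int = 100) -> List[List[float]]:
    """Log-domain Sinkhorn on the augmented (M+1) x (N+1) score matrix.

    Target marginals: rows a = [1]*M + [N], columns b = [1]*N + [M], so each
    real keypoint carries unit mass and each dustbin can absorb every
    keypoint of the other image. Returns P_bar = exp(S_bar + u_i + v_j).
    """
    M, N = len(S_bar) - 1, len(S_bar[0]) - 1
    if M < 1 or N < 1:
        raise ValueError("need at least one keypoint in each image")
    log_a = [0.0] * M + [math.log(N)]
    log_b = [0.0] * N + [math.log(M)]
    u = [0.0] * (M + 1)  # log row scalings
    v = [0.0] * (N + 1)  # log column scalings
    for _ in range(iters):
        # Row normalization: rescale rows so they sum to a.
        u = [log_a[i] - logsumexp([S_bar[i][j] + v[j] for j in range(N + 1)])
             for i in range(M + 1)]
        # Column normalization: rescale columns so they sum to b.
        v = [log_b[j] - logsumexp([S_bar[i][j] + u[i] for i in range(M + 1)])
             for j in range(N + 1)]
    return [[math.exp(S_bar[i][j] + u[i] + v[j]) for j in range(N + 1)]
            for i in range(M + 1)]
```

Every step is a smooth function of the scores, so gradients flow back through the T iterations into the dustbin score and the GNN; a hard Hungarian solver would not allow this.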
● Compute ground-truth correspondences $\mathcal{M}$ from pose and depth
● Find which keypoints should be unmatched: $\mathcal{I} \subseteq \mathcal{A}$, $\mathcal{J} \subseteq \mathcal{B}$
● Loss: maximize the log-likelihood of the GT cells:
$$\mathrm{Loss} = -\sum_{(i,j)\in\mathcal{M}} \log \bar{\mathbf{P}}_{i,j} - \sum_{i\in\mathcal{I}} \log \bar{\mathbf{P}}_{i,N+1} - \sum_{j\in\mathcal{J}} \log \bar{\mathbf{P}}_{M+1,j}$$
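Given an augmented assignment and the ground-truth sets, the loss is a sum of negative log-probabilities over the supervised cells. A sketch with 0-based indices; the ground truth is taken as input here, and computing it from pose and depth is not shown. The name `matching_loss` is hypothetical.

```python
import math
from typing import Iterable, Sequence, Tuple


def matching_loss(
    P_bar: Sequence[Sequence[float]],
    matches: Iterable[Tuple[int, int]],
    unmatched_a: Iterable[int],
    unmatched_b: Iterable[int],
) -> float:
    """Negative log-likelihood of the ground-truth cells of P_bar ((M+1) x (N+1)).

    matches:     GT correspondences (i, j), i in A, j in B
    unmatched_a: keypoints of A that should land in the dustbin column (index N)
    unmatched_b: keypoints of B that should land in the dustbin row (index M)
    """
    M, N = len(P_bar) - 1, len(P_bar[0]) - 1
    loss = -sum(math.log(P_bar[i][j]) for i, j in matches)
    loss -= sum(math.log(P_bar[i][N]) for i in unmatched_a)
    loss -= sum(math.log(P_bar[M][j]) for j in unmatched_b)
    return loss
```

Supervising the dustbin cells as well as the matches trains the network to reject keypoints, not just to pair them.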
Results: indoor - ScanNet
[Figure: SuperPoint + NN + heuristics vs. SuperPoint + SuperGlue]
SuperGlue: more correct matches and fewer mismatches
Results: outdoor - SfM
[Figure: SuperPoint + NN + mutual check, SuperPoint + NN + OA-Net (inlier classifier), SuperPoint + SuperGlue]
SuperGlue: more correct matches and fewer mismatches
Results: attention patterns
[Figure panels: global context, neighborhood, distinctive keypoints, self-similarities, match candidates]
Flexibility of attention → diversity of patterns