SuperGlue - cvut.cz

SuperGlue: Learning Feature Matching with Graph Neural Networks

Report by Ilia Shipachev

Paper Authors: Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, Andrew Rabinovich

A minimal matching pipeline

image pair → feature detection and description → matching → outlier filtering → pose estimation

Feature detection and description

> Classical: SIFT, ORB

> Learned: SuperPoint [DeTone et al, 2018], D2-Net

Matching

> Nearest Neighbor Matching

Outlier filtering

> Heuristics: ratio test, mutual check

> Learned: classifier on set, deep net [Yi et al, 2018]

SuperGlue: context aggregation + matching + filtering
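The heuristic baseline in this pipeline (nearest neighbor matching followed by the ratio test and the mutual check) can be sketched in a few lines of NumPy. This is only an illustration: the function name `nn_match`, the Euclidean distance, and the ratio threshold of 0.8 are assumptions made here, not values taken from the slides.

```python
import numpy as np

def nn_match(desc_a, desc_b, ratio=0.8, mutual=True):
    """Match rows of desc_a (M x D) to rows of desc_b (N x D), with N >= 2.

    Returns (i, j) index pairs that pass the ratio test and, optionally,
    the mutual nearest-neighbor check. The 0.8 threshold is an assumption.
    """
    # Pairwise Euclidean distances, shape (M, N).
    dist = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    nn_ab = dist.argmin(axis=1)   # best match in B for each keypoint in A
    nn_ba = dist.argmin(axis=0)   # best match in A for each keypoint in B
    matches = []
    for i, j in enumerate(nn_ab):
        best, second = np.partition(dist[i], 1)[:2]
        if best > ratio * second:        # ratio test: the match is ambiguous
            continue
        if mutual and nn_ba[j] != i:     # mutual check: not reciprocal
            continue
        matches.append((i, int(j)))
    return matches

# Toy descriptors: the third keypoint of A is equally close to both
# keypoints of B, so the ratio test rejects it.
desc_a = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
desc_b = np.array([[0.9, 0.1], [0.1, 0.9]])
matches = nn_match(desc_a, desc_b)
```

Each keypoint is matched independently of all the others here, which is the limitation that context aggregation in SuperGlue is meant to address.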

The importance of context

[Figure: matches for the same image pair, with SuperGlue and with no SuperGlue]

Problem formulation

Inputs

● Images A and B

● 2 sets of M, N local features

○ Keypoints:

- Coordinates

- Confidence

○ Visual descriptors

Outputs

● A single match per keypoint

● + occlusion and noise

→ a soft partial assignment: each row sum ≤ 1 and each column sum ≤ 1
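As a small worked example of the output constraints, the check below tests whether an M x N matrix is a valid soft partial assignment: entries in [0, 1], with every row and every column summing to at most 1. The helper name `is_partial_assignment` and the tolerance are assumptions introduced for illustration.

```python
import numpy as np

def is_partial_assignment(P, tol=1e-6):
    """Return True if P has entries in [0, 1] and row/column sums <= 1."""
    P = np.asarray(P, dtype=float)
    in_range = np.all((P >= -tol) & (P <= 1 + tol))
    rows_ok = np.all(P.sum(axis=1) <= 1 + tol)
    cols_ok = np.all(P.sum(axis=0) <= 1 + tol)
    return bool(in_range and rows_ok and cols_ok)

# Valid: keypoint 0 of A is confidently matched, keypoint 1 mostly unmatched.
valid = [[0.9, 0.0, 0.0],
         [0.0, 0.2, 0.1]]
# Invalid: the first row sums to 1.2.
invalid = [[0.9, 0.3],
           [0.0, 0.8]]
```

Sums strictly below 1 are what allow a keypoint to stay unmatched under occlusion or noise.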

[Figure: SuperGlue architecture. Local features (position and visual descriptor) → Keypoint Encoder → Attentional Aggregation (Self and Cross, repeated L times) → matching descriptors → score matrix + dustbin score → Sinkhorn Algorithm (row normalization and column norm., repeated T times) → partial assignment of size (M+1) x (N+1)]

A Graph Neural Network with attention

● Encodes contextual cues & priors

● Reasons about the 3D scene

Solving a partial assignment problem

● Differentiable solver

● Enforces the assignment constraints = domain knowledge

● Initial representation for each keypoint

● Combines visual appearance and position with an MLP (the Keypoint Encoder): the encoded position is added to the visual descriptor
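A minimal sketch of this step, assuming the form given in the SuperGlue paper, where the initial representation is the visual descriptor plus an MLP applied to the keypoint position and confidence. The layer sizes, the random weights, and the names `mlp` and `encode_keypoints` are placeholders chosen for illustration, not the trained model.

```python
import numpy as np

def mlp(x, weights, biases):
    """Apply a ReLU MLP; no activation after the last layer."""
    for k, (w, b) in enumerate(zip(weights, biases)):
        x = x @ w + b
        if k < len(weights) - 1:
            x = np.maximum(x, 0.0)
    return x

def encode_keypoints(positions, confidences, descriptors, weights, biases):
    """positions: (M, 2), confidences: (M,), descriptors: (M, D) -> (M, D)."""
    # Each keypoint is described by (x, y, confidence).
    kpt_input = np.concatenate([positions, confidences[:, None]], axis=1)
    # The encoded position is added to the visual descriptor.
    return descriptors + mlp(kpt_input, weights, biases)

rng = np.random.default_rng(0)
M, D, hidden = 5, 8, 16   # toy sizes, chosen for illustration only
weights = [rng.normal(size=(3, hidden)), rng.normal(size=(hidden, D))]
biases = [np.zeros(hidden), np.zeros(D)]
x0 = encode_keypoints(rng.uniform(size=(M, 2)), rng.uniform(size=M),
                      rng.normal(size=(M, D)), weights, biases)
```

Adding the encoded position, rather than concatenating it, keeps the representation at the descriptor dimension D.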


Update the representation based on other keypoints:

- in the same image: "self" edges

- in the other image: "cross" edges

→ A complete graph with two types of edges

Update the representation using a Message Passing Neural Network

- the current representation: feature i in image A at layer ℓ

- the message aggregated for keypoint i from the other keypoints
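The message passing update can be sketched as a residual step, assuming the form used in the SuperGlue paper: the new feature is the old feature plus an MLP applied to the concatenation of the feature and its message. The two-layer MLP, the weight shapes, and the name `update_features` are assumptions for illustration.

```python
import numpy as np

def update_features(x, message, w1, b1, w2, b2):
    """Residual update: x + MLP([x || message]).

    x, message: (M, D); w1: (2D, H); b1: (H,); w2: (H, D); b2: (D,).
    """
    h = np.concatenate([x, message], axis=1)   # (M, 2D)
    h = np.maximum(h @ w1 + b1, 0.0)           # hidden layer with ReLU
    return x + (h @ w2 + b2)                   # residual connection

rng = np.random.default_rng(0)
M, D, H = 4, 6, 12   # toy sizes, chosen for illustration only
x = rng.normal(size=(M, D))
message = rng.normal(size=(M, D))
x_next = update_features(x, message,
                         rng.normal(size=(2 * D, H)), np.zeros(H),
                         rng.normal(size=(H, D)), np.zeros(D))
```

The same update is applied to the keypoints of both images, with the message coming from "self" or "cross" edges depending on the layer.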

Attentional Aggregation

● Compute the message using self and cross attention

● Soft database retrieval: query, key, and value [Vaswani et al, 2017]

Example:

- query = [tile, position (70,100)]

- key = [tile, pos. (80,110)]

- key = [corner, pos. (60, 90)]

- key = [grid, pos. (400,600)]

[Figure: attention from a query keypoint in images A and B; labels: neighbors, salient points, distinctive points, candidate matches]

Self-attention = intra-image information flow

Cross-attention = inter-image

Attention builds a soft, dynamic, sparse graph
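The soft database retrieval can be sketched as single-head dot-product attention. The scaling by the square root of the dimension follows [Vaswani et al, 2017] and is an assumption here, as are the single head, the random projection matrices, and the name `attention_message`. Self-attention takes queries and sources from the same image; cross-attention takes the sources from the other image.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # stabilize the exponentials
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention_message(x_query, x_source, wq, wk, wv):
    """x_query: (M, D) features receiving messages; x_source: (N, D)."""
    q = x_query @ wq     # queries
    k = x_source @ wk    # keys
    v = x_source @ wv    # values
    # Attention weights: similarity of each query to every key, (M, N).
    alpha = softmax(q @ k.T / np.sqrt(q.shape[1]), axis=1)
    # Message: weighted average of the values.
    return alpha @ v, alpha

rng = np.random.default_rng(0)
d = 6
x_a, x_b = rng.normal(size=(4, d)), rng.normal(size=(5, d))
wq, wk, wv = (rng.normal(size=(d, d)) for _ in range(3))
m_self, alpha_self = attention_message(x_a, x_a, wq, wk, wv)     # intra-image
m_cross, alpha_cross = attention_message(x_a, x_b, wq, wk, wv)   # inter-image
```

Because the weights depend on the features themselves, the set of keypoints that effectively contribute to a message changes from layer to layer and from image pair to image pair.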


Compute a score matrix for all matches from the matching descriptors of both images.


● Occlusion and noise: unmatched keypoints are assigned to a dustbin

● Augment the scores with a learnable dustbin score
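A sketch of the score matrix and its dustbin augmentation. In the SuperGlue paper the score of a pair is the inner product of the two matching descriptors, and a single scalar fills the extra dustbin row and column. The descriptor values, the fixed dustbin value of 1.0 (learned in practice), and the name `augment_with_dustbin` are assumptions for illustration.

```python
import numpy as np

def augment_with_dustbin(scores, dustbin_score):
    """Append one dustbin row and one dustbin column filled with one scalar.

    scores: (M, N) -> (M + 1, N + 1).
    """
    m, n = scores.shape
    augmented = np.full((m + 1, n + 1), float(dustbin_score))
    augmented[:m, :n] = scores
    return augmented

rng = np.random.default_rng(0)
f_a = rng.normal(size=(4, 8))   # matching descriptors of image A (toy values)
f_b = rng.normal(size=(3, 8))   # matching descriptors of image B (toy values)
scores = f_a @ f_b.T            # inner-product score for every pair, (4, 3)
scores_aug = augment_with_dustbin(scores, dustbin_score=1.0)
```

The extra row and column are where the M+1 and N+1 dimensions of the partial assignment come from.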


● Compute the assignment that maximizes the total score

● Solve an optimal transport problem

● With the Sinkhorn algorithm: differentiable & soft Hungarian algorithm [Sinkhorn & Knopp, 1967]
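The alternating row and column normalization can be sketched in the log domain as follows. The marginals follow the SuperGlue paper (each keypoint carries mass 1, and each dustbin can absorb all keypoints of the other image); the iteration count, the toy scores, and the function names are assumptions made here, and the code assumes M and N are at least 1.

```python
import numpy as np

def logsumexp(z, axis):
    zmax = z.max(axis=axis, keepdims=True)
    return np.squeeze(zmax, axis=axis) + np.log(np.exp(z - zmax).sum(axis=axis))

def sinkhorn(scores_aug, num_iters=100):
    """Soft assignment for an (M+1, N+1) score matrix that includes dustbins."""
    m, n = scores_aug.shape[0] - 1, scores_aug.shape[1] - 1
    # Target marginals: mass 1 per keypoint, N (resp. M) for the dustbins.
    log_a = np.log(np.concatenate([np.ones(m), [n]]))
    log_b = np.log(np.concatenate([np.ones(n), [m]]))
    u = np.zeros(m + 1)
    v = np.zeros(n + 1)
    for _ in range(num_iters):
        u = log_a - logsumexp(scores_aug + v[None, :], axis=1)   # row normalization
        v = log_b - logsumexp(scores_aug + u[:, None], axis=0)   # column normalization
    return np.exp(scores_aug + u[:, None] + v[None, :])

# Toy example with M = 2, N = 3 and a dustbin score of 0: keypoints 0 and 1
# of A agree with keypoints 0 and 1 of B; keypoint 2 of B has no partner.
scores_aug = np.array([[2.0, 0.0, 0.0, 0.0],
                       [0.0, 2.0, 0.0, 0.0],
                       [0.0, 0.0, 0.0, 0.0]])
P = sinkhorn(scores_aug)
```

Every operation is differentiable, so the assignment can be trained end to end together with the network that produces the scores.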


● Compute ground truth correspondences from pose and depth

● Find which keypoints should be unmatched

● Loss: maximize the log-likelihood of the GT cells

Loss function: negative log-likelihood summed over

- the set of GT matches

- the set of unmatched points in GT
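A sketch of this loss, assuming the augmented assignment has shape (M+1, N+1) with the dustbins in the last row and column. Matched pairs contribute their own cell, and unmatched keypoints contribute the dustbin cell in their row or column. The function name `matching_loss`, the argument layout, and the small `eps` are assumptions for illustration.

```python
import numpy as np

def matching_loss(P_aug, gt_matches, unmatched_a, unmatched_b, eps=1e-12):
    """Negative log-likelihood of the ground-truth cells of P_aug.

    P_aug: (M+1, N+1) soft assignment including dustbins.
    gt_matches: list of (i, j) pairs; unmatched_a, unmatched_b: index lists.
    """
    dust_row, dust_col = P_aug.shape[0] - 1, P_aug.shape[1] - 1
    loss = 0.0
    for i, j in gt_matches:        # GT matches
        loss -= np.log(P_aug[i, j] + eps)
    for i in unmatched_a:          # keypoints of A that should go to the dustbin
        loss -= np.log(P_aug[i, dust_col] + eps)
    for j in unmatched_b:          # keypoints of B that should go to the dustbin
        loss -= np.log(P_aug[dust_row, j] + eps)
    return float(loss)

# Toy assignment for M = 2, N = 2 (illustrative values only).
P_aug = np.array([[0.50, 0.25, 0.25],
                  [0.10, 0.10, 0.80],
                  [0.40, 0.65, 1.00]])
loss = matching_loss(P_aug, gt_matches=[(0, 0)], unmatched_a=[1], unmatched_b=[1])
```

Minimizing this quantity is the same as maximizing the log-likelihood of the GT cells.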

Results: indoor - ScanNet

[Figure: SuperPoint + NN + heuristics vs. SuperPoint + SuperGlue]

SuperGlue: more correct matches and fewer mismatches

Results: outdoor - SfM

[Figure: SuperPoint + NN + mutual check vs. SuperPoint + NN + OA-Net (inlier classifier) vs. SuperPoint + SuperGlue]

SuperGlue: more correct matches and fewer mismatches

Results: attention patterns

[Figure panels: global context, neighborhood, distinctive keypoints, self-similarities, match candidates]

Flexibility of attention → diversity of patterns

Evaluation

[Figure panels: Homography estimation, Indoor pose estimation, Outdoor pose estimation, Ablation of SuperGlue; compared baselines: Heuristics, Learned inlier classifier]

SuperGlue yields large improvements in all cases
