6.891
Computer Vision and Applications
Prof. Trevor Darrell
Lecture 11: Model-based vision
• Hypothesize and test
• Interpretation Trees
• Alignment
• Pose Clustering
• Geometric Hashing
Readings: F&P Ch. 18.1-18.5
Last time
Projective SFM
– Projective spaces
– Cross ratio
– Factorization algorithm
– Euclidean upgrade
Projective transformations
Definition:
A projectivity is an invertible mapping h from P2 to itself such that three points x1, x2, x3 lie on the same line if and only if h(x1), h(x2), h(x3) do.

Theorem:
A mapping h: P2 → P2 is a projectivity if and only if there exists a non-singular 3×3 matrix H such that for any point in P2 represented by a vector x it is true that h(x) = Hx.

Definition: Projective transformation
$$\begin{pmatrix} x_1' \\ x_2' \\ x_3' \end{pmatrix} =
\begin{pmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
\qquad \text{or} \qquad \mathbf{x}' = H\,\mathbf{x}$$

8 DOF; projectivity = collineation = projective transformation = homography
[F&P, www.cs.unc.edu/~marc/mvg]
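As a concrete illustration (not from the slides), here is a minimal NumPy sketch of applying a 3×3 homography H to image points in homogeneous coordinates; the matrix and point values are arbitrary.

```python
import numpy as np

# An arbitrary non-singular 3x3 homography (8 DOF, since overall scale does not matter)
H = np.array([[1.0, 0.2, 5.0],
              [0.1, 1.1, -3.0],
              [1e-3, 2e-3, 1.0]])

def apply_homography(H, pts):
    """Map Nx2 inhomogeneous points through x' = Hx and de-homogenize."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])  # lift to homogeneous coordinates
    mapped = (H @ pts_h.T).T                          # x' = Hx, defined up to scale
    return mapped[:, :2] / mapped[:, 2:3]             # divide by the third coordinate

print(apply_homography(H, np.array([[10.0, 20.0], [30.0, 40.0]])))
```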
Cross ratio
The value of the cross ratio of four collinear points is independent of the intersecting line or plane:
[Figure: cross-ratio construction]
[F&P]
Two-frame reconstruction
(i) Compute F from correspondences
(ii) Compute camera matrices from F
(iii) Compute a 3D point for each pair of corresponding points

Computation of F: use the x'_i^T F x_i = 0 equations, linear in the coefficients of F; 8 points (linear), 7 points (non-linear), 8+ (least-squares) (more on this next class)

Computation of camera matrices, possible choice:

$$P = [\,I \mid \mathbf{0}\,], \qquad P' = [\,[\mathbf{e}']_{\times} F \mid \mathbf{e}'\,]$$

Triangulation: compute the intersection of the two backprojected rays
[www.cs.unc.edu/~marc/mvg]
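A rough sketch (not from the slides) of the linear estimate of F from the x'_i^T F x_i = 0 constraints; coordinate normalization and the 7-point variant are omitted.

```python
import numpy as np

def fundamental_8point(x1, x2):
    """x1, x2: Nx2 arrays of corresponding image points (N >= 8). Returns a rank-2 3x3 F."""
    A = np.zeros((len(x1), 9))
    for i, ((x, y), (xp, yp)) in enumerate(zip(x1, x2)):
        # x'_i^T F x_i = 0 expands to one equation linear in the 9 entries of F:
        A[i] = [xp * x, xp * y, xp, yp * x, yp * y, yp, x, y, 1.0]
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)                  # least-squares solution (smallest singular vector)
    U, S, Vt = np.linalg.svd(F)
    S[2] = 0.0                                # enforce the rank-2 constraint
    return U @ np.diag(S) @ Vt
```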
Perspective factorization
The camera equations

$$\lambda_{ij}\,\mathbf{m}_{ij} = P_i\,M_j, \qquad i = 1,\dots,m,\; j = 1,\dots,n$$

can be collected for all i as

$$\mathbf{m} = P\,M$$

where

$$\mathbf{m} = \begin{pmatrix} \mathbf{m}_1\Lambda_1 \\ \mathbf{m}_2\Lambda_2 \\ \vdots \\ \mathbf{m}_m\Lambda_m \end{pmatrix}, \qquad
P = \begin{pmatrix} P_1 \\ P_2 \\ \vdots \\ P_m \end{pmatrix}$$

with

$$\mathbf{m}_i = \begin{pmatrix} \mathbf{m}_{i1} & \mathbf{m}_{i2} & \cdots & \mathbf{m}_{in} \end{pmatrix}, \qquad
M = \begin{pmatrix} M_1 & M_2 & \cdots & M_n \end{pmatrix}, \qquad
\Lambda_i = \mathrm{diag}(\lambda_{i1}, \lambda_{i2}, \dots, \lambda_{in})$$

The image points m_ij are known, but Λ_i, P and M are unknown…
Observe that PM is the product of a 3m×4 matrix and a 4×n matrix, i.e. it is a rank-4 matrix.

[www.cs.unc.edu/~marc/mvg]
Iterative perspective factorization
When the Λ_i are unknown the following algorithm can be used:
1. Set λ_ij = 1 (affine approximation).
2. Factorize the rescaled measurement matrix m and obtain estimates of P and M. If σ_5 (the fifth singular value) is sufficiently small, then STOP.
3. Use m, P and M to estimate the Λ_i from the camera equations (linearly): m_i Λ_i = P_i M.
4. Go to 2.
In general the algorithm minimizes the proximity measure P(Λ, P, M) = σ_5.
Structure and motion are recovered up to an arbitrary projective transformation.
[www.cs.unc.edu/~marc/mvg]
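The loop below is a simplified sketch of the algorithm above, under the assumption that the image points are given as m_ij = (u, v, 1); the function name and the exact stopping test are mine, not part of the slides.

```python
import numpy as np

def iterative_perspective_factorization(m_obs, n_iters=20, tol=1e-6):
    """m_obs: (m, 3, n) array of homogeneous image points m_ij for m views and n points."""
    n_views, _, n_pts = m_obs.shape
    lambdas = np.ones((n_views, n_pts))                 # step 1: affine approximation
    P = M = None
    for _ in range(n_iters):
        # Rescaled 3m x n measurement matrix with entries lambda_ij * m_ij
        W = (lambdas[:, None, :] * m_obs).reshape(3 * n_views, n_pts)
        U, S, Vt = np.linalg.svd(W, full_matrices=False)
        P = U[:, :4] * S[:4]                            # step 2: rank-4 factorization, W ~ P M
        M = Vt[:4, :]
        if S[4] / S[0] < tol:                           # stop when sigma_5 is small
            break
        # Step 3: re-estimate projective depths from the camera equations P_i M_j = lambda_ij m_ij
        proj = (P @ M).reshape(n_views, 3, n_pts)
        lambdas = proj[:, 2, :]                         # third coordinate, since m_ij = (u, v, 1)
    return P, M, lambdas
```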
Euclidean upgrade

Given a camera with known intrinsic parameters, we can take the calibration matrix to be the identity and write the perspective projection equation in some Euclidean world coordinate system, up to a non-zero scale factor λ. If M_i and P_j denote the shape and motion parameters measured in some Euclidean coordinate system, there must exist a 4×4 matrix Q relating them to the projective shape and motion estimates; applying Q upgrades the projective reconstruction to a Euclidean one.
[F&P]
Today: “Model-based Vision”
Still feature and geometry-based, but now with moving objects rather than cameras…
Topics:
– Hypothesize and test
– Interpretation Trees
– Alignment
– Pose Clustering
– Invariances
– Geometric Hashing
Approach
• Given
  – CAD Models (with features)
  – Detected features in an image
• Hypothesize and test recognition…
  – Guess
  – Render
  – Compare
Hypothesize and Test Recognition
• Hypothesize object identity and correspondence
  – Recover pose
  – Render object in camera
  – Compare to image
• Issues
  – Where do the hypotheses come from?
  – How do we compare to the image (verification)?
Features?
• Points
but also,
• Lines
• Conics
• Other fitted curves
• Regions (particularly the center of a region, etc.)
How to generate hypotheses?
• Brute force
  – Construct a correspondence for all object features to every correctly sized subset of image points
  – Expensive search, which is also redundant
  – L objects with N features each
  – M features in the image
  – O(L·M^N) correspondences!
• Add geometric constraints to prune the search, leading to interpretation tree search
• Try subsets of features (frame groups)…
Interpretation Trees
• Tree of possible model-image feature assignments
• Depth-first search
• Prune when a unary (binary, …) constraint is violated (a search sketch follows below)
  – length
  – area
  – orientation
[Figure: interpretation tree expanding assignments (a,1), (b,2), …]
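A minimal depth-first interpretation-tree sketch (not from the slides). The unary and binary consistency tests, feature_compatible and pair_compatible, are hypothetical placeholders for constraints such as length, area, or relative orientation, and None plays the role of a wild card for spurious features.

```python
def interpretation_tree(model_feats, image_feats, feature_compatible, pair_compatible):
    """Depth-first search over model-to-image feature assignments with pruning."""
    solutions = []

    def search(level, assignment):
        if level == len(model_feats):
            solutions.append(list(assignment))        # a complete, consistent interpretation
            return
        for img in list(image_feats) + [None]:        # None = wild card ("no match")
            if img is not None:
                if not feature_compatible(model_feats[level], img):
                    continue                           # prune on a unary constraint
                # Prune on binary constraints against all earlier assignments
                ok = all(prev is None or
                         pair_compatible(model_feats[k], model_feats[level], prev, img)
                         for k, prev in enumerate(assignment))
                if not ok:
                    continue
            assignment.append(img)
            search(level + 1, assignment)              # depth-first descent
            assignment.pop()

    search(0, [])
    return solutions
```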
Interpretation Trees
[A.M. Wallace, 1988]
“Wild cards” handle spurious image features
Adding constraints
• Correspondences between image features and model features are not independent.
• A small number of good correspondences yields a reliable pose estimation --- the others must be consistent with this.
• Generate hypotheses using small numbers of correspondences (e.g. triples of points for a calibrated perspective camera, etc.)
Pose consistency / Alignment
• Given a known camera type in some unknown configuration (pose)
  – Hypothesize configuration from a set of initial features
  – Backproject
  – Test
• “Frame group”: a set of correspondences sufficient to estimate the configuration, e.g.,
  – 3 points
  – intersection of 2 or 3 line segments, and 1 point
Alignment
[Figures]
Pose clustering
• Each model leads to many correct sets of correspondences, each of which has the same pose
• Vote on pose, in an accumulator array (per object)
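A sketch of voting in a per-object pose accumulator for a 2D rigid pose (tx, ty, θ). Here estimate_pose is a hypothetical routine that recovers a pose from one frame group of correspondences, and the bin edges are purely illustrative, since (as the next slide notes) choosing bin sizes well is part of the difficulty.

```python
import numpy as np

def pose_cluster(frame_groups, estimate_pose,
                 tx_bins=np.linspace(-200, 200, 41),
                 ty_bins=np.linspace(-200, 200, 41),
                 th_bins=np.linspace(-np.pi, np.pi, 37)):
    """Vote each frame group's pose hypothesis into a (tx, ty, theta) accumulator array."""
    bins = (tx_bins, ty_bins, th_bins)
    accumulator = np.zeros([len(b) - 1 for b in bins], dtype=int)
    for group in frame_groups:                      # each group: a small correspondence set
        pose = estimate_pose(group)                 # hypothetical: returns (tx, ty, theta)
        idx = tuple(np.clip(np.digitize(v, b) - 1, 0, len(b) - 2)
                    for v, b in zip(pose, bins))
        accumulator[idx] += 1                       # one vote per consistent hypothesis
    best_bin = np.unravel_index(np.argmax(accumulator), accumulator.shape)
    return best_bin, accumulator
```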
Pose Clustering
[Figures]
Pose clustering: Problems
– Clutter may lead to more votes than the target!
– Difficult to pick the right bin size

Confidence-weighted clustering
– See where a model frame group is reliable (visible!)
– Downweight / discount votes from frame groups at poses where that frame group is unreliable…
[Figure: pick a feature pair; dark regions show reliable views of those features]
Detecting 0.1% inliers among 99.9% outliers?
• Example: David Lowe’s SIFT-based recognition system
• Goal: recognize clusters of just 3 consistent features among 3000 feature match hypotheses
• Approach
  – Vote for each potential match according to model ID and pose
  – Insert into multiple bins to allow for error in the similarity approximation
  – Using a hash table instead of an array avoids the need to form empty bins or predict the array size
[Lowe]
Lowe’s Model verification step
• Examine all clusters with at least 3 features
• Perform a least-squares affine fit to the model
• Discard outliers and perform a top-down check for additional features
• Evaluate the probability that the match is correct
  – Use a Bayesian model, with the probability that the features would arise by chance if the object were not present
  – Takes account of object size in the image, textured regions, model feature count in the database, and accuracy of fit (Lowe, CVPR 01)
[Lowe]
Solution for affine parameters
• Affine transform of [x, y] to [u, v]:

$$\begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} m_1 & m_2 \\ m_3 & m_4 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} t_x \\ t_y \end{pmatrix}$$

• Rewrite to solve for the transform parameters:

$$\begin{pmatrix} x & y & 0 & 0 & 1 & 0 \\ 0 & 0 & x & y & 0 & 1 \\ & & & \vdots & & \end{pmatrix}\begin{pmatrix} m_1 \\ m_2 \\ m_3 \\ m_4 \\ t_x \\ t_y \end{pmatrix} = \begin{pmatrix} u \\ v \\ \vdots \end{pmatrix}$$
[Lowe]
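A sketch of solving this linear system in the least-squares sense for the six affine parameters; the variable names are mine, not Lowe's.

```python
import numpy as np

def fit_affine(model_xy, image_uv):
    """model_xy, image_uv: Nx2 arrays of matched points (N >= 3). Returns 2x2 A and translation t."""
    N = len(model_xy)
    A = np.zeros((2 * N, 6))
    b = np.zeros(2 * N)
    for i, ((x, y), (u, v)) in enumerate(zip(model_xy, image_uv)):
        A[2 * i]     = [x, y, 0, 0, 1, 0]   # u = m1*x + m2*y + tx
        A[2 * i + 1] = [0, 0, x, y, 0, 1]   # v = m3*x + m4*y + ty
        b[2 * i], b[2 * i + 1] = u, v
    params, *_ = np.linalg.lstsq(A, b, rcond=None)  # least-squares solution
    m1, m2, m3, m4, tx, ty = params
    return np.array([[m1, m2], [m3, m4]]), np.array([tx, ty])
```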
Models for planar surfaces with SIFT keys:
[Lowe]
Planar recognition
• Planar surfaces can be reliably recognized at a rotation of 60° away from the camera
• Affine fit approximates perspective projection
• Only 3 points are needed for recognition
[Lowe]
3D Object Recognition
• Extract outlines with background subtraction
[Lowe]
3D Object Recognition
• Only 3 keys are needed for recognition, so extra keys provide robustness
• Affine model is no longer as accurate
[Lowe]
Recognition under occlusion
[Lowe]
Location recognition
[Lowe]
Robot Localization
• Joint work with Stephen Se, Jim Little
[Lowe]
Map continuously built over time
[Lowe]
Locations of map features in 3D
[Lowe]
Invariant recognition
• Affine invariants
  – Planar invariants
  – Geometric hashing
• Projective invariants
  – Determinant ratio
• Curve invariants
Invariance
• There are geometric properties that are invariant to camera transformations
• Easiest case: view a plane object in scaled orthography.
• Assume we have three base points P_i on the object
  – then any other point on the object can be written as

$$P_k = P_1 + \mu_{ka}\,(P_2 - P_1) + \mu_{kb}\,(P_3 - P_1)$$
Invariance
• Now image points are obtained by multiplying by a plane affine transformation, so
$$\begin{aligned} p_k &= A\,P_k \\ &= A\big(P_1 + \mu_{ka}(P_2 - P_1) + \mu_{kb}(P_3 - P_1)\big) \\ &= p_1 + \mu_{ka}(p_2 - p_1) + \mu_{kb}(p_3 - p_1) \end{aligned}$$
Invariance
Given the base points in the image, read off the µ values for the object
– they’re the same in the object and in the image --- invariant
– search correspondences, form µ’s and vote

$$p_k = A\,P_k = A\big(P_1 + \mu_{ka}(P_2 - P_1) + \mu_{kb}(P_3 - P_1)\big) = p_1 + \mu_{ka}(p_2 - p_1) + \mu_{kb}(p_3 - p_1)$$

$$P_k = P_1 + \mu_{ka}(P_2 - P_1) + \mu_{kb}(P_3 - P_1)$$
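A small sketch (not from the slides) that computes the (µa, µb) coordinates of a point in the basis of three base points and checks that they are unchanged by an arbitrary affine map; the specific matrix and points are illustrative only.

```python
import numpy as np

def affine_coords(p1, p2, p3, pk):
    """Solve pk = p1 + mu_a*(p2 - p1) + mu_b*(p3 - p1) for (mu_a, mu_b); 2D points."""
    B = np.column_stack([p2 - p1, p3 - p1])      # 2x2 basis matrix
    mu = np.linalg.solve(B, pk - p1)
    return mu[0], mu[1]

# The coordinates are preserved by any affine transform x -> A x + t:
A, t = np.array([[1.2, 0.3], [-0.1, 0.9]]), np.array([5.0, -2.0])
P = [np.array(v, dtype=float) for v in ([0, 0], [1, 0], [0, 1], [0.4, 0.3])]
p = [A @ v + t for v in P]
print(affine_coords(*P), affine_coords(*p))      # both print (0.4, 0.3)
```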
Geometric Hashing
• Objects are represented as sets of “features”
• Preprocessing:
  – For each tuple b of features, compute the location (µ) of all other features in the basis defined by b
  – Create a table indexed by (µ)
  – Each entry contains b and the object ID
S. Rusinkiewicz [http://www.cs.princeton.edu/courses/archive/fall03/cs597D/lectures/rigid_registration.pdf]
GH: Identification
• Find features in the target image
• Choose an arbitrary basis b’
• For each feature:
  – Compute (µ’) in basis b’
  – Look up in the table and vote for (Object, b)
• For each (Object, b) with many votes:
  – Compute the transformation that maps b to b’
  – Confirm the presence of the object, using all available features

S. Rusinkiewicz
[http://www.cs.princeton.edu/courses/archive/fall03/cs597D/lectures/rigid_registration.pdf]
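A compact sketch of the preprocessing and identification stages above for 2D point features, using the affine (µa, µb) coordinates from the earlier slides. The quantization step, the exhaustive basis enumeration, and the final transform-confirmation stage are simplified assumptions, not the reference implementation.

```python
from itertools import permutations
from collections import defaultdict
import numpy as np

def quantize(mu, step=0.25):
    """Coarsely bin (mu_a, mu_b) so nearby values hash to the same table key."""
    return (round(mu[0] / step), round(mu[1] / step))

def build_table(models, step=0.25):
    """models: dict object_id -> list of 2D points. Preprocessing: fill the hash table."""
    table = defaultdict(list)
    for obj_id, raw_pts in models.items():
        pts = [np.asarray(p, float) for p in raw_pts]
        for basis in permutations(range(len(pts)), 3):        # every ordered triple b
            p1, p2, p3 = (pts[i] for i in basis)
            B = np.column_stack([p2 - p1, p3 - p1])
            if abs(np.linalg.det(B)) < 1e-9:
                continue                                       # skip degenerate bases
            for k, pk in enumerate(pts):
                if k in basis:
                    continue
                mu = np.linalg.solve(B, pk - p1)               # coordinates in basis b
                table[quantize(mu, step)].append((obj_id, basis))
    return table

def recognize(table, image_pts, basis_idx=(0, 1, 2), step=0.25):
    """Identification: vote for (object, basis) pairs using one arbitrary image basis b'."""
    pts = [np.asarray(p, float) for p in image_pts]
    p1, p2, p3 = (pts[i] for i in basis_idx)
    B = np.column_stack([p2 - p1, p3 - p1])
    votes = defaultdict(int)
    for k, pk in enumerate(pts):
        if k in basis_idx:
            continue
        mu = np.linalg.solve(B, pk - p1)
        for obj_id, basis in table.get(quantize(mu, step), []):
            votes[(obj_id, basis)] += 1                        # vote for (Object, b)
    return sorted(votes.items(), key=lambda kv: -kv[1])        # best hypotheses first
```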
Geometric Hashing
Wolfson and Rigoutsos, Geometric Hashing, an Overview, 1997 [http://www.cs.princeton.edu/courses/archive/fall03/cs597D/lectures/rigid_registration.pdf]
Basis Geometric Hashing
Wolfson and Rigoutsos, Geometric Hashing, an Overview, 1997 [http://www.cs.princeton.edu/courses/archive/fall03/cs597D/lectures/rigid_registration.pdf]
Geometric Hashing
Wolfson and Rigoutsos, Geometric Hashing, an Overview, 1997 [http://www.cs.princeton.edu/courses/archive/fall03/cs597D/lectures/rigid_registration.pdf]
Indexing with invariants
• Generalize to heterogeneous geometric features
• Groups of features with identity information invariant to pose – invariant bearing groups
Projective invariants
• Projective invariant for coplanar points
• The perspective projection p = MP of coplanar points reduces to a plane perspective transform: p = AP, with a 3×3 matrix A
• The determinant ratio of 5-point tuples is invariant:

$$\frac{\det([\,p_i\;p_j\;p_k\,])\,\det([\,p_i\;p_l\;p_m\,])}{\det([\,p_i\;p_j\;p_l\,])\,\det([\,p_i\;p_k\;p_m\,])}$$
$$\frac{\det([\,p_i\;p_j\;p_k\,])\,\det([\,p_i\;p_l\;p_m\,])}{\det([\,p_i\;p_j\;p_l\,])\,\det([\,p_i\;p_k\;p_m\,])}
= \frac{\det([\,AP_i\;AP_j\;AP_k\,])\,\det([\,AP_i\;AP_l\;AP_m\,])}{\det([\,AP_i\;AP_j\;AP_l\,])\,\det([\,AP_i\;AP_k\;AP_m\,])}$$

$$= \frac{\det(A[\,P_i\;P_j\;P_k\,])\,\det(A[\,P_i\;P_l\;P_m\,])}{\det(A[\,P_i\;P_j\;P_l\,])\,\det(A[\,P_i\;P_k\;P_m\,])}
= \frac{\det(A)^2}{\det(A)^2}\cdot\frac{\det([\,P_i\;P_j\;P_k\,])\,\det([\,P_i\;P_l\;P_m\,])}{\det([\,P_i\;P_j\;P_l\,])\,\det([\,P_i\;P_k\;P_m\,])}$$

$$= \frac{\det([\,P_i\;P_j\;P_k\,])\,\det([\,P_i\;P_l\;P_m\,])}{\det([\,P_i\;P_j\;P_l\,])\,\det([\,P_i\;P_k\;P_m\,])}$$
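A short sketch (not from the slides) that evaluates the five-point determinant ratio and checks numerically that it is unchanged by an arbitrary plane projective transform A; the point coordinates and A are illustrative only.

```python
import numpy as np

def det3(a, b, c):
    """Determinant of the 3x3 matrix whose columns are the homogeneous points a, b, c."""
    return np.linalg.det(np.column_stack([a, b, c]))

def determinant_ratio(pi, pj, pk, pl, pm):
    """det([pi pj pk]) det([pi pl pm]) / (det([pi pj pl]) det([pi pk pm]))"""
    return (det3(pi, pj, pk) * det3(pi, pl, pm)) / (det3(pi, pj, pl) * det3(pi, pk, pm))

# Five coplanar points in homogeneous coordinates and an arbitrary 3x3 projective transform A
P = [np.array([x, y, 1.0]) for x, y in [(0, 0), (1, 0), (0, 1), (1, 1), (0.3, 0.7)]]
A = np.array([[1.1, 0.2, 0.5],
              [-0.3, 0.9, 1.0],
              [0.01, 0.02, 1.0]])
p = [A @ X for X in P]                                 # transformed points
print(determinant_ratio(*P), determinant_ratio(*p))    # equal values: the ratio is invariant
```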
Tangent invariance
• Incidence is preserved despite transformation
• Transform four points above to unit square: measurements in this canonical frame will be invariant to pose.
M-curve construction
Verification?
• Edge score
  – Are there image edges near predicted object edges?
  – Very unreliable; in texture, the answer is usually yes
• Oriented edge score
  – Are there image edges near predicted object edges with the right orientation?
  – Better, but still hard to do well (see next slide)
• Texture largely ignored [Forsyth]
  – e.g. does the spanner have the same texture as the wood?
Algorithm Sensitivity
Grimson and Huttenlocher, 1990
• Geometric Hashing
  – A relatively sparse hash table is critical for good performance
  – The method is not robust for cluttered scenes (full hash table) or noisy data (uncertainty in hash values)
• Generalized Hough Transform
  – Does not scale well to multi-object, complex scenes
  – Also suffers from matching uncertainty with noisy data
[http://www.cs.princeton.edu/courses/archive/fall03/cs597D/lectures/rigid_registration.pdf]
Comparison to template matching
• Costs of template matching
  – 250,000 locations × 30 orientations × 4 scales = 30,000,000 evaluations
  – Does not easily handle partial occlusion and other variation without a large increase in the number of templates
  – A Viola & Jones cascade must start again for each qualitatively different template
• Costs of the local feature approach
  – 3000 evaluations (a reduction by a factor of 10,000)
  – Features are more invariant to illumination, 3D rotation, and object variation
  – Use of many small subtemplates increases robustness to partial occlusion and other variations
[Lowe]
Today: “Model-based Vision”
• Hypothesize and test
• Interpretation Trees
• Alignment
• Pose Clustering
• Invariances
• Geometric Hashing
• Tuesday: Project previews!