
Tensors - engineering.purdue.edu

Page 1: Tensors

o Contravariant and covariant tensors
o Einstein notation
o Interpretation

Page 2: What is a Tensor?

§ It is the generalization of a:
– Matrix operator for >2 dimensions
– Vector data to >1 dimension

§ Why do we need it?
– Came out of differential geometry.
– Was used by Albert Einstein to formulate the theory of General Relativity.
– It's needed for many problems, including Deep NNs.

[Figure: block diagram of an operator mapping an input to an output]

Page 3: Contravariant and Covariant Vectors

§ Contravariant vectors:
– "Column vectors" that represent data
– Vectors that describe the position of something
– $x^i$ for $0 \le i < N$, with $x \in \Re^N$

§ Covariant vectors:
– "Row vectors" that operate on data
– Gradient vectors
– $a_i$ for $0 \le i < N$

§ Einstein notation
$$y = a_i x^i = \sum_{i=0}^{N-1} a_i x^i$$
– Leave out the sum to make notation cleaner
– Always sum over any index that appears twice

In matrix form: $y = a x$, where $a$ is $1 \times N$ (covariant) and $x$ is $N \times 1$ (contravariant).

[Figure: picture of the row-by-column multiplication]
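The contraction above maps directly onto `numpy.einsum`. A minimal sketch with made-up values, showing that $y = a_i x^i$ is just the explicit sum with the summation sign left out:

```python
import numpy as np

# A hypothetical covariant vector a (row, operates on data) and a
# contravariant vector x (column, represents data).
a = np.array([1.0, 2.0, 3.0])
x = np.array([4.0, 5.0, 6.0])

# Einstein notation y = a_i x^i: sum over the repeated index i.
y = np.einsum('i,i->', a, x)

# The same contraction written as the explicit sum it abbreviates.
y_explicit = sum(a[i] * x[i] for i in range(len(x)))

assert np.isclose(y, y_explicit)
print(y)  # 32.0
```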

Page 4: Vector-Matrix Products as Tensors

§ 1D contravariant vectors:
– $x^i$ for $0 \le i < N_x$
– $y^j$ for $0 \le j < N_y$

§ 2D tensor (i.e., matrix):
– $A^j_i$ for $0 \le j < N_y$ and $0 \le i < N_x$
– There is a gradient for each component $y^j$

§ Einstein notation
$$y^j = A^j_i x^i = \sum_{i=0}^{N_x-1} A^j_i x^i$$
– Leave out the sum to make notation cleaner
– Always sum over any index that appears twice

In matrix form: $y = A x$, where $A$ is $N_y \times N_x$, $x$ is $N_x \times 1$, and $y$ is $N_y \times 1$. The rows of $A$ are covariant; $x$ is contravariant.

[Figure: picture of the matrix-vector multiplication]
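As a sketch with arbitrary values, the same contraction written both in Einstein notation via `np.einsum` and as the ordinary matrix-vector product:

```python
import numpy as np

Ny, Nx = 2, 3
A = np.arange(Ny * Nx, dtype=float).reshape(Ny, Nx)  # A[j, i] = A^j_i
x = np.array([1.0, 2.0, 3.0])                        # x[i] = x^i

# y^j = A^j_i x^i: sum over the repeated index i.
y = np.einsum('ji,i->j', A, x)

assert np.allclose(y, A @ x)  # identical to the ordinary matrix-vector product
print(y)  # [ 8. 26.]
```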

Page 5: Tensor Products

§ Einstein notation
$$y^{j_1,j_2} = A^{j_1,j_2}_{i_1,i_2}\, x^{i_1,i_2}$$

§ 2-D contravariant vectors:
– $x^{i_1,i_2}$ for $0 \le i_1, i_2 < N_x$
– $y^{j_1,j_2}$ for $0 \le j_1, j_2 < N_y$

§ 4-D tensor:
– $A^{j_1,j_2}_{i_1,i_2}$ for $0 \le j_1, j_2 < N_y$ and $0 \le i_1, i_2 < N_x$
– $A$ is known as a tensor
• 2D covariant input
• 2D contravariant output

Sum over pairs of repeated indices.
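A small numeric sketch of the 4-D tensor product, with arbitrary random values; the repeated pair $i_1, i_2$ is summed:

```python
import numpy as np

rng = np.random.default_rng(0)
Nx, Ny = 4, 3
A = rng.standard_normal((Ny, Ny, Nx, Nx))  # A[j1, j2, i1, i2]
x = rng.standard_normal((Nx, Nx))          # x[i1, i2]

# y^{j1,j2} = A^{j1,j2}_{i1,i2} x^{i1,i2}: sum over the repeated pair i1, i2.
y = np.einsum('jkmn,mn->jk', A, x)

# Explicit double sum for one output component, as a sanity check.
y00 = sum(A[0, 0, i1, i2] * x[i1, i2]
          for i1 in range(Nx) for i2 in range(Nx))
assert np.isclose(y[0, 0], y00)
assert y.shape == (Ny, Ny)
```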

Page 6: Picture of a Tensor Product

§ For example, if we have a 3-D tensor $A$:
$$y^j = A^j_{i_1,i_2}\, x^{i_1,i_2}$$
– General idea: $x \in \Re^{N_1 \times N_2}$
– Input: 2D image $x$ indexed by $i_1, i_2$
– Output: 1D vector $y$ indexed by $j$

[Figure: $y = A x$, where $x$ is a contravariant 2D tensor and $A$ is a 3D tensor]
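A sketch with arbitrary shapes, showing that contracting a 3-D tensor against a 2D image is equivalent to an ordinary matrix-vector product after flattening the paired input indices:

```python
import numpy as np

rng = np.random.default_rng(1)
N1, N2, Nj = 5, 4, 3
A = rng.standard_normal((Nj, N1, N2))   # A[j, i1, i2]
x = rng.standard_normal((N1, N2))       # a 2D "image"

# y^j = A^j_{i1,i2} x^{i1,i2}
y = np.einsum('jmn,mn->j', A, x)

# The same contraction is an ordinary matrix-vector product after
# flattening the paired indices (i1, i2) into one index.
y_flat = A.reshape(Nj, N1 * N2) @ x.reshape(N1 * N2)
assert np.allclose(y, y_flat)
```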

Page 7: Some Useful Definitions

§ Delta functions:
$$\delta^j_i = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \ne j \end{cases}$$
– So then we have that $x^j = x^i \delta^j_i$

§ Gradient w.r.t. a vector: $\nabla_x (Ax) = A$
– In tensor notation, we have that
$$[\nabla_x (Ax)]^j_i = A^j_i$$

§ Gradient w.r.t. a matrix: $\nabla_A (Ax) = \,?$
– First, it must have 1 output dimension and 2 input dimensions. So we have that
$$[\nabla_A (Ax)]^j_{i_1,i_2} = \delta^j_{i_1}\, x^{i_2}$$

Page 8: GD for Single Layer NN

o Structure of the Gradient
o Gradients for NN parameters
o Updates for NN parameters

Page 9: Gradient Direction for Single Layer NN

§ Single layer NN:
$$f_\theta(x) = B\,\sigma(Ax + b)$$
– We will need the gradient w.r.t. the parameters $\theta = (A, B, b)$:
$$\nabla_\theta f_\theta(x) = \left[\nabla_A f_{A,B,b}(x),\; \nabla_B f_{A,B,b}(x),\; \nabla_b f_{A,B,b}(x)\right]$$
– Later, we will also need:
$$\nabla_x f_\theta(x)$$

[Figure: block diagram $x \to A \to (+\,b) \to \sigma(\cdot) \to B \to f_\theta(x)$]

Page 10: Gradient Structure for Single Layer NN

§ Single layer NN:
– Parameters are $\theta = (A, B, b)$
– We will need the parameter gradients:
$$\nabla_\theta f_\theta(x) = \left[\nabla_A f_{A,B,b}(x),\; \nabla_B f_{A,B,b}(x),\; \nabla_b f_{A,B,b}(x)\right]$$
– And the input gradient:
$$\nabla_x f_\theta(x)$$

[Figure: block diagram of $f_\theta(x) = B\,\sigma(Ax+b)$ annotated with the gradients $\nabla_A f$, $\nabla_b f$, $\nabla_B f$, and $\nabla_x f$ at the corresponding blocks]

Page 11: Gradient w.r.t. $x$

§ For this case, using Einstein notation:
$$\nabla_x f = \nabla_x f_\theta(x) = B\,[\nabla\sigma](Ax+b)\,A$$

– In index form:
$$[\nabla_x f]^{j_1}_{i} = B^{j_1}_{j_2}\, [\nabla\sigma]^{j_2}_{i_2}\, A^{i_2}_{i}$$

This is a matrix, or 2-D tensor.

[Figure: block diagram $x \to A \to (+\,b) \to \sigma(\cdot) \to B \to f_\theta(x)$]
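This Jacobian can be verified numerically. The slides leave $\sigma$ generic, so `tanh` is assumed here purely for illustration (its componentwise derivative makes $\nabla\sigma$ diagonal); the chain-rule product is checked against finite differences:

```python
import numpy as np

rng = np.random.default_rng(3)
Nx, Nh, Ny = 4, 5, 3
A = rng.standard_normal((Nh, Nx))
b = rng.standard_normal(Nh)
B = rng.standard_normal((Ny, Nh))
x = rng.standard_normal(Nx)

f = lambda x_: B @ np.tanh(A @ x_ + b)   # sigma = tanh, assumed

# Chain rule: grad_x f = B [grad sigma](Ax+b) A, with grad sigma diagonal
# because tanh acts componentwise.
z = A @ x + b
J = B @ np.diag(1.0 - np.tanh(z) ** 2) @ A   # Ny x Nx matrix (2-D tensor)

# Finite-difference check, one input coordinate at a time.
eps = 1e-6
J_fd = np.zeros((Ny, Nx))
for i in range(Nx):
    dx = np.zeros(Nx)
    dx[i] = eps
    J_fd[:, i] = (f(x + dx) - f(x)) / eps

assert np.allclose(J, J_fd, atol=1e-4)
```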

Page 12: Gradient w.r.t. $A$

§ For this case, using the chain rule:
$$\nabla_A f = B\,[\nabla\sigma](Ax+b)\,\nabla_A(Ax)$$
– Using Einstein notation, since
$$[\nabla_A (Ax)]^{j}_{i_1,i_2} = \delta^{j}_{i_1}\, x^{i_2}$$
– Then
$$[\nabla_A f]^{j_1}_{i_1,i_2} = B^{j_1}_{j_2}\,[\nabla\sigma]^{j_2}_{j_3}\,[\nabla_A (Ax)]^{j_3}_{i_1,i_2} = B^{j_1}_{j_2}\,[\nabla\sigma]^{j_2}_{j_3}\,\delta^{j_3}_{i_1}\, x^{i_2}$$
$$[\nabla_A f]^{j_1}_{i_1,i_2} = B^{j_1}_{j_2}\,[\nabla\sigma]^{j_2}_{i_1}\, x^{i_2}$$

This is a tensor!

[Figure: block diagram $x \to A \to (+\,b) \to \sigma(\cdot) \to B \to f_\theta(x)$]
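The 3-index result can be checked numerically. Again `tanh` is assumed for $\sigma$, purely for illustration; the index formula is built with `einsum` and compared against finite differences in each entry of $A$:

```python
import numpy as np

rng = np.random.default_rng(4)
Nx, Nh, Ny = 3, 4, 2
A = rng.standard_normal((Nh, Nx))
b = rng.standard_normal(Nh)
B = rng.standard_normal((Ny, Nh))
x = rng.standard_normal(Nx)

f = lambda A_: B @ np.tanh(A_ @ x + b)  # sigma = tanh, assumed

# [grad_A f]^{j1}_{i1,i2} = B^{j1}_{j2} [grad sigma]^{j2}_{i1} x^{i2}
z = A @ x + b
dsig = np.diag(1.0 - np.tanh(z) ** 2)          # Nh x Nh, diagonal
G = np.einsum('jk,km,n->jmn', B, dsig, x)      # Ny x Nh x Nx, a 3-D tensor

eps = 1e-6
G_fd = np.zeros((Ny, Nh, Nx))
for i1 in range(Nh):
    for i2 in range(Nx):
        Ap = A.copy()
        Ap[i1, i2] += eps
        G_fd[:, i1, i2] = (f(Ap) - f(A)) / eps

assert np.allclose(G, G_fd, atol=1e-4)
```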

Page 13: Gradient w.r.t. $b$

§ For this case, using Einstein notation:
$$\nabla_b f = B\,[\nabla\sigma](Ax+b)\,\nabla_b(Ax+b) = B\,[\nabla\sigma]\,I = B\,[\nabla\sigma]$$

– In index form:
$$[\nabla_b f]^{j_1}_{i} = B^{j_1}_{j_2}\,[\nabla\sigma]^{j_2}_{i}$$

[Figure: block diagram $x \to A \to (+\,b) \to \sigma(\cdot) \to B \to f_\theta(x)$]

Page 14: Gradient w.r.t. $B$

§ For this case,
$$\nabla_B f = \nabla_B (B\sigma)$$
where $\sigma = \sigma(Ax+b)$.
– Using Einstein notation, since
$$[\nabla_B (B\sigma)]^{j}_{i_1,i_2} = \delta^{j}_{i_1}\, \sigma^{i_2}$$
– We have that
$$[\nabla_B f]^{j}_{i_1,i_2} = \delta^{j}_{i_1}\, \sigma^{i_2}$$

This is a tensor!

[Figure: block diagram $x \to A \to (+\,b) \to \sigma(\cdot) \to B \to f_\theta(x)$]
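Because $f$ is linear in $B$, this gradient has the same $\delta\,x$-style structure as $\nabla_A(Ax)$ on Page 7. A sketch (with `tanh` assumed for $\sigma$) checking the formula against finite differences:

```python
import numpy as np

rng = np.random.default_rng(5)
Nx, Nh, Ny = 3, 4, 2
A = rng.standard_normal((Nh, Nx))
b = rng.standard_normal(Nh)
B = rng.standard_normal((Ny, Nh))
x = rng.standard_normal(Nx)

f = lambda B_: B_ @ np.tanh(A @ x + b)  # sigma = tanh, assumed

# [grad_B f]^{j}_{i1,i2} = delta^{j}_{i1} sigma^{i2}
sig = np.tanh(A @ x + b)
G = np.einsum('ji,k->jik', np.eye(Ny), sig)   # Ny x Ny x Nh

eps = 1e-6
G_fd = np.zeros((Ny, Ny, Nh))
for i1 in range(Ny):
    for i2 in range(Nh):
        Bp = B.copy()
        Bp[i1, i2] += eps
        G_fd[:, i1, i2] = (f(Bp) - f(B)) / eps

assert np.allclose(G, G_fd, atol=1e-4)
```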

Page 15: Update Direction for $A$

Gradient step: $A \leftarrow A + \alpha\,\varepsilon^T$, where $\varepsilon$ is given by
$$\varepsilon_{i_1,i_2} = -[\nabla_A L(\theta)]_{i_1,i_2} = \sum_{n=0}^{N-1} c_{n,j_1}\, B^{j_1}_{j_2}\, [\nabla\sigma_n]^{j_2}_{i_1}\, x_n^{i_2}$$
where $c_n$ is the covariant (row) gradient of the loss for training sample $n$ w.r.t. the network output.

For efficiency, computation goes this way (left to right):
$$\underbrace{c_n}_{1 \times N_y}\;\underbrace{B}_{N_y \times N_h}\;\underbrace{\nabla\sigma_n}_{N_h \times N_h}\;\underbrace{x_n}_{N_x}$$

Page 16: Update Direction for $b$

Gradient step: $b \leftarrow b + \alpha\,\varepsilon^T$, where $\varepsilon$ is given by
$$\varepsilon_{i} = -[\nabla_b L(\theta)]_{i} = \sum_{n=0}^{N-1} c_{n,j_1}\, B^{j_1}_{j_2}\, [\nabla\sigma_n]^{j_2}_{i}$$

For efficiency, computation goes this way (left to right):
$$\underbrace{c_n}_{1 \times N_y}\;\underbrace{B}_{N_y \times N_h}\;\underbrace{\nabla\sigma_n}_{N_h \times N_h}$$
Each term is a $1 \times N_h$ row vector; its transpose gives the $N_h \times 1$ update.

Page 17: Update Direction for $B$

Gradient step: $B \leftarrow B + \alpha\,\varepsilon^T$, where $\varepsilon$ is given by
$$\varepsilon_{i_1,i_2} = -[\nabla_B L(\theta)]_{i_1,i_2} = \sum_{n=0}^{N-1} c_{n,i_1}\, \sigma_n^{i_2}$$

For efficiency, computation goes this way: for each sample, an outer product of
$$\underbrace{c_n}_{1 \times N_y}\;\text{and}\;\underbrace{\sigma_n}_{N_h}$$
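The three update directions can be sketched together. The slides leave the loss and activation generic, so a squared-error loss $L = \tfrac{1}{2}\sum_n \|y_n - f_\theta(x_n)\|^2$ and $\sigma = \tanh$ are assumed here for illustration; $c_n = (y_n - f_\theta(x_n))^T$ then makes each $\varepsilon = -\nabla L$, and the contractions run left to right over small row vectors:

```python
import numpy as np

rng = np.random.default_rng(6)
Nx, Nh, Ny, N = 3, 4, 2, 8
A = rng.standard_normal((Nh, Nx))
b = rng.standard_normal(Nh)
B = rng.standard_normal((Ny, Nh))
X = rng.standard_normal((N, Nx))   # N training inputs
Y = rng.standard_normal((N, Ny))   # N training targets
alpha = 1e-3

def loss(A, b, B):
    # Assumed squared-error loss over the training set.
    return 0.5 * sum(np.sum((Y[n] - B @ np.tanh(A @ X[n] + b)) ** 2)
                     for n in range(N))

L0 = loss(A, b, B)

eps_A = np.zeros((Nh, Nx))
eps_b = np.zeros(Nh)
eps_B = np.zeros((Ny, Nh))
for n in range(N):
    z = A @ X[n] + b
    sig = np.tanh(z)                  # sigma = tanh, assumed
    c = Y[n] - B @ sig                # row gradient c_n, makes eps = -grad L
    cBds = (c @ B) * (1.0 - sig**2)   # c_n B grad-sigma_n, left to right
    eps_A += np.outer(cBds, X[n])     # outer product with x_n
    eps_b += cBds
    eps_B += np.outer(c, sig)         # outer product with sigma_n

# Gradient step on each parameter.
A, b, B = A + alpha * eps_A, b + alpha * eps_b, B + alpha * eps_B
assert loss(A, b, B) < L0  # one small step decreases the loss
```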

Page 18: Local and Global Minima

o Open and Closed Sets
o Convex Sets and Functions
o Properties of Convex Functions
o Local Minimum, Saddle Points, and Global Minima
o Optimization Theorems

Page 19: Open and Closed Sets

§ Define:
– $S \subset \Re^N$
– The open ball of radius $\epsilon$ is $B(x_0, \epsilon) = \{ x \in \Re^N : \| x - x_0 \| < \epsilon \}$.

§ A set $S$ is open if
– At every point, there is an open ball contained in $S$.
– $\forall x \in S$, $\exists \epsilon > 0$ s.t. $B(x, \epsilon) \subset S$.

§ A set $S$ is closed if $S^c = \Re^N - S$ is open.

§ A set $S$ is compact if it is closed and bounded.

§ Facts:
– $\Re^N$ is both open and closed, but it is not compact.
– If $S$ is compact, then every sequence in $S$ has a limit point in $S$.

Page 20: Convex Sets

§ A set $S$ is convex if
$$\forall \lambda \in [0,1],\; \forall x, y \in S, \text{ then } \lambda y + (1-\lambda)x \in S$$

§ Properties:
– The intersection of convex sets is convex

[Figure: a nonconvex set, where the line connecting two points is sometimes outside the set, versus a convex set, where the line connecting two points is always in the set]

Page 21: Convex Functions

§ Let $f: S \to \Re$ where $S$ is a convex set. Then we say that $f$ is a convex function if
$$\forall \lambda \in [0,1],\; \forall x, y \in S, \text{ then } f(\lambda y + (1-\lambda)x) \le \lambda f(y) + (1-\lambda) f(x)$$

§ Properties:
– The sum of convex functions is convex
– The maximum of a set of convex functions is convex
– If $f(x)$ is convex, then $f(Ax)$ is also convex
– $f(x)$ is concave if $-f(x)$ is convex
– $f(x) = Ax + b$ is both convex and concave

[Figure: a convex function, where the chord is always above the function, versus a nonconvex function, where the chord is sometimes below the function]

Page 22: Properties of Convex Functions

§ Valuable properties of convex functions:
• The sum of convex functions is convex.
– Let $f(x) = \sum_k f_k(x)$. If the $f_k(x)$ are convex, then $f(x)$ is convex.
• The maximum of a set of convex functions is convex.
– Let $f(x) = \max_k f_k(x)$. If the $f_k(x)$ are convex, then $f(x)$ is convex.
• The second derivative of a convex function is non-negative.
– Let $f(x)$ have two continuous derivatives, and let $H_{i,j}(x) = \frac{\partial^2 f}{\partial x_i \partial x_j}$ be the Hessian of $f$ at $x$. Then $f(x)$ is convex if and only if $H(x)$ is non-negative definite for all $x$.
• A convex function of a linear transform is convex.
– If $f(x)$ is convex, then $f(Ax)$ is also convex.
• An affine transform is both convex and concave.
– $f(x) = Ax + b$ is both convex and concave.
• A function is concave if its negative is convex.
– $f(x)$ is concave if $-f(x)$ is convex.
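The Hessian test can be sketched numerically on two simple functions (chosen here for illustration): $f(x) = \|x\|^2$ has a constant Hessian $2I$, while $g(x) = x_1^2 - x_2^2$ has one negative eigenvalue and so fails the test:

```python
import numpy as np

# Hessians of f(x) = ||x||^2 and g(x) = x1^2 - x2^2 (both constant in x).
H_f = np.array([[2.0, 0.0], [0.0, 2.0]])
H_g = np.array([[2.0, 0.0], [0.0, -2.0]])

def nonneg_definite(H):
    # Non-negative definite <=> all eigenvalues of the symmetric H are >= 0.
    return bool(np.all(np.linalg.eigvalsh(H) >= 0))

assert nonneg_definite(H_f)       # f is convex
assert not nonneg_definite(H_g)   # g is not convex
```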

Page 23: Local Minimum

§ Let $f: S \to \Re$ where $S \subset \Re^N$.
– We say that $x_0 \in S$ is a local minimum of $f$ if $\exists \epsilon > 0$ s.t. $\forall x \in S \cap B(x_0, \epsilon)$, $f(x) \ge f(x_0)$.

§ Necessary condition for local minima:
– Let $f$ be continuously differentiable and let $x_0$ be a local minimum. Then $\nabla f(x_0) = 0$.

[Figure: a curve with $x_0$ marked at a local minimum; $x_0$ a local minimum $\Rightarrow \nabla f(x_0) = 0$]

Page 24: Saddle Point

§ Let $f: S \to \Re$, where $S \subset \Re^N$, be a continuously differentiable function.
– We say that $x_0 \in S$ is a saddle point of $f$ if $\nabla f(x_0) = 0$ and $x_0$ is not a local minimum.
– Saddle points can cause more problems than you might think.

[Figure: a saddle point in 1D, and the graph of $f(x) = x_1^2 - x_2^2$ showing a saddle point in 2D (*shared from Wikipedia)]
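A quick numeric sketch of the 2D example: the gradient of $f(x) = x_1^2 - x_2^2$ vanishes at the origin, yet nearby points achieve strictly lower values, so the origin is a saddle point rather than a local minimum:

```python
import numpy as np

f = lambda x: x[0] ** 2 - x[1] ** 2
grad = lambda x: np.array([2 * x[0], -2 * x[1]])

x0 = np.zeros(2)
assert np.allclose(grad(x0), 0)          # stationary point at the origin
assert f(np.array([0.0, 0.1])) < f(x0)   # but nearby points go lower
assert f(np.array([0.1, 0.0])) > f(x0)   # and other nearby points go higher
```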

Page 25: Global Minimum

§ Let $f: S \to \Re$ where $S \subset \Re^N$.
– We say that $x_0 \in S$ is a global minimum of $f$ if $\forall x \in S$, $f(x) \ge f(x_0)$.

§ Comments:
– In general, finding the global minimum is difficult.
– Gradient descent optimization typically becomes trapped in local minima.

[Figure: a nonconvex curve with GD steps descending into a local minimum rather than the global minimum $x_0$]

Page 26: Optimization Theorems

§ Let $f: S \to \Re$ where $S \subset \Re^N$.
– If $f$ is continuous and $S$ is compact, then $f$ takes on a global minimum in $S$.
– If $f$ is convex on $S$, then any local minimum is a global minimum.
– If $f$ is continuously differentiable and convex on $S$, then $\nabla f(x_0) = 0$ implies that $x_0 \in S$ is a global minimum of $f$.

§ Important facts:
– The global minimum may not be unique.
– If $S$ is closed but not bounded, then $f$ may not take on a global minimum.
– Generally speaking, gradient descent algorithms converge to the global minimum of continuously differentiable convex functions.
– Most interesting functions in ML are not convex!

[Figure: a convex function with GD steps converging to the global minimum]
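The convergence claim can be sketched with gradient descent on a convex quadratic (an assumed example, not from the slides): $f(x) = \tfrac{1}{2} x^T Q x - b^T x$ with $Q$ positive definite, whose unique global minimum solves $Qx = b$:

```python
import numpy as np

Q = np.array([[3.0, 1.0], [1.0, 2.0]])   # symmetric positive definite
b = np.array([1.0, -1.0])
grad = lambda x: Q @ x - b               # gradient of the convex quadratic

x = np.zeros(2)
alpha = 0.2                              # step size small enough for this Q
for _ in range(200):
    x = x - alpha * grad(x)              # gradient descent steps

x_star = np.linalg.solve(Q, b)           # the unique global minimum
assert np.allclose(x, x_star, atol=1e-6)
```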

