Page 1: Doubly Decomposing Nonparametric Tensor Regression

Tensor regression problem / Our Approach / Convergence Analysis / Experiments / Summary

Doubly DecomposingNonparametric Tensor Regression

M.Imaizumi (Univ. of Tokyo)K.Hayashi (AIST / JST ERATO)

2016/06/21 (ICML2016)

M.Imaizumi (Univ. of Tokyo) K.Hayashi (AIST / JST ERATO) Doubly Decomposing Nonparametric Tensor Regression

Page 2: Doubly Decomposing Nonparametric Tensor Regression


Outline

Topic: regression with tensor input, via a nonparametric approach

Method: we propose a nonparametric regression model with a Bayes estimator; it avoids the "curse of dimensionality" of tensor regression

Page 3: Doubly Decomposing Nonparametric Tensor Regression

1 Tensor regression problem

2 Our Approach

3 Convergence Analysis

4 Experiments

5 Summary

Page 4: Doubly Decomposing Nonparametric Tensor Regression

Tensor Regression Problem

Tensor data: X ∈ R^{I_1×…×I_K}

K: number of modes of the tensor X; I_k: dimension of the k-th mode

Tensor regression: n observations D_n = {(X_i, Y_i)}_{i=1}^n

Input (tensor): X_i ∈ R^{I_1×…×I_K}; output (real): Y_i ∈ R. D_n is generated with a function f : R^{I_1×…×I_K} → R as

Y_i = f(X_i) + ε_i

for i = 1, …, n, where ε_i is Gaussian noise.
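As a minimal sketch of this observation model (the sizes, noise level, and toy f below are illustrative assumptions, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumed): K = 3 modes, each of dimension 4, n = 100 samples
shape, n, sigma = (4, 4, 4), 100, 0.1

def f(X):
    # A toy nonlinear function of the tensor; stands in for the unknown regression function
    return np.tanh(X.sum())

X = rng.standard_normal((n,) + shape)      # inputs X_i in R^{4x4x4}
eps = sigma * rng.standard_normal(n)       # Gaussian noise eps_i
Y = np.array([f(Xi) for Xi in X]) + eps    # Y_i = f(X_i) + eps_i
```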

Page 5: Doubly Decomposing Nonparametric Tensor Regression

Application of Tensor Regression Problem

Predict health conditions from medical 3D images: X_i is the 3D medical image of patient i, Y_i the health condition of patient i

Predict the spread of epidemics on networks: X_i is the adjacency matrix of network i, Y_i the number of infected nodes

Page 6: Doubly Decomposing Nonparametric Tensor Regression

The curse of dimensionality

The performance of an estimator of f degrades with tensor input.

The naive estimator f̃_n satisfies

‖f̃_n − f‖² = O(n^{−2β/(2β+∏_k I_k)}),

where β is the smoothness of f and ∏_k I_k is the number of elements in X.

We propose a method that achieves a better convergence rate.

Page 7: Doubly Decomposing Nonparametric Tensor Regression

1 Tensor regression problem

2 Our Approach

3 Convergence Analysis

4 Experiments

5 Summary

Page 8: Doubly Decomposing Nonparametric Tensor Regression

Double Decomposition

Idea: reduce the complexity of f by decomposing f into nonparametric local functions

Control the bias-variance trade-off through the estimation

We decompose both the function f and the input tensor X

Page 9: Doubly Decomposing Nonparametric Tensor Regression

Double Decomposition 1

1. Tensor (CP) Decomposition: consider X ∈ R^{I_1×…×I_K}

There exists a set of normalized vectors {x_r^(k) ∈ R^{I_k}} for r = 1, …, R* and k = 1, …, K, and scale terms λ_r for all r = 1, …, R*, such that

X = Σ_{r=1}^{R*} λ_r x_r^(1) ⊗ x_r^(2) ⊗ … ⊗ x_r^(K).

R* is the tensor rank.
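A small sketch of such a CP construction (the mode dimensions and rank are illustrative assumptions), building X from normalized factor vectors and checking that an unfolding has the expected low rank:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative sizes (assumed): K = 3 modes with dims (5, 4, 3) and tensor rank R* = 2
dims, R = (5, 4, 3), 2

# Normalized factor vectors x_r^(k) and scale terms lambda_r
factors = [[rng.standard_normal(I) for I in dims] for _ in range(R)]
factors = [[x / np.linalg.norm(x) for x in xs] for xs in factors]
lam = np.array([2.0, 1.0])

# X = sum_r lambda_r * x_r^(1) (outer) x_r^(2) (outer) x_r^(3)
X = sum(lam[r] * np.einsum('i,j,k->ijk', *factors[r]) for r in range(R))

# The mode-1 unfolding of a rank-2 CP tensor has matrix rank at most 2
rank_unfold = np.linalg.matrix_rank(X.reshape(dims[0], -1))
```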

Page 10: Doubly Decomposing Nonparametric Tensor Regression

Double Decomposition 2

2. Functional Decomposition

For each r, consider a function f(x_r^(1), x_r^(2), …, x_r^(K)), where the x_r^(k) are I_k-dimensional vectors.

There exist M* ∈ Z_+ ∪ {∞} and a set of local functions {f_m^(k)} for k = 1, …, K and m = 1, …, M* satisfying

f(x_r^(1), x_r^(2), …, x_r^(K)) = Σ_{m=1}^{M*} ∏_{k=1}^{K} f_m^(k)(x_r^(k)).

M* is the model complexity.

Page 11: Doubly Decomposing Nonparametric Tensor Regression

Proposed Framework

Assumption

f is additively separable with respect to r = 1, …, R*

Consider the doubly decomposed form of f:

f(X) = Σ_{m=1}^{M*} Σ_{r=1}^{R*} λ_r ∏_{k=1}^{K} f_m^(k)(x_r^(k))

Additive-Multiplicative Nonparametric Regression (AMNR)

Composed of f_m^(k) for all m = 1, …, M* and k = 1, …, K

Each f_m^(k) takes an I_k-dimensional vector as input; M* (model complexity) and R* (tensor rank) are tuning parameters
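As a sketch of evaluating the AMNR form (the local functions, `make_local`, `amnr`, and all sizes are toy illustrations, not the estimated quantities from the paper):

```python
import numpy as np

# Toy local functions f_m^(k): R^{I_k} -> R (illustrative choices, not the learned ones)
def make_local(m, k):
    return lambda x: np.sin((m + 1) * x.sum() + k)

K, M, R = 3, 2, 2                       # modes, model complexity, tensor rank (assumed)
local = [[make_local(m, k) for k in range(K)] for m in range(M)]

def amnr(lam, factors):
    """Evaluate f(X) = sum_m sum_r lambda_r prod_k f_m^(k)(x_r^(k))."""
    total = 0.0
    for m in range(M):
        for r in range(R):
            prod = lam[r]
            for k in range(K):
                prod *= local[m][k](factors[r][k])
            total += prod
    return total

rng = np.random.default_rng(2)
factors = [[rng.standard_normal(4) for _ in range(K)] for _ in range(R)]  # CP factors of X
y_hat = amnr(np.ones(R), factors)       # a single real-valued prediction
```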

Page 12: Doubly Decomposing Nonparametric Tensor Regression

Estimation Method

We use the Bayes method with a Gaussian process prior.

Prior:

π(f) = ∏_m ∏_k GP^(k)(f_m^(k))

Posterior:

π(f | D_n) = exp(−Σ_{i=1}^n (Y_i − G[f](X_i))²) / ∫ exp(−Σ_{i=1}^n (Y_i − G[f′](X_i))²) π(df′) · π(f),

where G[f](X_i) := Σ_{m=1}^{M} Σ_{r=1}^{R} ∏_{k=1}^{K} f_m^(k)(x_{r,i}^(k)).

Implementation: the estimation is based on Gibbs sampling.
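A minimal sketch of the prior side only, assuming a squared-exponential kernel (a common GP choice; the paper's actual kernel and its Gibbs sampler are not reproduced here): drawing one local function f_m^(k) from a GP prior at finitely many input points.

```python
import numpy as np

rng = np.random.default_rng(3)

def se_kernel(A, B, ell=1.0):
    # Squared-exponential kernel on I_k-dimensional inputs (an assumed, common GP choice)
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / ell ** 2)

# Evaluate a GP-prior draw of one local function f_m^(k) at 50 random I_k = 4 dim points
Z = rng.standard_normal((50, 4))
K_mat = se_kernel(Z, Z) + 1e-8 * np.eye(50)   # jitter for numerical stability
f_sample = rng.multivariate_normal(np.zeros(50), K_mat)
```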

Page 13: Doubly Decomposing Nonparametric Tensor Regression

1 Tensor regression problem

2 Our Approach

3 Convergence Analysis

4 Experiments

5 Summary

Page 14: Doubly Decomposing Nonparametric Tensor Regression

Analyze Convergence Theoretically

In the AMNR form, M* controls the size of the bias: as M* increases, the bias decreases

We focus on the distance between f̂_n and f* for a given M*

We start with the case where a finite M* is sufficient to represent f; then we consider the infinite-M* case

Page 15: Doubly Decomposing Nonparametric Tensor Regression

Finite M* Case

In the following, we assume that

(i) f belongs to a Sobolev space of order β
(ii) the parameters of the prior are appropriately selected

Theorem 1

Let M* < ∞. Then, with some finite constant C > 0,

E‖f̂_n − f*‖²_n ≤ C n^{−2β/(2β+max_k I_k)}.

Recall that the naive estimator f̃_n has convergence rate n^{−2β/(2β+∏_k I_k)}.

Page 16: Doubly Decomposing Nonparametric Tensor Regression

Infinite M* Case

With infinite M*, we estimate the first M components under an additional assumption.

Theorem 2

Assume that, for some constant γ ≥ 1, ‖Σ_r λ_r ∏_k f_m^(k)‖² = o(m^{−γ−1}) as m → ∞. Suppose we construct the estimator with a proximal complexity M such that

M ≍ (n^{−2β/(2β+max_k I_k)})^{1/(1+γ)}.

Then, with some finite constant C > 0,

E‖f̂_n − f*‖²_n ≤ C (n^{−2β/(2β+max_k I_k)})^{γ/(1+γ)}.

Page 17: Doubly Decomposing Nonparametric Tensor Regression

Convergence Rate

Comparison of nonparametric methods for tensor regression

For the example column, we set K = 3, I_k = 100, and β = γ = 2

Method               | Conv. Rate                              | Example
Naive                | n^{−2β/(2β+∏_k I_k)}                    | n^{−1/2501}
AMNR (finite M*)     | n^{−2β/(2β+max_k I_k)}                  | n^{−1/26}
AMNR (infinite M*)   | (n^{−2β/(2β+max_k I_k)})^{γ/(1+γ)}      | n^{−1/39}

AMNR achieves a better convergence rate by reducing the size of the model space via the double decomposition
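As a quick arithmetic check of the AMNR rows (exponents only, using the slide's β = γ = 2 and max_k I_k = 100):

```python
from fractions import Fraction

# Example settings from the slide: I_k = 100 for each mode, beta = gamma = 2
beta, gamma, max_I = 2, 2, 100

finite = Fraction(2 * beta, 2 * beta + max_I)   # AMNR exponent, finite M*: 2b/(2b + max_k I_k)
infinite = finite * Fraction(gamma, gamma + 1)  # AMNR exponent, infinite M*: times g/(1+g)

print(finite, infinite)  # 1/26 1/39
```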

Page 18: Doubly Decomposing Nonparametric Tensor Regression

1 Tensor regression problem

2 Our Approach

3 Convergence Analysis

4 Experiments

5 Summary

Page 19: Doubly Decomposing Nonparametric Tensor Regression

Experiment Outline

We present three experiments:

1 Prediction performance
2 Convergence analysis
3 Real data analysis

Methods

AMNR (our method)
TGP (Tensor Gaussian Process; Zhou et al. (2014)): close to the naive estimator
TLR (Tensor Linear Regression; Zhao et al. (2013)): not a nonparametric method

Page 20: Doubly Decomposing Nonparametric Tensor Regression

Prediction Performance

Generate synthetic data with a low-rank tensor as

f(X) = Σ_{r=1}^{2} λ_r ∏_{k=1}^{K} (1 + exp(−γᵀ x_r^(k)))^{−1}

[Figure: test MSE vs. n (100-500); full view and enlarged view]

Page 21: Doubly Decomposing Nonparametric Tensor Regression

Convergence analysis

Generate synthetic data with a smoothness-controlled process:

f(X) = Σ_{r=1}^{R} ∏_{k=1}^{K} Σ_l μ_l φ_l(γᵀ x)

[Figure: log MSE vs. log n; left: symmetric tensor, TGP(10×10×10) vs. AMNR(10×10×10); right: asymmetric tensor, TGP(10×3×3) vs. AMNR(10×3×3)]

Page 22: Doubly Decomposing Nonparametric Tensor Regression

Real Data Analysis

Epidemic spreading data: X_i is the adjacency matrix of network i; Y_i is the total number of infected nodes in network i

[Figure: testing MSE vs. n (0-200)]

Page 23: Doubly Decomposing Nonparametric Tensor Regression

1 Tensor regression problem

2 Our Approach

3 Convergence Analysis

4 Experiments

5 Summary

Page 24: Doubly Decomposing Nonparametric Tensor Regression

Conclusion

We proposed a nonparametric regression model with tensor input

AMNR controls the complexity of the regression function by decomposing both the function and the input tensor

It controls the bias-variance trade-off and avoids the curse of dimensionality of tensor input

Its main limitation is computational complexity
