Page 1: Sparsity with sign-coherent groups of variables via the cooperative-Lasso

Sparsity with sign-coherent groups of variables via the cooperative-Lasso

Julien Chiquet¹, Yves Grandvalet², Camille Charbonnier¹

¹ Statistique et Génome, CNRS & Université d'Évry Val d'Essonne

² Heudiasyc, CNRS & Université de Technologie de Compiègne

SSB – 29 March 2011

arXiv preprint: http://arxiv.org/abs/1103.2697

R package scoop: http://stat.genopole.cnrs.fr/logiciels/scoop

Page 2: Sparsity with sign-coherent groups of variables via the cooperative-Lasso

Notations

Let
- Y be the output random variable,
- X = (X1, ..., Xp) be the input random variables, where Xj is the jth predictor.

The data. Given a sample (yi, xi), i = 1, ..., n, of i.i.d. realizations of (Y, X), denote
- y = (y1, ..., yn)ᵀ the response vector,
- xj = (xj1, ..., xjn)ᵀ the vector of data for the jth predictor,
- X the n × p design matrix whose jth column is xj,
- D = {i : (yi, xi) ∈ training set},
- T = {i : (yi, xi) ∈ test set}.

Page 3: Sparsity with sign-coherent groups of variables via the cooperative-Lasso

Generalized linear models

Suppose Y depends linearly on X through a function g:

E(Y) = g(Xβ*).

We predict a response yi by ŷi = g(xiβ̂) for any i ∈ T, where

β̂ = arg max_β ℓ_D(β) = arg min_β Σ_{i∈D} Lg(yi, xiβ),

and Lg is a loss function depending on the function g. Typically,

- if Y is Gaussian and g = Id (OLS),

  Lg(y, xβ) = (y − xβ)²,

- if Y is binary and g : t ↦ (1 + e^(−t))^(−1) (logistic regression),

  Lg(y, xβ) = −(y · xβ − log(1 + e^(xβ))),

or any negative log-likelihood ℓ of an exponential family distribution.
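The two losses above are easy to check numerically. Below is a minimal Python sketch (the function names are ours, not from the scoop package); eta stands for the linear predictor xβ.

```python
import math

def squared_loss(y, eta):
    # Gaussian case, g = Id: Lg(y, xb) = (y - xb)^2
    return (y - eta) ** 2

def logistic_loss(y, eta):
    # Binary case, g(t) = 1 / (1 + exp(-t)):
    # Lg(y, xb) = -(y * xb - log(1 + exp(xb)))
    return -(y * eta - math.log(1.0 + math.exp(eta)))
```

For instance, with y = 1 and a null linear predictor, the logistic loss is log 2, the deviance of a coin flip.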


Page 5: Sparsity with sign-coherent groups of variables via the cooperative-Lasso

Estimation and selection at the group level

1. Structure: the index set I = {1, ..., p} splits into a known partition

   I = ⋃_{k=1}^K Gk, with Gk ∩ Gℓ = ∅ for k ≠ ℓ.

2. Sparsity: the support S of β* has few entries:

   S = {i : β*_i ≠ 0}, with |S| ≪ p.

The group-Lasso estimator (Grandvalet and Canu '98, Bakin '99, Yuan and Lin '06)

β̂group = arg min_{β ∈ Rp} −ℓ_D(β) + λ Σ_{k=1}^K wk ‖β_Gk‖,

where
- λ ≥ 0 controls the overall amount of penalty,
- wk > 0 adapts the penalty between groups (dropped hereafter).
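The group penalty above is straightforward to compute. Here is a minimal Python sketch (helper names are ours), with groups given as lists of 0-based indices and ‖·‖ the Euclidean norm:

```python
import math

def group_lasso_penalty(beta, groups, weights=None):
    """Sum over groups of w_k * ||beta_Gk||, the weighted group-Lasso penalty."""
    if weights is None:
        weights = [1.0] * len(groups)  # w_k dropped, as on the slide
    total = 0.0
    for w, g in zip(weights, groups):
        total += w * math.sqrt(sum(beta[j] ** 2 for j in g))
    return total
```

With beta = (3, 4, 0, 0) and groups {1, 2} and {3, 4}, the penalty is 5: the second group is entirely null and contributes nothing.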


Page 7: Sparsity with sign-coherent groups of variables via the cooperative-Lasso

Toy example: the prostate dataset

Examines the correlation between the prostate-specific antigen and 8 clinical measures for 97 patients.

Figure: Lasso regularization path (coefficients vs. λ, log scale).

Variables:
- lcavol: log(cancer volume)
- lweight: log(prostate weight)
- age: age
- lbph: log(benign prostatic hyperplasia amount)
- svi: seminal vesicle invasion
- lcp: log(capsular penetration)
- gleason: Gleason score
- pgg45: percentage of Gleason scores 4 or 5

Page 8: Sparsity with sign-coherent groups of variables via the cooperative-Lasso

Toy example: the prostate dataset

Examines the correlation between the prostate-specific antigen and 8 clinical measures for 97 patients.

Figure: hierarchical clustering of the 8 clinical variables (dendrogram); variable abbreviations as above.

Page 9: Sparsity with sign-coherent groups of variables via the cooperative-Lasso

Toy example: the prostate dataset

Examines the correlation between the prostate-specific antigen and 8 clinical measures for 97 patients.

Figure: group-Lasso regularization path (coefficients vs. λ, log scale); variable abbreviations as above.


Page 11: Sparsity with sign-coherent groups of variables via the cooperative-Lasso

Application to splice site detection

Predict splice site status (0/1) from a sequence of 7 bases and their interactions.

Figure: information content per position of the sequence.

- order 0: 7 factors with 4 levels,
- order 1: C(7,2) factors with 4² levels,
- order 2: C(7,3) factors with 4³ levels,
- using dummy coding for the factors, we form groups.

L. Meier, S. van de Geer, P. Bühlmann, 2008. The group Lasso for logistic regression, JRSS Series B.
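The group counts above can be checked in a couple of lines of Python (a side computation of ours, not part of the original slides; we count one dummy column per level, as the slide's group sizes suggest):

```python
from math import comb

# order 0: 7 positions, each a factor with 4 levels (one per base)
# order 1: C(7,2) pairwise interactions, 4^2 levels each
# order 2: C(7,3) triple interactions, 4^3 levels each
n_groups = 7 + comb(7, 2) + comb(7, 3)
n_dummies = 7 * 4 + comb(7, 2) * 4 ** 2 + comb(7, 3) * 4 ** 3
```

This gives 7 + 21 + 35 = 63 groups, so the grouped penalty operates on a few dozen groups rather than a few thousand individual dummy variables.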

Page 12: Sparsity with sign-coherent groups of variables via the cooperative-Lasso

Application to splice site detection

Predict splice site status (0/1) from a sequence of 7 bases and their interactions.

Figure: regularization paths vs. λ (log scale), with selected groups labeled (g18, g5, g4, g44, g54, g42, g49, g45, g61) and colored by interaction order (order 0, 1, 2); grouping as above.

Page 13: Sparsity with sign-coherent groups of variables via the cooperative-Lasso

Group-Lasso limitations

1. Not a single zero should belong to a group with non-zeros.
   - Strong group sparsity (Huang and Zhang, '10, arXiv) establishes the conditions under which the group-Lasso outperforms the Lasso, and conversely.

2. No sign-coherence within groups.
   - Required if groups gather consonant variables, e.g., groups defined by clusters of positively correlated variables.

The cooperative-Lasso

A penalty which assumes a sign-coherent group structure, that is to say, groups which gather either
- non-positive,
- non-negative,
- or null parameters.


Page 15: Sparsity with sign-coherent groups of variables via the cooperative-Lasso

Motivation: multiple network inference

Figure: three experiments, each followed by an inference step yielding one network per experiment.

A group is a set of corresponding edges across tasks (e.g., the red or blue ones): sign-coherence matters!

J. Chiquet, Y. Grandvalet, C. Ambroise, 2010. Inferring multiple graphical structures, Statistics and Computing.

Page 16: Sparsity with sign-coherent groups of variables via the cooperative-Lasso

Motivation: joint segmentation of aCGH profiles

Figure: log-ratio (CNVs) vs. position on the chromosome, for one profile.

For a single profile, solve

minimize_{β ∈ Rp} ‖β − y‖², s.t. Σ_{i=2}^p |β_i − β_{i−1}| < s,

where
- y is a vector in Rp (one observed profile),
- β is a vector in Rp (its piecewise-constant fit).

Sign-coherence may avoid inconsistent variations across profiles.

K. Bleakley and J.-P. Vert, 2010. Joint segmentation of many aCGH profiles using fast group LARS, NIPS.

Page 17: Sparsity with sign-coherent groups of variables via the cooperative-Lasso

Motivation: joint segmentation of aCGH profiles

Figure: log-ratio (CNVs) vs. position on the chromosome, for n profiles jointly.

minimize_{β ∈ R^{n×p}} ‖β − Y‖², s.t. Σ_{i=2}^p ‖β_i − β_{i−1}‖ < s,

where
- Y is an n × p matrix of n profiles of size p,
- β_i is a size-n vector with the ith probes of the n profiles,
- a group gathers every position i across profiles.

Sign-coherence may avoid inconsistent variations across profiles.

K. Bleakley and J.-P. Vert, 2010. Joint segmentation of many aCGH profiles using fast group LARS, NIPS.
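The constraint above sums the Euclidean norms of successive differences across profiles. A minimal Python sketch of that penalty (our naming), with the n profiles stored row-wise:

```python
import math

def joint_tv_penalty(B):
    """Sum over positions i >= 2 of ||B[:, i] - B[:, i-1]||, where B is an
    n x p list of lists (n profiles of length p); each column difference
    is one group, so jumps shared across profiles are counted once."""
    n, p = len(B), len(B[0])
    total = 0.0
    for i in range(1, p):
        total += math.sqrt(sum((B[r][i] - B[r][i - 1]) ** 2 for r in range(n)))
    return total
```

With two profiles jumping by 3 and 4 at the same position, the penalty is a single group norm of 5, instead of 3 + 4 = 7 for separate per-profile penalties.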


Page 23: Sparsity with sign-coherent groups of variables via the cooperative-Lasso

Outline

Definition

Resolution

Consistency

Model selection

Simulation studies

Sibling probe sets and gene selection



Page 25: Sparsity with sign-coherent groups of variables via the cooperative-Lasso

The cooperative-Lasso estimator

Definition

β̂coop = arg min_{β ∈ Rp} J(β), with J(β) = −ℓ_D(β) + λ ‖β‖coop,

where, for any v ∈ Rp,

‖v‖coop = ‖v⁺‖group + ‖v⁻‖group = Σ_{k=1}^K ‖v⁺_Gk‖ + ‖v⁻_Gk‖,

and
- v⁺ = (v⁺_1, ..., v⁺_p), with v⁺_j = max(0, vj),
- v⁻ = (v⁻_1, ..., v⁻_p), with v⁻_j = max(0, −vj).
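A minimal Python sketch of the coop-norm above (function name is ours). A sign-coherent group contributes exactly its group-Lasso norm, while a group mixing signs pays for both its positive and negative parts:

```python
import math

def coop_norm(v, groups):
    """||v||_coop = sum over groups of ||v+_Gk|| + ||v-_Gk||, with
    v+_j = max(0, v_j), v-_j = max(0, -v_j), ||.|| the Euclidean norm."""
    total = 0.0
    for g in groups:
        pos = math.sqrt(sum(max(0.0, v[j]) ** 2 for j in g))
        neg = math.sqrt(sum(max(0.0, -v[j]) ** 2 for j in g))
        total += pos + neg
    return total
```

For the single group v = (1, −1), the coop-norm is 1 + 1 = 2, whereas the group-Lasso norm is only √2: mixed signs are penalized more.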

Page 26: Sparsity with sign-coherent groups of variables via the cooperative-Lasso

A geometric view of sparsity

Figure: the likelihood ℓ(β1, β2) over the (β1, β2) plane.

minimize_{β1,β2} −ℓ(β1, β2) + λ Ω(β1, β2)

⇔

maximize_{β1,β2} ℓ(β1, β2) s.t. Ω(β1, β2) ≤ c


Page 28: Sparsity with sign-coherent groups of variables via the cooperative-Lasso

Ball crafting: group-Lasso

Admissible set
- β = (β1, β2, β3, β4)ᵀ,
- G1 = {1, 2}, G2 = {3, 4}.

Unit ball: ‖β‖group ≤ 1.

Figure: cross-sections of the unit ball in the (β1, β3) plane, for β2 ∈ {0, 0.3} and β4 ∈ {0, 0.3}.


Page 32: Sparsity with sign-coherent groups of variables via the cooperative-Lasso

Ball crafting: cooperative-Lasso

Admissible set
- β = (β1, β2, β3, β4)ᵀ,
- G1 = {1, 2}, G2 = {3, 4}.

Unit ball: ‖β‖coop ≤ 1.

Figure: cross-sections of the unit ball in the (β1, β3) plane, for β2 ∈ {0, 0.3} and β4 ∈ {0, 0.3}.




Page 41: Sparsity with sign-coherent groups of variables via the cooperative-Lasso

Convex analysis: supporting hyperplane

A hyperplane supports a set iff
- the set is contained in one half-space,
- the set has at least one point on the hyperplane.

Figure: a convex set in the (β1, β2) plane with supporting hyperplanes at several boundary points.

There are supporting hyperplanes at all points of convex sets: they generalize tangents.

Page 42: Sparsity with sign-coherent groups of variables via the cooperative-Lasso

Convex analysis: dual cone and subgradient

Generalizes normals.

Figure: a convex function over (β1, β2) with supporting hyperplanes at several points.

g is a subgradient at x
⇔
the vector (g, −1) is normal to the supporting hyperplane at this point.

The subdifferential at x is the set of all subgradients at x.


Page 46: Sparsity with sign-coherent groups of variables via the cooperative-Lasso

Optimality conditions

Theorem. A necessary and sufficient condition for the optimality of β̂ is that the null vector 0 belongs to the subdifferential of the convex function J:

0 ∈ ∂J(β̂) = {v ∈ Rp : v = −∇ℓ(β̂) + λθ},

where θ ∈ Rp belongs to the subdifferential of the coop-norm. Define

ϕ_j(v) = (sign(v_j) v)⁺;

then θ is such that

∀k ∈ {1, ..., K}, ∀j ∈ S_k(β̂),  θ_j = β̂_j / ‖ϕ_j(β̂_Gk)‖,

∀k ∈ {1, ..., K}, ∀j ∈ Sᶜ_k(β̂),  ‖ϕ_j(θ_Gk)‖ ≤ 1.

We derive a subset algorithm to solve this problem (which you can enjoy in the paper and the package).


Page 49: Sparsity with sign-coherent groups of variables via the cooperative-Lasso

Linear regression with orthonormal design

Consider

β̂ = arg min_β { (1/2) ‖y − Xβ‖² + λ Ω(β) },

with XᵀX = I. Hence (x_j)ᵀ(Xβ − y) = β_j − β̂ols_j, and, up to a constant,

β̂ = arg min_β { (1/2) ‖β − β̂ols‖² + λ Ω(β) }.

We may find a closed form of β̂ for, e.g.,

1. Ω(β) = ‖β‖lasso (the ℓ1 norm),
2. Ω(β) = ‖β‖group,
3. Ω(β) = ‖β‖coop.


Page 51: Sparsity with sign-coherent groups of variables via the cooperative-Lasso

Linear regression with orthonormal design

∀j ∈ {1, ..., p},

β̂lasso_j = (1 − λ/|β̂ols_j|)₊ β̂ols_j,  so that  |β̂lasso_j| = (|β̂ols_j| − λ)₊.

Fig.: Lasso as a function of the OLS coefficients.

Page 52: Sparsity with sign-coherent groups of variables via the cooperative-Lasso

Linear regression with orthonormal design

∀k ∈ {1, ..., K}, ∀j ∈ Gk,

β̂group_j = (1 − λ/‖β̂ols_Gk‖)₊ β̂ols_j,  so that  ‖β̂group_Gk‖ = (‖β̂ols_Gk‖ − λ)₊.

Fig.: Group-Lasso as a function of the OLS coefficients.

Page 53: Sparsity with sign-coherent groups of variables via the cooperative-Lasso

Linear regression with orthonormal design

∀k ∈ {1, ..., K}, ∀j ∈ Gk,

β̂coop_j = (1 − λ/‖ϕ_j(β̂ols_Gk)‖)₊ β̂ols_j,  so that  ‖ϕ_j(β̂coop_Gk)‖ = (‖ϕ_j(β̂ols_Gk)‖ − λ)₊.

Fig.: Coop-Lasso as a function of the OLS coefficients.
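The coop-Lasso closed form above can be sketched in a few lines of Python (our naming; a sketch under orthonormal design, not the scoop implementation). Each coefficient is shrunk by its own sign-dependent group norm ‖ϕ_j(β̂ols_Gk)‖:

```python
import math

def _norm(x):
    return math.sqrt(sum(t * t for t in x))

def phi(v, j):
    # phi_j(v) = (sign(v_j) * v)+, applied componentwise; sign(0) taken as +1
    s = 1.0 if v[j] >= 0 else -1.0
    return [max(0.0, s * t) for t in v]

def coop_threshold(b_ols, groups, lam):
    """Coop-Lasso under orthonormal design:
    b_j = (1 - lam / ||phi_j(b_ols_Gk)||)+ * b_ols_j."""
    b = [0.0] * len(b_ols)
    for g in groups:
        sub = [b_ols[j] for j in g]
        for pos, j in enumerate(g):
            nrm = _norm(phi(sub, pos))
            shrink = max(0.0, 1.0 - lam / nrm) if nrm > 0 else 0.0
            b[j] = shrink * b_ols[j]
    return b
```

With b_ols = (3, 4, −1) in one group and λ = 1, the two positive coefficients are shrunk by their shared norm 5, while the lone negative coefficient sees only its own norm 1 and is set to zero.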


Page 55: Sparsity with sign-coherent groups of variables via the cooperative-Lasso

Linear regression setup: technical assumptions

(A1) X and Y have finite fourth-order moments:

E‖X‖⁴ < ∞, E|Y|⁴ < ∞,

(A2) the covariance matrix Ψ = E[XXᵀ] ∈ R^{p×p} is invertible,

(A3) for every k = 1, ..., K: if ‖(β*_Gk)⁺‖ > 0 and ‖(β*_Gk)⁻‖ > 0, then β*_j ≠ 0 for every j ∈ Gk
(all sign-coherent groups are either included in or excluded from the true support).

Page 56: Sparsity with sign-coherent groups of variables via the cooperative-Lasso

Irrepresentability condition

Define S_k = S ∩ Gk, the support within group k, and, for j ∈ Gk,

[D(β)]_jj = ‖[sign(β_j) β_Gk]⁺‖⁻¹.

Assume there exists η > 0 such that:

(A4) for every group Gk including at least one null coefficient,

max( ‖(Ψ_{Sᶜk,S} Ψ_{SS}⁻¹ D(β*_S) β*_S)⁺‖, ‖(Ψ_{Sᶜk,S} Ψ_{SS}⁻¹ D(β*_S) β*_S)⁻‖ ) ≤ 1 − η,

(A5) for every group Gk intersecting the support and including either positive or negative coefficients, let ν_k be the sign of these coefficients (ν_k = 1 if ‖(β*_Gk)⁺‖ > 0 and ν_k = −1 if ‖(β*_Gk)⁻‖ > 0):

ν_k Ψ_{Sᶜk,S} Ψ_{SS}⁻¹ D(β*_S) β*_S ⪯ 0,

where ⪯ denotes componentwise inequality.

Page 57: Sparsity with sign-coherent groups of variables via the cooperative-Lasso

Consistency results

Theorem. If assumptions (A1–5) are satisfied for some η > 0, then for every sequence λ_n = λ_0 n^(−γ) with γ ∈ (0, 1/2),

β̂coop → β* in probability, and P(S(β̂coop) = S) → 1.

Asymptotically, the cooperative-Lasso is unbiased and enjoys exact support recovery (even when there are irrelevant variables within a group).

Page 58: Sparsity with sign-coherent groups of variables via the cooperative-Lasso

Sketch of the proof

1. Construct an artificial estimator β̃_S restricted to the true support S and extend it with 0 coefficients on Sᶜ.
2. Consider the event E_n on which β̃ satisfies the original optimality conditions. On E_n, β̃_S = β̂coop_S and β̂coop_{Sᶜ} = 0, by uniqueness.
3. We need to prove that lim_{n→∞} P(E_n) = 1.
4. Derive the asymptotic distribution of the derivative of the loss function, Xᵀ(y − Xβ), from
   - the CLT on second-order moments,
   - the optimality conditions on β̃_S,
   - the right choice of λ_n, which provides convergence in probability.
5. Assumptions (A4–5) state that the limits in probability satisfy the optimality constraints with strict inequalities.
6. As a result, the optimality conditions are satisfied (with non-strict inequalities) with probability tending to 1.



Page 63: Sparsity with sign-coherent groups of variables via the cooperative-Lasso

Illustration

Generate data y = Xβ* + σε, with
- β* = (1, 1, −1, −1, 0, 0, 0, 0),
- groups G1 = {1, 2, 3, 4}, G2 = {5, 6, 7, 8},
- σ = 0.1, R² ≈ 0.99, n = 20,
- the irrepresentability condition holds for the coop-Lasso but does not hold for the group-Lasso,
- results averaged over 100 simulations.

Fig.: group-Lasso, 50% coverage intervals (upper/lower quartiles) of the coefficient paths vs. log10(λ).

Page 64: Sparsity with sign-coherent groups of variables via the cooperative-Lasso

Illustration

Same simulation setting as above.

Fig.: coop-Lasso, 50% coverage intervals (upper/lower quartiles) of the coefficient paths vs. log10(λ).


Page 66: Sparsity with sign-coherent groups of variables via the cooperative-Lasso

Optimism of the training error

I The training error:
\[ \overline{\mathrm{err}} = \frac{1}{|\mathcal{D}|} \sum_{i \in \mathcal{D}} L(y_i, \mathbf{x}_i \hat\beta). \]

I The test error ("extra-sample" error):
\[ \mathrm{Err}_{\mathrm{ex}} = \mathbb{E}_{X,Y} \left[ L(Y, X\hat\beta) \mid \mathcal{D} \right]. \]

I The "in-sample" error:
\[ \mathrm{Err}_{\mathrm{in}} = \frac{1}{|\mathcal{D}|} \sum_{i \in \mathcal{D}} \mathbb{E}_{Y} \left[ L(Y_i, \mathbf{x}_i \hat\beta) \mid \mathcal{D} \right]. \]

Definition (Optimism)
\[ \mathrm{Err}_{\mathrm{in}} = \overline{\mathrm{err}} + \text{"optimism"}. \]

cooperative-Lasso 28


Page 68: Sparsity with sign-coherent groups of variables via the cooperative-Lasso

Cp statistics

For squared-error loss (and some other losses),
\[ \mathrm{Err}_{\mathrm{in}} = \overline{\mathrm{err}} + \frac{2}{|\mathcal{D}|} \sum_{i \in \mathcal{D}} \mathrm{cov}(\hat y_i, y_i). \]

The amount by which err underestimates the true error depends on how strongly y_i affects its own prediction. The harder we fit the data, the greater the covariance will be, thereby increasing the optimism (ESLII, 5th printing).

Mallows' Cp Statistic

For a linear regression fit ŷ_i with p inputs, ∑_{i∈D} cov(ŷ_i, y_i) = pσ², hence:
\[ C_p = \overline{\mathrm{err}} + 2 \, \frac{\mathrm{df}}{|\mathcal{D}|} \, \hat\sigma^2, \quad \text{with } \mathrm{df} = p. \]

cooperative-Lasso 29
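The OLS case of the slide can be checked numerically: since ŷ = Hy with H the hat matrix, the covariance term equals σ² tr(H) = pσ². A minimal sketch (simulated data, σ assumed known):

```python
import numpy as np

# Sketch of the Cp ingredients for OLS: y_hat = H y with H the hat matrix,
# so sum_i cov(y_hat_i, y_i) = sigma^2 * tr(H) = p * sigma^2, i.e. df = p.
rng = np.random.default_rng(1)
n, p, sigma = 50, 5, 1.0

X = rng.standard_normal((n, p))
y = X @ rng.standard_normal(p) + sigma * rng.standard_normal(n)

H = X @ np.linalg.solve(X.T @ X, X.T)   # hat matrix of the OLS fit
df = np.trace(H)                        # equals p up to rounding error

err = np.mean((y - H @ y) ** 2)         # training error
Cp = err + 2.0 * df / n * sigma**2      # Mallows' Cp with known sigma^2
print(round(df), Cp > err)
```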


Page 70: Sparsity with sign-coherent groups of variables via the cooperative-Lasso

Generalized degrees of freedom

Let ŷ(λ) = Xβ̂(λ) be the predicted values for a penalized estimator.

Proposition (Efron ('04) + Stein's Lemma ('81))
\[ \mathrm{df}(\lambda) \doteq \frac{1}{\sigma^2} \sum_{i \in \mathcal{D}} \mathrm{cov}(\hat y_i(\lambda), y_i) = \mathbb{E}_{\mathbf{y}} \left[ \mathrm{tr} \, \frac{\partial \hat{\mathbf{y}}_\lambda}{\partial \mathbf{y}} \right]. \]

For the Lasso, Zou et al. ('07) show that
\[ \widehat{\mathrm{df}}^{\,\mathrm{lasso}}(\lambda) = \left\| \hat\beta^{\mathrm{lasso}}(\lambda) \right\|_0. \]

Assuming XᵀX = I, Yuan and Lin ('06) show for the group-Lasso that the trace term equals
\[ \widehat{\mathrm{df}}^{\,\mathrm{group}}(\lambda) = \sum_{k=1}^{K} \mathbf{1}\!\left( \left\| \hat\beta^{\mathrm{group}}_{\mathcal{G}_k}(\lambda) \right\| > 0 \right) \left( 1 + \frac{\left\| \hat\beta^{\mathrm{group}}_{\mathcal{G}_k}(\lambda) \right\|}{\left\| \hat\beta^{\mathrm{ols}}_{\mathcal{G}_k} \right\|} \, (p_k - 1) \right). \]

cooperative-Lasso 30
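Under the orthonormal-design assumption of the slide, both formulas are easy to evaluate, since the lasso is then a coordinatewise soft-thresholding of the OLS fit and the group-Lasso a groupwise one. A hedged sketch (the group sizes and λ below are illustrative choices, not from the slide):

```python
import numpy as np

# Sketch under an orthonormal design (X^T X = I): df_lasso counts non-zeros
# (Zou et al. '07) and df_group follows Yuan & Lin's expression above.
rng = np.random.default_rng(2)
n, p, lam = 40, 8, 0.5

X, _ = np.linalg.qr(rng.standard_normal((n, p)))   # orthonormal columns
beta = np.array([2.0, 1.5, 0.1, 0.0, -1.0, -0.8, 0.05, 0.0])
y = X @ beta + 0.3 * rng.standard_normal(n)

b_ols = X.T @ y                                    # OLS when X^T X = I
b_lasso = np.sign(b_ols) * np.maximum(np.abs(b_ols) - lam, 0.0)
df_lasso = np.count_nonzero(b_lasso)

# group-Lasso: groupwise soft-thresholding of the OLS group norm
df_group = 0.0
for g in (slice(0, 4), slice(4, 8)):               # two groups of size 4
    norm_ols = np.linalg.norm(b_ols[g])
    norm_grp = max(norm_ols - lam, 0.0)            # shrunken group norm
    if norm_grp > 0.0:
        df_group += 1 + (norm_grp / norm_ols) * (4 - 1)

print(df_lasso, round(df_group, 2))
```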


Page 73: Sparsity with sign-coherent groups of variables via the cooperative-Lasso

Approximated degrees of freedom for the coop-Lasso

Proposition

Assuming that the data are generated according to a linear regression model and that X is orthonormal, the following expression of df^coop(λ) is an unbiased estimate of df(λ):
\[ \widehat{\mathrm{df}}^{\,\mathrm{coop}}(\lambda) = \sum_{k=1}^{K} \left[ \mathbf{1}_{\left\| (\hat\beta^{\mathrm{coop}}_{\mathcal{G}_k}(\lambda))_+ \right\| > 0} \left( 1 + (p_{k+} - 1) \frac{\left\| (\hat\beta^{\mathrm{coop}}_{\mathcal{G}_k}(\lambda))_+ \right\|}{\left\| (\hat\beta^{\mathrm{ols}}_{\mathcal{G}_k})_+ \right\|} \right) + \mathbf{1}_{\left\| (\hat\beta^{\mathrm{coop}}_{\mathcal{G}_k}(\lambda))_- \right\| > 0} \left( 1 + (p_{k-} - 1) \frac{\left\| (\hat\beta^{\mathrm{coop}}_{\mathcal{G}_k}(\lambda))_- \right\|}{\left\| (\hat\beta^{\mathrm{ols}}_{\mathcal{G}_k})_- \right\|} \right) \right], \]

where p_{k+} and p_{k-} are respectively the numbers of positive and negative entries in β̂^ols_{Gk}.

cooperative-Lasso 31

Page 74: Sparsity with sign-coherent groups of variables via the cooperative-Lasso

Approximated degrees of freedom for the coop-Lasso

Proposition

Assuming that the data are generated according to a linear regression model and that X is orthonormal, the following expression of df^coop(λ) is an unbiased estimate of df(λ):
\[ \widehat{\mathrm{df}}^{\,\mathrm{coop}}(\lambda) = \sum_{k=1}^{K} \left[ \mathbf{1}_{\left\| (\hat\beta^{\mathrm{coop}}_{\mathcal{G}_k}(\lambda))_+ \right\| > 0} \left( 1 + \frac{p_{k+} - 1}{1 + \gamma} \, \frac{\left\| (\hat\beta^{\mathrm{coop}}_{\mathcal{G}_k}(\lambda))_+ \right\|}{\left\| (\hat\beta^{\mathrm{ridge}}_{\mathcal{G}_k}(\gamma))_+ \right\|} \right) + \mathbf{1}_{\left\| (\hat\beta^{\mathrm{coop}}_{\mathcal{G}_k}(\lambda))_- \right\| > 0} \left( 1 + \frac{p_{k-} - 1}{1 + \gamma} \, \frac{\left\| (\hat\beta^{\mathrm{coop}}_{\mathcal{G}_k}(\lambda))_- \right\|}{\left\| (\hat\beta^{\mathrm{ridge}}_{\mathcal{G}_k}(\gamma))_- \right\|} \right) \right], \]

where p_{k+} and p_{k-} are respectively the numbers of positive and negative entries in β̂^ridge_{Gk}(γ).

cooperative-Lasso 31
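A hedged sketch evaluating the OLS-normalised version of this df expression (the coop-Lasso and OLS group estimates below are illustrative inputs chosen by hand, not a fitted model; it also assumes the coop estimate keeps the OLS sign pattern, as shrinkage does):

```python
import numpy as np

# Evaluates, per group and per sign, the OLS-normalised df expression:
# an indicator term plus (p_k(+/-) - 1) times a ratio of part norms.
def df_coop(beta_coop, beta_ols, groups):
    df = 0.0
    for g in groups:
        b, b0 = beta_coop[g], beta_ols[g]
        for s in (1.0, -1.0):                      # positive, then negative part
            part = np.maximum(s * b, 0.0)
            part0 = np.maximum(s * b0, 0.0)
            pk_part = np.count_nonzero(part0)      # p_k+ or p_k- (from OLS)
            if np.linalg.norm(part) > 0.0:
                df += 1 + (pk_part - 1) * np.linalg.norm(part) / np.linalg.norm(part0)
    return df

groups = [np.arange(0, 4), np.arange(4, 8)]
beta_ols = np.array([1.2, 0.8, -0.5, -0.9, 0.3, 0.1, -0.2, 0.0])
beta_coop = np.array([0.9, 0.5, -0.2, -0.6, 0.0, 0.0, 0.0, 0.0])
print(round(df_coop(beta_coop, beta_ols, groups), 3))
```

Here the second group is fully shrunk to zero, so it contributes no degrees of freedom at all, while the first group contributes through both its positive and its negative part.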

Page 75: Sparsity with sign-coherent groups of variables via the cooperative-Lasso

Approximated information criteria

Following Zou et al., we extend the Cp statistic to an "approximate" AIC:
\[ \mathrm{AIC}(\lambda) = \frac{\| \mathbf{y} - \hat{\mathbf{y}}(\lambda) \|^2}{\hat\sigma^2} + 2 \, \widehat{\mathrm{df}}(\lambda), \]

and from the AIC, it is a small step to the BIC:
\[ \mathrm{BIC}(\lambda) = \frac{\| \mathbf{y} - \hat{\mathbf{y}}(\lambda) \|^2}{\hat\sigma^2} + \log(n) \, \widehat{\mathrm{df}}(\lambda). \]

I K-fold cross-validation works well but is computationally intensive.

I It is required when we do not meet the linear regression setup. . .

cooperative-Lasso 32
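A minimal sketch of BIC-based tuning along a lasso path, under an orthonormal design so that the path is a simple soft-thresholding of the OLS fit and df counts non-zeros (the λ grid and the data are illustrative assumptions):

```python
import numpy as np

# BIC(lambda) = RSS/sigma^2 + log(n) * df(lambda), minimised over a grid.
rng = np.random.default_rng(3)
n, p, sigma = 60, 10, 1.0

X, _ = np.linalg.qr(rng.standard_normal((n, p)))   # orthonormal design
beta = np.concatenate([np.full(3, 3.0), np.zeros(p - 3)])
y = X @ beta + sigma * rng.standard_normal(n)
b_ols = X.T @ y

lambdas = np.linspace(0.01, 4.0, 50)
bic = np.empty_like(lambdas)
for i, lam in enumerate(lambdas):
    b = np.sign(b_ols) * np.maximum(np.abs(b_ols) - lam, 0.0)
    rss = np.sum((y - X @ b) ** 2)
    bic[i] = rss / sigma**2 + np.log(n) * np.count_nonzero(b)

lam_bic = lambdas[np.argmin(bic)]
print(round(lam_bic, 2))
```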

Page 76: Sparsity with sign-coherent groups of variables via the cooperative-Lasso

Outline

Definition

Resolution

Consistency

Model selection

Simulation studies

Sibling probe sets and gene selection

cooperative-Lasso 33

Page 77: Sparsity with sign-coherent groups of variables via the cooperative-Lasso

Revisiting Elastic-Net experiments (1)

[Figure: boxplots of test MSE for lasso, enet, group, coop]

Generate data y = Xβ⋆ + σε,

I \( \beta^\star = (\underbrace{0,\dots,0}_{10}, \underbrace{2,\dots,2}_{10}, \underbrace{0,\dots,0}_{10}, \underbrace{2,\dots,2}_{10}) \)
I G1 = {1, …, 10}, G2 = {11, …, 20}, G3 = {21, …, 30}, G4 = {31, …, 40},
I σ = 15, corr(xi, xj) = 0.5,
I training/validation/test = 100/100/400,
I average over 100 simulations.

cooperative-Lasso 34

Page 78: Sparsity with sign-coherent groups of variables via the cooperative-Lasso

Revisiting Elastic-Net experiments (2)

[Figure: boxplots of test MSE for lasso, enet, group, coop]

Generate data y = Xβ⋆ + σε,

I \( \beta^\star = (\underbrace{3,\dots,3}_{15}, \underbrace{0,\dots,0}_{25}) \)
I σ = 15,
I G1 = {1, …, 5}, G2 = {6, …, 10}, G3 = {11, …, 15}, G4 = {16, …, 40},
I xj = Z1 + ε, Z1 ∼ N(0, 1), ∀j ∈ G1
I xj = Z2 + ε, Z2 ∼ N(0, 1), ∀j ∈ G2
I xj = Z3 + ε, Z3 ∼ N(0, 1), ∀j ∈ G3
I xj ∼ N(0, 1), ∀j ∈ G4,
I training/validation/test = 50/50/400,
I average over 100 simulations.

cooperative-Lasso 35
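This latent-factor design can be sketched as follows (the within-group noise level 0.1 is an assumption, the slide does not give the variance of ε):

```python
import numpy as np

# Sketch of this grouped design: groups G1-G3 each share a latent factor
# Z_k, so their five predictors are strongly correlated; G4 is pure noise.
rng = np.random.default_rng(4)
n = 50

Z = rng.standard_normal((n, 3))                     # Z_1, Z_2, Z_3
X_grouped = [Z[:, [k]] + 0.1 * rng.standard_normal((n, 5)) for k in range(3)]
X = np.hstack(X_grouped + [rng.standard_normal((n, 25))])

beta = np.concatenate([np.full(15, 3.0), np.zeros(25)])
y = X @ beta + 15.0 * rng.standard_normal(n)

corr = np.corrcoef(X[:, :5], rowvar=False)          # within-G1 correlations
print(X.shape, round(corr[0, 1], 2))
```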

Page 79: Sparsity with sign-coherent groups of variables via the cooperative-Lasso

Breiman’s setupSimulations setting

A wave-like vector of parameters β?

I p = 90 variables partitioned into K = 10 groups of size pk = 9,

I 3 (partially) active groups, 6 groups of zeros,

I in active groups, β?j ∝ (h− |5− j|) with h = 1, . . . , 5.

0 20 40 60 80

Figure: β? with h = 1, |Sk| = 1 non-zero coefficients in each active group.

cooperative-Lasso 36
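The wave-shaped coefficient vector can be sketched as follows (which three of the ten groups are active is not specified on the slide, so the first three are an assumption, and the overall magnitude rescaling used to reach the target R² is omitted):

```python
import numpy as np

# beta*_j proportional to (h - |5 - j|)_+ at within-group positions
# j = 1..9, so h = 1..5 yields 1, 3, 5, 7, 9 non-zeros per active group.
def wave_beta(h, n_groups=10, group_size=9, n_active=3):
    j = np.arange(1, group_size + 1)
    wave = np.maximum(h - np.abs(5 - j), 0).astype(float)
    beta = np.zeros(n_groups * group_size)
    for k in range(n_active):                 # assume the first 3 groups are active
        beta[k * group_size:(k + 1) * group_size] = wave
    return beta

for h in range(1, 6):
    print(h, np.count_nonzero(wave_beta(h)[:9]))   # 1->1, 2->3, 3->5, 4->7, 5->9
```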

Page 80: Sparsity with sign-coherent groups of variables via the cooperative-Lasso

Breiman’s setupSimulations setting

A wave-like vector of parameters β?

I p = 90 variables partitioned into K = 10 groups of size pk = 9,

I 3 (partially) active groups, 6 groups of zeros,

I in active groups, β?j ∝ (h− |5− j|) with h = 1, . . . , 5.

0 20 40 60 80

Figure: β? with h = 2, |Sk| = 3 non-zero coefficients in each active group.

cooperative-Lasso 36

Page 81: Sparsity with sign-coherent groups of variables via the cooperative-Lasso

Breiman’s setupSimulations setting

A wave-like vector of parameters β?

I p = 90 variables partitioned into K = 10 groups of size pk = 9,

I 3 (partially) active groups, 6 groups of zeros,

I in active groups, β?j ∝ (h− |5− j|) with h = 1, . . . , 5.

0 20 40 60 80

Figure: β? with h = 3, |Sk| = 5 non-zero coefficients in each active group.

cooperative-Lasso 36

Page 82: Sparsity with sign-coherent groups of variables via the cooperative-Lasso

Breiman’s setupSimulations setting

A wave-like vector of parameters β?

I p = 90 variables partitioned into K = 10 groups of size pk = 9,

I 3 (partially) active groups, 6 groups of zeros,

I in active groups, β?j ∝ (h− |5− j|) with h = 1, . . . , 5.

0 20 40 60 80

Figure: β? with h = 4, |Sk| = 7 non-zero coefficients in each active group.

cooperative-Lasso 36

Page 83: Sparsity with sign-coherent groups of variables via the cooperative-Lasso

Breiman’s setupSimulations setting

A wave-like vector of parameters β?

I p = 90 variables partitioned into K = 10 groups of size pk = 9,

I 3 (partially) active groups, 6 groups of zeros,

I in active groups, β?j ∝ (h− |5− j|) with h = 1, . . . , 5.

0 20 40 60 80

Figure: β? with h = 5, |Sk| = 9 non-zero coefficients in each active group.

cooperative-Lasso 36

Page 84: Sparsity with sign-coherent groups of variables via the cooperative-Lasso

Breiman’s setupExample of path of solution and signal recovery with BIC choice

The signal strength is generated so as

I y = Xβ? + σε, with σ = 1, n = 30 to 500,

I X ∼ N (0,Ψ) with Ψij = ρ|i−j| (ρ = 0.4 in the example),

I magnitude in β chosen so as R2 ≈ 0.75.

Remark

Covariance structure is purposely disconnected from the group structure.

None of the support recovery conditions are fulfilled.

cooperative-Lasso 37
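Sampling rows from N(0, Ψ) with the AR(1)-type covariance Ψij = ρ^|i−j| can be sketched with a Cholesky factor (n = 120 here only to mirror the one-shot example):

```python
import numpy as np

# Rows of X drawn from N(0, Psi) with Psi_ij = rho^|i-j|, via Cholesky.
rng = np.random.default_rng(5)
n, p, rho = 120, 90, 0.4

idx = np.arange(p)
Psi = rho ** np.abs(idx[:, None] - idx[None, :])   # Toeplitz covariance
L = np.linalg.cholesky(Psi)                        # Psi = L L^T
X = rng.standard_normal((n, p)) @ L.T              # rows ~ N(0, Psi)

emp = np.corrcoef(X, rowvar=False)
print(round(emp[0, 1], 2))   # around rho = 0.4
```

The covariance is indexed by variable position, not by group, which is exactly the "disconnected from the group structure" remark above.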

Page 85: Sparsity with sign-coherent groups of variables via the cooperative-Lasso

Breiman’s setupExample of path of solution and signal recovery with BIC choice

The signal strength is generated so as

I y = Xβ? + σε, with σ = 1, n = 30 to 500,

I X ∼ N (0,Ψ) with Ψij = ρ|i−j| (ρ = 0.4 in the example),

I magnitude in β chosen so as R2 ≈ 0.75.

One shot sample with n = 120

cooperative-Lasso 37

Page 86: Sparsity with sign-coherent groups of variables via the cooperative-Lasso

Breiman’s setupExample of path of solution and signal recovery with BIC choice

The signal strength is generated so as

I y = Xβ? + σε, with σ = 1, n = 30 to 500,

I X ∼ N (0,Ψ) with Ψij = ρ|i−j| (ρ = 0.4 in the example),

I magnitude in β chosen so as R2 ≈ 0.75.

-0.4 -0.2 0.0 0.2 0.4 0.6 0.8 1.0

-0.2

0.0

0.2

0.4

0.6

log10(λ)

βlasso

0 20 40 60 80

-0.1

0.0

0.1

0.2

0.3

0.4

0.5

i

βlasso

True signal

Estimated signal

Figure: Lassocooperative-Lasso 37

Page 87: Sparsity with sign-coherent groups of variables via the cooperative-Lasso

Breiman’s setupExample of path of solution and signal recovery with BIC choice

The signal strength is generated so as

I y = Xβ? + σε, with σ = 1, n = 30 to 500,

I X ∼ N (0,Ψ) with Ψij = ρ|i−j| (ρ = 0.4 in the example),

I magnitude in β chosen so as R2 ≈ 0.75.

-0.4 -0.2 0.0 0.2 0.4 0.6 0.8

-0.1

0.0

0.1

0.2

0.3

0.4

0.5

log10(λ)

βgroup

0 20 40 60 80

-0.1

0.0

0.1

0.2

0.3

0.4

0.5

i

βgroup

True signal

Estimated signal

Figure: Group-Lassocooperative-Lasso 37

Page 88: Sparsity with sign-coherent groups of variables via the cooperative-Lasso

Breiman’s setupExample of path of solution and signal recovery with BIC choice

The signal strength is generated so as

I y = Xβ? + σε, with σ = 1, n = 30 to 500,

I X ∼ N (0,Ψ) with Ψij = ρ|i−j| (ρ = 0.4 in the example),

I magnitude in β chosen so as R2 ≈ 0.75.

-0.4 -0.2 0.0 0.2 0.4 0.6 0.8

-0.1

0.0

0.1

0.2

0.3

0.4

0.5

log10(λ)

βco

op

0 20 40 60 80

-0.1

0.0

0.1

0.2

0.3

0.4

0.5

i

βco

op

True signal

Estimated signal

Figure: Coop-Lassocooperative-Lasso 37

Page 89: Sparsity with sign-coherent groups of variables via the cooperative-Lasso

Breiman’s setupErrors as a function of the sample size n

pred

icti

on

erro

r

100 200 300 400 500

0.0

0.2

0.4

0.6

0.8

1.0

1.2

sig

ner

ror

100 200 300 400 500

0.00

0.05

0.10

0.15

0.20

0.25

0.30

n n

Figure: h = 3, |Sk| = 5 (favoring Lasso).

lasso group coop

cooperative-Lasso 38

Page 90: Sparsity with sign-coherent groups of variables via the cooperative-Lasso

Breiman’s setupErrors as a function of the sample size n

pred

icti

on

erro

r

100 200 300 400 500

0.0

0.2

0.4

0.6

0.8

1.0

1.2

sig

ner

ror

100 200 300 400 500

0.00

0.05

0.10

0.15

0.20

0.25

0.30

n n

Figure: h = 4, |Sk| = 7 (intermediate).

lasso group coop

cooperative-Lasso 38

Page 91: Sparsity with sign-coherent groups of variables via the cooperative-Lasso

Breiman’s setupErrors as a function of the sample size n

pred

icti

on

erro

r

100 200 300 400 500

0.0

0.2

0.4

0.6

0.8

1.0

1.2

sig

ner

ror

100 200 300 400 500

0.00

0.05

0.10

0.15

0.20

0.25

0.30

n n

Figure: h = 5, |Sk| = 9 (favoring group-Lasso).

lasso group coop

cooperative-Lasso 38

Page 92: Sparsity with sign-coherent groups of variables via the cooperative-Lasso

Outline

Definition

Resolution

Consistency

Model selection

Simulation studies

Sibling probe sets and gene selection

cooperative-Lasso 39

Page 93: Sparsity with sign-coherent groups of variables via the cooperative-Lasso

Robust microarray gene selection

Affymetrix chips typically contain multiple probe sets per gene, known as sibling probe sets.

Reasons (Li, Zhu, Cook, BMC Genomics 2008)

1. lack of knowledge: genome annotation maps probe sets to the same gene after chip design.

2. instability: probe sets cross-hybridize in an unpredictable manner.

3. designed on purpose: probe sets specific to RNA variants (splicing).

at least two good reasons to put sibling probe sets in the same group

cooperative-Lasso 40


Page 95: Sparsity with sign-coherent groups of variables via the cooperative-Lasso

Application: Basal tumor

Methodology

1. select a restricted number of d probes from differential analysis,

2. determine the genes associated with these d probes, then retrieve all p probes related to these genes, regardless of their signal,

3. fit a model with group penalties where groups are defined by genes.

Breast cancer data set

I 22269 probes,

I n = 29 patients with basal tumor,

I predict response to chemotherapy: pCR / not-pCR.

cooperative-Lasso 41

Page 96: Sparsity with sign-coherent groups of variables via the cooperative-Lasso

Application: Basal tumor

Pretreatment

I order the p-values from a differential analysis (Jeanmougin et al. 2011),

I keep the d = 10 most differentiated probes,

I this corresponds to exactly 10 genes for a total of 27 probes.

Methods comparison

1. probes: logistic Lasso on the d = 10 most differentiated probes,

2. lasso: logistic Lasso on the p = 27 probes (with no group effect),

3. group: logistic group-Lasso on the p = 27 probes (with group effect),

4. coop: logistic coop-Lasso on the p = 27 probes (with signed group effect).

cooperative-Lasso 42

Page 97: Sparsity with sign-coherent groups of variables via the cooperative-Lasso

Results

Gk (gene symbol)   pk   probes   lasso   group   coop
frmd4b             3    0.38     0.62    0.68    0.75
rnps1              2    0        0       0       0
phlda3             1    1.82     1.93    4.12    7.32
tbc1d22a           3    0        0       0       0
ece1               2    0.89     0       0       1.87
lzts1              6    1.34     1.57    1.15    0
rpp38              1    0.95     0.90    1.92    3.66
gtse1              5    0.88     0.85    1.21    0
pak4               3    1.68     0.96    1.70    4.58
chst10             1    0.79     0.36    1.08    2.50

Table: Genes corresponding to the probes selected by differential analysis, size of the groups of probes, and ℓ2-norm of each group of parameters for each estimate.

cooperative-Lasso 43

Page 98: Sparsity with sign-coherent groups of variables via the cooperative-Lasso

Results

[Figure: estimated coefficients per probe, grouped by gene]

Figure: Lasso

Gk (gene symbol) pk

frmd4b 3

rnps1 2

phlda3 1

tbc1d22a 3

ece1 2

lzts1 6

rpp38 1

gtse1 5

pak4 3

chst10 1

cooperative-Lasso 44

Page 99: Sparsity with sign-coherent groups of variables via the cooperative-Lasso

Results

[Figure: estimated coefficients per probe, grouped by gene]

Figure: Group-Lasso

Gk (gene symbol) pk

frmd4b 3

rnps1 2

phlda3 1

tbc1d22a 3

ece1 2

lzts1 6

rpp38 1

gtse1 5

pak4 3

chst10 1

cooperative-Lasso 44

Page 100: Sparsity with sign-coherent groups of variables via the cooperative-Lasso

Results

[Figure: estimated coefficients per probe, grouped by gene]

Figure: Coop-Lasso

Gk (gene symbol) pk

frmd4b 3

rnps1 2

phlda3 1

tbc1d22a 3

ece1 2

lzts1 6

rpp38 1

gtse1 5

pak4 3

chst10 1

cooperative-Lasso 44

Page 101: Sparsity with sign-coherent groups of variables via the cooperative-Lasso

Results

[Figure: binomial deviance as a function of ‖β̂‖ along the regularization path, for probes, lasso, group and coop]

          CV(λ⋆)   CV⋆
probes    0.511    0.474
lasso     0.513    0.499
group     0.430    0.372
coop      0.263    0.194

Table: best average CV score CV(λ⋆) and averaged best CV score CV⋆.

cooperative-Lasso 45

Page 102: Sparsity with sign-coherent groups of variables via the cooperative-Lasso

Conclusion

Summary

I A variant of the group-Lasso which assumes sign-coherent groups,possibly sparse.

I the coop-Lasso comes with the "usual" accompanying tools:
  I consistency theorem,
  I model selection criteria,
  I subset algorithm,
  I R-package scoop,

I very encouraging results on real genomic data.

Perspectives

I enhance algorithms/implementation for large-scale experiments,

I deeper analysis in the gene selection framework,

I other applications in genomics (aCGH segmentation?).

cooperative-Lasso 46

