Sparsity with sign-coherent groups of variables via the cooperative-Lasso


Julien Chiquet¹, Yves Grandvalet², Camille Charbonnier¹

¹ Statistique et Génome, CNRS & Université d’Évry Val d’Essonne

² Heudiasyc, CNRS & Université de Technologie de Compiègne

SSB – 29 March 2011

arXiv preprint.

http://arxiv.org/abs/1103.2697

R-package scoop.

http://stat.genopole.cnrs.fr/logiciels/scoop


Notations

Let
- Y be the output random variable,
- X = (X1, . . . , Xp) be the input random variables, where Xj is the jth predictor.

The data: given a sample (yi, xi), i = 1, . . . , n of i.i.d. realizations of (Y, X), denote
- y = (y1, . . . , yn)ᵀ the response vector,
- xj = (xj1, . . . , xjn)ᵀ the vector of data for the jth predictor,
- X the n × p design matrix whose jth column is xj,
- D = {i : (yi, xi) ∈ training set}, T = {i : (yi, xi) ∈ test set}.

Generalized linear models

Suppose Y depends linearly on X through a function g:

E(Y) = g(Xβ⋆).

We predict a response ŷi = g(xiβ̂) for any i ∈ T by solving

β̂ = argmax_β ℓD(β) = argmin_β Σ_{i∈D} Lg(yi, xiβ),

where Lg is a loss function depending on the function g. Typically,
- if Y is Gaussian and g = Id (OLS), Lg(y, xβ) = (y − xβ)²,
- if Y is binary and g : t ↦ (1 + e⁻ᵗ)⁻¹ (logistic regression), Lg(y, xβ) = −(y · xβ − log(1 + e^{xβ})),
or, more generally, any negative log-likelihood ℓ of an exponential family distribution.
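The two losses can be written down directly; here is a minimal sketch (plain numpy; these helper names are mine, not the scoop package’s API):

```python
import numpy as np

def squared_loss(y, xb):
    # OLS loss: Lg(y, xβ) = (y − xβ)²
    return (y - xb) ** 2

def logistic_loss(y, xb):
    # negative Bernoulli log-likelihood: −(y·xβ − log(1 + exp(xβ)))
    return -(y * xb - np.log1p(np.exp(xb)))

print(squared_loss(1.0, 0.2))    # 0.64
print(logistic_loss(1.0, 0.0))   # log(2) ≈ 0.693
```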


Estimation and selection at the group level

1. Structure: the set I = {1, . . . , p} splits into a known partition

   I = ∪_{k=1}^K Gk, with Gk ∩ Gℓ = ∅ for k ≠ ℓ.

2. Sparsity: the support S of β⋆ has few entries

   S = {i : β⋆i ≠ 0}, such that |S| ≪ p.

The group-Lasso estimator (Grandvalet and Canu ’98, Bakin ’99, Yuan and Lin ’06)

β̂group = argmin_{β∈Rp} −ℓD(β) + λ Σ_{k=1}^K wk ‖β_Gk‖,

- λ ≥ 0 controls the overall amount of penalty,
- wk > 0 adapts the penalty between groups (dropped hereafter).
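As a quick illustration of the penalty term, a sketch in numpy (groups are plain index lists here; this is not the scoop API):

```python
import numpy as np

def group_penalty(beta, groups, weights=None):
    # Σ_k wk ‖β_Gk‖ with the Euclidean norm within each group
    beta = np.asarray(beta, dtype=float)
    if weights is None:
        weights = [1.0] * len(groups)
    return sum(w * np.linalg.norm(beta[g]) for w, g in zip(weights, groups))

print(group_penalty([1.0, 1.0, 0.0, 0.0], [[0, 1], [2, 3]]))  # sqrt(2) ≈ 1.414
```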


Toy example: the prostate dataset

Examines the correlation between the prostate specific antigen and 8 clinical measures for 97 patients.

[Figure: Lasso — coefficient paths against λ (log scale)]
[Figure: hierarchical clustering of the 8 clinical measures]
[Figure: group-Lasso — coefficient paths against λ (log scale)]

- lcavol: log(cancer volume)
- lweight: log(prostate weight)
- age: age
- lbph: log(benign prostatic hyperplasia amount)
- svi: seminal vesicle invasion
- lcp: log(capsular penetration)
- gleason: Gleason score
- pgg45: percentage of Gleason scores 4 or 5

Application to splice site detection

Predict splice site status (0/1) from a sequence of 7 bases and their interactions.

[Figure: information content per position]
[Figure: coefficient paths — selected groups g18, g5, g4, g44, g54, g42, g49, g45, g61, colored by interaction order (0, 1, 2)]

- order 0: 7 factors with 4 levels,
- order 1: C(7,2) factors with 4² levels,
- order 2: C(7,3) factors with 4³ levels,
- using dummy coding for each factor, we form groups (see the sketch below).

L. Meier, S. van de Geer, P. Bühlmann, 2008. The group-Lasso for logistic regression, JRSS Series B.
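A sketch of how such groups can be enumerated (pure Python; the column layout of the dummy-coded design is an assumption of mine, not Meier et al.’s code):

```python
from itertools import combinations
from math import comb

positions, levels = range(7), 4
groups, start = [], 0
for order in (1, 2, 3):  # "order 0, 1, 2" in the slide's terminology
    for factor in combinations(positions, order):
        size = levels ** order          # 4, 4² or 4³ dummy columns
        groups.append((factor, range(start, start + size)))
        start += size

# 7 + C(7,2) + C(7,3) = 7 + 21 + 35 = 63 groups
assert len(groups) == 7 + comb(7, 2) + comb(7, 3)
print(len(groups), "groups,", start, "dummy columns in total")
```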

Group-Lasso limitations

1. Not a single zero should belong to a group with non-zeros.
   Strong group sparsity (Huang and Zhang, ’10, arXiv) establishes the conditions under which the group-Lasso outperforms the Lasso, and conversely.
2. No sign-coherence within groups.
   Sign-coherence is required if groups gather consonant variables, e.g., groups defined by clusters of positively correlated variables.

The cooperative-Lasso

A penalty which assumes a sign-coherent group structure, that is to say, groups which gather either
- non-positive,
- non-negative,
- or null parameters.


Motivation: multiple network inference

[Figure: three experiments, each with its own network inference step]

A group is a set of corresponding edges across tasks (e.g., the red or blue ones): sign-coherence matters!

J. Chiquet, Y. Grandvalet, C. Ambroise, 2010. Inferring multiple graphical structures, Statistics and Computing.

Motivation: joint segmentation of aCGH profiles

[Figure: log-ratios (CNVs) against position on the chromosome]

For a single profile:

minimize_{β∈Rp} ‖β − y‖², s.t. Σ_{i=1}^p |βi − βi−1| < s,

where
- y is a vector of Rp,
- β is a vector of Rp.

Motivation: joint segmentation of aCGH profiles

minimize_{β∈R^{n×p}} ‖β − Y‖², s.t. Σ_{i=1}^p ‖βi − βi−1‖ < s,

where
- Y is an n × p matrix of n profiles of size p,
- βi is a size-n vector with the ith probes of the n profiles,
- a group gathers every position i across the profiles.

Sign-coherence may avoid inconsistent variations across profiles.

K. Bleakley and J.-P. Vert, 2010. Joint segmentation of many aCGH profiles using fast group LARS, NIPS.


Outline

- Definition
- Resolution
- Consistency
- Model selection
- Simulation studies
- Sibling probe sets and gene selection

Outline: Definition

The cooperative-Lasso estimator

Definition

β̂coop = argmin_{β∈Rp} J(β), with J(β) = −ℓD(β) + λ‖β‖coop,

where, for any v ∈ Rp,

‖v‖coop = ‖v⁺‖group + ‖v⁻‖group = Σ_{k=1}^K ( ‖v⁺_Gk‖ + ‖v⁻_Gk‖ ),

and
- v⁺ = (v⁺1, . . . , v⁺p), with v⁺j = max(0, vj),
- v⁻ = (v⁻1, . . . , v⁻p), with v⁻j = max(0, −vj).
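In code, the coop-norm is a one-liner away from the group norm; a sketch assuming numpy and groups as index lists:

```python
import numpy as np

def coop_norm(v, groups):
    # ‖v‖coop = Σ_k ‖(v_Gk)⁺‖ + ‖(v_Gk)⁻‖
    v = np.asarray(v, dtype=float)
    vplus, vminus = np.maximum(v, 0.0), np.maximum(-v, 0.0)
    return sum(np.linalg.norm(vplus[g]) + np.linalg.norm(vminus[g])
               for g in groups)

groups = [[0, 1], [2, 3]]
print(coop_norm([1.0, 1.0, -1.0, -1.0], groups))  # 2·sqrt(2): sign-coherent groups
print(coop_norm([1.0, -1.0, 1.0, -1.0], groups))  # 4.0: mixed signs cost more
```

On sign-coherent groups the coop-norm coincides with the group norm; mixed-sign groups are penalized more heavily, which is exactly the behaviour the penalty is designed around.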

A geometric view of sparsity

[Figure: level sets of ℓ(β1, β2) over the (β1, β2) plane, together with the admissible set]

minimize_{β1,β2} −ℓ(β1, β2) + λΩ(β1, β2)
⇕
maximize_{β1,β2} ℓ(β1, β2) s.t. Ω(β1, β2) ≤ c


Ball crafting: group-Lasso

Admissible set
- β = (β1, β2, β3, β4)ᵀ,
- G1 = {1, 2}, G2 = {3, 4}.

Unit ball: ‖β‖group ≤ 1

[Figure: slices of the unit ball in the (β1, β3) plane, for β2 ∈ {0, 0.3} and β4 ∈ {0, 0.3}]


Ball crafting: cooperative-Lasso

Admissible set
- β = (β1, β2, β3, β4)ᵀ,
- G1 = {1, 2}, G2 = {3, 4}.

Unit ball: ‖β‖coop ≤ 1

[Figure: slices of the unit ball in the (β1, β3) plane, for β2 ∈ {0, 0.3} and β4 ∈ {0, 0.3}]


Outline: Resolution

Convex analysis: supporting hyperplane

A hyperplane supports a set iff
- the set is contained in one half-space,
- the set has at least one point on the hyperplane.

[Figure: a convex set in the (β1, β2) plane with supporting hyperplanes at several boundary points]

There are supporting hyperplanes at all points of a convex set: they generalize tangents.

Convex analysis: dual cone and subgradient

Generalizes normals.

[Figure: normal cones at several points of a convex set in the (β1, β2) plane]

g is a subgradient at x
⇕
the vector (g, −1) is normal to the supporting hyperplane at this point.

The subdifferential at x is the set of all subgradients at x.


Optimality conditions

Theorem. A necessary and sufficient condition for the optimality of β̂ is that the null vector 0 belongs to the subdifferential of the convex function J:

0 ∈ ∂βJ(β̂) = { v ∈ Rp : v = −∇βℓ(β̂) + λθ },

where θ ∈ Rp belongs to the subdifferential of the coop-norm. Define

ϕj(v) = (sign(vj) v)⁺;

then θ is such that

∀k ∈ {1, . . . , K}, ∀j ∈ Sk(β̂): θj = β̂j / ‖ϕj(β̂_Gk)‖,
∀k ∈ {1, . . . , K}, ∀j ∈ Sck(β̂): ‖ϕj(θ_Gk)‖ ≤ 1.

We derive a subset algorithm to solve this problem (which you can enjoy in the paper and the package).
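Concretely, the map ϕj selects the part of a group that is sign-coherent with its jth entry; a numpy sketch:

```python
import numpy as np

def phi(v, j):
    # ϕj(v) = (sign(vj)·v)⁺: magnitudes of the entries of v sharing the
    # sign of vj, zeros elsewhere
    v = np.asarray(v, dtype=float)
    return np.maximum(np.sign(v[j]) * v, 0.0)

v = np.array([2.0, -1.0, 3.0])
print(phi(v, 0))  # [2. 0. 3.] — entries sign-coherent with v0
print(phi(v, 1))  # [0. 1. 0.] — entries sign-coherent with v1
```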


Linear regression with orthonormal design

Consider

β̂ = argmin_β { ½ ‖y − Xβ‖² + λΩ(β) },

with XᵀX = I. Hence (xj)ᵀ(Xβ − y) = βj − β̂ols_j, and

β̂ = argmin_β { ½ βᵀβ − βᵀβ̂ols + λΩ(β) }.

We may find a closed form of β̂ for, e.g.,
1. Ω(β) = ‖β‖lasso,
2. Ω(β) = ‖β‖group,
3. Ω(β) = ‖β‖coop.


Linear regression with orthonormal design

∀j ∈ {1, . . . , p},

β̂lasso_j = (1 − λ/|β̂ols_j|)⁺ β̂ols_j,  that is,  |β̂lasso_j| = (|β̂ols_j| − λ)⁺.

[Figure: Lasso estimate as a function of the OLS coefficients]

Linear regression with orthonormal design

∀k ∈ {1, . . . , K}, ∀j ∈ Gk,

β̂group_j = (1 − λ/‖β̂ols_Gk‖)⁺ β̂ols_j,  so that  ‖β̂group_Gk‖ = (‖β̂ols_Gk‖ − λ)⁺.

[Figure: group-Lasso estimate as a function of the OLS coefficients]

Linear regression with orthonormal design

∀k ∈ {1, . . . , K}, ∀j ∈ Gk,

β̂coop_j = (1 − λ/‖ϕj(β̂ols_Gk)‖)⁺ β̂ols_j,  so that  ‖ϕj(β̂coop_Gk)‖ = (‖ϕj(β̂ols_Gk)‖ − λ)⁺.

[Figure: coop-Lasso estimate as a function of the OLS coefficients]
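The three closed forms translate directly into shrinkage operators on the OLS coefficients; a sketch (numpy; exact only under the orthonormal assumption XᵀX = I of these slides):

```python
import numpy as np

def prox_lasso(b_ols, lam):
    # soft-thresholding: |β̂j| = (|β̂ols_j| − λ)⁺
    return np.sign(b_ols) * np.maximum(np.abs(b_ols) - lam, 0.0)

def prox_group(b_ols, lam, groups):
    # groupwise shrinkage: β̂_Gk = (1 − λ/‖β̂ols_Gk‖)⁺ β̂ols_Gk
    b = np.zeros_like(b_ols)
    for g in groups:
        norm = np.linalg.norm(b_ols[g])
        if norm > lam:
            b[g] = (1.0 - lam / norm) * b_ols[g]
    return b

def prox_coop(b_ols, lam, groups):
    # each coefficient is shrunk by the norm of the sign-coherent part
    # ϕj(β̂ols_Gk) of its own group
    b = np.zeros_like(b_ols)
    for g in groups:
        sub = b_ols[g]
        for pos, j in enumerate(g):
            norm = np.linalg.norm(np.maximum(np.sign(sub[pos]) * sub, 0.0))
            if norm > lam:
                b[j] = (1.0 - lam / norm) * sub[pos]
    return b

b_ols, groups = np.array([1.5, -0.2, 0.8, 0.1]), [[0, 1], [2, 3]]
print(prox_lasso(b_ols, 0.5))
print(prox_group(b_ols, 0.5, groups))
print(prox_coop(b_ols, 0.5, groups))
```

Note how the coop update zeroes the small negative coefficient of the first group while keeping the group alive: shrinkage is driven by the sign-coherent part only.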

Outline: Consistency

Linear regression setup: technical assumptions

(A1) X and Y have finite fourth-order moments: E‖X‖⁴ < ∞, E|Y|⁴ < ∞,

(A2) the covariance matrix Ψ = E[XXᵀ] ∈ Rp×p is invertible,

(A3) for every k = 1, . . . , K: if ‖(β⋆_Gk)⁺‖ > 0 and ‖(β⋆_Gk)⁻‖ > 0, then β⋆j ≠ 0 for every j ∈ Gk (all sign-coherent groups are either included in or excluded from the true support).

Irrepresentability condition

Define Sk = S ∩ Gk, the support within group k, and

[D(β)]jj = ‖[sign(βj) β_Gk]⁺‖⁻¹.

Assume there exists η > 0 such that

(A4) for every group Gk including at least one null coefficient:

max( ‖(Ψ_SckS Ψ⁻¹_SS D(β⋆_S) β⋆_S)⁺‖, ‖(Ψ_SckS Ψ⁻¹_SS D(β⋆_S) β⋆_S)⁻‖ ) ≤ 1 − η,

(A5) for every group Gk intersecting the support and including either positive or negative coefficients, let νk be the sign of these coefficients (νk = 1 if ‖(β⋆_Gk)⁺‖ > 0 and νk = −1 if ‖(β⋆_Gk)⁻‖ > 0):

νk Ψ_SckS Ψ⁻¹_SS D(β⋆_S) β⋆_S ⪯ 0,

where ⪯ denotes componentwise inequality.

Consistency results

Theorem. If assumptions (A1–A5) are satisfied for some η > 0, then for every sequence λn such that λn = λ0 n^{−γ}, γ ∈ ]0, 1/2[,

β̂coop →P β⋆ and P(S(β̂coop) = S) → 1.

Asymptotically, the cooperative-Lasso is unbiased and enjoys exact support recovery (even when there are irrelevant variables within a group).

Sketch of the proof

1. Construct an artificial estimator β̄S restricted to the true support S and extend it with 0 coefficients on Sc.

2. Consider the event En on which β̄ satisfies the original optimality conditions. On En, β̄S = β̂coop_S and β̂coop_Sc = 0, by uniqueness.

3. We need to prove that lim_{n→∞} P(En) = 1.

4. Derive the asymptotic distribution of the derivative of the loss function, Xᵀ(y − Xβ), from
   - the CLT on second-order moments,
   - the optimality conditions on β̄S;
   the right choice of λn then provides convergence in probability.

5. Assumptions (A4–A5) state that the limits in probability satisfy the optimality constraints with strict inequalities.

6. As a result, the optimality conditions are satisfied (with weak inequalities) with probability tending to 1.


Illustration

Generate data y = Xβ⋆ + σε,
- β⋆ = (1, 1, −1, −1, 0, 0, 0, 0),
- G = {1, 2}, {3, 4}, {5, 6}, {7, 8},
- σ = 0.1, R² ≈ 0.99, n = 20,
- the irrepresentability condition holds for the coop-Lasso but not for the group-Lasso,
- average over 100 simulations (a data-generation sketch follows below).

[Figure: group-Lasso — coefficient paths against log10(λ), with 50% coverage intervals (upper/lower quartiles)]
[Figure: coop-Lasso — coefficient paths against log10(λ), with 50% coverage intervals (upper/lower quartiles)]
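A sketch of this simulated design (numpy; the slide does not specify the distribution of X, so an i.i.d. standard normal design is assumed here):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
beta_star = np.array([1.0, 1.0, -1.0, -1.0, 0.0, 0.0, 0.0, 0.0])
groups = [[0, 1], [2, 3], [4, 5], [6, 7]]
sigma = 0.1

X = rng.standard_normal((n, beta_star.size))   # assumed design; not given on the slide
y = X @ beta_star + sigma * rng.standard_normal(n)
```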

Outline: Model selection

Optimism of the training error

- The training error:

  err = (1/|D|) Σ_{i∈D} L(yi, xiβ̂).

- The test error (“extra-sample” error):

  Err_ex = E_{X,Y}[ L(Y, Xβ̂) | D ].

- The “in-sample” error:

  Err_in = (1/|D|) Σ_{i∈D} E_Y[ L(Yi, xiβ̂) | D ].

Definition (Optimism)

Err_in = err + “optimism”.


Cp statistics

For squared-error loss (and some other losses),

Err_in = err + (2/|D|) Σ_{i∈D} cov(ŷi, yi).

“The amount by which err underestimates the true error depends on how strongly yi affects its own prediction. The harder we fit the data, the greater the covariance will be, thereby increasing the optimism.” (ESL II, 5th printing)

Mallows’ Cp statistic

For a linear regression fit ŷ with p inputs, Σ_{i∈D} cov(ŷi, yi) = pσ², so:

Cp = err + 2 (df/|D|) σ², with df = p.


Generalized degrees of freedom

Let ŷ(λ) = Xβ̂(λ) be the predicted values for a penalized estimator.

Proposition (Efron ’04 + Stein’s lemma ’81)

df(λ) := (1/σ²) Σ_{i∈D} cov(ŷi(λ), yi) = E_y[ tr(∂ŷ(λ)/∂y) ].

For the Lasso, Zou et al. (’07) show that

df_lasso(λ) = ‖β̂lasso(λ)‖0.

Assuming XᵀX = I, Yuan and Lin (’06) show for the group-Lasso that the trace term equals

df_group(λ) = Σ_{k=1}^K 1(‖β̂group_Gk(λ)‖ > 0) [ 1 + (pk − 1) ‖β̂group_Gk(λ)‖ / ‖β̂ols_Gk‖ ].
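Both formulas are straightforward to evaluate along a path; a sketch (numpy; orthonormal design assumed, as on the slide):

```python
import numpy as np

def df_lasso(b_hat):
    # ‖β̂lasso(λ)‖0: number of non-zero coefficients
    return int(np.count_nonzero(b_hat))

def df_group(b_hat, b_ols, groups):
    # Σ_k 1(‖β̂group_Gk‖ > 0) [1 + (pk − 1)‖β̂group_Gk‖/‖β̂ols_Gk‖]
    df = 0.0
    for g in groups:
        norm_hat = np.linalg.norm(b_hat[g])
        if norm_hat > 0.0:
            df += 1.0 + (len(g) - 1) * norm_hat / np.linalg.norm(b_ols[g])
    return df
```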


Approximated degrees of freedom for the coop-Lasso

Proposition. Assuming that the data are generated according to a linear regression model and that X is orthonormal, the following expression of df_coop(λ) is an unbiased estimate of df(λ):

df_coop(λ) = Σ_{k=1}^K 1{‖(β̂coop_Gk(λ))⁺‖ > 0} [ 1 + (pk⁺ − 1) ‖(β̂coop_Gk(λ))⁺‖ / ‖(β̂ols_Gk)⁺‖ ]
            + 1{‖(β̂coop_Gk(λ))⁻‖ > 0} [ 1 + (pk⁻ − 1) ‖(β̂coop_Gk(λ))⁻‖ / ‖(β̂ols_Gk)⁻‖ ],

where pk⁺ and pk⁻ are respectively the number of positive and negative entries in β̂ols_Gk. With a ridge reference β̂ridge(γ) in place of the OLS estimate, the same expression reads

df_coop(λ) = Σ_{k=1}^K 1{‖(β̂coop_Gk(λ))⁺‖ > 0} [ 1 + (pk⁺ − 1)/(1 + γ) · ‖(β̂coop_Gk(λ))⁺‖ / ‖(β̂ridge_Gk(γ))⁺‖ ]
            + 1{‖(β̂coop_Gk(λ))⁻‖ > 0} [ 1 + (pk⁻ − 1)/(1 + γ) · ‖(β̂coop_Gk(λ))⁻‖ / ‖(β̂ridge_Gk(γ))⁻‖ ],

where pk⁺ and pk⁻ are now the number of positive and negative entries in β̂ridge_Gk(γ).

Approximated information criteria

Following Zou et al., we extend the Cp statistic to an “approximated” AIC,

AIC(λ) = ‖y − ŷ(λ)‖²/σ² + 2 df(λ),

and from the AIC there is a (small) step to the BIC:

BIC(λ) = ‖y − ŷ(λ)‖²/σ² + log(n) df(λ).

- K-fold cross-validation works well but is computationally intensive.
- It is required when the linear regression setup does not hold.
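Given any of the df estimates above, selecting λ takes a few lines; a sketch (numpy; σ² is assumed known or pre-estimated):

```python
import numpy as np

def aic(y, y_hat, df, sigma2):
    return np.sum((y - y_hat) ** 2) / sigma2 + 2.0 * df

def bic(y, y_hat, df, sigma2, n):
    return np.sum((y - y_hat) ** 2) / sigma2 + np.log(n) * df

# along a path: pick the index of λ minimizing the criterion, e.g.
# best = min(range(len(lambdas)),
#            key=lambda l: bic(y, y_hats[l], dfs[l], sigma2, len(y)))
```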

Outline: Simulation studies

Revisiting Elastic-Net experiments (1)

[Figure: boxplots of the MSE for lasso, enet, group, and coop]

Generate data y = Xβ⋆ + σε,
- β⋆ = (0, . . . , 0, 2, . . . , 2, 0, . . . , 0, 2, . . . , 2), in four blocks of 10,
- G1 = {1, . . . , 10}, G2 = {11, . . . , 20}, G3 = {21, . . . , 30}, G4 = {31, . . . , 40},
- σ = 15, corr(xi, xj) = 0.5,
- training/validation/test = 100/100/400,
- average over 100 simulations.

Revisiting Elastic-Net experiments (2)

[Figure: boxplots of the MSE for lasso, enet, group, and coop]

Generate data y = Xβ⋆ + σε,
- β⋆ = (3, . . . , 3, 0, . . . , 0), with 15 threes and 25 zeros,
- σ = 15,
- G1 = {1, . . . , 5}, G2 = {6, . . . , 10}, G3 = {11, . . . , 15}, G4 = {16, . . . , 40},
- xj = Z1 + ε, Z1 ∼ N(0, 1), ∀j ∈ G1,
- xj = Z2 + ε, Z2 ∼ N(0, 1), ∀j ∈ G2,
- xj = Z3 + ε, Z3 ∼ N(0, 1), ∀j ∈ G3,
- xj ∼ N(0, 1), ∀j ∈ G4,
- training/validation/test = 50/50/400,
- average over 100 simulations.

Breiman’s setup: simulation settings

A wave-like vector of parameters β⋆:
- p = 90 variables partitioned into K = 10 groups of size pk = 9,
- 3 (partially) active groups, 6 groups of zeros,
- in active groups, β⋆j ∝ (h − |5 − j|)⁺ with h = 1, . . . , 5.

[Figure: β⋆ for h = 1, . . . , 5, giving |Sk| = 1, 3, 5, 7, 9 non-zero coefficients in each active group]

Breiman’s setup: example of solution paths and signal recovery with the BIC choice

The signal is generated so that
- y = Xβ⋆ + σε, with σ = 1 and n = 30 to 500,
- X ∼ N(0, Ψ) with Ψij = ρ^|i−j| (ρ = 0.4 in the example),
- the magnitude of β⋆ is chosen so that R² ≈ 0.75.

Remark: the covariance structure is purposely disconnected from the group structure; none of the support recovery conditions is fulfilled.

One-shot sample with n = 120 (a data-generation sketch follows below):

[Figure: Lasso — coefficient paths against log10(λ), and estimated vs. true signal]
[Figure: group-Lasso — coefficient paths against log10(λ), and estimated vs. true signal]
[Figure: coop-Lasso — coefficient paths against log10(λ), and estimated vs. true signal]
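A sketch of this setup (numpy; which three groups are active and the exact rescaling to R² ≈ 0.75 are not specified on the slides, so the first three groups and the raw wave magnitudes are used here):

```python
import numpy as np

rng = np.random.default_rng(1)
K, pk, h, rho, sigma, n = 10, 9, 3, 0.4, 1.0, 120
p = K * pk

beta = np.zeros(p)
j = np.arange(1, pk + 1)
wave = np.maximum(h - np.abs(5 - j), 0)           # the wave, |Sk| = 2h − 1 non-zeros
for k in range(3):                                 # assumed: the first 3 groups are active
    beta[k * pk:(k + 1) * pk] = wave

idx = np.arange(p)
Psi = rho ** np.abs(idx[:, None] - idx[None, :])   # Ψij = ρ^|i−j|
X = rng.multivariate_normal(np.zeros(p), Psi, size=n)
y = X @ beta + sigma * rng.standard_normal(n)
```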

Breiman’s setup: errors as a function of the sample size n

[Figure: prediction error and sign error against n ∈ [100, 500] for lasso, group, and coop, in three regimes: h = 3, |Sk| = 5 (favoring the Lasso); h = 4, |Sk| = 7 (intermediate); h = 5, |Sk| = 9 (favoring the group-Lasso)]

Outline: Sibling probe sets and gene selection

Robust microarray gene selection

Affymetrix chips typically contain multiple probe sets per gene, known as sibling probe sets.

Reasons (Li, Zhu, Cook, BMC Genomics 2008):
1. lack of knowledge: genome annotation maps probe sets to the same gene after chip design,
2. instability: probe sets cross-hybridize in an unpredictable manner,
3. designed on purpose: probe sets specific to an RNA variant (splicing).

At least two good reasons to put sibling probe sets in the same group.


Application: basal tumor

Methodology
1. select a restricted number d of probes from a differential analysis,
2. determine the genes associated with these d probes, and retrieve all p probes related to these genes, regardless of their signal,
3. fit a model with group penalties, where groups are defined by genes.

Breast cancer data set
- 22,269 probes,
- n = 29 patients with basal tumor,
- predict the response to chemotherapy: pCR / not-pCR.

Application: basal tumor

Pretreatment
- order the p-values from the differential analysis (Jeanmougin et al. 2011),
- keep the d = 10 most differentiated probes,
- this corresponds to exactly 10 genes, for a total of 27 probes.

Methods comparison
1. probes: logistic Lasso on the d = 10 most differentiated probes,
2. lasso: logistic Lasso on the p = 27 probes (with no group effect),
3. group: logistic group-Lasso on the p = 27 probes (with group effect),
4. coop: logistic coop-Lasso on the p = 27 probes (with signed group effect).

Results

Gk (gene symbol) | pk | probes | lasso | group | coop
frmd4b           | 3  | 0.38   | 0.62  | 0.68  | 0.75
rnps1            | 2  | 0      | 0     | 0     | 0
phlda3           | 1  | 1.82   | 1.93  | 4.12  | 7.32
tbc1d22a         | 3  | 0      | 0     | 0     | 0
ece1             | 2  | 0.89   | 0     | 0     | 1.87
lzts1            | 6  | 1.34   | 1.57  | 1.15  | 0
rpp38            | 1  | 0.95   | 0.90  | 1.92  | 3.66
gtse1            | 5  | 0.88   | 0.85  | 1.21  | 0
pak4             | 3  | 1.68   | 0.96  | 1.70  | 4.58
chst10           | 1  | 0.79   | 0.36  | 1.08  | 2.50

Table: Genes corresponding to the probes selected by differential analysis, size of the groups of probes, and ℓ2-norm of each group of parameters for each estimate.

Results

[Figure: fitted coefficients per probe for the Lasso, group-Lasso, and coop-Lasso, with probes grouped by gene: frmd4b (3), rnps1 (2), phlda3 (1), tbc1d22a (3), ece1 (2), lzts1 (6), rpp38 (1), gtse1 (5), pak4 (3), chst10 (1)]

Results

[Figure: binomial deviance against ‖β‖ along the regularization path, for probes, lasso, group, and coop]

method | CV(λ⋆) | CV⋆
probes | 0.511  | 0.474
lasso  | 0.513  | 0.499
group  | 0.430  | 0.372
coop   | 0.263  | 0.194

Table: Best average CV score CV(λ⋆) and averaged best CV score CV⋆.

Conclusion

Summary
- a variant of the group-Lasso which assumes sign-coherent, possibly sparse, groups,
- the coop-Lasso comes with the “usual” accompanying tools:
  - consistency theorem,
  - model selection criteria,
  - subset algorithm,
  - the R-package scoop,
- very encouraging results on real genomic data.

Perspectives
- enhance the algorithms/implementation for large-scale experiments,
- deeper analysis in the gene selection framework,
- other applications in genomics (aCGH segmentation?).