Center for Evolutionary Functional Genomics
Large-Scale Sparse Logistic Regression
Jieping Ye
Arizona State University
Joint work with Jun Liu and Jianhui Chen
Prediction: disease or not
Confidence (probability)
Identify informative features
Sparse Logistic Regression
Logistic Regression
Logistic Regression (LR) has been applied to:
Document classification (Brzezinski, 1999)
Natural language processing (Jurafsky and Martin, 2000)
Computer vision (Friedman et al., 2000)
Bioinformatics (Liao and Chin, 2007)
Regularization is commonly applied to reduce overfitting and obtain a robust classifier. Two well-known regularizations are:
L2-norm regularization (Minka, 2007)
L1-norm regularization (Koh et al., 2007)
Sparse Logistic Regression
L1-norm regularization leads to sparse logistic regression (SLR):
Simultaneous feature selection and classification
Enhanced model interpretability
Improved classification performance
Applications:
M.-Y. Park and T. Hastie. Penalized Logistic Regression for Detecting Gene Interactions. Biostatistics, 2008.
T. Wu et al. Genomewide Association Analysis by Lasso Penalized Logistic Regression. Bioinformatics, 2009.
Large-Scale Sparse Logistic Regression
Many applications involve data of high dimensionality:
The MRI images used in an Alzheimer's Disease study contain more than 1 million voxels (features)

Major challenge: how to scale sparse logistic regression to large-scale problems?
The Proposed Lassplore Algorithm
Lassplore (LArge-Scale SParse LOgistic REgression) is a first-order method
Each iteration of Lassplore involves only matrix-vector multiplications
Scales to large-size problems
Efficient for sparse data
Lassplore achieves the optimal convergence rate among all first-order methods
Outline
Logistic Regression
Sparse Logistic Regression
Lassplore
Experiments
Logistic Regression (1)
The logistic regression model is given by

$\mathrm{Prob}(b \mid a) = \sigma\left(b (w^T a + c)\right) = \frac{1}{1 + \exp\left(-b (w^T a + c)\right)}$

where $a \in \mathbb{R}^n$ is the sample and $b \in \{-1, +1\}$ is the class label.
Logistic Regression (2)
Given a set of $m$ training samples $\{(a_i, b_i)\}_{i=1}^{m}$, we can compute $w$ and $c$ by minimizing the average logistic loss:

$\min_{w, c} \; \frac{1}{m} \sum_{i=1}^{m} \log\left(1 + \exp\left(-b_i (w^T a_i + c)\right)\right)$

This is equivalent to maximizing the likelihood $\prod_{i} \mathrm{Prob}(b_i \mid a_i)$; without regularization, the resulting classifier is prone to overfitting.
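As a concrete sketch (our own illustration, not the authors' code), the average logistic loss and its gradient can be evaluated using matrix-vector products with the data matrix only; the function `avg_logistic_loss_grad` and its argument names are ours:

```python
import numpy as np

def avg_logistic_loss_grad(w, c, A, b):
    """Average logistic loss and its gradient.

    A is the (m, n) matrix whose rows are the samples a_i; b holds the
    labels in {-1, +1}. Only products with A and A.T are required.
    """
    t = b * (A @ w + c)                      # t_i = b_i (w^T a_i + c)
    loss = np.mean(np.logaddexp(0.0, -t))    # stable log(1 + exp(-t_i))
    s = -b / (1.0 + np.exp(t)) / len(b)      # s_i = b_i * dloss/dt_i
    return loss, A.T @ s, np.sum(s)          # loss, grad in w, grad in c
```

Since only `A @ w` and `A.T @ s` touch the data, sparse matrix storage makes each evaluation proportional to the number of nonzeros.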
L1-ball Constrained Logistic Regression
The L1-ball constrained formulation is

$\min_{w, c} \; \frac{1}{m} \sum_{i=1}^{m} \log\left(1 + \exp\left(-b_i (w^T a_i + c)\right)\right) \quad \text{subject to} \quad \|w\|_1 \le z$

Favorable properties:
Obtaining a sparse solution
Performing feature selection and classification simultaneously
Improving classification performance

How can we solve the L1-ball constrained optimization problem?
Gradient Method for Sparse Logistic Regression
Let us consider gradient descent for solving the optimization problem

$\min_{x \in G} \; g(x)$

with the iteration

$x_{k+1} = x_k - g'(x_k) / L_k$

followed by the Euclidean projection of $x_{k+1}$ onto the constraint set $G$.
Euclidean Projection onto the L1-Ball
[Figure: the Euclidean projections $\pi(v_1)$, $\pi(v_2)$, $\pi(v_3)$ of points $v_1$, $v_2$, $v_3$ onto the L1-ball of radius $z$]
The Euclidean projection onto the L1-ball (Duchi et al., 2008) is a building block, and it can be solved in linear time (Liu and Ye, 2009).
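For reference, a standard sort-based $O(n \log n)$ version of this projection (following Duchi et al., 2008) can be sketched as below; the linear-time method of Liu and Ye (2009) improves on the threshold search. The function name is our own:

```python
import numpy as np

def project_l1_ball(v, z):
    """Euclidean projection of v onto the L1-ball {x : ||x||_1 <= z}.

    Sort-based O(n log n) variant; the projection soft-thresholds v
    at a level theta chosen so the result has L1-norm exactly z.
    """
    if np.abs(v).sum() <= z:
        return v.copy()                     # already inside the ball
    u = np.sort(np.abs(v))[::-1]            # magnitudes, descending
    css = np.cumsum(u)
    # largest index rho with u[rho] * (rho + 1) > css[rho] - z
    rho = np.nonzero(u * np.arange(1, len(u) + 1) > (css - z))[0][-1]
    theta = (css[rho] - z) / (rho + 1.0)    # soft-threshold level
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)
```

For example, projecting $(3, 1)$ onto the L1-ball of radius 2 gives $(2, 0)$: the threshold $\theta = 1$ shrinks the large coordinate and zeroes out the small one.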
Gradient Method & Nesterov’s Method (1)
Convergence rates:

smooth and convex $g(.)$: Gradient Descent $O(1/k)$; Nesterov's method $O(1/k^2)$
smooth and strongly convex $g(.)$ with condition number $C$: Gradient Descent $O\left(\left(\frac{C-1}{C+1}\right)^{2k}\right)$; Nesterov's method $O\left(\left(1 - \frac{1}{\sqrt{C}}\right)^{k}\right)$
Nesterov's method achieves the lower complexity bound of smooth optimization by first-order black-box methods, and is thus an optimal method.
Gradient Method & Nesterov’s Method (2)
The theoretical number of iterations (up to a constant factor) for achieving an accuracy of $10^{-8}$:

smooth and convex $g(.)$: Gradient Descent $10^8$; Nesterov's method $10^4$
smooth and strongly convex $g(.)$ with condition number $C = 10^4$: Gradient Descent $4.6 \times 10^4$; Nesterov's method $1.8 \times 10^3$
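These counts follow from the rates on the previous slide; as a back-of-the-envelope check (our own arithmetic), approximating the two contractions by $e^{-4k/C}$ and $e^{-k/\sqrt{C}}$ respectively:

```python
import math

eps, C = 1e-8, 1e4
# Gradient descent: ((C-1)/(C+1))^(2k) ~ exp(-4k/C)  =>  k ~ C*ln(1/eps)/4
k_gd = C * math.log(1.0 / eps) / 4.0
# Nesterov: (1 - 1/sqrt(C))^k ~ exp(-k/sqrt(C))  =>  k ~ sqrt(C)*ln(1/eps)
k_nesterov = math.sqrt(C) * math.log(1.0 / eps)
print(f"{k_gd:.1e} vs {k_nesterov:.1e}")  # roughly 4.6e4 vs 1.8e3
```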
Characteristics of Lassplore

First-order black-box oracle-based method: at each iteration, we only need to evaluate the function value and the gradient
Utilizes Nesterov's method (Nesterov, 2003):
Global convergence rate of $O(1/k^2)$ for the general case
Linear convergence rate for the strongly convex case
An adaptive line search scheme: the step size is allowed to increase during the iterations
This line search scheme is applicable to general smooth convex optimization
Key Components and Settings
The two key steps per iteration are

$s_k = x_k + \beta_k (x_k - x_{k-1})$
$x_{k+1} = s_k - g'(s_k) / L_k$

where the search point $s_k$ combines the two latest approximate solutions $x_k$ and $x_{k-1}$, and the new point $x_{k+1}$ is obtained by a gradient step from $s_k$ with step size $1/L_k$.

Previous schemes for $\beta_k, L_k$:
Nesterov's constant scheme (Nesterov, 2003)
Nemirovski's line search scheme (Nemirovski, 1994)
Previous Line Search Schemes
Nesterov's constant scheme (Nesterov, 2003): $L_k$ is set to a constant value $L$, the Lipschitz constant of the gradient of $g(.)$; $\beta_k$ depends on the condition number $C$

Nemirovski's line search scheme (Nemirovski, 1994): $L_k$ is allowed to increase and is upper-bounded by $2L$; $\beta_k$ is identical for every function $g(.)$
Proposed Line Search Scheme

Characteristics:
$L_k$ is allowed to be adaptively tuned (both increasing and decreasing) and is upper-bounded by $2L$
$\beta_k$ depends on $L_k$
It preserves the optimal convergence rate (see the paper for the technical proof)
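Putting the pieces together, a simplified sketch of an accelerated projected-gradient loop with an adaptive step (our own illustration; the paper's exact line search condition and the update of $\beta_k$ differ) might look like:

```python
import numpy as np

def lassplore_sketch(f, grad, project, x0, L=1.0, max_iter=200, tol=1e-8):
    """Nesterov-style accelerated projected gradient with an adaptive step.

    L may shrink between iterations (letting the step 1/L grow) and is
    increased by backtracking until a sufficient-decrease condition holds.
    """
    x_prev = x = x0.copy()
    t_prev = t = 1.0
    for _ in range(max_iter):
        beta = (t_prev - 1.0) / t
        s = x + beta * (x - x_prev)          # search point s_k
        g = grad(s)
        L = max(L / 2.0, 1e-12)              # let the step size grow again
        while True:                          # backtracking: increase L until
            x_new = project(s - g / L)       # sufficient decrease holds
            d = x_new - s
            if f(x_new) <= f(s) + g @ d + 0.5 * L * (d @ d):
                break
            L *= 2.0
        x_prev, x = x, x_new
        t_prev, t = t, (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        if np.linalg.norm(x - x_prev) <= tol * max(1.0, np.linalg.norm(x)):
            break
    return x
```

With `f`/`grad` set to the average logistic loss and `project` to the L1-ball projection, this gives the overall shape of an L1-ball constrained solver.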
Related Work
Y. Nesterov. Gradient Methods for Minimizing Composite Objective Function. Technical Report 2007/76.
S. Becker, J. Bobin, and E. J. Candès. NESTA: A Fast and Accurate First-Order Method for Sparse Recovery. 2009.
A. Beck and M. Teboulle. A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems. SIAM Journal on Imaging Sciences, 2, 183-202, 2009.
K.-C. Toh and S. Yun. An accelerated proximal gradient algorithm for nuclear norm regularized least squares problems. Preprint, National University of Singapore, March 2009.
S. Ji and J. Ye. An Accelerated Gradient Method for Trace Norm Minimization. The Twenty-Sixth International Conference on Machine Learning, 2009.
Experiments: Data Sets
Comparison of the Line Search Schemes
Comparison of the proposed adaptive scheme (Adap) with the scheme proposed by Nemirovski (Nemi)
[Figures: $L_k$ and objective value versus iteration for the two schemes]
Pathwise Solutions: Warm Start vs. Cold Start
Comparison with ProjectionL1 (Schmidt et al., 2007)
Adaptive Scheme
Comparison with l1-logreg (Koh et al., 2007)
Drosophila Gene Expression Image Analysis
Drosophila embryogenesis is divided into 17 developmental stages (1-17)
Image databases: BDGP and Fly-FISH
Sparse Logistic Regression: Application (2)
Summary
The Lassplore algorithm for sparse logistic regression:
First-order black-box method
Optimal convergence rate
Adaptive line search scheme

Future work:
Apply the proposed approach to other mixed-norm regularized optimization problems
Biological image analysis
The Lassplore Package
http://www.public.asu.edu/~jye02/Software/lassplore/
Thank you!