Page 1: IEEE SIGNAL PROCESSING MAGAZINE 1 Portfolio Optimization


Portfolio Optimization[Where Signal Processing Meets Financial Engineering]

Ziping Zhao, Rui Zhou, Daniel P. Palomar, and Yiyong Feng

The application of research ideas from theoretical physics, mathematics, and control theory to the financial markets has been a common industrial practice for almost three decades now. Engineering has also witnessed a steady flow of contributions to financial engineering from fields like computer science, data analytics, and artificial intelligence. Signal processing, without exception, has benefited financial engineering substantially through well-known and widely applied techniques such as the Fourier transform, the Kalman filter, and shrinkage methods. The connection between signal processing and financial engineering is becoming more and more evident. This tutorial paper attempts to summarize modern portfolio theory, regarded as one of the pillars of modern finance, from the perspective of signal processing.

I. MOTIVATION

Portfolio optimization is an important and fundamental topic, not only in the academic community but also in the financial industry. Modern portfolio theory (MPT), pioneered by Harry Markowitz in the 1950s [1], has revolutionized the finance world. Prior to that, people made investments mainly based on their expertise or experience. The appearance of MPT turned investing into a mathematical, systematic, and scientific activity. It provides an answer to the fundamental decision-making question in finance: How should an investor allocate capital among the possible investment choices? Without exaggeration, MPT is one of the pillars of modern finance and scientific investing. Besides the pioneering MPT, many other investment strategies have also been proposed for different goals. To name a few, the robust portfolio is designed to mitigate parameter estimation errors in portfolio optimization problems; the risk parity portfolio aims at distributing the overall risk of the designed portfolio equally among assets for risk diversification purposes; the index tracking portfolio attempts to follow market indices based on the efficient market assumption; and the statistical arbitrage portfolio, as a contrarian strategy, forms a synthetic asset independent of the market and then benefits from its random oscillations (more specifically, its mean-reverting pattern).

Interestingly, in financial engineering the whole process of a portfolio investment strategy can be seamlessly connected to signal modeling techniques, parameter estimation methods, advanced optimization algorithms, and other tools from a signal processing perspective. All these mathematical connections motivate the application of theories and methodologies from signal processing to financial engineering and vice versa. There has been a relatively short history of employing signal processing techniques for solving financial engineering problems in the signal processing community. In terms of publications on financial engineering topics, there have been many regular papers and several special issues¹ in major signal processing journals.

With the development of signal processing theories and techniques, we believe there is much space for researchers to explore and to unveil the science behind the financial investing world. The goal of this tutorial paper is to provide the signal processing community with a starting point to approach financial engineering. In particular, it aims at overviewing advanced portfolio strategies from a signal processing perspective.

II. SIGNAL PROCESSING MEETS FINANCIAL ENGINEERING

A financial market can be viewed as a signal processing system. In this section, we elaborate on some connections between signal processing and financial engineering.

A. Signal, Noise, and Asset Pricing

Consider a narrowband array model with N receive antennas in signal processing [5]. At time index t, the received signals at the N antennas, x_t ≜ [x_{1,t}, ..., x_{N,t}]^T ∈ C^N, can be expressed as

x_t = h x_{tx,t} + ε_t,

where x_{tx,t} ∈ C is the transmitted temporal waveform, h ∈ C^N is the transmission channel, also called the spatial steering vector, and ε_t ∈ C^N denotes the additive noise. The received signals are noisy due to random disturbances.

In financial engineering, financial asset returns are signals carrying information on asset valuation. The excess returns² of N assets, r_t ≜ [r_{1,t}, ..., r_{N,t}]^T ∈ R^N, can be represented by the capital asset pricing model (CAPM)³ [6] as

r_t = β r_{mk,t} + ε_t,

where r_{mk,t} ∈ R is the market excess return, β ∈ R^N is the factor loading vector, and ε_t ∈ R^N denotes the additive noise. Financial returns are noisy signals due to reasons like

¹Examples include the 2011 IEEE SIGNAL PROCESSING MAGAZINE Special Issue on Signal Processing for Financial Applications [2], the 2012 IEEE JOURNAL ON SELECTED TOPICS IN SIGNAL PROCESSING Special Issue on Signal Processing Methods in Finance and Electronic Trading [3], and the 2016 IEEE JOURNAL ON SELECTED TOPICS IN SIGNAL PROCESSING Special Issue on Financial Signal Processing and Machine Learning for Electronic Trading [4].

²Excess returns are investment returns from an asset that exceed those of the risk-free asset, such as a certificate of deposit or a government-issued bond.

³William Sharpe was awarded the NOBEL PRIZE IN ECONOMICS in 1990 for his contributions to the pricing theory of financial assets, i.e., the CAPM.


Fig. 1. Power is money: the parallelism between power allocation in signal processing and capital allocation in financial engineering.

market news impacts and even physical system errors. In signal processing, engineers need to estimate the channel h. In financial engineering, in order to properly design an investment strategy or make a prediction of the asset value, investors need to estimate the factor loading β.
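As a concrete sketch, the factor loading β in the CAPM can be estimated from observed returns by least squares. The following simulates CAPM-style data and recovers β; all numbers and variable names are illustrative assumptions, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 1000, 3
beta_true = np.array([0.8, 1.0, 1.3])
r_mk = 0.0005 + 0.01 * rng.standard_normal(T)   # market excess returns (simulated)
noise = 0.02 * rng.standard_normal((T, N))
R = np.outer(r_mk, beta_true) + noise           # CAPM: r_t = beta * r_mk,t + eps_t

# Least-squares estimate of the factor loadings (one no-intercept regression per asset)
beta_hat = (R.T @ r_mk) / (r_mk @ r_mk)
```

With enough samples, beta_hat approaches the true loadings despite the noisy returns.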

B. Beamforming, Filtering, and Portfolio Design

In addition to signal modeling, the beamformer design or filter design in signal processing also shares a striking mathematical similarity with the portfolio design in financial engineering. In signal processing, a beamformer w ∈ C^N is a vector that assigns weights to the N (complex-valued) antenna signals, and the output signal is obtained as

x_{p,t} = w^H x_t = w^H h x_{tx,t} + w^H ε_t.

In financial engineering, a portfolio w ∈ R^N is a vector that assigns weights (i.e., relative dollar amounts) to the N (real-valued) asset returns, and the resulting portfolio return is given by

r_{p,t} = w^T r_t = w^T β r_{mk,t} + w^T ε_t.

In signal processing, the beamforming problem is about how to allocate weights (i.e., power) to different antenna signals based on a certain design criterion [5]. Likewise, in financial engineering the portfolio design problem is about how to allocate weights (i.e., capital) to different financial assets based on a given design objective [7]. Such a connection is shown in Fig. 1. In fact, several classical receive adaptive optimal beamforming techniques in signal processing have nice counterparts in financial engineering.

Case 1: The maximum output signal-to-noise ratio (SNR) beamformer in signal processing is attained through the following problem:

w⋆_MSNR = argmax_w E[|w^H h x_{tx,t}|²] / E[|w^H ε_t|²] = argmax_w ϕ²_tx |w^H h|² / (w^H Σ w),

where ϕ²_tx ≜ E[|x_{tx,t}|²] denotes the average signal power and Σ ≜ E[ε_t ε_t^H] is the noise covariance matrix. In financial engineering, the maximum Sharpe ratio (SR) portfolio design is given by

w⋆_MSRP = argmax_w E[w^T β r_{mk,t}] / √E[|w^T ε_t|²] = argmax_w μ_mk w^T β / √(w^T Σ w),

where μ_mk ≜ E[r_{mk,t}] denotes the expected market excess return (a.k.a. market premium) and Σ ≜ E[ε_t ε_t^T] is the noise covariance matrix.

Case 2: The maximization of the SNR is equivalent (up to a scaling factor) to minimizing the noise power:

w⋆_MSNR = argmin_w w^H Σ w   s.t. w^H h = ρ.

Similarly, the maximum Sharpe ratio portfolio can also be reformulated as

w⋆_MVP = argmin_w w^T Σ w   s.t. w^T β = ρ,

which is the Markowitz mean-variance portfolio (MVP) [1].

Case 3: The maximum SNR beamformer is also equivalent to the minimum output energy (MOE) beamformer, which is obtained by minimizing the output signal energy:

w⋆_MOE = argmin_w E[|w^H x_t|²] = argmin_w w^H (ϕ²_tx h h^H + Σ) w   s.t. w^H h = ρ,

which becomes the minimum variance distortionless response (MVDR) or Capon beamformer [8] when ρ = 1. In financial engineering, the mean-variance portfolio can also be equivalently formulated as

w⋆_MVP = argmin_w E[|w^T r_t|²] = argmin_w w^T (ϕ²_mk β β^T + Σ) w   s.t. w^T β = ρ,

where ϕ²_mk ≜ E[|r_{mk,t}|²].
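The minimum-variance formulation in Case 2 admits a closed-form solution, w⋆ = ρ Σ⁻¹β / (β^T Σ⁻¹β), obtained from the Lagrangian optimality condition. A minimal numerical sketch, where Σ and β are illustrative values rather than estimates from data:

```python
import numpy as np

# Illustrative (assumed) noise covariance and factor loadings for N = 3 assets
Sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])
beta = np.array([0.9, 1.1, 1.4])
rho = 1.0

# Minimum-variance portfolio subject to w^T beta = rho:
#   w* = rho * Sigma^{-1} beta / (beta^T Sigma^{-1} beta)
Sib = np.linalg.solve(Sigma, beta)
w_mvp = rho * Sib / (beta @ Sib)
```

Using `np.linalg.solve` rather than an explicit inverse is the standard numerically stable choice here.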

Case 4: The minimum mean squared error (MMSE) beamformer design (a.k.a. Wiener filtering) [5] defines a different beamforming optimality criterion. It minimizes the difference between the output signal x_{p,t} and a reference signal η_t (usually a scaled version of x_{tx,t}):

w⋆_MMSE = argmin_w E[|w^H x_t − η_t|²],

which is again equivalent to the previous beamformers. In financial engineering, a similar idea can be applied to minimize the tracking error between the portfolio return r_{p,t} and a target return η_t:

w⋆_MVP = argmin_w E[|w^T r_t − η_t|²],

which can also be interpreted as an MVP. Like all the beamforming problems leading to the same optimal receive beamformer,


Fig. 2. Financial signals: stock price, log-price, and log-return of NASDAQ: AAPL (Apple Inc.).

TABLE I
CONNECTIONS BETWEEN SIGNAL PROCESSING AND FINANCIAL ENGINEERING

Signal Processing                 | Financial Engineering
antenna signals                   | asset returns
beamformer                        | portfolio
signal-to-noise ratio             | Sharpe ratio
channel modeling and estimation   | factor modeling and estimation
antenna signal covariance matrix  | asset return covariance matrix
beamforming signal                | portfolio return
noise power                       | portfolio risk
beamforming quality of service    | portfolio performance measure
robust beamformer design          | robust portfolio design
multi-user beamformer design      | multi-account portfolio design
sparse regression                 | sparse index tracking
Kalman filtering                  | pairs trading

the portfolios designed by these equivalent problems are named market portfolios in finance. (See Section IV-H for details.)
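The tracking-error criterion in Case 4 has a direct sample counterpart: replacing the expectation with a time average turns the portfolio design into an ordinary least-squares regression of the target return on the asset returns. A sketch on simulated data, with all numbers and names illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
T, N = 500, 4
R = 0.01 * rng.standard_normal((T, N))   # simulated asset returns, one row per period

# Target return series: an (assumed) benchmark mix of the assets plus noise
w_bench = np.array([0.4, 0.3, 0.2, 0.1])
eta = R @ w_bench + 0.001 * rng.standard_normal(T)

# Sample version of  min_w E|w^T r_t - eta_t|^2 : ordinary least squares
w_track, *_ = np.linalg.lstsq(R, eta, rcond=None)
```

The recovered weights closely track the benchmark mix when the tracking noise is small relative to the return variation.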

C. More Connections Between Signal Processing and Financial Engineering

Besides the above showcase examples connecting beamformer design in signal processing and portfolio design in financial engineering, there are many other connections, some of which are listed in Table I. These striking similarities illustrate the potential benefits of transferring expertise from one area to the other. For example, the estimation of the covariance matrix with identity shrinkage in financial engineering is mathematically equivalent to the diagonal loading matrix (i.e., the addition of a scaled identity matrix to the sample covariance matrix) derived for robust adaptive beamforming in signal processing. For large-dimensional data, the asymptotic performance of covariance matrix estimators has been a well-researched topic in both signal processing and financial engineering using random matrix theory [9], [10], [11]. Beyond these, there is plenty of room for applying signal processing knowledge to financial engineering problems, such as low-rank (reduced-rank) signal processing [12], subspace methods [13], sparsity-aware signal processing [14], robust estimation techniques [15], Fourier and wavelet analysis [16], missing data and imputation theory [17], resampling and bootstrapping [18], graph signal processing and network theory [19], [20], information theory [21], game theory [22], machine learning and deep learning [23], [24], [25], etc. Notably, high-frequency financial data belongs to the big data paradigm.

In the following, we will first introduce the fundamentals of financial signal modeling and then explore the many variations and facets of portfolio optimization.

III. SIGNAL MODELING IN PORTFOLIO OPTIMIZATION

In financial markets, two of the most basic quantities are "price" and "return". The price of a financial asset (e.g., stock, option, exchange-traded fund (ETF), commodity, real estate, etc.) at (discrete) time index t is denoted as p_t ∈ R_+. Instead of the price, the natural logarithm of the price, named the log-price y_t ≜ log(p_t) ∈ R, is commonly used for modeling. The simplest, most fundamental, and widely accepted model for log-prices (mainly for stocks) is the random walk with drift:⁴

y_t = ν + y_{t−1} + ε_t,   (Random Walk)

where ν denotes the drift and ε_t is a white noise. The drift ν is important in finance and represents the time trend of the log-prices. Stationarity is a cornerstone property facilitating the analysis and processing of random signals in the time domain. (A nonstationary process has time-varying moments.) A random walk model is nonstationary and hence is not directly used for signal analysis.

For financial assets, returns are used for signal modeling since they are stationary, as opposed to the previous random walk [28]. The simple return (a.k.a. linear return or net return) of an asset (assuming no dividends) is

R_t ≜ (p_t − p_{t−1}) / p_{t−1} = p_t / p_{t−1} − 1.   (Return)

It defines the relative profit or loss in terms of percentage and is lower-bounded by −1. The total return or gross return is accordingly defined as 1 + R_t = p_t / p_{t−1}. Based on the asset log-price y_t, the log-return (a.k.a. continuously compounded return) commonly used in finance is given by

r_t ≜ y_t − y_{t−1} = log(p_t / p_{t−1}).   (Log-return)

Observe that the log-return is "stationary" since r_t = y_t − y_{t−1} = ν + ε_t is stationary. (See Signal Modeling in Finance: Linear Return or Log-Return?) Fig. 2 depicts the prices, log-prices, and log-returns of "AAPL" (Apple Inc.) in the U.S. stock market.
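The definitions above can be sketched in code: simulate a random walk with drift in log-prices, then compute both the simple and the log-returns. Parameter values are illustrative:

```python
import numpy as np

# Random walk with drift in log-prices: y_t = nu + y_{t-1} + eps_t
rng = np.random.default_rng(42)
nu, sigma, T = 0.0005, 0.01, 2000
y = np.cumsum(nu + sigma * rng.standard_normal(T))   # log-prices
p = np.exp(y)                                        # prices

# Simple (linear) returns and log-returns
R = p[1:] / p[:-1] - 1                               # R_t = p_t / p_{t-1} - 1
r = np.diff(y)                                       # r_t = y_t - y_{t-1} = log(p_t / p_{t-1})
```

By construction r_t = log(1 + R_t), and the sample mean of the log-returns recovers the drift ν.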

⁴The continuous-time counterpart of a random walk is a Brownian motion process (a.k.a. Wiener process) with drift [26], [27].


Fig. 3. Skewness and kurtosis of Standard & Poor's (S&P) 500 returns of different granularities. (Skewness statistics are used to assess the lack of symmetry in the data distribution; kurtosis statistics measure whether the data distribution is heavy-tailed or light-tailed relative to a Gaussian distribution (blue line).)

Fig. 4. Auto-correlation function (ACF) of daily S&P 500 log-returns, ACF of absolute S&P 500 daily log-returns, and S&P 500 daily log-returns with the conditional sample volatilities (red line). (The conditional sample volatility at a time point is computed as the volatility using samples in a nearby interval. See "volatility clustering" around the technology "bubble" in 2000 and the financial crisis in 2008, where the magnitude of log-returns is much larger than in the rest of the period.)

Signal Modeling in Finance: Linear Return or Log-Return?
The asset linear return, or simply return, directly characterizes the relative profit or loss of an asset. Conceptually, it should be the natural one to use in signal modeling and analysis. However, in practice the log-return is the one that econometricians actually use. Why? First, the linear return is bounded below by −1 and unbounded above, which makes it troublesome to model statistically. By contrast, log-returns can take values on the whole real line, which is a desirable property for signal modeling. Second, linear returns empirically exhibit an asymmetric distribution with high skewness. After the logarithmic transformation, log-returns become more symmetric, making the corresponding distributions easier to model. Third, when computing the compounded "return" over multiple investment periods, the log-return has the advantage of being additive over time, whereas the compounded linear return leads to a nasty running product form. On the other hand, the linear return has the desirable property of being additive over assets, which brings a lot of convenience in areas like portfolio optimization and risk analysis. In any case, these two "returns" are approximately equal when they are "small", which is the case for most investment scenarios.

Remark 1 (On the Data Granularity/Frequency). The time index t in the above quantities can denote any period such as years, quarters, months, weeks, days, 5-min intervals, etc. In this article, the signal models and portfolios generally apply to most data frequencies except high-frequency data.

For the two different return definitions, we can see that

r_t = log(1 + R_t) ≈ R_t,

when |R_t| ≪ 1. Asset returns at higher frequencies are smaller in magnitude. In practice, the approximation errors are usually acceptable up to yearly returns. In this article, unless otherwise specified, r_t and R_t will be used interchangeably.
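This approximation is easy to quantify: the gap |log(1 + R) − R| grows with |R| and is negligible for small returns. A quick numerical check, with illustrative return magnitudes:

```python
import numpy as np

# Log-return vs. linear return: r = log(1 + R) is close to R for small |R|
R = np.array([0.001, 0.01, 0.05, 0.20, 0.50])   # linear returns of increasing size
r = np.log1p(R)                                  # corresponding log-returns
abs_err = np.abs(r - R)                          # approximation gap per magnitude
```

At a daily-return scale (|R| around 0.001-0.01) the gap is on the order of R²/2, i.e., essentially negligible.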

Financial return signals exhibit specialized characteristics. Many empirical studies have identified a set of common features among financial returns, known as stylized facts [29], which include:

• Asymmetry: The return distribution is negatively skewed, suggesting that extreme negative returns are more frequent than extreme positive ones (see Fig. 3);

• Heavy tails and outliers: The return distribution is heavy-tailed or leptokurtic, unlike a Gaussian distribution (see Fig. 3);

• Lack of stationarity: In the long run, returns show a lack of stationarity. Past returns do not necessarily reflect future performance;

• Lack of auto-correlations: Asset returns are not exactly i.i.d., but their auto-correlations are often insignificant (whereas prices are highly auto-correlated), except at high frequencies (see Fig. 4 (a));

• Time-varying cross-correlations: Cross-correlations between asset returns tend to increase during high-volatility periods, in particular during financial crashes.

• Volatility clustering: Different measures of conditional volatility⁵ display a positive autocorrelation over time,

⁵Volatility is defined as the standard deviation in finance and econometrics.


which means high-volatility events tend to cluster in time (see Fig. 4 (b) and (c));

• Conditional asymmetry and heavy tails: Even after correcting the returns for volatility clustering, the residual time series still exhibits asymmetry and heavy tails, although less pronounced than the unconditional ones.

These stylized facts are pervasive across time, assets, and markets. Modern econometric models can reflect all these phenomena, some of which will be introduced in the following.

In financial engineering, signal modeling is commonly based on the log-return time series. In practice, we do not just model one asset but N assets together. Denote the log-returns of the N assets at time t as r_t ∈ R^N and the historical data up to time t−1 as F_{t−1}. Signal modeling in financial engineering aims at modeling r_t conditional on F_{t−1} [30], [31], [32], which can generally be expressed as

r_t = μ_t + ε_t,   (General Signal Model)

where μ_t is the conditional mean, i.e.,

μ_t ≜ E[r_t | F_{t−1}],   (Cond. Mean)

and ε_t is a white noise with zero mean and conditional covariance matrix

Σ_t ≜ Cov[r_t | F_{t−1}] = E[(r_t − μ_t)(r_t − μ_t)^T | F_{t−1}].   (Cond. Covariance)

Here, μ_t and Σ_t (or equivalently the conditional volatility matrix Σ_t^{1/2}) are the two quantities to model. In the following, several models for them will be introduced.

A. I.I.D. Model

The independent and identically distributed (i.i.d.) model assumes r_t follows an i.i.d. distribution:

r_t = ν + ε_t,   (I.I.D. Model)

where ν denotes a constant drift and ε_t is a white noise with zero mean and constant covariance matrix Σ_ε. Both the conditional mean and the conditional covariance of the i.i.d. model are constant, i.e.,

μ_t = ν and Σ_t = Σ_ε.

It is a very simplified model (returns are empirically not i.i.d.). However, it is one of the most fundamental assumptions for many important works, e.g., Markowitz's modern portfolio theory [1].
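Under the i.i.d. model, the natural estimates of ν and Σ_ε are the sample mean and sample covariance. A minimal sketch on simulated data, with all parameter values illustrative:

```python
import numpy as np

# Simulate i.i.d. returns r_t = nu + eps_t with a fixed covariance Sigma = L L^T
rng = np.random.default_rng(7)
T, N = 5000, 3
nu = np.array([0.001, 0.0005, 0.002])
L = np.array([[0.010, 0.000, 0.000],
              [0.004, 0.008, 0.000],
              [0.002, 0.003, 0.012]])           # lower-triangular factor (assumed)
r = nu + rng.standard_normal((T, N)) @ L.T

# Sample estimates of the constant drift and covariance
mu_hat = r.mean(axis=0)                         # estimate of nu
Sigma_hat = np.cov(r, rowvar=False)             # estimate of Sigma_eps
```

These two estimates are exactly the inputs that the Markowitz mean-variance framework consumes.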

B. Factor Model

The factor model is a special case of the i.i.d. model that can be used for dimensionality reduction and regression modeling of returns. It assumes the returns are described by a few underlying explanatory variables, called factors. The factor model is

r_t = α + B f_t + ε_t,   (Factor Model)

where α denotes a constant vector, f_t ∈ R^K with K ≪ N is a vector containing a few factors that are responsible for most of the randomness in the market, B ∈ R^{N×K} is called the factor loading matrix and denotes how the low-dimensional factors affect the high-dimensional returns, and ε_t is a white noise with zero mean and constant covariance matrix Σ_ε, which has only a marginal effect and is uncorrelated with f_t. In general, f_t is assumed to follow an i.i.d. distribution with constant mean μ_f and constant covariance Σ_f. Then the conditional mean and conditional covariance are both constant and given by

μ_t = α + B μ_f and Σ_t = B Σ_f B^T + Σ_ε.

Under this model specification, the conditional covariance can be decomposed into a part related to the low-dimensional factors and one related to the marginal noise. Factor models of this form include the macroeconomic factor models (arbitrage pricing models like Sharpe's CAPM [6]), where factors are observable economic and financial time series; the fundamental factor models (like the Fama-French model⁶ [33] and MSCI's BARRA risk model⁷), where factors are created from observable asset characteristics; and the statistical factor models, where factors are unobservable and extracted from asset returns. Statistical factor models are closely related to principal component analysis (PCA) in signal processing.
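As a sketch of the connection to PCA, a statistical factor model can be fitted by taking the top-K eigenvectors of the sample covariance as (scaled) factor loadings. This is one simple estimator among several, shown under assumed dimensions and simulated data:

```python
import numpy as np

rng = np.random.default_rng(3)
T, N, K = 2000, 10, 2
B_true = rng.standard_normal((N, K))
f = rng.standard_normal((T, K))                 # latent factors
r = f @ B_true.T + 0.1 * rng.standard_normal((T, N))   # factor model returns

S = np.cov(r, rowvar=False)                     # sample covariance
eigval, eigvec = np.linalg.eigh(S)              # eigenvalues in ascending order
B_hat = eigvec[:, -K:] * np.sqrt(eigval[-K:])   # top-K principal directions, scaled
Sigma_lowrank = B_hat @ B_hat.T                 # low-rank factor part of Sigma_t
```

With K genuine factors and small idiosyncratic noise, the low-rank part accounts for most of the total variance in S.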

C. VARMA and VECM Models

The previous i.i.d. models, while commonly employed, do not incorporate any time dependency (see Fig. 4). The vector auto-regressive moving average (VARMA) model for r_t can incorporate past information into the conditional mean [30], [31], [32]. A general VARMA(p, q) model is given by

r_t = φ + Σ_{i=1}^{p} Φ_i r_{t−i} + Σ_{j=1}^{q} Ψ_j ε_{t−j} + ε_t,   (VARMA)

where p and q are nonnegative integers (specifying the model orders), the vector φ ∈ R^N and the matrices Φ_i, Ψ_j ∈ R^{N×N} are parameters of the model, and ε_t is a white noise with zero mean and constant covariance matrix Σ_ε. When Φ_i = 0 for all i, the VARMA model reduces to the vector moving average (VMA) model and, similarly, when Ψ_j = 0 for all j, it reduces to the vector auto-regressive (VAR) model. The conditional mean and conditional covariance matrix are

μ_t = φ + Σ_{i=1}^{p} Φ_i r_{t−i} + Σ_{j=1}^{q} Ψ_j ε_{t−j} and Σ_t = Σ_ε.
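For a concrete instance, the one-step conditional mean of a VAR(1) model (VARMA with p = 1, q = 0) is μ_t = φ + Φ₁ r_{t−1}. A minimal sketch with illustrative parameter values:

```python
import numpy as np

# VAR(1) conditional mean: mu_t = phi + Phi_1 @ r_{t-1} (all numbers illustrative)
phi = np.array([0.001, 0.0005])
Phi1 = np.array([[0.10, 0.02],
                 [0.03, 0.05]])
r_prev = np.array([0.012, -0.004])   # observed returns r_{t-1}

mu_t = phi + Phi1 @ r_prev           # conditional mean E[r_t | F_{t-1}]
```

The conditional covariance of this model stays constant at Σ_ε; only the mean is time-varying.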

Until now, we have focused on modeling the log-returns r_t = Δy_t = y_t − y_{t−1} directly. The reason is that the log-price time series y_t is not weakly stationary, whereas the log-return time series r_t is weakly stationary and can be more easily modeled. In financial engineering, it has been shown that although the log-prices y_t in levels are not stationary, there may exist some linear combinations of them, i.e., Π y_t with Π ∈ R^{N×N} and rank(Π) < N, that are stationary, termed cointegration relations [34]. (See How Was "Cointegration" Discovered in History?) Such relations can also be taken

⁶Eugene Fama was awarded the NOBEL PRIZE IN ECONOMICS in 2013 for his contributions to the empirical analysis of asset prices.

⁷https://www.msci.com/www/research-paper/barra-s-risk-models/014972229/


How Was "Cointegration" Discovered in History?
In the 1980s, economists showed statistical evidence that many macroeconomic time series (like GNP, employment, wages, etc.) contain stochastic trends. It was also found that directly applying linear regressions to them can lead to "spurious" regression results, and that the de-trending approach commonly used for series with deterministic trends (where the trend is a function of time) does not work for series with stochastic trends. In 1987, Clive Granger and Robert Engle coined the concept of "cointegration" and developed the cointegrating vector approach to analyzing such series. The idea is that if two series share a common stochastic trend, then a linear combination of them can lead to a stationary series, i.e., the common trend is eliminated, in which case the two time series are called "cointegrated". For the discovery of cointegration, Clive Granger was awarded the NOBEL PRIZE IN ECONOMICS in 2003. In modern econometrics, checking for stochastic trends has become a standard methodology in time series analysis.

into consideration in the VAR models, leading to the vector error correction model (VECM) (a.k.a. cointegrated VAR model) given by

r_t = φ + Π y_{t−1} + Σ_{i=1}^{p} Φ_i r_{t−i} + ε_t,   (VECM)

where Π is called the error correction matrix. The conditional mean and conditional covariance are accordingly given by

μ_t = φ + Π y_{t−1} + Σ_{i=1}^{p} Φ_i r_{t−i} and Σ_t = Σ_ε.
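The idea behind cointegration can be illustrated numerically: two nonstationary series built from one common random-walk trend admit a linear combination that eliminates the trend. A sketch with illustrative coefficients:

```python
import numpy as np

rng = np.random.default_rng(11)
T = 5000
trend = np.cumsum(0.01 * rng.standard_normal(T))   # common stochastic (random-walk) trend
y1 = trend + 0.02 * rng.standard_normal(T)         # nonstationary "log-price" 1
y2 = 0.5 * trend + 0.02 * rng.standard_normal(T)   # nonstationary "log-price" 2

# Cointegrating combination: y1 - 2 * y2 removes the common trend,
# leaving only (stationary) measurement noise
spread = y1 - 2.0 * y2
```

The spread stays bounded around zero while each individual series wanders, which is the pattern statistical arbitrage strategies exploit.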

D. Conditional Heteroskedastic Model

The previous models are only capable of including the time-dependency in the conditional mean µt, while the conditionalcovariance matrix still remains constant. In financial markets,time-varying rather than constant covariance is commonlyobserved. As previously mentioned, one of the stylized factsof financial returns is the “volatility clustering” which is notcaptured by these models. (See “Volatility Clustering” andConditional Heteroskedasticity.) To model the volatility ofasset returns, we first rewrite the white noise εt as

εt = Σ12t εt,

where εt ∈ RN is an i.i.d. (white) noise with zero mean andidentity covariance matrix I. In conditional heteroskedasticmodeling, the key is to model the conditional covariancematrix Σt (or equivalently the conditional volatility matrixΣ

12t ). Many conditional heteroskedastic models have been

proposed. In this article, we only briefly showcase one ofthem called vector autoregressive conditional heteroskedastic-ity model [30], [31], [36] as follows:

$$\mathrm{vech}\left(\boldsymbol{\Sigma}_t\right) = \boldsymbol{\phi} + \sum_{i=1}^{p} \boldsymbol{\Psi}_i\,\mathrm{vech}\left(\boldsymbol{\Sigma}_{t-i}\right) + \sum_{j=1}^{q} \boldsymbol{\Phi}_j\,\mathrm{vech}\left(\boldsymbol{\varepsilon}_{t-j}\boldsymbol{\varepsilon}_{t-j}^T\right),$$

where p and q are nonnegative integers (defining the order of the model), and vech(·) represents the half-vectorization operator

“Volatility Clustering” and Conditional Heteroskedasticity. In financial time series, “volatility clustering” [35] refers to the phenomenon that “large changes tend to be followed by large changes, of either sign, and small changes tend to be followed by small changes.” A quantitative description of this fact is that, while returns themselves can be uncorrelated, absolute returns or squared returns display a positive, significant, and slowly decaying auto-correlation function. The concept of conditional heteroskedasticity is used to describe this nonconstant-volatility fact in finance, and it goes against the simple random walk model. Many conditional heteroskedastic models have been proposed. One of the earliest and most famous ones is the autoregressive conditional heteroskedasticity model proposed by Robert Engle, who was later awarded the NOBEL PRIZE IN ECONOMICS in 2003 in recognition of his contribution to methods of analyzing time series with time-varying volatility.

that keeps the N(N + 1)/2 elements of the lower triangular part of its N × N matrix argument, and φ ∈ R^{N(N+1)/2} and Ψ_i, Φ_j ∈ R^{N(N+1)/2 × N(N+1)/2} are model parameters. It is easy to see that a time-varying structure is imposed on the conditional covariance matrix.

We have reviewed several linear discrete-time models for asset returns. But there are many other models, including nonlinear models [30], continuous-time stochastic models [27], state-space models [37], etc. However, caution must be taken when using sophisticated models with many parameters to avoid overfitting.

Remark 2 (Financial Data Cleaning and Pre-processing). Sophisticated methods have been developed for analyzing financial data. But in practice the data may contain many imperfections, such as the presence of outliers or missing values: the so-called “dirty data”. It is often the more sophisticated methods that are more susceptible to dirty data. While it is sometimes possible to use robust techniques that are less sensitive to outliers or bad observations (for example, using the median instead of the mean), it makes sense to deal with the dirty data before the modeling stage. Data scientists, according to interviews and expert estimates, spend 80% of their time cleaning and manipulating data and only 20% of their time actually analyzing it [38]. Why? Because of a simple truth: “better data beats fancier algorithms.” By improving the quality of the data, one is very likely to improve the quality of the results. This is why traders spend huge amounts of money to purchase high-quality clean data from data providers like Bloomberg, Thomson Reuters, FactSet, etc.

Remark 3 (Missing and Incomplete Data Issues in Finance). In finance, missing-value problems can happen during the data observation or recording process [17]. There are various reasons resulting in missing values: values may not be measured, values may be measured but get lost, or values may be measured but be considered unusable. For example, some stocks may suffer a lack of liquidity, resulting in no transaction and hence no price recorded. In the presence of missing values, one either has to develop algorithms capable of handling that (most existing algorithms cannot) or one can fill in those values, termed imputation, before any subsequent modeling or algorithmic stage [39].

E. Model Selection and Estimation

There is a plethora of models that can be used to model financial returns. Selecting the right model is important and depends on the application scenario. Model selection can be done with numerical cross-validation methods [32] or with methods based on the log-likelihood of the fit penalized by the degrees of freedom of the model [40]. Model estimation can be carried out through different techniques like least squares estimation, maximum likelihood estimation, Bayesian estimation, and so on [30].
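For instance, the penalized log-likelihood approach can be illustrated with the Akaike information criterion (AIC) for selecting the order of a toy AR(p) model. The following hand-rolled numpy sketch (simulated data, OLS fitting, Gaussian conditional likelihood) is only illustrative, not a substitute for a proper econometrics package:

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate a toy AR(1) series: r_t = 0.05 + 0.6 r_{t-1} + 0.1 e_t
T = 500
r = np.zeros(T)
for t in range(1, T):
    r[t] = 0.05 + 0.6 * r[t - 1] + 0.1 * rng.standard_normal()

def fit_ar_ols(r, p):
    """Fit an AR(p) model by ordinary least squares; return residual variance."""
    Y = r[p:]
    X = np.column_stack([np.ones(len(Y))] + [r[p - i:-i] for i in range(1, p + 1)])
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return np.mean((Y - X @ beta) ** 2)

def aic(r, p):
    # Gaussian log-likelihood of the fit, penalized by the degrees of freedom
    T_eff = len(r) - p
    k = p + 1                       # intercept + p lag coefficients
    return T_eff * np.log(fit_ar_ols(r, p)) + 2 * k

orders = [1, 2, 3, 4]
best_p = min(orders, key=lambda p: aic(r, p))
print("selected AR order:", best_p)
```

Cross-validation would instead hold out part of the series and compare out-of-sample prediction errors across candidate orders.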

IV. THE BIRTH OF MODERN PORTFOLIO THEORY

A. What is a Portfolio?

In finance, a portfolio is used to represent a basket of assets held by an investor (an institution or a private individual). Suppose there are N assets in a portfolio and an investment budget B. The portfolio weight w ∈ R^N satisfying 1^T w = 1 is used to represent the proportion of the total budget invested in the different assets; this weight vector is also commonly called the portfolio.

A portfolio w (a decision variable) tells an investor how to allocate the dollar budget B among the different assets.

The initial absolute value invested in asset n (n = 1, 2, . . . , N) at time t − 1 is given by B w_n, and the end value after one trading period is B w_n (p_{n,t}/p_{n,t−1}) = B w_n (r_{n,t} + 1). The return of the portfolio w at time t can be computed by^8

$$r_{p,t} \triangleq \frac{\sum_{n=1}^{N} B w_n \left(r_{n,t} + 1\right) - B}{B} = \sum_{n=1}^{N} w_n r_{n,t} = \mathbf{w}^T \mathbf{r}_t.$$

For n = 1, 2, . . . , N, w_n > 0, w_n < 0, and w_n = 0 mean, respectively, a long position (i.e., the asset is bought), a short position (i.e., it is short-sold or, more plainly, borrowed and sold), and no position in asset n. (See What is Short-selling in Finance?)
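The return computation above can be checked numerically; a minimal sketch with made-up prices, weights, and budget (all values hypothetical):

```python
import numpy as np

# Hypothetical prices of N = 3 assets at times t-1 and t
p_prev = np.array([100.0, 50.0, 20.0])
p_now  = np.array([103.0, 49.0, 21.0])
r = p_now / p_prev - 1.0              # linear returns r_{n,t}

w = np.array([0.5, 0.3, 0.2])         # portfolio weights, 1^T w = 1
B = 10_000.0                          # budget in dollars

# End-of-period wealth, position by position ...
end_value = np.sum(B * w * (r + 1.0))
# ... and the resulting portfolio return, which equals w^T r_t
r_p = (end_value - B) / B
print(r_p)
```

Note that the budget B cancels out, which is exactly why the weight vector alone characterizes the portfolio.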

B. Why Portfolio Optimization?

Portfolio optimization is a financial decision-making problem which aims at designing a portfolio such that a profit (a positive return) can hopefully be obtained. Alternative terms for portfolio optimization are portfolio design, portfolio construction, portfolio selection, portfolio management, asset allocation, asset selection, and asset management. It originates from the belief that “one should never put all eggs in one basket.” Portfolio design has gone through several significant developments: from heuristic strategies to systematic and mathematical ones, from experience-oriented to model-based approaches, and from return-oriented to risk-diversification ones, making portfolio design techniques evolve from “rule-of-thumb” practices to sophisticated portfolio theories. The concepts of

^8 Generally speaking, the portfolio weights w and budget B should also be indexed by time t. Since in this article we mainly focus on portfolio design for a one-period investment, the index t will be omitted.

What is Short-selling in Finance? The most natural way to invest is to purchase some positive amount of dollars of some asset. For example, one can buy $1,000 of Apple; if the stock price goes up one makes a profit, but if it goes down one has a loss (in the worst case one can lose everything, so a loss of 100%). Interestingly, in finance one can also invest a negative amount of dollars! How can that be? This is called short-selling or shorting or going short. The idea is to sell some amount of an asset that one does not have yet, but with the promise to buy it back at a later time. For example, one can short-sell $1,000 of Apple, which can be thought of as owing -$1,000; if the stock price goes up one loses money because one has to buy those shares of the stock at a higher price; however, if the price goes down one makes a profit. The main danger of short-selling is that the loss is unbounded, because the price can potentially grow without bound. In practice, short-selling can be used for hedging (one can simultaneously buy a stock and one of its derivatives to hedge the overall investment risk), financing, and speculating. Speculation on a potential decline in an asset, however, can lead to dramatic consequences; for example, one can short-sell the stocks of one company thinking that it will go bad, but the mere fact of several speculators selling will drive the price down, eventually severely affecting the company.

portfolio optimization and diversification have been instrumental in the development and understanding of financial markets and financial decision making.

When it comes to portfolio theory, one can never avoid referring to the renowned Markowitz portfolio, introduced by Harry Markowitz in 1952 in a seminal paper [1]. The work by Markowitz laid the foundation for modern portfolio theory (MPT), for which he was awarded the NOBEL PRIZE IN ECONOMICS in 1990. In this section, before introducing the Markowitz portfolio, we will first introduce several heuristic investment strategies. It should also be mentioned that we mainly focus on portfolio design techniques for equity stocks and bonds in this article, leaving out investments in options, futures, other derivatives, and other financial products.

C. Just Buying One Asset

The simplest investment strategy consists of selecting just one asset, allocating the whole budget B to it, and sticking to it for some periods. The belief behind such an investment is that the asset will increase gradually in value over the investment period. There is no diversification in this strategy. One can use different methods (like fundamental analysis or technical analysis) to make the choice. (See Momentum Strategies vs. Mean Reversion Strategies.)

D. Equal-Weighted or Uniform Portfolio

One of the most important goals of quantitative portfolio management is to realize diversification across the different assets in a portfolio. Given N assets, a simple way to achieve diversification is by allocating the capital


Momentum Strategies vs. Mean Reversion Strategies. Momentum and mean-reversion are two common investment strategies. Momentum strategies, also called trend following, are devised on the premise that, although the price movements are random, the market on average will maintain its trend, i.e., if an asset is declining it will continue to decline, and if it is growing it will continue to do so. These strategies try to capitalize on market movements by buying (selling) assets showing an upward (downward) trend. Momentum strategies for one asset can be generalized to multi-asset trading. Since the price movements are random, there is always a noisy component on top of the trend. The objective is to reduce that noise by somehow averaging it out (either along the time domain or along the stock domain). This is precisely why the variance of the noise is considered as a measure of the unwanted risk in the Markowitz portfolio. Mean-reversion strategies, on the other hand, are based on the totally different premise that the price of an asset eventually returns to its average price after a temporary deviation, which can be regarded as a “noise” component on top of the average. To profit from this “noise”, people try to buy (sell) assets when they are under (above) their average values. Mean-reversion strategies can also be generalized to the multi-asset case, which is related to statistical arbitrage. The way to achieve this is by forming a synthetic asset that is mean-reverting, i.e., that has an equilibrium mean value, so that the noise term will drive the value above or below that equilibrium only to revert back at a later point. If one can properly time the entry point and the exit point, then one is effectively “riding the wave” and making a profit.

equally across all the assets, which is commonly named the equal-weighted portfolio (EWP) (a.k.a. uniform portfolio, maximum deconcentration portfolio, or 1/N portfolio):

$$w_n = \frac{1}{N}, \quad \forall n = 1, \ldots, N. \qquad \text{(EWP)}$$

This could be the simplest diversified investment strategy, requiring no realized statistics of the asset returns r_t. The authors of [41] call the EW strategy the “Talmudic rule”, since the Babylonian Talmud recommended this strategy approximately 1,500 years ago: “A man should always place his money, one third in land, a third in merchandise, and keep a third in hand.” The EWP has famously been used in some index funds, differing from the weighting methods commonly used by funds and portfolio managers where the stocks are weighted based on their market capitalizations, i.e., value-weighted (refer to Section VIII-G). The concept of EWPs (or EW funds) has gained much interest due to superior historical performance and the emergence of several EW ETFs [42]. For example, Standard & Poor’s has developed many S&P 500 equal weighted indices (more details on index funds will be given in Section VIII-J).

E. Global Maximum Return Portfolio

Another simple way to make an investment from the N assets is to invest only in the one with the highest return performance. Given the portfolio return r_{p,t}, the expected return of the portfolio is given by

$$\mu_p\left(\mathbf{w}\right) \triangleq \mathbb{E}\left[\mathbf{w}^T \mathbf{r}_t\right] = \mathbf{w}^T \mathbb{E}\left[\mathbf{r}_t\right] = \mathbf{w}^T \boldsymbol{\mu}.$$

The parameter µ ≜ E[r_t] stands for the forecast of the assets’ mean returns, which can be estimated based on the past (realized) returns. Selecting the asset with the highest return is related to the global maximum return portfolio (GMRP) design problem, given as follows:

$$\begin{array}{ll} \underset{\mathbf{w}}{\text{maximize}} & \mathbf{w}^T \boldsymbol{\mu} \\ \text{subject to} & \mathbf{1}^T \mathbf{w} = 1, \; \mathbf{w} \geq \mathbf{0}, \end{array} \qquad \text{(GMRP)}$$

where the constraint 1^T w = 1 is the budget constraint and w ≥ 0 indicates that short-selling is not allowed. It aims at a portfolio with the highest future return µ_p. This problem is a convex linear program (LP), which can be efficiently solved. In fact, the optimal solution is trivially achieved by allocating all the budget to the stock with the maximum expected return.

F. Quantile Portfolio and Rank Trading

The quantile portfolios are widely used by practitioners, and there are many variations, like the decile portfolio, the quintile portfolio, and the quartile portfolio. The procedure to construct such a quantile portfolio is very simple. Given N assets, an investor will first rank the assets according to a ranking scheme. A ranking scheme is any model that can assign each asset a number indicating how it is expected to perform, where a lower number is better (or worse, depending on the convention). Examples of ranking schemes could be mean returns, value factors, technical indicators, pricing models, or a combination of the above. Suppose you want to hold a total of P positions (P ≤ N), and that the stock at rank 1 is expected to perform the best while the stock at rank N is expected to perform the worst. For each asset in positions 1, . . . , P (with the highest rankings), buy an equal dollar amount. Defining w_(n) to be the portfolio weight corresponding to the asset in position n, the quantile portfolio is

$$w_{(n)} = \frac{1}{P}, \;\; \forall n = 1, \ldots, P, \qquad w_{(n)} = 0, \;\; \forall n = P+1, \ldots, N. \qquad \text{(Quantile)}$$

Specifically, a quintile portfolio is obtained when N/P = 5. In practice, besides going long the assets with the highest rankings, the assets at the bottom, in positions N − P + 1, . . . , N (with the lowest rankings), can be short-sold, which is commonly referred to as the long-short quantile portfolio. In this case, the long positions are financed by the short positions, leading to a dollar-neutral portfolio. The success of this strategy lies almost entirely in the ranking scheme used. In practice, developing a good ranking scheme is nontrivial. The better a ranking scheme separates high-performing assets from low-performing assets, the better the returns of the quantile portfolio are.

G. Global Minimum Variance Portfolio

The variance of portfolio returns can be used as a proxy toquantify the “risk” of a portfolio, reflecting the chance that

[Fig. 5 plots expected return against standard deviation, showing the individual assets, the EWP, Quintile, GMVP, MSRP, and GMRP portfolios, the risk-free rate r_f, an inefficient portfolio, the portfolio frontier, the efficient portfolio frontier, and the efficient portfolio frontier with the risk-free asset.]

Fig. 5. Efficient portfolio frontiers w/ and w/o the risk-free asset in the long-only case.

the return on an investment may be very different from the expected one µ_p(w). Based on this risk measure, another way of designing a portfolio is to focus on minimizing its variance (Var), which is defined by

$$\sigma_p^2\left(\mathbf{w}\right) \triangleq \operatorname{Var}\left[\mathbf{w}^T \mathbf{r}_t\right] = \mathbf{w}^T \mathbb{E}\left[\left(\mathbf{r}_t - \boldsymbol{\mu}\right)\left(\mathbf{r}_t - \boldsymbol{\mu}\right)^T\right] \mathbf{w} = \mathbf{w}^T \boldsymbol{\Sigma} \mathbf{w}.$$

The parameter Σ ≜ E[(r_t − µ)(r_t − µ)^T] stands for the asset return covariance matrix, and σ_p(w) = √(w^TΣw) is the portfolio standard deviation (SD) or the portfolio volatility. It is assumed that Σ is positive definite, i.e., w^TΣw > 0 for all w ≠ 0, which is equivalent to assuming that none of the underlying assets can be perfectly replicated by a combination of the remaining ones. The global minimum variance portfolio (GMVP) [43] is given as follows:

$$\begin{array}{ll} \underset{\mathbf{w}}{\text{minimize}} & \mathbf{w}^T \boldsymbol{\Sigma} \mathbf{w} \\ \text{subject to} & \mathbf{1}^T \mathbf{w} = 1, \; \mathbf{w} \geq \mathbf{0}, \end{array} \qquad \text{(GMVP)}$$

which is a simple convex quadratic program (QP).

H. Markowitz (Mean-Variance) Portfolio

Different assets may have different risk and return profiles. Markowitz pioneered the idea that investors should not consider an asset’s risk and return separately but rather by how it contributes to the portfolio’s overall risk and return, which leads to modern portfolio theory [1], [44]. Central to MPT is the notion of diversification: not only the individual assets’ performances are important, but also their relationship with the other assets in the market. This portfolio design idea is formalized into an optimization problem, the so-called mean-variance optimization, and the portfolio is hence called the mean-variance portfolio (MVP). (See Making Portfolio Design a Theory: Harry Markowitz and Modern Portfolio Theory.)

Different from the GMRP and the GMVP, the MVP is able to achieve a desirable trade-off between the portfolio expected return w^Tµ and the portfolio risk w^TΣw:

$$\begin{array}{ll} \underset{\mathbf{w}}{\text{minimize}} & \mathbf{w}^T \boldsymbol{\Sigma} \mathbf{w} - \lambda\, \mathbf{w}^T \boldsymbol{\mu} \\ \text{subject to} & \mathbf{1}^T \mathbf{w} = 1, \; \mathbf{w} \geq \mathbf{0}, \end{array} \qquad \text{(MVP)}$$

where λ is an investor-specific risk-aversion parameter. The MVP is a scalarization of a multi-objective optimization problem. In terms of MPT, the EWP is only optimal when all assets have the same risk and return characteristics. If one ignores the expected return term by choosing λ = 0, then the MVP reduces to the GMVP, which is widely used in academic papers for simplicity of evaluation and comparison of different estimators of the covariance matrix Σ (while ignoring the estimation of µ). If all assets have the same expected return independently of their risk, the MVP is equivalent to the GMVP.

Alternative MVP formulations include maximizing the expected return subject to a predefined portfolio variance (the return maximization form of the MVP) and minimizing the portfolio variance subject to a predefined expected return (the variance minimization form of the MVP). The latter variance minimization form is given by

$$\begin{array}{ll} \underset{\mathbf{w}}{\text{minimize}} & \mathbf{w}^T \boldsymbol{\Sigma} \mathbf{w} \\ \text{subject to} & \mathbf{w}^T \boldsymbol{\mu} = \rho \\ & \mathbf{1}^T \mathbf{w} = 1, \; \mathbf{w} \geq \mathbf{0}, \end{array} \qquad \text{(MVP-MV)}$$

where the specific value of ρ depends on the risk aversion of the investor.

Remark 4 (MVP-MV as a Constrained Linear Regression). The objective in MVP-MV can be rewritten as

$$\mathbf{w}^T \boldsymbol{\Sigma} \mathbf{w} = \mathbf{w}^T \mathbb{E}\left[\left(\mathbf{r}_t - \boldsymbol{\mu}\right)\left(\mathbf{r}_t - \boldsymbol{\mu}\right)^T\right] \mathbf{w} = \mathbb{E}\left[\left(\mathbf{w}^T \mathbf{r}_t - \rho\right)^2\right],$$

where we have used w^Tµ = ρ. Replacing the expectation by its sample estimate, the MVP-MV can be written as the following constrained linear regression problem:

$$\begin{array}{ll} \underset{\mathbf{w}}{\text{minimize}} & \frac{1}{T} \left\| \mathbf{R} \mathbf{w} - \rho \mathbf{1} \right\|_2^2 \\ \text{subject to} & \mathbf{w}^T \boldsymbol{\mu} = \rho \\ & \mathbf{1}^T \mathbf{w} = 1, \; \mathbf{w} \geq \mathbf{0}, \end{array} \qquad \text{(MVP-MV)}$$

where R ≜ [r_1, . . . , r_T]^T ∈ R^{T×N}.

Intuitively, MVP-MV suggests that, among the infinite number of portfolios that achieve a particular return objective, the investor should choose the portfolio that has the smallest variance. All other portfolios are “inefficient” because they have a higher risk. When we plot the portfolio expected return against the corresponding minimal portfolio volatility, we obtain the mean-variance trade-off curve (a.k.a. Pareto curve), i.e., the so-called portfolio frontier shown in Fig. 5. We can also identify the portfolio having minimal variance among all risky portfolios, which is the GMVP. The points on the portfolio frontier with expected returns greater than the minimum variance portfolio’s expected return are said to lie on the efficient (portfolio) frontier, as shown in Fig. 5. In fact, the three variations of the MVP formulation result in the same portfolio frontier.
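For illustration, if the long-only constraint is dropped, the remaining equality-constrained MVP can be solved exactly through its KKT conditions; a numpy sketch with hypothetical µ and Σ (a simplified variant, not the full long-only MVP):

```python
import numpy as np

# Hypothetical inputs
mu = np.array([0.05, 0.03, 0.07])
Sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])
lam = 0.5                          # risk-aversion trade-off parameter
N = len(mu)

# KKT system of minimize w' Sigma w - lam w' mu  s.t.  1' w = 1:
#   2 Sigma w + nu 1 = lam mu,   1' w = 1
A = np.block([[2 * Sigma, np.ones((N, 1))],
              [np.ones((1, N)), np.zeros((1, 1))]])
b = np.concatenate([lam * mu, [1.0]])
sol = np.linalg.solve(A, b)
w_mvp, nu = sol[:N], sol[N]

print(w_mvp)
```

Sweeping λ and recording (σ_p, µ_p) for each solution traces out the portfolio frontier numerically.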


Making Portfolio Design a Theory: Harry Markowitz and the Modern Portfolio Theory. Modern portfolio theory (MPT) was put forth by Harry Markowitz in his seminal paper “Portfolio Selection,” published in 1952 in the Journal of Finance. Markowitz was later awarded the NOBEL PRIZE IN ECONOMICS in 1990 for having developed the theory of portfolio choice. Before MPT, investors were mainly concerned with the expected return, ignoring the risk aspect. Markowitz precisely proposed the combination of the expected return and the risk (the so-called mean-variance portfolio, because the risk is typically measured with the variance). It was the introduction of risk into the investment decisions that was the exceptional feature of MPT and a real breakthrough. MPT is one of the most important and influential economic theories dealing with finance and investment, recognized by both academia and industry, and it also opened the door for scholars to formally employ mathematical methods in financial applications.

I. Maximum Sharpe Ratio Portfolio

In [45], [46], Sharpe provided a risk-adjusted measure of the performance of an asset or a portfolio, named the Sharpe ratio (SR). For a portfolio w, it is defined as the ratio of the expected excess return to its volatility

$$\mathrm{SR}_p\left(\mathbf{w}\right) = \frac{\mu_p\left(\mathbf{w}\right) - r_f}{\sigma_p\left(\mathbf{w}\right)} = \frac{\mathbf{w}^T \boldsymbol{\mu} - r_f}{\sqrt{\mathbf{w}^T \boldsymbol{\Sigma} \mathbf{w}}},$$

where w^Tµ − r_f is the expected excess return with respect to the risk-free return r_f (a.k.a. riskless return or risk-free rate). When we set r_f = 0, the SR becomes the so-called information ratio (IR). A portfolio can be designed by maximizing the SR, resulting in the following maximum Sharpe ratio portfolio (MSRP):

$$\begin{array}{ll} \underset{\mathbf{w}}{\text{maximize}} & \dfrac{\mathbf{w}^T \boldsymbol{\mu} - r_f}{\sqrt{\mathbf{w}^T \boldsymbol{\Sigma} \mathbf{w}}} \\ \text{subject to} & \mathbf{1}^T \mathbf{w} = 1, \; \mathbf{w} \geq \mathbf{0}. \end{array} \qquad \text{(MSRP)}$$

The problem is nonconvex. Assuming that w^Tµ − r_f > 0 (otherwise this portfolio is a bit useless, since the risk-free return can be achieved with zero risk), the problem is still nonconvex but is quasiconvex (more exactly, quasilinear), and it can be efficiently solved by reformulating it as a (convex) second-order cone program (SOCP) or a convex QP. (See Discovering “Convexity” in the Maximum SR Portfolio.)

The MSRP is also named the tangency portfolio. This is because if we connect the MSRP with the risk-free asset, as shown in Fig. 5, the straight line is tangential to the efficient frontier. It is not difficult to show that if we include the risk-free asset in the MVP, then the efficient frontier is extended to include the tangent line from the MSRP to the risk-free asset. In finance, this tangent line is called the efficient (portfolio) frontier with risk-free asset, or the capital market line. More interestingly, when considering the risk-free asset in the above portfolio problems, we can draw a nice connection with the “beamforming design equivalence” in signal processing. (See The Interplay Between the “Portfolio Design Equivalence” and the Classical “Beamformer Design Equivalence”.) A comparison of the aforementioned portfolios in terms of asset allocation and investment performance is given in Fig. 6.

V. WEAKNESSES OF MVP AND FURTHER DEVELOPMENTS

We have discussed several heuristic portfolio design methods. The diversification idea based on the MVP undoubtedly has had a major impact not only on academic research but also on the financial industry as a whole. It changed the focus of investment analysis away from individual security selection

Discovering “Convexity” in the Maximum SR Portfolio. First, using 1^T w = 1 and defining µ̃ ≜ µ − r_f 1, the MSRP can be written as

$$\begin{array}{ll} \underset{\mathbf{w}}{\text{minimize}} & \dfrac{\sqrt{\mathbf{w}^T \boldsymbol{\Sigma} \mathbf{w}}}{\mathbf{w}^T \tilde{\boldsymbol{\mu}}} \\ \text{subject to} & \mathbf{1}^T \mathbf{w} = 1, \; \left(\mathbf{w} \geq \mathbf{0}\right). \end{array}$$

Since the objective is scale-invariant in w, we can define w̃ ≜ t w (t > 0) and set w̃^T µ̃ = 1 (any positive number). Then we get the following equivalent form:

$$\begin{array}{ll} \underset{\tilde{\mathbf{w}}, \mathbf{w}, t}{\text{minimize}} & \tilde{\mathbf{w}}^T \boldsymbol{\Sigma} \tilde{\mathbf{w}} \\ \text{subject to} & \tilde{\mathbf{w}}^T \tilde{\boldsymbol{\mu}} = 1 \\ & \tilde{\mathbf{w}} = t \mathbf{w} \\ & \mathbf{1}^T \tilde{\mathbf{w}} = t > 0, \; \left(\tilde{\mathbf{w}} \geq \mathbf{0}\right). \end{array}$$

Since w̃^T µ̃ = 1, w̃ cannot be identically zero, so t > 0 can be relaxed to t ≥ 0. Eliminating w and t, the problem becomes

$$\begin{array}{ll} \underset{\tilde{\mathbf{w}}}{\text{minimize}} & \tilde{\mathbf{w}}^T \boldsymbol{\Sigma} \tilde{\mathbf{w}} \\ \text{subject to} & \tilde{\mathbf{w}}^T \tilde{\boldsymbol{\mu}} = 1 \\ & \mathbf{1}^T \tilde{\mathbf{w}} \geq 0, \; \left(\tilde{\mathbf{w}} \geq \mathbf{0}\right), \end{array} \qquad \text{(MSRP)}$$

which is a convex QP. Having obtained w̃⋆, we recover w⋆ = w̃⋆ / (1^T w̃⋆).

toward the concept of diversification and the impact of individual securities on a portfolio’s risk-return characteristics. Despite the theoretical elegance of the model, the MVP has also been widely criticized for many reasons, as described next.

A. High Sensitivity to Parameter Estimation Errors

The sensitivity to the estimates of the input parameters, i.e., the expected returns and covariances, is a vital problem of the MVP. As a matter of fact, estimating the expected returns µ is more challenging than estimating the covariance matrix Σ; indeed, the errors in the estimated expected returns are of much greater significance [47], [48]. For example, estimation errors can lead to extreme long and short portfolio weights after portfolio optimization. For this reason, mean-variance optimization is sometimes cynically referred to as an “error maximizer” [47]. As we rebalance the portfolio, slight changes in the input parameters can then lead to extreme changes in the portfolio weights and to high turnover levels. As a result, portfolio managers generally do not trust these extreme weights. This is one of the main reasons why the “vanilla” MVP is not


The Interplay Between the “Portfolio Design Equivalence” and the Classical “Beamformer Design Equivalence”. When designing portfolios with N risky assets and the risk-free asset, which has return r_f and variance 0 with weight w_0 ∈ R, the Markowitz (variance minimization), Markowitz (mean-variance), and maximum Sharpe ratio portfolios are given by

$$\text{(MVP-MV)} \quad \underset{w_0, \mathbf{w}}{\text{minimize}} \;\; \mathbf{w}^T \boldsymbol{\Sigma} \mathbf{w} \;\; \text{subject to} \;\; w_0 r_f + \mathbf{w}^T \boldsymbol{\mu} = \rho, \;\; w_0 + \mathbf{1}^T \mathbf{w} = 1$$

$$\text{(MVP)} \quad \underset{w_0, \mathbf{w}}{\text{minimize}} \;\; \mathbf{w}^T \boldsymbol{\Sigma} \mathbf{w} - \lambda \left(w_0 r_f + \mathbf{w}^T \boldsymbol{\mu}\right) \;\; \text{subject to} \;\; w_0 + \mathbf{1}^T \mathbf{w} = 1$$

$$\text{(MSRP)} \quad \underset{w_0, \mathbf{w}}{\text{maximize}} \;\; \frac{w_0 r_f + \mathbf{w}^T \boldsymbol{\mu} - r_f}{\sqrt{\mathbf{w}^T \boldsymbol{\Sigma} \mathbf{w}}} \;\; \text{subject to} \;\; w_0 + \mathbf{1}^T \mathbf{w} = 1.$$

By noticing that w_0 = 1 − 1^T w, these three problems can be equivalently reformulated as follows:

$$\underset{\mathbf{w}}{\text{minimize}} \;\; \mathbf{w}^T \boldsymbol{\Sigma} \mathbf{w} \;\; \text{s.t.} \;\; \mathbf{w}^T \tilde{\boldsymbol{\mu}} = \tilde{\rho} \qquad \underset{\mathbf{w}}{\text{minimize}} \;\; \mathbf{w}^T \boldsymbol{\Sigma} \mathbf{w} - \lambda \mathbf{w}^T \tilde{\boldsymbol{\mu}} \qquad \underset{\mathbf{w}}{\text{maximize}} \;\; \frac{\left(\mathbf{w}^T \tilde{\boldsymbol{\mu}}\right)^2}{\mathbf{w}^T \boldsymbol{\Sigma} \mathbf{w}},$$

corresponding to “MOE beamforming”, “MMSE beamforming”, and “max-SNR beamforming”, respectively, with solutions

$$\mathbf{w}^\star_{\text{MVP-MV}} = \frac{\tilde{\rho}}{\tilde{\boldsymbol{\mu}}^T \boldsymbol{\Sigma}^{-1} \tilde{\boldsymbol{\mu}}} \boldsymbol{\Sigma}^{-1} \tilde{\boldsymbol{\mu}}, \qquad \mathbf{w}^\star_{\text{MVP}} = \frac{\lambda}{2} \boldsymbol{\Sigma}^{-1} \tilde{\boldsymbol{\mu}}, \qquad \mathbf{w}^\star_{\text{MSRP}} = \frac{\gamma}{\tilde{\boldsymbol{\mu}}^T \boldsymbol{\Sigma}^{-1} \tilde{\boldsymbol{\mu}}} \boldsymbol{\Sigma}^{-1} \tilde{\boldsymbol{\mu}}, \;\; \left(\forall \gamma \neq 0\right),$$

where µ̃ ≜ µ − r_f 1 and ρ̃ ≜ ρ − r_f. It is easy to see that the three portfolio design problems finally fall into the forms of the three optimal receiver beamforming problems, i.e., minimum output energy (MOE) beamforming, minimum mean-square error (MMSE) beamforming, and maximum signal-to-noise ratio (SNR) beamforming. It is a classical result that beamformers under these three criteria are equivalent under a proper choice of ρ, λ, and γ [5]. In fact, these three portfolios with the inclusion of the risk-free asset are also equivalent and are called market portfolios; they lie on the efficient frontier with the risk-free asset.
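The three closed-form solutions above differ only by a positive scalar, which can be verified numerically; a sketch with hypothetical µ, Σ, and r_f:

```python
import numpy as np

# Hypothetical market parameters
mu = np.array([0.08, 0.05, 0.06])
Sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])
rf = 0.02
mu_t = mu - rf                               # mu_tilde = mu - rf 1
base = np.linalg.solve(Sigma, mu_t)          # Sigma^{-1} mu_tilde
denom = mu_t @ base                          # mu_tilde' Sigma^{-1} mu_tilde

rho_t, lam, gamma = 0.04, 0.5, 1.0           # rho_tilde, lambda, gamma
w_mvpmv = (rho_t / denom) * base
w_mvp   = (lam / 2.0) * base
w_msrp  = (gamma / denom) * base

# All three are scalar multiples of Sigma^{-1} mu_tilde: the same direction
# on the capital market line, scaled by the choice of rho, lambda, gamma.
print(w_mvpmv, w_mvp, w_msrp)
```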

[Fig. 6 shows, for the stocks AAPL, AMZN, FB, GOOG, and MSFT, the capital allocation (fractions) of the EWP, Quintile, GMRP, GMVP, MVP, and MSRP portfolios, together with their cumulative PnL from Jun 2016 to Feb 2017.]

Fig. 6. Allocation and cumulative profit and loss (PnL) performance of different portfolios.

widely used by practitioners. Ironically, the EWP, which is independent of estimation errors, can perform much better in practice than the MVP [42]. Besides the problem of estimation errors, portfolio managers are unlikely to have a detailed understanding of all the assets, so it is simply unrealistic to expect them to produce reasonable estimates for µ and Σ. Mitigating the impact of estimation errors in mean-variance optimization is a formidable task given the number of assets available today. In high-dimensional regimes where N > T, the sample covariance matrix is rank deficient. (In the financial markets, the number of assets is far larger than the number of observations due to the lack of past historical data or to the lack of stationarity of the data, which makes it unusable for long lookback windows. An exception is high-frequency data, which is abundant.) These problems call for the application of more advanced estimation techniques like shrinkage and robust estimation, which will be briefly discussed in Section VI.

In addition, portfolio optimization problems are often formulated as deterministic optimization problems, ignoring the uncertainty in the parameters. Using point estimates for parameters and naively treating them as error-free in the portfolio formulation is not prudent. In practice, the portfolio manager may have more confidence in some of the estimates than in others, so it is desirable to treat these parameters differently while designing portfolios. With these concerns in mind, robust portfolio optimization methods [49], [50] can be used to incorporate the uncertainty of the parameters into the optimization process, to be discussed in Section VIII-A. Another way to reduce the influence of the estimation error in the portfolio optimization stage is to use regularization techniques, which include objective penalizations like the `1- and `2-norm, or imposing portfolio constraints like the no-shorting constraint. For example, practitioners often


use long-only constraints or upper and lower bounds for each security to avoid over-concentration in a few assets. The regularized portfolio will be discussed in Section VIII-B. Practical portfolio constraints will be introduced in Section VII.

B. Improper Risk Measurement

In the previous problems, variance is used as a proxy for the risk, which is not good in practice since it penalizes both the unwanted downside risk and the desired upside risk. To solve this issue, alternative risk measures, e.g., mean absolute deviation, downside variance, value-at-risk, expected shortfall, etc., have been proposed for portfolio optimization; these will be introduced in Section VIII-C.

C. Misspecification of Utility and Data Distribution

The mean-variance analysis can be regarded as an expected utility maximization problem based on a quadratic criterion or on a jointly Gaussian (more generally, elliptical) distribution assumption for the returns. (See Expected Utility Theory and the Markowitz Portfolio.) This is limiting, as financial return

Expected Utility Theory and the Markowitz Portfolio. In economics and finance, the most popular approach to the problem of choice under uncertainty is the expected utility (EU) theory [51]. The EU theory defines a relation between the initial wealth B and the final wealth (1 + w^T r_t) B, which is described by a (von Neumann-Morgenstern) utility function U(·). In general, investors will choose different utility functions. For portfolio optimization, the EU theory states that the individual will choose the portfolio w such that the expected value of the utility at the end of the period is maximized:

$$\begin{array}{ll} \underset{\mathbf{w}}{\text{maximize}} & \mathbb{E}_{\mathbf{r}_t \sim \mathcal{D}}\left[U\left(\left(1 + \mathbf{w}^T \mathbf{r}_t\right) B\right)\right] \\ \text{subject to} & \mathbf{1}^T \mathbf{w} = 1, \; \mathbf{w} \geq \mathbf{0}. \end{array}$$

The mean-variance portfolio theory can be explained as being based on a quadratic utility function U(·) or on a Gaussian distribution assumption D for the returns [7].

distributions are not jointly Gaussian, but instead exhibit fat tails and asymmetry that cannot be described by their means and variances alone. In practice, the tails and skewness of the return distribution can affect the portfolio performance. The solution to this problem is the general expected utility portfolio in Section VIII-D.

D. From Weight Diversification to Risk Diversification

The idea of the MVP is to realize diversification. However, it only considers the risk of the portfolio as a whole and ignores risk diversification. One solution to this issue is the famous risk parity approach [52], [53], which aims at building portfolios where the overall portfolio risk is diversified by allocating the risk equally across the different assets or types of assets. The risk-diversification portfolios will be introduced in Section VIII-H.

E. Portfolio Design with Practical Constraints

In the traditional MVP problem, only simple investment constraints (i.e., budget and long-only constraints) are considered. As mentioned before, portfolio constraints can help to avoid high turnover and transaction costs in the portfolio construction process. Besides that, in practice various types of constraints are needed to take specific investment guidelines and institutional features into account. Clearly, adding constraints to an MVP problem can never improve the in-sample optimization results. The inclusion of constraints in portfolio optimization problems can, however, lead to better out-of-sample performance, compared to portfolios constructed without these constraints. More practical portfolio constraints will be introduced in Section VII.

F. Portfolio Design based on Diversified Strategies

The Markowitz mean-variance portfolio optimization is well-known and has been extended in many different directions. Other portfolio design problems that have received less attention in the open scientific literature include the index tracking portfolio, statistical arbitrage portfolio, multi-period portfolio, multi-account portfolio, etc. Different from MVP, they are based on alternative design philosophies and model assumptions. These design targets can be mathematically modeled as optimization problems, which in general result in nonconvex problems. We will introduce some of them in Section VIII. Together with the inclusion of practical portfolio constraints, solving the portfolio design problems (even the MVP problem) can be very time-consuming, especially for a large number of assets [54]. Efficient algorithms for solving these problems will also be briefly outlined in Section VIII.

VI. PARAMETER ESTIMATION FOR PORTFOLIO OPTIMIZATION

In this section, we discuss several topics on parameter estimation, especially related to the mean-variance analysis regime. The most naive estimators are the sample mean and sample covariance matrix. However, they are only good for a large number of observations T and become particularly inefficient with very noisy measurements [26]. In practice, T is not large enough due to either unavailability of data or lack of stationarity of data. As a consequence, the sample estimates are really bad, especially the sample mean [48], and a portfolio design based on them can be fatal. Indeed, this is one of the reasons why the Markowitz portfolio and related portfolio designs are rarely used by practitioners. In this section, we discuss some advanced techniques for mitigating these estimation errors. Since parameter estimation is not the focus of this paper, we only give a brief outline.

Robust Estimation. The traditional statistical estimates are based on least squares estimation or maximum likelihood estimation under a Gaussian distribution, which have ideal theoretical properties in the large sample regime and for normally distributed variables. However, the portfolio manager cannot rely on estimators based on those assumptions because the number of observations is limited (due to scarce historical data and lack of data stationarity) and the distribution is far from Gaussian [55], [56], [57]. In particular, the number and magnitude of financial crashes will be underestimated. This motivates us to investigate the use of more robust estimation procedures which can handle a limited number of observations using tools like robust statistics [58]. The estimation can be done using robust loss functions [59], [60] or based on a robust likelihood function like the elliptical distributions [61], [62], [63], [64], [65] or the generalized hyperbolic distributions [66].

Structured Estimation. The models on return time series introduced in Section III suffice to specify structures in return modeling. Factor models are widely used by practitioners (fund managers commonly buy factors, e.g., the BARRA factors, at a high premium), under which the constant return covariance matrix can be decomposed into two parts: low-dimensional factors and marginal noise. Hence, with a highly reduced number of parameters, the sample complexity can be lowered and the estimation precision improved. Besides factor models, the auto-regressive moving average models and the conditional heteroskedasticity models can also be used to improve the estimation results.
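As a minimal numerical illustration of this structure (a single statistical factor, not the BARRA model itself; all data and variable names below are synthetic), the covariance estimate can be assembled as a rank-one factor part plus a diagonal residual part:

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 200, 10

# Synthetic single-factor market: r_n = beta_n * f + eps_n.
f = rng.normal(0.0, 0.01, size=T)              # factor (market) returns
beta = rng.uniform(0.5, 1.5, size=N)           # true loadings
R = np.outer(f, beta) + rng.normal(0.0, 0.005, size=(T, N))

# OLS loading of each asset on the demeaned factor.
f_c = f - f.mean()
R_c = R - R.mean(axis=0)
beta_hat = (f_c @ R_c) / (f_c @ f_c)

# Structured covariance: rank-one factor part plus a diagonal residual part.
resid = R_c - np.outer(f_c, beta_hat)
Sigma_hat = f_c.var() * np.outer(beta_hat, beta_hat) + np.diag(resid.var(axis=0))
```

Only N loadings, N residual variances, and one factor variance are estimated instead of N(N+1)/2 free covariance entries, which is where the sample-complexity reduction comes from.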

Shrinkage Estimation. Prior information can aid in improving the estimation performance, which is commonly coined the shrinkage method, like the James-Stein method [67] and the Bayesian method [68]. A great number of general Bayesian approaches have been used for estimating the expected returns [69] and the covariance matrix [9]. In finance, a widely used "market-based" shrinkage approach is the Black-Litterman model [70]. It is a "view mixing" model where the estimate of expected returns is calculated as a weighted average of the market equilibrium (e.g., the CAPM equilibrium) and the investor's views. The weights depend upon the volatility of each asset, its correlations with the other assets, and the degree of confidence in each forecast [26].
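A minimal sketch of linear shrinkage of the sample covariance toward a scaled-identity target (in the spirit of Ledoit-Wolf shrinkage; the fixed intensity `delta` below is an illustrative choice, not a derived optimum):

```python
import numpy as np

rng = np.random.default_rng(1)
T, N = 50, 20                                  # few observations relative to the dimension
R = rng.normal(0.0, 0.01, size=(T, N))         # synthetic return sample

S = np.cov(R, rowvar=False)                    # noisy, ill-conditioned sample covariance
target = (np.trace(S) / N) * np.eye(N)         # scaled-identity shrinkage target

delta = 0.3                                    # shrinkage intensity in [0, 1] (a tuning choice)
Sigma_sh = (1 - delta) * S + delta * target    # shrunk covariance estimate
```

Because the target pulls all eigenvalues toward their average, the shrunk estimate is always better conditioned than the raw sample covariance, which matters when it is later inverted in a portfolio optimization.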

Information-Aided Estimation. Many methods have been explored to improve the estimation performance of parameters by using side information. The joint estimation of volatility and trading volume (which is not a parameter of the mean-variance optimization) is one good example. News sentiment commonly has an impact on the movement of asset prices and can be used together with the asset return series to model the volatility.

It should be mentioned that the aforementioned approaches to improve parameter estimation can be combined. For example, the robust estimation procedure can be combined with the structured estimation and shrinkage estimation. Note that the parameter estimation results are based on the log-returns and, in principle, they should be converted into linear returns for the portfolio optimization stage. However, in practice such a transformation is not necessary due to the approximate equivalence of linear returns and log-returns.

VII. PORTFOLIO CONSTRAINTS

In portfolio design, it is common to consider various types of constraints that take specific investment guidelines, investment restrictions, and institutional features into account. For example, some constraints reflect the restrictions imposed on the portfolio manager by the market regulators. As such, they are inflexible and must be respected at all times, even if they limit the portfolio manager's ability to add value to the portfolio through security selection and positioning. Examples of such constraints include the short-selling limitations imposed in various jurisdictions. In practice, portfolio constraints can be imposed in the portfolio design process or during trading, although it is desirable to take them into account during the portfolio optimization stage.

Mathematically, portfolio constraints can be convex or nonconvex. The budget constraint 1^T w = 1 and long-only constraint w ≥ 0 are examples of linear constraints. But there are also some nonlinear ones like the leverage constraint ‖w‖₁ ≤ L and even nonconvex ones like the cardinality constraint for asset selection. In Table II, we list several commonly used portfolio constraints. In the following sections, some portfolio measures and optimization criteria will be introduced. The portfolio optimization criteria specifying portfolio design targets in different investment environments (like portfolio target return, portfolio risk, tracking error, and risk parity) and the portfolio performance measures can also be used as constraints. It should be noted that the definitions of portfolio measures, design criteria, and constraints are not absolute, in the sense that a measure or criterion can also be used as a constraint, and vice-versa. In the rest of the article, we will use the set W to generally denote all the portfolio constraints considered in the optimization problems.

VIII. ADVANCED PORTFOLIO OPTIMIZATION PROBLEMS

In this section, going beyond the basic portfolio optimization problems in Section IV, a list of important and advanced portfolio optimization problems will be introduced in detail.

A. Robust Portfolio

Recall that the MVP problem formulation contains not only the optimization variables w but also the parameters, which define the problem instance and in practice have to be estimated. The fact that the parameters of MVP are subject to estimation errors leads to the topic of optimization under uncertainty [71], where robust portfolio optimization emerged as a solution to this target.

Robust optimization aims at finding a solution that is guaranteed to be satisfactory for most realizations of the uncertain parameters in a sense of "worst-case" robustness [72], [73], [74], [75], [76], [50], [77]. Taking the MVP as an example, a worst-case robust MVP is given as follows:

    minimize_w    max_{Σ∈U_Σ}  w^T Σ w
    subject to    min_{µ∈U_µ}  w^T µ ≥ ρ
                  w ∈ W,                                  (Robust MVP)

where Σ and µ are subject to some uncertainty sets U_Σ and U_µ, respectively. Uncertainty sets that contain possible values of the uncertain parameters are used to describe the uncertainty in the problem, and their size represents the level of the uncertainty and/or the desired level of robustness. In practice, the uncertainty sets can take different forms like the box uncertainty set, the elliptical uncertainty set, etc. [78]. For example, in [79] it is assumed that the returns lie within min/max bounds and the covariance belongs to an ellipsoidal uncertainty set. In [80], the factor covariances are assumed known, the factor exposures are subject to error, and the risk errors and return errors are orthogonal. In [81], the covariance data are assumed certain with an ellipsoidal uncertainty for the returns. Robust optimization formulations lead to challenging mathematical problems, but in many cases modern optimization techniques such as second-order cone optimization or semidefinite optimization provide tools and software that make robust portfolio optimization problems computationally tractable.

TABLE II
PORTFOLIO CONSTRAINTS W (a)

budget constraint: 1^T w = 1. If any element in w is negative, we call the portfolio leveraged.
self-financing constraint: 1^T w = 0. The long positions can be financed by the short positions.
long-only (no-short) constraint: w ≥ 0. Short-selling is not allowed.
holding constraint: l ≤ w ≤ u. It specifies the minimum and maximum holding size.
leverage constraint: ‖w‖₁ ≤ L, where ‖w‖₁ = 1^T[w]₊ − 1^T[w]₋. (b)
cardinality constraint: ‖w‖₀ ≤ K. It is used to realize asset selection.
turnover constraint: ‖Δw‖₁ ≤ U∗. Δw denotes the portfolio changes between two holding periods.
benchmark exposure constraint: ‖w − w_b‖₁ ≤ U_b. It is commonly used in index tracking.
market-neutral constraint: β^T w = 0. It designs a portfolio that is insensitive to swings of the overall market.
other constraints: margin constraint, risk factor constraints, sector constraint, minimum transaction size, etc.

(a) All these constraints can be modified to more general cases; for example, the sparsity constraint can be modified to take the sector information into consideration.
(b) The leverage constraint is commonly used if short-selling is allowed: one needs to limit the amount of leverage to avoid unreasonable solutions where some elements of w are too large with positive sign and others are too large with negative sign, canceling out. When used together with the budget constraint, it is required that L ≥ 1. Specifically, L = 1 means no shorting (so equivalent to w ≥ 0) and L > 1 allows some shorting as well as leverage in the longs.
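For the box uncertainty set U_µ = {µ : |µ_n − µ̂_n| ≤ δ_n}, the inner minimization in the robust return constraint has the closed form min_{µ∈U_µ} w^T µ = w^T µ̂ − δ^T|w|, so the constraint stays convex in w. A small sketch verifying this identity against brute force over the corners of the box (all numbers below are illustrative):

```python
import itertools
import numpy as np

mu_hat = np.array([0.08, 0.05, 0.03, 0.06])     # nominal expected returns
delta  = np.array([0.02, 0.01, 0.015, 0.02])    # half-widths of the box
w      = np.array([0.5, -0.2, 0.4, 0.3])        # some portfolio (with a short position)

# Closed form: the inner minimization decouples elementwise.
worst_closed = w @ mu_hat - delta @ np.abs(w)

# Brute force: a linear function over a box attains its minimum at a corner.
corners = itertools.product(*[(m - d, m + d) for m, d in zip(mu_hat, delta)])
worst_brute = min(w @ np.array(c) for c in corners)
```

Each coordinate is pushed to µ̂_n − δ_n when w_n > 0 and to µ̂_n + δ_n when w_n < 0, which is exactly what the closed form encodes.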

Different from the worst-case optimization, which is commonly carried out in a static setting, stochastic optimization has also been used for robust portfolio optimization, where the parameters are modeled as random variables that fluctuate around their true values and probability distributions need to be assumed to represent the parameter uncertainties [82], [83], [84]. Another way to design robust portfolios is the chance-constrained optimization, which will be briefly mentioned in Section VIII-C. Interested readers on robust portfolio design can refer to [84], [85].

B. Regularized Portfolio

A natural approach to extend the MVP framework is the regularized portfolio, which refers to either adding a penalty term in the objective or imposing a constraint on the weight vector; the regularizers can be simply a norm or a portfolio performance measure, whose intensity is controlled by a tuning parameter. Regularization techniques can be used for robustness and stability purposes. Portfolio managers often impose limits on the portfolio weights of securities or groups of securities to avoid extreme weights that may result from model inaccuracies. An ℓ₂-norm can be heuristically included in the MVP objective as a shrinkage term to mitigate the singularity issue of the covariance matrix. In [86], [87], it is shown that the no-short-selling constraints are equivalent to reducing the estimated security covariances, whereas upper bounds are equivalent to increasing the corresponding covariances. For example, stocks that have high covariance with other stocks tend to receive negative portfolio weights. Therefore, when their covariance is decreased (which is equivalent to the effect of imposing no-short-selling constraints), these negative weights diminish in magnitude. Similarly, stocks that have low covariances with other stocks tend to get over-weighted. Hence, by increasing the corresponding covariances, the impact of these over-weighted stocks decreases.

One important regularizer used is the sparsity-inducing penalty or constraint to reduce the number of active positions in a portfolio. A penalized sparse MVP is given as follows:

    minimize_w    w^T Σ w + γ‖w‖₀
    subject to    w^T µ = ρ
                  w ∈ W,                                  (Sparse MVP)

where ‖w‖₀ is the ℓ₀-"norm" sparsity regularizer. According to Section IV-H, this problem can be recast as a sparse constrained linear regression problem as

    minimize_w    (1/N)‖Rw − α1‖₂² + γ‖w‖₀
    subject to    w^T µ = ρ
                  w ∈ W.                                  (Sparse MVP)

Due to the non-smoothness of ‖w‖₀, the simplest alternative approach is to use the ℓ₁-norm. In [88], it was shown that the LASSO results in constraining the gross exposures. It also implicitly accounts for the gross exposure (i.e., a constraint on the total amount of shorting) and transaction costs, and sets an upper bound on the portfolio risk depending just on the maximum estimation error of the covariance matrix [89]. In [90], the portfolio optimization problem is considered using a linear regularizer for the market impact (see footnote 9). For more works on regularized portfolio design, please refer to [91], [92].
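One standard way to handle such ℓ₁-regularized designs numerically is proximal gradient descent (ISTA), whose proximal step is elementwise soft-thresholding. The sketch below applies it to an unconstrained mean-variance objective with an ℓ₁ penalty; the data, penalty level, and iteration count are illustrative assumptions, not values from the cited works:

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def sparse_mv_ista(Sigma, mu, gamma, n_iter=5000):
    """ISTA for: minimize_w  w' Sigma w - mu' w + gamma * ||w||_1."""
    step = 1.0 / (2.0 * np.linalg.eigvalsh(Sigma).max())  # 1/L, L = Lipschitz const.
    w = np.zeros(len(mu))
    for _ in range(n_iter):
        grad = 2.0 * Sigma @ w - mu                       # gradient of the smooth part
        w = soft_threshold(w - step * grad, step * gamma)
    return w

rng = np.random.default_rng(2)
A = rng.normal(size=(60, 5))
Sigma = A.T @ A / 60 + 0.1 * np.eye(5)                    # synthetic covariance
mu = np.array([0.10, 0.02, 0.08, 0.01, 0.05])
gamma = 0.05

w_sparse = sparse_mv_ista(Sigma, mu, gamma)

def objective(w):
    return w @ Sigma @ w - mu @ w + gamma * np.abs(w).sum()
```

Larger values of γ drive more weights exactly to zero, trading expected return against the number of active positions.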

C. Alternative-Risk Portfolio

In addition to the robust counterparts of MVP, there are also formulations based on alternative risk measures to the variance. As mentioned before, the mean return is very relevant as it quantifies the average benefit of the investment. However, in practice, the average performance is not good enough and one needs to control the probability of going bankrupt. Risk measures control how risky an investment strategy is. The most basic measure of risk is the variance considered by Markowitz [1]: a higher variance means that there are large peaks in the return distribution, which may cause a big loss. However, Markowitz himself already recognized and stressed the limitations of the mean-variance analysis [44]. Variance is not a good measure of risk in practice since it penalizes both the unwanted high losses and the desired low losses (or gains) [93]. Indeed, the mean-variance portfolio framework penalizes upside and downside risk equally, whereas most investors do not mind upside risk. Objections against the symmetry of the variance of returns as a measure of risk have led to various asymmetric risk definitions [94].

9. Market impact is the effect that a market participant has on prices from buying and selling assets.

Fig. 7. Value-at-risk (VaR) and conditional value-at-risk (CVaR). [Figure: the loss distribution with its mean, VaR, CVaR, and maximum loss, together with the corresponding deviation measures; the tail has probability 1−α.]

One example arising in the early 1950s is the downside risk, where the idea is that the left-hand side of the return distribution involves risk while the right-hand side contains the better investment opportunities. One particular case of the downside risk is the semi-variance, already considered by Markowitz [44]. The semi-variance (SVar) or downside variance is measured by the squared deviations of the returns less than the mean:

    SVar_p(w) = E[((µ_p(w) − w^T r_t)₊)²],

where (·)₊ ≜ max(0, ·). We can also define the semi-deviation or downside deviation as √SVar_p(w). Some other downside risk measures are the (below-)target semi-variance and (below-)target semi-deviation. Using the downside risk measures can decrease the numerical tractability of the resulting problems [95].

To overcome this drawback, other popular single-side risk measures have been proposed. The Value-at-Risk (VaR) was proposed by J.P. Morgan and Reuters in 1994 [96]. Denote by −w^T r_t the portfolio loss. The VaR of the portfolio loss at confidence level α for a certain time horizon, shown in Fig. 7, is defined as the lower α-percentile of the portfolio loss:

    VaR_p(w, α) = inf_η { η | P(−w^T r_t ≤ η) ≥ α }.

However, VaR cannot take into account the shape of the losses exceeding the VaR, is nonconvex, and is not sub-additive [97]. The conditional value-at-risk (CVaR) (a.k.a. expected shortfall, average value-at-risk, and expected tail loss) was proposed to take into account the shape of the losses exceeding the VaR. (See Improving the Measure of Risk: From Variance to Value-at-Risk and to Conditional Value-at-Risk.) The CVaR at the 100(1 − α)% level for a certain time horizon, shown in Fig. 7, is the expected loss of the portfolio in the worst 100(1 − α)% of cases:

    CVaR_p(w, α) = E[−w^T r_t | −w^T r_t ≥ VaR_p(w, α)].

There are some other alternative measures like the entropic value-at-risk, drawdown, maximum drawdown, average drawdown, and conditional drawdown at risk [98].
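Both VaR and CVaR can be estimated directly from a historical sample of returns; a minimal sketch (the simulated fat-tailed returns below are illustrative):

```python
import numpy as np

def hist_var_cvar(returns, alpha=0.95):
    """Historical VaR and CVaR of the loss -r at confidence level alpha."""
    losses = -np.asarray(returns)
    var = np.quantile(losses, alpha)          # the alpha-quantile of the loss
    cvar = losses[losses >= var].mean()       # average loss in the tail beyond VaR
    return var, cvar

rng = np.random.default_rng(3)
# Fat-tailed synthetic daily returns (Student's t with 4 degrees of freedom).
r = rng.standard_t(df=4, size=100_000) * 0.01
var95, cvar95 = hist_var_cvar(r, alpha=0.95)
```

By construction CVaR is never smaller than VaR at the same level, since it averages only the losses beyond the VaR threshold.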

Based on these alternative risks, some mean-"alternative risk" portfolios can be formulated. For example, the mean-CVaR portfolio is given as follows [99]:

    minimize_w    CVaR_p(w, α)
    subject to    w^T µ = ρ
                  w ∈ W.                                  (Mean-CVaR Port.)

When the VaR is used as a constraint, it coincides with the chance-constrained optimization mentioned in Section VIII-A. The mean-drawdown portfolio is considered in [100].

Similar to the definition of the Sharpe ratio (excess return divided by the SD), alternative risks are also used to define risk-adjusted return measures. For example, the Sortino ratio (STR) based on the SVar [101] is calculated as follows:

    STR_p(w) = (µ_p(w) − r_f)/√SVar_p(w) = (w^T µ − r_f)/√SVar_p(w).

Other risk-adjusted return ratios based on drawdown are the Burke ratio, Calmar ratio, and Sterling ratio. Like the MSRP, the maximum STR portfolio can be designed by

    maximize_w    (w^T µ − r_f)/√SVar_p(w)
    subject to    w ∈ W.                                  (MSTRP)

Note that the robust procedure discussed in Section VIII-A can also be combined with the alternative risks, as in the worst-case VaR portfolio [76] and worst-case CVaR portfolio [102], [103], [104]. In particular, in [105], the worst-case CVaR portfolio is designed under Student's t and skewed t distributions.

D. General Expected Utility Portfolio

The mean-variance framework is a special case of the so-called expected utility maximization, where investors are assumed to have quadratic utility or return distributions are jointly normal [106]. This may sometimes be limiting, as many financial return distributions are not jointly normal, but exhibit fat tails and asymmetry that cannot be described by their means and variances alone. A general expected utility portfolio can be given by

    maximize_w    E_{r_t∼D}[U((1 + w^T r_t)B)]
    subject to    w ∈ W.

Improving the Measure of Risk: From Variance to Value-at-Risk and to Conditional Value-at-Risk
The variance or, equivalently, the standard deviation (SD) (i.e., volatility) was proposed as a risk measure by Markowitz in 1952. The idea was that the investor was interested in the momentum, i.e., in the expected return, while the fluctuations were considered as undesired noise; hence the measure of variance or standard deviation as risk. However, the SD is not a "good" risk measure, since it only measures the dispersion of the data and is not sensitive to the shape of the data distribution. Value-at-risk (VaR) was proposed by J.P. Morgan in 1994 and is sensitive to the shape of the data distribution (strictly speaking, SD is called a deviation risk measure and VaR is called a downside risk measure [93]). Figs. (a) and (b) show the probability distributions of the portfolio losses over a specified period of time and illustrate the difference between SD and VaR. The losses in Figs. (a) and (b), with reversed distribution shapes, have the same SD but different VaR. While VaR is sensitive to the shape of the distributions, it has several undesirable mathematical characteristics; specifically, it is not sensitive to the shape of the tail of the loss distribution. Conditional value-at-risk (CVaR) is an alternative to VaR that considers the tail shape of the loss distributions; to be exact, it measures the expected value of the tail. This idea of CVaR can be shown in Figs. (b) and (c). Both portfolios have the same VaR. However, the portfolio in Fig. (c) is much riskier than that in Fig. (b) because the potential losses are much larger, which is clearly shown by the CVaR since it is much higher. To summarize, we may say SD asks the question "how uncertain are things in general?", VaR asks "how bad can things get?", and CVaR asks "if things do get bad, what is the expected loss?".

The Kelly portfolio, proposed in [107], can be interpreted as portfolio optimization based on the expected logarithmic utility. In [108], the performance of portfolio allocation by maximizing the expected power utility was compared with that of the standard mean-variance optimization. There are studies with other utility functions [109], [110].
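A minimal sketch of the Kelly (log-utility) idea under the empirical distribution, solved as a smooth concave maximization over the simplex (the synthetic returns and the SLSQP solver choice are illustrative assumptions):

```python
import numpy as np
from scipy.optimize import minimize

def kelly_portfolio(R):
    """Maximize the empirical average log-growth E[log(1 + w' r_t)] over the simplex."""
    T, N = R.shape
    neg_log_growth = lambda w: -np.mean(np.log1p(R @ w))
    budget = {"type": "eq", "fun": lambda w: w.sum() - 1.0}
    res = minimize(neg_log_growth, np.full(N, 1.0 / N),
                   bounds=[(0.0, 1.0)] * N, constraints=[budget], method="SLSQP")
    return res.x

rng = np.random.default_rng(5)
# Synthetic daily linear returns for three assets (illustrative numbers).
R = rng.normal(loc=[0.002, 0.0005, 0.001], scale=[0.02, 0.01, 0.015], size=(2000, 3))
w_kelly = kelly_portfolio(R)
```

Starting from the EWP guarantees a feasible initial point, and the solver can only improve the empirical log-growth from there.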

Some studies show that the quadratic utility function provides a good approximation for many of the standard utility functions, such as the exponential, power, and logarithmic utilities, under the elliptical distribution assumption for asset returns [111]. However, for discontinuous or S-shaped utility functions, or asset returns exhibiting skew, fat tails, and high correlation, this result no longer holds true and mean-variance optimization shows a significant loss in utility compared to an optimization of the full utility function [112]. Given the computational power available today, it is possible to construct portfolios (at least portfolios of small size) directly maximizing the expected utility under the empirical asset return distribution. Another portfolio optimization method alternative to the general expected utility maximization is to extend the mean-variance framework by directly incorporating the portfolio skew and kurtosis. In fact, such extensions can be seen as approximations to general expected utility maximization, where one considers a Taylor series expansion of the utility function and drops the higher order terms from the expansion. Interested readers on this topic are referred to [113], [114], [115].

E. Maximum Diversification Portfolio

In [116], it was proposed that markets are risk-efficient, such that investments will produce returns in proportion to their total risk (measured by volatility). Consistent with the view that returns are directly proportional to volatility, the diversification ratio (DR) was defined by substituting the weighted volatility for the weighted return in the SR. Define σ ≜ √diag(Σ), where diag(·) is a vector with the diagonal elements of the matrix argument. The DR is given by

    DR(w) = (∑_{n=1}^N w_n σ_n)/σ_p(w) = (w^T σ)/√(w^T Σ w).
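A quick numerical check of the DR definition (with illustrative two-asset numbers): under perfect correlation the covariance matrix is rank one and the DR collapses to 1, while any imperfect correlation pushes it above 1:

```python
import numpy as np

def diversification_ratio(w, Sigma):
    """DR(w) = (w' sigma) / sqrt(w' Sigma w), sigma = vector of asset volatilities."""
    sigma = np.sqrt(np.diag(Sigma))
    return (w @ sigma) / np.sqrt(w @ Sigma @ w)

# Two assets with volatilities 20% and 10% and correlation 0.3 (illustrative numbers).
Sigma = np.array([[0.04, 0.006],
                  [0.006, 0.01]])
w_ewp = np.array([0.5, 0.5])
dr = diversification_ratio(w_ewp, Sigma)

# With perfectly correlated assets (a rank-one covariance) the DR collapses to 1.
dr_perfect = diversification_ratio(w_ewp, np.outer([0.2, 0.1], [0.2, 0.1]))
```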

This ratio is by construction always larger than or equal to 1, and the investor typically prefers a higher value of the DR. The maximum diversification portfolio (MDP) is defined as

    maximize_w    (w^T σ)/√(w^T Σ w)
    subject to    w ∈ W.                                  (MDP)

An interesting implication explored at length in a follow-up paper [117] is that the maximized DR quantifies the "amount of diversification" in the portfolio. This is quite intuitive. The volatility of a portfolio of perfectly correlated assets would be equal to the weighted sum of the volatilities of its constituents, because there is no opportunity for diversification. When assets are not perfectly correlated, the weighted average volatility becomes larger than the portfolio volatility in proportion to the amount of diversification that is available. Therefore, the DR, which is to be maximized, actually quantifies the degree to which the portfolio risk can be minimized through the strategic placement of weights on diversifying assets.

F. Maximum Decorrelation Portfolio

Maximum decorrelation, described in [118], is closely related to the GMVP and MDP, but applies to the case where an investor believes all assets have similar returns and volatilities, but heterogeneous correlations. The maximum decorrelation portfolio (MDCP) is found by solving

    minimize_w    w^T C w
    subject to    w ∈ W,                                  (MDCP)

where C ≜ (Diag(Σ))^{-1/2} Σ (Diag(Σ))^{-1/2} is the correlation matrix and Diag(·) defines a diagonal matrix with diagonal elements given by the diagonal elements of the matrix argument. Interestingly, when the weights derived from the MDCP are divided by their respective volatilities and re-standardized so that they sum to 1, we retrieve the MDP weights. Thus, the portfolio weights that maximize decorrelation will also maximize the DR when all assets have equal volatility, and maximize the SR when all assets have equal risks and returns.

Portfolios like the GMVP, MVP, MDP, and MDCP introduced so far may have the drawback that they can be quite concentrated in a small number of assets. For example, the GMVP will place a disproportionate weight on the lowest-volatility asset, while the MDP will concentrate on assets with high volatility and low covariance with the market. There are situations where this may not be preferable. Concentrated portfolios also may not accommodate large amounts of capital without high market impact costs. In addition, concentrated portfolios are more susceptible to mis-estimation of volatilities or correlations. These issues prompted a search for heuristic optimizations that meet similar optimization objectives, but with less concentration, of which the EWP is an example. In the following, we will introduce several compelling methods to design more diversified portfolios.

G. Weight Budgeting Portfolio

In Section IV, we have introduced the EWP and Quintile Portfolio. Basically, in those portfolios, we directly assign a fixed fraction of the capital to specific assets, which can be summarized as follows:

    w_n = b_n,  ∀n = 1, . . . , N,                        (WBP)

where b ≜ [b₁, . . . , b_N] defines the allocation preference and satisfies 1^T b = 1. This kind of allocation strategy to construct well-diversified portfolios is commonly called the weight budgeting portfolio (WBP) or capital budgeting portfolio.

Another famous example of the WBP is the market-capitalization weighted portfolio (MWP) (a.k.a. market-value weighted portfolio). In an equity market, stocks are divided into large-, mid-, small-, and micro-cap ones. An MWP is a portfolio with individual components that are weighted according to their total market capitalization. It is widely used to construct index funds in mutual fund companies. The MWP can be interpreted as tracking a market-capitalization weighted index, which will be introduced in Section VIII-J. People have also proposed the diversity weighted portfolio (DWP), where the weights are chosen as power transformations of the market weights [119].

H. Risk Budgeting Portfolio

Although portfolio management did not change much during the 40 years after the seminal works of Markowitz and Sharpe, the development of risk budgeting techniques marked an important milestone in the deepening of the relationship between risk and asset management. In the traditional MVP, the risk of the portfolio is only considered as a whole and the risk diversification is usually bad. In a typical 60/40 US equities-bond portfolio, the equity part is often responsible for more than 90% of the total portfolio's volatility. Risk parity became a popular financial model after the global financial crisis in 2008 [52], [120]. Since then, risk management [121] has become even more important than performance management in portfolio optimization. (See What is Risk Management in Finance?) The risk parity portfolio design has been receiving significant attention from both the theoretical and practical sides because it diversifies the risk, instead of the capital, among the assets and is less sensitive to parameter estimation errors. Today, pension funds and institutional investors are using this approach in the development of smart indexing and the redefinition of long-term investment policies [52]. (See A New Era for Asset Management: From "Dollar" Diversification to Risk Diversification.)

One of the important concepts in portfolio management is quantifying the risk contribution of individual components (such as that of strategies and/or securities) to the total portfolio risk. The distribution of risk contributions from different portfolio components can then be used to measure the level of diversification within the portfolio. The marginal contribution to risk (MRC) of asset n with respect to the portfolio weight w_n is defined by the first derivative of the portfolio volatility σ_p:

    MRC_n(w) = ∂σ_p(w)/∂w_n = (Σw)_n/σ_p(w).

The MRC measures the sensitivity of the portfolio volatility to a change in the weight of asset n. In general, the MRC can be defined based on other risk measures, like the VaR and CVaR. The risk contribution (RC) of an asset is defined by the product of the asset weight in the portfolio and its MRC:

    RC_n(w) = w_n MRC_n(w) = w_n (Σw)_n/σ_p(w).
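A small numerical sketch of the RC definition (the covariance matrix and weights below are synthetic):

```python
import numpy as np

def risk_contributions(w, Sigma):
    """RC_n = w_n (Sigma w)_n / sigma_p(w)."""
    sigma_p = np.sqrt(w @ Sigma @ w)
    return w * (Sigma @ w) / sigma_p

rng = np.random.default_rng(6)
A = rng.normal(size=(40, 4))
Sigma = A.T @ A / 40                      # synthetic covariance matrix
w = np.array([0.4, 0.3, 0.2, 0.1])

rc = risk_contributions(w, Sigma)
# The RCs add up exactly to the portfolio volatility sigma_p(w).
```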

In general, the RC of an asset to a portfolio can be negative. An important property of this definition is that the sum of the RCs of all assets in the portfolio equals the total risk of the portfolio, that is,

    ∑_{n=1}^N RC_n(w) = ∑_{n=1}^N w_n(Σw)_n/σ_p(w) = (w^T Σ w)/σ_p(w) = σ_p(w)

(known as Euler's theorem). The relative risk contribution (RRC) of an asset can also be defined as the ratio of its RC to the total portfolio risk, that is, RRC_n(w) = RC_n(w)/σ_p(w). The goal of the risk parity portfolio is to allocate the weights so that all the assets contribute the same amount of


Fig. 8. Evolution over time of the weight and risk allocations of EWP, GMVP, MDP, and ERP (in-sample training period). [Figure: paired panels of weight allocations and risk (relative risk contribution) allocations for AAPL, GOOG, MSFT, AMZN, and FB.]

What is Risk Management in Finance?
In the financial world, risk management is the process of identification, analysis, and acceptance or mitigation of uncertainty in investment decisions. Essentially, risk management occurs when an investor or fund manager analyzes and quantifies the potential for losses (called the risk exposure) in an investment and then takes the appropriate action (or inaction) given the investment objectives and risk tolerance. In practice, all financial companies have a risk department that takes all the positions from all the other teams (investment funds, hedge funds, etc.) and assesses their risk exposure.

risk, effectively "equalizing" the risk. Given an N-dimensional covariance matrix Σ, a portfolio w is called a risk parity portfolio or equal risk contribution portfolio (ERP) [122] with respect to Σ if it satisfies the following equivalent conditions:

    RC_n(w) = (1/N)σ_p(w)  or  RRC_n(w) = 1/N,  ∀n = 1, . . . , N.     (ERP)

In view of the last equation, risk parity portfolios can be compared to the EWP, where w_n = 1/N for n = 1, . . . , N: instead of allocating the capital evenly across all the assets in the investment universe, the RBP aims at allocating the total risk evenly across the assets. More generally, we can have the risk budgeting portfolio (RBP):

RCn (w) = bnσp (w) or RRCn (w) = bn, ∀n = 1, . . . , N,(RBP)

where bn denotes the risk budget for asset n with bn ≥ 0 and∑n bn = 1. ERP is a special case of the RBP with bn = 1/N .
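As a quick numerical illustration (not from the article), the risk contributions RC_n(w) = w_n(Σw)_n/σ_p(w) and the relative risk contributions RRC_n(w) = RC_n(w)/σ_p(w) can be computed directly from a covariance matrix; the 3-asset Σ below is hypothetical, and numpy is assumed:

```python
import numpy as np

def risk_contributions(w, Sigma):
    """Absolute and relative risk contributions of each asset:
    RC_n = w_n (Sigma w)_n / sigma_p and RRC_n = RC_n / sigma_p,
    so that sum_n RC_n = sigma_p and sum_n RRC_n = 1."""
    sigma_p = np.sqrt(w @ Sigma @ w)   # portfolio volatility
    rc = w * (Sigma @ w) / sigma_p     # absolute risk contributions
    rrc = rc / sigma_p                 # relative risk contributions
    return rc, rrc

# Hypothetical 3-asset covariance matrix.
Sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])
w_ewp = np.ones(3) / 3                 # equally weighted portfolio
rc, rrc = risk_contributions(w_ewp, Sigma)
```

Note that for the EWP the capital is equal across assets but the risk contributions are not, which is exactly the gap the ERP closes.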

In general, finding a risk parity portfolio is not trivial. But when the portfolio is designed under only the budget and no-shorting constraints, it can be solved as a convex problem. (See Discovering "Convexity" in Risk Budgeting Portfolio.) In more general settings, the RBP can be formulated by minimizing the deviation between the risk contributions and the target budgets [36]:

minimize_w   Σ_{n=1}^N (RC_n(w) − b_n σ_p(w))²
subject to   w ∈ W.

Some other RBP problem formulations can be found in [127], [36]. This problem is in general nonconvex, and efficient nonconvex optimization methods have been applied to solve it [128], [129], [36], [126]. The RBP can generally be used for asset allocation across different geographic regions or different risk factors in an equity portfolio [130], and it can be adapted to other investment targets (like the mean-risk portfolio and index tracking) by imposing risk budget constraints. In fact, the ERP can be seen as a trade-off between the GMVP and the EWP; thus, to some degree, it inherits some of the properties of both portfolios, and its volatility lies between those of the GMVP and of the EWP. On the one hand, compared with the GMVP, which typically concentrates the risk on a few assets, the ERP assigns an equal amount of risk to each asset, which is diversification in the true sense. On the other hand, compared with the EWP, the ERP is less risky overall and it is not heuristic. A comparison of several diversification-based portfolios is given in Fig. 8.

I. Inverse Volatility and Inverse Variance Portfolio

Similar to the RBP, the aim of the inverse volatility (weighting) portfolio (IVP) is to control the portfolio risk. Defining σ ≜ √diag(Σ), it is given by

w = σ^{-1} / (1ᵀσ^{-1}),   (IVP)

where the inverse is taken elementwise. Therefore, lower weights are given to high-volatility assets and higher weights to low-volatility assets. The IVP is also called the "equal volatility" portfolio since the weighted constituent assets have equal volatility. The IVP strategy also coincides with the ERP when there is no correlation between the underlying


A New Era for Asset Management: From "Dollar" Diversification to Risk Diversification
Risk parity (or risk premia parity) is an approach to portfolio management that focuses on the allocation of risk rather than the allocation of capital. The risk parity approach asserts that when asset allocations are adjusted to the same risk level, the portfolio can achieve a higher Sharpe ratio and can be more resistant to market downturns. While the minimum variance portfolio tries to minimize the variance (with the disadvantage that a few assets may be the ones contributing most to the risk), the risk parity portfolio tries to constrain each asset (or asset class, such as bonds, stocks, real estate, etc.) to contribute equally to the portfolio overall volatility. The term "risk parity" was coined by Edward Qian from PanAgora Asset Management in 2005 [52] and was then adopted by the asset management industry. Some of its theoretical components were developed in the 1950s and 1960s, but the first risk parity fund, called the "All Weather" fund, was pioneered by Bridgewater Associates LP in 1996. Interest in the risk parity approach has increased since the late-2000s financial crisis, as the risk parity approach fared better than traditionally constructed portfolios. Some portfolio managers have expressed skepticism about the practical application of the concept and its effectiveness in all types of market conditions, but others point to its performance during the financial crisis of 2007-2008 as an indication of its potential success.

Discovering "Convexity" in Risk Budgeting Portfolio
Consider the following risk budgeting equations:

w_n (Σw)_n = b_n wᵀΣw,   n = 1, …, N,

with constraints 1ᵀw = 1 and w ≥ 0. This is a problem of solving a constrained quadratic system of equations. If we define w̃ = w/√(wᵀΣw), the above equations can be rewritten compactly as

Σw̃ = b/w̃

(with the division understood elementwise), where w̃ ≥ 0, and we can always recover the portfolio by normalizing: w = w̃/(1ᵀw̃). Interestingly, the risk budgeting portfolio design can be recast as the following convex optimization problem [123], [124], [125], [126]:

minimize_{w̃ ≥ 0}   (1/2) w̃ᵀΣw̃ − bᵀ log(w̃).

To see the equivalence, just write the optimality condition of the above problem, i.e., set the gradient equal to zero:

Σw̃ − b/w̃ = 0.
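A minimal sketch of solving this convex problem, assuming numpy and a hypothetical 3-asset covariance matrix. It uses cyclical coordinate descent (one of several methods in the cited literature, not necessarily the article's choice): setting the gradient of the objective to zero in coordinate i gives the scalar quadratic Σ_ii w_i² + c_i w_i − b_i = 0 with c_i = Σ_{j≠i} Σ_ij w_j, whose positive root is taken.

```python
import numpy as np

def risk_budgeting_portfolio(Sigma, b, n_iter=200):
    """Cyclical coordinate descent on the convex problem
        minimize_{w > 0}  0.5 * w' Sigma w - b' log(w),
    followed by normalization so the weights sum to one.
    Each update takes the positive root of
        Sigma_ii * w_i^2 + c_i * w_i - b_i = 0."""
    N = len(b)
    w = np.sqrt(b) / np.sqrt(np.diag(Sigma))  # warm start: naive RBP
    for _ in range(n_iter):
        for i in range(N):
            c = Sigma[i] @ w - Sigma[i, i] * w[i]  # cross term c_i
            w[i] = (-c + np.sqrt(c * c + 4 * Sigma[i, i] * b[i])) / (2 * Sigma[i, i])
    return w / w.sum()

# Hypothetical covariance matrix and target risk budgets.
Sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])
b = np.array([0.5, 0.3, 0.2])
w = risk_budgeting_portfolio(Sigma, b)
```

The resulting relative risk contributions w_n(Σw)_n/(wᵀΣw) match the budgets b, which can be checked numerically.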

assets. Assuming that Σ is diagonal and with constraints 1ᵀw = 1 and w ≥ 0, the RBP is readily given by

w_n = (√b_n σ_n^{-1}) / (Σ_{i=1}^N √b_i σ_i^{-1}),   ∀n = 1, …, N.

When b_n = 1/N, it reduces to the IVP. However, for a non-diagonal Σ or with other additional constraints, this closed-form solution does not exist in general and some optimization procedure has to be employed. The previous diagonal solution can always be used and is commonly called the naive risk budgeting portfolio.

Instead of using the volatility, a similar portfolio called the inverse variance (weighting) portfolio can be designed, where the portfolio weights are inversely proportional to the variances of the assets. Interestingly, a GMVP with a diagonal covariance matrix Σ and with constraints 1ᵀw = 1 and w ≥ 0 coincides with the inverse variance portfolio.
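The two closed-form portfolios above can be sketched in a few lines (numpy assumed; the covariance matrix is hypothetical):

```python
import numpy as np

# Hypothetical 3-asset covariance matrix.
Sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])
vol = np.sqrt(np.diag(Sigma))                # per-asset volatilities sigma_n

w_ivp = (1 / vol) / np.sum(1 / vol)          # inverse volatility portfolio
w_ivar = (1 / vol**2) / np.sum(1 / vol**2)   # inverse variance portfolio

# "Equal volatility" property: each weighted constituent, w_n * sigma_n,
# contributes the same volatility.
equal_vol = w_ivp * vol
```

As the text notes, w_ivar is also what the GMVP reduces to when the off-diagonal entries of Σ are ignored.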

J. Index Tracking Portfolio

Index tracking (a.k.a. index replication, indexing, or indexation) is one of the most popular passive portfolio management strategies. (See Two Paths of Asset Management: Active vs. Passive.) It refers to the problem of reproducing or mimicking

Two Paths of Asset Management: Active vs. Passive
Fund managers from investment institutions commonly follow two basic types of investment strategies: active and passive. Active management is based on the assumption that the market is not perfectly efficient and the so-called "free lunch" exists [131]. Passive management, on the other hand, relies on the efficient market assumption, which implies that the market cannot be beaten in the long run [132]; examples include the classic 60/40 portfolio and index tracking funds. Active management strategies have significantly higher management fees with the promise of better performance. However, analysis of historical data has shown that, in the long run, the majority of them do not outperform the market, which has historically risen [133]. These reasons have recently attracted more attention from investors to passive management strategies instead.

the performance of a market index. The most straightforward way is to construct an efficient portfolio called the index tracking portfolio (a.k.a. index tracking fund) by buying appropriate quantities of all the stocks that compose the index, which guarantees a perfect tracking. The MWP discussed in Section VIII-G is a common index tracking portfolio. By tracking an index, an investment portfolio typically gets good diversification, low turnover (good for keeping down internal transaction costs), and low management fees.

A naive way to realize index tracking is to buy all the underlying assets in an index. However, this approach has several important practical drawbacks, like the transaction costs incurred by allocating capital to all the assets and the danger of holding positions in illiquid assets. In addition, a market index, such as the S&P 500, usually contains a large universe of stocks, which means the problem is high-dimensional. These issues have led to solutions based on the construction of a subset of the index. Fig. 9 shows an example where the S&P 500 index is tracked by 50, 30, and 20 assets.

Traditionally, the sparse index tracking portfolio involves two steps, namely stock selection and capital allocation [134], [135]. But this procedure has two major drawbacks: i) stock selection is difficult, and although various stock selection methods have been proposed, most of them are very heuristic,


Fig. 9. Index tracking of the S&P 500 with different numbers of active assets (50, 30, and 20): cumulative PnL from Jul 2010 to Jan 2011.

with the effect on the tracking performance not clear; and ii) from the perspective of minimizing the tracking error, this two-stage approach is obviously not optimal. As an alternative, a one-step index tracking optimization problem is given as follows [136]:

minimize_w   (1/T) ‖Rw − r_b‖₂² + λ ‖w‖₀
subject to   w ∈ W,   (1)

where the first term denotes the tracking error criterion, with r_b = [r_{b,1}, …, r_{b,T}]ᵀ denoting the returns of the target index, and the ℓ₀-"norm" penalty plays the role of asset selection.

In practice, sector information (e.g., technology sector, health sector, utilities sector) can be taken into account in the index tracking portfolio problem so that sparsity is controlled on a sector-by-sector basis [137]. Combining the concept of index tracking with active management methods like the mean-variance portfolio leads to enhanced index tracking problems [138], [139].
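A heuristic sketch of sparse index tracking on synthetic data (not the method of [136]): the nonconvex ℓ₀ penalty is approximated by first solving the dense simplex-constrained least-squares problem with projected gradient, then keeping the k largest weights and re-solving on that support. All data and function names are hypothetical; numpy is assumed.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection onto {w : w >= 0, 1'w = 1} (sort-based method)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1))[0][-1]
    theta = (css[rho] - 1) / (rho + 1)
    return np.maximum(v - theta, 0)

def track_index(R, rb, n_iter=2000):
    """Projected gradient for minimize (1/T)||R w - rb||^2 over the simplex."""
    T, N = R.shape
    w = np.ones(N) / N
    step = T / (2 * np.linalg.norm(R, 2) ** 2)   # 1/L for the smooth term
    for _ in range(n_iter):
        grad = (2 / T) * R.T @ (R @ w - rb)
        w = project_simplex(w - step * grad)
    return w

def sparse_track_index(R, rb, k):
    """l0 heuristic: dense solve, keep the k largest weights, re-solve on them."""
    w_dense = track_index(R, rb)
    support = np.argsort(w_dense)[-k:]
    w = np.zeros(R.shape[1])
    w[support] = track_index(R[:, support], rb)
    return w

# Synthetic experiment: the "index" is held in 3 of 10 assets.
rng = np.random.default_rng(42)
R = rng.normal(0.0, 0.01, size=(200, 10))
rb = R[:, :3] @ np.full(3, 1.0 / 3.0)
w_sparse = sparse_track_index(R, rb, k=3)
```

With only three active names the heuristic recovers the true support here; on real index data the same two-stage shortcut suffers exactly the suboptimality the text describes, which is the motivation for the one-step formulation (1).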

K. Statistical Arbitrage and Mean-Reverting Portfolio

In the previous sections, we have discussed several momentum strategies based on the modeling of portfolio returns. In this section, we will discuss one contrarian strategy called statistical arbitrage (Stat Arb), which evolved from the pairs trading strategy [140], [141], [142]. (See Trading the "Noise": Pairs Trading Strategy.) Different from using the returns, in statistical arbitrage the trading is based on the portfolio price, termed the "spread":

y_{p,t} = wᵀy_t,   (MRP)

where y_t denotes the stock log-prices. The portfolio defining this spread is called the mean-reverting portfolio (MRP). The spread characterizes the mispricing between assets and exists as a result of market inefficiency. The idea of Stat Arb is that although the log-prices y_t are nonstationary, the resulting spread y_{p,t} is stationary and, specifically, has the mean-reverting property: the spread oscillates around some equilibrium mean value and, when it deviates, it will eventually come back to the equilibrium value. The portfolio weights w describe the stationary and hence mean-reverting relationship of the underlying

Fig. 10. Illustration of pairs trading with EWH and EWZ (2001-2003): the two nonstationary log-prices, the stationary mean-reverting spread, and the cumulative PnL obtained by trading it.

nonstationary time series. The idea is to buy the synthetic asset characterized by the spread when it is cheap and short it when it is expensive. Trading on the spread implicitly means buying and shorting different assets according to the weights w that define the spread [143].

Since Stat Arb focuses on the relative prices of a pair of (or more) stocks, investors or arbitrageurs embracing this strategy do not need to forecast the absolute price of every single asset, which by nature is hard to assess, but only the relative pricing. Stat Arb is referred to as a market-neutral strategy [144], [145], since profits do not depend on the movements of the general financial markets. (See What is Market-Neutral Strategy in Finance?) Stat Arb is popular among institutions, hedge funds, and individual investors, as it protects them against market swings. Many bank proprietary operations now center to varying degrees around statistical arbitrage trading. Companies in the same financial sector or industry usually share similar fundamental characteristics, and economies may have exposure to very similar industries. In these cases, asset prices may co-move under the same trend, based on which cointegration relations can be established, as shown by the illustrative example in Fig. 10.
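As a stylized illustration of how a mean-reverting spread might be traded, the following sketch applies a common textbook z-score threshold rule (not the article's strategy): short the spread when its z-score is high, buy when it is low, and close near the mean. The spread here is a synthetic sinusoid standing in for a stationary series; all names are hypothetical.

```python
import numpy as np

def zscore_signal(spread, entry=1.0, exit=0.0):
    """Threshold rule on a mean-reverting spread: open a short position
    when the z-score exceeds +entry, a long position below -entry, and
    close when the z-score crosses back through the exit level."""
    z = (spread - spread.mean()) / spread.std()
    position = np.zeros_like(spread)
    pos = 0.0
    for t, zt in enumerate(z):
        if pos == 0.0:
            if zt > entry:
                pos = -1.0          # spread rich: short it
            elif zt < -entry:
                pos = 1.0           # spread cheap: buy it
        elif pos == 1.0 and zt >= -exit:
            pos = 0.0               # long reverted to the mean: close
        elif pos == -1.0 and zt <= exit:
            pos = 0.0               # short reverted to the mean: close
        position[t] = pos
    return position

# Synthetic mean-reverting spread (two full oscillation periods).
spread = np.sin(np.linspace(0, 4 * np.pi, 400))
position = zscore_signal(spread)
# Hold the position decided at t over the interval (t, t+1].
pnl = np.sum(position[:-1] * np.diff(spread))
```

On this deterministic spread the rule is profitable by construction; on real data the profitability hinges on whether the estimated w really produces a stationary spread.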

Some papers have studied the pairs trading strategy based on Kalman filtering. Another line of research is the optimal portfolio design problem. Traditionally, w is defined based on a cointegration relation [34], e.g., using the VECM model (see Section III). However, through cointegration analysis a subspace is identified, which leads to infinitely many cointegration relations. In this case, the MRP design aims at designing an


Trading the "Noise": Pairs Trading Strategy
Pairs trading, as the most basic form of statistical arbitrage, is a famous quantitative trading strategy pioneered by Gerry Bamberger, David Shaw, and the trading group led by Nunzio Tartaglia at Morgan Stanley in the mid 1980s. Pairs trading consists of two assets for trading, where the pairs of stocks are properly selected from the universe of stocks based on fundamental or market-based similarities. When one stock in a pair outperforms the other, the poorer performing stock is bought with the expectation that it will later recover, while the other is sold short. Mathematically, the two assets are combined into one synthetic asset that has the property of mean reversion, i.e., it may temporarily deviate from its equilibrium or mean value but it will eventually revert back to it. Various statistical tools have been used for discovering pairs of assets with the right properties, ranging from simple distance-based approaches to more complex tools such as cointegration and copula concepts. As a trading strategy, it is a heavily quantitative and computational approach to securities trading. It involves data mining and statistical methods, as well as the use of automated trading systems. Extending to multiple assets, statistical arbitrage considers not just pairs of stocks but a larger universe of stocks (some long, some short) that are carefully matched by sector and region to eliminate exposure to the market (i.e., the beta is zero) or other risk factors.

What is Market-Neutral Strategy in Finance?
A market-neutral strategy is a type of investment strategy which aims at making a profit from the individual movements of the assets with minimal effects from the overall movements of the market. But how can this be achieved? Being market-neutral can imply dollar-neutral, beta-neutral, or both. A dollar-neutral strategy has zero net investment (i.e., equal dollar amounts in long and short positions: 1ᵀw = 0). A beta-neutral strategy targets a zero total portfolio beta (i.e., the beta of the long side equals the beta of the short side: βᵀw = 0, where β is the factor loading of the market factor in the CAPM). While dollar-neutrality has the virtue of simplicity, beta-neutrality better defines a strategy that is uncorrelated with the markets. In practice, neutrality strategies can generally be established based on other factors, such as currency, sector, industry, or market capitalization. By doing this, a strategy can remove its exposure to specific factors, and hence the risk from those factors may be avoided.

optimal portfolio within the cointegration space. Details on the mean-reverting portfolio optimization problem can be found in [146], [147], [148]. Combining the index tracking idea and the statistical arbitrage portfolio, we can also design an index arbitrage portfolio. Because of the large number of stocks involved, the high portfolio turnover, and the fairly small size of the effects one is trying to capture, the strategy is often implemented in an automated fashion, and great attention has been placed on reducing trading costs.

L. Learning-Based Portfolio

The value of machine learning in finance is becoming more apparent by the day. As banks and other financial institutions strive to beef up security, streamline processes, and improve financial analysis, machine learning is becoming the technology of choice [149]. Portfolio optimization has also benefited a lot from machine learning [150].

In the big data era, portfolio optimization can be carried out over the massive datasets existing in the market, like high-frequency data. The data analysis processes in this setting can be difficult and sensitive to human decisions. Deep learning has become a mainstream methodology since it can automatically learn the best features [151]. Deep hierarchical models can potentially be used for portfolio optimization. In particular, deep reinforcement learning can be used for dynamic portfolio construction in a dynamic trading environment [152]. Deep learning is commonly regarded as a data-hungry method; however, financial return data are generally scarce (except for high-frequency data). How to feed the learning model with enough high-quality data is an interesting research problem. Leveraging learning-based portfolio optimization, we can consider so-called end-to-end solutions where the input is the historical information (like asset returns, trading volumes, company income statements, financial reports, media news, etc.) and the output is the portfolio.

M. Multi-Period Portfolio

The work of Markowitz [1] only dealt with portfolio design for the single-period case. Later, several works [153], [154] illustrated that single-period portfolio choice policies are in general not optimal, as they do not capture inter-temporal effects and hedging demands (called single-period distortion). Return predictability and market impact naturally give rise to inter-temporal hedging demands for securities, and investors need to look beyond just the next period when optimally allocating across securities [155]. For instance, market impact costs from trades in the current period have an effect on prices in later periods. In general, prices move against the portfolio manager. Market impact costs are costs associated with the immediacy of trading: if the trading is done quickly (slowly), market impact costs are higher (lower). Portfolio managers frequently break up their orders (parent orders) into smaller pieces (child orders) and trade those over longer periods of time [156].

Rather than using single-period models to rebalance the portfolio from one period to another, a multi-period portfolio (a.k.a. dynamic portfolio) optimization framework allows us to jointly model risk, return predictability (alpha) and its decay, and impact costs, as well as incorporating standard portfolio constraints [157]. A robust version of the multi-period optimization is given in [158], and a recent overview is given in [159].


N. Multi-Account Portfolio

Previously, each portfolio was considered and optimized individually, disregarding the effect or impact on other portfolios. However, portfolio managers usually manage multiple accounts that have an impact on each other. If this aggregate market impact is not considered when each account is individually optimized, the actual market impact can be severely underestimated. Thus, a more realistic way is to analyze and optimize the multiple portfolios jointly, while adhering to both the account-specific constraints and also some global constraints imposed on all accounts. This holistic approach is termed multi-account portfolio optimization. Taking each portfolio account as a player, multi-portfolio optimization can be modeled via game theory [160]. Based on the MVP problem, the multi-account portfolio optimization is given as follows [161]:

minimize_{w_k}   w_kᵀΣw_k − λ_k w_kᵀμ + TC(w_k, w_{−k})
subject to   w_k ∈ W_k,   ∀k = 1, …, K,

where w_k is the portfolio and λ_k is the return-risk trade-off parameter of account k, and TC(w_k, w_{−k}) is the transaction cost caused by the market impact of all the accounts, with w_{−k} denoting the portfolios of all accounts other than account k.

O. Multi-Manager Portfolio

A multi-manager portfolio, often referred to as a fund of funds (FOF), is an investment strategy of holding a portfolio of other investment funds rather than investing directly in stocks, bonds, or other securities. There are different types of FOF, each investing in a different type of collective investment scheme (typically one type per FOF), for example, a mutual fund FOF, a hedge fund FOF, a private equity FOF, or an investment trust FOF. The original FOF was created by Bernie Cornfeld in 1962 and went bankrupt after being looted by Robert Vesco. Readers interested in FOF portfolio optimization can refer to [162], [163], [164].

Although we have discussed several important portfolio design problems, there are many other interesting portfolios, like the factor mimicking portfolio (a.k.a. mimicking portfolio) [30], the absolute return portfolio [165], the universal portfolio [166], the information-theoretic entropy portfolio [167], etc. Due to space limits, we cannot go into these.

IX. PORTFOLIO PERFORMANCE ANALYSIS AND PORTFOLIO BACKTESTING

A. Portfolio Performance Analysis

In the previous sections, we have introduced multiple criteria for portfolio performance evaluation. It should be noted that all the measures introduced before and in this section can serve either as optimization criteria in portfolio design or as diagnostic criteria in performance testing. In this section, we introduce two more portfolio performance measures.

We have introduced a wide variety of portfolios. The target of portfolio design is to diversify the assets in the expectation of a future profit. A widely used measure of return performance is the compound annual growth rate (CAGR). The CAGR is defined as the rate of return of an investment over a certain period of time, expressed as an annual percentage: CAGR_p = (B_T/B_0)^{1/T} − 1, assuming T investment periods with initial value (or budget) B_0 and end value B_T.
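The CAGR formula translates directly into code; the example values below are hypothetical:

```python
def cagr(B0, BT, T):
    """Compound annual growth rate over T annual periods:
    CAGR = (BT / B0) ** (1 / T) - 1."""
    return (BT / B0) ** (1.0 / T) - 1.0

# A portfolio growing from 100 to 200 over 5 years compounds at
# 2 ** (1/5) - 1, i.e., roughly 14.9% per year.
growth = cagr(100.0, 200.0, 5)
```

Note the annualization: doubling over five years is far less than a 20% yearly return, which is why the CAGR rather than the raw cumulative return is the standard quote.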

In practice, we may need to rebalance our portfolio during the investment period. (See To Rebalance, or Not To Rebalance?) As frequent rebalancing of a portfolio is costly,

To Rebalance, or Not To Rebalance?
After the portfolio capital allocation w has been designed, the buy and sell orders have to be executed, i.e., sent to the broker for execution in the market in the form of the number of shares to buy or sell. However, the real-time asset holding values will be changing over time since the asset prices change and, as a consequence, the portfolio w will also be changing (the number of shares is constant, but not the dollar amount). After some investment period of time, the portfolio w may have changed significantly from its originally designed value. The question is: should we leave it as it is, or should we rebalance w back to the originally designed value? This leads to the portfolio rebalancing problem, which plays an important role in portfolio management. There are generally two different rebalancing approaches. The buy-and-hold (B&H) strategy, also called position trading, is to buy an initial allocation mix and hold it indefinitely, without rebalancing, regardless of the performance. B&H is basically a "no-rebalancing" strategy. The rebalancing strategy, also called constant-mix, is to change the allocation mix when the current w has deviated too much from the originally designed one. The rebalancing can be carried out at fixed intervals or in a dynamic fashion. In practice, which strategy to use depends on many factors, like the market condition (bull or bear) and the investment preferences of clients and portfolio managers.

the turnover (TO) of a portfolio is an important measure in portfolio analysis. The one-period TO between portfolios w_{p,t} and w_{p,t−1} is given by TO_{p,t} = ‖w_{p,t} − w_{p,t−1}‖₁, which measures the change of the portfolio capital weights between two consecutive periods. The multi-period TO is given by TO_p = Σ_{t=1}^T ‖w_{p,t} − w_{p,t−1}‖₁, assuming T rebalancings.

Another portfolio measure is the transaction cost (TC) or trading cost, which is related to any non-zero turnover. TC is defined directly on the amount of assets traded (e.g., shares of stocks) and depends on the rebalancing frequency. TCs consist of commissions and taxes, the bid-ask spread10, slippage11, etc. Depending on the liquidity of different assets, their trading costs may be significantly different. There are many models of TC, and different asset classes have different models [49].
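The turnover definition above can be sketched as follows (numpy assumed; the weight trajectory and cost rate are hypothetical):

```python
import numpy as np

def turnover(weights):
    """Multi-period turnover: the sum of l1 distances between
    consecutive weight vectors, TO = sum_t ||w_t - w_{t-1}||_1."""
    W = np.asarray(weights)
    return np.sum(np.abs(np.diff(W, axis=0)))

W = np.array([[0.5, 0.5],
              [0.6, 0.4],    # one-period TO = |0.1| + |-0.1| = 0.2
              [0.6, 0.4]])   # no rebalancing: contributes 0
to = turnover(W)             # 0.2 in total

# A crude proportional transaction-cost model (rate is hypothetical):
tc = 0.001 * to              # e.g., 10 bps per unit of turnover
```

This is the simplest linear TC model; as the text notes, realistic models differ across asset classes [49].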

B. Portfolio Backtesting

A full investment process includes many stages, like data collection and cleaning (Section III), signal modeling (Section

10 Bid-ask spread refers to the difference between the bid price (i.e., the highest buying price) and the ask price (i.e., the lowest selling price) in the market.

11 Slippage refers to the difference between the expected price of a trade and the price at which the trade is actually executed.


III), portfolio design (Sections IV-VIII), performance evaluation (Section IX), etc. Before a portfolio is executed in the market, it should be designed and tested based on historical data (a look-back period), which is called backtesting or backtrading. Backtesting has historically only been performed by large institutions and professional asset managers due to the expense of obtaining and using detailed datasets. However, it is becoming a required step for many investment firms, and many independent web-based backtesting platforms have emerged.

Fig. 11. The sliding-window scheme for in-sample training and out-of-sample testing.

The simplest way to do backtesting is based on a period of historical data, typically divided into three parts: the training set (in-sample data) to estimate the model parameters (for example, μ and Σ), the cross-validation set (in-sample data) to choose the model and the model order, and the testing set (out-of-sample data) to assess the performance. A more refined way of doing backtesting is via a sliding-window or rolling-window scheme, as shown in Fig. 11: basically, it is like the previously mentioned block backtesting but on a rolling-window basis. Conducting backtesting only on one specific set of historical data is not representative of what could have happened. A more serious way of doing backtesting is therefore by having a large number of sets of historical data covering different market regimes (bull market, side market, and bear market), as well as choosing different groups of stocks from one market, and even different markets. More sophisticated methods involve some kind of resampling or synthetic regeneration of data based on the available limited historical data.
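The sliding-window scheme can be sketched as follows: at each rebalance date, the portfolio is designed on the previous in-sample window and then held over the next out-of-sample block. The design rule, data, and window lengths below are hypothetical stand-ins; numpy is assumed.

```python
import numpy as np

def rolling_backtest(returns, design_fn, lookback, rebalance_every=21):
    """Sliding-window backtest: re-estimate the portfolio on the last
    `lookback` in-sample returns, then record the out-of-sample PnL
    over the following block of `rebalance_every` periods."""
    T, N = returns.shape
    pnl = []
    t = lookback
    while t < T:
        w = design_fn(returns[t - lookback:t])   # in-sample design only
        block = returns[t:t + rebalance_every]   # strictly out-of-sample
        pnl.extend(block @ w)
        t += rebalance_every
    return np.array(pnl)

def ewp(train):
    """Equally weighted design rule; any rule with the same signature
    (in-sample returns -> weights) could be plugged in instead."""
    return np.ones(train.shape[1]) / train.shape[1]

rng = np.random.default_rng(0)
rets = rng.normal(0.0005, 0.01, size=(500, 5))   # synthetic daily returns
oos_pnl = rolling_backtest(rets, ewp, lookback=252)
```

The key discipline the loop enforces is that the design at time t never sees returns at or after t; collapsing that boundary is exactly the look-ahead mistake the next paragraphs warn about.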

Remark 5 (Stress-Testing in Finance). In finance, stress-testing is used to test the resilience and robustness of institutions and investment portfolios against possible future financial situations like a financial crisis. It can be carried out on historical data from special periods or using data simulated for a testing scenario. Such testing is customarily used by the financial industry to help gauge investment risk and the adequacy of assets, as well as to help evaluate internal processes and controls. In recent years, regulators have also required financial institutions to carry out stress-tests to ensure their capital holdings and other assets are adequate.

It is always easy to make mistakes when it comes to backtesting. Two of the most common mistakes are reusing the test data too many times (which then effectively transforms the test data into cross-validation data and invalidates any testing results) and survivorship bias in the data (this refers to the fact that stocks that went bankrupt are typically removed from datasets, which totally biases the results). It is important to emphasize that the in-sample performance is not an accurate assessment due to overfitting. That is, it is often possible to find a strategy that would have worked well in the past but will not work well in the future. For example, the mean-variance frontier that one expects to get, as measured on the in-sample data, will get distorted in the test set.

X. CONCLUSIONS AND DIRECTIONS FOR FURTHER DEVELOPMENT

In this article, we started from signal modeling in finance and then provided an overview of a wide variety of portfolio optimization problems, touching on many different perspectives of portfolio optimization. Starting from Markowitz's seminal paper, portfolio optimization has been a focus of the financial asset management industry. Although Markowitz's model has been widely criticized, the mean-variance idea is still the backbone of the vast majority of portfolio optimization frameworks and is still largely used in practice, especially in FinTech companies as part of their robo-advisory systems.

Data science and data-driven research have been very hot topics in signal processing. We believe there will be more and more applications stemming from the combination of the signal processing and financial engineering fields. Portfolio optimization is just one direction in financial engineering; there are other topics to be explored based on signal processing methods. Possible directions include leveraging massive data for combining or fusing different alphas, incorporating news and other information into the data modeling, temporal causal modeling and high-dimensional modeling for financial data, deep learning methods for data exploration and portfolio design, etc. We hope that this introductory article serves as a good starting point for readers to apply signal processing theory and methods in financial engineering applications.

XI. ACKNOWLEDGMENTS

This work was supported by the Hong Kong RGC 16208917 research grant. The work of Ziping Zhao was supported by the Hong Kong PhD Fellowship Scheme (HKPFS).

XII. AUTHORS

Ziping Zhao ([email protected]) received the B.Eng. degree in Electronics and Information Engineering (with Highest Honors) from the Huazhong University of Science and Technology (HUST), Wuhan, China, in 2014. Currently, he is pursuing the Ph.D. degree with the Department of Electronic and Computer Engineering at the Hong Kong University of Science and Technology (HKUST), Hong Kong. He was an awardee of the Hong Kong PhD Fellowship Scheme (HKPFS). His research interests are in optimization, machine


learning, deep learning, and statistical signal processing methods, with applications in data analytics, financial engineering, and communication networks.

Rui Zhou ([email protected]) received the B.Eng. degree in information engineering from Southeast University, Nanjing, China, in 2017. He is currently working toward the Ph.D. degree with the Department of Electronic and Computer Engineering, Hong Kong University of Science and Technology (HKUST), Hong Kong. His research interests include optimization theory and fast algorithms, with applications in financial engineering, machine learning, and signal processing.

Daniel P. Palomar ([email protected]) received the Electrical Engineering and Ph.D. degrees from the Technical University of Catalonia (UPC), Barcelona, Spain, in 1998 and 2003, respectively, and was a Fulbright Scholar at Princeton University.

He is a Professor in the Department of Electronic and Computer Engineering at the Hong Kong University of Science and Technology (HKUST), Hong Kong, which he joined in 2006. Since 2013, he has been a Fellow of the Institute for Advanced Study (IAS) at HKUST. He had previously held several research appointments, namely, at King's College London (KCL), London, UK; Stanford University, Stanford, CA; the Telecommunications Technological Center of Catalonia (CTTC), Barcelona, Spain; the Royal Institute of Technology (KTH), Stockholm, Sweden; the University of Rome "La Sapienza", Rome, Italy; and Princeton University, Princeton, NJ. His current research interests include applications of convex optimization theory and signal processing to financial systems, big data systems, and communication systems.

Dr. Palomar is an IEEE Fellow, a recipient of a 2004/06 Fulbright Research Fellowship, the 2004 and 2015 (co-author) Young Author Best Paper Awards by the IEEE Signal Processing Society, the 2015-16 HKUST Excellence Research Award, the 2002/03 best Ph.D. prize in Information Technologies and Communications by the Technical University of Catalonia (UPC), the 2002/03 Rosina Ribalta first prize for the Best Doctoral Thesis in Information Technologies and Communications by the Epson Foundation, and the 2004 prize for the best Doctoral Thesis in Advanced Mobile Communications by the Vodafone Foundation and COIT.

He has been a Guest Editor of the IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING 2016 Special Issue on "Financial Signal Processing and Machine Learning for Electronic Trading", an Associate Editor of IEEE TRANSACTIONS ON INFORMATION THEORY and of IEEE TRANSACTIONS ON SIGNAL PROCESSING, and a Guest Editor of the IEEE SIGNAL PROCESSING MAGAZINE 2010 Special Issue on "Convex Optimization for Signal Processing," the IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS 2008 Special Issue on "Game Theory in Communication Systems," and the IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS 2007 Special Issue on "Optimization of MIMO Transceivers for Realistic Communication Networks."

Yiyong Feng ([email protected]) received a B.Eng. degree in Electronic and Information Engineering from the Huazhong University of Science and Technology (HUST), Wuhan, China, in 2010 and a Ph.D. degree from the Hong Kong University of Science and Technology (HKUST), Hong Kong, in August 2015. Dr. Feng is a quantitative research analyst with Three Stones Capital Ltd, Hong Kong, where he works on developing portfolios and automated quant trading strategies using sophisticated statistical signal processing, optimization, and machine learning techniques. From September 2015 to March 2016 and from March 2013 to August 2013, he was with the Systematic Market-Making Group at Credit Suisse (Hong Kong).

REFERENCES

[1] H. M. Markowitz, "Portfolio selection," The Journal of Finance, vol. 7, no. 1, pp. 77–91, 1952.

[2] I. Pollak, M. Avellaneda, E. Bacry, R. Cont, and S. Kulkarni, "Improving the visibility of financial applications among signal processing researchers [from the guest editors]," IEEE Signal Processing Magazine, vol. 28, no. 5, pp. 14–15, 2011.

[3] A. N. Akansu, S. R. Kulkarni, M. M. Avellaneda, and A. R. Barron, IEEE Journal of Selected Topics in Signal Processing (Special Issue on Signal Processing Methods in Finance and Electronic Trading), vol. 6, Aug. 2012.

[4] A. N. Akansu, D. Malioutov, D. P. Palomar, E. Jay, and D. P. Mandic, IEEE Journal of Selected Topics in Signal Processing (Special Issue on Financial Signal Processing and Machine Learning for Electronic Trading), vol. 10, Sept. 2016.

[5] H. Van Trees, Optimum Array Processing. New York: Wiley, 2002.

[6] W. F. Sharpe, "Capital asset prices: A theory of market equilibrium under conditions of risk," The Journal of Finance, vol. 19, no. 3, pp. 425–442, 1964.

[7] W. F. Sharpe, Investors and Markets: Portfolio Choices, Asset Prices and Investment Advice. Princeton University Press, 2011.

[8] J. Capon, "High-resolution frequency-wavenumber spectrum analysis," Proceedings of the IEEE, vol. 57, no. 8, pp. 1408–1418, 1969.

[9] O. Ledoit and M. Wolf, "A well-conditioned estimator for large-dimensional covariance matrices," Journal of Multivariate Analysis, vol. 88, no. 2, pp. 365–411, 2004.

[10] J. Bun, J.-P. Bouchaud, and M. Potters, "Cleaning correlation matrices," Risk Management, 2006.

[11] F. Rubio, X. Mestre, and D. P. Palomar, "Performance analysis and optimal selection of large minimum-variance portfolios under estimation risk," IEEE Journal of Selected Topics in Signal Processing, vol. 6, pp. 337–350, Aug. 2012.

[12] Y. Chen and Y. Chi, "Harnessing structures in big data via guaranteed low-rank matrix estimation: Recent theory and fast algorithms via convex and nonconvex optimization," IEEE Signal Processing Magazine, vol. 35, no. 4, 2018.

[13] G. Ganesan, "A subspace approach to portfolio analysis," IEEE Signal Processing Magazine, vol. 28, pp. 49–60, Sept. 2011.

[14] E. J. Candes and M. B. Wakin, "An introduction to compressive sampling," IEEE Signal Processing Magazine, vol. 25, pp. 21–30, Mar. 2008.

[15] A. M. Zoubir, V. Koivunen, Y. Chakhchoukh, and M. Muma, "Robust estimation in signal processing: A tutorial-style treatment of fundamental concepts," IEEE Signal Processing Magazine, vol. 29, pp. 61–80, July 2012.

[16] J. B. Ramsey, "Wavelets in economics and finance: Past and future," Studies in Nonlinear Dynamics & Econometrics, 2002.

[17] R. J. Little and D. B. Rubin, Statistical Analysis with Missing Data, vol. 793. Wiley, 2019.

[18] R. O. Michaud and R. O. Michaud, Efficient Asset Management: A Practical Guide to Stock Portfolio Optimization and Asset Allocation. Oxford University Press, 2008.

[19] D. I. Shuman, S. K. Narang, P. Frossard, A. Ortega, and P. Vandergheynst, "The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains," IEEE Signal Processing Magazine, vol. 30, no. 3, pp. 83–98, 2013.

[20] G. Marti, F. Nielsen, M. Binkowski, and P. Donnat, "A review of two decades of correlations, hierarchies, networks and clustering in financial markets," arXiv preprint arXiv:1703.00485, 2017.

[21] T. M. Cover and J. A. Thomas, Elements of Information Theory. John Wiley & Sons, 2012.


[22] R. Bell and T. M. Cover, "Game-theoretic optimal portfolios," Management Science, vol. 34, no. 6, pp. 724–733, 1988.

[23] A. Creswell, T. White, V. Dumoulin, K. Arulkumaran, B. Sengupta, and A. A. Bharath, "Generative adversarial networks: An overview," IEEE Signal Processing Magazine, vol. 35, pp. 53–65, Jan. 2018.

[24] K. Arulkumaran, M. P. Deisenroth, M. Brundage, and A. A. Bharath, "Deep reinforcement learning: A brief survey," IEEE Signal Processing Magazine, vol. 34, pp. 26–38, Nov. 2017.

[25] D. Yu and L. Deng, "Deep learning and its applications to signal and information processing [exploratory DSP]," IEEE Signal Processing Magazine, vol. 28, pp. 145–154, Jan. 2011.

[26] A. Meucci, Risk and Asset Allocation. Springer Science & Business Media, 2009.

[27] J. C. Hull, Options, Futures, and Other Derivatives. New York, NY, USA: Pearson, 9th ed., 2015.

[28] D. Ruppert, Statistics and Data Analysis for Financial Engineering, vol. 13. Springer, 2011.

[29] R. Cont, "Empirical properties of asset returns: Stylized facts and statistical issues," Quantitative Finance, vol. 1, no. 2, pp. 223–236, 2001.

[30] R. S. Tsay, Analysis of Financial Time Series, vol. 543. John Wiley & Sons, 2005.

[31] H. Lütkepohl, New Introduction to Multiple Time Series Analysis. Springer, 2007.

[32] Y. Feng and D. P. Palomar, "A signal processing perspective on financial engineering," Foundations and Trends® in Signal Processing, 2016.

[33] E. F. Fama and K. R. French, "Common risk factors in the returns on stocks and bonds," Journal of Financial Economics, vol. 33, no. 1, pp. 3–56, 1993.

[34] R. F. Engle and C. W. Granger, "Co-integration and error correction: Representation, estimation, and testing," Econometrica, pp. 251–276, 1987.

[35] B. B. Mandelbrot, "The variation of certain speculative prices," The Journal of Business, 1963.

[36] Y. Feng and D. P. Palomar, "SCRIP: Successive convex optimization methods for risk parity portfolio design," IEEE Transactions on Signal Processing, vol. 63, pp. 5285–5300, Oct. 2015.

[37] C. Wells, The Kalman Filter in Finance. Springer Science & Business Media, 2013.

[38] S. Lohr, "For big-data scientists, 'janitor work' is key hurdle to insights," The New York Times, 2014.

[39] J. Liu, S. Kumar, and D. P. Palomar, "Parameter estimation of heavy-tailed AR model with missing data via stochastic EM," IEEE Transactions on Signal Processing, vol. 67, pp. 2159–2172, Apr. 2019.

[40] J. Ding, V. Tarokh, and Y. Yang, "Model selection techniques: An overview," IEEE Signal Processing Magazine, 2018.

[41] R. Duchin and H. Levy, "Markowitz versus the Talmudic portfolio diversification strategies," Journal of Portfolio Management, vol. 35, no. 2, p. 71, 2009.

[42] V. DeMiguel, L. Garlappi, and R. Uppal, "Optimal versus naive diversification: How inefficient is the 1/N portfolio strategy?," The Review of Financial Studies, 2007.

[43] R. A. Haugen and N. L. Baker, "The efficient market inefficiency of capitalization-weighted stock portfolios," Journal of Portfolio Management, 1991.

[44] H. M. Markowitz, Portfolio Selection: Efficient Diversification of Investments, vol. 16. John Wiley & Sons, 1959.

[45] W. F. Sharpe, "Mutual fund performance," The Journal of Business, vol. 39, no. 1, pp. 119–138, 1966.

[46] W. F. Sharpe, "The Sharpe ratio," The Journal of Portfolio Management, vol. 21, no. 1, pp. 49–58, 1994.

[47] R. O. Michaud, "The Markowitz optimization enigma: Is 'optimized' optimal?," Financial Analysts Journal, vol. 45, no. 1, pp. 31–42, 1989.

[48] V. Chopra and W. Ziemba, "The effect of errors in means, variances and covariances on optimal portfolio choice," Journal of Portfolio Management, 1993.

[49] F. J. Fabozzi, P. N. Kolm, D. A. Pachamanova, and S. M. Focardi, Robust Portfolio Optimization and Management. John Wiley & Sons, 2007.

[50] A. Ben-Tal, L. El Ghaoui, and A. Nemirovski, Robust Optimization, vol. 28. Princeton University Press, 2009.

[51] L. Eeckhoudt, C. Gollier, and H. Schlesinger, Economic and Financial Decisions under Risk. Princeton University Press, 2005.

[52] E. Qian, "Risk parity portfolios: Efficient portfolios through true diversification," PanAgora Asset Management, 2005.

[53] E. E. Qian, "On the financial interpretation of risk contribution: Risk budgets do add up," Journal of Investment Management, vol. 4, no. 4, p. 41, 2006.

[54] G. Cornuéjols and R. Tütüncü, Optimization Methods in Finance, vol. 5. Cambridge University Press, Dec. 2006.

[55] S. T. Rachev, Handbook of Heavy Tailed Distributions in Finance: Handbooks in Finance, vol. 1. Elsevier, 2003.

[56] K. Aas and I. H. Haff, "The generalized hyperbolic skew Student's t-distribution," Journal of Financial Econometrics, vol. 4, no. 2, pp. 275–309, 2006.

[57] S. Foss, D. Korshunov, S. Zachary, et al., An Introduction to Heavy-Tailed and Subexponential Distributions, vol. 6. Springer, 2011.

[58] P. J. Huber, Robust Statistics. Springer Berlin Heidelberg, 2011.

[59] F. Trojani and P. Vanini, "A note on robustness in Merton's model of intertemporal consumption and portfolio choice," Journal of Economic Dynamics and Control, vol. 26, no. 3, pp. 423–435, 2002.

[60] V. DeMiguel and F. J. Nogales, "Portfolio selection with robust estimation," Operations Research, vol. 57, no. 3, pp. 560–577, 2009.

[61] D. E. Tyler, "A distribution-free M-estimator of multivariate scatter," The Annals of Statistics, pp. 234–251, 1987.

[62] D. E. Tyler, "Statistical analysis for the angular central Gaussian distribution on the sphere," Biometrika, pp. 579–589, 1987.

[63] A. Wiesel, "Unified framework to regularized covariance estimation in scaled Gaussian models," IEEE Transactions on Signal Processing, vol. 60, no. 1, pp. 29–38, 2012.

[64] Y. Sun, P. Babu, and D. P. Palomar, "Regularized Tyler's scatter estimator: Existence, uniqueness, and algorithms," IEEE Transactions on Signal Processing, vol. 62, pp. 5143–5156, July 2014.

[65] Y. Sun, P. Babu, and D. P. Palomar, "Regularized robust estimation of mean and covariance matrix under heavy-tailed distributions," IEEE Transactions on Signal Processing, vol. 63, pp. 3096–3109, June 2015.

[66] W. Hu and A. N. Kercheval, "The skewed t," in Econometrics and Risk Management, pp. 55–83, Emerald Group Publishing Limited, 2008.

[67] W. James and C. Stein, "Estimation with quadratic loss," in Breakthroughs in Statistics, Springer, 1992.

[68] H. M. Markowitz and G. P. Todd, Mean-Variance Analysis in Portfolio Choice and Capital Markets, vol. 66. John Wiley & Sons, 2000.

[69] P. Jorion, "Bayes-Stein estimation for portfolio analysis," Journal of Financial and Quantitative Analysis, 1986.

[70] F. Black and R. Litterman, "Asset allocation: Combining investor views with market equilibrium," Journal of Fixed Income, 1991.

[71] J. F. Bonnans and A. Shapiro, Perturbation Analysis of Optimization Problems. Springer Science & Business Media, 2013.

[72] L. El Ghaoui and H. Lebret, "Robust solutions to least-squares problems with uncertain data," SIAM Journal on Matrix Analysis and Applications, vol. 18, no. 4, pp. 1035–1064, 1997.

[73] A. Ben-Tal and A. Nemirovski, "Robust convex optimization," Mathematics of Operations Research, vol. 23, no. 4, pp. 769–805, 1998.

[74] L. El Ghaoui, F. Oustry, and H. Lebret, "Robust solutions to uncertain semidefinite programs," SIAM Journal on Optimization, vol. 9, no. 1, pp. 33–52, 1998.

[75] A. Ben-Tal and A. Nemirovski, "Robust solutions of uncertain linear programs," Operations Research Letters, vol. 25, no. 1, pp. 1–13, 1999.

[76] L. E. Ghaoui, M. Oks, and F. Oustry, "Worst-case value-at-risk and robust portfolio optimization: A conic programming approach," Operations Research, vol. 51, no. 4, pp. 543–556, 2003.

[77] M. S. Lobo, Robust and Convex Optimization with Applications in Finance. PhD thesis, Stanford University, 2000.

[78] R. H. Tütüncü and M. Koenig, "Robust asset allocation," Annals of Operations Research, 2004.

[79] D. Goldfarb and G. Iyengar, "Robust portfolio selection problems," Mathematics of Operations Research, vol. 28, no. 1, pp. 1–38, 2003.

[80] B. V. Halldórsson and R. H. Tütüncü, "An interior-point method for a class of saddle-point problems," Journal of Optimization Theory and Applications, 2003.

[81] S. Ceria and R. A. Stubbs, "Incorporating estimation errors into portfolio selection: Robust portfolio construction," Journal of Asset Management, 2006.

[82] I. Popescu, "Robust mean-covariance solutions for stochastic optimization," Operations Research, 2007.

[83] E. Delage and Y. Ye, "Distributionally robust optimization under moment uncertainty with application to data-driven problems," Operations Research, 2010.

[84] V. Gabrel, C. Murat, and A. Thiele, "Recent advances in robust optimization: An overview," European Journal of Operational Research, 2014.


[85] J. H. Kim, W. C. Kim, and F. J. Fabozzi, "Recent advancements in robust optimization for investment management," Annals of Operations Research, 2018.

[86] R. Jagannathan and T. Ma, "Risk reduction in large portfolios: Why imposing the wrong constraints helps," The Journal of Finance, vol. 58, no. 4, pp. 1651–1683, 2003.

[87] V. DeMiguel, L. Garlappi, F. Nogales, and R. Uppal, "A generalized approach to portfolio optimization: Improving performance by constraining portfolio norms," Management Science, 2008.

[88] J. Brodie, I. Daubechies, C. De Mol, D. Giannone, and I. Loris, "Sparse and stable Markowitz portfolios," Proceedings of the National Academy of Sciences, 2009.

[89] J. Fan, J. Zhang, and K. Yu, "Vast portfolio selection with gross-exposure constraints," Journal of the American Statistical Association, vol. 107, no. 498, pp. 592–606, 2012.

[90] F. Caccioli, S. Still, M. Marsili, and I. Kondor, "Optimal liquidation strategies regularize portfolio selection," The European Journal of Finance, 2013.

[91] C. Chen, X. Li, C. Tolman, S. Wang, and Y. Ye, "Sparse portfolio selection via quasi-norm regularization," arXiv preprint arXiv:1312.6350, 2013.

[92] B. Fastrich, S. Paterlini, and P. Winker, "Constructing optimal sparse portfolios using regularization methods," Computational Management Science, 2015.

[93] A. J. McNeil, R. Frey, P. Embrechts, et al., Quantitative Risk Management: Concepts, Techniques and Tools. Princeton University Press, 2005.

[94] R. T. Rockafellar, S. P. Uryasev, and M. Zabarankin, "Deviation measures in risk analysis and optimization," University of Florida, Department of Industrial & Systems Engineering Working Paper, 2002.

[95] J. Estrada, "Mean-semivariance optimization: A heuristic approach," Journal of Applied Finance (Formerly Financial Practice and Education), vol. 18, no. 1, 2008.

[96] J. Longerstaey and M. Spencer, "RiskMetrics Technical Document," tech. rep., 1996.

[97] P. Artzner, F. Delbaen, J. M. Eber, and D. Heath, "Coherent measures of risk," Mathematical Finance, 1999.

[98] A. Ang, J. Chen, and Y. Xing, "Downside risk," The Review of Financial Studies, vol. 19, no. 4, pp. 1191–1239, 2006.

[99] R. T. Rockafellar and S. Uryasev, "Optimization of conditional value-at-risk," The Journal of Risk Finance, vol. 2, pp. 21–42, 2000.

[100] A. Chekhlov, S. Uryasev, and M. Zabarankin, "Portfolio optimization with drawdown constraints," in Supply Chain and Finance, 2004.

[101] F. A. Sortino and L. N. Price, "Performance measurement in a downside risk framework," The Journal of Investing, 1994.

[102] L. Garlappi, R. Uppal, and T. Wang, "Portfolio selection with parameter and model uncertainty: A multi-prior approach," The Review of Financial Studies, 2006.

[103] A. G. Quaranta and A. Zaffaroni, "Robust optimization of conditional value at risk and portfolio selection," Journal of Banking & Finance, 2008.

[104] S. Zhu and M. Fukushima, "Worst-case conditional value-at-risk with application to robust portfolio management," Operations Research, 2009.

[105] W. Hu and A. N. Kercheval, "Portfolio optimization for Student t and skewed t returns," Quantitative Finance, vol. 10, no. 1, pp. 91–105, 2010.

[106] G. Chamberlain, "A characterization of the distributions that imply mean-variance utility functions," Journal of Economic Theory, vol. 29, no. 1, pp. 185–201, 1983.

[107] J. Kelly, "A new interpretation of information rate," IEEE Transactions on Information Theory, vol. 2, no. 3, pp. 185–189, 1956.

[108] H. Levy and H. M. Markowitz, "Approximating expected utility by a function of mean and variance," American Economic Review, 1979.

[109] L. M. Pulley, "A general mean-variance approximation to expected utility for short holding periods," Journal of Financial and Quantitative Analysis, 1981.

[110] Y. Kroll, H. Levy, and H. M. Markowitz, "Mean-variance versus direct utility maximization," The Journal of Finance, vol. 39, no. 1, pp. 47–61, 1984.

[111] J. H. Cremers, M. Kritzman, and S. Page, "Portfolio formation with higher moments and plausible utility," Revere Street Working Paper Series, Financial Economics, 2003.

[112] E. Jondeau, S.-H. Poon, and M. Rockinger, Financial Modeling under Non-Gaussian Distributions. Springer Science & Business Media, 2007.

[113] W. H. Jean, "The extension of portfolio analysis to three or more parameters," Journal of Financial and Quantitative Analysis, 1971.

[114] G. M. De Athayde and R. G. Flôres Jr., "Finding a maximum skewness portfolio-general solution to three-moments portfolio choice," Journal of Economic Dynamics and Control, vol. 28, no. 7, pp. 1335–1352, 2004.

[115] C. R. Harvey, J. C. Liechty, M. W. Liechty, and P. Müller, "Portfolio selection with higher moments," Quantitative Finance, vol. 10, no. 5, pp. 469–485, 2010.

[116] Y. Choueifaty and Y. Coignard, "Toward maximum diversification," Journal of Portfolio Management, 2008.

[117] Y. Choueifaty, T. Froidure, and J. Reynier, "Properties of the most diversified portfolio," Journal of Investment Strategies, 2013.

[118] P. Christoffersen, V. Errunza, K. Jacobs, and H. Langlois, "Is the potential for international diversification disappearing? A dynamic copula approach," The Review of Financial Studies, 2012.

[119] R. Fernholz, R. Garvy, and J. Hannon, "Diversity-weighted indexing," Journal of Portfolio Management, vol. 24, no. 2, p. 74, 1998.

[120] C. S. Asness, A. Frazzini, and L. H. Pedersen, "Leverage aversion and risk parity," Financial Analysts Journal, vol. 68, no. 1, pp. 47–59, 2012.

[121] D. E. Johnston and P. M. Djuric, "The science behind risk management," IEEE Signal Processing Magazine, vol. 28, pp. 26–36, Sept. 2011.

[122] S. Maillard, T. Roncalli, and J. Teïletche, "On the properties of equally-weighted risk contributions portfolios," Journal of Portfolio Management, vol. 36, no. 4, pp. 60–70, 2010.

[123] H. Kaya and W. Lee, "Demystifying risk parity," Neuberger Berman, 2012.

[124] F. Spinu, "An algorithm for computing risk parity weights," Available at SSRN 2297383, 2013.

[125] T. Griveau-Billion, J.-C. Richard, and T. Roncalli, "A fast algorithm for computing high-dimensional risk parity portfolios," Available at SSRN 2325255, 2013.

[126] X. Bai, K. Scheinberg, and R. Tutuncu, "Least-squares approach to risk parity in portfolio selection," Quantitative Finance, vol. 16, no. 3, pp. 357–376, 2016.

[127] T. Roncalli, Introduction to Risk Parity and Budgeting. CRC Press, 2013.

[128] D. B. Chaves, J. C. Hsu, F. Li, and O. Shakernia, "Efficient algorithms for computing risk parity portfolio weights," Journal of Investing, vol. 21, pp. 150–163, 2012.

[129] X. Bai and K. Scheinberg, "Alternating direction methods for nonconvex optimization with applications to second-order least-squares and risk parity portfolio selection," Available at optimization-online, 2015.

[130] T. Roncalli and G. Weisang, "Risk parity portfolios with risk factors," Quantitative Finance, vol. 16, no. 3, pp. 377–388, 2016.

[131] W. F. Sharpe, "The arithmetic of active management," Financial Analysts Journal, pp. 7–9, 1991.

[132] B. G. Malkiel, "Passive investment strategies and efficient markets," European Financial Management, vol. 9, no. 1, pp. 1–10, 2003.

[133] B. M. Barber and T. Odean, "Trading is hazardous to your wealth: The common stock investment performance of individual investors," The Journal of Finance, vol. 55, no. 2, pp. 773–806, 2000.

[134] K. J. Oh, T. Y. Kim, and S. Min, "Using genetic algorithm to support portfolio optimization for index fund management," Expert Systems with Applications, vol. 28, no. 2, pp. 371–379, 2005.

[135] R. Jansen and R. Van Dijk, "Optimal benchmark tracking with small portfolios," The Journal of Portfolio Management, vol. 28, no. 2, pp. 33–39, 2002.

[136] K. Benidis, Y. Feng, and D. P. Palomar, "Sparse portfolios for high-dimensional financial index tracking," IEEE Transactions on Signal Processing, vol. 66, no. 1, pp. 155–170, 2017.

[137] K. Benidis, Y. Feng, and D. P. Palomar, "Optimization methods for financial index tracking: From theory to practice," Foundations and Trends® in Optimization, vol. 3, no. 3, pp. 171–279, 2018.

[138] N. A. Canakgoz and J. E. Beasley, "Mixed-integer programming approaches for index tracking and enhanced indexation," European Journal of Operational Research, vol. 196, no. 1, pp. 384–399, 2009.

[139] C. Dose and S. Cincotti, "Clustering of financial time series with application to index and enhanced index tracking portfolio," Physica A: Statistical Mechanics and its Applications, vol. 355, no. 1, pp. 145–151, 2005.

[140] G. Vidyamurthy, Pairs Trading: Quantitative Methods and Analysis. John Wiley & Sons, 2004.

[141] D. S. Ehrman, The Handbook of Pairs Trading: Strategies Using Equities, Options, and Futures. John Wiley & Sons, 2006.

[142] E. Gatev, W. N. Goetzmann, and K. G. Rouwenhorst, "Pairs trading: Performance of a relative-value arbitrage rule," Review of Financial Studies, vol. 19, no. 3, pp. 797–827, 2006.


[143] A. Pole, Statistical Arbitrage: Algorithmic Trading Insights and Techniques. John Wiley & Sons, 2011.

[144] B. I. Jacobs and K. N. Levy, Market Neutral Strategies. John Wiley & Sons, 2005.

[145] J. G. Nicholas, Market Neutral Investing: Long/Short Hedge Fund Strategies. Bloomberg Press, 2000.

[146] M. Cuturi and A. d'Aspremont, "Mean-reverting portfolios," in Financial Signal Processing and Machine Learning (A. N. Akansu, S. R. Kulkarni, and D. M. Malioutov, eds.), ch. 3, pp. 23–40, John Wiley & Sons, 2016.

[147] Z. Zhao and D. P. Palomar, "Mean-reverting portfolio with budget constraint," IEEE Transactions on Signal Processing, vol. 66, pp. 2342–2357, Jan. 2018.

[148] Z. Zhao, R. Zhou, and D. P. Palomar, "Optimal mean-reverting portfolio with leverage constraint for statistical arbitrage in finance," IEEE Transactions on Signal Processing, vol. 67, pp. 1681–1695, Apr. 2019.

[149] M. L. De Prado, Advances in Financial Machine Learning. John Wiley & Sons, 2018.

[150] G. Y. Ban, N. El Karoui, and A. E. Lim, "Machine learning and portfolio optimization," Management Science, vol. 64, no. 3, pp. 1136–1154, 2016.

[151] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, vol. 1. MIT Press, 2016.

[152] J. Moody and M. Saffell, "Learning to trade via direct reinforcement," IEEE Transactions on Neural Networks, vol. 12, no. 4, pp. 875–889, 2001.

[153] R. C. Merton, "Lifetime portfolio selection under uncertainty: The continuous-time case," The Review of Economics and Statistics, 1969.

[154] J. Y. Campbell and L. M. Viceira, Strategic Asset Allocation: Portfolio Choice for Long-Term Investors. Clarendon Lectures in Economics, 2002.

[155] J. Dupacova, "Portfolio optimization via stochastic programming: Methods of output analysis," Mathematical Methods of Operations Research, 1999.

[156] Y. Feng, D. P. Palomar, and F. Rubio, "Robust optimization of order execution," IEEE Transactions on Signal Processing, vol. 63, pp. 907–920, Feb. 2015.

[157] T. Björk, A. Murgoci, and X. Y. Zhou, "Mean-variance portfolio optimization with state-dependent risk aversion," Mathematical Finance: An International Journal of Mathematics, Statistics and Financial Economics, 2014.

[158] D. Bertsimas and D. Pachamanova, "Robust multiperiod portfolio management in the presence of transaction costs," Computers & Operations Research, 2008.

[159] S. Boyd, E. Busseti, S. Diamond, R. N. Kahn, K. Koh, P. Nystrup, and J. Speth, Multi-Period Trading via Convex Optimization. Now Publishers, 2017.

[160] M. W. Savelsbergh, R. A. Stubbs, and D. Vandenbussche, "Multi-portfolio optimization: A natural next step," in Handbook of Portfolio Construction, pp. 565–581, Springer, 2010.

[161] Y. Yang, F. Rubio, G. Scutari, and D. P. Palomar, "Multi-portfolio optimization: A potential game approach," IEEE Transactions on Signal Processing, vol. 61, pp. 5590–5602, Nov. 2013.

[162] D. J. Brophy and M. W. Guthner, "Publicly traded venture capital funds: Implications for institutional 'fund of funds' investors," Journal of Business Venturing, 1988.

[163] S. Brands and D. R. Gallagher, "Portfolio selection, diversification and fund-of-funds: A note," Accounting & Finance, 2005.

[164] T. Conlon, H. J. Ruskin, and M. Crane, "Random matrix theory and fund of funds portfolio optimisation," Physica A: Statistical Mechanics and its Applications, 2007.

[165] C. Valle, N. Meade, and J. Beasley, "Absolute return portfolios," Omega, 2014.

[166] T. M. Cover, "Universal portfolios," in The Kelly Capital Growth Investment Criterion: Theory and Practice, pp. 181–209, World Scientific, 2011.

[167] G. C. Philippatos and C. J. Wilson, "Entropy, market risk, and the selection of efficient portfolios," Applied Economics, vol. 4, no. 3, pp. 209–220, 1972.

