Institute of Astronomy, University of Cambridge, Madingley ...

MNRAS 000, 1–15 (2021) Preprint 5 October 2021 Compiled using MNRAS LATEX style file v3.0

Precision in high resolution absorption line modelling, analytic Voigt derivatives, and optimisation methods.

John K. Webb1★, Robert F. Carswell2†, Chung-Chi Lee1‡

1 Clare Hall, University of Cambridge, Herschel Rd, Cambridge CB3 9AL.

2 Institute of Astronomy, University of Cambridge, Madingley Road, Cambridge, CB3 0HA, UK.

Accepted . Received ; in original form

ABSTRACT

This paper describes the optimisation theory on which vpfit, a non-linear least-squares program for modelling absorption spectra, is based. Particular attention is paid to precision. Voigt function derivatives have previously been calculated using numerical finite difference approximations. We show how these can instead be computed analytically using Taylor series expansions and look-up tables. We introduce a new optimisation method for an efficient descent path to the best-fit, combining the principles used in both the Gauss-Newton and Levenberg-Marquardt algorithms. A simple practical fix is described for ill-conditioning, a common problem when modelling quasar absorption systems. We also summarise how unbiased modelling depends on using an appropriate information criterion to guard against over- or under-fitting.

The methods and the new implementations introduced in this paper are aimed at optimal usage of future data from facilities such as ESPRESSO/VLT and HIRES/ELT, particularly for the most demanding applications such as searches for spacetime variations in fundamental constants and attempts to detect cosmological redshift drift.

Key words: quasars: absorption lines, cosmology: observations, methods: data analysis

1 INTRODUCTION

vpfit (Carswell & Webb 2014) is an optimisation code designed primarily for the analysis of high resolution quasar spectra. However, it has been used in a number of other applications, including the interstellar medium, absorption lines in stellar photospheres, and emission line fitting. vpfit has been cited in more than 300 papers¹. A comprehensive vpfit user guide is given in Carswell & Webb (2020). The mathematical details for a fledgling version of vpfit were provided in Webb (1987) but have not been reported in the peer-reviewed literature.

This paper first summarises the theoretical basis for vpfit and then describes new enhancements and inclusions that improve accuracy and stability, and provide quantitative information about systematic uncertainties. Modifications of particular note are (i) new analytic calculations of Voigt function derivatives (previously finite difference derivatives were used) and analytic derivatives for all other (non-Voigt) parameters, (ii) addition of distortion parameters inside the non-linear least squares processes, (iii) enhanced resolution and interpolation in the Voigt function look-up tables, and (iv) the hybridisation of Gauss-Newton and Levenberg-Marquardt to form a unified new optimisation method.

★ [email protected]

† [email protected]

‡ [email protected]

¹ ADS Abstracts citations since 1995; actual usage is higher.

The new modifications are in part motivated by the recent application of Artificial Intelligence methods (Bainbridge & Webb 2017a,b; Lee et al. 2021b) and Information Criterion techniques (Webb et al. 2021) to spectroscopy. The advances reported here facilitate optimal analyses of future high signal to noise and high calibration precision data achievable with new and forthcoming spectroscopic facilities², especially future challenges such as redshift drift and searches for varying fundamental constants.

High precision in computing the Voigt function and profile is paramount. Sections 2 and 3 discuss precision and also introduce a modified optimisation method, merging the two different approaches used in the Gauss-Newton and Levenberg-Marquardt techniques.

² Notably, the Echelle SPectrograph for Rocky Exoplanets and Stable Spectroscopic Observations (ESPRESSO) on the European Southern Observatory’s Very Large Telescope (VLT) (Pepe et al. 2021) and the High Resolution Echelle Spectrograph (HIRES) on the forthcoming Extremely Large Telescope (ELT), e.g. Marconi et al. (2016); Tamai et al. (2018).

© 2021 The Authors

arXiv:2108.11218v2 [astro-ph.IM] 4 Oct 2021

Accurate derivatives of the Voigt profile are also essential for optimal, robust $\chi^2$ descent and unbiased parameter estimation. Previous computations (in vpfit and, as far as we know, in other codes) have either been based on finite difference approximations, or have made use of analytic approximations (which lack accuracy or are computationally demanding or both). Finite difference derivatives (fdd) can work well but have two important disadvantages: (i) fdd intervals need to be chosen according to the characteristics of the data being modelled, and (ii) the chosen intervals may in fact not be appropriate for all absorption components within a complex comprising many absorption lines (since line parameters and blending vary substantially).

In this paper we introduce a new approach for calculating Voigt derivatives that entirely eliminates such difficulties. We use Taylor series expansions of the derivatives of the Voigt function, with look-up tables. This method is analogous to that of Harris (1948) applied to the Voigt function itself, except now the idea is applied directly to analytic derivatives of the Voigt function. Section 4 describes this and makes use of the derivative convolution theorem to allow for instrumental resolution. Both of the problems mentioned above are then avoided, no user decisions are needed, and the new method provides significant precision and some speed improvements compared to finite difference derivatives or analytic approximations. Section 5 summarises the advances described in this work and two appendices discuss several practical aspects of the calculations.

2 NON-LINEAR LEAST-SQUARES MINIMISATION

A comprehensive description of non-linear optimisation methods is given in the excellent book by Gill et al. (1981). The application of such methods to the detection and measurement of stars in crowded fields was described in Irwin (1985). Both of the previous citations were strongly influential in the application of non-linear optimisation to spectroscopy described in this paper.

Non-linear least-squares methods fall into two broad classes: the simpler Gradient methods, using only first order derivatives, and Newton-type methods, which use both first and second derivatives of an objective function. The latter are more powerful since they generally converge faster and are more robust. The rate at which a Newton method converges depends on the form of the function being minimised. The nearer the function is to quadratic, the faster the convergence. If it is exactly quadratic, convergence can be achieved in a single iteration. When residuals are Gaussian, non-linear least squares techniques are equivalent to Maximum Likelihood methods and provide optimal parameter estimates (Charnes et al. 1976). An additional important advantage of Newton-type methods is that reliable parameter error estimates are available at virtually no extra computing effort.

Let the model intensity be $I(\mathbf{x})$, where the vector $\mathbf{x}$ is the set of all free model parameters, and let the $i$th normalised residual between model and data $d_i$ (having uncertainties $\sigma_i$) be

$$ f(\mathbf{x})_i = \frac{I(\mathbf{x})_i - d_i}{\sigma_i} , \tag{1} $$

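where the subscript $i$ is the index in the spectral array at which the observed-frame wavelength is $\lambda_i$. $I(\mathbf{x})$ is the model intensity after convolution with the instrumental resolution $G(\lambda)$ in wavelength space, i.e.

$$ I(\mathbf{x})_i = (I_\nu * G)_i = \frac{\int_{-\infty}^{\infty} I_\nu \, G(\lambda)_i \, d\lambda}{\int_{-\infty}^{\infty} G(\lambda)_i \, d\lambda} , \tag{2} $$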

where $I_\nu$ is calculated from Eq. (A1), expressed as a function of wavelength as defined by Eq. (A11), and $G(\lambda)_i$ is the instrumental profile for pixel $i$. vpfit provides the option of $G$ being either a Gaussian instrumental profile or a user-defined numerical instrumental profile³.

To fit $I(\mathbf{x})$ to the set of $n$ observed data points $d_i$, we want to minimise

$$ F(\mathbf{x}) = \frac{1}{2} \sum_{i=1}^{n} f(\mathbf{x})_i^2 = \frac{1}{2} \mathbf{f}(\mathbf{x})^T \mathbf{f}(\mathbf{x}) , \tag{3} $$

where $T$ denotes transpose and the factor of 1/2 has been included to eliminate an extra factor of 2 in subsequent derivatives of this equation.
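As a concrete illustration of Eqs. (1) and (3), the following minimal Python sketch evaluates the normalised residuals and the objective function for a three-pixel toy spectrum. The function names and all numerical values are illustrative assumptions and are not taken from vpfit.

```python
def normalised_residuals(model, data, sigma):
    """Eq. (1): f_i = (I_i - d_i) / sigma_i for each pixel i."""
    return [(m - d) / s for m, d, s in zip(model, data, sigma)]


def objective(residuals):
    """Eq. (3): F = (1/2) * sum_i f_i^2, i.e. half of chi-squared."""
    return 0.5 * sum(f * f for f in residuals)


# Toy three-pixel example (illustrative numbers, not real data).
model = [1.00, 0.60, 0.95]   # convolved model intensities I_i
data = [0.98, 0.65, 0.95]    # observed intensities d_i
sigma = [0.01, 0.01, 0.01]   # one-sigma uncertainties

f = normalised_residuals(model, data, sigma)
F = objective(f)
chi2 = 2.0 * F               # chi-squared is twice the objective function
```

The residuals here are approximately $2$, $-5$ and $0$ (in units of $\sigma_i$), so $F \approx 14.5$ and $\chi^2 = 2F \approx 29$.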

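To set up the minimisation procedure, we approximate $F(\mathbf{x})$ using a quadratic model, i.e. a Taylor series expansion to second order,

$$ \begin{aligned} F(\mathbf{x} + \mathbf{p}) &\approx F(\mathbf{x}) + \mathbf{p}^T F'(\mathbf{x}) + \frac{1}{2} \mathbf{p}^T F''(\mathbf{x}) \, \mathbf{p} \\ &= F(\mathbf{x}) + \mathbf{p}^T \mathbf{g}(\mathbf{x}) + \frac{1}{2} \mathbf{p}^T \mathbf{G}(\mathbf{x}) \, \mathbf{p} , \end{aligned} \tag{4} $$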

where $\mathbf{p}$ is the predicted parameter update vector that minimises $F(\mathbf{x} + \mathbf{p})$, prime denotes derivative, $\mathbf{g}(\mathbf{x})$ is the gradient vector, and $\mathbf{G}(\mathbf{x})$ is the Hessian matrix. For the $q$th model parameter, the corresponding component of the gradient vector is

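$$ g(\mathbf{x})_q = \frac{\partial F(\mathbf{x})}{\partial x_q} = \sum_{i=1}^{n} \frac{1}{\sigma_i} \frac{\partial I(\mathbf{x})_i}{\partial x_q} \, f(\mathbf{x})_i , \tag{5} $$

or in vector/matrix form,

$$ \mathbf{g}(\mathbf{x}) = \mathbf{J}(\mathbf{x})^T \mathbf{f}(\mathbf{x}) , \tag{6} $$

where $\mathbf{J}(\mathbf{x})$ is the $n \times m$ Jacobian matrix of $\mathbf{f}(\mathbf{x})$ ($n$ is the number of data points, $m$ is the number of free parameters) whose $i$th row is

$$ \nabla f(\mathbf{x})_i = \left( \partial f_i/\partial x_1, \, \partial f_i/\partial x_2, \, \ldots, \, \partial f_i/\partial x_m \right) . \tag{7} $$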

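For any two model parameters $x_q$ and $x_r$, the component of the Hessian matrix is

$$ G(\mathbf{x})_{qr} = \frac{\partial^2 F(\mathbf{x})}{\partial x_q \partial x_r} = \left[ \sum_{i=1}^{n} \frac{1}{\sigma_i} \frac{\partial^2 I(\mathbf{x})_i}{\partial x_q \partial x_r} \, f(\mathbf{x})_i \right] + \left[ \sum_{i=1}^{n} \frac{\partial I(\mathbf{x})_i}{\partial x_q} \frac{\partial I(\mathbf{x})_i}{\partial x_r} \frac{1}{\sigma_i^2} \right] , \tag{8} $$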

³ As required by Hubble Space Telescope spectroscopic data for example, and as is likely to be required for future high-precision spectroscopy with instruments such as ESPRESSO/VLT and HIRES/ELT.


VPFIT theory 3

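or in vector/matrix form,

$$ \mathbf{G}(\mathbf{x}) = \mathbf{Q}(\mathbf{x}) + \mathbf{J}(\mathbf{x})^T \mathbf{J}(\mathbf{x}) , \tag{9} $$

where $\mathbf{Q}(\mathbf{x})$ denotes the first bracketed sum in Eq. (8).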

Consider the first term in Eq. (8). If the model $I(\mathbf{x})$ is a reasonable representation of the data, each $f(\mathbf{x})_i$ may be considered as an independent random variable such that $\langle f(\mathbf{x})_i \rangle \to 0$ as $n \to \infty$. For this reason, the second term in Eq. (8) dominates, so we drop the first term in square brackets above, and the Hessian matrix can be approximated using only first order derivatives,

$$ \mathbf{G}(\mathbf{x}) \approx \mathbf{J}(\mathbf{x})^T \mathbf{J}(\mathbf{x}) . \tag{10} $$

An advantage of approximating the Hessian using only first-order derivatives, apart from simplicity, is that it renders $\mathbf{G}(\mathbf{x})$ positive-definite, thereby guaranteeing a descent direction when solving the matrix equations for optimal parameter updates. To minimise Eq. (4) by the choice of some suitable $\mathbf{p}$, it is convenient to formulate a quadratic function in terms of $\mathbf{p}$, the step to the minimum, rather than the predicted minimum itself. Then at each iteration (i.e. for some particular set of model parameters $\mathbf{x}$ at the current iteration), the optimal parameter updates are found by minimising Eq. (4) with respect to $\mathbf{p}$. Doing so gives

$$ \mathbf{g}(\mathbf{x}) = -\mathbf{G}(\mathbf{x}) \, \mathbf{p}_{\min} . \tag{11} $$

An algorithm in which the search direction is obtained using equations of the form (11) is called a “Newton type method”. The equation used in vpfit is not (11), but instead is a modified form of it, for the reasons discussed shortly in Sections 2.1 to 2.4. For simplicity, from here on we drop the “$(\mathbf{x})$” notation.
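The earlier statement that a Newton-type method converges in a single iteration when $F$ is exactly quadratic can be checked with a short sketch. The example below assumes a hypothetical straight-line model $I_i = x_0 + x_1 \lambda_i$ (chosen only because it makes $F$ exactly quadratic; it is not an absorption line model), builds $\mathbf{g} = \mathbf{J}^T\mathbf{f}$ and $\mathbf{G} \approx \mathbf{J}^T\mathbf{J}$, and solves Eq. (11) directly. All names and numbers are illustrative.

```python
def gradient_and_hessian(x, lam, data, sigma):
    """Return g = J^T f (Eq. 6) and G ~ J^T J (Eq. 10) for the toy model
    I_i = x[0] + x[1] * lam_i (a hypothetical linear model)."""
    m = len(x)
    g = [0.0] * m
    G = [[0.0] * m for _ in range(m)]
    for l, d, s in zip(lam, data, sigma):
        f_i = (x[0] + x[1] * l - d) / s      # normalised residual, Eq. (1)
        J_i = [1.0 / s, l / s]               # ith row of the Jacobian, Eq. (7)
        for q in range(m):
            g[q] += J_i[q] * f_i
            for r in range(m):
                G[q][r] += J_i[q] * J_i[r]
    return g, G


def solve_2x2(A, rhs):
    """Solve a 2x2 linear system A p = rhs by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(A[1][1] * rhs[0] - A[0][1] * rhs[1]) / det,
            (A[0][0] * rhs[1] - A[1][0] * rhs[0]) / det]


lam = [0.0, 1.0, 2.0, 3.0]
data = [1.0, 3.0, 5.0, 7.0]      # noiseless data drawn from 1 + 2 * lam
sigma = [0.1, 0.1, 0.1, 0.1]

x = [10.0, -4.0]                 # deliberately poor starting guess
g, G = gradient_and_hessian(x, lam, data, sigma)
p_min = solve_2x2(G, [-g[0], -g[1]])          # Eq. (11): G p_min = -g
x_new = [x[0] + p_min[0], x[1] + p_min[1]]    # one Newton-type step
```

After this single step `x_new` equals the true parameters (1, 2) to rounding precision. For a Voigt profile model $F$ is not quadratic, which is why the iterative, modified schemes of Sections 2.1 to 2.4 are needed.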

2.1 Gauss-Newton and Levenberg-Marquardt methods

The two best known non-linear minimisation methods are Gauss-Newton (GN) and Levenberg-Marquardt (LM). We have experimented using both individually but also have most recently implemented a hybrid procedure formed from both.

The GN method attempts to improve efficiency by tweaking the search direction at each iteration. After solving Eq. (11) for $\mathbf{p}_{\min}$, an extra univariate minimisation is carried out, to identify that value of $\alpha$ which minimises $F(\mathbf{x} + \alpha \mathbf{p}_{\min})$. The Gauss-Newton parameter updates are then

$$ \mathbf{p}_{GN} = \alpha \, \mathbf{p}_{\min} . \tag{12} $$

The LM method takes a different approach, modifying the Hessian matrix,

$$ \mathbf{g} = -\mathbf{G}_{LM} \, \mathbf{p}_{\min} = -\left( \mathbf{G} + \eta \mathbf{I} \right) \mathbf{p}_{\min} , \tag{13} $$

where $\mathbf{I}$ is an identity matrix and $\eta$ is a non-negative scalar, adjusted iteratively to find the largest reduction in $F(\mathbf{x} + \mathbf{p}_{\min})$. Newton’s method is recovered when $\eta = 0$; when $\eta \mathbf{I} \gg \mathbf{G}$, the search direction becomes parallel to that of the gradient descent method (but step lengths are altered by a factor of $1/\eta$). Improvements to the standard LM process have been proposed for problems in which the number of parameters is large, e.g. Transtrum & Sethna (2012), but we have not explored those particular methods.
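The two limits of Eq. (13) can be verified numerically. The sketch below uses an arbitrary $2 \times 2$ Hessian and gradient (illustrative numbers only) and hypothetical helper names; it is not vpfit code.

```python
def solve_2x2(A, rhs):
    """Solve a 2x2 linear system A p = rhs by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(A[1][1] * rhs[0] - A[0][1] * rhs[1]) / det,
            (A[0][0] * rhs[1] - A[1][0] * rhs[0]) / det]


def lm_step(G, g, eta):
    """Solve Eq. (13), (G + eta * I) p = -g, for the update p."""
    A = [[G[0][0] + eta, G[0][1]],
         [G[1][0], G[1][1] + eta]]
    return solve_2x2(A, [-g[0], -g[1]])


G = [[4.0, 1.0], [1.0, 3.0]]   # toy positive-definite Hessian
g = [1.0, -2.0]                # toy gradient

p_newton = lm_step(G, g, 0.0)          # eta = 0 recovers Eq. (11)

eta = 1.0e8                            # eta * I >> G
p_large = lm_step(G, g, eta)
scaled = [eta * p for p in p_large]    # tends to -g: steepest descent direction
```

With $\eta = 0$ the update is the Newton step $(-5/11, \, 9/11)$. With $\eta = 10^8$ the update is parallel to $-\mathbf{g}$ and its length is reduced by the factor $1/\eta$, so `scaled` is approximately $(-1, \, 2)$.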

2.2 Switching between GN and LM

Empirically, at some points during minimisation, GN can produce a larger step towards convergence, whilst at other points, LM does so. In vpfit version 12.2 and earlier, GN and LM are used in conjunction with each other, based on the Hessian described by Eq. (15); GN and LM parameter updates are computed at every vpfit iteration and the descent direction is taken to be that which produces the largest drop in chi squared, $\Delta F$. In practice, the extra time lost in computing both GN and LM descents is more than offset by a greater convergence efficiency. To assist the following discussion, we call this switching procedure the “GN-LM” method.

2.3 Hybrid Optimisation (HO), ill-conditioning, and Modified Cholesky Factorisation (MCF)

As will be seen shortly, when the GN method is modified to account for ill-conditioning, it is very similar to LM. This, and the previous discussion, suggests what seems to be an obvious point: instead of the GN-LM approach (computing both GN and LM solutions and then choosing the best at each iteration), one could simply unify GN and LM to form a single minimisation technique in the following way: within the same iteration, first use Eq. (13) to solve for $\eta$, then use Eq. (12) to find $\alpha$ (or vice versa). This unification of the GN and LM methods can result in a faster descent than either GN or LM individually and is implemented in vpfit version 12.3. Since GN and LM individually have only one tuning parameter each, the hybrid method has two. To assist the following discussion, we will refer to the new hybrid method as the “Hybrid Optimisation (HO) method”. However, prior to solving for the descent direction using this new HO approach, it is necessary to first deal with dynamic range and ill-conditioning problems.

In principle, the parameter updates $\mathbf{p}_{\min}$ could be obtained by solving Eq. (11) using Cholesky decomposition. However, practical issues arise that require modifications of Eq. (11):

(i) The quadratic model, Eq. (4), is unlikely to be perfect, particularly when far from the best-fit solution;

(ii) $\mathbf{G}$ often has a huge dynamical range. The reason is easy to understand: redshift parameters are generally far more tightly constrained than column density or b-parameters. There are similar considerations for other parameters, such that rounding errors can become important when using Cholesky decomposition to solve for $\mathbf{p}_{\min}$. Empirically, the dynamic range can be $\sim 10^{14}$ or even greater, creating precision difficulties. If left unchecked, the consequence of this can be to render solutions of Eq. (11) unstable;

(iii) Ill-conditioning is often present, again causing stability problems when applying Cholesky decomposition to Eq. (11).

Therefore, the following modifications are carried out. As described in Section 2, the Hessian matrix and gradient vector are formed from derivatives with respect to the free parameters (Eqs. (5) and (8)). The solution implemented in vpfit (prior to applying the HO method) involves two modifications of the Hessian matrix:



(1) Each row/column in the Hessian is normalised such that all diagonal terms are unity,

$$ \mathbf{G}_n = \mathbf{D}^{-1/2} \, \mathbf{G} \, \mathbf{D}^{-1/2} , \tag{14} $$

where the subscript “n” indicates normalised and $\mathbf{D}$ is the diagonal matrix defined by $(\mathbf{D}^{1/2})_{ij} = \sqrt{G_{ii}} \, \delta_{ij}$. This reduces the dynamic range, lessening the impact of possible rounding error problems, and is also useful for ill-conditioning.

(2) We assume ill-conditioning occurs (it frequently does, particularly when line blending inevitably results in some parameters being poorly determined). The Hessian is rendered positive-definite by adding a constant to its diagonal terms. The theoretical basis for this is discussed in e.g. Gill et al. (1981). The solution implemented in vpfit, the second modification of the Hessian, is

$$ \mathbf{G}_{VPFIT} \to \mathbf{G}_n + \eta_n \mathbf{I} , \tag{15} $$

where $\eta_n$ is a tunable non-negative scalar and $\mathbf{I}$ is the identity matrix.

As Eq. (13) shows, MCF and LM are in fact similar. Both add constants to the Hessian diagonals. To maximise the iterative reduction in $F(\mathbf{x} + \mathbf{p})$, MCF (as implemented here) adds the tunable constant to the normalised Hessian whilst LM does the same but using the unnormalised Hessian. MCF (again, as implemented here) doubles the diagonal terms. The advantage (of both approaches) is that the positive-definite Hessian guarantees a descent direction so the process is always stable. The penalty is that in modelling situations that are not inherently ill-conditioned, MCF reduces efficiency such that reaching the minimum requires a slightly larger number of steps. The final matrix equations, i.e. the modification to Eq. (11), therefore become

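$$ \mathbf{g}_n = -\left( \mathbf{G}_n + \eta_n \mathbf{I} \right) \mathbf{p}_{n,\min} , \tag{16} $$

where

$$ \mathbf{g}_n = \mathbf{D}^{-1/2} \mathbf{g} \quad \mathrm{and} \quad \mathbf{p}_{n,\min} = \mathbf{D}^{1/2} \mathbf{p}_{\min} . \tag{17} $$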

Having now modified the Hessian in the two ways just discussed and then solved for $\mathbf{p}_{\min}$ using MCF, the final step in our hybrid procedure is univariate minimisation of $F$ by optimising $\alpha_n$ to find the parameter updates,

$$ \mathbf{x}_{new} = \mathbf{x} + \alpha_n \mathbf{D}^{-1/2} \mathbf{p}_{n,\min} , \tag{18} $$

i.e. the new hybrid procedure described above tunes two new parameters $\eta_n$ and $\alpha_n$ to minimise $F$⁴. Because of the way in which the new HO method has been set up, from the same set of starting parameters within a single iteration, it must always descend at least as rapidly as LM or GN and in general will win. However, HO will follow a different descent path than either LM or GN and therefore a more efficient overall descent is not guaranteed. The logic flow for one vpfit iteration is illustrated schematically in Figure 1.

⁴ In practice, $\eta_n$ is tuned quite coarsely, stepping in powers of 10. This has the advantage of speed and empirically it works well. In principle however, finer tuning could result in faster descent per iteration but would require more computing time within each iteration. We have not yet explored the trade-off between these two things.
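The sequence of Eqs. (14) to (18) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the vpfit implementation: the function names, the particular grids of $\eta_n$ (powers of 10) and $\alpha_n$ values, and the crude grid scan used in place of a proper univariate minimisation are all hypothetical choices, and the normalised system is taken to be $(\mathbf{G}_n + \eta_n \mathbf{I})$ as in Eq. (15).

```python
import math


def cholesky_solve(A, b):
    """Solve A x = b for symmetric positive-definite A via A = L L^T."""
    n = len(b)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    y = [0.0] * n
    for i in range(n):                       # forward substitution: L y = b
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):             # back substitution: L^T x = y
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def hybrid_step(F, x, g, G, etas=(1e-4, 1e-3, 1e-2, 1e-1, 1.0, 10.0),
                alphas=(0.25, 0.5, 1.0, 2.0)):
    """One hybrid iteration: normalise (Eq. 14), damp (Eq. 15), solve (Eq. 16),
    then scan the step length alpha_n (Eq. 18). Returns (x_new, F_new)."""
    n = len(x)
    d = [math.sqrt(G[i][i]) for i in range(n)]           # diagonal of D^(1/2)
    Gn = [[G[i][j] / (d[i] * d[j]) for j in range(n)] for i in range(n)]
    gn = [g[i] / d[i] for i in range(n)]                 # Eq. (17)
    best_x, best_F = list(x), F(x)
    for eta in etas:                                     # coarse, powers of 10
        A = [[Gn[i][j] + (eta if i == j else 0.0) for j in range(n)]
             for i in range(n)]
        pn = cholesky_solve(A, [-v for v in gn])
        for alpha in alphas:                             # crude univariate scan
            trial = [x[i] + alpha * pn[i] / d[i] for i in range(n)]
            F_trial = F(trial)
            if F_trial < best_F:
                best_x, best_F = trial, F_trial
    return best_x, best_F


# Toy quadratic objective with a large dynamic range in the Hessian.
G = [[1.0e8, 1.0e3], [1.0e3, 1.0]]
x_true = [2.0, -1.0]


def F(x):
    dx = [x[i] - x_true[i] for i in range(2)]
    return 0.5 * sum(dx[i] * G[i][j] * dx[j] for i in range(2) for j in range(2))


x0 = [2.001, 3.0]
g0 = [sum(G[i][j] * (x0[j] - x_true[j]) for j in range(2)) for i in range(2)]
F0 = F(x0)
x_new, F_new = hybrid_step(F, x0, g0, G)
check = cholesky_solve([[4.0, 2.0], [2.0, 3.0]], [2.0, 1.0])
```

Normalising before the Cholesky solve replaces the diagonal entries $10^8$ and $1$ by unity, which is the dynamic range reduction described in item (1) above. Because every trial is only accepted if it lowers $F$, the returned step can never be worse than the starting point.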

1. Calculate model spectrum $I(\mathbf{x})$ (Eq. (2))

2. Calculate objective function $F(\mathbf{x})$ (Eq. (3))

3. Calculate the gradient vector $\mathbf{g}(\mathbf{x})$ (Eq. (6))

4. Calculate the preliminary Hessian matrix $\mathbf{G}(\mathbf{x})$ (Eq. (9))

5. Normalise Hessian to unit diagonals $\mathbf{G}_n$ (Eq. (14))

6. Add tuned LM constant to Hessian diagonals (Eq. (15))

7. Calculate preliminary parameter updates $\mathbf{p}_{\min}$ (Eqs. (16) and (19))

8. Find optimal $\alpha_n$ to update the parameter space, $\mathbf{x}_{new}$ (Eq. (18))

9. Return to Step 1 for the next iteration, with $\mathbf{x}_{new}$.

Figure 1. Bottom-up flow chart illustrating sequential logic for one vpfit iteration.

2.4 Comparing GN-LM and HO methods

We can thus compare the search directions obtained using the new hybrid method to those obtained using the regular GN and LM methods. To do so, let us first transform Eq. (16) back by multiplying by $\mathbf{D}^{1/2}$, which gives

$$ \mathbf{g} = -\left( \mathbf{G} + \eta_n \mathbf{D} \right) \mathbf{p}_{\min} , \tag{19} $$

showing that when $\eta_n = 0$, the HO method reduces to the GN method and as $\eta_n \to \infty$, the search direction (solved for using Eq. (19)) becomes $\mathbf{p}_{\min} \propto \mathbf{D}^{-1} \mathbf{g}$, such that the impact of off-diagonal terms becomes negligible, i.e. we assume that each parameter is parabolic along its respective axis in parameter space.
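The step from Eq. (16) to Eq. (19) can be written out explicitly. The sketch below assumes that Eq. (16) applies the damped normalised Hessian of Eq. (15), $\mathbf{G}_n + \eta_n \mathbf{I}$, to the normalised update, and that $\mathbf{D}$ is the diagonal matrix with elements $D_{qq} = G_{qq}$ (so that Eq. (14) yields unit diagonals). Substituting Eqs. (14) and (17),

$$ \begin{aligned} \mathbf{D}^{-1/2} \mathbf{g} &= -\left( \mathbf{D}^{-1/2} \mathbf{G} \mathbf{D}^{-1/2} + \eta_n \mathbf{I} \right) \mathbf{D}^{1/2} \mathbf{p}_{\min} \\ &= -\left( \mathbf{D}^{-1/2} \mathbf{G} + \eta_n \mathbf{D}^{1/2} \right) \mathbf{p}_{\min} , \end{aligned} $$

and multiplying from the left by $\mathbf{D}^{1/2}$ gives $\mathbf{g} = -(\mathbf{G} + \eta_n \mathbf{D}) \, \mathbf{p}_{\min}$. In the limit $\eta_n \to \infty$ the term $\eta_n \mathbf{D}$ dominates, so that component by component

$$ p_{\min,q} \to -\frac{g_q}{\eta_n G_{qq}} , $$

i.e. each parameter is updated independently using only its own diagonal curvature $G_{qq}$.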

The relative performances of the hybrid and GN-LM methods will, of course, vary according to the data being modelled, the starting parameter guesses, and other observational details. Figure 2 illustrates a two-component synthetic absorption system, comprising three atomic species. The signal to noise is 100 per pixel. Absorption line parameters were taken from a known absorption system towards the bright $z_{em} = 3.12$ quasar Q0420-388. The three spectral segments were fitted simultaneously using vpfit. The fine structure constant was included as a free parameter. The data were fitted twice, each time using a different set of starting guesses, which were perturbed far from the true input parameter values (initial normalised chi squared values for the starting models were $\sim 700$ and $\sim 900$).

Figure 3 shows the evolution of the normalised value of $\chi^2_n$ at each iteration ($\chi^2_n = 2F(\mathbf{x})/n_{df}$, where $n_{df}$ is the number of degrees of freedom in the fit). We examine the relative performances of the hybrid and GN-LM methods by running both methods on a synthetic spectrum (panel (a)) and also on real data (panel (b)). Different transitions were used for the synthetic and real cases. For the synthetic spectrum, the transitions are illustrated in Figure 2. The real data used is a segment of the absorption system towards HE0515-4414, $1.1494347 < z_{abs} < 1.1499145$, described in Milakovic et al. (2021), designated “region IV” and illustrated in figure B4 of that paper, comprising five Fe ii lines, the Mg ii doublet, and one Mg i line.

Two trials were carried out for both spectra, each trial having a slightly different set of first-guess parameters. In panel (a), the blue lines show the results for trial 1 and the red lines are for trial 2 (solid line = HO method, dashed line = GN-LM method). For trial 1, HO and GN-LM descend at about the same rates for the first 4 iterations, but from then on, HO performs significantly better. For the second trial (red), there is little difference between the two approaches. Panel (b) shows the two trial fits to the real data. For both trials, the HO method marginally out-performs the GN-LM method although the differences are small. Interestingly however, trial 2 converges to a slightly worse $\chi^2_n$ than trial 1 but only when the GN-LM method is used; when both trials are run using the HO method, they converge to the same $\chi^2_n$. A visual check was made on the best-fit models and indeed the relative positions of the absorption components were different for the two GN-LM fits to the real data. This is intriguing because it may suggest the HO method is less likely to find spurious secondary minima in $\chi^2$ space. This point must remain speculative since we have only carried out two trials here. A more detailed study should clarify this possibility. Nevertheless, the overall conclusion, albeit tentative, is that the HO method works slightly better than GN-LM.

2.5 Stopping criteria

Stopping criteria are discussed in detail in Carswell & Webb (2020). The basic criterion is simply that the fitting procedure iterates until the condition $[F(\mathbf{x}) - F(\mathbf{x} + \alpha \mathbf{p}_{\min})]/F(\mathbf{x}) \leq \Delta$ is satisfied, where $\Delta$ can be user-defined. The appropriate value of $\Delta$ depends on the data characteristics, i.e. spectral resolution, pixel size, signal to noise, and number of spectral segments being simultaneously modelled.
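The basic criterion can be expressed as a one-line test on successive values of $F$. The sketch below is illustrative only: the function name, the sequence of $F$ values, and the choice $\Delta = 10^{-4}$ are assumptions, not vpfit defaults.

```python
def has_converged(F_old, F_new, delta):
    """True once the fractional drop in F between iterations is <= delta."""
    if F_old <= 0.0:
        raise ValueError("F_old must be positive")
    return (F_old - F_new) / F_old <= delta


# Hypothetical sequence of objective function values over five iterations.
F_history = [900.0, 120.0, 14.0, 1.30, 1.2999]
delta = 1.0e-4

stop_at = None
for k in range(1, len(F_history)):
    if has_converged(F_history[k - 1], F_history[k], delta):
        stop_at = k          # index of the iteration at which fitting stops
        break
```

In this toy sequence the fractional improvement first falls below $\Delta$ at the fourth update, so iteration stops there.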

[Figure 2 image: three stacked panels showing Si II 1526.71 (f = 0.1227), Al II 1670.79 (f = 1.7400) and Fe II 1608.45 (f = 0.0529); horizontal axis: velocity relative to zabs = 3.088095 (km/s).]

Figure 2. Synthetic spectrum of a two-component absorption system, with three atomic species. The signal to noise per pixel is 100 and the spectral resolution is 6 km/s FWHM. The broadening mechanism is turbulent, i.e. all transitions at the same redshift have the same b-parameter. The data were fitted twice, each time with a different set of starting guesses.

3 COMPUTING THE VOIGT FUNCTION

The Voigt function $H(a, u)$ is a convolution of Gaussian and Lorentzian profiles. It has been described in many textbooks and papers but for the sake of completeness (and to assist other descriptive aspects of this paper), it is briefly described in Appendix A1. There are published formulations of $H(a, u)$ that are derived in both frequency and wavelength space. The latter is incorrect, as discussed in Appendix A2; the correct procedure is to use frequency space. $H(a, u)$ is expressed by an integral equation and there is a vast literature across many scientific fields describing methods for its practical computation. Numerical integration is impractical in an iterative application such as vpfit because a huge number of repeated calculations are done. A comprehensive discussion about accuracy in computing the Voigt profile is given in Murphy (2002). Whilst analytic approximations exist, the most practical approach (and the most accurate method, other than full numerical integration of the analytic Voigt function) is that introduced by Harris (1948); the Voigt function is expanded using a Taylor series and then look-up tables of the series coefficients $H_n$ as a function of $u$ are used, with interpolation. This method can achieve an arbitrarily high level of accuracy, depending only on how many Taylor series terms are used, the resolution of the look-up tables, and how well interpolation in the look-up table is done (Section 3.1).

[Figure 3 image: two panels, (a) and (b), with logarithmic vertical axes; horizontal axis: number of iterations; legend: I.C. 1, HO method; I.C. 1, GN-LM method; I.C. 2, HO method; I.C. 2, GN-LM method.]

Figure 3. The evolution of the normalised $\chi^2_n$ per iteration for trial fits, each with a different set of starting guesses. The solid lines illustrate descent for the HO optimisation method and the dashed lines are for the GN-LM method. Panel (a) shows the results for fitting a synthetic spectrum (described in Section 2.4). Panel (b) shows the results for fitting a real spectrum.

A Taylor series expansion allows the Voigt function to be expressed as

$$ H(a, u) = \sum_{n=0}^{\infty} a^n H_n(u) , \tag{20} $$

(see Appendix A1 for definitions of terms) for which the first five terms are

$$ \begin{aligned} H_0(u) &= e^{-u^2} , \\ H_1(u) &= -\frac{2}{\sqrt{\pi}} \left[ 1 - 2u D(u) \right] , \\ H_2(u) &= (1 - 2u^2) \, e^{-u^2} , \\ H_3(u) &= -\frac{4}{\sqrt{\pi}} \left[ \frac{1 - u^2}{3} - u \left( 1 - \frac{2u^2}{3} \right) D(u) \right] , \\ H_4(u) &= \left( \frac{1}{2} - 2u^2 + \frac{2}{3} u^4 \right) e^{-u^2} , \end{aligned} \tag{21} $$

and $D(u)$ is the Dawson function,

$$ D(u) = e^{-u^2} \int_0^u e^{t^2} \, dt . \tag{22} $$
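Equations (20) to (22) can be evaluated directly, without look-up tables, to check the expansion. The sketch below is a minimal illustration, not the vpfit implementation: the Dawson function is computed from its all-positive power series $D(u) = e^{-u^2} \sum_n u^{2n+1}/[n! \, (2n+1)]$ (a choice made here for simplicity), the function names are hypothetical, and the check uses the standard closed form $H(a, 0) = e^{a^2} \mathrm{erfc}(a)$ at line centre.

```python
import math

SQRT_PI = math.sqrt(math.pi)


def dawson(u, tol=1e-17):
    """Dawson function, Eq. (22), from the series
    D(u) = exp(-u^2) * sum_n u^(2n+1) / (n! (2n+1)).
    Intended for moderate |u| (roughly |u| < 25), before exp(u^2) overflows."""
    term = u             # holds u^(2n+1) / n!
    total = u            # n = 0 contribution
    n = 0
    while abs(term) > tol * abs(total) and n < 2000:
        n += 1
        term *= u * u / n
        total += term / (2 * n + 1)
    return math.exp(-u * u) * total


def voigt_taylor(a, u):
    """H(a, u) from Eq. (20), truncated after the n = 4 term of Eq. (21)."""
    e = math.exp(-u * u)
    D = dawson(u)
    u2 = u * u
    H0 = e
    H1 = -2.0 / SQRT_PI * (1.0 - 2.0 * u * D)
    H2 = (1.0 - 2.0 * u2) * e
    H3 = -4.0 / SQRT_PI * ((1.0 - u2) / 3.0 - u * (1.0 - 2.0 * u2 / 3.0) * D)
    H4 = (0.5 - 2.0 * u2 + 2.0 * u2 * u2 / 3.0) * e
    return H0 + a * (H1 + a * (H2 + a * (H3 + a * H4)))


a = 1.0e-2
exact_centre = math.exp(a * a) * math.erfc(a)    # closed form for H(a, 0)
approx_centre = voigt_taylor(a, 0.0)
difference = abs(approx_centre - exact_centre)
```

For $a = 10^{-2}$ the truncated series agrees with the closed form at $u = 0$ to better than $10^{-10}$, the residual being of order $a^5$, the first neglected term.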

3.1 Look-up table and interpolation

[Figure 4 image: logarithmic vertical axis spanning $10^{-15}$ to $10^{0}$; horizontal axis from 0 to 10; legend: 3 P.I. (200); 2 P.I. (20,000); 6 P.I. (20,000).]

Figure 4. Illustration of Eq. (23) for $a = 10^{-4}$ (Eq. (A8)), for different resolution in the $H_n(u)$ look-up tables (Equations (21)) and different numbers of interpolation points. The blue region illustrates $\delta H/H$ for 200 point look-up tables and 3 point interpolation. The “line-like” structure seen in the blue region is a consequence of the look-up table resolution; each time the look-up value of $u$ coincides exactly with an entry in the table, $\delta H/H$ reaches a minimum. The relative precision is worse half-way between look-up values. The red region is for 20,000 point look-up tables and 2 point interpolation. The yellow region is for 20,000 point look-up tables and 6 point interpolation. The latter are the default settings adopted for use in vpfit, so the worst-case relative precision is then $\sim 10^{-9}$. The line-like structure seen in the blue region is no longer visible in the red and yellow regions because the minima are 100x more closely spaced.

Using look-up tables in this way requires decisions that impact on the precision achieved for $H(a, u)$. First, we must decide on the number of terms in the Taylor series to be included. Secondly, we must choose the number of points in each look-up table, i.e. the sampling in $u$. Thirdly, we need to apply an interpolation method to interpolate to the required value of $u$, in particular choosing the number of table points to use for polynomial fitting. To explore this, we first verified that including terms beyond $n = 3$ had a negligible effect for representative values of $a$ and $u$. We then computed Eq. (20), summing only up to $n = 3$, for combinations of look-up table resolution and the number of table interpolation points, each time calculating the fractional precision,

$$ \frac{\delta H}{H} = \left| \frac{H_{true} - H_{look\text{-}up}}{H_{true}} \right| . \tag{23} $$

In the equation above, $H_{true}$ is calculated by full (high precision) numerical integration of the Voigt function and $H_{look\text{-}up}$ is Eq. (20). It is not feasible to calculate $H_{true}$ in real-time vpfit usage as calculations would be too slow.

Classical Lagrange interpolation is used to extract the four coefficients $H_0(u)$, $H_1(u)$, $H_2(u)$, $H_3(u)$ for any value of the parameter $u$ (Eq. (A11)). We experimented using between 2 and 6 look-up table values for interpolation. Fig. 4 shows the dependence of $\delta H/H$ on the number of points in the look-up table and the number of points used for interpolation within the look-up table. In practice, using 20,000 points in the look-up tables and 6-point interpolation (the default vpfit settings adopted) yields a worst-case precision on $\delta H/H$ of $\sim 10^{-9}$, as Fig. 4 shows.
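The effect of the number of interpolation points can be illustrated for a single coefficient. The sketch below tabulates only $H_0(u) = e^{-u^2}$ on 20,000 equal intervals and interpolates with classical Lagrange polynomials. The table range $0 \leq u \leq 10$, the stencil placement, and all names are assumptions made for this illustration rather than a description of the vpfit tables.

```python
import math

U_MAX, N_TABLE = 10.0, 20000      # table range and size (range is an assumption)
STEP = U_MAX / N_TABLE
TABLE = [math.exp(-(k * STEP) ** 2) for k in range(N_TABLE + 1)]   # H0 at nodes


def lagrange_lookup(u, n_pts=6):
    """Interpolate the tabulated H0 at u with an n_pts-point Lagrange polynomial."""
    k0 = int(u / STEP) - (n_pts // 2 - 1)            # first node of the stencil
    k0 = max(0, min(k0, N_TABLE + 1 - n_pts))        # keep the stencil in range
    nodes = range(k0, k0 + n_pts)
    total = 0.0
    for j in nodes:
        w = 1.0                                      # Lagrange basis weight
        for m in nodes:
            if m != j:
                w *= (u - m * STEP) / ((j - m) * STEP)
        total += w * TABLE[j]
    return total


u = 1.23456
exact = math.exp(-u * u)
rel6 = abs(lagrange_lookup(u, 6) - exact) / exact    # 6-point interpolation
rel2 = abs(lagrange_lookup(u, 2) - exact) / exact    # 2-point (linear)
```

For this smooth coefficient the 6-point result is limited by rounding rather than by the interpolation itself, whereas linear interpolation on the same table leaves a relative error several orders of magnitude larger.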


4 ABSORPTION LINE DERIVATIVES

4.1 Numerical derivatives – finite differences

The gradient vector and Hessian matrix in Eqs. (5) and (8) need the derivative of the absorption profile in Eq. (A1). Finite difference derivatives (fdd) of $I(\mathbf{x})$ generally work well although there are drawbacks. First, fdd intervals need to be assigned for every fitting parameter in the calculation. In practice this generally means one fdd interval for all column density parameters, one fdd interval for all b-parameters, one fdd for all $\log N$ parameters, and one fdd interval for each of the other types of free parameters. The optimal (in terms of precision) fdd interval depends on observational characteristics such as spectral resolution and spectral pixel size, as well as absorption line properties such as column density (e.g. heavily saturated lines may require different fdd intervals compared to unsaturated lines). Therefore for optimal precision, different spectral characteristics require different fdd interval settings. In principle, if using fdd, one could calculate optimal fdd intervals that minimise the overall error on the target function. Methods for doing this are described in Gill et al. (1981) (section 4.6.1.3) and in Press et al. (2007). Nevertheless, in practice it is still difficult to assign individual intervals for every parameter for every velocity component in an absorption complex. This problem can be completely eliminated by replacing the fdd with analytic derivatives. We therefore next describe an efficient way of calculating analytic derivatives of the Voigt function. The new method builds on the widely used Taylor series expansion method applied to the Voigt function itself (Harris 1948). Surprisingly, as far as we know, the idea has not been applied to derivatives of the Voigt function before (or indeed to other parameter derivatives relevant to absorption line spectroscopy).
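The sensitivity of finite difference derivatives to the chosen interval is easy to demonstrate. The sketch below compares a central difference of the Gaussian core $e^{-u^2}$ with its exact derivative $-2u e^{-u^2}$ for three interval sizes. The test point $u = 1$ and the intervals are arbitrary illustrative choices.

```python
import math


def H0(u):
    """Gaussian core of the Voigt function, exp(-u^2)."""
    return math.exp(-u * u)


def H0_u_analytic(u):
    """Exact derivative of exp(-u^2) with respect to u."""
    return -2.0 * u * math.exp(-u * u)


def central_difference(func, u, h):
    """Two-point central finite difference derivative with interval h."""
    return (func(u + h) - func(u - h)) / (2.0 * h)


u = 1.0
errors = {h: abs(central_difference(H0, u, h) - H0_u_analytic(u))
          for h in (1e-1, 1e-5, 1e-12)}
```

A large interval ($h = 10^{-1}$) is dominated by truncation error, while a very small one ($h = 10^{-12}$) is typically dominated by rounding error; only the intermediate interval approaches the analytic value closely. The best interval depends on the local shape of the function, which is the difficulty that analytic derivatives remove.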

4.2 Voigt function and other analytic derivatives

Derivatives of $I(\mathbf{x})$ can be expressed in terms of derivatives of $H(a, u)$. The Harris (1948) method can also be used to calculate analytic derivatives, as we now show.

$$ \frac{dI_\nu}{dx_i} = I_\nu \left( \frac{1}{I_o} \frac{dI_o}{dx_i} - \sum_{j=1}^{m} \frac{d\tau_j}{dx_i} \right) , \tag{24} $$

where $I_o$ is the unabsorbed continuum intensity and for simplicity the explicit frequency dependence on $\tau$ (subscript) has been dropped so that we can assign an index for each absorption component. Ignoring convolution with the instrumental profile for the moment (until Section 4.4), this function can be calculated analytically, making use of Eqs. (A2) and (A3),

$$ \frac{d\tau_j}{dx_i} = \kappa_j \frac{dN_j}{dx_i} + N_j \frac{d\kappa_j}{dx_i} . \tag{25} $$

For each fitting parameter, $x_i$, analytic derivatives can be specified as follows.

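(i) Redshift ($x_i = z_i$):

$$ \begin{aligned} dI_o/dz_i &= dN_j/dz_i = 0 , \\ \frac{da_j}{dz_i} &= 0 , \\ \frac{du_j}{dz_i} &= \frac{c}{b} \frac{\lambda_o}{\lambda} , \\ \frac{d\tau_j}{dz_i} &= -\frac{\sqrt{\pi} e^2 f N_i H_{,u}}{m_e b_i \Delta\nu_d} \frac{\lambda_o}{\lambda} \, \delta_{ij} , \end{aligned} \tag{26} $$

where $\lambda_o$ is the rest-frame atomic wavelength, $\delta_{ij}$ is the Kronecker delta and $H_{,u} = \partial H/\partial u$.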

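(ii) b-parameter ($x_i = b_i$):

$$ \begin{aligned} dI_o/db_i &= dN_j/db_i = 0 , \\ \frac{da_j}{db_i} &= -\frac{a_i}{b_i} \, \delta_{ij} , \\ \frac{du_j}{db_i} &= -\frac{u_i}{b_i} \, \delta_{ij} , \\ \frac{d\tau_j}{db_i} &= -\frac{\sqrt{\pi} e^2 f N_i}{m_e c \Delta\nu_d} \left( \frac{H}{b_i} + \frac{u_i H_{,u}}{b_i} + \frac{a_i H_{,a}}{b_i} \right) \delta_{ij} , \end{aligned} \tag{27} $$

where $H_{,a} = \partial H/\partial a$.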

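(iii) Column density ($x_i = \log N_i$):

$$ \begin{aligned} \frac{dI_o}{d\log N_i} &= \frac{du_j}{d\log N_i} = \frac{da_j}{d\log N_i} = 0 , \\ \frac{dN_j}{d\log N_i} &= (N_i \ln 10) \, \delta_{ij} , \\ \frac{d\tau_j}{d\log N_i} &= (\tau_i \ln 10) \, \delta_{ij} . \end{aligned} \tag{28} $$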

To evaluate Eqs. (26) to (28), the Voigt function derivatives $H_{,a}$ and $H_{,u}$ are needed, which are

$$
H_{,a} = \sum_{n=1}^{\infty} n a^{n-1} H_n \quad \mathrm{and} \quad H_{,u} = \sum_{n=0}^{\infty} a^{n} H_{n,u} , \tag{29}
$$

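where we have dropped the explicit (a, u) dependence for simplicity. It is usually true that $a \lesssim 10^{-2}$, so that we can easily reach an accuracy of $\lesssim 10^{-10}$ if we expand to fourth order. The first four terms of $H_{,u}$ can be obtained from

$$
\begin{aligned}
H_{0,u} &= -2u\, e^{-u^2} , \\
H_{1,u} &= \frac{4}{\sqrt{\pi}} \left[ u + (1 - 2u^2) D(u) \right] , \\
H_{2,u} &= (-6u + 4u^3)\, e^{-u^2} , \\
H_{3,u} &= \frac{4}{3\sqrt{\pi}} \left[ (5u - 2u^3) + (3 - 12u^2 + 4u^4) D(u) \right] ,
\end{aligned} \tag{30}
$$

where we have used

$$
\frac{dD(u)}{du} = 1 - 2u D(u) . \tag{31}
$$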

We adopt, as default internal settings in vpfit, the same set of parameters for the resolution of the four look-up tables (20,000 points in each) and 6-point interpolation as before.
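The expressions in Eq. (30) can be checked numerically. The sketch below is not vpfit code: it assumes the standard Harris (1948) forms for $H_0$ to $H_3$, evaluates the Dawson function D(u) by direct Simpson integration rather than by look-up table, and compares each analytic $H_{n,u}$ with a central difference of the corresponding $H_n$. All function names are illustrative.

```python
import math

SQRT_PI = math.sqrt(math.pi)

def dawson(u, steps=2000):
    """Dawson function D(u) = exp(-u^2) * int_0^u exp(t^2) dt, via Simpson's rule."""
    if u == 0.0:
        return 0.0
    h = u / steps  # steps must be even; h is negative for u < 0
    total = 1.0 + math.exp(u * u)
    for k in range(1, steps):
        weight = 4.0 if k % 2 else 2.0
        total += weight * math.exp((k * h) ** 2)
    return math.exp(-u * u) * total * h / 3.0

def harris_terms(u):
    """H_0..H_3 of the expansion H(a, u) = sum_n a^n H_n(u) (assumed standard forms)."""
    d, g = dawson(u), math.exp(-u * u)
    return [
        g,
        -2.0 / SQRT_PI * (1.0 - 2.0 * u * d),
        (1.0 - 2.0 * u * u) * g,
        -2.0 / SQRT_PI * (2.0 / 3.0 * (1.0 - u * u)
                          - 2.0 * u * (1.0 - 2.0 / 3.0 * u * u) * d),
    ]

def harris_terms_du(u):
    """Analytic derivatives H_{n,u}, n = 0..3, written as in Eq. (30)."""
    d, g = dawson(u), math.exp(-u * u)
    return [
        -2.0 * u * g,
        4.0 / SQRT_PI * (u + (1.0 - 2.0 * u * u) * d),
        (-6.0 * u + 4.0 * u ** 3) * g,
        4.0 / (3.0 * SQRT_PI) * ((5.0 * u - 2.0 * u ** 3)
                                 + (3.0 - 12.0 * u * u + 4.0 * u ** 4) * d),
    ]

def max_mismatch(u, h=1e-4):
    """Largest |analytic - central difference| over n = 0..3 at one value of u."""
    hi, lo = harris_terms(u + h), harris_terms(u - h)
    numeric = [(p - m) / (2.0 * h) for p, m in zip(hi, lo)]
    return max(abs(a - n) for a, n in zip(harris_terms_du(u), numeric))

if __name__ == "__main__":
    for u in (0.0, 0.5, 1.0, 2.0, 3.0):
        print(f"u = {u:3.1f}  max mismatch = {max_mismatch(u):.2e}")
```

The residual mismatch is set by the finite difference step and the quadrature, not by Eq. (30) itself; a production implementation would use tabulated values as described in the text.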

(iv) Continuum level:

The continuum level for each spectral segment in the absorption system model can be varied within vpfit. This is done using a two-parameter fit: the constant normalisation $I_o^{(0)}$ and the first order derivative of the existing (i.e. the user-provided) continuum level, $I_o^{(1)}$,

$$
I_o = I_o^{(0)} + I_o^{(1)} (\lambda - \lambda_c) , \tag{32}
$$

where $I_o^{(0)} = I_o|_{\lambda=\lambda_c}$, $I_o^{(1)} = (dI_o/d\lambda)|_{\lambda=\lambda_c}$ and $\lambda_c$ is a user-provided wavelength (commonly selected as the centre of the spectral segment being fitted). The analytic derivatives with respect to both terms are

$$
\frac{dI_\nu}{dI_o^{(0)}} = \frac{I_\nu}{I_o} , \qquad
\frac{dI_\nu}{dI_o^{(1)}} = (\lambda - \lambda_c) \frac{I_\nu}{I_o} . \tag{33}
$$

Equations (33) are used to calculate the Hessian matrix and gradient vector components such that the two free continuum parameters may be solved for in the same way as the absorption line Voigt parameters.

(v) Zero level adjustment:

During the spectral data reduction, it may sometimes be that sky subtraction is imperfect, or that some scattered light in the spectrograph causes a residual, non-astrophysical zero level offset in the spectrum being analysed. There may be other sources of scattered light. If this is not properly accounted for or removed prior to Voigt profile modelling, the Voigt parameters will be systematically biased. In practice, a continuum may be fitted prior to any attempt to detect any residual background in the observed spectrum. To properly account for this, we can model the observed spectrum as

$$
\begin{aligned}
I_\nu &= Z_o + (1 - Z_o)\, I_o \exp\left( -\sum_{j=1}^{m} \tau_j \right) , \\
\frac{dI_\nu}{dZ_o} &= 1 - I_o \exp\left( -\sum_{j=1}^{m} \tau_j \right) .
\end{aligned} \tag{34}
$$

Expressed in this way, the parameter $Z_o$ allows for a residual background and corrects the local continuum level appropriately. The derivative in Eq. (34) is then used to form the required terms in the gradient vector and Hessian matrix.

(vi) Velocity shift:

It is useful to be able to include an additional free parameter for each spectral segment that allows a shift (in velocity space) relative to other spectral segments. This may be desirable either for astrophysical reasons or to allow for possible systematic uncertainties in the zero point of the wavelength calibration. This can be done simply, as follows.

The velocity shift effectively contributes to the redshift parameter of all absorption components. This can be expressed in terms of the distance from the line centre in Doppler width units, u,

$$
\begin{aligned}
\frac{du_j}{dv} &= \sum_i \frac{du_j}{dz_i}\frac{dz_i}{dv} = \sum_i \left( \frac{1 + z_i}{c}\, \frac{du_j}{dz_i} \right) , \\
\frac{d\tau_j}{dv} &= \sum_i \frac{d\tau_j}{dz_i}\frac{dz_i}{dv} = \sum_i \left( \frac{1 + z_i}{c}\, \frac{d\tau_j}{dz_i} \right) ,
\end{aligned} \tag{35}
$$

where v is the velocity shift parameter and $du_j/dz_i$, $d\tau_j/dz_i$ follow Eq. (26). Equations (35) are used to calculate the derivatives required by the Hessian matrix and gradient vector.
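As a simple consistency check on the continuum and zero level derivatives in Eqs. (33) and (34), the following sketch compares them with central finite differences at a single spectral point. It is illustrative only: the function names and the numerical values are assumptions, and the summed optical depth is passed in as a plain number rather than computed from Voigt profiles.

```python
import math

def intensity(tau_sum, lam, lam_c, i0, i1, z0=0.0):
    """Eq. (34), using the linear continuum of Eq. (32); z0 = 0 recovers Eq. (A1)."""
    i_o = i0 + i1 * (lam - lam_c)
    return z0 + (1.0 - z0) * i_o * math.exp(-tau_sum)

def d_intensity_d_i0(tau_sum, lam, lam_c, i0, i1):
    """First part of Eq. (33): I_nu / I_o (zero level held at 0)."""
    i_o = i0 + i1 * (lam - lam_c)
    return intensity(tau_sum, lam, lam_c, i0, i1) / i_o

def d_intensity_d_i1(tau_sum, lam, lam_c, i0, i1):
    """Second part of Eq. (33): (lam - lam_c) * I_nu / I_o."""
    return (lam - lam_c) * d_intensity_d_i0(tau_sum, lam, lam_c, i0, i1)

def d_intensity_d_z0(tau_sum, lam, lam_c, i0, i1):
    """Derivative in Eq. (34): 1 - I_o * exp(-sum tau)."""
    i_o = i0 + i1 * (lam - lam_c)
    return 1.0 - i_o * math.exp(-tau_sum)

if __name__ == "__main__":
    tau, lam, lam_c, i0, i1, h = 1.3, 5003.0, 5000.0, 1.0, 0.002, 1e-6
    num_i0 = (intensity(tau, lam, lam_c, i0 + h, i1)
              - intensity(tau, lam, lam_c, i0 - h, i1)) / (2.0 * h)
    num_z0 = (intensity(tau, lam, lam_c, i0, i1, z0=h)
              - intensity(tau, lam, lam_c, i0, i1, z0=-h)) / (2.0 * h)
    print("dI/dI0:", d_intensity_d_i0(tau, lam, lam_c, i0, i1), num_i0)
    print("dI/dZo:", d_intensity_d_z0(tau, lam, lam_c, i0, i1), num_z0)
```

Because the model is linear in each of these three parameters, the finite difference and analytic values agree to rounding error, which makes this a convenient unit test for an implementation.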

4.3 Comparing finite difference and analytic derivatives

It is informative to compare the relative precision of the fdd and Taylor series derivative methods for different absorption line parameters. To do so we examine derivatives with respect to the b-parameter, for two cases, unsaturated and saturated absorption lines, defining the following quantities:

$$
I_{\nu,b} = \frac{dI_\nu}{db} , \tag{36}
$$

$$
I^{\rm fdd}_{\nu,b} = \frac{I_\nu|_{b+\delta b} - I_\nu|_{b-\delta b}}{2\delta b} , \tag{37}
$$

$$
I^{\rm look-up}_{\nu,b} = \frac{dI_\nu}{db} = I_o \frac{d e^{-\tau_\nu}}{db} = -I_\nu \frac{d\tau_\nu}{db} = \frac{I_\nu \tau_\nu}{b} \left( 1 + \frac{u H_{,u}}{H} + \frac{a H_{,a}}{H} \right) , \tag{38}
$$

where H, $H_{,a}$ and $H_{,u}$ are calculated using look-up tables as described in Section 4.2. We note again for clarity that $I_\nu$ is the spectrum prior to convolution with the instrumental profile, whereas I(x) is the convolved spectrum, $(I_\nu * G)$.

In Fig. 5, four sets of three panels are illustrated. The top 6 panels are for b = 3 km s$^{-1}$ and the lower 6 panels are for b = 30 km s$^{-1}$. This range in b-parameter covers the majority of observed values for both hydrogen and heavy element lines. Whilst b = 3 km s$^{-1}$ is unrepresentative of typical hydrogen lines seen in quasar spectra, it is not atypical for heavy element lines, so we present the calculations this way, using H i as the atomic transition, for convenience (so that other quantities, notably Γ and the central wavelength, remain constant).

The optical depth is given by

$$
\tau_\nu = N\, \frac{\sqrt{\pi}\, e^2}{m_e c}\, \frac{f}{\Delta\nu_d}\, H = \kappa_0 N H . \tag{39}
$$

See Appendix A1 for definitions of the quantities. The panels on the left hand sides of Fig. 5 are for an unsaturated absorption line with $\kappa_0 N = 1$ (b = 3 km s$^{-1}$) and $\kappa_0 N = 10$ (b = 30 km s$^{-1}$), corresponding to column densities of log N = 12.55 and log N = 13.55 atoms cm$^{-2}$. The right hand panels are for a saturated line with $\kappa_0 N = 100$ (b = 3 km s$^{-1}$) and $\kappa_0 N = 1{,}000$ (b = 30 km s$^{-1}$), corresponding to column densities of log N = 14.55 and log N = 15.55 atoms cm$^{-2}$. Each grouping of 3 panels plots the intensity I, the analytic derivative with respect to the velocity dispersion parameter, $|I^{\rm look-up}_{,b}|$, and the absolute difference between the analytic and fdd derivatives, $|I^{\rm look-up}_{,b} - I^{\rm fdd}_{,b}|$, as a function of rest-frame wavelength. The derivative panels illustrate several models, discussed in the figure caption and indicated by the figure legends.

The main things we learn from the quantities illustrated in Fig. 5 are:

1. For the range of absorption line parameters considered, fdd and analytic derivatives agree well for suitably small finite difference intervals.
2. However, for unsuitably large fdd increments (e.g. 0.1 km s$^{-1}$ or larger) the absolute difference between fdd and analytic derivatives can be as large as $\sim 10^{-4}$ for unsaturated lines and $\sim 10^{-2}$ for saturated lines, corresponding to percentage differences of $\sim 0.1$% and $\sim 1$% respectively.
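The dependence on the fdd interval can be illustrated with a deliberately simplified model. The sketch below assumes a pure Doppler core (a = 0, so $H = e^{-u^2}$ and the $H_{,a}$ term of Eq. (38) vanishes), an unconvolved profile with unit continuum, and a velocity grid in km/s; it is not the calculation used for Fig. 5 and all names are illustrative.

```python
import math

def intensity(v, b, tau0_b):
    """I = exp(-tau) with tau = (tau0_b / b) * exp(-(v/b)^2); tau0_b is fixed by N."""
    u = v / b
    return math.exp(-(tau0_b / b) * math.exp(-u * u))

def analytic_dI_db(v, b, tau0_b):
    """Eq. (38) with a = 0: (I * tau / b) * (1 + u * H_u / H) = (I * tau / b) * (1 - 2u^2)."""
    u = v / b
    tau = (tau0_b / b) * math.exp(-u * u)
    return math.exp(-tau) * tau / b * (1.0 - 2.0 * u * u)

def fdd_dI_db(v, b, tau0_b, db):
    """Central finite difference, as in Eq. (37)."""
    return (intensity(v, b + db, tau0_b) - intensity(v, b - db, tau0_b)) / (2.0 * db)

def max_abs_difference(b, tau_centre, db, n=401, span=6.0):
    """Largest |analytic - fdd| on a velocity grid covering +/- span * b."""
    tau0_b = tau_centre * b  # keeps the column density fixed while b is perturbed
    worst = 0.0
    for k in range(n):
        v = -span * b + 2.0 * span * b * k / (n - 1)
        diff = abs(analytic_dI_db(v, b, tau0_b) - fdd_dI_db(v, b, tau0_b, db))
        worst = max(worst, diff)
    return worst

if __name__ == "__main__":
    for tau_centre in (1.0, 100.0):  # unsaturated and saturated cases for b = 3 km/s
        for db in (1e-3, 1e-2, 1e-1):
            err = max_abs_difference(3.0, tau_centre, db)
            print(f"tau_centre = {tau_centre:5.0f}  db = {db:5.3f} km/s  max diff = {err:.2e}")
```

In this toy model the discrepancy grows roughly as the square of the interval, as expected for a central difference, and is larger for the saturated line.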


Translating the fdd precision constraints into practical quantities such as non-linear least squares descent efficiency is complicated because there are generally many variables involved. In the idealised case considered here, there is only a single absorption line. In any real quasar absorption system, there are multiple blended components. In a complex absorption system (comprising multiple blends), some absorption parameters may be well determined whilst others may be very poorly determined or even completely degenerate. In these cases, one single fdd interval setting may be optimal for some absorption components, but inappropriate for others. Analytic derivatives do not suffer from this difficulty and hence completely avoid the potential problems arising through choice of fdd interval.

4.4 Convolution with the instrumental profile for analytic derivatives

When fdd are calculated from the model profiles, convolution with the instrumental profile has already taken place, so the instrumental resolution is properly accounted for. However, the analytic derivatives of Section 4.2 have ignored the instrumental profile. The derivative theorem for convolutions (e.g. Bracewell 1978) can be employed,

$$
\frac{\partial (I_\nu * G)}{\partial x_i} = \frac{\partial I_\nu}{\partial x_i} * G , \tag{40}
$$

where $*$ denotes convolution, $x_i$ is the model parameter used in Section 4.2, and G is the instrumental profile. The required analytic derivatives, i.e. the left hand side of Eq. (40), are thus obtained by first calculating Eq. (24) and then convolving those derivative arrays with the appropriate function G (e.g. a Gaussian or a numerical function).
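Eq. (40) can be verified on a discrete grid. The sketch below is a minimal illustration under stated assumptions: a single pure Doppler line with unit continuum, a Gaussian instrumental profile sampled on the same pixel grid, edge padding at the array boundaries, and an arbitrary normalisation (`log_n_ref`) for the optical depth. It convolves the analytic derivative with respect to log N and compares the result with a finite difference of the convolved spectrum.

```python
import math

def gaussian_kernel(sigma_pix, half_width):
    """Normalised, symmetric Gaussian kernel of length 2 * half_width + 1."""
    w = [math.exp(-0.5 * (k / sigma_pix) ** 2) for k in range(-half_width, half_width + 1)]
    total = sum(w)
    return [x / total for x in w]

def convolve_same(y, kernel):
    """Discrete convolution with edge padding (kernel assumed symmetric, odd length)."""
    half, n = len(kernel) // 2, len(y)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - half, 0), n - 1)
            acc += w * y[j]
        out.append(acc)
    return out

def spectrum(log_n, grid, b=3.0, log_n_ref=12.55):
    """Unconvolved I = exp(-tau), tau = 10^(logN - log_n_ref) * exp(-(v/b)^2)."""
    tau0 = 10.0 ** (log_n - log_n_ref)
    return [math.exp(-tau0 * math.exp(-(v / b) ** 2)) for v in grid]

def d_spectrum_d_log_n(log_n, grid, b=3.0, log_n_ref=12.55):
    """Eq. (24) combined with Eq. (28): dI/dlogN = -I * tau * ln(10)."""
    tau0 = 10.0 ** (log_n - log_n_ref)
    out = []
    for v in grid:
        tau = tau0 * math.exp(-(v / b) ** 2)
        out.append(-math.exp(-tau) * tau * math.log(10.0))
    return out

GRID = [-20.0 + 0.25 * k for k in range(161)]           # velocity grid in km/s
KERNEL = gaussian_kernel(sigma_pix=6.0, half_width=24)  # illustrative resolution

def max_mismatch(log_n=13.0, h=1e-4):
    """Largest difference between the two sides of Eq. (40) over the grid."""
    hi = convolve_same(spectrum(log_n + h, GRID), KERNEL)
    lo = convolve_same(spectrum(log_n - h, GRID), KERNEL)
    lhs = [(p - m) / (2.0 * h) for p, m in zip(hi, lo)]   # d(I * G)/dlogN by fdd
    rhs = convolve_same(d_spectrum_d_log_n(log_n, GRID), KERNEL)  # (dI/dlogN) * G
    return max(abs(a - c) for a, c in zip(lhs, rhs))

if __name__ == "__main__":
    print(f"max |lhs - rhs| = {max_mismatch():.2e}")
```

The two sides agree to within the finite difference truncation error because discrete convolution is linear in the input array.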

5 DISCUSSION

New astronomical facilities such as the ESPRESSO spectrograph on the European Southern Observatory’s VLT (Pepe et al. 2014) and the forthcoming ELT will carry out large explorations for new physics (Hook 2009; ESO ELT team 2010, 2011; Liske et al. 2014; Marconi et al. 2016). Two areas of particular study will be searches for temporal or spatial variations in fundamental constants and measurements of cosmological redshift drift. Both of these projects require maximally precise absorption line modelling procedures. These requirements motivate the analytic and numerical methods presented in this paper. Focusing on precision and computational practicality, we have aimed primarily to do the following:

(i) present a detailed account of the theoretical methods on which the widely used code vpfit is based,

(ii) introduce a fast and precise method for calculating Voigt function derivatives, and other relevant derivatives, newly implemented in vpfit,

(iii) describe a new optimisation method, incorporating the simultaneous use of two different fine tuning methods drawn from the Gauss-Newton and Levenberg-Marquardt approaches.

ACKNOWLEDGMENTS

We are grateful to Mike Irwin for help with an early implementation of the Gauss-Newton method for Voigt profile modelling. We thank Jochen Liske for pointing out the importance of sufficiently high sampling of the Voigt profile. Several people contributed either by making vpfit code modifications or by testing, including Andrew Cooke, Vincent Dumont, Matthew Bainbridge, Julian King, Dinko Milakovic, and Michael Murphy. We thank Pasquier Noterdaeme and Cedric Ledoux for raising the question of frequency vs. wavelength formulation for the Voigt function. JKW thanks the John Templeton Foundation for support.

DATA AVAILABILITY

The vpfit code and data used in this paper are available at https://people.ast.cam.ac.uk/~rfc/. Any additional material not included there can be requested directly from the authors.

REFERENCES

Bainbridge M. B., Webb J. K., 2017a, Universe, 3, 34

Bainbridge M. B., Webb J. K., 2017b, MNRAS, 468, 1639

Bevington P. R., Robinson D. K., 2003, Data reduction and error analysis for the physical sciences; 3rd ed.. McGraw-Hill, New York, NY, https://cds.cern.ch/record/1305448

Bracewell R., 1978, The Fourier Transform and its Applications, second edn. McGraw-Hill Kogakusha, Ltd., Tokyo

Carswell R. F., Webb J. K., 2014, VPFIT: Voigt profile fitting program, Astrophysics Source Code Library (ascl:1408.015)

Carswell R. F., Webb J. K., 2020, Bob Carswell’s homepage, https://people.ast.cam.ac.uk/~rfc/

Charnes A., Frome E., Yu P., 1976, Journal of The American Statistical Association, 71, 169

Dahlquist G., Bjorck A., 1974, Numerical methods. Prentice-Hall Series in Automatic Computation, Englewood Cliffs: Prentice-Hall, 1974

Dumont V., Webb J. K., 2017, MNRAS, 468, 1568

Dzuba V. A., Flambaum V. V., Webb J. K., 1999, Phys. Rev. Lett., 82, 888

ESO ELT team 2010, The science case for the European Extremely Large Telescope: The next step in mankind’s quest for the Universe. ESO online, https://www.eso.org/sci/facilities/eelt/science/doc/eelt_science_book.pdf

ESO ELT team 2011, An Expanded View of the Universe Science with the European Extremely Large Telescope. ESO online, https://www.eso.org/sci/facilities/eelt/science/doc/eelt_sciencecase.pdf

Fisher R., 1958, Statistical Methods for Research Workers. Biological monographs and manuals, Hafner, http://books.google.com.au/books?id=qqNpAAAAMAAJ

Fontana A., Ballester P., 1995, The Messenger, 80, 37

Gill P. E., Murray W., Wright M. H., 1981, Practical optimization. Academic Press Inc. [Harcourt Brace Jovanovich Publishers], London

Harris III D. L., 1948, ApJ, 108, 112

Hook I., 2009, in Moorwood A., ed., Science with the VLT in the ELT Era. Springer Netherlands, Dordrecht, pp 225–232

Irwin M. J., 1985, MNRAS, 214, 575

King J. A., 2010, PhD thesis, School of Physics, UNSW Sydney, http://handle.unsw.edu.au/1959.4/50886

King J. A., Mortlock D. J., Webb J. K., Murphy M. T., 2009, Mem. Soc. Astron. Italiana, 80, 864



Figure 5. Plots of $I(\lambda)$, $I_{,b} = dI/db$ (Eq. (36)), and the difference between analytic and fdd derivatives, $I^{\rm look-up}_{,b} - I^{\rm fdd}_{,b}$, for a hydrogen Lyman-α absorption line. See Section 4.3 for details.

King J. A., Webb J. K., Murphy M. T., Flambaum V. V., Carswell R. F., Bainbridge M. B., Wilczynska M. R., Koch F. E., 2012, MNRAS, 422, 3370

Krogager J.-K., 2018, arXiv e-prints, p. arXiv:1803.01187

Lampton M., Margon B., Bowyer S., 1976, ApJ, 208, 177

Lee H.-W., 2013, ApJ, 772, 123

Lee C. C., Webb J. K., Carswell R. F., 2020, MNRAS, 491, 5555

Lee C.-C., Webb J. K., Milakovic D., Carswell R. F., 2021a, arXiv e-prints, p. arXiv:2102.11648

Lee C.-C., Webb J. K., Carswell R. F., Milakovic D., 2021b, MNRAS, 504, 1787

Liske J., Spyromillio J., Tamai R., 2014, Top Level Requirements for ELT-HIRES, https://www.eso.org/sci/facilities/eelt/docs/ESO-204697_1_Top_Level_Requirements_for_ELT-HIRES.pdf

Marconi A., et al., 2016, in Ground-based and Airborne Instrumentation for Astronomy VI, eds. Christopher J. Evans and Luc Simard and Hideki Takami. SPIE, pp 676–687, doi:10.1117/12.2231653, https://doi.org/10.1117/12.2231653

Mihalas D., 1978, Stellar atmospheres. W.H. Freeman, San Francisco, http://www.amazon.com/Stellar-Atmospheres-Series-astronomy-astrophysics/dp/0716703599

Milakovic D., Lee C.-C., Carswell R. F., Webb J. K., Molaro P., Pasquini L., 2021, MNRAS, 500, 1

Molaro P., Levshakov S. A., Monai S., Centurion M., Bonifacio P., D’Odorico S., Monaco L., 2008, A&A, 481, 559


Murphy M. T., 2002, PhD thesis, School of Physics, UNSW Sydney, http://handle.unsw.edu.au/1959.4/19062

Pepe F., et al., 2014, Astronomische Nachrichten, 335, 8

Pepe F., et al., 2021, A&A, 645, A96

Press W. H., Teukolsky S. A., Vetterling W. T., Flannery B. P., 2007, Numerical Recipes 3rd Edition: The Art of Scientific Computing, 3 edn. Cambridge University Press, http://www.amazon.com/Numerical-Recipes-3rd-Scientific-Computing/dp/0521880688/ref=sr_1_1?ie=UTF8&s=books&qid=1280322496&sr=8-1

Rahmani H., et al., 2013, MNRAS, 435, 861

Schreier F., 2017, Journal of Quantitative Spectroscopy and Radiative Transfer, 187, 44

Struve O., Elvey C. T., 1934, ApJ, 79, 409

Tamai R., Koehler B., Cirasuolo M., Biancat-Marchet F., Tuti M., Gonzales Herrera J. C., 2018, in Marshall H. K., Spyromilio J., eds, Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series Vol. 10700, Ground-based and Airborne Telescopes VII. p. 1070014, doi:10.1117/12.2309515

Tennyson J., et al., 2014, arXiv e-prints, p. arXiv:1409.7782

Tepper-García T., 2006, MNRAS, 369, 2025

Transtrum M. K., Sethna J. P., 2012, arXiv e-prints, p. arXiv:1201.5885

Webb J. K., 1987, PhD thesis, Institute of Astronomy, University of Cambridge, (doi: https://doi.org/10.17863/CAM.72486), https://www.repository.cam.ac.uk/handle/1810/325031

Webb J. K., Flambaum V. V., Churchill C. W., Drinkwater M. J., Barrow J. D., 1999, Phys. Rev. Lett., 82, 884

Webb J. K., Lee C.-C., Carswell R. F., Milakovic D., 2021, MNRAS, 501, 2268

Whiting E., 1968, Journal of Quantitative Spectroscopy and Radiative Transfer, 8, 1379

Whitmore J. B., Murphy M. T., 2014, MNRAS, 447, 446

Wilczynska M. R., Webb J. K., King J. A., Murphy M. T., Bainbridge M. B., Flambaum V. V., 2015, MNRAS, 454, 3082

APPENDIX A: ABSORPTION PROFILE

A1 Voigt profile

For a complex of m blended absorption profiles, the intensity as a function of frequency is given by

$$
I_\nu = I_o \exp\left( -\sum_{j=1}^{m} \tau_j \right) , \tag{A1}
$$

where $I_\nu$ is the observed intensity, $I_o$ is the unabsorbed continuum intensity, and $\tau_j$ is the optical depth. Instrumental resolution is not included here. It is assumed (in vpfit) that a single opacity applies to a single absorption component in a complex, i.e. that

$$
\tau_j = \kappa_j N_j , \tag{A2}
$$

where $N_j$ is the column density of absorbing atoms and the absorption coefficient (averaged over the gas cloud) is

$$
\kappa_j = \frac{\sqrt{\pi}\, e^2 f_j H(a, u)_j}{m_e c \Delta\nu_d} \tag{A3}
$$

e.g. Mihalas (1978). Hereafter, we focus on a single absorption component and drop the subscript j. H(a, u) is the Voigt function, f is the atomic oscillator strength, e and $m_e$ are the electron charge and mass, and c is the speed of light. The Doppler width $\Delta\nu_d$ is related to the velocity dispersion parameter b and the line central frequency in the rest frame $\nu_o$ by

$$
\Delta\nu_d = b \nu_o / c . \tag{A4}
$$

In general, the observed velocity dispersion parameter is the quadrature addition of the thermal and turbulent broadening (Struve & Elvey 1934),

$$
b^2_{\rm obs} = b^2_{\rm thermal} + b^2_{\rm turbulent} = \frac{2kT}{M} + b^2_{\rm turbulent} , \tag{A5}
$$

where M is the atomic mass.

The Voigt function H(a, u) is the ratio of the (frequency dependent) absorption coefficient to the absorption coefficient at the line centre, i.e.

$$
H(a, u) = \frac{\kappa_\nu}{\kappa_0} , \tag{A6}
$$

where

$$
\kappa_0 = \frac{\sqrt{\pi}\, e^2 f}{m_e c \Delta\nu_d} . \tag{A7}
$$

H(a, u) is expressed in terms of two dimensionless parameters. The first,

$$
a = \Gamma / 4\pi\Delta\nu_d , \tag{A8}
$$

gives the ratio of the natural line width to the Doppler width and depends on the damping constant, the frequency of the line centre, the temperature, and the atomic mass. Γ is the damping constant, the sum of the spontaneous emission rates. If the transition is from the ground state, this is

$$
\Gamma = \sum_{j=1}^{k} A_{kj} , \tag{A9}
$$

where k refers to the upper level. If the lower level is also an excited state,

$$
\Gamma = \sum_{i=1}^{l} \Gamma_{li} + \sum_{i=1}^{u} \Gamma_{ui} . \tag{A10}
$$

The second dimensionless argument of the Voigt function expresses the distance from the line centre in Doppler width units,

$$
u = \frac{\nu_r - \nu_o}{\Delta\nu_d} = \frac{\lambda_r - \lambda_o}{b\lambda_r/c} , \tag{A11}
$$

where $\lambda_o = c/\nu_o$, $\nu_r = \nu(1 + z)$ and $\lambda_r = \lambda/(1 + z)$ are the rest-frame frequency and wavelength, λ is the observed-frame wavelength, and z is redshift. Since $\Delta\nu_d$ is constant, u is symmetric in frequency space. Optical/UV spectra are generally plotted (and analysed) in wavelength space, which is what vpfit uses.
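To give a feel for the magnitudes involved, the sketch below evaluates Eqs. (A4), (A8) and the wavelength form of Eq. (A11) for H i Lyman-α. The atomic data used ($\lambda_o = 1215.6701$ Å, $\Gamma = 6.265 \times 10^{8}$ s$^{-1}$) are commonly quoted values adopted here as assumptions, not values taken from this paper, and the function names are illustrative.

```python
import math

C_KMS = 299792.458  # speed of light in km/s

def doppler_width_hz(b_kms, lam_o_angstrom):
    """Eq. (A4): Delta nu_d = b * nu_o / c, with nu_o = c / lambda_o."""
    nu_o = C_KMS * 1.0e13 / lam_o_angstrom  # c in Angstrom/s divided by Angstrom
    return b_kms * nu_o / C_KMS

def damping_parameter(gamma_per_s, delta_nu_d_hz):
    """Eq. (A8): a = Gamma / (4 * pi * Delta nu_d)."""
    return gamma_per_s / (4.0 * math.pi * delta_nu_d_hz)

def u_parameter(lam_obs, z, lam_o, b_kms):
    """Wavelength form of Eq. (A11): u = (lam_r - lam_o) / (b * lam_r / c)."""
    lam_r = lam_obs / (1.0 + z)
    return (lam_r - lam_o) / (b_kms * lam_r / C_KMS)

if __name__ == "__main__":
    lam_o, gamma, b = 1215.6701, 6.265e8, 15.0  # assumed Lyman-alpha data, b in km/s
    dnu = doppler_width_hz(b, lam_o)
    print(f"Delta nu_d = {dnu:.4e} Hz")
    print(f"a          = {damping_parameter(gamma, dnu):.3e}")
    print(f"u at 1 Angstrom (observed) redward of line centre, z = 3: "
          f"{u_parameter(lam_o * 4.0 + 1.0, 3.0, lam_o, b):.3f}")
```

For these inputs a is of order a few times $10^{-4}$, comfortably within the regime where a low order expansion in a converges quickly.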

A2 Should the Voigt function be formed in frequency space or in wavelength space?

This seemingly trivial question is in fact important5. Historically, the semi-classical Voigt function was derived in frequency space. vpfit uses frequency space (Eqs. (A4) and (A11)). However, formulations of the Voigt profile exist in which wavelength rather than frequency is the function variable. Examples of this are Whiting (1968), the analytic approximation of Tepper-García (2006), codes such as FITLYMAN (Fontana & Ballester 1995), and VoigtFit (Krogager 2018). Given the broad usage of the Voigt function, there may be many other codes that also use wavelength space, particularly in fields other than astrophysics.

5 Nikola Tesla reputedly said: “If you want to find the secrets of the Universe, think in terms of energy, frequency and vibration.”

A Voigt function in which the u parameter is defined as

$$
u = \frac{\lambda_r - \lambda_o}{b\lambda_o/c} \tag{A12}
$$

cannot share the same symmetry as a Voigt function using Eq. (A11). The symmetry difference increases with increasing distance from the line centre, as can be seen in the two damped Lyman α (DLA) Voigt profiles illustrated in Fig. A1. The red dashed line (using Eq. (A11)) is symmetric in frequency space and the orange dotted line (using Eq. (A12)) is symmetric in wavelength space. They cannot both correctly represent Nature, and the following examples suggest that frequency is the fundamental quantity:

(i) The energy level spacings in an atom define transition frequencies. The natural line width of a transition is given by the lifetimes of the upper and lower states, as required by the Heisenberg Uncertainty principle, and thus is defined by the damping constant Γ in Hertz (Eqs. (A9) and (A10)).

(ii) Wavelength, not frequency, changes when light crosses a refractive index boundary. This is obvious from de Broglie’s wave equation, λ = h/p; as a photon’s momentum changes, so does its wavelength.

(iii) Noether’s Theorem shows that invariance under time translation leads to the principle of conservation of energy. Changing the frequency in the same reference frame would violate energy conservation.

These considerations illustrate that frequency is the more fundamental quantity and hence that the correct formulation of H(a, u) should be in frequency space; Eq. (A11) should always be used. Eq. (A12) should not be used.
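The different symmetries of Eqs. (A11) and (A12) are easy to see numerically. The short sketch below evaluates both definitions of u at equal wavelength offsets either side of line centre; the Lyman-α wavelength and the b value are illustrative assumptions, as are the function names.

```python
C_KMS = 299792.458  # speed of light in km/s

def u_frequency_form(lam_r, lam_o, b_kms):
    """Eq. (A11), written in wavelength: u = (lam_r - lam_o) / (b * lam_r / c)."""
    return (lam_r - lam_o) / (b_kms * lam_r / C_KMS)

def u_wavelength_form(lam_r, lam_o, b_kms):
    """Eq. (A12): u = (lam_r - lam_o) / (b * lam_o / c)."""
    return (lam_r - lam_o) / (b_kms * lam_o / C_KMS)

if __name__ == "__main__":
    lam_o, b, offset = 1215.6701, 15.0, 30.0  # Angstrom, km/s, Angstrom (assumed values)
    for name, func in (("A11", u_frequency_form), ("A12", u_wavelength_form)):
        blue = func(lam_o - offset, lam_o, b)
        red = func(lam_o + offset, lam_o, b)
        print(f"Eq. ({name}): |u_blue| = {abs(blue):9.3f}  |u_red| = {abs(red):9.3f}")
```

With Eq. (A12) the two magnitudes are equal, so the profile is symmetric in wavelength; with Eq. (A11) the blue side has the larger |u|, and the difference grows with distance from line centre, as in the damping wings of Fig. A1.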

Searches for spacetime variations of fundamental constants require placing constraints on tiny shifts in spectroscopic transitions. At lower column densities (relevant for high redshift varying fine structure constant measurements using heavy element absorption lines), the symmetry difference between the $u_\nu$ and $u_\lambda$ profiles is small, although it increases with column density. This can be illustrated by comparing line centroids,

$$
{\rm Centroid}_{H\nu,H\lambda} = \frac{\int_{-\infty}^{\infty} \lambda\, (1 - I_{H\nu,H\lambda})\, d\lambda}{\int_{-\infty}^{\infty} (1 - I_{H\nu,H\lambda})\, d\lambda} , \tag{A13}
$$

where $I_{H\nu,H\lambda}$ represents the model intensity calculated either using the Voigt function defined in frequency space, or the Voigt function defined in wavelength space. The centroid shift between Lyman α lines calculated both ways is about $1.3 \times 10^{-5}$ Å for $N_{\rm HI} = 10^{14}$ atoms cm$^{-2}$, increasing to 0.11 Å for $N_{\rm HI} = 10^{20}$ atoms cm$^{-2}$ (both computed using a representative b-parameter of 15 km s$^{-1}$, although these shifts are relatively insensitive to b). A centroid shift of $1.3 \times 10^{-5}$ Å, assuming $z_{\rm abs} = 3$, means that Eq. (A12) gives a line position that is wrong by $\sim 1$ m/s. The present best-case fractional uncertainty on the measured redshift of a heavy element absorption line is $\sim 1 \times 10^{-7}$, which is 30 times larger. Nevertheless, Eq. (A12) should be avoided since systematic biases could accumulate over large statistical samples of high signal to noise measurements.
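The arithmetic behind these numbers can be reproduced as follows. This is a sketch under an explicit assumption: the centroid shift is compared with the observed-frame Lyman-α wavelength at $z_{\rm abs} = 3$ (the text does not state the frame explicitly), and the Lyman-α rest wavelength of 1215.6701 Å is an adopted value.

```python
C_MS = 299_792_458.0  # speed of light in m/s

def shift_to_velocity(delta_lambda, lam):
    """Velocity equivalent of a small wavelength shift: v = c * dlam / lam."""
    return C_MS * delta_lambda / lam

if __name__ == "__main__":
    lam_obs = 1215.6701 * (1.0 + 3.0)              # assumed: observed Lyman-alpha at z = 3
    v_shift = shift_to_velocity(1.3e-5, lam_obs)   # centroid shift quoted in the text
    v_best = 1.0e-7 * C_MS                         # fractional uncertainty of 1e-7
    print(f"centroid shift      ~ {v_shift:.2f} m/s")
    print(f"best-case precision ~ {v_best:.1f} m/s")
```

Under this assumption the shift is slightly below 1 m/s and the best-case precision is about 30 m/s, consistent with the orders of magnitude quoted above.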

A3 The Kramers-Heisenberg-Thermal (KHT) profile

The Voigt function treats each atomic transition in isolation as a classical 2-level damped harmonic oscillator, ignoring interactions associated with other discrete levels, and ignoring transitions from continuum energy states to other bound levels. Despite this, the Voigt model is very accurate except at higher column densities, where interactions between multiple discrete levels and with continuum transitions become important.

Deficiencies of the Voigt profile in some applications have been widely discussed. Tennyson et al. (2014) discuss an alternative to the Voigt profile, the Hartmann–Tran profile, which some authors recommend for high resolution spectroscopy, e.g. Schreier (2017). Lee (2013) and earlier papers by the same authors have highlighted significant departures between the semi-classical Voigt profile and the quantum mechanical Kramers-Heisenberg profile. However, unlike the Voigt profile, the Kramers-Heisenberg profile does not include thermal broadening, prompting Lee et al. (2020) to develop the Kramers-Heisenberg-Thermal profile (KHT). One cannot, of course, define an exact column density below which Voigt is accurate and above which KHT should be used, as this depends on data quality and the required precision. However, as a rule-of-thumb, one should be wary of applying the Voigt profile to column densities above $\sim 10^{20}$ atoms cm$^{-2}$.

Fig. A1 provides a comparison between the Voigt profiles (derived in frequency and wavelength space) and the KHT profile, for a column density of $10^{22}$ atoms cm$^{-2}$. The KHT profile has been implemented in vpfit v12.2.

APPENDIX B: FURTHER NUMERICAL CONSIDERATIONS

B1 The importance of sub-binning in practical calculation

When computing Eq. (A1), it is essential to do so using a fine grid, usually much finer than the original spectral data. The sub-bin pixel size must be small enough for the shape of the absorption profile to be effectively linear over the wavelength range spanned by a single sub-pixel. If this is not done (i.e. if instead the profile is simply evaluated at the centre of a large pixel), the opacity model will be wrong. In practice, vpfit assigns a default value but this can be user-defined if required.
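The size of the effect can be illustrated with a toy calculation. The sketch below assumes a pure Doppler line with unit continuum, a velocity grid in km/s, and a simple midpoint average over sub-pixels; the line parameters and pixel width are arbitrary choices for illustration, not vpfit defaults.

```python
import math

def profile(v, b=3.0, tau0=5.0):
    """Unconvolved intensity I = exp(-tau0 * exp(-(v/b)^2))."""
    return math.exp(-tau0 * math.exp(-(v / b) ** 2))

def pixel_value(v_centre, width, n_sub):
    """Mean of the profile over one pixel, using n_sub midpoint samples.

    n_sub = 1 is equivalent to evaluating the profile at the pixel centre only.
    """
    step = width / n_sub
    start = v_centre - 0.5 * width + 0.5 * step
    return sum(profile(start + k * step) for k in range(n_sub)) / n_sub

if __name__ == "__main__":
    width = 3.0                               # pixel width in km/s (assumed)
    reference = pixel_value(0.0, width, 5000)  # finely sub-binned reference value
    for n_sub in (1, 2, 5, 20):
        err = abs(pixel_value(0.0, width, n_sub) - reference)
        print(f"n_sub = {n_sub:2d}  |error| = {err:.2e}")
```

For a pixel centred on the line core, evaluating only at the pixel centre underestimates the mean intensity, and the error falls rapidly as the number of sub-pixels increases.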

Figure A1. Damped Lyman-α profile with $\log N_{\rm HI} = 22.0$ and b = 15 km s$^{-1}$. Blue continuous line: Kramers-Heisenberg-Thermal profile (Lee et al. 2020). Red dashed profile: Voigt profile using $u_\nu = (\nu - \nu_o)/\Delta\nu_d$ and $\Delta\nu_d = b\nu_o/c$. Orange dotted line: Voigt profile using $u_\lambda = (\lambda_o - \lambda)/(b\lambda_o/c)$.

B2 Parameter update bounds

When the model parameters are far from the best-fit solution, the parameter update vector p may be unstable, such that one or more parameters are driven even further away from the correct solution. The simple solution to this problem is to constrain each parameter update to be within some upper bound. Empirically, the most unstable parameter is probably log N. When bad blending/ill-conditioning occurs, this parameter can be poorly constrained and, if left unbounded, can shoot to very low or implausibly high values. The problem can be avoided, i.e. stability can be improved, by limiting the maximum step at any vpfit iteration. Similar effects can occur for the b-parameter and ∆α/α. Parameter ties amongst several species (Carswell & Webb 2020) help to avoid these problems (for b and of course for ∆α/α), but not always. Overall parameter bounds are already included in vpfit v12.3. As of vpfit v12.3, maximum step sizes (within each vpfit iteration) can also be user-defined, or default presets can be used.
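The idea of limiting the step can be sketched in a few lines. This is a generic illustration and not the vpfit implementation: the function names, the per-parameter step limits and the overall bounds are all hypothetical values chosen for the example.

```python
def limit_step(step, max_step):
    """Clip each component of the update vector p to the range [-max_step, +max_step]."""
    return [max(-m, min(m, s)) for s, m in zip(step, max_step)]

def apply_update(params, step, max_step, bounds):
    """Apply a clipped step, then keep every parameter inside its overall bounds."""
    updated = []
    for x, s, (lo, hi) in zip(params, limit_step(step, max_step), bounds):
        updated.append(min(hi, max(lo, x + s)))
    return updated

if __name__ == "__main__":
    params = [13.2, 8.0]                    # hypothetical (log N, b in km/s)
    step = [-6.0, 0.4]                      # an unstable log N update from an ill-conditioned fit
    max_step = [0.5, 2.0]                   # hypothetical per-iteration limits
    bounds = [(8.0, 22.0), (0.5, 100.0)]    # hypothetical overall parameter bounds
    print(apply_update(params, step, max_step, bounds))
```

In this example the log N update is truncated to the maximum allowed step while the well-behaved b update passes through unchanged.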

B3 Sensitivity of best-fit model parameters to initial parameter guesses

For a single-component unblended absorption system, provided stopping criteria are appropriately set, there should be no sensitivity of the final best-fit model to initial parameter guesses. However, for multiply blended absorption components, this is not the case and in general it is possible to find multiple models that give similarly statistically acceptable representations of the data.

ai-vpfit is an artificial intelligence version of vpfit that builds on the work of Bainbridge & Webb (2017a,b) and is described in Lee et al. (2021b). It employs a Monte Carlo method for generating first guess parameters for each absorption component fitted in an absorption complex. Independent runs of ai-vpfit on the same spectrum use different random number seeds for trial absorption component placement. This means that model construction progresses differently in each ai-vpfit run, emulating the different approaches that would be taken by different humans doing the job interactively using vpfit. By running ai-vpfit multiple times, Lee et al. (2021a) explored the sensitivity of fine structure constant (α) measurements to the initial parameter guesses; multiple ai-vpfit runs end up with slightly different final answers, revealing sub-structure in χ²-parameter space. In other words, multiple ai-vpfit runs reveal model non-uniqueness. Lee et al. (2021a) show that the final best-fit values of α may indeed depend on the manner in which models are constructed.

The degree of model non-uniqueness was shown to depend on the particular Information Criterion used (the corrected Akaike Information Criterion (AICc), the Bayesian Information Criterion (BIC), or the Spectroscopic Information Criterion (SpIC)), as well as the physical line broadening mechanism used (turbulent, thermal, or compound broadening). Since only two high redshift quasar absorption systems were studied in Lee et al. (2021a), the extent to which these things are generally true awaits a larger study.

In the context of vpfit usage, it should therefore be assumed that the best-fit model parameters could depend on the way in which the model for an absorption complex comprising multiple components is constructed.

B4 Information criteria; avoiding over- orunder-fitting

Over-fitting with too many model parameters creates spu-rious line bends, in turn causing potentially biased resultsand unnecessarily increased parameter errors. Under-fittingresults in artificially small parameter uncertainties but in-creased scatter (Wilczynska et al. 2015). Using the value ofbest-fit χ2 as an indicator of the appropriate number of freeparameters to use for any model is unreliable for two rea-sons: (a) χ2 does not minimise as a function of the numberof free parameters so only a lower limit on the number offree parameters can be obtained, and (b) in any case, it isgenerally the case that the spectral error array is imperfectsuch that the absolute value of χ2 is uncertain. For thesereasons, the appropriate number of free parameters shouldbe selected using an Information Criterion (IC). Using anIC provides an objective and reproducible method for im-plementing the “Principle of Parsimony”, i.e. of using thecharacteristics of the data to find an optimal balance be-tween over- and under-fitting.

A detailed discussion about IC methods for deciding on the appropriate number of free model parameters for any absorption complex is given in Webb et al. (2021). That paper compares three ICs: BIC, AICc, and a new information criterion designed specifically for spectroscopy, SpIC. Webb et al. (2021) showed that BIC does not work as well as either AICc or SpIC for absorption system modelling and that SpIC seems to offer significant advantages over AICc. SpIC has been implemented in ai-vpfit (Lee et al. 2021b).
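As an illustration of how an IC penalises additional parameters, the following minimal sketch evaluates the standard textbook forms of AICc and BIC for Gaussian errors, written in terms of χ². It is not taken from vpfit or ai-vpfit: the function names and the candidate models (number of free parameters and best-fit χ² values) are hypothetical, and SpIC is omitted because its definition is given in Webb et al. (2021) and is not reproduced here.

```python
import math


def aicc(chi2: float, k: int, n: int) -> float:
    """Corrected Akaike Information Criterion: chi2 + 2k + 2k(k+1)/(n-k-1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    return chi2 + 2 * k + 2 * k * (k + 1) / (n - k - 1)


def bic(chi2: float, k: int, n: int) -> float:
    """Bayesian Information Criterion: chi2 + k ln(n)."""
    return chi2 + k * math.log(n)


# Hypothetical candidate models for one absorption complex:
# (number of free parameters k, best-fit chi2). Values are illustrative only.
candidates = [(6, 262.0), (9, 214.0), (12, 211.5)]
n_pix = 200  # number of fitted pixels (assumed)

lowest_chi2 = min(candidates, key=lambda m: m[1])
best_aicc = min(candidates, key=lambda m: aicc(m[1], m[0], n_pix))
best_bic = min(candidates, key=lambda m: bic(m[1], m[0], n_pix))

for k, chi2 in candidates:
    print(k, chi2, round(aicc(chi2, k, n_pix), 2), round(bic(chi2, k, n_pix), 2))
```

In this invented example χ² keeps decreasing as parameters are added, so χ² alone would favour the 12-parameter model, whereas both criteria select the 9-parameter model because the small further reduction in χ² does not justify three extra parameters.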


B5 Spatial segregation of species

The process of tying redshift parameters between different atomic (or molecular) species raises the question of spatial segregation between different ionisation states or between different elements. The question is of particular importance in the context of measurements of the fine structure constant at high redshift. The Many Multiplet method (Dzuba et al. 1999; Webb et al. 1999) has been used to search for spectroscopic wavelength shifts between multiple atomic species that may be caused by spacetime variation of the fine structure constant α. If different species do not cohabit the same spatial location and if no account is taken of this, measurements of any α variation could be biased. If only a single species is used (e.g. Fe ii), the problem does not exist.

Fig. B1 provides a graphical representation of species seen in quasar absorption systems, illustrating the range of ionisation potentials. To take one (extreme) example, C i and C iv have IPs of 11.3 and 64.5 eV and their relative strengths therefore vary substantially in different physical locations within a galaxy halo, according to local ionisation and other parameters. Should spatial segregation exist between species of widely different atomic number due to gravitational effects, e.g. carbon and zinc, this too could mimic a varying α.

vpfit correctly accommodates these effects when redshift parameters for the two species are tied because column density parameters are free to iterate. If, for example, there is very little C iv in a velocity component exhibiting C i absorption, vpfit will reduce the C iv column density (and discard the line if it falls below the detection threshold). Moreover, any missing components at slightly different redshifts will be discovered by a system like ai-vpfit (Lee et al. 2021b). To summarise the main points of the previous discussion:

1. No bias is expected even if all ionisation states are tied at the same redshift;
2. Free (untied) column density parameters permit species to be discarded if appropriate;
3. Where visibly different velocity structures exist, e.g. for C iv and C i, in principle there is no good reason not to tie redshifts in an initial fit because when column densities iterate to a value below the nominal detection thresholds, vpfit “naturally” allows for this and effectively unties initially tied components.

In other words, potential spatial segregation of species is properly accounted for when redshifts are tied, simply because column density parameters are allowed to vary freely.

B6 Parameter errors from the Hessian

Informative discussions on parameter error estimation may be found in Bevington & Robinson (2003); Dahlquist & Bjorck (1974); Gill et al. (1981); Press et al. (2007); Irwin (1985); Lampton et al. (1976). Once a final model has been found, the unmodified Hessian, Eq. (10), is used to derive meaningful parameter uncertainties from the Hessian diagonals. The parameter covariance matrix V is then obtained from the inverse of the Hessian matrix (Fisher 1958),

\[ \mathbf{V}(\mathbf{x}) = \mathbf{G}(\mathbf{x})^{-1} . \tag{B1} \]

Parameter uncertainties derived from the covariance matrix of course provide no information about additional systematic errors that may be present in real data. If off-diagonal terms in the covariance matrix are zero, the errors in the parameters are independent and $V(\mathbf{x})_{qq}$ is the estimated variance of the qth parameter. Generally however, the off-diagonal terms are non-zero and some level of correlation exists between parameters. The usual approximation is to ignore such correlations, to quote (for simplicity) only diagonal terms as approximate uncertainties, and to check their validity using Monte Carlo calculations and synthetic spectra. This has been done many times for vpfit models and in all cases the covariance matrix errors are found to be consistent with (or slight over-estimates compared to) Monte Carlo results (Webb 1987; Murphy 2002; King et al. 2009; King 2010).
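The relationship between the Hessian, the covariance matrix, and the quoted uncertainties can be sketched with a toy problem. The example below is not vpfit code: it assumes a straight-line model fitted to five pixels with unit errors, and it assumes the Hessian takes the Gauss-Newton form G = JᵀWJ (J being the matrix of model derivatives and W the inverse-variance weights), which may differ in detail from Eq. (10). All variable names are illustrative.

```python
import numpy as np

# Toy model: flux = a + b * x, so the derivatives with respect to (a, b) are (1, x).
x = np.arange(5.0)
sigma = np.ones_like(x)                       # assumed 1-sigma error per pixel
J = np.column_stack([np.ones_like(x), x])     # model derivatives, one column per parameter

# Assumed Gauss-Newton form of the Hessian: G = J^T W J with W = diag(1 / sigma^2).
G = J.T @ (J / sigma[:, None] ** 2)

# Covariance matrix as the inverse of the Hessian.
V = np.linalg.inv(G)

# Quoted uncertainties: square roots of the diagonal terms only.
errors = np.sqrt(np.diag(V))

# Off-diagonal terms, normalised to correlation coefficients.
corr = V / np.outer(errors, errors)

print(V)        # [[ 0.6 -0.2] [-0.2  0.1]]
print(errors)
print(corr[0, 1])
```

Even in this simple case the off-diagonal term is far from zero (the intercept and slope are anti-correlated with a coefficient of about −0.82), which illustrates why quoting only the diagonal terms is an approximation that should be checked, for example with Monte Carlo calculations.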

B7 Importance of fitting range selection

Modelling absorption lines requires a decision as to how much of the flanking continuum regions is included in the fit. The best-fit model parameters are sensitive to this. Wilczynska et al. (2015) carried out a detailed study of 23 absorption systems, comparing the results from different models obtained using varying amounts of continuum flanking regions. The overall results for the sample showed that if the continuum flanking regions were too small, increased scatter was found, i.e. additional measurement errors were introduced.

As far as we know, this effect has not been quantified beyond the Wilczynska et al. (2015) measurements but the implications are obvious: spectral fitting segments should be selected such that they are flanked by line-free regions either side. Damped absorption lines were not looked at in the Wilczynska et al. (2015) study. For non-damped lines, a reasonable rule-of-thumb would be that these regions should be no less than the width of an individual absorption line and preferably larger.

B8 Correcting spectral error arrays

During the numerical procedures used in taking multiple raw astronomical exposures from the telescope to a usable co-added one-dimensional spectrum, re-binning to a common wavelength grid (and other processes) result in small pixel-to-pixel correlations in the final spectrum. It is well known that these effects create a slight smoothing of the data, biasing the value of χ² to smaller values. In principle, this could be accommodated by correcting the derivatives of χ² with respect to free variables and hence modifying the Hessian matrix and gradient vector accordingly, as discussed in Irwin (1985). In practice however, this is difficult to implement (because the required noise covariance matrix is generally unknown). The reader is referred to a comprehensive discussion on this problem, and solutions to it, in Appendix B (“Re-binning and combining spectra”) in the vpfit documentation (Carswell & Webb 2020).
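The smoothing effect can be demonstrated with a deliberately exaggerated toy simulation. The sketch below is not the procedure described in the vpfit documentation: it assumes pure Gaussian noise about a perfectly known model, re-binned by linear interpolation onto a grid shifted by half a pixel (so each new pixel is the mean of two neighbours), with the error array left uncorrected. The seed, array size, and variable names are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(42)
n_pix = 20000
sigma = 1.0

# Residuals (data minus true model) on the original grid: uncorrelated noise.
noise = rng.normal(0.0, sigma, n_pix)

# Linear interpolation onto a grid offset by half a pixel:
# each re-binned pixel is the average of two adjacent original pixels.
rebinned = 0.5 * (noise[:-1] + noise[1:])

# chi2 per pixel, using the ORIGINAL (uncorrected) error array in both cases.
chi2_per_pix_original = np.mean((noise / sigma) ** 2)
chi2_per_pix_rebinned = np.mean((rebinned / sigma) ** 2)

# Correlation coefficient between neighbouring re-binned pixels.
neighbour_corr = np.corrcoef(rebinned[:-1], rebinned[1:])[0, 1]

print(chi2_per_pix_original, chi2_per_pix_rebinned, neighbour_corr)
```

In this extreme case the re-binned noise variance is halved and adjacent pixels acquire a correlation coefficient close to 0.5, so χ² computed with the uncorrected error array is biased low by about a factor of two. Real re-binning shifts are usually smaller and the resulting bias correspondingly milder.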

B9 Linear distortion

The first check for the presence of potential wavelength distortion effects in echelle quasar spectra was made by Molaro et al. (2008). Whilst that study did not reveal distortions, improved data showed that for a single exposure, distortion could be identified and that it could be modelled reasonably well using a simple linear function in velocity space (Rahmani et al. 2013). Subsequently, Whitmore & Murphy (2014) applied a simple linear correction to UVES measurements of the fine structure constant by King et al. (2012). However, a simple linear distortion function is inappropriate in general; the long integration times required to obtain an acceptable signal to noise for quasar spectroscopy require multiple exposures, often taken using different central wavelength settings. This means that in practice, any realistic distortion function is a complex combination of shifted linear functions and rarely resembles a simple linear relationship. For this reason, a detailed analysis was reported in Dumont & Webb (2017), describing more appropriate distortion functions, determined on a case by case basis.

[Figure B1: horizontal axis, Ionisation Energy (eV), logarithmic, with tick labels at 10^1 and 10^2; vertical axis, Species: 1H, 6C, 7N, 8O, 12Mg, 13Al, 14Si, 16S, 22Ti, 24Cr, 25Mn, 26Fe, 28Ni, 30Zn; markers I to VI for each element (I only for H).]

Figure B1. Representation of ionisation potentials including absorption species seen in quasar absorption systems. The y-axis shows the element and atomic number. ‘I’ indicates the energy above which one electron is lost, ‘II’ indicates the energy above which two electrons are lost, etc. Figure kindly provided by Vincent Dumont.

The Dumont & Webb (2017) distortion modelling was carried out as an external analysis, separate from the vpfit process. The main disadvantage of externally solving for distortion was one of calculation time; each quasar absorption system required a large set of models, minimising the overall system χ² as a function of the distortion model parameters. The Dumont & Webb (2017) model has now been implemented within vpfit such that all parameters, i.e. absorption parameters as well as distortion parameters, are solved for simultaneously. The approximation made is that the velocity shift pattern can be described by a single parameter γ, the slope of the linear distortion model,

\[ v_{{\rm dist},i}(\lambda) = \gamma\,(\lambda - \lambda_{{\rm cent},i}) , \tag{B2} \]

where $\lambda_{{\rm cent},i}$ is the central wavelength of the ith exposure. Whilst this approximation is somewhat unsatisfactory, we have little choice because a model adopting one slope for each exposure used in the co-added spectrum would create modelling degeneracies. The overall velocity shift is the weighted linear combination of $v_{{\rm dist},i}(\lambda)$,

\[ v_{\rm net} = \sum_i \frac{\sqrt{T_i}\, v_{{\rm dist},i}(\lambda)}{\sqrt{T_i}} , \tag{B3} \]

where $\sqrt{T_i}$ is the square root of the exposure time of the observational segment, thus playing the role of the weight for the ith exposure. The distortion slope parameter γ is taken as a free fitting parameter and its statistical uncertainty is derived from the Hessian matrix at the best fit in vpfit.
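A minimal sketch of how a single slope γ produces a net shift for a co-added spectrum is given below. It is not the vpfit implementation. The per-exposure term follows Eq. (B2); the combination uses √T_i as weights, and dividing by the sum of the weights (so that the result is a weighted mean) is an assumption of this sketch. The function names, the units (γ in m/s per Å), and the numerical values are hypothetical.

```python
import numpy as np


def v_dist(wavelength, gamma, lambda_cent):
    """Eq. (B2): linear velocity distortion for one exposure."""
    return gamma * (np.asarray(wavelength, dtype=float) - lambda_cent)


def v_net(wavelength, gamma, lambda_cents, exposure_times):
    """sqrt(T_i)-weighted combination of the per-exposure distortions.

    Normalising by the sum of the weights is an assumption of this sketch.
    """
    weights = np.sqrt(np.asarray(exposure_times, dtype=float))
    shifts = np.array([v_dist(wavelength, gamma, lc) for lc in lambda_cents])
    return weights @ shifts / weights.sum()


# Hypothetical set-up: two exposures with different central wavelengths.
gamma = 0.2                       # m/s per Angstrom (illustrative)
lambda_cents = [4000.0, 5000.0]   # Angstrom
exposure_times = [3600.0, 900.0]  # seconds, giving weights 60 and 30

shift_at_4500 = v_net(4500.0, gamma, lambda_cents, exposure_times)
shift_profile = v_net(np.array([4000.0, 4500.0, 5000.0]), gamma,
                      lambda_cents, exposure_times)
print(shift_at_4500)   # (60 * 100 + 30 * (-100)) / 90 = 33.33...
print(shift_profile)
```

Because every exposure shares the same slope, the net shift under this assumption is itself linear in wavelength, crossing zero at the weighted mean central wavelength. Only γ therefore needs to be carried as a free parameter alongside the absorption line parameters.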

This paper has been typeset from a TEX/LATEX file prepared by the author.
