
Improvements in Computational Efficiency for CBC Analysis of Advanced Gravitational Wave Detector Data

James Michael Bell 1,2

1 Millsaps College, 1701 N. State Street, Jackson, MS 39210, USA; 2 Nikhef, Science Park 105, 1098 XG Amsterdam, The Netherlands

(Dated: August 14, 2013)

Bayesian inference algorithms have been adopted to perform parameter estimation and model selection for gravitational wave data from compact binary coalescence (CBC). However, their high precision comes at a high computational cost, which is especially apparent when analyzing signals from low mass systems. Here we show that parallelizing the Nested Sampling algorithm is a viable means of reducing the required computational time and discuss the optimal settings necessary to do so. We also produce an algorithm that variably resolves the frequency domain of an inspiral signal, reducing the time required to produce the waveform by an order of magnitude, and explore its applicability to the time domain. Both methods produced promising preliminary results that will lead to further optimization and implementation into the analysis software for advanced detector gravitational wave data.

I. INTRODUCTION

Mankind has long observed the cosmos and sought to understand the underlying principles and mechanisms that govern the universe. Until now, our observations have been of the electromagnetic spectrum, conventionally classified by names such as radio, microwaves, infrared light, visible light, ultraviolet light, x-rays and gamma rays. Each of these forms of electromagnetic radiation has led to the discovery of previously unknown phenomena, with consequences reaching far beyond the physics community, such as the development of communication and medical technology. However, each of these forms of radiation comes from the same spectrum. A remaining prediction of General Relativity is expected to open a new window on the universe in the form of a completely separate spectrum of radiation, one which will forever change the way the universe is observed: gravitational waves. The first gravitational wave detection will undoubtedly prove equal in historical significance to the invention of the telescope and produce many unexpected results, furthering our understanding of the nature of the universe.

In the early twentieth century, physicist Albert Einstein developed the Theory of Relativity. Until then, space, time, matter and energy were all considered separate entities. The theory suggests instead that space and time are manifestations of the four-dimensional reality in which we live, known now as "spacetime", and that mass and energy are equivalent notions. Relativity arises from the principle that the speed of light is constant and concludes that the gravitational interactions we experience are not due to an independent force, but rather to the curved geometry of spacetime. The geometry of spacetime, far from static, is capable of fluctuating in a manner similar to the surface of water. Just as ripples are produced when a stone is thrown into a pond, some physical systems are capable of sending ripples in spacetime across the universe, which we denote gravitational waves.

Physicists from the Virgo and LIGO (Laser Interferometer Gravitational-Wave Observatory) collaborations have produced specialized Michelson interferometers designed to detect the path length difference of their two detector arms by using high-power lasers. The premise behind these massive rulers is that a passing gravitational wave will cause a fluctuation in the spacetime metric, which will change the length of one detector arm with respect to the other and produce recognizable patterns in the output. Currently, these devices are undergoing an upgrade to their second generation, "advanced" configuration, and it is expected that the resulting boost in sensitivity will enable physicists to measure the first ever gravitational wave signals and open this new window on the universe.

The first signals detected in ground based detectors are expected to come from coalescing compact binaries consisting of black holes (BBH), neutron stars (BNS) or both (NSBH). This is due to the strong quadrupole moment possessed by such systems, which gives off more gravitational radiation than any other source. The detection rate for such interactions with second generation detectors is estimated at 40 events per year; however, the possible range is from 0.04 to 400 per year [2]. As the distances over which we may observe gravitational radiation increase, these numbers will grow. Also, as the detectors improve, it is expected that we will eventually be capable of observing burst signals from supernovae and various stellar dynamical processes, and gain access to new standard candles for measuring cosmological distances more accurately. Initial LIGO and Virgo detectors were capable of measuring a signal in the visible band for approximately 30 seconds, while the advanced configurations will allow a signal to be visible for over 3 minutes. This is due to the detection of lower frequencies through improved detector technology, thus increasing the chirp time [9] of the wave. The subsequent data available from longer waveforms will allow for more accurate parameter estimation. However, the cost associated with the increased volume of data is the increase in time required to analyze it.

The framework currently used to analyze the data from gravitational wave signals is known as the LIGO Algorithm Library suite, or LALSuite, which, among other things, houses the Bayesian inference software LALInference for signal analysis. Bayesian inference is a mathematically straightforward approach to performing data analysis and has been implemented to determine the viability of various hypotheses and perform parameter estimation by returning probability density functions representing the likelihood of various values. Examples of parameters in gravitational wave astronomy include sky position, masses, distance to the source, and the magnitude and direction of the associated spin for each object in the system. For every unique combination of these and other parameters, a unique gravitational wave can be produced. The downside to Bayesian inference is the expensive computational cost of integrating over a high-dimensional parameter space. To address this problem in gravitational wave analysis, a method known as Nested Sampling [7] has been implemented. It takes as inputs the likelihood and prior density functions, maps the parameter space to one dimension, returns the evidence integral and samples from the posterior distribution using Markov-Chain Monte Carlo (MCMC) integration methods (see [8]). For an overview of this method, see [4], and for details on the current implementation of this procedure in gravitational wave detection, see [6].

Further boosts in the computational efficiency of the LALInference implementation of Nested Sampling could improve the rate at which analysis of detector data is performed. This paper investigates two separate efficiency boosts to the LALInference software currently used by the Compact Binary Coalescence (CBC) group for data analysis of inspiral signals. First, we investigated an optimal parallelization of the implementation of the nested sampling algorithm and confirmed the accuracy of the results under further parallelization. We also achieved an increase in efficiency through variable resolution of the frequency domain. Current algorithms sample the frequency domain waveform at a regular spacing, meaning that at high frequencies the waveform is oversampled. We present an algorithm that breaks up the frequency domain of a waveform and samples each interval at its Nyquist time, decreasing the number of points to be processed. We also briefly discuss the theory, expected efficiency boost and potential sources of error of applying a similar approach to the time domain.

The paper is organized as follows: in Section II we provide a review of concepts pertinent to gravitational wave physics and binary inspiral signals; in Section III we describe Bayesian inference and how it is used to extract and analyze data; Section IV describes the Nested Sampling algorithm and elaborates on its implementation in LALInference; Section V presents the procedure and results of the parallelization performed; and Section VI contains a detailed description of the variable resolution algorithm as well as some preliminary results.

II. GRAVITATIONAL THEORY

A. General Relativity

General Relativity is a metric theory of gravity that expresses gravitational interactions through the geometric curvature of spacetime. In flat, Minkowski spacetime, we can denote a line element representing the displacement between two events in spacetime by the following:

ds^2 = -c^2 dt^2 + dx^2 + dy^2 + dz^2. (1)

This can be generalized to systems of arbitrary curvature described by any particular coordinate system through the introduction of the metric tensor, g_{\mu\nu}. The components of this tensor define the relationship between ds^2 and the coordinate directions dx^\mu and dx^\nu:

ds^2 = g_{\mu\nu} dx^\mu dx^\nu. (2)

From this relationship, it is clear that the metric tensor defines lengths in arbitrary coordinate systems.

General Relativity relates the stress-energy tensor, T_{\mu\nu}, to the Einstein tensor, G_{\mu\nu}, and thus to the lengths measured in an arbitrary coordinate system, through a system of second order differential equations called the Einstein field equations, which are

G_{\mu\nu} = R_{\mu\nu} - \frac{1}{2} g_{\mu\nu} R = \frac{8\pi G_N}{c^4} T_{\mu\nu}, (3)

where the Ricci tensor, R_{\mu\nu}, and the Ricci scalar, R, are contractions of the Riemann tensor. From this we can see that the Einstein tensor contains information about the curvature of a system as well as the lengths described by the metric tensor. Therefore, the physical significance of the field equations is that they connect the stress-energy density to the curvature of spacetime produced by an object. In short, the more massive an object, the greater the curvature it will produce and the stronger the gravitational interaction it will have with other objects.

B. Gravitational Radiation

The Einstein field equations can be linearized by considering a metric perturbation, h_{\mu\nu}, on a Minkowski metric:

g_{\mu\nu} = \eta_{\mu\nu} + h_{\mu\nu}, \qquad |h_{\mu\nu}| \ll 1. (4)

Writing the linearized Einstein field equations for the trace-reversed perturbation \bar{h}_{\mu\nu} in terms of the d'Alembertian operator, \Box \equiv \partial_\mu \partial^\mu, we find

\Box \bar{h}_{\mu\nu} + \eta_{\mu\nu} \partial^\rho \partial^\sigma \bar{h}_{\rho\sigma} - \partial^\rho \partial_\nu \bar{h}_{\mu\rho} - \partial^\rho \partial_\mu \bar{h}_{\nu\rho} = -\frac{16\pi G_N}{c^4} T_{\mu\nu}. (5)

From here, it is convenient to use the harmonic gauge, such that \partial^\nu \bar{h}_{\mu\nu} = 0, which simplifies the expression above to the form

\Box \bar{h}_{\mu\nu} = -\frac{16\pi G_N}{c^4} T_{\mu\nu}. (6)


Outside the source, where T_{\mu\nu} = 0, the field equations simplify even further to

\Box \bar{h}_{\mu\nu} = 0, (7)

which is immediately recognized as the wave equation

\left( -\frac{1}{c^2} \frac{\partial^2}{\partial t^2} + \nabla^2 \right) \bar{h}_{\mu\nu} = 0. (8)

Therefore, we see that Einstein's field equations give rise to waves which travel at the speed of light, and it is this wavelike nature that gives rise to the notion of gravitational waves.

1. CBC Inspiral Signals

Binary coalescence is theorized to be the first astrophysical source observed by advanced configuration gravitational wave detectors, and for this reason there is an enormous effort in the field of General Relativity to better understand their theoretical behavior. In this section, we focus on deriving the form of a binary inspiral signal from the Newtonian order approximation and present a formula for the frequency of a gravitational wave as a function of time for later use.

FIG. 1: A circularly orbiting binary system

Observing the system described in figure 1, we see that it consists of two masses, m_1 and m_2, separated by a distance R, orbiting their common center of mass in the counterclockwise direction. The orbital plane intersects the x-axis, making an angle \iota with the observer's line of sight along the z-axis. Additionally, we see that the objects are not rotating themselves, leaving us with a total of nine parameters. Spinning objects would require 3 parameters each to describe their rotations, namely a magnitude and two angles describing orientation, raising the total number of parameters in such cases to 15. The positions of the masses as a function of time, denoted \vec{x}_1(t), \vec{x}_2(t), are given by

\vec{x}_1 = \frac{m_2}{m_1 + m_2} R \, \vec{e}(t) = \frac{\mu}{m_1} R \, \vec{e}(t) (9)

\vec{x}_2 = -\frac{m_1}{m_1 + m_2} R \, \vec{e}(t) = -\frac{\mu}{m_2} R \, \vec{e}(t), (10)

where \vec{e}(t) = (\cos(\omega t), \, \sin(\omega t)\cos(\iota), \, \sin(\omega t)\sin(\iota)) and \mu = m_1 m_2 / (m_1 + m_2) is the reduced mass.

The spatial tensor representing the second moment of the mass distribution, or quadrupole moment, can be written as

M^{ij} = \frac{1}{c^2} \int T^{00}(t, \vec{x}) \, x^i x^j \, d^3x = \int \rho(t, \vec{x}) \, x^i x^j \, d^3x, (11)

where we rewrite T^{00}/c^2 as the mass distribution \rho(t, \vec{x}), which can be expressed by

\rho(t, \vec{x}) = m_1 \, \delta^3\!\left( \vec{x} - \frac{\mu}{m_1} R \, \vec{e}(t) \right) + m_2 \, \delta^3\!\left( \vec{x} + \frac{\mu}{m_2} R \, \vec{e}(t) \right). (12)

Likewise, the strain amplitude of a gravitational wave measured in the approximately flat geometry far from the source [1] is

h^{ij} = \frac{2}{r} \frac{G_N}{c^4} \frac{d^2 M^{ij}(t)}{dt^2}. (13)

Since the system studied here is positioned such that the observer lies along the z-axis, the spatial components of the amplitude tensor are given by

h^{TT}_{ij} = \begin{pmatrix} h_+ & h_\times & 0 \\ h_\times & -h_+ & 0 \\ 0 & 0 & 0 \end{pmatrix}, (14)

where h_+ and h_\times represent the two independent polarizations. In terms of the spatial tensor, the +-polarization, h_+, and the \times-polarization, h_\times, are given by

h_+ = \frac{1}{r} \frac{G_N}{c^4} \left( \ddot{M}^{11} - \ddot{M}^{22} \right) (15)

h_\times = \frac{2}{r} \frac{G_N}{c^4} \ddot{M}^{12}. (16)

Therefore the relevant components of the spatial tensor are given by

M^{11} = \mu R^2 \cos^2(\omega t) (17)

M^{22} = \mu R^2 \sin^2(\omega t) \cos^2(\iota) (18)

M^{12} = \mu R^2 \cos(\omega t) \sin(\omega t) \cos(\iota). (19)

Substituting their second derivatives,

\ddot{M}^{11} = -2 \mu R^2 \omega^2 \cos(2\omega t) (20)

\ddot{M}^{22} = 2 \mu R^2 \omega^2 \cos(2\omega t) \cos^2(\iota) (21)

\ddot{M}^{12} = -2 \mu R^2 \omega^2 \sin(2\omega t) \cos(\iota), (22)


into the polarization equations 15 and 16, we find that the polarizations take the following form:

h_+ = -\frac{4}{r} \frac{G_N}{c^4} \mu R^2 \omega^2 \, \frac{1 + \cos^2(\iota)}{2} \cos(2\omega t) (23)

h_\times = -\frac{4}{r} \frac{G_N}{c^4} \mu R^2 \omega^2 \cos(\iota) \sin(2\omega t). (24)

By introducing the chirp mass, given by

\mathcal{M}_c = \frac{(m_1 m_2)^{3/5}}{(m_1 + m_2)^{1/5}}, (25)

and defining R = (G_N M_{tot})^{1/3} \omega^{-2/3} from the relationship between the Newtonian gravitational interaction and the angular acceleration of the system, we find that

h_+ = -\frac{4}{r} \frac{(G_N \mathcal{M}_c)^{5/3}}{c^4} \, \omega^{2/3} \, \frac{1 + \cos^2(\iota)}{2} \cos(2\omega t) (26)

h_\times = -\frac{4}{r} \frac{(G_N \mathcal{M}_c)^{5/3}}{c^4} \, \omega^{2/3} \cos(\iota) \sin(2\omega t). (27)

From these polarizations, it is possible to use the Newtonian order approximation to derive the functional dependence of gravitational wave frequency on time and vice versa.

Due to the nature of the rotating system, the components of the quadrupole tensor complete a cycle every half period. This means that the gravitational wave frequency is given by

f_{gw} = 2 f_{orbit} = \frac{\omega_{orbit}}{\pi}. (28)

We will define the characteristic radius, R_c = 2 G_N \mathcal{M}_c / c^2, and the wavelength \lambda = c / f_{gw}. Writing the polarizations in terms of R_c and \lambda, including an arbitrary phase factor, 2\phi, and evaluating at the retarded time, t_{ret}, we have [3][1]

h_+ = A \, \frac{1 + \cos^2(\iota)}{2} \cos(2\pi f_{gw} t_{ret} + 2\phi) (29)

h_\times = A \cos(\iota) \sin(2\pi f_{gw} t_{ret} + 2\phi), (30)

where

A = \left( \frac{2\pi}{\sqrt{2}} \right)^{2/3} \left( \frac{R_c}{r} \right) \left( \frac{R_c}{\lambda} \right)^{2/3}. (31)

An analysis of the radiated power from this system will lead toward the time-frequency functions we seek. First note that the power emitted per unit solid angle is given by

\frac{dP_{gw}}{d\Omega} = \frac{r^2 c^3}{16 \pi G_N} \left\langle \dot{h}_+^2 + \dot{h}_\times^2 \right\rangle. (32)

Inserting the derived expressions for the gravitational wave polarizations, we find

\frac{dP_{gw}}{d\Omega} = \frac{2}{\pi} \frac{c^5}{G_N} \left( \frac{G_N \mathcal{M}_c \pi f_{gw}}{c^3} \right)^{10/3} g(\iota), (33)

where g(\iota) = \left( \frac{1 + \cos^2(\iota)}{2} \right)^2 + \cos^2(\iota). By integrating over the sphere, we produce

P_{gw} = \int \frac{dP_{gw}}{d\Omega} \, d\Omega = \frac{32}{5} \frac{c^5}{G_N} \left( \frac{G_N \mathcal{M}_c \pi f_{gw}}{c^3} \right)^{10/3}. (34)

Under the assumption that the binaries are on fixed orbital paths, the non-relativistic energy of the system is simply the sum of kinetic and potential energy, E = E_k + E_p = -G_N m_1 m_2 / 2R. The energy carried away by gravitational waves must come from the orbital energy, forcing the orbital radius to shrink as energy is radiated. Noting that \omega_{orbit}^2 = G_N M / R^3, the R in the orbital energy expression may be eliminated:

E_{orbit} = -\left( \frac{G_N^2 \mathcal{M}_c^5 \pi^2 f_{gw}^2}{8} \right)^{1/3}. (35)

Therefore the loss in orbital energy per unit time is given by the energy flux

-\frac{dE_{orbit}}{dt} = P_{gw}, (36)

which produces the following expression for the time evolution of the gravitational wave frequency:

\dot{f}_{gw} = \frac{96}{5} \pi^{8/3} \left( \frac{G_N \mathcal{M}_c}{c^3} \right)^{5/3} f_{gw}^{11/3}. (37)

Integrating this expression, we find that in terms of the time before coalescence, \tau_{obs} = t_{coal} - t, the gravitational wave frequency is given by

f_{gw}(\tau_{obs}) = \frac{1}{\pi} \left( \frac{5}{256} \frac{1}{\tau_{obs}} \right)^{3/8} \left( \frac{G_N \mathcal{M}_c}{c^3} \right)^{-5/8}. (38)

For implementation in the frequency domain, however, we require the inverse of this function, t_{obs}(f_{gw}), which is

t_{obs}(f_{gw}) = \frac{5}{256} \left( \pi f_{gw} \right)^{-8/3} \left( \frac{G_N \mathcal{M}_c}{c^3} \right)^{-5/3}. (39)

These will be utilized later to determine the optimal sampling frequency for systems of arbitrary mass, given by the Nyquist time and Nyquist frequency in the frequency and time domains respectively. A plot of equation 38 for given mass pairs is shown in figure 2.
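As a concrete illustration, equations 25, 38 and 39 can be evaluated directly. The short Python sketch below is our own (the function names and example values are not from LALInference) and assumes SI units throughout.

```python
import math

G = 6.674e-11       # Newton's constant [m^3 kg^-1 s^-2]
C = 2.998e8         # speed of light [m/s]
M_SUN = 1.989e30    # solar mass [kg]

def chirp_mass(m1, m2):
    """Chirp mass of equation 25; masses in kg."""
    return (m1 * m2) ** 0.6 / (m1 + m2) ** 0.2

def f_gw(tau, mc):
    """Gravitational wave frequency a time tau [s] before coalescence (equation 38)."""
    return (1.0 / math.pi) * (5.0 / (256.0 * tau)) ** (3.0 / 8.0) \
        * (G * mc / C**3) ** (-5.0 / 8.0)

def t_obs(f, mc):
    """Time remaining before coalescence at frequency f [Hz] (equation 39)."""
    return (5.0 / 256.0) * (math.pi * f) ** (-8.0 / 3.0) \
        * (G * mc / C**3) ** (-5.0 / 3.0)

# Example: a 1.4-1.4 solar mass binary entering the band at 40 Hz
mc = chirp_mass(1.4 * M_SUN, 1.4 * M_SUN)
print(t_obs(40.0, mc))   # ~25 s, consistent with the ~30 s visibility quoted above
```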

III. BAYESIAN INFERENCE

FIG. 2: The frequency dependence on time for various pairs of masses in units of solar mass. The second panel is a log-log scaled plot of the inverse function. The binary system described by these plotted functions evolves slowly at early times, but the evolution accelerates dramatically as the system approaches ISCO (Innermost Stable Circular Orbit).

Data analysis viewed as statistical inference consists of two closely related goals: model selection and parameter estimation. The former seeks to decide which available model best matches collected data, while the latter focuses on the determination of the governing parameters on which a model depends. Bayesian inference offers a straightforward method to achieve both by exploring each system parameter and hypothesized model as separate dimensions in the parameter space and returning a probability density function (PDF) associated with each. In the case of gravitational wave signals produced by coalescing compact binaries, Bayesian analysis requires the exploration of a parameter space of at least nine dimensions, which comes at an exceedingly high computational cost. However, the cost is driven down over time by new algorithms and faster computers, increasing the practicality of the Bayesian framework in a growing number of fields.

To derive Bayes theorem, the principle underlying all of Bayesian inference, consider a set of hypotheses denoted

H = \{ H_i \,|\, i = 1, \dots, N \}, (40)

a vector of collected data, \vec{d}, and prior information, I. From the concept of conditional probability and by the law of total probability, we may construct Bayes theorem, which states:

P(H_i | \vec{d}, I) = \frac{P(H_i | I) \, P(\vec{d} | H_i, I)}{P(\vec{d} | I)}, (41)

where we read P(H_i | \vec{d}, I) as the probability of hypothesis i given the collected data. By convention, P(H_i | I) is called the prior probability, which encodes our confidence in a hypothesis. The likelihood function, P(\vec{d} | H_i, I), denotes how probable the data are given what we suspect theoretically, and P(\vec{d} | I) is called the evidence, which represents a sum over each hypothesis,

P(\vec{d} | I) = \sum_i P(H_i | I) \, P(\vec{d} | H_i, I), (42)

and behaves as a normalization factor. The posterior probability on the left side of equation 41 may therefore be calculated by comparing the prior and likelihood distributions against the evidence.

A. Model Selection

The Bayesian framework permits the comparison of the viability of various hypotheses in the form of an odds ratio. The procedure to determine this ratio is denoted model selection or hypothesis testing. Consider Bayes theorem applied to the posterior distributions of two separate hypotheses, H_i and H_j. Comparing these two posterior PDFs, we find

\frac{P(H_i | \vec{d}, I)}{P(H_j | \vec{d}, I)} = \frac{P(H_i | I) \, P(\vec{d} | H_i, I)}{P(H_j | I) \, P(\vec{d} | H_j, I)} = \frac{P(H_i | I)}{P(H_j | I)} B_{ij}, (43)

where B_{ij} is called the Bayes factor and the ratio of priors preceding this term is typically set to unity unless there is reason to favor a particular hypothesis. The more the data support H_i, the larger the Bayes factor, and vice versa for H_j. Convention suggests that values over 100 represent definitive evidence for one hypothesis over another.

The computation of the Bayes factor is simple in the case of hypotheses with no free parameters; however, in the case of gravitational wave analysis, the set of variable parameters makes this computation exceedingly difficult. To determine the Bayes factor in such a situation, the likelihood must be marginalized over each of the parameters weighted by the prior, which produces what is known as the evidence, Z, given by

Z = P(\vec{d} | H_i, I) = \int_{\vec{\theta} \in \Theta} p(\vec{d} | \vec{\theta}, H_i, I) \, p(\vec{\theta} | H_i, I) \, d\vec{\theta}. (44)

In all but the most trivial cases, this integral is exceedingly difficult to compute due to the high dimensionality of the parameter space and the large intervals of values that can be taken by the parameters themselves. This computational challenge has been addressed in a paper by Veitch and Vecchio [6], and further boosts to their methods are described here.

B. Parameter Estimation

As mentioned in our derivation of equations 38 and 39, the parameters associated with producing a unique gravitational waveform are the two masses, time, sky position, distance, phase of coalescence and three orientation angles, which we express as part of a nine-dimensional parameter space:

\Theta = \{ \mathcal{M}, \nu, t_0, \phi_0, D_L, \alpha, \delta, \psi, \iota \}. (45)

Other parameters, such as the 6 spin components of a binary system [10], appear as we remove some of our simplifying assumptions. The maximum number of parameters for a coalescing compact binary is therefore 15, and though finite, this nonetheless presents a computationally costly challenge.

To determine the distribution of each of these parameters, a procedure known as marginalization is performed. In the case of parameter estimation, the process begins by considering a subset of parameters, say \vec{\theta}_A, in the parameter space \Theta. The marginalized distribution for this subset may be calculated by "integrating out" the other parameters, \vec{\theta}_B:

p(\vec{\theta}_A | \vec{d}, H, I) = \int_{\Theta_B} p(\vec{\theta}_A, \vec{\theta}_B | \vec{d}, H, I) \, d\vec{\theta}_B. (46)

Then the expectation of a parameter may be determined by computing the weighted average over its marginalized distribution,

\langle \vec{\theta}_A \rangle = \int_{\Theta_A} \vec{\theta}_A \, p(\vec{\theta}_A | \vec{d}, H, I) \, d\vec{\theta}_A, (47)

and the variance is described by

\sigma^2 = \langle \vec{\theta}_A^{\,2} \rangle - \langle \vec{\theta}_A \rangle^2. (48)

IV. NESTED SAMPLING

In 2004, Skilling developed the Nested Sampling algorithm [7], which represented a novel shift from the conventional calculations of Bayesian inference. It is capable of performing both model selection and parameter estimation by taking as inputs the likelihood and prior distributions, sampling the parameter space using Markov-Chain Monte Carlo (MCMC) methods, and returning the computed evidence integral and samples from the posterior distribution. We begin our explanation of the algorithm by multiplying either side of Bayes theorem (equation 41) by the evidence, P(\vec{d} | I), and establishing for each distribution the following nomenclature:

P(\vec{d} | \theta, I) \, P(\theta | I) = P(\vec{d} | I) \, P(\theta | \vec{d}, I) (49)

Likelihood × Prior = Evidence × Posterior

L(\Theta) \times \pi(\Theta) = Z \times P(\Theta).

A. The Procedure

The algorithm begins by mapping the parameter space to a one-dimensional line. To perform this mapping, the program simply extracts the likelihood of each value in the prior and orders these values from greatest likelihood to least over the interval (0, 1). Next, N_live samples, or "live points" in the conventional nested sampling jargon, are drawn uniformly from the prior and their likelihoods are calculated.

FIG. 3: This diagram depicts samples drawn from Area(Z), which are equivalent to those drawn from the posterior distribution.

The next step is for the algorithm to remove the live point farthest from 0, storing it along with its likelihood for later use. In its place, another point is resampled uniformly over the interval from 0 to the location of the removed point. This process is repeated until the sampled region has shrunk to very near 0, where the likelihoods are very large. Each contour of equal likelihood shrinks by a constant factor, so that log(X)_{i+1} = log(X)_i - 1/N_live; an instance of this procedure therefore takes less time with fewer live points than with more.

Once the algorithm has explored the parameter space sufficiently, by resampling many times and reaching a termination condition specified by the programmer, it calculates the evidence integral

Z = \int_\Theta L(\Theta) \, \pi(\Theta) \, d\Theta (50)

by defining the prior mass

X(\lambda) = \int_{L(\Theta) > \lambda} \pi(\Theta) \, d\Theta, (51)

and performing the summation

Z = \int_0^1 L(X) \, dX (52)

\approx \sum_{i=1}^{N} L(X_i) \, \Delta X_i, (53)

where N is the number of samples collected and L(X(\lambda)) = \lambda.

Next, the area beneath Z is sampled, which produces samples from P(x) = L(x)/Z and thus from the posterior P(\vec{x} | \vec{d}, I). By simultaneously sampling the posterior distribution and calculating the evidence integral, nested sampling is highly efficient, greatly reducing the computational overhead necessary in the exploration of high-dimensional parameter spaces. However, there remains a great deal to be desired in terms of computational time, as data analyses of gravitational wave systems can take two months or more.

FIG. 4: A depiction of the repeated sampling and of the movement of the samples to regions of increasing likelihood.

FIG. 5: A representation of the evidence integral and how the posterior is sampled by drawing from the area below the calculated evidence integral, at negligible additional computational cost.
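The loop described above can be written compactly. The sketch below is our own toy version of Skilling's scheme with a one-dimensional uniform prior; real codes such as LALInference replace the rejection step with constrained MCMC moves.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_likelihood(theta):
    # Toy unimodal likelihood standing in for the CBC likelihood.
    return -0.5 * ((theta - 1.3) / 0.1) ** 2

def nested_sampling(n_live=300, n_iter=2000):
    """Skeleton of Skilling's algorithm with a Uniform(-10, 10) prior.

    The prior mass shrinks as log X_{i+1} = log X_i - 1/n_live, and the
    evidence accumulates as Z ~ sum_i L_i dX_i (equations 52-53)."""
    live = rng.uniform(-10, 10, size=n_live)
    live_logl = log_likelihood(live)
    log_z, log_x = -np.inf, 0.0
    dead = []                                   # (sample, logL): posterior material
    for _ in range(n_iter):
        worst = int(np.argmin(live_logl))       # lowest-likelihood live point
        log_x_new = log_x - 1.0 / n_live        # deterministic shrinkage estimate
        log_dx = np.log(np.exp(log_x) - np.exp(log_x_new))
        log_z = np.logaddexp(log_z, live_logl[worst] + log_dx)
        dead.append((live[worst], live_logl[worst]))
        # Replace the worst point with a draw above its likelihood. Rejection
        # sampling is the toy version of the constrained MCMC step.
        while True:
            cand = rng.uniform(-10, 10)
            if log_likelihood(cand) > live_logl[worst]:
                live[worst], live_logl[worst] = cand, log_likelihood(cand)
                break
        log_x = log_x_new
    # Termination: add the remaining live points' share of the prior mass.
    log_z = np.logaddexp(log_z, np.log(np.mean(np.exp(live_logl))) + log_x)
    return log_z, dead

print(nested_sampling()[0])   # should approach log(0.1*sqrt(2*pi)/20) ~ -4.38
```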

V. GAINING EFFICIENCY THROUGH PARALLELIZATION

For a given number of live points, the nested sampling procedure will converge on a maximum likelihood in a time dependent on the rate at which the contours of equal likelihood shrink. This means a computation performed with a lower number of live points, though less accurate, will be faster than one with a larger number. We intend to reduce the number of iterations necessary to complete the algorithm by running multiple instances of the algorithm in parallel, each with a lower number of live points, and to analyze the effect of this reduction. The multiple runs preserve statistical accuracy while reducing the overall computational cost. Our goal is to optimize this procedure while maintaining a sufficient level of accuracy by finding the most effective numbers of live points to utilize.

A. Method

We first utilized a random number generator which provided each run with a different random seed, to induce differences between the outputs. An integer with a large number of factors was chosen arbitrarily to provide the upper end of the test: we ran one instance at 1024 live points, two at 512, four at 256, and so on, down to a minimum of 16 live points, at which point the algorithm no longer produced sufficiently accurate results. Instances using factors of 1200 live points down to a minimum of 16, and factors of 256 down to a minimum of 64, were also performed following the same scheme. Next, we organized the posterior samples from each parallel run by weighting and resampling within each run. Once collected, the parallel runs were merged by weighting each run according to its own evidence estimate, and posterior samples were drawn from the collated, weighted parallel posterior samples.
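A sketch of the merging step (our own illustration; the actual post-processing scripts differ in detail): each run contributes posterior draws in proportion to its evidence estimate.

```python
import numpy as np

def merge_runs(runs, n_out=10_000, seed=0):
    """Combine parallel nested sampling runs into one posterior sample set.

    runs: list of (samples, log_z) pairs, one per parallel instance.
    Each run is resampled in proportion to its evidence estimate, so
    runs that found more probability mass contribute more draws."""
    rng = np.random.default_rng(seed)
    log_z = np.array([lz for _, lz in runs])
    weights = np.exp(log_z - np.max(log_z))     # relative evidence weights
    weights /= weights.sum()
    counts = rng.multinomial(n_out, weights)    # draws allotted to each run
    merged = [rng.choice(s, size=c, replace=True) for (s, _), c in zip(runs, counts)]
    return np.concatenate(merged)
```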

The post-processing procedure for the parallelized instances generated plots of the probability distribution of each parameter and the covariance matrix of the parameters in each run individually. It then merged the data from instances using a common number of live points and plotted the data together for comparison. The posterior samples from each of the runs were then combined to generate a figure of merit for the relative accuracy given various values of N_live. The merged files were compared against each other to determine which numbers of live points required the least computational time. Once generated, the efficiency and accuracy figures of merit were compared, and an optimal range of live points was determined.

B. Results

Observing the relative rates at which the nested sampling algorithm converges on the most likely parameter values provides a glimpse of the efficiency boost gained by decreasing the number of live points. Figures 6 and 7 show how the reduction in live points more sparsely samples the parameter space and give a mental picture of how nested sampling discovers and explores regions of increasing likelihood. The parameter shown here is the chirp mass, which is highly sensitive to changes in the waveform generation inputs. Besides the pictorial representation, these images provide quantitative insight into the reduction in computational cost. Noting the horizontal axes, which represent the number of iterations undergone by the algorithm up to that sample, computations with lower numbers of live points do indeed require far fewer iterations and thus take less time.

The initial parallelization, containing 1024 live points and factors thereof, produced the curve shown in figure 8 describing the number of posterior samples versus the number of live points per run. This plot also includes a more highly resolved segment where we suspected the algorithm to perform most efficiently. This behavior is explained by the statistical variance of the time required to perform the likelihood evaluation, and the increasing trend is maintained as N_live grows.

FIG. 6: Samples drawn during a nested sampling instance run with 1200 live points. Here the x-axis denotes the number of iterations performed by the algorithm, while the y-axis describes the parameter estimate.

FIG. 7: Samples drawn during a nested sampling instance run with 300 live points. Here the x-axis denotes the number of iterations performed by the algorithm, while the y-axis describes the parameter estimate.

From the sampling, it is clear that an efficiency boost can be generated by cutting the number of live points. However, such a procedure will not produce viable results when run with fewer than 64 live points; this is seen in the drastic decrease in posterior samples, and thus accuracy, with a relatively small gain in efficiency. This is in part because a run with half the number of live points does not take half the time of the larger run. Instead, instances run with fewer live points require roughly 75% of the computational time of instances containing twice as many live points, as seen in figure 8. Since accuracy is lost as the number of live points is decreased, the standard deviation of our parameter estimates also increases, as can be observed in figures 9, 10, 11, and 12. At a sufficiently small number of live points, accurate physics can no longer be performed.

FIG. 8: A plot of posterior samples versus the number of live points used to perform the nested sampling algorithm. For each 50% reduction in N_live, approximately 75% of the posterior samples are retained.

Continuing to analyze the accuracy of the parameter estimates as we decrease the number of live points, it is beneficial to analyze the distributions of each of the merged parallel runs. Similarity between the histograms in figures 9 and 10, which depict the probability distributions for the values of the chirp mass, shows that parallelization maintained a significant amount of accuracy in the waveform parameters estimated by the algorithm. It was expected that as the number of live points diminished, the variance would increase dramatically; however, that was not observed. Though the spread of the data for relatively small values was larger than for large numbers of live points, the rate at which the spread increased was slower than expected.

Figures 11 and 12 depict the log Bayes factors calculated by the parallel instances containing the same numbers of live points. The range of the log Bayes factor and the accuracy of the parameter estimates are highly correlated: at lower numbers of live points, the variance of the log Bayes factors becomes too large, meaning the algorithm will not converge on as consistent a quantity. However, these histograms show that there is a range between 512 and 128 live points which produced sufficient accuracy to extract physics from gravitational wave signals.

Comparing the cumulative distributions of the chirp mass calculated with varying numbers of live points, we produced figures 13 and 14, the latter covering a smaller range of live points within the larger range. As is visible in the plots, there is little difference between the distributions calculated for different numbers of live points. Each run is slightly different; however, merging all of the output files for common numbers of live points, we see that the average of each has a nice overlap. This is precisely what we desired when the problem was defined.

FIG. 9: A probability density function for the chirp mass parameter calculated using 1200 live points.

FIG. 10: A probability density function for the chirp mass parameter calculated using 300 live points.

Figure 15 describes the computational time versus the number of live points for this set of runs. There is a clear trend; however, the greatest improvement relative to the preceding higher number of live points was found in the transition between 300 and 200. Based on the number of posterior samples and the efficiency boost in computational time, the optimal range of live point numbers falls between 250 and 300. Here it was determined that we produced the fastest result while maintaining sufficient accuracy to yield viable parameter estimates and allow us to maintain confidence in future applications of this approach.

FIG. 11: A histogram of the 2 instances with 512 live points. Here the log Bayes factor was 545.17 ± 0.06.

FIG. 12: A histogram of the 8 instances with 128 live points. Here the log Bayes factor was 545.71 ± 0.67.

VI. VARIABLE RESOLUTION OF THE FREQUENCY DOMAIN

At the most basic level, any attempt to better resolve a set of data will cost computational time but produce improved accuracy; the opposite is true for lower resolutions. There is a limit to how much or how little we may resolve a function, given by the Nyquist frequency. The ideal resolution is achieved when a function is sampled at exactly this rate: sampling above it produces no improvement in accuracy and a loss in efficiency, while sampling below it produces effects known as aliasing, in which a waveform can no longer be uniquely determined.

When we refer to sampling in this section, we mean the variety performed in the reconstruction of a bandlimited signal, as given by the Sampling Theorem. This theorem states that a signal bandlimited by some frequency, F, may be completely determined by a series of its ordinates spaced 1/(2F) apart; the rate at which these ordinates occur is called the sampling frequency, and the exact rate at which a signal is just completely determined is called the Nyquist frequency. A subsequent result of this theorem is that a frequency domain waveform containing contributions from no times greater than T is completely determined by giving its ordinates at a series of abscissas spaced 1/(2T) = f_nyq Hz apart. Since this is a frequency domain waveform, and we have used the inverse function t(f), we could think of this sampling frequency as a Nyquist time instead of a frequency; however, we have opted to utilize notation mostly consistent with signal processing conventions (see [5]).

FIG. 13: Cumulative distribution for the chirp mass parameter given merged parallel instances from a large range of live points.

FIG. 14: Cumulative distribution for the chirp mass parameter given merged parallel instances from a small range of live points.

FIG. 15: A plot of the total computational time at various numbers of live points. There is an enormous reduction produced by the first reduction of 50%; further reductions produce relatively smaller reductions in computational time.

The current algorithm utilizes a constant resolution set well above the Nyquist frequency for all but the lowest frequencies. It would be more efficient to use a function which returns the Nyquist frequency for each point on the waveform we wish to generate and to sample at that rate. Our goal is thus to exploit the monotonically decreasing nature of the Nyquist frequency of an inspiral signal and optimize this aspect of waveform generation. This results in sampling at a high rate at low frequencies and down-sampling gradually as we reach higher frequencies.

A. Method

The first step toward optimizing the resolution of the frequency domain waveform through variable resolution is to determine the Nyquist frequency at every point in a theoretical frequency domain waveform. We do this by introducing the functional dependence of gravitational wave frequency on observed time, given by equation 38. Given particular BNS, BBH and NSBH systems, we may establish scenarios for different mass ratios, which result in different dependences of time on frequency and vice versa. The largest amplitude of this function occurs when the smallest chirp mass is used, which for the sake of gravitational wave analysis is a binary system of neutron stars, each weighing 1 solar mass. Such a system establishes a worst-case scenario for sampling rates and was used in our subsequent calculations. The plots in figure 2 describe the functional dependence of time on frequency and vice versa, given nine different mass combinations listed in solar masses.


By using this plotted equation, we may determine the corresponding Nyquist sampling frequency,

f_{nyq} = \frac{1}{2} \frac{1}{t_{obs}(f_{gw})}, (54)

and thereby optimize the sampling rate for arbitrary mass CBC systems. Graphically this is shown in figure 16, where we can see that at each frequency at which we change sampling rates, we may simply take twice the value of the function, the inverse of which is the ideal sampling rate.
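Equation 54 is simple to evaluate using t_obs from equation 39. A self-contained sketch (names and constants are our own):

```python
import math

G, C, M_SUN = 6.674e-11, 2.998e8, 1.989e30   # SI units

def t_obs(f, mc):
    """Time to coalescence at gravitational wave frequency f [Hz] (equation 39)."""
    return (5.0 / 256.0) * (math.pi * f) ** (-8.0 / 3.0) * (G * mc / C**3) ** (-5.0 / 3.0)

def delta_f_nyq(f, mc):
    """Allowed frequency-domain sample spacing at frequency f (equation 54)."""
    return 0.5 / t_obs(f, mc)

# Worst case used in the text: two 1.0 solar mass neutron stars
mc = (1.0 * 1.0) ** 0.6 / 2.0 ** 0.2 * M_SUN
for f in (20.0, 40.0, 100.0, 400.0):
    print(f, delta_f_nyq(f, mc))   # spacing grows monotonically with frequency
```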

We have chosen to break frequency domain waveforms according to an optimization that minimizes the number of samples as a function of the frequencies at which the waveform is broken. This means that for any chosen number of breaks, f_1, f_2, ..., f_M, we may find the optimal frequencies at which to change the sampling rate and perform the calculation with optimal efficiency. For this number of breaks, the function of the number of samples that we minimize is given by

N(f_1, f_2, \dots, f_M) = \frac{f_1 - f_{min}}{f_{nyq}(f_{min})} + \frac{f_2 - f_1}{f_{nyq}(f_1)} + \dots + \frac{f_M - f_{M-1}}{f_{nyq}(f_{M-1})} + \frac{f_{max} - f_M}{f_{nyq}(f_M)}, (55)

where f_{nyq} = \Delta f_{nyq} = 1/(2 \, t_{obs}(f_{gw})) is the Nyquist spacing at a given f_{gw}, utilizing equation 39. In words: the length of each band, divided by the spacing at which it is sampled, gives the number of samples in that band; summing over the bands gives the number of samples required to construct the waveform, so the minimum of this function returns the minimum number of samples required.
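For a fixed number of breaks, the minimization of equation 55 can be sketched with a general-purpose optimizer. The version below is our own illustration: the toy Nyquist-spacing power law stands in for 1/(2 t_obs), and a production code would constrain the breaks more carefully.

```python
import numpy as np
from scipy.optimize import minimize

F_MIN, F_MAX = 20.0, 1024.0   # illustrative band edges [Hz]

def delta_f_nyq(f):
    # Toy stand-in proportional to 1/(2 t_obs(f)) ~ f^(8/3); only ratios matter here.
    return 0.5e-6 * f ** (8.0 / 3.0)

def n_samples(breaks):
    """Equation 55: total samples with each band sampled at its lowest frequency."""
    inner = np.sort(np.clip(breaks, F_MIN, F_MAX))
    edges = np.concatenate(([F_MIN], inner, [F_MAX]))
    return np.sum(np.diff(edges) / delta_f_nyq(edges[:-1]))

x0 = np.geomspace(1.5 * F_MIN, 0.7 * F_MAX, 5)        # initial guess for 5 breaks
res = minimize(n_samples, x0, method="Nelder-Mead")
print(np.sort(res.x), n_samples(res.x))
```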

In the continuous case and with arbitrary computing power, it would be advisable to make an enormous number of breaks in the frequency domain so as to always sample the waveform at an optimal rate. However, with computational limitations, it is more efficient to break a relatively small number of times, because the time required to recompose the bands becomes significant as the number of breaks increases.

Next we recompose the waveform by placing the bands side by side. This is achievable due to the analytic nature of the frequency domain waveform, which theoretically guarantees a perfect match, though this was not observed in our results. Here we have utilized linear interpolation to produce a sufficient match against a waveform sampled at a constant rate. Alternatives such as sinc and spline interpolation methods were considered; however, the nature of this problem led naturally to a linear procedure. The fully composed waveform was then compared against one generated at a constant sampling rate. Results of this implementation follow.
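The recomposition and comparison steps can likewise be sketched with plain linear interpolation; the normalized overlap below is a simplified stand-in for the noise-weighted match actually used in CBC analysis.

```python
import numpy as np

def recompose(band_freqs, band_values, uniform_freqs):
    """Linearly interpolate variably sampled bands back onto a uniform grid."""
    f = np.concatenate(band_freqs)       # band frequency grids, in ascending order
    v = np.concatenate(band_values)      # corresponding waveform samples
    return np.interp(uniform_freqs, f, v)

def match(a, b):
    """Normalized overlap of two real arrays; 1.0 means identical shapes."""
    return np.dot(a, b) / np.sqrt(np.dot(a, a) * np.dot(b, b))
```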

FIG. 16: An example of how a frequency domain waveform is broken and sampled. The location of each break is determined by the minimization of the number of samples as a function of frequency breaks 1 through 5, shown in equation 55, and the sampling rate for each band is determined by the Nyquist frequency given by the inverse of twice the maximum amplitude of equation 39.

B. Results

Using linear interpolation between the reference waveform at a constant sampling rate and our broken waveforms with various numbers of bands, the following waveform matches were calculated. Note that even the one-band case is not a perfect match. This is due to error induced by the interpolation method itself, which propagates through the other matches as well. Despite this small error, the matches remain excellent for the purposes of signal analysis.

Bands   Percent Match   Time (d+hh:mm:ss)
1       99.9997         ≈ 4+09:00:00
2       99.9541         3+16:28:54
3       99.7589         2+23:22:12
4       99.5351         2+18:19:39
5       99.3865         2+17:30:19

FIG. 17: Preliminary results showing the greatly diminished computational time at little cost to accuracy.

In terms of computational efficiency gain, it is clear that variable resolution will prove useful in future gravitational wave analysis. The results shown here suggest that a frequency domain broken into 5 bands will produce a nearly 50% reduction in computational time. Since these results are for only short waveforms, it is expected that this reduction will be further amplified as the waveform length increases. A beneficial test for the future would be to analyze a waveform of the same length we expect our first detection to be. This would allow us to understand just how long a detection will take to analyze and ultimately publicize.

FIG. 18: Three overlaid frequency domain waveforms. One is sampled at a constant rate, another is composed without interpolation, and another uses linear interpolation to reconnect each of the bands. The linearly interpolated matches are listed in the table in section VI B.

FIG. 19: Time required to generate the frequency domain waveform.

VII. TOWARDS A TIME DOMAIN IMPLEMENTATION

A supplementary goal is to apply the variable resolution approach to the time domain. A large portion of the computational time required to complete a nested sampling instance is invested in the generation of waveforms. The most time-consuming waveforms to generate are time domain waveforms, because they are not analytic. They present a greater computational challenge because they require numerically solving a system of highly coupled second-order differential equations, and because the broken waveforms must be matched in a continuous and differentiable manner to reduce inconsistencies brought about by transforming between the time and frequency domains. While the former explains why the matches are imperfect, the latter expresses why this is a hurdle which must be overcome to produce accurate results.

In principle, the time domain waveform is very similar to the frequency domain waveform, except that it is reversed and increases in amplitude from some initial time to ISCO. This means that the strategy initially implemented to break the frequency domain into multiple intervals must be reversed, so that the minimization of the new function N(t) produces the optimal times at which the time domain should be broken, from greatest to least. We may write this function as follows:

N(t_1, t_2, \dots, t_N) = \frac{t_{ISCO} - t_N}{f_{nyq}(t_N)} + \frac{t_N - t_{N-1}}{f_{nyq}(t_{N-1})} + \dots + \frac{t_2 - t_1}{f_{nyq}(t_2)} + \frac{t_1 - t_{min}}{f_{nyq}(t_1)}.

With the function N(t_1, t_2, \dots, t_N) established, it is easy to consider an implementation in the time domain using the same framework developed for the frequency domain. First we would minimize this function for the desired number of time domain breaks, sample at the Nyquist frequency (instead of the Nyquist time) corresponding to the largest time inside each band, and then compose the resulting pieces of the waveform.

However, the current code has a limitation which will have to be addressed before this concept is used in LALInference. The waveform generator produces waves from some initial frequency all the way to ISCO. This is not a problem in itself, except that it results in a waste of computational resources when using a multiband framework. Instead, we suggest the implementation of code which allows a maximum frequency to be specified, so that the waveform is calculated only up to that point.

Once this is addressed, we expect there will be a problem matching the broken waveforms. The frequency domain waveform is analytic, meaning that producing a match from the broken waveforms is relatively easy and, in fact, should be perfect except for interpolation-generated errors. The time domain waveform, on the other hand, is found by solving a system of coupled differential equations. Changing boundary conditions, such as the endpoints of the interval over which a solution is sought, can cause changes in the solution at those endpoints. Even a slight difference between the end of one band and the beginning of the next will produce what is known as the Gibbs phenomenon when the function undergoes a fast Fourier transform. To reduce this effect, we suggest the eventual implementation of a convolution over a small overlap of adjacent waveform bands. This would smooth the function, encouraging continuity and differentiability at points where the composed waveform would otherwise not be differentiable.

FIG. 20: An example time domain inspiral waveform.
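One simple realization of the suggested overlap smoothing is a windowed cross-fade between adjacent bands; this sketch is our own, not LALInference code.

```python
import numpy as np

def crossfade(left, right, n_overlap):
    """Join two adjacent time domain bands sharing n_overlap samples.

    A raised-cosine ramp blends the overlap, suppressing the step
    discontinuity that would otherwise ring (Gibbs phenomenon) under
    a fast Fourier transform."""
    w = 0.5 * (1.0 - np.cos(np.pi * np.arange(n_overlap) / (n_overlap - 1)))
    blended = (1.0 - w) * left[-n_overlap:] + w * right[:n_overlap]
    return np.concatenate([left[:-n_overlap], blended, right[n_overlap:]])
```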

VIII. CONCLUSIONS

The initial goals set for this project were to analyze parallel instances of nested sampling to obtain an optimal configuration, and to study and implement a variable resolution algorithm for more efficient waveform generation. From the procedure described here, we determined that the nested sampling algorithm could indeed be made more efficient by the implementation of parallel processing as well as by variable resolution of the frequency domain. There was also progress made toward the implementation of the algorithm in the time domain; however, those results will be saved for a future paper combining the results from this research with a more detailed analysis of the behavior of the multiple band approach in the time domain.

Regarding the parallelization performed, we determined that the optimal range of live points is between 250 and 300, and that a minimum of 4 instances of nested sampling should be run with these values to perform meaningful statistics. There is too significant a loss in accuracy when the number is decreased below this range. Even if a large number of instances were run simultaneously, the results would be less meaningful than those of a smaller but more accurately computed group. Since the number of posterior samples drawn from runs containing 200 to 256 live points is approximately 50% of the number drawn from a run containing 1000 to 1024, running, say, 4 instances with 250 live points will produce roughly 2.25 times the total number of posterior samples, meaning a more accurate result can be produced at a lower computational cost.

Similarly, results from the variable resolution algorithm indicate that it is possible to gain efficiency by breaking up the frequency domain waveform to perform analysis. The results shown above are only preliminary, as they utilize short waveforms for the sake of determining results quickly. To test the full potential of this approach, it will be necessary to test longer waveforms and also to determine the effect of louder signals with higher signal to noise ratios. If the pattern found in the table in section VI B remains consistent for longer and louder waveforms, it is highly likely that the algorithm will cut the time required to perform nested sampling in half. We feel that the development of a similar algorithm for the time domain would come full circle in boosting efficiency by means of this framework.

A combination of parallelization and variable resolution would clearly be a beneficial application of these newly developed tools. A test of the two in unison will help determine a new time frame within which we will be able to publish results from the first ever gravitational wave detection, expected from the advanced-configuration detectors. With these procedures in hand, it is hoped that we will come closer to the goal of publication within three months of a detection.

IX. ACKNOWLEDGEMENTS

The author would like to thank the gravitational physics group at Nikhef for their encouragement, patience and advice throughout the research process. He would like to extend special thanks to his mentor, John Veitch, for his support and dedication. Additionally, he would like to thank the University of Florida Department of Physics for the opportunity to perform this research via the NSF IREU program. It was a wonderful experience whose benefits will reach far beyond the scope of the research performed and the physics and computational methods learned.

[1] M. Maggiore, Gravitational Waves.
[2] http://arxiv.org/pdf/1003.2480v2.pdf
[3] B. F. Schutz.
[4] D. S. Sivia, Data Analysis: A Bayesian Tutorial.
[5] C. E. Shannon, Communication in the Presence of Noise.
[6] J. Veitch and A. Vecchio, Bayesian Coherent Analysis of Advanced Detector Data.
[7] J. Skilling, Nested Sampling, http://www.inference.phy.cam.ac.uk/bayesys/nest.pdf
[8] http://web.mit.edu/~wingated/www/introductions/

[9] As a gravitational wave passes, the frequency and amplitude increase within the audible band, making a chirping noise; the chirp time thus refers to the time the wave spends within a detector's visible band.

[10] The spin of an object is described by a magnitude and two orientation angles, so for a binary system a total of 6 parameters completely determine the spins.

