Statistics in Astronomy

Peter Coles

STFC Summer School, Cardiff 27th August 2015

Lecture 1: Probability

“The Essence of Cosmology is Statistics”

George McVittie

1 May 2023

Precision Cosmology

“…as we know, there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns -- the ones we don't know we don't know.”

SAY “PRECISION COSMOLOGY” ONE MORE TIME…



Direct versus Inverse Reasoning

Theory (Ω, H0…)

Observations

Fine Tuning

• In the standard model of cosmology the free parameters are fixed by observations

• But are these values surprising?

• Even microscopic physics seems to have “unnecessary” features that allow complexity to arise

• Are these coincidences? Are they significant?

• These are matters of probability…

What is a Probability?

• It’s a number between 0 (impossible) and 1 (certain)

• Probabilities can be manipulated using simple rules (“sum” for “OR” and “product” for “AND”).

• But what do they mean?

• Standard interpretation is frequentist (proportions in an ensemble)
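The slide names the two rules without writing them out. One standard way of stating them, with all probabilities conditional on background information I (the notation here is an assumption, chosen to match the P(A|B) form used later in the lecture), is:

```latex
% Sum rule ("OR") and product rule ("AND") for propositions A and B,
% everything conditional on the background information I.
\begin{align}
P(A \text{ or } B \mid I) &= P(A \mid I) + P(B \mid I) - P(A \text{ and } B \mid I) \\
P(A \text{ and } B \mid I) &= P(A \mid B, I)\, P(B \mid I)
\end{align}
```

For mutually exclusive propositions the last term of the sum rule vanishes, which is the form usually quoted.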

Bayesian Probability

• Probability is a measure of the “strength of belief” that it is reasonable to hold.

• It is the unique way to generalize deductive logic (Boolean Algebra)

• Represents insufficiency of knowledge to make a statement with certainty

• All probabilities are conditional on stated assumptions or known facts, e.g. P(A|B)

• Often called “subjective”, but at least the subjectivity is on the table!

Balls

• Two urns A and B.

• A has 999 white balls and 1 black one; B has 1 white ball and 999 black ones.

• P(white | urn A) = 0.999, etc.

• Now shuffle the two urns, and pull out a ball from one of them. Suppose it is white. What is the probability it came from urn A?

• P(urn A | white) requires “inverse” reasoning: Bayes’ Theorem

Urn A: 999 white, 1 black

Urn B: 999 black, 1 white

P(white ball | urn is A) = 0.999, etc.

Bayes’ Theorem: Inverse reasoning

• Rev. Thomas Bayes (1702-1761)

• Never published any mathematical papers during his lifetime

• The general form of Bayes’ theorem was actually given later (by Laplace).

Bayes’ Theorem

• In the toy example, X is “the urn is A” and Y is “the ball is white”.

• Everything is calculable, and (taking the two urns to be equally likely a priori) the required posterior probability is 0.999

\[ P(X \mid Y, I) = \frac{P(X \mid I)\, P(Y \mid X, I)}{P(Y \mid I)} \]
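The urn calculation can be checked numerically. The sketch below is only an illustration: the function name `posterior` is hypothetical, and it assumes each urn is equally likely to be chosen, which the slides do not state explicitly.

```python
# Toy urn problem: posterior probability that a white ball came from urn A.
# Assumption (not stated on the slide): each urn is equally likely a priori.

def posterior(prior_x, like_y_given_x, like_y_given_not_x):
    """Bayes' theorem for a two-hypothesis problem: returns P(X | Y, I)."""
    prior_not_x = 1.0 - prior_x
    # P(Y | I), obtained by marginalising over the two hypotheses
    evidence = like_y_given_x * prior_x + like_y_given_not_x * prior_not_x
    return like_y_given_x * prior_x / evidence

p_urn_a_given_white = posterior(
    prior_x=0.5,               # P(urn A | I), assumed
    like_y_given_x=0.999,      # P(white | urn A): 999 of 1000 balls
    like_y_given_not_x=0.001,  # P(white | urn B): 1 of 1000 balls
)
print(f"P(urn A | white) = {p_urn_a_given_white:.3f}")
```

Changing `prior_x` shows how strongly the answer depends on the prior when the likelihoods are less extreme than in this example.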

Probable Theories

\[ P(H \mid D, I) = \frac{P(H \mid I)\, P(D \mid H, I)}{P(D \mid I)} \]

• Bayes’ Theorem allows us to assign probabilities to hypotheses (H) based on (assumed) knowledge (I), which can be updated when data (D) become available

• P(D|H,I) – likelihood

• P(H|I) – prior probability

• P(H|D,I) – posterior probability

• The best theory is the most probable!

Why does this help?

• Rigorous Form of Ockham’s Razor: the hypothesis with fewest free parameters becomes most probable.

• Can be applied to one-off events (e.g. Big Bang)

• It’s mathematically consistent!

• It can even make sense of the Anthropic Principle…

Null Hypotheses

• The frequentist approach to statistical hypothesis testing involves the idea of a null hypothesis H0, which is the model you are prepared to accept unless there is evidence to the contrary.

• Under the null hypothesis one then constructs the sampling distribution of some statistic Q, called f(Q).

• If the measured value of Q is unlikely on the basis of H0 then the null hypothesis is rejected.

Type I and Type II Errors

• There are two ways of making an error in this kind of test.

• Type I is to reject the null when it is actually true. The probability of this happening is called the significance level (or “size”) of the test, usually called α. It is usually chosen to be 5% or 1%. The null is rejected when the p-value of the measured Q falls below α.

• The other possibility (Type II) is to fail to reject the null when it is wrong. If the probability of this happening is β then (1−β) is called the power.
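To make the relation between the two error probabilities concrete, here is a minimal sketch for a one-sided test in which the statistic Q is Gaussian with unit variance. The function name `power_one_sided_z` and the effect size of 2.5 standard deviations are illustrative assumptions, not taken from the lecture.

```python
from statistics import NormalDist

def power_one_sided_z(alpha, shift):
    """Power of a one-sided z-test.

    alpha: significance level (probability of a Type I error).
    shift: true mean of the statistic Q under the alternative, in units
           of its standard deviation (hypothetical effect size).
    """
    std_normal = NormalDist()               # f(Q) under the null: N(0, 1)
    q_crit = std_normal.inv_cdf(1 - alpha)  # reject the null when Q > q_crit
    beta = std_normal.cdf(q_crit - shift)   # P(fail to reject | alternative)
    return 1 - beta

alpha = 0.05
power = power_one_sided_z(alpha, shift=2.5)
print(f"alpha = {alpha}, beta = {1 - power:.3f}, power = {power:.3f}")
```

With these numbers the power comes out at roughly 0.8. Lowering α to 1% pushes the rejection threshold up and reduces the power, which is the usual trade-off between the two kinds of error.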

Bayesian Hypothesis Testing

Two advantages of this approach are that it doesn’t put one hypothesis in a special position (the null), and it doesn’t separate estimation and testing.

Suppose Dr A has a theory that makes a direct prediction while Professor B has one that has a free parameter, say λ.

Suppose the likelihoods for a given set of data are P(D|A) and P(D|B,λ).

Occam’s Razor

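\[
\frac{\Pr(A \mid D)}{\Pr(B \mid D)}
= \frac{\Pr(A \mid D)}{\int \mathrm{d}\lambda\, \Pr(B, \lambda \mid D)}
= \frac{\Pr(A)\, \Pr(D \mid A)}{\int \mathrm{d}\lambda\, \Pr(B, \lambda)\, \Pr(D \mid B, \lambda)}
= \frac{\Pr(A)}{\Pr(B)}\, \frac{\Pr(D \mid A)}{\int \mathrm{d}\lambda\, \Pr(\lambda \mid B)\, \Pr(D \mid B, \lambda)}
\]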

Occam factor
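A small numerical sketch of the comparison between Dr A and Professor B may help. Everything specific here is an assumption made for illustration: a single measurement x with Gaussian error σ, theory A predicting a true value of exactly zero, and theory B having a flat prior for λ on [−L, L].

```python
from statistics import NormalDist

def bayes_factor_a_over_b(x, sigma, half_width):
    """Ratio P(D|A) / P(D|B) for one measurement x with Gaussian error sigma.

    A: predicts a true value of exactly 0 (no free parameter).
    B: true value is lambda, with a flat prior on [-half_width, half_width].
    All of these choices are illustrative assumptions.
    """
    like_a = NormalDist(mu=0.0, sigma=sigma).pdf(x)
    # Marginal likelihood of B: integral of P(lambda|B) P(D|B,lambda) d lambda.
    # For a flat prior this is a difference of Gaussian CDFs over the width.
    noise = NormalDist(mu=x, sigma=sigma)
    like_b = (noise.cdf(half_width) - noise.cdf(-half_width)) / (2 * half_width)
    return like_a / like_b

ratio_narrow = bayes_factor_a_over_b(x=1.0, sigma=1.0, half_width=10.0)
ratio_wide = bayes_factor_a_over_b(x=1.0, sigma=1.0, half_width=100.0)
print(f"P(D|A)/P(D|B) with L = 10:  {ratio_narrow:.2f}")
print(f"P(D|A)/P(D|B) with L = 100: {ratio_wide:.2f}")
```

Although B can always fit the datum perfectly by choosing λ = x, the simpler theory A is favoured here, and more so the wider the range of λ that B allowed beforehand. That penalty for unused prior range is how the razor emerges from the integral.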

Bayesian estimation

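\[ p(a_1, \ldots, a_m \mid x_1, \ldots, x_n, I) = K\, p(a_1, \ldots, a_m \mid I)\, p(x_1, \ldots, x_n \mid a_1, \ldots, a_m, I) \]

\[ K^{-1} = \int p(a_1, \ldots, a_m \mid I)\, p(x_1, \ldots, x_n \mid a_1, \ldots, a_m, I)\, \mathrm{d}a_1 \cdots \mathrm{d}a_m \]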

This involves finding the posterior distribution of the parameters given the data and any prior information.

Evidence!
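For a single parameter the posterior and its normalising integral can be evaluated on a grid. The sketch below is a toy case under stated assumptions: the data values, the known error σ, and the flat prior on [3, 7] are all invented for illustration (a flat prior is used only because it keeps the example short).

```python
import math

data = [4.8, 5.1, 5.3, 4.9, 5.4]   # hypothetical measurements x_1..x_n
sigma = 0.5                        # assumed known Gaussian error

def likelihood(mu):
    """p(x_1..x_n | mu, I) for independent Gaussian errors."""
    norm = (2 * math.pi * sigma**2) ** (-len(data) / 2)
    chi2 = sum((x - mu) ** 2 for x in data) / sigma**2
    return norm * math.exp(-0.5 * chi2)

# Grid over the single parameter a_1 = mu, with a flat prior on [3, 7].
n_grid = 4001
mu_grid = [3.0 + 4.0 * i / (n_grid - 1) for i in range(n_grid)]
d_mu = mu_grid[1] - mu_grid[0]
prior = 1.0 / 4.0

unnormalised = [prior * likelihood(mu) for mu in mu_grid]
evidence = sum(unnormalised) * d_mu          # the normalising integral
posterior = [u / evidence for u in unnormalised]

post_mean = sum(mu * p for mu, p in zip(mu_grid, posterior)) * d_mu
post_var = sum((mu - post_mean) ** 2 * p
               for mu, p in zip(mu_grid, posterior)) * d_mu
print(f"posterior mean = {post_mean:.3f}, sd = {math.sqrt(post_var):.3f}")
```

The posterior mean lands on the sample mean and the width on σ/√n, as expected for this Gaussian case. The quantity `evidence` is the integral that would be compared between models in hypothesis testing.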

Is there anything wrong with Frequentism?

• The laws for manipulating probabilities are no different

• What is different is the interpretation.

• OK to imagine an ensemble, but there is no need to assert that it is real! (mind projection fallacy)

• The idea of a prior is worrying for many, but is the only way to make this reasoning consistent

Prior and Prejudice

• Priors are essential.

• You usually know more than you think.

• Flat priors usually don’t make much sense.

• Maximum entropy, etc, give useful insights within a well-defined theory: “objective Bayesian”

• “Theory” priors are hard to assign, especially when there isn’t a theory…

Why is the Universe (nearly) flat?

• Assume the Universe is one of the Friedmann family

• Q: What should we expect, given only this assumption?

• Ω=1 is a fixed point (so is Ω=0).

• The Universe is walking a tightrope…

\[ \dot{a}^2 = \frac{8\pi G \rho}{3}\, a^2 - k c^2 \]

The Friedmann Models

The simplest relativistic cosmological models are remarkably similar (although the more general ones have additional options…)

\[ \ddot{a} = -\frac{4\pi G \rho}{3}\, a \]

Solutions of these are complicated, except when k=0 (flat Universe). This special case is called the Einstein-de Sitter universe.

Notice that

\[ \rho \propto \frac{1}{a^3} \]

for non-relativistic particles (“dust”).
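As a worked example, the k=0 dust case can be integrated directly. This is a sketch of the standard derivation, not something shown on the slides; the subscript 0 marks an arbitrary reference epoch, and the expanding branch with a=0 at t=0 is assumed.

```latex
% k = 0 with dust, so \rho = \rho_0 (a_0 / a)^3
\begin{align}
\dot{a}^2 &= \frac{8\pi G \rho_0 a_0^3}{3}\,\frac{1}{a}
  \;\Longrightarrow\;
  a^{1/2}\,\mathrm{d}a = \left(\frac{8\pi G \rho_0 a_0^3}{3}\right)^{1/2} \mathrm{d}t \\
\frac{2}{3}\,a^{3/2} &= \left(\frac{8\pi G \rho_0 a_0^3}{3}\right)^{1/2} t
  \;\Longrightarrow\;
  a \propto t^{2/3}, \qquad H = \frac{\dot{a}}{a} = \frac{2}{3t}
\end{align}
```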

Curvature

Cosmology by Numbers

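\[
\dot{a}^2 = \frac{8\pi G \rho}{3}\, a^2 - k c^2, \qquad
k = 0 \;\Longrightarrow\; H^2 = \left(\frac{\dot{a}}{a}\right)^2 = \frac{8\pi G \rho}{3}
\;\Longrightarrow\; \rho = \rho_c = \frac{3H^2}{8\pi G}
\]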

The “Critical Density”

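This applies at any time, but we usually take the “present” time. In general, Ω = ρ/ρ_c = 8πGρ/(3H²); at the present time,

\[ t = t_0, \quad H = H_0, \quad \rho = \rho_0, \quad \Omega_0 = \frac{8\pi G \rho_0}{3 H_0^2} \]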
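To get a feel for the size of the critical density, the formula can be evaluated in SI units. This is a minimal sketch: the value H0 = 70 km/s/Mpc is an assumed round number, not one quoted in the lecture, and the function name `critical_density` is hypothetical.

```python
import math

G = 6.674e-11            # gravitational constant, m^3 kg^-1 s^-2
MPC_IN_M = 3.0857e22     # one megaparsec in metres

def critical_density(h0_km_s_mpc):
    """rho_c = 3 H^2 / (8 pi G), in kg m^-3, for H given in km/s/Mpc."""
    h0_si = h0_km_s_mpc * 1.0e3 / MPC_IN_M   # convert to s^-1
    return 3.0 * h0_si**2 / (8.0 * math.pi * G)

rho_c = critical_density(70.0)   # H0 = 70 km/s/Mpc is an assumed value
print(f"rho_c = {rho_c:.2e} kg/m^3")
```

The result is of order 10^-26 kg per cubic metre, equivalent to a few hydrogen atoms per cubic metre.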

The Cosmic Tightrope

• We know the Universe doesn’t have either a very large or a very small Ω, or we wouldn’t be around.

• We exist and this fact is an observation about the Universe

• The most probable value of Ω is therefore very close to unity

• Still leaves the mystery of what trained the Universe to walk the tightrope (inflation?)
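The tightrope can be illustrated with a dust-only Friedmann model, for which the Friedmann equation with ρ ∝ 1/a³ gives (1/Ω − 1) ∝ a. The sketch below assumes that simplified model (no radiation, no cosmological constant) and normalises a = 1 today; the function name `omega_dust` and the present-day values are illustrative.

```python
def omega_dust(a, omega_0):
    """Density parameter at scale factor a (a = 1 today) for a dust-only model.

    Uses (1/Omega - 1) proportional to a, which follows from the Friedmann
    equation with rho proportional to 1/a^3.
    """
    return omega_0 / (omega_0 + (1.0 - omega_0) * a)

for omega_0 in (0.3, 3.0):            # illustrative present-day values
    for a in (1.0, 1e-3, 1e-6):
        dev = omega_dust(a, omega_0) - 1.0
        print(f"Omega_0 = {omega_0}: a = {a:g}, Omega - 1 = {dev:+.2e}")
```

Running the model backwards, any present-day Ω of order unity corresponds to an early value extremely close to 1. Read forwards in time, small early departures from Ω = 1 grow, which is why Ω = 1 behaves as an unstable fixed point.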

[Diagram: Theories and Observations, with the Frequentist and Bayesian approaches labelled]



“CONCORDANCE”

Cosmology is an exercise in data compression

Cosmology is a massive exercise in data compression...

…but it is worth looking at the information that has been thrown away to check that it makes sense!


How Weird is the Universe?

• The (zeroth-order) starting point is FLRW.

• The concordance cosmology is a “first-order” perturbation to this

• In it (and other “first-order” models), the initial fluctuations were a statistically homogeneous and isotropic Gaussian Random Field (GRF)

• These are the “maximum entropy” initial conditions having “random phases” motivated by inflation.

• Anything else would be weird….

P(A|M) ≠ P(M|A)!

Beware the Prosecutor’s Fallacy!
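A short numerical sketch shows how different the two conditional probabilities can be. All the numbers are hypothetical: A stands for some observed anomaly, M for the standard model, and a single alternative model is assumed for simplicity.

```python
def posterior_model(p_a_given_m, p_a_given_alt, prior_m):
    """P(M | A) from Bayes' theorem with a single alternative model."""
    evidence = p_a_given_m * prior_m + p_a_given_alt * (1.0 - prior_m)
    return p_a_given_m * prior_m / evidence

# Hypothetical numbers: an "anomaly" A that model M produces 1% of the time.
p_m_given_a = posterior_model(
    p_a_given_m=0.01,    # P(A | M): A is rare under M
    p_a_given_alt=0.05,  # P(A | alternative): only somewhat less rare
    prior_m=0.99,        # M was strongly favoured beforehand (assumed)
)
print(f"P(A | M) = 0.01 but P(M | A) = {p_m_given_a:.3f}")
```

An observation that is improbable under M does not by itself make M improbable; that depends on how well the alternatives predict it and on the priors.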

Is there an Elephant in the Room?

Types of CMB Anomalies

• Type I – obvious problems with data (e.g. foregrounds)

• Type II – anisotropies (North-South, Axis of Evil…)

• Type III – localized features, e.g. “The Cold Spot”

• Type IV – Something else (even/odd multipoles, magnetic fields, ?)

“If tortured sufficiently, data will confess to almost anything”

Fred Menger

Weirdness in Phases

\[ \frac{\Delta T(\theta, \phi)}{T} = \sum_{l} \sum_{m} a_{l,m}\, Y_{lm}(\theta, \phi) \]

\[ a_{l,m} = |a_{l,m}| \exp\left[ i \varphi_{l,m} \right] \]

For a homogeneous and isotropic Gaussian random field (on the sphere) the phases are independent and uniformly distributed. Non-random phases therefore indicate weirdness.
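The statement about uniform phases can be checked with a toy simulation. This is a simplified sketch, not a CMB analysis: the coefficients are plain independent complex Gaussians (the reality condition linking m and −m for a real field is ignored), and the mean resultant length R is just one assumed choice of summary statistic.

```python
import cmath
import random

rng = random.Random(2015)   # fixed seed so the sketch is reproducible

# Draw complex Gaussian coefficients (independent real and imaginary parts),
# a stand-in for the a_lm of a Gaussian random field.
n_modes = 10_000
coeffs = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_modes)]
phases = [cmath.phase(c) for c in coeffs]

# Mean resultant length: near 0 for uniform phases, near 1 if phases align.
resultant = abs(sum(cmath.exp(1j * p) for p in phases)) / n_modes

# Same statistic for deliberately non-random phases (all within 0.5 rad).
aligned = [0.5 * rng.random() for _ in range(n_modes)]
resultant_aligned = abs(sum(cmath.exp(1j * p) for p in aligned)) / n_modes

print(f"random phases:  R = {resultant:.3f}")
print(f"aligned phases: R = {resultant_aligned:.3f}")
```

A statistic like R only detects a preferred phase direction. Correlations between the phases of different modes would need other tests, so a small R is necessary rather than sufficient for "random phases".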

Final Points
