Date post: 10-Jun-2020
CRICOS No. 00213J · a university for the real world · Bayesian modelling in the big data era: case studies in understanding neurodegenerative disease · Kerrie Mengersen, Distinguished Professor, Statistics, QUT

Bayesian modelling in the big data era: case studies in understanding neurodegenerative disease

Kerrie Mengersen
Distinguished Professor
Statistics, QUT


Acknowledgements

• Peter Silburn (MD, PD)

• Graham Kerr (QUT, PD)

• James Doecke (CSIRO, AD)

• BRAG (QUT)

• ACEMS (QUT)


https://www.youtube.com/watch?v=PI7SLOovO5c


Tackling Big Data with Data Science

DataScience@QUT

Australian Data Science Network


Partnerships in research and translation
Domain-specific data science research
Fundamental data science research

Mathematical Sciences + Computer Science
DataScience@QUT + Collaborative Domains
Australian Data Science Network + External Partners


Advancing Training in Data Science
Creating and enhancing data science capability and capacity

Research Training
• HDRs and ECRs
• Undergraduate vacation research programs
• Targeted programs, e.g., Women in Data Science, Indigenous Data Science

Knowledge Exchange
• Peer-to-peer connections: internships, professional sabbaticals
• Short courses, seminars, conferences, workshops, MOOC
• Directory of Data Science Resources

QUT Activities
• Master of Data Analytics
• Work Integrated Learning undergraduate program
• QUT EX Program
• e-Research Training Program

Linkages
• ARC Training Centres


Bayesian Research and Applications Group (BRAG) at QUT

Modelling
• Combining data sources
• Modelling with uncertainty
• Using prior information
• Probabilistic prediction
• Risk stratification
• Complex systems models

Computational Methods
• Algorithms (MCMC, ABC, VB, HMC)
• Software (R, Python, Julia)
• Approximations
• Ensembles
• Platforms
• Dimension reduction
• Parallelisation, Sketching
• Distributed learning

Applications
• Health
• Environment
• Industry
• Complex systems
• Varied information sources
• Varied data sources (citizen science, VR, drones, digital data)


Meeting the challenge: “New Bayes”

Models:
• Probabilistic
• Regularised
• Flexible
• Robust
• Transferable
• Adaptive

Computation:
• Scalable (parallelisable)
• Subsampling
• Pre-computable
• Approximations (e.g. ABC)

Inference:
• Estimation
• Optimisation
• Uncertainty quantification
• Testing
• Model averaging

"In the past ten years, it's hard to find anything that doesn't advocate a Bayesian approach." - Nate Silver


Bayesian Modelling

$$p(\theta \mid y) = \frac{p(y \mid \theta)\, p(\theta)}{p(y)}$$

Likelihood × Prior / Normalising constant

$$p(\theta \mid y) = \frac{p(\theta)\, p(y \mid \theta)}{p(y)}$$

Prior × Likelihood / Normalising constant
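As a minimal illustration of the rule on this slide, the sketch below multiplies a prior by a likelihood on a grid of parameter values and divides by a numerical approximation of the normalising constant. The Beta(2, 2) prior, the binomial data (7 successes in 10 trials) and the function name `grid_posterior` are assumptions chosen for illustration; none of them come from the talk.

```python
import numpy as np


def grid_posterior(theta, prior, likelihood):
    """Posterior density on a uniform grid: prior x likelihood / normalising constant."""
    unnormalised = prior * likelihood
    step = theta[1] - theta[0]
    evidence = unnormalised.sum() * step  # numerical approximation of p(y)
    return unnormalised / evidence


# Grid of midpoints on (0, 1) for a success probability theta (assumed example).
theta = np.linspace(0.0005, 0.9995, 1000)
prior = theta * (1.0 - theta)                # Beta(2, 2), up to a constant
likelihood = theta**7 * (1.0 - theta) ** 3   # 7 successes in 10 trials, up to a constant

posterior = grid_posterior(theta, prior, likelihood)
step = theta[1] - theta[0]
posterior_mean = (theta * posterior).sum() * step
```

Because this prior is conjugate, the grid result can be checked against the exact Beta(9, 5) posterior, whose mean is 9/14.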


Bayesian spatial image analysis

MRI scans of brains

https://www.radiologyinfo.org/en/info.cfm?pg=alzheimers


Approaches

• Clair Alston-Knox – spatial mixture models

• Chris Strickland – spatial dynamic factor models

• Zoe van Havre – overfitted mixture models

• Matthew Moores – scalable approximate algorithms and pre-processing

• Cathy Hargrave – feature alignment

• Marcela Cespedes – hierarchical multivariate models

• Hongbo Xi – sparse matrix factorisation

• Insha Ullah – PCA approaches for high-D variable selection

• Jacinta Holloway – decision-tree methods

• Aleysha Thomas – ensemble meta-analysis + Bayesian network models


1. Parkinson’s Disease (PD)
• PD is a common neurodegenerative disorder that affects 0.3% of the general population.
• 4% of the cases are under the age of 50 years.
• Onset of PD is often mistaken for normal healthy ageing.
• There is limited literature on the age at PD onset. A deeper understanding of the age at onset could lead to better clinical assessments and timely management.


Parkinson’s Disease (PD)
• Although some non-genetic risk factors have been shown to have a strong influence on PD, most of these have been studied individually, and only a few have been studied in association with the age at PD onset.
• One factor of increasing interest is organochlorine pesticide (OCP) exposure.
• What are the combined effects of non-genetic risk factors and pesticide exposure on the age at onset of Parkinson’s Disease?


Ensemble Approach
• Meta-analysis to obtain odds ratio estimates from previous studies on the effect of individual risk factors on age at PD onset, and to merge inferences on five OCPs into a single distribution.
• Bayesian Network to combine the results of the meta-analysis with information on PD patients from multiple data sources.


Data
• Risk factor information collected on a cohort of 350 PD patients as part of the Queensland Parkinson’s Project (QPP).
• Concentrations of OCPs (HCB, β-HCH, trans-nonachlor, p,p’-DDE and p,p’-DDT) measured from pooled samples of human blood serum from males and females collected in Brisbane, Australia, in age groups 5-15, 16-30, 31-45, 46-60 and >60 years.
• Estimates of the association between risk factors and age at onset obtained from a systematic review: articles that reported an odds ratio (OR) and a 95% CI were included.


Literature Review


Meta-analysis model
Fit separate models for each age group and gender.
Let $y_i$ be the estimated log odds ratio for early age at onset of PD (<50 years old) associated with exposure to pesticide for the ith study.
• Fit a random-effects meta-analysis model to combine effects across studies.
The overall combined OCP concentrations ($\theta_0$) for each age group and gender were used to parameterise part of the Bayesian network (BN).
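A rough sketch of random-effects pooling follows. It uses the DerSimonian-Laird moment estimator, a frequentist stand-in chosen for brevity, and is not the Bayesian model fitted in the talk. The odds ratios and confidence limits are hypothetical, and the function names are illustrative only.

```python
import numpy as np


def log_or_and_se(odds_ratio, ci_low, ci_high):
    """Convert a reported OR and its 95% CI into a log-OR and standard error."""
    y = np.log(odds_ratio)
    se = (np.log(ci_high) - np.log(ci_low)) / (2 * 1.96)
    return y, se


def random_effects_pool(y, se):
    """DerSimonian-Laird random-effects pooling of study effects y_i."""
    y, se = np.asarray(y, float), np.asarray(se, float)
    w = 1.0 / se**2                                   # fixed-effect weights
    fixed = np.sum(w * y) / np.sum(w)
    q = np.sum(w * (y - fixed) ** 2)                  # Cochran's Q
    c = np.sum(w) - np.sum(w**2) / np.sum(w)
    tau2 = max(0.0, (q - (len(y) - 1)) / c)           # between-study variance
    w_star = 1.0 / (se**2 + tau2)                     # random-effects weights
    pooled = np.sum(w_star * y) / np.sum(w_star)
    pooled_se = np.sqrt(1.0 / np.sum(w_star))
    return pooled, pooled_se, tau2


# Hypothetical study results (OR with 95% CI), for illustration only.
ors = np.array([1.8, 2.4, 1.2, 3.0])
lows = np.array([1.1, 1.3, 0.7, 1.5])
highs = np.array([2.9, 4.4, 2.1, 6.0])

y, se = log_or_and_se(ors, lows, highs)
pooled, pooled_se, tau2 = random_effects_pool(y, se)
pooled_or = np.exp(pooled)
```

A Bayesian version would place priors on the overall effect and the between-study variance and sample from the posterior instead of plugging in a moment estimate.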


Meta-analysis results: OR of early age at onset of PD associated with exposure to pesticide


Bayesian Network


Bayesian Network: strength of influence


Bayesian Network: varying the evidence

Large variation in outcome associated with:
- OCP exposure
- Head injury
- Both head injury and family history

Small variation associated with:
- Smoking
- Alcohol
- Presence of both led to a lower probability of an early age at onset

The absence of one or both medical history risk factors was associated with a lower probability of early age at onset.
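"Varying the evidence" in a Bayesian network means fixing some nodes to observed values and recomputing the probability of the outcome. The toy network below is a sketch of that idea only: three independent binary risk factors feed one outcome node through an assumed noisy-OR table. The structure, every probability and all names are hypothetical and are not the network or estimates from the talk.

```python
from itertools import product

# Hypothetical prior probabilities that each binary risk factor is present.
P_PARENT = {"ocp_high": 0.30, "head_injury": 0.20, "family_history": 0.15}

# Hypothetical noisy-OR strengths: chance that a factor alone triggers early onset.
STRENGTH = {"ocp_high": 0.30, "head_injury": 0.25, "family_history": 0.20}
LEAK = 0.05  # hypothetical baseline probability with no risk factor present


def p_early_onset(state):
    """P(early onset | all parent states) under the assumed noisy-OR table."""
    p_no_onset = 1.0 - LEAK
    for name, present in state.items():
        if present:
            p_no_onset *= 1.0 - STRENGTH[name]
    return 1.0 - p_no_onset


def query_early_onset(evidence):
    """P(early onset | evidence), summing over the unobserved parents."""
    names = list(P_PARENT)
    total = 0.0
    for values in product([0, 1], repeat=len(names)):
        state = dict(zip(names, values))
        if any(state[n] != v for n, v in evidence.items()):
            continue  # this state contradicts the evidence
        weight = 1.0
        for n in names:
            if n not in evidence:
                weight *= P_PARENT[n] if state[n] else 1.0 - P_PARENT[n]
        total += weight * p_early_onset(state)
    return total


baseline = query_early_onset({})
high_ocp = query_early_onset({"ocp_high": 1})
low_ocp = query_early_onset({"ocp_high": 0})
```

Comparing `high_ocp` with `low_ocp` mimics setting the exposure node to each state and reading off the change in the outcome node.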


Conclusions
• Family history, prior head injury and OCP exposure are strongly associated with an earlier age at PD onset.
• Irrespective of other risk factors, OCP exposure has a strong influence on the probability of an early age at onset: high exposure is linked to a higher probability of early onset compared to low exposure.


2. Spike sorting
Zoe van Havre, Nicole White, Judith Rousseau, Kerrie Mengersen

Extracellular recordings provide real-time monitoring of brain activity.
Measurement of action potentials (spikes) indicates the neuron populations present in the region of interest.
Spike sorting assigns individual spikes to source neurons.
We want to analyse spikes collected from the subthalamic nucleus during Deep Brain Stimulation, a surgical intervention for the alleviation of symptoms in patients with advanced Parkinson's disease (PD).


Deep Brain Stimulation
• In PD, parts of the basal ganglia are either under- or over-stimulated. Normal movement is replaced by tremor, rigidity and stiffness.
• DBS of specific ganglia alters the abnormal electrical circuits and helps stabilize the feedback loops, thus reducing symptoms.
• Electrodes can be placed in the subthalamic nucleus, thalamus or globus pallidus.



Data
• Three independent samples, $Y_1$, $Y_2$, $Y_3$
• Dimension reduction performed using robust Principal Components Analysis (PCA). The first four principal components (PCs) were used as inputs into the model.
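The dimension-reduction step can be sketched as follows. This uses ordinary PCA via the singular value decomposition, not the robust PCA used in the talk, and the simulated "waveforms" (200 spikes of 32 samples each) are placeholders rather than real recordings.

```python
import numpy as np


def pca_scores(waveforms, n_components=4):
    """Project each waveform onto its leading principal components."""
    centred = waveforms - waveforms.mean(axis=0)
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]  # component scores per spike
    explained = s**2 / np.sum(s**2)                   # share of variance per component
    return scores, explained[:n_components]


# Simulated stand-in data: 200 spikes, 32 time samples per spike.
rng = np.random.default_rng(0)
spikes = rng.normal(size=(200, 32))

scores, explained = pca_scores(spikes, n_components=4)
```

Each row of `scores` is the four-dimensional observation that would then be passed to a clustering model.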


Model setup
• Problem: infer the partition of n multivariate observations into K clusters (K unknown).
• For the sample $y = (y_1, \dots, y_n)$, let $y_i = (y_{i1}, \dots, y_{ir})$ consist of r measurements associated with observation i.
• Cluster membership for each $y_i$ is inferred via the discrete latent variable $z_i$, where $z_i = k$ denotes the assignment of $y_i$ to cluster k:

$$p(\boldsymbol{y}_i \mid z_i = k, \boldsymbol{\theta}_k) = N_r(\boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)$$

• Priors:

$$p(\mu_k \mid \Sigma_k) = N_r\!\left(b_0, \frac{\Sigma_k}{N_0}\right); \qquad p(\Sigma_k) = IW(c_0, C_0)$$


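1: Overfitted finite Gaussian mixture model
• Likelihood, with $K^* > K$ components:

$$p(\boldsymbol{y} \mid \boldsymbol{\theta}, \boldsymbol{\pi}) = \prod_{i=1}^{n} \sum_{k=1}^{K^*} \pi_k N_r(\boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)$$

• Priors:

$$z_i \mid \boldsymbol{\pi} \sim MN(1; \pi_1, \dots, \pi_{K^*})$$

$$\pi_1, \dots, \pi_{K^*} \sim D(\alpha, \dots, \alpha)$$

• Set $b_0 = \bar{y}$, $N_0 = 0.01$, $c_0 = 5$, $C_0 = 0.75\,\mathrm{cov}(y)$.
• Choose $\alpha$ by prior tempering (ZMix).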
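The role of the Dirichlet hyperparameter α in an overfitted mixture can be seen from the prior alone: small values push most of the K* weights towards zero, so superfluous components tend to stay empty. The sketch below only draws weights from the prior and counts how many exceed a small threshold; it is not the ZMix sampler, and the values of α, K* = 10 and the threshold 0.01 are assumptions for illustration.

```python
import numpy as np


def sample_weights(alpha, n_components, n_draws, seed=0):
    """Draw mixture weights from a symmetric Dirichlet(alpha, ..., alpha) prior."""
    rng = np.random.default_rng(seed)
    return rng.dirichlet(np.full(n_components, alpha), size=n_draws)


def mean_occupied(weights, threshold=0.01):
    """Average number of components whose weight exceeds the threshold."""
    return (weights > threshold).sum(axis=1).mean()


# Assumed settings: K* = 10 components, 2000 prior draws for each alpha.
sparse = mean_occupied(sample_weights(alpha=0.01, n_components=10, n_draws=2000))
diffuse = mean_occupied(sample_weights(alpha=1.0, n_components=10, n_draws=2000))
```

Under the small α the prior typically supports only a few non-negligible weights, whereas α = 1 spreads mass over most of the ten components.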


2: Dirichlet Process model
Random measure G characterized by the mean $G_0$ of the DP and the concentration parameter $m$:

$$y_i \mid \theta_i \sim p(y_i \mid \theta_i)$$

$$\theta_i \mid G \sim G$$

$$G \sim DP(m G_0)$$


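2: Dirichlet Process mixture model

$$G = \sum_{k=1}^{\infty} \pi_k \delta_{\theta_k}$$

$$\pi_k = \nu_k \prod_{l<k} (1 - \nu_l)$$

$$\nu_k \sim \mathrm{Beta}(1, m)$$

$$\theta_k \mid G_0 \sim G_0$$

$$m \sim \mathrm{Gamma}(1, 1)$$

• Implemented using a slice sampler.
• The Posterior Expected Adjusted Rand (PEAR) index was used as a MAP estimator for z.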
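The stick-breaking construction on this slide can be simulated directly by truncating the infinite sum at a finite number of sticks. In the sketch below the truncation level (50), the standard normal base measure $G_0$ and the function name are assumptions for illustration; the talk's implementation uses a slice sampler instead of truncation.

```python
import numpy as np


def stick_breaking_weights(m, n_sticks, rng):
    """Truncated stick-breaking weights pi_k = nu_k * prod_{l<k} (1 - nu_l)."""
    nu = rng.beta(1.0, m, size=n_sticks)
    # Length of stick remaining before each break: 1, (1-nu_1), (1-nu_1)(1-nu_2), ...
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - nu)[:-1]))
    return nu * remaining


rng = np.random.default_rng(1)
m = rng.gamma(shape=1.0, scale=1.0)               # m ~ Gamma(1, 1)
weights = stick_breaking_weights(m, n_sticks=50, rng=rng)
atoms = rng.normal(loc=0.0, scale=1.0, size=50)   # theta_k ~ G0, here N(0, 1) by assumption
```

The pair (`weights`, `atoms`) is one truncated draw of the random measure G; the weights sum to slightly less than one because of the truncation.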


Results: posterior distribution of number of occupied components


Results: optimal partitioning



Results: frequency of cluster membership


Conclusions
• Both methods could identify high probability clusters comprising spikes with similar trajectories.
• The uncertainty in the clustering was caused by a small number of quite different spikes in each dataset.
• The DPM captured this through a variable number of small clusters.
• The OFM captured this by combining the ‘outliers’ into a single group with a large covariance (multivariate Gaussian noise component). This prevented the interpretation of the smallest clusters and fine structure.


From PD spikes to AD pre-clinical diagnosis
• Accumulation of β-amyloid (Aβ) begins 15-25 years prior to the clinical classification of dementia.

The cortex shrivels up, damaging areas involved in thinking, planning and remembering.
Shrinkage is especially severe in the hippocampus (formation of new memories).
Ventricles (fluid-filled spaces) grow larger.
https://www.keepmemoryalive.org/brain-science/alzheimers-brain


From PD spikes to AD pre-clinical diagnosis
• We wanted to identify pre-clinical Alzheimer's Disease in a population of elderly cognitively normal participants.
• We sampled 761 clinically normal (CN) participants at 4 periods (0-54 months) in the Australian Imaging, Biomarkers and Lifestyle (AIBL) study of ageing.
• We used six standardised composite neuropsychological scores: verbal episodic memory; visual memory; executive function; language; attention and processing speed; visuo-spatial functioning.


Method
• We fitted a Bayesian mixture model to each composite score and time point, using ZMix.
• We defined an aggregate measure of posterior probabilities (AMPP score) to establish the likelihood of pre-clinical AD.
• For the ith person, summing over the 24 fitted models (6 composite scores × 4 time points),

$$AMPP_i = \sum_{j=1}^{24} \Pr\big(z_{ij} = k \mid \Pr(\mu_k < 0) > 0.95\big)$$

• We compared these results with groupings based on clinical measures, PET and MRI scans.
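One possible reading of the AMPP score, sketched below, is: for each fitted model j, find the mixture components whose mean is below zero with posterior probability above 0.95, then add up person i's posterior probability of belonging to those components. This interpretation, the array layout and the toy numbers are assumptions and may differ from the published definition.

```python
import numpy as np


def ampp_scores(alloc_prob, prob_mean_below_zero, cutoff=0.95):
    """Aggregate posterior allocation probabilities over 'low-scoring' components.

    alloc_prob: array (n_people, n_models, n_components), Pr(z_ij = k).
    prob_mean_below_zero: array (n_models, n_components), Pr(mu_k < 0) per model.
    """
    low_component = prob_mean_below_zero > cutoff     # components flagged as low
    return (alloc_prob * low_component[None, :, :]).sum(axis=(1, 2))


# Toy example: 2 people, 2 models, 2 components (hypothetical numbers).
alloc = np.array([
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.1, 0.9], [0.5, 0.5]],
])
below_zero = np.array([[0.99, 0.10], [0.50, 0.97]])

scores = ampp_scores(alloc, below_zero)
```

With 24 models per person the score ranges from 0 to 24, with larger values indicating more consistent membership of low-scoring groups.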


Results
AMPP for 0 vs 18 months, for each composite score.
Scale for AMPP: green = low; yellow = medium; red/black = high.


Results
[Figure panels: Low, Moderate, High]


Results
• From baseline through to 54 months, visuo-spatial function had the greatest contribution to the AMPP score, followed by attention and processing speed and visual memory.
• Participants with the highest AMPP scores had both increasing neo-cortical amyloid burden and decreasing hippocampus volume over 54 months, compared to those in the lowest category with stable amyloid burden and hippocampus volume.
• This approach can provide an indication of pre-clinical AD.


3. Alzheimer's Disease
Compare brain networks in normal and AD patients
https://physicsworld.com/a/towards-a-vaccine-for-alzheimers-disease/


Pearson pairwise correlations
[Figure panel labels: Mild, Normal, Alzheimers]


Bayesian Hierarchical Model
$y_{irk}$: cortical thickness of region k = 1:K for participant i = 1:I, who has r = 1:$R_i$ replicates

$$y_{irk} \mid b_{ik}, \beta, \sigma^2 \sim N(x_i \beta + b_{ik}, \sigma^2)$$

$$\boldsymbol{b}_i \mid \sigma_s^2, W \sim MVN(\boldsymbol{0}, \sigma_s^2 Q)$$

$$Q^{-1} = \rho (D_w - W) + (1 - \rho) I$$

$D_w$: diagonal matrix with elements given by row sums (or number of neighbours), $\sum_{j=1}^{K} w_{jk}$
$W$: zero-diagonal, binary symmetric matrix, $w_{jk} = 1$ if regions j and k are neighbours, else 0
$\rho$: determines global level of spatial correlation

Typically $\rho$ is fixed – can we estimate it?
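The precision structure on this slide is simple to build numerically. The sketch below constructs $Q^{-1} = \rho(D_w - W) + (1-\rho)I$ for a hypothetical four-region chain, where each region neighbours the next; the adjacency matrix, the value ρ = 0.9 and the function name are assumptions for illustration. This form is often referred to as a Leroux-type conditional autoregressive prior.

```python
import numpy as np


def car_precision(w, rho):
    """Return Q^{-1} = rho * (D_w - W) + (1 - rho) * I for adjacency matrix W."""
    w = np.asarray(w, dtype=float)
    d_w = np.diag(w.sum(axis=1))          # number of neighbours on the diagonal
    return rho * (d_w - w) + (1.0 - rho) * np.eye(w.shape[0])


# Hypothetical adjacency: four regions in a chain 1-2-3-4.
W = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])

Q_inv = car_precision(W, rho=0.9)
Q = np.linalg.inv(Q_inv)                  # covariance structure of b_i, up to sigma_s^2
```

Setting ρ = 0 gives independent region effects (identity precision), while ρ close to 1 approaches the intrinsic CAR model with strong smoothing between neighbours.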


Estimating the neighbourhood

Computation: MCMC


Results: simulation


Results: case study
[Figure panels: Normal (N=120), Alzheimers (N=20)]


Comparison of neurodegeneration over time
• Australian Imaging, Biomarkers and Lifestyle (AIBL) study of ageing
• Neuroimaging data across healthy control (HC), mild cognitive impairment (MCI) and AD groups
• Focus on ventricle and hippocampus regions
• Three types of inference:
1. comparisons of estimated rates of population deterioration
2. ranking of participants by order of linear volumetric rate of change
3. probability trajectories across age of diagnosis groups


Results
(i) Large differences in average rate of change of volume for the ventricle and hippocampus regions across diagnosis groups
(ii) High-risk individuals who had progressed from HC to MCI and displayed similar rates of deterioration to their AD counterparts
(iii) Critical time points which indicate where deterioration of regions begins to diverge between the diagnosis groups


Computation: Bayesian Analysis via AutoStat https://autostat.com.au/


Intelligent data collection via (adaptive) design
If we have a specific question, we don’t need to analyse all of the data.
Use experimental design principles to select the data required to answer the question.

CC Drovandi, CC Holmes, JM McGree, K Mengersen, S Richardson, EG Ryan, Principles of Experimental Design for Big Data Analysis, Statistical Science, 32 (3), 385–404, 2017


A decision analysis approach to experimental design


Bayesian experimental design


Experimental design in the context of big data
1. Answer questions of interest: Find the optimal (or near optimal) design to answer the question and use the design as a ‘template’ for sub-sampling the data.
2. Sequential learning: Apply a given design to incoming data or new datasets until the question of interest is answered.
3. Assess data quality: Absence of design points/windows may indicate structured missingness or bias in the big dataset.
4. Assess model quality: Replicate designs can be ‘laid over’ the big data for model checking (e.g. posterior predictives), concept drift etc.
5. Enlarge loss function: Include model misspecification, time constraints etc.


Example: logistic regression

• 6 covariates

• 1,000,000 records

Analysis aims:

- Identification of important covariates for prediction

- Accurate and precise parameter estimates


Experimental design approach
• Select a random sample of 10,000 points to construct prior distributions.
• “Value add” to the information gained through a sequential design process.
• Use Sequential Monte Carlo for fast computation.
• For each new data point, update the prior information to reflect the information gained and form a 95% credible interval for all parameters.
• If the credible interval for any parameter is contained within (−tol, tol), drop the corresponding covariate from the model.
• Re-fit the reduced model and re-run.
• Iterate until 20,000 data points are extracted.
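The covariate-dropping rule in the steps above can be sketched on its own, given posterior draws of the regression coefficients. The draws below are simulated from normal distributions purely as a stand-in for Sequential Monte Carlo output, and the tolerance `tol = 0.1` and function name are assumptions; the sequential design and re-fitting steps are not implemented here.

```python
import numpy as np


def covariates_to_keep(samples, tol, level=0.95):
    """Flag coefficients whose credible interval is NOT inside (-tol, tol).

    samples: array (n_draws, n_coefficients) of posterior draws.
    Returns a boolean keep-mask plus the lower and upper interval limits.
    """
    tail = (1.0 - level) / 2.0
    lower = np.quantile(samples, tail, axis=0)
    upper = np.quantile(samples, 1.0 - tail, axis=0)
    inside = (lower > -tol) & (upper < tol)   # interval entirely within (-tol, tol)
    return ~inside, lower, upper


# Simulated stand-in for posterior draws of three coefficients.
rng = np.random.default_rng(2)
draws = np.column_stack([
    rng.normal(1.0, 0.05, size=5000),    # clearly non-zero effect
    rng.normal(0.0, 0.01, size=5000),    # effect concentrated near zero
    rng.normal(-0.5, 0.05, size=5000),   # clearly non-zero effect
])

keep, lower, upper = covariates_to_keep(draws, tol=0.1)
```

In the full procedure the model would be re-fitted with only the columns where `keep` is true before the next data point is selected.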


Tackling Big Data with Bayesian Statistics

DataScience@QUT

Australian Data Science Network



QUT Centre for Data Science


References

M Cespedes, J McGree, CC Drovandi, K Mengersen, LB Reid, JD Doecke, J Fripp (2017) A Bayesian hierarchical approach to jointly model structural biomarkers and covariance networks. arXiv.

M Cespedes, J Fripp, JM McGree, CC Drovandi, K Mengersen, JD Doecke (2017) Comparison of neurodegeneration over time between healthy ageing and Alzheimer’s disease cohorts via Bayesian inference. BMJ Open, 7(2).

Z van Havre, P Manuff, V Villemagne, K Mengersen, J Rousseau, N White, J Doecke (2019) Identification of pre-clinical Alzheimer’s Disease in a population of elderly cognitively normal participants. Journal of Alzheimer’s Disease, 1-10.

Z van Havre, N White, J Rousseau, K Mengersen (2015) Clustering action potential spikes: Insights on the use of overfitted finite mixture models and Dirichlet process mixture models. arXiv preprint arXiv:1602.01915.

A Thomas, NM White, LM Leontjew Toms, K Mengersen (2018) Application of ensemble methods to analyse the decline of organochlorine pesticides in relation to the interactions between age, gender and time. PLOS ONE.

