Applied Bayesian Analysis, O'Hagan

Date post: 13-Apr-2015
Upload: fredy-chungara

The Oxford Handbook of Applied Bayesian Analysis

Edited by
Anthony O'Hagan
Mike West

Great Clarendon Street, Oxford OX2 6DP

Oxford University Press is a department of the University of Oxford. It furthers the University's objective of excellence in research, scholarship, and education by publishing worldwide in

Oxford New York
Auckland Cape Town Dar es Salaam Hong Kong Karachi Kuala Lumpur Madrid Melbourne Mexico City Nairobi New Delhi Shanghai Taipei Toronto

With offices in
Argentina Austria Brazil Chile Czech Republic France Greece Guatemala Hungary Italy Japan Poland Portugal Singapore South Korea Switzerland Thailand Turkey Ukraine Vietnam

Oxford is a registered trademark of Oxford University Press in the UK and in certain other countries

Published in the United States by Oxford University Press Inc., New York

© Oxford University Press 2010

The moral rights of the authors have been asserted
Database right Oxford University Press (maker)

First published 2010

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without the prior permission in writing of Oxford University Press, or as expressly permitted by law, or under terms agreed with the appropriate reprographics rights organization.
Enquiries concerning reproduction outside the scope of the above should be sent to the Rights Department, Oxford University Press, at the address above

You must not circulate this book in any other binding or cover and you must impose the same condition on any acquirer

British Library Cataloguing in Publication Data
Data available

Library of Congress Cataloging in Publication Data
Data available

Typeset by SPI Publisher Services, Pondicherry, India
Printed in Great Britain on acid-free paper by CPI Antony Rowe, Chippenham, Wiltshire

ISBN 978-0-19-954890-3

1 3 5 7 9 10 8 6 4 2

Contents

Preface
  Anthony O'Hagan and Mike West
List of Contributors

Part I Biomedical and Health Sciences

1 Flexible Bayes regression of epidemiologic data
  David B. Dunson
  1.1 Introduction
  1.2 Mixture models
  1.3 Density regression for pregnancy outcomes
  1.4 Discussion
  Appendix
  References

2 Bayesian modelling for matching and alignment of biomolecules
  Peter J. Green, Kanti V. Mardia, Vysaul B. Nyirongo and Yann Ruffieux
  2.1 Introduction
  2.2 A Bayesian hierarchical model for pairwise matching
  2.3 Alignment of multiple configurations
  2.4 Data analysis
  2.5 Further discussion
  Appendix
  References

3 Bayesian approaches to aspects of the Vioxx trials: Non-ignorable dropout and sequential meta-analysis
  Jerry Cheng and David Madigan
  3.1 Introduction
  3.2 Sequential meta-analysis
  3.3 Non-ignorable dropout
  3.4 Conclusion
  Appendix
  References

4 Sensitivity analysis in microbial risk assessment: Vero-cytotoxigenic E. coli O157 in farm-pasteurized milk
  Jeremy E. Oakley and Helen E. Clough
  4.1 Introduction
  4.2 Microbial risk assessment
  4.3 Vero-cytotoxic Escherichia coli O157 in milk sold as pasteurized
  4.4 A contamination assessment model
  4.5 Model input distributions
  4.6 Model output analysis
  4.7 Further discussion
  Appendix
  References

5 Mapping malaria in the Amazon rain forest: A spatio-temporal mixture model
  Alexandra M. Schmidt, Jennifer A. Hoeting, João Batista M. Pereira and Pedro P. Vieira
  5.1 Introduction
  5.2 Motivation
  5.3 A multivariate Poisson lognormal model
  5.4 Results
  5.5 Further discussion
  Appendix
  References

6 Trans-study projection of genomic biomarkers in analysis of oncogene deregulation and breast cancer
  Dan Merl, Joseph E. Lucas, Joseph R. Nevins, Haige Shen and Mike West
  6.1 Oncogene pathway deregulation and human cancers
  6.2 Modelling and data analysis
  6.3 Biological evaluation and pathway annotation analysis
  Appendices
  References

7 Linking systems biology models to data: A stochastic kinetic model of p53 oscillations
  Daniel A. Henderson, Richard J. Boys, Carole J. Proctor and Darren J. Wilkinson
  7.1 Introduction
  7.2 Stochastic kinetic model
  7.3 Data
  7.4 Linking the model to the data
  7.5 Posterior computation
  7.6 Inference based on single cell data
  7.7 Inference based on multiple cells
  7.8 Further discussion
  Appendix
  References

8 Paternity testing allowing for uncertain mutation rates
  A. Philip Dawid, Julia Mortera and Paola Vicard
  8.1 Introduction
  8.2 Simple paternity testing
  8.3 Mutation
  8.4 Case analysis with assumed mutation rate
  8.5 Uncertain mutation rate
  8.6 Paternity casework data
  8.7 The likelihood for the mutation rate
  8.8 Data analysis for mutation rate
  8.9 Application to new case
  8.10 Further discussion
  Appendix
  References

Part II Industry, Economics and Finance

9 Bayesian analysis and decisions in nuclear power plant maintenance
  Elmira Popova, David Morton, Paul Damien and Tim Hanson
  9.1 Introduction
  9.2 Maintenance model
  9.3 Optimization results
  9.4 Data and Bayesian models
  Appendix
  References

10 Bayes linear uncertainty analysis for oil reservoirs based on multiscale computer experiments
  Jonathan A. Cumming and Michael Goldstein
  10.1 Introduction
  10.2 Preliminaries
  10.3 Uncertainty analysis for the Gullfaks reservoir
  Appendix
  References

11 Bayesian modelling of train door reliability
  Antonio Pievatolo and Fabrizio Ruggeri
  11.1 Train door reliability
  11.2 Modelling and data analysis
  11.3 Further discussion
  Appendix
  References

12 Analysis of economic data with multiscale spatio-temporal models
  Marco A. R. Ferreira, Adelmo I. Bertolde and Scott H. Holan
  12.1 Introduction
  12.2 Multiscale factorization
  12.3 Exploratory multiscale data analysis
  12.4 Dynamic multiscale modelling
  12.5 Estimation
  12.6 Agricultural production in Espírito Santo
  12.7 Further discussion
  Appendix
  References

13 Extracting S&P500 and NASDAQ volatility: The credit crisis of 2007–2008
  Hedibert F. Lopes and Nicholas G. Polson
  13.1 Introduction
  13.2 Models
  13.3 Sequential learning via particle filtering
  13.4 Empirical results
  13.5 Conclusions
  Appendix
  References

14 Futures markets, Bayesian forecasting and risk modelling
  José M. Quintana, Carlos M. Carvalho, James Scott and Thomas Costigliola
  14.1 Introduction
  14.2 Subjective expectations
  14.3 Futures markets
  14.4 Bayesian speculation
  14.5 Bayesian forecasting
  14.6 Risk modelling
  14.7 Conclusions
  Appendix
  References

15 The new macroeconometrics: A Bayesian approach
  Jesús Fernández-Villaverde, Pablo Guerrón-Quintana and Juan F. Rubio-Ramírez
  15.1 Introduction
  15.2 A benchmark new Keynesian model
  15.3 Empirical analysis
  15.4 Lines of further research
  Appendix
  References

Part III Environment and Ecology

16 Assessing the probability of rare climate events
  Peter Challenor, Doug McNeall and James Gattiker
  16.1 Introduction
  16.2 Climate models
  16.3 Inference from climate models
  16.4 Application to the collapse of the meridional overturning circulation
  16.5 Multivariate and high dimensional emulators
  16.6 Uncertainty analysis
  16.7 Summary and further discussions
  Appendix
  References

17 Models for demography of plant populations
  James S. Clark, Dave Bell, Michael Dietze, Michelle Hersh, Ines Ibanez, Shannon L. LaDeau, Sean McMahon, Jessica Metcalf, Emily Moran, Luke Pangle and Mike Wolosin
  17.1 Introduction
  17.2 Demographic data
  17.3 Models to synthesize data and previous knowledge
  17.4 Prior distributions
  17.5 Computation
  17.6 Diagnostics
  17.7 Summarizing the complexity
  17.8 Potential
  Appendix
  References

18 Combining monitoring data and computer model output in assessing environmental exposure
  Alan E. Gelfand and Sujit K. Sahu
  18.1 Introduction
  18.2 Algorithmic and pseudo-statistical approaches in weather prediction
  18.3 Review of data fusion methods for environmental exposure
  18.4 A downscaling approach
  18.5 Further discussion
  Appendix
  References

19 Indirect elicitation from ecological experts: From methods and software to habitat modelling and rock-wallabies
  Samantha Low Choy, Justine Murray, Allan James and Kerrie Mengersen
  19.1 Introduction
  19.2 Ecological application: Modelling and mapping habitat of a rock-wallaby
  19.3 Elicitation for regression
  19.4 Software tool for elicitation
  19.5 Results
  19.6 Discussion
  Appendix
  References

20 Characterizing the uncertainty of climate change projections using hierarchical models
  Claudia Tebaldi and Richard L. Smith
  20.1 Climate change and human influences, the current state and future scenarios
  20.2 A world of data. Actually, make that many worlds
  20.3 Our simplified datasets
  20.4 A hierarchy of statistical models
  20.5 Validating the statistical models
  20.6 Application: The latest model projections, and their synthesis through our Bayesian statistical models
  20.7 Further discussion
  Appendix
  References

Part IV Policy, Political and Social Sciences

21 Volatility in prediction markets: A measure of information flow in political campaigns
  Carlos M. Carvalho and Jill Rickershauser
  21.1 Introduction
  21.2 Political prediction markets
  21.3 Volatility, trading volume and information flow
  21.4 The 2004 presidential election
  21.5 Concluding remarks
  Appendix
  References

22 Bayesian analysis in item response theory applied to a large-scale educational assessment
  Dani Gamerman, Tufi M. Soares and Flávio B. Gonçalves
  22.1 Introduction
  22.2 Programme for International Student Assessment (PISA)
  22.3 Differential item functioning (DIF)
  22.4 Bayesian model for DIF analysis
  22.5 DIF analysis of PISA 2003
  22.6 Conclusions
  Appendix
  References

23 Sequential multilocation auditing and the New York food stamps program
  Karl W. Heiner, Marc C. Kennedy and Anthony O'Hagan
  23.1 Introduction
  23.2 Modelling of error rates and error classes
  23.3 Updating
  23.4 Projection
  23.5 Application to New York food stamps audit
  23.6 Discussion
  Appendix
  References

24 Bayesian causal inference: Approaches to estimating the effect of treating hospital type on cancer survival in Sweden using principal stratification
  Donald B. Rubin, Xiaoqin Wang, Li Yin and Elizabeth R. Zell
  24.1 Introduction
  24.2 Bayesian causal inference – General framework
  24.3 Bayesian inference for the causal effect of large versus small treating hospitals
  Appendix
  References

Part V Natural and Engineering Sciences

25 Bayesian statistical methods for audio and music processing
  A. Taylan Cemgil, Simon J. Godsill, Paul Peeling and Nick Whiteley
  25.1 Introduction
  25.2 Time-domain models for audio
  25.3 Frequency-domain models
  25.4 Conclusions
  Appendix
  References

26 Combining simulations and physical observations to estimate cosmological parameters
  Dave Higdon, Katrin Heitmann, Charles Nakhleh and Salman Habib
  26.1 Introduction
  26.2 The statistical framework
  26.3 Combined CMB and large scale structure analysis
  Appendix
  References

27 Probabilistic grammars and hierarchical Dirichlet processes
  Percy Liang, Michael I. Jordan and Dan Klein
  27.1 Introduction
  27.2 The hierarchical Dirichlet process PCFG (HDP-PCFG)
  27.3 The HDP-PCFG for Grammar Refinement (HDP-PCFG-GR)
  27.4 Bayesian inference
  27.5 Experiments
  27.6 Discussion
  Appendix
  References

28 Designing and analysing a circuit device experiment using treed Gaussian processes
  Herbert K. H. Lee, Matthew Taddy, Robert B. Gramacy and Genetha A. Gray
  28.1 Introduction
  28.2 Experimental design
  28.3 Calibrating the computer model
  28.4 Further discussion
  Appendix
  References

29 Multistate models for mental fatigue
  Raquel Prado
  29.1 Goals and challenges in the analysis of brain signals: The EEG case
  29.2 Modelling and data analysis
  29.3 Further discussion
  Appendix
  References

Index

Preface

A Bayesian 21st century

The diversity of applications of modern Bayesian analysis at the start of the 21st century is simply enormous.
From basic biology to frontier information technology, the applications of highly structured stochastic models of increasing realism, often with high-dimensional parameters and latent variables, multiple layers of hierarchically structured random effects, and nonparametric components, are increasingly routine. Much of the impetus behind this growth and success of applied Bayesian methods over the last 20 years has come from access to the increasingly rich array of advanced computational strategies for Bayesian analysis; this has led to increasing adoption of Bayesian methods from heavily practical and pragmatic perspectives.

Coupled with this evolution in the nature of applied statistical work to a model-based, computational perspective is change in statistical scientific thought at a more fundamental level. As researchers become increasingly involved in more complex stochastic model building enabled by advanced Bayesian computational methods, they also become more and more exposed to the inherent logic and directness of Bayesian model building. Scientifically relevant, highly structured stochastic models are often simply naturally developed from Bayesian formalisms and have overt Bayesian components. Hierarchical models with layers of random effects, random processes in temporal or spatial systems, and large-scale latent variables models of many flavours are just a few generic examples of now-standard stochastic structures in wide application, all of which are inherently Bayesian models. Much of the rapid growth in adoption of Bayesian methods from pragmatic viewpoints is engendering deeper, foundational change in scientific philosophy towards a more holistically Bayesian perspective.
And this, in turn, has important implications for the core of the discipline; bringing Bayesian methods of stochastic modelling centre-stage, with models of increasing complexity and structure for reasons of increased realism, is inevitably re-energizing the core of the discipline, presenting new conceptual and theoretical challenges to Bayesian researchers as applied problems scale in dimension and complexity.

The Handbook

The Handbook of Applied Bayesian Analysis is a showcase of contemporary Bayesian analysis in important and challenging applied problems, bringing together chapters contributed by leading researchers and practitioners in interdisciplinary Bayesian analysis. Each chapter presents authoritative discussions of currently topical application areas together with key aspects of Bayesian analysis in these areas, and takes the reader to the cutting edge of research in that topic. Importantly, each chapter is built around the application, and represents personal interests, experiences and views of the authors speaking from deep and detailed expertise and engagement in the applied problem area.

Each chapter of the Handbook involves a concise review of the application area, describes the problem contexts and goals, discusses aspects of the data and overall statistical issues, and develops detailed analysis with relevant Bayesian models and methods. Discussion generally contacts current frontiers of research in each application, with authors presenting their own perspectives, their own latest thinking, and highlighting their own research in both the application and in related and relevant Bayesian methodology used in their application.
Each chapter also includes references linking to core publications in the applied field as well as relevant models and computational methods, and some also provide access to data, software and additional material related to the study of the chapter.

Importantly, each chapter contains appendix material that adds further foundational and supporting discussion of two flavours: material on the basic statistical models and methods, with background and key references, for readers interested in going further into methodological aspects, and more traditional appendix material representing additional technical developments in the specific application. Collectively, the appendices are an important component and distinctive feature of the Handbook, as they reflect a broad selection of models and computational tools used widely in applied Bayesian analysis across the diverse range of applied contexts represented.

Chapter outlines

Chapters are grouped by broad field of application, namely

- Biomedical and Health Sciences
- Industry, Economics and Finance
- Environment and Ecology
- Policy, Political and Social Sciences
- Natural and Engineering Sciences

Inevitably selective in terms of broad fields as well as specific application contexts within each broad area, the chapters nevertheless represent topical, challenging and statistically illuminating studies in each case. Chapters within each area are as follows.

Biomedical and Health Sciences

Dunson discusses an epidemiological study involving pregnancy outcomes. This chapter showcases Bayesian analysis in epidemiological studies that collect continuous health outcomes data, and in which the scientific and clinical interest typically focuses on the relationships between exposures and risks of an abnormal response, corresponding to an observation in the tails of the distribution. As there is minimal interest in relationships between exposures and the centre of the response distribution in such studies, traditional regression models are inadequate.
For this reason, epidemiologists typically categorize both the outcome and the predictors, with the resulting inferences very sensitive to this categorization. Bayesian analysis using density regression, mixtures and nonparametric models, as developed and applied in this pregnancy outcome study, avoids and overcomes these challenges.

Green, Mardia, Nyirongo and Ruffieux discuss the alignment of biomolecules. This chapter showcases Bayesian methods for shape analysis to assist with understanding the three-dimensional structure of protein molecules, which is one of the major unsolved biological challenges. This chapter addresses the problem of matching instances of the same structure in the CoMFA (Comparative Molecular Field Analysis) database of steroid molecules, where the three-dimensional coordinates of all the atoms in each molecule are stored. The matching problem is challenging because two instances of the same three-dimensional structure in such a database can have very different sets of coordinates, due not just to noisy measurements but also to rotation, translation and scaling. The authors present an efficient Bayesian methodology to identify, given two or more biomolecules represented by the coordinates of their atoms, subsets of those atoms which match within measurement error, after allowing for appropriate geometrical transformations to align the biomolecules.

Cheng and Madigan discuss a study of pharmaceutical testing from multiple clinical trials concerned with side-effects and adverse events among patients treated with a popular pain-relieving drug. This chapter showcases the development of sensitive Bayesian analysis of clinical trials studies involving problems of missing data and particularly non-ignorable dropout of patients from studies, as well as sequential methods and meta-analysis of multiple studies.
The study concerns Vioxx, an anti-inflammatory drug that was licensed for use in the USA by the FDA in 1999, and then withdrawn from the market in 2004 due to cardiovascular safety concerns. Merck, the manufacturer of Vioxx, conducted many clinical trials both before and after 1999. In part to avoid potential future scenarios like Vioxx, analyses of the data from these multiple clinical trials are of considerable importance and interest. The study raises multiple, challenging statistical issues and questions requiring sensitive evaluation, and the chapter highlights the utility of Bayesian analysis in addressing these challenges.

Oakley and Clough discuss uncertainty in a mechanistic model that has been used to conduct a risk assessment of contamination of farm-pasteurized milk with the bacterium Vero-cytotoxigenic E. coli (VTEC) O157. This chapter showcases Bayesian methods for analysing uncertainties in complex computer models. The VTEC model has uncertain input parameters, and so outputs from the model used to inform the risk assessment are also uncertain. The question then arises of how to reduce output uncertainty most efficiently. The authors conduct a variance-based sensitivity analysis to identify the most important uncertain model inputs, and so prioritize what further research would be needed to best reduce model output uncertainty.

Schmidt, Hoeting, Pereira and Vieira discuss temporal prediction and spatial interpolation for outbreaks of malaria over time for municipalities in the state of Amazonas, Brazil. This chapter showcases Bayesian spatial-temporal modelling for epidemiological discrete count data. Malaria is a world-wide public health problem with 40% of the population of the world at risk of acquiring the disease. It is estimated that there are over 500 million clinical cases of malaria each year world-wide.
This work falls in the area of disease mapping, where data on aggregate incidence of some disease are available for various administrative areas, but the data for Amazonas are incomplete, covering only a subset of the municipalities. Furthermore, the temporal aspect is important because malaria incidence is not constant over time. A free-form spatial covariance structure is adopted which allows estimation for unobserved municipalities to draw on observations in neighbouring areas, but without making strong assumptions about the nature of spatial relations. A multivariate dynamic linear model controls the temporal effects and facilitates the forecasting of future malaria incidence.

Merl, Lucas, Nevins, Shen and West discuss a study in cancer genomics. This chapter showcases the application of Bayesian concepts and methods in an overall strategy for linking the results of in vitro laboratory studies of gene expression to in vivo human observation studies. The basic problem of translating inferences across contexts constitutes a generic, critical, and growing challenge in modern biology, which typically moves from laboratory experiments with cultured cells, to animal model experiments, to human outcome studies and clinical trials. The study described here concerns this problem in the context of the genomics of several oncogene pathways that are fundamental to many human cancers. The application involves Bayesian sparse multivariate regression and sparse latent factor models for large-scale multivariate data, and details the use of such models to define and relate statistical signatures of biological phenomena between contexts. In addition, the study requires linking the resulting, model-based inferences to known biology; this is achieved using Bayesian methods for mapping summary inferences to databases of biological pathways.
The study includes detailed discussion of biological interpretations of experimentally defined gene expression signatures and their elaborated subsignature representations emerging from Bayesian factor analysis of in vivo data, model-generated leads to design new biological experiments based on some of the findings, and contextual discussions of connections to clinical cancer profiling and prognosis.

Henderson, Boys, Proctor and Wilkinson discuss oscillations observed in the levels of two proteins, p53 and Mdm2, in single living cancer cells. This chapter showcases Bayesian methods in systems biology using genuine prior information and MCMC computation. The p53 tumour suppressor protein plays a major role in cancer. It has been described as the 'guardian of the genome', blocking cell cycle progression to allow the repair of damaged DNA. An increase of p53 due to stress causes an increase in the level of Mdm2 which in turn inhibits p53. From observations of levels of these two proteins in individual cancer cells, the objective is to learn about the rate parameters that control this feedback loop. By examining several cells, it is hoped to understand why they do not oscillate in phase. However, the modelling of the complex reactions within the cell makes this exercise highly computationally intensive. The authors develop a Bayesian approximation to the discrete time transition probabilities in the underlying continuous time stochastic model. Prior information about the unknown rate parameters is incorporated based on experimental values in the literature and they apply sophisticated MCMC methods to compute posterior distributions.

Dawid, Mortera and Vicard discuss the problem of evaluating the probability of a putative father being the real father of a child, based on his DNA profile and those of the mother and child. The chapter is a showcase for careful probabilistic reasoning.
In recent years there has been heavy media coverage of DNA profiling for criminal identification, but the technique has also been useful in illuminating a number of complex genetic problems, including cases of disputed paternity. The paternity problem is complicated by the possibility of genetic mutation: the putative father could be the real father, yet a mutation in the child's DNA could seem to imply that he is not. The probability of paternity now depends strongly on the rate of mutation. On the other hand, estimates of mutation rates are themselves very sensitive to assumptions about paternity. Using Austrian-German casework data, the authors present a meticulous study of this problem, constructing and analysing a model to handle paternity and mutation jointly.

Industry, Economics and Finance

Popova, Morton, Damien and Hanson discuss a study in Bayesian analysis and decision making in the maintenance and reliability of nuclear power plants. The chapter showcases Bayesian parametric and semiparametric methodology applied to the failure times of components that belong to an auxiliary feedwater system. This system supplies cooling water during an emergency operation or to an electro-hydraulic control system, used for the control of the main electrical generating steam turbine. The parametric models produce estimates of the hazard functions that are compared to the output from a mixture of Polya trees model. The statistical output is used as the most critical input in a stochastic optimization model which finds the optimal replacement time for a system that randomly fails over a finite horizon. The chapter also discusses decision analysis, using the model in defining strategies that minimize expected total and discounted cost of nuclear plant maintenance.

Cumming and Goldstein discuss analysis of the Gullfaks oil field using a reservoir simulation model run at two different levels of complexity.
This chapter showcases Bayes linear methods to address highly complex problems for which the full Bayesian analysis may be computationally intractable. A simulator of a hydrocarbon reservoir represents properties of the reservoir on a three-dimensional grid. The finer this grid is, the more accurately the simulator is expected to predict the real reservoir behaviour, but finer resolution also implies rapidly escalating computation times. Observed behaviour of the reservoir can in principle be used to learn about values of parameters in the simulator, but this Bayesian calibration demands that the simulator can be run many times at different values of these parameters in order to search for regions of parameter space in which acceptable matches are found to the observed data. The authors employ many runs of the simulator at low resolution to augment a few runs of the fine simulator. Their approach involves careful modelling of the relationship between the two versions of the simulator, as well as how the fine simulator relates to reality.

Pievatolo and Ruggeri discuss a study in Bayesian reliability analysis concerning underground train door failures in a European underground system over a period of nine years. The chapter showcases development and application of Bayesian stochastic process models in a reliability context. Facing questions about relevant reliability time scales, the authors develop a novel bivariate Poisson process as a natural way to extend the usual Poisson models for the occurrence of failures in repairable systems; the bivariate model uses both calendar time and kilometres driven by trains as metrics. An important consequence of this choice is that seasonal effects are easily incorporated into the model. The Bayesian models and MCMC methods developed lead to predictive distributions for failures and address key practical questions of how to assess reliability before warranty expiration, combining the data from several trains.
This study also clarifies the advantages and disadvantages of using Poisson process models for repairable systems with a number of different failure modes.

Ferreira, Bertolde and Holan discuss an economic study of agricultural production in Espírito Santo State, Brazil, from 1990 to 2005. The chapter showcases the use of Bayesian multiscale spatio-temporal models that use the natural geopolitical division of Espírito Santo State at levels of macroregions, microregions, and counties. The models involve multiscale latent parameters that evolve through time over a period of several years of the economic study. The analysis sheds light on the similarities and differences in agricultural production between regions within each scale of resolution, and on the temporal changes in relative agricultural importance of those regions as explicitly described by the evolution of the estimated multiscale components. The study involves a number of other questions relevant to the underlying spatio-temporal agricultural production process at each level of resolution, and builds on advanced Markov chain Monte Carlo methods for multivariate dynamic models integrated in an overall, highly-structured multiscale spatio-temporal framework.

Lopes and Polson discuss financial time series at the time of the 2007-08 credit crisis, showcasing the ability of Bayesian modelling and inference to represent a period of financial instability and to identify the underlying mechanisms. The authors consider several forms of model that exhibit stochastic volatility so as to capture the rapidly changing behaviour of the financial indicators at that time. Using Bayesian sequential model choice techniques, they show how the evidence accumulates over time that the pure stochastic volatility model is inferior to a model with jumps.
Their work has implications for analysis and prediction in times of unusual market behaviour.

Quintana, Carvalho, Scott and Costigliola discuss studies in applications of the Bayesian approach to risk modelling regarding speculative trading strategies in financial futures markets. The chapter showcases applied Bayesian thinking in the context of financial investment management, highlighting the corresponding concepts of betting and investing, prices and expectations, and coherence and arbitrage-free pricing. Covering central applied methods and tools of Bayesian decision analysis and speculation in portfolio studies, risk modelling, dynamic linear models and Bayesian forecasting, and highly structured Bayesian graphical modelling approaches for multivariate, time-varying covariance matrices in multivariate dynamic models, the chapter develops studies of investment strategies and returns in futures markets over a period between 1990 and 2008 based on portfolios of currency exchange rates, government bonds and stock market indices.

Fernández-Villaverde, Guerrón-Quintana and Rubio-Ramírez discuss macroeconomic studies of the dynamics of the US economy over the last 50 years using Bayesian analysis of dynamic stochastic equilibrium models. This chapter is a showcase of modern, model-based Bayesian analysis in mainstream economics studies, and an approach increasingly referred to as the new macroeconometrics. The authors formulate and estimate a benchmark dynamic stochastic equilibrium model that captures much of the time-varying structure in the US macroeconomy over these years, and describe its application in policy analysis for public institutions such as central banks and private organizations and businesses. Application involves likelihood evaluations that are enabled using Bayesian sequential Monte Carlo and MCMC methods.
The study discusses critical questions of the roles of priors and pre-sample information, documents a range of real and nominal rigidities in the US economy and discusses the increasingly central roles of such Bayesian approaches in this context as well as frontier research issues.

Environment and Ecology

Challenor, McNeall and Gattiker discuss the potential collapse of the meridional overturning circulation in the Atlantic Ocean. This chapter showcases Bayesian methods for analysing uncertainty in complex models, and in particular for quantifying the risk of extreme outcomes. While climate science has concentrated on predictions of global warming, there are possible scenarios which, although with low probability, would have high impact. One such event is the collapse of the ocean circulation that currently ensures that Western Europe enjoys a warmer climate than, for instance, similar latitudes in Western North America. Collapse of the meridional overturning circulation (MOC) is predicted by the GENIE-1 climate model for some values of the model inputs, but the actual values of these inputs are unknown. A single run of GENIE-1 takes several hours, and the authors use Bayesian emulation to estimate the probability of MOC collapse based on a limited number of model runs, and to incorporate data comprising a sparse time series of five measurements of the MOC from 1957 to 2004.

Clark, Bell, Dietze, Hersh, Ibanez, LaDeau, McMahon, Metcalf, Moran, Pangle and Wolosin discuss demography of plant populations, showcasing applied Bayesian analysis and methods that allow for synthesis of information from multiple sources to estimate the demographic rates of trees and how they respond to environmental variation. Data come from individual (tree) measurements over a period of 18 years, including diameter, crown area, maturation status, and survival, and from seed traps, which provide indirect information on fecundity.
Different observations are available for different years and trees. The multiple data sets are synthesized with a process model where each individual is represented by a multivariate state-space submodel for both continuous (fecundity potential, growth rate, mortality risk, maturation probability) and discrete states (maturation status). Each year, state variables respond to a dynamic environment. Results provide unprecedented detail on the ways in which demographic rates relate to one another, within individuals over time, among individuals, and among species. The chapter also describes how results of these Bayesian methods are being used to assess how forests can respond to changing climate.

Gelfand and Sahu discuss environmental studies that aim to combine monitoring data and computer model outputs in assessing environmental exposure. This chapter showcases Bayesian data fusion methods using spatial Gaussian process models in studies of weekly deposition data from multiple US sites monitored by the US National Atmospheric Deposition Program. Environmental exposure community numerical models are now widely available for a number of air pollutants. Based on inputs from a number of factors such as meteorological conditions, land usage, and power station emission volumes, all of which are responsible for producing air pollution, and some predictions of spatial surfaces for current, past, and future time periods, these models provide output exposures at various spatial and temporal resolutions. For large spatial regions such as the entire United States, the spatial coverage of the available network monitoring stations can never match the coverage at which the computer models produce their output. However, the monitoring data will be more accurate than the computer model output since, up to measurement error, they provide the actual true levels: observations from the realization of the pollution process surface at that time.
It is important to combine these two sets of information to make inference regarding pollution exposure, and this study represents best-Bayesian practices in addressing this problem.

Choy, Murray, James and Mengersen discuss eliciting knowledge from ecological experts about the habitat of the Australian brush-tailed rock-wallaby. This chapter is a showcase of techniques for eliciting expert judgement about complex uncertainties. The rock-wallaby is an endangered species, and in order to map where it is likely to be found, it is essential to use expert judgement about how the various environmental factors (such as geology, land cover and elevation) influence the probability of rock-wallabies being present at a site. The authors employ an indirect elicitation method in which the experts are presented with descriptions of some specific sites and asked for their probabilities. The relationship between probability of occurrence and the environmental variables is then inferred and used to predict the rock-wallaby's likely habitats throughout the region of interest.

Tebaldi and Smith discuss studies in characterizing the uncertainty of climate change projections, showcasing Bayesian methods for integration and comparison of predictions from multiple models and groups. The chapter describes a suite of customised Bayesian hierarchical models that synthesize ensembles of climate model simulations, with the aim of reconciling different future projections of climate change, while characterizing their uncertainty in a rigorous fashion. Posterior distributions of future temperature and/or precipitation changes at regional scales are obtained, accounting for many peculiar data characteristics, such as systematic biases, model-specific precisions, region-specific effects, changes in trend with increasing rates of greenhouse gas emissions, and others.
The chapter expands on many important issues characterizing model experiments and their collection into multimodel ensembles, and addresses the need of impact research, by proposing posterior predictive distributions as a representation of probabilistic projections. In addition, the calculation of the posterior predictive distribution for a new set of model data allows a rigorous cross-validation approach to assess and, in this study, confirm the reasonableness of the Bayesian modelling assumptions.

Policy, Political and Social Sciences

Carvalho and Rickershauser discuss a study of temporal volatility and information flows in political campaigns, showcasing Bayesian analysis in evaluation of information impact on vote sentiment and behaviour in highly publicized campaigns. The core application is to the 2004 US presidential campaign. The study builds a measure of information flow based on the returns and volume of the 'Bush wins the popular vote in 2004' futures contract on the tradesports/intrade prediction market. This measure links events to information level, providing a direct way to evaluate its impact in the election. Among the findings are that information flows increased as a result of the televised debates, Kerry's acceptance speech at the Democratic convention, and national security-related stories such as the report that explosives vanished in Iraq under the US's watch, the CBS story about Bush's National Guard service and the subsequent retraction, and the release of the bin Laden tape a few days before the election. Contrary to popular accounts of the election, ads attacking Kerry's military service aired by the Swift Boat Veterans for Truth in August apparently contributed only a limited amount of information to the campaign.
This political science application develops novel hidden state-space models of volatility in information flows and model fitting and evaluation using Bayesian MCMC methods for nonlinear state-space models.

Gamerman, Soares and Gonçalves discuss whether cultural differences may affect the performance of students from different countries in the various test items which make up the international PISA test of mathematics ability. This chapter showcases a Bayesian model that incorporates this kind of differential item functioning (DIF) and the role of prior information. The PISA tests in mathematics and other subjects are widely used to compare the educational attainment of 15-year-old students in different countries; in 2009, 67 countries from around the world took part. DIF is a significant issue with the potential to compromise such comparisons between countries, and substantial DIF may remain in the administered test despite preliminary screening of candidate test items. The authors seek to discover the extent of DIF remaining in the mathematics test of 2003. They employ a hierarchical three-parameter logistic model for the probability of a correct response on an individual item, where the three parameters control the difficulty of the item, its discriminating power and its guessability, and their model allows for different kinds of DIF where any of these parameters may vary between countries. The authors' Bayesian model avoids identifiability problems faced by competing approaches and requires weaker hypotheses due, especially, to the important role played by the prior distributions.

Heiner, Kennedy and O'Hagan discuss auditing of the operation of the food stamps welfare scheme in the state of New York, USA, highlighting the power of Bayesian methods in analysing data that evolve over time. Auditors examine a sample of individual awards of food stamps to see if the value awarded is correct according to the rules of the scheme.
The food stamps program is a federal scheme, and if a state is found to have too large an error rate in administering it the federal government can impose large financial penalties. In New York state, the program is administered by individual counties, and sizes of audit samples in small counties can be so small that only one or two errors are found in any given year. The authors propose a model that includes a nonparametric component for the error magnitudes (taints), a hierarchical model for overall error rates across counties and parameters controlling the variation of rates from one year to the next, including an overall trend in error rates. The model allows in particular for estimation of rates in small counties to be smoothed across counties and through time.

Rubin, Wang, Yin and Zell discuss a study in estimating the effects of treating hospital type on cancer survival, using administrative data from central and northern Sweden via the Karolinska Institute in Stockholm. The chapter represents a showcase in application of Bayesian causal inference, in particular using the posterior predictive approach of the Rubin causal model and methods of principal stratification. The central applied question, inferring which type of hospital (e.g. large patient volume versus small volume) is superior for treating certain serious conditions, is a difficult and important problem in institutional assessment and comparisons in a medical context. Ideal data from randomized experiments are simply not available, leading to reliance on observational data.
The study involves questions of which factors may reasonably be considered ignorable in the context of covariates available, and non-compliance complications due to transfers between hospital types for treatment, and showcases Bayesian causal modelling utilizing simulation-based imputation techniques.

Natural and Engineering Sciences

Cemgil, Godsill, Peeling and Whiteley discuss musical audio signal analysis in the context of an application to multipitch audio and determining a musical score representation that includes pitch and time duration summary for a musical extract (the so-called piano-roll representation of music). This chapter showcases applied Bayesian analysis in audio signal processing in real environments where acoustical conditions and sound sources are highly variable, yet audio signals possess strong statistical structure. There is typically much prior information about underlying structures and the detail of the recorded acoustical waveform (physical mechanisms by which sounds are generated, cognitive processes by which sounds are perceived by the human auditory system, mechanisms by which high-level sound structures are compiled). A range of Bayesian hierarchical models involving both time and frequency domain dynamic models, and methods of fitting using simulation-based and variational approximations are developed in this chapter. The resulting models possess complex statistical structure and so highly adaptive and powerful computational techniques are needed to perform inference, as this study exemplifies.

Higdon, Heitmann, Nakhleh and Habib discuss perhaps the grandest of all problems, the nature and evolution of the universe. This chapter showcases techniques for emulating complex computer models with many inputs and outputs. The Λ-cold dark matter model is the simplest cosmological model in agreement with the cosmic microwave background and large scale structure measurements.
This model is determined by a small number of parameters which control the composition, expansion and fluctuations of the universe, and the objective of this study is to learn about the values of these parameters using measurements from the Sloan Digital Sky Survey (SDSS). Model outputs include a dark matter spectrum for the universe and a temperature spectrum for the cosmic microwave background. A key component of the Bayesian analysis is to find a parsimonious representation of such high-dimensional output. Another is innovative modelling to combine the evidence from data on both spectra to find which model input parameters influence the output appreciably, and to learn about those parameters.

Liang, Jordan and Klein discuss the use of probabilistic context-free grammars in natural language processing, involving a large-scale natural language parsing task. The chapter is a showcase of detailed, highly-structured Bayesian modelling in which model dimension and complexity respond naturally to observed data, building on the adaptive nature of the underlying nonparametric Bayesian models developed by the authors. The framework involves structured hierarchical Dirichlet process modelling and customized model fitting via variational methods, to address the core problem of identifying appropriate levels of model complexity in using probabilistic context-free grammars as important components in the modelling of syntax in natural language processing.
Detailed development and evaluation in experiments with a synthetic grammar induction task complement the application to a large-scale natural language parsing study on data from the Wall Street Journal portion of the Penn Treebank, a large data set used in the natural language processing community for evaluating parsers.

Lee, Taddy, Gramacy and Gray discuss the development of circuit devices, bipolar junction transistors, which are used to amplify electrical current, and showcase the use of a flexible kind of emulator based on a treed Gaussian process. To aid with the design of the circuit device, a computer model predicts its peak output as a function of the input dosage and a number of design parameters. The peak output response can jump sharply with only small changes in dosage, a feature that the treed Gaussian process emulator is able to capture. The methodology also involves a novel sequential design procedure to generate data to fit the emulator, and performs sensitivity analysis and both calibration and validation using experimental data.

Prado discusses a study of experimental data involving large-scale EEG time series generated on individuals subject to tasks inducing cognitive fatigue, with the eventual goal of developing models able to predict cognitive fatigue based on non-invasive scalp monitoring of real-time EEG fluctuations. The chapter showcases the development and application of structured, multivariate Bayesian dynamic models for analysis of time-varying, non-stationary and erratic (brain wave) time series. Novel time-varying autoregressive and regime switching models, incorporating substantively relevant prior information via structured priors and fitted using novel, customized Bayesian computational methods, are described. The applied study involves an experimental subject asked to perform simple arithmetic operations for a period of three hours. Prior to the experiment, the subject was confirmed to be alert.
After the experiment ended, the subject was fatigued, as determined by measures of performance and post-task mood. The study shows how the Bayesian analysis is used to assist practitioners in real-time detection of cognitive fatigue.

Invitation

We expect the Handbook to be of broad interest to researchers and expert practitioners, as well as to advanced students in statistical science and related disciplines. We believe the important and challenging studies represented across a diverse range of application areas, involving cutting-edge statistical thinking and a broad array of Bayesian model-based and computational methodologies, will also enthuse young researchers and non-statistical readers, and that the chapters exemplify and promote cross-fertilization in advanced statistical thinking across multiple application areas. The Handbook will also serve as a reference resource for researchers across these fields as well as within statistical science, and we invite you to use it broadly in support of education and teaching, as well as in disciplinary and interdisciplinary research.

Tony O'Hagan and Mike West
2009

List of Contributors

Dave Bell, Nicholas School of the Environment, Duke University, Durham, NC 27708, USA
Adelmo I. Bertolde, Departamento de Estatística, Universidade Federal do Espírito Santo, CCE - UFES, Av. Fernando Ferrari, s/n, Vitória - ES - CEP: 29065-900, Brazil
Richard J. Boys, School of Mathematics and Statistics, Newcastle University, Newcastle upon Tyne, NE1 7RU, UK
Carlos M. Carvalho, The University of Chicago Booth School of Business, 5807 South Woodlawn Avenue, Chicago, IL 60637, USA
A. Taylan Cemgil, Department of Computer Engineering, Boğaziçi University, 34342 Bebek, Istanbul, Turkey
Peter Challenor, National Oceanography Centre, Southampton, SO14 3ZH, UK
Jerry Cheng, Department of Statistics, 501 Hill Center, Busch Campus, Rutgers, The State University of New Jersey, 110 Frelinghuysen Road, Piscataway, NJ 08854-8019, USA
Samantha Low Choy, School of Mathematical Sciences, Queensland University of Technology, Brisbane, Australia
James S. Clark, Nicholas School of the Environment, Duke University, Durham, NC 27708, USA, and Department of Biology, Duke University, Durham, NC 27708, USA
Helen E. Clough, National Centre for Zoonosis Research, University of Liverpool, Leahurst, Chester High Road, Neston, CH64 7TE, UK
Thomas Costigliola, Research and Technology, BEST, LLC, Riverview Historical Plaza II, 3341 Newark Street, PH, Hoboken, NJ 07030, USA
Jonathan A. Cumming, Department of Mathematical Statistics, Durham University, Durham, UK
Paul Damien, McCombs School of Business, IROM, The University of Texas at Austin, Austin, TX 78712, USA
A. Philip Dawid, Statistical Laboratory, Centre for Mathematical Sciences, Wilberforce Road, Cambridge, CB3 0WB, UK
Michael Dietze, Department of Biology, Duke University, Durham, NC 27708, USA, and Department of Plant Biology, University of Illinois, Champaign-Urbana, Illinois, USA
David B. Dunson, Department of Statistical Science, Duke University, Durham, NC 27705, USA
Jesús Fernández-Villaverde, Department of Economics, University of Pennsylvania, 160 McNeil Building, 3718 Locust Walk, Philadelphia, PA 19004, USA
Marco A.R. Ferreira, Department of Statistics, University of Missouri-Columbia, Columbia, MO 65211-6100, USA
Dani Gamerman, Instituto de Matemática, Universidade Federal do Rio de Janeiro, Brazil
James Gattiker, Statistical Sciences Group, PO Box 1663, MS F600, Los Alamos, NM 87545, USA
Alan E. Gelfand, Department of Statistical Science, Duke University, Durham, NC 27708, USA
Simon J. Godsill, Signal Processing and Communications Laboratories, Department of Engineering, Trumpington Street, University of Cambridge, Cambridge, CB2 1PX, UK
Michael Goldstein, Department of Mathematical Statistics, Durham University, Durham, UK
Flávio B. Gonçalves, Department of Statistics, University of Warwick, UK
Peter J. Green, School of Mathematics, University of Bristol, Bristol, BS8 1TW, UK
Robert B. Gramacy, Statistical Laboratory, University of Cambridge, Wilberforce Road, Cambridge, CB3 0WB, UK
Genetha A. Gray, Technical Staff Member, Sandia National Laboratories, P.O. Box 969, MS 9159, Livermore, CA 94551, USA
Pablo Guerrón-Quintana, Economist, Federal Reserve Bank of Philadelphia, Ten Independence Mall, Philadelphia, PA 19106, USA
Salman Habib, Nuclear & Particle Physics, Astrophysics, and Cosmology, Los Alamos National Laboratory, PO Box 1663, MS B285, Los Alamos, NM 87545, USA
Tim Hanson, Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455, USA
Karl W. Heiner, The School of Business, State University of New York at New Paltz, 1 Hawk Drive, New Paltz, New York, 12561, USA
Katrin Heitmann, Space Science and Applications, Los Alamos National Laboratory, PO Box 1663, MS D466, Los Alamos, NM 87545, USA
Daniel A. Henderson, School of Mathematics and Statistics, Newcastle University, Newcastle upon Tyne, NE1 7RU, UK
Michelle Hersh, Department of Biology, Duke University, Durham, NC 27708, USA
Dave Higdon, Statistical Sciences, Los Alamos National Laboratory, PO Box 1663, MS F600, Los Alamos, NM 87545, USA
Jennifer A. Hoeting, Department of Statistics, Colorado State University, Fort Collins, CO 80523-1877, USA
Scott H. Holan, Department of Statistics, University of Missouri-Columbia, Columbia, MO 65211-6100, USA
Ines Ibanez, Department of Biology, Duke University, Durham, NC 27708, USA, and School of Natural Resources and Environment, University of Michigan, 440 Church Street, Ann Arbor, MI 48109, USA
Allan James, High Performance Computing and Research Support Group, Queensland University of Technology, Brisbane, Australia
Michael I. Jordan, Computer Science Division, EECS Department, University of California at Berkeley, Berkeley, CA 94720, USA, and Department of Statistics, University of California at Berkeley, Berkeley, CA 94720, USA
Marc C. Kennedy, The Food and Environment Research Agency, Sand Hutton, York, YO41 1LZ, UK
Dan Klein, Computer Science Division, EECS Department, University of California at Berkeley, Berkeley, CA 94720, USA
Shannon L. LaDeau, Department of Biology, Duke University, Durham, NC 27708, USA, and Carey Institute of Ecosystem Studies, Milbrook, New York, USA
Herbert K.H. Lee, Department of Applied Mathematics and Statistics, University of California, Santa Cruz, 1156 High Street, MS: SOE2, Santa Cruz, CA 95064, USA
Percy Liang, Computer Science Division, EECS Department, University of California at Berkeley, Berkeley, CA 94720, USA
Hedibert F. Lopes, The University of Chicago Booth School of Business, 5807 South Woodlawn Avenue, Chicago, IL 60637, USA
Joseph E. Lucas, Institute for Genome Sciences and Policy, Duke University Medical Center, Durham, NC 27710, USA
David Madigan, Department of Statistics, Columbia University, 1255 Amsterdam Ave, New York, NY 10027, USA
Kanti V. Mardia, Department of Statistics, University of Leeds, Leeds, LS2 9JT, UK
Sean McMahon, Nicholas School of the Environment, Duke University, Durham, NC 27708, USA
Doug McNeall, Meteorological Office Hadley Centre, Fitzroy Road, Exeter, UK
Kerrie Mengersen, School of Mathematical Sciences, Queensland University of Technology, Brisbane, Australia
Dan Merl, Department of Statistical Science, Duke University, Durham, NC 27708, USA
Jessica Metcalf, Nicholas School of the Environment, Duke University, Durham, NC 27708, USA
Emily Moran, Department of Biology, Duke University, Durham, NC 27708, USA
Julia Mortera, Dipartimento di Economia, Università Roma Tre, Via Silvio D'Amico 77, 00145 Roma, Italy
David Morton, ORIE, Department of Mechanical Engineering, University of Texas in Austin, Austin, TX 78712, USA
Justine Murray, The Ecology Centre, School of Integrative Biology, The University of Queensland, St Lucia, Australia
Charles Nakhleh, Pulsed Power Sciences Center, Sandia National Laboratories, PO Box 5800, Albuquerque, NM 87185-1186, USA
Joseph R. Nevins, Department of Molecular Genetics and Microbiology, Institute for Genome Sciences and Policy, Duke University Medical Center, Durham, NC 27710, USA
Vysaul B. Nyirongo, MLW Clinical Research Programme, College of Medicine, University of Malawi, Blantyre 3, Malawi
Jeremy E. Oakley, School of Mathematics and Statistics, University of Sheffield, The Hicks Building, Hounsfield Road, Sheffield, S3 7RH, UK
Anthony O'Hagan, Department of Probability and Statistics, University of Sheffield, The Hicks Building, Hounsfield Road, Sheffield, S3 7RH, UK
Luke Pangle, Nicholas School of the Environment, Duke University, Durham, NC 27708, USA
Paul Peeling, Signal Processing and Communications Laboratories, Department of Engineering, Trumpington Street, University of Cambridge, Cambridge, CB2 1PX, UK
João Batista M. Pereira, Departamento de Métodos Estatísticos, Universidade Federal do Rio de Janeiro, Caixa Postal 68530, Rio de Janeiro, Brazil
Antonio Pievatolo, CNR IMATI, Via Bassini 15, I-20133 Milano, Italy
Nicholas G. Polson, The University of Chicago Booth School of Business, 5807 South Woodlawn Avenue, Chicago, IL 60637, USA
Elmira Popova, ORIE, Department of Mechanical Engineering, University of Texas in Austin, Austin, TX 78712, USA
Raquel Prado, Department of Applied Mathematics and Statistics, Baskin School of Engineering, University of California, Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
Carole J. Proctor, Institute for Ageing and Health, Newcastle University, Newcastle upon Tyne, NE4 6BE, UK
José M. Quintana, President, BEST, LLC, Riverview Historical Plaza II, 3341 Newark Street, PH, Hoboken, NJ 07030, USA
Jill Rickershauser, Department of Government, American University, Washington, DC 20016, USA
Donald B. Rubin, Department of Statistics, Harvard University, Cambridge, Massachusetts, USA
Juan F. Rubio-Ramírez, 213 Social Sciences, Duke University, Durham, NC 27708, USA
Yann Ruffieux, FSB IMA STAT Station 8, EPFL - Swiss Federal Institute of Technology, CH - 1015 Ecublens, Switzerland
Fabrizio Ruggeri, CNR IMATI, Via Bassini 15, I-20133 Milano, Italy
Sujit K. Sahu, School of Mathematics, Southampton Statistical Sciences Research Institute, University of Southampton, Southampton, UK
Alexandra M. Schmidt, Departamento de Métodos Estatísticos, Universidade Federal do Rio de Janeiro, Caixa Postal 68530, Rio de Janeiro, Brazil
James Scott, Department of Statistical Science, Duke University, Durham, NC 27708, USA
Haige Shen, Novartis Research, New Jersey, USA
Richard L. Smith, Department of Statistics and Operations Research, University of North Carolina, Chapel Hill, NC 27599-3621, USA
Tufi M. Soares, Departamento de Estatística, Universidade Federal de Juiz de Fora, Brazil
Matthew Taddy, Booth School of Business, The University of Chicago, 5807 South Woodlawn Avenue, Chicago, IL 60637, USA
Claudia Tebaldi, Climate Central, Princeton, NJ, USA
Paola Vicard, Dipartimento di Economia, Università Roma Tre, Via Silvio D'Amico 77, 00145 Roma, Italy
Pedro P. Vieira, Fundação de Medicina Tropical do Amazonas, Gerência de Malária, Av. Pedro Teixeira n.25, Dom Pedro I, 69040-000 Manaus, AM, Brazil
Xiaoqin Wang, Department of Mathematics, Natural and Computer Sciences, University College of Gävle, 801 76, Gävle, Sweden
Mike West, Department of Statistical Science, Duke University, Durham, NC 27708, USA
Nick Whiteley, Signal Processing and Communications Laboratories, Department of Engineering, Trumpington Street, University of Cambridge, Cambridge, CB2 1PX, UK
Darren J. Wilkinson, School of Mathematics and Statistics, Newcastle University, Newcastle upon Tyne, NE1 7RU, UK
Mike Wolosin, Department of Biology, Duke University, Durham, NC 27708, USA
Li Yin, Department of Medical Epidemiology and Biostatistics, Karolinska Institute, Box 281, SE-171 77, Stockholm, Sweden
Elizabeth R. Zell, Division of Bacterial Diseases, National Center for Immunization and Respiratory Diseases, Centers for Disease Control and Prevention, 1600 Clifton Road, Atlanta, GA 30333, USA

PART I
Biomedical and Health Sciences

1 Flexible Bayes regression of epidemiologic data
David B. Dunson

1.1 Introduction

Epidemiology is focused on the study of relationships between exposures and the risk of adverse health outcomes or diseases. A better understanding of the factors leading to disease is fundamental in developing better strategies for disease prevention and treatment. Hence, the findings from epidemiology studies have the potential to have a fundamental impact on public health.
However, this potential is often not fully realized due to distrust among the general public and physicians about the reliability of conclusions drawn from epidemiology studies. This distrust stems in part from inconsistencies in findings from different studies.

Although statistics cannot solve certain problems in epidemiology, such as the occurrence of unmeasured confounders, many of the problems with lack of reproducibility stem from the overly simplistic statistical analyses that are routinely used. In particular, it is standard practice to reduce a potentially complex health outcome to a simple 0/1 indicator of disease and then apply a logistic regression model. By also categorizing exposures, one can then obtain exposure group-specific odds ratios, which form the primary basis for inference on exposure-disease relationships. Such an approach leads to transparent analyses, with results easily interpretable by clinicians having minimal expertise in statistics.

However, there are important limitations to this paradigm, which can lead to misinterpretations of exposure-disease relationships. The first is the obvious lack of efficiency that results from discarding data. For example, if one has a continuous health response and a continuous exposure, then categorizing both the response and exposure can lead to a reduction in power to detect an association. In addition, the results of the analysis will clearly be very sensitive to the number of categories chosen and the cutpoints for defining these categories (Boucher et al., 1998). For continuous exposures, an obvious alternative to categorization is to use splines to estimate the unknown dose-response curve (Greenland, 1995).
However, for continuous responses, it is not clear how to best assess risk as a function of exposures and other factors.

This chapter focuses on flexible Bayesian methods for addressing this problem motivated by pregnancy outcome data on birth weight and gestational age at delivery. As argued by Wilcox (2001), birth weight is not particularly meaningful in itself as a measure of health of the baby. However, there is often interest in assessing factors predictive of intra-uterine growth restriction (IUGR). IUGR is best studied using longitudinal ultrasound data to assess fetal growth over time as in Slaughter, Herring and Thorp (2009). However, such data are typically not available, so it is most common to study IUGR using small for gestational age (SGA) as a surrogate. SGA is defined as a 0/1 indicator that the baby is below the 10th percentile of the population distribution of birth weight stratified on gestational age at delivery. One is also concerned about large-for-gestational age (LGA) babies, which are more likely to be born by Cesarean delivery or with a low Apgar score (Nohr et al., 2008). The standard approach analyses risk of SGA, LGA and preterm birth, defined as a delivery prior to 37 completed weeks of gestation, in separate logistic regression analyses.

A natural question that arises is whether one can obtain a coherent picture of the impact of an exposure on pregnancy outcomes from such analyses. Although SGA is meant to provide a surrogate of growth restriction, which is adjusted for gestational age at delivery, it is not biologically plausible to assume that fetal growth and the biological process of initiating delivery are independent. Hence, it seems much more natural to consider birth weight and gestational age at delivery as a bivariate response, while avoiding the loss of information that accompanies categorization (Gage, 2003).
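To make concrete how the standard approach reduces the bivariate outcome to 0/1 indicators, the following is a minimal sketch in Python, not the chapter's own code. It assumes the 10th-percentile cutoff is estimated from the sample itself within each completed week of gestation, whereas the definition above refers to a population reference distribution, and it ignores the sex-specific cutoffs. The names `percentile`, `sga_indicators` and `preterm_indicator` are hypothetical.

```python
from collections import defaultdict


def percentile(values, q):
    """q-th percentile (q in [0, 100]) of a non-empty list, by linear interpolation."""
    xs = sorted(values)
    if len(xs) == 1:
        return xs[0]
    pos = (len(xs) - 1) * q / 100.0   # fractional index into the sorted values
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


def sga_indicators(records, q=10.0):
    """records: list of (gestational_age_weeks, birth_weight_g) pairs.

    Returns one 0/1 flag per record: 1 if the birth weight is below the q-th
    percentile of birth weights in the same completed-week stratum.
    Assumption: the sample stands in for the reference population.
    """
    by_week = defaultdict(list)
    for ga, bw in records:
        by_week[int(ga)].append(bw)
    cutoffs = {week: percentile(ws, q) for week, ws in by_week.items()}
    return [1 if bw < cutoffs[int(ga)] else 0 for ga, bw in records]


def preterm_indicator(ga, cutoff_weeks=37):
    """1 if delivery occurred before the cutoff (37 weeks by default), else 0."""
    return 1 if ga < cutoff_weeks else 0
```

Every baby below the stratum cutoff receives the same flag regardless of how far below it lies, which is the loss of information that motivates treating the two measurements as a continuous bivariate response.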
Even if the focus is only on assessing predictors of risk of premature delivery, the cutoff of 37 weeks eliminates the possibility of assessing risk on a finer scale. In particular, babies born near the 37-week cutoff experience limited short- and long-term morbidity compared with babies born earlier.

Fig. 1.1 Data on gestational age at delivery and birth weight for the Longnecker et al. (2001) substudy of the Collaborative Perinatal Project. [Plot: x-axis, gestational age at delivery (weeks); y-axis, birth weight (gms).]

Figure 1.1 shows data on gestational age at delivery and birth weight for n = 2313 pregnancies from the Longnecker et al. (2001) substudy of the US Collaborative Perinatal Project. The vertical dashed line at 37 weeks shows the cutoff used to define preterm births, while the solid lines show the cutoff for SGA depending on gestational age at delivery for male and female babies. Interest focuses on assessing how predictors, such as the level of exposure to chemicals in the environment (DDT, PCBs, etc.), impact the risk of adverse pregnancy outcomes, which correspond to lower values of birth weight and gestational age at delivery. As reducing the data to binary indicators has clear disadvantages, I propose to instead let the response for pregnancy i correspond to $y_i = (y_{i1}, y_{i2})'$, with $y_{i1}$ = gestational age at delivery and $y_{i2}$ = birth weight. Although this seems very natural, it does lead to some complications in that standard parametric models for bivariate continuous data, such as the bivariate normal or t-distributions, provide a poor fit to the data. This is clear in examining Figure 1.2, which shows an estimate of the marginal density of $y_{i1}$.
The density is left skewed and standard transformations fail to produce a Gaussian shape.

As adverse health outcomes correspond to values in the tails of the response distribution, the main interest is in studying how predictors impact the distributional tails. For example, in studying the impact of DDE levels in maternal serum on risk of premature delivery using the Longnecker et al. (2001) data, we would like to assess how the left tail of the distribution in Figure 1.2 changes with dose of DDE and other predictors. Potentially, this interest can be addressed using quantile regression. However, this would necessitate choosing a particular percentile of the distribution that is of primary interest, which is in some sense as unappealing as categorization. As an alternative, one can allow the conditional distribution of $y_i$ given predictors $x_i = (x_{i1}, \ldots, x_{ip})'$ to be unknown and changing flexibly over the predictor space.

