Portland, OR June 15-18, 2014 The City of Roses Welcomes You!

Portland, OR

June 15-18, 2014

The City of Roses

Welcomes You!

Published by the International Chinese Statistical Association - Korean International Statistical Society

Applied Statistics Symposium

2014

CONFERENCE INFORMATION, PROGRAM AND ABSTRACTS

June 15-18, 2014

Portland Marriott Downtown Waterfront

Portland, Oregon, USA

Organized by the International Chinese Statistical Association - Korean International Statistical Society

© 2014 International Chinese Statistical Association - Korean International Statistical Society

Contents

Welcome 1
Conference Information 2
Committees 2
Acknowledgements 4
Conference Venue Information 6
Program Overview 7
Keynote Lectures 8
Student Paper Awards 9
Short Courses 10
Social Program 15
ICSA 2015 in Fort Collins, CO 16
ICSA 2014 China Statistics Conference 17
ICSA Dinner at 2014 JSM 18

Scientific Program 19
Monday, June 16, 8:00 AM - 9:30 AM 19
Monday, June 16, 10:00 AM - 12:00 PM 19
Monday, June 16, 1:30 PM - 3:10 PM 21
Monday, June 16, 3:30 PM - 5:10 PM 23
Tuesday, June 17, 8:20 AM - 9:30 AM 25
Tuesday, June 17, 10:00 AM - 12:00 PM 25
Tuesday, June 17, 1:30 PM - 3:10 PM 27
Tuesday, June 17, 3:30 PM - 5:30 PM 29
Wednesday, June 18, 8:30 AM - 10:10 AM 31
Wednesday, June 18, 10:30 AM - 12:10 PM 33

Abstracts 36
Session 1: Emerging Statistical Methods for Complex Data 36
Session 2: Statistical Methods for Sequencing Data Analysis 36
Session 3: Modeling Big Biological Data with Complex Structures 37
Session 4: Bayesian Approaches for Modeling Dynamic Non-Gaussian Responses 38
Session 5: Recent Advances in Astro-Statistics 38
Session 6: Statistical Methods and Application in Genetics 39
Session 7: Statistical Inference of Complex Associations in High-Dimensional Data 40
Session 8: Recent Developments in Survival Analysis 40
Session 9: Industry Practice and Regulatory Pathway for Benefit-Risk Assessment of Medicinal Products 41
Session 10: Analysis of Observational Studies and Clinical Trials 42
Session 11: Lifetime Data Analysis 44
Session 12: Safety Signal Detection and Safety Analysis 44
Session 13: Survival and Recurrent Event Data Analysis 45
Session 14: Statistical Analysis on Massive Data from Point Processes 45
Session 15: High Dimensional Inference (or Testing) 46
Session 16: Phase II Clinical Trial Design with Survival Endpoint 47
Session 17: Statistical Modeling of High-throughput Genomics Data 47
Session 18: Statistical Applications in Finance 48
Session 19: Hypothesis Testing 49
Session 20: Design and Analysis of Clinical Trials 50


Session 21: New Methods for Big Data 51
Session 22: New Statistical Methods for Analysis of High Dimensional Genomic Data 51
Session 23: Recent Advances in Analysis of Longitudinal Data with Informative Observation Process 52
Session 24: Bayesian Models for High Dimensional Complex Data 53
Session 25: Statistical Methods for Network Analysis 54
Session 26: New Analysis Methods for Understanding Complex Diseases and Biology 54
Session 27: Recent Advances in Time Series Analysis 55
Session 28: Analysis of Correlated Longitudinal and Survival Data 56
Session 29: Clinical Pharmacology 57
Session 30: Sample Size Estimation 58
Session 31: Predictions in Clinical Trials 59
Session 32: Recent Advances in Statistical Genetics 59
Session 33: Structured Approach to High Dimensional Data with Sparsity and Low Rank Factorization 60
Session 34: Recent Developments in Dimension Reduction, Variable Selection, and Their Applications 61
Session 35: Post-Discontinuation Treatment in Randomized Clinical Trials 61
Session 36: New Advances in Semi-Parametric Modeling and Survival Analysis 62
Session 37: High-Dimensional Data Analysis: Theory and Application 63
Session 38: Leading Across Boundaries: Leadership Development for Statisticians 64
Session 39: Recent Advances in Adaptive Designs in Early Phase Trials 64
Session 40: High Dimensional Regression/Machine Learning 65
Session 41: Distributional Inference and Its Impact on Statistical Theory and Practice 66
Session 42: Applications of Spatial Modeling and Imaging Data 67
Session 43: Recent Development in Survival Analysis and Statistical Genetics 67
Session 44: Bayesian Methods and Applications in Clinical Trials with Small Population 68
Session 45: Recent Developments in Assessing Predictive Models in Survival Analysis 69
Session 46: Missing Data: the Interface between Survey Sampling and Biostatistics 70
Session 47: New Statistical Methods for Comparative Effectiveness Research and Personalized Medicine 70
Session 48: Student Award Session 1 71
Session 49: Network Analysis/Unsupervised Methods 72
Session 50: Personalized Medicine and Adaptive Design 73
Session 51: New Development in Functional Data Analysis 74
Session 52: Recent Regulatory/Industry Experience in Biosimilar Trial Designs 75
Session 53: Gatekeeping Procedures and Their Application in Pivotal Clinical Trials 76
Session 54: Approaches to Assessing Qualitative Interactions 76
Session 55: Interim Decision-Making in Phase II Trials 77
Session 56: Recent Advancement in Statistical Methods 78
Session 57: Building Bridges between Research and Practice in Time Series Analysis 78
Session 58: Recent Advances in Design for Biostatistical Problems 79
Session 59: Student Award Session 2 79
Session 60: Semi-parametric Methods 80
Session 61: Statistical Challenges in Variable Selection for Graphical Modeling 81
Session 62: Recent Advances in Non- and Semi-Parametric Methods 82
Session 63: Statistical Challenges and Development in Cancer Screening Research 83
Session 64: Recent Developments in the Visualization and Exploration of Spatial Data 84
Session 65: Advancement in Biostatistical Methods and Applications 84
Session 66: Analysis of Complex Data 85
Session 67: Statistical Issues in Co-development of Drug and Biomarker 86
Session 68: New Challenges for Statistical Analyst/Programmer 86
Session 69: Adaptive and Sequential Methods for Clinical Trials 87
Session 70: Survival Analysis 88
Session 71: Complex Data Analysis: Theory and Application 88
Session 72: Recent Development in Statistical Methods for Missing Data 89
Session 73: Machine Learning Methods for Causal Inference in Health Studies 90
Session 74: JP Hsu Memorial Session 90
Session 75: Challenge and New Development in Model Fitting and Selection 91
Session 76: Advanced Methods and Their Applications in Survival Analysis 91

Session 77: High Dimensional Variable Selection and Multiple Testing 92
Index of Authors 94

2014 Joint Applied Statistics Symposium of ICSA and KISS

June 15-18, Marriott Downtown Waterfront, Portland, Oregon, USA

Welcome to the 2014 joint International Chinese Statistical Association (ICSA) and Korean International Statistical Society (KISS) Applied Statistics Symposium!

This is the 23rd annual ICSA symposium and the 1st for KISS. The organizing committees have been working hard to put together a strong program, including 7 short courses, 3 keynote lectures, 76 scientific sessions, student paper sessions, and social events. Our scientific program includes keynote lectures from prominent statisticians Dr. Sharon-Lise Normand, Dr. Robert Gentleman, and Dr. Sastry Pantula, as well as invited and contributed talks covering cutting-edge topics on genome-scale data and big data, and on the new world of statistics after 2013, the International Year of Statistics. We hope this symposium will provide abundant opportunities for you to engage, learn, and network, and to find inspiration to advance old research ideas and develop new ones. We believe this will be a memorable and worthwhile learning experience for you.

Portland is located near the confluence of the Willamette and Columbia rivers and has a unique city culture. It is close to the famous Columbia Gorge, Oregon's high mountains, and the coast. Oregon is also famous for its many microbreweries and beautiful wineries, and there is no sales tax. June is a great time to visit. We hope you also have opportunities to experience the rich culture and activities the city has to offer during your stay.

Thanks for coming to the 2014 ICSA-KISS Applied Statistics Symposium in Portland!

Dongseok Choi and Rochelle Fu
On behalf of the 2014 ICSA-KISS Applied Statistics Symposium Executive and Organizing Committees

The City of Roses welcomes you!

Committees


Executive Committee
Dongseok Choi, Co-Chair, Oregon Health & Science U.
Rochelle Fu, Co-Chair & Treasurer, Oregon Health & Science U.
Joan Hu, Simon Fraser U.
Zhezhen Jin, Program Chair, Columbia U.
Ouhong Wang, Amgen
Ru-Fang Yeh, Genentech
X.H. Andrew Zhou, U. of Washington
Cheolwoo Park, Webmaster, U. of Georgia

Local Committee
Dongseok Choi, Co-Chair, Oregon Health & Science U.
Rochelle Fu, Chair, Oregon Health & Science U.
Yiyi Chen, Oregon Health & Science U.
Thuan Nguyen, Oregon Health & Science U.
Byung Park, Oregon Health & Science U.
Xinbo Zhang, Oregon Health & Science U.

Program Committee
Zhezhen Jin, Chair, Columbia U.
Gideon Bahn, VA Hospital
Kani Chen, Hong Kong U. of Science and Technology
Yang Feng, Columbia U.
Liang Fang, Gilead
Qi Jiang, Amgen
Mikyoung Jun, Texas A&M U.
Sin-Ho Jung, Duke U.
Xiaoping Sylvia Hu, Gene
Jane Paik Kim, Stanford U.
Mimi Kim, Albert Einstein College of Medicine
Mi-OK Kim, Cincinnati Children's Hospital Medical Center
Gang Li, Johnson and Johnson
Yunfeng Li, Pharmacyclics
Mei-Ling Ting Lee, U. of Maryland
Yoonkyung Lee, Ohio State U.
Meng-Ling Liu, New York U.
Xinhua Liu, Columbia U.
Xiaolong Luo, Celgene Corporation
Taesung Park, Seoul National U.
Yu Shen, MD Anderson Cancer Center
Greg (Guoxing) Soon, US Food and Drug Administration
Zheng Su, Deerfield Company
Christine Wang, Amgen
Lan Xue, Oregon State U.
Yichuan Zhao, Georgia State U.


Program Book Committee
Mengling Liu, Chair, New York U.
Tian Zheng, Columbia U.
Wen (Jenna) Su, Columbia U.
Zhezhen Jin, Columbia U.

Student Paper Award Committee
Wenqing He, Chair, U. of Western Ontario
Qixuan Chen, Columbia U.
Hyunson Cho, National Cancer Institute
Dandan Liu, Vanderbilt U.
Jinchi Lv, U. of Southern California

Short Course Committee
Xiaonan Xue, Chair, Albert Einstein College of Medicine
Wei-Ting Hwang, U. of Pennsylvania
Ryung Kim, Albert Einstein College of Medicine
Jessica Kim, US Food and Drug Administration
Laura Lu, US Food and Drug Administration
Mikyoung Jun, Texas A&M U.
Tao Wang, Albert Einstein College of Medicine

IT Support
Lixin (Simon) Gao, Biopier Inc.

Symposium Sponsors

The 2014 ICSA-KISS Applied Statistics Symposium is supported by a financial contribution from the following sponsors:

The organizing committees greatly appreciate the support of the above sponsors

The 2014 ICSA-KISS Joint Applied Statistics Symposium Exhibitors

CRC Press - Taylor & Francis Group

Springer Science & Business Media

The Lotus Group

[Hotel floor plans (graphic): Portland Marriott Downtown Waterfront, 1401 SW Naito Parkway, Portland, Oregon 97201. Hotel: (503) 226-7600; Sales Facsimile: (503) 226-1209; portlandmarriott.com. The plans cover the Main Lobby, Lower Level 1, 2nd Floor, and 3rd Floor, and show Salons A-I, the Portland, Eugene, Salem, Medford, Columbia, Willamette, Sunstone, Meadowlark, Douglas Fir, Salmon, Mount Hood, Hawthorne, Belmont, Laurelhurst, and Pearl Rooms, and the Registration Desk.]

Program Overview


Sunday, June 15th, 2014
8:00 AM - 6:00 PM, Ballroom Foyer: Registration
7:00 AM - 8:45 AM: Breakfast
9:45 AM - 10:15 AM: Break
8:00 AM - 5:00 PM, Salon A: Short Course: Recent Advances in Bayesian Adaptive Clinical Trial Design
8:00 AM - 5:00 PM, Salon B: Short Course: Analysis of Life History Data with Multistate Models
8:00 AM - 5:00 PM, Salon C: Short Course: Propensity Score Methods in Medical Research for the Applied Statistician
8:00 AM - 12:00 PM, Salon D: Short Course: ChIP-seq for Transcription and Epigenetic Gene Regulation
8:00 AM - 12:00 PM, Columbia: Short Course: Data Monitoring Committees in Clinical Trials
12:00 PM - 1:00 PM: Lunch for Registered Full-Day Short Course Attendees
1:00 PM - 5:00 PM, Salon D: Short Course: Analysis of Genetic Association Studies Using Sequencing Data and Related Topics
1:00 PM - 5:00 PM, Columbia: Short Course: Analysis of Biomarkers for Prognosis and Response Prediction
2:45 PM - 3:15 PM: Break
6:00 PM - 8:30 PM, Mt. Hood: ICSA Board Meeting (Invited Only)
7:00 PM - 9:00 PM, Salon E: Opening Mixer

Monday, June 16th, 2014
7:30 AM - 6:00 PM, Ballroom Foyer: Registration
7:00 AM - 8:45 AM: Breakfast
8:00 AM - 8:20 AM, Salon E-F: Welcome
8:20 AM - 9:30 AM, Salon E-F: Keynote I: Robert Gentleman, Genentech
9:30 AM - 10:00 AM, Ballroom Foyer: Break
10:00 AM - 12:00 PM, See program: Parallel Sessions
12:00 PM - 1:30 PM: Lunch on own
1:30 PM - 3:10 PM, See program: Parallel Sessions
3:10 PM - 3:30 PM, Ballroom Foyer: Break
3:30 PM - 5:10 PM, See program: Parallel Sessions

Tuesday, June 17th, 2014
8:20 AM - 5:30 PM, Ballroom Foyer: Registration
7:00 AM - 8:45 AM: Breakfast
8:20 AM - 9:30 AM, Salon E-F: Keynote II: Sharon-Lise Normand, Harvard University
9:30 AM - 10:00 AM, Ballroom Foyer: Break
10:00 AM - 12:00 PM, See program: Parallel Sessions
12:00 PM - 1:30 PM: Lunch on own
1:30 PM - 3:10 PM, See program: Parallel Sessions
3:10 PM - 3:30 PM, Ballroom Foyer: Break
3:30 PM - 5:30 PM, See program: Parallel Sessions
6:30 PM - 9:30 PM, Off site: Banquet (Banquet speaker: Dr. Sastry Pantula, Oregon State University)

Wednesday, June 18th, 2014
8:30 AM - 1:00 PM, Ballroom Foyer: Registration
7:30 AM - 9:00 AM: Breakfast
8:30 AM - 10:10 AM, See program: Parallel Sessions
10:10 AM - 10:30 AM, Ballroom Foyer: Break
10:30 AM - 12:10 PM, See program: Parallel Sessions

Keynote Lectures


Monday, June 16th, 8:20 AM - 9:30 AM

Robert Gentleman
Senior Director, Bioinformatics, Genentech

Speaker Biography: I joined Genentech in 2009 as Senior Director of the Bioinformatics and Computational Biology Department. I was excited by the opportunity to get involved in drug development and to do work that would directly impact patients. I had worked at two major cancer centers, and while immensely satisfying, the research done there is still fairly distant from the patient. At Genentech, patients are at the forefront of everything we do. Genentech Research is that rare blend of academia and industry that manages to capture most of the best aspects of both. The advent of genome scale data technologies is revolutionizing molecular biology and is providing us with new and exciting opportunities for drug development. I am very excited by the new opportunities we have to develop methods for computational discovery of potential drug targets. At the same time, these large genomic data sets provide us with opportunities to identify and understand different patient subsets and to help guide us towards much more targeted therapeutics.

Postdoctoral Mentor: Being a post-doc mentor is one of the highlights of being in Research. The ability to work with really talented post-docs who are interested in pushing the boundaries of computational science provides me with an outlet for my blue-skies research ideas.

Title: Analyzing Genome Scale Data
I will discuss some of the many genome scale data analysis problems, such as variant calling and genotyping, the statistical approaches used, and the software development needs of addressing these problems. I will also discuss approaches to parallelization of code and other practical computing issues that face most data analysts working on these data.
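The parallelization the talk refers to can be illustrated with a minimal sketch: genome-scale analyses often decompose naturally by chromosome, so each chromosome can be handed to a separate worker process. Everything here (the function names and the trivial per-chromosome step) is hypothetical and is not the speaker's actual pipeline.

```python
# Hedged illustration of one common parallelization pattern for genome-scale
# data: split the work by chromosome and farm it out to a process pool.
# The per-chromosome "analysis" below is a stand-in, not a real variant caller.
from multiprocessing import Pool

def summarize_chromosome(chrom):
    """Placeholder per-chromosome step; a real pipeline would read the
    alignments for `chrom` and emit candidate variants."""
    return chrom, len(chrom)  # trivial stand-in result

def run_parallel(chromosomes, workers=2):
    """Map the per-chromosome step across a pool of worker processes."""
    with Pool(workers) as pool:
        return dict(pool.map(summarize_chromosome, chromosomes))

if __name__ == "__main__":
    print(run_parallel(["chr1", "chr2", "chrX"]))
```

Because chromosomes differ greatly in size, real pipelines usually split further (for example, into fixed-width genomic windows) so that worker loads balance.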

Tuesday, June 17th, 8:20 AM - 9:30 AM

Sharon-Lise Normand
Professor, Department of Health Care Policy, Harvard Medical School; Department of Biostatistics, Harvard School of Public Health

Speaker Biography: Sharon-Lise T. Normand, PhD, is a professor of health care policy (biostatistics) in the Department of Health Care Policy at Harvard Medical School and in the Department of Biostatistics at the Harvard School of Public Health. Dr. Normand's research focuses on the development of statistical methods for health services research, primarily using Bayesian approaches to problem solving, including assessment of quality of care, methods for causal inference, provider profiling, meta-analysis, and latent variable modeling. She has developed a long line of research on methods for the analysis of patterns of treatment and quality of care for patients with cardiovascular disease and with mental disorders in particular.

Title: Combining Information for Assessing Safety, Effectiveness and Quality: Technology Diffusion and Health Policy
Health information growth has created unprecedented opportunities to evaluate therapies in large and broadly representative patient populations. Extracting sound evidence from large observational data is now at the forefront of health care policy decisions: regulators are moving away from a strict biomedical perspective to one that is wider for coverage of new medical technologies. Yet discriminating between beneficial and wasteful new technology remains methodologically challenging: while big data provide opportunities to study treatment effect heterogeneity, estimation of average causal effects in sub-populations is underdeveloped in observational data, and the correct choice of confounding adjustment is difficult in the large-p setting. In this talk, I discuss analytical issues related to the analysis of observational data when the goals involve characterizing the diffusion of multiple new technologies and assessing their causal impacts in the areas of mental illness and cardiovascular interventions. This work is supported in part by grants U01-MH103018 from the National Institutes of Health and U01-FD004493 from the US Food and Drug Administration.

Student Paper Awards


ASA Biopharmaceutical Awards

Guanhua Chen, University of North Carolina - Chapel Hill
- Title: Personalized Dose Finding Using Outcome Weighted Learning
- Time: Tuesday, June 17th, 1:30 PM - 3:10 PM
- Session 48: Student Award Session 1 (Portland Room, Lower Level 1)

Cheng Zheng, University of Washington
- Title: Survival Rates Prediction when Training Data and Target Data have Different Measurement Error
- Time: Tuesday, June 17th, 1:30 PM - 3:10 PM
- Session 48: Student Award Session 1 (Portland Room, Lower Level 1)

Jiann-Ping Hsu Pharmaceutical and Regulatory Sciences Student Paper Award

Sandipan Roy, University of Michigan
- Title: Estimating a Change-Point in High-Dimensional Markov Random Field Models
- Time: Wednesday, June 18th, 10:30 AM - 12:10 PM
- Session 74: JP Hsu Memorial Session (Salon D, Lower Level 1)

ICSA Student Paper Awards

Ting-Huei Chen, University of North Carolina - Chapel Hill
- Title: Using a Structural Equation Modeling Approach with Application in Alzheimer's Disease
- Time: Tuesday, June 17th, 3:30 PM - 5:30 PM
- Session 59: Student Award Session 2 (Portland Room, Lower Level 1)

Haolei Weng, Columbia University
- Title: Regularization after Retention in Ultrahigh Dimensional Linear Regression Models
- Time: Tuesday, June 17th, 1:30 PM - 3:10 PM
- Session 48: Student Award Session 1 (Portland Room, Lower Level 1)

Ran Tao, University of North Carolina - Chapel Hill
- Title: Analysis of Sequence Data Under Multivariate Trait-Dependent Sampling
- Time: Tuesday, June 17th, 3:30 PM - 5:30 PM
- Session 59: Student Award Session 2 (Portland Room, Lower Level 1)

Hsin-Wen Chang, Columbia University
- Title: Empirical Likelihood Based Tests for Stochastic Ordering under Right Censorship
- Time: Tuesday, June 17th, 3:30 PM - 5:30 PM
- Session 59: Student Award Session 2 (Portland Room, Lower Level 1)

Qiang Sun, University of North Carolina - Chapel Hill
- Title: Hard Thresholded Regression Via Linear Programming
- Time: Tuesday, June 17th, 1:30 PM - 3:10 PM
- Session 48: Student Award Session 1 (Portland Room, Lower Level 1)

Short Courses


1. Recent Advances in Bayesian Adaptive Clinical Trial Design

Presenters: Peter F. Thall & Brian P. Hobbs, The University of Texas MD Anderson Cancer Center, 1400 Hermann Pressler Dr., Houston, TX 77030-4008. Email: rex@mdanderson.org
Course length: One day
Outline/Description: This one-day short course will cover a variety of recently developed Bayesian methods for the design and conduct of adaptive clinical trials. Emphasis will be on practical application, with the course structured around a series of specific illustrative examples. Topics to be covered will include (1) using historical data in both planning and adaptive decision making during the trial, (2) using elicited utilities or scores of different types of multivariate patient outcomes to characterize complex treatment effects, (3) characterizing and calibrating prior effective sample size, (4) monitoring safety and futility, (5) eliciting and establishing priors, and (6) using computer simulation as a design tool. These methods will be illustrated by actual clinical trials, including cancer trials involving chemotherapy for leukemia and colorectal cancer, stem cell transplantation, and radiation therapy, as well as trials in neurology and neonatology. The illustrations will include both early phase trials to optimize dose, or dose and schedule, and randomized comparative phase III trials.

References:
Braun TM, Thall PF, Nguyen H, de Lima M. Simultaneously optimizing dose and schedule of a new cytotoxic agent. Clinical Trials 4:113-124, 2007.
Hobbs BP, Carlin BP, Mandrekar S, Sargent DJ. Hierarchical commensurate and power prior models for adaptive incorporation of historical information in clinical trials. Biometrics 67:1047-1056, 2011.
Hobbs BP, Sargent DJ, Carlin BP. Commensurate priors for incorporating historical information in clinical trials using general and generalized linear models. Bayesian Analysis 7:639-674, 2012.
Hobbs BP, Carlin BP, Sargent DJ. Adaptive adjustment of the randomization ratio using historical control data. Clinical Trials 10:430-440, 2013.
Morita S, Thall PF, Mueller P. Determining the effective sample size of a parametric prior. Biometrics 64:595-602, 2008.
Morita S, Thall PF, Mueller P. Evaluating the impact of prior assumptions in Bayesian biostatistics. Statistics in Biosciences 2:1-17, 2010.
Thall PF. Bayesian models and decision algorithms for complex early phase clinical trials. Statistical Science 25:227-244, 2010.
Thall PF, Szabo A, Nguyen HQ, et al. Optimizing the concentration and bolus of a drug delivered by continuous infusion. Biometrics 67:1638-1646, 2011.
Thall PF, Nguyen HQ. Adaptive randomization to improve utility-based dose-finding with bivariate ordinal outcomes. J Biopharmaceutical Statistics 22:785-801, 2012.
Thall PF, Nguyen HQ, Braun TM, Qazilbash M. Using joint utilities of the times to response and toxicity to adaptively optimize schedule-dose regimes. Biometrics, in press.

About the presenters:

Dr. Peter Thall has pioneered the use of Bayesian methods in medical research. He has published over 160 research papers and book chapters in the statistical and medical literature, including numerous papers providing innovative methods for the design, conduct, and analysis of clinical trials. Over the course of his career he has designed over 300 clinical trials. He has presented 20 short courses and over 130 invited talks, and regularly provides statistical consultation for corporations in the pharmaceutical industry. He has served as an associate editor for the journals Statistics in Medicine, Journal of the National Cancer Institute, and Biometrics; he currently is an associate editor for the journals Clinical Trials and Statistics in Biosciences, and is an American Statistical Association Media Expert.

Dr. Brian P. Hobbs is Assistant Professor in the Department of Biostatistics at the University of Texas MD Anderson Cancer Center in Houston, Texas. He completed his undergraduate education at the University of Iowa and obtained master's and doctoral degrees in biostatistics at the University of Minnesota in Minneapolis. He was the recipient of the 2010 ENAR John Van Ryzin Student Award. Dr. Hobbs completed a postdoctoral fellowship in the Department of Biostatistics at MD Anderson Cancer Center before joining the faculty in 2011. His methodological expertise covers Bayesian inferential methods, hierarchical modeling, utility-based inference, adaptive trial design in the presence of historical controls, sequential design in the presence of co-primary endpoints, and semiparametric modeling of functional imaging data.

2. Analysis of Life History Data with Multistate Models

Presenters: Richard Cook and Jerry Lawless, Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ontario, Canada. Email: rjcook@uwaterloo.ca, jlawless@uwaterloo.ca


Course Length: One day

Outline/Description:

Life history studies examine specific outcomes and processes during people's lifetimes. For example, cohort studies of chronic disease provide information on disease progression, fixed and time-varying risk factors, and the extent of heterogeneity in the population. Modelling and analysis of life history processes is often facilitated by the use of multistate models. The aim of this workshop is to present models and methods for multistate analyses and to indicate some current topics of research. Software for conducting analyses will be discussed, and code for specific problems will be given. A wide range of illustrations involving chronic disease and other conditions will be presented. Course notes will be distributed.
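A central quantity behind this material is the transition intensity of a multistate process. As a hedged illustration in standard notation (not taken from the course notes), for a process Z(t) with history H(t^-), the intensity of a transition from state k to state l is:

```latex
% Transition intensity for a multistate process Z(t),
% conditional on the process history \mathcal{H}(t^-):
\lambda_{kl}\bigl(t \mid \mathcal{H}(t^-)\bigr)
  = \lim_{\Delta t \downarrow 0}
    \frac{\Pr\bigl\{ Z\bigl((t+\Delta t)^-\bigr) = l \;\big|\;
          Z(t^-) = k,\ \mathcal{H}(t^-) \bigr\}}{\Delta t},
  \qquad k \neq l .
```

Under a Markov assumption the dependence on the full history drops, and the intensities are functions of the current state and time alone; likelihoods for multistate analyses are built from these intensities.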

TOPICS

1. Introduction
2. Some Basic Quantities for Event History Modelling
3. Some Illustrative Analyses Involving Multistate Models
4. Processes with Intermittent Observation
5. Modelling Heterogeneity and Associations
6. Dependent Censoring and Inspection
7. Some Other Topics

About the presenters:

Richard Cook is Professor of Statistics at the University of Waterloo and holder of the Canada Research Chair in Statistical Methods for Health Research. He has published extensively in the areas of statistical methodology, clinical trials, medicine, and public health, including many articles on event history analysis, multistate models, and the statistical analysis of life history data. He collaborates with numerous researchers in medicine and public health and has consulted widely with pharmaceutical companies on the design and analysis of clinical trials.

Jerry Lawless is Distinguished Professor Emeritus of Statistics at the University of Waterloo. He has published extensively on statistical models and methods for survival and event history data, life history processes, and other topics, and is the author of Statistical Models and Methods for Lifetime Data (2nd edition, Wiley, 2003). He has consulted and worked in many applied areas, including medicine, public health, manufacturing, and reliability. Dr. Lawless was the holder of the GM-NSERC Industrial Research Chair in Quality and Productivity from 1994 to 2004.

Drs. Cook and Lawless have co-authored many papers, as well as the book The Statistical Analysis of Recurrent Events (Springer, 2007). They have also given numerous workshops together.

3. Propensity Score Methods in Medical Research for the Applied Statistician

Presenter: Ralph D'Agostino, Jr., PhD, Department of Biostatistical Sciences, Wake Forest University School of Medicine, Medical Center Boulevard, Winston-Salem, NC 27157. Email: rdagosti@wakehealth.edu
Course length: One day
Outline/Description:

The purpose of this short course is to introduce propensity score methodology to applied statisticians. Propensity score methods are currently widely used in research, but their use is often not accompanied by an explanation of how they were used or whether they were used appropriately. This course will teach the attendee the definition of the propensity score, show how it is estimated, and present several applied examples of its use. In addition, SAS code will be presented to show how to estimate propensity scores, assess model success, and perform final treatment effect estimation. Published medical journal articles that have used propensity score methods will be examined, and some attention will be given to the use of propensity score methods for detecting safety signals using post-marketing data. Upon completion of this workshop, researchers should be able to understand what a propensity score is, know how to estimate it, identify under what circumstances propensity scores can be used, know how to evaluate whether a propensity score model "worked", and be able to critically review the medical literature where propensity scores have been used to determine whether they were used appropriately. In addition, attendees will be shown statistical programs using SAS software that estimate propensity scores, assess the success of the propensity score model, and estimate treatment effects that take propensity scores into account. Experience with SAS programming would be useful for attendees.

Textbook/References:

Rosenbaum P, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika 1983; 70:41-55.

D'Agostino RB Jr. Propensity score methods for bias reduction in the comparison of a treatment to a non-randomized control group. Stat Med 1998; 17:2265-2281.

Rubin DB. The design versus the analysis of observational studies for causal effects: parallels with the design of randomized studies. Stat Med 2007; 26:20-36.

D'Agostino RB Jr, D'Agostino RB Sr. Estimating treatment effects using observational data. JAMA 2007; 297(3):314-316.

Yue LQ. Statistical and regulatory issues with the application of propensity score analysis to non-randomized medical device clinical studies. J Biopharm Stat 2007; 17(1):1-13.

D'Agostino RB Jr. Propensity scores in cardiovascular research. Circulation 2007; 115(17):2340-2343.
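The course teaches this workflow with SAS programs; as a language-neutral companion, here is a minimal pure-Python sketch of the core steps: fit a logistic model for the propensity score e(x) = P(treated | x) by gradient ascent, then form inverse-probability-of-treatment weights. The data layout and function names are hypothetical, and a real analysis would add the balance and model diagnostics the course covers.

```python
# Minimal, hypothetical sketch of the propensity-score workflow (the course
# itself uses SAS): estimate e(x) = P(treated | x) with a logistic model,
# then weight treated subjects by 1/e(x) and controls by 1/(1 - e(x)).
import math

def fit_logistic(X, z, steps=500, lr=0.5):
    """Gradient-ascent estimate of logistic-regression coefficients
    (intercept first). X: list of covariate rows; z: 0/1 treatment flags."""
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(steps):
        grad = [0.0] * (p + 1)
        for xi, zi in zip(X, z):
            eta = beta[0] + sum(b * x for b, x in zip(beta[1:], xi))
            resid = zi - 1.0 / (1.0 + math.exp(-eta))  # z - e(x)
            grad[0] += resid
            for j, x in enumerate(xi):
                grad[j + 1] += resid * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def propensity_scores(X, beta):
    """Fitted e(x) for each covariate row."""
    return [1.0 / (1.0 + math.exp(-(beta[0] + sum(b * x for b, x in zip(beta[1:], xi)))))
            for xi in X]

def ipt_weights(z, e):
    """Inverse-probability-of-treatment weights:
    1/e(x) for treated subjects, 1/(1 - e(x)) for controls."""
    return [1.0 / ei if zi == 1 else 1.0 / (1.0 - ei) for zi, ei in zip(z, e)]
```

Comparing weighted outcome means between the treated and control groups then estimates an average treatment effect; checking covariate balance after weighting is the course's test of whether the propensity model "worked".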

About the presenters: Dr. D'Agostino holds a PhD in Mathematical Statistics from Harvard University. He is a Fellow of the American Statistical Association and a Professor of Biostatistical Sciences at the Wake Forest School of Medicine (WFSM). He has been a principal investigator for several R01 grants/subcontracts funded by the NIH/CDC, has served as the Statistical Associate Editor for Arthroscopy (The Journal of Arthroscopy and Related Surgery) since 2008, and has previously been on the editorial boards of Current Controlled Trials in Cardiovascular Medicine, the Journal of Cardiac Failure, and the American Journal of Epidemiology. He has published over 235 manuscripts and book chapters in areas of statistical methodology (in particular, propensity score methods), cardiovascular disease, diabetes, cancer, and genetics. He has extensive experience in the design and analysis of clinical trials, observational studies, and large-scale epidemiologic studies. He has been an author on several manuscripts that describe propensity score methodology, as well as many applied manuscripts that use this methodology. In addition, during the past twenty years Dr. D'Agostino has made numerous presentations and has taught several short courses and workshops on propensity score methods.

4. ChIP-seq for Transcription and Epigenetic Gene Regulation

Presenter: X. Shirley Liu, Professor of Biostatistics and Computational Biology, Harvard School of Public Health; Director, Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute; Associate Member, Broad Institute. 450 Brookline Ave, Mail CLS-11007, Boston, MA 02215. Email: xsliu@jimmy.harvard.edu

Course length: Half day

Outline/Description: With next-generation sequencing, ChIP-seq has become a popular technique for studying transcriptional and epigenetic gene regulation. The short course will introduce the technique of ChIP-seq and discuss the computational and statistical issues in analyzing ChIP-seq data. These include initial data QC, normalizing biases, identifying transcription factor binding sites and target genes, predicting additional transcription factor drivers in biological processes, and integrating binding with transcriptome and epigenome information. We will also emphasize the importance of dynamic ChIP-seq and introduce some of the tools and databases that are useful for ChIP-seq data analysis.
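As a conceptual companion to the outline above (not the course's actual software), here is a minimal sketch of the peak-detection idea: flag genomic bins whose read count is improbable under a background Poisson model. Real peak callers such as MACS use local background estimates and many refinements; the bin counts below are invented.

```python
import math

def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), via the complement of the lower tail."""
    p = 0.0
    for i in range(k):
        p += math.exp(-lam) * lam ** i / math.factorial(i)
    return 1.0 - p

def call_peaks(bin_counts, alpha=1e-3):
    """Return indices of bins whose read count is significant
    against the genome-wide mean count used as the background rate."""
    lam = sum(bin_counts) / len(bin_counts)
    return [i for i, k in enumerate(bin_counts) if poisson_sf(k, lam) < alpha]

counts = [3, 2, 4, 3, 25, 2, 3, 1, 30, 2]
print(call_peaks(counts))  # → [4, 8]
```

The two enriched bins stand out because counts of 25 and 30 are essentially impossible under a Poisson background with mean 7.5, while the other bins are not.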

Textbook/References:
Park PJ. ChIP-seq: advantages and challenges of a maturing technology. Nat Rev Genet. 2009 Oct;10(10):669-80.
Shin H, Liu T, Duan X, Zhang Y, Liu XS. Computational methodology for ChIP-seq analysis. Quantitative Biology. 2013.

About the presenter: Dr. X. Shirley Liu is Professor of Biostatistics and Computational Biology at Harvard School of Public Health and Director of the Center for Functional Cancer Epigenetics at the Dana-Farber Cancer Institute. Her research focuses on computational models of transcriptional and epigenetic regulation through algorithm development and data integration for high-throughput data. She has developed a number of widely used transcription factor motif-finding algorithms (cited over 1,700 times) and ChIP-chip/seq analysis algorithms (over 8,000 users), and has conducted pioneering research studies on gene regulation in development, metabolism, and cancer. Dr. Liu has published over 100 papers, including over 30 in the Nature, Science, or Cell series, and has an H-index of 50 according to Google Scholar. She has presented at over 50 conferences and workshops and given research seminars at over 70 academic and research institutions worldwide.

5. Data Monitoring Committees in Clinical Trials

Presenter: Jay Herson, PhD, Senior Associate, Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD. Email: jayherson@earthlink.net

Course length: Half day

Outline/Description: This workshop deals with best practices for data monitoring committees (DMCs) in the pharmaceutical industry. The emphasis is on safety monitoring, because safety constitutes 90% of the workload for pharmaceutical industry DMCs. The speaker summarizes his experience over 24 years of working as a statistical member of, or supervisor of statistical support for, DMCs, and provides insight into the behind-the-scenes workings of DMCs that those working in industry or at the FDA may find surprising. The introduction presents a stratification of the industry into Big Pharma, Middle Pharma, and Infant Pharma, which will be referred to often in this workshop. Subsequent sections deal with DMC formation, DMC meetings, and the process of serious adverse event (SAE) data flow. The tutorial's section on clinical issues explains the nature of MedDRA coding as well as issues in multinational trials. This is followed by a statistical section that reviews and illustrates the various methods of statistical analysis of treatment-emergent adverse events, dealing with multiplicity and, if time allows, likelihood and Bayesian methods.

Short Courses

2014 ICSA-KISS Applied Statistics Symposium Portland June 15-18 13

The workshop's review of biases and pitfalls describes reporting bias, analysis bias, granularity bias, and competing risks, with recommendations to reduce bias. A description of DMC decisions goes through various actions and ad hoc analyses the DMC can make when faced with an SAE issue, and their limitations. The workshop concludes with emerging issues such as adaptive designs, causal inference, biomarkers, training DMC members, cost control, DMC audits, mergers and licensing, and the high-tech future of clinical trials.

Text: Herson J. Data and Safety Monitoring Committees in Clinical Trials. Chapman & Hall/CRC, 2009.

About the presenter: Jay Herson received his PhD in Biostatistics from Johns Hopkins in 1971. After working on cancer clinical trials at MD Anderson Hospital, he formed Applied Logic Associates (ALA) in Houston in 1983. ALA grew to be a biostatistical-data management CRO with 50 employees when it was sold to Westat in 2001. Jay joined the adjunct faculty in Biostatistics at Johns Hopkins in 2004. His interests are interim analysis in clinical trials, data monitoring committees, and statistical regulatory issues. He chaired the first known data monitoring committee in the pharmaceutical industry in 1988. He is the author of numerous papers on statistical and clinical trial methodology, and in 2009 he authored the book Data and Safety Monitoring Committees in Clinical Trials, published by Chapman & Hall/CRC.

6. Analysis of Genetic Association Studies Using Sequencing Data and Related Topics

Presenters: Xihong Lin, Department of Biostatistics, Harvard School of Public Health (xlin@hsph.harvard.edu); Seunggeun Lee, University of Michigan (leeshawn@umich.edu)

Course length: Half day

Outline/Description: This short course discusses current methodology for analyzing sequencing association studies aimed at identifying the genetic basis of common complex diseases. The rapid advances in next-generation sequencing technologies provide an exciting opportunity to gain a better understanding of biological processes and new approaches to disease prevention and treatment. During the past few years, an increasing number of large-scale sequencing association studies, such as exome-chip arrays, candidate gene sequencing, and whole exome and whole genome sequencing studies, have been conducted, and preliminary analysis results have rapidly become available. These studies could potentially identify new genetic variants that play important roles in understanding disease etiology or treatment response. However, due to the massive number of variants, the rareness of many of these variants across the genome, sequencing costs, and the complexity of diseases, efficient methods for designing and analyzing sequencing studies remain vitally important yet challenging. This short course provides an overview of statistical methods for the analysis of genome-wide sequencing association studies and related topics. Topics include study designs for sequencing studies, data processing pipelines, statistical methods for detecting rare variant effects, meta-analysis, gene-environment interaction, population stratification, and mediation analysis for integrative analysis of genetic and genomic data. Data examples will be provided and software will be discussed.

Textbook/References: Handouts and references will be provided.

About the presenters: Xihong Lin is Professor of Biostatistics and Coordinating Director of the Program of Quantitative Genomics at the School of Public Health of Harvard University. Dr. Lin's research interests lie in statistical genetics and 'omics, especially the development and application of statistical and computational methods for the analysis of high-throughput genetic and omics data in epidemiological and clinical studies, and statistical methods for the analysis of correlated data, such as longitudinal, clustered, and family data. Dr. Lin's specific areas of expertise include statistical methods for genome-wide association studies and next-generation sequencing association studies, genes and environment, mixed models, and nonparametric and semiparametric regression. She received the 2006 Presidents' Award for the outstanding statistician from the Committee of Presidents of Statistical Societies (COPSS) and the 2002 Mortimer Spiegelman Award for the outstanding biostatistician from the American Public Health Association. She is an elected fellow of the American Statistical Association, the Institute of Mathematical Statistics, and the International Statistical Institute. Dr. Lin was Chair of COPSS between 2010 and 2012. She is currently a member of the Committee on Applied and Theoretical Statistics of the US National Academy of Sciences. Dr. Lin is a recipient of a MERIT (Method to Extend Research in Time) award from the National Institutes of Health, which provides long-term research grant support. She is the PI of the T32 training grant on interdisciplinary training in statistical genetics and computational biology. She has served on numerous editorial boards of statistical journals. She was formerly the Coordinating Editor of Biometrics and is currently co-editor of Statistics in Biosciences and Associate Editor of the Journal of the American Statistical Association and the American Journal of Human Genetics. She was a permanent member of the NIH study section on Biostatistical Methods and Research Design (BMRD) and has served on a large number of other study sections at NIH and NSF.
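The rare-variant collapsing idea mentioned among the course topics can be sketched in a few lines. This toy version (invented genotypes and a sample-MAF threshold) only illustrates how variants in a gene are collapsed into a per-subject burden score; real analyses use regression frameworks with covariates or variance-component tests such as SKAT.

```python
def burden_scores(genotypes, maf_threshold=0.05):
    """Collapse rare variants into a per-subject burden score.

    genotypes: one list per subject of minor-allele counts (0/1/2) per variant.
    Variants whose sample minor-allele frequency is below maf_threshold are
    kept as "rare"; each subject's score is the sum of their rare-allele counts.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Sample MAF of each variant: total minor alleles over 2n chromosomes.
    mafs = [sum(g[j] for g in genotypes) / (2.0 * n) for j in range(m)]
    rare = [j for j in range(m) if mafs[j] < maf_threshold]
    return [sum(g[j] for j in rare) for g in genotypes]

# Five subjects, three variants; the third variant is too common (MAF 0.4).
geno = [[0, 1, 0], [1, 0, 0], [0, 0, 2], [0, 0, 1], [0, 0, 1]]
print(burden_scores(geno, maf_threshold=0.2))  # → [1, 1, 0, 0, 0]
```

The burden scores would then enter a regression of phenotype on score (plus covariates); that association step is omitted here.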


Seunggeun (Shawn) Lee is an assistant professor of Biostatistics at the University of Michigan. He received his PhD in Biostatistics from the University of North Carolina at Chapel Hill and completed postdoctoral training at the Harvard School of Public Health. His research focuses on developing statistical and computational methods for the analysis of large-scale, high-dimensional genetic and genomic data, which is essential to better understand the genetic architecture of complex diseases and traits. He is a recipient of the NIH Pathway to Independence Award (K99/R00).

7. Analysis of Biomarkers for Prognosis and Response Prediction

Presenter: Patrick J. Heagerty, Professor and Associate Chair, Department of Biostatistics, University of Washington, Seattle, WA 98195. Email: heagerty@u.washington.edu

Course length: Half day

Outline/Description: Longitudinal studies allow investigators to correlate changes in time-dependent exposures or biomarkers with subsequent health outcomes. The use of baseline or time-dependent markers to predict a subsequent change in clinical status, such as transition to a diseased state, requires the formulation of appropriate classification and prediction error concepts. Similarly, the evaluation of markers that could be used to guide treatment requires specification of the operating characteristics associated with use of the marker. The first part of this course will introduce predictive accuracy concepts that allow evaluation of time-dependent sensitivity and specificity for prognosis of a subsequent event time. We will overview options that are appropriate both for baseline markers and for longitudinal markers. Methods will be illustrated using examples from HIV and cancer research and will highlight R packages that are currently available. Time permitting, the second part of this course will introduce statistical methods that can characterize the performance of a biomarker toward accurately guiding treatment choice and toward improving health outcomes when the marker is used to selectively target treatment. Examples will include the use of imaging information to guide surgical treatment and the use of genetic markers to select subjects for treatment.

Textbook/References:
Heagerty PJ, Lumley T, Pepe MS. Time-dependent ROC curves for censored survival data and a diagnostic marker. Biometrics. 2000;56:337-344.
Heagerty PJ, Zheng Y. Survival model predictive accuracy and ROC curves. Biometrics. 2005;61(1):92-105.
Saha P, Heagerty PJ. Time-dependent predictive accuracy in the presence of competing risks. Biometrics. 2010;66(4):999-1011.

About the presenter: Patrick Heagerty is Professor of Biostatistics, University of Washington, Seattle, WA. He has been the director of the center for biomedical studies at the University of Washington School of Medicine and Public Health. He is one of the leading experts on methods for longitudinal studies, including the evaluation of markers used to predict future clinical events. He has made significant contributions to many areas of research, including semi-parametric regression and estimating equations, marginal models and random effects for longitudinal data, dependence modeling for categorical time series, and hierarchical models for categorical spatial data. He is an elected fellow of the American Statistical Association and the Institute of Mathematical Statistics.
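In the censoring-free toy case below, the cumulative/dynamic time-dependent accuracy concepts from the first reference (Heagerty, Lumley and Pepe, 2000) reduce to simple conditional proportions: sensitivity(c, t) = P(M > c | T <= t) and specificity(c, t) = P(M <= c | T > t). With censored data one substitutes Kaplan-Meier-type estimators, as implemented in R packages such as survivalROC; the marker values and event times here are invented.

```python
def td_sens_spec(marker, event_time, c, t):
    """Time-dependent sensitivity and specificity at cutoff c and horizon t,
    assuming fully observed (uncensored) event times.

    Cases are subjects with events by time t; controls are event-free at t."""
    cases = [m for m, s in zip(marker, event_time) if s <= t]
    ctrls = [m for m, s in zip(marker, event_time) if s > t]
    sens = sum(m > c for m in cases) / len(cases)
    spec = sum(m <= c for m in ctrls) / len(ctrls)
    return sens, spec

marker = [2.0, 1.5, 3.1, 0.7, 1.6, 1.1]
event_time = [1.0, 4.0, 0.5, 5.0, 2.0, 3.0]
print(td_sens_spec(marker, event_time, c=1.8, t=2.5))  # sensitivity 2/3, specificity 1.0
```

Sweeping the cutoff c traces out the time-dependent ROC curve at horizon t; the area under that curve summarizes prognostic accuracy at t.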

Social Programs


Opening Mixer
Sunday, June 15, 2014, 7:00 PM - 9:00 PM
Salon E, Lower Level 1

Banquet
Tuesday, June 17, 2014, 6:30 PM - 9:30 PM
JIN WAH Vietnamese & Chinese Seafood Restaurant, http://www.jinwah.com

Banquet Speech: "The World of Statistics"
After a successful International Year of Statistics 2013, we enter the new World of Statistics. This is a great opportunity to think of our profession and to look forward to the impact the statistical sciences can have on innovation and discoveries in science, engineering, business, and education. Are we going to be obsolete, or omnipresent?

Speaker: Dr. Sastry Pantula, Dean, College of Science, Oregon State University, and former President of the American Statistical Association.

Sastry G. Pantula became dean of the College of Science at Oregon State University in the fall of 2013. Prior to that, he served as director of the National Science Foundation's Division of Mathematical Sciences from 2010 to 2013.

Pantula headed the statistics department at North Carolina State University (NCSU), where he served on the faculty for nearly 30 years; he also directed their Institute of Statistics. Pantula served as president of the American Statistical Association (ASA) in 2010. In addition to being an ASA fellow, he is a fellow of the American Association for the Advancement of Science (AAAS), a member of the honor societies Mu Sigma Rho and Phi Kappa Phi, and was inducted into the NCSU Academy of Outstanding Teachers in 1985.

As dean of Oregon State's College of Science and professor of statistics, Pantula provides leadership to world-class faculty in some of the university's most recognized disciplines, including nationally recognized programs in chemistry, informatics, integrative biology, marine studies, material science, physics, and others.

During his tenure at NCSU, Pantula worked with his dean and the college foundation to create three $1 million endowments for distinguished professors. He also worked with colleagues and alumni to secure more than $7 million in funding from the National Science Foundation, other agencies, and industry to promote graduate student training and mentorship.

Pantula's research areas include time series analysis and econometric modeling, with a broad range of applications. He has worked with the National Science Foundation, the US Fish and Wildlife Service, the US Environmental Protection Agency, and the US Bureau of the Census on projects ranging from population estimates to detecting trends in global temperature.

As home to the core life, physical, mathematical, and statistical sciences, the College of Science has built a foundation of excellence. It helped Oregon State acquire the top ranking in the United States for conservation biology in recent years and receive top-10 rankings from the Chronicle of Higher Education for the Departments of Integrative Biology (formerly Zoology) and Science Education. The diversity of sciences in the College, including the mathematical and statistical sciences, provides innovative opportunities for fundamental and multidisciplinary research collaborations across campus and around the globe.

Pantula holds bachelor's and master's degrees in statistics from the Indian Statistical Institute in Kolkata, India, and a PhD in statistics from Iowa State University.

2014 ICSA China Statistics Conference
July 4 - July 5, 2014, Shanghai, China

2nd Announcement of the Conference (April 8, 2014)

To attract statistical researchers and students in China and other countries to present their work and experience with statistical colleagues, and to strengthen the connections between Chinese and overseas statisticians, the 2014 ICSA China Statistics Conference will be organized by the Committee for ICSA Shanghai and hosted by East China Normal University (ECNU) from July 4 to July 5, 2014, in Shanghai, China.

The conference will invite leading statistical professionals in mainland China, Hong Kong, Taiwan, the United States, and worldwide to present their research work. It will cover a broad range of statistics, including mathematical statistics, applied statistics, biostatistics, and statistics in finance and economics, and will provide a good platform for statistical professionals all over the world to share their latest research and applications of statistics. The invited speakers include Prof. L.J. Wei (Harvard University), Prof. Tony Cai (University of Pennsylvania), Prof. Ying Lu (Stanford University), Prof. Ming-Hui Chen (University of Connecticut), Prof. Danyu Lin (University of North Carolina at Chapel Hill), and other distinguished statisticians.

The oral presentations at the conference will be conducted in either English or Chinese. Although the Program Committee recommends presentation slides in English, Chinese versions of the slides may also be used.

The program committee is working on the conference program, and more information will be distributed very soon. Should you have any inquiries about the program, please contact Dr. Dejun Tang (dejun.tang@novartis.com) or Dr. Yankun Gong (yankun.gong@novartis.com).

For conference registration and hotel reservation, please contact Prof. Shujin Wu at ECNU (sjwu@stat.ecnu.edu.cn).

Program Committee & Local Organizing Committee

2014 ICSA China Statistics Conference


ICSA DINNER at 2014 JSM in Boston, MA
The ICSA will hold its annual members meeting on August 6 (Wednesday) at 6:00 PM in the Boston Convention & Exhibition Center, room CC-157B. An ICSA banquet will follow the members meeting at Osaka Japanese Sushi & Steak House, 14 Green St, Brookline, MA 02446, (617) 732-0088, http://brooklineosaka.com. Osaka is a Japanese fusion restaurant located in Brookline and can be reached via the MBTA subway Green Line "C" branch (Coolidge Corner stop). This restaurant features a cozy setting, superior cuisine, and elegant decor. The banquet menu will include Oyster 3-Ways, Rock Shrimp, Shrimp Tempura, Sushi and Sashimi Boat, Hibachi Seafood, Char-Grilled Sea Bass, and Lobster. Complimentary wine/sake/soft drinks will be served, and a cash bar for extra drinks will be available. The restaurant also has a club dance floor that provides complimentary Karaoke.

Scientific Program (Presenting Author)

Scientific Program (June 16th - June 18th)

Monday, June 16, 8:00 AM - 9:30 AM

Keynote Session I (Keynote)
Room: Salon E-F, Lower Level 1
Organizers: ICSA-KISS 2014 organizing committee
Chair: Dongseok Choi, Oregon Health & Science University

8:00 AM Welcome
Ying Lu, ICSA 2014 President

8:05 AM Congratulatory Address
George C. Tiao, ICSA Founding President

8:20 AM Keynote Lecture I
Robert Gentleman, Genentech

9:30 AM Floor Discussion

Monday, June 16, 10:00 AM - 12:00 PM

Session 1: Emerging Statistical Methods for Complex Data (Invited)
Room: Salon A, Lower Level 1
Organizer: Lan Xue, Oregon State University
Chair: Lan Xue, Oregon State University

10:00 AM Estimation of the Error Auto-Correlation Matrix in Semi-parametric Model for Brain fMRI Data
Chunming Zhang and Xiao Guo, University of Wisconsin-Madison

10:25 AM Kernel Additive Sliced Inverse Regression
Heng Lian, Nanyang Technological University

10:50 AM Variable Selection with Prior Information for Generalized Linear Models via the Prior LASSO Method
Yuan Jiang1, Yunxiao He2 and Heping Zhang3; 1Oregon State University, 2Nielsen Company, 3Yale University

11:15 AM Bootstrapping High Dimensional Vector: Interplay Between Dependence and Dimensionality
Xianyang Zhang1 and Guang Cheng2; 1University of Missouri at Columbia, 2Purdue University

11:40 AM Floor Discussion

Session 2: Statistical Methods for Sequencing Data Analysis (Invited)
Room: Salon B, Lower Level 1
Organizer: Yanming Di, Oregon State University
Chair: Gu Mi, Oregon State University

10:00 AM A Penalized Likelihood Approach for Robust Estimation of Isoform Expression
Hui Jiang1 and Julia Salzman2; 1University of Michigan, 2Stanford University

10:25 AM Classification on Sequencing Data and its Applications on a Human Breast Cancer Dataset
Jun Li, University of Notre Dame

10:50 AM Power-Robustness Analysis of Statistical Models for RNA Sequencing Data
Gu Mi, Yanming Di and Daniel W. Schafer, Oregon State University

11:15 AM Discussant: Wei Sun, University of North Carolina at Chapel Hill

11:40 AM Floor Discussion

Session 3: Modeling Big Biological Data with Complex Structures (Invited)
Room: Salon C, Lower Level 1
Organizer: Hua Tang, Stanford University
Chair: Marc Coram, Stanford University

10:00 AM High Dimensional Graphical Models Learning
Jie Peng1 and Ru Wang1; 1University of California at Davis

10:25 AM Statistical Analysis of RNA Sequencing Data
Mingyao Li and Yu Hu, University of Pennsylvania

10:50 AM Quantifying the Role of Steric Constraints in Nucleosome Positioning
H. Tomas Rube and Jun S. Song, University of Illinois at Urbana-Champaign

11:15 AM Integrative Dynamic Omics Networks and Personalized Medicine
George I. Mias, Michigan State University

11:40 AM Floor Discussion

Session 4: Bayesian Approaches for Modeling Dynamic Non-Gaussian Responses (Invited)
Room: Salon D, Lower Level 1
Organizer: Xiaojing Wang, University of Connecticut
Chair: Xun Jiang, Amgen Inc

10:00 AM Binary State Space Mixed Models with Flexible Link Functions
Dipak Dey1, Xun Jiang2 and Carlos Abanto-Valle3; 1University of Connecticut, 2Amgen Inc, 3Federal University of Rio de Janeiro

10:25 AM Bayesian Spatial-Temporal Modeling of Ecological Zero-Inflated Count Data
Xia Wang1, Ming-Hui Chen2, Rita C. Kuo3 and Dipak K. Dey2; 1University of Cincinnati, 2University of Connecticut, 3Lawrence Berkeley National Laboratory

10:50 AM Real-time Bayesian Parameter Estimation for Item Response Models
Ruby Chiu-Hsing Weng, National Chengchi University

11:15 AM Statistical Prediction for Virginia Lyme Disease Emergence Based on Spatio-temporal Count Data
Yuanyuan Duan, Jie Li, Yili Hong, Korine Kolivras, Stephen Prisley, James Campbell and David Gaines, Virginia Tech


11:40 AM Floor Discussion

Session 5: Recent Advances in Astro-Statistics (Invited)
Room: Salon G, Lower Level 1
Organizer: Thomas Lee, University of California at Davis
Chair: Alexander Aue, University of California at Davis

10:00 AM Embedding the Big Bang Cosmological Model into a Bayesian Hierarchical Model for Supernova Light Curve Data
David van Dyk, Roberto Trotta, Xiyun Jiao and Hikmatali Shariff, Imperial College London

10:25 AM Marrying Domain Knowledge and Statistical Methods
Ashish Mahabal, George Djorgovski, Matthew Graham, Ciro Donalek and Andrew Drake, California Institute of Technology

10:50 AM Nonlinear Classification of X-Ray Binaries
Luke Bornn and Saku Vrtilek, Harvard University

11:15 AM Persistent Homology and the Topology of the Intergalactic Medium
Fabrizio Lecci, Carnegie Mellon University

11:40 AM Floor Discussion

Session 6: Statistical Methods and Application in Genetics (Invited)
Room: Salon H, Lower Level 1
Organizer: Ying Wei, Columbia University
Chair: Ying Wei, Columbia University

10:00 AM Identification of Homogeneous and Heterogeneous Covariate Structure in Pooled Cohort Studies
Xin Cheng1, Wenbin Lu2 and Mengling Liu1; 1New York University, 2North Carolina State University

10:25 AM Gene Expression Analyses in Evaluating Translational Biomarkers from Drug-Induced Idiopathic Pulmonary Fibrosis in Animal Models
Wenfei Zhang, Yuefeng Lu, Tai-He Xia, Guillaume Wettstein, Jean-Pierre Bidouard and Xavier Marniquet, Sanofi-aventis US LLC

10:50 AM DNA Methylation, Cell-Type Distribution, and EWAS
E. Andres Houseman, Oregon State University

11:15 AM Secondary Quantile Analysis for GWAS
Ying Wei1, Xiaoyu Song1, Mengling Liu2 and Iuliana Ionita-Laza1; 1Columbia University, 2New York University

11:40 AM Floor Discussion

Session 7: Statistical Inference of Complex Associations in High-Dimensional Data (Invited)
Room: Salon I, Lower Level 1
Organizer: Jun Liu, Harvard University
Chair: Di Wu, Harvard University

10:00 AM Leveraging for Big Data Regression
Ping Ma, University of Georgia

10:25 AM Reference-free Metagenomics Analysis Using Matrix Factorization
Wenxuan Zhong and Xin Xing, University of Georgia

10:50 AM Big Data, Big Models, Big Problems: Statistical Principles and Practice at Scale
Alexander W. Blocker, Google

11:15 AM Floor Discussion

Session 8: Recent Developments in Survival Analysis (Invited)
Room: Eugene Room, Lower Level 1
Organizer: Qingxia (Cindy) Chen, Vanderbilt University
Chair: Qingxia (Cindy) Chen, Vanderbilt University

10:00 AM Bayesian Joint Modeling of Multi-dimensional Longitudinal and Survival Data with Applications to Cancer Clinical Trials
Ming-Hui Chen1, Danjie Zhang1, Joseph G. Ibrahim2, Mark E. Boye3 and Wei Shen3; 1University of Connecticut, 2University of North Carolina, 3Eli Lilly and Company

10:25 AM Estimating Risk with Time-to-Event Data: An Application to the Women's Health Initiative
Dandan Liu1, Yingye Zheng2, Ross Prentice2 and Li Hsu2; 1Vanderbilt University, 2Fred Hutchinson Cancer Research Center

10:50 AM Efficient Estimation of Nonparametric Genetic Risk Function with Censored Data
Yuanjia Wang1, Baosheng Liang2 and Donglin Zeng3; 1Columbia University, 2Beijing Normal University, 3University of North Carolina at Chapel Hill

11:15 AM Support Vector Hazard Regression for Predicting Event Times Subject to Censoring
Xiaoxi Liu1, Yuanjia Wang2 and Donglin Zeng1; 1University of North Carolina, 2Columbia University

11:40 AM Floor Discussion

Session 9: Industry Practice and Regulatory Pathway for Benefit-Risk Assessment of Medicinal Products (Invited)
Room: Portland Room, Lower Level 1
Organizers: Shihua Wen, AbbVie Inc; Yijie Zhou, Merck & Co
Chair: Yijie Zhou, Merck & Co

10:00 AM Visual Communication and Assessment of Benefit-Risk for Medical Products
Jonathan D. Norton, MedImmune

10:25 AM Some Thoughts on Evaluation of Uncertainty in Benefit-Risk Assessment
Qi Jiang1, Haijun Ma1, Christy Chuang-Stein2, Weili He3, George Quartey4, John Scott5 and Shihua Wen6; 1Amgen Inc, 2Pfizer Inc, 3Merck & Co, 4Hoffmann-La Roche, 5United States Food and Drug Administration, 6AbbVie Inc

10:50 AM Current Concept of Benefit Risk Assessment of Medicine
Syed S. Islam, AbbVie Inc

11:15 AM Discussant: Yang Bo, AbbVie Inc

11:40 AM Floor Discussion


Session 10: Analysis of Observational Studies and Clinical Trials (Contributed)
Room: Salem Room, Lower Level 1
Chair: Naitee Ting, Boehringer-Ingelheim Company

10:00 AM Impact of Tuberculosis on Mortality Among HIV-Infected Patients Receiving Antiretroviral Therapy in Uganda: A Case Study in Propensity Score Analysis
Rong Chu1, Edward J. Mills2, Joseph Beyene3, Eleanor Pullenayegum4, Celestin Bakanda5, Jean B. Nachega6 and Lehana Thabane3; 1Agensys Inc (Astellas), 2University of Ottawa/McMaster University, 3McMaster University, 4McMaster University/University of Toronto, 5The AIDS Support Organization, 6Stellenbosch University

10:20 AM Ecological Momentary Assessment Methods to Increase Response and Adjust for Attrition in a Study of Middle School Students' Exposure to Alcohol Advertising
Steven Martino, Rebecca Collins, Stephanie Kovalchik, Kirsten Becker, Elizabeth D'Amico, William Shadel and Marc Elliott, RAND Corporation

10:40 AM Is Poor Antisaccade Performance in Healthy First-Degree Relatives of Schizophrenics an Artifact of Study Design?
Charity J. Morgan1, Mark F. Lenzenweger2 and Deborah L. Levy3; 1University of Alabama at Birmingham, 2State University of New York at Binghamton, 3McLean Hospital

11:00 AM Analysis of a Vaccine Study in Animals Using Mitigated Fraction in SAS
Mathew Rosales, Experis

11:20 AM Competing Risks Survival Analysis for Efficacy Evaluation of Some-or-None Vaccines
Paul T. Edlefsen, Fred Hutchinson Cancer Research Center

11:40 AM Using Historical Data to Automatically Identify Air-Traffic Controller Behavior
Yuefeng Wu, University of Missouri at St. Louis

12:00 PM Floor Discussion

Monday, June 16, 1:30 PM - 3:10 PM

Session 11: Lifetime Data Analysis (Invited)
Room: Salon A, Lower Level 1
Organizer: Mei-Ling Ting Lee, University of Maryland
Chair: Mei-Ling Ting Lee, University of Maryland

1:30 PM Analysis of Multiple Type Recurrent Events When Only Partial Information Is Available for Some Subjects
Min Zhan and Jeffery Fink, University of Maryland

1:55 PM Cumulative Incidence Function under Two-Stage Randomization
Idil Yavuz1, Yu Cheng2 and Abdus Wahed2; 1Dokuz Eylul University, 2University of Pittsburgh

2:20 PM Nonparametric Threshold Selection with Censored Survival Data
Xinhua Liu and Zhezhen Jin, Columbia University

2:45 PM Floor Discussion

Session 12: Safety Signal Detection and Safety Analysis (Invited)
Room: Salon B, Lower Level 1
Organizer: Qi Jiang, Amgen Inc
Chair: Qi Jiang, Amgen Inc

1:30 PM Evaluation of Statistical Methods for the Identification of Potential Safety Signals
Maggie Chen, Li Zhu, Padmaja Chiruvolu, Liying Zhang and Qi Jiang, Amgen Inc

1:55 PM Application of a Bayesian Method for Blinded Safety Monitoring and Signal Detection in Clinical Trials
Shihua Wen, Jyotirmoy Dey, Greg Ball and Karolyn Kracht, AbbVie Inc

2:20 PM Some Thoughts on the Choice of Metrics for Safety Evaluation
Steven Snapinn, Amgen Inc

2:45 PM Hypothesis Testing on Safety Data: A Recurrent Event Approach
Qi Gong1 and Liang Fang2; 1Amgen Inc, 2Gilead Sciences

3:10 PM Floor Discussion

Session 13: Survival and Recurrent Event Data Analysis (Invited)
Room: Salon C, Lower Level 1
Organizer: Chiung-Yu Huang, Johns Hopkins University
Chair: Chiung-Yu Huang, Johns Hopkins University

1:30 PM Survival Analysis without Survival Data
Gary Chan, University of Washington

1:55 PM Semiparametric Estimation for the Additive Hazards Model with Left-Truncated and Right-Censored Data
Chiung-Yu Huang1 and Jing Qin2; 1Johns Hopkins University, 2National Institute of Allergy and Infectious Diseases

2:20 PM Nonparametric Method for Data of Recurrent Infections after Hematopoietic Cell Transplantation
Chi Hyun Lee1, Xianghua Luo1, Chiung-Yu Huang2 and Todd DeFor1; 1University of Minnesota, 2Johns Hopkins University

2:45 PM Floor Discussion

Session 14: Statistical Analysis on Massive Data from Point Processes (Invited)
Room: Salon D, Lower Level 1
Organizer: Haonan Wang, Colorado State University
Chair: Chunming Zhang, University of Wisconsin-Madison

1:30 PM Identification of Synaptic Learning Rule from Ensemble Spiking Activities
Dong Song and Theodore W. Berger, University of Southern California

1:55 PM Intrinsically Weighted Means and Non-Ergodic Marked Point Processes
Alexander Malinowski1, Martin Schlather1 and Zhengjun Zhang2; 1University of Mannheim, 2University of Wisconsin

2:20 PM Statistical Analysis for Unlabeled Data Objects
Ela Sienkiewicz and Haonan Wang, Colorado State University


2:45 PM Floor Discussion

Session 15: High Dimensional Inference (or Testing) (Invited)
Room: Salon G, Lower Level 1
Organizer: Pengsheng Ji, University of Georgia
Chair: Pengsheng Ji, University of Georgia

1:30 PM Adaptive Sparse Reduced-rank Regression
Zongming Ma and Tingni Sun, University of Pennsylvania

1:55 PM Variable Screening in Biothreat Detection Using Weighted Leverage Score
Wenxuan Zhong and Yiwen Liu, University of Georgia

2:20 PM Testing High-Dimensional Nonparametric Function with Application to Gene Set Analysis
Tao He, Ping-Shou Zhong, Yuehua Cui and Vidyadhar Mandrekar, Michigan State University

2:45 PM Zero-Inflation in Clustered Binary Response Data: Mixed Model and Estimating Equation Approaches
Danping Liu, National Institutes of Health

3:10 PM Floor Discussion

Session 16: Phase II Clinical Trial Design with Survival Endpoint (Invited)
Room: Salon H, Lower Level 1
Organizer: Jianrong Wu, St. Jude Children's Research Hospital
Chair: Joan Hu, Simon Fraser University

1:30 PM Utility-Based Optimization of Schedule-Dose Regimes Based on the Times to Response and Toxicity
Peter F. Thall1, Hoang Q. Nguyen1, Thomas Braun2 and Muzaffar Qazilbash1; 1University of Texas MD Anderson Cancer Center, 2University of Michigan

1:55 PM Bayesian Decision Theoretic Two-Stage Design in Phase II Clinical Trials with Survival Endpoint
Lili Zhao and Jeremy Taylor, University of Michigan

2:20 PM Single-Arm Phase II Group Sequential Trial Design with Survival Endpoint at a Fixed Time Point
Jianrong Wu and Xiaoping Xiong, St. Jude Children's Research Hospital

2:45 PM Floor Discussion

Session 17 Statistical Modeling of High-throughput Genomics Data (Invited)
Room: Salon I, Lower Level 1
Organizer: Mingyao Li, University of Pennsylvania School of Medicine
Chair: Mingyao Li, University of Pennsylvania

130 PM Learning Genetic Architecture of Complex Traits Across Populations
Marc Coram Sophie Candille and Hua Tang Stanford University

155 PM A Bayesian Hierarchical Model to Detect Differentially Methylated Loci from Single Nucleotide Resolution Sequencing Data
Hao Feng Karen Conneely and Hao Wu Emory University

220 PM Differential Isoform Expression Analysis in RNA-Seq using Random-Effects Meta-Regression
Weihua Guan1 Rui Xiao2 Chun Li3 and Mingyao Li2 1University of Minnesota 2University of Pennsylvania 3Vanderbilt University

245 PM Allele-Specific Differential Methylation Analysis with Next Generation Methylation Sequencing Data
Fei Zou University of North Carolina at Chapel Hill

310 PM Floor Discussion

Session 18 Statistical Applications in Finance (Invited)
Room: Portland Room, Lower Level 1
Organizer: Zheng Su, Deerfield Company
Chair: Zheng Su, Deerfield Company

130 PM A Stochastic Mixture Model for Economic Cycles
Haipeng Xing1 and Ning Sun2 1State University of New York 2IBM

155 PM Statistical Modelling of Bidding Prices in Online Ad Position Auctions
Xiaoming Huo Georgia Institute of Technology

220 PM Regression with Rank Covariates: A Distribution Guided Scores for Ranks
Do Hwan Park1 Yuneung Kim2 Johan Lim3 Sujung Choi4 and Hsun-Chih Kuo5 1University of Maryland 2Seoul National Univ 3Auburn University 4Ulsan National Institute of Science and Technology 5National Chengchi University

245 PM Optimal Sparse Volatility Matrix Estimation for High Dimensional Ito Processes with Measurement Errors
Minjing Tao1 Yazhen Wang2 and Harrison Zhou3 1Florida State University 2University of Wisconsin-Madison 3Yale University

310 PM Floor Discussion

Session 19 Hypothesis Testing (Contributed)
Room: Eugene Room, Lower Level 1
Chair: Fei Tan, Indiana University-Purdue University

130 PM A Score-type Test for Heterogeneity in Zero-inflated Models in a Stratified Population
Guanqun Cao1 Wei-Wen Hsu2 and David Todem3 1Auburn University 2Kansas State University 3Michigan State University

150 PM Inferences on Correlation Coefficients of Bivariate Log-normal Distributions
Guoyi Zhang1 and Zhongxue Chen2 1University of New Mexico 2Indiana University

210 PM Testing Calibration of Risk Models at Extremes of Disease Risk
Minsun Song1 Peter Kraft2 Amit D Joshi2 Myrto Barrdahl3 and Nilanjan Chatterjee1 1National Cancer Institute 2Harvard University 3German Cancer Research Center

230 PM Statistical Issues When Incidence Rates Extremely Low And Sample Sizes Very Big
Peter Hu and Haijun Ma Amgen Inc

250 PM Minimum Distance Regression Model Checking When Responses are Missing at Random
Xiaoyu Li Auburn University

310 PM Floor Discussion

Session 20 Design and Analysis of Clinical Trials (Contributed)
Room: Salem Room, Lower Level 1
Chair: Amei Amei, University of Nevada at Las Vegas

130 PM Application of Bayesian Approach in Assessing Rare Adverse Events during a Clinical Study
Grace Li Karen Price Haoda Fu and David Manner Eli Lilly and Company

150 PM A Simplified Varying-Stage Adaptive Phase II/III Clinical Trial Design
Gaohong Dong Novartis Pharmaceuticals Corporation

210 PM Improving Multiple Comparison Procedures With Coprimary Endpoints by Generalized Simes Tests
Hua Li1 Willi Maurer1 Werner Brannath2 and Frank Bretz1 1Novartis Pharmaceuticals Corporation 2University of Bremen

230 PM Efficient Design for Cluster Randomized Trials with Binary Outcomes
Sheng Wu Weng Kee Wong and Catherine Crespi University of California at Los Angeles

250 PM Zero Event and Continuity Correction in Meta-Analyses of Rare Events Using Mantel-Haenszel Odds Ratio and Risk Difference
Tianyue Zhou Sanofi-aventis US LLC

310 PM Floor Discussion

Monday June 16 330 PM - 510 PM

Session 21 New Methods for Big Data (Invited)
Room: Salon A, Lower Level 1
Organizer: Yichao Wu, North Carolina State University
Chair: Yichao Wu, North Carolina State University

330 PM Sure Independence Screening for Gaussian Graphical Models
Shikai Luo1 Daniela Witten2 and Rui Song1 1North Carolina State University 2University of Washington

355 PM Case-Specific Random Forests
Ruo Xu1 Dan Nettleton2 and Daniel J Nordman2 1Google 2Iowa State University

420 PM Uncertainty Quantification for Massive Data Problems using Generalized Fiducial Inference
Randy C S Lai1 Jan Hannig2 and Thomas C M Lee1 1University of California at Davis 2University of North Carolina at Chapel Hill

445 PM OEM Algorithm for Big Data
Xiao Nie and Peter Z G Qian University of Wisconsin-Madison

510 PM Floor Discussion

Session 22 New Statistical Methods for Analysis of High Dimensional Genomic Data (Invited)
Room: Salon B, Lower Level 1
Organizer: Michael C Wu, Fred Hutchinson Cancer Research Center
Chair: Michael C Wu, Fred Hutchinson Cancer Research Center

330 PM Integrative Modeling of Multi-Platform Genomic Data under the Framework of Mediation Analysis
Yen-Tsung Huang Brown University

355 PM Estimation of High Dimensional Directed Acyclic Graphs using eQTL data
Wei Sun1 and Min Jin Ha2 1University of North Carolina at Chapel Hill 2University of Texas MD Anderson Cancer Center

420 PM Prioritizing Disease Genes from Genome-wide Association Studies Through Dynamic Co-expression Networks
Lin Hou1 Min Chen2 Clarence Zhang3 Judy Cho4 and Hongyu Zhao1 1Yale University 2University of Texas at Dallas 3Bristol-Myers Squibb 4Mount Sinai Medical Center

445 PM Kernel Machine Methods for Joint Testing and Integrative Analysis of Genome Wide Methylation and Genotyping Studies
Ni Zhao and Michael Wu Fred Hutchinson Cancer Research Center

510 PM Floor Discussion

Session 23 Recent Advances in Analysis of Longitudinal Data with Informative Observation Process (Invited)
Room: Salon C, Lower Level 1
Organizer: Jing Ning, University of Texas MD Anderson Cancer Center
Chair: Weining Shen, The University of Texas MD Anderson Cancer Center

330 PM Joint Modeling of Alternating Recurrent Transition Times
Liang Li University of Texas MD Anderson Cancer Center

355 PM Regression Analysis of Panel Count Data with Informative Observation Times
Yang Li1 Xin He2 Haiying Wang3 and Jianguo Sun4 1University of North Carolina at Charlotte 2University of Maryland 3University of New Hampshire 4University of Missouri at Columbia

420 PM Envelope Linear Mixed Model
Xin Zhang University of Minnesota

445 PM Regression Analysis of Longitudinal Data with Irregular and Informative Observation Times
Yong Chen Jing Ning and Chunyan Cai University of Texas Health Science Center at Houston

510 PM Floor Discussion

Session 24 Bayesian Models for High Dimensional Complex Data (Invited)
Room: Salon D, Lower Level 1
Organizer: Juhee Lee, University of California at Santa Cruz
Chair: Juhee Lee, University of California at Santa Cruz

330 PM A Bayesian Feature Allocation Model for Tumor Heterogeneity
Juhee Lee1 Peter Mueller2 Yuan Ji3 and Kamalakar Gulukota4 1University of California at Santa Cruz 2University of Texas at Austin 3University of Chicago 4NorthShore University HealthSystem

355 PM Some Results on the One-Way ANOVA Model with an Increasing Number of Groups
Feng Liang University of Illinois at Urbana-Champaign

420 PM Bayesian Graphical Models for Differential Pathways
Riten Mitra1 Peter Mueller2 and Yuan Ji3 1University of Louisville 2University of Texas at Austin 3NorthShore University HealthSystem/University of Chicago

445 PM Latent Space Models for Dynamic Networks
Yuguo Chen University of Illinois at Urbana-Champaign

510 PM Floor Discussion

Session 25 Statistical Methods for Network Analysis (Invited)
Room: Salon G, Lower Level 1
Organizer: Yunpeng Zhao, George Mason University
Chair: Yunpeng Zhao, George Mason University

330 PM Consistency of Co-clustering for Exchangeable Graph and Array Data
David S Choi1 and Patrick J Wolfe2 1Carnegie Mellon University 2University College London

355 PM Laplacian Shrinkage for Inverse Covariance Estimation from Heterogeneous Populations
Takumi Saegusa and Ali Shojaie University of Washington

420 PM Estimating Signature Subgraphs in Samples of Labeled Graphs
Juhee Cho and Karl Rohe University of Wisconsin-Madison

445 PM Fast Hierarchical Modeling for Recommender Systems
Patrick Perry New York University

510 PM Floor Discussion

Session 26 New Analysis Methods for Understanding Complex Diseases and Biology (Invited)
Room: Salon H, Lower Level 1
Organizer: Wenyi Wang, University of Texas MD Anderson Cancer Center
Chair: Wenyi Wang, University of Texas MD Anderson Cancer Center

330 PM Data-Integration for Identifying Clinically Important Long Non-coding RNA in Cancer
Yiwen Chen1 Zhou Du2 Teng Fei1 Roel GW Verhaak3 Yong Zhang2 Myles Brown4 and X Shirley Liu4 1Dana Farber Cancer Institute 2Tongji University 3University of Texas MD Anderson Cancer Center 4Dana Farber Cancer Institute & Harvard University

355 PM Data Integration for Genetics-Based Drug Repurposing in Complex Diseases
Di Wu Harvard University

430 PM Comparative Meta-Analysis of Prognostic Gene Signatures for Late-Stage Ovarian Cancer
Levi Waldron Hunter College

445 PM Studying Spatial Organizations of Chromosomes via Parametric Model
Ming Hu1 Yu Zhu2 Zhaohui Steve Qin3 Ke Deng4 and Jun S Liu5 1New York University 2Purdue University 3Emory University 4Tsinghua University 5Harvard University

510 PM Floor Discussion

Session 27 Recent Advances in Time Series Analysis (Invited)
Room: Salon I, Lower Level 1
Organizer: Mikyoung Jun, Texas A&M University
Chair: Zhengjun Zhang, University of Wisconsin

330 PM Time Series Models for Spherical Data with Applications in Structural Biochemistry
Jay Breidt Daniel Hernandez-Stumpfhauser and Mark van der Woerd Colorado State University

355 PM Semiparametric Estimation of Spectral Density Function with Irregular Data
Shu Yang and Zhengyuan Zhu Iowa State University

420 PM On the Prediction of Stationary Functional Time Series
Alexander Aue1 Diogo Dubart Norinho2 and Siegfried Hormann3 1University of California at Davis 2University College London 3Université Libre de Bruxelles

445 PM A Composite Likelihood-based Approach for Multiple Change-point Estimation in Multivariate Time Series Models
Chun Yip Yau and Ting Fung Ma Chinese University of Hong Kong

510 PM Floor Discussion

Session 28 Analysis of Correlated Longitudinal and Survival Data (Invited)
Room: Eugene Room, Lower Level 1
Organizer: Jingjing Wu, University of Calgary
Chair: Jingjing Wu, University of Calgary

330 PM Analysis of a Non-Randomized Longitudinal Quality of Life Trial
Mounir Mesbah University of Paris 6

355 PM Power and Sample Size Calculations for Evaluating Mediation Effects with Multiple Mediators in Longitudinal Studies
Cuiling Wang Albert Einstein College of Medicine

420 PM Distribution-free First-hitting-time Based Threshold Regressions for Lifetime Data
Mei-Ling Ting Lee1 and G Alex Whitmore2 1University of Maryland 2McGill University

445 PM Joint Modeling of Survival Data and Mismeasured Longitudinal Data using the Proportional Odds Model
Juan Xiong1 Wenqing He1 and Grace Yi2 1University of Western Ontario 2University of Waterloo

510 PM Floor Discussion

Session 29 Clinical Pharmacology (Invited)
Room: Portland Room, Lower Level 1
Organizer: Christine Wang, Amgen
Chair: Christine Wang, Amgen

330 PM Truly Personalizing Medicine
Mike D Hale Amgen Inc

355 PM What Do Statisticians Do in Clinical Pharmacology?
Brian Smith Amgen Inc

420 PM The Use of Modeling and Simulation to Bridge Different Dosing Regimens - a Case Study
Chyi-Hung Hsu and Jose Pinheiro Janssen Research & Development

445 PM A Comparison of FDA and EMA Recommended Models for Bioequivalence Studies
Yongwu Shao Lingling Han Bing Gao Sally Zhao Susan Guo Lijie Zhong and Liang Fang Gilead Sciences

510 PM Floor Discussion

Session 30 Sample Size Estimation (Contributed)
Room: Salem Room, Lower Level 1
Chair: Antai Wang, New Jersey Institute of Technology

330 PM Sample Size Calculation with Semiparametric Analysis of Long Term and Short Term Hazards
Yi Wang Novartis Pharmaceuticals Corporation

350 PM Sample Size and Decision Criteria for Phase IIB Studies with Active Control
Xia Xu Merck & Co

410 PM Sample Size Determination for Clinical Trials to Correlate Outcomes with Potential Predictors
Su Chen Xin Wang and Ying Zhang AbbVie Inc

430 PM Sample Size Re-Estimation at Interim Analysis in Oncology Trials with a Time-to-Event Endpoint
Ian (Yi) Zhang Sunovion Pharmaceuticals Inc

450 PM Statistical Inference and Sample Size Calculation for Paired Binary Outcomes with Missing Data
Song Zhang University of Texas Southwestern Medical Center

510 PM Floor Discussion

Tuesday June 17 820 AM - 930 AM

Keynote session II (Keynote)
Room: Salon E-F, Lower Level 1
Organizers: ICSA-KISS 2014 organizing committee
Chair: Rochelle Fu, Oregon Health & Science University

820 AM Keynote lecture II
Sharon-Lise Normand Harvard University

930 AM Floor Discussion

Tuesday June 17 1000 AM - 1200 PM

Session 31 Predictions in Clinical Trials (Invited)
Room: Salon A, Lower Level 1
Organizer: Yimei Li, University of Pennsylvania
Chair: Daniel Heitjan, University of Pennsylvania

1000 AM Predicting Smoking Cessation Outcomes Beyond Clinical Trials
Yimei Li E Paul Wileyto and Daniel F Heitjan University of Pennsylvania

1025 AM Bayesian Event And Time Landmark Estimation In Clinical Trials When Responses Are Failure Time Data
Haoda Fu Luping Zhao and Yanping Wang Eli Lilly and Company

1050 AM Predicting the Probability of Future Clinical Study Success Based on the Evidence from Electronic Medical Record (EMR) Data
Haoda Fu1 and Nan Jia2 1Eli Lilly and Company 2University of Southern California

1115 AM Weibull Cure-Mixture Model for the Prediction of Event Times in Randomized Clinical Trials
Gui-shuang Ying1 Qiang Zhang2 Yimei Li1 and Daniel F Heitjan1 1University of Pennsylvania 2Radiation Therapy Oncology Group Statistical Center

1140 AM Floor Discussion

Session 32 Recent Advances in Statistical Genetics (Invited)
Room: Salon B, Lower Level 1
Organizer: Taesung Park, Seoul National University
Chair: Taesung Park, Seoul National University

1000 AM Longitudinal Exome-Focused GWAS of Alcohol Use in a Veteran Cohort
Zuoheng Wang Zhong Wang Amy C Justice and Ke Xu Yale University

1025 AM Type I Error in Regression-based Genetic Model Building
Heejong Sung1 Alexa JM Sorant1 Bhoom Suktitipat2 and Alexander F Wilson1 1National Institutes of Health 2Mahidol University

1050 AM GMDR: A Conceptual Framework for Detection of Multifactor Interactions Underlying Complex Traits
Xiang-Yang Lou University of Alabama at Birmingham

1115 AM Gene-Gene Interaction Analysis for Rare Variants: Application to T2D Exome Sequencing Data
Taesung Park1 Min-Seok Kwon1 and Seung Yeoun Lee2 1Seoul National University 2Sejong University

1140 AM Floor Discussion

Session 33 Structured Approach to High Dimensional Data with Sparsity and Low Rank Factorization (Invited)
Room: Salon C, Lower Level 1
Organizer: Yoonkyung Lee, Ohio State University
Chair: Yoonkyung Lee, Ohio State University

1000 AM Two-way Regularized Matrix Decomposition
Jianhua Huang Texas A&M University

1025 AM Tensor Regression with Applications in Neuroimaging Analysis
Hua Zhou1 Lexin Li1 and Hongtu Zhu2 1North Carolina State University 2University of North Carolina at Chapel Hill

1050 AM RKHS-Embedding Based Feature Screening for High-Dimensional Data
Krishnakumar Balasubramanian1 Bharath Sriperumbudur2 and Guy Lebanon1 1Georgia Institute of Technology 2Pennsylvania State University

1115 AM Sparse Conditional Graphical Models for Structured Genetic Datasets
Hyonho Chun Purdue University

1140 AM Floor Discussion

Session 34 Recent Developments in Dimension Reduction, Variable Selection and Their Applications (Invited)
Room: Salon D, Lower Level 1
Organizer: Xiangrong Yin, University of Georgia
Chair: Pengsheng Ji, University of Georgia

1000 AM Variable Selection and Model Estimation via Subtle Uprooting
Xiaogang Su University of Texas at El Paso

1025 AM Robust Variable Selection Through Dimension Reduction
Qin Wang Virginia Commonwealth University

1050 AM Sparse Envelope Model: Efficient Estimation and Response Variable Selection in Multivariate Linear Regression
Zhihua Su1 Guangyu Zhu1 and Xin Chen2 1University of Florida 2National University of Singapore

1115 AM Floor Discussion

Session 35 Post-Discontinuation Treatment in Randomized Clinical Trials (Invited)
Room: Salon G, Lower Level 1
Organizer: Li Li, Research Scientist, Eli Lilly and Company
Chair: Li Li, Eli Lilly and Company

1000 AM Marginal Structure Model with Adaptive Truncation in Estimating the Initial Treatment Effect with Informative Censoring by Subsequent Therapy
Jingyi Liu1 Li Li1 Xiaofei Bai2 and Douglas Faries1 1Eli Lilly and Company 2North Carolina State University

1025 AM Quantile Regression Adjusting for Dependent Censoring from Semi-Competing Risks
Ruosha Li1 and Limin Peng2 1University of Pittsburgh 2Emory University

1050 AM Overview of Crossover Design
Ming Zhu AbbVie Inc

1115 AM Cross-Payer Effects of Medicaid LTSS on Medicare Resource Use using Propensity Score Risk Profiling
Yi Huang Anthony Tucker and Karen Johnson University of Maryland

1140 AM Floor Discussion

Session 36 New Advances in Semi-parametric Modeling and Survival Analysis (Invited)
Room: Salon H, Lower Level 1
Organizer: Yichuan Zhao, Georgia State University
Chair: Xuelin Huang, University of Texas MD Anderson Cancer Center

1000 AM Bayesian Partial Linear Model for Skewed Longitudinal Data
Yuanyuan Tang1 Debajyoti Sinha2 Debdeep Pati2 Stuart Lipsitz3 and Steven Lipshultz4 1AbbVie Inc 2Florida State University 3Brigham and Women's Hospital 4University of Miami

1025 AM Nonparametric Inference for Inverse Probability Weighted Estimators with a Randomly Truncated Sample
Xu Zhang University of Mississippi

1050 AM Modeling Time-Varying Effects for High-Dimensional Covariates: A New Gateaux-Differential Boosting Approach
Kevin He Yi Li and Ji Zhu University of Michigan

1115 AM Flexible Modeling of Survival Data with Covariates Subject to Detection Limits via Multiple Imputation
Paul Bernhardt1 Judy Wang2 and Daowen Zhang2 1Villanova University 2North Carolina State University

1140 AM Floor Discussion

Session 37 High-dimensional Data Analysis: Theory and Application (Invited)
Room: Salon I, Lower Level 1
Organizer: Yang Feng, Columbia University
Chair: Yang Feng, Columbia University

1000 AM Structured Functional Additive Regression in Reproducing Kernel Hilbert Spaces
Hao Helen Zhang University of Arizona

1025 AM High-Dimensional Thresholded Regression and Shrinkage Effect
Zemin Zheng Yingying Fan and Jinchi Lv University of Southern California

1050 AM Local Independence Feature Screening for Nonparametric and Semiparametric Models by Marginal Empirical Likelihood
Jinyuan Chang1 Cheng Yong Tang2 and Yichao Wu3 1University of Melbourne 2University of Colorado Denver 3North Carolina State University

1115 AM The Fused Kolmogorov Filter: A Nonparametric Model-Free Screening Method
Qing Mai1 and Hui Zou2 1Florida State University 2University of Minnesota

1140 AM Floor Discussion

Session 38 Leading Across Boundaries: Leadership Development for Statisticians (Invited Discussion Panel)
Room: Eugene Room, Lower Level 1
Organizers: Ming-Dauh Wang, Eli Lilly and Company; Rochelle Fu, Oregon Health & Science University (fu@ohsu.edu)
Chair: Ming-Dauh Wang, Eli Lilly and Company

Topic: The panel will discuss issues related to the importance of leadership, barriers to leadership, overcoming barriers, communication, and sociability.

Panel: Xiao-Li Meng Harvard University

Dipak Dey University of Connecticut

Soonmin Park Eli Lilly and Company

James Hung United States Food and Drug Administration

Walter Offen AbbVie Inc

Session 39 Recent Advances in Adaptive Designs in Early Phase Trials (Invited)
Room: Portland Room, Lower Level 1
Organizer: Ken Cheung, Columbia University
Chair: Ken Cheung, Columbia University

1000 AM A Toxicity-Adaptive Isotonic Design for Combination Therapy in Oncology
Rui Qin Mayo Clinic

1025 AM Calibration of the Likelihood Continual Reassessment Method for Phase I Clinical Trials
Shing Lee1 Xiaoyu Jia2 and Ying Kuen Cheung1 1Columbia University 2Boehringer Ingelheim Pharmaceuticals

1050 AM Sequential Subset Selection Procedure of Random Subset Size for Early Phase Clinical Trials
Cheng-Shiun Leu and Bruce Levin Columbia University

1115 AM Search Procedures for the MTD in Phase I Trials
Shelemyahu Zacks Binghamton University

1140 AM Floor Discussion

Session 40 High Dimensional Regression/Machine Learning (Contributed)
Room: Salem Room, Lower Level 1
Chair: Hanxiang Peng, Indiana University-Purdue University

1000 AM Variable Selection for High-Dimensional Nonparametric Ordinary Differential Equation Models With Applications to Dynamic Gene Regulatory Networks
Hongqi Xue1 Tao Lu2 Hua Liang3 and Hulin Wu1 1University of Rochester 2State University of New York at Albany 3George Washington University

1020 AM BigData: Sign Cauchy Projections and Chi-Square Kernels
Ping Li1 Gennady Samorodnitsky2 and John Hopcroft2 1Rutgers University 2Cornell University

1040 AM A Sparse Linear Discriminant Analysis Method with Asymptotic Optimality for Multiclass Classification
Ruiyan Luo and Xin Qi Georgia State University

1100 AM Generalized Hidden Markov Model for Variant Detection
Yichen Cheng James Dai and Charles Kooperberg Fred Hutchinson Cancer Research Center

1120 AM Large-Scale Joint Trait Risk Prediction for Mini-exome Sequence Data
Gengxin Li Wright State University

1140 AM Rank Estimation and Recovery of Low-rank Matrices For Factor Model with Heteroscedastic Noise
Jingshu Wang and Art B Owen Stanford University

1200 PM Floor Discussion

Tuesday June 17 130 PM - 310 PM

Session 41 Distributional Inference and its Impact on Statistical Theory and Practice (Invited)
Room: Salon A, Lower Level 1
Organizers: Min-ge Xie, Rutgers University; Thomas Lee, University of California at Davis (thomascmlee@gmail.com)
Chair: Min-ge Xie, Rutgers University

130 PM Stat Wars Episode IV: A New Hope (For Objective Inference)
Keli Liu and Xiao-Li Meng Harvard University

155 PM Higher Order Asymptotics for Generalized Fiducial Inference
Abhishek Pal Majumdar and Jan Hannig University of North Carolina at Chapel Hill

220 PM Generalized Inferential ModelsRyan Martin University of Illinois at Chicago

245 PM Formal Definition of Reference Priors under a General Classof DivergenceDongchu Sun University of Missouri

310 PM Floor Discussion

Session 42 Applications of Spatial Modeling and Imaging Data (Invited)
Room: Salon B, Lower Level 1
Organizer: Karen Kafadar, Indiana University
Chair: Karen Kafadar, Indiana University

130 PM Spatial Bayesian Variable Selection and Shrinkage in High-dimensional Covariate Spaces with Application to fMRI
Fan Li1 Tingting Zhang (Co-first author)2 Quanli Wang1 and James Coan2 1Duke University 2University of Virginia

155 PM A Hierarchical Model for Simultaneous Detection and Estimation in Multi-Subject fMRI Studies
David Degras1 and Martin Lindquist2 1DePaul University 2Johns Hopkins University

220 PM On the Relevance of Accounting for Spatial Correlation: A Case Study from Florida
Linda J Young1 and Emily Leary2 1USDA NASS RDD 2University of Florida

245 PM Statistical Approaches for Calibration of Climate Models
Gabriel Huerta1 Charles Jackson2 and Alvaro Nosedal1 1University of New Mexico 2University of Texas at Austin

310 PM Floor Discussion

Session 43 Recent Development in Survival Analysis and Statistical Genetics (Invited)
Room: Salon C, Lower Level 1
Organizers: Junlong Li, Harvard University; KyuHa Lee, Harvard University
Chair: Junlong Li, Harvard University

130 PM Restricted Survival Time and Non-proportional HazardsZhigang Zhang Memorial Sloan Kettering Cancer Center

155 PM Empirical Null using Mixture Distributions and Its Application in Local False Discovery Rate
DoHwan Park University of Maryland

220 PM A Bayesian Illness-Death Model for the Analysis of Correlated Semi-Competing Risks Data
Kyu Ha Lee1 Sebastien Haneuse1 Deborah Schrag2 and Francesca Dominici1 1Harvard University 2Dana Farber Cancer Institute

245 PM Detection of Chromosome Copy Number Variations in Multiple Sequences
Xiaoyi Min Chi Song and Heping Zhang Yale University

310 PM Floor Discussion

Session 44 Bayesian Methods and Applications in Clinical Trials with Small Population (Invited)
Room: Salon D, Lower Level 1
Organizer: Alan Chiang, Eli Lilly and Company
Chair: Ming-Dauh Wang, Eli Lilly and Company

130 PM Applications of Bayesian Meta-Analytic Approach at Novartis
Qiuling Ally He Roland Fisch and David Ohlssen Novartis Pharmaceuticals Corporation

155 PM Subgroup-Based Adaptive (SUBA) Designs for Multi-Arm Biomarker Trials
Yanxun Xu1 Lorenzo Trippa2 Peter Mueller1 and Yuan Ji3 1University of Texas at Austin 2Harvard University 3University of Texas at Austin

220 PM Innovative Designs and Practical Considerations for Pediatric Studies
Alan Y Chiang Eli Lilly and Company

245 PM Discussant Ming-Dauh Wang Eli Lilly and Company

310 PM Floor Discussion

Session 45 Recent Developments in Assessing Predictive Models in Survival Analysis (Invited)
Room: Salon G, Lower Level 1
Organizer: Ming Wang, Penn State College of Medicine
Chair: Lijun Zhang, Penn State College of Medicine

130 PM partDSA for Deriving Survival Risk Groups: Ensemble Learning and Variable Selection
Annette Molinaro1 Adam Olshen1 and Robert Strawderman2 1University of California at San Francisco 2University of Rochester

155 PM Predictive Accuracy of Time-Dependent Markers for Survival Outcomes
Li Chen1 Donglin Zeng2 and Danyu Lin2 1University of Kentucky 2University of North Carolina at Chapel Hill

220 PM Estimating the Effectiveness in HIV Prevention Trials by Incorporating the Exposure Process: Application to HPTN 035 Data
Jingyang Zhang1 and Elizabeth R Brown2 1Fred Hutchinson Cancer Research Center 2Fred Hutchinson Cancer Research Center/University of Washington

245 PM Estimation of Predictive Accuracy of Survival Regression Models Adjusting for Dependent Censoring and/or High-Dimensional Data
Ming Wang1 and Qi Long2 1Penn State College of Medicine 2Emory University

310 PM Floor Discussion

Session 46 Missing Data: the Interface between Survey Sampling and Biostatistics (Invited)
Room: Salon H, Lower Level 1
Organizer: Jiwei Zhao, University of Waterloo
Chair: Peisong Han, University of Waterloo

130 PM Likelihood-based Inference with Missing Data Under Missing-at-random
Shu Yang and Jae Kwang Kim Iowa State University

155 PM Generalized Method of Moments Estimator Based On Semiparametric Quantile Regression Imputation
Cindy Yu and Senniang Chen Iowa State University

220 PM A New Estimation with Minimum Trace of Asymptotic Covariance Matrix for Incomplete Longitudinal Data with a Surrogate Process
Baojiang Chen1 and Jing Qin2 1University of Nebraska 2National Institutes of Health

245 PM Adaptive Multi-Phase Sampling for Asymptotically-Optimal Mean Score Analyses
Michael McIsaac1 and Richard Cook2 1Queen's University 2University of Waterloo

310 PM Floor Discussion

Session 47 New Statistical Methods for Comparative Effectiveness Research and Personalized Medicine (Invited)
Room: Salon I, Lower Level 1
Organizer: Jane Paik Kim, Stanford University
Chair: Jane Paik Kim, Stanford University

130 PM Efficient Design for Prospective Observational Studies
Yu Shen1 Hao Liu2 Jing Ning3 and Jing Qin4 1University of Texas MD Anderson Cancer Center 2Baylor College of Medicine 3University of Texas MD Anderson Cancer Center 4National Institutes of Health

155 PM Choice between Superiority and Non-inferiority in Comparative Effectiveness Experiments
Mei-Chiung Shih1 Tze Leung Lai2 and Philip W Lavori2 1VA Cooperative Studies Program & Stanford University 2Stanford University

220 PM An Adaptive Design Approach for Studying Dynamic Treatment Regimes in a Pragmatic Trials Setting
Mike Baiocchi Jane Paik and Tze Lai Stanford University

245 PM Improving Efficiency in the Estimation of the Effect of Treatment on a Multinomial Outcome
Iván Díaz Michael Rosenblum and Elizabeth Colantuoni Johns Hopkins University

310 PM Floor Discussion

Session 48 Student Award Session 1 (Invited)
Room: Portland Room, Lower Level 1
Organizer: ICSA-KISS 2014 Student Paper Award Committee
Chair: Zhezhen Jin, Columbia University

130 PM Regularization After Retention in Ultrahigh Dimensional Linear Regression Models
Haolei Weng1 Yang Feng1 and Xingye Qiao2 1Columbia University 2Binghamton University

155 PM Personalized Dose Finding Using Outcome Weighted Learning
Guanhua Chen1 Donglin Zeng1 and Michael R Kosorok1 1University of North Carolina at Chapel Hill

220 PM Survival Rates Prediction When Training Data and Target Data Have Different Measurement Error
Cheng Zheng and Yingye Zheng Fred Hutchinson Cancer Research Center

245 PM Hard Thresholded Regression Via Linear Programming
Qiang Sun University of North Carolina at Chapel Hill

310 PM Floor Discussion

Session 49 Network Analysis/Unsupervised Methods (Contributed)
Room: Eugene Room, Lower Level 1
Chair: Chunming Zhang, University of Wisconsin-Madison

130 PM Community Detection in Multilayer Networks: A Hypothesis Testing Approach
James D Wilson Shankar Bhamidi and Andrew B Nobel University of North Carolina at Chapel Hill

150 PM Network Enrichment Analysis with Incomplete Network Information
Jing Ma1 Ali Shojaie2 and George Michailidis1 1University of Michigan 2University of Washington

210 PM Estimation of A Linear Model with Fuzzy Data Treated as Special Functional Data
Wang Dabuxilatu Guangzhou University

230 PM Efficient Estimation of Sparse Directed Acyclic Graphs Under Compounded Poisson Data
Sung Won Han and Hua Zhong New York University

250 PM Asymptotically Normal and Efficient Estimation of Covariate-Adjusted Gaussian Graphical Model
Mengjie Chen Zhao Ren Hongyu Zhao and Harrison Zhou Yale University

310 PM Floor Discussion

Session 50 Personalized Medicine and Adaptive Design (Contributed)
Room: Salem Room, Lower Level 1
Chair: Danping Liu, National Institutes of Health

130 PM MicroRNA Array Normalization
Li-Xuan Qin and Qin Zhou Memorial Sloan Kettering Cancer Center

150 PM Combining Multiple Biomarker Models with Covariates in Logistic Regression Using Modified ARM (Adaptive Regression by Mixing) Approach
Yanping Qiu1 and Rong Liu2 1Merck & Co 2Bayer HealthCare

210 PM A New Association Test for Case-Control GWAS Based on Disease Allele Selection
Zhongxue Chen Indiana University

230 PM On Classification Methods for Personalized Medicine and Individualized Treatment Rules
Daniel Rubin United States Food and Drug Administration

250 PM Bayesian Adaptive Design for Dose-Finding Studies with Delayed Binary Responses
Xiaobi Huang1 and Haoda Fu2 1Merck & Co 2Eli Lilly and Company

310 PM Floor Discussion

Tuesday June 17 330 PM - 530 PM

Session 51 New Development in Functional Data Analysis (Invited)
Room: Salon A, Lower Level 1
Organizer: Guanqun Cao, Auburn University
Chair: Guanqun Cao, Auburn University

330 PM Variable Selection and Estimation for Longitudinal Survey Data
Li Wang1 Suojin Wang2 and Guannan Wang1 1University of Georgia 2Texas A&M University

355 PM Estimation of Nonlinear Differential Equation Model Using Generalized Smoothing
Inna Chervoneva1 Tatiyana V Apanasovich2 and Boris Freydin1 1Thomas Jefferson University 2George Washington University

4:20 PM A Functional Data Approach to Modeling Brain Image Data
Yihong Zhao1, R. Todd Ogden2 and Huaihou Chen1; 1New York University, 2Columbia University

4:45 PM Estimation in Functional Linear Quantile Regression
Linglong Kong, Dengdeng Yu and Ivan Mizera, University of Alberta

5:10 PM Floor Discussion

Session 52: Recent Regulatory/Industry Experience in Biosimilar Trial Designs (Invited)
Room: Salon B, Lower Level 1
Organizer: Gang Li, Johnson & Johnson
Chair: Yi Wang, Novartis Pharmaceuticals Corporation

3:30 PM Statistical Considerations for the Development of Biosimilar Products
Nan Zhang and Eric Chi, Amgen Inc.

3:50 PM New Analytical Methods for Non-Inferiority Trials: Covariate Adjustment and Sensitivity Analysis
Zhiwei Zhang, Lei Nie, Guoxing Soon and Bo Zhang, United States Food and Drug Administration

4:10 PM Where is the Right Balance for Designing an Efficient Biosimilar Clinical Program - A Biostatistic Perspective on Appropriate Applications of Statistical Principles from New Drug to Biosimilars
Yulan Li, Novartis Pharmaceuticals Corporation

2014 ICSA-KISS Applied Statistics Symposium Portland June 15-18 29

Tuesday June 17 330 PM - 530 PM Scientific Program (Presenting Author)

4:30 PM Challenges of Designing/Analyzing Trials for Hepatitis C Drugs
Greg Soon, United States Food and Drug Administration

4:50 PM GSK's Patient-level Data Sharing Program
Shuyen Ho, GlaxoSmithKline plc

5:10 PM Floor Discussion

Session 53: Gatekeeping Procedures and Their Application in Pivotal Clinical Trials (Invited)
Room: Salon C, Lower Level 1
Organizer: Michael Lee, Johnson & Johnson
Chair: Michael Lee, Johnson & Johnson

3:30 PM A General Multistage Procedure for k-out-of-n Gatekeeping
Dong Xi1 and Ajit Tamhane2; 1Novartis Pharmaceuticals Corporation, 2Northwestern University

3:55 PM Multiple Comparisons in Complex Trial Designs
H.M. James Hung, United States Food and Drug Administration

4:20 PM Use of Bootstrapping in Adaptive Designs with Multiplicity Issues
Jeff Maca, Quintiles

4:45 PM Evaluating Commonly Used Multiple Testing Procedures in Drug Development
Michael Lee, Janssen Research & Development

5:10 PM Floor Discussion

Session 54: Approaches to Assessing Qualitative Interactions (Invited)
Room: Salon D, Lower Level 1
Organizer: Guohua (James) Pan, Johnson & Johnson
Chair: James Pan, Johnson & Johnson

3:30 PM Interval Based Graphical Approach to Assessing Qualitative Interaction
Guohua Pan and Eun Young Suh, Johnson & Johnson

3:55 PM Expected Variation and Quantitative and Qualitative Interaction in Clinical Trials
Xiaolong Luo, Celgene Corporation

4:20 PM A Bayesian Approach to Qualitative Interaction
Emine O. Bayman, University of Iowa

4:45 PM Discussant: Surya Mohanty, Johnson & Johnson

5:10 PM Floor Discussion

Session 55: Interim Decision-Making in Phase II Trials (Invited)
Room: Salon G, Lower Level 1
Organizer: Lanju Zhang, AbbVie Inc.
Chair: Lanju Zhang, AbbVie Inc.

3:30 PM Evaluation of Interim Dose Selection Methods Using ROC Approach
Deli Wang, Lu Cui, Lanju Zhang and Bo Yang, AbbVie Inc.

3:55 PM Interim Monitoring for Futility Based on Probability of Success
Yijie Zhou1, Ruji Yao2, Bo Yang1 and Ram Suresh3; 1AbbVie Inc., 2Merck & Co., 3GlaxoSmithKline plc

4:20 PM Bayesian Adaptive Design in Oncology Early Phase Trials
Yuehui Wu and Ramachandran Suresh, GlaxoSmithKline plc

4:45 PM Discussant: Peng Chen, Celgene Corporation
5:10 PM Floor Discussion

Session 56: Recent Advancement in Statistical Methods (Invited)
Room: Salon H, Lower Level 1
Organizer: Dongseok Choi, Oregon Health & Science University
Chair: Dongseok Choi, Oregon Health & Science University

3:30 PM Exact Inference: New Methods and Applications
Ian Dinwoodie, Portland State University

3:55 PM Optimal Thresholds Criteria and Standard Criterion of VUS for ROC Surface
Chong Sun Hong, Sungkyunkwan University

4:20 PM Analysis of Cointegrated Models with Measurement Errors
Sung Ahn1, Hamwoom Hong2 and Sinsup Cho2; 1Washington State University, 2Seoul National University

4:45 PM A Direct Method to Evaluate the Time-Dependent Predictive Accuracy for Biomarkers
Weining Shen, Jing Ning and Ying Yuan, University of Texas MD Anderson Cancer Center

5:10 PM Floor Discussion

Session 57: Building Bridges between Research and Practice in Time Series Analysis (Invited)
Room: Salon I, Lower Level 1
Organizer: Jane Chu, IBM/SPSS
Chair: Jane Chu, IBM/SPSS

3:30 PM Temporal Causal Modeling: Methodology, Applications and Implementation
Naoki Abe1, Tanveer Faruquie1, Huijing Jiang1, Anju Kambadur1, Kimberly Lang1, Aurelie Lozano1 and Jinwoo Shin2; 1IBM, 2KAIST University

3:55 PM Time Series Research at the U.S. Census Bureau
Brian C. Monsell, U.S. Census Bureau

4:20 PM Issues Related to the Use of Time Series in Model Building and Analysis
William W.S. Wei, Temple University

4:45 PM Discussant: George Tiao, University of Chicago
5:10 PM Floor Discussion

Session 58: Recent Advances in Design for Biostatistical Problems (Invited)
Room: Eugene Room, Lower Level 1
Organizer: Weng Kee Wong, University of California at Los Angeles
Chair: Weng Kee Wong, University of California at Los Angeles

3:30 PM Optimal Designs for N-of-1 Trials
Yin Li and Keumhee Chough Carriere, University of Alberta

3:55 PM Efficient Algorithms for Two-stage Designs on Phase II Clinical Trials
Seongho Kim1 and Weng Kee Wong2; 1Wayne State University/Karmanos Cancer Institute, 2University of California at Los Angeles

4:20 PM Optimizing Two-level Supersaturated Designs by Particle Swarm Techniques
Frederick Kin Hing Phoa1, Ray-Bing Chen2, Wei-Chung Wang3 and Weng Kee Wong4; 1Institute of Statistical Science, Academia Sinica, 2National Cheng Kung University, 3National Taiwan University, 4University of California at Los Angeles

4:45 PM D-optimal Designs for Multivariate Exponential and Poisson Regression Models via Ultra-Dimensional Particle Swarm Optimization
Jiaheng Qiu and Weng Kee Wong, University of California at Los Angeles

5:10 PM Floor Discussion

Session 59: Student Award Session 2 (Invited)
Room: Portland Room, Lower Level 1
Organizer: ICSA-KISS 2014 Student Paper Award Committee
Chair: Wenqing He, University of Western Ontario

3:30 PM Analysis of Sequence Data Under Multivariate Trait-Dependent Sampling
Ran Tao1, Donglin Zeng1, Nora Franceschini1, Kari E. North1, Eric Boerwinkle2 and Dan-Yu Lin1; 1University of North Carolina at Chapel Hill, 2University of Texas Health Science Center

3:55 PM Empirical Likelihood Based Tests for Stochastic Ordering Under Right Censorship
Hsin-wen Chang and Ian W. McKeague, Columbia University

4:20 PM Multiple Genetic Loci Mapping for Latent Disease Liability Using a Structural Equation Modeling Approach with Application in Alzheimer's Disease
Ting-Huei Chen, University of North Carolina at Chapel Hill

4:45 PM Floor Discussion

Session 60: Semi-parametric Methods (Contributed)
Room: Salem Room, Lower Level 1
Chair: Ouhong Wang, Amgen Inc.

3:30 PM Semiparametric Estimation of Mean and Variance in Generalized Estimating Equations
Jianxin Pan1 and Daoji Li2; 1The University of Manchester, 2University of Southern California

3:50 PM An Empirical Approach of Efficient Estimation of Linear Functionals of a Probability with Side Information
Hanxiang Peng, Shan Wang and Lingnan Li, Indiana University-Purdue University Indianapolis

4:10 PM M-estimation for General ARMA Processes with Infinite Variance
Rongning Wu, Baruch College, City University of New York

4:30 PM Sufficient Dimension Reduction via Principal Lq Support Vector Machine
Andreas Artemiou1 and Yuexiao Dong2; 1Cardiff University, 2Temple University

4:50 PM Nonparametric Quantile Regression via a New MM Algorithm
Bo Kai1, Mian Huang2, Weixin Yao3 and Yuexiao Dong4; 1College of Charleston, 1National Chengchi University, 2Shanghai University of Finance and Economics, 3Kansas State University, 4Temple University

5:10 PM Regression Estimators Using Stratified Ranked Set Sampling
Arbita Chatterjee, Hani Samawi, Lili Yu, Daniel Linder, Jingxian Cai and Robert Vogel, Georgia Southern University

5:30 PM Floor Discussion

Wednesday, June 18, 8:30 AM - 10:10 AM

Session 61: Statistical Challenges in Variable Selection for Graphical Modeling (Invited)
Room: Salon A, Lower Level 1
Organizer: Hua (Judy) Zhong, New York University
Chair: Hua (Judy) Zhong, New York University

8:30 AM Fused Community Detection
Yi Yu1, Yang Feng2 and Richard J. Samworth1; 1University of Cambridge, 2Columbia University

8:55 AM High Dimensional Tests for Functional Brain Networks
Jichun Xie1 and Jian Kang2; 1Temple University, 2Emory University

9:20 AM Bayesian Inference of Multiple Gaussian Graphical Models
Christine Peterson1, Francesco Stingo2 and Marina Vannucci3; 1Stanford University, 2University of Texas MD Anderson Cancer Center, 3Rice University

9:45 AM Mixed Graphical Models via Exponential Families
Eunho Yang1, Yulia Baker2, Pradeep Ravikumar1, Genevera I. Allen2 and Zhandong Liu3; 1University of Texas at Austin, 2Rice University, 3Baylor College of Medicine

10:10 AM Floor Discussion

Session 62: Recent Advances in Non- and Semi-parametric Methods (Invited)
Room: Salon B, Lower Level 1
Organizer: Lan Xue, Oregon State University
Chair: Guanqun Cao, Auburn University

8:30 AM Joint Estimation of Multiple Bivariate Densities of Protein Backbone Angles Using an Adaptive Exponential Spline Family
Lan Zhou, Texas A&M University

8:55 AM Estimating Time-Varying Effects for Overdispersed Recurrent Data with Treatment Switching
Qingxia Chen1, Donglin Zeng2, Joseph G. Ibrahim2, Mouna Akacha3 and Heinz Schmidli3; 1Vanderbilt University, 2University of North Carolina at Chapel Hill, 3Novartis Pharmaceuticals Corporation

9:20 AM Bivariate Penalized Splines for Regression
Ming-Jun Lai and Lily Wang, University of Georgia

9:45 AM Local Feature Selection in Varying-Coefficient Models
Lan Xue1, Xinxin Shu2, Peibei Shi2, Colin Wu3 and Annie Qu2; 1Oregon State University, 2University of Illinois at Urbana-Champaign, 3National Heart, Lung and Blood Institute

10:10 AM Floor Discussion

Session 63: Statistical Challenges and Development in Cancer Screening Research (Invited)
Room: Salon C, Lower Level 1
Organizer: Yu Shen, University of Texas MD Anderson Cancer Center
Chair: Yu Shen, Professor, University of Texas MD Anderson Cancer Center

8:30 AM Overdiagnosis in Breast and Prostate Cancer Screening: Concepts, Methods and Challenges
Ruth Etzioni, Roman Gulati and Jing Xia, Fred Hutchinson Cancer Research Center

8:55 AM Estimation of Biomarker Growth in a Screening Study
Lurdes Y.T. Inoue1, Roman Gulati2 and Ruth Etzioni2; 1University of Washington, 2Fred Hutchinson Cancer Research Center

9:20 AM Estimating Screening Test Effectiveness when Screening Indication is Unknown
Rebecca Hubbard, Group Health Research Institute

9:45 AM Developing Risk-Based Screening Guidelines: "Equal Management of Equal Risks"
Hormuzd Katki, National Cancer Institute

10:10 AM Floor Discussion

Session 64: Recent Developments in the Visualization and Exploration of Spatial Data (Invited)
Room: Salon D, Lower Level 1
Organizer: Juergen Symanzik, Utah State University
Chair: Juergen Symanzik, Utah State University

8:30 AM Recent Advancements in Geovisualization with a Case Study on Chinese Religions
Juergen Symanzik1 and Shuming Bao2; 1Utah State University, 2University of Michigan

8:55 AM Spatial Analysis with China Geo-Explorers
Shuming Bao1, Miao Shui2 and Bing She2; 1University of Michigan, 2Wuhan University

9:20 AM Probcast: Creating and Visualizing Probabilistic Weather Forecasts
J. McLean Sloughter1, Susan Joslyn2, Patrick Tewson3, Tilmann Gneiting4 and Adrian Raftery2; 1Seattle University, 2University of Washington, 3Bigger Boat Consulting, 4University of Heidelberg

9:45 AM Discussant: Karen Kafadar, Indiana University

10:10 AM Floor Discussion

Session 65: Advancement in Biostatistical Methods and Applications (Invited)
Room: Salon G, Lower Level 1
Organizer: Sin-ho Jung, Duke University
Chair: Dongseok Choi, Oregon Health & Science University

8:30 AM Estimation of Time-Dependent AUC under Marker-Dependent Sampling
Xiaofei Wang and Zhaoyin Zhu, Duke University

8:55 AM A Measurement Error Approach for Modeling Accelerometer-based Physical Activity Data
Julia Lee, Jing Song and Dorothy Dunloop, Northwestern University

9:20 AM Real-Time Prediction in Clinical Trials: A Statistical History of REMATCH
Daniel F. Heitjan and Gui-shuang Ying, University of Pennsylvania

9:45 AM An Analysis of Microarray Data with Batch Effects
Dongseok Choi, William O. Cepurna, John C. Morrison, Elaine C. Johnson, Stephen R. Planck and James T. Rosenbaum, Oregon Health & Science University

10:10 AM Floor Discussion

Session 66: Analysis of Complex Data (Invited)
Room: Salon H, Lower Level 1
Organizer: Mounir Mesbah, University of Paris 6
Chair: Mounir Mesbah, University of Paris 6

8:30 AM Integrating Data from Heterogeneous Studies Using Only Summary Statistics: Efficiency and Robustness
Min-ge Xie, Rutgers University

8:55 AM A Markov Modulated Poisson Model for Reliability Data
Joshua Landon1, Suleyman Ozekici2 and Refik Soyer1; 1George Washington University, 2Koc University

9:20 AM A Comparison of Two Approaches for Acute Leukemia Patient Classification
Jingjing Wu1, Guoqiang Chen2 and Zeny Feng3; 1University of Calgary, 2Enbridge Pipelines, 3University of Guelph

9:45 AM On the Consistency and Covariate Selections for Varying-Coefficient Deming Regressions
Ying Lu1, Chong Gu2, Bo Fan3, Selwyn Au4, Valerie McGuire1 and John Shepherd3; 1VA Palo Alto Health Care System & Stanford University, 2Purdue University, 3University of California at San Francisco, 4VA Palo Alto Health Care System

10:10 AM Floor Discussion

Session 67: Statistical Issues in Co-development of Drug and Biomarker (Invited)
Room: Salon I, Lower Level 1
Organizer: Liang Fang, Gilead Sciences
Chair: Liang Fang, Gilead Sciences

8:30 AM Group Sequential Designs for Developing and Testing Biomarker-Guided Personalized Therapies in Comparative Effectiveness Research
Tze Leung Lai1, Olivia Yueh-Wen Liao2 and Dong Woo Kim3; 1Stanford University, 2Onyx Pharmaceuticals, 3Microsoft Corporation

8:55 AM Adaptive Enrichment Designs for Clinical Trials
Noah Simon1 and Richard Simon2; 1University of Washington, 2National Institutes of Health

9:20 AM An Adaptive Single-Arm Phase II Design with Co-primary Objectives to Evaluate Activity Overall and In Relation to a Biomarker-Defined Subgroup
Michael Wolf, Amgen Inc.

9:45 AM Biomarker Threshold Estimation to Predict Clinical Benefit: What Can Reasonably be Learned During Early (Ph I/II) Oncology Development
Thomas Bengtsson, Genentech Inc.

10:10 AM Floor Discussion

Session 68: New Challenges for Statistical Analyst/Programmer (Invited)
Room: Eugene Room, Lower Level 1
Organizer: Xianming (Steve) Zheng, Eli Lilly and Company
Chair: Xianming (Steve) Zheng, Eli Lilly and Company

8:30 AM Similarities and Differences in Statistical Programming among CRO and Pharmaceutical Industries
Mark Matthews, inVentiv Health Clinical

8:55 AM Computational Aspects for Detecting Safety Signals in Clinical Trials
Jyoti Rayamajhi, Eli Lilly and Company

9:20 AM Bayesian Network Meta-Analysis Methods: An Overview and A Case Study
Baoguang Han1, Wei Zou2 and Karen Price1; 1Eli Lilly and Company, 2inVentiv Clinical Health

9:45 AM Floor Discussion

Session 69: Adaptive and Sequential Methods for Clinical Trials (Invited)
Room: Portland Room, Lower Level 1
Organizers: Zhengjia Chen, Emory University; Yichuan Zhao, Georgia State University (yichuan@gsu.edu)
Chair: Zhengjia Chen, Emory University

8:30 AM Bayesian Data Augmentation Dose Finding with Continual Reassessment Method and Delayed Toxicities
Ying Yuan1, Suyu Liu1 and Guosheng Yin2; 1University of Texas MD Anderson Cancer Center, 2University of Hong Kong

8:55 AM Optimal Marker-strategy Clinical Trial Design to Detect Predictive Markers for Targeted Therapy
Yong Zang, Suyu Liu and Ying Yuan, University of Texas MD Anderson Cancer Center

9:20 AM Dynamic Prediction of Time to Relapse Using Longitudinal Biomarker Data
Xuelin Huang1, Jing Ning1 and Sangbum Choi2; 1University of Texas MD Anderson Cancer Center, 2University of Texas Health Science Center at Houston

9:45 AM Continuous Tumor Size Change Percentage and Progression Free Survival as Endpoint of the First and Second Stage Respectively in a Novel Double Screening Phase II Design
Ye Cui1, Zhibo Wang1, Yichuan Zhao1 and Zhengjia Chen2; 1Georgia State University, 2Emory University

10:10 AM Floor Discussion

Wednesday, June 18, 10:30 AM - 12:10 PM

Session 70: Survival Analysis (Contributed)
Room: Portland Room, Lower Level 1
Chair: Zhezhen Jin, Columbia University

10:30 AM Comparison of Hazard Rate and Odds Ratio in the Two-Sample Survival Problem
Benedict Dormitorio and Joshua Naranjo, Western Michigan University

10:50 AM Predicting the Event Time in Multicenter Clinical Trials with Time-to-Event Outcome as Primary Endpoint
Nibedita Bandyopadhyay, Janssen Research & Development

11:10 AM Empirical Comparison of Small Sample Performance for the Logrank Test and Resampling Methods with High Censoring Rates
Yu Deng and Jianwen Cai, University of North Carolina at Chapel Hill

11:30 AM Floor Discussion

Session 71: Complex Data Analysis: Theory and Application (Invited)
Room: Salon A, Lower Level 1
Organizer: Yang Feng, Columbia University
Chair: Yang Feng, Columbia University

10:30 AM Supervised Singular Value Decomposition and Its Asymptotic Properties
Gen Li1, Dan Yang2, Haipeng Shen1 and Andrew Nobel1; 1University of North Carolina at Chapel Hill, 2Rutgers University

10:55 AM New Methods for Interaction Selection
Ning Hao1, Hao Helen Zhang1 and Yang Feng2; 1University of Arizona, 2Columbia University

11:20 AM A Statistical Approach to Set Classification by Feature Selection with Applications to Classification of Histopathology Images
Sungkyu Jung1 and Xingye Qiao2; 1University of Pittsburgh, 2Binghamton University, State University of New York

11:45 AM A Smoothing Spline Model for Analyzing dMRI Data of Swallowing
Binhuan Wang, Ryan Branski, Milan Amin and Yixin Fang, New York University

12:10 PM Floor Discussion

Session 72: Recent Development in Statistical Methods for Missing Data (Invited)
Room: Salon B, Lower Level 1
Organizer: Nanhua Zhang, Cincinnati Children's Hospital Medical Center
Chair: Haoda Fu, Eli Lilly and Company

10:30 AM A Semiparametric Inference to Regression Analysis with Missing Covariates in Survey Data
Shu Yang and Jae-kwang Kim, Iowa State University

10:55 AM Multiple Robustness in Missing Data Analysis
Peisong Han1 and Lu Wang2; 1University of Waterloo, 2University of Michigan

11:20 AM Imputation of Binary Variables with SAS and IVEware
Yi Pan1 and Riguang Song1; 1United States Centers for Disease Control and Prevention

11:45 AM Marginal Treatment Effect Estimation Using Pattern-Mixture Model
Zhenzhen Xu, United States Food and Drug Administration

12:10 PM Floor Discussion

Session 73: Machine Learning Methods for Causal Inference in Health Studies (Invited)
Room: Salon C, Lower Level 1
Organizer: Mi-Ok Kim, Cincinnati Children's Hospital Medical Center
Chair: Mi-Ok Kim, Cincinnati Children's Hospital Medical Center

10:30 AM Causal Inference of Interaction Effects with Inverse Propensity Weighting, G-Computation and Tree-Based Standardization
Joseph Kang1, Xiaogang Su2, Lei Liu1 and Martha Daviglus3; 1Northwestern University, 2University of Texas at El Paso, 3University of Illinois at Chicago

10:55 AM Practice of Causal Inference with the Propensity of Being Zero or One
Joseph Kang1, Wendy Chan1, Mi-Ok Kim2 and Peter M. Steiner3; 1Northwestern University, 2University of Cincinnati/Cincinnati Children's Hospital Medical Center, 3University of Wisconsin-Madison

11:20 AM Propensity Score and Proximity Matching Using Random Forest
Peng Zhao1, Xiaogang Su2 and Juanjuan Fan1; 1San Diego State University, 2University of Texas at El Paso

11:45 AM Discussant: Joseph Kang, Northwestern University

12:10 PM Floor Discussion

Session 74: JP Hsu Memorial Session (Invited)
Room: Salon D, Lower Level 1
Organizers: Lili Yu, Georgia Southern University; Karl Peace, Georgia Southern University (kepeace@georgiasouthern.edu)
Chair: Lili Yu, Georgia Southern University

10:30 AM Weighted Least-Squares Method for Right-Censored Data in Accelerated Failure Time Model
Lili Yu, Georgia Southern University

10:55 AM (Student Paper Award) Estimating a Change-Point in High-Dimensional Markov Random Field Models
Sandipan Roy, University of Michigan

11:20 AM A Comparison of Size and Power of Tests of Hypotheses on Parameters Based on Two Generalized Lindley Distributions
Macaulay Okwuokenye, Biogen Idec

11:45 AM Floor Discussion

Session 75: Challenge and New Development in Model Fitting and Selection (Invited)
Room: Salon G, Lower Level 1
Organizer: Zhezhen Jin, Columbia University
Chair: Cuiling Wang, Yeshiva University

10:30 AM Robust Estimates of Divergence Times and Selection with a Poisson Random Field Model
Amei Amei1 and Brian Tilston Smith2; 1University of Nevada at Las Vegas, 2American Museum of Natural History

10:55 AM On A Class of Maximum Empirical Likelihood Estimators Defined By Convex Functions
Hanxiang Peng and Fei Tan, Indiana University-Purdue University Indianapolis

11:20 AM Properties of the Marginal Survival Functions for Dependent Censored Data under an Assumed Archimedean Copula
Antai Wang, New Jersey Institute of Technology

11:45 AM Dual Model Misspecification in Generalized Linear Models with Error in Variables
Xianzheng Huang, University of Southern California

12:10 PM Floor Discussion

Session 76: Advanced Methods and Their Applications in Survival Analysis (Invited)
Room: Salon H, Lower Level 1
Organizers: Jiajia Zhang, University of South Carolina; Wenbin Lu, North Carolina State University
Chair: Jiajia Zhang, University of South Carolina

10:30 AM Kernel Smoothed Profile Likelihood Estimation in the Accelerated Failure Time Frailty Model for Clustered Survival Data
Bo Liu1, Wenbin Lu1 and Jiajia Zhang2; 1North Carolina State University, 2University of South Carolina

10:55 AM Model-free Screening for Lifetime Data Analysis with Ultrahigh-dimensional Survival Impacting Biomarkers
Jialiang Li1, Qi Zheng2 and Limin Peng2; 1National University of Singapore, 2Emory University

11:20 AM Analysis of Event History Data in Tuberculosis (TB) Screening
Joan Hu, Simon Fraser University

11:45 AM On the Dependence Structure of Bivariate Recurrent Event Processes: Inference and Estimation
Jing Ning1, Yong Chen2, Chunyan Cai2, Xuelin Huang1 and Mei-Cheng Wang3; 1University of Texas MD Anderson Cancer Center, 2University of Texas Health Science Center at Houston, 3Johns Hopkins University

12:10 PM Floor Discussion

Session 77: High Dimensional Variable Selection and Multiple Testing (Invited)
Room: Salon I, Lower Level 1
Organizer: Zhigen Zhao, Temple University
Chair: Jichun Xie, Temple University

10:30 AM On Procedures Controlling the False Discovery Rate for Testing Hierarchically Ordered Hypotheses
Gavin Lynch and Wenge Guo, New Jersey Institute of Technology

10:55 AM Sufficient Dimension Reduction in Binary Classification
Seung Jun Shin1, Yichao Wu2, Hao Helen Zhang3 and Yufeng Liu4; 1University of Texas MD Anderson Cancer Center, 2North Carolina State University, 3University of Arizona, 4University of North Carolina at Chapel Hill

11:20 AM Rate Optimal Multiple Testing Procedure (ROMP) in High-dimensional Regression
Zhigen Zhao1 and Pengsheng Ji2; 1Temple University, 2University of Georgia

11:45 AM Pathwise Calibrated Active Shooting Algorithm with Application to Semiparametric Graph Estimation
Tuo Zhao1 and Han Liu2; 1Johns Hopkins University, 2Princeton University

12:10 PM Floor Discussion

Abstracts

Session 1: Emerging Statistical Methods for Complex Data

Estimation of the Error Auto-Correlation Matrix in Semi-parametric Model for Brain fMRI Data
Chunming Zhang and Xiao Guo
University of Wisconsin-Madison
cmzhang@stat.wisc.edu
In statistical analysis of functional magnetic resonance imaging (fMRI), dealing with the temporal correlation is a major challenge in assessing changes within voxels. In this paper, we aim to address this issue by considering a semi-parametric model for fMRI data. For the error process in the semi-parametric model, we construct a banded estimate of the auto-correlation matrix R and propose a refined estimate of the inverse of R. Under some mild regularity conditions, we establish consistency of the banded estimate with an explicit convergence rate and show that the refined estimate converges under an appropriate norm. Numerical results suggest that the refined estimate performs conceivably well when it is applied to the detection of the brain activity.

Kernel Additive Sliced Inverse Regression
Heng Lian
Nanyang Technological University
shellinglianheng@hotmail.com
In recent years, nonlinear sufficient dimension reduction (SDR) methods have gained increasing popularity. However, while semi-parametric models in regression have fascinated researchers for several decades with a large amount of literature, parsimonious structured nonlinear SDR has attracted little attention so far. In this paper, extending kernel sliced inverse regression, we study additive models in the context of SDR and demonstrate its potential usefulness due to its flexibility and parsimony. Theoretically, we clarify that the improved convergence rate using additive structure is due to a faster rate of decay of the kernel's eigenvalues. Additive structure also opens the possibility of nonparametric variable selection. This sparsification of the kernel, however, does not introduce additional tuning parameters, in contrast with sparse regression. Simulated and real data sets are presented to illustrate the benefits and limitations of the approach.

Variable Selection with Prior Information for Generalized Linear Models via the Prior LASSO Method
Yuan Jiang1, Yunxiao He2 and Heping Zhang3
1Oregon State University
2Nielsen Company
3Yale University
yuanjiang@stat.oregonstate.edu
LASSO is a popular statistical tool often used in conjunction with generalized linear models that can simultaneously select variables and estimate parameters. When there are many variables of interest, as in current biological and biomedical studies, the power of LASSO can be limited. Fortunately, so much biological and biomedical data have been collected, and they may contain useful information about the importance of certain variables. This paper proposes an extension of LASSO, namely prior LASSO (pLASSO), to incorporate that prior information into penalized generalized linear models. The goal is achieved by adding in the LASSO criterion function an additional measure of the discrepancy between the prior information and the model. For linear regression, the whole solution path of the pLASSO estimator can be found with a procedure similar to the Least Angle Regression (LARS). Asymptotic theories and simulation results show that pLASSO provides significant improvement over LASSO when the prior information is relatively accurate. When the prior information is less reliable, pLASSO shows great robustness to the misspecification. We illustrate the application of pLASSO using a real data set from a genome-wide association study.
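In symbols, the pLASSO idea described in this abstract amounts to augmenting the penalized GLM criterion with a prior-discrepancy term. The notation below (weight eta, discrepancy D) is our own illustrative sketch, not taken from the paper:

```latex
% GLM log-likelihood \ell(\beta), LASSO penalty \lambda\|\beta\|_1,
% and a discrepancy D between the model and the prior information,
% weighted by a tuning parameter \eta \ge 0:
\hat{\beta}_{\mathrm{pLASSO}}
  = \arg\min_{\beta}\;
    -\ell(\beta) \;+\; \lambda \lVert \beta \rVert_{1}
    \;+\; \eta\, D(\beta;\ \mathrm{prior})
```

With eta = 0 this reduces to ordinary LASSO; larger eta pulls the fit toward the prior information, which matches the robustness trade-off the abstract describes.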

Bootstrapping High Dimensional Vector: Interplay Between Dependence and Dimensionality
Xianyang Zhang1 and Guang Cheng2

1University of Missouri at Columbia
2Purdue University
zhangxiany@missouri.edu
In this talk, we will focus on the problem of conducting inference for high dimensional weakly dependent time series. Motivated by the applications in modern high dimensional inference, we derive a Gaussian approximation result for the maximum of a sum of weakly dependent vectors using Stein's method, where the dimension of the vectors is allowed to be exponentially larger than the sample size. Our result reveals an interesting phenomenon arising from the interplay between the dependence and dimensionality: the more dependent the data vectors, the slower the diverging rate of the dimension that is allowed for obtaining valid statistical inference. A type of dimension-free dependence structure is derived as a by-product. Building on the Gaussian approximation result, we propose a block-wise multiplier (wild) bootstrap that is able to capture the dependence between and within the data vectors, and thus provides a high-quality distributional approximation to the distribution of the maximum of the vector sum in the high dimensional context.
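As a toy illustration of the block-wise multiplier bootstrap described in this abstract (our own minimal sketch; the block length, centering, and max-type statistic are assumptions, not the authors' implementation):

```python
import math
import random

def max_abs_colsum(data):
    """Max-type statistic: max over coordinates k of |n^{-1/2} sum_i x_{ik}|."""
    n, p = len(data), len(data[0])
    return max(abs(sum(row[k] for row in data)) / math.sqrt(n) for k in range(p))

def multiplier_bootstrap_stat(data, block_len, rng):
    """One block-wise multiplier (wild) bootstrap draw: each block of
    consecutive rows shares a single N(0,1) multiplier, so short-range
    temporal dependence within a block is preserved in the draw."""
    n, p = len(data), len(data[0])
    colmean = [sum(row[k] for row in data) / n for k in range(p)]
    total = [0.0] * p
    for start in range(0, n, block_len):
        e = rng.gauss(0.0, 1.0)  # one multiplier per block
        for row in data[start:start + block_len]:
            for k in range(p):
                total[k] += e * (row[k] - colmean[k])
    return max(abs(t) / math.sqrt(n) for t in total)

def bootstrap_critical_value(data, block_len=5, n_boot=200, alpha=0.05, seed=1):
    """Empirical (1 - alpha) quantile of the bootstrap draws."""
    rng = random.Random(seed)
    draws = sorted(multiplier_bootstrap_stat(data, block_len, rng)
                   for _ in range(n_boot))
    return draws[int(math.ceil((1 - alpha) * n_boot)) - 1]
```

Comparing `max_abs_colsum(data)` against the bootstrap critical value then gives a simultaneous test over all p coordinates, which is the use case the abstract targets.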

Session 2: Statistical Methods for Sequencing Data Analysis

A Penalized Likelihood Approach for Robust Estimation of Isoform Expression
Hui Jiang1 and Julia Salzman2

1University of Michigan
2Stanford University
jianghui@umich.edu
Ultra high-throughput sequencing of transcriptomes (RNA-Seq) has enabled the accurate estimation of gene expression at individual isoform level. However, systematic biases introduced during the sequencing and mapping processes, as well as incompleteness of the transcript annotation databases, may cause the estimates of isoform abundances to be unreliable, and in some cases highly inaccurate. This paper introduces a penalized likelihood approach to detect and correct for such biases in a robust manner. Our model extends those previously proposed by introducing bias parameters for reads. An L1 penalty is used for the selection of non-zero bias parameters. We introduce an efficient algorithm for model fitting and analyze the statistical properties of the proposed model. Our experimental studies on both simulated and real datasets suggest that the model has the potential to improve isoform-specific gene expression estimates and identify incompletely annotated gene models. This is joint work with Julia Salzman.

Classification on Sequencing Data and its Applications on a Human Breast Cancer Dataset
Jun Li
University of Notre Dame
junli@nd.edu
Gene expression measured by the RNA-sequencing technique can be used to classify biological samples from different groups, such as normal vs. early-stage cancer vs. cancer. To get an interpretable classifier with high robustness and generality, often some type of shrinkage is used to give a linear and sparse model. In microarray data, an example is PAM (prediction analysis of microarrays), which uses a nearest shrunken centroid classifier. To accommodate the discrete nature of sequencing data, this model was modified by using a Poisson distribution. We further generalize this model by using a negative binomial distribution to take account of the overdispersion in the data. We compare the performance of Gaussian, Poisson and negative binomial based models on simulation data as well as a human breast cancer dataset. We find that while the cross-validation misclassification rates of the three methods are often quite similar, the number of genes used by the models can be quite different, and using the Gaussian model on carefully normalized data typically gives models with the least number of genes.
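A minimal sketch of the centroid-based likelihood classification this abstract describes, assuming known per-class mean counts and a common negative binomial dispersion phi (variance mu + phi*mu^2); the function names and simple parameterization are ours, not the paper's:

```python
import math

def poisson_logpmf(x, mu):
    """Log P(X = x) for a Poisson(mu) count."""
    return x * math.log(mu) - mu - math.lgamma(x + 1)

def nb_logpmf(x, mu, phi):
    """Log pmf of a negative binomial with mean mu and variance mu + phi*mu^2
    (size parameter r = 1/phi)."""
    r = 1.0 / phi
    return (math.lgamma(x + r) - math.lgamma(r) - math.lgamma(x + 1)
            + r * math.log(r / (r + mu)) + x * math.log(mu / (r + mu)))

def classify_counts(x, class_means, phi=None):
    """Assign a count vector x to the class centroid with the highest
    log-likelihood; phi=None gives the Poisson rule, phi>0 the NB rule."""
    best, best_ll = 0, float("-inf")
    for k, mu_vec in enumerate(class_means):
        if phi is None:
            ll = sum(poisson_logpmf(xi, mu) for xi, mu in zip(x, mu_vec))
        else:
            ll = sum(nb_logpmf(xi, mu, phi) for xi, mu in zip(x, mu_vec))
        if ll > best_ll:
            best, best_ll = k, ll
    return best
```

In practice the centroids and dispersion would be estimated from training data after size-factor normalization, with shrinkage applied to the centroids to obtain the sparse gene sets the abstract compares.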

Power-Robustness Analysis of Statistical Models for RNA Sequencing Data
Gu Mi, Yanming Di and Daniel W. Schafer
Oregon State University
mig@stat.oregonstate.edu
We present results from power-robustness analysis of several statistical models for RNA sequencing (RNA-Seq) data. We fit the models to several RNA-Seq datasets, perform goodness-of-fit tests that we developed (Mi et al. 2014), and quantify variations not explained by the fitted models. The statistical models we compared are all based on the negative binomial (NB) distribution, but differ in how they handle the estimation of the dispersion parameter. The dispersion parameter summarizes the extra-Poisson variation commonly observed in RNA-Seq data. One widely-used power-saving strategy is to assume some commonalities of NB dispersion parameters across genes via simple models relating them to mean expression rates, and many such models have been proposed. However, the power benefit of the dispersion-modeling approach relies on the estimated dispersion models being adequate. It is not well understood how robust the approach is if the fitted dispersion models are inadequate. Our empirical investigations provide a further step towards understanding the pros and cons of different NB dispersion models, and draw attention to power-robustness evaluation, a somewhat neglected yet important aspect of RNA-Seq data analysis.

Session 3 Modeling Big Biological Data with Complex Structures

High Dimensional Graphical Models Learning
Jie Peng1 and Ru Wang1
1University of California at Davis
jiepeng@ucdavis.edu
Probabilistic graphical models are used as graphical representations of probability distributions, particularly of their conditional independence properties. Graphical models have broad applications in the fields of biology, social science, linguistics, neuroscience, etc. We will focus on graphical model structure learning under the high-dimensional regime, where avoiding over-fitting and developing computationally efficient algorithms are particularly challenging. We will discuss the use of data perturbation and model aggregation for model building and model selection.

Statistical Analysis of RNA Sequencing Data
Mingyao Li and Yu Hu
University of Pennsylvania
mingyao@mail.med.upenn.edu
RNA sequencing (RNA-Seq) has rapidly replaced microarrays as the major platform for transcriptomics studies. Statistical analysis of RNA-Seq data, however, is challenging because various biases present in RNA-Seq data complicate the analysis and, if not appropriately corrected, can affect isoform expression estimation and downstream analysis. In this talk, I will first present PennSeq, a statistical method that estimates isoform-specific gene expression. PennSeq is a nonparametric approach that allows each isoform to have its own non-uniform read distribution. By giving adequate weight to the underlying data, this empirical approach maximally reflects the true underlying read distribution and is effective in adjusting for non-uniformity. In the second part of my talk, I will present a statistical method for testing differential alternative splicing by jointly modeling multiple samples. I will show simulation results as well as some examples from a clinical study.

Quantifying the Role of Steric Constraints in Nucleosome Positioning
H. Tomas Rube and Jun S. Song
University of Illinois at Urbana-Champaign
songj@illinois.edu
Statistical positioning, the localization of nucleosomes packed against a fixed barrier, is conjectured to explain the array of well-positioned nucleosomes at the 5' end of genes, but the extent and precise implications of statistical positioning in vivo are unclear. I will examine this hypothesis quantitatively and generalize the idea to include moving barriers. Early experiments noted a similarity between the nucleosome profile aligned and averaged across genes and that predicted by statistical positioning; however, our study demonstrates that the same profile is generated by aligning random nucleosomes, calling the previous interpretation into question. New rigorous analytic results reformulate statistical positioning as predictions on the variance structure of nucleosome locations in individual genes. In particular, a quantity termed the variance gradient, describing the change in variance between adjacent nucleosomes, is tested against recent high-throughput nucleosome sequencing data. Constant variance gradients render evidence in support of statistical positioning in about 50% of long genes. Genes that deviate from predictions have high nucleosome turnover and cell-to-cell gene expression variability. Our analyses thus clarify the role of statistical positioning in vivo.

Integrative Dynamic Omics Networks and Personalized Medicine
George I. Mias
Michigan State University
gmias@msu.edu
The emergence and ready availability of novel omics technologies is guiding our efforts to make advances in the implementation of personalized medicine. High-quality genomic data are now complemented with other dynamic omes (e.g., transcriptomes, proteomes, metabolomes, autoantibodyomes) and other data providing temporal profiling of thousands of molecular components. The analysis of such dynamic omics data necessitates the development of new statistical and computational methodology towards the integration of the different platforms. Such an approach allows us to follow changes in the physiological states of an individual, including pathway changes over time and associated network interactions (inferred nodes and connections). A framework implementing such methodology will be presented in association with a pilot personalized medicine study that monitored an initially healthy individual over multiple healthy and disease states. The framework will be described, including raw data analysis approaches for transcriptome (RNA) sequencing, mass spectrometry (proteins and small molecules), and protein array data, and an overview of the quantitation methods available for each analysis. Examples of how the data are integrated in this framework using the personalized medicine pilot study will also be presented. The extended framework infers novel pathways, components, and networks, assessing topological changes, and is being applied to other longitudinal studies to display changes through dynamical biological states. Assessing such multimodal omics data has great potential for the implementation of a more personalized, precise, and preventative medicine.

2014 ICSA-KISS Applied Statistics Symposium, Portland, June 15-18

Session 4 Bayesian Approaches for Modeling Dynamic Non-Gaussian Responses

Binary State Space Mixed Models with Flexible Link Functions
Dipak Dey1, Xun Jiang2, and Carlos Abanto-Valle3
1University of Connecticut
2Amgen Inc.
3Federal University of Rio de Janeiro
dipakdey@uconn.edu
State space models (SSMs) for binary time series data using flexible skewed link functions are introduced in this paper. The commonly used logit, cloglog, and loglog links are prone to link misspecification because of their fixed skewness. Here we introduce three flexible links as alternatives: the generalized extreme value (GEV) link, the symmetric power logit (SPLOGIT) link, and the scale mixture of normals (SMN) link. Markov chain Monte Carlo (MCMC) methods for Bayesian analysis of SSMs with these links are implemented using the JAGS package, a freely available software. Model comparison relies on the deviance information criterion (DIC). The flexibility of the proposed model is illustrated by measuring the effects of deep brain stimulation (DBS) on the attention of a macaque monkey performing a reaction-time task (Smith et al., 2009). Empirical results show that the flexible links fit better than the usual logit and cloglog links.
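For reference, the GEV link mentioned here replaces the symmetric logit CDF with a generalized extreme value CDF whose shape parameter ξ controls skewness. The sketch below uses a hypothetical ξ and one common sign convention, P(Y=1 | η) = 1 − F_GEV(−η; ξ), simply to contrast the symmetry of the two links:

```python
import numpy as np
from scipy.stats import genextreme, logistic

def gev_link(eta, xi):
    """P(Y=1 | eta) = 1 - F_GEV(-eta; xi); scipy's shape convention is c = -xi."""
    return 1.0 - genextreme.cdf(-np.asarray(eta), c=-xi)

eta = np.linspace(-3, 3, 7)
p_logit = logistic.cdf(eta)    # symmetric: p(eta) + p(-eta) = 1 for all eta
p_gev = gev_link(eta, xi=0.5)  # skewed: approaches 0 and 1 at different rates

sym_logit = p_logit + p_logit[::-1]  # identically 1 under symmetry
sym_gev = p_gev + p_gev[::-1]        # departs from 1, reflecting skewness
```

In the paper's setting the skewness (e.g., ξ) is estimated from the data rather than fixed in advance, which is what protects against link misspecification.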

Bayesian Spatial-Temporal Modeling of Ecological Zero-Inflated Count Data
Xia Wang1, Ming-Hui Chen2, Rita C. Kuo3, and Dipak K. Dey2
1University of Cincinnati
2University of Connecticut
3Lawrence Berkeley National Laboratory
xiawang@uc.edu
A Bayesian hierarchical model is developed for count data with spatial and temporal correlations as well as excessive zeros, uneven sampling intensities, and inference on missing spots. Our contribution is to develop a model for zero-inflated count data that provides flexibility in modeling spatial patterns in a dynamic manner and also improves computational efficiency via dimension reduction. The proposed methodology is of particular importance for studying species presence and abundance in the ecological sciences. The proposed model is employed in the analysis of survey data collected by the Northeast Fisheries Science Center (NEFSC) for estimation and prediction of the Atlantic cod in the Gulf of Maine - Georges Bank region. Model comparisons based on the deviance information criterion and the log predictive score show the improvement achieved by the proposed spatial-temporal model.

Real-time Bayesian Parameter Estimation for Item Response Models
Ruby Chiu-Hsing Weng
National Chengchi University
chweng@nccu.edu.tw
Bayesian item response models have been used to model educational testing and Internet ratings data. Typically, the statistical analysis is carried out using Markov chain Monte Carlo (MCMC) methods. However, MCMC methods may not be computationally feasible when real-time data arrive continuously and online parameter estimation is needed. We develop an efficient algorithm, based on a deterministic moment-matching method, to adjust the parameters in real time. The proposed online algorithm works well on two real datasets. Moreover, when compared with offline MCMC methods, it achieves good accuracy with considerably less computational time.

Statistical Prediction for Virginia Lyme Disease Emergence Based on Spatio-temporal Count Data
Yuanyuan Duan, Jie Li, Yili Hong, Korine Kolivras, Stephen Prisley, James Campbell, and David Gaines
Virginia Tech
jieli@vt.edu
The increasing demand for modeling spatio-temporal data is computationally challenging due to the large spatial and temporal dimensions involved. The traditional Markov chain Monte Carlo (MCMC) method suffers from slow convergence and is computationally expensive. The Integrated Nested Laplace Approximation (INLA) has been proposed as an alternative to speed up the computation by avoiding the extensive sampling required by MCMC. However, even with INLA, handling large-scale spatio-temporal prediction datasets remains difficult, if not infeasible, in many cases. This work proposes a new Divide-Recombine (DR) prediction method for dealing with spatio-temporal data. A large spatial region is divided into smaller subregions, and then INLA is applied to fit a spatio-temporal model to each subregion. To recover the spatial dependence, an iterative procedure has been developed to recombine the model fitting and prediction results. In particular, the new method utilizes a model offset term to make adjustments for each subregion using information from neighboring subregions. Stable estimation and prediction results are obtained after several updating iterations. Simulations are used to validate the accuracy of the new method in model fitting and prediction. The method is then applied to areal (census-tract-level) count data for Lyme disease cases in Virginia from 2003 to 2010.

Session 5 Recent Advances in Astro-Statistics

Embedding the Big Bang Cosmological Model into a Bayesian Hierarchical Model for Supernova Light Curve Data
David van Dyk, Roberto Trotta, Xiyun Jiao, and Hikmatali Shariff
Imperial College London
dvandyk@imperial.ac.uk
The 2011 Nobel Prize in Physics was awarded for the discovery that the expansion of the Universe is accelerating. This talk describes a Bayesian model that relates the difference between the apparent and intrinsic brightnesses of objects to their distances, which in turn depend on parameters that describe this expansion. While apparent brightness can be readily measured, intrinsic brightness can only be obtained for certain objects. Type Ia supernovae occur when material accreting onto a white dwarf drives its mass above a threshold and triggers a powerful supernova explosion. Because this occurs only in a particular physical scenario, we can use covariates to estimate intrinsic brightness. We use a hierarchical Bayesian model to leverage this information to study the expansion history of the Universe. The model includes computer models that relate expansion parameters to observed brightnesses, along with components that account for measurement error, data contamination, dust absorption, repeated measures, and covariate adjustment uncertainty. Sophisticated MCMC methods are employed for model fitting, and a secondary Bayesian analysis is conducted for residual analysis and model checking.

Marrying Domain Knowledge and Statistical Methods
Ashish Mahabal, George Djorgovski, Matthew Graham, Ciro Donalek, and Andrew Drake
California Institute of Technology
aam@astro.caltech.edu
Astronomy datasets have been large and are getting larger by the day (TB to PB). This necessitates the use of advanced statistics for many purposes. However, the datasets are often so large that small contamination rates imply large numbers of wrong results. This makes blind application of methodologies unattractive. Astronomical transients are one area where rapid follow-up observations are required based on very little data. We show how the use of domain knowledge, in the right measure at the right juncture, can improve classification performance. We demonstrate this using Bayesian networks and Gaussian process regression on datasets from the Catalina Real-Time Transient Survey, which has covered 80% of the sky several tens to a few hundreds of times over the last decade. This becomes even more critical as we move beyond PB-sized datasets in the coming years.

Nonlinear Classification of X-Ray Binaries
Luke Bornn and Saku Vrtilek
Harvard University
bornn@stat.harvard.edu
Because of their singular nature, the primary method to obtain information about stellar-mass black holes is to study those that are part of a binary system. However, we have no widely applicable means of determining the nature of the compact object (whether a black hole [BH] or a neutron star [NS]) in a binary system. The definitive method is dynamic measurement of the mass of the compact object, and that can be reliably established only for eclipsing systems. The motivation for finding a way to differentiate the presence of an NS or BH in any XRB system is strong: subtle differences in the behavior of neutron star and black hole X-ray binaries provide tests of fundamental features of gravitation, such as the existence of a black hole event horizon. In this talk we present a statistical approach for classifying binary systems using a novel 3D representation, called a color-color-intensity diagram, combined with nonlinear classification techniques. The method provides natural and accurate probabilistic classifications of X-ray binary objects.

Persistent Homology and the Topology of the Intergalactic Medium
Fabrizio Lecci
Carnegie Mellon University
lecci@cmu.edu
Light we observe from quasars has traveled through the intergalactic medium (IGM) to reach us and leaves an imprint of some properties of the IGM on its spectrum. There is a particular imprint with which cosmologists are familiar, dubbed the Lyman-alpha forest. From this imprint we can infer the density fluctuations of neutral hydrogen along the line of sight from us to the quasar. With cosmological simulation output, we develop a methodology using local polynomial smoothing to model the IGM. We then study its topological features using persistent homology, a method for probing the topological properties of point clouds and functions. Describing the topological features of the IGM can aid our understanding of the large-scale structure of the Universe, along with providing a framework for comparing cosmological simulation output with real data beyond the standard measures. Motivated by this example, I will introduce persistent homology and describe some statistical techniques that allow us to separate topological signal from topological noise.

Session 6 Statistical Methods and Application in Genetics

Identification of Homogeneous and Heterogeneous Covariate Structure in Pooled Cohort Studies
Xin Cheng1, Wenbin Lu2, and Mengling Liu1
1New York University
2North Carolina State University
xc311@nyu.edu
Pooled analyses, which make use of data from multiple studies as a single dataset, can achieve large sample sizes to increase statistical power. When inter-study heterogeneity exists, however, the simple pooling strategy may fail to present a fair and complete picture of variables with heterogeneous effects. Therefore, it is of great importance to know the homogeneous and heterogeneous structure of variables in pooled studies. In this presentation, we propose a penalized partial likelihood approach with adaptively weighted composite penalties on variables' homogeneous and heterogeneous effects. We show that our method can characterize the structure of variables as having heterogeneous, homogeneous, or null effects, and simultaneously provide inference for the non-zero effects. The results readily extend to the high-dimensional situation, where the number of parameters diverges with the sample size. The proposed selection and estimation procedure can be easily implemented using the iterative shooting algorithm. We conduct extensive numerical studies to evaluate the practical performance of our proposed method and demonstrate it using real studies.

Gene Expression Analyses in Evaluating Translational Biomarkers from Drug-Induced Idiopathic Pulmonary Fibrosis in Animal Models
Wenfei Zhang, Yuefeng Lu, Tai-He Xia, Guillaume Wettstein, Jean-Pierre Bidouard, and Xavier Marniquet
Sanofi-Aventis US LLC
wenfeizhang@sanofi.com
Translational biomarkers are markers that produce biological signals translatable from animal models to humans. Identifying translational biomarkers can be important for disease diagnosis, prognosis, and risk prediction in drug development. Therefore, there is a growing demand for statistical analyses of biomarker data, especially for large and complex genetic data. To ensure the quality of these statistical analyses, we develop a statistical analysis pipeline for gene expression data. When the pipeline is applied to gene expression data from drug-induced idiopathic pulmonary fibrosis in animal models, it shows some interesting results in evaluating the translatability of genes through comparisons with human models.

DNA Methylation, Cell-Type Distribution, and EWAS
E. Andres Houseman
Oregon State University
andreshouseman@oregonstate.edu
Epigenetic processes form the principal mechanisms by which cell differentiation occurs. Consequently, DNA methylation measurements are strongly influenced by the DNA methylation profiles of the constituent cell types as well as by their mixing proportions. Epigenome-wide association studies (EWAS) aim to find associations of phenotype or exposure with DNA methylation at single CpG dinucleotides, but these associations are potentially confounded by associations with the overall cell-type distribution. In this talk, we review the literature on epigenetics and cell mixture. We then present two techniques for mixture-adjusted EWAS: the first requires a reference dataset, which may be expensive or infeasible to collect, while the other is free of this requirement. Finally, we provide several data analysis examples using these techniques.
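The reference-based adjustment mentioned here can be caricatured as a constrained regression: each sample's methylation profile is regressed on reference cell-type profiles to recover the mixing proportions. A toy sketch on simulated profiles using nonnegative least squares (illustrative only; not the speaker's exact estimator):

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(4)

# Hypothetical reference: methylation levels (0-1) at 100 CpGs for 3 cell types.
n_cpg, n_types = 100, 3
ref = rng.uniform(0.0, 1.0, size=(n_cpg, n_types))

# A sample is a mixture of the reference cell types plus measurement noise.
true_prop = np.array([0.6, 0.3, 0.1])
sample = ref @ true_prop + rng.normal(0.0, 0.01, n_cpg)

# Estimate mixing proportions by nonnegative least squares, then renormalize.
weights, _ = nnls(ref, sample)
prop_hat = weights / weights.sum()
```

The reference-free alternative described in the talk must, in effect, estimate something like `ref` and the proportions jointly, a much harder factorization-type problem.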

Secondary Quantile Analysis for GWAS
Ying Wei1, Xiaoyu Song1, Mengling Liu2, and Iuliana Ionita-Laza1
1Columbia University
2New York University
yw2148@columbia.edu
Case-control designs are widely used in epidemiology and other fields to identify factors associated with a disease of interest. These studies can also be used to study the associations of risk factors with secondary outcomes, such as biomarkers of the disease, and provide a cost-effective way to understand disease mechanisms. Most existing methods have focused on inference on the mean of secondary outcomes. In this paper, we propose a quantile-based approach. We construct a new family of estimating equations for consistent and efficient estimation of conditional quantiles using the case-control sample, and we also develop tools for statistical inference. Simulations are conducted to evaluate the practical performance of the proposed approach, and a case-control study of genetic association with asthma is used to demonstrate the method.

Session 7 Statistical Inference of Complex Associations in High-Dimensional Data

Leveraging for Big Data Regression
Ping Ma
University of Georgia
pingma@uga.edu
Advances in science and technology over the past few decades have led to big data challenges across a variety of fields. Extraction of useful information and knowledge from big data has become a daunting challenge for the scientific community and society as a whole. Tackling this challenge requires major breakthroughs in efficient computational and statistical approaches to big data analysis. In this talk, I will present leveraging algorithms that make a key contribution to resolving this grand challenge. In these algorithms, by sampling a very small, representative sub-dataset using smart algorithms, one can effectively extract the relevant information of a vast dataset from the small sub-dataset. Such algorithms are scalable to big data. These efforts allow pervasive access to big data analytics, especially for those who cannot directly use supercomputers. More importantly, these algorithms enable ordinary users to analyze big data using tablet computers.
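For least-squares regression, the leveraging idea can be sketched in a few lines: compute each row's statistical leverage score (the diagonal of the hat matrix), sample a small subset of rows with probabilities proportional to those scores, and solve a reweighted least-squares problem on the subsample. A minimal sketch on simulated data (not the speaker's implementation):

```python
import numpy as np

rng = np.random.default_rng(2)

# Full data: n rows, p predictors, known coefficients (simulated).
n, p = 100_000, 5
X = rng.standard_normal((n, p))
beta = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ beta + rng.standard_normal(n)

# Leverage scores: squared row norms of U from the thin SVD of X,
# i.e., the diagonal of the hat matrix X (X'X)^{-1} X'.
U, _, _ = np.linalg.svd(X, full_matrices=False)
lev = (U ** 2).sum(axis=1)
probs = lev / lev.sum()

# Sample r << n rows proportional to leverage; reweight by 1/(r * prob)
# so the subsampled objective is unbiased for the full least-squares objective.
r = 2_000
idx = rng.choice(n, size=r, replace=True, p=probs)
sw = np.sqrt(1.0 / (r * probs[idx]))
beta_hat, *_ = np.linalg.lstsq(X[idx] * sw[:, None], y[idx] * sw, rcond=None)
```

On this deliberately benign Gaussian design the leverage scores are nearly uniform; the sampling scheme pays off most when a few rows are highly influential.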

Reference-free Metagenomics Analysis Using Matrix Factorization
Wenxuan Zhong and Xin Xing
University of Georgia
wenxuan@uga.edu
Metagenomics refers to the study of a collection of genomes, typically microbial genomes, present in a sample. The sample itself can come from diverse sources depending on the study, e.g., a sample from the gastrointestinal tract of a human patient, or a sample of soil from a particular ecological origin. The premise is that by understanding the genomic composition of the sample, one can form hypotheses about properties of the sample, e.g., disease correlates of the patient or the ecological health of the soil source. Existing methods are limited in complex metagenome studies because they consider only the similarity between short DNA fragments and genomes in a database. In this talk, I will introduce a reference-free genome deconvolution algorithm that can simultaneously estimate the composition of a microbial community and the quantity of each species. Some theoretical results on the deconvolution method will also be discussed.

Big Data, Big Models, Big Problems: Statistical Principles and Practice at Scale
Alexander W. Blocker
Google
awblocker@google.com
Massive datasets can yield great insights, but only when united with sound statistical principles and careful computation. We share lessons from a set of problems in industry, all of which combine classical design and theory with large-scale computation. Simply obtaining reliable confidence intervals means grappling with complex dependence and distributed systems, and obtaining masses of additional data can actually degrade estimates without careful inference and computation. These problems highlight the opportunities for statisticians to provide a distinct contribution to the world of big data.

Session 8 Recent Developments in Survival Analysis

Bayesian Joint Modeling of Multi-dimensional Longitudinal and Survival Data with Applications to Cancer Clinical Trials
Ming-Hui Chen1, Danjie Zhang1, Joseph G. Ibrahim2, Mark E. Boye3, and Wei Shen3
1University of Connecticut
2University of North Carolina
3Eli Lilly and Company
ming-huichen@uconn.edu
Motivated by the large phase III, multicenter, randomized, single-blind EMPHACIS mesothelioma clinical trial, we develop a class of shared-parameter joint models for multi-dimensional longitudinal and survival data. Specifically, we propose a class of multivariate mixed effects regression models for the multi-dimensional longitudinal measures and a class of frailty and cure rate survival models for progression-free survival (PFS) time and overall survival (OS) time. The properties of the proposed models are examined in detail. In addition, we derive the decomposition of the logarithm of the pseudo-marginal likelihood (LPML), i.e., LPML = LPML_Long + LPML_Surv|Long, to assess the fit of each component of the joint model, and in particular to assess the fit of the longitudinal component and the survival component of the joint model separately, and we further use ∆LPML to determine the importance and contribution of the longitudinal data to the model fit of the survival data. Moreover, efficient Markov chain Monte Carlo sampling algorithms are developed to carry out posterior computation. We apply the proposed methodology to a detailed case study in mesothelioma.

Estimating Risk with Time-to-Event Data: An Application to the Women's Health Initiative
Dandan Liu1, Yingye Zheng2, Ross Prentice2, and Li Hsu2
1Vanderbilt University
2Fred Hutchinson Cancer Research Center
dandanliu@vanderbilt.edu
Accurate and individualized risk prediction is critical for the population control of chronic diseases such as cancer and cardiovascular disease. Large cohort studies provide valuable resources for building risk prediction models, as risk factors are collected at baseline and subjects are followed over time until disease occurrence or termination of the study. However, for rare diseases the baseline risk may not be estimated reliably from cohort data alone due to sparse events. In this paper, we propose to make use of external information to improve the efficiency of estimating time-dependent absolute risk. We derive the relationship between external disease incidence rates and the baseline risk, and we incorporate the external disease incidence information into the estimation of absolute risks while allowing for potential differences in disease incidence rates between the cohort and external sources. The asymptotic distributions of the proposed estimators are established. Simulation results show that the proposed estimator of absolute risk is more efficient than that based on the Breslow estimator, which does not utilize external disease incidence rates. A large cohort study, the Women's Health Initiative Observational Study, is used to illustrate the proposed method.

Efficient Estimation of Nonparametric Genetic Risk Function with Censored Data
Yuanjia Wang1, Baosheng Liang2, and Donglin Zeng3
1Columbia University
2Beijing Normal University
3University of North Carolina at Chapel Hill
yw2016@columbia.edu
With an increasing number of causal genes discovered for Mendelian and complex human disorders, it is important to assess the genetic risk distribution functions of disease onset for subjects who are carriers of these causal mutations and to compare them with the disease distribution in non-carriers. In many genetic epidemiological studies of genetic risk functions, the disease onset information is subject to censoring. In addition, subjects' mutation carrier or non-carrier status may be unknown, due to the cost of ascertaining subjects to collect DNA samples or due to death in older subjects (especially for late-onset diseases). Instead, the probability of a subject's genetic marker or mutation status can be obtained from various sources. When genetic status is missing, the available data take the form of mixture censored data. Recently, various methods have been proposed in the literature using parametric, semiparametric, and nonparametric models to estimate the genetic risk distribution functions from such data. However, none of the existing approaches is efficient in the presence of censoring and mixture, and the computation for some methods is demanding. In this paper, we propose a sieve maximum likelihood estimator that is fully efficient for inferring genetic risk distribution functions nonparametrically. Specifically, we estimate the logarithm of the hazard ratios between genetic risk groups using B-splines, while applying nonparametric maximum likelihood estimation (NPMLE) for the reference baseline hazard function. Our estimator can be calculated via an EM algorithm, and the computation is much faster than for the existing methods. Furthermore, we establish the asymptotic distribution of the obtained estimator and show that it is consistent and semiparametrically efficient, and thus the optimal estimator in this framework. The asymptotic theory for our sieve estimator sheds light on optimal estimation for censored mixture data. Simulation studies demonstrate the superior performance of the proposed method in small finite samples. The method is applied to estimate the distribution of Parkinson's disease (PD) age at onset for carriers of mutations in the leucine-rich repeat kinase 2 (LRRK2) G2019S gene, using data from the Michael J. Fox Foundation Ashkenazi Jewish LRRK2 consortium. This estimation is important for genetic counseling purposes, since this test is commercially available yet genetic risk (penetrance) estimates have been variable.

Support Vector Hazard Regression for Predicting Event Times Subject to Censoring
Xiaoxi Liu1, Yuanjia Wang2, and Donglin Zeng1
1University of North Carolina
2Columbia University
dzeng@email.unc.edu
Predicting dichotomous or continuous disease outcomes using powerful machine learning approaches has been studied extensively in various scientific areas. However, how to learn prediction rules for time-to-event outcomes subject to right censoring has received little attention until very recently. Existing approaches rely on inverse probability weighting or rank-based methods, which are inefficient. In this paper, we develop a novel support vector hazards regression (SVHR) approach to predict time-to-event outcomes using right-censored data. Our method is based on predicting the counting process via a series of support vector machines for time-to-event outcomes among subjects at risk. Introducing counting processes to represent the time-to-event data leads to an intuitive connection of the method with support vector machines in standard supervised learning and with hazard regression models in standard survival analysis. The resulting optimization is a convex quadratic programming problem that can easily incorporate non-linearity using kernel machines. We demonstrate an interesting connection of the profiled empirical risk function with the Cox partial likelihood, which sheds light on the optimality of SVHR. We formally show that SVHR is optimal in discriminating the covariate-specific hazard function from the population-average hazard function, and we establish the consistency and learning rate of the predicted risk. Simulation studies demonstrate much improved prediction accuracy of event times using SVHR compared to existing machine learning methods. Finally, we apply our method to analyze data from two real-world studies to demonstrate the superiority of SVHR in practical settings.

Session 9 Industry Practice and Regulatory Pathway for Benefit-Risk Assessment of Medicinal Products

Visual Communication and Assessment of Benefit-Risk for Medical Products
Jonathan D. Norton
MedImmune
nortonj@medimmune.com
Benefit-risk assessments are multidimensional and hence challenging both to formulate and to communicate. A particular limitation of some benefit-risk graphics is that they are based on the marginal distributions of benefit and harm and do not show the degree to which they occur in the same patients. Consider, for example, an imaginary drug that is beneficial to 50%… At the 2010 ICSA Symposium, the speaker introduced a graphic showing the benefit-risk state of each subject over time. This talk will include a new graphic, based on similar principles, that is intended for early-phase studies. It allows the user to assess the joint distribution of benefit and harm at the individual and cohort levels. The speaker will also review other graphical displays that may be effective for benefit-risk assessment, considering accepted principles of statistical graphics and his experience working for the FDA and industry.

Some Thoughts on Evaluation of Uncertainty in Benefit-Risk Assessment
Qi Jiang1, Haijun Ma1, Christy Chuang-Stein2, Weili He3, George Quartey4, John Scott5, and Shihua Wen6
1Amgen Inc.
2Pfizer Inc.
3Merck & Co.
4Hoffmann-La Roche
5United States Food and Drug Administration
6AbbVie Inc.
qjiang@amgen.com
Increasingly, companies, regulatory agencies, and other governance bodies are moving toward structured benefit-risk assessment approaches. One issue that complicates such structured approaches is uncertainty, which comes from multiple sources and needs to be addressed. To develop potential approaches to address these sources of uncertainty, it is critical first to have a thorough understanding of them. In this presentation, members of the Benefit-Risk Working Group of the Quantitative Sciences in the Pharmaceutical Industry (QSPI BRWG) will discuss some major sources of uncertainty and share some thoughts on how to address them.

Current Concept of Benefit-Risk Assessment of Medicine
Syed S. Islam
AbbVie Inc.
syedislam@abbvie.com
The benefit-risk assessment of a medicine should be as dynamic as the stages of drug development and the life cycle of a drug. Three fundamental clinical concepts are critical at all stages: the seriousness of the disease, how much improvement will occur due to the drug under consideration, and harmful effects, including their frequency, seriousness, and duration. One has to achieve a desirable balance between these, particularly prior to market approval, and follow up prospectively to see that the balance is maintained. The desirable balance is not a straightforward concept: it depends on judgment by various stakeholders. The patients, who are the direct beneficiaries of the medicine, should be the primary stakeholders, provided adequate, clear, and concise information is available to them. Healthcare providers must have similar information that they can communicate to their patients. Regulators and insurers are also stakeholders, for different reasons. Industry, which develops and produces the drug, must provide adequate and transparent information usable by all stakeholders. Any quantitative approach to an integrated benefit-risk balance should be parsimonious and transparent, along with sensitivity analyses. This presentation will discuss the pros and cons of a dynamic benefit-risk assessment and how integrated benefit-risk analyses can be incorporated within the FDA/EMA framework that includes patient preference.

Session 10 Analysis of Observational Studies and Clinical Trials

Impact of Tuberculosis on Mortality Among HIV-Infected Patients Receiving Antiretroviral Therapy in Uganda: A Case Study in Propensity Score Analysis
Rong Chu1, Edward J. Mills2, Joseph Beyene3, Eleanor Pullenayegum4, Celestin Bakanda5, Jean B. Nachega6 and Lehana Thabane3
1Agensys Inc. (Astellas); 2University of Ottawa/McMaster University; 3McMaster University; 4McMaster University/University of Toronto; 5The AIDS Support Organization; 6Stellenbosch University
rongchu@agensys.com
Background: Tuberculosis (TB) disease affects survival among HIV co-infected patients on antiretroviral therapy (ART). Yet the magnitude of the effect of TB disease on mortality is poorly understood.
Methods: Using a prospective cohort of 22,477 adult patients who initiated ART between August 2000 and June 2009 in Uganda, we assessed the effect of active pulmonary TB disease at the initiation of ART on all-cause mortality using a Cox proportional hazards model. Propensity score (PS) matching was used to control for potential confounding. Stratification and covariate adjustment for PS, as well as non-PS-based multivariable Cox models, were also performed.
Results: A total of 1,609 (7.52%) patients had active pulmonary TB at the start of ART. TB patients had higher proportions of being male, suffering from AIDS-defining illnesses, having World Health Organization (WHO) disease stage III or IV, and having lower CD4 cell counts at baseline (p < 0.001). The percentages of death during follow-up were 10.47% and 6.38% for patients with and without TB, respectively. The hazard ratio (HR) for mortality comparing TB to non-TB patients using 1,686 PS-matched pairs was 1.37 (95% confidence interval [CI]: 1.08 - 1.75), less marked than the crude estimate (HR = 1.74, 95% CI: 1.49 - 2.04). The other PS-based methods and the non-PS-based multivariable Cox model produced similar results.
Conclusions: After controlling for important confounding variables, HIV patients who had TB at the initiation of ART in Uganda had an approximately 37% increased hazard of overall mortality relative to non-TB patients.
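The matching step described in this abstract can be sketched in a few lines. The following is an illustrative greedy 1:1 nearest-neighbor propensity-score matcher with a caliper, run on simulated scores; the data, caliper value, and matching order are hypothetical choices, not the study's actual analysis code.

```python
# Illustrative greedy 1:1 nearest-neighbor matching on the propensity
# score, without replacement and with a caliper. All inputs are simulated.
import random

def ps_match(treated_ps, control_ps, caliper=0.05):
    """Return (treated_index, control_index) pairs matched on PS."""
    available = set(range(len(control_ps)))
    pairs = []
    # Process treated subjects in descending PS order, a common heuristic.
    for i in sorted(range(len(treated_ps)), key=lambda k: -treated_ps[k]):
        if not available:
            break
        j = min(available, key=lambda k: abs(control_ps[k] - treated_ps[i]))
        if abs(control_ps[j] - treated_ps[i]) <= caliper:
            pairs.append((i, j))
            available.remove(j)   # matching without replacement
    return pairs

random.seed(1)
treated_ps = [random.betavariate(4, 2) for _ in range(50)]   # e.g., TB
control_ps = [random.betavariate(2, 4) for _ in range(200)]  # e.g., non-TB
matched = ps_match(treated_ps, control_ps)
```

In an analysis like the one above, the matched pairs would then feed a (stratified) Cox model to estimate the hazard ratio.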

Ecological Momentary Assessment Methods to Increase Response and Adjust for Attrition in a Study of Middle School Students' Exposure to Alcohol Advertising
Steven Martino, Rebecca Collins, Stephanie Kovalchik, Kirsten Becker, Elizabeth D'Amico, William Shadel and Marc Elliott
RAND Corporation
skovalch@rand.org
Ecological momentary assessment (EMA) is a new approach for collecting data about repeated exposures in natural settings that has become more practical with the growth of mobile technologies. EMA has the potential to reduce recall bias. However, because EMA surveys occur much more frequently than traditional surveys, missing data are common. In this paper we describe the design and preliminary results of a longitudinal EMA study of exposure to alcohol advertising among middle school students (n=600)

42 2014 ICSA-KISS Applied Statistics Symposium Portland June 15-18

Abstracts

which employed a randomized missing design to increase response rates to smartphone surveys. Early results (n=125) show evidence of attrition over the 14-day collection period, which was not associated with student characteristics but was associated with study day. We develop a prediction model for non-response and adjust for attrition in exposure summaries using inverse probability weighting. Attrition-adjusted estimates suggest that youths saw an average of 3.8 alcohol ads per day, over twice what has been previously reported with conventional assessment. Corrected for attrition, EMA may allow more accurate estimation of frequent exposures than one-time delayed recall.
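The inverse-probability-weighting idea can be illustrated with simulated data: each observed report is weighted by the inverse of the estimated response probability for its study day. The real analysis fits a non-response prediction model with covariates; in this hedged sketch the weighting model is simply the observed per-day response rate, and all numbers are made up.

```python
# Minimal IPW-for-attrition sketch with simulated EMA data.
import random

random.seed(2)
n_students, n_days = 100, 14
# Response probability declines over the 14-day period (attrition).
responded = [[random.random() < 1.0 - 0.03 * d for d in range(n_days)]
             for _ in range(n_students)]
# Hypothetical exposure counts that trend upward over the study,
# so a naive mean over observed reports is biased.
ads = [[random.gauss(3.0 + 0.1 * d, 1.0) for d in range(n_days)]
       for _ in range(n_students)]

# Estimated response probability per study day (the weight model).
p_day = [sum(r[d] for r in responded) / n_students for d in range(n_days)]

num = den = 0.0
for s in range(n_students):
    for d in range(n_days):
        if responded[s][d]:
            w = 1.0 / p_day[d]          # inverse-probability weight
            num += w * ads[s][d]
            den += w
ipw_mean = num / den   # attrition-adjusted mean daily exposure
```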

Is Poor Antisaccade Performance in Healthy First-Degree Relatives of Schizophrenics an Artifact of Study Design?
Charity J. Morgan1, Mark F. Lenzenweger2 and Deborah L. Levy3
1University of Alabama at Birmingham; 2State University of New York at Binghamton; 3McLean Hospital
cjmorgan@uab.edu

A number of traits associated with schizophrenia aggregate in relatives of schizophrenia patients at rates much higher than that of the clinical disorder. These traits, considered candidate endophenotypes, may be alternative, more penetrant manifestations of schizophrenia risk genes than schizophrenia itself. Performance on the antisaccade task, a measure of eye-tracking dysfunction, is one of the most widely studied candidate endophenotypes. However, there is little consensus on whether poor antisaccade performance is a true endophenotype for schizophrenia. Some studies comparing the performance of healthy relatives of schizophrenia patients (RelSZ) to that of normal controls (NC) report that RelSZ show significantly more errors, while others find no statistically significant differences between the two groups. A recent meta-analysis of these studies noted that some studies used stricter exclusion criteria for NC than for RelSZ, and found these studies were more likely to find significant effect sizes. Specifically, NC in these studies with a personal or family history of psychopathology were excluded, whereas all RelSZ, including those with psychotic conditions, were included. In order to determine whether a difference in antisaccade performance between NC and RelSZ remains after controlling for differences in psychopathology, we applied a binomial regression model to data from an antisaccade task. We demonstrate that both psychopathology and familial history affect antisaccade performance.

Analysis of a Vaccine Study in Animals Using Mitigated Fraction in SAS
Mathew Rosales
Experis
mattrosales@experis.com

Mitigated fraction is frequently used to evaluate the effect of an intervention in reducing the severity of a particular outcome, a common measure in vaccine studies. It utilizes the ranks of the observations and measures the overlap of the two distributions using their stochastic ordering. Percent lung involvement is a common endpoint in vaccine studies to assess efficacy, and mitigated fraction is used to estimate the relative increase in the probability that disease will be less severe in the vaccinated group. A SAS macro was developed to estimate the mitigated fraction and its confidence interval. The macro provides an asymptotic confidence interval and a bootstrap-based interval. For illustration, an actual vaccine study was used where the macro was utilized to generate the estimates.
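For readers outside SAS, the rank-based computation can be sketched compactly. This uses one common formulation, MF = 2·P(vaccinated outcome less severe than control) − 1 computed from the Mann-Whitney statistic with ties counted one-half, plus a percentile bootstrap interval; the data are hypothetical and this is not the macro described above.

```python
# Hedged sketch of the mitigated fraction with a percentile bootstrap CI.
import random

def mitigated_fraction(vac, ctl):
    """MF via the Mann-Whitney statistic: 2*U/(n1*n2) - 1, ties count 1/2."""
    u = sum((v < c) + 0.5 * (v == c) for v in vac for c in ctl)
    return 2.0 * u / (len(vac) * len(ctl)) - 1.0

def bootstrap_ci(vac, ctl, n_boot=1000, alpha=0.05, seed=3):
    """Percentile bootstrap interval, resampling each group separately."""
    rng = random.Random(seed)
    stats = sorted(
        mitigated_fraction([rng.choice(vac) for _ in vac],
                           [rng.choice(ctl) for _ in ctl])
        for _ in range(n_boot))
    return stats[int(n_boot * alpha / 2)], stats[int(n_boot * (1 - alpha / 2)) - 1]

# Hypothetical percent-lung-involvement data, vaccinated group shifted lower.
vac = [2, 4, 5, 7, 8, 10, 12, 15]
ctl = [9, 14, 16, 20, 22, 25, 30, 35]
mf = mitigated_fraction(vac, ctl)
lo, hi = bootstrap_ci(vac, ctl)
```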

Competing Risks Survival Analysis for Efficacy Evaluation of Some-or-None Vaccines
Paul T. Edlefsen
Fred Hutchinson Cancer Research Center
pedlefse@fhcrc.org

Evaluation of a vaccine's efficacy to prevent a specific type of infection endpoint in the context of multiple endpoint types is an important challenge in biomedicine. Examples include evaluation of multivalent vaccines, such as the annual influenza vaccines, that target multiple strains of the pathogen. While statistical methods have been developed for "mark-specific vaccine efficacy" (where the term "mark" refers to a feature of the endpoint, such as its type, in contrast to a covariate of the subject), these methods address only vaccines that have a "leaky" vaccine mechanism, meaning that the vaccine's effect is to reduce the per-exposure probability of infection. The usual presentation of vaccine mechanisms contrasts "leaky" with "all-or-none" vaccines, which completely protect some fraction of the subjects independent of the number of exposures that each subject experiences. We introduce the notion of the "some-or-none" vaccine mechanism, which completely protects a fraction of the subjects from a defined subset of the possible endpoint marks, for example, for a flu vaccine that completely protects against the seasonal flu but has no effect against the H1N1 strain. Under conditions of non-harmful vaccines, we introduce a framework and Bayesian and frequentist methods to detect and quantify the extent to which a vaccine's partial efficacy is attributable to uneven efficacy across the marks rather than to incomplete "take" of the intervention. These new methods provide more power than existing methods to detect mark-varying efficacy (also called "sieve effects") when the conditions hold. We demonstrate the new framework and methods with simulation results and with new analyses of genetic signatures of vaccine effects in the RV144 HIV-1 vaccine efficacy trial.

Using Historical Data to Automatically Identify Air-Traffic Controller Behavior
Yuefeng Wu
University of Missouri at St. Louis
wuyue@umsl.edu

The Next Generation Air Traffic Control Systems are trajectory-based automation systems that rely on predictions of future states of aircraft, instead of just using human abilities, which is how the National Airspace System (NAS) operates now. As automation relying on trajectories becomes more safety critical, the accuracy of these predictions needs to be fully understood. Also, it is very important for researchers developing future automation systems to understand, and in some cases mimic, how current operations are conducted by human controllers, to ensure that the new systems are at least as efficient as humans and to understand creative solutions used by human controllers. The work to be presented answers both of these questions by developing statistics-based machine learning models to characterize the types of errors present when using current systems to predict future aircraft states. The models are used to infer situations in the historical data where an air-traffic controller intervened on an aircraft's route, even when there is no direct recording of this action. Local time series models and some other statistics are calculated to construct the feature vector; then both a naive Bayes classifier and a support vector machine are used to learn the pattern of the prediction errors.
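The classification step can be illustrated with a toy Gaussian naive Bayes model (the SVM half is omitted). The two-dimensional "prediction error" features and labels below are simulated stand-ins, not the features extracted from historical NAS data.

```python
# Toy Gaussian naive Bayes for labeling feature vectors of
# trajectory-prediction errors as intervention (1) vs. none (0).
import math, random

def fit_gnb(X, y):
    """Per-class prior, feature means, and feature variances."""
    params = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        means = [sum(col) / len(rows) for col in zip(*rows)]
        vars_ = [sum((v - m) ** 2 for v in col) / len(rows) + 1e-6
                 for col, m in zip(zip(*rows), means)]
        params[label] = (len(rows) / len(y), means, vars_)
    return params

def predict_gnb(params, x):
    def logpost(label):
        prior, means, vars_ = params[label]
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
            for xi, m, v in zip(x, means, vars_))
    return max(params, key=logpost)

random.seed(4)
# Class 1: controller intervened (large, variable prediction errors).
X = [[random.gauss(5, 1), random.gauss(2, 0.5)] for _ in range(50)] + \
    [[random.gauss(0, 1), random.gauss(0, 0.5)] for _ in range(50)]
y = [1] * 50 + [0] * 50
model = fit_gnb(X, y)
acc = sum(predict_gnb(model, x) == lab for x, lab in zip(X, y)) / len(y)
```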


Session 11 Lifetime Data Analysis

Analysis of Multiple Type Recurrent Events When Only Partial Information Is Available for Some Subjects
Min Zhan and Jeffery Fink
University of Maryland
mzhan@epi.umaryland.edu
In many longitudinal studies, subjects may experience multiple types of recurrent events. In some situations, the exact occurrence times of the recurrent events are not observed for some subjects. Instead, the only information available is whether these subjects experience each type of event in successive time intervals. We discuss marginal models to assess the effect of baseline covariates on the recurrent events. The proposed methods are applied to a clinical study of chronic kidney disease, in which subjects can experience multiple types of safety events repeatedly.

Cumulative Incidence Function under Two-Stage Randomization
Idil Yavuz1, Yu Cheng2 and Abdus Wahed2
1Dokuz Eylul University; 2University of Pittsburgh
yucheng@pitt.edu
In recent years, personalized medicine and dynamic treatment regimens have drawn considerable attention. Dynamic treatment regimens are sets of rules that govern the treatment of subjects depending on their intermediate responses or covariates. Two-stage randomization is a useful set-up to gather data for making inference on such regimens. Meanwhile, more and more practitioners have become aware of competing-risk censoring for event-time outcomes, where subjects in a study are exposed to more than one possible failure and the specific event of interest may be dependently censored by the occurrence of competing events. We aim to compare several treatment regimens from a two-stage randomized trial on survival outcomes that are subject to competing-risk censoring. With the presence of competing risks, the cumulative incidence function (CIF) has been widely used to quantify the cumulative probability of occurrence of the target event by a specific time point. However, if we only use the data from those subjects who have followed a specific treatment regimen to estimate the CIF, the resulting naive estimator may be biased. Hence, we propose alternative non-parametric estimators for the CIF using inverse weighting and provide inference procedures based on the asymptotic linear representation. In addition, test procedures are developed to compare the CIFs from two different treatment regimens. Through simulation, we show the practicality and advantages of the proposed estimators compared to the naive estimator. Since dynamic treatment regimens are widely used in treating cancer, AIDS, psychological disorders and other illnesses that require complex treatment, and competing-risk censoring is common in studies with multiple endpoints, the proposed methods provide useful inferential tools to analyze such data and will help advocate research in personalized medicine.
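As background for the CIF discussed above, the standard nonparametric (Aalen-Johansen-type) estimator for competing risks can be sketched as follows; the data are hypothetical, and the two-stage inverse weighting that the abstract proposes on top of this building block is not reproduced here.

```python
# Nonparametric cumulative incidence: P(T <= t, event = cause) estimated
# as sum over event times t_j <= t of S(t_j-) * d_j(cause) / n_j,
# where cause 0 denotes censoring.
def cumulative_incidence(times, causes, cause, t):
    events = sorted(set(tt for tt, c in zip(times, causes) if c != 0))
    surv, cif = 1.0, 0.0          # overall survival just before t_j; CIF
    for tj in events:
        if tj > t:
            break
        at_risk = sum(tt >= tj for tt in times)
        d_cause = sum(tt == tj and c == cause for tt, c in zip(times, causes))
        d_all = sum(tt == tj and c != 0 for tt, c in zip(times, causes))
        cif += surv * d_cause / at_risk
        surv *= 1.0 - d_all / at_risk
    return cif

times = [2, 3, 3, 5, 6, 7, 8, 10, 12, 15]
causes = [1, 2, 1, 0, 1, 2, 0, 1, 0, 2]   # 1 = target, 2 = competing event
cif1 = cumulative_incidence(times, causes, cause=1, t=10)
cif2 = cumulative_incidence(times, causes, cause=2, t=10)
```

Note that, unlike one minus a cause-specific Kaplan-Meier curve, the cause-specific CIFs computed this way never sum above one.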

Nonparametric Threshold Selection with Censored Survival Data
Xinhua Liu and Zhezhen Jin
Columbia University
zj7@columbia.edu
In biomedical research and practice, quantitative biomarkers are often used for diagnostic or prognostic purposes, with a threshold established on the measurement to aid binary classification. When prognosis is on survival time, a single threshold may not be informative. It is also challenging to select a threshold when the survival time is subject to random censoring. Using survival-time-dependent sensitivity and specificity, we extend the classification-accuracy-based objective function to allow for a survival-dependent threshold. To estimate the optimal threshold for a range of survival rates, we adopt a non-parametric procedure, which produces satisfactory results in a simulation study. The method will be illustrated with a real example.

Session 12 Safety Signal Detection and Safety Analysis

Evaluation of Statistical Methods for the Identification of Potential Safety Signals
Maggie Chen1, Li Zhu1, Padmaja Chiruvolu, Liying Zhang and Qi Jiang
Amgen Inc.
magchen@amgen.com

With the increased regulatory requirements for risk evaluation and minimization strategies, large volumes of comprehensive safety data have been collected and maintained by pharmaceutical sponsors, and proactive evaluation of such safety data for continuous assessment of product safety profiles has become essential during the drug development life-cycle. This presentation will introduce several key statistical methodologies developed for safety signal screening and detection, including some methods recommended by regulatory agencies for spontaneous reporting data, as well as a few recently developed methodologies for clinical trials data. In addition, extensive simulation results will be presented to compare the performance of these methods in terms of sensitivity and false discovery rate. The conclusions and recommendations will be briefed as well.

Application of a Bayesian Method for Blinded Safety Monitoring and Signal Detection in Clinical Trials
Shihua Wen, Jyotirmoy Dey, Greg Ball and Karolyn Kracht
AbbVie Inc.
shihuawen@abbvie.com

Monitoring patient safety is an indispensable component of clinical trial planning and conduct. Proactive blinded safety monitoring and signal detection in on-going clinical trials enables pharmaceutical sponsors to monitor patient safety closely and at the same time maintain the study blind. Bayesian methods, by their nature of updating knowledge based on accumulating data, provide an excellent framework for carrying out such a safety monitoring process. This presentation will provide a step-by-step illustration of how several Bayesian models, such as the beta-binomial model, the Poisson-gamma model, and the posterior probability vs. predictive probability criterion, can be applied to safety monitoring for a particular adverse event of special interest (AESI) in a real clinical trial setting under various adverse event occurrence patterns.
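The beta-binomial piece of such a monitoring rule is easy to sketch: flag the AESI when the posterior probability that the pooled (blinded) event rate exceeds a threshold is large. The prior, rate threshold, and flagging cutoff below are illustrative choices, not those of the implementation described in the abstract.

```python
# Beta-binomial posterior-probability monitoring rule (Monte Carlo version).
import random

def posterior_prob_exceeds(x, n, p0, a=1.0, b=1.0, n_draws=20000, seed=5):
    """Pr(rate > p0 | x events in n subjects) under a Beta(a, b) prior.

    Conjugacy gives the posterior Beta(a + x, b + n - x) in closed form;
    we approximate its tail probability by simple Monte Carlo draws.
    """
    rng = random.Random(seed)
    post_a, post_b = a + x, b + n - x
    return sum(rng.betavariate(post_a, post_b) > p0
               for _ in range(n_draws)) / n_draws

# Hypothetical blinded pooled data: 9 AESI cases among 120 subjects.
# Flag the signal if Pr(rate > 0.05) >= 0.80.
prob = posterior_prob_exceeds(x=9, n=120, p0=0.05)
flag = prob >= 0.80
```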

Some Thoughts on the Choice of Metrics for Safety Evaluation
Steven Snapinn
Amgen Inc.
ssnapinn@amgen.com

The magnitude of the treatment effect on adverse events can be assessed on a relative scale, such as the hazard ratio or the relative risk, or on an absolute scale, such as the risk difference, but there doesn't appear to be any consistency regarding which metric should be used in any given situation. In this presentation I will provide some examples where different metrics have been used, discuss their advantages and disadvantages, and provide a suggested approach.


Hypothesis Testing on Safety Data: A Recurrent Event Approach
Qi Gong1 and Liang Fang2
1Amgen Inc.; 2Gilead Sciences
liangfang@gilead.com
As an important aspect of the clinical evaluation of an investigational therapy, safety data are routinely collected in clinical trials. To date, the analysis of safety data has largely been limited to descriptive summaries of incidence rates or contingency tables aiming to compare simple rates between treatment arms. Many have argued this traditional approach fails to take into account important information, including severity, onset time, and multiple occurrences of a safety signal. In addition, premature treatment discontinuation due to excessive toxicity causes informative censoring and may lead to potential bias in the interpretation of safety outcomes. In this article, we propose a framework to summarize safety data with the mean frequency function and compare safety events of interest between treatments with a generalized log-rank test, taking into account the aforementioned characteristics ignored in traditional analysis approaches. In addition, a multivariate generalized log-rank test to compare the overall safety profile of different treatments is proposed. In the proposed method, safety events are considered to follow a recurrent event process with a terminal event for each patient. The terminal event is modeled by a process of two types of competing risks: safety events of interest and other terminal events. Statistical properties of the proposed method are investigated via simulations. An application is presented with data from a phase II oncology trial.

Session 13 Survival and Recurrent Event Data Analysis

Survival Analysis without Survival Data
Gary Chan
University of Washington
kcgchan@uw.edu
We show that relative mean survival parameters of a semiparametric log-linear model can be estimated using covariate data from an incident sample and a prevalent sample, even when there is no prospective follow-up to collect any survival data. Estimation is based on an induced semiparametric density ratio model for covariates from the two samples, and it shares the same structure as a logistic regression model for case-control data. Likelihood inference coincides with well-established methods for case-control data. We show two further related results. First, estimation of interaction parameters in a survival model can be performed using covariate information only from a prevalent sample, analogous to a case-only analysis. Furthermore, propensity score and conditional exposure effect parameters on survival can be estimated using only covariate data collected from incident and prevalent samples.

Semiparametric Estimation for the Additive Hazards Model with Left-Truncated and Right-Censored Data
Chiung-Yu Huang1 and Jing Qin2
1Johns Hopkins University; 2National Institute of Allergy and Infectious Diseases
cyhuang@jhmi.edu
Survival data from prevalent cases collected under a cross-sectional sampling scheme are subject to left-truncation. When fitting an additive hazards model to left-truncated data, the conditional estimating equation method (Lin and Ying, 1994), obtained by modifying the risk sets to account for left-truncation, can be very inefficient, as the marginal likelihood of the truncation times is not used in the estimation procedure. In this paper, we use a pairwise pseudo-likelihood to eliminate nuisance parameters from the marginal likelihood and, by combining the marginal pairwise pseudo-score function and the conditional estimating function, propose an efficient estimator for the additive hazards model. The proposed estimator is shown to be consistent and asymptotically normally distributed, with a sandwich-type covariance matrix that can be consistently estimated. Simulation studies show that the proposed estimator is more efficient than its competitors. A data analysis illustrates the method.

Nonparametric Method for Data of Recurrent Infections after Hematopoietic Cell Transplantation
Chi Hyun Lee1, Xianghua Luo1, Chiung-Yu Huang2 and Todd DeFor1
1University of Minnesota; 2Johns Hopkins University
luox0054@umn.edu
Infection is one of the most common complications after hematopoietic cell transplantation. It accounts for substantial morbidity and mortality among transplanted patients. Many patients experience infectious complications repeatedly over time. Existing statistical methods for recurrent gap time data typically assume that patients are enrolled due to the occurrence of an event of the same type as the recurrent event, or assume that all gap times, including the first gap, are identically distributed. Applying these methods to the post-transplant infection data while ignoring event types will inevitably lead to incorrect inferential results, because the time from the transplant to the first infection has a different biological meaning than the gap times between recurrent infections after the first infection occurs. Alternatively, one may only analyze data after the first infection to make the existing recurrent gap time methods applicable, but this introduces selection bias, because only patients who have experienced infections are included in the analysis. Other naive approaches may include using univariate survival analysis methods, e.g., the Kaplan-Meier method, on the first-infection-only data, or using bivariate serial event data methods on the data up to the second infections. Hence, all subsequent infection data beyond the first or the second infectious events will not be utilized in the analysis. These inefficient methods are expected to lead to decreased power. In this paper, we propose a nonparametric estimator of the joint distribution of the time from transplant to the first infection and the gap times between following infections, and a semiparametric regression model for studying the risk factors of infectious complications of the transplant patients. The proposed methods take into account the potentially differential distribution of the two types of times (time from transplant to the first infection and the gap times between subsequent recurrent infections) and fully utilize the data of recurrent infections from patients. Asymptotic properties of the proposed estimators are established.

Session 14 Statistical Analysis on Massive Data from Point Processes

Identification of Synaptic Learning Rule from Ensemble Spiking Activities
Dong Song and Theodore W. Berger
University of Southern California
dsong@usc.edu


The brain represents and processes information with spikes. To understand the biological basis of brain functions, it is essential to model the spike train transformations performed by brain regions. Such a model can also be used as a computational basis for developing cortical prostheses that can restore lost cognitive function by bypassing the damaged brain regions. We formulate a three-stage strategy for such a modeling goal. First, we formulated a multiple-input, multiple-output (MIMO) physiologically plausible model for representing the nonlinear dynamics underlying spike train transformations. This model is equivalent to a cascade of a Volterra model and a generalized linear model. The model has been successfully applied to the hippocampal CA3-CA1 during learned behaviors. Secondly, we extend the model to nonstationary cases using a point-process adaptive filter technique. The resulting time-varying model captures how the MIMO nonlinear dynamics evolve with time when the animal is learning. Lastly, we seek to identify the learning rule that explains how the nonstationarity is formed as a consequence of the input-output flow that the brain region has experienced during learning.

Intrinsically Weighted Means and Non-Ergodic Marked Point Processes
Alexander Malinowski1, Martin Schlather1 and Zhengjun Zhang2
1University of Mannheim; 2University of Wisconsin
zjz@stat.wisc.edu
Whilst the definition of characteristics such as the mean mark in a marked point process (MPP) setup is unambiguous for ergodic processes, several definitions of mark averages are possible and might be practically relevant in the stationary but non-ergodic case. We give a general approach via weighted means, with possibly intrinsically given weights. We discuss estimators in this situation and show their consistency and asymptotic normality under certain conditions. We also suggest a specific choice of weights that has a minimal-variance interpretation under suitable assumptions.

Statistical Analysis for Unlabeled Data Objects
Ela Sienkiewicz and Haonan Wang
Colorado State University
sienkiew@stat.colostate.edu
This talk is motivated by a data set of brain neuron cells. Each neuron is modeled as an unlabeled data object, with topological and geometric properties characterizing the branching structure, connectedness and orientation of a neuron. This poses serious challenges, since traditional statistical methods for multivariate data rely on linear operations in Euclidean space. We develop two curve representations for each object and define the notion of percentiles based on measures of topological and geometric variations through multi-objective optimization. In general, numerical solutions can be provided by implementing a genetic algorithm. The proposed methodology is illustrated by analyzing a data set of pyramidal neurons.

Session 15 High Dimensional Inference (or Testing)

Adaptive Sparse Reduced-rank Regression
Zongming Ma and Tingni Sun
University of Pennsylvania
tingni@wharton.upenn.edu
This paper studies the problem of estimating a large coefficient matrix in a multiple response linear regression model when the coefficient matrix is both sparse and of low rank. We are especially interested in the high dimensional settings where the number of predictors and/or response variables can be much larger than the number of observations. We propose a new estimation scheme, which achieves competitive numerical performance while significantly reducing computation time when compared with state-of-the-art methods. Moreover, we show the proposed estimator achieves near optimal non-asymptotic minimax rates of estimation under a collection of squared Schatten norm losses simultaneously, by providing both error bounds for the estimator and minimax lower bounds. In particular, such optimality results hold in the high dimensional settings.

Variable Screening in Biothreat Detection Using Weighted Leverage Score
Wenxuan Zhong and Yiwen Liu
University of Georgia
yiwenliu@uga.edu
The early detection of a biothreat is extremely difficult because most of the early clinical signs in infected subjects are indistinguishable "flu-like" symptoms. Recent research shows that genomic markers are the most reliable indicators, and thus they have been widely used in the existing detection methods in the past decades. In this talk, I will introduce a biomarker screening method based on the weighted leverage score. The weighted leverage score is a variant of the leverage score that has been widely used for the diagnostics of linear regression. Empirical studies demonstrate that the weighted leverage score is not only computationally efficient but also statistically effective in variable screening.
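For context, the ordinary leverage scores that the weighted variant builds on are the diagonal entries h_ii of the hat matrix H = X(X'X)^{-1}X'. The sketch below computes them for a small simulated design; the speaker's weighting scheme itself is not reproduced.

```python
# Ordinary regression leverage scores via explicit matrix algebra.
import random

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def inverse(M):
    """Gauss-Jordan inverse for a small positive-definite matrix."""
    n = len(M)
    aug = [row[:] + [float(i == j) for j in range(n)]
           for i, row in enumerate(M)]
    for i in range(n):
        piv = aug[i][i]
        aug[i] = [v / piv for v in aug[i]]
        for k in range(n):
            if k != i:
                f = aug[k][i]
                aug[k] = [v - f * w for v, w in zip(aug[k], aug[i])]
    return [row[n:] for row in aug]

def leverage_scores(X):
    """Diagonal of the hat matrix H = X (X'X)^{-1} X'."""
    Xt = [list(col) for col in zip(*X)]
    H = matmul(matmul(X, inverse(matmul(Xt, X))), Xt)
    return [H[i][i] for i in range(len(X))]

random.seed(6)
n, p = 30, 3
X = [[1.0] + [random.gauss(0, 1) for _ in range(p - 1)] for _ in range(n)]
lev = leverage_scores(X)
```

A useful sanity check is that the leverages lie in [0, 1] and sum to the column rank p, which is what screening rules exploit when ranking observations or variables.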

Testing High-Dimensional Nonparametric Function with Application to Gene Set Analysis
Tao He, Ping-Shou Zhong, Yuehua Cui and Vidyadhar Mandrekar
Michigan State University
pszhong@stt.msu.edu
This paper proposes a test statistic for testing a high-dimensional nonparametric function in a reproducing kernel Hilbert space generated by a positive definite kernel. We studied the asymptotic distribution of the test statistic under the null hypothesis and a series of local alternative hypotheses in a "large p, small n" setup. A simulation study was used to evaluate the finite sample performance of the proposed method. We applied the proposed method to yeast data and thyroid hormone data to identify pathways that are associated with traits of interest.

Zero-Inflation in Clustered Binary Response Data: Mixed Model and Estimating Equation Approaches
Danping Liu
National Institutes of Health
danpingliu@nih.gov
The NEXT Generation Health Study investigates the dating violence of adolescents using a survey questionnaire. Each student is asked to affirm or deny multiple instances of violence in his/her dating relationship. There is, however, evidence suggesting that students not in a relationship responded to the survey, resulting in excessive zeros in the responses. This paper proposes likelihood-based and estimating equation approaches to analyze the zero-inflated clustered binary response data. We adopt a mixed model method to account for the cluster effect, and the model parameters are estimated using a maximum-likelihood (ML) approach that requires a Gauss-Hermite quadrature (GHQ) approximation for implementation. Since an incorrect assumption on the random effects distribution may bias the results, we construct generalized estimating equations (GEE) that do not require the correct specification of within-cluster correlation. In a series of simulation studies, we examine


the performance of ML and GEE methods in terms of their bias, efficiency and robustness. We illustrate the importance of properly accounting for this zero-inflation by re-analyzing the NEXT data, where this issue has previously been ignored.

Session 16 Phase II Clinical Trial Design with Survival Endpoint

Utility-Based Optimization of Schedule-Dose Regimes Based on the Times to Response and Toxicity
Peter F. Thall1, Hoang Q. Nguyen1, Thomas Braun2 and Muzaffar Qazilbash1
1University of Texas MD Anderson Cancer Center; 2University of Michigan
rex@mdanderson.org

A two-stage Bayesian phase I-II design for jointly optimizing administration schedule and dose of an experimental agent based on the times to response and toxicity is described. Sequentially adaptive decisions are based on the joint utility of the two event times. A utility surface is constructed by partitioning the two-dimensional quadrant of event time pairs into rectangles, eliciting a numerical utility for each rectangle, and fitting a smooth parametric function to the elicited values. Event times are modeled using gamma distributions, with shape and scale parameters both functions of schedule and dose. In stage 1, patients are randomized fairly among schedules, and a dose is chosen within each schedule using an algorithm that hybridizes greedy optimization and randomization among nearly optimal doses. In stage 2, fair randomization among schedules is replaced by the hybrid algorithm. An extension to accommodate death or discontinuation of follow-up is described. The design is illustrated by an autologous stem cell transplantation trial in multiple myeloma.

Bayesian Decision Theoretic Two-Stage Design in Phase II Clinical Trials with Survival Endpoint
Lili Zhao and Jeremy Taylor
University of Michigan
zhaolili@umich.edu
In this study, we consider two-stage designs with failure-time endpoints in single-arm phase II trials. We propose designs in which stopping rules are constructed by comparing the Bayes risk of stopping at stage one to the expected Bayes risk of continuing to stage two, using both the observed data in stage one and the predicted survival data in stage two. Terminal decision rules are constructed by comparing the posterior expected loss of a rejection decision versus an acceptance decision. Simple threshold loss functions are applied to time-to-event data modelled either parametrically or non-parametrically, and the cost parameters in the loss structure are calibrated to obtain the desired Type I error and power. We ran simulation studies to evaluate design properties, including Type I and II errors, probability of early stopping, expected sample size, and expected trial duration, and compared them with the Simon two-stage designs and a design which is an extension of Simon's designs with time-to-event endpoints. An example based on a recently conducted phase II sarcoma trial illustrates the method.

Single-Arm Phase II Group Sequential Trial Design with Survival Endpoint at a Fixed Time Point
Jianrong Wu and Xiaoping Xiong
St. Jude Children's Research Hospital
jianrong.wu@stjude.org

Three non-parametric test statistics are proposed to design single-arm phase II group sequential trials for monitoring survival probability. The small-sample properties of these test statistics are studied through simulations. Sample size formulas are derived for the fixed sample test. The Brownian motion property of the test statistics allowed us to develop a flexible group sequential design using a sequential conditional probability ratio test procedure.

Session 17 Statistical Modeling of High-throughput Genomics Data

Learning Genetic Architecture of Complex Traits Across Populations
Marc Coram, Sophie Candille and Hua Tang
Stanford University
hualtang@gmail.com

Genome-wide association studies (GWAS) have successfully revealed many loci that influence complex traits and disease susceptibilities. An unanswered question is: to what extent does the genetic architecture underlying a trait overlap between human populations? We explore this question using blood lipid concentrations as a model trait. In African Americans and Hispanic Americans participating in the Women's Health Initiative SNP Health Association Resource, we validated one African-specific HDL locus, as well as 14 known lipid loci that have been previously implicated in studies of European populations. Moreover, we demonstrate striking similarities in the genetic architecture (loci influencing the trait, direction and magnitude of genetic effects, and proportions of phenotypic variation explained) of lipid traits across populations. In particular, we found that a disproportionate fraction of lipid variation in African Americans and Hispanic Americans can be attributed to genomic loci exhibiting statistical evidence of association in Europeans, even though the precise genes and variants remain unknown. At the same time, we found substantial allelic heterogeneity within shared loci, characterized both by population-specific rare variants and variants shared among multiple populations that occur at disparate frequencies. The allelic heterogeneity emphasizes the importance of including diverse populations in future genetic association studies of complex traits such as lipids; furthermore, the overlap in lipid loci across populations of diverse ancestral origin argues that additional knowledge can be gleaned from multiple populations. We discuss how the overlapping genetic architecture can be exploited to improve the efficiency of GWAS in minority populations.

A Bayesian Hierarchical Model to Detect Differentially Methylated Loci from Single Nucleotide Resolution Sequencing Data
Hao Feng, Karen Conneely and Hao Wu
Emory University
haowu@emory.edu

DNA methylation is an important epigenetic modification that has essential roles in cellular processes, including gene regulation, development and disease, and is widely dysregulated in most types of cancer. Recent advances in sequencing technology have enabled the measurement of DNA methylation at single nucleotide resolution through methods such as whole-genome bisulfite sequencing and reduced representation bisulfite sequencing. In DNA methylation studies, a key task is to identify differences under distinct biological contexts, for example between tumor and normal tissue. A challenge in sequencing studies is that the number of biological replicates is often limited by the costs of sequencing. The small number

2014 ICSA-KISS Applied Statistics Symposium Portland June 15-18 47

Abstracts

of replicates leads to unstable variance estimation, which can reduce accuracy to detect differentially methylated loci (DML). Here we propose a novel statistical method to detect DML when comparing two treatment groups. The sequencing counts are described by a lognormal-beta-binomial hierarchical model, which provides a basis for information sharing across different CpG sites. A Wald test is developed for hypothesis testing at each CpG site. Simulation results show that the proposed method yields improved DML detection compared to existing methods, particularly when the number of replicates is low. The proposed method is implemented in the Bioconductor package DSS.

Differential Isoform Expression Analysis in RNA-Seq using Random-Effects Meta-Regression
Weihua Guan1, Rui Xiao2, Chun Li3 and Mingyao Li2
1University of Minnesota
2University of Pennsylvania
3Vanderbilt University
rxiao@mail.med.upenn.edu

A major application of RNA-Seq is to detect differential isoform expression across experimental conditions. However, this is challenging because of uncertainty in isoform expression estimation owing to ambiguous reads, and because of variability in the precision of the estimates across samples. It is desirable to have a method that can account for these issues and also allows adjustment for covariates. In this paper, we present a random-effects meta-regression approach that naturally fits this purpose. Through extensive simulations and analysis of an RNA-Seq dataset on human heart failure, we show that this approach is computationally fast and reliable, and can improve the power of differential expression analysis while controlling for false positives due to the effect of covariates or confounding variables.
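The random-effects machinery behind such a meta-regression can be illustrated with the classical DerSimonian-Laird estimator. The sketch below is only a generic stand-in for the approach described above: the effect estimates and variances are invented, and no covariates or isoform-specific modeling are included.

```python
import numpy as np

def dersimonian_laird(effects, variances):
    """Classical DerSimonian-Laird random-effects pooling of per-unit
    effect estimates with known sampling variances."""
    y = np.asarray(effects, dtype=float)
    v = np.asarray(variances, dtype=float)
    w = 1.0 / v
    fixed = np.sum(w * y) / np.sum(w)            # fixed-effect pooled estimate
    q = np.sum(w * (y - fixed) ** 2)             # Cochran's Q statistic
    k = len(y)
    # Method-of-moments between-unit variance, truncated at zero
    tau2 = max(0.0, (q - (k - 1)) / (np.sum(w) - np.sum(w ** 2) / np.sum(w)))
    wr = 1.0 / (v + tau2)                        # random-effects weights
    return np.sum(wr * y) / np.sum(wr), tau2

# Invented per-sample effect estimates and their variances
pooled, tau2 = dersimonian_laird([0.20, 0.50, 0.35, 0.80],
                                 [0.04, 0.05, 0.03, 0.06])
print(round(pooled, 3), round(tau2, 4))
```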

Allele-Specific Differential Methylation Analysis with Next Generation Methylation Sequencing Data
Fei Zou
University of North Carolina at Chapel Hill
feizou@email.unc.edu

Next generation Methyl-seq data collected from F1 reciprocal crosses in mouse can powerfully dissect strain and parent-of-origin effects on allele-specific methylation. In this talk, we present a novel statistical approach to analyze Methyl-seq data, motivated by an F1 mouse study. Our method jointly models the strain and parent-of-origin effects, deals with the over-dispersion problem commonly observed in read counts, and can flexibly adjust for the effects of covariates such as sex and read depth. We also propose a genomic control procedure to properly control the type I error for Methyl-seq studies where the number of samples is small.

Session 18 Statistical Applications in Finance

A Stochastic Mixture Model for Economic Cycles
Haipeng Xing1 and Ning Sun2

1State University of New York
2IBM
xing@ams.sunysb.edu

Markov switching models have been used in various applications in economics and finance. As existing Markov switching models describe the regimes or parameter values in a categorical way, they are restrictive in practical analysis. In this paper, we introduce a mixture model with stochastic regimes, in which the regimes and

model parameters are represented both categorically and continuously. Assuming conjugate priors, we develop closed-form recursive Bayes estimates of the regression parameters; an approximation scheme that has much lower computational complexity, yet is comparable to the Bayes estimates in statistical efficiency; and an expectation-maximization procedure to estimate the unknown hyper-parameters. We conduct intensive simulation studies to evaluate the performance of the Bayes estimates of time-varying parameters and their approximations. We further apply the proposed model to analyze the series of US monthly total non-farm employment.

Statistical Modelling of Bidding Prices in Online ad Position Auctions
Xiaoming Huo
Georgia Institute of Technology
xiaoming@isye.gatech.edu

Ad position auctions are being held all the time in nearly all web search engines, and have become the major source of revenue in online advertising. We study statistical models of the bidding prices. Two approaches are explored: (1) a game theoretic approach that characterizes bidders' behavior, and (2) a statistical generative approach, which aims at mimicking the fundamental mechanism underlying the bidding process. We compare and contrast these two approaches and describe how the auctioneer can take advantage of the obtained knowledge.

Regression with Rank Covariates: A Distribution Guided Score for Ranks
Do Hwan Park1, Yuneung Kim2, Johan Lim3, Sujung Choi4 and Hsun-Chih Kuo5

1University of Maryland
2Seoul National University
3Auburn University
4Ulsan National Institute of Science and Technology
5National Chengchi University
johanlim@snu.ac.kr

This work is motivated by a hand-collected data set from one of the largest internet portals in Korea. The data set records the top 30 most frequently discussed stocks on its online stock message board, which can be considered a measure of investors' attention to individual stocks. The empirical goal of the data set is to investigate the effect of attention on trading behavior. To do so, we consider a regression model whose response is either stock return performance or trading volume, and whose covariates are the daily-observed partial ranks as well as other covariates influential to the response. In estimating the regression model, the rank covariate is often treated as an ordinal categorical variable or simply transformed into a score variable (mostly using the identity score function). In this paper, we start our discussion with the univariate regression problem, where we find the asymptotic normality of the regression coefficient estimator, whose mean is 0 and variance is an unknown function of the distribution of X. We then straightforwardly extend the results of univariate regression to multiple regression and obtain a similar asymptotic distribution. We finally consider an estimator for multiple sets by extending or combining the estimators of each single set. We apply our proposed distribution guided scoring function to the motivating data set to empirically demonstrate the attention effect.

Optimal Sparse Volatility Matrix Estimation for High Dimensional Ito Processes with Measurement Errors
Minjing Tao1, Yazhen Wang2 and Harrison Zhou3

1Florida State University
2University of Wisconsin-Madison



3Yale University
tao@stat.fsu.edu

Stochastic processes are often used to model complex scientific problems in fields ranging from biology and finance to engineering and physical science. This talk investigates rate-optimal estimation of the volatility matrix of a high dimensional Ito process observed with measurement errors at discrete time points. The minimax rate of convergence is established for estimating sparse volatility matrices. By combining the multi-scale and threshold approaches, we construct a volatility matrix estimator that achieves the optimal convergence rate. The minimax lower bound is derived by considering a subclass of Ito processes, for which the minimax lower bound is obtained through a novel equivalent model of covariance matrix estimation for independent but non-identically distributed observations, and through a delicate construction of the least favorable parameters. In addition, a simulation study was conducted to test the finite sample performance of the optimal estimator, and the simulation results were found to support the established asymptotic theory.

Session 19 Hypothesis Testing

A Score-type Test for Heterogeneity in Zero-inflated Models in a Stratified Population
Guanqun Cao1, Wei-Wen Hsu2 and David Todem3

1Auburn University
2Kansas State University
3Michigan State University
gzc0009@auburn.edu

We propose a score-type statistic to evaluate heterogeneity in zero-inflated models for count data in a stratified population, where heterogeneity is defined as instances in which the zero counts are generated from two sources. In this work, we extend the literature by describing a score-type test to evaluate homogeneity against general alternatives that do not neglect the stratification information under the alternative hypothesis. Our numerical simulation studies show that the proposed test can greatly improve efficiency over tests of heterogeneity that ignore the stratification information. An empirical application to dental caries data in early childhood further shows the importance and practical utility of the methodology in using the stratification profile to detect heterogeneity in the population.

Inferences on Correlation Coefficients of Bivariate Log-normal Distributions
Guoyi Zhang1 and Zhongxue Chen2

1University of New Mexico
2Indiana University
gzhang123@gmail.com

This research considers inference on the correlation coefficients of bivariate log-normal distributions. We develop a generalized confidence interval and hypothesis tests for the correlation coefficient, and extend the results to comparing two independent correlations. Simulation studies show that the suggested methods work well even for small samples. The methods are illustrated using two practical examples.
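For context, the correlation of a bivariate log-normal pair is tied to the parameters of the underlying bivariate normal by a standard identity, which the check below verifies numerically. This illustrates the relationship only, not the generalized confidence interval procedure of the abstract; the parameter values are invented.

```python
import numpy as np

rng = np.random.default_rng(42)

# Known identity: if (X1, X2) is bivariate normal with correlation rho and
# standard deviations s1, s2, then Y_i = exp(X_i) are bivariate log-normal with
# corr(Y1, Y2) = (exp(rho*s1*s2) - 1) / sqrt((exp(s1^2)-1) * (exp(s2^2)-1)).
rho, s1, s2 = 0.6, 0.5, 0.8
cov = [[s1**2, rho * s1 * s2], [rho * s1 * s2, s2**2]]
X = rng.multivariate_normal([0.0, 0.0], cov, size=200_000)
Y = np.exp(X)

theoretical = (np.exp(rho * s1 * s2) - 1) / np.sqrt(
    (np.exp(s1**2) - 1) * (np.exp(s2**2) - 1))
empirical = np.corrcoef(Y[:, 0], Y[:, 1])[0, 1]
print(round(theoretical, 3), round(empirical, 3))
```

Note that the log-normal correlation (about 0.54 here) is attenuated relative to the normal-scale correlation of 0.6, which is why inference must work through the underlying normal parameters.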

Testing Calibration of Risk Models at Extremes of Disease-Risk
Minsun Song1, Peter Kraft2, Amit D. Joshi2, Myrto Barrdahl3 and Nilanjan Chatterjee1
1National Cancer Institute
2Harvard University
3German Cancer Research Center
songm4@mail.nih.gov

Risk-prediction models need careful calibration to ensure they produce unbiased estimates of risk for subjects in the underlying population, given their risk-factor profiles. As subjects with extreme high or low risk may be the most affected by knowledge of their risk estimates, checking the adequacy of risk models at the extremes of risk is very important for clinical applications. We propose a new approach to test model calibration targeted toward the extremes of the disease risk distribution, where standard goodness-of-fit tests may lack power due to sparseness of data. We construct a test statistic based on model residuals summed over only those individuals who pass high and/or low risk-thresholds, and then maximize the test statistic over different risk-thresholds. We derive an asymptotic distribution for the max-test statistic based on analytic derivation of the variance-covariance function of the underlying Gaussian process. The method is applied to a large case-control study of breast cancer to examine joint effects of common SNPs discovered through recent genome-wide association studies. The analysis clearly indicates a non-additive effect of the SNPs on the scale of absolute risk, but an excellent fit for the linear-logistic model even at the extremes of risks.

Statistical Issues When Incidence Rates Are Extremely Low and Sample Sizes Very Big
Peter Hu and Haijun Ma
Amgen Inc.
phu@amgen.com

It is well known that sample sizes of clinical trials are often not big enough to assess adverse events (AE) with very low incidence rates. Large-scale observational studies, such as pharmacovigilance studies using healthcare databases, provide an alternative resource for assessment of very rare adverse events. Healthcare databases often can easily provide tens of thousands of exposed patients, which potentially allows the assessment of events as rare as in the magnitude of < 10^-4. In this talk, we discuss the performance of various commonly used statistical methods for comparison of binomial proportions of very rare events. The statistical power, type I error control, confidence interval (CI) coverage, length of confidence interval, bias and variability of treatment effect estimates, as well as the distribution of the CI upper bound, etc., will be examined and compared for the different methods. Power calculation is often necessary for study planning purposes. However, many commonly used power calculation methods are based on approximation and may give erroneous estimates of power when events are rare. We will compare the power estimates for different methods provided by SAS Proc Power and empirically obtained via simulation. The use of relative risks (RR) and risk differences (RD) will also be commented on. Based on these results, several recommendations are given to guide sample size assessments for such types of studies at the design stage.
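A minimal version of the approximation-versus-simulation comparison described above can be set up with a two-sample z-test of proportions. This is an illustrative stand-in, not the talk's catalogue of methods or SAS Proc Power; the event rates and sample size are invented.

```python
import numpy as np
from math import sqrt, erf

rng = np.random.default_rng(7)
Z = 1.959963985  # two-sided 5% normal critical value

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def approx_power(p0, p1, n):
    """Normal-approximation power for the two-sample z-test of proportions."""
    se = sqrt(p0 * (1 - p0) / n + p1 * (1 - p1) / n)
    return phi((p1 - p0) / se - Z)

def simulated_power(p0, p1, n, reps=4000):
    """Empirical power of the same z-test; rare-event tables with no events
    in either arm (zero standard error) are counted as non-rejections."""
    x0 = rng.binomial(n, p0, reps)
    x1 = rng.binomial(n, p1, reps)
    q0, q1 = x0 / n, x1 / n
    se = np.sqrt(q0 * (1 - q0) / n + q1 * (1 - q1) / n)
    ok = se > 0
    z = np.zeros(reps)
    z[ok] = (q1[ok] - q0[ok]) / se[ok]
    return float(np.mean(z > Z))

# With a 1-in-10,000 background rate, the approximation and the empirical
# power can disagree noticeably even at 20,000 patients per arm.
a_pow = approx_power(1e-4, 4e-4, 20000)
s_pow = simulated_power(1e-4, 4e-4, 20000)
print(round(a_pow, 3), round(s_pow, 3))
```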

Minimum Distance Regression Model Checking When Responses are Missing at Random
Xiaoyu Li
Auburn University
xzl0037@auburn.edu

This paper proposes a class of lack-of-fit tests for fitting a parametric regression model when response variables are missing at random. These tests are based on a class of minimum integrated square distances between a kernel-type estimator of a regression function and the parametric regression function being fitted. These tests are shown to be consistent against a large class of fixed alternatives. The corresponding test statistics are shown to have asymptotic normal distributions under the null hypothesis. Some simulation results are also presented.

Session 20 Design and Analysis of Clinical Trials

Application of Bayesian Approach in Assessing Rare Adverse Events during a Clinical Study
Grace Li, Karen Price, Haoda Fu and David Manner
Eli Lilly and Company
li_ying_grace@lilly.com

Bayesian analysis is gaining wider application in decision making throughout the drug development process, due to its more intuitive framework and ability to provide direct probabilistic answers to complex problems. Determining the risk profile for a compound throughout the phases of drug development is crucial, along with ensuring the most appropriate analyses are performed. In a conventional 2-arm parallel study design, rare adverse events are often assessed via frequentist approaches, such as Fisher's exact test, with its known limitations. This presentation will focus on the challenges of the frequentist approach to detect and evaluate potential safety signals in the rare event setting, and compare it with the proposed Bayesian approach. We will compare the operating characteristics of the frequentist and Bayesian approaches using simulated data. Most importantly, the proposed approach offers much more flexibility and a more direct probabilistic interpretation that improves the process of detecting rare safety signals. This approach highlights the strength of Bayesian methods for inference. The simulation results are intended to demonstrate the value of using Bayesian methods, and that appropriate application has the potential to increase the efficiency of decision making in drug development.
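One simple Bayesian treatment of a rare adverse event in a 2-arm design is a beta-binomial model with Monte Carlo computation of the posterior probability of excess risk. The sketch below assumes Jeffreys Beta(0.5, 0.5) priors and invented event counts; it illustrates the direct probabilistic interpretation rather than reproducing the authors' model.

```python
import numpy as np

rng = np.random.default_rng(1)

def pr_excess_risk(x_t, n_t, x_c, n_c, a=0.5, b=0.5, draws=100_000):
    """Posterior P(p_trt > p_ctrl) under independent Beta(a, b) priors
    (Jeffreys by default) with binomial likelihoods, via Monte Carlo."""
    p_t = rng.beta(a + x_t, b + n_t - x_t, draws)
    p_c = rng.beta(a + x_c, b + n_c - x_c, draws)
    return float((p_t > p_c).mean())

# 3 events in 1,000 treated vs 0 in 1,000 controls: a direct probability
# statement about excess risk, with no large-sample approximation.
prob = pr_excess_risk(3, 1000, 0, 1000)
print(round(prob, 3))
```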

A Simplified Varying-Stage Adaptive Phase II/III Clinical Trial Design
Gaohong Dong
Novartis Pharmaceuticals Corporation
gaohong.dong@novartis.com

Conventionally, adaptive phase II/III clinical trials are carried out with a strict two-stage design. Recently, Dong (Statistics in Medicine 2014; 33(8):1272-87) proposed a varying-stage adaptive phase II/III clinical trial design. In this design, following the first stage, an intermediate stage can be adaptively added to obtain more data, so that a more informative decision can be made regarding whether the trial can be advanced to the final confirmatory stage. Therefore, the number of further investigational stages is determined based upon data accumulated to the interim analysis. Later, Dong (2013 ICSA Symposium Book, to be published) investigated some characteristics of this design. This design considers two plausible study endpoints, with one of them initially designated as the primary endpoint. Based on interim results, the other endpoint can be switched in as the primary endpoint. However, in many therapeutic areas the primary study endpoint is well established; therefore we simplify this design to consider one study endpoint only. Our simulations show that, as with the original design, this simplified design controls the Type I error rate very well; the sample size increases as the threshold probability for the two-stage setting increases; and the alpha allocation ratio in the two-stage setting vs. the three-stage setting has a great impact on the design. However, this simplified design requires a larger sample size for the initial stage to overcome the power loss due to the futility. Compared to a strict two-stage phase II/III design, this simplified design improves the probability of trial success.

Improving Multiple Comparison Procedures with Coprimary Endpoints by Generalized Simes Tests
Hua Li1, Willi Maurer1, Werner Brannath2 and Frank Bretz1
1Novartis Pharmaceuticals Corporation
2University of Bremen
jennifer.li@novartis.com

For a fixed-dose combination of indacaterol acetate (long-acting beta2-agonist) and mometasone furoate (inhaled corticosteroid) for the once-daily maintenance treatment of asthma and Chronic Obstructive Pulmonary Disease (COPD), both lung function improvement and one symptom outcome improvement are required for the drug to be developed successfully. The symptom outcome could be Asthma Control Questionnaire (ACQ) improvement for the asthma program, and exacerbation rate reduction for the COPD program. Having two endpoints increases the probability of false positive results by chance alone, i.e., marketing a drug which is not, or is insufficiently, effective. Therefore, regulatory agencies require strict control of this probability at a pre-specified significance level (usually 2.5% 1-sided). The Simes test is often used in our clinical trials. However, the Simes test requires the assumption that the test statistics are positively correlated. This assumption is not always satisfied, or cannot be easily verified, when dealing with multiple endpoints. In this presentation, an extension of the Simes test, the generalized Simes test introduced by Maurer, Glimm and Bretz (2011), which is applicable to any correlation (positive, negative, or even no correlation), is utilized. Power benefits based on simulations are presented. FDA and other agencies have accepted this approach, indicating that the proposed method can be used in other trials in the future.
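For reference, the classical Simes global test that the generalized version extends rejects the global null when any ordered p-value falls under its stepped threshold. The sketch below implements only the classical test (the generalized test of Maurer, Glimm and Bretz is not reproduced here), with made-up p-values at the 2.5% one-sided level mentioned above.

```python
import numpy as np

def simes_test(pvals, alpha=0.025):
    """Classical Simes global test: reject H0 if p_(i) <= i * alpha / m
    for any i = 1, ..., m (p_(i) are the ordered p-values)."""
    p = np.sort(np.asarray(pvals, dtype=float))
    m = len(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    return bool(np.any(p <= thresholds))

# Two coprimary-style p-values: neither passes alpha/2 = 0.0125 alone
# (Bonferroni), but the second passes its Simes threshold of 0.025.
print(simes_test([0.014, 0.020]))   # True
```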

Efficient Design for Cluster Randomized Trials with Binary Outcomes
Sheng Wu, Weng Kee Wong and Catherine Crespi
University of California at Los Angeles
shengwu@ucla.edu

Cluster randomized trials (CRTs) are increasingly used for research in many fields, including public health, education, social studies and ethnic disparity studies. Equal allocation designs are often used in CRTs, but they may not be optimal, especially when cost considerations are taken into account. In this paper, we consider two-arm cluster randomized trials with a binary outcome, and develop various optimal designs when sampling costs for units and clusters are different and the primary outcome is attributable risk or relative risk. We consider both frequentist and Bayesian approaches in the context of cancer control and prevention cluster randomized trials, and present formulae for optimal sample sizes for the two arms for each outcome measure.
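The flavor of cost-aware allocation can be seen in the textbook formula for minimizing the variance of a difference in proportions under a budget constraint: each arm's size is proportional to its outcome standard deviation divided by the square root of its per-unit cost. The sketch below ignores intracluster correlation and the attributable/relative risk scales that the paper's formulae handle, so it is only a rough analogue with invented numbers.

```python
import math

def cost_optimal_allocation(p1, p2, c1, c2, budget):
    """Classical cost-weighted allocation minimizing Var(p1_hat - p2_hat)
    subject to c1*n1 + c2*n2 = budget (no clustering; illustrative only)."""
    s1 = math.sqrt(p1 * (1 - p1))
    s2 = math.sqrt(p2 * (1 - p2))
    # n_i proportional to s_i / sqrt(c_i) (from a Lagrange multiplier argument)
    k1 = s1 / math.sqrt(c1)
    k2 = s2 / math.sqrt(c2)
    n1 = budget * k1 / (c1 * k1 + c2 * k2)
    n2 = budget * k2 / (c1 * k1 + c2 * k2)
    return n1, n2

# When arm 2 is four times as expensive per unit, it gets a smaller n.
print(cost_optimal_allocation(0.1, 0.1, 1.0, 4.0, 100.0))
```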

Zero Event and Continuity Correction in Meta-Analyses of Rare Events Using Mantel-Haenszel Odds Ratio and Risk Difference
Tianyue Zhou
Sanofi-aventis US LLC
tianyue.zhou@sanofi.com

Meta-analysis of side effects has been widely used to combine data with low event rates across comparative clinical studies for evaluating a drug's safety profile. When dealing with rare events, a substantial proportion of studies may not have any events of interest. In common practice, meta-analyses on a relative scale (relative risk [RR] or odds ratio [OR]) remove zero-event studies, while meta-analyses using risk difference [RD] as the effect measure include them. As continuity corrections are often used when zero events occur in either arm of a study, the impact of zero events and continuity correction on estimates of the Mantel-Haenszel (M-H) OR and RD was



examined through simulation. Two types of continuity correction, the treatment arm continuity correction and the constant continuity correction, are applied in the meta-analysis for variance calculation. For the M-H OR, it is unnecessary to include zero-event trials, and the 95% confidence interval [CI] of the estimate without continuity corrections provided the best coverage. For the M-H RD, including zero-event trials reduced bias, and using certain continuity corrections ensured at least 95% coverage of the 95% CI. This paper examines the influence of zero events and continuity correction on estimates of the M-H OR and RD, in order to help practitioners decide whether to include zero-event trials and use continuity corrections for a specific problem.
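The point about zero-event trials can be seen directly from the Mantel-Haenszel formula: a study with no events in either arm contributes zero to both the numerator and denominator sums, so it does not move the pooled OR. A bare-bones sketch with invented 2x2 tables (point estimate only; no continuity correction or variance/CI machinery):

```python
def mantel_haenszel_or(tables):
    """Pooled Mantel-Haenszel odds ratio from a list of 2x2 tables, each
    (a, b, c, d) = (trt events, trt non-events, ctl events, ctl non-events)."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den

# Three invented studies; the second has zero events in both arms and
# contributes 0 to both sums, leaving the pooled OR unchanged.
tables = [(2, 98, 1, 99), (0, 50, 0, 50), (3, 197, 1, 199)]
print(round(mantel_haenszel_or(tables), 3))
```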

Session 21 New Methods for Big Data

Sure Independence Screening for Gaussian Graphical Models
Shikai Luo1, Daniela Witten2 and Rui Song1

1North Carolina State University
2University of Washington
rsong@ncsu.edu

In high-dimensional genomic studies, it is of interest to understand the regulatory network underlying tens of thousands of genes based on hundreds, or at most thousands, of observations for which gene expression data are available. Because graphical models can identify how variables, such as the coexpression of genes, are related, they are frequently used to study genetic networks. Although various efficient algorithms have been proposed, statisticians still face huge computational challenges when the number of variables is in the tens of thousands of dimensions or higher. Motivated by the fact that the columns of the precision matrix can be obtained by solving p regression problems, each of which involves regressing one feature onto the remaining p - 1 features, we consider covariance screening for Gaussian graphical models. The proposed methods and algorithms possess theoretical properties, such as sure screening properties, and satisfactory empirical behavior.
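The regression-to-precision-matrix fact that motivates this approach is easy to verify numerically: regressing feature j on the rest yields coefficients equal to -Omega[j,k]/Omega[j,j] for the precision matrix Omega. The check below illustrates that identity on simulated Gaussian data only; it is not the proposed screening algorithm.

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulate n draws from a 4-dimensional Gaussian with a known covariance.
p, n = 4, 50_000
A = rng.normal(size=(p, p))
sigma = A @ A.T + p * np.eye(p)          # a well-conditioned covariance
X = rng.multivariate_normal(np.zeros(p), sigma, size=n)

# Sample precision matrix via the inverse sample covariance.
omega = np.linalg.inv(np.cov(X, rowvar=False))

# Regress feature 0 on features 1..3; least squares slopes should match
# the precision-matrix identity beta_k = -omega[0, k] / omega[0, 0].
j, others = 0, [1, 2, 3]
beta, *_ = np.linalg.lstsq(X[:, others], X[:, j], rcond=None)
print(np.round(beta, 3))
print(np.round(-omega[j, others] / omega[j, j], 3))
```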

Case-Specific Random Forests
Ruo Xu1, Dan Nettleton2 and Daniel J. Nordman2

1Google
2Iowa State University
dnett@iastate.edu

Random forest (RF) methodology is a nonparametric methodology for prediction problems. A standard way to utilize RFs includes generating a global RF in order to predict all test cases of interest. In this talk, we propose growing different RFs specific to different test cases, namely case-specific random forests (CSRFs). In contrast to the bagging procedure used in the building of standard RFs, the CSRF algorithm takes weighted bootstrap resamples to create individual trees, where we assign large weights a priori to the training cases in close proximity to the test case of interest. Tuning methods are discussed to avoid overfitting issues. Both simulation and real data examples show that CSRFs often outperform standard RFs in prediction. We also propose the idea of case-specific variable importance (CSVI) as a way to compare the relative predictor variable importance for predicting a particular case. It is possible that the idea of building a predictor case-specifically can be generalized to other areas.
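The core of the CSRF idea, a bootstrap whose resampling weights concentrate near the test case, can be sketched as follows. A k-nearest-neighbor average stands in for the tree learner, and the Gaussian proximity kernel, bandwidth, and all constants are invented for illustration; this is not the authors' algorithm or tuning procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

def weighted_bootstrap_prediction(X, y, x0, n_boot=200, bandwidth=0.3, k=5):
    """CSRF-style sketch: resample training cases with probability increasing
    near the test case x0, fit a base learner (here k-NN average) on each
    resample, and average the per-resample predictions."""
    d = np.linalg.norm(X - x0, axis=1)
    w = np.exp(-(d / bandwidth) ** 2)        # proximity weights (assumed kernel)
    w /= w.sum()
    preds = []
    for _ in range(n_boot):
        idx = rng.choice(len(X), size=len(X), p=w)   # weighted bootstrap
        Xb, yb = X[idx], y[idx]
        nn = np.argsort(np.linalg.norm(Xb - x0, axis=1))[:k]
        preds.append(yb[nn].mean())
    return float(np.mean(preds))

# Toy 1-d regression: y = sin(4x) + noise; predict at x0 = 0.5.
X = rng.uniform(0, 1, size=(300, 1))
y = np.sin(4 * X[:, 0]) + rng.normal(0, 0.1, 300)
pred = weighted_bootstrap_prediction(X, y, np.array([0.5]))
print(round(pred, 3))
```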

Uncertainty Quantification for Massive Data Problems using Generalized Fiducial Inference
Randy C. S. Lai1, Jan Hannig2 and Thomas C. M. Lee1
1University of California at Davis
2University of North Carolina at Chapel Hill

tcmlee@ucdavis.edu

In this talk, we present a novel parallel method for computing parameter estimates and their standard errors for massive data problems. The method is based on generalized fiducial inference.

OEM Algorithm for Big Data
Xiao Nie and Peter Z. G. Qian
University of Wisconsin-Madison
xiaonie@stat.wisc.edu

Big data with large sample sizes arise in Internet marketing, engineering and many other fields. We propose an algorithm, called OEM (a.k.a. orthogonalizing EM), for analyzing big data. This algorithm employs a procedure named active orthogonalization to expand an arbitrary matrix to an orthogonal matrix. This procedure yields closed-form solutions to ordinary and various penalized least squares problems. The maximum number of points needed to be added is bounded by the number of columns of the original matrix, which is appealing for large-n problems. Attractive theoretical properties of OEM include (1) convergence to the Moore-Penrose generalized inverse estimator for a singular regression matrix, and (2) convergence to a point having grouping coherence for a fully aliased regression matrix. We also extend this algorithm to logistic regression. The effectiveness of OEM for least squares and logistic regression problems will be illustrated through examples.

Session 22 New Statistical Methods for Analysis of High Dimensional Genomic Data

Integrative Modeling of Multi-Platform Genomic Data under the Framework of Mediation Analysis
Yen-Tsung Huang
Brown University
yen-tsung_huang@brown.edu

Given the availability of genomic data, there have been emerging interests in integrating multi-platform data. Here we propose to model epigenetic DNA methylation, micro-RNA expression and gene expression data as a biological process to delineate phenotypic traits, under the framework of causal mediation modeling. We propose a regression model for the joint effect of methylation, micro-RNA expression, gene expression and their non-linear interactions on the outcome, and study three path-specific effects: the direct effect of methylation on the outcome, the effect mediated through expression, and the effect through micro-RNA expression. We characterize correspondences between the three path-specific effects and coefficients in the regression model, which are influenced by causal relations among methylation, micro-RNA and gene expression. A score test for variance components of regression coefficients is developed to assess path-specific effects. The test statistic under the null follows a mixture of chi-square distributions, which can be approximated using a characteristic function inversion method or a perturbation procedure. We construct tests for candidate models determined by different combinations of methylation, micro-RNA, gene expression and their interactions, and further propose an omnibus test to accommodate different models. The utility of the method will be illustrated in numerical simulation studies and a glioblastoma data set from The Cancer Genome Atlas (TCGA).

Estimation of High Dimensional Directed Acyclic Graphs using eQTL data
Wei Sun1 and Min Jin Ha2

1University of North Carolina at Chapel Hill
2University of Texas MD Anderson Cancer Center



weisun@email.unc.edu

Observational data can be used to estimate the skeleton of a directed acyclic graph (DAG) and the directions of a limited number of edges. With sufficient interventional data, one can identify the directions of all the edges of a DAG. However, such interventional data are often not available, especially for high dimensional problems. We develop a statistical method to estimate a DAG using surrogate interventional data, where the interventions are applied to a set of external variables, and thus such interventions are considered to be surrogate interventions on the variables of interest. Our work is motivated by expression quantitative trait locus (eQTL) studies, where the variables of interest are the expression of genes, the external variables are DNA variations, and interventions are applied to DNA variants during the process in which a randomly selected DNA allele is passed to a child from either parent. Our method, named sirDAG (surrogate intervention recovery of DAG), first constructs the DAG skeleton using a combination of penalized regression and the PC algorithm, and then estimates the posterior probabilities of all the edge directions after incorporating DNA variant data. We demonstrate the advantages of sirDAG by simulations and an application to an eQTL study of over 18,000 genes in 550 breast cancer patients.

Prioritizing Disease Genes from Genome-wide Association Studies Through Dynamic Co-expression Networks
Lin Hou1, Min Chen2, Clarence Zhang3, Judy Cho4 and Hongyu Zhao1

1Yale University
2University of Texas at Dallas
3Bristol-Myers Squibb
4Mount Sinai Medical Center
hongyu.zhao@yale.edu

Although Genome Wide Association Studies (GWAS) have identified many susceptibility loci for common diseases, they only explain a small portion of heritability. It is challenging to identify the remaining disease loci, because their association signals are likely weak and difficult to identify among millions of candidates. One potentially useful direction to increase statistical power is to incorporate functional genomics information, especially gene expression networks, to prioritize GWAS signals. Most current methods utilizing network information to prioritize disease genes are based on the "guilt by association" principle, in which networks are treated as static, and disease-associated genes are assumed to be located closer to each other than random pairs in the network. In contrast, we propose a novel "guilt by rewiring" principle. Studying the dynamics of gene networks between controls and patients, this principle assumes that disease genes are more likely to undergo rewiring in patients, whereas most of the network remains unaffected in the disease condition. To demonstrate this principle, we consider the changes of co-expression networks in Crohn's disease patients and controls, and how network dynamics reveals information on disease associations. Our results demonstrate that network rewiring is abundant in the immune system, and disease-associated genes are more likely to be rewired in patients. To integrate this network rewiring feature and GWAS signals, we propose to use the Markov random field framework to integrate network information to prioritize genes. Applications to Crohn's disease and Parkinson's disease show that this framework leads to more replicable results and implicates potentially disease-associated pathways.

Kernel Machine Methods for Joint Testing and Integrative Analysis of Genome Wide Methylation and Genotyping Studies
Ni Zhao and Michael Wu
Fred Hutchinson Cancer Research Center
nzhao@fhcrc.org
Comprehensive understanding of complex trait etiology requires examination of multiple sources of genomic variability. Integrative analysis of these data sources promises elucidation of the biological processes underlying particular phenotypes. Consequently, many large GWAS consortia are expanding to simultaneously examine the joint role of DNA methylation. Two practical challenges have arisen for researchers interested in joint analysis of GWAS and methylation studies of the same subjects. First, it is unclear how to leverage both data types to determine if particular genetic regions are related to traits of interest. Second, it is of considerable interest to understand the relative roles of different sources of genomic variability in complex trait etiology, e.g., whether epigenetics mediates genetic effects. Therefore, we propose to use the powerful kernel machine framework for first testing the cumulative effect of both epigenetic and genetic variability on a trait, and for subsequent mediation analysis to understand the mechanisms by which the genomic data types influence the trait. In particular, we develop an approach that works at the gene/region level (to allow for a common unit of analysis across data types). Then we compare pairwise similarity in the trait values between individuals to pairwise similarity in methylation and genotype values for a particular gene, with correspondence suggestive of association. Similarity in methylation and genotype is found by constructing an optimally weighted average of the similarities in methylation and genotype. For a significant gene/region, we then develop a causal steps approach to mediation analysis at the gene/region level, which enables elucidation of the manner in which the different data types work, or do not work, together. We demonstrate through simulations and real data applications that our proposed testing approach often improves power to detect trait-associated genes while protecting type I error, and that our mediation analysis framework can often correctly elucidate the mechanisms by which genetic and epigenetic variability influences traits. A key feature of our approach is that it falls within the kernel machine testing framework, which allows for heterogeneity in effect sizes, nonlinear and interactive effects, and rapid p-value computation. Additionally, the approach can be easily applied to analysis of rare variants and sequencing studies.
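As a rough illustration of similarity-based testing (not the authors' exact procedure), the sketch below combines linear kernels from two data types with a fixed, hypothetical weight and computes a permutation p-value for the variance-component score statistic; the optimal weighting and analytic p-values described above are omitted.

```python
import numpy as np

def kernel_score_pvalue(y, K, n_perm=1000, seed=1):
    """Permutation p-value for the variance-component score statistic
    Q = (y - ybar)' K (y - ybar), a generic kernel association test."""
    rng = np.random.default_rng(seed)
    yc = y - y.mean()
    q_obs = yc @ K @ yc
    count = 0
    for _ in range(n_perm):
        yp = rng.permutation(yc)
        if yp @ K @ yp >= q_obs:
            count += 1
    return (count + 1) / (n_perm + 1)

rng = np.random.default_rng(2)
n, p = 100, 10
G = rng.normal(size=(n, p))              # genotype-like features
M = rng.normal(size=(n, p))              # methylation-like features
y = G[:, 0] + 0.5 * rng.normal(size=n)   # trait driven by genotype
Kg, Km = G @ G.T, M @ M.T                # linear kernel per data type
w = 0.5                                  # illustrative fixed weight
K = w * Kg + (1 - w) * Km                # combined similarity matrix
pv = kernel_score_pvalue(y, K)
print(pv)
```

Comparing trait similarity (yc yc') to genomic similarity (K) is exactly the intuition described in the abstract; here the comparison is calibrated by permutation rather than the mixture-of-chi-squares approximations used in practice.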

Session 23 Recent Advances in Analysis of Longitudinal Data with Informative Observation Process

Joint Modeling of Alternating Recurrent Transition Times
Liang Li
University of Texas MD Anderson Cancer Center
LLi15@mdanderson.org
Atrial fibrillation (AF) is a common complication in patients undergoing cardiac surgery. Recent technological advancement enables physicians to monitor the occurrence of AF continuously with implanted cardiac devices. The device records two types of transition times: the time when the heart enters the AF status from normal beat, and the time when the heart exits the AF status and returns to normal beat. The two transition time processes are recurrent and alternate. Hundreds of transition times may be recorded on a single patient over a follow-up period of up to 12 months. The recurrent pattern carries information on the risk of AF and may be related to baseline covariates. The previous AF pattern may be predictive of the subsequent AF pattern. We propose a semiparametric

52 2014 ICSA-KISS Applied Statistics Symposium Portland June 15-18

Abstracts

bivariate longitudinal transition time model for this complicated process. The model enables single-subject analysis as well as multiple-subject analysis, and both can be carried out in a likelihood framework. We present numerical studies to illustrate the empirical performance of the methodology.

Regression Analysis of Panel Count Data with Informative Observation Times
Yang Li1, Xin He2, Haiying Wang3 and Jianguo Sun4

1University of North Carolina at Charlotte
2University of Maryland
3University of New Hampshire
4University of Missouri at Columbia
YLi@uncc.edu

Panel count data usually occur in medical follow-up studies. Most existing approaches to panel count data analysis assume that the observation or censoring times are independent of the response process, either completely or given some covariates. We present a joint analysis approach in which the possible mutual correlations are characterized by time-varying random effects. Estimating equations are developed for the parameter estimation, and a simulation study is conducted to assess the finite-sample performance of the approach. The asymptotic properties of the proposed estimates are also given, and the method is applied to an illustrative example.

Envelope Linear Mixed Model
Xin Zhang
University of Minnesota
zhxnzx@gmail.com

Envelopes were recently proposed by Cook, Li and Chiaromonte (2010) as a method for reducing estimative and predictive variations in multivariate linear regression. We extend their formulation, proposing a general definition of an envelope and adapting envelope methods to linear mixed models. Simulations and illustrative data analysis show the potential for envelope methods to significantly improve standard methods in longitudinal and multivariate data analysis. This is joint work with Professor R. Dennis Cook and Professor Joseph G. Ibrahim.

Regression Analysis of Longitudinal Data with Irregular and Informative Observation Times
Yong Chen, Jing Ning and Chunyan Cai
University of Texas Health Science Center at Houston
ccaistat@gmail.com

In longitudinal data analyses, the observation times are often assumed to be independent of the outcomes. In applications in which this assumption is violated, the standard inferential approach of using generalized estimating equations may lead to biased inference. Current methods require the correct specification of either the observation time process or the repeated measures process with a correct covariance structure. In this article, we construct a novel pairwise pseudo-likelihood method for longitudinal data that allows for dependence between observation times and outcomes. This method investigates the marginal covariate effects on the repeated measures process while leaving the probability structure of the observation time process unspecified. The novelty of this method is that it yields consistent estimators of the marginal covariate effects without specification of the observation time process or the covariance structure of the repeated measures process. Large sample properties of the regression coefficient estimates and a pseudo-likelihood ratio test procedure are established. Simulation studies demonstrate that the proposed method performs well in finite samples. An analysis of weight loss data from a web-based program is presented to illustrate the proposed method.

Session 24 Bayesian Models for High Dimensional Complex Data

A Bayesian Feature Allocation Model for Tumor Heterogeneity
Juhee Lee1, Peter Mueller2, Yuan Ji3 and Kamalakar Gulukota4

1University of California at Santa Cruz
2University of Texas at Austin
3University of Chicago
4NorthShore University HealthSystem
juheelee@soe.ucsc.edu
We propose a feature allocation model for tumor heterogeneity. The data are next-generation sequencing (NGS) data from tumor samples. We use a variation of the Indian buffet process to characterize latent hypothetical subclones based on single nucleotide variations (SNVs). We define latent subclones by the presence of some subset of the recorded SNVs. Assuming that each sample is composed of some sample-specific proportions of these subclones, we can then fit the observed proportions of SNVs for each sample. By taking a Bayesian perspective, the proposed method provides a full description of all possible solutions as a coherent posterior probability model for all relevant unknown quantities, including the binary indicators that characterize the latent subclones by selecting (or not) the recorded SNVs, instead of reporting a single solution.

Some Results on the One-Way ANOVA Model with an Increasing Number of Groups
Feng Liang
University of Illinois at Urbana-Champaign
liangf@illinois.edu
Asymptotic studies of models with diverging dimensionality have received increasing attention in statistics. A simple version of such models is a one-way ANOVA model where the number of replicates is fixed but the number of groups goes to infinity. Of interest are inference problems like model selection and estimation of the unknown group means. We examine the consistency of Bayesian procedures using Zellner's (1986) g-prior and its variants (such as mixed g-priors and Empirical Bayes), and compare their estimation accuracy with other procedures, such as the ones based on AIC/BIC and group Lasso. Our results indicate that the Empirical Bayes procedure (with some modification for the large p, small n setting) and the fully Bayes procedure (i.e., a prior is specified on g) can achieve model selection consistency and also have better estimation accuracy than the other procedures being considered.

Bayesian Graphical Models for Differential Pathways
Riten Mitra1, Peter Mueller2 and Yuan Ji3
1University of Louisville
2University of Texas at Austin
3NorthShore University HealthSystem/University of Chicago
jiyuan@uchicago.edu
Graphical models can be used to characterize the dependence structure for a set of random variables. In some applications, the form of dependence varies across different subgroups. This situation arises, for example, when protein activation on a certain pathway is recorded and a subgroup of patients is characterized by a pathological disruption of that pathway. A similar situation arises when one subgroup of patients is treated with a drug that targets that same pathway. In both cases, understanding changes in the joint distribution and dependence structure across the two subgroups is key to


the desired inference. Fitting a single model for the entire data could mask the differences. Separate independent analyses, on the other hand, could reduce the effective sample size and ignore the common features. In this paper, we develop a Bayesian graphical model that addresses heterogeneity and implements borrowing of strength across the two subgroups by simultaneously centering the prior towards a global network. The key feature is a hierarchical prior for graphs that borrows strength across edges, resulting in a comparison of pathways across subpopulations (differential pathways) under a unified model-based framework. We apply the proposed model to data sets from two very different studies: histone modifications from ChIP-seq experiments, and protein measurements based on tissue microarrays.

Latent Space Models for Dynamic Networks
Yuguo Chen
University of Illinois at Urbana-Champaign
yuguo@illinois.edu

Dynamic networks are used in a variety of fields to represent the structure and evolution of the relationships between entities. We present a model which embeds longitudinal network data as trajectories in a latent Euclidean space. A Markov chain Monte Carlo algorithm is proposed to estimate the model parameters and latent positions of the nodes in the network. The model parameters provide insight into the structure of the network, and the visualization provided from the model gives insight into the network dynamics. We apply the latent space model to simulated data as well as real data sets to demonstrate its performance.

Session 25 Statistical Methods for Network Analysis

Consistency of Co-clustering for Exchangeable Graph and Array Data
David S. Choi1 and Patrick J. Wolfe2
1Carnegie Mellon University
2University College London
davidch@andrew.cmu.edu

We analyze the problem of partitioning a 0-1 array or bipartite graph into subgroups (also known as co-clustering), under a relatively mild assumption that the data is generated by a general nonparametric process. This problem can be thought of as co-clustering under model misspecification; we show that the additional error due to misspecification can be bounded by O(n^{-1/4}). Our result suggests that under certain sparsity regimes, community detection algorithms may be robust to modeling assumptions, and that their usage is analogous to the usage of histograms in exploratory data analysis.

Laplacian Shrinkage for Inverse Covariance Estimation from Heterogeneous Populations
Takumi Saegusa and Ali Shojaie
University of Washington
ashojaie@uw.edu

We introduce a general framework using a Laplacian shrinkage penalty for estimation of inverse covariance or precision matrices from heterogeneous, non-exchangeable populations. The proposed framework encourages similarity among disparate but related subpopulations, while allowing for differences among estimated matrices. We propose an efficient alternating direction method of multipliers (ADMM) algorithm for parameter estimation, and establish both variable selection and norm consistency of the estimator for distributions with exponential or polynomial tails. Finally, we discuss the selection of the Laplacian shrinkage penalty based on hierarchical clustering in settings where the true relationship among samples is unknown, and discuss conditions under which this data-driven choice results in consistent estimation of precision matrices. Extensive numerical studies and applications to gene expression data from subtypes of cancer with distinct clinical outcomes indicate the potential advantages of the proposed method over existing approaches.

Estimating Signature Subgraphs in Samples of Labeled Graphs
Juhee Cho and Karl Rohe
University of Wisconsin-Madison
chojuhee@stat.wisc.edu
Network analysis is a vibrant area in statistics, biology and computer science. Recently, an emerging type of data in these fields is samples of labeled networks (or graphs). The "labels" of networks imply that the nodes are labeled and that the same set of nodes reappears in all of the networks. The labels also have a dual meaning: there are values (e.g., age, gender, or healthy vs. sick) or vectors of values characterizing the associated network. From the analysis, we observe that only a part of the network, forming a "signature subgraph", varies across the networks, whereas the other part is very similar. We therefore develop methods to estimate the signature subgraph and show theoretical properties of the suggested methods under a framework that allows the sample size to go to infinity with a sparsity condition. To check the finite-sample performance of the methods, we conduct a simulation study and then analyze two data sets: 42 brain-graphs data from 21 subjects, and transcriptional regulatory network data from 41 diverse human cell types.

Fast Hierarchical Modeling for Recommender Systems
Patrick Perry
New York University
pperry@stern.nyu.edu
In the context of a recommender system, a hierarchical model allows for user-specific tastes while simultaneously borrowing estimation strength across all users. Unfortunately, existing likelihood-based methods for fitting hierarchical models have high computational demands, and these demands have limited their adoption in large-scale prediction tasks. We propose a moment-based method for fitting a hierarchical model, which has its roots in a method originally introduced by Cochran in 1937. The method trades statistical efficiency for computational efficiency. It gives consistent parameter estimates, competitive prediction error performance, and dramatic computational improvements.
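A minimal sketch of the moment-based idea, assuming a simple one-way random-effects model rather than the full recommender setting: estimate the variance components by moment matching (no likelihood iterations), then shrink each group mean toward the grand mean. All quantities below are illustrative.

```python
import numpy as np

def moment_shrinkage(groups):
    """Method-of-moments fit of the one-way random-effects model
    y_ij = mu + u_i + e_ij, followed by empirical-Bayes style
    shrinkage of the group means; `groups` is a list of 1-d arrays."""
    means = np.array([g.mean() for g in groups])
    ns = np.array([len(g) for g in groups])
    mu = np.average(means, weights=ns)
    # pooled within-group (residual) variance
    s2_e = sum(((g - g.mean()) ** 2).sum() for g in groups) / (ns.sum() - len(groups))
    # between-group variance by moment matching, truncated at zero
    s2_u = max(np.var(means, ddof=1) - s2_e * np.mean(1.0 / ns), 0.0)
    # shrink each raw group mean toward the grand mean
    lam = s2_u / (s2_u + s2_e / ns)
    return mu + lam * (means - mu)

rng = np.random.default_rng(3)
true_means = 5.0 + rng.normal(0, 1.0, size=100)   # user-specific effects
groups = [m + rng.normal(0, 3.0, size=8) for m in true_means]
est = moment_shrinkage(groups)
raw = np.array([g.mean() for g in groups])
mse_est = np.mean((est - true_means) ** 2)
mse_raw = np.mean((raw - true_means) ** 2)
print(mse_est < mse_raw)  # shrinkage typically beats the raw means
```

The fit requires only group sums and sums of squares, which is the source of the computational advantage over iterative likelihood methods.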

Session 26 New Analysis Methods for Understanding Complex Diseases and Biology

Data-Integration for Identifying Clinically Important Long Non-coding RNA in Cancer
Yiwen Chen1, Zhou Du2, Teng Fei1, Roel G.W. Verhaak3, Yong Zhang2, Myles Brown4 and X. Shirley Liu4

1Dana Farber Cancer Institute
2Tongji University
3University of Texas MD Anderson Cancer Center
4Dana Farber Cancer Institute & Harvard University
ywchen@jimmy.harvard.edu
Cumulatively, 70% of the human genome is transcribed, whereas less than 2% of the genome encodes protein. As a part of this prevalent non-coding transcription, long non-coding RNAs (lncRNAs) are RNAs


that are longer than 200 base pairs (bps) but with little protein-coding capacity. The human genome encodes over 10,000 lncRNAs, and the functions of the vast majority of them are unknown. Through integrative analysis of lncRNA expression profiles with clinical outcome and somatic copy number alteration, we identified lncRNAs that are associated with cancer subtypes and clinical prognosis, and predicted those that are potential drivers of cancer progression in multiple cancers, including glioblastoma multiforme (GBM), ovarian cancer (OvCa), lung squamous cell carcinoma (lung SCC) and prostate cancer. We validated our predictions of two tumorigenic lncRNAs by experimentally confirming the prostate cancer cell growth dependence on these two lncRNAs. Our integrative analysis provides a resource of clinically relevant lncRNAs for development of lncRNA biomarkers and identification of lncRNA therapeutic targets for human cancer.

Data Integration for Genetics-Based Drug Repurposing in Complex Diseases
Di Wu
Harvard University
dwu@fas.harvard.edu

Large numbers of genetic variants have been identified in cancer genome studies and GWAS studies. These variants may well capture the characteristics of the diseases. To best leverage this knowledge for developing new therapeutics, our study explores the possibility of using the genetics of diseases to guide drug repurposing. Drug repurposing asks whether the available drugs for certain diseases can be re-used for the treatment of other diseases. We particularly use the gene target information of drugs and protein-protein interaction information to connect risk genes based on GWAS hits with the available drugs. Drug indications were used to evaluate the sensitivity and specificity of the novel pipeline. Evaluation of the pipeline suggests a promising direction for certain diseases.

Comparative Meta-Analysis of Prognostic Gene Signatures for Late-Stage Ovarian Cancer
Levi Waldron
Hunter College
leviwaldron@hunter.cuny.edu

Authors: Levi Waldron, Benjamin Haibe-Kains, Aedín C. Culhane, Markus Riester, Jie Ding, Xin Victoria Wang, Mahnaz Ahmadifar, Svitlana Tyekucheva, Christoph Bernau, Thomas Risch, Benjamin Ganzfried, Curtis Huttenhower, Michael Birrer and Giovanni Parmigiani
Abstract: Numerous published studies have reported prognostic models of cancer patient survival from tumor genomics. These studies employ a wide variety of model training and validation methodologies, making it difficult to compare and rank their modeling strategies or the accuracy of the models. However, they have also generated numerous publicly available microarray datasets with clinically annotated individual patient data. Through systematic review, we identified and implemented fully specified versions of 14 prognostic models of advanced-stage ovarian cancer published over a 5-year period. These 14 published models were developed by different authors using disparate training datasets and statistical methods, but all claimed to be capable of predicting overall survival using microarray data. We evaluated these models for prognostic accuracy (defined by Concordance Index for overall survival), adapting traditional methods of meta-analysis to synthesize results in ten independent validation datasets. This systematic evaluation showed that 1) models generated by penalized or ensemble

Cox Proportional Hazards-based regression methods outperformed models generated by more complicated methods, and strongly outperformed hypothesis-based models; 2) validation dataset bias existed, meaning that some datasets indicated better validation performance for all models than others, and comparative evaluation is needed to identify this source of bias; 3) datasets selected by authors for independent validation tended to over-estimate model accuracy compared to previously unused validation datasets; and 4) seemingly unrelated models generated highly correlated predictions, further emphasizing the need for comparative evaluation of accuracy. This talk will provide an overview of methods for prediction modeling in cancer genomics and highlight lessons from the first systematic comparative meta-analysis of published cancer genomics prognostic models.

Studying Spatial Organizations of Chromosomes via Parametric Model
Ming Hu1, Yu Zhu2, Zhaohui Steve Qin3, Ke Deng4 and Jun S. Liu5

1New York University
2Purdue University
3Emory University
4Tsinghua University
5Harvard University
minghu@nyumc.org
The recently developed Hi-C technology enables a genome-wide view of spatial organizations of chromosomes, and has shed deep insights into genome structure and genome function. Although the technology is extremely promising, multiple sources of biases and uncertainties pose great challenges for data analysis. Statistical approaches for inferring three-dimensional (3D) chromosomal structure from Hi-C data are far from mature. Most existing models are highly over-parameterized, lacking clear interpretations, and sensitive to outliers. In this study, we propose parsimonious, easy to interpret and robust helix models for reconstructing 3D chromosomal structure from Hi-C data. We also develop a negative binomial regression approach to account for over-dispersion in Hi-C data. When applied to a real Hi-C dataset, helix models achieve much better model adequacy scores than existing models. More importantly, these helix models reveal that geometric properties of chromatin spatial organization, as well as chromatin dynamics, are closely related to genome functions.

Session 27 Recent Advances in Time Series Analysis

Time Series Models for Spherical Data with Applications in Structural Biochemistry
Jay Breidt, Daniel Hernandez-Stumpfhauser and Mark van der Woerd
Colorado State University
jbreidt@gmail.com
Proteins consist of sequences of the 21 natural amino acids. There can be tens to hundreds of amino acids in a protein, and hundreds to hundreds of thousands of atoms. A complete model for the protein consists of coordinates for every atom. A useful class of simplified models is obtained by focusing only on the alpha-carbon sequence, consisting of the primary carbon atom in the backbone of each amino acid. The three-dimensional structure of the alpha-carbon backbone of the protein can be described as a sequence of angle pairs, each consisting of a bond angle and a dihedral angle. These angle pairs lie naturally on a sphere. We consider autoregressive time series models for such spherical data sequences, using extensions of projected normal distributions. We describe application to protein data and further developments, including autoregressive models that switch parameterizations according to local structure in the protein (such as helices, beta-sheets and coils).

Semiparametric Estimation of Spectral Density Function with Irregular Data
Shu Yang and Zhengyuan Zhu

Iowa State University
zhu1997@gmail.com

We propose a semi-parametric method to estimate spectral densities of isotropic Gaussian processes with irregular observations. The spectral density function at low frequencies is estimated using a smoothing spline, while we use a parametric model for the spectral density at high frequencies and estimate the parameters using a method of moments based on the empirical variogram at small lags. We derive the asymptotic bounds for bias and variance of the proposed estimator. Simulation results show that our method outperforms the existing nonparametric estimator by several performance criteria.

On the Prediction of Stationary Functional Time Series
Alexander Aue1, Diogo Dubart Norinho2 and Siegfried Hörmann3

1University of California at Davis
2University College London
3Université Libre de Bruxelles
aaue@ucdavis.edu

This talk addresses the prediction of stationary functional time series. Existing contributions to this problem have largely focused on the special case of first-order functional autoregressive processes, because of their technical tractability and the current lack of advanced functional time series methodology. It is shown how standard multivariate prediction techniques can be utilized in this context. The connection between functional and multivariate predictions is made precise for the important case of vector and functional autoregressions. The proposed method is easy to implement, making use of existing statistical software packages, and may therefore be attractive to a broader, possibly non-academic audience. Its practical applicability is enhanced through the introduction of a novel functional final prediction error model selection criterion that allows for an automatic determination of the lag structure and the dimensionality of the model. The usefulness of the proposed methodology is demonstrated in simulations and an application to the prediction of daily pollution curves. It is found that the proposed prediction method often significantly outperforms existing methods.

A Composite Likelihood-based Approach for Multiple Change-point Estimation in Multivariate Time Series Models
Chun Yip Yau and Ting Fung Ma

Chinese University of Hong Kong
cyyau@sta.cuhk.edu.hk

We propose a likelihood-based approach for multiple change-point estimation in general multivariate time series models. Specifically, we consider a criterion function based on pairwise likelihood to estimate the number and locations of change-points and perform model selection for each segment. By virtue of the pairwise likelihood, the number and locations of change-points can be consistently estimated under very mild assumptions. Computation is conducted efficiently by a pruned dynamic programming algorithm. Simulation studies and real data examples are presented to demonstrate the statistical and computational efficiency of the proposed method.
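A pruned dynamic program of this general kind can be sketched with a simple Gaussian mean-change cost standing in for the pairwise-likelihood criterion (the penalty value and data below are hypothetical, and no per-segment model selection is performed):

```python
import numpy as np

def pelt_mean_change(y, beta):
    """PELT-style pruned dynamic program for multiple change-points
    in the mean, with penalty `beta` per change-point and a Gaussian
    (sum of squared errors) segment cost."""
    n = len(y)
    cs = np.concatenate([[0.0], np.cumsum(y)])
    cs2 = np.concatenate([[0.0], np.cumsum(y ** 2)])

    def cost(s, t):  # SSE of y[s:t] around its segment mean
        m = (cs[t] - cs[s]) / (t - s)
        return cs2[t] - cs2[s] - (t - s) * m * m

    F = np.full(n + 1, np.inf)
    F[0] = -beta
    last = np.zeros(n + 1, dtype=int)
    cands = [0]
    for t in range(1, n + 1):
        vals = [F[s] + cost(s, t) + beta for s in cands]
        best = int(np.argmin(vals))
        F[t] = vals[best]
        last[t] = cands[best]
        # pruning: drop candidates that can never be optimal again
        cands = [s for s, v in zip(cands, vals) if v - beta <= F[t]] + [t]
    # backtrack the change-point locations
    cps, t = [], n
    while t > 0:
        t = last[t]
        if t > 0:
            cps.append(t)
    return sorted(cps)

rng = np.random.default_rng(5)
y = np.concatenate([rng.normal(0, 1, 100), rng.normal(4, 1, 100), rng.normal(0, 1, 100)])
cps = pelt_mean_change(y, beta=3 * np.log(len(y)))
print(cps)  # change-points near 100 and 200
```

Replacing the SSE cost with a (pairwise) likelihood-based segment cost keeps the same dynamic programming and pruning structure.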

Session 28 Analysis of Correlated Longitudinal and Survival Data

Analysis of a Non-Randomized Longitudinal Quality of Life Trial
Mounir Mesbah
University of Paris 6
mounirmesbah@upmc.fr
In this talk, I will consider the context of a longitudinal study where participants are interviewed about their health quality of life, or another latent trait, at regular dates of visit previously established. The interviews usually consist of filling in a questionnaire in which participants are asked multiple choice questions with various ordinal response scales, built in order to measure, at the time of the visit, the latent trait, which is assumed in a first step to be unidimensional. At the time of entering the study, each participant receives a treatment appropriate to his health profile. The choice of treatment is not randomized: it is decided arbitrarily by a doctor based on the health profile of the patient and a deep clinical examination. We assume that the different treatments that a doctor can choose are ordered (a dose effect). In addition, we assume that the treatment prescribed at the entrance does not change throughout the study. In this work, I will investigate and compare strategies and models to analyze the time evolution of the latent variable in a longitudinal study when the main goal is to compare non-randomized ordinal treatments. I will illustrate my results with a real, complex longitudinal quality of life study.
References: [1] Bousseboua, M. and Mesbah, M. (2013). Longitudinal Rasch Process with Memory Dependence. Pub. Inst. Stat. Univ. Paris, Vol. 57, Fasc. 1-2, 45-58. [2] Christensen, K.B., Kreiner, S. and Mesbah, M. (2013). Rasch Models in Health. J. Wiley. [3] Mesbah, M. (2012). Measurement and Analysis of Quality of Life in Epidemiology. In "Bioinformatics in Human Health and Heredity (Handbook of Statistics, Vol. 28)", Eds. Rao, C.R., Chakraborty, R. and Sen, P.K., North Holland, Chapter 15. [4] Rosenbaum, P.R. and Rubin, D.B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika 70(1), 41-55. [5] Imai, K. and Van Dyk, D.A. (2004). Causal Inference With General Treatment Regimes: Generalizing the Propensity Score. JASA, Vol. 99, No. 467, Theory and Methods.

Power and Sample Size Calculations for Evaluating Mediation Effects with Multiple Mediators in Longitudinal Studies
Cuiling Wang
Albert Einstein College of Medicine
cuilingwang@einstein.yu.edu
Currently there is very limited statistical research on power analysis for evaluating mediation effects of multiple mediators in longitudinal studies. In addition to the complexity of missing data common to longitudinal studies, the case of multiple mediators further complicates the testing of mediation effects. Based on previous work of Wang and Xue (Wang and Xue, 2012), we evaluate several hypothesis tests regarding the mediation effects from multiple mediators and provide formulae for power and sample size calculations. The performance of these methods under limited sample size is examined using simulation studies. An example from the Einstein Aging Study (EAS) is used to illustrate the methods.
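When closed-form power formulae are unavailable, mediation power is often approximated by Monte Carlo simulation. The sketch below does this for the much simpler single-mediator, cross-sectional joint-significance test; it is a simplified stand-in for the multiple-mediator longitudinal setting above, with hypothetical effect sizes a, b and c'.

```python
import numpy as np

def coef_z(X, y):
    """z statistic of the first column's coefficient in OLS of y on X."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[0] / np.sqrt(cov[0, 0])

def mediation_power(n, a, b, cprime, nsim=500, seed=7):
    """Monte Carlo power of the joint-significance mediation test for
    a single mediator X -> M -> Y (hypothetical effect sizes)."""
    rng = np.random.default_rng(seed)
    ones = np.ones(n)
    hits = 0
    for _ in range(nsim):
        x = rng.normal(size=n)
        m = a * x + rng.normal(size=n)
        y = b * m + cprime * x + rng.normal(size=n)
        za = coef_z(np.column_stack([x, ones]), m)     # a-path: M ~ X
        zb = coef_z(np.column_stack([m, x, ones]), y)  # b-path: Y ~ M | X
        hits += (abs(za) > 1.96) and (abs(zb) > 1.96)
    return hits / nsim

p_alt = mediation_power(100, 0.3, 0.3, 0.1)   # power under the alternative
p_null = mediation_power(100, 0.0, 0.3, 0.1)  # size when the a-path is null
print(p_alt, p_null)
```

The same simulation skeleton extends to multiple mediators and longitudinal outcomes by replacing the data-generating step and the two test statistics.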

Distribution-free First-hitting-time Based Threshold Regressions for Lifetime Data
Mei-Ling Ting Lee1 and G. Alex Whitmore2
1University of Maryland


2McGill University
mltlee@umd.edu
Cox regression methods are well known, but they rely on a strong proportional hazards assumption. In many medical contexts, a disease progresses until a failure event (such as death) is triggered when the health level first reaches a failure threshold. I'll present the Threshold Regression (TR) model for the health process, which requires few assumptions and hence is quite general in its potential application. Both parametric and distribution-free methods for estimation and prediction using TR models are derived. Case examples are presented that demonstrate the methodology and its practical use. The methodology provides medical researchers and biostatisticians with new and robust statistical tools for estimating treatment effects and assessing a survivor's remaining life.

Joint Modeling of Survival Data and Mismeasured Longitudinal Data using the Proportional Odds Model
Juan Xiong1, Wenqing He1 and Grace Yi2
1University of Western Ontario
2University of Waterloo
whe@stats.uwo.ca
Joint modeling of longitudinal and survival data has been studied extensively, where the Cox proportional hazards model has frequently been used to incorporate the relationship between survival time and covariates. Although the proportional odds model is an attractive alternative to the Cox proportional hazards model, by featuring the dependence of survival times on covariates via cumulative covariate effects, this model is rarely discussed in the joint modeling context. To fill this gap, we investigate joint modeling of survival data and longitudinal data that are subject to measurement error. We describe a model parameter estimation method based on the expectation-maximization algorithm. In addition, we assess the impact of naive analyses that fail to address error occurring in longitudinal measurements. The performance of the proposed method is evaluated through simulation studies and a real data analysis.

Session 29 Clinical Pharmacology

Truly Personalizing Medicine
Mike D. Hale
Amgen Inc.
mdhale@amgen.com
Predictive analytics are increasingly used to optimize marketing for many non-medical products. Companies observe and analyze the behavior and/or characteristics of an individual, predict the needs of that individual, and then address those needs. We frequently encounter this when web browsing and when participating in retail store loyalty programs: advertising and coupons are targeted to the specific individual based on predictive models employed by advertisers and retailers. This makes the traditional drug development program appear antiquated, where a drug may be intended for all patients with a given indication. This talk contrasts those methods and practices for addressing individual needs with the way medicines are typically prescribed, and considers a way to integrate big data, the product label, and predictive analytics to improve and enable personalized medicine. Some important questions are posed (but unresolved), such as who could do this, and what are the implications if we were to predict outcomes for individual patients.

What Do Statisticians Do in Clinical Pharmacology?
Brian Smith

Amgen Inc.
brismith@amgen.com

Clinical pharmacology is the science of drugs and their clinical use. It could be argued that all drug development is clinical pharmacology; however, pharmaceutical companies typically separate activities in a pattern similar to the following: A) clinical (late) development (Phase 2b-Phase 3), B) post-marketing (Phase 4), and C) clinical pharmacology (Phase 1-Phase 2a). As will be seen in this presentation, clinical pharmacology research presents numerous interesting statistical opportunities.

The Use of Modeling and Simulation to Bridge Different Dosing Regimens - a Case Study
Chyi-Hung Hsu and Jose Pinheiro
Janssen Research & Development
chsu3@its.jnj.com

In recent years the pharmaceutical industry has increasingly faced the challenge of needing to efficiently evaluate and use all available information to improve its success rate in drug development under limited resource constraints. Modeling and simulation has established itself as the quantitative tool of choice to meet this existential challenge. Models provide a basis for quantitatively describing and summarizing the available information and our understanding of it. Using models to simulate data allows the evaluation of scenarios within, and even outside, the boundaries of the original data. In this presentation we will discuss and illustrate the use of modeling and simulation techniques to bridge different dosing regimens based on studies using just one of the regimens. Special attention will be given to quantifying inferential uncertainty and model validation.
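As a concrete, hypothetical illustration of this kind of bridging exercise (the talk's actual model is not specified in the abstract), one can fit a simple Emax dose-response model to data from one regimen and simulate the response, with uncertainty, at a dose from another regimen. All doses, parameters, and noise levels below are illustrative assumptions.

```python
# Illustrative sketch (not the talk's model): bridging dose regimens with an
# Emax dose-response model. Doses, parameters, and noise level are hypothetical.
import numpy as np
from scipy.optimize import curve_fit

def emax(dose, e0, emax_, ed50):
    """Emax model: E(dose) = E0 + Emax * dose / (ED50 + dose)."""
    return e0 + emax_ * dose / (ed50 + dose)

rng = np.random.default_rng(42)

# "Observed" study: responses at the doses of regimen A only
doses_a = np.repeat([10.0, 30.0, 100.0], 30)
true = dict(e0=1.0, emax_=4.0, ed50=25.0)
y = emax(doses_a, **true) + rng.normal(0.0, 0.3, doses_a.size)

# Fit the model to regimen A data
popt, pcov = curve_fit(emax, doses_a, y, p0=(0.5, 3.0, 50.0))

# Simulate (predict) the mean response at a regimen-B dose never studied
dose_b = 60.0
pred_b = emax(dose_b, *popt)

# Parameter draws from the fitted model quantify inferential uncertainty
draws = rng.multivariate_normal(popt, pcov, size=2000)
lo, hi = np.percentile([emax(dose_b, *d) for d in draws], [2.5, 97.5])
print(f"predicted response at dose {dose_b}: {pred_b:.2f} (95% CI {lo:.2f}, {hi:.2f})")
```

The interval step is one simple way to propagate inferential uncertainty, in the spirit of the abstract's emphasis on quantifying it.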

A Comparison of FDA and EMA Recommended Models for Bioequivalence Studies
Yongwu Shao, Lingling Han, Bing Gao, Sally Zhao, Susan Guo, Lijie Zhong and Liang Fang
Gilead Sciences
yongwushao@gilead.com

For a bioequivalence crossover study, the FDA guidance recommends a mixed effects model for the formulation comparisons of pharmacokinetic parameters including all subject data, while the EMA guidance recommends an ANOVA model with fixed effects of sequence, subject within sequence, period, and formulation, excluding subjects with missing data from the pair-wise comparison. These two methods are mathematically equivalent when there are no missing values in the targeted comparison. With missing values, the mixed effects model including subjects with missing values provides higher statistical power than the fixed effects model excluding these subjects. However, parameter estimation in the mixed effects model is based on large-sample asymptotic approximations, which may introduce bias in the estimates of standard deviations when the sample size is small (Jones and Kenward, 2003). In this talk we provide a closed-form formula to quantify the potential gain in power from using mixed effects models when missing data are present. A simulation study was conducted to confirm the theoretical results. We also performed a simulation study to investigate the bias introduced by the mixed effects model for small sample sizes. Our results show that when the sample size is 12 or above, as required by both FDA and EMA, the bias introduced by the mixed effects model is negligible. From a statistical point of view, we recommend the mixed effects model approach for bioequivalence studies for its potential gain in power when missing data are present and missing completely at random.
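The two analyses can be contrasted on simulated complete data, where the mathematical equivalence noted in the abstract is directly visible. This is a sketch with hypothetical parameters using `statsmodels`; the sequence term is omitted from the fixed-effects fit because the subject dummies absorb it, which does not change the formulation comparison.

```python
# Sketch (simulated 2x2 crossover, hypothetical parameters): EMA-style ANOVA
# with subject as a fixed effect vs. FDA-style mixed model with subject as a
# random effect. With complete data the formulation estimates coincide.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
rows = []
for seq, (p1, p2) in [("TR", ("T", "R")), ("RT", ("R", "T"))]:
    for i in range(16):  # 16 subjects per sequence
        subj = f"{seq}{i}"
        u = rng.normal(0, 0.4)  # between-subject effect
        for period, form in [(1, p1), (2, p2)]:
            mu = 0.1 * (form == "T") + 0.05 * (period == 2)
            rows.append(dict(subject=subj, sequence=seq, period=period,
                             formulation=form,
                             log_auc=mu + u + rng.normal(0, 0.15)))
df = pd.DataFrame(rows)

# Subject fixed effects absorb the sequence effect, so sequence is omitted here
fixed = smf.ols("log_auc ~ C(period) + C(formulation) + C(subject)",
                data=df).fit()
mixed = smf.mixedlm("log_auc ~ C(sequence) + C(period) + C(formulation)",
                    data=df, groups="subject").fit()

print(fixed.params["C(formulation)[T.T]"], mixed.params["C(formulation)[T.T]"])
```

With complete, balanced data the two printed formulation estimates agree to numerical precision; the difference only emerges once some subjects have missing periods.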

2014 ICSA-KISS Applied Statistics Symposium Portland June 15-18 57

Abstracts

Session 30 Sample Size Estimation

Sample Size Calculation with Semiparametric Analysis of Long Term and Short Term Hazards
Yi Wang
Novartis Pharmaceuticals Corporation
yi-11wang@novartis.com

We derive sample size formulae for survival data with non-proportional hazard functions under both fixed and contiguous alternatives. Sample size determination has been widely discussed in the literature for studies with failure-time endpoints. Many researchers have developed methods under the assumption of proportional hazards and contiguous alternatives. Without covariate adjustment, the logrank test statistic is often used for the sample size and power calculation. With covariate adjustment, the approaches are often based on the score test statistic for the Cox proportional hazards model. Such methods, however, are inappropriate when the proportional hazards assumption is violated. We develop methods to calculate the sample size based on the semiparametric analysis of short-term and long-term hazard ratios. The methods are built on a semiparametric model by Yang and Prentice (2005). The model accommodates a wide range of patterns of hazard ratios and includes the Cox proportional hazards model and the proportional odds model as special cases. Therefore, the proposed methods can be used for survival data with proportional or non-proportional hazard functions. In particular, the sample size formulas of Schoenfeld (1983) and Hsieh and Lavori (2000) can be obtained as special cases of our methods under contiguous alternatives.
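The Schoenfeld (1983) special case mentioned above has a simple closed form for the required number of events; a minimal sketch, with illustrative inputs:

```python
# Schoenfeld (1983) required number of events under proportional hazards:
#   d = (z_{1-alpha/2} + z_{1-beta})^2 / (p1 * p2 * (log HR)^2)
# where p1, p2 are the allocation proportions. The inputs below are illustrative.
import math
from statistics import NormalDist

def schoenfeld_events(hr, alpha=0.05, power=0.80, p1=0.5):
    z = NormalDist().inv_cdf
    p2 = 1.0 - p1
    num = (z(1 - alpha / 2) + z(power)) ** 2
    return math.ceil(num / (p1 * p2 * math.log(hr) ** 2))

# Two-sided 5% alpha, 80% power, 1:1 allocation, hazard ratio 0.7
print(schoenfeld_events(hr=0.7))  # -> 247 events
```

Stronger effects need fewer events (e.g., HR 0.6 needs roughly half as many), which is the standard behavior of the formula.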

Sample Size and Decision Criteria for Phase IIB Studies with Active Control
Xia Xu
Merck & Co.
xia_xu@merck.com

In drug development programs, Phase IIB studies provide information for the Go/No-Go decision on conducting large confirmatory Phase III studies. Currently, more and more Phase IIB studies are using an active control as comparator, especially in the development of new therapies for the treatment of HIV infection, where it is not ethical to use a placebo control due to the severity of the disease and the availability of approved drugs. If the Phase IIB study demonstrates "comparable" efficacy and safety relative to the active control, the program may proceed to Phase III, which usually uses the same or a similar active control to formally assess non-inferiority of the new therapy. Sample size determination and quantification of decision criteria for such Phase IIB studies are explored using a Bayesian analysis.
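The abstract does not give the Bayesian criterion itself. As a generic illustration (not the talk's rule), a Go/No-Go decision can be based on the posterior probability that the new therapy's response rate is within a margin of the active control, computed by Monte Carlo with flat Beta priors; all counts, margins, and thresholds here are hypothetical.

```python
# Hypothetical Go/No-Go sketch: with Beta posteriors for binomial response
# rates, "Go" if Pr(p_new > p_control - margin | data) exceeds a threshold.
# All numbers below are illustrative, not from the talk.
import numpy as np

rng = np.random.default_rng(0)

def prob_comparable(x_new, n_new, x_ctl, n_ctl, margin=0.10, ndraw=200_000):
    # Beta(1,1) priors -> Beta(x+1, n-x+1) posteriors
    p_new = rng.beta(x_new + 1, n_new - x_new + 1, ndraw)
    p_ctl = rng.beta(x_ctl + 1, n_ctl - x_ctl + 1, ndraw)
    return float(np.mean(p_new > p_ctl - margin))

pp = prob_comparable(x_new=54, n_new=60, x_ctl=55, n_ctl=60)
go = pp >= 0.80
print(f"Pr(comparable) = {pp:.3f} -> {'Go' if go else 'No Go'}")
```

Sample size exploration then amounts to repeating this calculation over hypothetical trial outcomes at each candidate n.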

Sample Size Determination for Clinical Trials to Correlate Outcomes with Potential Predictors
Su Chen, Xin Wang and Ying Zhang
AbbVie Inc.
suchen@abbvie.com

Sample size determination can be a challenging task for a post-marketing clinical study aiming to establish the predictivity of a single influential measurement or a set of variables for a clinical outcome of interest. Since the relationship between the potential predictors and the outcome is unknown at the design stage, one may not be able to perform a conventional sample size calculation and must look for other means to size the trial. Our proposed approach is based on the length of the confidence interval of the true correlation coefficient between the predictor and outcome variables. In this study we compare three methods of constructing confidence intervals for the correlation coefficient, based on the approximate sampling distribution of the Pearson correlation, the Z-transformed Pearson correlation, and bootstrapping, respectively. We evaluate the performance of the three methods under different scenarios with small to moderate sample sizes and different correlations. Coverage probabilities of the confidence intervals are compared across the three methods. The results are used for sample size determination based on the width of the confidence intervals. Hypothetical examples are provided to illustrate the idea and its implementation.
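Of the three interval methods compared, the Z-transformed (Fisher) interval has a closed form, which makes width-based sizing direct; a sketch with an illustrative target width and assumed correlation:

```python
# Fisher z-transform CI for a Pearson correlation, and the smallest n whose
# CI width is below a target -- one way to size such a study. The target
# width and the assumed correlation are illustrative choices.
import math
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)  # 95% confidence

def fisher_ci(r, n):
    z = math.atanh(r)                  # Fisher z-transform
    half = Z / math.sqrt(n - 3)        # SE of z is 1/sqrt(n-3)
    return math.tanh(z - half), math.tanh(z + half)

def n_for_width(r, max_width):
    n = 4
    while True:
        lo, hi = fisher_ci(r, n)
        if hi - lo <= max_width:
            return n
        n += 1

lo, hi = fisher_ci(r=0.5, n=100)
print(f"95% CI with n=100: ({lo:.3f}, {hi:.3f})")
print("n needed for width <= 0.2 at r=0.5:", n_for_width(0.5, 0.2))
```

Because the interval is asymmetric around r, the required n depends on the assumed correlation, which is why the abstract's comparison across scenarios matters.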

Sample Size Re-Estimation at Interim Analysis in Oncology Trials with a Time-to-Event Endpoint

Ian (Yi) Zhang

Sunovion Pharmaceuticals Inc.
ianzhang@sunovion.com

Oncology is a hot therapeutic area due to highly unmet medical needs. In confirmatory oncology trials, the superiority of a study drug over a control is commonly assessed with respect to a time-to-event endpoint such as overall survival (OS) or progression-free survival (PFS). Adaptive designs allowing for sample size re-estimation (SSR) at an interim analysis are often employed to accelerate oncology drug development while reducing costs. Although SSR is categorized as "less well understood" (in contrast to "well understood" designs such as the group sequential design) in the 2010 draft FDA guidance on adaptive designs, it has gradually gained regulatory acceptance and is widely adopted in industry. In this presentation, a Phase II/III seamless design is developed to re-estimate the sample size based upon the unblinded interim result, using the conditional power of observing a significant result by the end of the trial. The methodology achieves the desired conditional power while still controlling the type I error rate. Extensive simulation studies are performed to evaluate the operating characteristics of the design. A real-world example will also be used for illustration. Pros and cons of the design will be discussed.
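Under the common "current trend" assumption, the conditional power that drives such re-estimation has a simple closed form; a sketch with hypothetical interim values (not the presentation's actual rule):

```python
# Conditional power under the "current trend" assumption: with interim z-value
# z1 at information fraction t,
#   CP = Phi((z1 / sqrt(t) - z_{1-alpha}) / sqrt(1 - t)).
# The interim numbers below are hypothetical.
import math
from statistics import NormalDist

nd = NormalDist()

def conditional_power(z1, t, alpha=0.025):
    za = nd.inv_cdf(1 - alpha)  # one-sided critical value
    return nd.cdf((z1 / math.sqrt(t) - za) / math.sqrt(1 - t))

cp = conditional_power(z1=1.5, t=0.5)
print(f"conditional power at the interim: {cp:.3f}")
```

An SSR rule then increases the target event count until the conditional power reaches a pre-specified level, with the final test adjusted to preserve the type I error rate.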

Statistical Inference and Sample Size Calculation for Paired Binary Outcomes with Missing Data

Song Zhang

University of Texas Southwestern Medical Center
songzhang@utsouthwestern.edu

We investigate the estimation of intervention effects and sample size determination for experiments where subjects are supposed to contribute paired binary outcomes but some observations are incomplete. We propose a hybrid estimator to appropriately account for the mixed nature of the observed data: paired outcomes from those who contribute complete pairs of observations, and unpaired outcomes from those who contribute either pre- or post-intervention outcomes only. We theoretically prove that if incomplete data are evenly distributed between the pre- and post-intervention periods, the proposed estimator will always be more efficient than the traditional estimator. A numerical study shows that when the distribution of incomplete data is unbalanced, the proposed estimator is superior when there is moderate-to-strong positive within-subject correlation. We further derive a closed-form sample size formula to help researchers determine how many subjects need to be enrolled in such studies. Simulation results suggest that the calculated sample size maintains the empirical power and type I error under various design configurations. We demonstrate the proposed method using a real application example.
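The abstract does not spell out the hybrid estimator. As a generic illustration of the underlying idea of pooling paired and unpaired information, the sketch below inverse-variance-weights the two estimates of the pre-post risk difference on simulated data; the combination rule and all quantities are hypothetical, not the authors' estimator.

```python
# Generic illustration (not the paper's estimator): combine the complete-pair
# estimate of the pre-post risk difference with the unpaired estimate from
# subjects observed in only one period, weighting by inverse variance.
import numpy as np

def pooled_risk_difference(pre_paired, post_paired, pre_only, post_only):
    d = np.asarray(post_paired) - np.asarray(pre_paired)
    est_p, var_p = d.mean(), d.var(ddof=1) / d.size          # paired part
    p1, p0 = np.mean(post_only), np.mean(pre_only)
    var_u = p1 * (1 - p1) / len(post_only) + p0 * (1 - p0) / len(pre_only)
    est_u = p1 - p0                                          # unpaired part
    w_p, w_u = 1 / var_p, 1 / var_u
    est = (w_p * est_p + w_u * est_u) / (w_p + w_u)
    return est, 1 / (w_p + w_u), var_p

rng = np.random.default_rng(3)
n = 200
pre = rng.binomial(1, 0.4, n)
# post outcomes positively correlated with pre
post = np.where(rng.random(n) < 0.7, pre, rng.binomial(1, 0.6, n))
est, var, var_paired_only = pooled_risk_difference(
    pre[:150], post[:150], pre_only=pre[150:], post_only=post[150:])
print(est, var)
```

By construction the pooled variance is below the complete-pairs-only variance, which mirrors the efficiency gain the abstract proves for the balanced case.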


Session 31 Predictions in Clinical Trials

Predicting Smoking Cessation Outcomes Beyond Clinical Trials
Yimei Li, E. Paul Wileyto and Daniel F. Heitjan
University of Pennsylvania
yimeili@mail.med.upenn.edu
In smoking cessation trials, subjects usually receive treatment for several weeks, with additional information collected 6 or 12 months after that. An important question concerns predicting long-term cessation success based on short-term clinical observations. Several features need to be considered. First, subjects commonly transition several times between lapse and recovery, during which they exhibit both temporary and permanent quits and both brief and long-term lapses. Second, although we have some reliable predictors of outcome, there is also substantial heterogeneity in the data. We therefore introduce a cure-mixture frailty model that describes the complex process of transitions between abstinence and smoking. Then, based on this model, we propose a Bayesian approach to predict individual future outcomes. We compare predictions from our model to a variety of ad hoc methods.

Bayesian Event And Time Landmark Estimation In Clinical Trials When Responses Are Failure Time Data
Haoda Fu, Luping Zhao and Yanping Wang
Eli Lilly and Company
fuhaoda@gmail.com
In oncology trials it is challenging to predict when a certain number of events will have occurred, or how many additional events will be observed over a given period of time. We develop a tool called BEATLES, which stands for Bayesian Event And Time Landmark Estimation Software. This method and the associated tools have been broadly implemented at Lilly. In this talk we will present the technical details.

Predicting the Probability of Future Clinical Study Success Based on the Evidence from Electronic Medical Record (EMR) Data
Haoda Fu1 and Nan Jia2

1Eli Lilly and Company
2University of Southern California
jia_nan2@lilly.com
To compare a treatment with a control via a randomized clinical trial, the assessment of treatment efficacy is often based on an overall treatment effect over a specific study population. To increase the probability of study success (PrSS), it is important to choose an appropriate and relevant study population where the treatment is expected to show overall benefit over the control. This research predicts the PrSS based on EMR data for a given patient population. We can therefore use this approach to refine the study inclusion and exclusion criteria to increase the PrSS. For learning from EMR data we also develop covariate balancing methods. Although our methods are developed for learning from EMR data, learning from randomized controlled trials is a special case of our methods.

Weibull Cure-Mixture Model for the Prediction of Event Times in Randomized Clinical Trials
Gui-shuang Ying1, Qiang Zhang2, Yimei Li1 and Daniel F. Heitjan1

1University of Pennsylvania
2Radiation Therapy Oncology Group Statistical Center
gsying@mail.med.upenn.edu
Many clinical trials with time-to-event outcomes are designed to perform interim and final analyses upon the occurrence of a pre-specified number of events. As an aid to trial logistical planning, it is desirable to predict the time to reach such landmark event numbers. Our previously developed parametric (exponential and Weibull) prediction models assume that every trial participant is susceptible to the event of interest and will eventually experience the event if follow-up is long enough. This assumption may not hold, as some trial participants may be cured of the fatal disease, and failure to accommodate the possibility of cure may lead to biased prediction. In this talk, a Weibull cure-mixture prediction model will be presented that assumes the trial participants are a mixture of susceptible (uncured) and non-susceptible (cured) participants. The cure probability is modeled using logistic regression, and the time to event among susceptible participants is modeled by a two-parameter Weibull distribution. A comparison of predictions from the Weibull cure-mixture model with those from the standard Weibull prediction model will be demonstrated using data from a randomized trial of oropharyngeal cancer.
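The mixture structure described implies a survival function S(t) = pi(x) + (1 - pi(x)) S_Weibull(t), where pi(x) is the logistic cure probability. A small simulation sketch with hypothetical parameters shows how event counts by a landmark time follow from the two components:

```python
# Weibull cure-mixture sketch: cure status from a logistic model, event times
# for susceptible subjects from a Weibull(shape, scale). Parameters hypothetical.
import numpy as np

rng = np.random.default_rng(7)

def simulate_cure_mixture(x, beta0=-1.0, beta1=0.8, shape=1.5, scale=24.0):
    """Return event times (np.inf for cured subjects, who never have the event)."""
    p_cure = 1 / (1 + np.exp(-(beta0 + beta1 * x)))
    cured = rng.random(x.size) < p_cure
    t = scale * rng.weibull(shape, x.size)  # numpy's weibull is scale-1
    return np.where(cured, np.inf, t), p_cure

x = rng.normal(size=100_000)
times, p_cure = simulate_cure_mixture(x)

# Expected fraction with an event by month 36: (1 - pi) * F_Weibull(36)
t0 = 36.0
analytic = np.mean((1 - p_cure) * (1 - np.exp(-(t0 / 24.0) ** 1.5)))
print(np.mean(times <= t0), analytic)  # empirical vs. analytic, should agree
```

Ignoring the cure fraction (setting pi = 0) inflates the predicted event count, which is exactly the bias the talk's model is designed to remove.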

Session 32 Recent Advances in Statistical Genetics

Longitudinal Exome-Focused GWAS of Alcohol Use in a Veteran Cohort
Zuoheng Wang, Zhong Wang, Amy C. Justice and Ke Xu
Yale University
zuoheng.wang@yale.edu
Alcohol dependence (AD) is a major public health concern in the United States and contributes to the pathogenesis of many diseases. The risk of AD is multifactorial and includes shared genetic and environmental factors. However, gene mapping in AD has not yet been successful: the confirmed associations account for a small proportion of the overall genetic risk. Multiple measurements in longitudinal genetic studies provide a route to reduce noise and correspondingly increase the strength of signals in genome-wide association studies (GWAS). In this study, we developed a powerful statistical method for testing the joint effect of genetic variants within a gene region on diseases measured over multiple time points. We applied the new method to a longitudinal study of a veteran cohort with both HIV-infected and HIV-uninfected patients to understand the genetic risk underlying AD. We found an interesting gene that has been reported in an HIV study, suggestive of a potential gene-by-environment effect in alcohol use and HIV. We also conducted simulation studies to assess the performance of the new statistical method and demonstrated a power gain from taking advantage of repeated measurements and aggregating information across a biological region. This study not only contributes to the statistical toolbox of current GWAS but also potentially advances our understanding of the etiology of AD.

Type I Error in Regression-based Genetic Model Building
Heejong Sung1, Alexa J.M. Sorant1, Bhoom Suktitipat2 and Alexander F. Wilson1

1National Institutes of Health
2Mahidol University
sunghe@mail.nih.gov
The task of identifying genetic variants contributing to trait variation is increasingly challenging given the large number and density of variant data being produced. Current methods of analyzing these data include regression-based variable selection methods, which produce linear models incorporating the chosen variants. For example, the Tiled Regression method begins by examining relatively small segments of the genome called tiles. Selection of significant predictors, if any, is done first within individual tiles. However, type I error rates for such methods have not been fully investigated, particularly


considering correlation among variants. To investigate type I error in this situation, we simulated a mini-GWAS genome including 306,097 SNPs in 4,000 unrelated samples with 2,000 non-genetic traits. Initially, 53,060 tiles were defined by dividing the genome according to recombination hotspots. Then larger tiles were defined by combining groups of ten consecutive tiles. Stepwise regression and LASSO variable selection methods were performed within tiles for each tile definition. Type I error rates were calculated as the number of selected variants divided by the number considered, averaged over the 2,000 phenotypes. Overall error rates for stepwise regression, using a fixed selection criterion of 0.05, and for LASSO, minimizing mean squared error, were 0.04 and 0.12, respectively, when using the initial (smaller) tiles. Considering separately each combination of tile size (number of SNPs) and multicollinearity (defined as 1 - the determinant of the genotype correlation matrix), observed type I error rates for stepwise regression tended to increase with the number of variants and decrease with increasing multicollinearity. With LASSO, the trends were in the opposite direction. When the larger tiles were used, overall rates for LASSO were noticeably smaller, while overall rates were rather robust for stepwise regression.

GMDR: A Conceptual Framework for Detection of Multifactor Interactions Underlying Complex Traits
Xiang-Yang Lou
University of Alabama at Birmingham
xylou@uab.edu

Biological outcomes are governed by multiple genetic and environmental factors that act in concert. Determining multifactor interactions is a primary topic of interest in recent genetics studies, but it presents enormous statistical and mathematical challenges. The computationally efficient multifactor dimensionality reduction (MDR) approach has emerged as a promising tool for meeting these challenges. On the other hand, complex traits are expressed in various forms and have different data generation mechanisms that cannot be appropriately modeled by a dichotomous model, and the subjects in a study may be recruited according to its own analytical goals, research strategies, and available resources, not only as homogeneous unrelated individuals. Although several modifications and extensions of MDR have in part addressed the practical problems, they remain limited for statistical analyses of diverse phenotypes, multivariate phenotypes, and correlated observations, for correcting for potential population stratification, and for unifying both unrelated and family samples into a more powerful analysis. I propose a comprehensive statistical framework, referred to as generalized MDR (GMDR), for systematic extension of MDR. The proposed approach is quite versatile: it not only allows for covariate adjustment and is suitable for analyzing almost any trait type, e.g., binary, count, continuous, polytomous, ordinal, time-to-onset, multivariate, and others, as well as combinations of those, but is also applicable to various study designs, including homogeneous and admixed, unrelated-subject and family, as well as mixtures of them. The proposed GMDR offers an important addition to the arsenal of analytical tools for identifying nonlinear multifactor interactions and unraveling the genetic architecture of complex traits.

Gene-Gene Interaction Analysis for Rare Variants: Application to T2D Exome Sequencing Data
Taesung Park1, Min-Seok Kwon1 and Seung Yeoun Lee2
1Seoul National University
2Sejong University
tspark@stats.snu.ac.kr

The heritability of complex diseases may not be fully explained by common variants. This missing heritability could be partly due to gene-gene interactions and rare variants. There has been exponential growth in gene-gene interaction analysis for common variants, in terms of both methodological developments and practical applications. Also, the recent advance of high-throughput sequencing technologies makes it possible to conduct rare variant analyses. However, little progress has been made in gene-gene interaction analysis for rare variants. Here, we propose a new gene-gene interaction method for rare variants in the framework of multifactor dimensionality reduction (MDR) analysis. The proposed method consists of two steps. The first step is to collapse the rare variants in a specific region such as a gene. The second step is to perform MDR analysis on the collapsed rare variants. The proposed method is illustrated with 1,080 whole-exome sequencing samples from a Korean population to identify causal gene-gene interactions among rare variants for type 2 diabetes.
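The two steps can be sketched generically: collapsing reduces each gene to a burden indicator, and the core MDR step labels each cell of the resulting low-dimensional table high- or low-risk by its case-control ratio. The genotypes, allele frequencies, and penetrances below are simulated and purely illustrative.

```python
# Sketch of the two-step idea: (1) collapse rare variants per gene into a
# burden indicator; (2) MDR-style labeling of each combination of two
# collapsed genes as high/low risk by the case-control ratio. Data simulated.
import numpy as np

rng = np.random.default_rng(11)
n = 1080

def collapse(genotypes):
    """Gene-level burden: 1 if the subject carries any rare allele in the gene."""
    return (genotypes > 0).any(axis=1).astype(int)

gene_a = collapse(rng.binomial(2, 0.01, size=(n, 25)))  # 25 rare variants
gene_b = collapse(rng.binomial(2, 0.01, size=(n, 40)))  # 40 rare variants
# disease risk depends on carrying a burden in both genes (an interaction)
p = np.where((gene_a == 1) & (gene_b == 1), 0.8, 0.3)
case = rng.binomial(1, p)

# MDR-style cell labeling on the 2x2 collapsed table
overall = case.mean() / (1 - case.mean())
high_risk = {}
for a in (0, 1):
    for b in (0, 1):
        m = (gene_a == a) & (gene_b == b)
        ratio = case[m].mean() / max(1e-9, 1 - case[m].mean())
        high_risk[(a, b)] = ratio > overall
print(high_risk)
```

In a full MDR analysis this labeling feeds into cross-validated classification accuracy over all gene pairs; the sketch shows only the core cell-classification step.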

Session 33 Structured Approach to High Dimensional Data with Sparsity and Low Rank Factorization

Two-way Regularized Matrix Decomposition
Jianhua Huang
Texas A&M University
jianhua@stat.tamu.edu
Matrix decomposition (or low-rank matrix approximation) plays an important role in various statistical learning problems. Regularization has been introduced to matrix decomposition to achieve stability, especially when the row or column dimension is high. When both the row and column domains of the matrix are structured, it is natural to employ a two-way regularization penalty in low-rank matrix approximation. This talk discusses the importance of considering invariance when designing the two-way penalty and shows undesirable properties of some penalties used in the literature when the invariance is ignored.

Tensor Regression with Applications in Neuroimaging Analysis
Hua Zhou1, Lexin Li1 and Hongtu Zhu2

1North Carolina State University
2University of North Carolina at Chapel Hill
lli10@ncsu.edu
Classical regression methods treat covariates as a vector and estimate a corresponding vector of regression coefficients. Modern applications in medical imaging generate covariates of more complex form, such as multidimensional arrays (tensors). Traditional statistical and computational methods are compromised for the analysis of such high-throughput data due to their ultrahigh dimensionality as well as complex structure. In this talk I will discuss a new class of tensor regression models that efficiently exploit the special structure of tensor covariates. Under this framework, ultrahigh dimensionality is reduced to a manageable level, resulting in efficient estimation and prediction. Regularization, of both hard-thresholding and soft-thresholding types, will be carefully examined. The new methods aim to address a family of neuroimaging problems, including using brain images to diagnose neurodegenerative disorders, to predict the onset of neuropsychiatric diseases, and to identify disease-relevant brain regions or activity patterns.

RKHS-Embedding Based Feature Screening for High-Dimensional Data
Krishnakumar Balasubramanian1, Bharath Sriperambadur2 and Guy Lebanon1


1Georgia Institute of Technology
2Pennsylvania State University
krishnakumar3@gatech.edu
Feature screening is a key step in handling ultrahigh dimensional data sets that are ubiquitous in modern statistical problems. Over the last decade, convex relaxation based approaches (e.g., the Lasso and the sparse additive model) have been extensively developed and analyzed for feature selection in the high dimensional regime. But these approaches suffer from several problems, both computationally and statistically. To overcome these issues, we propose a novel Hilbert space embedding based approach for independence screening in ultrahigh dimensional data sets. The proposed approach is model-free (i.e., no model assumption is made between response and predictors) and can handle non-standard (e.g., graphs) and multivariate outputs directly. We establish the sure screening property of the proposed approach in the ultrahigh dimensional regime and experimentally demonstrate its advantages over other approaches.
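The talk's exact statistic is not given in the abstract; a generic kernel-embedding dependence measure (HSIC with Gaussian kernels) illustrates how model-free screening by ranking features against the response can work:

```python
# Model-free screening sketch using HSIC (a kernel dependence measure) with
# Gaussian kernels: rank features by HSIC with the response, keep the top few.
# A generic illustration, not the talk's exact statistic.
import numpy as np

def gaussian_gram(v, sigma=1.0):
    d2 = (v[:, None] - v[None, :]) ** 2
    return np.exp(-d2 / (2 * sigma ** 2))

def hsic(x, y):
    n = x.size
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    K, L = gaussian_gram(x), gaussian_gram(y)
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

rng = np.random.default_rng(5)
n, p = 200, 50
X = rng.normal(size=(n, p))
y = np.sin(X[:, 0]) + 0.25 * rng.normal(size=n)  # only feature 0 matters

scores = np.array([hsic(X[:, j], y) for j in range(p)])
top = np.argsort(scores)[::-1][:5]
print("top-ranked features:", top)
```

Because the statistic is a dependence measure rather than a regression fit, nonlinear relationships like the sine signal here are detected without specifying a model, which is the point of the embedding-based approach.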

Sparse Conditional Graphical Models for Structured Genetic Datasets
Hyonho Chun
Purdue University
chunh@purdue.edu
For the purpose of inferring a network, we consider a sparse Gaussian graphical model (SGGM) in the presence of a population structure, which often occurs in genetic studies with model organisms. In these studies, datasets are obtained by combining multiple lines of inbred organisms or by using outbred animals. Ignoring such population structures would produce false connections in a graph structure, but most research in graph inference has focused on independent cases. On the other hand, in regression settings, a linear mixed effect model has been widely used to account for correlations among observations. Besides its effectiveness, the linear mixed effect model has a generality: the model can be stated within a framework of penalized least squares. This generality makes it very flexible for use in settings other than regression. In this manuscript, we adopt a linear mixed effect model for an SGGM. Our formulation fits into the recently developed conditional Gaussian graphical model, in which the population structures are modeled as predictors and the graph is determined by a conditional precision matrix. The proposed approach is applied to the network inference problem in two datasets: the heterogeneous mice diversity panel (HMDP) and heterogeneous stock (HS) datasets.

Session 34 Recent Developments in Dimension Reduction, Variable Selection and Their Applications

Variable Selection and Model Estimation via Subtle Uprooting
Xiaogang Su
University of Texas at El Paso
xiaogangsu@gmail.com
We propose a new method, termed "subtle uprooting," for fitting GLMs by optimizing a smoothed information criterion. The significance of this approach is that it completes variable selection and parameter estimation in one single optimization step and avoids tuning penalty parameters, as is commonly done in traditional regularization approaches. Two technical maneuvers, "uprooting" and an epsilon-threshold procedure, are employed to enforce sparsity in the parameter estimates while maintaining the smoothness of the objective function. The formulation allows us to borrow strength from established methods and theories in both optimization and statistical estimation. More specifically, a modified BFGS algorithm (Li and Fukushima, 2001) with established global and super-linear convergence properties is adopted to solve the non-convex yet smooth programming problem. By making connections to M-estimators and information criteria, we also show that the proposed method is consistent in variable selection and efficient in estimating the nonzero parameters. As illustrated with both simulated experiments and data examples, the empirical performance is either comparable or superior to many other competitors.

Robust Variable Selection Through Dimension Reduction
Qin Wang
Virginia Commonwealth University
qwang3@vcu.edu
Dimension reduction and variable selection play important roles in high dimensional data analysis. MAVE (minimum average variance estimation) is an efficient approach proposed by Xia et al. (2002) to estimate the regression mean space. However, it is not robust to outliers in the dependent variable because of its use of the least-squares criterion. In this talk, we propose a robust estimation based on local modal regression, making it more applicable in practice. We further extend the new approach to select informative variables through shrinkage estimation. The efficacy of the new approach is illustrated through simulation studies.

Sparse Envelope Model: Efficient Estimation and Response Variable Selection in Multivariate Linear Regression
Zhihua Su1, Guangyu Zhu1 and Xin Chen2

1University of Florida
2National University of Singapore
zhihuasu@stat.ufl.edu
The envelope model, recently proposed by Cook, Li and Chiaromonte (2010), is a novel method for achieving efficient estimation in multivariate linear regression. It identifies the material and immaterial information in the data using the covariance structure among the responses. The subsequent analysis is based only on the material part and is therefore more efficient. The envelope estimator is consistent, but in the sample, the material part estimated by the envelope model consists of linear combinations of all the response variables, while in many applications it is important to pinpoint the response variables that are immaterial to the regression. For this purpose, we propose the sparse envelope model, which can identify these response variables and at the same time preserve the efficiency gains offered by the envelope model. A group-lasso type of penalty is employed to induce sparsity on the manifold structure of the envelope model. Consistency, the asymptotic distribution, and the oracle property of the estimator are established. In particular, new features of the oracle property with response selection are discussed. Simulation studies and an example demonstrate the effectiveness of this model.

Session 35 Post-Discontinuation Treatment in Randomized Clinical Trials

Marginal Structural Model with Adaptive Truncation in Estimating the Initial Treatment Effect with Informative Censoring by Subsequent Therapy
Jingyi Liu1, Li Li1, Xiaofei Bai2 and Douglas Faries1
1Eli Lilly and Company
2North Carolina State University
liu_jingyi@lilly.com
A randomized clinical trial is designed to estimate the direct effect of a treatment versus control, where patients receive the treatment of interest or control by random assignment. The treatment effect is measured by comparing endpoints of interest, e.g., overall survival. However, in some trials, patients who discontinue their initial randomized treatment are allowed to switch to another treatment based on the clinician's or patient's subjective decision. In such cases, the primary endpoint is censored, and the direct treatment effect of interest may be confounded by subsequent treatments, especially when the subsequent treatments have a large impact on the endpoint. In such studies there usually exist variables that are both risk factors for the primary endpoint and predictors of initiation of subsequent treatment. Such variables are called time-dependent confounders. When time-dependent confounders exist, traditional methods such as the intent-to-treat (ITT) analysis and the time-dependent Cox model may not adjust for them appropriately, resulting in biased estimators. Marginal structural models (MSM) have been applied to estimate the causal treatment effect when the initial treatment effect is confounded by subsequent treatments. It has been shown that an MSM utilizing inverse propensity weighting generates consistent estimators when the other nuisance parameters are correctly modeled. However, the occurrence of very large weights can inflate the variance of the estimator, and consistency may not hold. The augmented MSM estimator was proposed to estimate the treatment effect more efficiently, but it may not perform as well as expected in the presence of large weights. In this paper, we propose a new method that estimates weights by adaptively truncating longitudinal weights in the MSM. This method sacrifices consistency but gains efficiency when large weights exist, without ad hoc selection and removal of observations with large weights. We conducted simulation studies to explore the performance of several methods, including the ITT analysis, the Cox model, and the proposed method, with respect to bias, standard deviation, coverage rate of the confidence interval, and mean squared error (MSE) under various scenarios. We also applied these methods to a randomized, open-label Phase III study of patients with non-squamous non-small cell lung cancer.
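The truncation component can be illustrated generically: stabilized inverse-propensity weights capped at an upper quantile, trading a little bias for a large variance reduction. The paper's cap is chosen adaptively; the sketch below uses a fixed 99th percentile on simulated single-time-point data, purely for illustration.

```python
# Generic weight-truncation sketch for an MSM-style analysis: stabilized
# inverse-propensity weights, capped at an upper percentile to tame variance
# at the cost of some bias. The adaptive cap of the paper is replaced here by
# a fixed 99th percentile; all data are simulated.
import numpy as np

rng = np.random.default_rng(2)
n = 5_000
x = rng.normal(size=n)                      # confounder (baseline, for simplicity)
ps = 1 / (1 + np.exp(-(0.5 + 1.5 * x)))     # true propensity of treatment
a = rng.binomial(1, ps)
w = np.where(a == 1, 1 / ps, 1 / (1 - ps))  # inverse-propensity weights
sw = w * np.where(a == 1, a.mean(), 1 - a.mean())  # stabilized weights

def truncate(weights, q=99):
    cap = np.percentile(weights, q)
    return np.minimum(weights, cap)

tw = truncate(sw)
print(f"max weight {sw.max():.1f} -> {tw.max():.1f}; "
      f"variance {sw.var():.2f} -> {tw.var():.2f}")
```

In the longitudinal setting the weights are products over visits, which makes the tails even heavier, so the choice of cap matters more, hence the paper's adaptive rule.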

Quantile Regression Adjusting for Dependent Censoring from Semi-Competing Risks
Ruosha Li1 and Limin Peng2

1University of Pittsburgh
2Emory University
rul12@pitt.edu

In this work we study quantile regression when the response is an event time subject to potentially dependent censoring. We consider the semi-competing risks setting, where the time to censoring remains observable after the occurrence of the event of interest. While such a scenario frequently arises in biomedical studies, most current quantile regression methods for censored data are not applicable because they generally require the censoring time and the event time to be independent. By imposing rather mild assumptions on the association structure between the time-to-event response and the censoring time variable, we propose quantile regression procedures which allow us to garner a comprehensive view of the covariate effects on the event time outcome as well as to examine the informativeness of censoring. An efficient and stable algorithm is provided for implementing the new method. We establish the asymptotic properties of the resulting estimators, including uniform consistency and weak convergence. Extensive simulation studies suggest that the proposed method performs well with moderate sample sizes. We illustrate the practical utility of our proposals through an application to a bone marrow transplant trial.

Overview of Crossover Design
Ming Zhu
AbbVie Inc
zhuming83@gmail.com

Crossover designs are used in many clinical trials. Compared to the conventional parallel design, a crossover design has the advantage of avoiding comparability issues between study and control groups with regard to potential confounding variables. Moreover, a crossover design is more efficient than a parallel design in that it requires a smaller sample size for given type I and type II error rates. However, crossover designs may suffer from carry-over effects, which can bias the interpretation of the data analysis. In this presentation I will discuss general considerations and pitfalls to be avoided in the planning and analysis of a crossover trial. Appropriate statistical methods for the analysis of crossover trials will also be described.
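The sample-size advantage mentioned in this abstract can be made concrete with a standard normal-approximation calculation. This is a hedged sketch, not material from the talk: the formulas below are the textbook ones for a two-arm parallel trial versus a 2x2 crossover with within-subject correlation rho (period effects ignored), and all numeric inputs are illustrative.

```python
# Sketch: total sample size for a parallel two-arm trial vs. a 2x2 crossover.
# delta = treatment difference to detect, sigma = outcome SD, rho =
# within-subject correlation; all values are illustrative.
from math import ceil
from statistics import NormalDist

def parallel_total_n(delta, sigma, alpha=0.05, power=0.8):
    z = NormalDist().inv_cdf
    k = (z(1 - alpha / 2) + z(power)) ** 2
    per_arm = 2 * sigma**2 * k / delta**2
    return 2 * ceil(per_arm)

def crossover_total_n(delta, sigma, rho, alpha=0.05, power=0.8):
    # Within-subject differences have variance 2*sigma^2*(1 - rho);
    # each subject contributes one difference.
    z = NormalDist().inv_cdf
    k = (z(1 - alpha / 2) + z(power)) ** 2
    return ceil(2 * sigma**2 * (1 - rho) * k / delta**2)

print(parallel_total_n(0.5, 1.0))        # 126 subjects in total
print(crossover_total_n(0.5, 1.0, 0.5))  # 32 subjects in total
```

With rho = 0.5 the crossover needs roughly (1 - rho)/2 = one quarter of the subjects of the parallel design, which is the efficiency gain the abstract alludes to.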

Cross-Payer Effects of Medicaid LTSS on Medicare Resource Use using Propensity Score Risk Profiling
Yi Huang, Anthony Tucker, and Karen Johnson
University of Maryland
yihuang@umbc.edu

Medicaid administrators look to establish a better balance between long-term services and supports (LTSS) provided in the community and in institutions, and to better integrate acute and long-term care for recipients who are dually eligible for Medicare. Programs of integrated care will require a solid understanding of the interactive effects that are masked by the separation of Medicare and Medicaid. This paper aims to evaluate the causal effect of Maryland's Older Adult Waiver (OAW) program on Medicare spending using propensity-score-based health risk profiling. Specifically, dually eligible recipients enrolled in Maryland's OAW program were identified as the treatment group, and matched "control" groups were drawn from a comparable population who did not receive those services. The broader impact of this study is that such statistical approaches can be adopted by any state to facilitate the improvement of the quality and cost effectiveness of LTSS for duals.

Session 36 New Advances in Semi-Parametric Modeling and Survival Analysis

Bayesian Partial Linear Model for Skewed Longitudinal Data
Yuanyuan Tang1, Debajyoti Sinha2, Debdeep Pati2, Stuart Lipsitz3, and Steven Lipshultz4
1AbbVie Inc
2Florida State University
3Brigham and Women's Hospital
4University of Miami
debdeep@stat.fsu.edu

Current statistical models and methods focusing on the mean response are not appropriate for longitudinal studies with heavily skewed continuous responses. For such longitudinal responses, we present a novel model accommodating a partially linear median regression function, a flexible Dirichlet process mixture prior for the skewed error distribution, and a within-subject association structure. We provide theoretical justifications for our methods, including asymptotic properties of the posterior and the semi-parametric Bayes estimators. We also provide simulation studies of finite-sample properties. Ease of computational implementation via available MCMC tools, and other advantages of our method compared to existing methods, are illustrated via analysis of a cardiotoxicity study of children of HIV-infected mothers.

2014 ICSA-KISS Applied Statistics Symposium, Portland, June 15-18

Abstracts

Nonparametric Inference for Inverse Probability Weighted Estimators with a Randomly Truncated Sample
Xu Zhang
University of Mississippi
xzhang2@umc.edu

A randomly truncated sample arises when the independent variables T and L are observable only if L < T. The truncated version of the Kaplan-Meier estimator is known to be the standard method for estimating the marginal distribution of T or L. The inverse probability weighted (IPW) estimator was suggested as an alternative, and its agreement with the truncated version of the Kaplan-Meier estimator has been proved. This paper centers on the weak convergence of IPW estimators and variance decomposition. The paper shows that the asymptotic variance of an IPW estimator can be decomposed into two sources: the variation of the IPW estimator using known weight functions is the primary source, and the variation due to estimated weights should be included as well. The variance decomposition establishes the connection between a truncated sample and a biased sample with known probabilities of selection. A simulation study was conducted to investigate the practical performance of the proposed variance estimators, as well as the relative magnitude of the two sources of variation for various truncation rates. A blood transfusion data set is analyzed to illustrate the nonparametric inference discussed in the paper.

Modeling Time-Varying Effects for High-Dimensional Covariates: A New Gateaux-Differential Boosting Approach
Kevin He, Yi Li, and Ji Zhu
University of Michigan
yili@umich.edu

Survival models with time-varying effects provide a flexible framework for modeling the effects of covariates on event times. However, the difficulty of model construction increases dramatically as the number of variables grows. Existing constrained optimization and boosting methods suffer from computational complexity. We propose a new Gateaux-differential-based boosting procedure for simultaneously selecting covariates and automatically determining their functional form. The proposed method is flexible in that it extends gradient boosting to functional differentials in a general parameter space. In each boosting learning step of this procedure, only the best-fitting base-learner (and therefore the most informative covariate) is added to the predictor, which consequently encourages sparsity. In addition, the method controls smoothness, which is crucial for improving predictive performance. The performance of the proposed method is examined by simulations and by an application to national kidney transplant data.

Flexible Modeling of Survival Data with Covariates Subject to Detection Limits via Multiple Imputation
Paul Bernhardt1, Judy Wang2, and Daowen Zhang2

1Villanova University
2North Carolina State University
dzhang2@ncsu.edu

Models for survival data generally assume that covariates are fully observed. However, in medical studies it is not uncommon for biomarkers to be censored at known detection limits. A computationally efficient multiple imputation procedure for modeling survival data with covariates subject to detection limits is proposed. This procedure is developed in the context of an accelerated failure time model with a flexible seminonparametric error distribution. An iterative version of the proposed multiple imputation algorithm that approximates the EM algorithm for maximum likelihood is suggested. Simulation studies demonstrate that the proposed multiple imputation methods work well, while alternative methods lead to estimates that are either biased or more variable. The proposed methods are applied to analyze the dataset from the recently conducted GenIMS study.
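The basic imputation step for a covariate censored at a detection limit can be sketched as follows. This is an illustrative sketch only: it draws from a truncated normal via the inverse-CDF trick under an assumed normal covariate model, whereas the abstract's procedure works within an accelerated failure time model with a seminonparametric error distribution, and mu, sigma, and the detection limit below are invented.

```python
# Sketch: impute covariate values recorded only as "< DL" (detection limit)
# by sampling from an assumed Normal(mu, sigma^2) truncated above at DL.
# mu, sigma, and dl are illustrative, not estimates from any real study.
import random
from statistics import NormalDist

def impute_below_dl(n_missing, mu, sigma, dl, rng):
    """Inverse-CDF draws from N(mu, sigma^2) truncated to (-inf, dl)."""
    z = NormalDist()
    p_dl = z.cdf((dl - mu) / sigma)      # probability mass below the limit
    draws = []
    for _ in range(n_missing):
        u = rng.random() * p_dl          # uniform on (0, P(X < dl))
        draws.append(mu + sigma * z.inv_cdf(u))
    return draws

rng = random.Random(1)
imputed = impute_below_dl(5, mu=2.0, sigma=1.0, dl=1.5, rng=rng)
print(imputed)  # every imputed value lies below the detection limit
```

In a multiple imputation procedure this draw would be repeated across several completed datasets, with mu and sigma themselves re-estimated at each iteration.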

Session 37 High-Dimensional Data Analysis Theory and Application

Structured Functional Additive Regression in Reproducing Kernel Hilbert Spaces
Hao Helen Zhang
University of Arizona
hzhang@math.arizona.edu

A new class of semiparametric functional regression models is considered to jointly model functional and non-functional predictors, identifying important scalar covariates while taking the functional covariate into account. In particular, we exploit a unified linear structure to incorporate the functional predictor, as in classical functional linear models, which is nonparametric in nature. At the same time, we include a potentially large number of scalar predictors as the parametric part, which may be reduced to a sparse representation. The new method performs variable selection and estimation by naturally combining functional principal component analysis (FPCA) and SCAD-penalized regression in one framework. Theoretical and empirical investigation reveals that efficient estimation of the important scalar predictors can be obtained and enjoys the oracle property, despite contamination by the noise-prone functional covariate. The study also sheds light on the influence of the number of eigenfunctions used to model the functional predictor on the correctness of model selection and the accuracy of the scalar estimates.

High-Dimensional Thresholded Regression and Shrinkage Effect
Zemin Zheng, Yingying Fan, and Jinchi Lv
University of Southern California
zeminzhe@usc.edu

High-dimensional sparse modeling via regularization provides a powerful tool for analyzing large-scale data sets and obtaining meaningful, interpretable models. The use of nonconvex penalty functions shows advantages in selecting important features in high dimensions, but the global optimality of such methods still demands more understanding. In this paper we consider sparse regression with a hard-thresholding penalty, which we show gives rise to thresholded regression. This approach is motivated by its close connection with L0-regularization, which can be unrealistic to implement in practice but has appealing sampling properties, and by its computational advantage. Under some mild regularity conditions allowing possibly exponentially growing dimensionality, we establish oracle inequalities for the resulting regularized estimator, as the global minimizer, under various prediction and variable selection losses, as well as oracle risk inequalities for the hard-thresholded estimator followed by a further L2-regularization. The risk properties exhibit interesting shrinkage effects under both estimation and prediction losses. We identify the optimal choice of the ridge parameter, which is shown to have simultaneous advantages for both the L2-loss and the prediction loss. These new results and phenomena are evidenced by simulation and real-data examples.
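The hard-thresholding penalty in this abstract has a simple closed form in the orthogonal-design special case, which can be sketched as follows. This is an illustrative special case, not the paper's general estimator: with orthonormal columns, the penalized least-squares solution reduces to hard-thresholding the marginal coefficients X'y, and the data below are invented.

```python
# Sketch: hard thresholding, the building block of thresholded regression.
# With an orthonormal design (X'X = I), minimizing
#   0.5*||y - X b||^2 + 0.5*lam^2*||b||_0
# is solved coordinate-wise: keep z_j = (X'y)_j when |z_j| > lam,
# set it to zero otherwise. lam and the data are illustrative.
def hard_threshold(z, lam):
    return [zj if abs(zj) > lam else 0.0 for zj in z]

# Marginal coefficients X'y for a toy orthonormal design: two strong
# signals among noise-level entries.
z = [3.1, -0.2, 0.15, -2.4, 0.05]
beta_hat = hard_threshold(z, lam=1.0)
print(beta_hat)  # [3.1, 0.0, 0.0, -2.4, 0.0]
```

Unlike soft thresholding, the surviving coefficients are not shrunk toward zero, which is the source of the low bias on strong signals that the abstract's risk analysis studies.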

Local Independence Feature Screening for Nonparametric and Semiparametric Models by Marginal Empirical Likelihood
Jinyuan Chang1, Cheng Yong Tang2, and Yichao Wu3



1University of Melbourne
2University of Colorado Denver
3North Carolina State University
chengyongtang@ucdenver.edu

We consider an independence feature screening method for identifying contributing explanatory variables in high-dimensional regression analysis. Our approach is constructed by using the empirical likelihood approach in conjunction with marginal nonparametric regressions to surely capture the local impacts of explanatory variables. Without requiring a specific parametric form of the underlying data model, our approach can be applied to a broad range of representative nonparametric and semiparametric models, including but not limited to nonparametric additive models, single-index and multiple-index models, and varying coefficient models. Facilitated by the marginal empirical likelihood, our approach addresses the independence feature screening problem with a new insight, by directly assessing evidence from the data on whether an explanatory variable contributes locally to the response variable or not. Such a feature avoids the estimation step in most existing independence screening approaches, and is advantageous in scenarios such as single-index models, where identification of the marginal effect for its estimation is an issue. Theoretical analysis shows that the proposed feature screening approach can handle data dimensionality growing exponentially with the sample size. Through extensive theoretical illustrations and empirical examples, we show that the local independence screening approach works promisingly.

The Fused Kolmogorov Filter: A Nonparametric Model-Free Screening Method
Qing Mai1 and Hui Zou2

1Florida State University
2University of Minnesota
mai@stat.fsu.edu

A new model-free screening method, named the fused Kolmogorov filter, is proposed for high-dimensional data analysis. This new method is fully nonparametric and can work with many types of covariates and response variables, including continuous, discrete, and categorical variables. We apply the fused Kolmogorov filter to variable screening problems arising in a wide range of applications, such as multiclass classification, nonparametric regression, and Poisson regression, among others. It is shown that the fused Kolmogorov filter enjoys the sure screening property under weak regularity conditions that are much milder than those required by many existing nonparametric screening methods. In particular, the fused Kolmogorov filter can still be powerful when covariates are strongly dependent on each other. We further demonstrate the superior performance of the fused Kolmogorov filter over existing screening methods by simulations and real-data examples.
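The idea behind Kolmogorov-filter-type screening can be sketched in a few lines. This is a simplified illustration of the general approach, not the authors' exact statistic: here each covariate is scored by the largest Kolmogorov-Smirnov distance between its conditional distributions across response slices, fused over two slicing schemes, and the data are synthetic.

```python
# Sketch: score covariates by the maximum KS distance between their
# empirical conditional distributions given slices of the response,
# summed ("fused") over several numbers of slices. Data are synthetic.
import random

def ks_distance(a, b):
    """Two-sample Kolmogorov-Smirnov distance between samples a and b."""
    pts = sorted(set(a) | set(b))
    f = lambda s, t: sum(1 for v in s if v <= t) / len(s)
    return max(abs(f(a, t) - f(b, t)) for t in pts)

def kolmogorov_score(x, y, slice_counts=(2, 3)):
    """Fuse max pairwise KS distances across response slices."""
    n, score = len(y), 0.0
    order = sorted(range(n), key=lambda i: y[i])
    for g in slice_counts:
        slices = [[x[i] for i in order[k * n // g:(k + 1) * n // g]]
                  for k in range(g)]
        score += max(ks_distance(s, t)
                     for s in slices for t in slices if s is not t)
    return score

rng = random.Random(7)
n = 200
x1 = [rng.gauss(0, 1) for _ in range(n)]   # informative covariate
x2 = [rng.gauss(0, 1) for _ in range(n)]   # pure noise
y = [a + 0.3 * rng.gauss(0, 1) for a in x1]
print(kolmogorov_score(x1, y) > kolmogorov_score(x2, y))
```

Covariates are then ranked by this score and the top few retained, which is the model-free screening step the abstract describes.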

Session 38 Leading Across Boundaries: Leadership Development for Statisticians

Xiaoli Meng1, Dipak Dey2, Soonmin Park3, James Hung4, and Walter Offen5

1Harvard University
2University of Connecticut
3Eli Lilly and Company
4United States Food and Drug Administration
5AbbVie Inc
1meng@stat.harvard.edu
2dipakdey@uconn.edu
3park_soomin@lilly.com
4hsienminghung@fda.hhs.gov
5walteroffen@abbvie.com

The statistician has long been valued as a critical collaborator in interdisciplinary research. Nevertheless, statisticians are often regarded as contributors more than leaders. This stereotype has limited statistics as a driving perspective in partnership environments and the inclusion of statisticians in executive decision making. More leadership skills are needed to prepare statisticians to play influential roles and to make our profession more impactful. In this panel session, statistical leaders from academia, government, and industry will share their insights about leadership and their experiences leading in their respective positions. Important leadership skills and qualities for statisticians will be discussed by the panelists. This session is targeted at statisticians who seek more knowledge and inspiration about leadership.

Session 39 Recent Advances in Adaptive Designs in Early Phase Trials

A Toxicity-Adaptive Isotonic Design for Combination Therapy in Oncology
Rui Qin
Mayo Clinic
qinrui@mayo.edu

With the development of molecularly targeted drugs in cancer treatment, combination therapy targeting multiple pathways to achieve potential synergy has become increasingly popular. While the dosing range of each individual drug may already be defined, the maximum tolerated dose of the combination therapy is yet to be determined in a new phase I trial. The possible dose-level combinations, which are only partially ordered, pose a great challenge for conventional dose-finding designs. We propose to estimate toxicity probabilities by isotonic regression and to incorporate the attribution of toxicity into the consideration of dose escalation and de-escalation for combination therapy. Simulation studies are conducted to understand and assess its operating characteristics under various scenarios. The application of this novel design in an ongoing phase I clinical trial with dual agents is further illustrated as an example.
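Isotonic regression of toxicity probabilities, as used in designs like the one above, is typically computed with the pool-adjacent-violators algorithm (PAVA). Here is a minimal self-contained sketch for a single ordered dose sequence; the partially ordered dose combinations in the abstract require more machinery than this, and the rates and weights below are invented.

```python
# Sketch: pool-adjacent-violators algorithm (PAVA) for isotonic regression.
# Given observed toxicity rates per dose and weights (e.g. patients per
# dose), return the best-fitting non-decreasing rates. Data are illustrative.
def pava(rates, weights):
    # Each block holds [weighted mean, total weight, number of doses merged].
    blocks = []
    for r, w in zip(rates, weights):
        blocks.append([r, w, 1])
        # Merge backwards while the monotonicity constraint is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            r2, w2, n2 = blocks.pop()
            r1, w1, n1 = blocks.pop()
            blocks.append([(r1 * w1 + r2 * w2) / (w1 + w2), w1 + w2, n1 + n2])
    out = []
    for r, _, n in blocks:
        out.extend([r] * n)
    return out

# Observed rates 0.375 then 0.25 violate monotonicity and are pooled.
print(pava([0.125, 0.375, 0.25], [8, 8, 8]))  # [0.125, 0.3125, 0.3125]
```

The pooled estimates respect the assumption that toxicity does not decrease with dose, which is what makes escalation and de-escalation decisions coherent.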

Calibration of the Likelihood Continual Reassessment Method for Phase I Clinical Trials
Shing Lee1, Xiaoyu Jia2, and Ying Kuen Cheung1

1Columbia University
2Boehringer Ingelheim Pharmaceuticals
sml2114@columbia.edu

The likelihood continual reassessment method is an adaptive model-based design used to estimate the maximum tolerated dose in phase I clinical trials. The method is generally implemented in a two-stage approach, whereby model-based dose escalation is activated after an initial sequence of patients is treated. While it has been shown that the method has good large-sample properties, in finite-sample settings it is important to specify a reasonable model. We propose a systematic approach to select the initial dose sequence and the skeleton based on the concepts of indifference interval and coherence. We compare these approaches to the traditional trial-and-error approach in the context of examples. The systematic calibration approach simplifies the model calibration process for the likelihood continual reassessment method while being competitive with a time-consuming trial-and-error process. We also share our experience using the calibration technique in real-life applications using the dfcrm package in R.
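For readers unfamiliar with the method, the core model-based update can be sketched generically. This is an illustrative sketch, not the dfcrm implementation: it uses the common one-parameter power model p_d = skeleton_d^exp(beta), finds the MLE by a crude grid search, and all doses, outcomes, and the skeleton are invented.

```python
# Sketch: one step of the likelihood continual reassessment method (CRM).
# skeleton = prior guesses of toxicity probability per dose; target = the
# desired toxicity rate. Doses and outcomes below are illustrative.
from math import exp, log

def crm_next_dose(skeleton, target, doses, tox, grid_width=3.0, grid_n=601):
    """MLE of beta in p_d = skeleton[d]**exp(beta), then recommend the dose
    whose estimated toxicity probability is closest to the target."""
    def loglik(beta):
        ll = 0.0
        for d, y in zip(doses, tox):
            p = skeleton[d] ** exp(beta)
            ll += y * log(p) + (1 - y) * log(1 - p)
        return ll
    grid = [-grid_width + 2 * grid_width * i / (grid_n - 1)
            for i in range(grid_n)]
    beta_hat = max(grid, key=loglik)
    probs = [s ** exp(beta_hat) for s in skeleton]
    return min(range(len(skeleton)), key=lambda d: abs(probs[d] - target))

skeleton = [0.05, 0.12, 0.25, 0.40]
# Six patients treated (0-based dose indices), one toxicity at dose 1.
rec = crm_next_dose(skeleton, 0.25,
                    doses=[0, 0, 0, 1, 1, 1], tox=[0, 0, 0, 0, 0, 1])
print(rec)
```

The calibration problem the abstract addresses is precisely the choice of the skeleton values and the initial (pre-model) dose sequence that feed into an update like this.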

Sequential Subset Selection Procedures of Random Subset Size for Early Phase Clinical Trials
Cheng-Shiun Leu and Bruce Levin
Columbia University
cl94@columbia.edu

In early phase clinical trials, the objective is often to select a subset of promising candidate treatments, whose treatment effects are greater than those of the remaining candidates by at least a pre-specified amount, to bring forward for phase III confirmatory testing. Under certain constraints, such as budgetary limitations or difficulty of recruitment, a procedure which selects a subset of fixed, pre-specified size is entirely appropriate, especially when the number of treatments available for further testing is limited. However, clinicians and researchers often demand to identify all efficacious treatments in the screening process, and a subset selection of fixed size may not satisfy this requirement, as the number of efficacious treatments is unknown prior to the experiment. To address this issue, we discuss a family of sequential subset selection procedures which identify a subset of efficacious treatments of random size, thereby avoiding the need to pre-specify the subset size. Various versions of the procedure allow adaptive sequential elimination of inferior treatments and sequential recruitment of superior treatments as the experiment progresses. We compare these new procedures with Gupta's random-subset-size procedure for selecting the one best candidate by simulation.

Search Procedures for the MTD in Phase I Trials
Shelemyahu Zacks
Binghamton University
shelly@math.binghamton.edu

There are several competing methods of search for the MTD in phase I cancer clinical trials. This paper will review some procedures and compare their operating characteristics. In particular, the EWOC method of Rogatko et al. will be highlighted.

Session 40 High Dimensional Regression/Machine Learning

Variable Selection for High-Dimensional Nonparametric Ordinary Differential Equation Models with Applications to Dynamic Gene Regulatory Networks
Hongqi Xue1, Tao Lu2, Hua Liang3, and Hulin Wu1

1University of Rochester
2State University of New York at Albany
3George Washington University
hongqi_xue@urmc.rochester.edu

The gene regulation network (GRN) is a high-dimensional complex system which can be represented by various mathematical or statistical models. The ordinary differential equation (ODE) model is one of the popular dynamic GRN models. High-dimensional linear ODE models have been proposed to identify GRNs, but with the limitation of assuming linear regulation effects. We propose a nonparametric additive ODE model, coupled with two-stage smoothing-based ODE estimation methods and adaptive group LASSO techniques, to model dynamic GRNs while flexibly dealing with nonlinear regulation effects. The asymptotic properties of the proposed method are established under the "large p, small n" setting. Simulation studies are performed to validate the proposed approach. An application example, identifying the nonlinear dynamic GRN of T-cell activation, is used to illustrate the usefulness of the proposed method.

Big Data: Sign Cauchy Projections and Chi-Square Kernels
Ping Li1, Gennady Samorodnitsky2, and John Hopcroft2
1Rutgers University
2Cornell University
pingli98@gmail.com

The method of stable random projections is useful for efficiently approximating the lα distance in high dimensions, and it is naturally suitable for data streams. In this paper we propose to use only the signs of the projected data for α = 1 (i.e., Cauchy random projections); we show that the probability of collision can be accurately approximated as a function of the chi-square (χ2) similarity. In text and vision applications, the χ2 similarity is a popular measure when the features are generated from histograms (which are a typical example of data streams). Experiments confirm that the proposed method is promising for large-scale learning applications. The full paper is available at arXiv:1308.1009.
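The collision idea can be sketched empirically. This is a toy illustration, not the paper's analysis: it estimates the sign-collision rate under Cauchy projections by simulation and computes the χ2 similarity directly, without using the paper's approximation formula, and all histograms are invented.

```python
# Sketch: signs of Cauchy random projections and the chi-square similarity.
# The histograms and the number of projections (2000) are illustrative.
import random
from math import copysign

def chi2_similarity(x, y):
    """Chi-square similarity sum_i 2*x_i*y_i/(x_i+y_i), normalized inputs."""
    return sum(2 * a * b / (a + b) for a, b in zip(x, y) if a + b > 0)

def sign_collision_rate(x, y, k, rng):
    """Fraction of k Cauchy projections with sign(<x,c>) == sign(<y,c>)."""
    hits = 0
    for _ in range(k):
        # A Cauchy draw is the ratio of two independent standard normals.
        c = [rng.gauss(0, 1) / rng.gauss(0, 1) for _ in x]
        px = sum(a * ci for a, ci in zip(x, c))
        py = sum(b * ci for b, ci in zip(y, c))
        hits += copysign(1, px) == copysign(1, py)
    return hits / k

rng = random.Random(3)
x = [0.4, 0.3, 0.2, 0.1]
similar = [0.35, 0.3, 0.25, 0.1]
different = [0.05, 0.05, 0.1, 0.8]
r_sim = sign_collision_rate(x, similar, 2000, rng)
r_diff = sign_collision_rate(x, different, 2000, rng)
# Higher chi-square similarity goes with a higher sign-collision rate.
print(chi2_similarity(x, similar), r_sim)
print(chi2_similarity(x, different), r_diff)
```

Storing only the signs gives a one-bit-per-projection summary, which is what makes the method attractive for large-scale streaming applications.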

A Sparse Linear Discriminant Analysis Method with Asymptotic Optimality for Multiclass Classification
Ruiyan Luo and Xin Qi
Georgia State University
rluo@gsu.edu

Recently, many sparse linear discriminant analysis methods have been proposed to overcome the major problems of classic linear discriminant analysis in high-dimensional settings. However, the asymptotic optimality results are limited to the case of two classes, where the classification boundary of LDA is a hyperplane and explicit formulas exist for the classification error. We propose an efficient sparse linear discriminant analysis method for multiclass classification. In practice, this method can control the relationship between the sparse components and hence achieves improved prediction accuracy compared to other methods, in both simulations and case studies. In theory, we derive the asymptotic optimality of our method as the dimensionality and sample size go to infinity, with an arbitrary fixed number of classes.

Generalized Hidden Markov Model for Variant Detection
Yichen Cheng, James Dai, and Charles Kooperberg
Fred Hutchinson Cancer Research Center
ycheng@fhcrc.org

Developments in next-generation sequencing technology enable the detection of both common and rare variants. Genome-wide association studies (GWAS) benefit greatly from this fast-growing technology. Although many associations between variants and disease have been found for common variants, new methods for detecting functional rare variants are still urgently needed. Among existing methods, efforts have been made to increase detection power through set-based tests. However, none of these methods make a distinction between functional variants and neutral variants (i.e., variants that have no effect on the disease). In this paper, we propose to model the effects from a set (for example, a gene) of variants with a hidden Markov model (HMM). For each SNP, we model the effect as a mixture of 0 and θ, where θ is the true effect size. The mixture setup accounts for the fact that a proportion of the variants are neutral. Another advantage of using an HMM is that it can account for possible association between neighboring variants. Our method works well for both linear and logistic models. Within the HMM framework, we test one component against more components and derive the asymptotic distribution under the null hypothesis. We show that our proposed method works well compared to competitors under various scenarios.

Large-Scale Joint Trait Risk Prediction for Mini-Exome Sequence Data
Gengxin Li
Wright State University
gengxinli@wright.edu

The empirical Bayes classification method is a useful risk prediction approach for microarray data, but it is challenging to apply it to risk prediction with mini-exome sequencing data. A major advantage of this method is that the effect size distribution for the set of possible features is empirically estimated, and all subsequent parameter estimation and risk prediction are guided by this distribution. Here we generalize Efron's method to allow for some of the peculiarities of mini-exome sequencing data. In particular, we incorporate quantitative trait information into the binary trait prediction model, proposing a new model named the Joint Trait Model, and we further allow this model to properly incorporate the annotation information of single nucleotide polymorphisms (SNPs). In the course of our analysis, we examine several aspects of the possible simulation model, including the identity of the most important genes, the differing effects of synonymous and non-synonymous SNPs, and the relative roles of covariates and genes in conferring disease risk. Finally, we compare the three methods to each other and to other classifiers.

Rank Estimation and Recovery of Low-Rank Matrices for a Factor Model with Heteroscedastic Noise
Jingshu Wang and Art B. Owen
Stanford University
wangjingshususan@gmail.com

We consider recovery of low-rank matrices from noisy data with heteroscedastic noise. We use an early-stopping alternating method (ESAM), which iteratively alternates between updating the estimate of the noise variance and that of the low-rank matrix, and corrects over-fitting by an early-stopping rule. Various simulations in our study suggest stopping after just 3 iterations, and we have seen that ESAM gives better recovery than the SVD on either the original data or the standardized data when the optimal rank is given. To select a rank, we use an early-stopping bi-cross-validation (BCV) technique modified from BCV for the white-noise model. Our method leaves out half the rows and half the columns, as in BCV, but uses low-rank operations involving ESAM instead of the SVD on the retained data to predict the held-out entries. Simulations considering both strong- and weak-signal cases show that our method is the most accurate overall compared to several BCV strategies and two versions of parallel analysis (PA). PA is a state-of-the-art method for choosing the number of factors in factor analysis.

Session 41 Distributional Inference and Its Impact on Statistical Theory and Practice

Stat Wars, Episode IV: A New Hope (For Objective Inference)
Keli Liu and Xiao-Li Meng
Harvard University
meng@stat.harvard.edu

A long time ago in a galaxy far, far away (pre-war England)...
It is a period of uncivil debate. Rebel statisticians, striking from an agricultural station, have won their first victory against the evil Bayesian Empire. A plea was made: "Help me, R. A. Fisher, you're my only hope," and Fiducial was born. It promised posterior probability statements on parameters without a prior, but at the seeming cost of violating basic probability laws. Was Fisher crazy, or did madness mask innovation? Fiducial calculations can be easily understood through the missing-data perspective, which illuminates a trinity of missing insights:
I. The Bayesian prior becomes an infinite-dimensional nuisance parameter to be dealt with using partial likelihood.
II. A Missing At Random (MAR) condition naturally characterizes when exact Fiducial solutions exist.
III. Understanding the "multi-phase" structure underlying Fiducial inference leads to the development of approximate Fiducial procedures which remain robust to prior misspecification.
In the years after its introduction, Fiducial's critics branded it "Fisher's biggest blunder." But in the great words of Obi-Wan: "If you strike me down, I shall become more powerful than you can possibly imagine."
To be continued. Episode V: Ancillarity Paradoxes Strike Back (At Fiducial) and Episode VI: Return of the Fiducialist will premiere, respectively, at the IMS Asia Pacific Rim Meeting in Taipei (June 30-July 3, 2014) and at the IMS Annual Meeting in Sydney (July 7-11, 2014).

Higher Order Asymptotics for Generalized Fiducial Inference
Abhishek Pal Majumdar and Jan Hannig
University of North Carolina at Chapel Hill
janhannig@unc.edu

R. A. Fisher's fiducial inference has been the subject of many discussions and controversies ever since he introduced the idea during the 1930s. The idea experienced a bumpy ride, to say the least, during its early years, and one can safely say that it eventually fell into disfavor among mainstream statisticians. However, it appears to have made a resurgence recently under various names and modifications. For example, under the new name of generalized inference, fiducial inference has proved to be a useful tool for deriving statistical procedures for problems where frequentist methods with good properties were previously unavailable. Therefore, we believe that the fiducial argument of R. A. Fisher deserves a fresh look from a new angle. In this talk we investigate the properties of generalized fiducial distributions using higher order asymptotics and provide suggestions on some open issues in fiducial inference, such as the choice of the data generating equation.

Generalized Inferential Models
Ryan Martin
University of Illinois at Chicago
rgmartin@uic.edu

The new inferential model (IM) framework provides prior-free probabilistic inference which is valid for all models and all sample sizes. The construction of an IM requires specification of an association that links the observable data to the parameter of interest and an unobservable auxiliary variable. This specification can be challenging, however, particularly when the parameter is more than one-dimensional. In this talk I will present a generalized (or "black-box") IM that bypasses full specification of the association, and the challenges it entails, by working with an association based on a scalar-valued, parameter-dependent function of the data. Theory and examples demonstrate that this method gives exact and efficient prior-free probabilistic inference in a wide variety of problems.

Formal Definition of Reference Priors under a General Class of Divergence
Dongchu Sun
University of Missouri



sund@missouri.edu

Reference analysis produces objective Bayesian inference, in the sense that inferential statements depend only on the assumed model and the available data, and the prior distribution used to make an inference is least informative in a certain information-theoretic sense. Berger, Bernardo and Sun (2009) derived reference priors rigorously in the context of Kullback-Leibler divergence. In special cases with common support and other regularity conditions, Ghosh, Mergel and Liu (2011) derived a general f-divergence criterion for prior selection. We generalize Ghosh, Mergel and Liu's (2011) results to the case without common support and show how an explicit expression for the reference prior can be obtained under posterior consistency. The explicit expression can be used to derive new reference priors, both analytically and numerically.

Session 42 Applications of Spatial Modeling and Imaging Data

Spatial Bayesian Variable Selection and Shrinkage in High-dimensional Covariate Spaces with Application to fMRI
Fan Li1, Tingting Zhang (Co-first author)2, Quanli Wang1, and James Coan2

1Duke University
2University of Virginia
tz3b@virginia.edu

Multi-subject functional magnetic resonance imaging (fMRI) data provide opportunities to study population-wide relationships between human brain activity and individual biological or behavioral traits. But statistical modeling, analysis, and computation for such massive and noisy data, with a complicated spatio-temporal correlation structure, are extremely challenging. In this article, within the framework of Bayesian stochastic search variable selection, we propose a joint Ising and Dirichlet process (Ising-DP) prior to achieve selection of spatially correlated brain voxels that are predictive of individual responses. The Ising component of the prior utilizes the spatial information between voxels, and the DP component shrinks the coefficients of the large number of voxels to a small set of values, thus greatly reducing the posterior computational burden. To address the phase transition phenomenon of the Ising prior, we propose a new analytic approach to derive bounds for the hyperparameters, illustrated on 2- and 3-dimensional lattices. The proposed method is compared with several alternative methods via simulations and is applied to the fMRI data collected from the Kiff hand-holding experiment.

A Hierarchical Model for Simultaneous Detection and Estimation in Multi-Subject fMRI Studies
David Degras1 and Martin Lindquist2
1DePaul University
2Johns Hopkins University
ddegrasv@depaul.edu
In this paper we introduce a new hierarchical model for the simultaneous detection of brain activation and estimation of the shape of the hemodynamic response in multi-subject fMRI studies. The proposed approach circumvents a major stumbling block in standard multi-subject fMRI data analysis, in that it allows the shape of the hemodynamic response function to vary across regions and subjects while still providing a straightforward way to estimate population-level activation. An efficient estimation algorithm is presented, as is an inferential framework that allows not only for tests of activation but also for tests of deviations from some canonical shape. The model is validated through simulations and application to a multi-subject fMRI study of thermal pain.

On the Relevance of Accounting for Spatial Correlation: A Case Study from Florida
Linda J. Young1 and Emily Leary2
1USDA NASS RDD
2University of Florida
lindayoung@nass.usda.gov
Identifying the potential impact of climate change is of increasing interest. As an example, understanding the effects of changing temperature patterns on crops, animals and public health is important if mitigation or adaptation strategies are to be developed. Here the consequences of the increasing frequency and intensity of heat waves are considered. First, four decades of temperature data are used to identify heat waves for the six National Weather Service regions within Florida. During these forty years, each temperature monitor has some days for which no data were recorded. The presence of missing data has largely been ignored in this setting, and analyses have been conducted based on the observed data. Alternatively, time series models, spatial models or space-time models could be used to impute the missing data. Here the effects of the treatment of missing data on the identification of heat waves, and on the subsequent inference related to the impact of heat waves on public health, are explored.

Statistical Approaches for Calibration of Climate Models
Gabriel Huerta1, Charles Jackson2 and Alvaro Nosedal1
1University of New Mexico
2University of Texas at Austin
ghuerta@stat.unm.edu
We consider some recent developments for calibrating climate models that rely on modern computational and statistical strategies. First, we consider various posterior sampling strategies to study a surrogate model that approximates a climate response through the Earth's orbital parameters. In particular, we show that for certain metrics of model skill, Adaptive/Delayed Rejection MCMC methods are effective for estimating parametric uncertainties and resolving inverse problems for climate models. We will also discuss some of the High Performance Computing efforts that are taking place to calibrate various inputs of the NCAR Community Atmosphere Model (CAM). Finally, we show how to characterize output from a Regional Climate Model through hierarchical modeling that combines Gauss Markov Random Fields (GMRF) with MCMC methods and allows estimation of probability distributions that underlie phenomena represented by the climate output.
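The Adaptive and Delayed Rejection MCMC schemes mentioned above are refinements of the basic random-walk Metropolis kernel. A minimal sketch of that underlying kernel follows (toy standard-normal target; all function names and tuning values are ours, not the authors'):

```python
import numpy as np

def metropolis(log_post, x0, n_iter=20000, step=1.0, seed=0):
    """Random-walk Metropolis sampler: the building block that
    adaptive/delayed-rejection schemes refine."""
    rng = np.random.default_rng(seed)
    x, lp = x0, log_post(x0)
    samples = np.empty(n_iter)
    accepted = 0
    for i in range(n_iter):
        prop = x + step * rng.standard_normal()   # symmetric proposal
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:  # accept/reject step
            x, lp = prop, lp_prop
            accepted += 1
        samples[i] = x
    return samples, accepted / n_iter

# Toy "posterior": standard normal log-density (up to a constant).
samples, acc = metropolis(lambda x: -0.5 * x * x, x0=0.0)
```

In an adaptive scheme, `step` would be tuned on the fly from the chain's history; in delayed rejection, a second proposal would be attempted after each rejection.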

Session 43: Recent Developments in Survival Analysis and Statistical Genetics

Restricted Survival Time and Non-proportional Hazards
Zhigang Zhang
Memorial Sloan Kettering Cancer Center
zhangz@mskcc.org
In this talk I will present some recent developments in restricted survival time and its usage, especially when the proportional hazards assumption is violated. Technical advances and numerical studies will both be discussed.
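For readers unfamiliar with the quantity: the restricted mean survival time (RMST) up to a horizon tau is the area under the survival curve on [0, tau], and it remains interpretable when hazards are non-proportional. A minimal sketch using the Kaplan-Meier estimator (illustrative only, not the speaker's code):

```python
def rmst(times, events, tau):
    """Restricted mean survival time up to tau: the area under the
    Kaplan-Meier survival curve on [0, tau]."""
    data = sorted(zip(times, events))  # (time, event indicator) pairs
    at_risk = len(data)
    s = 1.0          # current Kaplan-Meier survival estimate
    last_t = 0.0     # left endpoint of the current step
    area = 0.0
    for t, d in data:
        if t > tau:
            break
        area += s * (t - last_t)
        if d:                          # observed event: KM step down
            s *= 1.0 - 1.0 / at_risk
        at_risk -= 1
        last_t = t
    area += s * (tau - last_t)         # final piece up to tau
    return area

# No censoring: RMST equals the mean of min(T, tau).
r1 = rmst([1, 2, 3, 4], [1, 1, 1, 1], tau=3)
# With one censored observation at t = 2.
r2 = rmst([1, 2, 3], [1, 0, 1], tau=3)
```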

Empirical Null using Mixture Distributions and Its Application in Local False Discovery Rate
DoHwan Park

2014 ICSA-KISS Applied Statistics Symposium Portland June 15-18 67

Abstracts

University of Maryland
dhpark@umbc.edu
When high dimensional data are given, it is often of interest to distinguish the significant (non-null, Ha) group from the non-significant (null, H0) group in a mixture of the two, while controlling the type I error rate. One popular way to control the level is the false discovery rate (FDR). This talk considers a method based on the local false discovery rate. In most previous studies, the null group is assumed to follow a normal distribution. However, if the null distribution departs from normality, there may be too many or too few false discoveries (observations that belong to the null but are rejected by the test), leading to failure to control the FDR at the given level. We propose a novel approach that enriches the class of null distributions using mixture distributions. We provide real examples of gene expression data, fMRI data and protein domain data to illustrate the problem.
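To fix ideas, the local false discovery rate is fdr(z) = pi0 * f0(z) / f(z), where f0 is the null density and f the marginal density of the test statistics. The sketch below uses a plain normal empirical null fit by robust moments; the talk's contribution is precisely to enrich f0 with mixtures, so everything here is a simplified illustration:

```python
import numpy as np

def local_fdr(z, pi0=0.9, h=0.3):
    """Local fdr(z) = pi0 * f0(z) / f(z), with a normal empirical null
    fit by median/MAD and a Gaussian kernel density estimate for f."""
    z = np.asarray(z, float)
    # Empirical null: normal with robust location/scale estimates.
    mu0 = np.median(z)
    sd0 = np.median(np.abs(z - mu0)) / 0.6745
    f0 = np.exp(-0.5 * ((z - mu0) / sd0) ** 2) / (sd0 * np.sqrt(2 * np.pi))
    # Marginal density f via a Gaussian kernel density estimate.
    diff = (z[:, None] - z[None, :]) / h
    f = np.exp(-0.5 * diff ** 2).mean(axis=1) / (h * np.sqrt(2 * np.pi))
    return np.clip(pi0 * f0 / f, 0.0, 1.0)

# 900 null z-values around 0, 100 non-null z-values around 3.
rng = np.random.default_rng(1)
z = np.concatenate([rng.normal(0, 1, 900), rng.normal(3, 1, 100)])
fdr = local_fdr(z)
```

Values near the null center get fdr close to 1; large z-values get small fdr.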

A Bayesian Illness-Death Model for the Analysis of Correlated Semi-Competing Risks Data
Kyu Ha Lee1, Sebastien Haneuse1, Deborah Schrag2 and Francesca Dominici1
1Harvard University
2Dana Farber Cancer Institute
klee@hsph.harvard.edu
Readmission rates are a major target of healthcare policy because readmission is common, costly and potentially avoidable, and hence is seen as an adverse outcome. Therefore, the Centers for Medicare and Medicaid Services currently use 30-day readmission as a proxy outcome for quality of care for a number of health conditions. However, focusing solely on readmission rates in conditions with poor prognosis, such as pancreatic cancer, oversimplifies a situation in which patients may die before being readmitted, which clearly is also an adverse outcome. In such situations, healthcare policy should consider both readmission and death rates simultaneously. To this end, our proposed Bayesian framework adopts an illness-death model to represent three transitions for pancreatic cancer patients recently discharged from initial hospitalization: (1) discharge to readmission, (2) discharge to death, and (3) readmission to death. Dependence between the two event times (readmission and death) is induced via a subject-specific shared frailty. Our proposed method further extends the model to situations where patients within a hospital may be correlated due to unobserved characteristics. We illustrate the practical utility of our proposed method using data from Medicare Part A on 100% of Medicare enrollees from 01/2000 to 12/2010.

Detection of Chromosome Copy Number Variations in Multiple Sequences
Xiaoyi Min, Chi Song and Heping Zhang
Yale University
xiaoyimin@yale.edu
DNA copy number variation (CNV) is a form of genomic structural variation that may affect human diseases. Identification of the CNVs shared by many people in the population, as well as determining the carriers of these CNVs, is essential for understanding the role of CNV in disease association studies. For detecting CNVs in single samples, a Screening and Ranking Algorithm (SaRa) was previously proposed, which was shown to be superior to other commonly used algorithms and to have a sure coverage property. We extend SaRa to address the problem of common CNV detection in multiple samples. In particular, we propose an adaptive Fisher's method for combining the screening statistics across samples. The proposed multi-sample SaRa method inherits the computational and practical benefits of single-sample SaRa in CNV detection. We also characterize the theoretical properties of this method and demonstrate its performance in extensive numerical analyses.
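The abstract does not spell out the adaptive Fisher statistic; one common form of such a statistic sorts the per-sample contributions -2 log p in decreasing order, forms partial sums over the top k, and takes the maximum after a rough moment standardization. A hedged sketch of that generic form (the authors' exact statistic and its null calibration may differ):

```python
import numpy as np

def adaptive_fisher(pvals):
    """Generic adaptive Fisher combination: standardize partial sums of
    the largest -2*log(p) contributions by the chi-square(2k) moments
    (mean 2k, variance 4k) as a rough normalization, then maximize over k.
    Adaptivity: the statistic picks the strongest subset of samples."""
    s = np.sort(-2.0 * np.log(np.asarray(pvals, float)))[::-1]
    partial = np.cumsum(s)
    k = np.arange(1, len(s) + 1)
    return np.max((partial - 2.0 * k) / np.sqrt(4.0 * k))

# Signal concentrated in a few samples vs. a pure-null configuration.
stat_signal = adaptive_fisher([1e-6, 1e-5, 0.5, 0.7])
stat_null = adaptive_fisher([0.5, 0.6, 0.7, 0.8])
```

Because the maximum is taken over k, a strong signal carried by only two of the four samples still dominates, which is the point of the adaptive variant.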

Session 44: Bayesian Methods and Applications in Clinical Trials with Small Populations

Applications of Bayesian Meta-Analytic Approaches at Novartis
Qiuling Ally He, Roland Fisch and David Ohlssen
Novartis Pharmaceuticals Corporation
allyhe@novartis.com
Conducting an ethical, efficient and cost-effective clinical trial has always been challenged by the availability of only a limited study population. Bayesian approaches have many appealing features for studies with small sample sizes, and their importance has been recognized by health authorities. Novartis has been actively developing and implementing Bayesian methods at different stages of clinical development, in both oncology and non-oncology settings. This presentation focuses on two applications of the Bayesian meta-analytic approach. Both applications explore the relevant historical studies and establish meta-analyses to generate inferences that can be utilized by the concurrent studies. The first example synthesized historical control information in a proof-of-concept study; the second application extrapolated efficacy from a source to a target population for registration purposes. In both applications, Bayesian methods are shown to effectively reduce the sample size and duration of the studies and, consequently, the resources invested.

Subgroup-Based Adaptive (SUBA) Designs for Multi-Arm Biomarker Trials
Yanxun Xu1, Lorenzo Trippa2, Peter Mueller1 and Yuan Ji3
1University of Texas at Austin
2Harvard University
3University of Texas at Austin
yxustat@gmail.com
Targeted therapies based on biomarker profiling are becoming a mainstream direction of cancer research and treatment. Depending on the expression of specific prognostic biomarkers, targeted therapies assign different cancer drugs to subgroups of patients, even if they are diagnosed with the same type of cancer by traditional means such as tumor location. For example, Herceptin is indicated only for the subgroup of patients with HER2+ breast cancer, but not for other types of breast cancer. However, subgroups like HER2+ breast cancer with effective targeted therapies are rare, and most cancer drugs are still applied to large patient populations that include many patients who might not respond or benefit. Also, the response to targeted agents in humans is usually unpredictable. To address these issues, we propose SUBA, subgroup-based adaptive designs that simultaneously search for prognostic subgroups and allocate patients adaptively to the best subgroup-specific treatments throughout the course of the trial. The main features of SUBA include the continuous reclassification of patient subgroups based on a random partition model, and the adaptive allocation of patients to the best treatment arm based on posterior predictive probabilities. We compare the SUBA design with three alternative designs, including equal randomization, outcome-adaptive randomization and a design based on a probit regression. In simulation studies we find that SUBA compares favorably against the alternatives.

Innovative Designs and Practical Considerations for Pediatric Studies
Alan Y. Chiang



Eli Lilly and Company
chiangay@lilly.com
Despite representing a fundamental step toward the efficacious and safe utilization of drugs, the conduct of clinical trials in children poses several problems. Methodological issues and ethical concerns are the major obstacles that have traditionally limited research in small populations. The randomized controlled trial, the mainstay of clinical studies for assessing the effects of any therapeutic intervention, shows some weaknesses that make it scarcely applicable to small populations. Alternative and innovative approaches to clinical trial design in small populations have been developed over the last decades, with the aim of overcoming the limits related to small samples and to the acceptability of the trial. These features make them particularly appealing for the pediatric population and for patients with rare diseases. This presentation aims to provide a variety of designs and analysis methods for assessing efficacy and safety in pediatric studies, including their applicability, advantages, disadvantages and real case examples. Approaches include Bayesian designs, borrowing information from other studies, and more innovative approaches. Thanks to these features, such methods may rationally limit the amount of experimentation in small populations to what is achievable, necessary and ethical, and present a reliable way of ultimately improving patient care.

Session 45: Recent Developments in Assessing Predictive Models in Survival Analysis

partDSA for Deriving Survival Risk Groups, Ensemble Learning and Variable Selection
Annette Molinaro1, Adam Olshen1 and Robert Strawderman2
1University of California at San Francisco
2University of Rochester
molinaroa@neurosurg.ucsf.edu
We recently developed partDSA, a multivariate method that, similarly to CART, utilizes loss functions to select and partition predictor variables to build a tree-like regression model for a given outcome. However, unlike CART, partDSA permits both 'and' and 'or' conjunctions of predictors, elucidating interactions between variables as well as their independent contributions. partDSA thus permits tremendous flexibility in the construction of predictive models and has been shown to supersede CART in both prediction accuracy and stability. As the resulting models continue to take the form of a decision tree, partDSA also provides an ideal foundation for developing a clinician-friendly tool for accurate risk prediction and stratification.
With right-censored outcomes, partDSA currently builds estimators via either the Inverse Probability Censoring Weighted (IPCW) or Brier Score weighting schemes; see Lostritto, Strawderman and Molinaro (2012), where it is shown in numerous simulations that both proposed adaptations of partDSA perform as well as, and often considerably better than, two competing tree-based methods. In this talk, various useful extensions of partDSA for right-censored outcomes are described, and we show the power of the partDSA algorithm in deriving survival risk groups for glioma patients based on genomic markers. Another interesting extension of partDSA is as an aggregate learner. A comparison will be made of standard partDSA to an ensemble version of partDSA, as well as to alternative ensemble learners, in terms of prediction accuracy and variable selection.

Predictive Accuracy of Time-Dependent Markers for Survival Outcomes
Li Chen1, Donglin Zeng2 and Danyu Lin2
1University of Kentucky
2University of North Carolina at Chapel Hill
lichenuky@uky.edu
In clinical cohort studies, potentially censored times to a certain event, such as death or disease progression, and patient characteristics at the time of diagnosis or at the time of inclusion in the study (baseline) are often recorded. Serial measurements of clinical markers during follow-up may also be recorded for monitoring purposes. Recently, there has been increasing interest in incorporating these serial marker measurements into the prediction of future survival outcomes, and in assessing the predictive accuracy of these time-dependent markers. In this paper, we propose a new graphical measure, the negative predictive function, to quantify the predictive accuracy of time-dependent markers for survival outcomes. This new measure relates directly to patient survival probabilities and thus has direct clinical utility. We construct a nonparametric estimator for the proposed measure, allowing censoring to depend on the markers, and adopt the bootstrap method to obtain the asymptotic variances. Simulation studies demonstrate that the proposed method performs well in practical situations. One medical study is presented.

Estimating the Effectiveness in HIV Prevention Trials by Incorporating the Exposure Process: Application to HPTN 035 Data
Jingyang Zhang1 and Elizabeth R. Brown2
1Fred Hutchinson Cancer Research Center
2Fred Hutchinson Cancer Research Center/University of Washington
jzhang2@fhcrc.org
Estimating the effectiveness of a new intervention is usually the primary objective of HIV prevention trials. The Cox proportional hazards model is commonly used to estimate effectiveness, assuming that participants share the same risk under the covariates and that the risk is always non-zero. In fact, the risk is non-zero only when an exposure event occurs, and participants can have varying risks of transmission due to varying patterns of exposure events. We therefore propose a novel estimate of effectiveness adjusted for the heterogeneity in the magnitude of exposure among the study population, using a latent Poisson process model for the exposure path of each participant. Moreover, our model considers the scenario in which a proportion of participants never experience an exposure event, and adopts a zero-inflated distribution for the rate of the exposure process. We employ a Bayesian estimation approach to estimate the exposure-adjusted effectiveness, eliciting the priors from historical information. Simulation studies are carried out to validate the approach and explore the properties of the estimates. An application example is presented from an HIV prevention trial.
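To illustrate the zero-inflated exposure structure described above, a minimal simulation: with some probability a participant is never exposed, and otherwise exposures arrive as a Poisson process over follow-up. All parameter names and values below are ours, chosen purely for illustration:

```python
import numpy as np

def simulate_exposures(n, p_never=0.2, rate=3.0, follow_up=1.0, seed=0):
    """Zero-inflated exposure counts: with probability p_never a
    participant is never exposed; otherwise exposure events arrive as a
    Poisson process with the given rate over the follow-up period."""
    rng = np.random.default_rng(seed)
    never = rng.uniform(size=n) < p_never        # structural zeros
    counts = rng.poisson(rate * follow_up, size=n)
    counts[never] = 0
    return counts

counts = simulate_exposures(10000)
```

The zero-inflation produces more zero counts than a plain Poisson with the same mean would, which is the feature the model exploits.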

Estimation of Predictive Accuracy of Survival Regression Models Adjusting for Dependent Censoring and/or High-Dimensional Data
Ming Wang1 and Qi Long2
1Penn State College of Medicine
2Emory University
mwang@phs.psu.edu
In practice, prediction models for cancer risk and prognosis play an important role in priority cancer research, and evaluating and comparing different models using predictive accuracy metrics in the presence of censored data is of substantive interest, adjusting for the censoring mechanism. To address this issue, we evaluate two existing metrics: the concordance (c) statistic and the weighted c-statistic,



which adopts an inverse-probability weighting technique under circumstances with a dependent censoring mechanism, via numerical studies. The asymptotic properties of the weighted c-statistic, including consistency and normality, are theoretically and rigorously established. In particular, the case of high-dimensional prognostic factors (p moderately large) is investigated, to assess strategies for estimating the censoring weights by utilizing a regularization approach with the lasso penalty. In addition, sensitivity analysis is theoretically and practically conducted to assess predictive accuracy in the case of informative censoring (i.e., not coarsened at random), using nonparametric estimates of the cumulative baseline hazard for the weights. Finally, a prostate cancer study is used to build and evaluate prediction models of tumor recurrence after surgery.
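As background, an IPCW-weighted c-statistic weights comparable pairs by the inverse squared Kaplan-Meier estimate of the censoring survival function at the earlier event time. The sketch below is in the spirit of Uno-type weighted c-statistics and is not the paper's exact estimator:

```python
import numpy as np

def km_censoring_survival(time, event):
    """Kaplan-Meier estimate of the censoring survival G(t) = P(C > t);
    censoring indicators (event == 0) play the role of 'events' here."""
    order = np.argsort(time)
    t_sorted = np.asarray(time, float)[order]
    e_sorted = np.asarray(event)[order]
    n = len(t_sorted)
    g, s = np.ones(n), 1.0
    for i in range(n):
        if e_sorted[i] == 0:               # a censoring "event"
            s *= 1.0 - 1.0 / (n - i)
        g[i] = s
    def G(t):                              # step-function lookup, G(t-)
        idx = np.searchsorted(t_sorted, t, side="left") - 1
        return g[idx] if idx >= 0 else 1.0
    return G

def weighted_cstat(time, event, risk):
    """IPCW-weighted concordance over comparable pairs (i, j): i has an
    observed event and time[i] < time[j]; each pair gets weight
    1 / G(time[i])^2."""
    G = km_censoring_survival(time, event)
    num = den = 0.0
    for i in range(len(time)):
        if not event[i]:
            continue
        w = 1.0 / G(time[i]) ** 2
        for j in range(len(time)):
            if time[j] > time[i]:
                den += w
                num += w * (risk[i] > risk[j])
    return num / den

c_perfect = weighted_cstat([1, 2, 3, 4], [1, 1, 1, 1], [4, 3, 2, 1])
c_reversed = weighted_cstat([1, 2, 3, 4], [1, 1, 1, 1], [1, 2, 3, 4])
c_censored = weighted_cstat([1, 2, 3, 4, 5], [1, 0, 1, 0, 1], [5, 4, 3, 2, 1])
```

Without censoring the weights are all 1 and the statistic reduces to the usual concordance fraction.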

Session 46: Missing Data, the Interface between Survey Sampling and Biostatistics

Likelihood-based Inference with Missing Data Under Missing-at-Random
Shu Yang and Jae Kwang Kim
Iowa State University
shuyang@iastate.edu

Likelihood-based inference with missing data is a challenging problem because the observed log-likelihood has an integral form. Approximating the integral by Monte Carlo sampling does not necessarily lead to valid inference, because the Monte Carlo samples are generated from a distribution with a fixed parameter value. We consider an alternative approach based on the parametric fractional imputation of Kim (2011). In the proposed method, the dependence of the integral on the parameter is properly reflected through fractional weights. We discuss constructing a confidence interval using the profile likelihood ratio test; a Newton-Raphson algorithm is employed to find the interval endpoints. Two limited simulation studies show the advantage of likelihood-based inference over Wald-type inference in terms of power, parameter space conformity and computational efficiency. A real data example on salamander mating (McCullagh and Nelder, 1989) shows that our method also works well with high-dimensional missing data.

Generalized Method of Moments Estimator Based on Semiparametric Quantile Regression Imputation
Cindy Yu and Senniang Chen
Iowa State University
snchen@iastate.edu

In this article we consider an imputation method for handling missing response values, based on semiparametric quantile regression estimation. In the proposed method, the missing response values are generated using the estimated conditional quantile regression function at given values of the covariates. We adopt the generalized method of moments for estimating parameters defined through a general estimating equation. We demonstrate that the proposed estimator, which combines semiparametric quantile regression imputation and the generalized method of moments, is an effective alternative for parameter estimation when missing data are present. The consistency and asymptotic normality of our estimators are established, and variance estimation is provided. Results from limited simulation studies are presented to show the adequacy of the proposed method.
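The imputation step can be pictured as drawing u ~ Uniform(0,1) and imputing the estimated conditional u-th quantile of y given x. The sketch below replaces the paper's semiparametric quantile regression with a crude nearest-neighbor quantile estimate, purely to illustrate the mechanism:

```python
import numpy as np

def quantile_impute(x_obs, y_obs, x_mis, k=50, seed=0):
    """For each missing unit, draw u ~ Uniform(0,1) and impute the u-th
    conditional quantile of y given x, estimated here by the empirical
    quantile among the k nearest observed covariate values (a crude
    stand-in for the paper's semiparametric quantile regression)."""
    rng = np.random.default_rng(seed)
    x_obs, y_obs = np.asarray(x_obs), np.asarray(y_obs)
    out = np.empty(len(x_mis))
    for m, x in enumerate(np.asarray(x_mis)):
        nn = np.argsort(np.abs(x_obs - x))[:k]     # k nearest neighbors
        out[m] = np.quantile(y_obs[nn], rng.uniform())
    return out

# Toy data: y = 2x + noise; impute 200 missing responses at x = 0.5.
rng = np.random.default_rng(2)
x_obs = rng.uniform(0, 1, 2000)
y_obs = 2.0 * x_obs + rng.normal(0, 0.5, 2000)
imputed = quantile_impute(x_obs, y_obs, x_mis=np.full(200, 0.5))
```

Drawing the quantile level at random (rather than always imputing the median) preserves the conditional distribution of y given x, which is what makes the subsequent GMM step valid.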

A New Estimation with Minimum Trace of Asymptotic Covariance Matrix for Incomplete Longitudinal Data with a Surrogate Process
Baojiang Chen1 and Jing Qin2
1University of Nebraska
2National Institutes of Health
baojiangchen@unmc.edu
Missing data is a very common problem in medical and social studies, especially when data are collected longitudinally. It is a challenging problem to utilize the observed data effectively, and many papers on missing data problems can be found in the statistical literature. It is well known that inverse probability weighted estimation is neither efficient nor robust. On the other hand, the doubly robust (DR) method can improve both efficiency and robustness. As is known, doubly robust estimation requires a missing data model (i.e., a model for the probability that data are observed) and a working regression model (i.e., a model for the outcome variable given covariates and surrogate variables). Since the DR estimating function has mean zero for any parameters in the working regression model when the missing data model is correctly specified, in this paper we derive a formula for the estimator of the parameters of the working regression model that yields the optimally efficient estimator of the marginal mean model (the parameters of interest) when the missing data model is correctly specified. Furthermore, the proposed method inherits the doubly robust property. Simulation studies demonstrate the greater efficiency of the proposed method compared to the standard doubly robust method. A longitudinal dementia data set is used for illustration.

Adaptive Multi-Phase Sampling for Asymptotically-Optimal Mean Score Analyses
Michael McIsaac1 and Richard Cook2
1Queen's University
2University of Waterloo
mcisaacm@queensu.ca
Response-dependent two-phase designs can ensure good statistical efficiency while working within resource constraints. Sampling schemes that are optimized for analyses based on mean score estimating equations have been shown to be highly efficient in a number of different settings, and are straightforward to implement if detailed population characteristics are known. I will present an adaptive multi-phase design which exploits information from an internal pilot study to approximate this optimal mean score design. These adaptive designs are easy to implement and result in large efficiency gains while keeping study costs low. The implementation of this design will be demonstrated using simulation studies motivated by an ongoing research program in rheumatology.

Session 47: New Statistical Methods for Comparative Effectiveness Research and Personalized Medicine

Efficient Design for Prospective Observational Studies
Yu Shen1, Hao Liu2, Jing Ning3 and Jing Qin4
1University of Texas MD Anderson Cancer Center
2Baylor College of Medicine
3University of Texas MD Anderson Cancer Center
4National Institutes of Health
yshen@mdanderson.org
Using data from large observational studies may fill the information gaps due to the lack of evidence from randomized controlled trials. Such studies may inform real-world clinical scenarios and improve clinical decisions among various treatment strategies. However, the



design and analysis of comparative effectiveness studies based on observational data are complex. In this work, we propose practical sample size and power calculation tools for prevalent cohort designs, and suggest some efficient analysis methods as well.

Choice between Superiority and Non-inferiority in Comparative Effectiveness Experiments
Mei-Chiung Shih1, Tze Leung Lai2 and Philip W. Lavori2
1VA Cooperative Studies Program & Stanford University
2Stanford University
Mei-Chiung.Shih@va.gov
In designing a comparative effectiveness experiment, such as an active controlled clinical trial comparing a new treatment to an active control treatment, or a comparative effectiveness trial comparing treatments already in use, one sometimes has to choose between a superiority objective (to demonstrate that one treatment is more effective than the other active treatments) and a non-inferiority objective (to demonstrate that one treatment is no worse than the other active treatments within a pre-specified non-inferiority margin). It is often difficult to decide which study objective should be undertaken at the planning stage, when one does not have actual data on the comparative effectiveness of the treatments. In this talk we describe two adaptive design features for such trials: (1) adaptive choice between the superiority and non-inferiority objectives during interim analyses; (2) treatment selection instead of testing superiority. The latter aims to select treatments whose outcomes are close to that of the best treatment, by eliminating at interim analyses non-promising treatments that are unlikely to be much better than the observed best treatment.

An Adaptive Design Approach for Studying Dynamic Treatment Regimes in a Pragmatic Trials Setting
Mike Baiocchi, Jane Paik and Tze Lai
Stanford University
mikebaiocchi@gmail.com
The demand for rigorous studies of dynamic treatment regimens is increasing as medical providers treat larger numbers of patients with both multi-stage disease states and chronic care issues (for example, cancer treatments, pain management, depression, HIV). In this talk we will propose a trial design developed specifically to be run in a real-world clinical setting. These kinds of trials (sometimes called "pragmatic trials") have several advantages, which we will discuss. They also pose two major problems for analysis: (1) in running a randomized trial in a clinical setting, there is an ethical imperative to provide patients with the best outcomes while still collecting information on the relative efficacy of treatment regimes, which means traditional trial designs are inadequate in providing guidance; and (2) real-world considerations such as informative censoring or missing data become substantial hurdles. We incorporate elements from both point-of-care randomized trials and multi-armed bandit theory, and propose a unified method of trial design.

Improving Efficiency in the Estimation of the Effect of Treatment on a Multinomial Outcome
Iván Díaz, Michael Rosenblum and Elizabeth Colantuoni
Johns Hopkins University
idiaz@jhu.edu
We present a methodology to evaluate the causal effect of a binary treatment on a multinomial outcome when adjustment for covariates is desirable. Adjustment for baseline covariates may be desirable even in randomized trials, since covariates that are highly predictive of the outcome can substantially improve efficiency. We first present a targeted minimum loss based estimator of the vector of counterfactual probabilities. This estimator is doubly robust in observational studies, and it is consistent in randomized trials. Furthermore, it is locally semiparametric efficient under regularity conditions. We present a variation of the previous estimator that may be used in randomized trials and that is guaranteed to be asymptotically as efficient as the standard unadjusted estimator. We use the previous results to derive a nonparametric extension of the parameters in a proportional-odds model for ordinal-valued data, and present a targeted minimum loss based estimator. This estimator is guaranteed to be asymptotically as or more efficient than the unadjusted estimator of the proportional-odds model. As a consequence, this nonparametric extension may be used to test the null hypothesis of no effect with potentially increased power. We present a motivating example and simulations using data from the MISTIE II clinical trial of a new surgical intervention for stroke. Joint work with Michael Rosenblum and Elizabeth Colantuoni.

Session 48: Student Award Session 1

Regularization After Retention in Ultrahigh Dimensional Linear Regression Models
Haolei Weng1, Yang Feng1 and Xingye Qiao2
1Columbia University
2Binghamton University
hw2375@columbia.edu

Lasso has proved to be a computationally tractable variable selection approach in high dimensional data analysis. However, in the ultrahigh dimensional setting, the conditions for model selection consistency can easily fail. The independence screening framework tackles this problem by reducing the dimensionality based on marginal correlations before performing lasso. In this paper, we propose a two-step approach that relaxes the consistency conditions of lasso by using marginal information from a different perspective than independence screening: in particular, we retain significant variables rather than screening out irrelevant ones. The new method is shown to be model selection consistent in the ultrahigh dimensional linear regression model. A modified version is introduced to improve the finite sample performance. Simulations and real data analysis show the advantages of our method over lasso and independence screening.
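The retention idea can be sketched as a two-step procedure: keep the variables with the largest absolute marginal correlations unpenalized, then run lasso on the remainder. The implementation below (coordinate descent, with our own tuning choices) illustrates the idea and is not the authors' algorithm:

```python
import numpy as np

def soft(z, t):
    """Soft-thresholding operator used by coordinate-descent lasso."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def retention_lasso(X, y, n_retain=2, lam=0.1, n_iter=200):
    """Two-step 'regularization after retention' sketch: the n_retain
    variables with the largest absolute marginal correlations get zero
    penalty; all other coordinates are lasso-penalized."""
    n, p = X.shape
    cor = np.abs(X.T @ (y - y.mean())) / n
    retained = np.argsort(cor)[-n_retain:]
    penalty = np.full(p, lam)
    penalty[retained] = 0.0                 # retained: no shrinkage
    beta = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0)
    for _ in range(n_iter):                 # cyclic coordinate descent
        for j in range(p):
            r = y - X @ beta + X[:, j] * beta[j]   # partial residual
            beta[j] = soft(X[:, j] @ r / n, penalty[j]) / (col_ss[j] / n)
    return beta, retained

# Toy simulation: only the first two of ten variables matter.
rng = np.random.default_rng(3)
n, p = 200, 10
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:2] = [3.0, 2.0]
y = X @ beta_true + rng.normal(0, 0.5, n)
beta_hat, retained = retention_lasso(X, y)
```

Because the retained coordinates are unpenalized, their estimates are not shrunk toward zero, which is the motivation the abstract gives for retention over screening.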

Personalized Dose Finding Using Outcome Weighted Learning
Guanhua Chen1, Donglin Zeng1 and Michael R. Kosorok1
1University of North Carolina at Chapel Hill
guanhuac@live.unc.edu
In dose-finding clinical trials, there is growing recognition of the importance of considering individual-level heterogeneity when searching for optimal treatment doses. Such an optimal individualized treatment rule (ITR) for dosing should maximize the expected clinical benefit. In this paper, we consider a randomized trial design where the candidate dose levels are continuous. To find the optimal ITR under such a design, we propose an outcome weighted learning method which directly maximizes the expected beneficial clinical outcome. This method converts the individualized dose selection problem into a penalized weighted regression with a truncated ℓ1 loss. A difference of convex functions (DC) algorithm is adopted to efficiently solve the associated non-convex optimization problem. The consistency and convergence rate for the estimated ITR are derived, and small-sample performance is evaluated via simulation studies. We demonstrate that the proposed method outperforms competing approaches. We illustrate the method using data from a clinical trial for warfarin (an anti-thrombotic drug) dosing.



Survival Rates Prediction When Training Data and Target Data Have Different Measurement Error
Cheng Zheng and Yingye Zheng
Fred Hutchinson Cancer Research Center
zhengc68@uw.edu
Novel biologic markers have been widely used in predicting important clinical outcomes. One specific feature of biomarkers is that they are often ascertained with variation due to the specific measurement process. The magnitude of such variation may differ when the prediction algorithm (cutoffs) is applied to a different target population, or when the platform for biomarker assaying changes from the original platform the algorithm was based upon. Statistical methods have been proposed to characterize the effects of the underlying error-free quantity in association with an outcome, yet the impact of measurement errors on prediction has not been well studied. We focus in this manuscript on settings in which biomarkers are used for predicting an individual's future risk, and propose semiparametric estimators for error-corrected risk when replicates of the error-prone biomarkers are available. The predictive performance of the proposed estimators is evaluated and compared to alternative approaches in numerical studies, under various assumptions on the measurement distributions in the original cohort and in the future cohort to which the predictive rule is applied. We study the asymptotic properties of the proposed estimator. Application is made to a liver cancer biomarker study to predict the risk of liver cancer incidence at 3 and 4 years, using age and the novel biomarker α-fetoprotein.

Hard Thresholded Regression Via Linear Programming
Qiang Sun
University of North Carolina at Chapel Hill
qsun@live.unc.edu
The aim of this paper is to develop a hard thresholded regression (HTR) framework for simultaneous variable selection and unbiased estimation in high dimensional linear regression. This new framework is motivated by its close connection with best subset selection under orthogonal design, while enjoying several key computational and theoretical advantages over many existing penalization methods (e.g., SCAD or MCP). Computationally, HTR is a fast two-step estimation procedure consisting of a first step for calculating a coarse initial estimator and a second step for solving a linear program. Theoretically, under some mild conditions, the HTR estimator is shown to enjoy the strong oracle property and thresholded property even when the number of covariates may grow at an exponential rate. We also propose to incorporate a regularized covariance estimator into the estimation procedure in order to better trade off between noise accumulation and correlation modeling. Under this scenario with a regularized covariance matrix, HTR includes Sure Independence Screening as a special case. Both simulation and real data results show that HTR outperforms other state-of-the-art methods.
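The hard-thresholding idea at the core of HTR can be illustrated with a minimal numerical sketch. This toy version uses ordinary least squares for the initial step and a fixed, arbitrary threshold; it is not the paper's linear-programming formulation, and all names and values below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: n = 100 observations, p = 10 covariates, 3 true signals.
n, p = 100, 10
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = [2.0, -1.5, 1.0]
y = X @ beta + 0.1 * rng.standard_normal(n)

# Step 1: coarse initial estimator (here plain OLS; the paper's
# first step and its linear-programming second step are more refined).
beta_init = np.linalg.lstsq(X, y, rcond=None)[0]

# Step 2: hard thresholding -- coefficients above the cutoff are kept
# unchanged (hence no shrinkage bias); the rest are set to zero.
tau = 0.5  # illustrative threshold, not a tuned choice
beta_hat = np.where(np.abs(beta_init) > tau, beta_init, 0.0)

print(np.nonzero(beta_hat)[0])  # indices of selected covariates
```

Under an orthogonal design this keep-or-kill rule coincides with best subset selection, which is the connection the abstract refers to.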

Session 49 Network Analysis: Unsupervised Methods

Community Detection in Multilayer Networks: A Hypothesis Testing Approach
James D. Wilson, Shankar Bhamidi and Andrew B. Nobel
University of North Carolina at Chapel Hill
jameswd@email.unc.edu
The identification of clusters in relational data, otherwise known as community detection, is an important and well-studied problem in undirected and directed networks. Importantly, the units of a complex system often share multiple types of pairwise relationships, wherein a single community detection analysis does not account for the unique types, or layers. In this scenario, a sequence of networks can be used to model each type of relationship, resulting in a multilayer network structure. We propose and investigate a novel testing-based community detection procedure for multilayer networks. We show that by borrowing strength across layers, our method is able to detect communities in scenarios that are impossible for contemporary detection methods. By investigating the performance and potential use of our method through simulations and applications on real multilayer networks, we show that our procedure can successfully identify significant community structure in the multilayer regime.

Network Enrichment Analysis with Incomplete Network Information
Jing Ma1, Ali Shojaie2 and George Michailidis1
1University of Michigan
2University of Washington
mjing@umich.edu

Pathway enrichment analysis has become a key tool for biomedical researchers to gain insight into the underlying biology of differentially expressed genes, proteins and metabolites. It reduces complexity and provides a systems-level view of changes in cellular activity in response to treatments and/or progression of disease states. Methods that use pathway topology information have been shown to outperform simpler methods based on over-representation analysis. However, despite significant progress in understanding the associations among members of biological pathways and the expansion of knowledge databases such as the Kyoto Encyclopedia of Genes and Genomes, Reactome, BioCarta, etc., the existing network information may be incomplete or inaccurate and is not condition-specific. We propose a constrained network estimation framework that combines network estimation based on cell- and condition-specific omics data with interaction information from existing databases. The resulting pathway topology information is subsequently used to provide a framework for simultaneous testing of differences in mean expression levels as well as interaction mechanisms. We study the asymptotic properties of the proposed network estimator and the test for pathway enrichment, and investigate its small sample performance in simulated experiments and on a bladder cancer study involving metabolomics data.

Estimation of A Linear Model with Fuzzy Data Treated as Special Functional Data
Wang Dabuxilatu
Guangzhou University
wangdabu@gzhu.edu.cn

Data which cannot be exactly described by means of numerical values, such as evaluations, medical diagnoses, quality ratings, and vague economic items, to name but a few, are frequently classified as either nominal or ordinal. However, we should be aware that with such representations of data (e.g., when the categories are labeled with numerical values), the statistical analysis is limited, and sometimes the interpretation and reliability of the conclusions are affected. An easy-to-use representation of such data through fuzzy values (fuzzy data) could be employed. The measurement scale of fuzzy values includes, in particular, real vectors and set values as special elements. It is more expressive than ordinal scales and more accurate than rounding or using real- or vectorial-valued codes. The transition between closely different values can be made gradually, and the variability, accuracy and possible subjectiveness can be well reflected in describing data. Fuzzy data could be viewed as special


functional data, via the so-called support function of the data, as it establishes a useful embedding of the space of fuzzy data into a cone of a functional Hilbert space.
Simple linear regression models with fuzzy data have been studied from different perspectives and in different frameworks. The least squares estimation of real-valued and set-valued parameters under the generalized Hausdorff metric and the Hukuhara difference has been obtained. However, due to the nonlinearity of the space of fuzzy random sets, it is difficult to consider parameter estimation for a multivariate linear model with fuzzy random sets. We will treat the fuzzy data as special functional data to estimate a multivariate linear model within a cone of a functional Hilbert space. As a case, we consider LR fuzzy random sets (LR fuzzy values or LR fuzzy data), which are a sort of fuzzy data applied to model usual random experiments when the characteristic observed on each result can be described with fuzzy numbers of a particular class, determined by three random variables (the center, the left spread and the right spread) under the given shape functions L and R. LR fuzzy random sets are widely applied in information science, decision making, operational research, and economic and financial modeling. Using a least squares approach, we obtain an estimation of the set-valued parameters of the multivariate regression model with LR fuzzy random sets under the L2 metric δ2dLS; some bootstrap distributions for the spread variables of the fuzzy random residual term are given.

Efficient Estimation of Sparse Directed Acyclic Graphs Under Compounded Poisson Data
Sung Won Han and Hua Zhong
New York University
sungwonhan2@gmail.com

Certain gene expression measurements, such as RNA-sequence measurements, are recorded as count data, which can be assumed to follow a compounded Poisson distribution. This presentation proposes an efficient heuristic algorithm to estimate the structure of directed acyclic graphs under the L1-penalized likelihood with Poisson log-normal distributed data, given that the variable ordering is unknown. To obtain a closed form of the penalized likelihood, we apply Laplace integral approximation for the unobserved normal variables, and we use two iterative optimization steps to estimate an adjacency matrix and the unobserved parameters. The adjacency matrix is estimated by separable lasso problems, and the unobserved parameters of the normal distribution are estimated by separable optimization problems. The simulation results show that our proposed method performs better than the data transformation method in terms of true positive rate and Matthews correlation coefficient, except under low count data with many zeros. Large data variance and a large number of variables benefit the proposed method more.

Asymptotically Normal and Efficient Estimation of Covariate-Adjusted Gaussian Graphical Model
Mengjie Chen, Zhao Ren, Hongyu Zhao and Harrison Zhou
Yale University
zhaoren@yale.edu

A tuning-free procedure is proposed to estimate the covariate-adjusted Gaussian graphical model. For each finite subgraph, this estimator is asymptotically normal and efficient. As a consequence, a confidence interval can be obtained for each edge. The procedure enjoys easy implementation and efficient computation through parallel estimation on subgraphs or edges. We further apply the asymptotic normality result to perform support recovery through edge-wise adaptive thresholding. This support recovery procedure

is called ANTAC, standing for Asymptotically Normal estimation with Thresholding after Adjusting Covariates. ANTAC outperforms other methodologies in the literature in a range of simulation studies. We apply ANTAC to identify gene-gene interactions using a yeast eQTL (genome-wide expression quantitative trait loci) dataset. Our result achieves better interpretability and accuracy in comparison with the CAPME (covariate-adjusted precision matrix estimation) method proposed by Cai, Li, Liu and Xie (2013). This is joint work with Mengjie Chen, Hongyu Zhao and Harrison Zhou.

Session 50 Personalized Medicine and Adaptive Design

MicroRNA Array Normalization
Li-Xuan Qin and Qin Zhou
Memorial Sloan Kettering Cancer Center
qinl@mskcc.org
MicroRNA microarrays possess a number of unique data features that challenge the assumptions key to many normalization methods. We assessed the performance of existing normalization methods using two Agilent microRNA array datasets derived from the same set of tumor samples: one dataset was generated using a blocked randomization design when assigning arrays to samples and hence was free of confounding array effects; the second dataset was generated without blocking or randomization and exhibited array effects. The randomized dataset was assessed for differential expression between two tumor groups and treated as the benchmark. The non-randomized dataset was assessed for differential expression after normalization and compared against the benchmark. Normalization improved the true positive rate significantly, but the non-randomized data still possessed a false discovery rate as high as 50%, regardless of the specific normalization method applied. We performed simulation studies under various scenarios of differential expression patterns to assess the generalizability of our empirical observations.
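For readers unfamiliar with array normalization, the generic idea can be sketched with quantile normalization, one common method of the kind evaluated in such comparisons. This is only an illustration of the technique in general, not of the specific methods or data assessed in the abstract.

```python
import numpy as np

def quantile_normalize(x):
    """Force every array (column) to share the same empirical
    distribution, namely the mean of the sorted columns."""
    order = np.argsort(x, axis=0)                  # per-array sort order
    ranks = np.argsort(order, axis=0)              # rank of each probe
    mean_sorted = np.sort(x, axis=0).mean(axis=1)  # reference distribution
    return mean_sorted[ranks]

# Two toy "arrays" (columns) measuring the same probes with a shift.
x = np.array([[5.0, 7.0],
              [2.0, 4.0],
              [3.0, 5.0]])
x_norm = quantile_normalize(x)
print(x_norm)
```

After normalization, both columns carry identical marginal distributions, which removes array-level intensity effects but, as the abstract notes, cannot undo confounding introduced by a non-randomized design.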

Combining Multiple Biomarker Models with Covariates in Logistic Regression Using Modified ARM (Adaptive Regression by Mixing) Approach
Yanping Qiu1 and Rong Liu2
1Merck & Co.
2Bayer HealthCare
rongliufl@gmail.com
Biomarkers are widely used as indicators of some biological state or condition in medical research. One single biomarker may not be sufficient to serve as an optimal screening device for early detection or prognosis for many diseases. A combination of multiple biomarkers will usually lead to potentially more sensitive screening rules. Therefore, great interest has arisen in developing methods for combining biomarkers. A biomarker selection procedure will be necessary for efficient detection. In this article, we propose a model-combining algorithm for classification with some necessary covariates in biomarker studies. It selects some best models according to a criterion and considers weighted combinations of various logistic regression models via ARM (adaptive regression by mixing). The weights and algorithm are justified using cross-validation methods. Simulation studies are performed to assess the finite-sample properties of the proposed model-combining method. It is illustrated with an application to data from a vaccine study.

A New Association Test for Case-Control GWAS Based on Disease Allele Selection
Zhongxue Chen


Indiana University
zc3@indiana.edu
Current robust association tests for case-control genome-wide association study (GWAS) data are mainly based on the assumption of some specific genetic models. Due to the richness of genetic models, this assumption may not be appropriate. Therefore, robust but powerful association approaches are desirable. Here we propose a new approach to testing for the association between genotype and phenotype for case-control GWAS. This method assumes a generalized genetic model and is based on the selected disease allele to obtain a p-value from the more powerful one-sided test. Through a comprehensive simulation study, we assess the performance of the new test by comparing it with existing methods. Some real data applications are used to illustrate the use of the proposed test. Based on the simulation results and real data applications, the proposed test is powerful and robust.

On Classification Methods for Personalized Medicine and Individualized Treatment Rules
Daniel Rubin
United States Food and Drug Administration
DanielRubin@fda.hhs.gov
An important problem in personalized medicine is to construct individualized treatment rules from clinical trials. Instead of recommending a single treatment for all patients, such a rule tailors treatments based on patient characteristics in order to optimize response to therapy. In a 2012 JASA article, Zhao et al. showed a connection between this problem of constructing an individualized treatment rule and binary classification. For instance, in a two-arm clinical trial with binary outcomes and 1:1 randomization, the problem of constructing an individualized treatment rule can be reduced to the classification problem in which one restricts to responders and builds a classifier that predicts subjects' treatment assignments. We extend this method to show an analogous reduction to the problem in which one restricts to non-responders and must build a classifier that predicts which treatments subjects were not assigned. We then use results from statistical efficiency theory to show how to efficiently combine the information from responders and non-responders. Simulations show the benefits of the new methodology.
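The responder reduction described above can be sketched directly: restrict to responders, fit any classifier of treatment assignment, and read the fitted classifier as the estimated treatment rule. The toy below uses simulated data and a deliberately crude one-split classifier; it illustrates the reduction only, not the article's efficient combination of responders and non-responders.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy 1:1 randomized trial with a binary outcome: treatment B works
# better for x >= 0, treatment A for x < 0 (response probs 0.7 vs 0.3).
n = 2000
x = rng.uniform(-1, 1, n)
trt = rng.integers(0, 2, n)  # 0 = A, 1 = B
p_resp = np.where((x >= 0) == (trt == 1), 0.7, 0.3)
resp = rng.uniform(size=n) < p_resp

# Reduction: keep responders only and classify their treatment
# assignment with a simple threshold rule on x.
xr, tr = x[resp], trt[resp]
def rule(threshold):
    return lambda z: (z >= threshold).astype(int)

# Pick the split that best predicts responders' assignments; that
# split is the estimated individualized treatment rule.
grid = np.linspace(-0.9, 0.9, 37)
best = max(grid, key=lambda c: np.mean(rule(c)(xr) == tr))
itr = rule(best)
print(best)  # should land near the true decision boundary at 0
```

The intuition is that responders are over-represented on the arm that was right for them, so predicting their assignments recovers the optimal rule.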

Bayesian Adaptive Design for Dose-Finding Studies with Delayed Binary Responses
Xiaobi Huang1 and Haoda Fu2
1Merck & Co.
2Eli Lilly and Company
xiaobihuang@merck.com
Bayesian adaptive design is a popular concept in recent dose-finding studies. The idea of adaptive design is to use accrued data to make adaptations or modifications to an ongoing trial to improve its efficiency. During the interim analysis, most current methods only use data from patients who have completed the study. However, in certain therapeutic areas, such as diabetes and obesity, subjects are usually studied for months to observe a treatment effect. Thus, a large proportion of them have not completed the study at the interim analysis. It could lead to extensive information loss if we only incorporate subjects who completed the study at the interim analysis. Fu and Manner (2010) proposed a Bayesian integrated two-component prediction model to incorporate subjects who have not yet completed the study at the time of interim analysis. This method showed efficiency gains with continuous delayed responses. In this paper, we extend this method to accommodate delayed binary responses and illustrate the Bayesian adaptive design through a simulation example.

Session 51 New Development in Functional Data Analysis

Variable Selection and Estimation for Longitudinal Survey Data
Li Wang1, Suojin Wang2 and Guannan Wang1
1University of Georgia
2Texas A&M University
guannan@uga.edu
There is wide interest in studying longitudinal surveys, where sample subjects are observed successively over time. Longitudinal surveys are used in many areas today, for example, in the health and social sciences, to explore relationships or to identify significant variables in regression settings. This paper develops a general strategy for the model selection problem in longitudinal sample surveys. A survey-weighted penalized estimating equation approach is proposed to select significant variables and estimate the coefficients simultaneously. The proposed estimators are design consistent and perform as well as the oracle procedure when the correct submodel is known. The estimating function bootstrap is applied to obtain the standard errors of the estimated parameters with good accuracy. A fast and efficient variable selection algorithm is developed to identify significant variables for complex longitudinal survey data. Simulated examples are illustrated to show the usefulness of the proposed methodology under various model settings and sampling designs.

Estimation of Nonlinear Differential Equation Model Using Generalized Smoothing
Inna Chervoneva1, Tatiyana V. Apanasovich2 and Boris Freydin1
1Thomas Jefferson University
2George Washington University
apanasovich@gwu.edu
In this work, we develop an ordinary differential equations (ODE) model of physiological regulation of glycemia in type 1 diabetes mellitus (T1DM) patients in response to meals and intravenous insulin infusion. Unlike the majority of existing mathematical models of glucose-insulin dynamics, the parameters in our model are estimable from a relatively small number of noisy observations of plasma glucose and insulin concentrations. For estimation, we adopt the generalized smoothing estimation of nonlinear dynamic systems of Ramsay et al. (2007). In this framework, the ODE solution is approximated with a penalized spline, where the ODE model is incorporated in the penalty. We propose to optimize the generalized smoothing by using penalty weights that minimize the covariance penalties criterion (Efron, 2004). The covariance penalties criterion provides an estimate of the prediction error for nonlinear estimation rules resulting from nonlinear and/or non-homogeneous ODE models, such as our model of glucose-insulin dynamics. We also propose to select the optimal number and location of knots for the B-spline bases used to represent the ODE solution. The results of a small simulation study demonstrate the advantages of optimized generalized smoothing in terms of smaller estimation errors for ODE parameters and smaller prediction errors for solutions of differential equations. Using the proposed approach to analyze the glucose and insulin concentration data in T1DM patients, we obtained a good approximation of global glucose-insulin dynamics and physiologically meaningful parameter estimates.

A Functional Data Approach to Modeling Brain Image Data
Yihong Zhao1, R. Todd Ogden2 and Huaihou Chen1
1New York University
2Columbia University
zhaoy05@nyumc.org


Resting-state functional magnetic resonance imaging (fMRI) is sensitive to functional brain changes related to many psychiatric disorders and thus becomes increasingly important in medical research. One useful approach for fitting linear models with scalar outcomes and image predictors involves transforming the functional data to the wavelet domain and converting the data-fitting problem to a variable selection problem. Applying the LASSO procedure in this situation has been shown to be efficient and powerful. In this study, we explore possible directions for improvement of this method. The finite sample performance of the proposed methods will be compared through simulations and real data applications in mental health research. We believe applying these procedures can lead to improved estimation and prediction as well as better stability. An illustration of modeling psychiatric traits based on brain-imaging data will be presented.

Estimation in Functional Linear Quantile Regression
Linglong Kong, Dengdeng Yu and Ivan Mizera
University of Alberta
lkong@ualberta.ca
We consider estimation in functional linear quantile regression, in which the dependent variable is scalar while the covariate is a function, and the conditional quantile for each fixed quantile index is modeled as a linear functional of the covariate. There are two common approaches for modeling the conditional mean as a linear functional of the covariate. One is to use the functional principal components of the covariates as a basis to represent the functional covariate effect. The other is to extend partial least squares to model the functional effect. The former is an unsupervised method and has been generalized to functional linear quantile regression. The latter is a supervised method and is superior to the unsupervised PCA method. In this talk, we propose to use partial quantile regression to estimate the functional effect in functional linear quantile regression. Asymptotic properties have been studied and show the virtue of our method in large samples. A simulation study is conducted to compare it with existing methods. A real data example from a stroke study is analyzed, and some interesting findings are discovered.

Session 52 Recent Regulatory/Industry Experience in Biosimilar Trial Designs

Statistical Considerations for the Development of Biosimilar Products
Nan Zhang and Eric Chi
Amgen Inc.
chi@amgen.com
As the patents of a growing number of biologic medicines have already expired or are due to expire, there has been increased interest from both the biopharmaceutical industry and the regulatory agencies in the development and approval of biosimilars. The EMA released the first general guideline on similar biological medicinal products in 2005, followed by specific guidelines for different drug classes. The FDA issued three draft guidelines on biosimilar product development in 2012. A synthesized message from these guidance documents is that, due to the fundamental differences between small molecule drug products and biologic drug products, which are made in living cells, the generic versions of biologic drug products are viewed as similar, instead of identical, to the innovative biologic drug product. Thus, more stringent requirements are necessary to demonstrate there are no clinically meaningful differences between

the biosimilar product and the reference product in terms of safety, purity and potency. In this article, we briefly review statistical issues and challenges in the clinical development of biosimilars, including criteria for biosimilarity and interchangeability, selection of endpoints and determination of equivalence margins, equivalence vs. non-inferiority, bridging and regional effects, and how to quantify the totality of the evidence.

New Analytical Methods for Non-Inferiority Trials: Covariate Adjustment and Sensitivity Analysis
Zhiwei Zhang, Lei Nie, Guoxing Soon and Bo Zhang
United States Food and Drug Administration
zhiweizhang@fda.hhs.gov
Even though an active-controlled trial provides no information about placebo, investigators and regulators often wonder how the experimental treatment would compare to placebo should a placebo arm be included in the study. Such an indirect comparison often requires a constancy assumption, namely that the control effect relative to placebo is constant across studies. When the constancy assumption is in doubt, there are ad hoc methods that "discount" the historical data in conservative ways. Recently, a covariate adjustment approach was proposed that does not require constancy or involve discounting, but rather attempts to adjust for any imbalances in covariates between the current and historical studies. This covariate-adjusted approach is valid under a conditional constancy assumption, which requires only that the control effect be constant within each subpopulation characterized by the observed covariates. Furthermore, a sensitivity analysis approach has been developed to address possible departures from the conditional constancy assumption due to imbalances in unmeasured covariates. This presentation describes these new approaches and illustrates them with examples.

Where is the Right Balance for Designing an Efficient Biosimilar Clinical Program - A Biostatistic Perspective on Appropriate Applications of Statistical Principles from New Drug to Biosimilars
Yulan Li
Novartis Pharmaceuticals Corporation
yulanli@novartis.com

Challenges of Designing/Analyzing Trials for Hepatitis C Drugs
Greg Soon
United States Food and Drug Administration
GuoxingSoon@fda.hhs.gov
There has been a surge in drug development to treat hepatitis C virus (HCV) infection in the past 3-4 years, and the landscape has shifted significantly. In particular, response rates have steadily increased from approximately 50% to now 90% for HCV genotype 1 patients during this time. While such a changing landscape is beneficial for patients, it does lead to some new challenges for future HCV drug development, particularly in the choice of control, efficacy winning criteria, and the co-development of several drugs. In this talk, I will summarize the current landscape of HCV drug development and describe some ongoing issues of interest.

GSK's Patient-level Data Sharing Program
Shuyen Ho
GlaxoSmithKline plc
shu-yenyho@gsk.com
In May 2013, GSK launched an online system which enables researchers to request access to the anonymized patient-level data from published GSK-sponsored clinical trials of authorized or


terminated medicines, Phase I-IV. Consistent with expectations of good scientific practice, researchers can request access and are required to provide a scientific protocol with a commitment to publish their findings. An Independent Review Panel is responsible for approving or denying access to the data after reviewing a researcher's proposal. Once the request is approved and a signed Data Sharing Agreement is received, access to the requested data is provided on a password-protected website to help protect research participants' privacy. This program is a step toward the ultimate aim of the clinical research community of developing a broader system where researchers will be able to access data from clinical trials conducted by different sponsors. This talk will describe some of the details of GSK's data-sharing program, including the opportunities and challenges it presents. We hope to raise the awareness of ICSA/KISS symposium participants of this program and encourage researchers to take full advantage of it to further clinical research.

Session 53 Gatekeeping Procedures and Their Application in Pivotal Clinical Trials

A General Multistage Procedure for k-out-of-n Gatekeeping
Dong Xi1 and Ajit Tamhane2
1Novartis Pharmaceuticals Corporation
2Northwestern University
dongxi@novartis.com
We generalize a multistage procedure for parallel gatekeeping to what we refer to as k-out-of-n gatekeeping, in which at least k out of n hypotheses in a gatekeeper family must be rejected in order to test the hypotheses in the following family. This gatekeeping restriction arises in certain types of clinical trials; for example, in rheumatoid arthritis trials, it is required that efficacy be shown on at least three of the four primary endpoints. We provide a unified theory of multistage procedures for arbitrary k, with k = 1 corresponding to parallel gatekeeping and k = n to serial gatekeeping. For this particular problem, the proposed procedure is simpler to apply, using a stepwise algorithm, than the mixture procedure and the graphical procedure with memory using entangled graphs.
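The k-out-of-n restriction itself is easy to state in code. The sketch below uses plain Bonferroni within the gatekeeper family and only illustrates the gate condition; it is not the proposed multistage procedure, which propagates unused alpha across stages in a more refined way, and the p-values are made up for illustration.

```python
# Minimal illustration of a k-out-of-n gatekeeper: family 2 may be
# tested only if at least k of the n family-1 hypotheses are rejected.
def k_out_of_n_gate(p_family1, k, alpha=0.05):
    n = len(p_family1)
    rejected = [p <= alpha / n for p in p_family1]  # Bonferroni in family 1
    return sum(rejected) >= k, rejected

# Rheumatoid-arthritis-style setting: 4 primary endpoints, need k = 3.
p1 = [0.001, 0.004, 0.008, 0.200]
gate_open, rej = k_out_of_n_gate(p1, k=3)
print(gate_open, rej)
```

With k = 1 the gate opens as soon as any family-1 hypothesis is rejected (parallel gatekeeping); with k = n all must be rejected (serial gatekeeping), matching the two extremes named in the abstract.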

Multiple Comparisons in Complex Trial Designs
H.M. James Hung
United States Food and Drug Administration
hsienminghung@fda.hhs.gov
As the costs of clinical trials increase greatly, in addition to other considerations, the clinical development program increasingly involves more than one trial for assessing the treatment effect of a test drug, particularly on adverse clinical outcomes. A number of complex trial designs have been encountered in regulatory applications. In one scenario, the primary efficacy endpoint requires two positive trials to conclude a treatment effect, while the key secondary endpoint is a major adverse clinical endpoint, such as mortality, that needs to rely on integration of multiple trials in order to have sufficient statistical power to show the treatment effect. This presentation stipulates the potential utility of such a trial design and the challenging multiplicity issues with statistical inference.

Use of Bootstrapping in Adaptive Designs with Multiplicity Issues
Jeff Maca
Quintiles
jeffmaca@quintiles.com
When designing a clinical study, there are often many parameters which are either unknown or not known with the precision necessary to have confidence in the overall design. This has led sponsors to want to design studies which are adaptive in nature and can adjust for these design parameters by using data from the study to estimate them. As there are many different design parameters, which depend on the type of study, many different types of adaptive designs have been proposed. It is also possible that one of the issues in the design of the study is the optimal multiplicity strategy, which could be based on assumptions about the correlation of the multiple endpoints, often very difficult to know prior to study start. The proposed methodology would use the data to estimate these parameters and correct for any inaccuracies in the assumptions.

Evaluating Commonly Used Multiple Testing Procedures in Drug Development
Michael Lee
Janssen Research & Development
mlee60@its.jnj.com
Multiplicity issues arise frequently in clinical trials with multiple endpoints and/or multiple doses. In drug development, because of regulatory requirements, control of the family-wise error rate (FWER) is essential in pivotal trials. Numerous multiple testing procedures that control the FWER in the strong sense are available in the literature. Particularly in the last decade, efficient testing procedures such as fallback procedures, gatekeeping procedures and the graphical approach were proposed. Depending on the objectives of a study, one of these testing procedures can outperform others. To understand which testing procedure is preferable under certain circumstances, we use a simulation approach to evaluate the performance of a few commonly used multiple testing procedures. Evaluation results and recommendations will be presented.
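Of the procedures mentioned, the fallback procedure is the simplest to sketch: hypotheses are tested in a fixed order with pre-assigned shares of alpha, and the alpha of each rejected hypothesis is carried forward to the next test. This is a minimal illustration of that one procedure, with made-up p-values, not the evaluation framework of the talk.

```python
# Fallback procedure sketch: ordered hypotheses, weights summing to 1;
# a rejected hypothesis passes its accumulated alpha to the next one,
# a retained hypothesis forfeits it.
def fallback(p_values, weights, alpha=0.05):
    assert abs(sum(weights) - 1.0) < 1e-12
    local_alpha, decisions = 0.0, []
    for p, w in zip(p_values, weights):
        local_alpha += alpha * w      # this hypothesis's share (+ carry-over)
        reject = p <= local_alpha
        decisions.append(reject)
        if not reject:
            local_alpha = 0.0         # forfeit accumulated alpha
    return decisions

print(fallback([0.01, 0.04, 0.30], [0.5, 0.3, 0.2]))
```

Unlike a fixed-sequence test, a non-rejection does not stop the procedure: later hypotheses can still be tested at their own (smaller) alpha shares.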

Session 54 Approaches to Assessing Qualitative Interactions

Interval Based Graphical Approach to Assessing Qualitative Interaction
Guohua Pan and Eun Young Suh
Johnson & Johnson
esuh@its.jnj.com
In clinical studies comparing treatments, the population often consists of subgroups of patients with different characteristics, and investigators often wish to know whether treatment effects are homogeneous over the various subgroups. Qualitative interaction occurs when the direction of the treatment effect varies among subgroups. In the presence of a qualitative interaction, treatment recommendation is often challenging. In medical research and in applications to health authorities for approvals of new drugs, qualitative interaction and its impact need to be carefully evaluated. The initial statistical method for assessing qualitative interaction was developed by Gail and Simon (GS) in 1985 and has been incorporated into commercial statistical software such as SAS. While relatively often used, the GS method and its interpretation are not easily understood by medical researchers. Alternative approaches have been researched since then. One of the promising methods utilizes graphical representation of specially devised intervals for the treatment effects in the subgroups. If some of the intervals are to the left and others to the right of a vertical line representing no treatment difference, there is statistical evidence of a qualitative interaction, and otherwise not. This feature, similar to the familiar forest plots by subgroups, is naturally appealing to clinical scientists for examining and understanding qualitative interactions. These specially devised intervals


are shorter than simultaneous confidence intervals for treatment effects in the subgroups and are shown to rival the GS method in statistical power. The method is easy to use and additionally provides an explicit power function, which the GS method lacks. This talk will review and contrast statistical methods for assessing qualitative interaction, with an emphasis on the above described graphical approach. Data from mega clinical trials on cardiovascular diseases will be analyzed to illustrate and compare the methods.

Expected Variation and Quantitative and Qualitative Interaction in Clinical Trials
Xiaolong Luo
Celgene Corporation
xluo@celgene.com

Post hoc findings of unexpected heterogeneous treatment effects have been a challenge in the interpretation of clinical trials for sponsors, regulatory agencies, and medical practitioners. They are possible simply due to chance, or due to fundamental treatment effect differentiation. Without repeating the resource-intensive clinical trials, it is critical to examine the framework of the given studies and to explore the likely model that may explain the overly simplified analyses. In this talk we will describe both theory and real clinical trials that can shed light on this complex and challenging issue.

A Bayesian Approach to Qualitative Interaction
Emine O. Bayman
University of Iowa
emine-bayman@uiowa.edu

Differences in treatment effects between centers in a multi-center trial may be important. These differences represent treatment-by-subgroup interaction. Qualitative interaction occurs when the simple treatment effect in one subgroup has a different sign than in another subgroup [1]; this interaction is important. Quantitative interaction occurs when the treatment effects are of the same sign in all subgroups, and is often not important because the treatment recommendation is identical in all subgroups.

A hierarchical model is used, with exchangeable mean responses to each treatment between subgroups. A Bayesian test of qualitative interaction is developed [2] by calculating the posterior probability of qualitative interaction and the corresponding Bayes factor. The model is motivated by two multi-center trials with binary responses [3]. The frequentist power and type I error of the test using the Bayes factor are examined and compared with two other commonly used frequentist tests, the Gail and Simon [4] and Piantadosi and Gail [5] tests. The impact of imbalance between the sample sizes in each subgroup on power is examined under different scenarios. The method is implemented using WinBUGS and R, with the R2WinBUGS interface.

References: 1. Peto R. Statistical aspects of cancer trials. In: Treatment of Cancer, edited by Halnan KE. London: Chapman & Hall, 1982, pp. 867-871. 2. Bayman EO, Chaloner K, Cowles MK. Detecting qualitative interaction: a Bayesian approach. Statistics in Medicine 2010; 29: 455-63. 3. Todd MM, Hindman BJ, Clarke WR, Torner JC; Intraoperative Hypothermia for Aneurysm Surgery Trial I. Mild intraoperative hypothermia during surgery for intracranial aneurysm. New England Journal of Medicine 2005; 352: 135-45. 4. Gail M, Simon R. Testing for qualitative interactions between treatment effects and patient subsets. Biometrics 1985; 41: 361-372. 5. Piantadosi S, Gail MH. A comparison of the power of two tests for qualitative interactions. Statistics in Medicine 1993; 12: 1239-48.

Session 55 Interim Decision-Making in Phase II Trials

Evaluation of Interim Dose Selection Methods Using ROC Approach
Deli Wang, Lu Cui, Lanju Zhang and Bo Yang
AbbVie Inc.
deliwang@abbvie.com

Interim analyses may be planned to drop inefficacious dose(s) in dose-ranging clinical trials. Commonly used statistical methods for interim decision-making include the conditional power (CP), predicted confidence interval (PCI), and predictive power (PP) approaches. For these widely used methods, it is worthwhile to take a closer look at their performance characteristics and their interconnected relationship. This research investigates the performance of these three statistical methods in terms of decision quality, based on a receiver operating characteristic (ROC) method, in binary endpoint settings. More precisely, the performance of each method is studied based on calculated sensitivity and specificity under assumed ranges of desirable as well as undesirable outcomes. The preferred cutoff is determined, and performance comparisons across the different methods can be made. With an apparent exchangeability of the three methods, a simple and uniform approach becomes possible.
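As a toy illustration of the first of those interim rules, conditional power for a one-arm binary endpoint can be computed exactly from the binomial tail. Everything here (rates, sample sizes, the normal-approximation final test) is an invented example, not the settings studied in the talk.

```python
import math
from statistics import NormalDist

def conditional_power(r1, n1, n, p0, p1, alpha=0.025):
    """P(final one-sided z-test rejects H0: p = p0 | r1 successes in n1
    patients so far), evaluated under an assumed true response rate p1."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    # smallest total success count that makes the final z-test significant
    c = next(c for c in range(n + 1)
             if (c / n - p0) / math.sqrt(p0 * (1 - p0) / n) > z_alpha)
    n2, need = n - n1, max(c - r1, 0)
    return sum(math.comb(n2, k) * p1 ** k * (1 - p1) ** (n2 - k)
               for k in range(need, n2 + 1))

# promising interim data: 8/20 responders, assumed true rate 0.4 vs null 0.2
print(round(conditional_power(8, 20, 40, 0.2, 0.4), 3))
```

The PCI and PP rules mentioned in the abstract differ only in what is assumed for the remaining patients (an interval for the final estimate, or averaging p1 over a posterior), which is why the three rules can be put on a common ROC footing.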

Interim Monitoring for Futility Based on Probability of Success
Yijie Zhou1, Ruji Yao2, Bo Yang1 and Ram Suresh3

1AbbVie Inc.
2Merck & Co.
3GlaxoSmithKline plc
yijiezhou@abbvie.com

Statistical significance has been the traditional focus of clinical trial design. However, an increasing emphasis has been placed on the magnitude of the treatment effect, based on point estimates, to enable cross-therapy comparison. The magnitude of the point estimate needed to demonstrate sufficient medical value when compared with existing therapies is typically larger than that needed to demonstrate statistical significance. Therefore, a new clinical trial design and its interim monitoring need to take into account trial success in terms of the magnitude of the point estimate. In this talk we propose a new interim monitoring approach for futility that targets the probability of trial success in terms of achieving a sufficiently large point estimate at the end of the trial. Simulation is conducted to evaluate the operating characteristics of this approach.

Bayesian Adaptive Design in Oncology Early Phase Trials
Yuehui Wu and Ramachandran Suresh
GlaxoSmithKline plc
yuehui.2.wu@gsk.com

Efficacy assessment is commonly seen in oncology trials, as early as the expansion-cohort part of Phase I trials, and in Phase II trials. Early detection of an efficacy or futility signal can greatly help the team make early decisions on future drug development plans, such as stopping for futility or starting late-phase planning. In order to achieve this goal, a Bayesian adaptive design utilizing predictive probability is implemented. This approach allows the team to monitor efficacy data constantly as new patients' data become available and to make decisions before the end of the trial. The primary endpoint in oncology trials is usually overall survival or progression-free survival, which takes a long time to observe, so a surrogate endpoint such as overall
response rate is often used in early phase trials. Multiple boundaries, for making future strategic decisions or for different endpoints, can be provided. Simulations play a vital role in providing the various decision-making boundaries as well as the corresponding operating characteristics. Based on simulation results, for each given sample size, the minimal sample size needed for the first interim look and the futility/efficacy boundaries will be provided based on Bayesian predictive probabilities. Details of the implementation of this design in real clinical trials will be demonstrated, and the pros and cons of this type of design will also be discussed.
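A bare-bones version of the predictive-probability machinery both talks in this session build on: with a Beta prior on the response rate, the number of future responders is beta-binomial, so the probability that the final point estimate clears a target can be summed directly. The prior, counts, and target below are invented for illustration.

```python
from math import comb, lgamma, exp

def lbeta(a, b):
    # log of the Beta function via log-gamma, to keep the sum stable
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def predictive_probability(r1, n1, n, target, a=1.0, b=1.0):
    """P(final response-rate estimate >= target | r1/n1 interim responders),
    averaging the remaining patients over a Beta(a+r1, b+n1-r1) posterior."""
    a1, b1 = a + r1, b + n1 - r1
    n2 = n - n1
    pp = 0.0
    for x in range(n2 + 1):            # x = future responders
        if (r1 + x) / n >= target:     # trial "success" at the final look
            pp += comb(n2, x) * exp(lbeta(x + a1, n2 - x + b1) - lbeta(a1, b1))
    return pp

# interim 10/20 responders; success = final estimate of at least 50% in n = 40
print(round(predictive_probability(10, 20, 40, 0.5), 3))
```

Replacing the point-estimate success criterion with a posterior-probability criterion gives the futility/efficacy boundaries described in the second abstract; the beta-binomial weighting is unchanged.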

Session 56 Recent Advancement in Statistical Methods

Exact Inference: New Methods and Applications
Ian Dinwoodie
Portland State University
ihd@pdx.edu

Exact inference concerns methods that generalize Fisher's Exact Test for independence. The methods are exact in the sense that test statistics have distributions that do not depend on nuisance parameters, and asymptotic approximations are not used. However, computations are challenging and often require Monte Carlo methods. This talk gives an overview with attention to sampling techniques, including Markov chains and sequential importance sampling, with new applications to dynamical models and signalling networks.

Optimal Thresholds Criteria and Standard Criterion of VUS for ROC Surface
Chong Sun Hong
Sungkyunkwan University
cshong@skku.edu

Consider the ROC surface, which is a generalization of the ROC curve for three-class diagnostic problems. In this work we propose five criteria for the three-class ROC surface by extending the Youden index, the sum of sensitivity and specificity, the maximum vertical distance, the amended closest-to-(0,1), and the true rate. It may be concluded that these five criteria can be expressed as functions of two Kolmogorov-Smirnov (K-S) statistics. It is found that the paired optimal thresholds selected from the ROC surface are equivalent to the two optimal thresholds found from the two ROC curves. Moreover, we consider the volume under the ROC surface (VUS). The standard criteria of AUC for the probability of default based on Basel II are extended to the VUS for the ROC surface, so that standard criteria of VUS for the classification model are proposed. The ranges of the AUC, K-S, and mean difference statistics corresponding to the VUS values for each class of the standard criteria are obtained. By exploring the relationships of these statistics, the standard criteria of VUS for the ROC surface can be established.
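For the two-class case that the five criteria above generalize, the Youden index J = max over thresholds of (sensitivity + specificity - 1) is a one-liner; the three-class extensions scan pairs of thresholds on the ROC surface instead. The scores below are invented toy data, not from the talk.

```python
def youden(controls, cases):
    """Return (J, threshold) maximizing sensitivity + specificity - 1,
    calling a subject positive when its score exceeds the threshold."""
    best = max(
        (sum(s > t for s in cases) / len(cases)           # sensitivity
         + sum(s <= t for s in controls) / len(controls)  # specificity
         - 1,
         t)
        for t in sorted(set(controls) | set(cases)))
    return best

# perfectly separated toy scores: J = 1 at the boundary score
print(youden([0.1, 0.2, 0.3], [0.4, 0.5, 0.6]))
```

The other criteria in the abstract (maximum vertical distance, closest-to-(0,1)) differ only in the objective maximized over the same threshold grid.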

Analysis of Cointegrated Models with Measurement Errors
Sung Ahn1, Hamwoom Hong2 and Sinsup Cho2

1Washington State University
2Seoul National University
ahn@wsu.edu

We study the asymptotic properties of the reduced-rank estimator of error correction models of vector processes observed with measurement errors. Although it is well known that there is no asymptotic measurement error bias when predictor variables are integrated processes in regression models (Phillips and Durlauf, 1986), we systematically investigate the effects of the measurement errors (in the dependent variables as well as in the predictor variables) on the estimation of not only the cointegrating vectors but also the speed-of-adjustment matrix. Furthermore, we present the asymptotic properties of the estimators. We also obtain the asymptotic distribution of the likelihood ratio test for the cointegrating ranks and investigate the effects of the measurement errors on the test through a Monte Carlo simulation study.

A Direct Method to Evaluate the Time-Dependent Predictive Accuracy for Biomarkers
Weining Shen, Jing Ning and Ying Yuan
University of Texas MD Anderson Cancer Center
wshen@mdanderson.org

Time-dependent areas under the receiver operating characteristic (ROC) curve (AUC) are important measures for evaluating the prediction accuracy of biomarkers for time-to-event endpoints (e.g., time to disease progression or death). In this paper we propose a direct method to estimate the AUC as a function of time, using a flexible fractional polynomials model, without the middle step of modeling the time-dependent ROC. We develop a pseudo partial-likelihood procedure for parameter estimation and provide a test procedure to compare the predictive performance between biomarkers. We establish the asymptotic properties of the proposed estimator and test statistics. A major advantage of the proposed method is its ease of making inference and comparing the prediction accuracy across biomarkers, rendering it particularly appealing for studies that require comparing and screening a large number of candidate biomarkers. We evaluate the finite-sample performance of the proposed method through simulation studies and illustrate our method in an application to primary biliary cirrhosis data.

Session 57 Building Bridges between Research and Practice in Time Series Analysis

Time Series Research at the U.S. Census Bureau
Brian C. Monsell
U.S. Census Bureau
brian.c.monsell@census.gov

The Census Bureau has taken steps to reinforce the role of research within the organization. This talk will give details on the role of statistical research at the U.S. Census Bureau, with particular attention paid to the status of current work in time series analysis and statistical software development in time series. A brief history of time series research will be given, as well as details of work of historical interest.

Temporal Causal Modeling: Methodology, Applications and Implementation
Naoki Abe1, Tanveer Faruquie1, Huijing Jiang1, Anju Kambadur1, Kimberly Lang1, Aurelie Lozano1 and Jinwoo Shin2

1IBM
2KAIST
nabe@us.ibm.com

Temporal causal modeling is an approach to modeling and causal inference based on time series data, building on some recent advances in graphical Granger modeling. In this presentation we will review the basic concept and approach, some specific modeling algorithms, methods for associated functions (e.g., root cause analysis), as well as some efforts at scaling these methods via parallel implementation. We will also describe some business applications of this approach in a number of domains. (The authors are ordered alphabetically.)


Issues Related to the Use of Time Series in Model Building and Analysis
William W. S. Wei
Temple University
wwei@temple.edu

Time series are used in many studies for model building and analysis. We must be very careful to understand the kind of time series data used in the analysis. In this presentation we will begin with some issues related to the use of aggregate and systematic-sampling time series. Since several time series are often used in studying the relationships among variables, we will also consider vector time series modeling and analysis. Although the basic procedures of model building for univariate and vector time series are the same, there are some important phenomena unique to vector time series. Therefore, we will also discuss some issues related to vector time series models. Understanding these issues is important when we use time series data in modeling and analysis, regardless of whether the series is univariate or multivariate.

Session 58 Recent Advances in Design for Biostatistical Problems

Optimal Designs for N-of-1 Trials
Yin Li and Keumhee Chough Carriere
University of Alberta
kccarrie@ualberta.ca

N-of-1 trials are randomized multi-crossover experiments using two or more treatments on a single patient. They provide evidence-based information on an individual patient, thus optimizing the management of the individual's chronic disease. Such trials are preferred in many medical experiments, as opposed to the more conventional statistical designs constructed to optimize treating the average patient. N-of-1 trials are also popular when the sample size is too small to adopt traditional optimal designs. However, there are very few guidelines available in the literature. We constructed optimal N-of-1 designs for two treatments under a variety of conditions on the carryover effects, the covariance structure, and the number of planned periods. Extension to optimal aggregated N-of-1 designs is also discussed.

Efficient Algorithms for Two-stage Designs in Phase II Clinical Trials
Seongho Kim1 and Weng Kee Wong2

1Wayne State University/Karmanos Cancer Institute
2University of California at Los Angeles
kimse@karmanos.org

Single-arm two-stage designs have been widely used in phase II clinical trials. One of the most popular is Simon's optimal two-stage design, which minimizes the expected sample size under the null hypothesis. Currently, a greedy search algorithm is often used to evaluate every possible combination of sample sizes for optimal two-stage designs. However, such a greedy strategy is computationally intensive, and so is not feasible for large sample sizes or adaptive two-stage designs with many parameters. An efficient global optimization method, discrete particle swarm optimization (DPSO), is therefore developed to find two-stage designs efficiently, and is compared with greedy algorithms for Simon's optimal two-stage and adaptive two-stage designs. It is further shown that DPSO can be efficiently applied to complicated adaptive two-stage designs, even with three prefixed possible response rates, which a greedy algorithm cannot handle.
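The exhaustive search that the abstract treats as the baseline (and that DPSO is meant to replace) can be written directly from the design's two binomial stages. This is a hedged sketch with simple pruning, fine for the small classical case shown, and exactly the kind of search that becomes infeasible for the large adaptive designs the talk targets.

```python
import math
from functools import lru_cache

@lru_cache(maxsize=None)
def pmf(k, n, p):
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

@lru_cache(maxsize=None)
def cdf(k, n, p):
    return sum(pmf(i, n, p) for i in range(k + 1)) if k >= 0 else 0.0

def simon_optimal(p0, p1, alpha, beta, nmax=40):
    """Exhaustive search for Simon's optimal two-stage design (r1/n1, r/n):
    stop for futility after stage 1 if responses <= r1; declare promising
    at the end if total responses > r.  Minimizes E[N | p0]."""
    best = None  # (EN0, r1, n1, r, n)
    for n1 in range(1, nmax):
        for r1 in range(n1 + 1):
            pet0 = cdf(r1, n1, p0)              # early-stop prob under H0
            if 1 - cdf(r1, n1, p1) < 1 - beta:  # power ceiling unreachable;
                break                           # larger r1 is even worse
            for n in range(n1 + 1, nmax + 1):
                en0 = n1 + (1 - pet0) * (n - n1)
                if best and en0 >= best[0]:
                    break                       # EN0 only grows with n
                for r in range(r1, n):          # smallest r controlling alpha
                    size = sum(pmf(x, n1, p0) * (1 - cdf(r - x, n - n1, p0))
                               for x in range(r1 + 1, n1 + 1))
                    if size <= alpha:
                        power = sum(pmf(x, n1, p1) * (1 - cdf(r - x, n - n1, p1))
                                    for x in range(r1 + 1, n1 + 1))
                        if power >= 1 - beta:
                            best = (en0, r1, n1, r, n)
                        break
    return best

# classic setting: p0 = 0.10 vs p1 = 0.30, one-sided alpha = 0.10, 90% power
print(simon_optimal(0.10, 0.30, 0.10, 0.10, nmax=36))
```

Note that E[N | p0] depends only on (r1, n1, n), so for each such triple it suffices to check whether any r satisfies both error constraints; this is what keeps the sketch tractable.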

D-optimal Designs for Multivariate Exponential and Poisson Regression Models via Ultra-Dimensional Particle Swarm Optimization
Jiaheng Qiu and Weng Kee Wong
University of California at Los Angeles
wkwong@ucla.edu

Multiple drug therapies are increasingly used to treat many diseases, such as AIDS, cancer, and rheumatoid arthritis. At the early stages of clinical research, the outcome is typically studied using a nonlinear model with multiple doses from various drugs. Advances in handling estimation issues for such models are continually made, but research to find informed design strategies has lagged. We develop a nature-inspired metaheuristic algorithm called ultra-dimensional Particle Swarm Optimization (UPSO) to find D-optimal designs for the Poisson and Exponential models for studying effects of up to 5 drugs and their interactions. This novel approach allows us to find an effective search strategy for such high-dimensional optimal designs and to gain insight into their structure, including conditions under which locally D-optimal designs are minimally supported. We implement the UPSO algorithm on a web site and apply it to redesign a real study that investigates 2-way interaction effects on the induction of micronuclei in mouse lymphoma cells from 3 genotoxic agents. We show that a D-optimal design can reap substantial benefits over the implemented design in Lutz et al. (2005).

Optimizing Two-level Supersaturated Designs by Particle Swarm Techniques
Frederick Kin Hing Phoa1, Ray-Bing Chen2, Wei-Chung Wang3 and Weng Kee Wong4

1Institute of Statistical Science, Academia Sinica
2National Cheng Kung University
3National Taiwan University
4University of California at Los Angeles
fredphoa@stat.sinica.edu.tw

Supersaturated designs (SSDs) are often used in screening experiments with a large number of factors to reduce the number of experimental runs. As more factors are used in the study, the search for an optimal SSD becomes increasingly challenging because of the large number of feasible selections of factor level settings. This talk tackles this discrete optimization problem via a metaheuristic algorithm based on Particle Swarm Optimization (PSO) techniques. Using the commonly used E(s2) criterion as an illustrative example, we were able to modify the standard PSO algorithm and find SSDs that satisfy the lower bounds calculated in Bulutoglu and Cheng (2004) and Bulutoglu (2007), showing that the PSO-generated designs are E(s2)-optimal SSDs.
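The E(s2) objective that the PSO search optimizes is itself trivial to evaluate: it averages the squared inner products over all pairs of design columns. A minimal sketch for two-level (plus/minus one) designs; the swarm search itself is not shown.

```python
from itertools import combinations

def e_s2(design):
    """E(s^2) of a two-level design given as a list of runs of +/-1 levels."""
    cols = list(zip(*design))
    pairs = list(combinations(cols, 2))
    return sum(sum(a * b for a, b in zip(u, v)) ** 2
               for u, v in pairs) / len(pairs)

# orthogonal 2^2 full factorial: every column pair has zero inner product
print(e_s2([[1, 1], [1, -1], [-1, 1], [-1, -1]]))
```

In a supersaturated design there are more columns than can be mutually orthogonal, so E(s2) cannot be driven to zero; the Bulutoglu-Cheng lower bounds cited in the abstract quantify how close one can get.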

Session 59 Student Award Session 2

Analysis of Sequence Data Under Multivariate Trait-Dependent Sampling
Ran Tao1, Donglin Zeng1, Nora Franceschini1, Kari E. North1, Eric Boerwinkle2 and Dan-Yu Lin1

1University of North Carolina at Chapel Hill
2University of Texas Health Science Center
taor@live.unc.edu

High-throughput DNA sequencing allows the genotyping of common and rare variants for genetic association studies. At the present time, and in the near future, it is not economically feasible to sequence all individuals in a large cohort. A cost-effective strategy is to sequence those individuals with extreme values of a quantitative trait. We consider the design under which the sampling depends on
multiple quantitative traits. Under such trait-dependent sampling, standard linear regression analysis can result in bias of parameter estimation, inflation of type I error, and loss of power. We construct a nonparametric likelihood function that properly reflects the sampling mechanism and utilizes all available data. We implement a computationally efficient EM algorithm and establish the theoretical properties of the resulting nonparametric maximum likelihood estimators. Our methods can be used to perform separate inference on each trait or simultaneous inference on multiple traits. We pay special attention to gene-level association tests for rare variants. We demonstrate the superiority of the proposed methods over standard linear regression through extensive simulation studies. We provide applications to the Cohorts for Heart and Aging Research in Genomic Epidemiology Targeted Sequencing Study and the National Heart, Lung, and Blood Institute Exome Sequencing Project.

Empirical Likelihood Based Tests for Stochastic Ordering Under Right Censorship
Hsin-wen Chang and Ian W. McKeague

Columbia University
hc2496@columbia.edu

This paper develops an empirical likelihood approach to testing for stochastic ordering between two univariate distributions under right censorship. The proposed test is based on a maximally selected localized empirical likelihood ratio statistic. The asymptotic null distribution is expressed in terms of a Brownian bridge. The new procedure is shown, via a simulation study, to have superior power to the log-rank and weighted Kaplan-Meier tests under crossing hazard alternatives. The approach is illustrated using data from a randomized clinical trial involving the treatment of severe alcoholic hepatitis.

Multiple Genetic Loci Mapping for Latent Disease Liability Using a Structural Equation Modeling Approach with Application in Alzheimer's Disease
Ting-Huei Chen

University of North Carolina at Chapel Hill
thchen@live.unc.edu

Categorical traits, such as case-control status, are often used as response variables in genome-wide association studies of genetic loci associated with complex diseases. Using categorical variables to summarize likely continuous disease liability may lead to loss of information, and thus reduction of power to recover associated genetic loci. On the other hand, a direct study of disease liability is often infeasible because it is an unobservable latent variable. In some diseases, the underlying disease liability is manifested by several phenotypes, and thus the associated genetic loci may be identified by combining the information of multiple phenotypes. In this article we propose a novel method, named PeLatent, to address this challenge. We employ a structural equation approach to model the latent disease liability by observed manifest variables/phenotypic information, and to identify simultaneously multiple associated genetic loci by a regularized estimation method. Simulation results show that our method has substantially higher sensitivity and specificity than existing methods. Application of our method to a genome-wide association study of Alzheimer's disease (AD) identifies 27 single nucleotide polymorphisms (SNPs) associated with AD. These 27 SNPs are located within 19 genes, and several of these genes are known to be related to Alzheimer's disease as well as neural activities.

Session 60 Semi-parametric Methods

Semiparametric Estimation of Mean and Variance in Generalized Estimating Equations
Jianxin Pan1 and Daoji Li2
1The University of Manchester
2University of Southern California
daojili@marshall.usc.edu

Efficient estimation of regression parameters is a major objective in the analysis of longitudinal data. Existing approaches usually focus on modeling only the mean and treat the variance as a nuisance parameter. The common assumption is that the variance is a function of the mean, and the variance function is further assumed to be known. However, the estimator of the regression parameters can be very inefficient if the variance function or the variance is misspecified. In this paper, a flexible semiparametric regression approach for longitudinal data is proposed to jointly model the mean and variance. The novel semiparametric mean and variance models offer great flexibility in formulating the effects of covariates and time on the mean and variance. We simultaneously estimate the parametric and nonparametric components in the models by using a B-spline based approach. The asymptotic normality of the resulting estimators for the parametric components in the proposed models is established, and the optimal rate of convergence of the nonparametric components is obtained. Our simulation study shows that the proposed approach yields more efficient estimators of the mean parameters than the conventional GEE approach. The proposed approach is also illustrated with a real data analysis.

An Empirical Approach to Efficient Estimation of Linear Functionals of a Probability with Side Information
Hanxiang Peng, Shan Wang and Lingnan Li
Indiana University-Purdue University Indianapolis
hpeng@math.iupui.edu

In this talk we'll construct efficient estimators of linear functionals of a probability measure when side information is available. Our approach is based on maximum empirical likelihood. We will exhibit that the proposed approach is mathematically simpler and computationally easier than the usual maximum empirical likelihood estimators. Several examples are given of the possible side information. We also report some simulation results.

M-estimation for General ARMA Processes with Infinite Variance
Rongning Wu
Baruch College, City University of New York
rongningwu@baruch.cuny.edu

General autoregressive moving average (ARMA) models extend the traditional ARMA models by removing the assumptions of causality and invertibility. The assumptions are not required under a non-Gaussian setting for the identifiability of the model parameters, in contrast to the Gaussian setting. We study M-estimation for general ARMA processes with infinite variance, where the distribution of innovations is in the domain of attraction of a non-Gaussian stable law. Following the approach taken by Davis et al. (1992) and Davis (1996), we derive a functional limit theorem for random processes based on the objective function, and establish asymptotic properties of the M-estimator. We also consider bootstrapping the M-estimator and extend the results of Davis & Wu (1997) to the present setting, so that statistical inferences are readily implemented. Simulation studies are conducted to evaluate the finite sample performance of the M-estimation and bootstrap procedures. An empirical example of financial time series is also provided.


Sufficient Dimension Reduction via Principal Lq Support Vector Machine
Andreas Artemiou1 and Yuexiao Dong2

1Cardiff University
2Temple University
ydong@temple.edu

Principal support vector machine was proposed recently by Li, Artemiou and Li (2011) to combine L1 support vector machine and sufficient dimension reduction. We introduce the Lq support vector machine as a unified framework for linear and nonlinear sufficient dimension reduction. By noticing that the solution of the L1 support vector machine may not be unique, we set q > 1 to ensure the uniqueness of the solution. The asymptotic distribution of the proposed estimators is derived for q = 2. We demonstrate through numerical studies that the proposed L2 support vector machine estimators improve existing methods in accuracy and are less sensitive to the tuning parameter selection.

Nonparametric Quantile Regression via a New MM Algorithm
Bo Kai1, Mian Huang2, Weixin Yao3 and Yuexiao Dong4

1College of Charleston
1National Chengchi University
2Shanghai University of Finance and Economics
3Kansas State University
4Temple University
kaib@cofc.edu

Nonparametric quantile regression is an important statistical model that has been widely used in many research fields and applications. However, its optimization is very challenging, since the objective functions are non-differentiable. In this work we propose a new MM algorithm for the nonparametric quantile regression model. The proposed algorithm simultaneously updates the quantile function and yields a smoother estimate of the quantile function. We systematically study the new MM algorithm in local linear quantile regression and show that the proposed algorithm preserves the monotone descent property of MM algorithms in an asymptotic sense. Monte Carlo simulation studies will be presented to show the finite sample performance of the proposed algorithm.
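The flavor of such an MM step can be seen in the simpler linear-model case: the non-differentiable check loss is majorized by a weighted quadratic (a perturbed majorizer in the spirit of Hunter and Lange, 2000), so each iteration reduces to weighted least squares. This is a hypothetical sketch of that general idea, not the authors' local linear algorithm.

```python
import numpy as np

def qr_mm(x, y, tau, n_iter=200, eps=1e-6):
    """Linear quantile regression by an MM algorithm: each step minimizes a
    smooth quadratic majorizer of the check loss at the current residuals."""
    X = np.column_stack([np.ones_like(y), x])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]   # OLS start
    for _ in range(n_iter):
        r = y - X @ beta
        w = 1.0 / (eps + np.abs(r))               # majorizer weights
        # stationarity of the surrogate: X'WX beta = X'Wy + (2*tau - 1) X'1
        A = X.T @ (w[:, None] * X)
        b = X.T @ (w * y) + (2 * tau - 1) * X.sum(axis=0)
        beta = np.linalg.solve(A, b)
    return beta

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 500)
y = 1 + 2 * x + rng.normal(0, 0.5, 500)
print(qr_mm(x, y, 0.75))   # intercept shifts up toward the 75th percentile
```

The local linear version in the talk replaces the global design matrix with kernel-weighted local fits, but the per-iteration weighted least squares structure is the same.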

Regression Estimators Using Stratified Ranked Set Sampling
Arbita Chatterjee, Hani Samawi, Lili Yu, Daniel Linder, Jingxian Cai and Robert Vogel
Georgia Southern University
jxcai19880721@hotmail.com

This article is intended to investigate the performance of two types of stratified regression estimators, namely the separate and the combined estimator, using stratified ranked set sampling (SRSS), introduced by Samawi (1996). The expressions for the mean and variance of the proposed estimates are derived, and the estimates are shown to be unbiased. A simulation study is designed to compare the efficiency of SRSS relative to other sampling procedures under varying model scenarios. Our investigation indicates that the regression estimator of the population mean obtained through an SRSS becomes more efficient than the crude sample mean estimator using stratified simple random sampling. These findings are also illustrated with the help of a data set on bilirubin levels in babies in a neonatal intensive care unit. Key words: ranked set sampling, stratified ranked set sampling, regression estimator.

This article is intended to investigate the performance of two typesof stratified regression estimators namely the separate and the com-bined estimator using stratified ranked set sampling (SRSS) intro-duced by Samawi (1996) The expressions for mean and varianceof the proposed estimates are derived and are shown to be unbiasedA simulation study is designed to compare the efficiency of SRSSrelative to other sampling procedure under varying model scenar-ios Our investigation indicates that the regression estimator of thepopulation mean obtained through an SRSS becomes more efficientthan the crude sample mean estimator using stratified simple ran-dom sampling These findings are also illustrated with the help ofa data set on bilirubin levels in babies in a neonatal intensive careunitKey words Ranked set sampling stratified ranked set samplingregression estimator

Session 61 Statistical Challenges in Variable Selection for Graphical Modeling

Fused Community Detection
Yi Yu1, Yang Feng2 and Richard J. Samworth1

1University of Cambridge
2Columbia University
yangfeng@stat.columbia.edu

Community detection is one of the most widely studied problems in network research. In an undirected graph, communities are regarded as tightly-knit groups of nodes with comparatively few connections between them. Popular existing techniques, such as spectral clustering and variants thereof, rely heavily on the edges being sufficiently dense and the community structure being relatively obvious. These are often not satisfactory assumptions for large-scale real-world datasets. We therefore propose a new community detection method, called fused community detection (fcd), which is designed particularly for sparse networks and situations where the community structure may be opaque. The spirit of fcd is to take advantage of the edge information, which we exploit by borrowing sparse recovery techniques from regression problems. Our method is supported by both theoretical results and numerical evidence. The algorithms are implemented in the R package fcd, which is available on CRAN.
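As a point of reference for what fcd improves on, the classical spectral approach the abstract mentions splits a graph by the sign pattern of the second eigenvector of the graph Laplacian (the Fiedler vector). A minimal sketch on a toy graph of two cliques joined by a single bridge edge:

```python
import numpy as np

def fiedler_split(adj):
    """Bipartition a connected graph by the sign of the Fiedler vector
    (eigenvector of the second-smallest eigenvalue of the Laplacian)."""
    A = np.asarray(adj, dtype=float)
    L = np.diag(A.sum(axis=1)) - A       # unnormalized graph Laplacian
    _, vecs = np.linalg.eigh(L)          # eigenvalues in ascending order
    return vecs[:, 1] < 0                # sign pattern = community labels

# two 4-cliques (nodes 0-3 and 4-7) linked by one bridge edge
A = np.zeros((8, 8))
A[:4, :4] = 1
A[4:, 4:] = 1
np.fill_diagonal(A, 0)
A[3, 4] = A[4, 3] = 1
print(fiedler_split(A))
```

On dense, well-separated toy graphs like this one the spectral split is exact; the abstract's point is that it degrades on sparse graphs with opaque structure, which is the regime fcd targets.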

High Dimensional Tests for Functional Brain Networks
Jichun Xie1 and Jian Kang2

1Temple University
2Emory University
jichun@temple.edu

Large-scale resting-state fMRI studies have been conducted for patients with autism, and the existence of abnormalities in the functional connectivity between brain regions (containing more than one voxel) has been clearly demonstrated. Due to the ultra-high dimensionality of the data, current methods focusing on studying the connectivity pattern between voxels often lack power and computational efficiency. In this talk, we introduce a new framework to identify the connection pattern of gigantic networks at the desired resolution. We propose three procedures based on different network structures and testing criteria. The asymptotic null distributions of the test statistics are derived, together with their rate-optimality. Simulation results show that the tests are able to control type I error and yet are very powerful. We apply our method to a resting-state fMRI study on autism. The analysis yields interesting insights into the mechanism of autism.

Bayesian Inference of Multiple Gaussian Graphical Models
Christine Peterson1, Francesco Stingo2 and Marina Vannucci3
1Stanford University
2University of Texas MD Anderson Cancer Center
3Rice University
cbpeterson@gmail.com

In this work we propose a Bayesian approach for inference of multiple Gaussian graphical models. Specifically, we address the problem of inferring multiple undirected networks in situations where some of the networks may be unrelated, while others share common features. We link the estimation of the graph structures via a Markov random field prior, which encourages common edges. In addition, we learn which sample groups have shared graph structure by placing a spike-and-slab prior on the parameters that measure network relatedness. This approach allows us to share information between sample groups when appropriate, as well as to obtain a
measure of relative network similarity across groups. In simulation studies, we find improved accuracy of network estimation over competing methods, particularly when the sample sizes within each subgroup are moderate. We illustrate our model with an application to inference of protein networks for various subtypes of acute myeloid leukemia.

Mixed Graphical Models via Exponential Families
Eunho Yang1, Yulia Baker2, Pradeep Ravikumar1, Genevera I. Allen2 and Zhandong Liu3

1University of Texas at Austin
2Rice University
3Baylor College of Medicine
yuliabaker@rice.edu

Markov Random Fields, or undirected graphical models, are widely used to model high-dimensional multivariate data. Classical instances of these models, such as Gaussian Graphical and Ising Models, as well as recent extensions (Yang et al., 2012) to graphical models specified by univariate exponential families, assume all variables arise from the same distribution. Complex data from high-throughput genomics and social networking, for example, often contain discrete, count, and continuous variables measured on the same set of samples. To model such heterogeneous data, we develop a novel class of mixed graphical models by specifying that each node-conditional distribution is a member of a possibly different univariate exponential family. We study several instances of our model and propose scalable M-estimators for recovering the underlying network structure. Simulations, as well as an application to learning mixed genomic networks from next generation sequencing and mutation data, demonstrate the versatility of our methods.

Session 62 Recent Advances in Non- and Semi-Parametric Methods

Joint Estimation of Multiple Bivariate Densities of Protein Backbone Angles Using an Adaptive Exponential Spline Family
Lan Zhou
Texas A&M University
lzhou@stat.tamu.edu
In this talk we introduce a method for joint estimation of multiple bivariate density functions for a collection of populations of protein backbone angles. The method utilizes an exponential family of distributions for which the log densities are modeled as a linear combination of a common set of basis functions. The basis functions are obtained as bivariate splines on triangulations and are adaptively chosen based on data. The circular nature of angular data is taken into account by imposing appropriate smoothness constraints across boundaries. Maximum penalized likelihood is used for fitting the model, and an effective Newton-type algorithm is developed. A simulation study clearly showed that the joint estimation approach is statistically more efficient than estimating the densities separately. The proposed method provides a novel and unique perspective on two important and challenging problems in protein structure research, namely structure-based protein classification and quality assessment of protein structure prediction servers. The joint density estimation approach is widely applicable when there is a need to estimate multiple density functions from different populations with common features. Moreover, the coefficients of basis expansion for the fitted densities provide a low-dimensional representation that is useful for visualization, clustering, and classification of the densities. This is joint work with Mehdi Maadooliat, Xin Gao and Jianhua Huang.

Estimating Time-Varying Effects for Overdispersed Recurrent Data with Treatment Switching
Qingxia Chen1, Donglin Zeng2, Joseph G. Ibrahim2, Mouna Akacha3 and Heinz Schmidli3
1Vanderbilt University; 2University of North Carolina at Chapel Hill; 3Novartis Pharmaceuticals Corporation
cindychen@vanderbilt.edu
In the analysis of multivariate event times, frailty models assuming time-independent regression coefficients are often considered, mainly due to their mathematical convenience. In practice, regression coefficients are often time dependent, and the temporal effects are of clinical interest. Motivated by a phase III clinical trial in multiple sclerosis, we develop a semiparametric frailty modelling approach to estimate time-varying effects for overdispersed recurrent events data with treatment switching. The proposed model incorporates the treatment switching time in the time-varying coefficients. Theoretical properties of the proposed model are established, and an efficient EM algorithm is derived to obtain the maximum likelihood estimates. Simulation studies evaluate the numerical performance of the proposed model under various temporal treatment effect curves. The ideas in this paper can also be used for time-varying coefficient frailty models without treatment switching, as well as for alternative models when the proportional hazards assumption is violated. A multiple sclerosis dataset is analyzed to illustrate our methodology.

Bivariate Penalized Splines for Regression
Ming-Jun Lai and Lily Wang
University of Georgia
lilywang@uga.edu
In this work we are interested in smoothing data over complex irregular boundaries or interior holes. We propose bivariate penalized spline estimators over triangulations, using an energy functional as the penalty. We establish the consistency and asymptotic normality of the proposed estimators and study their convergence rates. A comparison with thin-plate splines is provided to illustrate some advantages of this spline smoothing approach. The proposed method can be easily applied to various smoothing problems over arbitrary domains, including irregularly shaped domains with irregularly scattered data points.

Local Feature Selection in Varying-Coefficient Models
Lan Xue1, Xinxin Shu2, Peibei Shi2, Colin Wu3 and Annie Qu2

1Oregon State University; 2University of Illinois at Urbana-Champaign; 3National Heart, Lung, and Blood Institute
xuel@stat.oregonstate.edu
We propose new varying-coefficient model selection and estimation based on the spline approach, which is capable of capturing time-dependent covariate effects. The new penalty function utilizes local-region information for varying-coefficient estimation, in contrast to the traditional model selection approach focusing on the entire region. The proposed method is extremely useful when the signals associated with relevant predictors are time-dependent, and detecting relevant covariate effects in the local region is more scientifically relevant than those of the entire region. However, this brings challenges in theoretical development, due to the large-dimensional parameters involved in the nonparametric functions to capture the local information, in addition to computational challenges in solving optimization problems with overlapping parameters for different local-region penalization. We provide the asymptotic theory of model selection consistency for detecting local signals and establish the optimal convergence rate for the varying-coefficient estimator. Our simulation studies indicate that the proposed model selection incorporating local features outperforms the global feature model selection approaches. The proposed method is also illustrated through a longitudinal growth and health study from the National Heart, Lung, and Blood Institute.

Session 63 Statistical Challenges and Development in Cancer Screening Research

Overdiagnosis in Breast and Prostate Cancer Screening: Concepts, Methods and Challenges
Ruth Etzioni, Roman Gulati and Jing Xia
Fred Hutchinson Cancer Research Center
retzioni@fhcrc.org
Overdiagnosis occurs when a tumor is detected by screening, but in the absence of screening that tumor would never have become symptomatic within the lifetime of the patient. Thus, an overdiagnosed tumor is a true extra diagnosis due solely to the existence of the screening test. Patients who are overdiagnosed cannot, by definition, be helped by the diagnosis, but they can be harmed, particularly if they are treated. Therefore, knowledge of the likelihood that a screen-detected cancer has been overdiagnosed is critical for making treatment decisions and developing screening policy. The problem of overdiagnosis has long been recognized in the case of prostate cancer and is currently an area of extreme interest in breast cancer. Published estimates of the frequency of overdiagnosis in breast and prostate cancer screening vary greatly. This presentation will investigate why different studies yield such different results. I'll explain how overdiagnosis arises and catalog the different ways it may be measured in population studies. I'll then discuss different approaches that are used to estimate overdiagnosis. Many studies use excess incidence under screening, relative to incidence without screening, as a proxy for overdiagnosis. Others use statistical models to make inferences about lead time or disease natural history, and then derive the corresponding fraction of cases that are overdiagnosed. Each approach has its limitations and challenges, but one thing is clear: the estimation approach is a major factor behind the variation in overdiagnosis estimates in the literature. I will conclude with a list of key questions that consumers of overdiagnosis studies should ask to determine the validity (or lack thereof) of study results.
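The excess-incidence proxy mentioned above reduces to simple arithmetic; the incidence figures in the example are hypothetical.

```python
def excess_incidence_fraction(inc_screened, inc_control):
    """Excess-incidence proxy for overdiagnosis: the fraction of cases
    diagnosed under screening that exceed the incidence expected without
    screening. Inputs are incidence rates over comparable person-time."""
    if inc_screened <= 0:
        raise ValueError("screened incidence must be positive")
    return (inc_screened - inc_control) / inc_screened

# Hypothetical: 320 vs 240 cases per 100,000 person-years -> 0.25
frac = excess_incidence_fraction(320, 240)
```

As the abstract stresses, this proxy is sensitive to lead time and length of follow-up, which is one reason published estimates vary so widely.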

Estimation of Biomarker Growth in a Screening Study
Lurdes Y.T. Inoue1, Roman Gulati2 and Ruth Etzioni2
1University of Washington; 2Fred Hutchinson Cancer Research Center
linoue@uw.edu
With the growing importance of biomarker-based tests for early detection and monitoring of chronic diseases, the question of how best to utilize biomarker measurements is of tremendous interest; the answer requires understanding the biomarker growth process. Prospective screening studies offer an opportunity to investigate biomarker growth while simultaneously assessing its value for early detection. However, since disease diagnosis usually terminates collection of biomarker measurements, proper estimation of biomarker growth in these studies may need to account for how screening affects the length of the observed biomarker trajectory. In this talk we

compare estimation of biomarker growth from prospective screening studies using two approaches: a retrospective approach that only models biomarker growth, and a prospective approach that jointly models biomarker growth and time to screen detection. We assess performance of the two approaches in a simulation study and using empirical prostate-specific antigen data from the Prostate Cancer Prevention Trial. We find that the prospective approach accounting for informative censoring often produces similar results, but may produce different estimates of biomarker growth in some contexts.

Estimating Screening Test Effectiveness when Screening Indication is Unknown
Rebecca Hubbard
Group Health Research Institute
hubbardr@ghc.org

Understanding the effectiveness of cancer screening tests is challenging when the same test is used for screening and also for disease diagnosis in symptomatic individuals. Estimates of screening test effectiveness based on data that include both screening and diagnostic examinations will be biased. Moreover, in many cases gold standard information on the indication for the examination is not available. Models exist for predicting the probability that a given examination was used for a screening purpose, but no previous research has investigated appropriate statistical methods for utilizing these probabilities. In this presentation we will explore alternative methods for incorporating predicted probabilities of screening indication into analyses of screening test effectiveness. Using simulation studies, we compare the bias and efficiency of alternative approaches. We also demonstrate the performance of each method in a study of colorectal cancer screening with colonoscopy. Methods for estimating regression model parameters associated with an unknown categorical predictor, such as indication for examination, have broad applicability in studies of cancer screening and other studies using data from electronic health records.

Developing Risk-Based Screening Guidelines: "Equal Management of Equal Risks"
Hormuzd Katki
National Cancer Institute
katkih@mail.nih.gov

The proliferation of disease risk calculators has not led to a proliferation of risk-based screening guidelines. The focus of risk-based screening guidelines is connecting risk stratification under the natural history of disease (without intervention) to "benefit stratification": whether the risk stratification better distinguishes people who have high benefit vs. low benefit from a screening intervention. To link risk stratification to benefit stratification, we propose the principle of "equal management of people at equal risk of disease". When applicable, this principle leads to simplified and consistent management of people with different risk factors or test results leading to the same disease risk, people who might also have a similar benefit/harm profile. We describe two examples of our approach. First, we demonstrate how the "equal management of equal risks" principle was applied to thoroughly integrate HPV testing into the new risk-based cervical cancer screening guidelines, the first thoroughly risk-based US cancer screening guidelines. Second, we use risk of lung cancer death to estimate benefit stratification for targeting CT lung cancer screening. We show how we calculated benefit stratification for CT lung screening, and also the analogous "harm stratification" and "efficiency stratification". We critically examine the limits of the "equal management of equal risks" principle. This approach of calculating benefit stratification and applying "equal management of equal risks" might be applicable in other settings to help pave the way for developing risk-based screening guidelines.

Session 64 Recent Developments in the Visualization and Exploration of Spatial Data

Recent Advancements in Geovisualization, with a Case Study on Chinese Religions
Juergen Symanzik1 and Shuming Bao2
1Utah State University; 2University of Michigan
symanzik@math.usu.edu
Producing high-quality map-based displays for economic, medical, educational, or any other kind of statistical data with geographic covariates has always been challenging. Either it was necessary to have access to high-end software, or one had to do a lot of detailed programming. Recently, R software for linked micromap (LM) plots has been enhanced to handle any available shapefiles from Geographic Information Systems (GIS). Also, enhancements have been made that allow for a fast overlay of various statistical graphs on Google maps. In this presentation, we provide an overview of the necessary steps to produce such graphs in R, starting with GIS-based data and shapefiles and ending with the resulting graphs in R. We will use data from a study on Chinese religions and society (provided by the China Data Center at the University of Michigan) as a case study for these graphical methods.

Spatial Analysis with China Geo-Explorers
Shuming Bao1, Miao Shui2 and Bing She2
1University of Michigan; 2Wuhan University
sbao@umich.edu
With the rapid development of spatial and non-spatial databases of population, economy, society, and the natural environment from different sources, times, and formats, it has been a challenge to efficiently integrate those space-time data and methodologies for spatial studies. This paper will discuss the recent development of spatial intelligence technologies and methodologies for spatial data integration and data analysis, as well as their applications in spatial studies. The presentation will introduce the newly developed spatial data explorers (China Geo-Explorer) distributed by the University of Michigan China Data Center. It will demonstrate how space-time data of different formats and sources can be integrated, visualized, analyzed, and reported in a web-based spatial system. Some applications in population and regional development, disaster assessment, environment and health, cultural and religious studies, and household surveys will be discussed for China and global studies. Future directions will be discussed finally.

Probcast: Creating and Visualizing Probabilistic Weather Forecasts
J. McLean Sloughter1, Susan Joslyn2, Patrick Tewson3, Tilmann Gneiting4 and Adrian Raftery2
1Seattle University; 2University of Washington; 3Bigger Boat Consulting; 4University of Heidelberg
sloughtj@seattleu.edu
Probabilistic methods are becoming increasingly common for weather forecasting. However, communicating uncertainty information about spatial forecasts to users is not always a straightforward task. The Probcast project (http://probcast.com) looks both to develop methodologies for spatial probabilistic weather forecasting and to develop means of communicating this information effectively. This talk will discuss both the statistical approaches used to create forecasts and the cognitive psychology research used to find the best ways to clearly communicate statistical and probabilistic information.

Session 65 Advancement in Biostatistical Methods and Applications

Estimation of Time-Dependent AUC under Marker-Dependent Sampling
Xiaofei Wang and Zhaoyin Zhu
Duke University
xiaofeiwang@duke.edu
In the biomedical field, evaluating the accuracy of a biomarker for predicting the onset of a disease or a disease condition is essential. When predicting the binary status of disease onset is of interest, the area under the ROC curve (AUC) is widely used. When predicting the time to an event is of interest, the time-dependent ROC curve (AUC(t)) can be used. In both cases, however, the simple random sampling (SRS) often used for biomarker validation is costly and requires a large number of patients. To improve study efficiency and reduce cost, marker-dependent sampling (MDS) has been proposed (Wang et al., 2012, 2013), in which selection of patients for ascertaining their survival outcomes depends on the results of biomarker assays. In this talk we will introduce a non-parametric estimator for the time-dependent AUC(t) under MDS. The consistency and asymptotic normality of the proposed estimator will be discussed. Simulation will be used to demonstrate the unbiasedness of the proposed estimator under MDS and the efficiency gain of MDS over SRS.
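The reweighting idea behind estimation under biased sampling can be sketched with a toy inverse-probability-weighted AUC(t): cases are subjects with an event by time t, and each subject is weighted by the inverse of its (marker-dependent) sampling probability. This illustrates the general IPW device only; it ignores censoring and is not the authors' estimator.

```python
import numpy as np

def ipw_auc_t(marker, event_time, sampling_prob, t):
    """Weighted concordance between cases (event_time <= t) and
    controls (event_time > t); marker ties count 1/2."""
    marker = np.asarray(marker, dtype=float)
    w = 1.0 / np.asarray(sampling_prob, dtype=float)
    case = np.asarray(event_time) <= t
    num = den = 0.0
    for i in np.where(case)[0]:
        for j in np.where(~case)[0]:
            wij = w[i] * w[j]
            den += wij
            if marker[i] > marker[j]:
                num += wij
            elif marker[i] == marker[j]:
                num += 0.5 * wij
    return num / den

# Equal sampling probabilities reduce to the ordinary empirical AUC(t)
auc = ipw_auc_t([3, 1, 2, 0], [1, 1, 5, 5], [1, 1, 1, 1], t=2)  # -> 0.75
```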

A Measurement Error Approach for Modeling Accelerometer-Based Physical Activity Data
Julia Lee, Jing Song and Dorothy Dunloop
Northwestern University
jungwha-lee@northwestern.edu
Physical activity (PA) is a modifiable lifestyle factor for many chronic diseases, with established health benefits. PA outcomes using accelerometers are measured and assessed in many studies, but there are limited statistical methods for analyzing accelerometry data. We describe a measurement error modeling approach to estimate the distribution of habitual physical activity and the sources of variation in accelerometer-based physical activity data from a sample of adults with, or at risk of, knee osteoarthritis. We model both the intra- and inter-individual variability in measured physical activity. Our model allows us to account for and adjust for measurement errors, biases, and other sources of intra-individual variation.
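The intra-/inter-individual decomposition can be sketched with the classical one-way ANOVA (method-of-moments) estimators for a balanced design of n subjects measured on m days; this shows the variance-components idea only, not the paper's full measurement-error model.

```python
import numpy as np

def variance_components(x):
    """x: (n_subjects, n_days) array of activity measurements.
    Returns (between-subject variance, within-subject variance)."""
    x = np.asarray(x, dtype=float)
    n, m = x.shape
    within = np.mean(np.var(x, axis=1, ddof=1))            # day-to-day noise
    between = np.var(x.mean(axis=1), ddof=1) - within / m  # habitual activity
    return max(between, 0.0), within

between, within = variance_components([[0, 0], [2, 2], [4, 4]])
# Reliability of a single day's measurement: between / (between + within)
```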

Real-Time Prediction in Clinical Trials: A Statistical History of REMATCH
Daniel F. Heitjan and Gui-shuang Ying
University of Pennsylvania
dheitjan@upenn.edu
Randomized clinical trials often include one or more planned interim analyses, during which an external monitoring committee reviews the accumulated data and determines whether it is scientifically and ethically appropriate for the study to continue. With survival-time endpoints, it is often desirable to schedule the interim analyses at the times of occurrence of specified landmark events, such as the 50th event, the 100th event, and so on. Because the



timing of such events is random and the interim analyses impose considerable logistical burdens, it is worthwhile to predict the event times as accurately as possible. Prediction methods available prior to 2001 used data only from previous trials, which are often of questionable relevance to the trial for which one wishes to make predictions. With modern data management systems, it is often feasible to use data from the trial itself to make these predictions, rendering them far more reliable. This talk will describe work that some colleagues and students and I have done in this area. I will set the methodologic development in the context of the trial that motivated our work: REMATCH, a randomized clinical trial of a heart assist device that ran from 1998 to 2001 and was considered one of the most rigorous and expensive device trials ever conducted.
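To see why trial-internal data help, consider the simplest possible real-time predictor: treat observed events as a homogeneous Poisson process, estimate the rate from the trial's own accumulating data, and extrapolate to the landmark. This toy stands in for the survival-model-based predictions the talk describes.

```python
def predict_landmark_time(event_times, landmark):
    """Predict the calendar time of the `landmark`-th event from the
    events observed so far, assuming a constant event rate."""
    event_times = sorted(event_times)
    n = len(event_times)
    if n >= landmark:
        return float(event_times[landmark - 1])   # already observed
    rate = n / event_times[-1]                    # events per unit time
    return event_times[-1] + (landmark - n) / rate

# 4 events by day 40 -> rate 0.1/day; the 8th event is predicted at day 80
pred = predict_landmark_time([10, 20, 30, 40], landmark=8)
```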

An Analysis of Microarray Data with Batch Effects
Dongseok Choi, William O. Cepurna, John C. Morrison, Elaine C. Johnson, Stephen R. Planck and James T. Rosenbaum
Oregon Health & Science University
choid@ohsu.edu
Normalization is considered an important step before any statistical analyses in microarray studies. Many methods have been proposed over the last decade or so: for example, global normalization, local regression based methods, and quantile normalization. Normalization methods typically remove systematic biases across arrays, and have been shown to be quite effective in removing them when arrays are processed simultaneously in a batch. It is, however, reported that they sometimes do not remove differences between batches when microarrays are split into several experiments over time. In this presentation, we will explore potential approaches that could adjust for batch effects by using traditional methods and methods developed as a secondary normalization.
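A secondary normalization for batch effects can be as simple as per-batch mean-centering of each gene (methods such as ComBat additionally pool information across genes and adjust scale); a minimal sketch for one gene:

```python
import numpy as np

def center_batches(expr, batch):
    """Remove per-batch location shifts for one gene, then restore the
    overall gene-level mean so values stay on the original scale."""
    expr = np.asarray(expr, dtype=float)
    batch = np.asarray(batch)
    out = expr.copy()
    for b in np.unique(batch):
        mask = batch == b
        out[mask] -= out[mask].mean()
    return out + expr.mean()

adjusted = center_batches([1, 3, 11, 13], [0, 0, 1, 1])  # -> [6, 8, 6, 8]
```

One caution with any location-only adjustment: if biological groups are confounded with batch, centering removes real signal along with the batch effect.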

Session 66 Analysis of Complex Data

Integrating Data from Heterogeneous Studies Using Only Summary Statistics: Efficiency and Robustness
Min-ge Xie
Rutgers University
mxie@stat.rutgers.edu
Heterogeneous studies arise often in applications, due to different study and sampling designs, populations, or outcomes. Sometimes these studies have common hypotheses or parameters of interest, and we can synthesize evidence from them to make inference for those common hypotheses or parameters. For heterogeneous studies, some of the parameters of interest may not be estimable in certain studies, and in such a case these studies are typically excluded in conventional methods. The exclusion of part of the studies can lead to a non-negligible loss of information. This paper introduces a data integration method for heterogeneous studies that combines the confidence distributions derived from the summary statistics of the individual studies. It includes all the studies in the analysis and makes use of all information, direct as well as indirect. Under a general likelihood inference framework, this new approach is shown to have several desirable properties, including: i) it is asymptotically as efficient as the maximum likelihood approach using individual participant data (IPD) from all studies; ii) unlike the IPD analysis, it suffices to use summary statistics to carry out our approach, and individual-level data are not required; and iii) it is robust against misspecification of the working covariance structure of the parameter estimates. All the properties of the proposed approach are further confirmed by data simulated from a randomized clinical trials setting, as well as by real data on aircraft landing performance. (Joint work with Dungang Liu and Regina Liu.)
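In the special case where each study's summary statistics induce a normal confidence distribution, combining CDs reduces to a precision-weighted average; the framework described above is far more general, but this sketch shows why summary statistics alone suffice.

```python
from math import sqrt

def combine_normal_cds(estimates, std_errors):
    """Combine K studies via normal confidence distributions
    N(est_k, se_k**2); the combined CD is again normal."""
    w = [1.0 / se ** 2 for se in std_errors]
    est = sum(wi * ei for wi, ei in zip(w, estimates)) / sum(w)
    return est, sqrt(1.0 / sum(w))

est, se = combine_normal_cds([1.2, 0.8], [0.2, 0.2])
ci95 = (est - 1.96 * se, est + 1.96 * se)  # interval read off the combined CD
```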

A Markov Modulated Poisson Model for Reliability Data
Joshua Landon1, Suleyman Ozekici2 and Refik Soyer1
1George Washington University; 2Koc University
jlandon@gwu.edu
In this presentation, we will consider a latent Markov process governing the intensity rate of a Poisson process model for failure data. The latent process enables us to infer the performance of the debugging operation over time and allows us to deal with the imperfect debugging scenario. We develop the Bayesian inference for the model and also introduce a method to infer the unknown dimension of the Markov process. We will illustrate the implementation of our model and the Bayesian approach using actual software failure data.

A Comparison of Two Approaches for Acute Leukemia Patient Classification
Jingjing Wu1, Guoqiang Chen2 and Zeny Feng3
1University of Calgary; 2Enbridge Pipelines; 3University of Guelph
jinwu@ucalgary.ca
The advancement of microarray technology has greatly facilitated research on gene expression based classification of patient samples. For example, in cancer research, microarray gene expression data have been used for cancer or tumor classification. When the study focuses on only two classes, for example two different cancer types, we propose a two-sample semiparametric model for the distributions of gene expression levels in the two classes. To estimate the parameters, we consider both the maximum semiparametric likelihood estimate (MLE) and the minimum Hellinger distance estimate (MHDE). For each gene, a Wald statistic is constructed based on either the MLE or the MHDE, and a significance test is then performed. We exploit the idea of a weighted sum of misclassification rates to develop a novel classification model in which only the previously identified significant genes are involved. To assess the usefulness of our proposed method, we consider a predictive approach: we apply the method to the acute leukemia data of Golub et al. (1999), in which a training set is used to build the classification model and a testing set is used to evaluate its accuracy.

On the Consistency and Covariate Selections for Varying-Coefficient Deming Regressions
Ying Lu1, Chong Gu2, Bo Fan3, Selwyn Au4, Valerie McGuire1 and John Shepherd3
1VA Palo Alto Health Care System & Stanford University; 2Purdue University; 3University of California at San Francisco; 4VA Palo Alto Health Care System
yinglu@va.gov
Although Deming regression (DR) has been successfully used to establish cross-calibration (CC) formulas for bone mineral densities (BMD) between manufacturers at several anatomic sites, it failed for CC of whole body BMD, because the relationship varies with the subject's weight, total fat, and lean mass. We proposed a new varying-coefficient DR (VCDR) that allows the intercept and slope to be non-linear functions of covariates, and applied this new model successfully to derive a consistent calibration formula for the new whole body BMD data. Our results showed this VCDR effectively



removed all systematic bias present in previous work. In this talk, we will discuss the consistency of the calibration formula and procedures for covariate selection.

Session 67 Statistical Issues in Co-development of Drug and Biomarker

Group Sequential Designs for Developing and Testing Biomarker-Guided Personalized Therapies in Comparative Effectiveness Research
Tze Leung Lai1, Olivia Yueh-Wen Liao2 and Dong Woo Kim3
1Stanford University; 2Onyx Pharmaceuticals; 3Microsoft Corporation
dwkim88@stanford.edu
Biomarker-guided personalized therapies offer great promise to improve drug development and patient care, but also pose difficult challenges in designing clinical trials for the development and validation of these therapies. We first give a review of the existing approaches, briefly for clinical trials in new drug development, and in more detail for comparative effectiveness trials involving approved treatments. We then introduce new group sequential designs to develop and test personalized treatment strategies involving approved treatments.

Adaptive Enrichment Designs for Clinical Trials
Noah Simon1 and Richard Simon2
1University of Washington; 2National Institutes of Health
nrsimon@uwashington.edu
Many difficult-to-treat diseases are actually a heterogeneous collection of similar syndromes with potentially different causal mechanisms. New molecules attack pathways that are dysregulated in only a subset of this collection, and so are expected to be effective for only a subset of patients with the disease. Often this subset is not well understood until well into large-scale clinical trials. As such, standard practice has been to enroll a broad range of patients and run post-hoc subset analyses to determine those who may particularly benefit. This unnecessarily exposes many patients to hazardous side effects and may vastly decrease the efficiency of the trial (especially if only a small subset benefits). In this talk I will discuss a class of adaptive enrichment designs, which allow the eligibility criteria of a trial to be adaptively updated during the trial, restricting entry to only patients likely to benefit from the new treatment. These designs control type I error and can substantially increase power. I will also discuss and illustrate strategies for effectively building and evaluating biomarkers in this framework.

An Adaptive Single-Arm Phase II Design with Co-primary Objectives to Evaluate Activity Overall and in Relation to a Biomarker-Defined Subgroup
Michael Wolf
Amgen Inc.
michaelwolf@amgen.com
Roberts (Clin Cancer Res 2011) presented a single-arm, 2-stage adaptive design to evaluate response overall and in one or more biomarker-defined subgroups, where biomarkers are only determined for responders. While this design has obvious practical advantages, the testing strategy proposed does not provide robust control of false-positive error. Modified futility and testing strategies based on marginal probabilities are proposed to achieve the same design objectives and are shown to be more robust; a trade-off is that biomarkers must be determined for all subjects. Clinical examples of design setup and analysis are illustrated with a fixed subgroup size that reflects its expected prevalence in the intended use population, based on a validated in vitro companion diagnostic. Design efficiency and external validity are compared to testing for a difference in complement biomarker subgroups. Possible generalizations of the design for a data-dependent subgroup size (e.g., biomarker value > sample median) and multiple subgroups are discussed.

Biomarker Threshold Estimation to Predict Clinical Benefit: What Can Reasonably be Learned During Early (PhIII) Oncology Development
Thomas Bengtsson
Genentech Inc.
thomasgb@gene.com
A key goal during early clinical co-development of a new therapeutic and a biomarker is to determine the "diagnostic positive group", i.e., to identify a sub-group of patients likely to receive a clinically meaningful treatment benefit. We show that, based on a typically sized Ph1/Ph2 study with fewer than 100 events, accurate biomarker threshold estimation with time-to-event data is not a realistic goal. Instead, we propose to hierarchically test for treatment effects in pre-determined patient subsets most likely to benefit clinically. We illustrate our method with data from a recent lung cancer trial.

Session 68 New Challenges for Statistical Analyst/Programmer

Similarities and Differences in Statistical Programming among CRO and Pharmaceutical Industries
Mark Matthews
inVentiv Health Clinical
mrkmtthws@yahoo.com
Statistical programming in the clinical environment offers a wide range of opportunities across the clinical drug development cycle. Whether you are employed by a Contract Research Organization, a Pharmaceutical or Biotechnology company, or as a contractor, the programming tasks are often quite similar, and at times the work cannot be differentiated by your employer. However, the higher level strategies and the direction any organization takes as an enterprise can be an important factor in the fulfillment of a statistical programmer's career. The author would like to share his experiences with the differences and similarities that a clinical statistical programmer can be offered in their career, and also provide some useful tips on how to best collaborate when working with your peer programmers from different industries.

Computational Aspects for Detecting Safety Signals in Clinical Trials
Jyoti Rayamajhi
Eli Lilly and Company
rayamajhi_jyoti@lilly.com
It is always a challenge to detect safety signals from adverse event (AE) data in clinical trials, which is a critical task in any drug development program. In any trial, it is very desirable to describe and understand the safety of the compound to the fullest possible extent. The MedDRA coding scheme, e.g., System Organ Class (SOC) and Preferred Term (PT), is used in safety analyses and is hierarchical in nature. Bayesian hierarchical models are used to predict posterior probabilities and to account for the fact that AEs in the same SOC are more likely to be similar, so they can sensibly borrow strength from each other.



The model also allows borrowing strength across SOCs, but does not impose it, depending on the actual data. It is interesting to see comparative analyses between the frequentist approach and an alternative Bayesian methodology for detecting safety signals in clinical trials. Computation for these hierarchical models is complex and challenging. Data from studies were used to fit three Bayesian logistic regression hierarchical models. Model selection is achieved using the Deviance Information Criterion (DIC). Models and plots were implemented using BRugs, R2WinBUGS, and JAGS. A scheme for meta-analysis with a hierarchical three-stage Bayesian mixture model is also implemented and will be discussed. A user-friendly and fully-functional web interface for safety signal detection, using Bayesian meta-analysis and a general three-stage hierarchical mixture model, will be described. Keywords: System Organ Class, Preferred Term, Deviance Information Criterion, hierarchical models, mixture model.

Bayesian Network Meta-Analysis Methods: An Overview and a Case Study
Baoguang Han1, Wei Zou2 and Karen Price1
1Eli Lilly and Company; 2inVentiv Health Clinical
han_baoguang@lilly.com
Evidence-based health-care decision making requires comparing all relevant competing interventions. In the absence of direct head-to-head comparisons of different treatments, network meta-analysis (NMA) is increasingly used for selecting the best treatment strategy for a health care intervention. The Bayesian approach offers a flexible framework for NMA, in part due to its ability to propagate the parameter correlation structure and provide straightforward probability statements around the parameters of interest. In this talk we will provide a general overview of Bayesian NMA models, including consistency models, network meta-regression, and inconsistency checking using node-splitting techniques. We will then illustrate how an NMA can be performed with a detailed case study, and provide some details on available software as well as the various graphical and textual outputs that can be readily understood and interpreted by clinicians.
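The simplest building block of a treatment network is the anchored (Bucher) indirect comparison; full Bayesian NMA generalizes this to whole networks under consistency assumptions. A sketch with made-up effect sizes:

```python
from math import sqrt

def bucher_indirect(d_ab, se_ab, d_cb, se_cb):
    """Indirect A-vs-C effect from A-vs-B and C-vs-B comparisons that
    share the common comparator B: effects subtract, variances add."""
    return d_ab - d_cb, sqrt(se_ab ** 2 + se_cb ** 2)

d_ac, se_ac = bucher_indirect(-0.5, 0.1, -0.2, 0.1)  # -> (-0.3, sqrt(0.02))
```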

Session 69 Adaptive and Sequential Methods for Clinical Trials

Bayesian Data Augmentation Dose Finding with Continual Reassessment Method and Delayed Toxicities
Ying Yuan1, Suyu Liu1 and Guosheng Yin2

1University of Texas MD Anderson Cancer Center
2University of Hong Kong
yyuan@mdanderson.org
A major practical impediment when implementing adaptive dose-finding designs is that the toxicity outcome used by the decision rules may not be observed shortly after the initiation of treatment. To address this issue, we propose the data augmentation continual reassessment method (DA-CRM) for dose finding. By naturally treating the unobserved toxicities as missing data, we show that such missing data are nonignorable, in the sense that the missingness depends on the unobserved outcomes. The Bayesian data augmentation approach is used to sample both the missing data and the model parameters from their posterior full conditional distributions. We evaluate the performance of the DA-CRM through extensive simulation studies and also compare it with other existing methods. The results show that the proposed design satisfactorily resolves the issues related to late-onset toxicities and possesses desirable operating characteristics: treating patients more safely and selecting the maximum tolerated dose with a higher probability.

Optimal Marker-Strategy Clinical Trial Design to Detect Predictive Markers for Targeted Therapy
Yong Zang, Suyu Liu and Ying Yuan
University of Texas MD Anderson Cancer Center
yzang1@mdanderson.org
In developing targeted therapy, the marker-strategy design provides an important approach to evaluate the predictive marker effect. This design first randomizes patients into non-marker-based or marker-based strategies. Patients allocated to the non-marker-based strategy are then further randomized to receive either the standard or the targeted treatment, while patients allocated to the marker-based strategy receive treatment based on their marker status. The predictive marker effect is tested by comparing the treatment outcome between the two strategies. In this talk, we show that such a between-strategy comparison has low power to detect the predictive effect and is valid only under the restrictive condition that the randomization ratio within the non-marker-based strategy matches the marker prevalence. To address these issues, we propose a Wald test that is generally valid and also uniformly more powerful than the between-strategy comparison. Based on that, we derive an optimal marker-strategy design that maximizes the power to detect the predictive marker effect by choosing the optimal randomization ratios between the two strategies and treatments. Our numerical study shows that using the proposed optimal designs can substantially improve the power of the marker-strategy design to detect the predictive marker effect.

Dynamic Prediction of Time to Relapse Using Longitudinal Biomarker Data
Xuelin Huang1, Jing Ning1 and Sangbum Choi2
1University of Texas MD Anderson Cancer Center
2University of Texas Health Science Center at Houston
xlhuang@mdanderson.org
As time goes by, more and more data are observed for each patient. Dynamic prediction keeps making updated predictions of disease prognosis using all the available information. This work is motivated by the need for real-time monitoring of the disease progress of chronic myeloid leukemia patients using their BCR-ABL gene expression levels measured during follow-up visits. We provide real-time dynamic prediction of future prognosis using a series of marginal Cox proportional hazards models over continuous time with constraints. Compared with separate landmark analyses at different discrete time points after treatment, our approach achieves smoother and more robust predictions. Compared with joint modeling of longitudinal biomarkers and survival, our approach does not need to specify a model for the changes of the monitoring biomarkers, and thus avoids any kind of imputation of the biomarker values at time points where they are not available. This helps eliminate the potential bias introduced by misspecified models for longitudinal biomarkers.

Continuous Tumor Size Change Percentage and Progression-Free Survival as Endpoints of the First and Second Stages, Respectively, in a Novel Double Screening Phase II Design
Ye Cui1, Zhibo Wang1, Yichuan Zhao1 and Zhengjia Chen2

1Georgia State University
2Emory University
cathysaiyo@gmail.com

2014 ICSA-KISS Applied Statistics Symposium Portland June 15-18 87

Abstracts

A phase II trial is an expedited and low-cost trial to screen potentially effective agents for a following phase III trial. Unfortunately, the positive rate of phase III trials is still low even though the agents have been determined to be effective in preceding phase II trials, mainly because different endpoints are used in phase II (tumor response) and phase III (survival) trials. Good disease response often leads to, but cannot guarantee, better survival. From a statistical standpoint, transforming a continuous tumor size change into a categorical tumor response (complete response, partial response, stable disease, or progressive disease) according to the World Health Organization (WHO) or Response Evaluation Criteria In Solid Tumors (RECIST) criteria results in a loss of study power. Tumor size change can be obtained rapidly, but survival estimation requires a long follow-up. We propose a novel double screening phase II design in which the tumor size change percentage is used in the first stage to rapidly select potentially effective agents for the second stage, in which progression-free or overall survival is estimated to confirm the efficacy of the agents. The first screening can fully utilize all tumor size change data and minimize the cost and length of the trial by stopping it when agents are determined to be ineffective based on a low standard, and the second screening can substantially increase the success rate of the following phase III trial by using similar or the same outcomes and a high standard. Simulation studies are performed to optimize the significance levels of the two screening stages in the design and to compare its operating characteristics with Simon's two-stage design. ROC analysis is applied to estimate the success rate in the follow-up phase III trials.

Session 70 Survival Analysis

Comparison of Hazard Rate and Odds Ratio in the Two-Sample Survival Problem
Benedict Dormitorio and Joshua Naranjo
Western Michigan University
benedictpdormitorio@wmich.edu
Cox proportional hazards seems to be the standard statistical method for analyzing treatment efficacy when time-to-event data are available. In the absence of time-to-event data, investigators may use logistic regression, which does not require time-to-event, or Poisson regression, which requires only interval-summarized frequency tables of time-to-event. We investigate the relative performance of the three methods. In particular, we compare the power of tests based on the respective effect-size estimates: (1) hazard ratio, (2) odds ratio, and (3) rate ratio. We use a variety of survival distributions and cut-off points representing length of study. The results have implications for study design. For example, under what conditions might we recommend a simpler design based only on event frequencies instead of measuring time-to-event, and what length of study is recommended?
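To make the three effect sizes concrete, here is a small sketch computing them from summary data of a hypothetical two-arm study (all counts invented). Note that under an exponential survival model the estimated hazard ratio reduces to the events-per-person-time rate ratio, which is part of why the power comparison is interesting:

```python
# hypothetical two-arm summary: events, sample size, total person-years
events_t, n_t, ptime_t = 30, 100, 800.0   # treatment arm
events_c, n_c, ptime_c = 45, 100, 700.0   # control arm

# (1) hazard ratio under an exponential model (lambda_hat = events / person-time)
hr = (events_t / ptime_t) / (events_c / ptime_c)

# (2) odds ratio from end-of-study event frequencies (no timing information needed)
odds_ratio = (events_t / (n_t - events_t)) / (events_c / (n_c - events_c))

# (3) rate ratio from interval-summarized person-time (Poisson regression analogue);
# identical to hr here because the exponential hazard estimate is the event rate
rate_ratio = (events_t / ptime_t) / (events_c / ptime_c)

print(round(hr, 3), round(odds_ratio, 3), round(rate_ratio, 3))
```

With non-exponential survival or staggered follow-up the three estimates diverge, which is the setting the simulation study above explores.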

Predicting the Event Time in Multicenter Clinical Trials with Time-to-Event Outcome as Primary Endpoint
Nibedita Bandyopadhyay
Janssen Research & Development
nbandyop@its.jnj.com
Interim analyses are widely used in Phase II and III clinical trials. The efficiency of the drug development process can be improved using interim analyses. In clinical trials with time to an event as the primary endpoint, it is common to plan the interim analyses at pre-specified numbers of events. Performing these analyses at times with a different number of events than planned may impact the trial's credibility as well as the statistical properties of the interim analysis. On the other hand, significant resources are required to conduct such analyses. Therefore, for logistical planning purposes, it is very important to predict the timing of this pre-specified number of events early and accurately. A statistical technique for making such predictions in ongoing multicenter clinical trials is developed. Results are illustrated for different scenarios using simulations.

Empirical Comparison of Small Sample Performance for the Logrank Test and Resampling Methods with High Censoring Rates
Yu Deng and Jianwen Cai
University of North Carolina at Chapel Hill
yudeng@live.unc.edu
The logrank test is commonly used for comparing survival distributions between treatment and control groups. When the censoring rate is low and the sample size is moderate, the approximation based on the asymptotic normal distribution of the logrank test works well in finite samples. However, in some studies the sample size is small (e.g., 10 or 20 per group) and the censoring rate is high (e.g., 0.8 or 0.9). Under such conditions, we conduct a series of simulations to compare the performance of the logrank test based on normal approximation, permutation, and the bootstrap. In general, the type I error rate based on the bootstrap test is slightly inflated when the number of failures is larger than 2, while the logrank test based on normal approximation has a type I error around 0.05 and the permutation test is relatively conservative in type I error. However, when there is only one failure per group, the type I error of the permutation test is closer to 0.05 than that of the other two tests.
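A permutation version of the logrank test like the one studied here can be sketched in a few lines. This is a generic illustration (not the authors' code), using the standard hypergeometric variance for the logrank statistic:

```python
import random

def logrank_z(times, events, groups):
    """Standardized two-sample logrank statistic (groups coded 0/1)."""
    data = sorted(zip(times, events, groups))
    n0 = sum(1 for g in groups if g == 0)
    n1 = len(groups) - n0
    o_minus_e, var = 0.0, 0.0
    i = 0
    while i < len(data):
        t = data[i][0]
        d1 = sum(e for (ti, e, g) in data if ti == t and g == 1)  # deaths, group 1
        d = sum(e for (ti, e, g) in data if ti == t)              # deaths total
        n = n0 + n1                                               # at risk before t
        if d > 0 and n > 1:
            o_minus_e += d1 - d * n1 / n
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
        while i < len(data) and data[i][0] == t:  # drop subjects leaving at t
            if data[i][2] == 0:
                n0 -= 1
            else:
                n1 -= 1
            i += 1
    return o_minus_e / var ** 0.5 if var > 0 else 0.0

def permutation_pvalue(times, events, groups, n_perm=1000, seed=7):
    """Two-sided p-value obtained by permuting the group labels."""
    random.seed(seed)
    observed = abs(logrank_z(times, events, groups))
    labels = list(groups)
    hits = 0
    for _ in range(n_perm):
        random.shuffle(labels)
        if abs(logrank_z(times, events, labels)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

With 10-20 subjects per group the full permutation distribution is coarse, which is consistent with the conservativeness reported above.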

Session 71 Complex Data Analysis Theory and Application

Supervised Singular Value Decomposition and Its Asymptotic Properties
Gen Li1, Dan Yang2, Haipeng Shen1 and Andrew Nobel1
1University of North Carolina at Chapel Hill
2Rutgers University
haipeng@email.unc.edu
We develop a supervised singular value decomposition (SupSVD) model for supervised dimension reduction. The research is motivated by applications where the low-rank structure of the data of interest is potentially driven by additional variables measured on the same set of samples. The SupSVD model can make use of the information in the additional data to accurately extract underlying structures that are more interpretable. The model is very general and includes the principal component analysis model and the reduced-rank regression model as two extreme cases. We formulate the model in a hierarchical fashion using latent variables and develop a modified expectation-maximization algorithm for parameter estimation, which is computationally efficient. The asymptotic properties of the estimated parameters are derived. We use comprehensive simulations and two real data examples to illustrate the advantages of the SupSVD model.

New Methods for Interaction Selection
Ning Hao1, Hao Helen Zhang1 and Yang Feng2

1University of Arizona
2Columbia University
nhao@math.arizona.edu
It is a challenging task to identify interaction effects for high-dimensional data. The main difficulties lie in both computational and theoretical aspects. We propose a new framework for interaction selection. Efficient computational algorithms based on both forward selection and penalization approaches are illustrated.

A Statistical Approach to Set Classification by Feature Selection with Applications to Classification of Histopathology Images
Sungkyu Jung1 and Xingye Qiao2

1University of Pittsburgh
2Binghamton University, State University of New York
qiao@math.binghamton.edu
Set classification problems arise when classification tasks are based on sets of observations, as opposed to individual observations. In set classification, a classification rule is trained with N sets of observations, where each set is labeled with class information, and the prediction of a class label is also performed with a set of observations. Data sets for set classification appear, for example, in diagnosis of disease based on multiple cell nucleus images from a single tissue. Relevant statistical models for set classification are introduced, which motivate a set classification framework based on context-free feature extraction. By understanding a set of observations as an empirical distribution, we employ a data-driven method to choose those features which contain information on location and major variation. In particular, the method of principal component analysis is used to extract the features of major variation. Multidimensional scaling is used to represent features as vector-valued points on which conventional classifiers can be applied. The proposed set classification approaches achieve better classification results than competing methods in a number of simulated data examples. The benefits of our method are demonstrated in an analysis of histopathology images of cell nuclei related to liver cancer.

A Smoothing Spline Model for Analyzing dMRI Data of Swallowing
Binhuan Wang, Ryan Branski, Milan Amin and Yixin Fang
New York University
yixinfang@nyumc.org
Swallowing disorders are common and have a significant health impact. Dynamic magnetic resonance imaging (dMRI) is a novel technique for visualizing the pharynx and upper esophageal segment during a swallowing process. We develop a smoothing spline method for analyzing swallowing dMRI data. We apply the method to a dataset obtained from an experiment conducted at the NYU Voice Center.

Session 72 Recent Developments in Statistical Methods for Missing Data

Semiparametric Inference for Regression Analysis with Missing Covariates in Survey Data
Shu Yang and Jae-kwang Kim
Iowa State University
jkim@iastate.edu
We consider parameter estimation in parametric regression models with covariates missing at random in survey data. A semiparametric maximum likelihood approach is proposed which requires no parametric specification of the marginal covariate distribution. We obtain an asymptotic linear representation of the semiparametric maximum likelihood estimator (SMLE) using the theory of von Mises calculus and V-statistics, which allows a consistent estimator of the asymptotic variance. An EM-type algorithm for computation is discussed. We extend the methodology to general parameter estimation, which is not necessarily equal to the MLE. Simulation results suggest that the SMLE method is robust, whereas the parametric maximum likelihood method is subject to severe bias under model misspecification.

Multiple Robustness in Missing Data Analysis
Peisong Han1 and Lu Wang2
1University of Waterloo
2University of Michigan
peisonghan@uwaterloo.ca
We propose an estimator that is more robust than doubly robust estimators, by weighting the complete cases using weights other than the inverse probability, when estimating the population mean of a response variable subject to ignorable missingness. We allow multiple models for both the propensity score and the outcome regression. Our estimator is consistent if any one of the multiple models is correctly specified. Such multiple robustness against model misspecification significantly improves over double robustness, which only allows one propensity score model and one outcome regression model. Our estimator attains the semiparametric efficiency bound when one propensity score model and one outcome regression model are correctly specified, without requiring knowledge of exactly which two are correct.
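For reference, the doubly robust estimator that this proposal generalizes is the standard augmented inverse-probability-weighted (AIPW) mean (notation ours, with $R$ the response indicator, $\hat{\pi}$ the fitted propensity score, and $\hat{m}$ the fitted outcome regression):

```latex
\hat{\mu}_{\mathrm{DR}}
  = \frac{1}{n}\sum_{i=1}^{n}\left[
      \frac{R_i Y_i}{\hat{\pi}(X_i)}
      - \frac{R_i - \hat{\pi}(X_i)}{\hat{\pi}(X_i)}\,\hat{m}(X_i)
    \right]
```

This estimator is consistent if either the propensity model or the outcome model is correct; the multiply robust construction extends the protection to whole lists of candidate models for each.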

Imputation of Binary Variables with SAS and IVEware
Yi Pan1 and Riguang Song1

1United States Centers for Disease Control and Prevention
jnu5@cdc.gov
In practice, it is a challenge to impute missing values of binary variables. For a monotone missing pattern, imputation methods available in SAS include the LOGISTIC method, which uses logistic regression modeling, and the DISCRIM method, which only allows continuous variables in the imputation model. For an arbitrary missing pattern, a fully conditional specification (FCS) method is now available in SAS. This method only assumes the existence of a joint distribution for all variables. On the other hand, IVEware, developed by the University of Michigan Survey Research Center, uses a sequence of regression models and imputes missing values by drawing samples from posterior predictive distributions. We present results from a series of simulations designed to evaluate and compare the performance of the above-mentioned imputation methods. An example imputing the BED recency status (recent or long-standing) in estimating HIV incidence is used to illustrate the application of those procedures.

Marginal Treatment Effect Estimation Using Pattern-Mixture Model
Zhenzhen Xu
United States Food and Drug Administration
zhenzhenxu@fda.hhs.gov
Missing data often occur in clinical trials. When the missingness depends on unobserved responses, the pattern-mixture model is frequently used. This model stratifies the data according to drop-out patterns and formulates a model for each pattern with specific parameters. The resulting marginal distribution of the response is a mixture of distributions over the missing data patterns. If the eventual interest is to estimate the overall treatment effect, one can calculate a weighted average of pattern-specific treatment effects, assuming that the treatment assignment is equally distributed across patterns. However, in practice this assumption is unlikely to hold. As a result, the weighted average approach is subject to bias. In this talk, we introduce a new approach to estimate the marginal treatment effect based on a random-effects pattern-mixture model for longitudinal studies with a continuous endpoint, relaxing the homogeneous distributional assumption on treatment assignment across missing data patterns. A simulation study shows that under a missing not at random mechanism, the proposed approach can yield a substantial reduction in estimation bias and improvement in coverage probability compared to the weighted average approach. The proposed method is also compared with the linear mixed model and generalized estimating equation approaches under various missing data mechanisms.

Session 73 Machine Learning Methods for Causal Inference in Health Studies

Causal Inference of Interaction Effects with Inverse Propensity Weighting, G-Computation, and Tree-Based Standardization
Joseph Kang1, Xiaogang Su2, Lei Liu1 and Martha Daviglus3
1Northwestern University
2University of Texas at El Paso
3University of Illinois at Chicago
joseph-kang@northwestern.edu

Given the recent interest in subgroup-level studies and personalized medicine, health research with observational studies has been developed for interaction effects of measured confounders. In estimating interaction effects, the inverse propensity weighting (IPW) method has been widely advocated despite the immediate availability of other competing methods, such as G-computation estimates. This talk compares the advocated IPW method, the G-computation method, and our new tree-based standardization method, which we call the Interaction effect Tree (IT). The IT procedure uses a likelihood-based decision rule to divide the subjects into homogeneous subgroups, where G-computation can be applied. Our simulation studies indicate that the IT-based method, along with G-computation, works robustly, while the advocated IPW method needs some caution in its weighting. We applied the IT-based method to assess the effect of being overweight or obese on coronary artery calcification (CAC) in the Chicago Healthy Aging Study cohort.

Practice of Causal Inference with the Propensity of Being Zero or One
Joseph Kang1, Wendy Chan1, Mi-Ok Kim2 and Peter M. Steiner3
1Northwestern University
2University of Cincinnati / Cincinnati Children's Hospital Medical Center
3University of Wisconsin-Madison
wendychan2016@u.northwestern.edu

Causal inference methodologies have been developed over the past decade to estimate the unconfounded effect of an exposure under several key assumptions. These assumptions include the absence of unmeasured confounders, the independence of the effect of one study subject from another, and propensity scores being bounded away from zero and one (the positivity assumption). The first two assumptions have received much attention in the literature. Yet the positivity assumption has been discussed in only a few recent papers. Propensity scores of zero or one are indicative of deterministic exposure, so that causal effects cannot be defined for these subjects. Therefore, these subjects need to be removed, because no comparable comparison groups can be found for them. In this paper, we evaluate and compare currently available causal inference methods in the context of the positivity assumption. We propose a tree-based method that can be easily implemented in R software. R code for the studies is available online.
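The removal of subjects with (near-)deterministic exposure described above is often operationalized by trimming estimated propensity scores near 0 or 1. A minimal sketch; the 0.01 cutoff is an illustrative choice, not taken from the abstract:

```python
def trim_positivity(scores, eps=0.01):
    """Keep subjects whose estimated propensity lies in [eps, 1 - eps];
    subjects outside that range are treated as deterministic exposures."""
    kept = [i for i, p in enumerate(scores) if eps <= p <= 1 - eps]
    dropped = [i for i, p in enumerate(scores) if not (eps <= p <= 1 - eps)]
    return kept, dropped

scores = [0.0, 0.004, 0.35, 0.62, 0.97, 1.0]
kept, dropped = trim_positivity(scores)
print(kept, dropped)  # [2, 3, 4] [0, 1, 5]
```

Downstream estimators (IPW, matching, G-computation) are then applied to the kept subset only, changing the estimand to the trimmed population.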

Propensity Score and Proximity Matching Using Random Forest
Peng Zhao1, Xiaogang Su2 and Juanjuan Fan1

1San Diego State University
2University of Texas at El Paso
jjfan@mail.sdsu.edu
To reduce potential bias in observational studies, it is essential to have balanced distributions of all available background information between cases and controls. The propensity score has been a key matching variable in this area. However, this approach has several limitations, including difficulties in handling missing values, categorical variables, and interactions. Random forest, as an ensemble of many classification trees, is straightforward to use and can easily overcome those issues. Each classification tree in a random forest recursively partitions the available dataset into subsets to increase the purity of the terminal nodes. With this process, the cases and controls in the same terminal node automatically become the best-balanced match. By averaging the outcome of each individual tree, random forest can provide robust and balanced matching results. The proposed method is applied to data from the National Health and Nutrition Examination Survey (NHANES).
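The terminal-node matching idea can be made concrete with the usual random forest proximity: the fraction of trees in which two subjects fall in the same leaf. Below, the per-tree leaf assignments are hard-coded toy values; in practice they would come from a fitted forest (e.g. scikit-learn's `RandomForestClassifier.apply`):

```python
# leaf index of each of 5 subjects in each of 3 trees (toy values)
leaf_ids = [
    [0, 0, 1, 1, 2],  # tree 1
    [0, 1, 1, 0, 1],  # tree 2
    [2, 2, 0, 1, 0],  # tree 3
]

def proximity(i, j, leaf_ids):
    """Fraction of trees in which subjects i and j share a terminal node."""
    return sum(tree[i] == tree[j] for tree in leaf_ids) / len(leaf_ids)

def best_match(case, controls, leaf_ids):
    """Match a case to the control with the highest forest proximity."""
    return max(controls, key=lambda j: proximity(case, j, leaf_ids))

print(proximity(0, 1, leaf_ids))           # subjects 0 and 1 share 2 of 3 leaves
print(best_match(0, [1, 3, 4], leaf_ids))
```

Because trees split on raw covariates, missing-value surrogates, categories, and interactions are handled by the partitioning itself rather than by a parametric propensity model.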

Session 74 JP Hsu Memorial Session

Weighted Least-Squares Method for Right-Censored Data in Accelerated Failure Time Model
Lili Yu
Georgia Southern University
lyu@georgiasouthern.edu
The classical accelerated failure time (AFT) model has been extensively investigated due to its direct interpretation of covariate effects on the mean survival time in survival analysis. However, this classical AFT model and its associated methodologies are built on the fundamental assumption of data homoscedasticity. Consequently, when the homoscedasticity assumption is violated, as is often seen in real applications, the estimators lose efficiency and the associated inference is not reliable. Furthermore, none of the existing methods can estimate the intercept consistently. To overcome these drawbacks, we propose a semiparametric approach in this paper for both homoscedastic and heteroscedastic data. This approach utilizes a weighted least-squares equation with synthetic observations weighted by the square root of their variances, where the variances are estimated via local polynomial regression. We establish the limiting distributions of the resulting coefficient estimators and prove that both the slope parameters and the intercept can be consistently estimated. We evaluate the finite sample performance of the proposed approach through simulation studies, and demonstrate its superiority in efficiency and reliability over the existing methods through a real example with heteroscedastic data.

A Comparison of Size and Power of Tests of Hypotheses on Parameters Based on Two Generalized Lindley Distributions
Macaulay Okwuokenye
Biogen Idec
macaulayokwuokenye@biogenidec.com
Data (complete and censored) following the Lindley distribution are generated and analyzed using two generalized Lindley distributions, and maximum likelihood estimates of parameters from the generalized Lindley distributions are obtained. Size and power of tests of hypotheses on the parameters are assessed, drawing on asymptotic properties of the maximum likelihood estimators. Results suggest that whereas the sizes of some of the tests of hypotheses based on the considered generalized distributions are essentially alpha-level, some are possibly not; the power of tests of hypotheses on the Lindley distribution parameter differs between the two distributions.

Session 75 Challenges and New Developments in Model Fitting and Selection

Robust Estimates of Divergence Times and Selection with a Poisson Random Field Model
Amei Amei1 and Brian Tilston Smith2

1University of Nevada at Las Vegas
2American Museum of Natural History
amei.amei@unlv.edu
Mutation frequencies can be modeled as a Poisson random field (PRF) to estimate speciation times and the degree of selection on newly arisen mutations. This approach provides a quantitative theory for comparing intraspecific polymorphism with interspecific divergence in the presence of selection, and can be used to estimate population genetic parameters. First, we modified a recently developed time-dependent PRF model to independently estimate genetic parameters from a nuclear and mitochondrial DNA data set of 22 sister pairs of birds that have diverged across a biogeographic barrier. We found that species that inhabit humid habitats had more recent divergence times, larger effective population sizes, and smaller selective effects than those that inhabit drier habitats, but overall the mitochondrial DNA was under weak selection. Our study indicates that PRF models are useful for estimating various population genetic parameters and serve as a framework for incorporating estimates of selection into comparative phylogeographic studies. Second, due to the built-in feature of the species divergence time, the time-dependent PRF model is especially suitable for estimating selective effects of more recent mutations, such as the mutations that have occurred in the human genome. By analyzing the estimated distribution of the selective coefficients at each individual gene, for example the sign and magnitude of the mean selection coefficient, we will be able to detect a gene or a group of genes that are related to the diagnosed cancer. Moreover, the estimate of the species divergence time will provide useful information regarding the occurrence time of the cancer.

On a Class of Maximum Empirical Likelihood Estimators Defined by Convex Functions
Hanxiang Peng and Fei Tan
Indiana University-Purdue University Indianapolis
ftan@math.iupui.edu
In this talk, we introduce a class of estimators defined by convex criterion functions and show that they are maximum empirical likelihood estimators (MELEs). We apply the results to obtain MELEs for quantiles, quantile regression, and Cox regression when additional information is available. We report some simulation results and real data applications.

Properties of the Marginal Survival Functions for Dependent Censored Data under an Assumed Archimedean Copula
Antai Wang
New Jersey Institute of Technology
aw224@njit.edu
Given dependent censored data (X, δ) = (min(T, C), I(T < C)) from an Archimedean copula model, we give general formulas for the possible marginal survival functions of T and C. Based on our formulas, we can easily establish the relationship between all these survival functions and derive some useful identifiability results. Also based on our formulas, we propose a new estimator of the marginal survival function when the Archimedean copula model is assumed to be known. We derive bias formulas for our estimator and other existing estimators. Simulation studies have shown that our estimator is comparable with the copula-graphic estimator proposed by Zheng and Klein (1995) and Rivest and Wells (2001), and with Zheng and Klein's estimator (1994), under the Archimedean copula assumption. We end our talk with some discussions.
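For context (notation ours, not fixed by the abstract), an Archimedean copula model for the dependent censoring pair $(T, C)$ with generator $\varphi$ specifies the joint survival function as

```latex
P(T > x,\; C > y)
  = \varphi^{-1}\!\bigl[\varphi\bigl(S_T(x)\bigr) + \varphi\bigl(S_C(y)\bigr)\bigr]
```

where $S_T$ and $S_C$ are the marginal survival functions and $\varphi$ is a decreasing convex generator with $\varphi(1) = 0$ (e.g. $\varphi(t) = t^{-\theta} - 1$ for the Clayton family). The copula-graphic estimators cited above invert this relationship to recover $S_T$ from the observable $(X, \delta)$.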

Dual Model Misspecification in Generalized Linear Models with Error in Variables
Xianzheng Huang
University of South Carolina
huang@stat.sc.edu
We study maximum likelihood estimation of regression parameters in generalized linear models for a binary response with error-prone covariates, when the distribution of the error-prone covariate or the link function is misspecified. We revisit the remeasurement method proposed by Huang, Stefanski, and Davidian (2006) for detecting latent-variable model misspecification, and examine its operating characteristics in the presence of link misspecification. Furthermore, we propose a new diagnostic method for assessing assumptions on the link function. Combining these two methods yields informative diagnostic procedures that can identify which model assumption is violated and also reveal the direction in which the true latent-variable distribution or the true link function deviates from the assumed one.

Session 76 Advanced Methods and Their Applications inSurvival Analysis

Kernel Smoothed Profile Likelihood Estimation in the Accelerated Failure Time Frailty Model for Clustered Survival Data
Bo Liu1, Wenbin Lu1 and Jiajia Zhang2

1North Carolina State University
2University of South Carolina
jzhang@mailbox.sc.edu
Clustered survival data frequently arise in biomedical applications, where event times of interest are clustered into groups such as families. In this article, we consider an accelerated failure time frailty model for clustered survival data and develop nonparametric maximum likelihood estimation for it via a kernel-smoother-aided EM algorithm. We show that the proposed estimator for the regression coefficients is consistent, asymptotically normal, and semiparametrically efficient when the kernel bandwidth is properly chosen. An EM-aided numerical differentiation method is derived for estimating its variance. Simulation studies evaluate the finite sample performance of the estimator, and it is applied to the Diabetic Retinopathy data set.

Model-free Screening for Lifetime Data Analysis with Ultrahigh-dimensional Biomarkers: Survival Impacting Index
Jialiang Li1, Qi Zheng2 and Limin Peng2

1National University of Singapore
2Emory University
qizheng@emory.edu
Marginal regression-based ranking methods are widely adopted to screen ultrahigh-dimensional biomarkers in biomedical studies. An assumed regression model may not fit the real data in practice. We consider a model-free screening approach specifically for censored lifetime data outcomes, measuring the average survival differences with and without the covariates. The proposed survival impacting index can be implemented with familiar nonparametric estimation procedures and avoids imposing any rigid model assumptions. We establish the sure screening property of the index and the asymptotic distribution of the estimated index to facilitate inferences. Simulations are carried out to assess the performance of our method. A lung cancer data set is analyzed as an illustration.

Analysis of Event History Data in Tuberculosis (TB) Screening
Joan Hu
Simon Fraser University
joanh@stat.sfu.ca
Tuberculosis (TB) is an infectious disease spread by the airborne route. An important public health intervention in TB prevention is tracing individuals (TB contacts) who may be at risk of having TB infection or active TB disease as a result of having shared air space with an active TB case. This talk presents an analysis of the data collected from 7,921 people identified as contacts in the TB registry of British Columbia, Canada, in an attempt to identify risk factors for TB development among TB contacts. Challenges encountered in the analysis include clustered subjects, covariates missing not at random (MNAR or NMAR), and a portion of subjects who potentially will never experience the event of TB.

On the Dependence Structure of Bivariate Recurrent Event Processes: Inference and Estimation
Jing Ning1, Yong Chen2, Chunyan Cai2, Xuelin Huang1 and Mei-Cheng Wang3

1University of Texas MD Anderson Cancer Center
2University of Texas Health Science Center at Houston
3Johns Hopkins University
jning@mdanderson.org
Bivariate or multivariate recurrent event processes are often encountered in longitudinal studies in which more than one type of event is of interest. There has been much research on regression analysis for such data, but little has been done to address the problem of how to measure dependence between two types of recurrent event processes. We propose a time-dependent measure, termed the rate ratio, to assess the local dependence between two types of recurrent event processes. We model the rate ratio as a parametric regression function of time and leave all other aspects of the distribution of the bivariate recurrent event processes unspecified. We develop a composite-likelihood procedure for model fitting and parameter estimation. We show that the proposed composite-likelihood estimator possesses consistency and asymptotic normality properties. The finite sample performance of the proposed method is evaluated through simulation studies and illustrated by an application to data from a soft tissue sarcoma study.

Session 77: High Dimensional Variable Selection and Multiple Testing

On Procedures Controlling the False Discovery Rate for Testing Hierarchically Ordered Hypotheses
Gavin Lynch and Wenge Guo
New Jersey Institute of Technology
wenge.guo@njit.edu
Complex large-scale studies, such as those related to microarrays and quantitative trait loci, often involve testing multiple hierarchically ordered hypotheses. However, most existing false discovery rate (FDR) controlling procedures do not exploit the inherent hierarchical structure among the tested hypotheses. In this talk, I present key developments toward controlling the FDR when testing hierarchically ordered hypotheses. First, I offer a general framework under which hierarchical testing procedures can be developed. Then, I present hierarchical testing procedures that control the FDR under various forms of dependence. Simulation studies show that these proposed methods can be more powerful than alternative methods.
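The hierarchical procedures themselves are not spelled out in the abstract. For background only, here is the classical Benjamini-Hochberg step-up procedure that FDR-controlling methods of this kind extend (standard BH, not the authors' hierarchical procedure):

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Standard BH step-up procedure; returns a boolean rejection mask."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])  # largest i with p_(i) <= i*alpha/m
        reject[order[: k + 1]] = True
    return reject

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.32, 0.9]
print(benjamini_hochberg(pvals, alpha=0.05))
```

BH treats the hypotheses as exchangeable; the talk's contribution is precisely to replace this flat ordering with the tested hypotheses' hierarchical structure.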

Sufficient Dimension Reduction in Binary Classification
Seung Jun Shin1, Yichao Wu2, Hao Helen Zhang3 and Yufeng Liu4
1University of Texas MD Anderson Cancer Center
2North Carolina State University
3University of Arizona
4University of North Carolina at Chapel Hill
wu@stat.ncsu.edu
Reducing the dimensionality of the data is essential for binary classification with high-dimensional covariates. In the context of sufficient dimension reduction (SDR), most, if not all, existing SDR methods suffer in binary classification. In this talk, we target the SDR problem directly for binary classification and propose a new method based on support vector machines. The new method is supported by both numerical evidence and theoretical justification.
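The authors' estimator is only summarized above. As an illustrative toy (an assumed single-index model and a plain subgradient-descent linear SVM, not the proposed method), one can see the idea that a linear SVM's coefficient vector recovers a sufficient direction in binary classification:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 4000, 5
beta = np.array([1.0, -2.0, 0.0, 0.0, 0.0])  # true sufficient direction
X = rng.standard_normal((n, p))
y = np.where(X @ beta + 0.5 * rng.standard_normal(n) > 0, 1.0, -1.0)

# Linear SVM (hinge loss + L2 penalty) fit by plain subgradient descent.
w, lam, lr = np.zeros(p), 1e-3, 0.1
for _ in range(500):
    margin = y * (X @ w)
    grad = lam * w - (X * y[:, None])[margin < 1].sum(axis=0) / n
    w -= lr * grad

# Alignment of the fitted normal vector with the true direction.
cosine = abs(w @ beta) / (np.linalg.norm(w) * np.linalg.norm(beta))
print(round(cosine, 3))
```

Under Gaussian covariates the hinge-loss minimizer points along beta, so the cosine is close to 1; the talk's method builds an SDR estimator out of this kind of SVM fit.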

Rate Optimal Multiple Testing Procedure (ROMP) in High-Dimensional Regression
Zhigen Zhao1 and Pengsheng Ji2
1Temple University
2University of Georgia
psji@uga.edu
The variable selection and multiple testing problems for regression have almost the same goal: identifying the important variables among many. Research has focused on selection consistency, which is possible only if the signals are sufficiently strong. On the contrary, the signals in more modern applications are usually rare and weak. In this paper, we develop a two-stage testing procedure, named ROMP, short for Rate Optimal Multiple testing Procedure, because it achieves the fastest convergence rate of the marginal false non-discovery rate (mFNR) while controlling the marginal false discovery rate (mFDR) at any designated level alpha asymptotically.

Pathwise Calibrated Active Shooting Algorithm with Application to Semiparametric Graph Estimation
Tuo Zhao1 and Han Liu2
1Johns Hopkins University
2Princeton University
hanliu@princeton.edu
Pathwise coordinate optimization, combined with the active set strategy, is arguably one of the most popular computational frameworks for high-dimensional problems. It is conceptually simple, easy to implement, and applicable to a wide range of convex and nonconvex problems. However, there is still a gap between its theoretical justification and its practical success. For high-dimensional convex problems, existing theories only show sublinear rates of convergence; for nonconvex problems, almost no theory on the rates of convergence exists. To bridge this gap, we propose a novel unified computational framework, named PICASA, for pathwise coordinate optimization. The main difference between PICASA and existing pathwise coordinate descent methods is that we exploit a proximal gradient pilot to identify an active set. Such a modification, though simple, has profound impact: with high probability, PICASA attains a global geometric rate of convergence to a unique sparse local solution with good statistical properties (e.g., minimax optimality, oracle property) for solving a large family of convex and nonconvex problems. Unlike most existing analyses, which assume that all computation can be carried out exactly without worrying about numerical precision, our theory explicitly accounts for numerical computation accuracy and is thus more realistic. The PICASA method is quite general and can be combined with different coordinate descent strategies, such as cyclical coordinate descent, greedy coordinate descent, and randomized coordinate descent. As an application, we apply the PICASA method to a family of nonconvex optimization problems motivated by estimating semiparametric graphical models. The PICASA method allows us to obtain new statistical recovery results on both parameter estimation and graph selection consistency, which do not exist in the existing literature. Thorough numerical results are also provided to back up our theoretical arguments.
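PICASA itself is not specified in the abstract. For readers unfamiliar with the framework it builds on, here is a minimal sketch (an illustration under simplifying assumptions, not the authors' algorithm) of pathwise coordinate descent for the lasso with warm starts and a simple active-set strategy:

```python
import numpy as np

def lasso_path(X, y, lams, tol=1e-7, max_iter=1000):
    """Pathwise cyclic coordinate descent for the lasso with warm starts.

    Solves min_b 0.5/n * ||y - X b||^2 + lam * ||b||_1 along a decreasing
    lambda path; between full passes, only the currently nonzero (active)
    coefficients are swept, a simplified version of the classical scheme.
    """
    n, p = X.shape
    col_sq = (X ** 2).sum(axis=0) / n   # x_j' x_j / n
    b = np.zeros(p)
    r = y.astype(float).copy()          # residual y - X b
    sol = {}

    def sweep(idx, lam):
        """One coordinate pass over idx; returns the largest coefficient change."""
        delta = 0.0
        for j in idx:
            zj = b[j] + (X[:, j] @ r) / (n * col_sq[j])
            bj = np.sign(zj) * max(abs(zj) - lam / col_sq[j], 0.0)  # soft-threshold
            if bj != b[j]:
                r[:] += X[:, j] * (b[j] - bj)   # keep residual in sync
                delta = max(delta, abs(bj - b[j]))
                b[j] = bj
        return delta

    for lam in sorted(lams, reverse=True):      # warm start from larger lambda
        for _ in range(max_iter):
            sweep(range(p), lam)                # full pass refreshes the active set
            active = np.flatnonzero(b)
            while sweep(active, lam) > tol:     # cheap passes on the active set
                pass
            if sweep(range(p), lam) < tol:      # converged if a full pass is idle
                break
        sol[lam] = b.copy()
    return sol
```

For example, `lasso_path(X, y, [1.0, 0.1, 0.01])` returns the coefficient vector at each lambda; the warm starts make the later, smaller-lambda fits cheap. PICASA's departure from this scheme, per the abstract, is to pick the active set with a proximal gradient pilot rather than from the previous sweep.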


Index of Authors

Abantovalle C 19, 38; Abe N 30, 78; Ahn S 30, 78; Akacha M 31, 82; Allen GI 31, 82; Amei A 34, 91; Amin M 33, 89; Apanasovich TV 29, 74; Artemiou A 31, 81; Au S 32, 85; Aue A 24, 56; author) TZ( 27, 67

Bai X 26, 61; Baiocchi M 28, 71; Bakanda C 21, 42; Baker Y 31, 82; Balasubramanian K 26, 60; Ball G 21, 44; Bandyopadhyay N 33, 88; Bao S 32, 32, 84, 84; Barrdahl M 22, 49; Bayman EO 30, 77; Becker K 21, 42; Bengtsson T 33, 86; Berger TW 21, 45; Bernhardt P 26, 63; Beyene J 21, 42; Bhamidi S 29, 72; Bidouard J 20, 39; Blocker AW 20, 40; Boerwinkle E 31, 79; Bornn L 20, 39; Boye ME 20, 40; Brannath W 23, 50; Branski R 33, 89; Braun T 22, 47; Breidt J 24, 55; Bretz F 23, 50; Brown ER 28, 69; Brown M 24, 54

Cai C 23, 34, 53, 92; Cai J 31, 33, 81, 88; Campbell J 19, 38; Candille S 22, 47; Cao G 22, 49; Carriere KC 30, 79; Cepurna WO 32, 85; Chan G 21, 45; Chan W 34, 90; Chang H 31, 80; Chang J 26, 63; Chatterjee A 31, 81; Chatterjee N 22, 49; Chen B 28, 70; Chen G 29, 32, 71, 85; Chen H 29, 74; Chen L 28, 69; Chen M 19, 20, 21, 23, 29, 38, 40, 44, 52, 73; Chen Q 31, 82; Chen R 31, 79; Chen S 25, 28, 58, 70; Chen T 31, 80; Chen X 26, 61; Chen Y 23, 24, 34, 53, 54, 92; Chen Z 22, 29, 33, 49, 73, 87; Cheng G 19, 36; Cheng X 20, 39; Cheng Y 21, 27, 44, 65; Chervoneva I 29, 74; Cheung YK 27, 64; Chi E 29, 75; Chiang AY 28, 68; Chiruvolu P 21, 44; Cho J 23, 24, 52, 54; Cho S 30, 78; Choi D 32, 85; Choi DS 24, 54; Choi S 22, 33, 48, 87; Chu R 21, 42; Chuang-Stein C 20, 42; Chun H 26, 61; Coan J 27, 67; Colantuoni E 28, 71; Collins R 21, 42; Coneelly K 22, 47; Cook R 28, 70; Coram M 22, 47; Crespi C 23, 50; Cui L 30, 77; Cui Y 22, 33, 46, 87

D'Amico E 21, 42; Díaz I 28, 71; Dabuxilatu W 29, 72; Dai J 27, 65; Daviglus M 34, 90; DeFor T 21, 45; Degras D 27, 67; Deng K 24, 55; Deng Y 33, 88; Dey D 19, 38, 64; Dey DK 19, 38; Dey J 21, 44; Di Y 19, 37; Dinwoodie I 30, 78; Djorgovski G 20, 39; Dominici F 28, 68; Donalek C 20, 39; Dong G 23, 50; Dong Y 31, 31, 81, 81; Dormitorio B 33, 88; Drake A 20, 39; Du Z 24, 54; Duan Y 19, 38; Dunloop D 32, 84; Dyk DV 20, 38

Edlefsen PT 21, 43; Elliott M 21, 42; Etzioni R 32, 32, 83, 83

Fan B 32, 85; Fan J 34, 90; Fan Y 26, 63; Fang L 21, 25, 45, 57; Fang Y 33, 89; Faries D 26, 61; Faruquie T 30, 78; Fei T 24, 54; Feng H 22, 47; Feng Y 29, 31, 33, 71, 81, 88; Feng Z 32, 85; Fink J 21, 44; Fisch R 28, 68; Franceschini N 31, 79; Freydin B 29, 74; Fu H 23, 25, 25, 29, 50, 59, 59, 74

Gaines D 19, 38; Gao B 25, 57; Gentleman R 19; Gneiting T 32, 84; Gong Q 21, 45; Graham M 20, 39; Gu C 32, 85; Guan W 22, 48; Gulati R 32, 83; Gulukota K 24, 53; Guo S 25, 57; Guo W 35, 92; Guo X 19, 36

Ha MJ 23, 51; Hale MD 25, 57; Han B 33, 87; Han L 25, 57; Han P 34, 89; Han SW 29, 73; Haneuse S 28, 68; Hannig J 23, 27, 51, 66; Hao N 33, 88; He K 26, 63; He QA 28, 68; He T 22, 46; He W 20, 24, 42, 57; He X 23, 53; He Y 19, 36; Heitjan DF 25, 32, 59, 84; Hernandez-Stumpfhauser D 24, 55; Ho S 30, 75; Hong CS 30, 78; Hong H 30, 78; Hong Y 19, 38; Hopcroft J 27, 65; Hormann S 24, 56; Hou L 23, 52; Houseman EA 20, 40; Hsu C 25, 57; Hsu L 20, 41; Hsu W 22, 49; Hu J 34, 92; Hu M 24, 55; Hu P 22, 49; Hu Y 19, 37; Huang C 21, 21, 45, 45; Huang J 25, 60; Huang M 31, 81; Huang X 29, 33, 34, 34, 74, 87, 91, 92; Huang Y 23, 26, 51, 62; Hubbard R 32, 83; Huerta G 27, 67; Hung HJ 30, 76; Hung J 64; Huo X 22, 48

Ibrahim JG 20, 31, 40, 82; Inoue LY 32, 83; Ionita-Laza I 20, 40; Islam SS 20, 42

Jackson C 27, 67; Ji P 35, 92; Ji Y 24, 24, 28, 53, 53, 68; Jia N 25, 59; Jia X 27, 64; Jiang H 19, 30, 36, 78; Jiang Q 20, 21, 42, 44; Jiang X 19, 38; Jiang Y 19, 36; Jiao X 20, 38; Jin Z 21, 44; Johnson EC 32, 85; Johnson K 26, 62; Joshi AD 22, 49; Joslyn S 32, 84; Jung S 33, 89; Justice AC 25, 59

Kai B 31, 81; Kambadur A 30, 78; Kang J 31, 34, 34, 81, 90, 90; Katki H 32, 83; Kim DW 33, 86; Kim J 34, 89; Kim JK 28, 70; Kim M 34, 90; Kim S 31, 79; Kim Y 22, 48; Kolivras K 19, 38; Kong L 29, 75; Kooperberg C 27, 65; Kosorok MR 29, 71; Kovalchik S 21, 42; Kracht K 21, 44; Kraft P 22, 49; Kuo H 22, 48; Kuo RC 19, 38; Kwon M 25, 60

Lai M 32, 82; Lai RCS 23, 51; Lai T 28, 71; Lai TL 28, 33, 71, 86; Landon J 32, 85; Lang K 30, 78; Lavori PW 28, 71; Leary E 27, 67; Lebanon G 26, 60; Lecci F 20, 39; Lee CH 21, 45; Lee J 24, 32, 53, 84; Lee KH 28, 68; Lee M 30, 76; Lee MT 24, 56; Lee S 27, 64; Lee SY 25, 60; Lee TCM 23, 51; Lenzenweger MF 21, 43; Leu CS 27, 65; Levin B 27, 65; Levy DL 21, 43; Li C 22, 48; Li D 31, 80; Li F 27, 67; Li G 23, 27, 33, 50, 66, 88; Li H 23, 50; Li J 19, 34, 37, 38, 91; Li L 23, 26, 26, 31, 52, 60, 61, 80; Li M 19, 22, 37, 48; Li P 27, 65; Li R 26, 62; Li X 23, 49; Li Y 23, 25, 25, 26, 29, 30, 53, 59, 59, 63, 75, 79; Li-Xuan L 29, 73; Lian H 19, 36; Liang B 20, 41; Liang F 24, 53; Liang H 27, 65; Liao OY 33, 86; Lim J 22, 48; Lin D 28, 31, 69, 79; Linder D 31, 81; Lindquist M 27, 67; Lipshultz S 26, 62; Lipsitz S 26, 62; Liu B 34, 91; Liu D 20, 22, 41, 46; Liu H 28, 35, 70, 92; Liu J 26, 61; Liu JS 24, 55; Liu K 27, 66; Liu L 34, 90; Liu M 20, 39, 40; Liu R 29, 73; Liu S 33, 87; Liu X 20, 21, 41, 44; Liu XS 24, 54; Liu Y 22, 35, 46, 92; Liu Z 31, 82; Long Q 28, 69; Lou X 25, 60; Lozano A 30, 78; Lu T 27, 65; Lu W 20, 34, 39, 91; Lu Y 20, 32, 39, 85; Luo R 27, 65; Luo S 23, 51; Luo X 21, 30, 45, 77; Lv J 26, 63; Lynch G 35, 92

Ma H 20, 22, 42, 49; Ma J 29, 72; Ma P 20, 40; Ma TF 24, 56; Ma Z 22, 46; Maca J 30, 76; Mahabal A 20, 39; Mai Q 26, 64; Majumdar AP 27, 66; Malinowski A 21, 46; Mandrekar V 22, 46; Manner D 23, 50; Marniquet X 20, 39; Martin R 27, 66; Martino S 21, 42; Matthews M 33, 86; Maurer W 23, 50; McGuire V 32, 85; McIsaac M 28, 70; McKeague IW 31, 80; Meng X 27, 64, 66; Mesbah M 24, 56; Mi G 19, 37; Mias GI 19, 37; Michailidis G 29, 72; Mills EJ 21, 42; Min X 28, 68; Mitra R 24, 53; Mizera I 29, 75; Molinaro A 28, 69; Monsell BC 30, 78; Morgan CJNA 21, 43; Morrison JC 32, 85; Mueller P 24, 28, 53, 68

Nachega JB 21, 42; Naranjo J 33, 88; Nettleton D 23, 51; Nguyen HQ 22, 47; Nie L 29, 75; Nie X 23, 51; Ning J 23, 28, 30, 33, 34, 53, 70, 78, 87, 92; Nobel A 33, 88; Nobel AB 29, 72; Nordman DJ 23, 51; Norinho DD 24, 56; Normand S 25; North KE 31, 79; Norton JD 20, 41; Nosedal A 27, 67

Offen W 64; Ogden RT 29, 74; Ohlssen D 28, 68; Okwuokenye M 34, 90; Olshen A 28, 69; Owen AB 27, 66; Ozekici S 32, 85

Paik J 28, 71; Pan G 30, 76; Pan J 31, 80; Pan Y 34, 89; Park D 28, 67; Park DH 22, 48; Park S 64; Park T 25, 60; Pati D 26, 62; Peng H 31, 34, 80, 91; Peng J 19, 37; Peng L 26, 34, 62, 91; Perry P 24, 54; Peterson C 31, 81; Phoa FKH 31, 79; Pinheiro J 25, 57; Planck SR 32, 85; Prentice R 20, 41; Price K 23, 33, 50, 87; Prisley S 19, 38; Pullenayegum E 21, 42

Qazilbash M 22, 47; Qi X 27, 65; Qian PZG 23, 51; Qiao X 29, 33, 71, 89; Qin J 21, 28, 45, 70; Qin R 27, 64; Qin ZS 24, 55; Qiu J 31, 79; Qiu Y 29, 73; Qu A 32, 82; Quartey G 20, 42

Raftery A 32, 84; Ravikumar P 31, 82; Rayamajhi J 33, 86; Ren Z 29, 73; Rohe K 24, 54; Rosales M 21, 43; Rosenbaum JT 32, 85; Rosenblum M 28, 71; Rube HT 19, 37; Rubin D 29, 74

Saegusa T 24, 54; Salzman J 19, 36; Samawi H 31, 81; Samorodnitsky G 27, 65; Samworth RJ 31, 81; Schafer DW 19, 37; Schlather M 21, 46; Schmidli H 31, 82; Schrag D 28, 68; Scott J 20, 42; Shadel W 21, 42; Shao Y 25, 57; Shariff H 20, 38; She B 32, 84; Shen H 33, 88; Shen W 20, 30, 40, 78; Shen Y 28, 70; Shepherd J 32, 85; Shi P 32, 82; Shih M 28, 71; Shin J 30, 78; Shin SJ 35, 92; Shojaie A 24, 29, 54, 72; Shu X 32, 82; Shui M 32, 84; Sienkiewicz E 21, 46; Simon N 33, 86; Simon R 33, 86; Sinha D 26, 62; Sloughter JM 32, 84; Smith B 25, 57; Smith BT 34, 91; Snapinn S 21, 44; Song C 28, 68; Song D 21, 45; Song J 32, 84; Song JS 19, 37; Song M 22, 49; Song R 23, 34, 51, 89; Song X 20, 40; Soon G 29, 30, 75, 75; Sorant AJ 25, 59; Soyer R 32, 85; Sriperambadur B 26, 60; Steiner PM 34, 90; Stingo F 31, 81; Strawderman R 28, 69; Su X 26, 34, 61, 90; Su Z 26, 61; Suh EY 30, 76; Suktitipat B 25, 59; Sun D 27, 66; Sun J 23, 53; Sun N 22, 48; Sun Q 29, 72; Sun T 22, 46; Sun W 23, 51; Sung H 25, 59; Suresh R 30, 77; Symanzik J 32, 84

Tamhane A 30, 76; Tan F 34, 91; Tang CY 26, 63; Tang H 22, 47; Tang Y 26, 62; Tao M 22, 48; Tao R 31, 79; Taylor J 22, 47; Tewson P 32, 84; Thabane L 21, 42; Thall PF 22, 47; Todem D 22, 49; Trippa L 28, 68; Trotta R 20, 38; Tucker A 26, 62

Vannucci M 31, 81; Verhaak RG 24, 54; Vogel R 31, 81; Vrtilek S 20, 39

Wahed A 21, 44; Waldron L 24, 55; Wang A 34, 91; Wang B 33, 89; Wang C 24, 56; Wang D 30, 77; Wang G 29, 74; Wang H 21, 23, 46, 53; Wang J 26, 27, 63, 66; Wang L 29, 32, 34, 74, 82, 89; Wang M 28, 34, 69, 92; Wang Q 26, 27, 61, 67; Wang R 19, 37; Wang S 29, 31, 74, 80; Wang W 31, 79; Wang X 19, 25, 32, 38, 58, 84; Wang Y 20, 20, 22, 25, 25, 41, 41, 48, 58, 59; Wang Z 25, 25, 33, 59, 59, 87; Wei WW 30, 79; Wei Y 20, 40; Wen S 20, 21, 42, 44; Weng H 29, 71; Weng RC 19, 38; Wettstein G 20, 39; Whitmore GA 24, 56; Wileyto EP 25, 59; Wilson AF 25, 59; Wilson JD 29, 72; Witten D 23, 51; Woerd MVD 24, 55; Wolf M 33, 86; Wolfe PJ 24, 54; Wong WK 23, 31, 31, 50, 79, 79; Wu C 32, 82; Wu D 24, 55; Wu H 22, 27, 47, 65; Wu J 22, 32, 47, 85; Wu M 23, 52; Wu R 31, 80; Wu S 23, 50; Wu Y 21, 26, 30, 35, 43, 63, 77, 92

Xi D 30, 76; Xia J 32, 83; Xia T 20, 39; Xiao R 22, 48; Xie J 31, 81; Xie M 32, 85; Xing H 22, 48; Xing X 20, 40; Xiong J 24, 57; Xiong X 22, 47; Xu K 25, 59; Xu R 23, 51; Xu X 25, 58; Xu Y 28, 68; Xu Z 34, 89; Xue H 27, 65; Xue L 32, 82

Yang B 30, 77; Yang D 33, 88; Yang E 31, 82; Yang S 24, 28, 34, 56, 70, 89; Yao R 30, 77; Yao W 31, 81; Yau CY 24, 56; Yavuz I 21, 44; Yi G 24, 57; Yin G 33, 87; Ying G 25, 32, 59, 84; Young LJ 27, 67; Yu C 28, 70; Yu D 29, 75; Yu L 31, 34, 81, 90; Yu Y 31, 81; Yuan Y 30, 33, 33, 78, 87, 87

Zacks S 27, 65; Zang Y 33, 87; Zeng D 20, 20, 28, 29, 31, 41, 41, 69, 71, 79, 82; Zhan M 21, 44; Zhang B 29, 75; Zhang C 19, 23, 36, 52; Zhang D 20, 26, 40, 63; Zhang G 22, 49; Zhang H 19, 28, 36, 68; Zhang HH 26, 33, 35, 63, 88, 92; Zhang I 25, 58; Zhang J 28, 34, 69, 91; Zhang L 21, 30, 44, 77; Zhang N 29, 75; Zhang Q 25, 59; Zhang S 25, 58; Zhang W 20, 39; Zhang X 19, 23, 26, 36, 53, 63; Zhang Y 24, 25, 54, 58; Zhang Z 21, 27, 29, 46, 67, 75; Zhao H 23, 29, 52, 73; Zhao L 22, 25, 47, 59; Zhao N 23, 52; Zhao P 34, 90; Zhao S 25, 57; Zhao T 35, 92; Zhao Y 29, 33, 74, 87; Zhao Z 35, 92; Zheng C 29, 72; Zheng Q 34, 91; Zheng Y 20, 29, 41, 72; Zheng Z 26, 63; Zhong H 29, 73; Zhong L 25, 57; Zhong P 22, 46; Zhong W 20, 22, 40, 46; Zhou H 22, 26, 29, 48, 60, 73; Zhou L 31, 82; Zhou Q 29, 73; Zhou T 23, 50; Zhou Y 30, 77; Zhu G 26, 61; Zhu H 26, 60; Zhu J 26, 63; Zhu L 21, 44; Zhu M 26, 62; Zhu Y 24, 55; Zhu Z 24, 32, 56, 84; Zou F 22, 48; Zou H 26, 64; Zou W 33, 87


Published by International Chinese Statistical Association - Korean International Statistical Society

International Chinese Statistical Association - Korean International Statistical Society

Applied Statistics Symposium 2014

CONFERENCE INFORMATION, PROGRAM AND ABSTRACTS

June 15 - 18, 2014
Portland Marriott Downtown Waterfront
Portland, Oregon, USA

Organized by International Chinese Statistical Association - Korean International Statistical Society

© 2014 International Chinese Statistical Association - Korean International Statistical Society

Contents

Welcome 1
Conference Information 2
  Committees 2
  Acknowledgements 4
  Conference Venue Information 6
  Program Overview 7
  Keynote Lectures 8
  Student Paper Awards 9
  Short Courses 10
  Social Program 15
  ICSA 2015 in Fort Collins, CO 16
  ICSA 2014 China Statistics Conference 17
  ICSA Dinner at 2014 JSM 18
Scientific Program 19
  Monday, June 16, 8:00 AM - 9:30 AM 19
  Monday, June 16, 10:00 AM - 12:00 PM 19
  Monday, June 16, 1:30 PM - 3:10 PM 21
  Monday, June 16, 3:30 PM - 5:10 PM 23
  Tuesday, June 17, 8:20 AM - 9:30 AM 25
  Tuesday, June 17, 10:00 AM - 12:00 PM 25
  Tuesday, June 17, 1:30 PM - 3:10 PM 27
  Tuesday, June 17, 3:30 PM - 5:30 PM 29
  Wednesday, June 18, 8:30 AM - 10:10 AM 31
  Wednesday, June 18, 10:30 AM - 12:10 PM 33
Abstracts 36
  Session 1: Emerging Statistical Methods for Complex Data 36
  Session 2: Statistical Methods for Sequencing Data Analysis 36
  Session 3: Modeling Big Biological Data with Complex Structures 37
  Session 4: Bayesian Approaches for Modeling Dynamic Non-Gaussian Responses 38
  Session 5: Recent Advances in Astro-Statistics 38
  Session 6: Statistical Methods and Application in Genetics 39
  Session 7: Statistical Inference of Complex Associations in High-Dimensional Data 40
  Session 8: Recent Developments in Survival Analysis 40
  Session 9: Industry Practice and Regulatory Pathway for Benefit-Risk Assessment of Medicinal Products 41
  Session 10: Analysis of Observational Studies and Clinical Trials 42
  Session 11: Lifetime Data Analysis 44
  Session 12: Safety Signal Detection and Safety Analysis 44
  Session 13: Survival and Recurrent Event Data Analysis 45
  Session 14: Statistical Analysis on Massive Data from Point Processes 45
  Session 15: High Dimensional Inference (or Testing) 46
  Session 16: Phase II Clinical Trial Design with Survival Endpoint 47
  Session 17: Statistical Modeling of High-throughput Genomics Data 47
  Session 18: Statistical Applications in Finance 48
  Session 19: Hypothesis Testing 49
  Session 20: Design and Analysis of Clinical Trials 50
  Session 21: New Methods for Big Data 51
  Session 22: New Statistical Methods for Analysis of High Dimensional Genomic Data 51
  Session 23: Recent Advances in Analysis of Longitudinal Data with Informative Observation Process 52
  Session 24: Bayesian Models for High Dimensional Complex Data 53
  Session 25: Statistical Methods for Network Analysis 54
  Session 26: New Analysis Methods for Understanding Complex Diseases and Biology 54
  Session 27: Recent Advances in Time Series Analysis 55
  Session 28: Analysis of Correlated Longitudinal and Survival Data 56
  Session 29: Clinical Pharmacology 57
  Session 30: Sample Size Estimation 58
  Session 31: Predictions in Clinical Trials 59
  Session 32: Recent Advances in Statistical Genetics 59
  Session 33: Structured Approach to High Dimensional Data with Sparsity and Low Rank Factorization 60
  Session 34: Recent Developments in Dimension Reduction, Variable Selection, and Their Applications 61
  Session 35: Post-Discontinuation Treatment in Randomized Clinical Trials 61
  Session 36: New Advances in Semi-Parametric Modeling and Survival Analysis 62
  Session 37: High-Dimensional Data Analysis: Theory and Application 63
  Session 38: Leading Across Boundaries: Leadership Development for Statisticians 64
  Session 39: Recent Advances in Adaptive Designs in Early Phase Trials 64
  Session 40: High Dimensional Regression/Machine Learning 65
  Session 41: Distributional Inference and Its Impact on Statistical Theory and Practice 66
  Session 42: Applications of Spatial Modeling and Imaging Data 67
  Session 43: Recent Development in Survival Analysis and Statistical Genetics 67
  Session 44: Bayesian Methods and Applications in Clinical Trials with Small Population 68
  Session 45: Recent Developments in Assessing Predictive Models in Survival Analysis 69
  Session 46: Missing Data: the Interface between Survey Sampling and Biostatistics 70
  Session 47: New Statistical Methods for Comparative Effectiveness Research and Personalized Medicine 70
  Session 48: Student Award Session 1 71
  Session 49: Network Analysis/Unsupervised Methods 72
  Session 50: Personalized Medicine and Adaptive Design 73
  Session 51: New Development in Functional Data Analysis 74
  Session 52: Recent Regulatory/Industry Experience in Biosimilar Trial Designs 75
  Session 53: Gatekeeping Procedures and Their Application in Pivotal Clinical Trials 76
  Session 54: Approaches to Assessing Qualitative Interactions 76
  Session 55: Interim Decision-Making in Phase II Trials 77
  Session 56: Recent Advancement in Statistical Methods 78
  Session 57: Building Bridges between Research and Practice in Time Series Analysis 78
  Session 58: Recent Advances in Design for Biostatistical Problems 79
  Session 59: Student Award Session 2 79
  Session 60: Semi-parametric Methods 80
  Session 61: Statistical Challenges in Variable Selection for Graphical Modeling 81
  Session 62: Recent Advances in Non- and Semi-Parametric Methods 82
  Session 63: Statistical Challenges and Development in Cancer Screening Research 83
  Session 64: Recent Developments in the Visualization and Exploration of Spatial Data 84
  Session 65: Advancement in Biostatistical Methods and Applications 84
  Session 66: Analysis of Complex Data 85
  Session 67: Statistical Issues in Co-development of Drug and Biomarker 86
  Session 68: New Challenges for Statistical Analyst/Programmer 86
  Session 69: Adaptive and Sequential Methods for Clinical Trials 87
  Session 70: Survival Analysis 88
  Session 71: Complex Data Analysis: Theory and Application 88
  Session 72: Recent Development in Statistics Methods for Missing Data 89
  Session 73: Machine Learning Methods for Causal Inference in Health Studies 90
  Session 74: JP Hsu Memorial Session 90
  Session 75: Challenge and New Development in Model Fitting and Selection 91
  Session 76: Advanced Methods and Their Applications in Survival Analysis 91
  Session 77: High Dimensional Variable Selection and Multiple Testing 92
Index of Authors 94

2014 Joint Applied Statistics Symposium of ICSA and KISS
June 15-18, Marriott Downtown Waterfront, Portland, Oregon, USA

Welcome to the 2014 joint International Chinese Statistical Association (ICSA) and Korean International Statistical Society (KISS) Applied Statistics Symposium!

This is the 23rd annual ICSA symposium and the 1st for KISS. The organizing committees have been working hard to put together a strong program, including 7 short courses, 3 keynote lectures, 76 scientific sessions, student paper sessions, and social events. Our scientific program includes keynote lectures from prominent statisticians Dr. Sharon-Lise Normand, Dr. Robert Gentleman, and Dr. Sastry Pantula, as well as invited and contributed talks covering cutting-edge topics on genome-scale data and big data, and on the new world of statistics after the 2013 International Year of Statistics. We hope this symposium will provide abundant opportunities for you to engage, learn, and network, and to find inspiration to advance old research ideas and develop new ones. We believe this will be a memorable and worthwhile learning experience for you.

Portland is located near the confluence of the Willamette and Columbia rivers and has a unique city culture. It is close to the famous Columbia Gorge, Oregon's high mountains, and the coast. Oregon is also famous for its many microbreweries and beautiful wineries, and has no sales tax. June is a great time to visit. We hope you also have opportunities to experience the rich culture and activities the city has to offer during your stay.

Thanks for coming to the 2014 ICSA-KISS Applied Statistics Symposium in Portland!

Dongseok Choi and Rochelle Fu, on behalf of the 2014 ICSA-KISS Applied Statistics Symposium Executive and Organizing Committees

The City of Roses welcomes you!

Committees

Executive Committee
Dongseok Choi, Co-Chair, Oregon Health & Science U.
Rochelle Fu, Co-Chair & Treasurer, Oregon Health & Science U.
Joan Hu, Simon Fraser U.
Zhezhen Jin, Program Chair, Columbia U.
Ouhong Wang, Amgen
Ru-Fang Yeh, Genentech
X.H. Andrew Zhou, U. of Washington
Cheolwoo Park, Webmaster, U. of Georgia

Local Committee
Dongseok Choi, Co-Chair, Oregon Health & Science U.
Rochelle Fu, Chair, Oregon Health & Science U.
Yiyi Chen, Oregon Health & Science U.
Thuan Nguyen, Oregon Health & Science U.
Byung Park, Oregon Health & Science U.
Xinbo Zhang, Oregon Health & Science U.

Program Committee
Zhezhen Jin, Chair, Columbia U.
Gideon Bahn, VA Hospital
Kani Chen, Hong Kong U. of Science and Technology
Yang Feng, Columbia U.
Liang Fang, Gilead
Qi Jiang, Amgen
Mikyoung Jun, Texas A&M U.
Sin-Ho Jung, Duke U.
Xiaoping Sylvia Hu, Gene
Jane Paik Kim, Stanford U.
Mimi Kim, Albert Einstein College of Medicine
Mi-Ok Kim, Cincinnati Children's Hospital Medical Center
Gang Li, Johnson and Johnson
Yunfeng Li, Pharmacyclics
Mei-Ling Ting Lee, U. of Maryland
Yoonkyung Lee, Ohio State U.
Meng-Ling Liu, New York U.
Xinhua Liu, Columbia U.
Xiaolong Luo, Celgene Corporation
Taesung Park, Seoul National U.
Yu Shen, MD Anderson Cancer Center
Greg (Guoxing) Soon, US Food and Drug Administration
Zheng Su, Deerfield Company
Christine Wang, Amgen
Lan Xue, Oregon State U.
Yichuan Zhao, Georgia State U.

Program Book Committee
Mengling Liu, Chair, New York U.
Tian Zheng, Columbia U.
Wen (Jenna) Su, Columbia U.
Zhezhen Jin, Columbia U.

Student Paper Award Committee
Wenqing He, Chair, U. of Western Ontario
Qixuan Chen, Columbia U.
Hyunson Cho, National Cancer Institute
Dandan Liu, Vanderbilt U.
Jinchi Lv, U. of Southern California

Short Course Committee
Xiaonan Xue, Chair, Albert Einstein College of Medicine
Wei-Ting Hwang, U. of Pennsylvania
Ryung Kim, Albert Einstein College of Medicine
Jessica Kim, US Food and Drug Administration
Laura Lu, US Food and Drug Administration
Mikyoung Jun, Texas A&M U.
Tao Wang, Albert Einstein College of Medicine

IT Support
Lixin (Simon) Gao, Biopier Inc.

Symposium Sponsors

The 2014 ICSA-KISS Applied Statistics Symposium is supported by financial contributions from the following sponsors. The organizing committees greatly appreciate the support of these sponsors.

The 2014 ICSA-KISS Joint Applied Statistics Symposium Exhibitors:

CRC Press - Taylor & Francis Group
Springer Science & Business Media
The Lotus Group

[Hotel floor plans: Portland Marriott Downtown Waterfront, 1401 SW Naito Parkway, Portland, Oregon 97201. Hotel: (503) 226-7600; Sales Facsimile: (503) 226-1209; portlandmarriott.com. The plans show the Lower Level 1/Main Lobby, 2nd floor, and 3rd floor meeting spaces, including Salons A-I; the Columbia, Willamette, Portland, Eugene, Salem, Medford, Sunstone, Meadowlark, Douglas Fir, Salmon, Mount Hood, Hawthorne, Belmont, Laurelhurst, and Pearl Rooms; and the registration desk in the Ballroom Lobby.]

Program Overview


Sunday, June 15th, 2014
8:00 AM - 6:00 PM, Ballroom Foyer: Registration
7:00 AM - 8:45 AM: Breakfast
9:45 AM - 10:15 AM: Break
8:00 AM - 5:00 PM, Salon A: Short Course: Recent Advances in Bayesian Adaptive Clinical Trial Design
8:00 AM - 5:00 PM, Salon B: Short Course: Analysis of Life History Data with Multistate Models
8:00 AM - 5:00 PM, Salon C: Short Course: Propensity Score Methods in Medical Research for the Applied Statistician
8:00 AM - 12:00 PM, Salon D: Short Course: ChIP-seq for Transcription and Epigenetic Gene Regulation
8:00 AM - 12:00 PM, Columbia: Short Course: Data Monitoring Committees in Clinical Trials
12:00 PM - 1:00 PM: Lunch for Registered Full-Day Short Course Attendees
1:00 PM - 5:00 PM, Salon D: Short Course: Analysis of Genetic Association Studies Using Sequencing Data and Related Topics
1:00 PM - 5:00 PM, Columbia: Short Course: Analysis of Biomarkers for Prognosis and Response Prediction
2:45 PM - 3:15 PM: Break
6:00 PM - 8:30 PM, Mt Hood: ICSA Board Meeting (Invited Only)
7:00 PM - 9:00 PM, Salon E: Opening Mixer

Monday, June 16th, 2014
7:30 AM - 6:00 PM, Ballroom Foyer: Registration
7:00 AM - 8:45 AM: Breakfast
8:00 AM - 8:20 AM, Salon E-F: Welcome
8:20 AM - 9:30 AM, Salon E-F: Keynote I: Robert Gentleman, Genentech
9:30 AM - 10:00 AM, Ballroom Foyer: Break
10:00 AM - 12:00 PM, See program: Parallel Sessions
12:00 PM - 1:30 PM: Lunch on own
1:30 PM - 3:10 PM, See program: Parallel Sessions
3:10 PM - 3:30 PM, Ballroom Foyer: Break
3:30 PM - 5:10 PM, See program: Parallel Sessions

Tuesday, June 17th, 2014
8:20 AM - 5:30 PM, Ballroom Foyer: Registration
7:00 AM - 8:45 AM: Breakfast
8:20 AM - 9:30 AM, Salon E-F: Keynote II: Sharon-Lise Normand, Harvard University
9:30 AM - 10:00 AM, Ballroom Foyer: Break
10:00 AM - 12:00 PM, See program: Parallel Sessions
12:00 PM - 1:30 PM: Lunch on own
1:30 PM - 3:10 PM, See program: Parallel Sessions
3:10 PM - 3:30 PM, Ballroom Foyer: Break
3:30 PM - 5:30 PM, See program: Parallel Sessions
6:30 PM - 9:30 PM, Off site: Banquet (Banquet speaker: Dr. Sastry Pantula, Oregon State University)

Wednesday, June 18th, 2014
8:30 AM - 1:00 PM, Ballroom Foyer: Registration
7:30 AM - 9:00 AM: Breakfast
8:30 AM - 10:10 AM, See program: Parallel Sessions
10:10 AM - 10:30 AM, Ballroom Foyer: Break
10:30 AM - 12:10 PM, See program: Parallel Sessions

Keynote Lectures


Monday, June 16th, 8:20 AM - 9:30 AM

Robert Gentleman, Senior Director, Bioinformatics, Genentech

Speaker Biography: I joined Genentech in 2009 as Senior Director of the Bioinformatics and Computational Biology Department. I was excited by the opportunity to get involved in drug development and to do work that would directly impact patients. I had worked at two major cancer centers, and while immensely satisfying, the research done there is still fairly distant from the patient. At Genentech, patients are at the forefront of everything we do. Genentech Research is that rare blend of academia and industry that manages to capture most of the best aspects of both. The advent of genome-scale data technologies is revolutionizing molecular biology and is providing us with new and exciting opportunities for drug development. I am very excited by the new opportunities we have to develop methods for computational discovery of potential drug targets. At the same time, these large genomic data sets provide us with opportunities to identify and understand different patient subsets and to help guide us towards much more targeted therapeutics.

Postdoctoral Mentor: Being a post-doc mentor is one of the highlights of being in Research. The ability to work with really talented post-docs who are interested in pushing the boundaries of computational science provides me with an outlet for my blue-skies research ideas.

Title: Analyzing Genome Scale Data

I will discuss some of the many genome-scale data analysis problems, such as variant calling and genotyping. I will discuss the statistical approaches used, as well as the software development needs of addressing these problems. I will also discuss approaches to parallelization of code and other practical computing issues that face most data analysts working on these data.

Tuesday, June 17th, 8:20 AM - 9:30 AM

Sharon-Lise Normand, Professor, Department of Health Care Policy, Harvard Medical School; Department of Biostatistics, Harvard School of Public Health

Speaker Biography: Sharon-Lise T. Normand, PhD, is a professor of health care policy (biostatistics) in the Department of Health Care Policy at Harvard Medical School and in the Department of Biostatistics at the Harvard School of Public Health. Dr. Normand's research focuses on the development of statistical methods for health services research, primarily using Bayesian approaches to problem solving, including assessment of quality of care, methods for causal inference, provider profiling, meta-analysis, and latent variable modeling. She has developed a long line of research on methods for the analysis of patterns of treatment and quality of care for patients with cardiovascular disease and with mental disorders in particular.

Title: Combining Information for Assessing Safety, Effectiveness and Quality: Technology Diffusion and Health Policy

Health information growth has created unprecedented opportunities to evaluate therapies in large and broadly representative patient populations. Extracting sound evidence from large observational data is now at the forefront of health care policy decisions: regulators are moving away from a strict biomedical perspective to a wider one for coverage of new medical technologies. Yet discriminating between beneficial and wasteful new technology remains methodologically challenging. While big data provide opportunities to study treatment effect heterogeneity, estimation of average causal effects in sub-populations is underdeveloped in observational data, and the correct choice of confounding adjustment is difficult in the large-p setting. In this talk I discuss analytical issues related to the analysis of observational data when the goals involve characterizing the diffusion of multiple new technologies and assessing their causal impacts in the areas of mental illness and cardiovascular interventions. This work is supported in part by grants U01-MH103018 from the National Institutes of Health and U01-FD004493 from the US Food and Drug Administration.

Student Paper Awards


ASA Biopharmaceutical Awards

Guanhua Chen, University of North Carolina - Chapel Hill
- Title: Personalized Dose Finding Using Outcome Weighted Learning
- Time: Tuesday, June 17th, 1:30 PM - 3:10 PM
- Session 48: Student Award Session 1 (Portland Room, Lower Level 1)

Cheng Zheng, University of Washington
- Title: Survival Rates Prediction when Training Data and Target Data have Different Measurement Error
- Time: Tuesday, June 17th, 1:30 PM - 3:10 PM
- Session 48: Student Award Session 1 (Portland Room, Lower Level 1)

Jiann-Ping Hsu Pharmaceutical and Regulatory Sciences Student Paper Award

Sandipan Roy, University of Michigan
- Title: Estimating a Change-Point in High-Dimensional Markov Random Field Models
- Time: Wednesday, June 18th, 10:30 AM - 12:10 PM
- Session 74: JP Hsu Memorial Session (Salon D, Lower Level 1)

ICSA Student Paper Awards

Ting-Huei Chen, University of North Carolina - Chapel Hill
- Title: Using a Structural Equation Modeling Approach with Application in Alzheimer's Disease
- Time: Tuesday, June 17th, 3:30 PM - 5:30 PM
- Session 59: Student Award Session 2 (Portland Room, Lower Level 1)

Haolei Weng, Columbia University
- Title: Regularization after Retention in Ultrahigh Dimensional Linear Regression Models
- Time: Tuesday, June 17th, 1:30 PM - 3:10 PM
- Session 48: Student Award Session 1 (Portland Room, Lower Level 1)

Ran Tao, University of North Carolina - Chapel Hill
- Title: Analysis of Sequence Data Under Multivariate Trait-Dependent Sampling
- Time: Tuesday, June 17th, 3:30 PM - 5:30 PM
- Session 59: Student Award Session 2 (Portland Room, Lower Level 1)

Hsin-Wen Chang, Columbia University
- Title: Empirical Likelihood Based Tests for Stochastic Ordering under Right Censorship
- Time: Tuesday, June 17th, 3:30 PM - 5:30 PM
- Session 59: Student Award Session 2 (Portland Room, Lower Level 1)

Qiang Sun, University of North Carolina - Chapel Hill
- Title: Hard Thresholded Regression Via Linear Programming
- Time: Tuesday, June 17th, 1:30 PM - 3:10 PM
- Session 48: Student Award Session 1 (Portland Room, Lower Level 1)

Short Courses


1. Recent Advances in Bayesian Adaptive Clinical Trial Design

Presenters: Peter F. Thall & Brian P. Hobbs, The University of Texas MD Anderson Cancer Center, 1400 Hermann Pressler Dr., Houston, TX 77030-4008. Email: rex@mdanderson.org
Course length: One day

Outline/Description: This one-day short course will cover a variety of recently developed Bayesian methods for the design and conduct of adaptive clinical trials. Emphasis will be on practical application, with the course structured around a series of specific illustrative examples. Topics to be covered will include (1) using historical data in both planning and adaptive decision making during the trial; (2) using elicited utilities or scores of different types of multivariate patient outcomes to characterize complex treatment effects; (3) characterizing and calibrating prior effective sample size; (4) monitoring safety and futility; (5) eliciting and establishing priors; and (6) using computer simulation as a design tool. These methods will be illustrated by actual clinical trials, including cancer trials involving chemotherapy for leukemia and colorectal cancer, stem cell transplantation, and radiation therapy, as well as trials in neurology and neonatology. The illustrations will include both early phase trials to optimize dose, or dose and schedule, and randomized comparative phase III trials.

References:
Braun TM, Thall PF, Nguyen H, de Lima M. Simultaneously optimizing dose and schedule of a new cytotoxic agent. Clinical Trials 4:113-124, 2007.
Hobbs BP, Carlin BP, Mandrekar S, Sargent DJ. Hierarchical commensurate and power prior models for adaptive incorporation of historical information in clinical trials. Biometrics 67:1047-1056, 2011.
Hobbs BP, Sargent DJ, Carlin BP. Commensurate priors for incorporating historical information in clinical trials using general and generalized linear models. Bayesian Analysis 7:639-674, 2012.
Hobbs BP, Carlin BP, Sargent DJ. Adaptive adjustment of the randomization ratio using historical control data. Clinical Trials 10:430-440, 2013.
Morita S, Thall PF, Mueller P. Determining the effective sample size of a parametric prior. Biometrics 64:595-602, 2008.
Morita S, Thall PF, Mueller P. Evaluating the impact of prior assumptions in Bayesian biostatistics. Statistics in Biosciences 2:1-17, 2010.
Thall PF. Bayesian models and decision algorithms for complex early phase clinical trials. Statistical Science 25:227-244, 2010.
Thall PF, Szabo A, Nguyen HQ, et al. Optimizing the concentration and bolus of a drug delivered by continuous infusion. Biometrics 67:1638-1646, 2011.
Thall PF, Nguyen HQ. Adaptive randomization to improve utility-based dose-finding with bivariate ordinal outcomes. J Biopharmaceutical Statistics 22:785-801, 2012.
Thall PF, Nguyen HQ, Braun TM, Qazilbash M. Using joint utilities of the times to response and toxicity to adaptively optimize schedule-dose regimes. Biometrics, in press.

About the presenters:

Dr. Peter Thall has pioneered the use of Bayesian methods in medical research. He has published over 160 research papers and book chapters in the statistical and medical literature, including numerous papers providing innovative methods for the design, conduct and analysis of clinical trials. Over the course of his career he has designed over 300 clinical trials. He has presented 20 short courses and over 130 invited talks, and regularly provides statistical consultation for corporations in the pharmaceutical industry. He has served as an associate editor for the journals Statistics in Medicine, Journal of the National Cancer Institute and Biometrics, is currently an associate editor for the journals Clinical Trials and Statistics in Biosciences, and is an American Statistical Association Media Expert.

Dr. Brian P. Hobbs is Assistant Professor in the Department of Biostatistics at the University of Texas MD Anderson Cancer Center in Houston, Texas. He completed his undergraduate education at the University of Iowa and obtained a master's and doctoral degree in biostatistics at the University of Minnesota in Minneapolis. He was the recipient of the 2010 ENAR John Van Ryzin Student Award. Dr. Hobbs completed a postdoctoral fellowship in the Department of Biostatistics at MD Anderson Cancer Center before joining the faculty in 2011. His methodological expertise covers Bayesian inferential methods, hierarchical modeling, utility-based inference, adaptive trial design in the presence of historical controls, sequential design in the presence of co-primary endpoints, and semiparametric modeling of functional imaging data.

2. Analysis of Life History Data with Multistate Models

Presenters: Richard Cook and Jerry Lawless, Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ontario, Canada. Email: rjcook@uwaterloo.ca, jlawless@uwaterloo.ca


Course length: One day

Outline/Description: Life history studies examine specific outcomes and processes during people's lifetimes. For example, cohort studies of chronic disease provide information on disease progression, fixed and time-varying risk factors, and the extent of heterogeneity in the population. Modelling and analysis of life history processes is often facilitated by the use of multistate models. The aim of this workshop is to present models and methods for multistate analyses and to indicate some current topics of research. Software for conducting analyses will be discussed, and code for specific problems will be given. A wide range of illustrations involving chronic disease and other conditions will be presented. Course notes will be distributed.
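As a concrete illustration of the basic machinery: for a time-homogeneous Markov multistate model with transition intensity matrix Q, the transition probability matrix is P(t) = exp(Qt). The following minimal Python sketch uses a three-state illness-death model; the intensity values are invented for illustration and are not taken from the course materials.

```python
import numpy as np

# States: 0 = healthy, 1 = ill, 2 = dead (absorbing). Rows of the transition
# intensity matrix Q sum to zero; off-diagonals are transition rates.
Q = np.array([[-0.3,  0.2, 0.1],
              [ 0.0, -0.4, 0.4],
              [ 0.0,  0.0, 0.0]])

def transition_matrix(Q, t, terms=60):
    """P(t) = exp(Qt), computed by a truncated Taylor series
    (adequate for a small, well-scaled example like this one)."""
    P = np.eye(len(Q))
    term = np.eye(len(Q))
    for k in range(1, terms):
        term = term @ Q * (t / k)
        P = P + term
    return P

P1 = transition_matrix(Q, 1.0)
# First row: probabilities of being healthy / ill / dead at t = 1, starting
# healthy. Since state 0 cannot be re-entered, P1[0, 0] equals exp(-0.3).
print(np.round(P1[0], 3))
```

Each row of P(t) sums to one, and quantities such as state occupancy probabilities and expected sojourn times follow from the same machinery; dedicated multistate software, discussed in the course, handles estimation from data.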

Topics:
1. Introduction
2. Some Basic Quantities for Event History Modelling
3. Some Illustrative Analyses Involving Multistate Models
4. Processes with Intermittent Observation
5. Modelling Heterogeneity and Associations
6. Dependent Censoring and Inspection
7. Some Other Topics

About the presenters: Richard Cook is Professor of Statistics at the University of Waterloo and holder of the Canada Research Chair in Statistical Methods for Health Research. He has published extensively in the areas of statistical methodology, clinical trials, medicine and public health, including many articles on event history analysis, multistate models and the statistical analysis of life history data. He collaborates with numerous researchers in medicine and public health and has consulted widely with pharmaceutical companies on the design and analysis of clinical trials.

Jerry Lawless is Distinguished Professor Emeritus of Statistics at the University of Waterloo. He has published extensively on statistical models and methods for survival and event history data, life history processes and other topics, and is the author of Statistical Models and Methods for Lifetime Data (2nd edition, Wiley, 2003). He has consulted and worked in many applied areas, including medicine, public health, manufacturing and reliability. Dr. Lawless was the holder of the GM-NSERC Industrial Research Chair in Quality and Productivity from 1994 to 2004.

Drs. Cook and Lawless have co-authored many papers, as well as the book The Statistical Analysis of Recurrent Events (Springer, 2007). They have also given numerous workshops together.

3. Propensity Score Methods in Medical Research for the Applied Statistician

Presenter: Ralph D'Agostino, Jr., PhD, Department of Biostatistical Sciences, Wake Forest University School of Medicine, Medical Center Boulevard, Winston-Salem, NC 27157. Email: rdagosti@wakehealth.edu
Course length: One day

Outline/Description: The purpose of this short course is to introduce propensity score methodology to applied statisticians. Currently, propensity score methods are widely used in research, but their use is often not accompanied by an explanation of how they were used or whether they were used appropriately. This course will teach the attendee the definition of the propensity score, show how it is estimated, and present several applied examples of its use. In addition, SAS code will be presented to show how to estimate propensity scores, assess model success, and perform final treatment effect estimation. Published medical journal articles that have used propensity score methods will be examined. Some attention will be given to the use of propensity score methods for detecting safety signals using post-marketing data. Upon completion of this workshop, researchers should be able to understand what a propensity score is, know how to estimate it, identify under what circumstances propensity scores can be used, know how to evaluate whether a propensity score model "worked", and be able to critically review the medical literature where propensity scores have been used to determine whether they were used appropriately. In addition, attendees will be shown statistical programs using SAS software that estimate propensity scores, assess the success of the propensity score model, and estimate treatment effects that take propensity scores into account. Experience with SAS programming would be useful for attendees.

Textbook/References:

Rosenbaum P, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika 1983;70:41-55.

D'Agostino RB Jr. Propensity score methods for bias reduction in the comparison of a treatment to a non-randomized control group. Stat Med 1998;17:2265-2281.

Rubin DB. The design versus the analysis of observational studies for causal effects: parallels with the design of randomized studies. Stat Med 2007;26:20-36.

D'Agostino RB Jr, D'Agostino RB Sr. Estimating treatment effects using observational data. JAMA 2007;297(3):314-316.

Yue LQ. Statistical and regulatory issues with the application of propensity score analysis to non-randomized medical device clinical studies. J Biopharm Stat 2007;17(1):1-13.

D'Agostino RB Jr. Propensity scores in cardiovascular research. Circulation 2007;115(17):2340-2343.
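The estimate-then-adjust workflow the course outlines can be sketched in a few lines. The course itself uses SAS; the Python example below, with simulated data and invented variable names and settings, is only a schematic stand-in: fit a logistic regression of treatment on the confounder to obtain propensity scores, then estimate the treatment effect by inverse-probability-of-treatment weighting (IPTW).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)                              # a confounder
z = rng.binomial(1, 1 / (1 + np.exp(-0.8 * x)))     # treatment depends on x
y = 1.0 * z + 2.0 * x + rng.normal(size=n)          # outcome; true effect of z is 1

naive = y[z == 1].mean() - y[z == 0].mean()         # biased: ignores confounding

# Step 1: estimate the propensity score P(Z=1 | X) by logistic regression
# (Newton-Raphson on the log-likelihood, written out to stay self-contained).
X = np.column_stack([np.ones(n), x])
beta = np.zeros(2)
for _ in range(25):
    p = 1 / (1 + np.exp(-X @ beta))
    beta += np.linalg.solve(X.T @ ((p * (1 - p))[:, None] * X), X.T @ (z - p))
ps = 1 / (1 + np.exp(-X @ beta))                    # estimated propensity scores

# Step 2: IPTW estimate of the treatment effect.
w1, w0 = z / ps, (1 - z) / (1 - ps)
iptw = (w1 * y).sum() / w1.sum() - (w0 * y).sum() / w0.sum()
print(f"naive {naive:.2f}, IPTW {iptw:.2f}")  # IPTW is close to the true 1.0
```

Matching and stratification on the estimated score are the other common uses; checking covariate balance after weighting or matching is part of assessing whether the propensity score model "worked".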

About the presenter: Dr. D'Agostino holds a PhD in Mathematical Statistics from Harvard University. He is a Fellow of the American Statistical Association and a Professor of Biostatistical Sciences at the Wake Forest School of Medicine (WFSM). He has been a principal investigator for several R01 grants/subcontracts funded by the NIH/CDC, has served as the Statistical Associate Editor for Arthroscopy (The Journal of Arthroscopy and Related Surgery) since 2008, and has previously been on the editorial boards of Current Controlled Trials in Cardiovascular Medicine, the Journal of Cardiac Failure, and the American Journal of Epidemiology. He has published over 235 manuscripts and book chapters in areas of statistical methodology (in particular propensity score methods), cardiovascular disease, diabetes, cancer and genetics. He has extensive experience in the design and analysis of clinical trials, observational studies and large-scale epidemiologic studies. He has been an author on several manuscripts that describe propensity score methodology, as well as many applied manuscripts that use this methodology. In addition, during the past twenty years Dr. D'Agostino has made numerous presentations and has taught several short courses and workshops on propensity score methods.

4. ChIP-seq for Transcription and Epigenetic Gene Regulation

Presenter: X. Shirley Liu, Professor of Biostatistics and Computational Biology, Harvard School of Public Health; Director, Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute; Associate Member, Broad Institute. 450 Brookline Ave, Mail CLS-11007, Boston, MA 02215. Email: xsliu@jimmy.harvard.edu
Course length: Half day

Outline/Description: With next generation sequencing, ChIP-seq has become a popular technique to study transcriptional and epigenetic gene regulation. The short course will introduce the technique of ChIP-seq and discuss the computational and statistical issues in analyzing ChIP-seq data. These include initial data QC, normalizing biases, identifying transcription factor binding sites and target genes, predicting additional transcription factor drivers in biological processes, and integrating binding with transcriptome and epigenome information. We will also emphasize the importance of dynamic ChIP-seq and introduce some of the tools and databases that are useful for ChIP-seq data analysis.

Textbook/References:
Park PJ. ChIP-seq: advantages and challenges of a maturing technology. Nat Rev Genet 2009 Oct;10(10):669-80.
Shin H, Liu T, Duan X, Zhang Y, Liu XS. Computational methodology for ChIP-seq analysis. Quantitative Biology, 2013.

About the presenter: Dr. X. Shirley Liu is Professor of Biostatistics and Computational Biology at Harvard School of Public Health and Director of the Center for Functional Cancer Epigenetics at the Dana-Farber Cancer Institute. Her research focuses on computational models of transcriptional and epigenetic regulation through algorithm development and data integration for high-throughput data. She has developed a number of widely used transcription factor motif finding algorithms (cited over 1,700 times) and ChIP-chip/seq analysis algorithms (over 8,000 users), and has conducted pioneering research studies on gene regulation in development, metabolism and cancers. Dr. Liu has published over 100 papers, including over 30 in the Nature, Science or Cell series, and she has an H-index of 50 according to Google Scholar statistics. She has presented at over 50 conferences and workshops and has given research seminars at over 70 academic and research institutions worldwide.

5. Data Monitoring Committees in Clinical Trials

Presenter: Jay Herson, PhD, Senior Associate, Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD. Email: jayherson@earthlink.net
Course length: Half day

Outline/Description: This workshop deals with best practices for data monitoring committees (DMCs) in the pharmaceutical industry. The emphasis is on safety monitoring, because this constitutes 90% of the workload for pharmaceutical industry DMCs. The speaker summarizes experience over 24 years of working as statistical member or supervisor of statistical support for DMCs. He provides insight into the behind-the-scenes workings of DMCs, which those working in industry or at the FDA may find surprising. The introduction presents a stratification of the industry into Big Pharma, Middle Pharma and Infant Pharma, which will be referred to often in this workshop. Subsequent sections deal with DMC formation, DMC meetings, and the process of serious adverse event (SAE) data flow. The tutorial's section on clinical issues explains the nature of MedDRA coding as well as issues in multinational trials. This will be followed by a statistical section, which reviews and illustrates the various methods of statistical analysis of treatment-emergent adverse events, dealing with multiplicity and, if time allows, likelihood and Bayesian methods. The


workshop's review of biases and pitfalls describes reporting bias, analysis bias, granularity bias, competing risks, and recommendations to reduce bias. A description of DMC decisions goes through various actions and ad hoc analyses the DMC can make when faced with an SAE issue, and their limitations. The workshop concludes with emerging issues such as adaptive designs, causal inference, biomarkers, training DMC members, cost control, DMC audits, mergers and licensing, and the high-tech future of clinical trials.

Text: Herson J. Data and Safety Monitoring Committees in Clinical Trials. Chapman & Hall/CRC, 2009.

About the presenter: Jay Herson received his PhD in Biostatistics from Johns Hopkins in 1971. After working on cancer clinical trials at MD Anderson Hospital, he formed Applied Logic Associates (ALA) in Houston in 1983. ALA grew to be a biostatistical-data management CRO with 50 employees when it was sold to Westat in 2001. Jay joined the Adjunct Faculty in Biostatistics at Johns Hopkins in 2004. His interests are interim analysis in clinical trials, data monitoring committees, and statistical regulatory issues. He chaired the first known data monitoring committee in the pharmaceutical industry in 1988. He is the author of numerous papers on statistical and clinical trial methodology, and in 2009 authored the book Data and Safety Monitoring Committees in Clinical Trials, published by Chapman & Hall/CRC.

6. Analysis of Genetic Association Studies Using Sequencing Data and Related Topics

Presenters: Xihong Lin, Department of Biostatistics, Harvard School of Public Health, xlin@hsph.harvard.edu; Seunggeun Lee, University of Michigan, leeshawn@umich.edu
Course length: Half day

Outline/Description: This short course discusses current methodology for analyzing sequencing association studies aimed at identifying the genetic basis of common complex diseases. The rapid advances in next generation sequencing technologies provide an exciting opportunity to gain a better understanding of biological processes and new
approaches to disease prevention and treatment. During the past few years, an increasing number of large-scale sequencing association studies, such as exome-chip arrays, candidate gene sequencing, and whole exome and whole genome sequencing studies, have been conducted, and preliminary analysis results have become rapidly available. These studies could potentially identify new genetic variants that play important roles in understanding disease etiology or treatment response. However, due to the massive number of

variants and the rareness of many of these variants across the genome, sequencing costs, and the complexity of diseases, efficient methods for designing and analyzing sequencing studies remain vitally important yet challenging. This short course provides an overview of statistical methods for analysis of genome-wide sequencing association studies and related topics. Topics include study designs for sequencing studies, data processing pipelines, statistical methods for detecting rare variant effects, meta-analysis, gene-environment interaction, population stratification, and mediation analysis for integrative analysis of genetic and genomic data. Data examples will be provided and software will be discussed.

Textbook/References: Handouts and references will be provided.

About the presenters: Xihong Lin is Professor of Biostatistics and Coordinating Director of the Program of Quantitative Genomics at the School of Public Health of Harvard University. Dr. Lin's research interests lie in statistical genetics and 'omics, especially the development and application of statistical and computational methods for analysis of high-throughput genetic and omics data in epidemiological and clinical studies, and statistical methods for analysis of correlated data such as longitudinal, clustered and family data. Dr. Lin's specific areas of expertise include statistical methods for genome-wide association studies and next generation sequencing association studies, genes and environment, mixed models, and nonparametric and semiparametric regression. She received the 2006 Presidents' Award for the outstanding statistician from the Committee of Presidents of Statistical Societies (COPSS) and the 2002 Mortimer Spiegelman Award for the outstanding biostatistician from the American Public Health Association. She is an elected fellow of the American Statistical Association, the Institute of Mathematical Statistics, and the International Statistical Institute. Dr. Lin was the Chair of COPSS between 2010 and 2012. She is currently a member of the Committee on Applied and Theoretical Statistics of the US National Academy of Sciences. Dr. Lin is a recipient of the MERIT (Method to Extend Research in Time) award from the National Institutes of Health, which provides long-term research grant support. She is the PI of the T32 training grant on interdisciplinary training in statistical genetics and computational biology. She has served on numerous editorial boards of statistical journals. She was formerly the Coordinating Editor of Biometrics and is currently co-editor of Statistics in Biosciences and Associate Editor of the Journal of the American Statistical Association and the American Journal of Human Genetics. She was a permanent member of the NIH study section on Biostatistical Methods and Study Designs (BMRD) and has served on a large number of other study sections at NIH and NSF.


Seunggeun (Shawn) Lee is an assistant professor of Biostatistics at the University of Michigan. He received his PhD in Biostatistics from the University of North Carolina at Chapel Hill and completed postdoctoral training at the Harvard School of Public Health. His research focuses on developing statistical and computational methods for the analysis of large-scale, high-dimensional genetic and genomic data, which is essential to better understand the genetic architecture of complex diseases and traits. He is a recipient of the NIH Pathway to Independence Award (K99/R00).

7. Analysis of Biomarkers for Prognosis and Response Prediction

Presenter: Patrick J. Heagerty, Professor and Associate Chair, Department of Biostatistics, University of Washington, Seattle, WA 98195. Email: heagerty@u.washington.edu
Course length: Half day

Outline/Description: Longitudinal studies allow investigators to correlate changes in time-dependent exposures or biomarkers with subsequent health outcomes. The use of baseline or time-dependent markers to predict a subsequent change in clinical status, such as transition to a diseased state, requires the formulation of appropriate classification and prediction error concepts. Similarly, the evaluation of markers that could be used to guide treatment requires specification of the operating characteristics associated with use of the marker. The first part of this course will introduce predictive accuracy concepts that allow evaluation of time-dependent sensitivity and specificity for prognosis of a subsequent event time. We will overview options that are appropriate both for baseline markers and for longitudinal markers. Methods will be illustrated using examples from HIV and cancer research and will highlight R packages that are currently available. Time permitting, the second part of this course will introduce statistical methods that can characterize the performance of a biomarker toward accurately guiding treatment choice and toward improving health outcomes when the marker is used to selectively target treatment. Examples will include the use of imaging information to guide surgical treatment and the use of genetic markers to select subjects for treatment.

Textbook/References:
Heagerty PJ, Lumley T, Pepe MS. Time-dependent ROC curves for censored survival data and a diagnostic marker. Biometrics 56:337-344, 2000.
Heagerty PJ, Zheng Y. Survival model predictive accuracy and ROC curves. Biometrics 61(1):92-105, 2005.
Saha P, Heagerty PJ. Time-dependent predictive accuracy in the presence of competing risks. Biometrics 66(4):999-1011, 2010.

About the presenter: Patrick Heagerty is Professor of Biostatistics, University of Washington, Seattle, WA. He has been the director of the center for biomedical studies at the University of Washington School of Medicine and Public Health. He is one of the leading experts on methods for longitudinal studies, including the evaluation of markers used to predict future clinical events. He has made significant contributions to many areas of research, including semi-parametric regression and estimating equations, marginal models and random effects for longitudinal data, dependence modeling for categorical time series, and hierarchical models for categorical spatial data. He is an elected fellow of the American Statistical Association and the Institute of Mathematical Statistics.
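To make the cumulative/dynamic time-dependent accuracy idea concrete, here is a minimal Python sketch with simulated data invented for illustration. It assumes no censoring, which is an oversimplification: at horizon t, cases are subjects with an event by t and controls are those still event-free, and AUC(t) is the concordance between the two groups. The censoring-aware estimators in the references (and their R implementations) are what one would use in practice.

```python
import numpy as np

def auc_at_t(marker, event_time, t):
    """Cumulative/dynamic AUC(t), assuming no censoring: cases have an event
    by time t, controls are event-free at t. Computed as the Mann-Whitney
    concordance of the marker between the two groups (ties count one half)."""
    case = event_time <= t
    cases, controls = marker[case], marker[~case]
    greater = (cases[:, None] > controls[None, :]).mean()
    ties = (cases[:, None] == controls[None, :]).mean()
    return greater + 0.5 * ties

# Toy data: larger marker values imply a higher event hazard.
rng = np.random.default_rng(1)
m = rng.normal(size=4000)
T = rng.exponential(np.exp(-m))   # mean event time decreases with the marker
print(round(auc_at_t(m, T, t=1.0), 2))  # well above 0.5: marker is prognostic
```

Sweeping t traces out how a baseline marker's discrimination evolves over follow-up; an uninformative marker gives AUC(t) near 0.5 at every horizon.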

Social Programs

2014 ICSA-KISS Applied Statistics Symposium, Portland, June 15-18

Opening Mixer: Sunday, June 15, 2014, 7 PM - 9 PM, Salon E, Lower Level 1

Banquet: Tuesday, June 17, 2014, 6:30 PM - 9:30 PM, JIN WAH Vietnamese & Chinese Seafood Restaurant, http://www.jinwah.com

Banquet Speech: "The World of Statistics"

After a successful International Year of Statistics 2013, we enter the new World of Statistics. This is a great opportunity to think of our profession and look forward to the impact statistical sciences can have on innovation and discoveries in sciences, engineering, business, and education. Are we going to be obsolete? Or omnipresent?

Dr. Sastry Pantula, Dean, College of Science, Oregon State University, and former President of the American Statistical Association

Sastry G. Pantula became dean of the College of Science at Oregon State University in the fall of 2013. Prior to that, he served as director of the National Science Foundation's Division of Mathematical Sciences from 2010 to 2013.

Pantula headed the statistics department at North Carolina State University (NCSU), where he served on the faculty for nearly 30 years; he also directed their Institute of Statistics. Pantula served as president of the American Statistical Association (ASA) in 2010. In addition to being an ASA fellow, he is a fellow of the American Association for the Advancement of Science (AAAS), a member of the honor societies Mu Sigma Rho and Phi Kappa Phi, and was inducted into the NCSU Academy of Outstanding Teachers in 1985.

As dean of Oregon State's College of Science and professor of statistics, Pantula provides leadership to world-class faculty in some of the university's most recognized disciplines, including nationally recognized programs in chemistry, informatics, integrative biology, marine studies, materials science, physics, and others.

During his tenure at NCSU, Pantula worked with his dean and the college foundation to create three $1 million endowments for distinguished professors. He also worked with colleagues and alumni to secure more than $7 million in funding from the National Science Foundation, other agencies, and industry to promote graduate student training and mentorship.

Pantula's research areas include time series analysis and econometric modeling, with a broad range of applications. He has worked with the National Science Foundation, the US Fish and Wildlife Service, the US Environmental Protection Agency, and the US Bureau of the Census on projects ranging from population estimates to detecting trends in global temperature.

As home to the core life, physical, mathematical, and statistical sciences, the College of Science has built a foundation of excellence. It helped Oregon State acquire the top ranking in the United States for conservation biology in recent years and receive top-10 rankings from the Chronicle of Higher Education for the Departments of Integrative Biology (formerly Zoology) and Science Education. The diversity of sciences in the College, including the mathematical and statistical sciences, provides innovative opportunities for fundamental and multidisciplinary research collaborations across campus and around the globe.

Pantula holds bachelor's and master's degrees in statistics from the Indian Statistical Institute in Kolkata, India, and a PhD in statistics from Iowa State University.

2014 ICSA China Statistics Conference, July 4 - July 5, 2014, Shanghai, China

2nd Announcement of the Conference (April 8, 2014)

To attract statistical researchers and students in China and other countries to present their work and share their experience with statistical colleagues, and to strengthen the connections between China and overseas statisticians, the 2014 ICSA China Statistics Conference will be organized by the Committee for ICSA Shanghai and hosted by East China Normal University (ECNU) from July 4 to July 5, 2014, in Shanghai, China.

The conference will invite leading statistical professionals in mainland China, Hong Kong, Taiwan, the United States, and worldwide to present their research work. It will cover a broad range of statistics, including mathematical statistics, applied statistics, biostatistics, and statistics in finance and economics, providing a good platform for statistical professionals all over the world to share their latest research and applications of statistics. The invited speakers include Prof. L. J. Wei (Harvard University), Prof. Tony Cai (University of Pennsylvania), Prof. Ying Lu (Stanford University), Prof. Ming-Hui Chen (University of Connecticut), Prof. Danyu Lin (University of North Carolina at Chapel Hill), and other distinguished statisticians.

The oral presentations at the conference will be conducted in either English or Chinese. Although the Program Committee recommends presentation slides in English, Chinese versions of the slides may also be used.

The program committee is working on the conference program, and more information will be distributed soon. Should you have any inquiries about the program, please contact Dr. Dejun Tang (dejun.tang@novartis.com) or Dr. Yankun Gong (yankun.gong@novartis.com).

For conference registration and hotel reservation, please contact Prof. Shujin Wu at ECNU (sjwu@stat.ecnu.edu.cn).

Program Committee & Local Organizing Committee
2014 ICSA China Statistics Conference


ICSA DINNER at 2014 JSM in Boston, MA

The ICSA will hold its annual members meeting on August 6 (Wednesday) at 6:00 PM in the Boston Convention & Exhibition Center, room CC-157B. An ICSA banquet will follow the members meeting at Osaka Japanese Sushi & Steak House, 14 Green St, Brookline, MA 02446, (617) 732-0088, http://brooklineosaka.com. Osaka is a Japanese fusion restaurant located in Brookline and can be reached via the MBTA subway Green Line "C" branch (Coolidge Corner stop). The restaurant features a cozy setting, superior cuisine, and elegant decor. The banquet menu will include Oyster 3-Ways, Rock Shrimp, Shrimp Tempura, Sushi and Sashimi Boat, Hibachi Seafood, Char-Grilled Sea Bass, and Lobster. Complimentary wine, sake, and soft drinks will be served, and a cash bar for extra drinks will be available. The restaurant also has a club dance floor that provides complimentary karaoke.

Scientific Program (Presenting Author)

Scientific Program (June 16th - June 18th)

Monday, June 16, 8:00 AM - 9:30 AM

Keynote Session I (Keynote)
Room: Salon E-F, Lower Level 1
Organizers: ICSA-KISS 2014 organizing committee
Chair: Dongseok Choi, Oregon Health & Science University

8:00 AM Welcome. Ying Lu, ICSA 2014 President

8:05 AM Congratulatory Address. George C. Tiao, ICSA Founding President

8:20 AM Keynote Lecture I. Robert Gentleman, Genentech

9:30 AM Floor Discussion

Monday, June 16, 10:00 AM - 12:00 PM

Session 1: Emerging Statistical Methods for Complex Data (Invited)
Room: Salon A, Lower Level 1
Organizer: Lan Xue, Oregon State University
Chair: Lan Xue, Oregon State University

10:00 AM Estimation of the Error Auto-Correlation Matrix in Semi-parametric Model for Brain fMRI Data. Chunming Zhang and Xiao Guo, University of Wisconsin-Madison

10:25 AM Kernel Additive Sliced Inverse Regression. Heng Lian, Nanyang Technological University

10:50 AM Variable Selection with Prior Information for Generalized Linear Models via the Prior LASSO Method. Yuan Jiang1, Yunxiao He2, and Heping Zhang3 (1Oregon State University, 2Nielsen Company, 3Yale University)

11:15 AM Bootstrapping High Dimensional Vector: Interplay Between Dependence and Dimensionality. Xianyang Zhang1 and Guang Cheng2 (1University of Missouri at Columbia, 2Purdue University)

11:40 AM Floor Discussion

Session 2: Statistical Methods for Sequencing Data Analysis (Invited)
Room: Salon B, Lower Level 1
Organizer: Yanming Di, Oregon State University
Chair: Gu Mi, Oregon State University

10:00 AM A Penalized Likelihood Approach for Robust Estimation of Isoform Expression. Hui Jiang1 and Julia Salzman2 (1University of Michigan, 2Stanford University)

10:25 AM Classification on Sequencing Data and its Applications on a Human Breast Cancer Dataset. Jun Li, University of Notre Dame

10:50 AM Power-Robustness Analysis of Statistical Models for RNA Sequencing Data. Gu Mi, Yanming Di, and Daniel W. Schafer, Oregon State University

11:15 AM Discussant: Wei Sun, University of North Carolina at Chapel Hill

11:40 AM Floor Discussion

Session 3: Modeling Big Biological Data with Complex Structures (Invited)
Room: Salon C, Lower Level 1
Organizer: Hua Tang, Stanford University
Chair: Marc Coram, Stanford University

10:00 AM High Dimensional Graphical Models Learning. Jie Peng and Ru Wang, University of California at Davis

10:25 AM Statistical Analysis of RNA Sequencing Data. Mingyao Li and Yu Hu, University of Pennsylvania

10:50 AM Quantifying the Role of Steric Constraints in Nucleosome Positioning. H. Tomas Rube and Jun S. Song, University of Illinois at Urbana-Champaign

11:15 AM Integrative Dynamic Omics Networks and Personalized Medicine. George I. Mias, Michigan State University

11:40 AM Floor Discussion

Session 4: Bayesian Approaches for Modeling Dynamic Non-Gaussian Responses (Invited)
Room: Salon D, Lower Level 1
Organizer: Xiaojing Wang, University of Connecticut
Chair: Xun Jiang, Amgen Inc.

10:00 AM Binary State Space Mixed Models with Flexible Link Functions. Dipak Dey1, Xun Jiang2, and Carlos Abanto-Valle3 (1University of Connecticut, 2Amgen Inc., 3Federal University of Rio de Janeiro)

10:25 AM Bayesian Spatial-Temporal Modeling of Ecological Zero-Inflated Count Data. Xia Wang1, Ming-Hui Chen2, Rita C. Kuo3, and Dipak K. Dey2 (1University of Cincinnati, 2University of Connecticut, 3Lawrence Berkeley National Laboratory)

10:50 AM Real-time Bayesian Parameter Estimation for Item Response Models. Ruby Chiu-Hsing Weng, National Chengchi University

11:15 AM Statistical Prediction for Virginia Lyme Disease Emergence Based on Spatio-temporal Count Data. Yuanyuan Duan, Jie Li, Yili Hong, Korine Kolivras, Stephen Prisley, James Campbell, and David Gaines, Virginia Tech

11:40 AM Floor Discussion

Session 5: Recent Advances in Astro-Statistics (Invited)
Room: Salon G, Lower Level 1
Organizer: Thomas Lee, University of California at Davis
Chair: Alexander Aue, University of California at Davis

10:00 AM Embedding the Big Bang Cosmological Model into a Bayesian Hierarchical Model for Supernova Light Curve Data. David van Dyk, Roberto Trotta, Xiyun Jiao, and Hikmatali Shariff, Imperial College London

10:25 AM Marrying Domain Knowledge and Statistical Methods. Ashish Mahabal, George Djorgovski, Matthew Graham, Ciro Donalek, and Andrew Drake, California Institute of Technology

10:50 AM Nonlinear Classification of X-Ray Binaries. Luke Bornn and Saku Vrtilek, Harvard University

11:15 AM Persistent Homology and the Topology of the Intergalactic Medium. Fabrizio Lecci, Carnegie Mellon University

11:40 AM Floor Discussion

Session 6: Statistical Methods and Application in Genetics (Invited)
Room: Salon H, Lower Level 1
Organizer: Ying Wei, Columbia University
Chair: Ying Wei, Columbia University

10:00 AM Identification of Homogeneous and Heterogeneous Covariate Structure in Pooled Cohort Studies. Xin Cheng1, Wenbin Lu2, and Mengling Liu1 (1New York University, 2North Carolina State University)

10:25 AM Gene Expression Analyses in Evaluating Translational Biomarkers from Drug-Induced Idiopathic Pulmonary Fibrosis in Animal Models. Wenfei Zhang, Yuefeng Lu, Tai-He Xia, Guillaume Wettstein, Jean-Pierre Bidouard, and Xavier Marniquet, Sanofi-aventis US LLC

10:50 AM DNA Methylation, Cell-Type Distribution, and EWAS. E. Andres Houseman, Oregon State University

11:15 AM Secondary Quantile Analysis for GWAS. Ying Wei1, Xiaoyu Song1, Mengling Liu2, and Iuliana Ionita-Laza1 (1Columbia University, 2New York University)

11:40 AM Floor Discussion

Session 7: Statistical Inference of Complex Associations in High-Dimensional Data (Invited)
Room: Salon I, Lower Level 1
Organizer: Jun Liu, Harvard University
Chair: Di Wu, Harvard University

10:00 AM Leveraging for Big Data Regression. Ping Ma, University of Georgia

10:25 AM Reference-free Metagenomics Analysis Using Matrix Factorization. Wenxuan Zhong and Xin Xing, University of Georgia

10:50 AM Big Data, Big Models, Big Problems: Statistical Principles and Practice at Scale. Alexander W. Blocker, Google

11:15 AM Floor Discussion

Session 8: Recent Developments in Survival Analysis (Invited)
Room: Eugene Room, Lower Level 1
Organizer: Qingxia (Cindy) Chen, Vanderbilt University
Chair: Qingxia (Cindy) Chen, Vanderbilt University

10:00 AM Bayesian Joint Modeling of Multi-dimensional Longitudinal and Survival Data with Applications to Cancer Clinical Trials. Ming-Hui Chen1, Danjie Zhang1, Joseph G. Ibrahim2, Mark E. Boye3, and Wei Shen3 (1University of Connecticut, 2University of North Carolina, 3Eli Lilly and Company)

10:25 AM Estimating Risk with Time-to-Event Data: An Application to the Women's Health Initiative. Dandan Liu1, Yingye Zheng2, Ross Prentice2, and Li Hsu2 (1Vanderbilt University, 2Fred Hutchinson Cancer Research Center)

10:50 AM Efficient Estimation of Nonparametric Genetic Risk Function with Censored Data. Yuanjia Wang1, Baosheng Liang2, and Donglin Zeng3 (1Columbia University, 2Beijing Normal University, 3University of North Carolina at Chapel Hill)

11:15 AM Support Vector Hazard Regression for Predicting Event Times Subject to Censoring. Xiaoxi Liu1, Yuanjia Wang2, and Donglin Zeng1 (1University of North Carolina, 2Columbia University)

11:40 AM Floor Discussion

Session 9: Industry Practice and Regulatory Pathway for Benefit-Risk Assessment of Medicinal Products (Invited)
Room: Portland Room, Lower Level 1
Organizers: Shihua Wen, AbbVie Inc.; Yijie Zhou, Merck & Co.
Chair: Yijie Zhou, Merck & Co.

10:00 AM Visual Communication and Assessment of Benefit-Risk for Medical Products. Jonathan D. Norton, MedImmune

10:25 AM Some Thoughts on Evaluation of Uncertainty in Benefit-Risk Assessment. Qi Jiang1, Haijun Ma1, Christy Chuang-Stein2, Weili He3, George Quartey4, John Scott5, and Shihua Wen6 (1Amgen Inc., 2Pfizer Inc., 3Merck & Co., 4Hoffmann-La Roche, 5United States Food and Drug Administration, 6AbbVie Inc.)

10:50 AM Current Concept of Benefit Risk Assessment of Medicine. Syed S. Islam, AbbVie Inc.

11:15 AM Discussant: Yang Bo, AbbVie Inc.

11:40 AM Floor Discussion


Session 10: Analysis of Observational Studies and Clinical Trials (Contributed)
Room: Salem Room, Lower Level 1
Chair: Naitee Ting, Boehringer-Ingelheim Company

10:00 AM Impact of Tuberculosis on Mortality Among HIV-Infected Patients Receiving Antiretroviral Therapy in Uganda: A Case Study in Propensity Score Analysis. Rong Chu1, Edward J. Mills2, Joseph Beyene3, Eleanor Pullenayegum4, Celestin Bakanda5, Jean B. Nachega6, and Lehana Thabane3 (1Agensys Inc. (Astellas), 2University of Ottawa/McMaster University, 3McMaster University, 4McMaster University/University of Toronto, 5The AIDS Support Organization, 6Stellenbosch University)

10:20 AM Ecological Momentary Assessment Methods to Increase Response and Adjust for Attrition in a Study of Middle School Students' Exposure to Alcohol Advertising. Steven Martino, Rebecca Collins, Stephanie Kovalchik, Kirsten Becker, Elizabeth D'Amico, William Shadel, and Marc Elliott, RAND Corporation

10:40 AM Is Poor Antisaccade Performance in Healthy First-Degree Relatives of Schizophrenics an Artifact of Study Design? Charity J. Morgan1, Mark F. Lenzenweger2, and Deborah L. Levy3 (1University of Alabama at Birmingham, 2State University of New York at Binghamton, 3McLean Hospital)

11:00 AM Analysis of a Vaccine Study in Animals Using Mitigated Fraction in SAS. Mathew Rosales, Experis

11:20 AM Competing Risks Survival Analysis for Efficacy Evaluation of Some-or-None Vaccines. Paul T. Edlefsen, Fred Hutchinson Cancer Research Center

11:40 AM Using Historical Data to Automatically Identify Air-Traffic Controller Behavior. Yuefeng Wu, University of Missouri at St. Louis

12:00 PM Floor Discussion

Monday, June 16, 1:30 PM - 3:10 PM

Session 11: Lifetime Data Analysis (Invited)
Room: Salon A, Lower Level 1
Organizer: Mei-Ling Ting Lee, University of Maryland
Chair: Mei-Ling Ting Lee, University of Maryland

1:30 PM Analysis of Multiple Type Recurrent Events When Only Partial Information Is Available for Some Subjects. Min Zhan and Jeffery Fink, University of Maryland

1:55 PM Cumulative Incidence Function under Two-Stage Randomization. Idil Yavuz1, Yu Cheng2, and Abdus Wahed2 (1Dokuz Eylul University, 2University of Pittsburgh)

2:20 PM Nonparametric Threshold Selection with Censored Survival Data. Xinhua Liu and Zhezhen Jin, Columbia University

2:45 PM Floor Discussion

Session 12: Safety Signal Detection and Safety Analysis (Invited)
Room: Salon B, Lower Level 1
Organizer: Qi Jiang, Amgen Inc.
Chair: Qi Jiang, Amgen Inc.

1:30 PM Evaluation of Statistical Methods for the Identification of Potential Safety Signals. Maggie Chen, Li Zhu, Padmaja Chiruvolu, Liying Zhang, and Qi Jiang, Amgen Inc.

1:55 PM Application of a Bayesian Method for Blinded Safety Monitoring and Signal Detection in Clinical Trials. Shihua Wen, Jyotirmoy Dey, Greg Ball, and Karolyn Kracht, AbbVie Inc.

2:20 PM Some Thoughts on the Choice of Metrics for Safety Evaluation. Steven Snapinn, Amgen Inc.

2:45 PM Hypothesis Testing on Safety Data: A Recurrent Event Approach. Qi Gong1 and Liang Fang2 (1Amgen Inc., 2Gilead Sciences)

3:10 PM Floor Discussion

Session 13: Survival and Recurrent Event Data Analysis (Invited)
Room: Salon C, Lower Level 1
Organizer: Chiung-Yu Huang, Johns Hopkins University
Chair: Chiung-Yu Huang, Johns Hopkins University

1:30 PM Survival Analysis without Survival Data. Gary Chan, University of Washington

1:55 PM Semiparametric Estimation for the Additive Hazards Model with Left-Truncated and Right-Censored Data. Chiung-Yu Huang1 and Jing Qin2 (1Johns Hopkins University, 2National Institute of Allergy and Infectious Diseases)

2:20 PM Nonparametric Method for Data of Recurrent Infections after Hematopoietic Cell Transplantation. Chi Hyun Lee1, Xianghua Luo1, Chiung-Yu Huang2, and Todd DeFor1 (1University of Minnesota, 2Johns Hopkins University)

2:45 PM Floor Discussion

Session 14: Statistical Analysis on Massive Data from Point Processes (Invited)
Room: Salon D, Lower Level 1
Organizer: Haonan Wang, Colorado State University
Chair: Chunming Zhang, University of Wisconsin-Madison

1:30 PM Identification of Synaptic Learning Rule from Ensemble Spiking Activities. Dong Song and Theodore W. Berger, University of Southern California

1:55 PM Intrinsically Weighted Means and Non-Ergodic Marked Point Processes. Alexander Malinowski1, Martin Schlather1, and Zhengjun Zhang2 (1University of Mannheim, 2University of Wisconsin)

2:20 PM Statistical Analysis for Unlabeled Data Objects. Ela Sienkiewicz and Haonan Wang, Colorado State University

2:45 PM Floor Discussion

Session 15: High Dimensional Inference (or Testing) (Invited)
Room: Salon G, Lower Level 1
Organizer: Pengsheng Ji, University of Georgia
Chair: Pengsheng Ji, University of Georgia

1:30 PM Adaptive Sparse Reduced-rank Regression. Zongming Ma and Tingni Sun, University of Pennsylvania

1:55 PM Variable Screening in Biothreat Detection Using Weighted Leverage Score. Wenxuan Zhong and Yiwen Liu, University of Georgia

2:20 PM Testing High-Dimensional Nonparametric Function with Application to Gene Set Analysis. Tao He, Ping-Shou Zhong, Yuehua Cui, and Vidyadhar Mandrekar, Michigan State University

2:45 PM Zero-Inflation in Clustered Binary Response Data: Mixed Model and Estimating Equation Approaches. Danping Liu, National Institutes of Health

3:10 PM Floor Discussion

Session 16: Phase II Clinical Trial Design with Survival Endpoint (Invited)
Room: Salon H, Lower Level 1
Organizer: Jianrong Wu, St. Jude Children's Research Hospital
Chair: Joan Hu, Simon Fraser University

1:30 PM Utility-Based Optimization of Schedule-Dose Regimes Based on the Times to Response and Toxicity. Peter F. Thall1, Hoang Q. Nguyen1, Thomas Braun2, and Muzaffar Qazilbash1 (1University of Texas MD Anderson Cancer Center, 2University of Michigan)

1:55 PM Bayesian Decision Theoretic Two-Stage Design in Phase II Clinical Trials with Survival Endpoint. Lili Zhao and Jeremy Taylor, University of Michigan

2:20 PM Single-Arm Phase II Group Sequential Trial Design with Survival Endpoint at a Fixed Time Point. Jianrong Wu and Xiaoping Xiong, St. Jude Children's Research Hospital

2:45 PM Floor Discussion

Session 17: Statistical Modeling of High-throughput Genomics Data (Invited)
Room: Salon I, Lower Level 1
Organizer: Mingyao Li, University of Pennsylvania School of Medicine
Chair: Mingyao Li, University of Pennsylvania

1:30 PM Learning Genetic Architecture of Complex Traits Across Populations. Marc Coram, Sophie Candille, and Hua Tang, Stanford University

1:55 PM A Bayesian Hierarchical Model to Detect Differentially Methylated Loci from Single Nucleotide Resolution Sequencing Data. Hao Feng, Karen Conneely, and Hao Wu, Emory University

2:20 PM Differential Isoform Expression Analysis in RNA-Seq Using Random-Effects Meta-Regression. Weihua Guan1, Rui Xiao2, Chun Li3, and Mingyao Li2 (1University of Minnesota, 2University of Pennsylvania, 3Vanderbilt University)

2:45 PM Allele-Specific Differential Methylation Analysis with Next Generation Methylation Sequencing Data. Fei Zou, University of North Carolina at Chapel Hill

3:10 PM Floor Discussion

Session 18: Statistical Applications in Finance (Invited)
Room: Portland Room, Lower Level 1
Organizer: Zheng Su, Deerfield Company
Chair: Zheng Su, Deerfield Company

1:30 PM A Stochastic Mixture Model for Economic Cycles. Haipeng Xing1 and Ning Sun2 (1State University of New York, 2IBM)

1:55 PM Statistical Modelling of Bidding Prices in Online Ad Position Auctions. Xiaoming Huo, Georgia Institute of Technology

2:20 PM Regression with Rank Covariates: A Distribution Guided Scores for Ranks. Do Hwan Park1, Yuneung Kim2, Johan Lim3, Sujung Choi4, and Hsun-Chih Kuo5 (1University of Maryland, 2Seoul National University, 3Auburn University, 4Ulsan National Institute of Science and Technology, 5National Chengchi University)

2:45 PM Optimal Sparse Volatility Matrix Estimation for High Dimensional Ito Processes with Measurement Errors. Minjing Tao1, Yazhen Wang2, and Harrison Zhou3 (1Florida State University, 2University of Wisconsin-Madison, 3Yale University)

3:10 PM Floor Discussion

Session 19: Hypothesis Testing (Contributed)
Room: Eugene Room, Lower Level 1
Chair: Fei Tan, Indiana University-Purdue University

1:30 PM A Score-type Test for Heterogeneity in Zero-inflated Models in a Stratified Population. Guanqun Cao1, Wei-Wen Hsu2, and David Todem3 (1Auburn University, 2Kansas State University, 3Michigan State University)

1:50 PM Inferences on Correlation Coefficients of Bivariate Log-normal Distributions. Guoyi Zhang1 and Zhongxue Chen2 (1University of New Mexico, 2Indiana University)

2:10 PM Testing Calibration of Risk Models at Extremes of Disease Risk. Minsun Song1, Peter Kraft2, Amit D. Joshi2, Myrto Barrdahl3, and Nilanjan Chatterjee1 (1National Cancer Institute, 2Harvard University, 3German Cancer Research Center)

2:30 PM Statistical Issues When Incidence Rates Are Extremely Low and Sample Sizes Very Big. Peter Hu and Haijun Ma, Amgen Inc.

2:50 PM Minimum Distance Regression Model Checking When Responses Are Missing at Random. Xiaoyu Li, Auburn University

3:10 PM Floor Discussion

Session 20: Design and Analysis of Clinical Trials (Contributed)
Room: Salem Room, Lower Level 1
Chair: Amei Amei, University of Nevada at Las Vegas

1:30 PM Application of Bayesian Approach in Assessing Rare Adverse Events during a Clinical Study. Grace Li, Karen Price, Haoda Fu, and David Manner, Eli Lilly and Company

1:50 PM A Simplified Varying-Stage Adaptive Phase II/III Clinical Trial Design. Gaohong Dong, Novartis Pharmaceuticals Corporation

2:10 PM Improving Multiple Comparison Procedures with Coprimary Endpoints by Generalized Simes Tests. Hua Li1, Willi Maurer1, Werner Brannath2, and Frank Bretz1 (1Novartis Pharmaceuticals Corporation, 2University of Bremen)

2:30 PM Efficient Design for Cluster Randomized Trials with Binary Outcomes. Sheng Wu, Weng Kee Wong, and Catherine Crespi, University of California at Los Angeles

2:50 PM Zero Event and Continuity Correction in Meta-Analyses of Rare Events Using Mantel-Haenszel Odds Ratio and Risk Difference. Tianyue Zhou, Sanofi-aventis US LLC

3:10 PM Floor Discussion

Monday, June 16, 3:30 PM - 5:10 PM

Session 21: New Methods for Big Data (Invited)
Room: Salon A, Lower Level 1
Organizer: Yichao Wu, North Carolina State University
Chair: Yichao Wu, North Carolina State University

3:30 PM Sure Independence Screening for Gaussian Graphical Models. Shikai Luo1, Daniela Witten2, and Rui Song1 (1North Carolina State University, 2University of Washington)

3:55 PM Case-Specific Random Forests. Ruo Xu1, Dan Nettleton2, and Daniel J. Nordman2 (1Google, 2Iowa State University)

4:20 PM Uncertainty Quantification for Massive Data Problems Using Generalized Fiducial Inference. Randy C. S. Lai1, Jan Hannig2, and Thomas C. M. Lee1 (1University of California at Davis, 2University of North Carolina at Chapel Hill)

4:45 PM OEM Algorithm for Big Data. Xiao Nie and Peter Z. G. Qian, University of Wisconsin-Madison

5:10 PM Floor Discussion

Session 22: New Statistical Methods for Analysis of High Dimensional Genomic Data (Invited)
Room: Salon B, Lower Level 1
Organizer: Michael C. Wu, Fred Hutchinson Cancer Research Center
Chair: Michael C. Wu, Fred Hutchinson Cancer Research Center

3:30 PM Integrative Modeling of Multi-Platform Genomic Data under the Framework of Mediation Analysis. Yen-Tsung Huang, Brown University

3:55 PM Estimation of High Dimensional Directed Acyclic Graphs Using eQTL Data. Wei Sun1 and Min Jin Ha2 (1University of North Carolina at Chapel Hill, 2University of Texas MD Anderson Cancer Center)

4:20 PM Prioritizing Disease Genes from Genome-wide Association Studies Through Dynamic Co-expression Networks. Lin Hou1, Min Chen2, Clarence Zhang3, Judy Cho4, and Hongyu Zhao1 (1Yale University, 2University of Texas at Dallas, 3Bristol-Myers Squibb, 4Mount Sinai Medical Center)

4:45 PM Kernel Machine Methods for Joint Testing and Integrative Analysis of Genome Wide Methylation and Genotyping Studies. Ni Zhao and Michael Wu, Fred Hutchinson Cancer Research Center

5:10 PM Floor Discussion

Session 23: Recent Advances in Analysis of Longitudinal Data with Informative Observation Process (Invited)
Room: Salon C, Lower Level 1
Organizer: Jing Ning, University of Texas MD Anderson Cancer Center
Chair: Weining Shen, University of Texas MD Anderson Cancer Center

3:30 PM Joint Modeling of Alternating Recurrent Transition Times. Liang Li, University of Texas MD Anderson Cancer Center

3:55 PM Regression Analysis of Panel Count Data with Informative Observation Times. Yang Li1, Xin He2, Haiying Wang3, and Jianguo Sun4 (1University of North Carolina at Charlotte, 2University of Maryland, 3University of New Hampshire, 4University of Missouri at Columbia)

4:20 PM Envelope Linear Mixed Model. Xin Zhang, University of Minnesota

4:45 PM Regression Analysis of Longitudinal Data with Irregular and Informative Observation Times. Yong Chen, Jing Ning, and Chunyan Cai, University of Texas Health Science Center at Houston

5:10 PM Floor Discussion

Session 24: Bayesian Models for High Dimensional Complex Data (Invited)
Room: Salon D, Lower Level 1
Organizer: Juhee Lee, University of California at Santa Cruz
Chair: Juhee Lee, University of California at Santa Cruz

3:30 PM A Bayesian Feature Allocation Model for Tumor Heterogeneity. Juhee Lee1, Peter Mueller2, Yuan Ji3, and Kamalakar Gulukota4 (1University of California at Santa Cruz, 2University of Texas at Austin, 3University of Chicago, 4NorthShore University HealthSystem)

3:55 PM Some Results on the One-Way ANOVA Model with an Increasing Number of Groups. Feng Liang, University of Illinois at Urbana-Champaign

4:20 PM Bayesian Graphical Models for Differential Pathways. Riten Mitra1, Peter Mueller2, and Yuan Ji3 (1University of Louisville, 2University of Texas at Austin, 3NorthShore University HealthSystem/University of Chicago)

4:45 PM Latent Space Models for Dynamic Networks. Yuguo Chen, University of Illinois at Urbana-Champaign

5:10 PM Floor Discussion

Session 25: Statistical Methods for Network Analysis (Invited)
Room: Salon G, Lower Level 1
Organizer: Yunpeng Zhao, George Mason University
Chair: Yunpeng Zhao, George Mason University

3:30 PM Consistency of Co-clustering for Exchangeable Graph and Array Data. David S. Choi1 and Patrick J. Wolfe2 (1Carnegie Mellon University, 2University College London)

3:55 PM Laplacian Shrinkage for Inverse Covariance Estimation from Heterogeneous Populations. Takumi Saegusa and Ali Shojaie, University of Washington

4:20 PM Estimating Signature Subgraphs in Samples of Labeled Graphs. Juhee Cho and Karl Rohe, University of Wisconsin-Madison

4:45 PM Fast Hierarchical Modeling for Recommender Systems. Patrick Perry, New York University

5:10 PM Floor Discussion

Session 26: New Analysis Methods for Understanding Complex Diseases and Biology (Invited)
Room: Salon H, Lower Level 1
Organizer: Wenyi Wang, University of Texas MD Anderson Cancer Center
Chair: Wenyi Wang, University of Texas MD Anderson Cancer Center

3:30 PM Data Integration for Identifying Clinically Important Long Non-coding RNA in Cancer. Yiwen Chen1, Zhou Du2, Teng Fei1, Roel G. W. Verhaak3, Yong Zhang2, Myles Brown4, and X. Shirley Liu4 (1Dana-Farber Cancer Institute, 2Tongji University, 3University of Texas MD Anderson Cancer Center, 4Dana-Farber Cancer Institute & Harvard University)

3:55 PM Data Integration for Genetics-Based Drug Repurposing in Complex Diseases. Di Wu, Harvard University

4:30 PM Comparative Meta-Analysis of Prognostic Gene Signatures for Late-Stage Ovarian Cancer. Levi Waldron, Hunter College

4:45 PM Studying Spatial Organizations of Chromosomes via Parametric Model. Ming Hu1, Yu Zhu2, Zhaohui Steve Qin3, Ke Deng4, and Jun S. Liu5 (1New York University, 2Purdue University, 3Emory University, 4Tsinghua University, 5Harvard University)

5:10 PM Floor Discussion

Session 27: Recent Advances in Time Series Analysis (Invited)
Room: Salon I, Lower Level 1
Organizer: Mikyoung Jun, Texas A&M University
Chair: Zhengjun Zhang, University of Wisconsin

3:30 PM Time Series Models for Spherical Data with Applications in Structural Biochemistry. Jay Breidt, Daniel Hernandez-Stumpfhauser, and Mark van der Woerd, Colorado State University

3:55 PM Semiparametric Estimation of Spectral Density Function with Irregular Data. Shu Yang and Zhengyuan Zhu, Iowa State University

4:20 PM On the Prediction of Stationary Functional Time Series. Alexander Aue1, Diogo Dubart Norinho2, and Siegfried Hormann3 (1University of California at Davis, 2University College London, 3Universite Libre de Bruxelles)

4:45 PM A Composite Likelihood-based Approach for Multiple Change-point Estimation in Multivariate Time Series Models. Chun Yip Yau and Ting Fung Ma, Chinese University of Hong Kong

5:10 PM Floor Discussion

Session 28: Analysis of Correlated Longitudinal and Survival Data (Invited)
Room: Eugene Room, Lower Level 1
Organizer: Jingjing Wu, University of Calgary
Chair: Jingjing Wu, University of Calgary

3:30 PM Analysis of a Non-Randomized Longitudinal Quality of Life Trial. Mounir Mesbah, University of Paris 6

3:55 PM Power and Sample Size Calculations for Evaluating Mediation Effects with Multiple Mediators in Longitudinal Studies. Cuiling Wang, Albert Einstein College of Medicine

4:20 PM Distribution-free First-hitting-time Based Threshold Regressions for Lifetime Data. Mei-Ling Ting Lee1 and G. Alex Whitmore2 (1University of Maryland, 2McGill University)

4:45 PM Joint Modeling of Survival Data and Mismeasured Longitudinal Data Using the Proportional Odds Model. Juan Xiong1, Wenqing He1, and Grace Yi2 (1University of Western Ontario, 2University of Waterloo)

5:10 PM Floor Discussion


Session 29: Clinical Pharmacology (Invited)
Room: Portland Room, Lower Level 1
Organizer: Christine Wang, Amgen
Chair: Christine Wang, Amgen

3:30 PM Truly Personalizing Medicine. Mike D. Hale, Amgen Inc.

3:55 PM What Do Statisticians Do in Clinical Pharmacology? Brian Smith, Amgen Inc.

4:20 PM The Use of Modeling and Simulation to Bridge Different Dosing Regimens: A Case Study. Chyi-Hung Hsu and Jose Pinheiro, Janssen Research & Development

4:45 PM A Comparison of FDA and EMA Recommended Models for Bioequivalence Studies. Yongwu Shao, Lingling Han, Bing Gao, Sally Zhao, Susan Guo, Lijie Zhong, and Liang Fang, Gilead Sciences

5:10 PM Floor Discussion

Session 30: Sample Size Estimation (Contributed)
Room: Salem Room, Lower Level 1
Chair: Antai Wang, New Jersey Institute of Technology

3:30 PM Sample Size Calculation with Semiparametric Analysis of Long Term and Short Term Hazards
Yi Wang, Novartis Pharmaceuticals Corporation

3:50 PM Sample Size and Decision Criteria for Phase IIB Studies with Active Control
Xia Xu, Merck & Co.

4:10 PM Sample Size Determination for Clinical Trials to Correlate Outcomes with Potential Predictors
Su Chen, Xin Wang and Ying Zhang, AbbVie Inc.

4:30 PM Sample Size Re-Estimation at Interim Analysis in Oncology Trials with a Time-to-Event Endpoint
Ian (Yi) Zhang, Sunovion Pharmaceuticals Inc.

4:50 PM Statistical Inference and Sample Size Calculation for Paired Binary Outcomes with Missing Data
Song Zhang, University of Texas Southwestern Medical Center

5:10 PM Floor Discussion

Tuesday, June 17, 8:20 AM - 9:30 AM

Keynote Session II (Keynote)
Room: Salon E-F, Lower Level 1
Organizers: ICSA-KISS 2014 organizing committee
Chair: Rochelle Fu, Oregon Health & Science University

8:20 AM Keynote Lecture II
Sharon-Lise Normand, Harvard University

9:30 AM Floor Discussion

Tuesday, June 17, 10:00 AM - 12:00 PM

Session 31: Predictions in Clinical Trials (Invited)
Room: Salon A, Lower Level 1
Organizer: Yimei Li, University of Pennsylvania
Chair: Daniel Heitjan, University of Pennsylvania

10:00 AM Predicting Smoking Cessation Outcomes Beyond Clinical Trials
Yimei Li, E. Paul Wileyto and Daniel F. Heitjan, University of Pennsylvania

10:25 AM Bayesian Event and Time Landmark Estimation in Clinical Trials When Responses Are Failure Time Data
Haoda Fu, Luping Zhao and Yanping Wang, Eli Lilly and Company

10:50 AM Predicting the Probability of Future Clinical Study Success Based on the Evidence from Electronic Medical Record (EMR) Data
Haoda Fu1 and Nan Jia2; 1Eli Lilly and Company, 2University of Southern California

11:15 AM Weibull Cure-Mixture Model for the Prediction of Event Times in Randomized Clinical Trials
Gui-shuang Ying1, Qiang Zhang2, Yimei Li1 and Daniel F. Heitjan1; 1University of Pennsylvania, 2Radiation Therapy Oncology Group Statistical Center

11:40 AM Floor Discussion

Session 32: Recent Advances in Statistical Genetics (Invited)
Room: Salon B, Lower Level 1
Organizer: Taesung Park, Seoul National University
Chair: Taesung Park, Seoul National University

10:00 AM Longitudinal Exome-Focused GWAS of Alcohol Use in a Veteran Cohort
Zuoheng Wang, Zhong Wang, Amy C. Justice and Ke Xu, Yale University

10:25 AM Type I Error in Regression-based Genetic Model Building
Heejong Sung1, Alexa J.M. Sorant1, Bhoom Suktitipat2 and Alexander F. Wilson1; 1National Institutes of Health, 2Mahidol University

10:50 AM GMDR: A Conceptual Framework for Detection of Multifactor Interactions Underlying Complex Traits
Xiang-Yang Lou, University of Alabama at Birmingham

11:15 AM Gene-Gene Interaction Analysis for Rare Variants: Application to T2D Exome Sequencing Data
Taesung Park1, Min-Seok Kwon1 and Seung Yeoun Lee2; 1Seoul National University, 2Sejong University

11:40 AM Floor Discussion

Session 33: Structured Approach to High Dimensional Data with Sparsity and Low Rank Factorization (Invited)
Room: Salon C, Lower Level 1
Organizer: Yoonkyung Lee, Ohio State University
Chair: Yoonkyung Lee, Ohio State University

10:00 AM Two-way Regularized Matrix Decomposition
Jianhua Huang, Texas A&M University


10:25 AM Tensor Regression with Applications in Neuroimaging Analysis
Hua Zhou1, Lexin Li1 and Hongtu Zhu2; 1North Carolina State University, 2University of North Carolina at Chapel Hill

10:50 AM RKHS-Embedding Based Feature Screening for High-Dimensional Data
Krishnakumar Balasubramanian1, Bharath Sriperumbudur2 and Guy Lebanon1; 1Georgia Institute of Technology, 2Pennsylvania State University

11:15 AM Sparse Conditional Graphical Models for Structured Genetic Datasets
Hyonho Chun, Purdue University

11:40 AM Floor Discussion

Session 34: Recent Developments in Dimension Reduction, Variable Selection and Their Applications (Invited)
Room: Salon D, Lower Level 1
Organizer: Xiangrong Yin, University of Georgia
Chair: Pengsheng Ji, University of Georgia

10:00 AM Variable Selection and Model Estimation via Subtle Uprooting
Xiaogang Su, University of Texas at El Paso

10:25 AM Robust Variable Selection Through Dimension Reduction
Qin Wang, Virginia Commonwealth University

10:50 AM Sparse Envelope Model: Efficient Estimation and Response Variable Selection in Multivariate Linear Regression
Zhihua Su1, Guangyu Zhu1 and Xin Chen2; 1University of Florida, 2National University of Singapore

11:15 AM Floor Discussion

Session 35: Post-Discontinuation Treatment in Randomized Clinical Trials (Invited)
Room: Salon G, Lower Level 1
Organizer: Li Li, Research Scientist, Eli Lilly and Company
Chair: Li Li, Eli Lilly and Company

10:00 AM Marginal Structure Model with Adaptive Truncation in Estimating the Initial Treatment Effect with Informative Censoring by Subsequent Therapy
Jingyi Liu1, Li Li1, Xiaofei Bai2 and Douglas Faries1; 1Eli Lilly and Company, 2North Carolina State University

10:25 AM Quantile Regression Adjusting for Dependent Censoring from Semi-Competing Risks
Ruosha Li1 and Limin Peng2; 1University of Pittsburgh, 2Emory University

10:50 AM Overview of Crossover Design
Ming Zhu, AbbVie Inc.

11:15 AM Cross-Payer Effects of Medicaid LTSS on Medicare Resource Use using Propensity Score Risk Profiling
Yi Huang, Anthony Tucker and Karen Johnson, University of Maryland

11:40 AM Floor Discussion

Session 36: New Advances in Semi-parametric Modeling and Survival Analysis (Invited)
Room: Salon H, Lower Level 1
Organizer: Yichuan Zhao, Georgia State University
Chair: Xuelin Huang, University of Texas MD Anderson Cancer Center

10:00 AM Bayesian Partial Linear Model for Skewed Longitudinal Data
Yuanyuan Tang1, Debajyoti Sinha2, Debdeep Pati2, Stuart Lipsitz3 and Steven Lipshultz4; 1AbbVie Inc., 2Florida State University, 3Brigham and Women's Hospital, 4University of Miami

10:25 AM Nonparametric Inference for Inverse Probability Weighted Estimators with a Randomly Truncated Sample
Xu Zhang, University of Mississippi

10:50 AM Modeling Time-Varying Effects for High-Dimensional Covariates: A New Gateaux-Differential Boosting Approach
Kevin He, Yi Li and Ji Zhu, University of Michigan

11:15 AM Flexible Modeling of Survival Data with Covariates Subject to Detection Limits via Multiple Imputation
Paul Bernhardt1, Judy Wang2 and Daowen Zhang2; 1Villanova University, 2North Carolina State University

11:40 AM Floor Discussion

Session 37: High-dimensional Data Analysis: Theory and Application (Invited)
Room: Salon I, Lower Level 1
Organizer: Yang Feng, Columbia University
Chair: Yang Feng, Columbia University

10:00 AM Structured Functional Additive Regression in Reproducing Kernel Hilbert Spaces
Hao Helen Zhang, University of Arizona

10:25 AM High-Dimensional Thresholded Regression and Shrinkage Effect
Zemin Zheng, Yingying Fan and Jinchi Lv, University of Southern California

10:50 AM Local Independence Feature Screening for Nonparametric and Semiparametric Models by Marginal Empirical Likelihood
Jinyuan Chang1, Cheng Yong Tang2 and Yichao Wu3; 1University of Melbourne, 2University of Colorado Denver, 3North Carolina State University

11:15 AM The Fused Kolmogorov Filter: A Nonparametric Model-Free Screening Method
Qing Mai1 and Hui Zou2; 1Florida State University, 2University of Minnesota

11:40 AM Floor Discussion

Session 38: Leading Across Boundaries: Leadership Development for Statisticians (Invited Discussion Panel)
Room: Eugene Room, Lower Level 1
Organizers: Ming-Dauh Wang, Eli Lilly and Company; Rochelle Fu, Oregon Health & Science University (fur@ohsu.edu)
Chair: Ming-Dauh Wang, Eli Lilly and Company

Topic: The panel will discuss issues related to the importance of leadership, barriers to leadership, overcoming barriers, communication, and sociability.


Panel: Xiao-Li Meng, Harvard University

Dipak Dey, University of Connecticut

Soonmin Park, Eli Lilly and Company

James Hung, United States Food and Drug Administration

Walter Offen, AbbVie Inc.

Session 39: Recent Advances in Adaptive Designs in Early Phase Trials (Invited)
Room: Portland Room, Lower Level 1
Organizer: Ken Cheung, Columbia University
Chair: Ken Cheung, Columbia University

10:00 AM A Toxicity-Adaptive Isotonic Design for Combination Therapy in Oncology
Rui Qin, Mayo Clinic

10:25 AM Calibration of the Likelihood Continual Reassessment Method for Phase I Clinical Trials
Shing Lee1, Xiaoyu Jia2 and Ying Kuen Cheung1; 1Columbia University, 2Boehringer Ingelheim Pharmaceuticals

10:50 AM Sequential Subset Selection Procedure of Random Subset Size for Early Phase Clinical Trials
Cheng-Shiun Leu and Bruce Levin, Columbia University

11:15 AM Search Procedures for the MTD in Phase I Trials
Shelemyahu Zacks, Binghamton University

11:40 AM Floor Discussion

Session 40: High Dimensional Regression/Machine Learning (Contributed)
Room: Salem Room, Lower Level 1
Chair: Hanxiang Peng, Indiana University-Purdue University

10:00 AM Variable Selection for High-Dimensional Nonparametric Ordinary Differential Equation Models With Applications to Dynamic Gene Regulatory Networks
Hongqi Xue1, Tao Lu2, Hua Liang3 and Hulin Wu1; 1University of Rochester, 2State University of New York at Albany, 3George Washington University

10:20 AM BigData: Sign Cauchy Projections and Chi-Square Kernels
Ping Li1, Gennady Samorodnitsky2 and John Hopcroft2; 1Rutgers University, 2Cornell University

10:40 AM A Sparse Linear Discriminant Analysis Method with Asymptotic Optimality for Multiclass Classification
Ruiyan Luo and Xin Qi, Georgia State University

11:00 AM Generalized Hidden Markov Model for Variant Detection
Yichen Cheng, James Dai and Charles Kooperberg, Fred Hutchinson Cancer Research Center

11:20 AM Large-Scale Joint Trait Risk Prediction for Mini-exome Sequence Data
Gengxin Li, Wright State University

11:40 AM Rank Estimation and Recovery of Low-rank Matrices for Factor Model with Heteroscedastic Noise
Jingshu Wang and Art B. Owen, Stanford University

12:00 PM Floor Discussion

Tuesday, June 17, 1:30 PM - 3:10 PM

Session 41: Distributional Inference and its Impact on Statistical Theory and Practice (Invited)
Room: Salon A, Lower Level 1
Organizers: Min-ge Xie, Rutgers University; Thomas Lee, University of California at Davis (thomascmlee@gmail.com)
Chair: Min-ge Xie, Rutgers University

1:30 PM Stat Wars, Episode IV: A New Hope (For Objective Inference)
Keli Liu and Xiao-Li Meng, Harvard University

1:55 PM Higher Order Asymptotics for Generalized Fiducial Inference
Abhishek Pal Majumdar and Jan Hannig, University of North Carolina at Chapel Hill

2:20 PM Generalized Inferential Models
Ryan Martin, University of Illinois at Chicago

2:45 PM Formal Definition of Reference Priors under a General Class of Divergence
Dongchu Sun, University of Missouri

3:10 PM Floor Discussion

Session 42: Applications of Spatial Modeling and Imaging Data (Invited)
Room: Salon B, Lower Level 1
Organizer: Karen Kafadar, Indiana University
Chair: Karen Kafadar, Indiana University

1:30 PM Spatial Bayesian Variable Selection and Shrinkage in High-dimensional Covariate Spaces with Application to fMRI
Fan Li1, Tingting Zhang (co-first author)2, Quanli Wang1 and James Coan2; 1Duke University, 2University of Virginia

1:55 PM A Hierarchical Model for Simultaneous Detection and Estimation in Multi-Subject fMRI Studies
David Degras1 and Martin Lindquist2; 1DePaul University, 2Johns Hopkins University

2:20 PM On the Relevance of Accounting for Spatial Correlation: A Case Study from Florida
Linda J. Young1 and Emily Leary2; 1USDA NASS RDD, 2University of Florida

2:45 PM Statistical Approaches for Calibration of Climate Models
Gabriel Huerta1, Charles Jackson2 and Alvaro Nosedal1; 1University of New Mexico, 2University of Texas at Austin

3:10 PM Floor Discussion

Session 43: Recent Development in Survival Analysis and Statistical Genetics (Invited)
Room: Salon C, Lower Level 1
Organizers: Junlong Li, Harvard University; Kyu Ha Lee, Harvard University
Chair: Junlong Li, Harvard University

1:30 PM Restricted Survival Time and Non-proportional Hazards
Zhigang Zhang, Memorial Sloan Kettering Cancer Center


1:55 PM Empirical Null using Mixture Distributions and Its Application in Local False Discovery Rate
DoHwan Park, University of Maryland

2:20 PM A Bayesian Illness-Death Model for the Analysis of Correlated Semi-Competing Risks Data
Kyu Ha Lee1, Sebastien Haneuse1, Deborah Schrag2 and Francesca Dominici1; 1Harvard University, 2Dana-Farber Cancer Institute

2:45 PM Detection of Chromosome Copy Number Variations in Multiple Sequences
Xiaoyi Min, Chi Song and Heping Zhang, Yale University

3:10 PM Floor Discussion

Session 44: Bayesian Methods and Applications in Clinical Trials with Small Population (Invited)
Room: Salon D, Lower Level 1
Organizer: Alan Chiang, Eli Lilly and Company
Chair: Ming-Dauh Wang, Eli Lilly and Company

1:30 PM Applications of Bayesian Meta-Analytic Approach at Novartis
Qiuling Ally He, Roland Fisch and David Ohlssen, Novartis Pharmaceuticals Corporation

1:55 PM Subgroup-Based Adaptive (SUBA) Designs for Multi-Arm Biomarker Trials
Yanxun Xu1, Lorenzo Trippa2, Peter Mueller1 and Yuan Ji3; 1University of Texas at Austin, 2Harvard University, 3University of Texas at Austin

2:20 PM Innovative Designs and Practical Considerations for Pediatric Studies
Alan Y. Chiang, Eli Lilly and Company

2:45 PM Discussant: Ming-Dauh Wang, Eli Lilly and Company

3:10 PM Floor Discussion

Session 45: Recent Developments in Assessing Predictive Models in Survival Analysis (Invited)
Room: Salon G, Lower Level 1
Organizer: Ming Wang, Penn State College of Medicine
Chair: Lijun Zhang, Penn State College of Medicine

1:30 PM partDSA for Deriving Survival Risk Groups: Ensemble Learning and Variable Selection
Annette Molinaro1, Adam Olshen1 and Robert Strawderman2; 1University of California at San Francisco, 2University of Rochester

1:55 PM Predictive Accuracy of Time-Dependent Markers for Survival Outcomes
Li Chen1, Donglin Zeng2 and Danyu Lin2; 1University of Kentucky, 2University of North Carolina at Chapel Hill

2:20 PM Estimating the Effectiveness in HIV Prevention Trials by Incorporating the Exposure Process: Application to HPTN 035 Data
Jingyang Zhang1 and Elizabeth R. Brown2; 1Fred Hutchinson Cancer Research Center, 2Fred Hutchinson Cancer Research Center/University of Washington

2:45 PM Estimation of Predictive Accuracy of Survival Regression Models Adjusting for Dependent Censoring and/or High-Dimensional Data
Ming Wang1 and Qi Long2; 1Penn State College of Medicine, 2Emory University

3:10 PM Floor Discussion

Session 46: Missing Data: the Interface between Survey Sampling and Biostatistics (Invited)
Room: Salon H, Lower Level 1
Organizer: Jiwei Zhao, University of Waterloo
Chair: Peisong Han, University of Waterloo

1:30 PM Likelihood-based Inference with Missing Data Under Missing-at-random
Shu Yang and Jae Kwang Kim, Iowa State University

1:55 PM Generalized Method of Moments Estimator Based On Semiparametric Quantile Regression Imputation
Cindy Yu and Senniang Chen, Iowa State University

2:20 PM A New Estimation with Minimum Trace of Asymptotic Covariance Matrix for Incomplete Longitudinal Data with a Surrogate Process
Baojiang Chen1 and Jing Qin2; 1University of Nebraska, 2National Institutes of Health

2:45 PM Adaptive Multi-Phase Sampling for Asymptotically-Optimal Mean Score Analyses
Michael McIsaac1 and Richard Cook2; 1Queen's University, 2University of Waterloo

3:10 PM Floor Discussion

Session 47: New Statistical Methods for Comparative Effectiveness Research and Personalized Medicine (Invited)
Room: Salon I, Lower Level 1
Organizer: Jane Paik Kim, Stanford University
Chair: Jane Paik Kim, Stanford University

1:30 PM Efficient Design for Prospective Observational Studies
Yu Shen1, Hao Liu2, Jing Ning3 and Jing Qin4; 1University of Texas MD Anderson Cancer Center, 2Baylor College of Medicine, 3University of Texas MD Anderson Cancer Center, 4National Institutes of Health

1:55 PM Choice between Superiority and Non-inferiority in Comparative Effectiveness Experiments
Mei-Chiung Shih1, Tze Leung Lai2 and Philip W. Lavori2; 1VA Cooperative Studies Program & Stanford University, 2Stanford University

2:20 PM An Adaptive Design Approach for Studying Dynamic Treatment Regimes in a Pragmatic Trials Setting
Mike Baiocchi, Jane Paik and Tze Lai, Stanford University

2:45 PM Improving Efficiency in the Estimation of the Effect of Treatment on a Multinomial Outcome
Iván Díaz, Michael Rosenblum and Elizabeth Colantuoni, Johns Hopkins University

3:10 PM Floor Discussion

Session 48: Student Award Session 1 (Invited)
Room: Portland Room, Lower Level 1
Organizer: ICSA-KISS 2014 Student Paper Award Committee
Chair: Zhezhen Jin, Columbia University


1:30 PM Regularization After Retention in Ultrahigh Dimensional Linear Regression Models
Haolei Weng1, Yang Feng1 and Xingye Qiao2; 1Columbia University, 2Binghamton University

1:55 PM Personalized Dose Finding Using Outcome Weighted Learning
Guanhua Chen, Donglin Zeng and Michael R. Kosorok, University of North Carolina at Chapel Hill

2:20 PM Survival Rates Prediction When Training Data and Target Data Have Different Measurement Error
Cheng Zheng and Yingye Zheng, Fred Hutchinson Cancer Research Center

2:45 PM Hard Thresholded Regression Via Linear Programming
Qiang Sun, University of North Carolina at Chapel Hill

3:10 PM Floor Discussion

Session 49: Network Analysis/Unsupervised Methods (Contributed)
Room: Eugene Room, Lower Level 1
Chair: Chunming Zhang, University of Wisconsin-Madison

1:30 PM Community Detection in Multilayer Networks: A Hypothesis Testing Approach
James D. Wilson, Shankar Bhamidi and Andrew B. Nobel, University of North Carolina at Chapel Hill

1:50 PM Network Enrichment Analysis with Incomplete Network Information
Jing Ma1, Ali Shojaie2 and George Michailidis1; 1University of Michigan, 2University of Washington

2:10 PM Estimation of A Linear Model with Fuzzy Data Treated as Special Functional Data
Wang Dabuxilatu, Guangzhou University

2:30 PM Efficient Estimation of Sparse Directed Acyclic Graphs Under Compounded Poisson Data
Sung Won Han and Hua Zhong, New York University

2:50 PM Asymptotically Normal and Efficient Estimation of Covariate-Adjusted Gaussian Graphical Model
Mengjie Chen, Zhao Ren, Hongyu Zhao and Harrison Zhou, Yale University

3:10 PM Floor Discussion

Session 50: Personalized Medicine and Adaptive Design (Contributed)
Room: Salem Room, Lower Level 1
Chair: Danping Liu, National Institutes of Health

1:30 PM MicroRNA Array Normalization
Li-Xuan Qin and Qin Zhou, Memorial Sloan Kettering Cancer Center

1:50 PM Combining Multiple Biomarker Models with Covariates in Logistic Regression Using Modified ARM (Adaptive Regression by Mixing) Approach
Yanping Qiu1 and Rong Liu2; 1Merck & Co., 2Bayer HealthCare

2:10 PM A New Association Test for Case-Control GWAS Based on Disease Allele Selection
Zhongxue Chen, Indiana University

2:30 PM On Classification Methods for Personalized Medicine and Individualized Treatment Rules
Daniel Rubin, United States Food and Drug Administration

2:50 PM Bayesian Adaptive Design for Dose-Finding Studies with Delayed Binary Responses
Xiaobi Huang1 and Haoda Fu2; 1Merck & Co., 2Eli Lilly and Company

3:10 PM Floor Discussion

Tuesday, June 17, 3:30 PM - 5:30 PM

Session 51: New Development in Functional Data Analysis (Invited)
Room: Salon A, Lower Level 1
Organizer: Guanqun Cao, Auburn University
Chair: Guanqun Cao, Auburn University

3:30 PM Variable Selection and Estimation for Longitudinal Survey Data
Li Wang1, Suojin Wang2 and Guannan Wang1; 1University of Georgia, 2Texas A&M University

3:55 PM Estimation of Nonlinear Differential Equation Model Using Generalized Smoothing
Inna Chervoneva1, Tatiyana V. Apanasovich2 and Boris Freydin1; 1Thomas Jefferson University, 2George Washington University

4:20 PM A Functional Data Approach to Modeling Brain Image Data
Yihong Zhao1, R. Todd Ogden2 and Huaihou Chen1; 1New York University, 2Columbia University

4:45 PM Estimation in Functional Linear Quantile Regression
Linglong Kong, Dengdeng Yu and Ivan Mizera, University of Alberta

5:10 PM Floor Discussion

Session 52: Recent Regulatory/Industry Experience in Biosimilar Trial Designs (Invited)
Room: Salon B, Lower Level 1
Organizer: Gang Li, Johnson & Johnson
Chair: Yi Wang, Novartis Pharmaceuticals Corporation

3:30 PM Statistical Considerations for the Development of Biosimilar Products
Nan Zhang and Eric Chi, Amgen Inc.

3:50 PM New Analytical Methods for Non-Inferiority Trials: Covariate Adjustment and Sensitivity Analysis
Zhiwei Zhang, Lei Nie, Guoxing Soon and Bo Zhang, United States Food and Drug Administration

4:10 PM Where is the Right Balance for Designing an Efficient Biosimilar Clinical Program - a Biostatistic Perspective on Appropriate Applications of Statistical Principles from New Drug to Biosimilars
Yulan Li, Novartis Pharmaceuticals Corporation


4:30 PM Challenges of Designing/Analyzing Trials for Hepatitis C Drugs
Greg Soon, United States Food and Drug Administration

4:50 PM GSK's Patient-level Data Sharing Program
Shuyen Ho, GlaxoSmithKline plc

5:10 PM Floor Discussion

Session 53: Gatekeeping Procedures and Their Application in Pivotal Clinical Trials (Invited)
Room: Salon C, Lower Level 1
Organizer: Michael Lee, Johnson & Johnson
Chair: Michael Lee, Johnson & Johnson

3:30 PM A General Multistage Procedure for k-out-of-n Gatekeeping
Dong Xi1 and Ajit Tamhane2; 1Novartis Pharmaceuticals Corporation, 2Northwestern University

3:55 PM Multiple Comparisons in Complex Trial Designs
H.M. James Hung, United States Food and Drug Administration

4:20 PM Use of Bootstrapping in Adaptive Designs with Multiplicity Issues
Jeff Maca, Quintiles

4:45 PM Evaluating Commonly Used Multiple Testing Procedures in Drug Development
Michael Lee, Janssen Research & Development

5:10 PM Floor Discussion

Session 54: Approaches to Assessing Qualitative Interactions (Invited)
Room: Salon D, Lower Level 1
Organizer: Guohua (James) Pan, Johnson & Johnson
Chair: James Pan, Johnson & Johnson

3:30 PM Interval Based Graphical Approach to Assessing Qualitative Interaction
Guohua Pan and Eun Young Suh, Johnson & Johnson

3:55 PM Expected Variation and Quantitative and Qualitative Interaction in Clinical Trials
Xiaolong Luo, Celgene Corporation

4:20 PM A Bayesian Approach to Qualitative Interaction
Emine O. Bayman, University of Iowa

4:45 PM Discussant: Surya Mohanty, Johnson & Johnson

5:10 PM Floor Discussion

Session 55: Interim Decision-Making in Phase II Trials (Invited)
Room: Salon G, Lower Level 1
Organizer: Lanju Zhang, AbbVie Inc.
Chair: Lanju Zhang, AbbVie Inc.

3:30 PM Evaluation of Interim Dose Selection Methods Using ROC Approach
Deli Wang, Lu Cui, Lanju Zhang and Bo Yang, AbbVie Inc.

3:55 PM Interim Monitoring for Futility Based on Probability of Success
Yijie Zhou1, Ruji Yao2, Bo Yang1 and Ram Suresh3; 1AbbVie Inc., 2Merck & Co., 3GlaxoSmithKline plc

4:20 PM Bayesian Adaptive Design in Oncology Early Phase Trials
Yuehui Wu and Ramachandran Suresh, GlaxoSmithKline plc

4:45 PM Discussant: Peng Chen, Celgene Corporation

5:10 PM Floor Discussion

Session 56: Recent Advancement in Statistical Methods (Invited)
Room: Salon H, Lower Level 1
Organizer: Dongseok Choi, Oregon Health & Science University
Chair: Dongseok Choi, Oregon Health & Science University

3:30 PM Exact Inference: New Methods and Applications
Ian Dinwoodie, Portland State University

3:55 PM Optimal Thresholds Criteria and Standard Criterion of VUS for ROC Surface
Chong Sun Hong, Sungkyunkwan University

4:20 PM Analysis of Cointegrated Models with Measurement Errors
Sung Ahn1, Hamwoom Hong2 and Sinsup Cho2; 1Washington State University, 2Seoul National University

4:45 PM A Direct Method to Evaluate the Time-Dependent Predictive Accuracy for Biomarkers
Weining Shen, Jing Ning and Ying Yuan, University of Texas MD Anderson Cancer Center

5:10 PM Floor Discussion

Session 57: Building Bridges between Research and Practice in Time Series Analysis (Invited)
Room: Salon I, Lower Level 1
Organizer: Jane Chu, IBM/SPSS
Chair: Jane Chu, IBM/SPSS

3:30 PM Temporal Causal Modeling: Methodology, Applications and Implementation
Naoki Abe1, Tanveer Faruquie1, Huijing Jiang1, Anju Kambadur1, Kimberly Lang1, Aurelie Lozano1 and Jinwoo Shin2; 1IBM, 2KAIST University

3:55 PM Time Series Research at the U.S. Census Bureau
Brian C. Monsell, U.S. Census Bureau

4:20 PM Issues Related to the Use of Time Series in Model Building and Analysis
William W.S. Wei, Temple University

4:45 PM Discussant: George Tiao, University of Chicago

5:10 PM Floor Discussion

Session 58: Recent Advances in Design for Biostatistical Problems (Invited)
Room: Eugene Room, Lower Level 1
Organizer: Weng Kee Wong, University of California at Los Angeles
Chair: Weng Kee Wong, University of California at Los Angeles

3:30 PM Optimal Designs for N-of-1 Trials
Yin Li and Keumhee Chough Carriere, University of Alberta


3:55 PM Efficient Algorithms for Two-stage Designs on Phase II Clinical Trials
Seongho Kim1 and Weng Kee Wong2; 1Wayne State University/Karmanos Cancer Institute, 2University of California at Los Angeles

4:20 PM Optimizing Two-level Supersaturated Designs by Particle Swarm Techniques
Frederick Kin Hing Phoa1, Ray-Bing Chen2, Wei-Chung Wang3 and Weng Kee Wong4; 1Institute of Statistical Science, Academia Sinica, 2National Cheng Kung University, 3National Taiwan University, 4University of California at Los Angeles

4:45 PM D-optimal Designs for Multivariate Exponential and Poisson Regression Models via Ultra-Dimensional Particle Swarm Optimization
Jiaheng Qiu and Weng Kee Wong, University of California at Los Angeles

5:10 PM Floor Discussion

Session 59: Student Award Session 2 (Invited)
Room: Portland Room, Lower Level 1
Organizer: ICSA-KISS 2014 Student Paper Award Committee
Chair: Wenqing He, University of Western Ontario

3:30 PM Analysis of Sequence Data Under Multivariate Trait-Dependent Sampling
Ran Tao1, Donglin Zeng1, Nora Franceschini1, Kari E. North1, Eric Boerwinkle2 and Dan-Yu Lin1; 1University of North Carolina at Chapel Hill, 2University of Texas Health Science Center

3:55 PM Empirical Likelihood Based Tests for Stochastic Ordering Under Right Censorship
Hsin-wen Chang and Ian W. McKeague, Columbia University

4:20 PM Multiple Genetic Loci Mapping for Latent Disease Liability Using a Structural Equation Modeling Approach with Application in Alzheimer's Disease
Ting-Huei Chen, University of North Carolina at Chapel Hill

4:45 PM Floor Discussion

Session 60: Semi-parametric Methods (Contributed)
Room: Salem Room, Lower Level 1
Chair: Ouhong Wang, Amgen Inc.

3:30 PM Semiparametric Estimation of Mean and Variance in Generalized Estimating Equations
Jianxin Pan1 and Daoji Li2; 1The University of Manchester, 2University of Southern California

3:50 PM An Empirical Approach of Efficient Estimation of Linear Functionals of a Probability with Side Information
Hanxiang Peng, Shan Wang and Lingnan Li, Indiana University-Purdue University Indianapolis

4:10 PM M-estimation for General ARMA Processes with Infinite Variance
Rongning Wu, Baruch College, City University of New York

4:30 PM Sufficient Dimension Reduction via Principal Lq Support Vector Machine
Andreas Artemiou1 and Yuexiao Dong2; 1Cardiff University, 2Temple University

4:50 PM Nonparametric Quantile Regression via a New MM Algorithm
Bo Kai1, Mian Huang2, Weixin Yao3 and Yuexiao Dong4; 1College of Charleston and National Chengchi University, 2Shanghai University of Finance and Economics, 3Kansas State University, 4Temple University

5:10 PM Regression Estimators Using Stratified Ranked Set Sampling
Arpita Chatterjee, Hani Samawi, Lili Yu, Daniel Linder, Jingxian Cai and Robert Vogel, Georgia Southern University

5:30 PM Floor Discussion

Wednesday, June 18, 8:30 AM - 10:10 AM

Session 61: Statistical Challenges in Variable Selection for Graphical Modeling (Invited)
Room: Salon A, Lower Level 1
Organizer: Hua (Judy) Zhong, New York University
Chair: Hua (Judy) Zhong, New York University

8:30 AM Fused Community Detection
Yi Yu1, Yang Feng2 and Richard J. Samworth1; 1University of Cambridge, 2Columbia University

8:55 AM High Dimensional Tests for Functional Brain Networks
Jichun Xie1 and Jian Kang2; 1Temple University, 2Emory University

9:20 AM Bayesian Inference of Multiple Gaussian Graphical Models
Christine Peterson1, Francesco Stingo2 and Marina Vannucci3; 1Stanford University, 2University of Texas MD Anderson Cancer Center, 3Rice University

9:45 AM Mixed Graphical Models via Exponential Families
Eunho Yang1, Yulia Baker2, Pradeep Ravikumar1, Genevera I. Allen2 and Zhandong Liu3; 1University of Texas at Austin, 2Rice University, 3Baylor College of Medicine

10:10 AM Floor Discussion

Session 62: Recent Advances in Non- and Semi-parametric Methods (Invited)
Room: Salon B, Lower Level 1
Organizer: Lan Xue, Oregon State University
Chair: Guanqun Cao, Auburn University

8:30 AM Joint Estimation of Multiple Bivariate Densities of Protein Backbone Angles Using an Adaptive Exponential Spline Family
Lan Zhou, Texas A&M University

8:55 AM Estimating Time-Varying Effects for Overdispersed Recurrent Data with Treatment Switching
Qingxia Chen1, Donglin Zeng2, Joseph G. Ibrahim2, Mouna Akacha3 and Heinz Schmidli3; 1Vanderbilt University, 2University of North Carolina at Chapel Hill, 3Novartis Pharmaceuticals Corporation


9:20 AM Bivariate Penalized Splines for Regression
Ming-Jun Lai and Lily Wang, University of Georgia

9:45 AM Local Feature Selection in Varying-Coefficient Models
Lan Xue1, Xinxin Shu2, Peibei Shi2, Colin Wu3 and Annie Qu2; 1Oregon State University, 2University of Illinois at Urbana-Champaign, 3National Heart, Lung and Blood Institute

10:10 AM Floor Discussion

Session 63: Statistical Challenges and Development in Cancer Screening Research (Invited)
Room: Salon C, Lower Level 1
Organizer: Yu Shen, University of Texas MD Anderson Cancer Center
Chair: Yu Shen, Professor, University of Texas MD Anderson Cancer Center

8:30 AM Overdiagnosis in Breast and Prostate Cancer Screening: Concepts, Methods and Challenges
Ruth Etzioni, Roman Gulati and Jing Xia, Fred Hutchinson Cancer Research Center

8:55 AM Estimation of Biomarker Growth in a Screening Study
Lurdes Y.T. Inoue1, Roman Gulati2 and Ruth Etzioni2; 1University of Washington, 2Fred Hutchinson Cancer Research Center

9:20 AM Estimating Screening Test Effectiveness when Screening Indication is Unknown
Rebecca Hubbard, Group Health Research Institute

9:45 AM Developing Risk-Based Screening Guidelines: "Equal Management of Equal Risks"
Hormuzd Katki, National Cancer Institute

10:10 AM Floor Discussion

Session 64: Recent Developments in the Visualization and Exploration of Spatial Data (Invited)
Room: Salon D, Lower Level 1
Organizer: Juergen Symanzik, Utah State University
Chair: Juergen Symanzik, Utah State University

8:30 AM Recent Advancements in Geovisualization, with a Case Study on Chinese Religions
Juergen Symanzik1 and Shuming Bao2; 1Utah State University, 2University of Michigan

8:55 AM Spatial Analysis with China Geo-Explorers
Shuming Bao1, Miao Shui2 and Bing She2; 1University of Michigan, 2Wuhan University

9:20 AM Probcast: Creating and Visualizing Probabilistic Weather Forecasts
J. McLean Sloughter1, Susan Joslyn2, Patrick Tewson3, Tilmann Gneiting4 and Adrian Raftery2; 1Seattle University, 2University of Washington, 3Bigger Boat Consulting, 4University of Heidelberg

9:45 AM Discussant: Karen Kafadar, Indiana University

10:10 AM Floor Discussion

Session 65: Advancement in Biostatistical Methods and Applications (Invited)
Room: Salon G, Lower Level 1
Organizer: Sin-ho Jung, Duke University
Chair: Dongseok Choi, Oregon Health & Science University

8:30 AM  Estimation of Time-Dependent AUC under Marker-Dependent Sampling. Xiaofei Wang and Zhaoyin Zhu, Duke University

8:55 AM  A Measurement Error Approach for Modeling Accelerometer-based Physical Activity Data. Julia Lee, Jing Song and Dorothy Dunloop; Northwestern University

9:20 AM  Real-Time Prediction in Clinical Trials: A Statistical History of REMATCH. Daniel F. Heitjan and Gui-shuang Ying, University of Pennsylvania

9:45 AM  An Analysis of Microarray Data with Batch Effects. Dongseok Choi, William O. Cepurna, John C. Morrison, Elaine C. Johnson, Stephen R. Planck and James T. Rosenbaum; Oregon Health & Science University

10:10 AM  Floor Discussion

Session 66: Analysis of Complex Data (Invited)
Room: Salon H, Lower Level 1
Organizer: Mounir Mesbah, University of Paris 6
Chair: Mounir Mesbah, University of Paris 6

8:30 AM  Integrating Data from Heterogeneous Studies Using Only Summary Statistics: Efficiency and Robustness. Min-ge Xie, Rutgers University

8:55 AM  A Markov Modulated Poisson Model for Reliability Data. Joshua Landon (1), Suleyman Ozekici (2) and Refik Soyer (1); (1) George Washington University, (2) Koc University

9:20 AM  A Comparison of Two Approaches for Acute Leukemia Patient Classification. Jingjing Wu (1), Guoqiang Chen (2) and Zeny Feng (3); (1) University of Calgary, (2) Enbridge Pipelines, (3) University of Guelph

9:45 AM  On the Consistency and Covariate Selections for Varying-Coefficient Deming Regressions. Ying Lu (1), Chong Gu (2), Bo Fan (3), Selwyn Au (4), Valerie McGuire (1) and John Shepherd (3); (1) VA Palo Alto Health Care System & Stanford University, (2) Purdue University, (3) University of California at San Francisco, (4) VA Palo Alto Health Care System

10:10 AM  Floor Discussion

2014 ICSA-KISS Applied Statistics Symposium, Portland, June 15-18

Session 67: Statistical Issues in Co-development of Drug and Biomarker (Invited)
Room: Salon I, Lower Level 1
Organizer: Liang Fang, Gilead Sciences
Chair: Liang Fang, Gilead Sciences

8:30 AM  Group Sequential Designs for Developing and Testing Biomarker-Guided Personalized Therapies in Comparative Effectiveness Research. Tze Leung Lai (1), Olivia Yueh-Wen Liao (2) and Dong Woo Kim (3); (1) Stanford University, (2) Onyx Pharmaceuticals, (3) Microsoft Corporation

8:55 AM  Adaptive Enrichment Designs for Clinical Trials. Noah Simon (1) and Richard Simon (2); (1) University of Washington, (2) National Institutes of Health

9:20 AM  An Adaptive Single-Arm Phase II Design with Co-primary Objectives to Evaluate Activity Overall and In Relation to a Biomarker-Defined Subgroup. Michael Wolf, Amgen Inc.

9:45 AM  Biomarker Threshold Estimation to Predict Clinical Benefit: What Can Reasonably be Learned During Early (PhIII) Oncology Development. Thomas Bengtsson, Genentech Inc.

10:10 AM  Floor Discussion

Session 68: New Challenges for Statistical Analyst/Programmer (Invited)
Room: Eugene Room, Lower Level 1
Organizer: Xianming (Steve) Zheng, Eli Lilly and Company
Chair: Xianming (Steve) Zheng, Eli Lilly and Company

8:30 AM  Similarities and Differences in Statistical Programming among CRO and Pharmaceutical Industries. Mark Matthews, inVentiv Health Clinical

8:55 AM  Computational Aspects for Detecting Safety Signals in Clinical Trials. Jyoti Rayamajhi, Eli Lilly and Company

9:20 AM  Bayesian Network Meta-Analysis Methods: An Overview and A Case Study. Baoguang Han (1), Wei Zou (2) and Karen Price (1); (1) Eli Lilly and Company, (2) inVentiv Clinical Health

9:45 AM  Floor Discussion

Session 69: Adaptive and Sequential Methods for Clinical Trials (Invited)
Room: Portland Room, Lower Level 1
Organizers: Zhengjia Chen, Emory University; Yichuan Zhao, Georgia State University (yichuan@gsu.edu)
Chair: Zhengjia Chen, Emory University

8:30 AM  Bayesian Data Augmentation Dose Finding with Continual Reassessment Method and Delayed Toxicities. Ying Yuan (1), Suyu Liu (1) and Guosheng Yin (2); (1) University of Texas MD Anderson Cancer Center, (2) University of Hong Kong

8:55 AM  Optimal Marker-strategy Clinical Trial Design to Detect Predictive Markers for Targeted Therapy. Yong Zang, Suyu Liu and Ying Yuan; University of Texas MD Anderson Cancer Center

9:20 AM  Dynamic Prediction of Time to Relapse Using Longitudinal Biomarker Data. Xuelin Huang (1), Jing Ning (1) and Sangbum Choi (2); (1) University of Texas MD Anderson Cancer Center, (2) University of Texas Health Science Center at Houston

9:45 AM  Continuous Tumor Size Change Percentage and Progression Free Survival as Endpoint of the First and Second Stage, Respectively, in a Novel Double Screening Phase II Design. Ye Cui (1), Zhibo Wang (1), Yichuan Zhao (1) and Zhengjia Chen (2); (1) Georgia State University, (2) Emory University

10:10 AM  Floor Discussion

Wednesday, June 18, 10:30 AM - 12:10 PM

Session 70: Survival Analysis (Contributed)
Room: Portland Room, Lower Level 1
Chair: Zhezhen Jin, Columbia University

10:30 AM  Comparison of Hazard Rate and Odds Ratio in the Two-Sample Survival Problem. Benedict Dormitorio and Joshua Naranjo, Western Michigan University

10:50 AM  Predicting the Event Time in Multicenter Clinical Trials with Time-to-Event Outcome as Primary Endpoint. Nibedita Bandyopadhyay, Janssen Research & Development

11:10 AM  Empirical Comparison of Small Sample Performance for the Logrank Test and Resampling Methods with High Censoring Rates. Yu Deng and Jianwen Cai, University of North Carolina at Chapel Hill

11:30 AM  Floor Discussion

Session 71: Complex Data Analysis: Theory and Application (Invited)
Room: Salon A, Lower Level 1
Organizer: Yang Feng, Columbia University
Chair: Yang Feng, Columbia University

10:30 AM  Supervised Singular Value Decomposition and Its Asymptotic Properties. Gen Li (1), Dan Yang (2), Haipeng Shen (1) and Andrew Nobel (1); (1) University of North Carolina at Chapel Hill, (2) Rutgers University

10:55 AM  New Methods for Interaction Selection. Ning Hao (1), Hao Helen Zhang (1) and Yang Feng (2); (1) University of Arizona, (2) Columbia University

11:20 AM  A Statistical Approach to Set Classification by Feature Selection with Applications to Classification of Histopathology Images. Sungkyu Jung (1) and Xingye Qiao (2); (1) University of Pittsburgh, (2) Binghamton University, State University of New York

11:45 AM  A Smoothing Spline Model for Analyzing dMRI Data of Swallowing. Binhuan Wang, Ryan Branski, Milan Amin and Yixin Fang; New York University

12:10 PM  Floor Discussion


Session 72: Recent Development in Statistics Methods for Missing Data (Invited)
Room: Salon B, Lower Level 1
Organizer: Nanhua Zhang, Cincinnati Children's Hospital Medical Center
Chair: Haoda Fu, Eli Lilly and Company

10:30 AM  A Semiparametric Inference to Regression Analysis with Missing Covariates in Survey Data. Shu Yang and Jae-kwang Kim, Iowa State University

10:55 AM  Multiple Robustness in Missing Data Analysis. Peisong Han (1) and Lu Wang (2); (1) University of Waterloo, (2) University of Michigan

11:20 AM  Imputation of Binary Variables with SAS and IVEware. Yi Pan and Riguang Song, United States Centers for Disease Control and Prevention

11:45 AM  Marginal Treatment Effect Estimation Using Pattern-Mixture Model. Zhenzhen Xu, United States Food and Drug Administration

12:10 PM  Floor Discussion

Session 73: Machine Learning Methods for Causal Inference in Health Studies (Invited)
Room: Salon C, Lower Level 1
Organizer: Mi-Ok Kim, Cincinnati Children's Hospital Medical Center
Chair: Mi-Ok Kim, Cincinnati Children's Hospital Medical Center

10:30 AM  Causal Inference of Interaction Effects with Inverse Propensity Weighting, G-Computation and Tree-Based Standardization. Joseph Kang (1), Xiaogang Su (2), Lei Liu (1) and Martha Daviglus (3); (1) Northwestern University, (2) University of Texas at El Paso, (3) University of Illinois at Chicago

10:55 AM  Practice of Causal Inference with the Propensity of Being Zero or One. Joseph Kang (1), Wendy Chan (1), Mi-Ok Kim (2) and Peter M. Steiner (3); (1) Northwestern University, (2) University of Cincinnati / Cincinnati Children's Hospital Medical Center, (3) University of Wisconsin-Madison

11:20 AM  Propensity Score and Proximity Matching Using Random Forest. Peng Zhao (1), Xiaogang Su (2) and Juanjuan Fan (1); (1) San Diego State University, (2) University of Texas at El Paso

11:45 AM  Discussant: Joseph Kang, Northwestern University

12:10 PM  Floor Discussion

Session 74: JP Hsu Memorial Session (Invited)
Room: Salon D, Lower Level 1
Organizers: Lili Yu, Georgia Southern University; Karl Peace, Georgia Southern University (kepeace@georgiasouthern.edu)
Chair: Lili Yu, Georgia Southern University

10:30 AM  Weighted Least-Squares Method for Right-Censored Data in Accelerated Failure Time Model. Lili Yu, Georgia Southern University

10:55 AM  (Student Paper Award) Estimating a Change-Point in High-Dimensional Markov Random Field Models. Sandipan Roy, University of Michigan

11:20 AM  A Comparison of Size and Power of Tests of Hypotheses on Parameters Based on Two Generalized Lindley Distributions. Macaulay Okwuokenye, Biogen Idec

11:45 AM  Floor Discussion

Session 75: Challenge and New Development in Model Fitting and Selection (Invited)
Room: Salon G, Lower Level 1
Organizer: Zhezhen Jin, Columbia University
Chair: Cuiling Wang, Yeshiva University

10:30 AM  Robust Estimates of Divergence Times and Selection with a Poisson Random Field Model. Amei Amei (1) and Brian Tilston Smith (2); (1) University of Nevada at Las Vegas, (2) American Museum of Natural History

10:55 AM  On A Class of Maximum Empirical Likelihood Estimators Defined By Convex Functions. Hanxiang Peng and Fei Tan, Indiana University-Purdue University Indianapolis

11:20 AM  Properties of the Marginal Survival Functions for Dependent Censored Data under an Assumed Archimedean Copula. Antai Wang, New Jersey Institute of Technology

11:45 AM  Dual Model Misspecification in Generalized Linear Models with Error in Variables. Xianzheng Huang, University of Southern California

12:10 PM  Floor Discussion

Session 76: Advanced Methods and Their Applications in Survival Analysis (Invited)
Room: Salon H, Lower Level 1
Organizers: Jiajia Zhang, University of South Carolina; Wenbin Lu, North Carolina State University
Chair: Jiajia Zhang, University of South Carolina

10:30 AM  Kernel Smoothed Profile Likelihood Estimation in the Accelerated Failure Time Frailty Model for Clustered Survival Data. Bo Liu (1), Wenbin Lu (1) and Jiajia Zhang (2); (1) North Carolina State University, (2) University of South Carolina

10:55 AM  Model-free Screening for Lifetime Data Analysis with Ultrahigh-dimensional Biomarkers: Survival Impacting. Jialiang Li (1), Qi Zheng (2) and Limin Peng (2); (1) National University of Singapore, (2) Emory University

11:20 AM  Analysis of Event History Data in Tuberculosis (TB) Screening. Joan Hu, Simon Fraser University

11:45 AM  On the Dependence Structure of Bivariate Recurrent Event Processes: Inference and Estimation. Jing Ning (1), Yong Chen (2), Chunyan Cai (2), Xuelin Huang (1) and Mei-Cheng Wang (3); (1) University of Texas MD Anderson Cancer Center, (2) University of Texas Health Science Center at Houston, (3) Johns Hopkins University

12:10 PM  Floor Discussion


Session 77: High Dimensional Variable Selection and Multiple Testing (Invited)
Room: Salon I, Lower Level 1
Organizer: Zhigen Zhao, Temple University
Chair: Jichun Xie, Temple University

10:30 AM  On Procedures Controlling the False Discovery Rate for Testing Hierarchically Ordered Hypotheses. Gavin Lynch and Wenge Guo, New Jersey Institute of Technology

10:55 AM  Sufficient Dimension Reduction in Binary Classification. Seung Jun Shin (1), Yichao Wu (2), Hao Helen Zhang (3) and Yufeng Liu (4); (1) University of Texas MD Anderson Cancer Center, (2) North Carolina State University, (3) University of Arizona, (4) University of North Carolina at Chapel Hill

11:20 AM  Rate Optimal Multiple Testing Procedure (ROMP) in High-dimensional Regression. Zhigen Zhao (1) and Pengsheng Ji (2); (1) Temple University, (2) University of Georgia

11:45 AM  Pathwise Calibrated Active Shooting Algorithm with Application to Semiparametric Graph Estimation. Tuo Zhao (1) and Han Liu (2); (1) Johns Hopkins University, (2) Princeton University

12:10 PM  Floor Discussion


Abstracts

Session 1: Emerging Statistical Methods for Complex Data

Estimation of the Error Auto-Correlation Matrix in Semi-parametric Model for Brain fMRI Data
Chunming Zhang and Xiao Guo
University of Wisconsin-Madison
cmzhang@stat.wisc.edu
In statistical analysis of functional magnetic resonance imaging (fMRI), dealing with the temporal correlation is a major challenge in assessing changes within voxels. In this paper, we aim to address this issue by considering a semi-parametric model for fMRI data. For the error process in the semi-parametric model, we construct a banded estimate of the auto-correlation matrix R and propose a refined estimate of the inverse of R. Under some mild regularity conditions, we establish consistency of the banded estimate with an explicit convergence rate and show that the refined estimate converges under an appropriate norm. Numerical results suggest that the refined estimate performs conceivably well when applied to the detection of brain activity.
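The banding operation at the heart of such an estimate can be sketched in a few lines. This is a generic illustration only: the function name, the bandwidth k, and the toy AR(1)-style correlation matrix are made-up, and the paper's refined inverse estimate is not reproduced.

```python
def band_matrix(R, k):
    """Keep entries within band |i - j| <= k of a square matrix R;
    set everything outside the band to zero (generic banding operator)."""
    n = len(R)
    return [[R[i][j] if abs(i - j) <= k else 0.0 for j in range(n)]
            for i in range(n)]

# Toy autocorrelation matrix with AR(1)-like decay, rho = 0.5
n, rho = 5, 0.5
R_hat = [[rho ** abs(i - j) for j in range(n)] for i in range(n)]

R_banded = band_matrix(R_hat, k=1)
print(R_banded[0][2])  # -> 0.0, since |0 - 2| > k
```

The statistical content of the paper lies in choosing the bandwidth and proving consistency; the operator itself is this simple truncation.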

Kernel Additive Sliced Inverse Regression
Heng Lian
Nanyang Technological University
shellinglianheng@hotmail.com
In recent years, nonlinear sufficient dimension reduction (SDR) methods have gained increasing popularity. However, while semi-parametric models in regression have fascinated researchers for several decades, with a large amount of literature, parsimonious structured nonlinear SDR has attracted little attention so far. In this paper, extending kernel sliced inverse regression, we study additive models in the context of SDR and demonstrate their potential usefulness due to their flexibility and parsimony. Theoretically, we clarify that the improved convergence rate using the additive structure is due to a faster rate of decay of the kernel's eigenvalues. The additive structure also opens the possibility of nonparametric variable selection. This sparsification of the kernel, however, does not introduce additional tuning parameters, in contrast with sparse regression. Simulated and real data sets are presented to illustrate the benefits and limitations of the approach.

Variable Selection with Prior Information for Generalized Linear Models via the Prior LASSO Method
Yuan Jiang (1), Yunxiao He (2) and Heping Zhang (3)
(1) Oregon State University, (2) Nielsen Company, (3) Nielsen Company
yuanjiang@stat.oregonstate.edu
LASSO is a popular statistical tool, often used in conjunction with generalized linear models, that can simultaneously select variables and estimate parameters. When there are many variables of interest, as in current biological and biomedical studies, the power of LASSO can be limited. Fortunately, so much biological and biomedical data have been collected, and they may contain useful information about the importance of certain variables. This paper proposes an extension of LASSO, namely prior LASSO (pLASSO), to incorporate that prior information into penalized generalized linear models. The goal is achieved by adding, in the LASSO criterion function, an additional measure of the discrepancy between the prior information and the model. For linear regression, the whole solution path of the pLASSO estimator can be found with a procedure similar to the Least Angle Regression (LARS). Asymptotic theories and simulation results show that pLASSO provides significant improvement over LASSO when the prior information is relatively accurate. When the prior information is less reliable, pLASSO shows great robustness to the misspecification. We illustrate the application of pLASSO using a real data set from a genome-wide association study.
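The construction described, LASSO plus a discrepancy term, can be written schematically as follows; the symbols eta and D are generic placeholders, since the abstract does not specify the paper's exact discrepancy measure:

```latex
\hat{\beta}_{\text{pLASSO}}
  \;=\; \arg\min_{\beta}
  \Bigl\{ -\ell(\beta)
        \;+\; \lambda \,\lVert \beta \rVert_1
        \;+\; \eta \, D\bigl(\beta;\ \text{prior}\bigr) \Bigr\},
```

where \(-\ell(\beta)\) is the negative log-likelihood of the generalized linear model, \(\lambda \lVert \beta \rVert_1\) is the ordinary LASSO penalty, and \(\eta\, D(\beta;\ \text{prior})\) penalizes disagreement with the prior information, with \(\eta\) controlling how strongly the prior is trusted.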

Bootstrapping High Dimensional Vectors: Interplay Between Dependence and Dimensionality
Xianyang Zhang (1) and Guang Cheng (2)
(1) University of Missouri at Columbia, (2) Purdue University
zhangxiany@missouri.edu
In this talk, we will focus on the problem of conducting inference for high dimensional, weakly dependent time series. Motivated by applications in modern high dimensional inference, we derive a Gaussian approximation result for the maximum of a sum of weakly dependent vectors using Stein's method, where the dimension of the vectors is allowed to be exponentially larger than the sample size. Our result reveals an interesting phenomenon arising from the interplay between the dependence and dimensionality: the more dependent the data vectors, the slower the diverging rate of the dimension that is allowed for obtaining valid statistical inference. A type of dimension-free dependence structure is derived as a by-product. Building on the Gaussian approximation result, we propose a blockwise multiplier (wild) bootstrap that is able to capture the dependence between and within the data vectors, and thus provides a high-quality distributional approximation to the distribution of the maximum of the vector sum in the high dimensional context.

Session 2: Statistical Methods for Sequencing Data Analysis

A Penalized Likelihood Approach for Robust Estimation of Isoform Expression
Hui Jiang (1) and Julia Salzman (2)
(1) University of Michigan, (2) Stanford University
jianghui@umich.edu
Ultra high-throughput sequencing of transcriptomes (RNA-Seq) has enabled the accurate estimation of gene expression at the individual isoform level. However, systematic biases introduced during the sequencing and mapping processes, as well as incompleteness of the transcript annotation databases, may cause the estimates of isoform abundances to be unreliable, and in some cases highly inaccurate. This paper introduces a penalized likelihood approach to detect and correct for such biases in a robust manner. Our model extends those previously proposed by introducing bias parameters for reads. An L1 penalty is used for the selection of non-zero bias parameters. We introduce an efficient algorithm for model fitting and analyze the statistical properties of the proposed model. Our experimental studies on both simulated and real datasets suggest that the model has the potential to improve isoform-specific gene expression estimates and identify incompletely annotated gene models. This is joint work with Julia Salzman.

Classification on Sequencing Data and its Applications on a Human Breast Cancer Dataset
Jun Li
University of Notre Dame
junli@nd.edu
Gene expression measured by the RNA-sequencing technique can be used to classify biological samples from different groups, such as normal vs. early-stage cancer vs. cancer. To get an interpretable classifier with high robustness and generality, often some type of shrinkage is used to give a linear and sparse model. In microarray data, an example is PAM (pattern analysis of microarrays), which uses a nearest shrunken centroid classifier. To accommodate the discrete nature of sequencing data, this model was modified by using a Poisson distribution. We further generalize this model by using a negative binomial distribution to take account of the overdispersion in the data. We compare the performance of Gaussian, Poisson and negative binomial based models on simulation data as well as a human breast cancer dataset. We find that, while the cross-validation misclassification rates of the three methods are often quite similar, the number of genes used by the models can be quite different, and using the Gaussian model on carefully normalized data typically gives models with the least number of genes.
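A minimal sketch of likelihood-based classification under a negative binomial model, in the spirit of the abstract. The class mean profiles, the shared dispersion phi, and all numbers below are made-up illustrations, not the talk's actual method, shrinkage scheme, or data.

```python
import math

def nb_logpmf(y, mu, phi):
    """Negative binomial log-pmf with mean mu and dispersion phi,
    parameterized so that Var(Y) = mu + phi * mu**2 (size r = 1/phi)."""
    r = 1.0 / phi
    p = r / (r + mu)
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(p) + y * math.log(1.0 - p))

def classify(counts, centroids, phi):
    """Assign a count vector to the class whose (hypothetical) per-gene
    mean profile gives the highest total NB log-likelihood."""
    def loglik(mu_vec):
        return sum(nb_logpmf(y, mu, phi) for y, mu in zip(counts, mu_vec))
    return max(centroids, key=lambda label: loglik(centroids[label]))

# Toy per-gene mean profiles for two classes (illustrative numbers only)
centroids = {"normal": [5.0, 20.0, 3.0], "tumor": [30.0, 4.0, 10.0]}
sample = [28, 6, 9]
print(classify(sample, centroids, phi=0.1))  # -> tumor
```

Setting phi near zero recovers Poisson-like behavior; larger phi accommodates the overdispersion that motivates the NB generalization.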

Power-Robustness Analysis of Statistical Models for RNA Sequencing Data
Gu Mi, Yanming Di and Daniel W. Schafer
Oregon State University
mig@stat.oregonstate.edu
We present results from power-robustness analysis of several statistical models for RNA sequencing (RNA-Seq) data. We fit the models to several RNA-Seq datasets, perform goodness-of-fit tests that we developed (Mi et al., 2014), and quantify variations not explained by the fitted models. The statistical models we compared are all based on the negative binomial (NB) distribution, but differ in how they handle the estimation of the dispersion parameter. The dispersion parameter summarizes the extra-Poisson variation commonly observed in RNA-Seq data. One widely-used power-saving strategy is to assume some commonalities of NB dispersion parameters across genes via simple models relating them to mean expression rates, and many such models have been proposed. However, the power benefit of the dispersion-modeling approach relies on the estimated dispersion models being adequate. It is not well understood how robust the approach is if the fitted dispersion models are inadequate. Our empirical investigations provide a further step towards understanding the pros and cons of different NB dispersion models, and draw attention to power-robustness evaluation, a somewhat neglected yet important aspect of RNA-Seq data analysis.
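The shared setup behind the compared models can be summarized in a standard NB parameterization; the specific dispersion-mean models evaluated in the talk are not reproduced here, and f below is a generic placeholder for a fitted trend:

```latex
Y_{gi} \sim \mathrm{NB}\bigl(\mu_{gi},\, \phi_g\bigr),
\qquad
\operatorname{Var}(Y_{gi}) \;=\; \mu_{gi} + \phi_g\, \mu_{gi}^2,
\qquad
\phi_g \approx f(\mu_g),
```

where \(Y_{gi}\) is the count for gene \(g\) in sample \(i\), \(\phi_g\) is the gene-specific dispersion capturing extra-Poisson variation, and the power-saving strategies replace free per-gene \(\phi_g\) with a trend \(f\) shared across genes.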

Session 3: Modeling Big Biological Data with Complex Structures

High Dimensional Graphical Models Learning
Jie Peng and Ru Wang
University of California at Davis
jiepeng@ucdavis.edu
Probabilistic graphical models are used as graphical representations of probability distributions, particularly their conditional independence properties. Graphical models have broad applications in the fields of biology, social science, linguistics, neuroscience, etc. We will focus on graphical model structure learning under the high dimensional regime, where avoiding over-fitting and developing computationally efficient algorithms are particularly challenging. We will discuss the use of data perturbation and model aggregation for model building and model selection.

Statistical Analysis of RNA Sequencing Data
Mingyao Li and Yu Hu
University of Pennsylvania
mingyao@mail.med.upenn.edu
RNA sequencing (RNA-Seq) has rapidly replaced microarrays as the major platform for transcriptomics studies. Statistical analysis of RNA-Seq data, however, is challenging because various biases present in RNA-Seq data complicate the analysis and, if not appropriately corrected, can affect isoform expression estimation and downstream analysis. In this talk, I will first present PennSeq, a statistical method that estimates isoform-specific gene expression. PennSeq is a nonparametric-based approach that allows each isoform to have its own non-uniform read distribution. By giving adequate weight to the underlying data, this empirical approach maximally reflects the true underlying read distribution and is effective in adjusting for non-uniformity. In the second part of my talk, I will present a statistical method for testing differential alternative splicing by jointly modeling multiple samples. I will show simulation results as well as some examples from a clinical study.

Quantifying the Role of Steric Constraints in Nucleosome Positioning
H. Tomas Rube and Jun S. Song
University of Illinois at Urbana-Champaign
songj@illinois.edu
Statistical positioning, the localization of nucleosomes packed against a fixed barrier, is conjectured to explain the array of well-positioned nucleosomes at the 5' end of genes, but the extent and precise implications of statistical positioning in vivo are unclear. I will examine this hypothesis quantitatively and generalize the idea to include moving barriers. Early experiments noted a similarity between the nucleosome profile aligned and averaged across genes and that predicted by statistical positioning; however, our study demonstrates that the same profile is generated by aligning random nucleosomes, calling the previous interpretation into question. New rigorous analytic results reformulate statistical positioning as predictions on the variance structure of nucleosome locations in individual genes. In particular, a quantity termed the variance gradient, describing the change in variance between adjacent nucleosomes, is tested against recent high-throughput nucleosome sequencing data. Constant variance gradients render evidence in support of statistical positioning in about 50% of long genes. Genes that deviate from predictions have high nucleosome turnover and cell-to-cell gene expression variability. Our analyses thus clarify the role of statistical positioning in vivo.

Integrative Dynamic Omics Networks and Personalized Medicine
George I. Mias
Michigan State University
gmias@msu.edu
The emergence and ready availability of novel -omics technologies is guiding our efforts to make advances in the implementation of personalized medicine. High quality genomic data is now complemented with other dynamic omes (e.g. transcriptomes, proteomes, metabolomes, autoantibodyomes) and other data providing temporal profiling of thousands of molecular components. The analysis of such dynamic omics data necessitates the development of new statistical and computational methodology towards the integration of the different platforms. Such an approach allows us to follow changes in the physiological states of an individual, including pathway changes over time and associated network interactions (inferred nodes & connections). A framework implementing such methodology will be presented in association with a pilot personalized medicine study that monitored an initially healthy individual over multiple healthy and disease states. The framework will be described, including raw data analysis approaches for transcriptome (RNA) sequencing, mass spectrometry (proteins and small molecules) and protein array data, and an overview of quantitation methods available for each analysis. Examples of how the data is integrated in this framework, using the personalized medicine pilot study, will also be presented. The extended framework infers novel pathways, components and networks, assessing topological changes, and is being applied to other longitudinal studies to display changes through dynamical biological states. Assessing such multimodal omics data has great potential for implementations of a more personalized, precise and preventative medicine.

Session 4: Bayesian Approaches for Modeling Dynamic Non-Gaussian Responses

Binary State Space Mixed Models with Flexible Link Functions
Dipak Dey (1), Xun Jiang (2) and Carlos Abanto-Valle (3)
(1) University of Connecticut, (2) Amgen Inc., (3) Federal University of Rio de Janeiro
dipakdey@uconn.edu
State space models (SSM) for binary time series data using flexible skewed link functions are introduced in this paper. The commonly used logit, cloglog and loglog links are prone to link misspecification because of their fixed skewness. Here we introduce three flexible links as alternatives: the generalized extreme value (GEV) link, the symmetric power logit (SPLOGIT) link, and the scale mixture of normal (SMN) link. Markov chain Monte Carlo (MCMC) methods for Bayesian analysis of SSM with these links are implemented using the JAGS package, a freely available software. Model comparison relies on the deviance information criterion (DIC). The flexibility of the proposed model is illustrated by measuring effects of deep brain stimulation (DBS) on attention of a macaque monkey performing a reaction-time task (Smith et al., 2009). Empirical results showed that the flexible links fit better than the usual logit and cloglog links.

Bayesian Spatial-Temporal Modeling of Ecological Zero-Inflated Count Data
Xia Wang (1), Ming-Hui Chen (2), Rita C. Kuo (3) and Dipak K. Dey (2)
(1) University of Cincinnati, (2) University of Connecticut, (3) Lawrence Berkeley National Laboratory
xiawang@uc.edu
A Bayesian hierarchical model is developed for count data with spatial and temporal correlations, as well as excessive zeros, uneven sampling intensities, and inference on missing spots. Our contribution is to develop a model for zero-inflated count data that provides flexibility in modeling spatial patterns in a dynamic manner and also improves computational efficiency via dimension reduction. The proposed methodology is of particular importance for studying species presence and abundance in the field of ecological sciences. The proposed model is employed in the analysis of the survey data by the Northeast Fisheries Sciences Center (NEFSC) for estimation and prediction of the Atlantic cod in the Gulf of Maine - Georges Bank region. Model comparisons based on the deviance information criterion and the log predictive score show the improvement by the proposed spatial-temporal model.
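The "excessive zeros" structure can be conveyed by the generic zero-inflated Poisson form below; this is a textbook sketch with site index s and time index t, not the paper's full hierarchical spatial-temporal specification:

```latex
P(Y_{st} = 0) \;=\; \pi_{st} + (1 - \pi_{st})\, e^{-\lambda_{st}},
\qquad
P(Y_{st} = y) \;=\; (1 - \pi_{st})\, \frac{\lambda_{st}^{\,y}\, e^{-\lambda_{st}}}{y!},
\quad y = 1, 2, \ldots,
```

where \(\pi_{st}\) is the probability of a structural zero (e.g., the species is absent) and \(\lambda_{st}\) the Poisson abundance rate; in a hierarchical Bayesian treatment, spatial and temporal dependence enters through the priors placed on \(\pi_{st}\) and \(\lambda_{st}\).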

Real-time Bayesian Parameter Estimation for Item Response Models
Ruby Chiu-Hsing Weng
National Chengchi University
chweng@nccu.edu.tw
Bayesian item response models have been used in modeling educational testing and Internet ratings data. Typically, the statistical analysis is carried out using Markov Chain Monte Carlo (MCMC) methods. However, MCMC methods may not be computationally feasible when real-time data continuously arrive and online parameter estimation is needed. We develop an efficient algorithm based on a deterministic moment matching method to adjust the parameters in real-time. The proposed online algorithm works well for two real datasets. Moreover, when compared with the offline MCMC methods, it achieves good accuracy but with considerably less computational time.

Statistical Prediction for Virginia Lyme Disease Emergence Based on Spatio-temporal Count Data
Yuanyuan Duan, Jie Li, Yili Hong, Korine Kolivras, Stephen Prisley, James Campbell and David Gaines
Virginia Tech
jieli@vt.edu
The increasing demand for modeling spatio-temporal data is computationally challenging due to the large-scale spatial and temporal dimensions involved. The traditional Markov Chain Monte Carlo (MCMC) method suffers from slow convergence and is computationally expensive. The Integrated Nested Laplace Approximation (INLA) has been proposed as an alternative to speed up the computation by avoiding the extensive sampling process required by MCMC. However, even with INLA, handling large-scale spatio-temporal prediction datasets remains difficult, if not infeasible, in many cases. This work proposes a new Divide-Recombine (DR) prediction method for dealing with spatio-temporal data. A large spatial region is divided into smaller subregions, and then INLA is applied to fit a spatio-temporal model to each subregion. To recover the spatial dependence, an iterative procedure has been developed to recombine the model fitting and prediction results. In particular, the new method utilizes a model offset term to make adjustments for each subregion using information from neighboring subregions. Stable estimation/prediction results are obtained after several updating iterations. Simulations are used to validate the accuracy of the new method in model fitting and prediction. The method is then applied to the areal (census tract level) count data for Lyme disease cases in Virginia from 2003 to 2010.

Session 5: Recent Advances in Astro-Statistics

Embedding the Big Bang Cosmological Model into a Bayesian Hierarchical Model for Super Nova Light Curve Data
David van Dyk, Roberto Trotta, Xiyun Jiao and Hikmatali Shariff
Imperial College London
dvandyk@imperial.ac.uk
The 2011 Nobel Prize in Physics was awarded for the discovery that the expansion of the Universe is accelerating. This talk describes a Bayesian model that relates the difference between the apparent and intrinsic brightnesses of objects to their distance, which in turn depends on parameters that describe this expansion. While apparent brightness can be readily measured, intrinsic brightness can only be obtained for certain objects. Type Ia Supernovae occur when material accreting onto a white dwarf drives its mass above a threshold and triggers a powerful supernova explosion. Because this occurs only in a particular physical scenario, we can use covariates to estimate intrinsic brightness. We use a hierarchical Bayesian model to leverage this information to study the expansion history of the Universe. The model includes computer models that relate expansion parameters to observed brightnesses, along with components that account for measurement error, data contamination, dust absorption, repeated measures, and covariate adjustment uncertainty. Sophisticated MCMC methods are employed for model fitting, and a secondary Bayesian analysis is conducted for residual analysis and model checking.

Marrying Domain Knowledge and Statistical Methods
Ashish Mahabal, George Djorgovski, Matthew Graham, Ciro Donalek and Andrew Drake
California Institute of Technology
aam@astro.caltech.edu
Astronomy datasets have been large and are getting larger by the day (TB to PB). This necessitates the use of advanced statistics for many purposes. However, the datasets are often so large that small contamination rates imply a large number of wrong results. This makes blind application of methodologies unattractive. Astronomical transients are one area where rapid follow-up observations are required based on very little data. We show how the use of domain knowledge, in the right measure at the right juncture, can improve classification performance. We demonstrate this using Bayesian networks and Gaussian process regression on datasets from the Catalina Real-Time Transient Survey, which has covered 80% of the sky several tens to a few hundred times over the last decade. This becomes even more critical as we move beyond PB-sized datasets in the coming years.

Nonlinear Classification of X-Ray Binaries
Luke Bornn and Saku Vrtilek
Harvard University
bornn@stat.harvard.edu
Because of their singular nature, the primary method to obtain information about stellar-mass black holes is to study those that are part of a binary system. However, we have no widely applicable means of determining the nature of the compact object (whether a black hole [BH] or a neutron star [NS]) in a binary system. The definitive method is dynamic measurement of the mass of the compact object, and that can be reliably established only for eclipsing systems. The motivation for finding a way to differentiate the presence of an NS or a BH in any XRB system is strong: subtle differences in the behavior of neutron star and black hole X-ray binaries provide tests of fundamental features of gravitation, such as the existence of a black hole event horizon. In this talk we present a statistical approach for classifying binary systems using a novel 3D representation, called a color-color-intensity diagram, combined with nonlinear classification techniques. The method provides natural and accurate probabilistic classifications of X-ray binary objects.
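The classification step described above can be sketched with generic tools: synthetic stand-ins for the three color-color-intensity features (the class clouds are invented, not the authors' X-ray catalog) and a kernel SVM that outputs class probabilities.

```python
# Sketch: nonlinear probabilistic classification of binaries from 3-D
# features, in the spirit of the color-color-intensity idea described
# above.  The data are synthetic stand-ins, NOT real X-ray data, and
# the class structure is invented for illustration.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 300
# Hypothetical (soft color, hard color, intensity) triples per class.
bh = rng.normal([0.2, 0.8, 1.5], 0.2, size=(n, 3))   # "black hole" cloud
ns = rng.normal([0.8, 0.3, 0.7], 0.2, size=(n, 3))   # "neutron star" cloud
X = np.vstack([bh, ns])
y = np.array([0] * n + [1] * n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = SVC(kernel="rbf", probability=True, random_state=0).fit(X_tr, y_tr)

acc = clf.score(X_te, y_te)
proba = clf.predict_proba(X_te)   # probabilistic classifications per object
print(f"test accuracy: {acc:.2f}")
```

The `predict_proba` output plays the role of the probabilistic classification the abstract describes; on real data the features would come from the color-color-intensity diagram.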

Persistent Homology and the Topology of the Intergalactic Medium
Fabrizio Lecci
Carnegie Mellon University

lecci@cmu.edu
Light we observe from quasars has traveled through the intergalactic medium (IGM) to reach us, and leaves an imprint of some properties of the IGM on its spectrum. There is a particular imprint with which cosmologists are familiar, dubbed the Lyman-alpha forest. From this imprint we can infer the density fluctuations of neutral hydrogen along the line of sight from us to the quasar. With cosmological simulation output, we develop a methodology using local polynomial smoothing to model the IGM. Then we study its topological features using persistent homology, a method for probing topological properties of point clouds and functions. Describing the topological features of the IGM can aid in our understanding of the large-scale structure of the Universe, along with providing a framework for comparing cosmological simulation output with real data beyond the standard measures. Motivated by this example, I will introduce persistent homology and describe some statistical techniques that allow us to separate topological signal from topological noise.

Session 6 Statistical Methods and Application in Genetics

Identification of Homogeneous and Heterogeneous Covariate Structure in Pooled Cohort Studies
Xin Cheng1, Wenbin Lu2 and Mengling Liu1
1New York University
2North Carolina State University
xc311@nyu.edu
Pooled analyses, which make use of data from multiple studies as a single dataset, can achieve large sample sizes to increase statistical power. When inter-study heterogeneity exists, however, the simple pooling strategy may fail to present a fair and complete picture for variables with heterogeneous effects. Therefore, it is of great importance to know the homogeneous and heterogeneous structure of variables in pooled studies. In this presentation, we propose a penalized partial likelihood approach with adaptively weighted composite penalties on variables' homogeneous effects and heterogeneous effects. We show that our method can characterize the structure of variables as having heterogeneous, homogeneous, and null effects, and simultaneously provide inference for the non-zero effects. The results readily extend to the high-dimensional situation where the number of parameters diverges with the sample size. The proposed selection and estimation procedure can be easily implemented using the iterative shooting algorithm. We conduct extensive numerical studies to evaluate the practical performance of our proposed method and demonstrate it using real studies.

Gene Expression Analyses in Evaluating Translational Biomarkers from Drug-Induced Idiopathic Pulmonary Fibrosis in Animal Models
Wenfei Zhang, Yuefeng Lu, Tai-He Xia, Guillaume Wettstein, Jean-Pierre Bidouard and Xavier Marniquet
Sanofi-aventis US LLC
wenfeizhang@sanofi.com
Translational biomarkers are markers that produce biological signals translatable from animal models to human models. Identifying translational biomarkers can be important for disease diagnosis, prognosis, and risk prediction in drug development. Therefore, there is a growing demand for statistical analyses of biomarker data, especially for large and complex genetic data. To ensure the quality of statistical analyses, we develop a statistical analysis pipeline



for gene expression data. When the pipeline is applied to gene expression data from drug-induced idiopathic pulmonary fibrosis in animal models, it shows some interesting results in evaluating the translatability of genes through comparisons with human models.

DNA Methylation, Cell-Type Distribution, and EWAS
E. Andres Houseman
Oregon State University
andres.houseman@oregonstate.edu
Epigenetic processes form the principal mechanisms by which cell differentiation occurs. Consequently, DNA methylation measurements are strongly influenced by the DNA methylation profiles of constituent cell types, as well as by their mixing proportions. Epigenome-wide association studies (EWAS) aim to find associations of phenotype or exposure with DNA methylation at single CpG dinucleotides, but these associations are potentially confounded by associations with overall cell-type distribution. In this talk, we review the literature on epigenetics and cell mixture. We then present two techniques for mixture-adjusted EWAS: the first requires a reference data set, which may be expensive or infeasible to collect, while the other is free of this requirement. Finally, we provide several data analysis examples using these techniques.
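The reference-based adjustment mentioned first can be sketched as a constrained regression: each sample's methylation profile is projected onto reference profiles of purified cell types to estimate mixing proportions. This is a minimal sketch with synthetic data of the general reference-based idea, not the exact published estimator.

```python
# Sketch: reference-based cell-mixture estimation by non-negative least
# squares.  R holds methylation profiles of purified cell types (rows =
# CpGs, columns = cell types); a sample's profile is modeled as R @ w
# with non-negative mixing proportions w.  Synthetic data; a schematic
# of the reference-based idea, not the exact published estimator.
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(1)
n_cpg, n_types = 200, 3
R = rng.uniform(0, 1, size=(n_cpg, n_types))      # reference profiles
w_true = np.array([0.5, 0.3, 0.2])                # true cell proportions
sample = R @ w_true                               # noiseless mixed sample

w_hat, _ = nnls(R, sample)                        # non-negative LS fit
w_hat = w_hat / w_hat.sum()                       # normalize to proportions
print(np.round(w_hat, 3))
```

With noiseless data the proportions are recovered exactly; in an EWAS the estimated proportions would then enter the association model as adjustment covariates.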

Secondary Quantile Analysis for GWAS
Ying Wei1, Xiaoyu Song1, Mengling Liu2 and Iuliana Ionita-Laza1
1Columbia University
2New York University
yw2148@columbia.edu
Case-control designs are widely used in epidemiology and other fields to identify factors associated with a disease of interest. These studies can also be used to study the associations of risk factors with secondary outcomes, such as biomarkers of the disease, and provide a cost-effective way to understand disease mechanisms. Most of the existing methods have focused on inference on the mean of secondary outcomes. In this paper, we propose a quantile-based approach. We construct a new family of estimating equations to make consistent and efficient estimation of conditional quantiles using the case-control sample, and also develop tools for statistical inference. Simulations are conducted to evaluate the practical performance of the proposed approach, and a case-control study on genetic association with asthma is used to demonstrate the method.
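The conditional-quantile building block behind this approach can be sketched with the standard check-loss linear program; this is generic quantile regression on synthetic data, not the authors' case-control-adjusted estimating equations.

```python
# Sketch: conditional-quantile estimation by minimizing the check
# (pinball) loss, written as a linear program.  This is the generic
# quantile-regression building block, not the authors' case-control-
# adjusted estimating equations; data are synthetic.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(2)
n, tau = 60, 0.5
x = np.linspace(0, 1, n)
y = 1.0 + 2.0 * x + rng.normal(0, 0.2, n)         # median line: 1 + 2x
X = np.column_stack([np.ones(n), x])              # design with intercept
p = X.shape[1]

# Variables: [beta (free), u (>=0), v (>=0)] with y - X beta = u - v,
# so the objective tau*u + (1-tau)*v equals the check loss.
c = np.concatenate([np.zeros(p), tau * np.ones(n), (1 - tau) * np.ones(n)])
A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
bounds = [(None, None)] * p + [(0, None)] * (2 * n)
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
beta = res.x[:p]
print(np.round(beta, 2))
```

Changing `tau` fits other conditional quantiles; the paper's contribution is weighting such equations so they remain consistent under case-control sampling.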

Session 7 Statistical Inference of Complex Associations in High-Dimensional Data

Leveraging for Big Data Regression
Ping Ma
University of Georgia
pingma@uga.edu
Advances in science and technology in the past few decades have led to big data challenges across a variety of fields. Extraction of useful information and knowledge from big data has become a daunting challenge to both the science community and entire society. Tackling this challenge requires major breakthroughs in efficient computational and statistical approaches to big data analysis. In this talk, I will present some leveraging algorithms, which make a key contribution to resolving the grand challenge. In these algorithms, by sampling a very small representative sub-dataset using smart algorithms, one can effectively extract relevant information of vast data sets from the small sub-dataset. Such algorithms are scalable to big data. These efforts allow pervasive access to big data

analytics, especially for those who cannot directly use supercomputers. More importantly, these algorithms enable masses of ordinary users to analyze big data using tablet computers.
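The core leveraging idea, sampling rows with probability proportional to their leverage scores and solving a reweighted least squares on the subsample, can be sketched in a few lines; this toy is a schematic of the general technique, not the speaker's exact algorithm.

```python
# Sketch: algorithmic leveraging for linear regression.  Compute row
# leverage scores from a thin SVD, sample a small subset of rows with
# probability proportional to leverage, and solve a reweighted least
# squares on the subsample.  Toy data; a schematic of the leveraging
# idea, not the speaker's exact algorithm.
import numpy as np

rng = np.random.default_rng(3)
n, p, r = 5000, 3, 500                       # full size, dims, subsample size
X = rng.standard_normal((n, p))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(0, 1.0, n)

U, _, _ = np.linalg.svd(X, full_matrices=False)
lev = (U ** 2).sum(axis=1)                   # leverage scores, sum to p
pi = lev / lev.sum()                         # sampling probabilities

idx = rng.choice(n, size=r, replace=True, p=pi)
w = 1.0 / (r * pi[idx])                      # inverse-probability weights
Xw = X[idx] * np.sqrt(w)[:, None]            # reweighted subsample
yw = y[idx] * np.sqrt(w)
beta_lev, *_ = np.linalg.lstsq(Xw, yw, rcond=None)

beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(beta_lev, 2), np.round(beta_full, 2))
```

The subsample of 500 rows recovers coefficients close to the full 5000-row fit; on truly big data only the subsample ever needs to be held in memory (leverage scores themselves would be approximated rather than computed from a full SVD).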

Reference-free Metagenomics Analysis Using Matrix Factorization
Wenxuan Zhong and Xin Xing
University of Georgia
wenxuan@uga.edu

Metagenomics refers to the study of a collection of genomes, typically microbial genomes, present in a sample. The sample itself can come from diverse sources depending on the study, e.g., a sample from the gastrointestinal tract of a human patient, or a sample of soil from a particular ecological origin. The premise is that by understanding the genomic composition of the sample, one can form hypotheses about properties of the sample, e.g., disease correlates of the patient or ecological health of the soil source. Existing methods are limited in complex metagenome studies by considering the similarity between some short DNA fragments and genomes in a database. In this talk, I will introduce a reference-free genome deconvolution algorithm that can simultaneously estimate the composition of a microbial community and the quantity of each species. Some theoretical results on the deconvolution method will also be discussed.
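A matrix-factorization view of the problem can be sketched with off-the-shelf NMF: a nonnegative abundance matrix is factored into per-sample mixing proportions and latent species signatures. The data are synthetic and low-rank, and NMF here stands in for the talk's reference-free deconvolution algorithm rather than reproducing it.

```python
# Sketch: reference-free decomposition of a (samples x features) count
# matrix into mixing proportions and latent "species" signatures via
# non-negative matrix factorization.  Synthetic low-rank data; NMF here
# stands in for the deconvolution algorithm described in the talk.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(4)
n_samples, n_features, k = 50, 120, 3
W_true = rng.dirichlet(np.ones(k), size=n_samples)    # community compositions
H_true = rng.uniform(0, 5, size=(k, n_features))      # species signatures
V = W_true @ H_true                                   # observed abundance matrix

model = NMF(n_components=k, init="nndsvda", max_iter=1000, random_state=0)
W = model.fit_transform(V)                            # estimated mixing weights
H = model.components_                                 # estimated signatures
W_prop = W / W.sum(axis=1, keepdims=True)             # normalize to proportions

rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {rel_err:.4f}")
```

Each row of `W_prop` is an estimated community composition for one sample; real metagenomic counts would of course be noisy and much higher-dimensional.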

Big Data, Big Models, Big Problems: Statistical Principles and Practice at Scale
Alexander W. Blocker
Google
awblocker@google.com

Massive datasets can yield great insights, but only when united with sound statistical principles and careful computation. We share lessons from a set of problems in industry, all of which combine classical design and theory with large-scale computation. Simply obtaining reliable confidence intervals means grappling with complex dependence and distributed systems, and obtaining masses of additional data can actually degrade estimates without careful inference and computation. These problems highlight the opportunities for statisticians to provide a distinct contribution to the world of big data.

Session 8 Recent Developments in Survival Analysis

Bayesian Joint Modeling of Multi-dimensional Longitudinal and Survival Data with Applications to Cancer Clinical Trials
Ming-Hui Chen1, Danjie Zhang1, Joseph G. Ibrahim2, Mark E. Boye3 and Wei Shen3
1University of Connecticut
2University of North Carolina
3Eli Lilly and Company
ming-hui.chen@uconn.edu

Motivated by the large phase III multicenter randomized single-blind EMPHACIS mesothelioma clinical trial, we develop a class of shared-parameter joint models for multi-dimensional longitudinal and survival data. Specifically, we propose a class of multivariate mixed effects regression models for multi-dimensional longitudinal measures, and a class of frailty and cure rate survival models for progression-free survival (PFS) time and overall survival (OS) time. The properties of the proposed models are examined in detail. In addition, we derive the decomposition of the logarithm of the pseudo marginal likelihood (LPML) (i.e., LPML =



LPML_Long + LPML_Surv|Long) to assess the fit of each component of the joint model, and in particular to assess the fit of the longitudinal component and the survival component of the joint model separately, and further use ∆LPML to determine the importance and contribution of the longitudinal data to the model fit of the survival data. Moreover, efficient Markov chain Monte Carlo sampling algorithms are developed to carry out posterior computation. We apply the proposed methodology to a detailed case study in mesothelioma.

Estimating Risk with Time-to-Event Data: An Application to the Women's Health Initiative
Dandan Liu1, Yingye Zheng2, Ross Prentice2 and Li Hsu2
1Vanderbilt University
2Fred Hutchinson Cancer Research Center
dandan.liu@vanderbilt.edu

Accurate and individualized risk prediction is critical for population control of chronic diseases such as cancer and cardiovascular disease. Large cohort studies provide valuable resources for building risk prediction models, as the risk factors are collected at baseline and subjects are followed over time until disease occurrence or termination of the study. However, for rare diseases the baseline risk may not be estimated reliably from cohort data alone due to sparse events. In this paper, we propose to make use of external information to improve efficiency for estimating time-dependent absolute risk. We derive the relationship between external disease incidence rates and the baseline risk, and incorporate the external disease incidence information into the estimation of absolute risks, while allowing for potential differences in disease incidence rates between the cohort and external sources. The asymptotic distributions of the proposed estimators are established. Simulation results show that the proposed estimator for absolute risk is more efficient than that based on the Breslow estimator, which does not utilize external disease incidence rates. A large cohort study, the Women's Health Initiative Observational Study, is used to illustrate the proposed method.

Efficient Estimation of Nonparametric Genetic Risk Function with Censored Data
Yuanjia Wang1, Baosheng Liang2 and Donglin Zeng3
1Columbia University
2Beijing Normal University
3University of North Carolina at Chapel Hill
yw2016@columbia.edu

With an increasing number of causal genes discovered for Mendelian and complex human disorders, it is important to assess the genetic risk distribution functions of disease onset for subjects who are carriers of these causal mutations, and compare them with the disease distribution in non-carriers. In many genetic epidemiological studies of the genetic risk functions, the disease onset information is subject to censoring. In addition, subjects' mutation carrier or non-carrier status is unknown due to the cost of ascertaining subjects to collect DNA samples, or due to death in older subjects (especially for late-onset diseases). Instead, the probability of subjects' genetic marker or mutation status can be obtained from various sources. When genetic status is missing, the available data take the form of mixture censored data. Recently, various methods have been proposed in the literature using parametric, semiparametric, and nonparametric models to estimate the genetic risk distribution functions from such data. However, none of the existing approaches is efficient in the presence of censoring and mixture, and the computation for some methods is demanding. In this paper, we propose a sieve maximum likelihood estimation which is fully efficient for inferring genetic risk distribution functions nonparametrically. Specifically, we estimate the logarithm of hazard ratios between genetic risk groups using B-splines, while applying nonparametric maximum likelihood estimation (NPMLE) for the reference baseline hazard function. Our estimator can be calculated via an EM algorithm, and the computation is much faster than for the existing methods. Furthermore, we establish the asymptotic distribution of the obtained estimator and show that it is consistent and semiparametric efficient, and thus the optimal estimator in this framework. The asymptotic theory for our sieve estimator sheds light on optimal estimation for censored mixture data. Simulation studies demonstrate superior performance of the proposed method in small finite samples. The method is applied to estimate the distribution of Parkinson's disease (PD) age at onset for carriers of mutations in the leucine-rich repeat kinase 2 (LRRK2) G2019S gene, using data from the Michael J. Fox Foundation Ashkenazi Jewish LRRK2 consortium. This estimation is important for genetic counseling purposes, since this test is commercially available, yet genetic risk (penetrance) estimates have been variable.

Support Vector Hazard Regression for Predicting Event Times Subject to Censoring
Xiaoxi Liu1, Yuanjia Wang2 and Donglin Zeng1
1University of North Carolina
2Columbia University
dzeng@email.unc.edu
Predicting dichotomous or continuous disease outcomes using powerful machine learning approaches has been studied extensively in various scientific areas. However, how to learn prediction rules for time-to-event outcomes subject to right censoring has received little attention until very recently. Existing approaches rely on inverse probability weighting or rank-based methods, which are inefficient. In this paper, we develop a novel support vector hazards regression (SVHR) approach to predict time-to-event outcomes using right-censored data. Our method is based on predicting the counting process via a series of support vector machines for time-to-event outcomes among subjects at risk. Introducing counting processes to represent the time-to-event data leads to an intuitive connection of the method with support vector machines in standard supervised learning and hazard regression models in standard survival analysis. The resulting optimization is a convex quadratic programming problem that can easily incorporate non-linearity using kernel machines. We demonstrate an interesting connection of the profiled empirical risk function with the Cox partial likelihood, which sheds light on the optimality of SVHR. We formally show that SVHR is optimal in discriminating the covariate-specific hazard function from the population average hazard function, and establish the consistency and learning rate of the predicted risk. Simulation studies demonstrate much improved prediction accuracy of the event times using SVHR compared to existing machine learning methods. Finally, we apply our method to analyze data from two real-world studies to demonstrate the superiority of SVHR in practical settings.

Session 9 Industry Practice and Regulatory Pathway for Benefit-Risk Assessment of Medicinal Products

Visual Communication and Assessment of Benefit-Risk for Medical Products
Jonathan D. Norton
MedImmune
nortonj@medimmune.com



Benefit-risk assessments are multidimensional and hence challenging both to formulate and to communicate. A particular limitation of some benefit-risk graphics is that they are based on the marginal distributions of benefit and harm, and do not show the degree to which they occur in the same patients. Consider, for example, an imaginary drug that is beneficial to 50% of patients. At the 2010 ICSA Symposium, the speaker introduced a graphic showing the benefit-risk state of each subject over time. This talk will include a new graphic, based on similar principles, that is intended for early-phase studies. It allows the user to assess the joint distribution of benefit and harm on the individual and cohort levels. The speaker will also review other graphical displays that may be effective for benefit-risk assessment, considering accepted principles of statistical graphics and his experience working for FDA and industry.

Some Thoughts on Evaluation of Uncertainty in Benefit-Risk Assessment
Qi Jiang1, Haijun Ma1, Christy Chuang-Stein2, Weili He3, George Quartey4, John Scott5 and Shihua Wen6
1Amgen Inc.
2Pfizer Inc.
3Merck & Co.
4Hoffmann-La Roche
5United States Food and Drug Administration
6AbbVie Inc.
qjiang@amgen.com
Increasingly, companies, regulatory agencies, and other governance bodies are moving toward structured benefit-risk assessment approaches. One issue that complicates such structured approaches is uncertainty, which comes from multiple sources and needs to be addressed. To develop potential approaches to address these sources of uncertainty, it is critical first to have a thorough understanding of them. In this presentation, members of the Benefit-Risk Working Group of the Quantitative Sciences in the Pharmaceutical Industry (QSPI BRWG) will discuss some major sources of uncertainty and share some thoughts on how to address them.

Current Concept of Benefit-Risk Assessment of Medicine
Syed S. Islam
AbbVie Inc.
syedislam@abbvie.com
Benefit-risk assessment of a medicine should be as dynamic as the stages of drug development and the life cycle of a drug. Three fundamental clinical concepts are critical at all stages: the seriousness of the disease, how much improvement will occur due to the drug under consideration, and harmful effects, including their frequency, seriousness, and duration. One has to achieve a desirable balance among these, particularly prior to market approval, and follow up prospectively to see that the balance is maintained. The desirable balance is not a straightforward concept; it depends on judgment by various stakeholders. The patients, who are the direct beneficiaries of the medicine, should be the primary stakeholders, provided adequate, clear, and concise information is available to them. The healthcare providers must have similar information that they can communicate to their patients. The regulators and insurers are also stakeholders, for different reasons. The industry developing or producing the drug must provide adequate and transparent information usable by all stakeholders. Any quantitative approach to integrated benefit-risk balance should be parsimonious and transparent, along with sensitivity analyses. This presentation will discuss the pros and cons of a dynamic benefit-risk assessment and how integrated benefit-risk

analyses can be incorporated within the FDA/EMA framework that includes patient preference.

Session 10 Analysis of Observational Studies and Clinical Trials

Impact of Tuberculosis on Mortality Among HIV-Infected Patients Receiving Antiretroviral Therapy in Uganda: A Case Study in Propensity Score Analysis
Rong Chu1, Edward J. Mills2, Joseph Beyene3, Eleanor Pullenayegum4, Celestin Bakanda5, Jean B. Nachega6 and Lehana Thabane3
1Agensys Inc. (Astellas)
2University of Ottawa / McMaster University
3McMaster University
4McMaster University / University of Toronto
5The AIDS Support Organization
6Stellenbosch University
rongchu@agensys.com
Background: Tuberculosis (TB) disease affects survival among HIV co-infected patients on antiretroviral therapy (ART). Yet the magnitude of TB disease on mortality is poorly understood.
Methods: Using a prospective cohort of 22,477 adult patients who initiated ART between August 2000 and June 2009 in Uganda, we assessed the effect of active pulmonary TB disease at the initiation of ART on all-cause mortality using a Cox proportional hazards model. Propensity score (PS) matching was used to control for potential confounding. Stratification and covariate adjustment for PS, and non-PS-based multivariable Cox models, were also performed.
Results: A total of 1,609 (7.52%) patients had active pulmonary TB at the start of ART. TB patients had higher proportions of being male, suffering from AIDS-defining illnesses, having World Health Organization (WHO) disease stage III or IV, and having lower CD4 cell counts at baseline (p < 0.001). The percentages of death during follow-up were 10.47% and 6.38% for patients with and without TB, respectively. The hazard ratio (HR) for mortality comparing TB to non-TB patients using 1,686 PS-matched pairs was 1.37 (95% confidence interval [CI]: 1.08-1.75), less marked than the crude estimate (HR = 1.74, 95% CI: 1.49-2.04). The other PS-based methods and the non-PS-based multivariable Cox model produced similar results.
Conclusions: After controlling for important confounding variables, HIV patients who had TB at the initiation of ART in Uganda had an approximately 37% increased hazard of overall mortality relative to non-TB patients.
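The PS workflow in the Methods can be sketched on simulated data: fit a logistic-regression propensity score, match each treated unit to its nearest control, and compare outcomes. This is a schematic only (continuous outcome, invented data), not the study's Cox analysis of survival times.

```python
# Sketch: propensity-score matching with a logistic-regression score and
# 1:1 nearest-neighbor matching with replacement.  Simulated confounded
# data; a schematic of the PS workflow in the abstract, not the study's
# actual analysis (which used Cox models on survival outcomes).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
n = 2000
x = rng.standard_normal(n)                       # confounder
p_treat = 1 / (1 + np.exp(-1.5 * x))             # treatment depends on x
t = rng.binomial(1, p_treat)
y = 2.0 * t + 1.0 * x + rng.normal(0, 1, n)      # true effect = 2.0

naive = y[t == 1].mean() - y[t == 0].mean()      # confounded estimate

ps = LogisticRegression().fit(x[:, None], t).predict_proba(x[:, None])[:, 1]
treated, controls = np.where(t == 1)[0], np.where(t == 0)[0]
# For each treated unit, match the control with the closest score.
match = controls[np.abs(ps[treated][:, None] - ps[controls][None, :]).argmin(axis=1)]
matched = y[treated].mean() - y[match].mean()

print(f"naive: {naive:.2f}  matched: {matched:.2f}  truth: 2.00")
```

As in the study's Results, the matched estimate is pulled back toward the truth relative to the crude comparison because matching balances the confounder across the two groups.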

Ecological Momentary Assessment Methods to Increase Response and Adjust for Attrition in a Study of Middle School Students' Exposure to Alcohol Advertising
Steven Martino, Rebecca Collins, Stephanie Kovalchik, Kirsten Becker, Elizabeth D'Amico, William Shadel and Marc Elliott
RAND Corporation
skovalch@rand.org
Ecological momentary assessment (EMA) is a new approach for collecting data about repeated exposures in natural settings that has become more practical with the growth of mobile technologies. EMA has the potential to reduce recall bias. However, because EMA occurs more often and frequently than traditional surveys, missing data are common. In this paper, we describe the design and preliminary results of a longitudinal EMA study of exposure to alcohol advertising among middle school students (n=600)



which employed a randomized missing design to increase response rates to smartphone surveys. Early results (n=125) show evidence of attrition over the 14-day collection period, which was not associated with student characteristics but was associated with study day. We develop a prediction model for non-response and adjust for attrition in exposure summaries using inverse probability weighting. Attrition-adjusted estimates suggest that youths saw an average of 3.8 alcohol ads per day, over twice what has been previously reported with conventional assessment. Corrected for attrition, EMA may allow more accurate estimation of frequent exposures than one-time delayed recall.
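The attrition adjustment can be sketched as follows: estimate the day-specific response probability, then weight each observed report by its inverse. All numbers below are invented, and the empirical per-day response rate stands in for the study's richer non-response prediction model.

```python
# Sketch: inverse-probability weighting for day-dependent attrition.
# Exposure drifts upward over a 14-day EMA window while response rates
# fall, so the unweighted mean of observed reports is biased toward
# early days; weighting each report by 1/p_hat(day) corrects this.
# All numbers are invented for illustration.
import numpy as np

rng = np.random.default_rng(6)
n_subj, n_days = 400, 14
day = np.tile(np.arange(1, n_days + 1), (n_subj, 1))
exposure = rng.poisson(1.0 + 0.2 * day)          # true daily ad exposures
true_mean = 1.0 + 0.2 * (1 + n_days) / 2         # population mean/day = 2.5

p_resp = np.clip(1.0 - 0.05 * day, 0.05, 1.0)    # response prob falls with day
observed = rng.binomial(1, p_resp).astype(bool)

naive = exposure[observed].mean()                # biased toward early days

p_hat = observed.mean(axis=0)                    # empirical response rate by day
W = np.broadcast_to(1.0 / p_hat, (n_subj, n_days))
ipw = np.average(exposure[observed], weights=W[observed])

print(f"truth {true_mean:.2f}  naive {naive:.2f}  IPW {ipw:.2f}")
```

Because later (higher-exposure) days are underrepresented among completed surveys, the naive mean undershoots; the weighted mean recovers the target, mirroring why the attrition-adjusted exposure estimate in the abstract exceeds the raw one.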

Is Poor Antisaccade Performance in Healthy First-Degree Relatives of Schizophrenics an Artifact of Study Design?
Charity J. Morgan1, Mark F. Lenzenweger2 and Deborah L. Levy3
1University of Alabama at Birmingham
2State University of New York at Binghamton
3McLean Hospital
cjmorgan@uab.edu
A number of traits associated with schizophrenia aggregate in relatives of schizophrenia patients at rates much higher than that of the clinical disorder. These traits, considered candidate endophenotypes, may be alternative, more penetrant manifestations of schizophrenia risk genes than schizophrenia itself. Performance on the antisaccade task, a measure of eye-tracking dysfunction, is one of the most widely studied candidate endophenotypes. However, there is little consensus on whether poor antisaccade performance is a true endophenotype for schizophrenia. Some studies comparing the performance of healthy relatives of schizophrenia patients (RelSZ) to that of normal controls (NC) report that RelSZ show significantly more errors, while others find no statistically significant differences between the two groups. A recent meta-analysis of these studies noted that some studies used stricter exclusion criteria for NC than for RelSZ, and found these studies were more likely to find significant effect sizes. Specifically, NC in these studies with a personal or family history of psychopathology were excluded, whereas all RelSZ, including those with psychotic conditions, were included. In order to determine whether a difference in antisaccade performance between NC and RelSZ remains after controlling for differences in psychopathology, we fit a binomial regression model to data from an antisaccade task. We demonstrate that both psychopathology and familial history affect antisaccade performance.

Analysis of a Vaccine Study in Animals Using Mitigated Fraction in SAS
Mathew Rosales
Experis
mattrosales@experis.com
Mitigated fraction is frequently used to evaluate the effect of an intervention in reducing the severity of a particular outcome, and is a common measure in vaccine studies. It utilizes the ranks of the observations and measures the overlap of the two distributions using their stochastic ordering. Percent lung involvement is a common endpoint in vaccine studies to assess efficacy, and the mitigated fraction is used to estimate the relative increase in the probability that disease will be less severe in the vaccinated group. A SAS macro was developed to estimate the mitigated fraction and its confidence interval. The macro provides an asymptotic confidence interval and a bootstrap-based interval. For illustration, an actual vaccine study is used where the macro was utilized to generate the estimates.
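A Python counterpart to what the macro computes might look like the following, assuming the usual definition MF = 2U/(n1 n2) - 1, with U the Mann-Whitney-type count of pairs in which the control outcome is more severe (ties counted 1/2); the severity scores here are invented.

```python
# Sketch: the mitigated fraction and a bootstrap percentile interval,
# mirroring the quantities the SAS macro reports.  MF = 2U/(n1*n2) - 1,
# where U counts (control, vaccinated) pairs in which the control
# outcome is more severe (ties count 1/2).  Severity scores are invented.
import numpy as np

def mitigated_fraction(control, vaccinated):
    c = np.asarray(control, float)[:, None]
    v = np.asarray(vaccinated, float)[None, :]
    u = (c > v).sum() + 0.5 * (c == v).sum()     # Mann-Whitney-type count
    return 2.0 * u / (c.size * v.size) - 1.0

rng = np.random.default_rng(7)
control = np.array([22.0, 18.0, 35.0, 40.0, 28.0, 31.0])   # % lung involvement
vaccinated = np.array([5.0, 12.0, 9.0, 20.0, 3.0, 15.0])

mf = mitigated_fraction(control, vaccinated)
boot = np.array([
    mitigated_fraction(rng.choice(control, control.size),
                       rng.choice(vaccinated, vaccinated.size))
    for _ in range(2000)
])
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"MF = {mf:.3f}, 95% bootstrap CI ({lo:.3f}, {hi:.3f})")
```

An MF near 1 indicates that disease in vaccinated animals is almost always less severe than in controls; an MF near 0 indicates overlapping severity distributions.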

Competing Risks Survival Analysis for Efficacy Evaluation of Some-or-None Vaccines
Paul T. Edlefsen
Fred Hutchinson Cancer Research Center
pedlefse@fhcrc.org

Evaluation of a vaccine's efficacy to prevent a specific type of infection endpoint, in the context of multiple endpoint types, is an important challenge in biomedicine. Examples include evaluation of multivalent vaccines, such as the annual influenza vaccines that target multiple strains of the pathogen. While statistical methods have been developed for "mark-specific vaccine efficacy" (where the term "mark" refers to a feature of the endpoint, such as its type, in contrast to a covariate of the subject), these methods address only vaccines that have a "leaky" vaccine mechanism, meaning that the vaccine's effect is to reduce the per-exposure probability of infection. The usual presentation of vaccine mechanisms contrasts "leaky" with "all-or-none" vaccines, which completely protect some fraction of the subjects independent of the number of exposures that each subject experiences. We introduce the notion of the "some-or-none" vaccine mechanism, which completely protects a fraction of the subjects from a defined subset of the possible endpoint marks: for example, a flu vaccine that completely protects against the seasonal flu but has no effect against the H1N1 strain. Under conditions of non-harmful vaccines, we introduce a framework and Bayesian and frequentist methods to detect and quantify the extent to which a vaccine's partial efficacy is attributable to uneven efficacy across the marks, rather than to incomplete "take" of the intervention. These new methods provide more power than existing methods to detect mark-varying efficacy (also called "sieve effects") when the conditions hold. We demonstrate the new framework and methods with simulation results and with new analyses of genetic signatures of vaccine effects in the RV144 HIV-1 vaccine efficacy trial.

Using Historical Data to Automatically Identify Air-Traffic Controller Behavior
Yuefeng Wu
University of Missouri at St. Louis
wuyue@umsl.edu

The Next Generation Air Traffic Control Systems are trajectory-based automation systems that rely on predictions of future states of aircraft, instead of just using human abilities, as the National Airspace System (NAS) does now. As automation relying on trajectories becomes more safety-critical, the accuracy of these predictions needs to be fully understood. Also, it is very important for researchers developing future automation systems to understand, and in some cases mimic, how current operations are conducted by human controllers, to ensure that the new systems are at least as efficient as humans and to understand creative solutions used by human controllers. The work to be presented answers both of these questions by developing statistical machine learning models to characterize the types of errors present when using current systems to predict future aircraft states. The models are used to infer situations in the historical data where an air-traffic controller intervened on an aircraft's route, even when there is no direct recording of this action. Local time series models and some other statistics are calculated to construct the feature vector; then both a naive Bayes classifier and a support vector machine are used to learn the pattern of the prediction errors.
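The classification step can be sketched end to end: summarize each prediction-error series by a small feature vector, then train the two classifiers named in the abstract. The mean-shift "intervention" signal and all features below are invented stand-ins for the real NAS data.

```python
# Sketch: flagging probable controller interventions from trajectory-
# prediction errors.  Each error series is summarized by a small feature
# vector and fed to a naive Bayes classifier and an SVM, as in the talk;
# the series are simulated (an intervention is mimicked by a sustained
# mean shift), not real NAS data.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(8)

def error_series(intervened, length=60):
    e = rng.normal(0, 1, length)
    if intervened:                      # route change -> sustained error shift
        e[length // 2:] += 3.0
    return e

def features(e):
    # Toy stand-ins for the "local time series models and other statistics".
    return [e.mean(), e.std(), np.abs(np.diff(e)).max()]

X = np.array([features(error_series(i % 2 == 1)) for i in range(400)])
y = np.arange(400) % 2
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

nb_acc = GaussianNB().fit(X_tr, y_tr).score(X_te, y_te)
svm_acc = SVC(kernel="rbf").fit(X_tr, y_tr).score(X_te, y_te)
print(f"naive Bayes: {nb_acc:.2f}  SVM: {svm_acc:.2f}")
```

On real data the payoff is labeling historical flights where an intervention likely occurred even though no explicit record of the controller's action exists.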



Session 11 Lifetime Data Analysis

Analysis of Multiple Type Recurrent Events When Only Partial Information Is Available for Some Subjects
Min Zhan and Jeffery Fink
University of Maryland
mzhan@epi.umaryland.edu
In many longitudinal studies, subjects may experience multiple types of recurrent events. In some situations, the exact occurrence times of the recurrent events are not observed for some subjects. Instead, the only information available is whether these subjects experience each type of event in successive time intervals. We discuss marginal models to assess the effect of baseline covariates on the recurrent events. The proposed methods are applied to a clinical study of chronic kidney disease, in which subjects can experience multiple types of safety events repeatedly.

Cumulative Incidence Function under Two-Stage Randomization
Idil Yavuz1, Yu Cheng2 and Abdus Wahed2
1Dokuz Eylul University
2University of Pittsburgh
yucheng@pitt.edu
In recent years, personalized medicine and dynamic treatment regimens have drawn considerable attention. Dynamic treatment regimens are sets of rules that govern the treatment of subjects depending on their intermediate responses or covariates. Two-stage randomization is a useful set-up to gather data for making inference on such regimens. Meanwhile, more and more practitioners become aware of competing-risk censoring for event-type outcomes, where subjects in a study are exposed to more than one possible failure and the specific event of interest may be dependently censored by the occurrence of competing events. We aim to compare several treatment regimens from a two-stage randomized trial on survival outcomes that are subject to competing-risk censoring. In the presence of competing risks, the cumulative incidence function (CIF) has been widely used to quantify the cumulative probability of occurrence of the target event by a specific time point. However, if we only use the data from those subjects who have followed a specific treatment regimen to estimate the CIF, the resulting naive estimator may be biased. Hence, we propose alternative nonparametric estimators for the CIF using inverse weighting, and provide inference procedures based on the asymptotic linear representation. In addition, test procedures are developed to compare the CIFs from two different treatment regimens. Through simulation we show the practicality and advantages of the proposed estimators compared to the naive estimator. Since dynamic treatment regimens are widely used in treating cancer, AIDS, psychological disorders, and other illnesses that require complex treatment, and competing-risk censoring is common in studies with multiple endpoints, the proposed methods provide useful inferential tools to analyze such data and will help advocate research in personalized medicine.
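As background for readers unfamiliar with the CIF, a minimal sketch of the standard nonparametric (Aalen-Johansen-type) cumulative incidence estimator, the basic building block that inverse-weighted estimators refine, might look like this. The toy data are invented; the inverse-weighted estimators of the talk are not shown.

```python
def cumulative_incidence(times, causes, target=1):
    """Aalen-Johansen-type CIF for `target`; causes: 0 = censored, other
    integers index competing event types."""
    data = sorted(zip(times, causes))
    n = len(data)
    surv, cif, at_risk = 1.0, 0.0, n
    curve = []
    i = 0
    while i < n:
        t = data[i][0]
        d_target = sum(1 for tt, c in data if tt == t and c == target)
        d_any = sum(1 for tt, c in data if tt == t and c != 0)
        cif += surv * d_target / at_risk      # S(t-) * dN_target(t) / Y(t)
        surv *= 1 - d_any / at_risk           # overall survival update
        while i < n and data[i][0] == t:      # drop everyone observed at t
            i += 1
            at_risk -= 1
        curve.append((t, cif))
    return curve

# Toy data: failures from cause 1 at t = 1, 3 and from cause 2 at t = 2.
curve = cumulative_incidence([1, 2, 3], [1, 2, 1])
```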

Nonparametric Threshold Selection with Censored Survival Data
Xinhua Liu and Zhezhen Jin
Columbia University
zj7@columbia.edu
In biomedical research and practice, quantitative biomarkers are often used for diagnostic or prognostic purposes, with a threshold established on the measurement to aid binary classification. When prognosis is on survival time, a single threshold may not be informative. It is also challenging to select a threshold when the survival time is subject to random censoring. Using survival-time-dependent sensitivity and specificity, we extend the classification-accuracy-based objective function to allow for a survival-dependent threshold. To estimate the optimal threshold for a range of survival rates, we adopt a nonparametric procedure, which produces satisfactory results in a simulation study. The method will be illustrated with a real example.

Session 12 Safety Signal Detection and Safety Analysis

Evaluation of Statistical Methods for the Identification of Potential Safety Signals
Maggie Chen1, Li Zhu1, Padmaja Chiruvolu, Liying Zhang and Qi Jiang
Amgen Inc.
magchen@amgen.com

With the increased regulatory requirements for risk evaluation and minimization strategies, large volumes of comprehensive safety data have been collected and maintained by pharmaceutical sponsors, and proactive evaluation of such safety data for continuous assessment of product safety profile has become essential during the drug development life-cycle. This presentation will introduce several key statistical methodologies developed for safety signal screening and detection, including some methods recommended by regulatory agencies for spontaneous reporting data, as well as a few recently developed methodologies for clinical trials data. In addition, extensive simulation results will be presented to compare the performance of these methods in terms of sensitivity and false discovery rate. Conclusions and recommendations will be briefed as well.
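One widely used disproportionality statistic for spontaneous reporting data is the proportional reporting ratio (PRR), with the reporting odds ratio (ROR) a close relative. The abstract does not specify which methods are compared, so the sketch below shows these two only as representative examples.

```python
def prr(a, b, c, d):
    """Proportional reporting ratio from a 2x2 drug-by-event table:
    a = target event, drug of interest;  b = other events, drug of interest;
    c = target event, all other drugs;   d = other events, all other drugs."""
    return (a / (a + b)) / (c / (c + d))

def ror(a, b, c, d):
    """Reporting odds ratio from the same 2x2 table."""
    return (a / b) / (c / d)
```

For instance, 20 reports of the event among 100 for the drug against 100 among 1000 for all other drugs gives a PRR of 2.0, a doubling of the reporting proportion.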

Application of a Bayesian Method for Blinded Safety Monitoring and Signal Detection in Clinical Trials
Shihua Wen, Jyotirmoy Dey, Greg Ball and Karolyn Kracht
AbbVie Inc.
shihuawen@abbvie.com

Monitoring patient safety is an indispensable component of clinical trial planning and conduct. Proactive blinded safety monitoring and signal detection in on-going clinical trials enables pharmaceutical sponsors to monitor patient safety closely and at the same time maintain the study blind. Bayesian methods, by their nature of updating knowledge based on accumulating data, provide an excellent framework for carrying out such a safety monitoring process. This presentation will provide a step-by-step illustration of how several Bayesian models, such as the beta-binomial model, the Poisson-gamma model, and the posterior probability vs. predictive probability criterion, can be applied to safety monitoring for a particular adverse event of special interest (AESI) in a real clinical trial setting under various adverse event occurrence patterns.
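The beta-binomial piece can be made concrete: under a Beta(a, b) prior on the AESI rate and a binomial likelihood, the posterior is again a Beta, and the posterior probability that the rate exceeds a threshold can be estimated by Monte Carlo. A minimal sketch, with all counts, thresholds, and the flat prior chosen hypothetically rather than taken from the talk:

```python
import random

def posterior_prob_exceeds(events, n, p0, a=1.0, b=1.0, draws=100_000, seed=1):
    """P(event rate > p0 | data) under a Beta(a, b) prior: the posterior is
    Beta(a + events, b + n - events); estimate the tail area by Monte Carlo."""
    rng = random.Random(seed)
    post_a, post_b = a + events, b + n - events
    hits = sum(rng.betavariate(post_a, post_b) > p0 for _ in range(draws))
    return hits / draws
```

A monitoring rule might flag the AESI when this posterior probability crosses a pre-specified cutoff, e.g. 0.90.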

Some Thoughts on the Choice of Metrics for Safety Evaluation
Steven Snapinn
Amgen Inc.
ssnapinn@amgen.com

The magnitude of the treatment effect on adverse events can be assessed on a relative scale, such as the hazard ratio or the relative risk, or on an absolute scale, such as the risk difference, but there doesn't appear to be any consistency regarding which metric should be used in any given situation. In this presentation I will provide some examples where different metrics have been used, discuss their advantages and disadvantages, and provide a suggested approach.
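The two scales can be contrasted with a toy calculation (the counts below are hypothetical, not from the talk). Note how the same relative risk can correspond to very different risk differences as the background rate changes, which is one reason the choice of metric matters.

```python
def risk_metrics(events_trt, n_trt, events_ctl, n_ctl):
    """Treatment effect on an adverse event on both scales discussed:
    relative risk (relative scale) and risk difference (absolute scale)."""
    r1, r0 = events_trt / n_trt, events_ctl / n_ctl
    return {"relative_risk": r1 / r0, "risk_difference": r1 - r0}
```

For example, `risk_metrics(10, 100, 5, 100)` gives a relative risk of 2.0 with a risk difference of 0.05, while `risk_metrics(2, 1000, 1, 1000)` gives the same relative risk of 2.0 with a risk difference of only 0.001.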



Hypothesis Testing on Safety Data: A Recurrent Event Approach
Qi Gong1 and Liang Fang2
1Amgen Inc.
2Gilead Sciences
liangfang@gilead.com
As an important aspect of the clinical evaluation of an investigational therapy, safety data are routinely collected in clinical trials. To date, the analysis of safety data has largely been limited to descriptive summaries of incidence rates or contingency tables aiming to compare simple rates between treatment arms. Many have argued that this traditional approach fails to take into account important information, including severity, onset time, and multiple occurrences of a safety signal. In addition, premature treatment discontinuation due to excessive toxicity causes informative censoring and may lead to potential bias in the interpretation of safety outcomes. In this article, we propose a framework to summarize safety data with the mean frequency function and to compare safety events of interest between treatments with a generalized log-rank test, taking into account the aforementioned characteristics ignored in traditional analysis approaches. In addition, a multivariate generalized log-rank test to compare the overall safety profile of different treatments is proposed. In the proposed method, safety events are considered to follow a recurrent event process with a terminal event for each patient. The terminal event is modeled by a process of two types of competing risks: safety events of interest and other terminal events. Statistical properties of the proposed method are investigated via simulations. An application is presented with data from a phase II oncology trial.
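The mean frequency function has a simple nonparametric cousin, the Nelson-Aalen-type mean cumulative function for recurrent events. A minimal sketch of that standard estimator (shown as background; it ignores the terminal-event and competing-risk modeling described in the abstract, and the toy data are invented):

```python
def mean_cumulative_function(event_times, followup):
    """Nelson-Aalen-type estimate of the mean cumulative (frequency) function.
    event_times: one list of recurrence times per subject;
    followup: each subject's end-of-follow-up time."""
    all_events = sorted(t for subj in event_times for t in subj)
    curve, mcf = [], 0.0
    for t in all_events:
        at_risk = sum(1 for f in followup if f >= t)  # still under observation
        mcf += 1 / at_risk
        curve.append((t, mcf))
    return curve

# Two subjects followed to t = 10; one has events at 2 and 5, the other at 3.
curve = mean_cumulative_function([[2, 5], [3]], [10, 10])
```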

Session 13 Survival and Recurrent Event Data Analysis

Survival Analysis without Survival Data
Gary Chan
University of Washington
kcgchan@uw.edu
We show that relative mean survival parameters of a semiparametric log-linear model can be estimated using covariate data from an incident sample and a prevalent sample, even when there is no prospective follow-up to collect any survival data. Estimation is based on an induced semiparametric density ratio model for covariates from the two samples, and it shares the same structure as a logistic regression model for case-control data. Likelihood inference coincides with well-established methods for case-control data. We show two further related results. First, estimation of interaction parameters in a survival model can be performed using covariate information only from a prevalent sample, analogous to a case-only analysis. Furthermore, propensity score and conditional exposure effect parameters on survival can be estimated using only covariate data collected from incident and prevalent samples.

Semiparametric Estimation for the Additive Hazards Model with Left-Truncated and Right-Censored Data
Chiung-Yu Huang1 and Jing Qin2
1Johns Hopkins University
2National Institute of Allergy and Infectious Diseases
cyhuang@jhmi.edu
Survival data from prevalent cases collected under a cross-sectional sampling scheme are subject to left-truncation. When fitting an additive hazards model to left-truncated data, the conditional estimating equation method (Lin and Ying, 1994), obtained by modifying the risk sets to account for left-truncation, can be very inefficient, as the marginal likelihood of the truncation times is not used in the estimation procedure. In this paper, we use a pairwise pseudo-likelihood to eliminate nuisance parameters from the marginal likelihood and, by combining the marginal pairwise pseudo-score function and the conditional estimating function, propose an efficient estimator for the additive hazards model. The proposed estimator is shown to be consistent and asymptotically normally distributed, with a sandwich-type covariance matrix that can be consistently estimated. Simulation studies show that the proposed estimator is more efficient than its competitors. A data analysis illustrates the method.

Nonparametric Method for Data of Recurrent Infections after Hematopoietic Cell Transplantation
Chi Hyun Lee1, Xianghua Luo1, Chiung-Yu Huang2 and Todd DeFor1
1University of Minnesota
2Johns Hopkins University
luox0054@umn.edu
Infection is one of the most common complications after hematopoietic cell transplantation. It accounts for substantial morbidity and mortality among transplanted patients. Many patients experience infectious complications repeatedly over time. Existing statistical methods for recurrent gap time data typically assume that patients are enrolled due to the occurrence of an event of the same type as the recurrent event, or assume that all gap times, including the first gap, are identically distributed. Applying these methods to the post-transplant infection data by ignoring event types will inevitably lead to incorrect inferential results, because the time from the transplant to the first infection has a different biological meaning than the gap times between recurrent infections after the first infection occurs. Alternatively, one may only analyze data after the first infection to make the existing recurrent gap time methods applicable, but this introduces selection bias, because only patients who have experienced infections are included in the analysis. Other naive approaches may include using univariate survival analysis methods, e.g., the Kaplan-Meier method, on the first-infection-only data, or using bivariate serial event data methods on the data up to the second infections. Hence, all subsequent infection data beyond the first or the second infectious events will not be utilized in the analysis. These inefficient methods are expected to lead to decreased power. In this paper, we propose a nonparametric estimator of the joint distribution of the time from transplant to the first infection and the gap times between following infections, and a semiparametric regression model for studying the risk factors of infectious complications of the transplant patients. The proposed methods take into account the potentially different distributions of the two types of times (time from transplant to the first infection and the gap times between subsequent recurrent infections) and fully utilize the data of recurrent infections from patients. Asymptotic properties of the proposed estimators are established.

Session 14 Statistical Analysis on Massive Data from Point Processes

Identification of Synaptic Learning Rule from Ensemble Spiking Activities
Dong Song and Theodore W. Berger
University of Southern California
dsong@usc.edu



The brain represents and processes information with spikes. To understand the biological basis of brain functions, it is essential to model the spike train transformations performed by brain regions. Such a model can also be used as a computational basis for developing cortical prostheses that can restore lost cognitive function by bypassing the damaged brain regions. We formulate a three-stage strategy for such a modeling goal. First, we formulated a multiple-input, multiple-output (MIMO), physiologically plausible model for representing the nonlinear dynamics underlying spike train transformations. This model is equivalent to a cascade of a Volterra model and a generalized linear model. The model has been successfully applied to the hippocampal CA3-CA1 during learned behaviors. Secondly, we extend the model to nonstationary cases using a point-process adaptive filter technique. The resulting time-varying model captures how the MIMO nonlinear dynamics evolve with time when the animal is learning. Lastly, we seek to identify the learning rule that explains how the nonstationarity is formed as a consequence of the input-output flow that the brain region has experienced during learning.

Intrinsically Weighted Means and Non-Ergodic Marked Point Processes
Alexander Malinowski1, Martin Schlather1 and Zhengjun Zhang2
1University of Mannheim
2University of Wisconsin
zjz@stat.wisc.edu
Whilst the definition of characteristics such as the mean mark in a marked point process (MPP) setup is non-ambiguous for ergodic processes, several definitions of mark averages are possible and might be practically relevant in the stationary but non-ergodic case. We give a general approach via weighted means with possibly intrinsically given weights. We discuss estimators in this situation and show their consistency and asymptotic normality under certain conditions. We also suggest a specific choice of weights that has a minimal-variance interpretation under suitable assumptions.

Statistical Analysis for Unlabeled Data Objects
Ela Sienkiewicz and Haonan Wang
Colorado State University
sienkiew@stat.colostate.edu
This talk is motivated by a data set of brain neuron cells. Each neuron is modeled as an unlabeled data object with topological and geometric properties characterizing the branching structure, connectedness, and orientation of a neuron. This poses serious challenges, since traditional statistical methods for multivariate data rely on linear operations in Euclidean space. We develop two curve representations for each object and define the notion of percentiles based on measures of topological and geometric variations through multi-objective optimization. In general, numerical solutions can be provided by implementing a genetic algorithm. The proposed methodology is illustrated by analyzing a data set of pyramidal neurons.

Session 15 High Dimensional Inference (or Testing)

Adaptive Sparse Reduced-rank Regression
Zongming Ma and Tingni Sun
University of Pennsylvania
tingni@wharton.upenn.edu
This paper studies the problem of estimating a large coefficient matrix in a multiple response linear regression model when the coefficient matrix is both sparse and of low rank. We are especially interested in the high dimensional settings where the number of predictors and/or response variables can be much larger than the number of observations. We propose a new estimation scheme, which achieves competitive numerical performance while significantly reducing computation time when compared with state-of-the-art methods. Moreover, we show that the proposed estimator achieves near-optimal non-asymptotic minimax rates of estimation under a collection of squared Schatten norm losses simultaneously, by providing both error bounds for the estimator and minimax lower bounds. In particular, such optimality results hold in the high dimensional settings.

Variable Screening in Biothreat Detection Using Weighted Leverage Score
Wenxuan Zhong and Yiwen Liu
University of Georgia
yiwenliu@uga.edu
The early detection of a biothreat is extremely difficult because most of the early clinical signs in infected subjects are indistinguishable "flu-like" symptoms. Recent research shows that genomic markers are the most reliable indicators, and thus they have been widely used in the detection methods developed in the past decades. In this talk, I will introduce a biomarker screening method based on the weighted leverage score. The weighted leverage score is a variant of the leverage score that has been widely used for the diagnostics of linear regression. Empirical studies demonstrate that the weighted leverage score is not only computationally efficient but also statistically effective in variable screening.
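The ordinary (unweighted) leverage score that this method builds on is the diagonal of the hat matrix from linear regression diagnostics. A small sketch of that standard quantity, with synthetic data; the weighting scheme of the talk is not described in the abstract and is not reproduced here:

```python
import numpy as np

def leverage_scores(X):
    """Diagonal of the hat matrix H = X (X'X)^{-1} X', computed stably via a
    QR decomposition: h_ii is the squared norm of row i of Q."""
    Q, _ = np.linalg.qr(X)
    return (Q ** 2).sum(axis=1)

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))   # 50 samples, 3 candidate markers
h = leverage_scores(X)             # each h_i in (0, 1]; sums to the column rank
```

Screening procedures of this flavor rank variables or observations by such scores and keep the top candidates.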

Testing High-Dimensional Nonparametric Function with Application to Gene Set Analysis
Tao He, Ping-Shou Zhong, Yuehua Cui and Vidyadhar Mandrekar
Michigan State University
pszhong@stt.msu.edu
This paper proposes a test statistic for testing a high-dimensional nonparametric function in a reproducing kernel Hilbert space generated by a positive definite kernel. We studied the asymptotic distribution of the test statistic under the null hypothesis and a series of local alternative hypotheses in a "large p, small n" setup. A simulation study was used to evaluate the finite sample performance of the proposed method. We applied the proposed method to yeast data and thyroid hormone data to identify pathways that are associated with traits of interest.

Zero-Inflation in Clustered Binary Response Data: Mixed Model and Estimating Equation Approaches
Danping Liu
National Institutes of Health
danpingliu@nih.gov
The NEXT Generation Health Study investigates the dating violence of adolescents using a survey questionnaire. Each student is asked to affirm or deny multiple instances of violence in his/her dating relationship. There is, however, evidence suggesting that students not in a relationship responded to the survey, resulting in excessive zeros in the responses. This paper proposes likelihood-based and estimating equation approaches to analyze the zero-inflated clustered binary response data. We adopt a mixed model method to account for the cluster effect, and the model parameters are estimated using a maximum-likelihood (ML) approach that requires a Gauss-Hermite quadrature (GHQ) approximation for implementation. Since an incorrect assumption on the random effects distribution may bias the results, we construct generalized estimating equations (GEE) that do not require the correct specification of within-cluster correlation. In a series of simulation studies, we examine the performance of the ML and GEE methods in terms of their bias, efficiency, and robustness. We illustrate the importance of properly accounting for this zero-inflation by re-analyzing the NEXT data, where this issue has previously been ignored.

Session 16 Phase II Clinical Trial Design with Survival Endpoint

Utility-Based Optimization of Schedule-Dose Regimes based on the Times to Response and Toxicity
Peter F. Thall1, Hoang Q. Nguyen1, Thomas Braun2 and Muzaffar Qazilbash1
1University of Texas MD Anderson Cancer Center
2University of Michigan
rex@mdanderson.org
A two-stage Bayesian phase I-II design for jointly optimizing the administration schedule and dose of an experimental agent, based on the times to response and toxicity, is described. Sequentially adaptive decisions are based on the joint utility of the two event times. A utility surface is constructed by partitioning the two-dimensional quadrant of event time pairs into rectangles, eliciting a numerical utility for each rectangle, and fitting a smooth parametric function to the elicited values. Event times are modeled using gamma distributions with shape and scale parameters both functions of schedule and dose. In stage 1, patients are randomized fairly among schedules, and a dose is chosen within each schedule using an algorithm that hybridizes greedy optimization and randomization among nearly optimal doses. In stage 2, fair randomization among schedules is replaced by the hybrid algorithm. An extension to accommodate death or discontinuation of follow-up is described. The design is illustrated by an autologous stem cell transplantation trial in multiple myeloma.

Bayesian Decision Theoretic Two-Stage Design in Phase II Clinical Trials with Survival Endpoint
Lili Zhao and Jeremy Taylor
University of Michigan
zhaolili@umich.edu
In this study, we consider two-stage designs with failure-time endpoints in single-arm phase II trials. We propose designs in which stopping rules are constructed by comparing the Bayes risk of stopping at stage one to the expected Bayes risk of continuing to stage two, using both the observed data in stage one and the predicted survival data in stage two. Terminal decision rules are constructed by comparing the posterior expected loss of a rejection decision versus an acceptance decision. Simple threshold loss functions are applied to time-to-event data modelled either parametrically or nonparametrically, and the cost parameters in the loss structure are calibrated to obtain the desired Type I error and power. We ran simulation studies to evaluate design properties, including Type I and II errors, probability of early stopping, expected sample size, and expected trial duration, and compared them with the Simon two-stage designs and a design which is an extension of Simon's designs with time-to-event endpoints. An example based on a recently conducted phase II sarcoma trial illustrates the method.

Single-Arm Phase II Group Sequential Trial Design with Survival Endpoint at a Fixed Time Point
Jianrong Wu and Xiaoping Xiong
St. Jude Children's Research Hospital
jianrongwu@stjude.org
Three nonparametric test statistics are proposed to design single-arm phase II group sequential trials for monitoring survival probability. The small-sample properties of these test statistics are studied through simulations. Sample size formulas are derived for the fixed sample test. The Brownian motion property of the test statistics allowed us to develop a flexible group sequential design using a sequential conditional probability ratio test procedure.

Session 17 Statistical Modeling of High-throughput Genomics Data

Learning Genetic Architecture of Complex Traits Across Populations
Marc Coram, Sophie Candille and Hua Tang
Stanford University
hualtang@gmail.com
Genome-wide association studies (GWAS) have successfully revealed many loci that influence complex traits and disease susceptibilities. An unanswered question is "to what extent does the genetic architecture underlying a trait overlap between human populations?" We explore this question using blood lipid concentrations as a model trait. In African Americans and Hispanic Americans participating in the Women's Health Initiative SNP Health Association Resource, we validated one African-specific HDL locus, as well as 14 known lipid loci that have been previously implicated in studies of European populations. Moreover, we demonstrate striking similarities in the genetic architecture (loci influencing the trait, direction and magnitude of genetic effects, and proportions of phenotypic variation explained) of lipid traits across populations. In particular, we found that a disproportionate fraction of lipid variation in African Americans and Hispanic Americans can be attributed to genomic loci exhibiting statistical evidence of association in Europeans, even though the precise genes and variants remain unknown. At the same time, we found substantial allelic heterogeneity within shared loci, characterized both by population-specific rare variants and by variants shared among multiple populations that occur at disparate frequencies. The allelic heterogeneity emphasizes the importance of including diverse populations in future genetic association studies of complex traits such as lipids; furthermore, the overlap in lipid loci across populations of diverse ancestral origin argues that additional knowledge can be gleaned from multiple populations. We discuss how the overlapping genetic architecture can be exploited to improve the efficiency of GWAS in minority populations.

A Bayesian Hierarchical Model to Detect Differentially Methylated Loci from Single Nucleotide Resolution Sequencing Data
Hao Feng, Karen Conneely and Hao Wu
Emory University
haowu@emory.edu
DNA methylation is an important epigenetic modification that has essential roles in cellular processes, including gene regulation, development, and disease, and is widely dysregulated in most types of cancer. Recent advances in sequencing technology have enabled the measurement of DNA methylation at single nucleotide resolution through methods such as whole-genome bisulfite sequencing and reduced representation bisulfite sequencing. In DNA methylation studies, a key task is to identify differences under distinct biological contexts, for example between tumor and normal tissue. A challenge in sequencing studies is that the number of biological replicates is often limited by the costs of sequencing. The small number of replicates leads to unstable variance estimation, which can reduce the accuracy of detecting differentially methylated loci (DML). Here we propose a novel statistical method to detect DML when comparing two treatment groups. The sequencing counts are described by a lognormal-beta-binomial hierarchical model, which provides a basis for information sharing across different CpG sites. A Wald test is developed for hypothesis testing at each CpG site. Simulation results show that the proposed method yields improved DML detection compared to existing methods, particularly when the number of replicates is low. The proposed method is implemented in the Bioconductor package DSS.
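A drastically simplified per-CpG Wald test can illustrate the shape of such a comparison. The sketch below pools reads across replicates and uses plain binomial variances; the hierarchical method of the talk instead shrinks dispersion estimates through the lognormal-beta-binomial model, which is not reproduced here, and the counts are invented.

```python
import math

def wald_dml(meth1, total1, meth2, total2):
    """Simplified Wald test for a difference in methylation proportion at one
    CpG site; methX/totalX are methylated and total read counts summed over
    the replicates of group X."""
    p1, p2 = meth1 / total1, meth2 / total2
    se = math.sqrt(p1 * (1 - p1) / total1 + p2 * (1 - p2) / total2)
    z = (p1 - p2) / se
    pval = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return z, pval
```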

Differential Isoform Expression Analysis in RNA-Seq using Random-Effects Meta-Regression
Weihua Guan1, Rui Xiao2, Chun Li3 and Mingyao Li2
1University of Minnesota
2University of Pennsylvania
3Vanderbilt University
rxiao@mail.med.upenn.edu
A major application of RNA-Seq is to detect differential isoform expression across experimental conditions. However, this is challenging because of uncertainty in isoform expression estimation, owing to ambiguous reads, and because of variability in the precision of the estimates across samples. It is desirable to have a method that can account for these issues and also allow adjustment for covariates. In this paper, we present a random-effects meta-regression approach that naturally fits this purpose. Through extensive simulations and analysis of an RNA-Seq dataset on human heart failure, we show that this approach is computationally fast and reliable, and can improve the power of differential expression analysis while controlling for false positives due to the effect of covariates or confounding variables.
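Random-effects meta-analysis is commonly fit with the DerSimonian-Laird moment estimator of the between-study (here, between-sample) variance. The sketch below shows that standard estimator as background only; the abstract does not state which estimator the authors use, and their meta-regression additionally includes covariates, omitted here.

```python
def dersimonian_laird(effects, variances):
    """Random-effects pooled estimate with the DerSimonian-Laird moment
    estimator of the between-study variance tau^2."""
    w = [1 / v for v in variances]
    sw = sum(w)
    ybar = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - ybar) ** 2 for wi, yi in zip(w, effects))
    k = len(effects)
    tau2 = max(0.0, (q - (k - 1)) / (sw - sum(wi * wi for wi in w) / sw))
    wstar = [1 / (v + tau2) for v in variances]   # re-weight with tau^2 added
    pooled = sum(wi * yi for wi, yi in zip(wstar, effects)) / sum(wstar)
    return pooled, tau2
```

With perfectly homogeneous inputs the heterogeneity estimate collapses to zero and the pooled effect reduces to the inverse-variance-weighted mean.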

Allele-Specific Differential Methylation Analysis with Next Generation Methylation Sequencing Data
Fei Zou
University of North Carolina at Chapel Hill
feizou@email.unc.edu
Next generation Methyl-seq data collected from F1 reciprocal crosses in mouse can powerfully dissect strain and parent-of-origin effects on allele-specific methylation. In this talk, we present a novel statistical approach to analyze Methyl-seq data, motivated by an F1 mouse study. Our method jointly models the strain and parent-of-origin effects, deals with the over-dispersion problem commonly observed in read counts, and can flexibly adjust for the effects of covariates such as sex and read depth. We also propose a genomic control procedure to properly control the type I error for Methyl-seq studies where the number of samples is small.

Session 18 Statistical Applications in Finance

A Stochastic Mixture Model for Economic Cycles
Haipeng Xing1 and Ning Sun2
1State University of New York
2IBM
xing@ams.sunysb.edu
The Markov switching model has been used in various applications in economics and finance. As existing Markov switching models describe the regimes or parameter values in a categorical way, they are restrictive in practical analysis. In this paper, we introduce a mixture model with stochastic regimes, in which the regimes and model parameters are represented both categorically and continuously. Assuming conjugate priors, we develop closed-form recursive Bayes estimates of the regression parameters, an approximation scheme that has much lower computational complexity and yet is comparable to the Bayes estimates in statistical efficiency, and an expectation-maximization procedure to estimate the unknown hyper-parameters. We conduct intensive simulation studies to evaluate the performance of the Bayes estimates of time-varying parameters and their approximations. We further apply the proposed model to analyze the series of the U.S. monthly total non-farm employees.

Statistical Modelling of Bidding Prices in Online Ad Position Auctions
Xiaoming Huo
Georgia Institute of Technology
xiaoming@isye.gatech.edu
Ad position auctions are being held all the time in nearly all web search engines and have become the major source of revenue in online advertising. We study statistical models of the bidding prices. Two approaches are explored: (1) a game-theoretic approach that characterizes bidders' behavior, and (2) a statistical generative approach, which aims at mimicking the fundamental mechanism underlying the bidding process. We compare and contrast these two approaches and describe how the auctioneer can take advantage of the obtained knowledge.

Regression with Rank Covariates: A Distribution Guided Score for Ranks
Do Hwan Park1, Yuneung Kim2, Johan Lim3, Sujung Choi4 and Hsun-Chih Kuo5
1University of Maryland
2Seoul National University
3Auburn University
4Ulsan National Institute of Science and Technology
5National Chengchi University
johanlim@snu.ac.kr
This work is motivated by a hand-collected data set from one of the largest internet portals in Korea. The data set records the top 30 most frequently discussed stocks on its online stock message board, which can be considered a measure of investors' attention to individual stocks. The empirical goal of the data set is to investigate the effect of attention on trading behavior. To do so, we consider a regression model whose response is either stock return performance or trading volume, and whose covariates are the daily-observed partial ranks as well as other covariates influential to the response. In estimating the regression model, the rank covariate is often treated as an ordinal categorical variable or simply transformed into a score variable (mostly using the identity score function). In this paper, we start our discussion with the univariate regression problem, where we establish the asymptotic normality of the regression coefficient estimator, whose mean is 0 and whose variance is an unknown function of the distribution of X. We then extend the results of univariate regression to multiple regression, obtaining a similar asymptotic distribution. We finally consider an estimator for multiple sets by extending or combining the estimators of each single set. We apply our proposed distribution-guided scoring function to the motivating data set to empirically demonstrate the attention effect.

Optimal Sparse Volatility Matrix Estimation for High Dimensional Ito Processes with Measurement Errors
Minjing Tao1, Yazhen Wang2 and Harrison Zhou3
1Florida State University
2University of Wisconsin-Madison
3Yale University
tao@stat.fsu.edu
Stochastic processes are often used to model complex scientific problems in fields ranging from biology and finance to engineering and physical science. This talk investigates rate-optimal estimation of the volatility matrix of a high dimensional Ito process observed with measurement errors at discrete time points. The minimax rate of convergence is established for estimating sparse volatility matrices. By combining the multi-scale and threshold approaches, we construct a volatility matrix estimator that achieves the optimal convergence rate. The minimax lower bound is derived by considering a subclass of Ito processes for which the minimax lower bound is obtained through a novel equivalent model of covariance matrix estimation for independent but non-identically distributed observations, and through a delicate construction of the least favorable parameters. In addition, a simulation study was conducted to test the finite sample performance of the optimal estimator, and the simulation results were found to support the established asymptotic theory.

Session 19 Hypothesis Testing

A Score-type Test for Heterogeneity in Zero-inflated Models in a Stratified Population
Guanqun Cao1, Wei-Wen Hsu2 and David Todem3

1Auburn University
2Kansas State University
3Michigan State University
gzc0009@auburn.edu
We propose a score-type statistic to evaluate heterogeneity in zero-inflated models for count data in a stratified population, where heterogeneity is defined as instances in which the zero counts are generated from two sources. In this work, we extend the literature by describing a score-type test to evaluate homogeneity against general alternatives that do not neglect the stratification information under the alternative hypothesis. Our numerical simulation studies show that the proposed test can greatly improve efficiency over tests of heterogeneity that ignore the stratification information. An empirical application to dental caries data in early childhood further shows the importance and practical utility of the methodology in using the stratification profile to detect heterogeneity in the population.

Inferences on Correlation Coefficients of Bivariate Log-normal Distributions
Guoyi Zhang1 and Zhongxue Chen2

1University of New Mexico
2Indiana University
gzhang123@gmail.com
This research considers inference on the correlation coefficients of bivariate log-normal distributions. We develop a generalized confidence interval and hypothesis tests for the correlation coefficient, and extend the results to comparing two independent correlations. Simulation studies show that the suggested methods work well even for small samples. The methods are illustrated using two practical examples.
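The correlation of a bivariate log-normal vector is a known closed-form function of the normal-scale parameters, which is the quantity the interval procedures above target. A minimal numpy sketch (an illustration, not the authors' generalized-confidence-interval method; the parameter values are arbitrary) checks the closed form against an empirical estimate:

```python
import numpy as np

def lognormal_corr(rho, s1, s2):
    """Correlation of (exp(X1), exp(X2)), where (X1, X2) is bivariate
    normal with correlation rho and standard deviations s1, s2."""
    num = np.exp(rho * s1 * s2) - 1.0
    den = np.sqrt((np.exp(s1**2) - 1.0) * (np.exp(s2**2) - 1.0))
    return num / den

rng = np.random.default_rng(0)
rho, s1, s2 = 0.5, 1.0, 1.0
cov = [[s1**2, rho * s1 * s2], [rho * s1 * s2, s2**2]]
x = rng.multivariate_normal([0.0, 0.0], cov, size=200_000)
y = np.exp(x)  # bivariate log-normal sample
empirical = np.corrcoef(y[:, 0], y[:, 1])[0, 1]
print(lognormal_corr(rho, s1, s2), empirical)
```

Note that the log-normal correlation (about 0.378 here) is attenuated relative to the normal-scale rho = 0.5, which is why inference is usually carried out on the normal-scale parameters.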

Testing Calibration of Risk Models at Extremes of Disease Risk
Minsun Song1, Peter Kraft2, Amit D. Joshi2, Myrto Barrdahl3 and Nilanjan Chatterjee1
1National Cancer Institute
2Harvard University
3German Cancer Research Center
songm4@mail.nih.gov

Risk-prediction models need careful calibration to ensure they produce unbiased estimates of risk for subjects in the underlying population given their risk-factor profiles. As subjects with extreme high or low risk may be the most affected by knowledge of their risk estimates, checking the adequacy of risk models at the extremes of risk is very important for clinical applications. We propose a new approach to test model calibration targeted toward the extremes of the disease risk distribution, where standard goodness-of-fit tests may lack power due to sparseness of data. We construct a test statistic based on model residuals summed over only those individuals who pass high and/or low risk thresholds, and then maximize the test statistic over different risk thresholds. We derive an asymptotic distribution for the max-test statistic based on analytic derivation of the variance-covariance function of the underlying Gaussian process. The method is applied to a large case-control study of breast cancer to examine joint effects of common SNPs discovered through recent genome-wide association studies. The analysis clearly indicates a non-additive effect of the SNPs on the scale of absolute risk, but an excellent fit for the linear-logistic model even at the extremes of risk.

Statistical Issues When Incidence Rates Are Extremely Low and Sample Sizes Very Big
Peter Hu and Haijun Ma
Amgen Inc.
phu@amgen.com

It is well known that sample sizes of clinical trials are often not big enough to assess adverse events (AEs) with very low incidence rates. Large-scale observational studies, such as pharmacovigilance studies using healthcare databases, provide an alternative resource for assessment of very rare adverse events. Healthcare databases can often easily provide tens of thousands of exposed patients, which potentially allows the assessment of events as rare as in the magnitude of < 10^(-4).
In this talk, we discuss the performance of various commonly used statistical methods for comparison of binomial proportions of very rare events. The statistical power, type I error control, confidence interval (CI) coverage, length of confidence interval, bias and variability of treatment effect estimates, as well as the distribution of the CI upper bound, etc., will be examined and compared across the different methods. Power calculation is often necessary for study planning purposes. However, many commonly used power calculation methods are based on approximation and may give erroneous estimates of power when events are rare. We will compare the power estimates for different methods provided by SAS Proc Power and obtained empirically via simulation. The use of relative risks (RR) and risk differences (RD) will also be commented on. Based on these results, several recommendations are given to guide sample size assessments for such types of studies at the design stage.
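The kind of empirical power and type I error comparison described above can be illustrated with a toy version: estimate the rejection rate of Fisher's exact test under the null and under an alternative in a rare-event setting (the rates, cohort size, and replication count below are arbitrary illustrative choices, not those used in the talk, and the SAS Proc Power comparison is not reproduced):

```python
import numpy as np
from scipy.stats import fisher_exact

def rejection_rate(p1, p2, n, reps, alpha=0.05, seed=1):
    """Empirical rejection rate of Fisher's exact test comparing two
    binomial proportions p1 vs p2, with n subjects per arm."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        x1 = rng.binomial(n, p1)
        x2 = rng.binomial(n, p2)
        table = [[x1, n - x1], [x2, n - x2]]
        _, pval = fisher_exact(table)
        rejections += pval < alpha
    return rejections / reps

n = 20_000  # large observational cohort per arm
alpha_hat = rejection_rate(1e-4, 1e-4, n, reps=200)  # empirical type I error
power_hat = rejection_rate(1e-4, 8e-4, n, reps=200)  # empirical power
print(alpha_hat, power_hat)
```

Even with 20,000 subjects per arm, a 10^(-4) event rate yields only a handful of events, which is why exact methods behave conservatively and approximate power formulas can be unreliable here.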

Minimum Distance Regression Model Checking When Responses are Missing at Random
Xiaoyu Li
Auburn University
xzl0037@auburn.edu

This paper proposes a class of lack-of-fit tests for fitting a parametric regression model when response variables are missing at random. These tests are based on a class of minimum integrated square distances between a kernel-type estimator of a regression function and the parametric regression function being fitted. These tests are shown to be consistent against a large class of fixed alternatives. The corresponding test statistics are shown to have asymptotic normal distributions under the null hypothesis. Some simulation results are also presented.

Session 20 Design and Analysis of Clinical Trials

Application of Bayesian Approach in Assessing Rare Adverse Events during a Clinical Study
Grace Li, Karen Price, Haoda Fu and David Manner
Eli Lilly and Company
li_ying_grace@lilly.com
Bayesian analysis is gaining wider application in decision making throughout the drug development process due to its more intuitive framework and its ability to provide direct probabilistic answers to complex problems. Determining the risk profile for a compound throughout the phases of drug development is crucial, along with ensuring the most appropriate analyses are performed. In a conventional 2-arm parallel study design, rare adverse events are often assessed via frequentist approaches such as Fisher's exact test, with its known limitations. This presentation will focus on the challenges of the frequentist approach in detecting and evaluating potential safety signals in the rare-event setting, and compare it with the proposed Bayesian approach. We will compare the operating characteristics of the frequentist and Bayesian approaches using simulated data. Most importantly, the proposed approach offers much more flexibility and a more direct probabilistic interpretation that improves the process of detecting rare safety signals. This approach highlights the strength of Bayesian methods for inference. The simulation results are intended to demonstrate the value of using Bayesian methods, and that appropriate application has the potential to increase the efficiency of decision making in drug development.
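One generic conjugate formulation of such a comparison (a sketch only; the talk's actual model is not specified in the abstract) places independent Beta priors on the two AE rates and reports the posterior probability that the treatment rate exceeds control:

```python
import numpy as np

def prob_treatment_rate_higher(x_t, n_t, x_c, n_c,
                               a=0.5, b=0.5, draws=100_000, seed=0):
    """Posterior P(p_trt > p_ctrl) under independent Beta(a, b) priors
    and binomial likelihoods, estimated by Monte Carlo."""
    rng = np.random.default_rng(seed)
    p_t = rng.beta(a + x_t, b + n_t - x_t, size=draws)
    p_c = rng.beta(a + x_c, b + n_c - x_c, size=draws)
    return float(np.mean(p_t > p_c))

# 5 events among 1000 treated vs 1 event among 1000 controls
prob = prob_treatment_rate_higher(5, 1000, 1, 1000)
print(prob)
```

Unlike a Fisher's exact p-value, this output is a direct probabilistic statement about the imbalance, which is the interpretability advantage the abstract emphasizes.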

A Simplified Varying-Stage Adaptive Phase II/III Clinical Trial Design
Gaohong Dong
Novartis Pharmaceuticals Corporation
gaohong.dong@novartis.com
Conventionally, adaptive phase II/III clinical trials are carried out with a strict two-stage design. Recently, Dong (Statistics in Medicine 2014; 33(8): 1272-87) proposed a varying-stage adaptive phase II/III clinical trial design. In this design, following the first stage, an intermediate stage can be adaptively added to obtain more data, so that a more informative decision can be made regarding whether the trial can be advanced to the final confirmatory stage. Therefore, the number of further investigational stages is determined based upon data accumulated up to the interim analysis. Later, Dong (2013 ICSA Symposium Book, to be published) investigated some characteristics of this design. The original design considers two plausible study endpoints, with one of them initially designated as the primary endpoint; based on interim results, the other endpoint can be switched in as the primary endpoint. However, in many therapeutic areas the primary study endpoint is well established, therefore we simplify this design to consider one study endpoint only. Our simulations show that, as with the original design, this simplified design controls the Type I error rate very well; the sample size increases as the threshold probability for the two-stage setting increases; and the alpha allocation ratio in the two-stage setting vs. the three-stage setting has a great impact on the design. However, this simplified design requires a larger sample size for the initial stage to overcome the power loss due to futility. Compared to a strict two-stage phase II/III design, this simplified design improves the probability of trial success.

Improving Multiple Comparison Procedures With Coprimary Endpoints by Generalized Simes Tests
Hua Li1, Willi Maurer1, Werner Brannath2 and Frank Bretz1
1Novartis Pharmaceuticals Corporation
2University of Bremen
jennifer.li@novartis.com
For a fixed-dose combination of indacaterol acetate (long-acting β2-agonist) and mometasone furoate (inhaled corticosteroid) for the once-daily maintenance treatment of asthma and Chronic Obstructive Pulmonary Disease (COPD), both lung function improvement and improvement in one symptom outcome are required for the drug to be developed successfully. The symptom outcome could be Asthma Control Questionnaire (ACQ) improvement for the asthma program and exacerbation rate reduction for the COPD program. Having two endpoints increases the probability of false positive results by chance alone, i.e., marketing a drug which is not or insufficiently effective. Therefore, regulatory agencies require strict control of this probability at a pre-specified significance level (usually 2.5% one-sided). The Simes test is often used in our clinical trials. However, the Simes test requires the assumption that the test statistics are positively correlated. This assumption is not always satisfied, or cannot be easily verified, when dealing with multiple endpoints. In this presentation, an extension of the Simes test, the generalized Simes test introduced by Maurer, Glimm and Bretz (2011), which is applicable to any correlation (positive, negative, or even no correlation), is utilized. Power benefits based on simulations are presented. FDA and other agencies have accepted this approach, indicating that the proposed method can be used in other trials in the future.
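For reference, the classical Simes combination p-value that the generalized test extends is simple to compute; this sketch implements only the classical version (Simes, 1986), not the Maurer-Glimm-Bretz generalization discussed in the talk:

```python
def simes_p(pvalues):
    """Classical Simes p-value for the global null hypothesis:
    min over i of m * p_(i) / i, where p_(1) <= ... <= p_(m)."""
    m = len(pvalues)
    sorted_p = sorted(pvalues)
    return min(m * p / (i + 1) for i, p in enumerate(sorted_p))

# Two coprimary endpoints: reject the global null at the 2.5%
# one-sided level if the Simes p-value falls below 0.025.
print(simes_p([0.010, 0.040]))
print(simes_p([0.030, 0.018]))
```

For [0.010, 0.040] the candidates are 2(0.010)/1 = 0.020 and 2(0.040)/2 = 0.040, so the Simes p-value is 0.020; the classical test's validity under positive dependence is exactly the assumption the generalized version relaxes.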

Efficient Design for Cluster Randomized Trials with Binary Outcomes
Sheng Wu, Weng Kee Wong and Catherine Crespi
University of California at Los Angeles
shengwu@ucla.edu
Cluster randomized trials (CRTs) are increasingly used for research in many fields, including public health, education, social studies and ethnic disparity studies. Equal allocation designs are often used in CRTs, but they may not be optimal, especially when cost considerations are taken into account. In this paper, we consider two-arm cluster randomized trials with a binary outcome and develop various optimal designs when sampling costs for units and clusters are different and the primary outcome is attributable risk or relative risk. We consider both frequentist and Bayesian approaches in the context of cancer control and prevention cluster randomized trials, and present formulae for optimal sample sizes for the two arms for each outcome measure.

Zero Event and Continuity Correction in Meta-Analyses of Rare Events Using Mantel-Haenszel Odds Ratio and Risk Difference
Tianyue Zhou
Sanofi-aventis US LLC
tianyue.zhou@sanofi.com
Meta-analysis of side effects has been widely used to combine data with low event rates across comparative clinical studies for evaluating a drug's safety profile. When dealing with rare events, a substantial proportion of studies may not have any events of interest. In common practice, meta-analyses on a relative scale (relative risk [RR] or odds ratio [OR]) remove zero-event studies, while meta-analyses using the risk difference [RD] as the effect measure include them. As continuity corrections are often used when zero events occur in either arm of a study, the impact of zero events and continuity correction on estimates of the Mantel-Haenszel (M-H) OR and RD was examined through simulation. Two types of continuity correction, the treatment-arm continuity correction and the constant continuity correction, are applied in the meta-analysis for variance calculation. For the M-H OR, it is unnecessary to include zero-event trials, and the 95% confidence interval [CI] of the estimate without continuity corrections provided the best coverage. For the M-H RD, including zero-event trials reduced bias, and using certain continuity corrections ensured at least 95% coverage of the 95% CI. This paper examined the influence of zero events and continuity correction on estimates of the M-H OR and RD, in order to help practitioners decide whether to include zero-event trials and use continuity corrections for a specific problem.
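The Mantel-Haenszel pooled odds ratio at the center of this comparison has a simple closed form; the following toy sketch (an illustration with an optional constant continuity correction applied to zero-cell studies; the abstract's simulation design is not reproduced) computes it across studies:

```python
def mantel_haenszel_or(tables, cc=0.0):
    """Mantel-Haenszel pooled odds ratio for a list of 2x2 tables
    (a, b, c, d) = (trt events, trt non-events, ctl events, ctl non-events).
    A constant continuity correction cc is added to every cell of any
    study that has a zero cell."""
    num = den = 0.0
    for a, b, c, d in tables:
        if cc and 0 in (a, b, c, d):
            a, b, c, d = a + cc, b + cc, c + cc, d + cc
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    return num / den

studies = [(10, 90, 5, 95), (0, 50, 2, 48)]  # second study has zero events in one arm
print(mantel_haenszel_or(studies, cc=0.0))
print(mantel_haenszel_or(studies, cc=0.5))
```

The two printed values differ, showing how the continuity correction shifts the point estimate when zero-event studies are retained; studies without zero cells are left untouched.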

Session 21 New methods for Big Data

Sure Independence Screening for Gaussian Graphical Models
Shikai Luo1, Daniela Witten2 and Rui Song1

1North Carolina State University
2University of Washington
rsong@ncsu.edu
In high-dimensional genomic studies, it is of interest to understand the regulatory network underlying tens of thousands of genes based on hundreds, or at most thousands, of observations for which gene expression data are available. Because graphical models can identify how variables such as the coexpression of genes are related, they are frequently used to study genetic networks. Although various efficient algorithms have been proposed, statisticians still face huge computational challenges when the number of variables is in the tens of thousands of dimensions or higher. Motivated by the fact that the columns of the precision matrix can be obtained by solving p regression problems, each of which involves regressing one feature onto the remaining p - 1 features, we consider covariance screening for Gaussian graphical models. The proposed methods and algorithms possess theoretical properties such as sure screening properties and satisfactory empirical behavior.

Case-Specific Random Forests
Ruo Xu1, Dan Nettleton2 and Daniel J. Nordman2

1Google
2Iowa State University
dnett@iastate.edu
Random forest (RF) methodology is a nonparametric methodology for prediction problems. A standard way to utilize RFs is to generate a global RF in order to predict all test cases of interest. In this talk, we propose growing different RFs specific to different test cases, namely case-specific random forests (CSRFs). In contrast to the bagging procedure used in building standard RFs, the CSRF algorithm takes weighted bootstrap resamples to create individual trees, where we assign large weights a priori to the training cases in close proximity to the test case of interest. Tuning methods are discussed to avoid overfitting issues. Both simulation and real data examples show that CSRFs often outperform standard RFs in prediction. We also propose the idea of case-specific variable importance (CSVI) as a way to compare the relative predictor variable importance for predicting a particular case. The idea of building a predictor case-specifically may possibly be generalized to other areas.
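The core CSRF ingredient, weighted bootstrap resampling that upweights training cases near the test case, can be sketched as follows. This is a simplified illustration using scikit-learn trees with an ad hoc Gaussian-kernel proximity weight and an arbitrary bandwidth; the paper's actual weighting scheme and tuning methods are not reproduced here:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def csrf_predict(X, y, x0, n_trees=100, bandwidth=1.0, seed=0):
    """Predict at test point x0 with trees grown on weighted bootstrap
    resamples; resampling weights decay with distance from x0."""
    rng = np.random.default_rng(seed)
    dist = np.linalg.norm(X - x0, axis=1)
    w = np.exp(-dist**2 / (2.0 * bandwidth**2))  # hypothetical proximity weights
    w = w / w.sum()
    n = len(y)
    preds = []
    for _ in range(n_trees):
        idx = rng.choice(n, size=n, replace=True, p=w)  # weighted bootstrap
        tree = DecisionTreeRegressor(max_features="sqrt",
                                     random_state=int(rng.integers(1 << 31)))
        tree.fit(X[idx], y[idx])
        preds.append(tree.predict(x0.reshape(1, -1))[0])
    return float(np.mean(preds))

rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, size=(300, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=300)
x0 = np.array([1.0, 0.0])
pred = csrf_predict(X, y, x0)
print(pred)  # should be close to sin(1) ~ 0.84
```

The only change relative to a standard bagged forest is the `p=w` argument in the bootstrap draw, which concentrates each tree's training sample near the test case.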

Uncertainty Quantification for Massive Data Problems using Generalized Fiducial Inference
Randy C. S. Lai1, Jan Hannig2 and Thomas C. M. Lee1
1University of California at Davis
2University of North Carolina at Chapel Hill

tcmlee@ucdavis.edu
In this talk, we present a novel parallel method for computing parameter estimates and their standard errors for massive data problems. The method is based on generalized fiducial inference.

OEM Algorithm for Big Data
Xiao Nie and Peter Z. G. Qian
University of Wisconsin-Madison
xiaonie@stat.wisc.edu
Big data with large sample sizes arise in Internet marketing, engineering, and many other fields. We propose an algorithm called OEM (a.k.a. orthogonalizing EM) for analyzing big data. This algorithm employs a procedure named active orthogonalization to expand an arbitrary matrix to an orthogonal matrix. This procedure yields closed-form solutions to ordinary and various penalized least squares problems. The maximum number of points needed to be added is bounded by the number of columns of the original matrix, which is appealing for large-n problems. Attractive theoretical properties of OEM include (1) convergence to the Moore-Penrose generalized inverse estimator for a singular regression matrix and (2) convergence to a point having grouping coherence for a fully aliased regression matrix. We also extend this algorithm to logistic regression. The effectiveness of OEM for least squares and logistic regression problems will be illustrated through examples.
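The active-orthogonalization idea can be sketched for ordinary least squares: augmenting the design so its Gram matrix becomes d·I turns each EM-style update into a closed form. This is a minimal illustration based on the published OEM description, with details simplified; for penalized problems the same update is combined with, e.g., coordinate-wise soft-thresholding:

```python
import numpy as np

def oem_ols(X, y, n_iter=2000):
    """OEM-style iteration for least squares: with d >= lambda_max(X'X),
    the update beta <- (X'y + (d I - X'X) beta) / d has the OLS solution
    as its fixed point (since S beta = X'y at convergence)."""
    S = X.T @ X
    d = np.linalg.eigvalsh(S)[-1]       # largest eigenvalue of X'X
    A = d * np.eye(X.shape[1]) - S      # positive semidefinite "augmentation" part
    Xty = X.T @ y
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        beta = (Xty + A @ beta) / d
    return beta

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, -2.0, 0.0, 0.5, 3.0]) + rng.normal(size=200)
beta_oem = oem_ols(X, y)
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.max(np.abs(beta_oem - beta_ols)))  # near zero after convergence
```

Each iteration costs only a matrix-vector product once X'X and X'y are formed, which is what makes the scheme attractive when n is very large.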

Session 22 New Statistical Methods for Analysis of High Dimensional Genomic Data

Integrative Modeling of Multi-Platform Genomic Data under the Framework of Mediation Analysis
Yen-Tsung Huang
Brown University
Yen-Tsung_Huang@brown.edu
Given the availability of genomic data, there has been emerging interest in integrating multi-platform data. Here we propose to model epigenetic DNA methylation, micro-RNA expression, and gene expression data as a biological process to delineate phenotypic traits, under the framework of causal mediation modeling. We propose a regression model for the joint effect of methylation, micro-RNA expression, gene expression, and their non-linear interactions on the outcome, and study three path-specific effects: the direct effect of methylation on the outcome, the effect mediated through expression, and the effect through micro-RNA expression. We characterize correspondences between the three path-specific effects and coefficients in the regression model, which are influenced by causal relations among methylation, micro-RNA, and gene expression. A score test for variance components of regression coefficients is developed to assess path-specific effects. The test statistic under the null follows a mixture of chi-square distributions, which can be approximated using a characteristic function inversion method or a perturbation procedure. We construct tests for candidate models determined by different combinations of methylation, micro-RNA, gene expression, and their interactions, and further propose an omnibus test to accommodate different models. The utility of the method will be illustrated in numerical simulation studies and a glioblastoma data set from The Cancer Genome Atlas (TCGA).

Estimation of High Dimensional Directed Acyclic Graphs using eQTL data
Wei Sun1 and Min Jin Ha2

1University of North Carolina at Chapel Hill
2University of Texas MD Anderson Cancer Center



weisun@email.unc.edu

Observational data can be used to estimate the skeleton of a directed acyclic graph (DAG) and the directions of a limited number of edges. With sufficient interventional data, one can identify the directions of all the edges of a DAG. However, such interventional data are often not available, especially for high dimensional problems. We develop a statistical method to estimate a DAG using surrogate interventional data, where the interventions are applied to a set of external variables, and thus such interventions are considered to be surrogate interventions on the variables of interest. Our work is motivated by expression quantitative trait locus (eQTL) studies, where the variables of interest are the expression levels of genes, the external variables are DNA variations, and interventions are applied to DNA variants during the process by which a randomly selected DNA allele is passed to a child from either parent. Our method, named sirDAG (surrogate intervention recovery of DAG), first constructs the DAG skeleton using a combination of penalized regression and the PC algorithm, and then estimates the posterior probabilities of all the edge directions after incorporating DNA variant data. We demonstrate the advantages of sirDAG by simulations and an application in an eQTL study of > 18,000 genes in 550 breast cancer patients.

Prioritizing Disease Genes from Genome-wide Association Studies Through Dynamic Co-expression Networks
Lin Hou1, Min Chen2, Clarence Zhang3, Judy Cho4 and Hongyu Zhao1

1Yale University
2University of Texas at Dallas
3Bristol-Myers Squibb
4Mount Sinai Medical Center
hongyu.zhao@yale.edu

Although genome-wide association studies (GWAS) have identified many susceptibility loci for common diseases, they only explain a small portion of heritability. It is challenging to identify the remaining disease loci because their association signals are likely weak and difficult to identify among millions of candidates. One potentially useful direction to increase statistical power is to incorporate functional genomics information, especially gene expression networks, to prioritize GWAS signals. Most current methods utilizing network information to prioritize disease genes are based on the "guilt by association" principle, in which networks are treated as static and disease-associated genes are assumed to be located closer to each other than random pairs in the network. In contrast, we propose a novel "guilt by rewiring" principle. Studying the dynamics of gene networks between controls and patients, this principle assumes that disease genes are more likely to undergo rewiring in patients, whereas most of the network remains unaffected in the disease condition. To demonstrate this principle, we consider the changes of co-expression networks in Crohn's disease patients and controls, and how network dynamics reveal information on disease associations. Our results demonstrate that network rewiring is abundant in the immune system, and that disease-associated genes are more likely to be rewired in patients. To integrate this network rewiring feature with GWAS signals, we propose to use the Markov random field framework to integrate network information to prioritize genes. Applications in Crohn's disease and Parkinson's disease show that this framework leads to more replicable results and implicates potentially disease-associated pathways.

Kernel Machine Methods for Joint Testing and Integrative Analysis of Genome Wide Methylation and Genotyping Studies
Ni Zhao and Michael Wu
Fred Hutchinson Cancer Research Center
nzhao@fhcrc.org
Comprehensive understanding of complex trait etiology requires examination of multiple sources of genomic variability. Integrative analysis of these data sources promises elucidation of the biological processes underlying particular phenotypes. Consequently, many large GWAS consortia are expanding to simultaneously examine the joint role of DNA methylation. Two practical challenges have arisen for researchers interested in joint analysis of GWAS and methylation studies of the same subjects. First, it is unclear how to leverage both data types to determine whether particular genetic regions are related to traits of interest. Second, it is of considerable interest to understand the relative roles of different sources of genomic variability in complex trait etiology, e.g., whether epigenetics mediates genetic effects, etc. Therefore, we propose to use the powerful kernel machine framework for first testing the cumulative effect of both epigenetic and genetic variability on a trait, and for subsequent mediation analysis to understand the mechanisms by which the genomic data types influence the trait. In particular, we develop an approach that works at the gene/region level (to allow for a common unit of analysis across data types). We then compare pairwise similarity in the trait values between individuals to pairwise similarity in methylation and genotype values for a particular gene, with correspondence suggestive of association. Similarity in methylation and genotype is found by constructing an optimally weighted average of the similarities in methylation and genotype. For a significant gene/region, we then develop a causal-steps approach to mediation analysis at the gene/region level, which enables elucidation of the manner in which the different data types work, or do not work, together. We demonstrate through simulations and real data applications that our proposed testing approach often improves power to detect trait-associated genes while protecting type I error, and that our mediation analysis framework can often correctly elucidate the mechanisms by which genetic and epigenetic variability influence traits. A key feature of our approach is that it falls within the kernel machine testing framework, which allows for heterogeneity in effect sizes, nonlinear and interactive effects, and rapid p-value computation. Additionally, the approach can be easily applied to analysis of rare variants and sequencing studies.

Session 23 Recent Advances in Analysis of Longitudinal Data with Informative Observation Process

Joint Modeling of Alternating Recurrent Transition Times
Liang Li
University of Texas MD Anderson Cancer Center
LLi15@mdanderson.org
Atrial fibrillation (AF) is a common complication in patients undergoing cardiac surgery. Recent technological advancement enables physicians to monitor the occurrence of AF continuously with implanted cardiac devices. The device records two types of transition times: the time when the heart enters AF status from normal beat, and the time when the heart exits AF status and returns to normal beat. The two transition time processes are recurrent and appear alternately. Hundreds of transition times may be recorded on a single patient over a follow-up period of up to 12 months. The recurrent pattern carries information on the risk of AF and may be related to baseline covariates, and the previous AF pattern may be predictive of the subsequent AF pattern. We propose a semiparametric bivariate longitudinal transition time model for this complicated process. The model enables single-subject analysis as well as multiple-subject analysis, and both can be carried out in a likelihood framework. We present numerical studies to illustrate the empirical performance of the methodology.

Regression Analysis of Panel Count Data with Informative Observation Times
Yang Li1, Xin He2, Haiying Wang3 and Jianguo Sun4

1University of North Carolina at Charlotte
2University of Maryland
3University of New Hampshire
4University of Missouri at Columbia
YLi@uncc.edu

Panel count data usually occur in medical follow-up studies. Most existing approaches to panel count data analysis assume that the observation or censoring times are independent of the response process, either completely or given some covariates. We present a joint analysis approach in which the possible mutual correlations are characterized by time-varying random effects. Estimating equations are developed for the parameter estimation, and a simulation study is conducted to assess the finite-sample performance of the approach. The asymptotic properties of the proposed estimates are also given, and the method is applied to an illustrative example.

Envelope Linear Mixed Model
Xin Zhang
University of Minnesota
zhxnzx@gmail.com

Envelopes were recently proposed by Cook, Li and Chiaromonte (2010) as a method for reducing estimative and predictive variation in multivariate linear regression. We extend their formulation, proposing a general definition of an envelope and adapting envelope methods to linear mixed models. Simulations and illustrative data analysis show the potential for envelope methods to significantly improve standard methods in longitudinal and multivariate data analysis. This is joint work with Professor R. Dennis Cook and Professor Joseph G. Ibrahim.

Regression Analysis of Longitudinal Data with Irregular and Informative Observation Times
Yong Chen, Jing Ning and Chunyan Cai
University of Texas Health Science Center at Houston
ccaistat@gmail.com

In longitudinal data analyses, the observation times are often assumed to be independent of the outcomes. In applications in which this assumption is violated, the standard inferential approach of using generalized estimating equations may lead to biased inference. Current methods require the correct specification of either the observation time process or the repeated measures process with a correct covariance structure. In this article, we construct a novel pairwise pseudo-likelihood method for longitudinal data that allows for dependence between observation times and outcomes. This method investigates the marginal covariate effects on the repeated measures process, while leaving the probability structure of the observation time process unspecified. The novelty of this method is that it yields a consistent estimator of the marginal covariate effects without specification of the observation time process or the covariance structure of the repeated measures process. Large-sample properties of the regression coefficient estimates and a pseudo-likelihood-ratio test procedure are established. Simulation studies demonstrate that the proposed method performs well in finite samples. An analysis of weight loss data from a web-based program is presented to illustrate the proposed method.

Session 24 Bayesian Models for High Dimensional Complex Data

A Bayesian Feature Allocation Model for Tumor Heterogeneity
Juhee Lee1, Peter Mueller2, Yuan Ji3 and Kamalakar Gulukota4

1University of California at Santa Cruz
2University of Texas at Austin
3University of Chicago
4NorthShore University HealthSystem
juheelee@soe.ucsc.edu
We propose a feature allocation model for tumor heterogeneity. The data are next-generation sequencing (NGS) data from tumor samples. We use a variation of the Indian buffet process to characterize latent hypothetical subclones based on single nucleotide variations (SNVs). We define latent subclones by the presence of some subset of the recorded SNVs. Assuming that each sample is composed of some sample-specific proportions of these subclones, we can then fit the observed proportions of SNVs for each sample. By taking a Bayesian perspective, the proposed method provides a full description of all possible solutions as a coherent posterior probability model for all relevant unknown quantities, including the binary indicators that characterize the latent subclones by selecting (or not) the recorded SNVs, instead of reporting a single solution.

Some Results on the One-Way ANOVA Model with an Increasing Number of Groups
Feng Liang
University of Illinois at Urbana-Champaign
liangf@illinois.edu
Asymptotic studies of models with diverging dimensionality have received increasing attention in statistics. A simple version of such models is a one-way ANOVA model where the number of replicates is fixed but the number of groups goes to infinity. Of interest are inference problems like model selection and estimation of the unknown group means. We examine the consistency of Bayesian procedures using Zellner's (1986) g-prior and its variants (such as mixed g-priors and empirical Bayes), and compare their estimation accuracy with other procedures, such as ones based on AIC/BIC and the group Lasso. Our results indicate that the empirical Bayes procedure (with some modification for the large-p, small-n setting) and the fully Bayes procedure (i.e., a prior is specified on g) can achieve model selection consistency and also have better estimation accuracy than the other procedures considered.

Bayesian Graphical Models for Differential Pathways
Riten Mitra1, Peter Mueller2 and Yuan Ji3
1University of Louisville
2University of Texas at Austin
3NorthShore University HealthSystem/University of Chicago
jiyuan@uchicago.edu
Graphical models can be used to characterize the dependence structure for a set of random variables. In some applications, the form of dependence varies across different subgroups. This situation arises, for example, when protein activation on a certain pathway is recorded and a subgroup of patients is characterized by a pathological disruption of that pathway. A similar situation arises when one subgroup of patients is treated with a drug that targets that same pathway. In both cases, understanding changes in the joint distribution and dependence structure across the two subgroups is key to the desired inference. Fitting a single model for the entire data could mask the differences; separate independent analyses, on the other hand, could reduce the effective sample size and ignore the common features. In this paper, we develop a Bayesian graphical model that addresses heterogeneity and implements borrowing of strength across the two subgroups by simultaneously centering the prior towards a global network. The key feature is a hierarchical prior for graphs that borrows strength across edges, resulting in a comparison of pathways across subpopulations (differential pathways) under a unified model-based framework. We apply the proposed model to data sets from two very different studies: histone modifications from ChIP-seq experiments, and protein measurements based on tissue microarrays.

Latent Space Models for Dynamic Networks
Yuguo Chen
University of Illinois at Urbana-Champaign
yuguo@illinois.edu

Dynamic networks are used in a variety of fields to represent the structure and evolution of the relationships between entities. We present a model which embeds longitudinal network data as trajectories in a latent Euclidean space. A Markov chain Monte Carlo algorithm is proposed to estimate the model parameters and latent positions of the nodes in the network. The model parameters provide insight into the structure of the network, and the visualization provided from the model gives insight into the network dynamics. We apply the latent space model to simulated data as well as real data sets to demonstrate its performance.

Session 25 Statistical Methods for Network Analysis

Consistency of Co-clustering for Exchangeable Graph and Array Data
David S. Choi1 and Patrick J. Wolfe2
1Carnegie Mellon University
2University College London
davidch@andrew.cmu.edu

We analyze the problem of partitioning a 0-1 array or bipartite graph into subgroups (also known as co-clustering), under a relatively mild assumption that the data is generated by a general nonparametric process. This problem can be thought of as co-clustering under model misspecification; we show that the additional error due to misspecification can be bounded by O(n^{-1/4}). Our result suggests that under certain sparsity regimes, community detection algorithms may be robust to modeling assumptions, and that their usage is analogous to the usage of histograms in exploratory data analysis.

Laplacian Shrinkage for Inverse Covariance Estimation from Heterogeneous Populations
Takumi Saegusa and Ali Shojaie
University of Washington
ashojaie@uw.edu

We introduce a general framework using a Laplacian shrinkage penalty for estimation of inverse covariance or precision matrices from heterogeneous, non-exchangeable populations. The proposed framework encourages similarity among disparate but related subpopulations, while allowing for differences among estimated matrices. We propose an efficient alternating direction method of multipliers (ADMM) algorithm for parameter estimation, and establish both variable selection and norm consistency of the estimator for

distributions with exponential or polynomial tails. Finally, we discuss the selection of the Laplacian shrinkage penalty based on hierarchical clustering in settings where the true relationship among samples is unknown, and discuss conditions under which this data-driven choice results in consistent estimation of precision matrices. Extensive numerical studies and applications to gene expression data from subtypes of cancer with distinct clinical outcomes indicate the potential advantages of the proposed method over existing approaches.

Estimating Signature Subgraphs in Samples of Labeled Graphs
Juhee Cho and Karl Rohe
University of Wisconsin-Madison
chojuhee@stat.wisc.edu
Network analysis is a vibrant area in statistics, biology, and computer science. Recently, an emerging type of data in these fields is samples of labeled networks (or graphs). The "labels" of networks imply that the nodes are labeled and that the same set of nodes reappears in all of the networks. They also have a dual meaning: there are values (e.g., age, gender, or healthy vs. sick) or vectors of values characterizing the associated network. From the analysis, we observe that only a part of the network, forming a "signature subgraph", varies across the networks, whereas the other part is very similar. We therefore develop methods to estimate the signature subgraph and show theoretical properties of the suggested methods under a framework that allows the sample size to go to infinity with a sparsity condition. To check the finite-sample performance of the methods, we conduct a simulation study and then analyze two data sets: 42 brain graphs from 21 subjects, and transcriptional regulatory network data from 41 diverse human cell types.

Fast Hierarchical Modeling for Recommender Systems
Patrick Perry
New York University
pperry@stern.nyu.edu
In the context of a recommender system, a hierarchical model allows for user-specific tastes while simultaneously borrowing estimation strength across all users. Unfortunately, existing likelihood-based methods for fitting hierarchical models have high computational demands, and these demands have limited their adoption in large-scale prediction tasks. We propose a moment-based method for fitting a hierarchical model which has its roots in a method originally introduced by Cochran in 1937. The method trades statistical efficiency for computational efficiency. It gives consistent parameter estimates, competitive prediction error performance, and dramatic computational improvements.
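The moment-based idea can be illustrated with a one-way random effects sketch: estimate the between-user variance by matching moments of the per-user means, then shrink each user's mean toward the grand mean. This is a generic illustration in the Cochran spirit, not the paper's exact estimator; the data layout and variance floor are assumptions.

```python
from statistics import mean

def shrunken_user_means(groups):
    """Method-of-moments fit of a one-way random effects model:
    estimate within- and between-user variances, then shrink each
    user mean toward the grand mean (a generic sketch, not the
    paper's exact estimator)."""
    means = {u: mean(v) for u, v in groups.items()}
    ns = {u: len(v) for u, v in groups.items()}
    grand = mean(x for v in groups.values() for x in v)
    # pooled within-user variance
    sw = sum((x - means[u]) ** 2 for u, v in groups.items() for x in v)
    s2_w = sw / (sum(ns.values()) - len(groups))
    # between-user variance by moment matching, floored at zero
    s2_b = max(0.0, mean((m - grand) ** 2 for m in means.values())
               - s2_w * mean(1.0 / n for n in ns.values()))
    return {u: grand + s2_b / (s2_b + s2_w / ns[u]) * (means[u] - grand)
            for u in groups}

# each user's mean is pulled toward the grand mean of 5
est = shrunken_user_means({"a": [1.0, 2.0, 3.0], "b": [7.0, 8.0, 9.0]})
```

The closed-form shrinkage weights are what make the method cheap: no iterative likelihood optimization is required.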

Session 26 New Analysis Methods for UnderstandingComplex Diseases and Biology

Data Integration for Identifying Clinically Important Long Non-coding RNA in Cancer
Yiwen Chen1, Zhou Du2, Teng Fei1, Roel G.W. Verhaak3, Yong Zhang2, Myles Brown4 and X. Shirley Liu4

1Dana Farber Cancer Institute
2Tongji University
3University of Texas MD Anderson Cancer Center
4Dana Farber Cancer Institute & Harvard University
ywchen@jimmy.harvard.edu
Cumulatively, 70% of the human genome is transcribed, whereas <2% of the genome encodes protein. As a part of this prevalent non-coding transcription, long non-coding RNAs (lncRNAs) are RNAs



that are longer than 200 base pairs (bps) but with little protein-coding capacity. The human genome encodes over 10,000 lncRNAs, and the function of the vast majority of them is unknown. Through integrative analysis of lncRNA expression profiles with clinical outcome and somatic copy number alteration, we identified lncRNAs that are associated with cancer subtypes and clinical prognosis, and predicted those that are potential drivers of cancer progression in multiple cancers, including glioblastoma multiforme (GBM), ovarian cancer (OvCa), lung squamous cell carcinoma (lung SCC), and prostate cancer. We validated our predictions of two tumorigenic lncRNAs by experimentally confirming the prostate cancer cell growth dependence on these two lncRNAs. Our integrative analysis provides a resource of clinically relevant lncRNAs for the development of lncRNA biomarkers and the identification of lncRNA therapeutic targets for human cancer.

Data Integration for Genetics-Based Drug Repurposing in Complex Diseases
Di Wu
Harvard University
dwu@fas.harvard.edu

A large number of genetic variants have been identified in cancer genome studies and GWAS studies. These variants may well capture the characteristics of the diseases. To best leverage this knowledge for developing new therapeutics to treat diseases, our study explores the possibility of using the genetics of diseases to guide drug repurposing. Drug repurposing asks whether the available drugs for certain diseases can be re-used for the treatment of other diseases. We particularly use the gene target information of drugs and protein-protein interaction information to connect risk genes, based on GWAS hits, with the available drugs. Drug indications were used to evaluate the sensitivity and specificity of the novel pipeline. Evaluation of the pipeline suggests a promising direction for certain diseases.

Comparative Meta-Analysis of Prognostic Gene Signatures for Late-Stage Ovarian Cancer
Levi Waldron
Hunter College
leviwaldron@hunter.cuny.edu

Authors: Levi Waldron, Benjamin Haibe-Kains, Aedín C. Culhane, Markus Riester, Jie Ding, Xin Victoria Wang, Mahnaz Ahmadifar, Svitlana Tyekucheva, Christoph Bernau, Thomas Risch, Benjamin Ganzfried, Curtis Huttenhower, Michael Birrer and Giovanni Parmigiani
Abstract: Numerous published studies have reported prognostic models of cancer patient survival from tumor genomics. These studies employ a wide variety of model training and validation methodologies, making it difficult to compare and rank their modeling strategies or the accuracy of the models. However, they have also generated numerous publicly available microarray datasets with clinically annotated individual patient data. Through systematic review, we identified and implemented fully specified versions of 14 prognostic models of advanced-stage ovarian cancer published over a 5-year period. These 14 published models were developed by different authors using disparate training datasets and statistical methods, but all claimed to be capable of predicting overall survival using microarray data. We evaluated these models for prognostic accuracy (defined by the concordance index for overall survival), adapting traditional methods of meta-analysis to synthesize results in ten independent validation datasets. This systematic evaluation showed that 1) models generated by penalized or ensemble

Cox proportional hazards-based regression methods out-performed models generated by more complicated methods, and strongly out-performed hypothesis-based models; 2) validation dataset bias existed, meaning that some datasets indicated better validation performance for all models than others, and comparative evaluation is needed to identify this source of bias; 3) datasets selected by authors for independent validation tended to over-estimate model accuracy compared to previously unused validation datasets; and 4) seemingly unrelated models generated highly correlated predictions, further emphasizing the need for comparative evaluation of accuracy. This talk will provide an overview of methods for prediction modeling in cancer genomics and highlight lessons from the first systematic comparative meta-analysis of published cancer genomics prognostic models.
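The concordance index used here as the accuracy measure has a simple pairwise definition; a minimal sketch of Harrell's C for right-censored data (tied event times are ignored for brevity, and the toy inputs are assumptions):

```python
def concordance_index(times, events, risks):
    """Harrell's C: among usable pairs (the earlier time is an
    observed event), the fraction where the higher predicted risk
    has the shorter survival; ties in risk count as 1/2."""
    num = den = 0.0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # pair is usable if subject i fails before subject j's time
            if events[i] and times[i] < times[j]:
                den += 1
                if risks[i] > risks[j]:
                    num += 1
                elif risks[i] == risks[j]:
                    num += 0.5
    return num / den

# perfectly ordered predictions give C = 1.0
c = concordance_index([2, 4, 6], [1, 1, 0], [0.9, 0.5, 0.1])
```

A value of 0.5 corresponds to random prediction, which is the natural baseline when comparing published signatures.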

Studying Spatial Organizations of Chromosomes via Parametric Model
Ming Hu1, Yu Zhu2, Zhaohui Steve Qin3, Ke Deng4 and Jun S. Liu5

1New York University
2Purdue University
3Emory University
4Tsinghua University
5Harvard University
minghu@nyumc.org
The recently developed Hi-C technology enables a genome-wide view of the spatial organization of chromosomes and has shed deep insights into genome structure and genome function. Although the technology is extremely promising, multiple sources of biases and uncertainties pose great challenges for data analysis. Statistical approaches for inferring three-dimensional (3D) chromosomal structure from Hi-C data are far from mature. Most existing models are highly over-parameterized, lack clear interpretations, and are sensitive to outliers. In this study, we propose parsimonious, easy-to-interpret, and robust helix models for reconstructing 3D chromosomal structure from Hi-C data. We also develop a negative binomial regression approach to account for over-dispersion in Hi-C data. When applied to a real Hi-C dataset, helix models achieve much better model adequacy scores than existing models. More importantly, these helix models reveal that geometric properties of chromatin spatial organization, as well as chromatin dynamics, are closely related to genome functions.

Session 27 Recent Advances in Time Series Analysis

Time Series Models for Spherical Data with Applications in Structural Biochemistry
Jay Breidt, Daniel Hernandez-Stumpfhauser and Mark van der Woerd
Colorado State University
jbreidt@gmail.com
Proteins consist of sequences of the 21 natural amino acids. There can be tens to hundreds of amino acids in a protein, and hundreds to hundreds of thousands of atoms. A complete model for the protein consists of coordinates for every atom. A useful class of simplified models is obtained by focusing only on the alpha-carbon sequence, consisting of the primary carbon atom in the backbone of each amino acid. The three-dimensional structure of the alpha-carbon backbone of the protein can be described as a sequence of angle pairs, each consisting of a bond angle and a dihedral angle. These angle pairs lie naturally on a sphere. We consider autoregressive time series models for such spherical data sequences, using extensions of projected normal distributions. We describe an application to protein data and further developments, including autoregressive models that switch parameterizations according to local structure in the protein (such as helices, beta-sheets and coils).

Semiparametric Estimation of Spectral Density Function with Irregular Data
Shu Yang and Zhengyuan Zhu

Iowa State University
zhu1997@gmail.com

We propose a semi-parametric method to estimate spectral densities of isotropic Gaussian processes with irregular observations. The spectral density function at low frequencies is estimated using a smoothing spline, while we use a parametric model for the spectral density at high frequencies and estimate the parameters using a method of moments based on the empirical variogram at small lags. We derive asymptotic bounds for the bias and variance of the proposed estimator. Simulation results show that our method outperforms the existing nonparametric estimator by several performance criteria.

On the Prediction of Stationary Functional Time Series
Alexander Aue1, Diogo Dubart Norinho2 and Siegfried Hormann3

1University of California at Davis
2University College London
3Université Libre de Bruxelles
aaue@ucdavis.edu

This talk addresses the prediction of stationary functional time series. Existing contributions to this problem have largely focused on the special case of first-order functional autoregressive processes, because of their technical tractability and the current lack of advanced functional time series methodology. It is shown how standard multivariate prediction techniques can be utilized in this context. The connection between functional and multivariate predictions is made precise for the important case of vector and functional autoregressions. The proposed method is easy to implement, making use of existing statistical software packages, and may therefore be attractive to a broader, possibly non-academic audience. Its practical applicability is enhanced through the introduction of a novel functional final prediction error model selection criterion that allows for an automatic determination of the lag structure and the dimensionality of the model. The usefulness of the proposed methodology is demonstrated in simulations and an application to the prediction of daily pollution curves. It is found that the proposed prediction method often significantly outperforms existing methods.

A Composite Likelihood-based Approach for Multiple Change-point Estimation in Multivariate Time Series Models
Chun Yip Yau and Ting Fung Ma

Chinese University of Hong Kong
cyyau@sta.cuhk.edu.hk

We propose a likelihood-based approach for multiple change-point estimation in general multivariate time series models. Specifically, we consider a criterion function based on pairwise likelihood to estimate the number and locations of change-points, and to perform model selection for each segment. By virtue of the pairwise likelihood, the number and locations of change-points can be consistently estimated under very mild assumptions. Computation is conducted efficiently by a pruned dynamic programming algorithm. Simulation studies and real data examples are presented to demonstrate the statistical and computational efficiency of the proposed method.
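The dynamic-programming step can be illustrated with a generic optimal-partitioning sketch. It uses a squared-error segment cost and a fixed per-change-point penalty rather than the talk's pairwise-likelihood criterion (and omits pruning); the data and penalty value are assumptions:

```python
def optimal_partition(x, beta):
    """Exact multiple change-point detection by dynamic programming
    (optimal partitioning): minimize total segment squared-error cost
    plus a penalty beta per change-point."""
    n = len(x)
    # prefix sums give O(1) segment cost evaluation
    s = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(x):
        s[i + 1] = s[i] + v
        s2[i + 1] = s2[i] + v * v

    def cost(a, b):  # sum of squared deviations of x[a:b] from its mean
        return s2[b] - s2[a] - (s[b] - s[a]) ** 2 / (b - a)

    F = [0.0] + [float("inf")] * n
    back = [0] * (n + 1)
    for t in range(1, n + 1):
        for a in range(t):
            val = F[a] + cost(a, t) + (beta if a > 0 else 0.0)
            if val < F[t]:
                F[t], back[t] = val, a
    cps, t = [], n
    while t > 0:          # recover change-points from back-pointers
        t = back[t]
        if t > 0:
            cps.append(t)
    return sorted(cps)

cps = optimal_partition([0.0] * 10 + [5.0] * 10, beta=1.0)  # [10]
```

The pruning referred to in the abstract discards candidate split points `a` that can never be optimal, reducing the quadratic inner loop in practice.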

Session 28 Analysis of Correlated Longitudinal and Sur-vival Data

Analysis of a Non-Randomized Longitudinal Quality of Life Trial
Mounir Mesbah
University of Paris 6
mounir.mesbah@upmc.fr
In this talk, I will consider the context of a longitudinal study where participants are interviewed about their health quality of life, or another latent trait, at regular, previously established dates of visit. The interviews usually consist of filling in a questionnaire of multiple choice questions with various ordinal response scales, built in order to measure, at the time of the visit, the latent trait, which is assumed in a first step to be unidimensional. At the time of entering the study, each participant receives a treatment appropriate to his health profile. The choice of treatment is not randomized: it is decided by a doctor based on the health profile of the patient and a deep clinical examination. We assume that the different treatments that a doctor can choose are ordered (a dose effect). In addition, we assume that the treatment prescribed at entrance does not change throughout the study. In this work, I will investigate and compare strategies and models to analyze the time evolution of the latent variable in a longitudinal study when the main goal is to compare non-randomized ordinal treatments. I will illustrate my results with a real, complex longitudinal quality of life study.
References: [1] Bousseboua, M. and Mesbah, M. (2013). Longitudinal Rasch Process with Memory Dependence. Pub. Inst. Stat. Univ. Paris, Vol. 57, Fasc. 1-2, 45-58. [2] Christensen, K.B., Kreiner, S., Mesbah, M. (2013). Rasch Models in Health. J. Wiley. [3] Mesbah, M. (2012). Measurement and Analysis of Quality of Life in Epidemiology. In "Bioinformatics in Human Health and Heredity (Handbook of Statistics, Vol. 28)", Eds. Rao, C.R., Chakraborty, R. and Sen, P.K., North Holland, Chapter 15. [4] Rosenbaum, P.R. and Rubin, D.B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70, 1, pp. 41-55. [5] Imai, K. and Van Dyk, D.A. (2004). Causal Inference With General Treatment Regimes: Generalizing the Propensity Score. JASA, Vol. 99, N. 467, Theory and Methods.

Power and Sample Size Calculations for Evaluating Mediation Effects with Multiple Mediators in Longitudinal Studies
Cuiling Wang
Albert Einstein College of Medicine
cuiling.wang@einstein.yu.edu
Currently there is very limited statistical research on power analysis for evaluating mediation effects of multiple mediators in longitudinal studies. In addition to the complication of missing data common to longitudinal studies, the case of multiple mediators further complicates the hypothesis testing of mediation effects. Based on previous work of Wang and Xue (2012), we evaluate several hypothesis tests regarding the mediation effects from multiple mediators and provide formulae for power and sample size calculations. The performance of these methods under limited sample size is examined using simulation studies. An example from the Einstein Aging Study (EAS) is used to illustrate the methods.

Distribution-free First-hitting-time Based Threshold Regressions for Lifetime Data
Mei-Ling Ting Lee1 and G. Alex Whitmore2
1University of Maryland



2McGill University
mltlee@umd.edu
Cox regression methods are well known. The Cox model, however, relies on a strong proportional hazards assumption. In many medical contexts, a disease progresses until a failure event (such as death) is triggered when the health level first reaches a failure threshold. I'll present the threshold regression (TR) model for the health process, which requires few assumptions and hence is quite general in its potential application. Both parametric and distribution-free methods for estimation and prediction using TR models are derived. Case examples are presented that demonstrate the methodology and its practical use. The methodology provides medical researchers and biostatisticians with new and robust statistical tools for estimating treatment effects and assessing a survivor's remaining life.

Joint Modeling of Survival Data and Mismeasured Longitudinal Data using the Proportional Odds Model
Juan Xiong1, Wenqing He1 and Grace Yi2
1University of Western Ontario
2University of Waterloo
whe@stats.uwo.ca
Joint modeling of longitudinal and survival data has been studied extensively, with the Cox proportional hazards model frequently used to incorporate the relationship between survival time and covariates. Although the proportional odds model is an attractive alternative to the Cox proportional hazards model, featuring the dependence of survival times on covariates via cumulative covariate effects, it is rarely discussed in the joint modeling context. To fill this gap, we investigate joint modeling of survival data and longitudinal data which are subject to measurement error. We describe a model parameter estimation method based on the expectation-maximization algorithm. In addition, we assess the impact of naive analyses that fail to address the error occurring in longitudinal measurements. The performance of the proposed method is evaluated through simulation studies and a real data analysis.

Session 29 Clinical Pharmacology

Truly Personalizing Medicine
Mike D. Hale
Amgen Inc.
mdhale@amgen.com
Predictive analytics are increasingly used to optimize marketing for many non-medical products. These companies observe and analyze the behavior and/or characteristics of an individual, predict the needs of that individual, and then address those needs. We frequently encounter this when web-browsing and when participating in retail store loyalty programs: advertising and coupons are targeted to the specific individual, based on predictive models employed by advertisers and retailers. This makes the traditional drug development program appear antiquated, where a drug may be intended for all patients with a given indication. This talk contrasts those methods and practices for addressing individual needs with the way medicines are typically prescribed, and considers a way to integrate big data, the product label, and predictive analytics to improve and enable personalized medicine. Some important questions are posed (but unresolved), such as who could do this, and what are the implications if we were to predict outcomes for individual patients.

What Do Statisticians Do in Clinical Pharmacology?
Brian Smith

Amgen Inc.
brismith@amgen.com

Clinical pharmacology is the science of drugs and their clinical use. It could be argued that all drug development is clinical pharmacology; however, pharmaceutical companies typically separate development in a pattern similar to the following: A) clinical (late) development (Phase 2b-Phase 3); B) post-marketing (Phase 4); and C) clinical pharmacology (Phase 1-Phase 2a). As will be seen in this presentation, clinical pharmacology research presents numerous interesting statistical opportunities.

The Use of Modeling and Simulation to Bridge Different Dosing Regimens - a Case Study
Chyi-Hung Hsu and Jose Pinheiro
Janssen Research & Development
chsu3@its.jnj.com

In recent years, the pharmaceutical industry has increasingly faced the challenge of needing to efficiently evaluate and use all available information to improve its success rate in drug development under limited resource constraints. Modeling and simulation has established itself as the quantitative tool of choice to meet this existential challenge. Models provide a basis for quantitatively describing and summarizing the available information and our understanding of it. Using models to simulate data allows the evaluation of scenarios within, and even outside, the boundaries of the original data. In this presentation, we will discuss and illustrate the use of modeling and simulation techniques to bridge different dosing regimens based on studies using just one of the regimens. Special attention will be given to quantifying inferential uncertainty and model validation.

A Comparison of FDA and EMA Recommended Models for Bioequivalence Studies
Yongwu Shao, Lingling Han, Bing Gao, Sally Zhao, Susan Guo, Lijie Zhong and Liang Fang
Gilead Sciences
yongwu.shao@gilead.com

For a bioequivalence crossover study, the FDA guidance recommends a mixed effects model for the formulation comparisons of pharmacokinetic parameters, including all subject data, while the EMA guidance recommends an ANOVA model with fixed effects of sequence, subject within sequence, period, and formulation, excluding subjects with missing data from the pair-wise comparison. These two methods are mathematically equivalent when there are no missing values for the targeted comparison. With missing values, the mixed effects model including subjects with missing values provides higher statistical power than the fixed effects model excluding these subjects. However, parameter estimation in the mixed effects model is based on large-sample asymptotic approximations, which may introduce bias in the estimates of standard deviations when the sample size is small (Jones and Kenward, 2003).
In this talk, we provide a closed-form formula to quantify the potential gain in power from using mixed effects models when missing data are present. A simulation study was conducted to confirm the theoretical results. We also performed a simulation study to investigate the bias introduced by the mixed effects model for small sample sizes. Our results show that when the sample size is 12 or above, as required by both FDA and EMA, the bias introduced by the mixed effects model is negligible. From a statistical point of view, we recommend the mixed effects model approach for bioequivalence studies, for its potential gain in power when missing data are present and missing completely at random.
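For context, both models feed the standard 90% confidence interval for the test/reference geometric mean ratio. A minimal complete-pairs sketch follows; the data, and the use of a normal quantile in place of a t quantile and the full crossover model, are simplifying assumptions:

```python
import math
from statistics import NormalDist, mean, stdev

def gmr_ci(log_diffs, level=0.90):
    """90% CI for the test/reference geometric mean ratio, from
    within-subject differences of log-scale PK values (complete
    pairs only; normal quantile used instead of t for simplicity)."""
    n = len(log_diffs)
    m = mean(log_diffs)
    se = stdev(log_diffs) / math.sqrt(n)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return math.exp(m - z * se), math.exp(m + z * se)

# bioequivalence is concluded if the CI lies within (0.80, 1.25)
diffs = [0.05, -0.02, 0.10, 0.00, 0.03, -0.05,
         0.08, 0.02, -0.01, 0.04, 0.06, -0.03]
lo, hi = gmr_ci(diffs)
```

Excluding an incomplete subject simply drops its difference from `diffs`, which is the mechanism behind the power loss of the fixed effects approach discussed above.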



Session 30 Sample Size Estimation

Sample Size Calculation with Semiparametric Analysis of Long-Term and Short-Term Hazards
Yi Wang
Novartis Pharmaceuticals Corporation
yi-11.wang@novartis.com

We derive sample size formulae for survival data with non-proportional hazard functions under both fixed and contiguous alternatives. Sample size determination has been widely discussed in the literature for studies with failure-time endpoints. Many researchers have developed methods under the assumption of proportional hazards and contiguous alternatives. Without covariate adjustment, the logrank test statistic is often used for sample size and power calculations. With covariate adjustment, the approaches are often based on the score test statistic for the Cox proportional hazards model. Such methods, however, are inappropriate when the proportional hazards assumption is violated. We develop methods to calculate the sample size based on the semiparametric analysis of short-term and long-term hazard ratios. The methods are built on a semiparametric model of Yang and Prentice (2005). The model accommodates a wide range of patterns of hazard ratios, and includes the Cox proportional hazards model and the proportional odds model as special cases. Therefore, the proposed methods can be used for survival data with proportional or non-proportional hazard functions. In particular, the sample size formulae of Schoenfeld (1983) and Hsieh and Lavori (2000) can be obtained as special cases of our methods under contiguous alternatives.
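The Schoenfeld (1983) special case mentioned above has a closed form for the required number of events; a sketch with assumed inputs (two-sided 5% level, 80% power, hazard ratio 0.7, 1:1 allocation):

```python
import math
from statistics import NormalDist

def schoenfeld_events(hr, alpha=0.05, power=0.80, p1=0.5):
    """Required number of events for a two-sided log-rank test
    under proportional hazards (Schoenfeld, 1983); p1 is the
    allocation proportion of one arm."""
    z = NormalDist().inv_cdf
    num = (z(1 - alpha / 2) + z(power)) ** 2
    return num / (p1 * (1 - p1) * math.log(hr) ** 2)

d = schoenfeld_events(hr=0.7)  # ≈ 247 events
```

Converting events to enrolled subjects then requires assumptions on accrual, follow-up, and the event probability, which is where the non-proportional-hazards extensions above differ.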

Sample Size and Decision Criteria for Phase IIB Studies with Active Control
Xia Xu
Merck & Co.
xia.xu@merck.com

In drug development programs, Phase IIB studies provide information for the Go/No Go decision of conducting large confirmatory Phase III studies. Currently, more and more Phase IIB studies use an active control as comparator, especially in the development of new therapies for the treatment of HIV infection, where it is not ethical to use a placebo control due to the severity of the disease and the availability of approved drugs. If a Phase IIB study demonstrates "comparable" efficacy and safety relative to the active control, the program may proceed to Phase III, which usually uses the same or a similar active control to formally assess non-inferiority of the new therapy. Sample size determination and quantification of decision criteria for such Phase IIB studies are explored using a Bayesian analysis.

Sample Size Determination for Clinical Trials to Correlate Outcomes with Potential Predictors
Su Chen, Xin Wang and Ying Zhang
AbbVie Inc.
su.chen@abbvie.com

Sample size determination can be a challenging task for a post-marketing clinical study aiming to establish the predictivity of a single influential measurement or a set of variables for a clinical outcome of interest. Since the relationship between the potential predictors and the outcome is unknown at the design stage, one may not be able to perform a conventional sample size calculation, and must look for other means to size the trial. Our proposed approach is based on the length of the confidence interval of the true correlation coefficient between predictor and outcome variables. In this study, we compare three methods to construct confidence intervals of the

correlation coefficient, based on the approximate sampling distribution of the Pearson correlation, the Z-transformed Pearson correlation, and bootstrapping, respectively. We evaluate the performance of the three methods under different scenarios with small to moderate sample sizes and different correlations. Coverage probabilities of the confidence intervals are compared across the three methods. The results are used for sample size determination based on the width of the confidence intervals. Hypothetical examples are provided to illustrate the idea and its implementation.
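The Z-transformed (Fisher) interval referred to above has a closed form, which is what makes sizing by CI width straightforward; a sketch with assumed values r = 0.5, n = 50:

```python
import math
from statistics import NormalDist

def fisher_z_ci(r, n, level=0.95):
    """CI for a correlation via Fisher's Z-transform: atanh(r) is
    approximately normal with standard error 1/sqrt(n-3)."""
    z = math.atanh(r)
    half = NormalDist().inv_cdf(0.5 + level / 2) / math.sqrt(n - 3)
    return math.tanh(z - half), math.tanh(z + half)

lo, hi = fisher_z_ci(0.5, 50)  # ≈ (0.26, 0.68)
```

To size the trial, one increases n until the resulting interval width falls below the target, since the width depends on n only through 1/sqrt(n-3).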

Sample Size Re-Estimation at Interim Analysis in Oncology Trials with a Time-to-Event Endpoint

Ian (Yi) Zhang

Sunovion Pharmaceuticals Inc.
ian.zhang@sunovion.com

Oncology is a hot therapeutic area due to highly unmet medical needs. In confirmatory oncology trials, the superiority of a study drug over a control is commonly assessed with respect to a time-to-event endpoint, such as overall survival (OS) or progression-free survival (PFS). Adaptive designs allowing for sample size re-estimation (SSR) at an interim analysis are often employed to accelerate oncology drug development while reducing costs. Although SSR is categorized as "less well understood" (in contrast to "well understood" designs such as the group sequential design) in the 2010 draft FDA guidance on adaptive designs, it has gradually gained regulatory acceptance and is widely adopted in industry. In this presentation, a Phase II/III seamless design is developed to re-estimate the sample size based upon unblinded interim results, using the conditional power of observing a significant result by the end of the trial. The methodology achieves the desired conditional power while still controlling the type I error rate. Extensive simulation studies are performed to evaluate the operating characteristics of the design. A real-world example will also be used for illustration. Pros and cons of the design will be discussed.
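For background, SSR rules of this kind are typically driven by conditional power under the current trend; a minimal normal-approximation sketch (the inputs, and the current-trend drift assumption, are illustrative, not the presenter's exact method):

```python
from math import sqrt
from statistics import NormalDist

def conditional_power(z_t, t, alpha=0.025):
    """Conditional power at information fraction t, given interim
    z-statistic z_t, assuming the current trend continues
    (drift estimated as z_t/sqrt(t)); one-sided level alpha."""
    nd = NormalDist()
    drift = z_t / sqrt(t)
    z_alpha = nd.inv_cdf(1 - alpha)
    return 1 - nd.cdf(
        (z_alpha - z_t * sqrt(t) - drift * (1 - t)) / sqrt(1 - t))

cp = conditional_power(z_t=1.5, t=0.5)  # ≈ 0.59
```

In an SSR design, a promising-but-low conditional power like this is what triggers an increase in the target number of events, with the final test adjusted to preserve the type I error rate.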

Statistical Inference and Sample Size Calculation for Paired Binary Outcomes with Missing Data

Song Zhang

University of Texas Southwestern Medical Center
song.zhang@utsouthwestern.edu

We investigate the estimation of intervention effect and sample size determination for experiments where subjects are supposed to contribute paired binary outcomes, but some observations are incomplete. We propose a hybrid estimator to appropriately account for the mixed nature of the observed data: paired outcomes from those who contribute complete pairs of observations, and unpaired outcomes from those who contribute either pre- or post-intervention outcomes. We theoretically prove that if incomplete data are evenly distributed between the pre- and post-intervention periods, the proposed estimator will always be more efficient than the traditional estimator. A numerical study shows that when the distribution of incomplete data is unbalanced, the proposed estimator will be superior when there is moderate-to-strong positive within-subject correlation. We further derive a closed-form sample size formula to help researchers determine how many subjects need to be enrolled in such studies. Simulation results suggest that the calculated sample sizes maintain the empirical power and type I error under various design configurations. We demonstrate the proposed method using a real application example.



Session 31 Predictions in Clinical Trials

Predicting Smoking Cessation Outcomes Beyond Clinical Trials
Yimei Li, E. Paul Wileyto and Daniel F. Heitjan
University of Pennsylvania
yimeili@mail.med.upenn.edu
In smoking cessation trials, subjects usually receive treatment for several weeks, with additional information collected 6 or 12 months after that. An important question concerns predicting long-term cessation success based on short-term clinical observations. Several features need to be considered. First, subjects commonly transit several times between lapse and recovery, during which they exhibit both temporary and permanent quits, and both brief and long-term lapses. Second, although we have some reliable predictors of outcome, there is also substantial heterogeneity in the data. We therefore introduce a cure-mixture frailty model that describes the complex process of transitions between abstinence and smoking. Then, based on this model, we propose a Bayesian approach to predict individual future outcomes. We will compare predictions from our model to a variety of ad hoc methods.

Bayesian Event And Time Landmark Estimation In Clinical Trials When Responses Are Failure Time Data
Haoda Fu, Luping Zhao and Yanping Wang
Eli Lilly and Company
fuhaoda@gmail.com
In oncology trials, it is challenging to predict when we will have a certain number of events or, for a given period of time, how many additional events we can observe. We develop a tool called BEATLES, which stands for Bayesian Event And Time Landmark Estimation Software. This method and its tools have been broadly implemented at Lilly. In this talk we will present the technical details.

Predicting the Probability of Future Clinical Study Success Based on the Evidence from Electronic Medical Record (EMR) Data
Haoda Fu1 and Nan Jia2
1Eli Lilly and Company
2University of Southern California
jia_nan@lilly.com
To compare a treatment with a control via a randomized clinical trial, the assessment of treatment efficacy is often based on an overall treatment effect over a specific study population. To increase the probability of study success (PrSS), it is important to choose an appropriate and relevant study population where the treatment is expected to show overall benefit over the control. This research is to predict the PrSS based on EMR data for a given patient population. We can therefore use this approach to refine the study inclusion and exclusion criteria to increase the PrSS. For learning from EMR data, we also develop covariate balancing methods. Although our methods are developed for learning from EMR data, learning from randomized control trials is a special case of our methods.

Weibull Cure-Mixture Model for the Prediction of Event Times in Randomized Clinical Trials
Gui-shuang Ying1, Qiang Zhang2, Yimei Li1 and Daniel F. Heitjan1
1University of Pennsylvania
2Radiation Therapy Oncology Group Statistical Center
gsying@mail.med.upenn.edu
Many clinical trials with time-to-event outcomes are designed to perform interim and final analyses upon the occurrence of a pre-specified number of events. As an aid to trial logistical planning, it is desirable to predict the time to reach such landmark event numbers. Our previously developed parametric (exponential and Weibull) prediction models assume that every trial participant is susceptible to the event of interest and will eventually experience the event if follow-up time is long enough. This assumption may not hold, as some trial participants may be cured of the fatal disease, and failure to accommodate the cure possibility may lead to biased prediction. In this talk, a Weibull cure-mixture prediction model will be presented that assumes the trial participants are a mixture of susceptible (uncured) participants and non-susceptible (cured) participants. The cure probability is modelled using logistic regression, and the time to event among susceptible participants is modelled by a two-parameter Weibull distribution. The comparison of predictions from the Weibull cure-mixture prediction model to those from the standard Weibull prediction model will be demonstrated using data from a randomized trial of oropharyngeal cancer.
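The survival function of the cure-mixture model described above has a simple closed form, S(t) = pi + (1 - pi) * S_Weibull(t), where pi is the cured fraction. A minimal sketch, with generic parameter names rather than the paper's notation:

```python
import math

def cure_mixture_survival(t, cure_prob, shape, scale):
    """Survival function of a Weibull cure-mixture model:
    S(t) = pi + (1 - pi) * exp(-(t / scale) ** shape).
    Cured subjects never experience the event, so S(t) -> pi as t grows."""
    weibull_surv = math.exp(-((t / scale) ** shape))
    return cure_prob + (1.0 - cure_prob) * weibull_surv
```

Because S(t) plateaus at the cure probability instead of decaying to zero, event-count predictions based on this model accumulate events more slowly than a standard Weibull model, which is the source of the bias the abstract describes.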

Session 32 Recent Advances in Statistical Genetics

Longitudinal Exome-Focused GWAS of Alcohol Use in a Veteran Cohort
Zuoheng Wang, Zhong Wang, Amy C. Justice and Ke Xu
Yale University
zuoheng.wang@yale.edu
Alcohol dependence (AD) is a major public health concern in the United States and contributes to the pathogenesis of many diseases. The risk of AD is multifactorial and includes shared genetic and environmental factors. However, gene mapping in AD has not yet been successful: the confirmed associations account for a small proportion of the overall genetic risk. Multiple measurements in longitudinal genetic studies provide a route to reduce noise and correspondingly increase the strength of signals in genome-wide association studies (GWAS). In this study, we developed a powerful statistical method for testing the joint effect of genetic variants within a gene region on diseases measured over multiple time points. We applied the new method to a longitudinal study of a veteran cohort with both HIV-infected and HIV-uninfected patients to understand the genetic risk underlying AD. We found an interesting gene that has been reported in HIV studies, suggestive of a potential gene-by-environment effect in alcohol use and HIV. We also conducted simulation studies to assess the performance of the new statistical method and demonstrated a power gain by taking advantage of repeated measurements and aggregating information across a biological region. This study not only contributes to the statistical toolbox of current GWAS but also potentially advances our understanding of the etiology of AD.

Type I Error in Regression-based Genetic Model Building
Heejong Sung1, Alexa J.M. Sorant1, Bhoom Suktitipat2 and Alexander F. Wilson1
1National Institutes of Health
2Mahidol University
sunghe@mail.nih.gov
The task of identifying genetic variants contributing to trait variation is increasingly challenging given the large number and density of variant data being produced. Current methods of analyzing these data include regression-based variable selection methods, which produce linear models incorporating the chosen variants. For example, the Tiled Regression method begins by examining relatively small segments of the genome called tiles; selection of significant predictors, if any, is done first within individual tiles. However, type I error rates for such methods have not been fully investigated, particularly considering correlation among variants. To investigate type I error in this situation, we simulated a mini-GWAS genome including 306,097 SNPs in 4,000 unrelated samples with 2,000 non-genetic traits. Initially, 53,060 tiles were defined by dividing the genome according to recombination hotspots; then larger tiles were defined by combining groups of ten consecutive tiles. Stepwise regression and LASSO variable selection methods were performed within tiles for each tile definition. Type I error rates were calculated as the number of selected variants divided by the number considered, averaged over the 2,000 phenotypes. Overall error rates for stepwise regression using a fixed selection criterion of 0.05 and for LASSO minimizing mean square error were 0.04 and 0.12, respectively, when using the initial (smaller) tiles. Considering separately each combination of tile size (number of SNPs) and multicollinearity (defined as 1 minus the determinant of the genotype correlation matrix), observed type I error rates for stepwise regression tended to increase with the number of variants and decrease with increasing multicollinearity; with LASSO, the trends were in the opposite direction. When the larger tiles were used, overall rates for LASSO were noticeably smaller, while overall rates were rather robust for stepwise regression.

GMDR: A Conceptual Framework for Detection of Multifactor Interactions Underlying Complex Traits
Xiang-Yang Lou
University of Alabama at Birmingham
xylou@uab.edu
Biological outcomes are governed by multiple genetic and environmental factors that act in concert. Determining multifactor interactions is a primary topic of interest in recent genetics studies, but it presents enormous statistical and mathematical challenges. The computationally efficient multifactor dimensionality reduction (MDR) approach has emerged as a promising tool for meeting these challenges. On the other hand, complex traits are expressed in various forms and have different data generation mechanisms that cannot be appropriately modeled by a dichotomous model, and the subjects in a study may be recruited according to its own analytical goals, research strategies, and available resources, not only as homogeneous unrelated individuals. Although several modifications and extensions of MDR have in part addressed these practical problems, they remain limited for statistical analyses of diverse and multivariate phenotypes and correlated observations, for correcting for potential population stratification, and for unifying both unrelated and family samples into a more powerful analysis. I propose a comprehensive statistical framework, referred to as generalized MDR (GMDR), for systematic extension of MDR. The proposed approach is quite versatile: it allows for covariate adjustment; it is suitable for analyzing almost any trait type, e.g., binary, count, continuous, polytomous, ordinal, time-to-onset, multivariate, and others, as well as combinations of those; and it is applicable to various study designs, including homogeneous and admixed, unrelated-subject and family designs, as well as mixtures of them. The proposed GMDR offers an important addition to the arsenal of analytical tools for identifying nonlinear multifactor interactions and unraveling the genetic architecture of complex traits.

Gene-Gene Interaction Analysis for Rare Variants: Application to T2D Exome Sequencing Data
Taesung Park1, Min-Seok Kwon1 and Seung Yeoun Lee2
1Seoul National University
2Sejong University
tspark@stats.snu.ac.kr
Heritability of complex diseases may not be fully explained by common variants. This missing heritability could be partly due to gene-gene interaction and rare variants. There has been exponential growth of gene-gene interaction analysis for common variants in terms of methodological developments and practical applications. Also, the recent advance of high-throughput sequencing technologies makes it possible to conduct rare variant analysis. However, little progress has been made in gene-gene interaction analysis for rare variants. Here, we propose a new gene-gene interaction method for rare variants in the framework of multifactor dimensionality reduction (MDR) analysis. The proposed method consists of two steps: the first step is to collapse the rare variants in a specific region, such as a gene; the second step is to perform MDR analysis on the collapsed rare variants. The proposed method is illustrated with 1,080 whole-exome sequencing samples from a Korean population to identify causal gene-gene interactions of rare variants for type 2 diabetes.
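The first (collapsing) step of the two-step procedure above might be sketched as follows. The MAF cutoff and the carrier-indicator coding are assumptions for illustration, and the MDR step itself is not shown:

```python
import numpy as np

def collapse_rare_variants(genotypes, gene_index, maf_threshold=0.01):
    """Collapse rare variants within each gene region into a single carrier
    indicator. `genotypes` is an n x p matrix of minor-allele counts (0/1/2);
    `gene_index` maps gene name -> list of column indices for that region."""
    G = np.asarray(genotypes, dtype=float)
    n = G.shape[0]
    collapsed = {}
    for gene, cols in gene_index.items():
        sub = G[:, cols]
        maf = sub.sum(axis=0) / (2.0 * n)            # minor allele frequency per variant
        rare = sub[:, maf < maf_threshold]           # keep only rare variants
        # 1 if the subject carries any rare allele in the region, else 0
        collapsed[gene] = (rare.sum(axis=1) > 0).astype(int)
    return collapsed
```

The resulting per-gene indicators can then be fed into an MDR analysis exactly as common variants would be, which is the point of the two-step construction.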

Session 33 Structured Approach to High Dimensional Data with Sparsity and Low Rank Factorization

Two-way Regularized Matrix Decomposition
Jianhua Huang
Texas A&M University
jianhua@stat.tamu.edu
Matrix decomposition (or low-rank matrix approximation) plays an important role in various statistical learning problems. Regularization has been introduced to matrix decomposition to achieve stability, especially when the row or column dimension is high. When both the row and column domains of the matrix are structured, it is natural to employ a two-way regularization penalty in low-rank matrix approximation. This talk discusses the importance of considering invariance when designing the two-way penalty and shows undesirable properties of some penalties used in the literature when the invariance is ignored.
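For context, the unregularized baseline that two-way regularized decompositions build on is the truncated SVD, which gives the best rank-r approximation in Frobenius norm. A minimal sketch (the regularized, structured version discussed in the talk is not shown):

```python
import numpy as np

def low_rank_approx(M, rank):
    """Best rank-r approximation of M via truncated SVD (Eckart-Young).
    Two-way regularized decompositions add row/column penalties on U and V."""
    U, s, Vt = np.linalg.svd(np.asarray(M, dtype=float), full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]
```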

Tensor Regression with Applications in Neuroimaging Analysis
Hua Zhou1, Lexin Li1 and Hongtu Zhu2
1North Carolina State University
2University of North Carolina at Chapel Hill
lli10@ncsu.edu
Classical regression methods treat covariates as a vector and estimate a corresponding vector of regression coefficients. Modern applications in medical imaging generate covariates of more complex form, such as multidimensional arrays (tensors). Traditional statistical and computational methods are compromised for the analysis of such high-throughput data due to their ultrahigh dimensionality as well as complex structure. In this talk, I will discuss a new class of tensor regression models that efficiently exploit the special structure of tensor covariates. Under this framework, ultrahigh dimensionality is reduced to a manageable level, resulting in efficient estimation and prediction. Regularization, of both hard-thresholding and soft-thresholding types, will be carefully examined. The new methods aim to address a family of neuroimaging problems, including using brain images to diagnose neurodegenerative disorders, to predict onset of neuropsychiatric diseases, and to identify disease-relevant brain regions or activity patterns.

RKHS-Embedding Based Feature Screening for High-Dimensional Data
Krishnakumar Balasubramanian1, Bharath Sriperambadur2 and Guy Lebanon1
1Georgia Institute of Technology
2Pennsylvania State University
krishnakumar3@gatech.edu
Feature screening is a key step in handling ultrahigh-dimensional data sets that are ubiquitous in modern statistical problems. Over the last decade, convex relaxation based approaches (e.g., Lasso, sparse additive models) have been extensively developed and analyzed for feature selection in the high-dimensional regime, but these approaches suffer from several problems, both computationally and statistically. To overcome these issues, we propose a novel Hilbert space embedding based approach for independence screening in ultrahigh-dimensional data sets. The proposed approach is model-free (i.e., no model assumption is made between response and predictors) and can handle non-standard (e.g., graphs) and multivariate outputs directly. We establish the sure screening property of the proposed approach in the ultrahigh-dimensional regime and experimentally demonstrate its advantages over other approaches.

Sparse Conditional Graphical Models for Structured Genetic Datasets
Hyonho Chun
Purdue University
chunh@purdue.edu
For the purpose of inferring a network, we consider a sparse Gaussian graphical model (SGGM) in the presence of a population structure, which often occurs in genetic studies with model organisms. In these studies, datasets are obtained by combining multiple lines of inbred organisms or by using outbred animals. Ignoring such population structures would produce false connections in a graph structure, but most research in graph inference has focused on independent cases. On the other hand, in regression settings, a linear mixed effect model has been widely used to account for correlations among observations. Besides its effectiveness, the linear mixed effect model has a generality: the model can be stated within a framework of penalized least squares. This generality makes it very flexible for use in settings other than regression. In this manuscript, we adopt a linear mixed effect model for an SGGM. Our formulation fits into the recently developed conditional Gaussian graphical model, in which the population structures are modeled as predictors and the graph is determined by a conditional precision matrix. The proposed approach is applied to the network inference problem in two datasets: the heterogeneous mice diversity panel (HMDP) and heterogeneous stock (HS) datasets.

Session 34 Recent Developments in Dimension Reduction, Variable Selection and Their Applications

Variable Selection and Model Estimation via Subtle Uprooting
Xiaogang Su
University of Texas at El Paso
xiaogangsu@gmail.com
We propose a new method, termed "subtle uprooting," for fitting GLMs by optimizing a smoothed information criterion. The significance of this approach is that it completes variable selection and parameter estimation within one single optimization step and avoids tuning penalty parameters, as is commonly done in traditional regularization approaches. Two technical maneuvers, "uprooting" and an epsilon-threshold procedure, are employed to enforce sparsity in parameter estimates while maintaining the smoothness of the objective function. The formulation allows us to borrow strength from established methods and theories in both optimization and statistical estimation. More specifically, a modified BFGS algorithm (Li and Fukushima, 2001) is adopted to solve the non-convex yet smooth programming problem, with established global and super-linear convergence properties. By making connections to M-estimators and information criteria, we also show that the proposed method is consistent in variable selection and efficient in estimating the nonzero parameters. As illustrated with both simulated experiments and data examples, the empirical performance is either comparable or superior to many other competitors.

Robust Variable Selection Through Dimension Reduction
Qin Wang
Virginia Commonwealth University
qwang3@vcu.edu
Dimension reduction and variable selection play important roles in high-dimensional data analysis. MAVE (minimum average variance estimation) is an efficient approach proposed by Xia et al. (2002) to estimate the regression mean space. However, it is not robust to outliers in the dependent variable because of its use of the least-squares criterion. In this talk, we propose a robust estimation based on local modal regression, so that it is more applicable in practice. We further extend the new approach to select informative variables through shrinkage estimation. The efficacy of the new approach is illustrated through simulation studies.

Sparse Envelope Model: Efficient Estimation and Response Variable Selection in Multivariate Linear Regression
Zhihua Su1, Guangyu Zhu1 and Xin Chen2
1University of Florida
2National University of Singapore
zhihuasu@stat.ufl.edu
The envelope model, recently proposed by Cook, Li and Chiaromonte (2010), is a novel method to achieve efficient estimation in multivariate linear regression. It identifies the material and immaterial information in the data using the covariance structure among the responses; the subsequent analysis is based only on the material part and is therefore more efficient. The envelope estimator is consistent, but in the sample, the material part estimated by the envelope model consists of linear combinations of all the response variables, while in many applications it is important to pinpoint the response variables that are immaterial to the regression. For this purpose, we propose the sparse envelope model, which can identify these response variables and at the same time preserve the efficiency gains offered by the envelope model. A group-lasso type of penalty is employed to induce sparsity on the manifold structure of the envelope model. Consistency, the asymptotic distribution, and the oracle property of the estimator are established; in particular, new features of the oracle property with response selection are discussed. Simulation studies and an example demonstrate the effectiveness of this model.

Session 35 Post-Discontinuation Treatment in Randomized Clinical Trials

Marginal Structural Model with Adaptive Truncation in Estimating the Initial Treatment Effect with Informative Censoring by Subsequent Therapy
Jingyi Liu1, Li Li1, Xiaofei Bai2 and Douglas Faries1
1Eli Lilly and Company
2North Carolina State University
liu_jingyi@lilly.com
A randomized clinical trial is designed to estimate the direct effect of a treatment versus control, where patients receive the treatment of interest or control by random assignment. The treatment effect is measured by the comparison of endpoints of interest, e.g., overall survival. However, in some trials, patients who discontinue their initial randomized treatment are allowed to switch to another treatment based on clinicians' or patients' subjective decisions. In such cases the primary endpoint is censored, and the direct treatment effect of interest may be confounded by subsequent treatments, especially when the subsequent treatments have a large impact on the endpoint. In such studies there usually exist variables that are both risk factors of the primary endpoint and predictors of initiation of subsequent treatment; such variables are called time-dependent confounders. When time-dependent confounders exist, traditional methods such as the intent-to-treat (ITT) analysis and the time-dependent Cox model may not appropriately adjust for them and can result in biased estimators. Marginal structural models (MSM) have been applied to estimate the causal treatment effect when the initial treatment effect is confounded by subsequent treatments. It has been shown that an MSM utilizing inverse propensity weighting generates consistent estimators when the other nuisance parameters are correctly modeled. However, the occurrence of very large weights can cause the estimator to have inflated variance, and consistency may not hold. The augmented MSM estimator was proposed to estimate the treatment effect more efficiently, but it may not perform as well as expected in the presence of large weights. In this paper, we propose a new method that estimates weights by adaptively truncating longitudinal weights in the MSM. This method sacrifices consistency but gains efficiency when large weights exist, without ad hoc selection and removal of observations with large weights. We conducted simulation studies to explore the performance of several different methods, including the ITT analysis, the Cox model, and the proposed method, with regard to bias, standard deviation, coverage rate of the confidence interval, and mean squared error (MSE) under various scenarios. We also applied these methods to a randomized, open-label phase III study of patients with non-squamous non-small cell lung cancer.
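The basic truncation operation on inverse-probability weights might be sketched as below. The paper's adaptive choice of cut points is not shown here; the fixed percentiles are illustrative only:

```python
import numpy as np

def truncate_weights(weights, lower_pct=1.0, upper_pct=99.0):
    """Percentile truncation of inverse-probability weights: values outside
    the chosen percentiles are clipped to them, trading a small bias for a
    large reduction in variance when extreme weights occur."""
    w = np.asarray(weights, dtype=float)
    lo, hi = np.percentile(w, [lower_pct, upper_pct])
    return np.clip(w, lo, hi)
```

An adaptive version would select the cut points from the data (e.g., by a bias-variance criterion) rather than fixing them in advance, which is the contribution the abstract describes.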

Quantile Regression Adjusting for Dependent Censoring from Semi-Competing Risks
Ruosha Li1 and Limin Peng2
1University of Pittsburgh
2Emory University
rul12@pitt.edu
In this work, we study quantile regression when the response is an event time subject to potentially dependent censoring. We consider the semi-competing risks setting, where the time to censoring remains observable after the occurrence of the event of interest. While such a scenario frequently arises in biomedical studies, most current quantile regression methods for censored data are not applicable because they generally require the censoring time and the event time to be independent. By imposing rather mild assumptions on the association structure between the time-to-event response and the censoring time variable, we propose quantile regression procedures that allow us to garner a comprehensive view of the covariate effects on the event time outcome as well as to examine the informativeness of censoring. An efficient and stable algorithm is provided for implementing the new method. We establish the asymptotic properties of the resulting estimators, including uniform consistency and weak convergence. Extensive simulation studies suggest that the proposed method performs well with moderate sample sizes. We illustrate the practical utility of our proposals through an application to a bone marrow transplant trial.

Overview of Crossover Design
Ming Zhu
AbbVie Inc.
zhuming83@gmail.com
Crossover designs are used in many clinical trials. Compared to the conventional parallel design, a crossover design has the advantage of avoiding comparability issues between study and control groups with regard to potential confounding variables. Moreover, a crossover design is more efficient than a parallel design in that it requires a smaller sample size for given type I and type II error rates. However, a crossover design may suffer from carryover effects, which can bias the interpretation of the data analysis. In this presentation, I will talk about the general considerations and pitfalls to be avoided in the planning and analysis of a crossover trial. Appropriate statistical methods for crossover trial analysis will also be described.

Cross-Payer Effects of Medicaid LTSS on Medicare Resource Use using Propensity Score Risk Profiling
Yi Huang, Anthony Tucker and Karen Johnson
University of Maryland
yihuang@umbc.edu
Medicaid administrators look to establish a better balance between long-term services and supports (LTSS) provided in the community and in institutions, and to better integrate acute and long-term care for recipients who are dually eligible for Medicare. Programs of integrated care will require a solid understanding of the interactive effects that are masked by the separation of Medicare and Medicaid. This paper aims to evaluate the causal effect of Maryland's Older Adult Waiver (OAW) program on Medicare spending outcomes using a propensity score based health risk profiling technique. Specifically, dually eligible recipients enrolled in Maryland's OAW program were identified as the treatment group, and matched "control" groups were drawn from a comparable population who did not receive those services. The broader impact of this study is that such statistical approaches can be adopted by any state to facilitate the improvement of quality and cost effectiveness of LTSS for duals.

Session 36 New Advances in Semi-Parametric Modeling and Survival Analysis

Bayesian Partial Linear Model for Skewed Longitudinal Data
Yuanyuan Tang1, Debajyoti Sinha2, Debdeep Pati2, Stuart Lipsitz3 and Steven Lipshultz4
1AbbVie Inc.
2Florida State University
3Brigham and Women's Hospital
4University of Miami
debdeep@stat.fsu.edu
Current statistical models and methods focusing on the mean response are not appropriate for longitudinal studies with a heavily skewed continuous response. For such longitudinal responses, we present a novel model accommodating a partially linear median regression function, a flexible Dirichlet process mixture prior for the skewed error distribution, and a within-subject association structure. We provide theoretical justifications for our methods, including asymptotic properties of the posterior and the semi-parametric Bayes estimators. We also provide simulation studies of finite sample properties. Ease of computational implementation via available MCMC tools, and other advantages of our method compared to existing methods, are illustrated via analysis of a cardiotoxicity study of children of HIV-infected mothers.



Nonparametric Inference for Inverse Probability Weighted Estimators with a Randomly Truncated Sample
Xu Zhang
University of Mississippi
xzhang2@umc.edu
A randomly truncated sample appears when the independent variables T and L are observable only if L < T. The truncated-sample version of the Kaplan-Meier estimator is known to be the standard estimation method for the marginal distribution of T or L. The inverse probability weighted (IPW) estimator was suggested as an alternative, and its agreement with the truncated-sample Kaplan-Meier estimator has been proved. This paper centers on the weak convergence of IPW estimators and variance decomposition. The paper shows that the asymptotic variance of an IPW estimator can be decomposed into two sources: the variation of the IPW estimator using known weight functions is the primary source, and the variation due to estimated weights should be included as well. Variance decomposition establishes the connection between a truncated sample and a biased sample with known probabilities of selection. A simulation study was conducted to investigate the practical performance of the proposed variance estimators, as well as the relative magnitude of the two sources of variation for various truncation rates. A blood transfusion data set is analyzed to illustrate the nonparametric inference discussed in the paper.
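To make the connection to biased sampling concrete: with known selection probabilities, the IPW estimate of the marginal distribution reduces to a weighted empirical CDF. A sketch, illustrative only; in the truncation setting the weights must themselves be estimated, which contributes the second variance component discussed above:

```python
def ipw_cdf(times, selection_probs, t):
    """Inverse-probability-weighted estimate of F(t) = P(T <= t) from a
    biased sample where observation i was selected with known probability
    selection_probs[i]."""
    weights = [1.0 / p for p in selection_probs]
    total = sum(weights)
    return sum(w for x, w in zip(times, weights) if x <= t) / total
```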

Modeling Time-Varying Effects for High-Dimensional Covariates: A New Gateaux-Differential Boosting Approach
Kevin He, Yi Li and Ji Zhu
University of Michigan
yili@umich.edu
Survival models with time-varying effects provide a flexible framework for modeling the effects of covariates on event times. However, the difficulty of model construction increases dramatically as the number of variables grows, and existing constrained optimization and boosting methods suffer from computational complexity. We propose a new Gateaux-differential-based boosting procedure for simultaneously selecting covariates and automatically determining their functional form. The proposed method is flexible in that it extends gradient boosting to functional differentials in a general parameter space. In each boosting learning step of this procedure, only the best-fitting base learner (and therefore the most informative covariate) is added to the predictor, which consequently encourages sparsity. In addition, the method controls smoothness, which is crucial for improving predictive performance. The performance of the proposed method is examined by simulations and by an application to the national kidney transplant data.

Flexible Modeling of Survival Data with Covariates Subject to Detection Limits via Multiple Imputation
Paul Bernhardt1, Judy Wang2 and Daowen Zhang2
1Villanova University
2North Carolina State University
dzhang2@ncsu.edu
Models for survival data generally assume that covariates are fully observed. However, in medical studies it is not uncommon for biomarkers to be censored at known detection limits. A computationally efficient multiple imputation procedure for modeling survival data with covariates subject to detection limits is proposed. This procedure is developed in the context of an accelerated failure time model with a flexible seminonparametric error distribution. An iterative version of the proposed multiple imputation algorithm that approximates the EM algorithm for maximum likelihood is suggested. Simulation studies demonstrate that the proposed multiple imputation methods work well, while alternative methods lead to estimates that are either biased or more variable. The proposed methods are applied to analyze the dataset from the recently conducted GenIMS study.
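A much-simplified sketch of the imputation step: it assumes a normal error model rather than the paper's flexible seminonparametric distribution, and a single covariate left-censored at a known detection limit (represented by `None` entries):

```python
import random
import statistics

def impute_below_dl(values, detection_limit, n_imputations=5, seed=0):
    """Multiple imputation of a biomarker left-censored at a known detection
    limit, by rejection sampling from a normal distribution fitted to the
    observed values. Returns n_imputations completed datasets."""
    rng = random.Random(seed)
    observed = [v for v in values if v is not None]
    mu = statistics.fmean(observed)
    sd = statistics.stdev(observed)
    datasets = []
    for _ in range(n_imputations):
        filled = []
        for v in values:
            if v is not None:
                filled.append(v)
            else:
                draw = rng.gauss(mu, sd)
                while draw >= detection_limit:   # keep only draws below the limit
                    draw = rng.gauss(mu, sd)
                filled.append(draw)
        datasets.append(filled)
    return datasets
```

Each completed dataset would then be analyzed (here, by fitting the survival model), and the results combined across imputations in the usual multiple-imputation fashion.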

Session 37 High-Dimensional Data Analysis: Theory and Application

Structured Functional Additive Regression in Reproducing Kernel Hilbert Spaces
Hao Helen Zhang
University of Arizona
hzhang@math.arizona.edu
A new class of semiparametric functional regression models is considered to jointly model functional and non-functional predictors, identifying important scalar covariates while taking into account the functional covariate. In particular, we exploit a unified linear structure to incorporate the functional predictor, as in classical functional linear models, which is of nonparametric nature. At the same time, we include a potentially large number of scalar predictors as the parametric part, which may be reduced to a sparse representation. The new method performs variable selection and estimation by naturally combining functional principal component analysis (FPCA) and SCAD-penalized regression under one framework. Theoretical and empirical investigation reveals that efficient estimation of the important scalar predictors can be obtained and enjoys the oracle property, despite contamination by the noise-prone functional covariate. The study also sheds light on the influence of the number of eigenfunctions used to model the functional predictor on the correctness of model selection and the accuracy of the scalar estimates.

High-Dimensional Thresholded Regression and Shrinkage Effect
Zemin Zheng, Yingying Fan and Jinchi Lv
University of Southern California
zeminzhe@usc.edu
High-dimensional sparse modeling via regularization provides a powerful tool for analyzing large-scale data sets and obtaining meaningful, interpretable models. The use of nonconvex penalty functions shows advantages in selecting important features in high dimensions, but the global optimality of such methods still demands more understanding. In this paper, we consider sparse regression with a hard-thresholding penalty, which we show to give rise to thresholded regression. This approach is motivated by its close connection with L0-regularization, which can be unrealistic to implement in practice but has appealing sampling properties, and by its computational advantage. Under some mild regularity conditions, allowing possibly exponentially growing dimensionality, we establish the oracle inequalities of the resulting regularized estimator, as the global minimizer, under various prediction and variable selection losses, as well as the oracle risk inequalities of the hard-thresholded estimator followed by further L2-regularization. The risk properties exhibit interesting shrinkage effects under both estimation and prediction losses. We identify the optimal choice of the ridge parameter, which is shown to have simultaneous advantages for both the L2-loss and the prediction loss. These new results and phenomena are evidenced by simulation and real data examples.
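The two-stage idea above, hard-thresholding an initial estimate and then applying a further L2 (ridge) refit on the retained support, might be sketched as follows. The initial least-squares fit, the threshold, and the ridge value are illustrative choices, not the paper's:

```python
import numpy as np

def hard_threshold_ridge(X, y, threshold, ridge=1.0):
    """Hard-threshold an initial least-squares fit, then re-estimate the
    retained coordinates with a ridge (L2) refit on the selected support."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta0, *_ = np.linalg.lstsq(X, y, rcond=None)   # initial estimate
    keep = np.abs(beta0) > threshold                # hard-thresholding step
    beta = np.zeros_like(beta0)
    if keep.any():
        Xk = X[:, keep]
        A = Xk.T @ Xk + ridge * np.eye(Xk.shape[1])
        beta[keep] = np.linalg.solve(A, Xk.T @ y)   # ridge refit on support
    return beta
```

The ridge refit shrinks the retained coefficients slightly toward zero, which is the "shrinkage effect" the title refers to; the paper's contribution is characterizing this effect and the optimal ridge parameter theoretically.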

Local Independence Feature Screening for Nonparametric and Semiparametric Models by Marginal Empirical Likelihood
Jinyuan Chang1, Cheng Yong Tang2 and Yichao Wu3
1University of Melbourne
2University of Colorado Denver
3North Carolina State University
chengyong.tang@ucdenver.edu
We consider an independence feature screening method for identifying contributing explanatory variables in high-dimensional regression analysis. Our approach is constructed using the empirical likelihood approach in conjunction with marginal nonparametric regressions, so as to surely capture the local impacts of explanatory variables. Without requiring a specific parametric form of the underlying data model, our approach can be applied to a broad range of representative nonparametric and semiparametric models, which include, but are not limited to, nonparametric additive models, single-index and multiple-index models, and varying coefficient models. Facilitated by the marginal empirical likelihood, our approach addresses the independence feature screening problem with a new insight, by directly assessing evidence from the data on whether an explanatory variable contributes locally to the response variable or not. Such a feature avoids the estimation step in most existing independence screening approaches, and it is advantageous in scenarios such as single-index models where identification of the marginal effect is an issue for its estimation. Theoretical analysis shows that the proposed feature screening approach can handle data dimensionality growing exponentially with the sample size. Through extensive theoretical illustrations and empirical examples, we show that the local independence screening approach works promisingly.

The Fused Kolmogorov Filter: A Nonparametric Model-Free Screening Method
Qing Mai1 and Hui Zou2
1Florida State University
2University of Minnesota
mai@stat.fsu.edu
A new model-free screening method, named the fused Kolmogorov filter, is proposed for high-dimensional data analysis. This new method is fully nonparametric and can work with many types of covariates and response variables, including continuous, discrete and categorical variables. We apply the fused Kolmogorov filter to variable screening problems emerging from a wide range of applications, such as multiclass classification, nonparametric regression and Poisson regression, among others. It is shown that the fused Kolmogorov filter enjoys the sure screening property under weak regularity conditions that are much milder than those required by many existing nonparametric screening methods. In particular, the fused Kolmogorov filter can still be powerful when covariates are strongly dependent on each other. We further demonstrate the superior performance of the fused Kolmogorov filter over existing screening methods by simulations and real data examples.
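A minimal sketch of the underlying idea for a binary response (the fused version extends this by slicing a general response and fusing the statistics across slicing schemes; function names here are ours, for illustration): each covariate is ranked by the Kolmogorov-Smirnov distance between its two class-conditional empirical distributions.

```python
import numpy as np

def ks_stat(x0, x1):
    """Two-sample Kolmogorov-Smirnov statistic between empirical CDFs."""
    grid = np.sort(np.concatenate([x0, x1]))
    F0 = np.searchsorted(np.sort(x0), grid, side="right") / len(x0)
    F1 = np.searchsorted(np.sort(x1), grid, side="right") / len(x1)
    return np.max(np.abs(F0 - F1))

def kolmogorov_screen(X, y, top_k):
    """Rank covariates by the KS distance between the two response classes."""
    stats = np.array([ks_stat(X[y == 0, j], X[y == 1, j])
                      for j in range(X.shape[1])])
    return np.argsort(stats)[::-1][:top_k]

rng = np.random.default_rng(0)
n, p = 200, 50
y = rng.integers(0, 2, n)
X = rng.standard_normal((n, p))
X[:, 3] += 2.0 * y                      # only covariate 3 carries signal
selected = kolmogorov_screen(X, y, top_k=5)
```

Because the statistic depends only on ranks of the covariate, the screening is invariant to monotone transformations, which is one source of the method's model-free character.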

Session 38: Leading Across Boundaries: Leadership Development for Statisticians

Xiaoli Meng1, Dipak Dey2, Soonmin Park3, James Hung4 and Walter Offen5
1Harvard University
2University of Connecticut
3Eli Lilly and Company
4United States Food and Drug Administration
5AbbVie Inc.
1meng@stat.harvard.edu
2dipakdey@uconn.edu
3park_soomin@lilly.com
4hsienminghung@fda.hhs.gov
5walteroffen@abbvie.com
The role of the statistician has long been valued as that of a critical collaborator in interdisciplinary work. Nevertheless, the statistician is often regarded as a contributor more than a leader. This stereotype has limited statistics as a driving perspective in partnership environments, and the inclusion of statisticians in executive decision making. More leadership skills are needed to prepare statisticians to play influential roles and to promote our profession to be more impactful. In this panel session, statistician leaders from academia, government and industry will share their insights about leadership and their experiences in leading in their respective positions. Important leadership skills and qualities for statisticians will be discussed by the panelists. This session is targeted at statisticians who seek more knowledge and inspiration about leadership.

Session 39: Recent Advances in Adaptive Designs in Early Phase Trials

A Toxicity-Adaptive Isotonic Design for Combination Therapy in Oncology
Rui Qin
Mayo Clinic
qinrui@mayo.edu
With the development of molecularly targeted drugs in cancer treatment, combination therapy targeting multiple pathways to achieve potential synergy is becoming increasingly popular. While the dosing range of each individual drug may already be defined, the maximum tolerated dose of the combination is yet to be determined in a new phase I trial. The possible dose-level combinations, which are only partially ordered, pose a great challenge for conventional dose-finding designs. We propose to estimate toxicity probabilities by isotonic regression and to incorporate the attribution of toxicity into the consideration of dose escalation and de-escalation for combination therapy. Simulation studies are conducted to understand and assess the design's operating characteristics under various scenarios. The application of this novel design to an ongoing phase I clinical trial with dual agents is further illustrated as an example.
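The isotonic-regression step can be illustrated with the pool-adjacent-violators algorithm (PAVA). This is a hedged sketch of that step only, with hypothetical dose-level data; the design's escalation and toxicity-attribution logic is not shown.

```python
def pava(y, w):
    """Pool-adjacent-violators: weighted isotonic (non-decreasing) fit.
    Each block stores [fitted value, total weight, number of original points]."""
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # merge backwards while monotonicity is violated
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            v2, w2, n2 = blocks.pop()
            v1, w1, n1 = blocks.pop()
            blocks.append([(w1 * v1 + w2 * v2) / (w1 + w2), w1 + w2, n1 + n2])
    fit = []
    for v, _, n in blocks:
        fit.extend([v] * n)
    return fit

tox = [1, 0, 2, 3]                       # hypothetical DLT counts per dose level
pts = [3, 3, 6, 6]                       # patients treated per dose level
raw = [t / m for t, m in zip(tox, pts)]  # raw rates: 0.33, 0.00, 0.33, 0.50
iso = pava(raw, pts)                     # monotone toxicity estimates
```

Here the non-monotone raw rates at the first two dose levels are pooled into a common estimate, restoring the assumption that toxicity does not decrease with dose.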

Calibration of the Likelihood Continual Reassessment Method for Phase I Clinical Trials
Shing Lee1, Xiaoyu Jia2 and Ying Kuen Cheung1
1Columbia University
2Boehringer Ingelheim Pharmaceuticals
sml2114@columbia.edu
The likelihood continual reassessment method is an adaptive model-based design used to estimate the maximum tolerated dose in phase I clinical trials. The method is generally implemented in a two-stage approach, whereby model-based dose escalation is activated after an initial sequence of patients is treated. While it has been shown that the method has good large-sample properties, in finite-sample settings it is important to specify a reasonable model. We propose a systematic approach to selecting the initial dose sequence and the skeleton, based on the concepts of indifference interval and coherence. We compare these approaches to the traditional trial-and-error approach in the context of examples. The systematic calibration approach simplifies the model calibration process for the likelihood continual reassessment method while remaining competitive with a time-consuming trial-and-error process. We also share our experience using the calibration technique in real-life applications using the dfcrm package in R.

Sequential Subset Selection Procedures of Random Subset Size for Early Phase Clinical Trials
Cheng-Shiun Leu and Bruce Levin
Columbia University
cl94@columbia.edu
In early phase clinical trials, the objective is often to select a subset of promising candidate treatments, whose treatment effects are greater than those of the remaining candidates by at least a pre-specified amount, to bring forward for phase III confirmatory testing. Under certain constraints, such as budgetary limitations or difficulty of recruitment, a procedure that selects a subset of fixed, pre-specified size is entirely appropriate, especially when the number of treatments available for further testing is limited. However, clinicians and researchers often demand to identify all efficacious treatments in the screening process, and a subset selection of fixed size may not satisfy this requirement, as the number of efficacious treatments is unknown prior to the experiment. To address this issue, we discuss a family of sequential subset selection procedures which identify a subset of efficacious treatments of random size, thereby avoiding the need to pre-specify the subset size. Various versions of the procedure allow adaptive sequential elimination of inferior treatments and sequential recruitment of superior treatments as the experiment progresses. We compare these new procedures with Gupta's random-subset-size procedure for selecting the one best candidate by simulation.

Search Procedures for the MTD in Phase I Trials
Shelemyahu Zacks
Binghamton University
shelly@math.binghamton.edu
There are several competing methods of search for the MTD (maximum tolerated dose) in phase I cancer clinical trials. This talk will review some of these procedures and compare their operating characteristics. In particular, the EWOC method of Rogatko et al. will be highlighted.

Session 40: High-Dimensional Regression/Machine Learning

Variable Selection for High-Dimensional Nonparametric Ordinary Differential Equation Models with Applications to Dynamic Gene Regulatory Networks
Hongqi Xue1, Tao Lu2, Hua Liang3 and Hulin Wu1
1University of Rochester
2State University of New York at Albany
3George Washington University
hongqi_xue@urmc.rochester.edu
The gene regulatory network (GRN) is a high-dimensional complex system, which can be represented by various mathematical or statistical models. The ordinary differential equation (ODE) model is one of the popular dynamic GRN models. High-dimensional linear ODE models have been proposed to identify GRNs, but with the limitation of assuming linear regulation effects. We propose a nonparametric additive ODE model, coupled with two-stage smoothing-based ODE estimation methods and adaptive group LASSO techniques, to model dynamic GRNs in a way that flexibly accommodates nonlinear regulation effects. The asymptotic properties of the proposed method are established under the "large p, small n" setting. Simulation studies are performed to validate the proposed approach. An application example, identifying the nonlinear dynamic GRN of T-cell activation, is used to illustrate the usefulness of the proposed method.

BigData: Sign Cauchy Projections and Chi-Square Kernels
Ping Li1, Gennady Samorodnitsky2 and John Hopcroft2
1Rutgers University
2Cornell University
pingli98@gmail.com
The method of stable random projections is useful for efficiently approximating the l-alpha distance in high dimensions, and it is naturally suitable for data streams. In this paper we propose to use only the signs of the projected data for alpha = 1 (i.e., Cauchy random projections); we show that the probability of collision can be accurately approximated as a function of the chi-square similarity. In text and vision applications, the chi-square similarity is a popular measure when the features are generated from histograms (which are a typical example of data streams). Experiments confirm that the proposed method is promising for large-scale learning applications. The full paper is available at arXiv:1308.1009.
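A minimal numerical sketch of the idea, under our own assumptions (synthetic sparse "histogram" vectors; the closed-form approximation used at the end is one of those discussed in the paper, not its only result):

```python
import numpy as np

rng = np.random.default_rng(1)
p, k = 1000, 5000                  # data dimension, number of projections

# two sparse nonnegative "histogram" feature vectors (hypothetical data)
u = np.abs(rng.standard_normal(p))
u[rng.random(p) < 0.5] = 0.0
v = np.abs(rng.standard_normal(p))
v[rng.random(p) < 0.5] = 0.0
u1, v1 = u / u.sum(), v / v.sum()  # normalize to probability vectors

# chi-square similarity between the two histograms
chi2 = np.sum(2.0 * u1 * v1 / (u1 + v1 + 1e-300))

# Cauchy (alpha = 1 stable) random projections, keeping only the signs
R = rng.standard_cauchy((p, k))
collision = np.mean(np.sign(u1 @ R) == np.sign(v1 @ R))

# one approximation discussed in the paper: collision probability
# as a function of the chi-square similarity
approx = 1.0 - np.arccos(chi2) / np.pi
```

Storing only signs reduces each projected vector to k bits, which is what makes the approach attractive for streaming and large-scale learning.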

A Sparse Linear Discriminant Analysis Method with Asymptotic Optimality for Multiclass Classification
Ruiyan Luo and Xin Qi
Georgia State University
rluo@gsu.edu
Recently, many sparse linear discriminant analysis methods have been proposed to overcome the major problems of classic linear discriminant analysis in high-dimensional settings. However, asymptotic optimality results are limited to the case of two classes, where the classification boundary of LDA is a hyperplane and explicit formulas exist for the classification error. We propose an efficient sparse linear discriminant analysis method for multiclass classification. In practice, this method can control the relationship between the sparse components and hence achieves improved prediction accuracy compared to other methods, in both simulation and case studies. In theory, we derive asymptotic optimality for our method as dimensionality and sample size go to infinity, with an arbitrary fixed number of classes.

Generalized Hidden Markov Model for Variant Detection
Yichen Cheng, James Dai and Charles Kooperberg
Fred Hutchinson Cancer Research Center
ycheng@fhcrc.org
Developments in next-generation sequencing technology enable the detection of both common and rare variants, and genome-wide association studies (GWAS) benefit greatly from this fast-growing technology. Although many associations between variants and disease have been found for common variants, new methods for detecting functional rare variants are still in urgent need. Among existing methods, efforts have been made to increase detection power through set-based tests. However, none of these methods make a distinction between functional variants and neutral variants (i.e., variants that have no effect on the disease). In this paper, we propose to model the effects from a set of variants (for example, a gene) with a hidden Markov model (HMM). For each SNP, we model the effect as a mixture of 0 and theta, where theta is the true effect size. The mixture setup accounts for the fact that a proportion of the variants are neutral. Another advantage of using an HMM is that it can account for possible association between neighboring variants. Our method works well for both linear and logistic models. Under the HMM framework, we test having one component against having more components, and derive the asymptotic distribution under the null hypothesis. We show that our proposed method works well compared to competitors under various scenarios.

Large-Scale Joint Trait Risk Prediction for Mini-Exome Sequence Data
Gengxin Li
Wright State University
gengxinli@wright.edu
The empirical Bayes classification method is a useful risk prediction approach for microarray data, but it is challenging to apply this method to risk prediction with mini-exome sequencing data. A major advantage of using this method is that the effect size distribution for the set of possible features is empirically estimated, and all subsequent parameter estimation and risk prediction is guided by this distribution. Here, we generalize Efron's method to allow for some of the peculiarities of mini-exome sequencing data. In particular, we incorporate quantitative trait information into a binary trait prediction model, propose a new model named the Joint Trait Model, and further allow this model to properly incorporate the annotation information of single nucleotide polymorphisms (SNPs). In the course of our analysis we examine several aspects of the possible simulation model, including the identity of the most important genes, the differing effects of synonymous and non-synonymous SNPs, and the relative roles of covariates and genes in conferring disease risk. Finally, we compare the three methods to each other and to other classifiers.

Rank Estimation and Recovery of Low-Rank Matrices for Factor Models with Heteroscedastic Noise
Jingshu Wang and Art B. Owen
Stanford University
wangjingshususan@gmail.com
We consider recovery of low-rank matrices from noisy data with heteroscedastic noise. We use an early-stopping alternating method (ESAM), which iteratively alternates between estimating the noise variance and the low-rank matrix, and corrects over-fitting by an early-stopping rule. Various simulations in our study suggest stopping after just three iterations, and we have seen that ESAM gives better recovery than the SVD on either the original data or the standardized data when the optimal rank is given. To select the rank, we use an early-stopping bi-cross-validation (BCV) technique, modified from BCV for the white noise model. Our method leaves out half the rows and half the columns, as in BCV, but uses low-rank operations involving ESAM, instead of the SVD, on the retained data to predict the held-out entries. Simulations considering both strong and weak signal cases show that our method is the most accurate overall compared to several BCV strategies and two versions of parallel analysis (PA). PA is a state-of-the-art method for choosing the number of factors in factor analysis.

Session 41: Distributional Inference and Its Impact on Statistical Theory and Practice

Stat Wars, Episode IV: A New Hope (For Objective Inference)
Keli Liu and Xiao-Li Meng
Harvard University
meng@stat.harvard.edu
A long time ago in a galaxy far, far away (pre-war England)... It is a period of uncivil debate. Rebel statisticians, striking from an agricultural station, have won their first victory against the evil Bayesian Empire. A plea was made: "Help me, R. A. Fisher, you're my only hope," and Fiducial was born. It promised posterior probability statements on parameters without a prior, but at the seeming cost of violating basic probability laws. Was Fisher crazy, or did madness mask innovation? Fiducial calculations can be easily understood through the missing-data perspective, which illuminates a trinity of missing insights:
I. The Bayesian prior becomes an infinite-dimensional nuisance parameter to be dealt with using partial likelihood.
II. A Missing At Random (MAR) condition naturally characterizes when exact Fiducial solutions exist.
III. Understanding the "multi-phase" structure underlying Fiducial inference leads to the development of approximate Fiducial procedures which remain robust to prior misspecification.
In the years after its introduction, Fiducial's critics branded it "Fisher's biggest blunder." But in the great words of Obi-Wan: "If you strike me down, I shall become more powerful than you can possibly imagine."
To be continued: Episode V, Ancillarity Paradoxes Strike Back (At Fiducial), and Episode VI, Return of the Fiducialist, will premiere, respectively, at the IMS Asia Pacific Rim Meeting in Taipei (June 30-July 3, 2014) and at the IMS Annual Meeting in Sydney (July 7-11, 2014).

Higher Order Asymptotics for Generalized Fiducial Inference
Abhishek Pal Majumdar and Jan Hannig
University of North Carolina at Chapel Hill
jan.hannig@unc.edu
R. A. Fisher's fiducial inference has been the subject of many discussions and controversies ever since he introduced the idea during the 1930s. The idea experienced a bumpy ride, to say the least, during its early years, and one can safely say that it eventually fell into disfavor among mainstream statisticians. However, it appears to have made a resurgence recently under various names and modifications. For example, under the new name of generalized inference, fiducial inference has proved to be a useful tool for deriving statistical procedures for problems where frequentist methods with good properties were previously unavailable. We therefore believe that the fiducial argument of R. A. Fisher deserves a fresh look from a new angle. In this talk we investigate the properties of generalized fiducial distributions using higher order asymptotics, and provide suggestions on some open issues in fiducial inference, such as the choice of the data generating equation.

Generalized Inferential Models
Ryan Martin
University of Illinois at Chicago
rgmartin@uic.edu
The new inferential model (IM) framework provides prior-free probabilistic inference which is valid for all models and all sample sizes. The construction of an IM requires specification of an association that links the observable data to the parameter of interest and an unobservable auxiliary variable. This specification can be challenging, however, particularly when the parameter is more than one-dimensional. In this talk I will present a generalized (or "black-box") IM that bypasses full specification of the association, and the challenges it entails, by working with an association based on a scalar-valued, parameter-dependent function of the data. Theory and examples demonstrate that this method gives exact and efficient prior-free probabilistic inference in a wide variety of problems.

Formal Definition of Reference Priors under a General Class of Divergences
Dongchu Sun
University of Missouri
sund@missouri.edu
Reference analysis produces objective Bayesian inference, in the sense that inferential statements depend only on the assumed model and the available data, and the prior distribution used to make an inference is least informative in a certain information-theoretic sense. Berger, Bernardo and Sun (2009) derived reference priors rigorously in contexts under the Kullback-Leibler divergence. In special cases with common support and other regularity conditions, Ghosh, Mergel and Liu (2011) derived a general f-divergence criterion for prior selection. We generalize Ghosh, Mergel and Liu's (2011) results to the case without common support, and show how an explicit expression for the reference prior can be obtained under posterior consistency. The explicit expression can be used to derive new reference priors, both analytically and numerically.

Session 42: Applications of Spatial Modeling and Imaging Data

Spatial Bayesian Variable Selection and Shrinkage in High-Dimensional Covariate Spaces with Application to fMRI
Fan Li1, Tingting Zhang (co-first author)2, Quanli Wang1 and James Coan2
1Duke University
2University of Virginia
tz3b@virginia.edu
Multi-subject functional magnetic resonance imaging (fMRI) data provide opportunities to study population-wide relationships between human brain activity and individual biological or behavioral traits. But statistical modeling, analysis and computation for such massive and noisy data, with a complicated spatio-temporal correlation structure, is extremely challenging. In this article, within the framework of Bayesian stochastic search variable selection, we propose a joint Ising and Dirichlet process (Ising-DP) prior to achieve selection of spatially correlated brain voxels that are predictive of individual responses. The Ising component of the prior utilizes the spatial information between voxels, while the DP component shrinks the coefficients of the large number of voxels to a small set of values, thus greatly reducing the posterior computational burden. To address the phase transition phenomenon of the Ising prior, we propose a new analytic approach to derive bounds for the hyperparameters, illustrated on 2- and 3-dimensional lattices. The proposed method is compared with several alternative methods via simulations, and is applied to the fMRI data collected from the Kiff hand-holding experiment.

A Hierarchical Model for Simultaneous Detection and Estimation in Multi-Subject fMRI Studies
David Degras1 and Martin Lindquist2
1DePaul University
2Johns Hopkins University
ddegrasv@depaul.edu
In this paper we introduce a new hierarchical model for the simultaneous detection of brain activation and estimation of the shape of the hemodynamic response in multi-subject fMRI studies. The proposed approach circumvents a major stumbling block in standard multi-subject fMRI data analysis, in that it allows the shape of the hemodynamic response function to vary across regions and subjects, while still providing a straightforward way to estimate population-level activation. An efficient estimation algorithm is presented, as is an inferential framework that allows not only for tests of activation but also for tests of deviations from some canonical shape. The model is validated through simulations and application to a multi-subject fMRI study of thermal pain.

On the Relevance of Accounting for Spatial Correlation: A Case Study from Florida
Linda J. Young1 and Emily Leary2
1USDA NASS RDD
2University of Florida
lindayoung@nass.usda.gov
Identifying the potential impact of climate change is of increasing interest. For example, understanding the effects of changing temperature patterns on crops, animals and public health is important if mitigation or adaptation strategies are to be developed. Here, the consequences of the increasing frequency and intensity of heat waves are considered. First, four decades of temperature data are used to identify heat waves for the six National Weather Service regions within Florida. During these forty years, each temperature monitor has some days for which no data were recorded. The presence of missing data has largely been ignored in this setting, and analyses have been conducted based on observed data alone. Alternatively, time series models, spatial models, or space-time models could be used to impute the missing data. Here, the effects of the treatment of missing data on the identification of heat waves, and on the subsequent inference related to the impact of heat waves on public health, are explored.

Statistical Approaches for Calibration of Climate Models
Gabriel Huerta1, Charles Jackson2 and Alvaro Nosedal1
1University of New Mexico
2University of Texas at Austin
ghuerta@stat.unm.edu
We consider some recent developments for dealing with climate models that rely on various modern computational and statistical strategies. First, we consider various posterior sampling strategies to study a surrogate model that approximates a climate response through the Earth's orbital parameters. In particular, we show that for certain metrics of model skill, adaptive/delayed-rejection MCMC methods are effective for estimating parametric uncertainties and resolving inverse problems for climate models. We will also discuss some of the high-performance computing efforts that are taking place to calibrate various inputs of the NCAR Community Atmosphere Model (CAM). Finally, we show how to characterize output from a regional climate model through hierarchical modeling that combines Gauss-Markov random fields (GMRF) with MCMC methods, allowing estimation of the probability distributions that underlie phenomena represented by the climate output.

Session 43: Recent Developments in Survival Analysis and Statistical Genetics

Restricted Survival Time and Non-Proportional Hazards
Zhigang Zhang
Memorial Sloan Kettering Cancer Center
zhangz@mskcc.org
In this talk I will present some recent developments on restricted survival time and its usage, especially when the proportional hazards assumption is violated. Technical advances and numerical studies will both be discussed.
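For orientation, the restricted mean survival time (RMST) up to a horizon tau is the area under the Kaplan-Meier curve on [0, tau], and it stays interpretable when hazards are non-proportional. A minimal sketch with toy data (our own illustration; it assumes events precede censorings at tied times):

```python
import numpy as np

def rmst(time, event, tau):
    """Restricted mean survival time: area under the Kaplan-Meier
    curve on [0, tau]. event = 1 for death, 0 for censoring."""
    order = np.argsort(time, kind="stable")
    t = np.asarray(time, float)[order]
    d = np.asarray(event, int)[order]
    at_risk = len(t)
    surv, last_t, area = 1.0, 0.0, 0.0
    for ti, di in zip(t, d):
        if ti > tau:
            break
        if di == 1:
            area += surv * (ti - last_t)   # rectangle up to this event
            surv *= 1.0 - 1.0 / at_risk    # Kaplan-Meier step
            last_t = ti
        at_risk -= 1
    return area + surv * (tau - last_t)    # final rectangle up to tau

# toy data: survival times 1, 2, 3, 4 with the third observation censored
value = rmst([1, 2, 3, 4], [1, 1, 0, 1], tau=4.0)   # 2.75
```

Unlike a hazard ratio, the difference in RMST between two arms is a model-free summary with a direct reading: expected survival time gained within the first tau units of follow-up.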

Empirical Null using Mixture Distributions and Its Application in Local False Discovery Rate
DoHwan Park
University of Maryland
dhpark@umbc.edu
When high-dimensional data are given, it is often of interest to distinguish the significant (non-null, Ha) group from the non-significant (null, H0) group in a mixture of the two, by controlling the type I error rate. One popular way to control the level is the false discovery rate (FDR). This talk considers a method based on the local false discovery rate. In most previous studies, the null group is assumed to follow a normal distribution. However, if the null distribution departs from normal, there may be too many or too few false discoveries (cases that belong to the null but are rejected by the test), leading to failure to control the given level of FDR. We propose a novel approach which enriches the class of null distributions using mixture distributions. We provide real examples of gene expression data, fMRI data and protein domain data to illustrate the problems and give an overview.
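For context, the standard two-group local fdr with a theoretical N(0,1) null can be sketched as below (our own illustration with hypothetical parameters pi0 and bandwidth; the talk's point is precisely to replace this theoretical null f0 with a mixture-based empirical null when the normality assumption fails):

```python
import numpy as np

def local_fdr(z, pi0=0.9, bandwidth=0.3):
    """Two-group local fdr with a theoretical N(0,1) null:
    fdr(z) = pi0 * f0(z) / fhat(z), fhat a Gaussian kernel estimate."""
    z = np.asarray(z, float)
    f0 = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)
    diffs = (z[:, None] - z[None, :]) / bandwidth
    fhat = np.mean(np.exp(-0.5 * diffs ** 2), axis=1) / (bandwidth * np.sqrt(2.0 * np.pi))
    return np.minimum(1.0, pi0 * f0 / fhat)

# 900 null z-scores and 100 shifted (non-null) ones
rng = np.random.default_rng(3)
z = np.concatenate([rng.standard_normal(900), 3.0 + rng.standard_normal(100)])
lf = local_fdr(z)    # small values flag likely non-null cases
```

If the true null were, say, heavier-tailed than N(0,1), the f0 term above would be miscalibrated, producing too many or too few rejections; this is the failure mode the mixture-based empirical null is meant to address.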

A Bayesian Illness-Death Model for the Analysis of Correlated Semi-Competing Risks Data
Kyu Ha Lee1, Sebastien Haneuse1, Deborah Schrag2 and Francesca Dominici1
1Harvard University
2Dana-Farber Cancer Institute
klee@hsph.harvard.edu
Readmission rates are a major target of healthcare policy because readmission is common, costly and potentially avoidable, and hence is seen as an adverse outcome. The Centers for Medicare and Medicaid Services therefore currently use 30-day readmission as a proxy outcome for quality of care for a number of health conditions. However, focusing solely on readmission rates in conditions with poor prognosis, such as pancreatic cancer, oversimplifies a situation in which patients may die before being readmitted, which clearly is also an adverse outcome. In such situations, healthcare policy should consider both readmission and death rates simultaneously. To this end, our proposed Bayesian framework adopts an illness-death model to represent three transitions for pancreatic cancer patients recently discharged from their initial hospitalization: (1) discharge to readmission; (2) discharge to death; and (3) readmission to death. Dependence between the two event times (readmission and death) is induced via a subject-specific shared frailty. Our proposed method further extends the model to situations where patients within a hospital may be correlated due to unobserved characteristics. We illustrate the practical utility of our proposed method using data from Medicare Part A on 100% of Medicare enrollees from 01/2000 to 12/2010.

Detection of Chromosome Copy Number Variations in Multiple Sequences
Xiaoyi Min, Chi Song and Heping Zhang
Yale University
xiaoyi.min@yale.edu
DNA copy number variation (CNV) is a form of genomic structural variation that may affect human diseases. Identification of the CNVs shared by many people in the population, as well as determination of the carriers of these CNVs, is essential for understanding the role of CNV in disease association studies. For detecting CNVs in single samples, a Screening and Ranking Algorithm (SaRa) was previously proposed, which was shown to be superior to other commonly used algorithms and to have a sure coverage property. We extend SaRa to address the problem of common CNV detection in multiple samples. In particular, we propose an adaptive Fisher's method for combining the screening statistics across samples. The proposed multi-sample SaRa method inherits the computational and practical benefits of single-sample SaRa in CNV detection. We also characterize the theoretical properties of this method and demonstrate its performance in extensive numerical analyses.
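The screening step of single-sample SaRa can be sketched with a simple local diagnostic: the absolute difference of means in two adjacent sliding windows, whose local maxima are candidate change points. This is an illustration of the screening statistic only (our own toy data and function name; the ranking step and the multi-sample adaptive Fisher combination are not shown):

```python
import numpy as np

def local_diagnostic(y, h):
    """SaRa-style local statistic: |mean of right window - mean of left
    window| of width h at each position; peaks suggest change points."""
    c = np.concatenate([[0.0], np.cumsum(y)])   # prefix sums for O(1) window means
    n = len(y)
    D = np.zeros(n)
    for i in range(h, n - h):
        left = (c[i] - c[i - h]) / h
        right = (c[i + h] - c[i]) / h
        D[i] = abs(right - left)
    return D

rng = np.random.default_rng(2)
y = 0.3 * rng.standard_normal(300)
y[120:180] += 2.0                  # a CNV-like elevated segment
D = local_diagnostic(y, h=15)      # peaks near positions 120 and 180
```

Because each statistic uses only a window of width 2h, the whole scan is linear in the sequence length, which is the computational benefit the abstract refers to.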

Session 44: Bayesian Methods and Applications in Clinical Trials with Small Populations

Applications of the Bayesian Meta-Analytic Approach at Novartis
Qiuling Ally He, Roland Fisch and David Ohlssen
Novartis Pharmaceuticals Corporation
allyhe@novartis.com
Conducting ethical, efficient and cost-effective clinical trials has always been challenged by the limited availability of study populations. Bayesian approaches have many appealing features for dealing with studies with small sample sizes, and their importance has been recognized by health authorities. Novartis has been actively developing and implementing Bayesian methods at different stages of clinical development, in both oncology and non-oncology settings. This presentation focuses on two applications of the Bayesian meta-analytic approach. Both applications explore relevant historical studies and establish meta-analyses to generate inferences that can be utilized by the concurrent studies. The first example synthesized historical control information in a proof-of-concept study; the second application extrapolated efficacy from a source to a target population for registration purposes. In both applications, Bayesian methods are shown to effectively reduce the sample size and duration of the studies, and consequently the resources invested.

Subgroup-Based Adaptive (SUBA) Designs for Multi-Arm Biomarker Trials
Yanxun Xu1, Lorenzo Trippa2, Peter Mueller1 and Yuan Ji3
1University of Texas at Austin
2Harvard University
3University of Texas at Austin
yxustat@gmail.com
Targeted therapies based on biomarker profiling are becoming a mainstream direction of cancer research and treatment. Depending on the expression of specific prognostic biomarkers, targeted therapies assign different cancer drugs to subgroups of patients, even if they are diagnosed with the same type of cancer by traditional means such as tumor location. For example, Herceptin is only indicated for the subgroup of patients with HER2+ breast cancer, but not for other types of breast cancer. However, subgroups like HER2+ breast cancer with effective targeted therapies are rare, and most cancer drugs are still being applied to large patient populations that include many patients who might not respond or benefit. Also, the response to targeted agents in humans is usually unpredictable. To address these issues, we propose SUBA, subgroup-based adaptive designs that simultaneously search for prognostic subgroups and allocate patients adaptively to the best subgroup-specific treatments throughout the course of the trial. The main features of SUBA include the continuous reclassification of patient subgroups based on a random partition model, and the adaptive allocation of patients to the best treatment arm based on posterior predictive probabilities. We compare the SUBA design with three alternative designs, including equal randomization, outcome-adaptive randomization and a design based on a probit regression. In simulation studies we find that SUBA compares favorably against the alternatives.

Innovative Designs and Practical Considerations for Pediatric Studies
Alan Y. Chiang
Eli Lilly and Company
chiangay@lilly.com
Despite representing a fundamental step toward the efficacious and safe utilization of drugs in children, the conduct of clinical trials in pediatric populations poses several problems. Methodological issues and ethical concerns represent the major obstacles that have traditionally limited research in small populations. The randomized controlled trial, the mainstay of clinical studies for assessing the effects of any therapeutic intervention, shows some weaknesses which make it scarcely applicable to small populations. Alternative and innovative approaches to clinical trial design in small populations have been developed in recent decades, with the aim of overcoming the limits related to small samples and to the acceptability of the trial. These features make such approaches particularly appealing for the pediatric population and for patients with rare diseases. This presentation aims to provide a variety of designs and analysis methods for assessing efficacy and safety in pediatric studies, including their applicability, advantages and disadvantages, and real case examples. Approaches include Bayesian designs, borrowing information from other studies, and other innovative approaches. Thanks to these features, such methods may rationally limit the amount of experimentation in small populations to what is achievable, necessary and ethical, and present a reliable way of ultimately improving patient care.

Session 45 Recent Developments in Assessing Predictive Models in Survival Analysis

partDSA for Deriving Survival Risk Groups, Ensemble Learning and Variable Selection
Annette Molinaro1, Adam Olshen1 and Robert Strawderman2
1University of California at San Francisco
2University of Rochester
molinaroa@neurosurg.ucsf.edu
We recently developed partDSA, a multivariate method that, similarly to CART, utilizes loss functions to select and partition predictor variables to build a tree-like regression model for a given outcome. However, unlike CART, partDSA permits both 'and' and 'or' conjunctions of predictors, elucidating interactions between variables as well as their independent contributions. partDSA thus permits tremendous flexibility in the construction of predictive models and has been shown to supersede CART in both prediction accuracy and stability. As the resulting models continue to take the form of a decision tree, partDSA also provides an ideal foundation for developing a clinician-friendly tool for accurate risk prediction and stratification.
With right-censored outcomes, partDSA currently builds estimators via either the Inverse Probability Censoring Weighted (IPCW) or Brier Score weighting schemes; see Lostritto, Strawderman and Molinaro (2012), where it is shown in numerous simulations that both proposed adaptations of partDSA perform as well as, and often considerably better than, two competing tree-based methods. In this talk, various useful extensions of partDSA for right-censored outcomes are described, and we show the power of the partDSA algorithm in deriving survival risk groups for glioma patients based on genomic markers. Another interesting extension of partDSA is as an aggregate learner. A comparison will be made of standard partDSA to an ensemble version of partDSA, as well as to alternative ensemble learners, in terms of prediction accuracy and variable selection.

Predictive Accuracy of Time-Dependent Markers for Survival Outcomes
Li Chen1, Donglin Zeng2 and Danyu Lin2
1University of Kentucky
2University of North Carolina at Chapel Hill
lichenuky@uky.edu
In clinical cohort studies, potentially censored times to a certain event, such as death or disease progression, and patient characteristics at the time of diagnosis or the time of inclusion in the study (baseline) are often recorded. Serial measurements on clinical markers during follow-up may also be recorded for monitoring purposes. Recently there has been increasing interest in incorporating these serial marker measurements into the prediction of future survival outcomes and in assessing the predictive accuracy of these time-dependent markers. In this paper we propose a new graphical measure, the negative predictive function, to quantify the predictive accuracy of time-dependent markers for survival outcomes. This new measure has direct relevance to patient survival probabilities and thus has direct clinical utility. We construct a nonparametric estimator for the proposed measure, allowing censoring to depend on markers, and adopt the bootstrap method to obtain the asymptotic variances. Simulation studies demonstrate that the proposed method performs well in practical situations. One medical study is presented.

Estimating the Effectiveness in HIV Prevention Trials by Incorporating the Exposure Process: Application to HPTN 035 Data
Jingyang Zhang1 and Elizabeth R. Brown2
1Fred Hutchinson Cancer Research Center
2Fred Hutchinson Cancer Research Center and University of Washington
jzhang2@fhcrc.org
Estimating the effectiveness of a new intervention is usually the primary objective of HIV prevention trials. The Cox proportional hazards model is mainly used to estimate effectiveness by assuming that participants share the same risk under the covariates and that the risk is always non-zero. In fact, the risk is only non-zero when an exposure event occurs, and participants can have varying risks of transmission due to varying patterns of exposure events. Therefore, we propose a novel estimate of effectiveness adjusted for the heterogeneity in the magnitude of exposure among the study population, using a latent Poisson process model for the exposure path of each participant. Moreover, our model considers the scenario in which a proportion of participants never experience an exposure event, and adopts a zero-inflated distribution for the rate of the exposure process. We employ a Bayesian estimation approach to estimate the exposure-adjusted effectiveness, eliciting the priors from historical information. Simulation studies are carried out to validate the approach and explore the properties of the estimates. An application example is presented from an HIV prevention trial.

Estimation of Predictive Accuracy of Survival Regression Models Adjusting for Dependent Censoring and/or High-Dimensional Data
Ming Wang1 and Qi Long2
1Penn State College of Medicine
2Emory University
mwang@phs.psu.edu
In practice, prediction models for cancer risk and prognosis play an important role in priority cancer research, and evaluating and comparing different models using predictive accuracy metrics in the presence of censored data, adjusting for the censoring mechanism, is of substantive interest. To address this issue, we evaluate via numerical studies two existing metrics, the concordance (c) statistic and the weighted c-statistic,


which adopts an inverse-probability weighting technique, under circumstances with a dependent censoring mechanism. The asymptotic properties of the weighted c-statistic, including consistency and normality, are theoretically and rigorously established. In particular, cases with high-dimensional prognostic factors (p moderately large) are investigated to assess strategies for estimating the censoring weights by utilizing a regularization approach with the lasso penalty. In addition, sensitivity analysis is theoretically and practically conducted to assess predictive accuracy in cases of informative censoring (i.e., not coarsened at random), using nonparametric estimates of the cumulative baseline hazard for the weights. Finally, a prostate cancer study is used to build and evaluate prediction models of future tumor recurrence after surgery.
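To make the weighted c-statistic concrete, here is a minimal sketch of an inverse-probability-of-censoring-weighted concordance estimate for right-censored data. The simple Kaplan-Meier estimator of the censoring distribution and the Uno-type pair weighting 1/G(Ti)^2 are illustrative assumptions, not the authors' code.

```python
# Sketch (assumption): IPCW c-statistic with a Kaplan-Meier censoring estimate.

def km_censoring_survival(times, events):
    """Kaplan-Meier estimate of the censoring survival G(t) = P(C > t);
    events[i] == 0 marks a censored observation."""
    n = len(times)
    order = sorted(range(n), key=lambda i: times[i])
    at_risk, surv = n, 1.0
    g = {}  # event time -> G just after that time
    for i in order:
        if events[i] == 0:          # a censoring "event"
            surv *= (at_risk - 1) / at_risk
        g[times[i]] = surv
        at_risk -= 1

    def G(t):
        s = 1.0
        for u in sorted(g):
            if u <= t:
                s = g[u]
            else:
                break
        return max(s, 1e-8)
    return G

def ipcw_c_statistic(times, events, scores):
    """Weighted c-statistic over pairs (i, j) with T_i < T_j and event_i = 1,
    each pair weighted by 1 / G(T_i)^2 (Uno-type weighting, an assumption)."""
    G = km_censoring_survival(times, events)
    num = den = 0.0
    for i in range(len(times)):
        if events[i] != 1:
            continue
        w = 1.0 / G(times[i]) ** 2
        for j in range(len(times)):
            if times[j] > times[i]:
                den += w
                num += w * (1.0 if scores[i] > scores[j]
                            else 0.5 if scores[i] == scores[j] else 0.0)
    return num / den if den > 0 else float("nan")
```

With no censoring the weights reduce to 1 and this is the usual c-statistic; a risk score perfectly ordered against survival time yields 1.0.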

Session 46 Missing Data: the Interface between Survey Sampling and Biostatistics

Likelihood-based Inference with Missing Data Under Missing-at-Random
Shu Yang and Jae Kwang Kim
Iowa State University
shuyang@iastate.edu

Likelihood-based inference with missing data is a challenging problem because the observed log-likelihood has an integral form. Approximating the integral by Monte Carlo sampling does not necessarily lead to valid inference, because the Monte Carlo samples are generated from a distribution with a fixed parameter value.
We consider an alternative approach based on the parametric fractional imputation of Kim (2011). In the proposed method, the dependency of the integral on the parameter is properly reflected through fractional weights. We discuss constructing a confidence interval using the profile likelihood ratio test; a Newton-Raphson algorithm is employed to find the interval end points. Two limited simulation studies show the advantage of likelihood-based inference over Wald-type inference in terms of power, parameter space conformity and computational efficiency. A real data example on salamander mating (McCullagh and Nelder, 1989) shows that our method also works well with high-dimensional missing data.

Generalized Method of Moments Estimator Based on Semiparametric Quantile Regression Imputation
Cindy Yu and Senniang Chen
Iowa State University
snchen@iastate.edu

In this article, we consider an imputation method to handle missing response values based on semiparametric quantile regression estimation. In the proposed method, the missing response values are generated using the estimated conditional quantile regression function at given values of the covariates. We adopt the generalized method of moments for estimation of parameters defined through a general estimating equation. We demonstrate that the proposed estimator, combining both semiparametric quantile regression imputation and the generalized method of moments, is an effective alternative for parameter estimation when missing data are present. The consistency and asymptotic normality of our estimators are established, and variance estimation is provided. Results from limited simulation studies are presented to show the adequacy of the proposed method.
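The core idea of quantile-based imputation can be sketched in a few lines: each missing response is replaced by values read off an estimated conditional quantile function over a grid of quantile levels. In the toy version below, nearest-neighbour empirical quantiles stand in for the semiparametric quantile regression fit, and the estimating equation is simply the mean; every name here is a hypothetical simplification of the abstract's method.

```python
# Sketch (assumption): impute missing y's from estimated conditional quantiles.

def empirical_quantile(sorted_vals, tau):
    """Left-continuous empirical quantile of an already-sorted list."""
    idx = min(int(tau * len(sorted_vals)), len(sorted_vals) - 1)
    return sorted_vals[idx]

def impute_mean(x, y, m=20, k=3):
    """Estimate E[Y] when y[i] is None for missing units. Each missing
    response is the average of m conditional-quantile imputations, with the
    conditional quantiles taken from the k observed units nearest in x."""
    obs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    total = 0.0
    for xi, yi in zip(x, y):
        if yi is not None:
            total += yi
        else:
            neigh = sorted(obs, key=lambda p: abs(p[0] - xi))[:k]
            vals = sorted(p[1] for p in neigh)
            taus = [(j + 0.5) / m for j in range(m)]  # grid of quantile levels
            total += sum(empirical_quantile(vals, t) for t in taus) / m
    return total / len(x)
```

Averaging over a grid of quantile levels, rather than drawing a single random imputation, mirrors the deterministic flavor of quantile-function imputation.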

A New Estimation with Minimum Trace of Asymptotic Covariance Matrix for Incomplete Longitudinal Data with a Surrogate Process
Baojiang Chen1 and Jing Qin2
1University of Nebraska
2National Institutes of Health
baojiangchen@unmc.edu
Missing data is a very common problem in medical and social studies, especially when data are collected longitudinally, and it is a challenging problem to utilize the observed data effectively; many papers on missing data problems can be found in the statistical literature. It is well known that inverse-weighted estimation is neither efficient nor robust, whereas the doubly robust (DR) method can improve both efficiency and robustness. DR estimation requires a missing data model (i.e., a model for the probability that data are observed) and a working regression model (i.e., a model for the outcome variable given covariates and surrogate variables). Since the DR estimating function has mean zero for any parameters in the working regression model when the missing data model is correctly specified, in this paper we derive a formula for the estimator of the parameters of the working regression model that yields the optimally efficient estimator of the marginal mean model (the parameters of interest) when the missing data model is correctly specified. Furthermore, the proposed method also inherits the doubly robust property. Simulation studies demonstrate the greater efficiency of the proposed method compared to the standard doubly robust method. A longitudinal dementia data set is used for illustration.

Adaptive Multi-Phase Sampling for Asymptotically-Optimal Mean Score Analyses
Michael McIsaac1 and Richard Cook2
1Queen's University
2University of Waterloo
mcisaacm@queensu.ca
Response-dependent two-phase designs can ensure good statistical efficiency while working within resource constraints. Sampling schemes that are optimized for analyses based on mean score estimating equations have been shown to be highly efficient in a number of different settings, and are straightforward to implement if detailed population characteristics are known.
I will present an adaptive multi-phase design which exploits information from an internal pilot study to approximate this optimal mean score design. These adaptive designs are easy to implement and result in large efficiency gains while keeping study costs low. The implementation of this design will be demonstrated using simulation studies motivated by an ongoing research program in rheumatology.

Session 47 New Statistical Methods for Comparative Effectiveness Research and Personalized Medicine

Efficient Design for Prospective Observational Studies
Yu Shen1, Hao Liu2, Jing Ning3 and Jing Qin4
1University of Texas MD Anderson Cancer Center
2Baylor College of Medicine
3University of Texas MD Anderson Cancer Center
4National Institutes of Health
yshen@mdanderson.org
Using data from large observational studies may fill information gaps due to the lack of evidence from randomized controlled trials. Such studies may inform real-world clinical scenarios and improve clinical decisions among various treatment strategies. However, the


design and analysis of comparative effectiveness studies based on observational data are complex. In this work, we propose practical sample size and power calculation tools for prevalent cohort designs, and suggest some efficient analysis methods as well.

Choice between Superiority and Non-inferiority in Comparative Effectiveness Experiments
Mei-Chiung Shih1, Tze Leung Lai2 and Philip W. Lavori2
1VA Cooperative Studies Program & Stanford University
2Stanford University
Mei-Chiung.Shih@va.gov
In designing a comparative effectiveness experiment, such as an active controlled clinical trial comparing a new treatment to an active control treatment, or a comparative effectiveness trial comparing treatments already in use, one sometimes has to choose between a superiority objective (to demonstrate that one treatment is more effective than the other active treatments) and a non-inferiority objective (to demonstrate that one treatment is no worse than the other active treatments within a pre-specified non-inferiority margin). It is often difficult to decide which study objective should be undertaken at the planning stage, when one does not have actual data on the comparative effectiveness of the treatments. In this talk we describe two adaptive design features for such trials: (1) adaptive choice between the superiority and non-inferiority objectives during interim analyses; (2) treatment selection instead of testing superiority. The latter aims to select treatments whose outcomes are close to that of the best treatment, by eliminating at interim analyses non-promising treatments that are unlikely to be much better than the observed best treatment.

An Adaptive Design Approach for Studying Dynamic Treatment Regimes in a Pragmatic Trials Setting
Mike Baiocchi, Jane Paik and Tze Lai
Stanford University
mikebaiocchi@gmail.com
The demand for rigorous studies of dynamic treatment regimens is increasing as medical providers treat larger numbers of patients with both multi-stage disease states and chronic care issues (for example, cancer treatments, pain management, depression, HIV). In this talk we will propose a trial design developed specifically to be run in a real-world clinical setting. These kinds of trials (sometimes called "pragmatic trials") have several advantages, which we will discuss. They also pose two major problems for analysis: (1) in running a randomized trial in a clinical setting, there is an ethical imperative to provide patients with the best outcomes while still collecting information on the relative efficacy of treatment regimes, which means traditional trial designs are inadequate in providing guidance; and (2) real-world considerations such as informative censoring or missing data become substantial hurdles. We incorporate elements from both point-of-care randomized trials and multi-armed bandit theory, and propose a unified method of trial design.

Improving Efficiency in the Estimation of the Effect of Treatment on a Multinomial Outcome
Iván Díaz, Michael Rosenblum and Elizabeth Colantuoni
Johns Hopkins University
idiaz@jhu.edu
We present a methodology to evaluate the causal effect of a binary treatment on a multinomial outcome when adjustment for covariates is desirable. Adjustment for baseline covariates may be desirable even in randomized trials, since covariates that are highly predictive of the outcome can substantially improve efficiency. We first present a targeted minimum loss based estimator of the vector of counterfactual probabilities. This estimator is doubly robust in observational studies, and it is consistent in randomized trials. Furthermore, it is locally semiparametric efficient under regularity conditions. We present a variation of this estimator that may be used in randomized trials and that is guaranteed to be asymptotically as efficient as the standard unadjusted estimator. We use these results to derive a nonparametric extension of the parameters in a proportional-odds model for ordinal-valued data, and present a targeted minimum loss based estimator. This estimator is guaranteed to be asymptotically as or more efficient than the unadjusted estimator of the proportional-odds model. As a consequence, this nonparametric extension may be used to test the null hypothesis of no effect with potentially increased power. We present a motivating example and simulations using data from the MISTIE II clinical trial of a new surgical intervention for stroke. Joint work with Michael Rosenblum and Elizabeth Colantuoni.

Session 48 Student Award Session 1

Regularization After Retention in Ultrahigh Dimensional Linear Regression Models
Haolei Weng1, Yang Feng1 and Xingye Qiao2
1Columbia University
2Binghamton University
hw2375@columbia.edu
The lasso has proved to be a computationally tractable variable selection approach in high dimensional data analysis. However, in the ultrahigh dimensional setting, the conditions for model selection consistency can easily fail. The independence screening framework tackles this problem by reducing the dimensionality based on marginal correlations before performing the lasso. In this paper, we propose a two-step approach that relaxes the consistency conditions of the lasso by using marginal information from a different perspective than independence screening: in particular, we retain significant variables rather than screening out irrelevant ones. The new method is shown to be model selection consistent in the ultrahigh dimensional linear regression model. A modified version is introduced to improve the finite sample performance. Simulations and real data analysis show advantages of our method over the lasso and independence screening.
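A toy version of the retention-then-regularization idea can be written as: keep the variables with the strongest marginal signal unpenalized, and apply the lasso penalty only to the rest. The coordinate-descent solver, the inner-product "marginal correlation" score (which assumes standardized columns), and all tuning choices below are illustrative assumptions, not the authors' algorithm.

```python
# Sketch (assumption): retain top-scoring variables, penalize only the others.

def soft(z, g):
    """Soft-thresholding operator used by the lasso coordinate update."""
    return (z - g) if z > g else (z + g) if z < -g else 0.0

def retention_lasso(X, y, n_retain=1, lam=0.1, iters=200):
    n, p = len(X), len(X[0])
    # Step 1: retain the n_retain variables with the largest |X_j' y|
    # (standing in for marginal correlations, assuming standardized columns).
    score = [abs(sum(X[i][j] * y[i] for i in range(n))) for j in range(p)]
    retained = set(sorted(range(p), key=lambda j: -score[j])[:n_retain])
    # Step 2: coordinate descent; retained coordinates get no penalty.
    beta = [0.0] * p
    for _ in range(iters):
        for j in range(p):
            r = [y[i] - sum(beta[k] * X[i][k] for k in range(p) if k != j)
                 for i in range(n)]                     # partial residual
            zj = sum(X[i][j] * r[i] for i in range(n)) / n
            sj = sum(X[i][j] ** 2 for i in range(n)) / n
            beta[j] = (zj if j in retained else soft(zj, lam)) / sj
    return beta, retained
```

On a design with one strong and one weak column, the strong variable is retained and estimated without shrinkage while the weak one is thresholded to zero.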

Personalized Dose Finding Using Outcome Weighted Learning
Guanhua Chen1, Donglin Zeng1 and Michael R. Kosorok1
1University of North Carolina at Chapel Hill
guanhuac@live.unc.edu
In dose-finding clinical trials, there is growing recognition of the importance of considering individual-level heterogeneity when searching for optimal treatment doses. Such an optimal individualized treatment rule (ITR) for dosing should maximize the expected clinical benefit. In this paper, we consider a randomized trial design where the candidate dose levels are continuous. To find the optimal ITR under such a design, we propose an outcome weighted learning method which directly maximizes the expected clinically beneficial outcome. This method converts the individualized dose selection problem into a penalized weighted regression with a truncated L1 loss. A difference of convex functions (DC) algorithm is adopted to efficiently solve the associated non-convex optimization problem. The consistency and convergence rate of the estimated ITR are derived, and small-sample performance is evaluated via simulation studies. We demonstrate that the proposed method outperforms competing approaches. We illustrate the method using data from a clinical trial for warfarin (an anti-thrombotic drug) dosing.


Survival Rates Prediction When Training Data and Target Data Have Different Measurement Error
Cheng Zheng and Yingye Zheng
Fred Hutchinson Cancer Research Center
zhengc68@uw.edu
Novel biologic markers have been widely used in predicting important clinical outcomes. One specific feature of biomarkers is that they are often ascertained with variation due to the specific measurement process. The magnitude of such variation may differ when the prediction algorithm (cutoffs) is applied to a different target population, or when the platform for biomarker assaying changes from the original platform the algorithm was based upon. Statistical methods have been proposed to characterize the effects of the underlying error-free quantity in association with an outcome, yet the impact of measurement errors on prediction has not been well studied. We focus in this manuscript on settings in which biomarkers are used for predicting an individual's future risk, and propose semiparametric estimators for the error-corrected risk when replicates of the error-prone biomarkers are available. The predictive performance of the proposed estimators is evaluated and compared to alternative approaches in numerical studies, under various assumptions on the measurement distributions in the original cohort and in the future cohort to which the predictive rule is applied. We also study the asymptotic properties of the proposed estimator. Application is made to a liver cancer biomarker study to predict the risk of 3- and 4-year liver cancer incidence using age and a novel biomarker, α-fetoprotein.

Hard Thresholded Regression Via Linear Programming
Qiang Sun
University of North Carolina at Chapel Hill
qsun@live.unc.edu
The aim of this paper is to develop a hard thresholded regression (HTR) framework for simultaneous variable selection and unbiased estimation in high dimensional linear regression. This new framework is motivated by its close connection with best subset selection under orthogonal design, while enjoying several key computational and theoretical advantages over many existing penalization methods (e.g., SCAD or MCP). Computationally, HTR is a fast two-step estimation procedure, consisting of a first step that calculates a coarse initial estimator and a second step that solves a linear program. Theoretically, under some mild conditions, the HTR estimator is shown to enjoy the strong oracle property and the thresholded property even when the number of covariates may grow at an exponential rate. We also propose to incorporate a regularized covariance estimator into the estimation procedure in order to better trade off noise accumulation against correlation modeling. Under this scenario with a regularized covariance matrix, HTR includes Sure Independence Screening as a special case. Both simulation and real data results show that HTR outperforms other state-of-the-art methods.

Session 49 Network Analysis/Unsupervised Methods

Community Detection in Multilayer Networks: A Hypothesis Testing Approach
James D. Wilson, Shankar Bhamidi and Andrew B. Nobel
University of North Carolina at Chapel Hill
jameswd@email.unc.edu
The identification of clusters in relational data, otherwise known as community detection, is an important and well-studied problem in undirected and directed networks. Importantly, the units of a complex system often share multiple types of pairwise relationships, and a single community detection analysis does not account for the distinct types, or layers. In this scenario, a sequence of networks can be used to model each type of relationship, resulting in a multilayer network structure. We propose and investigate a novel testing-based community detection procedure for multilayer networks. We show that by borrowing strength across layers, our method is able to detect communities in scenarios that are impossible for contemporary detection methods. By investigating the performance and potential use of our method through simulations and application to real multilayer networks, we show that our procedure can successfully identify significant community structure in the multilayer regime.

Network Enrichment Analysis with Incomplete Network Information
Jing Ma1, Ali Shojaie2 and George Michailidis1
1University of Michigan
2University of Washington
mjing@umich.edu
Pathway enrichment analysis has become a key tool for biomedical researchers to gain insight into the underlying biology of differentially expressed genes, proteins and metabolites. It reduces complexity and provides a systems-level view of changes in cellular activity in response to treatments and/or progression of disease states. Methods that use pathway topology information have been shown to outperform simpler methods based on over-representation analysis. However, despite significant progress in understanding the associations among members of biological pathways and the expansion of knowledge data bases such as the Kyoto Encyclopedia of Genes and Genomes, Reactome, BioCarta, etc., the existing network information may be incomplete or inaccurate and is not condition-specific. We propose a constrained network estimation framework that combines network estimation based on cell- and condition-specific omics data with interaction information from existing data bases. The resulting pathway topology information is subsequently used to provide a framework for simultaneous testing of differences in mean expression levels as well as in interaction mechanisms. We study the asymptotic properties of the proposed network estimator and the test for pathway enrichment, and investigate its small sample performance in simulated experiments and on a bladder cancer study involving metabolomics data.

Estimation of a Linear Model with Fuzzy Data Treated as Special Functional Data
Wang Dabuxilatu
Guangzhou University
wangdabu@gzhu.edu.cn
Data which cannot be exactly described by means of numerical values, such as evaluations, medical diagnoses, quality ratings and vague economic items, to name but a few, are frequently classified as either nominal or ordinal. However, we should be aware that with such representations of data (e.g., categories labeled with numerical values), the statistical analysis is limited, and sometimes the interpretation and reliability of the conclusions are affected. An easy-to-use representation of such data through fuzzy values (fuzzy data) can be employed. The measurement scale of fuzzy values includes, in particular, real vectors and set values as special elements. It is more expressive than ordinal scales and more accurate than rounding or using real- or vector-valued codes. The transition between closely different values can be made gradually, and the variability, accuracy and possible subjectiveness can be well reflected in describing data. Fuzzy data can be viewed as special


functional data via the so-called support function of the data, as it establishes a useful embedding of the space of fuzzy data into a cone of a functional Hilbert space.
Simple linear regression models with fuzzy data have been studied from different perspectives and in different frameworks. Least squares estimation of real-valued and set-valued parameters under the generalized Hausdorff metric and the Hukuhara difference has been obtained. However, due to the nonlinearity of the space of fuzzy random sets, it is difficult to consider parameter estimation for a multivariate linear model with fuzzy random sets. We treat the fuzzy data as special functional data to estimate a multivariate linear model within a cone of a functional Hilbert space. As a case, we consider LR fuzzy random sets (LR fuzzy values or LR fuzzy data), which are a sort of fuzzy data used to model usual random experiments when the characteristic observed on each result can be described with fuzzy numbers of a particular class determined by three random variables (the center, the left spread and the right spread) under the given shape functions L and R. LR fuzzy random sets are widely applied in information science, decision making, operational research, and economic and financial modeling. Using a least squares approach, we obtain an estimation of the set-valued parameters of the multivariate regression model with LR fuzzy random sets under the L2 metric δ2dLS; some bootstrap distributions for the spread variables of the fuzzy random residual term are given.

Efficient Estimation of Sparse Directed Acyclic Graphs Under Compounded Poisson Data
Sung Won Han and Hua Zhong
New York University
sungwonhan2@gmail.com
Certain gene expression measurements, such as RNA-sequence counts, are recorded as count data, which can be assumed to follow a compounded Poisson distribution. This presentation proposes an efficient heuristic algorithm to estimate the structure of directed acyclic graphs under the L1-penalized likelihood with Poisson log-normal distributed data, given that the variable ordering is unknown. To obtain a closed form of the penalized likelihood, we apply a Laplace integral approximation for the unobserved normal variables, and we use two iterative optimization steps to estimate the adjacency matrix and the unobserved parameters. The adjacency matrix is estimated via separable lasso problems, and the unobserved parameters of the normal distribution are estimated via separable optimization problems. Simulation results show that our proposed method performs better than the data transformation method in terms of true positive rate and the Matthews correlation coefficient, except for low-count data with many zeros. Large data variance and a large number of variables benefit the proposed method further.

Asymptotically Normal and Efficient Estimation of Covariate-Adjusted Gaussian Graphical Models
Mengjie Chen, Zhao Ren, Hongyu Zhao and Harrison Zhou
Yale University
zhaoren@yale.edu
A tuning-free procedure is proposed to estimate the covariate-adjusted Gaussian graphical model. For each finite subgraph, this estimator is asymptotically normal and efficient; as a consequence, a confidence interval can be obtained for each edge. The procedure enjoys easy implementation and efficient computation through parallel estimation on subgraphs or edges. We further apply the asymptotic normality result to perform support recovery through edge-wise adaptive thresholding. This support recovery procedure is called ANTAC, standing for Asymptotically Normal estimation with Thresholding after Adjusting Covariates. ANTAC outperforms other methodologies in the literature in a range of simulation studies. We apply ANTAC to identify gene-gene interactions using a yeast eQTL (genome-wide expression quantitative trait loci) dataset. Our result achieves better interpretability and accuracy in comparison with CAPME (covariate-adjusted precision matrix estimation), the method proposed by Cai, Li, Liu and Xie (2013). This is joint work with Mengjie Chen, Hongyu Zhao and Harrison Zhou.

Session 50 Personalized Medicine and Adaptive Design

MicroRNA Array Normalization
Li-Xuan and Qin Zhou
Memorial Sloan Kettering Cancer Center
qinl@mskcc.org
MicroRNA microarrays possess a number of unique data features that challenge the assumptions key to many normalization methods. We assessed the performance of existing normalization methods using two Agilent microRNA array datasets derived from the same set of tumor samples: one dataset was generated using a blocked randomization design when assigning arrays to samples, and hence was free of confounding array effects; the second dataset was generated without blocking or randomization and exhibited array effects. The randomized dataset was assessed for differential expression between two tumor groups and treated as the benchmark. The non-randomized dataset was assessed for differential expression after normalization and compared against the benchmark. Normalization improved the true positive rate significantly, but a false discovery rate as high as 50% remained in the non-randomized data regardless of the specific normalization method applied. We performed simulation studies under various scenarios of differential expression patterns to assess the generalizability of our empirical observations.
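For readers unfamiliar with array normalization, quantile normalization is one widely used method of the kind the abstract evaluates (whether it was among the specific methods assessed is not stated). A minimal sketch, ignoring ties and missing probes:

```python
# Sketch (assumption): quantile normalization of several arrays, each given
# as a list of probe intensities.

def quantile_normalize(arrays):
    """Force every array onto the same empirical distribution, namely the
    mean of the sorted arrays; each value is replaced by the reference value
    at its within-array rank."""
    n = len(arrays[0])
    sorted_cols = [sorted(a) for a in arrays]
    ref = [sum(col[i] for col in sorted_cols) / len(arrays) for i in range(n)]
    out = []
    for a in arrays:
        rank = sorted(range(n), key=lambda i: a[i])  # indices in ascending order
        b = [0.0] * n
        for r, i in enumerate(rank):
            b[i] = ref[r]
        out.append(b)
    return out
```

After normalization, every array has identical marginal distributions, which removes array effects but, as the abstract's benchmark comparison illustrates, cannot undo confounding introduced by a non-randomized design.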

Combining Multiple Biomarker Models with Covariates in Logistic Regression Using a Modified ARM (Adaptive Regression by Mixing) Approach
Yanping Qiu1 and Rong Liu2
1Merck & Co.
2Bayer HealthCare
rongliufl@gmail.com
Biomarkers are widely used as indicators of some biological state or condition in medical research. A single biomarker may not be sufficient to serve as an optimal screening device for early detection or prognosis for many diseases; a combination of multiple biomarkers will usually lead to more sensitive screening rules. Therefore, there has been great interest in developing methods for combining biomarkers, and a biomarker selection procedure is necessary for efficient detection. In this article, we propose a model-combining algorithm for classification with necessary covariates in biomarker studies. It selects the best models under some criterion and considers weighted combinations of various logistic regression models via ARM (adaptive regression by mixing). The weights and the algorithm are justified using cross-validation methods. Simulation studies are performed to assess the finite-sample properties of the proposed model-combining method. It is illustrated with an application to data from a vaccine study.

A New Association Test for Case-Control GWAS Based on Disease Allele Selection
Zhongxue Chen


Indiana University
zc3@indiana.edu
Current robust association tests for case-control genome-wide association study (GWAS) data are mainly based on the assumption of some specific genetic models. Due to the richness of the genetic models, this assumption may not be appropriate; therefore, robust but powerful association approaches are desirable. Here we propose a new approach to testing for the association between genotype and phenotype for case-control GWAS. This method assumes a generalized genetic model and is based on the selected disease allele to obtain a p-value from the more powerful one-sided test. Through a comprehensive simulation study, we assess the performance of the new test by comparing it with existing methods. Some real data applications are used to illustrate the use of the proposed test. Based on the simulation results and real data applications, the proposed test is powerful and robust.

On Classification Methods for Personalized Medicine and Individualized Treatment Rules
Daniel Rubin
United States Food and Drug Administration
DanielRubin@fda.hhs.gov
An important problem in personalized medicine is to construct individualized treatment rules from clinical trials. Instead of recommending a single treatment for all patients, such a rule tailors treatments based on patient characteristics in order to optimize response to therapy. In a 2012 JASA article, Zhao et al. showed a connection between this problem of constructing an individualized treatment rule and binary classification. For instance, in a two-arm clinical trial with binary outcomes and 1:1 randomization, the problem of constructing an individualized treatment rule can be reduced to the classification problem in which one restricts to responders and builds a classifier that predicts subjects' treatment assignments. We extend this method to show an analogous reduction to the problem in which one restricts to non-responders and must build a classifier that predicts which treatments subjects were not assigned. We then use results from statistical efficiency theory to show how to efficiently combine the information from responders and non-responders. Simulations show the benefits of the new methodology.

Bayesian Adaptive Design for Dose-Finding Studies with Delayed Binary Responses
Xiaobi Huang1 and Haoda Fu2

1Merck & Co; 2Eli Lilly and Company
xiaobihuang@merck.com
Bayesian adaptive design is a popular concept in recent dose-finding studies. The idea of adaptive design is to use accrued data to make adaptations or modifications to an ongoing trial to improve its efficiency. At the interim analysis, most current methods only use data from patients who have completed the study. However, in certain therapeutic areas such as diabetes and obesity, subjects are usually studied for months to observe a treatment effect, so a large proportion of them have not completed the study at the interim analysis. This can lead to extensive information loss if we only incorporate subjects who completed the study at the interim analysis. Fu and Manner (2010) proposed a Bayesian integrated two-component prediction model to incorporate subjects who have not yet completed the study at the time of interim analysis. This method showed efficiency gains with continuous delayed responses. In this paper we extend this method to accommodate delayed binary responses and illustrate the Bayesian adaptive design through a simulation example.

Session 51 New Development in Functional Data Analysis

Variable Selection and Estimation for Longitudinal Survey Data
Li Wang1, Suojin Wang2 and Guannan Wang1

1University of Georgia; 2Texas A&M University
guannan@uga.edu
There is wide interest in studying longitudinal surveys, where sample subjects are observed successively over time. Longitudinal surveys are used in many areas today, for example in the health and social sciences, to explore relationships or to identify significant variables in regression settings. This paper develops a general strategy for the model selection problem in longitudinal sample surveys. A survey-weighted penalized estimating equation approach is proposed to select significant variables and estimate the coefficients simultaneously. The proposed estimators are design consistent and perform as well as the oracle procedure, as if the correct submodel were known. The estimating function bootstrap is applied to obtain the standard errors of the estimated parameters with good accuracy. A fast and efficient variable selection algorithm is developed to identify significant variables for complex longitudinal survey data. Simulated examples illustrate the usefulness of the proposed methodology under various model settings and sampling designs.

Estimation of Nonlinear Differential Equation Model Using Generalized Smoothing
Inna Chervoneva1, Tatiyana V. Apanasovich2 and Boris Freydin1

1Thomas Jefferson University; 2George Washington University
apanasovich@gwu.edu
In this work we develop an ordinary differential equations (ODE) model of physiological regulation of glycemia in type 1 diabetes mellitus (T1DM) patients in response to meals and intravenous insulin infusion. Unlike the majority of existing mathematical models of glucose-insulin dynamics, the parameters in our model are estimable from a relatively small number of noisy observations of plasma glucose and insulin concentrations. For estimation we adopt the generalized smoothing estimation of nonlinear dynamic systems of Ramsay et al. (2007). In this framework, the ODE solution is approximated with a penalized spline, where the ODE model is incorporated in the penalty. We propose to optimize the generalized smoothing by using penalty weights that minimize the covariance penalties criterion (Efron, 2004). The covariance penalties criterion provides an estimate of the prediction error for nonlinear estimation rules resulting from nonlinear and/or non-homogeneous ODE models, such as our model of glucose-insulin dynamics. We also propose to select the optimal number and location of knots for the B-spline bases used to represent the ODE solution. The results of a small simulation study demonstrate the advantages of optimized generalized smoothing in terms of smaller estimation errors for ODE parameters and smaller prediction errors for solutions of differential equations. Using the proposed approach to analyze the glucose and insulin concentration data in T1DM patients, we obtained a good approximation of global glucose-insulin dynamics and physiologically meaningful parameter estimates.

A Functional Data Approach to Modeling Brain Image Data
Yihong Zhao1, R. Todd Ogden2 and Huaihou Chen1

1New York University; 2Columbia University
zhaoy05@nyumc.org


Resting-state functional magnetic resonance imaging (fMRI) is sensitive to functional brain changes related to many psychiatric disorders and thus becomes increasingly important in medical research. One useful approach for fitting linear models with scalar outcomes and image predictors involves transforming the functional data to the wavelet domain and converting the data fitting problem to a variable selection problem. Applying the LASSO procedure in this situation has been shown to be efficient and powerful. In this study we explore possible directions for improvements to this method. The finite sample performance of the proposed methods will be compared through simulations and real data applications in mental health research. We believe applying these procedures can lead to improved estimation and prediction as well as better stability. An illustration of modeling psychiatric traits based on brain-imaging data will be presented.

Estimation in Functional Linear Quantile Regression
Linglong Kong, Dengdeng Yu and Ivan Mizera
University of Alberta
lkong@ualberta.ca
We consider estimation in functional linear quantile regression, in which the dependent variable is scalar while the covariate is a function, and the conditional quantile for each fixed quantile index is modeled as a linear functional of the covariate. There are two common approaches for modeling the conditional mean as a linear functional of the covariate. One is to use the functional principal components of the covariates as a basis to represent the functional covariate effect; the other is to extend partial least squares to model the functional effect. The former is an unsupervised method and has been generalized to functional linear quantile regression; the latter is a supervised method and is superior to the unsupervised PCA method. In this talk we propose to use partial quantile regression to estimate the functional effect in functional linear quantile regression. Asymptotic properties have been studied and show the virtue of our method in large samples. A simulation study is conducted to compare it with existing methods. A real data example from a stroke study is analyzed and some interesting findings are discovered.

Session 52 Recent Regulatory/Industry Experience in Biosimilar Trial Designs

Statistical Considerations for the Development of Biosimilar Products
Nan Zhang and Eric Chi
Amgen Inc
chi@amgen.com
As the patents of a growing number of biologic medicines have already expired or are due to expire, there has been an increased interest from both the biopharmaceutical industry and the regulatory agencies in the development and approval of biosimilars. EMA released the first general guideline on similar biological medicinal products in 2005 and specific guidelines for different drug classes subsequently; FDA issued three draft guidelines on biosimilar product development in 2012. A synthesized message from these guidance documents is that, due to the fundamental differences between small molecule drug products and biologic drug products, which are made of living cells, the generic versions of biologic drug products are viewed as similar, instead of identical, to the innovative biologic drug product. Thus more stringent requirements are necessary to demonstrate that there are no clinically meaningful differences between the biosimilar product and the reference product in terms of safety, purity, and potency. In this article we briefly review statistical issues and challenges in the clinical development of biosimilars, including criteria for biosimilarity and interchangeability, selection of endpoints and determination of equivalence margins, equivalence vs. non-inferiority, bridging and regional effect, and how to quantify the totality of the evidence.

New Analytical Methods for Non-Inferiority Trials: Covariate Adjustment and Sensitivity Analysis
Zhiwei Zhang, Lei Nie, Guoxing Soon and Bo Zhang
United States Food and Drug Administration
zhiweizhang@fda.hhs.gov
Even though an active-controlled trial provides no information about placebo, investigators and regulators often wonder how the experimental treatment would compare to placebo should a placebo arm be included in the study. Such an indirect comparison often requires a constancy assumption, namely that the control effect relative to placebo is constant across studies. When the constancy assumption is in doubt, there are ad hoc methods that "discount" the historical data in conservative ways. Recently, a covariate adjustment approach was proposed that does not require constancy or involve discounting, but rather attempts to adjust for any imbalances in covariates between the current and historical studies. This covariate-adjusted approach is valid under a conditional constancy assumption, which requires only that the control effect be constant within each subpopulation characterized by the observed covariates. Furthermore, a sensitivity analysis approach has been developed to address possible departures from the conditional constancy assumption due to imbalances in unmeasured covariates. This presentation describes these new approaches and illustrates them with examples.

Where is the Right Balance for Designing an Efficient Biosimilar Clinical Program? A Biostatistical Perspective on Appropriate Applications of Statistical Principles from New Drugs to Biosimilars
Yulan Li
Novartis Pharmaceuticals Corporation
yulanli@novartis.com

Challenges of Designing/Analyzing Trials for Hepatitis C Drugs
Greg Soon
United States Food and Drug Administration
GuoxingSoon@fda.hhs.gov
There has been a surge in drug development to treat hepatitis C virus (HCV) infection in the past 3-4 years, and the landscape has shifted significantly. In particular, response rates have steadily increased from approximately 50% to now 90% for HCV genotype 1 patients during this time. While the changing landscape is beneficial for patients, it does lead to some new challenges for future HCV drug development, particularly in the choice of control, success criteria for efficacy, and the co-development of several drugs. In this talk I will summarize the current landscape of HCV drug development and describe some ongoing issues of interest.

GSK's Patient-level Data Sharing Program
Shuyen Ho
GlaxoSmithKline plc
shu-yenyho@gsk.com
In May 2013 GSK launched an online system which enables researchers to request access to the anonymized patient-level data from published GSK-sponsored clinical trials of authorized or terminated medicines, Phase I-IV. Consistent with expectations of good scientific practice, researchers can request access and are required to provide a scientific protocol with a commitment to publish their findings. An Independent Review Panel is responsible for approving or denying access to the data after reviewing a researcher's proposal. Once the request is approved and a signed Data Sharing Agreement is received, access to the requested data is provided on a password-protected website to help protect research participants' privacy. This program is a step toward the ultimate aim of the clinical research community of developing a broader system where researchers will be able to access data from clinical trials conducted by different sponsors. This talk will describe some of the details of GSK's data-sharing program, including the opportunities and challenges it presents. We hope to raise the awareness of ICSA/KISS symposium participants of this program and encourage researchers to take full advantage of it to further clinical research.

Session 53 Gatekeeping Procedures and Their Application in Pivotal Clinical Trials

A General Multistage Procedure for k-out-of-n Gatekeeping
Dong Xi1 and Ajit Tamhane2
1Novartis Pharmaceuticals Corporation; 2Northwestern University
dongxi@novartis.com
We generalize a multistage procedure for parallel gatekeeping to what we refer to as k-out-of-n gatekeeping, in which at least k out of n hypotheses in a gatekeeper family must be rejected in order to test the hypotheses in the following family. This gatekeeping restriction arises in certain types of clinical trials; for example, in rheumatoid arthritis trials it is required that efficacy be shown on at least three of the four primary endpoints. We provide a unified theory of multistage procedures for arbitrary k, with k = 1 corresponding to parallel gatekeeping and k = n to serial gatekeeping. The proposed procedure, which uses a stepwise algorithm, is simpler to apply for this particular problem than the mixture procedure and the graphical procedure with memory using entangled graphs.
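The k-out-of-n gatekeeping restriction itself can be illustrated with a deliberately simplified single-stage sketch (Bonferroni within each family; the abstract's actual multistage procedure propagates unused alpha between stages in a more refined way, so this is an assumption-laden toy version, not the authors' method):

```python
def k_out_of_n_gatekeeper(p_family1, p_family2, k, alpha=0.05):
    """Simplified k-out-of-n gatekeeping illustration: family 2 is
    tested only if at least k of the n family-1 hypotheses are
    rejected. Bonferroni is used within each family for simplicity;
    the actual multistage procedure carries forward unused alpha."""
    n = len(p_family1)
    rej1 = [p <= alpha / n for p in p_family1]
    if sum(rej1) < k:                       # gate not passed
        return rej1, [False] * len(p_family2)
    m = len(p_family2)
    rej2 = [p <= alpha / m for p in p_family2]
    return rej1, rej2

# Three of four primary endpoints reject at 0.05/4 = 0.0125, so the
# gate opens and the secondary family is tested at 0.05/2 each.
r1, r2 = k_out_of_n_gatekeeper([0.001, 0.010, 0.200, 0.004],
                               [0.020, 0.030], k=3)
```

The rheumatoid arthritis example in the abstract corresponds to k = 3, n = 4 in this notation.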

Multiple Comparisons in Complex Trial Designs
H.M. James Hung
United States Food and Drug Administration
hsienminghung@fda.hhs.gov
As the costs of clinical trials increase greatly, in addition to other considerations, a clinical development program increasingly involves more than one trial for assessing the treatment effect of a test drug, particularly on adverse clinical outcomes. A number of complex trial designs have been encountered in regulatory applications. In one scenario, the primary efficacy endpoint requires two positive trials to conclude a treatment effect, while the key secondary endpoint is a major adverse clinical endpoint, such as mortality, that needs to rely on integration of multiple trials in order to have sufficient statistical power to show the treatment effect. This presentation stipulates the potential utility of such a trial design and the challenging multiplicity issues with statistical inference.

Use of Bootstrapping in Adaptive Designs with Multiplicity Issues
Jeff Maca
Quintiles
jeffmaca@quintiles.com
When designing a clinical study, there are often many parameters which are either unknown or not known with the precision necessary to have confidence in the overall design. This has led sponsors to want to design studies which are adaptive in nature and can adjust for these design parameters by using data from the study to estimate them. As there are many different design parameters, which depend on the type of study, many different types of adaptive designs have been proposed. It is also possible that one of the issues in the design of the study is the optimal multiplicity strategy, which could be based on assumptions about the correlation of the multiple endpoints, something that is often very difficult to know prior to the study start. The proposed methodology would use the data to estimate these parameters and correct for any inaccuracies in the assumptions.

Evaluating Commonly Used Multiple Testing Procedures in Drug Development
Michael Lee
Janssen Research & Development
mlee60@its.jnj.com
Multiplicity issues arise frequently in clinical trials with multiple endpoints and/or multiple doses. In drug development, because of regulatory requirements, control of the family-wise error rate (FWER) is essential in pivotal trials. Numerous multiple testing procedures that control the FWER in the strong sense are available in the literature. Particularly in the last decade, efficient testing procedures such as fallback procedures, gatekeeping procedures, and the graphical approach were proposed. Depending on the objectives of a study, one of these testing procedures can outperform the others. To understand which testing procedure is preferable under certain circumstances, we use a simulation approach to evaluate the performance of a few commonly used multiple testing procedures. Evaluation results and recommendations will be presented.

Session 54 Approaches to Assessing Qualitative Interactions

Interval Based Graphical Approach to Assessing Qualitative Interaction
Guohua Pan and Eun Young Suh
Johnson & Johnson
esuh@its.jnj.com
In clinical studies comparing treatments, the population often consists of subgroups of patients with different characteristics, and investigators often wish to know whether treatment effects are homogeneous over the various subgroups. Qualitative interaction occurs when the direction of the treatment effect varies among subgroups. In the presence of a qualitative interaction, treatment recommendation is often challenging. In medical research and in applications to health authorities for approval of new drugs, qualitative interaction and its impact need to be carefully evaluated. The initial statistical method for assessing qualitative interaction was developed by Gail and Simon (GS) in 1985 and has been incorporated into commercial statistical software such as SAS. While relatively often used, the GS method and its interpretation are not easily understood by medical researchers, and alternative approaches have been researched since then. One of the promising methods utilizes a graphical representation of specially devised intervals for the treatment effects in the subgroups. If some of the intervals are to the left and others to the right of a vertical line representing no treatment difference, there is statistical evidence of a qualitative interaction, and otherwise not. This feature, similar to the familiar forest plots by subgroups, is naturally appealing to clinical scientists for examining and understanding qualitative interactions. These specially devised intervals are shorter than simultaneous confidence intervals for treatment effects in the subgroups and are shown to rival the GS method in statistical power. The method is easy to use and additionally provides an explicit power function, which the GS method lacks. This talk will review and contrast statistical methods for assessing qualitative interaction, with an emphasis on the above described graphical approach. Data from mega clinical trials on cardiovascular diseases will be analyzed to illustrate and compare the methods.
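The Gail-Simon likelihood-ratio test that serves as the reference method here admits a compact implementation; this sketch computes the statistic Q = min(Q+, Q-) and its chi-bar-squared p-value with binomial weights following Gail and Simon (1985), and is illustrative rather than the talk's own software:

```python
import numpy as np
from math import comb
from scipy.stats import chi2

def gail_simon(d, se):
    """Gail-Simon likelihood-ratio test for qualitative interaction.
    d:  subgroup treatment-effect estimates (effects exactly 0 count
        toward neither direction)
    se: their standard errors
    Returns (Q, p), where Q = min(Q+, Q-) and the p-value uses the
    chi-bar-squared mixture with binomial weights."""
    d, se = np.asarray(d, float), np.asarray(se, float)
    z2 = (d / se) ** 2
    q = min(z2[d > 0].sum(), z2[d < 0].sum())
    g = len(d)
    p = sum(comb(g - 1, i) * chi2.sf(q, i) for i in range(1, g)) / 2 ** (g - 1)
    return q, p

# Two subgroups with effects of opposite sign, each two SEs from zero:
q, p = gail_simon([1.0, -1.0], [0.5, 0.5])
```

For two subgroups the p-value reduces to 0.5 times the upper tail of a one-degree-of-freedom chi-square, which reproduces the familiar GS critical value of 2.71 at the 0.05 level.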

Expected Variation and Quantitative and Qualitative Interaction in Clinical Trials
Xiaolong Luo
Celgene Corporation
xluo@celgene.com

Post hoc findings of unexpected heterogeneous treatment effects have been a challenge in the interpretation of clinical trials for sponsors, regulatory agencies, and medical practitioners. They are possible simply due to chance or due to fundamental treatment effect differentiation. Without repeating the resource-intensive clinical trials, it is critical to examine the framework of the given studies and to explore the likely model that may explain the overly simplified analyses. In this talk we will describe both theory and real clinical trials that can shed light on this complex and challenging issue.

A Bayesian Approach to Qualitative Interaction
Emine O. Bayman
University of Iowa
emine-bayman@uiowa.edu

Differences in treatment effects between centers in a multi-center trial may be important. These differences represent treatment-by-subgroup interaction. Qualitative interaction occurs when the simple treatment effect in one subgroup has a different sign than in another subgroup [1]; this interaction is important. Quantitative interaction occurs when the treatment effects are of the same sign in all subgroups and is often not important, because the treatment recommendation is identical in all subgroups.
A hierarchical model is used with exchangeable mean responses to each treatment between subgroups. A Bayesian test of qualitative interaction is developed [2] by calculating the posterior probability of qualitative interaction and the corresponding Bayes factor. The model is motivated by two multi-center trials with binary responses [3]. The frequentist power and type I error of the test using the Bayes factor are examined and compared with two other commonly used frequentist tests, the Gail and Simon [4] and Piantadosi and Gail [5] tests. The impact of imbalance between the sample sizes in each subgroup on power is examined under different scenarios. The method is implemented using WinBUGS and R and the R2WinBUGS interface.
References:
1. Peto R. Statistical aspects of cancer trials. In: Treatment of Cancer (Halnan KE, ed.). London: Chapman & Hall, 1982, pp. 867-871.
2. Bayman EO, Chaloner K, Cowles MK. Detecting qualitative interaction: a Bayesian approach. Statistics in Medicine 2010; 29: 455-63.
3. Todd MM, Hindman BJ, Clarke WR, Torner JC. Intraoperative Hypothermia for Aneurysm Surgery Trial: Mild intraoperative hypothermia during surgery for intracranial aneurysm. New England Journal of Medicine 2005; 352: 135-45.
4. Gail M, Simon R. Testing for qualitative interactions between treatment effects and patient subsets. Biometrics 1985; 41: 361-372.
5. Piantadosi S, Gail MH. A comparison of the power of two tests for qualitative interactions. Statistics in Medicine 1993; 12: 1239-48.

Session 55 Interim Decision-Making in Phase II Trials

Evaluation of Interim Dose Selection Methods Using ROC Approach
Deli Wang, Lu Cui, Lanju Zhang and Bo Yang
AbbVie Inc
deliwang@abbvie.com

Interim analyses may be planned to drop inefficacious dose(s) in dose-ranging clinical trials. Commonly used statistical methods for interim decision-making include conditional power (CP), predicted confidence interval (PCI), and predictive power (PP) approaches. For these widely used methods, it is worthwhile to take a closer look at their performance characteristics and their interconnected relationship. This research investigates the performance of these three statistical methods in terms of decision quality, based on a receiver operating characteristic (ROC) method, in binary endpoint settings. More precisely, the performance of each method is studied based on calculated sensitivity and specificity under assumed ranges of desirable as well as undesirable outcomes. The preferred cutoff is determined, and performance comparisons across the different methods can be made. With an apparent exchangeability of the three methods, a simple and uniform approach becomes possible.
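Of the three interim methods compared, conditional power is the most direct to sketch for a binary endpoint. A Monte Carlo version is shown below (an illustrative stand-in: the abstract's ROC evaluation layers sensitivity and specificity calculations on top of quantities like this, and the interim counts used here are made up):

```python
import numpy as np
from scipy.stats import norm

def conditional_power(x_t, n_t, x_c, n_c, N_t, N_c, p_t, p_c,
                      alpha=0.025, nsim=20000, seed=1):
    """Monte Carlo conditional power for a two-arm binary endpoint:
    the probability that the final one-sided two-proportion z-test is
    significant, given interim counts (x, n) in each arm and assumed
    response rates (p_t, p_c) for the not-yet-observed patients."""
    rng = np.random.default_rng(seed)
    ft = x_t + rng.binomial(N_t - n_t, p_t, nsim)  # final treatment responders
    fc = x_c + rng.binomial(N_c - n_c, p_c, nsim)  # final control responders
    pt, pc = ft / N_t, fc / N_c
    pbar = (ft + fc) / (N_t + N_c)                 # pooled rate for the z-test
    se = np.sqrt(pbar * (1 - pbar) * (1 / N_t + 1 / N_c))
    z = (pt - pc) / np.maximum(se, 1e-12)
    return float(np.mean(z > norm.ppf(1 - alpha)))

# Interim: 28/40 vs 18/40 responders; 40 more patients per arm to come.
cp = conditional_power(28, 40, 18, 40, 80, 80, p_t=0.7, p_c=0.45)
```

Varying the assumed rates (p_t, p_c) over "desirable" and "undesirable" scenarios, and sweeping the decision cutoff on the resulting CP values, is exactly the kind of sensitivity/specificity exercise the ROC comparison formalizes.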

Interim Monitoring for Futility Based on Probability of Success
Yijie Zhou1, Ruji Yao2, Bo Yang1 and Ram Suresh3

1AbbVie Inc; 2Merck & Co; 3GlaxoSmithKline plc
yijiezhou@abbvie.com

Statistical significance has been the traditional focus of clinical trial design. However, an increasing emphasis has been placed on the magnitude of treatment effect based on point estimates, to enable cross-therapy comparison. The magnitude of point estimates needed to demonstrate sufficient medical value when compared with existing therapies is typically larger than that needed to demonstrate statistical significance. Therefore, a new clinical trial design and its interim monitoring need to take into account trial success in terms of the magnitude of point estimates. In this talk we propose a new interim monitoring approach for futility that targets the probability of trial success in terms of achieving a sufficiently large point estimate at the end of the trial. Simulation is conducted to evaluate the operating characteristics of this approach.

Bayesian Adaptive Design in Oncology Early Phase Trials
Yuehui Wu and Ramachandran Suresh
GlaxoSmithKline plc
yuehui2wu@gsk.com

Efficacy assessment is commonly seen in oncology trials, as early as in the expansion cohort part of Phase I trials and in Phase II trials. Early detection of an efficacy or futility signal can greatly help the team make early decisions on future drug development plans, such as stopping for futility or starting late-phase planning. In order to achieve this goal, a Bayesian adaptive design utilizing predictive probability is implemented. This approach allows the team to monitor efficacy data constantly as new patients' data become available and to make decisions before the end of the trial. The primary endpoint in oncology trials is usually overall survival or progression-free survival, which takes a long time to observe, so a surrogate endpoint such as overall response rate is often used in early phase trials. Multiple boundaries for making future strategic decisions or for different endpoints can be provided. Simulations play a vital role in providing various decision-making boundaries as well as the corresponding operating characteristics. Based on simulation results, for each given sample size, the minimal sample size needed for the first interim look and the futility/efficacy boundaries will be provided based on Bayesian predictive probabilities. Details of the implementation of this design in real clinical trials will be demonstrated, and the pros and cons of this type of design will also be discussed.
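The predictive-probability monitoring described above can be sketched for a single-arm binary endpoint (such as overall response rate) with a conjugate beta prior. This is an illustrative textbook version, not the authors' design; the success rule P(p > p0 | final data) > theta and all numbers are assumed examples:

```python
from scipy.stats import beta, betabinom

def pred_prob_success(x, n, N, p0=0.2, theta=0.95, a=1, b=1):
    """Bayesian predictive probability of trial success for a single-arm
    binary endpoint: with x responders among n enrolled and a Beta(a, b)
    prior, sum over future responder counts y the posterior-predictive
    probability of y times the indicator that the final posterior
    satisfies P(p > p0 | data) > theta."""
    m = N - n                                         # patients still to enroll
    prob = 0.0
    for y in range(m + 1):
        w = betabinom.pmf(y, m, a + x, b + n - x)     # posterior predictive of y
        post = beta.sf(p0, a + x + y, b + N - x - y)  # final P(p > p0)
        prob += w * (post > theta)
    return prob

# 7 responders in the first 15 of 30 patients, target rate p0 = 0.2:
pp = pred_prob_success(x=7, n=15, N=30)
```

Sweeping this quantity over interim looks and sample sizes, as the abstract describes, is how the futility/efficacy boundaries and their operating characteristics are obtained by simulation.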

Session 56 Recent Advancement in Statistical Methods

Exact Inference: New Methods and Applications
Ian Dinwoodie
Portland State University
ihd@pdx.edu
Exact inference concerns methods that generalize Fisher's exact test for independence. The methods are exact in the sense that test statistics have distributions that do not depend on nuisance parameters, and asymptotic approximations are not used. However, computations are challenging and often require Monte Carlo methods. This talk gives an overview with attention to sampling techniques, including Markov chains and sequential importance sampling, with new applications to dynamical models and signalling networks.
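A Monte Carlo version of the exact independence test, sampling tables with fixed margins by permuting column labels, can be sketched as follows (illustrative only; the Markov chain and sequential importance sampling techniques of the talk are more sophisticated, and the example table is made up):

```python
import numpy as np

def mc_exact_test(table, nsim=2000, seed=0):
    """Monte Carlo exact test of independence for an r x c contingency
    table: sample tables with the observed margins by randomly permuting
    column labels against row labels, and compare the chi-square statistic
    (any statistic works, since the conditional null distribution is free
    of nuisance parameters)."""
    table = np.asarray(table)
    rng = np.random.default_rng(seed)
    rows = np.repeat(np.arange(table.shape[0]), table.sum(axis=1))
    cols = np.repeat(np.arange(table.shape[1]), table.sum(axis=0))

    def stat(t):
        e = np.outer(t.sum(axis=1), t.sum(axis=0)) / t.sum()
        return ((t - e) ** 2 / e).sum()

    s0, extreme = stat(table), 0
    for _ in range(nsim):
        t = np.zeros_like(table)
        np.add.at(t, (rows, rng.permutation(cols)), 1)
        extreme += stat(t) >= s0
    return (extreme + 1) / (nsim + 1)   # add-one Monte Carlo p-value

p = mc_exact_test([[12, 3], [4, 11]])
```

Because every sampled table has the observed margins, the p-value is conditional and free of nuisance parameters, which is the defining property of exact inference mentioned above.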

Optimal Thresholds Criteria and Standard Criterion of VUS for ROC Surface
Chong Sun Hong
Sungkyunkwan University
cshong@skku.edu
Consider the ROC surface, a generalization of the ROC curve for three-class diagnostic problems. In this work we propose five criteria for the three-class ROC surface by extending the Youden index, the sum of sensitivity and specificity, the maximum vertical distance, the amended closest-to-(0,1), and the true rate. It may be concluded that these five criteria can be expressed as a function of two Kolmogorov-Smirnov (K-S) statistics. It is found that the paired optimal thresholds selected from the ROC surface are equivalent to the two optimal thresholds found from the two ROC curves. Moreover, we consider the volume under the ROC surface (VUS). The standard criteria of AUC for the probability of default based on Basel II are extended to the VUS for the ROC surface, so that standard criteria of VUS for the classification model are proposed. The ranges of the AUC, K-S, and mean difference statistics corresponding to the VUS values for each class of the standard criteria are obtained. By exploring the relationships of these statistics, the standard criteria of VUS for the ROC surface can be established.
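For a single ROC curve, the Youden-index criterion that this work extends to the ROC surface picks the threshold maximizing sensitivity + specificity - 1. A minimal empirical sketch (the data are invented; the paper's criteria generalize this to paired thresholds on the surface):

```python
import numpy as np

def youden_threshold(scores, labels):
    """Empirical Youden-index threshold: the cutoff maximizing
    sensitivity + specificity - 1 over the observed score values.
    Classification rule: predict positive when score >= cutoff."""
    best_c, best_j = None, -1.0
    for c in np.unique(scores):
        sens = np.mean(scores[labels == 1] >= c)   # true positive rate
        spec = np.mean(scores[labels == 0] < c)    # true negative rate
        j = sens + spec - 1
        if j > best_j:
            best_c, best_j = c, j
    return best_c, best_j

scores = np.array([0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9])
labels = np.array([0,   0,   0,   1,   0,   1,   1,   1])
cut, j = youden_threshold(scores, labels)
```

On an ROC surface the analogous search is over a pair of thresholds, one separating each adjacent pair of the three ordered classes, which is why the paper's optimal thresholds come out as a pair.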

Analysis of Cointegrated Models with Measurement Errors
Sung Ahn1, Hamwoom Hong2 and Sinsup Cho2

1Washington State University; 2Seoul National University
ahn@wsu.edu
We study the asymptotic properties of the reduced-rank estimator of error correction models of vector processes observed with measurement errors. Although it is well known that there is no asymptotic measurement error bias when predictor variables are integrated processes in regression models (Phillips and Durlauf, 1986), we systematically investigate the effects of the measurement errors (in the dependent variables as well as in the predictor variables) on the estimation of not only the cointegrating vectors but also the speed-of-adjustment matrix. Furthermore, we present the asymptotic properties of the estimators. We also obtain the asymptotic distribution of the likelihood ratio test for the cointegrating ranks and investigate the effects of the measurement errors on the test through a Monte Carlo simulation study.

A Direct Method to Evaluate the Time-Dependent Predictive Accuracy for Biomarkers
Weining Shen, Jing Ning and Ying Yuan
University of Texas MD Anderson Cancer Center
wshen@mdanderson.org

Time-dependent areas under the receiver operating characteristic (ROC) curve (AUC) are important measures to evaluate the prediction accuracy of biomarkers for time-to-event endpoints (e.g., time to disease progression or death). In this paper we propose a direct method to estimate the AUC as a function of time using a flexible fractional polynomials model, without the middle step of modeling the time-dependent ROC. We develop a pseudo partial-likelihood procedure for parameter estimation and provide a test procedure to compare the predictive performance between biomarkers. We establish the asymptotic properties of the proposed estimator and test statistics. A major advantage of the proposed method is its ease of inference and of comparing prediction accuracy across biomarkers, rendering our method particularly appealing for studies that require comparing and screening a large number of candidate biomarkers. We evaluate the finite-sample performance of the proposed method through simulation studies and illustrate our method with an application to primary biliary cirrhosis data.

Session 57 Building Bridges between Research and Prac-tice in Time Series Analysis

Time Series Research at the U.S. Census Bureau
Brian C. Monsell
U.S. Census Bureau
briancmonsell@census.gov

The Census Bureau has taken steps to reinforce the role of research within the organization. This talk will give details on the role of statistical research at the U.S. Census Bureau, with particular attention paid to the status of current work in time series analysis and statistical software development in time series. A brief history of time series research will be given, as well as details of work of historical interest.

Temporal Causal Modeling: Methodology, Applications, and Implementation
Naoki Abe1, Tanveer Faruquie1, Huijing Jiang1, Anju Kambadur1, Kimberly Lang1, Aurelie Lozano1 and Jinwoo Shin2

1IBM; 2KAIST University
nabe@us.ibm.com

Temporal causal modeling is an approach for modeling and causal inference based on time series data, which builds on some recent advances in graphical Granger modeling. In this presentation, we will review the basic concept and approach, some specific modeling algorithms, and methods for associated functions (e.g., root cause analysis), as well as some efforts to scale these methods via parallel implementation. We will also describe some business applications of this approach in a number of domains. (The authors are ordered alphabetically.)

78 2014 ICSA-KISS Applied Statistics Symposium Portland June 15-18

Abstracts

Issues Related to the Use of Time Series in Model Building and Analysis
William W.S. Wei
Temple University
wwei@temple.edu

Time series are used in many studies for model building and analysis. We must be very careful to understand the kind of time series data used in the analysis. In this presentation, we will begin with some issues related to the use of aggregate and systematic sampling time series. Since several time series are often used in a study of the relationship of variables, we will also consider vector time series modeling and analysis. Although the basic procedures of model building between univariate time series and vector time series are the same, there are some important phenomena which are unique to vector time series. Therefore, we will also discuss some issues related to vector time series models. Understanding these issues is important when we use time series data in modeling and analysis, regardless of whether it is a univariate or multivariate time series.

Session 58 Recent Advances in Design for Biostatistical Problems

Optimal Designs for N-of-1 Trials
Yin Li and Keumhee Chough Carriere
University of Alberta
kccarrie@ualberta.ca

N-of-1 trials are randomized multi-crossover experiments using two or more treatments on a single patient. They provide evidence-based information on an individual patient, thus optimizing the management of the individual's chronic disease. Such trials are preferred in many medical experiments, as opposed to the more conventional statistical designs constructed to optimize treating the average patient. N-of-1 trials are also popular when the sample size is too small to adopt traditional optimal designs. However, there are very few guidelines available in the literature. We constructed optimal N-of-1 designs for two treatments under a variety of conditions about the carryover effects, the covariance structure, and the number of planned periods. Extension to optimal aggregated N-of-1 designs is also discussed.

Efficient Algorithms for Two-stage Designs on Phase II Clinical Trials
Seongho Kim1 and Weng Kee Wong2

1Wayne State University/Karmanos Cancer Institute
2University of California at Los Angeles
kimse@karmanos.org

Single-arm two-stage designs have been widely used in phase II clinical trials. One of the most popular designs is Simon's optimal two-stage design, which minimizes the expected sample size under the null hypothesis. Currently, a greedy search algorithm is often used to evaluate every possible combination of sample sizes for optimal two-stage designs. However, such a greedy strategy is computationally intensive and so is not feasible for large sample sizes or adaptive two-stage designs with many parameters. An efficient global optimization method, discrete particle swarm optimization (DPSO), is therefore developed to find two-stage designs efficiently and is compared with greedy algorithms for Simon's optimal two-stage and adaptive two-stage designs. It is further shown that DPSO can be efficiently applied to complicated adaptive two-stage designs, even with three prefixed possible response rates, which a greedy algorithm cannot handle.
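As background, the exhaustive ("greedy") search that DPSO is designed to replace can be sketched in a few lines: enumerate stage-one and total sample sizes with cutoffs, keep designs meeting the error constraints, and minimize the expected sample size under the null. This is the classical Simon search, not DPSO itself; the design parameters used below (p0, p1, alpha, beta) are illustrative values chosen for this sketch.

```python
from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(k, n, p):
    return sum(binom_pmf(i, n, p) for i in range(0, k + 1))

def reject_prob(r1, n1, r, n, p):
    """P(declare treatment promising) = P(X1 > r1 and X1 + X2 > r)."""
    return sum(binom_pmf(x1, n1, p) * (1 - binom_cdf(r - x1, n - n1, p))
               for x1 in range(r1 + 1, n1 + 1))

def simon_optimal(p0, p1, alpha, beta, n_max=25):
    """Exhaustive search for Simon's optimal two-stage design:
    minimize E[N | p0] subject to type I error <= alpha and power >= 1 - beta.
    Returns (EN0, r1, n1, r, n) or None if no feasible design exists."""
    best = None
    for n in range(2, n_max + 1):
        for n1 in range(1, n):
            for r1 in range(0, n1):
                for r in range(r1, n):
                    if reject_prob(r1, n1, r, n, p0) > alpha:
                        continue  # type I error too large; try a bigger r
                    if reject_prob(r1, n1, r, n, p1) >= 1 - beta:
                        pet0 = binom_cdf(r1, n1, p0)  # early stop prob under H0
                        en0 = n1 + (1 - pet0) * (n - n1)
                        if best is None or en0 < best[0]:
                            best = (en0, r1, n1, r, n)
                    break  # larger r only loses power; EN0 does not depend on r
    return best
```

Even this small search illustrates why the strategy scales poorly: the number of (n1, r1, n, r) combinations grows rapidly with n_max, which is the motivation for the swarm-based search in the abstract.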

D-optimal Designs for Multivariate Exponential and Poisson Regression Models via Ultra-Dimensional Particle Swarm Optimization
Jiaheng Qiu and Weng Kee Wong
University of California at Los Angeles
wkwong@ucla.edu

Multiple drug therapies are increasingly used to treat many diseases, such as AIDS, cancer, and rheumatoid arthritis. At the early stages of clinical research, the outcome is typically studied using a nonlinear model with multiple doses from various drugs. Advances in handling estimation issues for such models are continually made, but research to find informed design strategies has lagged. We develop a nature-inspired metaheuristic algorithm called ultra-dimensional Particle Swarm Optimization (UPSO) to find D-optimal designs for the Poisson and Exponential models for studying effects of up to 5 drugs and their interactions. This novel approach allows us to find effective search strategies for such high-dimensional optimal designs and gain insight into their structure, including conditions under which locally D-optimal designs are minimally supported. We implement the UPSO algorithm on a web site and apply it to redesign a real study that investigates 2-way interaction effects on the induction of micronuclei in mouse lymphoma cells from 3 genotoxic agents. We show that a D-optimal design can reap substantial benefits over the implemented design in Lutz et al. (2005).
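Since several talks in this session build on particle swarm optimization, a bare-bones continuous PSO (global-best topology, standard inertia and acceleration constants) is sketched below for orientation. It is a generic minimizer under assumed default settings, not the ultra-dimensional UPSO variant described in the abstract, and the D-optimality objective is not reproduced here.

```python
import random

def pso_minimize(f, dim, bounds, n_particles=30, iters=200,
                 w=0.72, c1=1.49, c2=1.49, seed=0):
    """Minimal particle swarm minimizer of f over [lo, hi]^dim.
    Each particle is pulled toward its personal best and the global best."""
    rng = random.Random(seed)
    lo, hi = bounds
    X = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    V = [[0.0] * dim for _ in range(n_particles)]
    pbest = [x[:] for x in X]
    pbest_val = [f(x) for x in X]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                V[i][d] = (w * V[i][d]
                           + c1 * rng.random() * (pbest[i][d] - X[i][d])
                           + c2 * rng.random() * (gbest[d] - X[i][d]))
                X[i][d] = min(hi, max(lo, X[i][d] + V[i][d]))  # clamp to bounds
            val = f(X[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = X[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = X[i][:], val
    return gbest, gbest_val
```

In design problems the candidate vector would encode support points and weights of a design, and f would be the (negative log) determinant of the information matrix; here we only show the optimizer skeleton.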

Optimizing Two-level Supersaturated Designs by Particle Swarm Techniques
Frederick Kin Hing Phoa1, Ray-Bing Chen2, Wei-Chung Wang3 and Weng Kee Wong4

1Institute of Statistical Science, Academia Sinica
2National Cheng Kung University
3National Taiwan University
4University of California at Los Angeles
fredphoa@stat.sinica.edu.tw

Supersaturated designs (SSDs) are often used in screening experiments with a large number of factors to reduce the number of experimental runs. As more factors are used in the study, the search for an optimal SSD becomes increasingly challenging because of the large number of feasible selections of factor-level settings. This talk tackles this discrete optimization problem via a metaheuristic algorithm based on Particle Swarm Optimization (PSO) techniques. Using the commonly used E(s2) criterion as an illustrative example, we were able to modify the standard PSO algorithm and find SSDs that satisfy the lower bounds calculated in Bulutoglu and Cheng (2004) and Bulutoglu (2007), showing that the PSO-generated designs are E(s2)-optimal SSDs.
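The E(s2) criterion itself is straightforward to evaluate for a candidate design: it is the average squared inner product over all distinct pairs of +/-1 factor columns. The sketch below computes the criterion only; the PSO search over designs is the authors' contribution and is not reproduced here.

```python
from itertools import combinations

def e_s2(design):
    """E(s^2) criterion for a two-level (+/-1) supersaturated design.
    `design` is a list of runs (rows); s_ij is the inner product of
    columns i and j, and E(s^2) averages s_ij^2 over all column pairs."""
    cols = list(zip(*design))  # transpose rows into factor columns
    m = len(cols)
    if m < 2:
        raise ValueError("need at least two factors")
    total = sum(sum(a * b for a, b in zip(ci, cj)) ** 2
                for ci, cj in combinations(cols, 2))
    return total / (m * (m - 1) / 2)
```

Orthogonal columns give E(s2) = 0; fully aliased columns give the worst value, so a search algorithm such as PSO tries to drive this average toward its theoretical lower bound.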

Session 59 Student Award Session 2

Analysis of Sequence Data Under Multivariate Trait-Dependent Sampling
Ran Tao1, Donglin Zeng1, Nora Franceschini1, Kari E. North1, Eric Boerwinkle2 and Dan-Yu Lin1

1University of North Carolina at Chapel Hill
2University of Texas Health Science Center
taor@live.unc.edu

High-throughput DNA sequencing allows the genotyping of common and rare variants for genetic association studies. At the present time and in the near future, it is not economically feasible to sequence all individuals in a large cohort. A cost-effective strategy is to sequence those individuals with extreme values of a quantitative trait. We consider the design under which the sampling depends on multiple quantitative traits. Under such trait-dependent sampling, standard linear regression analysis can result in bias of parameter estimation, inflation of type 1 error, and loss of power. We construct a nonparametric likelihood function that properly reflects the sampling mechanism and utilizes all available data. We implement a computationally efficient EM algorithm and establish the theoretical properties of the resulting nonparametric maximum likelihood estimators. Our methods can be used to perform separate inference on each trait or simultaneous inference on multiple traits. We pay special attention to gene-level association tests for rare variants. We demonstrate the superiority of the proposed methods over standard linear regression through extensive simulation studies. We provide applications to the Cohorts for Heart and Aging Research in Genomic Epidemiology Targeted Sequencing Study and the National Heart, Lung, and Blood Institute Exome Sequencing Project.

Empirical Likelihood Based Tests for Stochastic Ordering Under Right Censorship
Hsin-wen Chang and Ian W. McKeague

Columbia University
hc2496@columbia.edu

This paper develops an empirical likelihood approach to testing for stochastic ordering between two univariate distributions under right censorship. The proposed test is based on a maximally selected localized empirical likelihood ratio statistic. The asymptotic null distribution is expressed in terms of a Brownian bridge. The new procedure is shown via a simulation study to have superior power to the log-rank and weighted Kaplan–Meier tests under crossing hazard alternatives. The approach is illustrated using data from a randomized clinical trial involving the treatment of severe alcoholic hepatitis.

Multiple Genetic Loci Mapping for Latent Disease Liability Using a Structural Equation Modeling Approach with Application in Alzheimer's Disease
Ting-Huei Chen

University of North Carolina at Chapel Hill
thchen@live.unc.edu

Categorical traits, such as case-control status, are often used as response variables in genome-wide association studies of genetic loci associated with complex diseases. Using categorical variables to summarize likely continuous disease liability may lead to loss of information, and thus reduction of power to recover associated genetic loci. On the other hand, a direct study of disease liability is often infeasible because it is an unobservable latent variable. In some diseases, the underlying disease liability is manifested by several phenotypes, and thus the associated genetic loci may be identified by combining the information of multiple phenotypes. In this article, we propose a novel method named PeLatent to address this challenge. We employ a structural equation approach to model the latent disease liability by observed manifest variables (phenotypic information) and to simultaneously identify multiple associated genetic loci by a regularized estimation method. Simulation results show that our method has substantially higher sensitivity and specificity than existing methods. Application of our method to a genome-wide association study of Alzheimer's disease (AD) identifies 27 single nucleotide polymorphisms (SNPs) associated with AD. These 27 SNPs are located within 19 genes, and several of these genes are known to be related to Alzheimer's disease as well as neural activities.

Session 60 Semi-parametric Methods

Semiparametric Estimation of Mean and Variance in Generalized Estimating Equations
Jianxin Pan1 and Daoji Li2
1The University of Manchester
2University of Southern California
daojili@marshall.usc.edu

Efficient estimation of regression parameters is a major objective in the analysis of longitudinal data. Existing approaches usually focus on modeling only the mean and treat the variance as a nuisance parameter. The common assumption is that the variance is a function of the mean, and the variance function is further assumed to be known. However, the estimator of the regression parameters can be very inefficient if the variance function or variance is misspecified. In this paper, a flexible semiparametric regression approach for longitudinal data is proposed to jointly model the mean and variance. The novel semiparametric mean and variance models offer great flexibility in formulating the effects of covariates and time on the mean and variance. We simultaneously estimate the parametric and nonparametric components in the models by using a B-splines based approach. The asymptotic normality of the resulting estimators for parametric components in the proposed models is established, and the optimal rate of convergence of the nonparametric components is obtained. Our simulation study shows that our proposed approach yields more efficient estimators for the mean parameters than the conventional GEE approach. The proposed approach is also illustrated with real data analysis.

An Empirical Approach of Efficient Estimation of Linear Functionals of a Probability with Side Information
Hanxiang Peng, Shan Wang and Lingnan Li
Indiana University-Purdue University Indianapolis
hpeng@math.iupui.edu

In this talk, we'll construct efficient estimators of linear functionals of a probability measure when side information is available. Our approach is based on maximum empirical likelihood. We will exhibit that the proposed approach is mathematically simpler and computationally easier than the usual maximum empirical likelihood estimators. Several examples of possible side information are given. We also report some simulation results.
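The core computation behind empirical likelihood with side information of the form E[g(X)] = 0 is the profiled weight vector w_i = 1/(n(1 + lambda * g(x_i))), with lambda chosen so the moment constraint holds. The sketch below is a textbook one-constraint version solved by Newton's method, shown for orientation only; it is not the authors' estimator.

```python
def el_weights(g_vals, tol=1e-10, max_iter=100):
    """Empirical-likelihood weights maximizing prod(w_i) subject to
    sum(w_i) = 1 and sum(w_i * g_i) = 0 (the side information).
    Solution: w_i = 1 / (n * (1 + lam * g_i)), where lam solves
    sum(g_i / (1 + lam * g_i)) = 0, found here by Newton's method."""
    n = len(g_vals)
    lam = 0.0
    for _ in range(max_iter):
        denom = [1.0 + lam * g for g in g_vals]
        if min(denom) <= 0:
            raise ValueError("lambda left the feasible region")
        f = sum(g / d for g, d in zip(g_vals, denom))
        fprime = -sum(g * g / (d * d) for g, d in zip(g_vals, denom))
        step = f / fprime
        lam -= step
        if abs(step) < tol:
            break
    return [1.0 / (n * (1.0 + lam * g)) for g in g_vals]
```

For the constraint to be attainable, 0 must lie strictly inside the range of the g values (some negative, some positive); the resulting weights tilt the empirical distribution so the side information holds exactly.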

M-estimation for General ARMA Processes with Infinite Variance
Rongning Wu
Baruch College, City University of New York
rongningwu@baruch.cuny.edu

General autoregressive moving average (ARMA) models extend the traditional ARMA models by removing the assumptions of causality and invertibility. The assumptions are not required under a non-Gaussian setting for the identifiability of the model parameters, in contrast to the Gaussian setting. We study M-estimation for general ARMA processes with infinite variance, where the distribution of innovations is in the domain of attraction of a non-Gaussian stable law. Following the approach taken by Davis et al. (1992) and Davis (1996), we derive a functional limit theorem for random processes based on the objective function and establish asymptotic properties of the M-estimator. We also consider bootstrapping the M-estimator and extend the results of Davis & Wu (1997) to the present setting so that statistical inferences are readily implemented. Simulation studies are conducted to evaluate the finite sample performance of the M-estimation and bootstrap procedures. An empirical example of financial time series is also provided.


Sufficient Dimension Reduction via Principal Lq Support Vector Machine
Andreas Artemiou1 and Yuexiao Dong2

1Cardiff University
2Temple University
ydong@temple.edu

Principal support vector machine was proposed recently by Li, Artemiou and Li (2011) to combine L1 support vector machine and sufficient dimension reduction. We introduce Lq support vector machine as a unified framework for linear and nonlinear sufficient dimension reduction. By noticing that the solution of L1 support vector machine may not be unique, we set q > 1 to ensure the uniqueness of the solution. The asymptotic distributions of the proposed estimators are derived for q = 2. We demonstrate through numerical studies that the proposed L2 support vector machine estimators improve existing methods in accuracy and are less sensitive to the tuning parameter selection.

Nonparametric Quantile Regression via a New MM Algorithm
Bo Kai1, Mian Huang2, Weixin Yao3 and Yuexiao Dong4

1College of Charleston
1National Chengchi University
2Shanghai University of Finance and Economics
3Kansas State University
4Temple University
kaib@cofc.edu

Nonparametric quantile regression is an important statistical model that has been widely used in many research fields and applications. However, its optimization is very challenging, since the objective functions are non-differentiable. In this work, we propose a new MM algorithm for the nonparametric quantile regression model. The proposed algorithm simultaneously updates the quantile function and yields a smoother estimate of the quantile function. We systematically study the new MM algorithm in local linear quantile regression and show that the proposed algorithm preserves the monotone descent property of MM algorithms in an asymptotic sense. Monte Carlo simulation studies will be presented to show the finite sample performance of the proposed algorithm.
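As orientation for the MM idea, the classical Hunter-Lange MM algorithm for the quantile check loss majorizes the non-differentiable objective by a weighted quadratic around the current iterate, so each update has a closed form. The sketch below is the simplest (intercept-only, parametric) case under an assumed epsilon-smoothing; the abstract's algorithm for the nonparametric local linear setting is a different, more elaborate construction.

```python
def mm_quantile(y, tau=0.5, eps=1e-6, iters=500):
    """Tau-th sample quantile via a Hunter-Lange-style MM iteration:
    the check loss rho_tau(r) is majorized by a quadratic with weights
    1/(eps + |r_k|), giving a closed-form weighted-mean update."""
    n = len(y)
    theta = sum(y) / n  # initialize at the sample mean
    for _ in range(iters):
        d = [eps + abs(yi - theta) for yi in y]  # majorizer weights
        # update solves sum((y_i - theta)/d_i) = n*(1 - 2*tau)
        num = sum(yi / di for yi, di in zip(y, d)) - n * (1 - 2 * tau)
        theta_new = num / sum(1.0 / di for di in d)
        if abs(theta_new - theta) < 1e-12:
            theta = theta_new
            break
        theta = theta_new
    return theta
```

For tau = 0.5 this reduces to an epsilon-smoothed median (a Weiszfeld-type iteration); each step provably decreases the smoothed check-loss objective, which is the monotone descent property the abstract refers to.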

Regression Estimators Using Stratified Ranked Set Sampling
Arbita Chatterjee, Hani Samawi, Lili Yu, Daniel Linder, Jingxian Cai and Robert Vogel
Georgia Southern University
jxcai19880721@hotmail.com

This article is intended to investigate the performance of two types of stratified regression estimators, namely the separate and the combined estimator, using stratified ranked set sampling (SRSS) introduced by Samawi (1996). The expressions for the mean and variance of the proposed estimates are derived, and the estimates are shown to be unbiased. A simulation study is designed to compare the efficiency of SRSS relative to other sampling procedures under varying model scenarios. Our investigation indicates that the regression estimator of the population mean obtained through an SRSS becomes more efficient than the crude sample mean estimator using stratified simple random sampling. These findings are also illustrated with the help of a data set on bilirubin levels in babies in a neonatal intensive care unit.
Key words: ranked set sampling, stratified ranked set sampling, regression estimator

Session 61 Statistical Challenges in Variable Selection for Graphical Modeling

Fused Community Detection
Yi Yu1, Yang Feng2 and Richard J. Samworth1

1University of Cambridge
2Columbia University
yangfeng@stat.columbia.edu

Community detection is one of the most widely studied problems in network research. In an undirected graph, communities are regarded as tightly-knit groups of nodes with comparatively few connections between them. Popular existing techniques, such as spectral clustering and variants thereof, rely heavily on the edges being sufficiently dense and the community structure being relatively obvious. These are often not satisfactory assumptions for large-scale real-world datasets. We therefore propose a new community detection method called fused community detection (fcd), which is designed particularly for sparse networks and situations where the community structure may be opaque. The spirit of fcd is to take advantage of the edge information, which we exploit by borrowing sparse recovery techniques from regression problems. Our method is supported by both theoretical results and numerical evidence. The algorithms are implemented in the R package fcd, which is available on CRAN.

High Dimensional Tests for Functional Brain Networks
Jichun Xie1 and Jian Kang2

1Temple University
2Emory University
jichun@temple.edu

Large-scale resting-state fMRI studies have been conducted for patients with autism, and the existence of abnormalities in the functional connectivity between brain regions (containing more than one voxel) has been clearly demonstrated. Due to the ultra-high dimensionality of the data, current methods focusing on studying the connectivity pattern between voxels often lack power and computational efficiency. In this talk, we introduce a new framework to identify the connection pattern of gigantic networks with desired resolution. We propose three procedures based on different network structures and testing criteria. The asymptotic null distributions of the test statistics are derived, together with their rate optimality. Simulation results show that the tests are able to control type I error and yet are very powerful. We apply our method to a resting-state fMRI study on autism. The analysis yields interesting insights about the mechanism of autism.

Bayesian Inference of Multiple Gaussian Graphical Models
Christine Peterson1, Francesco Stingo2 and Marina Vannucci3
1Stanford University
2University of Texas MD Anderson Cancer Center
3Rice University
cbpeterson@gmail.com

In this work, we propose a Bayesian approach for inference of multiple Gaussian graphical models. Specifically, we address the problem of inferring multiple undirected networks in situations where some of the networks may be unrelated, while others share common features. We link the estimation of the graph structures via a Markov random field prior, which encourages common edges. In addition, we learn which sample groups have shared graph structure by placing a spike-and-slab prior on the parameters that measure network relatedness. This approach allows us to share information between sample groups when appropriate, as well as to obtain a measure of relative network similarity across groups. In simulation studies, we find improved accuracy of network estimation over competing methods, particularly when the sample sizes within each subgroup are moderate. We illustrate our model with an application to inference of protein networks for various subtypes of acute myeloid leukemia.

Mixed Graphical Models via Exponential Families
Eunho Yang1, Yulia Baker2, Pradeep Ravikumar1, Genevera I. Allen2 and Zhandong Liu3

1University of Texas at Austin
2Rice University
3Baylor College of Medicine
yuliabaker@rice.edu

Markov Random Fields, or undirected graphical models, are widely used to model high-dimensional multivariate data. Classical instances of these models, such as Gaussian Graphical and Ising Models, as well as recent extensions (Yang et al., 2012) to graphical models specified by univariate exponential families, assume all variables arise from the same distribution. Complex data from high-throughput genomics and social networking, for example, often contain discrete, count, and continuous variables measured on the same set of samples. To model such heterogeneous data, we develop a novel class of mixed graphical models by specifying that each node-conditional distribution is a member of a possibly different univariate exponential family. We study several instances of our model and propose scalable M-estimators for recovering the underlying network structure. Simulations, as well as an application to learning mixed genomic networks from next generation sequencing and mutation data, demonstrate the versatility of our methods.

Session 62 Recent Advances in Non- and Semi-Parametric Methods

Joint Estimation of Multiple Bivariate Densities of Protein Backbone Angles Using an Adaptive Exponential Spline Family
Lan Zhou
Texas A&M University
lzhou@stat.tamu.edu

In this talk, we introduce a method for joint estimation of multiple bivariate density functions for a collection of populations of protein backbone angles. The method utilizes an exponential family of distributions, for which the log densities are modeled as a linear combination of a common set of basis functions. The basis functions are obtained as bivariate splines on triangulations and are adaptively chosen based on data. The circular nature of angular data is taken into account by imposing appropriate smoothness constraints across boundaries. Maximum penalized likelihood is used for fitting the model, and an effective Newton-type algorithm is developed. A simulation study clearly showed that the joint estimation approach is statistically more efficient than estimating the densities separately. The proposed method provides a novel and unique perspective on two important and challenging problems in protein structure research, namely structure-based protein classification and quality assessment of protein structure prediction servers. The joint density estimation approach is widely applicable when there is a need to estimate multiple density functions from different populations with common features. Moreover, the coefficients of basis expansion for the fitted densities provide a low-dimensional representation that is useful for visualization, clustering, and classification of the densities. This is joint work with Mehdi Maadooliat, Xin Gao and Jianhua Huang.

Estimating Time-Varying Effects for Overdispersed Recurrent Data with Treatment Switching
Qingxia Chen1, Donglin Zeng2, Joseph G. Ibrahim2, Mouna Akacha3 and Heinz Schmidli3
1Vanderbilt University
2University of North Carolina at Chapel Hill
3Novartis Pharmaceuticals Corporation
cindychen@vanderbilt.edu

In the analysis of multivariate event times, frailty models assuming time-independent regression coefficients are often considered, mainly due to their mathematical convenience. In practice, regression coefficients are often time dependent, and the temporal effects are of clinical interest. Motivated by a phase III clinical trial in multiple sclerosis, we develop a semiparametric frailty modelling approach to estimate time-varying effects for overdispersed recurrent events data with treatment switching. The proposed model incorporates the treatment switching time in the time-varying coefficients. Theoretical properties of the proposed model are established, and an efficient EM algorithm is derived to obtain the maximum likelihood estimates. Simulation studies evaluate the numerical performance of the proposed model under various temporal treatment effect curves. The ideas in this paper can also be used for time-varying coefficient frailty models without treatment switching, as well as for alternative models when the proportional hazard assumption is violated. A multiple sclerosis dataset is analyzed to illustrate our methodology.

Bivariate Penalized Splines for Regression
Ming-Jun Lai and Lily Wang
University of Georgia
lilywang@uga.edu

In this work, we are interested in smoothing data over complex irregular boundaries or interior holes. We propose bivariate penalized spline estimators over triangulations, using an energy functional as the penalty. We establish the consistency and asymptotic normality for the proposed estimators and study the convergence rates of the estimators. A comparison with thin-plate splines is provided to illustrate some advantages of this spline smoothing approach. The proposed method can be easily applied to various smoothing problems over arbitrary domains, including irregularly shaped domains with irregularly scattered data points.

Local Feature Selection in Varying-Coefficient Models
Lan Xue1, Xinxin Shu2, Peibei Shi2, Colin Wu3 and Annie Qu2

1Oregon State University
2University of Illinois at Urbana-Champaign
3National Heart, Lung, and Blood Institute
xuel@stat.oregonstate.edu

We propose new varying-coefficient model selection and estimation based on the spline approach, which is capable of capturing time-dependent covariate effects. The new penalty function utilizes local-region information for varying-coefficient estimation, in contrast to the traditional model selection approach focusing on the entire region. The proposed method is extremely useful when the signals associated with relevant predictors are time-dependent, and detecting relevant covariate effects in the local region is more scientifically relevant than those of the entire region. However, this brings challenges in theoretical development due to the large-dimensional parameters involved in the nonparametric functions to capture the local information, in addition to computational challenges in solving optimization problems with overlapping parameters for different local-region penalization. We provide the asymptotic theory of model selection consistency on detecting local signals and establish the optimal convergence rate for the varying-coefficient estimator. Our simulation studies indicate that the proposed model selection incorporating local features outperforms the global feature model selection approaches. The proposed method is also illustrated through a longitudinal growth and health study from the National Heart, Lung, and Blood Institute.

Session 63 Statistical Challenges and Development in Cancer Screening Research

Overdiagnosis in Breast and Prostate Cancer Screening: Concepts, Methods and Challenges
Ruth Etzioni, Roman Gulati and Jing Xia
Fred Hutchinson Cancer Research Center
retzioni@fhcrc.org

Overdiagnosis occurs when a tumor is detected by screening but, in the absence of screening, that tumor would never have become symptomatic within the lifetime of the patient. Thus, an overdiagnosed tumor is a true extra diagnosis due solely to the existence of the screening test. Patients who are overdiagnosed cannot, by definition, be helped by the diagnosis, but they can be harmed, particularly if they are treated. Therefore, knowledge of the likelihood that a screen-detected cancer has been overdiagnosed is critical for making treatment decisions and developing screening policy. The problem of overdiagnosis has long been recognized in the case of prostate cancer and is currently an area of extreme interest in breast cancer. Published estimates of the frequency of overdiagnosis in breast and prostate cancer screening vary greatly. This presentation will investigate why different studies yield such different results. I'll explain how overdiagnosis arises and catalog the different ways it may be measured in population studies. I'll then discuss different approaches that are used to estimate overdiagnosis. Many studies use excess incidence under screening, relative to incidence without screening, as a proxy for overdiagnosis. Others use statistical models to make inferences about lead time or disease natural history and then derive the corresponding fraction of cases that are overdiagnosed. Each approach has its limitations and challenges, but one thing is clear: the estimation approach is a major factor behind the variation in overdiagnosis estimates in the literature. I will conclude with a list of key questions that consumers of overdiagnosis studies should ask to determine the validity (or lack thereof) of study results.

Estimation of Biomarker Growth in a Screening Study
Lurdes Y.T. Inoue1, Roman Gulati2 and Ruth Etzioni2
1University of Washington
2Fred Hutchinson Cancer Research Center
linoue@uw.edu

With the growing importance of biomarker-based tests for early detection and monitoring of chronic diseases, the question of how best to utilize biomarker measurements is of tremendous interest; the answer requires understanding the biomarker growth process. Prospective screening studies offer an opportunity to investigate biomarker growth while simultaneously assessing its value for early detection. However, since disease diagnosis usually terminates collection of biomarker measurements, proper estimation of biomarker growth in these studies may need to account for how screening affects the length of the observed biomarker trajectory. In this talk, we compare estimation of biomarker growth from prospective screening studies using two approaches: a retrospective approach that only models biomarker growth, and a prospective approach that jointly models biomarker growth and time to screen detection. We assess performance of the two approaches in a simulation study and using empirical prostate-specific antigen data from the Prostate Cancer Prevention Trial. We find that the prospective approach accounting for informative censoring often produces similar results but may produce different estimates of biomarker growth in some contexts.

Estimating Screening Test Effectiveness when Screening Indication is Unknown
Rebecca Hubbard
Group Health Research Institute
hubbardr@ghc.org

Understanding the effectiveness of cancer screening tests is challenging when the same test is used for screening and also for disease diagnosis in symptomatic individuals. Estimates of screening test effectiveness based on data that include both screening and diagnostic examinations will be biased. Moreover, in many cases, gold standard information on the indication for the examination is not available. Models exist for predicting the probability that a given examination was used for a screening purpose, but no previous research has investigated appropriate statistical methods for utilizing these probabilities. In this presentation, we will explore alternative methods for incorporating predicted probabilities of screening indication into analyses of screening test effectiveness. Using simulation studies, we compare the bias and efficiency of alternative approaches. We also demonstrate the performance of each method in a study of colorectal cancer screening with colonoscopy. Methods for estimating regression model parameters associated with an unknown categorical predictor, such as indication for examination, have broad applicability in studies of cancer screening and other studies using data from electronic health records.

Developing Risk-Based Screening Guidelines: "Equal Management of Equal Risks"
Hormuzd Katki
National Cancer Institute
katkih@mail.nih.gov

The proliferation of disease risk calculators has not led to a proliferation of risk-based screening guidelines. The focus of risk-based screening guidelines is connecting risk stratification under the natural history of disease (without intervention) to "benefit stratification": whether the risk stratification better distinguishes people who have high benefit vs. low benefit from a screening intervention. To link risk stratification to benefit stratification, we propose the principle of "equal management of people at equal risk of disease". When applicable, this principle leads to simplified and consistent management of people with different risk factors or test results leading to the same disease risk, people who might also have a similar benefit/harm profile. We describe two examples of our approach. First, we demonstrate how the "equal management of equal risks" principle was applied to thoroughly integrate HPV testing into the new risk-based cervical cancer screening guidelines, the first thoroughly risk-based US cancer screening guidelines. Second, we use risk of lung cancer death to estimate benefit stratification for targeting CT lung cancer screening. We show how we calculated benefit stratification for CT lung screening, and also the analogous "harm stratification" and "efficiency stratification". We critically examine the limits of the "equal management of equal risks" principle. This approach of calculating benefit stratification and applying "equal management of equal risks" might be applicable in other settings to help pave the way for developing risk-based screening guidelines.

Session 64: Recent Developments in the Visualization and Exploration of Spatial Data

Recent Advancements in Geovisualization with a Case Study on Chinese Religions
Juergen Symanzik1 and Shuming Bao2

1Utah State University
2University of Michigan
symanzik@math.usu.edu
Producing high-quality map-based displays for economic, medical, educational, or any other kind of statistical data with geographic covariates has always been challenging. Either it was necessary to have access to high-end software, or one had to do a lot of detailed programming. Recently, R software for linked micromap (LM) plots has been enhanced to handle any available shapefiles from Geographic Information Systems (GIS). Also, enhancements have been made that allow for a fast overlay of various statistical graphs on Google maps. In this presentation, we provide an overview of the necessary steps to produce such graphs in R, starting with GIS-based data and shapefiles and ending with the resulting graphs in R. We will use data from a study on Chinese religions and society (provided by the China Data Center at the University of Michigan) as a case study for these graphical methods.

Spatial Analysis with China Geo-Explorers
Shuming Bao1, Miao Shui2 and Bing She2
1University of Michigan
2Wuhan University
sbao@umich.edu
With the rapid development of spatial and non-spatial databases of population, economy, society, and the natural environment from different sources, times, and formats, it has been a challenge to efficiently integrate such space-time data and methodology for spatial studies. This paper will discuss the recent development of spatial intelligence technologies and methodologies for spatial data integration and data analysis, as well as their applications for spatial studies. The presentation will introduce the newly developed spatial data explorers (China Geo-Explorer) distributed by the University of Michigan China Data Center. It will demonstrate how space-time data of different formats and sources can be integrated, visualized, analyzed, and reported in a web-based spatial system. Some applications in population and regional development, disaster assessment, environment and health, cultural and religious studies, and household surveys will be discussed for China and global studies. Finally, future directions will be discussed.

Probcast: Creating and Visualizing Probabilistic Weather Forecasts
J. McLean Sloughter1, Susan Joslyn2, Patrick Tewson3, Tilmann Gneiting4 and Adrian Raftery2
1Seattle University
2University of Washington
3Bigger Boat Consulting
4University of Heidelberg
sloughtj@seattleu.edu
Probabilistic methods are becoming increasingly common for weather forecasting. However, communicating uncertainty information about spatial forecasts to users is not always a straightforward task. The Probcast project (http://probcast.com) looks both to develop methodologies for spatial probabilistic weather forecasting and to develop means of communicating this information effectively. This talk will discuss both the statistical approaches used to create forecasts and the cognitive psychology research used to find the best ways to clearly communicate statistical and probabilistic information.

Session 65: Advancement in Biostatistical Methods and Applications

Estimation of Time-Dependent AUC under Marker-Dependent Sampling
Xiaofei Wang and Zhaoyin Zhu
Duke University
xiaofeiwang@duke.edu
In the biomedical field, evaluating the accuracy of a biomarker for predicting the onset of a disease or a disease condition is essential. When predicting the binary status of disease onset is of interest, the area under the ROC curve (AUC) is widely used. When predicting the time to an event is of interest, the time-dependent ROC curve (AUC(t)) can be used. In both cases, however, the simple random sampling (SRS) often used for biomarker validation is costly and requires a large number of patients. To improve study efficiency and reduce cost, marker-dependent sampling (MDS) has been proposed (Wang et al., 2012, 2013), in which the selection of patients for ascertaining their survival outcomes depends on the results of biomarker assays. In this talk, we will introduce a non-parametric estimator for the time-dependent AUC(t) under MDS. The consistency and asymptotic normality of the proposed estimator will be discussed. Simulation will be used to demonstrate the unbiasedness of the proposed estimator under MDS and the efficiency gain of MDS over SRS.
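[As background for readers, one standard (cumulative/dynamic) definition of the time-dependent AUC for a marker Y and event time T is shown below for orientation; it is not necessarily the exact estimand used in this talk.]

```latex
\mathrm{AUC}(t) \;=\; P\!\left( Y_i > Y_j \;\middle|\; T_i \le t,\; T_j > t \right),
```

that is, the probability that a randomly chosen subject who has had the event by time t carries a higher marker value than a randomly chosen subject who has not.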

A Measurement Error Approach for Modeling Accelerometer-Based Physical Activity Data
Julia Lee, Jing Song and Dorothy Dunloop
Northwestern University
jungwha-lee@northwestern.edu
Physical activity (PA) is a modifiable lifestyle factor for many chronic diseases, with established health benefits. PA outcomes using accelerometers are measured and assessed in many studies, but there are limited statistical methods for analyzing accelerometry data. We describe a measurement error modeling approach to estimate the distribution of habitual physical activity and the sources of variation in accelerometer-based physical activity data from a sample of adults with, or at risk of, knee osteoarthritis. We model both the intra- and inter-individual variability in measured physical activity. Our model allows us to account for and adjust for measurement errors, biases, and other sources of intra-individual variation.

Real-Time Prediction in Clinical Trials: A Statistical History of REMATCH
Daniel F. Heitjan and Gui-shuang Ying
University of Pennsylvania
dheitjan@upenn.edu
Randomized clinical trials often include one or more planned interim analyses, during which an external monitoring committee reviews the accumulated data and determines whether it is scientifically and ethically appropriate for the study to continue. With survival-time endpoints, it is often desirable to schedule the interim analyses at the times of occurrence of specified landmark events, such as the 50th event, the 100th event, and so on. Because the timing of such events is random and the interim analyses impose considerable logistical burdens, it is worthwhile to predict the event times as accurately as possible. Prediction methods available prior to 2001 used data only from previous trials, which are often of questionable relevance to the trial for which one wishes to make predictions. With modern data management systems, it is often feasible to use data from the trial itself to make these predictions, rendering them far more reliable. This talk will describe work that some colleagues and students and I have done in this area. I will set the methodologic development in the context of the trial that motivated our work: REMATCH, a randomized clinical trial of a heart assist device that ran from 1998 to 2001 and was considered one of the most rigorous and expensive device trials ever conducted.

An Analysis of Microarray Data with Batch Effects
Dongseok Choi, William O. Cepurna, John C. Morrison, Elaine C. Johnson, Stephen R. Planck and James T. Rosenbaum
Oregon Health & Science University
choid@ohsu.edu
Normalization is considered an important step before any statistical analysis in microarray studies. Many methods have been proposed over the last decade or so, for example global normalization, local regression based methods, and quantile normalization. Normalization methods typically remove systematic biases across arrays and have been shown to be quite effective in removing them when arrays are processed simultaneously in a batch. It is, however, reported that they sometimes do not remove differences between batches when microarrays are split into several experiments over time. In this presentation, we will explore potential approaches that could adjust for batch effects by using traditional methods and methods developed as a secondary normalization.
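[Quantile normalization, one of the methods mentioned in this abstract, can be sketched in a few lines. The function below is a generic illustration in pure Python (names and structure are ours, not the authors' code); production analyses would typically use an established implementation such as Bioconductor's.]

```python
def quantile_normalize(arrays):
    """Quantile-normalize a list of equal-length expression arrays.

    Each array is forced onto a common reference distribution: the mean
    of the k-th smallest values across arrays, mapped back through each
    array's ranks (ties broken by position in this simple sketch).
    """
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean of the k-th order statistic across arrays.
    reference = [sum(s[k] for s in sorted_arrays) / len(arrays) for k in range(n)]
    normalized = []
    for a in arrays:
        # Rank each value and substitute the reference value of the same rank.
        order = sorted(range(n), key=lambda i: a[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]
        normalized.append(out)
    return normalized
```

After normalization, every array has exactly the same set of sorted values, which is why the method removes array-wide distributional differences (but, as the abstract notes, not necessarily batch effects across separately processed experiments).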

Session 66: Analysis of Complex Data

Integrating Data from Heterogeneous Studies Using Only Summary Statistics: Efficiency and Robustness
Min-ge Xie
Rutgers University
mxie@stat.rutgers.edu
Heterogeneous studies arise often in applications, due to different study and sampling designs, populations, or outcomes. Sometimes these studies have common hypotheses or parameters of interest, and we can synthesize evidence from them to make inference for those common hypotheses or parameters. In heterogeneous studies, some of the parameters of interest may not be estimable in certain studies, and in such a case these studies are typically excluded by conventional methods. The exclusion of part of the studies can lead to a non-negligible loss of information. This paper introduces a data integration method for heterogeneous studies that combines the confidence distributions derived from the summary statistics of the individual studies. It includes all the studies in the analysis and makes use of all information, direct as well as indirect. Under a general likelihood inference framework, this new approach is shown to have several desirable properties: (i) it is asymptotically as efficient as the maximum likelihood approach using individual participant data (IPD) from all studies; (ii) unlike the IPD analysis, it suffices to use summary statistics to carry out our approach, and individual-level data are not required; and (iii) it is robust against misspecification of the working covariance structure of the parameter estimates. All the properties of the proposed approach are further confirmed by data simulated from a randomized clinical trials setting, as well as by real data on aircraft landing performance. (Joint work with Dungang Liu and Regina Liu.)

A Markov Modulated Poisson Model for Reliability Data
Joshua Landon1, Suleyman Ozekici2 and Refik Soyer1
1George Washington University
2Koc University
jlandon@gwu.edu
In this presentation, we will consider a latent Markov process governing the intensity rate of a Poisson process model for failure data. The latent process enables us to infer the performance of the debugging operation over time and allows us to deal with the imperfect debugging scenario. We develop the Bayesian inference for the model and also introduce a method to infer the unknown dimension of the Markov process. We will illustrate the implementation of our model and the Bayesian approach by using actual software failure data.
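[For intuition about the model class named in the title, a Markov-modulated Poisson process can be simulated directly: a latent continuous-time chain switches the Poisson intensity between states. The sketch below is a generic illustration with at least two latent states and a single uniform switching rate (our own simplifying assumptions, not the authors' model or inference procedure).]

```python
import random

def simulate_mmpp(rates, switch_rate, horizon, seed=0):
    """Simulate event times of a Markov-modulated Poisson process.

    The latent chain stays in a state for an Exp(switch_rate) sojourn,
    during which events arrive as a Poisson process with intensity
    rates[state]; it then jumps to a uniformly chosen other state.
    Assumes len(rates) >= 2.
    """
    rng = random.Random(seed)
    t, state = 0.0, 0
    events = []
    while t < horizon:
        sojourn_end = min(t + rng.expovariate(switch_rate), horizon)
        # Exact simulation within the sojourn: successive exponential
        # inter-event gaps, discarding the overshoot past sojourn_end.
        s = t
        while True:
            s += rng.expovariate(rates[state])
            if s >= sojourn_end:
                break
            events.append(s)
        state = rng.choice([k for k in range(len(rates)) if k != state])
        t = sojourn_end
    return events
```

In a software reliability context, the latent state can be read as the current (unobserved) quality of the code base, with debugging moving the chain between states; inference then works backwards from the observed failure times.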

A Comparison of Two Approaches for Acute Leukemia Patient Classification
Jingjing Wu1, Guoqiang Chen2 and Zeny Feng3
1University of Calgary
2Enbridge Pipelines
3University of Guelph
jinwu@ucalgary.ca
The advancement of microarray technology has greatly facilitated research on gene-expression-based classification of patient samples. For example, in cancer research, microarray gene expression data have been used for cancer or tumor classification. When the study focuses on only two classes, for example two different cancer types, we propose a two-sample semiparametric model for the distributions of gene expression levels in the two classes. To estimate the parameters, we consider both the maximum semiparametric likelihood estimate (MLE) and the minimum Hellinger distance estimate (MHDE). For each gene, a Wald statistic is constructed based on either the MLE or the MHDE, and a significance test is then performed on each gene. We exploit the idea of a weighted sum of misclassification rates to develop a novel classification model in which only the previously identified significant genes are involved. To test the usefulness of our proposed method, we consider a predictive approach: we apply our method to analyze the acute leukemia data of Golub et al. (1999), in which a training set is used to build the classification model and a testing set is used to evaluate its accuracy.

On the Consistency and Covariate Selections for Varying-Coefficient Deming Regressions
Ying Lu1, Chong Gu2, Bo Fan3, Selwyn Au4, Valerie McGuire1 and John Shepherd3
1VA Palo Alto Health Care System & Stanford University
2Purdue University
3University of California at San Francisco
4VA Palo Alto Health Care System
yinglu@va.gov
Although Deming regression (DR) has been successfully used to establish cross-calibration (CC) formulas for bone mineral densities (BMD) between manufacturers at several anatomic sites, it failed for CC of whole-body BMD because the relationship varies with the subject's weight, total fat, and lean mass. We proposed a new varying-coefficient DR (VCDR) that allows the intercept and slope to be non-linear functions of covariates, and we applied this new model successfully to derive a consistent calibration formula for the new whole-body BMD data. Our results showed that the VCDR effectively removed all systematic bias in previous work. In this talk, we will discuss the consistency of the calibration formula and procedures for covariate selection.

Session 67: Statistical Issues in Co-development of Drug and Biomarker

Group Sequential Designs for Developing and Testing Biomarker-Guided Personalized Therapies in Comparative Effectiveness Research
Tze Leung Lai1, Olivia Yueh-Wen Liao2 and Dong Woo Kim3
1Stanford University
2Onyx Pharmaceuticals
3Microsoft Corporation
dwkim88@stanford.edu
Biomarker-guided personalized therapies offer great promise to improve drug development and patient care, but they also pose difficult challenges in designing clinical trials for the development and validation of these therapies. We first give a review of the existing approaches, briefly for clinical trials in new drug development and in more detail for comparative effectiveness trials involving approved treatments. We then introduce new group sequential designs to develop and test personalized treatment strategies involving approved treatments.

Adaptive Enrichment Designs for Clinical Trials
Noah Simon1 and Richard Simon2
1University of Washington
2National Institutes of Health
nrsimon@u.washington.edu
Many difficult-to-treat diseases are actually a heterogeneous collection of similar syndromes with potentially different causal mechanisms. New molecules attack pathways that are dysregulated in only a subset of this collection, and so are expected to be effective for only a subset of patients with the disease. Often this subset is not well understood until well into large-scale clinical trials. As such, standard practice has been to enroll a broad range of patients and run post-hoc subset analyses to determine those who may particularly benefit. This unnecessarily exposes many patients to hazardous side effects and may vastly decrease the efficiency of the trial (especially if only a small subset benefits). In this talk, I will discuss a class of adaptive enrichment designs, which allow the eligibility criteria of a trial to be adaptively updated during the trial, restricting entry to only patients likely to benefit from the new treatment. These designs control type I error and can substantially increase power. I will also discuss and illustrate strategies for effectively building and evaluating biomarkers in this framework.

An Adaptive Single-Arm Phase II Design with Co-primary Objectives to Evaluate Activity Overall and in Relation to a Biomarker-Defined Subgroup
Michael Wolf
Amgen Inc.
michaelwolf@amgen.com
Roberts (Clin Cancer Res, 2011) presented a single-arm, two-stage adaptive design to evaluate response overall and in one or more biomarker-defined subgroups, where biomarkers are determined only for responders. While this design has obvious practical advantages, the proposed testing strategy does not provide robust control of false-positive error. Modified futility and testing strategies based on marginal probabilities are proposed to achieve the same design objectives and are shown to be more robust; a trade-off is that biomarkers must be determined for all subjects. Clinical examples of design setup and analysis are illustrated with a fixed subgroup size that reflects its expected prevalence in the intended use population, based on a validated in vitro companion diagnostic. Design efficiency and external validity are compared to testing for a difference in complement biomarker subgroups. Possible generalizations of the design to a data-dependent subgroup size (e.g., biomarker value > sample median) and multiple subgroups are discussed.

Biomarker Threshold Estimation to Predict Clinical Benefit: What Can Reasonably Be Learned During Early (Ph I/II) Oncology Development
Thomas Bengtsson
Genentech Inc.
thomasgb@gene.com
A key goal during early clinical co-development of a new therapeutic and a biomarker is to determine the "diagnostic positive group", i.e., to identify a sub-group of patients likely to receive a clinically meaningful treatment benefit. We show that, based on a typically sized Ph 1/Ph 2 study with n events < 100, accurate biomarker threshold estimation with time-to-event data is not a realistic goal. Instead, we propose to hierarchically test for treatment effects in pre-determined patient subsets most likely to benefit clinically. We illustrate our method with data from a recent lung cancer trial.

Session 68: New Challenges for Statistical Analyst/Programmer

Similarities and Differences in Statistical Programming among CRO and Pharmaceutical Industries
Mark Matthews
inVentiv Health Clinical
mrkmtthws@yahoo.com
Statistical programming in the clinical environment offers a wide range of opportunities across the clinical drug development cycle. Whether you are employed by a Contract Research Organization, a Pharmaceutical or Biotechnology company, or as a contractor, the programming tasks are often quite similar, and at times the work cannot be differentiated by your employer. However, the higher-level strategies and the direction any organization takes as an enterprise can be an important factor in the fulfillment of a statistical programmer's career. The author would like to share his experiences with the differences and similarities that a clinical statistical programmer can be offered in their career, and also provide some useful tips on how to best collaborate when working with your peer programmers from different industries.

Computational Aspects for Detecting Safety Signals in Clinical Trials
Jyoti Rayamajhi
Eli Lilly and Company
rayamajhi_jyoti@lilly.com
It is always a challenge to detect safety signals from adverse event (AE) data in clinical trials, which is a critical task in any drug development. In any trial, it is very desirable to describe and understand the safety of the compound to the fullest possible extent. The MedDRA coding scheme, e.g., System Organ Class (SOC) and Preferred Term (PT), is used in safety analyses and is hierarchical in nature. Bayesian hierarchical models are used to predict posterior probabilities; they also account for the fact that AEs in the same SOC are more likely to be similar, so they can sensibly borrow strength from each other. The model also allows borrowing strength across SOCs but does not impose it, depending on the actual data. It is interesting to see comparative analyses between the frequentist approach and an alternative Bayesian methodology for detecting safety signals in clinical trials. Computation for these hierarchical models is complex and challenging. Data from studies were used to fit three Bayesian logistic regression hierarchical models. Model selection is achieved using the Deviance Information Criterion (DIC). Models and plots were implemented using BRugs, R2WinBUGS, and JAGS. A scheme for meta-analysis using a hierarchical three-stage Bayesian mixture model is also implemented and will be discussed. A user-friendly and fully functional web interface for safety signal detection using Bayesian meta-analysis and a general three-stage hierarchical mixture model will be described. Keywords: System Organ Class, Preferred Term, Deviance Information Criterion, hierarchical models, mixture model.

Bayesian Network Meta-Analysis Methods: An Overview and a Case Study
Baoguang Han1, Wei Zou2 and Karen Price1
1Eli Lilly and Company
2inVentiv Clinical Health
han_baoguang@lilly.com
Evidence-based health-care decision making requires comparing all relevant competing interventions. In the absence of direct head-to-head comparisons of different treatments, network meta-analysis (NMA) is increasingly used for selecting the best treatment strategy for a health care intervention. The Bayesian approach offers a flexible framework for NMA, in part due to its ability to propagate the parameter correlation structure and to provide straightforward probability statements about the parameters of interest. In this talk, we will provide a general overview of Bayesian NMA models, including consistency models, network meta-regression, and inconsistency checking using node-splitting techniques. We will then illustrate how an NMA can be performed with a detailed case study, and provide some details on available software as well as various graphical and textual outputs that can be readily understood and interpreted by clinicians.
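[As background, the consistency assumption referred to in this abstract is commonly written as follows, using standard NMA notation (not necessarily the authors'): for treatments B and C both connected to a reference treatment A, the indirect effect implied by the network must agree with the direct effect,]

```latex
d_{BC}^{\text{ind}} \;=\; d_{AC} - d_{AB},
```

and node-splitting checks this equality one comparison at a time, by estimating the direct and indirect components of a chosen contrast separately and testing whether they differ.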

Session 69: Adaptive and Sequential Methods for Clinical Trials

Bayesian Data Augmentation Dose Finding with Continual Reassessment Method and Delayed Toxicities
Ying Yuan1, Suyu Liu1 and Guosheng Yin2
1University of Texas MD Anderson Cancer Center
2University of Hong Kong
yyuan@mdanderson.org
A major practical impediment when implementing adaptive dose-finding designs is that the toxicity outcome used by the decision rules may not be observed shortly after the initiation of the treatment. To address this issue, we propose the data augmentation continual reassessment method (DA-CRM) for dose finding. By naturally treating the unobserved toxicities as missing data, we show that such missing data are nonignorable, in the sense that the missingness depends on the unobserved outcomes. The Bayesian data augmentation approach is used to sample both the missing data and the model parameters from their posterior full conditional distributions. We evaluate the performance of the DA-CRM through extensive simulation studies and also compare it with other existing methods. The results show that the proposed design satisfactorily resolves the issues related to late-onset toxicities and possesses desirable operating characteristics: it treats patients more safely and selects the maximum tolerated dose with a higher probability.

Optimal Marker-Strategy Clinical Trial Design to Detect Predictive Markers for Targeted Therapy
Yong Zang, Suyu Liu and Ying Yuan
University of Texas MD Anderson Cancer Center
yzang1@mdanderson.org
In developing targeted therapy, the marker-strategy design provides an important approach to evaluate the predictive marker effect. This design first randomizes patients into non-marker-based or marker-based strategies. Patients allocated to the non-marker-based strategy are then further randomized to receive either the standard or the targeted treatment, while patients allocated to the marker-based strategy receive treatments based on their marker statuses. The predictive marker effect is tested by comparing the treatment outcomes between the two strategies. In this talk, we show that such a between-strategy comparison has low power to detect the predictive effect and is valid only under the restrictive condition that the randomization ratio within the non-marker-based strategy matches the marker prevalence. To address these issues, we propose a Wald test that is generally valid and also uniformly more powerful than the between-strategy comparison. Based on that, we derive an optimal marker-strategy design that maximizes the power to detect the predictive marker effect by choosing the optimal randomization ratios between the two strategies and treatments. Our numerical study shows that using the proposed optimal designs can substantially improve the power of the marker-strategy design to detect the predictive marker effect.

Dynamic Prediction of Time to Relapse Using Longitudinal Biomarker Data
Xuelin Huang1, Jing Ning1 and Sangbum Choi2
1University of Texas MD Anderson Cancer Center
2University of Texas Health Science Center at Houston
xlhuang@mdanderson.org
As time goes by, more and more data are observed for each patient. Dynamic prediction keeps making updated predictions of disease prognosis using all the available information. This work is motivated by the need for real-time monitoring of the disease progress of chronic myeloid leukemia patients using their BCR-ABL gene expression levels measured during follow-up visits. We provide real-time dynamic prediction of future prognosis using a series of marginal Cox proportional hazards models over continuous time, with constraints. Compared with separate landmark analyses at different discrete time points after treatment, our approach achieves smoother and more robust predictions. Compared with joint modeling of longitudinal biomarkers and survival, our approach does not need to specify a model for the changes of the monitored biomarkers, and thus avoids imputing biomarker values at time points where they are not available. This helps eliminate the potential bias introduced by misspecified models for the longitudinal biomarkers.

Continuous Tumor Size Change Percentage and Progression-Free Survival as Endpoints of the First and Second Stage, Respectively, in a Novel Double Screening Phase II Design
Ye Cui1, Zhibo Wang1, Yichuan Zhao1 and Zhengjia Chen2
1Georgia State University
2Emory University
cathysaiyo@gmail.com
A phase II trial is an expeditious and low-cost trial to screen potentially effective agents for a following phase III trial. Unfortunately, the positive rate of phase III trials is still low even though the agents have been determined to be effective in preceding phase II trials, mainly because different endpoints are used in phase II (tumor response) and phase III (survival) trials. A good disease response often leads to, but cannot guarantee, better survival. From a statistical standpoint, transformation of a continuous tumor size change into a categorical tumor response (complete response, partial response, stable disease, or progressive disease) according to World Health Organization (WHO) or Response Evaluation Criteria In Solid Tumors (RECIST) criteria results in a loss of study power. Tumor size change can be obtained rapidly, but survival estimation requires a long follow-up. We propose a novel double screening phase II design in which tumor size change percentage is used in the first stage to rapidly select potentially effective agents for the second stage, in which progression-free or overall survival is estimated to confirm the efficacy of the agents. The first screening fully utilizes all tumor size change data and minimizes the cost and length of the trial by stopping it when agents are determined to be ineffective based on a low standard, and the second screening can substantially increase the success rate of the following phase III trial by using similar or the same outcomes and a high standard. Simulation studies are performed to optimize the significance levels of the two screening stages and to compare the design's operating characteristics with Simon's two-stage design. ROC analysis is applied to estimate the success rate in the follow-up phase III trials.

Session 70: Survival Analysis

Comparison of Hazard Rate and Odds Ratio in the Two-Sample Survival Problem
Benedict Dormitorio and Joshua Naranjo
Western Michigan University
benedictpdormitorio@wmich.edu
Cox proportional hazards regression seems to be the standard statistical method for analyzing treatment efficacy when time-to-event data are available. In the absence of time-to-event data, investigators may use logistic regression, which does not require time-to-event, or Poisson regression, which requires only interval-summarized frequency tables of time-to-event. We investigate the relative performance of the three methods. In particular, we compare the power of tests based on the respective effect-size estimates: (1) hazard ratio, (2) odds ratio, and (3) rate ratio. We use a variety of survival distributions and cut-off points representing length of study. The results have implications for study design. For example, under what conditions might we recommend a simpler design based only on event frequencies instead of measuring time-to-event, and what length of study is recommended?

Predicting the Event Time in Multicenter Clinical Trials with Time-to-Event Outcome as Primary Endpoint
Nibedita Bandyopadhyay
Janssen Research & Development
nbandyop@its.jnj.com
Interim analyses are widely used in Phase II and III clinical trials, and the efficiency of the drug development process can be improved by using them. In clinical trials with time to an event as the primary endpoint, it is common to plan the interim analyses at pre-specified numbers of events. Performing these analyses at times with a different number of events than planned may impact the trial's credibility as well as the statistical properties of the interim analysis. On the other hand, significant resources are required to conduct such analyses. Therefore, for logistical planning purposes, it is very important to predict the timing of this pre-specified number of events early and accurately. A statistical technique for making such predictions in ongoing multicenter clinical trials is developed. Results are illustrated for different scenarios using simulations.

Empirical Comparison of Small Sample Performance for the Logrank Test and Resampling Methods with High Censoring Rates
Yu Deng and Jianwen Cai
University of North Carolina at Chapel Hill
yudeng@live.unc.edu
The logrank test is commonly used for comparing survival distributions between treatment and control groups. When the censoring rate is low and the sample size is moderate, the approximation based on the asymptotic normal distribution of the logrank test works well in finite samples. However, in some studies the sample size is small (e.g., 10-20 per group) and the censoring rate is high (e.g., 0.8-0.9). Under such conditions, we conduct a series of simulations to compare the performance of the logrank test based on the normal approximation, permutation, and the bootstrap. In general, the type I error rate based on the bootstrap test is slightly inflated when the number of failures is larger than 2, while the logrank test based on the normal approximation has a type I error around 0.05 and the permutation test is relatively conservative in type I error. However, when there is only one failure per group, the type I error of the permutation test is closer to 0.05 than that of the other two tests.
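[As background for the resampling comparison in this abstract, a permutation version of the logrank test can be sketched in a few lines of generic pure Python (our own illustrative implementation, not the authors' simulation code): the group labels are shuffled while each subject's observed time and event indicator stay fixed.]

```python
import random

def logrank_stat(times, events, groups):
    """Unstandardized logrank statistic: observed minus expected
    events in group 1, summed over the distinct event times."""
    stat = 0.0
    for t in sorted({x for x, d in zip(times, events) if d}):
        n = sum(1 for x in times if x >= t)  # number at risk
        n1 = sum(1 for x, g in zip(times, groups) if x >= t and g == 1)
        deaths = sum(1 for x, d in zip(times, events) if x == t and d)
        d1 = sum(1 for x, d, g in zip(times, events, groups)
                 if x == t and d and g == 1)
        stat += d1 - deaths * n1 / n
    return stat

def permutation_pvalue(times, events, groups, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the logrank statistic."""
    rng = random.Random(seed)
    observed = abs(logrank_stat(times, events, groups))
    labels = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if abs(logrank_stat(times, events, labels)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one to avoid zero p-values
```

With very few failures per group, the permutation distribution is highly discrete, which is one mechanism behind the conservativeness the abstract reports.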

Session 71: Complex Data Analysis: Theory and Application

Supervised Singular Value Decomposition and Its Asymptotic Properties
Gen Li1, Dan Yang2, Haipeng Shen1 and Andrew Nobel1
1University of North Carolina at Chapel Hill
2Rutgers University
haipeng@email.unc.edu
We develop a supervised singular value decomposition (SupSVD) model for supervised dimension reduction. The research is motivated by applications where the low-rank structure of the data of interest is potentially driven by additional variables measured on the same set of samples. The SupSVD model can make use of the information in the additional data to accurately extract underlying structures that are more interpretable. The model is very general and includes the principal component analysis model and the reduced-rank regression model as two extreme cases. We formulate the model in a hierarchical fashion using latent variables and develop a modified expectation-maximization algorithm for parameter estimation, which is computationally efficient. The asymptotic properties of the estimated parameters are derived. We use comprehensive simulations and two real data examples to illustrate the advantages of the SupSVD model.
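[For reference, a model of the kind described above is often written as follows; this is our paraphrase of the model class, and the notation should be checked against the published paper rather than taken as the authors' exact formulation.]

```latex
X \;=\; U V^{\top} + E, \qquad U \;=\; Y B + F,
```

where X (n by p) is the data matrix, Y (n by q) holds the additional supervision variables, V (p by r) the loadings, B (q by r) the coefficients, and E, F are mean-zero Gaussian error matrices. Setting B = 0 leaves an unsupervised low-rank model of the PCA type, while setting F = 0 makes the low-rank structure fully driven by Y as in reduced-rank regression, consistent with the "two extreme cases" mentioned in the abstract.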

New Methods for Interaction Selection
Ning Hao1, Hao Helen Zhang1 and Yang Feng2
1University of Arizona
2Columbia University
nhao@math.arizona.edu
It is a challenging task to identify interaction effects in high-dimensional data. The main difficulties lie in both computational and theoretical aspects. We propose a new framework for interaction selection. Efficient computational algorithms based on both forward selection and penalization approaches are illustrated.

A Statistical Approach to Set Classification by Feature Selection with Applications to Classification of Histopathology Images
Sungkyu Jung1 and Xingye Qiao2
1University of Pittsburgh
2Binghamton University, State University of New York
qiao@math.binghamton.edu
Set classification problems arise when classification tasks are based on sets of observations, as opposed to individual observations. In set classification, a classification rule is trained with N sets of observations, where each set is labeled with class information, and the prediction of a class label is also performed with a set of observations. Data sets for set classification appear, for example, in the diagnosis of disease based on multiple cell nucleus images from a single tissue. Relevant statistical models for set classification are introduced, which motivate a set classification framework based on context-free feature extraction. By understanding a set of observations as an empirical distribution, we employ a data-driven method to choose those features which contain information on location and major variation. In particular, the method of principal component analysis is used to extract the features of major variation. Multidimensional scaling is used to represent features as vector-valued points, on which conventional classifiers can be applied. The proposed set classification approaches achieve better classification results than competing methods in a number of simulated data examples. The benefits of our method are demonstrated in an analysis of histopathology images of cell nuclei related to liver cancer.

A Smoothing Spline Model for Analyzing dMRI Data of Swallowing
Binhuan Wang, Ryan Branski, Milan Amin and Yixin Fang
New York University
yixinfang@nyumc.org
Swallowing disorders are common and have a significant health impact. Dynamic magnetic resonance imaging (dMRI) is a novel technique for visualizing the pharynx and upper esophageal segment during a swallowing process. We develop a smoothing spline method for analyzing swallowing dMRI data. We apply the method to a data set obtained from an experiment conducted in the NYU Voice Center.

Session 72: Recent Developments in Statistical Methods for Missing Data

A Semiparametric Inference for Regression Analysis with Missing Covariates in Survey Data
Shu Yang and Jae-kwang Kim
Iowa State University
jkim@iastate.edu
We consider parameter estimation in parametric regression models with covariates missing at random in survey data. A semiparametric maximum likelihood approach is proposed, which requires no parametric specification of the marginal covariate distribution. We obtain an asymptotic linear representation of the semiparametric maximum likelihood estimator (SMLE) using the theory of von Mises calculus and V-statistics, which allows a consistent estimator of the asymptotic variance. An EM-type algorithm for computation is discussed. We extend the methodology to general parameter estimation, which is not necessarily equal to the MLE. Simulation results suggest that the SMLE method is robust, whereas the parametric maximum likelihood method is subject to severe bias under model misspecification.

Multiple Robustness in Missing Data Analysis
Peisong Han1 and Lu Wang2
1University of Waterloo
2University of Michigan
peisonghan@uwaterloo.ca
When estimating the population mean of a response variable subject to ignorable missingness, we propose an estimator that weights the complete cases using weights other than the inverse probability and is more robust than doubly robust estimators. We allow multiple models for both the propensity score and the outcome regression. Our estimator is consistent if any one of the multiple models is correctly specified. Such multiple robustness against model misspecification significantly improves over double robustness, which only allows one propensity score model and one outcome regression model. Our estimator attains the semiparametric efficiency bound when one propensity score model and one outcome regression model are correctly specified, without requiring knowledge of exactly which two are correct.
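For context, the doubly robust baseline that this work generalizes can be sketched in a few lines. The following is an illustrative augmented inverse-probability-weighted (AIPW) estimator with one propensity model and one outcome model, not the authors' multiply robust procedure; the simulated data and model choices are our own:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

def aipw_mean(x, y, observed):
    """Doubly robust (AIPW) estimate of E[Y] under ignorable missingness.

    x: (n, p) covariates, fully observed
    y: (n,) response, valid only where observed is True
    observed: (n,) boolean missingness indicator
    """
    # Propensity score model: P(observed | x)
    ps = LogisticRegression().fit(x, observed).predict_proba(x)[:, 1]
    # Outcome regression model: E[Y | x], fit on the complete cases
    m = LinearRegression().fit(x[observed], y[observed]).predict(x)
    r = observed.astype(float)
    y_filled = np.where(observed, y, 0.0)
    # Outcome-model prediction plus an IPW-weighted residual correction
    return np.mean(m + r * (y_filled - m) / ps)

# Toy data: Y = 1 + 2*X + noise, observation probability increases with X,
# so the complete-case mean is biased upward while AIPW is not.
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=(n, 1))
y = 1 + 2 * x[:, 0] + rng.normal(size=n)
p_obs = 1 / (1 + np.exp(-(0.5 + x[:, 0])))   # ignorable (MAR) missingness
observed = rng.uniform(size=n) < p_obs
est = aipw_mean(x, y, observed)              # should be near E[Y] = 1
```

The estimator is consistent if either the propensity model or the outcome model is correct; the abstract's contribution is to allow several candidate models of each kind simultaneously.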

Imputation of Binary Variables with SAS and IVEware
Yi Pan1 and Riguang Song1
1United States Centers for Disease Control and Prevention
jnu5@cdc.gov
In practice, it is a challenge to impute missing values of binary variables. For a monotone missing pattern, imputation methods available in SAS include the LOGISTIC method, which uses logistic regression modeling, and the DISCRIM method, which only allows continuous variables in the imputation model. For an arbitrary missing pattern, a fully conditional specification (FCS) method is now available in SAS. This method only assumes the existence of a joint distribution for all variables. On the other hand, IVEware, developed by the University of Michigan Survey Research Center, uses a sequence of regression models and imputes missing values by drawing samples from posterior predictive distributions. We present results from a series of simulations designed to evaluate and compare the performance of the above-mentioned imputation methods. An example that imputes the BED recency status (recent or long-standing) in estimating HIV incidence is used to illustrate the application of those procedures.
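As a language-neutral illustration of what a logistic-regression imputation method does for one incomplete binary variable, the sketch below fits a logistic model on the complete cases and draws imputed values from the fitted probabilities. This is our simplified analogue, not SAS or IVEware code; a full multiple-imputation procedure would also draw the model parameters, and all data here are made up:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def impute_binary(x, y, seed=1):
    """Single stochastic imputation of a binary variable y (np.nan = missing).

    Each missing value is drawn from its fitted conditional probability
    given the covariates x. (Proper multiple imputation would repeat this
    with parameter draws; this sketch keeps the parameters fixed.)"""
    rng = np.random.default_rng(seed)
    miss = np.isnan(y)
    model = LogisticRegression().fit(x[~miss], y[~miss].astype(int))
    p = model.predict_proba(x[miss])[:, 1]
    y_imp = y.copy()
    y_imp[miss] = (rng.uniform(size=miss.sum()) < p).astype(float)
    return y_imp

# Toy data: y depends on one covariate; 30% of y is missing completely at random
rng = np.random.default_rng(0)
x = rng.normal(size=(500, 1))
y = (rng.uniform(size=500) < 1 / (1 + np.exp(-2 * x[:, 0]))).astype(float)
y[rng.uniform(size=500) < 0.3] = np.nan
y_complete = impute_binary(x, y)
```

Drawing from the predictive probability, rather than rounding it, preserves the binary variable's variability, which is the point of the stochastic imputation methods the abstract compares.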

Marginal Treatment Effect Estimation Using a Pattern-Mixture Model
Zhenzhen Xu
United States Food and Drug Administration
zhenzhen.xu@fda.hhs.gov
Missing data often occur in clinical trials. When the missingness depends on unobserved responses, a pattern-mixture model is frequently used. This model stratifies the data according to drop-out patterns and formulates a model for each pattern with specific parameters. The resulting marginal distribution of the response is a mixture of distributions over the missing data patterns. If the eventual interest is to estimate the overall treatment effect, one can calculate a weighted average of pattern-specific treatment effects, assuming that the treatment assignment is equally distributed across patterns. However, in practice this assumption is unlikely to hold; as a result, the weighted average approach is subject to bias. In this talk we introduce a new approach to estimating the marginal treatment effect based on a random-effects pattern-mixture model for longitudinal studies with a continuous endpoint, relaxing the homogeneous distributional assumption on treatment assignment across missing data patterns. A simulation study shows that, under a missing-not-at-random mechanism, the proposed approach can yield a substantial reduction in estimation bias and improvement in coverage probability compared to the weighted average approach. The proposed method is also compared with the linear mixed model and generalized estimating equation approaches under various missing data mechanisms.

Session 73: Machine Learning Methods for Causal Inference in Health Studies

Causal Inference of Interaction Effects with Inverse Propensity Weighting, G-Computation and Tree-Based Standardization
Joseph Kang1, Xiaogang Su2, Lei Liu1 and Martha Daviglus3
1Northwestern University
2University of Texas at El Paso
3University of Illinois at Chicago
joseph-kang@northwestern.edu
Given the recent interest in subgroup-level studies and personalized medicine, health research with observational studies has turned to interaction effects of measured confounders. In estimating interaction effects, the inverse propensity weighting (IPW) method has been widely advocated despite the immediate availability of other competing methods such as G-computation estimates. This talk compares the advocated IPW method, the G-computation method, and our new tree-based standardization method, which we call the Interaction effect Tree (IT). The IT procedure uses a likelihood-based decision rule to divide the subgroups into homogeneous groups where G-computation can be applied. Our simulation studies indicate that the IT-based method, along with G-computation, works robustly, while the advocated IPW method needs some caution in its weighting. We applied the IT-based method to assess the effect of being overweight or obese on coronary artery calcification (CAC) in the Chicago Healthy Aging Study cohort.

Practice of Causal Inference with the Propensity of Being Zero or One
Joseph Kang1, Wendy Chan1, Mi-Ok Kim2 and Peter M. Steiner3
1Northwestern University
2University of Cincinnati / Cincinnati Children's Hospital Medical Center
3University of Wisconsin-Madison
wendychan2016@u.northwestern.edu
Causal inference methodologies have been developed over the past decade to estimate the unconfounded effect of an exposure under several key assumptions. These assumptions include the absence of unmeasured confounders, the independence of the effect of one study subject from another, and propensity scores being bounded away from zero and one (the positivity assumption). The first two assumptions have received much attention in the literature, yet the positivity assumption has been discussed in only a few recent papers. Propensity scores of zero or one are indicative of deterministic exposure, so that causal effects cannot be defined for these subjects; such subjects therefore need to be removed, because no comparable comparison groups can be found for them. In this paper we evaluate and compare currently available causal inference methods in the context of the positivity assumption. We propose a tree-based method that can be easily implemented in R software. R code for the studies is available online.

Propensity Score and Proximity Matching Using Random Forest
Peng Zhao1, Xiaogang Su2 and Juanjuan Fan1
1San Diego State University
2University of Texas at El Paso
jjfan@mail.sdsu.edu
To reduce potential bias in observational studies, it is essential to have balanced distributions of all available background information between cases and controls. The propensity score has been a key matching variable in this area. However, this approach has several limitations, including difficulties in handling missing values, categorical variables, and interactions. Random forest, as an ensemble of many classification trees, is straightforward to use and can easily overcome those issues. Each classification tree in a random forest recursively partitions the available data set into subsets to increase the purity of the terminal nodes. With this process, the cases and controls in the same terminal node automatically become the best balanced match. By averaging the outcome of each individual tree, random forest can provide robust and balanced matching results. The proposed method is applied to data from the National Health and Nutrition Examination Survey (NHANES).
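A minimal sketch of the tree-based proximity idea described above, using scikit-learn (our illustration; the authors' implementation may differ): two subjects' proximity is the fraction of trees in which they land in the same terminal node, and each treated subject is matched to the control with the highest proximity. The covariates and treatment rule below are made up:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def rf_proximity(x, treatment, n_trees=200, seed=0):
    """Random-forest proximity matrix: prox[i, j] is the fraction of trees
    in which subjects i and j fall into the same terminal node."""
    rf = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    rf.fit(x, treatment)
    leaves = rf.apply(x)  # (n_samples, n_trees) array of leaf indices
    return (leaves[:, None, :] == leaves[None, :, :]).mean(axis=2)

# Toy observational data: treatment probability depends on the first covariate
rng = np.random.default_rng(0)
x = rng.normal(size=(80, 3))
treatment = (x[:, 0] + rng.normal(size=80) > 0).astype(int)
prox = rf_proximity(x, treatment)

# Match each treated subject to the control with the highest proximity
treated = np.where(treatment == 1)[0]
controls = np.where(treatment == 0)[0]
matches = controls[prox[np.ix_(treated, controls)].argmax(axis=1)]
```

Because the trees split on whatever covariates best separate treated from control subjects, high-proximity pairs are balanced on those covariates, which is the matching property the abstract exploits.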

Session 74: JP Hsu Memorial Session

Weighted Least-Squares Method for Right-Censored Data in Accelerated Failure Time Model
Lili Yu
Georgia Southern University
lyu@georgiasouthern.edu
The classical accelerated failure time (AFT) model has been extensively investigated due to its direct interpretation of the covariate effects on the mean survival time in survival analysis. However, this classical AFT model and its associated methodologies are built on the fundamental assumption of data homoscedasticity. Consequently, when the homoscedasticity assumption is violated, as is often seen in real applications, the estimators lose efficiency and the associated inference is not reliable. Furthermore, none of the existing methods can estimate the intercept consistently. To overcome these drawbacks, we propose a semiparametric approach in this paper for both homoscedastic and heteroscedastic data. This approach utilizes a weighted least-squares equation with synthetic observations weighted by the square roots of their variances, where the variances are estimated via local polynomial regression. We establish the limiting distributions of the resulting coefficient estimators and prove that both the slope parameters and the intercept can be consistently estimated. We evaluate the finite sample performance of the proposed approach through simulation studies and demonstrate its superiority in efficiency and reliability over the existing methods through a real example with heteroscedastic data.

A Comparison of Size and Power of Tests of Hypotheses on Parameters Based on Two Generalized Lindley Distributions
Macaulay Okwuokenye
Biogen Idec
macaulay.okwuokenye@biogenidec.com
Data (complete and censored) following the Lindley distribution are generated and analyzed using two generalized Lindley distributions, and maximum likelihood estimates of parameters from the generalized Lindley distributions are obtained. Size and power of tests of hypotheses on the parameters are assessed, drawing on asymptotic properties of the maximum likelihood estimators. Results suggest that, whereas the sizes of some of the tests of hypotheses based on the considered generalized distributions are essentially alpha-level, some possibly are not; the power of tests of hypotheses on the Lindley distribution parameter differs between the two distributions.

Session 75: Challenge and New Development in Model Fitting and Selection

Robust Estimates of Divergence Times and Selection with a Poisson Random Field Model
Amei Amei1 and Brian Tilston Smith2
1University of Nevada at Las Vegas
2American Museum of Natural History
amei.amei@unlv.edu
Mutation frequencies can be modeled as a Poisson random field (PRF) to estimate speciation times and the degree of selection on newly arisen mutations. This approach provides a quantitative theory for comparing intraspecific polymorphism with interspecific divergence in the presence of selection and can be used to estimate population genetic parameters. First, we modified a recently developed time-dependent PRF model to independently estimate genetic parameters from a nuclear and mitochondrial DNA data set of 22 sister pairs of birds that have diverged across a biogeographic barrier. We found that species that inhabit humid habitats had more recent divergence times, larger effective population sizes, and smaller selective effects than those that inhabit drier habitats, but overall the mitochondrial DNA was under weak selection. Our study indicates that PRF models are useful for estimating various population genetic parameters and serve as a framework for incorporating estimates of selection into comparative phylogeographic studies. Second, due to the built-in feature of the species divergence time, the time-dependent PRF model is especially suitable for estimating selective effects of more recent mutations, such as the mutations that have occurred in the human genome. By analyzing the estimated distribution of the selective coefficients at each individual gene, for example the sign and magnitude of the mean selection coefficient, we will be able to detect a gene or a group of genes that are related to the diagnosed cancer. Moreover, the estimate of the species divergence time will provide useful information regarding the occurrence time of the cancer.

On a Class of Maximum Empirical Likelihood Estimators Defined by Convex Functions
Hanxiang Peng and Fei Tan
Indiana University-Purdue University Indianapolis
ftan@math.iupui.edu
In this talk, we introduce a class of estimators defined by convex criterion functions and show that they are maximum empirical likelihood estimators (MELEs). We apply the results to obtain MELEs for quantiles, quantile regression, and Cox regression when additional information is available. We report some simulation results and real data applications.

Properties of the Marginal Survival Functions for Dependent Censored Data under an Assumed Archimedean Copula
Antai Wang
New Jersey Institute of Technology
aw224@njit.edu
Given dependent censored data (X, δ) = (min(T, C), I(T < C)) from an Archimedean copula model, we give general formulas for the possible marginal survival functions of T and C. Based on our formulas, we can easily establish the relationship between all these survival functions and derive some useful identifiability results. Also based on our formulas, we propose a new estimator of the marginal survival function when the Archimedean copula model is assumed to be known. We derive bias formulas for our estimator and other existing estimators. Simulation studies have shown that our estimator is comparable with the copula-graphic estimator proposed by Zheng and Klein (1995) and Rivest and Wells (2001), and with Zheng and Klein's estimator (1994), under the Archimedean copula assumption. We end our talk with some discussions.
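To illustrate the problem this work addresses, the sketch below (ours, not the proposed estimator) simulates (T, C) from a Clayton copula, a member of the Archimedean family, and checks that the standard Kaplan-Meier estimator, which assumes independent censoring, need not recover the true marginal survival function; the parameter values are arbitrary:

```python
import numpy as np

def kaplan_meier(time, event, grid):
    """Kaplan-Meier estimate of S(t), evaluated at the points in `grid`.
    Assumes continuous data (no tied event times)."""
    order = np.argsort(time)
    t, e = time[order], event[order]
    at_risk = t.size
    km_t, km_s = [0.0], [1.0]
    s = 1.0
    for ti, ei in zip(t, e):
        if ei:
            s *= 1.0 - 1.0 / at_risk
            km_t.append(ti)
            km_s.append(s)
        at_risk -= 1
    idx = np.searchsorted(np.array(km_t), grid, side="right") - 1
    return np.array(km_s)[idx]

# (T, C) with unit-exponential marginals whose dependence follows a Clayton
# copula with theta = 5 (Kendall's tau = theta/(theta+2), about 0.71),
# sampled by the standard conditional-inversion method.
rng = np.random.default_rng(0)
n, theta = 20000, 5.0
u = rng.uniform(size=n)
w = rng.uniform(size=n)
v = (u ** (-theta) * (w ** (-theta / (1 + theta)) - 1) + 1) ** (-1 / theta)
T = -np.log(1 - u)                    # true marginal survival: S(t) = exp(-t)
C = -np.log(1 - v)                    # dependent censoring time
X, delta = np.minimum(T, C), (T <= C).astype(int)

grid = np.array([0.5, 1.0, 1.5])
km = kaplan_meier(X, delta, grid)     # naive KM under dependent censoring
true_surv = np.exp(-grid)             # the dependence biases km away from this
```

Estimators that exploit a known Archimedean copula, such as the copula-graphic estimator cited in the abstract, correct exactly this bias.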

Dual Model Misspecification in Generalized Linear Models with Error in Variables
Xianzheng Huang
University of South Carolina
huang@stat.sc.edu
We study maximum likelihood estimation of regression parameters in generalized linear models for a binary response with error-prone covariates when the distribution of the error-prone covariate or the link function is misspecified. We revisit the remeasurement method proposed by Huang, Stefanski and Davidian (2006) for detecting latent-variable model misspecification and examine its operating characteristics in the presence of link misspecification. Furthermore, we propose a new diagnostic method for assessing assumptions on the link function. Combining these two methods yields informative diagnostic procedures that can identify which model assumption is violated and also reveal the direction in which the true latent-variable distribution or the true link function deviates from the assumed one.

Session 76: Advanced Methods and Their Applications in Survival Analysis

Kernel Smoothed Profile Likelihood Estimation in the Accelerated Failure Time Frailty Model for Clustered Survival Data
Bo Liu1, Wenbin Lu1 and Jiajia Zhang2
1North Carolina State University
2University of South Carolina
jzhang@mailbox.sc.edu
Clustered survival data frequently arise in biomedical applications, where event times of interest are clustered into groups such as families. In this article we consider an accelerated failure time frailty model for clustered survival data and develop nonparametric maximum likelihood estimation for it via a kernel smoother-aided EM algorithm. We show that the proposed estimator for the regression coefficients is consistent, asymptotically normal, and semiparametrically efficient when the kernel bandwidth is properly chosen. An EM-aided numerical differentiation method is derived for estimating its variance. Simulation studies evaluate the finite sample performance of the estimator, and it is applied to the Diabetic Retinopathy data set.

Model-free Screening for Lifetime Data Analysis with Ultrahigh-Dimensional Biomarkers: Survival Impacting Index
Jialiang Li1, Qi Zheng2 and Limin Peng2
1National University of Singapore
2Emory University
qizheng@emory.edu
Marginal regression-based ranking methods are widely adopted to screen ultrahigh-dimensional biomarkers in biomedical studies. An assumed regression model may not fit real data in practice. We consider a model-free screening approach specifically for censored lifetime data, measuring the average survival differences with and without the covariates. The proposed survival impacting index can be implemented with familiar nonparametric estimation procedures and avoids imposing any rigid model assumptions. We establish the sure screening property of the index and the asymptotic distribution of the estimated index to facilitate inferences. Simulations are carried out to assess the performance of our method. A lung cancer data set is analyzed as an illustration.

Analysis of Event History Data in Tuberculosis (TB) Screening
Joan Hu
Simon Fraser University
joanh@stat.sfu.ca
Tuberculosis (TB) is an infectious disease spread by the airborne route. An important public health intervention in TB prevention is tracing individuals (TB contacts) who may be at risk of having TB infection or active TB disease as a result of having shared air space with an active TB case. This talk presents an analysis of data collected on 7,921 people identified as contacts in the TB registry of British Columbia, Canada, in an attempt to identify risk factors for TB development among TB contacts. Challenges encountered in the analysis include clustered subjects, covariates missing not at random (MNAR or NMAR), and a portion of subjects who potentially will never experience the event of TB.

On the Dependence Structure of Bivariate Recurrent Event Processes: Inference and Estimation
Jing Ning1, Yong Chen2, Chunyan Cai2, Xuelin Huang1 and Mei-Cheng Wang3
1University of Texas MD Anderson Cancer Center
2University of Texas Health Science Center at Houston
3Johns Hopkins University
jning@mdanderson.org
Bivariate or multivariate recurrent event processes are often encountered in longitudinal studies in which more than one type of event is of interest. There has been much research on regression analysis for such data, but little has been done to address the problem of how to measure dependence between two types of recurrent event processes. We propose a time-dependent measure, termed the rate ratio, to assess the local dependence between two types of recurrent event processes. We model the rate ratio as a parametric regression function of time and leave unspecified all other aspects of the distribution of the bivariate recurrent event processes. We develop a composite-likelihood procedure for model fitting and parameter estimation. We show that the proposed composite-likelihood estimator possesses consistency and asymptotic normality. The finite sample performance of the proposed method is evaluated through simulation studies and illustrated by an application to data from a soft tissue sarcoma study.

Session 77: High Dimensional Variable Selection and Multiple Testing

On Procedures Controlling the False Discovery Rate for Testing Hierarchically Ordered Hypotheses
Gavin Lynch and Wenge Guo
New Jersey Institute of Technology
wengeguo@njit.edu
Complex large-scale studies, such as those related to microarrays and quantitative trait loci, often involve testing multiple hierarchically ordered hypotheses. However, most existing false discovery rate (FDR) controlling procedures do not exploit the inherent hierarchical structure among the tested hypotheses. In this talk, I present key developments toward controlling the FDR when testing hierarchically ordered hypotheses. First, I offer a general framework under which hierarchical testing procedures can be developed. Then, I present hierarchical testing procedures which control the FDR under various forms of dependence. Simulation studies show that these proposed methods can be more powerful than alternative methods.
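As a point of reference for the hierarchical procedures discussed above, the classical structure-agnostic Benjamini-Hochberg step-up procedure that they generalize can be written in a few lines; the p-values below are made up for illustration:

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure controlling the FDR at alpha.
    Returns a boolean array marking which hypotheses are rejected."""
    p = np.asarray(pvals)
    m = p.size
    order = np.argsort(p)
    thresh = alpha * np.arange(1, m + 1) / m      # i * alpha / m
    below = p[order] <= thresh
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.where(below)[0])            # largest i with p_(i) <= i*alpha/m
        reject[order[: k + 1]] = True             # reject the k+1 smallest p-values
    return reject

pvals = [0.001, 0.002, 0.009, 0.04, 0.2, 0.5]
rej = benjamini_hochberg(pvals)
```

Hierarchical variants restrict which hypotheses may be tested (e.g., a child only after its parent is rejected), which is how they gain power while retaining FDR control.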

Sufficient Dimension Reduction in Binary Classification
Seung Jun Shin1, Yichao Wu2, Hao Helen Zhang3 and Yufeng Liu4
1University of Texas MD Anderson Cancer Center
2North Carolina State University
3University of Arizona
4University of North Carolina at Chapel Hill
wu@stat.ncsu.edu
Reducing the dimensionality of data is essential for binary classification with high-dimensional covariates. In the context of sufficient dimension reduction (SDR), most, if not all, existing SDR methods suffer in binary classification. In this talk, we target the SDR problem directly for binary classification and propose a new method based on support vector machines. The new method is supported by both numerical evidence and theoretical justification.

Rate Optimal Multiple Testing Procedure (ROMP) in High-Dimensional Regression
Zhigen Zhao1 and Pengsheng Ji2
1Temple University
2University of Georgia
psji@uga.edu
The variable selection and multiple testing problems for regression have almost the same goal: identifying the important variables among many. Research has focused on selection consistency, which is possible only if the signals are sufficiently strong; on the contrary, the signals in more modern applications are usually rare and weak. In this paper we develop a two-stage testing procedure, named ROMP, short for the Rate Optimal Multiple testing Procedure, because it achieves the fastest convergence rate of the marginal false non-discovery rate (mFNR) while controlling the marginal false discovery rate (mFDR) at any designated level alpha asymptotically.

Pathwise Calibrated Active Shooting Algorithm with Application to Semiparametric Graph Estimation
Tuo Zhao1 and Han Liu2
1Johns Hopkins University
2Princeton University
hanliu@princeton.edu
Pathwise coordinate optimization, combined with the active set strategy, is arguably one of the most popular computational frameworks for high-dimensional problems. It is conceptually simple, easy to implement, and applicable to a wide range of convex and nonconvex problems. However, there is still a gap between its theoretical justification and practical success. For high-dimensional convex problems, existing theories only show sublinear rates of convergence; for nonconvex problems, almost no theory on the rates of convergence exists. To bridge this gap, we propose a novel unified computational framework, named PICASA, for pathwise coordinate optimization. The main difference between PICASA and existing pathwise coordinate descent methods is that we exploit a proximal gradient pilot to identify an active set. Such a modification, though simple, has profound impact: with high probability, PICASA attains a global geometric rate of convergence to a unique sparse local solution with good statistical properties (e.g., minimax optimality, oracle property) for solving a large family of convex and nonconvex problems. Unlike most existing analyses, which assume that all the computation can be carried out exactly without worrying about numerical precision, our theory explicitly counts the numerical computation accuracy and thus is more realistic. The PICASA method is quite general and can be combined with different coordinate descent strategies, such as cyclical coordinate descent, greedy coordinate descent, and randomized coordinate descent. As an application, we apply the PICASA method to a family of nonconvex optimization problems motivated by estimating semiparametric graphical models. The PICASA method allows us to obtain new statistical recovery results on both parameter estimation and graph selection consistency, which do not exist in the existing literature. Thorough numerical results are also provided to back up our theoretical arguments.


Index of Authors

Abanto-Valle C 19, 38
Abe N 30, 78
Ahn S 30, 78
Akacha M 31, 82
Allen GI 31, 82
Amei A 34, 91
Amin M 33, 89
Apanasovich TV 29, 74
Artemiou A 31, 81
Au S 32, 85
Aue A 24, 56
TZ (author) 27, 67

Bai X 26, 61
Baiocchi M 28, 71
Bakanda C 21, 42
Baker Y 31, 82
Balasubramanian K 26, 60
Ball G 21, 44
Bandyopadhyay N 33, 88
Bao S 32, 32, 84, 84
Barrdahl M 22, 49
Bayman EO 30, 77
Becker K 21, 42
Bengtsson T 33, 86
Berger TW 21, 45
Bernhardt P 26, 63
Beyene J 21, 42
Bhamidi S 29, 72
Bidouard J 20, 39
Blocker AW 20, 40
Boerwinkle E 31, 79
Bornn L 20, 39
Boye ME 20, 40
Brannath W 23, 50
Branski R 33, 89
Braun T 22, 47
Breidt J 24, 55
Bretz F 23, 50
Brown ER 28, 69
Brown M 24, 54

Cai C 23, 34, 53, 92
Cai J 31, 33, 81, 88
Campbell J 19, 38
Candille S 22, 47
Cao G 22, 49
Carriere KC 30, 79
Cepurna WO 32, 85
Chan G 21, 45
Chan W 34, 90
Chang H 31, 80
Chang J 26, 63
Chatterjee A 31, 81
Chatterjee N 22, 49
Chen B 28, 70
Chen G 29, 32, 71, 85
Chen H 29, 74
Chen L 28, 69
Chen M 19, 20, 21, 23, 29, 38, 40, 44, 52, 73
Chen Q 31, 82
Chen R 31, 79
Chen S 25, 28, 58, 70
Chen T 31, 80
Chen X 26, 61
Chen Y 23, 24, 34, 53, 54, 92
Chen Z 22, 29, 33, 49, 73, 87
Cheng G 19, 36
Cheng X 20, 39
Cheng Y 21, 27, 44, 65
Chervoneva I 29, 74
Cheung YK 27, 64
Chi E 29, 75
Chiang AY 28, 68
Chiruvolu P 21, 44
Cho J 23, 24, 52, 54
Cho S 30, 78
Choi D 32, 85
Choi DS 24, 54
Choi S 22, 33, 48, 87
Chu R 21, 42
Chuang-Stein C 20, 42
Chun H 26, 61
Coan J 27, 67
Colantuoni E 28, 71
Collins R 21, 42
Coneelly K 22, 47
Cook R 28, 70
Coram M 22, 47
Crespi C 23, 50
Cui L 30, 77
Cui Y 22, 33, 46, 87

D'Amico E 21, 42
Díaz I 28, 71
Dabuxilatu W 29, 72
Dai J 27, 65
Daviglus M 34, 90
DeFor T 21, 45
Degras D 27, 67
Deng K 24, 55
Deng Y 33, 88
Dey D 19, 38, 64
Dey DK 19, 38
Dey J 21, 44
Di Y 19, 37
Dinwoodie I 30, 78
Djorgovski G 20, 39
Dominici F 28, 68
Donalek C 20, 39
Dong G 23, 50
Dong Y 31, 31, 81, 81
Dormitorio B 33, 88
Drake A 20, 39
Du Z 24, 54
Duan Y 19, 38
Dunloop D 32, 84
Dyk DV 20, 38

Edlefsen PT 21, 43
Elliott M 21, 42
Etzioni R 32, 32, 83, 83

Fan B 32, 85
Fan J 34, 90
Fan Y 26, 63
Fang L 21, 25, 45, 57
Fang Y 33, 89
Faries D 26, 61
Faruquie T 30, 78
Fei T 24, 54
Feng H 22, 47
Feng Y 29, 31, 33, 71, 81, 88
Feng Z 32, 85
Fink J 21, 44
Fisch R 28, 68
Franceschini N 31, 79
Freydin B 29, 74
Fu H 23, 25, 25, 29, 50, 59, 59, 74

Gaines D 19, 38
Gao B 25, 57
Gentleman R 19
Gneiting T 32, 84
Gong Q 21, 45
Graham M 20, 39
Gu C 32, 85
Guan W 22, 48
Gulati R 32, 83
Gulukota K 24, 53
Guo S 25, 57
Guo W 35, 92
Guo X 19, 36

Ha MJ 23, 51
Hale MD 25, 57
Han B 33, 87
Han L 25, 57
Han P 34, 89
Han SW 29, 73
Haneuse S 28, 68
Hannig J 23, 27, 51, 66
Hao N 33, 88
He K 26, 63
He QA 28, 68
He T 22, 46
He W 20, 24, 42, 57
He X 23, 53
He Y 19, 36
Heitjan DF 25, 32, 59, 84
Hernandez-Stumpfhauser D 24, 55
Ho S 30, 75
Hong CS 30, 78
Hong H 30, 78
Hong Y 19, 38
Hopcroft J 27, 65
Hormann S 24, 56
Hou L 23, 52
Houseman EA 20, 40
Hsu C 25, 57
Hsu L 20, 41
Hsu W 22, 49
Hu J 34, 92
Hu M 24, 55
Hu P 22, 49
Hu Y 19, 37
Huang C 21, 21, 45, 45
Huang J 25, 60
Huang M 31, 81
Huang X 29, 33, 34, 34, 74, 87, 91, 92
Huang Y 23, 26, 51, 62
Hubbard R 32, 83
Huerta G 27, 67
Hung HJ 30, 76
Hung J 64
Huo X 22, 48

Ibrahim JG 20, 31, 40, 82
Inoue LY 32, 83
Islam SS 20, 42

Jackson C 27, 67
Ji P 35, 92
Ji Y 24, 24, 28, 53, 53, 68
Jia N 25, 59
Jia X 27, 64
Jiang H 19, 30, 36, 78
Jiang Q 20, 21, 42, 44
Jiang X 19, 38
Jiang Y 19, 36
Jiao X 20, 38
Jin Z 21, 44
Johnson EC 32, 85
Johnson K 26, 62
Joshi AD 22, 49
Joslyn S 32, 84
Jung S 33, 89
Justice AC 25, 59

Kai B 31, 81
Kambadur A 30, 78
Kang J 31, 34, 34, 81, 90, 90
Katki H 32, 83
Kim DW 33, 86
Kim J 34, 89
Kim JK 28, 70
Kim M 34, 90
Kim S 31, 79
Kim Y 22, 48
Kolivras K 19, 38
Kong L 29, 75
Kooperberg C 27, 65
Kosorok MR 29, 71
Kovalchik S 21, 42
Kracht K 21, 44
Kraft P 22, 49
Kuo H 22, 48
Kuo RC 19, 38
Kwon M 25, 60

Lai M 32, 82
Lai RCS 23, 51
Lai T 28, 71
Lai TL 28, 33, 71, 86
Landon J 32, 85
Lang K 30, 78
Lavori PW 28, 71
Leary E 27, 67
Lebanon G 26, 60
Lecci F 20, 39
Lee CH 21, 45
Lee J 24, 32, 53, 84
Lee KH 28, 68
Lee M 30, 76
Lee MT 24, 56
Lee S 27, 64
Lee SY 25, 60
Lee TCM 23, 51
Lenzenweger MF 21, 43
Leu CS 27, 65
Levin B 27, 65
Levy DL 21, 43
Li C 22, 48
Li D 31, 80
Li F 27, 67
Li G 23, 27, 33, 50, 66, 88
Li H 23, 50
Li J 19, 34, 37, 38, 91
Li L 23, 26, 26, 31, 52, 60, 61, 80
Li M 19, 22, 37, 48
Li P 27, 65
Li R 26, 62
Li X 23, 49
Li Y 23, 25, 25, 26, 29, 30, 53, 59, 59, 63, 75, 79
Li-Xuan L 29, 73
Lian H 19, 36
Liang B 20, 41
Liang F 24, 53
Liang H 27, 65
Liao OY 33, 86
Lim J 22, 48
Lin D 28, 31, 69, 79
Linder D 31, 81
Lindquist M 27, 67
Lipshultz S 26, 62
Lipsitz S 26, 62
Liu B 34, 91
Liu D 20, 22, 41, 46
Liu H 28, 35, 70, 92
Liu J 26, 61
Liu JS 24, 55
Liu K 27, 66
Liu L 34, 90
Liu M 20, 39, 40
Liu R 29, 73
Liu S 33, 87
Liu X 20, 21, 41, 44
Liu XS 24, 54
Liu Y 22, 35, 46, 92
Liu Z 31, 82
Long Q 28, 69
Lonita-Laza I 20, 40
Lou X 25, 60
Lozano A 30, 78
Lu T 27, 65
Lu W 20, 34, 39, 91
Lu Y 20, 32, 39, 85
Luo R 27, 65
Luo S 23, 51
Luo X 21, 30, 45, 77
Lv J 26, 63
Lynch G 35, 92

Ma H 20, 22, 42, 49
Ma J 29, 72
Ma P 20, 40
Ma TF 24, 56
Ma Z 22, 46
Maca J 30, 76
Mahabal A 20, 39
Mai Q 26, 64
Majumdar AP 27, 66
Malinowski A 21, 46
Mandrekar V 22, 46
Manner D 23, 50
Marniquet X 20, 39
Martin R 27, 66
Martino S 21, 42
Matthews M 33, 86
Maurer W 23, 50
McGuire V 32, 85
McIsaac M 28, 70
McKeague IW 31, 80
Meng X 27, 64, 66
Mesbah M 24, 56
Mi G 19, 37
Mias GI 19, 37
Michailidis G 29, 72
Mills EJ 21, 42
Min X 28, 68
Mitra R 24, 53
Mizera I 29, 75
Molinaro A 28, 69
Monsell BC 30, 78
Morgan CJNA 21, 43
Morrison JC 32, 85
Mueller P 24, 28, 53, 68

Nachega JB 21, 42
Naranjo J 33, 88
Nettleton D 23, 51
Nguyen HQ 22, 47
Nie L 29, 75
Nie X 23, 51
Ning J 23, 28, 30, 33, 34, 53, 70, 78, 87, 92
Nobel A 33, 88
Nobel AB 29, 72
Nordman DJ 23, 51
Norinho DD 24, 56
Normand S 25
North KE 31, 79
Norton JD 20, 41
Nosedal A 27, 67

Offen W 64
Ogden RT 29, 74
Ohlssen D 28, 68
Okwuokenye M 34, 90
Olshen A 28, 69
Owen AB 27, 66
Ozekici S 32, 85

Paik J 28, 71
Pan G 30, 76
Pan J 31, 80
Pan Y 34, 89
Park D 28, 67
Park DH 22, 48
Park S 64
Park T 25, 60
Pati D 26, 62
Peng H 31, 34, 80, 91
Peng J 19, 37
Peng L 26, 34, 62, 91
Perry P 24, 54
Peterson C 31, 81
Phoa FKH 31, 79
Pinheiro J 25, 57
Planck SR 32, 85
Prentice R 20, 41
Price K 23, 33, 50, 87
Prisley S 19, 38
Pullenayegum E 21, 42

Qazilbash M 22, 47
Qi X 27, 65
Qian PZG 23, 51
Qiao X 29, 33, 71, 89
Qin J 21, 28, 45, 70
Qin R 27, 64
Qin ZS 24, 55
Qiu J 31, 79
Qiu Y 29, 73
Qu A 32, 82
Quartey G 20, 42

Raftery A 32, 84
Ravikumar P 31, 82
Rayamajhi J 33, 86
Ren Z 29, 73
Rohe K 24, 54
Rosales M 21, 43
Rosenbaum JT 32, 85
Rosenblum M 28, 71
Rube HT 19, 37
Rubin D 29, 74

Saegusa T 24, 54
Salzman J 19, 36
Samawi H 31, 81
Samorodnitsky G 27, 65
Samworth RJ 31, 81
Schafer DW 19, 37
Schlather M 21, 46
Schmidli H 31, 82
Schrag D 28, 68
Scott J 20, 42
Shadel W 21, 42
Shao Y 25, 57
Shariff H 20, 38
She B 32, 84
Shen H 33, 88
Shen W 20, 30, 40, 78
Shen Y 28, 70
Shepherd J 32, 85
Shi P 32, 82
Shih M 28, 71
Shin J 30, 78
Shin SJ 35, 92
Shojaie A 24, 29, 54, 72
Shu X 32, 82
Shui M 32, 84
Sienkiewicz E 21, 46
Simon N 33, 86
Simon R 33, 86
Sinha D 26, 62
Sloughter JM 32, 84
Smith B 25, 57
Smith BT 34, 91
Snapinn S 21, 44
Song C 28, 68
Song D 21, 45
Song J 32, 84
Song JS 19, 37
Song M 22, 49
Song R 23, 34, 51, 89
Song X 20, 40
Soon G 29, 30, 75, 75
Sorant AJ 25, 59
Soyer R 32, 85
Sriperambadur B 26, 60
Steiner PM 34, 90
Stingo F 31, 81
Strawderman R 28, 69
Su X 26, 34, 61, 90
Su Z 26, 61
Suh EY 30, 76
Suktitipat B 25, 59
Sun D 27, 66
Sun J 23, 53
Sun N 22, 48
Sun Q 29, 72
Sun T 22, 46
Sun W 23, 51
Sung H 25, 59
Suresh R 30, 77
Symanzik J 32, 84

Tamhane A 30, 76
Tan F 34, 91
Tang CY 26, 63
Tang H 22, 47
Tang Y 26, 62
Tao M 22, 48
Tao R 31, 79
Taylor J 22, 47
Tewson P 32, 84
Thabane L 21, 42
Thall PF 22, 47
Todem D 22, 49
Trippa L 28, 68
Trotta R 20, 38
Tucker A 26, 62

Vannucci M 31, 81
Verhaak RG 24, 54
Vogel R 31, 81
Vrtilek S 20, 39

Wahed A 21, 44
Waldron L 24, 55
Wang A 34, 91
Wang B 33, 89
Wang C 24, 56
Wang D 30, 77
Wang G 29, 74
Wang H 21, 23, 46, 53
Wang J 26, 27, 63, 66
Wang L 29, 32, 34, 74, 82, 89
Wang M 28, 34, 69, 92
Wang Q 26, 27, 61, 67
Wang R 19, 37
Wang S 29, 31, 74, 80
Wang W 31, 79
Wang X 19, 25, 32, 38, 58, 84
Wang Y 20, 20, 22, 25, 25, 41, 41, 48, 58, 59
Wang Z 25, 25, 33, 59, 59, 87
Wei WW 30, 79
Wei Y 20, 40
Wen S 20, 21, 42, 44
Weng H 29, 71
Weng RC 19, 38
Wettstein G 20, 39
Whitmore GA 24, 56
Wileyto EP 25, 59
Wilson AF 25, 59
Wilson JD 29, 72
Witten D 23, 51
Woerd MVD 24, 55
Wolf M 33, 86
Wolfe PJ 24, 54
Wong WK 23, 31, 31, 50, 79, 79
Wu C 32, 82
Wu D 24, 55
Wu H 22, 27, 47, 65
Wu J 22, 32, 47, 85
Wu M 23, 52
Wu R 31, 80
Wu S 23, 50
Wu Y 21, 26, 30, 35, 43, 63, 77, 92

Xi D 30, 76
Xia J 32, 83
Xia T 20, 39
Xiao R 22, 48
Xie J 31, 81
Xie M 32, 85
Xing H 22, 48
Xing X 20, 40
Xiong J 24, 57
Xiong X 22, 47
Xu K 25, 59
Xu R 23, 51
Xu X 25, 58
Xu Y 28, 68
Xu Z 34, 89
Xue H 27, 65
Xue L 32, 82

Yang B 30 77Yang D 33 88Yang E 31 82Yang S 24 28 34 56 70

89Yao R 30 77Yao W 31 81Yau CY 24 56Yavuz I 21 44Yi G 24 57Yin G 33 87Ying G 25 32 59 84Young LJ 27 67Yu C 28 70Yu D 29 75Yu L 31 34 81 90Yu Y 31 81Yuan Y 30 33 33 78 87

87

Zacks S 27 65Zang Y 33 87Zeng D 20 20 28 29 31

41 41 69 7179 82

Zhan M 21 44Zhang B 29 75Zhang C 19 23 36 52Zhang D 20 26 40 63Zhang G 22 49Zhang H 19 28 36 68Zhang HH 26 33 35 63

88 92Zhang I 25 58Zhang J 28 34 69 91Zhang L 21 30 44 77Zhang N 29 75Zhang Q 25 59Zhang S 25 58Zhang W 20 39Zhang X 19 23 26 36 53

63Zhang Y 24 25 54 58Zhang Z 21 27 29 46 67

75Zhao H 23 29 52 73Zhao L 22 25 47 59Zhao N 23 52Zhao P 34 90Zhao S 25 57Zhao T 35 92Zhao Y 29 33 74 87Zhao Z 35 92Zheng C 29 72Zheng Q 34 91Zheng Y 20 29 41 72Zheng Z 26 63Zhong H 29 73Zhong L 25 57Zhong P 22 46Zhong W 20 22 40 46Zhou H 22 26 29 48 60

73Zhou L 31 82Zhou Q 29 73Zhou T 23 50Zhou Y 30 77Zhu G 26 61Zhu H 26 60Zhu J 26 63Zhu L 21 44Zhu M 26 62Zhu Y 24 55Zhu Z 24 32 56 84Zou F 22 48Zou H 26 64Zou W 33 87


International Chinese Statistical Association - Korean International Statistical Society

Applied Statistics Symposium

2014

CONFERENCE INFORMATION, PROGRAM AND ABSTRACTS

June 15 - 18 2014

Portland Marriott Downtown Waterfront

Portland, Oregon, USA

Organized by the International Chinese Statistical Association - Korean International Statistical Society

© 2014 International Chinese Statistical Association - Korean International Statistical Society

Contents

Welcome 1
Conference Information 2
Committees 2
Acknowledgements 4
Conference Venue Information 6
Program Overview 7
Keynote Lectures 8
Student Paper Awards 9
Short Courses 10
Social Program 15
ICSA 2015 in Fort Collins, CO 16
ICSA 2014 China Statistics Conference 17
ICSA Dinner at 2014 JSM 18

Scientific Program 19
Monday June 16, 8:00 AM - 9:30 AM 19
Monday June 16, 10:00 AM - 12:00 PM 19
Monday June 16, 1:30 PM - 3:10 PM 21
Monday June 16, 3:30 PM - 5:10 PM 23
Tuesday June 17, 8:20 AM - 9:30 AM 25
Tuesday June 17, 10:00 AM - 12:00 PM 25
Tuesday June 17, 1:30 PM - 3:10 PM 27
Tuesday June 17, 3:30 PM - 5:30 PM 29
Wednesday June 18, 8:30 AM - 10:10 AM 31
Wednesday June 18, 10:30 AM - 12:10 PM 33

Abstracts 36
Session 1 Emerging Statistical Methods for Complex Data 36
Session 2 Statistical Methods for Sequencing Data Analysis 36
Session 3 Modeling Big Biological Data with Complex Structures 37
Session 4 Bayesian Approaches for Modeling Dynamic Non-Gaussian Responses 38
Session 5 Recent Advances in Astro-Statistics 38
Session 6 Statistical Methods and Application in Genetics 39
Session 7 Statistical Inference of Complex Associations in High-Dimensional Data 40
Session 8 Recent Developments in Survival Analysis 40
Session 9 Industry Practice and Regulatory Pathway for Benefit-Risk Assessment of Medicinal Products 41
Session 10 Analysis of Observational Studies and Clinical Trials 42
Session 11 Lifetime Data Analysis 44
Session 12 Safety Signal Detection and Safety Analysis 44
Session 13 Survival and Recurrent Event Data Analysis 45
Session 14 Statistical Analysis on Massive Data from Point Processes 45
Session 15 High Dimensional Inference (or Testing) 46
Session 16 Phase II Clinical Trial Design with Survival Endpoint 47
Session 17 Statistical Modeling of High-throughput Genomics Data 47
Session 18 Statistical Applications in Finance 48
Session 19 Hypothesis Testing 49
Session 20 Design and Analysis of Clinical Trials 50
Session 21 New Methods for Big Data 51
Session 22 New Statistical Methods for Analysis of High Dimensional Genomic Data 51
Session 23 Recent Advances in Analysis of Longitudinal Data with Informative Observation Process 52
Session 24 Bayesian Models for High Dimensional Complex Data 53
Session 25 Statistical Methods for Network Analysis 54
Session 26 New Analysis Methods for Understanding Complex Diseases and Biology 54
Session 27 Recent Advances in Time Series Analysis 55
Session 28 Analysis of Correlated Longitudinal and Survival Data 56
Session 29 Clinical Pharmacology 57
Session 30 Sample Size Estimation 58
Session 31 Predictions in Clinical Trials 59
Session 32 Recent Advances in Statistical Genetics 59
Session 33 Structured Approach to High Dimensional Data with Sparsity and Low Rank Factorization 60
Session 34 Recent Developments in Dimension Reduction, Variable Selection and Their Applications 61
Session 35 Post-Discontinuation Treatment in Randomized Clinical Trials 61
Session 36 New Advances in Semi-Parametric Modeling and Survival Analysis 62
Session 37 High-Dimensional Data Analysis: Theory and Application 63
Session 38 Leading Across Boundaries: Leadership Development for Statisticians 64
Session 39 Recent Advances in Adaptive Designs in Early Phase Trials 64
Session 40 High Dimensional Regression/Machine Learning 65
Session 41 Distributional Inference and Its Impact on Statistical Theory and Practice 66
Session 42 Applications of Spatial Modeling and Imaging Data 67
Session 43 Recent Development in Survival Analysis and Statistical Genetics 67
Session 44 Bayesian Methods and Applications in Clinical Trials with Small Population 68
Session 45 Recent Developments in Assessing Predictive Models in Survival Analysis 69
Session 46 Missing Data: the Interface between Survey Sampling and Biostatistics 70
Session 47 New Statistical Methods for Comparative Effectiveness Research and Personalized Medicine 70
Session 48 Student Award Session 1 71
Session 49 Network Analysis/Unsupervised Methods 72
Session 50 Personalized Medicine and Adaptive Design 73
Session 51 New Development in Functional Data Analysis 74
Session 52 Recent Regulatory/Industry Experience in Biosimilar Trial Designs 75
Session 53 Gatekeeping Procedures and Their Application in Pivotal Clinical Trials 76
Session 54 Approaches to Assessing Qualitative Interactions 76
Session 55 Interim Decision-Making in Phase II Trials 77
Session 56 Recent Advancement in Statistical Methods 78
Session 57 Building Bridges between Research and Practice in Time Series Analysis 78
Session 58 Recent Advances in Design for Biostatistical Problems 79
Session 59 Student Award Session 2 79
Session 60 Semi-parametric Methods 80
Session 61 Statistical Challenges in Variable Selection for Graphical Modeling 81
Session 62 Recent Advances in Non- and Semi-Parametric Methods 82
Session 63 Statistical Challenges and Development in Cancer Screening Research 83
Session 64 Recent Developments in the Visualization and Exploration of Spatial Data 84
Session 65 Advancement in Biostatistical Methods and Applications 84
Session 66 Analysis of Complex Data 85
Session 67 Statistical Issues in Co-development of Drug and Biomarker 86
Session 68 New Challenges for Statistical Analyst/Programmer 86
Session 69 Adaptive and Sequential Methods for Clinical Trials 87
Session 70 Survival Analysis 88
Session 71 Complex Data Analysis: Theory and Application 88
Session 72 Recent Development in Statistics Methods for Missing Data 89
Session 73 Machine Learning Methods for Causal Inference in Health Studies 90
Session 74 JP Hsu Memorial Session 90
Session 75 Challenge and New Development in Model Fitting and Selection 91
Session 76 Advanced Methods and Their Applications in Survival Analysis 91
Session 77 High Dimensional Variable Selection and Multiple Testing 92
Index of Authors 94

2014 Joint Applied Statistics Symposium of ICSA and KISS

June 15-18, Marriott Downtown Waterfront, Portland, Oregon, USA

Welcome to the 2014 joint International Chinese Statistical Association (ICSA) and Korean International Statistical Society (KISS) Applied Statistics Symposium! This is the 23rd annual ICSA symposium and the first for KISS. The organizing committees have been working hard to put together a strong program, including 7 short courses, 3 keynote lectures, 76 scientific sessions, student paper sessions, and social events. Our scientific program includes keynote lectures from prominent statisticians Dr. Sharon-Lise Normand, Dr. Robert Gentleman, and Dr. Sastry Pantula, as well as invited and contributed talks covering cutting-edge topics on genome-scale data and big data, and on the new world of statistics after 2013, the International Year of Statistics. We hope this symposium will provide abundant opportunities for you to engage, learn, and network, and to find inspiration to advance old research ideas and develop new ones. We believe this will be a memorable and worthwhile learning experience for you.

Portland is located near the confluence of the Willamette and Columbia rivers and has a unique city culture. It is close to the famous Columbia Gorge, Oregon's high mountains, and the coast. Oregon is also famous for its many microbreweries and beautiful wineries, and has no sales tax. June is a great time to visit. We hope you also have opportunities to experience the rich culture and activities the city has to offer during your stay.

Thanks for coming to the 2014 ICSA-KISS Applied Statistics Symposium in Portland!

Dongseok Choi and Rochelle Fu, on behalf of the 2014 ICSA-KISS Applied Statistics Symposium Executive and Organizing Committees

The City of Roses welcomes you!

Committees

2014 ICSA-KISS Applied Statistics Symposium, Portland, June 15-18

Executive Committee: Dongseok Choi (Co-Chair), Oregon Health & Science U.; Rochelle Fu (Co-Chair & Treasurer), Oregon Health & Science U.; Joan Hu, Simon Fraser U.; Zhezhen Jin (Program Chair), Columbia U.; Ouhong Wang, Amgen; Ru-Fang Yeh, Genentech; X.H. Andrew Zhou, U. of Washington; Cheolwoo Park (Webmaster), U. of Georgia

Local Committee: Dongseok Choi (Co-Chair), Oregon Health & Science U.; Rochelle Fu (Chair), Oregon Health & Science U.; Yiyi Chen, Oregon Health & Science U.; Thuan Nguyen, Oregon Health & Science U.; Byung Park, Oregon Health & Science U.; Xinbo Zhang, Oregon Health & Science U.

Program Committee: Zhezhen Jin (Chair), Columbia U.; Gideon Bahn, VA Hospital; Kani Chen, Hong Kong U. of Science and Technology; Yang Feng, Columbia U.; Liang Fang, Gilead; Qi Jiang, Amgen; Mikyoung Jun, Texas A&M U.; Sin-Ho Jung, Duke U.; Xiaoping Sylvia Hu, Gene; Jane Paik Kim, Stanford U.; Mimi Kim, Albert Einstein College of Medicine; Mi-OK Kim, Cincinnati Children's Hospital Medical Center; Gang Li, Johnson and Johnson; Yunfeng Li, Pharmacyclics; Mei-Ling Ting Lee, U. of Maryland; Yoonkyung Lee, Ohio State U.; Meng-Ling Liu, New York U.; Xinhua Liu, Columbia U.; Xiaolong Luo, Celgene Corporation; Taesung Park, Seoul National U.; Yu Shen, MD Anderson Cancer Center; Greg (Guoxing) Soon, US Food and Drug Administration; Zheng Su, Deerfield Company; Christine Wang, Amgen; Lan Xue, Oregon State U.; Yichuan Zhao, Georgia State U.


Program Book Committee: Mengling Liu (Chair), New York U.; Tian Zheng, Columbia U.; Wen (Jenna) Su, Columbia U.; Zhezhen Jin, Columbia U.

Student Paper Award Committee: Wenqing He (Chair), U. of Western Ontario; Qixuan Chen, Columbia U.; Hyunson Cho, National Cancer Institute; Dandan Liu, Vanderbilt U.; Jinchi Lv, U. of Southern California

Short Course Committee: Xiaonan Xue (Chair), Albert Einstein College of Medicine; Wei-Ting Hwang, U. of Pennsylvania; Ryung Kim, Albert Einstein College of Medicine; Jessica Kim, US Food and Drug Administration; Laura Lu, US Food and Drug Administration; Mikyoung Jun, Texas A&M U.; Tao Wang, Albert Einstein College of Medicine

IT Support: Lixin (Simon) Gao, Biopier Inc.

Symposium Sponsors

The 2014 ICSA-KISS Applied Statistics Symposium is supported by a financial contribution from the following sponsors:

The organizing committees greatly appreciate the support of the above sponsors

The 2014 ICSA-KISS Joint Applied Statistics Symposium Exhibitors

CRC Press - Taylor & Francis Group

Springer Science & Business Media

The Lotus Group

[Hotel floor plans (Lower Level 1 / Main Lobby, 2nd Floor, and 3rd Floor) omitted. They show the meeting spaces, including Salons A-I, the Ballroom Lobby, the Registration Desk, and the Columbia, Willamette, Portland, Eugene, Salem, Medford, Sunstone, Meadowlark, Douglas Fir, Salmon, Mount Hood, Hawthorne, Belmont, Laurelhurst, and Pearl Rooms.]

Portland Marriott Downtown Waterfront, 1401 SW Naito Parkway, Portland, Oregon 97201
Hotel: (503) 226-7600; Sales Facsimile: (503) 226-1209; portlandmarriott.com

Program Overview


Sunday, June 15th, 2014
8:00 AM - 6:00 PM  Ballroom Foyer  Registration
7:00 AM - 8:45 AM  Breakfast
9:45 AM - 10:15 AM  Break
8:00 AM - 5:00 PM  Salon A  Short Course: Recent Advances in Bayesian Adaptive Clinical Trial Design
8:00 AM - 5:00 PM  Salon B  Short Course: Analysis of Life History Data with Multistate Models
8:00 AM - 5:00 PM  Salon C  Short Course: Propensity Score Methods in Medical Research for the Applied Statistician
8:00 AM - 12:00 PM  Salon D  Short Course: ChIP-seq for Transcription and Epigenetic Gene Regulation
8:00 AM - 12:00 PM  Columbia  Short Course: Data Monitoring Committees in Clinical Trials
12:00 PM - 1:00 PM  Lunch for Registered Full-Day Short Course Attendees
1:00 PM - 5:00 PM  Salon D  Short Course: Analysis of Genetic Association Studies Using Sequencing Data and Related Topics
1:00 PM - 5:00 PM  Columbia  Short Course: Analysis of Biomarkers for Prognosis and Response Prediction
2:45 PM - 3:15 PM  Break
6:00 PM - 8:30 PM  Mt. Hood  ICSA Board Meeting (Invited Only)
7:00 PM - 9:00 PM  Salon E  Opening Mixer

Monday, June 16th, 2014
7:30 AM - 6:00 PM  Ballroom Foyer  Registration
7:00 AM - 8:45 AM  Breakfast
8:00 AM - 8:20 AM  Salon E-F  Welcome
8:20 AM - 9:30 AM  Salon E-F  Keynote I: Robert Gentleman, Genentech
9:30 AM - 10:00 AM  Ballroom Foyer  Break
10:00 AM - 12:00 PM  See program  Parallel Sessions
12:00 PM - 1:30 PM  Lunch on own
1:30 PM - 3:10 PM  See program  Parallel Sessions
3:10 PM - 3:30 PM  Ballroom Foyer  Break
3:30 PM - 5:10 PM  See program  Parallel Sessions

Tuesday, June 17th, 2014
8:20 AM - 5:30 PM  Ballroom Foyer  Registration
7:00 AM - 8:45 AM  Breakfast
8:20 AM - 9:30 AM  Salon E-F  Keynote II: Sharon-Lise Normand, Harvard University
9:30 AM - 10:00 AM  Ballroom Foyer  Break
10:00 AM - 12:00 PM  See program  Parallel Sessions
12:00 PM - 1:30 PM  Lunch on own
1:30 PM - 3:10 PM  See program  Parallel Sessions
3:10 PM - 3:30 PM  Ballroom Foyer  Break
3:30 PM - 5:30 PM  See program  Parallel Sessions
6:30 PM - 9:30 PM  Off site  Banquet (Banquet speaker: Dr. Sastry Pantula, Oregon State University)

Wednesday, June 18th, 2014
8:30 AM - 1:00 PM  Ballroom Foyer  Registration
7:30 AM - 9:00 AM  Breakfast
8:30 AM - 10:10 AM  See program  Parallel Sessions
10:10 AM - 10:30 AM  Ballroom Foyer  Break
10:30 AM - 12:10 PM  See program  Parallel Sessions

Keynote Lectures


Monday, June 16th, 8:20 AM - 9:30 AM

Robert Gentleman, Senior Director, Bioinformatics, Genentech

Speaker Biography: I joined Genentech in 2009 as Senior Director of the Bioinformatics and Computational Biology Department. I was excited by the opportunity to get involved in drug development and to do work that would directly impact patients. I had worked at two major cancer centers, and while immensely satisfying, the research done there is still fairly distant from the patient. At Genentech, patients are at the forefront of everything we do. Genentech Research is that rare blend of academia and industry that manages to capture most of the best aspects of both. The advent of genome-scale data technologies is revolutionizing molecular biology and is providing us with new and exciting opportunities for drug development. I am very excited by the new opportunities we have to develop methods for computational discovery of potential drug targets. At the same time, these large genomic data sets provide us with opportunities to identify and understand different patient subsets and to help guide us towards much more targeted therapeutics.

Postdoctoral Mentor: Being a post-doc mentor is one of the highlights of being in Research. The ability to work with really talented post-docs who are interested in pushing the boundaries of computational science provides me with an outlet for my blue-skies research ideas.

Title: Analyzing Genome Scale Data

I will discuss some of the many genome-scale data analysis problems, such as variant calling and genotyping. I will discuss the statistical approaches used, as well as the software development needs of addressing these problems. I will discuss approaches to parallelization of code and other practical computing issues that face most data analysts working on these data.

Tuesday, June 17th, 8:20 AM - 9:30 AM

Sharon-Lise Normand, Professor, Department of Health Care Policy, Harvard Medical School, and Department of Biostatistics, Harvard School of Public Health

Speaker Biography: Sharon-Lise T. Normand, PhD, is a professor of health care policy (biostatistics) in the Department of Health Care Policy at Harvard Medical School and in the Department of Biostatistics at the Harvard School of Public Health. Dr. Normand's research focuses on the development of statistical methods for health services research, primarily using Bayesian approaches to problem solving, including assessment of quality of care, methods for causal inference, provider profiling, meta-analysis, and latent variable modeling. She has developed a long line of research on methods for the analysis of patterns of treatment and quality of care for patients with cardiovascular disease and with mental disorders in particular.

Title: Combining Information for Assessing Safety, Effectiveness and Quality: Technology Diffusion and Health Policy

Health information growth has created unprecedented opportunities to evaluate therapies in large and broadly representative patient populations. Extracting sound evidence from large observational data is now at the forefront of health care policy decisions: regulators are moving away from a strict biomedical perspective to one that is wider for coverage of new medical technologies. Yet discriminating between beneficial and wasteful new technology remains methodologically challenging: while big data provide opportunities to study treatment effect heterogeneity, estimation of average causal effects in sub-populations is underdeveloped in observational data, and the correct choice of confounding adjustment is difficult in the large-p setting. In this talk, I discuss analytical issues related to the analysis of observational data when the goals involve characterizing the diffusion of multiple new technologies and assessing their causal impacts in the areas of mental illness and cardiovascular interventions. This work is supported in part by grants U01-MH103018 from the National Institutes of Health and U01-FD004493 from the US Food and Drug Administration.

Student Paper Awards


ASA Biopharmaceutical Awards

Guanhua Chen, University of North Carolina - Chapel Hill
Title: Personalized Dose Finding Using Outcome Weighted Learning
Time: Tuesday, June 17th, 1:30 PM - 3:10 PM
Session 48: Student Award Session 1 (Portland Room, Lower Level 1)

Cheng Zheng, University of Washington
Title: Survival Rates Prediction when Training Data and Target Data have Different Measurement Error
Time: Tuesday, June 17th, 1:30 PM - 3:10 PM
Session 48: Student Award Session 1 (Portland Room, Lower Level 1)

Jiann-Ping Hsu Pharmaceutical and Regulatory Sciences Student Paper Award

Sandipan Roy, University of Michigan
Title: Estimating a Change-Point in High-Dimensional Markov Random Field Models
Time: Wednesday, June 18th, 10:30 AM - 12:10 PM
Session 74: JP Hsu Memorial Session (Salon D, Lower Level 1)

ICSA Student Paper Awards

Ting-Huei Chen, University of North Carolina - Chapel Hill
Title: Using a Structural Equation Modeling Approach with Application in Alzheimer's Disease
Time: Tuesday, June 17th, 3:30 PM - 5:30 PM
Session 59: Student Award Session 2 (Portland Room, Lower Level 1)

Haolei Weng, Columbia University
Title: Regularization after Retention in Ultrahigh Dimensional Linear Regression Models
Time: Tuesday, June 17th, 1:30 PM - 3:10 PM
Session 48: Student Award Session 1 (Portland Room, Lower Level 1)

Ran Tao, University of North Carolina - Chapel Hill
Title: Analysis of Sequence Data Under Multivariate Trait-Dependent Sampling
Time: Tuesday, June 17th, 3:30 PM - 5:30 PM
Session 59: Student Award Session 2 (Portland Room, Lower Level 1)

Hsin-Wen Chang, Columbia University
Title: Empirical Likelihood Based Tests for Stochastic Ordering under Right Censorship
Time: Tuesday, June 17th, 3:30 PM - 5:30 PM
Session 59: Student Award Session 2 (Portland Room, Lower Level 1)

Qiang Sun, University of North Carolina - Chapel Hill
Title: Hard Thresholded Regression Via Linear Programming
Time: Tuesday, June 17th, 1:30 PM - 3:10 PM
Session 48: Student Award Session 1 (Portland Room, Lower Level 1)

Short Courses


1. Recent Advances in Bayesian Adaptive Clinical Trial Design

Presenters: Peter F. Thall & Brian P. Hobbs, The University of Texas MD Anderson Cancer Center, 1400 Hermann Pressler Dr., Houston, TX 77030-4008. Email: rex@mdanderson.org

Course length: One day

Outline/Description:

This one-day short course will cover a variety of recently developed Bayesian methods for the design and conduct of adaptive clinical trials. Emphasis will be on practical application, with the course structured around a series of specific illustrative examples. Topics to be covered will include (1) using historical data in both planning and adaptive decision making during the trial; (2) using elicited utilities or scores of different types of multivariate patient outcomes to characterize complex treatment effects; (3) characterizing and calibrating prior effective sample size; (4) monitoring safety and futility; (5) eliciting and establishing priors; and (6) using computer simulation as a design tool. These methods will be illustrated by actual clinical trials, including cancer trials involving chemotherapy for leukemia and colorectal cancer, stem cell transplantation, and radiation therapy, as well as trials in neurology and neonatology. The illustrations will include both early-phase trials to optimize dose, or dose and schedule, and randomized comparative phase III trials.

References:

Braun TM, Thall PF, Nguyen H, de Lima M. Simultaneously optimizing dose and schedule of a new cytotoxic agent. Clinical Trials 4:113-124, 2007.
Hobbs BP, Carlin BP, Mandrekar S, Sargent DJ. Hierarchical commensurate and power prior models for adaptive incorporation of historical information in clinical trials. Biometrics 67:1047-1056, 2011.
Hobbs BP, Sargent DJ, Carlin BP. Commensurate priors for incorporating historical information in clinical trials using general and generalized linear models. Bayesian Analysis 7:639-674, 2012.
Hobbs BP, Carlin BP, Sargent DJ. Adaptive adjustment of the randomization ratio using historical control data. Clinical Trials 10:430-440, 2013.
Morita S, Thall PF, Mueller P. Determining the effective sample size of a parametric prior. Biometrics 64:595-602, 2008.
Morita S, Thall PF, Mueller P. Evaluating the impact of prior assumptions in Bayesian biostatistics. Statistics in Biosciences 2:1-17, 2010.

Thall PF. Bayesian models and decision algorithms for complex early phase clinical trials. Statistical Science 25:227-244, 2010.
Thall PF, Szabo A, Nguyen HQ, et al. Optimizing the concentration and bolus of a drug delivered by continuous infusion. Biometrics 67:1638-1646, 2011.
Thall PF, Nguyen HQ. Adaptive randomization to improve utility-based dose-finding with bivariate ordinal outcomes. J Biopharmaceutical Statistics 22:785-801, 2012.
Thall PF, Nguyen HQ, Braun TM, Qazilbash M. Using joint utilities of the times to response and toxicity to adaptively optimize schedule-dose regimes. Biometrics, in press.

About the presenters:

Dr. Peter Thall has pioneered the use of Bayesian methods in medical research. He has published over 160 research papers and book chapters in the statistical and medical literature, including numerous papers providing innovative methods for the design, conduct, and analysis of clinical trials. Over the course of his career he has designed over 300 clinical trials. He has presented 20 short courses and over 130 invited talks, and regularly provides statistical consultation for corporations in the pharmaceutical industry. He has served as an associate editor for the journals Statistics in Medicine, Journal of the National Cancer Institute, and Biometrics; he currently is an associate editor for the journals Clinical Trials and Statistics in Biosciences, and is an American Statistical Association Media Expert.

Dr. Brian P. Hobbs is Assistant Professor in the Department of Biostatistics at the University of Texas MD Anderson Cancer Center in Houston, Texas. He completed his undergraduate education at the University of Iowa and obtained a master's and doctoral degree in biostatistics at the University of Minnesota in Minneapolis. He was the recipient of the 2010 ENAR John Van Ryzin Student Award. Dr. Hobbs completed a postdoctoral fellowship in the Department of Biostatistics at MD Anderson Cancer Center before joining the faculty in 2011. His methodological expertise covers Bayesian inferential methods, hierarchical modeling, utility-based inference, adaptive trial design in the presence of historical controls, sequential design in the presence of co-primary endpoints, and semiparametric modeling of functional imaging data.

2. Analysis of Life History Data with Multistate Models

Presenters: Richard Cook and Jerry Lawless, Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ontario, Canada. Email: rjcook@uwaterloo.ca, jlawless@uwaterloo.ca

Short Courses

2014 ICSA-KISS Applied Statistics Symposium Portland June 15-18 11

Course length: One day

Outline/Description:

Life history studies examine specific outcomes and processes during people's lifetimes. For example, cohort studies of chronic disease provide information on disease progression, fixed and time-varying risk factors, and the extent of heterogeneity in the population. Modelling and analysis of life history processes is often facilitated by the use of multistate models. The aim of this workshop is to present models and methods for multistate analyses and to indicate some current topics of research. Software for conducting analyses will be discussed, and code for specific problems will be given. A wide range of illustrations involving chronic disease and other conditions will be presented. Course notes will be distributed.
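As a toy illustration of the kind of process such multistate models describe (not part of the course materials; states and intensities below are hypothetical), a minimal Python sketch simulating an illness-death life history with constant transition intensities:

```python
import random

# Hypothetical constant transition intensities (per year) for an illness-death
# model: state 0 = healthy, 1 = ill, 2 = dead (absorbing). Values are illustrative.
RATES = {
    0: {1: 0.10, 2: 0.02},  # healthy -> ill, healthy -> dead
    1: {2: 0.25},           # ill -> dead
}

def simulate_path(max_time=50.0, seed=0):
    """Simulate one life history as a list of (time, state) jump pairs."""
    rng = random.Random(seed)
    t, state = 0.0, 0
    path = [(0.0, 0)]
    while state in RATES and t < max_time:
        exits = RATES[state]
        total = sum(exits.values())
        t += rng.expovariate(total)      # exponential waiting time in current state
        if t >= max_time:                # administratively censored at max_time
            break
        u, cum = rng.random() * total, 0.0
        for nxt, rate in exits.items():  # destination chosen proportional to intensity
            cum += rate
            if u <= cum:
                state = nxt
                break
        path.append((t, state))
    return path
```

Real analyses estimate such intensities from observed paths (possibly intermittently observed, a course topic); this sketch only shows the data-generating picture.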

TOPICS

1. Introduction
2. Some Basic Quantities for Event History Modelling
3. Some Illustrative Analyses Involving Multistate Models
4. Processes with Intermittent Observation
5. Modelling Heterogeneity and Associations
6. Dependent Censoring and Inspection
7. Some Other Topics

About the presenters:

Richard Cook is Professor of Statistics at the University of Waterloo and holder of the Canada Research Chair in Statistical Methods for Health Research. He has published extensively in the areas of statistical methodology, clinical trials, medicine, and public health, including many articles on event history analysis, multistate models, and the statistical analysis of life history data. He collaborates with numerous researchers in medicine and public health and has consulted widely with pharmaceutical companies on the design and analysis of clinical trials.

Jerry Lawless is Distinguished Professor Emeritus of Statistics at the University of Waterloo. He has published extensively on statistical models and methods for survival and event history data, life history processes, and other topics, and is the author of Statistical Models and Methods for Lifetime Data (2nd edition, Wiley, 2003). He has consulted and worked in many applied areas, including medicine, public health, manufacturing, and reliability. Dr. Lawless was the holder of the GM-NSERC Industrial Research Chair in Quality and Productivity from 1994 to 2004.

Drs. Cook and Lawless have co-authored many papers, as well as the book The Statistical Analysis of Recurrent Events (Springer, 2007). They have also given numerous workshops together.

3. Propensity Score Methods in Medical Research for the Applied Statistician

Presenter: Ralph D'Agostino, Jr., PhD, Department of Biostatistical Sciences, Wake Forest University School of Medicine, Medical Center Boulevard, Winston-Salem, NC 27157. Email: rdagosti@wakehealth.edu

Course length: One day

Outline/Description:

The purpose of this short course is to introduce propensity score methodology to applied statisticians. Currently, propensity score methods are widely used in research, but often their use is not accompanied by an explanation of how they were used or whether they were used appropriately. This course will teach the attendee the definition of the propensity score, show how it is estimated, and present several applied examples of its use. In addition, SAS code will be presented to show how to estimate propensity scores, assess model success, and perform final treatment effect estimation. Published medical journal articles that have used propensity score methods will be examined. Some attention will be given to the use of propensity score methods for detecting safety signals using post-marketing data. Upon completion of this workshop, researchers should be able to understand what a propensity score is, know how to estimate it, identify under what circumstances propensity scores can be used, know how to evaluate whether a propensity score model "worked", and be able to critically review the medical literature where propensity scores have been used, to determine whether they were used appropriately. In addition, attendees will be shown statistical programs, using SAS software, that estimate propensity scores, assess the success of the propensity score model, and estimate treatment effects that take the propensity scores into account. Experience with SAS programming would be useful for attendees.

Textbook/References:

Rosenbaum P, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika 1983;70:41-55.

D'Agostino RB Jr. Propensity score methods for bias reduction in the comparison of a treatment to a non-randomized control group. Stat Med 1998;17:2265-2281.

Rubin DB. The design versus the analysis of observational studies for causal effects: parallels with the design of randomized studies. Stat Med 2007;26:20-36.

D'Agostino RB Jr, D'Agostino RB Sr. Estimating treatment effects using observational data. JAMA 2007;297(3):314-316.

Yue LQ. Statistical and regulatory issues with the application of propensity score analysis to non-randomized medical device clinical studies. J Biopharm Stat 2007;17(1):1-13.

D'Agostino RB Jr. Propensity scores in cardiovascular research. Circulation 2007;115(17):2340-2343.
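As a language-agnostic companion to the SAS programs the course describes, here is a minimal, self-contained Python sketch (synthetic data; the covariate, coefficients, and sample size are all illustrative, not from the course) of the basic workflow: fit a logistic model for treatment assignment, form propensity-score quintiles, and average within-stratum outcome differences:

```python
import math
import random

def fit_logistic(x, z, iters=1000, lr=0.5):
    """Fit P(Z=1 | x) = expit(a + b*x) by gradient ascent on the mean log-likelihood."""
    a = b = 0.0
    n = len(x)
    for _ in range(iters):
        ga = gb = 0.0
        for xi, zi in zip(x, z):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            ga += zi - p
            gb += (zi - p) * xi
        a += lr * ga / n
        b += lr * gb / n
    return a, b

# Synthetic observational data (illustrative only): covariate x raises both the
# chance of treatment z and the outcome y, so the naive comparison is confounded.
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(1000)]
z = [1 if rng.random() < 1.0 / (1.0 + math.exp(-xi)) else 0 for xi in x]
y = [2.0 * zi + 1.5 * xi + rng.gauss(0.0, 1.0) for xi, zi in zip(x, z)]  # true effect = 2

a, b = fit_logistic(x, z)
ps = [1.0 / (1.0 + math.exp(-(a + b * xi))) for xi in x]  # estimated propensity scores

# Stratify on propensity-score quintiles and average within-stratum differences.
order = sorted(range(len(ps)), key=lambda i: ps[i])
k = len(order) // 5
effects = []
for s in range(5):
    idx = order[s * k:(s + 1) * k]
    treated = [y[i] for i in idx if z[i] == 1]
    control = [y[i] for i in idx if z[i] == 0]
    if treated and control:
        effects.append(sum(treated) / len(treated) - sum(control) / len(control))
est = sum(effects) / len(effects)  # stratified treatment-effect estimate
```

The stratified estimate should land much closer to the true effect than the naive treated-minus-control mean difference; checking covariate balance within strata (a course topic) is how one assesses whether the model "worked".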

About the presenter:

Dr. D'Agostino holds a PhD in Mathematical Statistics from Harvard University. He is a Fellow of the American Statistical Association and a Professor of Biostatistical Sciences at the Wake Forest School of Medicine (WFSM). He has been a principal investigator for several R01 grants/subcontracts funded by the NIH/CDC, has served as the Statistical Associate Editor for Arthroscopy (The Journal of Arthroscopy and Related Surgery) since 2008, and has previously been on the editorial boards for Current Controlled Trials in Cardiovascular Medicine, the Journal of Cardiac Failure, and the American Journal of Epidemiology. He has published over 235 manuscripts and book chapters in areas of statistical methodology (in particular, propensity score methods), cardiovascular disease, diabetes, cancer, and genetics. He has extensive experience in the design and analysis of clinical trials, observational studies, and large-scale epidemiologic studies. He has been an author on several manuscripts that describe propensity score methodology, as well as many applied manuscripts that use this methodology. In addition, during the past twenty years Dr. D'Agostino has made numerous presentations and has taught several short courses and workshops on propensity score methods.

4. ChIP-seq for Transcription and Epigenetic Gene Regulation

Presenter: X. Shirley Liu, Professor of Biostatistics and Computational Biology, Harvard School of Public Health; Director, Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute; Associate Member, Broad Institute. 450 Brookline Ave, Mail CLS-11007, Boston, MA 02215. Email: xsliu@jimmy.harvard.edu

Course length: Half day

Outline/Description:

With next generation sequencing, ChIP-seq has become a popular technique to study transcriptional and epigenetic gene regulation. The short course will introduce the technique of ChIP-seq and discuss the computational and statistical issues in analyzing ChIP-seq data. These include initial data QC, normalizing biases, identifying transcription factor binding sites and target genes, predicting additional transcription factor drivers in biological processes, and integrating binding with transcriptome and epigenome information. We will also emphasize the importance of dynamic ChIP-seq and introduce some of the tools and databases that are useful for ChIP-seq data analysis.

Textbook/References:

Park PJ. ChIP-seq: advantages and challenges of a maturing technology. Nat Rev Genet 2009 Oct;10(10):669-80.
Shin H, Liu T, Duan X, Zhang Y, Liu XS. Computational methodology for ChIP-seq analysis. Quantitative Biology, 2013.

About the presenter:

Dr. X. Shirley Liu is Professor of Biostatistics and Computational Biology at Harvard School of Public Health and Director of the Center for Functional Cancer Epigenetics at the Dana-Farber Cancer Institute. Her research focuses on computational models of transcriptional and epigenetic regulation, through algorithm development and data integration for high-throughput data. She has developed a number of widely used transcription factor motif finding (cited over 1,700 times) and ChIP-chip/seq analysis algorithms (over 8,000 users) and has conducted pioneering research studies on gene regulation in development, metabolism, and cancers. Dr. Liu has published over 100 papers, including over 30 in the Nature, Science, or Cell series, and she has an H-index of 50 according to Google Scholar statistics. She has presented at over 50 conferences and workshops and has given research seminars at over 70 academic and research institutions worldwide.

5. Data Monitoring Committees in Clinical Trials

Presenter: Jay Herson, PhD, Senior Associate, Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD. Email: jayherson@earthlink.net

Course length: Half day

Outline/Description:

This workshop deals with best practices for data monitoring committees (DMCs) in the pharmaceutical industry. The emphasis is on safety monitoring, because this constitutes 90% of the workload for pharmaceutical industry DMCs. The speaker summarizes experience over 24 years of working as statistical member or supervisor of statistical support for DMCs. He provides insight into the behind-the-scenes workings of DMCs, which those working in industry or the FDA may find surprising. The introduction presents a stratification of the industry into Big Pharma, Middle Pharma, and Infant Pharma, which will be referred to often in this workshop. Subsequent sections deal with DMC formation, DMC meetings, and the process of serious adverse event (SAE) data flow. The tutorial's section on clinical issues explains the nature of MedDRA coding as well as issues in multinational trials. This will be followed by a statistical section, which reviews and illustrates the various methods of statistical analysis of treatment-emergent adverse events, dealing with multiplicity and, if time allows, likelihood and Bayesian methods. The


workshop's review of biases and pitfalls describes reporting bias, analysis bias, granularity bias, competing risks, and recommendations to reduce bias. A description of DMC decisions goes through various actions and ad hoc analyses the DMC can make when faced with an SAE issue, and their limitations. The workshop concludes with emerging issues such as adaptive designs, causal inference, biomarkers, training DMC members, cost control, DMC audits, mergers and licensing, and the high-tech future of clinical trials.

Text:

Herson J. Data and Safety Monitoring Committees in Clinical Trials. Chapman & Hall/CRC, 2009.

About the presenter:

Jay Herson received his PhD in Biostatistics from Johns Hopkins in 1971. After working on cancer clinical trials at MD Anderson Hospital, he formed Applied Logic Associates (ALA) in Houston in 1983. ALA grew to be a biostatistical-data management CRO with 50 employees when it was sold to Westat in 2001. Jay joined the adjunct faculty in Biostatistics at Johns Hopkins in 2004. His interests are interim analysis in clinical trials, data monitoring committees, and statistical regulatory issues. He chaired the first known data monitoring committee in the pharmaceutical industry in 1988. He is the author of numerous papers on statistical and clinical trial methodology and in 2009 authored the book Data and Safety Monitoring Committees in Clinical Trials, published by Chapman & Hall/CRC.

6. Analysis of Genetic Association Studies Using Sequencing Data and Related Topics

Presenters: Xihong Lin, Department of Biostatistics, Harvard School of Public Health, xlin@hsph.harvard.edu; Seunggeun Lee, University of Michigan, leeshawn@umich.edu

Course length: Half day

Outline/Description:

This short course discusses current methodology for analyzing sequencing association studies aimed at identifying the genetic basis of common complex diseases. The rapid advances in next generation sequencing technologies provide an exciting opportunity to gain a better understanding of biological processes and new approaches to disease prevention and treatment. During the past few years, an increasing number of large-scale sequencing association studies, such as exome-chip arrays, candidate gene sequencing, and whole exome and whole genome sequencing studies, have been conducted, and preliminary analysis results have rapidly become available. These studies could potentially identify new genetic variants that play important roles in understanding disease etiology or treatment response. However, due to the massive number of

variants and the rareness of many of these variants across the genome, sequencing costs, and the complexity of diseases, efficient methods for designing and analyzing sequencing studies remain vitally important yet challenging. This short course provides an overview of statistical methods for the analysis of genome-wide sequencing association studies and related topics. Topics include study designs for sequencing studies, data processing pipelines, statistical methods for detecting rare variant effects, meta-analysis, gene-environment interaction, population stratification, and mediation analysis for integrative analysis of genetic and genomic data. Data examples will be provided and software will be discussed.

Textbook/References:

Handout and references will be provided.

About the presenters:

Xihong Lin is Professor of Biostatistics and Coordinating Director of the Program of Quantitative Genomics at the School of Public Health of Harvard University. Dr. Lin's research interests lie in statistical genetics and 'omics, especially the development and application of statistical and computational methods for the analysis of high-throughput genetic and omics data in epidemiological and clinical studies, and statistical methods for the analysis of correlated data, such as longitudinal, clustered, and family data. Dr. Lin's specific areas of expertise include statistical methods for genome-wide association studies and next generation sequencing association studies, genes and environment, mixed models, and nonparametric and semiparametric regression. She received the 2006 Presidents' Award for the outstanding statistician from the Committee of Presidents of Statistical Societies (COPSS) and the 2002 Mortimer Spiegelman Award for the outstanding biostatistician from the American Public Health Association. She is an elected fellow of the American Statistical Association, the Institute of Mathematical Statistics, and the International Statistical Institute. Dr. Lin was the Chair of the Committee of Presidents of Statistical Societies (COPSS) between 2010 and 2012. She is currently a member of the Committee on Applied and Theoretical Statistics of the US National Academy of Sciences. Dr. Lin is a recipient of the MERIT (Method to Extend Research in Time) Award from the National Institutes of Health, which provides long-term research grant support. She is the PI of the T32 training grant on interdisciplinary training in statistical genetics and computational biology. She has served on numerous editorial boards of statistical journals. She was the former Coordinating Editor of Biometrics and is currently co-editor of Statistics in Biosciences and Associate Editor of the Journal of the American Statistical Association and the American Journal of Human Genetics. She was a permanent member of the NIH study section on Biostatistical Methods and Study Designs (BMRD) and has served on a large number of other study sections at NIH and NSF.


Seunggeun (Shawn) Lee is an assistant professor of Biostatistics at the University of Michigan. He received his PhD in Biostatistics from the University of North Carolina at Chapel Hill and completed postdoctoral training at the Harvard School of Public Health. His research focuses on developing statistical and computational methods for the analysis of large-scale, high-dimensional genetic and genomic data, which is essential to better understand the genetic architecture of complex diseases and traits. He is a recipient of the NIH Pathway to Independence Award (K99/R00).

7. Analysis of Biomarkers for Prognosis and Response Prediction

Presenter: Patrick J. Heagerty, Professor and Associate Chair, Department of Biostatistics, University of Washington, Seattle, WA 98195. Email: heagerty@u.washington.edu

Course length: Half day

Outline/Description:

Longitudinal studies allow investigators to correlate changes in time-dependent exposures or biomarkers with subsequent health outcomes. The use of baseline or time-dependent markers to predict a subsequent change in clinical status, such as transition to a diseased state, requires the formulation of appropriate classification and prediction error concepts. Similarly, the evaluation of markers that could be used to guide treatment requires specification of the operating characteristics associated with use of the marker. The first part of this course will introduce predictive accuracy concepts that allow evaluation of time-dependent sensitivity and specificity for prognosis of a subsequent event time. We will overview options that are appropriate both for baseline markers and for longitudinal markers. Methods will be illustrated using examples from HIV and cancer research and will highlight R packages that are currently available. Time permitting, the second part of this course will introduce statistical methods that can characterize the performance of a biomarker toward accurately guiding treatment choice and toward improving health outcomes when the marker is used to selectively target treatment. Examples will include the use of imaging information to guide surgical treatment and the use of genetic markers to select subjects for treatment.

Textbook/References:

Heagerty PJ, Lumley T, Pepe MS. Time-dependent ROC curves for censored survival data and a diagnostic marker. Biometrics 56:337-344, 2000.
Heagerty PJ, Zheng Y. Survival model predictive accuracy and ROC curves. Biometrics 61(1):92-105, 2005.
Saha P, Heagerty PJ. Time-dependent predictive accuracy in the presence of competing risks. Biometrics 66(4):999-1011, 2010.

About the presenter:

Patrick Heagerty is Professor of Biostatistics, University of Washington, Seattle, WA. He has been the director of the center for biomedical studies at the University of Washington School of Medicine and Public Health. He is one of the leading experts on methods for longitudinal studies, including the evaluation of markers used to predict future clinical events. He has made significant contributions to many areas of research, including semi-parametric regression and estimating equations, marginal models and random effects for longitudinal data, dependence modeling for categorical time series, and hierarchical models for categorical spatial data. He is an elected fellow of the American Statistical Association and the Institute of Mathematical Statistics.
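To make the time-dependent accuracy idea concrete, here is a minimal Python sketch of a cumulative/dynamic time-dependent AUC at a fixed horizon (synthetic, uncensored data; the Heagerty-Lumley-Pepe estimators in the references above handle censoring, which this toy version ignores):

```python
import math
import random

def td_auc(marker, event_time, t):
    """Cumulative/dynamic time-dependent AUC at horizon t with no censoring:
    P(M_i > M_j | T_i <= t < T_j), counting tied marker values as 1/2."""
    cases = [m for m, T in zip(marker, event_time) if T <= t]      # events by t
    controls = [m for m, T in zip(marker, event_time) if T > t]    # event-free at t
    if not cases or not controls:
        raise ValueError("need both cases and controls at horizon t")
    wins = 0.0
    for mi in cases:
        for mj in controls:
            if mi > mj:
                wins += 1.0
            elif mi == mj:
                wins += 0.5
    return wins / (len(cases) * len(controls))

# Synthetic prognostic marker (illustrative only): larger marker values mean
# shorter survival, via exponential event times with rate exp(marker).
rng = random.Random(0)
marker = [rng.gauss(0.0, 1.0) for _ in range(300)]
event_time = [rng.expovariate(math.exp(m)) for m in marker]

auc_1yr = td_auc(marker, event_time, t=1.0)  # well above 0.5 for a useful marker
```

Sweeping t traces out how a baseline marker's discrimination changes over follow-up, which is the quantity the course's first part formalizes for censored data.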

Social Programs


Opening Mixer: Sunday, June 15, 2014, 7 PM - 9 PM, Salon E, Lower Level 1

Banquet: Tuesday, June 17, 2014, 6:30 PM - 9:30 PM, Jin Wah Vietnamese & Chinese Seafood Restaurant, http://www.jinwah.com

Banquet Speech: "The World of Statistics"

After a successful International Year of Statistics 2013, we enter the new World of Statistics. This is a great opportunity to think of our profession and look forward to the impact statistical sciences can have on innovation and discoveries in sciences, engineering, business, and education. Are we going to be obsolete? Or omnipresent?

Dr. Sastry Pantula, Dean, College of Science, Oregon State University, and former President of the American Statistical Association

Sastry G. Pantula became dean of the College of Science at Oregon State University in the fall of 2013. Prior to that, he served as director of the National Science Foundation's Division of Mathematical Sciences from 2010 to 2013.

Pantula headed the statistics department at North Carolina State University (NCSU), where he served on the faculty for nearly 30 years. He also directed their Institute of Statistics. Pantula served as president of the American Statistical Association (ASA) in 2010. In addition to being an ASA fellow, he is a fellow of the American Association for the Advancement of Science (AAAS), a member of the honor societies Mu Sigma Rho and Phi Kappa Phi, and was inducted into the NCSU Academy of Outstanding Teachers in 1985.

As dean of Oregon State's College of Science and professor of statistics, Pantula provides leadership to world-class faculty in some of the university's most recognized disciplines, including nationally recognized programs in chemistry, informatics, integrative biology, marine studies, material science, physics, and others.

During his tenure at NCSU, Pantula worked with his dean and the college foundation to create three $1 million endowments for distinguished professors. He also worked with colleagues and alumni to secure more than $7 million in funding from the National Science Foundation, other agencies, and industry to promote graduate student training and mentorship.

Pantula's research areas include time series analysis and econometric modeling, with a broad range of applications. He has worked with the National Science Foundation, the US Fish and Wildlife Service, the US Environmental Protection Agency, and the US Bureau of the Census on projects ranging from population estimates to detecting trends in global temperature.

As home to the core life, physical, mathematical, and statistical sciences, the College of Science has built a foundation of excellence. It helped Oregon State acquire the top ranking in the United States for conservation biology in recent years and receive top-10 rankings by the Chronicle of Higher Education for the Departments of Integrative Biology (formerly Zoology) and Science Education. The diversity of sciences in the College, including mathematical and statistical sciences, provides innovative opportunities for fundamental and multidisciplinary research collaborations across campus and around the globe.

Pantula holds bachelor's and master's degrees in statistics from the Indian Statistical Institute in Kolkata, India, and a PhD in statistics from Iowa State University.

2014 ICSA China Statistics Conference
July 4 - July 5, 2014 • Shanghai • China

2nd Announcement of the Conference (April 8, 2014)

To attract statistical researchers and students in China and other countries to present their work and experience with statistical colleagues, and to strengthen the connections between China and overseas statisticians, the 2014 ICSA China Statistics Conference will be organized by the Committee for ICSA Shanghai and hosted by East China Normal University (ECNU) from July 4 to July 5, 2014 in Shanghai, China.

The conference will invite leading statistical professionals in mainland China, Hong Kong, Taiwan, the United States, and worldwide to present their research work. It will cover a broad range of statistics, including mathematical statistics, applied statistics, biostatistics, and statistics in finance and economics, which will provide a good platform for statistical professionals all over the world to share their latest research and applications of statistics. The invited speakers include Prof. L.J. Wei (Harvard University), Prof. Tony Cai (University of Pennsylvania), Prof. Ying Lu (Stanford University), Prof. Ming-Hui Chen (University of Connecticut), Prof. Danyu Lin (University of North Carolina at Chapel Hill), and other distinguished statisticians.

The oral presentations at the conference will be conducted in either English or Chinese. Although the Program Committee would recommend presentation slides in English, Chinese versions of the slides may also be used.

The program committee is working on the conference program, and more information will be distributed very soon. Should you have any inquiries about the program, please contact Dr. Dejun Tang (dejun.tang@novartis.com) or Dr. Yankun Gong (yankun.gong@novartis.com).

For conference registration and hotel reservation, please contact Prof. Shujin Wu at ECNU (sjwu@stat.ecnu.edu.cn).

Program Committee & Local Organizing Committee
2014 ICSA China Statistics Conference


ICSA DINNER at 2014 JSM in Boston, MA

The ICSA will hold its annual members meeting on August 6 (Wednesday) at 6:00 PM in the Boston Convention & Exhibition Center, room CC-157B. An ICSA banquet will follow the members meeting at Osaka Japanese Sushi & Steak House, 14 Green St, Brookline, MA 02446, (617) 732-0088, http://brooklineosaka.com. Osaka is a Japanese fusion restaurant located in Brookline and can be reached via the MBTA subway Green line "C" branch (Coolidge Corner stop). This restaurant features a cozy setting, superior cuisine, and elegant decor. The banquet menu will include Oyster 3-Ways / Rock Shrimp / Shrimp Tempura / Sushi and Sashimi Boat / Hibachi Seafood / Char-Grilled Sea Bass / Lobster. Complimentary wine/sake/soft drinks will be served, and a cash bar for extra drinks will be available. The restaurant also has a club dance floor that provides complimentary karaoke.


Scientific Program (June 16th - June 18th)

Monday, June 16, 8:00 AM - 9:30 AM

Keynote Session I (Keynote)
Room: Salon E-F, Lower Level 1
Organizers: ICSA-KISS 2014 organizing committee
Chair: Dongseok Choi, Oregon Health & Science University

8:00 AM  Welcome
Ying Lu, ICSA 2014 President

8:05 AM  Congratulatory Address
George C. Tiao, ICSA Founding President

8:20 AM  Keynote Lecture I
Robert Gentleman, Genentech

9:30 AM  Floor Discussion

Monday, June 16, 10:00 AM - 12:00 PM

Session 1: Emerging Statistical Methods for Complex Data (Invited)
Room: Salon A, Lower Level 1
Organizer: Lan Xue, Oregon State University
Chair: Lan Xue, Oregon State University

10:00 AM  Estimation of the Error Auto-Correlation Matrix in Semi-parametric Model for Brain fMRI Data
Chunming Zhang and Xiao Guo, University of Wisconsin-Madison

10:25 AM  Kernel Additive Sliced Inverse Regression
Heng Lian, Nanyang Technological University

10:50 AM  Variable Selection with Prior Information for Generalized Linear Models via the Prior LASSO Method
Yuan Jiang (1), Yunxiao He (2), and Heping Zhang (3); (1) Oregon State University, (2) Nielsen Company, (3) Yale University

11:15 AM  Bootstrapping High Dimensional Vector: Interplay Between Dependence and Dimensionality
Xianyang Zhang (1) and Guang Cheng (2); (1) University of Missouri at Columbia, (2) Purdue University

11:40 AM  Floor Discussion

Session 2: Statistical Methods for Sequencing Data Analysis (Invited)
Room: Salon B, Lower Level 1
Organizer: Yanming Di, Oregon State University
Chair: Gu Mi, Oregon State University

10:00 AM  A Penalized Likelihood Approach for Robust Estimation of Isoform Expression
Hui Jiang (1) and Julia Salzman (2); (1) University of Michigan, (2) Stanford University

10:25 AM  Classification on Sequencing Data and its Applications on a Human Breast Cancer Dataset
Jun Li, University of Notre Dame

10:50 AM  Power-Robustness Analysis of Statistical Models for RNA Sequencing Data
Gu Mi, Yanming Di, and Daniel W. Schafer, Oregon State University

11:15 AM  Discussant: Wei Sun, University of North Carolina at Chapel Hill

11:40 AM  Floor Discussion

Session 3: Modeling Big Biological Data with Complex Structures (Invited)
Room: Salon C, Lower Level 1
Organizer: Hua Tang, Stanford University
Chair: Marc Coram, Stanford University

10:00 AM  High Dimensional Graphical Models Learning
Jie Peng and Ru Wang, University of California at Davis

10:25 AM  Statistical Analysis of RNA Sequencing Data
Mingyao Li and Yu Hu, University of Pennsylvania

10:50 AM  Quantifying the Role of Steric Constraints in Nucleosome Positioning
H. Tomas Rube and Jun S. Song, University of Illinois at Urbana-Champaign

11:15 AM  Integrative Dynamic Omics Networks and Personalized Medicine
George I. Mias, Michigan State University

11:40 AM  Floor Discussion

Session 4: Bayesian Approaches for Modeling Dynamic Non-Gaussian Responses (Invited)
Room: Salon D, Lower Level 1
Organizer: Xiaojing Wang, University of Connecticut
Chair: Xun Jiang, Amgen Inc.

10:00 AM  Binary State Space Mixed Models with Flexible Link Functions
Dipak Dey (1), Xun Jiang (2), and Carlos Abanto-Valle (3); (1) University of Connecticut, (2) Amgen Inc., (3) Federal University of Rio de Janeiro

10:25 AM  Bayesian Spatial-Temporal Modeling of Ecological Zero-Inflated Count Data
Xia Wang (1), Ming-Hui Chen (2), Rita C. Kuo (3), and Dipak K. Dey (2); (1) University of Cincinnati, (2) University of Connecticut, (3) Lawrence Berkeley National Laboratory

10:50 AM  Real-time Bayesian Parameter Estimation for Item Response Models
Ruby Chiu-Hsing Weng, National Chengchi University

11:15 AM  Statistical Prediction for Virginia Lyme Disease Emergence Based on Spatio-temporal Count Data
Yuanyuan Duan, Jie Li, Yili Hong, Korine Kolivras, Stephen Prisley, James Campbell, and David Gaines, Virginia Institute of Technology



11:40 AM  Floor Discussion

Session 5: Recent Advances in Astro-Statistics (Invited)
Room: Salon G, Lower Level 1
Organizer: Thomas Lee, University of California at Davis
Chair: Alexander Aue, University of California at Davis

10:00 AM  Embedding the Big Bang Cosmological Model into a Bayesian Hierarchical Model for Super Nova Light Curve Data
David van Dyk, Roberto Trotta, Xiyun Jiao, and Hikmatali Shariff, Imperial College London

10:25 AM  Marrying Domain Knowledge and Statistical Methods
Ashish Mahabal, George Djorgovski, Matthew Graham, Ciro Donalek, and Andrew Drake, California Institute of Technology

10:50 AM  Nonlinear Classification of X-Ray Binaries
Luke Bornn and Saku Vrtilek, Harvard University

11:15 AM  Persistent Homology and the Topology of the Intergalactic Medium
Fabrizio Lecci, Carnegie Mellon University

11:40 AM  Floor Discussion

Session 6: Statistical Methods and Application in Genetics (Invited)
Room: Salon H, Lower Level 1
Organizer: Ying Wei, Columbia University
Chair: Ying Wei, Columbia University

10:00 AM  Identification of Homogeneous and Heterogeneous Covariate Structure in Pooled Cohort Studies
Xin Cheng^1, Wenbin Lu^2 and Mengling Liu^1; ^1New York University, ^2North Carolina State University

10:25 AM  Gene Expression Analyses in Evaluating Translational Biomarkers from Drug-Induced Idiopathic Pulmonary Fibrosis in Animal Models
Wenfei Zhang, Yuefeng Lu, Tai-He Xia, Guillaume Wettstein, Jean-Pierre Bidouard and Xavier Marniquet, Sanofi-aventis US LLC

10:50 AM  DNA Methylation, Cell-Type Distribution, and EWAS
E. Andres Houseman, Oregon State University

11:15 AM  Secondary Quantile Analysis for GWAS
Ying Wei^1, Xiaoyu Song^1, Mengling Liu^2 and Iuliana Ionita-Laza^1; ^1Columbia University, ^2New York University

11:40 AM  Floor Discussion

Session 7: Statistical Inference of Complex Associations in High-Dimensional Data (Invited)
Room: Salon I, Lower Level 1
Organizer: Jun Liu, Harvard University
Chair: Di Wu, Harvard University

10:00 AM  Leveraging for Big Data Regression
Ping Ma, University of Georgia

10:25 AM  Reference-free Metagenomics Analysis Using Matrix Factorization
Wenxuan Zhong and Xin Xing, University of Georgia

10:50 AM  Big Data, Big Models, Big Problems: Statistical Principles and Practice at Scale
Alexander W. Blocker, Google

11:15 AM  Floor Discussion

Session 8: Recent Developments in Survival Analysis (Invited)
Room: Eugene Room, Lower Level 1
Organizer: Qingxia (Cindy) Chen, Vanderbilt University
Chair: Qingxia (Cindy) Chen, Vanderbilt University

10:00 AM  Bayesian Joint Modeling of Multi-dimensional Longitudinal and Survival Data with Applications to Cancer Clinical Trials
Ming-Hui Chen^1, Danjie Zhang^1, Joseph G. Ibrahim^2, Mark E. Boye^3 and Wei Shen^3; ^1University of Connecticut, ^2University of North Carolina, ^3Eli Lilly and Company

10:25 AM  Estimating Risk with Time-to-Event Data: An Application to the Women's Health Initiative
Dandan Liu^1, Yingye Zheng^2, Ross Prentice^2 and Li Hsu^2; ^1Vanderbilt University, ^2Fred Hutchinson Cancer Research Center

10:50 AM  Efficient Estimation of Nonparametric Genetic Risk Function with Censored Data
Yuanjia Wang^1, Baosheng Liang^2 and Donglin Zeng^3; ^1Columbia University, ^2Beijing Normal University, ^3University of North Carolina at Chapel Hill

11:15 AM  Support Vector Hazard Regression for Predicting Event Times Subject to Censoring
Xiaoxi Liu^1, Yuanjia Wang^2 and Donglin Zeng^1; ^1University of North Carolina, ^2Columbia University

11:40 AM  Floor Discussion

Session 9: Industry Practice and Regulatory Pathway for Benefit-Risk Assessment of Medicinal Products (Invited)
Room: Portland Room, Lower Level 1
Organizers: Shihua Wen, AbbVie Inc.; Yijie Zhou, Merck & Co.
Chair: Yijie Zhou, Merck & Co.

10:00 AM  Visual Communication and Assessment of Benefit-Risk for Medical Products
Jonathan D. Norton, MedImmune

10:25 AM  Some Thoughts on Evaluation of Uncertainty in Benefit-Risk Assessment
Qi Jiang^1, Haijun Ma^1, Christy Chuang-Stein^2, Weili He^3, George Quartey^4, John Scott^5 and Shihua Wen^6; ^1Amgen Inc., ^2Pfizer Inc., ^3Merck & Co., ^4Hoffmann-La Roche, ^5United States Food and Drug Administration, ^6AbbVie Inc.

10:50 AM  Current Concept of Benefit Risk Assessment of Medicine
Syed S. Islam, AbbVie Inc.

11:15 AM  Discussant: Yang Bo, AbbVie Inc.

11:40 AM  Floor Discussion


Session 10: Analysis of Observational Studies and Clinical Trials (Contributed)
Room: Salem Room, Lower Level 1
Chair: Naitee Ting, Boehringer-Ingelheim Company

10:00 AM  Impact of Tuberculosis on Mortality Among HIV-Infected Patients Receiving Antiretroviral Therapy in Uganda: A Case Study in Propensity Score Analysis
Rong Chu^1, Edward J. Mills^2, Joseph Beyene^3, Eleanor Pullenayegum^4, Celestin Bakanda^5, Jean B. Nachega^6 and Lehana Thabane^3; ^1Agensys Inc. (Astellas), ^2University of Ottawa/McMaster University, ^3McMaster University, ^4McMaster University/University of Toronto, ^5The AIDS Support Organization, ^6Stellenbosch University

10:20 AM  Ecological Momentary Assessment Methods to Increase Response and Adjust for Attrition in a Study of Middle School Students' Exposure to Alcohol Advertising
Steven Martino, Rebecca Collins, Stephanie Kovalchik, Kirsten Becker, Elizabeth D'Amico, William Shadel and Marc Elliott, RAND Corporation

10:40 AM  Is Poor Antisaccade Performance in Healthy First-Degree Relatives of Schizophrenics an Artifact of Study Design?
Charity J. Morgan^1, Mark F. Lenzenweger^2 and Deborah L. Levy^3; ^1University of Alabama at Birmingham, ^2State University of New York at Binghamton, ^3McLean Hospital

11:00 AM  Analysis of a Vaccine Study in Animals using Mitigated Fraction in SAS
Mathew Rosales, Experis

11:20 AM  Competing Risks Survival Analysis for Efficacy Evaluation of Some-or-None Vaccines
Paul T. Edlefsen, Fred Hutchinson Cancer Research Center

11:40 AM  Using Historical Data to Automatically Identify Air-Traffic Controller Behavior
Yuefeng Wu, University of Missouri at St. Louis

12:00 PM  Floor Discussion

Monday, June 16, 1:30 PM - 3:10 PM

Session 11: Lifetime Data Analysis (Invited)
Room: Salon A, Lower Level 1
Organizer: Mei-Ling Ting Lee, University of Maryland
Chair: Mei-Ling Ting Lee, University of Maryland

1:30 PM  Analysis of Multiple Type Recurrent Events When Only Partial Information Is Available for Some Subjects
Min Zhan and Jeffery Fink, University of Maryland

1:55 PM  Cumulative Incidence Function under Two-Stage Randomization
Idil Yavuz^1, Yu Cheng^2 and Abdus Wahed^2; ^1Dokuz Eylul University, ^2University of Pittsburgh

2:20 PM  Nonparametric Threshold Selection with Censored Survival Data
Xinhua Liu and Zhezhen Jin, Columbia University

2:45 PM  Floor Discussion

Session 12: Safety Signal Detection and Safety Analysis (Invited)
Room: Salon B, Lower Level 1
Organizer: Qi Jiang, Amgen Inc.
Chair: Qi Jiang, Amgen Inc.

1:30 PM  Evaluation of Statistical Methods for the Identification of Potential Safety Signals
Maggie Chen, Li Zhu, Padmaja Chiruvolu, Liying Zhang and Qi Jiang, Amgen Inc.

1:55 PM  Application of a Bayesian Method for Blinded Safety Monitoring and Signal Detection in Clinical Trials
Shihua Wen, Jyotirmoy Dey, Greg Ball and Karolyn Kracht, AbbVie Inc.

2:20 PM  Some Thoughts on the Choice of Metrics for Safety Evaluation
Steven Snapinn, Amgen Inc.

2:45 PM  Hypothesis Testing on Safety Data: A Recurrent Event Approach
Qi Gong^1 and Liang Fang^2; ^1Amgen Inc., ^2Gilead Sciences

3:10 PM  Floor Discussion

Session 13: Survival and Recurrent Event Data Analysis (Invited)
Room: Salon C, Lower Level 1
Organizer: Chiung-Yu Huang, Johns Hopkins University
Chair: Chiung-Yu Huang, Johns Hopkins University

1:30 PM  Survival Analysis without Survival Data
Gary Chan, University of Washington

1:55 PM  Semiparametric Estimation for the Additive Hazards Model with Left-Truncated and Right-Censored Data
Chiung-Yu Huang^1 and Jing Qin^2; ^1Johns Hopkins University, ^2National Institute of Allergy and Infectious Diseases

2:20 PM  Nonparametric Method for Data of Recurrent Infections after Hematopoietic Cell Transplantation
Chi Hyun Lee^1, Xianghua Luo^1, Chiung-Yu Huang^2 and Todd DeFor^1; ^1University of Minnesota, ^2Johns Hopkins University

2:45 PM  Floor Discussion

Session 14: Statistical Analysis on Massive Data from Point Processes (Invited)
Room: Salon D, Lower Level 1
Organizer: Haonan Wang, Colorado State University
Chair: Chunming Zhang, University of Wisconsin-Madison

1:30 PM  Identification of Synaptic Learning Rule from Ensemble Spiking Activities
Dong Song and Theodore W. Berger, University of Southern California

1:55 PM  Intrinsically Weighted Means and Non-Ergodic Marked Point Processes
Alexander Malinowski^1, Martin Schlather^1 and Zhengjun Zhang^2; ^1University of Mannheim, ^2University of Wisconsin

2:20 PM  Statistical Analysis for Unlabeled Data Objects
Ela Sienkiewicz and Haonan Wang, Colorado State University


2:45 PM  Floor Discussion

Session 15: High Dimensional Inference (or Testing) (Invited)
Room: Salon G, Lower Level 1
Organizer: Pengsheng Ji, University of Georgia
Chair: Pengsheng Ji, University of Georgia

1:30 PM  Adaptive Sparse Reduced-rank Regression
Zongming Ma and Tingni Sun, University of Pennsylvania

1:55 PM  Variable Screening in Biothreat Detection Using Weighted Leverage Score
Wenxuan Zhong and Yiwen Liu, University of Georgia

2:20 PM  Testing High-Dimensional Nonparametric Function with Application to Gene Set Analysis
Tao He, Ping-Shou Zhong, Yuehua Cui and Vidyadhar Mandrekar, Michigan State University

2:45 PM  Zero-Inflation in Clustered Binary Response Data: Mixed Model and Estimating Equation Approaches
Danping Liu, National Institutes of Health

3:10 PM  Floor Discussion

Session 16: Phase II Clinical Trial Design with Survival Endpoint (Invited)
Room: Salon H, Lower Level 1
Organizer: Jianrong Wu, St. Jude Children's Research Hospital
Chair: Joan Hu, Simon Fraser University

1:30 PM  Utility-Based Optimization of Schedule-Dose Regimes based on the Times to Response and Toxicity
Peter F. Thall^1, Hoang Q. Nguyen^1, Thomas Braun^2 and Muzaffar Qazilbash^1; ^1University of Texas MD Anderson Cancer Center, ^2University of Michigan

1:55 PM  Bayesian Decision Theoretic Two-Stage Design in Phase II Clinical Trials with Survival Endpoint
Lili Zhao and Jeremy Taylor, University of Michigan

2:20 PM  Single-Arm Phase II Group Sequential Trial Design with Survival Endpoint at a Fixed Time Point
Jianrong Wu and Xiaoping Xiong, St. Jude Children's Research Hospital

2:45 PM  Floor Discussion

Session 17: Statistical Modeling of High-throughput Genomics Data (Invited)
Room: Salon I, Lower Level 1
Organizer: Mingyao Li, University of Pennsylvania School of Medicine
Chair: Mingyao Li, University of Pennsylvania

1:30 PM  Learning Genetic Architecture of Complex Traits Across Populations
Marc Coram, Sophie Candille and Hua Tang, Stanford University

1:55 PM  A Bayesian Hierarchical Model to Detect Differentially Methylated Loci from Single Nucleotide Resolution Sequencing Data
Hao Feng, Karen Conneely and Hao Wu, Emory University

2:20 PM  Differential Isoform Expression Analysis in RNA-Seq using Random-Effects Meta-Regression
Weihua Guan^1, Rui Xiao^2, Chun Li^3 and Mingyao Li^2; ^1University of Minnesota, ^2University of Pennsylvania, ^3Vanderbilt University

2:45 PM  Allele-Specific Differential Methylation Analysis with Next Generation Methylation Sequencing Data
Fei Zou, University of North Carolina at Chapel Hill

3:10 PM  Floor Discussion

Session 18: Statistical Applications in Finance (Invited)
Room: Portland Room, Lower Level 1
Organizer: Zheng Su, Deerfield Company
Chair: Zheng Su, Deerfield Company

1:30 PM  A Stochastic Mixture Model for Economic Cycles
Haipeng Xing^1 and Ning Sun^2; ^1State University of New York, ^2IBM

1:55 PM  Statistical Modelling of Bidding Prices in Online ad Position Auctions
Xiaoming Huo, Georgia Institute of Technology

2:20 PM  Regression with Rank Covariates: A Distribution Guided Scores for Ranks
Do Hwan Park^1, Yuneung Kim^2, Johan Lim^3, Sujung Choi^4 and Hsun-Chih Kuo^5; ^1University of Maryland, ^2Seoul National University, ^3Auburn University, ^4Ulsan National Institute of Science and Technology, ^5National Chengchi University

2:45 PM  Optimal Sparse Volatility Matrix Estimation for High Dimensional Ito Processes with Measurement Errors
Minjing Tao^1, Yazhen Wang^2 and Harrison Zhou^3; ^1Florida State University, ^2University of Wisconsin-Madison, ^3Yale University

3:10 PM  Floor Discussion

Session 19: Hypothesis Testing (Contributed)
Room: Eugene Room, Lower Level 1
Chair: Fei Tan, Indiana University-Purdue University

1:30 PM  A Score-type Test for Heterogeneity in Zero-inflated Models in a Stratified Population
Guanqun Cao^1, Wei-Wen Hsu^2 and David Todem^3; ^1Auburn University, ^2Kansas State University, ^3Michigan State University

1:50 PM  Inferences on Correlation Coefficients of Bivariate Log-normal Distributions
Guoyi Zhang^1 and Zhongxue Chen^2; ^1University of New Mexico, ^2Indiana University

2:10 PM  Testing Calibration of Risk Models at Extremes of Disease Risk
Minsun Song^1, Peter Kraft^2, Amit D. Joshi^2, Myrto Barrdahl^3 and Nilanjan Chatterjee^1; ^1National Cancer Institute, ^2Harvard University, ^3German Cancer Research Center

2:30 PM  Statistical Issues When Incidence Rates Are Extremely Low and Sample Sizes Very Big
Peter Hu and Haijun Ma, Amgen Inc.


2:50 PM  Minimum Distance Regression Model Checking When Responses are Missing at Random
Xiaoyu Li, Auburn University

3:10 PM  Floor Discussion

Session 20: Design and Analysis of Clinical Trials (Contributed)
Room: Salem Room, Lower Level 1
Chair: Amei Amei, University of Nevada at Las Vegas

1:30 PM  Application of Bayesian Approach in Assessing Rare Adverse Events during a Clinical Study
Grace Li, Karen Price, Haoda Fu and David Manner, Eli Lilly and Company

1:50 PM  A Simplified Varying-Stage Adaptive Phase II/III Clinical Trial Design
Gaohong Dong, Novartis Pharmaceuticals Corporation

2:10 PM  Improving Multiple Comparison Procedures with Coprimary Endpoints by Generalized Simes Tests
Hua Li^1, Willi Maurer^1, Werner Brannath^2 and Frank Bretz^1; ^1Novartis Pharmaceuticals Corporation, ^2University of Bremen

2:30 PM  Efficient Design for Cluster Randomized Trials with Binary Outcomes
Sheng Wu, Weng Kee Wong and Catherine Crespi, University of California at Los Angeles

2:50 PM  Zero Event and Continuity Correction in Meta-Analyses of Rare Events Using Mantel-Haenszel Odds Ratio and Risk Difference
Tianyue Zhou, Sanofi-aventis US LLC

3:10 PM  Floor Discussion

Monday, June 16, 3:30 PM - 5:10 PM

Session 21: New Methods for Big Data (Invited)
Room: Salon A, Lower Level 1
Organizer: Yichao Wu, North Carolina State University
Chair: Yichao Wu, North Carolina State University

3:30 PM  Sure Independence Screening for Gaussian Graphical Models
Shikai Luo^1, Daniela Witten^2 and Rui Song^1; ^1North Carolina State University, ^2University of Washington

3:55 PM  Case-Specific Random Forests
Ruo Xu^1, Dan Nettleton^2 and Daniel J. Nordman^2; ^1Google, ^2Iowa State University

4:20 PM  Uncertainty Quantification for Massive Data Problems using Generalized Fiducial Inference
Randy C.S. Lai^1, Jan Hannig^2 and Thomas C.M. Lee^1; ^1University of California at Davis, ^2University of North Carolina at Chapel Hill

4:45 PM  OEM Algorithm for Big Data
Xiao Nie and Peter Z.G. Qian, University of Wisconsin-Madison

5:10 PM  Floor Discussion

Session 22: New Statistical Methods for Analysis of High Dimensional Genomic Data (Invited)
Room: Salon B, Lower Level 1
Organizer: Michael C. Wu, Fred Hutchinson Cancer Research Center
Chair: Michael C. Wu, Fred Hutchinson Cancer Research Center

3:30 PM  Integrative Modeling of Multi-Platform Genomic Data under the Framework of Mediation Analysis
Yen-Tsung Huang, Brown University

3:55 PM  Estimation of High Dimensional Directed Acyclic Graphs using eQTL Data
Wei Sun^1 and Min Jin Ha^2; ^1University of North Carolina at Chapel Hill, ^2University of Texas MD Anderson Cancer Center

4:20 PM  Prioritizing Disease Genes from Genome-wide Association Studies Through Dynamic Co-expression Networks
Lin Hou^1, Min Chen^2, Clarence Zhang^3, Judy Cho^4 and Hongyu Zhao^1; ^1Yale University, ^2University of Texas at Dallas, ^3Bristol-Myers Squibb, ^4Mount-Sinai Medical Center

4:45 PM  Kernel Machine Methods for Joint Testing and Integrative Analysis of Genome Wide Methylation and Genotyping Studies
Ni Zhao and Michael Wu, Fred Hutchinson Cancer Research Center

5:10 PM  Floor Discussion

Session 23: Recent Advances in Analysis of Longitudinal Data with Informative Observation Process (Invited)
Room: Salon C, Lower Level 1
Organizer: Jing Ning, University of Texas MD Anderson Cancer Center
Chair: Weining Shen, The University of Texas MD Anderson Cancer Center

3:30 PM  Joint Modeling of Alternating Recurrent Transition Times
Liang Li, University of Texas MD Anderson Cancer Center

3:55 PM  Regression Analysis of Panel Count Data with Informative Observation Times
Yang Li^1, Xin He^2, Haiying Wang^3 and Jianguo Sun^4; ^1University of North Carolina at Charlotte, ^2University of Maryland, ^3University of New Hampshire, ^4University of Missouri at Columbia

4:20 PM  Envelope Linear Mixed Model
Xin Zhang, University of Minnesota

4:45 PM  Regression Analysis of Longitudinal Data with Irregular and Informative Observation Times
Yong Chen, Jing Ning and Chunyan Cai, University of Texas Health Science Center at Houston

5:10 PM  Floor Discussion

Session 24: Bayesian Models for High Dimensional Complex Data (Invited)
Room: Salon D, Lower Level 1
Organizer: Juhee Lee, University of California at Santa Cruz
Chair: Juhee Lee, University of California at Santa Cruz


3:30 PM  A Bayesian Feature Allocation Model for Tumor Heterogeneity
Juhee Lee^1, Peter Mueller^2, Yuan Ji^3 and Kamalakar Gulukota^4; ^1University of California at Santa Cruz, ^2University of Texas at Austin, ^3University of Chicago, ^4NorthShore University HealthSystem

3:55 PM  Some Results on the One-Way ANOVA Model with an Increasing Number of Groups
Feng Liang, University of Illinois at Urbana-Champaign

4:20 PM  Bayesian Graphical Models for Differential Pathways
Riten Mitra^1, Peter Mueller^2 and Yuan Ji^3; ^1University of Louisville, ^2University of Texas at Austin, ^3NorthShore University HealthSystem/University of Chicago

4:45 PM  Latent Space Models for Dynamic Networks
Yuguo Chen, University of Illinois at Urbana-Champaign

5:10 PM  Floor Discussion

Session 25: Statistical Methods for Network Analysis (Invited)
Room: Salon G, Lower Level 1
Organizer: Yunpeng Zhao, George Mason University
Chair: Yunpeng Zhao, George Mason University

3:30 PM  Consistency of Co-clustering for Exchangeable Graph and Array Data
David S. Choi^1 and Patrick J. Wolfe^2; ^1Carnegie Mellon University, ^2University College London

3:55 PM  Laplacian Shrinkage for Inverse Covariance Estimation from Heterogeneous Populations
Takumi Saegusa and Ali Shojaie, University of Washington

4:20 PM  Estimating Signature Subgraphs in Samples of Labeled Graphs
Juhee Cho and Karl Rohe, University of Wisconsin-Madison

4:45 PM  Fast Hierarchical Modeling for Recommender Systems
Patrick Perry, New York University

5:10 PM  Floor Discussion

Session 26: New Analysis Methods for Understanding Complex Diseases and Biology (Invited)
Room: Salon H, Lower Level 1
Organizer: Wenyi Wang, University of Texas MD Anderson Cancer Center
Chair: Wenyi Wang, University of Texas MD Anderson Cancer Center

3:30 PM  Data-Integration for Identifying Clinically Important Long Non-coding RNA in Cancer
Yiwen Chen^1, Zhou Du^2, Teng Fei^1, Roel G.W. Verhaak^3, Yong Zhang^2, Myles Brown^4 and X. Shirley Liu^4; ^1Dana Farber Cancer Institute, ^2Tongji University, ^3University of Texas MD Anderson Cancer Center, ^4Dana Farber Cancer Institute & Harvard University

3:55 PM  Data Integration for Genetics-Based Drug Repurposing in Complex Diseases
Di Wu, Harvard University

4:30 PM  Comparative Meta-Analysis of Prognostic Gene Signatures for Late-Stage Ovarian Cancer
Levi Waldron, Hunter College

4:45 PM  Studying Spatial Organizations of Chromosomes via Parametric Model
Ming Hu^1, Yu Zhu^2, Zhaohui Steve Qin^3, Ke Deng^4 and Jun S. Liu^5; ^1New York University, ^2Purdue University, ^3Emory University, ^4Tsinghua University, ^5Harvard University

5:10 PM  Floor Discussion

Session 27: Recent Advances in Time Series Analysis (Invited)
Room: Salon I, Lower Level 1
Organizer: Mikyoung Jun, Texas A&M University
Chair: Zhengjun Zhang, University of Wisconsin

3:30 PM  Time Series Models for Spherical Data with Applications in Structural Biochemistry
Jay Breidt, Daniel Hernandez-Stumpfhauser and Mark van der Woerd, Colorado State University

3:55 PM  Semiparametric Estimation of Spectral Density Function with Irregular Data
Shu Yang and Zhengyuan Zhu, Iowa State University

4:20 PM  On the Prediction of Stationary Functional Time Series
Alexander Aue^1, Diogo Dubart Norinho^2 and Siegfried Hormann^3; ^1University of California at Davis, ^2University College London, ^3Université Libre de Bruxelles

4:45 PM  A Composite Likelihood-based Approach for Multiple Change-point Estimation in Multivariate Time Series Models
Chun Yip Yau and Ting Fung Ma, Chinese University of Hong Kong

5:10 PM  Floor Discussion

Session 28: Analysis of Correlated Longitudinal and Survival Data (Invited)
Room: Eugene Room, Lower Level 1
Organizer: Jingjing Wu, University of Calgary
Chair: Jingjing Wu, University of Calgary

3:30 PM  Analysis of a Non-Randomized Longitudinal Quality of Life Trial
Mounir Mesbah, University of Paris 6

3:55 PM  Power and Sample Size Calculations for Evaluating Mediation Effects with Multiple Mediators in Longitudinal Studies
Cuiling Wang, Albert Einstein College of Medicine

4:20 PM  Distribution-free First-hitting-time Based Threshold Regressions for Lifetime Data
Mei-Ling Ting Lee^1 and G. Alex Whitmore^2; ^1University of Maryland, ^2McGill University

4:45 PM  Joint Modeling of Survival Data and Mismeasured Longitudinal Data using the Proportional Odds Model
Juan Xiong^1, Wenqing He^1 and Grace Yi^2; ^1University of Western Ontario, ^2University of Waterloo

5:10 PM  Floor Discussion


Session 29: Clinical Pharmacology (Invited)
Room: Portland Room, Lower Level 1
Organizer: Christine Wang, Amgen
Chair: Christine Wang, Amgen

3:30 PM  Truly Personalizing Medicine
Mike D. Hale, Amgen Inc.

3:55 PM  What Do Statisticians Do in Clinical Pharmacology?
Brian Smith, Amgen Inc.

4:20 PM  The Use of Modeling and Simulation to Bridge Different Dosing Regimens - a Case Study
Chyi-Hung Hsu and Jose Pinheiro, Janssen Research & Development

4:45 PM  A Comparison of FDA and EMA Recommended Models for Bioequivalence Studies
Yongwu Shao, Lingling Han, Bing Gao, Sally Zhao, Susan Guo, Lijie Zhong and Liang Fang, Gilead Sciences

5:10 PM  Floor Discussion

Session 30: Sample Size Estimation (Contributed)
Room: Salem Room, Lower Level 1
Chair: Antai Wang, New Jersey Institute of Technology

3:30 PM  Sample Size Calculation with Semiparametric Analysis of Long Term and Short Term Hazards
Yi Wang, Novartis Pharmaceuticals Corporation

3:50 PM  Sample Size and Decision Criteria for Phase IIB Studies with Active Control
Xia Xu, Merck & Co.

4:10 PM  Sample Size Determination for Clinical Trials to Correlate Outcomes with Potential Predictors
Su Chen, Xin Wang and Ying Zhang, AbbVie Inc.

4:30 PM  Sample Size Re-Estimation at Interim Analysis in Oncology Trials with a Time-to-Event Endpoint
Ian (Yi) Zhang, Sunovion Pharmaceuticals Inc.

4:50 PM  Statistical Inference and Sample Size Calculation for Paired Binary Outcomes with Missing Data
Song Zhang, University of Texas Southwestern Medical Center

5:10 PM  Floor Discussion

Tuesday, June 17, 8:20 AM - 9:30 AM

Keynote Session II (Keynote)
Room: Salon E-F, Lower Level 1
Organizers: ICSA-KISS 2014 Organizing Committee
Chair: Rochelle Fu, Oregon Health & Science University

8:20 AM  Keynote Lecture II
Sharon-Lise Normand, Harvard University

9:30 AM  Floor Discussion

Tuesday, June 17, 10:00 AM - 12:00 PM

Session 31: Predictions in Clinical Trials (Invited)
Room: Salon A, Lower Level 1
Organizer: Yimei Li, University of Pennsylvania
Chair: Daniel Heitjan, University of Pennsylvania

10:00 AM  Predicting Smoking Cessation Outcomes Beyond Clinical Trials
Yimei Li, E. Paul Wileyto and Daniel F. Heitjan, University of Pennsylvania

10:25 AM  Bayesian Event and Time Landmark Estimation in Clinical Trials When Responses Are Failure Time Data
Haoda Fu, Luping Zhao and Yanping Wang, Eli Lilly and Company

10:50 AM  Predicting the Probability of Future Clinical Study Success Based on the Evidence from Electronic Medical Record (EMR) Data
Haoda Fu^1 and Nan Jia^2; ^1Eli Lilly and Company, ^2University of Southern California

11:15 AM  Weibull Cure-Mixture Model for the Prediction of Event Times in Randomized Clinical Trials
Gui-shuang Ying^1, Qiang Zhang^2, Yimei Li^1 and Daniel F. Heitjan^1; ^1University of Pennsylvania, ^2Radiation Therapy Oncology Group Statistical Center

11:40 AM  Floor Discussion

Session 32: Recent Advances in Statistical Genetics (Invited)
Room: Salon B, Lower Level 1
Organizer: Taesung Park, Seoul National University
Chair: Taesung Park, Seoul National University

10:00 AM  Longitudinal Exome-Focused GWAS of Alcohol Use in a Veteran Cohort
Zuoheng Wang, Zhong Wang, Amy C. Justice and Ke Xu, Yale University

10:25 AM  Type I Error in Regression-based Genetic Model Building
Heejong Sung^1, Alexa J.M. Sorant^1, Bhoom Suktitipat^2 and Alexander F. Wilson^1; ^1National Institutes of Health, ^2Mahidol University

10:50 AM  GMDR: A Conceptual Framework for Detection of Multifactor Interactions Underlying Complex Traits
Xiang-Yang Lou, University of Alabama at Birmingham

11:15 AM  Gene-Gene Interaction Analysis for Rare Variants: Application to T2D Exome Sequencing Data
Taesung Park^1, Min-Seok Kwon^1 and Seung Yeoun Lee^2; ^1Seoul National University, ^2Sejong University

11:40 AM  Floor Discussion

Session 33: Structured Approach to High Dimensional Data with Sparsity and Low Rank Factorization (Invited)
Room: Salon C, Lower Level 1
Organizer: Yoonkyung Lee, Ohio State University
Chair: Yoonkyung Lee, Ohio State University

10:00 AM  Two-way Regularized Matrix Decomposition
Jianhua Huang, Texas A&M University


10:25 AM  Tensor Regression with Applications in Neuroimaging Analysis
Hua Zhou^1, Lexin Li^1 and Hongtu Zhu^2; ^1North Carolina State University, ^2University of North Carolina at Chapel Hill

10:50 AM  RKHS-Embedding Based Feature Screening for High-Dimensional Data
Krishnakumar Balasubramanian^1, Bharath Sriperumbudur^2 and Guy Lebanon^1; ^1Georgia Institute of Technology, ^2Pennsylvania State University

11:15 AM  Sparse Conditional Graphical Models for Structured Genetic Datasets
Hyonho Chun, Purdue University

11:40 AM  Floor Discussion

Session 34: Recent Developments in Dimension Reduction, Variable Selection, and Their Applications (Invited)
Room: Salon D, Lower Level 1
Organizer: Xiangrong Yin, University of Georgia
Chair: Pengsheng Ji, University of Georgia

10:00 AM  Variable Selection and Model Estimation via Subtle Uprooting
Xiaogang Su, University of Texas at El Paso

10:25 AM  Robust Variable Selection Through Dimension Reduction
Qin Wang, Virginia Commonwealth University

10:50 AM  Sparse Envelope Model: Efficient Estimation and Response Variable Selection in Multivariate Linear Regression
Zhihua Su^1, Guangyu Zhu^1 and Xin Chen^2; ^1University of Florida, ^2National University of Singapore

11:15 AM  Floor Discussion

Session 35: Post-Discontinuation Treatment in Randomized Clinical Trials (Invited)
Room: Salon G, Lower Level 1
Organizer: Li Li, Research Scientist, Eli Lilly and Company
Chair: Li Li, Eli Lilly and Company

10:00 AM  Marginal Structure Model with Adaptive Truncation in Estimating the Initial Treatment Effect with Informative Censoring by Subsequent Therapy
Jingyi Liu^1, Li Li^1, Xiaofei Bai^2 and Douglas Faries^1; ^1Eli Lilly and Company, ^2North Carolina State University

10:25 AM  Quantile Regression Adjusting for Dependent Censoring from Semi-Competing Risks
Ruosha Li^1 and Limin Peng^2; ^1University of Pittsburgh, ^2Emory University

10:50 AM  Overview of Crossover Design
Ming Zhu, AbbVie Inc.

11:15 AM  Cross-Payer Effects of Medicaid LTSS on Medicare Resource Use using Propensity Score Risk Profiling
Yi Huang, Anthony Tucker and Karen Johnson, University of Maryland

11:40 AM  Floor Discussion

Session 36: New Advances in Semi-parametric Modeling and Survival Analysis (Invited)
Room: Salon H, Lower Level 1
Organizer: Yichuan Zhao, Georgia State University
Chair: Xuelin Huang, University of Texas MD Anderson Cancer Center

10:00 AM  Bayesian Partial Linear Model for Skewed Longitudinal Data
Yuanyuan Tang^1, Debajyoti Sinha^2, Debdeep Pati^2, Stuart Lipsitz^3 and Steven Lipshultz^4; ^1AbbVie Inc., ^2Florida State University, ^3Brigham and Women's Hospital, ^4University of Miami

10:25 AM  Nonparametric Inference for Inverse Probability Weighted Estimators with a Randomly Truncated Sample
Xu Zhang, University of Mississippi

10:50 AM  Modeling Time-Varying Effects for High-Dimensional Covariates: A New Gateaux-Differential Boosting Approach
Kevin He, Yi Li and Ji Zhu, University of Michigan

11:15 AM  Flexible Modeling of Survival Data with Covariates Subject to Detection Limits via Multiple Imputation
Paul Bernhardt^1, Judy Wang^2 and Daowen Zhang^2; ^1Villanova University, ^2North Carolina State University

11:40 AM  Floor Discussion

Session 37: High-dimensional Data Analysis: Theory and Application (Invited)
Room: Salon I, Lower Level 1
Organizer: Yang Feng, Columbia University
Chair: Yang Feng, Columbia University

10:00 AM  Structured Functional Additive Regression in Reproducing Kernel Hilbert Spaces
Hao Helen Zhang, University of Arizona

10:25 AM  High-Dimensional Thresholded Regression and Shrinkage Effect
Zemin Zheng, Yingying Fan and Jinchi Lv, University of Southern California

10:50 AM  Local Independence Feature Screening for Nonparametric and Semiparametric Models by Marginal Empirical Likelihood
Jinyuan Chang^1, Cheng Yong Tang^2 and Yichao Wu^3; ^1University of Melbourne, ^2University of Colorado Denver, ^3North Carolina State University

11:15 AM  The Fused Kolmogorov Filter: A Nonparametric Model-Free Screening Method
Qing Mai^1 and Hui Zou^2; ^1Florida State University, ^2University of Minnesota

11:40 AM  Floor Discussion

Session 38: Leading Across Boundaries: Leadership Development for Statisticians (Invited Discussion Panel)
Room: Eugene Room, Lower Level 1
Organizers: Ming-Dauh Wang, Eli Lilly and Company; Rochelle Fu, Oregon Health & Science University (fur@ohsu.edu)
Chair: Ming-Dauh Wang, Eli Lilly and Company

Topic: The panel will discuss issues related to the importance of leadership, barriers to leadership, overcoming barriers, communication, and sociability.


Panel: Xiao-Li Meng, Harvard University
Dipak Dey, University of Connecticut
Soonmin Park, Eli Lilly and Company
James Hung, United States Food and Drug Administration
Walter Offen, AbbVie Inc.

Session 39: Recent Advances in Adaptive Designs in Early Phase Trials (Invited)
Room: Portland Room, Lower Level 1
Organizer: Ken Cheung, Columbia University
Chair: Ken Cheung, Columbia University

10:00 AM  A Toxicity-Adaptive Isotonic Design for Combination Therapy in Oncology
Rui Qin, Mayo Clinic

10:25 AM  Calibration of the Likelihood Continual Reassessment Method for Phase I Clinical Trials
Shing Lee^1, Xiaoyu Jia^2 and Ying Kuen Cheung^1; ^1Columbia University, ^2Boehringer Ingelheim Pharmaceuticals

10:50 AM  Sequential Subset Selection Procedure of Random Subset Size for Early Phase Clinical Trials
Cheng-Shiun Leu and Bruce Levin, Columbia University

11:15 AM  Search Procedures for the MTD in Phase I Trials
Shelemyahu Zacks, Binghamton University

11:40 AM  Floor Discussion

Session 40: High Dimensional Regression/Machine Learning (Contributed)
Room: Salem Room, Lower Level 1
Chair: Hanxiang Peng, Indiana University-Purdue University

10:00 AM  Variable Selection for High-Dimensional Nonparametric Ordinary Differential Equation Models with Applications to Dynamic Gene Regulatory Networks
Hongqi Xue^1, Tao Lu^2, Hua Liang^3 and Hulin Wu^1; ^1University of Rochester, ^2State University of New York at Albany, ^3George Washington University

10:20 AM  BigData: Sign Cauchy Projections and Chi-Square Kernels
Ping Li^1, Gennady Samorodnitsky^2 and John Hopcroft^2; ^1Rutgers University, ^2Cornell University

10:40 AM  A Sparse Linear Discriminant Analysis Method with Asymptotic Optimality for Multiclass Classification
Ruiyan Luo and Xin Qi, Georgia State University

11:00 AM  Generalized Hidden Markov Model for Variant Detection
Yichen Cheng, James Dai and Charles Kooperberg, Fred Hutchinson Cancer Research Center

11:20 AM  Large-Scale Joint Trait Risk Prediction for Mini-exome Sequence Data
Gengxin Li, Wright State University

11:40 AM  Rank Estimation and Recovery of Low-rank Matrices for Factor Model with Heteroscedastic Noise
Jingshu Wang and Art B. Owen, Stanford University

12:00 PM  Floor Discussion

Tuesday, June 17, 1:30 PM - 3:10 PM

Session 41: Distributional Inference and its Impact on Statistical Theory and Practice (Invited)
Room: Salon A, Lower Level 1
Organizers: Min-ge Xie, Rutgers University; Thomas Lee, University of California at Davis (thomascmlee@gmail.com)
Chair: Min-ge Xie, Rutgers University

1:30 PM Stat Wars Episode IV: A New Hope (For Objective Inference)
Keli Liu and Xiao-Li Meng, Harvard University

1:55 PM Higher Order Asymptotics for Generalized Fiducial Inference
Abhishek Pal Majumdar and Jan Hannig, University of North Carolina at Chapel Hill

2:20 PM Generalized Inferential Models
Ryan Martin, University of Illinois at Chicago

2:45 PM Formal Definition of Reference Priors under a General Class of Divergence
Dongchu Sun, University of Missouri

3:10 PM Floor Discussion

Session 42: Applications of Spatial Modeling and Imaging Data (Invited)
Room: Salon B, Lower Level 1
Organizer: Karen Kafadar, Indiana University
Chair: Karen Kafadar, Indiana University

1:30 PM Spatial Bayesian Variable Selection and Shrinkage in High-dimensional Covariate Spaces with Application to fMRI
Fan Li1, Tingting Zhang (co-first author)2, Quanli Wang1 and James Coan2; 1Duke University, 2University of Virginia

1:55 PM A Hierarchical Model for Simultaneous Detection and Estimation in Multi-Subject fMRI Studies
David Degras1 and Martin Lindquist2; 1DePaul University, 2Johns Hopkins University

2:20 PM On the Relevance of Accounting for Spatial Correlation: A Case Study from Florida
Linda J. Young1 and Emily Leary2; 1USDA NASS RDD, 2University of Florida

2:45 PM Statistical Approaches for Calibration of Climate Models
Gabriel Huerta1, Charles Jackson2 and Alvaro Nosedal1; 1University of New Mexico, 2University of Texas at Austin

3:10 PM Floor Discussion

Session 43: Recent Development in Survival Analysis and Statistical Genetics (Invited)
Room: Salon C, Lower Level 1
Organizers: Junlong Li, Harvard University; KyuHa Lee, Harvard University
Chair: Junlong Li, Harvard University

1:30 PM Restricted Survival Time and Non-proportional Hazards
Zhigang Zhang, Memorial Sloan Kettering Cancer Center

2014 ICSA-KISS Applied Statistics Symposium Portland June 15-18 27


1:55 PM Empirical Null using Mixture Distributions and Its Application in Local False Discovery Rate
DoHwan Park, University of Maryland

2:20 PM A Bayesian Illness-Death Model for the Analysis of Correlated Semi-Competing Risks Data
Kyu Ha Lee1, Sebastien Haneuse1, Deborah Schrag2 and Francesca Dominici1; 1Harvard University, 2Dana Farber Cancer Institute

2:45 PM Detection of Chromosome Copy Number Variations in Multiple Sequences
Xiaoyi Min, Chi Song and Heping Zhang, Yale University

3:10 PM Floor Discussion

Session 44: Bayesian Methods and Applications in Clinical Trials with Small Population (Invited)
Room: Salon D, Lower Level 1
Organizer: Alan Chiang, Eli Lilly and Company
Chair: Ming-Dauh Wang, Eli Lilly and Company

1:30 PM Applications of Bayesian Meta-Analytic Approach at Novartis
Qiuling Ally He, Roland Fisch and David Ohlssen, Novartis Pharmaceuticals Corporation

1:55 PM Subgroup-Based Adaptive (SUBA) Designs for Multi-Arm Biomarker Trials
Yanxun Xu1, Lorenzo Trippa2, Peter Mueller1 and Yuan Ji3; 1University of Texas at Austin, 2Harvard University, 3University of Texas at Austin

2:20 PM Innovative Designs and Practical Considerations for Pediatric Studies
Alan Y. Chiang, Eli Lilly and Company

2:45 PM Discussant: Ming-Dauh Wang, Eli Lilly and Company

3:10 PM Floor Discussion

Session 45: Recent Developments in Assessing Predictive Models in Survival Analysis (Invited)
Room: Salon G, Lower Level 1
Organizer: Ming Wang, Penn State College of Medicine
Chair: Lijun Zhang, Penn State College of Medicine

1:30 PM partDSA for Deriving Survival Risk Groups: Ensemble Learning and Variable Selection
Annette Molinaro1, Adam Olshen1 and Robert Strawderman2; 1University of California at San Francisco, 2University of Rochester

1:55 PM Predictive Accuracy of Time-Dependent Markers for Survival Outcomes
Li Chen1, Donglin Zeng2 and Danyu Lin2; 1University of Kentucky, 2University of North Carolina at Chapel Hill

2:20 PM Estimating the Effectiveness in HIV Prevention Trials by Incorporating the Exposure Process: Application to HPTN 035 Data
Jingyang Zhang1 and Elizabeth R. Brown2; 1Fred Hutchinson Cancer Research Center, 2Fred Hutchinson Cancer Research Center/University of Washington

2:45 PM Estimation of Predictive Accuracy of Survival Regression Models Adjusting for Dependent Censoring and/or High-Dimensional Data
Ming Wang1 and Qi Long2; 1Penn State College of Medicine, 2Emory University

3:10 PM Floor Discussion

Session 46: Missing Data: the Interface between Survey Sampling and Biostatistics (Invited)
Room: Salon H, Lower Level 1
Organizer: Jiwei Zhao, University of Waterloo
Chair: Peisong Han, University of Waterloo

1:30 PM Likelihood-based Inference with Missing Data Under Missing-at-random
Shu Yang and Jae Kwang Kim, Iowa State University

1:55 PM Generalized Method of Moments Estimator Based On Semiparametric Quantile Regression Imputation
Cindy Yu and Senniang Chen, Iowa State University

2:20 PM A New Estimation with Minimum Trace of Asymptotic Covariance Matrix for Incomplete Longitudinal Data with a Surrogate Process
Baojiang Chen1 and Jing Qin2; 1University of Nebraska, 2National Institutes of Health

2:45 PM Adaptive Multi-Phase Sampling for Asymptotically-Optimal Mean Score Analyses
Michael McIsaac1 and Richard Cook2; 1Queen's University, 2University of Waterloo

3:10 PM Floor Discussion

Session 47: New Statistical Methods for Comparative Effectiveness Research and Personalized Medicine (Invited)
Room: Salon I, Lower Level 1
Organizer: Jane Paik Kim, Stanford University
Chair: Jane Paik Kim, Stanford University

1:30 PM Efficient Design for Prospective Observational Studies
Yu Shen1, Hao Liu2, Jing Ning3 and Jing Qin4; 1University of Texas MD Anderson Cancer Center, 2Baylor College of Medicine, 3University of Texas MD Anderson Cancer Center, 4National Institutes of Health

1:55 PM Choice between Superiority and Non-inferiority in Comparative Effectiveness Experiments
Mei-Chiung Shih1, Tze Leung Lai2 and Philip W. Lavori2; 1VA Cooperative Studies Program & Stanford University, 2Stanford University

2:20 PM An Adaptive Design Approach for Studying Dynamic Treatment Regimes in a Pragmatic Trials Setting
Mike Baiocchi, Jane Paik and Tze Lai, Stanford University

2:45 PM Improving Efficiency in the Estimation of the Effect of Treatment on a Multinomial Outcome
Ivan Díaz, Michael Rosenblum and Elizabeth Colantuoni, Johns Hopkins University

3:10 PM Floor Discussion

Session 48: Student Award Session 1 (Invited)
Room: Portland Room, Lower Level 1
Organizer: ICSA-KISS 2014 Student Paper Award Committee
Chair: Zhezhen Jin, Columbia University



1:30 PM Regularization After Retention in Ultrahigh Dimensional Linear Regression Models
Haolei Weng1, Yang Feng1 and Xingye Qiao2; 1Columbia University, 2Binghamton University

1:55 PM Personalized Dose Finding Using Outcome Weighted Learning
Guanhua Chen1, Donglin Zeng1 and Michael R. Kosorok1; 1University of North Carolina at Chapel Hill

2:20 PM Survival Rates Prediction When Training Data and Target Data Have Different Measurement Error
Cheng Zheng and Yingye Zheng, Fred Hutchinson Cancer Research Center

2:45 PM Hard Thresholded Regression Via Linear Programming
Qiang Sun, University of North Carolina at Chapel Hill

3:10 PM Floor Discussion

Session 49: Network Analysis/Unsupervised Methods (Contributed)
Room: Eugene Room, Lower Level 1
Chair: Chunming Zhang, University of Wisconsin-Madison

1:30 PM Community Detection in Multilayer Networks: A Hypothesis Testing Approach
James D. Wilson, Shankar Bhamidi and Andrew B. Nobel, University of North Carolina at Chapel Hill

1:50 PM Network Enrichment Analysis with Incomplete Network Information
Jing Ma1, Ali Shojaie2 and George Michailidis1; 1University of Michigan, 2University of Washington

2:10 PM Estimation of A Linear Model with Fuzzy Data Treated as Special Functional Data
Wang Dabuxilatu, Guangzhou University

2:30 PM Efficient Estimation of Sparse Directed Acyclic Graphs Under Compounded Poisson Data
Sung Won Han and Hua Zhong, New York University

2:50 PM Asymptotically Normal and Efficient Estimation of Covariate-Adjusted Gaussian Graphical Model
Mengjie Chen, Zhao Ren, Hongyu Zhao and Harrison Zhou, Yale University

3:10 PM Floor Discussion

Session 50: Personalized Medicine and Adaptive Design (Contributed)
Room: Salem Room, Lower Level 1
Chair: Danping Liu, National Institutes of Health

1:30 PM MicroRNA Array Normalization
Li-Xuan Qin and Qin Zhou, Memorial Sloan Kettering Cancer Center

1:50 PM Combining Multiple Biomarker Models with Covariates in Logistic Regression Using Modified ARM (Adaptive Regression by Mixing) Approach
Yanping Qiu1 and Rong Liu2; 1Merck & Co, 2Bayer HealthCare

2:10 PM A New Association Test for Case-Control GWAS Based on Disease Allele Selection
Zhongxue Chen, Indiana University

2:30 PM On Classification Methods for Personalized Medicine and Individualized Treatment Rules
Daniel Rubin, United States Food and Drug Administration

2:50 PM Bayesian Adaptive Design for Dose-Finding Studies with Delayed Binary Responses
Xiaobi Huang1 and Haoda Fu2; 1Merck & Co, 2Eli Lilly and Company

3:10 PM Floor Discussion

Tuesday, June 17, 3:30 PM - 5:30 PM

Session 51: New Development in Functional Data Analysis (Invited)
Room: Salon A, Lower Level 1
Organizer: Guanqun Cao, Auburn University
Chair: Guanqun Cao, Auburn University

3:30 PM Variable Selection and Estimation for Longitudinal Survey Data
Li Wang1, Suojin Wang2 and Guannan Wang1; 1University of Georgia, 2Texas A&M University

3:55 PM Estimation of Nonlinear Differential Equation Model Using Generalized Smoothing
Inna Chervoneva1, Tatiyana V. Apanasovich2 and Boris Freydin1; 1Thomas Jefferson University, 2George Washington University

4:20 PM A Functional Data Approach to Modeling Brain Image Data
Yihong Zhao1, R. Todd Ogden2 and Huaihou Chen1; 1New York University, 2Columbia University

4:45 PM Estimation in Functional Linear Quantile Regression
Linglong Kong, Dengdeng Yu and Ivan Mizera, University of Alberta

5:10 PM Floor Discussion

Session 52: Recent Regulatory/Industry Experience in Biosimilar Trial Designs (Invited)
Room: Salon B, Lower Level 1
Organizer: Gang Li, Johnson & Johnson
Chair: Yi Wang, Novartis Pharmaceuticals Corporation

3:30 PM Statistical Considerations for the Development of Biosimilar Products
Nan Zhang and Eric Chi, Amgen Inc

3:50 PM New Analytical Methods for Non-Inferiority Trials: Covariate Adjustment and Sensitivity Analysis
Zhiwei Zhang, Lei Nie, Guoxing Soon and Bo Zhang, United States Food and Drug Administration

4:10 PM Where is the Right Balance for Designing an Efficient Biosimilar Clinical Program? A Biostatistic Perspective on Appropriate Applications of Statistical Principles from New Drug to Biosimilars
Yulan Li, Novartis Pharmaceuticals Corporation



4:30 PM Challenges of Designing/Analyzing Trials for Hepatitis C Drugs
Greg Soon, United States Food and Drug Administration

4:50 PM GSK's Patient-level Data Sharing Program
Shuyen Ho, GlaxoSmithKline plc

5:10 PM Floor Discussion

Session 53: Gatekeeping Procedures and Their Application in Pivotal Clinical Trials (Invited)
Room: Salon C, Lower Level 1
Organizer: Michael Lee, Johnson & Johnson
Chair: Michael Lee, Johnson & Johnson

3:30 PM A General Multistage Procedure for k-out-of-n Gatekeeping
Dong Xi1 and Ajit Tamhane2; 1Novartis Pharmaceuticals Corporation, 2Northwestern University

3:55 PM Multiple Comparisons in Complex Trial Designs
H.M. James Hung, United States Food and Drug Administration

4:20 PM Use of Bootstrapping in Adaptive Designs with Multiplicity Issues
Jeff Maca, Quintiles

4:45 PM Evaluating Commonly Used Multiple Testing Procedures in Drug Development
Michael Lee, Janssen Research & Development

5:10 PM Floor Discussion

Session 54: Approaches to Assessing Qualitative Interactions (Invited)
Room: Salon D, Lower Level 1
Organizer: Guohua (James) Pan, Johnson & Johnson
Chair: James Pan, Johnson & Johnson

3:30 PM Interval Based Graphical Approach to Assessing Qualitative Interaction
Guohua Pan and Eun Young Suh, Johnson & Johnson

3:55 PM Expected Variation and Quantitative and Qualitative Interaction in Clinical Trials
Xiaolong Luo, Celgene Corporation

4:20 PM A Bayesian Approach to Qualitative Interaction
Emine O. Bayman, University of Iowa

4:45 PM Discussant: Surya Mohanty, Johnson & Johnson

5:10 PM Floor Discussion

Session 55: Interim Decision-Making in Phase II Trials (Invited)
Room: Salon G, Lower Level 1
Organizer: Lanju Zhang, AbbVie Inc
Chair: Lanju Zhang, AbbVie Inc

3:30 PM Evaluation of Interim Dose Selection Methods Using ROC Approach
Deli Wang, Lu Cui, Lanju Zhang and Bo Yang, AbbVie Inc

3:55 PM Interim Monitoring for Futility Based on Probability of Success
Yijie Zhou1, Ruji Yao2, Bo Yang1 and Ram Suresh3; 1AbbVie Inc, 2Merck & Co, 3GlaxoSmithKline plc

4:20 PM Bayesian Adaptive Design in Oncology Early Phase Trials
Yuehui Wu and Ramachandran Suresh, GlaxoSmithKline plc

4:45 PM Discussant: Peng Chen, Celgene Corporation

5:10 PM Floor Discussion

Session 56: Recent Advancement in Statistical Methods (Invited)
Room: Salon H, Lower Level 1
Organizer: Dongseok Choi, Oregon Health & Science University
Chair: Dongseok Choi, Oregon Health & Science University

3:30 PM Exact Inference: New Methods and Applications
Ian Dinwoodie, Portland State University

3:55 PM Optimal Thresholds Criteria and Standard Criterion of VUS for ROC Surface
Chong Sun Hong, Sungkyunkwan University

4:20 PM Analysis of Cointegrated Models with Measurement Errors
Sung Ahn1, Hamwoom Hong2 and Sinsup Cho2; 1Washington State University, 2Seoul National University

4:45 PM A Direct Method to Evaluate the Time-Dependent Predictive Accuracy for Biomarkers
Weining Shen, Jing Ning and Ying Yuan, University of Texas MD Anderson Cancer Center

5:10 PM Floor Discussion

Session 57: Building Bridges between Research and Practice in Time Series Analysis (Invited)
Room: Salon I, Lower Level 1
Organizer: Jane Chu, IBM/SPSS
Chair: Jane Chu, IBM/SPSS

3:30 PM Temporal Causal Modeling: Methodology, Applications and Implementation
Naoki Abe1, Tanveer Faruquie1, Huijing Jiang1, Anju Kambadur1, Kimberly Lang1, Aurelie Lozano1 and Jinwoo Shin2; 1IBM, 2KAIST University

3:55 PM Time Series Research at the U.S. Census Bureau
Brian C. Monsell, U.S. Census Bureau

4:20 PM Issues Related to the Use of Time Series in Model Building and Analysis
William W.S. Wei, Temple University

4:45 PM Discussant: George Tiao, University of Chicago

5:10 PM Floor Discussion

Session 58: Recent Advances in Design for Biostatistical Problems (Invited)
Room: Eugene Room, Lower Level 1
Organizer: Weng Kee Wong, University of California at Los Angeles
Chair: Weng Kee Wong, University of California at Los Angeles

3:30 PM Optimal Designs for N-of-1 Trials
Yin Li and Keumhee Chough Carriere, University of Alberta



3:55 PM Efficient Algorithms for Two-stage Designs on Phase II Clinical Trials
Seongho Kim1 and Weng Kee Wong2; 1Wayne State University/Karmanos Cancer Institute, 2University of California at Los Angeles

4:20 PM Optimizing Two-level Supersaturated Designs by Particle Swarm Techniques
Frederick Kin Hing Phoa1, Ray-Bing Chen2, Wei-Chung Wang3 and Weng Kee Wong4; 1Institute of Statistical Science, Academia Sinica, 2National Cheng Kung University, 3National Taiwan University, 4University of California at Los Angeles

4:45 PM D-optimal Designs for Multivariate Exponential and Poisson Regression Models via Ultra-Dimensional Particle Swarm Optimization
Jiaheng Qiu and Weng Kee Wong, University of California at Los Angeles

5:10 PM Floor Discussion

Session 59: Student Award Session 2 (Invited)
Room: Portland Room, Lower Level 1
Organizer: ICSA-KISS 2014 Student Paper Award Committee
Chair: Wenqing He, University of Western Ontario

3:30 PM Analysis of Sequence Data Under Multivariate Trait-Dependent Sampling
Ran Tao1, Donglin Zeng1, Nora Franceschini1, Kari E. North1, Eric Boerwinkle2 and Dan-Yu Lin1; 1University of North Carolina at Chapel Hill, 2University of Texas Health Science Center

3:55 PM Empirical Likelihood Based Tests for Stochastic Ordering Under Right Censorship
Hsin-wen Chang and Ian W. McKeague, Columbia University

4:20 PM Multiple Genetic Loci Mapping for Latent Disease Liability Using a Structural Equation Modeling Approach with Application in Alzheimer's Disease
Ting-Huei Chen, University of North Carolina at Chapel Hill

4:45 PM Floor Discussion

Session 60: Semi-parametric Methods (Contributed)
Room: Salem Room, Lower Level 1
Chair: Ouhong Wang, Amgen Inc

3:30 PM Semiparametric Estimation of Mean and Variance in Generalized Estimating Equations
Jianxin Pan1 and Daoji Li2; 1The University of Manchester, 2University of Southern California

3:50 PM An Empirical Approach of Efficient Estimation of Linear Functionals of a Probability with Side Information
Hanxiang Peng, Shan Wang and Lingnan Li, Indiana University-Purdue University Indianapolis

4:10 PM M-estimation for General ARMA Processes with Infinite Variance
Rongning Wu, Baruch College, City University of New York

4:30 PM Sufficient Dimension Reduction via Principal Lq Support Vector Machine
Andreas Artemiou1 and Yuexiao Dong2; 1Cardiff University, 2Temple University

4:50 PM Nonparametric Quantile Regression via a New MM Algorithm
Bo Kai1, Mian Huang2, Weixin Yao3 and Yuexiao Dong4; 1College of Charleston and National Chengchi University, 2Shanghai University of Finance and Economics, 3Kansas State University, 4Temple University

5:10 PM Regression Estimators Using Stratified Ranked Set Sampling
Arbita Chatterjee, Hani Samawi, Lili Yu, Daniel Linder, Jingxian Cai and Robert Vogel, Georgia Southern University

5:30 PM Floor Discussion

Wednesday, June 18, 8:30 AM - 10:10 AM

Session 61: Statistical Challenges in Variable Selection for Graphical Modeling (Invited)
Room: Salon A, Lower Level 1
Organizer: Hua (Judy) Zhong, New York University
Chair: Hua (Judy) Zhong, New York University

8:30 AM Fused Community Detection
Yi Yu1, Yang Feng2 and Richard J. Samworth1; 1University of Cambridge, 2Columbia University

8:55 AM High Dimensional Tests for Functional Brain Networks
Jichun Xie1 and Jian Kang2; 1Temple University, 2Emory University

9:20 AM Bayesian Inference of Multiple Gaussian Graphical Models
Christine Peterson1, Francesco Stingo2 and Marina Vannucci3; 1Stanford University, 2University of Texas MD Anderson Cancer Center, 3Rice University

9:45 AM Mixed Graphical Models via Exponential Families
Eunho Yang1, Yulia Baker2, Pradeep Ravikumar1, Genevera I. Allen2 and Zhandong Liu3; 1University of Texas at Austin, 2Rice University, 3Baylor College of Medicine

10:10 AM Floor Discussion

Session 62: Recent Advances in Non- and Semi-parametric Methods (Invited)
Room: Salon B, Lower Level 1
Organizer: Lan Xue, Oregon State University
Chair: Guanqun Cao, Auburn University

8:30 AM Joint Estimation of Multiple Bivariate Densities of Protein Backbone Angles Using an Adaptive Exponential Spline Family
Lan Zhou, Texas A&M University

8:55 AM Estimating Time-Varying Effects for Overdispersed Recurrent Data with Treatment Switching
Qingxia Chen1, Donglin Zeng2, Joseph G. Ibrahim2, Mouna Akacha3 and Heinz Schmidli3; 1Vanderbilt University, 2University of North Carolina at Chapel Hill, 3Novartis Pharmaceuticals Corporation



9:20 AM Bivariate Penalized Splines for Regression
Ming-Jun Lai and Lily Wang, University of Georgia

9:45 AM Local Feature Selection in Varying-Coefficient Models
Lan Xue1, Xinxin Shu2, Peibei Shi2, Colin Wu3 and Annie Qu2; 1Oregon State University, 2University of Illinois at Urbana-Champaign, 3National Heart, Lung and Blood Institute

10:10 AM Floor Discussion

Session 63: Statistical Challenges and Development in Cancer Screening Research (Invited)
Room: Salon C, Lower Level 1
Organizer: Yu Shen, University of Texas MD Anderson Cancer Center
Chair: Yu Shen, University of Texas MD Anderson Cancer Center

8:30 AM Overdiagnosis in Breast and Prostate Cancer Screening: Concepts, Methods and Challenges
Ruth Etzioni, Roman Gulati and Jing Xia, Fred Hutchinson Cancer Research Center

8:55 AM Estimation of Biomarker Growth in a Screening Study
Lurdes Y.T. Inoue1, Roman Gulati2 and Ruth Etzioni2; 1University of Washington, 2Fred Hutchinson Cancer Research Center

9:20 AM Estimating Screening Test Effectiveness when Screening Indication is Unknown
Rebecca Hubbard, Group Health Research Institute

9:45 AM Developing Risk-Based Screening Guidelines: "Equal Management of Equal Risks"
Hormuzd Katki, National Cancer Institute

10:10 AM Floor Discussion

Session 64: Recent Developments in the Visualization and Exploration of Spatial Data (Invited)
Room: Salon D, Lower Level 1
Organizer: Juergen Symanzik, Utah State University
Chair: Juergen Symanzik, Utah State University

8:30 AM Recent Advancements in Geovisualization, with a Case Study on Chinese Religions
Juergen Symanzik1 and Shuming Bao2; 1Utah State University, 2University of Michigan

8:55 AM Spatial Analysis with China Geo-Explorers
Shuming Bao1, Miao Shui2 and Bing She2; 1University of Michigan, 2Wuhan University

9:20 AM Probcast: Creating and Visualizing Probabilistic Weather Forecasts
J. McLean Sloughter1, Susan Joslyn2, Patrick Tewson3, Tilmann Gneiting4 and Adrian Raftery2; 1Seattle University, 2University of Washington, 3Bigger Boat Consulting, 4University of Heidelberg

9:45 AM Discussant: Karen Kafadar, Indiana University

10:10 AM Floor Discussion

Session 65: Advancement in Biostatistical Methods and Applications (Invited)
Room: Salon G, Lower Level 1
Organizer: Sin-ho Jung, Duke University
Chair: Dongseok Choi, Oregon Health & Science University

8:30 AM Estimation of Time-Dependent AUC under Marker-Dependent Sampling
Xiaofei Wang and Zhaoyin Zhu, Duke University

8:55 AM A Measurement Error Approach for Modeling Accelerometer-based Physical Activity Data
Julia Lee, Jing Song and Dorothy Dunlop, Northwestern University

9:20 AM Real-Time Prediction in Clinical Trials: A Statistical History of REMATCH
Daniel F. Heitjan and Gui-shuang Ying, University of Pennsylvania

9:45 AM An Analysis of Microarray Data with Batch Effects
Dongseok Choi, William O. Cepurna, John C. Morrison, Elaine C. Johnson, Stephen R. Planck and James T. Rosenbaum, Oregon Health & Science University

10:10 AM Floor Discussion

Session 66: Analysis of Complex Data (Invited)
Room: Salon H, Lower Level 1
Organizer: Mounir Mesbah, University of Paris 6
Chair: Mounir Mesbah, University of Paris 6

8:30 AM Integrating Data from Heterogeneous Studies Using Only Summary Statistics: Efficiency and Robustness
Min-ge Xie, Rutgers University

8:55 AM A Markov Modulated Poisson Model for Reliability Data
Joshua Landon1, Suleyman Ozekici2 and Refik Soyer1; 1George Washington University, 2Koc University

9:20 AM A Comparison of Two Approaches for Acute Leukemia Patient Classification
Jingjing Wu1, Guoqiang Chen2 and Zeny Feng3; 1University of Calgary, 2Enbridge Pipelines, 3University of Guelph

9:45 AM On the Consistency and Covariate Selections for Varying-Coefficient Deming Regressions
Ying Lu1, Chong Gu2, Bo Fan3, Selwyn Au4, Valerie McGuire1 and John Shepherd3; 1VA Palo Alto Health Care System & Stanford University, 2Purdue University, 3University of California at San Francisco, 4VA Palo Alto Health Care System

10:10 AM Floor Discussion

Session 67: Statistical Issues in Co-development of Drug and Biomarker (Invited)
Room: Salon I, Lower Level 1
Organizer: Liang Fang, Gilead Sciences
Chair: Liang Fang, Gilead Sciences



8:30 AM Group Sequential Designs for Developing and Testing Biomarker-Guided Personalized Therapies in Comparative Effectiveness Research
Tze Leung Lai1, Olivia Yueh-Wen Liao2 and Dong Woo Kim3; 1Stanford University, 2Onyx Pharmaceuticals, 3Microsoft Corporation

8:55 AM Adaptive Enrichment Designs for Clinical Trials
Noah Simon1 and Richard Simon2; 1University of Washington, 2National Institutes of Health

9:20 AM An Adaptive Single-Arm Phase II Design with Co-primary Objectives to Evaluate Activity Overall and In Relation to a Biomarker-Defined Subgroup
Michael Wolf, Amgen Inc

9:45 AM Biomarker Threshold Estimation to Predict Clinical Benefit: What Can Reasonably be Learned During Early (Ph I/II) Oncology Development
Thomas Bengtsson, Genentech Inc

10:10 AM Floor Discussion

Session 68: New Challenges for Statistical Analyst/Programmer (Invited)
Room: Eugene Room, Lower Level 1
Organizer: Xianming (Steve) Zheng, Eli Lilly and Company
Chair: Xianming (Steve) Zheng, Eli Lilly and Company

8:30 AM Similarities and Differences in Statistical Programming among CRO and Pharmaceutical Industries
Mark Matthews, inVentiv Health Clinical

8:55 AM Computational Aspects for Detecting Safety Signals in Clinical Trials
Jyoti Rayamajhi, Eli Lilly and Company

9:20 AM Bayesian Network Meta-Analysis Methods: An Overview and A Case Study
Baoguang Han1, Wei Zou2 and Karen Price1; 1Eli Lilly and Company, 2inVentiv Clinical Health

9:45 AM Floor Discussion

Session 69: Adaptive and Sequential Methods for Clinical Trials (Invited)
Room: Portland Room, Lower Level 1
Organizers: Zhengjia Chen, Emory University; Yichuan Zhao, Georgia State University (yichuan@gsu.edu)
Chair: Zhengjia Chen, Emory University

8:30 AM Bayesian Data Augmentation Dose Finding with Continual Reassessment Method and Delayed Toxicities
Ying Yuan1, Suyu Liu1 and Guosheng Yin2; 1University of Texas MD Anderson Cancer Center, 2University of Hong Kong

8:55 AM Optimal Marker-strategy Clinical Trial Design to Detect Predictive Markers for Targeted Therapy
Yong Zang, Suyu Liu and Ying Yuan, University of Texas MD Anderson Cancer Center

9:20 AM Dynamic Prediction of Time to Relapse Using Longitudinal Biomarker Data
Xuelin Huang1, Jing Ning1 and Sangbum Choi2; 1University of Texas MD Anderson Cancer Center, 2University of Texas Health Science Center at Houston

9:45 AM Continuous Tumor Size Change Percentage and Progression Free Survival as Endpoint of the First and Second Stage Respectively in a Novel Double Screening Phase II Design
Ye Cui1, Zhibo Wang1, Yichuan Zhao1 and Zhengjia Chen2; 1Georgia State University, 2Emory University

10:10 AM Floor Discussion

Wednesday, June 18, 10:30 AM - 12:10 PM

Session 70: Survival Analysis (Contributed)
Room: Portland Room, Lower Level 1
Chair: Zhezhen Jin, Columbia University

10:30 AM Comparison of Hazard Rate and Odds Ratio in the Two-Sample Survival Problem
Benedict Dormitorio and Joshua Naranjo, Western Michigan University

10:50 AM Predicting the Event Time in Multicenter Clinical Trials with Time-to-Event Outcome as Primary Endpoint
Nibedita Bandyopadhyay, Janssen Research & Development

11:10 AM Empirical Comparison of Small Sample Performance for the Logrank Test and Resampling Methods with High Censoring Rates
Yu Deng and Jianwen Cai, University of North Carolina at Chapel Hill

11:30 AM Floor Discussion

Session 71: Complex Data Analysis: Theory and Application (Invited)
Room: Salon A, Lower Level 1
Organizer: Yang Feng, Columbia University
Chair: Yang Feng, Columbia University

10:30 AM Supervised Singular Value Decomposition and Its Asymptotic Properties
Gen Li1, Dan Yang2, Haipeng Shen1 and Andrew Nobel1; 1University of North Carolina at Chapel Hill, 2Rutgers University

10:55 AM New Methods for Interaction Selection
Ning Hao1, Hao Helen Zhang1 and Yang Feng2; 1University of Arizona, 2Columbia University

11:20 AM A Statistical Approach to Set Classification by Feature Selection with Applications to Classification of Histopathology Images
Sungkyu Jung1 and Xingye Qiao2; 1University of Pittsburgh, 2Binghamton University, State University of New York

11:45 AM A Smoothing Spline Model for Analyzing dMRI Data of Swallowing
Binhuan Wang, Ryan Branski, Milan Amin and Yixin Fang, New York University

12:10 PM Floor Discussion



Session 72: Recent Development in Statistical Methods for Missing Data (Invited)
Room: Salon B, Lower Level 1
Organizer: Nanhua Zhang, Cincinnati Children's Hospital Medical Center
Chair: Haoda Fu, Eli Lilly and Company

10:30 AM A Semiparametric Inference to Regression Analysis with Missing Covariates in Survey Data
Shu Yang and Jae-kwang Kim, Iowa State University

10:55 AM Multiple Robustness in Missing Data Analysis
Peisong Han1 and Lu Wang2; 1University of Waterloo, 2University of Michigan

11:20 AM Imputation of Binary Variables with SAS and IVEware
Yi Pan and Riguang Song, United States Centers for Disease Control and Prevention

11:45 AM Marginal Treatment Effect Estimation Using Pattern-Mixture Model
Zhenzhen Xu, United States Food and Drug Administration

12:10 PM Floor Discussion

Session 73: Machine Learning Methods for Causal Inference in Health Studies (Invited)
Room: Salon C, Lower Level 1
Organizer: Mi-Ok Kim, Cincinnati Children's Hospital Medical Center
Chair: Mi-Ok Kim, Cincinnati Children's Hospital Medical Center

10:30 AM Causal Inference of Interaction Effects with Inverse Propensity Weighting, G-Computation and Tree-Based Standardization
Joseph Kang1, Xiaogang Su2, Lei Liu1 and Martha Daviglus3; 1Northwestern University, 2University of Texas at El Paso, 3University of Illinois at Chicago

10:55 AM Practice of Causal Inference with the Propensity of Being Zero or One
Joseph Kang1, Wendy Chan1, Mi-Ok Kim2 and Peter M. Steiner3; 1Northwestern University, 2University of Cincinnati/Cincinnati Children's Hospital Medical Center, 3University of Wisconsin-Madison

11:20 AM Propensity Score and Proximity Matching Using Random Forest
Peng Zhao1, Xiaogang Su2 and Juanjuan Fan1; 1San Diego State University, 2University of Texas at El Paso

11:45 AM Discussant: Joseph Kang, Northwestern University

12:10 PM Floor Discussion

Session 74: JP Hsu Memorial Session (Invited)
Room: Salon D, Lower Level 1
Organizers: Lili Yu, Georgia Southern University; Karl Peace, Georgia Southern University (kepeace@georgiasouthern.edu)
Chair: Lili Yu, Georgia Southern University

10:30 AM Weighted Least-Squares Method for Right-Censored Data in Accelerated Failure Time Model
Lili Yu, Georgia Southern University

10:55 AM (Student Paper Award) Estimating a Change-Point in High-Dimensional Markov Random Field Models
Sandipan Roy, University of Michigan

11:20 AM A Comparison of Size and Power of Tests of Hypotheses on Parameters Based on Two Generalized Lindley Distributions
Macaulay Okwuokenye, Biogen Idec

11:45 AM Floor Discussion

Session 75: Challenge and New Development in Model Fitting and Selection (Invited)
Room: Salon G, Lower Level 1
Organizer: Zhezhen Jin, Columbia University
Chair: Cuiling Wang, Yeshiva University

10:30 AM Robust Estimates of Divergence Times and Selection with a Poisson Random Field Model
Amei Amei1 and Brian Tilston Smith2; 1University of Nevada at Las Vegas, 2American Museum of Natural History

10:55 AM On A Class of Maximum Empirical Likelihood Estimators Defined By Convex Functions
Hanxiang Peng and Fei Tan, Indiana University-Purdue University Indianapolis

11:20 AM Properties of the Marginal Survival Functions for Dependent Censored Data under an Assumed Archimedean Copula
Antai Wang, New Jersey Institute of Technology

11:45 AM Dual Model Misspecification in Generalized Linear Models with Error in Variables
Xianzheng Huang, University of Southern California

12:10 PM Floor Discussion

Session 76 Advanced Methods and Their Applications inSurvival Analysis (Invited)Room Salon H Lower Level 1Organizers Jiajia Zhang University of South Carolina Wenbin LuNorth Carolina State UniversityChair Jiajia Zhang University of South Carolina

1030 AM Kernel Smoothed Profile Likelihood Estimation in the Ac-celerated Failure Time Frailty Model for Clustered SurvivalDataBo Liu1 Wenbin Lu1 and Jiajia Zhang2 1North CarolinaState University 2South Carolina University

1055 AM Model-free Screening for Lifetime Data Analysis withUltrahigh-dimensional Biomarkers Survival ImpactingJialiang Li1 Qi Zheng2 and Limin Peng2 1National Uni-versity of Singapore 2Emory University

11:20 AM  Analysis of Event History Data in Tuberculosis (TB) Screening
Joan Hu, Simon Fraser University

11:45 AM  On the Dependence Structure of Bivariate Recurrent Event Processes: Inference and Estimation
Jing Ning¹, Yong Chen², Chunyan Cai², Xuelin Huang¹ and Mei-Cheng Wang³; ¹University of Texas MD Anderson Cancer Center, ²University of Texas Health Science Center at Houston, ³Johns Hopkins University

12:10 PM  Floor Discussion

34 2014 ICSA-KISS Applied Statistics Symposium Portland June 15-18


Session 77: High Dimensional Variable Selection and Multiple Testing (Invited)
Room: Salon I, Lower Level 1
Organizer: Zhigen Zhao, Temple University
Chair: Jichun Xie, Temple University

10:30 AM  On Procedures Controlling the False Discovery Rate for Testing Hierarchically Ordered Hypotheses
Gavin Lynch and Wenge Guo, New Jersey Institute of Technology

10:55 AM  Sufficient Dimension Reduction in Binary Classification
Seung Jun Shin¹, Yichao Wu², Hao Helen Zhang³ and Yufeng Liu⁴; ¹University of Texas MD Anderson Cancer Center, ²North Carolina State University, ³University of Arizona, ⁴University of North Carolina at Chapel Hill

11:20 AM  Rate Optimal Multiple Testing Procedure (ROMP) in High-dimensional Regression
Zhigen Zhao¹ and Pengsheng Ji²; ¹Temple University, ²University of Georgia

11:45 AM  Pathwise Calibrated Active Shooting Algorithm with Application to Semiparametric Graph Estimation
Tuo Zhao¹ and Han Liu²; ¹Johns Hopkins University, ²Princeton University

12:10 PM  Floor Discussion


Abstracts


Session 1: Emerging Statistical Methods for Complex Data

Estimation of the Error Auto-Correlation Matrix in Semi-parametric Model for Brain fMRI Data
Chunming Zhang and Xiao Guo
University of Wisconsin-Madison
cmzhangstatwiscedu
In statistical analysis of functional magnetic resonance imaging (fMRI), dealing with the temporal correlation is a major challenge in assessing changes within voxels. In this paper, we aim to address this issue by considering a semi-parametric model for fMRI data. For the error process in the semi-parametric model, we construct a banded estimate of the auto-correlation matrix R and propose a refined estimate of the inverse of R. Under some mild regularity conditions, we establish consistency of the banded estimate with an explicit convergence rate and show that the refined estimate converges under an appropriate norm. Numerical results suggest that the refined estimate performs well when applied to the detection of brain activity.
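The banding construction described above can be sketched in a few lines (my own toy illustration under an AR(1) error model, not the authors' estimator): estimate sample autocorrelations up to a chosen lag k, set all higher-lag correlations to zero, and assemble the Toeplitz matrix.

```python
import numpy as np

def banded_autocorr_matrix(e, k):
    """Toy banded estimate of an n x n autocorrelation matrix:
    sample autocorrelations up to lag k, zero beyond the band."""
    n = len(e)
    e = e - e.mean()
    gamma0 = e @ e / n
    rho = np.zeros(n)
    rho[0] = 1.0
    for lag in range(1, min(k, n - 1) + 1):
        rho[lag] = (e[:-lag] @ e[lag:]) / n / gamma0
    # Toeplitz: entry (i, j) is rho[|i - j|]
    idx = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])
    return rho[idx]

rng = np.random.default_rng(0)
e = np.zeros(200)
for t in range(1, 200):              # AR(1) errors with coefficient 0.5
    e[t] = 0.5 * e[t - 1] + rng.standard_normal()
R = banded_autocorr_matrix(e, k=5)   # correlations vanish beyond lag 5
```

Banding trades a small bias at short lags for a large variance reduction at long lags, which is what makes consistency at an explicit rate possible.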

Kernel Additive Sliced Inverse Regression
Heng Lian
Nanyang Technological University
shellinglianhenghotmailcom
In recent years, nonlinear sufficient dimension reduction (SDR) methods have gained increasing popularity. However, while semi-parametric models in regression have fascinated researchers for several decades, with a large amount of literature, parsimonious structured nonlinear SDR has attracted little attention so far. In this paper, extending kernel sliced inverse regression, we study additive models in the context of SDR and demonstrate their potential usefulness due to their flexibility and parsimony. Theoretically, we clarify that the improved convergence rate using the additive structure is due to the faster rate of decay of the kernel's eigenvalues. The additive structure also opens the possibility of nonparametric variable selection. This sparsification of the kernel, however, does not introduce additional tuning parameters, in contrast with sparse regression. Simulated and real data sets are presented to illustrate the benefits and limitations of the approach.

Variable Selection with Prior Information for Generalized Linear Models via the Prior LASSO Method
Yuan Jiang¹, Yunxiao He² and Heping Zhang³
¹Oregon State University, ²Nielsen Company, ³Yale University
yuanjiangstatoregonstateedu
LASSO is a popular statistical tool, often used in conjunction with generalized linear models, that can simultaneously select variables and estimate parameters. When there are many variables of interest, as in current biological and biomedical studies, the power of LASSO can be limited. Fortunately, much biological and biomedical data have been collected, and they may contain useful information about the importance of certain variables. This paper proposes an extension of LASSO, namely prior LASSO (pLASSO), to incorporate that prior information into penalized generalized linear models. The goal is achieved by adding to the LASSO criterion function an additional measure of the discrepancy between the prior information and the model. For linear regression, the whole solution path of the pLASSO estimator can be found with a procedure similar to the Least Angle Regression (LARS). Asymptotic theories and simulation results show that pLASSO provides significant improvement over LASSO when the prior information is relatively accurate. When the prior information is less reliable, pLASSO shows great robustness to the misspecification. We illustrate the application of pLASSO using a real data set from a genome-wide association study.
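To make the criterion concrete, here is a minimal proximal-gradient sketch of a pLASSO-style objective for linear regression (my own toy formulation; the prior weight `eta`, the squared-error discrepancy term, and `b_prior` are hypothetical choices, and the paper's actual algorithm may differ):

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def plasso_sketch(X, y, b_prior, lam, eta, n_iter=500):
    """Minimize (1/2n)||y - Xb||^2 + (eta/2n)||X(b_prior - b)||^2 + lam*||b||_1
    by proximal gradient descent: a toy version of the pLASSO idea, where the
    second term measures discrepancy between the model and the prior fits."""
    n, p = X.shape
    # step size = 1 / Lipschitz constant of the smooth part
    step = n / ((1.0 + eta) * np.linalg.norm(X, 2) ** 2)
    b = np.zeros(p)
    for _ in range(n_iter):
        grad = -(X.T @ (y - X @ b)) / n - eta * (X.T @ (X @ (b_prior - b))) / n
        b = soft_threshold(b - step * grad, step * lam)
    return b

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 10))
b_true = np.zeros(10)
b_true[:2] = [2.0, -1.5]
y = X @ b_true + 0.1 * rng.standard_normal(100)
b_prior = b_true                       # accurate prior information
b_hat = plasso_sketch(X, y, b_prior, lam=0.1, eta=1.0)
```

With accurate prior information the extra term pulls the estimate toward the prior-implied fit, which is the source of the power gain the abstract describes.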

Bootstrapping High Dimensional Vectors: Interplay Between Dependence and Dimensionality
Xianyang Zhang¹ and Guang Cheng²
¹University of Missouri at Columbia, ²Purdue University
zhangxianymissouriedu
In this talk we will focus on the problem of conducting inference for high dimensional weakly dependent time series. Motivated by applications in modern high dimensional inference, we derive a Gaussian approximation result for the maximum of a sum of weakly dependent vectors using Stein's method, where the dimension of the vectors is allowed to be exponentially larger than the sample size. Our result reveals an interesting phenomenon arising from the interplay between dependence and dimensionality: the more dependent the data vectors, the slower the diverging rate of the dimension allowed for obtaining valid statistical inference. A type of dimension-free dependence structure is derived as a by-product. Building on the Gaussian approximation result, we propose a blockwise multiplier (wild) bootstrap that is able to capture the dependence between and within the data vectors, and thus provides a high-quality approximation to the distribution of the maximum of the vector sum in the high dimensional context.
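A toy version of a blockwise multiplier bootstrap for the maximum coordinate of a normalized vector sum might look like this (my own sketch; the block length, centering, and statistic are illustrative choices, not the authors' exact procedure):

```python
import numpy as np

def blockwise_multiplier_bootstrap(X, block_len, n_boot=1000, seed=0):
    """Toy blockwise multiplier (wild) bootstrap for the statistic
    max_j |sum_i X[i, j]| / sqrt(n), with X an (n, p) array of weakly
    dependent observations. One N(0,1) multiplier per block preserves
    within-block dependence."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    Xc = X - X.mean(axis=0)                 # center at the sample mean
    n_blocks = n // block_len
    # block sums: shape (n_blocks, p); trailing partial block dropped
    blocks = Xc[: n_blocks * block_len].reshape(n_blocks, block_len, p).sum(axis=1)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        g = rng.standard_normal(n_blocks)   # one multiplier per block
        stats[b] = np.abs(g @ blocks).max() / np.sqrt(n)
    return stats

rng = np.random.default_rng(2)
X = rng.standard_normal((300, 50))
boot = blockwise_multiplier_bootstrap(X, block_len=10)
crit = np.quantile(boot, 0.95)              # bootstrap critical value
```

The bootstrap distribution of the maximum then serves as a reference for simultaneous inference across all p coordinates.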

Session 2: Statistical Methods for Sequencing Data Analysis

A Penalized Likelihood Approach for Robust Estimation of Isoform Expression
Hui Jiang¹ and Julia Salzman²
¹University of Michigan, ²Stanford University
jianghuiumichedu
Ultra high-throughput sequencing of transcriptomes (RNA-Seq) has enabled the accurate estimation of gene expression at the individual isoform level. However, systematic biases introduced during the sequencing and mapping processes, as well as incompleteness of the transcript annotation databases, may cause the estimates of isoform abundances to be unreliable, and in some cases highly inaccurate. This paper introduces a penalized likelihood approach to detect and correct for such biases in a robust manner. Our model extends those previously proposed by introducing bias parameters for reads. An L1 penalty is used for the selection of non-zero bias parameters. We introduce an efficient algorithm for model fitting and analyze the statistical properties of the proposed model. Our experimental studies on both simulated and real datasets suggest that the model



has the potential to improve isoform-specific gene expression estimates and identify incompletely annotated gene models. This is joint work with Julia Salzman.

Classification on Sequencing Data and its Applications on a Human Breast Cancer Dataset
Jun Li
University of Notre Dame
junlindedu
Gene expression measured by the RNA-sequencing technique can be used to classify biological samples from different groups, such as normal vs. early-stage cancer vs. cancer. To get an interpretable classifier with high robustness and generality, often some type of shrinkage is used to give a linear and sparse model. For microarray data, an example is PAM (prediction analysis of microarrays), which uses a nearest shrunken centroid classifier. To accommodate the discrete nature of sequencing data, this model was modified by using a Poisson distribution. We further generalize this model by using a negative binomial distribution to take account of the overdispersion in the data. We compare the performance of Gaussian-, Poisson- and negative binomial-based models on simulated data as well as a human breast cancer dataset. We find that while the cross-validation misclassification rates of the three methods are often quite similar, the numbers of genes used by the models can be quite different, and using the Gaussian model on carefully normalized data typically gives models with the fewest genes.

Power-Robustness Analysis of Statistical Models for RNA Sequencing Data
Gu Mi, Yanming Di and Daniel W. Schafer
Oregon State University
migstatoregonstateedu
We present results from power-robustness analysis of several statistical models for RNA sequencing (RNA-Seq) data. We fit the models to several RNA-Seq datasets, perform goodness-of-fit tests that we developed (Mi et al., 2014), and quantify variations not explained by the fitted models. The statistical models we compared are all based on the negative binomial (NB) distribution but differ in how they handle the estimation of the dispersion parameter. The dispersion parameter summarizes the extra-Poisson variation commonly observed in RNA-Seq data. One widely-used power-saving strategy is to assume some commonalities of NB dispersion parameters across genes via simple models relating them to mean expression rates, and many such models have been proposed. However, the power benefit of the dispersion-modeling approach relies on the estimated dispersion models being adequate. It is not well understood how robust the approach is if the fitted dispersion models are inadequate. Our empirical investigations provide a further step towards understanding the pros and cons of different NB dispersion models, and draw attention to power-robustness evaluation, a somewhat neglected yet important aspect of RNA-Seq data analysis.
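The extra-Poisson variation summarized by the NB dispersion parameter can be checked in a small simulation (my own illustration, not the authors' analysis): drawing counts from a gamma-Poisson mixture with mean mu and dispersion phi gives variance mu + phi * mu^2, well above the Poisson variance mu.

```python
import numpy as np

rng = np.random.default_rng(3)
mu, phi = 50.0, 0.2           # mean and NB dispersion parameter
# NB counts as a gamma-Poisson mixture: lambda ~ Gamma(1/phi, scale=phi*mu),
# so E[lambda] = mu and Var(lambda) = phi * mu^2
lam = rng.gamma(shape=1.0 / phi, scale=phi * mu, size=200_000)
counts = rng.poisson(lam)
# theoretical variance: mu + phi * mu^2 = 50 + 0.2 * 2500 = 550
emp_var = counts.var()
```

A Poisson model for these counts would badly understate the variance, which is why the dispersion parameter, and how it is estimated across genes, matters for power.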

Session 3: Modeling Big Biological Data with Complex Structures

High Dimensional Graphical Model Learning
Jie Peng and Ru Wang
University of California at Davis
jiepengucdavisedu
Probabilistic graphical models are used as graphical representations of probability distributions, particularly their conditional independence properties. Graphical models have broad applications in the fields of biology, social science, linguistics, neuroscience, etc. We will focus on graphical model structure learning under the high dimensional regime, where avoiding over-fitting and developing computationally efficient algorithms are particularly challenging. We will discuss the use of data perturbation and model aggregation for model building and model selection.

Statistical Analysis of RNA Sequencing Data
Mingyao Li and Yu Hu
University of Pennsylvania
mingyaomailmedupennedu
RNA sequencing (RNA-Seq) has rapidly replaced microarrays as the major platform for transcriptomics studies. Statistical analysis of RNA-Seq data, however, is challenging because various biases present in RNA-Seq data complicate the analysis and, if not appropriately corrected, can affect isoform expression estimation and downstream analysis. In this talk, I will first present PennSeq, a statistical method that estimates isoform-specific gene expression. PennSeq is a nonparametric approach that allows each isoform to have its own non-uniform read distribution. By giving adequate weight to the underlying data, this empirical approach maximally reflects the true underlying read distribution and is effective in adjusting for non-uniformity. In the second part of my talk, I will present a statistical method for testing differential alternative splicing by jointly modeling multiple samples. I will show simulation results as well as some examples from a clinical study.

Quantifying the Role of Steric Constraints in Nucleosome Positioning
H. Tomas Rube and Jun S. Song
University of Illinois at Urbana-Champaign
songjillinoisedu
Statistical positioning, the localization of nucleosomes packed against a fixed barrier, is conjectured to explain the array of well-positioned nucleosomes at the 5' end of genes, but the extent and precise implications of statistical positioning in vivo are unclear. I will examine this hypothesis quantitatively and generalize the idea to include moving barriers. Early experiments noted a similarity between the nucleosome profile aligned and averaged across genes and that predicted by statistical positioning; however, our study demonstrates that the same profile is generated by aligning random nucleosomes, calling the previous interpretation into question. New rigorous analytic results reformulate statistical positioning as predictions on the variance structure of nucleosome locations in individual genes. In particular, a quantity termed the variance gradient, describing the change in variance between adjacent nucleosomes, is tested against recent high-throughput nucleosome sequencing data. Constant variance gradients render evidence in support of statistical positioning in about 50% of long genes. Genes that deviate from predictions have high nucleosome turnover and cell-to-cell gene expression variability. Our analyses thus clarify the role of statistical positioning in vivo.

Integrative Dynamic Omics Networks and Personalized Medicine
George I. Mias
Michigan State University
gmiasmsuedu
The emergence and ready availability of novel omics technologies is guiding our efforts to make advances in the implementation of personalized medicine. High quality genomic data are now complemented with other dynamic omes (e.g., transcriptomes, proteomes, metabolomes, autoantibodyomes) and other data providing temporal profiling of thousands of molecular components. The analysis of such dynamic omics data necessitates the development of new statistical and computational methodology for the integration of the different platforms. Such an approach allows us to follow changes in the physiological states of an individual, including pathway changes over time and associated network interactions (inferred nodes & connections). A framework implementing such methodology will be presented in association with a pilot personalized medicine study that monitored an initially healthy individual over multiple healthy and disease states. The framework will be described, including raw data analysis approaches for transcriptome (RNA) sequencing, mass spectrometry (proteins and small molecules) and protein array data, and an overview of quantitation methods available for each analysis. Examples of how the data are integrated in this framework using the personalized medicine pilot study will also be presented. The extended framework infers novel pathways, components and networks, assessing topological changes, and is being applied to other longitudinal studies to display changes through dynamical biological states. Assessing such multimodal omics data has great potential for implementations of a more personalized, precise and preventative medicine.

Session 4: Bayesian Approaches for Modeling Dynamic Non-Gaussian Responses

Binary State Space Mixed Models with Flexible Link Functions
Dipak Dey¹, Xun Jiang² and Carlos Abanto-Valle³
¹University of Connecticut, ²Amgen Inc., ³Federal University of Rio de Janeiro
dipakdeyuconnedu
State space models (SSM) for binary time series data using flexible skewed link functions are introduced in this paper. Commonly used logit, cloglog and loglog links are prone to link misspecification because of their fixed skewness. Here we introduce three flexible links as alternatives: the generalized extreme value (GEV) link, the symmetric power logit (SPLOGIT) link, and the scale mixture of normal (SMN) link. Markov chain Monte Carlo (MCMC) methods for Bayesian analysis of SSM with these links are implemented using the JAGS package, a freely available software. Model comparison relies on the deviance information criterion (DIC). The flexibility of the proposed model is illustrated by measuring the effects of deep brain stimulation (DBS) on the attention of a macaque monkey performing a reaction-time task (Smith et al., 2009). Empirical results show that the flexible links fit better than the usual logit and cloglog links.

Bayesian Spatial-Temporal Modeling of Ecological Zero-Inflated Count Data
Xia Wang¹, Ming-Hui Chen², Rita C. Kuo³ and Dipak K. Dey²
¹University of Cincinnati, ²University of Connecticut, ³Lawrence Berkeley National Laboratory
xiawangucedu
A Bayesian hierarchical model is developed for count data with spatial and temporal correlations, as well as excessive zeros, uneven sampling intensities, and inference on missing spots. Our contribution is to develop a model for zero-inflated count data that provides flexibility in modeling spatial patterns in a dynamic manner and also improves computational efficiency via dimension reduction. The proposed methodology is of particular importance for studying species presence and abundance in the field of ecological sciences. The proposed model is employed in the analysis of the survey data by the Northeast Fisheries Sciences Center (NEFSC) for estimation and prediction of the Atlantic cod in the Gulf of Maine - Georges Bank region. Model comparisons based on the deviance information criterion and the log predictive score show the improvement by the proposed spatial-temporal model.
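A minimal sketch of the zero-inflation mechanism at the heart of such models (my own toy zero-inflated Poisson, without the spatial-temporal structure): a Bernoulli structural-zero component is mixed with an ordinary Poisson count component.

```python
import math
import numpy as np

def zip_loglik(y, pi, lam):
    """Log-likelihood of a toy zero-inflated Poisson model:
    P(Y=0) = pi + (1-pi) * exp(-lam)
    P(Y=k) = (1-pi) * Poisson(k; lam) for k >= 1."""
    y = np.asarray(y)
    log_fact = np.array([math.lgamma(k + 1) for k in y])
    ll_pos = np.log(1 - pi) - lam + y * np.log(lam) - log_fact
    ll_zero = math.log(pi + (1 - pi) * math.exp(-lam))
    return np.where(y == 0, ll_zero, ll_pos).sum()

rng = np.random.default_rng(4)
n, pi_true, lam_true = 5000, 0.3, 2.0
structural_zero = rng.random(n) < pi_true     # excess zeros beyond Poisson
y = np.where(structural_zero, 0, rng.poisson(lam_true, size=n))
# the true parameters should score better than a poorly specified pair
ll_true = zip_loglik(y, 0.3, 2.0)
ll_bad = zip_loglik(y, 0.05, 2.0)
```

The hierarchical model in the abstract replaces the constant pi and lam with spatially and temporally structured processes.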

Real-time Bayesian Parameter Estimation for Item Response Models
Ruby Chiu-Hsing Weng
National Chengchi University
chwengnccuedutw
Bayesian item response models have been used to model educational testing and Internet ratings data. Typically, the statistical analysis is carried out using Markov chain Monte Carlo (MCMC) methods. However, MCMC methods may not be computationally feasible when real-time data continuously arrive and online parameter estimation is needed. We develop an efficient algorithm, based on a deterministic moment matching method, to adjust the parameters in real time. The proposed online algorithm works well for two real datasets. Moreover, when compared with offline MCMC methods, it achieves good accuracy with considerably less computational time.

Statistical Prediction for Virginia Lyme Disease Emergence Based on Spatio-temporal Count Data
Yuanyuan Duan, Jie Li, Yili Hong, Korine Kolivras, Stephen Prisley, James Campbell and David Gaines
Virginia Tech
jielivtedu
The increasing demand for modeling spatio-temporal data is computationally challenging due to the large spatial and temporal dimensions involved. The traditional Markov chain Monte Carlo (MCMC) method suffers from slow convergence and is computationally expensive. The Integrated Nested Laplace Approximation (INLA) has been proposed as an alternative to speed up the computation by avoiding the extensive sampling required by MCMC. However, even with INLA, handling large-scale spatio-temporal prediction datasets remains difficult, if not infeasible, in many cases. This work proposes a new Divide-Recombine (DR) prediction method for dealing with spatio-temporal data. A large spatial region is divided into smaller subregions, and INLA is then applied to fit a spatio-temporal model to each subregion. To recover the spatial dependence, an iterative procedure has been developed to recombine the model fitting and prediction results. In particular, the new method uses a model offset term to make adjustments for each subregion using information from neighboring subregions. Stable estimation and prediction results are obtained after several updating iterations. Simulations are used to validate the accuracy of the new method in model fitting and prediction. The method is then applied to areal (census tract level) count data for Lyme disease cases in Virginia from 2003 to 2010.

Session 5: Recent Advances in Astro-Statistics

Embedding the Big Bang Cosmological Model into a Bayesian Hierarchical Model for Supernova Light Curve Data
David van Dyk, Roberto Trotta, Xiyun Jiao and Hikmatali Shariff
Imperial College London
dvandykimperialacuk
The 2011 Nobel Prize in Physics was awarded for the discovery that the expansion of the Universe is accelerating. This talk describes a Bayesian model that relates the difference between the apparent and intrinsic brightnesses of objects to their distance, which in turn depends on parameters that describe this expansion. While apparent brightness can be readily measured, intrinsic brightness can only be obtained for certain objects. Type Ia supernovae occur when material accreting onto a white dwarf drives its mass above a threshold and triggers a powerful supernova explosion. Because this occurs only in a particular physical scenario, we can use covariates to estimate intrinsic brightness. We use a hierarchical Bayesian model to leverage this information to study the expansion history of the Universe. The model includes computer models that relate expansion parameters to observed brightnesses, along with components that account for measurement error, data contamination, dust absorption, repeated measures, and covariate adjustment uncertainty. Sophisticated MCMC methods are employed for model fitting, and a secondary Bayesian analysis is conducted for residual analysis and model checking.

Marrying Domain Knowledge and Statistical Methods
Ashish Mahabal, George Djorgovski, Matthew Graham, Ciro Donalek and Andrew Drake
California Institute of Technology
aamastrocaltechedu
Astronomy datasets have been large and are getting larger by the day (TB to PB). This necessitates the use of advanced statistics for many purposes. However, the datasets are often so large that small contamination rates imply a large number of wrong results. This makes blind application of methodologies unattractive. Astronomical transients are one area where rapid follow-up observations are required based on very little data. We show how the use of domain knowledge, in the right measure at the right juncture, can improve classification performance. We demonstrate this using Bayesian networks and Gaussian process regression on datasets from the Catalina Real-Time Transient Survey, which has covered 80% of the sky several tens to a few hundreds of times over the last decade. This becomes even more critical as we move beyond PB-sized datasets in the coming years.

Nonlinear Classification of X-Ray Binaries
Luke Bornn and Saku Vrtilek
Harvard University
bornnstatharvardedu
Because of their singular nature, the primary method to obtain information about stellar-mass black holes is to study those that are part of a binary system. However, we have no widely applicable means of determining the nature of the compact object (whether a black hole [BH] or a neutron star [NS]) in a binary system. The definitive method is dynamic measurement of the mass of the compact object, and that can be reliably established only for eclipsing systems. The motivation for finding a way to differentiate the presence of an NS or BH in any XRB system is strong: subtle differences in the behavior of neutron star and black hole X-ray binaries provide tests of fundamental features of gravitation, such as the existence of a black hole event horizon. In this talk we present a statistical approach for classifying binary systems using a novel 3D representation, called a color-color-intensity diagram, combined with nonlinear classification techniques. The method provides natural and accurate probabilistic classifications of X-ray binary objects.

Persistent Homology and the Topology of the Intergalactic Medium
Fabrizio Lecci
Carnegie Mellon University
leccicmuedu
Light we observe from quasars has traveled through the intergalactic medium (IGM) to reach us, and leaves an imprint of some properties of the IGM on its spectrum. There is a particular imprint with which cosmologists are familiar, dubbed the Lyman-alpha forest. From this imprint we can infer the density fluctuations of neutral hydrogen along the line of sight from us to the quasar. With cosmological simulation output, we develop a methodology using local polynomial smoothing to model the IGM. Then we study its topological features using persistent homology, a method for probing topological properties of point clouds and functions. Describing the topological features of the IGM can aid our understanding of the large-scale structure of the Universe, along with providing a framework for comparing cosmological simulation output with real data beyond the standard measures. Motivated by this example, I will introduce persistent homology and describe some statistical techniques that allow us to separate topological signal from topological noise.

Session 6: Statistical Methods and Application in Genetics

Identification of Homogeneous and Heterogeneous Covariate Structure in Pooled Cohort Studies
Xin Cheng¹, Wenbin Lu² and Mengling Liu¹
¹New York University, ²North Carolina State University
xc311nyuedu
Pooled analyses, which make use of data from multiple studies as a single dataset, can achieve large sample sizes to increase statistical power. When inter-study heterogeneity exists, however, the simple pooling strategy may fail to present a fair and complete picture of variables with heterogeneous effects. Therefore, it is of great importance to know the homogeneous and heterogeneous structure of variables in pooled studies. In this presentation, we propose a penalized partial likelihood approach with adaptively weighted composite penalties on variables' homogeneous and heterogeneous effects. We show that our method can characterize the structure of variables as having heterogeneous, homogeneous, or null effects, and simultaneously provide inference for the non-zero effects. The results readily extend to the high-dimensional situation where the number of parameters diverges with the sample size. The proposed selection and estimation procedure can be easily implemented using the iterative shooting algorithm. We conduct extensive numerical studies to evaluate the practical performance of our proposed method and demonstrate it using real studies.

Gene Expression Analyses in Evaluating Translational Biomarkers from Drug-Induced Idiopathic Pulmonary Fibrosis in Animal Models
Wenfei Zhang, Yuefeng Lu, Tai-He Xia, Guillaume Wettstein, Jean-Pierre Bidouard and Xavier Marniquet
Sanofi-Aventis US LLC
wenfeizhangsanoficom
Translational biomarkers are markers that produce biological signals translatable from animal models to human models. Identifying translational biomarkers can be important for disease diagnosis, prognosis, and risk prediction in drug development. Therefore, there is a growing demand for statistical analyses of biomarker data, especially for large and complex genetic data. To ensure the quality of statistical analyses, we develop a statistical analysis pipeline for gene expression data. When the pipeline is applied to gene expression data from drug-induced idiopathic pulmonary fibrosis in animal models, it shows some interesting results in evaluating the translatability of genes through comparisons with human models.

DNA Methylation, Cell-Type Distribution, and EWAS
E. Andres Houseman
Oregon State University
andreshousemanoregonstateedu
Epigenetic processes form the principal mechanisms by which cell differentiation occurs. Consequently, DNA methylation measurements are strongly influenced by the DNA methylation profiles of constituent cell types, as well as by their mixing proportions. Epigenome-wide association studies (EWAS) aim to find associations of phenotype or exposure with DNA methylation at single CpG dinucleotides, but these associations are potentially confounded by associations with overall cell-type distribution. In this talk we review the literature on epigenetics and cell mixture. We then present two techniques for mixture-adjusted EWAS: the first requires a reference data set, which may be expensive or infeasible to collect, while the other is free of this requirement. Finally, we provide several data analysis examples using these techniques.

Secondary Quantile Analysis for GWAS
Ying Wei¹, Xiaoyu Song¹, Mengling Liu² and Iuliana Ionita-Laza¹
¹Columbia University, ²New York University
yw2148columbiaedu
Case-control designs are widely used in epidemiology and other fields to identify factors associated with a disease of interest. These studies can also be used to study the associations of risk factors with secondary outcomes, such as biomarkers of the disease, and provide a cost-effective way to understand disease mechanisms. Most of the existing methods have focused on inference on the mean of secondary outcomes. In this paper, we propose a quantile-based approach. We construct a new family of estimating equations for consistent and efficient estimation of conditional quantiles using the case-control sample, and also develop tools for statistical inference. Simulations are conducted to evaluate the practical performance of the proposed approach, and a case-control study on genetic association with asthma is used to demonstrate the method.

Session 7: Statistical Inference of Complex Associations in High-Dimensional Data

Leveraging for Big Data Regression
Ping Ma
University of Georgia
pingmaugaedu
Advances in science and technology in the past few decades have led to big data challenges across a variety of fields. Extraction of useful information and knowledge from big data has become a daunting challenge for both the science community and society at large. Tackling this challenge requires major breakthroughs in efficient computational and statistical approaches to big data analysis. In this talk, I will present some leveraging algorithms which make a key contribution to resolving this grand challenge. In these algorithms, by sampling a very small representative sub-dataset using smart algorithms, one can effectively extract the relevant information of vast data sets from the small sub-dataset. Such algorithms are scalable to big data. These efforts allow pervasive access to big data analytics, especially for those who cannot directly use supercomputers. More importantly, these algorithms enable massive numbers of ordinary users to analyze big data using tablet computers.
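A basic algorithmic-leveraging sketch for least squares (my own toy version; the talk's algorithms are more general): compute leverage scores from a thin QR factorization, sample rows with probability proportional to leverage, and solve a reweighted subsampled problem.

```python
import numpy as np

def leverage_subsample_ols(X, y, r, seed=0):
    """Toy leveraging sketch for least squares: sample r rows with
    probability proportional to their leverage scores, then solve an
    importance-reweighted least-squares problem on the subsample."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    Q, _ = np.linalg.qr(X)            # thin QR; leverage = squared row norms of Q
    lev = (Q ** 2).sum(axis=1)
    prob = lev / lev.sum()
    idx = rng.choice(n, size=r, replace=True, p=prob)
    w = 1.0 / np.sqrt(r * prob[idx])  # importance weights keep the estimate unbiased
    Xs, ys = X[idx] * w[:, None], y[idx] * w
    beta, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
    return beta

rng = np.random.default_rng(5)
X = rng.standard_normal((10_000, 5))
beta_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ beta_true + 0.5 * rng.standard_normal(10_000)
beta_hat = leverage_subsample_ols(X, y, r=500)
```

Here a 500-row subsample of a 10,000-row problem recovers the coefficients closely, illustrating how a small representative sub-dataset can stand in for the full data.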

Reference-free Metagenomics Analysis Using Matrix Factorization
Wenxuan Zhong and Xin Xing
University of Georgia
wenxuanugaedu
Metagenomics refers to the study of a collection of genomes, typically microbial genomes, present in a sample. The sample itself can come from diverse sources depending on the study, e.g., a sample from the gastrointestinal tract of a human patient, or a sample of soil from a particular ecological origin. The premise is that by understanding the genomic composition of the sample, one can form hypotheses about properties of the sample, e.g., disease correlates of the patient or ecological health of the soil source. Existing methods are limited in complex metagenome studies because they rely on the similarity between short DNA fragments and genomes in a database. In this talk I will introduce a reference-free genome deconvolution algorithm that can simultaneously estimate the composition of a microbial community and the quantity of each species. Some theoretical results for the deconvolution method will also be discussed.

Big Data, Big Models, Big Problems: Statistical Principles and Practice at Scale
Alexander W. Blocker
Google
awblockergooglecom
Massive datasets can yield great insights, but only when united with sound statistical principles and careful computation. We share lessons from a set of problems in industry, all of which combine classical design and theory with large-scale computation. Simply obtaining reliable confidence intervals means grappling with complex dependence and distributed systems, and obtaining masses of additional data can actually degrade estimates without careful inference and computation. These problems highlight the opportunities for statisticians to provide a distinct contribution to the world of big data.

Session 8 Recent Developments in Survival Analysis

Bayesian Joint Modeling of Multi-dimensional Longitudinal and Survival Data with Applications to Cancer Clinical Trials
Ming-Hui Chen1, Danjie Zhang1, Joseph G. Ibrahim2, Mark E. Boye3, and Wei Shen3

1University of Connecticut
2University of North Carolina
3Eli Lilly and Company
ming-huichen@uconn.edu

Motivated by the large phase III multicenter randomized single-blind EMPHACIS mesothelioma clinical trial, we develop a class of shared parameter joint models for multi-dimensional longitudinal and survival data. Specifically, we propose a class of multivariate mixed effects regression models for multi-dimensional longitudinal measures, and a class of frailty and cure rate survival models for progression-free survival (PFS) time and overall survival (OS) time. The properties of the proposed models are examined in detail. In addition, we derive the decomposition of the logarithm of the pseudo marginal likelihood (LPML), i.e., LPML = LPML_Long + LPML_Surv|Long, to assess the fit of each component of the joint model, and in particular to assess the fit of the longitudinal component and the survival component of the joint model separately, and further use ∆LPML to determine the importance and contribution of the longitudinal data to the model fit of the survival data. Moreover, efficient Markov chain Monte Carlo sampling algorithms are developed to carry out posterior computation. We apply the proposed methodology to a detailed case study in mesothelioma.

2014 ICSA-KISS Applied Statistics Symposium, Portland, June 15-18

Abstracts

Estimating Risk with Time-to-Event Data: An Application to the Women's Health Initiative
Dandan Liu1, Yingye Zheng2, Ross Prentice2, and Li Hsu2

1Vanderbilt University
2Fred Hutchinson Cancer Research Center
dandanliu@vanderbilt.edu

Accurate and individualized risk prediction is critical for population control of chronic diseases such as cancer and cardiovascular disease. Large cohort studies provide valuable resources for building risk prediction models, as the risk factors are collected at baseline and subjects are followed over time until disease occurrence or termination of the study. However, for rare diseases the baseline risk may not be estimated reliably from cohort data alone due to sparse events. In this paper, we propose to make use of external information to improve the efficiency of estimating time-dependent absolute risk. We derive the relationship between external disease incidence rates and the baseline risk, and incorporate the external disease incidence information into the estimation of absolute risks, while allowing for a potential difference in disease incidence rates between the cohort and external sources. The asymptotic distributions of the proposed estimators are established. Simulation results show that the proposed estimator for absolute risk is more efficient than that based on the Breslow estimator, which does not utilize external disease incidence rates. A large cohort study, the Women's Health Initiative Observational Study, is used to illustrate the proposed method.

Efficient Estimation of Nonparametric Genetic Risk Function with Censored Data
Yuanjia Wang1, Baosheng Liang2, and Donglin Zeng3

1Columbia University
2Beijing Normal University
3University of North Carolina at Chapel Hill
yw2016@columbia.edu

With an increasing number of causal genes discovered for Mendelian and complex human disorders, it is important to assess the genetic risk distribution functions of disease onset for subjects who are carriers of these causal mutations, and compare them with the disease distribution in non-carriers. In many genetic epidemiological studies of genetic risk functions, the disease onset information is subject to censoring. In addition, subjects' mutation carrier or non-carrier status may be unknown, due to the cost of ascertaining subjects to collect DNA samples, or due to death in older subjects (especially for late-onset diseases). Instead, the probability of subjects' genetic marker or mutation status can be obtained from various sources. When genetic status is missing, the available data take the form of mixture censored data. Recently, various methods have been proposed in the literature using parametric, semiparametric, and nonparametric models to estimate the genetic risk distribution functions from such data. However, none of the existing approaches is efficient in the presence of censoring and mixture, and the computation for some methods is demanding. In this paper, we propose a sieve maximum likelihood estimator that is fully efficient for inferring genetic risk distribution functions nonparametrically. Specifically, we estimate the logarithm of the hazard ratios between genetic risk groups using B-splines, while applying nonparametric maximum likelihood estimation (NPMLE) for the reference baseline hazard function. Our estimator can be calculated via an EM algorithm, and the computation is much faster than for existing methods. Furthermore, we establish the asymptotic distribution of the obtained estimator and show that it is consistent and semiparametric efficient, and thus the optimal estimator in this framework. The asymptotic theory on our sieve estimator sheds light on optimal estimation for censored mixture data. Simulation studies demonstrate superior performance of the proposed method in small finite samples. The method is applied to estimate the distribution of Parkinson's disease (PD) age at onset for carriers of mutations in the leucine-rich repeat kinase 2 (LRRK2) G2019S gene, using data from the Michael J. Fox Foundation Ashkenazi Jewish LRRK2 consortium. This estimation is important for genetic counseling purposes, since this test is commercially available yet genetic risk (penetrance) estimates have been variable.

Support Vector Hazard Regression for Predicting Event Times Subject to Censoring
Xiaoxi Liu1, Yuanjia Wang2, and Donglin Zeng1

1University of North Carolina
2Columbia University
dzeng@email.unc.edu

Predicting dichotomous or continuous disease outcomes using powerful machine learning approaches has been studied extensively in various scientific areas. However, how to learn prediction rules for time-to-event outcomes subject to right censoring has received little attention until very recently. Existing approaches rely on inverse probability weighting or rank-based methods, which are inefficient. In this paper, we develop a novel support vector hazards regression (SVHR) approach to predict time-to-event outcomes using right-censored data. Our method is based on predicting the counting process via a series of support vector machines for time-to-event outcomes among subjects at risk. Introducing counting processes to represent the time-to-event data leads to an intuitive connection of the method with support vector machines in standard supervised learning and with hazard regression models in standard survival analysis. The resulting optimization is a convex quadratic programming problem that can easily incorporate non-linearity using kernel machines. We demonstrate an interesting connection of the profiled empirical risk function with the Cox partial likelihood, which sheds light on the optimality of SVHR. We formally show that SVHR is optimal in discriminating the covariate-specific hazard function from the population average hazard function, and establish the consistency and learning rate of the predicted risk. Simulation studies demonstrate much improved prediction accuracy of the event times using SVHR compared to existing machine learning methods. Finally, we apply our method to analyze data from two real-world studies to demonstrate the superiority of SVHR in practical settings.

Session 9 Industry Practice and Regulatory Pathway for Benefit-Risk Assessment of Medicinal Products

Visual Communication and Assessment of Benefit-Risk for Medical Products
Jonathan D. Norton
MedImmune
nortonj@medimmune.com


Benefit-risk assessments are multidimensional and hence challenging both to formulate and to communicate. A particular limitation of some benefit-risk graphics is that they are based on the marginal distributions of benefit and harm, and do not show the degree to which they occur in the same patients. Consider, for example, an imaginary drug that is beneficial to 50% of patients. At the 2010 ICSA Symposium, the speaker introduced a graphic showing the benefit-risk state of each subject over time. This talk will include a new graphic based on similar principles that is intended for early-phase studies. It allows the user to assess the joint distribution of benefit and harm on the individual and cohort levels. The speaker will also review other graphical displays that may be effective for benefit-risk assessment, considering accepted principles of statistical graphics and his experience working for FDA and industry.

Some Thoughts on Evaluation of Uncertainty in Benefit-Risk Assessment
Qi Jiang1, Haijun Ma1, Christy Chuang-Stein2, Weili He3, George Quartey4, John Scott5, and Shihua Wen6

1Amgen Inc
2Pfizer Inc
3Merck & Co
4Hoffmann-La Roche
5United States Food and Drug Administration
6AbbVie Inc
qjiang@amgen.com

Increasingly, companies, regulatory agencies, and other governance bodies are moving toward structured benefit-risk assessment approaches. One issue that complicates such structured approaches is uncertainty, which comes from multiple sources and needs to be addressed. To develop potential approaches to address these sources of uncertainty, it is critical first to have a thorough understanding of them. In this presentation, members of the Benefit-Risk Working Group of the Quantitative Sciences in the Pharmaceutical Industry (QSPI BRWG) will discuss some major sources of uncertainty and share some thoughts on how to address them.

Current Concept of Benefit-Risk Assessment of Medicine
Syed S. Islam
AbbVie Inc
syedislam@abbvie.com

Benefit-risk assessment of a medicine should be as dynamic as the stages of drug development and the life cycle of a drug. Three fundamental clinical concepts are critical at all stages: the seriousness of the disease, how much improvement will occur due to the drug under consideration, and harmful effects, including their frequency, seriousness, and duration. One has to achieve a desirable balance between these, particularly prior to market approval, and follow up prospectively to see that the balance is maintained. The desirable balance is not a straightforward concept; it depends on judgment by various stakeholders. The patients, who are the direct beneficiaries of the medicine, should be the primary stakeholders, provided adequate, clear, and concise information is available to them. The healthcare providers must have similar information that they can communicate to their patients. The regulators and insurers are also stakeholders, for different reasons. Industry developing or producing the drug must provide adequate and transparent information usable by all stakeholders. Any quantitative approach to integrated benefit-risk balance should be parsimonious and transparent, along with sensitivity analyses. This presentation will discuss the pros and cons of a dynamic benefit-risk assessment, and how integrated benefit-risk analyses can be incorporated within the FDA/EMA framework that includes patient preference.

Session 10 Analysis of Observational Studies and Clinical Trials

Impact of Tuberculosis on Mortality Among HIV-Infected Patients Receiving Antiretroviral Therapy in Uganda: A Case Study in Propensity Score Analysis
Rong Chu1, Edward J. Mills2, Joseph Beyene3, Eleanor Pullenayegum4, Celestin Bakanda5, Jean B. Nachega6, and Lehana Thabane3

1Agensys Inc (Astellas)
2University of Ottawa / McMaster University
3McMaster University
4McMaster University / University of Toronto
5The AIDS Support Organization
6Stellenbosch University
rongchu@agensys.com

Background: Tuberculosis (TB) disease affects survival among HIV co-infected patients on antiretroviral therapy (ART). Yet the magnitude of the effect of TB disease on mortality is poorly understood.
Methods: Using a prospective cohort of 22,477 adult patients who initiated ART between August 2000 and June 2009 in Uganda, we assessed the effect of active pulmonary TB disease at the initiation of ART on all-cause mortality using a Cox proportional hazards model. Propensity score (PS) matching was used to control for potential confounding. Stratification on and covariate adjustment for the PS, as well as non-PS-based multivariable Cox models, were also performed.
Results: A total of 1,609 (7.52%) patients had active pulmonary TB at the start of ART. TB patients had higher proportions of being male, suffering from AIDS-defining illnesses, having World Health Organization (WHO) disease stage III or IV, and having lower CD4 cell counts at baseline (p < 0.001). The percentages of death during follow-up were 10.47% and 6.38% for patients with and without TB, respectively. The hazard ratio (HR) for mortality comparing TB to non-TB patients using 1,686 PS-matched pairs was 1.37 (95% confidence interval [CI]: 1.08-1.75), less marked than the crude estimate (HR = 1.74, 95% CI: 1.49-2.04). The other PS-based methods and the non-PS-based multivariable Cox model produced similar results.
Conclusions: After controlling for important confounding variables, HIV patients who had TB at the initiation of ART in Uganda had an approximately 37% increased hazard of overall mortality relative to non-TB patients.
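For readers unfamiliar with the PS-matching step, here is a minimal self-contained sketch (entirely hypothetical simulated data, not the Uganda cohort): a logistic propensity model fit by Newton-Raphson, greedy 1:1 nearest-neighbor matching with a caliper, and a matched risk difference that is less confounded than the crude contrast.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical cohort: one confounder x drives both exposure (tb) and the
# outcome (death), so the crude rate difference is confounded.
n = 4000
x = rng.standard_normal(n)
tb = rng.binomial(1, 1 / (1 + np.exp(-(x - 1.5))))
death = rng.binomial(1, 1 / (1 + np.exp(-(-2.0 + 1.0 * x + 0.3 * tb))))

# 1. Propensity model P(tb = 1 | x), fit by Newton-Raphson logistic regression.
X = np.column_stack([np.ones(n), x])
b = np.zeros(2)
for _ in range(25):
    p = 1 / (1 + np.exp(-X @ b))
    b += np.linalg.solve(X.T @ ((p * (1 - p))[:, None] * X), X.T @ (tb - p))
ps = 1 / (1 + np.exp(-X @ b))

# 2. Greedy 1:1 nearest-neighbor matching on the propensity score with a caliper.
caliper = 0.05
controls = list(np.where(tb == 0)[0])
pairs = []
for i in np.where(tb == 1)[0]:
    d = np.abs(ps[controls] - ps[i])
    j = int(np.argmin(d))
    if d[j] < caliper:
        pairs.append((i, controls.pop(j)))

# 3. Risk difference within the matched sample (vs. the crude, confounded one).
rd = death[[i for i, _ in pairs]].mean() - death[[j for _, j in pairs]].mean()
print(len(pairs), round(rd, 3))
```

The matched difference is markedly smaller than the crude difference, mirroring the attenuation from HR 1.74 to 1.37 reported above.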

Ecological Momentary Assessment Methods to Increase Response and Adjust for Attrition in a Study of Middle School Students' Exposure to Alcohol Advertising
Steven Martino, Rebecca Collins, Stephanie Kovalchik, Kirsten Becker, Elizabeth D'Amico, William Shadel, and Marc Elliott
RAND Corporation
skovalch@rand.org

Ecological momentary assessment (EMA) is a new approach for collecting data about repeated exposures in natural settings that has become more practical with the growth of mobile technologies. EMA has the potential to reduce recall bias. However, because EMA occurs more often and frequently than traditional surveys, missing data are common. In this paper, we describe the design and preliminary results of a longitudinal EMA study of exposure to alcohol advertising among middle school students (n=600), which employed a randomized missing design to increase response rates to smartphone surveys. Early results (n=125) show evidence of attrition over the 14-day collection period, which was not associated with student characteristics but was associated with study day. We develop a prediction model for non-response and adjust for attrition in exposure summaries using inverse probability weighting. Attrition-adjusted estimates suggest that youths saw an average of 3.8 alcohol ads per day, over twice what has been previously reported with conventional assessment. Corrected for attrition, EMA may allow more accurate estimation of frequent exposures than one-time delayed recall.
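A minimal sketch of the inverse-probability-weighting idea (simulated data; not the authors' model, which predicts non-response from richer information): days with low response probability are up-weighted, removing the downward bias that day-dependent attrition induces in the naive mean.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical EMA study: response probability falls with study day (attrition)
# while daily ad exposure rises, so the naive mean over observed reports
# under-represents the later, higher-exposure days.
n_subj, n_day = 600, 14
day = np.tile(np.arange(1, n_day + 1), (n_subj, 1))
true_rate = 2.0 + 0.25 * day                       # mean ads seen per day
ads = rng.poisson(true_rate)
resp = rng.binomial(1, 1 / (1 + np.exp(-(2.0 - 0.25 * day))))

naive = ads[resp == 1].mean()

# Response model: with study day as the only predictor in this sketch, the
# fitted response probability is just the empirical response rate per day.
p_hat = resp.mean(axis=0)
ipw = np.sum(ads * resp / p_hat) / np.sum(resp / p_hat)

truth = true_rate.mean()
print(round(naive, 2), round(ipw, 2), round(truth, 2))
```

The weighted estimate recovers the true average exposure, while the naive mean sits well below it.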

Is Poor Antisaccade Performance in Healthy First-Degree Relatives of Schizophrenics an Artifact of Study Design?
Charity J. Morgan1, Mark F. Lenzenweger2, and Deborah L. Levy3

1University of Alabama at Birmingham
2State University of New York at Binghamton
3McLean Hospital
cjmorgan@uab.edu

A number of traits associated with schizophrenia aggregate in relatives of schizophrenia patients at rates much higher than that of the clinical disorder. These traits, considered candidate endophenotypes, may be alternative, more penetrant manifestations of schizophrenia risk genes than schizophrenia itself. Performance on the antisaccade task, a measure of eye-tracking dysfunction, is one of the most widely studied candidate endophenotypes. However, there is little consensus on whether poor antisaccade performance is a true endophenotype for schizophrenia. Some studies comparing the performance of healthy relatives of schizophrenia patients (RelSZ) to that of normal controls (NC) report that RelSZ show significantly more errors, while others find no statistically significant differences between the two groups. A recent meta-analysis of these studies noted that some studies used stricter exclusion criteria for NC than for RelSZ, and found that these studies were more likely to find significant effect sizes. Specifically, NC with a personal or family history of psychopathology were excluded in these studies, whereas all RelSZ, including those with psychotic conditions, were included. In order to determine whether a difference in antisaccade performance between NC and RelSZ remains after controlling for differences in psychopathology, we fit a binomial regression model to data from an antisaccade task. We demonstrate that both psychopathology and familial history affect antisaccade performance.

Analysis of a Vaccine Study in Animals Using Mitigated Fraction in SAS
Mathew Rosales
Experis
mattrosales@experis.com

The mitigated fraction is frequently used to evaluate the effect of an intervention in reducing the severity of a particular outcome, and is a common measure in vaccine studies. It utilizes the ranks of the observations and measures the overlap of the two distributions using their stochastic ordering. Percent lung involvement is a common endpoint in vaccine studies to assess efficacy, and the mitigated fraction is used to estimate the relative increase in the probability that disease will be less severe in the vaccinated group. A SAS macro was developed to estimate the mitigated fraction and its confidence interval. The macro provides an asymptotic confidence interval and a bootstrap-based interval. For illustration, an actual vaccine study is used where the macro was utilized to generate the estimates.
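One common rank-based formulation (Siev's) computes MF = 2U/(n1·n2) − 1, where U is the Mann-Whitney count of vaccinated-control pairs in which the vaccinated animal is less affected. The numbers below are hypothetical, and this sketch (in Python rather than the SAS macro described above) omits the confidence intervals the macro provides.

```python
from itertools import product

def mitigated_fraction(vaccinated, controls):
    """MF = 2*U/(n1*n2) - 1, with U the count of vaccinated-control pairs in
    which the vaccinated outcome is less severe (ties count 1/2).  MF = 0
    means no mitigation; MF = 1 means uniformly milder disease if vaccinated."""
    u = sum(1.0 if v < c else 0.5 if v == c else 0.0
            for v, c in product(vaccinated, controls))
    return 2.0 * u / (len(vaccinated) * len(controls)) - 1.0

# Percent lung involvement in a toy study (hypothetical numbers).
vac = [0, 2, 3, 5, 8]
con = [4, 10, 15, 22, 30]
print(round(mitigated_fraction(vac, con), 2))   # → 0.84
print(mitigated_fraction(con, con))             # identical groups → 0.0
```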

Competing Risks Survival Analysis for Efficacy Evaluation of Some-or-None Vaccines
Paul T. Edlefsen
Fred Hutchinson Cancer Research Center
pedlefse@fhcrc.org

Evaluation of a vaccine's efficacy to prevent a specific type of infection endpoint, in the context of multiple endpoint types, is an important challenge in biomedicine. Examples include evaluation of multivalent vaccines, such as the annual influenza vaccines that target multiple strains of the pathogen. While statistical methods have been developed for "mark-specific vaccine efficacy" (where the term "mark" refers to a feature of the endpoint, such as its type, in contrast to a covariate of the subject), these methods address only vaccines that have a "leaky" vaccine mechanism, meaning that the vaccine's effect is to reduce the per-exposure probability of infection. The usual presentation of vaccine mechanisms contrasts "leaky" with "all-or-none" vaccines, which completely protect some fraction of the subjects, independent of the number of exposures that each subject experiences. We introduce the notion of the "some-or-none" vaccine mechanism, which completely protects a fraction of the subjects from a defined subset of the possible endpoint marks; for example, a flu vaccine that completely protects against the seasonal flu but has no effect against the H1N1 strain. Under conditions of non-harmful vaccines, we introduce a framework and Bayesian and frequentist methods to detect and quantify the extent to which a vaccine's partial efficacy is attributable to uneven efficacy across the marks rather than to incomplete "take" of the intervention. These new methods provide more power than existing methods to detect mark-varying efficacy (also called "sieve effects") when the conditions hold. We demonstrate the new framework and methods with simulation results and with new analyses of genetic signatures of vaccine effects in the RV144 HIV-1 vaccine efficacy trial.

Using Historical Data to Automatically Identify Air-Traffic Controller Behavior
Yuefeng Wu
University of Missouri at St. Louis
wuyue@umsl.edu

The Next Generation Air Traffic Control Systems are trajectory-based automation systems that rely on predictions of future states of aircraft, instead of relying solely on human abilities, as the National Airspace System (NAS) does now. As automation relying on trajectories becomes more safety-critical, the accuracy of these predictions needs to be fully understood. It is also very important for researchers developing future automation systems to understand, and in some cases mimic, how current operations are conducted by human controllers, to ensure that the new systems are at least as efficient as humans and to understand creative solutions used by human controllers. The work to be presented addresses both of these questions by developing statistical machine learning models to characterize the types of errors present when using current systems to predict future aircraft states. The models are used to infer situations in the historical data where an air-traffic controller intervened on an aircraft's route, even when there is no direct recording of this action. Local time series models and some other statistics are calculated to construct the feature vector; then both a naive Bayes classifier and a support vector machine are used to learn the pattern of the prediction errors.
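As a toy illustration of the classification step (hypothetical features, not the authors' models or data): a from-scratch Gaussian naive Bayes classifier applied to made-up per-segment trajectory-error features, with label 1 marking segments where a controller intervened.

```python
import math

class GaussianNB:
    """Minimal Gaussian naive Bayes: one mean/variance per class per feature."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.prior, self.stats = {}, {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.prior[c] = len(rows) / len(X)
            feats = []
            for col in zip(*rows):
                m = sum(col) / len(col)
                var = sum((v - m) ** 2 for v in col) / len(col) + 1e-6
                feats.append((m, var))
            self.stats[c] = feats
        return self

    def predict(self, x):
        def log_post(c):
            lp = math.log(self.prior[c])
            for v, (m, var) in zip(x, self.stats[c]):
                lp -= 0.5 * math.log(2 * math.pi * var) + (v - m) ** 2 / (2 * var)
            return lp
        return max(self.classes, key=log_post)

# Hypothetical per-segment features: (mean cross-track error, error variance);
# label 1 marks segments where a controller intervened on the route.
X = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.25), (2.0, 1.5), (2.5, 1.2), (1.8, 1.9)]
y = [0, 0, 0, 1, 1, 1]
model = GaussianNB().fit(X, y)
print(model.predict((0.12, 0.18)), model.predict((2.2, 1.4)))   # → 0 1
```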


Session 11 Lifetime Data Analysis

Analysis of Multiple Type Recurrent Events When Only Partial Information Is Available for Some Subjects
Min Zhan and Jeffery Fink
University of Maryland
mzhan@epi.umaryland.edu

In many longitudinal studies, subjects may experience multiple types of recurrent events. In some situations, the exact occurrence times of the recurrent events are not observed for some subjects. Instead, the only information available is whether these subjects experience each type of event in successive time intervals. We discuss marginal models to assess the effect of baseline covariates on the recurrent events. The proposed methods are applied to a clinical study of chronic kidney disease, in which subjects can experience multiple types of safety events repeatedly.

Cumulative Incidence Function under Two-Stage Randomization
Idil Yavuz1, Yu Cheng2, and Abdus Wahed2

1Dokuz Eylul University
2University of Pittsburgh
yucheng@pitt.edu

In recent years, personalized medicine and dynamic treatment regimens have drawn considerable attention. Dynamic treatment regimens are sets of rules that govern the treatment of subjects depending on their intermediate responses or covariates. Two-stage randomization is a useful set-up for gathering data to make inference on such regimens. Meanwhile, more and more practitioners are becoming aware of competing-risk censoring for event-type outcomes, where subjects in a study are exposed to more than one possible failure and the specific event of interest may be dependently censored by the occurrence of competing events. We aim to compare several treatment regimens from a two-stage randomized trial on survival outcomes that are subject to competing-risk censoring. In the presence of competing risks, the cumulative incidence function (CIF) has been widely used to quantify the cumulative probability of occurrence of the target event by a specific time point. However, if we only use the data from those subjects who have followed a specific treatment regimen to estimate the CIF, the resulting naive estimator may be biased. Hence, we propose alternative non-parametric estimators for the CIF using inverse weighting, and provide inference procedures based on the asymptotic linear representation. In addition, test procedures are developed to compare the CIFs from two different treatment regimens. Through simulation we show the practicality and advantages of the proposed estimators compared to the naive estimator. Since dynamic treatment regimens are widely used in treating cancer, AIDS, psychological disorders, and other illnesses that require complex treatment, and competing-risk censoring is common in studies with multiple endpoints, the proposed methods provide useful inferential tools to analyze such data and will help advocate research in personalized medicine.
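A generic illustration of why the CIF, rather than a one-minus-Kaplan-Meier curve that censors competing events, is the right target under competing risks (simulated data; this is not the authors' inverse-weighted estimator for treatment regimens).

```python
import numpy as np

rng = np.random.default_rng(3)

# Two latent competing event times; only the first event and its type are seen.
n = 20_000
t1 = rng.exponential(1.0, n)          # target event, e.g. relapse
t2 = rng.exponential(1.0, n)          # competing event, e.g. death
time = np.minimum(t1, t2)
cause = np.where(t1 <= t2, 1, 2)

t0 = 1.0

# CIF of the target event: with no censoring it is simply the proportion
# observed to fail from cause 1 by t0 (truth here: 0.5 * (1 - e^{-2*t0})).
cif1 = np.mean((time <= t0) & (cause == 1))

# What a "1 - Kaplan-Meier" analysis that censors competing events targets:
# P(t1 <= t0) as if the competing risk could be removed; in this exponential
# example the closed form is 1 - e^{-t0}, which exceeds the CIF.
naive = 1 - np.exp(-t0)
print(round(cif1, 3), round(naive, 3))
```

The gap (about 0.43 vs. 0.63 here) is exactly the kind of bias that motivates working with CIFs under competing-risk censoring.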

Nonparametric Threshold Selection with Censored Survival Data
Xinhua Liu and Zhezhen Jin
Columbia University
zj7@columbia.edu

In biomedical research and practice, quantitative biomarkers are often used for diagnostic or prognostic purposes, with a threshold established on the measurement to aid binary classification. When prognosis is on survival time, a single threshold may not be informative. It is also challenging to select a threshold when the survival time is subject to random censoring. Using survival-time-dependent sensitivity and specificity, we extend a classification-accuracy-based objective function to allow for a survival-dependent threshold. To estimate the optimal threshold for a range of survival rates, we adopt a non-parametric procedure, which produces satisfactory results in a simulation study. The method will be illustrated with a real example.
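A simplified sketch of threshold selection with time-dependent sensitivity and specificity (simulated, censoring-free data; the authors' procedure additionally handles random censoring): choose the cutoff maximizing a Youden-type criterion at a fixed horizon.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical biomarker: larger values mean higher hazard, shorter survival.
n = 5000
x = rng.standard_normal(n)
t = rng.exponential(np.exp(-x))        # hazard exp(x), mean survival exp(-x)

def youden(c, horizon):
    """Time-dependent Youden index at threshold c: sensitivity uses subjects
    failing by the horizon, specificity uses those surviving past it."""
    case = t <= horizon
    return np.mean(x[case] > c) + np.mean(x[~case] <= c) - 1

horizon = 1.0
grid = np.linspace(-2, 2, 201)
best = grid[np.argmax([youden(c, horizon) for c in grid])]
print(round(best, 2), round(youden(best, horizon), 2))
```

Repeating this over a range of horizons gives a survival-dependent threshold curve rather than a single cutoff.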

Session 12 Safety Signal Detection and Safety Analysis

Evaluation of Statistical Methods for the Identification of Potential Safety Signals
Maggie Chen1, Li Zhu1, Padmaja Chiruvolu, Liying Zhang, and Qi Jiang
Amgen Inc
magchen@amgen.com

With the increased regulatory requirements for risk evaluation and minimization strategies, large volumes of comprehensive safety data have been collected and maintained by pharmaceutical sponsors, and proactive evaluation of such safety data for continuous assessment of a product's safety profile has become essential during the drug development life cycle. This presentation will introduce several key statistical methodologies developed for safety signal screening and detection, including some methods recommended by regulatory agencies for spontaneous reporting data, as well as a few recently developed methodologies for clinical trial data. In addition, extensive simulation results will be presented comparing the performance of these methods in terms of sensitivity and false discovery rate. Conclusions and recommendations will be briefed as well.

Application of a Bayesian Method for Blinded Safety Monitoring and Signal Detection in Clinical Trials
Shihua Wen, Jyotirmoy Dey, Greg Ball, and Karolyn Kracht
AbbVie Inc
shihuawen@abbvie.com

Monitoring patient safety is an indispensable component of clinical trial planning and conduct. Proactive blinded safety monitoring and signal detection in ongoing clinical trials enable pharmaceutical sponsors to monitor patient safety closely and at the same time maintain the study blind. Bayesian methods, by their nature of updating knowledge based on accumulating data, provide an excellent framework for carrying out such a safety monitoring process. This presentation will provide a step-by-step illustration of how several Bayesian models, such as the beta-binomial model, the Poisson-gamma model, and posterior probability vs. predictive probability criteria, can be applied to safety monitoring for a particular adverse event of special interest (AESI) in a real clinical trial setting, under various adverse event occurrence patterns.
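A minimal sketch of the beta-binomial piece (hypothetical counts and reference rate): with a Beta prior and binomial likelihood, the posterior is Beta(a + events, b + n − events), and a signal can be flagged when the posterior probability that the pooled AESI rate exceeds a reference rate is large.

```python
import math

def beta_pdf(p, a, b):
    logc = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(logc + (a - 1) * math.log(p) + (b - 1) * math.log(1 - p))

def post_prob_exceeds(events, n, p0, a=1.0, b=1.0, steps=20_000):
    """P(rate > p0 | data): upper tail of the Beta(a + events, b + n - events)
    posterior, computed by trapezoidal integration of the density on [0, p0]."""
    a_post, b_post = a + events, b + n - events
    h = p0 / steps
    area = 0.0
    for i in range(steps + 1):
        p = min(max(i * h, 1e-12), 1 - 1e-12)
        area += (0.5 if i in (0, steps) else 1.0) * beta_pdf(p, a_post, b_post)
    return 1.0 - area * h

# Hypothetical blinded monitoring check: 12 AESI cases among 300 patients;
# how plausible is it that the pooled rate exceeds a 2% reference rate?
prob = post_prob_exceeds(events=12, n=300, p0=0.02)
print(round(prob, 3))
```

In practice the trigger would compare this posterior probability (or a predictive probability) to a pre-specified cutoff chosen at the design stage.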

Some Thoughts on the Choice of Metrics for Safety Evaluation
Steven Snapinn
Amgen Inc
ssnapinn@amgen.com

The magnitude of the treatment effect on adverse events can be assessed on a relative scale, such as the hazard ratio or the relative risk, or on an absolute scale, such as the risk difference, but there doesn't appear to be any consistency regarding which metric should be used in any given situation. In this presentation I will provide some examples where different metrics have been used, discuss their advantages and disadvantages, and provide a suggested approach.


Hypothesis Testing on Safety Data: A Recurrent Event Approach
Qi Gong1 and Liang Fang2

1Amgen Inc
2Gilead Sciences
liangfang@gilead.com

As an important aspect of the clinical evaluation of an investigational therapy, safety data are routinely collected in clinical trials. To date, the analysis of safety data has largely been limited to descriptive summaries of incidence rates or contingency tables aiming to compare simple rates between treatment arms. Many have argued that this traditional approach fails to take into account important information, including severity, onset time, and multiple occurrences of a safety signal. In addition, premature treatment discontinuation due to excessive toxicity causes informative censoring and may lead to potential bias in the interpretation of safety outcomes. In this article, we propose a framework to summarize safety data with the mean frequency function, and to compare safety events of interest between treatments with a generalized log-rank test, taking into account the aforementioned characteristics ignored in traditional analysis approaches. In addition, a multivariate generalized log-rank test to compare the overall safety profiles of different treatments is proposed. In the proposed method, safety events are considered to follow a recurrent event process with a terminal event for each patient. The terminal event is modeled by a process with two types of competing risks: safety events of interest and other terminal events. Statistical properties of the proposed method are investigated via simulations. An application is presented with data from a phase II oncology trial.
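A small sketch of a mean (cumulative) frequency function for recurrent safety events (toy data, not the authors' generalized log-rank machinery): a Nelson-Aalen-type sum of per-event-time increments d(s)/n_at_risk(s).

```python
from collections import defaultdict

def mean_cumulative_function(subjects, t):
    """Nelson-Aalen-type mean cumulative count of events per subject by time t.
    `subjects` is a list of (event_times, followup_end); a subject is in the
    risk set at time s while its follow-up has not yet ended."""
    times = sorted({s for ev, _ in subjects for s in ev if s <= t})
    incs = defaultdict(float)
    for s in times:
        at_risk = sum(1 for _, end in subjects if end >= s)
        d = sum(1 for ev, _ in subjects for e in ev if e == s)
        incs[s] = d / at_risk
    return sum(incs.values())

# Toy data: per-patient adverse-event times and end of follow-up (hypothetical).
data = [([0.5, 1.2], 2.0), ([0.8], 1.0), ([], 2.5), ([1.5, 1.8, 2.2], 2.4)]
print(round(mean_cumulative_function(data, 2.0), 3))   # → 1.5
```

Unlike a crude incidence proportion, this estimate uses every occurrence and shrinks the risk set as patients drop out of follow-up.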

Session 13 Survival and Recurrent Event Data Analysis

Survival Analysis without Survival Data
Gary Chan
University of Washington
kcgchan@uw.edu

We show that relative mean survival parameters of a semiparametric log-linear model can be estimated using covariate data from an incident sample and a prevalent sample, even when there is no prospective follow-up to collect any survival data. Estimation is based on an induced semiparametric density ratio model for covariates from the two samples, and it shares the same structure as a logistic regression model for case-control data. Likelihood inference coincides with well-established methods for case-control data. We show two further related results. First, estimation of interaction parameters in a survival model can be performed using covariate information only from a prevalent sample, analogous to a case-only analysis. Furthermore, propensity score and conditional exposure effect parameters on survival can be estimated using only covariate data collected from incident and prevalent samples.

Semiparametric Estimation for the Additive Hazards Model with Left-Truncated and Right-Censored Data
Chiung-Yu Huang1 and Jing Qin2

1Johns Hopkins University
2National Institute of Allergy and Infectious Diseases
cyhuang@jhmi.edu

Survival data from prevalent cases collected under a cross-sectional sampling scheme are subject to left-truncation. When fitting an additive hazards model to left-truncated data, the conditional estimating equation method (Lin and Ying, 1994), obtained by modifying the risk sets to account for left-truncation, can be very inefficient,

as the marginal likelihood of the truncation times is not used in the estimation procedure. In this paper, we use a pairwise pseudo-likelihood to eliminate nuisance parameters from the marginal likelihood and, by combining the marginal pairwise pseudo-score function and the conditional estimating function, propose an efficient estimator for the additive hazards model. The proposed estimator is shown to be consistent and asymptotically normally distributed, with a sandwich-type covariance matrix that can be consistently estimated. Simulation studies show that the proposed estimator is more efficient than its competitors. A data analysis illustrates the method.

Nonparametric Method for Data of Recurrent Infections after Hematopoietic Cell Transplantation
Chi Hyun Lee1, Xianghua Luo1, Chiung-Yu Huang2, and Todd DeFor1

1University of Minnesota
2Johns Hopkins University
luox0054@umn.edu

Infection is one of the most common complications after hematopoietic cell transplantation. It accounts for substantial morbidity and mortality among transplanted patients. Many patients experience infectious complications repeatedly over time. Existing statistical methods for recurrent gap time data typically assume that patients are enrolled because of the occurrence of an event of the same type as the recurrent event, or assume that all gap times, including the first gap, are identically distributed. Applying these methods to post-transplant infection data while ignoring event types will inevitably lead to incorrect inferential results, because the time from the transplant to the first infection has a different biological meaning than the gap times between recurrent infections after the first infection occurs. Alternatively, one may analyze only the data after the first infection to make the existing recurrent gap time methods applicable, but this introduces selection bias, because only patients who have experienced infections are included in the analysis. Other naive approaches include using univariate survival analysis methods, e.g., the Kaplan-Meier method, on the first-infection-only data, or using bivariate serial event data methods on the data up to the second infection. Hence, all subsequent infection data beyond the first or second infectious events will not be utilized in the analysis. These inefficient methods are expected to lead to decreased power. In this paper, we propose a nonparametric estimator of the joint distribution of the time from transplant to the first infection and the gap times between subsequent infections, and a semiparametric regression model for studying the risk factors for infectious complications in transplant patients. The proposed methods take into account the potentially different distributions of the two types of times (time from transplant to the first infection and the gap times between subsequent recurrent infections) and fully utilize the data of recurrent infections from patients. Asymptotic properties of the proposed estimators are established.

Session 14 Statistical Analysis on Massive Data from Point Processes

Identification of Synaptic Learning Rule from Ensemble Spiking Activities
Dong Song and Theodore W. Berger
University of Southern California
dsong@usc.edu

2014 ICSA-KISS Applied Statistics Symposium Portland June 15-18 45

Abstracts

The brain represents and processes information with spikes. To understand the biological basis of brain functions, it is essential to model the spike train transformations performed by brain regions. Such a model can also be used as a computational basis for developing cortical prostheses that can restore lost cognitive function by bypassing the damaged brain regions. We formulate a three-stage strategy for this modeling goal. First, we formulate a multiple-input, multiple-output (MIMO), physiologically plausible model for representing the nonlinear dynamics underlying spike train transformations. This model is equivalent to a cascade of a Volterra model and a generalized linear model. The model has been successfully applied to the hippocampal CA3-CA1 during learned behaviors. Second, we extend the model to nonstationary cases using a point-process adaptive filter technique. The resulting time-varying model captures how the MIMO nonlinear dynamics evolve with time while the animal is learning. Last, we seek to identify the learning rule that explains how the nonstationarity is formed as a consequence of the input-output flow that the brain region has experienced during learning.

Intrinsically Weighted Means and Non-Ergodic Marked Point Processes
Alexander Malinowski1, Martin Schlather1 and Zhengjun Zhang2
1University of Mannheim
2University of Wisconsin
zjz@stat.wisc.edu

While the definition of characteristics such as the mean mark in a marked point process (MPP) setup is unambiguous for ergodic processes, several definitions of mark averages are possible, and might be practically relevant, in the stationary but non-ergodic case. We give a general approach via weighted means with possibly intrinsically given weights. We discuss estimators in this situation and show their consistency and asymptotic normality under certain conditions. We also suggest a specific choice of weights that has a minimal-variance interpretation under suitable assumptions.

Statistical Analysis for Unlabeled Data Objects
Ela Sienkiewicz and Haonan Wang
Colorado State University
sienkiew@stat.colostate.edu

This talk is motivated by a data set of brain neuron cells. Each neuron is modeled as an unlabeled data object with topological and geometric properties characterizing the branching structure, connectedness, and orientation of a neuron. This poses serious challenges, since traditional statistical methods for multivariate data rely on linear operations in Euclidean space. We develop two curve representations for each object and define the notion of percentiles based on measures of topological and geometric variation through multi-objective optimization. In general, numerical solutions can be provided by implementing a genetic algorithm. The proposed methodology is illustrated by analyzing a data set of pyramidal neurons.

Session 15 High Dimensional Inference (or Testing)

Adaptive Sparse Reduced-rank Regression
Zongming Ma and Tingni Sun
University of Pennsylvania
tingni@wharton.upenn.edu

This paper studies the problem of estimating a large coefficient matrix in a multiple-response linear regression model when the coefficient matrix is both sparse and of low rank. We are especially interested in high-dimensional settings where the number of predictors and/or response variables can be much larger than the number of observations. We propose a new estimation scheme, which achieves competitive numerical performance while significantly reducing computation time when compared with state-of-the-art methods. Moreover, we show that the proposed estimator achieves near-optimal non-asymptotic minimax rates of estimation under a collection of squared Schatten norm losses simultaneously, by providing both error bounds for the estimator and minimax lower bounds. In particular, such optimality results hold in high-dimensional settings.

Variable Screening in Biothreat Detection Using Weighted Leverage Score
Wenxuan Zhong and Yiwen Liu
University of Georgia
yiwenliu@uga.edu

The early detection of a biothreat is extremely difficult because most of the early clinical signs in infected subjects are indistinguishable from "flu-like" symptoms. Recent research shows that genomic markers are the most reliable indicators, and thus they have been widely used in detection methods over the past decades. In this talk, I will introduce a biomarker screening method based on the weighted leverage score. The weighted leverage score is a variant of the leverage score that has been widely used for the diagnostics of linear regression. Empirical studies demonstrate that the weighted leverage score is not only computationally efficient but also statistically effective in variable screening.
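For background, the classical (unweighted) leverage score of observation i is the i-th diagonal entry of the hat matrix H = X(X'X)^{-1}X'; the weighting scheme in this abstract is the authors' variant and is not reproduced here. A minimal sketch of the classical quantity, computed stably via a thin QR decomposition:

```python
import numpy as np

def leverage_scores(X):
    """Leverage scores: the diagonal of the hat matrix
    H = X (X'X)^{-1} X', obtained as row norms of Q from a thin QR
    decomposition (avoids forming and inverting X'X)."""
    Q, _ = np.linalg.qr(X)
    return np.sum(Q**2, axis=1)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
h = leverage_scores(X)
# Each score lies in [0, 1]; they sum to the column rank of X.
```

The weighted variant referenced in the talk would reweight these scores before ranking variables.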

Testing High-Dimensional Nonparametric Functions with Application to Gene Set Analysis
Tao He, Ping-Shou Zhong, Yuehua Cui and Vidyadhar Mandrekar
Michigan State University
pszhong@stt.msu.edu

This paper proposes a test statistic for testing a high-dimensional nonparametric function in a reproducing kernel Hilbert space generated by a positive definite kernel. We studied the asymptotic distribution of the test statistic under the null hypothesis and a series of local alternative hypotheses in a "large p, small n" setup. A simulation study was used to evaluate the finite-sample performance of the proposed method. We applied the proposed method to yeast data and thyroid hormone data to identify pathways that are associated with traits of interest.

Zero-Inflation in Clustered Binary Response Data: Mixed Model and Estimating Equation Approaches
Danping Liu
National Institutes of Health
danpingliu@nih.gov

The NEXT Generation Health Study investigates dating violence among adolescents using a survey questionnaire. Each student is asked to affirm or deny multiple instances of violence in his/her dating relationship. There is, however, evidence suggesting that students not in a relationship responded to the survey, resulting in excessive zeros in the responses. This paper proposes likelihood-based and estimating equation approaches to analyze zero-inflated clustered binary response data. We adopt a mixed model method to account for the cluster effect, and the model parameters are estimated using a maximum likelihood (ML) approach that requires a Gauss-Hermite quadrature (GHQ) approximation for implementation. Since an incorrect assumption on the random effects distribution may bias the results, we construct generalized estimating equations (GEE) that do not require the correct specification of the within-cluster correlation. In a series of simulation studies, we examine


the performance of the ML and GEE methods in terms of their bias, efficiency, and robustness. We illustrate the importance of properly accounting for this zero-inflation by re-analyzing the NEXT data, where this issue has previously been ignored.

Session 16 Phase II Clinical Trial Design with Survival Endpoint

Utility-Based Optimization of Schedule-Dose Regimes Based on the Times to Response and Toxicity
Peter F. Thall1, Hoang Q. Nguyen1, Thomas Braun2 and Muzaffar Qazilbash1
1University of Texas MD Anderson Cancer Center
2University of Michigan
rex@mdanderson.org

A two-stage Bayesian phase I-II design for jointly optimizing the administration schedule and dose of an experimental agent, based on the times to response and toxicity, is described. Sequentially adaptive decisions are based on the joint utility of the two event times. A utility surface is constructed by partitioning the two-dimensional quadrant of event time pairs into rectangles, eliciting a numerical utility for each rectangle, and fitting a smooth parametric function to the elicited values. Event times are modeled using gamma distributions with shape and scale parameters both functions of schedule and dose. In stage 1, patients are randomized fairly among schedules, and a dose is chosen within each schedule using an algorithm that hybridizes greedy optimization and randomization among nearly optimal doses. In stage 2, fair randomization among schedules is replaced by the hybrid algorithm. An extension to accommodate death or discontinuation of follow-up is described. The design is illustrated by an autologous stem cell transplantation trial in multiple myeloma.

Bayesian Decision-Theoretic Two-Stage Design in Phase II Clinical Trials with Survival Endpoint
Lili Zhao and Jeremy Taylor
University of Michigan
zhaolili@umich.edu

In this study, we consider two-stage designs with failure-time endpoints in single-arm phase II trials. We propose designs in which stopping rules are constructed by comparing the Bayes risk of stopping at stage one with the expected Bayes risk of continuing to stage two, using both the observed data in stage one and the predicted survival data in stage two. Terminal decision rules are constructed by comparing the posterior expected loss of a rejection decision versus an acceptance decision. Simple threshold loss functions are applied to time-to-event data modeled either parametrically or nonparametrically, and the cost parameters in the loss structure are calibrated to obtain the desired Type I error and power. We ran simulation studies to evaluate design properties, including Type I and II errors, probability of early stopping, expected sample size, and expected trial duration, and compared them with the Simon two-stage designs and a design that extends Simon's designs to time-to-event endpoints. An example based on a recently conducted phase II sarcoma trial illustrates the method.

Single-Arm Phase II Group Sequential Trial Design with Survival Endpoint at a Fixed Time Point
Jianrong Wu and Xiaoping Xiong
St. Jude Children's Research Hospital
jianrongwu@stjude.org

Three nonparametric test statistics are proposed to design single-arm phase II group sequential trials for monitoring survival probability. The small-sample properties of these test statistics are studied through simulations. Sample size formulas are derived for the fixed-sample test. The Brownian motion property of the test statistics allows us to develop a flexible group sequential design using a sequential conditional probability ratio test procedure.

Session 17 Statistical Modeling of High-throughput Genomics Data

Learning Genetic Architecture of Complex Traits Across Populations
Marc Coram, Sophie Candille and Hua Tang
Stanford University
hualtang@gmail.com

Genome-wide association studies (GWAS) have successfully revealed many loci that influence complex traits and disease susceptibilities. An unanswered question is: to what extent does the genetic architecture underlying a trait overlap between human populations? We explore this question using blood lipid concentrations as a model trait. In African Americans and Hispanic Americans participating in the Women's Health Initiative SNP Health Association Resource, we validated one African-specific HDL locus, as well as 14 known lipid loci that have been previously implicated in studies of European populations. Moreover, we demonstrate striking similarities in the genetic architecture (loci influencing the trait, direction and magnitude of genetic effects, and proportions of phenotypic variation explained) of lipid traits across populations. In particular, we found that a disproportionate fraction of lipid variation in African Americans and Hispanic Americans can be attributed to genomic loci exhibiting statistical evidence of association in Europeans, even though the precise genes and variants remain unknown. At the same time, we found substantial allelic heterogeneity within shared loci, characterized both by population-specific rare variants and by variants shared among multiple populations that occur at disparate frequencies. This allelic heterogeneity emphasizes the importance of including diverse populations in future genetic association studies of complex traits such as lipids; furthermore, the overlap in lipid loci across populations of diverse ancestral origin argues that additional knowledge can be gleaned from multiple populations. We discuss how the overlapping genetic architecture can be exploited to improve the efficiency of GWAS in minority populations.

A Bayesian Hierarchical Model to Detect Differentially Methylated Loci from Single Nucleotide Resolution Sequencing Data
Hao Feng, Karen Conneely and Hao Wu
Emory University
haowu@emory.edu

DNA methylation is an important epigenetic modification that has essential roles in cellular processes, including gene regulation, development, and disease, and is widely dysregulated in most types of cancer. Recent advances in sequencing technology have enabled the measurement of DNA methylation at single nucleotide resolution through methods such as whole-genome bisulfite sequencing and reduced representation bisulfite sequencing. In DNA methylation studies, a key task is to identify differences under distinct biological contexts, for example between tumor and normal tissue. A challenge in sequencing studies is that the number of biological replicates is often limited by the costs of sequencing. The small number


of replicates leads to unstable variance estimation, which can reduce the accuracy of detecting differentially methylated loci (DML). Here we propose a novel statistical method to detect DML when comparing two treatment groups. The sequencing counts are described by a lognormal-beta-binomial hierarchical model, which provides a basis for information sharing across different CpG sites. A Wald test is developed for hypothesis testing at each CpG site. Simulation results show that the proposed method yields improved DML detection compared to existing methods, particularly when the number of replicates is low. The proposed method is implemented in the Bioconductor package DSS.
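To illustrate the shape of a per-CpG Wald test on methylation counts: a simplified sketch assuming a known, shared beta-binomial overdispersion `phi` (DSS instead shrinks dispersion estimates across sites via its hierarchical model, which is not reproduced here; the counts below are made up):

```python
import numpy as np

def wald_dml(meth1, total1, meth2, total2, phi=0.0):
    """Wald statistic for a difference in methylation proportion
    between two groups at one CpG site. meth*/total* are per-replicate
    methylated and total read counts; phi is an assumed beta-binomial
    overdispersion (phi = 0 reduces to pure binomial variance)."""
    p1 = meth1.sum() / total1.sum()
    p2 = meth2.sum() / total2.sum()

    def pooled_var(p, n):
        # Beta-binomial variance of the pooled proportion estimate:
        # sum_i n_i p(1-p)(1 + (n_i - 1) phi) / (sum_i n_i)^2
        return np.sum(n * p * (1 - p) * (1 + (n - 1) * phi)) / n.sum() ** 2

    se = np.sqrt(pooled_var(p1, total1) + pooled_var(p2, total2))
    return (p1 - p2) / se

stat = wald_dml(np.array([18, 22]), np.array([20, 25]),
                np.array([5, 7]),  np.array([20, 24]), phi=0.1)
```

A large |stat| at a site flags it as a candidate DML; DSS replaces the plug-in variance with its shrunken estimate.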

Differential Isoform Expression Analysis in RNA-Seq Using Random-Effects Meta-Regression
Weihua Guan1, Rui Xiao2, Chun Li3 and Mingyao Li2
1University of Minnesota
2University of Pennsylvania
3Vanderbilt University
rxiao@mail.med.upenn.edu

A major application of RNA-Seq is to detect differential isoform expression across experimental conditions. However, this is challenging because of uncertainty in isoform expression estimation, owing to ambiguous reads, and because of variability in the precision of the estimates across samples. It is desirable to have a method that can account for these issues and also allow adjustment for covariates. In this paper, we present a random-effects meta-regression approach that naturally fits this purpose. Through extensive simulations and analysis of an RNA-Seq dataset on human heart failure, we show that this approach is computationally fast and reliable, and can improve the power of differential expression analysis while controlling for false positives due to the effect of covariates or confounding variables.
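The random-effects machinery such an approach builds on can be sketched with the classical DerSimonian-Laird estimator, which pools per-sample effect estimates while allowing for between-sample heterogeneity (a generic sketch with made-up numbers, not the authors' meta-regression model, which also admits covariates):

```python
import numpy as np

def dersimonian_laird(effects, variances):
    """DerSimonian-Laird random-effects pooling: estimate the
    between-unit heterogeneity tau^2 from Cochran's Q, then return
    the inverse-variance weighted pooled effect and its SE."""
    y, v = np.asarray(effects, float), np.asarray(variances, float)
    w = 1.0 / v
    ybar = np.sum(w * y) / np.sum(w)          # fixed-effect pooled estimate
    Q = np.sum(w * (y - ybar) ** 2)           # Cochran's Q statistic
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (Q - (len(y) - 1)) / c)   # DL tau^2, truncated at 0
    w_re = 1.0 / (v + tau2)                   # random-effects weights
    pooled = np.sum(w_re * y) / np.sum(w_re)
    se = np.sqrt(1.0 / np.sum(w_re))
    return pooled, se, tau2

pooled, se, tau2 = dersimonian_laird([0.4, 0.6, 0.5], [0.01, 0.02, 0.015])
```

In the isoform setting, `effects` would be per-sample expression estimates and `variances` their estimation uncertainties, so imprecise samples are automatically down-weighted.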

Allele-Specific Differential Methylation Analysis with Next Generation Methylation Sequencing Data
Fei Zou
University of North Carolina at Chapel Hill
feizou@email.unc.edu

Next-generation Methyl-seq data collected from F1 reciprocal crosses in mouse can powerfully dissect strain and parent-of-origin effects on allele-specific methylation. In this talk, we present a novel statistical approach to analyze Methyl-seq data, motivated by an F1 mouse study. Our method jointly models the strain and parent-of-origin effects, deals with the over-dispersion problem commonly observed in read counts, and can flexibly adjust for the effects of covariates such as sex and read depth. We also propose a genomic control procedure to properly control the Type I error for Methyl-seq studies in which the number of samples is small.

Session 18 Statistical Applications in Finance

A Stochastic Mixture Model for Economic Cycles
Haipeng Xing1 and Ning Sun2
1State University of New York
2IBM
xing@ams.sunysb.edu

Markov switching models have been used in various applications in economics and finance. As existing Markov switching models describe the regimes or parameter values in a categorical way, they are restrictive in practical analysis. In this paper, we introduce a mixture model with stochastic regimes, in which the regimes and model parameters are represented both categorically and continuously. Assuming conjugate priors, we develop closed-form recursive Bayes estimates of the regression parameters, an approximation scheme that has much lower computational complexity yet is comparable to the Bayes estimates in statistical efficiency, and an expectation-maximization procedure to estimate the unknown hyperparameters. We conduct intensive simulation studies to evaluate the performance of the Bayes estimates of time-varying parameters and their approximations. We further apply the proposed model to analyze the series of U.S. monthly total non-farm employees.

Statistical Modelling of Bidding Prices in Online Ad Position Auctions
Xiaoming Huo
Georgia Institute of Technology
xiaoming@isye.gatech.edu

Ad position auctions are being held all the time in nearly all web search engines and have become the major source of revenue in online advertising. We study statistical models of the bidding prices. Two approaches are explored: (1) a game-theoretic approach that characterizes bidders' behavior, and (2) a statistical generative approach, which aims at mimicking the fundamental mechanism underlying the bidding process. We compare and contrast these two approaches and describe how the auctioneer can take advantage of the obtained knowledge.

Regression with Rank Covariates: A Distribution-Guided Score for Ranks
Do Hwan Park1, Yuneung Kim2, Johan Lim3, Sujung Choi4 and Hsun-Chih Kuo5
1University of Maryland
2Seoul National University
3Auburn University
4Ulsan National Institute of Science and Technology
5National Chengchi University
johanlim@snu.ac.kr

This work is motivated by a hand-collected data set from one of the largest internet portals in Korea. The data set records the top 30 most frequently discussed stocks on its online stock message board, which can be considered a measure of investors' attention to individual stocks. The empirical goal is to investigate the effect of this attention on trading behavior. To do so, we consider a regression model whose response is either stock return performance or trading volume, and whose covariates are the daily observed partial ranks as well as other covariates influential to the response. In estimating the regression model, the rank covariate is often treated as an ordinal categorical variable or simply transformed into a score variable (mostly using the identity score function). In this paper, we start our discussion with the univariate regression problem, where we find the asymptotic normality of the regression coefficient estimator, whose mean is 0 and whose variance is an unknown function of the distribution of X. We then extend the results from univariate to multiple regression and obtain a similar asymptotic distribution. We finally consider an estimator for multiple sets by extending or combining the estimators of each single set. We apply our proposed distribution-guided scoring function to the motivating data set to empirically demonstrate the attention effect.

Optimal Sparse Volatility Matrix Estimation for High-Dimensional Ito Processes with Measurement Errors
Minjing Tao1, Yazhen Wang2 and Harrison Zhou3
1Florida State University
2University of Wisconsin-Madison


3Yale University
tao@stat.fsu.edu

Stochastic processes are often used to model complex scientific problems in fields ranging from biology and finance to engineering and physical science. This talk investigates rate-optimal estimation of the volatility matrix of a high-dimensional Ito process observed with measurement errors at discrete time points. The minimax rate of convergence is established for estimating sparse volatility matrices. By combining the multi-scale and threshold approaches, we construct a volatility matrix estimator that achieves the optimal convergence rate. The minimax lower bound is derived by considering a subclass of Ito processes for which the minimax lower bound is obtained through a novel equivalent model of covariance matrix estimation for independent but non-identically distributed observations, and through a delicate construction of the least favorable parameters. In addition, a simulation study was conducted to test the finite-sample performance of the optimal estimator, and the simulation results were found to support the established asymptotic theory.

Session 19 Hypothesis Testing

A Score-Type Test for Heterogeneity in Zero-Inflated Models in a Stratified Population
Guanqun Cao1, Wei-Wen Hsu2 and David Todem3
1Auburn University
2Kansas State University
3Michigan State University
gzc0009@auburn.edu

We propose a score-type statistic to evaluate heterogeneity in zero-inflated models for count data in a stratified population, where heterogeneity is defined as instances in which the zero counts are generated from two sources. In this work, we extend the literature by describing a score-type test that evaluates homogeneity against general alternatives that do not neglect the stratification information under the alternative hypothesis. Our numerical simulation studies show that the proposed test can greatly improve efficiency over tests of heterogeneity that ignore the stratification information. An empirical application to dental caries data in early childhood further shows the importance and practical utility of the methodology in using the stratification profile to detect heterogeneity in the population.

Inferences on Correlation Coefficients of Bivariate Log-normal Distributions
Guoyi Zhang1 and Zhongxue Chen2
1University of New Mexico
2Indiana University
gzhang123@gmail.com

This research considers inference on the correlation coefficients of bivariate log-normal distributions. We develop a generalized confidence interval and hypothesis tests for the correlation coefficient, and extend the results to comparing two independent correlations. Simulation studies show that the suggested methods work well even for small samples. The methods are illustrated using two practical examples.
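For context, the correlation of a bivariate log-normal pair is a standard closed-form function of the parameters of the underlying bivariate normal; the sketch below shows that identity (this is background only, independent of the authors' generalized confidence interval construction):

```python
import numpy as np

def lognormal_corr(rho, s1, s2):
    """Correlation of (exp(X), exp(Y)) when (X, Y) is bivariate normal
    with correlation rho and standard deviations s1, s2:
    (exp(rho*s1*s2) - 1) / sqrt((exp(s1^2) - 1)(exp(s2^2) - 1))."""
    num = np.exp(rho * s1 * s2) - 1.0
    den = np.sqrt((np.exp(s1**2) - 1.0) * (np.exp(s2**2) - 1.0))
    return num / den

r = lognormal_corr(0.8, 1.0, 1.0)
```

Note that the log-normal correlation is attenuated relative to the normal-scale rho, and cannot reach -1, which is part of what makes inference on it delicate.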

Testing Calibration of Risk Models at Extremes of Disease Risk
Minsun Song1, Peter Kraft2, Amit D. Joshi2, Myrto Barrdahl3 and Nilanjan Chatterjee1
1National Cancer Institute
2Harvard University
3German Cancer Research Center
songm4@mail.nih.gov

Risk-prediction models need careful calibration to ensure that they produce unbiased estimates of risk for subjects in the underlying population given their risk-factor profiles. As subjects with extremely high or low risk may be the most affected by knowledge of their risk estimates, checking the adequacy of risk models at the extremes of risk is very important for clinical applications. We propose a new approach to test model calibration targeted toward the extremes of the disease risk distribution, where standard goodness-of-fit tests may lack power due to sparseness of data. We construct a test statistic based on model residuals summed over only those individuals who pass high and/or low risk thresholds, and then maximize the test statistic over different risk thresholds. We derive an asymptotic distribution for the max-test statistic based on an analytic derivation of the variance-covariance function of the underlying Gaussian process. The method is applied to a large case-control study of breast cancer to examine joint effects of common SNPs discovered through recent genome-wide association studies. The analysis clearly indicates a non-additive effect of the SNPs on the scale of absolute risk, but an excellent fit for the linear-logistic model even at the extremes of risk.

Statistical Issues When Incidence Rates Are Extremely Low and Sample Sizes Are Very Big
Peter Hu and Haijun Ma
Amgen Inc.
phu@amgen.com

It is well known that sample sizes of clinical trials are often not big enough to assess adverse events (AE) with very low incidence rates. Large-scale observational studies, such as pharmacovigilance studies using healthcare databases, provide an alternative resource for the assessment of very rare adverse events. Healthcare databases can often easily provide tens of thousands of exposed patients, which potentially allows the assessment of events with rates below 10^-4. In this talk, we discuss the performance of various commonly used statistical methods for the comparison of binomial proportions of very rare events. The statistical power, Type I error control, confidence interval (CI) coverage, length of the confidence interval, bias and variability of treatment effect estimates, as well as the distribution of the CI upper bound, etc., will be examined and compared for the different methods. Power calculation is often necessary for study planning purposes. However, many commonly used power calculation methods are based on approximation and may give erroneous estimates of power when events are rare. We will compare the power estimates for different methods provided by SAS Proc Power and obtained empirically via simulation. The use of relative risks (RR) and risk differences (RD) will also be commented on. Based on these results, several recommendations are given to guide sample size assessments for such types of studies at the design stage.

Minimum Distance Regression Model Checking When Responses Are Missing at Random
Xiaoyu Li
Auburn University
xzl0037@auburn.edu

This paper proposes a class of lack-of-fit tests for fitting a parametric regression model when response variables are missing at random. These tests are based on a class of minimum integrated square distances between a kernel-type estimator of a regression function and the parametric regression function being fitted. These tests are shown to be consistent against a large class of fixed alternatives. The corresponding test statistics are shown to have asymptotic normal distributions under the null hypothesis. Some simulation results are also presented.

Session 20 Design and Analysis of Clinical Trials

Application of a Bayesian Approach in Assessing Rare Adverse Events during a Clinical Study
Grace Li, Karen Price, Haoda Fu and David Manner
Eli Lilly and Company
li_ying_grace@lilly.com

Bayesian analysis is gaining wider application in decision making throughout the drug development process, due to its more intuitive framework and its ability to provide direct probabilistic answers to complex problems. Determining the risk profile of a compound throughout the phases of drug development is crucial, along with ensuring that the most appropriate analyses are performed. In a conventional two-arm parallel study design, rare adverse events are often assessed via frequentist approaches, such as Fisher's exact test, with their known limitations. This presentation will focus on the challenges of the frequentist approach in detecting and evaluating potential safety signals in the rare-event setting, and will compare it with the proposed Bayesian approach. We will compare the operating characteristics of the frequentist and Bayesian approaches using simulated data. Most importantly, the proposed approach offers much more flexibility and a more direct probabilistic interpretation that improves the process of detecting rare safety signals. This approach highlights the strength of Bayesian methods for inference. The simulation results are intended to demonstrate the value of using Bayesian methods, and that appropriate application has the potential to increase the efficiency of decision making in drug development.
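The kind of direct probabilistic statement contrasted here with Fisher's exact test can be sketched with a conjugate beta-binomial model (a generic illustration with made-up counts and Jeffreys priors, not the model used in the presentation):

```python
import numpy as np

def prob_excess_risk(x_t, n_t, x_c, n_c, a=0.5, b=0.5,
                     draws=100_000, seed=1):
    """Posterior probability that the treatment-arm AE rate exceeds
    the control-arm rate, under independent Beta(a, b) priors on each
    arm's event probability (conjugate beta-binomial updating,
    evaluated by Monte Carlo)."""
    rng = np.random.default_rng(seed)
    p_t = rng.beta(a + x_t, b + n_t - x_t, size=draws)
    p_c = rng.beta(a + x_c, b + n_c - x_c, size=draws)
    return float(np.mean(p_t > p_c))

# Hypothetical rare-event data: 4 events in 1000 treated vs 1 in 1000 controls.
prob = prob_excess_risk(4, 1000, 1, 1000)
```

Unlike a p-value, `prob` answers the clinical question directly ("how likely is the treatment rate to be higher?") and remains well defined even with zero events in an arm.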

A Simplified Varying-Stage Adaptive Phase II/III Clinical Trial Design
Gaohong Dong
Novartis Pharmaceuticals Corporation
gaohongdong@novartis.com

Conventionally, adaptive phase II/III clinical trials are carried out with a strict two-stage design. Recently, Dong (Statistics in Medicine 2014; 33(8):1272-87) proposed a varying-stage adaptive phase II/III clinical trial design. In this design, following the first stage, an intermediate stage can be adaptively added to obtain more data, so that a more informative decision can be made regarding whether the trial can be advanced to the final confirmatory stage. Therefore, the number of further investigational stages is determined based upon the data accumulated up to the interim analysis. Later, Dong (2013 ICSA Symposium Book, to be published) investigated some characteristics of this design. That design considers two plausible study endpoints, with one of them initially designated as the primary endpoint; based on interim results, the other endpoint can be switched in as the primary endpoint. However, in many therapeutic areas the primary study endpoint is well established; therefore, we simplify the design to consider one study endpoint only. Our simulations show that, like the original design, this simplified design controls the Type I error rate very well; the sample size increases as the threshold probability for the two-stage setting increases; and the alpha allocation ratio in the two-stage setting vs. the three-stage setting has a great impact on the design. However, this simplified design requires a larger sample size for the initial stage to overcome the power loss due to futility stopping. Compared to a strict two-stage phase II/III design, this simplified design improves the probability of trial success.

Improving Multiple Comparison Procedures with Coprimary Endpoints by Generalized Simes Tests
Hua Li1, Willi Maurer1, Werner Brannath2 and Frank Bretz1
1Novartis Pharmaceuticals Corporation
2University of Bremen
jenniferli@novartis.com

For a fixed-dose combination of indacaterol acetate (a long-acting β2-agonist) and mometasone furoate (an inhaled corticosteroid) for the once-daily maintenance treatment of asthma and chronic obstructive pulmonary disease (COPD), both lung function improvement and improvement in one symptom outcome are required for the drug to be developed successfully. The symptom outcome could be Asthma Control Questionnaire (ACQ) improvement for the asthma program, and exacerbation rate reduction for the COPD program. Having two endpoints increases the probability of false positive results by chance alone, i.e., of marketing a drug which is not, or is insufficiently, effective. Therefore, regulatory agencies require strict control of this probability at a pre-specified significance level (usually 2.5% one-sided). The Simes test is often used in our clinical trials. However, the Simes test requires the assumption that the test statistics are positively correlated. This assumption is not always satisfied, or cannot be easily verified, when dealing with multiple endpoints. In this presentation, an extension of the Simes test, the generalized Simes test introduced by Maurer, Glimm and Bretz (2011), which is applicable to any correlation (positive, negative, or even no correlation), is utilized. Power benefits based on simulations are presented. The FDA and other agencies have accepted this approach, indicating that the proposed method can be used in other trials in the future.
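For reference, the classical Simes test of the global null rejects if any ordered p-value satisfies p_(i) <= i*alpha/n. A minimal sketch (the generalized Simes test of Maurer, Glimm and Bretz modifies the critical constants to drop the positive-dependence requirement, and is not reproduced here):

```python
import numpy as np

def simes_test(pvals, alpha=0.025):
    """Classical Simes global test: reject the intersection null if
    the i-th smallest p-value is at most i * alpha / n for some i.
    Validity relies on non-negatively correlated test statistics."""
    p = np.sort(np.asarray(pvals, float))
    n = len(p)
    thresholds = alpha * np.arange(1, n + 1) / n
    return bool(np.any(p <= thresholds))

# Two endpoint p-values at one-sided alpha = 0.025 (illustrative values):
reject = simes_test([0.02, 0.024])
```

Here neither p-value passes Bonferroni (alpha/2 = 0.0125), but the Simes step-up criterion rejects because the larger p-value clears its threshold of 0.025.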

Efficient Design for Cluster Randomized Trials with Binary Outcomes
Sheng Wu, Weng Kee Wong and Catherine Crespi
University of California at Los Angeles
shengwu@ucla.edu

Cluster randomized trials (CRTs) are increasingly used for research in many fields, including public health, education, social studies, and ethnic disparity studies. Equal allocation designs are often used in CRTs, but they may not be optimal, especially when cost considerations are taken into account. In this paper, we consider two-arm cluster randomized trials with a binary outcome and develop various optimal designs when the sampling costs for units and clusters differ and the primary outcome is the attributable risk or relative risk. We consider both frequentist and Bayesian approaches in the context of cancer control and prevention cluster randomized trials, and present formulae for the optimal sample sizes of the two arms for each outcome measure.

2014 ICSA-KISS Applied Statistics Symposium, Portland, June 15-18

Abstracts

Zero Event and Continuity Correction in Meta-Analyses of Rare Events Using Mantel-Haenszel Odds Ratio and Risk Difference
Tianyue Zhou
Sanofi-aventis US LLC
tianyuezhou@sanofi.com
Meta-analysis of side effects has been widely used to combine data with low event rates across comparative clinical studies for evaluating a drug's safety profile. When dealing with rare events, a substantial proportion of studies may not have any events of interest. In common practice, meta-analyses on a relative scale (relative risk [RR] or odds ratio [OR]) remove zero-event studies, while meta-analyses using risk difference [RD] as the effect measure include them. As continuity corrections are often used when zero events occur in either arm of a study, the impact of zero events and continuity correction on estimates of the Mantel-Haenszel (M-H) OR and RD was examined through simulation. Two types of continuity correction, the treatment arm continuity correction and the constant continuity correction, are applied in the meta-analysis for variance calculation. For the M-H OR, it is unnecessary to include zero-event trials, and the 95% confidence interval [CI] of the estimate without continuity corrections provided the best coverage. For the M-H RD, including zero-event trials reduced bias, and using certain continuity corrections ensured at least 95% coverage of the 95% CI. This paper examined the influence of zero events and continuity correction on estimates of the M-H OR and RD in order to help people decide whether to include zero-event trials and use continuity corrections for a specific problem.
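The zero-event behavior discussed in this abstract can be read off the Mantel-Haenszel formula itself: a study with no events in either arm contributes zero to both the numerator and the denominator of the pooled OR, so excluding it changes nothing unless a continuity correction is added. A minimal sketch (constant correction only; the treatment-arm correction and the simulation design are not reproduced):

```python
def mh_odds_ratio(tables, cc=0.0):
    """Mantel-Haenszel pooled odds ratio across 2x2 tables.
    Each table is (a, b, c, d) = (trt events, trt non-events,
    ctl events, ctl non-events); cc is an optional constant
    continuity correction added to every cell."""
    num = den = 0.0
    for a, b, c, d in tables:
        a, b, c, d = a + cc, b + cc, c + cc, d + cc
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    return num / den
```

Without correction, appending a zero-event table such as (0, 50, 0, 50) leaves the pooled OR unchanged.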

Session 21 New Methods for Big Data

Sure Independence Screening for Gaussian Graphical Models
Shikai Luo1, Daniela Witten2 and Rui Song1

1North Carolina State University
2University of Washington
rsong@ncsu.edu
In high-dimensional genomic studies, it is of interest to understand the regulatory network underlying tens of thousands of genes based on hundreds or at most thousands of observations for which gene expression data are available. Because graphical models can identify how variables, such as the co-expression of genes, are related, they are frequently used to study genetic networks. Although various efficient algorithms have been proposed, statisticians still face huge computational challenges when the number of variables is in the tens of thousands of dimensions or higher. Motivated by the fact that the columns of the precision matrix can be obtained by solving p regression problems, each of which involves regressing one feature onto the remaining p − 1 features, we consider covariance screening for Gaussian graphical models. The proposed methods and algorithms possess theoretical properties, such as sure screening properties, and satisfactory empirical behavior.
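The neighborhood-regression view in this abstract suggests a simple screening step: for each feature j, retain the d features with the largest absolute sample correlation with feature j before fitting the regression. A minimal sketch of this idea only (function name illustrative; the paper's exact screening statistic and thresholds are not reproduced):

```python
import numpy as np

def covariance_screen(X, j, d):
    """Screen for the neighborhood of feature j: rank the other
    features by absolute sample correlation with feature j and
    keep the d strongest, in the spirit of sure independence
    screening applied column by column."""
    r = np.corrcoef(X, rowvar=False)[j].copy()
    r[j] = 0.0  # exclude self-correlation
    return np.argsort(-np.abs(r))[:d]
```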

Case-Specific Random Forests
Ruo Xu1, Dan Nettleton2 and Daniel J. Nordman2

1Google
2Iowa State University
dnett@iastate.edu
Random forest (RF) methodology is a nonparametric approach to prediction problems. A standard way to utilize RFs is to generate a global RF in order to predict all test cases of interest. In this talk, we propose growing different RFs specific to different test cases, namely case-specific random forests (CSRFs). In contrast to the bagging procedure used in building standard RFs, the CSRF algorithm takes weighted bootstrap resamples to create individual trees, where we assign large weights a priori to the training cases in close proximity to the test case of interest. Tuning methods are discussed to avoid overfitting issues. Both simulation and real data examples show that CSRFs often outperform standard RFs in prediction. We also propose the idea of case-specific variable importance (CSVI) as a way to compare the relative predictor variable importance for predicting a particular case. It is possible that the idea of building a predictor case-specifically can be generalized to other areas.
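The weighted-bootstrap step can be sketched as follows, using a Gaussian kernel on Euclidean distance as a stand-in for the paper's proximity measure (the function name, kernel choice and bandwidth are all illustrative assumptions, not the authors' specification):

```python
import numpy as np

def case_specific_resamples(X_train, x_test, n_trees=100, bandwidth=1.0, rng=None):
    """Weighted bootstrap resamples for case-specific trees: training
    cases close to the test case get larger resampling weights, so
    each tree is grown mostly from the local neighborhood of x_test."""
    rng = rng or np.random.default_rng(0)
    d = np.linalg.norm(X_train - x_test, axis=1)
    w = np.exp(-(d / bandwidth) ** 2)
    w = w / w.sum()
    n = len(X_train)
    return [rng.choice(n, size=n, replace=True, p=w) for _ in range(n_trees)]
```

Each index array would then feed one tree of the case-specific forest.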

Uncertainty Quantification for Massive Data Problems using Generalized Fiducial Inference
Randy C. S. Lai1, Jan Hannig2 and Thomas C. M. Lee1
1University of California at Davis
2University of North Carolina at Chapel Hill
tcmlee@ucdavis.edu
In this talk we present a novel parallel method for computing parameter estimates and their standard errors for massive data problems. The method is based on generalized fiducial inference.

OEM Algorithm for Big Data
Xiao Nie and Peter Z. G. Qian
University of Wisconsin-Madison
xiaonie@stat.wisc.edu
Big data with large sample sizes arise in Internet marketing, engineering and many other fields. We propose an algorithm called OEM (a.k.a. orthogonalizing EM) for analyzing big data. This algorithm employs a procedure named active orthogonalization to expand an arbitrary matrix to an orthogonal matrix. This procedure yields closed-form solutions to ordinary and various penalized least squares problems. The maximum number of points needed to be added is bounded by the number of columns of the original matrix, which is appealing for large-n problems. Attractive theoretical properties of OEM include (1) convergence to the Moore-Penrose generalized inverse estimator for a singular regression matrix and (2) convergence to a point having grouping coherence for a fully aliased regression matrix. We also extend this algorithm to logistic regression. The effectiveness of OEM for least squares and logistic regression problems will be illustrated through examples.
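Specialized to ordinary least squares, the OEM iteration has a closed form: with γ at least the largest eigenvalue of X'X, the update β ← β + (X'y − X'Xβ)/γ converges to the least-squares solution for a full-rank design. A minimal sketch of this special case only (the active orthogonalization construction and the penalized and logistic variants are not reproduced):

```python
import numpy as np

def oem_ols(X, y, n_iter=500):
    """OEM iteration specialized to ordinary least squares:
    beta <- beta + (X'y - X'X beta) / gamma, with gamma the
    largest eigenvalue of X'X, so each update is a convergent
    EM-style step toward the least-squares solution."""
    XtX, Xty = X.T @ X, X.T @ y
    gamma = np.linalg.eigvalsh(XtX).max()
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        beta = beta + (Xty - XtX @ beta) / gamma
    return beta
```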

Session 22 New Statistical Methods for Analysis of High Dimensional Genomic Data

Integrative Modeling of Multi-Platform Genomic Data under the Framework of Mediation Analysis
Yen-Tsung Huang
Brown University
yen-tsung_huang@brown.edu
Given the availability of genomic data, there have been emerging interests in integrating multi-platform data. Here we propose to model epigenetic DNA methylation, micro-RNA expression and gene expression data as a biological process to delineate phenotypic traits, under the framework of causal mediation modeling. We propose a regression model for the joint effect of methylation, micro-RNA expression, gene expression and their non-linear interactions on the outcome, and study three path-specific effects: the direct effect of methylation on the outcome, the effect mediated through expression, and the effect through micro-RNA expression. We characterize correspondences between the three path-specific effects and coefficients in the regression model, which are influenced by causal relations among methylation, micro-RNA and gene expression. A score test for variance components of regression coefficients is developed to assess path-specific effects. The test statistic under the null follows a mixture of chi-square distributions, which can be approximated using a characteristic function inversion method or a perturbation procedure. We construct tests for candidate models determined by different combinations of methylation, micro-RNA, gene expression and their interactions, and further propose an omnibus test to accommodate different models. The utility of the method will be illustrated in numerical simulation studies and glioblastoma data from The Cancer Genome Atlas (TCGA).

Estimation of High Dimensional Directed Acyclic Graphs using eQTL Data
Wei Sun1 and Min Jin Ha2

1University of North Carolina at Chapel Hill
2University of Texas MD Anderson Cancer Center


weisun@email.unc.edu

Observational data can be used to estimate the skeleton of a directed acyclic graph (DAG) and the directions of a limited number of edges. With sufficient interventional data, one can identify the directions of all the edges of a DAG. However, such interventional data are often not available, especially for high dimensional problems. We develop a statistical method to estimate a DAG using surrogate interventional data, where the interventions are applied to a set of external variables, and thus such interventions are considered to be surrogate interventions on the variables of interest. Our work is motivated by expression quantitative trait locus (eQTL) studies, where the variables of interest are the expression of genes, the external variables are DNA variations, and interventions are applied to DNA variants during the process in which a randomly selected DNA allele is passed to a child from either parent. Our method, named sirDAG (surrogate intervention recovery of DAG), first constructs the DAG skeleton using a combination of penalized regression and the PC algorithm, and then estimates the posterior probabilities of all the edge directions after incorporating DNA variant data. We demonstrate the advantage of sirDAG by simulations and an application in an eQTL study of >18,000 genes in 550 breast cancer patients.

Prioritizing Disease Genes from Genome-wide Association Studies Through Dynamic Co-expression Networks
Lin Hou1, Min Chen2, Clarence Zhang3, Judy Cho4 and Hongyu Zhao1

1Yale University
2University of Texas at Dallas
3Bristol-Myers Squibb
4Mount-Sinai Medical Center
hongyuzhao@yale.edu

Although Genome Wide Association Studies (GWAS) have identified many susceptibility loci for common diseases, they only explain a small portion of heritability. It is challenging to identify the remaining disease loci because their association signals are likely weak and difficult to identify among millions of candidates. One potentially useful direction to increase statistical power is to incorporate functional genomics information, especially gene expression networks, to prioritize GWAS signals. Most current methods utilizing network information to prioritize disease genes are based on the "guilt by association" principle, in which networks are treated as static and disease-associated genes are assumed to be located closer to each other than random pairs in the network. In contrast, we propose a novel "guilt by rewiring" principle. Studying the dynamics of gene networks between controls and patients, this principle assumes that disease genes are more likely to undergo rewiring in patients, whereas most of the network remains unaffected in the disease condition. To demonstrate this principle, we consider the changes of co-expression networks in Crohn's disease patients and controls, and how network dynamics reveals information on disease associations. Our results demonstrate that network rewiring is abundant in the immune system and disease-associated genes are more likely to be rewired in patients. To integrate this network rewiring feature and GWAS signals, we propose to use the Markov random field framework to integrate network information to prioritize genes. Applications in Crohn's disease and Parkinson's disease show that this framework leads to more replicable results and implicates potentially disease-associated pathways.

Kernel Machine Methods for Joint Testing and Integrative Analysis of Genome Wide Methylation and Genotyping Studies
Ni Zhao and Michael Wu
Fred Hutchinson Cancer Research Center
nzhao@fhcrc.org
Comprehensive understanding of complex trait etiology requires examination of multiple sources of genomic variability. Integrative analysis of these data sources promises elucidation of the biological processes underlying particular phenotypes. Consequently, many large GWAS consortia are expanding to simultaneously examine the joint role of DNA methylation. Two practical challenges have arisen for researchers interested in joint analysis of GWAS and methylation studies of the same subjects. First, it is unclear how to leverage both data types to determine if particular genetic regions are related to traits of interest. Second, it is of considerable interest to understand the relative roles of different sources of genomic variability in complex trait etiology, e.g., whether epigenetics mediates genetic effects. Therefore, we propose to use the powerful kernel machine framework for first testing the cumulative effect of both epigenetic and genetic variability on a trait, and for subsequent mediation analysis to understand the mechanisms by which the genomic data types influence the trait. In particular, we develop an approach that works at the gene/region level (to allow for a common unit of analysis across data types). Then we compare pairwise similarity in the trait values between individuals to pairwise similarity in methylation and genotype values for a particular gene, with correspondence suggestive of association. Similarity in methylation and genotype is found by constructing an optimally weighted average of the similarities in methylation and genotype. For a significant gene/region, we then develop a causal steps approach to mediation analysis at the gene/region level, which enables elucidation of the manner in which the different data types work, or do not work, together. We demonstrate through simulations and real data applications that our proposed testing approach often improves power to detect trait-associated genes while protecting type I error, and that our mediation analysis framework can often correctly elucidate the mechanisms by which genetic and epigenetic variability influences traits. A key feature of our approach is that it falls within the kernel machine testing framework, which allows for heterogeneity in effect sizes, nonlinear and interactive effects, and rapid p-value computation. Additionally, the approach can be easily applied to analysis of rare variants and sequencing studies.

Session 23 Recent Advances in Analysis of Longitudinal Data with Informative Observation Process

Joint Modeling of Alternating Recurrent Transition Times
Liang Li
University of Texas MD Anderson Cancer Center
LLi15@mdanderson.org
Atrial fibrillation (AF) is a common complication in patients undergoing cardiac surgery. Recent technological advancement enables physicians to monitor the occurrence of AF continuously with implanted cardiac devices. The device records two types of transitional times: the time when the heart enters AF status from normal beat, and the time when the heart exits AF status and returns to normal beat. The two transitional time processes are recurrent and appear alternately. Hundreds of transitional times may be recorded on a single patient over a follow-up period of up to 12 months. The recurrent pattern carries information on the risk of AF and may be related to baseline covariates. The previous AF pattern may be predictive of the subsequent AF pattern. We propose a semiparametric bivariate longitudinal transitional time model for this complicated process. The model enables single-subject analysis as well as multiple-subject analysis, and both can be carried out in a likelihood framework. We present numerical studies to illustrate the empirical performance of the methodology.

Regression Analysis of Panel Count Data with Informative Observation Times
Yang Li1, Xin He2, Haiying Wang3 and Jianguo Sun4

1University of North Carolina at Charlotte
2University of Maryland
3University of New Hampshire
4University of Missouri at Columbia
YLi@uncc.edu

Panel count data usually occur in medical follow-up studies. Most existing approaches to panel count data analysis have assumed that the observation or censoring times are independent of the response process, either completely or given some covariates. We present a joint analysis approach in which the possible mutual correlations are characterized by time-varying random effects. Estimating equations are developed for the parameter estimation, and a simulation study is conducted to assess the finite sample performance of the approach. The asymptotic properties of the proposed estimates are also given, and the method is applied to an illustrative example.

Envelope Linear Mixed Model
Xin Zhang
University of Minnesota
zhxnzx@gmail.com

Envelopes were recently proposed by Cook, Li and Chiaromonte (2010) as a method for reducing estimative and predictive variation in multivariate linear regression. We extend their formulation, proposing a general definition of an envelope and adapting envelope methods to linear mixed models. Simulations and illustrative data analyses show the potential for envelope methods to significantly improve standard methods in longitudinal and multivariate data analysis. This is joint work with Professor R. Dennis Cook and Professor Joseph G. Ibrahim.

Regression Analysis of Longitudinal Data with Irregular and Informative Observation Times
Yong Chen, Jing Ning and Chunyan Cai
University of Texas Health Science Center at Houston
ccaistat@gmail.com

In longitudinal data analyses, the observation times are often assumed to be independent of the outcomes. In applications in which this assumption is violated, the standard inferential approach of using generalized estimating equations may lead to biased inference. Current methods require the correct specification of either the observation time process or the repeated measures process with a correct covariance structure. In this article, we construct a novel pairwise pseudo-likelihood method for longitudinal data that allows for dependence between observation times and outcomes. This method investigates the marginal covariate effects on the repeated measures process while leaving the probability structure of the observation time process unspecified. The novelty of this method is that it yields consistent estimators of the marginal covariate effects without specification of the observation time process or the covariance structure of the repeated measures process. Large sample properties of the regression coefficient estimates and a pseudo-likelihood-ratio test procedure are established. Simulation studies demonstrate that the proposed method performs well in finite samples. An analysis of weight loss data from a web-based program is presented to illustrate the proposed method.

Session 24 Bayesian Models for High Dimensional Complex Data

A Bayesian Feature Allocation Model for Tumor Heterogeneity
Juhee Lee1, Peter Mueller2, Yuan Ji3 and Kamalakar Gulukota4

1University of California at Santa Cruz
2University of Texas at Austin
3University of Chicago
4NorthShore University HealthSystem
juheelee@soe.ucsc.edu
We propose a feature allocation model for tumor heterogeneity. The data are next-generation sequencing (NGS) data from tumor samples. We use a variation of the Indian buffet process to characterize latent hypothetical subclones based on single nucleotide variations (SNVs). We define latent subclones by the presence of some subset of the recorded SNVs. Assuming that each sample is composed of some sample-specific proportions of these subclones, we can then fit the observed proportions of SNVs for each sample. By taking a Bayesian perspective, the proposed method provides a full description of all possible solutions as a coherent posterior probability model for all relevant unknown quantities, including the binary indicators that characterize the latent subclones by selecting (or not) the recorded SNVs, instead of reporting a single solution.

Some Results on the One-Way ANOVA Model with an Increasing Number of Groups
Feng Liang
University of Illinois at Urbana-Champaign
liangf@illinois.edu
Asymptotic studies on models with diverging dimensionality have received increasing attention in statistics. A simple version of such models is a one-way ANOVA model where the number of replicates is fixed but the number of groups goes to infinity. Of interest are inference problems like model selection and estimation of the unknown group means. We examine the consistency of Bayesian procedures using Zellner's (1986) g-prior and its variants (such as mixed g-priors and Empirical Bayes), and compare their estimation accuracy with other procedures, such as ones based on AIC/BIC and the group Lasso. Our results indicate that the Empirical Bayes procedure (with some modification for the large-p, small-n setting) and the fully Bayes procedure (i.e., a prior is specified on g) can achieve model selection consistency and also have better estimation accuracy than the other procedures being considered.

Bayesian Graphical Models for Differential Pathways
Riten Mitra1, Peter Mueller2 and Yuan Ji3
1University of Louisville
2University of Texas at Austin
3NorthShore University HealthSystem/University of Chicago
jiyuan@uchicago.edu
Graphical models can be used to characterize the dependence structure for a set of random variables. In some applications, the form of dependence varies across different subgroups. This situation arises, for example, when protein activation on a certain pathway is recorded and a subgroup of patients is characterized by a pathological disruption of that pathway. A similar situation arises when one subgroup of patients is treated with a drug that targets that same pathway. In both cases, understanding changes in the joint distribution and dependence structure across the two subgroups is key to the desired inference. Fitting a single model for the entire data set could mask the differences. Separate independent analyses, on the other hand, could reduce the effective sample size and ignore the common features. In this paper, we develop a Bayesian graphical model that addresses heterogeneity and implements borrowing of strength across the two subgroups by simultaneously centering the prior towards a global network. The key feature is a hierarchical prior for graphs that borrows strength across edges, resulting in a comparison of pathways across subpopulations (differential pathways) under a unified model-based framework. We apply the proposed model to data sets from two very different studies: histone modifications from ChIP-seq experiments, and protein measurements based on tissue microarrays.

Latent Space Models for Dynamic Networks
Yuguo Chen
University of Illinois at Urbana-Champaign
yuguo@illinois.edu

Dynamic networks are used in a variety of fields to represent the structure and evolution of the relationships between entities. We present a model which embeds longitudinal network data as trajectories in a latent Euclidean space. A Markov chain Monte Carlo algorithm is proposed to estimate the model parameters and latent positions of the nodes in the network. The model parameters provide insight into the structure of the network, and the visualization provided by the model gives insight into the network dynamics. We apply the latent space model to simulated data as well as real data sets to demonstrate its performance.

Session 25 Statistical Methods for Network Analysis

Consistency of Co-clustering for Exchangeable Graph and Array Data
David S. Choi1 and Patrick J. Wolfe2
1Carnegie Mellon University
2University College London
davidch@andrew.cmu.edu

We analyze the problem of partitioning a 0-1 array or bipartite graph into subgroups (also known as co-clustering), under the relatively mild assumption that the data are generated by a general nonparametric process. This problem can be thought of as co-clustering under model misspecification; we show that the additional error due to misspecification can be bounded by O(n^(-1/4)). Our result suggests that under certain sparsity regimes, community detection algorithms may be robust to modeling assumptions, and that their usage is analogous to the usage of histograms in exploratory data analysis.

Laplacian Shrinkage for Inverse Covariance Estimation from Heterogeneous Populations
Takumi Saegusa and Ali Shojaie
University of Washington
ashojaie@uw.edu

We introduce a general framework using a Laplacian shrinkage penalty for estimation of inverse covariance or precision matrices from heterogeneous, non-exchangeable populations. The proposed framework encourages similarity among disparate but related subpopulations, while allowing for differences among the estimated matrices. We propose an efficient alternating direction method of multipliers (ADMM) algorithm for parameter estimation, and establish both variable selection and norm consistency of the estimator for distributions with exponential or polynomial tails. Finally, we discuss the selection of the Laplacian shrinkage penalty based on hierarchical clustering in settings where the true relationship among samples is unknown, and discuss conditions under which this data-driven choice results in consistent estimation of precision matrices. Extensive numerical studies and applications to gene expression data from subtypes of cancer with distinct clinical outcomes indicate the potential advantages of the proposed method over existing approaches.

Estimating Signature Subgraphs in Samples of Labeled Graphs
Juhee Cho and Karl Rohe
University of Wisconsin-Madison
chojuhee@stat.wisc.edu
Networks form a vibrant area in statistics, biology and computer science. Recently, an emerging type of data in these fields is samples of labeled networks (or graphs). The "labels" of networks imply that the nodes are labeled and that the same set of nodes reappears in all of the networks. They also have a dual meaning, in that there are values (e.g., age, gender, or healthy vs. sick) or vectors of values characterizing the associated network. From the analysis, we observe that only a part of the network, forming a "signature subgraph," varies across the networks, whereas the other part is very similar. We therefore develop methods to estimate the signature subgraph and show theoretical properties of the suggested methods under a framework that allows the sample size to go to infinity with a sparsity condition. To check the finite sample performance of the methods, we conduct a simulation study and then analyze two data sets: 42 brain-graphs from 21 subjects, and transcriptional regulatory network data from 41 diverse human cell types.

Fast Hierarchical Modeling for Recommender Systems
Patrick Perry
New York University
pperry@stern.nyu.edu
In the context of a recommender system, a hierarchical model allows for user-specific tastes while simultaneously borrowing estimation strength across all users. Unfortunately, existing likelihood-based methods for fitting hierarchical models have high computational demands, and these demands have limited their adoption in large-scale prediction tasks. We propose a moment-based method for fitting a hierarchical model, which has its roots in a method originally introduced by Cochran in 1937. The method trades statistical efficiency for computational efficiency. It gives consistent parameter estimates, competitive prediction error performance, and dramatic computational improvements.
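The abstract does not specify Perry's estimator; as an illustration of the flavor of Cochran-style moment methods, the DerSimonian-Laird estimator of a between-group variance τ² is closed-form and one-pass, with no likelihood iteration. This is a stand-in from the meta-analysis literature, not the paper's method:

```python
import numpy as np

def moment_tau2(est, var):
    """DerSimonian-Laird moment estimator of the between-group
    variance tau^2 from per-group estimates and their sampling
    variances: tau^2 = max(0, (Q - (k-1)) / (sum w - sum w^2 / sum w)),
    with inverse-variance weights w and Cochran's Q statistic."""
    est, w = np.asarray(est, float), 1.0 / np.asarray(var, float)
    mu = (w * est).sum() / w.sum()
    Q = (w * (est - mu) ** 2).sum()
    denom = w.sum() - (w ** 2).sum() / w.sum()
    return max(0.0, (Q - (len(est) - 1)) / denom)
```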

Session 26 New Analysis Methods for Understanding Complex Diseases and Biology

Data-Integration for Identifying Clinically Important Long Non-coding RNA in Cancer
Yiwen Chen1, Zhou Du2, Teng Fei1, Roel G.W. Verhaak3, Yong Zhang2, Myles Brown4 and X. Shirley Liu4

1Dana Farber Cancer Institute
2Tongji University
3University of Texas MD Anderson Cancer Center
4Dana Farber Cancer Institute & Harvard University
ywchen@jimmy.harvard.edu
Cumulatively, 70% of the human genome is transcribed, whereas <2% of the genome encodes protein. As a part of this prevalent non-coding transcription, long non-coding RNAs (lncRNAs) are RNAs that are longer than 200 base pairs (bps) but with little protein coding capacity. The human genome encodes over 10,000 lncRNAs, and the functions of the vast majority of them are unknown. Through integrative analysis of lncRNA expression profiles with clinical outcome and somatic copy number alteration, we identified lncRNAs that are associated with cancer subtypes and clinical prognosis, and predicted those that are potential drivers of cancer progression in multiple cancers, including glioblastoma multiforme (GBM), ovarian cancer (OvCa), lung squamous cell carcinoma (lung SCC) and prostate cancer. We validated our predictions of two tumorigenic lncRNAs by experimentally confirming the prostate cancer cell growth dependence on these two lncRNAs. Our integrative analysis provides a resource of clinically relevant lncRNAs for development of lncRNA biomarkers and identification of lncRNA therapeutic targets for human cancer.

Data Integration for Genetics-Based Drug Repurposing in Complex Diseases
Di Wu
Harvard University
dwu@fas.harvard.edu

Large numbers of genetic variants have been identified in cancer genome studies and GWAS studies. These variants may well capture the characteristics of the diseases. To best leverage this knowledge for developing new therapeutics to treat diseases, our study explores the possibility of using the genetics of diseases to guide drug repurposing. Drug repurposing asks whether the available drugs for certain diseases can be re-used for the treatment of other diseases. We particularly use the gene target information of drugs and protein-protein interaction information to connect risk genes, based on GWAS hits, with the available drugs. Drug indications were used to evaluate the sensitivity and specificity of the novel pipeline. Evaluation of the pipeline suggests promising directions for certain diseases.

Comparative Meta-Analysis of Prognostic Gene Signatures for Late-Stage Ovarian Cancer
Levi Waldron
Hunter College
leviwaldron@hunter.cuny.edu

Authors: Levi Waldron, Benjamin Haibe-Kains, Aedín C. Culhane, Markus Riester, Jie Ding, Xin Victoria Wang, Mahnaz Ahmadifar, Svitlana Tyekucheva, Christoph Bernau, Thomas Risch, Benjamin Ganzfried, Curtis Huttenhower, Michael Birrer and Giovanni Parmigiani
Abstract: Numerous published studies have reported prognostic models of cancer patient survival from tumor genomics. These studies employ a wide variety of model training and validation methodologies, making it difficult to compare and rank their modeling strategies or the accuracy of the models. However, they have also generated numerous publicly available microarray datasets with clinically annotated individual patient data. Through systematic review, we identified and implemented fully specified versions of 14 prognostic models of advanced stage ovarian cancer published over a 5-year period. These 14 published models were developed by different authors using disparate training datasets and statistical methods, but all claimed to be capable of predicting overall survival using microarray data. We evaluated these models for prognostic accuracy (defined by the Concordance Index for overall survival), adapting traditional methods of meta-analysis to synthesize results in ten independent validation datasets. This systematic evaluation showed that 1) models generated by penalized or ensemble Cox Proportional Hazards-based regression methods outperformed models generated by more complicated methods, and strongly outperformed hypothesis-based models; 2) validation dataset bias existed, meaning that some datasets indicated better validation performance for all models than others, and that comparative evaluation is needed to identify this source of bias; 3) datasets selected by authors for independent validation tended to over-estimate model accuracy compared to previously unused validation datasets; and 4) seemingly unrelated models generated highly correlated predictions, further emphasizing the need for comparative evaluation of accuracy. This talk will provide an overview of methods for prediction modeling in cancer genomics and highlight lessons from the first systematic comparative meta-analysis of published cancer genomics prognostic models.

Studying Spatial Organizations of Chromosomes via Parametric Model
Ming Hu1, Yu Zhu2, Zhaohui Steve Qin3, Ke Deng4 and Jun S. Liu5

1New York University
2Purdue University
3Emory University
4Tsinghua University
5Harvard University
minghu@nyumc.org
The recently developed Hi-C technology enables a genome-wide view of the spatial organization of chromosomes and has shed deep insights into genome structure and genome function. Although the technology is extremely promising, multiple sources of biases and uncertainties pose great challenges for data analysis. Statistical approaches for inferring three-dimensional (3D) chromosomal structure from Hi-C data are far from their maturity. Most existing models are highly over-parameterized, lacking clear interpretations, and sensitive to outliers. In this study, we propose parsimonious, easy to interpret and robust helix models for reconstructing 3D chromosomal structure from Hi-C data. We also develop a negative binomial regression approach to account for over-dispersion in Hi-C data. When applied to a real Hi-C dataset, helix models achieve much better model adequacy scores than existing models. More importantly, these helix models reveal that geometric properties of chromatin spatial organization, as well as chromatin dynamics, are closely related to genome functions.

Session 27 Recent Advances in Time Series Analysis

Time Series Models for Spherical Data with Applications in Structural Biochemistry
Jay Breidt, Daniel Hernandez-Stumpfhauser and Mark van der Woerd
Colorado State University
jbreidt@gmail.com
Proteins consist of sequences of the 21 natural amino acids. There can be tens to hundreds of amino acids in the protein, and hundreds to hundreds of thousands of atoms. A complete model for the protein consists of coordinates for every atom. A useful class of simplified models is obtained by focusing only on the alpha-carbon sequence, consisting of the primary carbon atom in the backbone of each amino acid. The three-dimensional structure of the alpha-carbon backbone of the protein can be described as a sequence of angle pairs, each consisting of a bond angle and a dihedral angle. These angle pairs lie naturally on a sphere. We consider autoregressive time series models for such spherical data sequences, using extensions of projected normal distributions. We describe application to protein data and further developments, including autoregressive models that switch parameterizations according to local structure in the protein (such as helices, beta-sheets and coils).

2014 ICSA-KISS Applied Statistics Symposium, Portland, June 15-18

Abstracts

Semiparametric Estimation of Spectral Density Function with Irregular Data
Shu Yang and Zhengyuan Zhu

Iowa State University
zhu1997@gmail.com

We propose a semiparametric method to estimate spectral densities of isotropic Gaussian processes with irregular observations. The spectral density function at low frequencies is estimated using a smoothing spline, while we use a parametric model for the spectral density at high frequencies and estimate its parameters by a method of moments based on the empirical variogram at small lags. We derive asymptotic bounds for the bias and variance of the proposed estimator. Simulation results show that our method outperforms the existing nonparametric estimator by several performance criteria.

On the Prediction of Stationary Functional Time Series
Alexander Aue1, Diogo Dubart Norinho2 and Siegfried Hörmann3

1University of California at Davis
2University College London
3Université Libre de Bruxelles
aaue@ucdavis.edu

This talk addresses the prediction of stationary functional time series. Existing contributions to this problem have largely focused on the special case of first-order functional autoregressive processes, because of their technical tractability and the current lack of advanced functional time series methodology. It is shown how standard multivariate prediction techniques can be utilized in this context. The connection between functional and multivariate predictions is made precise for the important case of vector and functional autoregressions. The proposed method is easy to implement, making use of existing statistical software packages, and may therefore be attractive to a broader, possibly non-academic audience. Its practical applicability is enhanced through the introduction of a novel functional final prediction error model selection criterion that allows for an automatic determination of the lag structure and the dimensionality of the model. The usefulness of the proposed methodology is demonstrated in simulations and an application to the prediction of daily pollution curves. It is found that the proposed prediction method often significantly outperforms existing methods.
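The recipe the abstract describes (project the curves onto a few principal components, forecast the score vectors with a standard multivariate model, then reconstruct the curve) can be sketched in a few lines. This is a minimal illustration with assumed AR(1) score dynamics and a fixed number of components, not the authors' implementation or their functional FPE criterion:

```python
import numpy as np

def predict_next_curve(X, d=3):
    """Forecast the next curve of a functional time series.

    X: (n_curves, n_gridpoints) array, one observed curve per row.
    d: number of empirical principal components to retain.
    """
    mu = X.mean(axis=0)
    Xc = X - mu
    # Empirical FPCA via SVD of the centered data matrix
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:d].T                      # (n_curves, d) score series
    # Fit a vector AR(1) to the scores by least squares: s_t ~ A s_{t-1}
    A, *_ = np.linalg.lstsq(scores[:-1], scores[1:], rcond=None)
    next_scores = scores[-1] @ A                # one-step-ahead score forecast
    return mu + next_scores @ Vt[:d]            # reconstruct the curve

# Toy example: slowly rotating noisy sine curves
rng = np.random.default_rng(0)
grid = np.linspace(0, 1, 50)
curves = np.array([np.sin(2 * np.pi * (grid + 0.05 * t))
                   + 0.01 * rng.standard_normal(50) for t in range(40)])
pred = predict_next_curve(curves)
print(pred.shape)
```

The functional FPE criterion in the talk would replace the fixed choice `d=3` with an automatic, data-driven selection of both the dimension and the lag order.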

A Composite Likelihood-based Approach for Multiple Change-point Estimation in Multivariate Time Series Models
Chun Yip Yau and Ting Fung Ma

Chinese University of Hong Kong
cyyau@sta.cuhk.edu.hk

We propose a likelihood-based approach for multiple change-point estimation in general multivariate time series models. Specifically, we consider a criterion function based on the pairwise likelihood to estimate the number and locations of change-points and to perform model selection for each segment. By virtue of the pairwise likelihood, the number and locations of change-points can be consistently estimated under very mild assumptions. Computation is conducted efficiently by a pruned dynamic programming algorithm. Simulation studies and real data examples are presented to demonstrate the statistical and computational efficiency of the proposed method.
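Penalized optimal partitioning by dynamic programming, of which the pruned algorithm mentioned above is a faster variant, can be sketched as follows. The Gaussian mean-shift segment cost below is an illustrative stand-in for the paper's pairwise-likelihood criterion:

```python
import numpy as np

def change_points(x, beta):
    """Optimal-partitioning DP: minimize sum of segment costs + beta per change-point."""
    n = len(x)
    csum = np.cumsum(np.r_[0.0, x])
    csum2 = np.cumsum(np.r_[0.0, x ** 2])

    def cost(i, j):  # within-segment sum of squares for x[i:j] (mean-shift model)
        s, m = csum[j] - csum[i], j - i
        return (csum2[j] - csum2[i]) - s * s / m

    F = np.full(n + 1, np.inf)      # F[j] = best objective for x[:j]
    F[0] = -beta
    last = np.zeros(n + 1, dtype=int)
    for j in range(1, n + 1):
        cands = [F[i] + cost(i, j) + beta for i in range(j)]
        last[j] = int(np.argmin(cands))
        F[j] = cands[last[j]]
    cps, j = [], n                  # backtrack the change-point locations
    while j > 0:
        if last[j] > 0:
            cps.append(last[j])
        j = last[j]
    return sorted(cps)

x = np.r_[np.zeros(50), 5 + np.zeros(50)] + 0.1 * np.random.default_rng(1).standard_normal(100)
print(change_points(x, beta=3.0))  # → [50]
```

The pruning step in the paper discards candidate split points `i` that can never become optimal, reducing the quadratic scan over `i` in the loop above.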

Session 28 Analysis of Correlated Longitudinal and Survival Data

Analysis of a Non-Randomized Longitudinal Quality of Life Trial
Mounir Mesbah
University of Paris 6
mounir.mesbah@upmc.fr
In this talk I will consider the context of a longitudinal study where participants are interviewed about their health, quality of life, or another latent trait at regular, previously established visit dates. The interviews usually consist of filling out a questionnaire of multiple-choice questions with various ordinal response scales, built in order to measure, at the time of the visit, the latent trait, which is assumed in a first step to be unidimensional. On entering the study, each participant receives a treatment appropriate to his health profile. The choice of treatment is not randomized: it is decided by a doctor based on the health profile of the patient and a thorough clinical examination. We assume that the different treatments a doctor can choose are ordered (a dose effect). In addition, we assume that the treatment prescribed at entry does not change throughout the study. In this work, I will investigate and compare strategies and models to analyze the time evolution of the latent variable in a longitudinal study when the main goal is to compare non-randomized ordinal treatments. I will illustrate my results with a real, complex longitudinal quality of life study.
References: [1] Bousseboua, M. and Mesbah, M. (2013). Longitudinal Rasch Process with Memory Dependence. Pub. Inst. Stat. Univ. Paris, Vol. 57, Fasc. 1-2, 45-58. [2] Christensen, K.B., Kreiner, S. and Mesbah, M. (2013). Rasch Models in Health. J. Wiley. [3] Mesbah, M. (2012). Measurement and Analysis of Quality of Life in Epidemiology. In "Bioinformatics in Human Health and Heredity (Handbook of Statistics, Vol. 28)", Eds. Rao, C.R., Chakraborty, R. and Sen, P.K., North Holland, Chapter 15. [4] Rosenbaum, P.R. and Rubin, D.B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1), 41-55. [5] Imai, K. and Van Dyk, D.A. (2004). Causal Inference With General Treatment Regimes: Generalizing the Propensity Score. JASA, Vol. 99, No. 467, Theory and Methods.

Power and Sample Size Calculations for Evaluating Mediation Effects with Multiple Mediators in Longitudinal Studies
Cuiling Wang
Albert Einstein College of Medicine
cuiling.wang@einstein.yu.edu
There is currently very limited statistical research on power analysis for evaluating the mediation effects of multiple mediators in longitudinal studies. In addition to the complication of missing data common to longitudinal studies, the case of multiple mediators further complicates the testing of mediation hypotheses. Based on previous work of Wang and Xue (2012), we evaluate several hypothesis tests regarding the mediation effects from multiple mediators and provide formulae for power and sample size calculations. The performance of these methods under limited sample size is examined using simulation studies. An example from the Einstein Aging Study (EAS) is used to illustrate the methods.

Distribution-free First-hitting-time Based Threshold Regressions for Lifetime Data
Mei-Ling Ting Lee1 and G. Alex Whitmore2
1University of Maryland



2McGill University
mltlee@umd.edu
Cox regression methods are well known, but they rely on a strong proportional hazards assumption. In many medical contexts, a disease progresses until a failure event (such as death) is triggered when the health level first reaches a failure threshold. I'll present the Threshold Regression (TR) model for the health process, which requires few assumptions and hence is quite general in its potential application. Both parametric and distribution-free methods for estimation and prediction using TR models are derived. Case examples are presented that demonstrate the methodology and its practical use. The methodology provides medical researchers and biostatisticians with new and robust statistical tools for estimating treatment effects and assessing a survivor's remaining life.
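As an illustration of the first-hitting-time idea itself (not the TR estimation methodology), one can simulate a latent Wiener health process and record when it first crosses the failure threshold; for a process starting at x0 with negative drift, the hitting time is inverse Gaussian with mean x0/|drift|. All parameter values below are invented for the sketch:

```python
import numpy as np

def simulate_hitting_times(x0, drift, sigma, dt=0.01, t_max=100.0, n=2000, seed=0):
    """First-hitting times of threshold 0 for the process x0 + drift*t + sigma*W(t)."""
    rng = np.random.default_rng(seed)
    steps = int(t_max / dt)
    x = np.full(n, float(x0))
    times = np.full(n, np.inf)          # inf = no event by t_max (censored)
    alive = np.ones(n, dtype=bool)
    for k in range(1, steps + 1):
        x[alive] += drift * dt + sigma * np.sqrt(dt) * rng.standard_normal(alive.sum())
        hit = alive & (x <= 0.0)
        times[hit] = k * dt
        alive &= ~hit
        if not alive.any():
            break
    return times

# Health starts at 10 and declines at rate 1: mean hitting time x0/|drift| = 10
t = simulate_hitting_times(x0=10.0, drift=-1.0, sigma=1.0)
print(t[np.isfinite(t)].mean())
```

The Euler discretization slightly over-estimates hitting times (within-step crossings are missed), which is why the simulated mean sits a touch above the theoretical value.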

Joint Modeling of Survival Data and Mismeasured Longitudinal Data using the Proportional Odds Model
Juan Xiong1, Wenqing He1 and Grace Yi2
1University of Western Ontario
2University of Waterloo
whe@stats.uwo.ca
Joint modeling of longitudinal and survival data has been studied extensively, with the Cox proportional hazards model frequently used to incorporate the relationship between survival time and covariates. Although the proportional odds model is an attractive alternative to the Cox proportional hazards model, featuring the dependence of survival times on covariates via cumulative covariate effects, this model is rarely discussed in the joint modeling context. To fill this gap, we investigate joint modeling of survival data and longitudinal data that are subject to measurement error. We describe a model parameter estimation method based on the expectation-maximization (EM) algorithm. In addition, we assess the impact of naive analyses that fail to address the error in the longitudinal measurements. The performance of the proposed method is evaluated through simulation studies and a real data analysis.

Session 29 Clinical Pharmacology

Truly Personalizing Medicine
Mike D. Hale
Amgen Inc.
mdhale@amgen.com
Predictive analytics are increasingly used to optimize marketing for many non-medical products. Companies observe and analyze the behavior and/or characteristics of an individual, predict the needs of that individual, and then address those needs. We frequently encounter this when web-browsing and when participating in retail store loyalty programs: advertising and coupons are targeted to the specific individual based on predictive models employed by advertisers and retailers. This makes the traditional drug development program appear antiquated, where a drug may be intended for all patients with a given indication. This talk contrasts those methods and practices for addressing individual needs with the way medicines are typically prescribed, and considers a way to integrate big data, the product label, and predictive analytics to improve and enable personalized medicine. Some important questions are posed (but unresolved), such as who could do this, and what are the implications if we were to predict outcomes for individual patients.

What Do Statisticians Do in Clinical Pharmacology?
Brian Smith

Amgen Inc.
brismith@amgen.com

Clinical pharmacology is the science of drugs and their clinical use. It could be argued that all drug development is clinical pharmacology; however, pharmaceutical companies typically separate development into a pattern similar to the following: A) clinical (late) development (Phase 2b-Phase 3), B) post-marketing (Phase 4), and C) clinical pharmacology (Phase 1-Phase 2a). As will be seen in this presentation, clinical pharmacology research presents numerous interesting statistical opportunities.

The Use of Modeling and Simulation to Bridge Different Dosing Regimens - a Case Study
Chyi-Hung Hsu and Jose Pinheiro
Janssen Research & Development
chsu3@its.jnj.com

In recent years, the pharmaceutical industry has increasingly faced the challenge of efficiently evaluating and using all available information to improve its success rate in drug development under limited-resource constraints. Modeling and simulation has established itself as the quantitative tool of choice to meet this existential challenge. Models provide a basis for quantitatively describing and summarizing the available information and our understanding of it. Using models to simulate data allows the evaluation of scenarios within, and even outside, the boundaries of the original data. In this presentation we will discuss and illustrate the use of modeling and simulation techniques to bridge different dosing regimens based on studies using just one of the regimens. Special attention will be given to quantifying inferential uncertainty and to model validation.

A Comparison of FDA and EMA Recommended Models for Bioequivalence Studies
Yongwu Shao, Lingling Han, Bing Gao, Sally Zhao, Susan Guo, Lijie Zhong and Liang Fang
Gilead Sciences
yongwushao@gilead.com

For a bioequivalence crossover study, the FDA guidance recommends a mixed effects model for the formulation comparisons of pharmacokinetic parameters, including all subject data, while the EMA guidance recommends an ANOVA model with fixed effects of sequence, subject within sequence, period, and formulation, excluding subjects with missing data from the pair-wise comparison. The two methods are mathematically equivalent when there are no missing values in the targeted comparison. With missing values, the mixed effects model, which includes subjects with missing values, provides higher statistical power than the fixed effects model, which excludes these subjects. However, parameter estimation in the mixed effects model is based on large-sample asymptotic approximations, which may introduce bias in the estimates of standard deviations when the sample size is small (Jones and Kenward, 2003). In this talk we provide a closed-form formula to quantify the potential gain in power from using mixed effects models when missing data are present. A simulation study was conducted to confirm the theoretical results. We also performed a simulation study to investigate the bias introduced by the mixed effects model for small sample sizes. Our results show that when the sample size is 12 or above, as required by both FDA and EMA, the bias introduced by the mixed effects model is negligible. From a statistical point of view, we recommend the mixed effects model approach for bioequivalence studies for its potential gain in power when missing data are present and missing completely at random.
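The two guidance-recommended analyses can be contrasted on simulated 2x2 crossover data using statsmodels. The effect sizes, variable names, and the single artificially missing observation below are assumptions made for the sketch, not values from the talk:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulate a 2x2 crossover (sequences TR and RT) with invented effect sizes
rng = np.random.default_rng(0)
n = 24
subj_eff = rng.normal(0, 0.3, n)
rows = []
for i in range(n):
    seq = "TR" if i < n // 2 else "RT"
    for period, form in enumerate(seq, start=1):
        y = (1.0 + 0.1 * (form == "T") + 0.05 * (period == 2)
             + subj_eff[i] + rng.normal(0, 0.1))
        rows.append(dict(subject=i, sequence=seq, period=period,
                         formulation=form, logauc=y))
df = pd.DataFrame(rows).drop(index=[1])   # subject 0 misses its period-2 value

# FDA-style mixed effects model: random subject effect, keeps incomplete subjects
mixed = smf.mixedlm("logauc ~ sequence + C(period) + formulation",
                    df, groups="subject").fit()

# EMA-style fixed effects ANOVA on complete cases; the subject fixed
# effects absorb the sequence effect
complete = df.groupby("subject").filter(lambda g: len(g) == 2)
fixed = smf.ols("logauc ~ C(period) + formulation + C(subject)",
                complete).fit()

print(mixed.params["formulation[T.T]"], fixed.params["formulation[T.T]"])
```

With only one incomplete subject the two formulation estimates are nearly identical; the power gap the abstract quantifies grows with the amount of missing data.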



Session 30 Sample Size Estimation

Sample Size Calculation with Semiparametric Analysis of Long Term and Short Term Hazards
Yi Wang
Novartis Pharmaceuticals Corporation
yi-11.wang@novartis.com

We derive sample size formulae for survival data with non-proportional hazard functions under both fixed and contiguous alternatives. Sample size determination has been widely discussed in the literature for studies with failure-time endpoints. Many researchers have developed methods under the assumption of proportional hazards and contiguous alternatives. Without covariate adjustment, the logrank test statistic is often used for the sample size and power calculation; with covariate adjustment, the approaches are often based on the score test statistic for the Cox proportional hazards model. Such methods, however, are inappropriate when the proportional hazards assumption is violated. We develop methods to calculate the sample size based on a semiparametric analysis of short-term and long-term hazard ratios. The methods are built on the semiparametric model of Yang and Prentice (2005), which accommodates a wide range of hazard ratio patterns and includes the Cox proportional hazards model and the proportional odds model as special cases. The proposed methods can therefore be used for survival data with proportional or non-proportional hazard functions. In particular, the sample size formulae of Schoenfeld (1983) and Hsieh and Lavori (2000) can be obtained as special cases of our methods under contiguous alternatives.
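The Schoenfeld (1983) formula recovered as a special case gives the required number of events directly under proportional hazards; a minimal sketch with 1:1 allocation and a two-sided test:

```python
import math
from scipy.stats import norm

def schoenfeld_events(hr, alpha=0.05, power=0.80, p=0.5):
    """Required number of events under proportional hazards (Schoenfeld, 1983).

    hr: hazard ratio under the alternative; p: proportion allocated to one arm.
    """
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return z ** 2 / (p * (1 - p) * math.log(hr) ** 2)

print(round(schoenfeld_events(hr=0.7)))  # 247 events for 80% power at HR 0.7
```

The semiparametric formulae in the talk generalize this quantity to alternatives where the short-term and long-term hazard ratios differ.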

Sample Size and Decision Criteria for Phase IIB Studies with Active Control
Xia Xu
Merck & Co.
xia_xu@merck.com

In drug development programs, Phase IIB studies provide information for the Go/No Go decision on conducting large confirmatory Phase III studies. More and more Phase IIB studies now use an active control as the comparator, especially in the development of new therapies for the treatment of HIV infection, where a placebo control is unethical given the severity of the disease and the availability of approved drugs. If a Phase IIB study demonstrates "comparable" efficacy and safety relative to the active control, the program may proceed to Phase III, which usually uses the same or a similar active control to formally assess non-inferiority of the new therapy. Sample size determination and quantification of decision criteria for such Phase IIB studies are explored using a Bayesian analysis.

Sample Size Determination for Clinical Trials to Correlate Outcomes with Potential Predictors
Su Chen, Xin Wang and Ying Zhang
AbbVie Inc.
suchen@abbvie.com

Sample size determination can be a challenging task for a post-marketing clinical study aiming to establish the predictivity of a single influential measurement, or a set of variables, for a clinical outcome of interest. Since the relationship between the potential predictors and the outcome is unknown at the design stage, one may not be able to perform a conventional sample size calculation and must look for other means to size the trial. Our proposed approach is based on the length of the confidence interval for the true correlation coefficient between predictor and outcome variables. In this study we compare three methods of constructing confidence intervals for the correlation coefficient, based on the approximate sampling distribution of the Pearson correlation, the Z-transformed Pearson correlation, and bootstrapping, respectively. We evaluate the performance of the three methods under different scenarios with small to moderate sample sizes and different correlations. Coverage probabilities of the confidence intervals are compared across the three methods. The results are used for sample size determination based on the width of the confidence intervals. Hypothetical examples are provided to illustrate the idea and its implementation.
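A minimal sketch of the width-based idea using the Z-transformed (Fisher) Pearson correlation, one of the three interval constructions compared; the grid search over n is an illustrative implementation choice:

```python
import numpy as np
from scipy.stats import norm

def fisher_z_ci(r, n, alpha=0.05):
    """Confidence interval for a correlation via the Fisher z-transform."""
    z = np.arctanh(r)
    half = norm.ppf(1 - alpha / 2) / np.sqrt(n - 3)
    return np.tanh(z - half), np.tanh(z + half)

def n_for_ci_width(r, width, alpha=0.05, n_max=100000):
    """Smallest n whose CI around an anticipated r is at most `width` wide."""
    for n in range(5, n_max):
        lo, hi = fisher_z_ci(r, n, alpha)
        if hi - lo <= width:
            return n
    raise ValueError("no n below n_max achieves the requested width")

print(fisher_z_ci(0.5, 100))     # about (0.34, 0.63)
print(n_for_ci_width(0.5, 0.2))
```

The interval is asymmetric around r because the z-transform is nonlinear; the bootstrap variant in the abstract would replace the closed-form interval with resampled quantiles.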

Sample Size Re-Estimation at Interim Analysis in Oncology Trials with a Time-to-Event Endpoint

Ian (Yi) Zhang

Sunovion Pharmaceuticals Inc.
ianzhang@sunovion.com

Oncology is a highly active therapeutic area due to largely unmet medical needs. In confirmatory oncology trials, the superiority of a study drug over a control is commonly assessed with respect to a time-to-event endpoint such as overall survival (OS) or progression-free survival (PFS). Adaptive designs allowing for sample size re-estimation (SSR) at an interim analysis are often employed to accelerate oncology drug development while reducing costs. Although SSR is categorized as "less well understood" (in contrast to "well understood" designs such as the group sequential design) in the 2010 draft FDA guidance on adaptive designs, it has gradually gained regulatory acceptance and is widely adopted in industry. In this presentation, a Phase II/III seamless design is developed that re-estimates the sample size based on unblinded interim results, using the conditional power of observing a significant result by the end of the trial. The methodology achieves the desired conditional power while still controlling the type I error rate. Extensive simulation studies are performed to evaluate the operating characteristics of the design. A real-world example is used for illustration, and pros and cons of the design are discussed.
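Conditional power under the Brownian-motion approximation, the quantity that drives the re-estimation, can be computed in a few lines. This is a generic textbook version for a one-sided fixed-design test, not the specific Phase II/III seamless design of the talk:

```python
import numpy as np
from scipy.stats import norm

def conditional_power(z_interim, info_frac, alpha=0.025, trend="current"):
    """Conditional power given an interim z-value at information fraction t.

    Uses the representation B(t) = Z(t) * sqrt(t); under the 'current trend'
    assumption the drift is estimated as z_interim / sqrt(t).
    """
    t = info_frac
    b = z_interim * np.sqrt(t)
    theta = z_interim / np.sqrt(t) if trend == "current" else 0.0
    zcrit = norm.ppf(1 - alpha)
    return norm.cdf((b + theta * (1 - t) - zcrit) / np.sqrt(1 - t))

# Promising interim: z = 1.5 with half of the planned events observed
print(conditional_power(1.5, 0.5))   # about 0.59
```

An SSR rule then increases the event target (equivalently, the information) until this quantity reaches the desired level, with the type I error controlled by an appropriate combination test.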

Statistical Inference and Sample Size Calculation for Paired Binary Outcomes with Missing Data

Song Zhang

University of Texas Southwestern Medical Center
songzhang@utsouthwestern.edu

We investigate the estimation of intervention effects and sample size determination for experiments where subjects are supposed to contribute paired binary outcomes but some observations are incomplete. We propose a hybrid estimator that appropriately accounts for the mixed nature of the observed data: paired outcomes from those who contribute complete pairs of observations, and unpaired outcomes from those who contribute either pre- or post-intervention outcomes only. We prove theoretically that if incomplete data are evenly distributed between the pre- and post-intervention periods, the proposed estimator is always more efficient than the traditional estimator. A numerical study shows that when the distribution of incomplete data is unbalanced, the proposed estimator is superior when there is moderate-to-strong positive within-subject correlation. We further derive a closed-form sample size formula to help researchers determine how many subjects need to be enrolled in such studies. Simulation results suggest that the calculated sample size maintains the empirical power and type I error under various design configurations. We demonstrate the proposed method using a real application example.
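The flavor of such a hybrid estimator can be illustrated with an inverse-variance-weighted combination of the complete-pair difference and the difference built from unpaired observations. This simplified sketch is a stand-in for the proposed estimator, and every simulation setting below is an assumption:

```python
import numpy as np

def hybrid_effect(pre_paired, post_paired, pre_only, post_only):
    """Illustrative hybrid estimate of the post-minus-pre event probability."""
    pre_p = np.asarray(pre_paired, float)
    post_p = np.asarray(post_paired, float)
    n = len(pre_p)
    d_pair = post_p.mean() - pre_p.mean()
    # The paired variance exploits the within-subject covariance
    v_pair = (pre_p.var(ddof=1) + post_p.var(ddof=1)
              - 2 * np.cov(pre_p, post_p)[0, 1]) / n
    d_unp = np.mean(post_only) - np.mean(pre_only)
    v_unp = (np.var(pre_only, ddof=1) / len(pre_only)
             + np.var(post_only, ddof=1) / len(post_only))
    w = (1 / v_pair) / (1 / v_pair + 1 / v_unp)
    return w * d_pair + (1 - w) * d_unp

rng = np.random.default_rng(2)
frail = rng.normal(0, 1, 300)                              # shared subject effect
pre = (rng.normal(0, 1, 300) + frail < 0.0).astype(int)
post = (rng.normal(0, 1, 300) + frail < 0.6).astype(int)   # true effect ~ 0.16
est = hybrid_effect(pre[:200], post[:200], pre[200:250], post[250:300])
print(est)
```

The positive within-subject correlation makes the paired component much more precise than the unpaired one, so the weight w leans heavily toward the complete pairs, matching the efficiency ordering the abstract proves.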



Session 31 Predictions in Clinical Trials

Predicting Smoking Cessation Outcomes Beyond Clinical Trials
Yimei Li, E. Paul Wileyto and Daniel F. Heitjan
University of Pennsylvania
yimeili@mail.med.upenn.edu
In smoking cessation trials, subjects usually receive treatment for several weeks, with additional information collected 6 or 12 months after that. An important question concerns predicting long-term cessation success from short-term clinical observations, but several features need to be considered. First, subjects commonly transition several times between lapse and recovery, exhibiting both temporary and permanent quits and both brief and long-term lapses. Second, although we have some reliable predictors of outcome, there is also substantial heterogeneity in the data. We therefore introduce a cure-mixture frailty model that describes the complex process of transitions between abstinence and smoking. Based on this model, we then propose a Bayesian approach to predict individual future outcomes. We compare predictions from our model to a variety of ad hoc methods.

Bayesian Event And Time Landmark Estimation In Clinical Trials When Responses Are Failure Time Data
Haoda Fu, Luping Zhao and Yanping Wang
Eli Lilly and Company
fuhaoda@gmail.com
In oncology trials, it is challenging to predict when a certain number of events will have occurred, or how many additional events will be observed over a given period of time. We have developed a tool called BEATLES, which stands for Bayesian Event And Time Landmark Estimation Software. The method and tools have been broadly implemented at Lilly. In this talk we will present the technical details.

Predicting the Probability of Future Clinical Study Success Based on Evidence from Electronic Medical Record (EMR) Data
Haoda Fu1 and Nan Jia2

1Eli Lilly and Company
2University of Southern California
jia_nan2@lilly.com
To compare a treatment with a control via a randomized clinical trial, the assessment of treatment efficacy is often based on an overall treatment effect over a specific study population. To increase the probability of study success (PrSS), it is important to choose an appropriate and relevant study population in which the treatment is expected to show overall benefit over the control. This research predicts the PrSS from EMR data for a given patient population. We can therefore use this approach to refine the study inclusion and exclusion criteria to increase the PrSS. For learning from EMR data, we also develop covariate balancing methods. Although our methods are developed for learning from EMR data, learning from randomized controlled trials is a special case of our methods.

Weibull Cure-Mixture Model for the Prediction of Event Times in Randomized Clinical Trials
Gui-shuang Ying1, Qiang Zhang2, Yimei Li1 and Daniel F. Heitjan1

1University of Pennsylvania
2Radiation Therapy Oncology Group Statistical Center
gsying@mail.med.upenn.edu

Many clinical trials with time-to-event outcomes are designed to perform interim and final analyses upon the occurrence of a pre-specified number of events. As an aid to trial logistical planning, it is desirable to predict the time to reach such landmark event numbers. Our previously developed parametric (exponential and Weibull) prediction models assume that every trial participant is susceptible to the event of interest and will eventually experience the event if follow-up is long enough. This assumption may not hold, as some trial participants may be cured of the fatal disease, and failure to accommodate the possibility of cure may lead to biased prediction. In this talk, a Weibull cure-mixture prediction model will be presented that assumes the trial participants are a mixture of susceptible (uncured) and non-susceptible (cured) participants. The cure probability is modeled using logistic regression, and the time to event among susceptible participants is modeled by a two-parameter Weibull distribution. A comparison of predictions from the Weibull cure-mixture model with those from the standard Weibull model will be demonstrated using data from a randomized trial of oropharyngeal cancer.
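The prediction target can be illustrated by simulating calendar event times under a Weibull cure mixture; the accrual, cure, and Weibull parameters below are invented for the sketch:

```python
import numpy as np

def simulate_cure_mixture(n, cure_prob, shape, scale, accrual_years=2.0, seed=0):
    """Calendar event times in a trial under a Weibull cure mixture.

    Each participant is cured with probability cure_prob (never has the event);
    otherwise the event time from entry is Weibull(shape, scale). Entry times
    are uniform over the accrual period.
    """
    rng = np.random.default_rng(seed)
    entry = rng.uniform(0, accrual_years, n)
    cured = rng.random(n) < cure_prob
    event = scale * rng.weibull(shape, n)          # time from entry to event
    calendar = np.where(cured, np.inf, entry + event)
    return np.sort(calendar)

def time_to_kth_event(times, k):
    """Calendar time at which the k-th event occurs (inf if it never does)."""
    return times[k - 1] if k <= np.isfinite(times).sum() else np.inf

times = simulate_cure_mixture(n=500, cure_prob=0.3, shape=1.5, scale=2.0)
print(time_to_kth_event(times, 200))
```

A landmark that exceeds the number of susceptible participants is never reached, which is exactly the bias risk when a standard (no-cure) Weibull model is used for prediction.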

Session 32 Recent Advances in Statistical Genetics

Longitudinal Exome-Focused GWAS of Alcohol Use in a Veteran Cohort
Zuoheng Wang, Zhong Wang, Amy C. Justice and Ke Xu
Yale University
zuoheng.wang@yale.edu
Alcohol dependence (AD) is a major public health concern in the United States and contributes to the pathogenesis of many diseases. The risk of AD is multifactorial and includes shared genetic and environmental factors. However, gene mapping in AD has not yet been successful: the confirmed associations account for a small proportion of the overall genetic risk. Multiple measurements in longitudinal genetic studies provide a route to reduce noise and correspondingly increase the strength of signals in genome-wide association studies (GWAS). In this study, we developed a powerful statistical method for testing the joint effect of genetic variants within a gene region on diseases measured over multiple time points. We applied the new method to a longitudinal study of a veteran cohort with both HIV-infected and HIV-uninfected patients to understand the genetic risk underlying AD. We found an interesting gene that has been reported in HIV studies, suggestive of a potential gene-by-environment effect in alcohol use and HIV. We also conducted simulation studies to assess the performance of the new statistical methods and demonstrated a power gain from taking advantage of repeated measurements and aggregating information across a biological region. This study not only contributes to the statistical toolbox of current GWAS but also potentially advances our understanding of the etiology of AD.

Type I Error in Regression-based Genetic Model Building
Heejong Sung1, Alexa J.M. Sorant1, Bhoom Suktitipat2 and Alexander F. Wilson1

1National Institutes of Health
2Mahidol University
sunghe@mail.nih.gov
The task of identifying genetic variants contributing to trait variation is increasingly challenging given the large number and density of variant data being produced. Current methods of analyzing these data include regression-based variable selection methods, which produce linear models incorporating the chosen variants. For example, the Tiled Regression method begins by examining relatively small segments of the genome called tiles; selection of significant predictors, if any, is done first within individual tiles. However, type I error rates for such methods haven't been fully investigated, particularly



considering correlation among variants. To investigate type I error in this situation, we simulated a mini-GWAS genome including 306,097 SNPs in 4,000 unrelated samples with 2,000 non-genetic traits. Initially, 53,060 tiles were defined by dividing the genome according to recombination hotspots; larger tiles were then defined by combining groups of ten consecutive tiles. Stepwise regression and LASSO variable selection methods were performed within tiles for each tile definition. Type I error rates were calculated as the number of selected variants divided by the number considered, averaged over the 2,000 phenotypes. Overall error rates for stepwise regression, using a fixed selection criterion of 0.05, and for LASSO, minimizing mean square error, were 0.04 and 0.12, respectively, when using the initial (smaller) tiles. Considering separately each combination of tile size (number of SNPs) and multicollinearity (defined as 1 minus the determinant of the genotype correlation matrix), observed type I error rates for stepwise regression tended to increase with the number of variants and decrease with increasing multicollinearity; with LASSO, the trends were in the opposite direction. When the larger tiles were used, overall rates for LASSO were noticeably smaller, while overall rates were rather robust for stepwise regression.

GMDR: A Conceptual Framework for Detection of Multifactor Interactions Underlying Complex Traits
Xiang-Yang Lou
University of Alabama at Birmingham
xylou@uab.edu

Biological outcomes are governed by multiple genetic and environmental factors that act in concert. Determining multifactor interactions is a primary topic of interest in recent genetics studies, but it presents enormous statistical and mathematical challenges. The computationally efficient multifactor dimensionality reduction (MDR) approach has emerged as a promising tool for meeting these challenges. On the other hand, complex traits are expressed in various forms and have different data-generation mechanisms that cannot be appropriately modeled by a dichotomous model, and the subjects in a study may be recruited according to its own analytical goals, research strategies, and available resources, not only as homogeneous unrelated individuals. Although several modifications and extensions of MDR have in part addressed the practical problems, they remain limited for statistical analyses of diverse and multivariate phenotypes and correlated observations, for correcting potential population stratification, and for unifying unrelated and family samples into a more powerful analysis. I propose a comprehensive statistical framework, referred to as generalized MDR (GMDR), for the systematic extension of MDR. The proposed approach is quite versatile: it allows for covariate adjustment; it is suitable for analyzing almost any trait type, e.g., binary, count, continuous, polytomous, ordinal, time-to-onset, multivariate, and others, as well as combinations of these; and it is applicable to various study designs, including homogeneous and admixed, unrelated-subject and family, as well as mixtures of them. The proposed GMDR offers an important addition to the arsenal of analytical tools for identifying nonlinear multifactor interactions and unraveling the genetic architecture of complex traits.

Gene-Gene Interaction Analysis for Rare Variants: Application to T2D Exome Sequencing Data
Taesung Park1, Min-Seok Kwon1 and Seung Yeoun Lee2
1Seoul National University
2Sejong University
tspark@stats.snu.ac.kr

The heritability of complex diseases may not be fully explained by common variants. This missing heritability could be partly due to gene-gene interaction and rare variants. There has been exponential growth of gene-gene interaction analysis for common variants, in terms of both methodological developments and practical applications. Also, the recent advance of high-throughput sequencing technologies makes it possible to conduct rare variant analysis. However, little progress has been made in gene-gene interaction analysis for rare variants. Here, we propose a new gene-gene interaction method for rare variants in the framework of multifactor dimensionality reduction (MDR) analysis. The proposed method consists of two steps: the first is to collapse the rare variants in a specific region, such as a gene; the second is to perform MDR analysis on the collapsed rare variants. The proposed method is illustrated with whole exome sequencing data from 1,080 Koreans to identify causal gene-gene interactions among rare variants for type 2 diabetes.

Session 33 Structured Approach to High Dimensional Data with Sparsity and Low Rank Factorization

Two-way Regularized Matrix Decomposition
Jianhua Huang
Texas A&M University
jianhua@stat.tamu.edu
Matrix decomposition (or low-rank matrix approximation) plays an important role in various statistical learning problems. Regularization has been introduced to matrix decomposition to achieve stability, especially when the row or column dimension is high. When both the row and column domains of the matrix are structured, it is natural to employ a two-way regularization penalty in low-rank matrix approximation. This talk discusses the importance of considering invariance when designing the two-way penalty and shows undesirable properties of some penalties used in the literature when invariance is ignored.

Tensor Regression with Applications in Neuroimaging Analysis
Hua Zhou1, Lexin Li1 and Hongtu Zhu2

1North Carolina State University
2University of North Carolina at Chapel Hill
lli10@ncsu.edu

Classical regression methods treat covariates as a vector and estimate a corresponding vector of regression coefficients. Modern applications in medical imaging generate covariates of more complex form, such as multidimensional arrays (tensors). Traditional statistical and computational methods are compromised for the analysis of such high-throughput data due to their ultrahigh dimensionality as well as complex structure. In this talk I will discuss a new class of tensor regression models that efficiently exploit the special structure of tensor covariates. Under this framework, ultrahigh dimensionality is reduced to a manageable level, resulting in efficient estimation and prediction. Regularization, of both hard-thresholding and soft-thresholding types, will be carefully examined. The new methods aim to address a family of neuroimaging problems, including using brain images to diagnose neurodegenerative disorders, to predict onset of neuropsychiatric diseases, and to identify disease-relevant brain regions or activity patterns.

RKHS-Embedding Based Feature Screening for High-Dimensional Data
Krishnakumar Balasubramanian1, Bharath Sriperambadur2 and Guy Lebanon1

60 2014 ICSA-KISS Applied Statistics Symposium Portland June 15-18

Abstracts

1Georgia Institute of Technology
2Pennsylvania State University
krishnakumar3@gatech.edu

Feature screening is a key step in handling ultrahigh dimensional data sets that are ubiquitous in modern statistical problems. Over the last decade, convex relaxation based approaches (e.g., Lasso, sparse additive models) have been extensively developed and analyzed for feature selection in the high dimensional regime. But these approaches suffer from several problems, both computationally and statistically. To overcome these issues, we propose a novel Hilbert space embedding based approach for independence screening in ultrahigh dimensional data sets. The proposed approach is model-free (i.e., no model assumption is made between response and predictors) and can handle non-standard (e.g., graphs) and multivariate outputs directly. We establish the sure screening property of the proposed approach in the ultrahigh dimensional regime and experimentally demonstrate its advantages over other approaches.

Sparse Conditional Graphical Models for Structured Genetic Datasets
Hyonho Chun
Purdue University
chunh@purdue.edu

For the purpose of inferring a network, we consider a sparse Gaussian graphical model (SGGM) under the presence of a population structure, which often occurs in genetic studies with model organisms. In these studies, datasets are obtained by combining multiple lines of inbred organisms or by using outbred animals. Ignoring such population structures would produce false connections in a graph structure, but most research in graph inference has focused on independent cases. On the other hand, in regression settings a linear mixed effect model has been widely used to account for correlations among observations. Besides its effectiveness, the linear mixed effect model has a generality: the model can be stated within a framework of penalized least squares. This generality makes it very flexible for use in settings other than regression. In this manuscript, we adopt a linear mixed effect model into an SGGM. Our formulation fits into the recently developed conditional Gaussian graphical model, in which the population structures are modeled as predictors and the graph is determined by a conditional precision matrix. The proposed approach is applied to the network inference problem of two datasets: the heterogeneous mice diversity panel (HMDP) and heterogeneous stock (HS) datasets.

Session 34 Recent Developments in Dimension Reduction, Variable Selection and Their Applications

Variable Selection and Model Estimation via Subtle Uprooting
Xiaogang Su
University of Texas at El Paso
xiaogangsu@gmail.com

We propose a new method, termed "subtle uprooting," for fitting GLMs by optimizing a smoothed information criterion. The significance of this approach is that it completes variable selection and parameter estimation within one single optimization step and avoids tuning penalty parameters as commonly done in traditional regularization approaches. Two technical maneuvers, "uprooting" and an epsilon-threshold procedure, are employed to enforce sparsity in parameter estimates while maintaining the smoothness of the objective function. The formulation allows us to borrow strength from established methods and theories in both optimization and statistical estimation. More specifically, a modified BFGS algorithm (Li and Fukushima, 2001) with established global and super-linear convergence properties is adopted to solve the non-convex yet smooth programming problem. By making connections to M-estimators and information criteria, we also show that the proposed method is consistent in variable selection and efficient in estimating the nonzero parameters. As illustrated with both simulated experiments and data examples, the empirical performance is either comparable or superior to many other competitors.

Robust Variable Selection Through Dimension Reduction
Qin Wang
Virginia Commonwealth University
qwang3@vcu.edu

Dimension reduction and variable selection play important roles in high dimensional data analysis. MAVE (minimum average variance estimation) is an efficient approach proposed by Xia et al. (2002) to estimate the regression mean space. However, it is not robust to outliers in the dependent variable because of the use of the least-squares criterion. In this talk, we propose a robust estimation based on local modal regression, so that it is more applicable in practice. We further extend the new approach to select informative variables through shrinkage estimation. The efficacy of the new approach is illustrated through simulation studies.

Sparse Envelope Model: Efficient Estimation and Response Variable Selection in Multivariate Linear Regression
Zhihua Su1, Guangyu Zhu1 and Xin Chen2

1University of Florida
2National University of Singapore
zhihuasu@stat.ufl.edu

The envelope model, recently proposed by Cook, Li and Chiaromonte (2010), is a novel method to achieve efficient estimation for multivariate linear regression. It identifies the material and immaterial information in the data using the covariance structure among the responses. The subsequent analysis is based only on the material part and is therefore more efficient. The envelope estimator is consistent, but in the sample, the material part estimated by the envelope model consists of linear combinations of all the response variables, while in many applications it is important to pinpoint the response variables that are immaterial to the regression. For this purpose, we propose the sparse envelope model, which can identify these response variables and at the same time preserves the efficiency gains offered by the envelope model. A group-lasso type of penalty is employed to induce sparsity on the manifold structure of the envelope model. Consistency, asymptotic distribution and the oracle property of the estimator are established. In particular, new features of the oracle property with response selection are discussed. Simulation studies and an example demonstrate the effectiveness of this model.

Session 35 Post-Discontinuation Treatment in Randomized Clinical Trials

Marginal Structure Model with Adaptive Truncation in Estimating the Initial Treatment Effect with Informative Censoring by Subsequent Therapy
Jingyi Liu1, Li Li1, Xiaofei Bai2 and Douglas Faries1
1Eli Lilly and Company
2North Carolina State University
liu_jingyi@lilly.com

A randomized clinical trial is designed to estimate the direct effect of a treatment versus control, where patients receive the treatment of interest or control by random assignment. The treatment effect is measured by the comparison of endpoints of interest, e.g., overall survival. However, in some trials, patients who discontinued their initial randomized treatment are allowed to switch to another treatment based on clinicians' or patients' subjective decision. In such cases, the primary endpoint is censored, and the direct treatment effect of interest may be confounded by subsequent treatments, especially when subsequent treatments have a large impact on endpoints. In such studies there usually exist variables that are both risk factors for the primary endpoint and predictors of initiation of subsequent treatment. Such variables are called time-dependent confounders. When time-dependent confounders exist, traditional methods such as the intent-to-treat (ITT) analysis and the time-dependent Cox model may not appropriately adjust for them and can result in biased estimators. Marginal structural models (MSM) have been applied to estimate the causal treatment effect when the initial treatment effect is confounded by subsequent treatments. It has been shown that MSM utilizing inverse propensity weighting generates consistent estimators when the other nuisance parameters are correctly modeled. However, the occurrence of very large weights can cause the estimator to have inflated variance, and consistency may not hold. The augmented MSM estimator was proposed to estimate the treatment effect more efficiently, but may not perform as well as expected in the presence of large weights. In this paper, we propose a new method to estimate weights by adaptively truncating longitudinal weights in MSM. This method sacrifices consistency but gains efficiency when large weights exist, without ad hoc selection and removal of observations with large weights. We conducted simulation studies to explore the performance of several different methods, including the ITT analysis, the Cox model and the proposed method, with regard to bias, standard deviation, coverage rate of the confidence interval and mean squared error (MSE) under various scenarios. We also applied these methods to a randomized open-label phase III study of patients with non-squamous non-small cell lung cancer.

Quantile Regression Adjusting for Dependent Censoring from Semi-Competing Risks
Ruosha Li1 and Limin Peng2

1University of Pittsburgh
2Emory University
rul12@pitt.edu

In this work, we study quantile regression when the response is an event time subject to potentially dependent censoring. We consider the semi-competing risks setting, where time to censoring remains observable after the occurrence of the event of interest. While such a scenario frequently arises in biomedical studies, most current quantile regression methods for censored data are not applicable because they generally require the censoring time and the event time to be independent. By imposing rather mild assumptions on the association structure between the time-to-event response and the censoring time variable, we propose quantile regression procedures which allow us to garner a comprehensive view of the covariate effects on the event time outcome as well as to examine the informativeness of censoring. An efficient and stable algorithm is provided for implementing the new method. We establish the asymptotic properties of the resulting estimators, including uniform consistency and weak convergence. Extensive simulation studies suggest that the proposed method performs well with moderate sample sizes. We illustrate the practical utility of our proposals through an application to a bone marrow transplant trial.

Overview of Crossover Design
Ming Zhu
AbbVie Inc.
zhuming83@gmail.com

Crossover designs are used in many clinical trials. Compared to the conventional parallel design, a crossover design has the advantage of avoiding comparability issues between study and control groups with regard to potential confounding variables. Moreover, a crossover design is more efficient than a parallel design in that it requires a smaller sample size for given type I and type II errors. However, a crossover design may suffer from carryover effects, which can bias the interpretation of the data analysis. In this presentation, I will talk about general considerations that need to be taken into account and pitfalls to be avoided in the planning and analysis of a crossover trial. Appropriate statistical methods for crossover trial analysis will also be described.

Cross-Payer Effects of Medicaid LTSS on Medicare Resource Use using Propensity Score Risk Profiling
Yi Huang, Anthony Tucker and Karen Johnson
University of Maryland
yihuang@umbc.edu

Medicaid administrators look to establish a better balance between long-term services and supports (LTSS) provided in the community and in institutions, and to better integrate acute and long-term care for recipients who are dually eligible for Medicare. Programs of integrated care will require a solid understanding of the interactive effects that are masked in the separation of Medicare and Medicaid. This paper aims to evaluate the causal effect of Maryland's Older Adult Waiver (OAW) program on Medicare spending outcomes using a propensity score based health risk profiling technique. Specifically, dually eligible recipients enrolled in Maryland's OAW program were identified as the treatment group, and matched "control" groups were drawn from a comparable population who did not receive those services. The broader impact of this study is that similar statistical approaches can be developed by any state to facilitate the improvement of quality and cost effectiveness of LTSS for duals.

Session 36 New Advances in Semi-Parametric Modeling and Survival Analysis

Bayesian Partial Linear Model for Skewed Longitudinal Data
Yuanyuan Tang1, Debajyoti Sinha2, Debdeep Pati2, Stuart Lipsitz3 and Steven Lipshultz4
1AbbVie Inc.
2Florida State University
3Brigham and Women's Hospital
4University of Miami
debdeep@stat.fsu.edu

Current statistical models and methods focusing on the mean response are not appropriate for longitudinal studies with a heavily skewed continuous response. For such longitudinal responses, we present a novel model accommodating a partially linear median regression function, a flexible Dirichlet process mixture prior for the skewed error distribution, and a within-subject association structure. We provide theoretical justifications for our methods, including asymptotic properties of the posterior and the semi-parametric Bayes estimators. We also provide simulation studies of finite sample properties. Ease of computational implementation via available MCMC tools, and other advantages of our method compared to existing methods, are illustrated via analysis of a cardiotoxicity study of children of HIV-infected mothers.


Nonparametric Inference for Inverse Probability Weighted Estimators with a Randomly Truncated Sample
Xu Zhang
University of Mississippi
xzhang2@umc.edu

A randomly truncated sample appears when the independent variables T and L are observable only if L < T. The truncated version of the Kaplan-Meier estimator is known to be the standard estimation method for the marginal distribution of T or L. The inverse probability weighted (IPW) estimator was suggested as an alternative, and its agreement with the truncated version of the Kaplan-Meier estimator has been proved. This paper centers on the weak convergence of IPW estimators and variance decomposition. The paper shows that the asymptotic variance of an IPW estimator can be decomposed into two sources: the variation of the IPW estimator using known weight functions is the primary source, and the variation due to estimated weights should be included as well. Variance decomposition establishes the connection between a truncated sample and a biased sample with known probabilities of selection. A simulation study was conducted to investigate the practical performance of the proposed variance estimators, as well as the relative magnitude of the two sources of variation for various truncation rates. A blood transfusion data set is analyzed to illustrate the nonparametric inference discussed in the paper.
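For intuition, the core IPW step, up-weighting each observation by the inverse of a known selection probability (the "primary source" of variation in the decomposition above), can be sketched in a toy form. This is an illustration with hypothetical weights, not the paper's estimator for truncated data:

```python
import numpy as np

def ipw_mean(t, prob_selected):
    """Hajek-style inverse probability weighted estimate of E[T] from a
    biased sample: each observed t_i is weighted by 1 / P(selected | t_i),
    so under-sampled regions of the T distribution are up-weighted."""
    w = 1.0 / np.asarray(prob_selected, dtype=float)
    return float(np.sum(w * np.asarray(t, dtype=float)) / np.sum(w))
```

In the truncation setting the selection probabilities are themselves estimated from the data, which contributes the second variance component discussed in the abstract.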

Modeling Time-Varying Effects for High-Dimensional Covariates: A New Gateaux-Differential Boosting Approach
Kevin He, Yi Li and Ji Zhu
University of Michigan
yili@umich.edu

Survival models with time-varying effects provide a flexible framework for modeling the effects of covariates on event times. However, the difficulty of model construction increases dramatically as the number of variables grows. Existing constrained optimization and boosting methods suffer from computational complexity. We propose a new Gateaux differential-based boosting procedure for simultaneously selecting covariates and automatically determining their functional form. The proposed method is flexible in that it extends gradient boosting to functional differentials in a general parameter space. In each boosting learning step of this procedure, only the best-fitting base-learner (and therefore the most informative covariate) is added to the predictor, which consequently encourages sparsity. In addition, the method controls smoothness, which is crucial for improving predictive performance. The performance of the proposed method is examined by simulations and by application to the national kidney transplant data.

Flexible Modeling of Survival Data with Covariates Subject to Detection Limits via Multiple Imputation
Paul Bernhardt1, Judy Wang2 and Daowen Zhang2

1Villanova University
2North Carolina State University
dzhang2@ncsu.edu

Models for survival data generally assume that covariates are fully observed. However, in medical studies it is not uncommon for biomarkers to be censored at known detection limits. A computationally efficient multiple imputation procedure for modeling survival data with covariates subject to detection limits is proposed. This procedure is developed in the context of an accelerated failure time model with a flexible seminonparametric error distribution. An iterative version of the proposed multiple imputation algorithm that approximates the EM algorithm for maximum likelihood is suggested. Simulation studies demonstrate that the proposed multiple imputation methods work well, while alternative methods lead to estimates that are either biased or more variable. The proposed methods are applied to analyze the dataset from the recently conducted GenIMS study.

Session 37 High-Dimensional Data Analysis: Theory and Application

Structured Functional Additive Regression in Reproducing Kernel Hilbert Spaces
Hao Helen Zhang
University of Arizona
hzhang@math.arizona.edu

A new class of semiparametric functional regression models is considered to jointly model functional and non-functional predictors, identifying important scalar covariates while taking into account the functional covariate. In particular, we exploit a unified linear structure to incorporate the functional predictor, as in classical functional linear models, which is of nonparametric feature. At the same time, we include a potentially large number of scalar predictors as the parametric part, which may be reduced to a sparse representation. The new method performs variable selection and estimation by naturally combining functional principal component analysis (FPCA) and SCAD-penalized regression under one framework. Theoretical and empirical investigation reveals that efficient estimation of the important scalar predictors can be obtained and enjoys the oracle property, despite contamination from the noise-prone functional covariate. The study also sheds light on the influence of the number of eigenfunctions used for modeling the functional predictor on the correctness of model selection and the accuracy of the scalar estimates.

High-Dimensional Thresholded Regression and Shrinkage Effect
Zemin Zheng, Yingying Fan and Jinchi Lv
University of Southern California
zeminzhe@usc.edu

High-dimensional sparse modeling via regularization provides a powerful tool for analyzing large-scale data sets and obtaining meaningful, interpretable models. The use of nonconvex penalty functions shows advantages in selecting important features in high dimensions, but the global optimality of such methods still demands more understanding. In this paper, we consider sparse regression with a hard-thresholding penalty, which we show to give rise to thresholded regression. This approach is motivated by its close connection with L0-regularization, which can be unrealistic to implement in practice but has appealing sampling properties, and by its computational advantage. Under some mild regularity conditions, allowing possibly exponentially growing dimensionality, we establish oracle inequalities for the resulting regularized estimator, as the global minimizer, under various prediction and variable selection losses, as well as oracle risk inequalities for the hard-thresholded estimator followed by a further L2-regularization. The risk properties exhibit interesting shrinkage effects under both estimation and prediction losses. We identify the optimal choice of the ridge parameter, which is shown to have simultaneous advantages for both the L2-loss and the prediction loss. These new results and phenomena are evidenced by simulation and real data examples.
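The componentwise operators behind the penalties contrasted in this abstract are simple to state; a minimal illustration (standard textbook forms, not the paper's estimator, which applies them inside a regression objective):

```python
import numpy as np

def soft_threshold(z, lam):
    """Soft thresholding: proximal operator of the L1 penalty; every
    coefficient is shrunk toward zero by lam, inducing bias on large signals."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def hard_threshold(z, lam):
    """Hard thresholding: L0-type rule; coefficients with |z| > lam are
    kept untouched (no shrinkage bias) and the rest are set to zero."""
    return np.where(np.abs(z) > lam, z, 0.0)
```

The "further L2-regularization" mentioned above would then shrink the surviving hard-thresholded coordinates by a ridge factor, which is where the choice of ridge parameter matters.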

Local Independence Feature Screening for Nonparametric and Semiparametric Models by Marginal Empirical Likelihood
Jinyuan Chang1, Cheng Yong Tang2 and Yichao Wu3


1University of Melbourne
2University of Colorado Denver
3North Carolina State University
chengyong.tang@ucdenver.edu

We consider an independence feature screening method for identifying contributing explanatory variables in high-dimensional regression analysis. Our approach is constructed by using the empirical likelihood approach in conjunction with marginal nonparametric regressions to surely capture the local impacts of explanatory variables. Without requiring a specific parametric form of the underlying data model, our approach can be applied to a broad range of representative nonparametric and semi-parametric models, which include, but are not limited to, nonparametric additive models, single-index and multiple-index models, and varying coefficient models. Facilitated by the marginal empirical likelihood, our approach addresses the independence feature screening problem with a new insight, by directly assessing evidence from the data on whether an explanatory variable contributes locally to the response variable or not. Such a feature avoids the estimation step in most existing independence screening approaches and is advantageous in scenarios such as single-index models, where identification of the marginal effect for its estimation is an issue. Theoretical analysis shows that the proposed feature screening approach can handle data dimensionality growing exponentially with the sample size. By extensive theoretical illustrations and empirical examples, we show that the local independence screening approach works promisingly.

The Fused Kolmogorov Filter: A Nonparametric Model-Free Screening Method
Qing Mai1 and Hui Zou2

1Florida State University
2University of Minnesota
mai@stat.fsu.edu

A new model-free screening method, named the fused Kolmogorov filter, is proposed for high-dimensional data analysis. This new method is fully nonparametric and can work with many types of covariates and response variables, including continuous, discrete and categorical variables. We apply the fused Kolmogorov filter to variable screening problems emerging from a wide range of applications, such as multiclass classification, nonparametric regression and Poisson regression, among others. It is shown that the fused Kolmogorov filter enjoys the sure screening property under weak regularity conditions that are much milder than those required for many existing nonparametric screening methods. In particular, the fused Kolmogorov filter can still be powerful when covariates are strongly dependent on each other. We further demonstrate the superior performance of the fused Kolmogorov filter over existing screening methods by simulations and real data examples.
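The underlying idea, ranking covariates by a Kolmogorov-Smirnov distance between conditional distributions of X given slices of the response, can be sketched for a categorical response. This is a simplified single-partition version written for illustration; the actual method "fuses" several slicings of the response and is not reproduced here:

```python
import numpy as np
from scipy.stats import ks_2samp

def kolmogorov_screen(X, y, top_k):
    """Score each covariate by the largest pairwise two-sample
    Kolmogorov-Smirnov statistic across classes of y, then return the
    indices of the top_k highest-scoring covariates."""
    classes = np.unique(y)
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        stat = 0.0
        for a in range(len(classes)):
            for b in range(a + 1, len(classes)):
                ks = ks_2samp(X[y == classes[a], j],
                              X[y == classes[b], j]).statistic
                stat = max(stat, ks)
        scores[j] = stat
    return np.argsort(scores)[::-1][:top_k]
```

Because the KS statistic depends only on ranks of each covariate marginally, the score is invariant to monotone transformations, which is one reason such filters tolerate heavy tails and strong covariate dependence.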

Session 38 Leading Across Boundaries: Leadership Development for Statisticians

Xiaoli Meng1, Dipak Dey2, Soonmin Park3, James Hung4 and Walter Offen5
1Harvard University
2University of Connecticut
3Eli Lilly and Company
4United States Food and Drug Administration
5AbbVie Inc.
1meng@stat.harvard.edu
2dipak.dey@uconn.edu
3park_soomin@lilly.com
4hsienming.hung@fda.hhs.gov
5walter.offen@abbvie.com

The role of the statistician has long been valued as that of a critical collaborator in interdisciplinary work. Nevertheless, statisticians are often regarded as contributors more than leaders. This stereotype has limited statistics as a driving perspective in partnership environments and the inclusion of statisticians in executive decision making. More leadership skills are needed to prepare statisticians to play influential roles and to promote our profession to be more impactful. In this panel session, statistician leaders from academia, government and industry will share their insights about leadership and their experiences in leading in their respective positions. Important leadership skills and qualities for statisticians will be discussed by the panelists. This session is targeted at statisticians who seek more knowledge and inspiration about leadership.

Session 39 Recent Advances in Adaptive Designs in Early Phase Trials

A Toxicity-Adaptive Isotonic Design for Combination Therapy in Oncology
Rui Qin
Mayo Clinic
qinrui@mayo.edu

With the development of molecularly targeted drugs in cancer treatment, combination therapy targeting multiple pathways to achieve potential synergy has become increasingly popular. While the dosing range of each individual drug may already be defined, the maximum tolerated dose of the combination therapy is yet to be determined in a new phase I trial. The possible dose level combinations, which are only partially ordered, pose a great challenge for conventional dose-finding designs. We propose to estimate toxicity probabilities by isotonic regression and to incorporate the attribution of toxicity into the consideration of dose escalation and de-escalation for combination therapy. Simulation studies are conducted to understand and assess its operating characteristics under various scenarios. The application of this novel design to an ongoing phase I clinical trial with dual agents is further illustrated as an example.
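The isotonic step can be illustrated with the classical pool-adjacent-violators algorithm (PAVA), which turns raw per-dose toxicity rates into monotone non-decreasing estimates. Note this one-dimensional sketch does not handle the partial order of dose combinations central to the design above, which requires a multidimensional extension:

```python
def pava(y, w=None):
    """Pool Adjacent Violators: weighted isotonic (non-decreasing)
    regression. Here y would be raw toxicity rates by dose and w the
    patient counts per dose; the output is a monotone toxicity curve."""
    w = [1.0] * len(y) if w is None else list(w)
    by, bw, bn = [], [], []  # block means, block weights, block sizes
    for yv, wv in zip(y, w):
        by.append(float(yv)); bw.append(float(wv)); bn.append(1)
        # merge backwards while the monotonicity constraint is violated
        while len(by) > 1 and by[-2] > by[-1]:
            tot = bw[-2] + bw[-1]
            mean = (bw[-2] * by[-2] + bw[-1] * by[-1]) / tot
            n = bn[-2] + bn[-1]
            by[-2:], bw[-2:], bn[-2:] = [mean], [tot], [n]
    # expand block means back to per-dose estimates
    fit = []
    for m, n in zip(by, bn):
        fit.extend([m] * n)
    return fit
```

For example, observed rates (0.1, 0.3, 0.2, 0.6) violate monotonicity at the middle pair, so PAVA pools those two doses into a common estimate of 0.25.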

Calibration of the Likelihood Continual Reassessment Method for Phase I Clinical Trials
Shing Lee1, Xiaoyu Jia2 and Ying Kuen Cheung1

1Columbia University
2Boehringer Ingelheim Pharmaceuticals
sml2114@columbia.edu

The likelihood continual reassessment method is an adaptive model-based design used to estimate the maximum tolerated dose in phase I clinical trials. The method is generally implemented in a two-stage approach, whereby model-based dose escalation is activated after an initial sequence of patients is treated. While it has been shown that the method has good large sample properties, in finite sample settings it is important to specify a reasonable model. We propose a systematic approach to select the initial dose sequence and the skeleton based on the concepts of indifference interval and coherence. We compare these approaches with the traditional trial-and-error approach in the context of examples. The systematic calibration approach simplifies the model calibration process for the likelihood continual reassessment method while being competitive with the time-consuming trial-and-error process. We also share our experience using the calibration technique in real-life applications using the dfcrm package in R.
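The model-based step of the likelihood CRM can be sketched with the commonly used one-parameter power model, where the skeleton gives prior guesses of the dose toxicity probabilities. This Python fragment is an illustration under that assumed model, not the dfcrm implementation referenced above; note the MLE requires heterogeneous outcomes (at least one toxicity and one non-toxicity):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def crm_next_dose(skeleton, doses_given, tox, target):
    """Likelihood CRM sketch with the power model p_d(a) = skeleton[d]**exp(a):
    maximize the binomial likelihood over the scalar a, then recommend the
    dose whose estimated toxicity probability is closest to the target."""
    skeleton = np.asarray(skeleton, dtype=float)
    d = np.asarray(doses_given)          # 0-based dose index per patient
    y = np.asarray(tox, dtype=float)     # 1 = dose-limiting toxicity

    def negloglik(a):
        p = np.clip(skeleton[d] ** np.exp(a), 1e-10, 1 - 1e-10)
        return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

    a_hat = minimize_scalar(negloglik, bounds=(-5, 5), method="bounded").x
    p_hat = skeleton ** np.exp(a_hat)
    return int(np.argmin(np.abs(p_hat - target))), p_hat
```

The skeleton calibration question discussed in the abstract amounts to choosing the `skeleton` values (e.g., via an indifference-interval construction) so that the design's recommendations are insensitive to small model misspecification.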

Sequential Subset Selection Procedures of Random Subset Size for Early Phase Clinical Trials
Cheng-Shiun Leu and Bruce Levin
Columbia University
cl94@columbia.edu

In early phase clinical trials, the objective is often to select a subset of promising candidate treatments, whose treatment effects are greater than those of the remaining candidates by at least a pre-specified amount, to bring forward for phase III confirmatory testing. Under certain constraints, such as budgetary limitations or difficulty of recruitment, a procedure which selects a subset of fixed, pre-specified size is entirely appropriate, especially when the number of treatments available for further testing is limited. However, clinicians and researchers often wish to identify all efficacious treatments in the screening process, and a subset selection of fixed size may not satisfy this requirement, as the number of efficacious treatments is unknown prior to the experiment. To address this issue, we discuss a family of sequential subset selection procedures which identify a subset of efficacious treatments of random size, thereby avoiding the need to pre-specify the subset size. Various versions of the procedure allow adaptive sequential elimination of inferior treatments and sequential recruitment of superior treatments as the experiment progresses. We compare these new procedures with Gupta's random-subset-size procedure for selecting the one best candidate by simulation.

Search Procedures for the MTD in Phase I Trials
Shelemyahu Zacks
Binghamton University
shelly@math.binghamton.edu

There are several competing methods of search for the MTD in Phase I cancer clinical trials. The paper will review some procedures and compare their operating characteristics. In particular, the EWOC method of Rogatko et al. will be highlighted.

Session 40 High Dimensional Regression/Machine Learning

Variable Selection for High-Dimensional Nonparametric Ordinary Differential Equation Models With Applications to Dynamic Gene Regulatory Networks
Hongqi Xue1, Tao Lu2, Hua Liang3 and Hulin Wu1

1University of Rochester
2State University of New York at Albany
3George Washington University
Hongqi_Xue@urmc.rochester.edu

The gene regulation network (GRN) is a high-dimensional complex system, which can be represented by various mathematical or statistical models. The ordinary differential equation (ODE) model is one of the popular dynamic GRN models. High-dimensional linear ODE models have been proposed to identify GRNs, but with the limitation of assuming linear regulation effects. We propose a nonparametric additive ODE model, coupled with two-stage smoothing-based ODE estimation methods and adaptive group LASSO techniques, to model dynamic GRNs that can flexibly deal with nonlinear regulation effects. The asymptotic properties of the proposed method are established under the "large p, small n" setting. Simulation studies are performed to validate the proposed approach. An application example identifying the nonlinear dynamic GRN of T-cell activation is used to illustrate the usefulness of the proposed method.

BigData: Sign Cauchy Projections and Chi-Square Kernels
Ping Li1, Gennady Samorodnitsky2 and John Hopcroft2
1Rutgers University
2Cornell University
pingli98@gmail.com

The method of stable random projections is useful for efficiently approximating the l-alpha distance in high dimensions, and it is naturally suitable for data streams. In this paper, we propose to use only the signs of the projected data for alpha = 1 (i.e., Cauchy random projections); we show that the probability of collision can be accurately approximated as a function of the chi-square similarity. In text and vision applications, the chi-square similarity is a popular measure when the features are generated from histograms (which are a typical example of data streams). Experiments confirm that the proposed method is promising for large-scale learning applications. The full paper is available at arXiv:1308.1009.
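The sign-collision estimate itself is a few lines of code; the sketch below is illustrative only, showing the quantity whose collision probability the paper approximates via the chi-square similarity (that approximation is the paper's contribution and is not reproduced here):

```python
import numpy as np

def sign_cauchy_collision(u, v, k=2000, seed=0):
    """Project two (histogram) vectors with i.i.d. standard Cauchy entries,
    keep only the signs of the k projections, and return the fraction of
    matching signs, i.e., the empirical collision probability."""
    rng = np.random.default_rng(seed)
    R = rng.standard_cauchy(size=(k, len(u)))
    return float(np.mean(np.sign(R @ np.asarray(u, dtype=float)) ==
                         np.sign(R @ np.asarray(v, dtype=float))))
```

Storing one bit per projection instead of a full real value is what makes this attractive for streaming and large-scale learning: identical vectors always collide, while vectors with disjoint support collide about half the time.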

A Sparse Linear Discriminant Analysis Method with Asymptotic Optimality for Multiclass Classification
Ruiyan Luo and Xin Qi
Georgia State University
rluo@gsu.edu

Recently, many sparse linear discriminant analysis methods have been proposed to overcome the major problems of classic linear discriminant analysis in high-dimensional settings. However, the asymptotic optimality results are limited to the case of only two classes, for which the classification boundary of LDA is a hyperplane and explicit formulas for the classification error exist. We propose an efficient sparse linear discriminant analysis method for multiclass classification. In practice, this method can control the relationship between the sparse components and hence has improved prediction accuracy compared to other methods, in both simulation and case studies. In theory, we derive the asymptotic optimality of our method as the dimensionality and sample size go to infinity, with an arbitrary fixed number of classes.

Generalized Hidden Markov Model for Variant Detection
Yichen Cheng, James Dai and Charles Kooperberg
Fred Hutchinson Cancer Research Center
ycheng@fhcrc.org
Developments in next-generation sequencing technology enable the detection of both common and rare variants. Genome-wide association studies (GWAS) benefit greatly from this fast-growing technology. Although many associations between variants and disease have been found for common variants, new methods for detecting functional rare variants are still urgently needed. Among existing methods, efforts have been made to increase detection power through set-based tests. However, none of these methods distinguish between functional variants and neutral variants (i.e., variants that have no effect on the disease). In this paper, we propose to model the effects from a set (for example, a gene) of variants as a hidden Markov model (HMM). For each SNP, we model the effect as a mixture of 0 and θ, where θ is the true effect size. The mixture setup accounts for the fact that a proportion of the variants are neutral. Another advantage of using an HMM is that it can account for possible association between neighboring variants. Our method works well for both the linear model and the logistic model. Under the framework of the HMM, we test one component against more components and derive the asymptotic distribution under the null hypothesis. We show that our proposed method works well compared to competitors under various scenarios.

2014 ICSA-KISS Applied Statistics Symposium, Portland, June 15-18

Large-Scale Joint Trait Risk Prediction for Mini-exome Sequence Data
Gengxin Li
Wright State University
gengxinli@wright.edu
The empirical Bayes classification method is a useful risk prediction approach for microarray data, but it is challenging to apply this method to risk prediction with mini-exome sequencing data. A major advantage of this method is that the effect size distribution for the set of possible features is empirically estimated, and all subsequent parameter estimation and risk prediction is guided by this distribution. Here we generalize Efron's method to allow for some of the peculiarities of mini-exome sequencing data. In particular, we incorporate quantitative trait information into the binary trait prediction model, proposing a new model named the Joint Trait Model, and we further allow this model to properly incorporate the annotation information of single nucleotide polymorphisms (SNPs). In the course of our analysis we examine several aspects of the possible simulation model, including the identity of the most important genes, the differing effects of synonymous and non-synonymous SNPs, and the relative roles of covariates and genes in conferring disease risk. Finally, we compare the three methods to each other and to other classifiers.

Rank Estimation and Recovery of Low-rank Matrices for Factor Models with Heteroscedastic Noise
Jingshu Wang and Art B. Owen
Stanford University
wangjingshususan@gmail.com
We consider recovery of low-rank matrices from noisy data with heteroscedastic noise. We use an early-stopping alternating method (ESAM), which iteratively alternates between estimating the noise variances and the low-rank matrix, and corrects over-fitting by an early-stopping rule. Various simulations in our study suggest stopping after just three iterations, and we have seen that ESAM gives better recovery than the SVD on either the original data or the standardized data when the optimal rank is given. To select a rank, we use an early-stopping bi-cross-validation (BCV) technique modified from BCV for the white-noise model. Our method leaves out half the rows and half the columns, as in BCV, but uses low-rank operations involving ESAM, instead of the SVD, on the retained data to predict the held-out entries. Simulations considering both strong and weak signal cases show that our method is the most accurate overall compared to some BCV strategies and two versions of Parallel Analysis (PA). PA is a state-of-the-art method for choosing the number of factors in Factor Analysis.

Session 41 Distributional Inference and Its Impact on Statistical Theory and Practice

Stat Wars, Episode IV: A New Hope (For Objective Inference)
Keli Liu and Xiao-Li Meng
Harvard University
meng@stat.harvard.edu
A long time ago in a galaxy far, far away (pre-war England)... It is a period of uncivil debate. Rebel statisticians, striking from an agricultural station, have won their first victory against the evil Bayesian Empire. A plea was made: "Help me, R. A. Fisher, you're my only hope," and Fiducial was born. It promised posterior probability statements on parameters without a prior, but at the seeming cost of violating basic probability laws. Was Fisher crazy, or did madness mask innovation? Fiducial calculations can be easily understood through the missing-data perspective, which illuminates a trinity of missing insights:
I. The Bayesian prior becomes an infinite-dimensional nuisance parameter to be dealt with using partial likelihood.
II. A Missing At Random (MAR) condition naturally characterizes when exact Fiducial solutions exist.
III. Understanding the "multi-phase" structure underlying Fiducial inference leads to the development of approximate Fiducial procedures which remain robust to prior misspecification.
In the years after its introduction, Fiducial's critics branded it "Fisher's biggest blunder." But in the great words of Obi-Wan: "If you strike me down, I shall become more powerful than you can possibly imagine."
To be continued: Episode V, Ancillarity Paradoxes Strike Back (At Fiducial), and Episode VI, Return of the Fiducialist, will premiere, respectively, at the IMS Asia Pacific Rim Meeting in Taipei (June 30-July 3, 2014) and at the IMS Annual Meeting in Sydney (July 7-11, 2014).

Higher Order Asymptotics for Generalized Fiducial Inference
Abhishek Pal Majumdar and Jan Hannig
University of North Carolina at Chapel Hill
janhannig@unc.edu
R. A. Fisher's fiducial inference has been the subject of many discussions and controversies ever since he introduced the idea during the 1930s. The idea experienced a bumpy ride, to say the least, during its early years, and one can safely say that it eventually fell into disfavor among mainstream statisticians. However, it appears to have made a resurgence recently under various names and modifications. For example, under the new name of generalized inference, fiducial inference has proved to be a useful tool for deriving statistical procedures for problems where frequentist methods with good properties were previously unavailable. Therefore, we believe that the fiducial argument of R. A. Fisher deserves a fresh look from a new angle. In this talk we investigate the properties of the generalized fiducial distribution using higher-order asymptotics and provide suggestions on some open issues in fiducial inference, such as the choice of the data generating equation.

Generalized Inferential Models
Ryan Martin
University of Illinois at Chicago
rgmartin@uic.edu
The new inferential model (IM) framework provides prior-free probabilistic inference which is valid for all models and all sample sizes. The construction of an IM requires specification of an association that links the observable data to the parameter of interest and an unobservable auxiliary variable. This specification can be challenging, however, particularly when the parameter is more than one-dimensional. In this talk I will present a generalized (or "black-box") IM that bypasses full specification of the association, and the challenges it entails, by working with an association based on a scalar-valued, parameter-dependent function of the data. Theory and examples demonstrate that this method gives exact and efficient prior-free probabilistic inference in a wide variety of problems.

Formal Definition of Reference Priors under a General Class of Divergence
Dongchu Sun
University of Missouri
sund@missouri.edu
Reference analysis produces objective Bayesian inference, in the sense that inferential statements depend only on the assumed model and the available data, and the prior distribution used to make an inference is least informative in a certain information-theoretic sense. Berger, Bernardo and Sun (2009) derived reference priors rigorously in contexts under the Kullback-Leibler divergence. In special cases with common support and other regularity conditions, Ghosh, Mergel and Liu (2011) derived a general f-divergence criterion for prior selection. We generalize Ghosh, Mergel and Liu's (2011) results to the case without common support and show how an explicit expression for the reference prior can be obtained under posterior consistency. The explicit expression can be used to derive new reference priors, both analytically and numerically.

Session 42 Applications of Spatial Modeling and Imaging Data

Spatial Bayesian Variable Selection and Shrinkage in High-dimensional Covariate Spaces with Application to fMRI
Fan Li1, Tingting Zhang (co-first author)2, Quanli Wang1 and James Coan2
1Duke University
2University of Virginia
tz3b@virginia.edu
Multi-subject functional magnetic resonance imaging (fMRI) data provide opportunities to study the population-wide relationship between human brain activity and individual biological or behavioral traits. But statistical modeling, analysis, and computation for such massive and noisy data with a complicated spatio-temporal correlation structure are extremely challenging. In this article, within the framework of Bayesian stochastic search variable selection, we propose a joint Ising and Dirichlet process (Ising-DP) prior to achieve selection of spatially correlated brain voxels that are predictive of individual responses. The Ising component of the prior utilizes the spatial information between voxels, and the DP component shrinks the coefficients of the large number of voxels to a small set of values, thus greatly reducing the posterior computational burden. To address the phase transition phenomenon of the Ising prior, we propose a new analytic approach to derive bounds for the hyperparameters, illustrated on 2- and 3-dimensional lattices. The proposed method is compared with several alternative methods via simulations and is applied to the fMRI data collected from the Kiff hand-holding experiment.

A Hierarchical Model for Simultaneous Detection and Estimation in Multi-Subject fMRI Studies
David Degras1 and Martin Lindquist2
1DePaul University
2Johns Hopkins University
ddegrasv@depaul.edu
In this paper we introduce a new hierarchical model for the simultaneous detection of brain activation and estimation of the shape of the hemodynamic response in multi-subject fMRI studies. The proposed approach circumvents a major stumbling block in standard multi-subject fMRI data analysis, in that it allows the shape of the hemodynamic response function to vary across regions and subjects while still providing a straightforward way to estimate population-level activation. An efficient estimation algorithm is presented, as is an inferential framework that allows not only for tests of activation but also for tests of deviations from some canonical shape. The model is validated through simulations and application to a multi-subject fMRI study of thermal pain.

On the Relevance of Accounting for Spatial Correlation: A Case Study from Florida
Linda J. Young1 and Emily Leary2
1USDA NASS RDD
2University of Florida
lindayoung@nass.usda.gov
Identifying the potential impact of climate change is of increasing interest. As an example, understanding the effects of changing temperature patterns on crops, animals, and public health is important if mitigation or adaptation strategies are to be developed. Here the consequences of the increasing frequency and intensity of heat waves are considered. First, four decades of temperature data are used to identify heat waves for the six National Weather Service regions within Florida. During these forty years, each temperature monitor has some days for which no data were recorded. The presence of missing data has largely been ignored in this setting, and analyses have been conducted based on observed data. Alternatively, time series models, spatial models, or space-time models could be used to impute the missing data. Here the effects of the treatment of missing data on the identification of heat waves, and on the subsequent inference related to the impact of heat waves on public health, are explored.
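The abstract does not spell out the operational definition of a heat wave; a common convention (assumed here, not necessarily the authors') flags runs of at least three consecutive days with maximum temperature above a high threshold. A minimal sketch of that run-detection step:

```python
import numpy as np

def heat_waves(tmax, threshold, min_len=3):
    """Return (start, end) index pairs (end inclusive) of runs of at least
    `min_len` consecutive days whose maximum temperature exceeds `threshold`."""
    hot = np.asarray(tmax) > threshold
    runs, start = [], None
    for i, h in enumerate(hot):
        if h and start is None:
            start = i                       # a hot spell begins
        elif not h and start is not None:
            if i - start >= min_len:        # spell long enough to count
                runs.append((start, i - 1))
            start = None
    if start is not None and len(hot) - start >= min_len:
        runs.append((start, len(hot) - 1))  # spell running to the end of record
    return runs

print(heat_waves([90, 96, 97, 98, 91, 99, 99], threshold=95))  # → [(1, 3)]
```

In the authors' setting, days with missing records would first have to be imputed (or otherwise handled) before this scan, which is precisely the sensitivity the abstract examines.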

Statistical Approaches for Calibration of Climate Models
Gabriel Huerta1, Charles Jackson2 and Alvaro Nosedal1
1University of New Mexico
2University of Texas at Austin
ghuerta@stat.unm.edu
We consider some recent developments for dealing with climate models that rely on various modern computational and statistical strategies. First, we consider various posterior sampling strategies to study a surrogate model that approximates a climate response through the Earth's orbital parameters. In particular, we show that for certain metrics of model skill, adaptive/delayed-rejection MCMC methods are effective for estimating parametric uncertainties and resolving inverse problems for climate models. We will also discuss some of the high-performance computing efforts that are taking place to calibrate various inputs to the NCAR Community Atmosphere Model (CAM). Finally, we show how to characterize output from a regional climate model through hierarchical modelling that combines Gauss-Markov random fields (GMRF) with MCMC methods and allows estimation of the probability distributions that underlie phenomena represented by the climate output.

Session 43 Recent Developments in Survival Analysis and Statistical Genetics

Restricted Survival Time and Non-proportional Hazards
Zhigang Zhang
Memorial Sloan Kettering Cancer Center
zhangz@mskcc.org
In this talk I will present some recent developments on restricted survival time and its usage, especially when the proportional hazards assumption is violated. Technical advances and numerical studies will both be discussed.

Empirical Null using Mixture Distributions and Its Application in Local False Discovery Rate
DoHwan Park
University of Maryland
dhpark@umbc.edu
Given high-dimensional data, it is often of interest to distinguish the significant (non-null, Ha) group from the non-significant (null, H0) group in a mixture of the two, while controlling the type I error rate. One popular way to control the level is the false discovery rate (FDR). This talk considers a method based on the local false discovery rate. In most previous studies, the null group is commonly assumed to follow a normal distribution. However, if the null distribution departs from normal, there may be too many or too few false discoveries (cases that belong to the null but are rejected by the test), leading to failure to control the given level of FDR. We propose a novel approach that enriches the class of null distributions through mixture distributions. We provide real examples of gene expression data, fMRI data, and protein domain data to illustrate the problems and give an overview.
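For context, the standard Efron-style local false discovery rate compares a theoretical N(0,1) null density against an estimate of the marginal density of the z-scores; the abstract's point is that this null can be replaced by a mixture when normality fails. A minimal sketch of the normal-null baseline on synthetic z-scores (the null proportion is taken as known here for simplicity):

```python
import numpy as np
from scipy.stats import norm, gaussian_kde

rng = np.random.default_rng(2)
z = np.concatenate([rng.normal(0.0, 1.0, 4500),    # null z-scores
                    rng.normal(3.0, 1.0, 500)])    # non-null z-scores

p0 = 0.9                                   # null proportion, taken as known here
f = gaussian_kde(z)                        # estimate of the marginal density f(z)
lfdr = np.clip(p0 * norm.pdf(z) / f(z), 0.0, 1.0)  # local fdr = p0 f0(z) / f(z)
```

Scores with small lfdr (say, below 0.2) are declared non-null. If the true null were not N(0,1), these lfdr values would be miscalibrated, which is the failure mode the proposed mixture-based empirical null addresses.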

A Bayesian Illness-Death Model for the Analysis of Correlated Semi-Competing Risks Data
Kyu Ha Lee1, Sebastien Haneuse1, Deborah Schrag2 and Francesca Dominici1
1Harvard University
2Dana-Farber Cancer Institute
klee@hsph.harvard.edu
Readmission rates are a major target of healthcare policy because readmission is common, costly, and potentially avoidable, and hence is seen as an adverse outcome. The Centers for Medicare and Medicaid Services therefore currently uses 30-day readmission as a proxy outcome for quality of care for a number of health conditions. However, focusing solely on readmission rates in conditions with poor prognosis, such as pancreatic cancer, oversimplifies a situation in which patients may die before being readmitted, which clearly is also an adverse outcome. In such situations, healthcare policy should consider both readmission and death rates simultaneously. To this end, our proposed Bayesian framework adopts an illness-death model to represent three transitions for pancreatic cancer patients recently discharged from an initial hospitalization: (1) discharge to readmission; (2) discharge to death; and (3) readmission to death. Dependence between the two event times (readmission and death) is induced via a subject-specific shared frailty. Our proposed method further extends the model to situations where patients within a hospital may be correlated due to unobserved characteristics. We illustrate the practical utility of our proposed method using data from Medicare Part A on 100% of Medicare enrollees from 01/2000 to 12/2010.

Detection of Chromosome Copy Number Variations in Multiple Sequences
Xiaoyi Min, Chi Song and Heping Zhang
Yale University
xiaoyimin@yale.edu
DNA copy number variation (CNV) is a form of genomic structural variation that may affect human diseases. Identifying the CNVs shared by many people in the population, as well as determining the carriers of these CNVs, is essential for understanding the role of CNV in disease association studies. For detecting CNVs in single samples, a Screening and Ranking Algorithm (SaRa) was previously proposed, which was shown to be superior to other commonly used algorithms and to have a sure coverage property. We extend SaRa to address the problem of common CNV detection in multiple samples. In particular, we propose an adaptive Fisher's method for combining the screening statistics across samples. The proposed multi-sample SaRa method inherits the computational and practical benefits of single-sample SaRa in CNV detection. We also characterize the theoretical properties of this method and demonstrate its performance in extensive numerical analyses.

Session 44 Bayesian Methods and Applications in Clinical Trials with Small Population

Applications of Bayesian Meta-Analytic Approach at Novartis
Qiuling Ally He, Roland Fisch and David Ohlssen
Novartis Pharmaceuticals Corporation
allyhe@novartis.com
Conducting an ethical, efficient, and cost-effective clinical trial has always been challenged by the limited availability of study populations. Bayesian approaches have many appealing features for studies with small sample sizes, and their importance has been recognized by health authorities. Novartis has been actively developing and implementing Bayesian methods at different stages of clinical development in both oncology and non-oncology settings. This presentation focuses on two applications of the Bayesian meta-analytic approach. Both applications explore relevant historical studies and establish meta-analyses to generate inferences that can be utilized by the concurrent studies. The first example synthesized historical control information in a proof-of-concept study; the second application extrapolated efficacy from a source to a target population for registration purposes. In both applications, Bayesian methods are shown to effectively reduce the sample size and duration of the studies, and consequently the resources invested.

Subgroup-Based Adaptive (SUBA) Designs for Multi-Arm Biomarker Trials
Yanxun Xu1, Lorenzo Trippa2, Peter Mueller1 and Yuan Ji3
1University of Texas at Austin
2Harvard University
3University of Texas at Austin
yxustat@gmail.com
Targeted therapies based on biomarker profiling are becoming a mainstream direction of cancer research and treatment. Depending on the expression of specific prognostic biomarkers, targeted therapies assign different cancer drugs to subgroups of patients, even if they are diagnosed with the same type of cancer by traditional means such as tumor location. For example, Herceptin is indicated only for the subgroup of patients with HER2+ breast cancer, but not for other types of breast cancer. However, subgroups like HER2+ breast cancer with effective targeted therapies are rare, and most cancer drugs are still being applied to large patient populations that include many patients who might not respond or benefit. Also, the response to targeted agents in humans is usually unpredictable. To address these issues, we propose SUBA, subgroup-based adaptive designs that simultaneously search for prognostic subgroups and allocate patients adaptively to the best subgroup-specific treatments throughout the course of the trial. The main features of SUBA include the continuous reclassification of patient subgroups based on a random partition model and the adaptive allocation of patients to the best treatment arm based on posterior predictive probabilities. We compare the SUBA design with three alternative designs, including equal randomization, outcome-adaptive randomization, and a design based on a probit regression. In simulation studies we find that SUBA compares favorably against the alternatives.

Innovative Designs and Practical Considerations for Pediatric Studies
Alan Y. Chiang
Eli Lilly and Company
chiangay@lilly.com
Despite representing a fundamental step toward the efficacious and safe use of drugs, the conduct of clinical trials in children poses several problems. Methodological issues and ethical concerns represent the major obstacles that have traditionally limited research in small populations. The randomized controlled trial, the mainstay of clinical studies for assessing the effects of any therapeutic intervention, shows some weaknesses that make it scarcely applicable to small populations. Alternative and innovative approaches to clinical trial design in small populations have been developed in recent decades with the aim of overcoming the limits related to small samples and to the acceptability of the trial. These features make them particularly appealing for the pediatric population and for patients with rare diseases. This presentation aims to provide a variety of designs and analysis methods for assessing efficacy and safety in pediatric studies, including their applicability, advantages, disadvantages, and real case examples. Approaches include Bayesian designs, borrowing information from other studies, and other innovative methods. Thanks to these features, such methods may rationally limit the amount of experimentation in small populations to what is achievable, necessary, and ethical, and present a reliable way of ultimately improving patient care.

Session 45 Recent Developments in Assessing Predictive Models in Survival Analysis

partDSA for Deriving Survival Risk Groups, Ensemble Learning, and Variable Selection
Annette Molinaro1, Adam Olshen1 and Robert Strawderman2
1University of California at San Francisco
2University of Rochester
molinaroa@neurosurg.ucsf.edu
We recently developed partDSA, a multivariate method that, similarly to CART, utilizes loss functions to select and partition predictor variables to build a tree-like regression model for a given outcome. However, unlike CART, partDSA permits both 'and' and 'or' conjunctions of predictors, elucidating interactions between variables as well as their independent contributions. partDSA thus permits tremendous flexibility in the construction of predictive models and has been shown to supersede CART in both prediction accuracy and stability. As the resulting models continue to take the form of a decision tree, partDSA also provides an ideal foundation for developing a clinician-friendly tool for accurate risk prediction and stratification.
With right-censored outcomes, partDSA currently builds estimators via either the Inverse Probability of Censoring Weighted (IPCW) or Brier Score weighting schemes; see Lostritto, Strawderman and Molinaro (2012), where numerous simulations show that both proposed adaptations of partDSA perform as well as, and often considerably better than, two competing tree-based methods. In this talk, various useful extensions of partDSA for right-censored outcomes are described, and we show the power of the partDSA algorithm in deriving survival risk groups for glioma patients based on genomic markers. Another interesting extension of partDSA is as an aggregate learner. A comparison will be made of standard partDSA to an ensemble version of partDSA, as well as to alternative ensemble learners, in terms of prediction accuracy and variable selection.

Predictive Accuracy of Time-Dependent Markers for Survival Outcomes
Li Chen1, Donglin Zeng2 and Danyu Lin2
1University of Kentucky
2University of North Carolina at Chapel Hill
lichenuky@uky.edu
In clinical cohort studies, potentially censored times to a certain event, such as death or disease progression, and patient characteristics at the time of diagnosis or the time of inclusion in the study (baseline) are often recorded. Serial measurements of clinical markers during follow-up may also be recorded for monitoring purposes. Recently there has been increasing interest in incorporating these serial marker measurements into the prediction of future survival outcomes and in assessing the predictive accuracy of these time-dependent markers. In this paper, we propose a new graphical measure, the negative predictive function, to quantify the predictive accuracy of time-dependent markers for survival outcomes. This new measure relates directly to patient survival probabilities and thus has direct clinical utility. We construct a nonparametric estimator for the proposed measure, allowing censoring to depend on the markers, and adopt the bootstrap method to obtain the asymptotic variances. Simulation studies demonstrate that the proposed method performs well in practical situations. One medical study is presented.

Estimating the Effectiveness in HIV Prevention Trials by Incorporating the Exposure Process: Application to HPTN 035 Data
Jingyang Zhang1 and Elizabeth R. Brown2
1Fred Hutchinson Cancer Research Center
2Fred Hutchinson Cancer Research Center/University of Washington
jzhang2@fhcrc.org
Estimating the effectiveness of a new intervention is usually the primary objective of HIV prevention trials. The Cox proportional hazards model is mainly used to estimate effectiveness, assuming that participants share the same risk under the covariates and that the risk is always non-zero. In fact, the risk is non-zero only when an exposure event occurs, and participants can have varying risks of transmission due to varying patterns of exposure events. Therefore, we propose a novel estimate of effectiveness adjusted for the heterogeneity in the magnitude of exposure among the study population, using a latent Poisson process model for the exposure path of each participant. Moreover, our model considers the scenario in which a proportion of participants never experience an exposure event and adopts a zero-inflated distribution for the rate of the exposure process. We employ a Bayesian estimation approach to estimate the exposure-adjusted effectiveness, eliciting the priors from historical information. Simulation studies are carried out to validate the approach and explore the properties of the estimates. An application example is presented from an HIV prevention trial.
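The zero-inflated exposure mechanism described above can be illustrated with a small simulation; this is an illustrative sketch only (the parameter values and the simple Poisson-count summary are assumptions, not the authors' latent-process model or the HPTN 035 settings):

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_exposures(n, p_never, rate, followup):
    """Zero-inflated exposure counts: with probability p_never a participant
    belongs to a never-exposed class; otherwise the number of exposure events
    over the follow-up period is Poisson(rate * followup)."""
    never = rng.random(n) < p_never          # structural zeros
    counts = rng.poisson(rate * followup, size=n)
    counts[never] = 0
    return counts

# illustrative values: 20% never exposed, 0.5 exposures/month, 12 months
counts = simulate_exposures(n=10000, p_never=0.2, rate=0.5, followup=12.0)
```

Under these settings roughly 20% of participants record zero exposures from the structural class, plus a small extra contribution from Poisson zeros; an effectiveness estimate that ignores this heterogeneity treats all of them as equally at risk.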

Estimation of Predictive Accuracy of Survival Regression Models Adjusting for Dependent Censoring and/or High-Dimensional Data
Ming Wang1 and Qi Long2
1Penn State College of Medicine
2Emory University
mwang@phs.psu.edu
In practice, prediction models for cancer risk and prognosis play an important role in priority cancer research, and evaluating and comparing different models using predictive accuracy metrics in the presence of censored data, with adjustment for the censoring mechanism, is of substantive interest. To address this issue, we evaluate via numerical studies two existing metrics, the concordance (c) statistic and the weighted c-statistic, which adopts an inverse-probability weighting technique for circumstances with a dependent censoring mechanism. The asymptotic properties of the weighted c-statistic, including consistency and normality, are theoretically and rigorously established. In particular, cases with high-dimensional prognostic factors (p moderately large) are investigated to assess strategies for estimating the censoring weights using a regularization approach with the lasso penalty. In addition, sensitivity analysis is theoretically and practically conducted to assess predictive accuracy in cases of informative censoring (i.e., not coarsened at random), using nonparametric estimates of the cumulative baseline hazard for the weights. Finally, a prostate cancer study is used to build and evaluate prediction models of tumor recurrence after surgery.
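As background for the weighted c-statistic discussed above, here is a minimal numpy sketch of an inverse-probability-of-censoring-weighted concordance in the spirit of Uno et al. (2011); the function names and toy data are illustrative, and this is a simplified sketch rather than the authors' implementation:

```python
import numpy as np

def censoring_km(time, event):
    """Kaplan-Meier estimate of the censoring survival curve G(t).
    'Events' for this curve are the censored observations (event == 0)."""
    order = np.argsort(time)
    t, cens = np.asarray(time)[order], 1 - np.asarray(event)[order]
    n = len(t)
    surv = np.cumprod(1.0 - cens / (n - np.arange(n)))
    def G(s):                              # evaluate G just before time s
        k = np.searchsorted(t, s, side="left")
        return 1.0 if k == 0 else surv[k - 1]
    return G

def ipcw_cstat(time, event, risk, tau):
    """IPCW concordance: among pairs (i, j) with T_i < T_j, T_i < tau, and
    subject i uncensored, weight each pair by 1 / G(T_i)^2 and count the
    pair concordant when the higher-risk subject fails first."""
    time, event, risk = map(np.asarray, (time, event, risk))
    G = censoring_km(time, event)
    num = den = 0.0
    for i in range(len(time)):
        if event[i] == 1 and time[i] < tau:
            w = 1.0 / G(time[i]) ** 2      # inverse-probability weight
            for j in range(len(time)):
                if time[j] > time[i]:
                    den += w
                    num += w * float(risk[i] > risk[j])
    return num / den
```

With no censoring, all weights equal 1 and this reduces to Harrell's c restricted to pairs whose earlier event time falls before tau.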

Session 46 Missing Data: the Interface between Survey Sampling and Biostatistics

Likelihood-based Inference with Missing Data Under Missing-at-random
Shu Yang and Jae Kwang Kim
Iowa State University
shuyang@iastate.edu
Likelihood-based inference with missing data is challenging because the observed log-likelihood has an integral form. Approximating the integral by Monte Carlo sampling does not necessarily lead to valid inference, because the Monte Carlo samples are generated from a distribution with a fixed parameter value. We consider an alternative approach based on the parametric fractional imputation of Kim (2011). In the proposed method, the dependence of the integral on the parameter is properly reflected through fractional weights. We discuss constructing a confidence interval using the profile likelihood ratio test; a Newton-Raphson algorithm is employed to find the interval endpoints. Two limited simulation studies show the advantage of likelihood-based inference over Wald-type inference in terms of power, parameter-space conformity, and computational efficiency. A real data example on salamander mating (McCullagh and Nelder, 1989) shows that our method also works well with high-dimensional missing data.

Generalized Method of Moments Estimator Based on Semiparametric Quantile Regression Imputation
Cindy Yu and Senniang Chen
Iowa State University
snchen@iastate.edu
In this article, we consider an imputation method for handling missing response values based on semiparametric quantile regression estimation. In the proposed method, the missing response values are generated using the estimated conditional quantile regression function at given values of the covariates. We adopt the generalized method of moments for estimation of parameters defined through a general estimating equation. We demonstrate that the proposed estimator, which combines semiparametric quantile regression imputation and the generalized method of moments, is an effective alternative for parameter estimation when missing data are present. The consistency and asymptotic normality of our estimators are established, and variance estimation is provided. Results from limited simulation studies are presented to show the adequacy of the proposed method.

A New Estimation with Minimum Trace of Asymptotic Covariance Matrix for Incomplete Longitudinal Data with a Surrogate Process
Baojiang Chen1 and Jing Qin2
1University of Nebraska
2National Institutes of Health
baojiangchen@unmc.edu
Missing data is a very common problem in medical and social studies, especially when data are collected longitudinally. It is a challenging problem to utilize observed data effectively, and many papers on missing data problems can be found in the statistical literature. It is well known that the inverse-probability-weighted estimator is neither efficient nor robust; on the other hand, the doubly robust (DR) method can improve both efficiency and robustness. As is known, doubly robust estimation requires a missing data model (i.e., a model for the probability that data are observed) and a working regression model (i.e., a model for the outcome variable given covariates and surrogate variables). Since the DR estimating function has mean zero for any parameters in the working regression model when the missing data model is correctly specified, in this paper we derive a formula for the estimator of the parameters of the working regression model that yields the optimally efficient estimator of the marginal mean model (the parameters of interest) when the missing data model is correctly specified. Furthermore, the proposed method also inherits the doubly robust property. Simulation studies demonstrate the greater efficiency of the proposed method compared to the standard doubly robust method. A longitudinal dementia data set is used for illustration.

Adaptive Multi-Phase Sampling for Asymptotically-Optimal Mean Score Analyses
Michael McIsaac1 and Richard Cook2
1Queen's University
2University of Waterloo
mcisaacm@queensu.ca
Response-dependent two-phase designs can ensure good statistical efficiency while working within resource constraints. Sampling schemes that are optimized for analyses based on mean score estimating equations have been shown to be highly efficient in a number of different settings and are straightforward to implement if detailed population characteristics are known. I will present an adaptive multi-phase design which exploits information from an internal pilot study to approximate this optimal mean score design. These adaptive designs are easy to implement and result in large efficiency gains while keeping study costs low. The implementation of this design will be demonstrated using simulation studies motivated by an ongoing research program in rheumatology.

Session 47 New Statistical Methods for Comparative Effectiveness Research and Personalized Medicine

Efficient Design for Prospective Observational Studies
Yu Shen1, Hao Liu2, Jing Ning3 and Jing Qin4
1University of Texas MD Anderson Cancer Center
2Baylor College of Medicine
3University of Texas MD Anderson Cancer Center
4National Institutes of Health
yshen@mdanderson.org
Using data from large observational studies may fill information gaps due to the lack of evidence from randomized controlled trials. Such studies may inform real-world clinical scenarios and improve clinical decisions among various treatment strategies. However, the design and analysis of comparative effectiveness studies based on observational data are complex. In this work, we propose practical sample size and power calculation tools for prevalent cohort designs and suggest efficient analysis methods as well.

2014 ICSA-KISS Applied Statistics Symposium, Portland, June 15-18

Abstracts

Choice between Superiority and Non-inferiority in Comparative Effectiveness Experiments
Mei-Chiung Shih1, Tze Leung Lai2 and Philip W. Lavori2
1VA Cooperative Studies Program & Stanford University
2Stanford University
Mei-Chiung.Shih@va.gov
In designing a comparative effectiveness experiment, such as an active controlled clinical trial comparing a new treatment to an active control treatment, or a comparative effectiveness trial comparing treatments already in use, one sometimes has to choose between a superiority objective (to demonstrate that one treatment is more effective than the other active treatments) and a non-inferiority objective (to demonstrate that one treatment is no worse than the other active treatments within a pre-specified non-inferiority margin). It is often difficult to decide which study objective should be undertaken at the planning stage, when one does not have actual data on the comparative effectiveness of the treatments. In this talk we describe two adaptive design features for such trials: (1) adaptive choice of superiority and non-inferiority objectives during interim analyses; (2) treatment selection instead of testing superiority. The latter aims to select treatments whose outcomes are close to that of the best treatment, by eliminating at interim analyses non-promising treatments that are unlikely to be much better than the observed best treatment.

An Adaptive Design Approach for Studying Dynamic Treatment Regimes in a Pragmatic Trials Setting
Mike Baiocchi, Jane Paik and Tze Lai
Stanford University
mikebaiocchi@gmail.com
The demand for rigorous studies of dynamic treatment regimens is increasing as medical providers treat larger numbers of patients with both multi-stage disease states and chronic care issues (for example, cancer treatments, pain management, depression, HIV). In this talk we will propose a trial design developed specifically to be run in a real-world clinical setting. These kinds of trials (sometimes called "pragmatic trials") have several advantages, which we will discuss. They also pose two major problems for analysis: (1) in running a randomized trial in a clinical setting there is an ethical imperative to provide patients with the best outcomes while still collecting information on the relative efficacy of treatment regimes, which means traditional trial designs are inadequate in providing guidance; and (2) real-world considerations such as informative censoring or missing data become substantial hurdles. We incorporate elements from both point-of-care randomized trials and multi-armed bandit theory and propose a unified method of trial design.

Improving Efficiency in the Estimation of the Effect of Treatment on a Multinomial Outcome
Iván Díaz, Michael Rosenblum and Elizabeth Colantuoni
Johns Hopkins University
idiaz@jhu.edu
We present a methodology to evaluate the causal effect of a binary treatment on a multinomial outcome when adjustment for covariates is desirable. Adjustment for baseline covariates may be desirable even in randomized trials, since covariates that are highly predictive of the outcome can substantially improve efficiency. We first present a targeted minimum loss based estimator of the vector of counterfactual probabilities. This estimator is doubly robust in observational studies, and it is consistent in randomized trials. Furthermore, it is locally semiparametric efficient under regularity conditions. We present a variation of the previous estimator that may be used in randomized trials and that is guaranteed to be asymptotically as efficient as the standard unadjusted estimator. We use the previous results to derive a nonparametric extension of the parameters in a proportional-odds model for ordinal-valued data and present a targeted minimum loss based estimator. This estimator is guaranteed to be asymptotically as or more efficient than the unadjusted estimator of the proportional-odds model. As a consequence, this nonparametric extension may be used to test the null hypothesis of no effect with potentially increased power. We present a motivating example and simulations using data from the MISTIE II clinical trial of a new surgical intervention for stroke. Joint work with Michael Rosenblum and Elizabeth Colantuoni.

Session 48 Student Award Session 1

Regularization After Retention in Ultrahigh Dimensional Linear Regression Models
Haolei Weng1, Yang Feng1 and Xingye Qiao2
1Columbia University
2Binghamton University
hw2375@columbia.edu
Lasso has proved to be a computationally tractable variable selection approach in high dimensional data analysis. However, in the ultrahigh dimensional setting, the conditions for model selection consistency can easily fail. The independence screening framework tackles this problem by reducing the dimensionality based on marginal correlations before performing lasso. In this paper, we propose a two-step approach that relaxes the consistency conditions of lasso by using marginal information from a different perspective than independence screening. In particular, we retain significant variables rather than screening out irrelevant ones. The new method is shown to be model selection consistent in the ultrahigh dimensional linear regression model. A modified version is introduced to improve the finite sample performance. Simulations and real data analysis show the advantages of our method over lasso and independence screening.
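As a rough illustration of the retain-then-regularize idea (a sketch under assumptions, not the paper's procedure — the retention rule, tuning values, and function names below are invented for illustration), one can force the variables with the largest marginal correlations into the model by leaving them unpenalized in a lasso fit:

```python
import numpy as np

def lasso_cd(X, y, lam, penalize, n_iter=200):
    """Cyclic coordinate descent for lasso, with a per-coefficient
    penalty mask so retained variables are never shrunk."""
    n, p = X.shape
    beta = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            # partial residual excluding coordinate j
            r_j = y - X @ beta + X[:, j] * beta[j]
            z = X[:, j] @ r_j
            if penalize[j]:
                # soft-thresholding update for penalized coordinates
                beta[j] = np.sign(z) * max(abs(z) - n * lam, 0.0) / col_ss[j]
            else:
                beta[j] = z / col_ss[j]  # retained variable: no shrinkage
    return beta

def regularization_after_retention(X, y, k, lam):
    """Step 1: retain the k variables with largest absolute marginal
    correlation (proxied by centered inner products).
    Step 2: lasso with the retained set left unpenalized."""
    score = np.abs((X - X.mean(0)).T @ (y - y.mean()))
    retained = np.argsort(score)[-k:]
    penalize = np.ones(X.shape[1], dtype=bool)
    penalize[retained] = False
    return retained, lasso_cd(X, y, lam, penalize)
```

The contrast with independence screening is visible in the second function: instead of discarding low-scoring variables, it only protects the high-scoring ones from the penalty.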

Personalized Dose Finding Using Outcome Weighted Learning
Guanhua Chen1, Donglin Zeng1 and Michael R. Kosorok1
1University of North Carolina at Chapel Hill
guanhuac@live.unc.edu
In dose-finding clinical trials, there is growing recognition of the importance of considering individual-level heterogeneity when searching for optimal treatment doses. Such an optimal individualized treatment rule (ITR) for dosing should maximize the expected clinical benefit. In this paper, we consider a randomized trial design where the candidate dose levels are continuous. To find the optimal ITR under such a design, we propose an outcome weighted learning method which directly maximizes the expected clinically beneficial outcome. This method converts the individualized dose selection problem into a penalized weighted regression with a truncated ℓ1 loss. A difference of convex functions (DC) algorithm is adopted to efficiently solve the associated non-convex optimization problem. The consistency and convergence rate for the estimated ITR are derived, and small-sample performance is evaluated via simulation studies. We demonstrate that the proposed method outperforms competing approaches. We illustrate the method using data from a clinical trial for Warfarin (an anti-thrombotic drug) dosing.


Survival Rates Prediction When Training Data and Target Data Have Different Measurement Error
Cheng Zheng and Yingye Zheng
Fred Hutchinson Cancer Research Center
zhengc68@uw.edu
Novel biologic markers have been widely used in predicting important clinical outcomes. One specific feature of biomarkers is that they often are ascertained with variation due to the specific process of measurement. The magnitude of such variation may differ when the prediction algorithm (cutoffs) is applied to a different targeted population, or when the platform for biomarker assaying changes from the original platform the algorithm was based upon. Statistical methods have been proposed to characterize the effects of the underlying error-free quantity in association with an outcome, yet the impact of measurement errors on prediction has not been well studied. We focus in this manuscript on settings in which biomarkers are used for predicting an individual's future risk, and propose semiparametric estimators for error-corrected risk when replicates of the error-prone biomarkers are available. The predictive performance of the proposed estimators is evaluated and compared to alternative approaches in numerical studies, under various assumptions on the measurement distributions in the original cohort and in the future cohort to which the predictive rule is applied. We also study the asymptotic properties of the proposed estimator. Application is made to a liver cancer biomarker study to predict the risk of liver cancer incidence at 3 and 4 years using age and a novel biomarker, α-fetoprotein.

Hard Thresholded Regression Via Linear Programming
Qiang Sun
University of North Carolina at Chapel Hill
qsun@live.unc.edu
The aim of this paper is to develop a hard thresholded regression (HTR) framework for simultaneous variable selection and unbiased estimation in high dimensional linear regression. This new framework is motivated by its close connection with best subset selection under orthogonal design, while enjoying several key computational and theoretical advantages over many existing penalization methods (e.g., SCAD or MCP). Computationally, HTR is a fast two-step estimation procedure, consisting of a first step that calculates a coarse initial estimator and a second step that solves a linear program. Theoretically, under some mild conditions, the HTR estimator is shown to enjoy the strong oracle property and the thresholded property even when the number of covariates grows at an exponential rate. We also propose to incorporate a regularized covariance estimator into the estimation procedure in order to better trade off noise accumulation against correlation modeling. In this scenario with a regularized covariance matrix, HTR includes Sure Independence Screening as a special case. Both simulation and real data results show that HTR outperforms other state-of-the-art methods.
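The two-step shape of such procedures — a coarse initial estimate followed by hard thresholding — can be sketched as follows. This is a simplified stand-in, not the HTR estimator: it refits least squares on the surviving set instead of solving the paper's linear program, and the ridge initializer, threshold value, and function name are assumptions.

```python
import numpy as np

def two_step_hard_threshold(X, y, tau, ridge=1.0):
    """Step 1: coarse ridge initial estimate.
    Step 2: hard-threshold at tau, then refit ordinary least squares on
    the surviving variables, so kept coefficients are not shrunk."""
    n, p = X.shape
    beta_init = np.linalg.solve(X.T @ X + ridge * np.eye(p), X.T @ y)
    support = np.flatnonzero(np.abs(beta_init) > tau)
    beta = np.zeros(p)
    if support.size:
        beta[support] = np.linalg.lstsq(X[:, support], y, rcond=None)[0]
    return beta, support
```

The refit step is what delivers (approximately) unbiased estimates on the selected support, in contrast to soft-thresholding penalties, which leave shrinkage bias on the retained coefficients.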

Session 49 Network Analysis/Unsupervised Methods

Community Detection in Multilayer Networks: A Hypothesis Testing Approach
James D. Wilson, Shankar Bhamidi and Andrew B. Nobel
University of North Carolina at Chapel Hill
jameswd@email.unc.edu
The identification of clusters in relational data, otherwise known as community detection, is an important and well-studied problem in undirected and directed networks. Importantly, the units of a complex system often share multiple types of pairwise relationships, wherein a single community detection analysis does not account for the unique types or layers. In this scenario, a sequence of networks can be used to model each type of relationship, resulting in a multilayer network structure. We propose and investigate a novel testing-based community detection procedure for multilayer networks. We show that by borrowing strength across layers, our method is able to detect communities in scenarios that are impossible for contemporary detection methods. By investigating the performance and potential use of our method through simulations and applications on real multilayer networks, we show that our procedure can successfully identify significant community structure in the multilayer regime.

Network Enrichment Analysis with Incomplete Network Information
Jing Ma1, Ali Shojaie2 and George Michailidis1
1University of Michigan
2University of Washington
mjing@umich.edu
Pathway enrichment analysis has become a key tool for biomedical researchers to gain insight into the underlying biology of differentially expressed genes, proteins and metabolites. It reduces complexity and provides a systems-level view of changes in cellular activity in response to treatments and/or progression of disease states. Methods that use pathway topology information have been shown to outperform simpler methods based on over-representation analysis. However, despite significant progress in understanding the associations among members of biological pathways, and the expansion of knowledge databases such as the Kyoto Encyclopedia of Genes and Genomes, Reactome, BioCarta, etc., the existing network information may be incomplete or inaccurate and is not condition-specific. We propose a constrained network estimation framework that combines network estimation based on cell- and condition-specific omics data with interaction information from existing databases. The resulting pathway topology information is subsequently used to provide a framework for simultaneous testing of differences in mean expression levels as well as interaction mechanisms. We study the asymptotic properties of the proposed network estimator and the test for pathway enrichment, and investigate its small sample performance in simulated experiments and in a bladder cancer study involving metabolomics data.

Estimation of a Linear Model with Fuzzy Data Treated as Special Functional Data
Wang Dabuxilatu
Guangzhou University
wangdabu@gzhu.edu.cn
Data which cannot be exactly described by means of numerical values — such as evaluations, medical diagnoses, quality ratings and vague economic items, to name but a few — are frequently classified as either nominal or ordinal. However, we should be aware that with such a representation of data (e.g., when the categories are labeled with numerical values) the statistical analysis is limited, and sometimes the interpretation and reliability of the conclusions are affected. An easy-to-use representation of such data through fuzzy values (fuzzy data) could be employed instead. The measurement scale of fuzzy values includes, in particular, real vectors and set values as special elements. It is more expressive than ordinal scales and more accurate than rounding or using real- or vectorial-valued codes. The transition between closely different values can be made gradually, and the variability, accuracy and possible subjectiveness can be well reflected in describing data. Fuzzy data can be viewed as special functional data via the so-called support function of the data, as it establishes a useful embedding of the space of fuzzy data into a cone of a functional Hilbert space.
Simple linear regression models with fuzzy data have been studied from different perspectives and in different frameworks; least squares estimates of real-valued and set-valued parameters under the generalized Hausdorff metric and the Hukuhara difference have been obtained. However, due to the nonlinearity of the space of fuzzy random sets, it is difficult to consider parameter estimation for a multivariate linear model with fuzzy random sets. We treat the fuzzy data as special functional data to estimate a multivariate linear model within a cone of a functional Hilbert space. As a case, we consider LR fuzzy random sets (LR fuzzy values or LR fuzzy data), a sort of fuzzy data used to model usual random experiments when the characteristic observed on each result can be described with fuzzy numbers of a particular class determined by three random variables — the center, the left spread and the right spread — under the given shape functions L and R. LR fuzzy random sets are widely applied in information science, decision making, operational research, and economic and financial modeling. Using a least squares approach, we obtain an estimate of the set-valued parameters of the multivariate regression model with LR fuzzy random sets under the L2 metric δ2dLS; some bootstrap distributions for the spread variables of the fuzzy random residual term are given.

Efficient Estimation of Sparse Directed Acyclic Graphs Under Compounded Poisson Data
Sung Won Han and Hua Zhong
New York University
sungwonhan2@gmail.com
Certain gene expression measurements, such as RNA-sequencing counts, are recorded as count data, which can be assumed to follow a compounded Poisson distribution. This presentation proposes an efficient heuristic algorithm to estimate the structure of directed acyclic graphs under the L1-penalized likelihood with Poisson log-normally distributed data, given that the variable ordering is unknown. To obtain a closed form of the penalized likelihood, we apply a Laplace integral approximation for the unobserved normal variables, and we use two iterative optimization steps to estimate the adjacency matrix and the unobserved parameters. The adjacency matrix is estimated by separable lasso problems, and the unobserved parameters of the normal distribution are estimated by separable optimization problems. Simulation results show that our proposed method performs better than the data transformation method in terms of true positives and Matthews correlation coefficient, except under low-count data with many zeros. Large data variance and a large number of variables benefit the proposed method even more.

Asymptotically Normal and Efficient Estimation of Covariate-Adjusted Gaussian Graphical Model
Mengjie Chen, Zhao Ren, Hongyu Zhao and Harrison Zhou
Yale University
zhao.ren@yale.edu
A tuning-free procedure is proposed to estimate the covariate-adjusted Gaussian graphical model. For each finite subgraph, this estimator is asymptotically normal and efficient; as a consequence, a confidence interval can be obtained for each edge. The procedure enjoys easy implementation and efficient computation through parallel estimation on subgraphs or edges. We further apply the asymptotic normality result to perform support recovery through edge-wise adaptive thresholding. This support recovery procedure is called ANTAC, standing for Asymptotically Normal estimation with Thresholding after Adjusting Covariates. ANTAC outperforms other methodologies in the literature in a range of simulation studies. We apply ANTAC to identify gene-gene interactions using a yeast eQTL (genome-wide expression quantitative trait loci) dataset. Our result achieves better interpretability and accuracy in comparison with CAPME (covariate-adjusted precision matrix estimation), the method proposed by Cai, Li, Liu and Xie (2013). This is joint work with Mengjie Chen, Hongyu Zhao and Harrison Zhou.

Session 50 Personalized Medicine and Adaptive Design

MicroRNA Array Normalization
Li-Xuan Qin and Qin Zhou
Memorial Sloan Kettering Cancer Center
qinl@mskcc.org
MicroRNA microarrays possess a number of unique data features that challenge the assumptions key to many normalization methods. We assessed the performance of existing normalization methods using two Agilent microRNA array datasets derived from the same set of tumor samples: one dataset was generated using a blocked randomization design when assigning arrays to samples, and hence was free of confounding array effects; the second dataset was generated without blocking or randomization and exhibited array effects. The randomized dataset was assessed for differential expression between two tumor groups and treated as the benchmark. The non-randomized dataset was assessed for differential expression after normalization and compared against the benchmark. Normalization improved the true positive rate significantly, but the non-randomized data still possessed a false discovery rate as high as 50%, regardless of the specific normalization method applied. We performed simulation studies under various scenarios of differential expression patterns to assess the generalizability of our empirical observations.

Combining Multiple Biomarker Models with Covariates in Logistic Regression Using a Modified ARM (Adaptive Regression by Mixing) Approach
Yanping Qiu1 and Rong Liu2
1Merck & Co.
2Bayer HealthCare
rongliufl@gmail.com
Biomarkers are widely used as indicators of some biological state or condition in medical research. A single biomarker may not be sufficient to serve as an optimal screening device for early detection or prognosis for many diseases, and a combination of multiple biomarkers will usually lead to more sensitive screening rules; therefore, there has been great interest in developing methods for combining biomarkers, and a biomarker selection procedure is necessary for efficient detection. In this article, we propose a model-combining algorithm for classification with necessary covariates in biomarker studies. It selects the best models according to some criterion and considers weighted combinations of various logistic regression models via ARM (adaptive regression by mixing). The weights and algorithm are justified using cross-validation methods. Simulation studies are performed to assess the finite-sample properties of the proposed model-combining method. It is illustrated with an application to data from a vaccine study.
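The core of ARM-style combining is to weight each candidate model by its predictive likelihood on held-out data. The following is a minimal sketch for binary outcomes, not the authors' modified algorithm — the function names are invented, the candidate predicted probabilities are assumed to come from already-fitted logistic models, and the data-splitting and model-selection steps are omitted.

```python
import numpy as np

def arm_weights(pred_probs, y_holdout):
    """Weight candidate classifiers proportionally to exp(held-out
    log-likelihood), i.e., to their predictive likelihood (ARM-style).
    pred_probs: list of arrays of predicted P(y=1) on the hold-out set."""
    loglik = np.array([
        np.sum(y_holdout * np.log(p) + (1 - y_holdout) * np.log(1 - p))
        for p in pred_probs
    ])
    w = np.exp(loglik - loglik.max())   # subtract max for numerical stability
    return w / w.sum()

def combine(pred_probs, weights):
    """Weighted mixture of the candidate predicted probabilities."""
    return np.tensordot(weights, np.asarray(pred_probs), axes=1)
```

Because the weights are exponential in the held-out log-likelihood, a clearly better-calibrated candidate quickly dominates the mixture, while near-ties are averaged.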

A New Association Test for Case-Control GWAS Based on Disease Allele Selection
Zhongxue Chen
Indiana University
zc3@indiana.edu
Current robust association tests for case-control genome-wide association study (GWAS) data are mainly based on the assumption of some specific genetic models. Due to the richness of the genetic models, this assumption may not be appropriate; therefore, robust but powerful association approaches are desirable. Here we propose a new approach to testing for association between genotype and phenotype in case-control GWAS. This method assumes a generalized genetic model and is based on the selected disease allele to obtain a p-value from the more powerful one-sided test. Through a comprehensive simulation study, we assess the performance of the new test by comparing it with existing methods, and some real data applications are used to illustrate its use. Based on the simulation results and real data applications, the proposed test is powerful and robust.
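To make the selection step concrete, here is a toy version — an assumption-laden sketch, not the proposed test: it uses a plain two-proportion z-test on allele counts, and its naive one-sided p-value ignores the fact that the disease allele was selected from the data, which the actual method must account for.

```python
import math
import numpy as np

def disease_allele_test(case_counts, control_counts):
    """Pick, as candidate disease allele, the one with higher frequency in
    cases; then run a one-sided two-proportion z-test on allele frequencies.
    Inputs are (count of allele A, count of allele a) over chromosomes."""
    ca = np.asarray(case_counts, dtype=float)
    co = np.asarray(control_counts, dtype=float)
    use_A = ca[0] / ca.sum() >= co[0] / co.sum()  # which allele is enriched in cases?
    x_case = ca[0] if use_A else ca[1]
    x_ctrl = co[0] if use_A else co[1]
    n_case, n_ctrl = ca.sum(), co.sum()
    p1, p2 = x_case / n_case, x_ctrl / n_ctrl
    pooled = (x_case + x_ctrl) / (n_case + n_ctrl)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_case + 1 / n_ctrl))
    z = (p1 - p2) / se
    pval = 0.5 * math.erfc(z / math.sqrt(2))      # upper-tail normal p-value
    return z, pval
```

By construction the statistic is invariant to which allele is labeled A, so the one-sided test always points in the direction of the data-selected disease allele.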

On Classification Methods for Personalized Medicine and Individualized Treatment Rules
Daniel Rubin
United States Food and Drug Administration
Daniel.Rubin@fda.hhs.gov
An important problem in personalized medicine is how to construct individualized treatment rules from clinical trials. Instead of recommending a single treatment for all patients, such a rule tailors treatments based on patient characteristics in order to optimize response to therapy. In a 2012 JASA article, Zhao et al. showed a connection between this problem of constructing an individualized treatment rule and binary classification. For instance, in a two-arm clinical trial with binary outcomes and 1:1 randomization, the problem of constructing an individualized treatment rule can be reduced to the classification problem in which one restricts to responders and builds a classifier that predicts subjects' treatment assignments. We extend this method to show an analogous reduction to the problem in which one restricts to non-responders and must build a classifier that predicts which treatments subjects were not assigned. We then use results from statistical efficiency theory to show how to efficiently combine the information from responders and non-responders. Simulations show the benefits of the new methodology.

Bayesian Adaptive Design for Dose-Finding Studies with Delayed Binary Responses
Xiaobi Huang1 and Haoda Fu2
1Merck & Co.
2Eli Lilly and Company
xiaobi.huang@merck.com
Bayesian adaptive design is a popular concept in recent dose-finding studies. The idea of adaptive design is to use accrued data to make adaptations or modifications to an ongoing trial to improve its efficiency. During the interim analysis, most current methods only use data from patients who have completed the study. However, in certain therapeutic areas such as diabetes and obesity, subjects are usually studied for months to observe a treatment effect; thus, a large proportion of them have not completed the study at the interim analysis. This can lead to extensive information loss if we only incorporate subjects who have completed the study at the interim analysis. Fu and Manner (2010) proposed a Bayesian integrated two-component prediction model to incorporate subjects who have not yet completed the study at the time of the interim analysis. This method showed an efficiency gain with continuous delayed responses. In this paper, we extend the method to accommodate delayed binary responses and illustrate the Bayesian adaptive design through a simulation example.

Session 51 New Development in Functional Data Analysis

Variable Selection and Estimation for Longitudinal Survey Data
Li Wang1, Suojin Wang2 and Guannan Wang1
1University of Georgia
2Texas A&M University
guannan@uga.edu
There is wide interest in studying longitudinal surveys, in which sample subjects are observed successively over time. Longitudinal surveys are used in many areas today — for example, in the health and social sciences — to explore relationships or to identify significant variables in regression settings. This paper develops a general strategy for the model selection problem in longitudinal sample surveys. A survey-weighted penalized estimating equation approach is proposed to select significant variables and estimate the coefficients simultaneously. The proposed estimators are design consistent and perform as well as the oracle procedure, for which the correct submodel is known. The estimating function bootstrap is applied to obtain the standard errors of the estimated parameters with good accuracy. A fast and efficient variable selection algorithm is developed to identify significant variables for complex longitudinal survey data. Simulated examples illustrate the usefulness of the proposed methodology under various model settings and sampling designs.

Estimation of Nonlinear Differential Equation Model Using Generalized Smoothing
Inna Chervoneva1, Tatiyana V. Apanasovich2 and Boris Freydin1
1Thomas Jefferson University
2George Washington University
apanasovich@gwu.edu
In this work we develop an ordinary differential equations (ODE) model of the physiological regulation of glycemia in type 1 diabetes mellitus (T1DM) patients in response to meals and intravenous insulin infusion. Unlike in the majority of existing mathematical models of glucose-insulin dynamics, the parameters in our model are estimable from a relatively small number of noisy observations of plasma glucose and insulin concentrations. For estimation, we adopt the generalized smoothing estimation of nonlinear dynamic systems of Ramsay et al. (2007). In this framework, the ODE solution is approximated with a penalized spline, where the ODE model is incorporated in the penalty. We propose to optimize the generalized smoothing by using penalty weights that minimize the covariance penalties criterion (Efron, 2004). The covariance penalties criterion provides an estimate of the prediction error for nonlinear estimation rules resulting from nonlinear and/or non-homogeneous ODE models, such as our model of glucose-insulin dynamics. We also propose to select the optimal number and location of knots for the B-spline bases used to represent the ODE solution. The results of a small simulation study demonstrate the advantages of optimized generalized smoothing in terms of smaller estimation errors for ODE parameters and smaller prediction errors for solutions of the differential equations. Using the proposed approach to analyze glucose and insulin concentration data in T1DM patients, we obtained a good approximation of global glucose-insulin dynamics and physiologically meaningful parameter estimates.

A Functional Data Approach to Modeling Brain Image Data
Yihong Zhao1, R. Todd Ogden2 and Huaihou Chen1
1New York University
2Columbia University
zhaoy05@nyumc.org
Resting-state functional magnetic resonance imaging (fMRI) is sensitive to functional brain changes related to many psychiatric disorders and thus has become increasingly important in medical research. One useful approach for fitting linear models with scalar outcomes and image predictors involves transforming the functional data to the wavelet domain, converting the data fitting problem into a variable selection problem. Applying the LASSO procedure in this situation has been shown to be efficient and powerful. In this study, we explore possible directions for improvement to this method. The finite sample performance of the proposed methods will be compared through simulations and real data applications in mental health research. We believe applying these procedures can lead to improved estimation and prediction as well as better stability. An illustration of modeling psychiatric traits based on brain-imaging data will be presented.

Estimation in Functional Linear Quantile Regression
Linglong Kong, Dengdeng Yu and Ivan Mizera
University of Alberta
lkong@ualberta.ca
We consider estimation in functional linear quantile regression, in which the dependent variable is scalar while the covariate is a function, and the conditional quantile for each fixed quantile index is modeled as a linear functional of the covariate. There are two common approaches for modeling the conditional mean as a linear functional of the covariate. One is to use the functional principal components of the covariates as a basis to represent the functional covariate effect; the other is to extend partial least squares to model the functional effect. The former is an unsupervised method and has been generalized to functional linear quantile regression; the latter is a supervised method and is superior to the unsupervised PCA method. In this talk, we propose to use partial quantile regression to estimate the functional effect in functional linear quantile regression. Asymptotic properties have been studied and show the virtue of our method in large samples. A simulation study is conducted to compare it with existing methods. A real data example from a stroke study is analyzed and some interesting findings are discovered.

Session 52 Recent Regulatory/Industry Experience in Biosimilar Trial Designs

Statistical Considerations for the Development of BiosimilarProductsNan Zhangand Eric ChiAmgen IncchiamgencomAs the patents of a growing number of biologic medicines have al-ready expired or are due to expire it has led to an increased interestfrom both the biopharmaceutical industry and the regulatory agen-cies in the development and approval of biosimilars EMA releasedthe first general guideline on similar biological medicinal productsin 2005 and specific guidelines for different drug classes subse-quently FDA issued three draft guidelines in 2012 on biosimilarproduct development A synthesized message from these guidancedocuments is that due to the fundamental differences between smallmolecule drug products and biologic drug products which are madeof living cells the generic versions of biologic drug products areviewed as similar instead of identical to the innovative biologicdrug product Thus more stringent requirement is necessary todemonstrate there are no clinically meaningful differences between

the biosimilar product and the reference product in terms of safety, purity, and potency. In this article we will briefly review statistical issues and challenges in the clinical development of biosimilars, including criteria for biosimilarity and interchangeability, selection of endpoints and determination of equivalence margins, equivalence vs. non-inferiority, bridging and regional effects, and how to quantify the totality of the evidence.

New Analytical Methods for Non-Inferiority Trials: Covariate Adjustment and Sensitivity Analysis
Zhiwei Zhang, Lei Nie, Guoxing Soon and Bo Zhang
United States Food and Drug Administration
zhiwei.zhang@fda.hhs.gov
Even though an active-controlled trial provides no information about placebo, investigators and regulators often wonder how the experimental treatment would compare to placebo, should a placebo arm be included in the study. Such an indirect comparison often requires a constancy assumption, namely that the control effect relative to placebo is constant across studies. When the constancy assumption is in doubt, there are ad hoc methods that "discount" the historical data in conservative ways. Recently, a covariate adjustment approach was proposed that does not require constancy or involve discounting, but rather attempts to adjust for any imbalances in covariates between the current and historical studies. This covariate-adjusted approach is valid under a conditional constancy assumption, which requires only that the control effect be constant within each subpopulation characterized by the observed covariates. Furthermore, a sensitivity analysis approach has been developed to address possible departures from the conditional constancy assumption due to imbalances in unmeasured covariates. This presentation describes these new approaches and illustrates them with examples.

Where is the Right Balance for Designing an Efficient Biosimilar Clinical Program? A Biostatistical Perspective on Appropriate Applications of Statistical Principles from New Drugs to Biosimilars
Yulan Li
Novartis Pharmaceuticals Corporation
yulan.li@novartis.com

Challenges of Designing and Analyzing Trials for Hepatitis C Drugs
Greg Soon
United States Food and Drug Administration
Guoxing.Soon@fda.hhs.gov
There has been a surge in drug development to treat hepatitis C virus (HCV) infection in the past 3-4 years, and the landscape has shifted significantly. In particular, response rates for HCV genotype 1 patients have steadily increased during this time, from approximately 50% to now 90%. While the changing landscape is beneficial for patients, it does lead to some new challenges for future HCV drug development, particularly in the choice of control, the criteria for efficacy, and the co-development of several drugs. In this talk I will summarize the current landscape of HCV drug development and describe some issues of interest.

GSK's Patient-level Data Sharing Program
Shuyen Ho
GlaxoSmithKline plc
shu-yen.y.ho@gsk.com
In May 2013, GSK launched an online system which enables researchers to request access to the anonymized patient-level data from published GSK-sponsored clinical trials of authorized or

2014 ICSA-KISS Applied Statistics Symposium, Portland, June 15-18

Abstracts

terminated medicines, Phase I-IV. Consistent with expectations of good scientific practice, researchers can request access and are required to provide a scientific protocol with a commitment to publish their findings. An Independent Review Panel is responsible for approving or denying access to the data after reviewing a researcher's proposal. Once the request is approved and a signed Data Sharing Agreement is received, access to the requested data is provided on a password-protected website to help protect research participants' privacy. This program is a step toward the ultimate aim of the clinical research community of developing a broader system where researchers will be able to access data from clinical trials conducted by different sponsors. This talk will describe some of the details of GSK's data-sharing program, including the opportunities and challenges it presents. We hope to raise awareness of this program among ICSA/KISS symposium participants and encourage researchers to take full advantage of it to further clinical research.

Session 53 Gatekeeping Procedures and Their Application in Pivotal Clinical Trials

A General Multistage Procedure for k-out-of-n Gatekeeping
Dong Xi1 and Ajit Tamhane2
1Novartis Pharmaceuticals Corporation
2Northwestern University
dong.xi@novartis.com
We generalize a multistage procedure for parallel gatekeeping to what we refer to as k-out-of-n gatekeeping, in which at least k out of n hypotheses in a gatekeeper family must be rejected in order to test the hypotheses in the following family. This gatekeeping restriction arises in certain types of clinical trials; for example, in rheumatoid arthritis trials it is required that efficacy be shown on at least three of the four primary endpoints. We provide a unified theory of multistage procedures for arbitrary k, with k = 1 corresponding to parallel gatekeeping and k = n to serial gatekeeping. The proposed procedure is simpler to apply for this particular problem, using a stepwise algorithm, than the mixture procedure and the graphical procedure with memory using entangled graphs.
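The k-out-of-n gate itself can be sketched in a few lines. This is only an illustration of the restriction (testing the secondary family only when at least k primary hypotheses are rejected), using plain Bonferroni within each family; it is not the full multistage procedure of the talk, which also carries unused error rate forward. All p-values are hypothetical.

```python
# Illustrative k-out-of-n gatekeeping rule (simplified, Bonferroni
# within families); not the Xi & Tamhane multistage procedure itself.

def bonferroni_rejections(pvals, alpha):
    """Indices rejected by a Bonferroni test at level alpha."""
    m = len(pvals)
    return [i for i, p in enumerate(pvals) if p <= alpha / m]

def k_out_of_n_gatekeeper(p_primary, p_secondary, k, alpha=0.05):
    """Test the secondary family only when >= k primary rejections occur."""
    rej1 = bonferroni_rejections(p_primary, alpha)
    if len(rej1) < k:              # gate closed: secondary family untested
        return rej1, []
    return rej1, bonferroni_rejections(p_secondary, alpha)

# Rheumatoid-arthritis-style rule: 3 of 4 primary endpoints required
p1 = [0.001, 0.004, 0.010, 0.200]    # hypothetical primary p-values
p2 = [0.020, 0.030]                  # hypothetical secondary p-values
print(k_out_of_n_gatekeeper(p1, p2, k=3))   # -> ([0, 1, 2], [0])
```

With k = 4 the gate would stay closed here, since only three primary p-values clear the Bonferroni threshold 0.05/4.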

Multiple Comparisons in Complex Trial Designs
H.M. James Hung
United States Food and Drug Administration
hsienming.hung@fda.hhs.gov
As the costs of clinical trials increase greatly, in addition to other considerations, clinical development programs increasingly involve more than one trial for assessing the treatment effect of a test drug, particularly on adverse clinical outcomes. A number of complex trial designs have been encountered in regulatory applications. In one scenario, the primary efficacy endpoint requires two positive trials to conclude a treatment effect, while the key secondary endpoint is a major adverse clinical endpoint, such as mortality, that needs to rely on integration of multiple trials in order to have sufficient statistical power to show the treatment effect. This presentation stipulates the potential utility of such a trial design and the challenging multiplicity issues with statistical inference.

Use of Bootstrapping in Adaptive Designs with Multiplicity Issues
Jeff Maca
Quintiles
jeff.maca@quintiles.com
When designing a clinical study, there are often many parameters which are either unknown or not known with the precision necessary to have confidence in the overall design. This has led sponsors to want to design studies which are adaptive in nature and can adjust for these design parameters by using data from the study to estimate them. As there are many different design parameters, which depend on the type of study, many different types of adaptive designs have been proposed. It is also possible that one of the issues in the design of the study is the optimal multiplicity strategy, which could be based on assumptions about the correlation of the multiple endpoints; this is often very difficult to know prior to study start. The proposed methodology would use the data to estimate these parameters and correct for any inaccuracies in the assumptions.

Evaluating Commonly Used Multiple Testing Procedures in Drug Development
Michael Lee
Janssen Research & Development
mlee60@its.jnj.com
Multiplicity issues arise frequently in clinical trials with multiple endpoints and/or multiple doses. In drug development, because of regulatory requirements, control of the family-wise error rate (FWER) is essential in pivotal trials. Numerous multiple testing procedures that control the FWER in the strong sense are available in the literature. Particularly in the last decade, efficient testing procedures such as fallback procedures, gatekeeping procedures, and the graphical approach were proposed. Depending on the objectives of a study, one of these testing procedures can outperform the others. To understand which testing procedure is preferable under certain circumstances, we use a simulation approach to evaluate the performance of a few commonly used multiple testing procedures. Evaluation results and recommendations will be presented.

Session 54 Approaches to Assessing Qualitative Interactions

Interval-Based Graphical Approach to Assessing Qualitative Interaction
Guohua Pan and Eun Young Suh
Johnson & Johnson
esuh@its.jnj.com
In clinical studies comparing treatments, the population often consists of subgroups of patients with different characteristics, and investigators often wish to know whether treatment effects are homogeneous over the various subgroups. Qualitative interaction occurs when the direction of the treatment effect varies among subgroups. In the presence of a qualitative interaction, treatment recommendation is often challenging. In medical research, and in applications to health authorities for approval of new drugs, qualitative interaction and its impact need to be carefully evaluated. The initial statistical method for assessing qualitative interaction was developed by Gail and Simon (GS) in 1985 and has been incorporated into commercial statistical software such as SAS. While relatively often used, the GS method and its interpretation are not easily understood by medical researchers. Alternative approaches have been researched since then. One of the promising methods utilizes graphical representation of specially devised intervals for the treatment effects in the subgroups. If some of the intervals are to the left and others to the right of a vertical line representing no treatment difference, there is statistical evidence of a qualitative interaction, and otherwise not. This feature, similar to the familiar forest plots by subgroups, is naturally appealing to clinical scientists for examining and understanding qualitative interactions. These specially devised intervals


are shorter than simultaneous confidence intervals for treatment effects in the subgroups and are shown to rival the GS method in statistical power. The method is easy to use and additionally provides an explicit power function, which the GS method lacks. This talk will review and contrast statistical methods for assessing qualitative interaction, with an emphasis on the graphical approach described above. Data from mega clinical trials on cardiovascular diseases will be analyzed to illustrate and compare the methods.

Expected Variation and Quantitative and Qualitative Interaction in Clinical Trials
Xiaolong Luo
Celgene Corporation
xluo@celgene.com

Post hoc findings of unexpected heterogeneous treatment effects have been a challenge in the interpretation of clinical trials for sponsors, regulatory agencies, and medical practitioners. They are possible simply due to chance or due to fundamental treatment effect differentiation. Without repeating the resource-intensive clinical trials, it is critical to examine the framework of the given studies and to explore a likely model that may explain the overly simplified analyses. In this talk we will describe both theory and real clinical trials that can shed light on this complex and challenging issue.

A Bayesian Approach to Qualitative Interaction
Emine O. Bayman
University of Iowa
emine-bayman@uiowa.edu

Differences in treatment effects between centers in a multi-center trial may be important. These differences represent treatment-by-subgroup interaction. Qualitative interaction occurs when the simple treatment effect in one subgroup has a different sign than in another subgroup [1]; this interaction is important. Quantitative interaction occurs when the treatment effects are of the same sign in all subgroups and is often not important, because the treatment recommendation is identical in all subgroups.
A hierarchical model is used with exchangeable mean responses to each treatment between subgroups. A Bayesian test of qualitative interaction is developed [2] by calculating the posterior probability of qualitative interaction and the corresponding Bayes factor. The model is motivated by two multi-center trials with binary responses [3]. The frequentist power and type I error of the test using the Bayes factor are examined and compared with two other commonly used frequentist tests, the Gail and Simon [4] and Piantadosi and Gail [5] tests. The impact of imbalance between the sample sizes in each subgroup on power is examined under different scenarios. The method is implemented using WinBUGS and R and the R2WinBUGS interface.
REFERENCES
1. Peto R. Statistical Aspects of Cancer Trials. Treatment of Cancer. Edited by Halnan KE. London: Chapman & Hall, 1982, pp. 867-871.
2. Bayman EO, Chaloner K, Cowles MK. Detecting qualitative interaction: a Bayesian approach. Statistics in Medicine 2010; 29: 455-63.
3. Todd MM, Hindman BJ, Clarke WR, Torner JC; Intraoperative Hypothermia for Aneurysm Surgery Trial I. Mild intraoperative hypothermia during surgery for intracranial aneurysm. New England Journal of Medicine 2005; 352: 135-45.
4. Gail M, Simon R. Testing for Qualitative Interactions between Treatment Effects and Patient Subsets. Biometrics 1985; 41: 361-372.
5. Piantadosi S, Gail MH. A comparison of the power of two tests for qualitative interactions. Statistics in Medicine 1993; 12: 1239-48.

Session 55 Interim Decision-Making in Phase II Trials

Evaluation of Interim Dose Selection Methods Using ROC Approach
Deli Wang, Lu Cui, Lanju Zhang and Bo Yang
AbbVie Inc
deli.wang@abbvie.com

Interim analyses may be planned to drop inefficacious dose(s) in dose-ranging clinical trials. Commonly used statistical methods for interim decision-making include conditional power (CP), predicted confidence interval (PCI), and predictive power (PP) approaches. For these widely used methods, it is worth having a closer look at their performance characteristics and their interconnected relationship. This research investigates the performance of these three statistical methods in terms of decision quality, based on a receiver operating characteristic (ROC) method, in binary endpoint settings. More precisely, the performance of each method is studied based on calculated sensitivity and specificity under assumed ranges of desirable as well as undesirable outcomes. The preferred cutoff is determined, and performance comparisons across the different methods can be made. With an apparent exchangeability of the three methods, a simple and uniform approach becomes possible.
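One of the three interim metrics, conditional power, can be sketched concretely. The snippet below is a hedged illustration for a binary endpoint under the "current trend" assumption, using the standard Brownian-motion (normal) approximation; the interim counts and planned sample size are invented, not from the study discussed.

```python
# Hedged sketch of conditional power (CP) for a binary endpoint under
# the current-trend assumption; illustrative numbers only.
import numpy as np
from scipy.stats import norm

def conditional_power(x_t, x_c, n_t, n_c, n_final, alpha=0.025):
    """CP of a one-sided test of p_trt > p_ctl, assuming the interim
    response rates persist. n_final is the planned size per arm."""
    pt, pc = x_t / n_t, x_c / n_c
    z_int = (pt - pc) / np.sqrt(pt*(1-pt)/n_t + pc*(1-pc)/n_c)
    t = n_t / n_final                      # information fraction
    theta = z_int / np.sqrt(t)             # current-trend drift estimate
    z_crit = norm.ppf(1 - alpha)
    num = z_int * np.sqrt(t) + theta * (1 - t) - z_crit
    return float(norm.cdf(num / np.sqrt(1 - t)))

# 40 of 80 planned patients per arm observed at the interim look
print(conditional_power(x_t=18, x_c=10, n_t=40, n_c=40, n_final=80))
```

A dose whose CP falls below a prespecified cutoff (the quantity whose choice the ROC analysis above calibrates) would be a candidate for dropping.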

Interim Monitoring for Futility Based on Probability of Success
Yijie Zhou1, Ruji Yao2, Bo Yang1 and Ram Suresh3

1AbbVie Inc
2Merck & Co
3GlaxoSmithKline plc
yijie.zhou@abbvie.com

Statistical significance has been the traditional focus of clinical trial design. However, an increasing emphasis has been placed on the magnitude of the treatment effect, based on point estimates, to enable cross-therapy comparison. The magnitude of point estimate needed to demonstrate sufficient medical value when compared with existing therapies is typically larger than that needed to demonstrate statistical significance. Therefore, a new clinical trial design and its interim monitoring need to take into account trial success in terms of the magnitude of point estimates. In this talk we propose a new interim monitoring approach for futility that targets the probability of trial success in terms of achieving a sufficiently large point estimate at the end of the trial. Simulation is conducted to evaluate the operating characteristics of this approach.

Bayesian Adaptive Design in Oncology Early Phase Trials
Yuehui Wu and Ramachandran Suresh
GlaxoSmithKline plc
yuehui.2.wu@gsk.com

Efficacy assessment is commonly seen in oncology trials as early as the expansion cohort part of Phase I trials, and in Phase II trials. Early detection of an efficacy or futility signal can greatly help the team to make early decisions on future drug development plans, such as stopping for futility or starting late-phase planning. In order to achieve this goal, a Bayesian adaptive design utilizing predictive probability is implemented. This approach allows the team to monitor efficacy data constantly as new patients' data become available and to make decisions before the end of the trial. The primary endpoint in oncology trials is usually overall survival or progression-free survival, which takes a long time to observe, so a surrogate endpoint such as overall


response rate is often used in early phase trials. Multiple boundaries for making future strategic decisions, or for different endpoints, can be provided. Simulations play a vital role in providing the various decision-making boundaries as well as the corresponding operating characteristics. Based on simulation results, for each given sample size, the minimal sample size needed for the first interim look and the futility/efficacy boundaries will be provided based on Bayesian predictive probabilities. Details of the implementation of this design in real clinical trials will be demonstrated, and the pros and cons of this type of design will also be discussed.
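The predictive-probability calculation at the heart of such monitoring can be sketched for a binary surrogate endpoint like overall response rate. The Beta(1, 1) prior and the success rule "at least s responders out of N" below are assumptions for illustration, not the design described in the talk.

```python
# Hedged sketch: Bayesian predictive probability of end-of-trial success
# for a binary endpoint via the beta-binomial predictive distribution.
from scipy.stats import betabinom

def predictive_prob_success(x, n, N, s, a=1.0, b=1.0):
    """P(final responders >= s | x responders in n so far), Beta(a,b) prior."""
    m = N - n                                # patients yet to enroll
    rv = betabinom(m, a + x, b + n - x)      # predictive dist of future responders
    need = s - x                             # additional responders required
    return 1.0 if need <= 0 else float(rv.sf(need - 1))

# Interim: 8/20 responders observed; trial "succeeds" with >= 16/40 overall
pp = predictive_prob_success(x=8, n=20, N=40, s=16)
print(pp)   # continue if pp exceeds a futility boundary, else stop early
```

Scanning this quantity over possible interim outcomes, as the abstract describes, is how simulation yields the futility/efficacy boundaries and their operating characteristics.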

Session 56 Recent Advancement in Statistical Methods

Exact Inference: New Methods and Applications
Ian Dinwoodie
Portland State University
ihd@pdx.edu
Exact inference concerns methods that generalize Fisher's exact test for independence. The methods are exact in the sense that test statistics have distributions that do not depend on nuisance parameters, and asymptotic approximations are not used. However, computations are challenging and often require Monte Carlo methods. This talk gives an overview, with attention to sampling techniques including Markov chains and sequential importance sampling, with new applications to dynamical models and signalling networks.
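The Monte Carlo flavor of exact inference can be illustrated in the simplest case: approximating the two-sided p-value of Fisher's exact test on a 2x2 table by sampling tables from the hypergeometric null (both margins fixed). This is only a toy version of the sampling techniques in the talk; the table is invented.

```python
# Monte Carlo approximation of Fisher's exact two-sided p-value,
# sketched for a single 2x2 table with fixed margins.
import numpy as np
from scipy.stats import hypergeom, fisher_exact

def mc_fisher_pvalue(table, n_sim=100_000, seed=0):
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d
    rv = hypergeom(n, col1, row1)            # null law of the (1,1) cell
    support = np.arange(max(0, row1 + col1 - n), min(row1, col1) + 1)
    probs = rv.pmf(support)
    rng = np.random.default_rng(seed)
    draws = rng.choice(support, size=n_sim, p=probs / probs.sum())
    # two-sided: tables as probable as, or less probable than, observed
    return float(np.mean(rv.pmf(draws) <= rv.pmf(a) + 1e-12))

table = [[8, 2], [1, 5]]
print(mc_fisher_pvalue(table), fisher_exact(table)[1])  # values agree closely
```

For larger contingency tables the null support cannot be enumerated, which is where the Markov chain and sequential importance sampling techniques of the talk come in.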

Optimal Thresholds Criteria and Standard Criterion of VUS for ROC Surface
Chong Sun Hong
Sungkyunkwan University
cshong@skku.edu
Consider the ROC surface, which is a generalization of the ROC curve for three-class diagnostic problems. In this work we propose five criteria for the three-class ROC surface by extending the Youden index, the sum of sensitivity and specificity, the maximum vertical distance, the amended closest-to-(0,1), and the true rate. It may be concluded that these five criteria can be expressed as a function of two Kolmogorov-Smirnov (K-S) statistics. It is found that the paired optimal thresholds selected from the ROC surface are equivalent to the two optimal thresholds found from the two ROC curves. Moreover, we consider the volume under the ROC surface (VUS). The standard criteria of AUC for the probability of default based on Basel II are extended to the VUS for the ROC surface, so that standard criteria of VUS for the classification model are proposed. The ranges of the AUC, K-S, and mean difference statistics corresponding to the values of VUS for each class of the standard criteria are obtained. By exploring relationships among these statistics, standard criteria of VUS for the ROC surface can be established.
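The two-class building block that the talk extends to the ROC surface can be sketched directly: the Youden index J(c) = sensitivity(c) + specificity(c) - 1, whose maximum over thresholds equals the Kolmogorov-Smirnov distance between the two class score distributions. The scores below are simulated, not from the paper.

```python
# Two-class illustration of the Youden index and its K-S connection;
# simulated Gaussian scores, assumed for illustration only.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(1)
neg = rng.normal(0.0, 1.0, 5000)     # scores for non-diseased
pos = rng.normal(1.5, 1.0, 5000)     # scores for diseased

thresholds = np.linspace(-3, 5, 2001)
sens = np.array([(pos >= c).mean() for c in thresholds])
spec = np.array([(neg < c).mean() for c in thresholds])
youden = sens + spec - 1.0

best = thresholds[youden.argmax()]
print(f"optimal threshold near {best:.2f}")
print(f"max Youden J = {youden.max():.3f}, K-S = {ks_2samp(neg, pos).statistic:.3f}")
```

In the three-class setting of the talk, two such thresholds (one per adjacent class pair) are selected, which is why the criteria reduce to functions of two K-S statistics.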

Analysis of Cointegrated Models with Measurement Errors
Sung Ahn1, Hamwoom Hong2 and Sinsup Cho2

1Washington State University
2Seoul National University
ahn@wsu.edu
We study the asymptotic properties of the reduced-rank estimator of error correction models of vector processes observed with measurement errors. Although it is well known that there is no asymptotic measurement error bias when predictor variables are integrated processes in regression models (Phillips and Durlauf, 1986), we systematically investigate the effects of the measurement errors (in the dependent variables as well as in the predictor variables) on the estimation of not only the cointegrating vectors but also the speed-of-adjustment matrix. Furthermore, we present the asymptotic properties of the estimators. We also obtain the asymptotic distribution of the likelihood ratio test for the cointegrating ranks and investigate the effects of the measurement errors on the test through a Monte Carlo simulation study.

A Direct Method to Evaluate the Time-Dependent Predictive Accuracy for Biomarkers
Weining Shen, Jing Ning and Ying Yuan
University of Texas MD Anderson Cancer Center
wshen@mdanderson.org

Time-dependent areas under the receiver operating characteristic (ROC) curve (AUC) are important measures to evaluate the prediction accuracy of biomarkers for time-to-event endpoints (e.g., time to disease progression or death). In this paper we propose a direct method to estimate the AUC as a function of time, using a flexible fractional polynomial model, without the middle step of modeling the time-dependent ROC. We develop a pseudo partial-likelihood procedure for parameter estimation and provide a test procedure to compare the predictive performance between biomarkers. We establish the asymptotic properties of the proposed estimator and test statistics. A major advantage of the proposed method is its ease of making inference and comparing the prediction accuracy across biomarkers, rendering our method particularly appealing for studies that require comparing and screening a large number of candidate biomarkers. We evaluate the finite-sample performance of the proposed method through simulation studies and illustrate our method in an application to primary biliary cirrhosis data.

Session 57 Building Bridges between Research and Practice in Time Series Analysis

Time Series Research at the U.S. Census Bureau
Brian C. Monsell
U.S. Census Bureau
brian.c.monsell@census.gov

The Census Bureau has taken steps to reinforce the role of research within the organization. This talk will give details on the role of statistical research at the U.S. Census Bureau, with particular attention paid to the status of current work in time series analysis and statistical software development in time series. A brief history of time series research will be given, as well as details of work of historical interest.

Temporal Causal Modeling: Methodology, Applications and Implementation
Naoki Abe1, Tanveer Faruquie1, Huijing Jiang1, Anju Kambadur1, Kimberly Lang1, Aurelie Lozano1 and Jinwoo Shin2

1IBM
2KAIST University
nabe@us.ibm.com

Temporal causal modeling is an approach to modeling and causal inference based on time series data, built on some recent advances in graphical Granger modeling. In this presentation we will review the basic concept and approach, some specific modeling algorithms, and methods for associated functions (e.g., root cause analysis), as well as some efforts at scaling these methods via parallel implementation. We will also describe some business applications of this approach in a number of domains. (The authors are ordered alphabetically.)
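The graphical Granger idea underlying this approach can be sketched in miniature: series j is taken as a cause of series i if adding a lag of j to an autoregression of i reduces the residual sum of squares. The toy below uses plain OLS with one lag on simulated data; the algorithms in the talk use regularized (lasso-type) variants with longer lag windows.

```python
# Toy graphical-Granger sketch on simulated data; illustration only.
import numpy as np

rng = np.random.default_rng(0)
T = 500
x = np.zeros((T, 3))
for t in range(1, T):           # ground truth: series 0 drives series 1
    x[t, 0] = 0.5 * x[t-1, 0] + rng.normal()
    x[t, 1] = 0.5 * x[t-1, 1] + 0.8 * x[t-1, 0] + rng.normal()
    x[t, 2] = 0.5 * x[t-1, 2] + rng.normal()

def rss(y, X):
    """Residual sum of squares of an OLS fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ r)

def granger_stat(x, cause, effect):
    """log(RSS_restricted / RSS_full); larger means stronger evidence."""
    y = x[1:, effect]
    restricted = x[:-1, [effect]]           # own lag only
    full = x[:-1, [effect, cause]]          # own lag + candidate cause
    return float(np.log(rss(y, restricted) / rss(y, full)))

print(granger_stat(x, 0, 1), granger_stat(x, 2, 1))
```

Applying this statistic to every ordered pair of series, and keeping the edges that survive a threshold or penalty, yields the causal graph that root cause analysis then traverses.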


Issues Related to the Use of Time Series in Model Building and Analysis
William W.S. Wei
Temple University
wwei@temple.edu
Time series are used in many studies for model building and analysis. We must be very careful to understand the kind of time series data used in the analysis. In this presentation we will begin with some issues related to the use of aggregate and systematic sampling time series. Since several time series are often used in a study of the relationships of variables, we will also consider vector time series modeling and analysis. Although the basic procedures of model building between univariate time series and vector time series are the same, there are some important phenomena which are unique to vector time series. Therefore, we will also discuss some issues related to vector time series models. Understanding these issues is important when we use time series data in modeling and analysis, regardless of whether it is a univariate or multivariate time series.

Session 58 Recent Advances in Design for Biostatistical Problems

Optimal Designs for N-of-1 Trials
Yin Li and Keumhee Chough Carriere
University of Alberta
Kccarrie@ualberta.ca
N-of-1 trials are randomized multi-crossover experiments using two or more treatments on a single patient. They provide evidence-based information on an individual patient, thus optimizing the management of the individual's chronic disease. Such trials are preferred in many medical experiments, as opposed to the more conventional statistical designs constructed to optimize treating the average patient. N-of-1 trials are also popular when the sample size is too small to adopt traditional optimal designs. However, there are very few guidelines available in the literature. We constructed optimal N-of-1 designs for two treatments under a variety of conditions on the carryover effects, the covariance structure, and the number of planned periods. Extension to optimal aggregated N-of-1 designs is also discussed.

Efficient Algorithms for Two-stage Designs on Phase II Clinical Trials
Seongho Kim1 and Weng Kee Wong2

1Wayne State University/Karmanos Cancer Institute
2University of California at Los Angeles
kimse@karmanos.org
Single-arm two-stage designs have been widely used in phase II clinical trials. One of the most popular designs is Simon's optimal two-stage design, which minimizes the expected sample size under the null hypothesis. Currently, a greedy search algorithm is often used to evaluate every possible combination of sample sizes for optimal two-stage designs. However, such a greedy strategy is computationally intensive, and so is not feasible for large sample sizes or adaptive two-stage designs with many parameters. An efficient global optimization method, discrete particle swarm optimization (DPSO), is therefore developed to find two-stage designs efficiently and is compared with greedy algorithms for Simon's optimal two-stage and adaptive two-stage designs. It is further shown that DPSO can be efficiently applied to complicated adaptive two-stage designs, even with three prefixed possible response rates, which a greedy algorithm cannot handle.
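The evaluation step that any such search (greedy or DPSO) repeats for every candidate design can be sketched with exact binomial computations: stop after stage 1 if at most r1 responses are seen in n1 patients, and declare the drug inactive at the end if at most r responses are seen in n. The design below is a commonly cited Simon-optimal example (p0 = 0.05 vs p1 = 0.25, one-sided alpha = 0.05, 80% power); treat it as illustrative.

```python
# Operating characteristics of one candidate Simon two-stage design.
from scipy.stats import binom

def simon_operating_chars(n1, r1, n, r, p):
    """Return (PET, P(declare inactive), E[N]) at true response rate p."""
    pet = binom.cdf(r1, n1, p)                      # early termination prob
    accept = pet + sum(                              # overall 'inactive' prob
        binom.pmf(x1, n1, p) * binom.cdf(r - x1, n - n1, p)
        for x1 in range(r1 + 1, min(n1, r) + 1)
    )
    en = n1 + (1 - pet) * (n - n1)                  # expected sample size
    return pet, accept, en

# Candidate design: n1=9, r1=0, n=17, r=2
pet0, accept0, en0 = simon_operating_chars(9, 0, 17, 2, p=0.05)
_, accept1, _ = simon_operating_chars(9, 0, 17, 2, p=0.25)
print(f"PET(p0)={pet0:.3f} alpha={1-accept0:.3f} "
      f"power={1-accept1:.3f} EN(p0)={en0:.2f}")
```

A greedy optimal-design search simply scans many (n1, r1, n, r) combinations for the smallest E[N | p0] subject to the error-rate constraints, which is the loop DPSO replaces.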

D-optimal Designs for Multivariate Exponential and Poisson Regression Models via Ultra-Dimensional Particle Swarm Optimization
Jiaheng Qiu and Weng Kee Wong
University of California at Los Angeles
wkwong@ucla.edu
Multiple drug therapies are increasingly used to treat many diseases, such as AIDS, cancer, and rheumatoid arthritis. At the early stages of clinical research, the outcome is typically studied using a nonlinear model with multiple doses from various drugs. Advances in handling estimation issues for such models are continually made, but research to find informed design strategies has lagged. We develop a nature-inspired metaheuristic algorithm called ultra-dimensional particle swarm optimization (UPSO) to find D-optimal designs for the Poisson and exponential models for studying effects of up to 5 drugs and their interactions. This novel approach allows us to find effective search strategies for such high-dimensional optimal designs and gain insight into their structure, including conditions under which locally D-optimal designs are minimally supported. We implement the UPSO algorithm on a web site and apply it to redesign a real study that investigates 2-way interaction effects on the induction of micronuclei in mouse lymphoma cells from 3 genotoxic agents. We show that a D-optimal design can reap substantial benefits over the implemented design in Lutz et al. (2005).

Optimizing Two-level Supersaturated Designs by Particle Swarm Techniques
Frederick Kin Hing Phoa1, Ray-Bing Chen2, Wei-Chung Wang3 and Weng Kee Wong4

1Institute of Statistical Science, Academia Sinica
2National Cheng Kung University
3National Taiwan University
4University of California at Los Angeles
fredphoa@stat.sinica.edu.tw
Supersaturated designs (SSDs) are often used in screening experiments with a large number of factors to reduce the number of experimental runs. As more factors are used in the study, the search for an optimal SSD becomes increasingly challenging because of the large number of feasible selections of factor level settings. This talk tackles this discrete optimization problem via a metaheuristic algorithm based on particle swarm optimization (PSO) techniques. Using the commonly used E(s²) criterion as an illustrative example, we were able to modify the standard PSO algorithm and find SSDs that attain the lower bounds calculated in Bulutoglu and Cheng (2004) and Bulutoglu (2007), showing that the PSO-generated designs are E(s²)-optimal SSDs.
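The E(s²) criterion named in the abstract is straightforward to compute: for an N x m two-level (±1) design matrix X, it averages the squared off-diagonal entries s_ij of X'X, so smaller values mean less column aliasing. The 6-run, 4-factor matrix below is a toy illustration, not one of the optimal designs from the talk.

```python
# E(s^2) of a two-level design matrix; toy example, illustration only.
import numpy as np
from itertools import combinations

def e_s2(X):
    """Average squared off-diagonal entry of X'X."""
    S = X.T @ X
    m = X.shape[1]
    off = [S[i, j] ** 2 for i, j in combinations(range(m), 2)]
    return sum(off) / len(off)

X = np.array([[ 1,  1,  1,  1],
              [ 1, -1, -1,  1],
              [-1,  1, -1,  1],
              [-1, -1,  1, -1],
              [ 1, -1,  1, -1],
              [-1,  1, -1, -1]])
print(e_s2(X))   # every column pair here has |s_ij| = 2, so E(s^2) = 4.0
```

A PSO search over candidate matrices would use this as the objective, stopping when a design reaches the relevant published lower bound.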

Session 59 Student Award Session 2

Analysis of Sequence Data Under Multivariate Trait-Dependent Sampling
Ran Tao1, Donglin Zeng1, Nora Franceschini1, Kari E. North1, Eric Boerwinkle2 and Dan-Yu Lin1

1University of North Carolina at Chapel Hill
2University of Texas Health Science Center
taor@live.unc.edu
High-throughput DNA sequencing allows the genotyping of common and rare variants for genetic association studies. At the present time, and in the near future, it is not economically feasible to sequence all individuals in a large cohort. A cost-effective strategy is to sequence those individuals with extreme values of a quantitative trait. We consider the design under which the sampling depends on


multiple quantitative traits. Under such trait-dependent sampling, standard linear regression analysis can result in bias of parameter estimation, inflation of type 1 error, and loss of power. We construct a nonparametric likelihood function that properly reflects the sampling mechanism and utilizes all available data. We implement a computationally efficient EM algorithm and establish the theoretical properties of the resulting nonparametric maximum likelihood estimators. Our methods can be used to perform separate inference on each trait or simultaneous inference on multiple traits. We pay special attention to gene-level association tests for rare variants. We demonstrate the superiority of the proposed methods over standard linear regression through extensive simulation studies. We provide applications to the Cohorts for Heart and Aging Research in Genomic Epidemiology Targeted Sequencing Study and the National Heart, Lung, and Blood Institute Exome Sequencing Project.

Empirical Likelihood Based Tests for Stochastic Ordering Under Right Censorship
Hsin-wen Chang and Ian W. McKeague

Columbia University
hc2496@columbia.edu

This paper develops an empirical likelihood approach to testing for stochastic ordering between two univariate distributions under right censorship. The proposed test is based on a maximally selected localized empirical likelihood ratio statistic. The asymptotic null distribution is expressed in terms of a Brownian bridge. The new procedure is shown via a simulation study to have superior power to the log-rank and weighted Kaplan-Meier tests under crossing-hazard alternatives. The approach is illustrated using data from a randomized clinical trial involving the treatment of severe alcoholic hepatitis.

Multiple Genetic Loci Mapping for Latent Disease Liability Using a Structural Equation Modeling Approach with Application in Alzheimer's Disease
Ting-Huei Chen

University of North Carolina at Chapel Hill
thchen@live.unc.edu

Categorical traits, such as case-control status, are often used as response variables in genome-wide association studies of genetic loci associated with complex diseases. Using categorical variables to summarize likely continuous disease liability may lead to loss of information, and thus reduction of power to recover associated genetic loci. On the other hand, a direct study of disease liability is often infeasible because it is an unobservable latent variable. In some diseases the underlying disease liability is manifested by several phenotypes, and thus the associated genetic loci may be identified by combining the information of multiple phenotypes. In this article we propose a novel method, named PeLatent, to address this challenge. We employ a structural equation approach to model the latent disease liability by observed manifest variables/phenotypic information, and to identify simultaneously multiple associated genetic loci by a regularized estimation method. Simulation results show that our method has substantially higher sensitivity and specificity than existing methods. Application of our method to a genome-wide association study of Alzheimer's disease (AD) identifies 27 single nucleotide polymorphisms (SNPs) associated with AD. These 27 SNPs are located within 19 genes, and several of these genes are known to be related to Alzheimer's disease as well as neural activities.

Session 60 Semi-parametric Methods

Semiparametric Estimation of Mean and Variance in Generalized Estimating Equations
Jianxin Pan1 and Daoji Li2
1The University of Manchester
2University of Southern California
daojili@marshall.usc.edu
Efficient estimation of regression parameters is a major objective in the analysis of longitudinal data. Existing approaches usually focus on modeling only the mean and treat the variance as a nuisance parameter. The common assumption is that the variance is a function of the mean, and the variance function is further assumed to be known. However, the estimator of the regression parameters can be very inefficient if the variance function or variance is misspecified. In this paper, a flexible semiparametric regression approach for longitudinal data is proposed to jointly model the mean and variance. The novel semiparametric mean and variance models offer great flexibility in formulating the effects of covariates and time on the mean and variance. We simultaneously estimate the parametric and nonparametric components in the models by using a B-spline based approach. The asymptotic normality of the resulting estimators for the parametric components in the proposed models is established, and the optimal rate of convergence of the nonparametric components is obtained. Our simulation study shows that our proposed approach yields more efficient estimators for the mean parameters than the conventional GEE approach. The proposed approach is also illustrated with a real data analysis.

An Empirical Approach to Efficient Estimation of Linear Functionals of a Probability with Side Information
Hanxiang Peng, Shan Wang and Lingnan Li
Indiana University-Purdue University Indianapolis
hpeng@math.iupui.edu
In this talk we'll construct efficient estimators of linear functionals of a probability measure when side information is available. Our approach is based on maximum empirical likelihood. We will exhibit that the proposed approach is mathematically simpler and computationally easier than the usual maximum empirical likelihood estimators. Several examples are given of the possible side information. We also report some simulation results.
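The basic building block behind maximum empirical likelihood is the one-sample empirical likelihood ratio: weights w_i = 1/(n(1 + lam*(x_i - mu0))) with the Lagrange multiplier solved from the mean constraint. A sketch for testing a mean (the talk's side-information constraints are handled analogously; data and the helper name `el_ratio_stat` are illustrative):

```python
# Sketch of the one-sample empirical likelihood ratio for H0: E[X] = mu0
# (illustrative; not the authors' estimator with side information).

import math

def el_ratio_stat(x, mu0, tol=1e-12):
    """-2 log empirical likelihood ratio; requires mu0 inside (min(x), max(x))."""
    d = [xi - mu0 for xi in x]
    # Keep all weights positive: 1 + lam * d_i > 0 bounds lam to an interval
    lo = -1.0 / max(d) + 1e-10
    hi = -1.0 / min(d) - 1e-10
    g = lambda lam: sum(di / (1.0 + lam * di) for di in d)
    # g is strictly decreasing in lam, so bisection finds the root
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return 2.0 * sum(math.log(1.0 + lam * di) for di in d)

x = [1.1, 2.3, 0.7, 1.9, 1.4, 2.8, 0.9, 1.6]   # made-up sample
stat = el_ratio_stat(x, mu0=1.5)                # ~ chi-squared(1) under H0
```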

M-estimation for General ARMA Processes with Infinite Variance
Rongning Wu
Baruch College, City University of New York
rongningwu@baruch.cuny.edu
General autoregressive moving average (ARMA) models extend the traditional ARMA models by removing the assumptions of causality and invertibility. The assumptions are not required under a non-Gaussian setting for the identifiability of the model parameters, in contrast to the Gaussian setting. We study M-estimation for general ARMA processes with infinite variance, where the distribution of innovations is in the domain of attraction of a non-Gaussian stable law. Following the approach taken by Davis et al. (1992) and Davis (1996), we derive a functional limit theorem for random processes based on the objective function and establish asymptotic properties of the M-estimator. We also consider bootstrapping the M-estimator and extend the results of Davis & Wu (1997) to the present setting so that statistical inferences are readily implemented. Simulation studies are conducted to evaluate the finite sample performance of the M-estimation and bootstrap procedures. An empirical example of financial time series is also provided.

80 2014 ICSA-KISS Applied Statistics Symposium Portland June 15-18

Abstracts

Sufficient Dimension Reduction via Principal Lq Support Vector Machine
Andreas Artemiou1 and Yuexiao Dong2

1Cardiff University
2Temple University
ydong@temple.edu

Principal support vector machine was proposed recently by Li, Artemiou and Li (2011) to combine the L1 support vector machine and sufficient dimension reduction. We introduce the Lq support vector machine as a unified framework for linear and nonlinear sufficient dimension reduction. Noticing that the solution of the L1 support vector machine may not be unique, we set q > 1 to ensure the uniqueness of the solution. The asymptotic distribution of the proposed estimators is derived for q = 2. We demonstrate through numerical studies that the proposed L2 support vector machine estimators improve existing methods in accuracy and are less sensitive to the tuning parameter selection.

Nonparametric Quantile Regression via a New MM Algorithm
Bo Kai1, Mian Huang2, Weixin Yao3 and Yuexiao Dong4

1College of Charleston
1National Chengchi University
2Shanghai University of Finance and Economics
3Kansas State University
4Temple University
kaib@cofc.edu

Nonparametric quantile regression is an important statistical model that has been widely used in many research fields and applications. However, its optimization is very challenging, since the objective functions are non-differentiable. In this work, we propose a new MM algorithm for the nonparametric quantile regression model. The proposed algorithm simultaneously updates the quantile function and yields a smoother estimate of the quantile function. We systematically study the new MM algorithm in local linear quantile regression and show that the proposed algorithm preserves the monotone descent property of MM algorithms in an asymptotic sense. Monte Carlo simulation studies will be presented to show the finite sample performance of the proposed algorithm.
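The MM idea for the non-differentiable check loss goes back to Hunter and Lange's treatment of quantile regression: majorize the check loss by a quadratic so each update has a closed form. A minimal intercept-only sketch for the tau-th sample quantile (illustrative; the talk's algorithm is for nonparametric quantile functions):

```python
# MM iteration for the tau-th sample quantile: the check loss is
# majorized by a quadratic with weights 1/(eps + |residual|), giving a
# closed-form weighted-average update at each step. Illustrative sketch
# of the MM principle only, not the authors' nonparametric algorithm.

def mm_quantile(y, tau, eps=1e-6, iters=200):
    m = sum(y) / len(y)                       # start at the sample mean
    n = len(y)
    for _ in range(iters):
        d = [eps + abs(yi - m) for yi in y]   # majorization weights
        num = sum(yi / di for yi, di in zip(y, d)) - (1 - 2 * tau) * n
        den = sum(1.0 / di for di in d)
        m = num / den
    return m

y = [1.0, 2.0, 3.0, 4.0, 100.0]   # the outlier pulls the mean, not the median
med = mm_quantile(y, 0.5)          # close to the sample median 3
```

The eps term smooths the loss near zero residuals; the iterate converges to within O(eps) of the exact quantile.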

Regression Estimators Using Stratified Ranked Set Sampling
Arbita Chatterjee, Hani Samawi, Lili Yu, Daniel Linder, Jingxian Cai and Robert Vogel
Georgia Southern University
jxcai19880721@hotmail.com

This article is intended to investigate the performance of two types of stratified regression estimators, namely the separate and the combined estimator, using stratified ranked set sampling (SRSS) introduced by Samawi (1996). The expressions for the mean and variance of the proposed estimates are derived, and the estimates are shown to be unbiased. A simulation study is designed to compare the efficiency of SRSS relative to other sampling procedures under varying model scenarios. Our investigation indicates that the regression estimator of the population mean obtained through an SRSS becomes more efficient than the crude sample mean estimator using stratified simple random sampling. These findings are also illustrated with the help of a data set on bilirubin levels in babies in a neonatal intensive care unit.
Key words: ranked set sampling, stratified ranked set sampling, regression estimator
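One balanced cycle of ranked set sampling draws k independent sets of k units, ranks within each set, and keeps the i-th order statistic from the i-th set. A small simulation sketch of that scheme (illustrative; ranking here is exact rather than by judgment, and the population is made up):

```python
# One cycle of balanced ranked set sampling (RSS) with set size k.
# Illustrative sketch of the sampling scheme underlying SRSS; here the
# within-set ranking is done on the true values (perfect ranking).

import random

def rss_cycle(population, k, rng):
    sample = []
    for i in range(k):
        judgment_set = rng.sample(population, k)   # draw a set of k units
        judgment_set.sort()                        # rank within the set
        sample.append(judgment_set[i])             # keep the i-th order statistic
    return sample

rng = random.Random(42)
pop = list(range(1000))
s = rss_cycle(pop, k=5, rng=rng)   # one balanced cycle of 5 observations
```

Averaging many cycles gives an unbiased and (under good ranking) more efficient estimator of the population mean than simple random sampling of the same size.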

Session 61 Statistical Challenges in Variable Selection for Graphical Modeling

Fused Community Detection
Yi Yu1, Yang Feng2 and Richard J. Samworth1

1University of Cambridge
2Columbia University
yangfeng@stat.columbia.edu
Community detection is one of the most widely studied problems in network research. In an undirected graph, communities are regarded as tightly-knit groups of nodes with comparatively few connections between them. Popular existing techniques, such as spectral clustering and variants thereof, rely heavily on the edges being sufficiently dense and the community structure being relatively obvious. These are often not satisfactory assumptions for large-scale real-world datasets. We therefore propose a new community detection method, called fused community detection (fcd), which is designed particularly for sparse networks and situations where the community structure may be opaque. The spirit of fcd is to take advantage of the edge information, which we exploit by borrowing sparse recovery techniques from regression problems. Our method is supported by both theoretical results and numerical evidence. The algorithms are implemented in the R package fcd, which is available on CRAN.
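The spectral clustering baseline that the abstract contrasts with can be sketched in a few lines: split a graph by the sign pattern of the Fiedler vector (eigenvector of the second-smallest Laplacian eigenvalue). A toy numpy example on a dense, obvious two-community graph, the regime where this baseline works well (illustrative; not the fcd method):

```python
# Baseline spectral bisection: split a graph by the signs of the
# Fiedler vector of the unnormalized graph Laplacian. Toy sketch on a
# dense two-community graph (the setting where spectral methods shine).

import numpy as np

def spectral_bisect(A):
    """A: symmetric 0/1 adjacency matrix; returns a 0/1 label per node."""
    L = np.diag(A.sum(axis=1)) - A           # unnormalized Laplacian
    vals, vecs = np.linalg.eigh(L)           # eigenvalues in ascending order
    fiedler = vecs[:, 1]                     # second-smallest eigenvector
    return (fiedler > 0).astype(int)

# Two 4-node cliques joined by a single bridge edge between nodes 3 and 4
A = np.zeros((8, 8))
for i in range(4):
    for j in range(4):
        if i != j:
            A[i, j] = A[i + 4, j + 4] = 1.0
A[3, 4] = A[4, 3] = 1.0
labels = spectral_bisect(A)   # one label per clique
```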

High Dimensional Tests for Functional Brain Networks
Jichun Xie1 and Jian Kang2

1Temple University
2Emory University
jichun@temple.edu
Large-scale resting-state fMRI studies have been conducted for patients with autism, and the existence of abnormalities in the functional connectivity between brain regions (containing more than one voxel) has been clearly demonstrated. Due to the ultra-high dimensionality of the data, current methods focusing on studying the connectivity pattern between voxels often lack power and computational efficiency. In this talk, we introduce a new framework to identify the connection pattern of gigantic networks with the desired resolution. We propose three procedures based on different network structures and testing criteria. The asymptotic null distributions of the test statistics are derived, together with their rate-optimality. Simulation results show that the tests are able to control type I error and yet are very powerful. We apply our method to a resting-state fMRI study on autism. The analysis yields interesting insights about the mechanism of autism.

Bayesian Inference of Multiple Gaussian Graphical Models
Christine Peterson1, Francesco Stingo2 and Marina Vannucci3
1Stanford University
2University of Texas MD Anderson Cancer Center
3Rice University
cbpeterson@gmail.com
In this work, we propose a Bayesian approach for inference of multiple Gaussian graphical models. Specifically, we address the problem of inferring multiple undirected networks in situations where some of the networks may be unrelated, while others share common features. We link the estimation of the graph structures via a Markov random field prior, which encourages common edges. In addition, we learn which sample groups have shared graph structure by placing a spike-and-slab prior on the parameters that measure network relatedness. This approach allows us to share information between sample groups when appropriate, as well as to obtain a measure of relative network similarity across groups. In simulation studies, we find improved accuracy of network estimation over competing methods, particularly when the sample sizes within each subgroup are moderate. We illustrate our model with an application to inference of protein networks for various subtypes of acute myeloid leukemia.

Mixed Graphical Models via Exponential Families
Eunho Yang1, Yulia Baker2, Pradeep Ravikumar1, Genevera I. Allen2 and Zhandong Liu3

1University of Texas at Austin
2Rice University
3Baylor College of Medicine
yuliabaker@rice.edu
Markov random fields, or undirected graphical models, are widely used to model high-dimensional multivariate data. Classical instances of these models, such as Gaussian graphical and Ising models, as well as recent extensions (Yang et al., 2012) to graphical models specified by univariate exponential families, assume all variables arise from the same distribution. Complex data from high-throughput genomics and social networking, for example, often contain discrete, count, and continuous variables measured on the same set of samples. To model such heterogeneous data, we develop a novel class of mixed graphical models by specifying that each node-conditional distribution is a member of a possibly different univariate exponential family. We study several instances of our model and propose scalable M-estimators for recovering the underlying network structure. Simulations, as well as an application to learning mixed genomic networks from next generation sequencing and mutation data, demonstrate the versatility of our methods.

Session 62 Recent Advances in Non- and Semi-Parametric Methods

Joint Estimation of Multiple Bivariate Densities of Protein Backbone Angles Using an Adaptive Exponential Spline Family
Lan Zhou
Texas A&M University
lzhou@stat.tamu.edu
In this talk, we introduce a method for joint estimation of multiple bivariate density functions for a collection of populations of protein backbone angles. The method utilizes an exponential family of distributions, for which the log densities are modeled as a linear combination of a common set of basis functions. The basis functions are obtained as bivariate splines on triangulations and are adaptively chosen based on the data. The circular nature of angular data is taken into account by imposing appropriate smoothness constraints across boundaries. Maximum penalized likelihood is used for fitting the model, and an effective Newton-type algorithm is developed. A simulation study clearly showed that the joint estimation approach is statistically more efficient than estimating the densities separately. The proposed method provides a novel and unique perspective on two important and challenging problems in protein structure research, namely structure-based protein classification and quality assessment of protein structure prediction servers. The joint density estimation approach is widely applicable when there is a need to estimate multiple density functions from different populations with common features. Moreover, the coefficients of the basis expansion for the fitted densities provide a low-dimensional representation that is useful for visualization, clustering, and classification of the densities. This is joint work with Mehdi Maadooliat, Xin Gao and Jianhua Huang.

Estimating Time-Varying Effects for Overdispersed Recurrent Data with Treatment Switching
Qingxia Chen1, Donglin Zeng2, Joseph G. Ibrahim2, Mouna Akacha3 and Heinz Schmidli3
1Vanderbilt University
2University of North Carolina at Chapel Hill
3Novartis Pharmaceuticals Corporation
cindychen@vanderbilt.edu
In the analysis of multivariate event times, frailty models assuming time-independent regression coefficients are often considered, mainly due to their mathematical convenience. In practice, regression coefficients are often time dependent, and the temporal effects are of clinical interest. Motivated by a phase III clinical trial in multiple sclerosis, we develop a semiparametric frailty modelling approach to estimate time-varying effects for overdispersed recurrent events data with treatment switching. The proposed model incorporates the treatment switching time in the time-varying coefficients. Theoretical properties of the proposed model are established, and an efficient EM algorithm is derived to obtain the maximum likelihood estimates. Simulation studies evaluate the numerical performance of the proposed model under various temporal treatment effect curves. The ideas in this paper can also be used for time-varying coefficient frailty models without treatment switching, as well as for alternative models when the proportional hazards assumption is violated. A multiple sclerosis dataset is analyzed to illustrate our methodology.

Bivariate Penalized Splines for Regression
Ming-Jun Lai and Lily Wang
University of Georgia
lilywang@uga.edu
In this work, we are interested in smoothing data over complex irregular boundaries or interior holes. We propose bivariate penalized spline estimators over triangulations, using an energy functional as the penalty. We establish the consistency and asymptotic normality of the proposed estimators and study their convergence rates. A comparison with thin-plate splines is provided to illustrate some advantages of this spline smoothing approach. The proposed method can be easily applied to various smoothing problems over arbitrary domains, including irregularly shaped domains with irregularly scattered data points.
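A one-dimensional analogue conveys the mechanics of penalized spline regression: a spline basis plus a roughness penalty on the basis coefficients, fit by ridge-type normal equations. A numpy sketch with a truncated-line basis (illustrative; the paper's bivariate energy penalty over triangulations plays the role of the simple diagonal penalty here, and the data are simulated):

```python
# One-dimensional penalized spline sketch: intercept + linear term +
# truncated-line ("hinge") basis at each knot, with a ridge penalty on
# the hinge coefficients only. Illustrative analogue of penalized
# spline smoothing, not the bivariate triangulation method itself.

import numpy as np

def pspline_fit(x, y, knots, lam):
    X = np.column_stack([np.ones_like(x), x] +
                        [np.clip(x - k, 0, None) for k in knots])
    D = np.diag([0.0, 0.0] + [1.0] * len(knots))   # penalize hinges only
    beta = np.linalg.solve(X.T @ X + lam * D, X.T @ y)
    return X @ beta

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 100)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, size=100)   # simulated data
fit = pspline_fit(x, y, knots=np.linspace(0.1, 0.9, 9), lam=1e-3)
```

Increasing `lam` shrinks the hinge coefficients toward zero, pulling the fit toward a straight line; `lam = 0` gives the unpenalized spline regression.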

Local Feature Selection in Varying-Coefficient Models
Lan Xue1, Xinxin Shu2, Peibei Shi2, Colin Wu3 and Annie Qu2

1Oregon State University
2University of Illinois at Urbana-Champaign
3National Heart, Lung, and Blood Institute
xuel@stat.oregonstate.edu
We propose new varying-coefficient model selection and estimation based on the spline approach, which is capable of capturing time-dependent covariate effects. The new penalty function utilizes local-region information for varying-coefficient estimation, in contrast to the traditional model selection approach focusing on the entire region. The proposed method is extremely useful when the signals associated with relevant predictors are time-dependent, and detecting relevant covariate effects in the local region is more scientifically relevant than in the entire region. However, this brings challenges in theoretical development, due to the large-dimensional parameters involved in the nonparametric functions to capture the local information, in addition to computational challenges in solving optimization problems with overlapping parameters for different local-region penalization. We provide the asymptotic theory of model selection consistency on detecting local signals and establish the optimal convergence rate for the varying-coefficient estimator. Our simulation studies indicate that the proposed model selection incorporating local features outperforms the global feature model selection approaches. The proposed method is also illustrated through a longitudinal growth and health study from the National Heart, Lung, and Blood Institute.

Session 63 Statistical Challenges and Development in Cancer Screening Research

Overdiagnosis in Breast and Prostate Cancer Screening: Concepts, Methods and Challenges
Ruth Etzioni, Roman Gulati and Jing Xia
Fred Hutchinson Cancer Research Center
retzioni@fhcrc.org
Overdiagnosis occurs when a tumor is detected by screening but, in the absence of screening, that tumor would never have become symptomatic within the lifetime of the patient. Thus, an overdiagnosed tumor is a true extra diagnosis due solely to the existence of the screening test. Patients who are overdiagnosed cannot, by definition, be helped by the diagnosis, but they can be harmed, particularly if they are treated. Therefore, knowledge of the likelihood that a screen-detected cancer has been overdiagnosed is critical for making treatment decisions and developing screening policy. The problem of overdiagnosis has long been recognized in the case of prostate cancer and is currently an area of extreme interest in breast cancer. Published estimates of the frequency of overdiagnosis in breast and prostate cancer screening vary greatly. This presentation will investigate why different studies yield such different results. I'll explain how overdiagnosis arises and catalog the different ways it may be measured in population studies. I'll then discuss different approaches that are used to estimate overdiagnosis. Many studies use excess incidence under screening, relative to incidence without screening, as a proxy for overdiagnosis. Others use statistical models to make inferences about lead time or disease natural history, and then derive the corresponding fraction of cases that are overdiagnosed. Each approach has its limitations and challenges, but one thing is clear: the estimation approach is a major factor behind the variation in overdiagnosis estimates in the literature. I will conclude with a list of key questions that consumers of overdiagnosis studies should ask to determine the validity (or lack thereof) of study results.
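The excess-incidence proxy mentioned in the talk is simple arithmetic: the difference between cumulative incidence with and without screening, expressed as a fraction of incidence under screening. A sketch with made-up figures:

```python
# Excess-incidence proxy for overdiagnosis: (incidence with screening
# minus incidence without) / incidence with screening. The numbers
# below are hypothetical, purely for illustration.

def excess_incidence_fraction(inc_screened, inc_unscreened):
    return (inc_screened - inc_unscreened) / inc_screened

# e.g. 120 vs 100 cases per 100,000 person-years (made-up figures)
frac = excess_incidence_fraction(120.0, 100.0)
```

As the talk notes, this proxy is only valid once incidence has stabilized; during the ramp-up of a screening program, lead time alone inflates the excess.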

Estimation of Biomarker Growth in a Screening Study
Lurdes Y.T. Inoue1, Roman Gulati2 and Ruth Etzioni2
1University of Washington
2Fred Hutchinson Cancer Research Center
linoue@uw.edu
With the growing importance of biomarker-based tests for early detection and monitoring of chronic diseases, the question of how best to utilize biomarker measurements is of tremendous interest; the answer requires understanding the biomarker growth process. Prospective screening studies offer an opportunity to investigate biomarker growth while simultaneously assessing its value for early detection. However, since disease diagnosis usually terminates collection of biomarker measurements, proper estimation of biomarker growth in these studies may need to account for how screening affects the length of the observed biomarker trajectory. In this talk, we

compare estimation of biomarker growth from prospective screening studies using two approaches: a retrospective approach that only models biomarker growth, and a prospective approach that jointly models biomarker growth and time to screen detection. We assess the performance of the two approaches in a simulation study and using empirical prostate-specific antigen data from the Prostate Cancer Prevention Trial. We find that the prospective approach, which accounts for informative censoring, often produces similar results but may produce different estimates of biomarker growth in some contexts.

Estimating Screening Test Effectiveness when Screening Indication is Unknown
Rebecca Hubbard
Group Health Research Institute
hubbardr@ghc.org

Understanding the effectiveness of cancer screening tests is challenging when the same test is used for screening and also for disease diagnosis in symptomatic individuals. Estimates of screening test effectiveness based on data that include both screening and diagnostic examinations will be biased. Moreover, in many cases gold standard information on the indication for the examination is not available. Models exist for predicting the probability that a given examination was used for a screening purpose, but no previous research has investigated appropriate statistical methods for utilizing these probabilities. In this presentation, we will explore alternative methods for incorporating predicted probabilities of screening indication into analyses of screening test effectiveness. Using simulation studies, we compare the bias and efficiency of alternative approaches. We also demonstrate the performance of each method in a study of colorectal cancer screening with colonoscopy. Methods for estimating regression model parameters associated with an unknown categorical predictor, such as indication for examination, have broad applicability in studies of cancer screening and other studies using data from electronic health records.

Developing Risk-Based Screening Guidelines: "Equal Management of Equal Risks"
Hormuzd Katki
National Cancer Institute
katkih@mail.nih.gov

The proliferation of disease risk calculators has not led to a proliferation of risk-based screening guidelines. The focus of risk-based screening guidelines is connecting risk stratification under the natural history of disease (without intervention) to "benefit stratification": whether the risk stratification better distinguishes people who have high benefit vs. low benefit from a screening intervention. To link risk stratification to benefit stratification, we propose the principle of "equal management of people at equal risk of disease". When applicable, this principle leads to simplified and consistent management of people with different risk factors or test results leading to the same disease risk, people who might also have a similar benefit/harm profile. We describe two examples of our approach. First, we demonstrate how the "equal management of equal risks" principle was applied to thoroughly integrate HPV testing into the new risk-based cervical cancer screening guidelines, the first thoroughly risk-based US cancer screening guidelines. Second, we use risk of lung cancer death to estimate benefit stratification for targeting CT lung cancer screening. We show how we calculated benefit stratification for CT lung screening, as well as the analogous "harm stratification" and "efficiency stratification". We critically examine the limits of the "equal management of equal risks" principle. This approach of calculating benefit stratification and applying "equal management of equal risks" might be applicable in other settings, to help pave the way for developing risk-based screening guidelines.

Session 64 Recent Developments in the Visualization and Exploration of Spatial Data

Recent Advancements in Geovisualization, with a Case Study on Chinese Religions
Juergen Symanzik1 and Shuming Bao2

1Utah State University
2University of Michigan
symanzik@math.usu.edu
Producing high-quality map-based displays for economic, medical, educational, or any other kind of statistical data with geographic covariates has always been challenging. Either it was necessary to have access to high-end software, or one had to do a lot of detailed programming. Recently, R software for linked micromap (LM) plots has been enhanced to handle any available shapefiles from Geographic Information Systems (GIS). Also, enhancements have been made that allow for a fast overlay of various statistical graphs on Google maps. In this presentation, we provide an overview of the necessary steps to produce such graphs in R, starting with GIS-based data and shapefiles and ending with the resulting graphs in R. We will use data from a study on Chinese religions and society (provided by the China Data Center at the University of Michigan) as a case study for these graphical methods.

Spatial Analysis with China Geo-Explorers
Shuming Bao1, Miao Shui2 and Bing She2
1University of Michigan
2Wuhan University
sbao@umich.edu
With the rapid development of spatial and non-spatial databases of population, economy, society, and the natural environment from different sources, times, and formats, it has been a challenge to efficiently integrate those space-time data and methodology for spatial studies. This paper will discuss recent developments in spatial intelligence technologies and methodologies for spatial data integration and data analysis, as well as their applications for spatial studies. The presentation will introduce the newly developed spatial data explorers (China Geo-Explorer) distributed by the University of Michigan China Data Center. It will demonstrate how space-time data of different formats and sources can be integrated, visualized, analyzed, and reported in a web-based spatial system. Some applications in population and regional development, disaster assessment, environment and health, cultural and religious studies, and household surveys will be discussed for China and global studies. Future directions will be discussed finally.

Probcast: Creating and Visualizing Probabilistic Weather Forecasts
J. McLean Sloughter1, Susan Joslyn2, Patrick Tewson3, Tilmann Gneiting4 and Adrian Raftery2
1Seattle University
2University of Washington
3Bigger Boat Consulting
4University of Heidelberg
sloughtj@seattleu.edu
Probabilistic methods are becoming increasingly common for weather forecasting. However, communicating uncertainty information about spatial forecasts to users is not always a straightforward task. The Probcast project (http://probcast.com) looks to both

develop methodologies for spatial probabilistic weather forecasting and develop means of communicating this information effectively. This talk will discuss both the statistical approaches used to create forecasts and the cognitive psychology research used to find the best ways to clearly communicate statistical and probabilistic information.

Session 65 Advancement in Biostatistical Methods and Applications

Estimation of Time-Dependent AUC under Marker-Dependent Sampling
Xiaofei Wang and Zhaoyin Zhu
Duke University
xiaofeiwang@duke.edu
In the biomedical field, evaluating the accuracy of a biomarker predicting the onset of a disease or a disease condition is essential. When predicting the binary status of disease onset is of interest, the area under the ROC curve (AUC) is widely used. When predicting the time to an event is of interest, the time-dependent ROC curve (AUC(t)) can be used. In both cases, however, the simple random sampling (SRS) often used for biomarker validation is costly and requires a large number of patients. To improve study efficiency and reduce cost, marker-dependent sampling (MDS) has been proposed (Wang et al., 2012, 2013), in which the selection of patients for ascertaining their survival outcomes depends on the results of biomarker assays. In this talk, we will introduce a non-parametric estimator for the time-dependent AUC(t) under MDS. The consistency and asymptotic normality of the proposed estimator will be discussed. Simulation will be used to demonstrate the unbiasedness of the proposed estimator under MDS and the efficiency gain of MDS over SRS.
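For a binary disease status, the empirical AUC that AUC(t) generalizes is the fraction of concordant case-control marker pairs. A small pairwise-concordance sketch (illustrative data; this is the SRS building block, not the authors' MDS-weighted estimator):

```python
# Empirical AUC for a binary outcome: the proportion of (case, control)
# marker pairs in which the case's marker is larger, counting ties as
# one half. Illustrative sketch; not the MDS-adjusted AUC(t) estimator.

def empirical_auc(markers, labels):
    pos = [m for m, y in zip(markers, labels) if y == 1]
    neg = [m for m, y in zip(markers, labels) if y == 0]
    c = 0.0
    for p in pos:
        for n in neg:
            c += 1.0 if p > n else (0.5 if p == n else 0.0)
    return c / (len(pos) * len(neg))

a = empirical_auc([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0])   # perfect separation
```

Under marker-dependent sampling, each pair would additionally carry an inverse-probability-of-selection weight.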

A Measurement Error Approach for Modeling Accelerometer-Based Physical Activity Data
Julia Lee, Jing Song and Dorothy Dunloop
Northwestern University
jungwha-lee@northwestern.edu
Physical activity (PA) is a modifiable lifestyle factor for many chronic diseases, with established health benefits. PA outcomes using accelerometers are measured and assessed in many studies, but there are limited statistical methods for analyzing accelerometry data. We describe a measurement error modeling approach to estimate the distribution of habitual physical activity and the sources of variation in accelerometer-based physical activity data from a sample of adults with, or at risk of, knee osteoarthritis. We model both the intra- and inter-individual variability in measured physical activity. Our model allows us to account and adjust for measurement errors, biases, and other sources of intra-individual variation.

Real-Time Prediction in Clinical Trials: A Statistical History of REMATCH
Daniel F. Heitjan and Gui-shuang Ying
University of Pennsylvania
dheitjan@upenn.edu
Randomized clinical trials often include one or more planned interim analyses, during which an external monitoring committee reviews the accumulated data and determines whether it is scientifically and ethically appropriate for the study to continue. With survival-time endpoints, it is often desirable to schedule the interim analyses at the times of occurrence of specified landmark events, such as the 50th event, the 100th event, and so on. Because the timing of such events is random and the interim analyses impose considerable logistical burdens, it is worthwhile to predict the event times as accurately as possible. Prediction methods available prior to 2001 used data only from previous trials, which are often of questionable relevance to the trial for which one wishes to make predictions. With modern data management systems, it is often feasible to use data from the trial itself to make these predictions, rendering them far more reliable. This talk will describe work that some colleagues and students and I have done in this area. I will set the methodologic development in the context of the trial that motivated our work: REMATCH, a randomized clinical trial of a heart assist device that ran from 1998 to 2001 and was considered one of the most rigorous and expensive device trials ever conducted.

An Analysis of Microarray Data with Batch Effects
Dongseok Choi, William O. Cepurna, John C. Morrison, Elaine C. Johnson, Stephen R. Planck and James T. Rosenbaum
Oregon Health & Science University
choid@ohsu.edu
Normalization is considered an important step before any statistical analyses in microarray studies. Many methods have been proposed over the last decade or so, for example global normalization, local regression based methods, and quantile normalization. Normalization methods typically remove systematic biases across arrays and have been shown to be quite effective in removing them when the arrays were processed simultaneously in a batch. It is, however, reported that they sometimes do not remove differences between batches when microarrays are split into several experiments over time. In this presentation, we will explore potential approaches that could adjust for batch effects by using traditional methods and methods developed as a secondary normalization.
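Quantile normalization, one of the methods mentioned, replaces each array's value at a given rank with the across-array mean of the values at that rank, so all arrays end up with an identical marginal distribution. A minimal numpy sketch (illustrative toy matrix; ties are broken by index order):

```python
# Quantile normalization sketch: force every array (column) to share
# the same empirical distribution, namely the mean quantile profile
# across arrays. Toy genes x arrays matrix; ignores tie handling.

import numpy as np

def quantile_normalize(X):
    """X: genes x arrays matrix; returns the quantile-normalized matrix."""
    order = np.argsort(X, axis=0)            # within-array ranking
    ref = np.sort(X, axis=0).mean(axis=1)    # mean value at each rank
    out = np.empty_like(X, dtype=float)
    for j in range(X.shape[1]):
        out[order[:, j], j] = ref            # value at rank r becomes ref[r]
    return out

X = np.array([[5.0, 4.0, 3.0],
              [2.0, 1.0, 4.0],
              [3.0, 4.0, 6.0],
              [4.0, 2.0, 8.0]])
Xn = quantile_normalize(X)
```

After normalization, every column's sorted values equal the reference profile, which is why residual batch differences must come from rank changes rather than scale.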

Session 66 Analysis of Complex Data

Integrating Data from Heterogeneous Studies Using Only Summary Statistics: Efficiency and Robustness
Min-ge Xie
Rutgers University
mxie@stat.rutgers.edu
Heterogeneous studies arise often in applications, due to different study and sampling designs, populations, or outcomes. Sometimes these studies have common hypotheses or parameters of interest. We can synthesize evidence from these studies to make inference for the common hypotheses or parameters of interest. For heterogeneous studies, some of the parameters of interest may not be estimable for certain studies, and in such a case these studies are typically excluded in conventional methods. The exclusion of part of the studies can lead to a non-negligible loss of information. This paper introduces a data integration method for heterogeneous studies by combining the confidence distributions derived from the summary statistics of individual studies. It includes all the studies in the analysis and makes use of all information, direct as well as indirect. Under a general likelihood inference framework, this new approach is shown to have several desirable properties, including: i) it is asymptotically as efficient as the maximum likelihood approach using individual participant data (IPD) from all studies; ii) unlike the IPD analysis, it suffices to use summary statistics to carry out our approach, and individual-level data are not required; and iii) it is robust against misspecification of the working covariance structure of the parameter estimates. All the properties of the proposed approach are further confirmed by data simulated from a randomized clinical trials setting, as well as by real data on aircraft landing performance. (Joint work with Dungang Liu and Regina Liu.)

A Markov Modulated Poisson Model for Reliability Data
Joshua Landon1, Suleyman Ozekici2 and Refik Soyer1
1George Washington University
2Koc University
jlandon@gwu.edu
In this presentation we will consider a latent Markov process governing the intensity rate of a Poisson process model for failure data. The latent process enables us to infer the performance of the debugging operation over time and allows us to deal with the imperfect debugging scenario. We develop the Bayesian inference for the model and also introduce a method to infer the unknown dimension of the Markov process. We will illustrate the implementation of our model and the Bayesian approach by using actual software failure data.
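For intuition, failure times from a Markov-modulated Poisson process can be simulated directly. The two-state chain and the rate values below are illustrative assumptions, not the model fitted in the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_mmpp(rates, switch, t_end):
    """Simulate event times of a two-state Markov-modulated Poisson process:
    while the latent chain sits in state k, events occur at Poisson rate
    rates[k]; the chain leaves state k after an Exponential(switch[k]) sojourn."""
    t, state, events = 0.0, 0, []
    while t < t_end:
        sojourn = rng.exponential(1.0 / switch[state])
        seg_end = min(t + sojourn, t_end)
        n = rng.poisson(rates[state] * (seg_end - t))    # events in this segment
        events.extend(np.sort(rng.uniform(t, seg_end, n)))
        t, state = seg_end, 1 - state                    # jump to the other state
    return np.array(events)

times = simulate_mmpp(rates=[5.0, 0.5], switch=[1.0, 1.0], t_end=10.0)
```

The inferential problem in the talk runs in the opposite direction: given only the event times, recover the latent intensity path (and, here, the unknown number of states).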

A Comparison of Two Approaches for Acute Leukemia Patient Classification
Jingjing Wu1, Guoqiang Chen2 and Zeny Feng3
1University of Calgary
2Enbridge Pipelines
3University of Guelph
jinwu@ucalgary.ca
The advancement of microarray technology has greatly facilitated research in gene expression based classification of patient samples. For example, in cancer research, microarray gene expression data have been used for cancer or tumor classification. When the study focuses on only two classes, for example two different cancer types, we propose a two-sample semiparametric model for the distributions of gene expression levels in the different classes. To estimate the parameters, we consider both the maximum semiparametric likelihood estimate (MLE) and the minimum Hellinger distance estimate (MHDE). For each gene, a Wald statistic is constructed based on either the MLE or the MHDE, and a significance test is then performed. We exploit the idea of a weighted sum of misclassification rates to develop a novel classification model in which only previously identified significant genes are involved. To demonstrate the usefulness of our proposed method, we consider a predictive approach: we apply our method to the acute leukemia data of Golub et al. (1999), in which a training set is used to build the classification model and a testing set is used to evaluate its accuracy.

On the Consistency and Covariate Selections for Varying-Coefficient Deming Regressions
Ying Lu1, Chong Gu2, Bo Fan3, Selwyn Au4, Valerie McGuire1 and John Shepherd3
1VA Palo Alto Health Care System & Stanford University
2Purdue University
3University of California at San Francisco
4VA Palo Alto Health Care System
yinglu@va.gov
Although Deming regression (DR) has been successfully used to establish cross-calibration (CC) formulas for bone mineral densities (BMD) between manufacturers at several anatomic sites, it failed for CC of whole body BMD because their relationship varies with subject's weight, total fat, and lean mass. We proposed to use a new varying-coefficient DR (VCDR) that allows the intercept and slope to be non-linear functions of covariates, and applied this new model successfully to derive a consistent calibration formula for the new whole body BMD data. Our results showed this VCDR effectively removed all systematic bias in previous work. In this talk we will discuss the consistency of the calibration formula and procedures for covariate selections.

2014 ICSA-KISS Applied Statistics Symposium, Portland, June 15-18

Session 67 Statistical Issues in Co-development of Drug and Biomarker

Group Sequential Designs for Developing and Testing Biomarker-Guided Personalized Therapies in Comparative Effectiveness Research
Tze Leung Lai1, Olivia Yueh-Wen Liao2 and Dong Woo Kim3
1Stanford University
2Onyx Pharmaceuticals
3Microsoft Corporation
dwkim88@stanford.edu
Biomarker-guided personalized therapies offer great promise to improve drug development and patient care, but also pose difficult challenges in designing clinical trials for the development and validation of these therapies. We first review the existing approaches, briefly for clinical trials in new drug development and in more detail for comparative effectiveness trials involving approved treatments. We then introduce new group sequential designs to develop and test personalized treatment strategies involving approved treatments.

Adaptive Enrichment Designs for Clinical Trials
Noah Simon1 and Richard Simon2
1University of Washington
2National Institutes of Health
nrsimon@u.washington.edu
Many difficult-to-treat diseases are actually a heterogeneous collection of similar syndromes with potentially different causal mechanisms. New molecules attack pathways that are dysregulated in only a subset of this collection, and so are expected to be effective for only a subset of patients with the disease. Often this subset is not well understood until well into large-scale clinical trials. As such, standard practice has been to enroll a broad range of patients and run post-hoc subset analyses to determine those who may particularly benefit. This unnecessarily exposes many patients to hazardous side effects and may vastly decrease the efficiency of the trial (especially if only a small subset benefits). In this talk I will discuss a class of adaptive enrichment designs, which allow the eligibility criteria of a trial to be adaptively updated during the trial, restricting entry to only patients likely to benefit from the new treatment. These designs control type I error and can substantially increase power. I will also discuss and illustrate strategies for effectively building and evaluating biomarkers in this framework.

An Adaptive Single-Arm Phase II Design with Co-primary Objectives to Evaluate Activity Overall and in Relation to a Biomarker-Defined Subgroup
Michael Wolf
Amgen Inc.
michaelwolf@amgen.com
Roberts (Clin Cancer Res, 2011) presented a single-arm two-stage adaptive design to evaluate response overall and in one or more biomarker-defined subgroups, where biomarkers are only determined for responders. While this design has obvious practical advantages, the proposed testing strategy does not provide robust control of false-positive error. Modified futility and testing strategies based on marginal probabilities are proposed to achieve the same design objectives and are shown to be more robust; a trade-off, however, is that biomarkers must be determined for all subjects. Clinical examples of design setup and analysis are illustrated with a fixed subgroup size that reflects its expected prevalence in the intended use population, based on a validated in vitro companion diagnostic. Design efficiency and external validity are compared to testing for a difference in complement biomarker subgroups. Possible generalizations of the design to a data-dependent subgroup size (e.g., biomarker value > sample median) and multiple subgroups are discussed.

Biomarker Threshold Estimation to Predict Clinical Benefit: What Can Reasonably be Learned During Early (Ph I/II) Oncology Development
Thomas Bengtsson
Genentech Inc.
thomasgb@gene.com
A key goal during early clinical co-development of a new therapeutic and a biomarker is to determine the "diagnostic positive group", i.e., to identify a sub-group of patients likely to receive a clinically meaningful treatment benefit. We show that, based on a typically sized Ph1/Ph2 study with number of events < 100, accurate biomarker threshold estimation with time-to-event data is not a realistic goal. Instead, we propose to hierarchically test for treatment effects in pre-determined patient subsets most likely to benefit clinically. We illustrate our method with data from a recent lung cancer trial.

Session 68 New Challenges for Statistical Analyst/Programmer

Similarities and Differences in Statistical Programming among CRO and Pharmaceutical Industries
Mark Matthews
inVentiv Health Clinical
mrkmtthws@yahoo.com
Statistical programming in the clinical environment offers a wide range of opportunities across the clinical drug development cycle. Whether you are employed by a Contract Research Organization, a Pharmaceutical or Biotechnology company, or as a contractor, the programming tasks are often quite similar, and at times the work cannot be differentiated by employer. However, the higher level strategies and the direction any organization takes as an enterprise can be an important factor in the fulfillment of a statistical programmer's career. The author would like to share his experiences with the differences and similarities that a clinical statistical programmer can encounter in their career, and also provide some useful tips on how to best collaborate when working with peer programmers from different industries.

Computational Aspects for Detecting Safety Signals in Clinical Trials
Jyoti Rayamajhi
Eli Lilly and Company
rayamajhi_jyoti@lilly.com
It is always a challenge to detect safety signals from adverse event (AE) data in clinical trials, which is a critical task in any drug development program. In any trial it is very desirable to describe and understand the safety of the compound to the fullest possible extent. The MedDRA coding scheme, e.g., System Organ Class (SOC) and Preferred Term (PT), is used in safety analyses and is hierarchical in nature. Bayesian hierarchical models are used to predict posterior probabilities and to account for the expectation that AEs in the same SOC are more likely to be similar, so that they can sensibly borrow strength from each other. The model also allows, but does not impose, borrowing strength across SOCs, depending on the actual data. It is interesting to see comparative analyses between the frequentist approach and an alternative Bayesian methodology in detecting safety signals in clinical trials. Fitting these hierarchical models is computationally complex and challenging. Data from studies were used to fit three Bayesian logistic regression hierarchical models. Model selection is achieved by using the Deviance Information Criterion (DIC). Models and plots were implemented using BRugs, R2WinBUGS, and JAGS. A scheme for meta-analysis based on a hierarchical three-stage Bayesian mixture model is also implemented and will be discussed. A user-friendly and fully-functional web interface for safety signal detection using Bayesian meta-analysis and a general three-stage hierarchical mixture model will be described. Keywords: System Organ Class, Preferred Term, Deviance Information Criterion, hierarchical models, mixture model

Bayesian Network Meta-Analysis Methods: An Overview and a Case Study
Baoguang Han1, Wei Zou2 and Karen Price1
1Eli Lilly and Company
2inVentiv Clinical Health
han_baoguang@lilly.com
Evidence-based health-care decision making requires comparing all relevant competing interventions. In the absence of direct head-to-head comparisons of different treatments, network meta-analysis (NMA) is increasingly used for selecting the best treatment strategy for a health care intervention. The Bayesian approach offers a flexible framework for NMA, in part due to its ability to propagate the parameter correlation structure and provide straightforward probability statements around the parameters of interest. In this talk we will provide a general overview of Bayesian NMA models, including consistency models, network meta-regression, and inconsistency checking using node-splitting techniques. Then we will illustrate how an NMA analysis can be performed with a detailed case study, and provide some details on available software as well as various graphical and textual outputs that can be readily understood and interpreted by clinicians.

Session 69 Adaptive and Sequential Methods for Clinical Trials

Bayesian Data Augmentation Dose Finding with Continual Reassessment Method and Delayed Toxicities
Ying Yuan1, Suyu Liu1 and Guosheng Yin2
1University of Texas MD Anderson Cancer Center
2University of Hong Kong
yyuan@mdanderson.org
A major practical impediment when implementing adaptive dose-finding designs is that the toxicity outcome used by the decision rules may not be observed shortly after the initiation of treatment. To address this issue, we propose the data augmentation continual reassessment method (DA-CRM) for dose finding. By naturally treating the unobserved toxicities as missing data, we show that such missing data are nonignorable, in the sense that the missingness depends on the unobserved outcomes. The Bayesian data augmentation approach is used to sample both the missing data and model parameters from their posterior full conditional distributions. We evaluate the performance of the DA-CRM through extensive simulation studies and also compare it with other existing methods. The results show that the proposed design satisfactorily resolves the issues related to late-onset toxicities and possesses desirable operating characteristics: it treats patients more safely and selects the maximum tolerated dose with a higher probability.

Optimal Marker-Strategy Clinical Trial Design to Detect Predictive Markers for Targeted Therapy
Yong Zang, Suyu Liu and Ying Yuan
University of Texas MD Anderson Cancer Center
yzang1@mdanderson.org
In developing targeted therapy, the marker-strategy design provides an important approach to evaluate the predictive marker effect. This design first randomizes patients into non-marker-based or marker-based strategies. Patients allocated to the non-marker-based strategy are then further randomized to receive either the standard or targeted treatment, while patients allocated to the marker-based strategy receive treatments based on their marker statuses. The predictive marker effect is tested by comparing the treatment outcome between the two strategies. In this talk, we show that such a between-strategy comparison has low power to detect the predictive effect and is valid only under the restrictive condition that the randomization ratio within the non-marker-based strategy matches the marker prevalence. To address these issues, we propose a Wald test that is generally valid and also uniformly more powerful than the between-strategy comparison. Based on that, we derive an optimal marker-strategy design that maximizes the power to detect the predictive marker effect by choosing the optimal randomization ratios between the two strategies and treatments. Our numerical study shows that using the proposed optimal designs can substantially improve the power of the marker-strategy design to detect the predictive marker effect.

Dynamic Prediction of Time to Relapse Using Longitudinal Biomarker Data
Xuelin Huang1, Jing Ning1 and Sangbum Choi2
1University of Texas MD Anderson Cancer Center
2University of Texas Health Science Center at Houston
xlhuang@mdanderson.org
As time goes by, more and more data are observed for each patient. Dynamic prediction keeps making updated predictions of disease prognosis using all the available information. This work is motivated by the need for real-time monitoring of the disease progress of chronic myeloid leukemia patients using their BCR-ABL gene expression levels measured during follow-up visits. We provide real-time dynamic prediction of future prognosis using a series of marginal Cox proportional hazards models over continuous time, with constraints. Compared with separate landmark analyses at different discrete time points after treatment, our approach achieves smoother and more robust predictions. Compared with approaches that jointly model longitudinal biomarkers and survival, our approach does not need to specify a model for the changes of the monitoring biomarkers, and thus avoids any kind of imputation of biomarker values at time points where they are not available. This helps eliminate the potential bias introduced by mis-specified models for longitudinal biomarkers.

Continuous Tumor Size Change Percentage and Progression Free Survival as Endpoints of the First and Second Stage, Respectively, in a Novel Double Screening Phase II Design
Ye Cui1, Zhibo Wang1, Yichuan Zhao1 and Zhengjia Chen2
1Georgia State University
2Emory University
cathysaiyo@gmail.com
A phase II trial is an expedited and low-cost trial to screen potentially effective agents for a following phase III trial. Unfortunately, the positive rate of Phase III trials is still low although agents have been determined to be effective in preceding Phase II trials, mainly because different endpoints are used in Phase II (tumor response) and III (survival) trials. Good disease response often leads to, but cannot guarantee, better survival. From a statistical standpoint, transformation of continuous tumor size change into a categorical tumor response (complete response, partial response, stable disease, or progressive disease) according to World Health Organization (WHO) or Response Evaluation Criteria In Solid Tumors (RECIST) criteria results in a loss of study power. Tumor size change can be obtained rapidly, but survival estimation requires long-term follow-up. We propose a novel double screening phase II design in which tumor size change percentage is used in the first stage to select potentially effective agents rapidly for the second stage, in which progression free or overall survival is estimated to confirm the efficacy of agents. The first screening can fully utilize all tumor size change data and minimize the cost and length of the trial by stopping it when agents are determined to be ineffective based on a low standard, and the second screening can substantially increase the success rate of the following Phase III trial by using similar or the same outcomes and a high standard. Simulation studies are performed to optimize the significance levels of the two screening stages and to compare the design's operating characteristics with Simon's two-stage design. ROC analysis is applied to estimate the success rate in the follow-up Phase III trials.

Session 70 Survival Analysis

Comparison of Hazard Rate and Odds Ratio in the Two-Sample Survival Problem
Benedict Dormitorio and Joshua Naranjo
Western Michigan University
benedictpdormitorio@wmich.edu
Cox proportional hazards seems to be the standard statistical method for analyzing treatment efficacy when time-to-event data are available. In the absence of time-to-event data, investigators may use logistic regression, which does not require time-to-event, or Poisson regression, which requires only interval-summarized frequency tables of time-to-event. We investigate the relative performance of the three methods. In particular, we compare the power of tests based on the respective effect-size estimates: (1) hazard ratio, (2) odds ratio, and (3) rate ratio. We use a variety of survival distributions and cut-off points representing length of study. The results have implications for study design: for example, under what conditions might we recommend a simpler design based only on event frequencies instead of measuring time-to-event, and what length of study is recommended.

Predicting the Event Time in Multicenter Clinical Trials with Time-to-Event Outcome as Primary Endpoint
Nibedita Bandyopadhyay
Janssen Research & Development
nbandyop@its.jnj.com
Interim analyses are widely used in Phase II and III clinical trials, and the efficiency of the drug development process can be improved by using them. In clinical trials with time to an event as the primary endpoint, it is common to plan the interim analyses at pre-specified numbers of events. Performing these analyses at times with a different number of events than planned may impact the trial's credibility as well as the statistical properties of the interim analysis. On the other hand, significant resources are required to conduct such analyses. Therefore, for logistic planning purposes, it is very important to predict the timing of the pre-specified number of events early and accurately. A statistical technique for making such predictions in ongoing multicenter clinical trials is developed. Results are illustrated for different scenarios using simulations.

Empirical Comparison of Small Sample Performance for the Logrank Test and Resampling Methods with High Censoring Rates
Yu Deng and Jianwen Cai
University of North Carolina at Chapel Hill
yudeng@live.unc.edu
The logrank test is commonly used for comparing survival distributions between treatment and control groups. When the censoring rate is low and the sample size is moderate, the approximation based on the asymptotic normal distribution of the logrank test works well in finite samples. However, in some studies the sample size is small (e.g., 10-20 per group) and the censoring rate is high (e.g., 0.8-0.9). Under such conditions, we conduct a series of simulations to compare the performance of the logrank test based on the normal approximation, permutation, and the bootstrap. In general, the type I error rate based on the bootstrap test is slightly inflated when the number of failures is larger than 2, while the logrank test based on the normal approximation has a type I error around 0.05 and the permutation test is relatively conservative. However, when there is only one failure per group, the type I error of the permutation test is closer to 0.05 than that of the other two tests.
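A minimal sketch of the permutation version of the two-sample logrank test, the small-sample alternative compared above (a toy implementation for illustration; the simulation settings of the talk are not reproduced):

```python
import numpy as np

def logrank_stat(time, event, group):
    """Two-sample logrank chi-square statistic (1 df)."""
    O = E = V = 0.0
    for t in np.unique(time[event == 1]):
        at_risk = time >= t
        n, n1 = at_risk.sum(), (at_risk & (group == 1)).sum()
        d = ((time == t) & (event == 1)).sum()
        d1 = ((time == t) & (event == 1) & (group == 1)).sum()
        O, E = O + d1, E + d * n1 / n
        if n > 1:
            V += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return (O - E) ** 2 / V if V > 0 else 0.0

def permutation_pvalue(time, event, group, B=1000, seed=0):
    """Permutation p-value: shuffle group labels, which avoids relying on
    the asymptotic chi-square approximation in small samples."""
    rng = np.random.default_rng(seed)
    obs = logrank_stat(time, event, group)
    perm = [logrank_stat(time, event, rng.permutation(group)) for _ in range(B)]
    return float(np.mean(np.array(perm) >= obs))
```

With 10-20 subjects per group and heavy censoring, the permutation reference distribution is cheap to compute exactly or by Monte Carlo, which is why it is a natural competitor to the normal approximation.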

Session 71 Complex Data Analysis: Theory and Application

Supervised Singular Value Decomposition and Its Asymptotic Properties
Gen Li1, Dan Yang2, Haipeng Shen1 and Andrew Nobel1
1University of North Carolina at Chapel Hill
2Rutgers University
haipeng@email.unc.edu
We develop a supervised singular value decomposition (SupSVD) model for supervised dimension reduction. The research is motivated by applications where the low rank structure of the data of interest is potentially driven by additional variables measured on the same set of samples. The SupSVD model can make use of the information in the additional data to accurately extract underlying structures that are more interpretable. The model is very general and includes the principal component analysis model and the reduced rank regression model as two extreme cases. We formulate the model in a hierarchical fashion using latent variables and develop a modified expectation-maximization algorithm for parameter estimation, which is computationally efficient. The asymptotic properties of the estimated parameters are derived. We use comprehensive simulations and two real data examples to illustrate the advantages of the SupSVD model.
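To fix ideas, the unsupervised extreme of this model family is plain SVD/PCA. The toy setup below is an assumed data-generating scheme (not the SupSVD algorithm itself): a rank-one score is partly driven by a supervision variable y, and the leading singular vector of the data recovers that score; SupSVD would additionally exploit the regression of the score on y.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 20
y = rng.normal(size=(n, 1))                   # supervision variable
U = 0.8 * y + 0.6 * rng.normal(size=(n, 1))   # latent score partly driven by y
V = rng.normal(size=(p, 1))                   # loading vector
X = U @ V.T + 0.5 * rng.normal(size=(n, p))   # observed data: low rank + noise

# the unsupervised extreme: leading left singular vector of centered X
u1 = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)[0][:, 0]
corr = abs(np.corrcoef(u1, U[:, 0])[0, 1])    # how well plain SVD recovers the score
```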

New Methods for Interaction Selection
Ning Hao1, Hao Helen Zhang1 and Yang Feng2
1University of Arizona
2Columbia University
nhao@math.arizona.edu
It is a challenging task to identify interaction effects in high dimensional data. The main difficulties lie in both computational and theoretical aspects. We propose a new framework for interaction selection. Efficient computational algorithms based on both forward selection and penalization approaches are illustrated.

A Statistical Approach to Set Classification by Feature Selection with Applications to Classification of Histopathology Images
Sungkyu Jung1 and Xingye Qiao2
1University of Pittsburgh
2Binghamton University, State University of New York
qiao@math.binghamton.edu
Set classification problems arise when classification tasks are based on sets of observations as opposed to individual observations. In set classification, a classification rule is trained with N sets of observations, where each set is labeled with class information, and the prediction of a class label is also performed with a set of observations. Data sets for set classification appear, for example, in the diagnosis of disease based on multiple cell nucleus images from a single tissue. Relevant statistical models for set classification are introduced, which motivate a set classification framework based on context-free feature extraction. By understanding a set of observations as an empirical distribution, we employ a data-driven method to choose those features which contain information on location and major variation. In particular, principal component analysis is used to extract the features of major variation. Multidimensional scaling is used to represent features as vector-valued points, on which conventional classifiers can be applied. The proposed set classification approaches achieve better classification results than competing methods in a number of simulated data examples. The benefits of our method are demonstrated in an analysis of histopathology images of cell nuclei related to liver cancer.

A Smoothing Spline Model for Analyzing dMRI Data of Swallowing
Binhuan Wang, Ryan Branski, Milan Amin and Yixin Fang
New York University
yixinfang@nyumc.org
Swallowing disorders are common and have a significant health impact. Dynamic magnetic resonance imaging (dMRI) is a novel technique for visualizing the pharynx and upper esophageal segment during a swallowing process. We develop a smoothing spline method for analyzing swallowing dMRI data, and apply the method to a dataset obtained from an experiment conducted at the NYU Voice Center.
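As a generic illustration of smoothing-spline fitting (a synthetic trajectory, not the authors' dMRI data or their specific model):

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 60)                  # normalized time through a swallow
truth = np.sin(2 * np.pi * t)                  # assumed smooth landmark trajectory
y = truth + rng.normal(0.0, 0.1, t.size)       # noisy measurements

# cubic smoothing spline; s ~ n * sigma^2 is the usual smoothing target
spline = UnivariateSpline(t, y, k=3, s=60 * 0.1 ** 2)
y_hat = spline(t)
```

The smoothing parameter `s` trades off fidelity against roughness; in practice it is chosen by cross-validation or a criterion such as GCV rather than the plug-in heuristic used here.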

Session 72 Recent Developments in Statistical Methods for Missing Data

Semiparametric Inference for Regression Analysis with Missing Covariates in Survey Data
Shu Yang and Jae-kwang Kim
Iowa State University
jkim@iastate.edu
We consider parameter estimation in parametric regression models with covariates missing at random in survey data. A semiparametric maximum likelihood approach is proposed which requires no parametric specification of the marginal covariate distribution. We obtain an asymptotic linear representation of the semiparametric maximum likelihood estimator (SMLE) using the theory of von Mises calculus and V-statistics, which allows a consistent estimator of the asymptotic variance. An EM-type algorithm for computation is discussed. We extend the methodology to general parameter estimation, where the parameter is not necessarily equal to the MLE. Simulation results suggest that the SMLE method is robust, whereas the parametric maximum likelihood method is subject to severe bias under model misspecification.

Multiple Robustness in Missing Data Analysis
Peisong Han1 and Lu Wang2
1University of Waterloo
2University of Michigan
peisonghan@uwaterloo.ca
When estimating the population mean of a response variable subject to ignorable missingness, we propose an estimator that is more robust than doubly robust estimators by weighting the complete cases using weights other than the inverse probability. We allow multiple models for both the propensity score and the outcome regression. Our estimator is consistent if any one of the multiple models is correctly specified. Such multiple robustness against model misspecification significantly improves over double robustness, which only allows one propensity score model and one outcome regression model. Our estimator attains the semiparametric efficiency bound when one propensity score model and one outcome regression model are correctly specified, without requiring knowledge of exactly which two are correct.

Imputation of Binary Variables with SAS and IVEware
Yi Pan and Riguang Song
United States Centers for Disease Control and Prevention
jnu5@cdc.gov
In practice, it is a challenge to impute missing values of binary variables. For a monotone missing pattern, imputation methods available in SAS include the LOGISTIC method, which uses logistic regression modeling, and the DISCRIM method, which only allows continuous variables in the imputation model. For an arbitrary missing pattern, a fully conditional specification (FCS) method is now available in SAS; this method only assumes the existence of a joint distribution for all variables. On the other hand, IVEware, developed by the University of Michigan Survey Research Center, uses a sequence of regression models and imputes missing values by drawing samples from posterior predictive distributions. We present results from a series of simulations designed to evaluate and compare the performance of the above mentioned imputation methods. An example imputing the BED recent status (recent or long-standing) in estimating HIV incidence is used to illustrate the application of those procedures.
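The logistic-regression imputation idea behind these procedures can be sketched in a few lines. This is a simplified illustration with toy data: a proper multiple-imputation run would also draw the coefficients from their approximate posterior before each imputation, as IVEware does.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 + 1.2 * x)))).astype(float)
miss = rng.random(n) < 0.3                    # 30% of the binary outcome missing
y_obs = y.copy()
y_obs[miss] = np.nan

def negloglik(beta, x, y):
    eta = beta[0] + beta[1] * x
    return np.sum(np.log1p(np.exp(eta)) - y * eta)   # negative logistic log-likelihood

keep = ~np.isnan(y_obs)
beta_hat = minimize(negloglik, np.zeros(2), args=(x[keep], y_obs[keep])).x

# impute by drawing from the fitted predictive distribution, not by rounding
p_mis = 1 / (1 + np.exp(-(beta_hat[0] + beta_hat[1] * x[miss])))
y_imp = y_obs.copy()
y_imp[miss] = rng.binomial(1, p_mis)
```

Drawing Bernoulli imputations (rather than plugging in 0/1 by thresholding) preserves the binary variable's variance, which is the point of posterior predictive imputation.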

Marginal Treatment Effect Estimation Using Pattern-Mixture Model
Zhenzhen Xu
United States Food and Drug Administration
zhenzhenxu@fda.hhs.gov
Missing data often occur in clinical trials. When the missingness depends on unobserved responses, a pattern-mixture model is frequently used. This model stratifies the data according to drop-out patterns and formulates a model for each pattern with pattern-specific parameters; the resulting marginal distribution of the response is a mixture of distributions over the missing data patterns. If the eventual interest is to estimate the overall treatment effect, one can calculate a weighted average of pattern-specific treatment effects, assuming that the treatment assignment is equally distributed across patterns. However, in practice this assumption is unlikely to hold, and as a result the weighted average approach is subject to bias. In this talk, we introduce a new approach to estimate the marginal treatment effect based on a random-effects pattern-mixture model for longitudinal studies with a continuous endpoint, relaxing the homogeneous distributional assumption on treatment assignment across missing data patterns. A simulation study shows that, under a missing not at random mechanism, the proposed approach can yield a substantial reduction in estimation bias and improvement in coverage probability compared to the weighted average approach. The proposed method is also compared with the linear mixed model and generalized estimating equation approaches under various missing data mechanisms.

Session 73 Machine Learning Methods for Causal Inference in Health Studies

Causal Inference of Interaction Effects with Inverse Propensity Weighting, G-Computation, and Tree-Based Standardization
Joseph Kang1, Xiaogang Su2, Lei Liu1 and Martha Daviglus3
1Northwestern University
2University of Texas at El Paso
3University of Illinois at Chicago
joseph-kang@northwestern.edu
Given the recent interest in subgroup-level studies and personalized medicine, health research with observational studies has turned to interaction effects of measured confounders. In estimating interaction effects, the inverse propensity weighting (IPW) method has been widely advocated despite the immediate availability of competing methods such as G-computation estimates. This talk compares the advocated IPW method, the G-computation method, and our new tree-based standardization method, which we call the Interaction effect Tree (IT). The IT procedure uses a likelihood-based decision rule to divide the subjects into homogeneous groups, within which G-computation can be applied. Our simulation studies indicate that the IT-based method, along with G-computation, works robustly, while the advocated IPW method needs some caution in its weighting. We applied the IT-based method to assess the effect of being overweight or obese on coronary artery calcification (CAC) in the Chicago Healthy Aging Study cohort.

Practice of Causal Inference with the Propensity of Being Zero or One
Joseph Kang1, Wendy Chan1, Mi-Ok Kim2 and Peter M. Steiner3
1Northwestern University
2University of Cincinnati / Cincinnati Children's Hospital Medical Center
3University of Wisconsin-Madison
wendychan2016@u.northwestern.edu
Causal inference methodologies have been developed over the past decade to estimate the unconfounded effect of an exposure under several key assumptions: the absence of unmeasured confounders, the independence of one study subject's effect from another's, and propensity scores being bounded away from zero and one (the positivity assumption). The first two assumptions have received much attention in the literature, yet the positivity assumption has been discussed in only a few recent papers. Propensity scores of zero or one are indicative of deterministic exposure, so that causal effects cannot be defined for such subjects; these subjects therefore need to be removed, because no comparable comparison groups can be found for them. In this paper, we evaluate and compare currently available causal inference methods in the context of the positivity assumption. We propose a tree-based method that can be easily implemented in R software. R code for the studies is available online.

Propensity Score and Proximity Matching Using Random Forest
Peng Zhao1, Xiaogang Su2 and Juanjuan Fan1

1San Diego State University
2University of Texas at El Paso
jjfan@mail.sdsu.edu
To reduce potential bias in observational studies, it is essential to have balanced distributions of all available background information between cases and controls. The propensity score has been a key matching variable in this area. However, this approach has several limitations, including difficulties in handling missing values, categorical variables, and interactions. Random forest, as an ensemble of many classification trees, is straightforward to use and can easily overcome these issues. Each classification tree in a random forest recursively partitions the available data set into subsets to increase the purity of the terminal nodes. Through this process, the cases and controls in the same terminal node automatically become the best-balanced matches. By averaging the outcome of each individual tree, random forest can provide robust and balanced matching results. The proposed method is applied to data from the National Health and Nutrition Examination Survey (NHANES).
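
The terminal-node idea generalizes to a forest-wide proximity: two subjects' proximity is the fraction of trees in which they land in the same leaf. A minimal sketch of proximity matching along these lines (not the authors' implementation; the simulated confounded data and scikit-learn forest settings are illustrative assumptions):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.RandomState(0)
n = 400
X = rng.normal(size=(n, 5))
# treatment depends on the first covariate, so cases/controls are imbalanced
z = (X[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(int)

rf = RandomForestClassifier(n_estimators=200, min_samples_leaf=10,
                            random_state=0).fit(X, z)
leaves = rf.apply(X)                  # (n_samples, n_trees) terminal-node ids

cases, controls = np.where(z == 1)[0], np.where(z == 0)[0]
# proximity(i, j) = share of trees where case i and control j share a leaf
prox = (leaves[cases][:, None, :] == leaves[controls][None, :, :]).mean(axis=2)
match = controls[prox.argmax(axis=1)]  # best-proximity control for each case

# matching should shrink the imbalance on the confounding covariate
gap_before = abs(X[cases, 0].mean() - X[controls, 0].mean())
gap_after = abs(X[cases, 0].mean() - X[match, 0].mean())
print(gap_before, gap_after)
```

The `rf.apply` call returns each subject's leaf index per tree, which is all the proximity computation needs.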

Session 74 JP Hsu Memorial Session

Weighted Least-Squares Method for Right-Censored Data in the Accelerated Failure Time Model
Lili Yu
Georgia Southern University
lyu@georgiasouthern.edu
The classical accelerated failure time (AFT) model has been extensively investigated due to its direct interpretation of covariate effects on the mean survival time in survival analysis. However, this classical AFT model and its associated methodologies are built on the fundamental assumption of data homoscedasticity. Consequently, when the homoscedasticity assumption is violated, as often seen in real applications, the estimators lose efficiency and the associated inference is not reliable. Furthermore, none of the existing methods can estimate the intercept consistently. To overcome these drawbacks, we propose a semiparametric approach in this paper for both homoscedastic and heteroscedastic data. This approach utilizes a weighted least-squares equation with synthetic observations weighted by the square roots of their variances, where the variances are estimated via local polynomial regression. We establish the limiting distributions of the resulting coefficient estimators and prove that both the slope parameters and the intercept can be consistently estimated. We evaluate the finite-sample performance of the proposed approach through simulation studies and demonstrate, through a real example, its superiority in efficiency and reliability over the existing methods when the data are heteroscedastic.

A Comparison of Size and Power of Tests of Hypotheses on Parameters Based on Two Generalized Lindley Distributions
Macaulay Okwuokenye
Biogen Idec
macaulayokwuokenye@biogenidec.com
Data (complete and censored) following the Lindley distribution are generated and analyzed using two generalized Lindley distributions, and maximum likelihood estimates of the parameters of the generalized Lindley distributions are obtained. Size and power of tests of hypotheses on the parameters are assessed, drawing on asymptotic properties of the maximum likelihood estimators. Results suggest that whereas the sizes of some of the tests of hypotheses based on

90 2014 ICSA-KISS Applied Statistics Symposium Portland June 15-18

Abstracts

the considered generalized distributions are essentially alpha-level, some are possibly not; the power of tests of hypotheses on the Lindley distribution parameter differs between the two distributions.

Session 75 Challenge and New Development in Model Fitting and Selection

Robust Estimates of Divergence Times and Selection with a Poisson Random Field Model
Amei Amei1 and Brian Tilston Smith2

1University of Nevada, Las Vegas
2American Museum of Natural History
amei.amei@unlv.edu
Mutation frequencies can be modeled as a Poisson random field (PRF) to estimate speciation times and the degree of selection on newly arisen mutations. This approach provides a quantitative theory for comparing intraspecific polymorphism with interspecific divergence in the presence of selection and can be used to estimate population genetic parameters. First, we modified a recently developed time-dependent PRF model to independently estimate genetic parameters from a nuclear and mitochondrial DNA data set of 22 sister pairs of birds that have diverged across a biogeographic barrier. We found that species that inhabit humid habitats had more recent divergence times, larger effective population sizes, and smaller selective effects than those that inhabit drier habitats, but overall the mitochondrial DNA was under weak selection. Our study indicates that PRF models are useful for estimating various population genetic parameters and serve as a framework for incorporating estimates of selection into comparative phylogeographic studies. Second, due to the built-in feature of the species divergence time, the time-dependent PRF model is especially suitable for estimating the selective effects of more recent mutations, such as mutations that have occurred in the human genome. By analyzing the estimated distribution of the selective coefficients at each individual gene, for example the sign and magnitude of the mean selection coefficient, we will be able to detect a gene or a group of genes that are related to the diagnosed cancer. Moreover, the estimate of the species divergence time will provide useful information regarding the occurrence time of the cancer.

On a Class of Maximum Empirical Likelihood Estimators Defined by Convex Functions
Hanxiang Peng and Fei Tan
Indiana University-Purdue University Indianapolis
ftan@math.iupui.edu
In this talk, we introduce a class of estimators defined by convex criterion functions and show that they are maximum empirical likelihood estimators (MELEs). We apply the results to obtain MELEs for quantiles, quantile regression, and Cox regression when additional information is available. We report some simulation results and real data applications.

Properties of the Marginal Survival Functions for Dependent Censored Data under an Assumed Archimedean Copula
Antai Wang
New Jersey Institute of Technology
aw224@njit.edu
Given dependent censored data (X, δ) = (min(T, C), I(T < C)) from an Archimedean copula model, we give general formulas for the possible marginal survival functions of T and C. Based on our formulas, we can easily establish the relationship between all these survival functions and derive some

useful identifiability results. Also based on our formulas, we propose a new estimator of the marginal survival function when the Archimedean copula model is assumed to be known. We derive bias formulas for our estimator and other existing estimators. Simulation studies have shown that our estimator is comparable with the copula-graphic estimator proposed by Zheng and Klein (1995) and Rivest and Wells (2001), and with Zheng and Klein's estimator (1994), under the Archimedean copula assumption. We end our talk with some discussions.
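
The kind of identity such formulas exploit, relating the survival function of the observed minimum X to the marginals through the copula generator, can be checked numerically. A sketch under an assumed Clayton (Archimedean) copula (the generator, exponential marginals, and Marshall-Olkin frailty sampler are illustrative choices, not the talk's estimator):

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 2.0                     # Clayton dependence parameter
m = 200000

# Marshall-Olkin sampler: a gamma frailty yields Clayton-copula uniforms
w = rng.gamma(1.0 / theta, size=m)
u = (1.0 + rng.exponential(size=m) / w) ** (-1.0 / theta)
v = (1.0 + rng.exponential(size=m) / w) ** (-1.0 / theta)
t = -np.log(u)                  # T ~ Exp(1),  S_T(x) = exp(-x)
c = -0.5 * np.log(v)            # C ~ Exp(2),  S_C(x) = exp(-2x)
x_obs = np.minimum(t, c)

# Archimedean identity: P(min(T,C) > x) = phi^{-1}(phi(S_T(x)) + phi(S_C(x)))
# with Clayton generator phi(s) = (s^{-theta} - 1) / theta
def phi(s):
    return (s ** -theta - 1.0) / theta

def phi_inv(val):
    return (1.0 + theta * val) ** (-1.0 / theta)

x0 = 0.4
pred = phi_inv(phi(np.exp(-x0)) + phi(np.exp(-2 * x0)))
emp = (x_obs > x0).mean()
print(round(pred, 3), round(emp, 3))
```

The empirical survival of the observed minimum agrees with the generator-based formula, which is the link the talk's marginal-survival formulas invert.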

Dual Model Misspecification in Generalized Linear Models with Error in Variables
Xianzheng Huang
University of South Carolina
huang@stat.sc.edu
We study maximum likelihood estimation of regression parameters in generalized linear models for a binary response with error-prone covariates when the distribution of the error-prone covariate or the link function is misspecified. We revisit the remeasurement method proposed by Huang, Stefanski and Davidian (2006) for detecting latent-variable model misspecification and examine its operating characteristics in the presence of link misspecification. Furthermore, we propose a new diagnostic method for assessing assumptions on the link function. Combining these two methods yields informative diagnostic procedures that can identify which model assumption is violated and also reveal the direction in which the true latent-variable distribution or the true link function deviates from the assumed one.

Session 76 Advanced Methods and Their Applications in Survival Analysis

Kernel Smoothed Profile Likelihood Estimation in the Accelerated Failure Time Frailty Model for Clustered Survival Data
Bo Liu1, Wenbin Lu1 and Jiajia Zhang2

1North Carolina State University
2University of South Carolina
jzhang@mailbox.sc.edu
Clustered survival data frequently arise in biomedical applications where the event times of interest are clustered into groups, such as families. In this article, we consider an accelerated failure time frailty model for clustered survival data and develop nonparametric maximum likelihood estimation for it via a kernel-smoother-aided EM algorithm. We show that the proposed estimator for the regression coefficients is consistent, asymptotically normal, and semiparametrically efficient when the kernel bandwidth is properly chosen. An EM-aided numerical differentiation method is derived for estimating its variance. Simulation studies evaluate the finite-sample performance of the estimator, and it is applied to the Diabetic Retinopathy data set.

Model-free Screening for Lifetime Data Analysis with Ultrahigh-dimensional Survival-Impacting Biomarkers
Jialiang Li1, Qi Zheng2 and Limin Peng2

1National University of Singapore
2Emory University
qizheng@emory.edu
Marginal regression-based ranking methods are widely adopted to screen ultrahigh-dimensional biomarkers in biomedical studies. An assumed regression model, however, may not fit real data in practice. We consider a model-free screening approach specifically for censored lifetime data outcomes, measuring the average survival differences


with and without the covariates. The proposed survival impacting index can be implemented with familiar nonparametric estimation procedures and avoids imposing any rigid model assumptions. We establish the sure screening property of the index and the asymptotic distribution of the estimated index to facilitate inferences. Simulations are carried out to assess the performance of our method. A lung cancer data set is analyzed as an illustration.

Analysis of Event History Data in Tuberculosis (TB) Screening
Joan Hu
Simon Fraser University
joanh@stat.sfu.ca
Tuberculosis (TB) is an infectious disease spread by the airborne route. An important public health intervention in TB prevention is tracing individuals (TB contacts) who may be at risk of having TB infection or active TB disease as a result of having shared air space with an active TB case. This talk presents an analysis of the data collected from 7,921 people identified as contacts in the TB registry of British Columbia, Canada, in an attempt to identify risk factors for TB development among TB contacts. Challenges encountered in the analysis include clustered subjects, covariates missing not at random (MNAR or NMAR), and a portion of subjects who potentially will never experience the event of TB.

On the Dependence Structure of Bivariate Recurrent Event Processes: Inference and Estimation
Jing Ning1, Yong Chen2, Chunyan Cai2, Xuelin Huang1 and Mei-Cheng Wang3

1University of Texas MD Anderson Cancer Center
2University of Texas Health Science Center at Houston
3Johns Hopkins University
jning@mdanderson.org
Bivariate or multivariate recurrent event processes are often encountered in longitudinal studies in which more than one type of event is of interest. There has been much research on regression analysis for such data, but little has been done to address the problem of how to measure dependence between two types of recurrent event processes. We propose a time-dependent measure, termed the rate ratio, to assess the local dependence between two types of recurrent event processes. We model the rate ratio as a parametric regression function of time and leave unspecified all other aspects of the distribution of the bivariate recurrent event processes. We develop a composite-likelihood procedure for model fitting and parameter estimation. We show that the proposed composite-likelihood estimator possesses consistency and asymptotic normality. The finite-sample performance of the proposed method is evaluated through simulation studies and illustrated by an application to data from a soft tissue sarcoma study.

Session 77 High Dimensional Variable Selection and Multiple Testing

On Procedures Controlling the False Discovery Rate for Testing Hierarchically Ordered Hypotheses
Gavin Lynch and Wenge Guo
New Jersey Institute of Technology
wengeguo@njit.edu
Complex large-scale studies, such as those related to microarrays and quantitative trait loci, often involve testing multiple hierarchically ordered hypotheses. However, most existing false discovery rate (FDR) controlling procedures do not exploit the inherent hierarchical structure among the tested hypotheses. In this talk, I present key

developments toward controlling the FDR when testing hierarchically ordered hypotheses. First, I offer a general framework under which hierarchical testing procedures can be developed. Then I present hierarchical testing procedures that control the FDR under various forms of dependence. Simulation studies show that these proposed methods can be more powerful than alternative methods.

Sufficient Dimension Reduction in Binary Classification
Seung Jun Shin1, Yichao Wu2, Hao Helen Zhang3 and Yufeng Liu4

1University of Texas MD Anderson Cancer Center
2North Carolina State University
3University of Arizona
4University of North Carolina at Chapel Hill
wu@stat.ncsu.edu
Reducing the dimensionality of data is essential for binary classification with high-dimensional covariates. In the context of sufficient dimension reduction (SDR), most, if not all, existing SDR methods suffer in binary classification. In this talk, we target the SDR problem directly for binary classification and propose a new method based on support vector machines. The new method is supported by both numerical evidence and theoretical justification.

Rate Optimal Multiple Testing Procedure (ROMP) in High-dimensional Regression
Zhigen Zhao1 and Pengsheng Ji2
1Temple University
2University of Georgia
psji@uga.edu
The variable selection and multiple testing problems for regression share almost the same goal: identifying the important variables among many. Research has focused on selection consistency, which is possible only if the signals are sufficiently strong; on the contrary, the signals in more modern applications are usually rare and weak. In this paper, we develop a two-stage testing procedure named ROMP, short for Rate Optimal Multiple testing Procedure, because it achieves the fastest convergence rate of the marginal false non-discovery rate (mFNR) while controlling the marginal false discovery rate (mFDR) at any designated level alpha asymptotically.

Pathwise Calibrated Active Shooting Algorithm with Application to Semiparametric Graph Estimation
Tuo Zhao1 and Han Liu2

1Johns Hopkins University
2Princeton University
hanliu@princeton.edu
Pathwise coordinate optimization, combined with the active set strategy, is arguably one of the most popular computational frameworks for high-dimensional problems. It is conceptually simple, easy to implement, and applicable to a wide range of convex and nonconvex problems. However, there is still a gap between its theoretical justification and practical success. For high-dimensional convex problems, existing theories only show sublinear rates of convergence; for nonconvex problems, almost no theory on the rates of convergence exists. To bridge this gap, we propose a novel unified computational framework, named PICASA, for pathwise coordinate optimization. The main difference between PICASA and existing pathwise coordinate descent methods is that we exploit a proximal gradient pilot to identify an active set. Such a modification, though simple, has profound impact: with high probability, PICASA attains a global geometric rate of convergence to a unique sparse local solution with good statistical properties (e.g., minimax optimality, oracle property) for solving a large family of convex and


nonconvex problems. Unlike most existing analyses, which assume that all the computation can be carried out exactly without worrying about numerical precision, our theory explicitly counts the numerical computation accuracy and thus is more realistic. The PICASA method is quite general and can be combined with different coordinate descent strategies, such as cyclical coordinate descent, greedy coordinate descent, and randomized coordinate descent. As

an application, we apply the PICASA method to a family of nonconvex optimization problems motivated by estimating semiparametric graphical models. The PICASA method allows us to obtain new statistical recovery results on both parameter estimation and graph selection consistency, which do not exist in the existing literature. Thorough numerical results are also provided to back up our theoretical arguments.
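
The baseline framework the abstract builds on, pathwise coordinate descent with warm starts and an active-set strategy, can be sketched for the lasso (this generic loop is illustrative; it is not PICASA and uses no proximal gradient pilot):

```python
import numpy as np

def lasso_cd(X, y, lam, b0=None, max_sweeps=500, tol=1e-9):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1.
    Active-set strategy: alternate full sweeps over all coordinates with
    cheap sweeps restricted to the currently nonzero coordinates."""
    n, p = X.shape
    b = np.zeros(p) if b0 is None else b0.copy()
    r = y - X @ b                       # residual, kept in sync with b
    sq = (X ** 2).sum(axis=0) / n
    full = True
    for _ in range(max_sweeps):
        idx = range(p) if full else np.flatnonzero(b)
        change = 0.0
        for j in idx:
            rho = X[:, j] @ r / n + sq[j] * b[j]
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / sq[j]
            if new != b[j]:
                r -= X[:, j] * (new - b[j])
                change = max(change, abs(new - b[j]))
                b[j] = new
        if change < tol:
            if full:                    # converged on a full sweep: done
                break
            full = True                 # active set stable; verify globally
        else:
            full = False                # keep refining the active set
    return b

rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:3] = [3.0, -2.0, 1.5]
y = X @ beta + 0.1 * rng.normal(size=n)

# pathwise strategy: solve along a decreasing lambda grid, warm-starting
# each fit from the previous solution
b = None
for lam in np.geomspace(np.abs(X.T @ y).max() / n, 0.001, 10):
    b = lasso_cd(X, y, lam, b0=b)
print(np.flatnonzero(np.abs(b) > 0.5))   # support of the final estimate
```

Warm starts keep the active set small along the path, which is what makes the restricted sweeps cheap; PICASA's contribution is choosing that active set via a proximal gradient pilot with provable geometric convergence.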


Index of Authors

Abantovalle C 19 38Abe N 30 78Ahn S 30 78Akacha M 31 82Allen GI 31 82Amei A 34 91Amin M 33 89Apanasovich TV 29 74Artemiou A 31 81Au S 32 85Aue A 24 56author) TZ( 27 67

Bai X 26 61Baiocchi M 28 71Bakanda C 21 42Baker Y 31 82Balasubramanian K 26 60Ball G 21 44Bandyopadhyay N 33 88Bao S 32 32 84 84Barrdahl M 22 49Bayman EO 30 77Becker K 21 42Bengtsson T 33 86Berger TW 21 45Bernhardt P 26 63Beyene J 21 42Bhamidi S 29 72Bidouard J 20 39Blocker AW 20 40Boerwinkle E 31 79Bornn L 20 39Boye ME 20 40Brannath W 23 50Branski R 33 89Braun T 22 47Breidt J 24 55Bretz F 23 50Brown ER 28 69Brown M 24 54

Cai C 23 34 53 92Cai J 31 33 81 88Campbell J 19 38Candille S 22 47Cao G 22 49Carriere KC 30 79Cepurna WO 32 85

Chan G 21 45Chan W 34 90Chang H 31 80Chang J 26 63Chatterjee A 31 81Chatterjee N 22 49Chen B 28 70Chen G 29 32 71 85Chen H 29 74Chen L 28 69Chen M 19 20 21 23 29

38 40 44 52 73Chen Q 31 82Chen R 31 79Chen S 25 28 58 70Chen T 31 80Chen X 26 61Chen Y 23 24 34 53 54

92Chen Z 22 29 33 49 73

87Cheng G 19 36Cheng X 20 39Cheng Y 21 27 44 65Chervoneva I 29 74Cheung YK 27 64Chi E 29 75Chiang AY 28 68Chiruvolu P 21 44Cho J 23 24 52 54Cho S 30 78Choi D 32 85Choi DS 24 54Choi S 22 33 48 87Chu R 21 42Chuang-Stein C 20 42Chun H 26 61Coan J 27 67Colantuoni E 28 71Collins R 21 42Coneelly K 22 47Cook R 28 70Coram M 22 47Crespi C 23 50Cui L 30 77Cui Y 22 33 46 87

DrsquoAmico E 21 42Dıaz I 28 71

Dabuxilatu W 29 72Dai J 27 65Daviglus M 34 90DeFor T 21 45Degras D 27 67Deng K 24 55Deng Y 33 88Dey D 19 38 64Dey DK 19 38Dey J 21 44Di Y 19 37Dinwoodie I 30 78Djorgovski G 20 39Dominici F 28 68Donalek C 20 39Dong G 23 50Dong Y 31 31 81 81Dormitorio B 33 88Drake A 20 39Du Z 24 54Duan Y 19 38Dunloop D 32 84Dyk DV 20 38

Edlefsen PT 21 43Elliott M 21 42Etzioni R 32 32 83 83

Fan B 32 85Fan J 34 90Fan Y 26 63Fang L 21 25 45 57Fang Y 33 89Faries D 26 61Faruquie T 30 78Fei T 24 54Feng H 22 47Feng Y 29 31 33 71 81

88Feng Z 32 85Fink J 21 44Fisch R 28 68Franceschini N 31 79Freydin B 29 74Fu H 23 25 25 29 50 59

59 74

Gaines D 19 38Gao B 25 57

Gentleman R 19Gneiting T 32 84Gong Q 21 45Graham M 20 39Gu C 32 85Guan W 22 48Gulati R 32 83Gulukota K 24 53Guo S 25 57Guo W 35 92Guo X 19 36

Ha MJ 23 51Hale MD 25 57Han B 33 87Han L 25 57Han P 34 89Han SW 29 73Haneuse S 28 68Hannig J 23 27 51 66Hao N 33 88He K 26 63He QA 28 68He T 22 46He W 20 24 42 57He X 23 53He Y 19 36Heitjan DF 25 32 59 84Hernandez-Stumpfhauser

D 24 55Ho S 30 75Hong CS 30 78Hong H 30 78Hong Y 19 38Hopcroft J 27 65Hormann S 24 56Hou L 23 52Houseman EA 20 40Hsu C 25 57Hsu L 20 41Hsu W 22 49Hu J 34 92Hu M 24 55Hu P 22 49Hu Y 19 37Huang C 21 21 45 45Huang J 25 60Huang M 31 81

95

Huang X 29 33 34 3474 87 91 92

Huang Y 23 26 51 62Hubbard R 32 83Huerta G 27 67Hung HJ 30 76Hung J 64Huo X 22 48

Ibrahim JG 20 31 40 82Inoue LY 32 83Islam SS 20 42

Jackson C 27 67Ji P 35 92Ji Y 24 24 28 53 53 68Jia N 25 59Jia X 27 64Jiang H 19 30 36 78Jiang Q 20 21 42 44Jiang X 19 38Jiang Y 19 36Jiao X 20 38Jin Z 21 44Johnson EC 32 85Johnson K 26 62Joshi AD 22 49Joslyn S 32 84Jung S 33 89Justice AC 25 59

Kai B 31 81Kambadur A 30 78Kang J 31 34 34 81 90

90Katki H 32 83Kim DW 33 86Kim J 34 89Kim JK 28 70Kim M 34 90Kim S 31 79Kim Y 22 48Kolivras K 19 38Kong L 29 75Kooperberg C 27 65Kosorok MR 29 71Kovalchik S 21 42Kracht K 21 44Kraft P 22 49Kuo H 22 48Kuo RC 19 38Kwon M 25 60

Lai M 32 82Lai RCS 23 51Lai T 28 71Lai TL 28 33 71 86Landon J 32 85Lang K 30 78Lavori PW 28 71Leary E 27 67

Lebanon G 26 60Lecci F 20 39Lee CH 21 45Lee J 24 32 53 84Lee KH 28 68Lee M 30 76Lee MT 24 56Lee S 27 64Lee SY 25 60Lee TCM 23 51Lenzenweger MF 21 43Leu CS 27 65Levin B 27 65Levy DL 21 43Li C 22 48Li D 31 80Li F 27 67Li G 23 27 33 50 66 88Li H 23 50Li J 19 34 37 38 91Li L 23 26 26 31 52 60

61 80Li M 19 22 37 48Li P 27 65Li R 26 62Li X 23 49Li Y 23 25 25 26 29 30

53 59 59 6375 79

Li-Xuan L 29 73Lian H 19 36Liang B 20 41Liang F 24 53Liang H 27 65Liao OY 33 86Lim J 22 48Lin D 28 31 69 79Linder D 31 81Lindquist M 27 67Lipshultz S 26 62Lipsitz S 26 62Liu B 34 91Liu D 20 22 41 46Liu H 28 35 70 92Liu J 26 61Liu JS 24 55Liu K 27 66Liu L 34 90Liu M 20 39 40Liu R 29 73Liu S 33 87Liu X 20 21 41 44Liu XS 24 54Liu Y 22 35 46 92Liu Z 31 82Long Q 28 69Lonita-Laza I 20 40Lou X 25 60Lozano A 30 78Lu T 27 65Lu W 20 34 39 91

Lu Y 20 32 39 85Luo R 27 65Luo S 23 51Luo X 21 30 45 77Lv J 26 63Lynch G 35 92

Ma H 20 22 42 49Ma J 29 72Ma P 20 40Ma TF 24 56Ma Z 22 46Maca J 30 76Mahabal A 20 39Mai Q 26 64Majumdar AP 27 66Malinowski A 21 46Mandrekar V 22 46Manner D 23 50Marniquet X 20 39Martin R 27 66Martino S 21 42Matthews M 33 86Maurer W 23 50McGuire V 32 85McIsaac M 28 70McKeague IW 31 80Meng X 27 64 66Mesbah M 24 56Mi G 19 37Mias GI 19 37Michailidis G 29 72Mills EJ 21 42Min X 28 68Mitra R 24 53Mizera I 29 75Molinaro A 28 69Monsell BC 30 78Morgan CJNA 21 43Morrison JC 32 85Mueller P 24 28 53 68

Nachega JB 21 42Naranjo J 33 88Nettleton D 23 51Nguyen HQ 22 47Nie L 29 75Nie X 23 51Ning J 23 28 30 33 34

53 70 78 87 92Nobel A 33 88Nobel AB 29 72Nordman DJ 23 51Norinho DD 24 56Normand S 25North KE 31 79Norton JD 20 41Nosedal A 27 67

Offen W 64Ogden RT 29 74

Ohlssen D 28 68Okwuokenye M 34 90Olshen A 28 69Owen AB 27 66Ozekici S 32 85

Paik J 28 71Pan G 30 76Pan J 31 80Pan Y 34 89Park D 28 67Park DH 22 48Park S 64Park T 25 60Pati D 26 62Peng H 31 34 80 91Peng J 19 37Peng L 26 34 62 91Perry P 24 54Peterson C 31 81Phoa FKH 31 79Pinheiro J 25 57Planck SR 32 85Prentice R 20 41Price K 23 33 50 87Prisley S 19 38Pullenayegum E 21 42

Qazilbash M 22 47Qi X 27 65Qian PZG 23 51Qiao X 29 33 71 89Qin J 21 28 45 70Qin R 27 64Qin ZS 24 55Qiu J 31 79Qiu Y 29 73Qu A 32 82Quartey G 20 42

Raftery A 32 84Ravikumar P 31 82Rayamajhi J 33 86Ren Z 29 73Rohe K 24 54Rosales M 21 43Rosenbaum JT 32 85Rosenblum M 28 71Rube HT 19 37Rubin D 29 74

Saegusa T 24 54Salzman J 19 36Samawi H 31 81Samorodnitsky G 27 65Samworth RJ 31 81Schafer DW 19 37Schlather M 21 46Schmidli H 31 82Schrag D 28 68Scott J 20 42

Shadel W 21 42Shao Y 25 57Shariff H 20 38She B 32 84Shen H 33 88Shen W 20 30 40 78Shen Y 28 70Shepherd J 32 85Shi P 32 82Shih M 28 71Shin J 30 78Shin SJ 35 92Shojaie A 24 29 54 72Shu X 32 82Shui M 32 84SienkiewiczE 21 46Simon N 33 86Simon R 33 86Sinha D 26 62Sloughter JM 32 84Smith B 25 57Smith BT 34 91Snapinn S 21 44Song C 28 68Song D 21 45Song J 32 84Song JS 19 37Song M 22 49Song R 23 34 51 89Song X 20 40Soon G 29 30 75 75Sorant AJ 25 59Soyer R 32 85Sriperambadur B 26 60Steiner PM 34 90Stingo F 31 81Strawderman R 28 69Su X 26 34 61 90Su Z 26 61Suh EY 30 76Suktitipat B 25 59Sun D 27 66Sun J 23 53Sun N 22 48Sun Q 29 72Sun T 22 46Sun W 23 51Sung H 25 59Suresh R 30 77Symanzik J 32 84

Tamhane A 30 76Tan F 34 91Tang CY 26 63

Tang H 22 47Tang Y 26 62Tao M 22 48Tao R 31 79Taylor J 22 47Tewson P 32 84Thabane L 21 42Thall PF 22 47Todem D 22 49Trippa L 28 68Trotta R 20 38Tucker A 26 62

Vannucci M 31 81Verhaak RG 24 54Vogel R 31 81Vrtilek S 20 39

Wahed A 21 44Waldron L 24 55Wang A 34 91Wang B 33 89Wang C 24 56Wang D 30 77Wang G 29 74Wang H 21 23 46 53Wang J 26 27 63 66Wang L 29 32 34 74 82

89Wang M 28 34 69 92Wang Q 26 27 61 67Wang R 19 37Wang S 29 31 74 80Wang W 31 79Wang X 19 25 32 38 58

84Wang Y 20 20 22 25 25

41 41 48 58 59Wang Z 25 25 33 59 59

87Wei WW 30 79Wei Y 20 40Wen S 20 21 42 44Weng H 29 71Weng RC 19 38Wettstein G 20 39Whitmore GA 24 56Wileyto EP 25 59Wilson AF 25 59Wilson JD 29 72Witten D 23 51Woerd MVD 24 55Wolf M 33 86Wolfe PJ 24 54

Wong WK 23 31 31 5079 79

Wu C 32 82Wu D 24 55Wu H 22 27 47 65Wu J 22 32 47 85Wu M 23 52Wu R 31 80Wu S 23 50Wu Y 21 26 30 35 43

63 77 92

Xi D 30 76Xia J 32 83Xia T 20 39Xiao R 22 48Xie J 31 81Xie M 32 85Xing H 22 48Xing X 20 40Xiong J 24 57Xiong X 22 47Xu K 25 59Xu R 23 51Xu X 25 58Xu Y 28 68Xu Z 34 89Xue H 27 65Xue L 32 82

Yang B 30 77Yang D 33 88Yang E 31 82Yang S 24 28 34 56 70

89Yao R 30 77Yao W 31 81Yau CY 24 56Yavuz I 21 44Yi G 24 57Yin G 33 87Ying G 25 32 59 84Young LJ 27 67Yu C 28 70Yu D 29 75Yu L 31 34 81 90Yu Y 31 81Yuan Y 30 33 33 78 87

87

Zacks S 27 65Zang Y 33 87Zeng D 20 20 28 29 31

41 41 69 7179 82

Zhan M 21 44Zhang B 29 75Zhang C 19 23 36 52Zhang D 20 26 40 63Zhang G 22 49Zhang H 19 28 36 68Zhang HH 26 33 35 63

88 92Zhang I 25 58Zhang J 28 34 69 91Zhang L 21 30 44 77Zhang N 29 75Zhang Q 25 59Zhang S 25 58Zhang W 20 39Zhang X 19 23 26 36 53

63Zhang Y 24 25 54 58Zhang Z 21 27 29 46 67

75Zhao H 23 29 52 73Zhao L 22 25 47 59Zhao N 23 52Zhao P 34 90Zhao S 25 57Zhao T 35 92Zhao Y 29 33 74 87Zhao Z 35 92Zheng C 29 72Zheng Q 34 91Zheng Y 20 29 41 72Zheng Z 26 63Zhong H 29 73Zhong L 25 57Zhong P 22 46Zhong W 20 22 40 46Zhou H 22 26 29 48 60

73Zhou L 31 82Zhou Q 29 73Zhou T 23 50Zhou Y 30 77Zhu G 26 61Zhu H 26 60Zhu J 26 63Zhu L 21 44Zhu M 26 62Zhu Y 24 55Zhu Z 24 32 56 84Zou F 22 48Zou H 26 64Zou W 33 87

  • Welcome
  • Conference Information
    • Committees
    • Acknowledgements
    • Conference Venue Information
    • Program Overview
    • Keynote Lectures
    • Student Paper Awards
    • Short Courses
    • Social Program
    • ICSA 2015 in Fort Collins CO
    • ICSA 2014 China Statistics Conference
    • ICSA Dinner at 2014 JSM
      • Scientific Program
        • Monday June 16 800 AM - 930 AM
        • Monday June 16 1000 AM-1200 PM
        • Monday June 16 130 PM - 310 PM
        • Monday June 16 330 PM - 510 PM
        • Tuesday June 17 820 AM - 930 AM
        • Tuesday June 17 1000 AM - 1200 PM
        • Tuesday June 17 130 PM - 310 PM
        • Tuesday June 17 330 PM - 530 PM
        • Wednesday June 18 830 AM - 1010 AM
        • Wednesday June 18 1030 AM-1210 PM
          • Abstracts
            • Session 1 Emerging Statistical Methods for Complex Data
            • Session 2 Statistical Methods for Sequencing Data Analysis
            • Session 3 Modeling Big Biological Data with Complex Structures
            • Session 4 Bayesian Approaches for Modeling Dynamic Non-Gaussian Responses
            • Session 5 Recent Advances in Astro-Statistics
            • Session 6 Statistical Methods and Application in Genetics
            • Session 7 Statistical Inference of Complex Associations in High-Dimensional Data
            • Session 8 Recent Developments in Survival Analysis
            • Session 9 Industry Practice and Regulatory Pathway for Benefit-Risk Assessment of Medicinal Products
            • Session 10 Analysis of Observational Studies and Clinical Trials
            • Session 11 Lifetime Data Analysis
            • Session 12 Safety Signal Detection and Safety Analysis
            • Session 13 Survival and Recurrent Event Data Analysis
            • Session 14 Statistical Analysis on Massive Data from Point Processes
            • Session 15 High Dimensional Inference (or Testing)
            • Session 16 Phase II Clinical Trial Design with Survival Endpoint
            • Session 17 Statistical Modeling of High-throughput Genomics Data
            • Session 18 Statistical Applications in Finance
            • Session 19 Hypothesis Testing
            • Session 20 Design and Analysis of Clinical Trials
            • Session 21 New methods for Big Data
            • Session 22 New Statistical Methods for Analysis of High Dimensional Genomic Data
            • Session 23 Recent Advances in Analysis of Longitudinal Data with Informative Observation process
            • Session 24 Bayesian Models for High Dimensional Complex Data
            • Session 25 Statistical Methods for Network Analysis
            • Session 26 New Analysis Methods for Understanding Complex Diseases and Biology
            • Session 27 Recent Advances in Time Series Analysis
            • Session 28 Analysis of Correlated Longitudinal and Survival Data
            • Session 29 Clinical Pharmacology
            • Session 30 Sample Size Estimation
            • Session 31 Predictions in Clinical Trials
            • Session 32 Recent Advances in Statistical Genetics
            • Session 33 Structured Approach to High Dimensional Data with Sparsity and Low Rank Factorization
            • Session 34 Recent Developments in Dimension Reduction Variable Selection and Their Applications
            • Session 35 Post-Discontinuation Treatment in Randomized Clinical Trials
            • Session 36 New Advances in Semi-Parametric Modeling and Survival Analysis
            • Session 37 High-Dimensional Data Analysis Theory and Application
            • Session 38 Leading Across Boundaries Leadership Development for Statisticians
            • Session 39 Recent Advances in Adaptive Designs in Early Phase Trials
            • Session 40 High Dimensional RegressionMachine Learning
            • Session 41 Distributional Inference and Its Impact on Statistical Theory and Practice
            • Session 42 Applications of Spatial Modeling and Imaging Data
            • Session 43 Recent Development in Survival Analysis and Statistical Genetics
            • Session 44 Bayesian Methods and Applications in Clinical Trials with Small Population
            • Session 45 Recent Developments in Assessing Predictive Models in Survival Analysis
            • Session 46 Missing Data the Interface between Survey Sampling and Biostatistics

Contents

Welcome 1
Conference Information 2
    Committees 2
    Acknowledgements 4
    Conference Venue Information 6
    Program Overview 7
    Keynote Lectures 8
    Student Paper Awards 9
    Short Courses 10
    Social Program 15
    ICSA 2015 in Fort Collins, CO 16
    ICSA 2014 China Statistics Conference 17
    ICSA Dinner at 2014 JSM 18
Scientific Program 19
    Monday, June 16, 8:00 AM - 9:30 AM 19
    Monday, June 16, 10:00 AM - 12:00 PM 19
    Monday, June 16, 1:30 PM - 3:10 PM 21
    Monday, June 16, 3:30 PM - 5:10 PM 23
    Tuesday, June 17, 8:20 AM - 9:30 AM 25
    Tuesday, June 17, 10:00 AM - 12:00 PM 25
    Tuesday, June 17, 1:30 PM - 3:10 PM 27
    Tuesday, June 17, 3:30 PM - 5:30 PM 29
    Wednesday, June 18, 8:30 AM - 10:10 AM 31
    Wednesday, June 18, 10:30 AM - 12:10 PM 33
Abstracts 36
    Session 1: Emerging Statistical Methods for Complex Data 36
    Session 2: Statistical Methods for Sequencing Data Analysis 36
    Session 3: Modeling Big Biological Data with Complex Structures 37
    Session 4: Bayesian Approaches for Modeling Dynamic Non-Gaussian Responses 38
    Session 5: Recent Advances in Astro-Statistics 38
    Session 6: Statistical Methods and Application in Genetics 39
    Session 7: Statistical Inference of Complex Associations in High-Dimensional Data 40
    Session 8: Recent Developments in Survival Analysis 40
    Session 9: Industry Practice and Regulatory Pathway for Benefit-Risk Assessment of Medicinal Products 41
    Session 10: Analysis of Observational Studies and Clinical Trials 42
    Session 11: Lifetime Data Analysis 44
    Session 12: Safety Signal Detection and Safety Analysis 44
    Session 13: Survival and Recurrent Event Data Analysis 45
    Session 14: Statistical Analysis on Massive Data from Point Processes 45
    Session 15: High Dimensional Inference (or Testing) 46
    Session 16: Phase II Clinical Trial Design with Survival Endpoint 47
    Session 17: Statistical Modeling of High-throughput Genomics Data 47
    Session 18: Statistical Applications in Finance 48
    Session 19: Hypothesis Testing 49
    Session 20: Design and Analysis of Clinical Trials 50
    Session 21: New Methods for Big Data 51
    Session 22: New Statistical Methods for Analysis of High Dimensional Genomic Data 51
    Session 23: Recent Advances in Analysis of Longitudinal Data with Informative Observation Process 52
    Session 24: Bayesian Models for High Dimensional Complex Data 53
    Session 25: Statistical Methods for Network Analysis 54
    Session 26: New Analysis Methods for Understanding Complex Diseases and Biology 54
    Session 27: Recent Advances in Time Series Analysis 55
    Session 28: Analysis of Correlated Longitudinal and Survival Data 56
    Session 29: Clinical Pharmacology 57
    Session 30: Sample Size Estimation 58
    Session 31: Predictions in Clinical Trials 59
    Session 32: Recent Advances in Statistical Genetics 59
    Session 33: Structured Approach to High Dimensional Data with Sparsity and Low Rank Factorization 60
    Session 34: Recent Developments in Dimension Reduction, Variable Selection, and Their Applications 61
    Session 35: Post-Discontinuation Treatment in Randomized Clinical Trials 61
    Session 36: New Advances in Semi-Parametric Modeling and Survival Analysis 62
    Session 37: High-Dimensional Data Analysis: Theory and Application 63
    Session 38: Leading Across Boundaries: Leadership Development for Statisticians 64
    Session 39: Recent Advances in Adaptive Designs in Early Phase Trials 64
    Session 40: High Dimensional Regression/Machine Learning 65
    Session 41: Distributional Inference and Its Impact on Statistical Theory and Practice 66
    Session 42: Applications of Spatial Modeling and Imaging Data 67
    Session 43: Recent Development in Survival Analysis and Statistical Genetics 67
    Session 44: Bayesian Methods and Applications in Clinical Trials with Small Population 68
    Session 45: Recent Developments in Assessing Predictive Models in Survival Analysis 69
    Session 46: Missing Data: the Interface between Survey Sampling and Biostatistics 70
    Session 47: New Statistical Methods for Comparative Effectiveness Research and Personalized Medicine 70
    Session 48: Student Award Session 1 71
    Session 49: Network Analysis/Unsupervised Methods 72
    Session 50: Personalized Medicine and Adaptive Design 73
    Session 51: New Development in Functional Data Analysis 74
    Session 52: Recent Regulatory/Industry Experience in Biosimilar Trial Designs 75
    Session 53: Gatekeeping Procedures and Their Application in Pivotal Clinical Trials 76
    Session 54: Approaches to Assessing Qualitative Interactions 76
    Session 55: Interim Decision-Making in Phase II Trials 77
    Session 56: Recent Advancement in Statistical Methods 78
    Session 57: Building Bridges between Research and Practice in Time Series Analysis 78
    Session 58: Recent Advances in Design for Biostatistical Problems 79
    Session 59: Student Award Session 2 79
    Session 60: Semi-parametric Methods 80
    Session 61: Statistical Challenges in Variable Selection for Graphical Modeling 81
    Session 62: Recent Advances in Non- and Semi-Parametric Methods 82
    Session 63: Statistical Challenges and Development in Cancer Screening Research 83
    Session 64: Recent Developments in the Visualization and Exploration of Spatial Data 84
    Session 65: Advancement in Biostatistical Methods and Applications 84
    Session 66: Analysis of Complex Data 85
    Session 67: Statistical Issues in Co-development of Drug and Biomarker 86
    Session 68: New Challenges for Statistical Analyst/Programmer 86
    Session 69: Adaptive and Sequential Methods for Clinical Trials 87
    Session 70: Survival Analysis 88
    Session 71: Complex Data Analysis: Theory and Application 88
    Session 72: Recent Development in Statistics Methods for Missing Data 89
    Session 73: Machine Learning Methods for Causal Inference in Health Studies 90
    Session 74: JP Hsu Memorial Session 90
    Session 75: Challenge and New Development in Model Fitting and Selection 91
    Session 76: Advanced Methods and Their Applications in Survival Analysis 91
    Session 77: High Dimensional Variable Selection and Multiple Testing 92
Index of Authors 94

2014 Joint Applied Statistics Symposium of ICSA and KISS

June 15-18, Marriott Downtown Waterfront, Portland, Oregon, USA

Welcome to the 2014 joint International Chinese Statistical Association (ICSA) and Korean International Statistical Society (KISS) Applied Statistics Symposium!

This is the 23rd annual ICSA symposium and the 1st for KISS. The organizing committees have been working hard to put together a strong program, including 7 short courses, 3 keynote lectures, 76 scientific sessions, student paper sessions, and social events. Our scientific program includes keynote lectures from prominent statisticians Dr. Sharon-Lise Normand, Dr. Robert Gentleman, and Dr. Sastry Pantula, and invited and contributed talks covering cutting-edge topics on genome scale data and big data, as well as on the new world of statistics after 2013, the International Year of Statistics. We hope this symposium will provide abundant opportunities for you to engage, learn, and network, and to find inspiration to advance old research ideas and develop new ones. We believe this will be a memorable and worthwhile learning experience for you.

Portland is located near the confluence of the Willamette and Columbia rivers and has a unique city culture. It is close to the famous Columbia Gorge, Oregon's high mountains, and the coast. Oregon is also famous for its many microbreweries and beautiful wineries, and there is no sales tax. June is a great time to visit. We hope you also have opportunities to experience the rich culture and activities the city has to offer during your stay.

Thanks for coming to the 2014 ICSA-KISS Applied Statistics Symposium in Portland!

Dongseok Choi and Rochelle Fu, on behalf of the
2014 ICSA-KISS Applied Statistics Symposium Executive and Organizing Committees

The City of Roses welcomes you!

Committees

2 2014 ICSA-KISS Applied Statistics Symposium Portland June 15-18

Executive Committee
Dongseok Choi (Co-Chair), Oregon Health & Science U.; Rochelle Fu (Co-Chair & Treasurer), Oregon Health & Science U.; Joan Hu, Simon Fraser U.; Zhezhen Jin (Program Chair), Columbia U.; Ouhong Wang, Amgen; Ru-Fang Yeh, Genentech; X.H. Andrew Zhou, U. of Washington; Cheolwoo Park (Webmaster), U. of Georgia

Local Committee
Dongseok Choi (Co-Chair), Oregon Health & Science U.; Rochelle Fu (Chair), Oregon Health & Science U.; Yiyi Chen, Oregon Health & Science U.; Thuan Nguyen, Oregon Health & Science U.; Byung Park, Oregon Health & Science U.; Xinbo Zhang, Oregon Health & Science U.

Program Committee
Zhezhen Jin (Chair), Columbia U.; Gideon Bahn, VA Hospital; Kani Chen, Hong Kong U. of Science and Technology; Yang Feng, Columbia U.; Liang Fang, Gilead; Qi Jiang, Amgen; Mikyoung Jun, Texas A&M U.; Sin-Ho Jung, Duke U.; Xiaoping Sylvia Hu, Gene; Jane Paik Kim, Stanford U.; Mimi Kim, Albert Einstein College of Medicine; Mi-OK Kim, Cincinnati Children's Hospital Medical Center; Gang Li, Johnson and Johnson; Yunfeng Li, Pharmacyclics; Mei-Ling Ting Lee, U. of Maryland; Yoonkyung Lee, Ohio State U.; Meng-Ling Liu, New York U.; Xinhua Liu, Columbia U.; Xiaolong Luo, Celgene Corporation; Taesung Park, Seoul National U.; Yu Shen, MD Anderson Cancer Center; Greg (Guoxing) Soon, US Food and Drug Administration; Zheng Su, Deerfield Company; Christine Wang, Amgen; Lan Xue, Oregon State U.; Yichuan Zhao, Georgia State U.


Program Book Committee
Mengling Liu (Chair), New York U.; Tian Zheng, Columbia U.; Wen (Jenna) Su, Columbia U.; Zhezhen Jin, Columbia U.

Student Paper Award Committee
Wenqing He (Chair), U. of Western Ontario; Qixuan Chen, Columbia U.; Hyunson Cho, National Cancer Institute; Dandan Liu, Vanderbilt U.; Jinchi Lv, U. of Southern California

Short Course Committee
Xiaonan Xue (Chair), Albert Einstein College of Medicine; Wei-Ting Hwang, U. of Pennsylvania; Ryung Kim, Albert Einstein College of Medicine; Jessica Kim, US Food and Drug Administration; Laura Lu, US Food and Drug Administration; Mikyoung Jun, Texas A&M U.; Tao Wang, Albert Einstein College of Medicine

IT Support
Lixin (Simon) Gao, Biopier Inc.

Symposium Sponsors

The 2014 ICSA-KISS Applied Statistics Symposium is supported by financial contributions from the following sponsors.

The organizing committees greatly appreciate the support of the above sponsors.

The 2014 ICSA-KISS Joint Applied Statistics Symposium Exhibitors

CRC Press - Taylor & Francis Group

Springer Science & Business Media

The Lotus Group

Conference Venue: Portland Marriott Downtown Waterfront
1401 SW Naito Parkway, Portland, Oregon 97201
Hotel: (503) 226-7600; Sales Facsimile: (503) 226-1209
portlandmarriott.com

Hotel floor plans (Lower Level 1, Main Lobby, 2nd Floor, and 3rd Floor) show the meeting rooms: Salons A-I; the Columbia, Willamette, Portland, Eugene, Salem, Medford, Sunstone, Meadowlark, Douglas Fir, Salmon, Mount Hood, Hawthorne, Belmont, Laurelhurst, and Pearl rooms; and the Registration Desk in the Ballroom Lobby.

Program Overview


Sunday, June 15th, 2014
8:00 AM - 6:00 PM   Ballroom Foyer   Registration
7:00 AM - 8:45 AM   Breakfast
9:45 AM - 10:15 AM   Break
8:00 AM - 5:00 PM   Salon A   Short Course: Recent Advances in Bayesian Adaptive Clinical Trial Design
8:00 AM - 5:00 PM   Salon B   Short Course: Analysis of Life History Data with Multistate Models
8:00 AM - 5:00 PM   Salon C   Short Course: Propensity Score Methods in Medical Research for the Applied Statistician
8:00 AM - 12:00 PM   Salon D   Short Course: ChIP-seq for Transcription and Epigenetic Gene Regulation
8:00 AM - 12:00 PM   Columbia   Short Course: Data Monitoring Committees in Clinical Trials
12:00 PM - 1:00 PM   Lunch for Registered Full-Day Short Course Attendees
1:00 PM - 5:00 PM   Salon D   Short Course: Analysis of Genetic Association Studies Using Sequencing Data and Related Topics
1:00 PM - 5:00 PM   Columbia   Short Course: Analysis of Biomarkers for Prognosis and Response Prediction
2:45 PM - 3:15 PM   Break
6:00 PM - 8:30 PM   Mt. Hood   ICSA Board Meeting (Invited Only)
7:00 PM - 9:00 PM   Salon E   Opening Mixer

Monday, June 16th, 2014
7:30 AM - 6:00 PM   Ballroom Foyer   Registration
7:00 AM - 8:45 AM   Breakfast
8:00 AM - 8:20 AM   Salon E-F   Welcome
8:20 AM - 9:30 AM   Salon E-F   Keynote I: Robert Gentleman, Genentech
9:30 AM - 10:00 AM   Ballroom Foyer   Break
10:00 AM - 12:00 PM   See program   Parallel Sessions
12:00 PM - 1:30 PM   Lunch on own
1:30 PM - 3:10 PM   See program   Parallel Sessions
3:10 PM - 3:30 PM   Ballroom Foyer   Break
3:30 PM - 5:10 PM   See program   Parallel Sessions

Tuesday, June 17th, 2014
8:20 AM - 5:30 PM   Ballroom Foyer   Registration
7:00 AM - 8:45 AM   Breakfast
8:20 AM - 9:30 AM   Salon E-F   Keynote II: Sharon-Lise Normand, Harvard University
9:30 AM - 10:00 AM   Ballroom Foyer   Break
10:00 AM - 12:00 PM   See program   Parallel Sessions
12:00 PM - 1:30 PM   Lunch on own
1:30 PM - 3:10 PM   See program   Parallel Sessions
3:10 PM - 3:30 PM   Ballroom Foyer   Break
3:30 PM - 5:30 PM   See program   Parallel Sessions
6:30 PM - 9:30 PM   Off site   Banquet (Banquet speaker: Dr. Sastry Pantula, Oregon State University)

Wednesday, June 18th, 2014
8:30 AM - 1:00 PM   Ballroom Foyer   Registration
7:30 AM - 9:00 AM   Breakfast
8:30 AM - 10:10 AM   See program   Parallel Sessions
10:10 AM - 10:30 AM   Ballroom Foyer   Break
10:30 AM - 12:10 PM   See program   Parallel Sessions

Keynote Lectures


Monday, June 16th, 8:20 AM - 9:30 AM

Robert Gentleman
Senior Director, Bioinformatics, Genentech

Speaker Biography: I joined Genentech in 2009 as Senior Director of the Bioinformatics and Computational Biology Department. I was excited by the opportunity to get involved in drug development and to do work that would directly impact patients. I had worked at two major cancer centers, and while immensely satisfying, the research done there is still fairly distant from the patient. At Genentech, patients are at the forefront of everything we do. Genentech Research is that rare blend of academia and industry that manages to capture most of the best aspects of both. The advent of genome scale data technologies is revolutionizing molecular biology and is providing us with new and exciting opportunities for drug development. I am very excited by the new opportunities we have to develop methods for computational discovery of potential drug targets. At the same time, these large genomic data sets provide us with opportunities to identify and understand different patient subsets and to help guide us towards much more targeted therapeutics.

Postdoctoral Mentor: Being a post-doc mentor is one of the highlights of being in Research. The ability to work with really talented post-docs who are interested in pushing the boundaries of computational science provides me with an outlet for my blue-skies research ideas.

Title: Analyzing Genome Scale Data

I will discuss some of the many genome scale data analysis problems, such as variant calling and genotyping. I will discuss the statistical approaches used, as well as the software development needs of addressing these problems. I will also discuss approaches to parallelization of code and other practical computing issues that face most data analysts working on these data.

Tuesday, June 17th, 8:20 AM - 9:30 AM

Sharon-Lise Normand
Professor, Department of Health Care Policy, Harvard Medical School; Department of Biostatistics, Harvard School of Public Health

Speaker Biography: Sharon-Lise T. Normand, PhD, is a professor of health care policy (biostatistics) in the Department of Health Care Policy at Harvard Medical School and in the Department of Biostatistics at the Harvard School of Public Health. Dr. Normand's research focuses on the development of statistical methods for health services research, primarily using Bayesian approaches to problem solving, including assessment of quality of care, methods for causal inference, provider profiling, meta-analysis, and latent variable modeling. She has developed a long line of research on methods for the analysis of patterns of treatment and quality of care for patients with cardiovascular disease and with mental disorders in particular.

Title: Combining Information for Assessing Safety, Effectiveness, and Quality: Technology Diffusion and Health Policy

Health information growth has created unprecedented opportunities to evaluate therapies in large and broadly representative patient populations. Extracting sound evidence from large observational data is now at the forefront of health care policy decisions: regulators are moving away from a strict biomedical perspective to one that is wider for coverage of new medical technologies. Yet discriminating between beneficial and wasteful new technology remains methodologically challenging: while big data provide opportunities to study treatment effect heterogeneity, estimation of average causal effects in sub-populations is underdeveloped in observational data, and the correct choice of confounding adjustment is difficult in the large-p setting. In this talk I discuss analytical issues related to the analysis of observational data when the goals involve characterizing the diffusion of multiple new technologies and assessing their causal impacts in the areas of mental illness and cardiovascular interventions. This work is supported in part by grants U01-MH103018 from the National Institutes of Health and U01-FD004493 from the US Food and Drug Administration.

Student Paper Awards


ASA Biopharmaceutical Awards

Guanhua Chen, University of North Carolina - Chapel Hill
- Title: Personalized Dose Finding Using Outcome Weighted Learning
- Time: Tuesday, June 17th, 1:30 PM - 3:10 PM
- Session 48: Student Award Session 1 (Portland Room, Lower Level 1)

Cheng Zheng, University of Washington
- Title: Survival Rates Prediction when Training Data and Target Data have Different Measurement Error
- Time: Tuesday, June 17th, 1:30 PM - 3:10 PM
- Session 48: Student Award Session 1 (Portland Room, Lower Level 1)

Jiann-Ping Hsu Pharmaceutical and Regulatory Sciences Student Paper Award

Sandipan Roy, University of Michigan
- Title: Estimating a Change-Point in High-Dimensional Markov Random Field Models
- Time: Wednesday, June 18th, 10:30 AM - 12:10 PM
- Session 74: JP Hsu Memorial Session (Salon D, Lower Level 1)

ICSA Student Paper Awards

Ting-Huei Chen, University of North Carolina - Chapel Hill
- Title: Using a Structural Equation Modeling Approach with Application in Alzheimer's Disease
- Time: Tuesday, June 17th, 3:30 PM - 5:30 PM
- Session 59: Student Award Session 2 (Portland Room, Lower Level 1)

Haolei Weng, Columbia University
- Title: Regularization after Retention in Ultrahigh Dimensional Linear Regression Models
- Time: Tuesday, June 17th, 1:30 PM - 3:10 PM
- Session 48: Student Award Session 1 (Portland Room, Lower Level 1)

Ran Tao, University of North Carolina - Chapel Hill
- Title: Analysis of Sequence Data Under Multivariate Trait-Dependent Sampling
- Time: Tuesday, June 17th, 3:30 PM - 5:30 PM
- Session 59: Student Award Session 2 (Portland Room, Lower Level 1)

Hsin-Wen Chang, Columbia University
- Title: Empirical Likelihood Based Tests for Stochastic Ordering under Right Censorship
- Time: Tuesday, June 17th, 3:30 PM - 5:30 PM
- Session 59: Student Award Session 2 (Portland Room, Lower Level 1)

Qiang Sun, University of North Carolina - Chapel Hill
- Title: Hard Thresholded Regression Via Linear Programming
- Time: Tuesday, June 17th, 1:30 PM - 3:10 PM
- Session 48: Student Award Session 1 (Portland Room, Lower Level 1)

Short Courses


1. Recent Advances in Bayesian Adaptive Clinical Trial Design

Presenters: Peter F. Thall & Brian P. Hobbs, The University of Texas MD Anderson Cancer Center, 1400 Hermann Pressler Dr., Houston, TX 77030-4008. Email: rex@mdanderson.org

Course length: One day

Outline/Description: This one-day short course will cover a variety of recently developed Bayesian methods for the design and conduct of adaptive clinical trials. Emphasis will be on practical application, with the course structured around a series of specific illustrative examples. Topics to be covered include: (1) using historical data in both planning and adaptive decision making during the trial; (2) using elicited utilities or scores of different types of multivariate patient outcomes to characterize complex treatment effects; (3) characterizing and calibrating prior effective sample size; (4) monitoring safety and futility; (5) eliciting and establishing priors; and (6) using computer simulation as a design tool. These methods will be illustrated by actual clinical trials, including cancer trials involving chemotherapy for leukemia and colorectal cancer, stem cell transplantation, and radiation therapy, as well as trials in neurology and neonatology. The illustrations will include both early phase trials to optimize dose, or dose and schedule, and randomized comparative phase III trials.

References:
- Braun TM, Thall PF, Nguyen H, de Lima M. Simultaneously optimizing dose and schedule of a new cytotoxic agent. Clinical Trials 4:113-124, 2007.
- Hobbs BP, Carlin BP, Mandrekar S, Sargent DJ. Hierarchical commensurate and power prior models for adaptive incorporation of historical information in clinical trials. Biometrics 67:1047-1056, 2011.
- Hobbs BP, Sargent DJ, Carlin BP. Commensurate priors for incorporating historical information in clinical trials using general and generalized linear models. Bayesian Analysis 7:639-674, 2012.
- Hobbs BP, Carlin BP, Sargent DJ. Adaptive adjustment of the randomization ratio using historical control data. Clinical Trials 10:430-440, 2013.
- Morita S, Thall PF, Mueller P. Determining the effective sample size of a parametric prior. Biometrics 64:595-602, 2008.
- Morita S, Thall PF, Mueller P. Evaluating the impact of prior assumptions in Bayesian biostatistics. Statistics in Biosciences 2:1-17, 2010.
- Thall PF. Bayesian models and decision algorithms for complex early phase clinical trials. Statistical Science 25:227-244, 2010.
- Thall PF, Szabo A, Nguyen HQ, et al. Optimizing the concentration and bolus of a drug delivered by continuous infusion. Biometrics 67:1638-1646, 2011.
- Thall PF, Nguyen HQ. Adaptive randomization to improve utility-based dose-finding with bivariate ordinal outcomes. J Biopharmaceutical Statistics 22:785-801, 2012.
- Thall PF, Nguyen HQ, Braun TM, Qazilbash M. Using joint utilities of the times to response and toxicity to adaptively optimize schedule-dose regimes. Biometrics, in press.

About the presenters:

Dr. Peter Thall has pioneered the use of Bayesian methods in medical research. He has published over 160 research papers and book chapters in the statistical and medical literature, including numerous papers providing innovative methods for the design, conduct, and analysis of clinical trials. Over the course of his career he has designed over 300 clinical trials. He has presented 20 short courses and over 130 invited talks, and regularly provides statistical consultation for corporations in the pharmaceutical industry. He has served as an associate editor for the journals Statistics in Medicine, Journal of the National Cancer Institute, and Biometrics; he currently is an associate editor for the journals Clinical Trials and Statistics in Biosciences, and is an American Statistical Association Media Expert.

Dr. Brian P. Hobbs is Assistant Professor in the Department of Biostatistics at the University of Texas MD Anderson Cancer Center in Houston, Texas. He completed his undergraduate education at the University of Iowa and obtained master's and doctoral degrees in biostatistics at the University of Minnesota in Minneapolis. He was the recipient of the 2010 ENAR John Van Ryzin Student Award. Dr. Hobbs completed a postdoctoral fellowship in the Department of Biostatistics at MD Anderson Cancer Center before joining the faculty in 2011. His methodological expertise covers Bayesian inferential methods, hierarchical modeling, utility-based inference, adaptive trial design in the presence of historical controls, sequential design in the presence of co-primary endpoints, and semiparametric modeling of functional imaging data.

2. Analysis of Life History Data with Multistate Models

Presenters: Richard Cook and Jerry Lawless, Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ontario, Canada. Email: rjcook@uwaterloo.ca, jlawless@uwaterloo.ca


Course length: One day

Outline/Description:

Life history studies examine specific outcomes and processes during people's lifetimes. For example, cohort studies of chronic disease provide information on disease progression, fixed and time-varying risk factors, and the extent of heterogeneity in the population. Modelling and analysis of life history processes is often facilitated by the use of multistate models. The aim of this workshop is to present models and methods for multistate analyses and to indicate some current topics of research. Software for conducting analyses will be discussed and code for specific problems will be given. A wide range of illustrations involving chronic disease and other conditions will be presented. Course notes will be distributed.

Topics:
1. Introduction
2. Some Basic Quantities for Event History Modelling
3. Some Illustrative Analyses Involving Multistate Models
4. Processes with Intermittent Observation
5. Modelling Heterogeneity and Associations
6. Dependent Censoring and Inspection
7. Some Other Topics

About the presenters:

Richard Cook is Professor of Statistics at the University of Waterloo and holder of the Canada Research Chair in Statistical Methods for Health Research. He has published extensively in the areas of statistical methodology, clinical trials, medicine, and public health, including many articles on event history analysis, multistate models, and the statistical analysis of life history data. He collaborates with numerous researchers in medicine and public health and has consulted widely with pharmaceutical companies on the design and analysis of clinical trials.

Jerry Lawless is Distinguished Professor Emeritus of Statistics at the University of Waterloo. He has published extensively on statistical models and methods for survival and event history data, life history processes, and other topics, and is the author of Statistical Models and Methods for Lifetime Data (2nd edition, Wiley, 2003). He has consulted and worked in many applied areas, including medicine, public health, manufacturing, and reliability. Dr. Lawless was the holder of the GM-NSERC Industrial Research Chair in Quality and Productivity from 1994 to 2004.

Drs. Cook and Lawless have co-authored many papers, as well as the book The Statistical Analysis of Recurrent Events (Springer, 2007). They have also given numerous workshops together.

3. Propensity Score Methods in Medical Research for the Applied Statistician

Presenter: Ralph D'Agostino, Jr., PhD, Department of Biostatistical Sciences, Wake Forest University School of Medicine, Medical Center Boulevard, Winston-Salem, NC 27157. Email: rdagosti@wakehealth.edu

Course length: One day

Outline/Description:

The purpose of this short course is to introduce propensity score methodology to applied statisticians. Currently, propensity score methods are widely used in research, but often their use is not accompanied by an explanation of how they were used or whether they were used appropriately. This course will teach the attendee the definition of the propensity score, show how it is estimated, and present several applied examples of its use. In addition, SAS code will be presented to show how to estimate propensity scores, assess model success, and perform final treatment effect estimation. Published medical journal articles that have used propensity score methods will be examined. Some attention will be given to the use of propensity score methods for detecting safety signals using post-marketing data. Upon completion of this workshop, researchers should be able to understand what a propensity score is, know how to estimate it, identify under what circumstances propensity scores can be used, know how to evaluate whether a propensity score model "worked", and be able to critically review the medical literature where propensity scores have been used to determine whether they were used appropriately. In addition, attendees will be shown statistical programs using SAS software that estimate propensity scores, assess the success of the propensity score model, and estimate treatment effects that take into account propensity scores. Experience with SAS programming would be useful for attendees.

Textbook/References:
- Rosenbaum P, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika 1983;70:41-55.
- D'Agostino RB Jr. Propensity score methods for bias reduction in the comparison of a treatment to a non-randomized control group. Stat Med 1998;17:2265-2281.
- Rubin DB. The design versus the analysis of observational studies for causal effects: parallels with the design of randomized studies. Stat Med 2007;26:20-36.
- D'Agostino RB Jr, D'Agostino RB Sr. Estimating treatment effects using observational data. JAMA 2007;297(3):314-316.
- Yue LQ. Statistical and regulatory issues with the application of propensity score analysis to non-randomized medical device clinical studies. J Biopharm Stat 2007;17(1):1-13.
- D'Agostino RB Jr. Propensity scores in cardiovascular research. Circulation 2007;115(17):2340-2343.

About the presenters: Dr. D'Agostino holds a PhD in Mathematical Statistics from Harvard University. He is a Fellow of the American Statistical Association and a Professor of Biostatistical Sciences at the Wake Forest School of Medicine (WFSM). He has been a principal investigator for several R01 grants/subcontracts funded by the NIH/CDC, has served as the Statistical Associate Editor for Arthroscopy (The Journal of Arthroscopy and Related Surgery) since 2008, and has previously served on the editorial boards of Current Controlled Trials in Cardiovascular Medicine, the Journal of Cardiac Failure, and the American Journal of Epidemiology. He has published over 235 manuscripts and book chapters in areas of statistical methodology (in particular, propensity score methods), cardiovascular disease, diabetes, cancer, and genetics. He has extensive experience in the design and analysis of clinical trials, observational studies, and large-scale epidemiologic studies. He has been an author on several manuscripts that describe propensity score methodology, as well as many applied manuscripts that use this methodology. In addition, during the past twenty years Dr. D'Agostino has made numerous presentations and has taught several short courses and workshops on propensity score methods.

4. ChIP-seq for transcription and epigenetic gene regulation

Presenter: X. Shirley Liu, Professor of Biostatistics and Computational Biology, Harvard School of Public Health; Director, Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute; Associate Member, Broad Institute. 450 Brookline Ave, Mail CLS-11007, Boston, MA 02215. Email: xsliu@jimmy.harvard.edu

Course length: Half day

Outline/Description: With next generation sequencing, ChIP-seq has become a popular technique for studying transcriptional and epigenetic gene regulation. The short course will introduce the technique of ChIP-seq and discuss the computational and statistical issues in analyzing ChIP-seq data. These include initial data QC, normalizing biases, identifying transcription factor binding sites and target genes, predicting additional transcription factor drivers in biological processes, and integrating binding with transcriptome and epigenome information. We will also emphasize the importance of dynamic ChIP-seq and introduce some of the tools and databases that are useful for ChIP-seq data analysis.
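As a toy illustration of the peak-calling step mentioned in the outline (identifying transcription factor binding sites), the Python sketch below flags genomic windows whose read counts are improbably high under a genome-wide Poisson background. This is a drastic simplification of real peak callers such as MACS (which use local background rates, duplicate filtering, and multiple-testing control); the data, window size, and threshold are hypothetical.

```python
import math

def poisson_sf(c, lam):
    """Upper-tail probability P(X >= c) for X ~ Poisson(lam),
    computed by summing the PMF from 0 to c-1 and complementing."""
    term, cdf = math.exp(-lam), 0.0
    for k in range(c):
        cdf += term
        term *= lam / (k + 1)
    return max(1.0 - cdf, 0.0)

def call_peaks(window_counts, alpha=1e-5):
    """Return indices of windows whose ChIP read count is extreme
    relative to a Poisson background with the genome-average rate."""
    lam = sum(window_counts) / len(window_counts)  # background rate
    return [i for i, c in enumerate(window_counts)
            if poisson_sf(c, lam) < alpha]
```

For example, with 100 background windows of 5 reads each and one window of 60 reads, only the 60-read window is called as a candidate binding site.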

Textbook/References: Park PJ. ChIP-seq: advantages and challenges of a maturing technology. Nat Rev Genet 2009 Oct; 10(10): 669-80. Shin H, Liu T, Duan X, Zhang Y, Liu XS. Computational methodology for ChIP-seq analysis. Quantitative Biology 2013.

About the presenter: Dr. X. Shirley Liu is Professor of Biostatistics and Computational Biology at Harvard School of Public Health and Director of the Center for Functional Cancer Epigenetics at the Dana-Farber Cancer Institute. Her research focuses on computational models of transcriptional and epigenetic regulation, through algorithm development and data integration for high-throughput data. She has developed a number of widely used transcription factor motif finding algorithms (cited over 1,700 times) and ChIP-chip/seq analysis algorithms (over 8,000 users), and has conducted pioneering research studies on gene regulation in development, metabolism, and cancer. Dr. Liu has published over 100 papers, including over 30 in the Nature, Science, or Cell series, and she has an H-index of 50 according to Google Scholar. She has presented at over 50 conferences and workshops and has given research seminars at over 70 academic and research institutions worldwide.

5. Data Monitoring Committees in Clinical Trials

Presenter: Jay Herson, PhD, Senior Associate, Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD. Email: jayherson@earthlink.net

Course length: Half day

Outline/Description: This workshop deals with best practices for data monitoring committees (DMCs) in the pharmaceutical industry. The emphasis is on safety monitoring, because this constitutes 90% of the workload for pharmaceutical industry DMCs. The speaker summarizes experience from 24 years of working as a statistical member of, or supervisor of statistical support for, DMCs. He provides insight into the behind-the-scenes workings of DMCs, which those working in industry or at the FDA may find surprising. The introduction presents a stratification of the industry into Big Pharma, Middle Pharma, and Infant Pharma, which will be referred
to often in this workshop. Subsequent sections deal with DMC formation, DMC meetings, and the process of serious adverse event (SAE) data flow. The tutorial's section on clinical issues explains the nature of MedDRA coding as well as issues in multinational trials. This is followed by a statistical section, which reviews and illustrates the various methods of statistical analysis of treatment-emergent adverse events, dealing with multiplicity and, if time allows, likelihood and Bayesian methods. The workshop's review of biases and pitfalls describes reporting bias, analysis bias, granularity bias, competing risks, and recommendations to reduce bias. A description of DMC decisions goes through the various actions and ad hoc analyses the DMC can take when faced with an SAE issue, and their limitations. The workshop concludes with emerging issues such as adaptive designs, causal inference, biomarkers, training DMC members, cost control, DMC audits, mergers and licensing, and the high-tech future of clinical trials.

Text: Herson J. Data and Safety Monitoring Committees in Clinical Trials. Chapman & Hall/CRC, 2009.

About the presenter: Jay Herson received his PhD in Biostatistics from Johns Hopkins in 1971. After working on cancer clinical trials at MD Anderson Hospital, he formed Applied Logic Associates (ALA) in Houston in 1983. ALA had grown into a biostatistical and data management CRO with 50 employees when it was sold to Westat in 2001. Jay joined the adjunct faculty in Biostatistics at Johns Hopkins in 2004. His interests are interim analysis in clinical trials, data monitoring committees, and statistical regulatory issues. He chaired the first known data monitoring committee in the pharmaceutical industry in 1988. He is the author of numerous papers on statistical and clinical trial methodology, and in 2009 he authored the book Data and Safety Monitoring Committees in Clinical Trials, published by Chapman & Hall/CRC.

6. Analysis of Genetic Association Studies Using Sequencing Data and Related Topics

Presenters: Xihong Lin, Department of Biostatistics, Harvard School of Public Health (xlin@hsph.harvard.edu); Seunggeun Lee, University of Michigan (leeshawn@umich.edu)

Course length: Half day

Outline/Description: This short course discusses current methodology for analyzing sequencing association studies aimed at identifying the genetic basis of common complex diseases. The rapid advances in next generation sequencing technologies provide an exciting opportunity to gain a better understanding of biological processes and new approaches to disease prevention and treatment. During the past few years, an increasing number of large-scale sequencing association studies, such as exome-chip arrays, candidate gene sequencing, and whole exome and whole genome sequencing studies, have been conducted, and preliminary analysis results have rapidly become available. These studies could potentially identify new genetic variants that play important roles in understanding disease etiology or treatment response. However, due to the massive number of

variants, the rareness of many of these variants across the genome, sequencing costs, and the complexity of diseases, efficient methods for designing and analyzing sequencing studies remain vitally important yet challenging. This short course provides an overview of statistical methods for the analysis of genome-wide sequencing association studies and related topics. Topics include study designs for sequencing studies, data processing pipelines, statistical methods for detecting rare variant effects, meta-analysis, gene-environment interaction, population stratification, and mediation analysis for integrative analysis of genetic and genomic data. Data examples will be provided and software will be discussed.

Textbook/References: Handouts and references will be provided.

About the presenters: Xihong Lin is Professor of Biostatistics and Coordinating Director of the Program of Quantitative Genomics at the School of Public Health of Harvard University. Dr. Lin's research interests lie in statistical genetics and 'omics, especially the development and application of statistical and computational methods for the analysis of high-throughput genetic and omics data in epidemiological and clinical studies, and statistical methods for the analysis of correlated data such as longitudinal, clustered, and family data. Dr. Lin's specific areas of expertise include statistical methods for genome-wide association studies and next generation sequencing association studies, genes and environment, mixed models, and nonparametric and semiparametric regression. She received the 2006 Presidents' Award for the outstanding statistician from the Committee of Presidents of Statistical Societies (COPSS) and the 2002 Mortimer Spiegelman Award for the outstanding biostatistician from the American Public Health Association. She is an elected fellow of the American Statistical Association, the Institute of Mathematical Statistics, and the International Statistical Institute. Dr. Lin was the Chair of the Committee of Presidents of Statistical Societies (COPSS) between 2010 and 2012. She is currently a member of the Committee on Applied and Theoretical Statistics of the US National Academy of Sciences. Dr. Lin is a recipient of a MERIT (Method to Extend Research in Time) award from the National Institutes of Health, which provides long-term research grant support. She is the PI of the T32 training grant on interdisciplinary training in statistical genetics and computational biology. She has served on numerous editorial boards of statistical journals. She is a former Coordinating Editor of Biometrics and is currently co-editor of Statistics in Biosciences and Associate Editor of the Journal of the American Statistical Association and the American Journal of Human Genetics. She was a permanent member of the NIH study section on Biostatistical Methods and Research Design (BMRD) and has served on a large number of other study sections at NIH and NSF.
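One of the course topics listed above, detecting rare-variant effects, can be illustrated with the simplest "burden" idea: collapse each subject's rare-variant genotypes into a single score, then compare scores between groups. The Python sketch below uses hypothetical data and a plain two-sample z-test; real analyses use regression-based burden or variance-component (e.g., SKAT) tests with covariate adjustment.

```python
import math

def burden_scores(genotypes, mafs, maf_cutoff=0.01):
    """genotypes: per-subject lists of 0/1/2 minor-allele counts;
    mafs: per-variant minor allele frequencies. Sum each subject's
    allele counts over variants rarer than the cutoff."""
    rare = [j for j, f in enumerate(mafs) if f < maf_cutoff]
    return [sum(g[j] for j in rare) for g in genotypes]

def z_test(x, y):
    """Two-sample z statistic for the difference in mean burden
    (Welch-style standard error with sample variances)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    vx = sum((v - mx) ** 2 for v in x) / (len(x) - 1)
    vy = sum((v - my) ** 2 for v in y) / (len(y) - 1)
    return (mx - my) / math.sqrt(vx / len(x) + vy / len(y))
```

A large positive z for cases versus controls suggests an excess burden of rare alleles in the region; the collapsing step is what restores power when individual variants are too rare to test one at a time.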


Seunggeun (Shawn) Lee is an assistant professor of Biostatistics at the University of Michigan. He received his PhD in Biostatistics from the University of North Carolina at Chapel Hill and completed postdoctoral training at the Harvard School of Public Health. His research focuses on developing statistical and computational methods for the analysis of large-scale, high-dimensional genetic and genomic data, which are essential to better understand the genetic architecture of complex diseases and traits. He is a recipient of the NIH Pathway to Independence Award (K99/R00).

7. Analysis of biomarkers for prognosis and response prediction

Presenter: Patrick J. Heagerty, Professor and Associate Chair, Department of Biostatistics, University of Washington, Seattle, WA 98195. Email: heagerty@u.washington.edu

Course length: Half day

Outline/Description: Longitudinal studies allow investigators to correlate changes in time-dependent exposures or biomarkers with subsequent health outcomes. The use of baseline or time-dependent markers to predict a subsequent change in clinical status, such as transition to a diseased state, requires the formulation of appropriate classification and prediction error concepts. Similarly, the evaluation of markers that could be used to guide treatment requires specification of the operating characteristics associated with use of the marker. The first part of this course will introduce predictive accuracy concepts that allow evaluation of time-dependent sensitivity and specificity for prognosis of a subsequent event time. We will overview options that are appropriate both for baseline markers and for longitudinal markers. Methods will be illustrated using examples from HIV and cancer research, and will highlight R packages that are currently available. Time permitting, the second part of this course will introduce statistical methods that can characterize the performance of a biomarker toward accurately guiding treatment choice and toward improving health outcomes when the marker is used to selectively target treatment. Examples will include the use of imaging information to guide surgical treatment and the use of genetic markers to select subjects for treatment.

Textbook/References: Heagerty PJ, Lumley T, Pepe MS. Time-dependent ROC curves for censored survival data and a diagnostic marker. Biometrics 2000; 56: 337-344. Heagerty PJ, Zheng Y. Survival model predictive accuracy and ROC curves. Biometrics 2005; 61(1): 92-105. Saha P, Heagerty PJ. Time-dependent predictive accuracy in the presence of competing risks. Biometrics 2010; 66(4): 999-1011.

About the presenter: Patrick Heagerty is Professor of Biostatistics at the University of Washington, Seattle, WA. He has been the director of the Center for Biomedical Statistics at the University of Washington School of Medicine and Public Health. He is one of the leading experts on methods for longitudinal studies, including the evaluation of markers used to predict future clinical events. He has made significant contributions to many areas of research, including semi-parametric regression and estimating equations, marginal models and random effects for longitudinal data, dependence modeling for categorical time series, and hierarchical models for categorical spatial data. He is an elected fellow of the American Statistical Association and the Institute of Mathematical Statistics.
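The time-dependent accuracy concepts in this course can be sketched in a few lines: under the cumulative/dynamic definition, subjects with an event by the horizon are cases and subjects event-free past the horizon are controls, and sensitivity/specificity are computed over marker thresholds. The Python sketch below uses hypothetical data and simply drops subjects censored before the horizon; the cited Biometrics papers (and their R implementations) handle censoring properly.

```python
def time_dependent_roc(times, events, markers, horizon):
    """Cumulative/dynamic time-dependent ROC at a fixed horizon.
    Cases: event (events == 1) by `horizon`; controls: followed past
    `horizon`. Subjects censored before the horizon are dropped.
    Returns (sensitivities, specificities, auc) over all thresholds."""
    cases, controls = [], []
    for t, e, m in zip(times, events, markers):
        if t <= horizon and e == 1:
            cases.append(m)
        elif t > horizon:
            controls.append(m)
        # else: censored before horizon -> disease status unknown
    thresholds = sorted(set(cases + controls), reverse=True)
    sens = [sum(m > c for m in cases) / len(cases) for c in thresholds]
    spec = [sum(m <= c for m in controls) / len(controls)
            for c in thresholds]
    # AUC via the Mann-Whitney rank statistic over case/control pairs
    pairs = [(x > y) + 0.5 * (x == y) for x in cases for y in controls]
    auc = sum(pairs) / (len(cases) * len(controls))
    return sens, spec, auc
```

With a marker that perfectly separates early events from long-term survivors, the AUC at the horizon is 1.0; in real data the curve and AUC change with the horizon, which is exactly what these methods quantify.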

Social Programs


Opening Mixer: Sunday, June 15, 2014, 7 PM - 9 PM, Salon E, Lower Level 1

Banquet: Tuesday, June 17, 2014, 6:30 PM - 9:30 PM, JIN WAH Vietnamese & Chinese Seafood Restaurant (http://www.jinwah.com)

Banquet Speech: "The World of Statistics"

After a successful International Year of Statistics 2013, we enter the new World of Statistics. This is a great opportunity to think of our profession and look forward to the impact the statistical sciences can have on innovation and discoveries in science, engineering, business, and education. Are we going to be obsolete? Or omnipresent?

Dr. Sastry Pantula, Dean, College of Science, Oregon State University, and former President of the American Statistical Association

Sastry G. Pantula became dean of the College of Science at Oregon State University in the fall of 2013. Prior to that, he served as director of the National Science Foundation's Division of Mathematical Sciences from 2010 to 2013.

Pantula headed the statistics department at North Carolina State University (NCSU), where he served on the faculty for nearly 30 years. He also directed its Institute of Statistics. Pantula served as president of the American Statistical Association (ASA) in 2010. In addition to being an ASA fellow, he is a fellow of the American Association for the Advancement of Science (AAAS), a member of the honor societies Mu Sigma Rho and Phi Kappa Phi, and was inducted into the NCSU Academy of Outstanding Teachers in 1985.

As dean of Oregon State's College of Science and professor of statistics, Pantula provides leadership to world-class faculty in some of the university's most recognized disciplines, including nationally recognized programs in chemistry, informatics, integrative biology, marine studies, material science, physics, and others.

During his tenure at NCSU, Pantula worked with his dean and the college foundation to create three $1 million endowments for distinguished professors. He also worked with colleagues and alumni to secure more than $7 million in funding from the National Science Foundation, other agencies, and industry to promote graduate student training and mentorship.

Pantula's research areas include time series analysis and econometric modeling, with a broad range of applications. He has worked with the National Science Foundation, the US Fish and Wildlife Service, the US Environmental Protection Agency, and the US Bureau of the Census on projects ranging from population estimates to detecting trends in global temperature.

As home to the core life, physical, mathematical, and statistical sciences, the College of Science has built a foundation of excellence. It helped Oregon State acquire the top ranking in the United States for conservation biology in recent years and receive top-10 rankings from the Chronicle of Higher Education for the Departments of Integrative Biology (formerly Zoology) and Science Education. The diversity of sciences in the College, including the mathematical and statistical sciences, provides innovative opportunities for fundamental and multidisciplinary research collaborations across campus and around the globe.

Pantula holds bachelor's and master's degrees in statistics from the Indian Statistical Institute in Kolkata, India, and a PhD in statistics from Iowa State University.

2014 ICSA China Statistics Conference
July 4 - July 5, 2014 • Shanghai • China

2nd Announcement of the Conference (April 8, 2014)

To attract statistical researchers and students in China and other countries to present their work and experience with statistical colleagues, and to strengthen the connections between Chinese and overseas statisticians, the 2014 ICSA China Statistics Conference will be organized by the Committee for ICSA Shanghai and hosted by East China Normal University (ECNU) from July 4 to July 5, 2014, in Shanghai, China.

The conference will invite leading statistical professionals in mainland China, Hong Kong, Taiwan, the United States, and worldwide to present their research work. It will cover a broad range of statistics, including mathematical statistics, applied statistics, biostatistics, and statistics in finance and economics, which will provide a good platform for statistical professionals all over the world to share their latest research and applications of statistics. The invited speakers include Prof. L.J. Wei (Harvard University), Prof. Tony Cai (University of Pennsylvania), Prof. Ying Lu (Stanford University), Prof. Ming-Hui Chen (University of Connecticut), Prof. Danyu Lin (University of North Carolina at Chapel Hill), and other distinguished statisticians.

The oral presentations at the conference will be conducted in either English or Chinese. Although the Program Committee recommends presentation slides in English, Chinese versions of the slides may also be used.

The program committee is working on the conference program, and more information will be distributed very soon. Should you have any inquiries about the program, please contact Dr. Dejun Tang (dejun.tang@novartis.com) or Dr. Yankun Gong (yankun.gong@novartis.com).

For conference registration and hotel reservation, please contact Prof. Shujin Wu at ECNU (sjwu@stat.ecnu.edu.cn).

Program Committee & Local Organizing Committee
2014 ICSA China Statistics Conference


ICSA DINNER at 2014 JSM in Boston, MA

The ICSA will hold its annual members meeting on August 6 (Wednesday) at 6:00 PM in the Boston Convention & Exhibition Center, room CC-157B. An ICSA banquet will follow the members meeting at Osaka Japanese Sushi & Steak House, 14 Green St, Brookline, MA 02446, (617) 732-0088, http://brooklineosaka.com. Osaka is a Japanese fusion restaurant located in Brookline and can be reached via the MBTA subway Green Line "C" branch (Coolidge Corner stop). This restaurant features a cozy setting, superior cuisine, and elegant decor. The banquet menu will include Oyster 3-Ways, Rock Shrimp, Shrimp Tempura, Sushi and Sashimi boat, Hibachi seafood, Char-Grilled Sea Bass, and Lobster. Complimentary wine/sake/soft drinks will be served, and a cash bar for extra drinks will be available. The restaurant also has a club dance floor that provides complimentary Karaoke.


Scientific Program (June 16th - June 18th)

Monday, June 16, 8:00 AM - 9:30 AM

Keynote Session I (Keynote)
Room: Salon E-F, Lower Level 1
Organizers: ICSA-KISS 2014 organizing committee
Chair: Dongseok Choi, Oregon Health & Science University

8:00 AM Welcome
Ying Lu, ICSA 2014 President

8:05 AM Congratulatory Address
George C. Tiao, ICSA Founding President

8:20 AM Keynote Lecture I
Robert Gentleman, Genentech

9:30 AM Floor Discussion

Monday, June 16, 10:00 AM - 12:00 PM

Session 1: Emerging Statistical Methods for Complex Data (Invited)
Room: Salon A, Lower Level 1
Organizer: Lan Xue, Oregon State University
Chair: Lan Xue, Oregon State University

10:00 AM Estimation of the Error Auto-Correlation Matrix in Semi-parametric Model for Brain fMRI Data
Chunming Zhang and Xiao Guo, University of Wisconsin-Madison

10:25 AM Kernel Additive Sliced Inverse Regression
Heng Lian, Nanyang Technological University

10:50 AM Variable Selection with Prior Information for Generalized Linear Models via the Prior LASSO Method
Yuan Jiang1, Yunxiao He2, and Heping Zhang3; 1Oregon State University, 2Nielsen Company, 3Yale University

11:15 AM Bootstrapping High Dimensional Vector: Interplay Between Dependence and Dimensionality
Xianyang Zhang1 and Guang Cheng2; 1University of Missouri at Columbia, 2Purdue University

11:40 AM Floor Discussion

Session 2: Statistical Methods for Sequencing Data Analysis (Invited)
Room: Salon B, Lower Level 1
Organizer: Yanming Di, Oregon State University
Chair: Gu Mi, Oregon State University

10:00 AM A Penalized Likelihood Approach for Robust Estimation of Isoform Expression
Hui Jiang1 and Julia Salzman2; 1University of Michigan, 2Stanford University

10:25 AM Classification on Sequencing Data and its Applications on a Human Breast Cancer Dataset
Jun Li, University of Notre Dame

10:50 AM Power-Robustness Analysis of Statistical Models for RNA Sequencing Data
Gu Mi, Yanming Di, and Daniel W. Schafer, Oregon State University

11:15 AM Discussant: Wei Sun, University of North Carolina at Chapel Hill

11:40 AM Floor Discussion

Session 3: Modeling Big Biological Data with Complex Structures (Invited)
Room: Salon C, Lower Level 1
Organizer: Hua Tang, Stanford University
Chair: Marc Coram, Stanford University

10:00 AM High Dimensional Graphical Models Learning
Jie Peng and Ru Wang, University of California at Davis

10:25 AM Statistical Analysis of RNA Sequencing Data
Mingyao Li and Yu Hu, University of Pennsylvania

10:50 AM Quantifying the Role of Steric Constraints in Nucleosome Positioning
H. Tomas Rube and Jun S. Song, University of Illinois at Urbana-Champaign

11:15 AM Integrative Dynamic Omics Networks and Personalized Medicine
George I. Mias, Michigan State University

11:40 AM Floor Discussion

Session 4: Bayesian Approaches for Modeling Dynamic Non-Gaussian Responses (Invited)
Room: Salon D, Lower Level 1
Organizer: Xiaojing Wang, University of Connecticut
Chair: Xun Jiang, Amgen Inc.

10:00 AM Binary State Space Mixed Models with Flexible Link Functions
Dipak Dey1, Xun Jiang2, and Carlos Abanto-Valle3; 1University of Connecticut, 2Amgen Inc., 3Federal University of Rio de Janeiro

10:25 AM Bayesian Spatial-Temporal Modeling of Ecological Zero-Inflated Count Data
Xia Wang1, Ming-Hui Chen2, Rita C. Kuo3, and Dipak K. Dey2; 1University of Cincinnati, 2University of Connecticut, 3Lawrence Berkeley National Laboratory

10:50 AM Real-time Bayesian Parameter Estimation for Item Response Models
Ruby Chiu-Hsing Weng, National Chengchi University

11:15 AM Statistical Prediction for Virginia Lyme Disease Emergence Based on Spatio-temporal Count Data
Yuanyuan Duan, Jie Li, Yili Hong, Korine Kolivras, Stephen Prisley, James Campbell, and David Gaines, Virginia Tech


11:40 AM Floor Discussion

Session 5: Recent Advances in Astro-Statistics (Invited)
Room: Salon G, Lower Level 1
Organizer: Thomas Lee, University of California at Davis
Chair: Alexander Aue, University of California at Davis

10:00 AM Embedding the Big Bang Cosmological Model into a Bayesian Hierarchical Model for Super Nova Light Curve Data
David van Dyk, Roberto Trotta, Xiyun Jiao, and Hikmatali Shariff, Imperial College London

10:25 AM Marrying Domain Knowledge and Statistical Methods
Ashish Mahabal, George Djorgovski, Matthew Graham, Ciro Donalek, and Andrew Drake, California Institute of Technology

10:50 AM Nonlinear Classification of X-Ray Binaries
Luke Bornn and Saku Vrtilek, Harvard University

11:15 AM Persistent Homology and the Topology of the Intergalactic Medium
Fabrizio Lecci, Carnegie Mellon University

11:40 AM Floor Discussion

Session 6: Statistical Methods and Application in Genetics (Invited)
Room: Salon H, Lower Level 1
Organizer: Ying Wei, Columbia University
Chair: Ying Wei, Columbia University

10:00 AM Identification of Homogeneous and Heterogeneous Covariate Structure in Pooled Cohort Studies
Xin Cheng1, Wenbin Lu2, and Mengling Liu1; 1New York University, 2North Carolina State University

10:25 AM Gene Expression Analyses in Evaluating Translational Biomarkers from Drug Induced Idiopathic Pulmonary Fibrosis in Animal Models
Wenfei Zhang, Yuefeng Lu, Tai-He Xia, Guillaume Wettstein, Jean-Pierre Bidouard, and Xavier Marniquet, Sanofi-aventis US LLC

10:50 AM DNA Methylation, Cell-Type Distribution, and EWAS
E. Andres Houseman, Oregon State University

11:15 AM Secondary Quantile Analysis for GWAS
Ying Wei1, Xiaoyu Song1, Mengling Liu2, and Iuliana Ionita-Laza1; 1Columbia University, 2New York University

11:40 AM Floor Discussion

Session 7: Statistical Inference of Complex Associations in High-Dimensional Data (Invited)
Room: Salon I, Lower Level 1
Organizer: Jun Liu, Harvard University
Chair: Di Wu, Harvard University

10:00 AM Leveraging for Big Data Regression
Ping Ma, University of Georgia

10:25 AM Reference-free Metagenomics Analysis Using Matrix Factorization
Wenxuan Zhong and Xin Xing, University of Georgia

10:50 AM Big Data, Big Models, Big Problems: Statistical Principles and Practice at Scale
Alexander W. Blocker, Google

11:15 AM Floor Discussion

Session 8: Recent Developments in Survival Analysis (Invited)
Room: Eugene Room, Lower Level 1
Organizer: Qingxia (Cindy) Chen, Vanderbilt University
Chair: Qingxia (Cindy) Chen, Vanderbilt University

10:00 AM Bayesian Joint Modeling of Multi-dimensional Longitudinal and Survival Data with Applications to Cancer Clinical Trials
Ming-Hui Chen1, Danjie Zhang1, Joseph G. Ibrahim2, Mark E. Boye3, and Wei Shen3; 1University of Connecticut, 2University of North Carolina, 3Eli Lilly and Company

10:25 AM Estimating Risk with Time-to-Event Data: An Application to the Women's Health Initiative
Dandan Liu1, Yingye Zheng2, Ross Prentice2, and Li Hsu2; 1Vanderbilt University, 2Fred Hutchinson Cancer Research Center

10:50 AM Efficient Estimation of Nonparametric Genetic Risk Function with Censored Data
Yuanjia Wang1, Baosheng Liang2, and Donglin Zeng3; 1Columbia University, 2Beijing Normal University, 3University of North Carolina at Chapel Hill

11:15 AM Support Vector Hazard Regression for Predicting Event Times Subject to Censoring
Xiaoxi Liu1, Yuanjia Wang2, and Donglin Zeng1; 1University of North Carolina, 2Columbia University

11:40 AM Floor Discussion

Session 9: Industry Practice and Regulatory Pathway for Benefit-Risk Assessment of Medicinal Products (Invited)
Room: Portland Room, Lower Level 1
Organizers: Shihua Wen, AbbVie Inc.; Yijie Zhou, Merck & Co.
Chair: Yijie Zhou, Merck & Co.

10:00 AM Visual Communication and Assessment of Benefit-Risk for Medical Products
Jonathan D. Norton, MedImmune

10:25 AM Some Thoughts on Evaluation of Uncertainty in Benefit-Risk Assessment
Qi Jiang1, Haijun Ma1, Christy Chuang-Stein2, Weili He3, George Quartey4, John Scott5, and Shihua Wen6; 1Amgen Inc., 2Pfizer Inc., 3Merck & Co., 4Hoffmann-La Roche, 5United States Food and Drug Administration, 6AbbVie Inc.

10:50 AM Current Concept of Benefit Risk Assessment of Medicine
Syed S. Islam, AbbVie Inc.

11:15 AM Discussant: Yang Bo, AbbVie Inc.

11:40 AM Floor Discussion

20 2014 ICSA-KISS Applied Statistics Symposium Portland June 15-18

Scientific Program (Presenting Author) Monday June 16 130 PM - 310 PM

Session 10: Analysis of Observational Studies and Clinical Trials (Contributed)
Room: Salem Room, Lower Level 1
Chair: Naitee Ting, Boehringer-Ingelheim Company

10:00 AM Impact of Tuberculosis on Mortality Among HIV-Infected Patients Receiving Antiretroviral Therapy in Uganda: A Case Study in Propensity Score Analysis
Rong Chu1, Edward J. Mills2, Joseph Beyene3, Eleanor Pullenayegum4, Celestin Bakanda5, Jean B. Nachega6, and Lehana Thabane3; 1Agensys, Inc. (Astellas), 2University of Ottawa/McMaster University, 3McMaster University, 4McMaster University/University of Toronto, 5The AIDS Support Organization, 6Stellenbosch University

10:20 AM Ecological Momentary Assessment Methods to Increase Response and Adjust for Attrition in a Study of Middle School Students' Exposure to Alcohol Advertising
Steven Martino, Rebecca Collins, Stephanie Kovalchik, Kirsten Becker, Elizabeth D'Amico, William Shadel, and Marc Elliott, RAND Corporation

10:40 AM Is Poor Antisaccade Performance in Healthy First-Degree Relatives of Schizophrenics an Artifact of Study Design?
Charity J. Morgan1, Mark F. Lenzenweger2, and Deborah L. Levy3; 1University of Alabama at Birmingham, 2State University of New York at Binghamton, 3McLean Hospital

11:00 AM Analysis of a Vaccine Study in Animals Using Mitigated Fraction in SAS
Mathew Rosales, Experis

11:20 AM Competing Risks Survival Analysis for Efficacy Evaluation of Some-or-None Vaccines
Paul T. Edlefsen, Fred Hutchinson Cancer Research Center

11:40 AM Using Historical Data to Automatically Identify Air-Traffic Controller Behavior
Yuefeng Wu, University of Missouri at St. Louis

12:00 PM Floor Discussion

Monday, June 16, 1:30 PM - 3:10 PM

Session 11: Lifetime Data Analysis (Invited)
Room: Salon A, Lower Level 1
Organizer: Mei-Ling Ting Lee, University of Maryland
Chair: Mei-Ling Ting Lee, University of Maryland

1:30 PM Analysis of Multiple Type Recurrent Events When Only Partial Information Is Available for Some Subjects
Min Zhan and Jeffery Fink, University of Maryland

1:55 PM Cumulative Incidence Function under Two-Stage Randomization
Idil Yavuz1, Yu Cheng2, and Abdus Wahed2; 1Dokuz Eylul University, 2University of Pittsburgh

2:20 PM Nonparametric Threshold Selection with Censored Survival Data
Xinhua Liu and Zhezhen Jin, Columbia University

2:45 PM Floor Discussion

Session 12: Safety Signal Detection and Safety Analysis (Invited)
Room: Salon B, Lower Level 1
Organizer: Qi Jiang, Amgen Inc.
Chair: Qi Jiang, Amgen Inc.

1:30 PM Evaluation of Statistical Methods for the Identification of Potential Safety Signals
Maggie Chen, Li Zhu, Padmaja Chiruvolu, Liying Zhang, and Qi Jiang, Amgen Inc.

1:55 PM Application of a Bayesian Method for Blinded Safety Monitoring and Signal Detection in Clinical Trials
Shihua Wen, Jyotirmoy Dey, Greg Ball, and Karolyn Kracht, AbbVie Inc.

2:20 PM Some Thoughts on the Choice of Metrics for Safety Evaluation
Steven Snapinn, Amgen Inc.

2:45 PM Hypothesis Testing on Safety Data: A Recurrent Event Approach
Qi Gong1 and Liang Fang2; 1Amgen Inc., 2Gilead Sciences

3:10 PM Floor Discussion

Session 13: Survival and Recurrent Event Data Analysis (Invited)
Room: Salon C, Lower Level 1
Organizer: Chiung-Yu Huang, Johns Hopkins University
Chair: Chiung-Yu Huang, Johns Hopkins University

1:30 PM Survival Analysis without Survival Data
Gary Chan, University of Washington

1:55 PM Semiparametric Estimation for the Additive Hazards Model with Left-Truncated and Right-Censored Data
Chiung-Yu Huang1 and Jing Qin2; 1Johns Hopkins University, 2National Institute of Allergy and Infectious Diseases

2:20 PM Nonparametric Method for Data of Recurrent Infections after Hematopoietic Cell Transplantation
Chi Hyun Lee1, Xianghua Luo1, Chiung-Yu Huang2, and Todd DeFor1; 1University of Minnesota, 2Johns Hopkins University

2:45 PM Floor Discussion

Session 14: Statistical Analysis on Massive Data from Point Processes (Invited)
Room: Salon D, Lower Level 1
Organizer: Haonan Wang, Colorado State University
Chair: Chunming Zhang, University of Wisconsin-Madison

1:30 PM Identification of Synaptic Learning Rule from Ensemble Spiking Activities
Dong Song and Theodore W. Berger, University of Southern California

1:55 PM Intrinsically Weighted Means and Non-Ergodic Marked Point Processes
Alexander Malinowski1, Martin Schlather1, and Zhengjun Zhang2; 1University of Mannheim, 2University of Wisconsin

2:20 PM Statistical Analysis for Unlabeled Data Objects
Ela Sienkiewicz and Haonan Wang, Colorado State University


2:45 PM Floor Discussion

Session 15: High Dimensional Inference (or Testing) (Invited)
Room: Salon G, Lower Level 1
Organizer: Pengsheng Ji, University of Georgia
Chair: Pengsheng Ji, University of Georgia

1:30 PM Adaptive Sparse Reduced-rank Regression
Zongming Ma and Tingni Sun, University of Pennsylvania

1:55 PM Variable Screening in Biothreat Detection Using Weighted Leverage Score
Wenxuan Zhong and Yiwen Liu, University of Georgia

2:20 PM Testing High-Dimensional Nonparametric Function with Application to Gene Set Analysis
Tao He, Ping-Shou Zhong, Yuehua Cui, and Vidyadhar Mandrekar, Michigan State University

2:45 PM Zero-Inflation in Clustered Binary Response Data: Mixed Model and Estimating Equation Approaches
Danping Liu, National Institutes of Health

3:10 PM Floor Discussion

Session 16: Phase II Clinical Trial Design with Survival Endpoint (Invited)
Room: Salon H, Lower Level 1
Organizer: Jianrong Wu, St. Jude Children's Research Hospital
Chair: Joan Hu, Simon Fraser University

1:30 PM Utility-Based Optimization of Schedule-Dose Regimes Based on the Times to Response and Toxicity
Peter F. Thall1, Hoang Q. Nguyen1, Thomas Braun2, and Muzaffar Qazilbash1; 1University of Texas MD Anderson Cancer Center, 2University of Michigan

1:55 PM Bayesian Decision Theoretic Two-Stage Design in Phase II Clinical Trials with Survival Endpoint
Lili Zhao and Jeremy Taylor, University of Michigan

2:20 PM Single-Arm Phase II Group Sequential Trial Design with Survival Endpoint at a Fixed Time Point
Jianrong Wu and Xiaoping Xiong, St. Jude Children's Research Hospital

2:45 PM Floor Discussion

Session 17: Statistical Modeling of High-throughput Genomics Data (Invited)
Room: Salon I, Lower Level 1
Organizer: Mingyao Li, University of Pennsylvania School of Medicine
Chair: Mingyao Li, University of Pennsylvania

1:30 PM Learning Genetic Architecture of Complex Traits Across Populations
Marc Coram, Sophie Candille and Hua Tang, Stanford University

1:55 PM A Bayesian Hierarchical Model to Detect Differentially Methylated Loci from Single Nucleotide Resolution Sequencing Data
Hao Feng, Karen Conneely and Hao Wu, Emory University

2:20 PM Differential Isoform Expression Analysis in RNA-Seq Using Random-Effects Meta-Regression
Weihua Guan1, Rui Xiao2, Chun Li3 and Mingyao Li2; 1University of Minnesota, 2University of Pennsylvania, 3Vanderbilt University

2:45 PM Allele-Specific Differential Methylation Analysis with Next Generation Methylation Sequencing Data
Fei Zou, University of North Carolina at Chapel Hill

3:10 PM Floor Discussion

Session 18: Statistical Applications in Finance (Invited)
Room: Portland Room, Lower Level 1
Organizer: Zheng Su, Deerfield Company
Chair: Zheng Su, Deerfield Company

1:30 PM A Stochastic Mixture Model for Economic Cycles
Haipeng Xing1 and Ning Sun2; 1State University of New York, 2IBM

1:55 PM Statistical Modelling of Bidding Prices in Online Ad Position Auctions
Xiaoming Huo, Georgia Institute of Technology

2:20 PM Regression with Rank Covariates: A Distribution Guided Scores for Ranks
Do Hwan Park1, Yuneung Kim2, Johan Lim3, Sujung Choi4 and Hsun-Chih Kuo5; 1University of Maryland, 2Seoul National University, 3Auburn University, 4Ulsan National Institute of Science and Technology, 5National Chengchi University

2:45 PM Optimal Sparse Volatility Matrix Estimation for High Dimensional Ito Processes with Measurement Errors
Minjing Tao1, Yazhen Wang2 and Harrison Zhou3; 1Florida State University, 2University of Wisconsin-Madison, 3Yale University

3:10 PM Floor Discussion

Session 19: Hypothesis Testing (Contributed)
Room: Eugene Room, Lower Level 1
Chair: Fei Tan, Indiana University-Purdue University

1:30 PM A Score-type Test for Heterogeneity in Zero-inflated Models in a Stratified Population
Guanqun Cao1, Wei-Wen Hsu2 and David Todem3; 1Auburn University, 2Kansas State University, 3Michigan State University

1:50 PM Inferences on Correlation Coefficients of Bivariate Log-normal Distributions
Guoyi Zhang1 and Zhongxue Chen2; 1University of New Mexico, 2Indiana University

2:10 PM Testing Calibration of Risk Models at Extremes of Disease Risk
Minsun Song1, Peter Kraft2, Amit D. Joshi2, Myrto Barrdahl3 and Nilanjan Chatterjee1; 1National Cancer Institute, 2Harvard University, 3German Cancer Research Center

2:30 PM Statistical Issues When Incidence Rates Extremely Low And Sample Sizes Very Big
Peter Hu and Haijun Ma, Amgen Inc.


2:50 PM Minimum Distance Regression Model Checking When Responses are Missing at Random
Xiaoyu Li, Auburn University

3:10 PM Floor Discussion

Session 20: Design and Analysis of Clinical Trials (Contributed)
Room: Salem Room, Lower Level 1
Chair: Amei Amei, University of Nevada at Las Vegas

1:30 PM Application of Bayesian Approach in Assessing Rare Adverse Events during a Clinical Study
Grace Li, Karen Price, Haoda Fu and David Manner, Eli Lilly and Company

1:50 PM A Simplified Varying-Stage Adaptive Phase II/III Clinical Trial Design
Gaohong Dong, Novartis Pharmaceuticals Corporation

2:10 PM Improving Multiple Comparison Procedures With Coprimary Endpoints by Generalized Simes Tests
Hua Li1, Willi Maurer1, Werner Brannath2 and Frank Bretz1; 1Novartis Pharmaceuticals Corporation, 2University of Bremen

2:30 PM Efficient Design for Cluster Randomized Trials with Binary Outcomes
Sheng Wu, Weng Kee Wong and Catherine Crespi, University of California at Los Angeles

2:50 PM Zero Event and Continuity Correction in Meta-Analyses of Rare Events Using Mantel-Haenszel Odds Ratio and Risk Difference
Tianyue Zhou, Sanofi-aventis US LLC

3:10 PM Floor Discussion

Monday, June 16, 3:30 PM - 5:10 PM

Session 21: New Methods for Big Data (Invited)
Room: Salon A, Lower Level 1
Organizer: Yichao Wu, North Carolina State University
Chair: Yichao Wu, North Carolina State University

3:30 PM Sure Independence Screening for Gaussian Graphical Models
Shikai Luo1, Daniela Witten2 and Rui Song1; 1North Carolina State University, 2University of Washington

3:55 PM Case-Specific Random Forests
Ruo Xu1, Dan Nettleton2 and Daniel J. Nordman2; 1Google, 2Iowa State University

4:20 PM Uncertainty Quantification for Massive Data Problems Using Generalized Fiducial Inference
Randy C. S. Lai1, Jan Hannig2 and Thomas C. M. Lee1; 1University of California at Davis, 2University of North Carolina at Chapel Hill

4:45 PM OEM Algorithm for Big Data
Xiao Nie and Peter Z. G. Qian, University of Wisconsin-Madison

5:10 PM Floor Discussion

Session 22: New Statistical Methods for Analysis of High Dimensional Genomic Data (Invited)
Room: Salon B, Lower Level 1
Organizer: Michael C. Wu, Fred Hutchinson Cancer Research Center
Chair: Michael C. Wu, Fred Hutchinson Cancer Research Center

3:30 PM Integrative Modeling of Multi-Platform Genomic Data under the Framework of Mediation Analysis
Yen-Tsung Huang, Brown University

3:55 PM Estimation of High Dimensional Directed Acyclic Graphs Using eQTL Data
Wei Sun1 and Min Jin Ha2; 1University of North Carolina at Chapel Hill, 2University of Texas MD Anderson Cancer Center

4:20 PM Prioritizing Disease Genes from Genome-wide Association Studies Through Dynamic Co-expression Networks
Lin Hou1, Min Chen2, Clarence Zhang3, Judy Cho4 and Hongyu Zhao1; 1Yale University, 2University of Texas at Dallas, 3Bristol-Myers Squibb, 4Mount-Sinai Medical Center

4:45 PM Kernel Machine Methods for Joint Testing and Integrative Analysis of Genome Wide Methylation and Genotyping Studies
Ni Zhao and Michael Wu, Fred Hutchinson Cancer Research Center

5:10 PM Floor Discussion

Session 23: Recent Advances in Analysis of Longitudinal Data with Informative Observation Process (Invited)
Room: Salon C, Lower Level 1
Organizer: Jing Ning, University of Texas MD Anderson Cancer Center
Chair: Weining Shen, The University of Texas MD Anderson Cancer Center

3:30 PM Joint Modeling of Alternating Recurrent Transition Times
Liang Li, University of Texas MD Anderson Cancer Center

3:55 PM Regression Analysis of Panel Count Data with Informative Observation Times
Yang Li1, Xin He2, Haiying Wang3 and Jianguo Sun4; 1University of North Carolina at Charlotte, 2University of Maryland, 3University of New Hampshire, 4University of Missouri at Columbia

4:20 PM Envelope Linear Mixed Model
Xin Zhang, University of Minnesota

4:45 PM Regression Analysis of Longitudinal Data with Irregular and Informative Observation Times
Yong Chen, Jing Ning and Chunyan Cai, University of Texas Health Science Center at Houston

5:10 PM Floor Discussion

Session 24: Bayesian Models for High Dimensional Complex Data (Invited)
Room: Salon D, Lower Level 1
Organizer: Juhee Lee, University of California at Santa Cruz
Chair: Juhee Lee, University of California at Santa Cruz


3:30 PM A Bayesian Feature Allocation Model for Tumor Heterogeneity
Juhee Lee1, Peter Mueller2, Yuan Ji3 and Kamalakar Gulukota4; 1University of California at Santa Cruz, 2University of Texas at Austin, 3University of Chicago, 4NorthShore University HealthSystem

3:55 PM Some Results on the One-Way ANOVA Model with an Increasing Number of Groups
Feng Liang, University of Illinois at Urbana-Champaign

4:20 PM Bayesian Graphical Models for Differential Pathways
Riten Mitra1, Peter Mueller2 and Yuan Ji3; 1University of Louisville, 2University of Texas at Austin, 3NorthShore University HealthSystem/University of Chicago

4:45 PM Latent Space Models for Dynamic Networks
Yuguo Chen, University of Illinois at Urbana-Champaign

5:10 PM Floor Discussion

Session 25: Statistical Methods for Network Analysis (Invited)
Room: Salon G, Lower Level 1
Organizer: Yunpeng Zhao, George Mason University
Chair: Yunpeng Zhao, George Mason University

3:30 PM Consistency of Co-clustering for Exchangeable Graph and Array Data
David S. Choi1 and Patrick J. Wolfe2; 1Carnegie Mellon University, 2University College London

3:55 PM Laplacian Shrinkage for Inverse Covariance Estimation from Heterogeneous Populations
Takumi Saegusa and Ali Shojaie, University of Washington

4:20 PM Estimating Signature Subgraphs in Samples of Labeled Graphs
Juhee Cho and Karl Rohe, University of Wisconsin-Madison

4:45 PM Fast Hierarchical Modeling for Recommender Systems
Patrick Perry, New York University

5:10 PM Floor Discussion

Session 26: New Analysis Methods for Understanding Complex Diseases and Biology (Invited)
Room: Salon H, Lower Level 1
Organizer: Wenyi Wang, University of Texas MD Anderson Cancer Center
Chair: Wenyi Wang, University of Texas MD Anderson Cancer Center

3:30 PM Data-Integration for Identifying Clinically Important Long Non-coding RNA in Cancer
Yiwen Chen1, Zhou Du2, Teng Fei1, Roel G. W. Verhaak3, Yong Zhang2, Myles Brown4 and X. Shirley Liu4; 1Dana Farber Cancer Institute, 2Tongji University, 3University of Texas MD Anderson Cancer Center, 4Dana Farber Cancer Institute & Harvard University

3:55 PM Data Integration for Genetics-Based Drug Repurposing in Complex Diseases
Di Wu, Harvard University

4:30 PM Comparative Meta-Analysis of Prognostic Gene Signatures for Late-Stage Ovarian Cancer
Levi Waldron, Hunter College

4:45 PM Studying Spatial Organizations of Chromosomes via Parametric Model
Ming Hu1, Yu Zhu2, Zhaohui Steve Qin3, Ke Deng4 and Jun S. Liu5; 1New York University, 2Purdue University, 3Emory University, 4Tsinghua University, 5Harvard University

5:10 PM Floor Discussion

Session 27: Recent Advances in Time Series Analysis (Invited)
Room: Salon I, Lower Level 1
Organizer: Mikyoung Jun, Texas A&M University
Chair: Zhengjun Zhang, University of Wisconsin

3:30 PM Time Series Models for Spherical Data with Applications in Structural Biochemistry
Jay Breidt, Daniel Hernandez-Stumpfhauser and Mark van der Woerd, Colorado State University

3:55 PM Semiparametric Estimation of Spectral Density Function with Irregular Data
Shu Yang and Zhengyuan Zhu, Iowa State University

4:20 PM On the Prediction of Stationary Functional Time Series
Alexander Aue1, Diogo Dubart Norinho2 and Siegfried Hormann3; 1University of California at Davis, 2University College London, 3Université Libre de Bruxelles

4:45 PM A Composite Likelihood-based Approach for Multiple Change-point Estimation in Multivariate Time Series Models
Chun Yip Yau and Ting Fung Ma, Chinese University of Hong Kong

5:10 PM Floor Discussion

Session 28: Analysis of Correlated Longitudinal and Survival Data (Invited)
Room: Eugene Room, Lower Level 1
Organizer: Jingjing Wu, University of Calgary
Chair: Jingjing Wu, University of Calgary

3:30 PM Analysis of a Non-Randomized Longitudinal Quality of Life Trial
Mounir Mesbah, University of Paris 6

3:55 PM Power and Sample Size Calculations for Evaluating Mediation Effects with Multiple Mediators in Longitudinal Studies
Cuiling Wang, Albert Einstein College of Medicine

4:20 PM Distribution-free First-hitting-time Based Threshold Regressions for Lifetime Data
Mei-Ling Ting Lee1 and G. Alex Whitmore2; 1University of Maryland, 2McGill University

4:45 PM Joint Modeling of Survival Data and Mismeasured Longitudinal Data Using the Proportional Odds Model
Juan Xiong1, Wenqing He1 and Grace Yi2; 1University of Western Ontario, 2University of Waterloo

5:10 PM Floor Discussion


Session 29: Clinical Pharmacology (Invited)
Room: Portland Room, Lower Level 1
Organizer: Christine Wang, Amgen
Chair: Christine Wang, Amgen

3:30 PM Truly Personalizing Medicine
Mike D. Hale, Amgen Inc.

3:55 PM What Do Statisticians Do in Clinical Pharmacology?
Brian Smith, Amgen Inc.

4:20 PM The Use of Modeling and Simulation to Bridge Different Dosing Regimens - A Case Study
Chyi-Hung Hsu and Jose Pinheiro, Janssen Research & Development

4:45 PM A Comparison of FDA and EMA Recommended Models for Bioequivalence Studies
Yongwu Shao, Lingling Han, Bing Gao, Sally Zhao, Susan Guo, Lijie Zhong and Liang Fang, Gilead Sciences

5:10 PM Floor Discussion

Session 30: Sample Size Estimation (Contributed)
Room: Salem Room, Lower Level 1
Chair: Antai Wang, New Jersey Institute of Technology

3:30 PM Sample Size Calculation with Semiparametric Analysis of Long Term and Short Term Hazards
Yi Wang, Novartis Pharmaceuticals Corporation

3:50 PM Sample Size and Decision Criteria for Phase IIB Studies with Active Control
Xia Xu, Merck & Co.

4:10 PM Sample Size Determination for Clinical Trials to Correlate Outcomes with Potential Predictors
Su Chen, Xin Wang and Ying Zhang, AbbVie Inc.

4:30 PM Sample Size Re-Estimation at Interim Analysis in Oncology Trials with a Time-to-Event Endpoint
Ian (Yi) Zhang, Sunovion Pharmaceuticals Inc.

4:50 PM Statistical Inference and Sample Size Calculation for Paired Binary Outcomes with Missing Data
Song Zhang, University of Texas Southwestern Medical Center

5:10 PM Floor Discussion

Tuesday, June 17, 8:20 AM - 9:30 AM

Keynote Session II (Keynote)
Room: Salon E-F, Lower Level 1
Organizers: ICSA-KISS 2014 organizing committee
Chair: Rochelle Fu, Oregon Health & Science University

8:20 AM Keynote Lecture II
Sharon-Lise Normand, Harvard University

9:30 AM Floor Discussion

Tuesday, June 17, 10:00 AM - 12:00 PM

Session 31: Predictions in Clinical Trials (Invited)
Room: Salon A, Lower Level 1
Organizer: Yimei Li, University of Pennsylvania
Chair: Daniel Heitjan, University of Pennsylvania

10:00 AM Predicting Smoking Cessation Outcomes Beyond Clinical Trials
Yimei Li, E. Paul Wileyto and Daniel F. Heitjan, University of Pennsylvania

10:25 AM Bayesian Event and Time Landmark Estimation in Clinical Trials When Responses Are Failure Time Data
Haoda Fu, Luping Zhao and Yanping Wang, Eli Lilly and Company

10:50 AM Predicting the Probability of Future Clinical Study Success Based on the Evidence from Electronic Medical Record (EMR) Data
Haoda Fu1 and Nan Jia2; 1Eli Lilly and Company, 2University of Southern California

11:15 AM Weibull Cure-Mixture Model for the Prediction of Event Times in Randomized Clinical Trials
Gui-shuang Ying1, Qiang Zhang2, Yimei Li1 and Daniel F. Heitjan1; 1University of Pennsylvania, 2Radiation Therapy Oncology Group Statistical Center

11:40 AM Floor Discussion

Session 32: Recent Advances in Statistical Genetics (Invited)
Room: Salon B, Lower Level 1
Organizer: Taesung Park, Seoul National University
Chair: Taesung Park, Seoul National University

10:00 AM Longitudinal Exome-Focused GWAS of Alcohol Use in a Veteran Cohort
Zuoheng Wang, Zhong Wang, Amy C. Justice and Ke Xu, Yale University

10:25 AM Type I Error in Regression-based Genetic Model Building
Heejong Sung1, Alexa J. M. Sorant1, Bhoom Suktitipat2 and Alexander F. Wilson1; 1National Institutes of Health, 2Mahidol University

10:50 AM GMDR: A Conceptual Framework for Detection of Multifactor Interactions Underlying Complex Traits
Xiang-Yang Lou, University of Alabama at Birmingham

11:15 AM Gene-Gene Interaction Analysis for Rare Variants: Application to T2D Exome Sequencing Data
Taesung Park1, Min-Seok Kwon1 and Seung Yeoun Lee2; 1Seoul National University, 2Sejong University

11:40 AM Floor Discussion

Session 33: Structured Approach to High Dimensional Data with Sparsity and Low Rank Factorization (Invited)
Room: Salon C, Lower Level 1
Organizer: Yoonkyung Lee, Ohio State University
Chair: Yoonkyung Lee, Ohio State University

10:00 AM Two-way Regularized Matrix Decomposition
Jianhua Huang, Texas A&M University


10:25 AM Tensor Regression with Applications in Neuroimaging Analysis
Hua Zhou1, Lexin Li1 and Hongtu Zhu2; 1North Carolina State University, 2University of North Carolina at Chapel Hill

10:50 AM RKHS-Embedding Based Feature Screening for High-Dimensional Data
Krishnakumar Balasubramanian1, Bharath Sriperumbudur2 and Guy Lebanon1; 1Georgia Institute of Technology, 2Pennsylvania State University

11:15 AM Sparse Conditional Graphical Models for Structured Genetic Datasets
Hyonho Chun, Purdue University

11:40 AM Floor Discussion

Session 34: Recent Developments in Dimension Reduction, Variable Selection and Their Applications (Invited)
Room: Salon D, Lower Level 1
Organizer: Xiangrong Yin, University of Georgia
Chair: Pengsheng Ji, University of Georgia

10:00 AM Variable Selection and Model Estimation via Subtle Uprooting
Xiaogang Su, University of Texas at El Paso

10:25 AM Robust Variable Selection Through Dimension Reduction
Qin Wang, Virginia Commonwealth University

10:50 AM Sparse Envelope Model: Efficient Estimation and Response Variable Selection in Multivariate Linear Regression
Zhihua Su1, Guangyu Zhu1 and Xin Chen2; 1University of Florida, 2National University of Singapore

11:15 AM Floor Discussion

Session 35: Post-Discontinuation Treatment in Randomized Clinical Trials (Invited)
Room: Salon G, Lower Level 1
Organizer: Li Li, Research Scientist, Eli Lilly and Company
Chair: Li Li, Eli Lilly and Company

10:00 AM Marginal Structure Model with Adaptive Truncation in Estimating the Initial Treatment Effect with Informative Censoring by Subsequent Therapy
Jingyi Liu1, Li Li1, Xiaofei Bai2 and Douglas Faries1; 1Eli Lilly and Company, 2North Carolina State University

10:25 AM Quantile Regression Adjusting for Dependent Censoring from Semi-Competing Risks
Ruosha Li1 and Limin Peng2; 1University of Pittsburgh, 2Emory University

10:50 AM Overview of Crossover Design
Ming Zhu, AbbVie Inc.

11:15 AM Cross-Payer Effects of Medicaid LTSS on Medicare Resource Use Using Propensity Score Risk Profiling
Yi Huang, Anthony Tucker and Karen Johnson, University of Maryland

11:40 AM Floor Discussion

Session 36: New Advances in Semi-parametric Modeling and Survival Analysis (Invited)
Room: Salon H, Lower Level 1
Organizer: Yichuan Zhao, Georgia State University
Chair: Xuelin Huang, University of Texas MD Anderson Cancer Center

10:00 AM Bayesian Partial Linear Model for Skewed Longitudinal Data
Yuanyuan Tang1, Debajyoti Sinha2, Debdeep Pati2, Stuart Lipsitz3 and Steven Lipshultz4; 1AbbVie Inc., 2Florida State University, 3Brigham and Women's Hospital, 4University of Miami

10:25 AM Nonparametric Inference for Inverse Probability Weighted Estimators with a Randomly Truncated Sample
Xu Zhang, University of Mississippi

10:50 AM Modeling Time-Varying Effects for High-Dimensional Covariates: A New Gateaux-Differential Boosting Approach
Kevin He, Yi Li and Ji Zhu, University of Michigan

11:15 AM Flexible Modeling of Survival Data with Covariates Subject to Detection Limits via Multiple Imputation
Paul Bernhardt1, Judy Wang2 and Daowen Zhang2; 1Villanova University, 2North Carolina State University

11:40 AM Floor Discussion

Session 37: High-dimensional Data Analysis: Theory and Application (Invited)
Room: Salon I, Lower Level 1
Organizer: Yang Feng, Columbia University
Chair: Yang Feng, Columbia University

10:00 AM Structured Functional Additive Regression in Reproducing Kernel Hilbert Spaces
Hao Helen Zhang, University of Arizona

10:25 AM High-Dimensional Thresholded Regression and Shrinkage Effect
Zemin Zheng, Yingying Fan and Jinchi Lv, University of Southern California

10:50 AM Local Independence Feature Screening for Nonparametric and Semiparametric Models by Marginal Empirical Likelihood
Jinyuan Chang1, Cheng Yong Tang2 and Yichao Wu3; 1University of Melbourne, 2University of Colorado Denver, 3North Carolina State University

11:15 AM The Fused Kolmogorov Filter: A Nonparametric Model-Free Screening Method
Qing Mai1 and Hui Zou2; 1Florida State University, 2University of Minnesota

11:40 AM Floor Discussion

Session 38: Leading Across Boundaries: Leadership Development for Statisticians (Invited Discussion Panel)
Room: Eugene Room, Lower Level 1
Organizers: Ming-Dauh Wang, Eli Lilly and Company; Rochelle Fu, Oregon Health & Science University (fur@ohsu.edu)
Chair: Ming-Dauh Wang, Eli Lilly and Company

Topic: The panel will discuss issues related to the importance of leadership, barriers to leadership, overcoming barriers, communication, and sociability.


Panel: Xiao-Li Meng, Harvard University

Dipak Dey, University of Connecticut

Soonmin Park, Eli Lilly and Company

James Hung, United States Food and Drug Administration

Walter Offen, AbbVie Inc.

Session 39: Recent Advances in Adaptive Designs in Early Phase Trials (Invited)
Room: Portland Room, Lower Level 1
Organizer: Ken Cheung, Columbia University
Chair: Ken Cheung, Columbia University

10:00 AM A Toxicity-Adaptive Isotonic Design for Combination Therapy in Oncology
Rui Qin, Mayo Clinic

10:25 AM Calibration of the Likelihood Continual Reassessment Method for Phase I Clinical Trials
Shing Lee1, Xiaoyu Jia2 and Ying Kuen Cheung1; 1Columbia University, 2Boehringer Ingelheim Pharmaceuticals

10:50 AM Sequential Subset Selection Procedure of Random Subset Size for Early Phase Clinical Trials
Cheng-Shiun Leu and Bruce Levin, Columbia University

11:15 AM Search Procedures for the MTD in Phase I Trials
Shelemyahu Zacks, Binghamton University

11:40 AM Floor Discussion

Session 40: High Dimensional Regression/Machine Learning (Contributed)
Room: Salem Room, Lower Level 1
Chair: Hanxiang Peng, Indiana University-Purdue University

10:00 AM Variable Selection for High-Dimensional Nonparametric Ordinary Differential Equation Models With Applications to Dynamic Gene Regulatory Networks
Hongqi Xue1, Tao Lu2, Hua Liang3 and Hulin Wu1; 1University of Rochester, 2State University of New York at Albany, 3George Washington University

10:20 AM BigData: Sign Cauchy Projections and Chi-Square Kernels
Ping Li1, Gennady Samorodnitsky2 and John Hopcroft2; 1Rutgers University, 2Cornell University

10:40 AM A Sparse Linear Discriminant Analysis Method with Asymptotic Optimality for Multiclass Classification
Ruiyan Luo and Xin Qi, Georgia State University

11:00 AM Generalized Hidden Markov Model for Variant Detection
Yichen Cheng, James Dai and Charles Kooperberg, Fred Hutchinson Cancer Research Center

11:20 AM Large-Scale Joint Trait Risk Prediction for Mini-exome Sequence Data
Gengxin Li, Wright State University

11:40 AM Rank Estimation and Recovery of Low-rank Matrices for Factor Model with Heteroscedastic Noise
Jingshu Wang and Art B. Owen, Stanford University

12:00 PM Floor Discussion

Tuesday, June 17, 1:30 PM - 3:10 PM

Session 41: Distributional Inference and its Impact on Statistical Theory and Practice (Invited)
Room: Salon A, Lower Level 1
Organizers: Min-ge Xie, Rutgers University; Thomas Lee, University of California at Davis (thomascmlee@gmail.com)
Chair: Min-ge Xie, Rutgers University

1:30 PM Stat Wars Episode IV: A New Hope (For Objective Inference)
Keli Liu and Xiao-Li Meng, Harvard University

1:55 PM Higher Order Asymptotics for Generalized Fiducial Inference
Abhishek Pal Majumdar and Jan Hannig, University of North Carolina at Chapel Hill

2:20 PM Generalized Inferential Models
Ryan Martin, University of Illinois at Chicago

2:45 PM Formal Definition of Reference Priors under a General Class of Divergence
Dongchu Sun, University of Missouri

3:10 PM Floor Discussion

Session 42: Applications of Spatial Modeling and Imaging Data (Invited)
Room: Salon B, Lower Level 1
Organizer: Karen Kafadar, Indiana University
Chair: Karen Kafadar, Indiana University

1:30 PM Spatial Bayesian Variable Selection and Shrinkage in High-dimensional Covariate Spaces with Application to fMRI
Fan Li1, Tingting Zhang (co-first author)2, Quanli Wang1 and James Coan2; 1Duke University, 2University of Virginia

1:55 PM A Hierarchical Model for Simultaneous Detection and Estimation in Multi-Subject fMRI Studies
David Degras1 and Martin Lindquist2; 1DePaul University, 2Johns Hopkins University

2:20 PM On the Relevance of Accounting for Spatial Correlation: A Case Study from Florida
Linda J. Young1 and Emily Leary2; 1USDA NASS RDD, 2University of Florida

2:45 PM Statistical Approaches for Calibration of Climate Models
Gabriel Huerta1, Charles Jackson2 and Alvaro Nosedal1; 1University of New Mexico, 2University of Texas at Austin

3:10 PM Floor Discussion

Session 43: Recent Development in Survival Analysis and Statistical Genetics (Invited)
Room: Salon C, Lower Level 1
Organizers: Junlong Li, Harvard University; KyuHa Lee, Harvard University
Chair: Junlong Li, Harvard University

1:30 PM Restricted Survival Time and Non-proportional Hazards
Zhigang Zhang, Memorial Sloan Kettering Cancer Center


1:55 PM Empirical Null Using Mixture Distributions and Its Application in Local False Discovery Rate
DoHwan Park, University of Maryland

2:20 PM A Bayesian Illness-Death Model for the Analysis of Correlated Semi-Competing Risks Data
Kyu Ha Lee1, Sebastien Haneuse1, Deborah Schrag2 and Francesca Dominici1; 1Harvard University, 2Dana Farber Cancer Institute

2:45 PM Detection of Chromosome Copy Number Variations in Multiple Sequences
Xiaoyi Min, Chi Song and Heping Zhang, Yale University

3:10 PM Floor Discussion

Session 44: Bayesian Methods and Applications in Clinical Trials with Small Population (Invited)
Room: Salon D, Lower Level 1
Organizer: Alan Chiang, Eli Lilly and Company
Chair: Ming-Dauh Wang, Eli Lilly and Company

1:30 PM Applications of Bayesian Meta-Analytic Approach at Novartis
Qiuling Ally He, Roland Fisch and David Ohlssen, Novartis Pharmaceuticals Corporation

1:55 PM Subgroup-Based Adaptive (SUBA) Designs for Multi-Arm Biomarker Trials
Yanxun Xu1, Lorenzo Trippa2, Peter Mueller1 and Yuan Ji3; 1University of Texas at Austin, 2Harvard University, 3University of Texas at Austin

2:20 PM Innovative Designs and Practical Considerations for Pediatric Studies
Alan Y. Chiang, Eli Lilly and Company

2:45 PM Discussant: Ming-Dauh Wang, Eli Lilly and Company

3:10 PM Floor Discussion

Session 45: Recent Developments in Assessing Predictive Models in Survival Analysis (Invited)
Room: Salon G, Lower Level 1
Organizer: Ming Wang, Penn State College of Medicine
Chair: Lijun Zhang, Penn State College of Medicine

1:30 PM partDSA for Deriving Survival Risk Groups: Ensemble Learning and Variable Selection
Annette Molinaro1, Adam Olshen1 and Robert Strawderman2; 1University of California at San Francisco, 2University of Rochester

1:55 PM Predictive Accuracy of Time-Dependent Markers for Survival Outcomes
Li Chen1, Donglin Zeng2 and Danyu Lin2; 1University of Kentucky, 2University of North Carolina at Chapel Hill

2:20 PM Estimating the Effectiveness in HIV Prevention Trials by Incorporating the Exposure Process: Application to HPTN 035 Data
Jingyang Zhang1 and Elizabeth R. Brown2; 1Fred Hutchinson Cancer Research Center, 2Fred Hutchinson Cancer Research Center/University of Washington

2:45 PM Estimation of Predictive Accuracy of Survival Regression Models Adjusting for Dependent Censoring and/or High-Dimensional Data
Ming Wang1 and Qi Long2; 1Penn State College of Medicine, 2Emory University

3:10 PM Floor Discussion

Session 46: Missing Data: the Interface between Survey Sampling and Biostatistics (Invited)
Room: Salon H, Lower Level 1
Organizer: Jiwei Zhao, University of Waterloo
Chair: Peisong Han, University of Waterloo

1:30 PM Likelihood-based Inference with Missing Data Under Missing-at-random
Shu Yang and Jae Kwang Kim, Iowa State University

1:55 PM Generalized Method of Moments Estimator Based on Semiparametric Quantile Regression Imputation
Cindy Yu and Senniang Chen, Iowa State University

2:20 PM A New Estimation with Minimum Trace of Asymptotic Covariance Matrix for Incomplete Longitudinal Data with a Surrogate Process
Baojiang Chen1 and Jing Qin2; 1University of Nebraska, 2National Institutes of Health

2:45 PM Adaptive Multi-Phase Sampling for Asymptotically-Optimal Mean Score Analyses
Michael McIsaac1 and Richard Cook2; 1Queen's University, 2University of Waterloo

3:10 PM Floor Discussion

Session 47: New Statistical Methods for Comparative Effectiveness Research and Personalized Medicine (Invited)
Room: Salon I, Lower Level 1
Organizer: Jane Paik Kim, Stanford University
Chair: Jane Paik Kim, Stanford University

1:30 PM Efficient Design for Prospective Observational Studies
Yu Shen1, Hao Liu2, Jing Ning3 and Jing Qin4; 1University of Texas MD Anderson Cancer Center, 2Baylor College of Medicine, 3University of Texas MD Anderson Cancer Center, 4National Institutes of Health

1:55 PM Choice between Superiority and Non-inferiority in Comparative Effectiveness Experiments
Mei-Chiung Shih1, Tze Leung Lai2 and Philip W. Lavori2; 1VA Cooperative Studies Program & Stanford University, 2Stanford University

2:20 PM An Adaptive Design Approach for Studying Dynamic Treatment Regimes in a Pragmatic Trials Setting
Mike Baiocchi, Jane Paik and Tze Lai, Stanford University

2:45 PM Improving Efficiency in the Estimation of the Effect of Treatment on a Multinomial Outcome
Ivan Díaz, Michael Rosenblum and Elizabeth Colantuoni, Johns Hopkins University

3:10 PM Floor Discussion

Session 48: Student Award Session 1 (Invited)
Room: Portland Room, Lower Level 1
Organizer: ICSA-KISS 2014 Student Paper Award Committee
Chair: Zhezhen Jin, Columbia University


1:30 PM Regularization After Retention in Ultrahigh Dimensional Linear Regression Models
Haolei Weng1, Yang Feng1 and Xingye Qiao2; 1Columbia University, 2Binghamton University

1:55 PM Personalized Dose Finding Using Outcome Weighted Learning
Guanhua Chen, Donglin Zeng and Michael R. Kosorok, University of North Carolina at Chapel Hill

2:20 PM Survival Rates Prediction When Training Data and Target Data Have Different Measurement Error
Cheng Zheng and Yingye Zheng, Fred Hutchinson Cancer Research Center

2:45 PM Hard Thresholded Regression Via Linear Programming
Qiang Sun, University of North Carolina at Chapel Hill

3:10 PM Floor Discussion

Session 49: Network Analysis/Unsupervised Methods (Contributed)
Room: Eugene Room, Lower Level 1
Chair: Chunming Zhang, University of Wisconsin-Madison

1:30 PM Community Detection in Multilayer Networks: A Hypothesis Testing Approach
James D. Wilson, Shankar Bhamidi and Andrew B. Nobel, University of North Carolina at Chapel Hill

1:50 PM Network Enrichment Analysis with Incomplete Network Information
Jing Ma1, Ali Shojaie2 and George Michailidis1; 1University of Michigan, 2University of Washington

2:10 PM Estimation of a Linear Model with Fuzzy Data Treated as Special Functional Data
Wang Dabuxilatu, Guangzhou University

2:30 PM Efficient Estimation of Sparse Directed Acyclic Graphs Under Compounded Poisson Data
Sung Won Han and Hua Zhong, New York University

2:50 PM Asymptotically Normal and Efficient Estimation of Covariate-Adjusted Gaussian Graphical Model
Mengjie Chen, Zhao Ren, Hongyu Zhao and Harrison Zhou, Yale University

3:10 PM Floor Discussion

Session 50: Personalized Medicine and Adaptive Design (Contributed)
Room: Salem Room, Lower Level 1
Chair: Danping Liu, National Institutes of Health

1:30 PM MicroRNA Array Normalization
Li-Xuan Qin and Qin Zhou, Memorial Sloan Kettering Cancer Center

1:50 PM Combining Multiple Biomarker Models with Covariates in Logistic Regression Using Modified ARM (Adaptive Regression by Mixing) Approach
Yanping Qiu1 and Rong Liu2; 1Merck & Co., 2Bayer HealthCare

2:10 PM A New Association Test for Case-Control GWAS Based on Disease Allele Selection
Zhongxue Chen, Indiana University

2:30 PM On Classification Methods for Personalized Medicine and Individualized Treatment Rules
Daniel Rubin, United States Food and Drug Administration

2:50 PM Bayesian Adaptive Design for Dose-Finding Studies with Delayed Binary Responses
Xiaobi Huang1 and Haoda Fu2; 1Merck & Co., 2Eli Lilly and Company

3:10 PM Floor Discussion

Tuesday, June 17, 3:30 PM - 5:30 PM

Session 51 New Development in Functional Data Analysis(Invited)Room Salon A Lower Level 1Organizer Guanqun Cao Auburn UniversityChair Guanqun Cao Auburn University

330 PM Variable Selection and Estimation for Longitudinal SurveyDataLi Wang1 Suojin Wang2 and Guannan Wang11University of Georgia 2Texas AampM University

355 PM Estimation of Nonlinear Differential Equation Model UsingGeneralized SmoothingInna Chervoneva1 Tatiyana V Apanasovich2 and BorisFreydin1 1Thomas Jefferson University 2George Wash-ington University

4:20 PM A Functional Data Approach to Modeling Brain Image Data
Yihong Zhao1, R. Todd Ogden2 and Huaihou Chen1
1New York University, 2Columbia University

4:45 PM Estimation in Functional Linear Quantile Regression
Linglong Kong, Dengdeng Yu and Ivan Mizera, University of Alberta

5:10 PM Floor Discussion

Session 52: Recent Regulatory/Industry Experience in Biosimilar Trial Designs (Invited)
Room: Salon B, Lower Level 1
Organizer: Gang Li, Johnson & Johnson
Chair: Yi Wang, Novartis Pharmaceuticals Corporation

3:30 PM Statistical Considerations for the Development of Biosimilar Products
Nan Zhang and Eric Chi, Amgen Inc.

3:50 PM New Analytical Methods for Non-Inferiority Trials: Covariate Adjustment and Sensitivity Analysis
Zhiwei Zhang, Lei Nie, Guoxing Soon and Bo Zhang, United States Food and Drug Administration

4:10 PM Where is the Right Balance for Designing an Efficient Biosimilar Clinical Program - A Biostatistic Perspective on Appropriate Applications of Statistical Principles from New Drug to Biosimilars
Yulan Li, Novartis Pharmaceuticals Corporation

2014 ICSA-KISS Applied Statistics Symposium Portland June 15-18 29


4:30 PM Challenges of Designing/Analyzing Trials for Hepatitis C Drugs
Greg Soon, United States Food and Drug Administration

4:50 PM GSK's Patient-level Data Sharing Program
Shuyen Ho, GlaxoSmithKline plc

5:10 PM Floor Discussion

Session 53: Gatekeeping Procedures and Their Application in Pivotal Clinical Trials (Invited)
Room: Salon C, Lower Level 1
Organizer: Michael Lee, Johnson & Johnson
Chair: Michael Lee, Johnson & Johnson

3:30 PM A General Multistage Procedure for k-out-of-n Gatekeeping
Dong Xi1 and Ajit Tamhane2
1Novartis Pharmaceuticals Corporation, 2Northwestern University

3:55 PM Multiple Comparisons in Complex Trial Designs
H.M. James Hung, United States Food and Drug Administration

4:20 PM Use of Bootstrapping in Adaptive Designs with Multiplicity Issues
Jeff Maca, Quintiles

4:45 PM Evaluating Commonly Used Multiple Testing Procedures in Drug Development
Michael Lee, Janssen Research & Development

5:10 PM Floor Discussion

Session 54: Approaches to Assessing Qualitative Interactions (Invited)
Room: Salon D, Lower Level 1
Organizer: Guohua (James) Pan, Johnson & Johnson
Chair: James Pan, Johnson & Johnson

3:30 PM Interval Based Graphical Approach to Assessing Qualitative Interaction
Guohua Pan and Eun Young Suh, Johnson & Johnson

3:55 PM Expected Variation and Quantitative and Qualitative Interaction in Clinical Trials
Xiaolong Luo, Celgene Corporation

4:20 PM A Bayesian Approach to Qualitative Interaction
Emine O. Bayman, University of Iowa

4:45 PM Discussant: Surya Mohanty, Johnson & Johnson

5:10 PM Floor Discussion

Session 55: Interim Decision-Making in Phase II Trials (Invited)
Room: Salon G, Lower Level 1
Organizer: Lanju Zhang, AbbVie Inc.
Chair: Lanju Zhang, AbbVie Inc.

3:30 PM Evaluation of Interim Dose Selection Methods Using ROC Approach
Deli Wang, Lu Cui, Lanju Zhang and Bo Yang, AbbVie Inc.

3:55 PM Interim Monitoring for Futility Based on Probability of Success
Yijie Zhou1, Ruji Yao2, Bo Yang1 and Ram Suresh3
1AbbVie Inc., 2Merck & Co., 3GlaxoSmithKline plc

4:20 PM Bayesian Adaptive Design in Oncology Early Phase Trials
Yuehui Wu and Ramachandran Suresh, GlaxoSmithKline plc

4:45 PM Discussant: Peng Chen, Celgene Corporation
5:10 PM Floor Discussion

Session 56: Recent Advancement in Statistical Methods (Invited)
Room: Salon H, Lower Level 1
Organizer: Dongseok Choi, Oregon Health & Science University
Chair: Dongseok Choi, Oregon Health & Science University

3:30 PM Exact Inference: New Methods and Applications
Ian Dinwoodie, Portland State University

3:55 PM Optimal Thresholds Criteria and Standard Criterion of VUS for ROC Surface
Chong Sun Hong, Sungkyunkwan University

4:20 PM Analysis of Cointegrated Models with Measurement Errors
Sung Ahn1, Hamwoom Hong2 and Sinsup Cho2
1Washington State University, 2Seoul National University

4:45 PM A Direct Method to Evaluate the Time-Dependent Predictive Accuracy for Biomarkers
Weining Shen, Jing Ning and Ying Yuan, University of Texas MD Anderson Cancer Center

5:10 PM Floor Discussion

Session 57: Building Bridges between Research and Practice in Time Series Analysis (Invited)
Room: Salon I, Lower Level 1
Organizer: Jane Chu, IBM/SPSS
Chair: Jane Chu, IBM/SPSS

3:30 PM Temporal Causal Modeling: Methodology, Applications and Implementation
Naoki Abe1, Tanveer Faruquie1, Huijing Jiang1, Anju Kambadur1, Kimberly Lang1, Aurelie Lozano1 and Jinwoo Shin2
1IBM, 2KAIST University

3:55 PM Time Series Research at the U.S. Census Bureau
Brian C. Monsell, U.S. Census Bureau

4:20 PM Issues Related to the Use of Time Series in Model Building and Analysis
William W.S. Wei, Temple University

4:45 PM Discussant: George Tiao, University of Chicago
5:10 PM Floor Discussion

Session 58: Recent Advances in Design for Biostatistical Problems (Invited)
Room: Eugene Room, Lower Level 1
Organizer: Weng Kee Wong, University of California at Los Angeles
Chair: Weng Kee Wong, University of California at Los Angeles

3:30 PM Optimal Designs for N-of-1 Trials
Yin Li and Keumhee Chough Carriere, University of Alberta



3:55 PM Efficient Algorithms for Two-stage Designs on Phase II Clinical Trials
Seongho Kim1 and Weng Kee Wong2
1Wayne State University/Karmanos Cancer Institute, 2University of California at Los Angeles

4:20 PM Optimizing Two-level Supersaturated Designs by Particle Swarm Techniques
Frederick Kin Hing Phoa1, Ray-Bing Chen2, Wei-Chung Wang3 and Weng Kee Wong4
1Institute of Statistical Science, Academia Sinica, 2National Cheng Kung University, 3National Taiwan University, 4University of California at Los Angeles

4:45 PM D-optimal Designs for Multivariate Exponential and Poisson Regression Models via Ultra-Dimensional Particle Swarm Optimization
Jiaheng Qiu and Weng Kee Wong, University of California at Los Angeles

5:10 PM Floor Discussion

Session 59: Student Award Session 2 (Invited)
Room: Portland Room, Lower Level 1
Organizer: ICSA-KISS 2014 Student Paper Award Committee
Chair: Wenqing He, University of Western Ontario

3:30 PM Analysis of Sequence Data Under Multivariate Trait-Dependent Sampling
Ran Tao1, Donglin Zeng1, Nora Franceschini1, Kari E. North1, Eric Boerwinkle2 and Dan-Yu Lin1
1University of North Carolina at Chapel Hill, 2University of Texas Health Science Center

3:55 PM Empirical Likelihood Based Tests for Stochastic Ordering Under Right Censorship
Hsin-wen Chang and Ian W. McKeague, Columbia University

4:20 PM Multiple Genetic Loci Mapping for Latent Disease Liability Using a Structural Equation Modeling Approach with Application in Alzheimer's Disease
Ting-Huei Chen, University of North Carolina at Chapel Hill

4:45 PM Floor Discussion

Session 60: Semi-parametric Methods (Contributed)
Room: Salem Room, Lower Level 1
Chair: Ouhong Wang, Amgen Inc.

3:30 PM Semiparametric Estimation of Mean and Variance in Generalized Estimating Equations
Jianxin Pan1 and Daoji Li2
1The University of Manchester, 2University of Southern California

3:50 PM An Empirical Approach of Efficient Estimation of Linear Functionals of a Probability with Side Information
Hanxiang Peng, Shan Wang and Lingnan Li, Indiana University-Purdue University Indianapolis

4:10 PM M-estimation for General ARMA Processes with Infinite Variance
Rongning Wu, Baruch College, City University of New York

4:30 PM Sufficient Dimension Reduction via Principal Lq Support Vector Machine
Andreas Artemiou1 and Yuexiao Dong2
1Cardiff University, 2Temple University

4:50 PM Nonparametric Quantile Regression via a New MM Algorithm
Bo Kai1, Mian Huang2, Weixin Yao3 and Yuexiao Dong4
1College of Charleston, 1National Chengchi University, 2Shanghai University of Finance and Economics, 3Kansas State University, 4Temple University

5:10 PM Regression Estimators Using Stratified Ranked Set Sampling
Arbita Chatterjee, Hani Samawi, Lili Yu, Daniel Linder, Jingxian Cai and Robert Vogel, Georgia Southern University

5:30 PM Floor Discussion

Wednesday, June 18, 8:30 AM - 10:10 AM

Session 61: Statistical Challenges in Variable Selection for Graphical Modeling (Invited)
Room: Salon A, Lower Level 1
Organizer: Hua (Judy) Zhong, New York University
Chair: Hua (Judy) Zhong, New York University

8:30 AM Fused Community Detection
Yi Yu1, Yang Feng2 and Richard J. Samworth1
1University of Cambridge, 2Columbia University

8:55 AM High Dimensional Tests for Functional Brain Networks
Jichun Xie1 and Jian Kang2
1Temple University, 2Emory University

9:20 AM Bayesian Inference of Multiple Gaussian Graphical Models
Christine Peterson1, Francesco Stingo2 and Marina Vannucci3
1Stanford University, 2University of Texas MD Anderson Cancer Center, 3Rice University

9:45 AM Mixed Graphical Models via Exponential Families
Eunho Yang1, Yulia Baker2, Pradeep Ravikumar1, Genevera I. Allen2 and Zhandong Liu3
1University of Texas at Austin, 2Rice University, 3Baylor College of Medicine

10:10 AM Floor Discussion

Session 62: Recent Advances in Non- and Semi-parametric Methods (Invited)
Room: Salon B, Lower Level 1
Organizer: Lan Xue, Oregon State University
Chair: Guanqun Cao, Auburn University

8:30 AM Joint Estimation of Multiple Bivariate Densities of Protein Backbone Angles Using an Adaptive Exponential Spline Family
Lan Zhou, Texas A&M University

8:55 AM Estimating Time-Varying Effects for Overdispersed Recurrent Data with Treatment Switching
Qingxia Chen1, Donglin Zeng2, Joseph G. Ibrahim2, Mouna Akacha3 and Heinz Schmidli3
1Vanderbilt University, 2University of North Carolina at Chapel Hill, 3Novartis Pharmaceuticals Corporation



9:20 AM Bivariate Penalized Splines for Regression
Ming-Jun Lai and Lily Wang, University of Georgia

9:45 AM Local Feature Selection in Varying-Coefficient Models
Lan Xue1, Xinxin Shu2, Peibei Shi2, Colin Wu3 and Annie Qu2
1Oregon State University, 2University of Illinois at Urbana-Champaign, 3Lung and Blood Institute

10:10 AM Floor Discussion

Session 63: Statistical Challenges and Development in Cancer Screening Research (Invited)
Room: Salon C, Lower Level 1
Organizer: Yu Shen, University of Texas MD Anderson Cancer Center
Chair: Yu Shen, Professor, University of Texas MD Anderson Cancer Center

8:30 AM Overdiagnosis in Breast and Prostate Cancer Screening: Concepts, Methods and Challenges
Ruth Etzioni, Roman Gulati and Jing Xia, Fred Hutchinson Cancer Research Center

8:55 AM Estimation of Biomarker Growth in a Screening Study
Lurdes Y.T. Inoue1, Roman Gulati2 and Ruth Etzioni2
1University of Washington, 2Fred Hutchinson Cancer Research Center

9:20 AM Estimating Screening Test Effectiveness when Screening Indication is Unknown
Rebecca Hubbard, Group Health Research Institute

9:45 AM Developing Risk-Based Screening Guidelines: "Equal Management of Equal Risks"
Hormuzd Katki, National Cancer Institute

10:10 AM Floor Discussion

Session 64: Recent Developments in the Visualization and Exploration of Spatial Data (Invited)
Room: Salon D, Lower Level 1
Organizer: Juergen Symanzik, Utah State University
Chair: Juergen Symanzik, Utah State University

8:30 AM Recent Advancements in Geovisualization with a Case Study on Chinese Religions
Juergen Symanzik1 and Shuming Bao2
1Utah State University, 2University of Michigan

8:55 AM Spatial Analysis with China Geo-Explorers
Shuming Bao1, Miao Shui2 and Bing She2
1University of Michigan, 2Wuhan University

9:20 AM Probcast: Creating and Visualizing Probabilistic Weather Forecasts
J. McLean Sloughter1, Susan Joslyn2, Patrick Tewson3, Tilmann Gneiting4 and Adrian Raftery2
1Seattle University, 2University of Washington, 3Bigger Boat Consulting, 4University of Heidelberg

9:45 AM Discussant: Karen Kafadar, Indiana University

10:10 AM Floor Discussion

Session 65: Advancement in Biostatistical Methods and Applications (Invited)
Room: Salon G, Lower Level 1
Organizer: Sin-ho Jung, Duke University
Chair: Dongseok Choi, Oregon Health & Science University

8:30 AM Estimation of Time-Dependent AUC under Marker-Dependent Sampling
Xiaofei Wang and Zhaoyin Zhu, Duke University

8:55 AM A Measurement Error Approach for Modeling Accelerometer-based Physical Activity Data
Julia Lee, Jing Song and Dorothy Dunloop, Northwestern University

9:20 AM Real-Time Prediction in Clinical Trials: A Statistical History of REMATCH
Daniel F. Heitjan and Gui-shuang Ying, University of Pennsylvania

9:45 AM An Analysis of Microarray Data with Batch Effects
Dongseok Choi, William O. Cepurna, John C. Morrison, Elaine C. Johnson, Stephen R. Planck and James T. Rosenbaum, Oregon Health & Science University

10:10 AM Floor Discussion

Session 66: Analysis of Complex Data (Invited)
Room: Salon H, Lower Level 1
Organizer: Mesbah Mounir, University of Paris 6
Chair: Mesbah Mounir, University of Paris 6

8:30 AM Integrating Data from Heterogeneous Studies Using Only Summary Statistics: Efficiency and Robustness
Min-ge Xie, Rutgers University

8:55 AM A Markov Modulated Poisson Model for Reliability Data
Joshua Landon1, Suleyman Ozekici2 and Refik Soyer1
1George Washington University, 2Koc University

9:20 AM A Comparison of Two Approaches for Acute Leukemia Patient Classification
Jingjing Wu1, Guoqiang Chen2 and Zeny Feng3
1University of Calgary, 2Enbridge Pipelines, 3University of Guelph

9:45 AM On the Consistency and Covariate Selections for Varying-Coefficient Deming Regressions
Ying Lu1, Chong Gu2, Bo Fan3, Selwyn Au4, Valerie McGuire1 and John Shepherd3
1VA Palo Alto Health Care System & Stanford University, 2Purdue University, 3University of California at San Francisco, 4VA Palo Alto Health Care System

10:10 AM Floor Discussion

Session 67: Statistical Issues in Co-development of Drug and Biomarker (Invited)
Room: Salon I, Lower Level 1
Organizer: Liang Fang, Gilead Sciences
Chair: Liang Fang, Gilead Sciences



8:30 AM Group Sequential Designs for Developing and Testing Biomarker-Guided Personalized Therapies in Comparative Effectiveness Research
Tze Leung Lai1, Olivia Yueh-Wen Liao2 and Dong Woo Kim3
1Stanford University, 2Onyx Pharmaceuticals, 3Microsoft Corporation

8:55 AM Adaptive Enrichment Designs for Clinical Trials
Noah Simon1 and Richard Simon2
1University of Washington, 2National Institutes of Health

9:20 AM An Adaptive Single-Arm Phase II Design with Co-primary Objectives to Evaluate Activity Overall and In Relation to a Biomarker-Defined Subgroup
Michael Wolf, Amgen Inc.

9:45 AM Biomarker Threshold Estimation to Predict Clinical Benefit: What Can Reasonably be Learned During Early (Ph I/II) Oncology Development
Thomas Bengtsson, Genentech Inc.

10:10 AM Floor Discussion

Session 68: New Challenges for Statistical Analyst/Programmer (Invited)
Room: Eugene Room, Lower Level 1
Organizer: Xianming (Steve) Zheng, Eli Lilly and Company
Chair: Xianming (Steve) Zheng, Eli Lilly and Company

8:30 AM Similarities and Differences in Statistical Programming among CRO and Pharmaceutical Industries
Mark Matthews, inVentiv Health Clinical

8:55 AM Computational Aspects for Detecting Safety Signals in Clinical Trials
Jyoti Rayamajhi, Eli Lilly and Company

9:20 AM Bayesian Network Meta-Analysis Methods: An Overview and A Case Study
Baoguang Han1, Wei Zou2 and Karen Price1
1Eli Lilly and Company, 2inVentiv Clinical Health

9:45 AM Floor Discussion

Session 69: Adaptive and Sequential Methods for Clinical Trials (Invited)
Room: Portland Room, Lower Level 1
Organizers: Zhengjia Chen, Emory University; Yichuan Zhao, Georgia State University (yichuan@gsu.edu)
Chair: Zhengjia Chen, Emory University

8:30 AM Bayesian Data Augmentation Dose Finding with Continual Reassessment Method and Delayed Toxicities
Ying Yuan1, Suyu Liu1 and Guosheng Yin2
1University of Texas MD Anderson Cancer Center, 2University of Hong Kong

8:55 AM Optimal Marker-strategy Clinical Trial Design to Detect Predictive Markers for Targeted Therapy
Yong Zang, Suyu Liu and Ying Yuan, University of Texas MD Anderson Cancer Center

9:20 AM Dynamic Prediction of Time to Relapse Using Longitudinal Biomarker Data
Xuelin Huang1, Jing Ning1 and Sangbum Choi2
1University of Texas MD Anderson Cancer Center, 2University of Texas Health Science Center at Houston

9:45 AM Continuous Tumor Size Change Percentage and Progression Free Survival as Endpoint of the First and Second Stage Respectively in a Novel Double Screening Phase II Design
Ye Cui1, Zhibo Wang1, Yichuan Zhao1 and Zhengjia Chen2
1Georgia State University, 2Emory University

10:10 AM Floor Discussion

Wednesday, June 18, 10:30 AM - 12:10 PM

Session 70: Survival Analysis (Contributed)
Room: Portland Room, Lower Level 1
Chair: Zhezhen Jin, Columbia University

10:30 AM Comparison of Hazard Rate and Odds Ratio in the Two-Sample Survival Problem
Benedict Dormitorio and Joshua Naranjo, Western Michigan University

10:50 AM Predicting the Event Time in Multicenter Clinical Trials with Time-to-Event Outcome as Primary Endpoint
Nibedita Bandyopadhyay, Janssen Research & Development

11:10 AM Empirical Comparison of Small Sample Performance for the Logrank Test and Resampling Methods with High Censoring Rates
Yu Deng and Jianwen Cai, University of North Carolina at Chapel Hill

11:30 AM Floor Discussion

Session 71: Complex Data Analysis: Theory and Application (Invited)
Room: Salon A, Lower Level 1
Organizer: Yang Feng, Columbia University
Chair: Yang Feng, Columbia University

10:30 AM Supervised Singular Value Decomposition and Its Asymptotic Properties
Gen Li1, Dan Yang2, Haipeng Shen1 and Andrew Nobel1
1University of North Carolina at Chapel Hill, 2Rutgers University

10:55 AM New Methods for Interaction Selection
Ning Hao1, Hao Helen Zhang1 and Yang Feng2
1University of Arizona, 2Columbia University

11:20 AM A Statistical Approach to Set Classification by Feature Selection with Applications to Classification of Histopathology Images
Sungkyu Jung1 and Xingye Qiao2
1University of Pittsburgh, 2Binghamton University, State University of New York

11:45 AM A Smoothing Spline Model for Analyzing dMRI Data of Swallowing
Binhuan Wang, Ryan Branski, Milan Amin and Yixin Fang, New York University

12:10 PM Floor Discussion



Session 72: Recent Development in Statistics Methods for Missing Data (Invited)
Room: Salon B, Lower Level 1
Organizer: Nanhua Zhang, Cincinnati Children's Hospital Medical Center
Chair: Haoda Fu, Eli Lilly and Company

10:30 AM A Semiparametric Inference to Regression Analysis with Missing Covariates in Survey Data
Shu Yang and Jae-kwang Kim, Iowa State University

10:55 AM Multiple Robustness in Missing Data Analysis
Peisong Han1 and Lu Wang2
1University of Waterloo, 2University of Michigan

11:20 AM Imputation of Binary Variables with SAS and IVEware
Yi Pan and Riguang Song, United States Centers for Disease Control and Prevention

11:45 AM Marginal Treatment Effect Estimation Using Pattern-Mixture Model
Zhenzhen Xu, United States Food and Drug Administration

12:10 PM Floor Discussion

Session 73: Machine Learning Methods for Causal Inference in Health Studies (Invited)
Room: Salon C, Lower Level 1
Organizer: Mi-Ok Kim, Cincinnati Children's Hospital Medical Center
Chair: Mi-Ok Kim, Cincinnati Children's Hospital Medical Center

10:30 AM Causal Inference of Interaction Effects with Inverse Propensity Weighting, G-Computation and Tree-Based Standardization
Joseph Kang1, Xiaogang Su2, Lei Liu1 and Martha Daviglus3
1Northwestern University, 2University of Texas at El Paso, 3University of Illinois at Chicago

10:55 AM Practice of Causal Inference with the Propensity of Being Zero or One
Joseph Kang1, Wendy Chan1, Mi-Ok Kim2 and Peter M. Steiner3
1Northwestern University, 2University of Cincinnati/Cincinnati Children's Hospital Medical Center, 3University of Wisconsin-Madison

11:20 AM Propensity Score and Proximity Matching Using Random Forest
Peng Zhao1, Xiaogang Su2 and Juanjuan Fan1
1San Diego State University, 2University of Texas at El Paso

11:45 AM Discussant: Joseph Kang, Northwestern University

12:10 PM Floor Discussion

Session 74: JP Hsu Memorial Session (Invited)
Room: Salon D, Lower Level 1
Organizers: Lili Yu, Georgia Southern University; Karl Peace, Georgia Southern University (kepeace@georgiasouthern.edu)
Chair: Lili Yu, Georgia Southern University

10:30 AM Weighted Least-Squares Method for Right-Censored Data in Accelerated Failure Time Model
Lili Yu, Georgia Southern University

10:55 AM (Student Paper Award) Estimating a Change-Point in High-Dimensional Markov Random Field Models
Sandipan Roy, University of Michigan

11:20 AM A Comparison of Size and Power of Tests of Hypotheses on Parameters Based on Two Generalized Lindley Distributions
Macaulay Okwuokenye, Biogen Idec

11:45 AM Floor Discussion

Session 75: Challenge and New Development in Model Fitting and Selection (Invited)
Room: Salon G, Lower Level 1
Organizer: Zhezhen Jin, Columbia University
Chair: Cuiling Wang, Yeshiva University

10:30 AM Robust Estimates of Divergence Times and Selection with a Poisson Random Field Model
Amei Amei1 and Brian Tilston Smith2
1University of Nevada at Las Vegas, 2American Museum of Natural History

10:55 AM On A Class of Maximum Empirical Likelihood Estimators Defined By Convex Functions
Hanxiang Peng and Fei Tan, Indiana University-Purdue University Indianapolis

11:20 AM Properties of the Marginal Survival Functions for Dependent Censored Data under an Assumed Archimedean Copula
Antai Wang, New Jersey Institute of Technology

11:45 AM Dual Model Misspecification in Generalized Linear Models with Error in Variables
Xianzheng Huang, University of Southern California

12:10 PM Floor Discussion

Session 76: Advanced Methods and Their Applications in Survival Analysis (Invited)
Room: Salon H, Lower Level 1
Organizers: Jiajia Zhang, University of South Carolina; Wenbin Lu, North Carolina State University
Chair: Jiajia Zhang, University of South Carolina

10:30 AM Kernel Smoothed Profile Likelihood Estimation in the Accelerated Failure Time Frailty Model for Clustered Survival Data
Bo Liu1, Wenbin Lu1 and Jiajia Zhang2
1North Carolina State University, 2University of South Carolina

10:55 AM Model-free Screening for Lifetime Data Analysis with Ultrahigh-dimensional Biomarkers Survival Impacting
Jialiang Li1, Qi Zheng2 and Limin Peng2
1National University of Singapore, 2Emory University

11:20 AM Analysis of Event History Data in Tuberculosis (TB) Screening
Joan Hu, Simon Fraser University

11:45 AM On the Dependence Structure of Bivariate Recurrent Event Processes: Inference and Estimation
Jing Ning1, Yong Chen2, Chunyan Cai2, Xuelin Huang1 and Mei-Cheng Wang3
1University of Texas MD Anderson Cancer Center, 2University of Texas Health Science Center at Houston, 3Johns Hopkins University

12:10 PM Floor Discussion



Session 77: High Dimensional Variable Selection and Multiple Testing (Invited)
Room: Salon I, Lower Level 1
Organizer: Zhigen Zhao, Temple University
Chair: Jichun Xie, Temple University

10:30 AM On Procedures Controlling the False Discovery Rate for Testing Hierarchically Ordered Hypotheses
Gavin Lynch and Wenge Guo, New Jersey Institute of Technology

10:55 AM Sufficient Dimension Reduction in Binary Classification
Seung Jun Shin1, Yichao Wu2, Hao Helen Zhang3 and Yufeng Liu4
1University of Texas MD Anderson Cancer Center, 2North Carolina State University, 3University of Arizona, 4University of North Carolina at Chapel Hill

11:20 AM Rate Optimal Multiple Testing Procedure (ROMP) in High-dimensional Regression
Zhigen Zhao1 and Pengsheng Ji2
1Temple University, 2University of Georgia

11:45 AM Pathwise Calibrated Active Shooting Algorithm with Application to Semiparametric Graph Estimation
Tuo Zhao1 and Han Liu2
1Johns Hopkins University, 2Princeton University

12:10 PM Floor Discussion


Abstracts


Session 1: Emerging Statistical Methods for Complex Data

Estimation of the Error Auto-Correlation Matrix in Semi-parametric Model for Brain fMRI Data
Chunming Zhang and Xiao Guo
University of Wisconsin-Madison
cmzhang@stat.wisc.edu
In statistical analysis of functional magnetic resonance imaging (fMRI), dealing with the temporal correlation is a major challenge in assessing changes within voxels. In this paper, we aim to address this issue by considering a semi-parametric model for fMRI data. For the error process in the semi-parametric model, we construct a banded estimate of the auto-correlation matrix R, and propose a refined estimate of the inverse of R. Under some mild regularity conditions, we establish consistency of the banded estimate with an explicit convergence rate, and show that the refined estimate converges under an appropriate norm. Numerical results suggest that the refined estimate performs conceivably well when it is applied to the detection of the brain activity.
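To illustrate the banding idea in the abstract above (this is an illustrative sketch, not the authors' estimator: the function name, the stationarity/Toeplitz assumption, and the AR(1) example are ours), one can zero out sample autocorrelations beyond a chosen bandwidth k:

```python
import numpy as np

def banded_autocorr_estimate(resid, k):
    """Banded Toeplitz estimate of an error auto-correlation matrix R.

    resid: (T, V) array of residuals (T time points, V voxels); a single
    autocorrelation function shared across voxels is assumed.
    k: bandwidth; sample autocorrelations at lags > k are set to zero.
    """
    T = resid.shape[0]
    c = resid - resid.mean(axis=0)           # center each voxel series
    denom = (c ** 2).mean()                  # pooled lag-0 autocovariance
    acf = np.array([(c[: T - h] * c[h:]).mean() / denom for h in range(T)])
    lags = np.abs(np.subtract.outer(np.arange(T), np.arange(T)))
    return np.where(lags <= k, acf[lags], 0.0)

# Toy example: AR(1)-like residuals with lag-1 correlation about 0.5
rng = np.random.default_rng(0)
e = rng.standard_normal((200, 50))
resid = np.empty_like(e)
resid[0] = e[0]
for t in range(1, 200):
    resid[t] = 0.5 * resid[t - 1] + e[t]
R = banded_autocorr_estimate(resid, k=3)
```

Entries beyond the band are exactly zero, while within-band entries are the pooled sample autocorrelations; the bandwidth k plays the role of a tuning parameter.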

Kernel Additive Sliced Inverse Regression
Heng Lian
Nanyang Technological University
shellinglianheng@hotmail.com
In recent years, nonlinear sufficient dimension reduction (SDR) methods have gained increasing popularity. However, while semi-parametric models in regression have fascinated researchers for several decades, with a large amount of literature, parsimonious structured nonlinear SDR has attracted little attention so far. In this paper, extending kernel sliced inverse regression, we study additive models in the context of SDR and demonstrate their potential usefulness due to their flexibility and parsimony. Theoretically, we clarify that the improved convergence rate using the additive structure is due to a faster rate of decay of the kernel's eigenvalues. The additive structure also opens the possibility of nonparametric variable selection. This sparsification of the kernel, however, does not introduce additional tuning parameters, in contrast with sparse regression. Simulated and real data sets are presented to illustrate the benefits and limitations of the approach.

Variable Selection with Prior Information for Generalized Linear Models via the Prior LASSO Method
Yuan Jiang1, Yunxiao He2 and Heping Zhang3
1Oregon State University, 2Nielsen Company, 3Nielsen Company
yuanjiang@stat.oregonstate.edu
LASSO is a popular statistical tool, often used in conjunction with generalized linear models, that can simultaneously select variables and estimate parameters. When there are many variables of interest, as in current biological and biomedical studies, the power of LASSO can be limited. Fortunately, so much biological and biomedical data have been collected, and they may contain useful information about the importance of certain variables. This paper proposes an extension of LASSO, namely prior LASSO (pLASSO), to incorporate that prior information into penalized generalized linear models. The goal is achieved by adding in the LASSO criterion function an additional measure of the discrepancy between the prior information and the model. For linear regression, the whole solution path of the pLASSO estimator can be found with a procedure similar to the Least Angle Regression (LARS). Asymptotic theories and simulation results show that pLASSO provides significant improvement over LASSO when the prior information is relatively accurate. When the prior information is less reliable, pLASSO shows great robustness to the misspecification. We illustrate the application of pLASSO using a real data set from a genome-wide association study.

Bootstrapping High Dimensional Vector: Interplay Between Dependence and Dimensionality
Xianyang Zhang1 and Guang Cheng2
1University of Missouri at Columbia, 2Purdue University
zhangxiany@missouri.edu
In this talk we will focus on the problem of conducting inference for high dimensional weakly dependent time series. Motivated by the applications in modern high dimensional inference, we derive a Gaussian approximation result for the maximum of a sum of weakly dependent vectors using Stein's method, where the dimension of the vectors is allowed to be exponentially larger than the sample size. Our result reveals an interesting phenomenon arising from the interplay between the dependence and dimensionality: the more dependent the data vectors, the slower the diverging rate of the dimension that is allowed for obtaining valid statistical inference. A type of dimension-free dependence structure is derived as a by-product. Building on the Gaussian approximation result, we propose a blockwise multiplier (wild) bootstrap that is able to capture the dependence between and within the data vectors, and thus provides high-quality distributional approximation to the distribution of the maximum of the vector sum in the high dimensional context.

Session 2: Statistical Methods for Sequencing Data Analysis

A Penalized Likelihood Approach for Robust Estimation of Isoform Expression
Hui Jiang1 and Julia Salzman2
1University of Michigan, 2Stanford University
jianghui@umich.edu
Ultra high-throughput sequencing of transcriptomes (RNA-Seq) has enabled the accurate estimation of gene expression at the individual isoform level. However, systematic biases introduced during the sequencing and mapping processes, as well as incompleteness of the transcript annotation databases, may cause the estimates of isoform abundances to be unreliable, and in some cases highly inaccurate. This paper introduces a penalized likelihood approach to detect and correct for such biases in a robust manner. Our model extends those previously proposed by introducing bias parameters for reads. An L1 penalty is used for the selection of non-zero bias parameters. We introduce an efficient algorithm for model fitting and analyze the statistical properties of the proposed model. Our experimental studies on both simulated and real datasets suggest that the model



has the potential to improve isoform-specific gene expression estimates and identify incompletely annotated gene models. This is joint work with Julia Salzman.

Classification on Sequencing Data and its Applications on a Human Breast Cancer Dataset
Jun Li
University of Notre Dame
junli@nd.edu
Gene expression measured by the RNA-sequencing technique can be used to classify biological samples from different groups, such as normal vs. early-stage cancer vs. cancer. To get an interpretable classifier with high robustness and generality, often some type of shrinkage is used to give a linear and sparse model. In microarray data, an example is PAM (prediction analysis of microarrays), which uses a nearest shrunken centroid classifier. To accommodate the discrete nature of sequencing data, this model was modified by using a Poisson distribution. We further generalize this model by using a negative binomial distribution to take account of the overdispersion in the data. We compare the performance of Gaussian, Poisson and negative binomial based models on simulation data as well as a human breast cancer dataset. We find that while the cross-validation misclassification rates of the three methods are often quite similar, the number of genes used by the models can be quite different, and using the Gaussian model on carefully normalized data typically gives models with the least number of genes.
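As a toy illustration of the negative binomial likelihood comparison underlying such classifiers (hypothetical class means and a common dispersion; this is not the paper's shrunken-centroid implementation):

```python
import numpy as np
from math import lgamma

def nb_logpmf(x, mu, phi):
    """Log NB pmf with mean mu and variance mu + phi * mu^2 (size n = 1/phi),
    the parameterization commonly used for RNA-Seq counts."""
    n = 1.0 / phi
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    coef = np.array([lgamma(xi + n) - lgamma(n) - lgamma(xi + 1) for xi in x])
    return coef + n * np.log(n / (n + mu)) + x * np.log(mu / (n + mu))

def classify(x, class_means, phi=0.1):
    """Assign count vector x to the class with the highest NB log-likelihood."""
    scores = [nb_logpmf(x, mu, phi).sum() for mu in class_means]
    return int(np.argmax(scores))

# Two hypothetical classes over three genes
means = [np.array([5.0, 5.0, 50.0]),   # class 0
         np.array([50.0, 5.0, 5.0])]   # class 1
label = classify(np.array([6, 4, 45]), means)   # classified as class 0
```

Setting phi to zero-like values recovers Poisson-style behavior, while larger phi accounts for the overdispersion mentioned in the abstract.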

Power-Robustness Analysis of Statistical Models for RNA Sequencing Data
Gu Mi, Yanming Di and Daniel W. Schafer
Oregon State University
mig@stat.oregonstate.edu
We present results from power-robustness analysis of several statistical models for RNA sequencing (RNA-Seq) data. We fit the models to several RNA-Seq datasets, perform goodness-of-fit tests that we developed (Mi et al., 2014), and quantify variations not explained by the fitted models. The statistical models we compared are all based on the negative binomial (NB) distribution, but differ in how they handle the estimation of the dispersion parameter. The dispersion parameter summarizes the extra-Poisson variation commonly observed in RNA-Seq data. One widely-used power-saving strategy is to assume some commonalities of NB dispersion parameters across genes via simple models relating them to mean expression rates, and many such models have been proposed. However, the power benefit of the dispersion-modeling approach relies on the estimated dispersion models being adequate. It is not well understood how robust the approach is if the fitted dispersion models are inadequate. Our empirical investigations provide a further step towards understanding the pros and cons of different NB dispersion models, and draw attention to power-robustness evaluation, a somewhat neglected yet important aspect of RNA-Seq data analysis.

Session 3 Modeling Big Biological Data with Complex Structures

High Dimensional Graphical Models Learning
Jie Peng1 and Ru Wang1

1University of California at Davis
jiepeng@ucdavis.edu
Probabilistic graphical models are used as graphical representations of probability distributions, particularly of their conditional independence properties. Graphical models have broad applications in the fields of biology, social science, linguistics, neuroscience, etc. We will focus on graphical model structure learning under the high-dimensional regime, where avoiding over-fitting and developing computationally efficient algorithms are particularly challenging. We will discuss the use of data perturbation and model aggregation for model building and model selection.
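A generic sketch of the data-perturbation and model-aggregation idea, in the spirit of stability selection: select edges on many subsampled datasets and keep those chosen frequently. All names and thresholds here are hypothetical, and `select_edges` stands in for any base structure-learning routine:

```python
import random

def stable_edges(data, select_edges, n_perturb=50, frac=0.5, threshold=0.6, seed=0):
    """Aggregate edge sets selected on perturbed (subsampled) datasets;
    keep edges chosen in at least `threshold` of the perturbations."""
    rng = random.Random(seed)
    counts = {}
    m = max(1, int(frac * len(data)))
    for _ in range(n_perturb):
        sub = rng.sample(data, m)          # data perturbation by subsampling
        for e in select_edges(sub):        # base learner on the perturbed data
            counts[e] = counts.get(e, 0) + 1
    return {e for e, c in counts.items() if c / n_perturb >= threshold}

# Toy usage: a base learner that always recovers the edge (0, 1)
data = [(i,) for i in range(10)]
edges = stable_edges(data, lambda sub: {(0, 1)})
print(edges)  # prints: {(0, 1)}
```

Edges that appear only sporadically across perturbations are discarded, which is one way aggregation guards against over-fitting.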

Statistical Analysis of RNA Sequencing Data
Mingyao Li and Yu Hu
University of Pennsylvania
mingyao@mail.med.upenn.edu
RNA sequencing (RNA-Seq) has rapidly replaced microarrays as the major platform for transcriptomics studies. Statistical analysis of RNA-Seq data, however, is challenging because various biases present in RNA-Seq data complicate the analysis and, if not appropriately corrected, can affect isoform expression estimation and downstream analysis. In this talk, I will first present PennSeq, a statistical method that estimates isoform-specific gene expression. PennSeq is a nonparametric approach that allows each isoform to have its own non-uniform read distribution. By giving adequate weight to the underlying data, this empirical approach maximally reflects the true underlying read distribution and is effective in adjusting for non-uniformity. In the second part of my talk, I will present a statistical method for testing differential alternative splicing by jointly modeling multiple samples. I will show simulation results as well as some examples from a clinical study.

Quantifying the Role of Steric Constraints in Nucleosome Positioning
H. Tomas Rube and Jun S. Song
University of Illinois at Urbana-Champaign
songj@illinois.edu
Statistical positioning, the localization of nucleosomes packed against a fixed barrier, is conjectured to explain the array of well-positioned nucleosomes at the 5' end of genes, but the extent and precise implications of statistical positioning in vivo are unclear. I will examine this hypothesis quantitatively and generalize the idea to include moving barriers. Early experiments noted a similarity between the nucleosome profile aligned and averaged across genes and that predicted by statistical positioning; however, our study demonstrates that the same profile is generated by aligning random nucleosomes, calling the previous interpretation into question. New rigorous analytic results reformulate statistical positioning as predictions on the variance structure of nucleosome locations in individual genes. In particular, a quantity termed the variance gradient, describing the change in variance between adjacent nucleosomes, is tested against recent high-throughput nucleosome sequencing data. Constant variance gradients render evidence in support of statistical positioning in about 50% of long genes. Genes that deviate from predictions have high nucleosome turnover and cell-to-cell gene expression variability. Our analyses thus clarify the role of statistical positioning in vivo.

Integrative Dynamic Omics Networks and Personalized Medicine
George I. Mias
Michigan State University
gmias@msu.edu
The emergence and ready availability of novel omics technologies is guiding our efforts to make advances in the implementation of personalized medicine. High quality genomic data are now complemented with other dynamic omes (e.g. transcriptomes, proteomes, metabolomes, autoantibodyomes) and other data providing temporal profiling of thousands of molecular components. The analysis of such dynamic omics data necessitates the development of new statistical and computational methodology towards the integration of the different platforms. Such an approach allows us to follow changes in the physiological states of an individual, including pathway changes over time and associated network interactions (inferred nodes & connections). A framework implementing such methodology will be presented in association with a pilot personalized medicine study that monitored an initially healthy individual over multiple healthy and disease states. The framework will be described, including raw data analysis approaches for transcriptome (RNA) sequencing, mass spectrometry (proteins and small molecules) and protein array data, and an overview of quantitation methods available for each analysis. Examples of how the data are integrated in this framework using the personalized medicine pilot study will also be presented. The extended framework infers novel pathways, components and networks, assessing topological changes, and is being applied to other longitudinal studies to display changes through dynamical biological states. Assessing such multimodal omics data has great potential for implementations of a more personalized, precise and preventative medicine.

2014 ICSA-KISS Applied Statistics Symposium, Portland, June 15-18 (p. 37)

Abstracts

Session 4 Bayesian Approaches for Modeling Dynamic Non-Gaussian Responses

Binary State Space Mixed Models with Flexible Link Functions
Dipak Dey1, Xun Jiang2 and Carlos Abanto-Valle3
1University of Connecticut
2Amgen Inc.
3Federal University of Rio de Janeiro
dipak.dey@uconn.edu
State space models (SSM) for binary time series data using flexible skewed link functions are introduced in this paper. Commonly used logit, cloglog and loglog links are prone to link misspecification because of their fixed skewness. Here we introduce three flexible links as alternatives: the generalized extreme value (GEV) link, the symmetric power logit (SPLOGIT) link, and the scale mixture of normal (SMN) link. Markov chain Monte Carlo (MCMC) methods for Bayesian analysis of SSM with these links are implemented using the JAGS package, a freely available software. Model comparison relies on the deviance information criterion (DIC). The flexibility of the proposed model is illustrated by measuring the effects of deep brain stimulation (DBS) on the attention of a macaque monkey performing a reaction-time task (Smith et al., 2009). Empirical results showed that the flexible links fit better than the usual logit and cloglog links.
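A hedged sketch of the GEV link only (not of the state space implementation): the response probability is the GEV cdf evaluated at the linear predictor eta, with shape parameter xi controlling skewness; as xi approaches 0 it reduces to the Gumbel form exp(-exp(-eta)), so the fixed-skewness loglog link is a special case:

```python
import math

def gev_link(eta, xi):
    """GEV response probability p = exp(-(1 + xi*eta)^(-1/xi)) on its support;
    the xi -> 0 limit is the Gumbel (loglog-type) link exp(-exp(-eta))."""
    if abs(xi) < 1e-12:
        return math.exp(-math.exp(-eta))
    t = 1 + xi * eta
    if t <= 0:
        # beyond the support: upper endpoint when xi < 0, lower when xi > 0
        return 1.0 if xi < 0 else 0.0
    return math.exp(-t ** (-1.0 / xi))

# Skewness changes with xi: positive and negative shapes give different probabilities
print(gev_link(1.0, 0.5), gev_link(1.0, -0.5))
```

Varying xi is what frees the link from the fixed skewness of logit or cloglog.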

Bayesian Spatial-Temporal Modeling of Ecological Zero-Inflated Count Data
Xia Wang1, Ming-Hui Chen2, Rita C. Kuo3 and Dipak K. Dey2
1University of Cincinnati
2University of Connecticut
3Lawrence Berkeley National Laboratory
xiawang@uc.edu
A Bayesian hierarchical model is developed for count data with spatial and temporal correlations as well as excessive zeros, uneven sampling intensities, and inference on missing spots. Our contribution is to develop a model for zero-inflated count data that provides flexibility in modeling spatial patterns in a dynamic manner and also improves computational efficiency via dimension reduction. The proposed methodology is of particular importance for studying species presence and abundance in the field of ecological sciences. The proposed model is employed in the analysis of the survey data by the Northeast Fisheries Science Center (NEFSC) for estimation and prediction of the Atlantic cod in the Gulf of Maine - Georges Bank region. Model comparisons based on the deviance information criterion and the log predictive score show the improvement by the proposed spatial-temporal model.
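The zero-inflated Poisson building block underlying such models mixes a point mass at zero (structural zeros) with a Poisson count; a minimal sketch of its probability mass function, with illustrative parameter names:

```python
import math

def zip_pmf(y, pi, lam):
    """P(Y = y) under a zero-inflated Poisson: a structural zero with
    probability pi, otherwise a Poisson(lam) count."""
    pois = math.exp(-lam) * lam ** y / math.factorial(y)
    return pi * (y == 0) + (1 - pi) * pois

# Zero inflation: P(Y = 0) exceeds the plain Poisson zero probability
print(zip_pmf(0, 0.3, 2.0))  # 0.3 + 0.7 * exp(-2)
```

The spatial-temporal model in the abstract places correlated random effects on such components; this sketch shows only the marginal count distribution.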

Real-time Bayesian Parameter Estimation for Item Response Models
Ruby Chiu-Hsing Weng
National Chengchi University
chweng@nccu.edu.tw
Bayesian item response models have been used in modeling educational testing and Internet ratings data. Typically the statistical analysis is carried out using Markov chain Monte Carlo (MCMC) methods. However, MCMC methods may not be computationally feasible when real-time data continuously arrive and online parameter estimation is needed. We develop an efficient algorithm, based on a deterministic moment-matching method, to adjust the parameters in real time. The proposed online algorithm works well for two real datasets. Moreover, when compared with offline MCMC methods, it achieves good accuracy with considerably less computational time.

Statistical Prediction for Virginia Lyme Disease Emergence Based on Spatio-temporal Count Data
Yuanyuan Duan, Jie Li, Yili Hong, Korine Kolivras, Stephen Prisley, James Campbell and David Gaines
Virginia Tech
jieli@vt.edu
The increasing demand for modeling spatio-temporal data is computationally challenging due to the large spatial and temporal dimensions involved. The traditional Markov chain Monte Carlo (MCMC) method suffers from slow convergence and is computationally expensive. The Integrated Nested Laplace Approximation (INLA) has been proposed as an alternative to speed up the computation by avoiding the extensive sampling required by MCMC. However, even with INLA, handling large-scale spatio-temporal prediction datasets remains difficult, if not infeasible, in many cases. This work proposes a new Divide-Recombine (DR) prediction method for dealing with spatio-temporal data. A large spatial region is divided into smaller subregions, and then INLA is applied to fit a spatio-temporal model to each subregion. To recover the spatial dependence, an iterative procedure has been developed to recombine the model fitting and prediction results. In particular, the new method utilizes a model offset term to make adjustments for each subregion using information from neighboring subregions. Stable estimation/prediction results are obtained after several updating iterations. Simulations are used to validate the accuracy of the new method in model fitting and prediction. The method is then applied to areal (census tract level) count data for Lyme disease cases in Virginia from 2003 to 2010.

Session 5 Recent Advances in Astro-Statistics

Embedding the Big Bang Cosmological Model into a Bayesian Hierarchical Model for Supernova Light Curve Data
David van Dyk, Roberto Trotta, Xiyun Jiao and Hikmatali Shariff
Imperial College London
dvandyk@imperial.ac.uk
The 2011 Nobel Prize in Physics was awarded for the discovery that the expansion of the Universe is accelerating. This talk describes a Bayesian model that relates the difference between the apparent and intrinsic brightnesses of objects to their distance, which in turn depends on parameters that describe this expansion. While apparent brightness can be readily measured, intrinsic brightness can only be obtained for certain objects. Type Ia supernovae occur when material accreting onto a white dwarf drives its mass above a threshold and triggers a powerful supernova explosion. Because this occurs only in a particular physical scenario, we can use covariates to estimate intrinsic brightness. We use a hierarchical Bayesian model to leverage this information to study the expansion history of the Universe. The model includes computer models that relate expansion parameters to observed brightnesses, along with components that account for measurement error, data contamination, dust absorption, repeated measures, and covariate adjustment uncertainty. Sophisticated MCMC methods are employed for model fitting, and a secondary Bayesian analysis is conducted for residual analysis and model checking.

Marrying Domain Knowledge and Statistical Methods
Ashish Mahabal, George Djorgovski, Matthew Graham, Ciro Donalek and Andrew Drake
California Institute of Technology
aam@astro.caltech.edu
Astronomy datasets have been large and are getting larger by the day (TB to PB). This necessitates the use of advanced statistics for many purposes. However, the datasets are often so large that small contamination rates imply large numbers of wrong results. This makes blind application of methodologies unattractive. Astronomical transients are one area where rapid follow-up observations are required based on very little data. We show how the use of domain knowledge, in the right measure at the right juncture, can improve classification performance. We demonstrate this using Bayesian networks and Gaussian process regression on datasets from the Catalina Real-Time Transient Survey, which has covered 80% of the sky several tens to a few hundreds of times over the last decade. This becomes even more critical as we move beyond PB-sized datasets in the coming years.

Nonlinear Classification of X-Ray Binaries
Luke Bornn and Saku Vrtilek
Harvard University
bornn@stat.harvard.edu
Because of their singular nature, the primary method to obtain information about stellar mass black holes is to study those that are part of a binary system. However, we have no widely applicable means of determining the nature of the compact object (whether a black hole [BH] or a neutron star [NS]) in a binary system. The definitive method is dynamic measurement of the mass of the compact object, and that can be reliably established only for eclipsing systems. The motivation for finding a way to differentiate the presence of an NS or BH in any X-ray binary (XRB) system is strong: subtle differences in the behavior of neutron star and black hole X-ray binaries provide tests of fundamental features of gravitation, such as the existence of a black hole event horizon. In this talk we present a statistical approach for classifying binary systems using a novel 3D representation, called a color-color-intensity diagram, combined with nonlinear classification techniques. The method provides natural and accurate probabilistic classifications of X-ray binary objects.

Persistent Homology and the Topology of the Intergalactic Medium
Fabrizio Lecci
Carnegie Mellon University
lecci@cmu.edu
Light we observe from quasars has traveled through the intergalactic medium (IGM) to reach us, and leaves an imprint of some properties of the IGM on its spectrum. There is a particular imprint with which cosmologists are familiar, dubbed the Lyman-alpha forest. From this imprint we can infer the density fluctuations of neutral hydrogen along the line of sight from us to the quasar. With cosmological simulation output, we develop a methodology using local polynomial smoothing to model the IGM. Then we study its topological features using persistent homology, a method for probing topological properties of point clouds and functions. Describing the topological features of the IGM can aid in our understanding of the large-scale structure of the Universe, along with providing a framework for comparing cosmological simulation output with real data beyond the standard measures. Motivated by this example, I will introduce persistent homology and describe some statistical techniques that allow us to separate topological signal from topological noise.

Session 6 Statistical Methods and Application in Genetics

Identification of Homogeneous and Heterogeneous Covariate Structure in Pooled Cohort Studies
Xin Cheng1, Wenbin Lu2 and Mengling Liu1

1New York University
2North Carolina State University
xc311@nyu.edu
Pooled analyses, which make use of data from multiple studies as a single dataset, can achieve large sample sizes to increase statistical power. When inter-study heterogeneity exists, however, the simple pooling strategy may fail to present a fair and complete picture of variables with heterogeneous effects. Therefore, it is of great importance to know the homogeneous and heterogeneous structure of variables in pooled studies. In this presentation, we propose a penalized partial likelihood approach with adaptively weighted composite penalties on variables' homogeneous effects and heterogeneous effects. We show that our method can characterize the structure of variables as having heterogeneous, homogeneous, or null effects, and simultaneously provide inference for the non-zero effects. The results are readily extended to the high-dimensional situation where the number of parameters diverges with the sample size. The proposed selection and estimation procedure can be easily implemented using the iterative shooting algorithm. We conduct extensive numerical studies to evaluate the practical performance of our proposed method and demonstrate it using real studies.

Gene Expression Analyses in Evaluating Translational Biomarkers from Drug-Induced Idiopathic Pulmonary Fibrosis in Animal Models
Wenfei Zhang, Yuefeng Lu, Tai-He Xia, Guillaume Wettstein, Jean-Pierre Bidouard and Xavier Marniquet
Sanofi-aventis US LLC
wenfei.zhang@sanofi.com
Translational biomarkers are markers that produce biological signals translatable from animal models to human models. Identifying translational biomarkers can be important for disease diagnosis, prognosis and risk prediction in drug development. Therefore, there is a growing demand for statistical analyses of biomarker data, especially for large and complex genetic data. To ensure the quality of statistical analyses, we develop a statistical analysis pipeline for gene expression data. When the pipeline is applied to gene expression data from drug-induced idiopathic pulmonary fibrosis in animal models, it shows some interesting results in evaluating the translatability of genes through comparisons with human models.

DNA Methylation, Cell-Type Distribution, and EWAS
E. Andres Houseman
Oregon State University
andres.houseman@oregonstate.edu
Epigenetic processes form the principal mechanisms by which cell differentiation occurs. Consequently, DNA methylation measurements are strongly influenced by the DNA methylation profiles of constituent cell types as well as by their mixing proportions. Epigenome-wide association studies (EWAS) aim to find associations of phenotype or exposure with DNA methylation at single CpG dinucleotides, but these associations are potentially confounded by associations with the overall cell-type distribution. In this talk we review the literature on epigenetics and cell mixture. We then present two techniques for mixture-adjusted EWAS: the first requires a reference data set, which may be expensive or infeasible to collect, while the other is free of this requirement. Finally, we provide several data analysis examples using these techniques.

Secondary Quantile Analysis for GWAS
Ying Wei1, Xiaoyu Song1, Mengling Liu2 and Iuliana Ionita-Laza1

1Columbia University
2New York University
yw2148@columbia.edu
Case-control designs are widely used in epidemiology and other fields to identify factors associated with a disease of interest. These studies can also be used to study the associations of risk factors with secondary outcomes, such as biomarkers of the disease, and provide a cost-effective way to understand disease mechanisms. Most of the existing methods have focused on inference on the mean of secondary outcomes. In this paper, we propose a quantile-based approach. We construct a new family of estimating equations to make consistent and efficient estimation of conditional quantiles using the case-control sample, and also develop tools for statistical inference. Simulations are conducted to evaluate the practical performance of the proposed approach, and a case-control study on genetic association with asthma is used to demonstrate the method.
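The conditional-quantile machinery referenced above builds on check-loss (pinball loss) minimization: the tau-th quantile minimizes the expected check loss. A toy sketch restricted to candidate values drawn from the sample itself; the case-control weighting of the paper's estimating equations is not reproduced here:

```python
def check_loss(q, ys, tau):
    """Pinball loss sum_i rho_tau(y_i - q), with rho_tau(u) = u * (tau - 1{u < 0})."""
    return sum((y - q) * (tau - (1 if y < q else 0)) for y in ys)

def sample_quantile(ys, tau):
    """The sample value minimizing the check loss is a tau-th sample quantile."""
    return min(ys, key=lambda q: check_loss(q, ys, tau))

print(sample_quantile([1, 2, 3, 4, 5], 0.5))   # prints: 3 (the median)
print(sample_quantile([1, 2, 3, 4, 5], 0.25))  # prints: 2
```

Conditional quantile regression replaces the scalar q by a function of covariates and minimizes the same loss, which is where the estimating equations of the abstract come in.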

Session 7 Statistical Inference of Complex Associations in High-Dimensional Data

Leveraging for Big Data Regression
Ping Ma
University of Georgia
pingma@uga.edu
Advances in science and technology in the past few decades have led to big data challenges across a variety of fields. Extraction of useful information and knowledge from big data has become a daunting challenge to both the science community and the entire society. Tackling this challenge requires major breakthroughs in efficient computational and statistical approaches to big data analysis. In this talk, I will present some leveraging algorithms, which make a key contribution to resolving this grand challenge. In these algorithms, by sampling a very small representative sub-dataset using smart algorithms, one can effectively extract relevant information of vast data sets from the small sub-dataset. Such algorithms are scalable to big data. These efforts allow pervasive access to big data analytics, especially for those who cannot directly use supercomputers. More importantly, these algorithms enable massive numbers of ordinary users to analyze big data using tablet computers.
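A sketch of the leverage scores that drive such subsampling algorithms, for the simple linear regression case where h_i = 1/n + (x_i - xbar)^2 / Sxx and the scores sum to the number of parameters (here 2); sampling rows with probability proportional to leverage favors the most influential points. Function names are illustrative:

```python
def leverage_scores(xs):
    """Hat-matrix diagonals for simple linear regression on xs."""
    n = len(xs)
    xbar = sum(xs) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    return [1 / n + (x - xbar) ** 2 / sxx for x in xs]

def sampling_probs(xs):
    """Leverage-based subsampling probabilities (normalized to sum to 1)."""
    h = leverage_scores(xs)
    total = sum(h)  # equals p = 2 for simple linear regression
    return [hi / total for hi in h]

probs = sampling_probs([1, 2, 3, 4, 5])
print(probs)  # extreme x-values receive the highest probabilities
```

A leveraging estimator would then fit weighted least squares on the subsample, reweighting by the inverse sampling probabilities.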

Reference-free Metagenomics Analysis Using Matrix Factorization
Wenxuan Zhong and Xin Xing
University of Georgia
wenxuan@uga.edu
Metagenomics refers to the study of a collection of genomes, typically microbial genomes, present in a sample. The sample itself can come from diverse sources depending on the study, e.g., a sample from the gastrointestinal tract of a human patient, or a sample of soil from a particular ecological origin. The premise is that by understanding the genomic composition of the sample, one can form hypotheses about properties of the sample, e.g., disease correlates of the patient or ecological health of the soil source. Existing methods are limited in complex metagenome studies by considering only the similarity between short DNA fragments and genomes in a database. In this talk, I will introduce a reference-free genome deconvolution algorithm that can simultaneously estimate the composition of a microbial community and the quantity of each species. Some theoretical results on the deconvolution method will also be discussed.

Big Data, Big Models, Big Problems: Statistical Principles and Practice at Scale
Alexander W. Blocker
Google
awblocker@google.com
Massive datasets can yield great insights, but only when united with sound statistical principles and careful computation. We share lessons from a set of problems in industry, all of which combine classical design and theory with large-scale computation. Simply obtaining reliable confidence intervals means grappling with complex dependence and distributed systems, and obtaining masses of additional data can actually degrade estimates without careful inference and computation. These problems highlight the opportunities for statisticians to provide a distinct contribution to the world of big data.

Session 8 Recent Developments in Survival Analysis

Bayesian Joint Modeling of Multi-dimensional Longitudinal and Survival Data with Applications to Cancer Clinical Trials
Ming-Hui Chen1, Danjie Zhang1, Joseph G. Ibrahim2, Mark E. Boye3 and Wei Shen3

1University of Connecticut
2University of North Carolina
3Eli Lilly and Company
ming-hui.chen@uconn.edu

Motivated by the large phase III multicenter randomized single-blind EMPHACIS mesothelioma clinical trial, we develop a class of shared parameter joint models for multi-dimensional longitudinal and survival data. Specifically, we propose a class of multivariate mixed effects regression models for multi-dimensional longitudinal measures, and a class of frailty and cure rate survival models for progression-free survival (PFS) time and overall survival (OS) time. The properties of the proposed models are examined in detail. In addition, we derive the decomposition of the logarithm of the pseudo marginal likelihood (LPML), i.e., LPML = LPML_Long + LPML_Surv|Long, to assess the fit of each component of the joint model, in particular to assess the fit of the longitudinal component and the survival component of the joint model separately, and further use ∆LPML to determine the importance and contribution of the longitudinal data to the model fit of the survival data. Moreover, efficient Markov chain Monte Carlo sampling algorithms are developed to carry out posterior computation. We apply the proposed methodology to a detailed case study in mesothelioma.

Estimating Risk with Time-to-Event Data: An Application to the Women's Health Initiative
Dandan Liu1, Yingye Zheng2, Ross Prentice2 and Li Hsu2

1Vanderbilt University
2Fred Hutchinson Cancer Research Center
dandan.liu@vanderbilt.edu

Accurate and individualized risk prediction is critical for population control of chronic diseases such as cancer and cardiovascular disease. Large cohort studies provide valuable resources for building risk prediction models, as the risk factors are collected at baseline and subjects are followed over time until disease occurrence or termination of the study. However, for rare diseases, the baseline risk may not be estimated reliably based on cohort data only, due to sparse events. In this paper, we propose to make use of external information to improve efficiency in estimating time-dependent absolute risk. We derive the relationship between external disease incidence rates and the baseline risk, and incorporate the external disease incidence information into the estimation of absolute risks, while allowing for potential differences in disease incidence rates between the cohort and external sources. The asymptotic distributions of the proposed estimators are established. Simulation results show that the proposed estimator for absolute risk is more efficient than that based on the Breslow estimator, which does not utilize external disease incidence rates. A large cohort study, the Women's Health Initiative Observational Study, is used to illustrate the proposed method.

Efficient Estimation of Nonparametric Genetic Risk Function with Censored Data
Yuanjia Wang1, Baosheng Liang2 and Donglin Zeng3

1Columbia University
2Beijing Normal University
3University of North Carolina at Chapel Hill
yw2016@columbia.edu

With an increasing number of causal genes discovered for Mendelian and complex human disorders, it is important to assess the genetic risk distribution functions of disease onset for subjects who are carriers of these causal mutations, and compare them with the disease distribution in non-carriers. In many genetic epidemiological studies of genetic risk functions, the disease onset information is subject to censoring. In addition, subjects' mutation carrier or non-carrier status may be unknown, due to the cost of ascertaining subjects to collect DNA samples or due to death in older subjects (especially for late-onset diseases). Instead, the probability of subjects' genetic marker or mutation status can be obtained from various sources. When genetic status is missing, the available data take the form of mixture censored data. Recently, various methods have been proposed in the literature using parametric, semiparametric and nonparametric models to estimate the genetic risk distribution functions from such data. However, none of the existing approaches is efficient in the presence of censoring and mixture, and the computation for some methods is demanding. In this paper, we propose a sieve maximum likelihood estimation which is fully efficient for inferring genetic risk distribution functions nonparametrically. Specifically, we estimate the logarithm of hazard ratios between genetic risk groups using B-splines, while applying nonparametric maximum likelihood estimation (NPMLE) for the reference baseline hazard function. Our estimator can be calculated via an EM algorithm, and the computation is much faster than for the existing methods. Furthermore, we establish the asymptotic distribution of the obtained estimator and show that it is consistent and semiparametric efficient, and thus the optimal estimator in this framework. The asymptotic theory for our sieve estimator sheds light on optimal estimation for censored mixture data. Simulation studies demonstrate superior performance of the proposed method in small finite samples. The method is applied to estimate the distribution of Parkinson's disease (PD) age at onset for carriers of mutations in the leucine-rich repeat kinase 2 (LRRK2) G2019S gene, using data from the Michael J. Fox Foundation Ashkenazi Jewish LRRK2 consortium. This estimation is important for genetic counseling purposes, since this test is commercially available, yet genetic risk (penetrance) estimates have been variable.

Support Vector Hazard Regression for Predicting Event Times Subject to Censoring
Xiaoxi Liu1, Yuanjia Wang2 and Donglin Zeng1

1University of North Carolina
2Columbia University
dzeng@email.unc.edu
Predicting dichotomous or continuous disease outcomes using powerful machine learning approaches has been studied extensively in various scientific areas. However, how to learn prediction rules for time-to-event outcomes subject to right censoring has received little attention until very recently. Existing approaches rely on inverse probability weighting or rank-based methods, which are inefficient. In this paper, we develop a novel support vector hazards regression (SVHR) approach to predict time-to-event outcomes using right censored data. Our method is based on predicting the counting process via a series of support vector machines for time-to-event outcomes among subjects at risk. Introducing counting processes to represent the time-to-event data leads to an intuitive connection of the method with support vector machines in standard supervised learning and with hazard regression models in standard survival analysis. The resulting optimization is a convex quadratic programming problem that can easily incorporate non-linearity using kernel machines. We demonstrate an interesting connection of the profiled empirical risk function with the Cox partial likelihood, which sheds light on the optimality of SVHR. We formally show that SVHR is optimal in discriminating the covariate-specific hazard function from the population average hazard function, and we establish the consistency and learning rate of the predicted risk. Simulation studies demonstrate much improved prediction accuracy of the event times using SVHR compared to existing machine learning methods. Finally, we apply our method to analyze data from two real-world studies to demonstrate the superiority of SVHR in practical settings.

Session 9 Industry Practice and Regulatory Pathway for Benefit-Risk Assessment of Medicinal Products

Visual Communication and Assessment of Benefit-Risk for Medical Products
Jonathan D. Norton
MedImmune
nortonj@medimmune.com
Benefit-risk assessments are multidimensional and hence challenging both to formulate and to communicate. A particular limitation of some benefit-risk graphics is that they are based on the marginal distributions of benefit and harm and do not show the degree to which they occur in the same patients. Consider, for example, an imaginary drug that is beneficial to 50% [...]. At the 2010 ICSA Symposium, the speaker introduced a graphic showing the benefit-risk state of each subject over time. This talk will include a new graphic, based on similar principles, that is intended for early phase studies. It allows the user to assess the joint distribution of benefit and harm on the individual and cohort levels. The speaker will also review other graphical displays that may be effective for benefit-risk assessment, considering accepted principles of statistical graphics and his experience working for FDA and industry.

Some Thoughts on Evaluation of Uncertainty in Benefit-Risk Assessment
Qi Jiang1, Haijun Ma1, Christy Chuang-Stein2, Weili He3, George Quartey4, John Scott5 and Shihua Wen6

1Amgen Inc.
2Pfizer Inc.
3Merck & Co.
4Hoffmann-La Roche
5United States Food and Drug Administration
6AbbVie Inc.
qjiang@amgen.com
Increasingly, companies, regulatory agencies and other governance bodies are moving toward structured benefit-risk assessment approaches. One issue that complicates such structured approaches is uncertainty, which comes from multiple sources and needs to be addressed. To develop potential approaches to address these sources of uncertainty, it is critical first to have a thorough understanding of them. In this presentation, members of the Benefit-Risk Working Group of the Quantitative Sciences in the Pharmaceutical Industry (QSPI BRWG) will discuss some major sources of uncertainty and share some thoughts on how to address them.

Current Concept of Benefit-Risk Assessment of Medicine
Syed S. Islam
AbbVie Inc.
syed.islam@abbvie.com
Benefit-risk assessment of a medicine should be as dynamic as the stages of drug development and the life cycle of a drug. Three fundamental clinical concepts are critical at all stages: the seriousness of the disease, how much improvement will occur due to the drug under consideration, and harmful effects, including their frequency, seriousness and duration. One has to achieve a desirable balance between these, particularly prior to market approval, and follow up prospectively to see that the balance is maintained. The desirable balance is not a straightforward concept; it depends on judgment by various stakeholders. The patients, who are the direct beneficiaries of the medicine, should be the primary stakeholders, provided adequate, clear and concise information is available to them. The healthcare providers must have similar information that they can communicate to their patients. The regulators and insurers are also stakeholders, for different reasons. Industry developing or producing the drug must provide adequate and transparent information usable by all stakeholders. Any quantitative approach to integrated benefit-risk balance should be parsimonious and transparent, along with sensitivity analyses. This presentation will discuss the pros and cons of a dynamic benefit-risk assessment and how integrated benefit-risk analyses can be incorporated within the FDA/EMA framework that includes patient preference.

Session 10 Analysis of Observational Studies and Clinical Trials

Impact of Tuberculosis on Mortality Among HIV-Infected Patients Receiving Antiretroviral Therapy in Uganda: A Case Study in Propensity Score Analysis
Rong Chu1, Edward J Mills2, Joseph Beyene3, Eleanor Pullenayegum4, Celestin Bakanda5, Jean B Nachega6, and Lehana Thabane3
1Agensys Inc (Astellas)
2University of Ottawa/McMaster University
3McMaster University
4McMaster University/University of Toronto
5The AIDS Support Organization
6Stellenbosch University
rongchu@agensys.com
Background: Tuberculosis (TB) disease affects survival among HIV co-infected patients on antiretroviral therapy (ART), yet the magnitude of the effect of TB disease on mortality is poorly understood.
Methods: Using a prospective cohort of 22,477 adult patients who initiated ART between August 2000 and June 2009 in Uganda, we assessed the effect of active pulmonary TB disease at the initiation of ART on all-cause mortality using a Cox proportional hazards model. Propensity score (PS) matching was used to control for potential confounding. PS-based stratification and covariate adjustment, as well as non-PS-based multivariable Cox models, were also performed.
Results: A total of 1,609 (7.52%) patients had active pulmonary TB at the start of ART. TB patients had higher proportions of being male, suffering from AIDS-defining illnesses, having World Health Organization (WHO) disease stage III or IV, and having lower CD4 cell counts at baseline (p < 0.001). The percentages of death during follow-up were 10.47% and 6.38% for patients with and without TB, respectively. The hazard ratio (HR) for mortality comparing TB to non-TB patients using 1,686 PS-matched pairs was 1.37 (95% confidence interval [CI]: 1.08-1.75), less marked than the crude estimate (HR = 1.74, 95% CI: 1.49-2.04). The other PS-based methods and the non-PS-based multivariable Cox model produced similar results.
Conclusions: After controlling for important confounding variables, HIV patients who had TB at the initiation of ART in Uganda had an approximately 37% increased hazard of overall mortality relative to non-TB patients.
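The propensity-score matching step can be sketched generically. Below is a minimal greedy 1:1 nearest-neighbor matcher on hypothetical scores; the 0.05 caliper, the simulated score distributions, and the function name are illustrative assumptions, not details from the study, which went on to fit Cox models to the matched pairs.

```python
import numpy as np

def ps_match(ps_treated, ps_control, caliper=0.05):
    """Greedy 1:1 nearest-neighbor matching on the propensity score.

    Each treated subject is paired with the closest still-unused control;
    pairs farther apart than the caliper are discarded."""
    available = list(range(len(ps_control)))
    pairs = []
    for i, p in enumerate(ps_treated):
        if not available:
            break
        j = min(available, key=lambda k: abs(ps_control[k] - p))
        if abs(ps_control[j] - p) <= caliper:
            pairs.append((i, j))
            available.remove(j)  # match controls without replacement
    return pairs

# hypothetical propensity scores for TB (treated) and non-TB (control) patients
rng = np.random.default_rng(0)
ps_t = rng.uniform(0.3, 0.9, size=50)
ps_c = rng.uniform(0.1, 0.7, size=200)
pairs = ps_match(ps_t, ps_c)
```

The outcome comparison (here, a Cox model for mortality) would then be run on the matched pairs only.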

Ecological Momentary Assessment Methods to Increase Response and Adjust for Attrition in a Study of Middle School Students' Exposure to Alcohol Advertising
Steven Martino, Rebecca Collins, Stephanie Kovalchik, Kirsten Becker, Elizabeth D'Amico, William Shadel, and Marc Elliott
RAND Corporation
skovalch@rand.org
Ecological momentary assessment (EMA) is a new approach for collecting data about repeated exposures in natural settings that has become more practical with the growth of mobile technologies. EMA has the potential to reduce recall bias. However, because EMA surveys occur more frequently than traditional surveys, missing data are common. In this paper we describe the design and preliminary results of a longitudinal EMA study of exposure to alcohol advertising among middle school students (n=600),

2014 ICSA-KISS Applied Statistics Symposium, Portland, June 15-18

Abstracts

which employed a randomized missing design to increase response rates to smartphone surveys. Early results (n=125) show evidence of attrition over the 14-day collection period, which was not associated with student characteristics but was associated with study day. We develop a prediction model for non-response and adjust for attrition in exposure summaries using inverse probability weighting. Attrition-adjusted estimates suggest that youths saw an average of 3.8 alcohol ads per day, over twice what has been previously reported with conventional assessment. Corrected for attrition, EMA may allow more accurate estimation of frequent exposures than one-time delayed recall.
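Inverse probability weighting of this kind weights each completed survey by the inverse of its estimated response probability. The sketch below shows only the mechanics of the weighted estimate on invented numbers; the authors' actual prediction model for non-response is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 600
# hypothetical daily ad-exposure counts and response behavior
exposure = rng.poisson(3.8, size=n).astype(float)
p_respond = np.clip(1.0 - 0.03 * rng.integers(1, 15, size=n), 0.05, 1.0)
responded = rng.uniform(size=n) < p_respond  # who completed the survey

# naive mean uses completers only; IPW reweights them by 1 / P(respond)
naive = exposure[responded].mean()
w = 1.0 / p_respond[responded]
ipw = np.sum(w * exposure[responded]) / np.sum(w)
```

In practice `p_respond` would come from a fitted non-response model (e.g., logistic regression on study day), not be known in advance.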

Is Poor Antisaccade Performance in Healthy First-Degree Relatives of Schizophrenics an Artifact of Study Design?
Charity J Morgan1, Mark F Lenzenweger2, and Deborah L Levy3
1University of Alabama at Birmingham
2State University of New York at Binghamton
3McLean Hospital
cjmorgan@uab.edu

A number of traits associated with schizophrenia aggregate in relatives of schizophrenia patients at rates much higher than that of the clinical disorder. These traits, considered candidate endophenotypes, may be alternative, more penetrant manifestations of schizophrenia risk genes than schizophrenia itself. Performance on the antisaccade task, a measure of eye-tracking dysfunction, is one of the most widely studied candidate endophenotypes. However, there is little consensus on whether poor antisaccade performance is a true endophenotype for schizophrenia. Some studies comparing the performance of healthy relatives of schizophrenia patients (RelSZ) to that of normal controls (NC) report that RelSZ show significantly more errors, while others find no statistically significant differences between the two groups. A recent meta-analysis noted that some studies used stricter exclusion criteria for NC than for RelSZ, and found that these studies were more likely to report significant effect sizes. Specifically, NC with a personal or family history of psychopathology were excluded in these studies, whereas all RelSZ, including those with psychotic conditions, were included. In order to determine whether a difference in antisaccade performance between NC and RelSZ remains after controlling for differences in psychopathology, we applied a binomial regression model to data from an antisaccade task. We demonstrate that both psychopathology and familial history affect antisaccade performance.

Analysis of a Vaccine Study in Animals Using Mitigated Fraction in SAS
Mathew Rosales
Experis
mattrosales@experis.com

Mitigated fraction is frequently used to evaluate the effect of an intervention in reducing the severity of a particular outcome, and it is a common measure in vaccine studies. It utilizes the ranks of the observations and measures the overlap of the two distributions using their stochastic ordering. Percent lung involvement is a common endpoint in vaccine studies to assess efficacy, and the mitigated fraction is used to estimate the relative increase in the probability that disease will be less severe in the vaccinated group. A SAS macro was developed to estimate the mitigated fraction and its confidence interval. The macro provides an asymptotic confidence interval and a bootstrap-based interval. For illustration, an actual vaccine study was used where the macro was utilized to generate the estimates.
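The rank-based statistic itself is compact enough to sketch outside SAS. The Python version below assumes the usual definition MF = 2·P(vaccinated outcome < control outcome) − 1, with ties counted half, plus a percentile bootstrap interval; the formula, function names, and toy data are illustrative assumptions, not the macro's code.

```python
import numpy as np

def mitigated_fraction(vac, con):
    """MF = 2*P(vaccinated outcome < control outcome) - 1, ties counted half."""
    vac = np.asarray(vac, float)[:, None]
    con = np.asarray(con, float)[None, :]
    u = np.sum(vac < con) + 0.5 * np.sum(vac == con)  # rank-sum style pair count
    return 2.0 * u / (vac.size * con.size) - 1.0

def bootstrap_ci(vac, con, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval, resampling each group independently."""
    rng = np.random.default_rng(seed)
    stats = [mitigated_fraction(rng.choice(vac, len(vac)),
                                rng.choice(con, len(con)))
             for _ in range(n_boot)]
    return np.quantile(stats, [alpha / 2, 1 - alpha / 2])

# hypothetical percent-lung-involvement data
vaccinated = [0, 2, 5, 8, 10, 12]
controls = [5, 15, 20, 30, 40, 55]
mf = mitigated_fraction(vaccinated, controls)
```

MF near 1 indicates the vaccinated distribution is stochastically much less severe; MF of 0 indicates no mitigation.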

Competing Risks Survival Analysis for Efficacy Evaluation of Some-or-None Vaccines
Paul T Edlefsen
Fred Hutchinson Cancer Research Center
pedlefse@fhcrc.org

Evaluation of a vaccine's efficacy to prevent a specific type of infection endpoint, in the context of multiple endpoint types, is an important challenge in biomedicine. Examples include the evaluation of multivalent vaccines, such as the annual influenza vaccines, that target multiple strains of the pathogen. While statistical methods have been developed for "mark-specific vaccine efficacy" (where the term "mark" refers to a feature of the endpoint, such as its type, in contrast to a covariate of the subject), these methods address only vaccines that have a "leaky" vaccine mechanism, meaning that the vaccine's effect is to reduce the per-exposure probability of infection. The usual presentation of vaccine mechanisms contrasts "leaky" with "all-or-none" vaccines, which completely protect some fraction of the subjects independent of the number of exposures that each subject experiences. We introduce the notion of the "some-or-none" vaccine mechanism, which completely protects a fraction of the subjects from a defined subset of the possible endpoint marks: for example, a flu vaccine that completely protects against the seasonal flu but has no effect against the H1N1 strain. Under conditions of non-harmful vaccines, we introduce a framework and Bayesian and frequentist methods to detect and quantify the extent to which a vaccine's partial efficacy is attributable to uneven efficacy across the marks rather than to incomplete "take" of the intervention. These new methods provide more power than existing methods to detect mark-varying efficacy (also called "sieve effects") when the conditions hold. We demonstrate the new framework and methods with simulation results and with new analyses of genetic signatures of vaccine effects in the RV144 HIV-1 vaccine efficacy trial.

Using Historical Data to Automatically Identify Air-Traffic Controller Behavior
Yuefeng Wu
University of Missouri at St Louis
wuyue@umsl.edu

The Next Generation Air Traffic Control Systems are trajectory-based automation systems that rely on predictions of future states of aircraft, instead of relying solely on human abilities as the National Airspace System (NAS) does now. As automation relying on trajectories becomes more safety-critical, the accuracy of these predictions needs to be fully understood. It is also very important for researchers developing future automation systems to understand, and in some cases mimic, how current operations are conducted by human controllers, to ensure that the new systems are at least as efficient as humans and to understand creative solutions used by human controllers. The work to be presented answers both of these questions by developing statistical machine-learning models to characterize the types of errors present when using current systems to predict future aircraft states. The models are used to infer situations in the historical data where an air-traffic controller intervened on an aircraft's route, even when there is no direct recording of this action. Local time series models and some other statistics are calculated to construct the feature vector; then both a naive Bayes classifier and a support vector machine are used to learn the pattern of the prediction errors.
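As an illustration of the classification step, a Gaussian naive Bayes classifier can be written from scratch in a few lines. The simulated "feature vectors" below are hypothetical stand-ins for the local time-series features described in the talk; the class is a generic sketch, not the authors' implementation.

```python
import numpy as np

class TinyGaussianNB:
    """Minimal Gaussian naive Bayes: per-class feature means/variances."""
    def fit(self, X, y):
        self.classes = np.unique(y)
        self.stats = {}
        for c in self.classes:
            Xc = X[y == c]
            # mean, variance (with a small floor), and class prior
            self.stats[c] = (Xc.mean(0), Xc.var(0) + 1e-9, len(Xc) / len(X))
        return self

    def predict(self, X):
        scores = []
        for c in self.classes:
            mu, var, prior = self.stats[c]
            # Gaussian log-likelihood summed over (assumed independent) features
            ll = -0.5 * np.sum((X - mu) ** 2 / var + np.log(2 * np.pi * var), axis=1)
            scores.append(ll + np.log(prior))
        return self.classes[np.argmax(scores, axis=0)]

rng = np.random.default_rng(3)
X0 = rng.normal(0.0, 1.0, size=(200, 4))  # hypothetical "no intervention" errors
X1 = rng.normal(2.0, 1.0, size=(200, 4))  # hypothetical "controller intervened"
X = np.vstack([X0, X1])
y = np.repeat([0, 1], 200)
model = TinyGaussianNB().fit(X, y)
```

A support vector machine would typically be fit to the same feature matrix for comparison, as the abstract describes.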


Session 11 Lifetime Data Analysis

Analysis of Multiple Type Recurrent Events When Only Partial Information Is Available for Some Subjects
Min Zhan and Jeffery Fink
University of Maryland
mzhan@epi.umaryland.edu
In many longitudinal studies, subjects may experience multiple types of recurrent events. In some situations, the exact occurrence times of the recurrent events are not observed for some subjects; instead, the only information available is whether these subjects experience each type of event in successive time intervals. We discuss marginal models to assess the effect of baseline covariates on the recurrent events. The proposed methods are applied to a clinical study of chronic kidney disease, in which subjects can experience multiple types of safety events repeatedly.

Cumulative Incidence Function under Two-Stage Randomization
Idil Yavuz1, Yu Cheng2, and Abdus Wahed2
1Dokuz Eylul University
2University of Pittsburgh
yucheng@pitt.edu
In recent years, personalized medicine and dynamic treatment regimens have drawn considerable attention. Dynamic treatment regimens are sets of rules that govern the treatment of subjects depending on their intermediate responses or covariates. Two-stage randomization is a useful set-up for gathering data to make inference on such regimens. Meanwhile, more and more practitioners are becoming aware of competing-risk censoring for event-type outcomes, where subjects in a study are exposed to more than one possible failure and the specific event of interest may be dependently censored by the occurrence of competing events. We aim to compare several treatment regimens from a two-stage randomized trial on survival outcomes that are subject to competing-risk censoring. In the presence of competing risks, the cumulative incidence function (CIF) has been widely used to quantify the cumulative probability of occurrence of the target event by a specific time point. However, if we only use the data from those subjects who have followed a specific treatment regimen to estimate the CIF, the resulting naive estimator may be biased. Hence, we propose alternative non-parametric estimators for the CIF using inverse weighting and provide inference procedures based on the asymptotic linear representation. In addition, test procedures are developed to compare the CIFs from two different treatment regimens. Through simulation we show the practicality and advantages of the proposed estimators compared to the naive estimator. Since dynamic treatment regimens are widely used in treating cancer, AIDS, psychological disorders, and other illnesses that require complex treatment, and competing-risk censoring is common in studies with multiple endpoints, the proposed methods provide useful inferential tools to analyze such data and will help advocate research in personalized medicine.
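Without the inverse weighting, the standard nonparametric CIF is the Aalen-Johansen estimate: at each event time, the cause-specific increment is the overall survival just before that time multiplied by the cause-specific hazard. A minimal sketch on toy data follows (the paper's proposed estimators add inverse weighting for the two-stage design, which is not reproduced here).

```python
import numpy as np

def cumulative_incidence(time, status, cause, t):
    """Aalen-Johansen CIF for `cause` at time t.
    status: 0 = censored, >0 = event type."""
    time = np.asarray(time, float)
    status = np.asarray(status)
    order = np.argsort(time)
    time, status = time[order], status[order]
    surv = 1.0  # overall (all-cause) survival just before the current time
    cif = 0.0
    for u in np.unique(time[status > 0]):
        if u > t:
            break
        at_risk = np.sum(time >= u)
        d_all = np.sum((time == u) & (status > 0))
        d_cause = np.sum((time == u) & (status == cause))
        cif += surv * d_cause / at_risk     # cause-specific increment
        surv *= 1.0 - d_all / at_risk       # update all-cause survival
    return cif

# toy data: 0 = censored, 1 = event of interest, 2 = competing event
times = [1, 2, 2, 3, 4, 5, 6, 7]
status = [1, 2, 0, 1, 1, 0, 2, 1]
```

By construction the CIFs for all causes sum to at most one, unlike naive one-minus-Kaplan-Meier estimates that treat competing events as censoring.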

Nonparametric Threshold Selection with Censored Survival Data
Xinhua Liu and Zhezhen Jin
Columbia University
zj7@columbia.edu
In biomedical research and practice, quantitative biomarkers are often used for diagnostic or prognostic purposes, with a threshold established on the measurement to aid binary classification. When prognosis is on survival time, a single threshold may not be informative. It is also challenging to select a threshold when the survival time is subject to random censoring. Using survival-time-dependent sensitivity and specificity, we extend the classification-accuracy-based objective function to allow for a survival-dependent threshold. To estimate the optimal threshold over a range of survival rates, we adopt a non-parametric procedure, which produces satisfactory results in a simulation study. The method will be illustrated with a real example.

Session 12 Safety Signal Detection and Safety Analysis

Evaluation of Statistical Methods for the Identification of Potential Safety Signals
Maggie Chen1, Li Zhu1, Padmaja Chiruvolu, Liying Zhang, and Qi Jiang
Amgen Inc
magchen@amgen.com

With the increased regulatory requirements for risk evaluation and minimization strategies, large volumes of comprehensive safety data have been collected and maintained by pharmaceutical sponsors, and proactive evaluation of such safety data for continuous assessment of a product's safety profile has become essential during the drug development life-cycle. This presentation will introduce several key statistical methodologies developed for safety signal screening and detection, including some methods recommended by regulatory agencies for spontaneous reporting data, as well as a few recently developed methodologies for clinical trials data. In addition, extensive simulation results will be presented to compare the performance of these methods in terms of sensitivity and false discovery rate. Conclusions and recommendations will be briefed as well.

Application of a Bayesian Method for Blinded Safety Monitoring and Signal Detection in Clinical Trials
Shihua Wen, Jyotirmoy Dey, Greg Ball, and Karolyn Kracht
AbbVie Inc
shihuawen@abbvie.com

Monitoring patient safety is an indispensable component of clinical trial planning and conduct. Proactive blinded safety monitoring and signal detection in ongoing clinical trials enable pharmaceutical sponsors to monitor patient safety closely and at the same time maintain the study blind. Bayesian methods, by their nature of updating knowledge based on accumulating data, provide an excellent framework for carrying out such a safety monitoring process. This presentation will provide a step-by-step illustration of how several Bayesian models, such as the beta-binomial model, the Poisson-gamma model, and the posterior probability vs. predictive probability criterion, can be applied to safety monitoring for a particular adverse event of special interest (AESI) in a real clinical trial setting under various adverse event occurrence patterns.
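For the beta-binomial piece, the posterior after observing x events in n patients under a Beta(a, b) prior is Beta(a + x, b + n − x), so the posterior probability that the blinded pooled AESI rate exceeds a reference value is one line. All numbers below (the flat prior, the counts, and the 0.8 trigger) are invented for illustration, not values from the talk.

```python
from scipy import stats

def posterior_prob_exceeds(x, n, p0, a=1.0, b=1.0):
    """P(true AESI rate > p0 | x events in n patients), Beta(a, b) prior.

    Conjugacy gives a Beta(a + x, b + n - x) posterior; the survival
    function evaluates its upper tail beyond p0."""
    return stats.beta.sf(p0, a + x, b + n - x)

# hypothetical blinded pooled data: 12 AESI events in 100 patients,
# compared against a reference (background) rate of 8%
prob = posterior_prob_exceeds(x=12, n=100, p0=0.08)
flag = prob > 0.8  # illustrative trigger for escalating to unblinded review
```

As blinded data accumulate, `prob` is simply recomputed; the monotone updating is what makes this convenient for continuous monitoring.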

Some Thoughts on the Choice of Metrics for Safety Evaluation
Steven Snapinn
Amgen Inc
ssnapinn@amgen.com

The magnitude of the treatment effect on adverse events can be assessed on a relative scale, such as the hazard ratio or the relative risk, or on an absolute scale, such as the risk difference, but there doesn't appear to be any consistency regarding which metric should be used in any given situation. In this presentation I will provide some examples where different metrics have been used, discuss their advantages and disadvantages, and provide a suggested approach.
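The point is easy to make concrete with hypothetical rates: the same relative risk of 2 can correspond to a negligible or a substantial absolute risk difference, which is why the choice of metric can change the safety interpretation.

```python
def risk_metrics(r_trt, r_ctl):
    """Relative risk and risk difference for an adverse-event rate."""
    return {"relative_risk": r_trt / r_ctl,
            "risk_difference": r_trt - r_ctl}

# hypothetical adverse-event rates (treatment vs. control)
rare = risk_metrics(0.002, 0.001)   # rare event: RR = 2, RD = 0.001
common = risk_metrics(0.20, 0.10)   # common event: RR = 2, RD = 0.10
```

On the relative scale the two scenarios look identical; on the absolute scale the second implies one extra event per 10 patients versus one per 1,000.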


Hypothesis Testing on Safety Data: A Recurrent Event Approach
Qi Gong1 and Liang Fang2

1Amgen Inc
2Gilead Sciences
liangfang@gilead.com
As an important aspect of the clinical evaluation of an investigational therapy, safety data are routinely collected in clinical trials. To date, the analysis of safety data has largely been limited to descriptive summaries of incidence rates or contingency tables aiming to compare simple rates between treatment arms. Many have argued that this traditional approach fails to take into account important information, including the severity, onset time, and multiple occurrences of a safety signal. In addition, premature treatment discontinuation due to excessive toxicity causes informative censoring and may lead to potential bias in the interpretation of safety outcomes. In this article, we propose a framework to summarize safety data with the mean frequency function and to compare safety events of interest between treatments with a generalized log-rank test, taking into account the aforementioned characteristics ignored in traditional analysis approaches. In addition, a multivariate generalized log-rank test to compare the overall safety profiles of different treatments is proposed. In the proposed method, safety events are considered to follow a recurrent event process with a terminal event for each patient. The terminal event is modeled by a process of two types of competing risks: safety events of interest and other terminal events. Statistical properties of the proposed method are investigated via simulations. An application is presented with data from a phase II oncology trial.
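In its simplest nonparametric form, a mean frequency function reduces to a Nelson-Aalen-type mean cumulative function: the expected number of events per subject by time t, accumulated as events-at-each-time divided by subjects-at-risk. The sketch below assumes independent censoring and omits the paper's terminal-event modeling and generalized log-rank test.

```python
import numpy as np

def mean_cumulative_function(event_times, followup, t):
    """Nelson-Aalen-type estimate of the expected number of recurrent
    events per subject by time t.

    event_times: list of per-subject event-time lists
    followup: per-subject censoring (end-of-follow-up) times"""
    followup = np.asarray(followup, float)
    all_events = np.concatenate(
        [np.asarray(e, float) for e in event_times if len(e)])
    mcf = 0.0
    for u in np.unique(all_events):
        if u > t:
            break
        d = np.sum(all_events == u)          # events at time u, all subjects
        at_risk = np.sum(followup >= u)      # subjects still under observation
        mcf += d / at_risk
    return mcf

# toy data: four subjects with recurrent safety events and follow-up times
events = [[2, 5], [3], [], [1, 4, 6]]
fup = [7, 5, 4, 8]
```

Comparing two treatment arms' MCF curves is the descriptive counterpart of the generalized log-rank test the abstract proposes.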

Session 13 Survival and Recurrent Event Data Analysis

Survival Analysis without Survival Data
Gary Chan
University of Washington
kcgchan@uw.edu
We show that relative mean survival parameters of a semiparametric log-linear model can be estimated using covariate data from an incident sample and a prevalent sample, even when there is no prospective follow-up to collect any survival data. Estimation is based on an induced semiparametric density ratio model for the covariates from the two samples, and it shares the same structure as a logistic regression model for case-control data. Likelihood inference coincides with well-established methods for case-control data. We show two further related results. First, estimation of interaction parameters in a survival model can be performed using covariate information only from a prevalent sample, analogous to a case-only analysis. Furthermore, propensity score and conditional exposure effect parameters on survival can be estimated using only covariate data collected from the incident and prevalent samples.

Semiparametric Estimation for the Additive Hazards Model with Left-Truncated and Right-Censored Data
Chiung-Yu Huang1 and Jing Qin2
1Johns Hopkins University
2National Institute of Allergy and Infectious Diseases
cyhuang@jhmi.edu
Survival data from prevalent cases collected under a cross-sectional sampling scheme are subject to left-truncation. When fitting an additive hazards model to left-truncated data, the conditional estimating equation method (Lin and Ying, 1994), obtained by modifying the risk sets to account for left-truncation, can be very inefficient, as the marginal likelihood of the truncation times is not used in the estimation procedure. In this paper, we use a pairwise pseudo-likelihood to eliminate nuisance parameters from the marginal likelihood and, by combining the marginal pairwise pseudo-score function and the conditional estimating function, propose an efficient estimator for the additive hazards model. The proposed estimator is shown to be consistent and asymptotically normally distributed, with a sandwich-type covariance matrix that can be consistently estimated. Simulation studies show that the proposed estimator is more efficient than its competitors. A data analysis illustrates the method.

Nonparametric Method for Data of Recurrent Infections after Hematopoietic Cell Transplantation
Chi Hyun Lee1, Xianghua Luo1, Chiung-Yu Huang2, and Todd DeFor1
1University of Minnesota
2Johns Hopkins University
luox0054@umn.edu
Infection is one of the most common complications after hematopoietic cell transplantation; it accounts for substantial morbidity and mortality among transplanted patients. Many patients experience infectious complications repeatedly over time. Existing statistical methods for recurrent gap time data typically assume that patients are enrolled due to the occurrence of an event of the same type as the recurrent event, or assume that all gap times, including the first gap, are identically distributed. Applying these methods to post-transplant infection data while ignoring event types will inevitably lead to incorrect inferential results, because the time from transplant to the first infection has a different biological meaning than the gap times between recurrent infections after the first infection occurs. Alternatively, one may analyze only the data after the first infection to make the existing recurrent gap time methods applicable, but this introduces selection bias, because only patients who have experienced infections are included in the analysis. Other naive approaches include using univariate survival analysis methods, e.g., the Kaplan-Meier method, on the first-infection-only data, or using bivariate serial event data methods on the data up to the second infection; all subsequent infection data beyond the first or second infectious events are then not utilized in the analysis. These inefficient methods are expected to lead to decreased power. In this paper, we propose a nonparametric estimator of the joint distribution of the time from transplant to the first infection and the gap times between subsequent infections, as well as a semiparametric regression model for studying the risk factors of infectious complications in transplant patients. The proposed methods take into account the potentially different distributions of the two types of times (the time from transplant to the first infection and the gap times between subsequent recurrent infections) and fully utilize the recurrent infection data from patients. Asymptotic properties of the proposed estimators are established.
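As a reference point for the naive approach being critiqued, the Kaplan-Meier estimator applied to first-infection times only can be sketched as follows. The toy data are hypothetical; this is the inefficient analysis the authors improve on, not their proposed estimator.

```python
import numpy as np

def kaplan_meier(time, event, t):
    """Kaplan-Meier survival estimate at t; event = 1 observed, 0 censored."""
    time = np.asarray(time, float)
    event = np.asarray(event)
    s = 1.0
    for u in np.unique(time[event == 1]):
        if u > t:
            break
        d = np.sum((time == u) & (event == 1))  # events at time u
        n_risk = np.sum(time >= u)              # subjects still at risk
        s *= 1.0 - d / n_risk
    return s

# hypothetical times from transplant to FIRST infection only; later
# recurrences are simply discarded by this naive analysis
t_first = [30, 45, 45, 60, 90, 120]
observed = [1, 1, 0, 1, 0, 1]
```

Every infection after the first is thrown away here, which is exactly the information loss the proposed joint estimator avoids.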

Session 14 Statistical Analysis on Massive Data from Point Processes

Identification of Synaptic Learning Rule from Ensemble Spiking Activities
Dong Song and Theodore W Berger
University of Southern California
dsong@usc.edu


The brain represents and processes information with spikes. To understand the biological basis of brain functions, it is essential to model the spike train transformations performed by brain regions. Such a model can also be used as a computational basis for developing cortical prostheses that can restore lost cognitive function by bypassing the damaged brain regions. We formulate a three-stage strategy for such a modeling goal. First, we formulated a multiple-input, multiple-output (MIMO), physiologically plausible model for representing the nonlinear dynamics underlying spike train transformations. This model is equivalent to a cascade of a Volterra model and a generalized linear model. The model has been successfully applied to the hippocampal CA3-CA1 region during learned behaviors. Secondly, we extend the model to nonstationary cases using a point-process adaptive filter technique. The resulting time-varying model captures how the MIMO nonlinear dynamics evolve with time as the animal is learning. Lastly, we seek to identify the learning rule that explains how the nonstationarity is formed as a consequence of the input-output flow that the brain region has experienced during learning.

Intrinsically Weighted Means and Non-Ergodic Marked Point Processes
Alexander Malinowski1, Martin Schlather1, and Zhengjun Zhang2
1University of Mannheim
2University of Wisconsin
zjz@stat.wisc.edu
Whilst the definition of characteristics such as the mean mark in a marked point process (MPP) setup is non-ambiguous for ergodic processes, several definitions of mark averages are possible, and might be practically relevant, in the stationary but non-ergodic case. We give a general approach via weighted means, with possibly intrinsically given weights. We discuss estimators in this situation and show their consistency and asymptotic normality under certain conditions. We also suggest a specific choice of weights that has a minimal-variance interpretation under suitable assumptions.

Statistical Analysis for Unlabeled Data Objects
Ela Sienkiewicz and Haonan Wang
Colorado State University
sienkiew@stat.colostate.edu
This talk is motivated by a data set of brain neuron cells. Each neuron is modeled as an unlabeled data object with topological and geometric properties characterizing the branching structure, connectedness, and orientation of the neuron. This poses serious challenges, since traditional statistical methods for multivariate data rely on linear operations in Euclidean space. We develop two curve representations for each object and define the notion of percentiles based on measures of topological and geometric variations through multi-objective optimization. In general, numerical solutions can be provided by implementing a genetic algorithm. The proposed methodology is illustrated by analyzing a data set of pyramidal neurons.

Session 15 High Dimensional Inference (or Testing)

Adaptive Sparse Reduced-rank Regression
Zongming Ma and Tingni Sun
University of Pennsylvania
tingni@wharton.upenn.edu
This paper studies the problem of estimating a large coefficient matrix in a multiple response linear regression model when the coefficient matrix is both sparse and of low rank. We are especially interested in the high dimensional settings where the number of predictors and/or response variables can be much larger than the number of observations. We propose a new estimation scheme, which achieves competitive numerical performance while significantly reducing computation time when compared with state-of-the-art methods. Moreover, we show that the proposed estimator achieves near-optimal non-asymptotic minimax rates of estimation under a collection of squared Schatten norm losses simultaneously, by providing both error bounds for the estimator and minimax lower bounds. In particular, such optimality results hold in the high dimensional settings.

Variable Screening in Biothreat Detection Using Weighted Leverage Score
Wenxuan Zhong and Yiwen Liu
University of Georgia
yiwenliu@uga.edu
The early detection of a biothreat is extremely difficult because most of the early clinical signs in infected subjects are indistinguishable "flu-like" symptoms. Recent research shows that genomic markers are the most reliable indicators, and they have thus been widely used in detection methods over the past decades. In this talk, I will introduce a biomarker screening method based on the weighted leverage score. The weighted leverage score is a variant of the leverage score that has been widely used for the diagnostics of linear regression. Empirical studies demonstrate that the weighted leverage score is not only computationally efficient but also statistically effective in variable screening.
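The ordinary (unweighted) leverage score that the method builds on is the diagonal of the hat matrix H = X(XᵀX)⁻¹Xᵀ. A numerically stable sketch via the QR decomposition is below; the talk's specific weighting scheme is not given in the abstract and is not attempted here.

```python
import numpy as np

def leverage_scores(X):
    """Diagonal of the hat matrix H = X (X'X)^{-1} X'.

    Computed via the reduced QR decomposition: H = QQ', so the
    leverage of row i is the squared norm of row i of Q."""
    Q, _ = np.linalg.qr(X)
    return np.sum(Q ** 2, axis=1)

# hypothetical design matrix: 100 samples, 5 genomic markers
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 5))
h = leverage_scores(X)
```

Leverage scores lie in [0, 1] and sum to the column rank of X, so rows with unusually large values are natural screening candidates.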

Testing High-Dimensional Nonparametric Function with Application to Gene Set Analysis
Tao He, Ping-Shou Zhong, Yuehua Cui, and Vidyadhar Mandrekar
Michigan State University
pszhong@stt.msu.edu
This paper proposes a test statistic for testing a high-dimensional nonparametric function in a reproducing kernel Hilbert space generated by a positive definite kernel. We studied the asymptotic distribution of the test statistic under the null hypothesis and a series of local alternative hypotheses in a "large p, small n" setup. A simulation study was used to evaluate the finite-sample performance of the proposed method. We applied the proposed method to yeast data and thyroid hormone data to identify pathways that are associated with traits of interest.

Zero-Inflation in Clustered Binary Response Data: Mixed Model and Estimating Equation Approaches
Danping Liu
National Institutes of Health
danpingliu@nih.gov
The NEXT Generation Health Study investigates the dating violence of adolescents using a survey questionnaire. Each student is asked to affirm or deny multiple instances of violence in his or her dating relationship. There is, however, evidence suggesting that students not in a relationship also responded to the survey, resulting in excessive zeros in the responses. This paper proposes likelihood-based and estimating equation approaches to analyze the zero-inflated clustered binary response data. We adopt a mixed model method to account for the cluster effect, and the model parameters are estimated using a maximum-likelihood (ML) approach that requires a Gauss-Hermite quadrature (GHQ) approximation for implementation. Since an incorrect assumption on the random effects distribution may bias the results, we construct generalized estimating equations (GEE) that do not require the correct specification of the within-cluster correlation. In a series of simulation studies, we examine the performance of the ML and GEE methods in terms of their bias, efficiency, and robustness. We illustrate the importance of properly accounting for this zero-inflation by re-analyzing the NEXT data, where this issue has previously been ignored.


the performance of ML and GEE methods in terms of their biasefficiency and robustness We illustrate the importance of properlyaccounting for this zero-inflation by re-analyzing the NEXT datawhere this issue has previously been ignored

Session 16 Phase II Clinical Trial Design with Survival Endpoint

Utility-Based Optimization of Schedule-Dose Regimes Based on the Times to Response and Toxicity
Peter F Thall1, Hoang Q Nguyen1, Thomas Braun2, and Muzaffar Qazilbash1
1University of Texas MD Anderson Cancer Center
2University of Michigan
rex@mdanderson.org

A two-stage Bayesian phase I-II design for jointly optimizing the administration schedule and dose of an experimental agent, based on the times to response and toxicity, is described. Sequentially adaptive decisions are based on the joint utility of the two event times. A utility surface is constructed by partitioning the two-dimensional quadrant of event time pairs into rectangles, eliciting a numerical utility for each rectangle, and fitting a smooth parametric function to the elicited values. Event times are modeled using gamma distributions, with shape and scale parameters both functions of schedule and dose. In stage 1, patients are randomized fairly among schedules, and a dose is chosen within each schedule using an algorithm that hybridizes greedy optimization and randomization among nearly optimal doses. In stage 2, fair randomization among schedules is replaced by the hybrid algorithm. An extension to accommodate death or discontinuation of follow-up is described. The design is illustrated by an autologous stem cell transplantation trial in multiple myeloma.

Bayesian Decision Theoretic Two-Stage Design in Phase II Clinical Trials with Survival Endpoint
Lili Zhao and Jeremy Taylor
University of Michigan
zhaolili@umich.edu

In this study, we consider two-stage designs with failure-time endpoints in single-arm phase II trials. We propose designs in which stopping rules are constructed by comparing the Bayes risk of stopping at stage one to the expected Bayes risk of continuing to stage two, using both the observed data in stage one and the predicted survival data in stage two. Terminal decision rules are constructed by comparing the posterior expected loss of a rejection decision versus an acceptance decision. Simple threshold loss functions are applied to time-to-event data modelled either parametrically or non-parametrically, and the cost parameters in the loss structure are calibrated to obtain the desired Type I error and power. We ran simulation studies to evaluate design properties, including Type I and II errors, probability of early stopping, expected sample size, and expected trial duration, and compared them with the Simon two-stage designs and a design that extends Simon's designs to time-to-event endpoints. An example based on a recently conducted phase II sarcoma trial illustrates the method.

Single-Arm Phase II Group Sequential Trial Design with Survival Endpoint at a Fixed Time Point
Jianrong Wu and Xiaoping Xiong
St. Jude Children's Research Hospital
jianrong.wu@stjude.org

Three non-parametric test statistics are proposed to design single-arm phase II group sequential trials for monitoring survival probability. The small-sample properties of these test statistics are studied through simulations. Sample size formulas are derived for the fixed sample test. The Brownian motion property of the test statistics allowed us to develop a flexible group sequential design using a sequential conditional probability ratio test procedure.

Session 17 Statistical Modeling of High-throughput Genomics Data

Learning Genetic Architecture of Complex Traits Across Populations
Marc Coram, Sophie Candille and Hua Tang
Stanford University
hualtang@gmail.com

Genome-wide association studies (GWAS) have successfully revealed many loci that influence complex traits and disease susceptibilities. An unanswered question is "to what extent does the genetic architecture underlying a trait overlap between human populations?" We explore this question using blood lipid concentrations as a model trait. In African Americans and Hispanic Americans participating in the Women's Health Initiative SNP Health Association Resource, we validated one African-specific HDL locus, as well as 14 known lipid loci that have been previously implicated in studies of European populations. Moreover, we demonstrate striking similarities in the genetic architecture (loci influencing the trait, direction and magnitude of genetic effects, and proportions of phenotypic variation explained) of lipid traits across populations. In particular, we found that a disproportionate fraction of lipid variation in African Americans and Hispanic Americans can be attributed to genomic loci exhibiting statistical evidence of association in Europeans, even though the precise genes and variants remain unknown. At the same time, we found substantial allelic heterogeneity within shared loci, characterized both by population-specific rare variants and variants shared among multiple populations that occur at disparate frequencies. The allelic heterogeneity emphasizes the importance of including diverse populations in future genetic association studies of complex traits such as lipids; furthermore, the overlap in lipid loci across populations of diverse ancestral origin argues that additional knowledge can be gleaned from multiple populations. We discuss how the overlapping genetic architecture can be exploited to improve the efficiency of GWAS in minority populations.

A Bayesian Hierarchical Model to Detect Differentially Methylated Loci from Single Nucleotide Resolution Sequencing Data
Hao Feng, Karen Conneely and Hao Wu
Emory University
haowu@emory.edu

DNA methylation is an important epigenetic modification that has essential roles in cellular processes, including gene regulation, development and disease, and is widely dysregulated in most types of cancer. Recent advances in sequencing technology have enabled the measurement of DNA methylation at single nucleotide resolution through methods such as whole-genome bisulfite sequencing and reduced representation bisulfite sequencing. In DNA methylation studies, a key task is to identify differences under distinct biological contexts, for example between tumor and normal tissue. A challenge in sequencing studies is that the number of biological replicates is often limited by the costs of sequencing. The small number

2014 ICSA-KISS Applied Statistics Symposium Portland June 15-18 47

Abstracts

of replicates leads to unstable variance estimation, which can reduce accuracy to detect differentially methylated loci (DML). Here we propose a novel statistical method to detect DML when comparing two treatment groups. The sequencing counts are described by a lognormal-beta-binomial hierarchical model, which provides a basis for information sharing across different CpG sites. A Wald test is developed for hypothesis testing at each CpG site. Simulation results show that the proposed method yields improved DML detection compared to existing methods, particularly when the number of replicates is low. The proposed method is implemented in the Bioconductor package DSS.
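The per-site Wald test can be sketched in a few lines. This toy version pools counts within each group and uses a plain binomial variance; the DSS method described above additionally shrinks dispersion estimates across CpG sites via the lognormal-beta-binomial hierarchy, so the function and variable names here are ours, not the package's:

```python
import numpy as np

def wald_test_cpg(meth1, total1, meth2, total2):
    """Naive per-CpG Wald test comparing methylation proportions between
    two groups (illustrative only; DSS also shrinks dispersion across sites)."""
    p1 = meth1.sum() / total1.sum()
    p2 = meth2.sum() / total2.sum()
    # binomial variance of each group's pooled proportion
    v1 = p1 * (1 - p1) / total1.sum()
    v2 = p2 * (1 - p2) / total2.sum()
    return (p1 - p2) / np.sqrt(v1 + v2)

rng = np.random.default_rng(0)
n1 = rng.integers(20, 60, size=3)   # read depths, 3 tumor replicates
n2 = rng.integers(20, 60, size=3)   # read depths, 3 normal replicates
x1 = rng.binomial(n1, 0.8)          # tumor: 80% methylated
x2 = rng.binomial(n2, 0.3)          # normal: 30% methylated
print(wald_test_cpg(x1, n1, x2, n2))
```

With identical groups the statistic is exactly zero; with a true difference of 0.5 at these depths it is far into the rejection region.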

Differential Isoform Expression Analysis in RNA-Seq using Random-Effects Meta-Regression
Weihua Guan1, Rui Xiao2, Chun Li3 and Mingyao Li2
1University of Minnesota
2University of Pennsylvania
3Vanderbilt University
rxiao@mail.med.upenn.edu

A major application of RNA-Seq is to detect differential isoform expression across experimental conditions. However, this is challenging because of uncertainty in isoform expression estimation owing to ambiguous reads, and variability in the precision of the estimates across samples. It is desirable to have a method that can account for these issues and also allows adjustment of covariates. In this paper, we present a random-effects meta-regression approach that naturally fits this purpose. Through extensive simulations and analysis of an RNA-Seq dataset on human heart failure, we show that this approach is computationally fast, reliable, and can improve the power of differential expression analysis while controlling for false positives due to the effect of covariates or confounding variables.

Allele-Specific Differential Methylation Analysis with Next Generation Methylation Sequencing Data
Fei Zou
University of North Carolina at Chapel Hill
feizou@email.unc.edu

Next generation Methyl-seq data collected from F1 reciprocal crosses in mouse can powerfully dissect strain and parent-of-origin effects on allele-specific methylation. In this talk, we present a novel statistical approach to analyze Methyl-seq data, motivated by an F1 mouse study. Our method jointly models the strain and parent-of-origin effects, deals with the over-dispersion problem commonly observed in read counts, and can flexibly adjust for the effects of covariates such as sex and read depth. We also propose a genomic control procedure to properly control type I error for Methyl-seq studies where the number of samples is small.

Session 18 Statistical Applications in Finance

A Stochastic Mixture Model for Economic Cycles
Haipeng Xing1 and Ning Sun2

1State University of New York
2IBM
xing@ams.sunysb.edu

Markov switching models have been used in various applications in economics and finance. As existing Markov switching models describe the regimes or parameter values in a categorical way, they are restrictive in practical analysis. In this paper, we introduce a mixture model with stochastic regimes, in which the regimes and

model parameters are represented both categorically and continuously. Assuming conjugate priors, we develop closed-form recursive Bayes estimates of the regression parameters, an approximation scheme that has much lower computational complexity yet is comparable to the Bayes estimates in statistical efficiency, and an expectation-maximization procedure to estimate the unknown hyper-parameters. We conduct intensive simulation studies to evaluate the performance of the Bayes estimates of time-varying parameters and their approximations. We further apply the proposed model to analyze the series of US monthly total non-farm employees.

Statistical Modelling of Bidding Prices in Online Ad Position Auctions
Xiaoming Huo
Georgia Institute of Technology
xiaoming@isye.gatech.edu

Ad position auctions are being held all the time in nearly all web search engines and have become the major source of revenue in online advertising. We study statistical models of the bidding prices. Two approaches are explored: (1) a game theoretic approach that characterizes bidders' behavior, and (2) a statistical generative approach, which aims at mimicking the fundamental mechanism underlying the bidding process. We compare and contrast these two approaches and describe how the auctioneer can take advantage of the obtained knowledge.

Regression with Rank Covariates: Distribution-Guided Scores for Ranks
Do Hwan Park1, Yuneung Kim2, Johan Lim3, Sujung Choi4 and Hsun-Chih Kuo5

1University of Maryland
2Seoul National University
3Auburn University
4Ulsan National Institute of Science and Technology
5National Chengchi University
johanlim@snu.ac.kr

This work is motivated by a hand-collected data set from one of the largest internet portals in Korea. The data set records the top 30 most frequently discussed stocks on its online stock message board, which can be considered a measure of investors' attention to individual stocks. The empirical goal is to investigate the attention's effect on trading behavior. To do so, we consider a regression model whose response is either stock return performance or trading volume, and whose covariates are the daily-observed partial ranks as well as other covariates influential to the response. In estimating the regression model, the rank covariate is often treated as an ordinal categorical variable or simply transformed into a score variable (mostly using the identity score function). In this paper, we start our discussion with the univariate regression problem, where we establish the asymptotic normality of the regression coefficient estimator, whose mean is 0 and variance is an unknown function of the distribution of X. We then extend the results of univariate regression to multiple regression and obtain a similar asymptotic distribution. We finally consider an estimator for multiple sets by extending or combining the estimators of each single set. We apply our proposed distribution-guided scoring function to the motivating data set to empirically demonstrate the attention effect.

Optimal Sparse Volatility Matrix Estimation for High Dimensional Ito Processes with Measurement Errors
Minjing Tao1, Yazhen Wang2 and Harrison Zhou3

1Florida State University
2University of Wisconsin-Madison
3Yale University
tao@stat.fsu.edu

Stochastic processes are often used to model complex scientific problems in fields ranging from biology and finance to engineering and physical science. This talk investigates rate-optimal estimation of the volatility matrix of a high dimensional Ito process observed with measurement errors at discrete time points. The minimax rate of convergence is established for estimating sparse volatility matrices. By combining the multi-scale and threshold approaches, we construct a volatility matrix estimator that achieves the optimal convergence rate. The minimax lower bound is derived by considering a subclass of Ito processes for which the minimax lower bound is obtained through a novel equivalent model of covariance matrix estimation for independent but non-identically distributed observations, and through a delicate construction of the least favorable parameters. In addition, a simulation study was conducted to test the finite sample performance of the optimal estimator, and the simulation results were found to support the established asymptotic theory.
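Hard thresholding of a sample covariance matrix illustrates the sparsity step of such estimators; this sketch omits the multi-scale correction for measurement error, and all names in it are ours:

```python
import numpy as np

def hard_threshold_cov(X, tau):
    """Sparse covariance estimate: zero every off-diagonal entry of the
    sample covariance whose magnitude falls below the threshold tau,
    keeping the diagonal (variances) intact."""
    S = np.cov(X, rowvar=False)
    T = np.where(np.abs(S) >= tau, S, 0.0)
    np.fill_diagonal(T, np.diag(S))
    return T

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 5))   # 200 observations of 5 independent series
T = hard_threshold_cov(X, tau=0.2)
print(np.count_nonzero(T) - 5)      # surviving off-diagonal entries
```

With independent series, most spurious off-diagonal entries fall below the threshold and are zeroed, while the variances survive unchanged.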

Session 19 Hypothesis Testing

A Score-type Test for Heterogeneity in Zero-inflated Models in a Stratified Population
Guanqun Cao1, Wei-Wen Hsu2 and David Todem3

1Auburn University
2Kansas State University
3Michigan State University
gzc0009@auburn.edu

We propose a score-type statistic to evaluate heterogeneity in zero-inflated models for count data in a stratified population, where heterogeneity is defined as instances in which the zero counts are generated from two sources. In this work, we extend the literature by describing a score-type test to evaluate homogeneity against general alternatives that do not neglect the stratification information under the alternative hypothesis. Our numerical simulation studies show that the proposed test can greatly improve efficiency over tests of heterogeneity that ignore the stratification information. An empirical application to dental caries data in early childhood further shows the importance and practical utility of the methodology in using the stratification profile to detect heterogeneity in the population.

Inferences on Correlation Coefficients of Bivariate Log-normal Distributions
Guoyi Zhang1 and Zhongxue Chen2

1University of New Mexico
2Indiana University
gzhang123@gmail.com

This research considers inference on the correlation coefficients of bivariate log-normal distributions. We developed a generalized confidence interval and hypothesis tests for the correlation coefficient, and extended the results to comparing two independent correlations. Simulation studies show that the suggested methods work well even for small samples. The methods are illustrated using two practical examples.
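The population quantity under study has a closed form: if (X, Y) is bivariate normal with correlation rho and standard deviations s1, s2, then corr(exp(X), exp(Y)) = (exp(rho*s1*s2) - 1) / sqrt((exp(s1^2) - 1)(exp(s2^2) - 1)). A quick numerical check of the attenuation this induces (a sketch of the standard formula, not the authors' code):

```python
import math

def lognormal_corr(rho, s1, s2):
    """Correlation of (exp(X), exp(Y)) when (X, Y) is bivariate normal
    with correlation rho and standard deviations s1, s2."""
    num = math.exp(rho * s1 * s2) - 1.0
    den = math.sqrt((math.exp(s1 ** 2) - 1.0) * (math.exp(s2 ** 2) - 1.0))
    return num / den

# the log-normal correlation is attenuated toward zero relative to rho
print(round(lognormal_corr(0.9, 1.0, 1.0), 3))  # → 0.849
```

Note that the attainable range of the log-normal correlation is strictly narrower than [-1, 1], which is one reason inference on it requires care.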

Testing Calibration of Risk Models at Extremes of Disease Risk
Minsun Song1, Peter Kraft2, Amit D. Joshi2, Myrto Barrdahl3 and Nilanjan Chatterjee1
1National Cancer Institute
2Harvard University
3German Cancer Research Center
songm4@mail.nih.gov

Risk-prediction models need careful calibration to ensure they produce unbiased estimates of risk for subjects in the underlying population given their risk-factor profiles. As subjects with extreme high or low risk may be the most affected by knowledge of their risk estimates, checking adequacy of risk models at the extremes of risk is very important for clinical applications. We propose a new approach to test model calibration targeted toward the extremes of the disease risk distribution, where standard goodness-of-fit tests may lack power due to sparseness of data. We construct a test statistic based on model residuals summed over only those individuals who pass high and/or low risk-thresholds, and then maximize the test statistic over different risk-thresholds. We derive an asymptotic distribution for the max-test statistic based on analytic derivation of the variance-covariance function of the underlying Gaussian process. The method is applied to a large case-control study of breast cancer to examine joint effects of common SNPs discovered through recent genome-wide association studies. The analysis clearly indicates a non-additive effect of the SNPs on the scale of absolute risk, but an excellent fit for the linear-logistic model even at the extremes of risk.

Statistical Issues When Incidence Rates Are Extremely Low and Sample Sizes Are Very Big
Peter Hu and Haijun Ma
Amgen Inc.
phu@amgen.com

It is well known that sample sizes of clinical trials are often not big enough to assess adverse events (AE) with very low incidence rates. Large scale observational studies, such as pharmacovigilance studies using healthcare databases, provide an alternative resource for assessment of very rare adverse events. Healthcare databases often can easily provide tens of thousands of exposed patients, which potentially allows the assessment of events as rare as on the order of 10^-4 or lower.
In this talk, we discuss the performance of various commonly used statistical methods for comparison of binomial proportions of very rare events. The statistical power, type I error control, confidence interval (CI) coverage, length of confidence interval, bias and variability of treatment effect estimates, as well as the distribution of the CI upper bound, etc., will be examined and compared for the different methods. Power calculation is often necessary for study planning purposes. However, many commonly used power calculation methods are based on approximation and may give erroneous estimates of power when events are rare. We will compare the power estimates for different methods provided by SAS Proc Power and empirically obtained via simulation. The use of relative risks (RR) and risk differences (RD) will also be commented on. Based on these results, several recommendations are given to guide sample size assessments for such types of studies at the design stage.
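Empirical power for rare-event comparisons is straightforward to obtain by simulation. A minimal sketch using a pooled two-sample proportion z-test (the talk compares several methods and SAS Proc Power; this toy version and its names are ours):

```python
import numpy as np

def empirical_power(n, p0, p1, alpha=0.05, nsim=2000, seed=2):
    """Monte Carlo power of a pooled two-sample proportion z-test for
    rare events; approximation-based power formulas can mislead here."""
    rng = np.random.default_rng(seed)
    x0 = rng.binomial(n, p0, size=nsim)   # control-arm event counts
    x1 = rng.binomial(n, p1, size=nsim)   # exposed-arm event counts
    phat = (x0 + x1) / (2 * n)            # pooled event rate per simulated study
    se = np.sqrt(phat * (1 - phat) * 2 / n)
    with np.errstate(divide="ignore", invalid="ignore"):
        z = (x1 - x0) / (n * se)
    z = np.nan_to_num(z)                  # studies with zero events in both arms
    return np.mean(np.abs(z) > 1.959964)

print(empirical_power(20000, 1e-4, 5e-4))
```

With 20,000 patients per arm and rates of 1 and 5 per 10,000, the expected event counts are only 2 versus 10, so the empirical power is well below 1 and discreteness visibly distorts the test's behavior.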

Minimum Distance Regression Model Checking When Responses are Missing at Random
Xiaoyu Li
Auburn University
xzl0037@auburn.edu

This paper proposes a class of lack-of-fit tests for fitting a parametric regression model when response variables are missing at random. These tests are based on a class of minimum integrated square distances between a kernel type estimator of a regression function and the parametric regression function being fitted. These tests are shown to be consistent against a large class of fixed alternatives. The corresponding test statistics are shown to have asymptotic normal distributions under the null hypothesis. Some simulation results are also presented.

Session 20 Design and Analysis of Clinical Trials

Application of Bayesian Approach in Assessing Rare Adverse Events during a Clinical Study
Grace Li, Karen Price, Haoda Fu and David Manner
Eli Lilly and Company
li_ying_grace@lilly.com

Bayesian analysis is gaining wider application in decision making throughout the drug development process due to its more intuitive framework and ability to provide direct probabilistic answers to complex problems. Determining the risk profile for a compound throughout phases of drug development is crucial, along with ensuring the most appropriate analyses are performed. In a conventional 2-arm parallel study design, rare adverse events are often assessed via frequentist approaches such as a Fisher's exact test, with its known limitations. This presentation will focus on the challenges of the frequentist approach to detect and evaluate potential safety signals in the rare event setting, and compare it with the proposed Bayesian approach. We will compare the operating characteristics between the frequentist and the Bayesian approaches using simulated data. Most importantly, the proposed approach offers much more flexibility and a more direct probabilistic interpretation that improves the process of detecting rare safety signals. This approach highlights the strength of Bayesian methods for inference. The simulation results are intended to demonstrate the value of using Bayesian methods, and that appropriate application has the potential to increase efficiency of decision making in drug development.

A Simplified Varying-Stage Adaptive Phase II/III Clinical Trial Design
Gaohong Dong
Novartis Pharmaceuticals Corporation
gaohong.dong@novartis.com

Conventionally, adaptive phase II/III clinical trials are carried out with a strict two-stage design. Recently, Dong (Statistics in Medicine 2014; 33(8):1272-87) proposed a varying-stage adaptive phase II/III clinical trial design. In this design, following the first stage, an intermediate stage can be adaptively added to obtain more data, so that a more informative decision can be made regarding whether the trial can be advanced to the final confirmatory stage. Therefore, the number of further investigational stages is determined based upon data accumulated to the interim analysis. Later, Dong (2013 ICSA Symposium Book, to be published) investigated some characteristics of this design. This design considers two plausible study endpoints, with one of them initially designated as the primary endpoint; based on interim results, the other endpoint can be switched to be the primary endpoint. However, in many therapeutic areas the primary study endpoint is well established; therefore, we simplify this design to consider one study endpoint only. Our simulations show that, as with the original design, this simplified design controls the Type I error rate very well; the sample size increases as the threshold probability for the two-stage setting increases; and the alpha allocation ratio in the two-stage setting vs. the three-stage setting has a great impact on the design. However, this simplified design requires a larger sample size for the initial stage to overcome the power loss due to futility stopping. Compared to a strict two-stage Phase II/III design, this simplified design improves the probability of trial success.

Improving Multiple Comparison Procedures with Coprimary Endpoints by Generalized Simes Tests
Hua Li1, Willi Maurer1, Werner Brannath2 and Frank Bretz1
1Novartis Pharmaceuticals Corporation
2University of Bremen
jennifer.li@novartis.com

For a fixed-dose combination of indacaterol acetate (long-acting β2-agonist) and mometasone furoate (inhaled corticosteroid) for the once daily maintenance treatment of asthma and Chronic Obstructive Pulmonary Disease (COPD), both lung function improvement and one symptom outcome improvement are required for the drug to be developed successfully. The symptom outcome could be Asthma Control Questionnaire (ACQ) improvement for the asthma program and exacerbation rate reduction for the COPD program. Having two endpoints increases the probability of false positive results by chance alone, i.e., marketing a drug which is not or insufficiently effective. Therefore, regulatory agencies require strict control of this probability at a pre-specified significance level (usually 2.5% 1-sided). The Simes test is often used in our clinical trials. However, the Simes test requires the assumption that the test statistics are positively correlated. This assumption is not always satisfied or cannot be easily verified when dealing with multiple endpoints. In this presentation, an extension of the Simes test, the generalized Simes test introduced by Maurer, Glimm and Bretz (2011), which is applicable to any correlation (positive, negative, or even no correlation), is utilized. Power benefits based on simulations are presented. FDA and other agencies have accepted this approach, indicating that the proposed method can be used in other trials in the future.
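For reference, the classical Simes global test rejects at level alpha when any ordered p-value p_(i) satisfies p_(i) <= i*alpha/m; the generalized version discussed above modifies the critical constants so that the positive-dependence requirement can be dropped. A sketch of the classical test only (our own illustrative code):

```python
def simes_test(pvals, alpha=0.025):
    """Classical Simes global test: reject H0 if some ordered p-value
    p_(i) satisfies p_(i) <= i * alpha / m.  (The generalized Simes test
    of Maurer, Glimm and Bretz relaxes the positive-dependence
    assumption; this sketch shows only the classical version.)"""
    p = sorted(pvals)
    m = len(p)
    return any(p[i] <= (i + 1) * alpha / m for i in range(m))

print(simes_test([0.03, 0.02]))  # 0.02 > 0.0125 and 0.03 > 0.025 → False
print(simes_test([0.01, 0.04]))  # 0.01 <= 0.0125 → True
```

The second call shows the key behavior: one sufficiently small p-value can reject the global null even though the other p-value alone could not.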

Efficient Design for Cluster Randomized Trials with Binary Outcomes
Sheng Wu, Weng Kee Wong and Catherine Crespi
University of California at Los Angeles
shengwu@ucla.edu

Cluster randomized trials (CRTs) are increasingly used for research in many fields, including public health, education, social studies and ethnic disparity studies. Equal allocation designs are often used in CRTs, but they may not be optimal, especially when cost consideration is taken into account. In this paper, we consider two-arm cluster randomized trials with a binary outcome and develop various optimal designs when sampling costs for units and clusters are different and the primary outcome is attributable risk or relative risk. We consider both frequentist and Bayesian approaches in the context of cancer control and prevention cluster randomized trials, and present formulae for optimal sample sizes for the two arms for each outcome measure.

Zero Event and Continuity Correction in Meta-Analyses of Rare Events Using Mantel-Haenszel Odds Ratio and Risk Difference
Tianyue Zhou
Sanofi-aventis US LLC
tianyue.zhou@sanofi.com

Meta-analysis of side effects has been widely used to combine data with low event rates across comparative clinical studies for evaluating a drug's safety profile. When dealing with rare events, a substantial proportion of studies may not have any events of interest. In common practice, meta-analyses on a relative scale (relative risk [RR] or odds ratio [OR]) remove zero-event studies, while meta-analyses using risk difference [RD] as the effect measure include them. As continuity corrections are often used when zero events occur in either arm of a study, the impact of zero events and continuity correction on estimates of the Mantel-Haenszel (M-H) OR and RD was examined through simulation. Two types of continuity correction, the treatment arm continuity correction and the constant continuity correction, are applied in the meta-analysis for variance calculation. For the M-H OR, it is unnecessary to include zero-event trials, and the 95% confidence interval [CI] of the estimate without continuity corrections provided the best coverage. For the M-H RD, including zero-event trials reduced bias, and using certain continuity corrections ensured at least 95% coverage of the 95% CI. This paper examined the influence of zero events and continuity correction on estimates of the M-H OR and RD, in order to help practitioners decide whether to include zero-event trials and use continuity corrections for a specific problem.
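The point about zero-event studies can be seen directly from the M-H estimator: a study with zero events in both arms contributes zero to both the numerator and denominator sums, so it is inert for the OR unless a continuity correction is applied. A sketch (our own illustrative code, not the paper's):

```python
def mh_odds_ratio(studies, cc=0.0):
    """Mantel-Haenszel pooled odds ratio over 2x2 tables given as
    (a, b, c, d) = (trt events, trt non-events, ctl events, ctl non-events).
    cc is an optional constant continuity correction added to every cell
    of tables with a zero event count in either arm."""
    num = den = 0.0
    for a, b, c, d in studies:
        if cc and 0 in (a, c):
            a, b, c, d = a + cc, b + cc, c + cc, d + cc
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    return num / den

tables = [(2, 998, 1, 999), (0, 1000, 0, 1000)]  # second study: no events
print(round(mh_odds_ratio(tables), 3))           # → 2.002
```

Dropping the double-zero study leaves the point estimate unchanged, whereas setting `cc=0.5` pulls it toward 1, which is the behavior examined in the simulations above.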

Session 21 New Methods for Big Data

Sure Independence Screening for Gaussian Graphical Models
Shikai Luo1, Daniela Witten2 and Rui Song1

1North Carolina State University
2University of Washington
rsong@ncsu.edu

In high-dimensional genomic studies, it is of interest to understand the regulatory network underlying tens of thousands of genes based on hundreds, or at most thousands, of observations for which gene expression data are available. Because graphical models can identify how variables such as the coexpression of genes are related, they are frequently used to study genetic networks. Although various efficient algorithms have been proposed, statisticians still face huge computational challenges when the number of variables is in the tens of thousands or higher. Motivated by the fact that the columns of the precision matrix can be obtained by solving p regression problems, each of which involves regressing one feature onto the remaining p - 1 features, we consider covariance screening for Gaussian graphical models. The proposed methods and algorithms possess theoretical properties, such as sure screening properties, and satisfactory empirical behavior.
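The screening idea can be sketched with marginal correlations: retain only feature pairs whose absolute sample correlation clears a threshold, then fit the graphical model on the survivors. This is illustrative code in the spirit of the approach, not the authors' implementation:

```python
import numpy as np

def corr_screen(X, threshold):
    """Sure-independence-style screening for a Gaussian graphical model:
    retain candidate edge (j, k) when |corr(X_j, X_k)| >= threshold.
    A refined fit (e.g. nodewise lasso) would then run on this small set."""
    R = np.corrcoef(X, rowvar=False)
    p = R.shape[0]
    return [(j, k) for j in range(p) for k in range(j + 1, p)
            if abs(R[j, k]) >= threshold]

rng = np.random.default_rng(3)
z = rng.standard_normal(500)
X = np.column_stack([z + 0.1 * rng.standard_normal(500),   # features 0 and 1
                     z + 0.1 * rng.standard_normal(500),   # share a driver z
                     rng.standard_normal(500)])            # feature 2: noise
print(corr_screen(X, 0.5))  # → [(0, 1)]
```

Only the genuinely coupled pair survives, so the expensive graphical-model fit runs on a drastically reduced edge set.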

Case-Specific Random Forests
Ruo Xu1, Dan Nettleton2 and Daniel J. Nordman2

1Google
2Iowa State University
dnett@iastate.edu

Random forest (RF) methodology is a nonparametric methodology for prediction problems. A standard way to utilize RFs includes generating a global RF in order to predict all test cases of interest. In this talk, we propose growing different RFs specific to different test cases, namely case-specific random forests (CSRFs). In contrast to the bagging procedure used in the building of standard RFs, the CSRF algorithm takes weighted bootstrap resamples to create individual trees, where we assign large weights a priori to the training cases in close proximity to the test case of interest. Tuning methods are discussed to avoid overfitting issues. Both simulation and real data examples show that CSRFs often outperform standard RFs in prediction. We also propose the idea of case-specific variable importance (CSVI) as a way to compare the relative predictor variable importance for predicting a particular case. It is possible that the idea of building a predictor case-specifically can be generalized to other areas.
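The weighted-bootstrap step can be sketched as follows; the Gaussian-kernel proximity weight is our illustrative choice, not necessarily the proximity measure used by the authors:

```python
import numpy as np

def case_specific_resample(X_train, x_test, n_samples, bandwidth=1.0, rng=None):
    """Weighted bootstrap for a case-specific random forest: training
    cases close to the test case receive larger resampling weights.
    The Gaussian kernel on Euclidean distance is an illustrative choice."""
    rng = rng or np.random.default_rng()
    d2 = np.sum((X_train - x_test) ** 2, axis=1)   # squared distances to test case
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))
    w /= w.sum()
    # each tree of the CSRF would be grown on one such weighted resample
    return rng.choice(len(X_train), size=n_samples, replace=True, p=w)

rng = np.random.default_rng(4)
X = rng.standard_normal((100, 2))
idx = case_specific_resample(X, X[0], n_samples=100, rng=rng)
```

Distant training cases receive near-zero weight, so each tree concentrates on the neighborhood of the test case; the bandwidth plays the role of the tuning parameter the abstract mentions for avoiding overfitting.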

Uncertainty Quantification for Massive Data Problems using Generalized Fiducial Inference
Randy C. S. Lai1, Jan Hannig2 and Thomas C. M. Lee1
1University of California at Davis
2University of North Carolina at Chapel Hill

tcmlee@ucdavis.edu

In this talk, we present a novel parallel method for computing parameter estimates and their standard errors for massive data problems. The method is based on generalized fiducial inference.

OEM Algorithm for Big Data
Xiao Nie and Peter Z. G. Qian
University of Wisconsin-Madison
xiaonie@stat.wisc.edu

Big data with large sample size arise in Internet marketing, engineering and many other fields. We propose an algorithm called OEM (a.k.a. orthogonalizing EM) for analyzing big data. This algorithm employs a procedure named active orthogonalization to expand an arbitrary matrix to an orthogonal matrix. This procedure yields closed-form solutions to ordinary and various penalized least squares problems. The maximum number of points needed to be added is bounded by the number of columns of the original matrix, which is appealing for large n problems. Attractive theoretical properties of OEM include (1) convergence to the Moore-Penrose generalized inverse estimator for a singular regression matrix, and (2) convergence to a point having grouping coherence for a fully aliased regression matrix. We also extend this algorithm to logistic regression. The effectiveness of OEM for least squares and logistic regression problems will be illustrated through examples.

Session 22 New Statistical Methods for Analysis of High Dimensional Genomic Data

Integrative Modeling of Multi-Platform Genomic Data under the Framework of Mediation Analysis
Yen-Tsung Huang
Brown University
yen-tsung_huang@brown.edu

Given the availability of genomic data, there have been emerging interests in integrating multi-platform data. Here we propose to model epigenetic DNA methylation, micro-RNA expression and gene expression data as a biological process to delineate phenotypic traits, under the framework of causal mediation modeling. We propose a regression model for the joint effect of methylation, micro-RNA expression, gene expression and their non-linear interactions on the outcome, and study three path-specific effects: the direct effect of methylation on the outcome, the effect mediated through expression, and the effect through micro-RNA expression. We characterize correspondences between the three path-specific effects and coefficients in the regression model, which are influenced by causal relations among methylation, micro-RNA and gene expression. A score test for variance components of regression coefficients is developed to assess path-specific effects. The test statistic under the null follows a mixture of chi-square distributions, which can be approximated using a characteristic function inversion method or a perturbation procedure. We construct tests for candidate models determined by different combinations of methylation, micro-RNA, gene expression and their interactions, and further propose an omnibus test to accommodate different models. The utility of the method will be illustrated in numerical simulation studies and glioblastoma data from The Cancer Genome Atlas (TCGA).

Estimation of High Dimensional Directed Acyclic Graphs using eQTL Data
Wei Sun1 and Min Jin Ha2

1University of North Carolina at Chapel Hill
2University of Texas MD Anderson Cancer Center
weisun@email.unc.edu

Observational data can be used to estimate the skeleton of a directed acyclic graph (DAG) and the directions of a limited number of edges. With sufficient interventional data, one can identify the directions of all the edges of a DAG. However, such interventional data are often not available, especially for high dimensional problems. We develop a statistical method to estimate a DAG using surrogate interventional data, where the interventions are applied to a set of external variables, and thus such interventions are considered to be surrogate interventions on the variables of interest. Our work is motivated by expression quantitative trait locus (eQTL) studies, where the variables of interest are the expression of genes, the external variables are DNA variations, and interventions are applied to DNA variants during the process whereby a randomly selected DNA allele is passed to a child from either parent. Our method, named sirDAG (surrogate intervention recovery of DAG), first constructs the DAG skeleton using a combination of penalized regression and the PC algorithm, and then estimates the posterior probabilities of all the edge directions after incorporating DNA variant data. We demonstrate the advantages of sirDAG by simulations and an application in an eQTL study of >18,000 genes in 550 breast cancer patients.

Prioritizing Disease Genes from Genome-wide Association Studies Through Dynamic Co-expression Networks
Lin Hou1, Min Chen2, Clarence Zhang3, Judy Cho4 and Hongyu Zhao1

1Yale University
2University of Texas at Dallas
3Bristol-Myers Squibb
4Mount Sinai Medical Center
hongyu.zhao@yale.edu

Although Genome Wide Association Studies (GWAS) have identified many susceptibility loci for common diseases, they only explain a small portion of heritability. It is challenging to identify the remaining disease loci because their association signals are likely weak and difficult to identify among millions of candidates. One potentially useful direction to increase statistical power is to incorporate functional genomics information, especially gene expression networks, to prioritize GWAS signals. Most current methods utilizing network information to prioritize disease genes are based on the "guilt by association" principle, in which networks are treated as static and disease-associated genes are assumed to be located closer to each other than random pairs in the network. In contrast, we propose a novel "guilt by rewiring" principle. Studying the dynamics of gene networks between controls and patients, this principle assumes that disease genes are more likely to undergo rewiring in patients, whereas most of the network remains unaffected in the disease condition. To demonstrate this principle, we consider the changes of co-expression networks in Crohn's disease patients and controls, and how network dynamics reveals information on disease associations. Our results demonstrate that network rewiring is abundant in the immune system, and disease-associated genes are more likely to be rewired in patients. To integrate this network rewiring feature and GWAS signals, we propose to use the Markov random field framework to integrate network information to prioritize genes. Applications to Crohn's disease and Parkinson's disease show that this framework leads to more replicable results and implicates potentially disease-associated pathways.

Kernel Machine Methods for Joint Testing and Integrative Analysis of Genome-Wide Methylation and Genotyping Studies
Ni Zhao and Michael Wu
Fred Hutchinson Cancer Research Center
nzhao@fhcrc.org
Comprehensive understanding of complex trait etiology requires examination of multiple sources of genomic variability. Integrative analysis of these data sources promises elucidation of the biological processes underlying particular phenotypes. Consequently, many large GWAS consortia are expanding to simultaneously examine the joint role of DNA methylation. Two practical challenges have arisen for researchers interested in joint analysis of GWAS and methylation studies of the same subjects. First, it is unclear how to leverage both data types to determine whether particular genetic regions are related to traits of interest. Second, it is of considerable interest to understand the relative roles of different sources of genomic variability in complex trait etiology, e.g., whether epigenetics mediates genetic effects. Therefore, we propose to use the powerful kernel machine framework, first for testing the cumulative effect of both epigenetic and genetic variability on a trait, and then for mediation analysis to understand the mechanisms by which the genomic data types influence the trait. In particular, we develop an approach that works at the gene/region level (to allow a common unit of analysis across data types). We then compare pairwise similarity in trait values between individuals to pairwise similarity in methylation and genotype values for a particular gene, with correspondence suggestive of association. Similarity in methylation and genotype is measured by an optimally weighted average of the similarities in methylation and genotype. For a significant gene/region, we then develop a causal-steps approach to mediation analysis at the gene/region level, which enables elucidation of the manner in which the different data types work, or do not work, together. We demonstrate through simulations and real data applications that our proposed testing approach often improves power to detect trait-associated genes while protecting type I error, and that our mediation analysis framework can often correctly elucidate the mechanisms by which genetic and epigenetic variability influence traits. A key feature of our approach is that it falls within the kernel machine testing framework, which allows for heterogeneity in effect sizes, nonlinear and interactive effects, and rapid p-value computation. Additionally, the approach can be easily applied to analysis of rare variants and sequencing studies.
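The weighted-kernel idea can be sketched numerically. The quadratic form below is a simplified stand-in for the published variance-component score test (which standardizes against a null distribution, so its optimum is not trivially at an endpoint); the data, kernel choice, and grid search are all illustrative assumptions.

```python
# Sketch: combine genotype and methylation similarity kernels with a weight
# rho, and score each combined kernel against trait similarity.
import numpy as np

rng = np.random.default_rng(0)
n = 30
G = rng.integers(0, 3, size=(n, 5)).astype(float)  # genotypes in a region
M = rng.normal(size=(n, 5))                        # methylation values
y = G[:, 0] + rng.normal(scale=0.5, size=n)        # trait driven by genotype

def linear_kernel(X):
    Xc = X - X.mean(axis=0)
    return Xc @ Xc.T

Kg, Km = linear_kernel(G), linear_kernel(M)
yc = y - y.mean()

def score_stat(K):
    # Unnormalized score-type quadratic form y'Ky (real tests normalize this).
    return float(yc @ K @ yc)

# Grid-search the kernel weight, mimicking an "optimally weighted" kernel.
best = max(np.linspace(0, 1, 11), key=lambda r: score_stat(r * Kg + (1 - r) * Km))
print("weight on genotype kernel:", best)
```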

Session 23 Recent Advances in Analysis of Longitudinal Data with Informative Observation Processes

Joint Modeling of Alternating Recurrent Transition Times
Liang Li
University of Texas MD Anderson Cancer Center
LLi15@mdanderson.org
Atrial fibrillation (AF) is a common complication in patients undergoing cardiac surgery. Recent technological advances enable physicians to monitor the occurrence of AF continuously with implanted cardiac devices. The device records two types of transition times: the time when the heart enters AF status from normal beat, and the time when the heart exits AF status and returns to normal beat. The two transition time processes are recurrent and alternate. Hundreds of transition times may be recorded on a single patient over a follow-up period of up to 12 months. The recurrent pattern carries information on the risk of AF and may be related to baseline covariates; the previous AF pattern may be predictive of the subsequent AF pattern. We propose a semiparametric bivariate longitudinal transition time model for this complicated process. The model enables single-subject as well as multiple-subject analysis, and both can be carried out in a likelihood framework. We present numerical studies to illustrate the empirical performance of the methodology.

Regression Analysis of Panel Count Data with Informative Observation Times
Yang Li1, Xin He2, Haiying Wang3 and Jianguo Sun4

1University of North Carolina at Charlotte, 2University of Maryland, 3University of New Hampshire, 4University of Missouri at Columbia
YLi@uncc.edu

Panel count data usually occur in medical follow-up studies. Most existing approaches to panel count data analysis assume that the observation or censoring times are independent of the response process, either completely or given some covariates. We present a joint analysis approach in which the possible mutual correlations are characterized by time-varying random effects. Estimating equations are developed for parameter estimation, and a simulation study is conducted to assess the finite-sample performance of the approach. The asymptotic properties of the proposed estimates are also given, and the method is applied to an illustrative example.

Envelope Linear Mixed Model
Xin Zhang
University of Minnesota
zhxnzx@gmail.com

Envelopes were recently proposed by Cook, Li and Chiaromonte (2010) as a method for reducing estimative and predictive variation in multivariate linear regression. We extend their formulation, proposing a general definition of an envelope and adapting envelope methods to linear mixed models. Simulations and illustrative data analysis show the potential for envelope methods to significantly improve standard methods in longitudinal and multivariate data analysis. This is joint work with Professor R. Dennis Cook and Professor Joseph G. Ibrahim.

Regression Analysis of Longitudinal Data with Irregular and Informative Observation Times
Yong Chen, Jing Ning and Chunyan Cai
University of Texas Health Science Center at Houston
ccaistat@gmail.com

In longitudinal data analyses, the observation times are often assumed to be independent of the outcomes. In applications in which this assumption is violated, the standard inferential approach of using generalized estimating equations may lead to biased inference. Current methods require the correct specification of either the observation time process or the repeated measures process with a correct covariance structure. In this article, we construct a novel pairwise pseudo-likelihood method for longitudinal data that allows for dependence between observation times and outcomes. This method investigates the marginal covariate effects on the repeated measures process while leaving the probability structure of the observation time process unspecified. The novelty of this method is that it yields a consistent estimator of the marginal covariate effects without specification of the observation time process or the covariance structure of the repeated measures process. Large-sample properties of the regression coefficient estimates and a pseudo-likelihood ratio test procedure are established. Simulation studies demonstrate that the proposed method performs well in finite samples. An analysis of weight loss data from a web-based program is presented to illustrate the proposed method.

Session 24 Bayesian Models for High Dimensional Complex Data

A Bayesian Feature Allocation Model for Tumor Heterogeneity
Juhee Lee1, Peter Mueller2, Yuan Ji3 and Kamalakar Gulukota4

1University of California at Santa Cruz, 2University of Texas at Austin, 3University of Chicago, 4NorthShore University HealthSystem
juheelee@soe.ucsc.edu
We propose a feature allocation model for tumor heterogeneity. The data are next-generation sequencing (NGS) data from tumor samples. We use a variation of the Indian buffet process to characterize latent hypothetical subclones based on single nucleotide variations (SNVs). We define latent subclones by the presence of some subset of the recorded SNVs. Assuming that each sample is composed of sample-specific proportions of these subclones, we can then fit the observed proportions of SNVs for each sample. By taking a Bayesian perspective, the proposed method provides a full description of all possible solutions as a coherent posterior probability model for all relevant unknown quantities, including the binary indicators that characterize the latent subclones by selecting (or not) the recorded SNVs, instead of reporting a single solution.
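The subclone decomposition described above can be written as a matrix product. The following toy example (hypothetical subclones and proportions, ignoring ploidy and heterozygosity corrections) shows the expected SNV fractions implied by the model:

```python
# Sketch: expected SNV fractions under a feature-allocation decomposition.
# Z[k, s] = 1 if subclone k carries SNV s; w[t, k] = proportion of subclone k
# in sample t. The model fits observed SNV fractions to w @ Z.
import numpy as np

Z = np.array([[1, 1, 0],
              [0, 1, 1]])      # two hypothetical subclones, three SNVs
w = np.array([[0.7, 0.3],
              [0.2, 0.8]])     # subclone proportions in two samples
expected = w @ Z               # expected SNV fraction per sample
print(expected)
```

The Bayesian machinery in the abstract places an Indian buffet process prior on Z and infers the posterior over (Z, w) jointly rather than picking one decomposition.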

Some Results on the One-Way ANOVA Model with an Increasing Number of Groups
Feng Liang
University of Illinois at Urbana-Champaign
liangf@illinois.edu
Asymptotic studies of models with diverging dimensionality have received increasing attention in statistics. A simple version of such models is a one-way ANOVA model where the number of replicates is fixed but the number of groups goes to infinity. Of interest are inference problems such as model selection and estimation of the unknown group means. We examine the consistency of Bayesian procedures using Zellner's (1986) g-prior and its variants (such as mixed g-priors and empirical Bayes), and compare their estimation accuracy with other procedures, such as ones based on AIC/BIC and the group Lasso. Our results indicate that the empirical Bayes procedure (with some modification for the large p, small n setting) and the fully Bayes procedure (i.e., a prior is specified on g) can achieve model selection consistency and also have better estimation accuracy than the other procedures considered.

Bayesian Graphical Models for Differential Pathways
Riten Mitra1, Peter Mueller2 and Yuan Ji3
1University of Louisville, 2University of Texas at Austin, 3NorthShore University HealthSystem/University of Chicago
jiyuan@uchicago.edu
Graphical models can be used to characterize the dependence structure for a set of random variables. In some applications, the form of dependence varies across different subgroups. This situation arises, for example, when protein activation on a certain pathway is recorded and a subgroup of patients is characterized by a pathological disruption of that pathway. A similar situation arises when one subgroup of patients is treated with a drug that targets that same pathway. In both cases, understanding changes in the joint distribution and dependence structure across the two subgroups is key to the desired inference. Fitting a single model to the entire data could mask the differences; separate independent analyses, on the other hand, could reduce the effective sample size and ignore the common features. In this paper, we develop a Bayesian graphical model that addresses heterogeneity and implements borrowing of strength across the two subgroups by simultaneously centering the prior towards a global network. The key feature is a hierarchical prior for graphs that borrows strength across edges, resulting in a comparison of pathways across subpopulations (differential pathways) under a unified model-based framework. We apply the proposed model to data sets from two very different studies: histone modifications from ChIP-seq experiments, and protein measurements based on tissue microarrays.

Latent Space Models for Dynamic Networks
Yuguo Chen
University of Illinois at Urbana-Champaign
yuguo@illinois.edu

Dynamic networks are used in a variety of fields to represent the structure and evolution of the relationships between entities. We present a model which embeds longitudinal network data as trajectories in a latent Euclidean space. A Markov chain Monte Carlo algorithm is proposed to estimate the model parameters and latent positions of the nodes in the network. The model parameters provide insight into the structure of the network, and the visualization provided by the model gives insight into the network dynamics. We apply the latent space model to simulated data as well as real data sets to demonstrate its performance.
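The core latent space idea can be sketched in a few lines: each node gets a position in Euclidean space and the log-odds of an edge decrease with distance. The logistic link and parameter values below are illustrative assumptions, not the paper's exact specification.

```python
# Sketch of a latent space network model: nearby nodes connect with high
# probability, distant nodes with low probability.
import math

def edge_prob(zi, zj, intercept=1.0):
    """Logistic edge probability decreasing in latent Euclidean distance."""
    d = math.dist(zi, zj)
    return 1 / (1 + math.exp(-(intercept - d)))

z = {"a": (0.0, 0.0), "b": (0.5, 0.0), "c": (3.0, 0.0)}  # latent positions
print(edge_prob(z["a"], z["b"]))   # nearby pair: high probability
print(edge_prob(z["a"], z["c"]))   # distant pair: low probability
```

In the dynamic version, each node's position evolves over time as a trajectory, and MCMC is used to infer positions and parameters from the observed network snapshots.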

Session 25 Statistical Methods for Network Analysis

Consistency of Co-clustering for Exchangeable Graph and Array Data
David S. Choi1 and Patrick J. Wolfe2
1Carnegie Mellon University, 2University College London
davidch@andrew.cmu.edu

We analyze the problem of partitioning a 0-1 array or bipartite graph into subgroups (also known as co-clustering), under the relatively mild assumption that the data is generated by a general nonparametric process. This problem can be thought of as co-clustering under model misspecification; we show that the additional error due to misspecification can be bounded by O(n^(-1/4)). Our result suggests that, under certain sparsity regimes, community detection algorithms may be robust to modeling assumptions, and that their usage is analogous to the usage of histograms in exploratory data analysis.

Laplacian Shrinkage for Inverse Covariance Estimation from Heterogeneous Populations
Takumi Saegusa and Ali Shojaie
University of Washington
ashojaie@uw.edu

We introduce a general framework using a Laplacian shrinkage penalty for estimation of inverse covariance, or precision, matrices from heterogeneous, non-exchangeable populations. The proposed framework encourages similarity among disparate but related subpopulations while allowing for differences among the estimated matrices. We propose an efficient alternating direction method of multipliers (ADMM) algorithm for parameter estimation, and establish both variable selection and norm consistency of the estimator for distributions with exponential or polynomial tails. Finally, we discuss the selection of the Laplacian shrinkage penalty based on hierarchical clustering in settings where the true relationship among samples is unknown, and discuss conditions under which this data-driven choice results in consistent estimation of precision matrices. Extensive numerical studies and applications to gene expression data from subtypes of cancer with distinct clinical outcomes indicate the potential advantages of the proposed method over existing approaches.

Estimating Signature Subgraphs in Samples of Labeled Graphs
Juhee Cho and Karl Rohe
University of Wisconsin-Madison
chojuhee@stat.wisc.edu
Network analysis is a vibrant area in statistics, biology and computer science. An emerging type of data in these fields is samples of labeled networks (or graphs). The "labels" of the networks imply that the nodes are labeled and that the same set of nodes reappears in all of the networks. The labels also have a dual meaning: there are values (e.g., age, gender, or healthy vs. sick) or vectors of values characterizing the associated network. From the analysis, we observe that only part of the network, forming a "signature subgraph," varies across the networks, whereas the other part is very similar. We therefore develop methods to estimate the signature subgraph and show theoretical properties of the suggested methods under a framework that allows the sample size to go to infinity with a sparsity condition. To check the finite-sample performance of the methods, we conduct a simulation study and then analyze two data sets: 42 brain-graphs from 21 subjects, and transcriptional regulatory network data from 41 diverse human cell types.

Fast Hierarchical Modeling for Recommender Systems
Patrick Perry
New York University
pperry@stern.nyu.edu
In the context of a recommender system, a hierarchical model allows for user-specific tastes while simultaneously borrowing estimation strength across all users. Unfortunately, existing likelihood-based methods for fitting hierarchical models have high computational demands, and these demands have limited their adoption in large-scale prediction tasks. We propose a moment-based method for fitting a hierarchical model, which has its roots in a method originally introduced by Cochran in 1937. The method trades statistical efficiency for computational efficiency. It gives consistent parameter estimates, competitive prediction error performance, and dramatic computational improvements.
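A moment-based fit in the spirit described above can be sketched for a simple one-way layout: estimate the between-user variance from moments, then shrink each user's sample mean toward the grand mean. This is a simplified textbook sketch, not Perry's actual estimator.

```python
# Method-of-moments shrinkage for user-level means (illustrative sketch).
import numpy as np

rng = np.random.default_rng(1)
users = [rng.normal(loc=mu, scale=1.0, size=20) for mu in rng.normal(size=50)]

means = np.array([u.mean() for u in users])
n = np.array([len(u) for u in users])
within = np.mean([u.var(ddof=1) for u in users])         # sigma^2 by moments
between = max(means.var(ddof=1) - within / n.mean(), 0)  # tau^2 by moments
grand = means.mean()

# Empirical-Bayes style shrinkage weights: no iterative likelihood fitting.
w = between / (between + within / n)
shrunk = grand + w * (means - grand)
print(shrunk[:3])
```

The computational appeal is that everything above is a single pass of sample moments, versus the iterative optimization a likelihood-based hierarchical fit would require.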

Session 26 New Analysis Methods for Understanding Complex Diseases and Biology

Data Integration for Identifying Clinically Important Long Non-coding RNA in Cancer
Yiwen Chen1, Zhou Du2, Teng Fei1, Roel G.W. Verhaak3, Yong Zhang2, Myles Brown4 and X. Shirley Liu4

1Dana-Farber Cancer Institute, 2Tongji University, 3University of Texas MD Anderson Cancer Center, 4Dana-Farber Cancer Institute and Harvard University
ywchen@jimmy.harvard.edu
Cumulatively, 70% of the human genome is transcribed, whereas less than 2% of the genome encodes protein. As part of this prevalent non-coding transcription, long non-coding RNAs (lncRNAs) are RNAs that are longer than 200 base pairs (bp) but with little protein-coding capacity. The human genome encodes over 10,000 lncRNAs, and the functions of the vast majority of them are unknown. Through integrative analysis of lncRNA expression profiles with clinical outcome and somatic copy number alteration, we identified lncRNAs that are associated with cancer subtypes and clinical prognosis, and predicted those that are potential drivers of cancer progression in multiple cancers, including glioblastoma multiforme (GBM), ovarian cancer (OvCa), lung squamous cell carcinoma (lung SCC) and prostate cancer. We validated our predictions of two tumorigenic lncRNAs by experimentally confirming the prostate cancer cell growth dependence on these two lncRNAs. Our integrative analysis provides a resource of clinically relevant lncRNAs for development of lncRNA biomarkers and identification of lncRNA therapeutic targets for human cancer.

Data Integration for Genetics-Based Drug Repurposing in Complex Diseases
Di Wu
Harvard University
dwu@fas.harvard.edu

Large numbers of genetic variants have been identified in cancer genome studies and GWAS studies. These variants may well capture the characteristics of the diseases. To best leverage this knowledge for developing new therapeutics, our study explores the possibility of using the genetics of diseases to guide drug repurposing, that is, suggesting whether the available drugs for certain diseases can be re-used for the treatment of other diseases. We particularly use the gene target information of drugs and protein-protein interaction information to connect risk genes, based on GWAS hits, with the available drugs. Drug indications were used to evaluate the sensitivity and specificity of the novel pipeline. Evaluation of the pipeline suggests a promising direction for certain diseases.

Comparative Meta-Analysis of Prognostic Gene Signatures for Late-Stage Ovarian Cancer
Levi Waldron
Hunter College
levi.waldron@hunter.cuny.edu

Authors: Levi Waldron, Benjamin Haibe-Kains, Aedín C. Culhane, Markus Riester, Jie Ding, Xin Victoria Wang, Mahnaz Ahmadifar, Svitlana Tyekucheva, Christoph Bernau, Thomas Risch, Benjamin Ganzfried, Curtis Huttenhower, Michael Birrer and Giovanni Parmigiani
Abstract: Numerous published studies have reported prognostic models of cancer patient survival from tumor genomics. These studies employ a wide variety of model training and validation methodologies, making it difficult to compare and rank their modeling strategies or the accuracy of the models. However, they have also generated numerous publicly available microarray datasets with clinically annotated individual patient data. Through systematic review, we identified and implemented fully specified versions of 14 prognostic models of advanced-stage ovarian cancer published over a 5-year period. These 14 published models were developed by different authors using disparate training datasets and statistical methods, but all claimed to be capable of predicting overall survival using microarray data. We evaluated these models for prognostic accuracy (defined by the concordance index for overall survival), adapting traditional methods of meta-analysis to synthesize results in ten independent validation datasets. This systematic evaluation showed that: 1) models generated by penalized or ensemble Cox proportional hazards regression methods out-performed models generated by more complicated methods, and strongly out-performed hypothesis-based models; 2) validation dataset bias existed, meaning that some datasets indicated better validation performance for all models than others, and comparative evaluation is needed to identify this source of bias; 3) datasets selected by authors for independent validation tended to over-estimate model accuracy compared to previously unused validation datasets; and 4) seemingly unrelated models generated highly correlated predictions, further emphasizing the need for comparative evaluation of accuracy. This talk will provide an overview of methods for prediction modeling in cancer genomics and highlight lessons from the first systematic comparative meta-analysis of published cancer genomics prognostic models.

Studying Spatial Organizations of Chromosomes via Parametric Models
Ming Hu1, Yu Zhu2, Zhaohui Steve Qin3, Ke Deng4 and Jun S. Liu5

1New York University, 2Purdue University, 3Emory University, 4Tsinghua University, 5Harvard University
minghu@nyumc.org
The recently developed Hi-C technology enables a genome-wide view of the spatial organization of chromosomes and has shed deep insights into genome structure and genome function. Although the technology is extremely promising, multiple sources of biases and uncertainties pose great challenges for data analysis. Statistical approaches for inferring three-dimensional (3D) chromosomal structure from Hi-C data are far from mature. Most existing models are highly over-parameterized, lacking clear interpretations, and sensitive to outliers. In this study, we propose parsimonious, easy-to-interpret and robust helix models for reconstructing 3D chromosomal structure from Hi-C data. We also develop a negative binomial regression approach to account for over-dispersion in Hi-C data. When applied to a real Hi-C dataset, helix models achieve much better model adequacy scores than existing models. More importantly, these helix models reveal that geometric properties of chromatin spatial organization, as well as chromatin dynamics, are closely related to genome function.
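The motivation for the negative binomial choice can be seen in a quick simulation: a gamma-mixed Poisson (i.e., negative binomial) produces the over-dispersion typical of Hi-C contact counts, where the variance exceeds the mean that a plain Poisson model would impose. Parameters below are arbitrary, for illustration only.

```python
# Sketch: over-dispersed counts via a gamma-Poisson mixture.
import numpy as np

rng = np.random.default_rng(2)
lam = rng.gamma(shape=2.0, scale=5.0, size=10000)  # heterogeneous contact rates
counts = rng.poisson(lam)                          # negative binomial counts

# For a Poisson, variance == mean; here the variance is much larger.
print(counts.mean(), counts.var())
```

The paper's regression additionally models the mean as a function of covariates (e.g., genomic distance) while the NB dispersion absorbs the extra-Poisson variability.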

Session 27 Recent Advances in Time Series Analysis

Time Series Models for Spherical Data with Applications in Structural Biochemistry
Jay Breidt, Daniel Hernandez-Stumpfhauser and Mark van der Woerd
Colorado State University
jbreidt@gmail.com
Proteins consist of sequences of the 21 natural amino acids. There can be tens to hundreds of amino acids in a protein, and hundreds to hundreds of thousands of atoms. A complete model for the protein consists of coordinates for every atom. A useful class of simplified models is obtained by focusing only on the alpha-carbon sequence, consisting of the primary carbon atom in the backbone of each amino acid. The three-dimensional structure of the alpha-carbon backbone of the protein can be described as a sequence of angle pairs, each consisting of a bond angle and a dihedral angle. These angle pairs lie naturally on a sphere. We consider autoregressive time series models for such spherical data sequences, using extensions of projected normal distributions. We describe an application to protein data and further developments, including autoregressive models that switch parameterizations according to local structure in the protein (such as helices, beta-sheets and coils).

Semiparametric Estimation of the Spectral Density Function with Irregular Data
Shu Yang and Zhengyuan Zhu

Iowa State University
zhu1997@gmail.com

We propose a semiparametric method to estimate the spectral densities of isotropic Gaussian processes with irregular observations. The spectral density function at low frequencies is estimated using a smoothing spline, while we use a parametric model for the spectral density at high frequencies and estimate its parameters using a method of moments based on the empirical variogram at small lags. We derive asymptotic bounds for the bias and variance of the proposed estimator. Simulation results show that our method outperforms an existing nonparametric estimator on several performance criteria.

On the Prediction of Stationary Functional Time Series
Alexander Aue1, Diogo Dubart Norinho2 and Siegfried Hörmann3

1University of California at Davis, 2University College London, 3Université Libre de Bruxelles
aaue@ucdavis.edu

This talk addresses the prediction of stationary functional time series. Existing contributions to this problem have largely focused on the special case of first-order functional autoregressive processes, because of their technical tractability and the current lack of advanced functional time series methodology. It is shown how standard multivariate prediction techniques can be utilized in this context. The connection between functional and multivariate predictions is made precise for the important case of vector and functional autoregressions. The proposed method is easy to implement, making use of existing statistical software packages, and may therefore be attractive to a broader, possibly non-academic, audience. Its practical applicability is enhanced through the introduction of a novel functional final prediction error model selection criterion that allows for an automatic determination of the lag structure and the dimensionality of the model. The usefulness of the proposed methodology is demonstrated in simulations and an application to the prediction of daily pollution curves. It is found that the proposed prediction method often significantly outperforms existing methods.
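The functional-to-multivariate recipe can be sketched as: reduce each curve to a few principal component scores, forecast the scores with a standard multivariate model (a VAR(1) fitted by least squares here), and reconstruct the predicted curve. The data, dimensions, and use of a plain VAR(1) are simplifying assumptions for illustration.

```python
# Sketch: FPCA-style dimension reduction + VAR(1) forecast of the scores.
import numpy as np

rng = np.random.default_rng(3)
T, p, d = 200, 24, 2                      # 200 days, 24 grid points, 2 PCs
curves = rng.normal(size=(T, p)).cumsum(axis=1)  # synthetic daily curves

mean = curves.mean(axis=0)
X = curves - mean
_, _, Vt = np.linalg.svd(X, full_matrices=False)
scores = X @ Vt[:d].T                     # PC scores, shape (T, d)

A, *_ = np.linalg.lstsq(scores[:-1], scores[1:], rcond=None)  # VAR(1) fit
next_scores = scores[-1] @ A
forecast = mean + next_scores @ Vt[:d]    # predicted next-day curve
print(forecast.shape)
```

The functional FPE criterion mentioned in the abstract would choose d and the VAR lag order automatically instead of fixing them as done here.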

A Composite Likelihood-Based Approach for Multiple Change-Point Estimation in Multivariate Time Series Models
Chun Yip Yau and Ting Fung Ma

Chinese University of Hong Kong
cyyau@sta.cuhk.edu.hk

We propose a likelihood-based approach for multiple change-point estimation in general multivariate time series models. Specifically, we consider a criterion function based on pairwise likelihood to estimate the number and locations of change-points and to perform model selection for each segment. By virtue of the pairwise likelihood, the number and locations of change-points can be consistently estimated under very mild assumptions. Computation is conducted efficiently by a pruned dynamic programming algorithm. Simulation studies and real data examples are presented to demonstrate the statistical and computational efficiency of the proposed method.
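The criterion-function idea can be shown in miniature with a single change-point: pick the split that minimizes a two-segment cost. Least squares stands in for the pairwise likelihood, and no pruning or multiple-change-point recursion is attempted; this is a toy sketch only.

```python
# Sketch: locate one mean change-point by minimizing a two-segment cost.
def best_changepoint(x):
    def cost(seg):
        m = sum(seg) / len(seg)
        return sum((v - m) ** 2 for v in seg)
    # k is the first index of the second segment
    return min(range(1, len(x)), key=lambda k: cost(x[:k]) + cost(x[k:]))

series = [0.1, -0.2, 0.0, 0.1, 5.2, 4.9, 5.1, 5.0]
print(best_changepoint(series))  # -> 4 (first index of the new segment)
```

The full method replaces the segment cost with a pairwise likelihood, searches over the number of change-points as well as their locations, and uses pruned dynamic programming to keep the search tractable.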

Session 28 Analysis of Correlated Longitudinal and Survival Data

Analysis of a Non-Randomized Longitudinal Quality of Life Trial
Mounir Mesbah
University of Paris 6
mounir.mesbah@upmc.fr
In this talk, I will consider the context of a longitudinal study where participants are interviewed about their health, quality of life, or another latent trait at regular, previously established visit dates. The interviews usually consist of filling out a questionnaire of multiple-choice questions with various ordinal response scales, built in order to measure, at the time of the visit, the latent trait, which is assumed in a first step to be unidimensional. On entering the study, each participant receives a treatment appropriate to his or her health profile. The choice of treatment is not randomized; it is decided by a doctor based on the health profile of the patient and a deep clinical examination. We assume that the different treatments a doctor can choose are ordered (a dose effect). In addition, we assume that the treatment prescribed at entry does not change throughout the study. In this work, I will investigate and compare strategies and models for analyzing the time evolution of the latent variable in a longitudinal study when the main goal is to compare non-randomized ordinal treatments. I will illustrate my results with a real, complex longitudinal quality of life study.
References: [1] Bousseboua, M. and Mesbah, M. (2013). Longitudinal Rasch Process with Memory Dependence. Pub. Inst. Stat. Univ. Paris, Vol. 57, Fasc. 1-2, 45-58. [2] Christensen, K.B., Kreiner, S. and Mesbah, M. (2013). Rasch Models in Health. J. Wiley. [3] Mesbah, M. (2012). Measurement and Analysis of Quality of Life in Epidemiology. In "Bioinformatics in Human Health and Heredity (Handbook of Statistics, Vol. 28)", Eds. Rao, C.R., Chakraborty, R. and Sen, P.K. North Holland, Chapter 15. [4] Rosenbaum, P.R. and Rubin, D.B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1), 41-55. [5] Imai, K. and Van Dyk, D.A. (2004). Causal Inference With General Treatment Regimes: Generalizing the Propensity Score. JASA, Vol. 99, No. 467, Theory and Methods.

Power and Sample Size Calculations for Evaluating Mediation Effects with Multiple Mediators in Longitudinal Studies
Cuiling Wang
Albert Einstein College of Medicine
cuiling.wang@einstein.yu.edu
Currently there is very limited statistical research on power analysis for evaluating the mediation effects of multiple mediators in longitudinal studies. In addition to the complication of missing data, common in longitudinal studies, the case of multiple mediators further complicates the testing of mediation effects. Based on previous work of Wang and Xue (2012), we evaluate several hypothesis tests regarding the mediation effects of multiple mediators and provide formulae for power and sample size calculations. The performance of these methods under limited sample size is examined using simulation studies. An example from the Einstein Aging Study (EAS) is used to illustrate the methods.

Distribution-Free First-Hitting-Time Based Threshold Regressions for Lifetime Data
Mei-Ling Ting Lee1 and G. Alex Whitmore2
1University of Maryland, 2McGill University
mltlee@umd.edu
Cox regression methods are well known, but carry a strong proportional hazards assumption. In many medical contexts, a disease progresses until a failure event (such as death) is triggered when the health level first reaches a failure threshold. I will present the threshold regression (TR) model for the health process, which requires few assumptions and hence is quite general in its potential application. Both parametric and distribution-free methods for estimation and prediction using TR models are derived. Case examples are presented that demonstrate the methodology and its practical use. The methodology provides medical researchers and biostatisticians with new and robust statistical tools for estimating treatment effects and assessing a survivor's remaining life.

Joint Modeling of Survival Data and Mismeasured Longitudinal Data Using the Proportional Odds Model
Juan Xiong1, Wenqing He1 and Grace Yi2
1University of Western Ontario, 2University of Waterloo
whe@stats.uwo.ca
Joint modeling of longitudinal and survival data has been studied extensively, where the Cox proportional hazards model has frequently been used to incorporate the relationship between survival time and covariates. Although the proportional odds model is an attractive alternative to the Cox proportional hazards model, featuring the dependence of survival times on covariates via cumulative covariate effects, this model is rarely discussed in the joint modeling context. To fill this gap, we investigate joint modeling of survival data and longitudinal data that are subject to measurement error. We describe a model parameter estimation method based on the expectation-maximization algorithm. In addition, we assess the impact of naive analyses that fail to address the error in the longitudinal measurements. The performance of the proposed method is evaluated through simulation studies and a real data analysis.

Session 29 Clinical Pharmacology

Truly Personalizing Medicine
Mike D. Hale
Amgen Inc.
mdhale@amgen.com
Predictive analytics are increasingly used to optimize marketing for many non-medical products. These companies observe and analyze the behavior and/or characteristics of an individual, predict the needs of that individual, and then address those needs. We frequently encounter this when web-browsing and when participating in retail store loyalty programs: advertising and coupons are targeted to the specific individual based on predictive models employed by advertisers and retailers. This makes the traditional drug development program appear antiquated, where a drug may be intended for all patients with a given indication. This talk contrasts those methods and practices for addressing individual needs with the way medicines are typically prescribed, and considers a way to integrate big data, the product label, and predictive analytics to improve and enable personalized medicine. Some important questions are posed (but unresolved), such as who could do this, and what the implications are if we were to predict outcomes for individual patients.

What Do Statisticians Do in Clinical Pharmacology?
Brian Smith

Amgen Inc
brismith@amgen.com

Clinical pharmacology is the science of drugs and their clinical use. It could be argued that all drug development is clinical pharmacology; however, pharmaceutical companies typically separate their work in a pattern similar to the following: A) clinical (late) development (Phase 2b-Phase 3), B) post-marketing (Phase 4), and C) clinical pharmacology (Phase 1-Phase 2a). As will be seen in this presentation, clinical pharmacology research presents numerous interesting statistical opportunities.

The Use of Modeling and Simulation to Bridge Different Dosing Regimens - a Case Study
Chyi-Hung Hsu and Jose Pinheiro
Janssen Research & Development
chsu3@its.jnj.com

In recent years, the pharmaceutical industry has increasingly faced the challenge of needing to efficiently evaluate and use all available information to improve its success rate in drug development under limited resource constraints. Modeling and simulation has established itself as the quantitative tool of choice to meet this existential challenge. Models provide a basis for quantitatively describing and summarizing the available information and our understanding of it. Using models to simulate data allows the evaluation of scenarios within, and even outside, the boundaries of the original data. In this presentation we will discuss and illustrate the use of modeling and simulation techniques to bridge different dosing regimens based on studies using just one of the regimens. Special attention will be given to quantifying inferential uncertainty and model validation.

A Comparison of FDA and EMA Recommended Models for Bioequivalence Studies
Yongwu Shao, Lingling Han, Bing Gao, Sally Zhao, Susan Guo, Lijie Zhong and Liang Fang
Gilead Sciences
yongwushao@gilead.com

For a bioequivalence crossover study, the FDA guidance recommends a mixed effects model for the formulation comparisons of pharmacokinetic parameters including all subject data, while the EMA guidance recommends an ANOVA model with fixed effects of sequence, subject within sequence, period, and formulation, excluding subjects with missing data from the pair-wise comparison. These two methods are mathematically equivalent when there are no missing values in the targeted comparison. With missing values, the mixed effects model including subjects with missing values provides higher statistical power than the fixed effects model excluding these subjects. However, the parameter estimation in the mixed effects model is based on large-sample asymptotic approximations, which may introduce bias in the estimate of standard deviations when the sample size is small (Jones and Kenward, 2003).
In this talk we provide a closed-form formula to quantify the potential gain in power from using mixed effects models when missing data are present. A simulation study was conducted to confirm the theoretical results. We also perform a simulation study to investigate the bias introduced by the mixed effects model for small sample sizes. Our results show that when the sample size is 12 or above, as required by both FDA and EMA, the bias introduced by the mixed effects model is negligible. From a statistical point of view, we recommend the mixed effects model approach for bioequivalence studies for its potential gain in power when missing data are present and missing completely at random.
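The closed-form power formula is the talk's contribution and is not reproduced here, but the estimator that the EMA-style fixed-effects analysis reduces to on complete pairs is easy to sketch; a minimal numpy simulation with invented effect sizes and variance components:

```python
import numpy as np

rng = np.random.default_rng(1)
n_seq = 50                      # subjects per sequence (TR and RT); illustrative
true_formulation_effect = 0.10  # assumed log-scale T - R difference
period_effect = 0.05

def simulate_sequence(order):
    subj = rng.normal(0.0, 0.5, size=n_seq)  # between-subject random effect
    y = np.empty((n_seq, 2))
    for p, trt in enumerate(order):
        y[:, p] = (subj + p * period_effect
                   + (true_formulation_effect if trt == "T" else 0.0)
                   + rng.normal(0.0, 0.1, size=n_seq))  # within-subject noise
    return y

y_tr = simulate_sequence("TR")   # sequence 1: T in period 1, R in period 2
y_rt = simulate_sequence("RT")   # sequence 2: R in period 1, T in period 2

# Complete-case analysis: within-subject period differences cancel the
# subject effect, and averaging over the two sequences cancels the period effect.
d_tr = y_tr[:, 0] - y_tr[:, 1]   # (T + period1) - (R + period2)
d_rt = y_rt[:, 0] - y_rt[:, 1]   # (R + period1) - (T + period2)
est = (d_tr.mean() - d_rt.mean()) / 2.0
print(f"complete-case formulation estimate: {est:.3f}")
```

A subject missing one period contributes nothing to the differences above; the FDA-style mixed model keeps that subject's single observation, which is the source of the power gain discussed in the abstract.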

2014 ICSA-KISS Applied Statistics Symposium Portland June 15-18 57

Abstracts

Session 30: Sample Size Estimation

Sample Size Calculation with Semiparametric Analysis of Long-Term and Short-Term Hazards
Yi Wang
Novartis Pharmaceuticals Corporation
yi-11wang@novartis.com

We derive sample size formulae for survival data with non-proportional hazard functions under both fixed and contiguous alternatives. Sample size determination has been widely discussed in the literature for studies with failure-time endpoints. Many researchers have developed methods under the assumption of proportional hazards and contiguous alternatives. Without covariate adjustment, the logrank test statistic is often used for the sample size and power calculation; with covariate adjustment, the approaches are often based on the score test statistic for the Cox proportional hazards model. Such methods, however, are inappropriate when the proportional hazards assumption is violated. We develop methods to calculate the sample size based on the semiparametric analysis of short-term and long-term hazard ratios. The methods are built on the semiparametric model of Yang and Prentice (2005). The model accommodates a wide range of patterns of hazard ratios and includes the Cox proportional hazards model and the proportional odds model as special cases. Therefore the proposed methods can be used for survival data with proportional or non-proportional hazard functions. In particular, the sample size formulae of Schoenfeld (1983) and Hsieh and Lavori (2000) can be obtained as special cases of our methods under contiguous alternatives.
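For reference, the Schoenfeld (1983) special case mentioned at the end reduces, under 1:1 allocation and a two-sided logrank test, to a one-line required-event count; a minimal sketch (the hazard ratio 0.7 is illustrative):

```python
from math import ceil, log
from statistics import NormalDist

def schoenfeld_events(hr, alpha=0.05, power=0.80):
    """Required number of events for a two-sided logrank test,
    1:1 allocation, assuming proportional hazards (Schoenfeld, 1983)."""
    z = NormalDist().inv_cdf
    za, zb = z(1 - alpha / 2), z(power)
    return ceil(4 * (za + zb) ** 2 / log(hr) ** 2)

print(schoenfeld_events(0.7))  # events needed to detect HR = 0.7 at 80% power
```

The semiparametric approach in the talk replaces the single log hazard ratio in the denominator with the short-term and long-term hazard parameters of the Yang-Prentice model.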

Sample Size and Decision Criteria for Phase IIB Studies with Active Control
Xia Xu
Merck & Co
xia_xu@merck.com

In drug development programs, Phase IIB studies provide information for the Go/No-Go decision on conducting large confirmatory Phase III studies. Currently more and more Phase IIB studies are using an active control as comparator, especially in the development of new therapies for the treatment of HIV infection, where it is not ethical to use a placebo control due to the severity of the disease and the availability of approved drugs. If a Phase IIB study demonstrates "comparable" efficacy and safety relative to the active control, the program may proceed to Phase III, which usually uses the same or a similar active control to formally assess non-inferiority of the new therapy. Sample size determination and quantification of decision criteria for such Phase IIB studies are explored using a Bayesian analysis.

Sample Size Determination for Clinical Trials to Correlate Outcomes with Potential Predictors
Su Chen, Xin Wang and Ying Zhang
AbbVie Inc
suchen@abbvie.com

Sample size determination can be a challenging task for a post-marketing clinical study aiming to establish the predictivity of a single influential measurement or a set of variables for a clinical outcome of interest. Since the relationship between the potential predictors and the outcome is unknown at the design stage, one may not be able to perform a conventional sample size calculation and must look for other means to size the trial. Our proposed approach is based on the length of the confidence interval for the true correlation coefficient between the predictive and outcome variables. In this study we compare three methods of constructing confidence intervals for the correlation coefficient, based on the approximate sampling distribution of the Pearson correlation, the Z-transformed Pearson correlation, and bootstrapping, respectively. We evaluate the performance of the three methods under different scenarios with small to moderate sample sizes and different correlations. Coverage probabilities of the confidence intervals are compared across the three methods. The results are used for sample size determination based on the width of the confidence intervals. Hypothetical examples are provided to illustrate the idea and its implementation.
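Of the three intervals compared, the Z-transformed (Fisher) interval has a closed form, so width-based sizing can be sketched directly; the planning correlation 0.5 and target width 0.2 below are illustrative choices, not values from the talk:

```python
from math import atanh, tanh, sqrt
from statistics import NormalDist

def z_ci_width(n, rho, conf=0.95):
    """Width of the Fisher Z-transform CI for a correlation,
    evaluated at a planning value rho with sample size n."""
    half = NormalDist().inv_cdf((1 + conf) / 2) / sqrt(n - 3)
    return tanh(atanh(rho) + half) - tanh(atanh(rho) - half)

def n_for_width(target, rho, conf=0.95):
    """Smallest n whose Fisher-Z CI width is at most `target`."""
    n = 5
    while z_ci_width(n, rho, conf) > target:
        n += 1
    return n

n_req = n_for_width(0.2, rho=0.5)
print(n_req, round(z_ci_width(n_req, 0.5), 3))
```

The bootstrap and raw-Pearson intervals compared in the talk have no such closed form and would be sized by simulation.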

Sample Size Re-Estimation at Interim Analysis in Oncology Trials with a Time-to-Event Endpoint
Ian (Yi) Zhang
Sunovion Pharmaceuticals Inc
ianzhang@sunovion.com

Oncology is a hot therapeutic area due to highly unmet medical needs. The superiority of a study drug over a control is commonly assessed with respect to a time-to-event endpoint, such as overall survival (OS) or progression-free survival (PFS), in confirmatory oncology trials. Adaptive designs allowing for sample size re-estimation (SSR) at interim analysis are often employed to accelerate oncology drug development while reducing costs. Although SSR is categorized as "less well understood" (in contrast to "well understood" designs such as the group sequential design) in the 2010 draft FDA guidance on adaptive designs, it has gradually gained regulatory acceptance and is widely adopted in industry. In this presentation, a phase II/III seamless design is developed to re-estimate the sample size based upon the unblinded interim result, using the conditional power of observing a significant result by the end of the trial. The methodology achieves the desired conditional power while still controlling the type I error rate. Extensive simulation studies are performed to evaluate the operating characteristics of the design. A real-world example will also be used for illustration. Pros and cons of the design will be discussed.
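The conditional-power trigger for SSR can be sketched under the common "current trend" assumption with a Brownian-motion approximation; the interim z-value and information fraction below are made up for illustration, and this is a generic formulation rather than the presenter's exact design:

```python
from math import sqrt
from statistics import NormalDist

def conditional_power(z_interim, info_frac, alpha=0.025):
    """P(reject at the final analysis | interim z-statistic), assuming the
    current trend continues (drift estimated from the interim data itself).
    info_frac is the fraction of total statistical information observed."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha)          # one-sided critical value
    t = info_frac
    return nd.cdf((z_interim / sqrt(t) - z_alpha) / sqrt(1 - t))

cp = conditional_power(z_interim=1.5, info_frac=0.5)
print(round(cp, 3))
```

In an SSR design, when this quantity falls in a "promising zone," the target event count is increased until the conditional power reaches the desired level, with an adjusted final boundary to preserve the type I error rate.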

Statistical Inference and Sample Size Calculation for Paired Binary Outcomes with Missing Data
Song Zhang
University of Texas Southwestern Medical Center
songzhang@utsouthwestern.edu

We investigate the estimation of intervention effect and sample size determination for experiments where subjects are supposed to contribute paired binary outcomes but some observations are incomplete. We propose a hybrid estimator that appropriately accounts for the mixed nature of the observed data: paired outcomes from those who contribute complete pairs of observations, and unpaired outcomes from those who contribute either pre- or post-intervention outcomes only. We theoretically prove that if incomplete data are evenly distributed between the pre- and post-intervention periods, the proposed estimator will always be more efficient than the traditional estimator. A numerical study shows that when the distribution of incomplete data is unbalanced, the proposed estimator remains superior when there is moderate-to-strong positive within-subject correlation. We further derive a closed-form sample size formula to help researchers determine how many subjects need to be enrolled in such studies. Simulation results suggest that the calculated sample size maintains the empirical power and type I error under various design configurations. We demonstrate the proposed method using a real application example.


Session 31: Predictions in Clinical Trials

Predicting Smoking Cessation Outcomes Beyond Clinical Trials
Yimei Li, E. Paul Wileyto and Daniel F. Heitjan
University of Pennsylvania
yimeili@mail.med.upenn.edu

In smoking cessation trials, subjects usually receive treatment for several weeks, with additional information collected 6 or 12 months after that. An important question concerns predicting long-term cessation success from short-term clinical observations. Several features need to be considered. First, subjects commonly transition several times between lapse and recovery, during which they exhibit both temporary and permanent quits and both brief and long-term lapses. Second, although we have some reliable predictors of outcome, there is also substantial heterogeneity in the data. We therefore introduce a cure-mixture frailty model that describes the complex process of transitions between abstinence and smoking. Based on this model, we then propose a Bayesian approach to predict individual future outcomes. We will compare predictions from our model to a variety of ad hoc methods.

Bayesian Event And Time Landmark Estimation In Clinical Trials When Responses Are Failure Time Data
Haoda Fu, Luping Zhao and Yanping Wang
Eli Lilly and Company
fuhaoda@gmail.com

In oncology trials, it is challenging to predict when we will have a certain number of events or, for a given period of time, how many additional events we can observe. We have developed a tool called BEATLES, which stands for Bayesian Event And Time Landmark Estimation Software. The method and tools have been broadly implemented within Lilly. In this talk, we will present the technical details.

Predicting the Probability of Future Clinical Study Success Based on the Evidence from Electronic Medical Record (EMR) Data
Haoda Fu1 and Nan Jia2
1Eli Lilly and Company
2University of Southern California
jia_nan2@lilly.com

To compare a treatment with a control via a randomized clinical trial, the assessment of treatment efficacy is often based on an overall treatment effect over a specific study population. To increase the probability of study success (PrSS), it is important to choose an appropriate and relevant study population in which the treatment is expected to show overall benefit over the control. This research predicts the PrSS from EMR data for a given patient population. We can therefore use this approach to refine the study inclusion and exclusion criteria to increase the PrSS. For learning from EMR data, we also develop covariate balancing methods. Although our methods are developed for learning from EMR data, learning from randomized controlled trials is a special case of our methods.

Weibull Cure-Mixture Model for the Prediction of Event Times in Randomized Clinical Trials
Gui-shuang Ying1, Qiang Zhang2, Yimei Li1 and Daniel F. Heitjan1
1University of Pennsylvania
2Radiation Therapy Oncology Group Statistical Center
gsying@mail.med.upenn.edu

Many clinical trials with time-to-event outcomes are designed to perform interim and final analyses upon the occurrence of a pre-specified number of events. As an aid to trial logistical planning, it is desirable to predict the time at which such landmark event numbers are reached. Our previously developed parametric (exponential and Weibull) prediction models assume that every trial participant is susceptible to the event of interest and will eventually experience the event if the follow-up time is long enough. This assumption may not hold, as some trial participants may be cured of the fatal disease, and failure to accommodate the possibility of cure may lead to biased prediction. In this talk, a Weibull cure-mixture prediction model will be presented that assumes the trial participants are a mixture of susceptible (uncured) and non-susceptible (cured) participants. The cure probability is modelled using logistic regression, and the time to event among susceptible participants is modelled by a two-parameter Weibull distribution. A comparison of predictions from the Weibull cure-mixture model with those from the standard Weibull prediction model will be demonstrated using data from a randomized trial of oropharyngeal cancer.
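A toy Monte Carlo version of landmark-time prediction under a cure mixture (ignoring staggered accrual, censoring, and covariates, which the real model must handle); all parameter values below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
n, target_events, n_sim = 400, 150, 2000
p_cure, shape, scale = 0.30, 1.2, 18.0   # cure prob and Weibull params (months)

landmark_times = np.empty(n_sim)
for s in range(n_sim):
    cured = rng.random(n) < p_cure
    # cured participants never fail; susceptible times are Weibull
    t = np.where(cured, np.inf, scale * rng.weibull(shape, size=n))
    # calendar time at which the target_events-th event occurs
    landmark_times[s] = np.sort(t)[target_events - 1]

print(f"median predicted landmark time: {np.median(landmark_times):.1f} months")
```

A standard (no-cure) Weibull model fitted to the same data would treat the cured fraction's censored times as eventual events, systematically predicting the landmark too early; that bias is the motivation for the cure-mixture model.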

Session 32: Recent Advances in Statistical Genetics

Longitudinal Exome-Focused GWAS of Alcohol Use in a Veteran Cohort
Zuoheng Wang, Zhong Wang, Amy C. Justice and Ke Xu
Yale University
zuohengwang@yale.edu

Alcohol dependence (AD) is a major public health concern in the United States and contributes to the pathogenesis of many diseases. The risk of AD is multifactorial and includes shared genetic and environmental factors. However, gene mapping in AD has not yet been successful: the confirmed associations account for a small proportion of the overall genetic risk. Multiple measurements in longitudinal genetic studies provide a route to reduce noise and correspondingly increase the strength of signals in genome-wide association studies (GWAS). In this study, we developed a powerful statistical method for testing the joint effect of genetic variants within a gene region on diseases measured over multiple time points. We applied the new method to a longitudinal study of a veteran cohort with both HIV-infected and HIV-uninfected patients to understand the genetic risk underlying AD. We found an interesting gene that has been reported in HIV studies, suggestive of a potential gene-by-environment effect in alcohol use and HIV. We also conducted simulation studies to assess the performance of the new statistical method and demonstrated a power gain from taking advantage of repeated measurements and aggregating information across a biological region. This study not only contributes to the statistical toolbox of current GWAS but also potentially advances our understanding of the etiology of AD.

Type I Error in Regression-based Genetic Model Building
Heejong Sung1, Alexa J.M. Sorant1, Bhoom Suktitipat2 and Alexander F. Wilson1
1National Institutes of Health
2Mahidol University
sunghe@mail.nih.gov

The task of identifying genetic variants contributing to trait variation is increasingly challenging given the large number and density of variant data being produced. Current methods for analyzing these data include regression-based variable selection methods, which produce linear models incorporating the chosen variants. For example, the Tiled Regression method begins by examining relatively small segments of the genome called tiles; selection of significant predictors, if any, is done first within individual tiles. However, type I error rates for such methods haven't been fully investigated, particularly considering correlation among variants. To investigate type I error in this situation, we simulated a mini-GWAS genome including 306,097 SNPs in 4,000 unrelated samples with 2,000 non-genetic traits. Initially, 53,060 tiles were defined by dividing the genome according to recombination hotspots. Then larger tiles were defined by combining groups of ten consecutive tiles. Stepwise regression and LASSO variable selection methods were performed within tiles for each tile definition. Type I error rates were calculated as the number of selected variants divided by the number considered, averaged over the 2,000 phenotypes. Overall error rates for stepwise regression, using a fixed selection criterion of 0.05, and LASSO, minimizing mean square error, were 0.04 and 0.12, respectively, when using the initial (smaller) tiles. Considering separately each combination of tile size (number of SNPs) and multicollinearity (defined as 1 minus the determinant of the genotype correlation matrix), observed type I error rates for stepwise regression tended to increase with the number of variants and decrease with increasing multicollinearity. With LASSO, the trends were in the opposite direction. When the larger tiles were used, overall rates for LASSO were noticeably smaller, while overall rates were rather robust for stepwise regression.

GMDR: A Conceptual Framework for Detection of Multifactor Interactions Underlying Complex Traits
Xiang-Yang Lou
University of Alabama at Birmingham
xylou@uab.edu

Biological outcomes are governed by multiple genetic and environmental factors that act in concert. Determining multifactor interactions is a primary topic of interest in recent genetic studies, but it presents enormous statistical and mathematical challenges. The computationally efficient multifactor dimensionality reduction (MDR) approach has emerged as a promising tool for meeting these challenges. On the other hand, complex traits are expressed in various forms and have different data generation mechanisms that cannot be appropriately modeled by a dichotomous model, and the subjects in a study may be recruited according to its own analytical goals, research strategies, and available resources, not only as homogeneous unrelated individuals. Although several modifications and extensions of MDR have in part addressed these practical problems, they remain limited in the statistical analysis of diverse phenotypes, multivariate phenotypes, and correlated observations, in correcting for potential population stratification, and in unifying both unrelated and family samples into a more powerful analysis. I propose a comprehensive statistical framework, referred to as generalized MDR (GMDR), for systematic extension of MDR. The proposed approach is quite versatile: it allows for covariate adjustment; it is suitable for analyzing almost any trait type, e.g., binary, count, continuous, polytomous, ordinal, time-to-onset, multivariate, and others, as well as combinations of those; and it is applicable to various study designs, including homogeneous and admixed, unrelated-subject and family, as well as mixtures of them. The proposed GMDR offers an important addition to the arsenal of analytical tools for identifying nonlinear multifactor interactions and unraveling the genetic architecture of complex traits.

Gene-Gene Interaction Analysis for Rare Variants: Application to T2D Exome Sequencing Data
Taesung Park1, Min-Seok Kwon1 and Seung Yeoun Lee2
1Seoul National University
2Sejong University
tspark@stats.snu.ac.kr

The heritability of complex diseases may not be fully explained by common variants. This missing heritability could be partly due to gene-gene interactions and rare variants. There has been exponential growth in gene-gene interaction analysis for common variants, in terms of both methodological developments and practical applications. Also, the recent advance of high-throughput sequencing technologies makes it possible to conduct rare variant analysis. However, little progress has been made in gene-gene interaction analysis for rare variants. Here, we propose a new gene-gene interaction method for rare variants in the framework of multifactor dimensionality reduction (MDR) analysis. The proposed method consists of two steps. The first step is to collapse the rare variants in a specific region, such as a gene. The second step is to perform MDR analysis on the collapsed rare variants. The proposed method is illustrated with whole-exome sequencing data from 1,080 Korean individuals to identify causal gene-gene interactions among rare variants for type 2 diabetes.
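The collapsing step (step one) is commonly implemented as a per-gene burden count or carrier indicator; a minimal numpy sketch with toy genotypes (the abstract does not specify the authors' exact collapsing rule):

```python
import numpy as np

# rows = subjects, columns = rare variants within one gene;
# entries are minor-allele counts (0/1/2) -- toy data for illustration
geno = np.array([
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [2, 0, 0, 1],
    [0, 1, 0, 0],
])

burden_count = geno.sum(axis=1)                        # total rare alleles
burden_indicator = (geno > 0).any(axis=1).astype(int)  # carries any rare allele

print(burden_count)      # per-subject gene-level burden scores
print(burden_indicator)  # per-subject carrier indicators
```

Either summary gives one gene-level variable per subject; the MDR step then searches over combinations of these collapsed variables rather than over the individual rare variants.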

Session 33: Structured Approach to High-Dimensional Data with Sparsity and Low-Rank Factorization

Two-way Regularized Matrix Decomposition
Jianhua Huang
Texas A&M University
jianhua@stat.tamu.edu

Matrix decomposition (or low-rank matrix approximation) plays an important role in various statistical learning problems. Regularization has been introduced into matrix decomposition to achieve stability, especially when the row or column dimension is high. When both the row and column domains of the matrix are structured, it is natural to employ a two-way regularization penalty in low-rank matrix approximation. This talk discusses the importance of considering invariance when designing the two-way penalty, and shows undesirable properties of some penalties used in the literature when the invariance is ignored.
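As background, the unregularized building block is the best rank-r approximation given by the truncated SVD (Eckart-Young); the two-way regularized decompositions discussed in the talk add penalties on the left and right factors. A minimal sketch:

```python
import numpy as np

def low_rank_approx(X, r):
    """Best rank-r approximation of X in Frobenius norm (Eckart-Young)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :r] * s[:r] @ Vt[:r, :]

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3)) @ rng.normal(size=(3, 15))  # exact rank 3
X_noisy = X + 0.01 * rng.normal(size=X.shape)

err = np.linalg.norm(X_noisy - low_rank_approx(X_noisy, 3))
print(f"rank-3 residual norm: {err:.3f}")
```

A two-way regularized version would replace the raw singular vectors with penalized left and right factors (e.g., smoothness or sparsity penalties on each side); the talk's point is that the penalty must respect the invariances of the factorization.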

Tensor Regression with Applications in Neuroimaging Analysis
Hua Zhou1, Lexin Li1 and Hongtu Zhu2

1North Carolina State University
2University of North Carolina at Chapel Hill
lli10@ncsu.edu

Classical regression methods treat covariates as a vector and estimate a corresponding vector of regression coefficients. Modern applications in medical imaging generate covariates of more complex form, such as multidimensional arrays (tensors). Traditional statistical and computational methods are compromised for the analysis of such high-throughput data due to their ultrahigh dimensionality as well as their complex structure. In this talk, I will discuss a new class of tensor regression models that efficiently exploit the special structure of tensor covariates. Under this framework, ultrahigh dimensionality is reduced to a manageable level, resulting in efficient estimation and prediction. Regularization, of both hard-thresholding and soft-thresholding types, will be carefully examined. The new methods aim to address a family of neuroimaging problems, including using brain images to diagnose neurodegenerative disorders, to predict the onset of neuropsychiatric diseases, and to identify disease-relevant brain regions or activity patterns.

RKHS-Embedding Based Feature Screening for High-Dimensional Data
Krishnakumar Balasubramanian1, Bharath Sriperambadur2 and Guy Lebanon1
1Georgia Institute of Technology
2Pennsylvania State University
krishnakumar3@gatech.edu

Feature screening is a key step in handling ultrahigh-dimensional data sets, which are ubiquitous in modern statistical problems. Over the last decade, convex relaxation based approaches (e.g., the Lasso and sparse additive models) have been extensively developed and analyzed for feature selection in the high-dimensional regime, but these approaches suffer from several problems, both computational and statistical. To overcome these issues, we propose a novel Hilbert space embedding based approach for independence screening in ultrahigh-dimensional data sets. The proposed approach is model-free (i.e., no model assumption is made between response and predictors) and can handle non-standard (e.g., graph-valued) and multivariate outputs directly. We establish the sure screening property of the proposed approach in the ultrahigh-dimensional regime and experimentally demonstrate its advantages over other approaches.

Sparse Conditional Graphical Models for Structured Genetic Datasets
Hyonho Chun
Purdue University
chunh@purdue.edu

For the purpose of inferring a network, we consider a sparse Gaussian graphical model (SGGM) in the presence of a population structure, which often occurs in genetic studies with model organisms. In these studies, datasets are obtained by combining multiple lines of inbred organisms or by using outbred animals. Ignoring such population structures would produce false connections in a graph structure, but most research on graph inference has focused on independent cases. On the other hand, in regression settings, a linear mixed effect model has been widely used to account for correlations among observations. Besides its effectiveness, the linear mixed effect model has a generality: it can be stated within a framework of penalized least squares. This generality makes it very flexible for use in settings other than regression. In this work, we adopt a linear mixed effect model within an SGGM. Our formulation fits into the recently developed conditional Gaussian graphical model, in which the population structures are modeled as predictors and the graph is determined by a conditional precision matrix. The proposed approach is applied to the network inference problem in two datasets: the heterogeneous mice diversity panel (HMDP) and heterogeneous stock (HS) datasets.

Session 34: Recent Developments in Dimension Reduction, Variable Selection and Their Applications

Variable Selection and Model Estimation via Subtle Uprooting
Xiaogang Su
University of Texas at El Paso
xiaogangsu@gmail.com

We propose a new method, termed "subtle uprooting," for fitting GLMs by optimizing a smoothed information criterion. The significance of this approach is that it completes variable selection and parameter estimation in one single optimization step, avoiding the tuning of penalty parameters as commonly done in traditional regularization approaches. Two technical maneuvers, "uprooting" and an epsilon-threshold procedure, are employed to enforce sparsity in the parameter estimates while maintaining the smoothness of the objective function. The formulation allows us to borrow strength from established methods and theories in both optimization and statistical estimation. More specifically, a modified BFGS algorithm (Li and Fukushima, 2001) is adopted to solve the non-convex yet smooth programming problem, with established global and super-linear convergence properties. By making connections to M-estimators and information criteria, we also show that the proposed method is consistent in variable selection and efficient in estimating the nonzero parameters. As illustrated with both simulated experiments and data examples, the empirical performance is comparable or superior to many other competitors.

Robust Variable Selection Through Dimension Reduction
Qin Wang
Virginia Commonwealth University
qwang3@vcu.edu

Dimension reduction and variable selection play important roles in high-dimensional data analysis. MAVE (minimum average variance estimation) is an efficient approach proposed by Xia et al. (2002) to estimate the regression mean space. However, it is not robust to outliers in the dependent variable because of its use of the least-squares criterion. In this talk, we propose a robust estimation based on local modal regression, making the method more applicable in practice. We further extend the new approach to select informative variables through shrinkage estimation. The efficacy of the new approach is illustrated through simulation studies.

Sparse Envelope Model: Efficient Estimation and Response Variable Selection in Multivariate Linear Regression
Zhihua Su1, Guangyu Zhu1 and Xin Chen2
1University of Florida
2National University of Singapore
zhihuasu@stat.ufl.edu

The envelope model, recently proposed by Cook, Li and Chiaromonte (2010), is a novel method for achieving efficient estimation in multivariate linear regression. It identifies the material and immaterial information in the data using the covariance structure among the responses; the subsequent analysis is based only on the material part and is therefore more efficient. The envelope estimator is consistent, but in the sample, the material part estimated by the envelope model consists of linear combinations of all the response variables, while in many applications it is important to pinpoint the response variables that are immaterial to the regression. For this purpose, we propose the sparse envelope model, which can identify these response variables and at the same time preserve the efficiency gains offered by the envelope model. A group-lasso type of penalty is employed to induce sparsity on the manifold structure of the envelope model. Consistency, the asymptotic distribution, and the oracle property of the estimator are established. In particular, new features of the oracle property with response selection are discussed. Simulation studies and an example demonstrate the effectiveness of this model.

Session 35: Post-Discontinuation Treatment in Randomized Clinical Trials

Marginal Structural Model with Adaptive Truncation in Estimating the Initial Treatment Effect with Informative Censoring by Subsequent Therapy
Jingyi Liu1, Li Li1, Xiaofei Bai2 and Douglas Faries1
1Eli Lilly and Company
2North Carolina State University
liu_jingyi@lilly.com

A randomized clinical trial is designed to estimate the direct effect of a treatment versus control, where patients receive the treatment of interest or control by random assignment. The treatment effect is measured by the comparison of endpoints of interest, e.g., overall survival. However, in some trials, patients who discontinue their initial randomized treatment are allowed to switch to another treatment based on the clinician's or patient's subjective decision. In such cases, the primary endpoint is censored, and the direct treatment effect of interest may be confounded by subsequent treatments, especially when the subsequent treatments have a large impact on the endpoint. In such studies there usually exist variables that are both risk factors for the primary endpoint and predictors of initiation of subsequent treatment; such variables are called time-dependent confounders. When time-dependent confounders exist, traditional methods such as the intent-to-treat (ITT) analysis and the time-dependent Cox model may not adjust for them appropriately and may result in biased estimators. Marginal structural models (MSM) have been applied to estimate the causal treatment effect when the initial treatment effect is confounded by subsequent treatments. It has been shown that an MSM utilizing inverse propensity weighting generates consistent estimators when the other nuisance parameters are correctly modeled. However, the occurrence of very large weights can inflate the variance of the estimator, and consistency may not hold. The augmented MSM estimator was proposed to estimate the treatment effect more efficiently, but it may not perform as well as expected in the presence of large weights. In this paper, we propose a new method that estimates weights by adaptively truncating longitudinal weights in the MSM. This method sacrifices consistency but gains efficiency when large weights exist, without ad hoc selection and removal of observations with large weights. We conducted simulation studies to explore the performance of several different methods, including the ITT analysis, the Cox model, and the proposed method, with regard to bias, standard deviation, coverage rate of the confidence interval, and mean squared error (MSE) under various scenarios. We also applied these methods to a randomized open-label phase III study of patients with non-squamous non-small cell lung cancer.
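Plain, non-adaptive weight truncation, the baseline that adaptive truncation refines, simply caps the inverse-probability weights at a chosen quantile; the 99th-percentile cutoff below is a common convention, not the authors' adaptive rule, and the propensity scores are simulated:

```python
import numpy as np

rng = np.random.default_rng(7)
ps = rng.uniform(0.02, 0.98, size=1000)   # fitted propensity scores (toy)
treated = rng.random(1000) < ps

# inverse-probability-of-treatment weights: 1/ps if treated, 1/(1-ps) if not
w = np.where(treated, 1.0 / ps, 1.0 / (1.0 - ps))

cap = np.percentile(w, 99)                # fixed truncation point
w_trunc = np.minimum(w, cap)

print(f"max weight before/after truncation: {w.max():.1f} / {w_trunc.max():.1f}")
```

Capping trades a small amount of bias for a large reduction in variance; the adaptive method described in the abstract chooses the truncation point from the data rather than fixing it in advance.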

Quantile Regression Adjusting for Dependent Censoring fromSemi-Competing RisksRuosha Li1 and Limin Peng2

1University of Pittsburgh
2Emory University
rul12@pitt.edu

In this work we study quantile regression when the response is an event time subject to potentially dependent censoring. We consider the semi-competing risks setting, where the time to censoring remains observable after the occurrence of the event of interest. While such a scenario frequently arises in biomedical studies, most current quantile regression methods for censored data are not applicable because they generally require the censoring time and the event time to be independent. By imposing rather mild assumptions on the association structure between the time-to-event response and the censoring time variable, we propose quantile regression procedures that allow us to garner a comprehensive view of the covariate effects on the event time outcome, as well as to examine the informativeness of censoring. An efficient and stable algorithm is provided for implementing the new method. We establish the asymptotic properties of the resulting estimators, including uniform consistency and weak convergence. Extensive simulation studies suggest that the proposed method performs well with moderate sample sizes. We illustrate the practical utility of our proposals through an application to a bone marrow transplant trial.

Overview of Crossover Design
Ming Zhu
AbbVie Inc
zhuming83@gmail.com
Crossover designs are used in many clinical trials. Compared to a conventional parallel design, a crossover design has the advantage of avoiding comparability issues between study and control groups with regard to potential confounding variables. Moreover, a crossover design is more efficient than a parallel design in that it requires a smaller sample size for given type I and type II error rates. However, a crossover design may suffer from carry-over effects, which can bias the interpretation of the data analysis. In this presentation I will discuss general considerations and pitfalls to be avoided in the planning and analysis of a crossover trial. Appropriate statistical methods for crossover trial analysis will also be described.

Cross-Payer Effects of Medicaid LTSS on Medicare Resource Use using Propensity Score Risk Profiling
Yi Huang, Anthony Tucker and Karen Johnson
University of Maryland
yihuang@umbc.edu
Medicaid administrators look to establish a better balance between long-term services and supports (LTSS) provided in the community and in institutions, and to better integrate acute and long-term care for recipients who are dually eligible for Medicare. Programs of integrated care will require a solid understanding of the interactive effects that are masked in the separation of Medicare and Medicaid. This paper aims to evaluate the causal effect of Maryland's Older Adult Waiver (OAW) program on Medicare spending using a propensity score based health risk profiling technique. Specifically, dually eligible recipients enrolled in Maryland's OAW program were identified as the treatment group, and matched "control" groups were drawn from a comparable population who did not receive those services. The broader impact of this study is that such statistical approaches can be developed by any state to facilitate the improvement of quality and cost effectiveness of LTSS for duals.

Session 36 New Advances in Semi-Parametric Modeling and Survival Analysis

Bayesian Partial Linear Model for Skewed Longitudinal Data
Yuanyuan Tang1, Debajyoti Sinha2, Debdeep Pati2, Stuart Lipsitz3

and Steven Lipshultz4
1AbbVie Inc
2Florida State University
3Brigham and Women's Hospital
4University of Miami
debdeep@stat.fsu.edu
Current statistical models and methods focusing on the mean response are not appropriate for longitudinal studies with a heavily skewed continuous response. For such longitudinal responses, we present a novel model accommodating a partially linear median regression function, a flexible Dirichlet process mixture prior for the skewed error distribution, and a within-subject association structure. We provide theoretical justifications for our methods, including asymptotic properties of the posterior and the semi-parametric Bayes estimators. We also provide simulation studies of finite sample properties. Ease of computational implementation via available MCMC tools, and other advantages of our method compared to existing methods, are illustrated via analysis of a cardiotoxicity study of children of HIV-infected mothers.

62 2014 ICSA-KISS Applied Statistics Symposium Portland June 15-18

Abstracts

Nonparametric Inference for Inverse Probability Weighted Estimators with a Randomly Truncated Sample
Xu Zhang
University of Mississippi
xzhang2@umc.edu

A randomly truncated sample appears when the independent variables T and L are observable only if L < T. The truncated-version Kaplan-Meier estimator is known to be the standard estimation method for the marginal distribution of T or L. The inverse probability weighted (IPW) estimator was suggested as an alternative, and its agreement with the truncated-version Kaplan-Meier estimator has been proved. This paper centers on the weak convergence of IPW estimators and variance decomposition. The paper shows that the asymptotic variance of an IPW estimator can be decomposed into two sources: the variation for the IPW estimator using known weight functions is the primary source, and the variation due to estimated weights should be included as well. Variance decomposition establishes the connection between a truncated sample and a biased sample with known probabilities of selection. A simulation study was conducted to investigate the practical performance of the proposed variance estimators, as well as the relative magnitude of the two sources of variation for various truncation rates. A blood transfusion data set is analyzed to illustrate the nonparametric inference discussed in the paper.

Modeling Time-Varying Effects for High-Dimensional Covariates: A New Gateaux-Differential Boosting Approach
Kevin He, Yi Li and Ji Zhu
University of Michigan
yili@umich.edu

Survival models with time-varying effects provide a flexible framework for modeling the effects of covariates on event times. However, the difficulty of model construction increases dramatically as the number of variables grows. Existing constrained optimization and boosting methods suffer from computational complexity. We propose a new Gateaux-differential-based boosting procedure for simultaneously selecting covariates and automatically determining their functional forms. The proposed method is flexible in that it extends gradient boosting to functional differentials in a general parameter space. In each boosting step of this procedure, only the best-fitting base-learner (and therefore the most informative covariate) is added to the predictor, which consequently encourages sparsity. In addition, the method controls smoothness, which is crucial for improving predictive performance. The performance of the proposed method is examined by simulations and by an application to national kidney transplant data.

Flexible Modeling of Survival Data with Covariates Subject to Detection Limits via Multiple Imputation
Paul Bernhardt1, Judy Wang2 and Daowen Zhang2

1Villanova University
2North Carolina State University
dzhang2@ncsu.edu

Models for survival data generally assume that covariates are fully observed. However, in medical studies it is not uncommon for biomarkers to be censored at known detection limits. A computationally efficient multiple imputation procedure for modeling survival data with covariates subject to detection limits is proposed. This procedure is developed in the context of an accelerated failure time model with a flexible seminonparametric error distribution. An iterative version of the proposed multiple imputation algorithm that approximates the EM algorithm for maximum likelihood is suggested. Simulation studies demonstrate that the proposed multiple imputation methods work well, while alternative methods lead to estimates that are either biased or more variable. The proposed methods are applied to analyze the dataset from the recently conducted GenIMS study.

Session 37 High-Dimensional Data Analysis: Theory and Application

Structured Functional Additive Regression in Reproducing Kernel Hilbert Spaces
Hao Helen Zhang
University of Arizona
hzhang@math.arizona.edu
A new class of semiparametric functional regression models is considered to jointly model functional and non-functional predictors, identifying important scalar covariates while taking into account the functional covariate. In particular, we exploit a unified linear structure to incorporate the functional predictor, as in classical functional linear models, which is nonparametric in nature. At the same time, we include a potentially large number of scalar predictors as the parametric part, which may be reduced to a sparse representation. The new method performs variable selection and estimation by naturally combining functional principal component analysis (FPCA) and SCAD-penalized regression under one framework. Theoretical and empirical investigation reveals that efficient estimation of the important scalar predictors can be obtained and enjoys the oracle property, despite contamination by the noise-prone functional covariate. The study also sheds light on the influence of the number of eigenfunctions used to model the functional predictor on the correctness of model selection and the accuracy of the scalar estimates.

High-Dimensional Thresholded Regression and Shrinkage Effect
Zemin Zheng, Yingying Fan and Jinchi Lv
University of Southern California
zeminzhe@usc.edu
High-dimensional sparse modeling via regularization provides a powerful tool for analyzing large-scale data sets and obtaining meaningful, interpretable models. The use of nonconvex penalty functions shows advantages in selecting important features in high dimensions, but the global optimality of such methods still demands more understanding. In this paper we consider sparse regression with a hard-thresholding penalty, which we show to give rise to thresholded regression. This approach is motivated by its close connection with L0-regularization, which can be unrealistic to implement in practice but has appealing sampling properties, and by its computational advantage. Under some mild regularity conditions allowing possibly exponentially growing dimensionality, we establish oracle inequalities for the resulting regularized estimator, as the global minimizer, under various prediction and variable selection losses, as well as oracle risk inequalities for the hard-thresholded estimator followed by a further L2-regularization. The risk properties exhibit interesting shrinkage effects under both estimation and prediction losses. We identify the optimal choice of the ridge parameter, which is shown to have simultaneous advantages for both the L2-loss and the prediction loss. These new results and phenomena are evidenced by simulation and real data examples.
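
As a concrete reference point for hard-thresholding, the sketch below implements plain iterative hard thresholding (IHT) for sparse least squares. It is a textbook stand-in, not the authors' estimator or their ridge-refitted variant; the step size and iteration count are illustrative choices.

```python
import numpy as np

def iterative_hard_thresholding(X, y, k, n_iter=200, step=None):
    """Sparse regression by keeping only the k largest-magnitude
    coefficients after each gradient step (generic IHT sketch)."""
    n, p = X.shape
    if step is None:
        step = 1.0 / np.linalg.norm(X, 2) ** 2  # 1 / spectral norm squared
    beta = np.zeros(p)
    for _ in range(n_iter):
        beta = beta + step * X.T @ (y - X @ beta)  # gradient step
        keep = np.argsort(np.abs(beta))[-k:]       # top-k support
        thresholded = np.zeros(p)
        thresholded[keep] = beta[keep]
        beta = thresholded                         # hard-threshold the rest
    return beta
```

In favorable settings (noiseless data, a well-conditioned design) the iterates recover the true support exactly, which is the intuition behind the oracle-type results the abstract describes.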

Local Independence Feature Screening for Nonparametric and Semiparametric Models by Marginal Empirical Likelihood
Jinyuan Chang1, Cheng Yong Tang2 and Yichao Wu3

1University of Melbourne
2University of Colorado Denver
3North Carolina State University
chengyongtang@ucdenver.edu
We consider an independence feature screening method for identifying contributing explanatory variables in high-dimensional regression analysis. Our approach is constructed by using the empirical likelihood approach in conjunction with marginal nonparametric regressions to surely capture the local impacts of explanatory variables. Without requiring a specific parametric form of the underlying data model, our approach can be applied to a broad range of representative nonparametric and semi-parametric models, including but not limited to nonparametric additive models, single-index and multiple-index models, and varying coefficient models. Facilitated by the marginal empirical likelihood, our approach addresses the independence feature screening problem with a new insight, by directly assessing evidence from the data on whether an explanatory variable contributes locally to the response variable or not. This feature avoids the estimation step in most existing independence screening approaches and is advantageous in scenarios, such as single-index models, where identification of the marginal effect for its estimation is an issue. Theoretical analysis shows that the proposed feature screening approach can handle data dimensionality growing exponentially with the sample size. Through extensive theoretical illustrations and empirical examples, we show that the local independence screening approach works promisingly.

The Fused Kolmogorov Filter: A Nonparametric Model-Free Screening Method
Qing Mai1 and Hui Zou2

1Florida State University
2University of Minnesota
mai@stat.fsu.edu
A new model-free screening method, named the fused Kolmogorov filter, is proposed for high-dimensional data analysis. This new method is fully nonparametric and can work with many types of covariates and response variables, including continuous, discrete and categorical variables. We apply the fused Kolmogorov filter to variable screening problems emerging in a wide range of applications, such as multiclass classification, nonparametric regression and Poisson regression, among others. It is shown that the fused Kolmogorov filter enjoys the sure screening property under weak regularity conditions that are much milder than those required by many existing nonparametric screening methods. In particular, the fused Kolmogorov filter can still be powerful when covariates are strongly dependent on each other. We further demonstrate the superior performance of the fused Kolmogorov filter over existing screening methods by simulations and real data examples.
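
For intuition, a bare-bones (unfused) Kolmogorov-filter-style screen for a binary response can be written as follows: rank covariates by the two-sample Kolmogorov-Smirnov distance between the class-conditional distributions. The fused filter of the abstract generalizes this by slicing a general response and fusing the resulting statistics; the function below is only an illustrative simplification with names of our choosing.

```python
import numpy as np
from scipy.stats import ks_2samp

def kolmogorov_screen(X, y, top=10):
    """Rank covariates by the Kolmogorov-Smirnov distance between
    the conditional distributions of X[:, j] given y = 0 and y = 1.
    (Illustrative simplification of Kolmogorov-filter screening.)"""
    stats = np.array([ks_2samp(X[y == 0, j], X[y == 1, j]).statistic
                      for j in range(X.shape[1])])
    return np.argsort(stats)[::-1][:top]  # indices of strongest covariates
```

Because the KS statistic depends only on ranks, the screen is invariant to monotone transformations of each covariate, which reflects the model-free flavor the abstract emphasizes.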

Session 38 Leading Across Boundaries: Leadership Development for Statisticians

Xiao-Li Meng1, Dipak Dey2, Soonmin Park3, James Hung4, Walter Offen5

1Harvard University
2University of Connecticut
3Eli Lilly and Company
4United States Food and Drug Administration
5AbbVie Inc
1meng@stat.harvard.edu
2dipakdey@uconn.edu

3park_soomin@lilly.com
4hsienminghung@fda.hhs.gov
5walteroffen@abbvie.com
The role of the statistician has long been valued in interdisciplinary collaboration. Nevertheless, statisticians are often regarded as contributors more than leaders. This stereotype has limited statistics as a driving perspective in partnership environments and the inclusion of statisticians in executive decision making. More leadership skills are needed to prepare statisticians to play influential roles and to promote our profession to be more impactful. In this panel session, statistician leaders from academia, government and industry will share their insights about leadership and their experiences in leading in their respective positions. Important leadership skills and qualities for statisticians will be discussed by the panelists. This session is targeted at statisticians who seek more knowledge and inspiration about leadership.

Session 39 Recent Advances in Adaptive Designs in Early Phase Trials

A Toxicity-Adaptive Isotonic Design for Combination Therapy in Oncology
Rui Qin
Mayo Clinic
qinrui@mayo.edu
With the development of molecularly targeted drugs in cancer treatment, combination therapy targeting multiple pathways to achieve potential synergy has become increasingly popular. While the dosing range of an individual drug may already be defined, the maximum tolerated dose of the combination therapy is yet to be determined in a new phase I trial. The possible dose level combinations, which are only partially ordered, pose a great challenge for conventional dose-finding designs.
We have proposed to estimate toxicity probabilities by isotonic regression and to incorporate the attribution of toxicity into the consideration of dose escalation and de-escalation for combination therapy. Simulation studies are conducted to understand and assess its operating characteristics under various scenarios. The application of this novel design to an ongoing phase I clinical trial with dual agents is further illustrated as an example.

Calibration of the Likelihood Continual Reassessment Method for Phase I Clinical Trials
Shing Lee1, Xiaoyu Jia2 and Ying Kuen Cheung1

1Columbia University
2Boehringer Ingelheim Pharmaceuticals
sml2114@columbia.edu
The likelihood continual reassessment method is an adaptive model-based design used to estimate the maximum tolerated dose in phase I clinical trials. The method is generally implemented in a two-stage approach, whereby model-based dose escalation is activated after an initial sequence of patients is treated. While it has been shown that the method has good large sample properties, in finite sample settings it is important to specify a reasonable model. We propose a systematic approach to selecting the initial dose sequence and the skeleton based on the concepts of indifference interval and coherence. We compare these approaches to the traditional trial-and-error approach in the context of examples. The systematic calibration approach simplifies the model calibration process for the likelihood continual reassessment method while being competitive with a time-consuming trial-and-error process. We also share our experience using the calibration technique in real-life applications using the dfcrm package in R.

Sequential Subset Selection Procedure of Random Subset Size for Early Phase Clinical Trials
Cheng-Shiun Leu and Bruce Levin
Columbia University
cl94@columbia.edu
In early phase clinical trials, the objective is often to select a subset of promising candidate treatments, whose treatment effects are greater than those of the remaining candidates by at least a pre-specified amount, to bring forward for phase III confirmatory testing. Under certain constraints, such as budgetary limitations or difficulty of recruitment, a procedure that selects a subset of fixed, pre-specified size is entirely appropriate, especially when the number of treatments available for further testing is limited. However, clinicians and researchers often wish to identify all efficacious treatments in the screening process, and a subset selection of fixed size may not satisfy this requirement, as the number of efficacious treatments is unknown prior to the experiment. To address this issue, we discuss a family of sequential subset selection procedures which identify a subset of efficacious treatments of random size, thereby avoiding the need to pre-specify the subset size. Various versions of the procedure allow adaptive sequential elimination of inferior treatments and sequential recruitment of superior treatments as the experiment progresses. We compare these new procedures by simulation with Gupta's random-subset-size procedure for selecting the one best candidate.

Search Procedures for the MTD in Phase I Trials
Shelemyahu Zacks
Binghamton University
shelly@math.binghamton.edu
There are several competing methods of search for the MTD in phase I cancer clinical trials. The paper will review some procedures and compare their operating characteristics. In particular, the EWOC method of Rogatko et al. will be highlighted.

Session 40 High Dimensional Regression/Machine Learning

Variable Selection for High-Dimensional Nonparametric Ordinary Differential Equation Models With Applications to Dynamic Gene Regulatory Networks
Hongqi Xue1, Tao Lu2, Hua Liang3 and Hulin Wu1

1University of Rochester
2State University of New York at Albany
3George Washington University
hongqi_xue@urmc.rochester.edu
The gene regulation network (GRN) is a high-dimensional complex system, which can be represented by various mathematical or statistical models. The ordinary differential equation (ODE) model is one of the popular dynamic GRN models. High-dimensional linear ODE models have been proposed to identify GRNs, but with the limitation of assuming linear regulation effects. We propose a nonparametric additive ODE model, coupled with two-stage smoothing-based ODE estimation methods and adaptive group LASSO techniques, to model dynamic GRNs in a way that flexibly handles nonlinear regulation effects. The asymptotic properties of the proposed method are established under the "large p, small n" setting. Simulation studies are performed to validate the proposed approach. An application to identifying the nonlinear dynamic GRN of T-cell activation is used to illustrate the usefulness of the proposed method.

BigData: Sign Cauchy Projections and Chi-Square Kernels
Ping Li1, Gennady Samorodnitsky2 and John Hopcroft2
1Rutgers University
2Cornell University
pingli98@gmail.com
The method of stable random projections is useful for efficiently approximating the lα distance in high dimensions, and it is naturally suitable for data streams. In this paper we propose to use only the signs of the projected data for α = 1 (i.e., Cauchy random projections); we show that the probability of collision can be accurately approximated as a function of the chi-square (χ2) similarity. In text and vision applications, the χ2 similarity is a popular measure when the features are generated from histograms (which are a typical example of data streams). Experiments confirm that the proposed method is promising for large-scale learning applications. The full paper is available at arXiv:1308.1009.
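
The collision idea in this abstract is easy to simulate. The sketch below estimates the sign-collision probability under Cauchy random projections and computes the chi-square similarity of two histograms; the function names are ours, and this is a Monte Carlo simulation of the quantity studied, not the paper's closed-form approximation.

```python
import numpy as np

def chi2_similarity(u, v):
    """Chi-square similarity of two nonnegative, sum-one histograms."""
    return float(np.sum(2.0 * u * v / (u + v + 1e-300)))

def sign_collision_rate(u, v, k=5000, seed=0):
    """Monte Carlo estimate of Pr[sign(<r,u>) != sign(<r,v>)] with r
    drawn i.i.d. standard Cauchy (i.e., alpha = 1 stable projections)."""
    rng = np.random.default_rng(seed)
    R = rng.standard_cauchy((k, len(u)))   # k Cauchy projection vectors
    return float(np.mean(np.sign(R @ u) != np.sign(R @ v)))
```

For nearly identical histograms the chi-square similarity is close to 1 and the collision rate is close to 0; as the histograms diverge, both move together, which is the relationship the paper quantifies.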

A Sparse Linear Discriminant Analysis Method with Asymptotic Optimality for Multiclass Classification
Ruiyan Luo and Xin Qi
Georgia State University
rluo@gsu.edu
Recently, many sparse linear discriminant analysis methods have been proposed to overcome the major problems of classic linear discriminant analysis in high-dimensional settings. However, asymptotic optimality results are limited to the case of two classes, where the classification boundary of LDA is a hyperplane and explicit formulas exist for the classification error. We propose an efficient sparse linear discriminant analysis method for multiclass classification. In practice, this method can control the relationship between the sparse components and hence has improved prediction accuracy compared to other methods, in both simulation and case studies. In theory, we derive asymptotic optimality for our method as dimensionality and sample size go to infinity, with an arbitrary fixed number of classes.

Generalized Hidden Markov Model for Variant Detection
Yichen Cheng, James Dai and Charles Kooperberg
Fred Hutchinson Cancer Research Center
ycheng@fhcrc.org
Developments in next-generation sequencing technology enable the detection of both common and rare variants. Genome-wide association studies (GWAS) benefit greatly from this fast-growing technology. Although many associations between variants and disease have been found for common variants, new methods for detecting functional rare variants are still urgently needed. Among existing methods, efforts have been made to increase detection power through set-based tests. However, none of these methods makes a distinction between functional variants and neutral variants (i.e., variants that have no effect on the disease). In this paper, we propose to model the effects from a set (for example, a gene) of variants as a hidden Markov model (HMM). For each SNP, we model the effect as a mixture of 0 and θ, where θ is the true effect size. The mixture setup accounts for the fact that a proportion of the variants are neutral. Another advantage of using an HMM is that it can account for possible association between neighboring variants. Our method works well for both linear and logistic models. Within the HMM framework, we test one component against more components and derive the asymptotic distribution under the null hypothesis. We show that our proposed method compares well to competitors under various scenarios.

Large-Scale Joint Trait Risk Prediction for Mini-Exome Sequence Data
Gengxin Li
Wright State University
gengxinli@wright.edu
The empirical Bayes classification method is a useful risk prediction approach for microarray data, but it is challenging to apply this method to risk prediction with mini-exome sequencing data. A major advantage of this method is that the effect size distribution for the set of possible features is empirically estimated, and all subsequent parameter estimation and risk prediction are guided by this distribution. Here we generalize Efron's method to allow for some of the peculiarities of mini-exome sequencing data. In particular, we incorporate quantitative trait information into the binary trait prediction model, proposing a new model named the Joint Trait Model, and we further allow this model to properly incorporate the annotation information of single nucleotide polymorphisms (SNPs). In the course of our analysis we examine several aspects of the possible simulation model, including the identity of the most important genes, the differing effects of synonymous and non-synonymous SNPs, and the relative roles of covariates and genes in conferring disease risk. Finally, we compare the three methods to each other and to other classifiers.

Rank Estimation and Recovery of Low-Rank Matrices for a Factor Model with Heteroscedastic Noise
Jingshu Wang and Art B. Owen
Stanford University
wangjingshususan@gmail.com
We consider recovery of low-rank matrices from noisy data with heteroscedastic noise. We use an early-stopping alternating method (ESAM), which iteratively alternates between estimating the noise variance and the low-rank matrix, and corrects over-fitting with an early-stopping rule. Various simulations in our study suggest stopping after just 3 iterations, and we have seen that ESAM gives better recovery than the SVD on either the original data or the standardized data when the optimal rank is given. To select a rank, we use an early-stopping bi-cross-validation (BCV) technique, modified from BCV for the white noise model. Our method leaves out half the rows and half the columns, as in BCV, but uses low-rank operations involving ESAM instead of the SVD on the retained data to predict the held-out entries. Simulations covering both strong and weak signal cases show that our method is the most accurate overall compared to several BCV strategies and two versions of Parallel Analysis (PA). PA is a state-of-the-art method for choosing the number of factors in Factor Analysis.

Session 41 Distributional Inference and Its Impact on Statistical Theory and Practice

Stat Wars Episode IV: A New Hope (For Objective Inference)
Keli Liu and Xiao-Li Meng
Harvard University
meng@stat.harvard.edu
A long time ago, in a galaxy far, far away (pre-war England)...
It is a period of uncivil debate. Rebel statisticians, striking from an agricultural station, have won their first victory against the evil Bayesian Empire.
A plea was made, "Help me, R. A. Fisher, you're my only hope,"

and Fiducial was born. It promised posterior probability statements on parameters without a prior, but at the seeming cost of violating basic probability laws. Was Fisher crazy, or did madness mask innovation? Fiducial calculations can be easily understood through the missing-data perspective, which illuminates a trinity of missing insights:
I. The Bayesian prior becomes an infinite-dimensional nuisance parameter to be dealt with using partial likelihood.
II. A Missing At Random (MAR) condition naturally characterizes when exact Fiducial solutions exist.
III. Understanding the "multi-phase" structure underlying Fiducial inference leads to the development of approximate Fiducial procedures which remain robust to prior misspecification.
In the years after its introduction, Fiducial's critics branded it "Fisher's biggest blunder." But in the great words of Obi-Wan, "If you strike me down, I shall become more powerful than you can possibly imagine."
To be continued: Episode V, Ancillarity Paradoxes Strike Back (At Fiducial), and Episode VI, Return of the Fiducialist, will premiere respectively at the IMS Asia Pacific Rim Meeting in Taipei (June 30-July 3, 2014) and at the IMS Annual Meeting in Sydney (July 7-11, 2014).

Higher Order Asymptotics for Generalized Fiducial Inference
Abhishek Pal Majumdar and Jan Hannig
University of North Carolina at Chapel Hill
janhannig@unc.edu
R. A. Fisher's fiducial inference has been the subject of many discussions and controversies ever since he introduced the idea during the 1930s. The idea experienced a bumpy ride, to say the least, during its early years, and one can safely say that it eventually fell into disfavor among mainstream statisticians. However, it appears to have made a resurgence recently under various names and modifications. For example, under the new name of generalized inference, fiducial inference has proved to be a useful tool for deriving statistical procedures for problems where frequentist methods with good properties were previously unavailable. Therefore, we believe that the fiducial argument of R. A. Fisher deserves a fresh look from a new angle. In this talk we investigate the properties of the generalized fiducial distribution using higher order asymptotics and provide suggestions on some open issues in fiducial inference, such as the choice of the data generating equation.

Generalized Inferential Models
Ryan Martin
University of Illinois at Chicago
rgmartin@uic.edu
The new inferential model (IM) framework provides prior-free probabilistic inference which is valid for all models and all sample sizes. The construction of an IM requires specification of an association that links the observable data to the parameter of interest and an unobservable auxiliary variable. This specification can be challenging, however, particularly when the parameter is more than one-dimensional. In this talk, I will present a generalized (or "black-box") IM that bypasses full specification of the association, and the challenges it entails, by working with an association based on a scalar-valued, parameter-dependent function of the data. Theory and examples demonstrate that this method gives exact and efficient prior-free probabilistic inference in a wide variety of problems.

Formal Definition of Reference Priors under a General Class of Divergence
Dongchu Sun
University of Missouri

sund@missouri.edu
Reference analysis produces objective Bayesian inference, in the sense that inferential statements depend only on the assumed model and the available data, and the prior distribution used to make an inference is least informative in a certain information-theoretic sense. Berger, Bernardo and Sun (2009) derived reference priors rigorously in contexts under the Kullback-Leibler divergence. In special cases with common support and other regularity conditions, Ghosh, Mergel and Liu (2011) derived a general f-divergence criterion for prior selection. We generalize Ghosh, Mergel and Liu's (2011) results to the case without common support and show how an explicit expression for the reference prior can be obtained under posterior consistency. The explicit expression can be used to derive new reference priors, both analytically and numerically.

Session 42 Applications of Spatial Modeling and Imaging Data

Spatial Bayesian Variable Selection and Shrinkage in High-Dimensional Covariate Spaces with Application to fMRI
Fan Li1, Tingting Zhang (Co-first author)2, Quanli Wang1 and James Coan2

1Duke University
2University of Virginia
tz3b@virginia.edu
Multi-subject functional magnetic resonance imaging (fMRI) data provide opportunities to study population-wide relationships between human brain activity and individual biological or behavioral traits. But statistical modeling, analysis and computation for such massive and noisy data, with a complicated spatio-temporal correlation structure, are extremely challenging. In this article, within the framework of Bayesian stochastic search variable selection, we propose a joint Ising and Dirichlet process (Ising-DP) prior to achieve selection of spatially correlated brain voxels that are predictive of individual responses. The Ising component of the prior utilizes the spatial information between voxels, and the DP component shrinks the coefficients of the large number of voxels to a small set of values, greatly reducing the posterior computational burden. To address the phase transition phenomenon of the Ising prior, we propose a new analytic approach to derive bounds for the hyperparameters, illustrated on 2- and 3-dimensional lattices. The proposed method is compared with several alternative methods via simulations and is applied to the fMRI data collected from the Kiff hand-holding experiment.

A Hierarchical Model for Simultaneous Detection and Estimation in Multi-Subject fMRI Studies
David Degras1 and Martin Lindquist2
1DePaul University
2Johns Hopkins University
ddegrasv@depaul.edu
In this paper we introduce a new hierarchical model for the simultaneous detection of brain activation and estimation of the shape of the hemodynamic response in multi-subject fMRI studies. The proposed approach circumvents a major stumbling block in standard multi-subject fMRI data analysis, in that it allows the shape of the hemodynamic response function to vary across regions and subjects while still providing a straightforward way to estimate population-level activation. An efficient estimation algorithm is presented, as is an inferential framework that allows not only for tests of activation but also for tests of deviations from some canonical shape. The model is validated through simulations and application to a multi-subject fMRI study of thermal pain.

On the Relevance of Accounting for Spatial Correlation: A Case Study from Florida
Linda J. Young1 and Emily Leary2
1USDA NASS RDD
2University of Florida
lindayoung@nass.usda.gov
Identifying the potential impact of climate change is of increasing interest. As an example, understanding the effects of changing temperature patterns on crops, animals and public health is important if mitigation or adaptation strategies are to be developed. Here the consequences of the increasing frequency and intensity of heat waves are considered. First, four decades of temperature data are used to identify heat waves for the six National Weather Service regions within Florida. During these forty years, each temperature monitor has some days for which no data were recorded. The presence of missing data has largely been ignored in this setting, and analyses have been conducted based on observed data. Alternatively, time series models, spatial models or space-time models could be used to impute the missing data. Here the effects of the treatment of missing data on the identification of heat waves, and the subsequent inference related to the impact of heat waves on public health, are explored.

Statistical Approaches for Calibration of Climate Models
Gabriel Huerta1, Charles Jackson2 and Alvaro Nosedal1
1University of New Mexico
2University of Texas at Austin
ghuerta@stat.unm.edu
We consider some recent developments for dealing with climate models that rely on various modern computational and statistical strategies. First, we consider various posterior sampling strategies to study a surrogate model that approximates a climate response through the Earth's orbital parameters. In particular, we show that for certain metrics of model skill, Adaptive/Delayed Rejection MCMC methods are effective for estimating parametric uncertainties and resolving inverse problems for climate models. We will also discuss some of the High Performance Computing efforts that are taking place to calibrate various inputs to the NCAR Community Atmosphere Model (CAM). Finally, we show how to characterize output from a Regional Climate Model through hierarchical modelling that combines Gauss Markov Random Fields (GMRF) with MCMC methods and allows estimation of the probability distributions that underlie phenomena represented by the climate output.

Session 43 Recent Development in Survival Analysis and Statistical Genetics

Restricted Survival Time and Non-proportional Hazards
Zhigang Zhang
Memorial Sloan Kettering Cancer Center
zhangz@mskcc.org
In this talk I will present some recent developments on restricted survival time and its usage, especially when the proportional hazards assumption is violated. Technical advances and numerical studies will both be discussed.

Empirical Null Using Mixture Distributions and Its Application in Local False Discovery Rate
DoHwan Park

2014 ICSA-KISS Applied Statistics Symposium Portland June 15-18 67

Abstracts

University of Maryland
dhpark@umbc.edu
When high-dimensional data are given, it is often of interest to distinguish the significant (non-null, Ha) group from the non-significant (null, H0) group in a mixture of the two, while controlling the type I error rate. One popular way to control the level is the false discovery rate (FDR). This talk considers a method based on the local false discovery rate. In most previous studies, the null group is commonly assumed to follow a normal distribution. However, if the null distribution departs from normal, there may be too many or too few false discoveries (cases that belong to the null but are rejected by the test), leading to a failure to control the FDR at the given level. We propose a novel approach which enriches the class of null distributions using mixture distributions. We provide real examples of gene expression data, fMRI data and protein domain data to illustrate the problems and give an overview.
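As a rough illustration of the local false discovery rate with a mixture-distribution empirical null, consider the sketch below. All densities, mixture weights, the null proportion, and the 0.2 cutoff are invented for the example (in practice they would be estimated, e.g. by EM on the central z-scores); this is not the authors' estimator.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Simulated z-scores: a null group that is heavier-tailed than N(0,1),
# plus a small non-null group centered at 3.
null = np.where(rng.random(1900) < 0.8,
                rng.normal(0.0, 1.0, 1900),
                rng.normal(0.0, 1.5, 1900))
z = np.concatenate([null, rng.normal(3.0, 1.0, 100)])

# Empirical null modeled as a two-component normal mixture; weights and
# scales are taken as known here purely for illustration.
def f0(x):
    return 0.8 * stats.norm.pdf(x, 0.0, 1.0) + 0.2 * stats.norm.pdf(x, 0.0, 1.5)

f = stats.gaussian_kde(z)        # estimate of the marginal density
pi0 = 0.95                       # assumed null proportion

lfdr = np.clip(pi0 * f0(z) / f(z), 0.0, 1.0)   # local false discovery rate
discoveries = z[lfdr < 0.2]
```

Using a single N(0,1) null here would misstate `lfdr` in the tails, which is the failure mode the abstract describes.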

A Bayesian Illness-Death Model for the Analysis of Correlated Semi-Competing Risks Data
Kyu Ha Lee1, Sebastien Haneuse1, Deborah Schrag2 and Francesca Dominici1
1Harvard University
2Dana-Farber Cancer Institute
klee@hsph.harvard.edu
Readmission rates are a major target of healthcare policy because readmission is common, costly and potentially avoidable, and hence is seen as an adverse outcome. Therefore, the Centers for Medicare and Medicaid Services currently use 30-day readmission as a proxy outcome for quality of care for a number of health conditions. However, focusing solely on readmission rates in conditions with poor prognosis, such as pancreatic cancer, oversimplifies a situation in which patients may die before being readmitted, which clearly is also an adverse outcome. In such situations, healthcare policy should consider both readmission and death rates simultaneously. To this end, our proposed Bayesian framework adopts an illness-death model to represent three transitions for pancreatic cancer patients recently discharged from initial hospitalization: (1) discharge to readmission; (2) discharge to death; and (3) readmission to death. Dependence between the two event times (readmission and death) is induced via a subject-specific shared frailty. Our proposed method further extends the model to situations where patients within a hospital may be correlated due to unobserved characteristics. We illustrate the practical utility of our proposed method using data from Medicare Part A on 100% of Medicare enrollees from 01/2000 to 12/2010.

Detection of Chromosome Copy Number Variations in Multiple Sequences
Xiaoyi Min, Chi Song and Heping Zhang
Yale University
xiaoyimin@yale.edu
DNA copy number variation (CNV) is a form of genomic structural variation that may affect human diseases. Identification of the CNVs shared by many people in the population, as well as determination of the carriers of these CNVs, is essential for understanding the role of CNVs in disease association studies. For detecting CNVs in single samples, a Screening and Ranking Algorithm (SaRa) was previously proposed, which was shown to be superior to other commonly used algorithms and to have a sure coverage property. We extend SaRa to address the problem of common CNV detection in multiple samples. In particular, we propose an adaptive Fisher's method for combining the screening statistics across samples. The proposed multi-sample SaRa method inherits the computational and practical benefits of single-sample SaRa in CNV detection. We also characterize the theoretical properties of this method and demonstrate its performance in extensive numerical analyses.
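A caricature of the two ingredients — a local screening statistic per sample and an adaptive Fisher-type combination across samples — might look as follows. The window size, simulation, and the exact form of the combination are invented for illustration; the published SaRa and adaptive Fisher procedures differ in detail.

```python
import numpy as np
from scipy import stats

def screening_stat(y, h):
    """Local diagnostic statistic: standardized difference between the means
    of the h observations to the right and left of each position (unit noise SD)."""
    n = len(y)
    D = np.zeros(n)
    for t in range(h, n - h):
        D[t] = (y[t:t + h].mean() - y[t - h:t].mean()) / np.sqrt(2.0 / h)
    return D

def adaptive_fisher(z):
    """Simplified adaptive Fisher combination across samples: sort two-sided
    p-values, form partial sums of -2*log(p), and report the smallest
    chi-square tail probability over the partial sums."""
    p = np.sort(2 * stats.norm.sf(np.abs(z)))
    partial = -2 * np.cumsum(np.log(p))
    k = np.arange(1, len(p) + 1)
    return stats.chi2.sf(partial, 2 * k).min()

rng = np.random.default_rng(1)
n, h = 200, 10
samples = rng.normal(size=(5, n))
samples[:3, 100:130] += 2.0          # a CNV segment carried by 3 of 5 samples

D = np.array([screening_stat(y, h) for y in samples])
combined = np.array([adaptive_fisher(D[:, t]) for t in range(n)])
change_pt = int(combined[h:n - h].argmin()) + h   # strongest shared change point
```

The adaptive combination lets a change point carried by only a subset of samples dominate, which a plain all-samples Fisher sum would dilute.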

Session 44 Bayesian Methods and Applications in Clinical Trials with Small Population

Applications of Bayesian Meta-Analytic Approach at Novartis
Qiuling (Ally) He, Roland Fisch and David Ohlssen
Novartis Pharmaceuticals Corporation
allyhe@novartis.com
Conducting an ethical, efficient and cost-effective clinical trial has always been challenged by the availability of a limited study population. Bayesian approaches have many appealing features for studies with small sample sizes, and their importance has been recognized by health authorities. Novartis has been actively developing and implementing Bayesian methods at different stages of clinical development, in both oncology and non-oncology settings. This presentation focuses on two applications of the Bayesian meta-analytic approach. Both applications explore the relevant historical studies and establish a meta-analysis to generate inferences that can be utilized by the concurrent studies. The first example synthesized historical control information in a proof-of-concept study; the second application extrapolated efficacy from a source to a target population for registration purposes. In both applications, Bayesian methods are shown to effectively reduce the sample size and duration of the studies, and consequently the resources invested.
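The core of the meta-analytic-predictive idea — borrowing historical control information into a new trial — can be sketched with a normal-normal approximation. All numbers, the assumed between-trial standard deviation tau, and the closed-form updates below are hypothetical simplifications for illustration, not Novartis' actual models (which place a prior on tau and use MCMC).

```python
import numpy as np

# Historical control log-odds estimates and their standard errors
# (hypothetical numbers for illustration).
theta_hat = np.array([-1.2, -0.9, -1.1, -1.4])
se = np.array([0.20, 0.25, 0.22, 0.30])

tau = 0.15  # assumed between-trial standard deviation

# Normal-normal meta-analysis with a flat prior on the population mean:
w = 1.0 / (se**2 + tau**2)
mu = np.sum(w * theta_hat) / np.sum(w)   # population mean of control effects
var_mu = 1.0 / np.sum(w)

# Meta-analytic-predictive (MAP) prior for the control arm of a NEW trial:
map_mean, map_var = mu, var_mu + tau**2

# Conjugate update with the new trial's own control estimate:
new_hat, new_se = -1.0, 0.35
post_var = 1.0 / (1.0 / map_var + 1.0 / new_se**2)
post_mean = post_var * (map_mean / map_var + new_hat / new_se**2)
```

The posterior is more precise than the new data alone, which is the mechanism behind the sample-size reduction described in the abstract.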

Subgroup-Based Adaptive (SUBA) Designs for Multi-Arm Biomarker Trials
Yanxun Xu1, Lorenzo Trippa2, Peter Mueller1 and Yuan Ji3
1University of Texas at Austin
2Harvard University
3University of Texas at Austin
yxustat@gmail.com
Targeted therapies based on biomarker profiling are becoming a mainstream direction of cancer research and treatment. Depending on the expression of specific prognostic biomarkers, targeted therapies assign different cancer drugs to subgroups of patients, even if they are diagnosed with the same type of cancer by traditional means such as tumor location. For example, Herceptin is only indicated for the subgroup of patients with HER2+ breast cancer, but not for other types of breast cancer. However, subgroups like HER2+ breast cancer with effective targeted therapies are rare, and most cancer drugs are still being applied to large patient populations that include many patients who might not respond or benefit. Also, the response to targeted agents in humans is usually unpredictable. To address these issues, we propose SUBA, subgroup-based adaptive designs that simultaneously search for prognostic subgroups and allocate patients adaptively to the best subgroup-specific treatments throughout the course of the trial. The main features of SUBA include the continuous reclassification of patient subgroups based on a random partition model and the adaptive allocation of patients to the best treatment arm based on posterior predictive probabilities. We compare the SUBA design with three alternative designs, including equal randomization, outcome-adaptive randomization and a design based on a probit regression. In simulation studies we find that SUBA compares favorably against the alternatives.

Innovative Designs and Practical Considerations for Pediatric Studies
Alan Y. Chiang
Eli Lilly and Company
chiangay@lilly.com
Despite representing a fundamental step toward the efficacious and safe use of drugs, the conduct of clinical trials in children poses several problems. Methodological issues and ethical concerns represent the major obstacles that have traditionally limited research in small populations. The randomized controlled trial, the mainstay of clinical studies for assessing the effects of any therapeutic intervention, shows some weaknesses which make it scarcely applicable to small populations. Alternative and innovative approaches to clinical trial design in small populations have been developed over recent decades with the aim of overcoming the limits related to small samples and to the acceptability of the trial. These features make them particularly appealing for the pediatric population and for patients with rare diseases. This presentation aims to provide a variety of designs and analysis methods to assess efficacy and safety in pediatric studies, including their applicability, advantages, disadvantages and real case examples. Approaches include Bayesian designs, borrowing information from other studies, and more innovative approaches. Thanks to these features, such methods may rationally limit the amount of experimentation in small populations to what is achievable, necessary and ethical, and present a reliable way of ultimately improving patient care.

Session 45 Recent Developments in Assessing Predictive Models in Survival Analysis

partDSA for Deriving Survival Risk Groups, Ensemble Learning and Variable Selection
Annette Molinaro1, Adam Olshen1 and Robert Strawderman2
1University of California at San Francisco
2University of Rochester
molinaroa@neurosurg.ucsf.edu
We recently developed partDSA, a multivariate method that, similarly to CART, utilizes loss functions to select and partition predictor variables to build a tree-like regression model for a given outcome. However, unlike CART, partDSA permits both 'and' and 'or' conjunctions of predictors, elucidating interactions between variables as well as their independent contributions. partDSA thus permits tremendous flexibility in the construction of predictive models and has been shown to supersede CART in both prediction accuracy and stability. As the resulting models continue to take the form of a decision tree, partDSA also provides an ideal foundation for developing a clinician-friendly tool for accurate risk prediction and stratification.
With right-censored outcomes, partDSA currently builds estimators via either the Inverse Probability Censoring Weighted (IPCW) or Brier Score weighting schemes; see Lostritto, Strawderman and Molinaro (2012), where it is shown in numerous simulations that both proposed adaptations of partDSA perform as well as, and often considerably better than, two competing tree-based methods. In this talk, various useful extensions of partDSA for right-censored outcomes are described, and we show the power of the partDSA algorithm in deriving survival risk groups for glioma patients based on genomic markers. Another interesting extension of partDSA is as an aggregate learner. A comparison will be made of standard partDSA to an ensemble version of partDSA, as well as to alternative ensemble learners, in terms of prediction accuracy and variable selection.

Predictive Accuracy of Time-Dependent Markers for Survival Outcomes
Li Chen1, Donglin Zeng2 and Danyu Lin2
1University of Kentucky
2University of North Carolina at Chapel Hill
lichenuky@uky.edu
In clinical cohort studies, potentially censored times to a certain event, such as death or disease progression, and patient characteristics at the time of diagnosis or the time of inclusion in the study (baseline) are often recorded. Serial measurements of clinical markers during follow-up may also be recorded for monitoring purposes. Recently there has been increasing interest in incorporating these serial marker measurements into the prediction of future survival outcomes and in assessing the predictive accuracy of these time-dependent markers. In this paper we propose a new graphical measure, the negative predictive function, to quantify the predictive accuracy of time-dependent markers for survival outcomes. This new measure has direct relevance to patient survival probabilities and thus direct clinical utility. We construct a nonparametric estimator for the proposed measure, allowing censoring to depend on the markers, and adopt the bootstrap method to obtain the asymptotic variances. Simulation studies demonstrate that the proposed method performs well in practical situations. One medical study is presented.

Estimating the Effectiveness in HIV Prevention Trials by Incorporating the Exposure Process: Application to HPTN 035 Data
Jingyang Zhang1 and Elizabeth R. Brown2
1Fred Hutchinson Cancer Research Center
2Fred Hutchinson Cancer Research Center/University of Washington
jzhang2@fhcrc.org
Estimating the effectiveness of a new intervention is usually the primary objective of HIV prevention trials. The Cox proportional hazards model is mainly used to estimate effectiveness, by assuming that participants share the same risk under the covariates and that the risk is always non-zero. In fact, the risk is non-zero only when an exposure event occurs, and participants can have varying risks of transmission due to varying patterns of exposure events. Therefore, we propose a novel estimate of effectiveness adjusted for the heterogeneity in the magnitude of exposure among the study population, using a latent Poisson process model for the exposure path of each participant. Moreover, our model considers the scenario in which a proportion of participants never experience an exposure event, and adopts a zero-inflated distribution for the rate of the exposure process. We employ a Bayesian estimation approach to estimate the exposure-adjusted effectiveness, eliciting the priors from historical information. Simulation studies are carried out to validate the approach and explore the properties of the estimates. An application example is presented from an HIV prevention trial.

Estimation of Predictive Accuracy of Survival Regression Models Adjusting for Dependent Censoring and/or High-Dimensional Data
Ming Wang1 and Qi Long2
1Penn State College of Medicine
2Emory University
mwang@phs.psu.edu
In practice, prediction models for cancer risk and prognosis play an important role in priority cancer research, and evaluating and comparing different models using predictive accuracy metrics in the presence of censored data is of substantive interest, with adjustment for the censoring mechanism. To address this issue, we evaluate via numerical studies two existing metrics, the concordance (c) statistic and the weighted c-statistic, which adopts an inverse-probability weighting technique, under circumstances with a dependent censoring mechanism. The asymptotic properties of the weighted c-statistic, including consistency and normality, are theoretically and rigorously established. In particular, cases with high-dimensional prognostic factors (p moderately large) are investigated to assess strategies for estimating the censoring weights by utilizing a regularization approach with the lasso penalty. In addition, sensitivity analysis is theoretically and practically conducted to assess predictive accuracy in cases of informative censoring (i.e., not coarsened at random), using nonparametric estimates of the cumulative baseline hazard for the weights. Finally, a prostate cancer study is used to build and evaluate prediction models of future tumor recurrence after surgery.
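The inverse-probability-weighted concordance idea can be sketched as follows. This is a generic IPCW c-statistic for illustration, with a simulated cohort and the true censoring survival function plugged in; it is not the paper's exact estimator, which estimates the weights (possibly with lasso regularization) and treats dependent censoring.

```python
import numpy as np

def ipcw_cstat(time, event, risk, cens_surv):
    """Weighted c-statistic for right-censored data: comparable pairs are
    those where the earlier time is an observed event; each pair is weighted
    by 1/G(t_i)^2, with G the censoring survival function."""
    n = len(time)
    num = den = 0.0
    for i in range(n):
        if event[i] != 1:
            continue
        w = 1.0 / cens_surv(time[i]) ** 2
        for j in range(n):
            if time[j] > time[i]:            # subject i fails first
                den += w
                num += w * (risk[i] > risk[j])
    return num / den

rng = np.random.default_rng(2)
n = 400
x = rng.normal(size=n)                       # prognostic marker / risk score
t = rng.exponential(np.exp(-x))              # higher x -> shorter survival
c = rng.exponential(2.0, size=n)             # independent censoring
time, event = np.minimum(t, c), (t <= c).astype(int)
G = lambda s: np.exp(-s / 2.0)               # true censoring survival function

cstat = ipcw_cstat(time, event, x, G)        # should clearly exceed 0.5
```

The weights undo the preferential loss of long survivors to censoring; the unweighted complete-pair c-statistic would be tied to the censoring distribution.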

Session 46 Missing Data: the Interface between Survey Sampling and Biostatistics

Likelihood-based Inference with Missing Data Under Missing-at-random
Shu Yang and Jae Kwang Kim
Iowa State University
shuyang@iastate.edu
Likelihood-based inference with missing data is challenging because the observed log-likelihood has an integral form. Approximating the integral by Monte Carlo sampling does not necessarily lead to valid inference, because the Monte Carlo samples are generated from a distribution with a fixed parameter value. We consider an alternative approach based on the parametric fractional imputation of Kim (2011). In the proposed method, the dependence of the integral on the parameter is properly reflected through fractional weights. We discuss constructing a confidence interval using the profile likelihood ratio test. A Newton-Raphson algorithm is employed to find the interval end points. Two limited simulation studies show the advantage of likelihood-based inference over Wald-type inference in terms of power, parameter-space conformity and computational efficiency. A real data example on salamander mating (McCullagh and Nelder, 1989) shows that our method also works well with high-dimensional missing data.

Generalized Method of Moments Estimator Based on Semiparametric Quantile Regression Imputation
Cindy Yu and Senniang Chen
Iowa State University
snchen@iastate.edu
In this article we consider an imputation method for handling missing response values based on semiparametric quantile regression estimation. In the proposed method, the missing response values are generated using the estimated conditional quantile regression function at given values of the covariates. We adopt the generalized method of moments for estimation of parameters defined through a general estimating equation. We demonstrate that the proposed estimator, combining both semiparametric quantile regression imputation and the generalized method of moments, is an effective alternative for parameter estimation when missing data are present. The consistency and asymptotic normality of our estimators are established, and variance estimation is provided. Results from limited simulation studies are presented to show the adequacy of the proposed method.
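The imputation step can be sketched as below: fit quantile regressions on a grid of quantile levels using the complete cases, then impute each missing response from the estimated conditional quantile function at a randomly drawn level, approximating a draw from the conditional distribution. The linear (rather than semiparametric) quantile model, the direct check-loss minimization, and the MCAR simulation are simplifications for illustration, not the paper's procedure.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
n = 500
x = rng.uniform(0, 2, n)
y = 1.0 + 2.0 * x + rng.normal(0, 0.5, n)   # true mean of y is 3
miss = rng.random(n) < 0.3                  # 30% of responses missing

def qreg(x_obs, y_obs, tau):
    """Linear quantile regression by direct minimization of the check loss
    (dedicated LP solvers are used in practice)."""
    def loss(b):
        r = y_obs - (b[0] + b[1] * x_obs)
        return np.sum(np.maximum(tau * r, (tau - 1) * r))
    return minimize(loss, x0=np.array([0.0, 1.0]), method="Nelder-Mead").x

taus = np.linspace(0.05, 0.95, 19)
coefs = np.array([qreg(x[~miss], y[~miss], t) for t in taus])

# Impute each missing y from the fitted conditional quantile function at a
# uniformly drawn quantile level.
y_imp = y.copy()
u_idx = rng.integers(0, len(taus), miss.sum())
y_imp[miss] = coefs[u_idx, 0] + coefs[u_idx, 1] * x[miss]

est_mean = y_imp.mean()   # a simple moment estimate from the imputed data
```

Averaging imputations over uniformly drawn quantile levels recovers the conditional mean, so moment-type estimating equations remain approximately unbiased.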

A New Estimation with Minimum Trace of the Asymptotic Covariance Matrix for Incomplete Longitudinal Data with a Surrogate Process
Baojiang Chen1 and Jing Qin2
1University of Nebraska
2National Institutes of Health
baojiangchen@unmc.edu
Missing data are a very common problem in medical and social studies, especially when data are collected longitudinally. It is a challenging problem to utilize observed data effectively, and many papers on missing data problems can be found in the statistical literature. It is well known that the inverse weighted estimation is neither efficient nor robust. On the other hand, the doubly robust (DR) method can improve both efficiency and robustness. As is known, doubly robust estimation requires a missing data model (i.e., a model for the probability that data are observed) and a working regression model (i.e., a model for the outcome variable given covariates and surrogate variables). Since the DR estimating function has mean zero for any parameters in the working regression model when the missing data model is correctly specified, in this paper we derive a formula for the estimator of the parameters of the working regression model that yields the optimally efficient estimator of the marginal mean model (the parameters of interest) when the missing data model is correctly specified. Furthermore, the proposed method also inherits the doubly robust property. Simulation studies demonstrate the greater efficiency of the proposed method compared to the standard doubly robust method. A longitudinal dementia data set is used for illustration.
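The standard doubly robust construction the abstract builds on can be illustrated for a simple marginal mean with data missing at random. The simulated data, the OLS working model, and the use of the true response probabilities are all invented for the sketch; the paper's contribution is the optimal choice of working-model parameters, which is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 2000
x = rng.normal(size=n)
y = 2.0 + x + rng.normal(0, 1, n)            # true marginal mean E[y] = 2
pi = 1.0 / (1.0 + np.exp(-(0.5 + x)))        # response probability (MAR given x)
r = (rng.random(n) < pi).astype(float)       # r = 1 if y is observed

# Working outcome model: OLS of y on x, fit on the observed cases.
X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X[r == 1], y[r == 1], rcond=None)[0]
m = X @ beta                                 # predicted outcomes for everyone

# Doubly robust (AIPW) estimator of the marginal mean: the inverse-weighted
# term augmented by the outcome-model predictions.
mu_dr = np.mean(r * y / pi - (r - pi) / pi * m)

# Naive complete-case mean is biased upward here (high-x subjects respond more).
mu_cc = y[r == 1].mean()
```

The augmentation term has mean zero whenever the response model is correct, which is exactly the property the authors exploit to tune the working regression model for efficiency.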

Adaptive Multi-Phase Sampling for Asymptotically-Optimal Mean Score Analyses
Michael McIsaac1 and Richard Cook2
1Queen's University
2University of Waterloo
mcisaacm@queensu.ca
Response-dependent two-phase designs can ensure good statistical efficiency while working within resource constraints. Sampling schemes that are optimized for analyses based on mean score estimating equations have been shown to be highly efficient in a number of different settings, and are straightforward to implement if detailed population characteristics are known. I will present an adaptive multi-phase design which exploits information from an internal pilot study to approximate this optimal mean score design. These adaptive designs are easy to implement and result in large efficiency gains while keeping study costs low. The implementation of this design will be demonstrated using simulation studies motivated by an ongoing research program in rheumatology.

Session 47 New Statistical Methods for Comparative Effectiveness Research and Personalized Medicine

Efficient Design for Prospective Observational Studies
Yu Shen1, Hao Liu2, Jing Ning3 and Jing Qin4
1University of Texas MD Anderson Cancer Center
2Baylor College of Medicine
3University of Texas MD Anderson Cancer Center
4National Institutes of Health
yshen@mdanderson.org
Using data from large observational studies may fill the information gaps due to lack of evidence from randomized controlled trials. Such studies may inform real-world clinical scenarios and improve clinical decisions among various treatment strategies. However, the design and analysis of comparative effectiveness studies based on observational data are complex. In this work, we propose practical sample size and power calculation tools for prevalent cohort designs and suggest some efficient analysis methods as well.

Choice between Superiority and Non-inferiority in Comparative Effectiveness Experiments
Mei-Chiung Shih1, Tze Leung Lai2 and Philip W. Lavori2
1VA Cooperative Studies Program & Stanford University
2Stanford University
Mei-Chiung.Shih@va.gov
In designing a comparative effectiveness experiment, such as an active controlled clinical trial comparing a new treatment to an active control treatment, or a comparative effectiveness trial comparing treatments already in use, one sometimes has to choose between a superiority objective (to demonstrate that one treatment is more effective than the other active treatments) and a non-inferiority objective (to demonstrate that one treatment is no worse than the other active treatments within a pre-specified non-inferiority margin). It is often difficult to decide which study objective should be undertaken at the planning stage, when one does not have actual data on the comparative effectiveness of the treatments. In this talk we describe two adaptive design features for such trials: (1) adaptive choice between the superiority and non-inferiority objectives during interim analyses; (2) treatment selection instead of testing superiority. The latter aims to select treatments whose outcomes are close to that of the best treatment, by eliminating at interim analyses non-promising treatments that are unlikely to be much better than the observed best treatment.

An Adaptive Design Approach for Studying Dynamic Treatment Regimes in a Pragmatic Trials Setting
Mike Baiocchi, Jane Paik and Tze Lai
Stanford University
mikebaiocchi@gmail.com
The demand for rigorous studies of dynamic treatment regimens is increasing as medical providers treat larger numbers of patients with both multi-stage disease states and chronic care issues (for example, cancer treatments, pain management, depression, HIV). In this talk we will propose a trial design developed specifically to be run in a real-world clinical setting. These kinds of trials (sometimes called "pragmatic trials") have several advantages, which we will discuss. They also pose two major problems for analysis: (1) in running a randomized trial in a clinical setting there is an ethical imperative to provide patients with the best outcomes while still collecting information on the relative efficacy of treatment regimes, which means traditional trial designs are inadequate in providing guidance; and (2) real-world considerations such as informative censoring or missing data become substantial hurdles. We incorporate elements from both point-of-care randomized trials and multi-armed bandit theory and propose a unified method of trial design.

Improving Efficiency in the Estimation of the Effect of Treatment on a Multinomial Outcome
Iván Díaz, Michael Rosenblum and Elizabeth Colantuoni
Johns Hopkins University
idiaz@jhu.edu
We present a methodology to evaluate the causal effect of a binary treatment on a multinomial outcome when adjustment for covariates is desirable. Adjustment for baseline covariates may be desirable even in randomized trials, since covariates that are highly predictive of the outcome can substantially improve efficiency. We first present a targeted minimum loss based estimator of the vector of counterfactual probabilities. This estimator is doubly robust in observational studies, and it is consistent in randomized trials. Furthermore, it is locally semiparametric efficient under regularity conditions. We present a variation of the previous estimator that may be used in randomized trials and that is guaranteed to be asymptotically as efficient as the standard unadjusted estimator. We use these results to derive a nonparametric extension of the parameters in a proportional-odds model for ordinal-valued data and present a targeted minimum loss based estimator. This estimator is guaranteed to be asymptotically as or more efficient than the unadjusted estimator of the proportional-odds model. As a consequence, this nonparametric extension may be used to test the null hypothesis of no effect with potentially increased power. We present a motivating example and simulations using data from the MISTIE II clinical trial of a new surgical intervention for stroke. Joint work with Michael Rosenblum and Elizabeth Colantuoni.

Session 48 Student Award Session 1

Regularization After Retention in Ultrahigh Dimensional Linear Regression Models
Haolei Weng1, Yang Feng1 and Xingye Qiao2
1Columbia University
2Binghamton University
hw2375@columbia.edu
The lasso has proved to be a computationally tractable variable selection approach in high dimensional data analysis. However, in the ultrahigh dimensional setting, the conditions for model selection consistency can easily fail. The independence screening framework tackles this problem by reducing the dimensionality based on marginal correlations before performing the lasso. In this paper we propose a two-step approach that relaxes the consistency conditions of the lasso by using marginal information from a different perspective than independence screening: in particular, we retain significant variables rather than screening out irrelevant ones. The new method is shown to be model selection consistent in the ultrahigh dimensional linear regression model. A modified version is introduced to improve finite sample performance. Simulations and real data analysis show the advantages of our method over the lasso and independence screening.
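The retain-then-regularize idea can be sketched numerically: step 1 retains variables with large marginal correlation; step 2 runs the lasso with the retained variables left unpenalized. The retention cutoff, the penalty level, and the plain coordinate-descent solver below are all illustrative choices, not the paper's data-driven procedure.

```python
import numpy as np

def lasso_cd(X, y, lam, unpenalized=None, n_sweeps=30):
    """Plain coordinate-descent lasso; variables flagged in `unpenalized`
    (the retained set) receive no l1 shrinkage."""
    n, p = X.shape
    if unpenalized is None:
        unpenalized = np.zeros(p, dtype=bool)
    beta = np.zeros(p)
    r = y.astype(float).copy()          # residual, maintained incrementally
    col_ss = (X ** 2).sum(axis=0)
    for _ in range(n_sweeps):
        for j in range(p):
            if beta[j] != 0.0:
                r += X[:, j] * beta[j]  # remove j's current contribution
            z = X[:, j] @ r
            thr = 0.0 if unpenalized[j] else lam
            beta[j] = np.sign(z) * max(abs(z) - thr, 0.0) / col_ss[j]
            if beta[j] != 0.0:
                r -= X[:, j] * beta[j]
    return beta

rng = np.random.default_rng(5)
n, p = 100, 1000
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = [3.0, -2.0, 2.5]
y = X @ beta_true + rng.normal(size=n)

# Step 1 (retention): keep variables with large marginal correlation.
retained = np.abs(X.T @ y) / n > 1.5

# Step 2 (regularization after retention): lasso, retained set unpenalized.
beta_hat = lasso_cd(X, y, lam=25.0, unpenalized=retained)
support = np.abs(beta_hat) > 1e-6
```

Retention protects strong signals from shrinkage rather than discarding weak ones, which is the sense in which it inverts independence screening.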

Personalized Dose Finding Using Outcome Weighted Learning
Guanhua Chen1, Donglin Zeng1 and Michael R. Kosorok1
1University of North Carolina at Chapel Hill
guanhuac@live.unc.edu
In dose-finding clinical trials there is a growing recognition of the importance of considering individual-level heterogeneity when searching for optimal treatment doses. Such an optimal individualized treatment rule (ITR) for dosing should maximize the expected clinical benefit. In this paper we consider a randomized trial design where the candidate dose levels are continuous. To find the optimal ITR under such a design, we propose an outcome weighted learning method which directly maximizes the expected clinically beneficial outcome. This method converts the individualized dose selection problem into a penalized weighted regression with a truncated ℓ1 loss. A difference of convex functions (DC) algorithm is adopted to efficiently solve the associated non-convex optimization problem. The consistency and convergence rate of the estimated ITR are derived, and small-sample performance is evaluated via simulation studies. We demonstrate that the proposed method outperforms competing approaches. We illustrate the method using data from a clinical trial of Warfarin (an anti-thrombotic drug) dosing.
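The outcome-weighted objective for a continuous dose can be caricatured as follows: score a candidate rule by the average outcome of the subjects whose randomized dose happens to fall near the rule's recommendation. The simulated trial, the kernel width, the linear rule class, and the grid search (standing in for the paper's DC-algorithm optimization of the truncated-loss objective) are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 4000
x = rng.uniform(-1, 1, n)                 # patient covariate
a = rng.uniform(0, 2, n)                  # randomized continuous dose
opt = 1.0 + 0.5 * x                       # (unknown) truly optimal dose rule
y = 2.0 - (a - opt) ** 2 + rng.normal(0, 0.2, n)   # outcome peaks at opt

def value(b0, b1, h=0.2):
    """Kernel-smoothed value of the linear dose rule f(x) = b0 + b1*x:
    outcome-weighted average over subjects dosed within h of the rule
    (the dose density is uniform here, so the propensity is constant)."""
    near = np.abs(a - (b0 + b1 * x)) < h
    return y[near].mean() if near.any() else -np.inf

# Crude grid search over linear rules in place of the DC algorithm.
grid = np.linspace(-1, 2, 31)
b0_hat, b1_hat = max(((b0, b1) for b0 in grid for b1 in grid),
                     key=lambda b: value(*b))
```

Maximizing this value estimate is the "direct maximization of the expected clinical benefit" the abstract describes; the truncated ℓ1 loss in the paper plays the role of the hard `|a - f(x)| < h` window here.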

Survival Rates Prediction When Training Data and Target Data Have Different Measurement Error
Cheng Zheng and Yingye Zheng
Fred Hutchinson Cancer Research Center
zhengc68@uw.edu
Novel biologic markers have been widely used in predicting important clinical outcomes. One specific feature of biomarkers is that they are often ascertained with variation due to the measurement process. The magnitude of such variation may differ when the prediction algorithm (cutoffs) is applied to a different target population, or when the platform for biomarker assaying changes from the original platform on which the algorithm was based. Statistical methods have been proposed to characterize the effects of the underlying error-free quantity in association with an outcome, yet the impact of measurement error on prediction has not been well studied. In this manuscript we focus on settings in which biomarkers are used to predict an individual's future risk, and propose semiparametric estimators for error-corrected risk when replicates of the error-prone biomarkers are available. The predictive performance of the proposed estimators is evaluated and compared to alternative approaches in numerical studies, under various assumptions on the measurement distributions in the original cohort and in a future cohort to which the predictive rule is applied. We study the asymptotic properties of the proposed estimator. An application is made to a liver cancer biomarker study to predict the risk of 3- and 4-year liver cancer incidence using age and a novel biomarker, α-fetoprotein.

Hard Thresholded Regression via Linear Programming
Qiang Sun
University of North Carolina at Chapel Hill
qsun@live.unc.edu
The aim of this paper is to develop a hard thresholded regression (HTR) framework for simultaneous variable selection and unbiased estimation in high-dimensional linear regression. This new framework is motivated by its close connection with best subset selection under orthogonal designs, while enjoying several key computational and theoretical advantages over many existing penalization methods (e.g., SCAD or MCP). Computationally, HTR is a fast two-step estimation procedure, consisting of a first step that calculates a coarse initial estimator and a second step that solves a linear program. Theoretically, under some mild conditions, the HTR estimator is shown to enjoy the strong oracle property and the thresholded property even when the number of covariates grows at an exponential rate. We also propose to incorporate a regularized covariance estimator into the estimation procedure in order to better trade off noise accumulation against correlation modeling. In this scenario with a regularized covariance matrix, HTR includes Sure Independence Screening as a special case. Both simulation and real-data results show that HTR outperforms other state-of-the-art methods.
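The connection to best subset selection noted above rests on a classical identity: under an orthogonal design, best subset selection reduces to hard-thresholding the ordinary least squares coefficients. A minimal sketch of that identity only (not the paper's two-step linear-programming procedure):

```python
def hard_threshold(beta_ols, lam):
    """Keep a coefficient unchanged (hence unbiased) if it clears the
    threshold lam; otherwise set it exactly to zero."""
    return [b if abs(b) > lam else 0.0 for b in beta_ols]

# With orthonormal columns, beta_ols is simply X'y; thresholding it
# selects the best subset for a penalty level tied to lam.
selected = hard_threshold([2.3, -0.4, 1.1, 0.05], lam=0.5)
print(selected)  # [2.3, 0.0, 1.1, 0.0]
```

Unlike soft thresholding (lasso), the surviving coefficients are not shrunk, which is the "unbiased estimation" point made in the abstract.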

Session 49 Network Analysis: Unsupervised Methods

Community Detection in Multilayer Networks: A Hypothesis Testing Approach
James D. Wilson, Shankar Bhamidi and Andrew B. Nobel
University of North Carolina at Chapel Hill
jameswd@email.unc.edu
The identification of clusters in relational data, otherwise known as community detection, is an important and well-studied problem in undirected and directed networks. Importantly, the units of a complex system often share multiple types of pairwise relationships,

wherein a single community detection analysis does not account for the unique types, or layers. In this scenario, a sequence of networks can be used to model each type of relationship, resulting in a multilayer network structure. We propose and investigate a novel testing-based community detection procedure for multilayer networks. We show that, by borrowing strength across layers, our method is able to detect communities in scenarios that are impossible for contemporary detection methods. By investigating the performance and potential use of our method through simulations and applications to real multilayer networks, we show that our procedure can successfully identify significant community structure in the multilayer regime.

Network Enrichment Analysis with Incomplete Network Information
Jing Ma1, Ali Shojaie2 and George Michailidis1
1University of Michigan
2University of Washington
mjing@umich.edu

Pathway enrichment analysis has become a key tool for biomedical researchers to gain insight into the underlying biology of differentially expressed genes, proteins and metabolites. It reduces complexity and provides a systems-level view of changes in cellular activity in response to treatments and/or progression of disease states. Methods that use pathway topology information have been shown to outperform simpler methods based on over-representation analysis. However, despite significant progress in understanding the associations among members of biological pathways, and the expansion of knowledge databases such as the Kyoto Encyclopedia of Genes and Genomes, Reactome, BioCarta, etc., the existing network information may be incomplete or inaccurate and is not condition-specific. We propose a constrained network estimation framework that combines network estimation based on cell- and condition-specific omics data with interaction information from existing databases. The resulting pathway topology information is subsequently used to provide a framework for simultaneous testing of differences in mean expression levels as well as in interaction mechanisms. We study the asymptotic properties of the proposed network estimator and of the test for pathway enrichment, and investigate their small-sample performance in simulated experiments and on a bladder cancer study involving metabolomics data.

Estimation of a Linear Model with Fuzzy Data Treated as Special Functional Data
Wang Dabuxilatu
Guangzhou University
wangdabu@gzhu.edu.cn

Data which cannot be exactly described by means of numerical values, such as evaluations, medical diagnoses, quality ratings and vague economic items, to name but a few, are frequently classified as either nominal or ordinal. However, we should be aware that with such a representation of data (e.g., when the categories are labeled with numerical values), the statistical analysis is limited, and sometimes the interpretation and reliability of the conclusions are affected. An easy-to-use representation of such data through fuzzy values (fuzzy data) could be employed instead. The measurement scale of fuzzy values includes, in particular, real vectors and set values as special elements. It is more expressive than ordinal scales and more accurate than rounding or using real- or vector-valued codes. The transition between closely different values can be made gradually, and the variability, accuracy and possible subjectiveness can be well reflected in describing data. Fuzzy data could be viewed as special

functional data through the so-called support function of the data, as it establishes a useful embedding of the space of fuzzy data into a cone of a functional Hilbert space.
Simple linear regression models with fuzzy data have been studied from different perspectives and in different frameworks. Least squares estimates of real-valued and set-valued parameters under the generalized Hausdorff metric and the Hukuhara difference have been obtained. However, due to the nonlinearity of the space of fuzzy random sets, it is difficult to consider parameter estimation for a multivariate linear model with fuzzy random sets. We will treat fuzzy data as special functional data to estimate a multivariate linear model within a cone of a functional Hilbert space. As a case, we consider LR fuzzy random sets (LR fuzzy values, or LR fuzzy data), a sort of fuzzy data used to model the usual random experiments in which the characteristic observed on each result can be described by fuzzy numbers of a particular class, determined by three random variables (the center, the left spread and the right spread) under the given shape functions L and R. LR fuzzy random sets are widely applied in information science, decision making, operational research, and economic and financial modeling. Using a least squares approach, we obtain an estimate of the set-valued parameters of the multivariate regression model with LR fuzzy random sets under the L2 metric δ2,dLS; some bootstrap distributions for the spread variables of the fuzzy random residual term are given.
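A heavily simplified baseline for the LR setting above: reduce each fuzzy observation to its three describing variables (center, left spread, right spread) and fit each component with its own one-dimensional least squares. The paper's estimator works in a functional Hilbert space instead; everything below is an illustrative stand-in:

```python
def ols_1d(x, y):
    """Intercept and slope of simple least squares fit y ~ a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

# Toy LR fuzzy responses, one (center, left-spread) pair per subject;
# the right spread would be handled identically.
x       = [0.0, 1.0, 2.0, 3.0]
centers = [1.0, 3.0, 5.0, 7.0]   # exactly 1 + 2x
left    = [0.5, 0.5, 0.5, 0.5]   # constant spread
fits = [ols_1d(x, comp) for comp in (centers, left)]
print(fits)  # [(1.0, 2.0), (0.5, 0.0)]
```

Componentwise fitting ignores the cone constraint (spreads must stay nonnegative), which is one reason the functional-embedding approach in the abstract is needed in general.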

Efficient Estimation of Sparse Directed Acyclic Graphs under Compounded Poisson Data
Sung Won Han and Hua Zhong
New York University
sungwonhan2@gmail.com

Certain gene expression measurements, such as RNA-sequencing counts, are recorded as count data, which can be assumed to follow a compounded Poisson distribution. This presentation proposes an efficient heuristic algorithm to estimate the structure of directed acyclic graphs under an L1-penalized likelihood with Poisson log-normally distributed data, given that the variable ordering is unknown. To obtain a closed form of the penalized likelihood, we apply a Laplace integral approximation for the unobserved normal variables, and we use two iterative optimization steps to estimate the adjacency matrix and the unobserved parameters. The adjacency matrix is estimated by separable lasso problems, and the unobserved parameters of the normal distribution are estimated by separable optimization problems. Simulation results show that our proposed method performs better than the data transformation method in terms of true positive rate and Matthews correlation coefficient, except for low-count data with many zeros. Large data variance and a large number of variables benefit the proposed method more.

Asymptotically Normal and Efficient Estimation of Covariate-Adjusted Gaussian Graphical Model
Mengjie Chen, Zhao Ren, Hongyu Zhao and Harrison Zhou
Yale University
zhaoren@yale.edu

A tuning-free procedure is proposed to estimate the covariate-adjusted Gaussian graphical model. For each finite subgraph, this estimator is asymptotically normal and efficient; as a consequence, a confidence interval can be obtained for each edge. The procedure enjoys easy implementation and efficient computation through parallel estimation on subgraphs or edges. We further apply the asymptotic normality result to perform support recovery through edge-wise adaptive thresholding. This support recovery procedure

is called ANTAC, standing for Asymptotically Normal estimation with Thresholding after Adjusting Covariates. ANTAC outperforms other methodologies in the literature in a range of simulation studies. We apply ANTAC to identify gene-gene interactions using a yeast eQTL (genome-wide expression quantitative trait loci) dataset. Our result achieves better interpretability and accuracy in comparison with CAPME (covariate-adjusted precision matrix estimation), proposed by Cai, Li, Liu and Xie (2013). This is joint work with Mengjie Chen, Hongyu Zhao and Harrison Zhou.

Session 50 Personalized Medicine and Adaptive Design

MicroRNA Array Normalization
Li-Xuan Qin and Qin Zhou
Memorial Sloan Kettering Cancer Center
qinl@mskcc.org
MicroRNA microarrays possess a number of unique data features that challenge the assumptions key to many normalization methods. We assessed the performance of existing normalization methods using two Agilent microRNA array datasets derived from the same set of tumor samples: one dataset was generated using a blocked randomization design when assigning arrays to samples, and hence was free of confounding array effects; the second dataset was generated without blocking or randomization and exhibited array effects. The randomized dataset was assessed for differential expression between two tumor groups and treated as the benchmark. The non-randomized dataset was assessed for differential expression after normalization and compared against the benchmark. Normalization improved the true positive rate significantly, but the non-randomized data still possessed a false discovery rate as high as 50%, regardless of the specific normalization method applied. We performed simulation studies under various scenarios of differential expression patterns to assess the generalizability of our empirical observations.

Combining Multiple Biomarker Models with Covariates in Logistic Regression Using a Modified ARM (Adaptive Regression by Mixing) Approach
Yanping Qiu1 and Rong Liu2
1Merck & Co.
2Bayer HealthCare
rongliufl@gmail.com
Biomarkers are widely used as indicators of some biological state or condition in medical research. A single biomarker may not be sufficient to serve as an optimal screening device for early detection or prognosis of many diseases, and a combination of multiple biomarkers will usually lead to more sensitive screening rules. Therefore, great interest has arisen in developing methods for combining biomarkers, and a biomarker selection procedure is necessary for efficient detection. In this article, we propose a model-combining algorithm for classification with necessary covariates in biomarker studies. It selects the best models according to some criterion and considers weighted combinations of various logistic regression models via ARM (adaptive regression by mixing). The weights and the algorithm are justified using cross-validation methods. Simulation studies are performed to assess the finite-sample properties of the proposed model-combining method. It is illustrated with an application to data from a vaccine study.

A New Association Test for Case-Control GWAS Based on Disease Allele Selection
Zhongxue Chen

Indiana University
zc3@indiana.edu
Current robust association tests for case-control genome-wide association study (GWAS) data are mainly based on the assumption of some specific genetic model. Due to the richness of genetic models, this assumption may not be appropriate; therefore, robust but powerful association approaches are desirable. Here we propose a new approach to testing for association between genotype and phenotype in case-control GWAS. This method assumes a generalized genetic model and is based on the selected disease allele, so as to obtain a p-value from the more powerful one-sided test. Through a comprehensive simulation study, we assess the performance of the new test by comparing it with existing methods. Some real-data applications are used to illustrate the use of the proposed test. Based on the simulation results and real-data applications, the proposed test is powerful and robust.

On Classification Methods for Personalized Medicine and Individualized Treatment Rules
Daniel Rubin
United States Food and Drug Administration
Daniel.Rubin@fda.hhs.gov
An important problem in personalized medicine is to construct individualized treatment rules from clinical trials. Instead of recommending a single treatment for all patients, such a rule tailors treatments based on patient characteristics in order to optimize response to therapy. In a 2012 JASA article, Zhao et al. showed a connection between this problem of constructing an individualized treatment rule and binary classification. For instance, in a two-arm clinical trial with binary outcomes and 1:1 randomization, the problem of constructing an individualized treatment rule can be reduced to the classification problem in which one restricts to responders and builds a classifier that predicts subjects' treatment assignments. We extend this method to show an analogous reduction to the problem in which one restricts to non-responders and must build a classifier that predicts which treatments subjects were not assigned. We then use results from statistical efficiency theory to show how to efficiently combine the information from responders and non-responders. Simulations show the benefits of the new methodology.
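The responder reduction described above can be sketched very roughly: restrict to responders, build any classifier that predicts treatment assignment, and read its prediction as the treatment rule. A toy version using a per-stratum majority vote as a stand-in for a real classifier (all names below are illustrative, and this is not the paper's efficient combined estimator):

```python
from collections import Counter

def itr_from_responders(records):
    """records: (stratum, arm, responded) triples from a 1:1 randomized
    trial. Among responders only, the majority arm within each stratum
    is the classifier's prediction, which doubles as the treatment rule."""
    votes = {}
    for stratum, arm, responded in records:
        if responded:
            votes.setdefault(stratum, Counter())[arm] += 1
    return {s: c.most_common(1)[0][0] for s, c in votes.items()}

trial = [("young", "A", True), ("young", "A", True), ("young", "B", False),
         ("old", "B", True), ("old", "A", False), ("old", "B", True)]
print(itr_from_responders(trial))  # {'young': 'A', 'old': 'B'}
```

The 1:1 randomization matters: it makes "which arm did responders come from" an unbiased signal of which arm works better within each stratum.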

Bayesian Adaptive Design for Dose-Finding Studies with Delayed Binary Responses
Xiaobi Huang1 and Haoda Fu2

1Merck & Co.
2Eli Lilly and Company
xiaobihuang@merck.com
Bayesian adaptive design is a popular concept in recent dose-finding studies. The idea of adaptive design is to use accrued data to adapt or modify an ongoing trial in order to improve its efficiency. At the interim analysis, most current methods only use data from patients who have completed the study. However, in certain therapeutic areas such as diabetes and obesity, subjects are usually studied for months before a treatment effect is observed; thus a large proportion of them have not completed the study at the interim analysis. Incorporating only subjects who have completed the study at the interim analysis could lead to extensive information loss. Fu and Manner (2010) proposed a Bayesian integrated two-component prediction model to incorporate subjects who have not yet completed the study at the time of the interim analysis; this method showed efficiency gains with continuous delayed responses. In this paper, we extend the method to accommodate delayed binary responses and illustrate the Bayesian adaptive design through a simulation example.

Session 51 New Development in Functional Data Analysis

Variable Selection and Estimation for Longitudinal Survey Data
Li Wang1, Suojin Wang2 and Guannan Wang1

1University of Georgia
2Texas A&M University
guannan@uga.edu
There is wide interest in studying longitudinal surveys, in which sample subjects are observed successively over time. Longitudinal surveys are used in many areas today, for example in the health and social sciences, to explore relationships or to identify significant variables in regression settings. This paper develops a general strategy for the model selection problem in longitudinal sample surveys. A survey-weighted penalized estimating equation approach is proposed to select significant variables and estimate the coefficients simultaneously. The proposed estimators are design-consistent and perform as well as the oracle procedure when the correct submodel is known. The estimating function bootstrap is applied to obtain the standard errors of the estimated parameters with good accuracy. A fast and efficient variable selection algorithm is developed to identify significant variables in complex longitudinal survey data. Simulated examples illustrate the usefulness of the proposed methodology under various model settings and sampling designs.

Estimation of Nonlinear Differential Equation Model Using Generalized Smoothing
Inna Chervoneva1, Tatiyana V. Apanasovich2 and Boris Freydin1

1Thomas Jefferson University
2George Washington University
apanasovich@gwu.edu
In this work, we develop an ordinary differential equation (ODE) model of the physiological regulation of glycemia in type 1 diabetes mellitus (T1DM) patients in response to meals and intravenous insulin infusion. Unlike the majority of existing mathematical models of glucose-insulin dynamics, the parameters in our model are estimable from a relatively small number of noisy observations of plasma glucose and insulin concentrations. For estimation, we adopt the generalized smoothing estimation of nonlinear dynamic systems of Ramsay et al. (2007). In this framework, the ODE solution is approximated with a penalized spline, where the ODE model is incorporated in the penalty. We propose to optimize the generalized smoothing by using penalty weights that minimize the covariance penalties criterion (Efron, 2004). The covariance penalties criterion provides an estimate of the prediction error for nonlinear estimation rules resulting from nonlinear and/or non-homogeneous ODE models, such as our model of glucose-insulin dynamics. We also propose to select the optimal number and location of knots for the B-spline bases used to represent the ODE solution. The results of a small simulation study demonstrate the advantages of optimized generalized smoothing in terms of smaller estimation errors for the ODE parameters and smaller prediction errors for the solutions of the differential equations. Using the proposed approach to analyze the glucose and insulin concentration data in T1DM patients, we obtained a good approximation of global glucose-insulin dynamics and physiologically meaningful parameter estimates.

A Functional Data Approach to Modeling Brain Image Data
Yihong Zhao1, R. Todd Ogden2 and Huaihou Chen1

1New York University
2Columbia University
zhaoy05@nyumc.org

Resting-state functional magnetic resonance imaging (fMRI) is sensitive to functional brain changes related to many psychiatric disorders, and has thus become increasingly important in medical research. One useful approach to fitting linear models with scalar outcomes and image predictors involves transforming the functional data to the wavelet domain and converting the data fitting problem into a variable selection problem. Applying the LASSO procedure in this situation has been shown to be efficient and powerful. In this study, we explore possible directions for improving this method. The finite-sample performance of the proposed methods will be compared through simulations and real-data applications in mental health research. We believe applying these procedures can lead to improved estimation and prediction, as well as better stability. An illustration of modeling psychiatric traits based on brain-imaging data will be presented.

Estimation in Functional Linear Quantile Regression
Linglong Kong, Dengdeng Yu and Ivan Mizera
University of Alberta
lkong@ualberta.ca
We consider estimation in functional linear quantile regression, in which the dependent variable is scalar while the covariate is a function, and the conditional quantile for each fixed quantile index is modeled as a linear functional of the covariate. There are two common approaches for modeling the conditional mean as a linear functional of the covariate. One is to use the functional principal components of the covariates as a basis to represent the functional covariate effect; the other is to extend partial least squares to model the functional effect. The former is an unsupervised method and has been generalized to functional linear quantile regression; the latter is a supervised method and is superior to the unsupervised PCA method. In this talk, we propose to use partial quantile regression to estimate the functional effect in functional linear quantile regression. Asymptotic properties have been studied and show the virtue of our method in large samples. A simulation study is conducted to compare it with existing methods. A real-data example from a stroke study is analyzed, and some interesting findings are discovered.
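As background for the quantile modeling above (an aside, not the authors' partial quantile regression): the τ-th quantile is the minimizer of the total check (pinball) loss ρτ(u) = u(τ - 1{u<0}), which a direct search over candidate values makes concrete:

```python
def check_loss(u, tau):
    """Pinball loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def quantile_by_loss(y, tau):
    """The sample value minimizing total check loss is a tau-quantile."""
    return min(y, key=lambda q: sum(check_loss(yi - q, tau) for yi in y))

y = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(quantile_by_loss(y, 0.5))   # 5  (the median)
print(quantile_by_loss(y, 0.25))  # 3  (the lower quartile)
```

Quantile regression replaces the single candidate q with a linear (here, linear-functional) predictor and minimizes the same loss, which is what makes the extension from conditional means to conditional quantiles work.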

Session 52 Recent Regulatory/Industry Experience in Biosimilar Trial Designs

Statistical Considerations for the Development of Biosimilar Products
Nan Zhang and Eric Chi
Amgen Inc.
chi@amgen.com
As the patents of a growing number of biologic medicines have already expired or are due to expire, there has been increased interest from both the biopharmaceutical industry and the regulatory agencies in the development and approval of biosimilars. The EMA released the first general guideline on similar biological medicinal products in 2005, and specific guidelines for different drug classes subsequently. The FDA issued three draft guidelines on biosimilar product development in 2012. A synthesized message from these guidance documents is that, due to the fundamental differences between small molecule drug products and biologic drug products, which are made by living cells, the generic versions of biologic drug products are viewed as similar, rather than identical, to the innovator biologic drug product. Thus, more stringent requirements are necessary to demonstrate that there are no clinically meaningful differences between

the biosimilar product and the reference product in terms of safety, purity and potency. In this article, we briefly review statistical issues and challenges in the clinical development of biosimilars, including criteria for biosimilarity and interchangeability, selection of endpoints and determination of equivalence margins, equivalence versus non-inferiority, bridging and regional effects, and how to quantify the totality of the evidence.

New Analytical Methods for Non-Inferiority Trials: Covariate Adjustment and Sensitivity Analysis
Zhiwei Zhang, Lei Nie, Guoxing Soon and Bo Zhang
United States Food and Drug Administration
zhiweizhang@fda.hhs.gov
Even though an active-controlled trial provides no information about placebo, investigators and regulators often wonder how the experimental treatment would compare to placebo should a placebo arm be included in the study. Such an indirect comparison often requires a constancy assumption, namely that the control effect relative to placebo is constant across studies. When the constancy assumption is in doubt, there are ad hoc methods that "discount" the historical data in conservative ways. Recently, a covariate adjustment approach was proposed that does not require constancy or involve discounting, but rather attempts to adjust for any imbalances in covariates between the current and historical studies. This covariate-adjusted approach is valid under a conditional constancy assumption, which requires only that the control effect be constant within each subpopulation characterized by the observed covariates. Furthermore, a sensitivity analysis approach has been developed to address possible departures from the conditional constancy assumption due to imbalances in unmeasured covariates. This presentation describes these new approaches and illustrates them with examples.

Where is the Right Balance for Designing an Efficient Biosimilar Clinical Program? A Biostatistical Perspective on Appropriate Applications of Statistical Principles from New Drugs to Biosimilars
Yulan Li
Novartis Pharmaceuticals Corporation
yulanli@novartis.com

Challenges of Designing/Analyzing Trials for Hepatitis C Drugs
Greg Soon
United States Food and Drug Administration
Guoxing.Soon@fda.hhs.gov
There has been a surge in drug development to treat hepatitis C virus (HCV) infection in the past 3-4 years, and the landscape has shifted significantly. In particular, response rates have steadily increased from approximately 50% to now 90% for HCV genotype 1 patients during this time. While the changing landscape is beneficial for patients, it does lead to some new challenges for future HCV drug development, particularly in the choice of control, the winning criteria for efficacy, and the co-development of several drugs. In this talk I will summarize the current landscape of HCV drug development and describe some ongoing issues of interest.

GSK's Patient-level Data Sharing Program
Shuyen Ho
GlaxoSmithKline plc
shu-yen.y.ho@gsk.com
In May 2013, GSK launched an online system that enables researchers to request access to the anonymized patient-level data from published GSK-sponsored clinical trials of authorized or

terminated medicines, Phase I-IV. Consistent with expectations of good scientific practice, researchers can request access and are required to provide a scientific protocol with a commitment to publish their findings. An Independent Review Panel is responsible for approving or denying access to the data after reviewing a researcher's proposal. Once the request is approved and a signed Data Sharing Agreement is received, access to the requested data is provided on a password-protected website to help protect research participants' privacy. This program is a step toward the ultimate aim of the clinical research community of developing a broader system in which researchers will be able to access data from clinical trials conducted by different sponsors. This talk will describe some of the details of GSK's data-sharing program, including the opportunities and challenges it presents. We hope to bring this program to the awareness of ICSA/KISS symposium participants and to encourage researchers to take full advantage of it to further clinical research.

Session 53 Gatekeeping Procedures and Their Application in Pivotal Clinical Trials

A General Multistage Procedure for k-out-of-n Gatekeeping
Dong Xi1 and Ajit Tamhane2
1Novartis Pharmaceuticals Corporation
2Northwestern University
dongxi@novartis.com
We generalize a multistage procedure for parallel gatekeeping to what we refer to as k-out-of-n gatekeeping, in which at least k out of n hypotheses in a gatekeeper family must be rejected in order to test the hypotheses in the following family. This gatekeeping restriction arises in certain types of clinical trials; for example, in rheumatoid arthritis trials it is required that efficacy be shown on at least three of the four primary endpoints. We provide a unified theory of multistage procedures for arbitrary k, with k = 1 corresponding to parallel gatekeeping and k = n to serial gatekeeping. For this particular problem, the proposed procedure, which uses a stepwise algorithm, is simpler to apply than the mixture procedure and the graphical procedure with memory using entangled graphs.
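A heavily simplified sketch of the k-out-of-n gatekeeping restriction itself, using plain Bonferroni tests within each family (the paper's multistage procedure propagates alpha in a more refined way; the function name and the Bonferroni choice are illustrative assumptions):

```python
def k_out_of_n_gatekeeper(p_primary, p_secondary, k, alpha=0.05):
    """Bonferroni-test the gatekeeper family; the secondary family is
    tested (also by Bonferroni, for simplicity) only when at least k
    primary hypotheses are rejected."""
    n = len(p_primary)
    primary_rej = [p <= alpha / n for p in p_primary]
    if sum(primary_rej) < k:
        return primary_rej, [False] * len(p_secondary)
    m = len(p_secondary)
    return primary_rej, [p <= alpha / m for p in p_secondary]

# Four primary endpoints with the gate requiring k = 3 rejections,
# as in the rheumatoid arthritis example above.
prim, sec = k_out_of_n_gatekeeper([0.001, 0.004, 0.010, 0.400],
                                  [0.020, 0.030], k=3)
print(prim, sec)  # [True, True, True, False] [True, False]
```

With k = 1 the gate opens on any primary rejection (parallel gatekeeping); with k = n it demands all of them (serial gatekeeping), matching the two extremes described in the abstract.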

Multiple Comparisons in Complex Trial Designs
H.M. James Hung
United States Food and Drug Administration
hsienminghung@fda.hhs.gov
As the costs of clinical trials increase greatly, in addition to other considerations, the clinical development program increasingly involves more than one trial for assessing the treatment effect of a test drug, particularly on adverse clinical outcomes. A number of complex trial designs have been encountered in regulatory applications. In one scenario, the primary efficacy endpoint requires two positive trials to conclude a treatment effect, while the key secondary endpoint is a major adverse clinical endpoint, such as mortality, that needs to rely on integration of multiple trials in order to have sufficient statistical power to show the treatment effect. This presentation stipulates the potential utility of such a trial design and the challenging multiplicity issues for statistical inference.

Use of Bootstrapping in Adaptive Designs with Multiplicity Issues
Jeff Maca
Quintiles
jeffmaca@quintiles.com
When designing a clinical study, there are often many parameters which are either unknown or not known with the precision necessary to have confidence in the overall design. This has led sponsors to want to design studies which are adaptive in nature and can adjust for these design parameters by using data from the study to estimate them. As there are many different design parameters, which depend on the type of study, many different types of adaptive designs have been proposed. It is also possible that one of the issues in the design of the study is the optimal multiplicity strategy, which could be based on assumptions about the correlation of the multiple endpoints, something that is often very difficult to know prior to study start. The proposed methodology would use the data to estimate these parameters and correct for any inaccuracies in the assumptions.

Evaluating Commonly Used Multiple Testing Procedures in Drug Development
Michael Lee
Janssen Research & Development
mlee60@its.jnj.com
Multiplicity issues arise frequently in clinical trials with multiple endpoints and/or multiple doses. In drug development, because of regulatory requirements, control of the family-wise error rate (FWER) is essential in pivotal trials. Numerous multiple testing procedures that control the FWER in the strong sense are available in the literature. Particularly in the last decade, efficient testing procedures such as fallback procedures, gatekeeping procedures and the graphical approach have been proposed. Depending on the objectives of a study, one of these testing procedures can outperform the others. To understand which testing procedure is preferable under which circumstances, we use a simulation approach to evaluate the performance of a few commonly used multiple testing procedures. Evaluation results and recommendations will be presented.

Session 54 Approaches to Assessing Qualitative Interactions

Interval-Based Graphical Approach to Assessing Qualitative Interaction
Guohua Pan and Eun Young Suh
Johnson & Johnson
esuh@its.jnj.com
In clinical studies comparing treatments, the population often consists of subgroups of patients with different characteristics, and investigators often wish to know whether treatment effects are homogeneous over the various subgroups. Qualitative interaction occurs when the direction of the treatment effect varies among subgroups. In the presence of a qualitative interaction, treatment recommendation is often challenging. In medical research, and in applications to health authorities for the approval of new drugs, qualitative interaction and its impact need to be carefully evaluated. The initial statistical method for assessing qualitative interaction was developed by Gail and Simon (GS) in 1985 and has been incorporated into commercial statistical software such as SAS. While relatively often used, the GS method and its interpretation are not easily understood by medical researchers, and alternative approaches have been researched since then. One of the promising methods utilizes a graphical representation of specially devised intervals for the treatment effects in the subgroups. If some of the intervals are to the left and others to the right of a vertical line representing no treatment difference, there is statistical evidence of a qualitative interaction, and otherwise not. This feature, similar to the familiar forest plots by subgroups, is naturally appealing to clinical scientists for examining and understanding qualitative interactions. These specially devised intervals

are shorter than simultaneous confidence intervals for the treatment effects in the subgroups, and are shown to rival the GS method in statistical power. The method is easy to use and additionally provides an explicit power function, which the GS method lacks. This talk will review and contrast statistical methods for assessing qualitative interaction, with an emphasis on the graphical approach described above. Data from mega clinical trials on cardiovascular diseases will be analyzed to illustrate and compare the methods.
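For contrast with the graphical approach, the GS likelihood-ratio test has a short form: sum the squared z-scores of the positive subgroup effects into Q+, the negative ones into Q-, take min(Q+, Q-), and compute the p-value from a binomial mixture of chi-square tails. A sketch following the usual published statement of the test (function names are ours, and the chi-square survival function is computed by the standard integer-df recurrence):

```python
import math

def chi2_sf(x, df):
    """P(chi2_df > x) for integer df, via the recurrence
    sf(df) = sf(df-2) + (x/2)^(df/2-1) * exp(-x/2) / Gamma(df/2)."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    return chi2_sf(x, df - 2) + (x / 2.0) ** (df / 2.0 - 1) * \
        math.exp(-x / 2.0) / math.gamma(df / 2.0)

def gail_simon(effects, ses):
    """Gail-Simon test: min(Q+, Q-) referred to a binomial mixture of
    chi-square tail probabilities over 1..(n-1) degrees of freedom."""
    z2 = [(d / s) ** 2 for d, s in zip(effects, ses)]
    q_pos = sum(v for v, d in zip(z2, effects) if d > 0)
    q_neg = sum(v for v, d in zip(z2, effects) if d < 0)
    q = min(q_pos, q_neg)
    n = len(effects)
    return sum(math.comb(n - 1, h) * 0.5 ** (n - 1) * chi2_sf(q, h)
               for h in range(1, n))

# Two subgroups with opposite effects of equal precision.
p = gail_simon([1.0, -1.0], [0.5, 0.5])
print(round(p, 4))  # ~0.0228: evidence of qualitative interaction
```

With two subgroups this reduces to half a one-degree-of-freedom chi-square tail, which is why opposite-signed effects of modest size can already reach significance.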

Expected Variation and Quantitative and Qualitative Interaction in Clinical Trials
Xiaolong Luo
Celgene Corporation
xluo@celgene.com

Post hoc findings of unexpected heterogeneous treatment effects have been a challenge in the interpretation of clinical trials for sponsors, regulatory agencies, and medical practitioners. They are possible simply due to chance or due to fundamental treatment effect differentiation. Without repeating the resource-intensive clinical trials, it is critical to examine the framework of the given studies and to explore the likely model that may explain the overly simplified analyses. In this talk we will describe both theory and real clinical trials that can shed light on this complex and challenging issue.

A Bayesian Approach to Qualitative Interaction
Emine O. Bayman
University of Iowa
emine-bayman@uiowa.edu

Differences in treatment effects between centers in a multi-center trial may be important. These differences represent treatment-by-subgroup interaction. Qualitative interaction occurs when the simple treatment effect in one subgroup has a different sign than in another subgroup [1]; this interaction is important. Quantitative interaction occurs when the treatment effects are of the same sign in all subgroups and is often not important, because the treatment recommendation is identical in all subgroups.
A hierarchical model is used with exchangeable mean responses to each treatment between subgroups. A Bayesian test of qualitative interaction is developed [2] by calculating the posterior probability of qualitative interaction and the corresponding Bayes factor. The model is motivated by two multi-center trials with binary responses [3]. The frequentist power and type I error of the test using the Bayes factor are examined and compared with two other commonly used frequentist tests, the Gail and Simon [4] and Piantadosi and Gail [5] tests. The impact of imbalance between the sample sizes in each subgroup on power is examined under different scenarios. The method is implemented using WinBUGS and R and the R2WinBUGS interface.
REFERENCES
1. Peto R. Statistical aspects of cancer trials. In: Treatment of Cancer (Halnan KE, ed). London: Chapman & Hall, 1982, pp. 867-871.
2. Bayman EO, Chaloner K, Cowles MK. Detecting qualitative interaction: a Bayesian approach. Statistics in Medicine 2010; 29: 455-63.
3. Todd MM, Hindman BJ, Clarke WR, Torner JC; Intraoperative Hypothermia for Aneurysm Surgery Trial. Mild intraoperative hypothermia during surgery for intracranial aneurysm. New England Journal of Medicine 2005; 352: 135-45.
4. Gail M, Simon R. Testing for qualitative interactions between treatment effects and patient subsets. Biometrics 1985; 41: 361-372.
5. Piantadosi S, Gail MH. A comparison of the power of two tests for qualitative interactions. Statistics in Medicine 1993; 12: 1239-48.

Session 55 Interim Decision-Making in Phase II Trials

Evaluation of Interim Dose Selection Methods Using ROC Approach
Deli Wang, Lu Cui, Lanju Zhang and Bo Yang
AbbVie Inc.
deliwang@abbvie.com

Interim analyses may be planned to drop inefficacious dose(s) in dose-ranging clinical trials. Commonly used statistical methods for interim decision-making include conditional power (CP), predicted confidence interval (PCI), and predictive power (PP) approaches. For these widely used methods, it is worthwhile to take a closer look at their performance characteristics and their interconnected relationship. This research investigates the performance of these three statistical methods in terms of decision quality, based on a receiver operating characteristic (ROC) method, in the binary endpoint setting. More precisely, the performance of each method is studied based on calculated sensitivity and specificity under assumed ranges of desirable as well as undesirable outcomes. The preferred cutoff is determined, and performance comparisons across different methods can be made. With an apparent exchangeability of the three methods, a simple and uniform approach becomes possible.
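
As a toy illustration of the ROC idea (ours, with invented design constants, in a single-arm binary-endpoint setting): treat an interim rule "continue if conditional power exceeds a cutoff" as a binary classifier, compute its sensitivity under a desirable response rate and its specificity under an undesirable one, and sweep the cutoff.

```python
import numpy as np
from scipy.stats import binom

def conditional_power(x1, n1, n, target, p_assumed):
    """P(total responses >= target | x1 responses in first n1 patients),
    assuming the remaining n - n1 patients respond at rate p_assumed."""
    return binom.sf(target - x1 - 1, n - n1, p_assumed)

def sens_spec(cutoff, n1=20, n=40, target=14, p_good=0.4, p_bad=0.2):
    """Sensitivity = P(continue | true rate p_good);
    specificity = P(stop | true rate p_bad)."""
    x = np.arange(n1 + 1)                   # possible interim response counts
    go = conditional_power(x, n1, n, target, p_good) >= cutoff
    sens = binom.pmf(x, n1, p_good)[go].sum()
    spec = binom.pmf(x, n1, p_bad)[~go].sum()
    return sens, spec

# sweeping the cutoff traces out the ROC curve of the decision rule
roc = [sens_spec(c) for c in np.linspace(0, 1, 21)]
```

The preferred cutoff can then be read off the curve, e.g. by maximizing sensitivity plus specificity.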

Interim Monitoring for Futility Based on Probability of Success
Yijie Zhou1, Ruji Yao2, Bo Yang1 and Ram Suresh3

1 AbbVie Inc.
2 Merck & Co.
3 GlaxoSmithKline plc
yijiezhou@abbvie.com

Statistical significance has been the traditional focus of clinical trial design. However, an increasing emphasis has been placed on the magnitude of the treatment effect, based on point estimates, to enable cross-therapy comparison. The magnitude of the point estimate needed to demonstrate sufficient medical value when compared with existing therapies is typically larger than that needed to demonstrate statistical significance. Therefore, a new clinical trial design and its interim monitoring need to take into account trial success in terms of the magnitude of the point estimate. In this talk we propose a new interim monitoring approach for futility that targets the probability of trial success in terms of achieving a sufficiently large point estimate at the end of the trial. Simulation is conducted to evaluate the operating characteristics of this approach.

Bayesian Adaptive Design in Oncology Early Phase Trials
Yuehui Wu and Ramachandran Suresh
GlaxoSmithKline plc
yuehui2wu@gsk.com

Efficacy assessment is commonly seen in oncology trials as early as in the expansion cohort part of Phase I trials and in Phase II trials. Early detection of an efficacy or futility signal can greatly help the team make early decisions on future drug development plans, such as stopping for futility or starting late-phase planning. In order to achieve this goal, a Bayesian adaptive design utilizing predictive probability is implemented. This approach allows the team to monitor efficacy data constantly as each new patient's data become available and to make decisions before the end of the trial. The primary endpoint in oncology trials is usually overall survival or progression-free survival, which takes a long time to observe, so a surrogate endpoint such as overall


response rate is often used in early phase trials. Multiple boundaries for making future strategic decisions or for different endpoints can be provided. Simulations play a vital role in providing various decision-making boundaries as well as the corresponding operating characteristics. Based on simulation results, for each given sample size, the minimal sample size needed for the first interim look and the futility/efficacy boundaries will be provided based on Bayesian predictive probabilities. Details of the implementation of this design in real clinical trials will be demonstrated, and the pros and cons of this type of design will also be discussed.
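
A minimal sketch of the predictive-probability computation for a binary endpoint such as response rate, with a conjugate Beta prior (our illustration, not the authors' implementation; the prior, null rate, and confidence level are invented): future responses follow a beta-binomial distribution, and the trial "succeeds" if the final posterior puts at least `conf` probability above a null rate `p0`.

```python
from scipy.stats import betabinom, beta

def predictive_probability(x, n_cur, n_max, a=1.0, b=1.0, p0=0.2, conf=0.95):
    """Posterior predictive probability that, after all n_max patients,
    the posterior satisfies P(response rate > p0) >= conf, given x
    responses among the first n_cur patients and a Beta(a, b) prior."""
    m = n_max - n_cur                                   # patients remaining
    pp = 0.0
    for y in range(m + 1):                              # future responses
        w = betabinom.pmf(y, m, a + x, b + n_cur - x)   # predictive weight
        final_success = beta.sf(p0, a + x + y, b + n_max - x - y) >= conf
        pp += w * final_success
    return pp
```

Monitoring then amounts to recomputing this probability after each new patient and stopping for futility when it drops below a simulation-calibrated boundary.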

Session 56 Recent Advancement in Statistical Methods

Exact Inference: New Methods and Applications
Ian Dinwoodie
Portland State University
ihd@pdx.edu
Exact inference concerns methods that generalize Fisher's exact test for independence. The methods are exact in the sense that test statistics have distributions that do not depend on nuisance parameters, and asymptotic approximations are not used. However, computations are challenging and often require Monte Carlo methods. This talk gives an overview with attention to sampling techniques, including Markov chains and sequential importance sampling, with new applications to dynamical models and signalling networks.

Optimal Thresholds Criteria and Standard Criterion of VUS for ROC Surface
Chong Sun Hong
Sungkyunkwan University
cshong@skku.edu
Consider the ROC surface, which is a generalization of the ROC curve for three-class diagnostic problems. In this work we propose five criteria for the three-class ROC surface by extending the Youden index, the sum of sensitivity and specificity, the maximum vertical distance, the amended closest-to-(0,1), and the true rate. It may be concluded that these five criteria can be expressed as functions of two Kolmogorov-Smirnov (K-S) statistics. It is found that the paired optimal thresholds selected from the ROC surface are equivalent to the two optimal thresholds found from the two ROC curves. Moreover, we consider the volume under the ROC surface (VUS). The standard criterion of AUC for the probability of default based on Basel II is extended to the VUS for the ROC surface, so that a standard criterion of VUS for the classification model is proposed. The ranges of the AUC, K-S, and mean difference statistics corresponding to values of VUS for each class of the standard criteria are obtained. By exploring the relationships of these statistics, the standard criteria of VUS for the ROC surface can be established.
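
In the two-class special case, the connection between the Youden index and the K-S statistic can be seen directly from empirical CDFs. A sketch (our own, classifying "positive" when the marker exceeds the threshold; names are illustrative):

```python
import numpy as np

def youden_threshold(neg, pos):
    """Empirical Youden index J = max_t {sens(t) + spec(t) - 1} and the
    optimizing threshold. J equals the one-sided two-sample
    Kolmogorov-Smirnov distance between the two class distributions."""
    neg, pos = np.sort(np.asarray(neg)), np.sort(np.asarray(pos))
    ts = np.concatenate([neg, pos])
    ts.sort()                                             # candidate thresholds
    F_neg = np.searchsorted(neg, ts, side="right") / len(neg)  # specificity(t)
    F_pos = np.searchsorted(pos, ts, side="right") / len(pos)  # 1 - sensitivity(t)
    j = F_neg - F_pos
    i = int(np.argmax(j))
    return j[i], ts[i]
```

Perfectly separated classes give J = 1; identical class distributions give J near 0.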

Analysis of Cointegrated Models with Measurement Errors
Sung Ahn1, Hamwoom Hong2 and Sinsup Cho2

1 Washington State University
2 Seoul National University
ahn@wsu.edu
We study the asymptotic properties of the reduced-rank estimator of error correction models of vector processes observed with measurement errors. Although it is well known that there is no asymptotic measurement error bias when predictor variables are integrated processes in regression models (Phillips and Durlauf, 1986), we systematically investigate the effects of the measurement errors (in the dependent variables as well as in the predictor variables) on the estimation of not only the cointegrating vectors but also the speed-of-adjustment matrix. Furthermore, we present the asymptotic properties of the estimators. We also obtain the asymptotic distribution of the likelihood ratio test for the cointegrating ranks and investigate the effects of the measurement errors on the test through a Monte Carlo simulation study.

A Direct Method to Evaluate the Time-Dependent Predictive Accuracy for Biomarkers
Weining Shen, Jing Ning and Ying Yuan
University of Texas MD Anderson Cancer Center
wshen@mdanderson.org

Time-dependent areas under the receiver operating characteristic (ROC) curve (AUC) are important measures for evaluating the prediction accuracy of biomarkers for time-to-event endpoints (e.g., time to disease progression or death). In this paper we propose a direct method to estimate the AUC as a function of time using a flexible fractional polynomial model, without the middle step of modeling the time-dependent ROC. We develop a pseudo partial-likelihood procedure for parameter estimation and provide a test procedure to compare the predictive performance between biomarkers. We establish the asymptotic properties of the proposed estimator and test statistics. A major advantage of the proposed method is its ease of inference and of comparing prediction accuracy across biomarkers, rendering it particularly appealing for studies that require comparing and screening a large number of candidate biomarkers. We evaluate the finite-sample performance of the proposed method through simulation studies and illustrate our method in an application to primary biliary cirrhosis data.

Session 57 Building Bridges between Research and Practice in Time Series Analysis

Time Series Research at the U.S. Census Bureau
Brian C. Monsell
U.S. Census Bureau
briancmonsell@census.gov

The Census Bureau has taken steps to reinforce the role of research within the organization. This talk will give details on the role of statistical research at the U.S. Census Bureau, with particular attention paid to the status of current work in time series analysis and statistical software development in time series. A brief history of time series research will be given, as well as details of work of historical interest.

Temporal Causal Modeling: Methodology, Applications and Implementation
Naoki Abe1, Tanveer Faruquie1, Huijing Jiang1, Anju Kambadur1, Kimberly Lang1, Aurelie Lozano1 and Jinwoo Shin2

1 IBM
2 KAIST
nabe@us.ibm.com

Temporal causal modeling is an approach to modeling and causal inference based on time series data, building on recent advances in graphical Granger modeling. In this presentation we will review the basic concept and approach, some specific modeling algorithms, and methods for associated functions (e.g., root cause analysis), as well as some efforts to scale these methods via parallel implementation. We will also describe some business applications of this approach in a number of domains. (The authors are ordered alphabetically.)


Issues Related to the Use of Time Series in Model Building and Analysis
William W.S. Wei
Temple University
wwei@temple.edu
Time series are used in many studies for model building and analysis. We must be very careful to understand the kind of time series data used in the analysis. In this presentation we will begin with some issues related to the use of aggregate and systematic sampling time series. Since several time series are often used in a study of the relationship of variables, we will also consider vector time series modeling and analysis. Although the basic procedures of model building between univariate time series and vector time series are the same, there are some important phenomena which are unique to vector time series. Therefore, we will also discuss some issues related to vector time series models. Understanding these issues is important when we use time series data in modeling and analysis, regardless of whether it is a univariate or multivariate time series.

Session 58 Recent Advances in Design for Biostatistical Problems
Optimal Designs for N-of-1 Trials
Yin Li and Keumhee Chough Carriere
University of Alberta
kccarrie@ualberta.ca
N-of-1 trials are randomized multi-crossover experiments using two or more treatments on a single patient. They provide evidence-based information on an individual patient, thus optimizing the management of the individual's chronic disease. Such trials are preferred in many medical experiments, as opposed to the more conventional statistical designs constructed to optimize treating the average patient. N-of-1 trials are also popular when the sample size is too small to adopt traditional optimal designs. However, there are very few guidelines available in the literature. We constructed optimal N-of-1 designs for two treatments under a variety of conditions on the carryover effects, the covariance structure, and the number of planned periods. Extension to optimal aggregated N-of-1 designs is also discussed.

Efficient Algorithms for Two-stage Designs on Phase II Clinical Trials
Seongho Kim1 and Weng Kee Wong2

1 Wayne State University/Karmanos Cancer Institute
2 University of California at Los Angeles
kimse@karmanos.org
Single-arm two-stage designs have been widely used in phase II clinical trials. One of the most popular designs is Simon's optimal two-stage design, which minimizes the expected sample size under the null hypothesis. Currently, a greedy search algorithm is often used to evaluate every possible combination of sample sizes for optimal two-stage designs. However, such a greedy strategy is computationally intensive and so is not feasible for large sample sizes or for adaptive two-stage designs with many parameters. An efficient global optimization method, discrete particle swarm optimization (DPSO), is therefore developed to find two-stage designs efficiently and is compared with greedy algorithms for Simon's optimal two-stage and adaptive two-stage designs. It is further shown that DPSO can be efficiently applied to complicated adaptive two-stage designs, even with three prefixed possible response rates, which a greedy algorithm cannot handle.
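
For context, the exhaustive search that DPSO is compared against can be written compactly for Simon's optimal design: enumerate (r1, n1, r, n), keep designs meeting the error constraints, and minimize the expected sample size under the null. This sketch is ours and illustrates why enumeration becomes expensive; variable names are illustrative.

```python
import numpy as np
from scipy.stats import binom

def accept_prob(p, r1, n1, r, n):
    """P(conclude 'not promising' | true rate p): stop early with <= r1
    responses in n1 patients, or continue and end with <= r of n."""
    x1 = np.arange(r1 + 1, min(r, n1) + 1)   # stage-1 counts that continue
    cont = binom.pmf(x1, n1, p) * binom.cdf(r - x1, n - n1, p)
    return binom.cdf(r1, n1, p) + cont.sum()

def simon_optimal(p0, p1, alpha=0.05, beta=0.20, n_max=30):
    """Exhaustive search for Simon's optimal two-stage design."""
    best = None
    for n in range(2, n_max + 1):
        for n1 in range(1, n):
            for r1 in range(0, n1 + 1):
                for r in range(r1, n):
                    if 1 - accept_prob(p0, r1, n1, r, n) <= alpha:
                        break                 # smallest r meeting the level
                else:
                    continue                  # no feasible r for this (r1, n1, n)
                if 1 - accept_prob(p1, r1, n1, r, n) >= 1 - beta:
                    en0 = n1 + (1 - binom.cdf(r1, n1, p0)) * (n - n1)
                    if best is None or en0 < best[0]:
                        best = (en0, dict(r1=r1, n1=n1, r=r, n=n))
    return best                               # (E[N | p0], design) or None
```

For p0 = 0.1, p1 = 0.3, alpha = 0.05, beta = 0.2, this recovers the well-known design 1/10, 5/29 with expected null sample size 15.0.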

D-optimal Designs for Multivariate Exponential and Poisson Regression Models via Ultra-Dimensional Particle Swarm Optimization
Jiaheng Qiu and Weng Kee Wong
University of California at Los Angeles
wkwong@ucla.edu
Multiple drug therapies are increasingly used to treat many diseases, such as AIDS, cancer, and rheumatoid arthritis. At the early stages of clinical research, the outcome is typically studied using a nonlinear model with multiple doses from various drugs. Advances in handling estimation issues for such models are continually made, but research to find informed design strategies has lagged. We develop a nature-inspired metaheuristic algorithm, called ultra-dimensional particle swarm optimization (UPSO), to find D-optimal designs for the Poisson and exponential models for studying the effects of up to 5 drugs and their interactions. This novel approach allows us to find effective search strategies for such high-dimensional optimal designs and to gain insight into their structure, including conditions under which locally D-optimal designs are minimally supported. We implement the UPSO algorithm on a web site and apply it to redesign a real study that investigates two-way interaction effects on the induction of micronuclei in mouse lymphoma cells from three genotoxic agents. We show that a D-optimal design can reap substantial benefits over the implemented design in Lutz et al. (2005).

Optimizing Two-level Supersaturated Designs by Particle Swarm Techniques
Frederick Kin Hing Phoa1, Ray-Bing Chen2, Wei-Chung Wang3 and Weng Kee Wong4

1 Institute of Statistical Science, Academia Sinica
2 National Cheng Kung University
3 National Taiwan University
4 University of California at Los Angeles
fredphoa@stat.sinica.edu.tw
Supersaturated designs (SSDs) are often used in screening experiments with a large number of factors to reduce the number of experimental runs. As more factors are used in the study, the search for an optimal SSD becomes increasingly challenging because of the large number of feasible selections of factor level settings. This talk tackles this discrete optimization problem via a metaheuristic algorithm based on particle swarm optimization (PSO) techniques. Using the commonly used E(s2) criterion as an illustrative example, we were able to modify the standard PSO algorithm and find SSDs that satisfy the lower bounds calculated in Bulutoglu and Cheng (2004) and Bulutoglu (2007), showing that the PSO-generated designs are E(s2)-optimal SSDs.
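
The criterion being optimized is easy to state: for an n-run, m-factor design matrix X with entries +1/-1, E(s2) averages the squared off-diagonal entries of X'X. A sketch (our own names):

```python
import numpy as np

def e_s2(X):
    """E(s^2) criterion of a two-level design with entries +1/-1:
    the average of s_ij^2 over i < j, where s_ij = (X'X)_ij."""
    X = np.asarray(X)
    S = X.T @ X
    m = X.shape[1]
    off = S[np.triu_indices(m, k=1)]   # off-diagonal inner products
    return float((off ** 2).mean())
```

An orthogonal design attains E(s2) = 0; supersaturated designs (more factors than runs) cannot, which is why lower bounds such as those of Bulutoglu and Cheng matter.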

Session 59 Student Award Session 2

Analysis of Sequence Data Under Multivariate Trait-Dependent Sampling
Ran Tao1, Donglin Zeng1, Nora Franceschini1, Kari E. North1, Eric Boerwinkle2 and Dan-Yu Lin1

1 University of North Carolina at Chapel Hill
2 University of Texas Health Science Center
taor@live.unc.edu
High-throughput DNA sequencing allows the genotyping of common and rare variants for genetic association studies. At the present time, and in the near future, it is not economically feasible to sequence all individuals in a large cohort. A cost-effective strategy is to sequence those individuals with extreme values of a quantitative trait. We consider the design under which the sampling depends on


multiple quantitative traits. Under such trait-dependent sampling, standard linear regression analysis can result in bias of parameter estimation, inflation of type I error, and loss of power. We construct a nonparametric likelihood function that properly reflects the sampling mechanism and utilizes all available data. We implement a computationally efficient EM algorithm and establish the theoretical properties of the resulting nonparametric maximum likelihood estimators. Our methods can be used to perform separate inference on each trait or simultaneous inference on multiple traits. We pay special attention to gene-level association tests for rare variants. We demonstrate the superiority of the proposed methods over standard linear regression through extensive simulation studies. We provide applications to the Cohorts for Heart and Aging Research in Genomic Epidemiology Targeted Sequencing Study and the National Heart, Lung, and Blood Institute Exome Sequencing Project.

Empirical Likelihood Based Tests for Stochastic Ordering Under Right Censorship
Hsin-wen Chang and Ian W. McKeague

Columbia University
hc2496@columbia.edu

This paper develops an empirical likelihood approach to testing for stochastic ordering between two univariate distributions under right censorship. The proposed test is based on a maximally selected localized empirical likelihood ratio statistic. The asymptotic null distribution is expressed in terms of a Brownian bridge. The new procedure is shown, via a simulation study, to have superior power to the log-rank and weighted Kaplan-Meier tests under crossing hazard alternatives. The approach is illustrated using data from a randomized clinical trial involving the treatment of severe alcoholic hepatitis.

Multiple Genetic Loci Mapping for Latent Disease Liability Using a Structural Equation Modeling Approach with Application in Alzheimer's Disease
Ting-Huei Chen

University of North Carolina at Chapel Hill
thchen@live.unc.edu

Categorical traits, such as case-control status, are often used as response variables in genome-wide association studies of genetic loci associated with complex diseases. Using categorical variables to summarize likely continuous disease liability may lead to loss of information and thus reduction of power to recover associated genetic loci. On the other hand, a direct study of disease liability is often infeasible because it is an unobservable latent variable. In some diseases, the underlying disease liability is manifested by several phenotypes, and thus the associated genetic loci may be identified by combining the information of multiple phenotypes. In this article, we propose a novel method named PeLatent to address this challenge. We employ a structural equation approach to model the latent disease liability by observed manifest variables/phenotypic information and to identify simultaneously multiple associated genetic loci by a regularized estimation method. Simulation results show that our method has substantially higher sensitivity and specificity than existing methods. Application of our method to a genome-wide association study of Alzheimer's disease (AD) identifies 27 single nucleotide polymorphisms (SNPs) associated with AD. These 27 SNPs are located within 19 genes, and several of these genes are known to be related to Alzheimer's disease as well as neural activities.

Session 60 Semi-parametric Methods

Semiparametric Estimation of Mean and Variance in Generalized Estimating Equations
Jianxin Pan1 and Daoji Li2
1 The University of Manchester
2 University of Southern California
daojili@marshall.usc.edu
Efficient estimation of regression parameters is a major objective in the analysis of longitudinal data. Existing approaches usually focus on modeling only the mean and treat the variance as a nuisance parameter. The common assumption is that the variance is a function of the mean, and the variance function is further assumed to be known. However, the estimator of the regression parameters can be very inefficient if the variance function or variance is misspecified. In this paper, a flexible semiparametric regression approach for longitudinal data is proposed to jointly model the mean and variance. The novel semiparametric mean and variance models offer great flexibility in formulating the effects of covariates and time on the mean and variance. We simultaneously estimate the parametric and nonparametric components in the models by using a B-spline based approach. The asymptotic normality of the resulting estimators for the parametric components in the proposed models is established, and the optimal rate of convergence of the nonparametric components is obtained. Our simulation study shows that our proposed approach yields more efficient estimators for the mean parameters than the conventional GEE approach. The proposed approach is also illustrated with a real data analysis.

An Empirical Approach to Efficient Estimation of Linear Functionals of a Probability with Side Information
Hanxiang Peng, Shan Wang and Lingnan Li
Indiana University-Purdue University Indianapolis
hpeng@math.iupui.edu
In this talk we'll construct efficient estimators of linear functionals of a probability measure when side information is available. Our approach is based on maximum empirical likelihood. We will exhibit that the proposed approach is mathematically simpler and computationally easier than the usual maximum empirical likelihood estimators. Several examples are given about the possible side information. We also report some simulation results.

M-estimation for General ARMA Processes with Infinite Variance
Rongning Wu
Baruch College, City University of New York
rongningwu@baruch.cuny.edu
General autoregressive moving average (ARMA) models extend the traditional ARMA models by removing the assumptions of causality and invertibility. The assumptions are not required under a non-Gaussian setting for the identifiability of the model parameters, in contrast to the Gaussian setting. We study M-estimation for general ARMA processes with infinite variance, where the distribution of innovations is in the domain of attraction of a non-Gaussian stable law. Following the approach taken by Davis et al. (1992) and Davis (1996), we derive a functional limit theorem for random processes based on the objective function and establish asymptotic properties of the M-estimator. We also consider bootstrapping the M-estimator and extend the results of Davis and Wu (1997) to the present setting, so that statistical inferences are readily implemented. Simulation studies are conducted to evaluate the finite sample performance of the M-estimation and bootstrap procedures. An empirical example of financial time series is also provided.


Sufficient Dimension Reduction via Principal Lq Support Vector Machine
Andreas Artemiou1 and Yuexiao Dong2

1 Cardiff University
2 Temple University
ydong@temple.edu

Principal support vector machine was proposed recently by Li, Artemiou and Li (2011) to combine the L1 support vector machine and sufficient dimension reduction. We introduce the Lq support vector machine as a unified framework for linear and nonlinear sufficient dimension reduction. Noticing that the solution of the L1 support vector machine may not be unique, we set q > 1 to ensure the uniqueness of the solution. The asymptotic distributions of the proposed estimators are derived for q = 2. We demonstrate through numerical studies that the proposed L2 support vector machine estimators improve existing methods in accuracy and are less sensitive to the tuning parameter selection.

Nonparametric Quantile Regression via a New MM Algorithm
Bo Kai1, Mian Huang2, Weixin Yao3 and Yuexiao Dong4

1 College of Charleston; National Chengchi University
2 Shanghai University of Finance and Economics
3 Kansas State University
4 Temple University
kaib@cofc.edu

Nonparametric quantile regression is an important statistical model that has been widely used in many research fields and applications. However, its optimization is very challenging, since the objective functions are non-differentiable. In this work, we propose a new MM algorithm for the nonparametric quantile regression model. The proposed algorithm simultaneously updates the quantile function and yields a smoother estimate of the quantile function. We systematically study the new MM algorithm in local linear quantile regression and show that the proposed algorithm preserves the monotone descent property of MM algorithms in an asymptotic sense. Monte Carlo simulation studies will be presented to show the finite sample performance of the proposed algorithm.
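
The proposed MM algorithm is new, but the flavor of an MM step for quantile regression can be conveyed by the classical epsilon-perturbed majorizer of Hunter and Lange (2000) for the linear model, where each update reduces to a weighted least-squares solve. The sketch below is that classical version, not the authors' method; names are illustrative.

```python
import numpy as np

def mm_quantreg(X, y, tau=0.5, eps=1e-6, n_iter=500, tol=1e-10):
    """MM iterations for linear quantile regression using the
    Hunter-Lange epsilon-perturbed majorizer of the check loss.
    Each step solves weighted least squares with weights 1/(eps + |r|)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]      # OLS start
    for _ in range(n_iter):
        r = y - X @ beta
        w = 1.0 / (eps + np.abs(r))                  # majorizer weights
        XtW = X.T * w
        rhs = XtW @ y + (2 * tau - 1) * X.sum(axis=0)
        beta_new = np.linalg.solve(XtW @ X, rhs)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta
```

For an intercept-only design this recovers the sample tau-quantile up to the epsilon smoothing, and the MM construction guarantees the perturbed check loss never increases between iterations.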

Regression Estimators Using Stratified Ranked Set Sampling
Arbita Chatterjee, Hani Samawi, Lili Yu, Daniel Linder, Jingxian Cai and Robert Vogel
Georgia Southern University
jxcai19880721@hotmail.com

This article is intended to investigate the performance of two types of stratified regression estimators, namely the separate and the combined estimator, using stratified ranked set sampling (SRSS) introduced by Samawi (1996). The expressions for the mean and variance of the proposed estimates are derived, and the estimates are shown to be unbiased. A simulation study is designed to compare the efficiency of SRSS relative to other sampling procedures under varying model scenarios. Our investigation indicates that the regression estimator of the population mean obtained through an SRSS becomes more efficient than the crude sample mean estimator using stratified simple random sampling. These findings are also illustrated with the help of a data set on bilirubin levels in babies in a neonatal intensive care unit.
Keywords: ranked set sampling; stratified ranked set sampling; regression estimator

Session 61 Statistical Challenges in Variable Selectionfor Graphical Modeling

Fused Community Detection
Yi Yu1, Yang Feng2 and Richard J. Samworth1

1 University of Cambridge
2 Columbia University
yangfeng@stat.columbia.edu
Community detection is one of the most widely studied problems in network research. In an undirected graph, communities are regarded as tightly-knit groups of nodes with comparatively few connections between them. Popular existing techniques, such as spectral clustering and variants thereof, rely heavily on the edges being sufficiently dense and the community structure being relatively obvious. These are often not satisfactory assumptions for large-scale real-world datasets. We therefore propose a new community detection method, called fused community detection (fcd), which is designed particularly for sparse networks and situations where the community structure may be opaque. The spirit of fcd is to take advantage of the edge information, which we exploit by borrowing sparse recovery techniques from regression problems. Our method is supported by both theoretical results and numerical evidence. The algorithms are implemented in the R package fcd, which is available on CRAN.

High Dimensional Tests for Functional Brain Networks
Jichun Xie1 and Jian Kang2

1 Temple University
2 Emory University
jichun@temple.edu
Large-scale resting-state fMRI studies have been conducted for patients with autism, and the existence of abnormalities in the functional connectivity between brain regions (containing more than one voxel) has been clearly demonstrated. Due to the ultra-high dimensionality of the data, current methods focusing on studying the connectivity pattern between voxels often lack power and computational efficiency. In this talk, we introduce a new framework to identify the connection pattern of gigantic networks at the desired resolution. We propose three procedures based on different network structures and testing criteria. The asymptotic null distributions of the test statistics are derived, together with their rate-optimality. Simulation results show that the tests are able to control type I error and yet are very powerful. We apply our method to a resting-state fMRI study on autism. The analysis yields interesting insights about the mechanism of autism.

Bayesian Inference of Multiple Gaussian Graphical Models
Christine Peterson1, Francesco Stingo2 and Marina Vannucci3
1 Stanford University
2 University of Texas MD Anderson Cancer Center
3 Rice University
cbpeterson@gmail.com
In this work, we propose a Bayesian approach for inference of multiple Gaussian graphical models. Specifically, we address the problem of inferring multiple undirected networks in situations where some of the networks may be unrelated, while others share common features. We link the estimation of the graph structures via a Markov random field prior, which encourages common edges. In addition, we learn which sample groups have shared graph structure by placing a spike-and-slab prior on the parameters that measure network relatedness. This approach allows us to share information between sample groups when appropriate, as well as to obtain a


measure of relative network similarity across groups. In simulation studies, we find improved accuracy of network estimation over competing methods, particularly when the sample sizes within each subgroup are moderate. We illustrate our model with an application to inference of protein networks for various subtypes of acute myeloid leukemia.

Mixed Graphical Models via Exponential Families
Eunho Yang1, Yulia Baker2, Pradeep Ravikumar1, Genevera I. Allen2 and Zhandong Liu3

1 University of Texas at Austin
2 Rice University
3 Baylor College of Medicine
yuliabaker@rice.edu
Markov random fields, or undirected graphical models, are widely used to model high-dimensional multivariate data. Classical instances of these models, such as Gaussian graphical and Ising models, as well as recent extensions (Yang et al., 2012) to graphical models specified by univariate exponential families, assume all variables arise from the same distribution. Complex data from high-throughput genomics and social networking, for example, often contain discrete, count, and continuous variables measured on the same set of samples. To model such heterogeneous data, we develop a novel class of mixed graphical models by specifying that each node-conditional distribution is a member of a possibly different univariate exponential family. We study several instances of our model and propose scalable M-estimators for recovering the underlying network structure. Simulations, as well as an application to learning mixed genomic networks from next-generation sequencing and mutation data, demonstrate the versatility of our methods.

Session 62 Recent Advances in Non- and Semi-Parametric Methods

Joint Estimation of Multiple Bivariate Densities of Protein Backbone Angles Using an Adaptive Exponential Spline Family
Lan Zhou
Texas A&M University
lzhou@stat.tamu.edu
In this talk we introduce a method for joint estimation of multiple bivariate density functions for a collection of populations of protein backbone angles. The method utilizes an exponential family of distributions for which the log densities are modeled as a linear combination of a common set of basis functions. The basis functions are obtained as bivariate splines on triangulations and are adaptively chosen based on data. The circular nature of angular data is taken into account by imposing appropriate smoothness constraints across boundaries. Maximum penalized likelihood is used for fitting the model, and an effective Newton-type algorithm is developed. A simulation study clearly showed that the joint estimation approach is statistically more efficient than estimating the densities separately. The proposed method provides a novel and unique perspective on two important and challenging problems in protein structure research, namely structure-based protein classification and quality assessment of protein structure prediction servers. The joint density estimation approach is widely applicable when there is a need to estimate multiple density functions from different populations with common features. Moreover, the coefficients of basis expansion for the fitted densities provide a low-dimensional representation that is useful for visualization, clustering, and classification of the densities. This is joint work with Mehdi Maadooliat, Xin Gao, and Jianhua Huang.

Estimating Time-Varying Effects for Overdispersed Recurrent Data with Treatment Switching
Qingxia Chen1, Donglin Zeng2, Joseph G. Ibrahim2, Mouna Akacha3 and Heinz Schmidli3
1Vanderbilt University
2University of North Carolina at Chapel Hill
3Novartis Pharmaceuticals Corporation
cindychen@vanderbilt.edu
In the analysis of multivariate event times, frailty models assuming time-independent regression coefficients are often considered, mainly due to their mathematical convenience. In practice, regression coefficients are often time dependent, and the temporal effects are of clinical interest. Motivated by a phase III clinical trial in multiple sclerosis, we develop a semiparametric frailty modelling approach to estimate time-varying effects for overdispersed recurrent events data with treatment switching. The proposed model incorporates the treatment switching time in the time-varying coefficients. Theoretical properties of the proposed model are established, and an efficient EM algorithm is derived to obtain the maximum likelihood estimates. Simulation studies evaluate the numerical performance of the proposed model under various temporal treatment effect curves. The ideas in this paper can also be used for time-varying coefficient frailty models without treatment switching, as well as for alternative models when the proportional hazard assumption is violated. A multiple sclerosis dataset is analyzed to illustrate our methodology.

Bivariate Penalized Splines for Regression
Ming-Jun Lai and Lily Wang
University of Georgia
lilywang@uga.edu
In this work we are interested in smoothing data over complex irregular boundaries or interior holes. We propose bivariate penalized spline estimators over triangulations, using an energy functional as the penalty. We establish the consistency and asymptotic normality of the proposed estimators and study their convergence rates. A comparison with thin-plate splines is provided to illustrate some advantages of this spline smoothing approach. The proposed method can be easily applied to various smoothing problems over arbitrary domains, including irregularly shaped domains with irregularly scattered data points.

Local Feature Selection in Varying-Coefficient Models
Lan Xue1, Xinxin Shu2, Peibei Shi2, Colin Wu3 and Annie Qu2
1Oregon State University
2University of Illinois at Urbana-Champaign
3National Heart, Lung, and Blood Institute
xuel@stat.oregonstate.edu
We propose new varying-coefficient model selection and estimation based on the spline approach, which is capable of capturing time-dependent covariate effects. The new penalty function utilizes local-region information for varying-coefficient estimation, in contrast to the traditional model selection approach focusing on the entire region. The proposed method is extremely useful when the signals associated with relevant predictors are time-dependent, and detecting relevant covariate effects in the local region is more scientifically relevant than detecting those of the entire region. However, this brings challenges in theoretical development, due to the large-dimensional parameters involved in the nonparametric functions needed to capture the local information, in addition to computational challenges in solving optimization problems with overlapping parameters for different local-region penalization. We provide the asymptotic theory of model selection consistency for detecting local signals and establish the optimal convergence rate for the varying-coefficient estimator. Our simulation studies indicate that the proposed model selection incorporating local features outperforms global feature model selection approaches. The proposed method is also illustrated through a longitudinal growth and health study from the National Heart, Lung, and Blood Institute.

Session 63 Statistical Challenges and Development in Cancer Screening Research

Overdiagnosis in Breast and Prostate Cancer Screening: Concepts, Methods and Challenges
Ruth Etzioni, Roman Gulati and Jing Xia
Fred Hutchinson Cancer Research Center
retzioni@fhcrc.org
Overdiagnosis occurs when a tumor is detected by screening but, in the absence of screening, that tumor would never have become symptomatic within the lifetime of the patient. Thus, an overdiagnosed tumor is a true extra diagnosis due solely to the existence of the screening test. Patients who are overdiagnosed cannot, by definition, be helped by the diagnosis, but they can be harmed, particularly if they are treated. Therefore, knowledge of the likelihood that a screen-detected cancer has been overdiagnosed is critical for making treatment decisions and developing screening policy. The problem of overdiagnosis has long been recognized in the case of prostate cancer and is currently an area of extreme interest in breast cancer. Published estimates of the frequency of overdiagnosis in breast and prostate cancer screening vary greatly. This presentation will investigate why different studies yield such different results. I'll explain how overdiagnosis arises and catalog the different ways it may be measured in population studies. I'll then discuss different approaches that are used to estimate overdiagnosis. Many studies use excess incidence under screening relative to incidence without screening as a proxy for overdiagnosis. Others use statistical models to make inferences about lead time or disease natural history and then derive the corresponding fraction of cases that are overdiagnosed. Each approach has its limitations and challenges, but one thing is clear: the estimation approach is a major factor behind the variation in overdiagnosis estimates in the literature. I will conclude with a list of key questions that consumers of overdiagnosis studies should ask to determine the validity (or lack thereof) of study results.
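The excess-incidence proxy mentioned in the abstract can be made concrete with a small calculation. The function and the counts below are a hypothetical sketch of that proxy, not numbers from the talk:

```python
def excess_incidence_overdiagnosis(cases_screened, n_screened,
                                   cases_control, n_control):
    """Excess-incidence proxy: the share of screen-arm diagnoses that are
    'extra' relative to the incidence observed without screening."""
    inc_screened = cases_screened / n_screened
    inc_control = cases_control / n_control
    return (inc_screened - inc_control) / inc_screened

# Hypothetical cumulative incidence after long follow-up:
# 400 diagnoses per 10,000 screened vs. 330 per 10,000 unscreened.
frac = excess_incidence_overdiagnosis(400, 10_000, 330, 10_000)  # 0.175
```

Note that the choice of denominator (all screen-arm diagnoses here; screen-detected cases only in some studies) is one of the measurement differences the talk points to as a source of variation across published estimates.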

Estimation of Biomarker Growth in a Screening Study
Lurdes Y.T. Inoue1, Roman Gulati2 and Ruth Etzioni2
1University of Washington
2Fred Hutchinson Cancer Research Center
linoue@uw.edu
With the growing importance of biomarker-based tests for early detection and monitoring of chronic diseases, the question of how best to utilize biomarker measurements is of tremendous interest; the answer requires understanding the biomarker growth process. Prospective screening studies offer an opportunity to investigate biomarker growth while simultaneously assessing its value for early detection. However, since disease diagnosis usually terminates collection of biomarker measurements, proper estimation of biomarker growth in these studies may need to account for how screening affects the length of the observed biomarker trajectory. In this talk we compare estimation of biomarker growth from prospective screening studies using two approaches: a retrospective approach that only models biomarker growth, and a prospective approach that jointly models biomarker growth and time to screen detection. We assess performance of the two approaches in a simulation study and using empirical prostate-specific antigen data from the Prostate Cancer Prevention Trial. We find that the prospective approach accounting for informative censoring often produces similar results but may produce different estimates of biomarker growth in some contexts.

Estimating Screening Test Effectiveness when Screening Indication is Unknown
Rebecca Hubbard
Group Health Research Institute
hubbardr@ghc.org
Understanding the effectiveness of cancer screening tests is challenging when the same test is used for screening and also for disease diagnosis in symptomatic individuals. Estimates of screening test effectiveness based on data that include both screening and diagnostic examinations will be biased. Moreover, in many cases gold standard information on the indication for the examination is not available. Models exist for predicting the probability that a given examination was used for a screening purpose, but no previous research has investigated appropriate statistical methods for utilizing these probabilities. In this presentation we will explore alternative methods for incorporating predicted probabilities of screening indication into analyses of screening test effectiveness. Using simulation studies, we compare the bias and efficiency of alternative approaches. We also demonstrate the performance of each method in a study of colorectal cancer screening with colonoscopy. Methods for estimating regression model parameters associated with an unknown categorical predictor, such as indication for examination, have broad applicability in studies of cancer screening and other studies using data from electronic health records.
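One simple way to use such predicted probabilities, weighting each examination by its predicted probability of being a screen rather than hard-classifying exams, can be sketched as below. This is an illustrative estimator under assumed inputs, not necessarily one of the approaches the talk recommends, and all numbers are made up:

```python
def weighted_screen_rate(outcomes, p_screen):
    """Estimate an outcome rate among screening exams when the indication is
    unknown, weighting each exam by its model-predicted probability of being
    a screen (a soft alternative to thresholding the predictions)."""
    num = sum(y * p for y, p in zip(outcomes, p_screen))
    den = sum(p_screen)
    return num / den

# outcomes: 1 = cancer detected at the exam; p_screen: predicted P(screen)
rate = weighted_screen_rate([1, 0, 0, 1, 0, 0],
                            [0.9, 0.8, 0.2, 0.1, 0.95, 0.7])
```

Exams that were almost certainly diagnostic (low `p_screen`) contribute little, so a detected cancer at such an exam barely inflates the estimated screening detection rate.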

Developing Risk-Based Screening Guidelines: "Equal Management of Equal Risks"
Hormuzd Katki
National Cancer Institute
katkih@mail.nih.gov
The proliferation of disease risk calculators has not led to a proliferation of risk-based screening guidelines. The focus of risk-based screening guidelines is connecting risk stratification under the natural history of disease (without intervention) to "benefit stratification": whether the risk stratification better distinguishes people who have high benefit vs. low benefit from a screening intervention. To link risk stratification to benefit stratification, we propose the principle of "equal management of people at equal risk of disease". When applicable, this principle leads to simplified and consistent management of people with different risk factors or test results leading to the same disease risk, people who might also have a similar benefit/harm profile. We describe two examples of our approach. First, we demonstrate how the "equal management of equal risks" principle was applied to thoroughly integrate HPV testing into the new risk-based cervical cancer screening guidelines, the first thoroughly risk-based US cancer screening guidelines. Second, we use risk of lung cancer death to estimate benefit stratification for targeting CT lung cancer screening. We show how we calculated benefit stratification for CT lung screening, along with the analogous "harm stratification" and "efficiency stratification". We critically examine the limits of the "equal management of equal risks" principle. This approach of calculating benefit stratification and applying "equal management of equal risks" might be applicable in other settings, to help pave the way for developing risk-based screening guidelines.
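The principle can be stated as a tiny decision rule: management depends only on the estimated risk, never on which combination of risk factors or test results produced it. The cutoffs below are hypothetical placeholders (loosely in the spirit of cervical-screening management), not guideline values:

```python
def management(risk):
    """'Equal management of equal risks': the recommended action is a
    function of estimated disease risk alone. Cutoffs are hypothetical."""
    if risk >= 0.05:
        return "immediate workup"
    if risk >= 0.005:
        return "1-year return"
    return "routine screening"

# Two hypothetical patients with different test-result combinations but the
# same estimated risk receive the same management.
hpv_pos_cyto_neg = management(0.02)
hpv_neg_cyto_pos = management(0.02)
```

The simplification is the point: a guideline needs only one action table indexed by risk, rather than one row per combination of test results.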

Session 64 Recent Developments in the Visualization and Exploration of Spatial Data

Recent Advancements in Geovisualization, with a Case Study on Chinese Religions
Juergen Symanzik1 and Shuming Bao2
1Utah State University
2University of Michigan
symanzik@math.usu.edu
Producing high-quality map-based displays for economic, medical, educational, or any other kind of statistical data with geographic covariates has always been challenging. Either it was necessary to have access to high-end software, or one had to do a lot of detailed programming. Recently, R software for linked micromap (LM) plots has been enhanced to handle any available shapefiles from Geographic Information Systems (GIS). Also, enhancements have been made that allow for a fast overlay of various statistical graphs on Google maps. In this presentation, we provide an overview of the necessary steps to produce such graphs in R, starting with GIS-based data and shapefiles and ending with the resulting graphs in R. We will use data from a study on Chinese religions and society (provided by the China Data Center at the University of Michigan) as a case study for these graphical methods.

Spatial Analysis with China Geo-Explorers
Shuming Bao1, Miao Shui2 and Bing She2
1University of Michigan
2Wuhan University
sbao@umich.edu
Spatial and non-spatial databases of population, economy, society, and the natural environment are developing rapidly, drawing on different sources, times, and formats. It has been a challenge how to efficiently integrate such space-time data and methodology for spatial studies. This paper will discuss the recent development of spatial intelligence technologies and methodologies for spatial data integration and data analysis, as well as their applications for spatial studies. The presentation will introduce the newly developed spatial data explorers (China Geo-Explorer) distributed by the University of Michigan China Data Center. It will demonstrate how space-time data of different formats and sources can be integrated, visualized, analyzed, and reported in a web-based spatial system. Some applications in population and regional development, disaster assessment, environment and health, cultural and religious studies, and household surveys will be discussed for China and global studies. Finally, future directions will be discussed.

Probcast: Creating and Visualizing Probabilistic Weather Forecasts
J. McLean Sloughter1, Susan Joslyn2, Patrick Tewson3, Tilmann Gneiting4 and Adrian Raftery2
1Seattle University
2University of Washington
3Bigger Boat Consulting
4Heidelberg University
sloughtj@seattleu.edu
Probabilistic methods are becoming increasingly common for weather forecasting. However, communicating uncertainty information about spatial forecasts to users is not always a straightforward task. The Probcast project (http://probcast.com) looks both to develop methodologies for spatial probabilistic weather forecasting and to develop means of communicating this information effectively. This talk will discuss both the statistical approaches used to create forecasts and the cognitive psychology research used to find the best ways to clearly communicate statistical and probabilistic information.

Session 65 Advancement in Biostatistical Methods and Applications

Estimation of Time-Dependent AUC under Marker-Dependent Sampling
Xiaofei Wang and Zhaoyin Zhu
Duke University
xiaofeiwang@duke.edu
In the biomedical field, evaluating the accuracy of a biomarker in predicting the onset of a disease or a disease condition is essential. When predicting the binary status of disease onset is of interest, the area under the ROC curve (AUC) is widely used. When predicting the time to an event is of interest, the time-dependent ROC curve, AUC(t), can be used. In both cases, however, the simple random sampling (SRS) often used for biomarker validation is costly and requires a large number of patients. To improve study efficiency and reduce cost, marker-dependent sampling (MDS) has been proposed (Wang et al., 2012, 2013), in which the selection of patients for ascertaining their survival outcomes depends on the results of biomarker assays. In this talk we will introduce a non-parametric estimator for the time-dependent AUC(t) under MDS. The consistency and asymptotic normality of the proposed estimator will be discussed. Simulation will be used to demonstrate the unbiasedness of the proposed estimator under MDS and the efficiency gain of MDS over SRS.
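The reweighting idea behind estimation under MDS can be sketched as an inverse-probability-weighted concordance estimate. This toy version ignores censoring and assumes known selection probabilities; it illustrates the weighting, not the authors' estimator:

```python
def ipw_auc_t(marker, event_time, p_select, t):
    """Inverse-probability-weighted AUC(t) sketch: cases have an event by
    time t, controls are event-free past t, and each sampled subject is
    weighted by 1 / P(selected | marker) to undo marker-dependent sampling.
    Censoring is ignored to keep the weighting idea visible."""
    cases = [(m, 1.0 / p) for m, s, p in zip(marker, event_time, p_select) if s <= t]
    ctrls = [(m, 1.0 / p) for m, s, p in zip(marker, event_time, p_select) if s > t]
    num = den = 0.0
    for mi, wi in cases:
        for mj, wj in ctrls:
            w = wi * wj
            num += w * (1.0 if mi > mj else 0.5 if mi == mj else 0.0)
            den += w
    return num / den

# Extreme-marker subjects were oversampled (p_select 0.9 vs. 0.5); the
# weights correct the case-control comparison at t = 2.
auc2 = ipw_auc_t([3, 2, 1, 0], [1, 1, 5, 5], [0.9, 0.5, 0.5, 0.9], t=2)
```

Here every case marker exceeds every control marker, so the weighted concordance is 1 regardless of the weights; with imperfect markers, the weights change the estimate.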

A Measurement Error Approach for Modeling Accelerometer-Based Physical Activity Data
Julia Lee, Jing Song and Dorothy Dunlop
Northwestern University
jungwha-lee@northwestern.edu
Physical activity (PA) is a modifiable lifestyle factor for many chronic diseases, with established health benefits. PA outcomes measured with accelerometers are assessed in many studies, but there are limited statistical methods for analyzing accelerometry data. We describe a measurement error modeling approach to estimate the distribution of habitual physical activity and the sources of variation in accelerometer-based physical activity data from a sample of adults with, or at risk of, knee osteoarthritis. We model both the intra- and inter-individual variability in measured physical activity. Our model allows us to account and adjust for measurement errors, biases, and other sources of intra-individual variation.

Real-Time Prediction in Clinical Trials: A Statistical History of REMATCH
Daniel F. Heitjan and Gui-shuang Ying
University of Pennsylvania
dheitjan@upenn.edu
Randomized clinical trials often include one or more planned interim analyses, during which an external monitoring committee reviews the accumulated data and determines whether it is scientifically and ethically appropriate for the study to continue. With survival-time endpoints, it is often desirable to schedule the interim analyses at the times of occurrence of specified landmark events, such as the 50th event, the 100th event, and so on. Because the timing of such events is random, and the interim analyses impose considerable logistical burdens, it is worthwhile to predict the event times as accurately as possible. Prediction methods available prior to 2001 used data only from previous trials, which are often of questionable relevance to the trial for which one wishes to make predictions. With modern data management systems, it is often feasible to use data from the trial itself to make these predictions, rendering them far more reliable. This talk will describe work that some colleagues and students and I have done in this area. I will set the methodologic development in the context of the trial that motivated our work: REMATCH, a randomized clinical trial of a heart assist device that ran from 1998 to 2001 and was considered one of the most rigorous and expensive device trials ever conducted.
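The flavor of such predictions can be sketched with a deliberately crude model: estimate a constant hazard from interim data and compute the expected waiting time to the landmark event, assuming a fixed risk set, no further accrual, and no censoring. These are all simplifying assumptions for illustration, not the methods developed for REMATCH:

```python
def predict_landmark_time(current_time, events_so_far, total_exposure,
                          at_risk, target_events):
    """Crude landmark-event prediction: constant hazard estimated from
    interim data; expected gap to each successive event is 1/(hazard * n)
    with a shrinking risk set (no accrual or censoring assumed)."""
    hazard = events_so_far / total_exposure
    wait, n = 0.0, at_risk
    for _ in range(target_events - events_so_far):
        wait += 1.0 / (hazard * n)
        n -= 1
    return current_time + wait

# 50 events in 100 patient-years so far; with only 2 patients still at
# risk, when is the 52nd event expected?
t52 = predict_landmark_time(10.0, 50, 100.0, at_risk=2, target_events=52)  # 13.0
```

Trial-based prediction methods replace the point estimate of the hazard with a model fitted to the trial's own accumulating data (and, in Bayesian versions, a posterior), yielding prediction intervals rather than a single date.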

An Analysis of Microarray Data with Batch Effects
Dongseok Choi, William O. Cepurna, John C. Morrison, Elaine C. Johnson, Stephen R. Planck and James T. Rosenbaum
Oregon Health & Science University
choid@ohsu.edu
Normalization is considered an important step before any statistical analysis in microarray studies. Many methods have been proposed over the last decade or so, for example global normalization, local regression based methods, and quantile normalization. Normalization methods typically remove systemic biases across arrays, and have been shown to be quite effective in removing them when the arrays were processed simultaneously in a batch. It is, however, reported that they sometimes do not remove differences between batches when microarrays are split into several experiments over time. In this presentation, we will explore potential approaches that could adjust for batch effects by using traditional methods and methods developed as a secondary normalization.
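A minimal example of a secondary normalization is per-gene batch mean-centering, which removes batch-level location shifts left behind by array-level normalization. This is a generic illustration of the idea, not the specific approach explored in the talk:

```python
def center_batches(expr, batch):
    """Per-gene batch mean-centering: shift each batch to the gene's overall
    mean. A simple secondary normalization; it only removes location shifts,
    unlike model-based adjustments such as ComBat."""
    out = [row[:] for row in expr]
    batches = set(batch)
    for g, row in enumerate(expr):
        grand = sum(row) / len(row)
        for b in batches:
            cols = [j for j, bb in enumerate(batch) if bb == b]
            bmean = sum(row[j] for j in cols) / len(cols)
            for j in cols:
                out[g][j] = row[j] + grand - bmean
    return out

# One gene, four arrays, two batches: the batch-1 shift is removed.
adjusted = center_batches([[1.0, 2.0, 3.0, 4.0]], [0, 0, 1, 1])
```

After centering, each batch has the same gene-wise mean, so batch membership can no longer masquerade as differential expression in a simple mean comparison.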

Session 66 Analysis of Complex Data

Integrating Data from Heterogeneous Studies Using Only Summary Statistics: Efficiency and Robustness
Min-ge Xie
Rutgers University
mxie@stat.rutgers.edu
Heterogeneous studies arise often in applications, due to different study and sampling designs, populations, or outcomes. Sometimes these studies have common hypotheses or parameters of interest, and we can synthesize evidence from them to make inference for those common hypotheses or parameters. For heterogeneous studies, some of the parameters of interest may not be estimable for certain studies, and in such a case these studies are typically excluded in conventional methods. The exclusion of part of the studies can lead to a non-negligible loss of information. This paper introduces a data integration method for heterogeneous studies by combining the confidence distributions derived from the summary statistics of individual studies. It includes all the studies in the analysis and makes use of all information, direct as well as indirect. Under a general likelihood inference framework, this new approach is shown to have several desirable properties: i) it is asymptotically as efficient as the maximum likelihood approach using individual participant data (IPD) from all studies; ii) unlike the IPD analysis, it suffices to use summary statistics to carry out our approach, and individual-level data are not required; and iii) it is robust against misspecification of the working covariance structure of the parameter estimates. All the properties of the proposed approach are further confirmed by data simulated from a randomized clinical trials setting, as well as by real data on aircraft landing performance. (Joint work with Dungang Liu and Regina Liu.)
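The fixed-effect special case of combining summary statistics, inverse-variance weighting under a normal working model, gives a flavor of the approach; the confidence-distribution framework in the talk handles far more general settings:

```python
def combine_estimates(estimates, std_errors):
    """Fixed-effect combination of study-level estimates by inverse-variance
    weighting: the simplest special case of combining normal confidence
    distributions. Studies that cannot estimate the parameter are simply
    absent from the inputs rather than biasing the result."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    combined = sum(w * e for w, e in zip(weights, estimates)) / total
    return combined, total ** -0.5

# Three studies reporting only an estimate and a standard error.
est, se = combine_estimates([1.0, 2.0, 1.2], [1.0, 1.0, 0.5])
```

The precise study (standard error 0.5) carries four times the weight of the others, pulling the combined estimate toward 1.2 while shrinking the combined standard error below every individual one.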

A Markov Modulated Poisson Model for Reliability Data
Joshua Landon1, Suleyman Ozekici2 and Refik Soyer1
1George Washington University
2Koc University
jlandon@gwu.edu
In this presentation, we will consider a latent Markov process governing the intensity rate of a Poisson process model for failure data. The latent process enables us to infer the performance of the debugging operation over time and allows us to deal with the imperfect debugging scenario. We develop the Bayesian inference for the model and also introduce a method to infer the unknown dimension of the Markov process. We will illustrate the implementation of our model and the Bayesian approach by using actual software failure data.
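A Markov-modulated Poisson process is easy to simulate, which also makes the model's structure explicit: a latent chain switches the event intensity. A two-state sketch with made-up rates (the inference in the talk is Bayesian and works in the opposite direction, from observed failures back to the latent states):

```python
import random

def simulate_mmpp(rates, switch, T, seed=0):
    """Simulate a two-state Markov-modulated Poisson process: a latent
    continuous-time chain alternates states (exponential holding times with
    rates `switch`), and events arrive at the Poisson rate of the current
    state (`rates`)."""
    rng = random.Random(seed)
    t, state, events = 0.0, 0, []
    while t < T:
        end = min(t + rng.expovariate(switch[state]), T)  # next state switch
        u = t
        while True:
            u += rng.expovariate(rates[state])            # next event arrival
            if u > end:
                break
            events.append(u)
        t, state = end, 1 - state
    return events

# A "well-debugged" state (rate 0.5) alternating with a failure-prone
# state (rate 5); both switch at rate 1.
ev = simulate_mmpp(rates=[0.5, 5.0], switch=[1.0, 1.0], T=50.0)
```

Bursts of closely spaced failures in `ev` correspond to visits to the high-intensity state, which is exactly the latent structure the Bayesian model infers from real failure data.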

A Comparison of Two Approaches for Acute Leukemia Patient Classification
Jingjing Wu1, Guoqiang Chen2 and Zeny Feng3
1University of Calgary
2Enbridge Pipelines
3University of Guelph
jinwu@ucalgary.ca
The advancement of microarray technology has greatly facilitated research on gene expression based classification of patient samples. For example, in cancer research, microarray gene expression data have been used for cancer or tumor classification. When the study focuses on only two classes, for example two different cancer types, we propose a two-sample semiparametric model for the distributions of gene expression levels in the two classes. To estimate the parameters, we consider both the maximum semiparametric likelihood estimate (MLE) and the minimum Hellinger distance estimate (MHDE). For each gene, a Wald statistic is constructed based on either the MLE or the MHDE, and a significance test is then performed. We exploit the idea of a weighted sum of misclassification rates to develop a novel classification model in which only the previously identified significant genes are involved. To test the usefulness of our proposed method, we take a predictive approach: we apply it to the acute leukemia data of Golub et al. (1999), in which a training set is used to build the classification model and a testing set is used to evaluate its accuracy.
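A minimum Hellinger distance estimate can be sketched for the simplest case, a normal mean: choose the parameter maximizing the Hellinger affinity between a kernel density estimate of the data and the model density. This grid-search toy is for intuition only, not the two-sample semiparametric estimator in the paper:

```python
import math

def mhde_normal_mean(data, sigma=1.0, step=0.05):
    """Minimum Hellinger distance estimate of a normal mean: pick theta
    maximizing the Hellinger affinity, the integral of
    sqrt(f_hat(x) * f_theta(x)), between a kernel density estimate f_hat
    and the N(theta, sigma^2) model (brute-force grid search)."""
    n = len(data)
    h = 1.06 * sigma * n ** -0.2                # rule-of-thumb KDE bandwidth
    lo, hi = min(data) - 3 * sigma, max(data) + 3 * sigma
    xs = [lo + i * step for i in range(int((hi - lo) / step) + 1)]
    def pdf(x, mu, s):
        return math.exp(-0.5 * ((x - mu) / s) ** 2) / (s * math.sqrt(2 * math.pi))
    f_hat = [sum(pdf(x, d, h) for d in data) / n for x in xs]
    def affinity(theta):
        return sum(math.sqrt(fh * pdf(x, theta, sigma)) * step
                   for x, fh in zip(xs, f_hat))
    thetas = [lo + i * 0.01 for i in range(int((hi - lo) / 0.01) + 1)]
    return max(thetas, key=affinity)

est = mhde_normal_mean([-1.0, -0.5, 0.0, 0.5, 1.0])  # near the sample center
```

Because the affinity downweights regions where either density is small, the resulting estimate inherits the robustness to outlying observations that motivates MHDE alongside the MLE.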

On the Consistency and Covariate Selections for Varying-Coefficient Deming Regressions
Ying Lu1, Chong Gu2, Bo Fan3, Selwyn Au4, Valerie McGuire1 and John Shepherd3
1VA Palo Alto Health Care System & Stanford University
2Purdue University
3University of California at San Francisco
4VA Palo Alto Health Care System
yinglu@va.gov
Although Deming regression (DR) has been successfully used to establish cross-calibration (CC) formulas for bone mineral densities (BMD) between manufacturers at several anatomic sites, it failed for CC of whole body BMD because their relationship varies with the subject's weight, total fat, and lean mass. We proposed a new varying-coefficient DR (VCDR) that allows the intercept and slope to be non-linear functions of covariates, and applied this new model successfully to derive a consistent calibration formula for the new whole body BMD data. Our results showed that this VCDR effectively removed all systematic bias in previous work. In this talk, we will discuss the consistency of the calibration formula and procedures for covariate selection.

Session 67 Statistical Issues in Co-development of Drug and Biomarker

Group Sequential Designs for Developing and Testing Biomarker-Guided Personalized Therapies in Comparative Effectiveness Research
Tze Leung Lai1, Olivia Yueh-Wen Liao2 and Dong Woo Kim3
1Stanford University
2Onyx Pharmaceuticals
3Microsoft Corporation
dwkim88@stanford.edu
Biomarker-guided personalized therapies offer great promise to improve drug development and patient care, but also pose difficult challenges in designing clinical trials for the development and validation of these therapies. We first review the existing approaches, briefly for clinical trials in new drug development and in more detail for comparative effectiveness trials involving approved treatments. We then introduce new group sequential designs to develop and test personalized treatment strategies involving approved treatments.

Adaptive Enrichment Designs for Clinical Trials
Noah Simon1 and Richard Simon2
1University of Washington
2National Institutes of Health
nrsimon@u.washington.edu
Many difficult-to-treat diseases are actually heterogeneous collections of similar syndromes with potentially different causal mechanisms. New molecules attack pathways that are dysregulated in only a subset of this collection, and so are expected to be effective for only a subset of patients with the disease. Often this subset is not well understood until well into large-scale clinical trials. As such, standard practice has been to enroll a broad range of patients and run post-hoc subset analyses to determine those who may particularly benefit. This unnecessarily exposes many patients to hazardous side effects and may vastly decrease the efficiency of the trial (especially if only a small subset benefits). In this talk I will discuss a class of adaptive enrichment designs, which allow the eligibility criteria of a trial to be adaptively updated during the trial, restricting entry to only patients likely to benefit from the new treatment. These designs control type I error and can substantially increase power. I will also discuss and illustrate strategies for effectively building and evaluating biomarkers in this framework.

An Adaptive Single-Arm Phase II Design with Co-primary Objectives to Evaluate Activity Overall and In Relation to a Biomarker-Defined Subgroup
Michael Wolf
Amgen Inc.
michaelwolf@amgen.com
Roberts (Clin Cancer Res 2011) presented a single-arm, 2-stage adaptive design to evaluate response overall and in one or more biomarker-defined subgroups, where biomarkers are only determined for responders. While this design has obvious practical advantages, the testing strategy proposed does not provide robust control of false-positive error. Modified futility and testing strategies based on marginal probabilities are proposed to achieve the same design objectives; these are shown to be more robust, though a trade-off is that biomarkers must be determined for all subjects. Clinical examples of design setup and analysis are illustrated with a fixed subgroup size that reflects its expected prevalence in the intended use population, based on a validated in vitro companion diagnostic. Design efficiency and external validity are compared to testing for a difference in complement biomarker subgroups. Possible generalizations of the design for a data-dependent subgroup size (e.g., biomarker value > sample median) and multiple subgroups are discussed.

Biomarker Threshold Estimation to Predict Clinical Benefit: What Can Reasonably be Learned During Early (Pre-PhIII) Oncology Development
Thomas Bengtsson
Genentech Inc.
thomasgb@gene.com
A key goal during early clinical co-development of a new therapeutic and a biomarker is to determine the "diagnostic positive group", i.e., to identify a sub-group of patients likely to receive a clinically meaningful treatment benefit. We show that, based on a typically sized Ph1/Ph2 study with fewer than 100 events, accurate biomarker threshold estimation with time-to-event data is not a realistic goal. Instead, we propose to hierarchically test for treatment effects in pre-determined patient subsets most likely to benefit clinically. We illustrate our method with data from a recent lung cancer trial.

Session 68 New Challenges for Statistical Analyst/Programmer

Similarities and Differences in Statistical Programming among CRO and Pharmaceutical Industries
Mark Matthews
inVentiv Health Clinical
mrkmtthws@yahoo.com
Statistical programming in the clinical environment offers a wide range of opportunities across the clinical drug development cycle. Whether you are employed by a Contract Research Organization, a Pharmaceutical or Biotechnology company, or as a contractor, the programming tasks are often quite similar, and at times the work cannot be differentiated by your employer. However, the higher-level strategies and the direction any organization takes as an enterprise can be an important factor in the fulfillment of a statistical programmer's career. The author would like to share his experiences with the differences and similarities that a clinical statistical programmer can encounter in their career, and also provide some useful tips on how to best collaborate when working with peer programmers from different industries.

Computational Aspects for Detecting Safety Signals in Clinical Trials
Jyoti Rayamajhi
Eli Lilly and Company
rayamajhi_jyoti@lilly.com
Detecting safety signals from adverse event (AE) data in clinical trials is always a challenge, and it is a critical task in any drug development. In any trial, it is very desirable to describe and understand the safety of the compound to the fullest possible extent. The MedDRA coding scheme, e.g., System Organ Class (SOC) and Preferred Term (PT), is used in safety analyses and is hierarchical in nature. Bayesian hierarchical models are used to predict posterior probabilities; they also account for the expectation that AEs within the same SOC are more likely to be similar, so that they can sensibly borrow strength from each other. The model also allows borrowing strength across SOCs but does not impose it, depending on the actual data. It is interesting to see comparative analyses between the frequentist's approach and an alternative Bayesian methodology for detecting safety signals in clinical trials. The computation required to fit these hierarchical models is complex and challenging. Data from studies were used to fit three Bayesian logistic regression hierarchical models. Model selection is achieved by using the Deviance Information Criterion (DIC). Models and plots were implemented using BRugs, R2WinBUGS, and JAGS. A scheme for meta-analysis using a hierarchical three-stage Bayesian mixture model is also implemented and will be discussed. A user-friendly and fully-functional web interface for safety signal detection, using Bayesian meta-analysis and a general three-stage hierarchical mixture model, will be described. Keywords: System Organ Class, Preferred Term, Deviance Information Criterion, hierarchical models, mixture model.

Bayesian Network Meta-Analysis Methods: An Overview and A Case Study
Baoguang Han1, Wei Zou2 and Karen Price1
1Eli Lilly and Company
2inVentiv Clinical Health
han_baoguang@lilly.com
Evidence-based health-care decision making requires comparing all relevant competing interventions. In the absence of direct head-to-head comparisons of different treatments, network meta-analysis (NMA) is increasingly used for selecting the best treatment strategy for a health care intervention. The Bayesian approach offers a flexible framework for NMA, in part due to its ability to propagate the parameter correlation structure and to provide straightforward probability statements around the parameters of interest. In this talk we will provide a general overview of Bayesian NMA models, including consistency models, network meta-regression, and inconsistency checking using node-splitting techniques. We will then illustrate how an NMA analysis can be performed with a detailed case study, and provide some details on available software as well as various graphical and textual outputs that can be readily understood and interpreted by clinicians.
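The consistency assumption underlying NMA, and the node-splitting check of it, can be shown in a few lines: under consistency, an indirect A-vs-C effect is the sum of the A-vs-B and B-vs-C effects with variances adding, and node-splitting compares that indirect estimate to the direct one. A sketch with made-up effects on some contrast scale (e.g. log odds ratios):

```python
def indirect_comparison(d_ab, se_ab, d_bc, se_bc):
    """Under consistency, the indirect A-vs-C effect is the sum of the
    A-vs-B and B-vs-C effects, with variances adding."""
    return d_ab + d_bc, (se_ab ** 2 + se_bc ** 2) ** 0.5

def node_split_z(direct, se_direct, indirect, se_indirect):
    """Node-splitting check: z-statistic for the difference between the
    direct and indirect estimates of the same contrast."""
    return (direct - indirect) / (se_direct ** 2 + se_indirect ** 2) ** 0.5

d_ac, se_ac = indirect_comparison(0.3, 0.1, 0.2, 0.1)
z = node_split_z(0.6, 0.3, 0.2, 0.4)   # small |z|: no evidence of inconsistency
```

A full Bayesian NMA (e.g. in JAGS or WinBUGS) replaces these closed forms with posterior distributions, but the consistency equations being checked are the same.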

Session 69 Adaptive and Sequential Methods for Clinical Trials

Bayesian Data Augmentation Dose Finding with Continual Reassessment Method and Delayed Toxicities
Ying Yuan1, Suyu Liu1 and Guosheng Yin2
1University of Texas MD Anderson Cancer Center
2University of Hong Kong
yyuan@mdanderson.org
A major practical impediment when implementing adaptive dose-finding designs is that the toxicity outcome used by the decision rules may not be observed shortly after the initiation of the treatment. To address this issue, we propose the data augmentation continual reassessment method (DA-CRM) for dose finding. By naturally treating the unobserved toxicities as missing data, we show that such missing data are nonignorable in the sense that the missingness depends on the unobserved outcomes. The Bayesian data augmentation approach is used to sample both the missing data and model parameters from their posterior full conditional distributions. We evaluate the performance of the DA-CRM through extensive simulation studies and also compare it with other existing methods. The results show that the proposed design satisfactorily resolves the issues related to late-onset toxicities and possesses desirable operating characteristics: treating patients more safely and also selecting the maximum tolerated dose with a higher probability.
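The CRM backbone that the DA-CRM extends can be illustrated with a one-parameter power model and a grid posterior. The skeleton, prior variance, target rate and toy outcomes below are assumptions for illustration only; the DA-CRM additionally imputes not-yet-observed toxicities, which this sketch does not attempt:

```python
import numpy as np

# Illustrative continual reassessment method (CRM) posterior update with a
# one-parameter power model p_d(a) = skeleton_d ** exp(a), a ~ N(0, 2).
# Skeleton, target, and accrued data are assumed for illustration.

skeleton = np.array([0.05, 0.12, 0.25, 0.40])  # prior guesses of toxicity
target = 0.25                                  # target toxicity probability

# Toy accrued data: (dose index, toxicity indicator) per treated patient
data = [(0, 0), (1, 0), (1, 0), (2, 1), (2, 0)]

# Grid approximation to the posterior of the model parameter a
a = np.linspace(-4, 4, 2001)
log_post = -a**2 / (2 * 2.0)                   # log N(0, 2) prior
for d, y in data:
    p = skeleton[d] ** np.exp(a)
    log_post += y * np.log(p) + (1 - y) * np.log(1 - p)
post = np.exp(log_post - log_post.max())
post /= post.sum()

# Posterior mean toxicity per dose; recommend the dose closest to target
post_tox = np.array([(skeleton[d] ** np.exp(a) * post).sum() for d in range(4)])
recommended = int(np.argmin(np.abs(post_tox - target)))
print("posterior toxicity:", post_tox.round(3), "-> dose", recommended)
```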

Optimal Marker-strategy Clinical Trial Design to Detect Predictive Markers for Targeted Therapy
Yong Zang, Suyu Liu and Ying Yuan
University of Texas MD Anderson Cancer Center
yzang1@mdanderson.org
In developing targeted therapy, the marker-strategy design provides an important approach to evaluate the predictive marker effect. This design first randomizes patients into non-marker-based or marker-based strategies. Patients allocated to the non-marker-based strategy are then further randomized to receive either the standard or targeted treatments, while patients allocated to the marker-based strategy receive treatments based on their marker statuses. The predictive marker effect is tested by comparing the treatment outcome between the two strategies. In this talk we show that such a between-strategy comparison has low power to detect the predictive effect and is valid only under the restrictive condition that the randomization ratio within the non-marker-based strategy matches the marker prevalence. To address these issues, we propose a Wald test that is generally valid and also uniformly more powerful than the between-strategy comparison. Based on that, we derive an optimal marker-strategy design that maximizes the power to detect the predictive marker effect by choosing the optimal randomization ratios between the two strategies and treatments. Our numerical study shows that using the proposed optimal designs can substantially improve the power of the marker-strategy design to detect the predictive marker effect.

Dynamic Prediction of Time to Relapse Using Longitudinal Biomarker Data
Xuelin Huang1, Jing Ning1 and Sangbum Choi2
1University of Texas MD Anderson Cancer Center
2University of Texas Health Science Center at Houston
xlhuang@mdanderson.org
As time goes by, more and more data are observed for each patient. Dynamic prediction keeps making updated predictions of disease prognosis using all the available information. This work is motivated by the need for real-time monitoring of the disease progress of chronic myeloid leukemia patients using their BCR-ABL gene expression levels measured during their follow-up visits. We provide real-time dynamic prediction of future prognosis using a series of marginal Cox proportional hazards models over continuous time with constraints. Compared with separate landmark analyses at different discrete time points after treatment, our approach achieves smoother and more robust predictions. Compared with approaches that jointly model longitudinal biomarkers and survival, our approach does not need to specify a model for the changes of the monitored biomarkers, and thus avoids imputing biomarker values at time points where they are not available. This helps eliminate the potential bias introduced by misspecified models for longitudinal biomarkers.

Continuous Tumor Size Change Percentage and Progression Free Survival as Endpoints of the First and Second Stage, Respectively, in a Novel Double Screening Phase II Design
Ye Cui1, Zhibo Wang1, Yichuan Zhao1 and Zhengjia Chen2
1Georgia State University
2Emory University
cathysaiyo@gmail.com

2014 ICSA-KISS Applied Statistics Symposium Portland June 15-18 87

Abstracts

A phase II trial is an expedited, low-cost trial to screen potentially effective agents for a following phase III trial. Unfortunately, the positive rate of Phase III trials is still low even though the agents have been determined to be effective in preceding Phase II trials, mainly because different endpoints are used in Phase II (tumor response) and III (survival) trials. Good disease response often leads to, but cannot guarantee, better survival. From a statistical standpoint, transformation of continuous tumor size change into a categorical tumor response (complete response, partial response, stable disease, or progressive disease) according to the World Health Organization (WHO) or Response Evaluation Criteria In Solid Tumors (RECIST) will result in a loss of study power. Tumor size change can be obtained rapidly, but survival estimation requires a long follow-up. We propose a novel double screening phase II design in which tumor size change percentage is used in the first stage to select potentially effective agents rapidly for the second stage, in which progression-free or overall survival is estimated to confirm the efficacy of the agents. The first screening can fully utilize all tumor size change data and minimize the cost and length of the trial by stopping it when agents are determined to be ineffective based on a low standard, and the second screening can substantially increase the success rate of the following Phase III trial by using similar or the same outcomes and a high standard. Simulation studies are performed to optimize the significance levels of the two screening stages in the design and to compare its operating characteristics with Simon's two-stage design. ROC analysis is applied to estimate the success rate in the follow-up Phase III trials.

Session 70 Survival Analysis

Comparison of Hazard Rate and Odds Ratio in the Two-Sample Survival Problem
Benedict Dormitorio and Joshua Naranjo
Western Michigan University
benedictpdormitorio@wmich.edu
Cox proportional hazards seems to be the standard statistical method for analyzing treatment efficacy when time-to-event data are available. In the absence of time-to-event data, investigators may use logistic regression, which does not require time-to-event, or Poisson regression, which requires only interval-summarized frequency tables of time-to-event. We investigate the relative performance of the three methods. In particular, we compare the power of tests based on the respective effect-size estimates: (1) hazard ratio, (2) odds ratio, and (3) rate ratio. We use a variety of survival distributions and cut-off points representing length of study. The results have implications for study design. For example, under what conditions might we recommend a simpler design based only on event frequencies instead of measuring time-to-event, and what length of study is recommended?
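Two of the three effect sizes above can be computed from simulated data without survival-specific machinery. In the sketch below the hazards, sample size and study length are illustrative assumptions, and the hazard ratio itself (which would require a Cox fit) is omitted:

```python
import numpy as np

# Odds ratio (logistic-regression analogue) vs rate ratio (Poisson-
# regression analogue) under simulated exponential survival with
# administrative censoring at time tau. All parameters are assumptions.

rng = np.random.default_rng(1)
n, tau = 200, 2.0
lam_ctrl, lam_trt = 1.0, 0.5            # true hazard ratio = 0.5

def summarize(lam):
    t = rng.exponential(1 / lam, n)
    events = (t <= tau).sum()           # events observed within the study
    person_time = np.minimum(t, tau).sum()
    return events, person_time

e1, pt1 = summarize(lam_trt)
e0, pt0 = summarize(lam_ctrl)

# Odds ratio needs only event frequencies
odds_ratio = (e1 / (n - e1)) / (e0 / (n - e0))
# Rate ratio needs event counts plus person-time
rate_ratio = (e1 / pt1) / (e0 / pt0)
print(f"odds ratio {odds_ratio:.2f}, rate ratio {rate_ratio:.2f}")
```

For exponential data the rate ratio tracks the true hazard ratio, while the odds ratio drifts further from 1 as the event proportion grows, one reason the choice of effect size matters for design.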

Predicting the Event Time in Multicenter Clinical Trials with Time-to-Event Outcome as Primary Endpoint
Nibedita Bandyopadhyay
Janssen Research & Development
nbandyop@its.jnj.com
Interim analyses are widely used in Phase II and III clinical trials. The efficiency of the drug development process can be improved by using interim analyses. In clinical trials with time to an event as the primary endpoint, it is common to plan the interim analyses at pre-specified numbers of events. Performing these analyses at times with a different number of events than planned may impact the trial's credibility as well as the statistical properties of the interim analysis. On the other hand, significant resources are required in conducting such analyses. Therefore, for logistic planning purposes, it is very important to predict the timing of this pre-specified number of events early and accurately. A statistical technique for making such predictions in ongoing multicenter clinical trials is developed. Results are illustrated for different scenarios using simulations.

Empirical Comparison of Small Sample Performance for the Logrank Test and Resampling Methods with High Censoring Rates
Yu Deng and Jianwen Cai
University of North Carolina at Chapel Hill
yudeng@live.unc.edu
The logrank test is commonly used for comparing survival distributions between treatment and control groups. When the censoring rate is low and the sample size is moderate, the approximation based on the asymptotic normal distribution of the logrank test works well in finite samples. However, in some studies the sample size is small (e.g., 10 or 20 per group) and the censoring rate is high (e.g., 0.8 or 0.9). Under such conditions, we conduct a series of simulations to compare the performance of the logrank test based on normal approximation, permutation, and the bootstrap. In general, the type I error rate based on the bootstrap test is slightly inflated when the number of failures is larger than 2, while the logrank test based on normal approximation has a type I error around 0.05 and the permutation test is relatively conservative in type I error. However, when there is only one failure per group, the type I error of the permutation test is closer to 0.05 than that of the other two tests.
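The permutation variant compared above permutes group labels while holding each subject's (time, event) pair fixed. A sketch for the small-sample, heavy-censoring setting of the abstract, with an assumed null data-generating mechanism:

```python
import numpy as np

# Permutation logrank test for 10 subjects per group with ~80% censoring.
# The data-generating mechanism is an illustrative assumption (no true
# group difference).

rng = np.random.default_rng(2)

def logrank_stat(time, event, group):
    """Standardized logrank statistic comparing group 1 with group 0."""
    obs_minus_exp, var = 0.0, 0.0
    for t in np.unique(time[event == 1]):
        at_risk = time >= t
        n_tot = at_risk.sum()
        n1 = (at_risk & (group == 1)).sum()
        d_tot = ((time == t) & (event == 1)).sum()
        d1 = ((time == t) & (event == 1) & (group == 1)).sum()
        obs_minus_exp += d1 - d_tot * n1 / n_tot
        if n_tot > 1:
            var += d_tot * (n1 / n_tot) * (1 - n1 / n_tot) * (n_tot - d_tot) / (n_tot - 1)
    return obs_minus_exp / np.sqrt(var) if var > 0 else 0.0

n = 10
group = np.repeat([0, 1], n)
event = np.zeros(2 * n, dtype=int)
while event.sum() < 2:                        # ensure the statistic is defined
    time = rng.exponential(1.0, 2 * n)
    event = (rng.random(2 * n) < 0.2).astype(int)   # ~80% censoring

observed = logrank_stat(time, event, group)
perms = [logrank_stat(time, event, rng.permutation(group)) for _ in range(999)]
p_value = (1 + sum(abs(z) >= abs(observed) for z in perms)) / 1000
print(f"permutation p-value: {p_value:.3f}")
```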

Session 71 Complex Data Analysis Theory and Application

Supervised Singular Value Decomposition and Its Asymptotic Properties
Gen Li1, Dan Yang2, Haipeng Shen1 and Andrew Nobel1
1University of North Carolina at Chapel Hill
2Rutgers University
haipeng@email.unc.edu
We develop a supervised singular value decomposition (SupSVD) model for supervised dimension reduction. The research is motivated by applications where the low rank structure of the data of interest is potentially driven by additional variables measured on the same set of samples. The SupSVD model can make use of the information in the additional data to accurately extract underlying structures that are more interpretable. The model is very general and includes the principal component analysis model and the reduced rank regression model as two extreme cases. We formulate the model in a hierarchical fashion using latent variables and develop a modified expectation-maximization algorithm for parameter estimation, which is computationally efficient. The asymptotic properties of the estimated parameters are derived. We use comprehensive simulations and two real data examples to illustrate the advantages of the SupSVD model.

New Methods for Interaction Selection
Ning Hao1, Hao Helen Zhang1 and Yang Feng2

1University of Arizona
2Columbia University
nhao@math.arizona.edu
It is a challenging task to identify interaction effects for high dimensional data. The main difficulties lie in both computational and


theoretical aspects. We propose a new framework for interaction selection. Efficient computational algorithms based on both forward selection and penalization approaches are illustrated.

A Statistical Approach to Set Classification by Feature Selection with Applications to Classification of Histopathology Images
Sungkyu Jung1 and Xingye Qiao2

1University of Pittsburgh
2Binghamton University, State University of New York
qiao@math.binghamton.edu
Set classification problems arise when classification tasks are based on sets of observations, as opposed to individual observations. In set classification, a classification rule is trained with N sets of observations, where each set is labeled with class information, and the prediction of a class label is also performed with a set of observations. Data sets for set classification appear, for example, in diagnostics of disease based on multiple cell nucleus images from a single tissue. Relevant statistical models for set classification are introduced, which motivate a set classification framework based on context-free feature extraction. By understanding a set of observations as an empirical distribution, we employ a data-driven method to choose those features which contain information on location and major variation. In particular, the method of principal component analysis is used to extract the features of major variation. Multidimensional scaling is used to represent features as vector-valued points on which conventional classifiers can be applied. The proposed set classification approaches achieve better classification results than competing methods in a number of simulated data examples. The benefits of our method are demonstrated in an analysis of histopathology images of cell nuclei related to liver cancer.

A Smoothing Spline Model for Analyzing dMRI Data of Swallowing
Binhuan Wang, Ryan Branski, Milan Amin and Yixin Fang
New York University
yixinfang@nyumc.org
Swallowing disorders are common and have a significant health impact. Dynamic magnetic resonance imaging (dMRI) is a novel technique for visualizing the pharynx and upper esophageal segment during a swallowing process. We develop a smoothing spline method for analyzing swallowing dMRI data. We apply the method to a dataset obtained from an experiment conducted at the NYU Voice Center.

Session 72 Recent Developments in Statistical Methods for Missing Data

A Semiparametric Inference to Regression Analysis with Missing Covariates in Survey Data
Shu Yang and Jae-kwang Kim
Iowa State University
jkim@iastate.edu
We consider parameter estimation in parametric regression models with covariates missing at random in survey data. A semiparametric maximum likelihood approach is proposed, which requires no parametric specification of the marginal covariate distribution. We obtain an asymptotic linear representation of the semiparametric maximum likelihood estimator (SMLE) using the theory of von Mises calculus and V-statistics, which allows a consistent estimator of the asymptotic variance. An EM-type algorithm for computation is discussed. We extend the methodology to general parameter estimation, which is not necessarily equal to the MLE. Simulation results suggest that the SMLE method is robust, whereas the parametric maximum likelihood method is subject to severe bias under model misspecification.

Multiple Robustness in Missing Data Analysis
Peisong Han1 and Lu Wang2

1University of Waterloo
2University of Michigan
peisonghan@uwaterloo.ca
When estimating the population mean of a response variable that is subject to ignorable missingness, we propose an estimator that is more robust than doubly robust estimators by weighting the complete cases using weights other than the inverse probability. We allow multiple models for both the propensity score and the outcome regression. Our estimator is consistent if any one of the multiple models is correctly specified. Such multiple robustness against model misspecification significantly improves over double robustness, which only allows one propensity score model and one outcome regression model. Our estimator attains the semiparametric efficiency bound when one propensity score model and one outcome regression model are correctly specified, without requiring knowledge of exactly which two are correct.
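A compact way to see the baseline that multiple robustness improves upon is the standard augmented IPW (doubly robust) estimator of a mean under ignorable missingness. The sketch below plugs in the true propensity and outcome models rather than fitted candidate models, and the data-generating process is an illustrative assumption:

```python
import numpy as np

# Doubly robust (AIPW) estimate of E[Y] with data missing at random,
# contrasted with the biased complete-case mean. Known (true) working
# models are used for brevity; all data are simulated assumptions.

rng = np.random.default_rng(3)
n = 50_000
x = rng.normal(size=n)
prop = 1 / (1 + np.exp(-0.5 * x))      # P(observed | x), true propensity
r = rng.random(n) < prop               # response (observed) indicator
y = 2 + x + rng.normal(size=n)         # outcome, E[Y] = 2
m = 2 + x                              # outcome regression E[Y | x]

# AIPW: outcome-model prediction plus an inverse-probability-weighted
# correction from the complete cases only
mu_aipw = np.mean(m + r * (y - m) / prop)

# Complete-case mean is biased because missingness depends on x
mu_cc = y[r].mean()
print(f"AIPW {mu_aipw:.3f} vs complete-case {mu_cc:.3f} (truth 2)")
```

The AIPW estimate stays consistent if either working model is correct; the multiply robust estimator of the abstract extends this guarantee to any one of several candidate models.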

Imputation of Binary Variables with SAS and IVEware
Yi Pan1 and Riguang Song1

1United States Centers for Disease Control and Prevention
jnu5@cdc.gov
In practice, it is a challenge to impute missing values of binary variables. For a monotone missing pattern, imputation methods available in SAS include the LOGISTIC method, which uses logistic regression modeling, and the DISCRIM method, which only allows continuous variables in the imputation model. For an arbitrary missing pattern, a fully conditional specification (FCS) method is now available in SAS. This method only assumes the existence of a joint distribution for all variables. On the other hand, IVEware, developed by the University of Michigan Survey Research Center, uses a sequence of regression models and imputes missing values by drawing samples from posterior predictive distributions. We present results from a series of simulations designed to evaluate and compare the performance of the above-mentioned imputation methods. An example imputing the BED recent status (recent or long-standing) in estimating HIV incidence is used to illustrate the application of those procedures.
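The posterior-predictive idea behind the sequential regression imputation described above can be shown in miniature with a covariate-free Beta-Bernoulli model. This is a stand-in assumption for illustration only (IVEware and SAS FCS fit a regression model at each step instead):

```python
import numpy as np

# Proper imputation of a binary variable: draw the success probability p
# from its posterior, then draw the missing values from Bernoulli(p),
# so imputed values vary across imputations. Data are illustrative.

rng = np.random.default_rng(4)

y = np.array([1, 0, 1, 1, 0, 1, 0, 1, 1, 1], dtype=float)
missing = np.array([False, False, True, False, True,
                    False, False, True, False, False])
y[missing] = np.nan

s = np.nansum(y)                 # observed successes
n_obs = np.sum(~missing)         # observed cases

n_imputations = 5
completed = []
for _ in range(n_imputations):
    # Beta(1 + s, 1 + n_obs - s) posterior under a uniform prior
    p = rng.beta(1 + s, 1 + n_obs - s)
    y_imp = y.copy()
    y_imp[missing] = rng.random(missing.sum()) < p
    completed.append(y_imp)

print(np.array(completed)[:, missing])   # imputed values differ by draw
```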

Marginal Treatment Effect Estimation Using Pattern-Mixture Model
Zhenzhen Xu
United States Food and Drug Administration
zhenzhenxu@fda.hhs.gov
Missing data often occur in clinical trials. When the missingness depends on unobserved responses, the pattern-mixture model is frequently used. This model stratifies the data according to drop-out patterns and formulates a model for each pattern with specific parameters. The resulting marginal distribution of the response is a mixture of distributions over the missing data patterns. If the eventual interest is to estimate the overall treatment effect, one can calculate a weighted average of pattern-specific treatment effects, assuming that the treatment assignment is equally distributed across patterns. However, in practice this assumption is unlikely to hold. As a result, the weighted average approach is subject to bias. In this talk, we introduce a new approach to estimate the marginal treatment effect based on a random-effects pattern-mixture model for longitudinal studies with a continuous endpoint, relaxing the homogeneous distributional


assumption on treatment assignment across missing data patterns. A simulation study shows that, under a missing not at random mechanism, the proposed approach can yield a substantial reduction in estimation bias and improvement in coverage probability compared to the weighted average approach. The proposed method is also compared with the linear mixed model and generalized estimating equation approaches under various missing data mechanisms.

Session 73 Machine Learning Methods for Causal Inference in Health Studies

Causal Inference of Interaction Effects with Inverse Propensity Weighting, G-Computation and Tree-Based Standardization
Joseph Kang1, Xiaogang Su2, Lei Liu1 and Martha Daviglus3
1Northwestern University
2University of Texas at El Paso
3University of Illinois at Chicago
joseph-kang@northwestern.edu

Given the recent interest in subgroup-level studies and personalized medicine, health research with observational studies has been developed for interaction effects of measured confounders. In estimating interaction effects, the inverse propensity weighting (IPW) method has been widely advocated despite the immediate availability of other competing methods, such as G-computation estimates. This talk compares the advocated IPW method, the G-computation method, and our new tree-based standardization method, which we call the Interaction effect Tree (IT). The IT procedure uses a likelihood-based decision rule to divide the subgroups into homogeneous groups where G-computation can be applied. Our simulation studies indicate that the IT-based method, along with G-computation, works robustly, while the advocated IPW method needs some caution in its weighting. We applied the IT-based method to assess the effect of being overweight or obese on coronary artery calcification (CAC) in the Chicago Healthy Aging Study cohort.
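The two standard estimators compared above can be contrasted on simulated data with a known treatment effect. For brevity the sketch plugs in the true propensity and outcome models instead of fitted ones; the data-generating process is an illustrative assumption:

```python
import numpy as np

# IPW vs G-computation estimates of an average treatment effect (ATE).
# True models are used in place of fitted ones; all data are simulated
# assumptions with a known ATE of 2.

rng = np.random.default_rng(5)
n = 100_000
x = rng.normal(size=n)
e = 1 / (1 + np.exp(-x))                 # propensity P(A = 1 | x)
a = (rng.random(n) < e).astype(float)    # treatment assignment
y = 1 + 2 * a + x + rng.normal(size=n)   # outcome, true ATE = 2

# IPW: reweight each arm to represent the full population
ate_ipw = np.mean(a * y / e) - np.mean((1 - a) * y / (1 - e))

# G-computation: average the outcome model's predictions under a=1 and a=0
m1, m0 = 1 + 2 + x, 1 + x                # E[Y | a, x] under each arm
ate_g = np.mean(m1 - m0)
print(f"IPW {ate_ipw:.2f}, G-computation {ate_g:.2f} (truth 2)")
```

IPW's sensitivity to small propensities (large weights) is the "caution in its weighting" the abstract refers to.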

Practice of Causal Inference with the Propensity of Being Zero or One
Joseph Kang1, Wendy Chan1, Mi-Ok Kim2 and Peter M. Steiner3
1Northwestern University
2University of Cincinnati, Cincinnati Children's Hospital Medical Center
3University of Wisconsin-Madison
wendychan2016@u.northwestern.edu

Causal inference methodologies have been developed over the past decade to estimate the unconfounded effect of an exposure under several key assumptions. These assumptions include the absence of unmeasured confounders, the independence of the effect of one study subject from another, and propensity scores being bounded away from zero and one (the positivity assumption). The first two assumptions have received much attention in the literature, yet the positivity assumption has been discussed in only a few recent papers. Propensity scores of zero or one are indicative of deterministic exposure, so that causal effects cannot be defined for these subjects. Therefore, these subjects need to be removed because no comparable comparison groups can be found for them. In this paper, we evaluate and compare currently available causal inference methods in the context of the positivity assumption. We propose a tree-based method that can be easily implemented in R software. R code for the studies is available online.

Propensity Score and Proximity Matching Using Random Forest
Peng Zhao1, Xiaogang Su2 and Juanjuan Fan1

1San Diego State University
2University of Texas at El Paso
jjfan@mail.sdsu.edu
To reduce potential bias in observational studies, it is essential to have balanced distributions of all available background information between cases and controls. The propensity score has been a key matching variable in this area. However, this approach has several limitations, including difficulties in handling missing values, categorical variables, and interactions. Random forest, as an ensemble of many classification trees, is straightforward to use and can easily overcome those issues. Each classification tree in a random forest recursively partitions the available dataset into subsets to increase the purity of the terminal nodes. With this process, the cases and controls in the same terminal node automatically become the best balanced match. By averaging the outcome of each individual tree, random forest can provide robust and balanced matching results. The proposed method is applied to data from the National Health and Nutrition Examination Survey (NHANES).

Session 74 JP Hsu Memorial Session

Weighted Least-Squares Method for Right-Censored Data in Accelerated Failure Time Model
Lili Yu
Georgia Southern University
lyu@georgiasouthern.edu
The classical accelerated failure time (AFT) model has been extensively investigated due to its direct interpretation of the covariate effects on the mean survival time in survival analysis. However, this classical AFT model and its associated methodologies are built on the fundamental assumption of data homoscedasticity. Consequently, when the homoscedasticity assumption is violated, as often seen in real applications, the estimators lose efficiency and the associated inference is not reliable. Furthermore, none of the existing methods can estimate the intercept consistently. To overcome these drawbacks, we propose a semiparametric approach in this paper for both homoscedastic and heteroscedastic data. This approach utilizes a weighted least-squares equation with synthetic observations weighted by the square root of their variances, where the variances are estimated via local polynomial regression. We establish the limiting distributions of the resulting coefficient estimators and prove that both the slope parameters and the intercept can be consistently estimated. We evaluate the finite sample performance of the proposed approach through simulation studies, and demonstrate its superiority in efficiency and reliability over existing methods through a real example with heteroscedastic data.
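The weighted least-squares mechanics for heteroscedastic data can be sketched in a few lines. The sketch below assumes uncensored observations and a known variance function, so it shows only a generic inverse-variance solve, not the authors' synthetic-observation estimator with local-polynomial variance estimates:

```python
import numpy as np

# Weighted vs ordinary least squares under heteroscedasticity.
# The linear model, variance function, and data are illustrative
# assumptions; true intercept = 1, true slope = 2.

rng = np.random.default_rng(6)
n = 2_000
x = rng.uniform(1, 5, n)
sigma = 0.5 * x                          # error SD grows with x
y = 1 + 2 * x + sigma * rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
w = 1 / sigma**2                         # inverse-variance weights

# Solve the weighted normal equations (X' W X) beta = X' W y
XtW = X.T * w
beta_wls = np.linalg.solve(XtW @ X, XtW @ y)
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
print("WLS:", beta_wls.round(3), "OLS:", beta_ols.round(3))
```

Both estimators are consistent here, but the variance-weighted solve is more efficient when the variance function is (approximately) right, the same motivation as in the abstract.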

A Comparison of Size and Power of Tests of Hypotheses on Parameters Based on Two Generalized Lindley Distributions
Macaulay Okwuokenye
Biogen Idec
macaulayokwuokenye@biogenidec.com
Data (complete and censored) following the Lindley distribution are generated and analyzed using two generalized Lindley distributions, and maximum likelihood estimates of parameters from the generalized Lindley distributions are obtained. Size and power of tests of hypotheses on the parameters are assessed, drawing on asymptotic properties of the maximum likelihood estimators. Results suggest that whereas the size of some of the tests of hypotheses based on


the considered generalized distributions is essentially alpha-level, some are possibly not; the power of tests of hypotheses on the Lindley distribution parameter differs between the two distributions.

Session 75 Challenge and New Development in Model Fitting and Selection

Robust Estimates of Divergence Times and Selection with a Poisson Random Field Model
Amei Amei1 and Brian Tilston Smith2

1University of Nevada at Las Vegas
2American Museum of Natural History
ameiamei@unlv.edu
Mutation frequencies can be modeled as a Poisson random field (PRF) to estimate speciation times and the degree of selection on newly arisen mutations. This approach provides a quantitative theory for comparing intraspecific polymorphism with interspecific divergence in the presence of selection and can be used to estimate population genetic parameters. First, we modified a recently developed time-dependent PRF model to independently estimate genetic parameters from a nuclear and mitochondrial DNA data set of 22 sister pairs of birds that have diverged across a biogeographic barrier. We found that species that inhabit humid habitats had more recent divergence times, larger effective population sizes, and smaller selective effects than those that inhabit drier habitats, but overall the mitochondrial DNA was under weak selection. Our study indicates that PRF models are useful for estimating various population genetic parameters and serve as a framework for incorporating estimates of selection into comparative phylogeographic studies. Second, due to the built-in feature of the species divergence time, the time-dependent PRF model is especially suitable for estimating selective effects of more recent mutations, such as the mutations that have occurred in the human genome. By analyzing the estimated distribution of the selective coefficients at each individual gene, for example the sign and magnitude of the mean selection coefficient, we will be able to detect a gene or a group of genes that are related to the diagnosed cancer. Moreover, the estimate of the species divergence time will provide useful information regarding the occurrence time of the cancer.

On A Class of Maximum Empirical Likelihood Estimators Defined By Convex Functions
Hanxiang Peng and Fei Tan
Indiana University-Purdue University Indianapolis
ftan@math.iupui.edu
In this talk we introduce a class of estimators defined by convex criterion functions and show that they are maximum empirical likelihood estimators (MELEs). We apply the results to obtain MELEs for quantiles, quantile regression, and Cox regression when additional information is available. We report some simulation results and real data applications.

Properties of the Marginal Survival Functions for Dependent Censored Data under an Assumed Archimedean Copula
Antai Wang
New Jersey Institute of Technology
aw224@njit.edu
Given dependent censored data (X, δ) = (min(T, C), I(T < C)) from an Archimedean copula model, we give general formulas for the possible marginal survival functions of T and C. Based on our formulas, we can easily establish the relationship between all these survival functions and derive some

useful identifiability results. Also based on our formulas, we propose a new estimator of the marginal survival function when the Archimedean copula model is assumed to be known. We derive bias formulas for our estimator and other existing estimators. Simulation studies have shown that our estimator is comparable with the copula-graphic estimator proposed by Zheng and Klein (1995) and Rivest and Wells (2001), and with Zheng and Klein's estimator (1994), under the Archimedean copula assumption. We end our talk with some discussions.
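Dependent censored data of the form (X, δ) = (min(T, C), I(T < C)) under an Archimedean copula can be simulated directly. The sketch below uses a Clayton copula with exponential marginals; the copula parameter and marginals are illustrative assumptions:

```python
import numpy as np

# Generate dependent censored data (X, delta) from a Clayton (Archimedean)
# copula via the conditional-distribution method. Marginals and theta are
# illustrative assumptions.

rng = np.random.default_rng(7)
n, theta = 10_000, 2.0                   # Clayton dependence parameter

# U uniform, then V | U = u drawn by inverting the conditional copula
u = rng.random(n)
w = rng.random(n)
v = ((w ** (-theta / (1 + theta)) - 1) * u ** (-theta) + 1) ** (-1 / theta)

t = -np.log(u)                           # T ~ Exp(1) by inverse transform
c = -2 * np.log(v)                       # C ~ Exp(1/2), dependent on T

x = np.minimum(t, c)                     # observed time
delta = (t < c).astype(int)              # censoring indicator I(T < C)
print(f"event rate {delta.mean():.2f}, Kendall tau = {theta / (theta + 2):.2f}")
```

For the Clayton family, Kendall's tau equals theta / (theta + 2), which gives a quick check that the intended dependence strength was generated.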

Dual Model Misspecification in Generalized Linear Models with Error in Variables
Xianzheng Huang
University of South Carolina
huang@stat.sc.edu
We study maximum likelihood estimation of regression parameters in generalized linear models for a binary response with error-prone covariates when the distribution of the error-prone covariate or the link function is misspecified. We revisit the remeasurement method proposed by Huang, Stefanski, and Davidian (2006) for detecting latent-variable model misspecification and examine its operating characteristics in the presence of link misspecification. Furthermore, we propose a new diagnostic method for assessing assumptions on the link function. Combining these two methods yields informative diagnostic procedures that can identify which model assumption is violated and also reveal the direction in which the true latent-variable distribution or the true link function deviates from the assumed one.

Session 76 Advanced Methods and Their Applications in Survival Analysis

Kernel Smoothed Profile Likelihood Estimation in the Accelerated Failure Time Frailty Model for Clustered Survival Data
Bo Liu1, Wenbin Lu1 and Jiajia Zhang2

1North Carolina State University
2University of South Carolina
jzhang@mailbox.sc.edu
Clustered survival data frequently arise in biomedical applications, where event times of interest are clustered into groups such as families. In this article, we consider an accelerated failure time frailty model for clustered survival data and develop nonparametric maximum likelihood estimation for it via a kernel-smoother-aided EM algorithm. We show that the proposed estimator for the regression coefficients is consistent, asymptotically normal, and semiparametric efficient when the kernel bandwidth is properly chosen. An EM-aided numerical differentiation method is derived for estimating its variance. Simulation studies evaluate the finite sample performance of the estimator, and it is applied to the Diabetic Retinopathy data set.

Model-free Screening for Lifetime Data Analysis with Ultrahigh-dimensional Biomarkers: Survival Impacting
Jialiang Li1, Qi Zheng2 and Limin Peng2

1National University of Singapore
2Emory University
qizheng@emory.edu
Marginal regression-based ranking methods are widely adopted to screen ultrahigh-dimensional biomarkers in biomedical studies. An assumed regression model may not fit real data in practice. We consider a model-free screening approach specifically for censored lifetime outcomes, measuring the average survival differences


with and without the covariates. The proposed survival impacting index can be implemented with familiar nonparametric estimation procedures and avoids imposing any rigid model assumptions. We establish the sure screening property of the index and the asymptotic distribution of the estimated index to facilitate inferences. Simulations are carried out to assess the performance of our method. A lung cancer data set is analyzed as an illustration.

Analysis of Event History Data in Tuberculosis (TB) Screening
Joan Hu
Simon Fraser University
joanh@stat.sfu.ca
Tuberculosis (TB) is an infectious disease spread by the airborne route. An important public health intervention in TB prevention is tracing individuals (TB contacts) who may be at risk of having TB infection or active TB disease as a result of having shared air space with an active TB case. This talk presents an analysis of the data collected from 7,921 people identified as contacts in the TB registry of British Columbia, Canada, in an attempt to identify risk factors for TB development among TB contacts. Challenges encountered in the analysis include clustered subjects, covariates missing not at random (MNAR or NMAR), and a portion of subjects who potentially will never experience the TB event.

On the Dependence Structure of Bivariate Recurrent Event Processes: Inference and Estimation
Jing Ning1, Yong Chen2, Chunyan Cai2, Xuelin Huang1 and Mei-Cheng Wang3

1University of Texas MD Anderson Cancer Center
2University of Texas Health Science Center at Houston
3Johns Hopkins University
jning@mdanderson.org
Bivariate or multivariate recurrent event processes are often encountered in longitudinal studies in which more than one type of event is of interest. There has been much research on regression analysis for such data, but little has been done to address the problem of how to measure dependence between two types of recurrent event processes. We propose a time-dependent measure, termed the rate ratio, to assess the local dependence between two types of recurrent event processes. We model the rate ratio as a parametric regression function of time and leave unspecified all other aspects of the distribution of the bivariate recurrent event processes. We develop a composite-likelihood procedure for model fitting and parameter estimation. We show that the proposed composite-likelihood estimator possesses consistency and asymptotic normality. The finite sample performance of the proposed method is evaluated through simulation studies and illustrated by an application to data from a soft tissue sarcoma study.

Session 77 High Dimensional Variable Selection and Multiple Testing

On Procedures Controlling the False Discovery Rate for Testing Hierarchically Ordered Hypotheses
Gavin Lynch and Wenge Guo
New Jersey Institute of Technology
wengeguo@njit.edu
Complex large-scale studies, such as those involving microarrays and quantitative trait loci, often require testing multiple hierarchically ordered hypotheses. However, most existing false discovery rate (FDR) controlling procedures do not exploit the inherent hierarchical structure among the tested hypotheses. In this talk I present key developments toward controlling the FDR when testing hierarchically ordered hypotheses. First, I offer a general framework under which hierarchical testing procedures can be developed. Then, I present hierarchical testing procedures that control the FDR under various forms of dependence. Simulation studies show that the proposed methods can be more powerful than alternative methods.
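The building block such procedures generalize is the Benjamini-Hochberg step-up procedure; a minimal implementation is sketched below as background (this is the flat-hypothesis baseline, not the authors' hierarchical method):

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Return a boolean rejection mask controlling the FDR at level alpha
    (Benjamini-Hochberg step-up; valid under independence or PRDS)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)                     # indices of sorted p-values
    sorted_p = p[order]
    thresh = alpha * np.arange(1, m + 1) / m  # BH critical values i*alpha/m
    below = np.nonzero(sorted_p <= thresh)[0]
    reject = np.zeros(m, dtype=bool)
    if below.size:                            # step-up: reject everything up to
        reject[order[: below[-1] + 1]] = True # the largest i with p_(i) <= i*alpha/m
    return reject
```

Note the step-up character: a hypothesis can be rejected even if its own p-value exceeds its critical value, as long as some larger-indexed p-value falls below its threshold.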

Sufficient Dimension Reduction in Binary Classification
Seung Jun Shin^1, Yichao Wu^2, Hao Helen Zhang^3 and Yufeng Liu^4
^1University of Texas MD Anderson Cancer Center
^2North Carolina State University
^3University of Arizona
^4University of North Carolina at Chapel Hill
wu@stat.ncsu.edu
Reducing the dimensionality of the data is essential for binary classification with high-dimensional covariates. In the context of sufficient dimension reduction (SDR), most, if not all, existing SDR methods suffer in binary classification. In this talk we target SDR directly for binary classification and propose a new method based on support vector machines. The new method is supported by both numerical evidence and theoretical justification.
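The abstract gives no algorithmic detail; as a generic illustration of the underlying idea (the normal vector of a linear SVM decision boundary estimates a direction in the central subspace under a single-index model), here is a minimal hinge-loss fit by subgradient descent. This is hypothetical illustrative code, not the authors' method, and `svm_direction` and its parameters are our own names:

```python
import numpy as np

def svm_direction(X, y, lam=0.01, lr=0.1, iters=2000):
    """Return the unit-norm weight vector of a linear SVM
    (mean hinge loss + (lam/2)||w||^2) fit by subgradient descent.
    Labels y must be coded +1/-1; the returned direction is a candidate
    basis vector for the central subspace under a single-index model."""
    n, p = X.shape
    w, b = np.zeros(p), 0.0
    for t in range(iters):
        margin = y * (X @ w + b)
        active = margin < 1                          # margin violators
        # subgradient of the regularized mean hinge loss
        gw = lam * w - (y[active, None] * X[active]).sum(axis=0) / n
        gb = -y[active].sum() / n
        step = lr / (1.0 + lam * t)                  # decaying step size
        w -= step * gw
        b -= step * gb
    return w / np.linalg.norm(w)

# toy check under a single-index model y = sign(x' beta + noise)
rng = np.random.default_rng(1)
beta = np.array([1.0, -1.0, 0.0, 0.0]) / np.sqrt(2.0)
X = rng.standard_normal((500, 4))
y = np.sign(X @ beta + 0.1 * rng.standard_normal(500))
w_hat = svm_direction(X, y)
cos = abs(w_hat @ beta)   # alignment with the true direction
```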

Rate Optimal Multiple Testing Procedure (ROMP) in High-dimensional Regression
Zhigen Zhao^1 and Pengsheng Ji^2
^1Temple University
^2University of Georgia
psji@uga.edu
The variable selection and multiple testing problems for regression share essentially the same goal: identifying the important variables among many. Research has focused on selection consistency, which is possible only if the signals are sufficiently strong. In contrast, the signals in many modern applications are rare and weak. In this paper we develop a two-stage testing procedure, ROMP (Rate Optimal Multiple testing Procedure), so named because it achieves the fastest convergence rate of the marginal false non-discovery rate (mFNR) while asymptotically controlling the marginal false discovery rate (mFDR) at any designated level alpha.
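For reference, mFDR and mFNR are the "marginal" analogues of FDR and FNR; under one common convention (stated here as background, not taken from the paper), with \(R\) rejections, \(V\) false rejections, and \(T\) non-rejected signals among \(m\) hypotheses,

```latex
\mathrm{mFDR} \;=\; \frac{\mathbb{E}[V]}{\mathbb{E}[R]},
\qquad
\mathrm{mFNR} \;=\; \frac{\mathbb{E}[T]}{\mathbb{E}[m - R]},
```

i.e., ratios of expectations rather than the expectation of a ratio such as \(\mathrm{FDR} = \mathbb{E}[V/\max(R,1)]\), which makes them more tractable in rare-and-weak asymptotics.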

Pathwise Calibrated Active Shooting Algorithm with Application to Semiparametric Graph Estimation
Tuo Zhao^1 and Han Liu^2
^1Johns Hopkins University
^2Princeton University
hanliu@princeton.edu
Pathwise coordinate optimization, combined with the active set strategy, is arguably one of the most popular computational frameworks for high-dimensional problems. It is conceptually simple, easy to implement, and applicable to a wide range of convex and nonconvex problems. However, there is still a gap between its theoretical justification and practical success. For high-dimensional convex problems, existing theories show only sublinear rates of convergence; for nonconvex problems, almost no theory on rates of convergence exists. To bridge this gap, we propose a novel unified computational framework, named PICASA, for pathwise coordinate optimization. The main difference between PICASA and existing pathwise coordinate descent methods is that we exploit a proximal gradient pilot to identify an active set. This modification, though simple, has profound impact: with high probability, PICASA attains a global geometric rate of convergence to a unique sparse local solution with good statistical properties (e.g., minimax optimality, oracle property) for solving a large family of convex and nonconvex problems. Unlike most existing analyses, which assume that all computation can be carried out exactly without worrying about numerical precision, our theory explicitly accounts for numerical accuracy and is thus more realistic. The PICASA method is quite general and can be combined with different coordinate descent strategies, such as cyclical, greedy, and randomized coordinate descent. As an application, we apply the PICASA method to a family of nonconvex optimization problems motivated by estimating semiparametric graphical models. PICASA allows us to obtain new statistical recovery results on both parameter estimation and graph selection consistency that do not exist in the literature. Thorough numerical results are also provided to back up our theoretical arguments.
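PICASA itself is not spelled out in the abstract; as background, the plain pathwise coordinate descent with an active-set strategy that it refines can be sketched for the lasso. This is a hedged, generic illustration (function names and the convergence scheme are ours), not the PICASA algorithm:

```python
import numpy as np

def soft_threshold(z, t):
    """Scalar proximal operator of t*|.| (the lasso coordinate update)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def _sweep(X, y, b, lam, idx, col_sq, n):
    """One coordinate-descent pass over the coordinates in idx (in place)."""
    for j in idx:
        r_j = y - X @ b + X[:, j] * b[j]                 # partial residual
        b[j] = soft_threshold(X[:, j] @ r_j / n, lam) / col_sq[j]

def lasso_path_cd(X, y, lams, tol=1e-8, max_iter=1000):
    """Pathwise coordinate descent with an active-set strategy for
    min_b (1/2n)||y - Xb||^2 + lam*||b||_1, warm-starting along a
    decreasing sequence of regularization parameters lams."""
    n, p = X.shape
    col_sq = (X ** 2).sum(axis=0) / n
    b = np.zeros(p)
    path = []
    for lam in lams:
        while True:
            _sweep(X, y, b, lam, range(p), col_sq, n)    # full pass
            active = np.nonzero(b)[0]
            for _ in range(max_iter):                    # cheap active-set passes
                b_old = b.copy()
                _sweep(X, y, b, lam, active, col_sq, n)
                if np.max(np.abs(b - b_old)) < tol:
                    break
            b_check = b.copy()
            _sweep(X, y, b, lam, range(p), col_sq, n)    # did any coordinate enter?
            if np.max(np.abs(b - b_check)) < tol:
                break                                    # converged at this lam
        path.append(b.copy())
    return np.array(path)

# toy usage: two strong signals among ten coefficients
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))
beta_true = np.zeros(10)
beta_true[:2] = [2.0, -1.5]
y = X @ beta_true + 0.1 * rng.standard_normal(100)
path = lasso_path_cd(X, y, lams=[0.5, 0.1, 0.01])
```

The warm start is the key design choice: each solution along the decreasing lambda sequence initializes the next solve, so the expensive full passes are rare and most work happens on the small active set.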

2014 ICSA-KISS Applied Statistics Symposium, Portland, June 15-18

Index of Authors

[Alphabetical author index, Abantovalle-Zou, with session and abstract page references to the printed program.]

  • Welcome
  • Conference Information
    • Committees
    • Acknowledgements
    • Conference Venue Information
    • Program Overview
    • Keynote Lectures
    • Student Paper Awards
    • Short Courses
    • Social Program
    • ICSA 2015 in Fort Collins, CO
    • ICSA 2014 China Statistics Conference
    • ICSA Dinner at 2014 JSM
  • Scientific Program
    • Monday June 16, 8:00 AM - 9:30 AM
    • Monday June 16, 10:00 AM - 12:00 PM
    • Monday June 16, 1:30 PM - 3:10 PM
    • Monday June 16, 3:30 PM - 5:10 PM
    • Tuesday June 17, 8:20 AM - 9:30 AM
    • Tuesday June 17, 10:00 AM - 12:00 PM
    • Tuesday June 17, 1:30 PM - 3:10 PM
    • Tuesday June 17, 3:30 PM - 5:30 PM
    • Wednesday June 18, 8:30 AM - 10:10 AM
    • Wednesday June 18, 10:30 AM - 12:10 PM
  • Abstracts
    • Session 1 Emerging Statistical Methods for Complex Data
    • Session 2 Statistical Methods for Sequencing Data Analysis
    • Session 3 Modeling Big Biological Data with Complex Structures
    • Session 4 Bayesian Approaches for Modeling Dynamic Non-Gaussian Responses
    • Session 5 Recent Advances in Astro-Statistics
    • Session 6 Statistical Methods and Application in Genetics
    • Session 7 Statistical Inference of Complex Associations in High-Dimensional Data
    • Session 8 Recent Developments in Survival Analysis
    • Session 9 Industry Practice and Regulatory Pathway for Benefit-Risk Assessment of Medicinal Products
    • Session 10 Analysis of Observational Studies and Clinical Trials
    • Session 11 Lifetime Data Analysis
    • Session 12 Safety Signal Detection and Safety Analysis
    • Session 13 Survival and Recurrent Event Data Analysis
    • Session 14 Statistical Analysis on Massive Data from Point Processes
    • Session 15 High Dimensional Inference (or Testing)
    • Session 16 Phase II Clinical Trial Design with Survival Endpoint
    • Session 17 Statistical Modeling of High-throughput Genomics Data
    • Session 18 Statistical Applications in Finance
    • Session 19 Hypothesis Testing
    • Session 20 Design and Analysis of Clinical Trials
    • Session 21 New Methods for Big Data
    • Session 22 New Statistical Methods for Analysis of High Dimensional Genomic Data
    • Session 23 Recent Advances in Analysis of Longitudinal Data with Informative Observation Process
    • Session 24 Bayesian Models for High Dimensional Complex Data
    • Session 25 Statistical Methods for Network Analysis
    • Session 26 New Analysis Methods for Understanding Complex Diseases and Biology
    • Session 27 Recent Advances in Time Series Analysis
    • Session 28 Analysis of Correlated Longitudinal and Survival Data
    • Session 29 Clinical Pharmacology
    • Session 30 Sample Size Estimation
    • Session 31 Predictions in Clinical Trials
    • Session 32 Recent Advances in Statistical Genetics
    • Session 33 Structured Approach to High Dimensional Data with Sparsity and Low Rank Factorization
    • Session 34 Recent Developments in Dimension Reduction, Variable Selection, and Their Applications
    • Session 35 Post-Discontinuation Treatment in Randomized Clinical Trials
    • Session 36 New Advances in Semi-Parametric Modeling and Survival Analysis
    • Session 37 High-Dimensional Data Analysis: Theory and Application
    • Session 38 Leading Across Boundaries: Leadership Development for Statisticians
    • Session 39 Recent Advances in Adaptive Designs in Early Phase Trials
    • Session 40 High Dimensional Regression/Machine Learning
    • Session 41 Distributional Inference and Its Impact on Statistical Theory and Practice
    • Session 42 Applications of Spatial Modeling and Imaging Data
    • Session 43 Recent Development in Survival Analysis and Statistical Genetics
    • Session 44 Bayesian Methods and Applications in Clinical Trials with Small Population
    • Session 45 Recent Developments in Assessing Predictive Models in Survival Analysis
    • Session 46 Missing Data: the Interface between Survey Sampling and Biostatistics
    • Session 47 New Statistical Methods for Comparative Effectiveness Research and Personalized Medicine
    • Session 48 Student Award Session 1
    • Session 49 Network Analysis/Unsupervised Methods
    • Session 50 Personalized Medicine and Adaptive Design
    • Session 51 New Development in Functional Data Analysis
    • Session 52 Recent Regulatory/Industry Experience in Biosimilar Trial Designs
    • Session 53 Gatekeeping Procedures and Their Application in Pivotal Clinical Trials
    • Session 54 Approaches to Assessing Qualitative Interactions
    • Session 55 Interim Decision-Making in Phase II Trials
    • Session 56 Recent Advancement in Statistical Methods
    • Session 57 Building Bridges between Research and Practice in Time Series Analysis
    • Session 58 Recent Advances in Design for Biostatistical Problems
    • Session 59 Student Award Session 2
    • Session 60 Semi-parametric Methods
    • Session 61 Statistical Challenges in Variable Selection for Graphical Modeling
    • Session 62 Recent Advances in Non- and Semi-Parametric Methods
    • Session 63 Statistical Challenges and Development in Cancer Screening Research
    • Session 64 Recent Developments in the Visualization and Exploration of Spatial Data
    • Session 65 Advancement in Biostatistical Methods and Applications
    • Session 66 Analysis of Complex Data
    • Session 67 Statistical Issues in Co-development of Drug and Biomarker
    • Session 68 New Challenges for Statistical Analyst/Programmer
    • Session 69 Adaptive and Sequential Methods for Clinical Trials
    • Session 70 Survival Analysis
    • Session 71 Complex Data Analysis: Theory and Application
    • Session 72 Recent Development in Statistical Methods for Missing Data
    • Session 73 Machine Learning Methods for Causal Inference in Health Studies
    • Session 74 JP Hsu Memorial Session
    • Session 75 Challenge and New Development in Model Fitting and Selection
    • Session 76 Advanced Methods and Their Applications in Survival Analysis
    • Session 77 High Dimensional Variable Selection and Multiple Testing
  • Index of Authors
Page 5: The City of Roses Welcomes You! - WordPress.com...Mei-Ling Ting Lee U. of Maryland Yoonkyung Lee Ohio State U. Meng -Ling Liu New York U. Xinhua Liu Columbia U. Xiaolong Luo Celgene
Page 6: The City of Roses Welcomes You! - WordPress.com...Mei-Ling Ting Lee U. of Maryland Yoonkyung Lee Ohio State U. Meng -Ling Liu New York U. Xinhua Liu Columbia U. Xiaolong Luo Celgene
Page 7: The City of Roses Welcomes You! - WordPress.com...Mei-Ling Ting Lee U. of Maryland Yoonkyung Lee Ohio State U. Meng -Ling Liu New York U. Xinhua Liu Columbia U. Xiaolong Luo Celgene
Page 8: The City of Roses Welcomes You! - WordPress.com...Mei-Ling Ting Lee U. of Maryland Yoonkyung Lee Ohio State U. Meng -Ling Liu New York U. Xinhua Liu Columbia U. Xiaolong Luo Celgene
Page 9: The City of Roses Welcomes You! - WordPress.com...Mei-Ling Ting Lee U. of Maryland Yoonkyung Lee Ohio State U. Meng -Ling Liu New York U. Xinhua Liu Columbia U. Xiaolong Luo Celgene
Page 10: The City of Roses Welcomes You! - WordPress.com...Mei-Ling Ting Lee U. of Maryland Yoonkyung Lee Ohio State U. Meng -Ling Liu New York U. Xinhua Liu Columbia U. Xiaolong Luo Celgene
Page 11: The City of Roses Welcomes You! - WordPress.com...Mei-Ling Ting Lee U. of Maryland Yoonkyung Lee Ohio State U. Meng -Ling Liu New York U. Xinhua Liu Columbia U. Xiaolong Luo Celgene
Page 12: The City of Roses Welcomes You! - WordPress.com...Mei-Ling Ting Lee U. of Maryland Yoonkyung Lee Ohio State U. Meng -Ling Liu New York U. Xinhua Liu Columbia U. Xiaolong Luo Celgene
Page 13: The City of Roses Welcomes You! - WordPress.com...Mei-Ling Ting Lee U. of Maryland Yoonkyung Lee Ohio State U. Meng -Ling Liu New York U. Xinhua Liu Columbia U. Xiaolong Luo Celgene
Page 14: The City of Roses Welcomes You! - WordPress.com...Mei-Ling Ting Lee U. of Maryland Yoonkyung Lee Ohio State U. Meng -Ling Liu New York U. Xinhua Liu Columbia U. Xiaolong Luo Celgene
Page 15: The City of Roses Welcomes You! - WordPress.com...Mei-Ling Ting Lee U. of Maryland Yoonkyung Lee Ohio State U. Meng -Ling Liu New York U. Xinhua Liu Columbia U. Xiaolong Luo Celgene
Page 16: The City of Roses Welcomes You! - WordPress.com...Mei-Ling Ting Lee U. of Maryland Yoonkyung Lee Ohio State U. Meng -Ling Liu New York U. Xinhua Liu Columbia U. Xiaolong Luo Celgene
Page 17: The City of Roses Welcomes You! - WordPress.com...Mei-Ling Ting Lee U. of Maryland Yoonkyung Lee Ohio State U. Meng -Ling Liu New York U. Xinhua Liu Columbia U. Xiaolong Luo Celgene
Page 18: The City of Roses Welcomes You! - WordPress.com...Mei-Ling Ting Lee U. of Maryland Yoonkyung Lee Ohio State U. Meng -Ling Liu New York U. Xinhua Liu Columbia U. Xiaolong Luo Celgene
Page 19: The City of Roses Welcomes You! - WordPress.com...Mei-Ling Ting Lee U. of Maryland Yoonkyung Lee Ohio State U. Meng -Ling Liu New York U. Xinhua Liu Columbia U. Xiaolong Luo Celgene
Page 20: The City of Roses Welcomes You! - WordPress.com...Mei-Ling Ting Lee U. of Maryland Yoonkyung Lee Ohio State U. Meng -Ling Liu New York U. Xinhua Liu Columbia U. Xiaolong Luo Celgene
Page 21: The City of Roses Welcomes You! - WordPress.com...Mei-Ling Ting Lee U. of Maryland Yoonkyung Lee Ohio State U. Meng -Ling Liu New York U. Xinhua Liu Columbia U. Xiaolong Luo Celgene
Page 22: The City of Roses Welcomes You! - WordPress.com...Mei-Ling Ting Lee U. of Maryland Yoonkyung Lee Ohio State U. Meng -Ling Liu New York U. Xinhua Liu Columbia U. Xiaolong Luo Celgene
Page 23: The City of Roses Welcomes You! - WordPress.com...Mei-Ling Ting Lee U. of Maryland Yoonkyung Lee Ohio State U. Meng -Ling Liu New York U. Xinhua Liu Columbia U. Xiaolong Luo Celgene
Page 24: The City of Roses Welcomes You! - WordPress.com...Mei-Ling Ting Lee U. of Maryland Yoonkyung Lee Ohio State U. Meng -Ling Liu New York U. Xinhua Liu Columbia U. Xiaolong Luo Celgene
Page 25: The City of Roses Welcomes You! - WordPress.com...Mei-Ling Ting Lee U. of Maryland Yoonkyung Lee Ohio State U. Meng -Ling Liu New York U. Xinhua Liu Columbia U. Xiaolong Luo Celgene
Page 26: The City of Roses Welcomes You! - WordPress.com...Mei-Ling Ting Lee U. of Maryland Yoonkyung Lee Ohio State U. Meng -Ling Liu New York U. Xinhua Liu Columbia U. Xiaolong Luo Celgene
Page 27: The City of Roses Welcomes You! - WordPress.com...Mei-Ling Ting Lee U. of Maryland Yoonkyung Lee Ohio State U. Meng -Ling Liu New York U. Xinhua Liu Columbia U. Xiaolong Luo Celgene
Page 28: The City of Roses Welcomes You! - WordPress.com...Mei-Ling Ting Lee U. of Maryland Yoonkyung Lee Ohio State U. Meng -Ling Liu New York U. Xinhua Liu Columbia U. Xiaolong Luo Celgene
Page 29: The City of Roses Welcomes You! - WordPress.com...Mei-Ling Ting Lee U. of Maryland Yoonkyung Lee Ohio State U. Meng -Ling Liu New York U. Xinhua Liu Columbia U. Xiaolong Luo Celgene
Page 30: The City of Roses Welcomes You! - WordPress.com...Mei-Ling Ting Lee U. of Maryland Yoonkyung Lee Ohio State U. Meng -Ling Liu New York U. Xinhua Liu Columbia U. Xiaolong Luo Celgene
Page 31: The City of Roses Welcomes You! - WordPress.com...Mei-Ling Ting Lee U. of Maryland Yoonkyung Lee Ohio State U. Meng -Ling Liu New York U. Xinhua Liu Columbia U. Xiaolong Luo Celgene
Page 32: The City of Roses Welcomes You! - WordPress.com...Mei-Ling Ting Lee U. of Maryland Yoonkyung Lee Ohio State U. Meng -Ling Liu New York U. Xinhua Liu Columbia U. Xiaolong Luo Celgene
Page 33: The City of Roses Welcomes You! - WordPress.com...Mei-Ling Ting Lee U. of Maryland Yoonkyung Lee Ohio State U. Meng -Ling Liu New York U. Xinhua Liu Columbia U. Xiaolong Luo Celgene
Page 34: The City of Roses Welcomes You! - WordPress.com...Mei-Ling Ting Lee U. of Maryland Yoonkyung Lee Ohio State U. Meng -Ling Liu New York U. Xinhua Liu Columbia U. Xiaolong Luo Celgene
Page 35: The City of Roses Welcomes You! - WordPress.com...Mei-Ling Ting Lee U. of Maryland Yoonkyung Lee Ohio State U. Meng -Ling Liu New York U. Xinhua Liu Columbia U. Xiaolong Luo Celgene
Page 36: The City of Roses Welcomes You! - WordPress.com...Mei-Ling Ting Lee U. of Maryland Yoonkyung Lee Ohio State U. Meng -Ling Liu New York U. Xinhua Liu Columbia U. Xiaolong Luo Celgene
Page 37: The City of Roses Welcomes You! - WordPress.com...Mei-Ling Ting Lee U. of Maryland Yoonkyung Lee Ohio State U. Meng -Ling Liu New York U. Xinhua Liu Columbia U. Xiaolong Luo Celgene
Page 38: The City of Roses Welcomes You! - WordPress.com...Mei-Ling Ting Lee U. of Maryland Yoonkyung Lee Ohio State U. Meng -Ling Liu New York U. Xinhua Liu Columbia U. Xiaolong Luo Celgene
Page 39: The City of Roses Welcomes You! - WordPress.com...Mei-Ling Ting Lee U. of Maryland Yoonkyung Lee Ohio State U. Meng -Ling Liu New York U. Xinhua Liu Columbia U. Xiaolong Luo Celgene
Page 40: The City of Roses Welcomes You! - WordPress.com...Mei-Ling Ting Lee U. of Maryland Yoonkyung Lee Ohio State U. Meng -Ling Liu New York U. Xinhua Liu Columbia U. Xiaolong Luo Celgene
Page 41: The City of Roses Welcomes You! - WordPress.com...Mei-Ling Ting Lee U. of Maryland Yoonkyung Lee Ohio State U. Meng -Ling Liu New York U. Xinhua Liu Columbia U. Xiaolong Luo Celgene
Page 42: The City of Roses Welcomes You! - WordPress.com...Mei-Ling Ting Lee U. of Maryland Yoonkyung Lee Ohio State U. Meng -Ling Liu New York U. Xinhua Liu Columbia U. Xiaolong Luo Celgene
Page 43: The City of Roses Welcomes You! - WordPress.com...Mei-Ling Ting Lee U. of Maryland Yoonkyung Lee Ohio State U. Meng -Ling Liu New York U. Xinhua Liu Columbia U. Xiaolong Luo Celgene
Page 44: The City of Roses Welcomes You! - WordPress.com...Mei-Ling Ting Lee U. of Maryland Yoonkyung Lee Ohio State U. Meng -Ling Liu New York U. Xinhua Liu Columbia U. Xiaolong Luo Celgene
Page 45: The City of Roses Welcomes You! - WordPress.com...Mei-Ling Ting Lee U. of Maryland Yoonkyung Lee Ohio State U. Meng -Ling Liu New York U. Xinhua Liu Columbia U. Xiaolong Luo Celgene
Page 46: The City of Roses Welcomes You! - WordPress.com...Mei-Ling Ting Lee U. of Maryland Yoonkyung Lee Ohio State U. Meng -Ling Liu New York U. Xinhua Liu Columbia U. Xiaolong Luo Celgene
Page 47: The City of Roses Welcomes You! - WordPress.com...Mei-Ling Ting Lee U. of Maryland Yoonkyung Lee Ohio State U. Meng -Ling Liu New York U. Xinhua Liu Columbia U. Xiaolong Luo Celgene
Page 48: The City of Roses Welcomes You! - WordPress.com...Mei-Ling Ting Lee U. of Maryland Yoonkyung Lee Ohio State U. Meng -Ling Liu New York U. Xinhua Liu Columbia U. Xiaolong Luo Celgene
Page 49: The City of Roses Welcomes You! - WordPress.com...Mei-Ling Ting Lee U. of Maryland Yoonkyung Lee Ohio State U. Meng -Ling Liu New York U. Xinhua Liu Columbia U. Xiaolong Luo Celgene
Page 50: The City of Roses Welcomes You! - WordPress.com...Mei-Ling Ting Lee U. of Maryland Yoonkyung Lee Ohio State U. Meng -Ling Liu New York U. Xinhua Liu Columbia U. Xiaolong Luo Celgene
Page 51: The City of Roses Welcomes You! - WordPress.com...Mei-Ling Ting Lee U. of Maryland Yoonkyung Lee Ohio State U. Meng -Ling Liu New York U. Xinhua Liu Columbia U. Xiaolong Luo Celgene
Page 52: The City of Roses Welcomes You! - WordPress.com...Mei-Ling Ting Lee U. of Maryland Yoonkyung Lee Ohio State U. Meng -Ling Liu New York U. Xinhua Liu Columbia U. Xiaolong Luo Celgene
Page 53: The City of Roses Welcomes You! - WordPress.com...Mei-Ling Ting Lee U. of Maryland Yoonkyung Lee Ohio State U. Meng -Ling Liu New York U. Xinhua Liu Columbia U. Xiaolong Luo Celgene
Page 54: The City of Roses Welcomes You! - WordPress.com...Mei-Ling Ting Lee U. of Maryland Yoonkyung Lee Ohio State U. Meng -Ling Liu New York U. Xinhua Liu Columbia U. Xiaolong Luo Celgene
Page 55: The City of Roses Welcomes You! - WordPress.com...Mei-Ling Ting Lee U. of Maryland Yoonkyung Lee Ohio State U. Meng -Ling Liu New York U. Xinhua Liu Columbia U. Xiaolong Luo Celgene
Page 56: The City of Roses Welcomes You! - WordPress.com...Mei-Ling Ting Lee U. of Maryland Yoonkyung Lee Ohio State U. Meng -Ling Liu New York U. Xinhua Liu Columbia U. Xiaolong Luo Celgene