PACIFIC SYMPOSIUM ON BIOCOMPUTING 2017

ABSTRACT BOOK

Poster Presenters: Poster space is assigned by abstract page number. Please find the page that your abstract is on and put your poster on the poster board with the corresponding number (e.g., if your abstract is on page 50, put your poster on board #50).

Proceedings papers with oral presentations #2-39 are not assigned poster space.

Papers are organized first by session, then the last name of the first author. Presenting authors’ names are underlined.

TABLE OF CONTENTS

PROCEEDINGS PAPERS WITH ORAL PRESENTATION

COMPUTATIONAL APPROACHES TO UNDERSTANDING THE EVOLUTION OF MOLECULAR FUNCTION 1

IDENTIFICATION AND ANALYSIS OF BACTERIAL GENOMIC METABOLIC SIGNATURES ... 2
Nathan Bowerman, Nathan Tintle, Matthew DeJongh, Aaron A. Best

WHEN SHOULD WE NOT TRANSFER FUNCTIONAL ANNOTATION BETWEEN SEQUENCE PARALOGS? ... 3
Mengfei Cao, Lenore J. Cowen

PROSNET: INTEGRATING HOMOLOGY WITH MOLECULAR NETWORKS FOR PROTEIN FUNCTION PREDICTION ... 4
Sheng Wang, Meng Qu, Jian Peng

ON THE POWER AND LIMITS OF SEQUENCE SIMILARITY BASED CLUSTERING OF PROTEINS INTO FAMILIES ... 5
Christian Wiwie, Richard Röttger

IMAGING GENOMICS 6

INTEGRATIVE ANALYSIS FOR LUNG ADENOCARCINOMA PREDICTS MORPHOLOGICAL FEATURES ASSOCIATED WITH GENETIC VARIATIONS ... 7
Chao Wang, Hai Su, Lin Yang, Kun Huang

IDENTIFICATION OF DISCRIMINATIVE IMAGING PROTEOMICS ASSOCIATIONS IN ALZHEIMER'S DISEASE VIA A NOVEL SPARSE CORRELATION MODEL ... 8
Jingwen Yan, Shannon L. Risacher, Kwangsik Nho, Andrew J. Saykin, Li Shen

ENFORCING CO-EXPRESSION IN MULTIMODAL REGRESSION FRAMEWORK ... 9
Pascal Zille, Vince D. Calhoun, Yu-Ping Wang

METHODS TO ENSURE THE REPRODUCIBILITY OF BIOMEDICAL RESEARCH 10

EXPLORING THE REPRODUCIBILITY OF PROBABILISTIC CAUSAL MOLECULAR NETWORK MODELS ... 11
Ariella Cohain, Aparna A. Divaraniya, Kuixi Zhu, Joseph R. Scarpa, Andrew Kasarskis, Jun Zhu, Rui Chang, Joel T. Dudley, Eric E. Schadt

REPRODUCIBLE DRUG REPURPOSING: WHEN SIMILARITY DOES NOT SUFFICE ... 12
Emre Guney

EMPOWERING MULTI-COHORT GENE EXPRESSION ANALYSIS TO INCREASE REPRODUCIBILITY ... 13
Winston A. Haynes, Francesco Vallania, Charles Liu, Erika Bongen, Aurelie Tomczak, Marta Andres-Terrè, Shane Lofgren, Andrew Tam, Cole A. Deisseroth, Matthew D. Li, Timothy E. Sweeney, Purvesh Khatri

RABIX: AN OPEN-SOURCE WORKFLOW EXECUTOR SUPPORTING RECOMPUTABILITY AND INTEROPERABILITY OF WORKFLOW DESCRIPTIONS ... 14
Gaurav Kaushik, Sinisa Ivkovic, Janko Simonovic, Nebojsa Tijanic, Brandi Davis-Dusenbery, Deniz Kural

DATA SHARING AND CLINICAL GENETIC TESTING: SUCCESSES AND CHALLENGES ... 15
Shan Yang, Melissa Cline, Can Zhang, Benedict Paten, Stephen E. Lincoln

PATTERNS IN BIOMEDICAL DATA – HOW DO WE FIND THEM? 16

LEARNING ATTRIBUTES OF DISEASE PROGRESSION FROM TRAJECTORIES OF SPARSE LAB VALUES ... 17
Vibhu Agarwal, Nigam H. Shah

COMPUTER AIDED IMAGE SEGMENTATION AND CLASSIFICATION FOR VIABLE AND NON-VIABLE TUMOR IDENTIFICATION IN OSTEOSARCOMA ... 18
Harish Babu Arunachalam, Rashika Mishra, Bogdan Armaselu, Ovidiu Daescu, Maria Martinez, Patrick Leavey, Dinesh Rakheja, Kevin Cederberg, Anita Sengupta, Molly Ni'Suilleabhain

MISSING DATA IMPUTATION IN THE ELECTRONIC HEALTH RECORD USING DEEPLY LEARNED AUTOENCODERS ... 19
Brett K. Beaulieu-Jones, Jason H. Moore, The Pooled Resource Open-Access ALS Clinical Trials Consortium

DEVELOPMENT AND PERFORMANCE OF TEXT-MINING ALGORITHMS TO EXTRACT SOCIOECONOMIC STATUS FROM DE-IDENTIFIED ELECTRONIC HEALTH RECORDS ... 20
Brittany M. Hollister, Nicole A. Restrepo, Eric Farber-Eger, Dana C. Crawford, Melinda C. Aldrich, Amy Non

DEMO DASHBOARD: VISUALIZING AND UNDERSTANDING GENOMIC SEQUENCES USING DEEP NEURAL NETWORKS ... 21
Jack Lanchantin, Ritambhara Singh, Beilun Wang, Yanjun Qi

PREDICTIVE MODELING OF HOSPITAL READMISSION RATES USING ELECTRONIC MEDICAL RECORD-WIDE MACHINE LEARNING: A CASE-STUDY USING MOUNT SINAI HEART FAILURE COHORT ... 22
Khader Shameer, Kipp W. Johnson, Alexandre Yahi, Riccardo Miotto, Li Li, Doran Ricks, Jebakumar Jebakaran, Patricia Kovatch, Partho P. Sengupta, Annetine Gelijns, Alan Moskovitz, Bruce Darrow, David L. Reich, Andrew Kasarskis, Nicholas P. Tatonetti, Sean Pinney, Joel T. Dudley

METHODS FOR CLUSTERING TIME SERIES DATA ACQUIRED FROM MOBILE HEALTH APPS ... 23
Nicole Tignor, Pei Wang, Nicholas Genes, Linda Rogers, Steven G. Hershman, Erick R. Scott, Micol Zweig, Yu-Feng Yvonne Chan, Eric E. Schadt

A NEW RELEVANCE ESTIMATOR FOR THE COMPILATION AND VISUALIZATION OF DISEASE PATTERNS AND POTENTIAL DRUG TARGETS ... 24
Modest von Korff, Tobias Fink, Thomas Sander

DISCOVERY OF FUNCTIONAL AND DISEASE PATHWAYS BY COMMUNITY DETECTION IN PROTEIN-PROTEIN INTERACTION NETWORKS ... 25
Stephen J. Wilson, Angela D. Wilkins, Chih-Hsu Lin, Rhonald C. Lua, Olivier Lichtarge

PRECISION MEDICINE: FROM GENOTYPES AND MOLECULAR PHENOTYPES TOWARDS IMPROVED HEALTH AND THERAPIES 26

OPENING THE DOOR TO THE LARGE SCALE USE OF CLINICAL LAB MEASURES FOR ASSOCIATION TESTING: EXPLORING DIFFERENT METHODS FOR DEFINING PHENOTYPES ... 27
Christopher R. Bauer, Daniel Lavage, John Snyder, Joseph Leader, J. Matthew Mahoney, Sarah A. Pendergrass

TEMPORAL ORDER OF DISEASE PAIRS AFFECTS SUBSEQUENT DISEASE TRAJECTORIES: THE CASE OF DIABETES AND SLEEP APNEA ... 28
Mette Beck, David Westergaard, Leif Groop, Soren Brunak

HUMAN KINASES DISPLAY MUTATIONAL HOTSPOTS AT COGNATE POSITIONS WITHIN CANCER ... 29
Jonathan Gallion, Angela D. Wilkins, Olivier Lichtarge

MUSE: A MULTI-LOCUS SAMPLING-BASED EPISTASIS ALGORITHM FOR QUANTITATIVE GENETIC TRAIT PREDICTION ... 30
Dan He, Laxmi Parida

DIFFERENTIAL PATHWAY DEPENDENCY DISCOVERY ASSOCIATED WITH DRUG RESPONSE ACROSS CANCER CELL LINES ... 31
Gil Speyer, Divya Mahendra, Hai J. Tran, Jeff Kiefer, Stuart L. Schreiber, Paul A. Clemons, Harshil Dhruv, Michael Berens, Seungchan Kim

A METHYLATION-TO-EXPRESSION FEATURE MODEL FOR GENERATING ACCURATE PROGNOSTIC RISK SCORES AND IDENTIFYING DISEASE TARGETS IN CLEAR CELL KIDNEY CANCER ... 32
Jeffrey A. Thompson, Carmen J. Marsit

DE NOVO MUTATIONS IN AUTISM IMPLICATE THE SYNAPTIC ELIMINATION NETWORK ... 33
Guhan Ram Venkataraman, Chloe O'Connell, Fumiko Egawa, Dorna Kashef-Haghighi, Dennis Paul Wall

IDENTIFYING GENETIC ASSOCIATIONS WITH VARIABILITY IN METABOLIC HEALTH AND BLOOD COUNT LABORATORY VALUES: DIVING INTO THE QUANTITATIVE TRAITS BY LEVERAGING LONGITUDINAL DATA FROM AN EHR ... 34
Shefali S. Verma, Anastasia M. Lucas, Daniel R. Lavage, Joseph B. Leader, Raghu Metpally, Sarathbabu Krishnamurthy, Frederick Dewey, Ingrid Borecki, Alexander Lopez, John Overton, John Penn, Jeffrey Reid, Sarah A. Pendergrass, Gerda Breitwieser, Marylyn D. Ritchie

STRATEGIES FOR EQUITABLE PHARMACOGENOMIC-GUIDED WARFARIN DOSING AMONG EUROPEAN AND AFRICAN AMERICAN INDIVIDUALS IN A CLINICAL POPULATION ... 35
Laura Wiley, Jacob VanHouten, David Samuels, Melinda Aldrich, Dan Roden, Josh Peterson, Joshua Denny

SINGLE-CELL ANALYSIS AND MODELLING OF CELL POPULATION HETEROGENEITY 36

PRODUCTION OF A PRELIMINARY QUALITY CONTROL PIPELINE FOR SINGLE NUCLEI RNA-SEQ AND ITS APPLICATION IN THE ANALYSIS OF CELL TYPE DIVERSITY OF POST-MORTEM HUMAN BRAIN NEOCORTEX ... 37
Brian Aevermann, Jamison McCorrison, Pratap Venepally, Rebecca Hodge, Trygve Bakken, Jeremy Miller, Mark Novotny, Danny N. Tran, Francisco Diez-Fuertes, Lena Christiansen, Fan Zhang, Frank Steemers, Roger S. Lasken, Ed Lein, Nicholas Schork, Richard H. Scheuermann

TRACING CO-REGULATORY NETWORK DYNAMICS IN NOISY, SINGLE-CELL TRANSCRIPTOME TRAJECTORIES ... 38
Pablo Cordero, Joshua M. Stuart

AN UPDATED DEBARCODING TOOL FOR MASS CYTOMETRY WITH CELL TYPE-SPECIFIC AND CELL SAMPLE-SPECIFIC STRINGENCY ADJUSTMENT ... 39
Kristin I. Fread, William D. Strickland, Garry P. Nolan, Eli R. Zunder

PROCEEDINGS PAPERS WITH POSTER PRESENTATIONS

IMAGING GENOMICS 40

ADAPTIVE TESTING OF SNP-BRAIN FUNCTIONAL CONNECTIVITY ASSOCIATION VIA A MODULAR NETWORK ANALYSIS ... 41
Chen Gao, Junghi Kim, Wei Pan

EXPLORING BRAIN TRANSCRIPTOMIC PATTERNS: A TOPOLOGICAL ANALYSIS USING SPATIAL EXPRESSION NETWORKS ... 42
Zhana Kuncheva, Michelle L. Krishnan, Giovanni Montana

PATTERNS IN BIOMEDICAL DATA – HOW DO WE FIND THEM? 43

A DEEP LEARNING APPROACH FOR CANCER DETECTION AND RELEVANT GENE IDENTIFICATION ... 44
Padideh Danaee, Reza Ghaeini, David Hendrix

GENOME-WIDE INTERACTION WITH SELECTED TYPE 2 DIABETES LOCI REVEALS NOVEL LOCI FOR TYPE 2 DIABETES IN AFRICAN AMERICANS ... 45
Jacob M. Keaton, Jacklyn N. Hellwege, Maggie C.Y. Ng, Nicholette D. Palmer, James S. Pankow, Myriam Fornage, James G. Wilson, Adolofo Correa, Laura J. Rasmussen-Torvik, Jerome I. Rotter, Yii-Der I. Chen, Kent D. Taylor, Stephen S. Rich, Lynne E. Wagenknecht, Barry I. Freedman, Donald W. Bowden

META-ANALYSIS OF CONTINUOUS PHENOTYPES IDENTIFIES A GENE SIGNATURE THAT CORRELATES WITH COPD DISEASE STATUS ... 46
Madeleine Scott, Francesco Vallania, Purvesh Khatri

LEARNING PARSIMONIOUS ENSEMBLES FOR UNBALANCED COMPUTATIONAL GENOMICS PROBLEMS ... 47
Ana Stanescu, Gaurav Pandey

NETWORK MAP OF ADVERSE HEALTH EFFECTS AMONG VICTIMS OF INTIMATE PARTNER VIOLENCE ... 48
Kathleen Whiting, Larry Y. Liu, Mehmet Koyutürk, Gunnur Karakurt

PRECISION MEDICINE: FROM GENOTYPES AND MOLECULAR PHENOTYPES TOWARDS IMPROVED HEALTH AND THERAPIES 49

A POWERFUL METHOD FOR INCLUDING GENOTYPE UNCERTAINTY IN TESTS OF HARDY-WEINBERG EQUILIBRIUM ... 50
Andrew Beck, Alexander Luedtke, Keli Liu, Nathan Tintle

MICRORNA-AUGMENTED PATHWAYS (MIRAP) AND THEIR APPLICATIONS TO PATHWAY ANALYSIS AND DISEASE SUBTYPING ... 51
Diana Diaz, Michele Donato, Tin Nguyen, Sorin Draghici

FREQUENT SUBGRAPH MINING OF PERSONALIZED SIGNALING PATHWAY NETWORKS GROUPS PATIENTS WITH FREQUENTLY DYSREGULATED DISEASE PATHWAYS AND PREDICTS PROGNOSIS ... 52
Arda Durmaz, Tim A.D. Henderson, Douglas Brubaker, Gurkan Bebek

CERNA SEARCH METHOD IDENTIFIED A MET-ACTIVATED SUBGROUP AMONG EGFR DNA AMPLIFIED LUNG ADENOCARCINOMA PATIENTS ... 53
Halla Kabat, Leo Tunkle, Inhan Lee

IMPROVED PERFORMANCE OF GENE SET ANALYSIS ON GENOME-WIDE TRANSCRIPTOMICS DATA WHEN USING GENE ACTIVITY STATE ESTIMATES ... 54
Thomas Kamp, Micah Adams, Craig Disselkoen, Nathan Tintle

METHYLDMV: SIMULTANEOUS DETECTION OF DIFFERENTIAL DNA METHYLATION AND VARIABILITY WITH CONFOUNDER ADJUSTMENT ... 55
Pei Fen Kuan, Junyan Song, Shuyao He

IDENTIFY CANCER DRIVER GENES THROUGH SHARED MENDELIAN DISEASE PATHOGENIC VARIANTS AND CANCER SOMATIC MUTATIONS ... 56
Meng Ma, Changchang Wang, Benjamin Glicksberg, Eric E. Schadt, Shuyu Li, Rong Chen

IDENTIFYING CANCER SPECIFIC METABOLIC SIGNATURES USING CONSTRAINT-BASED MODELS ... 57
André Schultz, Sanket Mehta, Chenyue W. Hu, Fieke W. Hoff, Terzah M. Horton, Steven M. Kornblau, Amina A. Qutub

SINGLE-CELL ANALYSIS AND MODELLING OF CELL POPULATION HETEROGENEITY 58

MAPPING NEURONAL CELL TYPES USING INTEGRATIVE MULTI-SPECIES MODELING OF HUMAN AND MOUSE SINGLE CELL RNA SEQUENCING ... 59
Travis Johnson, Zachary Abrams, Yan Zhang, Kun Huang

A SPATIOTEMPORAL MODEL TO SIMULATE CHEMOTHERAPY REGIMENS FOR HETEROGENEOUS BLADDER CANCER METASTASES TO THE LUNG ... 60
Kimberly R. Kanigel Winner, James C. Costello

SCALABLE VISUALIZATION FOR HIGH-DIMENSIONAL SINGLE-CELL DATA ... 61
Juho Kim, Nate Russell, Jian Peng

POSTER PRESENTATIONS

COMPUTATIONAL APPROACHES TO UNDERSTANDING THE EVOLUTION OF MOLECULAR FUNCTION 62

CLUSTER-BASED GENOTYPE-ENVIRONMENT-PHENOTYPE CORRELATION ALGORITHM ... 63
Ernesto Borrayo, Ryoko Machida-Hirano

QUANTITATING TRANSLATIONAL CONTROL: MRNA ABUNDANCE-DEPENDENT AND INDEPENDENT CONTRIBUTIONS ... 64
Jingyi Jessica Li, Guo-Liang Chew, Mark D. Biggin

PROSNET: INTEGRATING HOMOLOGY WITH MOLECULAR NETWORKS FOR PROTEIN FUNCTION PREDICTION ... 65
Sheng Wang, Meng Qu, Jian Peng

GENERAL 66

IDENTIFICATION OF DIFFERENTIALLY PHOSPHORYLATED MODULES IN PROTEIN INTERACTION NETWORKS ... 67
Marzieh Ayati, Danica Wiredja, Daniela Schlatzer, Goutham Narla, Mark Chance, Mehmet Koyutürk

CLUSTERING METHOD FOR PRIORITIZING BREAST CANCER RISK GENES AND MIRNAS ... 68
Yongsheng Bai, Naureen Aslam, Ali Salman

FUSIONDB: ASSESSING MICROBIAL DIVERSITY AND ENVIRONMENTAL PREFERENCES VIA FUNCTIONAL SIMILARITY ... 69
Chengsheng Zhu, Yannick Mahlich, Yana Bromberg

THE GEORGE M. O’BRIEN KIDNEY TRANSLATIONAL CORE CENTER AT THE UNIVERSITY OF MICHIGAN ... 70
Frank C. Brosius, Wenjun Ju, Keith Bellovich, Zeenat Bhat, Crystal Gadegbeku, Debbie Gipson, Jennifer Hawkins, Julia Herzog, Susan Massengill, Richard C. McEachin, Subramaniam Pennathur, Kalyani Perumal, Roger Wiggins, Matthias Kretzler

MINING DIRECTIONAL DRUG INTERACTION EFFECTS ON MYOPATHY USING THE FAERS DATABASE ... 71
Danai Chasioti, Xiaohui Yao, Pengyue Zhang, Xia Ning, Lang Li, Li Shen

DECIPHERING NEURONAL BROAD HISTONE H3K4ME3 DOMAINS ASSOCIATED WITH GENE-REGULATORY NETWORKS AND CONSERVED EPIGENOMIC LANDSCAPES IN THE HUMAN BRAIN ... 72
Aslihan Dincer, Eric E. Schadt, Bin Zhang, Joel T. Dudley, Davin Gavin, Schahram Akbarian

NORMALIZATION TECHNIQUES AND MACHINE LEARNING CLASSIFICATION FOR ASSIGNING MOLECULAR SUBSETS IN AUTOIMMUNE DISEASE AND CANCER ... 73
Jennifer M. Franks, Guoshuai Cai, Jaclyn N. Taroni, Michael L. Whitfield

MULTI-OMICS DATA INTEGRATION TO STRATIFY POPULATION IN HEPATOCELLULAR CARCINOMA ... 74
Kumardeep Chaudhary, Olivier Poirion, Liangqun Lu, Lana Garmire

TOWARDS STANDARDS-BASED CLINICAL DATA WEB APPLICATION LEVERAGING SHINY R AND HL7 FHIR ... 75
Na Hong, Naresh Prodduturi, Chen Wang, Guoqian Jiang

A DATA LAKE PLATFORM OF CONTEXTUAL BIOLOGICAL INFORMATION FOR AGILE TRANSLATIONAL RESEARCH ... 76
Austin Huang, Dmitri Bichko, Mathieu Boespflug, Edsko de Vries, Facundo Dominguez, Daniel Ziemek

GENOME READ IN-MEMORY (GRIM) FILTER: FAST LOCATION FILTERING IN DNA READ MAPPING USING EMERGING MEMORY TECHNOLOGIES ... 77
Jeremie Kim, Damla Senol, Hongyi Xin, Donghyuk Lee, Mohammed Alser, Hasan Hassan, Oguz Ergin, Can Alkan, Onur Mutlu

BCL-2 FAMILY MEMBERS AS REGULATORS OF RESPONSIVENESS TO BORTEZOMIB IN A MULTIPLE MYELOMA MODEL ... 78
Melissa E. Ko, Charis Teh, Christopher S. Playter, Eli R. Zunder, Daniel H. Gray, Wendy J. Fantl, Sylvia K. Plevritis, Garry P. Nolan

BIOMEDICAL TEXT-MINING APPLICATIONS FOR THE SYSTEM DEEPDIVE ... 79
Emily K. Mallory, Chris Re, Russ B. Altman

PROFILING ADAPTIVE IMMUNE REPERTOIRES ACROSS MULTIPLE HUMAN TISSUES BY RNA SEQUENCING ... 80
Serghei Mangul, Igor Mandric, Harry Taegyun Yang, Dennis Montoya, Nicolas Strauli, Jeremy Rotman, Benjamin Statz, Will Van Der Wey, Alex Zelikovsky, Roberto Spreafico, Maura Rossetti, Sagiv Shifman, Mark Ansel, Noah Zaitlen, Eleazar Eskin

THE CMH VARIANT WAREHOUSE - A CATALOG OF GENETIC VARIATION IN PATIENTS OF A CHILDREN'S HOSPITAL ... 81
Neil Miller, Greyson Twist, Byunggil Yoo, Andrea Gaedigk

MUTPRED2 AND ITS APPLICATION TO THE INFERENCE OF MOLECULAR SIGNATURES OF DISEASE ... 82
Vikas Pejaver, Lilia M. Iakoucheva, Sean D. Mooney, Predrag Radivojac

HIV-TRACE: MONITORING THE HIV EPIDEMIC IN NEAR REAL TIME USING LARGE NATIONAL AND GLOBAL SCALE MOLECULAR EPIDEMIOLOGY ... 83
Sergei Pond, Steven Weaver, Joel Wertheim, Andrew J. Leigh Brown

THE EXTREME MEMORY® CHALLENGE: A SEARCH FOR THE HERITABLE FOUNDATIONS OF EXCEPTIONAL MEMORY ... 84
Mary A. Pyc, Emily Giron, Philip Cheung, Douglas Fenger, J. Steven de Belle, Tim Tully

RESCUE THE MISSING VARIANTS - LESSONS LEARNED FROM LARGE SEQUENCING PROJECTS ... 85
Yingxue Ren, Joseph S. Reddy, Vivekananda Sarangi, Jason P. Sinnwell, Steve G. Younkin, Nilüfer Ertekin-Taner, Owen A. Ross, Rosa Rademakers, Shannon K. McDonnell, Joanna M. Biernacka, Yan W. Asmann

TOWARD EFFECTIVE MICRORNA QUANTIFICATION FROM SMALL RNA-SEQ ... 86
Pamela Russell, Richard Radcliffe, Brian Vestal, Wen Shi, Pratyaydipta Rudra, Laura Saba, Katerina Kechris

NANOPORE SEQUENCING TECHNOLOGY AND TOOLS: COMPUTATIONAL ANALYSIS OF THE CURRENT STATE, BOTTLENECKS AND FUTURE DIRECTIONS ... 87
Damla Senol, Jeremie Kim, Saugata Ghose, Can Alkan, Onur Mutlu

DETECTING OUTLIERS FROM MULTIDIMENSIONAL DATA WITH APPLICATION IN CANCER ... 88
Kyle Smith, Subhajyoti De, Debashis Gosh

HUEMR: INTUITIVE MINING OF ELECTRONIC MEDICAL RECORDS ... 89
Abiodun Otolorin, Nana Osafo, William Southerland

DECIPHERING LUNG ADENOCARCINOMA MORPHOLOGY AND PROGNOSIS BY INTEGRATING OMICS AND HISTOPATHOLOGY ... 90
Kun-Hsing Yu, Gerald J. Berry, Daniel L. Rubin, Christopher Ré, Russ B. Altman, Michael Snyder

EXPLORING DEEP LEARNING FOR COPY NUMBER VARIATION DETECTION WITH NGS DATA ... 91
Yao-zhong Zhang, Rui Yamaguchi, Seiya Imoto, Satoru Miyano

IMAGING GENOMICS 92

PERIPHERAL EPIGENETIC ASSOCIATIONS WITH BRAIN GRAY MATTER IN SCHIZOPHRENIA ... 93
Dongdong Lin, Vince D. Calhoun, Juan R. Bustillo, Nora Perrone-Bizzozero, Jingyu Liu

THE INTERPLAY BETWEEN OLIGO-TARGET SPECIFIC AND GENOME-WIDE OFF-TARGET INTERACTIONS ... 94
Olga V. Matveeva, Nafisa N. Nazipova, Aleksey Y. Ogurtsov, Svetlana A. Shabalina

PATTERNS IN BIOMEDICAL DATA – HOW DO WE FIND THEM? 95

WARS2 IMPLICATED AS A COMMON MODIFIER OF METFORMIN METABOLITE BIOMARKERS IN A BIOBANK COHORT ... 96
Alyssa I. Clay, Richard M. Weinshilboum, K. Sreekumaran Nair, Rima F. Kaddurah-Daouk, Liewei Wang, Matthew K. Breitenstein

ESTIMATION OF FALSE NEGATIVE RATES VIA EMBEDDING SIMULATED EVENTS ... 97
Stephen V. Gliske, Katy L. Lau, Benjamin H. Brinkman, Greg A. Worrell, Cris G. Fink, William C. Stacey

INTEGRATIVE, INTERPRETABLE DEEP LEARNING FRAMEWORKS FOR REGULATORY GENOMICS AND EPIGENOMICS ... 98
Chuan Sheng Foo, Avanti Shrikumar, Johnny Israeli, Peyton Greenside, Chris Probert, Anna Scherbina, Rahul Mohan, Nathan Boley, Anshul Kundaje

VISUALIZATION OF COMPLEX DISEASES AND RELATED GENE SETS ... 99
Modest von Korff, Tobias Fink, Thomas Sander

PRECISION MEDICINE: FROM GENOTYPES AND MOLECULAR PHENOTYPES TOWARDS IMPROVED HEALTH AND THERAPIES 100

FINDINGS FROM THE FOURTH CRITICAL ASSESSMENT OF GENOME INTERPRETATION, A COMMUNITY EXPERIMENT TO EVALUATE PHENOTYPE PREDICTION ... 101
Steven E. Brenner, Gaia Andreoletti, Roger A. Hoskins, John Moult, CAGI Participants

ASTROLABE: EXPANSION TO CYP2C9 AND CYP2C1 ... 102
Andrea Gaedigk, Greyson P. Twist, Sarah Soden, Emily G. Farrow, Neil A. Miller

HUMAN KINASES DISPLAY MUTATIONAL HOTSPOTS AT COGNATE POSITIONS WITHIN CANCER ... 103
Jonathan Gallion, Angela D. Wilkins, Olivier Lichtarge

SCOTCH: A NOVEL METHOD TO DETECT INSERTIONS AND DELETIONS FROM NGS DATA ... 104
Rachel Goldfeder, Euan Ashley

MAYO OMICS REPOSITORY FOR TRANSLATIONAL MEDICINE ... 105
Iain Horton, Jeanette Eckel-Passow, Steven Hart, Shannon McDonnell, David Mead, Gay Reed, Greg Dougherty, Jason Ross, Julie Swank, Mark Myers, Mathieu Wiepert, Rama Volety, Tony Stai, Yaxiong Lin, Robert Freimuth

PHARMACOGENOMICS CLINICAL ANNOTATION TOOL (PHARMCAT) ... 106
T.E. Klein, M. Whirl-Carrillo, R.M. Whaley, M. Woon, K. Sangkuhl, Lester G. Carter, H.M. Dunnenberger, P.E. Empey, A.T. Frase, R.R. Freimuth, A. Gaedigk, A. Gordon, C. Haidar, J.K. Hicks, J.M. Hoffman, M.T. Lee, N. Miller, S.D. Mooney, T.N. Person, J.F. Peterson, M.V. Relling, S.A. Scott, G. Twist, A. Verma, M.S. Williams, C. Wu, W. Yang, M.D. Ritchie

PCSK9 MODULATING VARIANTS IN FAMILIAL HYPERCHOLESTEROLEMIA ... 107
Sarathbabu Krishnamurthy, Diane Smelser, Manickam Kandamurugu, Joseph Leader, Noura S. Abul-Husn, Alan R. Shuldiner, David H. Ledbetter, Frederick E. Dewey, David J. Carey, Michael F. Murray, Raghu P.R. Metpally

INTEGRATIVE NETWORK ANALYSIS OF PROSTATE TISSUE LINCRNA-MRNA EXPRESSION PROFILES REVEALS POTENTIAL REGULATORY MECHANISMS OF PROSTATE CANCER RISK LOCI ... 108
Nicholas B. Larson, Shannon McDonnell, Zach Fogarty, Melissa Larson, John Cheville, Shaun Riska, Saurabh Baheti, Asha A. Nair, Daniel O’Brien, Jaime Davila, Daniel Schaid, Stephen N. Thibodeau

INTEGRATED ANALYSIS OF GENOMICS, PROTEOMICS, AND PHOSPHOPROTEOMICS IN CELLS AND TUMOR SAMPLES ... 109
Jason E. McDermott, Tao Liu, Samuel Payne, Vladislav Petyuk, Richard Smith, Philipp Mertins, Steven Carr, Karin Rodland

NETDX: PATIENT CLASSIFICATION USING INTEGRATED PATIENT SIMILARITY NETWORKS ... 110
Shraddha Pai, Shirley Hui, Ruth Isserlin, Hussam Kaka, Gary D. Bader

PREVALENCE AND DETECTION OF LOW-ALLELE-FRACTION VARIANTS IN CLINICAL CANCER SAMPLES ... 111
Hyun-Tae Shin, Jae Won Yun, Nayoung K.D. Kim, Yoon-La Choi, Woong-Yang Park, Peter J. Park

A METHYLATION-TO-EXPRESSION FEATURE MODEL FOR GENERATING ACCURATE PROGNOSTIC RISK SCORES AND IDENTIFYING DISEASE TARGETS ... 112
Jeffrey A. Thompson, Carmen J. Marsit

CYP2D6 DIPLOTYPE CALLING FROM WGS USING ASTROLABE: UPDATE ... 113
Andrea Gaedigk, Greyson P. Twist, Sarah Soden, Emily G. Farrow, Neil A. Miller

INTEGRATION, INTERPRETATION AND DISPLAY OF MULTI-OMIC DATA FOR PRECISION MEDICINE ... 114
David S. Wishart, Ana Marcu, An Chi Guo, Ash Anwar, Solveig Johannessen, Craig Knox, Michael Wilson, Christoph H. Borchers, Pieter Cullis, Robert Fraser

BIOTHINGS APIS: LINKED HIGH-PERFORMANCE APIS FOR BIOLOGICAL ENTITIES ... 115
Jiwen Xin, Cyrus Afrasiabi, Sebastien Lelong, Ginger Tsueng, Sean D. Mooney, Andrew I. Su, Chunlei Wu

SINGLE-CELL ANALYSIS AND MODELLING OF CELL POPULATION HETEROGENEITY 116

SINGLE CELL SIGNALING STATES REVEAL INDUCTION OF NON-GENETIC VARIATION IN RESISTANCE TO TRAIL-INDUCED APOPTOSIS ... 117
Reema Baskar, Harris Fienberg, Garry Nolan, Sean Bendall

A NOVEL K-NEAREST NEIGHBORS APPROACH TO COMPARE MULTIPLE BIOLOGICAL CONDITIONS IN SINGLE CELL DATA ... 118
Tyler J. Burns, Garry P. Nolan, Nikolay Samusik

SINGLE-CELL RNA SEQUENCING IN PRIMARY GLIOBLASTOMA: IMPROVING ANALYSIS OF HETEROGENEOUS SAMPLES BY INCORPORATING QUANTIFICATION OF UNCERTAINTY ... 119
Wendy Marie Ingram, Debdipto Misra, Nicholas F. Marko, Marylyn Ritchie

REGISTRATION OF FLOW CYTOMETRY DATA USING SWIFT CLUSTER TEMPLATES TO REMOVE CHANNEL-SPECIFIC OR CLUSTER-SPECIFIC VARIATION ... 120
Jonathan A. Rebhahn, Sally A. Quataert, Gaurav Sharma, Tim R. Mosmann

WORKSHOP: NO BOUNDARY THINKING IN BIOINFORMATICS 121

ENABLING RICHER DATA INTEGRATION FOR GENOMIC EPIDEMIOLOGY ... 122
E. Griffiths, D. Dooley, C. Bertelli, J. Adam, F. Bristow, T. Matthews, A. Petkau, M. Courtot, J.A. Carriço, A. Keddy, R. Beiko, L.M. Schriml, E. Taboada, M. Graham, G. Van Domselaar, W. Hsiao, F. Brinkman

AUTHOR INDEX 123

1

COMPUTATIONAL APPROACHES TO UNDERSTANDING THE EVOLUTION OF MOLECULAR FUNCTION

PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS

2

IDENTIFICATION AND ANALYSIS OF BACTERIAL GENOMIC METABOLIC SIGNATURES

Nathan Bowerman1, Nathan Tintle2, Matthew DeJongh3, Aaron A. Best1

1Department of Biology, Hope College; 2Department of Mathematics and Statistics, Dordt College; 3Department of Computer Science, Hope College

Presenting author: Aaron Best

With continued rapid growth in the number and quality of fully sequenced and accurately annotated bacterial genomes, we have unprecedented opportunities to understand metabolic diversity. We selected 101 diverse and representative completely sequenced bacteria and implemented a manual curation effort to identify 846 unique metabolic variants present in these bacteria. The presence or absence of these variants acts as a metabolic signature for each of the bacteria, which can then be used to understand similarities and differences between and across bacterial groups. We propose a novel and robust method of summarizing metabolic diversity using metabolic signatures and use this method to generate a metabolic tree, clustering metabolically similar organisms. Resulting analysis of the metabolic tree confirms strong associations with well-established biological results along with direct insight into particular metabolic variants which are most predictive of metabolic diversity. The positive results of this manual curation effort and novel method development suggest that future work is needed to further expand the set of bacteria to which this approach is applied and use the resulting tree to test broad questions about metabolic diversity and complexity across the bacterial tree of life.
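The idea of a presence/absence metabolic signature can be illustrated with a small sketch. The abstract does not say which distance measure feeds the metabolic tree, so the Jaccard distance used here is an assumption, and the organism names and variant identifiers are hypothetical.

```python
from itertools import combinations


def jaccard_distance(sig_a, sig_b):
    """Distance between two signatures, each a set of variant IDs that are present."""
    union = sig_a | sig_b
    if not union:
        return 0.0  # two empty signatures are treated as identical
    return 1.0 - len(sig_a & sig_b) / len(union)


def pairwise_distances(signatures):
    """Return {(name_a, name_b): distance} for every unordered pair of organisms."""
    return {
        (a, b): jaccard_distance(signatures[a], signatures[b])
        for a, b in combinations(sorted(signatures), 2)
    }


# Hypothetical signatures: which metabolic variants each organism carries.
signatures = {
    "organism_A": {"v1", "v2", "v3"},
    "organism_B": {"v1", "v2", "v4"},
    "organism_C": {"v5"},
}

distances = pairwise_distances(signatures)
for pair, dist in sorted(distances.items()):
    print(pair, dist)
```

A distance table of this kind could then be handed to any hierarchical clustering routine to group metabolically similar organisms.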

3

WHEN SHOULD WE NOT TRANSFER FUNCTIONAL ANNOTATION BETWEEN SEQUENCE PARALOGS?

Mengfei Cao, Lenore J. Cowen

Tufts University

Presenting author: Lenore Cowen

Current automated computational methods to assign functional labels to unstudied genes often involve transferring annotation from orthologous or paralogous genes; however, such genes can evolve divergent functions, making such transfer inappropriate. We consider the problem of determining when it is correct to make such an assignment between paralogs. We construct a benchmark dataset of two types of similar paralogous pairs of genes in the well-studied model organism S. cerevisiae: one set of pairs where single deletion mutants have very similar phenotypes (implying similar functions), and another set of pairs where single deletion mutants have very divergent phenotypes (implying different functions). State-of-the-art methods for this problem will determine the evolutionary history of the paralogs with references to multiple related species. Here, we ask a first and simpler question: we explore to what extent any computational method with access only to data from a single species can solve this problem. We consider divergence data (at both the amino acid and nucleotide levels), and network data (based on the yeast protein-protein interaction network, as captured in BioGRID), and ask if we can extract features from these data that can distinguish between these sets of paralogous gene pairs. We find that the best features come from measures of sequence divergence; however, simple network measures based on degree or centrality or shortest path or diffusion state distance (DSD), or shared neighborhood in the yeast protein-protein interaction (PPI) network also contain some signal. One should, in general, not transfer function if sequence divergence is too high. Further improvements in classification will need to come from more computationally expensive but much more powerful evolutionary methods that incorporate ancestral states and measure evolutionary divergence over multiple species based on evolutionary trees.

4

PROSNET: INTEGRATING HOMOLOGY WITH MOLECULAR NETWORKS FOR PROTEIN FUNCTION PREDICTION

Sheng Wang, Meng Qu, Jian Peng

University of Illinois Urbana-Champaign

Presenting author: Sheng Wang

Automated annotation of protein function has become a critical task in the post-genomic era. Network-based approaches and homology-based approaches have been widely used and recently tested in large-scale community-wide assessment experiments. It is natural to integrate network data with homology information to further improve the predictive performance. However, integrating these two heterogeneous, high-dimensional and noisy datasets is non-trivial. In this work, we introduce a novel protein function prediction algorithm, ProSNet. An integrated heterogeneous network is first built to include molecular networks of multiple species and link together homologous proteins across multiple species. Based on this integrated network, a dimensionality reduction algorithm is introduced to obtain compact low-dimensional vectors to encode proteins in the network. Finally, we develop machine learning classification algorithms that take the vectors as input and make predictions by transferring annotations both within each species and across different species. Extensive experiments on five major species demonstrate that our integration of homology with molecular networks substantially improves the predictive performance over existing approaches.

5

ON THE POWER AND LIMITS OF SEQUENCE SIMILARITY BASED CLUSTERING OF PROTEINS INTO FAMILIES

Christian Wiwie, Richard Röttger

University of Southern Denmark

Presenting author: Richard Röttger

Over the last decades, we have observed an ongoing tremendous growth of available sequencing data fueled by the advancements in wet-lab technology. The sequencing information is only the beginning of the actual understanding of how organisms survive and prosper. It is, for instance, equally important to also unravel the proteomic repertoire of an organism. A classical computational approach for detecting protein families is a sequence-based similarity calculation coupled with a subsequent cluster analysis. In this work we have intensively analyzed various clustering tools on a large scale. We used the data to investigate the behavior of the tools' parameters underlining the diversity of the protein families. Furthermore, we trained regression models for predicting the expected performance of a clustering tool for an unknown dataset and aimed to also suggest optimal parameters in an automated fashion. Our analysis demonstrates the benefits and limitations of the clustering of proteins with low sequence similarity, indicating that each protein family requires its own distinct set of tools and parameters. All results, a tool prediction service, and additional supporting material are also available online at http://proteinclustering.compbio.sdu.dk/

6

IMAGING GENOMICS

PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS

7

INTEGRATIVE ANALYSIS FOR LUNG ADENOCARCINOMA PREDICTS MORPHOLOGICAL FEATURES ASSOCIATED WITH GENETIC VARIATIONS

Chao Wang1, Hai Su2, Lin Yang2, Kun Huang1

1The Ohio State University, 2University of Florida

Presenting author: Kun Huang

Lung cancer is one of the most deadly cancers and lung adenocarcinoma (LUAD) is the most common histological type of lung cancer. However, LUAD is highly heterogeneous due to genetic differences as well as phenotypic differences such as cellular and tissue morphology. In this paper, we systematically examine the relationships between histological features and gene transcription. Specifically, we calculated 283 morphological features from histology images for 201 LUAD patients from the TCGA project and identified the morphological feature with strong correlation with patient outcome. We then modeled the morphology feature using multiple co-expressed gene clusters using Lasso regression. Many of the gene clusters are highly associated with genetic variations, specifically DNA copy number variations, implying that genetic variations play important roles in the development of cancer morphology. As far as we know, our finding is the first to directly link the genetic variations and functional genomics to LUAD histology. These observations will lead to new insight on lung cancer development and potential new integrative biomarkers for prediction of patient prognosis and response to treatments.

8

IDENTIFICATION OF DISCRIMINATIVE IMAGING PROTEOMICS ASSOCIATIONS IN ALZHEIMER'S DISEASE VIA A NOVEL SPARSE CORRELATION MODEL

Jingwen Yan, Shannon L. Risacher, Kwangsik Nho, Andrew J. Saykin, Li Shen

Indiana University

Presenting author: Jingwen Yan

Brain imaging and protein expression, from both cerebrospinal fluid and blood plasma, have been found to provide complementary information in predicting the clinical outcomes of Alzheimer's disease (AD). But the underlying associations that contribute to such a complementary relationship have not been previously studied. In this work, we will perform an imaging proteomics association analysis to explore how they are related to each other. While traditional association models, such as Sparse Canonical Correlation Analysis (SCCA), cannot guarantee the selection of only disease-relevant biomarkers and associations, we propose a novel discriminative SCCA (denoted as DSCCA) model with new penalty terms to account for the disease status information. Given brain imaging, proteomic and diagnostic data, the proposed model can perform a joint association and multi-class discrimination analysis, such that we can not only identify disease-relevant multimodal biomarkers, but also reveal strong associations between them. Based on a real imaging proteomic dataset, the empirical results show that DSCCA and traditional SCCA have comparable association performances. But in a further classification analysis, canonical variables of imaging and proteomic data obtained in DSCCA demonstrate much more discrimination power toward multiple pairs of diagnosis groups than those obtained in SCCA.

9

ENFORCING CO-EXPRESSION IN MULTIMODAL REGRESSION FRAMEWORK

Pascal Zille1, Vince D. Calhoun2, Yu-Ping Wang1

1Tulane University, 2University of New Mexico

Presenting author: Pascal Zille

We consider the problem of multimodal data integration for the study of complex neurological diseases (e.g. schizophrenia). Among the challenges arising in such a situation, estimating the link between genetic and neurological variability within a population sample has been a promising direction. A wide variety of statistical models arose from such applications. For example, Lasso regression and its multitask extension are often used to fit a multivariate linear relationship between given phenotype(s) and associated observations. Other approaches, such as canonical correlation analysis (CCA), are widely used to extract relationships between sets of variables from different modalities. In this paper, we propose an exploratory multivariate method combining these two methods. More specifically, we rely on a 'CCA-type' formulation in order to regularize the classical multimodal Lasso regression problem. The underlying motivation is to extract discriminative variables that are also co-expressed across modalities. We first evaluate the method on a simulated dataset, and further validate it using Single Nucleotide Polymorphisms (SNP) and functional Magnetic Resonance Imaging (fMRI) data for the study of schizophrenia.

10

METHODS TO ENSURE THE REPRODUCIBILITY OF BIOMEDICAL RESEARCH

PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS

11

EXPLORING THE REPRODUCIBILITY OF PROBABILISTIC CAUSAL MOLECULAR NETWORK MODELS

Ariella Cohain, Aparna A. Divaraniya, Kuixi Zhu, Joseph R. Scarpa, Andrew Kasarskis, Jun Zhu, Rui Chang, Joel T. Dudley, Eric E. Schadt

Icahn Institute and Department of Genetics and Genomics, Icahn School of Medicine at Mount Sinai

Presenting author: Ariella Cohain

Network reconstruction algorithms are increasingly being employed in biomedical and life sciences research to integrate large-scale, high-dimensional data informing on living systems. One particular class of probabilistic causal networks being applied to model the complexity and causal structure of biological data is Bayesian networks (BNs). BNs provide an elegant mathematical framework for not only inferring causal relationships among many different molecular and higher order phenotypes, but also for incorporating highly diverse priors that provide an efficient path for incorporating existing knowledge. While significant methodological developments have broadly enabled the application of BNs to generate and validate meaningful biological hypotheses, the reproducibility of BNs in this context has not been systematically explored. In this study, we aim to determine the criteria for generating reproducible BNs in the context of transcription-based regulatory networks. We utilize two unique tissues from independent datasets, whole blood from the GTEx Consortium and liver from the Stockholm-Tartu Atherosclerosis Reverse Network Engineering Team (STARNET) study. We evaluated the reproducibility of the BNs by creating networks on data subsampled at different levels from each cohort and comparing these networks to the BNs constructed using the complete data. To help validate our results, we used simulated networks at varying sample sizes. Our study indicates that reproducibility of BNs in biological research is an issue worthy of further consideration, especially in light of the many publications that now employ findings from such constructs without appropriate attention paid to reproducibility. We find that while edge-to-edge reproducibility is strongly dependent on sample size, identification of more highly connected key driver nodes in BNs can be carried out with high confidence across a range of sample sizes.
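One way to picture edge-to-edge reproducibility is to compare the directed edges of a network built on subsampled data against those of the network built on the complete data. The abstract does not name the comparison metric, so the precision and recall used below are an assumption, and the node names are hypothetical.

```python
def edge_recovery(full_edges, sub_edges):
    """Compare directed edge sets of a full-data network and a subsampled one.

    Returns (precision, recall):
      precision = share of subsampled edges that also appear in the full network
      recall    = share of full-network edges recovered by the subsampled network
    """
    full = set(full_edges)
    sub = set(sub_edges)
    shared = full & sub
    precision = len(shared) / len(sub) if sub else 0.0
    recall = len(shared) / len(full) if full else 0.0
    return precision, recall


# Hypothetical networks; each edge is a (parent, child) pair.
full_network = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "D")]
subsampled_network = [("A", "B"), ("B", "C"), ("D", "C")]  # last edge is reversed

precision, recall = edge_recovery(full_network, subsampled_network)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Because edges are compared as ordered pairs, an edge whose direction flips in the subsampled network counts as not reproduced.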

12

REPRODUCIBLE DRUG REPURPOSING: WHEN SIMILARITY DOES NOT SUFFICE

Emre Guney

Joint IRB-BSC-CRG Program in Computational Biology - Institute for Research in Biomedicine (IRB) Barcelona

Presenting author: Emre Guney

Repurposing existing drugs for new uses has attracted considerable attention over the past years. To identify potential candidates that could be repositioned for a new indication, many studies make use of chemical, target, and side effect similarity between drugs to train classifiers. Despite promising prediction accuracies of these supervised computational models, their use in practice, such as for rare diseases, is hindered by the assumption that there are already known and similar drugs for a given condition of interest. In this study, using publicly available datasets, we question the prediction accuracies of supervised approaches based on drug similarity when the drugs in the training and the test set are completely disjoint. We first build a Python platform to generate reproducible similarity-based drug repurposing models. Next, we show that, while a simple chemical, target, and side effect similarity based machine learning method can achieve good performance on the benchmark dataset, the prediction performance drops sharply when the drugs in the folds of the cross validation are not overlapping and the similarity information within the training and test sets is used independently. These intriguing results suggest revisiting the assumptions underlying the validation scenarios of similarity-based methods and underline the need for unsupervised approaches to identify novel drug uses inside the unexplored pharmacological space. We make the digital notebook containing the Python code to replicate our analysis that involves the drug repurposing platform based on machine learning models and the proposed disjoint cross fold generation method freely available at github.com/emreg00/repurpose.
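The notion of cross-validation folds with non-overlapping drugs can be sketched as follows. This is only an illustration under assumptions, not the code from the repository mentioned above: the function name, the round-robin assignment, and the drug-disease pairs are all hypothetical.

```python
import random


def disjoint_drug_folds(pairs, n_folds, seed=0):
    """Split (drug, disease) pairs into folds so that no drug spans two folds.

    Each drug is assigned to exactly one fold, and all of its pairs follow it.
    """
    drugs = sorted({drug for drug, _ in pairs})
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(drugs)
    fold_of = {drug: i % n_folds for i, drug in enumerate(drugs)}

    folds = [[] for _ in range(n_folds)]
    for drug, disease in pairs:
        folds[fold_of[drug]].append((drug, disease))
    return folds


# Hypothetical drug-disease pairs.
pairs = [
    ("drugA", "disease1"), ("drugA", "disease2"),
    ("drugB", "disease1"), ("drugC", "disease3"),
    ("drugD", "disease2"), ("drugD", "disease3"),
]

folds = disjoint_drug_folds(pairs, n_folds=2)
for i, fold in enumerate(folds):
    print(i, sorted({drug for drug, _ in fold}))
```

A plain random split of pairs would let the same drug appear in both training and test data, which is the situation the abstract argues inflates performance estimates.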

13

EMPOWERING MULTI-COHORT GENE EXPRESSION ANALYSIS TO INCREASE REPRODUCIBILITY

Winston A. Haynes, Francesco Vallania, Charles Liu, Erika Bongen, Aurelie Tomczak, Marta Andres-Terrè, Shane Lofgren, Andrew Tam, Cole A. Deisseroth, Matthew D. Li, Timothy E. Sweeney, Purvesh Khatri

Stanford University

Presenting author: Winston Haynes

A major contributor to the scientific reproducibility crisis has been that the results from homogeneous, single-center studies do not generalize to heterogeneous, real world populations. Multi-cohort gene expression analysis has helped to increase reproducibility by aggregating data from diverse populations into a single analysis. To make the multi-cohort analysis process more feasible, we have assembled an analysis pipeline which implements rigorously studied meta-analysis best practices. We have compiled and made publicly available the results of our own multi-cohort gene expression analysis of 103 diseases, spanning 615 studies and 36,915 samples, through a novel and interactive web application. As a result, we have made both the process of and the results from multi-cohort gene expression analysis more approachable for non-technical users.

14

RABIX: AN OPEN-SOURCE WORKFLOW EXECUTOR SUPPORTING RECOMPUTABILITY AND INTEROPERABILITY OF WORKFLOW DESCRIPTIONS

Gaurav Kaushik, Sinisa Ivkovic, Janko Simonovic, Nebojsa Tijanic, Brandi Davis-Dusenbery, Deniz Kural

Seven Bridges Genomics

Presenting author: Gaurav Kaushik

As biomedical data has become increasingly easy to generate in large quantities, the methods used to analyze it have proliferated rapidly. Reproducible and reusable methods are required to learn from large volumes of data reliably. To address this issue, numerous groups have developed workflow specifications or execution engines, which provide a framework with which to perform a sequence of analyses. One such specification is the Common Workflow Language, an emerging standard which provides a robust and flexible framework for describing data analysis tools and workflows. In addition, reproducibility can be furthered by executors or workflow engines which interpret the specification and enable additional features, such as error logging, file organization, optimizations to computation and job scheduling, and allow for easy computing on large volumes of data. To this end, we have developed the Rabix Executor, an open-source workflow engine for the purposes of improving reproducibility through reusability and interoperability of workflow descriptions.

15

DATA SHARING AND CLINICAL GENETIC TESTING: SUCCESSES AND CHALLENGES

Shan Yang1, Melissa Cline2, Can Zhang2, Benedict Paten2, Stephen E. Lincoln1

1Invitae, 2University of California Santa Cruz

Presenting author: Stephen Lincoln

Open sharing of clinical genetic data promises to both monitor and eventually improve the reproducibility of variant interpretation among clinical testing laboratories. A significant public data resource has been developed by the NIH ClinVar initiative, which includes submissions from hundreds of laboratories and clinics worldwide. We analyzed a subset of ClinVar data focused on specific clinical areas and we find high reproducibility (>90% concordance) among labs, although challenges for the community are clearly identified in this dataset. We further review results for the commonly tested BRCA1 and BRCA2 genes, which show even higher concordance, although the significant fragmentation of data into different silos presents an ongoing challenge now being addressed by the BRCA Exchange. We encourage all laboratories and clinics to contribute to these important resources.

16

PATTERNS IN BIOMEDICAL DATA – HOW DO WE FIND THEM?

PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS

17

LEARNING ATTRIBUTES OF DISEASE PROGRESSION FROM TRAJECTORIES OF SPARSE LAB VALUES

Vibhu Agarwal1, Nigam H. Shah2

1Biomedical Informatics Training Program, Stanford University, 2The Center for Biomedical Informatics Research, Stanford University

Presenting author: Vibhu Agarwal

There is heterogeneity in the manifestation of diseases; therefore, it is essential to understand the patterns of progression of a disease in a given population for disease management as well as for clinical research. Disease status is often summarized by repeated recordings of one or more physiological measures. As a result, historical values of these physiological measures for a population sample can be used to characterize disease progression patterns. We use a method for clustering sparse functional data for identifying sub-groups within a cohort of patients with chronic kidney disease (CKD), based on the trajectories of their Creatinine measurements. We demonstrate through a proof-of-principle study how the two sub-groups that display distinct patterns of disease progression may be compared on clinical attributes that correspond to the maximum difference in progression patterns. The key attributes that distinguish the two sub-groups appear to have support in published literature and clinical practice related to CKD.

18

COMPUTER AIDED IMAGE SEGMENTATION AND CLASSIFICATION FOR VIABLE AND NON-VIABLE TUMOR IDENTIFICATION IN OSTEOSARCOMA

Harish Babu Arunachalam1, Rashika Mishra1, Bogdan Armaselu1, Ovidiu Daescu1, Maria Martinez1, Patrick Leavey1, Dinesh Rakheja2, Kevin Cederberg2, Anita Sengupta2, Molly Ni'Suilleabhain2

1University of Texas at Dallas, 2University of Texas Southwestern Medical Center

Presenting author: Harish Babu Arunachalam

Osteosarcoma is one of the most common types of bone cancer in children. To gauge the extent of cancer treatment response in the patient after surgical resection, the H&E stained image slides are manually evaluated by pathologists to estimate the percentage of necrosis, a time consuming process prone to observer bias and inaccuracy. Digital image analysis is a potential method to automate this process, thus saving time and providing a more accurate evaluation. The slides are scanned in Aperio Scanscope, converted to digital Whole Slide Images (WSIs) and stored in SVS format. These are high resolution images, of the order of 10^9 pixels, allowing up to 40X magnification factor. This paper proposes an image segmentation and analysis technique for segmenting tumor and non-tumor regions in histopathological WSIs of osteosarcoma datasets. Our approach is a combination of pixel-based and object-based methods which utilize tumor properties such as nuclei cluster, density, and circularity to classify tumor regions as viable and non-viable. A K-Means clustering technique is used for tumor isolation using color normalization, followed by a multi-threshold Otsu segmentation technique to further classify tumor regions as viable and non-viable. Then a Flood-fill algorithm is applied to cluster similar pixels into cellular objects and compute cluster data for further analysis of regions under study. To the best of our knowledge this is the first comprehensive solution that is able to produce such a classification for Osteosarcoma cancer. The results are very conclusive in identifying viable and non-viable tumor regions. In our experiments, the accuracy of the discussed approach is 100% in viable tumor and coagulative necrosis identification while it is around 90% for fibrosis and acellular/hypocellular tumor osteoid, for all the sampled datasets used. We expect the developed software to lead to a significant increase in accuracy and decrease in inter-observer variability in assessment of necrosis by the pathologists and a reduction in the time spent by the pathologists in such assessments.
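For readers unfamiliar with Otsu thresholding, a minimal single-threshold version is sketched below. The abstract describes a multi-threshold variant applied to normalized whole slide images, so this is a simplification under stated assumptions, and the eight-bin histogram is hypothetical.

```python
def otsu_threshold(histogram):
    """Return the bin index t that maximizes between-class variance.

    Pixels with intensity <= t form one class, the rest form the other.
    """
    total = sum(histogram)
    weighted_sum_all = sum(i * count for i, count in enumerate(histogram))

    best_t, best_score = 0, -1.0
    weight_low = 0          # number of pixels in the lower class
    weighted_sum_low = 0    # sum of intensities in the lower class

    for t, count in enumerate(histogram):
        weight_low += count
        weighted_sum_low += t * count
        weight_high = total - weight_low
        if weight_low == 0:
            continue        # lower class still empty
        if weight_high == 0:
            break           # upper class empty from here on

        mean_low = weighted_sum_low / weight_low
        mean_high = (weighted_sum_all - weighted_sum_low) / weight_high
        # Between-class variance, up to a constant factor of 1 / total**2.
        score = weight_low * weight_high * (mean_low - mean_high) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t


# Hypothetical bimodal histogram: five pixels at intensity 1, five at intensity 6.
threshold = otsu_threshold([0, 5, 0, 0, 0, 0, 5, 0])
print(threshold)
```

When several thresholds give the same score, as they do between the two peaks here, this sketch keeps the lowest one.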

19

MISSING DATA IMPUTATION IN THE ELECTRONIC HEALTH RECORD USING DEEPLY LEARNED AUTOENCODERS

Brett K. Beaulieu-Jones1, Jason H. Moore2, The Pooled Resource Open-Access ALS Clinical Trials Consortium

1Genomics and Computational Biology Graduate Group, Computational Genetics Lab, Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania; 2Computational Genetics Lab, Institute for Biomedical Informatics, University of Pennsylvania

Presenting author: Brett Beaulieu-Jones

Electronic health records (EHRs) have become a vital source of patient outcome data but the widespread prevalence of missing data presents a major challenge. Different causes of missing data in the EHR data may introduce unintentional bias. Here, we compare the effectiveness of popular multiple imputation strategies with a deeply learned autoencoder using the Pooled Resource Open-Access ALS Clinical Trials Database (PRO-ACT). To evaluate performance, we examined imputation accuracy for known values simulated to be either missing completely at random or missing not at random. We also compared ALS disease progression prediction across different imputation models. Autoencoders showed strong performance for imputation accuracy and contributed to the strongest disease progression predictor. Finally, we show that despite clinical heterogeneity, ALS disease progression appears homogeneous with time from onset being the most important predictor.

20

DEVELOPMENT AND PERFORMANCE OF TEXT-MINING ALGORITHMS TO EXTRACT SOCIOECONOMIC STATUS FROM DE-IDENTIFIED ELECTRONIC HEALTH RECORDS

Brittany M. Hollister1, Nicole A. Restrepo2, Eric Farber-Eger3, Dana C. Crawford2, Melinda C. Aldrich4, Amy Non5

1Vanderbilt Genetic Institute, Vanderbilt University; 2Institute for Computational Biology and Department of Epidemiology and Biostatistics, Case Western Reserve University; 3Vanderbilt Institute for Clinical and Translational Research, Vanderbilt University; 4Department of Thoracic Surgery and Division of Epidemiology, Vanderbilt University Medical Center; 5Department of Anthropology, University of California San Diego

Presenting author: Brittany Hollister

Socioeconomic status (SES) is a fundamental contributor to health, and a key factor underlying racial disparities in disease. However, SES data are rarely included in genetic studies due in part to the difficulty of collecting these data when studies were not originally designed for that purpose. The emergence of large clinic-based biobanks linked to electronic health records (EHRs) provides research access to large patient populations with longitudinal phenotype data captured in structured fields as billing codes, procedure codes, and prescriptions. SES data, however, are often not explicitly recorded in structured fields, but rather recorded in the free text of clinical notes and communications. The content and completeness of these data vary widely by practitioner. To enable gene-environment studies that consider SES as an exposure, we sought to extract SES variables from racial/ethnic minority adult patients (n=9,977) in BioVU, the Vanderbilt University Medical Center biorepository linked to de-identified EHRs. We developed several measures of SES using information available within the de-identified EHR, including broad categories of occupation, education, insurance status, and homelessness. Two hundred patients were randomly selected for manual review to develop a set of seven algorithms for extracting SES information from de-identified EHRs. The algorithms consist of 15 categories of information, with 830 unique search terms. SES data extracted from manual review of 50 randomly selected records were compared to data produced by the algorithm, resulting in positive predictive values of 80.0% (education), 85.4% (occupation), 87.5% (unemployment), 63.6% (retirement), 23.1% (uninsured), 81.8% (Medicaid), and 33.3% (homelessness), suggesting some categories of SES data are easier to extract in this EHR than others. The SES data extraction approach developed here will enable future EHR-based genetic studies to integrate SES information into statistical analyses. Ultimately, incorporation of measures of SES into genetic studies will help elucidate the impact of the social environment on disease risk and outcomes.
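The positive predictive values quoted above follow the standard definition: confirmed hits divided by all hits flagged by the algorithm. The abstract reports only percentages, so the counts in this short sketch are hypothetical and chosen purely to show the arithmetic.

```python
def positive_predictive_value(true_positives, false_positives):
    """PPV = TP / (TP + FP): the share of algorithm hits confirmed by manual review."""
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("PPV is undefined when the algorithm flags nothing")
    return true_positives / flagged


# Hypothetical counts: 8 of 10 flagged records confirmed by manual review.
ppv = positive_predictive_value(true_positives=8, false_positives=2)
print(f"{ppv:.1%}")
```

A low PPV, such as the values reported for the uninsured and homelessness categories, means most records flagged by the search terms did not hold up under manual review.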

21

DEMO DASHBOARD: VISUALIZING AND UNDERSTANDING GENOMIC SEQUENCES USING DEEP NEURAL NETWORKS

Jack Lanchantin, Ritambhara Singh, Beilun Wang, Yanjun Qi

University of Virginia

Presenting author: Jack Lanchantin

Deep neural network (DNN) models have recently obtained state-of-the-art prediction accuracy for the transcription factor binding site (TFBS) classification task. However, it remains unclear how these approaches identify meaningful DNA sequence signals and give insights as to why TFs bind to certain locations. In this paper, we propose a toolkit called the Deep Motif Dashboard (DeMo Dashboard) which provides a suite of visualization strategies to extract motifs, or sequence patterns, from deep neural network models for TFBS classification. We demonstrate how to visualize and understand three important DNN models: convolutional, recurrent, and convolutional-recurrent networks. Our first visualization method is finding a test sequence's saliency map, which uses first-order derivatives to describe the importance of each nucleotide in making the final prediction. Second, considering recurrent models make predictions in a temporal manner (from one end of a TFBS sequence to the other), we introduce temporal output scores, indicating the prediction score of a model over time for a sequential input. Lastly, a class-specific visualization strategy finds the optimal input sequence for a given TFBS positive class via stochastic gradient optimization. Our experimental results indicate that a convolutional-recurrent architecture performs the best among the three architectures. The visualization techniques indicate that CNN-RNN makes predictions by modeling both motifs as well as dependencies among them.

22

PREDICTIVE MODELING OF HOSPITAL READMISSION RATES USING ELECTRONIC MEDICAL RECORD-WIDE MACHINE LEARNING: A CASE-STUDY USING MOUNT SINAI HEART FAILURE COHORT

Khader Shameer1,2, Kipp W. Johnson1,2, Alexandre Yahi7, Riccardo Miotto1,2, Li Li1,2, Doran Ricks3, Jebakumar Jebakaran4, Patricia Kovatch1,4, Partho P. Sengupta5, Annetine Gelijns8, Alan Moskovitz8, Bruce Darrow5, David L. Reich6, Andrew Kasarskis1, Nicholas P. Tatonetti7, Sean Pinney5, Joel T. Dudley1,2,8*

1Department of Genetics and Genomics, Icahn Institute of Genomics and Multiscale Biology; 2Institute of Next Generation Healthcare, Mount Sinai Health System, NY; 3Decision Support, Mount Sinai Health System, NY; 4Mount Sinai Data Warehouse, Icahn Institute of Genomics and Multiscale Biology, NY; 5Zena and Michael A. Wiener Cardiovascular Institute, Icahn School of Medicine at Mount Sinai, NY; 6Department of Anesthesiology, Icahn School of Medicine at Mount Sinai, NY; 7Departments of Biomedical Informatics, Systems Biology and Medicine, Columbia University Medical Center, NY; 8Population Health Science and Policy, Mount Sinai Health System, NY

*Corresponding Author, Email: joel.dudley@mssm.edu

Presenting author: Khader Shameer

Reduction of preventable hospital readmissions that result from chronic or acute conditions like stroke, heart failure, myocardial infarction and pneumonia remains a significant challenge for improving the outcomes and decreasing the cost of healthcare delivery in the United States. Patient readmission rates are relatively high for conditions like heart failure (HF) despite the implementation of high-quality healthcare delivery operation guidelines created by regulatory authorities. Multiple predictive models are currently available to evaluate potential 30-day readmission rates of patients. Most of these models are hypothesis driven and repetitively assess the predictive abilities of the same set of biomarkers as predictive features. In this manuscript, we discuss our attempt to develop a data-driven, electronic-medical record-wide (EMR-wide) feature selection approach and subsequent machine learning to predict readmission probabilities. We have assessed a large repertoire of variables from electronic medical records of heart failure patients in a single center. The cohort included 1,068 patients, of whom 178 were readmitted within a 30-day interval (16.66% readmission rate). A total of 4,205 variables were extracted from EMR including diagnosis codes (n=1,763), medications (n=1,028), laboratory measurements (n=846), surgical procedures (n=564) and vital signs (n=4). We designed a multistep modeling strategy using the Naïve Bayes algorithm. In the first step, we created individual models to classify the cases (readmitted) and controls (non-readmitted). In the second step, features contributing to predictive risk from independent models were combined into a composite model using a correlation-based feature selection (CFS) method. All models were trained and tested using a 5-fold cross-validation method, with 70% of the cohort used for training and the remaining 30% for testing. Compared to existing predictive models for HF readmission rates (AUCs in the range of 0.6-0.7), results from our EMR-wide predictive model (AUC=0.78; Accuracy=83.19%) and phenome-wide feature selection strategies are encouraging and reveal the utility of such data-driven machine learning. Fine tuning of the model, replication using multi-center cohorts and prospective clinical trial to evaluate the clinical utility would help the adoption of the model as a clinical decision system for evaluating readmission status.

23

METHODS FOR CLUSTERING TIME SERIES DATA ACQUIRED FROM MOBILE HEALTH APPS

Nicole Tignor1, Pei Wang1, Nicholas Genes1, Linda Rogers1, Steven G. Hershman2, Erick R. Scott1, Micol Zweig1, Yu-Feng Yvonne Chan1, Eric E. Schadt1

1Icahn School of Medicine at Mount Sinai, 2LifeMap Solutions

Nicole Tignor

In our recent Asthma Mobile Health Study (AMHS), thousands of asthma patients across the country contributed medical data through the iPhone Asthma Health App on a daily basis for an extended period of time. The collected data included daily self-reported asthma symptoms, symptom triggers, and real time geographic location information. The AMHS is just one of many studies occurring in the context of now many thousands of mobile health apps aimed at improving wellness and better managing chronic disease conditions, leveraging the passive and active collection of data from mobile, handheld smart devices. The ability to identify patient groups or patterns of symptoms that might predict adverse outcomes such as asthma exacerbations or hospitalizations from these types of large, prospectively collected data sets, would be of significant general interest. However, conventional clustering methods cannot be applied to these types of longitudinally collected data, especially survey data actively collected from app users, given heterogeneous patterns of missing values due to: 1) varying survey response rates among different users, 2) varying survey response rates over time of each user, and 3) non-overlapping periods of enrollment among different users. To handle such complicated missing data structure, we proposed a probability imputation model to infer missing data. We also employed a consensus clustering strategy in tandem with the multiple imputation procedure. Through simulation studies under a range of scenarios reflecting real data conditions, we identified favorable performance of the proposed method over other strategies that impute the missing value through low-rank matrix completion. When applying the proposed new method to study asthma triggers and symptoms collected as part of the AMHS, we identified several patient groups with distinct phenotype patterns. Further validation of the methods described in this paper might be used to identify clinically important patterns in large data sets with complicated missing data structure, improving the ability to use such data sets to identify at-risk populations for potential intervention.

24

A NEW RELEVANCE ESTIMATOR FOR THE COMPILATION AND VISUALIZATION OF DISEASE PATTERNS AND POTENTIAL DRUG TARGETS

Modest von Korff, Tobias Fink, Thomas Sander

Research Information Management, Actelion Pharmaceuticals Ltd.

Modest von Korff

A new computational method is presented to extract disease patterns from heterogeneous and text-based data. For this study, 22 million PubMed records were mined for co-occurrences of gene name synonyms and disease MeSH terms. The resulting publication counts were transferred into a matrix Mdata. In this matrix, a disease was represented by a row and a gene by a column. Each field in the matrix represented the publication count for a co-occurring disease–gene pair. A second matrix with identical dimensions Mrelevance was derived from Mdata. To create Mrelevance the values from Mdata were normalized. The normalized values were multiplied by the column-wise calculated Gini coefficient. This multiplication resulted in a relevance estimator for every gene in relation to a disease. From Mrelevance the similarities between all row vectors were calculated. The resulting similarity matrix Srelevance related 5,000 diseases by the relevance estimators calculated for 15,000 genes. Three diseases were analyzed in detail for the validation of the disease patterns and the relevant genes. Cytoscape was used to visualize and to analyze Mrelevance and Srelevance together with the genes and diseases. Summarizing the results, it can be stated that the relevance estimator introduced here was able to detect valid disease patterns and to identify genes that encoded key proteins and potential targets for drug discovery projects.
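
The matrix steps described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how the counts were normalized, on which values the Gini coefficient was computed, or which similarity measure was used, so row-wise normalization to proportions, a Gini coefficient taken over the normalized columns, and cosine similarity are all assumptions. The function names and the toy counts are hypothetical.

```python
import math


def gini(values):
    """Gini coefficient of non-negative values (0 = evenly spread)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted formula applied to the ascending-sorted values.
    weighted = sum((rank + 1) * x for rank, x in enumerate(xs))
    return (2.0 * weighted) / (n * total) - (n + 1.0) / n


def relevance_matrix(m_data):
    """Normalize each row to proportions (assumed), then scale every
    column by the Gini coefficient of that column."""
    normalized = []
    for row in m_data:
        row_sum = sum(row)
        normalized.append([x / row_sum if row_sum else 0.0 for x in row])
    n_cols = len(m_data[0])
    col_gini = [gini([row[j] for row in normalized]) for j in range(n_cols)]
    return [[row[j] * col_gini[j] for j in range(n_cols)] for row in normalized]


def cosine(u, v):
    """Cosine similarity (assumed similarity measure between row vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity_matrix(m_relevance):
    """Pairwise similarity between all disease rows."""
    return [[cosine(u, v) for v in m_relevance] for u in m_relevance]


# Hypothetical publication counts: rows are diseases, columns are genes.
m_data = [
    [10, 0, 5],
    [8, 1, 5],
    [0, 9, 5],
]
m_relevance = relevance_matrix(m_data)
s_relevance = similarity_matrix(m_relevance)
```

In this toy example the third gene is cited about equally for every disease, so its column has a Gini coefficient near zero and contributes little to the similarity between diseases, which is the intuition behind weighting by a concentration measure.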

25

DISCOVERY OF FUNCTIONAL AND DISEASE PATHWAYS BY COMMUNITY DETECTION IN PROTEIN-PROTEIN INTERACTION NETWORKS

Stephen J. Wilson, Angela D. Wilkins, Chih-Hsu Lin, Rhonald C. Lua, Olivier Lichtarge

Baylor College of Medicine

Stephen Wilson

Advances in cellular, molecular, and disease biology depend on the comprehensive characterization of gene interactions and pathways. Traditionally, these pathways are curated manually, limiting their efficient annotation and, potentially, reinforcing field-specific bias. Here, in order to test objective and automated identification of functionally cooperative genes, we compared a novel algorithm with three established methods to search for communities within gene interaction networks. Communities identified by the novel approach and by one of the established methods overlapped significantly (q<0.1) with control pathways. With respect to disease, these communities were biased to genes with pathogenic variants in ClinVar (p<<0.01), and often genes from the same community were co-expressed, including in breast cancers. The interesting subset of novel communities, defined by poor overlap to control pathways, also contained co-expressed genes, consistent with a possible functional role. This work shows that community detection based on topological features of networks suggests new, biologically meaningful groupings of genes that, in turn, point to health and disease relevant hypotheses.

26

PRECISION MEDICINE: FROM GENOTYPES AND MOLECULAR PHENOTYPES TOWARDS IMPROVED HEALTH AND THERAPIES

PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS

27

OPENING THE DOOR TO THE LARGE SCALE USE OF CLINICAL LAB MEASURES FOR ASSOCIATION TESTING: EXPLORING DIFFERENT METHODS FOR DEFINING PHENOTYPES

Christopher R. Bauer, Daniel Lavage, John Snyder, Joseph Leader, J. Matthew Mahoney, Sarah A. Pendergrass

Geisinger Health System, University of Vermont

Christopher Bauer

The past decade has seen exponential growth in the numbers of sequenced and genotyped individuals and a corresponding increase in our ability to collect and catalogue phenotypic data for use in the clinic. We now face the challenge of integrating these diverse data in new ways that can provide useful diagnostics and precise medical interventions for individual patients. One of the first steps in this process is to accurately map the phenotypic consequences of the genetic variation in human populations. The most common approach for this is the genome wide association study (GWAS). While this technique is relatively simple to implement for a given phenotype, the choice of how to define a phenotype is critical. It is becoming increasingly common for each individual in a GWAS cohort to have a large profile of quantitative measures. The standard approach is to test for associations with one measure at a time; however, there are many justifiable ways to define a set of phenotypes, and the genetic associations that are revealed will vary based on these definitions. Some phenotypes may only show a significant genetic association signal when considered together, such as through principal components analysis (PCA). Combining correlated measures may increase the power to detect association by reducing the noise present in individual variables and reduce the multiple hypothesis testing burden. Here we show that PCA and k-means clustering are two complementary methods for identifying novel genotype-phenotype relationships within a set of quantitative human traits derived from the Geisinger Health System electronic health record (EHR). Using a diverse set of approaches for defining phenotype may yield more insights into the genetic architecture of complex traits and the findings presented here highlight a clear need for further investigation into other methods for defining the most relevant phenotypes in a set of variables. As the data of EHR continue to grow, addressing these issues will become increasingly important in our efforts to use genomic data effectively in medicine.
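
As an illustration of the k-means side of the comparison described in this abstract, the sketch below groups individuals on two quantitative measures using a plain Lloyd's-algorithm k-means. It is not the authors' pipeline: the function names, the choice of the first k points as starting centroids, and the toy measurements are all assumptions made for the example, and real lab values would normally be standardized first.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, n_iter=50):
    """Lloyd's algorithm; returns (labels, centroids)."""
    # Assumed initialization: the first k points are the starting centroids.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                updated.append([sum(col) / len(members) for col in zip(*members)])
            else:
                updated.append(centroids[c])  # keep an empty cluster's centroid
        if updated == centroids:
            break  # converged: centroids no longer move
        centroids = updated
    return labels, centroids


# Hypothetical (measure_1, measure_2) pairs for six individuals.
measures = [
    (1.0, 1.1), (5.0, 5.2), (0.9, 1.0),
    (5.1, 4.9), (1.2, 0.8), (4.8, 5.0),
]
labels, centroids = kmeans(measures, k=2)
```

The resulting cluster label could then serve as a categorical phenotype in an association test, whereas a principal component would serve as a continuous one; that contrast is one way to read the "complementary" claim in the abstract.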

28

TEMPORAL ORDER OF DISEASE PAIRS AFFECTS SUBSEQUENT DISEASE TRAJECTORIES: THE CASE OF DIABETES AND SLEEP APNEA

Mette Beck1, David Westergaard1, Leif Groop2, Soren Brunak1

1Novo Nordisk Foundation Center for Protein Research; 2Lund University Diabetes Centre, Department of Clinical Sciences

Mette Beck

Most studies of disease etiologies focus on one disease only and not the full spectrum of multimorbidities that many patients have. Some disease pairs have shared causal origins, others represent common follow-on diseases, while yet other co-occurring diseases may manifest themselves in random order of appearance. We discuss these different types of disease co-occurrences, and use the two diseases “sleep apnea” and “diabetes” to showcase the approach which otherwise can be applied to any disease pair. We benefit from seven million electronic medical records covering the entire population of Denmark for more than 20 years. Sleep apnea is the most common sleep-related breathing disorder and it has previously been shown to be bidirectionally linked to diabetes, meaning that each disease increases the risk of acquiring the other. We confirm that there is no significant temporal relationship, as approximately half of patients with both diseases are diagnosed with diabetes first. However, we also show that patients diagnosed with diabetes before sleep apnea have a higher disease burden compared to patients diagnosed with sleep apnea before diabetes. The study clearly demonstrates that it is not only the diagnoses in the patient’s disease history that are important, but also the specific order in which these diagnoses are given that matters in terms of outcome. We suggest that this should be considered for patient stratification.

29

HUMAN KINASES DISPLAY MUTATIONAL HOTSPOTS AT COGNATE POSITIONS WITHIN CANCER

Jonathan Gallion, Angela D. Wilkins, Olivier Lichtarge

Baylor College of Medicine

Jonathan Gallion

The discovery of driver genes is a major pursuit of cancer genomics, usually based on observing the same mutation in different patients. But the heterogeneity of cancer pathways plus the high background mutational frequency of tumor cells often cloud the distinction between less frequent drivers and innocent passenger mutations. Here, to overcome these disadvantages, we grouped together mutations from close kinase paralogs under the hypothesis that cognate mutations may functionally favor cancer cells in similar ways. Indeed, we find that kinase paralogs often bear mutations to the same substituted amino acid at the same aligned positions and with a large predicted Evolutionary Action. Functionally, these high Evolutionary Action, non-random mutations affect known kinase motifs, but strikingly, they do so differently among different kinase types and cancers, consistent with differences in selective pressures. Taken together, these results suggest that cancer pathways may flexibly distribute a dependence on a given functional mutation among multiple close kinase paralogs. The recognition of this “mutational delocalization” of cancer drivers among groups of paralogs is a new phenomenon that may help better identify relevant mechanisms and therefore eventually guide personalized therapy.

30

MUSE: A MULTI-LOCUS SAMPLING-BASED EPISTASIS ALGORITHM FOR QUANTITATIVE GENETIC TRAIT PREDICTION

Dan He, Laxmi Parida

IBM Thomas J. Watson Research Center

Dan He

Quantitative genetic trait prediction based on high-density genotyping arrays plays an important role for plant and animal breeding, as well as genetic epidemiology such as complex diseases. The prediction can be very helpful to develop breeding strategies and is crucial to translate the findings in genetics to precision medicine. Epistasis, the phenomenon where the SNPs interact with each other, has been studied extensively in Genome Wide Association Studies (GWAS) but received relatively less attention for quantitative genetic trait prediction. As the number of possible interactions is generally extremely large, even considering pairwise interactions is very challenging. To our knowledge, there is no solid solution yet to utilize epistasis to improve genetic trait prediction. In this work, we studied the multi-locus epistasis problem where the interactions with more than two SNPs are considered. We developed an efficient algorithm MUSE to improve the genetic trait prediction with the help of multi-locus epistasis. MUSE is sampling-based and we proposed a few different sampling strategies. Our experiments on real data showed that MUSE is not only efficient but also effective to improve the genetic trait prediction. MUSE also achieved very significant improvements on a real plant data set as well as a real human data set.

31

DIFFERENTIAL PATHWAY DEPENDENCY DISCOVERY ASSOCIATED WITH DRUG RESPONSE ACROSS CANCER CELL LINES

Gil Speyer1, Divya Mahendra1, Hai J. Tran1, Jeff Kiefer1, Stuart L. Schreiber2, Paul A. Clemons2, Harshil Dhruv1, Michael Berens1, Seungchan Kim1

1The Translational Genomics Research Institute, 2Broad Institute of Harvard and MIT

Seungchan Kim

The effort to personalize treatment plans for cancer patients involves the identification of drug treatments that can effectively target the disease while minimizing the likelihood of adverse reactions. In this study, the gene-expression profile of 810 cancer cell lines and their response data to 368 small molecules from the Cancer Therapeutics Research Portal (CTRP) are analyzed to identify pathways with significant rewiring between genes, or differential gene dependency, between sensitive and non-sensitive cell lines. Identified pathways and their corresponding differential dependency networks are further analyzed to discover essentiality and specificity mediators of cell line response to drugs/compounds. For analysis we use the previously published method EDDY (Evaluation of Differential DependencY). EDDY first constructs likelihood distributions of gene-dependency networks, aided by known gene-gene interaction, for two given conditions, for example, sensitive cell lines vs. non-sensitive cell lines. These sets of networks yield a divergence value between two distributions of network likelihoods that can be assessed for significance using permutation tests. Resulting differential dependency networks were then further analyzed to identify genes, termed mediators, which may play important roles in biological signaling in certain cell lines that are sensitive or non-sensitive to the drugs. Establishing statistical correspondence between compounds and mediators can improve understanding of known gene dependencies associated with drug response while also discovering new dependencies. Millions of compute hours resulted in thousands of these statistical discoveries. EDDY identified 8,811 statistically significant pathways leading to 26,822 compound-pathway-mediator triplets. By incorporating STITCH and STRING databases, we could construct evidence networks for 14,415 compound-pathway-mediator triplets for support. The results of this analysis are presented in a searchable website to aid researchers in studying potential molecular mechanisms underlying cells’ drug response as well as in designing experiments for the purpose of personalized treatment regimens.
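
The permutation-testing step mentioned in this abstract can be illustrated generically. The sketch below is not EDDY: the abstract's statistic is a divergence between two distributions of network likelihoods, which is not reproduced here, so an absolute difference of group means stands in for it. The function names, the toy values, and the add-one correction are assumptions made for the example.

```python
import random


def permutation_p_value(group_a, group_b, statistic, n_permutations=1000, seed=0):
    """Estimate how often a random relabeling of the pooled samples
    yields a statistic at least as large as the observed one."""
    rng = random.Random(seed)
    observed = statistic(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # break the link between sample and condition
        if statistic(pooled[:n_a], pooled[n_a:]) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)


def abs_mean_difference(a, b):
    """Stand-in statistic (assumed); EDDY itself uses a divergence."""
    return abs(sum(a) / len(a) - sum(b) / len(b))


# Hypothetical per-cell-line scores for the two conditions.
sensitive = [2.1, 2.4, 2.2, 2.6, 2.5, 2.3]
non_sensitive = [1.0, 1.2, 0.9, 1.1, 1.3, 1.0]
p_value = permutation_p_value(sensitive, non_sensitive, abs_mean_difference)
```

Because the test only needs a function of two groups, any divergence measure could be passed as `statistic` in place of the stand-in without changing the permutation logic.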

32

A METHYLATION-TO-EXPRESSION FEATURE MODEL FOR GENERATING ACCURATE PROGNOSTIC RISK SCORES AND IDENTIFYING DISEASE TARGETS IN CLEAR CELL KIDNEY CANCER

Jeffrey A. Thompson1, Carmen J. Marsit2

1Dartmouth College, 2Emory University

Jeffrey Thompson

Many researchers now have available multiple high-dimensional molecular and clinical datasets when studying a disease. As we enter this multi-omic era of data analysis, new approaches that combine different levels of data (e.g. at the genomic and epigenomic levels) are required to fully capitalize on this opportunity. In this work, we outline a new approach to multi-omic data integration, which combines molecular and clinical predictors as part of a single analysis to create a prognostic risk score for clear cell renal cell carcinoma. The approach integrates data in multiple ways and yet creates models that are relatively straightforward to interpret and with a high level of performance. Furthermore, the proposed process of data integration captures relationships in the data that represent highly disease-relevant functions.

33

DE NOVO MUTATIONS IN AUTISM IMPLICATE THE SYNAPTIC ELIMINATION NETWORK

Guhan Ram Venkataraman1, Chloe O'Connell1, Fumiko Egawa2, Dorna Kashef-Haghighi1, Dennis Paul Wall1

1Stanford University, 2St. George's University

Fumiko Egawa

Autism has been shown to have a major genetic risk component; the architecture of documented autism in families has been over and over again shown to be passed down for generations. While inherited risk plays an important role in the autistic nature of children, de novo (germline) mutations have also been implicated in autism risk. Here we find that autism de novo variants verified and published in the literature are Bonferroni-significantly enriched in a gene set implicated in synaptic elimination. Additionally, several of the genes in this synaptic elimination set that were enriched in protein-protein interactions (CACNA1C, SHANK2, SYNGAP1, NLGN3, NRXN1, and PTEN) have been previously confirmed as genes that confer risk for the disorder. The results demonstrate that autism-associated de novos are linked to proper synaptic pruning and density, hinting at the etiology of autism and suggesting pathophysiology for downstream correction and treatment.

34

IDENTIFYING GENETIC ASSOCIATIONS WITH VARIABILITY IN METABOLIC HEALTH AND BLOOD COUNT LABORATORY VALUES: DIVING INTO THE QUANTITATIVE TRAITS BY LEVERAGING LONGITUDINAL DATA FROM AN EHR

Shefali S. Verma1, Anastasia M. Lucas1, Daniel R. Lavage1, Joseph B. Leader1, Raghu Metpally2, Sarathbabu Krishnamurthy1, Frederick Dewey1, Ingrid Borecki1, Alexander Lopez3, John Overton3, John Penn3, Jeffrey Reid3, Sarah A. Pendergrass1, Gerda Breitwieser2, Marylyn D. Ritchie1

1Department of Biomedical and Translational Informatics, Geisinger Health System, Danville, PA; 2Department of Functional and Molecular Genomics, Geisinger Health System, Danville, PA; 3Regeneron Genetics Center, Tarrytown, NY

Shefali Setia Verma

A wide range of patient health data is recorded in Electronic Health Records (EHR). This data includes diagnosis, surgical procedures, clinical laboratory measurements, and medication information. Together this information reflects the patient’s medical history. Many studies have efficiently used this data from the EHR to find associations that are clinically relevant, either by utilizing International Classification of Diseases, version 9 (ICD-9) codes or laboratory measurements, or by designing phenotype algorithms to extract case and control status with accuracy from the EHR. Here we developed a strategy to utilize longitudinal quantitative trait data from the EHR at Geisinger Health System focusing on outpatient metabolic and complete blood panel data as a starting point. Comprehensive Metabolic Panel (CMP) as well as Complete Blood Counts (CBC) are parts of routine care and provide a comprehensive picture from high level screening of patients’ overall health and disease. We randomly split our data into two datasets to allow for discovery and replication. We first conducted a genome-wide association study (GWAS) with median values of 25 different clinical laboratory measurements to identify variants from HumanOmniExpressExome beadchip data that are associated with these measurements. We identified 687 variants that associated and replicated with the tested clinical measurements at p<5x10^-08. Since longitudinal data from the EHR provides a record of a patient’s medical history, we utilized this information to further investigate the ICD-9 codes that might be associated with differences in variability of the measurements in the longitudinal dataset. We identified low and high variance patients by looking at changes within their individual longitudinal EHR laboratory results for each of the 25 clinical lab values (thus creating 50 groups – a high variance and a low variance for each lab variable). We then performed a PheWAS analysis with ICD-9 diagnosis codes, separately in the high variance group and the low variance group for each lab variable. We found 717 PheWAS associations that replicated at a p-value less than 0.001. Next, we evaluated the results of this study by comparing the association results between the high and low variance groups. For example, we found 39 SNPs (in multiple genes) associated with ICD-9 250.01 (Type-I diabetes) in patients with high variance of plasma glucose levels, but not in patients with low variance in plasma glucose levels. Another example is the association of 4 SNPs in UMOD with chronic kidney disease in patients with high variance for aspartate aminotransferase (discovery p-value: 8.71x10^-09 and replication p-value: 2.03x10^-06). In general, we see a pattern of many more statistically significant associations from patients with high variance in the quantitative lab variables, in comparison with the low variance group across all of the 25 laboratory measurements. This study is one of the first of its kind to utilize quantitative trait variance from longitudinal laboratory data to find associations among genetic variants and clinical phenotypes obtained from an EHR, integrating laboratory values and diagnosis codes to understand the genetic complexities of common diseases.
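
The per-patient summaries this abstract relies on (a median value for the GWAS, and a high versus low variance grouping for the PheWAS) can be sketched as below. This is an illustration only: the abstract does not state how variability was measured or where the high/low boundary was drawn, so the population variance and a split at the cohort's median variance are assumptions, as are the function names, patient identifiers and glucose values.

```python
from statistics import median, pvariance


def summarize_patients(lab_values):
    """lab_values maps a patient id to that patient's longitudinal
    results for one lab test; patients with a single result are skipped."""
    return {
        pid: {"median": median(vals), "variance": pvariance(vals)}
        for pid, vals in lab_values.items()
        if len(vals) >= 2
    }


def split_by_variance(summary):
    """Assumed rule: above the cohort's median variance is 'high'."""
    cutoff = median(s["variance"] for s in summary.values())
    high = sorted(pid for pid, s in summary.items() if s["variance"] > cutoff)
    low = sorted(pid for pid, s in summary.items() if s["variance"] <= cutoff)
    return high, low


# Hypothetical plasma glucose series for four patients.
glucose = {
    "p1": [90, 92, 91, 89],
    "p2": [85, 140, 70, 160],
    "p3": [100, 101, 99, 100],
    "p4": [95, 130, 80, 150],
}
summary = summarize_patients(glucose)
high_variance, low_variance = split_by_variance(summary)
```

Repeating the split for each of the 25 lab tests would give the 50 groups the abstract describes, with each group then analyzed separately.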

35

STRATEGIES FOR EQUITABLE PHARMACOGENOMIC-GUIDED WARFARIN DOSING AMONG EUROPEAN AND AFRICAN AMERICAN INDIVIDUALS IN A CLINICAL POPULATION

Laura Wiley1, Jacob VanHouten2, David Samuels2, Melinda Aldrich3, Dan Roden2, Josh Peterson2, Joshua Denny2

1University of Colorado, 2Vanderbilt University, 3Vanderbilt University Medical Center

Laura Wiley

The blood thinner warfarin has a narrow therapeutic range and high inter- and intra-patient variability in therapeutic doses. Several studies have shown that pharmacogenomic variants help predict stable warfarin dosing. However, retrospective and randomized controlled trials that employ dosing algorithms incorporating pharmacogenomic variants underperform in African Americans. This study sought to determine if: 1) including additional variants associated with warfarin dose in African Americans, 2) predicting within single ancestry groups rather than a combined population, or 3) using percentage African ancestry rather than observed race, would improve warfarin dosing algorithms in African Americans. Using BioVU, the Vanderbilt University Medical Center biobank linked to electronic medical records, we compared 25 modeling strategies to existing algorithms using a cohort of 2,181 warfarin users (1,928 whites, 253 blacks). We found that approaches incorporating additional variants increased model accuracy, but not in clinically significant ways. Race stratification increased model fidelity for African Americans, but the improvement was small and not likely to be clinically significant. Use of percent African ancestry improved model fit in the context of race misclassification.

36

SINGLE-CELL ANALYSIS AND MODELLING OF CELL POPULATION HETEROGENEITY

PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS

37

PRODUCTION OF A PRELIMINARY QUALITY CONTROL PIPELINE FOR SINGLE NUCLEI RNA-SEQ AND ITS APPLICATION IN THE ANALYSIS OF CELL TYPE DIVERSITY OF POST-MORTEM HUMAN BRAIN NEOCORTEX

Brian Aevermann1, Jamison McCorrison1, Pratap Venepally1, Rebecca Hodge2, Trygve Bakken2, Jeremy Miller2, Mark Novotny1, Danny N. Tran1, Francisco Diez-Fuertes3, Lena Christiansen4, Fan Zhang4, Frank Steemers4, Roger S. Lasken1, Ed Lein2, Nicholas Schork1, Richard H. Scheuermann1

1J. Craig Venter Institute, 2Allen Institute for Brain Science, 3Instituto de Salud Carlos III, 4Illumina, Inc.

Richard Scheuermann

Next generation sequencing of the RNA content of single cells or single nuclei (sc/nRNA-seq) has become a powerful approach to understand the cellular complexity and diversity of multicellular organisms and environmental ecosystems. However, the fact that the procedure begins with a relatively small amount of starting material, thereby pushing the limits of the laboratory procedures required, dictates that careful approaches for sample quality control (QC) are essential to reduce the impact of technical noise and sample bias in downstream analysis applications. Here we present a preliminary framework for sample level quality control that is based on the collection of a series of quantitative laboratory and data metrics that are used as features for the construction of QC classification models using random forest machine learning approaches. We’ve applied this initial framework to a dataset comprised of 2272 single nuclei RNA-seq results and determined that ~79% of samples were of high quality. Removal of the poor quality samples from downstream analysis was found to improve the cell type clustering results. In addition, this approach identified quantitative features related to the proportion of unique or duplicate reads and the proportion of reads remaining after quality trimming as useful features for pass/fail classification. The construction and use of classification models for the identification of poor quality samples provides for an objective and scalable approach to sc/nRNA-seq quality control.

38

TRACING CO-REGULATORY NETWORK DYNAMICS IN NOISY, SINGLE-CELL TRANSCRIPTOME TRAJECTORIES

Pablo Cordero, Joshua M. Stuart

UC Santa Cruz Genomics Institute, University of California, Santa Cruz

Pablo Cordero

The availability of gene expression data at the single cell level makes it possible to probe the molecular underpinnings of complex biological processes such as differentiation and oncogenesis. Promising new methods have emerged for reconstructing a progression 'trajectory' from static single-cell transcriptome measurements. However, it remains unclear how to adequately model the appreciable level of noise in these data to elucidate gene regulatory network rewiring. Here, we present a framework called Single Cell Inference of MorphIng Trajectories and their Associated Regulation (SCIMITAR) that infers progressions from static single-cell transcriptomes by employing a continuous parametrization of Gaussian mixtures in high-dimensional curves. SCIMITAR yields rich models from the data that highlight genes with expression and co-expression patterns that are associated with the inferred progression. Further, SCIMITAR extracts regulatory states from the implicated trajectory-evolving co-expression networks. We benchmark the method on simulated data to show that it yields accurate cell ordering and gene network inferences. Applied to the interpretation of a single-cell human fetal neuron dataset, SCIMITAR finds progression-associated genes in cornerstone neural differentiation pathways missed by standard differential expression tests. Finally, by leveraging the rewiring of gene-gene co-expression relations across the progression, the method reveals the rise and fall of co-regulatory states and trajectory-dependent gene modules. These analyses implicate new transcription factors in neural differentiation including putative co-factors for the multi-functional NFAT pathway.

39

AN UPDATED DEBARCODING TOOL FOR MASS CYTOMETRY WITH CELL TYPE-SPECIFIC AND CELL SAMPLE-SPECIFIC STRINGENCY ADJUSTMENT

Kristin I. Fread1, William D. Strickland2, Garry P. Nolan3, Eli R. Zunder1

1Department of Biomedical Engineering, University of Virginia; 2Department of Biomedical Sciences, University of Virginia; 3Department of Microbiology and Immunology, Stanford University

Eli Zunder

Pooled sample analysis by mass cytometry barcoding carries many advantages: reduced antibody consumption, increased sample throughput, removal of cell doublets, reduction of cross-contamination by sample carryover, and the elimination of tube-to-tube variability in antibody staining. A single-cell debarcoding algorithm was previously developed to improve the accuracy and yield of sample deconvolution, but this method was limited to using fixed parameters for debarcoding stringency filtering, which could introduce cell-specific or sample-specific bias to cell yield in scenarios where barcode staining intensity and variance are not uniform across the pooled samples. To address this issue, we have updated the algorithm to output debarcoding parameters for every cell in the sample-assigned FCS files, which allows for visualization and analysis of these parameters via flow cytometry analysis software. This strategy can be used to detect cell type-specific and sample-specific effects on the underlying cell data that arise during the debarcoding process. An additional benefit to this strategy is the decoupling of barcode stringency filtering from the debarcoding and sample assignment process. This is accomplished by removing the stringency filters during sample assignment, and then filtering after the fact with 1- and 2-dimensional gating on the debarcoding parameters which are output with the FCS files. These data exploration strategies serve as an important quality check for barcoded mass cytometry datasets, and allow cell type and sample-specific stringency adjustment that can remove bias in cell yield introduced during the debarcoding process.

40

IMAGING GENOMICS

PROCEEDINGS PAPERS WITH POSTER PRESENTATIONS

41

ADAPTIVE TESTING OF SNP-BRAIN FUNCTIONAL CONNECTIVITY ASSOCIATION VIA A MODULAR NETWORK ANALYSIS

Chen Gao, Junghi Kim, Wei Pan

Division of Biostatistics, School of Public Health, University of Minnesota

Wei Pan

Due to its high dimensionality and high noise levels, analysis of a large brain functional network may not be powerful and easy to interpret; instead, decomposition of a large network into smaller subcomponents called modules may be more promising as suggested by some empirical evidence. For example, alteration of brain modularity is observed in patients suffering from various types of brain malfunctions. Although several methods exist for estimating brain functional networks, such as the sample correlation matrix or graphical lasso for a sparse precision matrix, it is still difficult to extract modules from such network estimates. Motivated by these considerations, we adapt a weighted gene co-expression network analysis (WGCNA) framework to resting-state fMRI (rs-fMRI) data to identify modular structures in brain functional networks. Modular structures are identified by using topological overlap matrix (TOM) elements in hierarchical clustering. We propose applying a new adaptive test built on the proportional odds model (POM) that can be applied to a high-dimensional setting, where the number of variables (p) can exceed the sample size (n) in addition to the usual p<n setting. We applied our proposed methods to the ADNI data to test for associations between a genetic variant and either the whole brain functional network or its various subcomponents using various connectivity measures. We uncovered several modules based on the control cohort, and some of them were marginally associated with the APOE4 variant and several other SNPs; however, due to the small sample size of the ADNI data, larger studies are needed.
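
For readers unfamiliar with the topological overlap matrix mentioned in this abstract, the sketch below computes the unsigned topological overlap as it is commonly defined in the WGCNA literature: for nodes i and j, the adjacency a_ij plus the summed shared-neighbor weights, divided by min(k_i, k_j) + 1 - a_ij, where k is node connectivity. Whether the authors used exactly this variant is not stated, so treat it as an assumption; the function name and the toy adjacency matrix are hypothetical, and adjacency values are assumed to lie in [0, 1].

```python
def topological_overlap(adj):
    """Unsigned topological overlap for a symmetric adjacency matrix
    with entries in [0, 1]; the diagonal is set to 1 by convention."""
    n = len(adj)
    # Connectivity of each node, excluding any self-connection.
    k = [sum(adj[i][u] for u in range(n) if u != i) for i in range(n)]
    tom = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Connection strength routed through shared neighbors u.
            shared = sum(
                adj[i][u] * adj[u][j] for u in range(n) if u not in (i, j)
            )
            tom[i][j] = (shared + adj[i][j]) / (min(k[i], k[j]) + 1.0 - adj[i][j])
    return tom


# Hypothetical four-region network: regions 0, 1 and 2 form a triangle,
# and region 3 is attached to region 2 only.
adjacency = [
    [0.0, 1.0, 1.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
tom = topological_overlap(adjacency)
```

A dissimilarity of one minus the overlap is what would typically be passed to hierarchical clustering to extract modules. Regions 0 and 3 are not directly connected here but still receive a nonzero overlap because they share region 2 as a neighbor, which is why the measure is more robust to noisy individual edges than raw adjacency.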

42

EXPLORING BRAIN TRANSCRIPTOMIC PATTERNS: A TOPOLOGICAL ANALYSIS USING SPATIAL EXPRESSION NETWORKS

Zhana Kuncheva1, Michelle L. Krishnan2, Giovanni Montana2

1Imperial College London, 2King's College London

Zhana Kuncheva

Characterizing the transcriptome architecture of the human brain is fundamental in gaining an understanding of brain function and disease. A number of recent studies have investigated patterns of brain gene expression obtained from an extensive anatomical coverage across the entire human brain using experimental data generated by the Allen Human Brain Atlas (AHBA) project. In this paper, we propose a new representation of a gene's transcription activity that explicitly captures the pattern of spatial co-expression across different anatomical brain regions. For each gene, we define a Spatial Expression Network (SEN), a network quantifying co-expression patterns amongst several anatomical locations. Network similarity measures are then employed to quantify the topological resemblance between pairs of SENs and identify naturally occurring clusters. Using network-theoretical measures, three large clusters have been detected featuring distinct topological properties. We then evaluate whether topological diversity of the SENs reflects significant differences in biological function through a gene ontology analysis. We report on evidence suggesting that one of the three SEN clusters consists of genes specifically involved in the nervous system, including genes related to brain disorders, while the remaining two clusters are representative of immunity, transcription and translation. These findings are consistent with previous studies showing that brain gene clusters are generally associated with one of these three major biological processes.

43

PATTERNS IN BIOMEDICAL DATA – HOW DO WE FIND THEM?

PROCEEDINGS PAPERS WITH POSTER PRESENTATIONS

44

A DEEP LEARNING APPROACH FOR CANCER DETECTION AND RELEVANT GENE IDENTIFICATION

Padideh Danaee, Reza Ghaeini, David Hendrix

Oregon State University

Padideh Danaee

Cancer detection from gene expression data continues to pose a challenge due to the high dimensionality and complexity of these data. After decades of research there is still uncertainty in the clinical diagnosis of cancer and the identification of tumor-specific markers. Here we present a deep learning approach to cancer detection, and to the identification of genes critical for the diagnosis of breast cancer. First, we used a Stacked Denoising Autoencoder (SDAE) to deeply extract functional features from high dimensional gene expression profiles. Next, we evaluated the performance of the extracted representation through supervised classification models to verify the usefulness of the new features in cancer detection. Lastly, we identified a set of highly interactive genes by analyzing the SDAE connectivity matrices. Our results and analysis illustrate that these highly interactive genes could be useful cancer biomarkers for the detection of breast cancer that deserve further studies.

45

GENOME-WIDE INTERACTION WITH SELECTED TYPE 2 DIABETES LOCI REVEALS NOVEL LOCI FOR TYPE 2 DIABETES IN AFRICAN AMERICANS

Jacob M. Keaton¹, Jacklyn N. Hellwege¹, Maggie C.Y. Ng¹, Nicholette D. Palmer¹, James S. Pankow², Myriam Fornage³, James G. Wilson⁴, Adolfo Correa⁴, Laura J. Rasmussen-Torvik⁵, Jerome I. Rotter⁶, Yii-Der I. Chen⁶, Kent D. Taylor⁶, Stephen S. Rich⁷, Lynne E. Wagenknecht¹, Barry I. Freedman¹, Donald W. Bowden¹

¹Wake Forest School of Medicine, ²University of Minnesota, ³University of Texas Health Science Center at Houston, ⁴University of Mississippi Medical Center, ⁵Northwestern University Feinberg School of Medicine, ⁶Harbor-UCLA Medical Center, ⁷University of Virginia

Jacob Keaton

Type 2 diabetes (T2D) is the result of metabolic defects in insulin secretion and insulin sensitivity, yet most T2D loci identified to date influence insulin secretion. We hypothesized that T2D loci, particularly those affecting insulin sensitivity, can be identified through interaction with known T2D loci implicated in insulin secretion. To test this hypothesis, single nucleotide polymorphisms (SNPs) nominally associated with acute insulin response to glucose (AIRg), a dynamic measure of first-phase insulin secretion, and previously associated with T2D in genome-wide association studies (GWAS) were identified in African Americans from the Insulin Resistance Atherosclerosis Family Study (IRASFS; n = 492 subjects). These SNPs were tested for interaction, individually and jointly as a genetic risk score (GRS), using GWAS data from five cohorts (ARIC, CARDIA, JHS, MESA, WFSM; n = 2,725 cases, 4,167 controls) with T2D as the outcome. In single variant analyses, suggestively significant ($P_{\text{interaction}} < 5 \times 10^{-6}$) interactions were observed at several loci including DGKB (rs978989), CDK18 (rs12126276), CXCL12 (rs7921850), HCN1 (rs6895191), FAM98A (rs1900780), and MGMT (rs568530). Notable beta-cell GRS interactions included two SNPs at the DGKB locus (rs6976381; rs6962498). These data support the hypothesis that additional genetic factors contributing to T2D risk can be identified by interactions with insulin secretion loci.

46

META-ANALYSIS OF CONTINUOUS PHENOTYPES IDENTIFIES A GENE SIGNATURE THAT CORRELATES WITH COPD DISEASE STATUS

Madeleine Scott¹, Francesco Vallania², Purvesh Khatri³

¹Stanford Medical School, Stanford University, Stanford, California; ²Stanford Institute for Immunity, Transplantation, and Infection, Stanford University, Stanford, California; ³Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, California

Purvesh Khatri

The utility of multi-cohort two-class meta-analysis to identify robust differentially expressed gene signatures has been well established. However, many biomedical applications, such as gene signatures of disease progression, require one-class analysis. Here we describe an R package, MetaCorrelator, that can identify a reproducible transcriptional signature that is correlated with a continuous disease phenotype across multiple datasets. We successfully applied this framework to extract a pattern of gene expression that can predict lung function in patients with chronic obstructive pulmonary disease (COPD) in both peripheral blood mononuclear cells (PBMCs) and tissue. Our results point to a dysregulation in the oxidation state of the lungs of patients with COPD, as well as underscore the classically recognized inflammatory state that underlies this disease.

47

LEARNING PARSIMONIOUS ENSEMBLES FOR UNBALANCED COMPUTATIONAL GENOMICS PROBLEMS

Ana Stanescu, Gaurav Pandey

Icahn School of Medicine at Mount Sinai

Gaurav Pandey

Prediction problems in biomedical sciences are generally quite difficult, partially due to incomplete knowledge of how the phenomenon of interest is influenced by the variables and measurements used for prediction, as well as a lack of consensus regarding the ideal predictor(s) for specific problems. In these situations, a powerful approach to improving prediction performance is to construct ensembles that combine the outputs of many individual base predictors, which have been successful for many biomedical prediction tasks. Moreover, selecting a parsimonious ensemble can be of even greater value for biomedical sciences, where it is not only important to learn an accurate predictor, but also to interpret what novel knowledge it can provide about the target problem. Ensemble selection is a promising approach for this task because of its ability to select a collectively predictive subset, often a relatively small one, of all input base predictors. One of the most well-known algorithms for ensemble selection, CES (Caruana et al.'s Ensemble Selection), generally performs well in practice, but faces several challenges due to the difficulty of choosing the right values of its various parameters. Since the choices made for these parameters are usually ad-hoc, good performance of CES is difficult to guarantee for a variety of problems or datasets. To address these challenges with CES and other such algorithms, we propose a novel heterogeneous ensemble selection approach based on the paradigm of reinforcement learning (RL), which offers a more systematic and mathematically sound methodology for exploring the many possible combinations of base predictors that can be selected into an ensemble. We develop three RL-based strategies for constructing ensembles and analyze their results on two unbalanced computational genomics problems, namely the prediction of protein function and splice sites in eukaryotic genomes. We show that the resultant ensembles are indeed substantially more parsimonious as compared to the full set of base predictors, yet still offer almost the same classification power, especially for larger datasets. The RL ensembles also yield a better combination of parsimony and predictive performance as compared to CES.
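For readers unfamiliar with the CES baseline, its core idea (greedy forward selection with replacement on a validation set) can be sketched as below. This is a simplified illustration under stated assumptions, not the authors' RL method and not a full CES implementation: the names `ensemble_selection` and `neg_brier`, the use of a negative Brier score as the validation metric, and the fixed number of rounds are all choices made here for illustration.

```python
def neg_brier(labels, preds):
    """Negative mean squared error between labels and scores (higher is better)."""
    return -sum((y - p) ** 2 for y, p in zip(labels, preds)) / len(labels)


def ensemble_selection(predictions, labels, score=neg_brier, rounds=10):
    """Greedy forward selection with replacement over base predictors.

    predictions maps a predictor name to its scores on a validation set.
    Returns the best-scoring prefix of selections and its validation score.
    """
    selected = []
    total = [0.0] * len(labels)  # running sum of selected predictors' scores
    best_overall, best_subset = float("-inf"), []
    for _ in range(rounds):
        best_name, best_score = None, float("-inf")
        for name in sorted(predictions):
            # Score the ensemble average if this predictor were added.
            avg = [(t + p) / (len(selected) + 1)
                   for t, p in zip(total, predictions[name])]
            s = score(labels, avg)
            if s > best_score:
                best_name, best_score = name, s
        selected.append(best_name)
        total = [t + p for t, p in zip(total, predictions[best_name])]
        if best_score > best_overall:
            best_overall, best_subset = best_score, list(selected)
    return best_subset, best_overall


# Toy validation data: "a" and "b" make complementary errors, "c" is uninformative.
val_labels = [1, 0, 1, 0]
val_preds = {
    "a": [0.9, 0.4, 0.6, 0.1],
    "b": [0.6, 0.1, 0.9, 0.4],
    "c": [0.5, 0.5, 0.5, 0.5],
}
subset, subset_score = ensemble_selection(val_preds, val_labels, rounds=3)
```

In this toy case the selected subset combines the two complementary predictors and leaves out the uninformative one, which is the kind of parsimony the abstract is concerned with.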

48

NETWORK MAP OF ADVERSE HEALTH EFFECTS AMONG VICTIMS OF INTIMATE PARTNER VIOLENCE

Kathleen Whiting¹, Larry Y. Liu², Mehmet Koyutürk², Gunnur Karakurt²

¹Uniformed Services University, ²Case Western Reserve University

Gunnur Karakurt

Intimate partner violence (IPV) is a serious problem with devastating health consequences. Screening procedures may overlook relationships between IPV and negative health effects. To identify IPV-associated women’s health issues, we mined national, aggregated de-identified electronic health record data and compared female health issues of domestic abuse (DA) versus non-DA records, identifying terms significantly more frequent for the DA group. After coding these terms into 28 broad categories, we developed a network map to determine strength of relationships between categories in the context of DA, finding that acute conditions are strongly connected to cardiovascular, gastrointestinal, gynecological, and neurological conditions among victims.

49

PRECISION MEDICINE: FROM GENOTYPES AND MOLECULAR PHENOTYPES TOWARDS IMPROVED HEALTH AND THERAPIES

PROCEEDINGS PAPERS WITH POSTER PRESENTATIONS

50

A POWERFUL METHOD FOR INCLUDING GENOTYPE UNCERTAINTY IN TESTS OF HARDY-WEINBERG EQUILIBRIUM

Andrew Beck¹, Alexander Luedtke², Keli Liu³, Nathan Tintle⁴

¹University of Michigan, ²University of California-Berkeley, ³Harvard University, ⁴Dordt College

Nathan Tintle

The use of posterior probabilities to summarize genotype uncertainty is pervasive across genotype, sequencing and imputation platforms. Prior work in many contexts has shown the utility of incorporating genotype uncertainty (posterior probabilities) in downstream statistical tests. Typical approaches to incorporating genotype uncertainty when testing Hardy-Weinberg equilibrium tend to lack calibration in the type I error rate, especially as genotype uncertainty increases. We propose a new approach in the spirit of genomic control that properly calibrates the type I error rate, while yielding improved power to detect deviations from Hardy-Weinberg Equilibrium. We demonstrate the improved performance of our method on both simulated and real genotypes.
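As background, the classical one-degree-of-freedom chi-square test of Hardy-Weinberg equilibrium on hard-called genotype counts can be sketched as follows. This is the textbook test only, shown for orientation; it does not model genotype uncertainty and is not the calibrated method the abstract proposes. The function name `hwe_chi_square` is an assumption made for illustration.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Classical chi-square test of Hardy-Weinberg equilibrium (1 degree of freedom).

    Takes counts of the three genotypes at a biallelic locus and returns
    (statistic, p_value). Assumes both alleles are observed, so that all
    expected counts are positive.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # estimated frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Counts in exact equilibrium versus a sample with no heterozygotes at all.
stat_eq, p_eq = hwe_chi_square(25, 50, 25)
stat_dev, p_dev = hwe_chi_square(50, 0, 50)
```

The first call returns a statistic of zero because the observed counts match the expected proportions exactly; the second shows a strong departure driven by the missing heterozygotes.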

51

MICRORNA-AUGMENTED PATHWAYS (MIRAP) AND THEIR APPLICATIONS TO PATHWAY ANALYSIS AND DISEASE SUBTYPING

Diana Diaz¹, Michele Donato², Tin Nguyen¹, Sorin Draghici¹

¹Wayne State University, ²Stanford University Medical Center

Sorin Draghici

MicroRNAs play important roles in the development of many complex diseases. Because of their importance, the analysis of signaling pathways including miRNA interactions holds the potential for unveiling the mechanisms underlying such diseases. However, current signaling pathway databases are limited to interactions between genes and ignore miRNAs. Here, we use the information on miRNA targets to build a database of miRNA-augmented pathways (mirAP), and we show its application in the contexts of integrative pathway analysis and disease subtyping. Our miRNA-mRNA integrative pathway analysis pipeline incorporates a topology-aware approach that we previously implemented. Our integrative disease subtyping pipeline takes into account survival data, gene and miRNA expression, and knowledge of the interactions among genes. We demonstrate the advantages of our approach by analyzing nine sample-matched datasets that provide both miRNA and mRNA expression. We show that integrating miRNAs into pathway analysis results in greater statistical power, and provides a more comprehensive view of the underlying phenomena. We also compare our disease subtyping method with the state-of-the-art integrative analysis by analyzing a colorectal cancer database from TCGA. The colorectal cancer subtypes identified by our approach are significantly different in terms of their survival expectation. These miRNA-augmented pathways offer a more comprehensive view and a deeper understanding of biological pathways. A better understanding of the molecular processes associated with patients' survival can lead to a better prognosis and an appropriate treatment for each subtype.

52

FREQUENT SUBGRAPH MINING OF PERSONALIZED SIGNALING PATHWAY NETWORKS GROUPS PATIENTS WITH FREQUENTLY DYSREGULATED DISEASE PATHWAYS AND PREDICTS PROGNOSIS

Arda Durmaz, Tim A.D. Henderson, Douglas Brubaker, Gurkan Bebek

Case Western Reserve University

Gurkan Bebek

Motivation: Large scale genomics studies have generated comprehensive molecular characterization of numerous cancer types. Subtypes for many tumor types have been established; however, these classifications are based on molecular characteristics of small gene sets with limited power to detect dysregulation at the patient level. We hypothesize that frequent graph mining of pathways to gather pathways functionally relevant to tumors can characterize tumor types and provide opportunities for personalized therapies.

Results: In this study we present an integrative omics approach to group patients based on their altered pathway characteristics and show prognostic differences within breast cancer (p < 9.57E−10) and glioblastoma multiforme (p < 0.05) patients. We were able to validate this approach in secondary RNA-Seq datasets with p < 0.05 and p < 0.01 respectively. We also performed pathway enrichment analysis to further investigate the biological relevance of dysregulated pathways. We compared our approach with network-based classifier algorithms and showed that our unsupervised approach generates more robust and biologically relevant clustering whereas previous approaches failed to report specific functions for similar patient groups or classify patients into prognostic groups.

Conclusions: These results could serve as a means to improve prognosis for future cancer patients, and to provide opportunities for improved treatment options and personalized interventions. The proposed novel graph mining approach is able to integrate PPI networks with gene expression in a biologically sound approach and cluster patients into clinically distinct groups. We have utilized breast cancer and glioblastoma multiforme datasets from microarray and RNA-Seq platforms and identified disease mechanisms differentiating samples.

53

CERNA SEARCH METHOD IDENTIFIED A MET-ACTIVATED SUBGROUP AMONG EGFR DNA AMPLIFIED LUNG ADENOCARCINOMA PATIENTS

Halla Kabat, Leo Tunkle, Inhan Lee

miRcore

Inhan Lee

Given the diverse molecular pathways involved in tumorigenesis, identifying subgroups among cancer patients is crucial in precision medicine. While most targeted therapies rely on DNA mutation status in tumors, responses to such therapies vary due to the many molecular processes involved in propagating DNA changes to proteins (which constitute the usual drug targets). Though RNA expressions have been extensively used to categorize tumors, identifying clinically important subgroups remains challenging given the difficulty of discerning subgroups within all possible RNA-RNA networks. It is thus essential to incorporate multiple types of data. Recently, RNA was found to regulate other RNA through a common microRNA (miR). These regulating and regulated RNAs are referred to as competing endogenous RNAs (ceRNAs). However, global correlations between mRNA and miR expressions across all samples have not reliably yielded ceRNAs. In this study, we developed a ceRNA-based method to identify subgroups of cancer patients combining DNA copy number variation, mRNA expression, and microRNA (miR) expression data with biological knowledge. Clinical data is used to validate identified subgroups and ceRNAs. Since ceRNAs are causal, ceRNA-based subgroups may present clinical relevance. Using lung adenocarcinoma data from The Cancer Genome Atlas (TCGA) as an example, we focused on EGFR amplification status, since a targeted therapy for EGFR exists. We hypothesized that global correlations between mRNA and miR expressions across all patients would not reveal important subgroups and that clustering of potential ceRNAs might define molecular pathway-relevant subgroups. Using experimentally validated miR-target pairs, we identified EGFR and MET as potential ceRNAs for miR-133b in lung adenocarcinoma. The EGFR-MET up and miR-133b down subgroup showed a higher death rate than the EGFR-MET down and miR-133b up subgroup. Although transactivation between MET and EGFR has been identified previously, our result is the first to propose ceRNA as one of its underlying mechanisms. Furthermore, since MET amplification was seen in the case of resistance to EGFR-targeted therapy, the EGFR-MET up and miR-133b down subgroup may fall into the drug non-response group and thus preclude EGFR target therapy.

54

IMPROVED PERFORMANCE OF GENE SET ANALYSIS ON GENOME-WIDE TRANSCRIPTOMICS DATA WHEN USING GENE ACTIVITY STATE ESTIMATES

Thomas Kamp, Micah Adams, Craig Disselkoen, Nathan Tintle

Dordt College

Nathan Tintle

Gene set analysis methods continue to be a popular and powerful method of evaluating genome-wide transcriptomics data. These approaches require a priori grouping of genes into biologically meaningful sets, and then conducting downstream analyses at the set (instead of gene) level of analysis. Gene set analysis methods have been shown to yield more powerful statistical conclusions than single-gene analyses due to both reduced multiple testing penalties and potentially larger observed effects due to the aggregation of effects across multiple genes in the set. Traditionally, gene set analysis methods have been applied directly to normalized, log-transformed, transcriptomics data. Recently, efforts have been made to transform transcriptomics data to scales yielding more biologically interpretable results. For example, recently proposed models transform log-transformed transcriptomics data to a confidence metric (ranging between 0 and 100%) that a gene is active (roughly speaking, that the gene product is part of an active cellular mechanism). In this manuscript, we demonstrate, on both real and simulated transcriptomics data, that tests for differential expression between sets of genes are typically more powerful when using gene activity state estimates as opposed to log-transformed gene expression data. Our analysis suggests further exploration of techniques to transform transcriptomics data to meaningful quantities for improved downstream inference.

55

METHYLDMV: SIMULTANEOUS DETECTION OF DIFFERENTIAL DNA METHYLATION AND VARIABILITY WITH CONFOUNDER ADJUSTMENT

Pei Fen Kuan, Junyan Song, Shuyao He

Stony Brook University

Pei Fen Kuan

DNA methylation has emerged as a promising epigenetic marker for disease diagnosis. Both the differential mean (DM) and differential variability (DV) in methylation have been shown to contribute to transcriptional aberration and disease pathogenesis. The presence of confounding factors in large scale EWAS may affect the methylation values and hamper accurate marker discovery. In this paper, we propose a flexible framework called methylDMV which allows for confounding factors adjustment and enables simultaneous characterization and identification of CpGs exhibiting DM only, DV only and both DM and DV. The proposed framework also allows for prioritization and selection of candidate features to be included in the prediction algorithm. We illustrate the utility of methylDMV in several TCGA datasets. An R package methylDMV implementing our proposed method is available at http://www.ams.sunysb.edu/~pfkuan/softwares.html#methylDMV.

56

IDENTIFY CANCER DRIVER GENES THROUGH SHARED MENDELIAN DISEASE PATHOGENIC VARIANTS AND CANCER SOMATIC MUTATIONS

Meng Ma¹, Changchang Wang², Benjamin Glicksberg¹, Eric E. Schadt¹, Shuyu Li¹, Rong Chen¹

¹Icahn School of Medicine at Mount Sinai, ²Anhui University

Shuyu Li

Genomic sequencing studies in the past several years have yielded a large number of cancer somatic mutations. There remains a major challenge in delineating a small fraction of somatic mutations that are oncogenic drivers from a background of predominantly passenger mutations. Although computational tools have been developed to predict the functional impact of mutations, their utility is limited. In this study, we applied an alternative approach to identify potentially novel cancer drivers as those somatic mutations that overlap with known pathogenic mutations in Mendelian diseases. We hypothesize that those shared mutations are more likely to be cancer drivers because they have the established molecular mechanisms to impact protein functions. We first show that the overlap between somatic mutations in COSMIC and pathogenic genetic variants in HGMD is associated with high mutation frequency in cancers and is enriched for known cancer genes. We then attempted to identify putative tumor suppressors based on the number of distinct HGMD/COSMIC overlapping mutations in a given gene, and our results suggest that ion channels, collagens and Marfan syndrome associated genes may represent new classes of tumor suppressors. To elucidate potentially novel oncogenes, we identified those HGMD/COSMIC overlapping mutations that are not only highly recurrent but also mutually exclusive from previously characterized oncogenic mutations in each specific cancer type. Taken together, our study represents a novel approach to discover new cancer genes from the vast amount of cancer genome sequencing data.

57

IDENTIFYING CANCER SPECIFIC METABOLIC SIGNATURES USING CONSTRAINT-BASED MODELS

André Schultz¹, Sanket Mehta¹, Chenyue W. Hu¹, Fieke W. Hoff², Terzah M. Horton³, Steven M. Kornblau², Amina A. Qutub¹

¹Rice University, ²University of Texas MD Anderson Cancer Center, ³Baylor College of Medicine and Texas Children's Hospital

André Schultz

Cancer metabolism differs remarkably from the metabolism of healthy surrounding tissues, and it is extremely heterogeneous across cancer types. While these metabolic differences provide promising avenues for cancer treatments, much work remains to be done in understanding how metabolism is rewired in malignant tissues. To that end, constraint-based models provide a powerful computational tool for the study of metabolism at the genome scale. To generate meaningful predictions, however, these generalized human models must first be tailored for specific cell or tissue sub-types. Here we first present two improved algorithms for (1) the generation of these context-specific metabolic models based on omics data, and (2) Monte-Carlo sampling of the metabolic model flux space. By applying these methods to generate and analyze context-specific metabolic models of diverse solid cancer cell line data, and primary leukemia pediatric patient biopsies, we demonstrate how the methodology presented in this study can generate insights into the rewiring differences across solid tumors and blood cancers.

58

SINGLE-CELL ANALYSIS AND MODELLING OF CELL POPULATION HETEROGENEITY

PROCEEDINGS PAPERS WITH POSTER PRESENTATIONS

59

MAPPING NEURONAL CELL TYPES USING INTEGRATIVE MULTI-SPECIES MODELING OF HUMAN AND MOUSE SINGLE CELL RNA SEQUENCING

Travis Johnson, Zachary Abrams, Yan Zhang, Kun Huang

Ohio State University

Travis Johnson

Mouse brain transcriptomic studies are important in the understanding of the structural heterogeneity in the brain. However, it is not well understood how cell types in the mouse brain relate to human brain cell types on a cellular level. We propose that it is possible with single cell granularity to find concordant genes between mouse and human and that these genes can be used to separate cell types across species. We show that a set of concordant genes can be algorithmically derived from a combination of human and mouse single cell sequencing data. Using this gene set, we show that similar cell types shared between mouse and human cluster together. Furthermore we find that previously unclassified human cells can be mapped to the glial/vascular cell type by integrating mouse cell type expression profiles.

60

A SPATIOTEMPORAL MODEL TO SIMULATE CHEMOTHERAPY REGIMENS FOR HETEROGENEOUS BLADDER CANCER METASTASES TO THE LUNG

Kimberly R. Kanigel Winner¹, James C. Costello²

¹Computational Bioscience Program, Department of Pharmacology, University of Colorado Cancer Center; ²University of Colorado Anschutz Medical Campus

Kimberly Kanigel Winner

Tumors are composed of heterogeneous populations of cells. Somatic genetic aberrations are one form of heterogeneity that allows clonal cells to adapt to chemotherapeutic stress, thus providing a path for resistance to arise. In silico modeling of tumors provides a platform for rapid, quantitative experiments to inexpensively study how compositional heterogeneity contributes to drug resistance. Accordingly, we have built a spatiotemporal model of a lung metastasis originating from a primary bladder tumor, incorporating in vivo drug concentrations of first-line chemotherapy, resistance data from bladder cancer cell lines, vascular density of lung metastases, and gains in resistance in cells that survive chemotherapy. In metastatic bladder cancer, a first-line drug regimen includes six cycles of gemcitabine plus cisplatin (GC) delivered simultaneously on day 1, and gemcitabine on day 8 in each 21-day cycle. The interaction between gemcitabine and cisplatin has been shown to be synergistic in vitro, and results in better outcomes in patients. Our model shows that during simulated treatment with this regimen, GC synergy does begin to kill cells that are more resistant to cisplatin, but repopulation by resistant cells occurs. Post-regimen populations are mixtures of the original, seeded resistant clones, and/or new clones that have gained resistance to cisplatin, gemcitabine, or both drugs. The emergence of a tumor with increased resistance is qualitatively consistent with the five-year survival of 6.8% for patients with metastatic transitional cell carcinoma of the urinary bladder treated with a GC regimen. The model can be further used to explore the parameter space for clinically relevant variables, including the timing of drug delivery to optimize cell death, and patient-specific data such as vascular density, rates of resistance gain, disease progression, and molecular profiles, and can be expanded for data on toxicity. The model is specific to bladder cancer, which has not previously been modeled in this context, but can be adapted to represent other cancers.
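The dosing calendar described in the abstract (six 21-day cycles, gemcitabine plus cisplatin on day 1 and gemcitabine alone on day 8) can be written out explicitly, which is the kind of input a treatment simulation would need. The sketch below only enumerates dosing days; it says nothing about doses or about how the authors' model represents the regimen. The function name `gc_schedule` and the 1-based day numbering are assumptions made for illustration.

```python
def gc_schedule(cycles=6, cycle_length=21):
    """List (day, drugs) dosing events for a gemcitabine plus cisplatin regimen.

    Days are counted from 1 at the start of treatment. Each cycle gives both
    drugs on its day 1 and gemcitabine alone on its day 8.
    """
    schedule = []
    for cycle in range(cycles):
        offset = cycle * cycle_length
        schedule.append((offset + 1, ("gemcitabine", "cisplatin")))
        schedule.append((offset + 8, ("gemcitabine",)))
    return schedule


regimen = gc_schedule()
dosing_days = [day for day, _ in regimen]
```

With the defaults this yields twelve dosing events, the last one on day 113 (day 8 of the sixth cycle).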

61

SCALABLE VISUALIZATION FOR HIGH-DIMENSIONAL SINGLE-CELL DATA

Juho Kim, Nate Russell, Jian Peng

University of Illinois at Urbana-Champaign

Juho Kim

Single-cell analysis can uncover the mysteries in the state of individual cells and enable us to construct new models about the analysis of heterogeneous tissues. State-of-the-art technologies for single-cell analysis have been developed to measure the properties of single-cells and detect hidden information. They are able to provide the measurements of dozens of features simultaneously in each cell. However, due to the high-dimensionality, heterogeneous complexity and sheer enormity of single-cell data, its interpretation is challenging. Thus, new methods to overcome high-dimensionality are necessary. Here, we present a computational tool that allows efficient visualization of high-dimensional single-cell data onto a low-dimensional (2D or 3D) space while preserving the similarity structure between single-cells. We first construct a network that can represent the similarity structure between the high-dimensional representations of single-cells, and then, embed this network into a low-dimensional space through an efficient online optimization method based on the idea of negative sampling. Using this approach, we can preserve the high-dimensional structure of single-cell data in an embedded low-dimensional space that facilitates visual analyses of the data.

62

COMPUTATIONAL APPROACHES TO UNDERSTANDING THE EVOLUTION OF MOLECULAR FUNCTION

POSTER PRESENTATIONS

63

CLUSTER-BASED GENOTYPE-ENVIRONMENT-PHENOTYPE CORRELATION ALGORITHM

Ernesto Borrayo, Ryoko Machida-Hirano

Gene Research Center, University of Tsukuba

Ernesto Borrayo

The interactions between genotype and environment give rise to phenotypic plasticity. However, these interactions are dynamic and complex. What is considered a phenotype at one evaluation can be considered an environmental condition at another, as that previous phenotype will affect particular conditions for the new one. Also, under a specific perspective a determined genetic material can be considered as an environmental condition for other loci. These concepts elucidate that the “one gene, one trait” rationale is rather the exception than the rule, and in order to adequately predict the possible phenotype expected at any biological level, the specific interaction between environment and genotype should be analyzed carefully. In order to infer the degree of influence of both a genotype and an environment over certain phenotypic traits, we developed a cluster-based algorithm that renders the way phenotypical traits can be explained by either that genotype or such environmental conditions. Although this approach is still far from being able to consider all possible aspects that may explain a phenotypic condition, it is a first approach to successfully analyzing the mentioned genotype-environment-phenotype interactions in a comprehensive manner. To test the algorithm along with synthetic data, real genetic, environmental and agromorphological traits of Theobroma cacao and Sechium edule were also analyzed. We expect that further exploration of different classifiers will help to adequately predict phenotypic expression at different biological levels (with significant applications in diverse fields such as crop improvement, genomics, clinical diagnosis/prognosis/treatment and metabolomics) and that it will enhance our understanding of genomics, metabolomics and adaptation/evolutionary processes.

64

QUANTITATING TRANSLATIONAL CONTROL: MRNA ABUNDANCE-DEPENDENT AND INDEPENDENT CONTRIBUTIONS

Jingyi Jessica Li¹, Guo-Liang Chew², Mark D. Biggin³

¹Department of Statistics and Department of Human Genetics, UCLA; ²Computational Biology Program, Fred Hutchinson Cancer Research Center; ³Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory

Jingyi Jessica Li

Translation rate per mRNA molecule correlates positively with mRNA abundance. As a result, protein levels do not scale linearly with mRNA levels, but instead scale with the abundance of mRNA raised to the power of an “amplification exponent”. Here we show that to quantitate translational control it is necessary to decompose the translation rate into two components. One component, TRmD, depends on the mRNA level and defines the amplification exponent. The other component, TRmIND, is independent of mRNA amount and impacts the correlation coefficient between protein and mRNA levels. We show that in S. cerevisiae TRmD represents ~30% of the variance in translation and results in an amplification exponent of ~1.20–1.27. TRmIND constitutes the remaining 70% of the variance in translation and explains <5% of the variance in protein expression. When protein degradation is also considered, the correlation between the abundances of protein and mRNA is $R^2_{\text{prot-RNA}} > 0.92$. We also investigate which mRNA sequence elements explain the variance in TRmD and TRmIND. We find that TRmIND is most strongly determined by the length of the open reading frame, while TRmD is more strongly determined by an A rich, highly unfolded element that spans nucleotides −35 to +28 relative to the initiating AUG codon, implying that TRmIND is under different evolutionary selective pressures than TRmD. Our work introduces methods for correctly scaling mRNA and protein abundance data using internally controlled standards. It provides quite different, more accurate estimates of translational control than any previous work. By decomposing translation rates, we also provide insights into the mRNA sequence dependencies of translation that would not be apparent otherwise.
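To make the notion of an amplification exponent concrete: if protein abundance scales as a constant times mRNA abundance raised to a power b, then b is the slope of a straight line on a log-log plot. The sketch below estimates that slope by ordinary least squares on synthetic, noise-free data. It is only an illustration of the power-law relationship; it does not reproduce the authors' scaling or variance-decomposition methods, and the function name `fit_power_law` and the toy values are assumptions.

```python
import math


def fit_power_law(mrna, protein):
    """Fit protein = scale * mrna**exponent by least squares in log-log space.

    Returns (exponent, scale). All abundances must be positive.
    """
    xs = [math.log(m) for m in mrna]
    ys = [math.log(p) for p in protein]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    exponent = sxy / sxx
    scale = math.exp(mean_y - exponent * mean_x)
    return exponent, scale


# Synthetic example: protein generated with exponent 1.2 and scale 3.
toy_mrna = [1.0, 2.0, 4.0, 8.0, 16.0]
toy_protein = [3.0 * m ** 1.2 for m in toy_mrna]
exponent, scale = fit_power_law(toy_mrna, toy_protein)
```

An exponent above 1, as in this toy data, means that a doubling of mRNA abundance corresponds to more than a doubling of protein abundance.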

65

PROSNET: INTEGRATING HOMOLOGY WITH MOLECULAR NETWORKS FOR PROTEIN FUNCTION PREDICTION

Sheng Wang, Meng Qu, Jian Peng

University of Illinois at Urbana-Champaign

Sheng Wang

Automated annotation of protein function has become a critical task in the post-genomic era. Network-based approaches and homology-based approaches have been widely used and recently tested in large-scale community-wide assessment experiments. It is natural to integrate network data with homology information to further improve the predictive performance. However, integrating these two heterogeneous, high-dimensional and noisy datasets is non-trivial. In this work, we introduce a novel protein function prediction algorithm, ProSNet. An integrated heterogeneous network is first built to include molecular networks of multiple species and link together homologous proteins across multiple species. Based on this integrated network, a dimensionality reduction algorithm is introduced to obtain compact low-dimensional vectors to encode proteins in the network. Finally, we develop machine learning classification algorithms that take the vectors as input and make predictions by transferring annotations both within each species and across different species. Extensive experiments on five major species demonstrate that our integration of homology with molecular networks substantially improves the predictive performance over existing approaches.

66

GENERAL

POSTER PRESENTATIONS

67

IDENTIFICATION OF DIFFERENTIALLY PHOSPHORYLATED MODULES IN PROTEIN INTERACTION NETWORKS

Marzieh Ayati, Danica Wiredja, Daniela Schlatzer, Goutham Narla, Mark Chance, Mehmet Koyutürk

Case Western Reserve University

Mehmet Koyutürk

Advances in high-throughput omics technologies have revolutionized our understanding of the genomic underpinnings of cancer. However, many challenges remain in understanding how patients with common driver mutations may display diverging phosphoproteomic responses to the same treatment. Thus, an examination of the signaling landscape will provide essential molecular information for modeling personalized patient treatment design. However, integrative bioinformatics approaches to identify phosphoproteomics-based molecular states are in their infancy. To address this challenge, we adapt our algorithm MoBaS, which was originally developed to identify phenotype-associated subnetworks in the context of genome-wide association studies. MoBaS takes as input a PPI network and a score for each protein indicating the protein’s differential phosphorylation level. It then identifies protein subnetworks that are (i) composed of densely interacting proteins, and (ii) enriched in proteins with high scores. MoBaS also assesses the statistical significance of the identified subnetworks using permutation tests that effectively handle multiple hypothesis testing. We apply MoBaS to compare and contrast the drug-induced global signaling alterations of two KRAS-mutated non-small cell lung cancer (NSCLC) cell lines, A549 and H358, treated with a novel activator of the tumor suppressor Protein Phosphatase 2A (PP2A) versus DMSO control. Applying kinase enrichment analysis on identified subnetworks, we identify Aurora KB as a key kinase differentially regulated between the two cell lines in response to our compound. Further corroborating this finding, we show that Aurora KB is downregulated at the protein and mRNA levels with our treatment in A549 but not in H358.

68

CLUSTERING METHOD FOR PRIORITIZING BREAST CANCER RISK GENES AND MIRNAS

Yongsheng Bai, Naureen Aslam, Ali Salman

Indiana State University

Yongsheng Bai

Background: MicroRNAs (miRNAs) are short nucleotide sequences that interact with their target mRNAs through 3’ untranslated regions (UTRs). The Cancer Genome Atlas (TCGA) project, initiated in 2006, has sequenced tissue collections with matched tumor and normal samples from 11,000 patients in 33 cancer types and subtypes, including 10 rare cancers. There is an urgent need to develop innovative methodologies and tools that can cluster mRNA-miRNA interaction pairs into groups and characterize functional consequences of cancer risk genes while analyzing the tumor and normal samples simultaneously.

Rationale: An undirected graph can be used to represent gene and miRNA relationships in an interaction network. Specifically, interactions between genes and miRNAs are rendered as a bipartite graph with genes or miRNAs as vertices and their calculated correlation as edges. Our hypothesis is: if a highly scored gene/miRNA cluster in a given tumor sample shows a significantly altered regulation relative to a similar gene/miRNA cluster in the corresponding non-tumor sample, the cluster is biologically significant.

Results: We developed a mathematical model to identify clusters of significant mRNA and miRNA interaction pairs and decipher the mRNA and miRNA regulation network using TCGA miRNA sequencing and mRNA sequencing data. We ran the cluster detection algorithm, implemented in Python 3, on TCGA Breast Invasive Carcinoma (BRCA) transcriptome (both RNA-Seq and miRNA-Seq) datasets. Using a different cluster size (or bin) and a different selection of miRNA and mRNA pairs for creating clusters generates a different cluster topology, and therefore also results in different numbers of common clusters between tumor and normal samples. We ran 1,000 different random selections of target pairs to generate different cluster topologies and combined all results to obtain 105,850 distinctive candidate clusters for prioritization.

Conclusions: We think our methodology for identifying cancer driver genes in personal genomes, in which clinicians seek to develop better treatment strategies, is valuable to the field. Our proposed method should be applicable across a range of diseases and cancers.

69

FUSIONDB: ASSESSING MICROBIAL DIVERSITY AND ENVIRONMENTAL PREFERENCES VIA FUNCTIONAL SIMILARITY

Chengsheng Zhu1, Yannick Mahlich1,2,3,4, Yana Bromberg1,4

1Department of Biochemistry and Microbiology, School of Environmental and Biological Sciences, Rutgers University, New Brunswick, NJ, USA; 2Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), TUM, Garching, Germany; 3Department of Informatics, Bioinformatics & Computational Biology - I12, TUM, Garching, Germany; 4Institute of Advanced Study (TUM-IAS), Garching, Germany

Yana Bromberg

Summary: Microbial functional diversification is driven by environmental factors. In some cases, microbes differ more across environments than across taxa. Here we introduce fusionDB, a novel database of microbial functional similarities, indexed by available environmental preferences. fusionDB entries represent nearly fourteen hundred taxonomically-distinct bacteria annotated with available metadata: habitat, temperature, and oxygen use. Each microbe is encoded as a set of functions represented by its proteome, and individual microbes are connected via common functions. Database searches produce easily visualizable XML-formatted network files of selected organisms, along with their shared functions. fusionDB thus provides a fast means of associating specific environmental factors with organism functions.

Availability: http://bromberglab.org/databases/fusiondb and as a sql-dump by request.

Contact: [email protected], [email protected]

70

THE GEORGE M. O’BRIEN KIDNEY TRANSLATIONAL CORE CENTER AT THE UNIVERSITY OF MICHIGAN

Frank C. Brosius1, Wenjun Ju1, Keith Bellovich2, Zeenat Bhat3, Crystal Gadegbeku4, Debbie Gipson1, Jennifer Hawkins1, Julia Herzog1, Susan Massengill5, Richard C. McEachin1, Subramaniam Pennathur1, Kalyani Perumal6, Roger Wiggins1, Matthias Kretzler1

1University of Michigan, 2Renaissance Renal Research Institute, 3Wayne State University, 4Temple University, 5Levine Children’s Hospital, 6University of Illinois at Chicago

Richard McEachin

Recent advances have allowed the development of molecular maps to define chronic kidney disease (CKD) in new, accurate and personalized ways. These developments make possible the prediction of outcomes and response to therapy and the identification of key molecular targets for treatment of CKD in individual patients. Identification of such targets entails close collaboration between teams of investigators to collect and annotate samples from well-characterized CKD subjects. In addition, technologies are needed that support information exchange, robust databanks, and data integration to define key pathways driving CKD pathogenesis. The O’Brien Kidney Translational Core Center at the University of Michigan provides such biobanking, databank structure and bioinformatic support to basic and clinical investigators to allow them to pursue critical precision medicine investigations of humans with CKD. The Clinical Phenotyping and Biobank Core has enrolled over 1200 patients with CKD from 5 sites and banked their samples and clinical information, providing a valuable resource for efficient discovery. Multiple specific research studies have now successfully utilized these resources. The Applied Systems Biology Core and its online analytical tool, Nephroseq, have assisted hundreds of investigators around the world in approaches to the analysis of large transcriptomic datasets and other systems-level, biological studies of patients with CKD. The Center’s Bioinformatics Core provides access to computational applications and skilled professional support in bioinformatics and biostatistics and will now be providing back-end maintenance of Nephroseq. The Administrative Core directs pilot and small grants, student training and discount programs with the goal of helping new and established researchers utilize systems biological and translational research tools. Together these cores provide comprehensive translational research support for novel research into classification and treatment of chronic kidney diseases. All interested academic investigators around the world are invited to make use of these services and to contact us for information and consultation.

71

MINING DIRECTIONAL DRUG INTERACTION EFFECTS ON MYOPATHY USING THE FAERS DATABASE

Danai Chasioti1, Xiaohui Yao1, Pengyue Zhang2, Xia Ning3, Lang Li2, Li Shen4

1IUPUI School of Informatics and Computing; 2Center for Computational Biology and Bioinformatics, Department of Medical and Molecular Genetics, Indiana University School of Medicine; 3IUPUI Department of Computer Science; 4Center for Neuroimaging, Department of Radiology and Imaging Sciences, Indiana University School of Medicine

Li Shen

Background: Mining high-order drug-drug interaction (DDI) induced adverse drug effects (ADEs) from electronic health record (EHR) databases is an emerging area, and very few studies have explored the relationships between DDIs. To bridge this gap, we study a novel pharmacovigilance problem for mining directional drug interaction effects on myopathy using the FDA Adverse Event Reporting System (FAERS) database.

Method: The analysis was performed on a case–control dataset extracted from the FAERS database. The dataset contains 1,763 drugs, and includes 136,860 myopathy events and 3,940,587 control events. Given two sets of drug combinations D1 and D2 (a superset of D1), we define the directional ADE effect from D1 to D2 as the altered ADE risk associated with the change from taking D1 to taking D2. The ADE risks were estimated using odds ratios (ORs). To address both computational and statistical challenges, this study was focused on computing ORs for frequent D2’s (i.e., the number of occurrences is at least a user-specified minimum support). The Apriori algorithm was employed to identify frequent D2’s.

Results: Using the minimum support of 1000, we identified 764 frequent drugs, 7036 frequent 2-drug combinations, and 4280 frequent 3-drug combinations. The top ten ADE ORs for single drugs range from 4.1 to 5.6, for two-drug combinations from 12.6 to 21.5, and for three-drug combinations from 14.8 to 19.5. The top ten directional ADE ORs between one drug and two drugs range from 13.5 to 28.2; those between one drug and three drugs range from 13.1 to 20.3; and those between two drugs and three drugs range from 11.3 to 34.4. Multiple promising directional ADE findings were identified. For example, the risk of myopathy is 28.2 times higher when adding Gadopentetate dimeglumine on top of Gadobenate dimeglumine. Both drugs are Gadolinium-based contrast agents (GBCAs) used in magnetic resonance imaging. GBCAs have been shown to be associated with Nephrogenic systemic fibrosis (NSF), which may present as progressive myopathy.

Conclusion: The directional drug interactions capture the ADE risks introduced by additional drugs taken on top of a set of baseline drugs, and provide novel and valuable pharmacovigilance knowledge with potential to impact clinical decision support. Mining frequent patterns using Apriori is a promising approach for effective discovery of high-order directional drug interaction effects.
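The directional odds ratio defined in the Method can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: each report is modelled as a set of drugs plus a case/control flag, and reports containing all of D2 are compared against reports that contain D1 but not all of D2. That choice of reference group, the function names, and the toy reports are all assumptions.

```python
def odds_ratio(case_exposed, case_unexposed, control_exposed, control_unexposed):
    """Odds ratio of a 2x2 table: (a * d) / (c * b)."""
    denominator = case_unexposed * control_exposed
    if denominator == 0:
        raise ValueError("odds ratio undefined: a reference cell is empty")
    return (case_exposed * control_unexposed) / denominator


def directional_odds_ratio(reports, d1, d2):
    """Odds ratio for moving from drug set d1 to its proper superset d2.

    reports: iterable of (set_of_drugs, is_case) pairs.
    Exposed   = reports that contain every drug in d2.
    Reference = reports that contain d1 but not all of d2 (an assumption).
    """
    d1, d2 = frozenset(d1), frozenset(d2)
    if not d1 < d2:
        raise ValueError("d2 must be a proper superset of d1")
    a = b = c = d = 0  # a/b: exposed cases/controls, c/d: reference cases/controls
    for drugs, is_case in reports:
        if d2 <= drugs:
            if is_case:
                a += 1
            else:
                b += 1
        elif d1 <= drugs:
            if is_case:
                c += 1
            else:
                d += 1
    return odds_ratio(a, c, b, d)


# Toy reports with hypothetical drug names "A", "B", "C".
reports = (
    [({"A", "B"}, True)] * 6
    + [({"A", "B"}, False)] * 2
    + [({"A"}, True)] * 3
    + [({"A"}, False)] * 9
    + [({"C"}, False)] * 5
)
directional_or = directional_odds_ratio(reports, {"A"}, {"A", "B"})
```

In a real analysis the candidate D2 sets would first be restricted to frequent combinations, which is the role the abstract assigns to the Apriori algorithm.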

72

DECIPHERING NEURONAL BROAD HISTONE H3K4ME3 DOMAINS ASSOCIATED WITH GENE-REGULATORY NETWORKS AND CONSERVED EPIGENOMIC LANDSCAPES IN THE HUMAN BRAIN

Aslihan Dincer1, Eric E. Schadt2, Bin Zhang2, Joel T. Dudley2, Davin Gavin3, Schahram Akbarian4

1Department of Neuroscience, Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York; 2Department of Genetics and Genomic Sciences, Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York; 3Department of Psychiatry, Jesse Brown Veterans Affairs Medical Center, Chicago; 4Department of Psychiatry, Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York

Aslihan Dincer

Only a few histone modifications have been mapped in human brain. Trimethylation of histone H3 at lysine 4 (H3K4me3) is a chromatin modification known to mark the transcription start sites (TSS) of active gene promoters. Regulators of the H3K4me3 mark are significantly associated with the genetic risk architecture of common neurodevelopmental disease, including schizophrenia and autism. Here, through integrative computational analysis of epigenomic and transcriptomic data based on next-generation sequencing, we investigated H3K4me3 landscapes of FACS-sorted neuronal and non-neuronal nuclei in human postmortem, non-human primate (chimpanzee and macaque) and mouse prefrontal cortex (PFC), and blood. We characterized the broad H3K4me3 histone domains from human PFC in the context of cell-type specific regulation, association with neuronal and non-neuronal gene expression and potential implications for normal and diseased development. We first addressed the occurrence and the biological significance of the broad H3K4me3 histone domains in three different cell types, including NeuN+ PFC neurons, NeuN- PFC cells, and nucleated blood cells, and then identified novel regulators of these three different cell types by focusing on the top 5% broadest H3K4me3 peaks (length in base pairs). In PFC neurons, the broadest peaks ranged in size from 3.9 to 12 kb, with extremely broad peaks (~10 kb or broader) related to synaptic function and GABAergic signaling (DLX1, ELFN1, GAD1, LINC00966). The broadest neuronal peaks showed distinct motif signatures, and were centrally positioned in prefrontal gene Bayesian regulatory networks. Approximately 120 of the broadest H3K4me3 peaks in human PFC neurons, including many genes related to glutamatergic and dopaminergic signaling, were fully conserved in chimpanzee, macaque and mouse cortical neurons. Exploration of the spread and breadth of lysine methylation markings in specific cell types could provide novel insights into epigenetic mechanisms of normal and diseased brain development, aging and evolution of neuronal genomes.

73

NORMALIZATION TECHNIQUES AND MACHINE LEARNING CLASSIFICATION FOR ASSIGNING MOLECULAR SUBSETS IN AUTOIMMUNE DISEASE AND CANCER

Jennifer M. Franks1,2, Guoshuai Cai1, Jaclyn N. Taroni3,4, Michael L. Whitfield1,2

1Department of Molecular and Systems Biology; 2Program in Quantitative Biomedical Sciences, Geisel School of Medicine at Dartmouth; 3Department of Systems Pharmacology and Translational Therapeutics; 4Institute for Translational Medicine and Therapeutics, University of Pennsylvania Perelman School of Medicine

Jennifer Franks

Systemic sclerosis (SSc) is a complex connective tissue disease involving skin and internal organ fibrosis, vascular damage, and immunologic abnormalities. To characterize disease heterogeneity and molecular pathogenesis, transcriptomics have elucidated common biological processes in subsets of SSc patients using intrinsic gene expression analyses. Four intrinsic subsets characterized by distinct molecular signatures have been validated by multiple independent cohorts. Technical biases inherent to different gene expression profiling platforms present a unique problem when analyzing data generated from multiple studies. While microarray and RNA-seq data have been shown to have a high correlation, differences in overall processing and quantification result in distinct data distributions. Here, we introduce an accurate and reproducible classifier for SSc molecular subtypes and have developed a method to normalize data when platform-specific artifacts arise. We used three independent, well-characterized and validated experimental microarray datasets (Hinchcliff et al., 2013; Milano et al., 2008; Pendergrass et al., 2012) to train a supervised classifier using three-fold cross-validation repeated ten times, performing at an average of >88% accuracy. Data from other platforms, including RNA-seq, are analyzed for platform-based bias using guided PCA analysis (Reese et al., 2013). We developed a method to eliminate platform bias by normalizing on a gene-by-gene basis using the microarray training data as the target distribution. We find that this method successfully removes platform-specific effects from the data. Following normalization, each sample is assigned to a molecular subset based on support vector machine (SVM) classification. Our preliminary analyses find that these methods work extremely well on a validation RNA-seq dataset in SSc (100% accuracy, n=12, Li et al., in preparation). We also applied our methods to breast cancer DNA microarray and RNA-seq data from The Cancer Genome Atlas (TCGA) (Cancer Genome Atlas, 2012) where five intrinsic gene expression subsets have been previously identified and described with PAM50 (Parker et al., 2009). Tumor and tumor-adjacent normal biopsies of breast cancer, for which intrinsic subtype information was available, were used to train and test an SVM and evaluate our normalization technique. We achieve 93% accuracy in assigning subtypes for normalized RNA-seq data using our classifier trained exclusively on microarray data. Until recently, clinical trials and diagnosing physicians have not considered molecular heterogeneity in the context of immunosuppressive therapy, which may explain improvement in select SSc patients (Martyanov & Whitfield, 2016). Advancing personalized medicine by using intrinsic molecular subsets may prove particularly beneficial to this field. With our newly developed techniques, we can successfully leverage information from validated expression data in new analyses despite different platforms used for gene expression profiling.
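One plausible reading of gene-by-gene normalization toward a target distribution is sketched below in Python. The abstract does not give the exact transformation, so this sketch assumes a simple per-gene location and scale adjustment: each gene in the new platform is standardized and then rescaled to the mean and standard deviation of the same gene in the microarray training data. The function names, gene label, and values are hypothetical.

```python
from statistics import mean, pstdev


def normalize_gene_to_target(values, target_values):
    """Rescale one gene's values to the target gene's mean and standard deviation."""
    src_mean, src_sd = mean(values), pstdev(values)
    tgt_mean, tgt_sd = mean(target_values), pstdev(target_values)
    if src_sd == 0:
        # A constant gene carries no spread to rescale; map it to the target mean.
        return [tgt_mean for _ in values]
    return [(v - src_mean) / src_sd * tgt_sd + tgt_mean for v in values]


def normalize_platform(new_data, target_data):
    """Apply the per-gene adjustment to every gene present in both datasets.

    Both arguments map a gene name to a list of expression values across samples.
    """
    return {
        gene: normalize_gene_to_target(values, target_data[gene])
        for gene, values in new_data.items()
        if gene in target_data
    }


# Hypothetical expression values for a single gene on two platforms.
rnaseq = {"GENE_A": [100.0, 200.0, 300.0]}
microarray = {"GENE_A": [1.0, 2.0, 3.0, 4.0, 5.0]}
normalized = normalize_platform(rnaseq, microarray)
```

After this adjustment each gene in the new data shares the training data's mean and spread, so a classifier trained only on microarray values sees inputs on a comparable scale.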

74

MULTI-OMICS DATA INTEGRATION TO STRATIFY POPULATION IN HEPATOCELLULAR CARCINOMA

Kumardeep Chaudhary, Olivier Poirion, Liangqun Lu, Lana Garmire

University of Hawaii Cancer Center, Honolulu

Lana Garmire

The high mortality rate of Hepatocellular Carcinoma (HCC) is in part due to the vast heterogeneity of the cancer. Identifying robust molecular subgroups of HCC helps to guide precise targeted therapeutics. This could be realized by integrating different layers of omics datasets from the same cohort. To achieve this, we present a deep learning (DL) based method to inspect the different subpopulations of patients within HCC from TCGA. We obtained the information of 360 HCC patients available in TCGA with 3 omics data types (RNA-seq, miRNA-seq and methylation). To identify the different subpopulations, our pipeline implements a DL-based autoencoder, identifies hidden layers linked to survival, and performs k-means clustering using these new features. To assign new samples to the identified subpopulations, a supervised classification procedure was conducted using Support Vector Machine (SVM). To assess the performance of the model, we used a 5-fold cross-validation scheme to estimate c-index and Brier scores. We also used a 60:40 ratio to split the data in 10 folds in order to assess the significance of the coxph regression in the test dataset. Finally, we inferred the cluster labels of two external cohorts based on the gene expression data. The autoencoder framework was used to combine the 3 omics as input features (~40,000) and to produce 100 transformed new features. Among these new features, we identified 36 features significantly linked with survival, which were further used to infer 2 optimal clusters of patients with significant survival differences. Using the cross-validation procedure, we obtained average c-index and Brier score values of 0.70 and 0.20 respectively, for the test sets. Also, the coxph regression shows significant survival estimation when using the test samples. Finally, our framework is validated on two external datasets: 221 HCC samples from a GEO study and 230 HCC samples from the LIRI-JP (RIKEN) cohort. Moreover, we showed that each of the individual omic feature sets can be used successfully to infer the 2 survival profiles. However, the combination of the 3 omics is more powerful. We also compared the DL methodology with new features produced by PCA instead. The two subpopulations differed significantly in clinical and molecular terms (survival, pathways, and driver mutation profiles). This is the first study to employ deep learning as a robust framework to identify non-linear combinations of multi-omics features linked to identification of subclasses of HCC patients. Using multi-omics datasets, our pipeline successfully combines these different features and identifies two HCC subpopulations exhibiting different survival profiles. We then used this model in combination with supervised machine-learning approaches to predict HCC subpopulation assignment for test and validation datasets.

75

TOWARDS STANDARDS-BASED CLINICAL DATA WEB APPLICATION LEVERAGING SHINY R AND HL7 FHIR

Na Hong, Naresh Prodduturi, Chen Wang, Guoqian Jiang

Department of Health Sciences Research, Mayo Clinic, Rochester, MN

Guoqian Jiang

Introduction: The Fast Healthcare Interoperability Resources (FHIR) is an emerging clinical data standard developed at HL7, which enables the representation and exchange of electronic health record (EHR) data in a standard structure. FHIR has strong executable ability based on the RESTful service architecture and multiple flexible data exchange formats. Shiny is a web application framework with a simplified web deployment mechanism that enables powerful R functions to support graphical and interactive analysis. Therefore, with the goal of building reusable and extensible clinical statistics and analysis applications, we aim to design, develop and evaluate a flexible framework using the HL7 FHIR standard and the R-powered web application framework Shiny.

Methods: We first established a local FHIR server to manage our clinical data. This part of the work is focused on the analysis and implementation of the FHIR data models (i.e., core resources), data exchange formats (e.g., XML and JSON) and invoking an open source HAPI FHIR API. Second, we designed two analysis workflows that are focused on patient-centered data analysis and cohort-based data analysis respectively. According to the workflow design, we developed an open application platform known as Shiny FHIR using the Shiny web framework and the established FHIR server.

Results: We built a local FHIR server using the HAPI DSTU2 API. In total, 140 patient records, 476 observation records, 496 condition records and 107 procedure records were populated into the FHIR server for testing. With the support of R packages, including ‘jsonlite’, ‘dygraph’ and ‘timeline’, our platform can be used for a variety of use cases of clinical data analysis, including patient blood pressure observation timeline analysis, patient cohort gender/age distribution statistics, etc. The results of the experiment show that the Shiny FHIR integration approach offers the feasibility of web-based interactive statistics analysis on standardized FHIR-based clinical data.

Discussions: The implementations of FHIR have already attracted a lot of interest from healthcare practitioners. Our Shiny FHIR implementation provides a useful framework that would be complementary to other FHIR-based applications (e.g., SMART on FHIR). Shiny FHIR is designed to visualize the FHIR-conformant data through capturing the user experiences and habits, and offers rapid support for clinical research while combining the limitless statistical power of R. However, there are several issues that need to be solved in the future, such as the support of the FHIR extensions and custom models and the system performance enhancement. In this study, we described our efforts in building a standardized clinical statistics and analysis application leveraging Shiny. We consider that the designed workflows can be applied to other EHR data that follow the FHIR standard, and other publicly available FHIR servers can be used to validate the utility of our framework.
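The blood pressure timeline use case can be illustrated with a small sketch of how FHIR JSON might be reduced to rows for plotting. The platform itself is written in R; Python is used here only for illustration. The sketch assumes blood pressure Observations that carry systolic and diastolic readings as components coded with LOINC 8480-6 and 8462-4, and it parses an in-memory Bundle rather than querying a server. The helper names and the toy Bundle are hypothetical.

```python
import json

SYSTOLIC, DIASTOLIC = "8480-6", "8462-4"  # LOINC codes for the two readings


def blood_pressure_timeline(bundle_json):
    """Return (date, systolic, diastolic) rows from a FHIR Bundle, sorted by date."""
    bundle = json.loads(bundle_json)
    rows = []
    for entry in bundle.get("entry", []):
        resource = entry.get("resource", {})
        if resource.get("resourceType") != "Observation":
            continue  # skip Patient, Condition, Procedure, ...
        readings = {}
        for component in resource.get("component", []):
            value = component.get("valueQuantity", {}).get("value")
            for coding in component.get("code", {}).get("coding", []):
                if coding.get("code") in (SYSTOLIC, DIASTOLIC):
                    readings[coding["code"]] = value
        if SYSTOLIC in readings and DIASTOLIC in readings:
            rows.append((resource.get("effectiveDateTime"),
                         readings[SYSTOLIC], readings[DIASTOLIC]))
    return sorted(rows, key=lambda row: row[0] or "")


def _bp_observation(date, systolic, diastolic):
    """Build a toy blood pressure Observation (hypothetical test data)."""
    def component(code, value):
        return {"code": {"coding": [{"system": "http://loinc.org", "code": code}]},
                "valueQuantity": {"value": value, "unit": "mmHg"}}
    return {"resourceType": "Observation", "effectiveDateTime": date,
            "component": [component(SYSTOLIC, systolic),
                          component(DIASTOLIC, diastolic)]}


bundle_json = json.dumps({
    "resourceType": "Bundle",
    "entry": [
        {"resource": _bp_observation("2016-03-10", 135, 88)},
        {"resource": {"resourceType": "Patient", "id": "example"}},
        {"resource": _bp_observation("2016-01-05", 120, 80)},
    ],
})
timeline = blood_pressure_timeline(bundle_json)
```

The sorted rows are the kind of input a time-series plotting widget needs. Against a live server, the Bundle would come from a RESTful search on the Observation resource instead of a local string.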

76

A DATA LAKE PLATFORM OF CONTEXTUAL BIOLOGICAL INFORMATION FOR AGILE TRANSLATIONAL RESEARCH

Austin Huang1, Dmitri Bichko1, Mathieu Boespflug2, Edsko de Vries3, Facundo Dominguez2, Daniel Ziemek1

1Pfizer, 2Tweag I/O, 3Well-Typed

Austin Huang

Researchers need to aggregate contextual biological information in order to interpret experimental and clinical study results. These needs vary greatly depending on the scientific question. Creating large-scale, structured data repositories requires substantial investment that is not amenable to the rapidly-evolving needs of translational research. On the other hand, while performing data analyses using ad hoc collections of local data files (Excel sheets, CSV tables, etc.) allows rapid and flexible execution, it also creates technical debt. In the long term, these workflows result in missed opportunities to accumulate institutional knowledge and are associated with poor reproducibility. We have implemented a data platform that can achieve the benefits of a more principled handling of data persistence with minimal analyst overhead. This is achieved by automating schema inference, metadata curation, versioning, and RESTful service production through a simple, Git-like ingestion tool. Data scientists can retrieve data via familiar client language APIs such as dplyr in R. The platform is built on open source database (Postgres, with an architecture that allows alternative backends) and functional programming (Haskell, PostgREST) technologies. Our objective is to accelerate data sharing/discoverability on analyst teams and drastically reduce the effort of persisting data in a systematic mechanism. We therefore provide a technology foundation for rapid data service production and improving reproducibility and reusability in data analyses.

77

GENOME READ IN-MEMORY (GRIM) FILTER: FAST LOCATION FILTERING IN DNA READ MAPPING USING EMERGING MEMORY TECHNOLOGIES

Jeremie Kim1, Damla Senol1, Hongyi Xin2, Donghyuk Lee1,3, Mohammed Alser4, Hasan Hassan5, Oguz Ergin5, Can Alkan4, Onur Mutlu1,6

1Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA; 2Department of Computer Science, Carnegie Mellon University, Pittsburgh, PA; 3NVIDIA Research, Austin, TX; 4Department of Computer Engineering, Bilkent University, Ankara, Turkey; 5Department of Computer Engineering, TOBB University of Economics and Technology, Söğütözü, Ankara, Turkey; 6Department of Computer Science, Systems Group, ETH Zürich, Switzerland

Jeremie Kim

High-throughput sequencing (HTS) technology has resulted in a massive influx of available genetic data. Using HTS technology, genomes are sequenced relatively quickly, producing many short DNA sequences (reads); analyzing the donor’s genome from these reads takes multiple days even with state-of-the-art methods. The first step of genome analysis, read mapping, determines origins for billions of reads within a reference genome to identify the donor’s genomic variants. Hash-table based read mappers are a common type of comprehensive read mapper. They operate by fetching potential mapping locations of a read in the reference genome from a pre-generated hash table. These locations are then verified by local alignment, a computationally expensive dynamic programming algorithm that determines similarity between the read and the potential mapping segment of the reference genome. Alignment has traditionally been the computational bottleneck of read mapping, but recently many works have proposed a new step called Location-Filtering in order to alleviate this bottleneck.

Location-Filtering is a critical step where many incorrect potential locations from the hash-table are discarded before local alignment verifies such locations. FastHASH, SHD, and GateKeeper propose variations of Location-Filtering that discard only incorrect locations to reduce end-to-end runtime of hash-table based read mapping. Location-Filtering is now the computational bottleneck of read mapping.

Our goal is to create an efficient Location-Filter that quickly discards as many false negative locations as possible before alignment, while retaining a zero false positive rate. Efficiently filtering incorrect mappings before alignment significantly improves throughput and latency of hash-table based read mapping. We propose a novel filtering algorithm that quickly eliminates from consideration reference genome segments where alignment would yield no matches. Our algorithm’s novelty mainly stems from its design to exploit 3D-stacked memory systems. 3D-stacked memory is an emerging technology that tightly integrates computation and high-capacity memory in a single die stack, thereby enabling concurrent processing of large data chunks at low latency and high bandwidth. The key ideas of our design consist of 1) a new representation of coarse-grained reference genome segments such that the genome can be operated on in parallel using bitwise operations and 2) exploiting the parallel computation capability of 3D-stacked memory to run massively-parallel in-memory operations on the new genome representation. We call our resulting filter the GRIM-Filter.

This work shows how GRIM-Filter can be used with any hash-table based read mapping algorithm and how it effectively exploits processing-in-memory capabilities of 3D-stacked memory. We show that when running with 5% error tolerance, GRIM-Filter reduces false positive locations by 5.59x-6.41x and provides a 1.81x-3.65x end-to-end speedup over the state-of-the-art read mapper mrFAST with FastHASH.
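The idea of representing coarse reference segments so that they can be tested with bitwise operations can be sketched in software. The Python sketch below is an illustration under stated assumptions, not the GRIM-Filter design itself: it stores one presence bit per possible k-mer for each reference bin, counts how many of a read's k-mers are present, and keeps the bin only if the count reaches a threshold derived from the q-gram lemma (each error removes at most k k-mers). The threshold rule, the value of k, the function names, and the sequences are all assumptions, and the candidate region is assumed to lie entirely inside the bin.

```python
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def kmer_index(kmer):
    """Encode a k-mer as an integer using two bits per base."""
    index = 0
    for base in kmer:
        index = index * 4 + BASE_CODE[base]
    return index


def kmers(sequence, k):
    """Yield every overlapping k-mer of the sequence."""
    return (sequence[i:i + k] for i in range(len(sequence) - k + 1))


def build_bin_bitvector(segment, k):
    """Return an integer whose bit i is set if k-mer i occurs in the segment."""
    bits = 0
    for kmer in kmers(segment, k):
        bits |= 1 << kmer_index(kmer)
    return bits


def passes_filter(read, bin_bits, k, max_errors):
    """Keep the bin if enough of the read's k-mers are present in it."""
    present = sum((bin_bits >> kmer_index(kmer)) & 1 for kmer in kmers(read, k))
    threshold = (len(read) - k + 1) - k * max_errors
    return present >= threshold


K = 3
matching_bin = build_bin_bitvector("ACGTACGTTGCAAGCT", K)
unrelated_bin = build_bin_bitvector("TTTTTTTTTTTTTTTT", K)

exact_read = "ACGTTGCA"      # occurs verbatim in the first bin
one_error_read = "ACGTAGCA"  # same read with one substituted base
```

Because each bin's bitvector is independent of the others, the presence checks for many bins can run concurrently, which is the property the abstract exploits with in-memory computation.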

78

BCL-2 FAMILY MEMBERS AS REGULATORS OF RESPONSIVENESS TO BORTEZOMIB IN A MULTIPLE MYELOMA MODEL

Melissa E. Ko1,2, Charis Teh3,4, Christopher S. Playter5, Eli R. Zunder6, Daniel H. Gray4,7, Wendy J. Fantl8, Sylvia K. Plevritis9, Garry P. Nolan2

1Cancer Biology Program, Stanford School of Medicine, Stanford, CA; 2Baxter Laboratory for Stem Cell Biology, Stanford School of Medicine, Stanford, CA; 3Molecular Genetics of Cancer Division, Immunology Division, The Walter and Eliza Hall Institute, Parkville, VIC, Australia; 4Department of Medical Biology, The University of Melbourne, Parkville, VIC, Australia; 5Department of Biological Sciences, Purdue University, Lafayette, IN; 6Department of Biomedical Engineering, University of Virginia, Charlottesville, VA; 7The Walter and Eliza Hall Institute, Parkville, VIC, Australia; 8Department of Obstetrics and Gynecology, Stanford School of Medicine, Stanford, CA; 9Department of Radiology, Stanford School of Medicine, Stanford, CA

Melissa Ko

Survival rates for B-cell malignancies have steadily improved over the last five decades, reaching levels of over 50% as a result of therapeutic agents such as dexamethasone, bortezomib, and lenalidomide. However, despite their success in producing clinical responses, the cellular mechanisms by which these agents kill tumor cells are poorly understood. We hypothesized that the Bcl-2 family of proteins, which are known to control initiation of apoptosis and are frequently dysregulated in cancerous B cells such as multiple myeloma, can influence responsiveness to these therapeutic agents. Thus, with a focus on multiple myeloma, we aimed to comprehensively profile individual cells for their expression levels of Bcl-2 family members simultaneously with activated intracellular signaling proteins upon exposure of cells to drugs used to treat B-cell malignancies. We applied single-cell mass cytometry to investigate the interplay of pro-survival and pro-apoptotic Bcl-2 family members in MM1S B lymphoblastic cells exposed to different drugs. This dataset was analyzed with FLOW-MAP, a computational tool developed in the Nolan Lab that organizes high-dimensional single-cell data into an interpretable 2D graph structure. FLOW-MAP enabled the apoptotic progression of individual cells to be visualized and showed changes in expression levels of Bcl-2 family members and signaling factors across cells with different drug sensitivities. Our extensive study revealed heterogeneous responses of cell subsets to therapeutic agents used to treat multiple myeloma patients. For example, our results showed that bortezomib, a proteasome inhibitor approved for treatment of multiple myeloma, potently induces apoptosis within 24 hours to a greater extent compared to other treatments. Induction of apoptosis in single cells treated with bortezomib coincided with a selective reduction of a subset of pro-survival Bcl-2 members. Furthermore, our analysis suggests that a metric that reflects the balance of pro-survival and pro-apoptotic Bcl-2 proteins may best separate and predict cells with differential sensitivity to bortezomib. This paradigm is supported by statistical modeling wherein we developed a classifier of bortezomib-resistant vs. sensitive cells using Bcl-2 family information or a single Bcl-2 score with significant accuracy. Our study provides a general framework for understanding differential sensitivity of tumor populations to anti-cancer drugs. Our results are likely to identify previously unknown death-inducing mechanisms as well as pinpoint potential synergies between standard-of-care therapies and newly developed therapies, such as Bcl-2 family inhibitors.

79

BIOMEDICAL TEXT-MINING APPLICATIONS FOR THE SYSTEM DEEPDIVE

Emily K. Mallory, Chris Re, Russ B. Altman

Stanford University

Emily Mallory

A complete repository of biomedical relationships is key for understanding the processes underlying both human disease and drug response. After decades of experimental research, the majority of known biomedical relationships exist solely in textual form in the literature and are thus computationally inaccessible. While curated databases have experts manually annotate relevant relationships or interactions from text, these databases struggle to keep up with the rapid growth of the biomedical literature. To address the need for biomedical relationship extraction, there have been numerous biological entity and relationship extraction challenges; however, extraction systems in the biomedical space tend to be task specific and do not provide a general framework for quickly developing future systems to address new extraction tasks. In this work, we developed multiple entity and relationship applications (called “extractors”) for the system DeepDive to extract biomedical relationships from full text articles. DeepDive is a trained system for extracting information from a variety of sources, including text. Application developers create features and training examples, and DeepDive assigns a probability that a given entity or relationship is correct or true in the original sentence. We developed entity extractors for genes, drugs, and diseases; and relationship extractors for gene-gene, gene-disease, and gene-drug relationships. We evaluated the gene-gene work previously with a corpus of articles from three PLOS journals, and we are currently evaluating the other two relationship extractors on a corpus from PubMed Central. The precision of our entity extractors ranged from 80 to 90%. For the task of extracting gene-gene relationships, our system achieved 76% precision and 49% recall in extracting direct and indirect interactions previously curated by the Database of Interacting Proteins (DIP). For randomly curated extractions, the system achieved between 62% and 83% precision based on direct or indirect interactions, as well as sentence-level and document-level precision. Our current gene-disease and gene-drug extractors achieved over 70% precision on a random subset of documents from over 340,000 full text articles in the PubMed Central Open Access Subset. We are currently tuning these extractors to increase performance. This work will enable not only full text literature extraction for biomedical relationships, but also computational methods development based on these relationships.
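The precision and recall figures quoted above can be made concrete with a short sketch. The function names and the raw counts below are hypothetical and chosen only to illustrate the arithmetic; the abstract does not report an F1 score, so the combined value computed here (roughly 0.60 for 76% precision and 49% recall) is a derived illustration rather than a result from the study.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Return (precision, recall) from raw extraction counts."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return precision, recall


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, not taken from the study: 38 correct extractions,
# 12 spurious extractions, 40 curated relationships that were missed.
p, r = precision_recall(true_pos=38, false_pos=12, false_neg=40)

# The gene-gene figures quoted in the abstract: 76% precision, 49% recall.
reported_f1 = f1_score(0.76, 0.49)
```

Precision answers "how many extracted relationships are right", recall answers "how many curated relationships were found"; a single harmonic-mean summary makes it easier to compare extractors that trade one for the other.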

80

PROFILING ADAPTIVE IMMUNE REPERTOIRES ACROSS MULTIPLE HUMAN TISSUES BY RNA SEQUENCING

Serghei Mangul1, Igor Mandric2, Harry Taegyun Yang1, Dennis Montoya1, Nicolas Strauli3, Jeremy Rotman1, Benjamin Statz1, Will Van Der Wey1, Alex Zelikovsky2, Roberto Spreafico1, Maura Rossetti1, Sagiv Shifman1, Mark Ansel3, Noah Zaitlen3, Eleazar Eskin1

1University of California Los Angeles, 2Georgia State University, 3University of California San Francisco

Serghei Mangul

Assay-based approaches provide a detailed view of the adaptive immune system by profiling T- and B-cell receptors. However, these methods come at a high cost and lack the scale of regular RNA sequencing (RNA-seq). We developed ImReP, a novel computational method that utilizes RNA-seq data to profile the adaptive immune repertoire. ImReP is able to quantify individual immune responses from RNA-seq data based on a recombination landscape of genes encoding B- and T-cell receptors. We applied ImReP to 8,555 samples from 544 individuals and 53 diverse human tissues, and reconstructed the complementarity determining region 3 (CDR3), which is the most variable part of the antigen-binding site. We assembled 3.8 million distinct CDR3 sequences. Analyzing this dataset, we identified the normal, healthy, adaptive immune profile for different tissues. We describe the variation in immune profiles, and the distribution of clonal lineages across individuals and tissues. Based on the immune profiles generated by ImReP, we were able to identify inflammation and various diseases, as confirmed from the histological images. The atlas of T and B cell repertoires, freely available at https://sergheimangul.wordpress.com/atlas-of-t-and-b-cell-repertoires/, is the largest resource in terms of the number of CDR3 sequences and tissue types involved. We anticipate this resource to enhance future studies in areas such as immunology and advance development of therapies for human diseases. ImReP is freely available at https://sergheimangul.wordpress.com/imrep/.

81

THE CMH VARIANT WAREHOUSE - A CATALOG OF GENETIC VARIATION IN PATIENTS OF A CHILDREN'S HOSPITAL

Neil Miller1, Greyson Twist1, Byunggil Yoo1, Andrea Gaedigk2

1Center for Pediatric Genomic Medicine, Children's Mercy, Kansas City; 2Division of Clinical Pharmacology & Therapeutic Innovation, Children's Mercy, Kansas City, School of Medicine, University of Missouri-Kansas City

Neil Miller

Advances in high-throughput DNA sequencing have enabled the comprehensive identification of individual genetic variation on an unprecedented scale, powering the diagnosis of disease and personalized treatment. As the ability to detect genetic variation has grown, clinicians and researchers struggle to interpret the functional significance of the millions of variants found in each individual genome. The Variant Warehouse at the Center for Pediatric Genomic Medicine at Children’s Mercy, Kansas City, is a resource containing a record of over 160 million genomic variants detected in more than 5000 patients sequenced by the Center since 2011. Each variant has been characterized by the CPGM’s Rapid Understanding of Nucleotide Effect Software (RUNES) pipeline, which records database cross references, predicted functional consequences and a variant classification score (1-5) based on preliminary guidelines from the American College of Medical Genetics and Genomics (ACMG). Additionally, a local allele frequency is calculated for each variant every 6 hours, enabling clinicians and researchers to rapidly identify rare variants. Despite extensive cross-referencing with databases such as dbSNP, ClinVar, ExAC and COSMIC, the CMH variant warehouse contains a significant number of novel variants not present in external databases. 59% of the total variants in the warehouse are novel with a local allele frequency of less than 0.25%. Of these, 1% are category 1-3 variants expected to have some functional impact. We have observed 82,578 variants among a panel of 58 pharmacogenes (including CPIC genes), of which 59% are novel and 2% are category 1-3 variants. The amount of novelty observed in this patient population suggests that efforts to comprehensively catalog human variation remain a work in progress and that the analysis of variant data will require some level of interpretation of novel variants for the foreseeable future. These observations are increasingly relevant in pharmacogenomics applications where drug compatibility is determined through association to known haplotypes; in this context, the presence of novel and rare variants must be anticipated and accounted for in automated haplotype determination. The CMH variant warehouse is publicly available at http://warehouse.cmh.edu. Tools to search and view variants by gene, category and allele frequency are provided as well as bulk downloads of data. Programmatic access to data is provided through implementations of the Global Alliance for Genomics and Health variant annotation API.
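The local allele frequency mentioned above can be illustrated with a minimal sketch. It is not the warehouse's implementation: the data layout, function names, and variant identifiers are hypothetical, and it assumes diploid genotypes and that a variant missing from a patient's record means that patient carries no alternate alleles there.

```python
def local_allele_frequencies(alt_counts_by_patient, ploidy=2):
    """Map each variant to its alternate-allele frequency in the cohort.

    alt_counts_by_patient: {patient_id: {variant_id: alt allele count}}.
    Variants absent from a patient's dict are treated as 0 alternate alleles
    (an assumption made for this sketch).
    """
    n_patients = len(alt_counts_by_patient)
    if n_patients == 0:
        return {}
    totals = {}
    for counts in alt_counts_by_patient.values():
        for variant, alt_count in counts.items():
            totals[variant] = totals.get(variant, 0) + alt_count
    denominator = ploidy * n_patients
    return {variant: total / denominator for variant, total in totals.items()}


def rare_variants(frequencies, max_frequency=0.0025):
    """Return variants whose local frequency is below the cutoff (0.25% by default)."""
    return sorted(v for v, f in frequencies.items() if f < max_frequency)


# Hypothetical four-patient cohort, for illustration only.
cohort = {
    "P1": {"chr1:100:A>G": 1, "chr2:200:C>T": 1},
    "P2": {"chr1:100:A>G": 2},
    "P3": {},
    "P4": {},
}
freqs = local_allele_frequencies(cohort)
```

With only four patients the smallest non-zero frequency is 1/8, so the 0.25% cutoff from the abstract only becomes meaningful at cohort sizes in the thousands, which is one reason a periodically recomputed local frequency is useful as the cohort grows.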

82

MUTPRED2 AND ITS APPLICATION TO THE INFERENCE OF MOLECULAR SIGNATURES OF DISEASE

Vikas Pejaver1, Lilia M. Iakoucheva2, Sean D. Mooney3, Predrag Radivojac1

1Department of Computer Science and Informatics, School of Informatics and Computing, Indiana University Bloomington; 2Department of Psychiatry, University of California San Diego; 3Department of Biomedical Informatics and Medical Education, University of Washington Seattle

Predrag Radivojac

Over the past decade, several methods have been developed for the computational prioritization of missense mutations. However, the identification of the effects of such mutations on protein structure and function still remains a major challenge. Previously, we developed MutPred, a random forest-based model for the classification of pathogenic missense variants and the automated inference of molecular mechanisms of disease. Here, we build on our previous work and present MutPred2 as an improved approach for these tasks. For pathogenicity prediction, MutPred2 particularly benefits from a larger and heterogeneous training set, the inclusion of new features, the encoding of local sequence context and the use of a neural network ensemble. Through cross-validation experiments and a test on an independent data set, we show that MutPred2 outperforms MutPred and other state-of-the-art methods. In particular, we observe that MutPred2 predicts fewer pathogenic mutations than PolyPhen-2, when applied to homozygous mutations from healthy individuals. Additionally, MutPred2 has over 50 built-in structural and functional property predictors, which greatly increase the number of possible downstream consequences that can be associated with a given amino acid substitution. We introduce a novel ranking approach that utilizes a positive-unlabeled learning framework to derive posterior probabilities for the disruption of these properties and, thus, infer the most likely molecular mechanism of pathogenicity. We then demonstrate the utility of MutPred2 in two situations. First, we identify prominent structural and functional signatures in a data set of mostly Mendelian diseases (from MutPred2’s training set) and recapitulate known associations between these diseases and ordered and structured regions of proteins. We also make novel predictions about the role of allosteric residues in such diseases. Second, we apply MutPred2 to a data set of de novo mutations from patients diagnosed with neuropsychiatric disorders, along with healthy siblings as controls. On this data set, MutPred2 pathogenicity scores alone are sufficient to distinguish between neuropsychiatric cases and controls, without any additional gene-based or variant-based filtering. We also observe that disruptions in protein-protein interactions (PPIs), phosphorylation and acetylation are frequent mechanisms, suggesting that neuropsychiatric disorders are largely characterized by a breakdown in molecular signaling. Finally, we identify candidate mutations predicted to disrupt PPIs and validate them experimentally.

83

HIV-TRACE: MONITORING THE HIV EPIDEMIC IN NEAR REAL TIME USING LARGE NATIONAL AND GLOBAL SCALE MOLECULAR EPIDEMIOLOGY

Sergei Pond1, Steven Weaver1, Joel Wertheim2, Andrew J. Leigh Brown3

1Temple University, 2University of California San Diego, 3University of Edinburgh

Sergei Pond

Many pathogens, including HIV, propagate along sexual and social contact networks. It is now clear that HIV transmission networks belong to the scale free family and the spread of infections in scale free networks is critically enhanced by highly connected individuals or “hubs”. The structure of the transmission network has major implications for interrupting an epidemic. Since pathogen transmission networks are not observed directly, they are inferred and characterized based on indirect measurements, and methods to do this properly remain an open research challenge. Because of their rapid and host-specific evolution and chronic disease states, HIV sequence isolates are essentially unique to each infected person. This sequence uniqueness can be used to confirm or reject the hypothesis that two individuals are “linked” by a recent transmission or belong to the same transmission cluster. There are ~1,000,000 HIV sequences isolated from different individuals over the last 4 decades. National and international surveillance and drug resistance programs are generating high resolution sequencing data on hundreds of thousands of isolates annually. We developed HIV Transmission Cluster Engine (HIV-TRACE) in order to make the process of cluster (and network) inference automated, fast, convenient, and more robust. It is an efficient open-source application designed to scale well and enable near real-time inference and analysis of large networks: it can process 100,000 sequences in ~15-30 minutes on a 64 core backend system. HIV-TRACE (hiv-trace.org) is an open-source web application built on robust and popular modern libraries. User interaction and result visualization are done entirely in the browser; processing is done asynchronously on a server backend. Components and versions of HIV-TRACE are used by the CDC (VARS, HICSB), Canadian public health officials, NYC Department of Public Health, San Diego primary infection cohort, and the UK Drug Resistance Database. We illustrate the utility of HIV-TRACE on four real-world examples of essential questions in public health and epidemiology of HIV-1: 1) Are there rapidly growing transmission clusters, and what is driving their growth? 2) How does HIV spread at different geographic scales, and among different risk groups? 3) How can treatment and intervention be deployed in optimal ways to reduce incidence and prevalence? 4) Can vaccine and prevention efficacy be measured more accurately using network-level information?
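The idea of linking individuals whose viral sequences are nearly identical, then reading clusters off the resulting network, can be sketched as follows. This is a toy illustration and not HIV-TRACE's algorithm: the abstract does not state the distance model or threshold, so the sketch assumes pre-aligned sequences of equal length, uses a plain mismatch fraction as a stand-in distance, and takes the threshold as a caller-supplied parameter. All names and sequences are hypothetical.

```python
from itertools import combinations


def mismatch_fraction(seq_a, seq_b):
    """Fraction of aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(seq_a, seq_b)) / len(seq_a)


def transmission_clusters(sequences, threshold):
    """Group sequence names into connected components of the 'linked' graph.

    Two names are linked when their mismatch fraction is <= threshold.
    Components are found with a small union-find structure.
    """
    parent = {name: name for name in sequences}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    for a, b in combinations(sequences, 2):
        if mismatch_fraction(sequences[a], sequences[b]) <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in sequences:
        clusters.setdefault(find(name), []).append(name)
    # Largest clusters first, names sorted within each cluster.
    return sorted((sorted(c) for c in clusters.values()),
                  key=lambda c: (-len(c), c))


# Hypothetical 10-base "isolates", for illustration only.
isolates = {
    "p1": "ACGTACGTAC",
    "p2": "ACGTACGTAA",
    "p3": "ACGTACGTTA",
    "p4": "TTTTTTTTTT",
}
clusters = transmission_clusters(isolates, threshold=0.1)
```

Note that p1 and p3 differ at two positions and are not directly linked, yet they land in the same cluster through p2. The all-pairs loop is quadratic in the number of sequences, which is why a production system at the scale described above needs a far more efficient implementation than this sketch.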

84

THE EXTREME MEMORY® CHALLENGE: A SEARCH FOR THE HERITABLE FOUNDATIONS OF EXCEPTIONAL MEMORY

Mary A. Pyc, Emily Giron, Philip Cheung, Douglas Fenger, J. Steven de Belle, Tim Tully

Dart NeuroScience

Douglas Fenger

We are interested in discovering new candidate targets for drug therapies to enhance cognitive vitality in humans throughout life, and to remediate memory deficits associated with brain injury and brain-related diseases such as Alzheimer’s and Parkinson’s. To achieve our goal we need a comprehensive and objective understanding of the human genome contribution to variation in memory performance in healthy individuals. We are implementing a Genome-Wide Association Study (GWAS) to identify genetic loci varying among individuals who possess exceptional and normal memory abilities. These genes and those in associated networks will inform drug discovery and development. Our first step is to identify exceptional members of the population. Thus, we have created an online memory test, the Extreme Memory Challenge (XMC, accessible at http://www.extremememorychallenge.com), to conveniently screen an unlimited number of subjects and find individuals with exceptional memory consolidation abilities. Identified subjects (1) are validated by a battery of secondary memory tasks, and (2) provide saliva samples from which we can isolate DNA for GWAS. Ten pilot experiments were conducted to parameterize the XMC screen. Participants learned face-name pairs for a delayed recall test. After initial study, each name was presented and participants were asked to select the correct face among four (distracters were other faces paired with different names). One day later participants completed a final test trial. We are primarily interested in forgetting across sessions, as this provides an estimate of consolidation across a 24-hour time interval. Pilot studies indicated the optimal protocol should include 30 face-name pairs, presented at a 4-second rate. To date, 17,849 participants from 176 nations have been screened in the XMC. Of these, 11,311 have completed both sessions. Individuals in our sample are most frequently Caucasian (55%), post-secondary school-educated (63%), most alert in the morning by self-report (51%), and right-handed (89.5%). The average age was 34, and the gender distribution was split evenly. The forgetting rate (decrease in performance from day 1 to day 2) was 10%. We have identified 49 individuals with perfect performance on day 2 of the test and 24 with exceptional consolidation abilities (defined as 3 SDs from the mean). We have begun the genomics phase of the study with 33 individuals who have completed additional behavioral testing.
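The forgetting rate and the "3 SDs from the mean" criterion described above can be sketched in a few lines. The abstract does not give the exact scoring formula or the direction of the cutoff, so this sketch makes explicit assumptions: forgetting is the drop in the fraction of the 30 face-name pairs answered correctly between day 1 and day 2, and "exceptional" means forgetting at least three sample standard deviations below the cohort mean. The function names and participant data are hypothetical.

```python
from statistics import mean, stdev

N_PAIRS = 30  # face-name pairs per session, as stated in the abstract


def forgetting_rate(day1_correct, day2_correct, n_pairs=N_PAIRS):
    """Drop in the fraction of correct answers from day 1 to day 2."""
    return (day1_correct - day2_correct) / n_pairs


def exceptional_consolidators(scores, n_sd=3.0):
    """Return participants whose forgetting is >= n_sd SDs below the mean.

    scores: {participant_id: (day1_correct, day2_correct)}.
    The cutoff direction (less forgetting is better) is an assumption.
    """
    rates = {pid: forgetting_rate(d1, d2) for pid, (d1, d2) in scores.items()}
    mu = mean(rates.values())
    sd = stdev(rates.values())
    cutoff = mu - n_sd * sd
    return sorted(pid for pid, rate in rates.items() if rate <= cutoff)


# Hypothetical cohort: 30 participants forget 3 of 30 pairs (10%),
# and one participant scores better on day 2 than on day 1.
cohort = {f"typical_{i}": (27, 24) for i in range(30)}
cohort["unusual"] = (24, 27)
flagged = exceptional_consolidators(cohort)
```

A three-SD cutoff can only flag anyone when the cohort is reasonably large, because a single extreme participant also inflates the standard deviation; screening many thousands of participants, as described above, makes such a strict threshold workable.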

85

RESCUE THE MISSING VARIANTS - LESSONS LEARNED FROM LARGE SEQUENCING PROJECTS

Yingxue Ren1, Joseph S. Reddy1, Vivekananda Sarangi2, Jason P. Sinnwell2, Steve G. Younkin3, Nilüfer Ertekin-Taner3, Owen A. Ross3, Rosa Rademakers3, Shannon K. McDonnell2, Joanna M. Biernacka2, Yan W. Asmann1

1Department of Health Sciences Research, Mayo Clinic, Jacksonville, FL; 2Department of Health Sciences Research, Mayo Clinic, Rochester, MN; 3Department of Neuroscience, Mayo Clinic, Jacksonville, FL

Yingxue Ren

Identifying novel disease variants through next generation sequencing (NGS) has been a fruitful practice in medical research in recent years, leading to the discoveries of new disease mechanisms as well as therapeutic strategies. The GATK best practices have since been established to provide general recommendations on core processing steps required to go from raw reads to final variant call sets. However, with the sample size drastically increasing in today’s sequencing experiments, many default variant calling strategies and the choice of tools call for a closer examination. Our study utilized the whole exome sequencing data provided by the Alzheimer's Disease Sequencing Project (ADSP) to test different variant calling strategies and tools involved in the variant discovery workflow in the context of sample sizes. We first investigated the impact of using different sequence aligners on variant call sets while keeping the default GATK settings of the variant calling and QC steps identical. We selected 1952 samples to align by both BWA and NovoAlign, and compared the variant call sets in 50, 100, 200, 500, 1000 and 1952 samples. We discovered that the percentage of variants unique to an aligner increased dramatically with increasing sample sizes. At a sample size of 1952, the unique variants generated by BWA and NovoAlign account for more than 20% of total called variants. These unique variants have good variant quality metrics: ~80% have a Genotype Quality (GQ) score of 60 or above, and their distribution of B allele concentration (BAC) centers around 0.5 and 1, consistent with what is expected of diploid genomes. What’s more, over 96% of the unique variants have a population B allele frequency (BAF) of less than 0.01, indicating that these variants are rare in the population. All these metrics suggest that these unique variants are important to include in downstream variant analysis. In addition to the aligner comparison, we also evaluated single-sample variant calling versus the default strategy of single-sample variant calling followed by joint multi-sample genotyping in 50, 100, 500, 2000, and 5000 samples. Our data showed that, with increasing sample sizes, the single-sample calling strategy added an increasing percentage of unique variants. At a sample size of 5000, single-sample calling added 58,884 variants, accounting for 5.55% of total variants called by both strategies. 7331 of these unique variants passed Variant Quality Score Recalibration (VQSR) and have a GQ of 60 or above in at least 5 samples. Our study identified a large number of good-quality variants from the ADSP exome sequencing project that were missed by using one aligner or using the multi-sample genotyping strategy alone. Our findings revealed the relationships between bioinformatics pipelines and biomedical research results, and suggested that alternative variant calling strategies may be beneficial for optimal variant discovery in the face of today’s large sequencing scale.
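The aligner comparison described above comes down to set operations on two call sets. The following is a minimal sketch under stated assumptions: each variant is represented as a hypothetical (chromosome, position, ref, alt) tuple, the call sets are already normalized so that identical variants compare equal, and the function name and toy data are invented for illustration. It does not reproduce the study's pipeline.

```python
def compare_call_sets(calls_a, calls_b):
    """Summarize the overlap between two variant call sets.

    Each call set is an iterable of hashable variant keys, for example
    (chrom, pos, ref, alt) tuples. Returns counts plus the fraction of the
    combined call set that is unique to one caller or aligner.
    """
    set_a, set_b = set(calls_a), set(calls_b)
    only_a = set_a - set_b
    only_b = set_b - set_a
    union = set_a | set_b
    unique_fraction = (len(only_a) + len(only_b)) / len(union) if union else 0.0
    return {
        "shared": len(set_a & set_b),
        "only_a": len(only_a),
        "only_b": len(only_b),
        "total": len(union),
        "unique_fraction": unique_fraction,
    }


# Hypothetical call sets from two aligners, for illustration only.
aligner_a_calls = [
    ("chr1", 1000, "A", "G"),
    ("chr1", 2000, "C", "T"),
    ("chr2", 3000, "G", "A"),
    ("chr3", 4000, "T", "C"),
]
aligner_b_calls = [
    ("chr1", 1000, "A", "G"),
    ("chr1", 2000, "C", "T"),
    ("chr2", 3000, "G", "A"),
    ("chr4", 5000, "A", "T"),
]
summary = compare_call_sets(aligner_a_calls, aligner_b_calls)
```

In practice the comparison is only meaningful after both call sets use the same variant representation (for example, left-aligned and split multi-allelic records); otherwise representation differences would be miscounted as aligner-specific variants.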

86

TOWARD EFFECTIVE MICRORNA QUANTIFICATION FROM SMALL RNA-SEQ

Pamela Russell1, Richard Radcliffe2, Brian Vestal1, Wen Shi1, Pratyaydipta Rudra1, Laura Saba2, Katerina Kechris1

1Department of Biostatistics and Informatics, Colorado School of Public Health; 2Department of Pharmaceutical Sciences, University of Colorado Skaggs School of Pharmacy and Pharmaceutical Sciences

Pamela Russell

Extensive work has led to robust quantification methods for RNA-seq data primarily derived from large RNAs. Many studies have used these methods “out of the box” to estimate microRNA (miRNA) expression from small RNA-seq data. However, these methods do not effectively address issues particular to miRNAs. First of all, reference bias is amplified due to the small size of sequencing reads derived from miRNAs (~22 nt). That is, with shorter reads, a true mismatch between a sample and the reference can lead to incorrect alignments or inability to align reads at all, creating a count bias toward those samples with the reference allele. With longer reads, single mismatches have less impact on alignment algorithms. Second, any bias for individual miRNAs is more impactful overall due to the relatively small repertoire of miRNAs compared to mRNAs. Inaccurate counts for a handful of miRNAs can significantly alter overall library counts and thus affect normalization. We refer to this issue as repertoire bias. Also, most miRNA studies seek to identify functional mature miRNA molecules regardless of the position in the genome that they are originally transcribed from or small non-functional differences between miRNAs of the same family. Tools designed for large RNAs do not address the repetitive nature and family structure of miRNAs, by default returning estimated counts for multiple targets that should be considered equivalent by typical miRNA study paradigms. Genome-based methods often map miRNA reads to multiple loci encoding the same mature miRNA. Methods based on mapping directly to a miRNA database do not suffer from multiple alignments due to identical regions of the genome but do typically distinguish among members of each miRNA family. Both sources of multiple mappings can lead to misleading counts when the goal is to elucidate function. Here we explore all these issues in the context of commonly used methods. We then propose a new high throughput approach that (1) incorporates individual genetic variation into the reference sequence used for alignment, reducing reference bias, and (2) assigns each read to a single functional group such as a miRNA family. We demonstrate the accuracy of this approach compared to other popular methods using a data set derived from 206 mouse brain samples. Funded by NIH/NIAAA AA016597, R01 AA021131 and R24 AA013162
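Step (2) above, assigning each read to a single functional group, can be illustrated with a small sketch. This is not the authors' method: the data structures, the function name, and the rule for ambiguous reads are assumptions made for illustration, and the miRNA-to-family table is a hand-written toy mapping rather than one loaded from a database.

```python
from collections import Counter


def count_by_family(read_hits, mirna_to_family):
    """Collapse per-read alignments to one count per miRNA family.

    read_hits: {read_id: [miRNA ids the read aligned to]}.
    A read is counted once for a family when all of its hits belong to that
    family; reads with no hits or hits in several families are set aside
    (an assumed policy for this sketch).
    """
    family_counts = Counter()
    ambiguous = []
    for read_id, hits in read_hits.items():
        families = {mirna_to_family[hit] for hit in hits}
        if len(families) == 1:
            family_counts[families.pop()] += 1
        else:
            ambiguous.append(read_id)
    return family_counts, ambiguous


# Toy mapping and alignments, for illustration only.
mirna_to_family = {
    "mmu-let-7a-5p": "let-7",
    "mmu-let-7b-5p": "let-7",
    "mmu-miR-21a-5p": "mir-21",
}
read_hits = {
    "read1": ["mmu-let-7a-5p", "mmu-let-7b-5p"],   # multi-mapped, one family
    "read2": ["mmu-let-7a-5p"],
    "read3": ["mmu-miR-21a-5p"],
    "read4": ["mmu-let-7a-5p", "mmu-miR-21a-5p"],  # hits two families
}
counts, ambiguous = count_by_family(read_hits, mirna_to_family)
```

Counting at the family level means a read that multi-maps within one family (read1) contributes exactly one count instead of being split or double-counted across near-identical targets.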

87

NANOPORE SEQUENCING TECHNOLOGY AND TOOLS: COMPUTATIONAL ANALYSIS OF THE CURRENT STATE, BOTTLENECKS AND FUTURE DIRECTIONS

Damla Senol1, Jeremie Kim1, Saugata Ghose1, Can Alkan2, Onur Mutlu1,3

1Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA, USA; 2Department of Computer Engineering, Bilkent University, Bilkent, Ankara, Turkey; 3Department of Computer Science, Systems Group, ETH Zürich, Switzerland

Damla Senol

Nanopore sequencing, a promising single-molecule DNA sequencing technology, exhibits many attractive qualities and, in time, could potentially surpass current sequencing technologies. Nanopore sequencing promises higher throughput, lower cost, and increased read length, and it does not require a prior amplification step. Nanopore sequencers rely solely on the electrochemical structure of the different nucleotides for identification and measure the change in the ionic current as long strands of DNA (ssDNA) pass through the nano-scale protein pores.

Biological nanopores for DNA sequencing were first proposed in the 1990s, but were only recently made commercially available, in May 2014, by Oxford Nanopore Technologies (ONT). The first commercial nanopore sequencing device, MinION, is an inexpensive, pocket-sized, portable, high-throughput sequencing apparatus that produces real-time data. These properties enable new potential applications of genome sequencing, such as rapid surveillance of Ebola, Zika or other epidemics, near-patient testing, and other applications that require real-time data analysis. In addition, this technology is capable of generating very long reads (~50,000 bp) with minimal sample preparation. Despite all these advantageous characteristics, it has one major drawback: high error rates. In order to provide higher accuracy and higher speed, in May 2016, ONT released a new version of MinION with a new nanopore chemistry called R9, which replaced the previous version R7. Although R9 chemistry improves the data accuracy, the tools used for nanopore sequence analysis are of critical importance as they should overcome the high error rates of the technology.

Our goal in this work is to comprehensively analyze tools for nanopore sequence analysis, with a focus on understanding the advantages, disadvantages, and bottlenecks of the various tools. To this end, we rigorously examine multiple steps in the nanopore genome analysis pipeline. The first step, basecalling, translates the raw signal output of MinION into nucleotides to generate DNA sequences. Currently, Nanocall and Nanonet are publicly available nanopore basecallers. The second step performs genome assembly with assemblers for noisy long reads. Using only the basecalled DNA reads, assemblers generate longer contiguous fragments called draft assemblies. Currently, Canu and Miniasm are the commonly used long-read assemblers. After this step, an improved consensus sequence is generated from the draft assembly with Nanopolish, and a complete whole genome is obtained.

We analyze the five aforementioned nanopore sequencing tools in terms of their speed and accuracy, with the goals of determining their bottlenecks and finding improvements to these tools. We also discuss potential future work on nanopore basecallers and assemblers, to take better advantage of nanopore sequencing and to overcome its current disadvantage of high error rates.

88

DETECTING OUTLIERS FROM MULTIDIMENSIONAL DATA WITH APPLICATION IN CANCER

Kyle Smith1, Subhajyoti De2, Debashis Ghosh1

1University of Colorado, 2Rutgers University

Kyle Smith

Outliers, which are very different from the typical cases in a cohort, bring unexpected challenges for decision making in many different disciplines. The issue is more acute in oncology, since most types of cancer are highly heterogeneous diseases. Even within any cancer subtype, patients show extensive variation in their molecular profiles and clinical outcomes. Even within a cohort of cancer patients who have apparently the same biomarkers and received identical treatment, there are exceptional responders and exceptional non-responders, who are outliers. It is suspected that their atypical molecular and clinical profiles contribute to their exceptional response. While identifying such outlier cases can benefit precision medicine initiatives, methods to detect them from multidimensional data have received limited attention. Here, we propose a novel framework to identify outlier cancer patients with atypical profiles from multidimensional genomic data. We argue that detection of outlier patients with atypical profiles can help identify exceptional responders and tailor precision medicine in oncology initiatives.

89

HUEMR: INTUITIVE MINING OF ELECTRONIC MEDICAL RECORDS

Abiodun Otolorin1, Nana Osafo2, William Southerland2

1Department of Community and Family Medicine, Howard University, Washington, DC; 2Department of Biochemistry & Molecular Biology and the Center for Computational Biology and Bioinformatics, Howard University, Washington, DC

William Southerland

Despite the widespread adoption of electronic medical record systems and advances in genomics, a major barrier to research endeavors is the lack of intuitive user-friendly interactive tools that enable researchers to access and analyze data readily. In light of this, innovative tools have been developed to address the problem. However, we hypothesized that an interactive data visualization tool that is capable of stand-alone or plugin functionality and that also leverages common data query methodologies would contribute to research efforts requiring interrogation of clinical research databases. Howard University Hospital (HUH) is a tertiary academic medical center with over 50,000 emergency department visits and 8,000 inpatient admissions per year and primarily provides care to the minority population in the District of Columbia metropolitan area. Using de-identified HUH electronic medical records data, a HUH clinical research database was developed. Additionally, the Howard University electronic Medical Records (HUeMR) query tool was developed as a web-based client-server application using JavaScript and PHP. HUeMR may function in stand-alone or plugin mode. Its graphical interface was built using Google Charts, an interactive open source visualization library. HUeMR supports complex boolean search operations specified by an interactive query tool. Ontology is presented using linked dropdown menus and query construction is displayed in natural language form. Data is displayed using editable interactive charts. Multiple rows of charts may be created that contain different types of data concepts. Queries may be refined by clicking on the charts followed by selection of one or more additional query parameters. Diagnoses based on ICD codes or keywords may also be searched. These features are illustrated in a diabetes use-case investigation. In summary, HUeMR is a secure data analytics tool that can be used in stand-alone or plugin mode to query clinical research databases. It has a highly interactive user interface that allows rapid data analysis for cohort discovery. This work was supported by grant #5G12MD007597 from the National Institute on Minority Health and Health Disparities from the NIH.

90

DECIPHERING LUNG ADENOCARCINOMA MORPHOLOGY AND PROGNOSIS BY INTEGRATING OMICS AND HISTOPATHOLOGY

Kun-Hsing Yu1, Gerald J. Berry2, Daniel L. Rubin1, Christopher Ré3, Russ B. Altman1, Michael Snyder4

1Biomedical Informatics Program, Stanford University; 2Department of Pathology, Stanford University; 3Department of Computer Science, Stanford University; 4Department of Genetics, Stanford University

Kun-Hsing Yu

Adenocarcinoma accounts for more than 40% of lung malignancy, and microscopic pathology evaluation is indispensable to its diagnosis. However, how histopathology findings relate to molecular abnormalities remains largely unknown. To address this problem, we obtained hematoxylin and eosin stained whole-slide histopathology images, pathology reports, RNA-sequencing, and proteomics data of 538 lung adenocarcinoma patients from The Cancer Genome Atlas. We profiled gene expression, protein expression and modifications, and extracted more than 9,000 objective features from the histopathology images of each patient. We successfully predicted histology grade with transcriptomics and proteomics signatures (area under curve > 0.75) and identified the associated molecular pathways, such as cell cycle regulation, which provide biological insights into tumor cell differentiation grades. We further built an integrative histopathology-transcriptomics model to generate superior prognostic predictions for stage I patients (P < 0.01) compared with gene expression or histopathology analysis alone. These results suggest that the integration of histopathology and omics studies can reveal the molecular mechanisms of pathology findings and enhance clinical prognostic prediction, which will contribute to the development of precision cancer medicine. Our methods are generalizable to other types of malignancy or diseases.

91

EXPLORING DEEP LEARNING FOR COPY NUMBER VARIATION DETECTION WITH NGS DATA

Yao-zhong Zhang, Rui Yamaguchi, Seiya Imoto, Satoru Miyano

Institute of Medical Science, University of Tokyo

Yao-zhong Zhang

Copy number variations (CNVs) are an important type of genetic variation widely used for profiling cancer and other complex diseases. Accurate detection and summarization of CNVs help identify oncotargets and cancer subtypes for precision medicine. When using NGS data for CNV detection, various heterogeneous biases, such as GC-content bias and other noise, need to be properly handled. This becomes especially important for CNV detection on single-cell NGS data. In this study, we extend traditional HMM approaches for CNV detection with deep learning. We extract a feature representation that integrates the information from read counts and observable genomic sequences, use it as the new observable sequence of genomic bins, and iteratively train a DNN-HMM model for CNV detection. We compare our method with other HMM-based CNV detection methods.

92

IMAGING GENOMICS

POSTER PRESENTATIONS

93

PERIPHERAL EPIGENETIC ASSOCIATIONS WITH BRAIN GRAY MATTER IN SCHIZOPHRENIA

Dongdong Lin1, Vince D. Calhoun2, Juan R. Bustillo3, Nora Perrone-Bizzozero4, Jingyu Liu1

1The Mind Research Network and Lovelace Biomedical and Environmental Research Institute, Albuquerque; 2Dept. of Electronic and Computer Engineering, University of New Mexico, Albuquerque; 3Dept. of Psychiatry, University of New Mexico, Albuquerque; 4Dept. of Neurosciences, University of New Mexico, Albuquerque

Jingyu Liu

Epigenetic regulation by DNA methylation and histone modification has been increasingly recognized for its relevance to schizophrenia (SZ). Beyond the genetic variation, epigenetics through regulation of gene transcription and expression can potentially explain the ‘missing’ heritability and mediate the effect of genetic risks in disease. Specific to DNA methylation, recent studies have demonstrated that 6-7% of CpG sites across the genome show significant correspondence between brain and blood, supporting the investigation of easily accessible tissues for brain and mental disorders. In this study, we analyzed DNA methylation of 163 CpG sites from saliva and whole brain gray matter density of 108 SZ patients and 105 healthy controls. We are aware of cellularity differences between blood and saliva, and to the best of our knowledge no detailed saliva-brain correspondence study has been done except a general comparison of overall patterns, which indicates saliva may be a closer indicator of brain than blood. The 163 CpG sites are located within the 108 schizophrenia risk regions reported by the Psychiatric Genomics Consortium schizophrenia working group, and also showed strong cross-tissue similarity based on the genome-wide methylation study of blood and brain tissues by Hannon et al. Quality control and normalization for methylation data were implemented using the minfi R package to remove batch effects and cell type proportion effects. Gray matter density maps were segmented by SPM12 with a smoothing kernel of 8 mm^3. We applied independent component analysis to both brain imaging data and methylation data, and extracted 25 gray matter networks and 15 methylation components. Among them, two methylation components were significantly correlated with three gray matter networks (false discovery rate < 0.05). The first methylation component comprised two CpG sites within and near gene ZSCAN12, and was associated with a bilateral middle/superior temporal network (r = 0.25) and a bilateral superior frontal network (r = -0.24). The higher the methylation component, the lower the gray matter density in the superior frontal gyrus and the higher the density in the middle temporal gyrus. Moreover, SZ patients showed significant gray matter reduction in the superior frontal gyrus (p = 7.9 x 10^-5). The second methylation component consisted of CpG sites from two chromosome regions (Chr. 10 AS3MT and NT5C2 genes, and Chr. 12 ARL6IP4 and OGFOD2 genes), and was associated with caudate and thalamus regions. All analyses were controlled for age and gender. Although we did not find SZ-specific methylation differences within SZ risk regions, our results suggest that DNA methylation patterns in saliva are associated with brain gray matter variation, and some of this variation is related to schizophrenia. The main limitations of this study include 1) the lack of replication data to verify the findings, and 2) the lack of direct saliva and brain tissue correspondence verification.

94

THE INTERPLAY BETWEEN OLIGO-TARGET SPECIFIC AND GENOME-WIDE OFF-TARGET INTERACTIONS

Olga V. Matveeva1, Nafisa N. Nazipova2, Aleksey Y. Ogurtsov3, Svetlana A. Shabalina3

1Biopolymer Design LLC, Acton, MA; 2Institute of Mathematical Problems of Biology, Pushchino, Moscow Region, Russia; 3National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD

Svetlana Shabalina

Many techniques of molecular biology involve interaction of specific oligonucleotides with DNA or RNA as a basic step. DNA targeting of single-guide (sg) RNAs for genome editing procedures, oligonucleotide array gene expression monitoring or antisense-mediated gene down-regulation, and the Genomic Comparison Hybridization (GCH) array experiments are examples of techniques involving RNA-DNA and DNA-DNA interactions. RNAi approaches with siRNA and shRNA molecules are based on RNA-RNA interactions. The main problem of any oligo-probe experiment is that the specific oligo-target interaction, based on a fully paired duplex, is usually combined with non-specific parallel reactions, where the oligo-probe can interact with many partially paired DNA or RNA sequences. The interplay between specific and genome-wide off-target interactions is poorly studied despite its crucial role in the efficacy of these techniques. In this study, we investigated the oligo-probe characteristics that are responsible for this interplay and that most improve oligo-probe design. We defined specificity of interaction as the ratio between oligo-target specific and genome-wide off-target interactions. Microarray databases, derived from GCH experiments using the Affymetrix platforms and containing two different types of probes, were used for the analysis based on the thermodynamic features and nucleotide sequences of oligo-probes. The first type of oligo-probe does not have a specific target in the genome, and its hybridization signal is derived from genome-wide cross-hybridization alone. The second type includes oligonucleotides that have a specific target in the genomic DNA, and their signals are derived from specific and cross-hybridization components combined in a total signal. The analysis revealed that hybridization specificity was negatively affected by low stability of the fully paired oligo-target duplex, stable probe self-folding, G-rich content (including GGG motifs), and a low sequence Symmetrical Complexity (SC) score. The SC-score characterizes nucleotide composition symmetry and a probe’s vulnerability to off-target interactions. Filtering out probes with these characteristics significantly increases hybridization specificity by decreasing genome-wide cross-hybridization or by increasing specific interactions. Selected oligo-probes have, on average, three times higher hybridization specificity than the probes that were filtered out of the analysis by applying the suggested cut-off thresholds to the described parameters. Multiple regression models with the described parameters were successfully applied to predict interaction specificity and off-target effects, and supported the parameter choice (P < 0.001). We also compared the probe characteristics selected for the analysis in microarray databases with applicable features of siRNA/shRNA design from our earlier studies. We applied all selected oligonucleotide features and described parameters to new sets of sgRNAs. Our study examined the thermodynamics and sequence-intrinsic properties of sgRNA-DNA duplexes and analyzed additional selection criteria that are critical for guide efficacy. Finally, we identify universal features of oligo-probes, si/shRNAs and guides for optimal design, including the SC-score.
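The specificity definition above (specific signal divided by genome-wide off-target signal) can be illustrated with a minimal sketch. The abstract does not say how the two components are separated for a targeted probe, so the code below assumes, purely for illustration, that the cross-hybridization component is estimated from the mean signal of probes with no genomic target. The function name and all intensity values are hypothetical.

```python
from statistics import mean


def specificity_ratio(total_signal, cross_signal):
    """Ratio of specific to off-target signal for one targeted probe.

    Assumption (not from the abstract): the specific component is taken as
    total signal minus an external cross-hybridization estimate.
    """
    if cross_signal <= 0:
        raise ValueError("cross-hybridization estimate must be positive")
    specific = max(total_signal - cross_signal, 0.0)
    return specific / cross_signal


# Hypothetical intensities, invented for illustration only.
background_probes = [100.0, 120.0, 80.0]  # no genomic target: cross-hybridization only
targeted_total = 400.0                    # targeted probe: specific + cross-hybridization

cross_estimate = mean(background_probes)
ratio = specificity_ratio(targeted_total, cross_estimate)
print(f"estimated specificity ratio: {ratio:.1f}")
```

With these made-up numbers the targeted probe carries three times as much specific signal as cross-hybridization signal; a real analysis would estimate the off-target component per probe rather than from a single pooled mean.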

95

PATTERNS IN BIOMEDICAL DATA – HOW DO WE FIND THEM?

POSTER PRESENTATIONS

96

WARS2 IMPLICATED AS A COMMON MODIFIER OF METFORMIN METABOLITE BIOMARKERS IN A BIOBANK COHORT

Alyssa I. Clay1, Richard M. Weinshilboum2, K. Sreekumaran Nair3, Rima F. Kaddurah-Daouk4, Liewei Wang2, Matthew K. Breitenstein1

1Division of Epidemiology, Mayo Clinic; 2Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic; 3Division of Endocrinology, Mayo Clinic; 4Duke University

Matthew Breitenstein

Background: Metformin is one of the most widely prescribed drugs worldwide and a first-line treatment for type 2 diabetes mellitus (T2D). Metformin has many mechanisms of action, with varying levels of understanding. Metformin is being evaluated as a potential chemoprevention agent for cancer treatment, with inhibition of angiogenesis as one effect of metformin being strongly pursued. However, contradictory evidence exists for a potential mechanism of angiogenesis inhibition (Carcinogenesis 2014; (35)5). Building on our prior work that identified strata of statistically correlated metabolites, we aimed to identify overlapping metformin pharmacogenomic (PGx) SNP associations, using pharmacometabolomics-informed PGx paired with an agnostic computational approach.

Methods: To elucidate overlapping PGx signals of metformin exposure, we included metabolites (n=5) with correlated plasma concentration, adjusted for metformin exposure, in a biobank cohort-based, case-control study. Cases (n=274) had T2D and were exposed to metformin monotherapy; healthy controls (n=274) had no known drug exposures. Cases and controls were matched by age and gender, and adjusted for BMI and batch. A panel of amino acid metabolite (n=42) concentrations was quantitatively measured using tandem liquid chromatography-mass spectrometry from fasting platelet-poor plasma samples collected in EDTA. Genotyping was performed using the 700k SNP Illumina OmniExpress array platform from 250 ng of DNA. Normalized metabolite concentrations were utilized as endpoints to inform genome-wide associations.

Results: Increased plasma metabolite concentrations for leucine (t=4.47, p<0.0001), isoleucine (t=4.63, p<0.0001), and valine (t=4.48, p<0.0001) were observed with exposure to metformin. Variant rs17023164 (MAF=0.31), in the Tryptophanyl tRNA Synthetase 2, Mitochondrial (WARS2) gene region of chromosome 1 and an eQTL for WARS2 in fibroblasts, was a common downward modifier of leucine (β=-11.69, p=1.79e-7), isoleucine (β=-6.99, p=2.40e-6), and valine (β=-14.55, p=1.04e-5) with metformin exposure. No SNPs in neighboring gene regions were in high LD (R^2>0.5) with rs17023164.

Conclusion: Increased plasma metabolite concentrations for leucine, valine, and isoleucine were observed with metformin exposure. A common variant, rs17023164 in WARS2, was identified as a strong downward modifier of these metabolites with metformin exposure. Independently, WARS2 is proposed as a determinant of angiogenesis (Nat Com 2016; (7)12061). We posit a hypothesis: modification of metabolite biomarker concentration associated with metformin exposure by WARS2 variants is a potential link between metformin and angiogenesis. Functional characterization of a potential mechanism for metformin inhibition of angiogenesis, modified by WARS2, is ongoing.

97

ESTIMATION OF FALSE NEGATIVE RATES VIA EMBEDDING SIMULATED EVENTS

Stephen V. Gliske1, Katy L. Lau1, Benjamin H. Brinkman2, Greg A. Worrell2, Cris G. Fink3, William C. Stacey1

1University of Michigan, 2Mayo Clinic, 3Ohio Wesleyan University

Stephen Gliske

Automated event detection is the result of many types of data-driven pattern recognition methods. One of the general challenges to these analyses is the quantification of, and correction for, false negative detections, i.e., cases where the event (pattern) is present in the data but was not detected. Estimating the false positive rate is much easier, as human review of a subsample of detected events is sufficient. However, determining the false negative rate by human review would require manually searching through the raw data, which is impractical, if not completely infeasible. This challenge is not unique to biomedical data and is commonly addressed in high energy physics. The approach is called embedding. It is applicable to any analysis where at least one of the signal or background can be modeled well by simulations. By placing specific events at known locations, one can then run the automated detector and report the fraction of embedded events that were detected. We present the first application of embedding to neurological data, specifically the automated detection of a biomarker of epilepsy (high frequency oscillations) recorded in intracranial electroencephalogram (EEG) data. The false negative rate is found to be consistent across both recording channels and patients.
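The embedding procedure described above (place simulated events at known locations, run the detector, report the fraction recovered) can be sketched as follows. This is an illustrative toy, not the authors' pipeline: the rectangular pulse shape, the threshold detector, the background trace, and every function name are assumptions made for the example.

```python
def embed_events(background, locations, amplitudes, width=5):
    """Return a copy of `background` with a rectangular pulse added at each location."""
    trace = list(background)
    for start, amp in zip(locations, amplitudes):
        for i in range(start, min(start + width, len(trace))):
            trace[i] += amp
    return trace


def threshold_detector(trace, threshold):
    """Toy detector: return the set of sample indices whose value exceeds `threshold`."""
    return {i for i, value in enumerate(trace) if value > threshold}


def false_negative_rate(trace, locations, detector, width=5):
    """Fraction of embedded events with no detection inside their window."""
    hits = detector(trace)
    missed = sum(
        1
        for start in locations
        if not any(i in hits for i in range(start, start + width))
    )
    return missed / len(locations)


# Hypothetical data: a background with a small deterministic ripple (-0.3 to 0.3).
background = [0.1 * ((i % 7) - 3) for i in range(200)]
locations = [20, 60, 100, 140]
amplitudes = [2.0, 0.2, 1.5, 0.3]  # two strong and two weak simulated events

trace = embed_events(background, locations, amplitudes)
fnr = false_negative_rate(trace, locations, lambda t: threshold_detector(t, 1.0))
print(f"estimated false negative rate: {fnr:.2f}")
```

Because the embedded locations are known, no manual review of the raw trace is needed: here the two weak events stay below the threshold, so half of the embedded events are missed. In practice the simulated events would be drawn from a model of the real signal and embedded in recorded background data.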

98

INTEGRATIVE, INTERPRETABLE DEEP LEARNING FRAMEWORKS FOR REGULATORY GENOMICS AND EPIGENOMICS

Chuan Sheng Foo, Avanti Shrikumar, Johnny Israeli, Peyton Greenside, Chris Probert, Anna Scherbina, Rahul Mohan, Nathan Boley, Anshul Kundaje

Stanford University

Anshul Kundaje

We present generalizable and interpretable supervised deep learning frameworks to predict the regulatory and epigenetic state of putative functional genomic elements by integrating raw DNA sequence with diverse chromatin assays such as ATAC-seq, DNase-seq or MNase-seq. First, we develop novel multi-channel, multi-modal CNNs that integrate DNA sequence and chromatin accessibility profiles (DNase-seq or ATAC-seq) to predict in vivo binding sites of a diverse set of transcription factors (TFs) across cell types with high accuracy. Our integrative models provide significant improvements over other state-of-the-art methods, including recently published deep learning TF binding models. Next, we train multi-task, multi-modal deep CNNs to simultaneously predict multiple histone modifications and combinatorial chromatin state at regulatory elements by integrating DNA sequence, RNA-seq and ATAC-seq or a combination of DNase-seq and MNase-seq. Our models achieve high prediction accuracy even across cell types, revealing a fundamental predictive relationship between chromatin architecture and histone modifications. Finally, we develop DeepLIFT (Deep Linear Importance Feature Tracker), a novel interpretation engine for extracting predictive and biologically meaningful patterns from deep neural networks (DNNs) for diverse genomic data types. DeepLIFT can integrate the combined effects of multiple cooperating filters and compute importance scores accounting for redundant patterns. We apply DeepLIFT to our models to obtain unified TF sequence affinity models, infer high-resolution point binding events of TFs, dissect regulatory sequence grammars involving homodimeric and heterodimeric binding with co-factors, learn predictive chromatin architectural features, and unravel the sequence and architectural heterogeneity of regulatory elements.

99

VISUALIZATION OF COMPLEX DISEASES AND RELATED GENE SETS

Modest von Korff, Tobias Fink, Thomas Sander

Actelion Pharmaceuticals Ltd., Allschwil, Switzerland

Modest von Korff

The relations between genes and diseases form complex patterns. Visualization of these patterns enables the scientist to obtain an overview of the most important gene–disease relations. These gene–disease relations are of high importance in drug discovery. Proteins encoded by disease-related genes are potential targets for new drugs or may become biomarkers for disease diagnosis. Both a novel drug target and a biomarker should be highly specific for the targeted disease. In our publication for this conference, we introduce a relevance estimator. This relevance estimator is a measure of the specificity of a gene–disease relationship that also takes into consideration all other known gene–disease relationships. We analyzed gene–disease relationships from 22 million PubMed records and obtained a matrix with relevance estimators for about 5000 diseases and 15,000 genes. This relevance matrix enabled us to express the similarity between diseases with simple vector-based distance measures. A meaningful disease–gene–disease visualization, consisting of several layers, was derived from these disease–disease similarity measures and the relevance estimators. The multidimensional visualizations presented here give an overview of complex diseases like asthma, Alzheimer's disease and hypertension.
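As a rough illustration of comparing diseases through their rows in a relevance matrix, the sketch below uses cosine similarity. The abstract only says "simple vector-based distance measures", so the choice of cosine similarity is an assumption, and the disease labels and relevance values are invented.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length relevance vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a disease with no relevant genes is similar to nothing
    return dot / (norm_u * norm_v)


# Hypothetical relevance matrix: one row per disease, one column per gene.
relevance = {
    "disease_A": [1.0, 0.0, 2.0],
    "disease_B": [2.0, 0.0, 4.0],
    "disease_C": [0.0, 3.0, 0.0],
}

sim_ab = cosine_similarity(relevance["disease_A"], relevance["disease_B"])
sim_ac = cosine_similarity(relevance["disease_A"], relevance["disease_C"])
print(f"A vs B: {sim_ab:.2f}, A vs C: {sim_ac:.2f}")
```

Diseases A and B share the same gene profile up to scale and come out as maximally similar, while A and C share no relevant genes. A pairwise matrix of such similarities is the kind of input a layout algorithm could use to place related diseases near one another.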

100

PRECISION MEDICINE: FROM GENOTYPES AND MOLECULAR PHENOTYPES TOWARDS IMPROVED HEALTH AND THERAPIES

POSTER PRESENTATIONS

101

FINDINGS FROM THE FOURTH CRITICAL ASSESSMENT OF GENOME INTERPRETATION, A COMMUNITY EXPERIMENT TO EVALUATE PHENOTYPE PREDICTION

Steven E. Brenner1, Gaia Andreoletti1, Roger A. Hoskins1, John Moult2, CAGI Participants

1University of California, Berkeley; 2IBBR, University of Maryland, Rockville, MD

Steven Brenner

The Critical Assessment of Genome Interpretation (CAGI, \'kā-jē\) is a community experiment to objectively assess computational methods for predicting the phenotypic impacts of genomic variation. CAGI participants are provided genetic variants and make predictions of the resulting phenotype. These predictions are evaluated against experimental characterizations by independent assessors.

The fourth CAGI experiment concluded this year. It included 11 challenges, which reflected: non-synonymous variants and their biochemical impact measured by targeted assays; noncoding regulatory variants and their impact on gene expression; research exomes for prediction of complex traits; personal genomes and trait profiles; and clinical sequences and associated referring indications.

There were notable discoveries throughout the CAGI experiment, and general themes emerged. The independent assessment found that top missense prediction methods are highly statistically significant, but individual variant accuracy is limited. Moreover, missense methods tend to correlate better with each other than with experiments (for reasons that may reflect the predictive methods and the assays themselves). However, there might be potential for missense interpretation at the extreme of the distribution. Structure-based missense methods excel in a few cases, while evolutionary-based methods have more consistent performance. Bespoke approaches often enhance performance.

On the clinical studies, predictors were able to identify causal variants that were overlooked by the clinical laboratory, and it appears that physicians may not always order the most relevant genetic test for their patients. CAGI data show that running multiple uncalibrated methods and considering their consensus often provides undue confidence in their correlation; we therefore advise against running multiple uncalibrated variant interpretation tools in clinical analysis.

The results showed that predicting complex traits from exomes is fraught. Interpretation of non-coding variants shows promise but is not at the level of missense. Beyond this, creating a genetic study that provides a reliable gold standard is remarkably difficult. However, there were notable improvements in the ability to match genomes to trait profiles.

Complete information about CAGI may be found at https://genomeinterpretation.org.

102

ASTROLABE: EXPANSION TO CYP2C9 AND CYP2C19

Andrea Gaedigk1, Greyson P. Twist2, Sarah Soden2, Emily G. Farrow2, Neil A. Miller2

1Division of Clinical Pharmacology & Therapeutic Innovation, Children's Mercy, Kansas City, School of Medicine, University of Missouri-Kansas City; 2Center for Pediatric Genomic Medicine, Children's Mercy, Kansas City

Andrea Gaedigk

Background: CYP2C9 and CYP2C19 are highly polymorphic pharmacogenes metabolizing numerous drugs. Both are genes with CPIC guidelines, underscoring their clinical relevance. To facilitate haplotype calling and translation into phenotype, we have developed a probabilistic scoring system, Astrolabe (initially called Constellation; Twist et al 2016, Gen Med 1:15007), that enables automated CYP2D6 diplotype calling from whole genome sequencing. We report here the extension of Astrolabe to CYP2C9 and CYP2C19.

Methods: The study was approved by the Institutional Review Board of Children’s Mercy and included 85 subjects (7 HapMap; 78 patients/parents). Allele definitions are according to the P450 Nomenclature Database (cypalleles.ki.se/) with some modifications. Exons and 100 bp of flanking introns were used for Astrolabe calls, as well as -2990 to -440 of CYP2C9 and -1063 to -180 of CYP2C19, harboring SNPs defining CYP2C9*8 and CYP2C19*27, respectively. All but 3 subjects were genotyped for CYP2C9*2, *3, *5 and *8 and CYP2C19*2-*4, *17, *27 and *35 using TaqMan assays to validate Astrolabe calls. WGS data were reanalyzed with the DRAGEN Bio-IT processor (Edico Genome) to improve variant call quality. To account for haplotype and diplotype combinations not observed in our sample set, simulations of all possible diplotype combinations were performed using the ART read simulator and the DRAGEN analysis pipeline. Astrolabe is available at https://www.childrensmercy.org/genomesoftwareportal/

Results: To maximize Astrolabe call accuracy, intron regions were adjusted to include informative SNPs while excluding those that occur on numerous haplotypes and/or are not part of a defined allele. The CYP2C9 exon 1 region, for example, was limited to 57 bp of intron 1 to exclude 251T>C, which is present in 1155/3540 subjects (CMH variant warehouse database). This SNP defines CYP2C9*29, but interfered with Astrolabe calls by overcalling CYP2C9*29 in the absence of its key SNP (33437C>A). Optimized calling target regions were then used to compare Astrolabe with genotype calls. Astrolabe correctly called 68/75 (90.67%) and 71/75 (94.67%) of subjects for CYP2C9 and CYP2C19, respectively. Among the alleles detected by Astrolabe and genotyping were CYP2C9*2, *3 and *8 and CYP2C19*2, *17, *27 and *35. Astrolabe also identified subjects carrying the rare CYP2C9*9 and *11 and CYP2C19*15 alleles, which were not covered by genotyping. Astrolabe correctly called 1077/1128 simulated CYP2C19 diplotypes (95% recall; 45 missed and 6 multiple calls). All missed calls were *12 called as *1. For CYP2C9, Astrolabe correctly called 2186/2278 simulated diplotypes (95% recall; 61 missed and 31 multiple calls). All missed calls were *25 called as *1.

Discussion: Astrolabe’s functionality was successfully expanded to CYP2C9 and CYP2C19. Phenotype prediction based on Astrolabe was superior to that derived from a limited genotype panel. Continued improvement and expansion of the nomenclature definitions will allow us to resolve the miscalled haplotypes represented in the simulation set and improve Astrolabe calling across all diplotypes.
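The simulation recall figures above follow directly from the reported counts. The short sketch below recomputes them, treating both missed calls and multiple calls as not correct, as the abstract does. The helper name `recall` is only illustrative.

```python
def recall(correct, missed, multiple):
    """Return (fraction correctly called, total simulated diplotypes)."""
    total = correct + missed + multiple
    return correct / total, total


# Counts as reported in the abstract for the simulated diplotypes.
cyp2c19_recall, cyp2c19_total = recall(correct=1077, missed=45, multiple=6)
cyp2c9_recall, cyp2c9_total = recall(correct=2186, missed=61, multiple=31)

print(f"CYP2C19: {cyp2c19_recall:.2%} of {cyp2c19_total}")
print(f"CYP2C9:  {cyp2c9_recall:.2%} of {cyp2c9_total}")
```

The category counts sum to the stated totals (1128 and 2278), and the recalls come out at about 95.5% and 96.0%, so the reported 95% matches these values when truncated rather than rounded.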

103

HUMAN KINASES DISPLAY MUTATIONAL HOTSPOTS AT COGNATE POSITIONS WITHIN CANCER

Jonathan Gallion, Angela D. Wilkins, Olivier Lichtarge

Baylor College of Medicine

Jonathan Gallion

The discovery of driver genes is a major pursuit of cancer genomics, usually based on observing the same mutation in different patients. But the heterogeneity of cancer pathways plus the high background mutational frequency of tumor cells often cloud the distinction between less frequent drivers and innocent passenger mutations. Here, to overcome these disadvantages, we grouped together mutations from close kinase paralogs under the hypothesis that cognate mutations may functionally favor cancer cells in similar ways. Indeed, we find that kinase paralogs often bear mutations to the same substituted amino acid at the same aligned positions and with a large predicted Evolutionary Action. Functionally, these high Evolutionary Action, non-random mutations affect known kinase motifs, but strikingly, they do so differently among different kinase types and cancers, consistent with differences in selective pressures. Taken together, these results suggest that cancer pathways may flexibly distribute a dependence on a given functional mutation among multiple close kinase paralogs. The recognition of this “mutational delocalization” of cancer drivers among groups of paralogs is a new phenomenon that may help better identify relevant mechanisms and therefore eventually guide personalized therapy.
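The grouping idea above (pool mutations from paralogs by aligned position and substituted residue) can be sketched in a few lines. This is a simplified illustration only: the kinase names, alignment columns, and mutations are invented, the `min_paralogs` cutoff is an assumption, and the Evolutionary Action scoring used in the study is not modeled.

```python
# Hypothetical records: (kinase, alignment column, substituted amino acid).
mutations = [
    ("KINASE_A", 120, "K"),
    ("KINASE_B", 120, "K"),
    ("KINASE_C", 120, "K"),
    ("KINASE_A", 45, "L"),
    ("KINASE_B", 300, "P"),
]


def cognate_hotspots(mutations, min_paralogs=2):
    """Map (column, residue) to the number of distinct paralogs mutated there.

    Only sites hit in at least `min_paralogs` different kinases are kept.
    """
    paralogs_by_site = {}
    for kinase, column, residue in mutations:
        paralogs_by_site.setdefault((column, residue), set()).add(kinase)
    return {
        site: len(kinases)
        for site, kinases in paralogs_by_site.items()
        if len(kinases) >= min_paralogs
    }


hotspots = cognate_hotspots(mutations)
print(hotspots)
```

In this toy input, only alignment column 120 with substitution to K recurs across paralogs, so it is the single candidate hotspot even though no individual kinase carries it more than once.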

104

SCOTCH: A NOVEL METHOD TO DETECT INSERTIONS AND DELETIONS FROM NGS DATA

Rachel Goldfeder, Euan Ashley

Stanford University

Rachel Goldfeder

Clinical-grade genome sequencing and interpretation requires accurate and complete genotype calls across the entire genome. While single nucleotide variant detection is highly accurate and consistent, these variants explain only a small fraction of disease risk. Other types of variation that disrupt the open reading frame, such as insertions and deletions (INDELs), are more likely to be harmful. However, current methods have low sensitivity for larger (≥ five bases) INDELs, primarily due to challenges surrounding aligning sequence reads that span INDELs. We present Scotch, a novel INDEL detection method that leverages signatures of poor read alignment, read depth information, and machine learning approaches to accurately identify INDELs from next-generation DNA sequencing data. Using biologically realistic simulated genomes and sequence reads with technologically representative error profiles (generated by ART), we evaluate Scotch and several currently available INDEL callers. We show that Scotch has higher sensitivity than current methods, particularly for larger INDELs. Finally, we validate INDELs that Scotch discovered in one individual, NA12878, and show that Scotch has high positive predictive value. This method will enable researchers and clinicians to more accurately identify INDELs associated with previously unexplained genetic conditions.

105

MAYO OMICS REPOSITORY FOR TRANSLATIONAL MEDICINE

Iain Horton, Jeanette Eckel-Passow, Steven Hart, Shannon McDonnell, David Mead, Gay Reed, Greg Dougherty, Jason Ross, Julie Swank, Mark Myers, Mathieu Wiepert, Rama Volety, Tony Stai, Yaxiong Lin, Robert Freimuth

Mayo Clinic

Iain Horton

The Mayo Clinic Genomic Data Warehouse has established the infrastructure foundation, processes, and applications to meet the translational needs of the Mayo Clinic Center for Individualized Medicine (CIM). Through the streamlined and automated data pipeline, next-gen sequencing (NGS) results are loaded and integrated with clinical data, providing the foundation for the development of revolutionary solutions and discovery in clinical practice and genomic research. Initiated in 2012, with production data ingestion beginning in early 2014, Mayo Clinic's Translational Research Center (TRC) has provided the cornerstone platform for data-centric activities within CIM. Data generated from both the clinical pipeline and the research pipeline are automatically loaded into TRC, with each new bit adding value and power to the system. Two key solutions with significant potential to impact patient care and scientific discovery have been built on this genomic data warehouse. First is the Molecular Decision Support system, a rule-based pharmacogenomics system that enables Mayo Clinic clinicians to integrate actionable information based on a patient's genotype at the point of care using NGS data. Second is the Mayo Variant Summary application, a cloud-native system that empowers Mayo Clinic researchers to identify rare and actionable genomic variants through dynamic filtering and grouping of subject phenotype and specimen metadata.

106

PHARMACOGENOMICS CLINICAL ANNOTATION TOOL (PHARMCAT)

T.E. Klein1, M. Whirl-Carrillo1, R.M. Whaley1, M. Woon1, K. Sangkuhl1, Lester G. Carter1, H.M. Dunnenberger2, P.E. Empey3, A.T. Frase4, R.R. Freimuth5, A. Gaedigk6, A. Gordon7, C. Haidar8, J.K. Hicks9, J.M. Hoffman8, M.T. Lee10, N. Miller11, S.D. Mooney12, T.N. Person13, J.F. Peterson14, M.V. Relling8, S.A. Scott15, G. Twist11, A. Verma13, M.S. Williams10, C. Wu16, W. Yang8, M.D. Ritchie4,13

1Dept Genetics, Stanford Univ, Stanford, CA; 2Center for Molecular Medicine, NorthShore University HealthSystem, Evanston IL; 3Department of Pharmacy and Therapeutics, School of Pharmacy, University of Pittsburgh; 4Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA; 5Department of Health Sciences Research, Mayo Clinic, Rochester MN; 6Division of Clinical Pharmacology, Toxicology & Therapeutic Innovation, Children’s Mercy-Kansas City, Kansas City, MO; 7Department of Medicine, Division of Medical Genetics, University of Washington, Seattle, WA; 8St. Jude Children's Research Hospital, Memphis, TN; 9DeBartolo Family Personalized Medicine Institute, H. Lee Moffitt Cancer Center, Tampa, FL; 10Genomic Medicine Institute, Geisinger Health System, Danville, PA; 11Center for Pediatric Genomic Medicine, Children’s Mercy, Kansas City, MO; 12Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA; 13Biomedical and Translational Informatics, Geisinger Health System, Danville, PA; 14Vanderbilt University Medical Center, Nashville, TN; 15Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY; 16Department of Molecular and Experimental Medicine, The Scripps Research Institute, La Jolla, CA

Teri Klein

Pharmacogenomics (PGx) decision support and return of results is an active area of genomic medicine implementation at many health care organizations and academic medical centers. The Clinical Pharmacogenetics Implementation Consortium (CPIC) has established guidelines surrounding gene-drug pairs that can and should lead to prescribing modifications based on genetic variant(s). One of the challenges in implementing PGx is extracting genomic variants and assigning haplotypes (including star-alleles) from genetic data derived from sequencing and genotyping technologies in order to apply the prescribing recommendations of CPIC guidelines. In a collaboration between the PGRN Statistical Analysis Resource (P-STAR), the Pharmacogenomics Knowledgebase (PharmGKB), the Clinical Genome Resource (ClinGen), and CPIC, we are developing a software tool to extract all variants from CPIC level-A genes, with the exception of G6PD and HLA, from a genetic dataset resulting from sequencing or genotyping technologies (represented as a .vcf), interpret the variant alleles, infer diplotypes, and generate an interpretation report based on CPIC guidelines. The CPIC pipeline report can then be used to inform prescribing decisions. We assembled a focus group of thought leaders in PGx to brainstorm the issues and to design the software pipeline. We hosted a one-week Hackathon at PharmGKB at Stanford University to bring together computer programmers with scientific curators to implement the first version of this tool. Through this process, we have uncovered many of the challenges surrounding PGx implementation. For example, the inference of diplotypes is challenging for several CPIC level-A genes. This software pipeline will be made available under the Mozilla Public License (MPL 2.0) and disseminated on GitHub for the scientific and clinical community to test, explore, and improve. PharmCAT will give sites implementing PGx a way to more consistently interpret genomic results and link those results to published clinical guidelines. Furthermore, we are assembling (and will be maintaining) the translation tables that underlie the tool, which will significantly reduce the effort required to implement PGx clinically and ensure more uniform interpretations of PGx knowledge. As precision medicine continues to move into clinical practice, implementation workflows for PGx, like PharmCAT, would enable standardized and consistent implementation of PGx genes.

107

PCSK9 MODULATING VARIANTS IN FAMILIAL HYPERCHOLESTEROLEMIA

Sarathbabu Krishnamurthy1, Diane Smelser1, Manickam Kandamurugu1, Joseph Leader1, Noura S. Abul-Husn2, Alan R. Shuldiner2, David H. Ledbetter1, Frederick E. Dewey2, David J. Carey1, Michael F. Murray1, Raghu P.R. Metpally1

1Geisinger Health System; 2Regeneron Genetics Center

Sarathbabu Krishnamurthy

BACKGROUND: Highly penetrant autosomal dominant familial hypercholesterolemia (FH) is known to be caused by pathogenic loss of function (LOF) variants in LDLR and gain of function variants in PCSK9 and APOB genes. In addition to the causative role of PCSK9 in FH, PCSK9 LOF variants are associated with lowering of serum low density lipoprotein cholesterol (LDL-C) and total cholesterol. The aims of this study were to 1) identify rare novel PCSK9 gene variants that lead to complete or partial loss of protein function in the DiscovEHR cohort, 2) explore the prevalence of PCSK9 LOF variants in a subset of FH patients, and 3) examine whether FH patients carrying PCSK9 LOF variants show an association with lower plasma LDL-C and cardiovascular risk.

METHODS: We analyzed whole exome sequences from 51,289 individuals in the DiscovEHR cohort who consented to participate in the Geisinger Health System’s MyCode Community Health Initiative. Rare missense and predicted loss of function (pLOF) coding variants in PCSK9 were identified by integrating bioinformatics and evaluating LDL-C and total cholesterol measures from the electronic health records (EHR).

RESULTS: In the overall DiscovEHR cohort, we identified 20 missense and 13 pLOF (2 splice donor, 6 stop gained and 5 frameshift) rare variants in PCSK9, including 15 novel variants, that were associated with lower LDL-C and total cholesterol levels. LDL-C in pLOF carriers was significantly lower than in missense carriers with presumed partial loss of function (p<0.0012). Patients with PCSK9 rare missense variants with presumed partial LOF, or with LOF variants, had a significant reduction in the incidence of coronary events compared to the control group (p<0.0001). In FH patients, the LDL-lowering PCSK9 R46L variant, previously reported at 3% prevalence, was found to be enriched at 9.6% and was associated with lower LDL-C compared to FH patients not carrying an R46L allele. A novel PCSK9 missense variant (G316S) was also present in FH patients with a prevalence of 0.8% and also showed an LDL-lowering phenotypic effect in an imputed family pedigree.

CONCLUSIONS: Overall, 11.8% of the FH patients in the DiscovEHR cohort were identified as also carrying a PCSK9 variant that modulates their LDL-C and serum cholesterol levels.

108

INTEGRATIVENETWORKANALYSISOFPROSTATETISSUELINCRNA-MRNAEXPRESSIONPROFILESREVEALSPOTENTIALREGULATORYMECHANISMSOF

PROSTATECANCERRISKLOCI

NicholasB.Larson1,ShannonMcDonnell1,ZachFogarty1,MelissaLarson1,JohnCheville2,ShaunRiska1,SaurabhBaheti1,AshaA.Nair1,DanielO’Brien1,JaimeDavila1,DanielSchaid1,StephenN.

Thibodeau21DepartmentofHealthSciencesResearch,MayoClinic,Rochester,MN;2Departmentof

LaboratoryMedicineandPathology,MayoClinic,Rochester,MN

Nicholas Larson

Large-scale genome-wide association studies have identified 146 loci associated with risk of developing prostate cancer (PRCA). However, most of these loci do not lie in close proximity to protein coding genes and are presumed to be regulatory in nature. Downstream regulation of protein coding genes related to PRCA development may be mediated by cis-acting regulation of nearby transcripts, also known as cis-mediated trans-eQTLs. This cis-mediator causal relationship comprises a regulatory variant, a nearby cis-regulated gene, and the downstream regulated trans target gene. Cis-mediators may include transcription factors, signaling proteins, and long intergenic non-coding RNAs (lincRNAs). LincRNAs correspond to a host of regulatory functions such as chromatin remodeling and transcriptional co-activation, and have previously been identified as diagnostic and prognostic biomarkers for a number of cancers. However, their role in cancer development and progression is poorly understood. To explore the hypothesis that cis-mediated trans-eQTLs may play a role in PRCA risk, we leveraged an eQTL dataset of 471 samples of normal prostate tissue from prostate/bladder cancer patients with available RNA-Seq and imputed Illumina Infinium 2.5M genotype data. We first conducted an initial transcriptome-wide eQTL screening of all lincRNAs and mRNAs with 8,073 SNPs in high linkage disequilibrium (r² > 0.5) with previously identified PRCA risk-associated variants, identifying approximately 5000 transcripts (FDR < 0.10) to be putatively associated (cis or trans). We then constructed an undirected Gaussian graphical regulatory network from the expression profiles of this transcript subset, identifying 87,468 connections. To identify candidate cis-mediator node-pairs in the expression network, we isolated a subset of cis-associated transcripts (lincRNA or mRNA) at a strict Bonferroni significance threshold. We then identified all mRNA nodes connected to these cis-nodes that were distal to the cis-variant (> 1 Mb) and had evidence of a trans-association with the cis-variant (P < 1E-04), resulting in 9 candidate cis-mediator trios. Finally, we applied causal mediation analysis to test the proportion of the trans-association that is mediated by the cis-regulated transcript, resulting in 7/9 significant cis-mediator relationships. Transcription factor HNF1B was identified to be a significant mediator in the trans-associations between rs11263762 and three mRNAs: SRC, MIA2, and SEMA6A. All three exhibited concomitant upregulation with HNF1B. Notably, HNF1A has been shown to stimulate SRC expression via an alternative promoter, while MIA2 is also a known HNF1A target. Dysregulation of SEMA6A has been observed in PRCA metastases and plays a potential role in angiogenesis interacting with VEGFR2. MSMB and NDRG1 both demonstrate androgen-stimulated expression in prostate tissue, and indicated a recessive pattern of expression dysregulation with rs10993994. Despite a small sample size, we replicated multiple trans-eQTLs from these cis-mediator trios in the GTEx prostate tissue eQTL dataset (P < 0.05). Together, our findings suggest dysregulation of RNA expression may play a role in genetic predisposition to PRCA.
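
The "proportion mediated" idea in the final analysis step can be illustrated with a small sketch. The abstract does not say which mediation estimator was used; the code below swaps in the classical linear product-of-coefficients decomposition (total effect = direct effect + indirect effect) on simulated data. The function names, effect sizes, and simulated genotypes are all assumptions made for illustration.

```python
import random
from statistics import mean

def cov(u, v):
    """Sample covariance of two equal-length sequences."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

def mediation_effects(snp, cis_expr, trans_expr):
    """Return (total, direct, indirect) linear effects of snp on trans_expr.

    total    : slope of trans_expr ~ snp
    direct   : slope of snp in trans_expr ~ snp + cis_expr
    indirect : a * b, with a from cis_expr ~ snp and b the cis_expr slope
               in trans_expr ~ snp + cis_expr
    """
    vx, vm = cov(snp, snp), cov(cis_expr, cis_expr)
    cxm = cov(snp, cis_expr)
    cxy = cov(snp, trans_expr)
    cmy = cov(cis_expr, trans_expr)
    denom = vx * vm - cxm ** 2
    total = cxy / vx
    a = cxm / vx
    direct = (cxy * vm - cmy * cxm) / denom
    b = (cmy * vx - cxy * cxm) / denom
    return total, direct, a * b

# Simulated (hypothetical) trio: genotype dosage -> cis transcript -> trans transcript.
random.seed(0)
n = 471  # same sample count as the abstract; the data themselves are simulated
snp = [float(random.choice((0, 1, 2))) for _ in range(n)]
cis_expr = [0.8 * g + random.gauss(0, 1) for g in snp]
trans_expr = [0.6 * m + 0.3 * g + random.gauss(0, 1) for g, m in zip(snp, cis_expr)]

total, direct, indirect = mediation_effects(snp, cis_expr, trans_expr)
proportion_mediated = indirect / total
print(f"proportion mediated: {proportion_mediated:.2f}")
```

In ordinary least squares the identity total = direct + indirect holds exactly, so the proportion mediated is simply the indirect share of the total effect. A full causal mediation analysis would also provide confidence intervals and significance tests, which this sketch omits.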

109

INTEGRATED ANALYSIS OF GENOMICS, PROTEOMICS, AND PHOSPHOPROTEOMICS IN CELLS AND TUMOR SAMPLES

Jason E. McDermott¹, Tao Liu¹, Samuel Payne¹, Vladislav Petyuk¹, Richard Smith¹, Philipp Mertins², Steven Carr², Karin Rodland¹

¹Pacific Northwest National Laboratory, ²Broad Institute

Jason McDermott

As part of the Clinical Proteomic Tumor Analysis Consortium (CPTAC), we have recently published the first large-scale proteomic and phosphoproteomic analysis of high-grade serous ovarian tumors. We observed that phosphorylation status was an excellent indicator of pathway activity and could discriminate between patient survival times. In the current work we have combined this data with comparable data from breast cancer tumors and cancer cell lines treated with kinase inhibitors, to answer several fundamental questions about the role of phosphorylation in cellular processes and cancer. The total dataset comprised over 150 samples with very deep proteomic coverage (> 20,000 phosphopeptides confidently identified). We first found that the correlation between kinase protein abundance and abundance of phosphorylated target peptides was very low, indicating that kinase abundance is not a good proxy for phosphorylation status overall. However, highly correlated kinase-substrate pairs were significantly more likely to be true relationships (from existing knowledge), demonstrating that this method could be used to predict novel kinase targets in some cases. We used this analysis to identify several novel kinase-substrate relationships that were differential between tumor subtypes, and that correlated with pathways where phosphorylation was affected by drug treatment. These relationships are currently under investigation as potential novel targets for therapeutic intervention. To better analyze cancer-relevant pathway activity we developed a novel approach that characterizes correlation, differential abundance, and statistical interactions between components to analyze multiple omics types in the context of signaling and functional pathways. We used this approach, called the Layered Enrichment Analysis of Pathways (LEAP), to identify active pathways in molecular subtypes of ovarian and breast cancer, and several novel subpopulations of patients displaying uniquely dysregulated pathways. Our results show that integration of multiple omics types has great potential in the area of development of novel therapeutic approaches for personalized medicine.

110

NETDX: PATIENT CLASSIFICATION USING INTEGRATED PATIENT SIMILARITY NETWORKS

Shraddha Pai, Shirley Hui, Ruth Isserlin, Hussam Kaka, Gary D. Bader

The Donnelly Centre, University of Toronto

Shraddha Pai

Patient classification has widespread biomedical and clinical applications, including diagnosis, prognosis, disease subtyping and treatment response prediction. A general purpose and clinically relevant prediction algorithm should be accurate, generalizable, able to integrate diverse data types (e.g. clinical, genomic, metabolomic, imaging), able to handle sparse data, and intuitive to interpret. We describe netDx, a supervised patient classification framework based on patient similarity networks, that meets the above criteria (Ref 1). netDx models input data as patient networks, and uses network integration and machine learning for feature selection. We demonstrate the utility of netDx by integrating gene expression and copy number variants to classify breast cancer tumours as being of the Luminal A subtype (N=348 tumours; Ref 2). Using gene expression data, netDx performed as well as or better than established state-of-the-art machine learning methods, achieving a mean accuracy of 89% (2% s.d.) in classifying Luminal A. In the second application, we predict case/control status in autism spectrum disorders based on the occurrence of rare copy number deletions in metabolic pathways (N=3,291 patients; Ref 3); this predictor achieved better performance than previously published methods. netDx uses pathway features to aid biological interpretability, and results can be visualized as an integrated patient similarity network to aid clinical interpretation. Upon publication, netDx software will be made publicly available via GitHub; the software provides worked examples and easy-to-use functions for design of custom predictor workflows. More at http://netdx.org

References:
1. netDx preprint: http://dx.doi.org/10.1101/084418
2. The Cancer Genome Atlas (2012) Nature 490:61.
3. Pinto et al. (2014). Am J Hum Gen. 94(5):677.

111

PREVALENCE AND DETECTION OF LOW-ALLELE-FRACTION VARIANTS IN CLINICAL CANCER SAMPLES

Hyun-Tae Shin¹,², Jae Won Yun¹,², Nayoung K.D. Kim¹, Yoon-La Choi²,³, Woong-Yang Park¹,²,⁴, Peter J. Park⁵

¹Samsung Genome Institute, Samsung Medical Center, Seoul, Korea; ²Samsung Advanced Institute of Health Science and Technology, Sungkyunkwan University, Seoul, Korea; ³Department of Pathology & Translational Genomics, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea; ⁴Department of Molecular Cell Biology, Sungkyunkwan University School of Medicine, Seoul, Korea; ⁵Department of Biomedical Informatics, Harvard Medical School, Boston, MA

Hyun-Tae Shin

Clinical application of sequencing-based assays requires high sensitivity and specificity for detecting genomic alterations. Our analysis of more than 5000 cancer samples reveals that a significant fraction of clinically-actionable somatic variants may have low variant allele fractions (VAF), indicating the importance of very high coverage sequencing for these patients. As a case study, we describe refractory cancer patients with clinical response to therapies that target low VAF alterations.
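
To illustrate why low-VAF variants call for very high coverage, the following minimal sketch computes the chance of observing a minimum number of variant-supporting reads at a given sequencing depth. It assumes reads are independent draws from a binomial distribution and ignores sequencing error; the function name, the 2% VAF, and the five-read threshold are illustrative assumptions rather than values from the study.

```python
from math import comb

def detection_probability(depth, vaf, min_alt_reads):
    """P(at least min_alt_reads variant reads) when reads ~ Binomial(depth, vaf)."""
    p_too_few = sum(
        comb(depth, k) * vaf**k * (1 - vaf) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_too_few

# Hypothetical example: a 2% VAF variant, requiring at least 5 supporting reads.
for depth in (100, 500, 1000):
    print(depth, round(detection_probability(depth, 0.02, 5), 3))
```

Under these assumptions the detection probability rises steeply with depth, which is the quantitative intuition behind the abstract's call for very high coverage.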

112

A METHYLATION-TO-EXPRESSION FEATURE MODEL FOR GENERATING ACCURATE PROGNOSTIC RISK SCORES AND IDENTIFYING DISEASE TARGETS

Jeffrey A. Thompson¹, Carmen J. Marsit²

¹Dartmouth College, ²Emory University

Jeffrey Thompson

Many researchers now have available multiple high-dimensional molecular and clinical datasets when studying a disease. As we enter this multi-omic era of data analysis, new approaches that combine different levels of data (e.g. at the genomic and epigenomic levels) are required to fully capitalize on this opportunity. In this work, we outline a new approach to multi-omic data integration, which creates a model of methylation dysregulation and its effect on gene expression and then combines this molecular information with clinical predictors as part of a single analysis to create a prognostic risk score for clear cell renal cell carcinoma. The approach integrates data in multiple ways and yet creates models that are relatively straightforward to interpret and have a high level of performance. Over 100 random splits of the data into training and testing sets, our model had the highest median C-index of any method we tried, at 0.792. Furthermore, we demonstrated that our molecular risk predictor is independent of clinical covariates and that the combined model results in statistically significantly higher accuracy than either data type alone. Additionally, the proposed process of data integration itself captures relationships in the data that represent highly disease-relevant functions. The gene signature we identify for clear cell renal cell carcinoma prognosis is enriched for genes that are central nodes in a protein-protein interaction network associated with the JAK-STAT signaling cascade, which itself is a known factor in kidney cancer progression. Our signature is also enriched for genes in pathways involved in immune response, which are increasingly targeted by novel cancer therapies. We call this model the methylation-to-expression feature model (M2EFM). Although one of the other approaches we considered also resulted in a highly accurate model, M2EFM performed better with a far more parsimonious model that sheds light on the potential relationship between abnormal gene regulation and cancer prognosis. Given our results, we think that further development of this approach is warranted.
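
For readers unfamiliar with the C-index used to score the model, the sketch below shows one common way to compute it (Harrell's concordance index) for right-censored survival data. This is a generic illustration, not the authors' evaluation code; the function name and the toy data are assumptions, and ties in survival time are simply treated as not comparable.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: fraction of comparable pairs ranked correctly by risk.

    A pair (i, j) is comparable when subject i had an observed event
    (events[i] is truthy) strictly before subject j's follow-up time.
    The pair is concordant when i, who failed earlier, has the higher risk
    score; tied risk scores count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored subjects cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Toy data (hypothetical): higher risk score should mean shorter survival.
times = [2.0, 5.0, 7.0, 9.0]
events = [1, 1, 0, 1]          # 0 marks a censored observation
risk = [0.9, 0.6, 0.7, 0.1]
print(concordance_index(times, events, risk))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives context for the median of 0.792 reported above.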

113

CYP2D6 DIPLOTYPE CALLING FROM WGS USING ASTROLABE: UPDATE

Andrea Gaedigk¹, Greyson P. Twist², Sarah Soden², Emily G. Farrow², Neil A. Miller²

¹Division of Clinical Pharmacology & Therapeutic Innovation, Children's Mercy, Kansas City, School of Medicine, University of Missouri-Kansas City; ²Center for Pediatric Genomic Medicine, Children's Mercy, Kansas City

Greyson Twist

Background: To facilitate haplotype calling and translation into phenotype, we have previously developed a probabilistic scoring system, Astrolabe (initially called Constellation; Twist et al 2016, Gen Med 1:15007), enabling automated CYP2D6 diplotype calling from whole genome sequencing. We have implemented a series of improvements to increase call accuracy as well as ease of use.

Methods: The study was approved by the Institutional Review Board of Children's Mercy Kansas City and included a total of 85 subjects (7 HapMap; 78 patients/parents). WGS data were reanalyzed with the DRAGEN Bio-IT processor (Edico Genome) to improve the quality of variation calls. The Astrolabe CYP2D6 allele definition table was expanded to include a) additional variants available through the P450 Nomenclature Database; b) variants characterized by our laboratory, but not available through the Nomenclature Database; c) resequencing of some alleles (e.g. *10, *17) for which only exons are annotated by the Nomenclature Database. Programming errors in the scoring algorithm were repaired and unit tested, and support was added for a broad range of variant file input types (vcf, gvcf, tabix, .gz). Improvements also include versioning of the Astrolabe tool and the nomenclature data from which calls are generated. To account for haplotype and diplotype combinations not observed in our sample set, simulations of all possible diplotype combinations were performed using the ART read simulator and DRAGEN analysis pipeline. Astrolabe is available at https://www.childrensmercy.org/genomesoftwareportal/.

Results: To maximize Astrolabe call accuracy, we removed CYP2D6*1E, *3B, *4A-L, *4N, *6D, *10C-D, and *45B from the call set, because of incomplete allele definitions (based on exons only), or SNP(s) that are not unique to an allele. For example, 1749A>G is part of the CYP2D6*3B and *103 definitions, but also appears to be present on some *1 subvariants. Likewise, 3288A>G is not limited to CYP2D6*6D as implied by the nomenclature database, thus causing erroneous Astrolabe calls. Calls with our revised definitions were compared with those obtained by genotyping. Astrolabe also accurately identified subjects with copy number variations including the CYP2D6*5 deletion (n=5) and gene duplications (n=2). Also, increased variant calling accuracy of the DRAGEN pipeline improved the calling of several samples (n=). Astrolabe correctly called 7731/8128 simulated diplotypes (95% recall; 133 missed and 264 multiple calls). Of the missed calls, 124 were due to *38 called as *1.

Discussion: The series of improvements to Astrolabe increased call accuracy and minimized the number of no calls. Phenotype prediction based on Astrolabe was superior to that derived from a limited genotype panel. Continued refinement of existing allele definitions and the inclusion of novel haplotype definitions will further improve the Astrolabe tool. We are currently applying Astrolabe to other NGS datasets including exomes and targeted NGS panels.
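
The recall figure in the Results can be checked directly from the reported counts. The short calculation below uses only numbers quoted in the abstract; the variable names are illustrative.

```python
correct, missed, multiple = 7731, 133, 264  # counts quoted in the Results

total = correct + missed + multiple         # all simulated diplotypes
recall = correct / total                    # fraction called correctly
missed_as_star1 = 124 / missed              # missed calls where *38 was called as *1

print(total, f"{recall:.1%}", f"{missed_as_star1:.1%}")
# prints: 8128 95.1% 93.2%
```

The three categories sum to the 8128 simulated diplotypes, and a single confusion (*38 called as *1) accounts for most of the missed calls.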

114

INTEGRATION, INTERPRETATION AND DISPLAY OF MULTI-OMIC DATA FOR PRECISION MEDICINE

David S. Wishart¹, Ana Marcu¹, An Chi Guo¹, Ash Anwar², Solveig Johannessen³, Craig Knox⁴, Michael Wilson⁴, Christoph H. Borchers⁵, Pieter Cullis⁶, Robert Fraser²

¹University of Alberta, ²Molecular You Inc., ³Educe Design Inc., ⁴OMx Inc., ⁵University of Victoria, ⁶University of British Columbia

David Wishart

The goal of precision medicine is to use advanced multi-omic technologies to improve the accuracy of medical diagnoses and enhance the individualization of medical treatment. The fundamental challenge in precision medicine is not in the measurement or collection of multi-omic data but in its delivery. In particular, the integration, interpretation and display of multi-omic data have proven to be problematic. Here we describe some of our experiences in tackling this problem and outline a number of important findings that we believe are worth sharing. Our most important finding was the need to use high quality, quantitative ’omics data. Measuring absolutely quantitative ’omics data ensures greater reproducibility and permits direct comparisons to well-established clinical reference values. Several ’omics laboratories offering quantitative services have been identified and these are described here. Second, we discovered that custom databases containing biomarker-disease data are essential. Very few of these kinds of databases exist, but they are necessary for the comparison and full integration of multi-omic data. In particular, they provide the information needed to integrate multi-omic measures and to determine disease risk. A brief description of a few of these biomarker-disease databases is provided. Third, we discovered that color-coded graphs, which are hyperlinked to detailed textual explanations, are necessary for the facile interpretation of the multi-omic data by both patients and physicians. An example of a well-designed, web-enabled “dashboard” is shown to highlight these findings. Finally, we found that comprehensive databases of actionable responses must be prepared so that detailed, customizable medical, lifestyle, diet or pharmacological guidance can be provided to treat or prevent conditions detected by these multi-omic measurements. Examples of several omics-derived, actionable responses are provided to clarify this point. These findings, along with several associated software tools and databases, have recently been integrated into an automatic workflow that allows a wide range of multi-omic measurements to be integrated, interpreted and displayed for precision or personalized medicine applications.

115

BIOTHINGS APIS: LINKED HIGH-PERFORMANCE APIS FOR BIOLOGICAL ENTITIES

Jiwen Xin¹, Cyrus Afrasiabi¹, Sebastien Lelong¹, Ginger Tsueng¹, Sean D. Mooney², Andrew I. Su¹, Chunlei Wu¹

¹The Scripps Research Institute, ²The University of Washington

Chunlei Wu

The accumulation of biological knowledge and the advance of web and cloud technology are growing in parallel. Recently, many biological data providers have started to provide web-based APIs (Application Programming Interfaces) for accessing data in a simple and reliable manner, in addition to the traditional raw flat-file downloads. Web APIs provide many benefits over traditional file downloads. For instance, users can request specific data such as a list of genes of interest without having to download the entire data set, thereby obtaining the latest data on demand and reducing computation and data transfer times. This means that programmers can spend less time on wrangling data, and more time on analysis and discovery. Building and deploying scalable and high-performance web APIs requires sophisticated software engineering techniques. We previously developed high-performance and scalable web APIs for gene and genetic variant annotations, accessible at MyGene.info and MyVariant.info. These two services are a tangible implementation of our expertise and collectively serve over 4 million requests every month from thousands of unique users. Crucially, the underlying design and implementation of these systems are in fact not specific to genes or variants, but rather can be easily adapted to other biomedical data types such as drugs, diseases, pathways, species, genomes, domains and interactions. We are currently expanding the scope of our platform to other biological entities. Collectively, we refer to them as “BioThings APIs” (http://biothings.io). We also applied JSON-LD (JSON for Linking Data) technology in the development of BioThings APIs. JSON-LD provides a standard way to add semantic context to the existing JSON data structure, for the purpose of enhancing the interoperability between APIs. We have demonstrated the applications of JSON-LD with BioThings APIs, including data discrepancy checks as well as the cross-linking between APIs.
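
As a rough sketch of the two ideas above (requesting only the fields of interest, and attaching semantic context to plain JSON), the example below builds a query URL and wraps a record with a JSON-LD `@context`. The abstract does not spell out endpoints or parameters: the `https://mygene.info/v3/query` endpoint and the `q`, `fields`, and `species` parameters are taken from the service's public documentation as best recalled and should be checked against it, while the helper names, the sample record, and the context mapping are purely hypothetical. No network request is made.

```python
from urllib.parse import urlencode

# Assumed endpoint (not given in the abstract); verify against the MyGene.info docs.
MYGENE_QUERY_ENDPOINT = "https://mygene.info/v3/query"

def build_gene_query_url(symbol, fields=("symbol", "name", "entrezgene"), species="human"):
    """Build a URL that asks only for selected fields of one gene symbol."""
    params = {
        "q": f"symbol:{symbol}",
        "fields": ",".join(fields),
        "species": species,
    }
    return f"{MYGENE_QUERY_ENDPOINT}?{urlencode(params)}"

def with_jsonld_context(record, context):
    """Return a copy of a JSON record with a JSON-LD @context attached."""
    return {"@context": context, **record}

url = build_gene_query_url("CDK2")

# Hypothetical record and context, for illustration only.
record = {"symbol": "CDK2", "entrezgene": 1017}
context = {"entrezgene": "http://identifiers.org/ncbigene/"}
linked = with_jsonld_context(record, context)

print(url)
print(linked)
```

The `@context` tells a consumer how to interpret the `entrezgene` key as an identifier in a shared namespace, which is what allows records from different APIs to be cross-linked or compared.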

116

SINGLE-CELL ANALYSIS AND MODELLING OF CELL POPULATION HETEROGENEITY

POSTER PRESENTATIONS

117

SINGLE CELL SIGNALING STATES REVEAL INDUCTION OF NON-GENETIC VARIATION IN RESISTANCE TO TRAIL-INDUCED APOPTOSIS

Reema Baskar, Harris Fienberg, Garry Nolan, Sean Bendall

Stanford University

Reema Baskar

TNF alpha-related apoptosis-inducing ligand (TRAIL) has been shown to specifically target cancer cells; however, rampant resistance has curtailed its efficacy as a drug. Cell-to-cell variation has been previously linked to resistance to TRAIL-induced apoptosis. We further investigate non-genetic phenotypic variation as a novel mode of drug resistance. Using mass cytometry, we captured high-dimensional, single-cell signaling states of different cancer types over the course of TRAIL treatment. For the first time, we provide a comprehensive single cell overview of TRAIL signaling dynamics and provide population metrics to quantify heterogeneity within resistance phenotypes. We demonstrate that while all cells respond to TRAIL, a subset of them persist in transient resistant states and do not progress to apoptosis. Our methods show correlation between heterogeneity of response to TRAIL and persistence of non-apoptotic, viable cancer cells in drug. We also show that combinatorial therapies designed to inhibit implicated pathways in conserved resistant states do not eradicate resistance and in fact can induce new states of resistance. This study presents experimental and computational tools to investigate non-genetic phenotypic variation as a novel mode of drug resistance in cancer and demonstrates their utility in understanding resistance to TRAIL-induced apoptosis.

118

A NOVEL K-NEAREST NEIGHBORS APPROACH TO COMPARE MULTIPLE BIOLOGICAL CONDITIONS IN SINGLE CELL DATA

Tyler J. Burns¹, Garry P. Nolan², Nikolay Samusik²

¹Stanford University School of Medicine, Dept. of Cancer Biology; ²Stanford University School of Medicine, Baxter Laboratory for Stem Cell Biology

Tyler Burns

High dimensional single-cell data is routinely visualized in two dimensions using dimension reduction algorithms like t-SNE, Principal Components Analysis (PCA), or force-directed graphs. When comparing levels of intracellular proteins in basal versus perturbed cells, clustering must be used to visualize changes in specific markers in a single graph. However, discretizing a dataset does not allow one to understand subtle, rare, and/or continuous biological changes across the original manifold. Herein, we present an algorithm that represents each cell’s information content as its average across k-nearest neighbors. This allows for comparisons to be made between biological conditions on a per-cell basis. We use this to produce detailed t-SNE maps depicting biological change, and correlation analysis to enumerate signaling responses to perturbation.
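
One way to read the per-cell comparison described above is sketched below: for each cell, find its k nearest neighbors in marker space, then compare the mean of a signaling marker among perturbed neighbors with the mean among basal neighbors. The abstract gives no implementation details, so the brute-force Euclidean search, the difference-of-means statistic, and all names here are assumptions made for illustration.

```python
from statistics import mean

def squared_distance(a, b):
    """Squared Euclidean distance between two marker vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def knn_condition_difference(coords, values, perturbed, k):
    """Per-cell difference (perturbed minus basal) of a marker, averaged over k nearest neighbors.

    coords    : marker vectors used to define neighborhoods (pooled across conditions)
    values    : the signaling marker being compared, one value per cell
    perturbed : True for cells from the perturbed condition, False for basal
    Returns None for a cell whose neighborhood lacks one of the two conditions.
    """
    results = []
    for center in coords:
        order = sorted(range(len(coords)),
                       key=lambda j: squared_distance(center, coords[j]))
        neighbors = order[:k]  # includes the cell itself (distance 0)
        pert = [values[j] for j in neighbors if perturbed[j]]
        basal = [values[j] for j in neighbors if not perturbed[j]]
        results.append(mean(pert) - mean(basal) if pert and basal else None)
    return results

# Toy data (hypothetical): two well-separated populations along one marker.
coords = [(0.0,), (0.1,), (0.2,), (0.3,), (10.0,), (10.1,), (10.2,), (10.3,)]
perturbed = [True, False, True, False, True, False, True, False]
values = [1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # only the first population responds
print(knn_condition_difference(coords, values, perturbed, k=4))
```

Because every cell receives its own value, the result can be overlaid on a t-SNE map without first discretizing the data into clusters. A practical implementation would replace the quadratic brute-force search with an indexed nearest-neighbor search.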

119

SINGLE-CELL RNA SEQUENCING IN PRIMARY GLIOBLASTOMA: IMPROVING ANALYSIS OF HETEROGENEOUS SAMPLES BY INCORPORATING QUANTIFICATION OF UNCERTAINTY

Wendy Marie Ingram, Debdipto Misra, Nicholas F. Marko, Marylyn Ritchie

Geisinger Health System

Wendy Ingram

Background: Glioblastoma (GBM) is the most common and deadly brain cancer in adults. The associated lethality may be attributable to the intrinsic heterogeneity of micro-invasive tumor cells, some of which are unavoidably left behind following tumor resection. The transcriptomic heterogeneity may contribute to the survival and subsequent proliferation of a small subset of cells that are resistant to radiation and chemotherapy. It has long been hypothesized that investigations into these tumors at a single cell level will allow for better molecular understanding of treatment resistance and the development of novel therapeutic approaches. Recently, advances in single cell capture and sequencing technology have become available and allow for these studies to be conducted. However, there are many technical and computational challenges inherent to single cell transcriptomics that are not addressed by traditional RNA-seq analysis tools. These challenges include uncertainty of technical and biological variance and must be carefully considered in order for biologically and therapeutically relevant conclusions to be reached.

Methods: Tumor tissue from two GBM patients undergoing surgical resection as part of standard of care therapy was collected at the time of surgery. We used the Fluidigm C1 microfluidics platform to capture single cells followed by RNA sequencing (RNA-seq) of these cells and a bulk population of ~10,000 cells from each tumor. We compared two different transcriptomic alignment tools, Bowtie and kallisto, and analyzed the single cell transcriptional heterogeneity of cells within and between tumors using the recently developed analysis tool, sleuth. To the best of our knowledge, we are the first to utilize this single cell capture method and perform single cell RNA-seq analysis using the newly developed kallisto and sleuth programs for primary GBM tissue samples.

Results: We show that the Fluidigm C1 microfluidics single cell capture method produces high quality transcriptomic material for RNA-seq and may have benefits over alternative methods (e.g. fluorescence-activated cell sorting) such as shorter preparation time. The kallisto-sleuth analysis programs provide improved estimation of gene expression variability and more reliable clustering of single cells by leveraging the unique features of equivalency groups and bootstrap estimates of kallisto. Cluster analysis demonstrates that certain cells from both tumors cluster together and share some common expression patterns, but the remaining cells cluster in tumor-specific groups or do not group with other cells. We observe marked intertumor and intratumor transcriptional variability and note that average expression from single cells does not reliably correlate with the bulk cell RNA-seq abundance estimates. Taken together, we have shown that the combination of Fluidigm C1 and the kallisto-sleuth analysis programs proves to be a useful and reliable method to obtain and analyze high quality single cell RNA-seq data for the investigation of primary tumor tissues.

120

REGISTRATION OF FLOW CYTOMETRY DATA USING SWIFT CLUSTER TEMPLATES TO REMOVE CHANNEL-SPECIFIC OR CLUSTER-SPECIFIC VARIATION

Jonathan A. Rebhahn¹, Sally A. Quataert¹, Gaurav Sharma², Tim R. Mosmann¹

¹Center for Vaccine Biology and Immunology, University of Rochester Medical Center; ²Department of Electrical and Computer Engineering, University of Rochester

Tim Mosmann

Standardization between flow cytometry experiments performed at different times is difficult because variations in cell parameters can be caused by many factors, including changes in antibody reagents, staining protocols, cell handling, different cytometers, and cytometer settings such as photomultiplier amplification voltages. These variations may overwhelm the genuine biological differences being investigated, such as genetic or disease-specific variations between subjects. Technical variations can be partly reduced by manually adjusting analysis gates, but this is subjective and time-consuming. Previous methods for semi-automated adjustment have relied on histogram peaks or manual gating to identify anchor populations. We have now developed fully-automated methods for registering flow cytometry samples, i.e. normalizing the fluorescence intensity of each cell in all channels. We take advantage of the high-resolution cluster templates derived by clustering reference samples by the SWIFT algorithm. These templates represent Gaussian model descriptions of the multidimensional data. If samples to be registered are at least moderately similar to the target/reference sample, assignment of the test sample to the template results in most cells being assigned to the appropriate cluster, but clusters that have shifted in the test sample then have altered median values in one or more channels. This high-resolution positional information is used for two types of registration:

Rigid, or per-channel, registration compares cluster locations between the target and the test sample to be registered, and the best-fit registration adjustments are determined for each channel and applied incrementally, reassigning the cells at each step to improve the final fit. This objectively uses positional information from all clusters, regardless of cluster size variation, and successfully corrects global artifacts such as staining or cytometer settings that cause ‘batch’ differences between assay days.

Fluid, or per-cluster, registration calculates the registration adjustment required for each cluster in the test sample to overlap fully with its corresponding cluster in the reference sample. This registers clusters more completely, and can remove individual variation (due to e.g. genetic or disease-specific effects). Fluid registration removes most positional information; this is desirable if the main experimental outcome is expected to be variations of the number of cells of different types.

This method has been applied to datasets that include changes due to assay dates, flow cytometers, subjects, and sequential blood samples. Most variation occurred between cytometers and assay days, less between subjects, and the least between different bleeds from the same person. Registration substantially improved correlations between cluster medians. The number of cells per cluster also showed increased correlation, suggesting that unmodified samples assigned to the cluster templates sometimes had cells assigned to an inappropriate cluster. Thus the SWIFT cluster-based registration can improve subsequent flow cytometry analysis. Registered samples can be analyzed by a variety of manual or automated procedures.
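
The difference between the two registration modes can be made concrete with a simplified sketch. It assumes each cell already carries a cluster label from the template assignment, that corrections are additive offsets on the cluster medians, and that a single pass is enough; the incremental reassignment step described above is omitted, and none of the function names come from SWIFT itself.

```python
from statistics import median

def cluster_medians(cells, labels):
    """Per-cluster, per-channel medians as {cluster: [median for each channel]}."""
    by_cluster = {}
    for cell, label in zip(cells, labels):
        by_cluster.setdefault(label, []).append(cell)
    return {label: [median(channel) for channel in zip(*rows)]
            for label, rows in by_cluster.items()}

def rigid_register(cells, labels, ref_medians):
    """Per-channel registration: one offset per channel, shared by all clusters."""
    test_medians = cluster_medians(cells, labels)
    n_channels = len(cells[0])
    shift = [median(ref_medians[c][ch] - test_medians[c][ch] for c in test_medians)
             for ch in range(n_channels)]
    return [[value + s for value, s in zip(cell, shift)] for cell in cells]

def fluid_register(cells, labels, ref_medians):
    """Per-cluster registration: each cluster is moved onto its reference median."""
    test_medians = cluster_medians(cells, labels)
    return [[value + ref_medians[label][ch] - test_medians[label][ch]
             for ch, value in enumerate(cell)]
            for cell, label in zip(cells, labels)]

# Toy data (hypothetical): two clusters, two channels; cluster 1 carries an
# extra shift in the second channel on top of a global offset.
ref = {0: [0.0, 0.0], 1: [5.0, 5.0]}
cells = [[1.0, 2.0], [1.0, 2.0], [6.0, 9.0], [6.0, 9.0]]
labels = [0, 0, 1, 1]
print(rigid_register(cells, labels, ref))
print(fluid_register(cells, labels, ref))
```

In this toy case the rigid result still shows a residual difference between the clusters in the second channel, while the fluid result places both clusters exactly on the reference medians, mirroring the trade-off described above between preserving and removing cluster-specific positional information.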

121

WORKSHOP: NO BOUNDARY THINKING IN BIOINFORMATICS

POSTER PRESENTATION

122

ENABLING RICHER DATA INTEGRATION FOR GENOMIC EPIDEMIOLOGY

E. Griffiths¹, D. Dooley², C. Bertelli¹, J. Adam³, F. Bristow³, T. Matthews³, A. Petkau³, M. Courtot⁴, J.A. Carriço⁵, A. Keddy⁶, R. Beiko⁶, L.M. Schriml⁷, E. Taboada⁸, M. Graham³, G. Van Domselaar³, W. Hsiao², F. Brinkman¹

¹SFU, Burnaby, BC, Canada; ²BC Centre for Disease Control, Vancouver, BC, Canada; ³PHAC, Winnipeg, MB, Canada; ⁴EBI, Hinxton, Cambridge, UK; ⁵Univ. of Lisbon, Lisbon, Portugal; ⁶Dalhousie Univ., Halifax, NS, Canada; ⁷Univ. of Maryland School of Medicine, Baltimore, MD, USA; ⁸PHAC, Lethbridge, AB, Canada

Fiona Brinkman

One barrier to effectively capitalizing on whole genome sequence data is the lack of efficient, robust annotation and integration of associated contextual data (metadata). Whether the genomic sequence is human, microbial or from another organism, such contextual data is frequently too unorganized, in free text format, to enable effective integration for answering more sophisticated questions. Approaches to help overcome this barrier are illustrated here with the Integrated Rapid Infectious Diseases Analysis (IRIDA.ca) Project and Genomic Epidemiology Ontology (GenEpiO.org) Consortium. Microbial pathogen whole genome sequencing provides the highest resolution molecular “fingerprint” for infectious disease epidemiology and is transforming public health practice, enabling more rapid identification of disease outbreaks, their sources, and potential control measures. However, such microbial genomic data (like human ’omic data) must be combined with epidemiological/clinical/laboratory/other healthcare data (“contextual data”) to be meaningfully interpreted for clinical and public health questions/actions. Furthermore, information must be shared between different agencies to efficiently assess and manage risks to human health across jurisdictions. Currently, terminologies describing public health data cannot be easily mapped across functionally-similar software systems without intricate intervention by specialists, resulting in data exchange systems that are static and fragile. To promote efficient data exchange and intelligence sharing, we propose an intuitive platform for searching, identifying, and verifying the fundamental healthcare entity elements (ontology terms) to map to institutional application data formats, starting with genomic and public health contextual data. Key innovations are the proposed Genomic Epidemiology Entity Mart (GE2M) that allows users to inspect term definitions, labeling, and database cross references in a user-friendly format, plus a software system allowing different jurisdictions to use the terms suitable for them, essentially choosing from a “shopping cart” of options mapped between jurisdictions/organizations. A very preliminary prototype of this concept has been established as part of the IRIDA.ca project and the GenEpiO Consortium (a consortium of 70 researchers from 15 countries interested in contributing to this effort). We hypothesize that a common and accessible ontology entity mart can be developed, if appropriate tools for interfacing domain experts with this mart are developed, and the mart is first applied to practical microbial genomic epidemiology data sharing needs between select public health systems (with consultation involving a larger consortium). In addition, new genomic data visualization approaches are being developed for integration into the IRIDA software platform, to enable more interactive, flexible visualization of genomic data with different levels or views of contextual data (from finely detailed comparisons of genomic islands and other features between genomes, to examining genomic data in the context of geographical data). IRIDA is being used in Canada’s public health agency, and this open source software is also being installed in other countries interested in co-developing this resource and using a federated data sharing approach.

123

AUTHOR INDEX

A

Abrams, Zachary · 59
Abul-Husn, Noura S. · 107
Adam, J. · 122
Adams, Micah · 54
Aevermann, Brian · 37
Afrasiabi, Cyrus · 115
Agarwal, Vibhu · 17
Akbarian, Schahram · 72
Aldrich, Melinda C. · 20, 35
Alkan, Can · 77, 87
Alser, Mohammed · 77
Altman, Russ B. · 79, 90
Andreoletti, Gaia · 101
Andres-Terrè, Marta · 13
Ansel, Mark · 80
Anwar, Ash · 114
Armaselu, Bogdan · 18
Arunachalam, Harish Babu · 18
Ashley, Euan · 104
Aslam, Naureen · 68
Asmann, Yan W. · 85
Ayati, Marzieh · 67

B

Bader, Gary D. · 110
Baheti, Saurabh · 108
Bai, Yongsheng · 68
Bakken, Trygve · 37
Baskar, Reema · 117
Bauer, Christopher R. · 27
Beaulieu-Jones, Brett K. · 19
Bebek, Gurkan · 52
Beck, Andrew · 50
Beck, Mette · 28
Beiko, R. · 122
Bellovich, Keith · 70
Bendall, Sean · 117
Berens, Michael · 31
Berry, Gerald J. · 90
Bertelli, C. · 122
Best, Aaron A. · 2
Bhat, Zeenat · 70
Bichko, Dmitri · 76
Biernacka, Joanna M. · 85
Biggin, Mark D. · 64
Boespflug, Mathieu · 76
Boley, Nathan · 98
Bongen, Erika · 13
Borchers, Christoph H. · 114
Borecki, Ingrid · 34
Borrayo, Ernesto · 63
Bowden, Donald W. · 45
Bowerman, Nathan · 2
Breitenstein, Matthew K. · 96
Breitwieser, Gerda · 34
Brenner, Steven E. · 101
Brinkman, Benjamin H. · 97
Brinkman, F. · 122
Bristow, F. · 122
Bromberg, Yana · 69
Brosius, Frank C. · 70
Brown, Andrew J. Leigh · 83
Brubaker, Douglas · 52
Brunak, Soren · 28
Burns, Tyler J. · 118
Bustillo, Juan R. · 93

C

Cai, Guoshuai · 73
Calhoun, Vince D. · 9, 93
Cao, Mengfei · 3
Carey, David J. · 107
Carr, Steven · 109
Carriço, J.A. · 122
Carter, Lester G. · 106
Cederberg, Kevin · 18
Chan, Yu-Feng Yvonne · 23
Chance, Mark · 67
Chang, Rui · 11
Chasioti, Danai · 71
Chaudhary, Kumardeep · 74
Chen, Rong · 56
Chen, Yii-Der I. · 45
Cheung, Philip · 84
Cheville, John · 108
Chew, Guo-Liang · 64
Choi, Yoon-La · 111
Christiansen, Lena · 37
Clay, Alyssa I. · 96
Clemons, Paul A. · 31
Cline, Melissa · 15
Cohain, Ariella · 11
Cordero, Pablo · 38
Correa, Adolofo · 45
Costello, James C. · 60
Courtot, M. · 122
Cowen, Lenore J. · 3
Crawford, Dana C. · 20
Cullis, Pieter · 114

D

Daescu, Ovidiu · 18
Danaee, Padideh · 44
Darrow, Bruce · 22

124

Davila, Jaime · 108
Davis-Dusenbery, Brandi · 14
de Belle, J. Steven · 84
De, Subhajyoti · 88
Deisseroth, Cole A. · 13
DeJongh, Matthew · 2
Denny, Joshua · 35
de Vries, Edsko · 76
Dewey, Frederick E. · 34, 107
Dhruv, Harshil · 31
Diaz, Diana · 51
Diez-Fuertes, Francisco · 37
Dincer, Aslihan · 72
Disselkoen, Craig · 54
Divaraniya, Aparna A. · 11
Dominguez, Facundo · 76
Domselaar, G. Van · 122
Donato, Michele · 51
Dooley, D. · 122
Dougherty, Greg · 105
Draghici, Sorin · 51
Dudley, Joel T. · 11, 22, 72
Dunnenberger, H.M. · 106
Durmaz, Arda · 52

E

Eckel-Passow,Jeanette·105Egawa,Fumiko·33Empey,P.E.·106Ergin,Oguz·77Ertekin-Taner,Nilüfer·85Eskin,Eleazar·80

F

Fantl,WendyJ.·78Farber-Eger,Eric·20Farrow,EmilyG.·102,113Fienberg,Harris·117Fink,CrisG.·97Fink,Tobias·24,99Fogarty,Zach·108Foo,ChuanSheng·98Fornage,Myriam·45Franks,JenniferM.·73Frase,A.T.·106Fraser,Robert·114Fread,KristinI.·39Freedman,BarryI.·45Freimuth,R.R.·106Freimuth,Robert·105

G

Gadegbeku,Crystal·70Gaedigk,A.·106

Gaedigk,Andrea·81,102,113Gallion,Jonathan·29,103Gao,Chen·41Garmire,Lana·74Gavin,Davin·72Gelijns,Annetine·22Genes,Nicholas·23Ghaeini,Reza·44Ghose,Saugata·87Gipson,Debbie·70Giron,Emily·84Glicksberg,Benjamin·56Gliske,StephenV.·97Goldfeder,Rachel·104Gordon,A.·106Gosh,Debashis·88Graham,M.·122Gray,DanielH.·78Greenside,Peyton·98Griffiths,E.·122Groop,Leif·28Guney,Emre·12Guo,AnChi·114

H

Haidar,C.·106Hart,Steven·105Hassan,Hasan·77Hawkins,Jennifer·70Haynes,WinstonA.·13He,Dan·30He,Shuyao·55Hellwege,JacklynN.·45Henderson,TimA.D.·52Hendrix,David·44Hershman,StevenG.·23Herzog,Julia·70Hicks,J.K.·106Hodge,Rebecca·37Hoff,FiekeW.·57Hoffman,J.M.·106Hollister,BrittanyM.·20Hong,Na·75Horton,Iain·105Horton,TerzahM.·57Hoskins,RogerA.·101Hsiao,W.·122Hu,ChenyueW.·57Huang,Austin·76Huang,Kun·7,59Hui,Shirley·110

I

Iakoucheva,LiliaM.·82Imoto,Seiya·91Ingram,WendyMarie·119


Israeli,Johnny·98Isserlin,Ruth·110Ivkovic,Sinisa·14

J

Jebakaran,Jebakumar·22Jiang,Guoqian·75Johannessen,Solveig·114Johnson,KippW.·22Johnson,Travis·59Ju,Wenjun·70

K

Kabat,Halla·53Kaddurah-Daouk,RimaF.·96Kaka,Hussam·110Kamp,Thomas·54Kandamurugu,Manickam·107KanigelWinner,KimberlyR.·60Karakurt,Gunnur·48Kasarskis,Andrew·11,22Kashef-Haghighi,Dorna·33Kaushik,Gaurav·14Keaton,JacobM.·45Kechris,Katerina·86Keddy,A.·122Khatri,Purvesh·13,46Kiefer,Jeff·31Kim,Jeremie·77,87Kim,Juho·61Kim,Junghi·41Kim,NayoungK.D.·111Kim,Seungchan·31Klein,T.E.·106Knox,Craig·114Ko,MelissaE.·78Kornblau,StevenM.·57Kovatch,Patricia·22Koyutürk,Mehmet·48,67Kretzler,Matthias·70Krishnamurthy,Sarathbabu·34,107Krishnan,MichelleL.·42Kuan,PeiFen·55Kuncheva,Zhana·42Kundaje,Anshul·98Kural,Deniz·14

L

Lanchantin,Jack·21Larson,Melissa·108Larson,NicholasB.·108Lasken,RogerS.·37

Lau,KatyL.·97Lavage,DanielR.·27,34Leader,JosephB.·27,34,107Leavey,Patrick·18Ledbetter,DavidH.·107Lee,Donghyuk·77Lee,Inhan·53Lee,M.T.·106Lein,Ed·37Lelong,Sebastien·115Li,JingyiJessica·64Li,Lang·71Li,Li·22Li,MatthewD.·13Li,Shuyu·56Lichtarge,Olivier·25,29,103Lin,Chih-Hsu·25Lin,Dongdong·93Lin,Yaxiong·105Lincoln,StephenE.·15Liu,Charles·13Liu,Jingyu·93Liu,Keli·50Liu,LarryY.·48Liu,Tao·109Lofgren,Shane·13Lopez,Alexander·34Lu,Liangqun·74Lua,RhonaldC.·25Lucas,AnastasiaM.·34Luedtke,Alexander·50

M

Ma,Meng·56Machida-Hirano,Ryoko·63Mahendra,Divya·31Mahlich,Yannick·69Mahoney,J.Matthew·27Mallory,EmilyK.·79Mandric,Igor·80Mangul,Serghei·80Marcu,Ana·114Marko,NicholasF.·119Marsit,CarmenJ.·32,112Martinez,Maria·18Massengill,Susan·70Matthews,T.·122Matveeva,OlgaV.·94McCorrison,Jamison·37McDermott,JasonE.·109McDonnell,ShannonK.·85,105,108McEachin,RichardC.·70Mead,David·105Mehta,Sanket·57Mertins,Philipp·109Metpally,RaghuP.R.·34,107Miller,Jeremy·37Miller,Neil·81,102,106,113


Miotto,Riccardo·22Mishra,Rashika·18Misra,Debdipto·119Miyano,Satoru·91Mohan,Rahul·98Montana,Giovanni·42Montoya,Dennis·80Mooney,SeanD.·82,106,115Moore,JasonH.·19Moskovitz,Alan·22Mosmann,TimR.·120Moult,John·101Murray,MichaelF.·107Mutlu,Onur·77,87Myers,Mark·105

N

Nair,AshaA.·108Nair,K.Sreekumaran·96Narla,Goutham·67Nazipova,NafisaN.·94Ng,MaggieC.Y.·45Nguyen,Tin·51Nho,Kwangsik·8Ni'Suilleabhain,Molly·18Ning,Xia·71Nolan,GarryP.·39,78,117,118Non,Amy·20Novotny,Mark·37

O

O'Connell,Chloe·33O’Brien,Daniel·108Ogurtsov,AlekseyY.·94Osafo,Nana·89Otolorin,Abiodun·89Overton,John·34

P

Pai,Shraddha·110Palmer,NicholetteD.·45Pan,Wei·41Pandey,Gaurav·47Pankow,JamesS.·45Parida,Laxmi·30Park,PeterJ.·111Park,Woong-Yang·111Paten,Benedict·15Payne,Samuel·109Pejaver,Vikas·82Pen,Jian·65Pendergrass,SarahA.·27,34Peng,Jian·4,61

Penn,John·34Pennathur,Subramaniam·70Perrone-Bizzozero,Nora·93Person,T.N.·106Perumal,Kalyani·70Peterson,Josh·35,106Petkau,A.·122Petyuk,Vladislav·109Pinney,Sean·22Playter,ChristopherS.·78Plevritis,SylviaK.·78Poirion,Olivier·74Pond,Sergei·83Probert,Chris·98Prodduturi,Naresh·75Pyc,MaryA.·84

Q

Qi,Yanjun·21Qu,Meng·4,65Quataert,SallyA.·120Qutub,AminaA.·57

R

Radcliffe,Richard·86Rademakers,Rosa·85Radivojac,Predrag·82Rakheja,Dinesh·18Rasmussen-Torvik,LauraJ.·45Ré,Christopher·79,90Rebhahn,JonathanA.·120Reddy,JosephS.·85Reed,Gay·105Reich,DavidL.·22Reid,Jeffrey·34Relling,M.V.·106Ren,Yingxue·85Restrepo,NicoleA.·20Rich,StephenS.·45Ricks,Doran·22Risacher,ShannonL.·8Riska,Shaun·108Ritchie,MarylynD.·34,106,119Roden,Dan·35Rodland,Karin·109Rogers,Linda·23Ross,Jason·105Ross,OwenA.·85Rossetti,Maura·80Rotman,Jeremy·80Rotter,JeromeI.·45Röttger,Richard·5Rubin,DanielL.·90Rudra,Pratyaydipta·86Russell,Nate·61Russell,Pamela·86


S

Saba,Laura·86Salman,Ali·68Samuels,David·35Samusik,Nikolay·118Sander,Thomas·24,99Sangkuhl,K.·106Sarangi,Vivekananda·85Saykin,AndrewJ.·8Scarpa,JosephR.·11Schadt,EricE.·11,23,56,72Schaid,Daniel·108Scherbina,Anna·98Scheuermann,RichardH.·37Schlatzer,Daniela·67Schork,Nicholas·37Schreiber,StuartL.·31Schriml,L.M.·122Schultz,André·57Scott,ErickR.·23Scott,Madeleine·46Scott,S.A.·106Sengupta,Anita·18Sengupta,ParthoP.·22Senol,Damla·77,87Shabalina,SvetlanaA.·94Shah,NigamH.·17Shameer,Khader·22Sharma,Gaurav·120Shen,Li·8,71Shi,Wen·86Shifman,Sagiv·80Shin,Hyun-Tae·111Shrikumar,Avanti·98Shuldiner,AlanR.·107Simonovic,Janko·14Singh,Ritambhara·21Sinnwell,JasonP.·85Smelser,Diane·107Smith,Kyle·88Smith,Richard·109Snyder,John·27Snyder,Michael·90Soden,Sarah·102,113Song,Junyan·55Southerland,William·89Speyer,Gil·31Spreafico,Roberto·80Stacey,WilliamC.·97Stai,Tony·105Stanescu,Ana·47Statz,Benjamin·80Steemers,Frank·37Strauli,Nicolas·80Strickland,WilliamD.·39Stuart,JoshuaM.·38Su,AndrewI.·115Su,Hai·7Swank,Julie·105

Sweeney,TimothyE.·13

T

Taboada,E.·122Tam,Andrew·13Taroni,JaclynN.·73Tatonetti,NicholasP.·22Taylor,KentD.·45Teh,Charis·78Thibodeau,StephenN.·108Thompson,JeffreyA.·32,112Tignor,Nicole·23Tijanic,Nebojsa·14Tintle,Nathan·2,50,54Tomczak,Aurelie·13Tran,DannyN.·37Tran,HaiJ.·31Tsueng,Ginger·115Tully,Tim·84Tunkle,Leo·53Twist,GreysonP.·81,102,106,113

V

Vallania,Francesco·13,46VanDerWey,Will·80VanHouten,Jacob·35Venepally,Pratap·37Venkataraman,GuhanRam·33Verma,A.·106Verma,ShefaliS.·34Vestal,Brian·86Volety,Rama·105vonKorff,Modest·24,99

W

Wagenknecht,LynneE.·45Wall,DennisPaul·33Wang,Beilun·21Wang,Changchang·56Wang,Chao·7Wang,Chen·75Wang,Liewei·96Wang,Pei·23Wang,Sheng·4,65Wang,Yu-Ping·9Weaver,Steven·83Weinshilboum,RichardM.·96Wertheim,Joel·83Westergaard,David·28Whaley,R.M.·106Whirl-Carrillo,M.·106Whitfield,MichaelL.·73Whiting,Kathleen·48Wiepert,Mathieu·105


Wiggins,Roger·70Wiley,Laura·35Wilkins,AngelaD.·25,29,103Williams,M.S.·106Wilson,JamesG.·45Wilson,Michael·114Wilson,StephenJ.·25Wiredja,Danica·67Wishart,DavidS.·114Wiwie,Christian·5Woon,M.·106Worrell,GregA.·97Wu,Chunlei·106,115

X

Xin,Hongyi·77Xin,Jiwen·115

Y

Yahi,Alexandre·22Yamaguchi,Rui·91Yan,Jingwen·8Yang,HarryTaegyun·80Yang,Lin·7

Yang,Shan·15Yang,W.·106Yao,Xiaohui·71Yoo,Byunggil·81Younkin,SteveG.·85Yu,Kun-Hsing·90Yun,JaeWon·111

Z

Zaitlen,Noah·80Zelikovsky,Alex·80Zhang,Bin·72Zhang,Can·15Zhang,Fan·37Zhang,Pengyue·71Zhang,Yan·59Zhang,Yao-zhong·91Zhu,Chengsheng·69Zhu,Jun·11Zhu,Kuixi·11Ziemek,Daniel·76Zille,Pascal·9Zunder,EliR.·39,78Zweig,Micol·23

