PACIFIC SYMPOSIUM ON BIOCOMPUTING 2020

ABSTRACT BOOK

Poster Presenters: Poster space is assigned by abstract page number. Please find the page that your abstract is on and put your poster on the poster board with the corresponding number (e.g., if your abstract is on page 50, put your poster on board #50).

Proceedings papers with oral presentations #2-39 are not assigned poster space.

Abstracts are organized first by session, then by the last name of the first author. Presenting authors’ names are underlined in the Table of Contents and in bold text on the abstracts.

PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS

ARTIFICIAL INTELLIGENCE FOR ENHANCING CLINICAL MEDICINE .......... 1

PREDICTING LONGITUDINAL OUTCOMES OF ALZHEIMER'S DISEASE VIA A TENSOR-BASED JOINT CLASSIFICATION AND REGRESSION MODEL .......... 2
Lodewijk Brand, Kai Nichols, Hua Wang, Heng Huang, Li Shen, for the ADNI

ROBUSTLY EXTRACTING MEDICAL KNOWLEDGE FROM EHRS: A CASE STUDY OF LEARNING A HEALTH KNOWLEDGE GRAPH .......... 3
Irene Y. Chen, Monica Agrawal, Steven Horng, David Sontag

INCREASING CLINICAL TRIAL ACCRUAL VIA AUTOMATED MATCHING OF BIOMARKER CRITERIA .......... 4
Jessica W. Chen, Christian A. Kunder, Nam Bui, James L. Zehnder, Helio A. Costa, Henning Stehr

ADDRESSING THE CREDIT ASSIGNMENT PROBLEM IN TREATMENT OUTCOME PREDICTION USING TEMPORAL DIFFERENCE LEARNING .......... 5
Sahar Harati, Andrea Crowell, Helen Mayberg, Shamim Nemati

FROM GENOME TO PHENOME: PREDICTING MULTIPLE CANCER PHENOTYPES BASED ON SOMATIC GENOMIC ALTERATIONS VIA THE GENOMIC IMPACT TRANSFORMER .......... 6
Yifeng Tao, Chunhui Cai, William W. Cohen, Xinghua Lu

AUTOMATED PHENOTYPING OF PATIENTS WITH NON-ALCOHOLIC FATTY LIVER DISEASE REVEALS CLINICALLY RELEVANT DISEASE SUBTYPES .......... 7
Maxence Vandromme, Tomi Jun, Ponni Perumalswami, Joel T. Dudley, Andrea Branch, Li Li

MONITORING ICU MORTALITY RISK WITH A LONG SHORT-TERM MEMORY RECURRENT NEURAL NETWORK .......... 8
Ke Yu, Mingda Zhang, Tianyi Cui, Milos Hauskrecht

INTRINSICALLY DISORDERED PROTEINS (IDPS) AND THEIR FUNCTIONS .......... 9

DISORDERED FUNCTION CONJUNCTION: ON THE IN-SILICO FUNCTION ANNOTATION OF INTRINSICALLY DISORDERED REGIONS .......... 10
Sina Ghadermarzi, Akila Katuwawala, Christopher J. Oldfield, Amita Barik, Lukasz Kurgan

DE NOVO ENSEMBLE MODELING SUGGESTS THAT AP2-BINDING TO DISORDERED REGIONS CAN INCREASE STERIC VOLUME OF EPSIN BUT NOT EPS15 .......... 11
N. Suhas Jagannathan, Christopher W.V. Hogue, Lisa Tucker-Kellogg

MODULATION OF P53 TRANSACTIVATION DOMAIN CONFORMATIONS BY LIGAND BINDING AND CANCER-ASSOCIATED MUTATIONS .......... 12
Xiaorong Liu, Jianhan Chen

EXPLORING RELATIONSHIPS BETWEEN THE DENSITY OF CHARGED TRACTS WITHIN DISORDERED REGIONS AND PHASE SEPARATION .......... 13
Ramiz Somjee, Diana M. Mitrea, Richard W. Kriwacki

MUTATIONAL SIGNATURES .......... 14

PHYSIGS: PHYLOGENETIC INFERENCE OF MUTATIONAL SIGNATURE DYNAMICS .......... 15
Sarah Christensen, Mark D.M. Leiserson, Mohammed El-Kebir

TRACKSIGFREQ: SUBCLONAL RECONSTRUCTIONS BASED ON MUTATION SIGNATURES AND ALLELE FREQUENCIES .......... 16
Caitlin F. Harrigan, Yulia Rubanova, Quaid Morris, Alina Selega

DNA REPAIR FOOTPRINT UNCOVERS CONTRIBUTION OF DNA REPAIR MECHANISM TO MUTATIONAL SIGNATURES .......... 17
Damian Wojtowicz, Mark D.M. Leiserson, Roded Sharan, Teresa M. Przytycka

PATTERN RECOGNITION IN BIOMEDICAL DATA: CHALLENGES IN PUTTING BIG DATA TO WORK .......... 18

CLINICAL CONCEPT EMBEDDINGS LEARNED FROM MASSIVE SOURCES OF MULTIMODAL MEDICAL DATA .......... 19
Andrew L. Beam, Benjamin Kompa, Allen Schmaltz, Inbar Fried, Griffin Weber, Nathan Palmer, Xu Shi, Tianxi Cai, Isaac S. Kohane

ASSESSMENT OF IMPUTATION METHODS FOR MISSING GENE EXPRESSION DATA IN META-ANALYSIS OF DISTINCT COHORTS OF TUBERCULOSIS PATIENTS .......... 20
Carly A. Bobak, Lauren McDonnell, Matthew D. Nemesure, Justin Lin, Jane E. Hill

TOWARDS IDENTIFYING DRUG SIDE EFFECTS FROM SOCIAL MEDIA USING ACTIVE LEARNING AND CROWD SOURCING .......... 21
Sophie Burkhardt, Julia Siekiera, Josua Glodde, Miguel A. Andrade-Navarro, Stefan Kramer

MICROVASCULAR DYNAMICS FROM 4D MICROSCOPY USING TEMPORAL SEGMENTATION .......... 22
Shir Gur, Lior Wolf, Lior Golgher, Pablo Blinder

USING TRANSCRIPTIONAL SIGNATURES TO FIND CANCER DRIVERS WITH LURE .......... 23
David Haan, Ruikang Tao, Verena Friedl, Ioannis N. Anastopoulos, Christopher K. Wong, Alana S. Weinstein, Joshua M. Stuart

PAGE-NET: INTERPRETABLE AND INTEGRATIVE DEEP LEARNING FOR SURVIVAL ANALYSIS USING HISTOPATHOLOGICAL IMAGES AND GENOMIC DATA .......... 24
Jie Hao, Sai Chandra Kosaraju, Nelson Zange Tsaku, Dae Hyun Song, Mingon Kang

MACHINE LEARNING ALGORITHMS FOR SIMULTANEOUS SUPERVISED DETECTION OF PEAKS IN MULTIPLE SAMPLES AND CELL TYPES .......... 25
Toby Dylan Hocking, Guillaume Bourque

GRAPH-BASED INFORMATION DIFFUSION METHOD FOR PRIORITIZING FUNCTIONALLY RELATED GENES IN PROTEIN-PROTEIN INTERACTION NETWORKS .......... 26
Minh Pham, Olivier Lichtarge

A LITERATURE-BASED KNOWLEDGE GRAPH EMBEDDING METHOD FOR IDENTIFYING DRUG REPURPOSING OPPORTUNITIES IN RARE DISEASES .......... 27
Daniel N. Sosa, Alexander Derry, Margaret Guo, Eric Wei, Connor Brinton, Russ B. Altman

TWO-STAGE ML CLASSIFIER FOR IDENTIFYING HOST PROTEIN TARGETS OF THE DENGUE PROTEASE .......... 28
Jacob T. Stanley, Alison R. Gilchrist, Alex C. Stabell, Mary A. Allen, Sara L. Sawyer, Robin D. Dowell

ENHANCING MODEL INTERPRETABILITY AND ACCURACY FOR DISEASE PROGRESSION PREDICTION VIA PHENOTYPE-BASED PATIENT SIMILARITY LEARNING .......... 29
Yue Wang, Tong Wu, Yunlong Wang, Gao Wang

PRECISION MEDICINE: ADDRESSING THE CHALLENGES OF SHARING, ANALYSIS, AND PRIVACY AT SCALE .......... 30

INTEGRATED CANCER SUBTYPING USING HETEROGENEOUS GENOME-SCALE MOLECULAR DATASETS .......... 31
Suzan Arslanturk, Sorin Draghici, Tin Nguyen

ASSESSMENT OF COVERAGE FOR ENDOGENOUS METABOLITES AND EXOGENOUS CHEMICAL COMPOUNDS USING AN UNTARGETED METABOLOMICS PLATFORM .......... 32
Sek Won Kong, Carles Hernandez-Ferrer

COVERAGE PROFILE CORRECTION OF SHALLOW-DEPTH CIRCULATING CELL-FREE DNA SEQUENCING VIA MULTI-DISTANCE LEARNING .......... 33
Nicholas B. Larson, Melissa C. Larson, Jie Na, Carlos P. Sosa, Chen Wang, Jean-Pierre Kocher, Ross Rowsey

PGXMINE: TEXT MINING FOR CURATION OF PHARMGKB .......... 34
Jake Lever, Julia M. Barbarino, Li Gong, Rachel Huddart, Katrin Sangkuhl, Ryan Whaley, Michelle Whirl-Carrillo, Mark Woon, Teri E. Klein, Russ B. Altman

THE POWER OF DYNAMIC SOCIAL NETWORKS TO PREDICT INDIVIDUALS' MENTAL HEALTH .......... 35
Shikang Liu, David Hachen, Omar Lizardo, Christian Poellabauer, Aaron Striegel, Tijana Milenkovic

IMPLEMENTING A CLOUD BASED METHOD FOR PROTECTED CLINICAL TRIAL DATA SHARING .......... 36
Gaurav Luthria, Qingbo Wang

PATHWAY AND NETWORK EMBEDDING METHODS FOR PRIORITIZING PSYCHIATRIC DRUGS .......... 37
Yash Pershad, Margaret Guo, Russ B. Altman

ROBUST-ODAL: LEARNING FROM HETEROGENEOUS HEALTH SYSTEMS WITHOUT SHARING PATIENT-LEVEL DATA .......... 38
Jiayi Tong, Rui Duan, Ruowang Li, Martijn J. Scheuemie, Jason H. Moore, Yong Chen

COMPUTATIONALLY EFFICIENT, EXACT, COVARIATE-ADJUSTED GENETIC PRINCIPAL COMPONENT ANALYSIS BY LEVERAGING INDIVIDUAL MARKER SUMMARY STATISTICS FROM LARGE BIOBANKS .......... 39
Jack Wolf, Martha Barnard, Xueting Xia, Nathan Ryder, Jason Westra, Nathan Tintle

PROCEEDINGS PAPERS WITH POSTER PRESENTATIONS

ARTIFICIAL INTELLIGENCE FOR ENHANCING CLINICAL MEDICINE .......... 40

MULTICLASS DISEASE CLASSIFICATION FROM MICROBIAL WHOLE-COMMUNITY METAGENOMES .......... 41
Saad Khan, Libusha Kelly

LITGEN: GENETIC LITERATURE RECOMMENDATION GUIDED BY HUMAN EXPLANATIONS .......... 42
Allen Nie, Arturo L. Pineda, Matt W. Wright, Hannah Wand, Bryan Wulf, Helio A. Costa, Ronak Y. Patel, Carlos D. Bustamante, James Zou

MULTILEVEL SELF-ATTENTION MODEL AND ITS USE ON MEDICAL RISK PREDICTION .......... 43
Xianlong Zeng, Yunyi Feng, Soheil Moosavinasab, Deborah Lin, Simon Lin, Chang Liu

IDENTIFYING TRANSITIONAL HIGH COST USERS FROM UNSTRUCTURED PATIENT PROFILES WRITTEN BY PRIMARY CARE PHYSICIANS .......... 44
Haoran Zhang, Elisa Candido, Andrew S. Wilton, Raquel Duchen, Liisa Jaakkimainen, Walter Wodchis, Quaid Morris

OBTAINING DUAL-ENERGY COMPUTED TOMOGRAPHY (CT) INFORMATION FROM A SINGLE-ENERGY CT IMAGE FOR QUANTITATIVE IMAGING ANALYSIS OF LIVING SUBJECTS BY USING DEEP LEARNING .......... 45
Wei Zhao, Tianling Lv, Rena Lee, Yang Chen, Lei Xing

INTRINSICALLY DISORDERED PROTEINS (IDPS) AND THEIR FUNCTIONS .......... 46

MANY-TO-ONE BINDING BY INTRINSICALLY DISORDERED PROTEIN REGIONS .......... 47
Wei-Lun Alterovitz, Eshel Faraggi, Christopher J. Oldfield, Jingwei Meng, Bin Xue, Fei Huang, Pedro Romero, Andrzej Kloczkowski, Vladimir N. Uversky, A. Keith Dunker

MUTATIONAL SIGNATURES .......... 48

IMPACT OF MUTATIONAL SIGNATURES ON MICRORNA AND THEIR RESPONSE ELEMENTS .......... 49
Eirini Stamoulakatou, Pietro Pinoli, Stefano Ceri, Rosario Piro

GENOME GERRYMANDERING: OPTIMAL DIVISION OF THE GENOME INTO REGIONS WITH CANCER TYPE SPECIFIC DIFFERENCES IN MUTATION RATES .......... 50
Adamo Young, Jacob Chmura, Yoonsik Park, Quaid Morris, Gurnit Atwal

PATTERN RECOGNITION IN BIOMEDICAL DATA: CHALLENGES IN PUTTING BIG DATA TO WORK .......... 51

LEARNING A LATENT SPACE OF HIGHLY MULTIDIMENSIONAL CANCER DATA .......... 52
Benjamin Kompa, Beau Coker

SCALING STRUCTURAL LEARNING WITH NO-BEARS TO INFER CAUSAL TRANSCRIPTOME NETWORKS .......... 53
Hao-Chih Lee, Matteo Danieletto, Riccardo Miotto, Sarah T. Cherng, Joel T. Dudley

PATHFLOWAI: A HIGH-THROUGHPUT WORKFLOW FOR PREPROCESSING, DEEP LEARNING AND INTERPRETATION IN DIGITAL PATHOLOGY .......... 54
Joshua J. Levy, Lucas A. Salas, Brock C. Christensen, Aravindhan Sriharan, Louis J. Vaickus

IMPROVING SURVIVAL PREDICTION USING A NOVEL FEATURE SELECTION AND FEATURE REDUCTION FRAMEWORK BASED ON THE INTEGRATION OF CLINICAL AND MOLECULAR DATA* .......... 55
Lisa Neums, Richard Meier, Devin C. Koestler, Jeffrey A. Thompson

BAYESIAN SEMI-NONNEGATIVE MATRIX TRI-FACTORIZATION TO IDENTIFY PATHWAYS ASSOCIATED WITH CANCER PHENOTYPES .......... 56
Sunho Park, Nabhonil Kar, Jae-Ho Cheong, Tae Hyun Hwang

TREE-WEIGHTING FOR MULTI-STUDY ENSEMBLE LEARNERS .......... 57
Maya Ramchandran, Prasad Patil, Giovanni Parmigiani

PTR EXPLORER: AN APPROACH TO IDENTIFY AND EXPLORE POST TRANSCRIPTIONAL REGULATORY MECHANISMS USING PROTEOGENOMICS .......... 58
Arunima Srivastava, Michael Sharpnack, Kun Huang, Parag Mallick, Raghu Machiraju

NETWORK REPRESENTATION OF LARGE-SCALE HETEROGENEOUS RNA SEQUENCES WITH INTEGRATION OF DIVERSE MULTI-OMICS, INTERACTIONS, AND ANNOTATIONS DATA .......... 59
Nhat Tran, Jean Gao

HADOOP AND PYSPARK FOR REPRODUCIBILITY AND SCALABILITY OF GENOMIC SEQUENCING STUDIES .......... 60
Nicholas R. Wheeler, Penelope Benchek, Brian W. Kunkle, Kara L. Hamilton-Nelson, Mike Warfe, Jeremy R. Fondran, Jonathan L. Haines, William S. Bush

CERENKOV3: CLUSTERING AND MOLECULAR NETWORK-DERIVED FEATURES IMPROVE COMPUTATIONAL PREDICTION OF FUNCTIONAL NONCODING SNPS .......... 61
Yao Yao, Stephen A. Ramsey

PRECISION MEDICINE: ADDRESSING THE CHALLENGES OF SHARING, ANALYSIS, AND PRIVACY AT SCALE .......... 62

ANOMIGAN: GENERATIVE ADVERSARIAL NETWORKS FOR ANONYMIZING PRIVATE MEDICAL DATA .......... 63
Ho Bae, Dahuin Jung, Hyun-Soo Choi, Sungroh Yoon

FREQUENCY OF CLINVAR PATHOGENIC VARIANTS IN CHRONIC KIDNEY DISEASE PATIENTS SURVEYED FOR RETURN OF RESEARCH RESULTS AT A CLEVELAND PUBLIC HOSPITAL .......... 64
Dana C. Crawford, John Lin, Jessica N. Cooke Bailey, Tyler Kinzy, John R. Sedor, John F. O'Toole, William S. Bush

NETWORK-BASED MATCHING OF PATIENTS AND TARGETED THERAPIES FOR PRECISION ONCOLOGY .......... 65
Qingzhi Liu, Min Jin Ha, Rupam Bhattacharyya, Lana Garmire, Veerabhadran Baladandayuthapani

PHENOME-WIDE ASSOCIATION STUDIES ON CARDIOVASCULAR HEALTH AND FATTY ACIDS CONSIDERING PHENOTYPE QUALITY CONTROL PRACTICES FOR EPIDEMIOLOGICAL DATA .......... 66
Kristin Passero, Xi He, Jiayan Zhou, Bertram Mueller-Myhsok, Marcus E. Kleber, Winfried Maerz, Molly A. Hall

ATEMPO: PATHWAY-SPECIFIC TEMPORAL ANOMALIES FOR PRECISION THERAPEUTICS .......... 67
Christopher Michael Pietras, Liam Power, Donna K. Slonim

FEATURE SELECTION AND DIMENSION REDUCTION OF SOCIAL AUTISM DATA .......... 68
Peter Washington, Kelley Marie Paskov, Haik Kalantarian, Nathaniel Stockham, Catalin Voss, Aaron Kline, Ritik Patnaik, Brianna Chrisman, Maya Varma, Qandeel Tariq, Kaitlyn Dunlap, Jessey Schwartz, Nick Haber, Dennis P. Wall

POSTER PRESENTATIONS

ARTIFICIAL INTELLIGENCE FOR ENHANCING CLINICAL MEDICINE .......... 69

PRIORITIZING COPY NUMBER VARIANTS USING PHENOTYPE AND GENE FUNCTIONAL SIMILARITY .......... 70
Azza Althagafi, Jun Chen, Robert Hoehndorf

INFERRING THE REWARD FUNCTIONS THAT GUIDE CANCER PROGRESSION .......... 71
John Kalantari, Heidi Nelson, Nicholas Chia

PREDICTING DISEASE-ASSOCIATED MUTATION OF METAL-BINDING SITES IN PROTEINS USING A DEEP LEARNING APPROACH .......... 72
Mohamad Koohi-Moghadam, Haibo Wang, Yuchuan Wang, Xinming Yang, Hongyan Li, Junwen Wang, Hongzhe Sun

GENERAL .......... 73

RANKING RAS PATHWAY MUTATIONS USING EVOLUTIONARY HISTORY OF MEK1 .......... 74
Katia Andrianova, Igor Jouline

INTEGRATIVE ANALYSIS OF COPD AND LUNG CANCER METADATA REVEALS SHARED ALTERATIONS IN IMMUNE RESPONSE, PTEN AND PI3K-AKT PATHWAYS .......... 75
Dannielle Skander, Arda Durmaz, Mohammed Orloff, Gurkan Bebek

INVESTIGATING SOURCES OF IRREPRODUCIBILITY IN ANALYSIS OF GENE EXPRESSION DATA .......... 76
Carly A. Bobak, Jane E. Hill

ETHEREUM AND MULTICHAIN BLOCKCHAINS AS SECURE TOOLS FOR INDIVIDUALIZED MEDICINE .......... 77
Charlotte Brannon, Gamze Gursoy, Sarah Wagner, Mark Gerstein

GENOMIC PREDICTORS OF L-ASPARAGINASE-INDUCED PANCREATITIS IN PEDIATRIC CANCER PATIENTS .......... 78
Britt Drogemoller, Galen E.B. Wright, Shahrad Rassekh, Shinya Ito, Bruce Carleton, Colin Ross, The Canadian Pharmacogenomics Network for Drug Safety Consortium

NITECAP: A NOVEL METHOD AND INTERFACE FOR THE IDENTIFICATION OF CIRCADIAN BEHAVIOR IN HIGHLY PARALLEL TIME-COURSE DATA .......... 79
Thomas G. Brooks, Cris W. Lawrence, Nicholas F. Lahens, Soumyashant Nayak, Dimitra Sarantopoulou, Garret A. FitzGerald, Gregory R. Grant

THE INTERPLAY OF OBESITY AND RACE/ETHNICITY ON MAJOR PERINATAL COMPLICATIONS .......... 80
Yaadira Brown, MPH; Olubode A. Olufajo, MD, MPH; Edward E. Cornwell III, MD; William Southerland, PhD

A COMPARISON OF PHARMACOGENOMIC INFORMATION IN FDA-APPROVED DRUG LABELS AND CPIC GUIDELINES .......... 81
Katherine I. Carrillo, Teri E. Klein

XTEA: A TRANSPOSABLE ELEMENT INSERTION ANALYZER FOR GENOME SEQUENCING DATA FROM MULTIPLE TECHNOLOGIES .......... 82
Chong Chu, Rebeca Monroy, Soohyun Lee, E. Alice Lee, Peter J. Park

GO GET DATA (GGD): SIMPLE, REPRODUCIBLE ACCESS TO SCIENTIFIC DATA .......... 83
Michael Cormier, Jon Belyeu, Brent Pedersen, Joe Brown, Johannes Koster, Aaron R. Quinlan

GLOBAL EPIGENOMIC REGULATION OF GENE EXPRESSION AND CELLULAR PROLIFERATION IN T-CELL LEUKEMIA .......... 84
Sinisa Dovat, Yali Ding, Bo Zhang, Jonathon L. Payne, Feng Yue

A PHARMACOGENOMIC INVESTIGATION OF THE CARDIAC SAFETY PROFILE OF ONDANSETRON IN CHILDREN AND IN PREGNANT WOMEN .......... 85
Galen E.B. Wright, Britt I. Drögemöller, Jessica Trueman, Kaitlyn Shaw, Michelle Staub, Shahnaz Chaudhry, Sholeh Ghayoori, Fudan Miao, Michelle Higginson, Gabriella S.S. Groeneweg, James Brown, Laura A. Magee, Simon D. Whyte, Nicholas West, Sonia Brodie, Geert ’t Jong, Howard Berger, Shinya Ito, Shahrad R. Rassekh, Shubhayan Sanatani, Colin J.D. Ross, Bruce C. Carleton

TREND: A PLATFORM FOR EXPLORING PROTEIN FUNCTION IN PROKARYOTES USING PHYLOGENETICS, DOMAIN ARCHITECTURES, AND GENE NEIGHBORHOODS INFORMATION .......... 86
Vadim M. Gumerov, Igor B. Zhulin

TRACKSIGFREQ: SUBCLONAL RECONSTRUCTIONS BASED ON MUTATION SIGNATURES AND ALLELE FREQUENCIES .......... 87
Caitlin F. Harrigan, Yulia Rubanova, Quaid Morris, Alina Selega

A FLEXIBLE PIPELINE FOR THE PREDICTION OF BIOMARKERS RELEVANT TO DRUG SENSITIVITY .......... 88
V. Keith Hughitt, Sayeh Gorjifard, Aleksandra M. Michalowski, John K. Simmons, Ryan Dale, Eric C. Polley, Jonathan J. Keats, Beverly A. Mock

CREATING A METABOLIC SYNDROME RESEARCH RESOURCE (METSRR) .......... 89
Willysha Jenkins, Christian Richardson, ClarLynda Williams-DeVane PhD

UTILIZING COHORT INFORMATION TO FIND CAUSATIVE VARIANTS .......... 90
Senay Kafkas, Robert Hoehndorf

INTEGRATED ANALYSIS OF JAK-STAT PATHWAY IN HOMEOSTASIS, SIMULATED INFLAMMATION AND TUMOUR .......... 91
Milica Krunic, Anzhelika Karjalainen, Mojoyinola Joanna Ola, Stephen Shoebridge, Sabine Macho-Maschler, Caroline Lassnig, Andrea Poelzl, Matthias Farlik, Nikolaus Fortelny, Christoph Bock, Birgit Strobl, Mathias Mueller

BEERS2: THE NEXT GENERATION OF RNA-SEQ SIMULATOR .......... 92
Nicholas F. Lahens, Thomas G. Brooks, Dimitra Sarantopoulou, Soumyashant Nayak, Cris Lawrence, Anand Srinivasan, Jonathan Schug, Garret A. FitzGerald, John B. Hogenesch, Yoseph Barash, Gregory R. Grant

EFFECT MODIFICATION BY AGE ON A DIAGNOSTIC THREE-GENE-SIGNATURE IN PATIENTS WITH ACTIVE TUBERCULOSIS .......... 93
Lauren McDonnell, Carly Bobak, Matthew Nemesure, Justin Lin, Jane Hill

CLASSIFICATION AND MUTATION PREDICTION FROM GASTROINTESTINAL CANCER HISTOPATHOLOGY IMAGES USING DEEP LEARNING .......... 94
Sung Hak Lee, Hyun-Jong Jang

MAPPING THE EMERGENCE AND MIGRATION OF HEMATOPOIETIC STEM CELLS AND PROGENITORS DURING HUMAN DEVELOPMENT AT SINGLE CELL RESOLUTION .......... 95
Feiyang Ma, Vincenzo Calvanese, Sandra Capellera-Garcia, Sophia Ekstrand, Matteo Pellegrini, Hanna K.A. Mikkola

LARGE-SCALE MACHINE LEARNING AND GRAPH ANALYTICS FOR FUNCTIONAL PREDICTION OF PATHOGEN PROTEINS .......... 96
Jason McDermott, Song Feng, William Nelson, Joon-Yong Lee, Sayan Ghosh, Ariful Khan, Mahantesh Halappanavar, Justine Nguyen, Jonathan Pruneda, David Baltrus, Joshua Adkins

GENE-SET ANALYSIS USING GWAS SUMMARY STATISTICS AND GTEX DATABASE .......... 97
Masahiro Nakatochi

TARGETING CANCER VIA SIGNALING PATHWAYS: A NOVEL APPROACH TO THE DISCOVERY OF GENE CCDC191'S DOUBLE-AGENT FUNCTION USING DIFFERENTIAL GENE EXPRESSION, HEATMAP ANALYSES THROUGH AI DEEP LEARNING, AND MATHEMATICAL MODELING .......... 98
Annie Ostojic

RFEX: SIMPLE RANDOM FOREST MODEL AND SAMPLE EXPLAINER FOR NON-MACHINE LEARNING EXPERTS .......... 99
Dragutin Petkovic, Ali Alavi, DanDan Cai, Jizhou Yang, Sabiha Barlaskar

APPARENT BIAS TOWARD LONG GENE MISREGULATION IN MECP2 SYNDROMES DISAPPEARS AFTER CONTROLLING FOR BASELINE VARIATIONS .......... 100
Ayush T. Raman, Amy E. Pohodich, Ying-Wooi Wan, Hari Krishna Yalamanchili, William E. Lowry, Huda Y. Zoghbi, Zhandong Liu

PREDICTION OF CHRONOLOGICAL AND BIOLOGICAL AGE FROM LABORATORY DATA .......... 101
Luke Sagers, Luke Melas-Kyriazi, Chirag J. Patel, Arjun K. Manrai

WHOLE GENOME SEQUENCING ANALYSIS OF INFLUENZA C VIRUS IN KOREA .......... 102
Sooyeon Lim, Han Sol Lee, Ji Yun Noh, Joon Young Song, Hee Jin Cheong, Woo Joo Kim

MINING THE HUMUHUMUNUKUNUKUAPUAA AND THE SHAKA OF AUTISM WITH BIG DATA BIOMEDICAL DATA SCIENCE .......... 103
Peter Washington, Brianna Chrisman, Kaiti Dunlap, Aaron Kline, Arman Husic, Michael Ning, Kelley Paskov, Nate Stockham, Maya Varma, Emilie LeBlanc, Jack Kent, Yordan Penev, Min Woo Sun, Jae-Yoon Jung, Catalin Voss, Nick Haber, Dennis P. Wall

DEVELOPMENT OF A RECURRENCE PREDICTION MODEL FOR EARLY LUNG ADENOCARCINOMA USING RADIOMICS-BASED ARTIFICIAL INTELLIGENCE .......... 104
Hee Chul Yang, Gunseok Park, Ji Eun Oh

DRLPC: DIMENSION REDUCTION OF SEQUENCING DATA USING LOCAL PRINCIPAL COMPONENTS .......... 105
Yun Joo Yoo, Fatemeh Yavartanu, Shelley B. Bull

META-ANALYSIS IN EXHAUSTED T CELLS FROM HOMO SAPIENS AND MUS MUSCULUS PROVIDES NOVEL TARGETS FOR IMMUNOTHERAPY .......... 106
Lin Zhang, Yicheng Guo, Hafumi Nishi

INTRINSICALLY DISORDERED PROTEINS (IDPS) AND THEIR FUNCTIONS .......... 107

DISORDERED FUNCTION CONJUNCTION: ON THE IN-SILICO FUNCTION ANNOTATION OF INTRINSICALLY DISORDERED REGIONS .......... 108
Sina Ghadermarzi, Akila Katuwawala, Christopher J. Oldfield, Amita Barik, Lukasz Kurgan

MUTATIONAL SIGNATURES .......... 109

TRANSCRIPTION-ASSOCIATED REGIONAL MUTATION RATES AND SIGNATURES IN REGULATORY ELEMENTS ACROSS 2,500 WHOLE CANCER GENOMES .......... 110
Jüri Reimand

COMPLEX MOSAIC STRUCTURAL VARIATIONS IN HUMAN FETAL BRAINS .......... 111
Shobana Sekar, Livia Tomasini, Maria Kalyva, Taejeong Bae, Logan Manlove, Bo Zhou, Jessica Mariani, Fritz Sedlazeck, Alexander E. Urban, Christos Proukakis, Flora M. Vaccarino, Alexej Abyzov

PATTERN RECOGNITION IN BIOMEDICAL DATA: CHALLENGES IN PUTTING BIG DATA TO WORK .......... 112

STRATIFICATION OF KIDNEY TRANSPLANT RECIPIENTS BASED ON TEMPORAL DISEASE TRAJECTORIES .......... 113
Isabella Friis Jørgensen PhD, Søren Schwartz Sørensen PhD, Søren Brunak PhD

MODELING GENE EXPRESSION LEVELS FROM EPIGENETIC MARKERS USING A DYNAMICAL SYSTEMS APPROACH .......... 114
James Brunner, Jacob Kim, Kord M. Kober

TRANSLATING BIG DATA NEUROIMAGING FINDINGS INTO MEASUREMENTS OF INDIVIDUAL VULNERABILITY .......... 115
Peter Kochunov, Paul Thompson, Neda Jahanshad, Elliot Hong

AUTOMATING NEW-USER COHORT CONSTRUCTION WITH INDICATION EMBEDDINGS .......... 116
Rachel D. Melamed

REPRODUCIBILITY-OPTIMIZED STATISTICAL TESTING FOR OMICS STUDIES .......... 117
Tomi Suomi, Laura Elo

DATA INTEGRATION EXPECTATION MAPS: TOWARDS MORE INFORMED 'OMIC DATA INTEGRATION .......... 118
Tia Tate, Christian Richardson, ClarLynda Williams-DeVane

PRECISION MEDICINE: ADDRESSING THE CHALLENGES OF SHARING, ANALYSIS, AND PRIVACY AT SCALE .......... 119

INTEGRATED OMICS DATA MINING OF SYNERGISTIC GENE PAIRS FOR CANCER PRECISION MEDICINE .......... 120
Euna Jeong, Choa Park, Sukjoon Yoon

THE POWER OF DYNAMIC SOCIAL NETWORKS TO PREDICT INDIVIDUALS' MENTAL HEALTH .......... 121
Shikang Liu, David Hachen, Omar Lizardo, Christian Poellabauer, Aaron Striegel, Tijana Milenkovic

ROBUST-ODAL: LEARNING FROM HETEROGENEOUS HEALTH SYSTEMS WITHOUT SHARING PATIENT-LEVEL DATA .......... 122
Jiayi Tong, Rui Duan, Ruowang Li, Martijn J. Scheuemie, Jason H. Moore, Yong Chen

PHARMGKB: AUTOMATED LITERATURE ANNOTATIONS .......... 123
Michelle Whirl-Carrillo, Li Gong, Rachel Huddart, Katrin Sangkuhl, Ryan Whaley, Mark Woon, Julia Barbarino, Jake Lever, Russ B. Altman, Teri E. Klein

WORKSHOPS WITH POSTER PRESENTATIONS

PACKAGING BIOCOMPUTING SOFTWARE TO MAXIMIZE DISTRIBUTION AND REUSE .......... 124

APOLLO PROVIDES COLLABORATIVE GENOME ANNOTATION EDITING WITH THE POWER OF JBROWSE .......... 125
Nathan Dunn, Colin Diesh, Robert Buels, Helena Rasche, Anthony Bretaudeau, Nomi Harris, Ian Holmes

G:PROFILER - ONE FUNCTIONAL ENRICHMENT ANALYSIS TOOL, MANY INTERFACES SERVING LIFE SCIENCE COMMUNITIES .......... 126
Liis Kolberg, Uku Raudvere, Ivan Kuzmin, Jaak Vilo, Hedi Peterson

INCREASING USABILITY AND DISSEMINATION OF THE PATHFX ALGORITHM USING WEB APPLICATIONS AND DOCKER SYSTEMS .......... 127
Jennifer Wilson, Nicholas Stepanov, Ajinkya Chalke, Mike Wong, Dragutin Petkovic, Russ B. Altman

TRANSLATIONAL BIOINFORMATICS WORKSHOP: BIOBANKS IN THE PRECISION MEDICINE ERA .......... 128

IDENTIFICATION OF BIOMARKERS RELATED TO AUTISM SPECTRUM DISORDER USING GENOMIC INFORMATION .......... 129
Leena Sait, Martha Gizaw, and Iosif Vaisman

A PAN-CANCER 3-GENE SIGNATURE TO PREDICT DORMANCY .......... 130
Ivy Tran, Anchal Sharma, Subhajyoti De

AUTHOR INDEX .......... 131

1

ARTIFICIAL INTELLIGENCE FOR ENHANCING CLINICAL MEDICINE

PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS

2

Artificial Intelligence for Enhancing Clinical Medicine

Predicting Longitudinal Outcomes of Alzheimer's Disease via a Tensor-Based Joint Classification and Regression Model

Lodewijk Brand1, Kai Nichols1, Hua Wang1, Heng Huang2, Li Shen3, for the ADNI

1Colorado School of Mines, 2University of Pittsburgh, 3University of Pennsylvania

Hua Wang

Alzheimer's disease (AD) is a serious neurodegenerative condition that affects millions of people across the world. Recently machine learning models have been used to predict the progression of AD, although they frequently do not take advantage of the longitudinal and structural components associated with multi-modal medical data. To address this, we present a new algorithm that uses the multi-block alternating direction method of multipliers to optimize a novel objective that combines multi-modal longitudinal clinical data of various modalities to simultaneously predict the cognitive scores and diagnoses of the participants in the Alzheimer's Disease Neuroimaging Initiative cohort. Our new model is designed to leverage the structure associated with clinical data that is not incorporated into standard machine learning optimization algorithms. This new approach shows state-of-the-art predictive performance and validates a collection of brain and genetic biomarkers that have been recorded previously in AD literature.

3

Artificial Intelligence for Enhancing Clinical Medicine

Robustly Extracting Medical Knowledge from EHRs: A Case Study of Learning a Health Knowledge Graph

Irene Y. Chen1, Monica Agrawal1, Steven Horng2, David Sontag1

1Massachusetts Institute of Technology, 2Beth Israel Deaconess Medical Center

Irene Chen

Increasingly large electronic health records (EHRs) provide an opportunity to algorithmically learn medical knowledge. In one prominent example, a causal health knowledge graph could learn relationships between diseases and symptoms and then serve as a diagnostic tool to be refined with additional clinical input. Prior research has demonstrated the ability to construct such a graph from over 270,000 emergency department patient visits. In this work, we describe methods to evaluate a health knowledge graph for robustness. Moving beyond precision and recall, we analyze for which diseases and for which patients the graph is most accurate. We identify sample size and unmeasured confounders as major sources of error in the health knowledge graph. We introduce a method to leverage non-linear functions in building the causal graph to better understand existing model assumptions. Finally, to assess model generalizability, we extend to a larger set of complete patient visits within a hospital system. We conclude with a discussion on how to robustly extract medical knowledge from EHRs.

4

Artificial Intelligence for Enhancing Clinical Medicine

Increasing Clinical Trial Accrual via Automated Matching of Biomarker Criteria

Jessica W. Chen, Christian A. Kunder, Nam Bui, James L. Zehnder, Helio A. Costa, Henning Stehr

Stanford University School of Medicine

Successful implementation of precision oncology requires both the deployment of nucleic acid sequencing panels to identify clinically actionable biomarkers and the efficient screening of patient biomarker eligibility for ongoing clinical trials and therapies. This process is typically performed manually by biocurators, geneticists, pathologists, and oncologists; however, it is time-intensive and inconsistent among healthcare providers. We present the development of a feature matching algorithmic pipeline that identifies patients who meet eligibility criteria of precision medicine clinical trials via genetic biomarkers, and apply it to patients undergoing treatment at the Stanford Cancer Center. Through our patient eligibility screening algorithm, which matches biomarkers derived from clinical sequencing against precision medicine clinical trials, this study demonstrates the successful use of an automated algorithmic pipeline as a feasible, accurate and effective alternative to traditional manual clinical trial curation.
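Editorial illustration: the abstract does not describe the pipeline's internals, so the following is only a minimal sketch of biomarker-to-criteria matching under simple assumptions. The `Trial` record, the inclusion/exclusion sets, and the biomarker strings are all hypothetical and are not taken from the Stanford pipeline.

```python
from dataclasses import dataclass, field


@dataclass
class Trial:
    """Hypothetical trial record: inclusion and exclusion biomarker sets."""
    name: str
    required: frozenset
    excluded: frozenset = field(default_factory=frozenset)


def eligible_trials(patient_biomarkers, trials):
    """Return names of trials whose biomarker criteria the patient meets."""
    found = set(patient_biomarkers)
    matches = []
    for trial in trials:
        has_required = trial.required <= found       # every inclusion marker present
        has_excluded = bool(trial.excluded & found)  # any exclusion marker present
        if has_required and not has_excluded:
            matches.append(trial.name)
    return matches


# Made-up trials and a made-up patient, for illustration only.
trials = [
    Trial("trial_A", frozenset({"EGFR L858R"})),
    Trial("trial_B", frozenset({"BRAF V600E"}), frozenset({"KRAS G12C"})),
]
print(eligible_trials({"EGFR L858R", "TP53 R175H"}, trials))  # prints ['trial_A']
```

A real screening system would also need to normalize variant nomenclature and handle non-genetic criteria, which this sketch leaves out.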

5

Artificial Intelligence for Enhancing Clinical Medicine

Addressing the Credit Assignment Problem in Treatment Outcome Prediction using Temporal Difference Learning

Sahar Harati1, Andrea Crowell2, Helen Mayberg3, Shamim Nemati4

1 Stanford University, 2 Emory University, 3 Mount Sinai, 4 University of California San Diego

Sahar Harati

Mental health patients often undergo a variety of treatments before finding an effective one. Improved prediction of treatment response can shorten the duration of trials. A key challenge of applying predictive modeling to this problem is that often the effectiveness of a treatment regimen remains unknown for several weeks, and therefore immediate feedback signals may not be available for supervised learning. Here we propose a machine learning approach to extracting audio-visual features from weekly video interview recordings for predicting the likely outcome of Deep Brain Stimulation (DBS) treatment several weeks in advance. In the absence of immediate treatment-response feedback, we utilize a joint state-estimation and temporal difference learning approach to model both the trajectory of a patient's response and the delayed nature of feedback. Our results based on longitudinal recordings from 12 patients with depression show that the learned state values are predictive of the long-term success of DBS treatments. We achieve an area under the receiver operating characteristic curve of 0.88, beating all baseline methods.
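Editorial illustration: the sketch below shows plain tabular TD(0), the simplest form of temporal difference learning, on a toy problem where the only reward arrives at the end of an episode. It is not the authors' joint state-estimation model; the discrete states, rewards, and parameter values are assumptions chosen for the example.

```python
def td0_values(episodes, n_states, alpha=0.1, gamma=1.0, sweeps=200):
    """Estimate state values with tabular TD(0).

    Each episode is a list of (state, reward, next_state) transitions;
    next_state is None on the final transition of the episode.
    """
    values = [0.0] * n_states
    for _ in range(sweeps):
        for episode in episodes:
            for state, reward, next_state in episode:
                bootstrap = 0.0 if next_state is None else values[next_state]
                td_error = reward + gamma * bootstrap - values[state]
                values[state] += alpha * td_error
    return values


# Two toy trajectories of weekly states (hypothetical):
# 0 -> 1 -> 2 ends in response (reward 1), 0 -> 3 -> 4 ends in non-response (reward 0).
responder = [(0, 0.0, 1), (1, 0.0, 2), (2, 1.0, None)]
non_responder = [(0, 0.0, 3), (3, 0.0, 4), (4, 0.0, None)]
values = td0_values([responder, non_responder], n_states=5)
```

Although the reward is only seen in the last week, the value of the earlier state 1 rises toward 1 and the value of state 3 stays at 0, which is how delayed feedback is propagated back to earlier observations.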

6

Artificial Intelligence for Enhancing Clinical Medicine

From genome to phenome: Predicting multiple cancer phenotypes based on somatic genomic alterations via the genomic impact transformer

Yifeng Tao1, Chunhui Cai2, William W. Cohen1, Xinghua Lu2

1 Carnegie Mellon University, 2 University of Pittsburgh

Yifeng Tao

Cancers are mainly caused by somatic genomic alterations (SGAs) that perturb cellular signaling systems and eventually activate oncogenic processes. Therefore, understanding the functional impact of SGAs is a fundamental task in cancer biology and precision oncology. Here, we present a deep neural network model with encoder-decoder architecture, referred to as genomic impact transformer (GIT), to infer the functional impact of SGAs on cellular signaling systems through modeling the statistical relationships between SGA events and differentially expressed genes (DEGs) in tumors. The model utilizes a multi-head self-attention mechanism to identify SGAs that likely cause DEGs, or in other words, to differentiate potential driver SGAs from passenger ones in a tumor. The GIT model learns a vector (gene embedding) as an abstract representation of functional impact for each SGA-affected gene. Given the SGAs of a tumor, the model can instantiate the states of the hidden layer, providing an abstract representation (tumor embedding) reflecting characteristics of perturbed molecular/cellular processes in the tumor, which in turn can be used to predict multiple phenotypes. We apply the GIT model to 4,468 tumors profiled by The Cancer Genome Atlas (TCGA) project. The attention mechanism enables the model to better capture the statistical relationship between SGAs and DEGs than conventional methods, and distinguishes cancer drivers from passengers. The learned gene embeddings capture the functional similarity of SGAs perturbing common pathways. The tumor embeddings are shown to be useful for tumor status representation, and phenotype prediction including patient survival time and drug response of cancer cell lines.
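Editorial illustration: to make the idea of attention-weighted pooling concrete, the sketch below combines per-gene embedding vectors into a single tumor-level vector using one softmax attention head. This is a deliberate simplification and not the GIT architecture; the two-dimensional embeddings and the query vector are invented for the example.

```python
import math


def attention_pool(gene_embeddings, query):
    """Softmax-weighted sum of gene embeddings (a single attention head)."""
    scores = [sum(q * e for q, e in zip(query, emb)) for emb in gene_embeddings]
    top = max(scores)  # subtract the maximum for numerical stability
    exp_scores = [math.exp(s - top) for s in scores]
    total = sum(exp_scores)
    weights = [x / total for x in exp_scores]
    dim = len(query)
    pooled = [
        sum(w * emb[i] for w, emb in zip(weights, gene_embeddings))
        for i in range(dim)
    ]
    return weights, pooled


# Hypothetical 2-dimensional embeddings for three altered genes in one tumor.
embeddings = [[2.0, 0.0], [0.0, 1.0], [0.1, 0.1]]
weights, tumor_embedding = attention_pool(embeddings, query=[1.0, 0.0])
```

The weights sum to one, and the gene whose embedding aligns best with the query receives the largest weight, which is the sense in which attention can rank candidate driver alterations above likely passengers.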

7

Artificial Intelligence for Enhancing Clinical Medicine

Automated phenotyping of patients with non-alcoholic fatty liver disease reveals clinically relevant disease subtypes

Maxence Vandromme, Tomi Jun, Ponni Perumalswami, Joel T. Dudley, Andrea Branch, Li Li

Icahn School of Medicine at Mount Sinai, Sema4

Maxence Vandromme

Non-alcoholic fatty liver disease (NAFLD) is a complex heterogeneous disease which affects more than 20% of the population worldwide. Some subtypes of NAFLD have been clinically identified using hypothesis-driven methods. In this study, we used data mining techniques to search for subtypes in an unbiased fashion. Using electronic signatures of the disease, we identified a cohort of 13,290 patients with NAFLD from a hospital database. We gathered clinical data from multiple sources and applied unsupervised clustering to identify five subtypes among this cohort. Descriptive statistics and survival analysis showed that the subtypes were clinically distinct and were associated with different rates of death, cirrhosis, hepatocellular carcinoma, chronic kidney disease, cardiovascular disease, and myocardial infarction. Novel disease subtypes identified in this manner could be used to risk-stratify patients and guide management.

8

Artificial Intelligence for Enhancing Clinical Medicine

Monitoring ICU Mortality Risk with A Long Short-Term Memory Recurrent Neural Network

Ke Yu1, Mingda Zhang2, Tianyi Cui2, Milos Hauskrecht2

1 Intelligent Systems Program, University of Pittsburgh; 2 Department of Computer Science, University of Pittsburgh

Ke Yu

In intensive care units (ICU), mortality prediction is a critical factor not only for effective medical intervention but also for allocation of clinical resources. Structured electronic health records (EHR) contain valuable information for assessing mortality risk in ICU patients, but current mortality prediction models usually require laborious human-engineered features. Furthermore, substantial missing data in EHR is a common problem for both the construction and implementation of a prediction model. Inspired by language-related models, we design a new framework for dynamic monitoring of patients’ mortality risk. Our framework uses the bag-of-words representation for all relevant medical events based on most recent history as inputs. By design, it is robust to missing data in EHR and can be easily implemented as an instant scoring system to monitor the medical development of all ICU patients. Specifically, our model uses latent semantic analysis (LSA) to encode the patients’ states into low-dimensional embeddings, which are further fed to long short-term memory networks for mortality risk prediction. Our results show that the deep learning based framework performs better than the existing severity scoring system, SAPS-II. We observe that bidirectional long short-term memory demonstrates superior performance, probably due to the successful capture of both forward and backward temporal dependencies.

9

INTRINSICALLY DISORDERED PROTEINS (IDPS) AND THEIR FUNCTIONS

PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS

10

Intrinsically Disordered Proteins (IDPs) and Their Functions

Disordered Function Conjunction: On the in-silico function annotation of intrinsically disordered regions

Sina Ghadermarzi, Akila Katuwawala, Christopher J. Oldfield, Amita Barik, Lukasz Kurgan

Department of Computer Science, Virginia Commonwealth University, 401 West Main Street, Richmond, VA 23284, USA

Lukasz Kurgan

Intrinsically disordered regions (IDRs) lack a stable structure, yet perform biological functions. The functions of IDRs include mediating interactions with other molecules, including proteins, DNA, or RNA, and entropic functions, including domain linkers. Computational predictors provide residue-level indications of function for disordered proteins, which contrasts with the need to functionally annotate the thousands of experimentally and computationally discovered IDRs. In this work, we investigate the feasibility of using residue-level prediction methods for region-level function predictions. For an initial examination of the multiple function region-level prediction problem, we constructed a dataset of (likely) single function IDRs in proteins that are dissimilar to the training datasets of the residue-level function predictors. We find that available residue-level prediction methods are only modestly useful in predicting multiple region-level functions. Classification is enhanced by simultaneous use of multiple residue-level function predictions and is further improved by inclusion of amino acid content extracted from the protein sequence. We conclude that multifunction prediction for IDRs is feasible and benefits from the results produced by current residue-level function predictors; however, it has to accommodate inaccuracy in functional annotations.

11

Intrinsically Disordered Proteins (IDPs) and Their Functions

De novo ensemble modeling suggests that AP2-binding to disordered regions can increase steric volume of Epsin but not Eps15

N. Suhas Jagannathan1, Christopher W. V. Hogue2, Lisa Tucker-Kellogg3

1 Duke-NUS Medical School; 2 National University of Singapore, 600 Epic Way Unit 345 San Jose CA 95134; 3 Cancer & Stem Cell Biology and Centre for Computational Biology, Duke-NUS Medical School

Lisa Tucker-Kellogg

Proteins with intrinsically disordered regions (IDRs) have large hydrodynamic radii, compared with globular proteins of equivalent weight. Recent experiments showed that IDRs with large radii can create steric pressure to drive membrane curvature during Clathrin-mediated endocytosis (CME). Epsin and Eps15 are two CME proteins with IDRs that contain multiple motifs for binding the adaptor protein AP2, but the impact of AP2-binding on these IDRs is unknown. Some IDRs acquire binding-induced function by forming a folded quaternary structure, but we hypothesize that the IDRs of Epsin and/or Eps15 acquire binding-induced function by increasing their steric volume. We explore this hypothesis in silico by generating conformational ensembles of the IDRs of Epsin (4 million structures) or Eps15 (3 million structures), then estimating the impact of AP2-binding on Radius of Gyration (RG). Results show that the ensemble of Epsin IDR conformations that accommodate AP2 binding has a right-shifted distribution of RG (larger radii) relative to the unbound Epsin ensemble. In contrast, the ensemble of Eps15 IDR conformations has a comparable RG distribution between AP2-bound and unbound. We speculate that AP2 triggers the Epsin IDR to function through binding-induced expansion, which could increase steric pressure and membrane bending during CME.
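Editorial illustration: the radius of gyration of a conformation is the root-mean-square distance of its atoms from their centroid. The sketch below computes it under the assumption of equal masses for all points (the abstract does not state whether mass weighting was used), with made-up coordinates standing in for a compact and an extended chain.

```python
import math


def radius_of_gyration(coords):
    """Root-mean-square distance of 3D points from their centroid (equal masses)."""
    n = len(coords)
    centroid = [sum(p[i] for p in coords) / n for i in range(3)]
    squared = sum(
        sum((p[i] - centroid[i]) ** 2 for i in range(3)) for p in coords
    )
    return math.sqrt(squared / n)


# Toy conformations: four points folded into a square versus stretched on a line.
compact = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0)]
extended = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
rg_compact = radius_of_gyration(compact)
rg_extended = radius_of_gyration(extended)
```

Applying such a function to every structure in two ensembles and comparing the resulting histograms is one way to see a "right-shifted" distribution: the extended toy chain has the larger value here.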

12

Intrinsically Disordered Proteins (IDPs) and Their Functions

Modulation of p53 Transactivation Domain Conformations by Ligand Binding and Cancer-Associated Mutations

Xiaorong Liu, Jianhan Chen

University of Massachusetts Amherst

Jianhan Chen

Intrinsically disordered proteins (IDPs) are important functional proteins, and their deregulation is linked to numerous human diseases including cancers. Understanding how disease-associated mutations or drug molecules can perturb the sequence-disordered ensemble-function-disease relationship of IDPs remains challenging, because it requires detailed characterization of the heterogeneous structural ensembles of IDPs. In this work, we combine the latest atomistic force field a99SB-disp, the enhanced sampling technique replica exchange with solute tempering, and GPU-accelerated molecular dynamics simulations to investigate how four cancer-associated mutations, K24N, N29K/N30D, D49Y, and W53G, and binding of an anti-cancer molecule, epigallocatechin gallate (EGCG), modulate the disordered ensemble of the transactivation domain (TAD) of tumor suppressor p53. Through extensive sampling, in excess of 1.0 μs per replica, well-converged structural ensembles of wild-type and mutant p53-TAD as well as WT p53-TAD in the presence of EGCG were generated. The results reveal that mutants could induce local structural changes and affect secondary structural properties. Interestingly, both EGCG binding and N29K/N30D could also induce long-range structural reorganizations and lead to more compact structures that could shield key binding sites of p53-TAD regulators. Further analysis reveals that the effects of EGCG binding are mainly achieved through nonspecific interactions. These observations are generally consistent with on-going NMR studies and binding assays. Our studies suggest that induced conformational collapse of IDPs may be a general mechanism for shielding functional sites, thus inhibiting recognition of their targets. The current study also demonstrates that atomistic simulations provide a viable approach for studying the sequence-disordered ensemble-function-disease relationships of IDPs and developing new drug design strategies targeting regulatory IDPs.

13

Intrinsically Disordered Proteins (IDPs) and Their Functions

Exploring Relationships between the Density of Charged Tracts within Disordered Regions and Phase Separation

Ramiz Somjee1,2, Diana M. Mitrea1, Richard W. Kriwacki1,3

1 St. Jude Children's Research Hospital; 2 Rhodes College, 3 University of Tennessee Health Sciences Center

Ramiz Somjee

Biomolecular condensates form through a process termed phase separation and play diverse roles throughout the cell. Proteins that undergo phase separation often have disordered regions that can engage in weak, multivalent interactions; however, our understanding of the sequence grammar that defines which proteins phase separate is far from complete. Here, we show that proteins that display a high density of charged tracts within intrinsically disordered regions are likely to be constituents of electrostatically organized biomolecular condensates. We scored the human proteome using an algorithm termed ABTdensity that quantifies the density of charged tracts and observed that proteins with more charged tracts are enriched in particular Gene Ontology annotations and, based upon analysis of interaction networks, cluster into distinct biomolecular condensates. These results suggest that electrostatically-driven, multivalent interactions involving charged tracts within disordered regions serve to organize certain biomolecular condensates through phase separation.

14

MUTATIONAL SIGNATURES

PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS

15

Mutational Signatures

PhySigs: Phylogenetic Inference of Mutational Signature Dynamics

Sarah Christensen1, Mark D. M. Leiserson2, Mohammed El-Kebir1

1 University of Illinois at Urbana-Champaign, 2 University of Maryland

Sarah Christensen

Distinct mutational processes shape the genomes of the clones comprising a tumor. These processes result in distinct mutational patterns, summarized by a small number of mutational signatures. Current analyses of clone-specific exposures to mutational signatures do not fully incorporate a tumor’s evolutionary context, either inferring identical exposures for all tumor clones, or inferring exposures for each clone independently. Here, we introduce the Tree-constrained Exposure problem to infer a small number of exposure shifts along the edges of a given tumor phylogeny. Our algorithm, PhySigs, solves this problem and includes model selection to identify the number of exposure shifts that best explain the data. We validate our approach on simulated data and identify exposure shifts in lung cancer data, including at least one shift with a matching subclonal driver mutation in the mismatch repair pathway. Moreover, we show that our approach enables the prioritization of alternative phylogenies inferred from the same sequencing data. PhySigs is publicly available at https://github.com/elkebir-group/PhySigs

16

Mutational Signatures

TrackSigFreq: subclonal reconstructions based on mutation signatures and allele frequencies

Caitlin F. Harrigan1,2,4, Yulia Rubanova1,2,4, Quaid Morris1,2,3,4,5,6, Alina Selega2,4

1 Department of Computer Science, University of Toronto, Toronto, Canada; 2 Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Canada; 3 Department of Molecular Genetics, University of Toronto, Toronto, Canada; 4 Vector Institute, Toronto, Canada; 5 Ontario Institute for Cancer Research, Toronto, Canada; 6 Memorial Sloan Kettering Cancer Centre, New York, USA (pending)

Cait Harrigan

Mutational signatures are patterns of mutation types, many of which are linked to known mutagenic processes. Signature activity represents the proportion of mutations a signature generates. In cancer, cells may gain advantageous phenotypes through mutation accumulation, causing rapid growth of that subpopulation within the tumour. The presence of many subclones can make cancers harder to treat and have other clinical implications. Reconstructing changes in signature activities can give insight into the evolution of cells within a tumour. Recently, we introduced a new method, TrackSig, to detect changes in signature activities across time from a single bulk tumour sample. By design, TrackSig is unable to identify mutation populations with different frequencies but little to no difference in signature activity. Here we present an extension of this method, TrackSigFreq, which enables trajectory reconstruction based on both observed density of mutation frequencies and changes in mutational signature activities. TrackSigFreq preserves the advantages of TrackSig, namely optimal and rapid mutation clustering through segmentation, while extending it so that it can identify distinct mutation populations that share similar signature activities.

17

Mutational Signatures

DNA Repair Footprint Uncovers Contribution of DNA Repair Mechanism to Mutational Signatures

Damian Wojtowicz1, Mark D. M. Leiserson2, Roded Sharan3, Teresa M. Przytycka1

1 NIH, 2 University of Maryland, 3 Tel Aviv University

Teresa Przytycka

Cancer genomes accumulate a large number of somatic mutations resulting from imperfection of DNA processing during normal cell cycle as well as from carcinogenic exposures or cancer related aberrations of DNA maintenance machinery. These processes often lead to distinctive patterns of mutations, called mutational signatures. Several computational methods have been developed to uncover such signatures from catalogs of somatic mutations. However, cancer mutational signatures are the end-effect of several interplaying factors including carcinogenic exposures and potential deficiencies of the DNA repair mechanism. To fully understand the nature of each signature, it is important to disambiguate the atomic components that contribute to the final signature. Here, we introduce a new descriptor of mutational signatures, DNA Repair FootPrint (RePrint), and show that it can capture common properties of deficiencies in repair mechanisms contributing to diverse signatures. We validate the method with published mutational signatures from cell lines targeted with CRISPR-Cas9-based knockouts of DNA repair genes.

18

PATTERN RECOGNITION IN BIOMEDICAL DATA: CHALLENGES IN PUTTING BIG DATA TO WORK

PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS

19

Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work

Clinical Concept Embeddings Learned from Massive Sources of Multimodal Medical Data

Andrew L. Beam1, Benjamin Kompa2, Allen Schmaltz1, Inbar Fried3, Griffin Weber2, Nathan Palmer2, Xu Shi1, Tianxi Cai1, Isaac S. Kohane3

1 Harvard T.H. Chan School of Public Health, 2 Harvard Medical School, 3 University of North Carolina School of Medicine

Benjamin Kompa

Word embeddings are a popular approach to unsupervised learning of word relationships that are widely used in natural language processing. In this article, we present a new set of embeddings for medical concepts learned using an extremely large collection of multimodal medical data. Leaning on recent theoretical insights, we demonstrate how an insurance claims database of 60 million members, a collection of 20 million clinical notes, and 1.7 million full text biomedical journal articles can be combined to embed concepts into a common space, resulting in the largest ever set of embeddings for 108,477 medical concepts. To evaluate our approach, we present a new benchmark methodology based on statistical power specifically designed to test embeddings of medical concepts. Our approach, called cui2vec, attains state-of-the-art performance relative to previous methods in most instances. Finally, we provide a downloadable set of pre-trained embeddings for other researchers to use, as well as an online tool for interactive exploration of the cui2vec embeddings.

20

Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work

Assessment of Imputation Methods for Missing Gene Expression Data in Meta-Analysis of Distinct Cohorts of Tuberculosis Patients

Carly A. Bobak, Lauren McDonnell, Matthew D. Nemesure, Justin Lin, Jane E. Hill

Dartmouth College

Carly Bobak

The growth of publicly available repositories, such as the Gene Expression Omnibus, has allowed researchers to conduct meta-analysis of gene expression data across distinct cohorts. In this work, we assess eight imputation methods for their ability to impute gene expression data when values are missing across an entire cohort of Tuberculosis (TB) patients. We investigate how varying proportions of missing data (across 10%, 20%, and 30% of patient samples) influence the imputation results, and test for significantly differentially expressed genes and enriched pathways in patients with active TB. Our results indicate that truncating to common genes observed across cohorts, which is the current method used by researchers, results in the exclusion of important biology and suggest that LASSO and LLS imputation methodologies can reasonably impute genes across cohorts when total missingness rates are below 20%.

21

Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work

Towards identifying drug side effects from social media using active learning and crowd sourcing

Sophie Burkhardt, Julia Siekiera, Josua Glodde, Miguel A. Andrade-Navarro, Stefan Kramer

University of Mainz

Sophie Burkhardt

Motivation: Social media is a largely untapped source of information on side effects of drugs. Twitter in particular is widely used to report on everyday events and personal ailments. However, labeling this noisy data is a difficult problem because labeled training data is sparse and automatic labeling is error-prone. Crowd sourcing can help in such a scenario to obtain more reliable labels, but is expensive in comparison because workers have to be paid. To remedy this, semi-supervised active learning may reduce the number of labeled data needed and focus the manual labeling process on important information.

Results: We extracted data from Twitter using the public API. We subsequently use Amazon Mechanical Turk in combination with a state-of-the-art semi-supervised active learning method to label tweets with their associated drugs and side effects in two stages. Our results show that our method is an effective way of discovering side effects in tweets with an improvement from 53% F-measure to 67% F-measure as compared to a one stage workflow. Additionally, we show the effectiveness of the active learning scheme in reducing the labeling cost in comparison to a non-active baseline.

22

Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work

Microvascular Dynamics from 4D Microscopy Using Temporal Segmentation

Shir Gur, Lior Wolf, Lior Golgher, Pablo Blinder

Tel Aviv University

Lior Wolf

Recently developed methods for rapid continuous volumetric two-photon microscopy facilitate the observation of neuronal activity in hundreds of individual neurons and changes in blood flow in adjacent blood vessels across a large volume of living brain at unprecedented spatio-temporal resolution. However, the high imaging rate necessitates fully automated image analysis, whereas tissue turbidity and photo-toxicity limitations lead to extremely sparse and noisy imagery. In this work, we extend a recently proposed deep learning volumetric blood vessel segmentation network, such that it supports temporal analysis. With this technology, we are able to track changes in cerebral blood volume over time and identify spontaneous arterial dilations that propagate towards the pial surface. This new capability is a promising step towards characterizing the hemodynamic response function upon which functional magnetic resonance imaging (fMRI) is based.

23

Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work

Using Transcriptional Signatures to Find Cancer Drivers with LURE

David Haan, Ruikang Tao, Verena Friedl, Ioannis N. Anastopoulos, Christopher K. Wong, Alana S. Weinstein, Joshua M. Stuart

Dept. of Biomolecular Engineering and UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064 USA

David Haan

Cancer genome projects have produced multidimensional datasets on thousands of samples. Yet, depending on the tumor type, 5-50% of samples have no known driving event. We introduce a semi-supervised method called Learning UnRealized Events (LURE) that uses a progressive label learning framework and minimum spanning analysis to predict cancer drivers based on their altered samples sharing a gene expression signature with the samples of a known event. We demonstrate the utility of the method on the TCGA Pan-Cancer Atlas dataset for which it produced a high-confidence result relating 59 new connections to 18 known mutation events including alterations in the same gene, family, and pathway. We give examples of predicted drivers involved in TP53, telomere maintenance, and MAPK/RTK signaling pathways. LURE identifies connections between genes with no known prior relationship, some of which may offer clues for targeting specific forms of cancer. Code and Supplemental Material are available on the LURE website: https://sysbiowiki.soe.ucsc.edu/lure.

24

Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work

PAGE-Net: Interpretable and Integrative Deep Learning for Survival Analysis Using Histopathological Images and Genomic Data

Jie Hao1, Sai Chandra Kosaraju2, Nelson Zange Tsaku3, Dae Hyun Song4, Mingon Kang2

1 University of Pennsylvania, 2 University of Nevada Las Vegas, 3 Kennesaw State University, 4 Gyeongsang National University Changwon Hospital

Jie Hao

The integration of multi-modal data, such as histopathological images and genomic data, is essential for understanding cancer heterogeneity and complexity for personalized treatments, as well as for enhancing survival predictions in cancer study. Histopathology, as a clinical gold-standard tool for diagnosis and prognosis in cancers, allows clinicians to make precise decisions on therapies, whereas high-throughput genomic data have been investigated to dissect the genetic mechanisms of cancers. We propose a biologically interpretable deep learning model (PAGE-Net) that integrates histopathological images and genomic data, not only to improve survival prediction, but also to identify genetic and histopathological patterns that cause different survival rates in patients. PAGE-Net consists of pathology/genome/demography-specific layers, each of which provides comprehensive biological interpretation. In particular, we propose a novel patch-wise texture-based convolutional neural network, with a patch aggregation strategy, to extract global survival-discriminative features, without manual annotation for the pathology-specific layers. We adapted the pathway-based sparse deep neural network, named Cox-PASNet, for the genome-specific layers. The proposed deep learning model was assessed with the histopathological images and the gene expression data of Glioblastoma Multiforme (GBM) at The Cancer Genome Atlas (TCGA) and The Cancer Imaging Archive (TCIA). PAGE-Net achieved a C-index of 0.702, which is higher than the results achieved with only histopathological images (0.509) and Cox-PASNet (0.640). More importantly, PAGE-Net can simultaneously identify histopathological and genomic prognostic factors associated with patients’ survivals. The source code of PAGE-Net is publicly available at https://github.com/DataX-JieHao/PAGE-Net
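Editorial illustration: the C-index reported above measures how often a model ranks pairs of patients in the right order of risk. The sketch below is a plain implementation of Harrell's concordance index on invented data; it is not taken from the PAGE-Net code, and the times, event flags, and risk scores are made up for the example.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index: fraction of comparable pairs ranked correctly.

    A pair (i, j) is comparable when subject i has the shorter time and an
    observed event. It is concordant when subject i also has the higher
    predicted risk. Ties in predicted risk count as one half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable


times = [5, 8, 12, 20]     # follow-up times (arbitrary units)
events = [1, 0, 1, 1]      # 0 marks a censored subject
risks = [0.9, 0.4, 0.5, 0.1]
print(concordance_index(times, events, risks))  # prints 1.0
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives a scale for reading the reported values of 0.509, 0.640, and 0.702.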

25

Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work

Machine learning algorithms for simultaneous supervised detection of peaks in multiple samples and cell types

Toby Dylan Hocking1, Guillaume Bourque2

1 Northern Arizona University, 2 McGill University

Toby Hocking

Joint peak detection is a central problem when comparing samples in epigenomic data analysis, but current algorithms for this task are unsupervised and limited to at most two sample types. We propose PeakSegPipeline, a new genome-wide multi-sample peak calling pipeline for epigenomic data sets. It performs peak detection using a constrained maximum likelihood segmentation model with essentially only one free parameter that needs to be tuned: the number of peaks. To select the number of peaks, we propose to learn a penalty function based on user-provided labels that indicate genomic regions with or without peaks in specific samples. In comparisons with state-of-the-art peak detection algorithms, PeakSegPipeline achieves similar or better accuracy, and a more interpretable model with overlapping peaks that occur in exactly the same positions across all samples. Our novel approach is able to learn that predicted peak sizes vary by experiment type.

26

Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work

Graph-based information diffusion method for prioritizing functionally related genes in protein-protein interaction networks

Minh Pham, Olivier Lichtarge

Baylor College of Medicine

Minh Pham

Shortest path length methods are routinely used to validate whether genes of interest are functionally related to each other based on biological network information. However, the methods are computationally intensive, impeding extensive utilization of network information. In addition, the non-weighted shortest path length approach, which is more frequently used, often treats all network connections equally without taking into account the confidence levels of the associations. On the other hand, the graph-based information diffusion method, which employs both the presence and confidence weights of network edges, can efficiently explore large networks and has previously detected meaningful biological patterns. Therefore, in this study, we hypothesized that the graph-based information diffusion method could prioritize genes with relevant functions more efficiently and accurately than the shortest path length approaches. We demonstrated that the graph-based information diffusion method substantially differentiated not only genes participating in the same biological pathways (p<<0.0001) but also genes associated with specific human drug-induced clinical symptoms (p<<0.0001) from random. Furthermore, the diffusion method prioritized these functionally related genes faster and more accurately than the shortest path length approaches (pathways: p=2.7e-28, clinical symptoms: p=0.032). These data show the graph-based information diffusion method can be routinely used for robust prioritization of functionally related genes, facilitating efficient network validation and hypothesis generation, especially for human phenotype-specific genes.
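Editorial illustration: the abstract does not specify which diffusion scheme was used, so the sketch below substitutes a common one, random walk with restart, to show how edge confidence weights can steer the spread of signal from seed genes. The graph, node names, weights, and restart probability are all invented for the example.

```python
def diffuse(weights, seeds, restart=0.5, iterations=100):
    """Random walk with restart on a weighted undirected graph.

    weights: dict mapping node -> {neighbor: confidence weight}
    seeds:   set of nodes where the signal starts
    Returns a dict of scores; a higher score means closer to the seeds.
    """
    nodes = list(weights)
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    score = dict(start)
    for _ in range(iterations):
        nxt = {n: restart * start[n] for n in nodes}
        for node in nodes:
            total = sum(weights[node].values())
            for neighbor, w in weights[node].items():
                # Each node passes on its score in proportion to edge weight.
                nxt[neighbor] += (1 - restart) * score[node] * w / total
        score = nxt
    return score


# Toy network: A is linked to B with high confidence and to C with low confidence.
graph = {
    "A": {"B": 0.9, "C": 0.1},
    "B": {"A": 0.9, "D": 0.5},
    "C": {"A": 0.1},
    "D": {"B": 0.5},
}
scores = diffuse(graph, seeds={"A"})
```

B and C are both one step from the seed, so an unweighted shortest path treats them alike, whereas the diffusion score ranks B above C because its edge carries more confidence.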

27

Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work

A Literature-Based Knowledge Graph Embedding Method for Identifying Drug Repurposing Opportunities in Rare Diseases

Daniel N. Sosa, Alexander Derry, Margaret Guo, Eric Wei, Connor Brinton, Russ B. Altman

Stanford University

Daniel Sosa

Millions of Americans are affected by rare diseases, many of which have poor survival rates. However, the small market size of individual rare diseases, combined with the time and capital requirements of pharmaceutical R&D, has hindered the development of new drugs for these cases. A promising alternative is drug repurposing, whereby existing FDA-approved drugs might be used to treat diseases different from their original indications. In order to generate drug repurposing hypotheses in a systematic and comprehensive fashion, it is essential to integrate information from across the literature of pharmacology, genetics, and pathology. To this end, we leverage a newly developed knowledge graph, the Global Network of Biomedical Relationships (GNBR). GNBR is a large, heterogeneous knowledge graph comprising drug, disease, and gene (or protein) entities linked by a small set of semantic “themes” derived from the abstracts of biomedical literature. We apply a knowledge graph embedding method that explicitly models the uncertainty associated with literature-derived relationships and uses link prediction to generate drug repurposing hypotheses. This approach achieves high performance on a gold-standard test set of known drug indications (AUROC=0.89) and is capable of generating novel repurposing hypotheses, which we independently validate using external literature sources and protein interaction networks. Finally, we demonstrate the ability of our model to produce explanations of its predictions.

28

Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work

Two-stage ML Classifier for Identifying Host Protein Targets of the Dengue Protease

Jacob T. Stanley, Alison R. Gilchrist, Alex C. Stabell, Mary A. Allen, Sara L. Sawyer, Robin D. Dowell

Department of Molecular, Cellular and Developmental Biology; BioFrontiers Institute; University of Colorado Boulder (all authors have the same affiliation)

Jacob Stanley

Flaviviruses such as dengue encode a protease that is essential for viral replication. The protease functions by cleaving well-conserved positions in the viral polyprotein. In addition to the viral polyprotein, the dengue protease cleaves at least one host protein involved in immune response. This raises the question, what other host proteins are targeted and cleaved? Here we present a new computational method for identifying putative host protein targets of the dengue virus protease. Our method relies on biochemical and secondary structure features at the known cleavage sites in the viral polyprotein in a two-stage classification process to identify putative cleavage targets. The accuracy of our predictions scaled inversely with evolutionary distance when we applied it to the known cleavage sites of several other flaviviruses, a good indication of the validity of our predictions. Ultimately, our classifier identified 257 human protein sites possessing both a similar target motif and accessible local structure. These proteins are promising candidates for further investigation. As the number of viral sequences expands, our method could be adopted to predict host targets of other flaviviruses.

29

Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work

Enhancing Model Interpretability and Accuracy for Disease Progression Prediction via Phenotype-Based Patient Similarity Learning

Yue Wang1, Tong Wu1,2, Yunlong Wang1, Gao Wang3

1IQVIA Inc., 2University of Minnesota, 3University of Chicago

Yue Wang

Models have been proposed to extract temporal patterns from longitudinal electronic health records (EHR) for clinical predictive models. However, the common relations among patients (e.g., receiving the same medical treatments) were rarely considered. In this paper, we propose to learn patient similarity features as phenotypes from the aggregated patient-medical service matrix using non-negative matrix factorization. On real-world medical claim data, we show that the learned phenotypes are coherent within each group, and also explanatory and indicative of targeted diseases. We conducted experiments to predict the diagnoses for Chronic Lymphocytic Leukemia (CLL) patients. Results show that the phenotype-based similarity features can improve prediction over multiple baselines, including logistic regression, random forest, convolutional neural network, and more.
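As an illustration of the factorization step named in this abstract, the sketch below applies non-negative matrix factorization to a toy patient-by-service count matrix using the standard multiplicative update rules. It is not the authors' implementation: the function names (`nmf`, `frobenius_error`), the toy matrix, the rank, and the iteration count are all assumptions chosen for the example.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Frobenius norm of V - W H."""
    wh = matmul(w, h)
    return sum(
        (v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0]))
    ) ** 0.5


def nmf(v, rank, n_iter=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (patients x services) as W H.

    W (patients x rank) holds each patient's loading on each latent
    "phenotype"; H (rank x services) holds each phenotype's service profile.
    Uses multiplicative updates, which keep every entry non-negative.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h


# Hypothetical toy data: 4 patients x 4 services, two obvious usage groups.
toy = [[2, 2, 0, 0], [4, 4, 0, 0], [0, 0, 3, 3], [0, 0, 1, 1]]
w, h = nmf(toy, rank=2)
```

In this reading, the rows of `w` would serve as the patient similarity features passed to a downstream classifier; how the authors actually construct and use those features is not specified in the abstract.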

30

PRECISION MEDICINE: ADDRESSING THE CHALLENGES OF SHARING, ANALYSIS, AND PRIVACY AT SCALE

PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS

31

Precision medicine: addressing the challenges of sharing, analysis, and privacy at scale

Integrated Cancer Subtyping using Heterogeneous Genome-Scale Molecular Datasets

Suzan Arslanturk1, Sorin Draghici1, Tin Nguyen2

1Wayne State University, 2University of Nevada

Sorin Draghici

Vast repositories of heterogeneous data from existing sources present unique opportunities. Taken individually, each of the datasets offers solutions to important domain and source-specific questions. Collectively, they represent complementary views of related data entities with an aggregate information value often well exceeding the sum of its parts. Integration of heterogeneous data is therefore paramount to i) obtain a more unified picture and comprehensive view of the relations, ii) achieve more robust results, iii) improve the accuracy and integrity, and iv) illuminate the complex interactions among data features. In this paper, we have proposed a data integration methodology to identify subtypes of cancer using multiple data types (mRNA, methylation, microRNA and somatic variants) and different data scales that come from different platforms (microarray, sequencing, etc.). The Cancer Genome Atlas (TCGA) dataset is used to build the data integration and cancer subtyping framework. The proposed data integration and disease subtyping approach accurately identifies novel subgroups of patients with significantly different survival profiles. With current availability of vast genomics and variant data for cancer, the proposed data integration system will better differentiate cancer and patient subtypes for risk and outcome prediction and targeted treatment planning without additional cost and precious lost time.

32

Precision medicine: addressing the challenges of sharing, analysis, and privacy at scale

Assessment of coverage for endogenous metabolites and exogenous chemical compounds using an untargeted metabolomics platform

Sek Won Kong1, Carles Hernandez-Ferrer2

1Computational Health Informatics Program, Boston Children’s Hospital, 300 Longwood Avenue Boston, MA 02115, USA; 2Department of Pediatrics, Harvard Medical School, Boston, MA 02115, USA

Sek Won Kong

Physiological status and pathological changes in an individual can be captured by metabolic state that reflects the influence of both genetic variants and environmental factors such as diet, lifestyle and gut microbiome. The totality of environmental exposure throughout lifetime (i.e., the exposome) is difficult to measure with current technologies. However, targeted measurement of exogenous chemicals and untargeted profiling of endogenous metabolites have been widely used to discover biomarkers of pathophysiologic changes and to understand functional impacts of genetic variants. To investigate the coverage of chemical space and interindividual variation related to demographic and pathological conditions, we profiled 169 plasma samples using an untargeted metabolomics platform. On average, 1,009 metabolites were quantified in each individual (range 906–1,038) out of 1,244 total chemical compounds detected in our cohort. Of note, age was positively correlated with the total number of detected metabolites in both males and females. Using the robust Qn estimator, we found metabolite outliers in each sample (mean 22, range from 7 to 86). A total of 50 metabolites were outliers in a patient with phenylketonuria, including the ones known for the phenylalanine pathway, suggesting multiple metabolic pathways perturbed in this patient. The largest number of outliers (N=86) was found in a 5-year-old boy with alpha-1-antitrypsin deficiency who was waiting for liver transplantation due to cirrhosis. Xenobiotics including drugs, diets and environmental chemicals were significantly correlated with diverse endogenous metabolites, and the use of antibiotics significantly changed gut microbial products detected in host circulation. Several challenges such as annotation of features, reference range and variance for each feature per age group and gender, and population scale reference datasets need to be addressed; however, untargeted metabolomics could be immediately deployed as a biomarker discovery platform and to evaluate the impact of genomic variants and exposures on metabolic pathways for some diseases.
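For readers unfamiliar with the Qn estimator mentioned in this abstract, the sketch below computes the Rousseeuw-Croux Qn scale (2.2219 times the k-th smallest pairwise absolute difference, with h = floor(n/2) + 1 and k = h(h-1)/2) and uses it to flag outlying values. The flagging rule (distance from the median greater than 3 Qn), the function names, and the example values are assumptions for illustration; the abstract does not state the cutoff or how the estimator was applied, and the small-sample correction factor is omitted here.

```python
from itertools import combinations
from statistics import median


def qn_scale(values):
    """Rousseeuw-Croux Qn robust scale estimate (no finite-sample correction)."""
    n = len(values)
    if n < 2:
        raise ValueError("Qn needs at least two observations")
    # All pairwise absolute differences |x_i - x_j| for i < j, sorted.
    diffs = sorted(abs(a - b) for a, b in combinations(values, 2))
    h = n // 2 + 1
    k = h * (h - 1) // 2  # order statistic to pick (1-based)
    return 2.2219 * diffs[k - 1]


def flag_outliers(values, threshold=3.0):
    """Return indices whose distance from the median exceeds threshold * Qn.

    The threshold of 3 is an assumed convention, not taken from the abstract.
    """
    center = median(values)
    scale = qn_scale(values)
    if scale == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - center) / scale > threshold]


# Hypothetical abundances of one metabolite across seven samples.
abundances = [10.1, 9.8, 10.0, 10.3, 9.9, 10.2, 25.0]
outlier_indices = flag_outliers(abundances)
```

Unlike the standard deviation, Qn is barely affected by the extreme value itself, which is why a single aberrant sample still stands out clearly against the scale estimate.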

33

Precision medicine: addressing the challenges of sharing, analysis, and privacy at scale

Coverage profile correction of shallow-depth circulating cell-free DNA sequencing via multi-distance learning

Nicholas B. Larson, Melissa C. Larson, Jie Na, Carlos P. Sosa, Chen Wang, Jean-Pierre Kocher, Ross Rowsey

Mayo Clinic College of Medicine and Sciences

Nicholas Larson

Shallow-depth whole-genome sequencing (WGS) of circulating cell-free DNA (ccfDNA) is a popular approach for non-invasive genomic screening assays, including liquid biopsy for early detection of invasive tumors as well as non-invasive prenatal screening (NIPS) for common fetal trisomies. In contrast to nuclear DNA WGS, ccfDNA WGS exhibits extensive inter- and intra-sample coverage variability that is not fully explained by typical sources of variation in WGS, such as GC content. This variability may inflate false positive and false negative screening rates of copy-number alterations and aneuploidy, particularly if these features are present at a relatively low proportion of total sequenced content. Herein, we propose an empirically-driven coverage correction strategy that leverages prior annotation information in a multi-distance learning context to improve within-sample coverage profile correction. Specifically, we train a weighted k-nearest neighbors-style method on non-pregnant female donor ccfDNA WGS samples, and apply it to NIPS samples to evaluate coverage profile variability reduction. We additionally characterize improvement in the discrimination of positive fetal trisomy cases relative to normal controls, and compare our results against a more traditional regression-based approach to profile coverage correction based on GC content and mappability. Under cross-validation, performance measures indicated benefit to combining the two feature sets relative to either in isolation. We also observed substantial improvement in coverage profile variability reduction in leave-out clinical NIPS samples, with variability reduced by 26.5-53.5% relative to the standard regression-based method as quantified by median absolute deviation. Finally, we observed improved discrimination for screening positive trisomy cases, reducing ccfDNA WGS coverage variability while additionally improving NIPS trisomy screening assay performance. Overall, our results indicate that machine learning approaches can substantially improve ccfDNA WGS coverage profile correction and downstream analyses.

34

Precision medicine: addressing the challenges of sharing, analysis, and privacy at scale

PGxMine: Text mining for curation of PharmGKB

Jake Lever1, Julia M. Barbarino2, Li Gong2, Rachel Huddart2, Katrin Sangkuhl2, Ryan Whaley2, Michelle Whirl-Carrillo2, Mark Woon2, Teri E. Klein2,3, Russ B. Altman1,2,3

1Department of Bioengineering, Stanford University, Stanford, CA, 94305; 2Department of Biomedical Data Science, Stanford University, Stanford, CA, 94305; 3Department of Medicine, Stanford University, Stanford, CA, 94305

Jake Lever

Precision medicine tailors treatment to individuals' personal data, including differences in their genome. The Pharmacogenomics Knowledgebase (PharmGKB) provides highly curated information on the effect of genetic variation on drug response and side effects for a wide range of drugs. PharmGKB’s scientific curators triage, review and annotate a large number of papers each year but the task is challenging. We present the PGxMine resource, a text-mined resource of pharmacogenomic associations from all accessible published literature to assist in the curation of PharmGKB. We developed a supervised machine learning pipeline to extract associations between a variant (DNA and protein changes, star alleles and dbSNP identifiers) and a chemical. PGxMine covers 452 chemicals and 2,426 variants and contains 19,930 mentions of pharmacogenomic associations across 7,170 papers. An evaluation by PharmGKB curators found that 57 of the top 100 associations not found in PharmGKB led to 83 curatable papers and a further 24 associations would likely lead to curatable papers through citations. The results can be viewed at https://pgxmine.pharmgkb.org/ and code can be downloaded at https://github.com/jakelever/pgxmine.

35

Precision medicine: addressing the challenges of sharing, analysis, and privacy at scale

The power of dynamic social networks to predict individuals' mental health

Shikang Liu1, David Hachen1, Omar Lizardo2, Christian Poellabauer1, Aaron Striegel1, Tijana Milenkovic1

1University of Notre Dame, 2University of California Los Angeles

Shikang Liu

Precision medicine has received attention both in and outside the clinic. We focus on the latter, by exploiting the relationship between individuals' social interactions and their mental health to predict one's likelihood of being depressed or anxious from rich dynamic social network data. Existing studies differ from our work in at least one aspect: they do not model social interaction data as a network; they do so but analyze static network data; they examine “correlation” between social networks and health but without making any predictions; or they study other individual traits but not mental health. In a comprehensive evaluation, we show that our predictive model that uses dynamic social network data is superior to its static network as well as non-network equivalents when run on the same data.

36

Precision medicine: addressing the challenges of sharing, analysis, and privacy at scale

Implementing a Cloud Based Method for Protected Clinical Trial Data Sharing

Gaurav Luthria, Qingbo Wang

Harvard University

Gaurav Luthria

Clinical trials generate a large amount of data that have been underutilized due to obstacles that prevent data sharing including risking patient privacy, data misrepresentation, and invalid secondary analyses. In order to address these obstacles, we developed a novel data sharing method which ensures patient privacy while also protecting the interests of clinical trial investigators. Our flexible and robust approach involves two components: (1) an advanced cloud-based querying language that allows users to test hypotheses without direct access to the real clinical trial data and (2) corresponding synthetic data for the query of interest that allows for exploratory research and model development. Both components can be modified by the clinical trial investigator depending on factors such as the type of trial or number of patients enrolled. To test the effectiveness of our system, we first implement a simple and robust permutation based synthetic data generator. We then use the synthetic data generator coupled with our querying language to identify significant relationships among variables in a realistic clinical trial dataset.
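One simple reading of a "permutation based synthetic data generator" is to shuffle each variable independently across patients, so that every column keeps its exact marginal distribution while no synthetic row corresponds to a real patient. The sketch below shows that idea only; the abstract does not describe the authors' generator, and the function name `permute_columns`, the record layout, and the example fields are hypothetical.

```python
import random


def permute_columns(rows, seed=None):
    """Return synthetic records built by shuffling each column independently.

    rows: list of dicts sharing the same keys (one dict per patient).
    Each column keeps its exact set of values, but the pairing of values
    within a row is randomized, which also removes real associations
    between variables.
    """
    if not rows:
        return []
    rng = random.Random(seed)
    columns = {name: [row[name] for row in rows] for name in rows[0]}
    for values in columns.values():
        rng.shuffle(values)  # in-place shuffle of this column only
    return [{name: columns[name][i] for name in columns} for i in range(len(rows))]


# Hypothetical trial records; field names are illustrative only.
trial = [
    {"age": 34, "arm": "treatment", "response": 1},
    {"age": 51, "arm": "control", "response": 0},
    {"age": 47, "arm": "treatment", "response": 1},
    {"age": 62, "arm": "control", "response": 0},
]
synthetic = permute_columns(trial, seed=42)
```

Because independent shuffling destroys relationships between variables, a generator of this kind would suit schema exploration and pipeline development, with real associations tested through the query layer against the protected data.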

37

Precision medicine: addressing the challenges of sharing, analysis, and privacy at scale

Pathway and network embedding methods for prioritizing psychiatric drugs

Yash Pershad1, Margaret Guo2, Russ B. Altman3

1Stanford University Department of Bioengineering, 2Stanford University Biomedical Informatics Program, 3Stanford University Departments of Bioengineering, Genetics, & Medicine

Yash Pershad

One in five Americans experience mental illness, and roughly 75% of psychiatric prescriptions do not successfully treat the patient’s condition. Extensive evidence implicates genetic factors and signaling disruption in the pathophysiology of these diseases. Changes in transcription often underlie this molecular pathway dysregulation; individual patient transcriptional data can improve the efficacy of diagnosis and treatment. Recent large-scale genomic studies have uncovered shared genetic modules across multiple psychiatric disorders, providing an opportunity for an integrated multi-disease approach for diagnosis. Moreover, network-based models informed by gene expression can represent pathological biological mechanisms and suggest new genes for diagnosis and treatment. Here, we use patient gene expression data from multiple studies to classify psychiatric diseases, integrate knowledge from expert-curated databases and publicly available experimental data to create augmented disease-specific gene sets, and use these to recommend disease-relevant drugs. From Gene Expression Omnibus, we extract expression data from 145 cases of schizophrenia, 82 cases of bipolar disorder, 190 cases of major depressive disorder, and 307 shared controls. We use pathway-based approaches to predict psychiatric disease diagnosis with a random forest model (78% accuracy) and derive important features to augment available drug and disease signatures. Using protein-protein-interaction networks and embedding-based methods, we build a pipeline to prioritize treatments for psychiatric diseases that achieves a 3.4-fold improvement over a background model. Thus, we demonstrate that gene-expression-derived pathway features can diagnose psychiatric diseases and that molecular insights derived from this classification task can inform treatment prioritization for psychiatric diseases.

38

Precision medicine: addressing the challenges of sharing, analysis, and privacy at scale

Robust-ODAL: Learning from heterogeneous health systems without sharing patient-level data

Jiayi Tong1, Rui Duan1, Ruowang Li1, Martijn J. Schuemie2, Jason H. Moore1, Yong Chen1

1University of Pennsylvania, 2Janssen Research and Development LLC

Jiayi Tong

Electronic Health Records (EHR) contain extensive patient data on various health outcomes and risk predictors, providing an efficient and wide-reaching source for health research. Integrated EHR data can provide a larger sample size of the population to improve estimation and prediction accuracy. To overcome the obstacle of sharing patient-level data, distributed algorithms were developed to conduct statistical analyses across multiple clinical sites through sharing only aggregated information. However, the heterogeneity of data across sites is often ignored by existing distributed algorithms, which leads to substantial bias when studying the association between the outcomes and exposures. In this study, we propose a privacy-preserving and communication-efficient distributed algorithm which accounts for the heterogeneity caused by a small number of the clinical sites. We evaluated our algorithm through a systematic simulation study motivated by real-world scenarios and applied our algorithm to multiple claims datasets from the Observational Health Data Sciences and Informatics (OHDSI) network. The results showed that the proposed method performed better than the existing distributed algorithm ODAL and a meta-analysis method.

39

Precision medicine: addressing the challenges of sharing, analysis, and privacy at scale

Computationally efficient, exact, covariate-adjusted genetic principal component analysis by leveraging individual marker summary statistics from large biobanks

Jack Wolf1, Martha Barnard1, Xueting Xia2, Nathan Ryder3, Jason Westra4, Nathan Tintle4

1St. Olaf College, 2Texas Tech University, 3Colorado State University, 4Dordt University

Nathan Tintle

The popularization of biobanks provides an unprecedented amount of genetic and phenotypic information that can be used to research the relationship between genetics and human health. Despite the opportunities these datasets provide, they also pose many problems associated with computational time and costs, data size and transfer, and privacy and security. The publishing of summary statistics from these biobanks, and the use of them in a variety of downstream statistical analyses, alleviates many of these logistical problems. However, major questions remain about how to use summary statistics in all but the simplest downstream applications. Here, we present a novel approach to utilize basic summary statistics (estimates from single marker regressions on single phenotypes) to evaluate more complex phenotypes using multivariate methods. In particular, we present a covariate-adjusted method for conducting principal component analysis (PCA) utilizing only biobank summary statistics. We validate exact formulas for this method, as well as provide a framework of estimation when specific summary statistics are not available, through simulation. We apply our method to a real dataset of fatty acid and genomic data.

40

ARTIFICIAL INTELLIGENCE FOR ENHANCING CLINICAL MEDICINE

PROCEEDINGS PAPERS WITH POSTER PRESENTATIONS

41

Artificial Intelligence for Enhancing Clinical Medicine

Multiclass Disease Classification from Microbial Whole-Community Metagenomes

Saad Khan, Libusha Kelly

Albert Einstein College of Medicine

Saad Khan

The microbiome, the community of microorganisms living within an individual, is a promising avenue for developing non-invasive methods for disease screening and diagnosis. Here, we utilize 5643 aggregated, annotated whole-community metagenomes to implement the first multiclass microbiome disease classifier of this scale, able to discriminate between 18 different diseases and healthy. We compared three different machine learning models: random forests, deep neural nets, and a novel graph convolutional architecture which exploits the graph structure of phylogenetic trees as its input. We show that the graph convolutional model outperforms deep neural nets in terms of accuracy (achieving 75% average test-set accuracy), receiver-operator-characteristics (92.1% average area-under-ROC (AUC)), and precision-recall (50% average area-under-precision-recall (AUPR)). Additionally, the convolutional net's performance complements that of the random forest, showing a lower propensity for Type-I errors (false-positives) while the random forest makes fewer Type-II errors (false-negatives). Lastly, we are able to achieve over 90% average top-3 accuracy across all of our models. Together, these results indicate that there are predictive, disease-specific signatures across microbiomes that can be used for diagnostic purposes.

42

Artificial Intelligence for Enhancing Clinical Medicine

LitGen: Genetic Literature Recommendation Guided by Human Explanations

Allen Nie1, Arturo L. Pineda1, Matt W. Wright1, Hannah Wand1, Bryan Wulf1, Helio A. Costa1, Ronak Y. Patel2, Carlos D. Bustamante1, James Zou1

1Stanford University, 2Baylor College of Medicine

Allen Nie

As genetic sequencing costs decrease, the lack of clinical interpretation of variants has become the bottleneck in using genetics data. A major rate-limiting step in clinical interpretation is the manual curation of evidence in the genetic literature by highly trained biocurators. What makes curation particularly time-consuming is that the curator needs to identify papers that study variant pathogenicity using different types of approaches and evidence (e.g., biochemical assays or case control analysis). In collaboration with the Clinical Genome Resource (ClinGen), the flagship NIH program for clinical curation, we propose the first machine learning system, LitGen, that can retrieve papers for a particular variant and filter them by specific evidence types used by curators to assess for pathogenicity. LitGen uses semi-supervised deep learning to predict the type of evidence provided by each paper. It is trained on papers annotated by ClinGen curators and systematically evaluated on new test data collected by ClinGen. LitGen further leverages rich human explanations and unlabeled data to gain 7.9%-12.6% relative performance improvement over models learned only on the annotated papers. It is a useful framework to improve clinical variant curation.

43

Artificial Intelligence for Enhancing Clinical Medicine

Multilevel Self-Attention Model and its Use on Medical Risk Prediction

Xianlong Zeng1,2, Yunyi Feng1,2, Soheil Moosavinasab2, Deborah Lin2, Simon Lin2, Chang Liu1

1School of Electrical Engineering and Computer Science, Ohio University, Athens, OH, USA; 2The Research Institute at Nationwide Children’s Hospital, Columbus, OH, USA

Xianlong Zeng

Various deep learning models have been developed for different healthcare predictive tasks using Electronic Health Records and have shown promising performance. In these models, medical codes are often aggregated into visit representation without considering their heterogeneity, e.g., the same diagnosis might imply different healthcare concerns with different procedures or medications. Then the visits are often fed into deep learning models, such as recurrent neural networks, sequentially without considering the irregular temporal information and dependencies among visits. To address these limitations, we developed a Multilevel Self-Attention Model (MSAM) that can capture the underlying relationships between medical codes and between medical visits. We compared MSAM with various baseline models on two predictive tasks, i.e., future disease prediction and future medical cost prediction, with two large datasets, i.e., MIMIC-3 and PFK. In the experiments, MSAM consistently outperformed baseline models. Additionally, for future medical cost prediction, we used disease prediction as an auxiliary task, which not only guides the model to achieve a stronger and more stable financial prediction, but also allows managed care organizations to provide better care coordination.

44

Artificial Intelligence for Enhancing Clinical Medicine

Identifying Transitional High Cost Users from Unstructured Patient Profiles Written by Primary Care Physicians

Haoran Zhang1,2,3, Elisa Candido3, Andrew S. Wilton3, Raquel Duchen3, Liisa Jaakkimainen3, Walter Wodchis3,4,5, Quaid Morris1,2,6,7

1Department of Computer Science, University of Toronto; 2Vector Institute for Artificial Intelligence, Toronto, Ontario, Canada; 3ICES, Toronto, Ontario, Canada; 4Institute of Health Policy, Management, and Evaluation, University of Toronto; 5Institute for Better Health, Trillium Health Partners, Mississauga, Ontario, Canada; 6Terrence Donnelly Center for Cellular and Biomolecular Research, University of Toronto; 7Department of Molecular Genetics, University of Toronto

Haoran Zhang

Identification and subsequent intervention of patients at risk of becoming High Cost Users (HCUs) presents the opportunity to improve outcomes while also providing significant savings for the healthcare system. In this paper, the 2016 HCU status of patients was predicted using free-form text data from the 2015 cumulative patient profiles within the electronic medical records of family care practices in Ontario. These unstructured notes make substantial use of domain-specific spellings and abbreviations; we show that word embeddings derived from the same context provide more informative features than pre-trained ones based on Wikipedia, MIMIC, and PubMed. We further demonstrate that a model using features derived from aggregated word embeddings (EmbEncode) provides a significant performance improvement over the bag-of-words representation (82.48±0.35% versus 81.85±0.36% held-out AUROC, p=3.2E-4), using far fewer input features (5,492 versus 214,750) and fewer non-zero coefficients (1,177 versus 4,284). The future HCUs of greatest interest are the transitional ones who are not already HCUs, because they provide the greatest scope for interventions. Predicting these new HCUs is challenging because most HCUs recur. We show that removing recurrent HCUs from the training set improves the ability of EmbEncode to predict new HCUs, while only slightly decreasing its ability to predict recurrent ones.

45

Artificial Intelligence for Enhancing Clinical Medicine

Obtaining dual-energy computed tomography (CT) information from a single-energy CT image for quantitative imaging analysis of living subjects by using deep learning

Wei Zhao1, Tianling Lv2, Rena Lee3, Yang Chen2, Lei Xing1

1Stanford University, 2Southeast University, 3Ewha Womans University

Lei Xing

Computed tomography (CT) is a fundamental imaging modality to generate cross-sectional views of internal anatomy in a living subject or interrogate material composition of an object, and it has been routinely used in clinical applications and nondestructive testing. In a standard CT image, pixels having the same Hounsfield Units (HU) can correspond to different materials, and it is therefore challenging to differentiate and quantify materials. Dual-energy CT (DECT) is desirable to differentiate multiple materials, but the costly DECT scanners are not as widely available as single-energy CT (SECT) scanners. Recent advancement in deep learning provides an enabling tool to map images between different modalities with incorporated prior knowledge. Here we develop a deep learning approach to perform DECT imaging by using the standard SECT data. The end point of the approach is a model capable of providing the high-energy CT image for a given input low-energy CT image. The feasibility of the deep learning-based DECT imaging method using SECT data is demonstrated using contrast-enhanced DECT images and evaluated using clinically relevant indexes. This work opens new opportunities for numerous DECT clinical applications with standard SECT data and may enable significantly simplified hardware design, scanning dose, and image cost reduction for future DECT systems.

46

INTRINSICALLY DISORDERED PROTEINS (IDPS) AND THEIR FUNCTIONS

PROCEEDINGS PAPERS WITH POSTER PRESENTATIONS

47

Intrinsically Disordered Proteins (IDPs) and Their Functions

Many-to-one binding by intrinsically disordered protein regions

Wei-Lun Alterovitz1*, Eshel Faraggi1,2,3*, Christopher J. Oldfield1, Jingwei Meng1, Bin Xue1, Fei Huang1, Pedro Romero1, Andrzej Kloczkowski2, Vladimir N. Uversky1, A. Keith Dunker1

1Center for Computational Biology and Bioinformatics, Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, 410 W. 10th St, HS 5000, Indianapolis, IN 46202, USA (kedunker@iupui.edu); 2Battelle Center for Mathematical Medicine, and the Nationwide Children’s Hospital, Department of Pediatrics, The Ohio State University, Columbus, OH 43210, USA; 3Research and Information Systems, LLC, 1620 E. 72nd St. Indianapolis, IN 46240 USA

*Contributed equally (weilun.hsu@gmail.com, efaaggi@gmail.com)

Keith Dunker

Disordered binding regions (DBRs), which are embedded within intrinsically disordered proteins or regions (IDPs or IDRs), enable IDPs or IDRs to mediate multiple protein-protein interactions. DBR-protein complexes were collected from the Protein Data Bank for which two or more DBRs having different amino acid sequences bind to the same (100% sequence identical) globular protein partner, a type of interaction herein called many-to-one binding. Two distinct binding profiles were identified: independent and overlapping. For the overlapping binding profiles, the distinct DBRs interact by means of almost identical binding sites (herein called “similar”), or the binding sites contain both common and divergent interaction residues (herein called “intersecting”). Further analysis of the sequence and structural differences among these three groups indicates how IDP flexibility allows different segments to adjust to similar, intersecting, and independent binding pockets.

48

MUTATIONAL SIGNATURES

PROCEEDINGS PAPERS WITH POSTER PRESENTATIONS

49

Mutational Signatures

Impact of mutational signatures on microRNA and their response elements

Eirini Stamoulakatou1, Pietro Pinoli1, Stefano Ceri1, Rosario Piro2

1Politecnico di Milano, 2Freie Universität Berlin

Eirini Stamoulakatou

MicroRNAs are a class of small non-coding RNA molecules with great importance for regulating a large number of diverse biological processes in health and disease, mostly by binding to complementary microRNA response elements (MREs) on protein-coding messenger RNAs and other non-coding RNAs and subsequently inducing their degradation. A growing body of evidence indicates that the dysregulation of certain microRNAs may either drive or suppress oncogenesis. The seed region of a microRNA is of crucial importance for its target recognition. Mutations in these seed regions may disrupt the binding of microRNAs to their target genes. In this study, we investigate the theoretical impact of cancer-associated mutagenic processes and their mutational signatures on microRNA seeds and their MREs. To our knowledge, this is the first study which provides a probabilistic framework for microRNA and MRE sequence alteration analysis based on mutational signatures and computationally assesses the disruptive impact of mutational signatures on human microRNA–target interactions.

50

Mutational Signatures

Genome Gerrymandering: optimal division of the genome into regions with cancer type specific differences in mutation rates

Adamo Young, Jacob Chmura, Yoonsik Park, Quaid Morris, Gurnit Atwal

University of Toronto

Adamo Young

The activity of mutational processes differs across the genome, and is influenced by chromatin state and spatial genome organization. At the scale of one megabase-pair (Mb), regional mutation density correlates strongly with chromatin features, and mutation density at this scale can be used to accurately identify cancer type. Here, we explore the relationship between genomic region and mutation rate by developing an information theory driven, dynamic programming algorithm for dividing the genome into regions with differing relative mutation rates between cancer types. Our algorithm improves mutual information when compared to the naive approach, effectively reducing the average number of mutations required to identify cancer type. Our approach provides an efficient method for associating regional mutation density with mutation labels, and has future applications in exploring the role of somatic mutations in a number of diseases.
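To make the idea of an information-theoretic dynamic program concrete, the sketch below partitions a sequence of fixed-width genomic bins into a given number of contiguous regions so as to maximize the mutual information between region and cancer type. This works as a dynamic program because, with the cancer-type marginals fixed, each region's contribution to the mutual information depends only on that region's own counts. It is a generic illustration under assumptions, not the authors' algorithm: the function name `segment_genome`, the input layout (`counts[bin][type]`), and the toy data are all hypothetical.

```python
from math import log2


def segment_genome(counts, n_segments):
    """Split bins into contiguous segments maximizing I(segment; cancer type).

    counts[b][c] is the mutation count in bin b for cancer type c.
    Returns (mutual information in bits, sorted exclusive end index of
    each segment).
    """
    n_bins, n_types = len(counts), len(counts[0])
    # prefix[b][c] = total count for type c over bins [0, b)
    prefix = [[0] * n_types]
    for row in counts:
        prefix.append([p + x for p, x in zip(prefix[-1], row)])
    type_totals = prefix[-1]
    total = sum(type_totals)

    def contribution(start, end):
        """Mutual-information term contributed by the segment [start, end)."""
        seg = [prefix[end][c] - prefix[start][c] for c in range(n_types)]
        seg_total = sum(seg)
        return sum(
            (n / total) * log2(n * total / (seg_total * type_totals[c]))
            for c, n in enumerate(seg)
            if n > 0
        )

    neg_inf = float("-inf")
    # best[k][end]: best score covering bins [0, end) with exactly k segments
    best = [[neg_inf] * (n_bins + 1) for _ in range(n_segments + 1)]
    back = [[0] * (n_bins + 1) for _ in range(n_segments + 1)]
    best[0][0] = 0.0
    for k in range(1, n_segments + 1):
        for end in range(k, n_bins + 1):
            for start in range(k - 1, end):
                if best[k - 1][start] == neg_inf:
                    continue
                score = best[k - 1][start] + contribution(start, end)
                if score > best[k][end]:
                    best[k][end] = score
                    back[k][end] = start

    # Walk the back-pointers to recover the segment end positions.
    boundaries, end = [], n_bins
    for k in range(n_segments, 0, -1):
        boundaries.append(end)
        end = back[k][end]
    return best[n_segments][n_bins], sorted(boundaries)


# Hypothetical toy data: 6 bins x 2 cancer types; the preference flips after bin 3.
toy_counts = [[9, 1], [8, 2], [9, 1], [1, 9], [2, 8], [1, 9]]
mi_one, _ = segment_genome(toy_counts, 1)
mi_two, ends = segment_genome(toy_counts, 2)
```

With a single segment the mutual information is zero, since the region carries no information about cancer type; allowing two segments places the cut where the relative mutation rates between the two types change.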

51

PATTERN RECOGNITION IN BIOMEDICAL DATA: CHALLENGES IN PUTTING BIG DATA TO WORK

PROCEEDINGS PAPERS WITH POSTER PRESENTATIONS

52

Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work

Learning a Latent Space of Highly Multidimensional Cancer Data

Benjamin Kompa1, Beau Coker2

1Harvard Medical School, 2Harvard School of Public Health

Benjamin Kompa

We introduce a Unified Disentanglement Network (UFDN) trained on The Cancer Genome Atlas (TCGA), which we refer to as UFDN-TCGA. We demonstrate that UFDN-TCGA learns a biologically relevant, low-dimensional latent space of high-dimensional gene expression data by applying our network to two classification tasks of cancer status and cancer type. UFDN-TCGA performs comparably to random forest methods. The UFDN allows for continuous, partial interpolation between distinct cancer types. Furthermore, we perform an analysis of differentially expressed genes between skin cutaneous melanoma (SKCM) samples and the same samples interpolated into glioblastoma (GBM). We demonstrate that our interpolations consist of relevant metagenes that recapitulate known glioblastoma mechanisms.

53

Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work

Scaling structural learning with NO-BEARS to infer causal transcriptome networks

Hao-Chih Lee1,3, Matteo Danieletto1,2,3, Riccardo Miotto1,2,3, Sarah T. Cherng1,3, Joel T. Dudley1,2,3

1Institute for Next Generation Healthcare, 2Hasso Plattner Institute for Digital Health, 3Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10065, USA

Hao-Chih Lee

Constructing gene regulatory networks is a critical step in revealing disease mechanisms from transcriptomic data. In this work, we present NO-BEARS, a novel algorithm for estimating gene regulatory networks. The NO-BEARS algorithm is built on the basis of the NO-TEARS algorithm with two improvements. First, we propose a new constraint and its fast approximation to reduce the computational cost of the NO-TEARS algorithm. Next, we introduce a polynomial regression loss to handle non-linearity in gene expressions. Our implementation utilizes modern GPU computation that can decrease the time of hours-long CPU computation to seconds. Using synthetic data, we demonstrate improved performance, both in processing time and accuracy, on inferring gene regulatory networks from gene expression data.

54

Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work

PathFlowAI: A High-Throughput Workflow for Preprocessing, Deep Learning and Interpretation in Digital Pathology

Joshua J. Levy1, Lucas A. Salas1, Brock C. Christensen1, Aravindhan Sriharan2, Louis J. Vaickus2

1Geisel School of Medicine at Dartmouth, 2Dartmouth Hitchcock Medical Center

Joshua Levy

The diagnosis of disease often requires analysis of a biopsy. Many diagnoses depend not only on the presence of certain features but on their location within the tissue. Recently, a number of deep learning diagnostic aids have been developed to classify digitized biopsy slides. Clinical workflows often involve processing of more than 500 slides per day. However, clinical use of deep learning diagnostic aids would require a preprocessing workflow that is cost-effective, flexible, scalable, rapid, interpretable, and transparent. Here, we present such a workflow, optimized using Dask and mixed precision training via APEX, capable of handling any patch-level or slide-level classification and prediction problem. The workflow uses a flexible and fast preprocessing and deep learning analytics pipeline, incorporates model interpretation and has a highly storage-efficient audit trail. We demonstrate the utility of this package on the analysis of a prototypical anatomic pathology specimen, liver biopsies for evaluation of hepatitis from a prospective cohort. The preliminary data indicate that PathFlowAI may become a cost-effective and time-efficient tool for clinical use of Artificial Intelligence (AI) algorithms.

55

PatternRecognitioninBiomedicalData:ChallengesinPuttingBigDatatoWork

Improvingsurvivalpredictionusinganovelfeatureselectionandfeaturereductionframeworkbasedontheintegrationofclinicalandmoleculardata*

LisaNeums,RichardMeier,DevinC.Koestler,JeffreyA.Thompson

DepartmentofBiostatisticsandDataScience,UniversityofKansasMedicalCenter,andUniversityofKansasCancerCenter

Lisa Neums

The accurate prediction of a cancer patient’s risk of progression or death can guide clinicians in the selection of treatment and help patients in planning personal affairs. Predictive models based on patient-level data represent a tool for determining risk. Ideally, predictive models will use multiple sources of data (e.g., clinical, demographic, molecular, etc.). However, there are many challenges associated with data integration, such as overfitting and redundant features. In this paper we aim to address those challenges through the development of a novel feature selection and feature reduction framework that can handle correlated data. Our method begins by computing a survival distance score for gene expression, which in combination with a score for clinical independence, results in the selection of highly predictive genes that are non-redundant with clinical features. The survival distance score is a measure of variation of gene expression over time, weighted by the variance of the gene expression over all patients. Selected genes, in combination with clinical data, are used to build a predictive model for survival. We benchmark our approach against commonly used methods, namely lasso- as well as ridge-penalized Cox proportional hazards models, using three publicly available cancer data sets: kidney cancer (521 samples), lung cancer (454 samples) and bladder cancer (335 samples). Across all data sets, our approach built on the training set outperformed the clinical data alone in the test set in terms of predictive power with a c-index of 0.773 vs 0.755 for kidney cancer, 0.695 vs 0.664 for lung cancer and 0.648 vs 0.636 for bladder cancer. Further, we were able to show increased predictive performance of our method compared to lasso-penalized models fit to both gene expression and clinical data, which had a c-index of 0.767, 0.677, and 0.645, as well as increased or comparable predictive power compared to ridge models, which had a c-index of 0.773, 0.668 and 0.650 for the kidney, lung, and bladder cancer data sets, respectively. Therefore, our score for clinical independence improves prognostic performance as compared to modeling approaches that do not consider combining non-redundant data. Future work will concentrate on optimizing the survival distance score in order to achieve improved results for all types of cancer.
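
The concordance index used to compare the models above measures how often a model ranks pairs of patients in the right order. The following is a minimal sketch of Harrell's concordance index in plain Python, offered only as an illustration and not as the authors' evaluation code. The function name, argument names, and the handling of tied follow-up times (such pairs are simply skipped) are assumptions made for brevity.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell's C: share of comparable patient pairs ranked correctly.

    times  -- observed follow-up time per patient
    events -- 1 if the event (e.g. death) was observed, 0 if censored
    risks  -- predicted risk score; higher means an earlier expected event
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Simplification (assumption): pairs with tied times are skipped.
        # A pair is comparable only if the shorter time is an observed event.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5  # ties in predicted risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: four patients, the third one censored.
toy_c = concordance_index(
    times=[2, 4, 6, 8],
    events=[1, 1, 0, 1],
    risks=[0.9, 0.7, 0.8, 0.1],
)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why differences in the second decimal place, such as those reported above, are commonly treated as meaningful.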

56

Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work

Bayesian semi-nonnegative matrix tri-factorization to identify pathways associated with cancer phenotypes

Sunho Park1, Nabhonil Kar1, Jae-Ho Cheong2, Tae Hyun Hwang1

1Cleveland Clinic, 2Yonsei University College of Medicine

Sunho Park

Accurate identification of pathways associated with cancer phenotypes (e.g., cancer subtypes and treatment outcome) could lead to discovering reliable prognostic and/or predictive biomarkers for better patient stratification and treatment guidance. In our previous work, we have shown that non-negative matrix tri-factorization (NMTF) can be successfully applied to identify pathways associated with specific cancer types or disease classes as a prognostic and predictive biomarker. However, one key limitation of non-negative factorization methods, including various non-negative bi-factorization methods, is their limited ability to handle negative input data. For example, many molecular data that consist of real values containing both positive and negative values (e.g., normalized/log-transformed gene expression data where a negative value represents down-regulated expression of genes) are not suitable input for these algorithms. In addition, most previous methods provide just a single point estimate and hence cannot deal with uncertainty effectively. To address these limitations, we propose a Bayesian semi-nonnegative matrix tri-factorization method to identify pathways associated with cancer phenotypes from a real-valued input matrix, e.g., gene expression values. Motivated by semi-nonnegative factorization, we allow one of the factor matrices, the centroid matrix, to be real-valued so that each centroid can express either the up- or down-regulation of the member genes in a pathway. In addition, we place structured spike-and-slab priors (which are encoded with the pathways and a gene-gene interaction (GGI) network) on the centroid matrix so that even a set of genes that is not initially contained in the pathways (due to the incompleteness of the current pathway database) can be involved in the factorization in a stochastic way, specifically if those genes are connected to the member genes of the pathways on the GGI network. We also present update rules for the posterior distributions in the framework of variational inference. As a full Bayesian method, our proposed method has several advantages over the current NMTF methods, which are demonstrated using synthetic datasets in experiments. Using The Cancer Genome Atlas (TCGA) gastric cancer and metastatic gastric cancer immunotherapy clinical-trial datasets, we show that our method could identify biologically and clinically relevant pathways associated with the molecular subtypes and immunotherapy response, respectively. Finally, we show that those pathways identified by the proposed method could be used as prognostic biomarkers to stratify patients with distinct survival outcome in two independent validation datasets. Additional information and code can be found at https://github.com/parks-cs-ccf/BayesianSNMTF.

57

Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work

Tree-Weighting for Multi-Study Ensemble Learners

Maya Ramchandran1, Prasad Patil1,2, Giovanni Parmigiani1,2

1Department of Biostatistics, Harvard T.H. Chan School of Public Health; 2Department of Data Sciences, Dana-Farber Cancer Institute

Maya Ramchandran

Multi-study learning uses multiple training studies, separately trains classifiers on each, and forms an ensemble with weights rewarding members with better cross-study prediction ability. This article considers novel weighting approaches for constructing tree-based ensemble learners in this setting. Using Random Forests as a single-study learner, we compare weighting each forest to form the ensemble, to extracting the individual trees trained by each Random Forest and weighting them directly. We find that incorporating multiple layers of ensembling in the training process by weighting trees increases the robustness of the resulting predictor. Furthermore, we explore how ensembling weights correspond to tree structure, to shed light on the features that determine whether weighting trees directly is advantageous. Finally, we apply our approach to genomic datasets and show that weighting trees improves upon the basic multi-study learning paradigm. Code and supplementary material are available at https://github.com/m-ramchandran/tree-weighting.
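
The idea of rewarding ensemble members for cross-study prediction ability can be sketched in a few lines. The example below is only an illustration under stated assumptions: the abstract does not specify the weighting scheme, so inverse mean squared error on the studies a member was not trained on is used here as a simple stand-in, and the function names and data layout are hypothetical. A "member" may be a whole forest or a single tree, so the same routine covers both granularities being compared.

```python
def cross_study_weights(members, studies):
    """Weight each member by how well it predicts studies it was NOT trained on.

    members -- list of (home_study_index, predict_fn) pairs; predict_fn maps
               one feature row to a numeric prediction
    studies -- list of (rows, targets) pairs, one per study (at least two)
    Returns a list of non-negative weights that sum to 1.
    """
    scores = []
    for home, predict in members:
        squared_errors = []
        for index, (rows, targets) in enumerate(studies):
            if index == home:
                continue  # only cross-study performance is rewarded
            squared_errors.extend(
                (predict(row) - target) ** 2 for row, target in zip(rows, targets)
            )
        mse = sum(squared_errors) / len(squared_errors)
        # Assumption: inverse-error weighting; the small constant avoids
        # division by zero for a member with perfect cross-study fit.
        scores.append(1.0 / (mse + 1e-12))
    total = sum(scores)
    return [score / total for score in scores]


def ensemble_predict(members, weights, row):
    """Weighted average of the members' predictions for one feature row."""
    return sum(w * predict(row) for w, (_, predict) in zip(weights, members))


# Hypothetical toy setup: two studies and two members, where the first
# member generalises to the other study and the second is biased by +1.
toy_studies = [([[1], [2]], [1, 2]), ([[3], [4]], [3, 4])]
toy_members = [(0, lambda row: row[0]), (1, lambda row: row[0] + 1)]
toy_weights = cross_study_weights(toy_members, toy_studies)
```

Passing individual trees instead of whole forests as `members` gives the finer-grained weighting, at the cost of many more weights to estimate.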

58

Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work

PTRExplorer: An approach to identify and explore Post Transcriptional Regulatory mechanisms using proteogenomics

Arunima Srivastava1, Michael Sharpnack1, Kun Huang2, Parag Mallick3, Raghu Machiraju1

1The Ohio State University, 2Indiana University School of Medicine, 3Stanford University

Arunima Srivastava

Integration of transcriptomic and proteomic data should reveal multi-layered regulatory processes governing cancer cell behaviors. Traditional correlation-based analyses have demonstrated limited ability to identify the post-transcriptional regulatory (PTR) processes that drive the non-linear relationship between transcript and protein abundances. In this work, we devise an integrative approach to explore the variety of post-transcriptional mechanisms that dictate relationships between genes and corresponding proteins. The proposed workflow utilizes the intuitive technique of scatterplot diagnostics, or scagnostics, to characterize and examine the diverse scatterplots built from transcript and protein abundances in a proteogenomic experiment. The workflow includes representing gene-protein relationships as scatterplots, clustering on geometric scagnostic features of these scatterplots, and finally identifying and grouping the potential gene-protein relationships according to their disposition to various PTR mechanisms. Our study verifies the efficacy of the implemented approach to uncover possible regulatory mechanisms by utilizing comprehensive tests on a synthetic dataset. We also propose a variety of 2D pattern-specific downstream analysis methodologies, such as mixture modeling and mapping miRNA post-transcriptional effects, to explore each mechanism further. This work suggests that the proposed methodology has the potential for discovering and categorizing post-transcriptional regulatory mechanisms manifesting in proteogenomic trends. These trends subsequently provide evidence for cancer specificity, miRNA targeting, and identification of regulation impacted by biological functionality and different types of degradation.

59

Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work

Network Representation of Large-Scale Heterogeneous RNA Sequences with Integration of Diverse Multi-omics, Interactions, and Annotations Data

Nhat Tran, Jean Gao

The University of Texas at Arlington

Jean Gao

Long non-coding RNA (lncRNA), microRNA, and messenger RNA enable key regulations of various biological processes through a variety of diverse interaction mechanisms. Identifying the interactions and cross-talk between these heterogeneous RNA classes is essential in order to uncover the functional role of individual RNA transcripts, especially for unannotated and sparsely discovered RNA sequences with no known interactions. Recently, sequence-based deep learning and network embedding methods have been gaining traction as high-performing and flexible approaches that can either predict RNA-RNA interactions from sequence or infer missing interactions from patterns that may exist in the network topology. However, most of the current methods have several limitations, e.g., the inability to perform inductive predictions, to distinguish the directionality of interactions, or to integrate various sequence, interaction, expression, and genomic annotation datasets. We propose a novel deep learning framework, rna2rna, which learns from RNA sequences to produce a low-dimensional embedding that preserves proximities in both the interaction topology and the functional affinity topology. In this proposed embedding space, the two-part "source and target contexts" capture the receptive fields of each RNA transcript to encapsulate heterogeneous cross-talk interactions between lncRNAs and microRNAs. The proximity between RNAs in this embedding space also uncovers the second-order relationships that allow for accurate inference of novel directed interactions or functional similarities between any two RNA sequences. In a prospective evaluation, our method exhibits superior performance compared to state-of-the-art approaches at predicting missing interactions from several RNA-RNA interaction databases. Additional results suggest that our proposed framework can capture a manifold for heterogeneous RNA sequences to discover novel functional annotations.

60

Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work

Hadoop and PySpark for reproducibility and scalability of genomic sequencing studies

Nicholas R. Wheeler1, Penelope Benchek1, Brian W. Kunkle2, Kara L. Hamilton-Nelson2, Mike Warfe1, Jeremy R. Fondran1, Jonathan L. Haines1, William S. Bush1

1Case Western Reserve University, 2University of Miami

William Bush

Modern genomic studies are rapidly growing in scale, and the analytical approaches used to analyze genomic data are increasing in complexity. Genomic data management poses logistic and computational challenges, and analyses are increasingly reliant on genomic annotation resources that create their own data management and versioning issues. As a result, genomic datasets are increasingly handled in ways that limit the rigor and reproducibility of many analyses. In this work, we examine the use of the Spark infrastructure for the management, access, and analysis of genomic data in comparison to traditional genomic workflows on typical cluster environments. We validate the framework by reproducing previously published results from the Alzheimer’s Disease Sequencing Project. Using the framework and analyses designed using Jupyter notebooks, Spark provides improved workflows, reduces user-driven data partitioning, and enhances the portability and reproducibility of distributed analyses required for large-scale genomic studies.

61

Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work

CERENKOV3: Clustering and molecular network-derived features improve computational prediction of functional noncoding SNPs

Yao Yao, Stephen A. Ramsey

Oregon State University

Yao Yao

Identification of causal noncoding single nucleotide polymorphisms (SNPs) is important for maximizing the knowledge dividend from human genome-wide association studies (GWAS). Recently, diverse machine learning-based methods have been used for functional SNP identification; however, this task remains a fundamental challenge in computational biology. We report CERENKOV3, a machine learning pipeline that leverages clustering-derived and molecular network-derived features to improve prediction accuracy of regulatory SNPs (rSNPs) in the context of post-GWAS analysis. The clustering-derived feature, locus size (number of SNPs in the locus), derives from our locus partitioning procedure and represents the sizes of clusters based on SNP locations. We generated two molecular network-derived features from representation learning on a network representing SNP-gene and gene-gene relations. Based on empirical studies using a ground-truth SNP dataset, CERENKOV3 significantly improves rSNP recognition performance in AUPRC, AUROC, and AVGRANK (a locus-wise rank-based measure of classification accuracy we previously proposed).
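
A locus-size feature of the kind described above can be illustrated with a simple position-based grouping. This is a minimal sketch and not the CERENKOV3 partitioning procedure, which the abstract does not detail: the rule used here (neighbouring SNPs on the same chromosome belong to one locus when they are at most `max_gap` base pairs apart), the threshold value, and all names are assumptions chosen for illustration.

```python
from collections import defaultdict


def locus_sizes(snps, max_gap=10_000):
    """Return {snp_id: number of SNPs in the same locus}.

    snps    -- iterable of (snp_id, chromosome, position) tuples
    max_gap -- hypothetical threshold: consecutive SNPs further apart than
               this many base pairs start a new locus
    """
    by_chromosome = defaultdict(list)
    for snp_id, chromosome, position in snps:
        by_chromosome[chromosome].append((position, snp_id))

    sizes = {}
    for entries in by_chromosome.values():
        entries.sort()  # walk along the chromosome in coordinate order
        locus = [entries[0]]
        for entry in entries[1:]:
            if entry[0] - locus[-1][0] > max_gap:
                # Gap too large: close the current locus and record its size.
                for _, snp_id in locus:
                    sizes[snp_id] = len(locus)
                locus = []
            locus.append(entry)
        for _, snp_id in locus:  # record the final locus on this chromosome
            sizes[snp_id] = len(locus)
    return sizes


# Hypothetical identifiers and coordinates, for illustration only.
toy_sizes = locus_sizes([
    ("snpA", "chr1", 100),
    ("snpB", "chr1", 5_000),
    ("snpC", "chr1", 50_000),
    ("snpD", "chr2", 200),
])
```

Each SNP then carries the size of its locus as one numeric feature alongside whatever other features the classifier uses.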

62

PRECISION MEDICINE: ADDRESSING THE CHALLENGES OF SHARING, ANALYSIS, AND PRIVACY AT SCALE

PROCEEDINGS PAPERS WITH POSTER PRESENTATIONS

63

Precision medicine: addressing the challenges of sharing, analysis, and privacy at scale

AnomiGAN: Generative Adversarial Networks for Anonymizing Private Medical Data

Ho Bae, Dahuin Jung, Hyun-Soo Choi, Sungroh Yoon

Seoul National University

Ho Bae

Typical personal medical data contains sensitive information about individuals. Storing or sharing the personal medical data is thus often risky. For example, a short DNA sequence can provide information that can identify not only an individual, but also his or her relatives. Nonetheless, most countries and researchers agree on the necessity of collecting personal medical data. This stems from the fact that medical data, including genomic data, are an indispensable resource for further research and development regarding disease prevention and treatment. To prevent personal medical data from being misused, techniques to reliably preserve sensitive information should be developed for real world applications. In this paper, we propose a framework called anonymized generative adversarial networks (AnomiGAN), to preserve the privacy of personal medical data, while also maintaining high prediction performance. We compared our method to state-of-the-art techniques and observed that our method preserves the same level of privacy as differential privacy (DP) and provides better prediction results. We also observed that there is a trade-off between privacy and prediction results that depends on the degree of preservation of the original data. Here, we provide a mathematical overview of our proposed model and demonstrate its validation using UCI machine learning repository datasets in order to highlight its utility in practice. The code is available at https://github.com/hobae/AnomiGAN/

64

Precision medicine: addressing the challenges of sharing, analysis, and privacy at scale

Frequency of ClinVar pathogenic variants in chronic kidney disease patients surveyed for return of research results at a Cleveland public hospital

Dana C. Crawford1,2,3, John Lin1, Jessica N. Cooke Bailey1,2, Tyler Kinzy1, John R. Sedor4,5, John F. O'Toole5, William S. Bush1,2,3

1Cleveland Institute for Computational Biology, 2Departments of Population and Quantitative Health Sciences, and 3Genetics and Genome Sciences, Case Western Reserve University; 4Department of Physiology and Biophysics, Case Western Reserve University; and 5Department of Nephrology and Hypertension, Glickman Urology and Kidney and Lerner Research Institute, Cleveland Clinic

Dana Crawford

Return of results is not common in research settings as standards are not yet in place for what to return, how to return, and to whom. As a pioneer of large-scale return of research results, the Precision Medicine Initiative Cohort, now known as All of Us, plans to return pharmacogenomic results and variants of clinical significance to its participants starting late 2019. To better understand the local landscape of possibilities regarding return of research results, we assessed the frequency of pathogenic variants and APOL1 renal risk variants in a small diverse cohort of chronic kidney disease (CKD) patients ascertained from a public hospital in Cleveland, Ohio, genotyped on the Illumina Infinium MegaEX. Of the 23,720 ClinVar-designated variants directly assayed by the MegaEX, 8,355 (35%) had at least one alternate allele in the 130 participants genotyped. Of these, 18 ClinVar variants deemed pathogenic by multiple submitters with no conflicts in interpretation were distributed across 27 participants. The majority of these pathogenic ClinVar variants (14/18) were associated with autosomal recessive disorders. Of note were four African American carriers of TTR rs76992529, associated with amyloidogenic transthyretin amyloidosis, otherwise known as familial transthyretin amyloidosis (FTA). FTA, an autosomal dominant disorder with variable penetrance, is more common among African-descent populations compared with European-descent populations. Also common in this CKD population were APOL1 renal risk alleles G1 (rs73885319) and G2 (rs71785313), with 60% of the study population carrying at least one renal risk allele. Both pathogenic ClinVar variants and APOL1 renal risk alleles were distributed among participants who wanted actionable genetic results returned, wanted genetic results returned regardless of actionability, and wanted no results returned. Results from this local genetic study highlight challenges in which variants to report, how to interpret them, and the participant’s potential for follow-up, only some of the challenges in return of research results likely facing larger studies such as All of Us.

65

Precision medicine: addressing the challenges of sharing, analysis, and privacy at scale

Network-Based Matching of Patients and Targeted Therapies for Precision Oncology

Qingzhi Liu1, Min Jin Ha2, Rupam Bhattacharyya1, Lana Garmire3, Veerabhadran Baladandayuthapani1

1Department of Biostatistics, University of Michigan; 2Department of Biostatistics, The University of Texas MD Anderson Cancer Center; 3Department of Computational Medicine and Bioinformatics, University of Michigan

Qingzhi Liu

The extensive acquisition of high-throughput molecular profiling data across model systems (human tumors and cancer cell lines) and drug sensitivity data makes precision oncology possible, allowing clinicians to match the right drug to the right patient. Current supervised models for drug sensitivity prediction often use cell lines as exemplars of patient tumors and for model training. However, these models are limited in their ability to accurately predict drug sensitivity of individual cancer patients to a large set of drugs, given the paucity of patient drug sensitivity data used for testing and high variability across different drugs. To address these challenges, we developed a multilayer network-based approach to impute individual patients’ responses to a large set of drugs. This approach considers the triplet of patients, cell lines and drugs as one inter-connected holistic system. We first use the omics profiles to construct a patient-cell line network and determine best matching cell lines for patient tumors based on robust measures of network similarity. Subsequently, these results are used to impute the “missing link” between each individual patient and each drug, called Personalized Imputed Drug Sensitivity Score (PIDS-Score), which can be construed as a measure of the therapeutic potential of a drug or therapy. We applied our method to two subtypes of lung cancer patients, matched these patients with cancer cell lines derived from 19 tissue types based on their functional proteomics profiles, and computed their PIDS-Scores to 251 drugs and experimental compounds. We identified the best representative cell lines that conserve lung cancer biology and molecular targets. The PIDS-Score based top sensitive drugs for the entire patient cohort as well as individual patients are highly related to lung cancer in terms of their targets, and their PIDS-Scores are significantly associated with patient clinical outcomes. These findings provide evidence that our method is useful to narrow the scope of possible effective patient-drug matchings for implementing evidence-based personalized medicine strategies.

66

Precision medicine: addressing the challenges of sharing, analysis, and privacy at scale

Phenome-wide association studies on cardiovascular health and fatty acids considering phenotype quality control practices for epidemiological data

Kristin Passero1, Xi He1, Jiayan Zhou1, Bertram Mueller-Myhsok2,3,4, Marcus E. Kleber5, Winfried Maerz5,6,7, Molly A. Hall1

1Penn State; 2Max Planck Institute of Psychiatry; 3Munich Cluster of Systems Biology; 4University of Liverpool; 5Heidelberg University; 6SYNLAB Academy; 7Medical University of Graz

Kristin Passero

Phenome-wide association studies (PheWAS) allow agnostic investigation of common genetic variants in relation to a variety of phenotypes, but preserving the power of PheWAS requires careful phenotypic quality control (QC) procedures. While QC of genetic data is well-defined, no established QC practices exist for multi-phenotypic data. Manually imposing sample size restrictions, identifying variable types/distributions, and locating problems such as missing data or outliers is arduous in large, multivariate datasets. In this paper, we perform two PheWAS on epidemiological data and, utilizing the novel software CLARITE (CLeaning to Analysis: Reproducibility-based Interface for Traits and Exposures), showcase a transparent and replicable phenome QC pipeline which we believe is a necessity for the field. Using data from the Ludwigshafen Risk and Cardiovascular (LURIC) Health Study we ran two PheWAS, one on cardiac-related diseases and the other on polyunsaturated fatty acid levels. These phenotypes underwent a stringent quality control screen and were regressed on a genome-wide sample of single nucleotide polymorphisms (SNPs). Seven SNPs were significant in association with dihomo-γ-linolenic acid, of which five were within fatty acid desaturases FADS1 and FADS2. PheWAS is a useful tool to elucidate the genetic architecture of complex disease phenotypes within a single experimental framework. However, to reduce computational and multiple-comparisons burden, careful assessment of phenotype quality and removal of low-quality data is prudent. Herein we perform two PheWAS while applying a detailed phenotype QC process, for which we provide a replicable pipeline that is modifiable for application to other large datasets with heterogeneous phenotypes. As investigation of complex traits continues beyond traditional genome-wide association studies (GWAS), such QC considerations and tools such as CLARITE are crucial to the analysis of non-genetic big data such as clinical measurements, lifestyle habits, and polygenic traits.

67

Precision medicine: addressing the challenges of sharing, analysis, and privacy at scale

aTEMPO: Pathway-Specific Temporal Anomalies for Precision Therapeutics

Christopher Michael Pietras, Liam Power, Donna K. Slonim

Tufts University

Christopher Pietras

Dynamic processes are inherently important in disease, and identifying disease-related disruptions of normal dynamic processes can provide information about individual patients. We have previously characterized individuals' disease states via pathway-based anomalies in expression data, and we have identified disease-correlated disruption of predictable dynamic patterns by modeling a virtual time series in static data. Here we combine the two approaches, using an anomaly detection model and virtual time series to identify anomalous temporal processes in specific disease states. We demonstrate that this approach can informatively characterize individual patients, suggesting personalized therapeutic approaches.

68

Precision medicine: addressing the challenges of sharing, analysis, and privacy at scale

Feature Selection and Dimension Reduction of Social Autism Data

Peter Washington1, Kelley Marie Paskov1, Haik Kalantarian1, Nathaniel Stockham1, Catalin Voss1, Aaron Kline1, Ritik Patnaik2, Brianna Chrisman1, Maya Varma1, Qandeel Tariq1, Kaitlyn Dunlap1, Jessey Schwartz1, Nick Haber1, Dennis P. Wall1

1Stanford University, 2Massachusetts Institute of Technology

Peter Washington

Autism Spectrum Disorder (ASD) is a complex neuropsychiatric condition with a highly heterogeneous phenotype. Following the work of Duda et al., which uses a reduced feature set from the Social Responsiveness Scale, Second Edition (SRS) to distinguish ASD from ADHD, we performed item-level question selection on answers to the SRS to determine whether ASD can be distinguished from non-ASD using a similarly small subset of questions. To explore feature redundancies between the SRS questions, we performed filter, wrapper, and embedded feature selection analyses. To explore the linearity of the SRS-related ASD phenotype, we then compressed the 65-question SRS into low-dimension representations using PCA, t-SNE, and a denoising autoencoder. We measured the performance of a multi-layer perceptron (MLP) classifier with the top-ranking questions as input. Classification using only the top-rated question resulted in an AUC of over 92% for SRS-derived diagnoses and an AUC of over 83% for dataset-specific diagnoses. High redundancy of features has implications for replacing the social behaviors that are targeted in behavioral diagnostics and interventions, where digital quantification of certain features may be obfuscated due to privacy concerns. We similarly evaluated the performance of an MLP classifier trained on the low-dimension representations of the SRS, finding that the denoising autoencoder achieved slightly higher performance than the PCA and t-SNE representations.

69

ARTIFICIAL INTELLIGENCE FOR ENHANCING CLINICAL MEDICINE

POSTER PRESENTATIONS

70

Artificial Intelligence for Enhancing Clinical Medicine

Prioritizing Copy Number Variants using Phenotype and Gene Functional Similarity

Azza Althagafi, Jun Chen, Robert Hoehndorf

Computer, Electrical & Mathematical Science and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), 4700 KAUST, 23955-6900, Thuwal, Kingdom of Saudi Arabia

Azza Althagafi

There are many types of genetic variation in the human genome, ranging from large chromosome anomalies to single nucleotide variants (SNVs). It is becoming necessary to develop methods for distinguishing disease-causing variants from the large number of neutral genetic variants in an individual. This problem is also relevant to Copy Number Variants (CNVs), which are a class of genetic variation where large segments of the genome differ in copy number amongst various individuals. Over the past several years, much progress has been made in the area of CNV detection and understanding their role in human diseases. We now understand that CNVs account for much of human variability. Correspondingly, there have been several methods introduced to find disease-associated genes and SNVs. Different methods have been developed for predicting and prioritizing pathogenicity of SNVs found within a genome. Constructing similar methods for CNVs is challenging due to the heterogeneity in variant size and type, and the possibility of multiple genes being affected by large CNVs. CNV impact prediction methods should consider these factors in order to robustly prioritize pathogenic variants. We have built a method that incorporates biological background knowledge about the relation between phenotypes resulting from a loss of function in mouse genes, gene functions as described using the Gene Ontology (GO), as well as the anatomical site of gene expression, along with a score that predicts the pathogenicity of CNVs (SVScore). We use this information to build a machine learning model that ranks CNVs based on their predicted pathogenicity and the relation between genes affected by the CNV and the phenotype we observe in affected individuals. Additionally, our approach considers several genomic features of each CNV, such as the length of the coding sequence overlapping with the CNV, haploinsufficiency and triplosensitivity scores to measure the dosage-sensitivity for genes/regions, and GC content. Our results show that incorporating this information leads to improvement over a baseline model which uses only similarity scores between gene-phenotype associations and disease-associated phenotypes, as well as improvement over using only pathogenicity prediction methods for CNVs. Our method achieves an F-score of 80.85%, with 82.05% precision and 79.67% recall in our evaluation set. The results demonstrate that incorporating phenotype, functional, and gene expression information may be utilized to identify causative CNVs. Future work is required to evaluate and improve our model using patient-derived WGS data.

71

Artificial Intelligence for Enhancing Clinical Medicine

Inferring the Reward Functions that Guide Cancer Progression

John Kalantari1, Heidi Nelson2, Nicholas Chia3

1Microbiome Program, Center for Individualized Medicine, Mayo Clinic, Rochester, MN, USA; 2Colon and Rectal Surgery, Mayo Clinic, Rochester, MN, USA; 3Division of Surgical Research, Department of Surgery, Mayo Clinic, Rochester, MN, USA

John Kalantari

Cancer can occur in patients with different genetic backgrounds via a multi-step evolutionary process, i.e., driven by modification and selection, that can accumulate different genetic alterations. Despite these differences, many cancer subtypes are unified by similar mechanisms or types of genetic changes. In other words, there are multiple etiological paths tied together by specific events that share commonality in their causal mechanism. Understanding these common mechanisms will enable the development of better therapies and preventative measures. It will also enable improved prediction of recurrence and metastatic advancement of cancer, directly impacting the 606,880 annual cancer deaths in the United States alone. Our work is built upon the central proposition that the Markov Decision Process (MDP) can better represent the process by which cancer arises and progresses. More specifically, by encoding a cancer cell's complex behavior as an MDP, we seek to model the series of genetic changes, or evolutionary trajectory, that leads to cancer as an optimal decision process. We posit that using an Inverse Reinforcement Learning (IRL) approach will enable us to reverse engineer an optimal policy and reward function based on a set of expert demonstrations extracted from the DNA of patient tumors. The inferred reward function and optimal policy can subsequently be used to extrapolate the evolutionary trajectory of any tumor. We introduce a novel data-agnostic artificial intelligence framework which can infer reward functions describing the causal mechanisms that best explain the observed behavior of an 'optimally-behaving agent', the cancer cell. Using multi-omic data from 27 colorectal cancer (CRC) patients as proof-of-principle, we show that IRL provides a systematic and scalable approach to formally stating and solving the problem of cancer evolution. By providing a lineage path (i.e., sequences of alterations) obtained via subclonal reconstruction for each tumor, we are able to reduce this complex problem to the recovery of an associated reinforcement learning reward function. These reward functions have the potential to model unknown molecular mechanisms driving intratumor heterogeneity and to elucidate cancer etiologies.

72

Artificial Intelligence for Enhancing Clinical Medicine

Predicting disease-associated mutation of metal-binding sites in proteins using a deep learning approach

Mohamad Koohi-Moghadam, Haibo Wang, Yuchuan Wang, Xinming Yang, Hongyan Li, Junwen Wang, Hongzhe Sun

Department of Chemistry, The University of Hong Kong, Hong Kong, China;

Department of Health Sciences, Mayo Clinic, Scottsdale, AZ, USA; Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic, Scottsdale, AZ, USA; Center for Individualized Medicine, Mayo Clinic, Scottsdale, AZ, USA;

College of Health Solutions, Arizona State University, Scottsdale, AZ, USA

Junwen Wang

Metalloproteins play important roles in many biological processes. Mutations at the metal-binding sites may functionally disrupt metalloproteins, initiating severe diseases; however, until now there has been no effective approach to predict such mutations. Here we develop a deep learning approach to successfully predict disease-associated mutations that occur at the metal-binding sites of metalloproteins. We generate energy-based affinity grid maps and physicochemical features of the metal binding pockets (obtained from different databases as spatial and sequential features) and subsequently implement these features into a multichannel convolutional neural network. After training the model, the network can successfully predict disease-associated mutations that occur at the first and second coordination spheres of zinc-binding sites with an area under the curve of 0.90 and an accuracy of 0.82. Our approach is the first deep learning approach for the prediction of disease-associated metal-relevant site mutations in metalloproteins, providing a new platform to tackle human diseases.

73

Artificial Intelligence for Enhancing Clinical Medicine

GENERAL

POSTER PRESENTATIONS

74

General

Ranking RAS pathway mutations using evolutionary history of MEK1

Katia Andrianova, Igor Jouline

Ohio State University, Department of Microbiology, Columbus, Ohio 43210

Katia Andrianova

The Ras/MAPK (rat sarcoma/mitogen-activated protein kinase) signaling pathway is involved in essentially all aspects of organismal development, from the first cell divisions in the early embryo to postnatal development and growth. Given its critical function, it is not surprising that deregulated Ras/MAPK signaling, resulting from either genetic or environmental perturbations, can lead to cancer and developmental abnormalities. A large class of such abnormalities, known as RASopathies, is associated with activating germ-line mutations in many components of the Ras pathway. Over the past decade, as next generation sequencing (NGS) has become a valuable and cost-effective tool for research applications and clinical diagnostics of Mendelian diseases, simultaneous sequencing of multiple genes in MAPK signaling pathways has yielded many reports with hundreds of mutations possibly associated with RASopathies and cancer. In particular, multiple new mutations were identified in MEK1 kinase. The majority of newly discovered coding variations have neither been described in other individuals nor been studied or functionally analyzed in cellular or animal models, thus leaving clinicians to rely on in silico predictions of the consequences of “variants of uncertain significance” with computational software, such as PolyPhen and SIFT. Automated sequence searches used in these methods do not distinguish possible duplication events in the genes’ histories, hence multiple sequence alignment (MSA) sets usually include both ortholog and paralog copies. As purifying selection acts on one of the duplicate copies, it can become associated with a different phenotype compared to its paralogous sibling and/or to the parental gene. In most cases of Mendelian diseases only one specific duplicate of the gene in the human genome turns out to be associated with a disease. This indicates the importance of considering both common ancestors and any gene’s duplication history for variant interpretation. The presence of seven human MEK proteins increases the chances of including paralogs in the analysis, and therefore substantially limits mutation interpretation. In this study we established the first precise description of an evolutionary history of MEK kinases and identified potential duplication events. We determined that MEK1 is an ancestor of the entire MEK family. In-depth analysis of the orthologous proteins showed that essentially all experimentally proven pathogenic mutations were predicted as “damaging” by our approach. By comparing our results with the predictions made by PolyPhen-2 and SIFT we showed how careful analysis of an evolutionary history of a gene may improve the accuracy of missense mutation outcome prediction.

75

General

Integrative Analysis of COPD and Lung Cancer Metadata Reveals Shared Alterations in Immune Response, PTEN and PI3K-AKT Pathways

Dannielle Skander1, Arda Durmaz1, Mohammed Orloff2, Gurkan Bebek1

1Case Western Reserve University, 2University of Arkansas for Medical Sciences

Gurkan Bebek

Chronic obstructive pulmonary disease (COPD) and lung cancer are among the leading causes of death worldwide. While it is believed the two diseases are related, the mechanisms behind this relationship remain unclear. We investigate the relationship between COPD and lung cancer using an integrative-omics approach. Integration of epigenetic and mRNA gene expression data allows us to discover the functionally relevant genes, i.e., the genes crucial for disease development. Using this approach, our study suggests that the mechanisms driving the development of both diseases are related to the interleukin immune response (IL4 and IL17), PTEN and PI3K-AKT pathways. Understanding this relationship between COPD and lung cancer is crucial for future prevention and treatment options of both COPD and lung cancer.

76

General

Investigating sources of irreproducibility in analysis of gene expression data

Carly A. Bobak, Jane E. Hill

Dartmouth College

Carly Bobak

The use of big data promises to change the landscape of biomedical research; however, irreproducibility of results remains a problem. In this work, we set out to investigate proposed methods to increase reproducibility of gene expression results. Specifically, we test the following three hypotheses: (1) results from pathway enrichment will be more similar across datasets than results on differentially expressed (DE) genes; (2) similarity across smaller datasets will be lower than similarity in larger datasets; and (3) results from multi-cohort data will be more similar than results from single-cohort data. We selected three unique datasets from the Gene Expression Omnibus that include active TB patients, spanning pediatric and adult patients. In each dataset we ranked DE genes as they were associated with TB vs other (healthy controls, other diseases, or latent tuberculosis infection). We then calculated the rank biased overlap (RBO) of the ranked genes across each dataset. RBO is a similarity measure scaled between 0 and 1 and can be interpreted as the average agreement between two lists. Gene set enrichment analysis (GSEA) was performed, and we calculated a rank for the pathway hits and compared RBO for associated pathways between datasets. On average, the RBO increased by a fold change of 1.83×10^4 when comparing similarity of associated pathways to similarity of DE genes. We then divided each dataset in half and repeated the analysis on all sub-datasets. Sub-datasets from the same parent dataset had similar results (mean RBO of 0.60, sd=0.24) as opposed to subsets from a different parent dataset (mean=0.10, sd=0.15). Contradicting our original hypothesis, overall RBO calculated between subsets from different parent datasets did not necessarily decrease compared to the initial RBO calculation; in fact, half of the RBO comparisons increased in the sub-datasets compared to using the whole datasets. To test the final hypothesis, we co-normalized, merged, and then randomly divided datasets into three approximately equal pieces. We repeated the DE analysis on each piece of the merged dataset. Across mixed datasets, the mean RBO was 0.023 (sd=0.43). Heterogeneous datasets were more alike than unique datasets, but less alike than a single divided dataset. However, the RBOs from mixed datasets compared to original datasets were not statistically significantly different from the RBOs comparing results from the original datasets. Thus, we demonstrated that associated pathways are greatly more reproducible than associated genes. Further study is necessary to investigate the conditions under which statistical power and heterogeneity of data influence reproducibility of findings from gene expression studies.
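
As an illustration of the similarity measure named in the abstract, the following is a minimal sketch of rank biased overlap truncated at the depth of the shorter list. The function name `rbo_truncated` and the persistence parameter `p` are assumptions made for this example; the abstract does not state which RBO variant, depth, or value of `p` the authors used. This unextrapolated form is a lower bound, so two identical finite lists score 1 - p^k rather than exactly 1.

```python
def rbo_truncated(ranking_a, ranking_b, p=0.9):
    """Truncated rank biased overlap of two ranked lists (no extrapolation).

    Computes (1 - p) * sum_{d=1..k} p**(d-1) * A_d, where A_d is the
    fraction of items shared by the top-d prefixes of both lists and k is
    the length of the shorter list. Items are assumed unique within a list.
    The default p=0.9 is an illustrative choice, not a value from the abstract.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    depth = min(len(ranking_a), len(ranking_b))
    score = 0.0
    for d in range(1, depth + 1):
        # Agreement at depth d: size of the prefix intersection divided by d.
        shared = len(set(ranking_a[:d]) & set(ranking_b[:d]))
        score += p ** (d - 1) * shared / d
    return (1.0 - p) * score


if __name__ == "__main__":
    genes_1 = ["GBP5", "BATF2", "FCGR1A", "ANKRD22"]  # hypothetical rankings
    genes_2 = ["BATF2", "GBP5", "SEPT4", "FCGR1A"]
    print(rbo_truncated(genes_1, genes_2, p=0.9))
```

Smaller values of `p` put more weight on agreement at the very top of the two rankings, which is why the choice of `p` matters when comparing long gene lists with short pathway lists.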

77

General

Ethereum and MultiChain blockchains as secure tools for individualized medicine

Charlotte Brannon, Gamze Gursoy, Sarah Wagner, Mark Gerstein

Yale University Computational Biology and Bioinformatics Program

Charlotte Brannon

With the rapidly decreasing cost of genome sequencing and advent of individualized medicine, reliance on individual genomic data will soon be integral to medical treatment decisions. For example, a patient’s personal genomic sequence will provide physicians with information on which to base tests and diagnoses. Similarly, pharmacogenomics data will reveal the most effective prescriptions for a particular patient. Genomic data will need to be shared efficiently among multiple parties. However, because these are sensitive personal data which will directly impact medical treatment decisions, they must be maintained in a secure, high-integrity fashion. Blockchain technology is one way to achieve secure, high-integrity data storage. We present two proof-of-concept solutions, one for storing and querying personal genomic sequence data in a MultiChain blockchain designed for direct sharing with physicians; and one for storing and querying gene-drug interaction data in an Ethereum blockchain smart contract designed for shared access among permissioned researchers and physicians. Despite the high security and integrity that comes with blockchain data storage, there is a trade-off with data access efficiency and storage costs. We overcome these challenges by developing novel storage techniques. When storing personal genomic sequence data, we do not store the actual sequence data but rather a set of meta-data which can be used in combination with a reference genome to reconstruct the original sequences. When storing pharmacogenomics data, we use an index-based, multi-mapping approach to provide time- and space-efficient insertion and querying.
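
The idea of storing meta-data instead of the sequence itself can be sketched as follows. This is only an illustration of reference-based reconstruction under simplifying assumptions (equal-length sequences, substitutions only); the functions `encode_differences` and `reconstruct` are hypothetical and do not represent the authors' actual meta-data scheme or any MultiChain API, which the abstract does not describe.

```python
def encode_differences(reference, sample):
    """Return the (position, base) pairs where sample differs from reference.

    Simplifying assumption for this sketch: both sequences have the same
    length, so only substitutions are represented (no insertions/deletions).
    """
    if len(reference) != len(sample):
        raise ValueError("this sketch only handles equal-length sequences")
    return [(i, s) for i, (r, s) in enumerate(zip(reference, sample)) if r != s]


def reconstruct(reference, differences):
    """Rebuild the sample sequence from the reference plus stored differences."""
    bases = list(reference)
    for position, base in differences:
        bases[position] = base
    return "".join(bases)


if __name__ == "__main__":
    reference = "ACGTACGT"
    sample = "ACGAACGG"
    stored = encode_differences(reference, sample)  # compact record to store
    assert reconstruct(reference, stored) == sample
    print(stored)
```

Only the short list of differences would need to be written to the chain, which is the kind of storage saving the abstract alludes to; the reference genome itself stays off-chain.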

78

General

Genomic predictors of L-asparaginase-induced pancreatitis in pediatric cancer patients

Britt I. Drögemöller, Galen E.B. Wright, Shahrad R. Rassekh, Shinya Ito, Bruce C. Carleton, Colin J.D. Ross, The Canadian Pharmacogenomics Network for Drug Safety Consortium

Faculty of Pharmaceutical Sciences, University of British Columbia, Vancouver, BC, Canada; BC Children’s Hospital Research Institute, University of British Columbia, Vancouver, BC, Canada; Department of Pediatrics, Faculty of Medicine, University of British Columbia, Vancouver, BC, Canada; Clinical Pharmacology and Toxicology, The Hospital for Sick Children, University of Toronto, Toronto, ON, Canada; Pharmaceutical Outcomes Programme, BC Children’s Hospital, Vancouver, BC, Canada

Britt Drögemöller

Background: L-asparaginase is highly effective in the treatment of pediatric acute lymphoblastic leukemia. Unfortunately, the use of this treatment is limited by the occurrence of pancreatitis, a severe and potentially lethal adverse drug reaction, which occurs in 2-18% of patients. As previous studies have been unable to identify strong associations between clinical variables and susceptibility to L-asparaginase-induced pancreatitis, genetic factors are expected to play an important role in this adverse drug reaction. Objectives: We sought to explore the role of these genetic susceptibility factors in L-asparaginase-induced pancreatitis in pediatric cancer patients. Methods: Patients who were treated with L-asparaginase were recruited from 13 pediatric oncology units across Canada (n=284) and extensive clinical data were collected for all patients. Genotyping was performed using the Illumina HumanOmniExpress and Global Screening Arrays, and pancreatic gene expression profiles were imputed in these individuals using GTEx v7 and S-PrediXcan. Genome- and transcriptome-wide association studies (GWAS and TWAS) were performed to identify associations with L-asparaginase-induced pancreatitis. Results: GWAS analyses identified significant associations between genetic variants in HLA-DQA1 and -DRB1 and pancreatitis, while TWAS revealed that individuals experiencing L-asparaginase-induced pancreatitis exhibited lower expression levels of HLA-DRB5. Further interrogation of the TWAS data revealed an enrichment in genes involved in the somatic diversification of immune receptors. Conclusions: These analyses uncovered an association between genetic variation in immune-related genes and the development of L-asparaginase-induced pancreatitis. These associations mirror previous associations between the HLA region and (i) pancreatitis induced by other drugs and (ii) L-asparaginase-induced hypersensitivity.

79

General

NITECAP: A novel method and interface for the identification of circadian behavior in highly parallel time-course data

Thomas G. Brooks1, Cris W. Lawrence1, Nicholas F. Lahens1, Soumyashant Nayak1, Dimitra Sarantopoulou1, Garret A. FitzGerald1,2, Gregory R. Grant3

1Institute for Translational Medicine and Therapeutics (ITMAT), University of Pennsylvania; 2Systems Pharmacology and Translational Therapeutics; 3Department of Genetics, University of Pennsylvania

Thomas Brooks

We introduce a new tool called NITECAP for the task of identifying circadian behavior in massively parallel measurements of biological entities; for example, finding circadian genes from gene expression time course data measured by RNA-Seq or microarrays. NITECAP employs a permutation-based approach which uses a novel statistic designed to be sensitive to circadian behavior. NITECAP also uses an approach to multiple-testing which produces q-values directly without needing to first generate p-values which then need to be adjusted. Our approach has several advantages, particularly when individual p-values are underpowered or unreliable. Importantly, we have developed an intuitive, user-friendly web-based interface which enables investigators to perform robust circadian analyses of this type directly without expert informatics support. Users can quickly scroll through time course profiles sorted by effect size, greatly facilitating the choice of significance thresholds that currently require making blind choices of numerical cutoffs. Putting this type of analysis in the hands of the investigators can significantly streamline their research. The website also enables the other standard significance tests such as JTK and ANOVA and provides tools to perform comparative studies, such as finding phase or amplitude differences between different conditions. NITECAP is freely available for public use at: http://www.nitecap.org

80

General

The Interplay of Obesity and Race/Ethnicity on Major Perinatal Complications

Yaadira Brown, MPH1; Olubode A. Olufajo, MD, MPH2; Edward E. Cornwell III, MD2; William Southerland, PhD3

1Research Centers in Minority Institutions: Howard University, Howard University College of Medicine; 2Research Centers in Minority Institutions: Howard University, Clive Callender Howard-Harvard Health Sciences Outcomes Research Center; 3Research Centers in Minority Institutions: Howard University

Yaadira Brown

Background: It has been established that a significant disparity exists in the rates of adverse perinatal outcomes across different racial/ethnic groups, with non-Hispanic Black women generally being most impacted. There is also evidence that obesity is associated with adverse perinatal outcomes. Although some studies have examined the impact of race/ethnicity and obesity on adverse perinatal outcomes, most studies have done so using local or statewide data. This study aims to use a national sample to determine the role of obesity in the racial/ethnic disparities seen in adverse perinatal outcomes in the United States. Methods: Data from the National Inpatient Sample was utilized in selecting pregnant women admitted for delivery between 2010 and 2014. Demographics (race/ethnicity, insurance type, household income, co-morbidities) and hospital characteristics were extracted. Race/ethnicity was categorized as Non-Hispanic Whites (NHW), Non-Hispanic Blacks (NHB), and Hispanics. Outcomes of interest were gestational diabetes, pre-eclampsia, pre-term birth, and hospital mortality. Multivariate logistic regressions were performed to determine the independent predictors of the outcomes, using two sets of models: one which included obesity as a variable in the model and one which did not. The differences between the two sets of models were compared by performing the Wald Test. Results: Our cohort consisted of 15,561,942 pregnant individuals admitted for delivery. There were 9,247,729 (59.43%) NHW, 2,552,569 (16.4%) NHB, and 3,761,644 (24.17%) Hispanic. Compared to other groups, NHB had significantly higher rates of pre-eclampsia (5.1%), pre-term birth (9.4%), and hospital mortality (0.11%). They also had the highest rates of obesity (9.0%). On multivariate analysis, NHB were more likely to have pre-eclampsia (Adjusted Odds Ratio [aOR] 1.26; 95% Confidence Interval [CI] 1.23-1.29), pre-term birth (aOR 1.38; 95% CI 1.34-1.41), and hospital mortality (aOR 2.05; 95% CI 1.2-3.38) when compared to NHW. However, they had a similar risk for gestational diabetes (aOR 0.94; 95% CI 0.91-0.96) as NHW. Obesity was significantly associated with gestational diabetes (aOR 3.08; 95% CI 3.02-3.15), pre-eclampsia (aOR 2.14; 95% CI 2.09-2.19), and pre-term birth (aOR 1.04; 95% CI 1.01-1.06). Although the differences were minimal, the regression models that included obesity as a variable better predicted the outcomes than those that did not when assessing gestational diabetes, pre-eclampsia, and pre-term birth. Conclusion: These findings further confirm that racial/ethnic disparities exist amongst adverse perinatal outcomes, with NHB being disproportionately affected. They also suggest that obesity plays a significant role in the racial/ethnic disparities that do exist for the adverse perinatal outcomes measured, other than hospital mortality. These data suggest that addressing obesity in the population may be beneficial in improving perinatal outcomes, but they also suggest that more research is needed to identify the major factors that drive the racial/ethnic disparities that exist amongst perinatal outcomes in the United States.

81

General

A Comparison of Pharmacogenomic Information in FDA-Approved Drug Labels and CPIC Guidelines

Katherine I. Carrillo1, Teri E. Klein2

1Henry M. Gunn High School, Palo Alto, CA; 2Stanford University, Stanford, CA

Katherine Carrillo

Pharmacogenomics (PGx) is useful in helping to predict a patient’s likely reaction to a medication based on their genotype, allowing for personalized medicine. The FDA maintains a “Table of Pharmacogenomic Biomarkers in Drug Labeling” (https://www.fda.gov/drugs/science-and-research-drugs/table-pharmacogenomic-biomarkers-drug-labeling) consisting of pharmacogenomic information found in the drug labeling. However, many labels on the list do not contain advice for a clinician about how or when to use a patient’s genetic information. Guidelines created by the Clinical Pharmacogenetics Implementation Consortium (CPIC; https://cpicpgx.org/) contain information about how to use patient genetic information when prescribing drugs. Also, CPIC provides guidelines for some drugs not currently on the FDA biomarker list, though it does not provide guidelines for every drug on the biomarker list. Using PharmGKB-annotated FDA-approved labels (through October 2019), we evaluated label information to determine (1) which labels contained any kind of prescribing information, including a suggested alternate drug, dosing information or special considerations based on the patient’s genotype/metabolizer status, (2) which PharmGKB-annotated labels were present on the FDA biomarker list, and (3) what genes were involved. We did not include FDA labels annotated for genetic variation in cancer cells; only germline variation was included. We compared all available CPIC guideline recommendations to the information from the labels and identified where the labels and guidelines are similar and where they differ. PharmGKB has 223 annotations (not including 82 annotations for cancer cell DNA variation) based on 219 FDA-approved drug labels. Of these, 199 labels are currently on the biomarker list and 17 were on the biomarker list at one time but have been removed by the FDA. Twenty labels have dosing information and 35 recommend an alternate drug based on genotype/metabolizer phenotype. Another 34 labels have some other special consideration, but most labels on the biomarker list (136) have no guidance for clinicians about what to do about the biomarker, if anything. There are 45 drugs with published CPIC guidelines (https://cpicpgx.org/genes-drugs/). Thirty-six of the drugs have a label on the FDA biomarker list, but the information on the label does not always match the guideline. Only 21 of the CPIC drugs have labels with guidance. For some drugs, the PGx information on the labels is similar to the CPIC guidelines, but it is different for many others. The FDA biomarker list has more drugs than there are CPIC guidelines written, and in some cases the labels tell clinicians when they should test a patient while CPIC does not address testing. However, for most drugs, the labels do not give clinicians much information about what to do with their patients’ genetic test results. For the drugs with CPIC guidelines, there is more information about how to use genetic test results and why. Funded by NIH/NIGMS R24 GM61374.

82

General

xTEA: a transposable element insertion analyzer for genome sequencing data from multiple technologies

Chong Chu1, Rebeca Monroy2, Soohyun Lee1, E. Alice Lee2, Peter J. Park1

1Harvard Medical School, 2Boston Children's Hospital

E. Alice Lee

Transposable elements (TEs) comprise nearly 50% of the human genome. Although most of the TEs are now silent, several types of retrotransposons including LINE-1, Alu, and SVA are still active. Somatic TE insertions have been shown to occur frequently in multiple tumor types [1,2] and at a low rate in neurons of phenotypically normal individuals [3]. Multiple tools have been developed to call TE insertions from genome sequencing data, but an efficient tool that can identify both germline and somatic TE insertions with high sensitivity and specificity is still lacking. Moreover, newer technologies such as 10X Linked-Read and PacBio or Nanopore long read sequencing provide an unprecedented opportunity to study TEs; however, current methods do not take advantage of these data types. Here, we present a new computational tool xTEA, building on our previous algorithm TEA [1]. This tool identifies TE insertions from Illumina paired-end reads, 10X Linked-Reads, long reads, or a combined dataset. xTEA outperforms MELT [4] and Traffic-mem [5] on normal and tumor Illumina data, respectively. A comparison of different sequencing platforms reveals that the analysis of long reads had greater sensitivity and specificity, especially in repetitive regions. Both 10X Linked-Reads and long reads demonstrated clear advantages over short reads in constructing full length TE insertions. Better performance was achieved on hybrid data compared to single platform data. Using 22 human samples with either PacBio or Nanopore long reads and matched short reads, we uncovered LINE-1 internal SV hotspots and SVA internal VNTR expansion. xTEA is a comprehensive cross-platform TE insertion-calling tool. It can be deployed on a computing cluster, AWS, and Google Cloud, and is efficient for large cohort analysis. xTEA is publicly available at https://github.com/parklab/xTEA.

References
[1] Lee, Eunjung, et al. "Landscape of somatic retrotransposition in human cancers." Science 337.6097 (2012): 967-971.
[2] Rodriguez-Martin, Bernardo, et al. "Pan-cancer analysis of whole genomes reveals driver rearrangements promoted by LINE-1 retrotransposition in human tumours." BioRxiv (2017): 179705.
[3] Evrony, Gilad D., et al. "Cell lineage analysis in human brain using endogenous retroelements." Neuron 85.1 (2015): 49-59.
[4] Gardner, Eugene J., et al. "The Mobile Element Locator Tool (MELT): population-scale mobile element discovery and biology." Genome Research 27.11 (2017): 1916-1929.
[5] Tubio, Jose MC, et al. "Extensive transduction of nonrepetitive DNA mediated by L1 retrotransposition in cancer genomes." Science 345.6196 (2014): 1251343.

83

General

Go Get Data (GGD): simple, reproducible access to scientific data

Michael Cormier1, Jon Belyeu1, Brent Pedersen1, Joe Brown1, Johannes Koster2, Aaron R. Quinlan1

1Department of Human Genetics, University of Utah, Salt Lake City, UT, USA; 2Algorithms for reproducible bioinformatics, Institute of Human Genetics, University of Duisburg-Essen, Essen, NRW, Germany

Aaron Quinlan

Genomics research is complicated by the difficulty of identifying, collecting, and integrating the numerous datasets and annotations germane to our experiments. Furthermore, these data exist in disparate sources, and are stored in diverse, often abused formats pertaining to different genome builds. These complexities waste time, inhibit reproducibility, and curtail research creativity. Inspired by the success of software package managers, we have developed Go Get Data (GGD; https://gogetdata.github.io/) as a fast, reproducible approach to install standardized packages of data and annotations for genomics research.

84

General

Global epigenomic regulation of gene expression and cellular proliferation in T-cell leukemia

Sinisa Dovat, Yali Ding, Bo Zhang, Jonathon L. Payne, Feng Yue

Pennsylvania State University College of Medicine, Hershey, PA, USA

Sinisa Dovat

Ikaros encodes a DNA-binding protein that functions as a tumor suppressor in T-cell acute lymphoblastic leukemia (T-ALL). Deletion and/or functional inactivation of Ikaros results in the development of high-risk leukemia. The mechanisms through which Ikaros regulates gene expression and tumor suppression in T-ALL are unknown. Ikaros haplo-knockout mice develop T-ALL with 100% penetrance with arrest of T-cell differentiation. During the process of malignant transformation to T-ALL, Ikaros haploinsufficient thymocytes lose their remaining wild type Ikaros allele. Re-introduction of Ikaros into Ikaros-null T-ALL cells results in cessation of cellular proliferation and induction of T-cell differentiation. Thus, this is an optimal system for studying Ikaros tumor suppressor function because it captures the role of Ikaros in the transition from a malignant state (Ikaros-null T-ALL) to a non-malignant state (following Ikaros re-introduction). We used ATAC-seq and ChIP-seq of H3K4me1, H3K4me3, H3K27ac, and Ikaros to perform dynamic, global epigenomic and gene expression analyses at several time points in Ikaros-null T-ALL and following Ikaros re-introduction in order to determine the mechanisms of Ikaros’ tumor suppressor activity. Expression analysis identified a large number of novel signaling pathways that are directly regulated by Ikaros and Ikaros-induced enhancers, and that are responsible for the cessation of proliferation and induction of T-cell differentiation in T-ALL cells. Epigenomic analysis identified novel Ikaros functions in the epigenetic regulation of gene expression: Ikaros directly regulates de novo formation and depletion of enhancers, de novo formation of active enhancers, and activation of poised enhancers; and Ikaros directly induces the formation of super-enhancers. Global analysis of chromatin accessibility revealed that Ikaros binding resulted in the opening of over 3400 previously inaccessible chromatin sites. This is accompanied by de novo enrichment of H3K4me1 and H3K4me3 modifications and formation of de novo enhancers and promoters. These data demonstrate that Ikaros has pioneer activity and triggers coordinated regulation of gene expression. Ikaros pioneering activity was further determined by direct binding of Ikaros to reconstituted nucleosomes by electromobility shift assay. Dynamic analyses demonstrate the long-lasting effects of Ikaros’ DNA binding on enhancer activation, de novo formation of enhancers and super-enhancers, and chromatin accessibility. In conclusion, our results establish that Ikaros’ tumor suppressor function occurs via global regulation of the enhancer and super-enhancer landscape, along with regulation of chromatin accessibility, and identify novel tumor suppressor regulatory pathways in T-ALL.

85

General

A pharmacogenomic investigation of the cardiac safety profile of ondansetron in children and in pregnant women

Galen E.B. Wright, Britt I. Drögemöller, Jessica Trueman, Kaitlyn Shaw, Michelle Staub, Shahnaz Chaudhry, Sholeh Ghayoori, Fudan Miao, Michelle Higginson, Gabriella S.S. Groeneweg, James Brown, Laura A. Magee, Simon D. Whyte, Nicholas West, Sonia Brodie, Geert ’t Jong, Howard Berger, Shinya Ito, Shahrad R. Rassekh, Shubhayan Sanatani, Colin J.D. Ross, Bruce C. Carleton

British Columbia Children’s Hospital Research Institute, Vancouver, British Columbia, Canada; Pharmaceutical Outcomes Programme, British Columbia Children’s Hospital, Vancouver, British Columbia, Canada; Division of Translational Therapeutics, Department of Pediatrics, University of British Columbia, Vancouver, British Columbia, Canada; Faculty of Pharmaceutical Sciences, University of British Columbia, Vancouver, British Columbia, Canada; Clinical Research Unit, Children's Hospital Research Institute of Manitoba, Winnipeg, Manitoba, Canada; Division of Clinical Pharmacology and Toxicology, The Hospital for Sick Children, Toronto, Ontario, Canada; British Columbia Women’s Hospital and Health Centre, Vancouver, British Columbia, Canada; Department of Anesthesiology, Pharmacology and Therapeutics, University of British Columbia, Vancouver, British Columbia, Canada; School of Life Course Sciences, Faculty of Life Sciences and Medicine, King's College, London, United Kingdom; Department of Pediatric Anesthesia, British Columbia Children's Hospital, Vancouver, British Columbia, Canada; Max Rady College of Medicine, Rady Faculty of Health Sciences, University of Manitoba, Winnipeg, Manitoba, Canada; Department of Obstetrics and Gynecology, St. Michael's Hospital, Toronto, Ontario, Canada; EpiMethods Consulting, Toronto, Ontario, Canada; Division of Cardiology, Department of Pediatrics, Children's Heart Centre, BC Children's Hospital, University of British Columbia, Vancouver, Canada

Galen Wright

Background: 5-HT3 receptor antagonists, such as ondansetron, are highly effective medications for the treatment of nausea and vomiting. However, these medications are also associated with prolongation of the QT interval, placing patients at risk of cardiac adverse events. Pharmacogenomic information for therapeutic response to ondansetron exists, particularly pertaining to CYP2D6, but no study has been performed on genetic factors that influence the cardiac safety of this medication. Objectives: Determine ondansetron-induced cardiac electrophysiological changes in three unique patient cohorts and identify pharmacogenomic predictors of QT interval prolongation. Methods: Three patient groups receiving ondansetron for the prevention of nausea and vomiting were recruited and followed prospectively (pediatric post-surgical patients n=101; pediatric oncology patients n=98; pregnant women n=62). Electrocardiograms were conducted at baseline and post-ondansetron administration. Pharmacogenomic associations were then assessed via analyses of comprehensive CYP2D6 genotyping data and genome-wide association analyses. Results: In the entire cohort, 62 patients (24.1%) were defined as cases based on Bazett-corrected QTc values. The most significant shift from baseline occurred at five minutes post-administration (P=9.8×10^-4). Genome-wide analyses identified novel candidate genes for this drug-induced phenotype. The two most significant associations were observed for a missense variant in TLR3 (rs3775291; P=2.00×10^-7) and an eQTL for SLC36A1 (rs34124313; P=1.97×10^-7). These genes are implicated in serotonin- and QT-related traits and therefore likely represent biologically relevant findings. CYP2D6 activity score was not associated with case-control status. Conclusions: The results of this study provide the first step towards understanding the genomic basis of cardiac changes occurring after ondansetron use in children and pregnant women, with the overall goal to improve the safety of these commonly used antiemetic medications.
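
For readers unfamiliar with the Bazett correction mentioned in the Results, the standard formula divides the measured QT interval by the square root of the RR interval in seconds. The sketch below illustrates that formula only; the function names are chosen for this example, and no case-defining QTc threshold is included because the abstract does not state the one used.

```python
import math


def rr_from_heart_rate(beats_per_minute):
    """Convert heart rate (beats per minute) to the RR interval in seconds."""
    if beats_per_minute <= 0:
        raise ValueError("heart rate must be positive")
    return 60.0 / beats_per_minute


def qtc_bazett(qt_ms, rr_seconds):
    """Bazett-corrected QT interval: QTc = QT / sqrt(RR), RR in seconds.

    qt_ms is the measured QT interval in milliseconds; the result is in
    the same unit as qt_ms.
    """
    if rr_seconds <= 0:
        raise ValueError("RR interval must be positive")
    return qt_ms / math.sqrt(rr_seconds)


if __name__ == "__main__":
    # Illustrative numbers only, not data from the study.
    rr = rr_from_heart_rate(75)  # 0.8 s between beats
    print(round(qtc_bazett(380.0, rr), 1))
```

At a heart rate of 60 beats per minute the RR interval is exactly one second, so the corrected and uncorrected QT values coincide.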

86

General

TREND: a platform for exploring protein function in prokaryotes using phylogenetics, domain architectures, and gene neighborhoods information.

Vadim M. Gumerov, Igor B. Zhulin

The Ohio State University

Vadim Gumerov

Key steps in a computational study of protein function involve analysis of (i) relationships between homologous proteins, (ii) protein domain architecture, and (iii) gene neighborhoods the corresponding proteins are encoded in. Each of these steps requires a separate computational task and set of tools. Combining the results into a complete analysis is usually done by hand, which is time-consuming and error-prone. Here we present a new platform, TREND (tree-based exploration of neighborhoods and domains), which can perform all the necessary steps in automated fashion and put the derived information into phylogenomic context, thus making evolutionary-based protein function analysis more efficient. TREND is freely available at http://trend.zhulinlab.org. TREND consists of two pipelines: (1) Domains, which identifies protein domains, transmembrane regions and low-complexity segments, and maps this information on the phylogenetic tree, and (2) Neighborhoods, which identifies gene neighborhoods for the given set of protein sequences, clusters the genes based on shared domains of the encoded proteins, identifies operons and puts the derived data into phylogenomic context. Locally stored databases of the Pfam profile Hidden Markov models (HMMs) and CDD position-specific scoring matrices are used as a source of models for domain identification. Another source is a rich collection of signal-transduction specific profile HMMs derived from the MiST database. The pipelines are highly customizable. On start, both pipelines first align provided proteins and build phylogenetic trees. These steps can be skipped if a researcher already has an alignment or a tree and would like to use them instead. Optionally, redundancy of the sequences can be reduced. Instead of protein sequences, protein identifiers can be provided as input; corresponding sequences will be fetched from RefSeq and MiST databases. Results of the pipelines are presented as interactive pictures with cross-links to Pfam, CDD, RefSeq and MiST databases. All produced results can be downloaded for subsequent analysis.

87

General

TrackSigFreq: subclonal reconstructions based on mutation signatures and allele frequencies

Caitlin F. Harrigan1,2,4, Yulia Rubanova1,2,4, Quaid Morris1,2,3,4,5,6, Alina Selega2,4

1Department of Computer Science, University of Toronto, Toronto, Canada; 2Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Canada; 3Department of Molecular Genetics, University of Toronto, Toronto, Canada; 4Vector Institute, Toronto, Canada; 5Ontario Institute for Cancer Research, Toronto, Canada; 6Memorial Sloan Kettering Cancer Centre, New York, USA (pending)

Cait Harrigan

Mutational signatures are patterns of mutation types, many of which are linked to known mutagenic processes. Signature activity represents the proportion of mutations a signature generates. In cancer, cells may gain advantageous phenotypes through mutation accumulation, causing rapid growth of that subpopulation within the tumour. The presence of many subclones can make cancers harder to treat and have other clinical implications. Reconstructing changes in signature activities can give insight into the evolution of cells within a tumour. Recently, we introduced a new method, TrackSig, to detect changes in signature activities across time from a single bulk tumour sample. By design, TrackSig is unable to identify mutation populations with different frequencies but little to no difference in signature activity. Here we present an extension of this method, TrackSigFreq, which enables trajectory reconstruction based on both observed density of mutation frequencies and changes in mutational signature activities. TrackSigFreq preserves the advantages of TrackSig, namely optimal and rapid mutation clustering through segmentation, while extending it so that it can identify distinct mutation populations that share similar signature activities.

88

General

A Flexible Pipeline for the Prediction of Biomarkers Relevant to Drug Sensitivity

V. Keith Hughitt1, Sayeh Gorjifard1, Aleksandra M. Michalowski1, John K. Simmons2, Ryan Dale1, Eric C. Polley3, Jonathan J. Keats4, Beverly A. Mock1

1NCI, 2Personal Genome Diagnostics, 3Mayo Clinic, Rochester, 4TGen

V. Keith Hughitt

Recent years have seen an explosion in the availability of paired molecular profiling and drug screen data, providing an unprecedented opportunity for the development of targeted therapies based on an individual’s genetic background. Despite a number of recent successes in diseases ranging from cystic fibrosis to cancer, significant hurdles remain in our ability to accurately predict treatments based on molecular profiling data. In particular, few tools exist that allow the integration of heterogeneous data types (e.g. genomic, transcriptomic, and somatic mutations), along with high-throughput drug screen data, to make predictions about treatment efficacy. Here, we describe a generalized open-source pipeline developed for the analysis of precision medicine data, Pharmacogenomics Prediction Pipeline, or “P3”. The modular design of P3 enables the inclusion of arbitrary input data types and the selection from multiple alternative machine learning algorithms, while automated statistical and visualization reporting steps incorporated throughout the pipeline assist in parameter tuning and early detection of problematic data elements. By incorporating external biological annotations from sources such as the Molecular Signatures Database (MSigDB), Drug Signatures Database (DSigDB), and DrugBank, P3 is able to detect important pathways correlated with drug sensitivity, while the inclusion of molecular profiling and clinical data from external patient and cell line datasets allows P3 to focus its efforts on genes which are most likely to play a role in therapeutic response. To demonstrate the use of P3 for preclinical biomarker prediction, we applied P3 to an unpublished multiple myeloma dataset consisting of exome, RNA-Seq, and drug screen data for 1900 compounds across 45 tumor cell lines. Furthermore, gene expression and clinical data from 20 additional publicly available patient and cell line multiple myeloma datasets (>5,500 samples in total), along with data from the GDSC and CCLE drug sensitivity experiments, were also analyzed, providing a rich source of information with respect to the biological relevance of putative biomarkers detected by the pipeline.

89

General

Creating a Metabolic Syndrome Research Resource (MetSRR)

Willysha Jenkins1, Christian Richardson2, ClarLynda Williams-DeVane PhD1

1Fisk University, Nashville, TN; 2Duke University, Durham, NC

Willysha Jenkins

Metabolic syndrome (MetS) is a multifaceted syndrome. Risk factors include visceral adiposity, dyslipidemia, hyperglycemia, hypertension, and environmental factors. An established component of chronic disease sequela, MetS leads to an increased risk of cardiovascular disease and type 2 diabetes. MetS also leads to an increased risk of stroke. Comparative studies have identified heterogeneity in the pathology of MetS across groups; however, the etiology of these differences has yet to be elucidated. Despite the presence of public repositories of biological MetS-related data, the ability to access and work with said data has its challenges. The process of querying databases, wrestling with software and wrangling data into workable formats prior to analysis is both cumbersome and time consuming. The Metabolic Syndrome Research Resource (MetSRR) is a curated database that provides access to MetS-associated biological and ancillary data. It is an amalgamation of current and potential biomarkers of MetS extracted from relevant National Health and Nutrition Examination Survey (NHANES) data from 1999-2016. Each potential biomarker selection was driven by insights elucidated by the review of over 100 peer-reviewed articles. It includes 28 demographic, survey and known MetS-related variables. There are 9 curated categorical variables and 42 potentially novel biomarkers. All measures are captured from over 90,000 individuals. This biocuration effort will provide increased access to curated MetS-related data. It will also serve as a hypothesis generation tool for disparate MetS etiology discovery, providing the ability to generate and export ethnic group/race, sex, and age-specific curated datasets. MetSRR seeks to broaden participation in research efforts to identify clinically evaluative disparate MetS biomarkers. To the best of our knowledge, MetSRR is the only MetS-specific database targeted at uncovering the disparate etiology of MetS through biocuration.

90

General

Utilizing cohort information to find causative variants

Senay Kafkas, Robert Hoehndorf

Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955-6900 Saudi Arabia

Senay Kafkas

Identification of causative variants in genomic data is challenging. Current studies focus on prioritizing variants within individual genomes, or apply statistical methods (e.g. GWAS) to large cohorts. With the rapid advancements and cost decrease in NGS, scientists are able to produce sequence data from large disease cohorts and healthy populations. For example, UK Biobank makes available genotype to phenotype relations for >500,000 individuals and whole exome sequencing (WES) data for 50,000 individuals. Patients with the same/similar set of phenotypes may share the same/biologically related genetic abnormalities and risk factors. The availability of these datasets may allow us to stratify individuals by their phenotype and use this information to identify causative variants within large cohorts. We propose a new method that stratifies patients by their phenotypes and identifies the set of causative variants which can explain phenotypes in most individuals within a cohort from WES/WGS. First, we generated and used synthetic disease cohorts to evaluate our method. We used the human genotype-phenotype associations from ClinVar and the sequence data from 1000 Genomes and generated synthetic cohorts with different population sizes for 200 randomly selected diseases from ClinVar. To generate a synthetic disease cohort of size N, we first randomly picked N individuals from 1000 Genomes and then, for each individual, randomly picked one of the variants of the given disease and added it to the genotype of the given individual. We pre-processed the sequence data by annotating with CADD and selecting only the most deleterious variant of a given gene for each individual. Furthermore, we “normalize” pathogenicity scores based on their frequencies within a population in order to account for different distribution within genes based on their length. We then apply our method on UK Biobank. We developed a method that identifies causative variants by utilizing information about shared phenotypes within a cohort and compared it against individually prioritizing variants using WES/WGS data and average gene ranks. Our approach relies on a machine learning model trained on a pathogenicity prediction score (e.g. CADD) and the frequency of observing a pathogenicity score above a certain threshold in the same gene within a population, and uses this cohort and phenotype-derived information as features to predict causative variants within individual genome sequences. Our method can identify causative variants in small and medium-sized cohorts (2 to 100 individuals). As the disease becomes more complex (i.e. involving harmful variants in multiple genes), our machine learning model improves over established methods, in particular in larger cohorts (>80 individuals). Currently, we have applied our method on UK Biobank and suggest candidate causative variants for 1499 complex diseases.
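
The synthetic cohort construction described in the abstract (pick N individuals at random, then add one randomly chosen disease variant to each genotype) can be sketched as below. The data structures are assumptions made for illustration: genotypes are modelled as a dictionary from a hypothetical individual ID to a set of variant identifiers, and `make_synthetic_cohort` is not a function from the authors' software.

```python
import random


def make_synthetic_cohort(genotypes, disease_variants, n, seed=None):
    """Build a synthetic disease cohort of size n.

    genotypes: dict mapping an individual ID to a set of variant IDs
               (a stand-in for background genomes such as 1000 Genomes).
    disease_variants: list of variant IDs associated with one disease
               (a stand-in for disease variants such as those in ClinVar).
    Returns a dict: individual ID -> (spiked genotype, inserted variant).
    """
    rng = random.Random(seed)
    # Step 1: pick n distinct individuals at random.
    chosen = rng.sample(sorted(genotypes), n)
    cohort = {}
    for individual in chosen:
        # Step 2: pick one disease variant at random and add it.
        inserted = rng.choice(disease_variants)
        cohort[individual] = (set(genotypes[individual]) | {inserted}, inserted)
    return cohort


if __name__ == "__main__":
    # Toy identifiers, purely illustrative.
    background = {f"ind{i}": {f"bg_var{i}", "common_var"} for i in range(10)}
    causal = ["disease_varA", "disease_varB", "disease_varC"]
    for person, (genotype, truth) in make_synthetic_cohort(background, causal, 4, seed=1).items():
        print(person, truth, sorted(genotype))
```

Keeping the inserted variant alongside each genotype gives a ground-truth label against which a prioritization method can be scored.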

91

General

Integrated analysis of JAK-STAT pathway in homeostasis, simulated inflammation and tumour

Milica Krunic¹, Anzhelika Karjalainen¹, Mojoyinola Joanna Ola¹, Stephen Shoebridge¹, Sabine Macho-Maschler¹, Caroline Lassnig¹, Andrea Poelzl¹, Matthias Farlik², Nikolaus Fortelny², Christoph Bock², Birgit Strobl¹, Mathias Mueller¹

¹Institute of Animal Breeding and Genetics and Biomodels Austria, University of Veterinary Medicine Vienna, Austria; ²CeMM – Center for Molecular Medicine, Austrian Academy of Sciences, Vienna, Austria

Milica Krunic

Janus kinases (JAKs) and signal transducers and activators of transcription (STATs) play a key role in cytokine signalling and in the defence against infection and cancer. JAK-STAT signalling components interact with chromatin remodelling proteins and change chromatin architecture/landscape during cell differentiation and recognition and elimination of pathogens. Using different sequencing approaches (ATAC-Seq, ChIPmentation, single-cell RNA-Seq, RNA-Seq), our goal is to untangle the roles of JAK-STAT proteins in shaping chromatin landscapes of myeloid and lymphoid cells in homeostasis, sterile (simulated) inflammation and within the tumour microenvironment. Additionally, we are investigating how evolutionarily conserved STAT protein isoforms interact with chromatin and co-regulatory proteins to induce cell type- and gene-specific responses. The poster shows our summarised findings as a result of integration of different approaches.

92

General

BEERS2: The Next Generation of RNA-Seq Simulator

Nicholas F. Lahens¹, Thomas G. Brooks¹, Dimitra Sarantopoulou¹, Soumyashant Nayak¹, Cris W. Lawrence¹, Anand Srinivasan², Jonathan Schug³,⁴, Garret A. FitzGerald¹,⁵, John B. Hogenesch⁶, Yoseph Barash⁴, Gregory R. Grant¹,⁴

¹Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA; ²PMACS Enterprise Research Applications and High Performance Computing, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA; ³Institute for Diabetes, Obesity, and Metabolism, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA; ⁴Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA; ⁵Department of System Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA; ⁶Division of Human Genetics, Department of Pediatrics, Center for Chronobiology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH

Nicholas Lahens

The accurate interpretation of RNA-Seq data presents a moving target as scientists continue to introduce new experimental techniques and analysis algorithms. This challenge has led researchers to perform a substantial number of benchmarking studies in order to determine best analysis practices. Simulated datasets have proven to be an invaluable tool in these efforts. Despite this strong need for simulated data, only a few RNA-Seq simulators have been released in the public domain, and all of them are based on simplifying assumptions that limit their utility. To address these shortcomings and generate realistic simulated data we are developing the Benchmarker for Evaluating the Effectiveness of RNA-Seq Software (BEERS) 2: an open-source, modular simulator that models each step in the process of converting RNA molecules into sequencing reads. We take an empirical approach to generating realistic RNA samples reflecting biological variability, alternative splicing, and allele-specific expression, which uses real data to train the parameters. Next, we model biochemical reactions and biases from each step in library construction as separate modules. Using an object-oriented paradigm, each module has well-defined inputs and outputs allowing users to easily substitute new modules. This design gives BEERS2 the flexibility to model changes to library construction and sequencing protocols, evolving in parallel with sequencing technology. BEERS2 is open source, freely available, and will be a crucial tool for the community as we continue to develop standards for transcriptome analysis.

93

General

Effect Modification by Age on a Diagnostic Three-Gene-Signature in Patients with Active Tuberculosis

Lauren McDonnell¹, Carly A. Bobak¹,², Matthew Nemesure¹, Justin Lin¹, Jane E. Hill¹

¹Thayer School of Engineering at Dartmouth College, ²Geisel School of Medicine at Dartmouth College

Lauren McDonnell

Introduction

Tuberculosis (TB) is the leading cause of death from a single infectious agent worldwide (1). In 2017, there were 10 million reported cases of TB and another 1.3 million deaths from the disease (1). It is currently the leading killer for individuals who are HIV positive (1). In 2014, the WHO developed the ambitious Sustainable Development Goals (SDGs) which included "End TB", a major program aiming to eradicate the TB epidemic by 2030 (2). Accomplishing this will require more advanced diagnostics that are less invasive and determine the disease status more quickly and more reliably. In our analysis, we aim to model risk factors associated with the development of TB. Here, we are looking at demographic features from multi-cohort studies pulling data from thirty different countries from the Gene Expression Omnibus examining patients with active TB, latent TB, other diseases, and healthy controls. The data is pulled predominantly from developing countries, but also includes samples from developed countries, including the UK, France, Germany, and the United States. In total, the dataset includes 3,096 participants. Meta-analyses of similar datasets have proposed a three-gene score as a "global" tuberculosis metric (3). This type of analysis suggests that all active TB patients, regardless of other factors, will express this gene score. Our hypothesis is that this active TB signal will be additionally mediated by demographic factors such as age and HIV status that are associated with TB.

Methodology

We performed a multivariate logistic regression analysis to identify demographic features associated with culture-confirmed tuberculosis. The model features included age, HIV status, and gene expression for each gene individually (GBP5, DUSP3, and KLF2), as well as an interaction term for HIV and age with each of the three genes.

Results

The results of our multivariate logistic regression suggest that age modifies all three genes in the proposed global gene signature (p-values of 5.38e-05, 6.75e-05, and 0.01012 for GBP5, KLF2 and DUSP3, respectively). Initial findings also indicate that HIV status is a mediator of the effect of GBP5 (p-value of 0.03437). Knowing that the relationship between the gene expression of these three genes varies by demographics may change the way that a diagnostic is implemented in clinic. Our hope is that this analysis will be used to further refine the three-gene signature for specific demographic groups where it may be most effective in diagnosing active TB.

Citations

(1) WHO Global Tuberculosis Report 2018. www.who.int/tb/publications/global_report/en/
(2) Ending Tuberculosis by 2030: Can We Do It? A. B. Suthar, R. Zachariah, Harries. https://www.ingentaconnect.com/contentone/iuatld/ijtld/2016/00000020/00000009/art00007?crawler=true
(3) Genome-Wide Expression for Diagnosis of Pulmonary Tuberculosis: a Multicohort Analysis. https://www.ncbi.nlm.nih.gov/pubmed/26907218
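
The regression described under Methodology in this abstract can be written out explicitly. The form below is one plausible reading of "an interaction term for HIV and age with each of the three genes"; the coefficient symbols and the expression variables x_g are notation introduced here for illustration and do not appear in the abstract.

```latex
\operatorname{logit} P(\mathrm{TB}=1)
  = \beta_0 + \beta_a\,\mathrm{age} + \beta_h\,\mathrm{HIV}
  + \sum_{g \in \{\mathrm{GBP5},\,\mathrm{DUSP3},\,\mathrm{KLF2}\}}
    \left( \beta_g\,x_g + \gamma_g\,\mathrm{age}\cdot x_g + \delta_g\,\mathrm{HIV}\cdot x_g \right)
```

Under this reading, effect modification by age for gene g corresponds to a non-zero \gamma_g, and modification by HIV status to a non-zero \delta_g; the p-values reported in the Results would then refer to tests of these interaction coefficients.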

94

General

Classification and mutation prediction from gastrointestinal cancer histopathology images using deep learning

Sung Hak Lee¹, Hyun-Jong Jang²

¹Department of Hospital Pathology, Seoul St. Mary’s Hospital, College of Medicine, The Catholic University of Korea, ²Department of Physiology, College of Medicine, The Catholic University of Korea

Sung Hak Lee

BACKGROUND: Although microscopic analysis of tissue slides has been the basis for disease diagnosis for decades, intra- and inter-observer variabilities remain issues to be resolved. The recent introduction of digital scanners has allowed researchers to use deep learning in the analysis of tissue images because many H&E whole slide images (WSIs) are available. In the present study, we investigated the possibility of a deep learning-based, fully automated, computer-aided diagnosis system with WSIs from a gastric adenocarcinoma (STAD) dataset. In addition, we trained the network to predict several commonly mutated genes in STAD. Furthermore, we showed that deep learning can predict MSI directly from H&E images.

MATERIALS AND METHODS: We studied the automatic classification of ‘normal’ and ‘tumor’ regions using a total of 432 H&E-stained WSIs from the TCGA gastric cancer image dataset. The slides were tiled in non-overlapping 360x360 pixel windows at a magnification of 20x. We used 70% of those tiles for training, 15% for validation, and 15% for final testing. Deep learning with convolutional neural networks was performed based on the Inception v3 architecture. To study the prediction of gene mutations from H&E images, average area under the curve (AUC) values for KRAS and SMAD4 mutation (93 and 88 cases, respectively) were calculated using our automatic tumor classification deep-learning approach. To study the prediction of MSI (MSS vs. MSI-H) from H&E images, 383 cases were enrolled using the same approach.

RESULTS: The performance of our method is comparable to that of pathologists, with an AUC of up to 0.999. Furthermore, we trained the network to predict two commonly mutated genes in STAD (KRAS and SMAD4) and investigated whether they can be predicted from pathology H&E images. We found that KRAS and SMAD4 mutation can be predicted from pathology images, with AUCs of 0.711 to 0.737, similar to results from previous studies with non-small cell lung cancer histopathology images using deep learning. For the prediction of MSI, patch-level and patient-level AUCs were 0.843 and 0.912, respectively, which is superior to the previous studies with TCGA-COAD and -STAD histopathology images.

CONCLUSIONS: These findings suggest that deep-learning models can assist pathologists in the detection of cancer subtypes and in the prediction of gene mutations and MSI status. After training on larger datasets and prospective validation, this approach has the potential to provide immunotherapy to a much broader subset of patients with STAD.
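
The tiling and 70/15/15 split described in the methods of this abstract can be illustrated with a short sketch. It is a simplified stand-in rather than the authors' pipeline: the function names are hypothetical, partial tiles at the slide edges are simply dropped (an assumption), and the split is done per tile as the abstract states.

```python
import random


def tile_origins(width, height, tile=360):
    """Top-left corners of non-overlapping tile x tile windows.

    Partial tiles at the right and bottom edges are dropped (assumption).
    """
    return [(x, y)
            for y in range(0, height - tile + 1, tile)
            for x in range(0, width - tile + 1, tile)]


def split_tiles(tiles, train=0.70, val=0.15, seed=0):
    """Shuffle tiles and split them into train / validation / test lists."""
    shuffled = list(tiles)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * val)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


# A hypothetical 3600 x 3600 pixel slide region yields a 10 x 10 grid of tiles.
tiles = tile_origins(3600, 3600)
train_set, val_set, test_set = split_tiles(tiles)
```

When tiles from one slide can land in different subsets, a per-slide or per-patient split is a common alternative for avoiding leakage; the abstract does not say which was used beyond the tile-level percentages.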

95

General

Mapping the Emergence and Migration of Hematopoietic Stem Cells and Progenitors During Human Development at Single Cell Resolution

Feiyang Ma, Vincenzo Calvanese, Sandra Capellera-Garcia, Sophia Ekstrand, Matteo Pellegrini, Hanna K. A. Mikkola

Department of Molecular, Cell and Developmental Biology, UCLA, Los Angeles, CA, USA

Feiyang Ma

Hematopoiesis is established during development through multiple waves of blood cell production, starting with lineage-primed progenitors required for the embryo's needs, and culminating in the generation of self-renewing hematopoietic stem cells (HSCs) for life-long hematopoiesis. Although hematopoietic ontogeny has been studied extensively in mice, we lack knowledge of the anatomical, temporal and molecular map for hematopoietic development in human. Prior studies suggest that HSCs emerge from hemogenic endothelium in the aorta-gonad-mesonephros (AGM) region between 4-6 weeks of human gestation. Extraembryonic sites including the placenta, umbilical and vitelline arteries, and the yolk sac, have been proposed to generate HSCs in the mouse. However, whether the same sites generate HSCs in human is unclear, mainly due to the limited access to developmental tissues and lack of reliable methods to identify developing human HSCs. We created a single-cell transcriptome map of hemato-vascular cells (CD34+ and/or CD31+) from human hematopoietic tissues at 1st and 2nd trimester. Using a molecular signature of self-renewing HSCs defined in our previous molecular and functional studies, we could identify CD34+Thy1+RUNX1+HOXA7+MLLT3+HLF+ cells as HSCs throughout development. Analyses of 5-wk AGM revealed a distinct population of newly emerged HSCs that vanished by 7 wks. HSCs colonized the fetal liver by 6 wks, where they expanded and differentiated beyond 15 wks. A small but distinct population expressing HSC molecular markers was reproducibly detected in 5 wk placentas. At this time, the heart, umbilical cord and fetal liver lacked clear HSC populations, implying minimal spreading through circulating blood. Interestingly, preceding HSC colonization, the 5 wk fetal liver already harbored CD34+Thy1-RUNX1+HOXA7-MLLT3-HLF- progenitors that co-expressed markers associated with erythro-myeloid and lympho-myeloid potential. Comparable populations were abundant in the yolk sac, suggestive of their origin. This data-set provides an unprecedented resource to dissect the dynamics and molecular pathways governing the emergence and progression of distinct waves of hematopoietic cells during human development, and serves as a reference map for the generation of HSCs in vitro for therapeutic purposes.

96

General

Large-scale Machine Learning and Graph Analytics for Functional Prediction of Pathogen Proteins

Jason McDermott¹, Song Feng¹, William Nelson¹, Joon-Yong Lee¹, Sayan Ghosh¹, Ariful Khan¹, Mahantesh Halappanavar¹, Justine Nguyen², Jonathan Pruneda², David Baltrus³, Joshua Adkins¹

¹Pacific Northwest National Laboratory, ²Oregon Health & Science University, ³University of Arizona

Jason McDermott

Proteins enact the functionality encoded by genomes and so understanding protein function is critical to many areas of biology. Prediction of protein function from sequence is possible because of evolutionary relationships between proteins with similar functions, and existing algorithms can identify the corresponding sequence similarity. However, many proteins have similar functions but diverse sequences, which thwart existing methods, and driven by advances in sequencing technology the number of protein sequences with no known function or similarity to proteins of known function is large and growing rapidly. We use reduced amino acid alphabet mapping and kmer-based protein sequence representation to detect functional similarities between proteins and apply this method to bacterial and viral proteins that mimic eukaryotic ubiquitin ligases and deubiquitinases and classes of bacteriocins. These models allow prediction of novel examples that are not detected by traditional sequence similarity, and can provide insight into active sites or other functional domains for the proteins. To explore sequence space in a more discovery-oriented way we have applied this approach to a very large set of bacterial protein sequences (>20 million sequences) and use a GPU-based algorithm to quickly calculate a similarity graph based on protein features beyond traditional sequence similarity. Exascale graph analytics methods are used to identify groups of closely related sequences from the similarity graph. We show that this method can recapitulate known relationships between proteins, highlight inconsistencies in the underlying protein database, and provide hypotheses for functions of novel proteins thus providing a large-scale sequence landscape.
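
The idea of a reduced amino acid alphabet combined with k-mer features can be shown in a few lines. The sketch below is illustrative only: the six-class grouping, the value of k and the Jaccard comparison are assumptions chosen for brevity, not the mapping or similarity measure used by the authors.

```python
from collections import Counter

# Hypothetical six-class reduction by broad physicochemical property.
REDUCED_GROUPS = {
    "h": "AVLIMC",  # hydrophobic / aliphatic
    "a": "FWYH",    # aromatic
    "p": "STNQ",    # polar uncharged
    "k": "KR",      # positively charged
    "d": "DE",      # negatively charged
    "g": "GP",      # glycine / proline
}
REDUCE = {aa: code for code, members in REDUCED_GROUPS.items() for aa in members}


def reduce_sequence(seq):
    """Map each residue to its reduced-alphabet class ('x' if unknown)."""
    return "".join(REDUCE.get(aa, "x") for aa in seq.upper())


def kmer_counts(seq, k=3):
    """Count overlapping k-mers in the reduced-alphabet sequence."""
    reduced = reduce_sequence(seq)
    return Counter(reduced[i:i + k] for i in range(len(reduced) - k + 1))


def jaccard_similarity(counts_a, counts_b):
    """Jaccard similarity between the sets of k-mers present in two sequences."""
    a, b = set(counts_a), set(counts_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Two toy peptides that share no residue at any position but reduce to the same string.
similarity = jaccard_similarity(kmer_counts("MKVLA"), kmer_counts("LRIVM"))
```

Because both toy peptides reduce to "hkhhh", their k-mer sets coincide even though a residue-by-residue comparison finds no identical positions, which is the kind of signal such a representation is meant to expose.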

97

General

Gene-set analysis using GWAS summary statistics and GTEx database

Masahiro Nakatochi

Department of Nursing, Nagoya University Graduate School of Medicine

Masahiro Nakatochi

Recently, sample sizes of genome-wide association studies (GWASs) have been increasing rapidly. Consequently, many genetic loci associated with traits have been identified. It is difficult to interpret how these many loci identified by GWAS contribute to the traits. One function of a SNP is the regulation of gene expression levels; such SNPs are called expression quantitative trait loci (eQTLs). The GTEx project revealed many eQTLs in many human tissues. In this study, I propose an approach to gene set analysis using GWAS summary statistics and the GTEx database to investigate how the genetic loci identified by GWAS contribute to the trait. This approach has three steps. First, trait-associated SNPs are identified by GWAS. Second, genes whose expression level is associated with trait-associated SNPs in at least one tissue in the GTEx database are searched. These genes are classified as either positively or negatively correlated genes. Finally, gene set enrichment analyses of positively correlated genes and negatively correlated genes are performed with the modified Fisher’s exact test to identify trait-associated pathways or gene sets. Using this approach, I found serum uric acid (SUA)-associated gene sets based on a SUA GWAS. Gene set enrichment analysis of UniProt terms found that the terms “Williams-Beuren syndrome”, “sodium”, “transport”, “sodium transport”, and “alternative splicing” were enriched for the positively correlated genes. This approach provides another insight into the SNPs identified by GWAS.
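
The final enrichment step of this approach can be sketched with a one-sided Fisher's exact test, computed as a hypergeometric tail probability. The abstract does not specify how its test is "modified", so the code below implements only the standard test; the function name and the toy counts are assumptions.

```python
from math import comb


def enrichment_p_value(overlap, list_size, set_size, background_size):
    """One-sided Fisher's exact test for over-representation.

    Returns P(X >= overlap) where X is the number of gene-set members
    drawn when list_size genes are sampled without replacement from a
    background of background_size genes containing set_size members.
    """
    upper = min(list_size, set_size)
    total = comb(background_size, list_size)
    tail = sum(comb(set_size, i) * comb(background_size - set_size, list_size - i)
               for i in range(overlap, upper + 1))
    return tail / total


# Toy example: 5 of 5 query genes fall in a 5-gene set, background of 10 genes.
p = enrichment_p_value(overlap=5, list_size=5, set_size=5, background_size=10)
```

In the workflow described above, the function would be called once per gene set for the positively correlated genes and once for the negatively correlated genes, followed by a multiple-testing correction.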

98

General

Targeting Cancer via Signaling Pathways: A Novel Approach to the Discovery of Gene CCDC191's Double-agent Function using Differential Gene Expression, Heat Map Analyses through AI Deep Learning, and Mathematical Modeling

Annie Ostojic

Purdue University

Annie Ostojic

According to a recent Johns Hopkins University study posted in May of 2018, the number of total genes in the genome was recalculated to be 43,162 genes, comprising 21,306 protein-coding genes and 21,856 noncoding genes. With completion of base pair sequencing in the Human Genome Project back in 2003, hope existed for acceleration of new medical treatments and disease intervention. However, earlier bioinformatic processes were unable to produce results quickly enough, so many gene functions remain unknown to date. A need exists to analyze gene functions in pathways to meet a changing medical industry of pharmacogenomics, personalized medicine, and cancer treatments relative to gene expression patterns. New methodology for determining functions of unstudied genes to rapidly extrapolate, classify, and correlate their gene expressions to biological pathways is at the forefront of bioinformatic studies. This research discovered the function of gene CCDC191, a coiled-coil domain-containing protein-coding gene, whose function had not been fully studied nor defined. A novel approach was utilized to determine the function of CCDC191 by combining gene expression analysis, patient survival analysis, differential gene expression, heat map with AI deep learning, and reverse engineering mathematical modeling. This study presents analyses and insights into gene CCDC191 which have not been performed previously, and it provides a replicable methodology which incorporates AI deep learning image classification and reverse engineering mathematical modeling to determine gene functions in pathways and cancer connectedness.

99

General

RFEX: Simple Random Forest Model and Sample Explainer for non-Machine Learning experts

Dragutin Petkovic, Ali Alavi, DanDan Cai, Jizhou Yang, Sabiha Barlaskar

San Francisco State University (all authors)

Dragutin Petkovic

Machine Learning (ML) is becoming an increasingly critical technology in many areas. However, its complexity and its frequent “non-transparency” create significant challenges, especially in the biomedical and health areas. One of the critical components in addressing the above challenges is the explainability or transparency of ML systems, which refers to the model (related to the whole data) and sample explainability (related to specific samples). Our research focuses on both model and sample explainability of Random Forest (RF) classifiers. Our RF explainer, RFEX, is designed from the ground up with non-ML experts in mind, and with simplicity and familiarity, e.g. providing a one-page tabular output and measures familiar to most users. In this paper we present significant improvement in RFEX Model explainer compared to the version published previously, a new RFEX Sample explainer that provides explanation of how the RF classifies a particular data sample and is designed to directly relate to RFEX Model explainer, and a RFEX Model and Sample explainer case study from our collaboration with the J. Craig Venter Institute (JCVI). We show that our approach offers a simple yet powerful means of explaining RF classification at the model and sample levels, and in some cases even points to areas of new investigation. RFEX is easy to implement using available RF tools and its tabular format offers easy-to-understand representations for non-experts, enabling them to better leverage the RF technology.

100

General

Apparent bias toward long gene misregulation in MeCP2 syndromes disappears after controlling for baseline variations

Ayush T. Raman¹,², Amy E. Pohodich², Ying-Wooi Wan², Hari Krishna Yalamanchili², William E. Lowry³, Huda Y. Zoghbi², Zhandong Liu²

¹Broad Institute of MIT and Harvard, ²Baylor College of Medicine, ³University of California Los Angeles

Ayush Raman

Background: Rett syndrome is a neurodevelopmental disorder caused by mutations in MECP2, a methyl-binding protein whose task is to orchestrate gene expression, and MeCP2 mutations disrupt the expression of several thousand genes. Over the past ten years, a number of studies observed that Rett syndrome and other disorders that affect neuronal synapses seem to preferentially dysregulate genes that are longer than 100 Kb. These length-dependent transcriptional changes in MeCP2-mutant samples are modest, but, given the low sensitivity of high-throughput transcriptome profiling technology, here we re-evaluate the statistical significance of these results.

Results: We develop a robust statistical approach to estimate noise accurately and identify statistically significant gene length-dependent changes. We find that the apparent length-dependent trends previously observed in MeCP2 microarray and RNA-sequencing datasets disappear after estimating baseline variability (i.e., intra-sample differences) from randomized control samples across 17 different publicly available MeCP2 datasets. We show that even MAQC/SEQC Phase-III benchmark datasets, which do not include MeCP2 or its effects on expression, are prone to the long gene bias, suggesting that the bias is not an inherent feature of gene expression following MeCP2 disruption. We hypothesized that PCR amplification, a process shared by both microarray and RNA-seq technologies, might introduce the observed bias in long gene expression. We find no bias with nanoString technology, a technique that does not use PCR amplification, for SEQC/MAQC samples or Mecp2 mutant samples. This confirmed our notion that the previous observations of long-gene bias resulted from amplification-based technologies and the failure to establish a proper baseline.

Conclusions: We conclude that accurate characterization of length-dependent (or other) trends requires establishing a baseline from randomized control samples. We propose that smaller fold changes in transcription observed after PCR amplification lead to an overestimation of long gene expression levels.

101

General

Prediction of chronological and biological age from laboratory data

Luke Sagers¹, Luke Melas-Kyriazi², Chirag J. Patel³, Arjun K. Manrai¹

¹Boston Children’s Hospital Computational Health Informatics Program, ²Harvard University Department of Mathematics, ³Harvard Medical School Department of Biomedical Informatics

Luke Sagers

Aging has pronounced effects on blood laboratory biomarkers used in the clinic. Prior studies have largely investigated a single biomarker or population at a time, limiting a comprehensive view of biomarker variation and aging across different populations. Here we develop a supervised machine learning approach to study the aging process using 356 blood biomarkers measured in 67,536 individuals across demographically diverse populations. Our model predicts age with a mean absolute error (MAE) in held-out data of 4.76 years and an R² value of 0.92. Age prediction was highly accurate for the pediatric cohort (MAE=0.87, R²=0.94) but inaccurate for ages 65+ (MAE=4.30, R²=0.25). Extensive variability was observed in which biomarkers carry the most predictive power across different age groups, genders, and race/ethnicity groups, and novel candidate biomarkers of aging were identified for specific age ranges (e.g. Vitamin E for ages 18-45). We further show that predictors accurate for one age group may fail to generalize to other groups, and find that nearly a third of all biomarkers exhibit non-linearity near adulthood. As populations worldwide undergo major demographic changes, it will be increasingly important to catalogue biomarker variation across age groups and discover new biomarkers to distinguish chronological and biological aging.
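
The two metrics reported in this abstract, MAE and R², respond differently to the spread of true ages within a subgroup, which is one possible way a group can show a moderate MAE alongside a low R². The sketch below uses the standard definitions; the ages are made-up values for illustration and are not taken from the study.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Identical prediction errors (+2, -2, +3 years) on a wide and a narrow age range.
wide_true, wide_pred = [10, 20, 30], [12, 18, 33]
narrow_true, narrow_pred = [66, 68, 70], [68, 66, 73]

wide_mae = mean_absolute_error(wide_true, wide_pred)
narrow_mae = mean_absolute_error(narrow_true, narrow_pred)
wide_r2 = r_squared(wide_true, wide_pred)
narrow_r2 = r_squared(narrow_true, narrow_pred)
```

Both toy groups have the same MAE, but R² is far lower in the narrow group because the total variance of the true ages, the denominator of the ratio, is much smaller there.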

102

General

Whole genome sequencing analysis of influenza C virus in Korea

Sooyeon Lim, Han Sol Lee, Ji Yun Noh, Joon Young Song, Hee Jin Cheong, Woo Joo Kim

Division of Infectious Diseases, Department of Internal Medicine, Korea University College of Medicine, Seoul, South Korea; Division of Brain Korea 21 Program for Biomedicine Science, College of Medicine, Korea University, Seoul, South Korea; Asia Pacific Influenza Institute, Korea University College of Medicine, Seoul, South Korea

Sooyeon Lim

Through the Hospital-based Influenza Morbidity and Mortality (HIMM) surveillance system, 973 nasopharyngeal swab specimens from children under 2 years of age were collected and tested for influenza viruses using real-time PCR. Among the tested specimens, 383 were positive for influenza A and/or B virus. Influenza C virus was confirmed in five specimens. In this study, we used the five influenza C virus-positive specimens and a cell-cultured influenza C virus. Viral RNA was isolated using the QIAamp viral RNA mini kit (Qiagen, Hilden, Germany) following the manufacturer’s instructions. All isolated RNA was finally eluted with 60 µl of distilled water. Reverse transcription was performed with the Primescript 1st strand cDNA synthesis kit (Takara, Shiga, Japan) using the uni-5’ primer. Genome-wide amplification of the influenza C virus was performed using Taq polymerase. Sequencing libraries were prepared from the amplified gene fragments using the Nextera XT DNA Library Prep kit (Illumina), according to the manufacturer’s protocol. This study is the first report of NGS analysis of influenza C virus in South Korea. In this study, young children with influenza C virus infections had acute respiratory illnesses with symptoms such as fever, rhinorrhea, and cough, but no pneumonia or severe respiratory illness was observed. Based on NGS analysis, we can expand our understanding of the various symptoms of influenza C virus infection.

103

General

Mining the Humuhumunukunukuapua and the Shaka of Autism with Big Data Biomedical Data Science

Peter Washington, Brianna Chrisman, Kaiti Dunlap, Aaron Kline, Arman Husic, Michael Ning, Kelley Marie Paskov, Nathaniel Stockham, Maya Varma, Emilie LeBlanc, Jack Kent, Yordan Penev, Min Woo Sun, Jae-Yoon Jung, Catalin Voss, Nick Haber, Dennis P. Wall

Departments of Pediatrics (Systems Medicine) and Biomedical Data Science, Stanford University

Dennis Wall

Mental health is arguably at the core of all health, and early childhood mental health predicts a long-term healthy life course. Yet finding, treating, and preventing mental health disorders in children is limited by the reach and scalability of current methods. Thankfully, advances in AI and ubiquitous technology have ushered in unparalleled opportunities for scalable mobile health. We have constructed a series of mobile solutions that treat and track while simultaneously building novel computer vision libraries for precision models. These solutions function as mobile games that are highly engaging and designed for the individual, encouraging compliance with the required “dose” while passively collecting metrics to measure, and ultimately predict, outcomes. We can quantify or digitize a child’s phenotype through these passively collected data, not just once, but many times, as the child plays our games and learns through playing. These games engender trust and, as they do, we “crowd” build a community of stakeholders that not only shares Phenome data, but also data on their Genome and the Environment. With the 3 modalities, we use multivariate data fusion techniques to resolve the G+E=P equation for autism and set the stage for doing the same in other spectrum disorders across mental health.

104

General

Development of a recurrence prediction model for early lung adenocarcinoma using radiomics-based artificial intelligence

Hee Chul Yang, Gunseok Park, Ji Eun Oh

Division of Convergence Technology, National Cancer Center Research Institute

Hee Chul Yang

Purpose: This study aimed at predicting recurrence after curative resection in patients with lung adenocarcinoma (ADC) using phenotypic radiomics features obtained from CT images.

Material: From January 1, 2010, to December 31, 2015, a total of 604 primary lung ADC patients with a tumor size of 1-3 cm underwent curative resection at a single institution.

Method: Preoperative CT images of all 604 patients were used for feature extraction. The final dataset was randomized into a training set (n=424) and a test set (n=180) with a ratio of 7:3. Radiomics features were selected by t-test (P<0.05) and a radiomics signature was classified by a logistic regression model. The optimal model was evaluated through a ROC curve.

Result: In a logistic regression analysis, 6 radiomics features were finally selected from 51 features to build a radiomics signature that was significantly associated with recurrence. The optimal model was built with features associated with the dependent variable. They presented good performance in the prediction of recurrence alone with an AUC of 76.2% accuracy. The test set validated 72.2% accuracy.

Conclusion: The radiomics signature can be a useful recurrence prediction tool even in small-sized lung ADC.

105

General

DRLPC: Dimension Reduction of Sequencing Data using Local Principal Components

Yun Joo Yoo¹, Fatemeh Yavartanu¹, Shelley B. Bull²

¹Seoul National University, ²The Lunenfeld-Tanenbaum Research Institute

Yun Joo Yoo

Genome-wide association studies (GWAS) using single nucleotide polymorphism (SNP) data usually have millions of variables with complex correlation structure resulting from linkage disequilibrium. When multi-SNP joint analysis using multiple regression is applied, a dimension reduction method such as principal component analysis can be considered. Replacing SNP data with principal components can resolve multicollinearity, which often occurs in regression using high-density sequencing or imputed SNP data. However, the principal components constructed from all SNP variables in a region are hard to interpret as a biological entity and are not useful for localization and fine mapping. In this study, we propose an algorithm, DRLPC (Dimension Reduction using Local Principal Components), to reduce the dimension for regression analysis by selecting clusters of SNPs in high correlation and replacing each cluster by a local principal component constructed from the SNPs in the cluster. The algorithm aims to resolve multicollinearity between updated variables by considering the variance inflation factor (VIF) and removing variables with high VIF. We examined the behaviour of DRLPC by applying the algorithm to the 1000 Genomes Project data. Chromosome 22 SNP sets of three populations (EUR, ASN, AFR) were dimension reduced for each gene region separately, comparing several choices of threshold values for clustering and principal components selection. When averaged across the genes, the ratio of the number of final variables over the number of original variables was 50% for the genes with 5~10 SNPs and as low as 10% for the genes with more than 1,000 SNPs. The reduction rate was smaller for the AFR population compared to the other populations EUR and ASN, possibly due to weaker LD in the African population. We also compared the power of multi-SNP tests constructed based on regression results obtained from the original data and dimension reduced data. These tests include generalized Wald, LC (linear combination) tests, and MLC (Multi-bins linear combination) tests. LC tests and MLC tests are also dimension reduction techniques in the sense that LC combines all individual effects into a one degree of freedom test and MLC combines the individual effects into a linear combination within a bin (cluster) and constructs a test with degrees of freedom equal to the number of clusters. Since DRLPC uses the same clustering algorithm based on clique partitioning as MLC, we compared results of MLC with original data to the DRLPC Wald test with processed data under the same clustering threshold and found that they yield similar power. We conclude that DRLPC can provide efficient dimension reduction while resolving multicollinearity and also lessens the problem of interpretability because these principal components represent smaller sized regions, possibly short haplotypes.
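
For reference, the variance inflation factor mentioned in this abstract has a standard definition, given below. The symbols are introduced here for illustration; the VIF threshold used by DRLPC is not stated in the abstract.

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^{2}}
```

Here R_j^2 is the coefficient of determination obtained by regressing variable j (a SNP or a local principal component) on all remaining variables in the region. A value of R_j^2 close to 1 means variable j is nearly a linear combination of the others, so VIF_j grows without bound, which is why removing high-VIF variables reduces multicollinearity.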

106

General

Meta-analysis in exhausted T cells from Homo sapiens and Mus musculus provides novel targets for immunotherapy

Lin Zhang¹, Yicheng Guo², Hafumi Nishi¹

¹Tohoku University Graduate School of Information Sciences, ²Columbia University, Department of Systems Biology

Lin Zhang

Using antibodies that target immune checkpoints to reverse T cell exhaustion is a promising approach for immunotherapy of cancers. However, the therapeutic efficacy is still low for known immune checkpoint targets, such as PD1 and CTLA4. T cell exhaustion is a state of T cell dysfunction during chronic infections and cancers. It exhibits several characteristic features, such as poor effector functions in a hierarchical manner, impaired memory T cell potential, and sustained upregulation and co-expression of multiple inhibitory receptors. The mechanism and pathways for T cell exhaustion remain to be fully described. In this study, we performed a meta-analysis with 7 datasets from both humans and mice to uncover the molecular mechanism of T cell dysfunction. Through gene set enrichment analysis, the predefined exhaustion gene sets were observed to be significantly enriched in the exhausted T cells. The differential expression analyses showed an overlap of 21 upregulated and 37 downregulated genes shared by exhausted T cells in humans and mice. These genes were significantly enriched in exhaustion response-related pathways, such as signal transduction, immune system process, and regulation of cytokine production. In addition, co-expression analysis identified 175 genes that were highly correlated with the exhaustion trait in humans and mice. Above all, our study revealed that TOX and CD200R1 might be considered as potential and highly efficient targets for immunotherapy.

107

INTRINSICALLY DISORDERED PROTEINS (IDPS) AND THEIR FUNCTIONS

POSTER PRESENTATIONS

108

Intrinsically Disordered Proteins (IDPs) and Their Functions

Disordered Function Conjunction: On the in-silico function annotation of intrinsically disordered regions

Sina Ghadermarzi, Akila Katuwawala, Christopher J. Oldfield, Amita Barik, Lukasz Kurgan

Virginia Commonwealth University

Sina Ghadermarzi

Intrinsically disordered regions (IDRs) lack a stable structure, yet perform biological functions. The functions of IDRs include mediating interactions with other molecules, including proteins, DNA, or RNA, and entropic functions, including domain linkers. Computational predictors provide residue-level indications of function for disordered proteins, which contrasts with the need to functionally annotate the thousands of experimentally and computationally discovered IDRs. In this work, we investigate the feasibility of using residue-level prediction methods for region-level function predictions. For an initial examination of the multiple function region-level prediction problem, we constructed a dataset of (likely) single function IDRs in proteins that are dissimilar to the training datasets of the residue-level function predictors. We find that available residue-level prediction methods are only modestly useful in predicting multiple region-level functions. Classification is enhanced by simultaneous use of multiple residue-level function predictions and is further improved by inclusion of amino acid content extracted from the protein sequence. We conclude that multifunction prediction for IDRs is feasible and benefits from the results produced by current residue-level function predictors; however, it has to accommodate inaccuracy in functional annotations.

109

MUTATIONAL SIGNATURES

POSTER PRESENTATIONS

110

Mutational Signatures

Transcription-associated regional mutation rates and signatures in regulatory elements across 2,500 whole cancer genomes

Jüri Reimand

Ontario Institute for Cancer Research, University of Toronto

Jüri Reimand

The genomes of healthy and cancerous cells accumulate somatic mutations over time with complex variations across tissues and genomic contexts. Certain classes of functional elements of the genome are subject to differential mutation rates due to regionalized activities of mutational processes. To investigate regional mutations, we developed RM4RM, a statistical framework for detecting differential mutation rates and trinucleotide signatures in sets of genomic regulatory elements. To validate our model, we first analyzed CTCF binding sites across >2,500 whole cancer genomes of 39 cancer types of the ICGC-TCGA PCAWG cohort. We found significant mutation enrichments in CTCF sites in liver, esophageal, breast and other cancer types that were primarily driven by T>C/G mutations and multiple rare mutation signatures of unknown etiology. Transcription start sites (TSSs) of protein-coding genes and a broader set of experimentally defined regulatory elements derived from primary tumors of the TCGA project also showed significantly elevated regional mutation rates in multiple cancer types. TSS-specific regional mutation enrichment was particularly dominant in highly transcribed genes of matching tumors while none was apparent in silenced genes. In contrast, no mutation enrichment dependency on transcript abundance was observed in distal regulatory elements. These data indicate a transcription initiation-coupled mutational process active in multiple cancer types, supported by multiple mutational processes and trinucleotide signatures specifically enriched in highly transcribed TSSs. Our findings and statistical model enable detailed studies of the mechanisms of somatic mutagenesis and advance our understanding of genetic drivers of disease.

111

Mutational Signatures

Complex mosaic structural variations in human fetal brains

Shobana Sekar1, Livia Tomasini2, Maria Kalyva3, Taejeong Bae1, Logan Manlove1, Bo Zhou4, Jessica Mariani2, Fritz Sedlazeck5, Alexander E. Urban4, Christos Proukakis3, Flora M. Vaccarino2, Alexej Abyzov1

1Mayo Clinic, 2Yale University, 3University College London, 4Stanford University, 5Baylor College of Medicine

Alexej Abyzov

Somatic mosaicism in cells of the human brain is common and may have functional consequences that lead to diseases, including neurological ones. Mosaic variations in the brain can be point mutations, insertions of mobile elements, and structural changes. Previously we detected and described 200-400 mosaic point mutations per single-cell clone from cortices of three human fetuses (15 to 21 weeks post conception). Here we describe four mosaic structural variations (SVs) in the same brains. The SVs were of kilobase scale and complex, i.e., consisting of deletion(s) and a few rearranged genomic fragments that sometimes originated from different chromosomes. Sequences at the breakpoints of the rearrangements had microhomologies, suggesting their origin from replication errors. One SV was found in two clones and we timed its origin to ~14 weeks post conception. Our study reveals the existence of mosaic SVs, likely arising from cell proliferation, in the human brain in mid-neurogenesis.

112

PATTERN RECOGNITION IN BIOMEDICAL DATA: CHALLENGES IN PUTTING BIG DATA TO WORK

POSTER PRESENTATIONS

113

Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work

Stratification of kidney transplant recipients based on temporal disease trajectories

Isabella Friis Jørgensen PhD1, Søren Schwartz Sørensen PhD2, Søren Brunak PhD1

1Novo Nordisk Foundation Center for Protein Research - Faculty of Health and Medical Sciences - University of Copenhagen - Blegdamsvej 3B - DK-2200 Copenhagen N - Denmark; 2Department of Nephrology - Rigshospitalet - Copenhagen University Hospital - Blegdamsvej 9 - DK-2100 Copenhagen Ø - Denmark

Isabella Friis Jørgensen

Organ transplantations often improve the life of chronically sick patients. However, immune-suppressive medication given to transplant recipients increases the risk of complications, especially infections and infection-related death. One in five kidney transplant recipients dies from infection. We want to stratify kidney transplant recipients into groups of patients with different patterns of infectious diseases and mortality to predict which patients have a higher risk of specific infections. We use the Danish National Patient Registry (DNPR), which contains hospital diagnoses for 6.9 million patients from the entire Danish population from 1994 to 2018. We use a previously published method to identify significant time-dependent disease trajectories for all patients with a kidney transplantation. Subsequently, we use hierarchical clustering of Jaccard distances between the disease trajectories to find distinct groups of trajectories from kidney transplant recipients. In the DNPR, we identified 5,644 patients with a kidney transplantation, resulting in 43 significant disease trajectories that consist of three consecutive diseases, including several infection-related diagnoses. More than 87% of the kidney transplantation recipients follow at least one of these trajectories; hence, they are diagnosed with the three diseases in the order the trajectory specifies. Clustering reveals two main groups of temporal disease trajectories. We identify patients following the two groups of disease trajectories and discover significant differences in mortality after kidney transplantation between patients following different disease trajectories. This study used previous disease history from large-scale hospital diagnoses to stratify common, temporal disease trajectories into two distinct groups. Depending on the type of trajectory kidney transplantation recipients follow, significant differences in mortality are seen. These methods can be used to guide clinicians about higher risks of certain infections and mortality of certain groups of kidney transplant recipients.
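
The clustering step described in this abstract can be illustrated with a minimal sketch. It assumes, for illustration only, that each trajectory is represented by the set of patient IDs who follow it and that average linkage is used; the abstract does not state either choice, and the labels `T1` to `T4` and the function names are hypothetical.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b|; two empty sets count as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def average_linkage(trajectories, n_clusters=2):
    """Agglomerative clustering with average linkage on Jaccard distances.

    trajectories: dict mapping a trajectory label to a set of patient IDs
    (an assumed representation, not taken from the abstract).
    Returns a sorted list of clusters, each a sorted list of labels.
    """
    clusters = [[name] for name in sorted(trajectories)]
    while len(clusters) > n_clusters:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            pairs = [(x, y) for x in clusters[i] for y in clusters[j]]
            dist = sum(
                jaccard_distance(trajectories[x], trajectories[y])
                for x, y in pairs
            ) / len(pairs)
            if best is None or dist < best[0]:
                best = (dist, i, j)
        _, i, j = best
        merged = sorted(clusters[i] + clusters[j])
        # Keep every cluster except the two being merged, then add the merge.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(clusters)


# Toy data: four hypothetical trajectories and the patients following them.
toy = {
    "T1": {1, 2, 3, 4},
    "T2": {2, 3, 4, 5},
    "T3": {10, 11, 12},
    "T4": {11, 12, 13},
}
groups = average_linkage(toy, n_clusters=2)
```

On the toy data, trajectories that share most of their patients end up in the same group; with real registry data the linkage method and the cut into two groups would need to be justified by the analysis itself.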

114

Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work

Modeling Gene Expression Levels from Epigenetic Markers Using a Dynamical Systems Approach

James Brunner1, Jacob Kim2, Kord M. Kober3

1Mayo Clinic, Rochester, MN; 2Columbia University, New York, NY; 3University of California, San Francisco, CA

Kord Kober

Gene regulation is an important fundamental biological process and involves a number of complex biological processes that are essential for development and adaptation to the environment. Understanding the role of epigenetic changes in gene expression is a fundamental question of molecular biology. Predicting gene expression from epigenetic data is an active area of research, and previous studies have used statistical approaches for building prediction models. Dynamical systems can be used to generate a model to predict gene expression using epigenetic data and a gene regulatory network (GRN). By dynamically simulating hypothesized mechanisms of transcriptional regulation, we provide predictions based directly on these biological hypotheses. Furthermore, a stochastic dynamical system provides us with a distribution of gene expression estimates, representing the possibilities that may occur within the cell. The purpose of this study is to develop and evaluate a stochastic dynamical systems model predicting gene expression levels from epigenetic data for a given GRN. We model gene regulation using a piecewise-deterministic Markov process (PDMP) where transcription factor (TF) binding is a Boolean random variable representing the bound/unbound state of a binding site region of DNA. TF binding is given as the difference of two Poisson jump processes (i.e., binding and unbinding), so that the time between binding and unbinding events is exponentially distributed with propensities taken to be linear functions of the available TF. Epigenetic modification of the TF binding site impacts the binding propensity of the TF and is measured as the percentage of methylated bases (i.e., beta). We use a linear ordinary differential equation based on the underlying GRN to determine the value of the transcript between TF binding or unbinding events. We include baseline transcription and decay and are able to solve exactly between jumps of binding/unbinding events. In a discrete-space, continuous-time Markov process, the equilibrium distribution can be estimated by sampling from a realization of the process. For our continuous-space PDMP we can estimate the equilibrium distribution in a similar manner using kernel density estimation with a Gaussian kernel. We estimate the marginal distributions of various gene variables with a 1-dimensional kernel. We use a GRN assumed to be known to create a model of gene regulation that includes TF binding dynamics. We associate binding sites with the genes that they regulate and use these associations to create a bipartite graph. The GRN and training/testing data are created from publicly available data. The epigenetic parameter is assumed to be measurable. The remaining parameters are estimated using a negative log-likelihood minimization procedure. We can compute a log-likelihood for a set of paired epigenetic and transcription samples by time averaging a sample path against a Gaussian kernel. We report on the design and evaluation of the model’s performance.
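
As a rough illustration of the PDMP idea in this abstract, the sketch below simulates a single gene with a single TF binding site. It is not the authors' model: every parameter name and value is hypothetical, and the choice to scale the binding propensity by `(1 - beta)` is an assumption made only so that methylation has some effect in the toy example.

```python
import math
import random


def simulate_pdmp(beta, tf=1.0, k_on=2.0, k_off=1.0, base=0.5, gain=2.0,
                  decay=1.0, t_end=200.0, seed=0):
    """Simulate one gene with one TF binding site.

    Returns (times, levels): the jump times and the transcript level at
    each jump. All parameters are illustrative assumptions.
    """
    rng = random.Random(seed)
    t, x, bound = 0.0, base / decay, False
    times, levels = [t], [x]
    while t < t_end:
        # Propensity of the next jump: unbinding if bound, binding otherwise.
        rate = k_off if bound else k_on * tf * (1.0 - beta)
        if rate <= 0.0:
            break  # a fully methylated site never binds in this toy model
        wait = rng.expovariate(rate)  # exponentially distributed waiting time
        # Exact solution of dx/dt = base + gain*bound - decay*x over `wait`.
        target = (base + gain * bound) / decay
        x = target + (x - target) * math.exp(-decay * wait)
        t += wait
        bound = not bound  # the binding state flips at the jump
        times.append(t)
        levels.append(x)
    return times, levels


times, levels = simulate_pdmp(beta=0.2)
```

Because the linear ODE is solved in closed form between jumps, no numerical integrator is needed. Note that the returned levels are recorded at jump times only; estimating an equilibrium distribution, as the abstract describes, would require sampling along the path in a time-weighted way before applying a kernel density estimate.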

115

Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work

Translating Big Data neuroimaging findings into measurements of individual vulnerability

Peter Kochunov1, Paul Thompson2, Neda Jahanshad2, Elliot Hong1

1University of Maryland School of Medicine, Maryland, USA; 2University of Southern California, California, USA

Peter Kochunov

We propose an intuitive, anatomically informed approach to derive an index of similarity between individual brain patterns and the expected patterns of neuropsychiatric disorders based on Big Data neuroimaging studies. Big Data neuroimaging studies, such as those performed by the Enhancing NeuroImaging Genetics through Meta-Analysis (ENIGMA) consortium, have provided the scientific community with the regional patterns of effect sizes in common neuropsychiatric disorders such as schizophrenia (SZ), bipolar and major depressive disorders (BP and MDD), epilepsy (EP), Alzheimer’s dementia (AD), mild cognitive impairment (MCI) and others. These patterns describe regional deficits using standardized sMRI, dMRI and rsfMRI workflows. They are derived from statistically powerful and inclusive samples and are highly reproducible (r=0.8-0.9) in independent samples. We developed the “Regional Vulnerability Index” (RVI) to measure similarity between an individual and the expected pattern of the patient-control differences. RVI can be calculated for a single imaging modality or across imaging modalities. A single-modality RVI, for example one that uses the Fractional Anisotropy (FA) measure from dMRI, is calculated as follows. FA for each of the 23 major white matter regions, as defined by the ENIGMA atlas, in an individual is converted to z-values by (A) calculating the residual values after regressing out age and sex effects for this region, (B) subtracting the average value for the region, and (C) dividing by the standard deviation calculated from the healthy controls. This produces a vector of 23 z-values (one per region) for each individual in the sample. RVI is calculated as the correlation coefficient between the 23 region-wise z-values for the subject and the patient-control effect sizes in ENIGMA. RVI takes values from 1 (individual pattern is aligned with the disorder pattern) to -1 (individual pattern is in anti-alignment). For cross-modality research, RVI can be expanded hierarchically by building a combined vector that includes multiple phenotypes. For example, the RVI-White Matter calculation uses a vector of 69 values that combines tract-wise FA, radial (RaD) and axial (AxD) diffusivity values per person. To merge effect sizes across diverse domains, we use a pseudo-ordinary transformation that maps effect sizes between 0 and 1 while preserving the relative distance between them. We first demonstrated that RVI-SZ values are significantly elevated in patients with SZ and are also predictive of treatment resistance. That is, subjects who developed resistance to modern antipsychotic medications had significantly higher RVI-SZ values than those who responded to treatment. We next demonstrated that RVI for SZ was significantly correlated with RVI for AD but not MCI, due to significant overlap in deficit patterns between these disorders. We next showed that calculating RVI across multiple modalities produces vulnerability measures that are more sensitive to patient-control differences in the independent datasets and showed stronger sensitivity to cognitive deficits and negative symptoms. The RVI calculator tools are distributed with the solar-eclipse software (www.solar-eclipse-genetics.org).
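
The single-modality calculation described in this abstract can be sketched in a few lines. The sketch assumes that step (A), the age and sex regression, has already been applied to the inputs, uses the Pearson correlation as the correlation coefficient, and works on four toy regions rather than 23; the function names and all numbers are hypothetical and are not taken from the solar-eclipse tools.

```python
from statistics import mean, stdev


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def regional_vulnerability_index(subject, controls, effect_sizes):
    """Correlate a subject's region-wise z-values with disorder effect sizes.

    subject: per-region values, assumed already residualized for age and sex.
    controls: list of such vectors from healthy controls.
    effect_sizes: per-region patient-control effect sizes.
    """
    z_values = []
    for region, value in enumerate(subject):
        reference = [c[region] for c in controls]
        # Steps (B) and (C): center and scale using the healthy controls.
        z_values.append((value - mean(reference)) / stdev(reference))
    return pearson(z_values, effect_sizes)


# Toy data: three controls, four regions, made-up effect sizes.
controls = [
    [0.9, 1.9, 2.9, 3.9],
    [1.0, 2.0, 3.0, 4.0],
    [1.1, 2.1, 3.1, 4.1],
]
effects = [-0.8, -0.2, -0.5, -0.1]
aligned_subject = [1.0 + 0.1 * e for e in [-0.8]] + [
    m + 0.1 * e for m, e in zip([2.0, 3.0, 4.0], effects[1:])
]
opposed_subject = [m - 0.1 * e for m, e in zip([1.0, 2.0, 3.0, 4.0], effects)]

rvi_aligned = regional_vulnerability_index(aligned_subject, controls, effects)
rvi_opposed = regional_vulnerability_index(opposed_subject, controls, effects)
```

A subject whose deviations from the controls mirror the effect-size pattern gets an RVI near 1, and one with the opposite pattern gets an RVI near -1, which matches the range stated in the abstract.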

116

Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work

Automating new-user cohort construction with indication embeddings

Rachel D. Melamed

Department of Computational Biomedicine and Biomedical Data, University of Chicago

Rachel Melamed

The electronic health record is a rising resource for quantifying medical practice and discovering adverse effects of drugs. One of the challenges of healthcare data is the high dimensionality of the health record. Any study of patterns in health data must account for tens of thousands of potentially relevant diagnoses or treatments. In this work, we develop indication embeddings, a way to reduce the dimensionality of health data while capturing the information relevant to treatment decisions. We demonstrate that these embeddings recover therapeutic uses of drugs. Then we use these embeddings as an informative representation of relationships between drugs, between health history events and drug prescriptions, and between patients at a particular time in their health history. We show the application of these embeddings in areas of current research. For drug safety studies, particularly retrospective cohort studies, our low-dimensional representation helps in finding comparator drugs and constructing comparator cohorts. This enables us to develop an automated approach to choose comparator cohorts for a treated population.

117

Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work

Reproducibility-optimized statistical testing for omics studies

Tomi Suomi, Laura Elo

Turku Bioscience Centre, University of Turku and Åbo Akademi University, Turku, Finland

Laura Elo

Differential expression analysis is one of the most common types of analyses performed on various biological and biomedical data, including, e.g., RNA sequencing and mass spectrometry proteomics. It is the process that detects features, such as genes or proteins, showing statistically significant differences between the sample groups under comparison. However, as different test statistics perform well in different datasets, the choice of an appropriate test statistic has remained a major challenge. To address the challenge, our reproducibility-optimized test statistic (ROTS) optimizes the statistic on the basis of the data by maximizing the reproducibility of the top-ranked features through a bootstrap procedure. Finally, it provides a ranking of the features according to their statistical evidence for differential expression between the sample groups. We have shown the robust performance of ROTS in a range of studies from transcriptomics to proteomics, covering both bulk and single-cell measurements. ROTS is freely available as an R package in Bioconductor.

118

Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work

Data Integration Expectation Maps: Towards more informed 'omic data integration

Tia Tate1, Christian Richardson2, ClarLynda Williams-DeVane3

1University of North Carolina-Charlotte, 2Duke University, 3Fisk University

ClarLynda Williams-DeVane

Innovative data technologies and decreasing costs have expanded the scope of available data relating to various diseases. A vast amount of -omics data generated at diverse levels (DNA, RNA, protein, metabolite and epigenetic) has revealed relationships of various biological processes. Generally, these diverse data types are considered independently while combinations of two or more data types are less explored. This narrow approach often fails to identify the intricate interactions responsible for the etiology of complex disease. Complete biological models of complex diseases are only likely to be discovered if the various levels of -omic mechanisms are considered from an integrative perspective. Integrative models often require the integration of biological, computational, mathematical, and statistical domains. However, a well-documented shortage of researchers with a command of multiple domains exists. Thus, we have proposed the use of Data Integration Expectation Maps (DIEMs) as visual tools for facilitating the understanding of integrating various -omic data types to understand complex diseases by filling in gaps in biological knowledge. DIEMs provide a user-friendly format for understanding integrative model development in complex diseases by 1) identifying data formats that can and/or have been integrated, 2) providing guidance on the best method to integrate the data, and 3) providing an expectation of biological insight to be gained from the integration.

119

PRECISION MEDICINE: ADDRESSING THE CHALLENGES OF SHARING, ANALYSIS, AND PRIVACY AT SCALE

POSTER PRESENTATIONS

120

Precision medicine: addressing the challenges of sharing, analysis, and privacy at scale

Integrated omics data mining of synergistic gene pairs for cancer precision medicine

Euna Jeong, Choa Park, Sukjoon Yoon

Sookmyung Women's University

Euna Jeong

Current high-throughput technologies enable simultaneous acquisition of multi-level omics and RNAi/chemical screening data in cancers. Production and integration of these data help identify associations of drug targets and synergistic biomarkers (mutations or gene expression), thus accelerating their clinical applications and patient stratification. We have extensively carried out cancer big data mining and phenotypic siRNA library screening to find the optimal combination of targets and biomarkers for advanced cancer therapies such as regulating cancer stem-like cells (CSLCs) and oncogenic transcription factors. Our multiplexed screening dissects phenotypic responses into sensitivity and resistance to the target knockdown. Combined with mutaome and transcriptome data of screened cell lines, targetome-wide knockdown data reveal the functional aspect of synergistic effects between target siRNAs and mutation/transcription signatures, leading to the discovery of novel synthetic lethal gene pairs. Production and integration of these data enabled us to identify target-biomarker combinations for accelerating their clinical applications and patient stratification.

121

Precision medicine: addressing the challenges of sharing, analysis, and privacy at scale

The power of dynamic social networks to predict individuals' mental health

Shikang Liu1, David Hachen1, Omar Lizardo2, Christian Poellabauer1, Aaron Striegel1, Tijana Milenkovic1

1University of Notre Dame, 2University of California at Los Angeles

Shikang Liu

Precision medicine has received attention both in and outside the clinic. We focus on the latter, by exploiting the relationship between individuals' social interactions and their mental health to predict one's likelihood of being depressed or anxious from rich dynamic social network data. Existing studies differ from our work in at least one aspect: they do not model social interaction data as a network; they do so but analyze static network data; they examine "correlation" between social networks and health but without making any predictions; or they study other individual traits but not mental health. In a comprehensive evaluation, we show that our predictive model that uses dynamic social network data is superior to its static network as well as non-network equivalents when run on the same data.

122

Precision medicine: addressing the challenges of sharing, analysis, and privacy at scale

Robust-ODAL: Learning from heterogeneous health systems without sharing patient-level data

Jiayi Tong1, Rui Duan1, Ruowang Li1, Martijn J. Scheuemie2, Jason H. Moore1, Yong Chen1

1University of Pennsylvania, 2Janssen Research and Development LLC

Jiayi Tong

Electronic Health Records (EHR) contain extensive patient data on various health outcomes and risk predictors, providing an efficient and wide-reaching source for health research. Integrated EHR data can provide a larger sample size of the population to improve estimation and prediction accuracy. To overcome the obstacle of sharing patient-level data, distributed algorithms were developed to conduct statistical analyses across multiple clinical sites through sharing only aggregated information. However, the heterogeneity of data across sites is often ignored by existing distributed algorithms, which leads to substantial bias when studying the association between the outcomes and exposures. In this study, we propose a privacy-preserving and communication-efficient distributed algorithm which accounts for the heterogeneity caused by a small number of the clinical sites. We evaluated our algorithm through a systematic simulation study motivated by real-world scenarios and applied our algorithm to multiple claims datasets from the Observational Health Data Sciences and Informatics (OHDSI) network. The results showed that the proposed method performed better than the existing distributed algorithm ODAL and a meta-analysis method.

123

Precision medicine: addressing the challenges of sharing, analysis, and privacy at scale

PharmGKB: Automated Literature Annotations

Michelle Whirl-Carrillo1, Li Gong1, Rachel Huddart1, Katrin Sangkuhl1, Ryan Whaley1, Mark Woon1, Julia M. Barbarino2, Jake Lever3, Russ B. Altman4, Teri E. Klein5

1Department of Biomedical Data Science, Stanford University; 2Formerly Department of Biomedical Data Science, Stanford University; 3Department of Bioengineering, Stanford University; 4Departments of Bioengineering, Medicine and Genetics, Stanford University; 5Departments of Biomedical Data Science and Medicine, Stanford University

Michelle Whirl-Carrillo

PharmGKB is the largest publicly available resource for pharmacogenomics (PGx) discovery and implementation. Its mission is to collect, curate, integrate and disseminate knowledge about how human genetic variation influences drug response. PharmGKB scientists manually curate the primary literature to capture details of published pharmacogenomic studies such as variant-gene-drug-phenotype associations, statistical significance, study size and population characteristics. PharmGKB refers to these manually created annotations as “Variant Annotations.”

124

PACKAGING BIOCOMPUTING SOFTWARE TO MAXIMIZE DISTRIBUTION AND REUSE

WORKSHOP POSTER PRESENTATIONS

125

Workshop: Packaging Biocomputing Software to Maximize Distribution and Reuse

Apollo provides Collaborative Genome Annotation Editing with the power of JBrowse

Nathan Dunn1, Colin Diesh2, Robert Buels2, Helena Rasche3, Anthony Bretaudeau4, Nomi Harris1, Ian Holmes2

1Lawrence Berkeley National Lab, 2UC Berkeley, 3University of Freiburg, 4INRA

Nathan Dunn

Genome annotation projects involve multi-step workflows that are largely automated. However, even with a fully automated annotation pipeline, visual inspection and refinement of diverse types of information, such as genomic and transcriptome alignments and predictive models based on sequence elements, are critical to assure and improve the accuracy of the genome annotations prior to publication. To this end, Apollo (https://github.com/GMOD/Apollo/) is a web application that provides responsive and customizable visualization and editing of genomic elements. Built on top of the JBrowse genome browser (http://jbrowse.org/) and its large registry of plugins (https://gmod.github.io/jbrowse-registry/), Apollo supports efficient annotation curation through drag-and-drop editing, a large suite of automated structural edit operations, the ability to pre-define curator comments and annotation status to maintain consistency, attribution of annotation authors, fine-grained user and group access and edit permissions, and a visual history of revertible annotation edits. Setting up a new genome annotation in Apollo is straightforward. Apollo can be run from Docker or from provided AWS instances, and genomes with feature evidence can be retrieved from an existing JBrowse directory. We have also recently enabled researchers to upload their genome sequence and features (in FASTA, VCF, BAM, or GFF3 format) directly to Apollo, minimizing the need for scripting or server access. It is also possible to create annotations on the fly from BLAT or BLAST search results, which provides a way to initiate a gene previously annotated on a closely related species. Apollo provides a Python library that wraps the web services (https://github.com/galaxy-genome-annotation/python-apollo) so that workflow environments such as Galaxy can be automated and the output of an automated workflow can directly create genome projects, provide evidence, and manage access to an Apollo instance. Apollo supports several popular formats for data export. Structural genome annotations can be exported as FASTA, GFF3, or VCF (if annotating variants) along with any associated metadata. Functional annotations mapped to Gene Ontology terms can be exported in GPAD2 or GPI2 format. Apollo is an open-source tool used in over one hundred genome annotation projects around the world, ranging from the annotation of a single species to lineage-specific efforts supporting the annotation of dozens of genomes. https://github.com/GMOD/Apollo/ https://genomearchitect.readthedocs.io/

126

Workshop: Packaging Biocomputing Software to Maximize Distribution and Reuse

g:Profiler - One functional enrichment analysis tool, many interfaces serving life science communities

Liis Kolberg, Uku Raudvere, Ivan Kuzmin, Jaak Vilo, Hedi Peterson

University of Tartu

Making sense of gene lists plays an important role in the majority of biological and biomedical experiments. There are several methods and tools that help scientists carry the computational load of these tasks. One such tool is g:Profiler (https://biit.cs.ut.ee/gprofiler), a widely used toolset for functional interpretation and conversion of gene lists from hundreds of species. g:Profiler has served the community since 2007 and continues to provide life scientists with the most up-to-date data and methods to this day. Keeping the service trustworthy and the results reproducible and transparent has been the main goal of the team developing g:Profiler. The success to this end is indicated by the increasing number of user requests per year, which in 2019 alone is already close to 9 million queries. These millions of queries originating across the world reflect the diversity of usage preferences, skill sets and research goals of the scientific community. We, as the developers of g:Profiler, have taken this into account by developing and supporting different access options, which, in hindsight, has been a huge factor in the increasing user traffic. On the one hand, the g:Profiler web application provides researchers who want quick and easily interpretable results with nice visualizations, searchable tables and data export possibilities. On the other hand, there is a large bioinformatics community whose members prefer to analyze gene lists in an automated manner. We support them by offering standardized access through public APIs. And, as R and Python are the most popular programming languages among life scientists with informatics expertise, we have simplified the usage of the APIs by wrapping them into corresponding packages named gprofiler2 and gprofiler-official, respectively.
For the users somewhere in between, g:Profiler is also available from the Galaxy platform, which is a popular framework for data-intensive biomedical research pipelines run in a graphical user interface. It is clear that the tools in such an interdisciplinary field need to be flexible in order to fully benefit the research community. However, in our experience, the complexity of providing a widely distributed toolset lies in the maintenance of the services rather than in the development, and this is the core reason for the deprecation of tools. In g:Profiler the separate interfaces all use the data and methods from a shared hub, making them reliable and consistent with each other even after the frequent data updates. We are positive that g:Profiler has been able to help thousands of researchers across the life science community because our priorities have been to reuse high-quality and regularly updated data, and to maximize the access options so that we would not leave any life science subcommunity behind.

127

Workshop: Packaging Biocomputing Software to Maximize Distribution and Reuse

Increasing usability and dissemination of the PathFX algorithm using web applications and docker systems

Jennifer Wilson1, Nicholas Stepanov2, Ajinkya Chalke2, Mike Wong3, Dragutin Petkovic2, Russ B. Altman4

1Department of Chemical & Systems Biology at Stanford University; 2Computer Science Dept at San Francisco State University; 3COSE Computing for Life Sciences at San Francisco State University; 4Helix Group at Stanford University

Mike Wong

Limited efficacy and unacceptable safety confound therapeutic development. Identifying potential liabilities earlier in drug development could significantly improve success rates. Recently, in collaboration with the US FDA, we developed the PathFX algorithm and the openly available PathFX web application for better understanding pathway-level safety and efficacy phenotypes associated with a drug’s target(s). Running the PathFX algorithm locally would enable improved efficiency, security, and privacy; however, installation of PathFX and its dependencies is challenging for non-computational scientists and prevents dissemination. In addition, while PathFX-web quickly analyzes network associations, the phenotype clustering feature has high computational costs that limit the efficiency of the shared cloud server. To resolve these challenges, we developed the PathFX-web Docker container, which provides an easy-to-install, easy-to-use web interface and a standalone command-line formulation of PathFX, adds security/privacy, and allows leveraging of the computational power of the user’s hardware.

128

TRANSLATIONAL BIOINFORMATICS WORKSHOP: BIOBANKS IN THE PRECISION MEDICINE ERA

WORKSHOP POSTER PRESENTATIONS

129

Workshop: TBI workshop

Identification of biomarkers related to autism spectrum disorder using genomic information

Leena Sait, Martha Gizaw, Iosif Vaisman

School of Systems Biology, George Mason University

Leena Sait

Autism spectrum disorder (ASD) is one of the most common neurodevelopmental disorders. Worldwide, ASD tends to have a prevalence of one per 132 persons, with an estimated prevalence of 1 in 59 children, according to CDC’s Autism and Developmental Disabilities Monitoring Network. To date, no effective medical treatments for the core symptoms of ASD exist. However, biomarkers capable of detecting and diagnosing ASD can help to translate experimental research results to bedside clinical practices. Biomarker discovery in ASD is complicated by the diversity of core symptoms, which comprise deficits in social communication, presence of rigid, repetitive and stereotypical behaviors, and comorbid medical (e.g., epilepsy) or psychiatric symptoms. The EU-AIMS Longitudinal European Autism Project (LEAP), the largest consortium, has made a great advancement in the discovery of biomarkers for ASD. It seeks to identify stratification biomarkers using neurobiological or neurocognitive measures, neuroimaging, electrophysiology, biochemistry and genetics. This work is aimed at the identification of single nucleotide polymorphisms (SNPs) based on SNP genotyping in genomic DNA in a large cohort of ASD patients and unaffected related individuals to help understand the exact genetic causes of ASD. We hypothesized that ranking the genes based on distance in the space of the allele frequencies between affected and unaffected populations can be used to identify new putative biomarkers. The dataset retrieved from the Gene Expression Omnibus database (GSE6754) contains more than 6000 samples from 1,400 families. Our results show that the SNPs that are highly ranked by the distance in three-dimensional genotype count space between all the affected and unaffected subjects in the cohort are more likely to be linked to ASD. These results can open new possibilities for further investigation in identifying the genetic mechanisms of ASD.
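
The ranking idea in this abstract can be sketched as follows. The sketch assumes a Euclidean distance between (AA, AB, BB) genotype counts and converts counts to within-group proportions so that groups of different size are comparable; the abstract specifies neither the metric nor any normalization, and the SNP labels and counts are made up.

```python
import math


def genotype_distance(affected, unaffected):
    """Euclidean distance between two (AA, AB, BB) genotype count triples,
    computed on within-group proportions (an assumed normalization)."""
    def proportions(counts):
        total = sum(counts)
        return [c / total for c in counts]

    return math.dist(proportions(affected), proportions(unaffected))


def rank_snps(table):
    """Return SNP labels sorted from largest to smallest distance.

    table: dict mapping a SNP label to (affected_counts, unaffected_counts).
    """
    return sorted(
        table,
        key=lambda snp: genotype_distance(*table[snp]),
        reverse=True,
    )


# Toy data with hypothetical SNP labels and genotype counts.
toy = {
    "snp_a": ((50, 30, 20), (50, 30, 20)),
    "snp_b": ((70, 20, 10), (40, 40, 20)),
    "snp_c": ((45, 35, 20), (50, 30, 20)),
}
ranking = rank_snps(toy)
```

SNPs whose genotype distribution differs most between the two groups rise to the top of the ranking. In a family-based cohort such as the one described, relatedness between subjects would also need to be handled, which this toy example ignores.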

130

Workshop: TBI workshop

A pan-cancer 3-gene signature to predict dormancy

Ivy Tran1, Anchal Sharma2, Subhajyoti De2

1Rutgers University-Camden, 2Rutgers Cancer Institute of New Jersey

Ivy Tran

Tumor dormancy is characterized by the dissemination of hibernating tumor cells that do not proliferate until years after apparently successful removal of patients’ primary cancer, resulting in the late relapse of the cancer. Distinguishing between the risk of early (≤8 months) and late (≥5 years) relapse in cancer patients is important for the targeted treatment of the tumor. In this study, we identified 53 genes that were significantly up-regulated or down-regulated in dormant cells, from which three genes, CD300LG, OCIAD2, and VSIG4, were determined by recursive feature elimination to be the most important features in predicting tumor dormancy. Using this three-gene signature, we trained a Random Forest algorithm on a cross-validated (10-fold, repeated 3 times) dataset (n=422) randomly subsetted into training data (75%) and test data (25%), consisting of seven different tumor types: testicular cancer, breast cancer, glioblastoma multiforme, lung cancer, colorectal cancer, kidney cancer and melanoma. The tuned prediction model yielded 80.19% prediction accuracy using confusion matrix analysis, and 82.74% prediction accuracy when using AUC of a ROC curve as the accuracy metric. When independently testing the model on a validation set (n=44) of liver cancer downloaded from ICGC, confusion matrix analysis yielded a 67.44% accuracy and AUC of a ROC curve yielded a 60.48% accuracy. This identified 3-gene signature can be useful in predicting early or late relapse of cancer in patients in clinical practice.

131

AUTHOR INDEX

’t Jong, Geert · 85

A

Abyzov, Alexej · 111
Adkins, Joshua · 96
Agrawal, Monica · 3
Alavi, Ali · 99
Allen, Mary A. · 28
Alterovitz, Wei-Lun · 47
Althagafi, Azza · 70
Altman, Russ B. · 27, 34, 37, 123, 127
Anastopoulos, Ioannis N. · 23
Andrade-Navarro, Miguel A. · 21
Andrianova, Katia · 74
Arslanturk, Suzan · 31
Atwal, Gurnit · 50

B

Bae, Ho · 63
Bae, Taejeong · 111
Baladandayuthapani, Veerabhadran · 65
Baltrus, David · 96
Barash, Yoseph · 92
Barbarino, Julia M. · 34, 123
Barik, Amita · 10, 108
Barlaskar, Sabiha · 99
Barnard, Martha · 39
Beam, Andrew L. · 19
Bebek, Gurkan · 75
Belyeu, Jon · 83
Benchek, Penelope · 60
Berger, Howard · 85
Bhattacharyya, Rupam · 65
Blinder, Pablo · 22
Bobak, Carly A. · 20, 76, 93
Bock, Christoph · 91
Bourque, Guillaume · 25
Branch, Andrea · 7
Brand, Lodewijk · 2
Brannon, Charlotte · 77
Bretaudeau, Anthony · 125
Brinton, Connor · 27
Brodie, Sonia · 85
Brooks, Thomas G. · 79, 92
Brown, James · 85
Brown, Joe · 83
Brown, Yaadira · 80
Brunak, Søren · 113
Brunner, James · 114
Buels, Robert · 125
Bui, Nam · 4
Bull, Shelley B. · 105
Burkhardt, Sophie · 21
Bush, William S. · 60, 64
Bustamante, Carlos D. · 42

C

Cai, Chunhui · 6
Cai, DanDan · 99
Cai, Tianxi · 19
Calvanese, Vincenzo · 95
Candido, Elisa · 44
Capellera-Garcia, Sandra · 95
Carleton, Bruce C. · 78, 85
Carrillo, Katherine I. · 81
Ceri, Stefano · 49
Chalke, Ajinkya · 127
Chaudhry, Shahnaz · 85
Chen, Irene Y. · 3
Chen, Jessica W. · 4
Chen, Jianhan · 12
Chen, Jun · 70
Chen, Yang · 45
Chen, Yong · 38, 122
Cheong, Hee Jin · 102
Cheong, Jae-Ho · 56
Cherng, Sarah T. · 53
Chia, Nicholas · 71
Chmura, Jacob · 50
Choi, Hyun-Soo · 63
Chrisman, Brianna · 68, 103
Christensen, Brock C. · 54
Christensen, Sarah · 15
Chu, Chong · 82
Cohen, William W. · 6
Coker, Beau · 52
Cooke Bailey, Jessica N. · 64
Cormier, Michael · 83
Cornwell III, Edward E. · 80
Costa, Helio A. · 4, 42
Crawford, Dana C. · 64
Crowell, Andrea · 5
Cui, Tianyi · 8

D

Dale, Ryan · 88
Danieletto, Matteo · 53
De, Subhajyoti · 130
Derry, Alexander · 27
Diesh, Colin · 125
Ding, Yali · 84

132

Dovat, Sinisa · 84
Dowell, Robin D. · 28
Draghici, Sorin · 31
Drögemöller, Britt I. · 78, 85
Duan, Rui · 38, 122
Duchen, Raquel · 44
Dudley, Joel T. · 7, 53
Dunker, A. Keith · 47
Dunlap, Kaiti · 103
Dunlap, Kaitlyn · 68
Dunn, Nathan · 125
Durmaz, Arda · 75

E

Ekstrand, Sophia · 95
El-Kebir, Mohammed · 15
Elo, Laura · 117

F

Faraggi, Eshel · 47
Farlik, Matthias · 91
Feng, Song · 96
Feng, Yunyi · 43
FitzGerald, Garret A. · 79, 92
Fondran, Jeremy R. · 60
Fortelny, Matthias · 91
Fried, Inbar · 19
Friedl, Verena · 23

G

Gao, Jean · 59
Garmire, Lana · 65
Gerstein, Mark · 77
Ghadermarzi, Sina · 10, 108
Ghayoori, Sholeh · 85
Ghosh, Sayan · 96
Gilchrist, Alison R. · 28
Gizaw, Martha · 129
Glodde, Josua · 21
Golgher, Lior · 22
Gong, Li · 34, 123
Gorjifard, Sayeh · 88
Grant, Gregory R. · 79, 92
Groeneweg, Gabriella S. S. · 85
Gumerov, Vadim M. · 86
Guo, Margaret · 27, 37
Guo, Yicheng · 106
Gur, Shir · 22
Gursoy, Gamze · 77

H

Ha, Min Jin · 65
Haan, David · 23
Haber, Nick · 68, 103
Hachen, David · 35, 121
Haines, Jonathan L. · 60
Halappanavar, Mahantesh · 96
Hall, Molly A. · 66
Hamilton-Nelson, Kara L. · 60
Hao, Jie · 24
Harati, Sahar · 5
Harrigan, Caitlin F. · 16, 87
Harris, Nomi · 125
Hauskrecht, Milos · 8
He, Xi · 66
Hernandez-Ferrer, Carles · 32
Higginson, Michelle · 85
Hill, Jane E. · 20, 76, 93
Hocking, Toby Dylan · 25
Hoehndorf, Robert · 70, 90
Hogenesch, John B. · 92
Hogue, Christopher W. V. · 11
Holmes, Ian · 125
Hong, Elliot · 115
Horng, Steven · 3
Huang, Fei · 47
Huang, Heng · 2
Huang, Kun · 58
Huddart, Rachel · 34, 123
Hughitt, V. Keith · 88
Husic, Arman · 103
Hwang, Tae Hyun · 56

I

Ito, Shinya · 78, 85

J

Jaakkimainen, Liisa · 44
Jagannathan, N. Suhas · 11
Jahanshad, Neda · 115
Jang, Hyun-Jong · 94
Jenkins, Willysha · 89
Jeong, Euna · 120
Jørgensen, Isabella Friis · 113
Jouline, Igor · 74
Jun, Tomi · 7
Jung, Dahuin · 63
Jung, Jae-Yoon · 103


K

Kafkas, Senay · 90
Kalantari, John · 71
Kalantarian, Haik · 68
Kalyva, Maria · 111
Kang, Mingon · 24
Kar, Nabhonil · 56
Karjalainen, Anzhelika · 91
Katuwawala, Akila · 10, 108
Keats, Jonathan J. · 88
Kelly, Libusha · 41
Kent, Jack · 103
Khan, Ariful · 96
Khan, Saad · 41
Kim, Jacob · 114
Kim, Woo Joo · 102
Kinzy, Tyler · 64
Kleber, Marcus E. · 66
Klein, Teri E. · 34, 81, 123
Kline, Aaron · 68, 103
Kloczkowski, Andrzej · 47
Kober, Kord M. · 114
Kocher, Jean-Pierre · 33
Kochunov, Peter · 115
Koestler, Devin C. · 55
Kohane, Isaac S. · 19
Kolberg, Liis · 126
Kompa, Benjamin · 19, 52
Kong, Sek Won · 32
Koohi-Moghadam, Mohamad · 72
Kosaraju, Sai Chandra · 24
Koster, Johannes · 83
Kramer, Stefan · 21
Kriwacki, Richard W. · 13
Krunic, Milica · 91
Kunder, Christian A. · 4
Kunkle, Brian W. · 60
Kurgan, Lukasz · 10, 108
Kuzmin, Ivan · 126

L

Lahens, Nicholas F. · 79, 92
Larson, Melissa C. · 33
Larson, Nicholas B. · 33
Lassnig, Caroline · 91
Lawrence, Cris W. · 79, 92
LeBlanc, Emilie · 103
Lee, E. Alice · 82
Lee, Han Sol · 102
Lee, Hao-Chih · 53
Lee, Joon-Yong · 96
Lee, Rena · 45
Lee, Soohyun · 82
Lee, Sung Hak · 94
Leiserson, Mark D.M. · 15, 17
Lever, Jake · 34, 123
Levy, Joshua J. · 54
Li, Hongyan · 72
Li, Li · 7
Li, Ruowang · 38, 122
Lichtarge, Olivier · 26
Lim, Sooyeon · 102
Lin, Deborah · 43
Lin, John · 64
Lin, Justin · 20, 93
Lin, Simon · 43
Liu, Chang · 43
Liu, Qingzhi · 65
Liu, Shikang · 35, 121
Liu, Xiaorong · 12
Liu, Zhandong · 100
Lizardo, Omar · 35, 121
Lowry, William E. · 100
Lu, Xinghua · 6
Luthria, Gaurav · 36
Lv, Tianling · 45

M

Ma, Feiyang · 95
Machiraju, Raghu · 58
Macho-Maschler, Sabine · 91
Maerz, Winfried · 66
Magee, Laura A. · 85
Mallick, Parag · 58
Manlove, Logan · 111
Manrai, Arjun K. · 101
Mariani, Jessica · 111
Mayberg, Helen · 5
McDermott, Jason · 96
McDonnell, Lauren · 20, 93
Meier, Richard · 55
Melamed, Rachel D. · 116
Melas-Kyriazi, Luke · 101
Meng, Jingwei · 47
Miao, Fudan · 85
Michalowski, Aleksandra M. · 88
Mikkola, Hanna K.A. · 95
Milenkovic, Tijana · 35, 121
Miotto, Riccardo · 53
Mitrea, Diana M. · 13
Mock, Beverly A. · 88
Monroy, Rebeca · 82
Moore, Jason H. · 38, 122
Moosavinasab, Soheil · 43
Morris, Quaid · 16, 44, 50, 87
Mueller, Mathias · 91
Mueller-Myhsok, Bertram · 66

N

Na, Jie · 33
Nakatochi, Masahiro · 97
Nayak, Soumyashant · 79, 92
Nelson, Heidi · 71


Nelson, William · 96
Nemati, Shamim · 5
Nemesure, Matthew · 93
Nemesure, Matthew D. · 20
Neums, Lisa · 55
Nguyen, Justine · 96
Nguyen, Tin · 31
Nichols, Kai · 2
Nie, Allen · 42
Ning, Michael · 103
Nishi, Hafumi · 106
Noh, Ji Yun · 102

O

O'Toole, John F. · 64
Oh, Ji Eun · 104
Ola, Mojoyinola Joanna · 91
Oldfield, Christopher J. · 10, 47, 108
Olufajo, Olubode A. · 80
Orloff, Mohammed · 75
Ostojic, Annie · 98

P

Palmer, Nathan · 19
Park, Choa · 120
Park, Gunseok · 104
Park, Peter J. · 82
Park, Sunho · 56
Park, Yoonsik · 50
Parmigiani, Giovanni · 57
Paskov, Kelley Marie · 68, 103
Passero, Kristin · 66
Patel, Chirag J. · 101
Patel, Ronak Y. · 42
Patil, Prasad · 57
Patnaik, Ritik · 68
Payne, Jonathon L. · 84
Pedersen, Brent · 83
Pellegrini, Matteo · 95
Penev, Yordan · 103
Pershad, Yash · 37
Perumalswami, Ponni · 7
Peterson, Hedi · 126
Petkovic, Dragutin · 99, 127
Pham, Minh · 26
Pietras, Christopher Michael · 67
Pineda, Arturo L. · 42
Pinoli, Pietro · 49
Piro, Rosario · 49
Poellabauer, Christian · 35, 121
Poelzl, Andrea · 91
Pohodich, Amy E. · 100
Polley, Eric C. · 88
Power, Liam · 67
Proukakis, Christos · 111
Pruneda, Jonathan · 96
Przytycka, Teresa M. · 17

Q

Quinlan, Aaron R. · 83

R

Raman, Ayush T. · 100
Ramchandran, Maya · 57
Ramsey, Stephen A. · 61
Rasche, Helena · 125
Rassekh, Shahrad · 78
Rassekh, Shahrad R. · 85
Raudvere, Uku · 126
Reimand, Jüri · 110
Richardson, Christian · 89, 118
Romero, Pedro · 47
Ross, Colin J.D. · 78, 85
Rowsey, Ross · 33
Rubanova, Yulia · 16, 87
Ryder, Nathan · 39

S

Sagers, Luke · 101
Sait, Leena · 129
Salas, Lucas A. · 54
Sanatani, Shubhayan · 85
Sangkuhl, Katrin · 34, 123
Sarantopoulou, Dimitra · 79, 92
Sawyer, Sara L. · 28
Scheuemie, Martijn J. · 38, 122
Schmaltz, Allen · 19
Schug, Jonathan · 92
Schwartz, Jessey · 68
Sedlazeck, Fritz · 111
Sedor, John R. · 64
Sekar, Shobana · 111
Selega, Alina · 16, 87
Sharan, Roded · 17
Sharma, Anchal · 130
Sharpnack, Michael · 58
Shaw, Kaitlyn · 85
Shen, Li · 2
Shi, Xu · 19
Shoebridge, Stephen · 91
Siekiera, Julia · 21
Simmons, John K. · 88
Skander, Dannielle · 75
Slonim, Donna K. · 67
Somjee, Ramiz · 13
Song, Dae Hyun · 24
Song, Joon Young · 102
Sontag, David · 3
Sørensen, Søren Schwartz · 113
Sosa, Carlos P. · 33


Sosa, Daniel N. · 27
Southerland, William · 80
Sriharan, Aravindhan · 54
Srinivasan, Anand · 92
Srivastava, Arunima · 58
Stabell, Alex C. · 28
Stamoulakatou, Eirini · 49
Stanley, Jacob T. · 28
Staub, Michelle · 85
Stehr, Henning · 4
Stepanov, Nicholas · 127
Stockham, Nathaniel · 68, 103
Striegel, Aaron · 35, 121
Strobl, Birgit · 91
Stuart, Joshua M. · 23
Sun, Hongzhe · 72
Sun, Min Woo · 103
Suomi, Tomi · 117

T

Tao, Ruikang · 23
Tao, Yifeng · 6
Tariq, Qandeel · 68
Tate, Tia · 118
Thompson, Jeffrey A. · 55
Thompson, Paul · 115
Tintle, Nathan · 39
Tomasini, Livia · 111
Tong, Jiayi · 38, 122
Tran, Ivy · 130
Tran, Nhat · 59
Trueman, Jessica · 85
Tsaku, Nelson Zange · 24
Tucker-Kellogg, Lisa · 11

U

Urban, Alexander E. · 111
Uversky, Vladimir N. · 47

V

Vaccarino, Flora M. · 111
Vaickus, Louis J. · 54
Vaisman, Iosif · 129
Vandromme, Maxence · 7
Varma, Maya · 68, 103
Vilo, Jaak · 126
Voss, Catalin · 68, 103

W

Wagner, Sarah · 77
Wall, Dennis P. · 68, 103
Wan, Ying-Wooi · 100
Wand, Hannah · 42
Wang, Chen · 33
Wang, Gao · 29
Wang, Haibo · 72
Wang, Hua · 2
Wang, Junwen · 72
Wang, Qingbo · 36
Wang, Yuchuan · 72
Wang, Yue · 29
Wang, Yunlong · 29
Warfe, Mike · 60
Washington, Peter · 68, 103
Weber, Griffin · 19
Wei, Eric · 27
Weinstein, Alana S. · 23
West, Nicholas · 85
Westra, Jason · 39
Whaley, Ryan · 34, 123
Wheeler, Nicholas R. · 60
Whirl-Carrillo, Michelle · 34, 123
Whyte, Simon D. · 85
Williams-DeVane, ClarLynda · 89, 118
Wilson, Jennifer · 127
Wilton, Andrew S. · 44
Wodchis, Walter · 44
Wojtowicz, Damian · 17
Wolf, Jack · 39
Wolf, Lior · 22
Wong, Christopher K. · 23
Wong, Mike · 127
Woon, Mark · 34, 123
Wright, Galen E.B. · 78, 85
Wright, Matt W. · 42
Wu, Tong · 29
Wulf, Bryan · 42

X

Xia, Xueting · 39
Xing, Lei · 45
Xue, Bin · 47

Y

Yalamanchili, Hari Krishna · 100
Yang, Hee Chul · 104
Yang, Jizhou · 99
Yang, Xinming · 72
Yao, Yao · 61
Yavartanu, Fatemeh · 105
Yoo, Yun Joo · 105
Yoon, Sukjoon · 120
Yoon, Sungroh · 63
Young, Adamo · 50
Yu, Ke · 8
Yue, Feng · 84


Z

Zehnder, James L. · 4
Zeng, Xianlong · 43
Zhang, Bo · 84
Zhang, Haoran · 44
Zhang, Lin · 106
Zhang, Mingda · 8
Zhao, Wei · 45
Zhou, Bo · 111
Zhou, Jiayan · 66
Zhulin, Igor B. · 86
Zoghbi, Huda Y. · 100
Zou, James · 42