PACIFIC SYMPOSIUM ON BIOCOMPUTING 2020
ABSTRACT BOOK
Poster Presenters: Poster space is assigned by abstract page number. Please find the page that your abstract is on and put your poster on the poster board with the corresponding number (e.g., if your abstract is on page 50, put your poster on board #50).
Proceedings papers with oral presentations (#2-39) are not assigned poster space.
Abstracts are organized first by session, then by the last name of the first author. Presenting authors' names are underlined in the Table of Contents and in bold text on the abstracts.
PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS
ARTIFICIAL INTELLIGENCE FOR ENHANCING CLINICAL MEDICINE ..... 1
PREDICTING LONGITUDINAL OUTCOMES OF ALZHEIMER'S DISEASE VIA A TENSOR-BASED JOINT CLASSIFICATION AND REGRESSION MODEL ..... 2
Lodewijk Brand, Kai Nichols, Hua Wang, Heng Huang, Li Shen, for the ADNI
ROBUSTLY EXTRACTING MEDICAL KNOWLEDGE FROM EHRS: A CASE STUDY OF LEARNING A HEALTH KNOWLEDGE GRAPH ..... 3
Irene Y. Chen, Monica Agrawal, Steven Horng, David Sontag
INCREASING CLINICAL TRIAL ACCRUAL VIA AUTOMATED MATCHING OF BIOMARKER CRITERIA ..... 4
Jessica W. Chen, Christian A. Kunder, Nam Bui, James L. Zehnder, Helio A. Costa, Henning Stehr
ADDRESSING THE CREDIT ASSIGNMENT PROBLEM IN TREATMENT OUTCOME PREDICTION USING TEMPORAL DIFFERENCE LEARNING ..... 5
Sahar Harati, Andrea Crowell, Helen Mayberg, Shamim Nemati
FROM GENOME TO PHENOME: PREDICTING MULTIPLE CANCER PHENOTYPES BASED ON SOMATIC GENOMIC ALTERATIONS VIA THE GENOMIC IMPACT TRANSFORMER ..... 6
Yifeng Tao, Chunhui Cai, William W. Cohen, Xinghua Lu
AUTOMATED PHENOTYPING OF PATIENTS WITH NON-ALCOHOLIC FATTY LIVER DISEASE REVEALS CLINICALLY RELEVANT DISEASE SUBTYPES ..... 7
Maxence Vandromme, Tomi Jun, Ponni Perumalswami, Joel T. Dudley, Andrea Branch, Li Li
MONITORING ICU MORTALITY RISK WITH A LONG SHORT-TERM MEMORY RECURRENT NEURAL NETWORK ..... 8
Ke Yu, Mingda Zhang, Tianyi Cui, Milos Hauskrecht
INTRINSICALLY DISORDERED PROTEINS (IDPS) AND THEIR FUNCTIONS ..... 9
DISORDERED FUNCTION CONJUNCTION: ON THE IN-SILICO FUNCTION ANNOTATION OF INTRINSICALLY DISORDERED REGIONS ..... 10
Sina Ghadermarzi, Akila Katuwawala, Christopher J. Oldfield, Amita Barik, Lukasz Kurgan
DE NOVO ENSEMBLE MODELING SUGGESTS THAT AP2-BINDING TO DISORDERED REGIONS CAN INCREASE STERIC VOLUME OF EPSIN BUT NOT EPS15 ..... 11
N. Suhas Jagannathan, Christopher W. V. Hogue, Lisa Tucker-Kellogg
MODULATION OF P53 TRANSACTIVATION DOMAIN CONFORMATIONS BY LIGAND BINDING AND CANCER-ASSOCIATED MUTATIONS ..... 12
Xiaorong Liu, Jianhan Chen
EXPLORING RELATIONSHIPS BETWEEN THE DENSITY OF CHARGED TRACTS WITHIN DISORDERED REGIONS AND PHASE SEPARATION ..... 13
Ramiz Somjee, Diana M. Mitrea, Richard W. Kriwacki
MUTATIONAL SIGNATURES ..... 14
PHYSIGS: PHYLOGENETIC INFERENCE OF MUTATIONAL SIGNATURE DYNAMICS ..... 15
Sarah Christensen, Mark D. M. Leiserson, Mohammed El-Kebir
TRACKSIGFREQ: SUBCLONAL RECONSTRUCTIONS BASED ON MUTATION SIGNATURES AND ALLELE FREQUENCIES ..... 16
Caitlin F. Harrigan, Yulia Rubanova, Quaid Morris, Alina Selega
DNA REPAIR FOOTPRINT UNCOVERS CONTRIBUTION OF DNA REPAIR MECHANISM TO MUTATIONAL SIGNATURES ..... 17
Damian Wojtowicz, Mark D. M. Leiserson, Roded Sharan, Teresa M. Przytycka
PATTERN RECOGNITION IN BIOMEDICAL DATA: CHALLENGES IN PUTTING BIG DATA TO WORK ..... 18
CLINICAL CONCEPT EMBEDDINGS LEARNED FROM MASSIVE SOURCES OF MULTIMODAL MEDICAL DATA ..... 19
Andrew L. Beam, Benjamin Kompa, Allen Schmaltz, Inbar Fried, Griffin Weber, Nathan Palmer, Xu Shi, Tianxi Cai, Isaac S. Kohane
ASSESSMENT OF IMPUTATION METHODS FOR MISSING GENE EXPRESSION DATA IN META-ANALYSIS OF DISTINCT COHORTS OF TUBERCULOSIS PATIENTS ..... 20
Carly A. Bobak, Lauren McDonnell, Matthew D. Nemesure, Justin Lin, Jane E. Hill
TOWARDS IDENTIFYING DRUG SIDE EFFECTS FROM SOCIAL MEDIA USING ACTIVE LEARNING AND CROWD SOURCING ..... 21
Sophie Burkhardt, Julia Siekiera, Josua Glodde, Miguel A. Andrade-Navarro, Stefan Kramer
MICROVASCULAR DYNAMICS FROM 4D MICROSCOPY USING TEMPORAL SEGMENTATION ..... 22
Shir Gur, Lior Wolf, Lior Golgher, Pablo Blinder
USING TRANSCRIPTIONAL SIGNATURES TO FIND CANCER DRIVERS WITH LURE ..... 23
David Haan, Ruikang Tao, Verena Friedl, Ioannis N. Anastopoulos, Christopher K. Wong, Alana S. Weinstein, Joshua M. Stuart
PAGE-NET: INTERPRETABLE AND INTEGRATIVE DEEP LEARNING FOR SURVIVAL ANALYSIS USING HISTOPATHOLOGICAL IMAGES AND GENOMIC DATA ..... 24
Jie Hao, Sai Chandra Kosaraju, Nelson Zange Tsaku, Dae Hyun Song, Mingon Kang
MACHINE LEARNING ALGORITHMS FOR SIMULTANEOUS SUPERVISED DETECTION OF PEAKS IN MULTIPLE SAMPLES AND CELL TYPES ..... 25
Toby Dylan Hocking, Guillaume Bourque
GRAPH-BASED INFORMATION DIFFUSION METHOD FOR PRIORITIZING FUNCTIONALLY RELATED GENES IN PROTEIN-PROTEIN INTERACTION NETWORKS ..... 26
Minh Pham, Olivier Lichtarge
A LITERATURE-BASED KNOWLEDGE GRAPH EMBEDDING METHOD FOR IDENTIFYING DRUG REPURPOSING OPPORTUNITIES IN RARE DISEASES ..... 27
Daniel N. Sosa, Alexander Derry, Margaret Guo, Eric Wei, Connor Brinton, Russ B. Altman
TWO-STAGE ML CLASSIFIER FOR IDENTIFYING HOST PROTEIN TARGETS OF THE DENGUE PROTEASE ..... 28
Jacob T. Stanley, Alison R. Gilchrist, Alex C. Stabell, Mary A. Allen, Sara L. Sawyer, Robin D. Dowell
ENHANCING MODEL INTERPRETABILITY AND ACCURACY FOR DISEASE PROGRESSION PREDICTION VIA PHENOTYPE-BASED PATIENT SIMILARITY LEARNING ..... 29
Yue Wang, Tong Wu, Yunlong Wang, Gao Wang
PRECISION MEDICINE: ADDRESSING THE CHALLENGES OF SHARING, ANALYSIS, AND PRIVACY AT SCALE ..... 30
INTEGRATED CANCER SUBTYPING USING HETEROGENEOUS GENOME-SCALE MOLECULAR DATASETS ..... 31
Suzan Arslanturk, Sorin Draghici, Tin Nguyen
ASSESSMENT OF COVERAGE FOR ENDOGENOUS METABOLITES AND EXOGENOUS CHEMICAL COMPOUNDS USING AN UNTARGETED METABOLOMICS PLATFORM ..... 32
Sek Won Kong, Carles Hernandez-Ferrer
COVERAGE PROFILE CORRECTION OF SHALLOW-DEPTH CIRCULATING CELL-FREE DNA SEQUENCING VIA MULTI-DISTANCE LEARNING ..... 33
Nicholas B. Larson, Melissa C. Larson, Jie Na, Carlos P. Sosa, Chen Wang, Jean-Pierre Kocher, Ross Rowsey
PGXMINE: TEXT MINING FOR CURATION OF PHARMGKB ..... 34
Jake Lever, Julia M. Barbarino, Li Gong, Rachel Huddart, Katrin Sangkuhl, Ryan Whaley, Michelle Whirl-Carrillo, Mark Woon, Teri E. Klein, Russ B. Altman
THE POWER OF DYNAMIC SOCIAL NETWORKS TO PREDICT INDIVIDUALS' MENTAL HEALTH ..... 35
Shikang Liu, David Hachen, Omar Lizardo, Christian Poellabauer, Aaron Striegel, Tijana Milenkovic
IMPLEMENTING A CLOUD BASED METHOD FOR PROTECTED CLINICAL TRIAL DATA SHARING ..... 36
Gaurav Luthria, Qingbo Wang
PATHWAY AND NETWORK EMBEDDING METHODS FOR PRIORITIZING PSYCHIATRIC DRUGS ..... 37
Yash Pershad, Margaret Guo, Russ B. Altman
ROBUST-ODAL: LEARNING FROM HETEROGENEOUS HEALTH SYSTEMS WITHOUT SHARING PATIENT-LEVEL DATA ..... 38
Jiayi Tong, Rui Duan, Ruowang Li, Martijn J. Schuemie, Jason H. Moore, Yong Chen
COMPUTATIONALLY EFFICIENT, EXACT, COVARIATE-ADJUSTED GENETIC PRINCIPAL COMPONENT ANALYSIS BY LEVERAGING INDIVIDUAL MARKER SUMMARY STATISTICS FROM LARGE BIOBANKS ..... 39
Jack Wolf, Martha Barnard, Xueting Xia, Nathan Ryder, Jason Westra, Nathan Tintle
PROCEEDINGS PAPERS WITH POSTER PRESENTATIONS
ARTIFICIAL INTELLIGENCE FOR ENHANCING CLINICAL MEDICINE ..... 40
MULTICLASS DISEASE CLASSIFICATION FROM MICROBIAL WHOLE-COMMUNITY METAGENOMES ..... 41
Saad Khan, Libusha Kelly
LITGEN: GENETIC LITERATURE RECOMMENDATION GUIDED BY HUMAN EXPLANATIONS ..... 42
Allen Nie, Arturo L. Pineda, Matt W. Wright, Hannah Wand, Bryan Wulf, Helio A. Costa, Ronak Y. Patel, Carlos D. Bustamante, James Zou
MULTILEVEL SELF-ATTENTION MODEL AND ITS USE ON MEDICAL RISK PREDICTION ..... 43
Xianlong Zeng, Yunyi Feng, Soheil Moosavinasab, Deborah Lin, Simon Lin, Chang Liu
IDENTIFYING TRANSITIONAL HIGH COST USERS FROM UNSTRUCTURED PATIENT PROFILES WRITTEN BY PRIMARY CARE PHYSICIANS ..... 44
Haoran Zhang, Elisa Candido, Andrew S. Wilton, Raquel Duchen, Liisa Jaakkimainen, Walter Wodchis, Quaid Morris
OBTAINING DUAL-ENERGY COMPUTED TOMOGRAPHY (CT) INFORMATION FROM A SINGLE-ENERGY CT IMAGE FOR QUANTITATIVE IMAGING ANALYSIS OF LIVING SUBJECTS BY USING DEEP LEARNING ..... 45
Wei Zhao, Tianling Lv, Rena Lee, Yang Chen, Lei Xing
INTRINSICALLY DISORDERED PROTEINS (IDPS) AND THEIR FUNCTIONS ..... 46
MANY-TO-ONE BINDING BY INTRINSICALLY DISORDERED PROTEIN REGIONS ..... 47
Wei-Lun Alterovitz, Eshel Faraggi, Christopher J. Oldfield, Jingwei Meng, Bin Xue, Fei Huang, Pedro Romero, Andrzej Kloczkowski, Vladimir N. Uversky, A. Keith Dunker
MUTATIONAL SIGNATURES ..... 48
IMPACT OF MUTATIONAL SIGNATURES ON MICRORNA AND THEIR RESPONSE ELEMENTS ..... 49
Eirini Stamoulakatou, Pietro Pinoli, Stefano Ceri, Rosario Piro
GENOME GERRYMANDERING: OPTIMAL DIVISION OF THE GENOME INTO REGIONS WITH CANCER TYPE SPECIFIC DIFFERENCES IN MUTATION RATES ..... 50
Adamo Young, Jacob Chmura, Yoonsik Park, Quaid Morris, Gurnit Atwal
PATTERN RECOGNITION IN BIOMEDICAL DATA: CHALLENGES IN PUTTING BIG DATA TO WORK ..... 51
LEARNING A LATENT SPACE OF HIGHLY MULTIDIMENSIONAL CANCER DATA ..... 52
Benjamin Kompa, Beau Coker
SCALING STRUCTURAL LEARNING WITH NO-BEARS TO INFER CAUSAL TRANSCRIPTOME NETWORKS ..... 53
Hao-Chih Lee, Matteo Danieletto, Riccardo Miotto, Sarah T. Cherng, Joel T. Dudley
PATHFLOWAI: A HIGH-THROUGHPUT WORKFLOW FOR PREPROCESSING, DEEP LEARNING AND INTERPRETATION IN DIGITAL PATHOLOGY ..... 54
Joshua J. Levy, Lucas A. Salas, Brock C. Christensen, Aravindhan Sriharan, Louis J. Vaickus
IMPROVING SURVIVAL PREDICTION USING A NOVEL FEATURE SELECTION AND FEATURE REDUCTION FRAMEWORK BASED ON THE INTEGRATION OF CLINICAL AND MOLECULAR DATA ..... 55
*Lisa Neums, Richard Meier, Devin C. Koestler, Jeffrey A. Thompson
BAYESIAN SEMI-NONNEGATIVE MATRIX TRI-FACTORIZATION TO IDENTIFY PATHWAYS ASSOCIATED WITH CANCER PHENOTYPES ..... 56
Sunho Park, Nabhonil Kar, Jae-Ho Cheong, Tae Hyun Hwang
TREE-WEIGHTING FOR MULTI-STUDY ENSEMBLE LEARNERS ..... 57
Maya Ramchandran, Prasad Patil, Giovanni Parmigiani
PTREXPLORER: AN APPROACH TO IDENTIFY AND EXPLORE POST TRANSCRIPTIONAL REGULATORY MECHANISMS USING PROTEOGENOMICS ..... 58
Arunima Srivastava, Michael Sharpnack, Kun Huang, Parag Mallick, Raghu Machiraju
NETWORK REPRESENTATION OF LARGE-SCALE HETEROGENEOUS RNA SEQUENCES WITH INTEGRATION OF DIVERSE MULTI-OMICS, INTERACTIONS, AND ANNOTATIONS DATA ..... 59
Nhat Tran, Jean Gao
HADOOP AND PYSPARK FOR REPRODUCIBILITY AND SCALABILITY OF GENOMIC SEQUENCING STUDIES ..... 60
Nicholas R. Wheeler, Penelope Benchek, Brian W. Kunkle, Kara L. Hamilton-Nelson, Mike Warfe, Jeremy R. Fondran, Jonathan L. Haines, William S. Bush
CERENKOV3: CLUSTERING AND MOLECULAR NETWORK-DERIVED FEATURES IMPROVE COMPUTATIONAL PREDICTION OF FUNCTIONAL NONCODING SNPS ..... 61
Yao Yao, Stephen A. Ramsey
PRECISION MEDICINE: ADDRESSING THE CHALLENGES OF SHARING, ANALYSIS, AND PRIVACY AT SCALE ..... 62
ANOMIGAN: GENERATIVE ADVERSARIAL NETWORKS FOR ANONYMIZING PRIVATE MEDICAL DATA ..... 63
Ho Bae, Dahuin Jung, Hyun-Soo Choi, Sungroh Yoon
FREQUENCY OF CLINVAR PATHOGENIC VARIANTS IN CHRONIC KIDNEY DISEASE PATIENTS SURVEYED FOR RETURN OF RESEARCH RESULTS AT A CLEVELAND PUBLIC HOSPITAL ..... 64
Dana C. Crawford, John Lin, Jessica N. Cooke Bailey, Tyler Kinzy, John R. Sedor, John F. O'Toole, William S. Bush
NETWORK-BASED MATCHING OF PATIENTS AND TARGETED THERAPIES FOR PRECISION ONCOLOGY ..... 65
Qingzhi Liu, Min Jin Ha, Rupam Bhattacharyya, Lana Garmire, Veerabhadran Baladandayuthapani
PHENOME-WIDE ASSOCIATION STUDIES ON CARDIOVASCULAR HEALTH AND FATTY ACIDS CONSIDERING PHENOTYPE QUALITY CONTROL PRACTICES FOR EPIDEMIOLOGICAL DATA ..... 66
Kristin Passero, Xi He, Jiayan Zhou, Bertram Mueller-Myhsok, Marcus E. Kleber, Winfried Maerz, Molly A. Hall
ATEMPO: PATHWAY-SPECIFIC TEMPORAL ANOMALIES FOR PRECISION THERAPEUTICS ..... 67
Christopher Michael Pietras, Liam Power, Donna K. Slonim
FEATURE SELECTION AND DIMENSION REDUCTION OF SOCIAL AUTISM DATA ..... 68
Peter Washington, Kelley Marie Paskov, Haik Kalantarian, Nathaniel Stockham, Catalin Voss, Aaron Kline, Ritik Patnaik, Brianna Chrisman, Maya Varma, Qandeel Tariq, Kaitlyn Dunlap, Jessey Schwartz, Nick Haber, Dennis P. Wall
POSTER PRESENTATIONS
ARTIFICIAL INTELLIGENCE FOR ENHANCING CLINICAL MEDICINE ..... 69
PRIORITIZING COPY NUMBER VARIANTS USING PHENOTYPE AND GENE FUNCTIONAL SIMILARITY ..... 70
Azza Althagafi, Jun Chen, Robert Hoehndorf
INFERRING THE REWARD FUNCTIONS THAT GUIDE CANCER PROGRESSION ..... 71
John Kalantari, Heidi Nelson, Nicholas Chia
PREDICTING DISEASE-ASSOCIATED MUTATION OF METAL-BINDING SITES IN PROTEINS USING A DEEP LEARNING APPROACH ..... 72
Mohamad Koohi-Moghadam, Haibo Wang, Yuchuan Wang, Xinming Yang, Hongyan Li, Junwen Wang, Hongzhe Sun
GENERAL ..... 73
RANKING RAS PATHWAY MUTATIONS USING EVOLUTIONARY HISTORY OF MEK1 ..... 74
Katia Andrianova, Igor Jouline
INTEGRATIVE ANALYSIS OF COPD AND LUNG CANCER METADATA REVEALS SHARED ALTERATIONS IN IMMUNE RESPONSE, PTEN AND PI3K-AKT PATHWAYS ..... 75
Dannielle Skander, Arda Durmaz, Mohammed Orloff, Gurkan Bebek
INVESTIGATING SOURCES OF IRREPRODUCIBILITY IN ANALYSIS OF GENE EXPRESSION DATA ..... 76
Carly A. Bobak, Jane E. Hill
ETHEREUM AND MULTICHAIN BLOCKCHAINS AS SECURE TOOLS FOR INDIVIDUALIZED MEDICINE ..... 77
Charlotte Brannon, Gamze Gursoy, Sarah Wagner, Mark Gerstein
GENOMIC PREDICTORS OF L-ASPARAGINASE-INDUCED PANCREATITIS IN PEDIATRIC CANCER PATIENTS ..... 78
Britt Drogemoller, Galen E. B. Wright, Shahrad Rassekh, Shinya Ito, Bruce Carleton, Colin Ross, The Canadian Pharmacogenomics Network for Drug Safety Consortium
NITECAP: A NOVEL METHOD AND INTERFACE FOR THE IDENTIFICATION OF CIRCADIAN BEHAVIOR IN HIGHLY PARALLEL TIME-COURSE DATA ..... 79
Thomas G. Brooks, Cris W. Lawrence, Nicholas F. Lahens, Soumyashant Nayak, Dimitra Sarantopoulou, Garret A. FitzGerald, Gregory R. Grant
THE INTERPLAY OF OBESITY AND RACE/ETHNICITY ON MAJOR PERINATAL COMPLICATIONS ..... 80
Yaadira Brown, MPH; Olubode A. Olufajo, MD, MPH; Edward E. Cornwell III, MD; William Southerland, PhD
A COMPARISON OF PHARMACOGENOMIC INFORMATION IN FDA-APPROVED DRUG LABELS AND CPIC GUIDELINES ..... 81
Katherine I. Carrillo, Teri E. Klein
XTEA: A TRANSPOSABLE ELEMENT INSERTION ANALYZER FOR GENOME SEQUENCING DATA FROM MULTIPLE TECHNOLOGIES ..... 82
Chong Chu, Rebeca Monroy, Soohyun Lee, E. Alice Lee, Peter J. Park
GO GET DATA (GGD): SIMPLE, REPRODUCIBLE ACCESS TO SCIENTIFIC DATA ..... 83
Michael Cormier, Jon Belyeu, Brent Pedersen, Joe Brown, Johannes Koster, Aaron R. Quinlan
GLOBAL EPIGENOMIC REGULATION OF GENE EXPRESSION AND CELLULAR PROLIFERATION IN T-CELL LEUKEMIA ..... 84
Sinisa Dovat, Yali Ding, Bo Zhang, Jonathon L. Payne, Feng Yue
A PHARMACOGENOMIC INVESTIGATION OF THE CARDIAC SAFETY PROFILE OF ONDANSETRON IN CHILDREN AND IN PREGNANT WOMEN ..... 85
Galen E. B. Wright, Britt I. Drögemöller, Jessica Trueman, Kaitlyn Shaw, Michelle Staub, Shahnaz Chaudhry, Sholeh Ghayoori, Fudan Miao, Michelle Higginson, Gabriella S. S. Groeneweg, James Brown, Laura A. Magee, Simon D. Whyte, Nicholas West, Sonia Brodie, Geert 't Jong, Howard Berger, Shinya Ito, Shahrad R. Rassekh, Shubhayan Sanatani, Colin J. D. Ross, Bruce C. Carleton
TREND: A PLATFORM FOR EXPLORING PROTEIN FUNCTION IN PROKARYOTES USING PHYLOGENETICS, DOMAIN ARCHITECTURES, AND GENE NEIGHBORHOODS INFORMATION ..... 86
Vadim M. Gumerov, Igor B. Zhulin
TRACKSIGFREQ: SUBCLONAL RECONSTRUCTIONS BASED ON MUTATION SIGNATURES AND ALLELE FREQUENCIES ..... 87
Caitlin F. Harrigan, Yulia Rubanova, Quaid Morris, Alina Selega
A FLEXIBLE PIPELINE FOR THE PREDICTION OF BIOMARKERS RELEVANT TO DRUG SENSITIVITY ..... 88
V. Keith Hughitt, Sayeh Gorjifard, Aleksandra M. Michalowski, John K. Simmons, Ryan Dale, Eric C. Polley, Jonathan J. Keats, Beverly A. Mock
CREATING A METABOLIC SYNDROME RESEARCH RESOURCE (METSRR) ..... 89
Willysha Jenkins, Christian Richardson, ClarLynda Williams-DeVane PhD
UTILIZING COHORT INFORMATION TO FIND CAUSATIVE VARIANTS ..... 90
Senay Kafkas, Robert Hoehndorf
INTEGRATED ANALYSIS OF JAK-STAT PATHWAY IN HOMEOSTASIS, SIMULATED INFLAMMATION AND TUMOUR ..... 91
Milica Krunic, Anzhelika Karjalainen, Mojoyinola Joanna Ola, Stephen Shoebridge, Sabine Macho-Maschler, Caroline Lassnig, Andrea Poelzl, Matthias Farlik, Nikolaus Fortelny, Christoph Bock, Birgit Strobl, Mathias Mueller
BEERS2: THE NEXT GENERATION OF RNA-SEQ SIMULATOR ..... 92
Nicholas F. Lahens, Thomas G. Brooks, Dimitra Sarantopoulou, Soumyashant Nayak, Cris Lawrence, Anand Srinivasan, Jonathan Schug, Garret A. FitzGerald, John B. Hogenesch, Yoseph Barash, Gregory R. Grant
EFFECT MODIFICATION BY AGE ON A DIAGNOSTIC THREE-GENE-SIGNATURE IN PATIENTS WITH ACTIVE TUBERCULOSIS ..... 93
Lauren McDonnell, Carly Bobak, Matthew Nemesure, Justin Lin, Jane Hill
CLASSIFICATION AND MUTATION PREDICTION FROM GASTROINTESTINAL CANCER HISTOPATHOLOGY IMAGES USING DEEP LEARNING ..... 94
Sung Hak Lee, Hyun-Jong Jang
MAPPING THE EMERGENCE AND MIGRATION OF HEMATOPOIETIC STEM CELLS AND PROGENITORS DURING HUMAN DEVELOPMENT AT SINGLE CELL RESOLUTION ..... 95
Feiyang Ma, Vincenzo Calvanese, Sandra Capellera-Garcia, Sophia Ekstrand, Matteo Pellegrini, Hanna K. A. Mikkola
LARGE-SCALE MACHINE LEARNING AND GRAPH ANALYTICS FOR FUNCTIONAL PREDICTION OF PATHOGEN PROTEINS ..... 96
Jason McDermott, Song Feng, William Nelson, Joon-Yong Lee, Sayan Ghosh, Ariful Khan, Mahantesh Halappanavar, Justine Nguyen, Jonathan Pruneda, David Baltrus, Joshua Adkins
GENE-SET ANALYSIS USING GWAS SUMMARY STATISTICS AND GTEX DATABASE ..... 97
Masahiro Nakatochi
TARGETING CANCER VIA SIGNALING PATHWAYS: A NOVEL APPROACH TO THE DISCOVERY OF GENE CCDC191'S DOUBLE-AGENT FUNCTION USING DIFFERENTIAL GENE EXPRESSION, HEATMAP ANALYSES THROUGH AI DEEP LEARNING, AND MATHEMATICAL MODELING ..... 98
Annie Ostojic
RFEX: SIMPLE RANDOM FOREST MODEL AND SAMPLE EXPLAINER FOR NON-MACHINE LEARNING EXPERTS ..... 99
Dragutin Petkovic, Ali Alavi, DanDan Cai, Jizhou Yang, Sabiha Barlaskar
APPARENT BIAS TOWARD LONG GENE MISREGULATION IN MECP2 SYNDROMES DISAPPEARS AFTER CONTROLLING FOR BASELINE VARIATIONS ..... 100
Ayush T. Raman, Amy E. Pohodich, Ying-Wooi Wan, Hari Krishna Yalamanchili, William E. Lowry, Huda Y. Zoghbi, Zhandong Liu
PREDICTION OF CHRONOLOGICAL AND BIOLOGICAL AGE FROM LABORATORY DATA ..... 101
Luke Sagers, Luke Melas-Kyriazi, Chirag J. Patel, Arjun K. Manrai
WHOLE GENOME SEQUENCING ANALYSIS OF INFLUENZA C VIRUS IN KOREA ..... 102
Sooyeon Lim, Han Sol Lee, Ji Yun Noh, Joon Young Song, Hee Jin Cheong, Woo Joo Kim
MINING THE HUMUHUMUNUKUNUKUAPUA AND THE SHAKA OF AUTISM WITH BIG DATA BIOMEDICAL DATA SCIENCE ..... 103
Peter Washington, Brianna Chrisman, Kaiti Dunlap, Aaron Kline, Arman Husic, Michael Ning, Kelley Paskov, Nate Stockham, Maya Varma, Emilie LeBlanc, Jack Kent, Yordan Penev, Min Woo Sun, Jae-Yoon Jung, Catalin Voss, Nick Haber, Dennis P. Wall
DEVELOPMENT OF A RECURRENCE PREDICTION MODEL FOR EARLY LUNG ADENOCARCINOMA USING RADIOMICS-BASED ARTIFICIAL INTELLIGENCE ..... 104
Hee Chul Yang, Gunseok Park, Ji Eun Oh
DRLPC: DIMENSION REDUCTION OF SEQUENCING DATA USING LOCAL PRINCIPAL COMPONENTS ..... 105
Yun Joo Yoo, Fatemeh Yavartanu, Shelley B. Bull
META-ANALYSIS IN EXHAUSTED T CELLS FROM HOMO SAPIENS AND MUS MUSCULUS PROVIDES NOVEL TARGETS FOR IMMUNOTHERAPY ..... 106
Lin Zhang, Yicheng Guo, Hafumi Nishi
INTRINSICALLY DISORDERED PROTEINS (IDPS) AND THEIR FUNCTIONS ..... 107
DISORDERED FUNCTION CONJUNCTION: ON THE IN-SILICO FUNCTION ANNOTATION OF INTRINSICALLY DISORDERED REGIONS ..... 108
Sina Ghadermarzi, Akila Katuwawala, Christopher J. Oldfield, Amita Barik, Lukasz Kurgan
MUTATIONAL SIGNATURES ..... 109
TRANSCRIPTION-ASSOCIATED REGIONAL MUTATION RATES AND SIGNATURES IN REGULATORY ELEMENTS ACROSS 2,500 WHOLE CANCER GENOMES ..... 110
Jüri Reimand
COMPLEX MOSAIC STRUCTURAL VARIATIONS IN HUMAN FETAL BRAINS ..... 111
Shobana Sekar, Livia Tomasini, Maria Kalyva, Taejeong Bae, Logan Manlove, Bo Zhou, Jessica Mariani, Fritz Sedlazeck, Alexander E. Urban, Christos Proukakis, Flora M. Vaccarino, Alexej Abyzov
PATTERN RECOGNITION IN BIOMEDICAL DATA: CHALLENGES IN PUTTING BIG DATA TO WORK ..... 112
STRATIFICATION OF KIDNEY TRANSPLANT RECIPIENTS BASED ON TEMPORAL DISEASE TRAJECTORIES ..... 113
Isabella Friis Jørgensen PhD, Søren Schwartz Sørensen PhD, Søren Brunak PhD
MODELING GENE EXPRESSION LEVELS FROM EPIGENETIC MARKERS USING A DYNAMICAL SYSTEMS APPROACH ..... 114
James Brunner, Jacob Kim, Kord M. Kober
TRANSLATING BIG DATA NEUROIMAGING FINDINGS INTO MEASUREMENTS OF INDIVIDUAL VULNERABILITY ..... 115
Peter Kochunov, Paul Thompson, Neda Jahanshad, Elliot Hong
AUTOMATING NEW-USER COHORT CONSTRUCTION WITH INDICATION EMBEDDINGS ..... 116
Rachel D. Melamed
REPRODUCIBILITY-OPTIMIZED STATISTICAL TESTING FOR OMICS STUDIES ..... 117
Tomi Suomi, Laura Elo
DATA INTEGRATION EXPECTATION MAPS: TOWARDS MORE INFORMED 'OMIC DATA INTEGRATION ..... 118
Tia Tate, Christian Richardson, ClarLynda Williams-DeVane
PRECISION MEDICINE: ADDRESSING THE CHALLENGES OF SHARING, ANALYSIS, AND PRIVACY AT SCALE ..... 119
INTEGRATED OMICS DATA MINING OF SYNERGISTIC GENE PAIRS FOR CANCER PRECISION MEDICINE ..... 120
Euna Jeong, Choa Park, Sukjoon Yoon
THE POWER OF DYNAMIC SOCIAL NETWORKS TO PREDICT INDIVIDUALS' MENTAL HEALTH ..... 121
Shikang Liu, David Hachen, Omar Lizardo, Christian Poellabauer, Aaron Striegel, Tijana Milenkovic
ROBUST-ODAL: LEARNING FROM HETEROGENEOUS HEALTH SYSTEMS WITHOUT SHARING PATIENT-LEVEL DATA ..... 122
Jiayi Tong, Rui Duan, Ruowang Li, Martijn J. Schuemie, Jason H. Moore, Yong Chen
PHARMGKB: AUTOMATED LITERATURE ANNOTATIONS ..... 123
Michelle Whirl-Carrillo, Li Gong, Rachel Huddart, Katrin Sangkuhl, Ryan Whaley, Mark Woon, Julia Barbarino, Jake Lever, Russ B. Altman, Teri E. Klein
WORKSHOPS WITH POSTER PRESENTATIONS
PACKAGING BIOCOMPUTING SOFTWARE TO MAXIMIZE DISTRIBUTION AND REUSE ..... 124
APOLLO PROVIDES COLLABORATIVE GENOME ANNOTATION EDITING WITH THE POWER OF JBROWSE ..... 125
Nathan Dunn, Colin Diesh, Robert Buels, Helena Rasche, Anthony Bretaudeau, Nomi Harris, Ian Holmes
G:PROFILER - ONE FUNCTIONAL ENRICHMENT ANALYSIS TOOL, MANY INTERFACES SERVING LIFE SCIENCE COMMUNITIES ..... 126
Liis Kolberg, Uku Raudvere, Ivan Kuzmin, Jaak Vilo, Hedi Peterson
INCREASING USABILITY AND DISSEMINATION OF THE PATHFX ALGORITHM USING WEB APPLICATIONS AND DOCKER SYSTEMS ..... 127
Jennifer Wilson, Nicholas Stepanov, Ajinkya Chalke, Mike Wong, Dragutin Petkovic, Russ B. Altman
TRANSLATIONAL BIOINFORMATICS WORKSHOP: BIOBANKS IN THE PRECISION MEDICINE ERA ..... 128
IDENTIFICATION OF BIOMARKERS RELATED TO AUTISM SPECTRUM DISORDER USING GENOMIC INFORMATION ..... 129
Leena Sait, Martha Gizaw, and Iosif Vaisman
A PAN-CANCER 3-GENE SIGNATURE TO PREDICT DORMANCY ..... 130
Ivy Tran, Anchal Sharma, Subhajyoti De
AUTHOR INDEX ..... 131
1
ARTIFICIAL INTELLIGENCE FOR ENHANCING CLINICAL MEDICINE
PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS
2
Artificial Intelligence for Enhancing Clinical Medicine
Predicting Longitudinal Outcomes of Alzheimer's Disease via a Tensor-Based Joint Classification and Regression Model
Lodewijk Brand¹, Kai Nichols¹, Hua Wang¹, Heng Huang², Li Shen³, for the ADNI
¹Colorado School of Mines, ²University of Pittsburgh, ³University of Pennsylvania
Hua Wang
Alzheimer's disease (AD) is a serious neurodegenerative condition that affects millions of people across the world. Recently, machine learning models have been used to predict the progression of AD, although they frequently do not take advantage of the longitudinal and structural components associated with multi-modal medical data. To address this, we present a new algorithm that uses the multi-block alternating direction method of multipliers to optimize a novel objective that combines longitudinal clinical data of various modalities to simultaneously predict the cognitive scores and diagnoses of the participants in the Alzheimer's Disease Neuroimaging Initiative cohort. Our new model is designed to leverage structure in clinical data that standard machine learning optimization algorithms do not incorporate. This new approach shows state-of-the-art predictive performance and validates a collection of brain and genetic biomarkers that have been reported previously in the AD literature.
3
Robustly Extracting Medical Knowledge from EHRs: A Case Study of Learning a Health Knowledge Graph
Irene Y. Chen¹, Monica Agrawal¹, Steven Horng², David Sontag¹
¹Massachusetts Institute of Technology, ²Beth Israel Deaconess Medical Center
Irene Chen
Increasingly large collections of electronic health records (EHRs) provide an opportunity to learn medical knowledge algorithmically. In one prominent example, a causal health knowledge graph could learn relationships between diseases and symptoms and then serve as a diagnostic tool to be refined with additional clinical input. Prior research has demonstrated the ability to construct such a graph from over 270,000 emergency department patient visits. In this work, we describe methods to evaluate a health knowledge graph for robustness. Moving beyond precision and recall, we analyze for which diseases and for which patients the graph is most accurate. We identify sample size and unmeasured confounders as major sources of error in the health knowledge graph. We introduce a method that uses non-linear functions in building the causal graph to better understand existing model assumptions. Finally, to assess model generalizability, we extend our analysis to a larger set of complete patient visits within a hospital system. We conclude with a discussion of how to robustly extract medical knowledge from EHRs.
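The per-disease analysis described above can be illustrated with a minimal Python sketch. It assumes, hypothetically, that both the learned graph and a clinician-curated reference graph are available as sets of (disease, symptom) edges; the function name and the toy edges are illustrative only and do not reproduce the authors' evaluation code.

```python
from collections import defaultdict


def per_disease_precision_recall(learned, reference):
    """Compare learned (disease, symptom) edges to a reference graph, per disease.

    Returns {disease: (precision, recall)}. Precision is None for a disease
    with no learned edges; recall is None for a disease with no reference edges.
    """
    learned_by = defaultdict(set)
    reference_by = defaultdict(set)
    for disease, symptom in learned:
        learned_by[disease].add(symptom)
    for disease, symptom in reference:
        reference_by[disease].add(symptom)

    scores = {}
    # Union of diseases is computed once, before the loop touches the defaultdicts.
    for disease in set(learned_by) | set(reference_by):
        predicted, expected = learned_by[disease], reference_by[disease]
        hits = len(predicted & expected)
        precision = hits / len(predicted) if predicted else None
        recall = hits / len(expected) if expected else None
        scores[disease] = (precision, recall)
    return scores


# Toy, made-up edges for illustration only.
learned = {("influenza", "fever"), ("influenza", "cough"), ("migraine", "nausea")}
reference = {("influenza", "fever"), ("influenza", "myalgia"),
             ("migraine", "headache"), ("migraine", "nausea")}
for disease, (p, r) in sorted(per_disease_precision_recall(learned, reference).items()):
    print(disease, p, r)
# influenza 0.5 0.5
# migraine 1.0 0.5
```

Grouping the same comparison by patient subgroup instead of by disease would give the "for which patients" view in the same way.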
4
Increasing Clinical Trial Accrual via Automated Matching of Biomarker Criteria
Jessica W. Chen, Christian A. Kunder, Nam Bui, James L. Zehnder, Helio A. Costa, Henning Stehr
Stanford University School of Medicine
Successful implementation of precision oncology requires both the deployment of nucleic acid sequencing panels to identify clinically actionable biomarkers and the efficient screening of patient biomarker eligibility for ongoing clinical trials and therapies. This process is typically performed manually by biocurators, geneticists, pathologists, and oncologists; however, it is time-intensive and inconsistent among healthcare providers. We present the development of a feature-matching algorithmic pipeline that identifies patients who meet the eligibility criteria of precision medicine clinical trials via genetic biomarkers, and we apply it to patients undergoing treatment at the Stanford Cancer Center. Through our patient eligibility screening algorithm, which matches clinical-sequencing-derived biomarkers to precision medicine clinical trials, this study demonstrates that an automated algorithmic pipeline is a feasible, accurate, and effective alternative to traditional manual clinical trial curation.
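The abstract does not describe how the pipeline represents biomarkers or trial criteria. As a minimal sketch, assuming a hypothetical data model (the `Variant` and `Trial` classes, the "*" wildcard, and the rule "at least one inclusion criterion met and no exclusion criterion met" are all illustrative assumptions, and the trial IDs are made up), the core matching step could look like this:

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Variant:
    gene: str
    alteration: str  # e.g. "V600E" or "amplification" (hypothetical encoding)


@dataclass
class Trial:
    trial_id: str
    # Each criterion is a (gene, alteration) pair; "*" matches any alteration in that gene.
    inclusion: list = field(default_factory=list)
    exclusion: list = field(default_factory=list)


def criterion_met(criterion, variants):
    """True if any of the patient's variants satisfies the (gene, alteration) criterion."""
    gene, alteration = criterion
    return any(v.gene == gene and alteration in ("*", v.alteration) for v in variants)


def eligible_trials(variants, trials):
    """Trials where at least one inclusion criterion and no exclusion criterion is met."""
    return [
        t.trial_id
        for t in trials
        if any(criterion_met(c, variants) for c in t.inclusion)
        and not any(criterion_met(c, variants) for c in t.exclusion)
    ]


patient = [Variant("BRAF", "V600E"), Variant("TP53", "R175H")]
trials = [
    Trial("TRIAL-A", inclusion=[("BRAF", "V600E")]),
    Trial("TRIAL-B", inclusion=[("EGFR", "*")]),
    Trial("TRIAL-C", inclusion=[("BRAF", "*")], exclusion=[("TP53", "*")]),
]
print(eligible_trials(patient, trials))  # ['TRIAL-A']
```

A production matcher would also need variant normalization (nomenclature, transcripts) before this comparison; that step is outside the sketch.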
5
Addressing the Credit Assignment Problem in Treatment Outcome Prediction using Temporal Difference Learning
Sahar Harati1, Andrea Crowell2, Helen Mayberg3, Shamim Nemati4
1Stanford University, 2Emory University, 3Mount Sinai, 4University of California San Diego
Sahar Harati
Mental health patients often undergo a variety of treatments before finding an effective one. Improved prediction of treatment response can shorten the duration of trials. A key challenge in applying predictive modeling to this problem is that the effectiveness of a treatment regimen often remains unknown for several weeks, and therefore immediate feedback signals may not be available for supervised learning. Here we propose a machine learning approach that extracts audio-visual features from weekly video interview recordings to predict the likely outcome of deep brain stimulation (DBS) treatment several weeks in advance. In the absence of immediate treatment-response feedback, we use a joint state-estimation and temporal difference learning approach to model both the trajectory of a patient's response and the delayed nature of the feedback. Our results, based on longitudinal recordings from 12 patients with depression, show that the learned state values are predictive of the long-term success of DBS treatments. We achieve an area under the receiver operating characteristic curve of 0.88, beating all baseline methods.
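Temporal difference learning addresses delayed feedback by bootstrapping: each state's value is pulled toward the immediate reward plus the value of the next state, so an outcome observed only at the end propagates back to earlier weeks. The sketch below is plain tabular TD(0) on hypothetical discretized weekly states; it does not include the joint state estimation from audio-visual features described in the abstract, and the states, rewards, and step size are assumptions.

```python
def td0_state_values(episodes, n_states, alpha=0.1, gamma=1.0, n_sweeps=200):
    """Tabular TD(0) policy evaluation.

    Each episode is a list of (state, reward, next_state) transitions;
    next_state is None when the episode ends (e.g. the final outcome assessment).
    """
    values = [0.0] * n_states
    for _ in range(n_sweeps):
        for episode in episodes:
            for state, reward, next_state in episode:
                bootstrap = 0.0 if next_state is None else gamma * values[next_state]
                # Move V(s) toward the one-step target r + gamma * V(s').
                values[state] += alpha * (reward + bootstrap - values[state])
    return values


# Hypothetical weekly states: 0 = baseline, 1 = "improving" features, 2 = "flat" features.
# Reward arrives only at the end: 1 for a responder, 0 for a non-responder.
responder = [(0, 0.0, 1), (1, 1.0, None)]
non_responder = [(0, 0.0, 2), (2, 0.0, None)]
values = td0_state_values([responder, non_responder], n_states=3)
print([round(v, 2) for v in values])  # [0.47, 1.0, 0.0]
```

With a constant step size and this update order, the baseline value settles near (not exactly at) the 0.5 average of the two outcomes; the intermediate states already separate responders from non-responders before the final reward.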
6
From genome to phenome: Predicting multiple cancer phenotypes based on somatic genomic alterations via the genomic impact transformer
Yifeng Tao1, Chunhui Cai2, William W. Cohen1, Xinghua Lu2
1Carnegie Mellon University, 2University of Pittsburgh
Yifeng Tao
Cancers are mainly caused by somatic genomic alterations (SGAs) that perturb cellular signaling systems and eventually activate oncogenic processes. Therefore, understanding the functional impact of SGAs is a fundamental task in cancer biology and precision oncology. Here, we present a deep neural network model with an encoder-decoder architecture, referred to as the genomic impact transformer (GIT), to infer the functional impact of SGAs on cellular signaling systems by modeling the statistical relationships between SGA events and differentially expressed genes (DEGs) in tumors. The model uses a multi-head self-attention mechanism to identify SGAs that likely cause DEGs, or in other words, to differentiate potential driver SGAs from passenger ones in a tumor. The GIT model learns a vector (gene embedding) as an abstract representation of the functional impact of each SGA-affected gene. Given the SGAs of a tumor, the model can instantiate the states of the hidden layer, providing an abstract representation (tumor embedding) that reflects characteristics of the perturbed molecular/cellular processes in the tumor, which in turn can be used to predict multiple phenotypes. We apply the GIT model to 4,468 tumors profiled by The Cancer Genome Atlas (TCGA) project. The attention mechanism enables the model to capture the statistical relationship between SGAs and DEGs better than conventional methods, and it distinguishes cancer drivers from passengers. The learned gene embeddings capture the functional similarity of SGAs perturbing common pathways. The tumor embeddings are shown to be useful for tumor status representation and for phenotype prediction, including patient survival time and drug response of cancer cell lines.
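To illustrate how attention weights can single out some altered genes while pooling gene embeddings into a tumor embedding, here is a simplified multi-head attention pooling: each head has one query vector, scores every SGA gene embedding by scaled dot product, and takes a softmax-weighted average. This is narrower than the full self-attention in GIT (no key/value projections, no decoder), and the embeddings and queries below are hypothetical.

```python
import math


def softmax(scores):
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multi_head_attention_pool(gene_embeddings, head_queries):
    """Pool the embeddings of a tumor's SGA-affected genes into one tumor embedding.

    gene_embeddings: list of d-dimensional vectors, one per altered gene.
    head_queries: one d-dimensional query vector per attention head.
    Returns the concatenated per-head pooled vectors and the per-head weights.
    """
    d = len(gene_embeddings[0])
    tumor_embedding, head_weights = [], []
    for query in head_queries:
        # Scaled dot-product score of each gene embedding against this head's query.
        scores = [sum(q * e for q, e in zip(query, emb)) / math.sqrt(d) for emb in gene_embeddings]
        weights = softmax(scores)
        head_weights.append(weights)
        pooled = [sum(w * emb[k] for w, emb in zip(weights, gene_embeddings)) for k in range(d)]
        tumor_embedding.extend(pooled)
    return tumor_embedding, head_weights


genes = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # hypothetical gene embeddings
queries = [[4.0, 0.0], [0.0, 4.0]]             # hypothetical per-head queries
tumor, weights = multi_head_attention_pool(genes, queries)
```

In a trained model the per-head weights are the quantities one would inspect to rank candidate driver SGAs within a tumor.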
7
Automated phenotyping of patients with non-alcoholic fatty liver disease reveals clinically relevant disease subtypes
Maxence Vandromme, Tomi Jun, Ponni Perumalswami, Joel T. Dudley, Andrea Branch, Li Li
Icahn School of Medicine at Mount Sinai, Sema4
Maxence Vandromme
Non-alcoholic fatty liver disease (NAFLD) is a complex, heterogeneous disease that affects more than 20% of the population worldwide. Some subtypes of NAFLD have been clinically identified using hypothesis-driven methods. In this study, we used data mining techniques to search for subtypes in an unbiased fashion. Using electronic signatures of the disease, we identified a cohort of 13,290 patients with NAFLD from a hospital database. We gathered clinical data from multiple sources and applied unsupervised clustering to identify five subtypes among this cohort. Descriptive statistics and survival analysis showed that the subtypes were clinically distinct and were associated with different rates of death, cirrhosis, hepatocellular carcinoma, chronic kidney disease, cardiovascular disease, and myocardial infarction. Novel disease subtypes identified in this manner could be used to risk-stratify patients and guide management.
8
Monitoring ICU Mortality Risk with a Long Short-Term Memory Recurrent Neural Network
Ke Yu1, Mingda Zhang2, Tianyi Cui2, Milos Hauskrecht2
1Intelligent Systems Program, University of Pittsburgh; 2Department of Computer Science, University of Pittsburgh
Ke Yu
In intensive care units (ICUs), mortality prediction is critical not only for effective medical intervention but also for the allocation of clinical resources. Structured electronic health records (EHRs) contain valuable information for assessing mortality risk in ICU patients, but current mortality prediction models usually require laborious human-engineered features. Furthermore, substantial missing data in EHRs is a common problem for both the construction and the implementation of a prediction model. Inspired by language-related models, we design a new framework for dynamic monitoring of patients' mortality risk. Our framework uses a bag-of-words representation of all relevant medical events in the most recent history as input. By design, it is robust to missing data in EHRs and can easily be implemented as an instant scoring system to monitor the medical development of all ICU patients. Specifically, our model uses latent semantic analysis (LSA) to encode the patients' states into low-dimensional embeddings, which are then fed to long short-term memory networks for mortality risk prediction. Our results show that the deep learning-based framework performs better than the existing severity scoring system, SAPS-II. We observe that the bidirectional long short-term memory network demonstrates superior performance, probably because it captures both forward and backward temporal dependencies.
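The LSA step can be sketched as a truncated singular value decomposition of a bag-of-events count matrix: each row is one observation window, each column one event type, and a missing measurement is simply a zero count. This is a minimal sketch assuming raw counts (LSA is often applied to TF-IDF weights instead); the event names and counts are hypothetical, and the LSTM that would consume the per-window embeddings is not shown.

```python
import numpy as np


def lsa_fit(counts, k):
    """Latent semantic analysis via truncated SVD.

    counts: (n_windows, n_event_types) bag-of-events matrix; an unobserved
    event type is simply a zero count, so missing measurements need no imputation.
    Returns k-dimensional embeddings of the training rows and the projection basis.
    """
    _, _, vt = np.linalg.svd(counts, full_matrices=False)
    components = vt[:k]  # (k, n_event_types), orthonormal rows
    return counts @ components.T, components


def lsa_transform(new_counts, components):
    """Embed new observation windows (e.g. a patient's latest hours) with the fitted basis."""
    return new_counts @ components.T


# Hypothetical event-type columns: [lactate_high, map_low, vasopressor_started, gcs_low]
counts = np.array([[2, 0, 1, 0],
                   [0, 1, 0, 3],
                   [1, 0, 1, 0],
                   [0, 2, 0, 1]], dtype=float)
embeddings, components = lsa_fit(counts, k=2)
print(embeddings.shape)  # (4, 2)
```

For monitoring, each patient's sequence of window embeddings (ordered in time) would form the input sequence to the recurrent network.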
9
INTRINSICALLY DISORDERED PROTEINS (IDPS) AND THEIR FUNCTIONS
PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS
10
Intrinsically Disordered Proteins (IDPs) and Their Functions
Disordered Function Conjunction: On the in-silico function annotation of intrinsically disordered regions
Sina Ghadermarzi, Akila Katuwawala, Christopher J. Oldfield, Amita Barik, Lukasz Kurgan
Department of Computer Science, Virginia Commonwealth University, 401 West Main Street, Richmond, VA 23284, USA
Lukasz Kurgan
Intrinsically disordered regions (IDRs) lack a stable structure, yet perform biological functions. The functions of IDRs include mediating interactions with other molecules, including proteins, DNA, or RNA, and entropic functions, including domain linkers. Computational predictors provide residue-level indications of function for disordered proteins, which contrasts with the need to functionally annotate the thousands of experimentally and computationally discovered IDRs. In this work, we investigate the feasibility of using residue-level prediction methods for region-level function predictions. For an initial examination of the multiple-function region-level prediction problem, we constructed a dataset of (likely) single-function IDRs in proteins that are dissimilar to the training datasets of the residue-level function predictors. We find that available residue-level prediction methods are only modestly useful in predicting multiple region-level functions. Classification is enhanced by the simultaneous use of multiple residue-level function predictions and is further improved by the inclusion of amino acid content extracted from the protein sequence. We conclude that multi-function prediction for IDRs is feasible and benefits from the results produced by current residue-level function predictors; however, it has to accommodate inaccuracy in functional annotations.
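A region-level classifier needs fixed-length features per IDR. One simple way to build them, sketched here under assumptions, is to summarize each residue-level predictor over the region (mean score and fraction of residues above a cutoff) and append the region's amino acid composition. The predictor names, the 0.5 cutoff, and the choice of summary statistics are hypothetical, not the features used in the study; a downstream multi-label classifier would consume the resulting dictionary.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def region_features(sequence, residue_scores, start, end, cutoff=0.5):
    """Turn residue-level function predictions into features for one disordered region.

    sequence: protein sequence; residue_scores: predictor name -> per-residue scores;
    the region is sequence[start:end] (0-based, end exclusive).
    """
    region = sequence[start:end]
    length = len(region)
    features = {}
    for name, scores in sorted(residue_scores.items()):
        window = scores[start:end]
        features[f"{name}_mean"] = sum(window) / length
        features[f"{name}_frac_above_cutoff"] = sum(s > cutoff for s in window) / length
    # Amino acid composition of the region itself.
    for aa in AMINO_ACIDS:
        features[f"aa_{aa}"] = region.count(aa) / length
    return features


seq = "MKKEEPSSGA"
scores = {
    "protein_binding": [0.9, 0.9, 0.2, 0.2, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1],
    "linker": [0.1] * 10,
}
feats = region_features(seq, scores, start=0, end=4)
```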
11
De novo ensemble modeling suggests that AP2-binding to disordered regions can increase steric volume of Epsin but not Eps15
N. Suhas Jagannathan1, Christopher W. V. Hogue2, Lisa Tucker-Kellogg3
1Duke-NUS Medical School; 2National University of Singapore, 600 Epic Way, Unit 345, San Jose, CA 95134; 3Cancer & Stem Cell Biology and Centre for Computational Biology, Duke-NUS Medical School
Lisa Tucker-Kellogg
Proteins with intrinsically disordered regions (IDRs) have large hydrodynamic radii compared with globular proteins of equivalent weight. Recent experiments showed that IDRs with large radii can create steric pressure to drive membrane curvature during clathrin-mediated endocytosis (CME). Epsin and Eps15 are two CME proteins with IDRs that contain multiple motifs for binding the adaptor protein AP2, but the impact of AP2-binding on these IDRs is unknown. Some IDRs acquire binding-induced function by forming a folded quaternary structure, but we hypothesize that the IDRs of Epsin and/or Eps15 acquire binding-induced function by increasing their steric volume. We explore this hypothesis in silico by generating conformational ensembles of the IDRs of Epsin (4 million structures) and Eps15 (3 million structures), then estimating the impact of AP2-binding on the radius of gyration (RG). Results show that the ensemble of Epsin IDR conformations that accommodate AP2 binding has a right-shifted distribution of RG (larger radii) compared with the unbound Epsin ensemble. In contrast, the ensemble of Eps15 IDR conformations has comparable RG distributions between AP2-bound and unbound states. We speculate that AP2 triggers the Epsin IDR to function through binding-induced expansion, which could increase steric pressure and membrane bending during CME.
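The radius of gyration compared across ensembles is R_g = sqrt(sum_i m_i |r_i - r_cm|^2 / sum_i m_i), where r_cm is the mass-weighted center. The sketch below computes it per conformation (equal weights unless masses are given) and summarizes an ensemble by its median; comparing the median or full distribution of a bound-compatible subset against the whole ensemble mirrors the comparison in the abstract. The two-bead "conformations" are hypothetical toy inputs.

```python
import math
import statistics


def radius_of_gyration(coords, masses=None):
    """R_g = sqrt(sum_i m_i |r_i - r_cm|^2 / sum_i m_i); equal weights if masses is None."""
    if masses is None:
        masses = [1.0] * len(coords)
    total = sum(masses)
    center = [sum(m * r[k] for m, r in zip(masses, coords)) / total for k in range(3)]
    spread = sum(m * sum((r[k] - center[k]) ** 2 for k in range(3)) for m, r in zip(masses, coords))
    return math.sqrt(spread / total)


def median_rg(ensemble):
    """Median R_g over an ensemble of conformations (each a list of (x, y, z) tuples)."""
    return statistics.median(radius_of_gyration(conf) for conf in ensemble)


# Hypothetical two-bead "conformations"; a bound-compatible subset would be compared the same way.
ensemble = [[(-1, 0, 0), (1, 0, 0)], [(-2, 0, 0), (2, 0, 0)], [(-3, 0, 0), (3, 0, 0)]]
print(median_rg(ensemble))  # 2.0
```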
12
Modulation of p53 Transactivation Domain Conformations by Ligand Binding and Cancer-Associated Mutations
Xiaorong Liu, Jianhan Chen
University of Massachusetts Amherst
Jianhan Chen
Intrinsically disordered proteins (IDPs) are important functional proteins, and their deregulation is linked to numerous human diseases, including cancers. Understanding how disease-associated mutations or drug molecules can perturb the sequence-disordered ensemble-function-disease relationship of IDPs remains challenging, because it requires detailed characterization of the heterogeneous structural ensembles of IDPs. In this work, we combine the latest atomistic force field, a99SB-disp, the enhanced sampling technique replica exchange with solute tempering, and GPU-accelerated molecular dynamics simulations to investigate how four cancer-associated mutations (K24N, N29K/N30D, D49Y, and W53G) and the binding of an anti-cancer molecule, epigallocatechin gallate (EGCG), modulate the disordered ensemble of the transactivation domain (TAD) of the tumor suppressor p53. Through extensive sampling, in excess of 1.0 μs per replica, well-converged structural ensembles of wild-type and mutant p53-TAD, as well as of WT p53-TAD in the presence of EGCG, were generated. The results reveal that the mutants could induce local structural changes and affect secondary structural properties. Interestingly, both EGCG binding and N29K/N30D could also induce long-range structural reorganizations and lead to more compact structures that could shield key binding sites of p53-TAD regulators. Further analysis reveals that the effects of EGCG binding are mainly achieved through nonspecific interactions. These observations are generally consistent with ongoing NMR studies and binding assays. Our studies suggest that induced conformational collapse of IDPs may be a general mechanism for shielding functional sites, thus inhibiting recognition of their targets. The current study also demonstrates that atomistic simulations provide a viable approach for studying the sequence-disordered ensemble-function-disease relationships of IDPs and for developing new drug design strategies targeting regulatory IDPs.
13
Exploring Relationships between the Density of Charged Tracts within Disordered Regions and Phase Separation
Ramiz Somjee1,2, Diana M. Mitrea1, Richard W. Kriwacki1,3
1St. Jude Children's Research Hospital; 2Rhodes College; 3University of Tennessee Health Sciences Center
Ramiz Somjee
Biomolecular condensates form through a process termed phase separation and play diverse roles throughout the cell. Proteins that undergo phase separation often have disordered regions that can engage in weak, multivalent interactions; however, our understanding of the sequence grammar that defines which proteins phase separate is far from complete. Here, we show that proteins that display a high density of charged tracts within intrinsically disordered regions are likely to be constituents of electrostatically organized biomolecular condensates. We scored the human proteome using an algorithm termed ABTdensity that quantifies the density of charged tracts, and observed that proteins with more charged tracts are enriched in particular Gene Ontology annotations and, based upon analysis of interaction networks, cluster into distinct biomolecular condensates. These results suggest that electrostatically driven, multivalent interactions involving charged tracts within disordered regions serve to organize certain biomolecular condensates through phase separation.
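The abstract does not define ABTdensity precisely. Purely as a hypothetical illustration of "density of charged tracts", the sketch below calls a tract any maximal run of at least four consecutive same-sign charged residues (D/E acidic, K/R basic) and reports tracts per 100 residues. Real charged tracts often tolerate interrupting residues, so this strict definition, the minimum length, and the normalization are all assumptions, not the published algorithm.

```python
ACIDIC, BASIC = set("DE"), set("KR")


def charged_tracts(seq, min_len=4):
    """Maximal runs of >= min_len consecutive same-sign charged residues.

    Returns (start, end, kind) tuples with end exclusive.
    """
    tracts = []
    run_start, run_kind = 0, None
    for i, aa in enumerate(seq + "X"):  # non-charged sentinel closes the final run
        kind = "acidic" if aa in ACIDIC else "basic" if aa in BASIC else None
        if kind != run_kind:
            if run_kind is not None and i - run_start >= min_len:
                tracts.append((run_start, i, run_kind))
            run_start, run_kind = i, kind
    return tracts


def tract_density(seq, min_len=4):
    """Charged tracts per 100 residues."""
    return 100.0 * len(charged_tracts(seq, min_len)) / len(seq)


region = "AAEEEEDGKKKKRSS"
print(charged_tracts(region))  # [(2, 7, 'acidic'), (8, 13, 'basic')]
```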
14
MUTATIONAL SIGNATURES
PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS
15
Mutational Signatures
PhySigs: Phylogenetic Inference of Mutational Signature Dynamics
Sarah Christensen1, Mark D. M. Leiserson2, Mohammed El-Kebir1
1University of Illinois at Urbana-Champaign, 2University of Maryland
Sarah Christensen
Distinct mutational processes shape the genomes of the clones comprising a tumor. These processes result in distinct mutational patterns, summarized by a small number of mutational signatures. Current analyses of clone-specific exposures to mutational signatures do not fully incorporate a tumor's evolutionary context, either inferring identical exposures for all tumor clones or inferring exposures for each clone independently. Here, we introduce the Tree-constrained Exposure problem to infer a small number of exposure shifts along the edges of a given tumor phylogeny. Our algorithm, PhySigs, solves this problem and includes model selection to identify the number of exposure shifts that best explain the data. We validate our approach on simulated data and identify exposure shifts in lung cancer data, including at least one shift with a matching subclonal driver mutation in the mismatch repair pathway. Moreover, we show that our approach enables the prioritization of alternative phylogenies inferred from the same sequencing data. PhySigs is publicly available at https://github.com/elkebir-group/PhySigs
16
TrackSigFreq: subclonal reconstructions based on mutation signatures and allele frequencies
Caitlin F. Harrigan1,2,4, Yulia Rubanova1,2,4, Quaid Morris1,2,3,4,5,6, Alina Selega2,4
1Department of Computer Science, University of Toronto, Toronto, Canada; 2Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Canada; 3Department of Molecular Genetics, University of Toronto, Toronto, Canada; 4Vector Institute, Toronto, Canada; 5Ontario Institute for Cancer Research, Toronto, Canada; 6Memorial Sloan Kettering Cancer Centre, New York, USA (pending)
Cait Harrigan
Mutational signatures are patterns of mutation types, many of which are linked to known mutagenic processes. Signature activity represents the proportion of mutations a signature generates. In cancer, cells may gain advantageous phenotypes through mutation accumulation, causing rapid growth of that subpopulation within the tumour. The presence of many subclones can make cancers harder to treat and have other clinical implications. Reconstructing changes in signature activities can give insight into the evolution of cells within a tumour. Recently, we introduced a new method, TrackSig, to detect changes in signature activities across time from a single bulk tumour sample. By design, TrackSig is unable to identify mutation populations with different frequencies but little to no difference in signature activity. Here we present an extension of this method, TrackSigFreq, which enables trajectory reconstruction based on both the observed density of mutation frequencies and changes in mutational signature activities. TrackSigFreq preserves the advantages of TrackSig, namely optimal and rapid mutation clustering through segmentation, while extending it so that it can identify distinct mutation populations that share similar signature activities.
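"Optimal ... clustering through segmentation" refers to finding the best set of change points along an ordered sequence. The generic penalized optimal-partitioning dynamic program below shows how exact optimality is obtained in O(n^2); it uses a simple squared-error cost per segment and a constant per-segment penalty, which are assumptions standing in for the TrackSig/TrackSigFreq likelihood (signature activities and, in TrackSigFreq, mutation frequency density). The input series is hypothetical.

```python
def optimal_partition(values, penalty):
    """Penalized optimal partitioning of a 1-D series (squared-error cost per segment).

    Minimizes sum(segment cost) + penalty * n_segments exactly by dynamic programming
    in O(n^2). Returns segments as (start, end) index pairs, end exclusive.
    """
    n = len(values)
    s1, s2 = [0.0] * (n + 1), [0.0] * (n + 1)  # prefix sums of x and x^2
    for i, v in enumerate(values):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def cost(a, b):
        """Squared error of values[a:b] around their mean."""
        total = s1[b] - s1[a]
        return (s2[b] - s2[a]) - total * total / (b - a)

    best = [0.0] + [float("inf")] * n  # best[t]: optimal cost of values[:t]
    last = [0] * (n + 1)               # start of the final segment in that optimum
    for t in range(1, n + 1):
        for s in range(t):
            candidate = best[s] + cost(s, t) + penalty
            if candidate < best[t]:
                best[t], last[t] = candidate, s
    segments, t = [], n
    while t > 0:
        segments.append((last[t], t))
        t = last[t]
    return segments[::-1]


# Hypothetical per-bin values along mutations ordered by cancer cell fraction.
series = [0.1, 0.2, 0.1, 0.9, 0.8, 0.9]
print(optimal_partition(series, penalty=0.1))  # [(0, 3), (3, 6)]
```

A larger penalty yields fewer segments; choosing it is a model-selection question in its own right.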
17
DNA Repair Footprint Uncovers Contribution of DNA Repair Mechanism to Mutational Signatures
Damian Wojtowicz1, Mark D. M. Leiserson2, Roded Sharan3, Teresa M. Przytycka1
1NIH, 2University of Maryland, 3Tel Aviv University
Teresa Przytycka
Cancer genomes accumulate a large number of somatic mutations resulting from imperfections of DNA processing during the normal cell cycle, as well as from carcinogenic exposures or cancer-related aberrations of the DNA maintenance machinery. These processes often lead to distinctive patterns of mutations, called mutational signatures. Several computational methods have been developed to uncover such signatures from catalogs of somatic mutations. However, cancer mutational signatures are the end effect of several interplaying factors, including carcinogenic exposures and potential deficiencies of the DNA repair mechanism. To fully understand the nature of each signature, it is important to disambiguate the atomic components that contribute to the final signature. Here, we introduce a new descriptor of mutational signatures, DNA Repair FootPrint (RePrint), and show that it can capture common properties of deficiencies in repair mechanisms contributing to diverse signatures. We validate the method with published mutational signatures from cell lines targeted with CRISPR-Cas9-based knockouts of DNA repair genes.
18
PATTERN RECOGNITION IN BIOMEDICAL DATA: CHALLENGES IN PUTTING BIG DATA TO WORK
PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS
19
Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work
Clinical Concept Embeddings Learned from Massive Sources of Multimodal Medical Data
Andrew L. Beam1, Benjamin Kompa2, Allen Schmaltz1, Inbar Fried3, Griffin Weber2, Nathan Palmer2, Xu Shi1, Tianxi Cai1, Isaac S. Kohane3
1Harvard T.H. Chan School of Public Health, 2Harvard Medical School, 3University of North Carolina School of Medicine
Benjamin Kompa
Word embeddings are a popular approach to unsupervised learning of word relationships and are widely used in natural language processing. In this article, we present a new set of embeddings for medical concepts learned using an extremely large collection of multimodal medical data. Leaning on recent theoretical insights, we demonstrate how an insurance claims database of 60 million members, a collection of 20 million clinical notes, and 1.7 million full-text biomedical journal articles can be combined to embed concepts into a common space, resulting in the largest ever set of embeddings for 108,477 medical concepts. To evaluate our approach, we present a new benchmark methodology based on statistical power, specifically designed to test embeddings of medical concepts. Our approach, called cui2vec, attains state-of-the-art performance relative to previous methods in most instances. Finally, we provide a downloadable set of pre-trained embeddings for other researchers to use, as well as an online tool for interactive exploration of the cui2vec embeddings.
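The abstract does not spell out the "recent theoretical insights"; one well-known result is that word2vec-style embeddings are closely related to factorizing a shifted pointwise mutual information (PMI) matrix. The generic sketch below builds embeddings from a concept co-occurrence matrix by clipping shifted PMI at zero and taking a truncated SVD. It is an illustration of that family of methods, not a claim about the exact cui2vec pipeline; the counts are hypothetical and every concept is assumed to co-occur at least once.

```python
import numpy as np


def ppmi_svd_embeddings(cooc, dim, shift=1.0):
    """Embed concepts by factorizing a (shifted) positive PMI matrix with SVD.

    cooc: symmetric concept-by-concept co-occurrence counts; every concept is
    assumed to co-occur at least once, so no row or column sum is zero.
    """
    cooc = np.asarray(cooc, dtype=float)
    total = cooc.sum()
    row = cooc.sum(axis=1, keepdims=True)
    col = cooc.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore"):
        pmi = np.log(cooc * total / (row * col))  # zero counts give -inf
    sppmi = np.maximum(pmi - np.log(shift), 0.0)  # clip negatives (and -inf) to 0
    u, s, _ = np.linalg.svd(sppmi)
    return u[:, :dim] * np.sqrt(s[:dim])


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Hypothetical counts: concepts 0/1 co-occur often, as do 2/3.
cooc = [[5, 10, 1, 1],
        [10, 5, 1, 1],
        [1, 1, 5, 10],
        [1, 1, 10, 5]]
emb = ppmi_svd_embeddings(cooc, dim=2)
```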
20
Assessment of Imputation Methods for Missing Gene Expression Data in Meta-Analysis of Distinct Cohorts of Tuberculosis Patients
Carly A. Bobak, Lauren McDonnell, Matthew D. Nemesure, Justin Lin, Jane E. Hill
Dartmouth College
Carly Bobak
The growth of publicly available repositories, such as the Gene Expression Omnibus, has allowed researchers to conduct meta-analyses of gene expression data across distinct cohorts. In this work, we assess eight imputation methods for their ability to impute gene expression data when values are missing across an entire cohort of tuberculosis (TB) patients. We investigate how varying proportions of missing data (across 10%, 20%, and 30% of patient samples) influence the imputation results, and test for significantly differentially expressed genes and enriched pathways in patients with active TB. Our results indicate that truncating to the genes observed in common across cohorts, which is the current method used by researchers, results in the exclusion of important biology, and suggest that LASSO and LLS imputation methodologies can reasonably impute genes across cohorts when total missingness rates are below 20%.
21
Towards identifying drug side effects from social media using active learning and crowdsourcing
Sophie Burkhardt, Julia Siekiera, Josua Glodde, Miguel A. Andrade-Navarro, Stefan Kramer
University of Mainz
Sophie Burkhardt
Motivation: Social media is a largely untapped source of information on side effects of drugs. Twitter in particular is widely used to report on everyday events and personal ailments. However, labeling this noisy data is a difficult problem because labeled training data is sparse and automatic labeling is error-prone. Crowdsourcing can help in such a scenario to obtain more reliable labels, but it is comparatively expensive because workers have to be paid. To remedy this, semi-supervised active learning may reduce the amount of labeled data needed and focus the manual labeling process on important information.
Results: We extracted data from Twitter using the public API. We subsequently used Amazon Mechanical Turk in combination with a state-of-the-art semi-supervised active learning method to label tweets with their associated drugs and side effects in two stages. Our results show that our method is an effective way of discovering side effects in tweets, with an improvement from 53% F-measure to 67% F-measure compared to a one-stage workflow. Additionally, we show the effectiveness of the active learning scheme in reducing the labeling cost in comparison to a non-active baseline.
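The abstract does not state the query strategy of the semi-supervised active learning method. As a generic illustration of how active learning spends a paid crowdsourcing budget, the sketch below uses entropy-based uncertainty sampling: the tweets whose predicted label distribution is most uncertain are sent to workers first. The tweet IDs and probabilities are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_labeling(predictions, budget):
    """Return the `budget` unlabeled tweets the current model is least certain about.

    predictions: tweet id -> predicted class probabilities (e.g. [side effect, no side effect]).
    """
    ranked = sorted(predictions, key=lambda tid: entropy(predictions[tid]), reverse=True)
    return ranked[:budget]


predictions = {"tweet_1": [0.95, 0.05], "tweet_2": [0.5, 0.5], "tweet_3": [0.6, 0.4]}
print(select_for_labeling(predictions, budget=2))  # ['tweet_2', 'tweet_3']
```

After each crowdsourcing round the model is retrained and the remaining pool re-scored, so the budget keeps flowing to the examples the model finds hardest.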
22
Microvascular Dynamics from 4D Microscopy Using Temporal Segmentation
Shir Gur, Lior Wolf, Lior Golgher, Pablo Blinder
Tel Aviv University
Lior Wolf
Recently developed methods for rapid, continuous volumetric two-photon microscopy facilitate the observation of neuronal activity in hundreds of individual neurons and of changes in blood flow in adjacent blood vessels across a large volume of living brain at unprecedented spatio-temporal resolution. However, the high imaging rate necessitates fully automated image analysis, whereas tissue turbidity and photo-toxicity limitations lead to extremely sparse and noisy imagery. In this work, we extend a recently proposed deep learning volumetric blood vessel segmentation network so that it supports temporal analysis. With this technology, we are able to track changes in cerebral blood volume over time and identify spontaneous arterial dilations that propagate towards the pial surface. This new capability is a promising step towards characterizing the hemodynamic response function upon which functional magnetic resonance imaging (fMRI) is based.
23
Using Transcriptional Signatures to Find Cancer Drivers with LURE
David Haan, Ruikang Tao, Verena Friedl, Ioannis N. Anastopoulos, Christopher K. Wong, Alana S. Weinstein, Joshua M. Stuart
Dept. of Biomolecular Engineering and UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064 USA
David Haan
Cancer genome projects have produced multidimensional datasets on thousands of samples. Yet, depending on the tumor type, 5-50% of samples have no known driving event. We introduce a semi-supervised method called Learning UnRealized Events (LURE) that uses a progressive label learning framework and minimum spanning analysis to predict cancer drivers based on their altered samples sharing a gene expression signature with the samples of a known event. We demonstrate the utility of the method on the TCGA Pan-Cancer Atlas dataset, for which it produced a high-confidence result relating 59 new connections to 18 known mutation events, including alterations in the same gene, family, and pathway. We give examples of predicted drivers involved in TP53, telomere maintenance, and MAPK/RTK signaling pathways. LURE identifies connections between genes with no known prior relationship, some of which may offer clues for targeting specific forms of cancer. Code and Supplemental Material are available on the LURE website: https://sysbiowiki.soe.ucsc.edu/lure.
24
PAGE-Net: Interpretable and Integrative Deep Learning for Survival Analysis Using Histopathological Images and Genomic Data
Jie Hao1, Sai Chandra Kosaraju2, Nelson Zange Tsaku3, Dae Hyun Song4, Mingon Kang2
1University of Pennsylvania, 2University of Nevada Las Vegas, 3Kennesaw State University, 4Gyeongsang National University Changwon Hospital
Jie Hao
The integration of multi-modal data, such as histopathological images and genomic data, is essential for understanding cancer heterogeneity and complexity for personalized treatments, as well as for enhancing survival prediction in cancer studies. Histopathology, as a clinical gold-standard tool for diagnosis and prognosis in cancers, allows clinicians to make precise decisions on therapies, whereas high-throughput genomic data have been investigated to dissect the genetic mechanisms of cancers. We propose a biologically interpretable deep learning model (PAGE-Net) that integrates histopathological images and genomic data, not only to improve survival prediction, but also to identify genetic and histopathological patterns that cause different survival rates in patients. PAGE-Net consists of pathology-, genome-, and demography-specific layers, each of which provides comprehensive biological interpretation. In particular, we propose a novel patch-wise texture-based convolutional neural network with a patch aggregation strategy to extract global survival-discriminative features, without manual annotation, for the pathology-specific layers. We adapted the pathway-based sparse deep neural network, named Cox-PASNet, for the genome-specific layers. The proposed deep learning model was assessed with the histopathological images and the gene expression data of glioblastoma multiforme (GBM) from The Cancer Genome Atlas (TCGA) and The Cancer Imaging Archive (TCIA). PAGE-Net achieved a C-index of 0.702, which is higher than the results achieved with only histopathological images (0.509) and with Cox-PASNet (0.640). More importantly, PAGE-Net can simultaneously identify histopathological and genomic prognostic factors associated with patients' survival. The source code of PAGE-Net is publicly available at https://github.com/DataX-JieHao/PAGE-Net
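The C-index values compared above (0.702 vs. 0.640 vs. 0.509) measure how well predicted risks order patients by survival: 0.5 is chance ranking and 1.0 is perfect ordering. A minimal sketch of Harrell's C follows; it skips pairs with tied survival times, a simplification, and the times, event flags, and risks are hypothetical.

```python
def harrell_c_index(times, events, risks):
    """Harrell's C: fraction of comparable pairs whose predicted risks are correctly ordered.

    A pair (i, j) is comparable when i had an observed event and times[i] < times[j].
    Tied risks count as half-concordant; tied times are skipped in this sketch.
    """
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable


# Hypothetical survival times (months), event indicators (1 = death observed), and risks.
print(harrell_c_index([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.7, 0.8, 0.1]))  # 0.8
```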
25
Machine learning algorithms for simultaneous supervised detection of peaks in multiple samples and cell types
Toby Dylan Hocking1, Guillaume Bourque2
1Northern Arizona University, 2McGill University
Toby Hocking
Joint peak detection is a central problem when comparing samples in epigenomic data analysis, but current algorithms for this task are unsupervised and limited to at most two sample types. We propose PeakSegPipeline, a new genome-wide multi-sample peak calling pipeline for epigenomic datasets. It performs peak detection using a constrained maximum likelihood segmentation model with essentially only one free parameter that needs to be tuned: the number of peaks. To select the number of peaks, we propose to learn a penalty function based on user-provided labels that indicate genomic regions with or without peaks in specific samples. In comparisons with state-of-the-art peak detection algorithms, PeakSegPipeline achieves similar or better accuracy, and a more interpretable model with overlapping peaks that occur in exactly the same positions across all samples. Our novel approach is able to learn that predicted peak sizes vary by experiment type.
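Selecting the number of peaks with a penalty means choosing the model size k that minimizes loss(k) + penalty * k, and learning the penalty from labels means preferring penalties whose selected models contradict the fewest user labels. The sketch below is a simplification: it learns a single constant penalty from a candidate grid, whereas the abstract describes learning a penalty function (which can vary with sample features). The loss and label-error tables are hypothetical inputs.

```python
def select_n_peaks(losses, penalty):
    """Number of peaks minimizing loss + penalty * n_peaks.

    losses: n_peaks -> minimum segmentation loss for that model size (hypothetical values).
    """
    return min(losses, key=lambda k: losses[k] + penalty * k)


def learn_constant_penalty(problems, candidates):
    """Pick the candidate penalty with the fewest label errors over all labeled problems.

    problems: list of (losses, label_errors), where label_errors maps n_peaks to the
    number of user labels (peak / no-peak regions) that model size gets wrong.
    """
    def total_errors(penalty):
        return sum(errors[select_n_peaks(losses, penalty)] for losses, errors in problems)
    return min(candidates, key=total_errors)


losses = {0: 100.0, 1: 40.0, 2: 35.0, 3: 33.0}
label_errors = {0: 2, 1: 0, 2: 1, 3: 2}
best = learn_constant_penalty([(losses, label_errors)], candidates=[1.0, 10.0, 100.0])
print(best, select_n_peaks(losses, best))  # 10.0 1
```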
26
Graph-based information diffusion method for prioritizing functionally related genes in protein-protein interaction networks
Minh Pham, Olivier Lichtarge
Baylor College of Medicine
Minh Pham
Shortest path length methods are routinely used to validate whether genes of interest are functionally related to each other based on biological network information. However, these methods are computationally intensive, impeding extensive use of network information. In addition, the non-weighted shortest path length approach, which is more frequently used, often treats all network connections equally without taking into account the confidence levels of the associations. On the other hand, the graph-based information diffusion method, which employs both the presence and the confidence weights of network edges, can efficiently explore large networks and has previously detected meaningful biological patterns. Therefore, in this study, we hypothesized that the graph-based information diffusion method could prioritize genes with relevant functions more efficiently and accurately than shortest path length approaches. We demonstrated that the graph-based information diffusion method substantially differentiated from random not only genes participating in the same biological pathways (p << 0.0001) but also genes associated with specific human drug-induced clinical symptoms (p << 0.0001). Furthermore, the diffusion method prioritized these functionally related genes faster and more accurately than the shortest path length approaches (pathways: p = 2.7e-28; clinical symptoms: p = 0.032). These data show that the graph-based information diffusion method can be routinely used for robust prioritization of functionally related genes, facilitating efficient network validation and hypothesis generation, especially for human phenotype-specific genes.
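Random walk with restart is one common form of graph-based information diffusion; the abstract does not say that this exact formulation was used. The sketch shows the key contrast with non-weighted shortest paths: seed genes inject score, and each node passes score to neighbors in proportion to edge confidence, so a low-confidence edge transmits little. The network, restart probability, and iteration count are hypothetical.

```python
def random_walk_with_restart(weights, seeds, restart=0.5, n_iter=100):
    """Diffuse seed-gene scores over a confidence-weighted network.

    weights: node -> {neighbor: edge confidence}, assumed symmetric.
    Iterates p <- (1 - restart) * (flow along normalized edges) + restart * p0.
    """
    nodes = sorted(weights)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    strength = {n: sum(weights[n].values()) for n in nodes}
    p = dict(p0)
    for _ in range(n_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for n in nodes:
            if strength[n] == 0:
                continue  # isolated node: its mass is dropped in this simple sketch
            for m, w in weights[n].items():
                # Each node splits its mass in proportion to edge confidence.
                nxt[m] += (1 - restart) * p[n] * w / strength[n]
        p = nxt
    return p


# Hypothetical network: the C-D edge has low confidence.
network = {
    "A": {"B": 1.0},
    "B": {"A": 1.0, "C": 1.0},
    "C": {"B": 1.0, "D": 0.1},
    "D": {"C": 0.1},
}
scores = random_walk_with_restart(network, seeds={"A"})
ranking = sorted(scores, key=scores.get, reverse=True)
```

Each iteration touches every edge once, which is what makes diffusion cheap on large networks compared with computing many shortest paths.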
27
A Literature-Based Knowledge Graph Embedding Method for Identifying Drug Repurposing Opportunities in Rare Diseases
Daniel N. Sosa, Alexander Derry, Margaret Guo, Eric Wei, Connor Brinton, Russ B. Altman
Stanford University
Daniel Sosa
Millions of Americans are affected by rare diseases, many of which have poor survival rates. However, the small market size of individual rare diseases, combined with the time and capital requirements of pharmaceutical R&D, has hindered the development of new drugs for these cases. A promising alternative is drug repurposing, whereby existing FDA-approved drugs might be used to treat diseases different from their original indications. In order to generate drug repurposing hypotheses in a systematic and comprehensive fashion, it is essential to integrate information from across the literature of pharmacology, genetics, and pathology. To this end, we leverage a newly developed knowledge graph, the Global Network of Biomedical Relationships (GNBR). GNBR is a large, heterogeneous knowledge graph comprising drug, disease, and gene (or protein) entities linked by a small set of semantic “themes” derived from the abstracts of biomedical literature. We apply a knowledge graph embedding method that explicitly models the uncertainty associated with literature-derived relationships and uses link prediction to generate drug repurposing hypotheses. This approach achieves high performance on a gold-standard test set of known drug indications (AUROC = 0.89) and is capable of generating novel repurposing hypotheses, which we independently validate using external literature sources and protein interaction networks. Finally, we demonstrate the ability of our model to produce explanations of its predictions.
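An AUROC such as the 0.89 reported for known drug indications can be read as the probability that a randomly chosen true indication receives a higher link-prediction score than a randomly chosen non-indication. The pairwise sketch below computes exactly that (ties count one half); the scores are hypothetical, and for large test sets a rank-based formula avoids the quadratic loop.

```python
def auroc(positive_scores, negative_scores):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical link-prediction scores for known indications vs. non-indications.
known = [0.9, 0.8, 0.4]
unknown = [0.7, 0.3, 0.2]
print(round(auroc(known, unknown), 3))  # 0.889
```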
28
Two-stage ML Classifier for Identifying Host Protein Targets of the Dengue Protease
Jacob T. Stanley, Alison R. Gilchrist, Alex C. Stabell, Mary A. Allen, Sara L. Sawyer, Robin D. Dowell
Department of Molecular, Cellular and Developmental Biology; BioFrontiers Institute; University of Colorado Boulder (all authors have the same affiliation)
Jacob Stanley
Flaviviruses such as dengue encode a protease that is essential for viral replication. The protease functions by cleaving well-conserved positions in the viral polyprotein. In addition to the viral polyprotein, the dengue protease cleaves at least one host protein involved in the immune response. This raises the question: what other host proteins are targeted and cleaved? Here we present a new computational method for identifying putative host protein targets of the dengue virus protease. Our method relies on biochemical and secondary structure features at the known cleavage sites in the viral polyprotein in a two-stage classification process to identify putative cleavage targets. The accuracy of our predictions scaled inversely with evolutionary distance when we applied our method to the known cleavage sites of several other flaviviruses, a good indication of the validity of our predictions. Ultimately, our classifier identified 257 human protein sites possessing both a similar target motif and accessible local structure. These proteins are promising candidates for further investigation. As the number of viral sequences expands, our method could be adopted to predict host targets of other flaviviruses.
29
Enhancing Model Interpretability and Accuracy for Disease Progression Prediction via Phenotype-Based Patient Similarity Learning
Yue Wang1, Tong Wu1,2, Yunlong Wang1, Gao Wang3
1IQVIA Inc., 2University of Minnesota, 3University of Chicago
Yue Wang
Models have been proposed to extract temporal patterns from longitudinal electronic health records (EHRs) for clinical predictive modeling. However, the common relations among patients (e.g., receiving the same medical treatments) have rarely been considered. In this paper, we propose to learn patient similarity features as phenotypes from the aggregated patient-medical service matrix using non-negative matrix factorization. On real-world medical claims data, we show that the learned phenotypes are coherent within each group, as well as explanatory and indicative of the targeted diseases. We conducted experiments to predict diagnoses for chronic lymphocytic leukemia (CLL) patients. Results show that the phenotype-based similarity features can improve prediction over multiple baselines, including logistic regression, random forest, convolutional neural networks, and more.
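Non-negative matrix factorization approximates the patient-by-service matrix V as W H with W, H >= 0; rows of H can be read as phenotypes (groups of co-occurring services) and rows of W as each patient's loading on them. The sketch uses the standard Lee-Seung multiplicative updates for the Frobenius objective; the abstract does not say which NMF solver was used, the matrix is hypothetical, and deriving similarity features by row-normalizing W is an illustrative assumption.

```python
import numpy as np


def nmf(V, rank, n_iter=500, seed=0, eps=1e-10):
    """Lee-Seung multiplicative updates minimizing ||V - W H||_F with W, H >= 0."""
    rng = np.random.default_rng(seed)
    n_rows, n_cols = V.shape
    W = rng.random((n_rows, rank)) + eps
    H = rng.random((rank, n_cols)) + eps
    for _ in range(n_iter):
        # Multiplicative steps keep every entry non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Hypothetical patients x medical-service counts (two obvious service groups).
V = np.array([[3, 3, 0, 0],
              [2, 2, 0, 0],
              [0, 0, 1, 4],
              [0, 0, 2, 8]], dtype=float)
W, H = nmf(V, rank=2)
# Row-normalized W gives each patient's membership across the learned phenotypes.
membership = W / W.sum(axis=1, keepdims=True)
```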
30
PRECISION MEDICINE: ADDRESSING THE CHALLENGES OF SHARING, ANALYSIS, AND PRIVACY AT SCALE
PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS
31
Precision medicine: addressing the challenges of sharing, analysis, and privacy at scale
Integrated Cancer Subtyping using Heterogeneous Genome-Scale Molecular Datasets
Suzan Arslanturk1, Sorin Draghici1, Tin Nguyen2
1Wayne State University, 2University of Nevada
Sorin Draghici
Vast repositories of heterogeneous data from existing sources present unique opportunities. Taken individually, each of the datasets offers solutions to important domain- and source-specific questions. Collectively, they represent complementary views of related data entities, with an aggregate information value that often well exceeds the sum of their parts. Integration of heterogeneous data is therefore paramount to i) obtain a more unified picture and comprehensive view of the relations, ii) achieve more robust results, iii) improve accuracy and integrity, and iv) illuminate the complex interactions among data features. In this paper, we propose a data integration methodology to identify subtypes of cancer using multiple data types (mRNA, methylation, microRNA and somatic variants) and different data scales that come from different platforms (microarray, sequencing, etc.). The Cancer Genome Atlas (TCGA) dataset is used to build the data integration and cancer subtyping framework. The proposed data integration and disease subtyping approach accurately identifies novel subgroups of patients with significantly different survival profiles. With the current availability of vast genomic and variant data for cancer, the proposed data integration system will better differentiate cancer and patient subtypes for risk and outcome prediction and targeted treatment planning, without additional cost or loss of precious time.
32
Assessment of coverage for endogenous metabolites and exogenous chemical compounds using an untargeted metabolomics platform
Sek Won Kong1, Carles Hernandez-Ferrer2
1Computational Health Informatics Program, Boston Children’s Hospital, 300 Longwood Avenue, Boston, MA 02115, USA; 2Department of Pediatrics, Harvard Medical School, Boston, MA 02115, USA
Sek Won Kong
Physiological status and pathological changes in an individual can be captured by the metabolic state, which reflects the influence of both genetic variants and environmental factors such as diet, lifestyle and the gut microbiome. The totality of environmental exposure throughout a lifetime (i.e., the exposome) is difficult to measure with current technologies. However, targeted measurement of exogenous chemicals and untargeted profiling of endogenous metabolites have been widely used to discover biomarkers of pathophysiologic changes and to understand the functional impacts of genetic variants. To investigate the coverage of chemical space and interindividual variation related to demographic and pathological conditions, we profiled 169 plasma samples using an untargeted metabolomics platform. On average, 1,009 metabolites were quantified in each individual (range 906 to 1,038) out of 1,244 total chemical compounds detected in our cohort. Of note, age was positively correlated with the total number of detected metabolites in both males and females. Using the robust Qn estimator, we found metabolite outliers in each sample (mean 22, range 7 to 86). A total of 50 metabolites were outliers in a patient with phenylketonuria, including those known to belong to the phenylalanine pathway, suggesting that multiple metabolic pathways were perturbed in this patient. The largest number of outliers (N = 86) was found in a 5-year-old boy with alpha-1-antitrypsin deficiency who was waiting for liver transplantation due to cirrhosis. Xenobiotics, including drugs, diets and environmental chemicals, were significantly correlated with diverse endogenous metabolites, and the use of antibiotics significantly changed gut microbial products detected in host circulation. Several challenges, such as annotation of features, reference ranges and variance for each feature per age group and gender, and population-scale reference datasets, need to be addressed; however, untargeted metabolomics could be immediately deployed as a biomarker discovery platform and to evaluate the impact of genomic variants and exposures on metabolic pathways for some diseases.
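The abstract flags metabolite outliers with the robust Qn estimator of Rousseeuw and Croux. A minimal sketch of one plausible use, assuming each sample's level of one metabolite is compared with the cohort median in units of Qn; the cut-off of 3 and the toy levels are assumptions, and the small-sample correction factors of the published estimator are omitted.

```python
import statistics
from itertools import combinations


def qn_scale(xs):
    """Rousseeuw-Croux Qn scale estimate.

    Qn is the k-th smallest pairwise distance |x_i - x_j| (i < j), with
    h = n // 2 + 1 and k = h choose 2, multiplied by the Gaussian consistency
    constant 2.2219. The small-sample correction factors are omitted, and this
    O(n^2 log n) version is for illustration only.
    """
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two values")
    h = n // 2 + 1
    k = h * (h - 1) // 2
    diffs = sorted(abs(a - b) for a, b in combinations(xs, 2))
    return 2.2219 * diffs[k - 1]


def qn_outliers(xs, threshold=3.0):
    """Indices whose robust z-score |x - median| / Qn exceeds the threshold."""
    center = statistics.median(xs)
    scale = qn_scale(xs)
    if scale == 0:
        raise ValueError("Qn is zero; too many tied values")
    return [i for i, x in enumerate(xs) if abs(x - center) / scale > threshold]


# Hypothetical levels of one metabolite across ten samples.
levels = [10, 11, 9, 10, 12, 10, 11, 9, 10, 40]
print(qn_outliers(levels))  # [9]
```

Unlike the standard deviation, Qn is barely moved by the single extreme value, so the outlier does not mask itself. Repeating this per metabolite and counting flags per sample would give per-sample outlier totals of the kind the abstract reports.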
33
Coverage profile correction of shallow-depth circulating cell-free DNA sequencing via multi-distance learning
Nicholas B. Larson, Melissa C. Larson, Jie Na, Carlos P. Sosa, Chen Wang, Jean-Pierre Kocher, Ross Rowsey
Mayo Clinic College of Medicine and Sciences
Nicholas Larson
Shallow-depth whole-genome sequencing (WGS) of circulating cell-free DNA (ccfDNA) is a popular approach for non-invasive genomic screening assays, including liquid biopsy for early detection of invasive tumors as well as non-invasive prenatal screening (NIPS) for common fetal trisomies. In contrast to nuclear DNA WGS, ccfDNA WGS exhibits extensive inter- and intra-sample coverage variability that is not fully explained by typical sources of variation in WGS, such as GC content. This variability may inflate false-positive and false-negative screening rates for copy-number alterations and aneuploidy, particularly if these features are present at a relatively low proportion of total sequenced content. Herein, we propose an empirically driven coverage correction strategy that leverages prior annotation information in a multi-distance learning context to improve within-sample coverage profile correction. Specifically, we train a weighted k-nearest neighbors-style method on ccfDNA WGS samples from non-pregnant female donors, and apply it to NIPS samples to evaluate coverage profile variability reduction. We additionally characterize improvement in the discrimination of positive fetal trisomy cases relative to normal controls, and compare our results against a more traditional regression-based approach to coverage profile correction based on GC content and mappability. Under cross-validation, performance measures indicated a benefit to combining the two feature sets relative to either in isolation. We also observed substantial improvement in coverage profile variability reduction in held-out clinical NIPS samples, with variability reduced by 26.5% to 53.5% relative to the standard regression-based method, as quantified by median absolute deviation. Finally, we observed improved discrimination of screen-positive trisomy cases: the method reduced ccfDNA WGS coverage variability while also improving NIPS trisomy screening assay performance. Overall, our results indicate that machine learning approaches can substantially improve ccfDNA WGS coverage profile correction and downstream analyses.
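To make the idea of a weighted k-nearest-neighbors coverage correction concrete, here is a heavily simplified sketch. It assumes a single Euclidean distance over hypothetical per-bin features and takes neighbors from the same sample with no training step; the authors' multi-distance learning and their training on reference donor samples are not reproduced. Variability is scored with median absolute deviation (MAD), the measure the abstract uses.

```python
import math
import statistics


def knn_correct(coverage, features, k=3, eps=1e-9):
    """Within-sample coverage correction with a weighted k-nearest-neighbor model.

    For each bin, the expected coverage is the inverse-distance-weighted mean
    coverage of the k bins closest to it in feature space (the bin itself is
    excluded). Returns observed / expected, so unremarkable bins sit near 1.
    """
    corrected = []
    for i, fi in enumerate(features):
        nearest = sorted(
            (math.dist(fi, fj), j) for j, fj in enumerate(features) if j != i
        )[:k]
        weights = [1.0 / (d + eps) for d, _ in nearest]
        expected = sum(w * coverage[j] for w, (_, j) in zip(weights, nearest)) / sum(weights)
        corrected.append(coverage[i] / expected)
    return corrected


def mad(xs):
    """Median absolute deviation from the median."""
    m = statistics.median(xs)
    return statistics.median([abs(x - m) for x in xs])


# Hypothetical bins described by one feature (GC fraction) whose read counts
# track that feature.
features = [(0.30,), (0.31,), (0.32,), (0.50,), (0.51,), (0.52,)]
coverage = [100, 102, 98, 150, 148, 152]
raw = [c / statistics.median(coverage) for c in coverage]  # median-normalized
corrected = knn_correct(coverage, features, k=2)
```

Normalizing the raw counts by their median puts both profiles on a ratio scale, so `mad(raw)` and `mad(corrected)` can be compared directly; in this toy case the feature-driven trend is largely removed.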
34
PGxMine: Text mining for curation of PharmGKB
Jake Lever1, Julia M. Barbarino2, Li Gong2, Rachel Huddart2, Katrin Sangkuhl2, Ryan Whaley2, Michelle Whirl-Carrillo2, Mark Woon2, Teri E. Klein2,3, Russ B. Altman1,2,3
1Department of Bioengineering, Stanford University, Stanford, CA, 94305; 2Department of Biomedical Data Science, Stanford University, Stanford, CA, 94305; 3Department of Medicine, Stanford University, Stanford, CA, 94305
Jake Lever
Precision medicine tailors treatment to individuals’ personal data, including differences in their genome. The Pharmacogenomics Knowledgebase (PharmGKB) provides highly curated information on the effect of genetic variation on drug response and side effects for a wide range of drugs. PharmGKB’s scientific curators triage, review and annotate a large number of papers each year, but the task is challenging. We present the PGxMine resource, a text-mined resource of pharmacogenomic associations from all accessible published literature, to assist in the curation of PharmGKB. We developed a supervised machine learning pipeline to extract associations between a variant (DNA and protein changes, star alleles and dbSNP identifiers) and a chemical. PGxMine covers 452 chemicals and 2,426 variants and contains 19,930 mentions of pharmacogenomic associations across 7,170 papers. An evaluation by PharmGKB curators found that 57 of the top 100 associations not found in PharmGKB led to 83 curatable papers, and a further 24 associations would likely lead to curatable papers through citations. The results can be viewed at https://pgxmine.pharmgkb.org/ and code can be downloaded at https://github.com/jakelever/pgxmine.
35
The power of dynamic social networks to predict individuals' mental health
Shikang Liu1, David Hachen1, Omar Lizardo2, Christian Poellabauer1, Aaron Striegel1, Tijana Milenkovic1
1University of Notre Dame, 2University of California Los Angeles
Shikang Liu
Precision medicine has received attention both in and outside the clinic. We focus on the latter, exploiting the relationship between individuals' social interactions and their mental health to predict one's likelihood of being depressed or anxious from rich dynamic social network data. Existing studies differ from our work in at least one aspect: they do not model social interaction data as a network; they do so but analyze static network data; they examine "correlation" between social networks and health without making any predictions; or they study other individual traits but not mental health. In a comprehensive evaluation, we show that our predictive model that uses dynamic social network data is superior to its static network as well as non-network equivalents when run on the same data.
36
Implementing a Cloud Based Method for Protected Clinical Trial Data Sharing
Gaurav Luthria, Qingbo Wang
Harvard University
Gaurav Luthria
Clinical trials generate a large amount of data that have been underutilized due to obstacles that prevent data sharing, including risks to patient privacy, data misrepresentation, and invalid secondary analyses. To address these obstacles, we developed a novel data sharing method that ensures patient privacy while also protecting the interests of clinical trial investigators. Our flexible and robust approach involves two components: (1) an advanced cloud-based querying language that allows users to test hypotheses without direct access to the real clinical trial data and (2) corresponding synthetic data for the query of interest that allow for exploratory research and model development. Both components can be modified by the clinical trial investigator depending on factors such as the type of trial or number of patients enrolled. To test the effectiveness of our system, we first implement a simple and robust permutation-based synthetic data generator. We then use the synthetic data generator coupled with our querying language to identify significant relationships among variables in a realistic clinical trial dataset.
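The abstract mentions a "simple and robust permutation-based synthetic data generator" without giving details. A minimal sketch, assuming each column is shuffled independently; the function name and the toy trial records are hypothetical.

```python
import random


def permutation_synthetic(rows, seed=None):
    """Build a synthetic table by shuffling every column independently.

    Each variable keeps its exact marginal distribution, but the links
    between variables within a row (and therefore any real patient's record)
    are broken.
    """
    rng = random.Random(seed)
    columns = [list(col) for col in zip(*rows)]
    for col in columns:
        rng.shuffle(col)
    return [list(row) for row in zip(*columns)]


# Hypothetical trial records: (age, arm, response)
trial = [[54, "drug", 1], [61, "placebo", 0], [47, "drug", 1], [70, "placebo", 0]]
synthetic = permutation_synthetic(trial, seed=7)
```

Because column-wise shuffling destroys cross-variable associations, data of this kind suit exploratory work and code development, while testing relationships presumably still needs the real data, which is the role the abstract assigns to the cloud-based querying language.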
37
Pathway and network embedding methods for prioritizing psychiatric drugs
Yash Pershad1, Margaret Guo2, Russ B. Altman3
1Stanford University Department of Bioengineering, 2Stanford University Biomedical Informatics Program, 3Stanford University Departments of Bioengineering, Genetics, & Medicine
Yash Pershad
One in five Americans experience mental illness, and roughly 75% of psychiatric prescriptions do not successfully treat the patient’s condition. Extensive evidence implicates genetic factors and signaling disruption in the pathophysiology of these diseases. Changes in transcription often underlie this molecular pathway dysregulation; individual patient transcriptional data can improve the efficacy of diagnosis and treatment. Recent large-scale genomic studies have uncovered shared genetic modules across multiple psychiatric disorders, providing an opportunity for an integrated multi-disease approach to diagnosis. Moreover, network-based models informed by gene expression can represent pathological biological mechanisms and suggest new genes for diagnosis and treatment. Here, we use patient gene expression data from multiple studies to classify psychiatric diseases, integrate knowledge from expert-curated databases and publicly available experimental data to create augmented disease-specific gene sets, and use these to recommend disease-relevant drugs. From the Gene Expression Omnibus, we extract expression data from 145 cases of schizophrenia, 82 cases of bipolar disorder, 190 cases of major depressive disorder, and 307 shared controls. We use pathway-based approaches to predict psychiatric disease diagnosis with a random forest model (78% accuracy) and derive important features to augment available drug and disease signatures. Using protein-protein interaction networks and embedding-based methods, we build a pipeline to prioritize treatments for psychiatric diseases that achieves a 3.4-fold improvement over a background model. Thus, we demonstrate that gene-expression-derived pathway features can diagnose psychiatric diseases and that molecular insights derived from this classification task can inform treatment prioritization for psychiatric diseases.
38
Robust-ODAL: Learning from heterogeneous health systems without sharing patient-level data
Jiayi Tong1, Rui Duan1, Ruowang Li1, Martijn J. Schuemie2, Jason H. Moore1, Yong Chen1
1University of Pennsylvania, 2Janssen Research and Development LLC
Jiayi Tong
Electronic Health Records (EHR) contain extensive patient data on various health outcomes and risk predictors, providing an efficient and wide-reaching source for health research. Integrated EHR data can provide a larger sample of the population to improve estimation and prediction accuracy. To overcome the obstacle of sharing patient-level data, distributed algorithms have been developed to conduct statistical analyses across multiple clinical sites by sharing only aggregated information. However, existing distributed algorithms often ignore the heterogeneity of data across sites, which leads to substantial bias when studying the association between outcomes and exposures. In this study, we propose a privacy-preserving and communication-efficient distributed algorithm that accounts for heterogeneity caused by a small number of clinical sites. We evaluated our algorithm through a systematic simulation study motivated by real-world scenarios and applied it to multiple claims datasets from the Observational Health Data Sciences and Informatics (OHDSI) network. The results showed that the proposed method performed better than the existing distributed algorithm ODAL and a meta-analysis method.
39
Computationally efficient, exact, covariate-adjusted genetic principal component analysis by leveraging individual marker summary statistics from large biobanks
Jack Wolf1, Martha Barnard1, Xueting Xia2, Nathan Ryder3, Jason Westra4, Nathan Tintle4
1St. Olaf College, 2Texas Tech University, 3Colorado State University, 4Dordt University
Nathan Tintle
The popularization of biobanks provides an unprecedented amount of genetic and phenotypic information that can be used to research the relationship between genetics and human health. Despite the opportunities these datasets provide, they also pose many problems associated with computational time and costs, data size and transfer, and privacy and security. The publication of summary statistics from these biobanks, and their use in a variety of downstream statistical analyses, alleviates many of these logistical problems. However, major questions remain about how to use summary statistics in all but the simplest downstream applications. Here, we present a novel approach that uses basic summary statistics (estimates from single-marker regressions on single phenotypes) to evaluate more complex phenotypes using multivariate methods. In particular, we present a covariate-adjusted method for conducting principal component analysis (PCA) using only biobank summary statistics. We validate exact formulas for this method through simulation, and we also provide a framework for estimation when specific summary statistics are not available. We apply our method to a real dataset of fatty acid and genomic data.
40
ARTIFICIAL INTELLIGENCE FOR ENHANCING CLINICAL MEDICINE
PROCEEDINGS PAPERS WITH POSTER PRESENTATIONS
41
Artificial Intelligence for Enhancing Clinical Medicine
Multiclass Disease Classification from Microbial Whole-Community Metagenomes
Saad Khan, Libusha Kelly
Albert Einstein College of Medicine
Saad Khan
The microbiome, the community of microorganisms living within an individual, is a promising avenue for developing non-invasive methods for disease screening and diagnosis. Here, we utilize 5,643 aggregated, annotated whole-community metagenomes to implement the first multiclass microbiome disease classifier of this scale, able to discriminate between 18 different diseases and healthy controls. We compared three different machine learning models: random forests, deep neural nets, and a novel graph convolutional architecture that exploits the graph structure of phylogenetic trees as its input. We show that the graph convolutional model outperforms deep neural nets in terms of accuracy (achieving 75% average test-set accuracy), receiver operating characteristics (92.1% average area under the ROC curve (AUC)), and precision-recall (50% average area under the precision-recall curve (AUPR)). Additionally, the convolutional net's performance complements that of the random forest, showing a lower propensity for Type-I errors (false positives), while the random forest makes fewer Type-II errors (false negatives). Lastly, we are able to achieve over 90% average top-3 accuracy across all of our models. Together, these results indicate that there are predictive, disease-specific signatures across microbiomes that can be used for diagnostic purposes.
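The abstract reports "top-3 accuracy" alongside ordinary accuracy. A short sketch of the metric as commonly defined: a prediction counts as correct if the true class is among the k highest-scoring classes. The scores and labels below are hypothetical.

```python
def top_k_accuracy(scores, labels, k=3):
    """Fraction of samples whose true class is among the k highest-scoring classes.

    scores: one list of per-class scores (e.g. predicted probabilities) per sample.
    labels: the true class index for each sample.
    """
    hits = 0
    for row, label in zip(scores, labels):
        top = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += label in top
    return hits / len(labels)


# Hypothetical scores over four classes (e.g. three diseases plus healthy).
scores = [[0.10, 0.50, 0.25, 0.15],
          [0.70, 0.10, 0.15, 0.05],
          [0.05, 0.02, 0.10, 0.83]]
labels = [1, 1, 0]
print(top_k_accuracy(scores, labels, k=1))  # 0.3333333333333333
print(top_k_accuracy(scores, labels, k=3))  # 1.0
```

With k = 1 this reduces to ordinary accuracy, which is why top-3 accuracy can sit well above the 75% figure when the true disease is usually among a short list of candidates.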
42
LitGen: Genetic Literature Recommendation Guided by Human Explanations
Allen Nie1, Arturo L. Pineda1, Matt W. Wright1, Hannah Wand1, Bryan Wulf1, Helio A. Costa1, Ronak Y. Patel2, Carlos D. Bustamante1, James Zou1
1Stanford University, 2Baylor College of Medicine
Allen Nie
As genetic sequencing costs decrease, the lack of clinical interpretation of variants has become the bottleneck in using genetics data. A major rate-limiting step in clinical interpretation is the manual curation of evidence in the genetic literature by highly trained biocurators. What makes curation particularly time-consuming is that the curator needs to identify papers that study variant pathogenicity using different types of approaches and evidence (e.g., biochemical assays or case-control analysis). In collaboration with the Clinical Genome Resource (ClinGen), the flagship NIH program for clinical curation, we propose the first machine learning system, LitGen, that can retrieve papers for a particular variant and filter them by the specific evidence types used by curators to assess pathogenicity. LitGen uses semi-supervised deep learning to predict the type of evidence provided by each paper. It is trained on papers annotated by ClinGen curators and systematically evaluated on new test data collected by ClinGen. LitGen further leverages rich human explanations and unlabeled data to gain a 7.9% to 12.6% relative performance improvement over models learned only on the annotated papers. It is a useful framework to improve clinical variant curation.
43
Multilevel Self-Attention Model and its Use on Medical Risk Prediction
Xianlong Zeng1,2, Yunyi Feng1,2, Soheil Moosavinasab2, Deborah Lin2, Simon Lin2, Chang Liu1
1School of Electrical Engineering and Computer Science, Ohio University, Athens, OH, USA; 2The Research Institute at Nationwide Children’s Hospital, Columbus, OH, USA
Xianlong Zeng
Various deep learning models have been developed for different healthcare predictive tasks using Electronic Health Records and have shown promising performance. In these models, medical codes are often aggregated into visit representations without considering their heterogeneity; e.g., the same diagnosis might imply different healthcare concerns with different procedures or medications. The visits are then often fed sequentially into deep learning models, such as recurrent neural networks, without considering the irregular temporal information and dependencies among visits. To address these limitations, we developed a Multilevel Self-Attention Model (MSAM) that can capture the underlying relationships between medical codes and between medical visits. We compared MSAM with various baseline models on two predictive tasks, i.e., future disease prediction and future medical cost prediction, with two large datasets, i.e., MIMIC-3 and PFK. In the experiments, MSAM consistently outperformed baseline models. Additionally, for future medical cost prediction, we used disease prediction as an auxiliary task, which not only guides the model to achieve a stronger and more stable financial prediction, but also allows managed care organizations to provide better care coordination.
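MSAM applies self-attention at more than one level (between codes and between visits). As an illustration of the building block only, here is single-head scaled dot-product self-attention with identity projections. Learned query/key/value weights, multiple heads, and however MSAM encodes irregular time intervals are all omitted; the code embeddings are hypothetical.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Single-head scaled dot-product self-attention over a list of vectors.

    Uses identity projections (queries = keys = values = x) for brevity, so
    each output vector is a softmax-weighted mix of all inputs, weighted by
    dot-product similarity to the query vector divided by sqrt(d).
    """
    d = len(x[0])
    out = []
    for q in x:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out


# Hypothetical embeddings for three medical codes recorded in one visit.
codes = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
mixed = self_attention(codes)
```

Each code's output representation depends on the other codes in the same visit, which is the property the abstract contrasts with simply summing codes into a visit vector.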
44
Identifying Transitional High Cost Users from Unstructured Patient Profiles Written by Primary Care Physicians
Haoran Zhang1,2,3, Elisa Candido3, Andrew S. Wilton3, Raquel Duchen3, Liisa Jaakkimainen3, Walter Wodchis3,4,5, Quaid Morris1,2,6,7
1Department of Computer Science, University of Toronto; 2Vector Institute for Artificial Intelligence, Toronto, Ontario, Canada; 3ICES, Toronto, Ontario, Canada; 4Institute of Health Policy, Management, and Evaluation, University of Toronto; 5Institute for Better Health, Trillium Health Partners, Mississauga, Ontario, Canada; 6Terrence Donnelly Center for Cellular and Biomolecular Research, University of Toronto; 7Department of Molecular Genetics, University of Toronto
Haoran Zhang
Identification of, and subsequent intervention for, patients at risk of becoming High Cost Users (HCUs) presents the opportunity to improve outcomes while also providing significant savings for the healthcare system. In this paper, the 2016 HCU status of patients was predicted using free-form text data from the 2015 cumulative patient profiles within the electronic medical records of family care practices in Ontario. These unstructured notes make substantial use of domain-specific spellings and abbreviations; we show that word embeddings derived from the same context provide more informative features than pre-trained ones based on Wikipedia, MIMIC, and PubMed. We further demonstrate that a model using features derived from aggregated word embeddings (EmbEncode) provides a significant performance improvement over the bag-of-words representation (82.48 ± 0.35% versus 81.85 ± 0.36% held-out AUROC, p = 3.2E-4), using far fewer input features (5,492 versus 214,750) and fewer non-zero coefficients (1,177 versus 4,284). The future HCUs of greatest interest are the transitional ones who are not already HCUs, because they provide the greatest scope for interventions. Predicting these new HCUs is challenging because most HCUs recur. We show that removing recurrent HCUs from the training set improves the ability of EmbEncode to predict new HCUs, while only slightly decreasing its ability to predict recurrent ones.
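The abstract does not spell out how EmbEncode aggregates word embeddings. A minimal sketch of one common aggregation, mean pooling, which shows why embedding-based features can be far fewer than bag-of-words features: the output length is the embedding dimension, not the vocabulary size. The tokens, vectors, and function name are hypothetical, and EmbEncode itself may aggregate differently.

```python
def embed_note(tokens, embeddings):
    """Mean-pool the vectors of in-vocabulary tokens into one feature vector.

    The result has the embedding dimension regardless of vocabulary size;
    a note with no known tokens maps to the zero vector.
    """
    dim = len(next(iter(embeddings.values())))
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(vec[d] for vec in known) / len(known) for d in range(dim)]


# Hypothetical 2-d embeddings for two clinical abbreviations.
emb = {"chf": [1.0, 0.0], "copd": [0.0, 1.0]}
print(embed_note(["hx", "chf", "copd"], emb))  # [0.5, 0.5]
```

Training the embeddings on the same notes, as the abstract recommends, lets idiosyncratic abbreviations such as the hypothetical "chf" here receive vectors that general-domain corpora would not supply.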
45
Obtaining dual-energy computed tomography (CT) information from a single-energy CT image for quantitative imaging analysis of living subjects by using deep learning
Wei Zhao1, Tianling Lv2, Rena Lee3, Yang Chen2, Lei Xing1
1Stanford University, 2Southeast University, 3Ewha Womans University
Lei Xing
Computed tomography (CT) is a fundamental imaging modality for generating cross-sectional views of internal anatomy in a living subject or interrogating the material composition of an object, and it has been routinely used in clinical applications and nondestructive testing. In a standard CT image, pixels having the same Hounsfield Units (HU) can correspond to different materials, and it is therefore challenging to differentiate and quantify materials. Dual-energy CT (DECT) is desirable for differentiating multiple materials, but costly DECT scanners are not as widely available as single-energy CT (SECT) scanners. Recent advances in deep learning provide an enabling tool to map images between different modalities with incorporated prior knowledge. Here we develop a deep learning approach to perform DECT imaging using standard SECT data. The end point of the approach is a model capable of providing the high-energy CT image for a given input low-energy CT image. The feasibility of the deep learning-based DECT imaging method using SECT data is demonstrated using contrast-enhanced DECT images and evaluated using clinically relevant indexes. This work opens new opportunities for numerous DECT clinical applications with standard SECT data and may enable significantly simplified hardware design, as well as reduced scanning dose and imaging cost, for future DECT systems.
46
INTRINSICALLY DISORDERED PROTEINS (IDPS) AND THEIR FUNCTIONS
PROCEEDINGS PAPERS WITH POSTER PRESENTATIONS
47
Intrinsically Disordered Proteins (IDPs) and Their Functions
Many-to-one binding by intrinsically disordered protein regions
Wei-Lun Alterovitz1*, Eshel Faraggi1,2,3*, Christopher J. Oldfield1, Jingwei Meng1, Bin Xue1, Fei Huang1, Pedro Romero1, Andrzej Kloczkowski2, Vladimir N. Uversky1, A. Keith Dunker1
1Center for Computational Biology and Bioinformatics, Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, 410 W. 10th St, HS 5000, Indianapolis, IN 46202, USA; 2Battelle Center for Mathematical Medicine, and the Nationwide Children’s Hospital, Department of Pediatrics, The Ohio State University, Columbus, OH 43210, USA; 3Research and Information Systems, LLC, 1620 E. 72nd St., Indianapolis, IN 46240 USA
*Contributed equally
Keith Dunker
Disordered binding regions (DBRs), which are embedded within intrinsically disordered proteins or regions (IDPs or IDRs), enable IDPs or IDRs to mediate multiple protein-protein interactions. We collected DBR-protein complexes from the Protein Data Bank in which two or more DBRs having different amino acid sequences bind to the same (100% sequence identical) globular protein partner, a type of interaction herein called many-to-one binding. Two distinct binding profiles were identified: independent and overlapping. For the overlapping binding profiles, the distinct DBRs interact by means of almost identical binding sites (herein called “similar”), or the binding sites contain both common and divergent interaction residues (herein called “intersecting”). Further analysis of the sequence and structural differences among these three groups indicates how IDP flexibility allows different segments to adjust to similar, intersecting, and independent binding pockets.
48
MUTATIONAL SIGNATURES
PROCEEDINGS PAPERS WITH POSTER PRESENTATIONS
49
Mutational Signatures
Impact of mutational signatures on microRNA and their response elements
Eirini Stamoulakatou1, Pietro Pinoli1, Stefano Ceri1, Rosario Piro2
1Politecnico di Milano, 2Freie Universität Berlin
Eirini Stamoulakatou
MicroRNAs are a class of small non-coding RNA molecules of great importance for regulating a large number of diverse biological processes in health and disease, mostly by binding to complementary microRNA response elements (MREs) on protein-coding messenger RNAs and other non-coding RNAs and subsequently inducing their degradation. A growing body of evidence indicates that the dysregulation of certain microRNAs may either drive or suppress oncogenesis. The seed region of a microRNA is of crucial importance for its target recognition. Mutations in these seed regions may disrupt the binding of microRNAs to their target genes. In this study, we investigate the theoretical impact of cancer-associated mutagenic processes and their mutational signatures on microRNA seeds and their MREs. To our knowledge, this is the first study that provides a probabilistic framework for microRNA and MRE sequence alteration analysis based on mutational signatures and that computationally assesses the disruptive impact of mutational signatures on human microRNA–target interactions.
50
Genome Gerrymandering: optimal division of the genome into regions with cancer type-specific differences in mutation rates
Adamo Young, Jacob Chmura, Yoonsik Park, Quaid Morris, Gurnit Atwal
University of Toronto
Adamo Young
The activity of mutational processes differs across the genome and is influenced by chromatin state and spatial genome organization. At the scale of one megabase-pair (Mb), regional mutation density correlates strongly with chromatin features, and mutation density at this scale can be used to accurately identify cancer type. Here, we explore the relationship between genomic region and mutation rate by developing an information theory-driven, dynamic programming algorithm for dividing the genome into regions with differing relative mutation rates between cancer types. Our algorithm improves mutual information when compared to the naive approach, effectively reducing the average number of mutations required to identify cancer type. Our approach provides an efficient method for associating regional mutation density with mutation labels, and has future applications in exploring the role of somatic mutations in a number of diseases.
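The abstract describes an information theory-driven dynamic programming algorithm for dividing the genome into regions. A minimal sketch, assuming the objective is the mutual information I(region; cancer type) of pooled mutation counts and that the number of regions k is fixed in advance. Both are assumptions, and the authors' formulation may differ. Because p(c) is set by genome-wide totals, the mutual information is a sum of independent per-region terms, which is what makes an exact O(k n^2 C) dynamic program possible. The counts and function names are hypothetical.

```python
import math


def segment_mi(seg_counts, type_totals, grand):
    """One segment's term of I(segment; cancer type):
    sum over types c of p(s, c) * log(p(s, c) / (p(s) * p(c)))."""
    seg_total = sum(seg_counts)
    score = 0.0
    for n_sc, n_c in zip(seg_counts, type_totals):
        if n_sc > 0:
            score += (n_sc / grand) * math.log(n_sc * grand / (seg_total * n_c))
    return score


def best_partition(counts, k):
    """Split consecutive bins into k contiguous regions maximizing mutual information.

    counts[i][c] is the number of mutations in bin i from cancer type c.
    Returns (mutual_information, start_index_of_each_region).
    """
    n, n_types = len(counts), len(counts[0])
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= number of bins")
    type_totals = [sum(row[c] for row in counts) for c in range(n_types)]
    grand = sum(type_totals)
    prefix = [[0] * n_types]  # prefix[i][c]: type-c mutations in bins[0:i]
    for row in counts:
        prefix.append([p + r for p, r in zip(prefix[-1], row)])

    def score(i, j):  # region covering bins i .. j-1
        return segment_mi([b - a for a, b in zip(prefix[i], prefix[j])],
                          type_totals, grand)

    neg = float("-inf")
    best = [[neg] * (n + 1) for _ in range(k + 1)]  # best[s][j]: s regions over bins[0:j]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for s in range(1, k + 1):
        for j in range(s, n + 1):
            for i in range(s - 1, j):
                if best[s - 1][i] == neg:
                    continue
                cand = best[s - 1][i] + score(i, j)
                if cand > best[s][j]:
                    best[s][j], back[s][j] = cand, i
    starts, j = [], n
    for s in range(k, 0, -1):  # walk the back-pointers from the last region
        j = back[s][j]
        starts.append(j)
    return best[k][n], starts[::-1]


# Hypothetical counts for four 1-Mb bins and two cancer types.
counts = [[10, 0], [9, 1], [1, 9], [0, 10]]
mi, starts = best_partition(counts, k=2)
print(starts)  # [0, 2]
```

Merging the two bins dominated by each cancer type maximizes mutual information here, while a single region (k = 1) carries no information about cancer type.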
51
PATTERN RECOGNITION IN BIOMEDICAL DATA: CHALLENGES IN PUTTING BIG DATA TO WORK
PROCEEDINGS PAPERS WITH POSTER PRESENTATIONS
52
Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work
Learning a Latent Space of Highly Multidimensional Cancer Data
Benjamin Kompa1, Beau Coker2
1Harvard Medical School, 2Harvard School of Public Health
Benjamin Kompa
We introduce a Unified Disentanglement Network (UFDN) trained on The Cancer Genome Atlas (TCGA), which we refer to as UFDN-TCGA. We demonstrate that UFDN-TCGA learns a biologically relevant, low-dimensional latent space of high-dimensional gene expression data by applying our network to two classification tasks: cancer status and cancer type. UFDN-TCGA performs comparably to random forest methods. The UFDN allows for continuous, partial interpolation between distinct cancer types. Furthermore, we perform an analysis of differentially expressed genes between skin cutaneous melanoma (SKCM) samples and the same samples interpolated into glioblastoma (GBM). We demonstrate that our interpolations consist of relevant metagenes that recapitulate known glioblastoma mechanisms.
53
Scaling structural learning with NO-BEARS to infer causal transcriptome networks
Hao-Chih Lee1,3, Matteo Danieletto1,2,3, Riccardo Miotto1,2,3, Sarah T. Cherng1,3, Joel T. Dudley1,2,3
1Institute for Next Generation Healthcare, 2Hasso Plattner Institute for Digital Health, 3Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10065, USA
Hao-Chih Lee
Constructing gene regulatory networks is a critical step in revealing disease mechanisms from transcriptomic data. In this work, we present NO-BEARS, a novel algorithm for estimating gene regulatory networks. The NO-BEARS algorithm builds on the NO-TEARS algorithm with two improvements. First, we propose a new constraint and its fast approximation to reduce the computational cost of the NO-TEARS algorithm. Next, we introduce a polynomial regression loss to handle non-linearity in gene expression. Our implementation uses modern GPU computation, which can reduce hours-long CPU computations to seconds. Using synthetic data, we demonstrate improved performance, in both processing time and accuracy, in inferring gene regulatory networks from gene expression data.
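The abstract says NO-BEARS replaces the NO-TEARS constraint with a cheaper one but does not state the new constraint, so this sketch shows only the original NO-TEARS acyclicity function it builds on, h(W) = tr(exp(W o W)) - d, where o is the elementwise product. The matrix exponential is approximated by a truncated power series; the function names, the truncation length, and the example graphs are illustrative assumptions.

```python
def matmul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][t] * b[t][j] for t in range(n)) for j in range(n)] for i in range(n)]


def notears_h(w, terms=30):
    """NO-TEARS acyclicity function h(W) = tr(exp(W o W)) - d.

    Since tr(A^0) = d, h equals the series sum_{k>=1} tr(A^k) / k! with
    A = W o W, truncated here after `terms` terms. A is non-negative, so
    tr(A^k) > 0 exactly when the graph has a closed walk of length k; for
    d <= terms, h(W) == 0 iff W encodes a DAG.
    """
    d = len(w)
    a = [[x * x for x in row] for row in w]
    power = [row[:] for row in a]  # A^1
    h, factorial = 0.0, 1.0
    for k in range(1, terms + 1):
        factorial *= k
        h += sum(power[i][i] for i in range(d)) / factorial
        power = matmul(power, a)
    return h


dag = [[0.0, 0.8, 0.0], [0.0, 0.0, -1.2], [0.0, 0.0, 0.0]]  # edges 0->1->2
cycle = [[0.0, 1.0], [1.0, 0.0]]  # 0->1->0
print(notears_h(dag))    # 0.0
print(notears_h(cycle))  # approx. 1.0862, i.e. 2 * (cosh(1) - 1)
```

Evaluating this exponential (and its gradient) on every optimization step is the cost the abstract's "fast approximation" targets; a dense d x d matrix exponential scales roughly cubically in the number of genes.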
54
PathFlowAI: A High-Throughput Workflow for Preprocessing, Deep Learning and Interpretation in Digital Pathology
Joshua J. Levy1, Lucas A. Salas1, Brock C. Christensen1, Aravindhan Sriharan2, Louis J. Vaickus2
1Geisel School of Medicine at Dartmouth, 2Dartmouth Hitchcock Medical Center
Joshua Levy
The diagnosis of disease often requires analysis of a biopsy. Many diagnoses depend not only on the presence of certain features but also on their location within the tissue. Recently, a number of deep learning diagnostic aids have been developed to classify digitized biopsy slides. Clinical workflows often involve processing more than 500 slides per day. However, clinical use of deep learning diagnostic aids would require a preprocessing workflow that is cost-effective, flexible, scalable, rapid, interpretable, and transparent. Here, we present such a workflow, optimized using Dask and mixed precision training via APEX, capable of handling any patch-level or slide-level classification and prediction problem. The workflow uses a flexible and fast preprocessing and deep learning analytics pipeline, incorporates model interpretation, and has a highly storage-efficient audit trail. We demonstrate the utility of this package on the analysis of a prototypical anatomic pathology specimen: liver biopsies for evaluation of hepatitis from a prospective cohort. The preliminary data indicate that PathFlowAI may become a cost-effective and time-efficient tool for clinical use of Artificial Intelligence (AI) algorithms.
55
Improving survival prediction using a novel feature selection and feature reduction framework based on the integration of clinical and molecular data*
Lisa Neums, Richard Meier, Devin C. Koestler, Jeffrey A. Thompson
Department of Biostatistics and Data Science, University of Kansas Medical Center, and University of Kansas Cancer Center
Lisa Neums
The accurate prediction of a cancer patient’s risk of progression or death can guide clinicians in the selection of treatment and help patients in planning personal affairs. Predictive models based on patient-level data represent a tool for determining risk. Ideally, predictive models will use multiple sources of data (e.g., clinical, demographic, molecular, etc.). However, there are many challenges associated with data integration, such as overfitting and redundant features. In this paper we aim to address those challenges through the development of a novel feature selection and feature reduction framework that can handle correlated data. Our method begins by computing a survival distance score for gene expression, which, in combination with a score for clinical independence, results in the selection of highly predictive genes that are non-redundant with clinical features. The survival distance score is a measure of variation of gene expression over time, weighted by the variance of the gene expression over all patients. Selected genes, in combination with clinical data, are used to build a predictive model for survival. We benchmark our approach against commonly used methods, namely lasso- and ridge-penalized Cox proportional hazards models, using three publicly available cancer datasets: kidney cancer (521 samples), lung cancer (454 samples) and bladder cancer (335 samples). Across all datasets, our approach built on the training set outperformed the clinical data alone in the test set in terms of predictive power, with a C-index of 0.773 vs. 0.755 for kidney cancer, 0.695 vs. 0.664 for lung cancer and 0.648 vs. 0.636 for bladder cancer. Further, we were able to show increased predictive performance of our method compared to lasso-penalized models fit to both gene expression and clinical data, which had C-indices of 0.767, 0.677, and 0.645, as well as increased or comparable predictive power compared to ridge models, which had C-indices of 0.773, 0.668 and 0.650 for the kidney, lung, and bladder cancer datasets, respectively. Therefore, our score for clinical independence improves prognostic performance as compared to modeling approaches that do not consider combining non-redundant data. Future work will concentrate on optimizing the survival distance score in order to achieve improved results for all types of cancer.
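The results above are reported as C-indices (e.g., 0.773 vs. 0.755 for kidney cancer). A minimal sketch of Harrell's concordance index, where 0.5 corresponds to random ranking and 1.0 to perfect ranking of risk. This simple version skips pairs with tied times, and established survival libraries may handle ties differently. The follow-up data and risk scores are hypothetical.

```python
def harrell_c(times, events, risks):
    """Harrell's concordance index (C-index) for right-censored survival data.

    A pair (i, j) is comparable when subject i has the shorter follow-up time
    and an observed event; it is concordant when i also has the higher
    predicted risk. Tied risks score 0.5; pairs with tied times are skipped.
    """
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue  # censored subjects cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical follow-up (months), event indicator (1 = death), predicted risk.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
print(harrell_c(times, events, [0.9, 0.7, 0.5, 0.1]))  # 1.0
print(harrell_c(times, events, [0.9, 0.1, 0.5, 0.7]))  # 0.6
```

Because only the ranking of risks matters, differences such as 0.773 vs. 0.755 reflect how many comparable patient pairs each model orders correctly.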
56
Bayesiansemi-nonnegativematrixtri-factorizationtoidentifypathwaysassociatedwithcancerphenotypes
SunhoPark1,NabhonilKar1,Jae-HoCheong2,TaeHyunHwang1
1ClevelandClinic,2YonseiUniversityCollegeofMedicine
SunhoParkAccurateidentificationofpathwaysassociatedwithcancerphenotypes(e.g.,cancersubtypesandtreatmentoutcome)couldleadtodiscoveringreliableprognosticand/orpredictivebiomarkersforbetterpatientsstratificationandtreatmentguidance.Inourpreviouswork,wehaveshownthatnon-negativematrixtri-factorization(NMTF)canbesuccessfullyappliedtoidentifypathwaysassociatedwithspecificcancertypesordiseaseclassesasaprognosticandpredictivebiomarker.However,onekeylimitationofnon-negativefactorizationmethods,includingvariousnon-negativebi-factorizationmethods,istheirlimitedabilitytohandlenegativeinputdata.Forexample,manymoleculardatathatconsistofreal-valuescontainingbothpositiveandnegativevalues(e.g.,normalized/logtransformedgeneexpressiondatawherenegativevaluerepresentsdown-regulatedexpressionofgenes)arenotsuitableinputforthesealgorithms.Inaddition,mostpreviousmethodsprovidejustasinglepointestimateandhencecannotdealwithuncertaintyeffectively.Toaddresstheselimitations,weproposeaBayesiansemi-nonnegativematrixtri-factorizationmethodtoidentifypathwaysassociatedwithcancerphenotypesfromareal-valuedinputmatrix,e.g.,geneexpressionvalues.Motivatedbysemi-nonnegativefactorization,weallowoneofthefactormatrices,thecentroidmatrix,tobereal-valuedsothateachcentroidcanexpresseithertheup-ordown-regulationofthemembergenesinapathway.Inaddition,weplacestructuredspike-and-slabpriors(whichareencodedwiththepathwaysandagene-geneinteraction(GGI)network)onthecentroidmatrixsothatevenasetofgenesthatisnotinitiallycontainedinthepathways(duetotheincompletenessofthecurrentpathwaydatabase)canbeinvolvedinthefactorizationinastochasticwayspecifically,ifthosegenesareconnectedtothemembergenesofthepathwaysontheGGInetwork.Wealsopresentupdaterulesfortheposteriordistributionsintheframeworkofvariationalinference.AsafullBayesianmethod,ourproposedmethodhasseveraladvantagesoverthecurrentNMTFmethods,whicharedemonstratedusingsyntheticdatasetsinexperiments.UsingtheTheCancerGenomeAtlas(TCGA)gastriccancerandmetastaticgastriccancerimmunother
apyclinical-trialdatasets,weshowthatourmethodcouldidentifybiologicallyandclinicallyrelevantpathwaysassociatedwiththemolecularsubtypesandimmunotherapyresponse,respectively.Finally,weshowthatthosepathwaysidentifiedbytheproposedmethodcouldbeusedasprognosticbiomarkerstostratifypatientswithdistinctsurvivaloutcomeintwoindependentvalidationdatasets.Additionalinformationandcodescanbefoundathttps://github.com/parks-cs-ccf/BayesianSNMTF.
57
Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work
Tree-Weighting for Multi-Study Ensemble Learners
Maya Ramchandran1, Prasad Patil1,2, Giovanni Parmigiani1,2
1Department of Biostatistics, Harvard T.H. Chan School of Public Health; Department of Biostatistics, Harvard T.H. Chan School of Public Health; 2Department of Data Sciences, Dana-Farber Cancer Institute
Maya Ramchandran
Multi-study learning uses multiple training studies, separately trains classifiers on each, and forms an ensemble with weights rewarding members with better cross-study prediction ability. This article considers novel weighting approaches for constructing tree-based ensemble learners in this setting. Using Random Forests as a single-study learner, we compare weighting each forest to form the ensemble with extracting the individual trees trained by each Random Forest and weighting them directly. We find that incorporating multiple layers of ensembling in the training process by weighting trees increases the robustness of the resulting predictor. Furthermore, we explore how ensembling weights correspond to tree structure, to shed light on the features that determine whether weighting trees directly is advantageous. Finally, we apply our approach to genomic datasets and show that weighting trees improves upon the basic multi-study learning paradigm. Code and supplementary material are available at https://github.com/m-ramchandran/tree-weighting.
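The abstract describes ensemble weights that reward members with better cross-study prediction ability, where a member can be a whole forest or an individual tree extracted from one. A minimal sketch, assuming a simple inverse mean-squared-error rule evaluated only on the studies each member was not trained on (the paper's actual weighting scheme may differ, for example a stacking regression); `cross_study_weights` and `ensemble_predict` are hypothetical names, and any callable stands in for a trained tree or forest.

```python
from statistics import mean


def cross_study_weights(members, studies):
    """Weight ensemble members by their error on held-out studies (sketch).

    members: list of (train_study_index, predict) pairs; predict maps a feature
        value to a numeric prediction and may be a forest or a single tree.
    studies: list of (X, y) datasets, one per study (at least two studies).
    Returns weights summing to 1; lower cross-study error gives a larger weight.
    """
    raw = []
    for train_idx, predict in members:
        errors = []
        for k, (X, y) in enumerate(studies):
            if k == train_idx:
                continue  # score each member only on studies it was not trained on
            errors.append(mean((predict(x) - t) ** 2 for x, t in zip(X, y)))
        raw.append(1.0 / (mean(errors) + 1e-12))  # small constant avoids division by zero
    total = sum(raw)
    return [w / total for w in raw]


def ensemble_predict(members, weights, x):
    """Weighted average of member predictions."""
    return sum(w * predict(x) for (_, predict), w in zip(members, weights))
```

Applying the same function to individual trees rather than whole forests gives the tree-level weighting the abstract contrasts with forest-level weighting.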
58
Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work
PTRExplorer: An approach to identify and explore Post Transcriptional Regulatory mechanisms using proteogenomics
Arunima Srivastava1, Michael Sharpnack1, Kun Huang2, Parag Mallick3, Raghu Machiraju1
1The Ohio State University, 2Indiana University School of Medicine, 3Stanford University
Arunima Srivastava
Integration of transcriptomic and proteomic data should reveal multi-layered regulatory processes governing cancer cell behaviors. Traditional correlation-based analyses have demonstrated limited ability to identify the post-transcriptional regulatory (PTR) processes that drive the non-linear relationship between transcript and protein abundances. In this work, we propose an integrative approach to explore the variety of post-transcriptional mechanisms that dictate relationships between genes and their corresponding proteins. The proposed workflow uses the intuitive technique of scatterplot diagnostics, or scagnostics, to characterize and examine the diverse scatterplots built from transcript and protein abundances in a proteogenomic experiment. The workflow includes representing gene-protein relationships as scatterplots, clustering on geometric scagnostic features of these scatterplots, and finally identifying and grouping the potential gene-protein relationships according to their disposition to various PTR mechanisms. Our study verifies the efficacy of the implemented approach in uncovering possible regulatory mechanisms through comprehensive tests on a synthetic dataset. We also propose a variety of 2D pattern-specific downstream analysis methodologies, such as mixture modeling and mapping of miRNA post-transcriptional effects, to explore each mechanism further. This work suggests that the proposed methodology has the potential to discover and categorize post-transcriptional regulatory mechanisms manifesting in proteogenomic trends. These trends subsequently provide evidence for cancer specificity, miRNA targeting, and identification of regulation impacted by biological functionality and different types of degradation.
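Scagnostics summarize a scatterplot with a small set of shape measures. As one illustrative feature, the Monotonic measure in Wilkinson et al.'s scagnostics is the squared Spearman rank correlation, which scores a transcript-protein scatterplot highly when protein abundance rises (or falls) consistently with transcript abundance, even if the trend is non-linear. The sketch below computes only this one measure in plain Python; the geometric, graph-based measures used for clustering in the workflow are not shown, and the function names are hypothetical.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def monotonic_scagnostic(x, y):
    """Squared Spearman correlation between transcript (x) and protein (y) values.

    Assumes neither x nor y is constant (otherwise the correlation is undefined).
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    rho = cov / (vx * vy) ** 0.5
    return rho ** 2
```

Because it works on ranks, a cubic relationship scores the same as a linear one, which is the kind of non-linear transcript-protein pattern that plain Pearson correlation undervalues.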
59
Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work
Network Representation of Large-Scale Heterogeneous RNA Sequences with Integration of Diverse Multi-omics, Interactions, and Annotations Data
Nhat Tran, Jean Gao
The University of Texas at Arlington
Jean Gao
Long non-coding RNA (lncRNA), microRNA, and messenger RNA enable key regulations of various biological processes through a variety of diverse interaction mechanisms. Identifying the interactions and cross-talk between these heterogeneous RNA classes is essential to uncover the functional role of individual RNA transcripts, especially for unannotated and sparsely discovered RNA sequences with no known interactions. Recently, sequence-based deep learning and network embedding methods have been gaining traction as high-performing and flexible approaches that can either predict RNA-RNA interactions from sequence or infer missing interactions from patterns that may exist in the network topology. However, most current methods have several limitations, e.g., the inability to perform inductive predictions, to distinguish the directionality of interactions, or to integrate various sequence, interaction, expression, and genomic annotation datasets. We propose a novel deep learning framework, rna2rna, which learns from RNA sequences to produce a low-dimensional embedding that preserves proximities in both the interaction topology and the functional affinity topology. In this embedding space, the two-part "source and target contexts" capture the receptive fields of each RNA transcript to encapsulate heterogeneous cross-talk interactions between lncRNAs and microRNAs. The proximity between RNAs in this embedding space also uncovers the second-order relationships that allow for accurate inference of novel directed interactions or functional similarities between any two RNA sequences. In a prospective evaluation, our method exhibits superior performance compared to state-of-the-art approaches at predicting missing interactions from several RNA-RNA interaction databases. Additional results suggest that our proposed framework can capture a manifold for heterogeneous RNA sequences to discover novel functional annotations.
60
Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work
Hadoop and PySpark for reproducibility and scalability of genomic sequencing studies
Nicholas R. Wheeler1, Penelope Benchek1, Brian W. Kunkle2, Kara L. Hamilton-Nelson2, Mike Warfe1, Jeremy R. Fondran1, Jonathan L. Haines1, William S. Bush1
1Case Western Reserve University, 2University of Miami
William Bush
Modern genomic studies are rapidly growing in scale, and the analytical approaches used to analyze genomic data are increasing in complexity. Genomic data management poses logistic and computational challenges, and analyses are increasingly reliant on genomic annotation resources that create their own data management and versioning issues. As a result, genomic datasets are increasingly handled in ways that limit the rigor and reproducibility of many analyses. In this work, we examine the use of the Spark infrastructure for the management, access, and analysis of genomic data in comparison to traditional genomic workflows on typical cluster environments. We validate the framework by reproducing previously published results from the Alzheimer’s Disease Sequencing Project. With this framework and analyses built in Jupyter notebooks, Spark provides improved workflows, reduces user-driven data partitioning, and enhances the portability and reproducibility of the distributed analyses required for large-scale genomic studies.
61
Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work
CERENKOV3: Clustering and molecular network-derived features improve computational prediction of functional noncoding SNPs
Yao Yao, Stephen A. Ramsey
Oregon State University
Yao Yao
Identification of causal noncoding single nucleotide polymorphisms (SNPs) is important for maximizing the knowledge dividend from human genome-wide association studies (GWAS). Recently, diverse machine learning-based methods have been used for functional SNP identification; however, this task remains a fundamental challenge in computational biology. We report CERENKOV3, a machine learning pipeline that leverages clustering-derived and molecular network-derived features to improve prediction accuracy of regulatory SNPs (rSNPs) in the context of post-GWAS analysis. The clustering-derived feature, locus size (the number of SNPs in the locus), derives from our locus partitioning procedure and represents the sizes of clusters based on SNP locations. We generated two molecular network-derived features from representation learning on a network representing SNP-gene and gene-gene relations. Based on empirical studies using a ground-truth SNP dataset, CERENKOV3 significantly improves rSNP recognition performance in AUPRC, AUROC, and AVGRANK (a locus-wise rank-based measure of classification accuracy we previously proposed).
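The locus size feature counts the SNPs that fall into the same locus after partitioning by genomic position. A minimal sketch, assuming a simple gap-based rule (start a new locus whenever adjacent SNPs on a chromosome are more than `max_gap` base pairs apart); the rule, the 50 kb default, and the function name `locus_sizes` are hypothetical, and CERENKOV3's actual partitioning procedure may differ.

```python
from collections import defaultdict


def locus_sizes(snps, max_gap=50_000):
    """Partition SNPs into loci by position and return each SNP's locus size.

    snps: list of (snp_id, chromosome, position) tuples
    max_gap: hypothetical threshold in bp; a gap larger than this between
        neighbouring SNPs on the same chromosome starts a new locus
    Returns {snp_id: number of SNPs in the same locus}.
    """
    by_chrom = defaultdict(list)
    for snp_id, chrom, pos in snps:
        by_chrom[chrom].append((pos, snp_id))
    sizes = {}
    for entries in by_chrom.values():
        entries.sort()  # order SNPs by position within the chromosome
        locus = [entries[0][1]]
        for (prev_pos, _), (pos, snp_id) in zip(entries, entries[1:]):
            if pos - prev_pos > max_gap:
                for s in locus:  # close the current locus
                    sizes[s] = len(locus)
                locus = []
            locus.append(snp_id)
        for s in locus:  # close the last locus on this chromosome
            sizes[s] = len(locus)
    return sizes
```

Every SNP in a locus receives the same feature value, so SNPs in dense loci (many candidates competing for one signal) can be distinguished from isolated ones.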
62
PRECISION MEDICINE: ADDRESSING THE CHALLENGES OF SHARING, ANALYSIS, AND PRIVACY AT SCALE
PROCEEDINGS PAPERS WITH POSTER PRESENTATIONS
63
Precision medicine: addressing the challenges of sharing, analysis, and privacy at scale
AnomiGAN: Generative Adversarial Networks for Anonymizing Private Medical Data
Ho Bae, Dahuin Jung, Hyun-Soo Choi, Sungroh Yoon
Seoul National University
Ho Bae
Typical personal medical data contain sensitive information about individuals. Storing or sharing personal medical data is thus often risky. For example, a short DNA sequence can provide information that can identify not only an individual, but also his or her relatives. Nonetheless, most countries and researchers agree on the necessity of collecting personal medical data. This stems from the fact that medical data, including genomic data, are an indispensable resource for further research and development regarding disease prevention and treatment. To prevent personal medical data from being misused, techniques to reliably preserve sensitive information should be developed for real-world applications. In this paper, we propose a framework called anonymized generative adversarial networks (AnomiGAN) to preserve the privacy of personal medical data while also maintaining high prediction performance. We compared our method to state-of-the-art techniques and observed that our method preserves the same level of privacy as differential privacy (DP) and provides better prediction results. We also observed that there is a trade-off between privacy and prediction results that depends on the degree of preservation of the original data. Here, we provide a mathematical overview of our proposed model and validate it using UCI machine learning repository datasets in order to highlight its utility in practice. The code is available at https://github.com/hobae/AnomiGAN/
64
Precision medicine: addressing the challenges of sharing, analysis, and privacy at scale
Frequency of ClinVar pathogenic variants in chronic kidney disease patients surveyed for return of research results at a Cleveland public hospital
Dana C. Crawford1,2,3, John Lin1, Jessica N. Cooke Bailey1,2, Tyler Kinzy1, John R. Sedor4,5, John F. O'Toole5, William S. Bush1,2,3
1Cleveland Institute for Computational Biology, 2Departments of Population and Quantitative Health Sciences, and 3Genetics and Genome Sciences, Case Western Reserve University; 4Department of Physiology and Biophysics, Case Western Reserve University; and 5Department of Nephrology and Hypertension, Glickman Urology and Kidney and Lerner Research Institute, Cleveland Clinic
Dana Crawford
Return of results is not common in research settings, as standards are not yet in place for what to return, how to return it, and to whom. As a pioneer of large-scale return of research results, the Precision Medicine Initiative Cohort, now known as All of Us, plans to return pharmacogenomic results and variants of clinical significance to its participants starting in late 2019. To better understand the local landscape of possibilities regarding return of research results, we assessed the frequency of pathogenic variants and APOL1 renal risk variants in a small, diverse cohort of chronic kidney disease (CKD) patients ascertained from a public hospital in Cleveland, Ohio, and genotyped on the Illumina Infinium MegaEX. Of the 23,720 ClinVar-designated variants directly assayed by the MegaEX, 8,355 (35%) had at least one alternate allele in the 130 participants genotyped. Of these, 18 ClinVar variants deemed pathogenic by multiple submitters with no conflicts in interpretation were distributed across 27 participants. The majority of these pathogenic ClinVar variants (14/18) were associated with autosomal recessive disorders. Of note were four African American carriers of TTR rs76992529, which is associated with amyloidogenic transthyretin amyloidosis, otherwise known as familial transthyretin amyloidosis (FTA). FTA, an autosomal dominant disorder with variable penetrance, is more common among African-descent populations than among European-descent populations. Also common in this CKD population were the APOL1 renal risk alleles G1 (rs73885319) and G2 (rs71785313), with 60% of the study population carrying at least one renal risk allele. Both pathogenic ClinVar variants and APOL1 renal risk alleles were distributed among participants who wanted actionable genetic results returned, wanted genetic results returned regardless of actionability, and wanted no results returned. Results from this local genetic study highlight challenges in which variants to report, how to interpret them, and the participant’s potential for follow-up, only some of the challenges in return of research results likely facing larger studies such as All of Us.
65
Precision medicine: addressing the challenges of sharing, analysis, and privacy at scale
Network-Based Matching of Patients and Targeted Therapies for Precision Oncology
Qingzhi Liu1, Min Jin Ha2, Rupam Bhattacharyya1, Lana Garmire3, Veerabhadran Baladandayuthapani1
1Department of Biostatistics, University of Michigan; 2Department of Biostatistics, The University of Texas MD Anderson Cancer Center; 3Department of Computational Medicine and Bioinformatics, University of Michigan
Qingzhi Liu
The extensive acquisition of high-throughput molecular profiling data across model systems (human tumors and cancer cell lines) and of drug sensitivity data makes precision oncology possible, allowing clinicians to match the right drug to the right patient. Current supervised models for drug sensitivity prediction often use cell lines as exemplars of patient tumors and for model training. However, these models are limited in their ability to accurately predict the drug sensitivity of individual cancer patients to a large set of drugs, given the paucity of patient drug sensitivity data available for testing and the high variability across different drugs. To address these challenges, we developed a multilayer network-based approach to impute individual patients’ responses to a large set of drugs. This approach considers the triplet of patients, cell lines and drugs as one interconnected holistic system. We first use the omics profiles to construct a patient-cell line network and determine the best matching cell lines for patient tumors based on robust measures of network similarity. Subsequently, these results are used to impute the “missing link” between each individual patient and each drug, called the Personalized Imputed Drug Sensitivity Score (PIDS-Score), which can be construed as a measure of the therapeutic potential of a drug or therapy. We applied our method to two subtypes of lung cancer patients, matched these patients with cancer cell lines derived from 19 tissue types based on their functional proteomics profiles, and computed their PIDS-Scores for 251 drugs and experimental compounds. We identified the best representative cell lines that conserve lung cancer biology and molecular targets. The top sensitive drugs by PIDS-Score for the entire patient cohort, as well as for individual patients, are highly related to lung cancer in terms of their targets, and their PIDS-Scores are significantly associated with patient clinical outcomes. These findings provide evidence that our method is useful for narrowing the scope of possible effective patient-drug matchings when implementing evidence-based personalized medicine strategies.
66
Precision medicine: addressing the challenges of sharing, analysis, and privacy at scale
Phenome-wide association studies on cardiovascular health and fatty acids considering phenotype quality control practices for epidemiological data
Kristin Passero1, Xi He1, Jiayan Zhou1, Bertram Mueller-Myhsok2,3,4, Marcus E. Kleber5, Winfried Maerz5,6,7, Molly A. Hall1
1Penn State; 2Max Planck Institute of Psychiatry; 3Munich Cluster of Systems Biology; 4University of Liverpool; 5Heidelberg University; 6SYNLAB Academy; 7Medical University of Graz
Kristin Passero
Phenome-wide association studies (PheWAS) allow agnostic investigation of common genetic variants in relation to a variety of phenotypes, but preserving the power of PheWAS requires careful phenotypic quality control (QC) procedures. While QC of genetic data is well defined, no established QC practices exist for multi-phenotypic data. Manually imposing sample size restrictions, identifying variable types/distributions, and locating problems such as missing data or outliers is arduous in large, multivariate datasets. In this paper, we perform two PheWAS on epidemiological data and, using the novel software CLARITE (CLeaning to Analysis: Reproducibility-based Interface for Traits and Exposures), showcase a transparent and replicable phenome QC pipeline, which we believe is a necessity for the field. Using data from the Ludwigshafen Risk and Cardiovascular (LURIC) Health Study, we ran two PheWAS, one on cardiac-related diseases and the other on polyunsaturated fatty acid levels. These phenotypes underwent a stringent quality control screen and were regressed on a genome-wide sample of single nucleotide polymorphisms (SNPs). Seven SNPs were significantly associated with dihomo-γ-linolenic acid, of which five were within the fatty acid desaturases FADS1 and FADS2. PheWAS is a useful tool to elucidate the genetic architecture of complex disease phenotypes within a single experimental framework. However, to reduce the computational and multiple-comparisons burden, careful assessment of phenotype quality and removal of low-quality data is prudent. Herein we perform two PheWAS while applying a detailed phenotype QC process, for which we provide a replicable pipeline that can be modified for application to other large datasets with heterogeneous phenotypes. As investigation of complex traits continues beyond traditional genome-wide association studies (GWAS), such QC considerations and tools such as CLARITE are crucial to the analysis of non-genetic big data such as clinical measurements, lifestyle habits, and polygenic traits.
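The phenotype QC steps listed above (sample size restrictions, identifying variable types, handling missing data) can be illustrated with a short plain-Python filter. This is not CLARITE's API; the thresholds `min_n` and `max_categories` and the function name `qc_phenotypes` are hypothetical, and outlier handling is omitted.

```python
def qc_phenotypes(table, min_n=200, max_categories=6):
    """Classify and filter phenotype columns before a PheWAS (sketch).

    table: dict mapping phenotype name -> list of values (None = missing)
    min_n: hypothetical minimum number of non-missing observations
    max_categories: hypothetical cut-off for treating a variable as categorical
    Returns {name: type} for phenotypes that pass QC, where type is
    'binary', 'categorical' or 'continuous'.
    """
    kept = {}
    for name, values in table.items():
        observed = [v for v in values if v is not None]
        if len(observed) < min_n:
            continue  # too few observations to keep adequate power
        n_unique = len(set(observed))
        if n_unique < 2:
            continue  # a constant variable carries no information
        if n_unique == 2:
            kept[name] = "binary"
        elif n_unique <= max_categories:
            kept[name] = "categorical"
        else:
            kept[name] = "continuous"
    return kept
```

The assigned type determines the regression model used in the PheWAS (for example, logistic for binary traits, linear for continuous ones), which is why variable typing belongs in QC rather than being left to the analysis step.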
67
Precision medicine: addressing the challenges of sharing, analysis, and privacy at scale
aTEMPO: Pathway-Specific Temporal Anomalies for Precision Therapeutics
Christopher Michael Pietras, Liam Power, Donna K. Slonim
Tufts University
Christopher Pietras
Dynamic processes are inherently important in disease, and identifying disease-related disruptions of normal dynamic processes can provide information about individual patients. We have previously characterized individuals' disease states via pathway-based anomalies in expression data, and we have identified disease-correlated disruption of predictable dynamic patterns by modeling a virtual time series in static data. Here we combine the two approaches, using an anomaly detection model and virtual time series to identify anomalous temporal processes in specific disease states. We demonstrate that this approach can informatively characterize individual patients, suggesting personalized therapeutic approaches.
68
Precision medicine: addressing the challenges of sharing, analysis, and privacy at scale
Feature Selection and Dimension Reduction of Social Autism Data
Peter Washington1, Kelley Marie Paskov1, Haik Kalantarian1, Nathaniel Stockham1, Catalin Voss1, Aaron Kline1, Ritik Patnaik2, Brianna Chrisman1, Maya Varma1, Qandeel Tariq1, Kaitlyn Dunlap1, Jessey Schwartz1, Nick Haber1, Dennis P. Wall1
1Stanford University, 2Massachusetts Institute of Technology
Peter Washington
Autism Spectrum Disorder (ASD) is a complex neuropsychiatric condition with a highly heterogeneous phenotype. Following the work of Duda et al., which uses a reduced feature set from the Social Responsiveness Scale, Second Edition (SRS) to distinguish ASD from ADHD, we performed item-level question selection on answers to the SRS to determine whether ASD can be distinguished from non-ASD using a similarly small subset of questions. To explore feature redundancies between the SRS questions, we performed filter, wrapper, and embedded feature selection analyses. To explore the linearity of the SRS-related ASD phenotype, we then compressed the 65-question SRS into low-dimensional representations using PCA, t-SNE, and a denoising autoencoder. We measured the performance of a multi-layer perceptron (MLP) classifier with the top-ranking questions as input. Classification using only the top-rated question resulted in an AUC of over 92% for SRS-derived diagnoses and an AUC of over 83% for dataset-specific diagnoses. The high redundancy of features has implications for replacing the social behaviors that are targeted in behavioral diagnostics and interventions, where digital quantification of certain features may be obfuscated due to privacy concerns. We similarly evaluated the performance of an MLP classifier trained on the low-dimensional representations of the SRS, finding that the denoising autoencoder achieved slightly higher performance than the PCA and t-SNE representations.
69
ARTIFICIAL INTELLIGENCE FOR ENHANCING CLINICAL MEDICINE
POSTER PRESENTATIONS
70
Artificial Intelligence for Enhancing Clinical Medicine
Prioritizing Copy Number Variants using Phenotype and Gene Functional Similarity
Azza Althagafi, Jun Chen, Robert Hoehndorf
Computer, Electrical & Mathematical Science and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), 4700 KAUST, 23955-6900, Thuwal, Kingdom of Saudi Arabia
Azza Althagafi
There are many types of genetic variation in the human genome, ranging from large chromosomal anomalies to single nucleotide variants (SNVs). It is becoming necessary to develop methods for distinguishing disease-causing variants from the large number of neutral genetic variants in an individual. This problem is also relevant to copy number variants (CNVs), a class of genetic variation in which large segments of the genome differ in copy number among individuals. Over the past several years, much progress has been made in CNV detection and in understanding the role of CNVs in human disease. We now understand that CNVs account for much of human variability. Correspondingly, several methods have been introduced to find disease-associated genes and SNVs, and different methods have been developed for predicting and prioritizing the pathogenicity of SNVs found within a genome. Constructing similar methods for CNVs is challenging due to the heterogeneity in variant size and type and the possibility of multiple genes being affected by large CNVs. CNV impact prediction methods should consider these factors in order to robustly prioritize pathogenic variants. We have built a method that incorporates biological background knowledge about the relation between phenotypes resulting from a loss of function in mouse genes, gene functions as described using the Gene Ontology (GO), and the anatomical site of gene expression, along with a score that predicts the pathogenicity of CNVs, SVScore. We use this information to build a machine learning model that ranks CNVs based on their predicted pathogenicity and the relation between the genes affected by the CNV and the phenotype we observe in affected individuals. Additionally, our approach considers several genomic features of each CNV, such as the length of the coding sequence overlapping with the CNV, haploinsufficiency and triplosensitivity scores to measure the dosage sensitivity of genes/regions, and GC content. Our results show that incorporating this information leads to improvement over a baseline model that uses only similarity scores between gene-phenotype associations and disease-associated phenotypes, as well as improvement over using only pathogenicity prediction methods for CNVs. Our method achieves an F-score of 80.85%, with 82.05% precision and 79.67% recall in our evaluation set. The results demonstrate that phenotype, functional, and gene expression information may be used to identify causative CNVs. Future work is required to evaluate and improve our model using patient-derived WGS data.
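As a consistency check on the reported figures, assuming the F-score is the standard F1 (the harmonic mean of precision P and recall R, which the abstract does not state explicitly), plugging in the rounded precision and recall gives a value that matches the reported 80.85% to within the rounding of the inputs:

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 0.8205 \times 0.7967}{0.8205 + 0.7967}
    = \frac{1.30738}{1.6172}
    \approx 0.8084
```

Because F1 is a harmonic mean, it always lies between precision and recall and sits closer to the lower of the two, here recall.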
71
Artificial Intelligence for Enhancing Clinical Medicine
Inferring the Reward Functions that Guide Cancer Progression
John Kalantari1, Heidi Nelson2, Nicholas Chia3
1Microbiome Program, Center for Individualized Medicine, Mayo Clinic, Rochester, MN, USA; 2Colon and Rectal Surgery, Mayo Clinic, Rochester, MN, USA; 3Division of Surgical Research, Department of Surgery, Mayo Clinic, Rochester, MN, USA
John Kalantari
Cancer can occur in patients with different genetic backgrounds via a multi-step evolutionary process (i.e., driven by modification and selection) that can accumulate different genetic alterations. Despite these differences, many cancer subtypes are unified by similar mechanisms or types of genetic changes. In other words, there are multiple etiological paths tied together by specific events that share commonality in their causal mechanism. Understanding these common mechanisms will enable the development of better therapies and preventative measures. It will also enable improved prediction of recurrence and metastatic advancement of cancer, directly impacting the 606,880 annual cancer deaths in the United States alone. Our work is built upon the central proposition that the Markov Decision Process (MDP) can better represent the process by which cancer arises and progresses. More specifically, by encoding a cancer cell's complex behavior as an MDP, we seek to model the series of genetic changes, or evolutionary trajectory, that leads to cancer as an optimal decision process. We posit that an Inverse Reinforcement Learning (IRL) approach will enable us to reverse engineer an optimal policy and reward function based on a set of expert demonstrations extracted from the DNA of patient tumors. The inferred reward function and optimal policy can subsequently be used to extrapolate the evolutionary trajectory of any tumor. We introduce a novel data-agnostic artificial intelligence framework that can infer reward functions describing the causal mechanisms that best explain the observed behavior of an 'optimally-behaving agent': the cancer cell. Using multi-omic data from 27 colorectal cancer (CRC) patients as proof of principle, we show that IRL provides a systematic and scalable approach to formally stating and solving the problem of cancer evolution. By providing a lineage path (i.e., a sequence of alterations) obtained via subclonal reconstruction for each tumor, we are able to reduce this complex problem to the recovery of an associated reinforcement learning reward function. These reward functions have the potential to model unknown molecular mechanisms driving intratumor heterogeneity and to elucidate cancer etiologies.
72
Artificial Intelligence for Enhancing Clinical Medicine
Predicting disease-associated mutation of metal-binding sites in proteins using a deep learning approach
Mohamad Koohi-Moghadam, Haibo Wang, Yuchuan Wang, Xinming Yang, Hongyan Li, Junwen Wang, Hongzhe Sun
Department of Chemistry, The University of Hong Kong, Hong Kong, China;
Department of Health Sciences, Mayo Clinic, Scottsdale, AZ, USA; Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic, Scottsdale, AZ, USA; Center for Individualized Medicine, Mayo Clinic, Scottsdale, AZ, USA;
College of Health Solutions, Arizona State University, Scottsdale, AZ, USA
Junwen Wang
Metalloproteins play important roles in many biological processes. Mutations at metal-binding sites may functionally disrupt metalloproteins and initiate severe diseases; however, until now there has been no effective approach to predict such mutations. Here we develop a deep learning approach to predict disease-associated mutations that occur at the metal-binding sites of metalloproteins. We generate energy-based affinity grid maps and physicochemical features of the metal-binding pockets (obtained from different databases as spatial and sequential features) and subsequently feed these features into a multichannel convolutional neural network. After training, the network can successfully predict disease-associated mutations that occur at the first and second coordination spheres of zinc-binding sites, with an area under the curve of 0.90 and an accuracy of 0.82. Our approach is the first deep learning approach for the prediction of disease-associated metal-relevant site mutations in metalloproteins, providing a new platform to tackle human diseases.
73
Artificial Intelligence for Enhancing Clinical Medicine
GENERAL
POSTER PRESENTATIONS
74
General
Ranking RAS pathway mutations using evolutionary history of MEK1
Katia Andrianova, Igor Jouline
Ohio State University, Department of Microbiology, Columbus, Ohio 43210
Katia Andrianova
The Ras/MAPK (rat sarcoma/mitogen-activated protein kinase) signaling pathway is involved in essentially all aspects of organismal development, from the first cell divisions in the early embryo to postnatal development and growth. Given its critical function, it is not surprising that deregulated Ras/MAPK signaling, resulting from either genetic or environmental perturbations, can lead to cancer and developmental abnormalities. A large class of such abnormalities, known as RASopathies, is associated with activating germline mutations in many components of the Ras pathway. Over the past decade, as next-generation sequencing (NGS) has become a valuable and cost-effective tool for research applications and clinical diagnostics of Mendelian diseases, simultaneous sequencing of multiple genes in MAPK signaling pathways has yielded many reports with hundreds of mutations possibly associated with RASopathies and cancer. In particular, multiple new mutations were identified in MEK1 kinase. The majority of newly discovered coding variants have neither been described in other individuals nor been studied or functionally analyzed in cellular or animal models, leaving clinicians to rely on in silico predictions of the consequences of these “variants of uncertain significance” from computational software such as PolyPhen and SIFT. The automated sequence searches used in these methods do not distinguish possible duplication events in the genes’ histories; hence, multiple sequence alignment (MSA) sets usually include both orthologous and paralogous copies. As purifying selection treads on one of the duplicate copies, it can become associated with a different phenotype compared to its paralogous sibling and/or the parental gene. In most cases of Mendelian disease, only one specific duplicate of the gene in the human genome turns out to be associated with the disease. This indicates the importance of considering both common ancestors and each gene’s duplication history for variant interpretation. The presence of seven human MEK proteins increases the chance of including paralogs in the analysis and therefore substantially limits mutation interpretation. In this study we established the first precise description of the evolutionary history of MEK kinases and identified potential duplication events. We determined that MEK1 is an ancestor of the entire MEK family. In-depth analysis of the orthologous proteins showed that essentially all experimentally proven pathogenic mutations were predicted as “damaging” by our approach. By comparing our results with the predictions made by PolyPhen-2 and SIFT, we showed how careful analysis of the evolutionary history of a gene may improve the accuracy of predicting missense mutation outcomes.
75
General
Integrative Analysis of COPD and Lung Cancer Metadata Reveals Shared Alterations in Immune Response, PTEN and PI3K-AKT Pathways
Dannielle Skander1, Arda Durmaz1, Mohammed Orloff2, Gurkan Bebek1
1Case Western Reserve University, 2University of Arkansas for Medical Sciences
Gurkan Bebek
Chronic obstructive pulmonary disease (COPD) and lung cancer are among the leading causes of death worldwide. While the two diseases are believed to be related, the mechanisms behind this relationship remain unclear. We investigate the relationship between COPD and lung cancer using an integrative omics approach. Integration of epigenetic and mRNA gene expression data allows us to discover the functionally relevant genes, i.e., the genes crucial for disease development. Using this approach, our study suggests that the mechanisms driving the development of both diseases are related to the interleukin immune response (IL4 and IL17), PTEN and PI3K-AKT pathways. Understanding this relationship between COPD and lung cancer is crucial for future prevention and treatment options for both diseases.
76
General
Investigatingsourcesofirreproducibilityinanalysisofgeneexpressiondata
CarlyA.Bobak,JaneE.Hill
DartmouthCollege
Carly Bobak
The use of big data promises to change the landscape of biomedical research; however, irreproducibility of results remains a problem. In this work, we set out to investigate proposed methods to increase the reproducibility of gene expression results. Specifically, we test the following three hypotheses: (1) results from pathway enrichment will be more similar across datasets than results on differentially expressed (DE) genes; (2) similarity across smaller datasets will be lower than similarity in larger datasets; and (3) results from multi-cohort data will be more similar than results from single-cohort data. We selected three unique datasets from the Gene Expression Omnibus that include active TB patients, spanning pediatric and adult patients. In each dataset we ranked DE genes by their association with TB vs. other (healthy controls, other diseases, or latent tuberculosis infection). We then calculated the rank-biased overlap (RBO) of the ranked genes across each dataset. RBO is a similarity measure scaled between 0 and 1 and can be interpreted as the average agreement between two lists. Gene set enrichment analysis (GSEA) was performed, and we calculated a rank for the pathway hits and compared RBO for associated pathways between datasets. On average, the RBO increased by a fold change of 1.83×10^4 when comparing similarity of associated pathways to similarity of DE genes. We then divided each dataset in half and repeated the analysis on all sub-datasets. Sub-datasets from the same parent dataset had similar results (mean RBO of 0.60, sd=0.24), as opposed to subsets from a different parent dataset (mean=0.10, sd=0.15). Contradicting our original hypothesis, overall RBO calculated between subsets from different parent datasets did not necessarily decrease compared to the initial RBO calculation; in fact, half of the RBO comparisons increased in the sub-datasets compared to using the whole datasets. To test the final hypothesis, we co-normalized, merged, and then randomly divided datasets into three approximately equal pieces. We repeated the DE analysis on each piece of the merged dataset. Across mixed datasets, the mean RBO was 0.023 (sd=0.43). Heterogeneous datasets were more alike than unique datasets, but less alike than a single divided dataset. However, the RBOs from mixed datasets compared to original datasets were not statistically significantly different from the RBOs comparing results from the original datasets. Thus, we demonstrated that associated pathways are substantially more reproducible than associated genes. Further study is necessary to investigate the conditions under which statistical power and heterogeneity of data influence the reproducibility of findings from gene expression studies.
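The abstract describes RBO only in words. The following is a minimal sketch of the extrapolated RBO of Webber, Moffat and Zobel (2010), assuming two ranked lists of equal length with no ties or duplicates. The persistence parameter p = 0.9, the function name rbo_ext, and the example gene labels are illustrative assumptions, not values or code from the study.

```python
def rbo_ext(s, t, p=0.9):
    """Extrapolated rank-biased overlap of two equal-length ranked lists.

    Assumes each list holds distinct items (no ties, no duplicates).
    p in (0, 1) sets how strongly the top of the lists is weighted.
    """
    if len(s) != len(t):
        raise ValueError("this sketch assumes lists of equal length")
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    k = len(s)
    if k == 0:
        raise ValueError("lists must be non-empty")
    seen_s, seen_t = set(), set()
    overlap = 0          # X_d: number of items shared by the two depth-d prefixes
    weighted_sum = 0.0   # running sum of (X_d / d) * p**d
    for d in range(1, k + 1):
        x, y = s[d - 1], t[d - 1]
        # Update X_d incrementally from X_{d-1}.
        if x == y:
            overlap += 1
        else:
            overlap += (x in seen_t) + (y in seen_s)
        seen_s.add(x)
        seen_t.add(y)
        weighted_sum += (overlap / d) * p ** d
    # RBO_ext = (X_k / k) * p^k + ((1 - p) / p) * sum_d (X_d / d) * p^d
    return (overlap / k) * p ** k + (1 - p) / p * weighted_sum


# Hypothetical ranked DE-gene lists from two datasets.
ranked_a = ["g1", "g2", "g3", "g4", "g5"]
ranked_b = ["g2", "g1", "g3", "g6", "g4"]
print(round(rbo_ext(ranked_a, ranked_b), 3))
```

Identical lists score exactly 1 and disjoint lists score 0, which matches the "scaled between 0 and 1" description; a smaller p concentrates the weight on the top-ranked genes.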
77
General
Ethereum and MultiChain blockchains as secure tools for individualized medicine
Charlotte Brannon, Gamze Gursoy, Sarah Wagner, Mark Gerstein
Yale University Computational Biology and Bioinformatics Program
Charlotte Brannon
With the rapidly decreasing cost of genome sequencing and the advent of individualized medicine, reliance on individual genomic data will soon be integral to medical treatment decisions. For example, a patient’s personal genomic sequence will provide physicians with information on which to base tests and diagnoses. Similarly, pharmacogenomics data will reveal the most effective prescriptions for a particular patient. Genomic data will need to be shared efficiently among multiple parties. However, because these are sensitive personal data that will directly impact medical treatment decisions, they must be maintained in a secure, high-integrity fashion. Blockchain technology is one way to achieve secure, high-integrity data storage. We present two proof-of-concept solutions: one for storing and querying personal genomic sequence data in a MultiChain blockchain designed for direct sharing with physicians, and one for storing and querying gene-drug interaction data in an Ethereum blockchain smart contract designed for shared access among permissioned researchers and physicians. Despite the high security and integrity that come with blockchain data storage, there is a trade-off with data access efficiency and storage costs. We overcome these challenges by developing novel storage techniques. When storing personal genomic sequence data, we do not store the actual sequence data but rather a set of metadata that can be used in combination with a reference genome to reconstruct the original sequences. When storing pharmacogenomics data, we use an index-based, multi-mapping approach to provide time- and space-efficient insertion and querying.
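The two storage ideas above can be illustrated with plain in-memory Python. This is a minimal sketch under stated assumptions, not the authors' MultiChain or Ethereum implementation, and no blockchain API is shown. It assumes (1) a substitution-only encoding in which only positions that differ from an aligned, equal-length reference are stored, and (2) a record table with several key-to-row maps as one reading of "index-based, multi-mapping". All function, class, gene, variant, and drug names are hypothetical.

```python
from collections import defaultdict


def encode_against_reference(sequence, reference):
    """Return (position, base) pairs where `sequence` differs from `reference`.

    Substitution-only sketch: assumes the two strings are already aligned and
    of equal length (no insertions or deletions).
    """
    if len(sequence) != len(reference):
        raise ValueError("this sketch handles substitutions only")
    return [(i, b) for i, (b, r) in enumerate(zip(sequence, reference)) if b != r]


def reconstruct(diffs, reference):
    """Rebuild the original sequence from the reference plus stored differences."""
    bases = list(reference)
    for i, b in diffs:
        bases[i] = b
    return "".join(bases)


class GeneDrugIndex:
    """Each record is stored once; several maps point to its row number."""

    def __init__(self):
        self.records = []
        self.by_gene = defaultdict(list)
        self.by_variant = defaultdict(list)
        self.by_drug = defaultdict(list)

    def insert(self, gene, variant, drug, outcome):
        row = len(self.records)
        self.records.append((gene, variant, drug, outcome))
        self.by_gene[gene].append(row)
        self.by_variant[variant].append(row)
        self.by_drug[drug].append(row)
        return row

    def query(self, gene=None, variant=None, drug=None):
        """Return records matching every key that is given (None = any)."""
        lookups = [(self.by_gene, gene), (self.by_variant, variant), (self.by_drug, drug)]
        matches = [set(index.get(key, [])) for index, key in lookups if key is not None]
        if not matches:
            return list(self.records)
        return [self.records[r] for r in sorted(set.intersection(*matches))]


ref = "ACGTACGT"
diffs = encode_against_reference("ACCTACGA", ref)   # [(2, 'C'), (7, 'A')]
```

Storing each record once and keeping only row numbers in the per-key maps keeps insertion to a constant number of appends. Queries intersect small row sets instead of scanning every record.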
78
General
Genomic predictors of L-asparaginase-induced pancreatitis in pediatric cancer patients
Britt I. Drögemöller, Galen E. B. Wright, Shahrad R. Rassekh, Shinya Ito, Bruce C. Carleton, Colin J. D. Ross, The Canadian Pharmacogenomics Network for Drug Safety Consortium
Faculty of Pharmaceutical Sciences, University of British Columbia, Vancouver, BC, Canada; BC Children’s Hospital Research Institute, University of British Columbia, Vancouver, BC, Canada; Department of Pediatrics, Faculty of Medicine, University of British Columbia, Vancouver, BC, Canada; Clinical Pharmacology and Toxicology, The Hospital for Sick Children, University of Toronto, Toronto, ON, Canada; Pharmaceutical Outcomes Programme, BC Children’s Hospital, Vancouver, BC, Canada
Britt Drögemöller
Background: L-asparaginase is highly effective in the treatment of pediatric acute lymphoblastic leukemia. Unfortunately, the use of this treatment is limited by the occurrence of pancreatitis, a severe and potentially lethal adverse drug reaction that occurs in 2-18% of patients. As previous studies have been unable to identify strong associations between clinical variables and susceptibility to L-asparaginase-induced pancreatitis, genetic factors are expected to play an important role in this adverse drug reaction. Objectives: We sought to explore the role of genetic susceptibility factors in L-asparaginase-induced pancreatitis in pediatric cancer patients. Methods: Patients who were treated with L-asparaginase were recruited from 13 pediatric oncology units across Canada (n=284), and extensive clinical data were collected for all patients. Genotyping was performed using the Illumina HumanOmniExpress and Global Screening Arrays, and pancreatic gene expression profiles were imputed in these individuals using GTEx v7 and S-PrediXcan. Genome- and transcriptome-wide association studies (GWAS and TWAS) were performed to identify associations with L-asparaginase-induced pancreatitis. Results: GWAS analyses identified significant associations between genetic variants in HLA-DQA1 and -DRB1 and pancreatitis, while TWAS revealed that individuals experiencing L-asparaginase-induced pancreatitis exhibited lower expression levels of HLA-DRB5. Further interrogation of the TWAS data revealed an enrichment in genes involved in the somatic diversification of immune receptors. Conclusions: These analyses uncovered an association between genetic variation in immune-related genes and the development of L-asparaginase-induced pancreatitis. These associations mirror previous associations between the HLA region and (i) pancreatitis induced by other drugs and (ii) L-asparaginase-induced hypersensitivity.
79
General
NITECAP: A novel method and interface for the identification of circadian behavior in highly parallel time-course data
Thomas G. Brooks1, Cris W. Lawrence1, Nicholas F. Lahens1, Soumyashant Nayak1, Dimitra Sarantopoulou1, Garret A. FitzGerald1,2, Gregory R. Grant3
1Institute for Translational Medicine and Therapeutics (ITMAT), University of Pennsylvania;
2Systems Pharmacology and Translational Therapeutics; 3Department of Genetics, University of Pennsylvania
Thomas Brooks
We introduce a new tool called NITECAP for the task of identifying circadian behavior in massively parallel measurements of biological entities; for example, finding circadian genes from gene expression time-course data measured by RNA-Seq or microarrays. NITECAP employs a permutation-based approach that uses a novel statistic designed to be sensitive to circadian behavior. NITECAP also uses an approach to multiple testing that produces q-values directly, without first generating p-values that then need to be adjusted. Our approach has several advantages, particularly when individual p-values are underpowered or unreliable. Importantly, we have developed an intuitive, user-friendly web-based interface that enables investigators to perform robust circadian analyses of this type directly, without expert informatics support. Users can quickly scroll through time-course profiles sorted by effect size, greatly facilitating the choice of significance thresholds, which currently requires making blind choices of numerical cutoffs. Putting this type of analysis in the hands of the investigators can significantly streamline their research. The website also supports other standard significance tests, such as JTK and ANOVA, and provides tools to perform comparative studies, such as finding phase or amplitude differences between conditions. NITECAP is freely available for public use at: http://www.nitecap.org
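The idea of obtaining q-values directly from permutations can be sketched with NumPy. This is a generic, SAM-style permutation FDR estimate, not NITECAP's algorithm. The rhythmicity statistic used here, the amplitude of a least-squares cosine fit with a 24 h period, is a stand-in because the abstract does not define NITECAP's novel statistic. The function names, permutation count, and simulated design are assumptions.

```python
import numpy as np


def cosinor_amplitude(values, times, period=24.0):
    """Amplitude of a least-squares cosine fit with a fixed period.

    Stand-in rhythmicity statistic; NITECAP's own statistic is not reproduced.
    """
    w = 2.0 * np.pi * np.asarray(times, dtype=float) / period
    design = np.column_stack([np.ones_like(w), np.cos(w), np.sin(w)])
    coef, *_ = np.linalg.lstsq(design, np.asarray(values, dtype=float), rcond=None)
    return float(np.hypot(coef[1], coef[2]))


def permutation_qvalues(data, times, n_perm=100, seed=0):
    """Estimate q-values directly from permutations of the sample time labels.

    FDR(c) = (mean count of permuted statistics >= c) / (count of observed
    statistics >= c); each gene's q-value is the smallest FDR over all
    thresholds c at or below its own statistic.
    """
    rng = np.random.default_rng(seed)
    times = np.asarray(times, dtype=float)
    observed = np.array([cosinor_amplitude(row, times) for row in data])
    null = np.empty((n_perm, observed.size))
    for k in range(n_perm):
        shuffled = rng.permutation(times)  # one shuffle shared by all genes
        null[k] = [cosinor_amplitude(row, shuffled) for row in data]

    fdr = np.empty(observed.size)
    for i, c in enumerate(observed):
        n_called = np.sum(observed >= c)        # at least 1 (gene i itself)
        n_false = np.sum(null >= c) / n_perm    # expected false calls per permutation
        fdr[i] = min(1.0, n_false / n_called)

    q = np.empty(observed.size)
    running_min = 1.0
    for i in np.argsort(observed):  # ascending statistic: lower thresholds first
        running_min = min(running_min, fdr[i])
        q[i] = running_min
    return q


# Hypothetical design: 24 samples every 2 h over two days.
rng = np.random.default_rng(1)
times = np.arange(0, 48, 2)
rhythmic = 5 * np.cos(2 * np.pi * times / 24) + rng.normal(0, 0.1, (5, times.size))
noise = rng.normal(0, 1, (20, times.size))
q = permutation_qvalues(np.vstack([rhythmic, noise]), times)
```

Shuffling the time labels once per permutation and applying that shuffle to every gene preserves gene-gene correlation in the null. With a finite number of permutations, a strongly rhythmic gene can receive an estimated q-value of exactly 0.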
80
General
The Interplay of Obesity and Race/Ethnicity on Major Perinatal Complications
Yaadira Brown, MPH1; Olubode A. Olufajo, MD, MPH2; Edward E. Cornwell III, MD2; William Southerland, PhD3
1Research Centers in Minority Institutions: Howard University, Howard University College of Medicine; 2Research Centers in Minority Institutions: Howard University, Clive Callender Howard-Harvard Health Sciences Outcomes Research Center; 3Research Centers in Minority Institutions: Howard University
Yaadira Brown
Background: It has been established that a significant disparity exists in the rates of adverse perinatal outcomes across racial/ethnic groups, with non-Hispanic Black women generally being most impacted. There is also evidence that obesity is associated with adverse perinatal outcomes. Although some studies have examined the impact of race/ethnicity and obesity on adverse perinatal outcomes, most have done so using local or statewide data. This study aims to use a national sample to determine the role of obesity in the racial/ethnic disparities seen in adverse perinatal outcomes in the United States. Methods: Data from the National Inpatient Sample were used to select pregnant women admitted for delivery between 2010 and 2014. Demographics (race/ethnicity, insurance type, household income, co-morbidities) and hospital characteristics were extracted. Race/ethnicity was categorized as Non-Hispanic Whites (NHW), Non-Hispanic Blacks (NHB), and Hispanics. Outcomes of interest were gestational diabetes, pre-eclampsia, pre-term birth, and hospital mortality. Multivariate logistic regressions were performed to determine the independent predictors of the outcomes, using two sets of models: one that included obesity as a variable and one that did not. The two sets of models were compared using the Wald test. Results: Our cohort consisted of 15,561,942 pregnant individuals admitted for delivery. There were 9,247,729 (59.43%) NHW, 2,552,569 (16.4%) NHB, and 3,761,644 (24.17%) Hispanic individuals. Compared to other groups, NHB had significantly higher rates of pre-eclampsia (5.1%), pre-term birth (9.4%), and hospital mortality (0.11%). They also had the highest rate of obesity (9.0%). On multivariate analysis, NHB were more likely than NHW to have pre-eclampsia (Adjusted Odds Ratio [aOR] 1.26; 95% Confidence Interval [CI] 1.23-1.29), pre-term birth (aOR 1.38; 95% CI 1.34-1.41), and hospital mortality (aOR 2.05; 95% CI 1.2-3.38). However, their risk of gestational diabetes was similar to, or slightly lower than, that of NHW (aOR 0.94; 95% CI 0.91-0.96). Obesity was significantly associated with gestational diabetes (aOR 3.08; 95% CI 3.02-3.15), pre-eclampsia (aOR 2.14; 95% CI 2.09-2.19), and pre-term birth (aOR 1.04; 95% CI 1.01-1.06). Although the differences were minimal, the regression models that included obesity as a variable predicted gestational diabetes, pre-eclampsia, and pre-term birth better than those that did not. Conclusion: These findings further confirm that racial/ethnic disparities exist among adverse perinatal outcomes, with NHB being disproportionately affected. They also suggest that obesity plays a significant role in the racial/ethnic disparities observed for the adverse perinatal outcomes measured, other than hospital mortality. These data suggest that addressing obesity in the population may be beneficial in improving perinatal outcomes, but also that more research is needed to identify the major factors that drive the racial/ethnic disparities in perinatal outcomes in the United States.
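For orientation, a reported adjusted odds ratio and its 95% CI can be converted back to a log-odds coefficient, its standard error, and a single-coefficient Wald z statistic. This assumes the interval is a symmetric Wald interval on the log scale. It is a reader-side approximation limited by the rounding of the reported values, and it is not the model comparison the authors performed. The function name is hypothetical.

```python
import math


def wald_from_or_ci(odds_ratio, lower, upper, z_crit=1.959964):
    """Recover (beta, SE, z) from an odds ratio and its 95% CI.

    Assumes CI = exp(beta +/- z_crit * SE), i.e. a symmetric Wald
    interval on the log-odds scale.
    """
    beta = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    return beta, se, beta / se


# Reported values for obesity and gestational diabetes / pre-term birth.
beta_gdm, se_gdm, z_gdm = wald_from_or_ci(3.08, 3.02, 3.15)
beta_ptb, se_ptb, z_ptb = wald_from_or_ci(1.04, 1.01, 1.06)
print(f"GDM: beta={beta_gdm:.3f}, SE={se_gdm:.4f}, z={z_gdm:.1f}")
print(f"Pre-term birth: beta={beta_ptb:.3f}, SE={se_ptb:.4f}, z={z_ptb:.1f}")
```

With more than 15 million deliveries, even an aOR as close to 1 as 1.04 gives a z statistic well above 1.96. This is consistent with the abstract's note that differences can be statistically significant yet minimal in magnitude.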
81
General
A Comparison of Pharmacogenomic Information in FDA-Approved Drug Labels and CPIC Guidelines
Katherine I. Carrillo1, Teri E. Klein2
1Henry M. Gunn High School, Palo Alto, CA; 2Stanford University, Stanford, CA
Katherine Carrillo
Pharmacogenomics (PGx) is useful in helping to predict a patient’s likely reaction to a medication based on their genotype, allowing for personalized medicine. The FDA maintains a “Table of Pharmacogenomic Biomarkers in Drug Labeling” (https://www.fda.gov/drugs/science-and-research-drugs/table-pharmacogenomic-biomarkers-drug-labeling) consisting of pharmacogenomic information found in drug labeling. However, many labels on the list do not contain advice for a clinician about how or when to use a patient’s genetic information. Guidelines created by the Clinical Pharmacogenetics Implementation Consortium (CPIC; https://cpicpgx.org/) contain information about how to use patient genetic information when prescribing drugs. CPIC also provides guidelines for some drugs not currently on the FDA biomarker list, though it does not provide guidelines for every drug on the biomarker list. Using PharmGKB-annotated FDA-approved labels (through October 2019), we evaluated label information to determine (1) which labels contained any kind of prescribing information, including a suggested alternate drug, dosing information, or special considerations based on the patient’s genotype/metabolizer status; (2) which PharmGKB-annotated labels were present on the FDA biomarker list; and (3) which genes were involved. We did not include FDA labels annotated for genetic variation in cancer cells; only germline variation was included. We compared all available CPIC guideline recommendations to the information from the labels and identified where the labels and guidelines agree and where they differ. PharmGKB has 223 annotations (not including 82 annotations for cancer cell DNA variation) based on 219 FDA-approved drug labels. Of these, 199 labels are currently on the biomarker list, and 17 were on the biomarker list at one time but have been removed by the FDA. Twenty labels have dosing information, and 35 recommend an alternate drug based on genotype/metabolizer phenotype. Another 34 labels have some other special consideration, but most labels on the biomarker list (136) offer clinicians no guidance about what to do about the biomarker, if anything. There are 45 drugs with published CPIC guidelines (https://cpicpgx.org/genes-drugs/). Thirty-six of these drugs have a label on the FDA biomarker list, but the information on the label does not always match the guideline. Only 21 of the CPIC drugs have labels with guidance. For some drugs, the PGx information on the labels is similar to the CPIC guidelines, but for many others it differs. The FDA biomarker list covers more drugs than CPIC guidelines have been written for, and in some cases the labels tell clinicians when they should test a patient, whereas CPIC guidelines do not address testing. However, for most drugs, the labels give clinicians little information about what to do with their patients’ genetic test results. For the drugs with CPIC guidelines, there is more information about how to use genetic test results and why. Funded by NIH/NIGMS R24GM61374.
82
General
xTEA: a transposable element insertion analyzer for genome sequencing data from multiple technologies
Chong Chu1, Rebeca Monroy2, Soohyun Lee1, E. Alice Lee2, Peter J. Park1
1Harvard Medical School, 2Boston Children's Hospital
E. Alice Lee
Transposable elements (TEs) comprise nearly 50% of the human genome. Although most TEs are now silent, several types of retrotransposons, including LINE-1, Alu, and SVA, are still active. Somatic TE insertions have been shown to occur frequently in multiple tumor types [1,2] and at a low rate in neurons of phenotypically normal individuals [3]. Multiple tools have been developed to call TE insertions from genome sequencing data, but an efficient tool that can identify both germline and somatic TE insertions with high sensitivity and specificity is still lacking. Moreover, newer technologies such as 10X Linked-Read and PacBio or Nanopore long-read sequencing provide an unprecedented opportunity to study TEs; however, current methods do not take advantage of these data types. Here, we present a new computational tool, xTEA, building on our previous algorithm TEA [1]. This tool identifies TE insertions from Illumina paired-end reads, 10X Linked-Reads, long reads, or a combined dataset. xTEA outperforms MELT [4] and Traffic-mem [5] on normal and tumor Illumina data, respectively. A comparison of different sequencing platforms revealed that the analysis of long reads had greater sensitivity and specificity, especially in repetitive regions. Both 10X Linked-Reads and long reads demonstrated clear advantages over short reads in constructing full-length TE insertions. Better performance was achieved on hybrid data than on single-platform data. Using 22 human samples with either PacBio or Nanopore long reads and matched short reads, we uncovered LINE-1 internal SV hotspots and SVA internal VNTR expansion. xTEA is a comprehensive cross-platform TE insertion-calling tool. It can be deployed on a computing cluster, AWS, and Google Cloud, and is efficient for large cohort analysis. xTEA is publicly available at https://github.com/parklab/xTEA.
References
[1] Lee, Eunjung, et al. "Landscape of somatic retrotransposition in human cancers." Science 337.6097 (2012): 967-971.
[2] Rodriguez-Martin, Bernardo, et al. "Pan-cancer analysis of whole genomes reveals driver rearrangements promoted by LINE-1 retrotransposition in human tumours." bioRxiv (2017): 179705.
[3] Evrony, Gilad D., et al. "Cell lineage analysis in human brain using endogenous retroelements." Neuron 85.1 (2015): 49-59.
[4] Gardner, Eugene J., et al. "The Mobile Element Locator Tool (MELT): population-scale mobile element discovery and biology." Genome Research 27.11 (2017): 1916-1929.
[5] Tubio, Jose M. C., et al. "Extensive transduction of nonrepetitive DNA mediated by L1 retrotransposition in cancer genomes." Science 345.6196 (2014): 1251343.
83
General
Go Get Data (GGD): simple, reproducible access to scientific data
Michael Cormier1, Jon Belyeu1, Brent Pedersen1, Joe Brown1, Johannes Koster2, Aaron R. Quinlan1
1Department of Human Genetics, University of Utah, Salt Lake City, UT, USA; 2Algorithms for reproducible bioinformatics, Institute of Human Genetics, University of Duisburg-Essen, Essen, NRW, Germany
Aaron Quinlan
Genomics research is complicated by the difficulty of identifying, collecting, and integrating the numerous datasets and annotations germane to our experiments. Furthermore, these data exist in disparate sources and are stored in diverse, often abused formats pertaining to different genome builds. These complexities waste time, inhibit reproducibility, and curtail research creativity. Inspired by the success of software package managers, we have developed Go Get Data (GGD; https://gogetdata.github.io/) as a fast, reproducible approach to installing standardized packages of data and annotations for genomics research.
84
General
Global epigenomic regulation of gene expression and cellular proliferation in T-cell leukemia
Sinisa Dovat, Yali Ding, Bo Zhang, Jonathon L. Payne, Feng Yue
Pennsylvania State University College of Medicine, Hershey, PA, USA
Sinisa Dovat
Ikaros encodes a DNA-binding protein that functions as a tumor suppressor in T-cell acute lymphoblastic leukemia (T-ALL). Deletion and/or functional inactivation of Ikaros results in the development of high-risk leukemia. The mechanisms through which Ikaros regulates gene expression and tumor suppression in T-ALL are unknown. Ikaros haplo-knockout mice develop T-ALL with 100% penetrance, with arrest of T-cell differentiation. During the process of malignant transformation to T-ALL, Ikaros-haploinsufficient thymocytes lose their remaining wild-type Ikaros allele. Re-introduction of Ikaros into Ikaros-null T-ALL cells results in cessation of cellular proliferation and induction of T-cell differentiation. Thus, this is an optimal system for studying Ikaros tumor suppressor function, because it captures the role of Ikaros in the transition from a malignant state (Ikaros-null T-ALL) to a non-malignant state (following Ikaros re-introduction). We used ATAC-seq and ChIP-seq of H3K4me1, H3K4me3, H3K27ac, and Ikaros to perform dynamic, global epigenomic and gene expression analyses at several time points in Ikaros-null T-ALL and following Ikaros re-introduction, in order to determine the mechanisms of Ikaros’ tumor suppressor activity. Expression analysis identified a large number of novel signaling pathways that are directly regulated by Ikaros and Ikaros-induced enhancers, and that are responsible for the cessation of proliferation and induction of T-cell differentiation in T-ALL cells. Epigenomic analysis identified novel Ikaros functions in the epigenetic regulation of gene expression: Ikaros directly regulates de novo formation and depletion of enhancers; de novo formation of active enhancers and activation of poised enhancers; and Ikaros directly induces the formation of super-enhancers. Global analysis of chromatin accessibility revealed that Ikaros binding resulted in the opening of over 3400 previously inaccessible chromatin sites. This is accompanied by de novo enrichment of H3K4me1 and H3K4me3 modifications and the formation of de novo enhancers and promoters. These data demonstrate that Ikaros has pioneer activity and triggers coordinated regulation of gene expression. Ikaros pioneering activity was further demonstrated by direct binding of Ikaros to reconstituted nucleosomes in an electrophoretic mobility shift assay. Dynamic analyses demonstrate the long-lasting effects of Ikaros’ DNA binding on enhancer activation, de novo formation of enhancers and super-enhancers, and chromatin accessibility. In conclusion, our results establish that Ikaros’ tumor suppressor function occurs via global regulation of the enhancer and super-enhancer landscape, along with regulation of chromatin accessibility, and identify novel tumor suppressor regulatory pathways in T-ALL.
85
General
A pharmacogenomic investigation of the cardiac safety profile of ondansetron in children and in pregnant women
Galen E. B. Wright, Britt I. Drögemöller, Jessica Trueman, Kaitlyn Shaw, Michelle Staub, Shahnaz Chaudhry, Sholeh Ghayoori, Fudan Miao, Michelle Higginson, Gabriella S. S. Groeneweg, James Brown, Laura A. Magee, Simon D. Whyte, Nicholas West, Sonia Brodie, Geert ’t Jong, Howard Berger, Shinya Ito, Shahrad R. Rassekh, Shubhayan Sanatani, Colin J. D. Ross, Bruce C. Carleton
British Columbia Children’s Hospital Research Institute, Vancouver, British Columbia, Canada; Pharmaceutical Outcomes Programme, British Columbia Children’s Hospital, Vancouver, British Columbia, Canada; Division of Translational Therapeutics, Department of Pediatrics, University of British Columbia, Vancouver, British Columbia, Canada; Faculty of Pharmaceutical Sciences, University of British Columbia, Vancouver, British Columbia, Canada; Clinical Research Unit, Children's Hospital Research Institute of Manitoba, Winnipeg, Manitoba, Canada; Division of Clinical Pharmacology and Toxicology, The Hospital for Sick Children, Toronto, Ontario, Canada; British Columbia Women’s Hospital and Health Centre, Vancouver, British Columbia, Canada; Department of Anesthesiology, Pharmacology and Therapeutics, University of British Columbia, Vancouver, British Columbia, Canada; School of Life Course Sciences, Faculty of Life Sciences and Medicine, King's College, London, United Kingdom; Department of Pediatric Anesthesia, British Columbia Children's Hospital, Vancouver, British Columbia, Canada; Max Rady College of Medicine, Rady Faculty of Health Sciences, University of Manitoba, Winnipeg, Manitoba, Canada; Department of Obstetrics and Gynecology, St. Michael's Hospital, Toronto, Ontario, Canada; EpiMethods Consulting, Toronto, Ontario, Canada; Division of Cardiology, Department of Pediatrics, Children's Heart Centre, BC Children's Hospital, University of British Columbia, Vancouver, Canada
Galen Wright
Background: 5-HT3 receptor antagonists, such as ondansetron, are highly effective medications for the treatment of nausea and vomiting. However, these medications are also associated with prolongation of the QT interval, placing patients at risk of cardiac adverse events. Pharmacogenomic information for therapeutic response to ondansetron exists, particularly pertaining to CYP2D6, but no study has been performed on genetic factors that influence the cardiac safety of this medication. Objectives: To determine ondansetron-induced cardiac electrophysiological changes in three unique patient cohorts and to identify pharmacogenomic predictors of QT interval prolongation. Methods: Three patient groups receiving ondansetron for the prevention of nausea and vomiting were recruited and followed prospectively (pediatric post-surgical patients, n=101; pediatric oncology patients, n=98; pregnant women, n=62). Electrocardiograms were conducted at baseline and after ondansetron administration. Pharmacogenomic associations were then assessed via analyses of comprehensive CYP2D6 genotyping data and genome-wide association analyses. Results: In the entire cohort, 62 patients (24.1%) were defined as cases based on Bazett-corrected QTc values. The most significant shift from baseline occurred at five minutes post-administration (P=9.8×10^-4). Genome-wide analyses identified novel candidate genes for this drug-induced phenotype. The two most significant associations were observed for a missense variant in TLR3 (rs3775291; P=2.00×10^-7) and an eQTL for SLC36A1 (rs34124313; P=1.97×10^-7). These genes are implicated in serotonin- and QT-related traits and therefore likely represent biologically relevant findings. CYP2D6 activity score was not associated with case-control status. Conclusions: The results of this study provide the first step towards understanding the genomic basis of cardiac changes occurring after ondansetron use in children and pregnant women, with the overall goal of improving the safety of these commonly used antiemetic medications.
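Cases above were defined from Bazett-corrected QTc values. Bazett's correction divides the measured QT interval by the square root of the RR interval in seconds. The sketch below computes QTc and the change from baseline. It deliberately does not define a case threshold, because the abstract does not state one. The function names and the example measurements are hypothetical.

```python
import math


def rr_interval_s(heart_rate_bpm):
    """RR interval in seconds from heart rate in beats per minute."""
    if heart_rate_bpm <= 0:
        raise ValueError("heart rate must be positive")
    return 60.0 / heart_rate_bpm


def qtc_bazett(qt_ms, rr_s):
    """Bazett correction: QTc = QT / sqrt(RR), with RR in seconds."""
    if rr_s <= 0:
        raise ValueError("RR interval must be positive")
    return qt_ms / math.sqrt(rr_s)


def qtc_change(baseline, post):
    """Change in QTc (ms) between two (QT in ms, RR in s) measurements."""
    return qtc_bazett(*post) - qtc_bazett(*baseline)


# Hypothetical ECGs: baseline at 60 bpm, post-dose at 75 bpm.
baseline = (400.0, rr_interval_s(60))
post = (380.0, rr_interval_s(75))
print(f"QTc change: {qtc_change(baseline, post):.1f} ms")
```

In this made-up example the raw QT interval shortens after the dose, yet the heart-rate-corrected QTc lengthens. Comparisons against baseline are therefore made on corrected rather than raw QT values.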
86
General
TREND: a platform for exploring protein function in prokaryotes using phylogenetics, domain architectures, and gene neighborhoods information
Vadim M. Gumerov, Igor B. Zhulin
The Ohio State University
Vadim Gumerov
Key steps in a computational study of protein function involve analysis of (i) relationships between homologous proteins, (ii) protein domain architecture, and (iii) the gene neighborhoods in which the corresponding proteins are encoded. Each of these steps requires a separate computational task and set of tools. Combining the results into a complete analysis is usually done by hand, which is time-consuming and error-prone. Here we present a new platform, TREND (tree-based exploration of neighborhoods and domains), which can perform all the necessary steps in an automated fashion and put the derived information into phylogenomic context, thus making evolution-based protein function analysis more efficient. TREND is freely available at http://trend.zhulinlab.org. TREND consists of two pipelines: (1) Domains, which identifies protein domains, transmembrane regions, and low-complexity segments, and maps this information onto the phylogenetic tree; and (2) Neighborhoods, which identifies gene neighborhoods for the given set of protein sequences, clusters the genes based on shared domains of the encoded proteins, identifies operons, and puts the derived data into phylogenomic context. Locally stored databases of Pfam profile hidden Markov models (HMMs) and CDD position-specific scoring matrices are used as a source of models for domain identification. Another source is a rich collection of signal-transduction-specific profile HMMs derived from the MiST database. The pipelines are highly customizable. On start, both pipelines first align the provided proteins and build phylogenetic trees. These steps can be skipped if a researcher already has an alignment or a tree and would like to use it instead. Optionally, redundancy among the sequences can be reduced. Protein identifiers can be provided as input instead of protein sequences; the corresponding sequences will be fetched from the RefSeq and MiST databases. Results of the pipelines are presented as interactive pictures with cross-links to the Pfam, CDD, RefSeq, and MiST databases. All produced results can be downloaded for subsequent analysis.
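The abstract says the Neighborhoods pipeline identifies operons but does not describe how. The sketch below uses a common textbook heuristic instead, and it should not be read as TREND's method: consecutive genes on the same strand are grouped into one putative operon when the intergenic gap is at most a cutoff. The 50 bp cutoff, the Gene record, and the function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    start: int    # 1-based inclusive coordinate
    end: int      # 1-based inclusive coordinate
    strand: str   # "+" or "-"


def predict_operons(genes, max_gap=50):
    """Group adjacent genes on one replicon into putative operons.

    Consecutive genes join the current group when they share a strand and
    the intergenic gap is at most `max_gap` bp (hypothetical cutoff).
    Overlapping genes give a negative gap and are therefore joined.
    """
    ordered = sorted(genes, key=lambda g: g.start)
    operons = []
    for gene in ordered:
        if operons:
            prev = operons[-1][-1]
            gap = gene.start - prev.end - 1
            if gene.strand == prev.strand and gap <= max_gap:
                operons[-1].append(gene)
                continue
        operons.append([gene])
    return [[g.name for g in group] for group in operons]


neighborhood = [
    Gene("cheA", 1, 900, "+"),
    Gene("cheW", 920, 1500, "+"),
    Gene("orf3", 1700, 2000, "+"),
    Gene("orf4", 2010, 2500, "-"),
]
print(predict_operons(neighborhood))
```

Strand and distance are cheap to compute from genome annotations. In practice such a rule is usually combined with additional evidence, such as conservation of gene order across genomes.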
87
General
TrackSigFreq: subclonal reconstructions based on mutation signatures and allele frequencies
Caitlin F. Harrigan1,2,4, Yulia Rubanova1,2,4, Quaid Morris1,2,3,4,5,6, Alina Selega2,4
1Department of Computer Science, University of Toronto, Toronto, Canada; 2Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Canada; 3Department of Molecular Genetics, University of Toronto, Toronto, Canada; 4Vector Institute, Toronto, Canada; 5Ontario Institute for Cancer Research, Toronto, Canada; 6Memorial Sloan Kettering Cancer Centre, New York, USA (pending)
Cait Harrigan
Mutational signatures are patterns of mutation types, many of which are linked to known mutagenic processes. Signature activity represents the proportion of mutations a signature generates. In cancer, cells may gain advantageous phenotypes through mutation accumulation, causing rapid growth of that subpopulation within the tumour. The presence of many subclones can make cancers harder to treat and can have other clinical implications. Reconstructing changes in signature activities can give insight into the evolution of cells within a tumour. Recently, we introduced a new method, TrackSig, to detect changes in signature activities across time from a single bulk tumour sample. By design, TrackSig is unable to identify mutation populations that differ in frequency but show little to no difference in signature activity. Here we present an extension of this method, TrackSigFreq, which enables trajectory reconstruction based on both the observed density of mutation frequencies and changes in mutational signature activities. TrackSigFreq preserves the advantages of TrackSig, namely optimal and rapid mutation clustering through segmentation, while extending it so that it can identify distinct mutation populations that share similar signature activities.
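"Optimal ... clustering through segmentation" refers to finding the best set of change points along an ordered sequence. The sketch below shows the generic technique of exact penalized segmentation (optimal partitioning by dynamic programming) on a one-dimensional series. It uses a squared-error cost with a fixed per-segment penalty as a simplifying assumption; the cost function used by TrackSig and TrackSigFreq is not reproduced here. The function name and inputs are hypothetical.

```python
def optimal_segmentation(values, penalty):
    """Exact penalized segmentation (optimal partitioning) of a 1-D series.

    Cost = sum over segments of squared deviations from the segment mean,
    plus `penalty` per segment. Returns the indices where new segments
    start (index 0 is implicit and not returned).
    """
    n = len(values)
    if n == 0:
        return []
    # Prefix sums give O(1) segment costs.
    s = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(values):
        s[i + 1] = s[i] + x
        s2[i + 1] = s2[i] + x * x

    def sse(i, j):
        # Squared deviations of values[i:j] from their mean.
        total = s[j] - s[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    best = [0.0] + [float("inf")] * n   # best[j]: optimal cost of values[:j]
    prev = [0] * (n + 1)                # prev[j]: start of the last segment
    for j in range(1, n + 1):
        for i in range(j):
            cost = best[i] + sse(i, j) + penalty
            if cost < best[j]:
                best[j], prev[j] = cost, i
    # Backtrack from the end to recover the change points.
    change_points, j = [], n
    while j > 0:
        j = prev[j]
        if j > 0:
            change_points.append(j)
    return sorted(change_points)


# Hypothetical per-bin activity of one signature along mutation order.
print(optimal_segmentation([0.1, 0.1, 0.1, 0.4, 0.4, 0.4], penalty=0.01))
```

The dynamic program considers every possible last segment, so the returned segmentation is globally optimal for the chosen cost. The runtime grows quadratically with the number of bins.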
88
General
A Flexible Pipeline for the Prediction of Biomarkers Relevant to Drug Sensitivity
V. Keith Hughitt1, Sayeh Gorjifard1, Aleksandra M. Michalowski1, John K. Simmons2, Ryan Dale1, Eric C. Polley3, Jonathan J. Keats4, Beverly A. Mock1
1NCI, 2Personal Genome Diagnostics, 3Mayo Clinic, Rochester, 4TGen
V. Keith Hughitt
Recent years have seen an explosion in the availability of paired molecular profiling and drug screen data, providing an unprecedented opportunity for the development of targeted therapies based on an individual’s genetic background. Despite a number of recent successes in diseases ranging from cystic fibrosis to cancer, significant hurdles remain in our ability to accurately predict treatments based on molecular profiling data. In particular, few tools exist that allow the integration of heterogeneous data types (e.g., genomic, transcriptomic, and somatic mutations) with high-throughput drug screen data to make predictions about treatment efficacy. Here, we describe a generalized open-source pipeline developed for the analysis of precision medicine data, the Pharmacogenomics Prediction Pipeline, or “P3”. The modular design of P3 enables the inclusion of arbitrary input data types and the selection from multiple alternative machine learning algorithms, while automated statistical and visualization reporting steps incorporated throughout the pipeline assist in parameter tuning and early detection of problematic data elements. By incorporating external biological annotations from sources such as the Molecular Signatures Database (MSigDB), the Drug Signatures Database (DSigDB), and DrugBank, P3 is able to detect important pathways correlated with drug sensitivity, while the inclusion of molecular profiling and clinical data from external patient and cell line datasets allows P3 to focus its efforts on the genes most likely to play a role in therapeutic response. To demonstrate the use of P3 for preclinical biomarker prediction, we applied P3 to an unpublished multiple myeloma dataset consisting of exome, RNA-Seq, and drug screen data for 1900 compounds across 45 tumor cell lines. Furthermore, gene expression and clinical data from 20 additional publicly available patient and cell line multiple myeloma datasets (>5,500 samples in total), along with data from the GDSC and CCLE drug sensitivity experiments, were also analyzed, providing a rich source of information on the biological relevance of putative biomarkers detected by the pipeline.
89
General
Creating a Metabolic Syndrome Research Resource (MetSRR)
Willysha Jenkins1, Christian Richardson2, ClarLynda Williams-DeVane PhD1
1Fisk University, Nashville, TN; 2Duke University, Durham, NC
Willysha Jenkins
Metabolic syndrome (MetS) is a multifaceted syndrome. Risk factors include visceral adiposity, dyslipidemia, hyperglycemia, hypertension, and environmental factors. An established component of chronic disease sequelae, MetS leads to an increased risk of cardiovascular disease and type 2 diabetes. MetS also leads to an increased risk of stroke. Comparative studies have identified heterogeneity in the pathology of MetS across groups; however, the etiology of these differences has yet to be elucidated. Despite the presence of public repositories of biological MetS-related data, accessing and working with these data remains challenging. The process of querying databases, wrestling with software, and wrangling data into workable formats prior to analysis is both cumbersome and time-consuming. The Metabolic Syndrome Research Resource (MetSRR) is a curated database that provides access to MetS-associated biological and ancillary data. It is an amalgamation of current and potential biomarkers of MetS extracted from relevant National Health and Nutrition Examination Survey (NHANES) data from 1999-2016. Each potential biomarker was selected based on insights from a review of over 100 peer-reviewed articles. The resource includes 28 demographic, survey, and known MetS-related variables, as well as 9 curated categorical variables and 42 potentially novel biomarkers. All measures are captured from over 90,000 individuals. This biocuration effort will provide increased access to curated MetS-related data. It will also serve as a hypothesis generation tool for disparate MetS etiology discovery, providing the ability to generate and export ethnic group/race-, sex-, and age-specific curated datasets. MetSRR seeks to broaden participation in research efforts to identify clinically evaluative disparate MetS biomarkers. To the best of our knowledge, MetSRR is the only MetS-specific database targeted at uncovering the disparate etiology of MetS through biocuration.
90
General
Utilizing cohort information to find causative variants
Senay Kafkas, Robert Hoehndorf
Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955-6900 Saudi Arabia
Senay Kafkas
Identification of causative variants in genomic data is challenging. Current studies focus on prioritizing variants within individual genomes or apply statistical methods (e.g., GWAS) to large cohorts. With rapid advances in NGS and its decreasing cost, scientists are able to produce sequence data from large disease cohorts and healthy populations. For example, UK Biobank makes available genotype-to-phenotype relations for >500,000 individuals and whole exome sequencing (WES) data for 50,000 individuals. Patients with the same or a similar set of phenotypes may share the same or biologically related genetic abnormalities and risk factors. The availability of these datasets may allow us to stratify individuals by their phenotype and use this information to identify causative variants within large cohorts. We propose a new method that stratifies patients by their phenotypes and identifies, from WES/WGS data, the set of causative variants that can explain the phenotypes of most individuals within a cohort. First, we generated and used synthetic disease cohorts to evaluate our method. We used the human genotype-phenotype associations from ClinVar and the sequence data from 1000 Genomes, and generated synthetic cohorts with different population sizes for 200 randomly selected diseases from ClinVar. To generate a synthetic disease cohort of size N, we first randomly picked N individuals from 1000 Genomes and then, for each individual, randomly picked one of the variants of the given disease and added it to that individual's genotype. We pre-processed the sequence data by annotating it with CADD and selecting only the most deleterious variant of a given gene for each individual. Furthermore, we “normalize” pathogenicity scores based on their frequencies within a population in order to account for the different distributions within genes based on their length. We then apply our method to UK Biobank. We developed a method that identifies causative variants by utilizing information about shared phenotypes within a cohort, and compared it against individual variant prioritization using WES/WGS data and average gene ranks. Our approach relies on a machine learning model trained on a pathogenicity prediction score (e.g., CADD) and the frequency of observing a pathogenicity score above a certain threshold in the same gene within a population, and uses this cohort- and phenotype-derived information as features to predict causative variants within individual genome sequences. Our method can identify causative variants in small and medium-sized cohorts (2 to 100 individuals). As the disease becomes more complex (i.e., involving harmful variants in multiple genes), our machine learning model improves over established methods, particularly in larger cohorts (>80 individuals). We have applied our method to UK Biobank and suggest candidate causative variants for 1499 complex diseases.
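The synthetic-cohort construction and the two pre-processing steps described above can be sketched in plain Python. The data layouts (dictionaries of variant sets), the function names, and the generic numeric score standing in for CADD are assumptions. The frequency function is one plausible reading of "the frequency of observing a pathogenicity score above a certain threshold in the same gene within a population", not the authors' exact normalization.

```python
import random
from collections import defaultdict


def make_synthetic_cohort(genotypes, disease_variants, n, seed=0):
    """Pick n distinct individuals and add one randomly chosen disease variant to each.

    genotypes: dict individual -> set of variant ids (hypothetical layout).
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(genotypes), n)
    cohort = {}
    for person in chosen:
        injected = rng.choice(disease_variants)
        cohort[person] = genotypes[person] | {injected}  # new set; input untouched
    return cohort


def most_deleterious_per_gene(variants, gene_of, score):
    """Keep only the highest-scoring variant in each gene for one individual."""
    best = {}
    for v in variants:
        g = gene_of[v]
        if g not in best or score[v] > score[best[g]]:
            best[g] = v
    return best


def gene_exceedance_frequency(population, gene_of, score, threshold):
    """Fraction of individuals with at least one variant above `threshold` in each gene."""
    counts = defaultdict(int)
    for variants in population.values():
        for g in {gene_of[v] for v in variants if score[v] > threshold}:
            counts[g] += 1
    return {g: c / len(population) for g, c in counts.items()}


# Hypothetical toy data.
gene_of = {"v1": "G1", "v2": "G1", "v3": "G2"}
score = {"v1": 10.0, "v2": 25.0, "v3": 5.0}
population = {"a": {"v1", "v2"}, "b": {"v3"}, "c": {"v1"}}
print(gene_exceedance_frequency(population, gene_of, score, threshold=8.0))
```

A gene in which many healthy individuals already carry high-scoring variants will show a high exceedance frequency. Using that frequency as a feature lets a model discount high raw scores from genes that tolerate variation.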
91
General
Integrated analysis of JAK-STAT pathway in homeostasis, simulated inflammation and tumour
Milica Krunic1, Anzhelika Karjalainen1, Mojoyinola Joanna Ola1, Stephen Shoebridge1, Sabine Macho-Maschler1, Caroline Lassnig1, Andrea Poelzl1, Matthias Farlik2, Nikolaus Fortelny2, Christoph Bock2, Birgit Strobl1, Mathias Mueller1
1Institute of Animal Breeding and Genetics and Biomodels Austria, University of Veterinary Medicine, Vienna, Austria; 2CeMM – Center for Molecular Medicine, Austrian Academy of Sciences, Vienna, Austria
Milica Krunic
Janus kinases (JAKs) and signal transducers and activators of transcription (STATs) play a key role in cytokine signalling and in the defence against infection and cancer. JAK-STAT signalling components interact with chromatin remodelling proteins and change chromatin architecture and landscape during cell differentiation and during the recognition and elimination of pathogens. Using different sequencing approaches (ATAC-Seq, ChIPmentation, single-cell RNA-Seq, RNA-Seq), our goal is to untangle the roles of JAK-STAT proteins in shaping the chromatin landscapes of myeloid and lymphoid cells in homeostasis, in sterile (simulated) inflammation and within the tumour microenvironment. Additionally, we are investigating how evolutionarily conserved STAT protein isoforms interact with chromatin and co-regulatory proteins to induce cell type- and gene-specific responses. The poster summarises the findings obtained by integrating these different approaches.
92
General
BEERS2: The Next Generation of RNA-Seq Simulator
Nicholas F. Lahens1, Thomas G. Brooks1, Dimitra Sarantopoulou1, Soumyashant Nayak1, Cris W. Lawrence1, Anand Srinivasan2, Jonathan Schug3,4, Garret A. FitzGerald1,5, John B. Hogenesch6, Yoseph Barash4, Gregory R. Grant1,4
1Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA; 2PMACS Enterprise Research Applications and High Performance Computing, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA; 3Institute for Diabetes, Obesity, and Metabolism, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA; 4Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA; 5Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA; 6Division of Human Genetics, Department of Pediatrics, Center for Chronobiology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH
Nicholas Lahens
The accurate interpretation of RNA-Seq data presents a moving target as scientists continue to introduce new experimental techniques and analysis algorithms. This challenge has led researchers to perform a substantial number of benchmarking studies in order to determine best analysis practices. Simulated datasets have proven to be an invaluable tool in these efforts. Despite this strong need for simulated data, only a few RNA-Seq simulators have been released in the public domain, and all of them are based on simplifying assumptions that limit their utility. To address these shortcomings and generate realistic simulated data, we are developing the Benchmarker for Evaluating the Effectiveness of RNA-Seq Software (BEERS) 2. BEERS2 is an open-source, modular simulator that models each step in the process of converting RNA molecules into sequencing reads. We take an empirical approach to generating realistic RNA samples that reflect biological variability, alternative splicing, and allele-specific expression, using real data to train the parameters. Next, we model the biochemical reactions and biases from each step in library construction as separate modules. Using an object-oriented paradigm, each module has well-defined inputs and outputs, allowing users to easily substitute new modules. This design gives BEERS2 the flexibility to model changes to library construction and sequencing protocols, evolving in parallel with sequencing technology. BEERS2 is open source, freely available, and will be a crucial tool for the community as we continue to develop standards for transcriptome analysis.
93
General
Effect Modification by Age on a Diagnostic Three-Gene Signature in Patients with Active Tuberculosis
Lauren McDonnell1, Carly A. Bobak1,2, Matthew Nemesure1, Justin Lin1, Jane E. Hill1
1Thayer School of Engineering at Dartmouth College, 2Geisel School of Medicine at Dartmouth College
Lauren McDonnell
Introduction: Tuberculosis (TB) is the leading cause of death from a single infectious agent worldwide (1). In 2017, there were 10 million reported cases of TB and another 1.3 million deaths from the disease (1). It is currently the leading killer of individuals who are HIV-positive (1). In 2014, the WHO developed the ambitious Sustainable Development Goals (SDGs), which included "End TB", a major program aiming to eradicate the TB epidemic by 2030 (2). Accomplishing this will require more advanced diagnostics that are less invasive and determine disease status more quickly and more reliably. In our analysis, we aim to model risk factors associated with the development of TB. Here, we examine demographic features from multi-cohort studies in the Gene Expression Omnibus, drawing on data from thirty different countries and covering patients with active TB, latent TB, other diseases, and healthy controls. The data come predominantly from developing countries, but also include samples from developed countries, including the UK, France, Germany, and the United States. In total, the dataset includes 3,096 participants. A meta-analysis of similar datasets has proposed a three-gene score as a "global" tuberculosis metric (3). This type of analysis suggests that all active TB patients, regardless of other factors, will express this gene score. Our hypothesis is that this active TB signature is additionally modified by demographic factors associated with TB, such as age and HIV status. Methodology: We performed a multivariate logistic regression analysis to identify demographic features associated with culture-confirmed tuberculosis. The model features included age, HIV status, and the expression of each gene individually (GBP5, DUSP3, and KLF2), as well as interaction terms for HIV status and age with each of the three genes. Results: The results of our multivariate logistic regression suggest that age modifies the effect of each of the three genes in the proposed global gene signature (p-values of 5.38e-05, 6.75e-05, and 0.01012 for GBP5, KLF2 and DUSP3, respectively). Initial findings also indicate that HIV status modifies the effect of GBP5 (p-value of 0.03437). Knowing that the relationship between the expression of these three genes and TB varies by demographics may change the way that a diagnostic is implemented in the clinic. Our hope is that this analysis will be used to further refine the three-gene signature for specific demographic groups where it may be most effective in diagnosing active TB.
Citations:
(1) WHO Global Tuberculosis Report 2018. www.who.int/tb/publications/global_report/en/
(2) A. B. Suthar, R. Zachariah, Harries. Ending Tuberculosis by 2030: Can We Do It? https://www.ingentaconnect.com/contentone/iuatld/ijtld/2016/00000020/00000009/art00007?crawler=true
(3) Genome-Wide Expression for Diagnosis of Pulmonary Tuberculosis: a Multicohort Analysis. https://www.ncbi.nlm.nih.gov/pubmed/26907218
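The interaction model in the Methodology can be made concrete with a short Python sketch. It builds one feature row with main effects plus age x gene and HIV x gene interaction columns, then evaluates a logistic model. Any coefficient values passed in are placeholders, not estimates from this study. Model fitting (e.g. maximum likelihood in a statistics package) is left out. With interaction terms, a gene's log-odds slope is coef[gene] + coef[age:gene] * age + coef[hiv:gene] * hiv rather than a constant, which is what "effect modification" refers to.

```python
import math

GENES = ("GBP5", "DUSP3", "KLF2")

def design_row(age, hiv, expr):
    """One feature row: main effects plus age x gene and HIV x gene interaction terms.

    age: years; hiv: 1 if HIV-positive else 0; expr: dict gene -> expression (hypothetical encoding).
    """
    row = {"intercept": 1.0, "age": float(age), "hiv": float(hiv)}
    for g in GENES:
        row[g] = expr[g]
        row[f"age:{g}"] = age * expr[g]  # lets the gene's effect change with age
        row[f"hiv:{g}"] = hiv * expr[g]  # lets the gene's effect differ by HIV status
    return row

def p_active_tb(row, coef):
    """Logistic model: P(active TB) = 1 / (1 + exp(-sum_k coef[k] * row[k]))."""
    z = sum(coef.get(k, 0.0) * v for k, v in row.items())
    return 1.0 / (1.0 + math.exp(-z))

def gene_log_odds_slope(coef, gene, age, hiv):
    """Change in log-odds per unit expression of `gene` at a given age and HIV status."""
    return (coef.get(gene, 0.0)
            + coef.get(f"age:{gene}", 0.0) * age
            + coef.get(f"hiv:{gene}", 0.0) * hiv)
```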
94
General
Classification and mutation prediction from gastrointestinal cancer histopathology images using deep learning
Sung Hak Lee1, Hyun-Jong Jang2
1Department of Hospital Pathology, Seoul St. Mary’s Hospital, College of Medicine, The Catholic University of Korea, 2Department of Physiology, College of Medicine, The Catholic University of Korea
Sung Hak Lee
BACKGROUND: Although microscopic analysis of tissue slides has been the basis for disease diagnosis for decades, intra- and inter-observer variability remains an issue to be resolved. The recent introduction of digital scanners has made many H&E whole slide images (WSIs) available, which has allowed researchers to use deep learning in the analysis of tissue images. In the present study, we investigated the feasibility of a deep learning-based, fully automated, computer-aided diagnosis system using WSIs from a gastric adenocarcinoma (STAD) dataset. In addition, we trained the network to predict several commonly mutated genes in STAD. Furthermore, we showed that deep learning can predict microsatellite instability (MSI) directly from H&E images. MATERIALS AND METHODS: We studied the automatic classification of ‘normal’ and ‘tumor’ regions using a total of 432 H&E-stained WSIs from the TCGA gastric cancer image dataset. The slides were tiled in non-overlapping 360x360 pixel windows at a magnification of 20x. We used 70% of those tiles for training, 15% for validation, and 15% for final testing. Deep learning with convolutional neural networks was based on the Inception-v3 architecture. To study the prediction of gene mutations from H&E images, average area under the curve (AUC) values for KRAS and SMAD4 mutations (93 and 88 cases, respectively) were calculated using our automatic tumor classification deep-learning approach. To study the prediction of MSI (MSS vs. MSI-H) from H&E images, 383 cases were enrolled using the same approach. RESULTS: The performance of our method is comparable to that of pathologists, with an AUC of up to 0.999. Furthermore, we trained the network to predict two commonly mutated genes in STAD (KRAS and SMAD4) and investigated whether their mutation status can be predicted from pathology H&E images. We found that KRAS and SMAD4 mutations can be predicted from pathology images, with AUCs of 0.711 to 0.737, similar to results from previous studies of non-small cell lung cancer histopathology images using deep learning. For the prediction of MSI, patch-level and patient-level AUCs were 0.843 and 0.912, respectively, which is superior to previous studies with TCGA-COAD and -STAD histopathology images. CONCLUSIONS: These findings suggest that deep-learning models can assist pathologists in the detection of cancer subtypes and in the prediction of gene mutations and MSI status. After training on larger datasets and prospective validation, this approach has the potential to extend immunotherapy to a much broader subset of patients with STAD.
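A minimal sketch of the tiling and 70/15/15 split described in Materials and Methods follows. It assumes that partial tiles at the slide edges are discarded and that items are shuffled with a fixed seed. Neither detail is stated in the abstract. The split helper works on any list of IDs. Passing slide or patient IDs instead of tile IDs is one way to keep tiles from the same patient out of both the training and the test set.

```python
import random

def tile_coordinates(width, height, tile=360):
    """Top-left (x, y) corners of all full, non-overlapping tile x tile windows.

    Partial tiles at the right and bottom edges are dropped (an assumption).
    """
    return [(x, y)
            for y in range(0, height - tile + 1, tile)
            for x in range(0, width - tile + 1, tile)]

def split_70_15_15(items, seed=0):
    """Shuffle items and split them 70/15/15 into training, validation and test lists."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 70 // 100  # integer arithmetic avoids float rounding surprises
    n_val = len(items) * 15 // 100
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]
```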
95
General
Mapping the Emergence and Migration of Hematopoietic Stem Cells and Progenitors During Human Development at Single Cell Resolution
Feiyang Ma, Vincenzo Calvanese, Sandra Capellera-Garcia, Sophia Ekstrand, Matteo Pellegrini, Hanna K. A. Mikkola
Department of Molecular, Cell and Developmental Biology, UCLA, Los Angeles, CA, USA
Feiyang Ma
Hematopoiesis is established during development through multiple waves of blood cell production. These waves start with the lineage-primed progenitors required for the embryo's needs and culminate in the generation of self-renewing hematopoietic stem cells (HSCs) for lifelong hematopoiesis. Although hematopoietic ontogeny has been studied extensively in mice, we lack an anatomical, temporal and molecular map of hematopoietic development in humans. Prior studies suggest that HSCs emerge from hemogenic endothelium in the aorta-gonad-mesonephros (AGM) region between 4 and 6 weeks of human gestation. Extraembryonic sites, including the placenta, the umbilical and vitelline arteries, and the yolk sac, have been proposed to generate HSCs in the mouse. However, whether the same sites generate HSCs in humans is unclear, mainly due to limited access to developmental tissues and the lack of reliable methods to identify developing human HSCs. We created a single-cell transcriptome map of hemato-vascular cells (CD34+ and/or CD31+) from human hematopoietic tissues in the 1st and 2nd trimesters. Using a molecular signature of self-renewing HSCs defined in our previous molecular and functional studies, we could identify CD34+Thy1+RUNX1+HOXA7+MLLT3+HLF+ cells as HSCs throughout development. Analyses of the 5-week AGM revealed a distinct population of newly emerged HSCs that vanished by 7 weeks. HSCs colonized the fetal liver by 6 weeks, where they expanded and differentiated beyond 15 weeks. A small but distinct population expressing HSC molecular markers was reproducibly detected in 5-week placentas. At this time, the heart, umbilical cord and fetal liver lacked clear HSC populations, implying minimal spreading through circulating blood. Interestingly, preceding HSC colonization, the 5-week fetal liver already harbored CD34+Thy1-RUNX1+HOXA7-MLLT3-HLF- progenitors that co-expressed markers associated with erythro-myeloid and lympho-myeloid potential. Comparable populations were abundant in the yolk sac, suggestive of their origin. This dataset provides an unprecedented resource to dissect the dynamics and molecular pathways governing the emergence and progression of distinct waves of hematopoietic cells during human development. It also serves as a reference map for the generation of HSCs in vitro for therapeutic purposes.
96
General
Large-scale Machine Learning and Graph Analytics for Functional Prediction of Pathogen Proteins
Jason McDermott1, Song Feng1, William Nelson1, Joon-Yong Lee1, Sayan Ghosh1, Ariful Khan1, Mahantesh Halappanavar1, Justine Nguyen2, Jonathan Pruneda2, David Baltrus3, Joshua Adkins1
1Pacific Northwest National Laboratory, 2Oregon Health & Science University, 3University of Arizona
Jason McDermott
Proteins enact the functionality encoded by genomes, so understanding protein function is critical to many areas of biology. Prediction of protein function from sequence is possible because of evolutionary relationships between proteins with similar functions, and existing algorithms can identify the corresponding sequence similarity. However, many proteins have similar functions but diverse sequences, which thwarts existing methods. In addition, driven by advances in sequencing technology, the number of protein sequences with no known function or similarity to proteins of known function is large and growing rapidly. We use reduced amino acid alphabet mapping and k-mer-based protein sequence representation to detect functional similarities between proteins. We apply this method to bacterial and viral proteins that mimic eukaryotic ubiquitin ligases and deubiquitinases, and to classes of bacteriocins. These models allow prediction of novel examples that are not detected by traditional sequence similarity, and can provide insight into active sites or other functional domains of the proteins. To explore sequence space in a more discovery-oriented way, we have applied this approach to a very large set of bacterial protein sequences (>20 million sequences) and use a GPU-based algorithm to quickly calculate a similarity graph based on protein features beyond traditional sequence similarity. Exascale graph analytics methods are used to identify groups of closely related sequences from the similarity graph. We show that this method can recapitulate known relationships between proteins, highlight inconsistencies in the underlying protein database, and provide hypotheses for the functions of novel proteins, thus providing a large-scale sequence landscape.
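The core representation described above, reduced-alphabet k-mer profiles compared pairwise to build a similarity graph, can be sketched in plain Python. This is an illustration only. The six-group alphabet below is a hypothetical grouping chosen for the example, not the one used by the authors. Cosine similarity is an assumed choice of metric. The all-pairs loop stands in for the GPU-based algorithm and would not scale to 20 million sequences.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical 6-letter reduced alphabet (grouping chosen for illustration only).
GROUPS = {"AVLIMC": "h", "FWYH": "a", "STNQ": "p", "KR": "+", "DE": "-", "GP": "s"}
REDUCE = {aa: code for group, code in GROUPS.items() for aa in group}

def kmer_profile(seq, k=3):
    """Counts of k-mers after mapping each residue to its reduced-alphabet letter."""
    reduced = "".join(REDUCE.get(aa, "x") for aa in seq.upper())  # unknown residues -> 'x'
    return Counter(reduced[i:i + k] for i in range(len(reduced) - k + 1))

def cosine(p, q):
    """Cosine similarity of two k-mer count profiles."""
    dot = sum(p[key] * q[key] for key in p.keys() & q.keys())
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0

def similarity_edges(seqs, k=3, threshold=0.5):
    """Edges (name_a, name_b, score) of a similarity graph, from a naive all-pairs loop."""
    profiles = {name: kmer_profile(s, k) for name, s in seqs.items()}
    edges = []
    for a, b in combinations(sorted(profiles), 2):
        score = cosine(profiles[a], profiles[b])
        if score >= threshold:
            edges.append((a, b, score))
    return edges
```

For example, "LLLL" and "VVVV" share no residues but map to the same reduced profile. They are therefore linked with similarity 1.0, which shows how a reduced alphabet can connect sequences that identity-based comparison would miss.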
97
General
Gene-set analysis using GWAS summary statistics and GTEx database
Masahiro Nakatochi
Department of Nursing, Nagoya University Graduate School of Medicine
Masahiro Nakatochi
Recently, the sample sizes of genome-wide association studies (GWASs) have been rapidly increasing. Consequently, many genetic loci associated with traits have been identified. However, it is difficult to interpret how these many loci identified by GWAS contribute to the traits. One function of a SNP is the regulation of gene expression levels, and such SNPs are called expression quantitative trait loci (eQTLs). The GTEx project revealed many eQTLs in many human tissues. In this study, I propose a gene-set analysis approach using GWAS summary statistics and the GTEx database to investigate how the genetic loci identified by GWAS contribute to the trait. This approach has three steps. First, trait-associated SNPs are identified by GWAS. Second, the GTEx database is searched for genes whose expression levels are associated with the trait-associated SNPs in at least one tissue. These genes are classified as either positively or negatively correlated genes. Finally, gene-set enrichment analyses of the positively correlated genes and of the negatively correlated genes are performed with the modified Fisher’s exact test to identify trait-associated pathways or gene sets. Using this approach, I identified serum uric acid (SUA)-associated gene sets based on an SUA GWAS. Gene-set enrichment analysis of UniProt terms found that the terms “Williams-Beuren syndrome”, “sodium”, “transport”, “sodium transport”, and “alternative splicing” were enriched among the positively correlated genes. This approach provides another insight into the SNPs identified by GWAS.
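The enrichment step can be illustrated with a one-sided Fisher's exact test, written here as the upper tail of the hypergeometric distribution using only the standard library. The abstract does not specify how its Fisher's exact test is modified. One well-known variant, DAVID's EASE score, penalizes by removing one gene from the overlap. The `penalize` flag below approximates that idea by subtracting one from the overlap before testing. This is an assumption, and the exact bookkeeping of the published EASE score may differ.

```python
from math import comb

def fisher_upper_tail(overlap, list_size, set_size, background, penalize=False):
    """One-sided Fisher's exact test, P(X >= overlap) under the hypergeometric null.

    overlap:    genes in the input list (e.g. positively correlated genes) annotated to the set
    list_size:  number of genes in the input list
    set_size:   number of background genes annotated to the gene set (e.g. a UniProt term)
    background: total number of background genes
    penalize:   subtract one gene from the overlap first (EASE-style; assumed variant)
    """
    if penalize:
        overlap = max(overlap - 1, 0)
    upper = min(list_size, set_size)
    hits = sum(comb(set_size, x) * comb(background - set_size, list_size - x)
               for x in range(overlap, upper + 1))
    return hits / comb(background, list_size)
```

Running the test separately on the positively and the negatively correlated gene lists, as the abstract describes, gives two p-values per gene set.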
98
General
Targeting Cancer via Signaling Pathways: A Novel Approach to the Discovery of Gene CCDC191's Double-agent Function using Differential Gene Expression, Heat Map Analyses through AI Deep Learning, and Mathematical Modeling
Annie Ostojic
Purdue University
Annie Ostojic
According to a recent Johns Hopkins University study posted in May 2018, the total number of genes in the human genome was recalculated to be 43,162, comprising 21,306 protein-coding genes and 21,856 non-coding genes. With the completion of base-pair sequencing in the Human Genome Project in 2003, there was hope for the acceleration of new medical treatments and disease interventions. However, earlier bioinformatic processes were unable to produce results quickly enough, so many gene functions remain unknown to date. There is a need to analyze gene functions in pathways to meet the changing demands of pharmacogenomics, personalized medicine, and gene expression-based cancer treatment. New methodology for determining the functions of unstudied genes, by rapidly extrapolating, classifying, and correlating their gene expression with biological pathways, is at the forefront of bioinformatic studies. This research discovered the function of gene CCDC191, a coiled-coil domain-containing protein-coding gene whose function had not been fully studied or defined. A novel approach was used to determine the function of CCDC191 by combining gene expression analysis, patient survival analysis, differential gene expression, heat maps with AI deep learning, and reverse-engineering mathematical modeling. This study presents analyses of and insights into gene CCDC191 that have not been performed before. It also provides a replicable methodology that incorporates AI deep learning image classification and reverse-engineering mathematical modeling to determine gene functions in pathways and cancer connectedness.
99
General
RFEX: Simple Random Forest Model and Sample Explainer for non-Machine Learning experts
Dragutin Petkovic, Ali Alavi, DanDan Cai, Jizhou Yang, Sabiha Barlaskar
San Francisco State University (all authors)
Dragutin Petkovic
Machine Learning (ML) is becoming an increasingly critical technology in many areas. However, its complexity and its frequent “non-transparency” create significant challenges, especially in the biomedical and health areas. One of the critical components in addressing these challenges is the explainability or transparency of ML systems. This covers both model explainability (related to the whole data) and sample explainability (related to specific samples). Our research focuses on both model and sample explainability of Random Forest (RF) classifiers. Our RF explainer, RFEX, is designed from the ground up with non-ML experts in mind and emphasizes simplicity and familiarity, e.g. by providing a one-page tabular output and measures familiar to most users. In this paper we present three contributions. The first is a significant improvement of the RFEX Model explainer compared to the previously published version. The second is a new RFEX Sample explainer that explains how the RF classifies a particular data sample and is designed to relate directly to the RFEX Model explainer. The third is a case study of the RFEX Model and Sample explainers from our collaboration with the J. Craig Venter Institute (JCVI). We show that our approach offers a simple yet powerful means of explaining RF classification at the model and sample levels, and in some cases even points to areas of new investigation. RFEX is easy to implement using available RF tools, and its tabular format offers easy-to-understand representations for non-experts, enabling them to better leverage RF technology.
100
General
Apparent bias toward long gene misregulation in MeCP2 syndromes disappears after controlling for baseline variations
Ayush T. Raman1,2, Amy E. Pohodich2, Ying-Wooi Wan2, Hari Krishna Yalamanchili2, William E. Lowry3, Huda Y. Zoghbi2, Zhandong Liu2
1Broad Institute of MIT and Harvard, 2Baylor College of Medicine, 3University of California Los Angeles
Ayush Raman
Background: Rett syndrome is a neurodevelopmental disorder caused by mutations in MECP2, which encodes a methyl-binding protein whose task is to orchestrate gene expression. MeCP2 mutations disrupt the expression of several thousand genes. Over the past ten years, a number of studies have observed that Rett syndrome and other disorders that affect neuronal synapses seem to preferentially dysregulate genes that are longer than 100 kb. These length-dependent transcriptional changes in MeCP2-mutant samples are modest. Given the low sensitivity of high-throughput transcriptome profiling technology, we re-evaluate the statistical significance of these results here. Results: We develop a robust statistical approach to estimate noise accurately and identify statistically significant gene length-dependent changes. We find that the apparent length-dependent trends previously observed in MeCP2 microarray and RNA-sequencing datasets disappear after estimating baseline variability (i.e., intra-sample differences) from randomized control samples across 17 different publicly available MeCP2 datasets. We show that even the MAQC/SEQC Phase-III benchmark datasets, which do not involve MeCP2 or its effects on expression, are prone to the long gene bias. This suggests that the bias is not an inherent feature of gene expression following MeCP2 disruption. We hypothesized that PCR amplification, a process shared by both microarray and RNA-seq technologies, might introduce the observed bias in long gene expression. We find no bias with nanoString technology, a technique that does not use PCR amplification, for SEQC/MAQC samples or Mecp2 mutant samples. This confirmed our notion that the previous observations of long-gene bias resulted from amplification-based technologies and the failure to establish a proper baseline. Conclusions: We conclude that accurate characterization of length-dependent (or other) trends requires establishing a baseline from randomized control samples. We propose that smaller fold changes in transcription observed after PCR amplification lead to an overestimation of long gene expression levels.
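The idea of estimating a baseline from randomized control samples can be sketched as a permutation procedure. Controls are repeatedly split at random into two pseudo-groups, and the same length-dependence statistic is recomputed for each split. An observed mutant-vs-control trend is then judged against that null distribution. The abstract does not name its statistic. The Pearson correlation between log2 fold change and log10 gene length used here is an assumption, as are the pseudocount and the sample layout.

```python
import math
import random
from statistics import mean

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def length_trend(group_a, group_b, lengths):
    """Correlation of log2 fold change (group_b vs group_a) with log10 gene length.

    Each sample is a list of expression values in the same gene order as `lengths`;
    a pseudocount of 1 avoids division by zero.
    """
    lfc = [math.log2((mean(s[g] for s in group_b) + 1) / (mean(s[g] for s in group_a) + 1))
           for g in range(len(lengths))]
    return pearson(lfc, [math.log10(n) for n in lengths])

def baseline_null(controls, lengths, n_iter=200, seed=0):
    """Null distribution of the trend from random control-vs-control splits."""
    rng = random.Random(seed)
    half = len(controls) // 2
    null = []
    for _ in range(n_iter):
        shuffled = controls[:]
        rng.shuffle(shuffled)
        null.append(length_trend(shuffled[:half], shuffled[half:2 * half], lengths))
    return null

def empirical_p(observed, null):
    """Two-sided empirical p-value of an observed trend against the control baseline."""
    extreme = sum(abs(v) >= abs(observed) for v in null)
    return (extreme + 1) / (len(null) + 1)
```

In use, `observed = length_trend(controls, mutants, lengths)` would be compared with `baseline_null(controls, lengths)`. A trend that is no more extreme than the control-vs-control splits gives a large empirical p-value, which is the situation the abstract reports.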
101
General
Prediction of chronological and biological age from laboratory data
Luke Sagers1, Luke Melas-Kyriazi2, Chirag J. Patel3, Arjun K. Manrai1
1Boston Children’s Hospital Computational Health Informatics Program, 2Harvard University Department of Mathematics, 3Harvard Medical School Department of Biomedical Informatics
Luke Sagers
Aging has pronounced effects on the blood laboratory biomarkers used in the clinic. Prior studies have largely investigated a single biomarker or population at a time, limiting a comprehensive view of biomarker variation and aging across different populations. Here we develop a supervised machine learning approach to study the aging process using 356 blood biomarkers measured in 67,536 individuals across demographically diverse populations. Our model predicts age with a mean absolute error (MAE) in held-out data of 4.76 years and an R² value of 0.92. Age prediction was highly accurate for the pediatric cohort (MAE = 0.87, R² = 0.94) but inaccurate for ages 65+ (MAE = 4.30, R² = 0.25). Extensive variability was observed in which biomarkers carry the most predictive power across different age groups, genders, and race/ethnicity groups, and novel candidate biomarkers of aging were identified for specific age ranges (e.g. Vitamin E for ages 18-45). We further show that predictors accurate for one age group may fail to generalize to other groups, and find that nearly a third of all biomarkers exhibit non-linearity near adulthood. As populations worldwide undergo major demographic changes, it will be increasingly important to catalogue biomarker variation across age groups and to discover new biomarkers that distinguish chronological from biological aging.
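The two metrics reported above behave differently within narrow age bands. This helps explain how the 65+ group can have an MAE similar to the overall model yet a much lower R². Below is a small sketch of both metrics in plain Python, with toy numbers that are illustrative only and not taken from the study. Both toy sets have prediction errors of plus or minus 4 years, so MAE is 4 years in each. R² is about 0.97 for the wide age range but only 0.2 for the narrow one, because R² divides the squared error by the (small) variance of true ages in the narrow group.

```python
def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Toy data (not from the study): identical error sizes, different spread of true ages.
wide_true, wide_pred = [5, 25, 45, 65], [9, 21, 49, 61]
narrow_true, narrow_pred = [66, 70, 74, 78], [70, 66, 78, 74]
```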
102
General
Whole genome sequencing analysis of influenza C virus in Korea
Sooyeon Lim, Han Sol Lee, Ji Yun Noh, Joon Young Song, Hee Jin Cheong, Woo Joo Kim
Division of Infectious Diseases, Department of Internal Medicine, Korea University College of Medicine, Seoul, South Korea; Division of Brain Korea 21 Program for Biomedicine Science, College of Medicine, Korea University, Seoul, South Korea; Asia Pacific Influenza Institute, Korea University College of Medicine, Seoul, South Korea
Sooyeon Lim
Through the Hospital-based Influenza Morbidity and Mortality (HIMM) surveillance system, 973 nasopharyngeal swab specimens from children under 2 years of age were collected and tested for influenza viruses using real-time PCR. Among the tested specimens, 383 were positive for influenza A and/or B virus. Influenza C virus was confirmed in five specimens. In this study, we used the five influenza C virus-positive specimens and a cell-cultured influenza C virus. Viral RNA was isolated using the QIAamp viral RNA mini kit (Qiagen, Hilden, Germany) following the manufacturer’s instructions. All isolated RNA was eluted in 60 µl of distilled water. Reverse transcription was performed with the PrimeScript 1st strand cDNA synthesis kit (Takara, Shiga, Japan) using the uni-5’ primer. Genome-wide amplification of the influenza C virus was performed using Taq polymerase. Sequencing libraries were prepared from the amplified gene fragments using the Nextera XT DNA Library Prep kit (Illumina), according to the manufacturer’s protocol. This study is the first report of influenza C virus using NGS analysis in South Korea. In this study, young children with influenza C virus infections had acute respiratory illnesses, such as fever, rhinorrhea, and cough, but no pneumonia or severe respiratory illness was observed. Based on NGS analysis, we can expand our understanding of the various symptoms of influenza C virus.
103
General
Mining the Humuhumunukunukuapua and the Shaka of Autism with Big Data Biomedical Data Science
Peter Washington, Brianna Chrisman, Kaiti Dunlap, Aaron Kline, Arman Husic, Michael Ning, Kelley Marie Paskov, Nathaniel Stockham, Maya Varma, Emilie LeBlanc, Jack Kent, Yordan Penev, Min Woo Sun, Jae-Yoon Jung, Catalin Voss, Nick Haber, Dennis P. Wall
Departments of Pediatrics (Systems Medicine) and Biomedical Data Science, Stanford University
Dennis Wall
Mental health is arguably at the core of all health, and early childhood mental health predicts a long-term healthy life course. Yet finding, treating, and preventing mental health disorders in children is limited by reach and by a lack of scalable methods. Thankfully, advances in AI and ubiquitous technology have ushered in unparalleled opportunities for scalable mobile health. We have constructed a series of mobile solutions that treat and track while simultaneously building novel computer vision libraries for precision models. These solutions function as mobile games that are highly engaging and designed for the individual. They encourage compliance with the required “dose” while passively collecting metrics to measure, and ultimately predict, outcomes. We can quantify or digitize a child’s phenotype through these passively collected data, not just once but many times, as the child plays our games and learns through playing. These games engender trust, and as they do, we “crowd”-build a community of stakeholders that shares not only Phenome data but also data on their Genome and the Environment. With the 3 modalities, we use data fusion multivariate techniques to resolve the G+E=P equation for autism and set the stage for doing the same in other spectrum disorders across mental health.
104
General
Development of a recurrence prediction model for early lung adenocarcinoma using radiomics-based artificial intelligence
Hee Chul Yang, Gunseok Park, Ji Eun Oh
Division of Convergence Technology, National Cancer Center Research Institute
Hee Chul Yang
Purpose: This study aimed to predict recurrence after curative resection in patients with lung adenocarcinoma (ADC) using phenotypic radiomics features obtained from CT images. Material: From January 1, 2010, to December 31, 2015, a total of 604 patients with primary lung ADC and a tumor size of 1-3 cm underwent curative resection at a single institution. Method: The preoperative CT images of all 604 patients were used for feature extraction. The final dataset was randomized into a training set (n=424) and a test set (n=180) at a ratio of 7:3. Radiomics features were selected by t-test (P<0.05), and a radiomics signature was built with a logistic regression classifier. The optimal model was evaluated with a ROC curve. Result: In the logistic regression analysis, 6 radiomics features were finally selected from 51 features to build a radiomics signature that was significantly associated with recurrence. The optimal model was built with the features associated with the dependent variable. These features performed well in predicting recurrence on their own, with an AUC of 76.2%. In the test set, the model achieved 72.2% accuracy. Conclusion: The radiomics signature can be a useful recurrence prediction tool even in small-sized lung ADC.
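Two building blocks of the pipeline above, t-test feature screening and ROC AUC evaluation, can be sketched with the standard library. This is a sketch under stated assumptions, not the authors' code. It uses Welch's t statistic and treats |t| > 1.96 as a stand-in for P < 0.05, which is only a large-sample approximation. The AUC is computed as the probability that a randomly chosen recurrence case scores higher than a randomly chosen non-recurrence case. The logistic regression fit itself is omitted.

```python
import math
from statistics import mean, variance

def welch_t(x, y):
    """Welch's two-sample t statistic for one feature (recurrence vs no recurrence)."""
    return (mean(x) - mean(y)) / math.sqrt(variance(x) / len(x) + variance(y) / len(y))

def select_features(features, labels, t_cut=1.96):
    """Keep features with |t| > t_cut; 1.96 approximates P < 0.05 for large samples only."""
    keep = []
    for name, values in features.items():
        pos = [v for v, lab in zip(values, labels) if lab == 1]
        neg = [v for v, lab in zip(values, labels) if lab == 0]
        if abs(welch_t(pos, neg)) > t_cut:
            keep.append(name)
    return keep

def roc_auc(scores, labels):
    """AUC as P(random positive scores above random negative); ties count one half."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise AUC is quadratic in sample size. That is fine for a few hundred patients, but a rank-based formula is preferable for larger datasets.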
105
General
DRLPC: Dimension Reduction of Sequencing Data using Local Principal Components
Yun Joo Yoo1, Fatemeh Yavartanu1, Shelley B. Bull2
1Seoul National University, 2The Lunenfeld-Tanenbaum Research Institute
Yun Joo Yoo
Genome-wide association studies (GWAS) using single nucleotide polymorphism (SNP) data usually involve millions of variables with a complex correlation structure resulting from linkage disequilibrium (LD). When multi-SNP joint analysis using multiple regression is applied, a dimension reduction method such as principal component analysis can be considered. Replacing SNP data with principal components can resolve the multicollinearity that often occurs in regression using high-density sequencing or imputed SNP data. However, principal components constructed from all SNP variables in a region are hard to interpret as a biological entity and are not useful for localization and fine mapping. In this study, we propose an algorithm, DRLPC (Dimension Reduction using Local Principal Components), that reduces the dimension for regression analysis. DRLPC selects clusters of highly correlated SNPs and replaces each cluster with a local principal component constructed from the SNPs in that cluster. The algorithm aims to resolve multicollinearity between the updated variables by considering the variance inflation factor (VIF) and removing variables with high VIF. We examined the behaviour of DRLPC by applying the algorithm to the 1000 Genomes Project data. Chromosome 22 SNP sets of three populations (EUR, ASN, AFR) were dimension-reduced for each gene region separately, comparing several choices of threshold values for clustering and principal component selection. When averaged across genes, the ratio of the number of final variables to the number of original variables was 50% for genes with 5 to 10 SNPs and as low as 10% for genes with more than 1,000 SNPs. The reduction rate was smaller for the AFR population than for the EUR and ASN populations, possibly due to weaker LD in the African population. We also compared the power of multi-SNP tests constructed from regression results obtained from the original data and from the dimension-reduced data. These tests include generalized Wald, LC (linear combination), and MLC (multi-bin linear combination) tests. LC and MLC tests are also dimension reduction techniques. LC combines all individual effects into a one-degree-of-freedom test, while MLC combines the individual effects into a linear combination within each bin (cluster) and constructs a test with degrees of freedom equal to the number of clusters. Since DRLPC uses the same clique-partitioning-based clustering algorithm as MLC, we compared MLC on the original data with the DRLPC Wald test on the processed data under the same clustering threshold, and found that they yield similar power. We conclude that DRLPC can provide efficient dimension reduction while resolving multicollinearity. It also lessens the problem of interpretability, because these principal components represent smaller regions, possibly short haplotypes.
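The three DRLPC ingredients named above can be sketched with NumPy: correlation-based clustering, one local principal component per cluster, and iterative removal of variables with high VIF. Important assumptions: a simple greedy seed-based grouping stands in for the clique-partitioning step, and the thresholds (|r| >= 0.8, VIF >= 10) are illustrative. The input is assumed to be a samples x SNPs matrix of polymorphic genotypes. VIFs are read off the diagonal of the inverse correlation matrix, which equals 1 / (1 - R²_j) for each variable.

```python
import numpy as np

def greedy_clusters(X, r_cut=0.8):
    """Group SNP columns whose |r| with a cluster seed is >= r_cut.

    A simple greedy stand-in for clique partitioning (assumption).
    """
    r = np.abs(np.corrcoef(X, rowvar=False))
    unassigned = list(range(X.shape[1]))
    clusters = []
    while unassigned:
        seed = unassigned.pop(0)
        members = [seed] + [j for j in unassigned if r[seed, j] >= r_cut]
        unassigned = [j for j in unassigned if j not in members]
        clusters.append(members)
    return clusters

def local_pc1(X_cluster):
    """First principal component scores of a cluster's centred genotypes."""
    Xc = X_cluster - X_cluster.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[0]

def vif(Z):
    """Variance inflation factors: diagonal of the inverse correlation matrix."""
    return np.diag(np.linalg.inv(np.corrcoef(Z, rowvar=False)))

def drlpc_sketch(X, r_cut=0.8, vif_cut=10.0):
    """Return reduced variables Z (one column per kept cluster) and the kept clusters."""
    clusters = greedy_clusters(X, r_cut)
    Z = np.column_stack([X[:, c[0]] if len(c) == 1 else local_pc1(X[:, c]) for c in clusters])
    # Drop the variable with the largest VIF until every VIF is below vif_cut.
    while Z.shape[1] > 1:
        v = vif(Z)
        worst = int(np.argmax(v))
        if v[worst] < vif_cut:
            break
        Z = np.delete(Z, worst, axis=1)
        clusters.pop(worst)
    return Z, clusters
```

Keeping a single SNP as-is when its cluster has one member preserves interpretability at the SNP level, in line with the abstract's point that local components map to small regions.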
106
General
Meta-analysis in exhausted T cells from Homo sapiens and Mus musculus provides novel targets for immunotherapy
Lin Zhang1, Yicheng Guo2, Hafumi Nishi1
1Tohoku University Graduate School of Information Sciences, 2Columbia University, Department of Systems Biology
Lin Zhang
Immune checkpoint inhibitors, antibodies that target immune checkpoints to reverse T cell exhaustion, are a promising approach for cancer immunotherapy. However, therapeutic efficacy is still low for inhibitors of known immune checkpoints, such as PD1 and CTLA4. T cell exhaustion is a state of T cell dysfunction during chronic infections and cancers. It exhibits several characteristic features, such as the hierarchical loss of effector functions, impaired memory T cell potential, and sustained upregulation and co-expression of multiple inhibitory receptors. The mechanisms and pathways of T cell exhaustion remain to be fully described. In this study, we performed a meta-analysis of 7 datasets from both humans and mice to uncover the molecular mechanism of T cell dysfunction. Gene set enrichment analysis showed that the predefined exhaustion gene sets were significantly enriched in the exhausted T cells. Differential expression analyses showed an overlap of 21 upregulated and 37 downregulated genes shared by exhausted T cells in humans and mice. These genes were significantly enriched in exhaustion response-related pathways, such as signal transduction, immune system process, and regulation of cytokine production. In addition, co-expression analysis identified 175 genes that were highly correlated with the exhaustion trait in humans and mice. Overall, our study revealed that TOX and CD200R1 might be considered potential and highly effective targets for immunotherapy.
107
INTRINSICALLY DISORDERED PROTEINS (IDPS) AND THEIR FUNCTIONS
POSTER PRESENTATIONS
108
Intrinsically Disordered Proteins (IDPs) and Their Functions
Disordered Function Conjunction: On the in-silico function annotation of intrinsically disordered regions
Sina Ghadermarzi, Akila Katuwawala, Christopher J. Oldfield, Amita Barik, Lukasz Kurgan
Virginia Commonwealth University
Sina Ghadermarzi
Intrinsically disordered regions (IDRs) lack a stable structure, yet perform biological functions. The functions of IDRs include mediating interactions with other molecules, such as proteins, DNA, or RNA, and entropic functions, such as domain linkers. Computational predictors provide residue-level indications of function for disordered proteins. This contrasts with the need to functionally annotate the thousands of experimentally and computationally discovered IDRs at the region level. In this work, we investigate the feasibility of using residue-level prediction methods for region-level function prediction. For an initial examination of the multi-function region-level prediction problem, we constructed a dataset of (likely) single-function IDRs in proteins that are dissimilar to the training datasets of the residue-level function predictors. We find that available residue-level prediction methods are only modestly useful in predicting multiple region-level functions. Classification is enhanced by the simultaneous use of multiple residue-level function predictions and is further improved by including amino acid content extracted from the protein sequence. We conclude that multi-function prediction for IDRs is feasible and benefits from the results produced by current residue-level function predictors. However, it has to accommodate inaccuracy in functional annotations.
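One way to turn residue-level predictions into region-level inputs for a classifier is to aggregate each predictor's per-residue scores over the region and append amino acid content. The sketch below illustrates this. The aggregation choices (mean and max), the predictor names, and computing composition over the region rather than the whole protein are all assumptions. The abstract does not specify them.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def region_features(sequence, start, end, residue_scores):
    """Region-level feature dict for the IDR sequence[start:end].

    residue_scores: dict predictor name -> per-residue propensities for the whole protein
    (hypothetical inputs, e.g. protein-binding or DNA-binding predictions).
    """
    region = sequence[start:end]
    feats = {}
    for name, scores in sorted(residue_scores.items()):
        window = scores[start:end]
        feats[f"{name}_mean"] = sum(window) / len(window)  # average propensity over the region
        feats[f"{name}_max"] = max(window)                 # strongest residue-level signal
    for aa in AMINO_ACIDS:  # amino acid content of the region
        feats[f"frac_{aa}"] = region.count(aa) / len(region)
    return feats
```

Each predictor contributes two features and the composition adds twenty, so the vector length grows with the number of residue-level predictors combined. This matches the abstract's observation that using several predictions together helps.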
109
MUTATIONAL SIGNATURES
POSTER PRESENTATIONS
110
Mutational Signatures
Transcription-associated regional mutation rates and signatures in regulatory elements across 2,500 whole cancer genomes
Jüri Reimand
Ontario Institute for Cancer Research, University of Toronto
Jüri Reimand
The genomes of healthy and cancerous cells accumulate somatic mutations over time, with complex variation across tissues and genomic contexts. Certain classes of functional elements of the genome are subject to differential mutation rates due to regionalized activities of mutational processes. To investigate regional mutations, we developed RM4RM, a statistical framework for detecting differential mutation rates and trinucleotide signatures in sets of genomic regulatory elements. To validate our model, we first analyzed CTCF binding sites across >2,500 whole cancer genomes of 39 cancer types in the ICGC-TCGA PCAWG cohort. We found significant mutation enrichments in CTCF sites in liver, esophageal, breast and other cancer types. These enrichments were primarily driven by T>C/G mutations and by multiple rare mutation signatures of unknown etiology. Transcription start sites (TSSs) of protein-coding genes and a broader set of experimentally defined regulatory elements derived from primary tumors of the TCGA project also showed significantly elevated regional mutation rates in multiple cancer types. TSS-specific regional mutation enrichment was particularly dominant in highly transcribed genes of matching tumors, while none was apparent in silenced genes. In contrast, no dependency of mutation enrichment on transcript abundance was observed in distal regulatory elements. These data indicate a transcription initiation-coupled mutational process active in multiple cancer types, supported by multiple mutational processes and trinucleotide signatures specifically enriched in highly transcribed TSSs. Our findings and statistical model enable detailed studies of the mechanisms of somatic mutagenesis and advance our understanding of the genetic drivers of disease.
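Trinucleotide signatures like those tested above are usually built on the standard 96-channel single-base-substitution classification. In this scheme, each mutation is written with a pyrimidine (C or T) reference base, together with its 5' and 3' neighbours. The sketch below shows only that conventional encoding, using the reverse complement when the reference is a purine. It is not the RM4RM model, which additionally tests for differential rates between element sets and their flanks.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def sbs96_channel(left, ref, alt, right):
    """Pyrimidine-normalized trinucleotide channel such as 'A[C>T]G'."""
    if ref in "AG":  # purine reference: describe the mutation on the opposite strand
        left, ref, alt, right = (right.translate(COMPLEMENT), ref.translate(COMPLEMENT),
                                 alt.translate(COMPLEMENT), left.translate(COMPLEMENT))
    return f"{left}[{ref}>{alt}]{right}"

def channel_counts(mutations):
    """Count channels for (left, ref, alt, right) tuples falling in one set of elements."""
    return Counter(sbs96_channel(*m) for m in mutations)
```

Normalizing to the pyrimidine strand collapses the 192 strand-specific combinations into 96 channels (6 substitution types x 16 flanking contexts). For example, T>C/G mutations correspond to the 32 channels with a T reference base.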
111
Complex mosaic structural variations in human fetal brains
Shobana Sekar1, Livia Tomasini2, Maria Kalyva3, Taejeong Bae1, Logan Manlove1, Bo Zhou4, Jessica Mariani2, Fritz Sedlazeck5, Alexander E. Urban4, Christos Proukakis3, Flora M. Vaccarino2, Alexej Abyzov1
1Mayo Clinic, 2Yale University, 3University College London, 4Stanford University, 5Baylor College of Medicine
Alexej Abyzov
Somatic mosaicism in cells of the human brain is common and may have functional consequences that lead to diseases, including neurological ones. Mosaic variations in the brain can be point mutations, insertions of mobile elements, and structural changes. Previously, we detected and described 200-400 mosaic point mutations per single-cell clone from cortices of three human fetuses (15 to 21 weeks post conception). Here we describe four mosaic structural variations (SVs) in the same brains. The SVs were of kilobase scale and complex, i.e., consisting of deletion(s) and a few rearranged genomic fragments that sometimes originated from different chromosomes. Sequences at the breakpoints of the rearrangements had microhomologies, suggesting their origin from replication errors. One SV was found in two clones, and we timed its origin to ~14 weeks post conception. Our study reveals the existence of mosaic SVs, likely arising from cell proliferation, in the human brain in mid-neurogenesis.
112
PATTERN RECOGNITION IN BIOMEDICAL DATA: CHALLENGES IN PUTTING BIG DATA TO WORK
POSTER PRESENTATIONS
113
Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work
Stratification of kidney transplant recipients based on temporal disease trajectories
Isabella Friis Jørgensen PhD1, Søren Schwartz Sørensen PhD2, Søren Brunak PhD1
1Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Blegdamsvej 3B, DK-2200 Copenhagen N, Denmark; 2Department of Nephrology, Rigshospitalet, Copenhagen University Hospital, Blegdamsvej 9, DK-2100 Copenhagen Ø, Denmark
Isabella Friis Jørgensen
Organ transplantations often improve the lives of chronically ill patients. However, the immunosuppressive medication given to transplant recipients increases the risk of complications, especially infections and infection-related death. One in five kidney transplant recipients die from infection. We aim to stratify kidney transplant recipients into groups of patients with different patterns of infectious diseases and mortality, in order to predict which patients have a higher risk of specific infections. We use the Danish National Patient Registry (DNPR), which contains hospital diagnoses for 6.9 million patients from the entire Danish population from 1994 to 2018. We use a previously published method to identify significant time-dependent disease trajectories for all patients with a kidney transplantation. Subsequently, we use hierarchical clustering of Jaccard distances between the disease trajectories to find distinct groups of trajectories among kidney transplant recipients. In the DNPR, we identified 5,644 patients with a kidney transplantation, resulting in 43 significant disease trajectories that consist of three consecutive diseases, including several infection-related diagnoses. More than 87% of the kidney transplant recipients follow at least one of these trajectories; that is, they are diagnosed with the three diseases in the order the trajectory specifies. Clustering reveals two main groups of temporal disease trajectories. We identify patients following the two groups of disease trajectories and discover significant differences in mortality after kidney transplantation between patients following different disease trajectories. This study used previous disease history from large-scale hospital diagnoses to stratify common, temporal disease trajectories into two distinct groups. Significant differences in mortality are seen depending on the type of trajectory that kidney transplant recipients follow. These methods can be used to inform clinicians about higher risks of certain infections and mortality in certain groups of kidney transplant recipients.
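The clustering step can be sketched as follows. The abstract does not state which sets the Jaccard distance compares; here we assume, hypothetically, that each trajectory is represented by the set of patients following it, and we use naive average-linkage agglomerative clustering (the linkage criterion is also our assumption). Trajectory names and patient IDs are invented.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |A & B| / |A | B| for two sets (0.0 when both are empty)."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def average_linkage(groups, n_clusters):
    """Agglomerative clustering with average linkage on Jaccard distances.

    groups: dict mapping a trajectory name to the set of patient IDs following
        it (assumed representation).
    Returns a list of clusters, each a sorted list of trajectory names.
    """
    names = list(groups)
    dist = {}
    for x, y in combinations(names, 2):
        dist[x, y] = dist[y, x] = jaccard_distance(groups[x], groups[y])
    clusters = [[name] for name in names]
    while len(clusters) > n_clusters:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            pairs = [(x, y) for x in clusters[i] for y in clusters[j]]
            d = sum(dist[p] for p in pairs) / len(pairs)
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        merged = clusters.pop(j)  # j > i, so index i is unaffected
        clusters[i].extend(merged)
    return [sorted(c) for c in clusters]


# Hypothetical trajectories and the patients following them.
trajectories = {
    "T1": {1, 2, 3, 4},
    "T2": {1, 2, 3, 5},
    "T3": {7, 8, 9},
    "T4": {7, 8, 10},
}
print(average_linkage(trajectories, 2))  # [['T1', 'T2'], ['T3', 'T4']]
```

The quadratic loop is fine for a few dozen trajectories such as the 43 reported here; larger inputs would call for a library implementation.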
114
Modeling Gene Expression Levels from Epigenetic Markers Using a Dynamical Systems Approach
James Brunner1, Jacob Kim2, Kord M. Kober3
1Mayo Clinic, Rochester, MN; 2Columbia University, New York, NY; 3University of California, San Francisco, CA
Kord Kober
Gene regulation is a fundamental biological process that involves a number of complex mechanisms essential for development and adaptation to the environment. Understanding the role of epigenetic changes in gene expression is a fundamental question of molecular biology. Predicting gene expression from epigenetic data is an active area of research, and previous studies have used statistical approaches to build prediction models. Dynamical systems can be used to generate a model that predicts gene expression from epigenetic data and a gene regulatory network (GRN). By dynamically simulating hypothesized mechanisms of transcriptional regulation, we provide predictions based directly on these biological hypotheses. Furthermore, a stochastic dynamical system provides a distribution of gene expression estimates, representing the possibilities that may occur within the cell. The purpose of this study is to develop and evaluate a stochastic dynamical systems model predicting gene expression levels from epigenetic data for a given GRN. We model gene regulation using a piecewise-deterministic Markov process (PDMP) in which transcription factor (TF) binding is a Boolean random variable representing the bound/unbound state of a binding-site region of DNA. TF binding is given as the difference of two Poisson jump processes (i.e., binding and unbinding), so that the time between binding and unbinding events is exponentially distributed, with propensities taken to be linear functions of the available TF. Epigenetic modification of the TF binding site impacts the binding propensity of the TF and is measured as the percentage of methylated bases (i.e., beta). We use a linear ordinary differential equation based on the underlying GRN to determine the value of the transcript between TF binding or unbinding events. We include baseline transcription and decay and are able to solve exactly between jumps of binding/unbinding events. In a discrete-space, continuous-time Markov process, the equilibrium distribution can be estimated by sampling from a realization of the process. For our continuous-space PDMP, we can estimate the equilibrium distribution in a similar manner using kernel density estimation with a Gaussian kernel. We estimate the marginal distributions of various gene variables with a 1-dimensional kernel. We use a GRN assumed to be known to create a model of gene regulation that includes TF binding dynamics. We associate binding sites with the genes that they regulate and use these associations to create a bipartite graph. The GRN and training/testing data are created from publicly available data. The epigenetic parameter is assumed to be measurable. The remaining parameters are estimated using a negative log-likelihood minimization procedure. We can compute a log-likelihood for a set of paired epigenetic and transcription samples by time-averaging a sample path against a Gaussian kernel. We report on the design and evaluation of the model's performance.
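A single-gene, single-binding-site version of the PDMP described above can be simulated exactly, because the linear ODE has a closed-form solution between jumps. The sketch makes simplifications that the abstract does not specify and that are purely illustrative: one TF at fixed abundance `tf`, a binding propensity `k_on * tf * (1 - beta)` that falls linearly with methylation, a constant unbinding propensity `k_off`, and transcript dynamics dx/dt = a0 + a1 * bound - decay * x. The equilibrium density is estimated by time-averaging a Gaussian kernel over a regularly sampled path, and the resulting log-likelihood of observed expression values is what a fitting procedure would minimize (negated).

```python
import math
import random

# Toy model, assumed for illustration: one TF at fixed abundance `tf`, binding
# propensity k_on * tf * (1 - beta), unbinding propensity k_off, and
# dx/dt = a0 + a1 * bound - decay * x between jumps.


def simulate_pdmp(k_on, k_off, beta, tf, a0, a1, decay, t_end, dt, seed=0):
    """Simulate the PDMP exactly; return transcript levels on a grid of step dt."""
    rng = random.Random(seed)
    bound, x, t = 0, a0 / decay, 0.0
    path, i = [], 0
    while t < t_end:
        rate = k_on * tf * (1.0 - beta) if bound == 0 else k_off
        wait = rng.expovariate(rate) if rate > 0 else math.inf
        t_next = min(t + wait, t_end)
        x_ss = (a0 + a1 * bound) / decay  # fixed point for the current binding state
        while i * dt < t_next:  # exact ODE solution at grid points before the jump
            path.append(x_ss + (x - x_ss) * math.exp(-decay * (i * dt - t)))
            i += 1
        x = x_ss + (x - x_ss) * math.exp(-decay * (t_next - t))
        t = t_next
        bound = 1 - bound  # binding <-> unbinding event
    return path


def kde_log_likelihood(path, observations, bandwidth):
    """Sum of log equilibrium densities from a Gaussian KDE over the sampled path."""
    norm = 1.0 / (len(path) * bandwidth * math.sqrt(2.0 * math.pi))
    total = 0.0
    for y in observations:
        density = norm * sum(math.exp(-0.5 * ((y - x) / bandwidth) ** 2) for x in path)
        total += math.log(max(density, 1e-300))  # guard against log(0)
    return total


# Hypothetical parameters: low vs. high methylation of the binding site.
params = dict(k_on=1.0, k_off=1.0, tf=1.0, a0=1.0, a1=9.0, decay=1.0, t_end=500.0, dt=0.1)
low_meth = simulate_pdmp(beta=0.0, **params)
high_meth = simulate_pdmp(beta=0.9, **params)
observed = [8.0, 9.0, 7.5]
print(sum(low_meth) / len(low_meth), sum(high_meth) / len(high_meth))
print(kde_log_likelihood(low_meth, observed, 0.5), kde_log_likelihood(high_meth, observed, 0.5))
```

High observed expression is more likely under the low-methylation path, which is the kind of comparison the likelihood-based parameter estimation relies on.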
115
Translating Big Data neuroimaging findings into measurements of individual vulnerability
Peter Kochunov1, Paul Thompson2, Neda Jahanshad2, Elliot Hong1
1University of Maryland School of Medicine, Maryland, USA; 2University of Southern California, California, USA
Peter Kochunov
We propose an intuitive, anatomically informed approach to derive an index of similarity between individual brain patterns and the expected patterns of neuropsychiatric disorders based on Big Data neuroimaging studies. Big Data neuroimaging studies, such as those performed by the Enhancing NeuroImaging Genetics through Meta-Analysis (ENIGMA) consortium, have provided the scientific community with the regional patterns of effect sizes in common neuropsychiatric disorders such as schizophrenia (SZ), bipolar and major depressive disorders (BP and MDD), epilepsy (EP), Alzheimer's dementia (AD), mild cognitive impairment (MCI) and others. These patterns describe regional deficits using standardized sMRI, dMRI and rsfMRI workflows. They are derived from statistically powerful and inclusive samples and are highly reproducible (r = 0.8-0.9) in independent samples. We developed the "Regional Vulnerability Index" (RVI) to measure the similarity between an individual and the expected pattern of patient-control differences. RVI can be calculated for a single imaging modality or across imaging modalities. A single-modality RVI, for example one using the fractional anisotropy (FA) measure from dMRI, is calculated as follows. FA for each of the 23 major white matter regions, as defined by the ENIGMA atlas, is converted to z-values for each individual by (A) calculating the residual values after regressing out age and sex effects for this region, (B) subtracting the average value for the region, and (C) dividing by the standard deviation calculated from the healthy controls. This produces a vector of 23 z-values (one per region) for each individual in the sample. RVI is calculated as the correlation coefficient between the 23 region-wise z-values for the subject and the patient-control effect sizes in ENIGMA. RVI takes values from 1 (individual pattern is aligned with the disorder pattern) to -1 (individual pattern is in anti-alignment). For cross-modality research, RVI can be expanded hierarchically by building a combined vector that includes multiple phenotypes. For example, the RVI-White Matter calculation uses a vector of 69 values per person that combines tract-wise FA, radial (RaD) and axial (AxD) diffusivity values. To merge effect sizes across diverse domains, we use a pseudo-ordinary transformation that maps effect sizes between 0 and 1 while preserving the relative distance between them. We first demonstrated that RVI-SZ values are significantly elevated in patients with SZ and are also predictive of treatment resistance. That is, subjects who developed resistance to modern antipsychotic medications had significantly higher RVI-SZ values than those who responded to treatment. We next demonstrated that RVI for SZ was significantly correlated with RVI for AD but not MCI, due to significant overlap in deficit patterns between these disorders. We then showed that calculating RVI across multiple modalities produces vulnerability measures that are more sensitive to patient-control differences in independent datasets and show stronger sensitivity to cognitive deficits and negative symptoms. The RVI calculator tools are distributed with the solar-eclipse software (www.solar-eclipse-genetics.org).
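The single-modality RVI recipe above (steps A to C, then a correlation) translates directly into code. This is a minimal sketch, not the solar-eclipse implementation. It assumes that step A is ordinary least squares on intercept, age and sex over all subjects, that both the mean (B) and the standard deviation (C) come from healthy controls, and that the correlation is Pearson's. All data below are synthetic, with 6 regions instead of 23 for brevity.

```python
import random
import statistics


def _solve(a, b):
    """Solve the small linear system a @ x = b (Gauss-Jordan, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def residualize(y, age, sex):
    """Step A (assumed OLS): residuals of y after fitting intercept + age + sex."""
    X = [(1.0, a, s) for a, s in zip(age, sex)]
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * v for r, v in zip(X, y)) for i in range(3)]
    coef = _solve(xtx, xty)
    return [v - sum(c * xi for c, xi in zip(coef, r)) for r, v in zip(X, y)]


def pearson(u, v):
    mu, mv = statistics.fmean(u), statistics.fmean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return cov / (sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v)) ** 0.5


def rvi(values, age, sex, is_control, effect_sizes):
    """values[s][r]: regional measure (e.g. FA) of subject s in region r."""
    n_sub, n_reg = len(values), len(effect_sizes)
    z = [[0.0] * n_reg for _ in range(n_sub)]
    for r in range(n_reg):
        res = residualize([values[s][r] for s in range(n_sub)], age, sex)
        ctrl = [res[s] for s in range(n_sub) if is_control[s]]
        mu, sd = statistics.fmean(ctrl), statistics.stdev(ctrl)  # steps B and C
        for s in range(n_sub):
            z[s][r] = (res[s] - mu) / sd
    return [pearson(z[s], effect_sizes) for s in range(n_sub)]


# Synthetic data: 20 controls, 20 patients shifted along hypothetical effect sizes.
rng = random.Random(1)
effects = [-0.8, -0.5, -0.3, -0.1, 0.0, 0.2]
age = [rng.uniform(20, 60) for _ in range(40)]
sex = [s % 2 for s in range(40)]
is_control = [s < 20 for s in range(40)]
values = [[0.5 - 0.001 * age[s] + 0.01 * sex[s]
           + (0.0 if is_control[s] else 0.1 * e) + rng.gauss(0, 0.02)
           for e in effects] for s in range(40)]
scores = rvi(values, age, sex, is_control, effects)
print(statistics.fmean(scores[20:]), statistics.fmean(scores[:20]))
```

Patients, whose regional deviations follow the effect-size pattern, receive RVI values near 1, while controls scatter around 0.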
116
Automating new-user cohort construction with indication embeddings
Rachel D. Melamed
Department of Computational Biomedicine and Biomedical Data, University of Chicago
Rachel Melamed
The electronic health record is a rising resource for quantifying medical practice and discovering adverse effects of drugs. One of the challenges of healthcare data is the high dimensionality of the health record: any study of patterns in health data must account for tens of thousands of potentially relevant diagnoses or treatments. In this work, we develop indication embeddings, a way to reduce the dimensionality of health data while capturing the information relevant to treatment decisions. We demonstrate that these embeddings recover therapeutic uses of drugs. We then use these embeddings as an informative representation of relationships between drugs, between health history events and drug prescriptions, and between patients at a particular time in their health history. We show the application of these embeddings in areas of current research. For drug safety studies, particularly retrospective cohort studies, our low-dimensional representation helps in finding comparator drugs and constructing comparator cohorts. This enables us to develop an automated approach to choosing comparator cohorts for a treated population.
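Once drugs have indication embeddings, candidate comparators for a drug of interest can be ranked by similarity in the embedding space. A minimal sketch, assuming cosine similarity as the measure (the abstract does not name one); the drug names and three-dimensional vectors are hypothetical placeholders, not learned embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def rank_comparators(target, embeddings, top_n=3):
    """Return up to top_n (drug, similarity) pairs closest to the target drug.

    Assumption: cosine similarity over indication embeddings.
    """
    query = embeddings[target]
    scored = [(drug, cosine(query, vec)) for drug, vec in embeddings.items() if drug != target]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


embeddings = {  # hypothetical indication embeddings
    "drug_a": [0.9, 0.1, 0.0],
    "drug_b": [0.8, 0.2, 0.1],
    "drug_c": [0.0, 0.1, 0.9],
    "drug_d": [0.1, 0.9, 0.2],
}
print(rank_comparators("drug_a", embeddings, top_n=2))
```

Drugs prescribed for similar indications end up close together, so the top-ranked candidates are natural comparators for a new-user cohort design.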
117
Reproducibility-optimized statistical testing for omics studies
Tomi Suomi, Laura Elo
Turku Bioscience Centre, University of Turku and Åbo Akademi University, Turku, Finland
Laura Elo
Differential expression analysis is one of the most common types of analyses performed on various biological and biomedical data, including, e.g., RNA sequencing and mass spectrometry proteomics. It is the process that detects features, such as genes or proteins, showing statistically significant differences between the sample groups under comparison. However, as different test statistics perform well in different datasets, the choice of an appropriate test statistic has remained a major challenge. To address this challenge, our reproducibility-optimized test statistic (ROTS) optimizes the statistic on the basis of the data by maximizing the reproducibility of the top-ranked features through a bootstrap procedure. Finally, it provides a ranking of the features according to their statistical evidence for differential expression between the sample groups. We have shown the robust performance of ROTS in a range of studies from transcriptomics to proteomics, covering both bulk and single-cell measurements. ROTS is freely available as an R package in Bioconductor.
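A heavily simplified sketch of the ROTS principle: a statistic of the form |mean1 - mean2| / (alpha1 + alpha2 * SE) is tuned by choosing the alpha1 whose top-k feature list is most reproducible across pairs of bootstrap resamples. This is not the Bioconductor ROTS package; it fixes alpha2 = 1 and k, and omits further steps of the full method such as comparing reproducibility against a null (permutation) reference. The data layout (features by samples), parameter grid and data are assumptions.

```python
import random
import statistics


def rots_stat(g1, g2, alpha1, alpha2=1.0):
    """|mean1 - mean2| / (alpha1 + alpha2 * SE) for one feature (alpha1 > 0)."""
    se = (statistics.variance(g1) / len(g1) + statistics.variance(g2) / len(g2)) ** 0.5
    return abs(statistics.fmean(g1) - statistics.fmean(g2)) / (alpha1 + alpha2 * se)


def top_k(group1, group2, alpha1, k):
    """Indices of the k features with the largest statistic."""
    scores = [rots_stat(a, b, alpha1) for a, b in zip(group1, group2)]
    return set(sorted(range(len(scores)), key=lambda i: -scores[i])[:k])


def resample(group, rng):
    """Bootstrap the samples (columns), using the same columns for every feature."""
    n = len(group[0])
    cols = [rng.randrange(n) for _ in range(n)]
    return [[row[c] for c in cols] for row in group]


def reproducibility(group1, group2, alpha1, k, n_boot, rng):
    """Mean overlap of top-k lists between pairs of bootstrap datasets."""
    overlaps = []
    for _ in range(n_boot):
        a = top_k(resample(group1, rng), resample(group2, rng), alpha1, k)
        b = top_k(resample(group1, rng), resample(group2, rng), alpha1, k)
        overlaps.append(len(a & b) / k)
    return statistics.fmean(overlaps)


def choose_alpha1(group1, group2, alphas, k, n_boot=20, seed=0):
    """Simplified: alpha2 fixed at 1, no permutation-based null reference."""
    rng = random.Random(seed)
    return max(alphas, key=lambda a: reproducibility(group1, group2, a, k, n_boot, rng))


# Synthetic data: 40 features x 6 samples per group; features 0-4 are shifted.
rng = random.Random(42)
group1 = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(40)]
group2 = [[rng.gauss(3 if f < 5 else 0, 1) for _ in range(6)] for f in range(40)]
best = choose_alpha1(group1, group2, alphas=[0.05, 0.5, 2.0], k=5)
print(best, sorted(top_k(group1, group2, best, 5)))
```

The key design idea is that the data themselves select the statistic: the parameter whose rankings survive resampling best is preferred.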
118
Data Integration Expectation Maps: Towards more informed 'omic data integration
Tia Tate1, Christian Richardson2, ClarLynda Williams-DeVane3
1University of North Carolina-Charlotte, 2Duke University, 3Fisk University
ClarLynda Williams-DeVane
Innovative data technologies and decreasing costs have expanded the scope of available data relating to various diseases. A vast amount of -omics data generated at diverse levels (DNA, RNA, protein, metabolite and epigenetic) has revealed relationships among various biological processes. Generally, these diverse data types are considered independently, while combinations of two or more data types are less explored. This narrow approach often fails to identify the intricate interactions responsible for the etiology of complex disease. Complete biological models of complex diseases are only likely to be discovered if the various levels of -omic mechanisms are considered from an integrative perspective. Integrative models often require the integration of biological, computational, mathematical, and statistical domains. However, a well-documented shortage of researchers with a command of multiple domains exists. Thus, we have proposed the use of Data Integration Expectation Maps (DIEMs) as visual tools that facilitate understanding how various -omic data types can be integrated to study complex diseases by filling in gaps in biological knowledge. DIEMs provide a user-friendly format for understanding integrative model development in complex diseases by 1) identifying data formats that can be and/or have been integrated, 2) providing guidance on the best method to integrate the data, and 3) providing an expectation of the biological insight to be gained from the integration.
119
PRECISION MEDICINE: ADDRESSING THE CHALLENGES OF SHARING, ANALYSIS, AND PRIVACY AT SCALE
POSTER PRESENTATIONS
120
Precision medicine: addressing the challenges of sharing, analysis, and privacy at scale
Integrated omics data mining of synergistic gene pairs for cancer precision medicine
Euna Jeong, Choa Park, Sukjoon Yoon
Sookmyung Women's University
Euna Jeong
Current high-throughput technologies enable simultaneous acquisition of multi-level omics and RNAi/chemical screening data in cancers. Production and integration of these data help identify associations of drug targets and synergistic biomarkers (mutations or gene expression), thus accelerating their clinical applications and patient stratification. We have extensively carried out cancer big data mining and phenotypic siRNA library screening to find the optimal combination of targets and biomarkers for advanced cancer therapies, such as regulating cancer stem-like cells (CSLCs) and oncogenic transcription factors. Our multiplexed screening dissects phenotypic responses into sensitivity and resistance to the target knockdown. Combined with mutatome and transcriptome data of the screened cell lines, targetome-wide knockdown data reveal the functional aspect of synergistic effects between target siRNAs and mutation/transcription signatures, leading to the discovery of novel synthetic lethal gene pairs. Production and integration of these data enabled us to identify target-biomarker combinations for accelerating their clinical applications and patient stratification.
121
The power of dynamic social networks to predict individuals' mental health
Shikang Liu1, David Hachen1, Omar Lizardo2, Christian Poellabauer1, Aaron Striegel1, Tijana Milenkovic1
1University of Notre Dame, 2University of California at Los Angeles
Shikang Liu
Precision medicine has received attention both in and outside the clinic. We focus on the latter, by exploiting the relationship between individuals' social interactions and their mental health to predict one's likelihood of being depressed or anxious from rich dynamic social network data. Existing studies differ from our work in at least one aspect: they do not model social interaction data as a network; they do so but analyze static network data; they examine "correlation" between social networks and health but without making any predictions; or they study other individual traits but not mental health. In a comprehensive evaluation, we show that our predictive model that uses dynamic social network data is superior to its static network as well as non-network equivalents when run on the same data.
122
Robust-ODAL: Learning from heterogeneous health systems without sharing patient-level data
Jiayi Tong1, Rui Duan1, Ruowang Li1, Martijn J. Schuemie2, Jason H. Moore1, Yong Chen1
1University of Pennsylvania, 2Janssen Research and Development LLC
Jiayi Tong
Electronic health records (EHR) contain extensive patient data on various health outcomes and risk predictors, providing an efficient and wide-reaching source for health research. Integrated EHR data can provide a larger sample size of the population to improve estimation and prediction accuracy. To overcome the obstacle of sharing patient-level data, distributed algorithms were developed to conduct statistical analyses across multiple clinical sites by sharing only aggregated information. However, the heterogeneity of data across sites is often ignored by existing distributed algorithms, which leads to substantial bias when studying the association between outcomes and exposures. In this study, we propose a privacy-preserving and communication-efficient distributed algorithm that accounts for the heterogeneity caused by a small number of the clinical sites. We evaluated our algorithm through a systematic simulation study motivated by real-world scenarios and applied our algorithm to multiple claims datasets from the Observational Health Data Sciences and Informatics (OHDSI) network. The results showed that the proposed method performed better than the existing distributed algorithm ODAL and a meta-analysis method.
123
PharmGKB: Automated Literature Annotations
Michelle Whirl-Carrillo1, Li Gong1, Rachel Huddart1, Katrin Sangkuhl1, Ryan Whaley1, Mark Woon1, Julia M. Barbarino2, Jake Lever3, Russ B. Altman4, Teri E. Klein5
1Department of Biomedical Data Science, Stanford University; 2Formerly Department of Biomedical Data Science, Stanford University; 3Department of Bioengineering, Stanford University; 4Departments of Bioengineering, Medicine and Genetics, Stanford University; 5Departments of Biomedical Data Science and Medicine, Stanford University
Michelle Whirl-Carrillo
PharmGKB is the largest publicly available resource for pharmacogenomics (PGx) discovery and implementation. Its mission is to collect, curate, integrate and disseminate knowledge about how human genetic variation influences drug response. PharmGKB scientists manually curate the primary literature to capture details of published pharmacogenomic studies, such as variant-gene-drug-phenotype associations, statistical significance, study size and population characteristics. PharmGKB refers to these manually created annotations as “Variant Annotations.”
124
PACKAGING BIOCOMPUTING SOFTWARE TO MAXIMIZE DISTRIBUTION AND REUSE
WORKSHOP POSTER PRESENTATIONS
125
Workshop: Packaging Biocomputing Software to Maximize Distribution and Reuse
Apollo provides Collaborative Genome Annotation Editing with the power of JBrowse
Nathan Dunn1, Colin Diesh2, Robert Buels2, Helena Rasche3, Anthony Bretaudeau4, Nomi Harris1, Ian Holmes2
1Lawrence Berkeley National Lab, 2UC Berkeley, 3University of Freiburg, 4INRA
Nathan Dunn
Genome annotation projects involve multi-step workflows that are largely automated. However, even with a fully automated annotation pipeline, visual inspection and refinement of diverse types of information, such as genomic and transcriptome alignments and predictive models based on sequence elements, are critical to assure and improve the accuracy of the genome annotations prior to publication. To this end, Apollo (https://github.com/GMOD/Apollo/) is a web application that provides responsive and customizable visualization and editing of genomic elements. Built on top of the JBrowse genome browser (http://jbrowse.org/) and its large registry of plugins (https://gmod.github.io/jbrowse-registry/), Apollo supports efficient annotation curation through drag-and-drop editing, a large suite of automated structural edit operations, the ability to pre-define curator comments and annotation status to maintain consistency, attribution of annotation authors, fine-grained user and group access and edit permissions, and a visual history of revertible annotation edits. Setting up a new genome annotation in Apollo is straightforward. Apollo can be run from Docker or from provided AWS instances, and genomes with feature evidence can be retrieved from an existing JBrowse directory. We have also recently enabled researchers to upload their genome sequence and features (in FASTA, VCF, BAM, or GFF3 format) directly to Apollo, minimizing the need for scripting or server access. It is also possible to create annotations on the fly from BLAT or BLAST search results, which provides a way to initiate a gene previously annotated in a closely related species. Apollo provides a Python library that wraps the web services (https://github.com/galaxy-genome-annotation/python-apollo) so that workflow environments such as Galaxy can be automated, allowing the output of an automated workflow to directly create genome projects, provide evidence, and manage access to an Apollo instance. Apollo supports several popular formats for data export. Structural genome annotations can be exported as FASTA, GFF3, or VCF (if annotating variants) along with any associated metadata. Functional annotations mapped to Gene Ontology terms can be exported in GPAD2 or GPI2 format. Apollo is an open-source tool used in over one hundred genome annotation projects around the world, ranging from the annotation of a single species to lineage-specific efforts supporting the annotation of dozens of genomes. https://github.com/GMOD/Apollo/ https://genomearchitect.readthedocs.io/
126
g:Profiler - One functional enrichment analysis tool, many interfaces serving life science communities
Liis Kolberg, Uku Raudvere, Ivan Kuzmin, Jaak Vilo, Hedi Peterson
University of Tartu
Making sense of gene lists plays an important role in the majority of biological and biomedical experiments. There are several methods and tools that help scientists carry the computational load of these tasks. One such tool is g:Profiler (https://biit.cs.ut.ee/gprofiler), a widely used toolset for functional interpretation and conversion of gene lists from hundreds of species. g:Profiler has served the community since 2007 and continues to provide life scientists with the most up-to-date data and methods to this day. Keeping the service trustworthy and the results reproducible and transparent has been the main goal of the team developing g:Profiler. Success in this respect is indicated by the increasing number of user requests per year, which in 2019 alone is already close to 9 million queries. These millions of queries originating across the world reflect the diversity of usage preferences, skill sets and research goals of the scientific community. We, as the developers of g:Profiler, have taken this into account by developing and supporting different access options, which, in hindsight, has been a huge factor in the increasing user traffic. On the one hand, the g:Profiler web application provides researchers who want quick and easily interpretable results with nice visualizations, searchable tables and data export possibilities. On the other hand, there is a large bioinformatics community whose members prefer to analyze gene lists in an automated manner. We support them by offering standardized access through public APIs. And, as R and Python are the most popular programming languages among life scientists with informatics expertise, we have simplified the usage of the APIs by wrapping them into corresponding packages named gprofiler2 and gprofiler-official, respectively.
For users somewhere in between, g:Profiler is also available from the Galaxy platform, a popular framework for running data-intensive biomedical research pipelines in a graphical user interface. It is clear that tools in such an interdisciplinary field need to be flexible in order to fully benefit the research community. However, in our experience, the complexity of providing a widely distributed toolset lies in the maintenance of the services rather than in their development, and this is the core reason for the deprecation of tools. In g:Profiler, the separate interfaces all use the data and methods from a shared hub, making them reliable and consistent with each other even after the frequent data updates. We are confident that g:Profiler has been able to help thousands of researchers across the life science community because our priorities have been to reuse high-quality and regularly updated data, and to maximize the access options so that we would not leave any life science subcommunity behind.
127
Increasing usability and dissemination of the PathFX algorithm using web applications and Docker systems
Jennifer Wilson1, Nicholas Stepanov2, Ajinkya Chalke2, Mike Wong3, Dragutin Petkovic2, Russ B. Altman4
1Department of Chemical & Systems Biology at Stanford University; 2Computer Science Dept at San Francisco State University; 3COSE Computing for Life Sciences at San Francisco State University; 4Helix Group at Stanford University
Mike Wong
Limited efficacy and unacceptable safety confound therapeutic development. Identifying potential liabilities earlier in drug development could significantly improve success rates. Recently, in collaboration with the US FDA, we developed the PathFX algorithm and the openly available PathFX web application for better understanding pathway-level safety and efficacy phenotypes associated with a drug's target(s). Running the PathFX algorithm locally would enable improved efficiency, security, and privacy; however, installation of PathFX and its dependencies is challenging for non-computational scientists and prevents dissemination. In addition, while PathFX-web quickly analyzes network associations, the phenotype clustering feature has high computational costs that limit the efficiency of the shared cloud server. To resolve these challenges, we developed the PathFX-web Docker container, which provides an easy-to-install, easy-to-use web interface and a standalone command-line formulation of PathFX, adds security/privacy, and allows users to leverage the computational power of their own hardware.
128
TRANSLATIONAL BIOINFORMATICS WORKSHOP: BIOBANKS IN THE PRECISION MEDICINE ERA
WORKSHOP POSTER PRESENTATIONS
129
Workshop: TBI workshop
Identification of biomarkers related to autism spectrum disorder using genomic information
Leena Sait, Martha Gizaw, Iosif Vaisman
School of Systems Biology, George Mason University
Leena Sait
Autism spectrum disorder (ASD) is one of the most common neurodevelopmental disorders. Worldwide, ASD tends to have a prevalence of one per 132 persons, with an estimated prevalence of 1 in 59 children according to CDC's Autism and Developmental Disabilities Monitoring Network. To date, no effective medical treatments for the core symptoms of ASD exist. However, biomarkers capable of detecting and diagnosing ASD can help to translate experimental research results into bedside clinical practice. Biomarker discovery in ASD is complicated by the diversity of core symptoms, which comprise deficits in social communication, the presence of rigid, repetitive and stereotypical behaviors, and comorbid medical (e.g., epilepsy) or psychiatric symptoms. The EU-AIMS Longitudinal European Autism Project (LEAP), the largest such consortium, has made great advances in the discovery of biomarkers for ASD. It seeks to identify stratification biomarkers using neurobiological or neurocognitive measures, neuroimaging, electrophysiology, biochemistry and genetics. This work is aimed at the identification of single nucleotide polymorphisms (SNPs) based on SNP genotyping of genomic DNA in a large cohort of ASD patients and unaffected related individuals, to help understand the exact genetic causes of ASD. We hypothesized that ranking the genes based on distance in the space of allele frequencies between affected and unaffected populations can be used to identify new putative biomarkers. The dataset retrieved from the Gene Expression Omnibus database (GSE6754) contains more than 6,000 samples from 1,400 families. Our results show that the SNPs that are highly ranked by the distance in three-dimensional genotype count space between all the affected and unaffected subjects in the cohort are more likely to be linked to ASD. These results can open new possibilities for further investigation in identifying the genetic mechanisms of ASD.
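The ranking idea above can be sketched as follows. As an illustrative assumption, each SNP's three genotype counts (AA, AB, BB) are converted to frequencies within each group before taking the Euclidean distance, so that groups of different size remain comparable; the abstract speaks of genotype count space and does not specify a normalization or metric. SNP IDs and genotype calls are made up.

```python
import math
from collections import Counter

GENOTYPES = ("AA", "AB", "BB")


def genotype_profile(calls):
    """Frequencies of the three genotypes among non-missing calls.

    Assumption: counts are normalized to frequencies within each group.
    """
    counts = Counter(calls)
    n = sum(counts[g] for g in GENOTYPES)
    return [counts[g] / n for g in GENOTYPES]


def rank_snps(affected, unaffected):
    """Rank SNPs by Euclidean distance between affected and unaffected profiles.

    affected, unaffected: dict mapping SNP ID to a list of genotype calls.
    Returns (snp_id, distance) pairs, largest distance first.
    """
    scores = {
        snp: math.dist(genotype_profile(affected[snp]), genotype_profile(unaffected[snp]))
        for snp in affected.keys() & unaffected.keys()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical genotype calls.
affected = {
    "rs_example_1": ["AA"] * 6 + ["AB"] * 3 + ["BB"],
    "rs_example_2": ["AA"] * 4 + ["AB"] * 4 + ["BB"] * 2,
}
unaffected = {
    "rs_example_1": ["AA"] * 2 + ["AB"] * 3 + ["BB"] * 5,
    "rs_example_2": ["AA"] * 8 + ["AB"] * 8 + ["BB"] * 4,
}
print(rank_snps(affected, unaffected))
```

Here the second SNP has identical genotype frequencies in both groups and therefore ranks last, with a distance of zero.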
130
A pan-cancer 3-gene signature to predict dormancy
Ivy Tran1, Anchal Sharma2, Subhajyoti De2
1Rutgers University-Camden, 2Rutgers Cancer Institute of New Jersey
Ivy Tran
Tumor dormancy is characterized by the dissemination of hibernating tumor cells that do not proliferate until years after the apparently successful removal of a patient's primary cancer, resulting in late relapse of the cancer. Distinguishing between the risk of early (≤8 months) and late (≥5 years) relapse in cancer patients is important for targeted treatment of the tumor. In this study, we identified 53 genes that were significantly up-regulated or down-regulated in dormant cells, from which three genes, CD300LG, OCIAD2 and VSIG4, were determined by recursive feature elimination to be the most important features in predicting tumor dormancy. Using this three-gene signature, we trained a Random Forest algorithm with cross-validation (10-fold, repeated 3 times) on a dataset (n=422) randomly split into training data (75%) and test data (25%), consisting of seven different tumor types: testicular cancer, breast cancer, glioblastoma multiforme, lung cancer, colorectal cancer, kidney cancer and melanoma. The tuned prediction model yielded 80.19% prediction accuracy using confusion matrix analysis, and an area under the ROC curve (AUC) of 82.74%. When the model was tested independently on a validation set (n=44) of liver cancer downloaded from ICGC, confusion matrix analysis yielded 67.44% accuracy and the ROC AUC was 60.48%. This 3-gene signature can be useful for predicting early or late relapse of cancer in patients in clinical practice.
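The evaluation protocol above (repeated 10-fold cross-validation, confusion-matrix accuracy and ROC AUC) can be sketched without any machine-learning library. The Random Forest itself is left out; labels (1 = late relapse, 0 = early relapse) and scores are hypothetical. AUC is computed with the rank (Mann-Whitney) formulation, which equals the area under the empirical ROC curve.

```python
import random


def confusion_matrix(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels coded 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def roc_auc(y_true, scores):
    """P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def repeated_kfold(n, k=10, repeats=3, seed=0):
    """Yield (train, test) index lists for k-fold CV repeated with fresh shuffles."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        for fold in range(k):
            test = idx[fold::k]
            held_out = set(test)
            yield [i for i in idx if i not in held_out], test


# Hypothetical labels (1 = late relapse) and classifier scores.
y_true = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
print(accuracy(y_true, y_pred), roc_auc(y_true, scores))  # 0.5 0.75
print(sum(1 for _ in repeated_kfold(422)))  # 30
```

The toy example shows why the two numbers differ: accuracy depends on one threshold, whereas AUC summarizes the ranking over all thresholds.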
131
AUTHOR INDEX
’
’t Jong, Geert · 85
A
Abyzov, Alexej · 111
Adkins, Joshua · 96
Agrawal, Monica · 3
Alavi, Ali · 99
Allen, Mary A. · 28
Alterovitz, Wei-Lun · 47
Althagafi, Azza · 70
Altman, Russ B. · 27, 34, 37, 123, 127
Anastopoulos, Ioannis N. · 23
Andrade-Navarro, Miguel A. · 21
Andrianova, Katia · 74
Arslanturk, Suzan · 31
Atwal, Gurnit · 50
B
Bae, Ho · 63
Bae, Taejeong · 111
Baladandayuthapani, Veerabhadran · 65
Baltrus, David · 96
Barash, Yoseph · 92
Barbarino, Julia M. · 34, 123
Barik, Amita · 10, 108
Barlaskar, Sabiha · 99
Barnard, Martha · 39
Beam, Andrew L. · 19
Bebek, Gurkan · 75
Belyeu, Jon · 83
Benchek, Penelope · 60
Berger, Howard · 85
Bhattacharyya, Rupam · 65
Blinder, Pablo · 22
Bobak, Carly A. · 20, 76, 93
Bock, Christoph · 91
Bourque, Guillaume · 25
Branch, Andrea · 7
Brand, Lodewijk · 2
Brannon, Charlotte · 77
Bretaudeau, Anthony · 125
Brinton, Connor · 27
Brodie, Sonia · 85
Brooks, Thomas G. · 79, 92
Brown, James · 85
Brown, Joe · 83
Brown, Yaadira · 80
Brunak, Søren · 113
Brunner, James · 114
Buels, Robert · 125
Bui, Nam · 4
Bull, Shelley B. · 105
Burkhardt, Sophie · 21
Bush, William S. · 60, 64
Bustamante, Carlos D. · 42
C
Cai, Chunhui · 6
Cai, DanDan · 99
Cai, Tianxi · 19
Calvanese, Vincenzo · 95
Candido, Elisa · 44
Capellera-Garcia, Sandra · 95
Carleton, Bruce C. · 78, 85
Carrillo, Katherine I. · 81
Ceri, Stefano · 49
Chalke, Ajinkya · 127
Chaudhry, Shahnaz · 85
Chen, Irene Y. · 3
Chen, Jessica W. · 4
Chen, Jianhan · 12
Chen, Jun · 70
Chen, Yang · 45
Chen, Yong · 38, 122
Cheong, HeeJin · 102
Cheong, Jae-Ho · 56
Cherng, Sarah T. · 53
Chia, Nicholas · 71
Chmura, Jacob · 50
Choi, Hyun-Soo · 63
Chrisman, Brianna · 68, 103
Christensen, Brock C. · 54
Christensen, Sarah · 15
Chu, Chong · 82
Cohen, William W. · 6
Coker, Beau · 52
Cooke Bailey, Jessica N. · 64
Cormier, Michael · 83
Cornwell III, Edward E. · 80
Costa, Helio A. · 4, 42
Crawford, Dana C. · 64
Crowell, Andrea · 5
Cui, Tianyi · 8
D
Dale, Ryan · 88
Danieletto, Matteo · 53
De, Subhajyoti · 130
Derry, Alexander · 27
Diesh, Colin · 125
Ding, Yali · 84
132
Dovat, Sinisa · 84
Dowell, Robin D. · 28
Draghici, Sorin · 31
Drögemöller, Britt I. · 78, 85
Duan, Rui · 38, 122
Duchen, Raquel · 44
Dudley, Joel T. · 7, 53
Dunker, A. Keith · 47
Dunlap, Kaiti · 103
Dunlap, Kaitlyn · 68
Dunn, Nathan · 125
Durmaz, Arda · 75
E
Ekstrand, Sophia · 95
El-Kebir, Mohammed · 15
Elo, Laura · 117
F
Faraggi, Eshel · 47
Farlik, Matthias · 91
Feng, Song · 96
Feng, Yunyi · 43
FitzGerald, Garret A. · 79, 92
Fondran, Jeremy R. · 60
Fortelny, Matthias · 91
Fried, Inbar · 19
Friedl, Verena · 23
G
Gao, Jean · 59
Garmire, Lana · 65
Gerstein, Mark · 77
Ghadermarzi, Sina · 10, 108
Ghayoori, Sholeh · 85
Ghosh, Sayan · 96
Gilchrist, Alison R. · 28
Gizaw, Martha · 129
Glodde, Josua · 21
Golgher, Lior · 22
Gong, Li · 34, 123
Gorjifard, Sayeh · 88
Grant, Gregory R. · 79, 92
Groeneweg, Gabriella S.S. · 85
Gumerov, Vadim M. · 86
Guo, Margaret · 27, 37
Guo, Yicheng · 106
Gur, Shir · 22
Gursoy, Gamze · 77
H
Ha, MinJin · 65
Haan, David · 23
Haber, Nick · 68, 103
Hachen, David · 35, 121
Haines, Jonathan L. · 60
Halappanavar, Mahantesh · 96
Hall, Molly A. · 66
Hamilton-Nelson, Kara L. · 60
Hao, Jie · 24
Harati, Sahar · 5
Harrigan, Caitlin F. · 16, 87
Harris, Nomi · 125
Hauskrecht, Milos · 8
He, Xi · 66
Hernandez-Ferrer, Carles · 32
Higginson, Michelle · 85
Hill, Jane E. · 20, 76, 93
Hocking, Toby Dylan · 25
Hoehndorf, Robert · 70, 90
Hogenesch, John B. · 92
Hogue, Christopher W.V. · 11
Holmes, Ian · 125
Hong, Elliot · 115
Horng, Steven · 3
Huang, Fei · 47
Huang, Heng · 2
Huang, Kun · 58
Huddart, Rachel · 34, 123
Hughitt, V. Keith · 88
Husic, Arman · 103
Hwang, TaeHyun · 56
I
Ito, Shinya · 78, 85
J
Jaakkimainen, Liisa · 44
Jagannathan, N. Suhas · 11
Jahanshad, Neda · 115
Jang, Hyun-Jong · 94
Jenkins, Willysha · 89
Jeong, Euna · 120
Jørgensen, Isabella Friis · 113
Jouline, Igor · 74
Jun, Tomi · 7
Jung, Dahuin · 63
Jung, Jae-Yoon · 103
K
Kafkas, Senay · 90
Kalantari, John · 71
Kalantarian, Haik · 68
Kalyva, Maria · 111
Kang, Mingon · 24
Kar, Nabhonil · 56
Karjalainen, Anzhelika · 91
Katuwawala, Akila · 10, 108
Keats, Jonathan J. · 88
Kelly, Libusha · 41
Kent, Jack · 103
Khan, Ariful · 96
Khan, Saad · 41
Kim, Jacob · 114
Kim, Woo Joo · 102
Kinzy, Tyler · 64
Kleber, Marcus E. · 66
Klein, Teri E. · 34, 81, 123
Kline, Aaron · 68, 103
Kloczkowski, Andrzej · 47
Kober, Kord M. · 114
Kocher, Jean-Pierre · 33
Kochunov, Peter · 115
Koestler, Devin C. · 55
Kohane, Isaac S. · 19
Kolberg, Liis · 126
Kompa, Benjamin · 19, 52
Kong, Sek Won · 32
Koohi-Moghadam, Mohamad · 72
Kosaraju, Sai Chandra · 24
Koster, Johannes · 83
Kramer, Stefan · 21
Kriwacki, Richard W. · 13
Krunic, Milica · 91
Kunder, Christian A. · 4
Kunkle, Brian W. · 60
Kurgan, Lukasz · 10, 108
Kuzmin, Ivan · 126
L
Lahens, Nicholas F. · 79, 92
Larson, Melissa C. · 33
Larson, Nicholas B. · 33
Lassnig, Caroline · 91
Lawrence, Cris W. · 79, 92
LeBlanc, Emilie · 103
Lee, E. Alice · 82
Lee, Han Sol · 102
Lee, Hao-Chih · 53
Lee, Joon-Yong · 96
Lee, Rena · 45
Lee, Soohyun · 82
Lee, Sung Hak · 94
Leiserson, Mark D. M. · 15, 17
Lever, Jake · 34, 123
Levy, Joshua J. · 54
Li, Hongyan · 72
Li, Li · 7
Li, Ruowang · 38, 122
Lichtarge, Olivier · 26
Lim, Sooyeon · 102
Lin, Deborah · 43
Lin, John · 64
Lin, Justin · 20, 93
Lin, Simon · 43
Liu, Chang · 43
Liu, Qingzhi · 65
Liu, Shikang · 35, 121
Liu, Xiaorong · 12
Liu, Zhandong · 100
Lizardo, Omar · 35, 121
Lowry, William E. · 100
Lu, Xinghua · 6
Luthria, Gaurav · 36
Lv, Tianling · 45
M
Ma, Feiyang · 95
Machiraju, Raghu · 58
Macho-Maschler, Sabine · 91
Maerz, Winfried · 66
Magee, Laura A. · 85
Mallick, Parag · 58
Manlove, Logan · 111
Manrai, Arjun K. · 101
Mariani, Jessica · 111
Mayberg, Helen · 5
McDermott, Jason · 96
McDonnell, Lauren · 20, 93
Meier, Richard · 55
Melamed, Rachel D. · 116
Melas-Kyriazi, Luke · 101
Meng, Jingwei · 47
Miao, Fudan · 85
Michalowski, Aleksandra M. · 88
Mikkola, Hanna K. A. · 95
Milenkovic, Tijana · 35, 121
Miotto, Riccardo · 53
Mitrea, Diana M. · 13
Mock, Beverly A. · 88
Monroy, Rebeca · 82
Moore, Jason H. · 38, 122
Moosavinasab, Soheil · 43
Morris, Quaid · 16, 44, 50, 87
Mueller, Mathias · 91
Mueller-Myhsok, Bertram · 66
N
Na, Jie · 33
Nakatochi, Masahiro · 97
Nayak, Soumyashant · 79, 92
Nelson, Heidi · 71
Nelson, William · 96
Nemati, Shamim · 5
Nemesure, Matthew · 93
Nemesure, Matthew D. · 20
Neums, Lisa · 55
Nguyen, Justine · 96
Nguyen, Tin · 31
Nichols, Kai · 2
Nie, Allen · 42
Ning, Michael · 103
Nishi, Hafumi · 106
Noh, Ji Yun · 102
O
O'Toole, John F. · 64
Oh, Ji Eun · 104
Ola, Mojoyinola Joanna · 91
Oldfield, Christopher J. · 10, 47, 108
Olufajo, Olubode A. · 80
Orloff, Mohammed · 75
Ostojic, Annie · 98
P
Palmer, Nathan · 19
Park, Choa · 120
Park, Gunseok · 104
Park, Peter J. · 82
Park, Sunho · 56
Park, Yoonsik · 50
Parmigiani, Giovanni · 57
Paskov, Kelley Marie · 68, 103
Passero, Kristin · 66
Patel, Chirag J. · 101
Patel, Ronak Y. · 42
Patil, Prasad · 57
Patnaik, Ritik · 68
Payne, Jonathon L. · 84
Pedersen, Brent · 83
Pellegrini, Matteo · 95
Penev, Yordan · 103
Pershad, Yash · 37
Perumalswami, Ponni · 7
Peterson, Hedi · 126
Petkovic, Dragutin · 99, 127
Pham, Minh · 26
Pietras, Christopher Michael · 67
Pineda, Arturo L. · 42
Pinoli, Pietro · 49
Piro, Rosario · 49
Poellabauer, Christian · 35, 121
Poelzl, Andrea · 91
Pohodich, Amy E. · 100
Polley, Eric C. · 88
Power, Liam · 67
Proukakis, Christos · 111
Pruneda, Jonathan · 96
Przytycka, Teresa M. · 17
Q
Quinlan, Aaron R. · 83
R
Raman, Ayush T. · 100
Ramchandran, Maya · 57
Ramsey, Stephen A. · 61
Rasche, Helena · 125
Rassekh, Shahrad · 78
Rassekh, Shahrad R. · 85
Raudvere, Uku · 126
Reimand, Jüri · 110
Richardson, Christian · 89, 118
Romero, Pedro · 47
Ross, Colin J. D. · 78, 85
Rowsey, Ross · 33
Rubanova, Yulia · 16, 87
Ryder, Nathan · 39
S
Sagers, Luke · 101
Sait, Leena · 129
Salas, Lucas A. · 54
Sanatani, Shubhayan · 85
Sangkuhl, Katrin · 34, 123
Sarantopoulou, Dimitra · 79, 92
Sawyer, Sara L. · 28
Scheuemie, Martijn J. · 38, 122
Schmaltz, Allen · 19
Schug, Jonathan · 92
Schwartz, Jessey · 68
Sedlazeck, Fritz · 111
Sedor, John R. · 64
Sekar, Shobana · 111
Selega, Alina · 16, 87
Sharan, Roded · 17
Sharma, Anchal · 130
Sharpnack, Michael · 58
Shaw, Kaitlyn · 85
Shen, Li · 2
Shi, Xu · 19
Shoebridge, Stephen · 91
Siekiera, Julia · 21
Simmons, John K. · 88
Skander, Dannielle · 75
Slonim, Donna K. · 67
Somjee, Ramiz · 13
Song, Dae Hyun · 24
Song, Joon Young · 102
Sontag, David · 3
Sørensen, Søren Schwartz · 113
Sosa, Carlos P. · 33
Sosa, Daniel N. · 27
Southerland, William · 80
Sriharan, Aravindhan · 54
Srinivasan, Anand · 92
Srivastava, Arunima · 58
Stabell, Alex C. · 28
Stamoulakatou, Eirini · 49
Stanley, Jacob T. · 28
Staub, Michelle · 85
Stehr, Henning · 4
Stepanov, Nicholas · 127
Stockham, Nathaniel · 68, 103
Striegel, Aaron · 35, 121
Strobl, Birgit · 91
Stuart, Joshua M. · 23
Sun, Hongzhe · 72
Sun, Min Woo · 103
Suomi, Tomi · 117
T
Tao, Ruikang · 23
Tao, Yifeng · 6
Tariq, Qandeel · 68
Tate, Tia · 118
Thompson, Jeffrey A. · 55
Thompson, Paul · 115
Tintle, Nathan · 39
Tomasini, Livia · 111
Tong, Jiayi · 38, 122
Tran, Ivy · 130
Tran, Nhat · 59
Trueman, Jessica · 85
Tsaku, Nelson Zange · 24
Tucker-Kellogg, Lisa · 11
U
Urban, Alexander E. · 111
Uversky, Vladimir N. · 47
V
Vaccarino, Flora M. · 111
Vaickus, Louis J. · 54
Vaisman, Iosif · 129
Vandromme, Maxence · 7
Varma, Maya · 68, 103
Vilo, Jaak · 126
Voss, Catalin · 68, 103
W
Wagner, Sarah · 77
Wall, Dennis P. · 68, 103
Wan, Ying-Wooi · 100
Wand, Hannah · 42
Wang, Chen · 33
Wang, Gao · 29
Wang, Haibo · 72
Wang, Hua · 2
Wang, Junwen · 72
Wang, Qingbo · 36
Wang, Yuchuan · 72
Wang, Yue · 29
Wang, Yunlong · 29
Warfe, Mike · 60
Washington, Peter · 68, 103
Weber, Griffin · 19
Wei, Eric · 27
Weinstein, Alana S. · 23
West, Nicholas · 85
Westra, Jason · 39
Whaley, Ryan · 34, 123
Wheeler, Nicholas R. · 60
Whirl-Carrillo, Michelle · 34, 123
Whyte, Simon D. · 85
Williams-DeVane, ClarLynda · 89, 118
Wilson, Jennifer · 127
Wilton, Andrew S. · 44
Wodchis, Walter · 44
Wojtowicz, Damian · 17
Wolf, Jack · 39
Wolf, Lior · 22
Wong, Christopher K. · 23
Wong, Mike · 127
Woon, Mark · 34, 123
Wright, Galen E. B. · 78, 85
Wright, Matt W. · 42
Wu, Tong · 29
Wulf, Bryan · 42
X
Xia, Xueting · 39
Xing, Lei · 45
Xue, Bin · 47
Y
Yalamanchili, Hari Krishna · 100
Yang, Hee Chul · 104
Yang, Jizhou · 99
Yang, Xinming · 72
Yao, Yao · 61
Yavartanu, Fatemeh · 105
Yoo, Yun Joo · 105
Yoon, Sukjoon · 120
Yoon, Sungroh · 63
Young, Adamo · 50
Yu, Ke · 8
Yue, Feng · 84
Z
Zehnder, James L. · 4
Zeng, Xianlong · 43
Zhang, Bo · 84
Zhang, Haoran · 44
Zhang, Lin · 106
Zhang, Mingda · 8
Zhao, Wei · 45
Zhou, Bo · 111
Zhou, Jiayan · 66
Zhulin, Igor B. · 86
Zoghbi, Huda Y. · 100
Zou, James · 42