PACIFIC SYMPOSIUM ON BIOCOMPUTING 2020 ABSTRACT BOOK

Poster Presenters: Poster space is assigned by abstract page number. Please find the page that your abstract is on and put your poster on the poster board with the corresponding number (e.g., if your abstract is on page 50, put your poster on board #50).

Proceedings papers with oral presentations #2-39 are not assigned poster space.

Abstracts are organized first by session, then by the last name of the first author. Presenting authors’ names are underlined in the Table of Contents and in bold text on the abstracts.

PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS

ARTIFICIAL INTELLIGENCE FOR ENHANCING CLINICAL MEDICINE ..... 1

PREDICTING LONGITUDINAL OUTCOMES OF ALZHEIMER'S DISEASE VIA A TENSOR-BASED JOINT CLASSIFICATION AND REGRESSION MODEL ..... 2
Lodewijk Brand, Kai Nichols, Hua Wang, Heng Huang, Li Shen, for the ADNI

ROBUSTLY EXTRACTING MEDICAL KNOWLEDGE FROM EHRS: A CASE STUDY OF LEARNING A HEALTH KNOWLEDGE GRAPH ..... 3
Irene Y. Chen, Monica Agrawal, Steven Horng, David Sontag

INCREASING CLINICAL TRIAL ACCRUAL VIA AUTOMATED MATCHING OF BIOMARKER CRITERIA ..... 4
Jessica W. Chen, Christian A. Kunder, Nam Bui, James L. Zehnder, Helio A. Costa, Henning Stehr

ADDRESSING THE CREDIT ASSIGNMENT PROBLEM IN TREATMENT OUTCOME PREDICTION USING TEMPORAL DIFFERENCE LEARNING ..... 5
Sahar Harati, Andrea Crowell, Helen Mayberg, Shamim Nemati

FROM GENOME TO PHENOME: PREDICTING MULTIPLE CANCER PHENOTYPES BASED ON SOMATIC GENOMIC ALTERATIONS VIA THE GENOMIC IMPACT TRANSFORMER ..... 6
Yifeng Tao, Chunhui Cai, William W. Cohen, Xinghua Lu

AUTOMATED PHENOTYPING OF PATIENTS WITH NON-ALCOHOLIC FATTY LIVER DISEASE REVEALS CLINICALLY RELEVANT DISEASE SUBTYPES ..... 7
Maxence Vandromme, Tomi Jun, Ponni Perumalswami, Joel T. Dudley, Andrea Branch, Li Li

MONITORING ICU MORTALITY RISK WITH A LONG SHORT-TERM MEMORY RECURRENT NEURAL NETWORK ..... 8
Ke Yu, Mingda Zhang, Tianyi Cui, Milos Hauskrecht

INTRINSICALLY DISORDERED PROTEINS (IDPS) AND THEIR FUNCTIONS ..... 9

DISORDERED FUNCTION CONJUNCTION: ON THE IN-SILICO FUNCTION ANNOTATION OF INTRINSICALLY DISORDERED REGIONS ..... 10
Sina Ghadermarzi, Akila Katuwawala, Christopher J. Oldfield, Amita Barik, Lukasz Kurgan

DE NOVO ENSEMBLE MODELING SUGGESTS THAT AP2-BINDING TO DISORDERED REGIONS CAN INCREASE STERIC VOLUME OF EPSIN BUT NOT EPS15 ..... 11
N. Suhas Jagannathan, Christopher W.V. Hogue, Lisa Tucker-Kellogg

MODULATION OF P53 TRANSACTIVATION DOMAIN CONFORMATIONS BY LIGAND BINDING AND CANCER-ASSOCIATED MUTATIONS ..... 12
Xiaorong Liu, Jianhan Chen

EXPLORING RELATIONSHIPS BETWEEN THE DENSITY OF CHARGED TRACTS WITHIN DISORDERED REGIONS AND PHASE SEPARATION ..... 13
Ramiz Somjee, Diana M. Mitrea, Richard W. Kriwacki

MUTATIONAL SIGNATURES ..... 14

PHYSIGS: PHYLOGENETIC INFERENCE OF MUTATIONAL SIGNATURE DYNAMICS ..... 15
Sarah Christensen, Mark D.M. Leiserson, Mohammed El-Kebir

TRACKSIGFREQ: SUBCLONAL RECONSTRUCTIONS BASED ON MUTATION SIGNATURES AND ALLELE FREQUENCIES ..... 16
Caitlin F. Harrigan, Yulia Rubanova, Quaid Morris, Alina Selega

DNA REPAIR FOOTPRINT UNCOVERS CONTRIBUTION OF DNA REPAIR MECHANISM TO MUTATIONAL SIGNATURES ..... 17
Damian Wojtowicz, Mark D.M. Leiserson, Roded Sharan, Teresa M. Przytycka

PATTERN RECOGNITION IN BIOMEDICAL DATA: CHALLENGES IN PUTTING BIG DATA TO WORK ..... 18

CLINICAL CONCEPT EMBEDDINGS LEARNED FROM MASSIVE SOURCES OF MULTIMODAL MEDICAL DATA ..... 19
Andrew L. Beam, Benjamin Kompa, Allen Schmaltz, Inbar Fried, Griffin Weber, Nathan Palmer, Xu Shi, Tianxi Cai, Isaac S. Kohane


ASSESSMENT OF IMPUTATION METHODS FOR MISSING GENE EXPRESSION DATA IN META-ANALYSIS OF DISTINCT COHORTS OF TUBERCULOSIS PATIENTS ..... 20
Carly A. Bobak, Lauren McDonnell, Matthew D. Nemesure, Justin Lin, Jane E. Hill

TOWARDS IDENTIFYING DRUG SIDE EFFECTS FROM SOCIAL MEDIA USING ACTIVE LEARNING AND CROWD SOURCING ..... 21
Sophie Burkhardt, Julia Siekiera, Josua Glodde, Miguel A. Andrade-Navarro, Stefan Kramer

MICROVASCULAR DYNAMICS FROM 4D MICROSCOPY USING TEMPORAL SEGMENTATION ..... 22
Shir Gur, Lior Wolf, Lior Golgher, Pablo Blinder

USING TRANSCRIPTIONAL SIGNATURES TO FIND CANCER DRIVERS WITH LURE ..... 23
David Haan, Ruikang Tao, Verena Friedl, Ioannis N. Anastopoulos, Christopher K. Wong, Alana S. Weinstein, Joshua M. Stuart

PAGE-NET: INTERPRETABLE AND INTEGRATIVE DEEP LEARNING FOR SURVIVAL ANALYSIS USING HISTOPATHOLOGICAL IMAGES AND GENOMIC DATA ..... 24
Jie Hao, Sai Chandra Kosaraju, Nelson Zange Tsaku, Dae Hyun Song, Mingon Kang

MACHINE LEARNING ALGORITHMS FOR SIMULTANEOUS SUPERVISED DETECTION OF PEAKS IN MULTIPLE SAMPLES AND CELL TYPES ..... 25
Toby Dylan Hocking, Guillaume Bourque

GRAPH-BASED INFORMATION DIFFUSION METHOD FOR PRIORITIZING FUNCTIONALLY RELATED GENES IN PROTEIN-PROTEIN INTERACTION NETWORKS ..... 26
Minh Pham, Olivier Lichtarge

A LITERATURE-BASED KNOWLEDGE GRAPH EMBEDDING METHOD FOR IDENTIFYING DRUG REPURPOSING OPPORTUNITIES IN RARE DISEASES ..... 27
Daniel N. Sosa, Alexander Derry, Margaret Guo, Eric Wei, Connor Brinton, Russ B. Altman

TWO-STAGE ML CLASSIFIER FOR IDENTIFYING HOST PROTEIN TARGETS OF THE DENGUE PROTEASE ..... 28
Jacob T. Stanley, Alison R. Gilchrist, Alex C. Stabell, Mary A. Allen, Sara L. Sawyer, Robin D. Dowell

ENHANCING MODEL INTERPRETABILITY AND ACCURACY FOR DISEASE PROGRESSION PREDICTION VIA PHENOTYPE-BASED PATIENT SIMILARITY LEARNING ..... 29
Yue Wang, Tong Wu, Yunlong Wang, Gao Wang

PRECISION MEDICINE: ADDRESSING THE CHALLENGES OF SHARING, ANALYSIS, AND PRIVACY AT SCALE ..... 30

INTEGRATED CANCER SUBTYPING USING HETEROGENEOUS GENOME-SCALE MOLECULAR DATASETS ..... 31
Suzan Arslanturk, Sorin Draghici, Tin Nguyen

ASSESSMENT OF COVERAGE FOR ENDOGENOUS METABOLITES AND EXOGENOUS CHEMICAL COMPOUNDS USING AN UNTARGETED METABOLOMICS PLATFORM ..... 32
Sek Won Kong, Carles Hernandez-Ferrer

COVERAGE PROFILE CORRECTION OF SHALLOW-DEPTH CIRCULATING CELL-FREE DNA SEQUENCING VIA MULTI-DISTANCE LEARNING ..... 33
Nicholas B. Larson, Melissa C. Larson, Jie Na, Carlos P. Sosa, Chen Wang, Jean-Pierre Kocher, Ross Rowsey

PGXMINE: TEXT MINING FOR CURATION OF PHARMGKB ..... 34
Jake Lever, Julia M. Barbarino, Li Gong, Rachel Huddart, Katrin Sangkuhl, Ryan Whaley, Michelle Whirl-Carrillo, Mark Woon, Teri E. Klein, Russ B. Altman

THE POWER OF DYNAMIC SOCIAL NETWORKS TO PREDICT INDIVIDUALS' MENTAL HEALTH ..... 35
Shikang Liu, David Hachen, Omar Lizardo, Christian Poellabauer, Aaron Striegel, Tijana Milenkovic

IMPLEMENTING A CLOUD BASED METHOD FOR PROTECTED CLINICAL TRIAL DATA SHARING ..... 36
Gaurav Luthria, Qingbo Wang

PATHWAY AND NETWORK EMBEDDING METHODS FOR PRIORITIZING PSYCHIATRIC DRUGS ..... 37
Yash Pershad, Margaret Guo, Russ B. Altman

ROBUST-ODAL: LEARNING FROM HETEROGENEOUS HEALTH SYSTEMS WITHOUT SHARING PATIENT-LEVEL DATA ..... 38
Jiayi Tong, Rui Duan, Ruowang Li, Martijn J. Scheuemie, Jason H. Moore, Yong Chen


COMPUTATIONALLY EFFICIENT, EXACT, COVARIATE-ADJUSTED GENETIC PRINCIPAL COMPONENT ANALYSIS BY LEVERAGING INDIVIDUAL MARKER SUMMARY STATISTICS FROM LARGE BIOBANKS ..... 39
Jack Wolf, Martha Barnard, Xueting Xia, Nathan Ryder, Jason Westra, Nathan Tintle

PROCEEDINGS PAPERS WITH POSTER PRESENTATIONS

ARTIFICIAL INTELLIGENCE FOR ENHANCING CLINICAL MEDICINE ..... 40

MULTICLASS DISEASE CLASSIFICATION FROM MICROBIAL WHOLE-COMMUNITY METAGENOMES ..... 41
Saad Khan, Libusha Kelly

LITGEN: GENETIC LITERATURE RECOMMENDATION GUIDED BY HUMAN EXPLANATIONS ..... 42
Allen Nie, Arturo L. Pineda, Matt W. Wright, Hannah Wand, Bryan Wulf, Helio A. Costa, Ronak Y. Patel, Carlos D. Bustamante, James Zou

MULTILEVEL SELF-ATTENTION MODEL AND ITS USE ON MEDICAL RISK PREDICTION ..... 43
Xianlong Zeng, Yunyi Feng, Soheil Moosavinasab, Deborah Lin, Simon Lin, Chang Liu

IDENTIFYING TRANSITIONAL HIGH COST USERS FROM UNSTRUCTURED PATIENT PROFILES WRITTEN BY PRIMARY CARE PHYSICIANS ..... 44
Haoran Zhang, Elisa Candido, Andrew S. Wilton, Raquel Duchen, Liisa Jaakkimainen, Walter Wodchis, Quaid Morris

OBTAINING DUAL-ENERGY COMPUTED TOMOGRAPHY (CT) INFORMATION FROM A SINGLE-ENERGY CT IMAGE FOR QUANTITATIVE IMAGING ANALYSIS OF LIVING SUBJECTS BY USING DEEP LEARNING ..... 45
Wei Zhao, Tianling Lv, Rena Lee, Yang Chen, Lei Xing

INTRINSICALLY DISORDERED PROTEINS (IDPS) AND THEIR FUNCTIONS ..... 46

MANY-TO-ONE BINDING BY INTRINSICALLY DISORDERED PROTEIN REGIONS ..... 47
Wei-Lun Alterovitz, Eshel Faraggi, Christopher J. Oldfield, Jingwei Meng, Bin Xue, Fei Huang, Pedro Romero, Andrzej Kloczkowski, Vladimir N. Uversky, A. Keith Dunker

MUTATIONAL SIGNATURES ..... 48

IMPACT OF MUTATIONAL SIGNATURES ON MICRORNA AND THEIR RESPONSE ELEMENTS ..... 49
Eirini Stamoulakatou, Pietro Pinoli, Stefano Ceri, Rosario Piro

GENOME GERRYMANDERING: OPTIMAL DIVISION OF THE GENOME INTO REGIONS WITH CANCER TYPE SPECIFIC DIFFERENCES IN MUTATION RATES ..... 50
Adamo Young, Jacob Chmura, Yoonsik Park, Quaid Morris, Gurnit Atwal

PATTERN RECOGNITION IN BIOMEDICAL DATA: CHALLENGES IN PUTTING BIG DATA TO WORK ..... 51

LEARNING A LATENT SPACE OF HIGHLY MULTIDIMENSIONAL CANCER DATA ..... 52
Benjamin Kompa, Beau Coker

SCALING STRUCTURAL LEARNING WITH NO-BEARS TO INFER CAUSAL TRANSCRIPTOME NETWORKS ..... 53
Hao-Chih Lee, Matteo Danieletto, Riccardo Miotto, Sarah T. Cherng, Joel T. Dudley

PATHFLOWAI: A HIGH-THROUGHPUT WORKFLOW FOR PREPROCESSING, DEEP LEARNING AND INTERPRETATION IN DIGITAL PATHOLOGY ..... 54
Joshua J. Levy, Lucas A. Salas, Brock C. Christensen, Aravindhan Sriharan, Louis J. Vaickus

IMPROVING SURVIVAL PREDICTION USING A NOVEL FEATURE SELECTION AND FEATURE REDUCTION FRAMEWORK BASED ON THE INTEGRATION OF CLINICAL AND MOLECULAR DATA* ..... 55
Lisa Neums, Richard Meier, Devin C. Koestler, Jeffrey A. Thompson

BAYESIAN SEMI-NONNEGATIVE MATRIX TRI-FACTORIZATION TO IDENTIFY PATHWAYS ASSOCIATED WITH CANCER PHENOTYPES ..... 56
Sunho Park, Nabhonil Kar, Jae-Ho Cheong, Tae Hyun Hwang

TREE-WEIGHTING FOR MULTI-STUDY ENSEMBLE LEARNERS ..... 57
Maya Ramchandran, Prasad Patil, Giovanni Parmigiani

PTR EXPLORER: AN APPROACH TO IDENTIFY AND EXPLORE POST TRANSCRIPTIONAL REGULATORY MECHANISMS USING PROTEOGENOMICS ..... 58
Arunima Srivastava, Michael Sharpnack, Kun Huang, Parag Mallick, Raghu Machiraju


NETWORK REPRESENTATION OF LARGE-SCALE HETEROGENEOUS RNA SEQUENCES WITH INTEGRATION OF DIVERSE MULTI-OMICS, INTERACTIONS, AND ANNOTATIONS DATA ..... 59
Nhat Tran, Jean Gao

HADOOP AND PYSPARK FOR REPRODUCIBILITY AND SCALABILITY OF GENOMIC SEQUENCING STUDIES ..... 60
Nicholas R. Wheeler, Penelope Benchek, Brian W. Kunkle, Kara L. Hamilton-Nelson, Mike Warfe, Jeremy R. Fondran, Jonathan L. Haines, William S. Bush

CERENKOV3: CLUSTERING AND MOLECULAR NETWORK-DERIVED FEATURES IMPROVE COMPUTATIONAL PREDICTION OF FUNCTIONAL NONCODING SNPS ..... 61
Yao Yao, Stephen A. Ramsey

PRECISION MEDICINE: ADDRESSING THE CHALLENGES OF SHARING, ANALYSIS, AND PRIVACY AT SCALE ..... 62

ANOMIGAN: GENERATIVE ADVERSARIAL NETWORKS FOR ANONYMIZING PRIVATE MEDICAL DATA ..... 63
Ho Bae, Dahuin Jung, Hyun-Soo Choi, Sungroh Yoon

FREQUENCY OF CLINVAR PATHOGENIC VARIANTS IN CHRONIC KIDNEY DISEASE PATIENTS SURVEYED FOR RETURN OF RESEARCH RESULTS AT A CLEVELAND PUBLIC HOSPITAL ..... 64
Dana C. Crawford, John Lin, Jessica N. Cooke Bailey, Tyler Kinzy, John R. Sedor, John F. O'Toole, William S. Bush

NETWORK-BASED MATCHING OF PATIENTS AND TARGETED THERAPIES FOR PRECISION ONCOLOGY ..... 65
Qingzhi Liu, Min Jin Ha, Rupam Bhattacharyya, Lana Garmire, Veerabhadran Baladandayuthapani

PHENOME-WIDE ASSOCIATION STUDIES ON CARDIOVASCULAR HEALTH AND FATTY ACIDS CONSIDERING PHENOTYPE QUALITY CONTROL PRACTICES FOR EPIDEMIOLOGICAL DATA ..... 66
Kristin Passero, Xi He, Jiayan Zhou, Bertram Mueller-Myhsok, Marcus E. Kleber, Winfried Maerz, Molly A. Hall

ATEMPO: PATHWAY-SPECIFIC TEMPORAL ANOMALIES FOR PRECISION THERAPEUTICS ..... 67
Christopher Michael Pietras, Liam Power, Donna K. Slonim

FEATURE SELECTION AND DIMENSION REDUCTION OF SOCIAL AUTISM DATA ..... 68
Peter Washington, Kelley Marie Paskov, Haik Kalantarian, Nathaniel Stockham, Catalin Voss, Aaron Kline, Ritik Patnaik, Brianna Chrisman, Maya Varma, Qandeel Tariq, Kaitlyn Dunlap, Jessey Schwartz, Nick Haber, Dennis P. Wall

POSTER PRESENTATIONS

ARTIFICIAL INTELLIGENCE FOR ENHANCING CLINICAL MEDICINE ..... 69

PRIORITIZING COPY NUMBER VARIANTS USING PHENOTYPE AND GENE FUNCTIONAL SIMILARITY ..... 70
Azza Althagafi, Jun Chen, Robert Hoehndorf

INFERRING THE REWARD FUNCTIONS THAT GUIDE CANCER PROGRESSION ..... 71
John Kalantari, Heidi Nelson, Nicholas Chia

PREDICTING DISEASE-ASSOCIATED MUTATION OF METAL-BINDING SITES IN PROTEINS USING A DEEP LEARNING APPROACH ..... 72
Mohamad Koohi-Moghadam, Haibo Wang, Yuchuan Wang, Xinming Yang, Hongyan Li, Junwen Wang, Hongzhe Sun

GENERAL ..... 73

RANKING RAS PATHWAY MUTATIONS USING EVOLUTIONARY HISTORY OF MEK1 ..... 74
Katia Andrianova, Igor Jouline

INTEGRATIVE ANALYSIS OF COPD AND LUNG CANCER METADATA REVEALS SHARED ALTERATIONS IN IMMUNE RESPONSE, PTEN AND PI3K-AKT PATHWAYS ..... 75
Dannielle Skander, Arda Durmaz, Mohammed Orloff, Gurkan Bebek

INVESTIGATING SOURCES OF IRREPRODUCIBILITY IN ANALYSIS OF GENE EXPRESSION DATA ..... 76
Carly A. Bobak, Jane E. Hill

ETHEREUM AND MULTICHAIN BLOCKCHAINS AS SECURE TOOLS FOR INDIVIDUALIZED MEDICINE ..... 77
Charlotte Brannon, Gamze Gursoy, Sarah Wagner, Mark Gerstein


GENOMIC PREDICTORS OF L-ASPARAGINASE-INDUCED PANCREATITIS IN PEDIATRIC CANCER PATIENTS ..... 78
Britt Drogemoller, Galen E.B. Wright, Shahrad Rassekh, Shinya Ito, Bruce Carleton, Colin Ross, The Canadian Pharmacogenomics Network for Drug Safety Consortium

NITECAP: A NOVEL METHOD AND INTERFACE FOR THE IDENTIFICATION OF CIRCADIAN BEHAVIOR IN HIGHLY PARALLEL TIME-COURSE DATA ..... 79
Thomas G. Brooks, Cris W. Lawrence, Nicholas F. Lahens, Soumyashant Nayak, Dimitra Sarantopoulou, Garret A. FitzGerald, Gregory R. Grant

THE INTERPLAY OF OBESITY AND RACE/ETHNICITY ON MAJOR PERINATAL COMPLICATIONS ..... 80
Yaadira Brown, MPH; Olubode A. Olufajo, MD, MPH; Edward E. Cornwell III, MD; William Southerland, PhD

A COMPARISON OF PHARMACOGENOMIC INFORMATION IN FDA-APPROVED DRUG LABELS AND CPIC GUIDELINES ..... 81
Katherine I. Carrillo, Teri E. Klein

XTEA: A TRANSPOSABLE ELEMENT INSERTION ANALYZER FOR GENOME SEQUENCING DATA FROM MULTIPLE TECHNOLOGIES ..... 82
Chong Chu, Rebeca Monroy, Soohyun Lee, E. Alice Lee, Peter J. Park

GO GET DATA (GGD): SIMPLE, REPRODUCIBLE ACCESS TO SCIENTIFIC DATA ..... 83
Michael Cormier, Jon Belyeu, Brent Pedersen, Joe Brown, Johannes Koster, Aaron R. Quinlan

GLOBAL EPIGENOMIC REGULATION OF GENE EXPRESSION AND CELLULAR PROLIFERATION IN T-CELL LEUKEMIA ..... 84
Sinisa Dovat, Yali Ding, Bo Zhang, Jonathon L. Payne, Feng Yue

A PHARMACOGENOMIC INVESTIGATION OF THE CARDIAC SAFETY PROFILE OF ONDANSETRON IN CHILDREN AND IN PREGNANT WOMEN ..... 85
Galen E.B. Wright, Britt I. Drögemöller, Jessica Trueman, Kaitlyn Shaw, Michelle Staub, Shahnaz Chaudhry, Sholeh Ghayoori, Fudan Miao, Michelle Higginson, Gabriella S.S. Groeneweg, James Brown, Laura A Magee, Simon D. Whyte, Nicholas West, Sonia Brodie, Geert ’t Jong, Howard Berger, Shinya Ito, Shahrad R. Rassekh, Shubhayan Sanatani, Colin J.D. Ross, Bruce C. Carleton

TREND: A PLATFORM FOR EXPLORING PROTEIN FUNCTION IN PROKARYOTES USING PHYLOGENETICS, DOMAIN ARCHITECTURES, AND GENE NEIGHBORHOODS INFORMATION ..... 86
Vadim M. Gumerov, Igor B. Zhulin

TRACKSIGFREQ: SUBCLONAL RECONSTRUCTIONS BASED ON MUTATION SIGNATURES AND ALLELE FREQUENCIES ..... 87
Caitlin F. Harrigan, Yulia Rubanova, Quaid Morris, Alina Selega

A FLEXIBLE PIPELINE FOR THE PREDICTION OF BIOMARKERS RELEVANT TO DRUG SENSITIVITY ..... 88
V. Keith Hughitt, Sayeh Gorjifard, Aleksandra M. Michalowski, John K. Simmons, Ryan Dale, Eric C. Polley, Jonathan J. Keats, Beverly A. Mock

CREATING A METABOLIC SYNDROME RESEARCH RESOURCE (METSRR) ..... 89
Willysha Jenkins, Christian Richardson, ClarLynda Williams-DeVane PhD

UTILIZING COHORT INFORMATION TO FIND CAUSATIVE VARIANTS ..... 90
Senay Kafkas, Robert Hoehndorf

INTEGRATED ANALYSIS OF JAK-STAT PATHWAY IN HOMEOSTASIS, SIMULATED INFLAMMATION AND TUMOUR ..... 91
Milica Krunic, Anzhelika Karjalainen, Mojoyinola Joanna Ola, Stephen Shoebridge, Sabine Macho-Maschler, Caroline Lassnig, Andrea Poelzl, Matthias Farlik, Nikolaus Fortelny, Christoph Bock, Birgit Strobl, Mathias Mueller

BEERS2: THE NEXT GENERATION OF RNA-SEQ SIMULATOR ..... 92
Nicholas F. Lahens, Thomas G. Brooks, Dimitra Sarantopoulou, Soumyashant Nayak, Cris Lawrence, Anand Srinivasan, Jonathan Schug, Garret A. FitzGerald, John B. Hogenesch, Yoseph Barash, Gregory R. Grant

EFFECT MODIFICATION BY AGE ON A DIAGNOSTIC THREE-GENE-SIGNATURE IN PATIENTS WITH ACTIVE TUBERCULOSIS ..... 93
Lauren McDonnell, Carly Bobak, Matthew Nemesure, Justin Lin, Jane Hill

CLASSIFICATION AND MUTATION PREDICTION FROM GASTROINTESTINAL CANCER HISTOPATHOLOGY IMAGES USING DEEP LEARNING ..... 94
Sung Hak Lee, Hyun-Jong Jang


MAPPING THE EMERGENCE AND MIGRATION OF HEMATOPOIETIC STEM CELLS AND PROGENITORS DURING HUMAN DEVELOPMENT AT SINGLE CELL RESOLUTION ..... 95
Feiyang Ma, Vincenzo Calvanese, Sandra Capellera-Garcia, Sophia Ekstrand, Matteo Pellegrini, Hanna K.A. Mikkola

LARGE-SCALE MACHINE LEARNING AND GRAPH ANALYTICS FOR FUNCTIONAL PREDICTION OF PATHOGEN PROTEINS ..... 96
Jason McDermott, Song Feng, William Nelson, Joon-Yong Lee, Sayan Ghosh, Ariful Khan, Mahantesh Halappanavar, Justine Nguyen, Jonathan Pruneda, David Baltrus, Joshua Adkins

GENE-SET ANALYSIS USING GWAS SUMMARY STATISTICS AND GTEX DATABASE ..... 97
Masahiro Nakatochi

TARGETING CANCER VIA SIGNALING PATHWAYS: A NOVEL APPROACH TO THE DISCOVERY OF GENE CCDC191'S DOUBLE-AGENT FUNCTION USING DIFFERENTIAL GENE EXPRESSION, HEAT MAP ANALYSES THROUGH AI DEEP LEARNING, AND MATHEMATICAL MODELING ..... 98
Annie Ostojic

RFEX: SIMPLE RANDOM FOREST MODEL AND SAMPLE EXPLAINER FOR NON-MACHINE LEARNING EXPERTS ..... 99
Dragutin Petkovic, Ali Alavi, DanDan Cai, Jizhou Yang, Sabiha Barlaskar

APPARENT BIAS TOWARD LONG GENE MISREGULATION IN MECP2 SYNDROMES DISAPPEARS AFTER CONTROLLING FOR BASELINE VARIATIONS ..... 100
Ayush T. Raman, Amy E Pohodich, Ying-Wooi Wan, Hari Krishna Yalamanchili, William E. Lowry, Huda Y. Zoghbi, Zhandong Liu

PREDICTION OF CHRONOLOGICAL AND BIOLOGICAL AGE FROM LABORATORY DATA ..... 101
Luke Sagers, Luke Melas-Kyriazi, Chirag J. Patel, Arjun K. Manrai

WHOLE GENOME SEQUENCING ANALYSIS OF INFLUENZA C VIRUS IN KOREA ..... 102
Sooyeon Lim, Han Sol Lee, Ji Yun Noh, Joon Young Song, Hee Jin Cheong, Woo Joo Kim

MINING THE HUMUHUMUNUKUNUKUAPUAA AND THE SHAKA OF AUTISM WITH BIG DATA BIOMEDICAL DATA SCIENCE ..... 103
Peter Washington, Brianna Chrisman, Kaiti Dunlap, Aaron Kline, Arman Husic, Michael Ning, Kelley Paskov, Nate Stockham, Maya Varma, Emilie LeBlanc, Jack Kent, Yordan Penev, Min Woo Sun, Jae-Yoon Jung, Catalin Voss, Nick Haber, Dennis P. Wall

DEVELOPMENT OF A RECURRENCE PREDICTION MODEL FOR EARLY LUNG ADENOCARCINOMA USING RADIOMICS-BASED ARTIFICIAL INTELLIGENCE ..... 104
Hee Chul Yang, Gunseok Park, Ji Eun Oh

DRLPC: DIMENSION REDUCTION OF SEQUENCING DATA USING LOCAL PRINCIPAL COMPONENTS ..... 105
Yun Joo Yoo, Fatemeh Yavartanu, Shelley B. Bull

META-ANALYSIS IN EXHAUSTED T CELLS FROM HOMO SAPIENS AND MUS MUSCULUS PROVIDES NOVEL TARGETS FOR IMMUNOTHERAPY ..... 106
Lin Zhang, Yicheng Guo, Hafumi Nishi

INTRINSICALLY DISORDERED PROTEINS (IDPS) AND THEIR FUNCTIONS ..... 107

DISORDERED FUNCTION CONJUNCTION: ON THE IN-SILICO FUNCTION ANNOTATION OF INTRINSICALLY DISORDERED REGIONS ..... 108
Sina Ghadermarzi, Akila Katuwawala, Christopher J. Oldfield, Amita Barik, Lukasz Kurgan

MUTATIONAL SIGNATURES ..... 109

TRANSCRIPTION-ASSOCIATED REGIONAL MUTATION RATES AND SIGNATURES IN REGULATORY ELEMENTS ACROSS 2,500 WHOLE CANCER GENOMES ..... 110
Jüri Reimand

COMPLEX MOSAIC STRUCTURAL VARIATIONS IN HUMAN FETAL BRAINS ..... 111
Shobana Sekar, Livia Tomasini, Maria Kalyva, Taejeong Bae, Logan Manlove, Bo Zhou, Jessica Mariani, Fritz Sedlazeck, Alexander E. Urban, Christos Proukakis, Flora M. Vaccarino, Alexej Abyzov


PATTERN RECOGNITION IN BIOMEDICAL DATA: CHALLENGES IN PUTTING BIG DATA TO WORK ..... 112

STRATIFICATION OF KIDNEY TRANSPLANT RECIPIENTS BASED ON TEMPORAL DISEASE TRAJECTORIES ..... 113
Isabella Friis Jørgensen PhD, Søren Schwartz Sørensen PhD, Søren Brunak PhD

MODELING GENE EXPRESSION LEVELS FROM EPIGENETIC MARKERS USING A DYNAMICAL SYSTEMS APPROACH ..... 114
James Brunner, Jacob Kim, Kord M. Kober

TRANSLATING BIG DATA NEUROIMAGING FINDINGS INTO MEASUREMENTS OF INDIVIDUAL VULNERABILITY ..... 115
Peter Kochunov, Paul Thompson, Neda Jahanshad, Elliot Hong

AUTOMATING NEW-USER COHORT CONSTRUCTION WITH INDICATION EMBEDDINGS ..... 116
Rachel D. Melamed

REPRODUCIBILITY-OPTIMIZED STATISTICAL TESTING FOR OMICS STUDIES ..... 117
Tomi Suomi, Laura Elo

DATA INTEGRATION EXPECTATION MAPS: TOWARDS MORE INFORMED 'OMIC DATA INTEGRATION ..... 118
Tia Tate, Christian Richardson, ClarLynda Williams-DeVane

PRECISION MEDICINE: ADDRESSING THE CHALLENGES OF SHARING, ANALYSIS, AND PRIVACY AT SCALE ..... 119

INTEGRATED OMICS DATA MINING OF SYNERGISTIC GENE PAIRS FOR CANCER PRECISION MEDICINE ..... 120
Euna Jeong, Choa Park, Sukjoon Yoon

THE POWER OF DYNAMIC SOCIAL NETWORKS TO PREDICT INDIVIDUALS' MENTAL HEALTH ..... 121
Shikang Liu, David Hachen, Omar Lizardo, Christian Poellabauer, Aaron Striegel, Tijana Milenkovic

ROBUST-ODAL: LEARNING FROM HETEROGENEOUS HEALTH SYSTEMS WITHOUT SHARING PATIENT-LEVEL DATA ..... 122
Jiayi Tong, Rui Duan, Ruowang Li, Martijn J. Scheuemie, Jason H. Moore, Yong Chen

PHARMGKB: AUTOMATED LITERATURE ANNOTATIONS ..... 123
Michelle Whirl-Carrillo, Li Gong, Rachel Huddart, Katrin Sangkuhl, Ryan Whaley, Mark Woon, Julia Barbarino, Jake Lever, Russ B. Altman, Teri E. Klein

WORKSHOPS WITH POSTER PRESENTATIONS

PACKAGING BIOCOMPUTING SOFTWARE TO MAXIMIZE DISTRIBUTION AND REUSE ..... 124

APOLLO PROVIDES COLLABORATIVE GENOME ANNOTATION EDITING WITH THE POWER OF JBROWSE ..... 125
Nathan Dunn, Colin Diesh, Robert Buels, Helena Rasche, Anthony Bretaudeau, Nomi Harris, Ian Holmes

G:PROFILER - ONE FUNCTIONAL ENRICHMENT ANALYSIS TOOL, MANY INTERFACES SERVING LIFE SCIENCE COMMUNITIES ..... 126
Liis Kolberg, Uku Raudvere, Ivan Kuzmin, Jaak Vilo, Hedi Peterson

INCREASING USABILITY AND DISSEMINATION OF THE PATHFX ALGORITHM USING WEB APPLICATIONS AND DOCKER SYSTEMS ..... 127
Jennifer Wilson, Nicholas Stepanov, Ajinkya Chalke, Mike Wong, Dragutin Petkovic, Russ B. Altman

TRANSLATIONAL BIOINFORMATICS WORKSHOP: BIOBANKS IN THE PRECISION MEDICINE ERA ..... 128

IDENTIFICATION OF BIOMARKERS RELATED TO AUTISM SPECTRUM DISORDER USING GENOMIC INFORMATION ..... 129
Leena Sait, Martha Gizaw, and Iosif Vaisman

A PAN-CANCER 3-GENE SIGNATURE TO PREDICT DORMANCY ..... 130
Ivy Tran, Anchal Sharma, Subhajyoti De

AUTHOR INDEX ..... 131

1

ARTIFICIAL INTELLIGENCE FOR ENHANCING CLINICAL MEDICINE

PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS

2

Artificial Intelligence for Enhancing Clinical Medicine

Predicting Longitudinal Outcomes of Alzheimer's Disease via a Tensor-Based Joint Classification and Regression Model

Lodewijk Brand1, Kai Nichols1, Hua Wang1, Heng Huang2, Li Shen3, for the ADNI

1Colorado School of Mines, 2University of Pittsburgh, 3University of Pennsylvania

Hua Wang

Alzheimer's disease (AD) is a serious neurodegenerative condition that affects millions of people across the world. Recently machine learning models have been used to predict the progression of AD, although they frequently do not take advantage of the longitudinal and structural components associated with multi-modal medical data. To address this, we present a new algorithm that uses the multi-block alternating direction method of multipliers to optimize a novel objective that combines multi-modal longitudinal clinical data of various modalities to simultaneously predict the cognitive scores and diagnoses of the participants in the Alzheimer's Disease Neuroimaging Initiative cohort. Our new model is designed to leverage the structure associated with clinical data that is not incorporated into standard machine learning optimization algorithms. This new approach shows state-of-the-art predictive performance and validates a collection of brain and genetic biomarkers that have been recorded previously in AD literature.

3

Artificial Intelligence for Enhancing Clinical Medicine

Robustly Extracting Medical Knowledge from EHRs: A Case Study of Learning a Health Knowledge Graph

Irene Y. Chen1, Monica Agrawal1, Steven Horng2, David Sontag1

1Massachusetts Institute of Technology, 2Beth Israel Deaconess Medical Center

Irene Chen

Increasingly large electronic health records (EHRs) provide an opportunity to algorithmically learn medical knowledge. In one prominent example, a causal health knowledge graph could learn relationships between diseases and symptoms and then serve as a diagnostic tool to be refined with additional clinical input. Prior research has demonstrated the ability to construct such a graph from over 270,000 emergency department patient visits. In this work, we describe methods to evaluate a health knowledge graph for robustness. Moving beyond precision and recall, we analyze for which diseases and for which patients the graph is most accurate. We identify sample size and unmeasured confounders as major sources of error in the health knowledge graph. We introduce a method to leverage non-linear functions in building the causal graph to better understand existing model assumptions. Finally, to assess model generalizability, we extend to a larger set of complete patient visits within a hospital system. We conclude with a discussion on how to robustly extract medical knowledge from EHRs.

4

Artificial Intelligence for Enhancing Clinical Medicine

Increasing Clinical Trial Accrual via Automated Matching of Biomarker Criteria

Jessica W. Chen, Christian A. Kunder, Nam Bui, James L. Zehnder, Helio A. Costa, Henning Stehr

Stanford University School of Medicine

Successful implementation of precision oncology requires both the deployment of nucleic acid sequencing panels to identify clinically actionable biomarkers, and the efficient screening of patient biomarker eligibility to on-going clinical trials and therapies. This process is typically performed manually by biocurators, geneticists, pathologists, and oncologists; however, this is a time-intensive and inconsistent process amongst healthcare providers. We present the development of a feature matching algorithmic pipeline that identifies patients who meet eligibility criteria of precision medicine clinical trials via genetic biomarkers and apply it to patients undergoing treatment at the Stanford Cancer Center. This study demonstrates, through our patient eligibility screening algorithm that leverages clinical sequencing derived biomarkers with precision medicine clinical trials, the successful use of an automated algorithmic pipeline as a feasible, accurate and effective alternative to the traditional manual clinical trial curation.
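
The abstract does not describe the pipeline's internals, so the following is only a minimal sketch of what biomarker-to-trial feature matching could look like, assuming each trial lists required (gene, alteration) pairs and a patient is eligible when all of them are present. The `Biomarker` type, the trial names, and the example data are hypothetical and are not taken from the Stanford pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Biomarker:
    """A hypothetical (gene, alteration) pair reported by a sequencing panel."""
    gene: str
    alteration: str


def matches(trial_criteria, patient_biomarkers):
    """Return True if the patient carries every biomarker the trial requires."""
    return set(trial_criteria) <= set(patient_biomarkers)


def eligible_trials(trials, patient_biomarkers):
    """Return the sorted names of all trials whose criteria the patient meets."""
    return sorted(
        name for name, criteria in trials.items()
        if matches(criteria, patient_biomarkers)
    )


# Hypothetical trials and patient, for illustration only.
TRIALS = {
    "TRIAL-A": [Biomarker("BRAF", "V600E")],
    "TRIAL-B": [Biomarker("EGFR", "L858R"), Biomarker("MET", "amplification")],
}
PATIENT = [Biomarker("BRAF", "V600E"), Biomarker("TP53", "R175H")]

print(eligible_trials(TRIALS, PATIENT))  # ['TRIAL-A']
```

A real system would also need exclusion criteria, diagnosis matching, and variant normalization, none of which this sketch attempts.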

5

Artificial Intelligence for Enhancing Clinical Medicine

Addressing the Credit Assignment Problem in Treatment Outcome Prediction using Temporal Difference Learning

Sahar Harati1, Andrea Crowell2, Helen Mayberg3, Shamim Nemati4

1Stanford University, 2Emory University, 3Mount Sinai, 4University of California San Diego

Sahar Harati

Mental health patients often undergo a variety of treatments before finding an effective one. Improved prediction of treatment response can shorten the duration of trials. A key challenge of applying predictive modeling to this problem is that often the effectiveness of a treatment regimen remains unknown for several weeks, and therefore immediate feedback signals may not be available for supervised learning. Here we propose a Machine Learning approach to extracting audio-visual features from weekly video interview recordings for predicting the likely outcome of Deep Brain Stimulation (DBS) treatment several weeks in advance. In the absence of immediate treatment-response feedback, we utilize a joint state-estimation and temporal difference learning approach to model both the trajectory of a patient's response and the delayed nature of feedbacks. Our results based on longitudinal recordings from 12 patients with depression show that the learned state values are predictive of the long-term success of DBS treatments. We achieve an area under the receiver operating characteristic curve of 0.88, beating all baseline methods.
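
To illustrate how temporal difference learning can propagate a delayed outcome back to earlier time points, here is a minimal tabular TD(0) sketch. It is not the authors' joint state-estimation model: it assumes weekly observations have already been mapped to a small set of discrete states, and the state indices, episodes, and hyperparameters below are hypothetical.

```python
def td0_values(episodes, n_states, alpha=0.1, gamma=1.0, sweeps=200):
    """Estimate state values with tabular TD(0).

    Each episode is (states, outcome): a sequence of discrete state indices
    and a single reward that is only observed after the final state.
    """
    values = [0.0] * n_states
    for _ in range(sweeps):
        for states, outcome in episodes:
            for t, state in enumerate(states):
                is_last = t == len(states) - 1
                # The reward is delayed: zero until the end of the episode.
                reward = outcome if is_last else 0.0
                next_value = 0.0 if is_last else values[states[t + 1]]
                td_error = reward + gamma * next_value - values[state]
                values[state] += alpha * td_error
    return values


# Hypothetical trajectories: both start in state 0, then diverge.
EPISODES = [
    ([0, 1, 2], 1.0),  # eventual responder
    ([0, 3, 4], 0.0),  # eventual non-responder
]
VALUES = td0_values(EPISODES, n_states=5)
print([round(v, 2) for v in VALUES])
```

States visited only by the responder end up with values near 1, states visited only by the non-responder stay at 0, and the shared starting state settles in between, which is the sense in which learned state values can be predictive before the outcome is known.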

6

Artificial Intelligence for Enhancing Clinical Medicine

From genome to phenome: Predicting multiple cancer phenotypes based on somatic genomic alterations via the genomic impact transformer

Yifeng Tao1, Chunhui Cai2, William W. Cohen1, Xinghua Lu2

1Carnegie Mellon University, 2University of Pittsburgh

Yifeng Tao

Cancers are mainly caused by somatic genomic alterations (SGAs) that perturb cellular signaling systems and eventually activate oncogenic processes. Therefore, understanding the functional impact of SGAs is a fundamental task in cancer biology and precision oncology. Here, we present a deep neural network model with encoder-decoder architecture, referred to as genomic impact transformer (GIT), to infer the functional impact of SGAs on cellular signaling systems through modeling the statistical relationships between SGA events and differentially expressed genes (DEGs) in tumors. The model utilizes a multi-head self-attention mechanism to identify SGAs that likely cause DEGs, in other words, to differentiate potential driver SGAs from passenger ones in a tumor. The GIT model learns a vector (gene embedding) as an abstract representation of functional impact for each SGA-affected gene. Given SGAs of a tumor, the model can instantiate the states of the hidden layer, providing an abstract representation (tumor embedding) reflecting characteristics of perturbed molecular/cellular processes in the tumor, which in turn can be used to predict multiple phenotypes. We apply the GIT model to 4,468 tumors profiled by The Cancer Genome Atlas (TCGA) project. The attention mechanism enables the model to better capture the statistical relationship between SGAs and DEGs than conventional methods, and distinguishes cancer drivers from passengers. The learned gene embeddings capture the functional similarity of SGAs perturbing common pathways. The tumor embeddings are shown to be useful for tumor status representation, and phenotype prediction including patient survival time and drug response of cancer cell lines.

7

Artificial Intelligence for Enhancing Clinical Medicine

Automated phenotyping of patients with non-alcoholic fatty liver disease reveals clinically relevant disease subtypes

Maxence Vandromme, Tomi Jun, Ponni Perumalswami, Joel T. Dudley, Andrea Branch, Li Li

Icahn School of Medicine at Mount Sinai, Sema4

Maxence Vandromme

Non-alcoholic fatty liver disease (NAFLD) is a complex heterogeneous disease which affects more than 20% of the population worldwide. Some subtypes of NAFLD have been clinically identified using hypothesis-driven methods. In this study, we used data mining techniques to search for subtypes in an unbiased fashion. Using electronic signatures of the disease, we identified a cohort of 13,290 patients with NAFLD from a hospital database. We gathered clinical data from multiple sources and applied unsupervised clustering to identify five subtypes among this cohort. Descriptive statistics and survival analysis showed that the subtypes were clinically distinct and were associated with different rates of death, cirrhosis, hepatocellular carcinoma, chronic kidney disease, cardiovascular disease, and myocardial infarction. Novel disease subtypes identified in this manner could be used to risk-stratify patients and guide management.

8

Artificial Intelligence for Enhancing Clinical Medicine

Monitoring ICU Mortality Risk with A Long Short-Term Memory Recurrent Neural Network

Ke Yu1, Mingda Zhang2, Tianyi Cui2, Milos Hauskrecht2

1Intelligent Systems Program, University of Pittsburgh; 2Department of Computer Science, University of Pittsburgh

Ke Yu

In intensive care units (ICU), mortality prediction is a critical factor not only for effective medical intervention but also for allocation of clinical resources. Structured electronic health records (EHR) contain valuable information for assessing mortality risk in ICU patients, but current mortality prediction models usually require laborious human-engineered features. Furthermore, substantial missing data in EHR is a common problem for both the construction and implementation of a prediction model. Inspired by language-related models, we design a new framework for dynamic monitoring of patients’ mortality risk. Our framework uses the bag-of-words representation for all relevant medical events based on most recent history as inputs. By design, it is robust to missing data in EHR and can be easily implemented as an instant scoring system to monitor the medical development of all ICU patients. Specifically, our model uses latent semantic analysis (LSA) to encode the patients’ states into low-dimensional embeddings, which are further fed to long short-term memory networks for mortality risk prediction. Our results show that the deep learning based framework performs better than the existing severity scoring system, SAPS-II. We observe that bidirectional long short-term memory demonstrates superior performance, probably due to the successful capture of both forward and backward temporal dependencies.

9

INTRINSICALLY DISORDERED PROTEINS (IDPS) AND THEIR FUNCTIONS

PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS

10

Intrinsically Disordered Proteins (IDPs) and Their Functions

Disordered Function Conjunction: On the in-silico function annotation of intrinsically disordered regions

Sina Ghadermarzi, Akila Katuwawala, Christopher J. Oldfield, Amita Barik, Lukasz Kurgan

Department of Computer Science, Virginia Commonwealth University, 401 West Main Street, Richmond, VA 23284, USA

Lukasz Kurgan

Intrinsically disordered regions (IDRs) lack a stable structure, yet perform biological functions. The functions of IDRs include mediating interactions with other molecules, including proteins, DNA, or RNA, and entropic functions, including domain linkers. Computational predictors provide residue-level indications of function for disordered proteins, which contrasts with the need to functionally annotate the thousands of experimentally and computationally discovered IDRs. In this work, we investigate the feasibility of using residue-level prediction methods for region-level function predictions. For an initial examination of the multiple function region-level prediction problem, we constructed a dataset of (likely) single function IDRs in proteins that are dissimilar to the training datasets of the residue-level function predictors. We find that available residue-level prediction methods are only modestly useful in predicting multiple region-level functions. Classification is enhanced by simultaneous use of multiple residue-level function predictions and is further improved by inclusion of amino acids content extracted from the protein sequence. We conclude that multifunction prediction for IDRs is feasible and benefits from the results produced by current residue-level function predictors; however, it has to accommodate inaccuracy in functional annotations.

11

Intrinsically Disordered Proteins (IDPs) and Their Functions

De novo ensemble modeling suggests that AP2-binding to disordered regions can increase steric volume of Epsin but not Eps15

N. Suhas Jagannathan1, Christopher W. V. Hogue2, Lisa Tucker-Kellogg3

1Duke-NUS Medical School; 2National University of Singapore, 600 Epic Way Unit 345 San Jose CA 95134; 3Cancer & Stem Cell Biology and Centre for Computational Biology Duke-NUS Medical School

Lisa Tucker-Kellogg

Proteins with intrinsically disordered regions (IDRs) have large hydrodynamic radii, compared with globular proteins of equivalent weight. Recent experiments showed that IDRs with large radii can create steric pressure to drive membrane curvature during Clathrin-mediated endocytosis (CME). Epsin and Eps15 are two CME proteins with IDRs that contain multiple motifs for binding the adaptor protein AP2, but the impact of AP2-binding on these IDRs is unknown. Some IDRs acquire binding-induced function by forming a folded quaternary structure, but we hypothesize that the IDRs of Epsin and/or Eps15 acquire binding-induced function by increasing their steric volume. We explore this hypothesis in silico by generating conformational ensembles of the IDRs of Epsin (4 million structures) or Eps15 (3 million structures), then estimating the impact of AP2-binding on Radius of Gyration (RG). Results show that the ensemble of Epsin IDR conformations that accommodate AP2 binding has a right-shifted distribution of RG (larger radii) relative to the unbound Epsin ensemble. In contrast, the ensemble of Eps15 IDR conformations has comparable RG distribution between AP2-bound and unbound. We speculate that AP2 triggers the Epsin IDR to function through binding-induced-expansion, which could increase steric pressure and membrane bending during CME.
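
For readers unfamiliar with the quantity being compared, the radius of gyration of a conformation is the root-mean-square distance of its atoms from their centroid. The sketch below assumes equal weights for all atoms (a mass-weighted variant is also common) and uses two made-up four-point conformations; it does not reproduce the authors' ensemble generation.

```python
import math


def radius_of_gyration(coords):
    """Root-mean-square distance of 3D points from their centroid.

    Assumes all points carry equal weight.
    """
    n = len(coords)
    centroid = [sum(point[axis] for point in coords) / n for axis in range(3)]
    mean_squared_distance = sum(
        sum((point[axis] - centroid[axis]) ** 2 for axis in range(3))
        for point in coords
    ) / n
    return math.sqrt(mean_squared_distance)


# Hypothetical conformations: a compact square and an extended chain.
COMPACT = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0)]
EXTENDED = [(0, 0, 0), (2, 0, 0), (4, 0, 0), (6, 0, 0)]

print(round(radius_of_gyration(COMPACT), 3))   # 0.707
print(round(radius_of_gyration(EXTENDED), 3))  # 2.236
```

Computing this value for every structure in a bound and an unbound ensemble and comparing the two distributions gives the kind of right-shift the abstract reports for Epsin.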

12

Intrinsically Disordered Proteins (IDPs) and Their Functions

Modulation of p53 Transactivation Domain Conformations by Ligand Binding and Cancer-Associated Mutations

Xiaorong Liu, Jianhan Chen

University of Massachusetts Amherst

Jianhan Chen

Intrinsically disordered proteins (IDPs) are important functional proteins, and their deregulation is linked to numerous human diseases including cancers. Understanding how disease-associated mutations or drug molecules can perturb the sequence-disordered ensemble-function-disease relationship of IDPs remains challenging, because it requires detailed characterization of the heterogeneous structural ensembles of IDPs. In this work, we combine the latest atomistic force field a99SB-disp, enhanced sampling technique replica exchange with solute tempering, and GPU-accelerated molecular dynamics simulations to investigate how four cancer-associated mutations, K24N, N29K/N30D, D49Y, and W53G, and binding of an anti-cancer molecule, epigallocatechin gallate (EGCG), modulate the disordered ensemble of the transactivation domain (TAD) of tumor suppressor p53. Through extensive sampling, in excess of 1.0 μs per replica, well-converged structural ensembles of wild-type and mutant p53-TAD as well as WT p53-TAD in the presence of EGCG were generated. The results reveal that mutants could induce local structural changes and affect secondary structural properties. Interestingly, both EGCG binding and N29K/N30D could also induce long-range structural reorganizations and lead to more compact structures that could shield key binding sites of p53-TAD regulators. Further analysis reveals that the effects of EGCG binding are mainly achieved through nonspecific interactions. These observations are generally consistent with on-going NMR studies and binding assays. Our studies suggest that induced conformational collapse of IDPs may be a general mechanism for shielding functional sites, thus inhibiting recognition of their targets. The current study also demonstrates that atomistic simulations provide a viable approach for studying the sequence-disordered ensemble-function-disease relationships of IDPs and developing new drug design strategies targeting regulatory IDPs.

13

Intrinsically Disordered Proteins (IDPs) and Their Functions

Exploring Relationships between the Density of Charged Tracts within Disordered Regions and Phase Separation

Ramiz Somjee1,2, Diana M. Mitrea1, Richard W. Kriwacki1,3

1St. Jude Children's Research Hospital; 2Rhodes College, 3University of Tennessee Health Sciences Center

Ramiz Somjee

Biomolecular condensates form through a process termed phase separation and play diverse roles throughout the cell. Proteins that undergo phase separation often have disordered regions that can engage in weak, multivalent interactions; however, our understanding of the sequence grammar that defines which proteins phase separate is far from complete. Here, we show that proteins that display a high density of charged tracts within intrinsically disordered regions are likely to be constituents of electrostatically organized biomolecular condensates. We scored the human proteome using an algorithm termed ABTdensity that quantifies the density of charged tracts and observed that proteins with more charged tracts are enriched in particular Gene Ontology annotations and, based upon analysis of interaction networks, cluster into distinct biomolecular condensates. These results suggest that electrostatically-driven, multivalent interactions involving charged tracts within disordered regions serve to organize certain biomolecular condensates through phase separation.
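
The abstract does not define how ABTdensity scores a sequence, so the following is not that algorithm. It is a hypothetical illustration of the general idea of a charged-tract density: count maximal runs of charged residues (D, E, K, R) of at least an assumed minimum length and normalize per 100 residues. The minimum length, the normalization, and the example sequence are all assumptions.

```python
import re

CHARGED_RESIDUES = "DEKR"  # acidic (D, E) and basic (K, R) amino acids


def charged_tract_density(sequence, min_len=4):
    """Count runs of >= min_len charged residues per 100 residues.

    Hypothetical scoring scheme for illustration; not the ABTdensity algorithm.
    """
    if not sequence:
        return 0.0
    pattern = "[" + CHARGED_RESIDUES + "]{" + str(min_len) + ",}"
    tracts = re.findall(pattern, sequence.upper())
    return 100.0 * len(tracts) / len(sequence)


# Made-up 20-residue sequence with two charged tracts (DEKR and KKKKK).
print(charged_tract_density("AAAADEKRAAAAKKKKKAAA"))  # 10.0
```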

14

MUTATIONAL SIGNATURES

PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS

15

Mutational Signatures

PhySigs: Phylogenetic Inference of Mutational Signature Dynamics

Sarah Christensen1, Mark D. M. Leiserson2, Mohammed El-Kebir1

1University of Illinois at Urbana-Champaign, 2University of Maryland

Sarah Christensen

Distinct mutational processes shape the genomes of the clones comprising a tumor. These processes result in distinct mutational patterns, summarized by a small number of mutational signatures. Current analyses of clone-specific exposures to mutational signatures do not fully incorporate a tumor’s evolutionary context, either inferring identical exposures for all tumor clones, or inferring exposures for each clone independently. Here, we introduce the Tree-constrained Exposure problem to infer a small number of exposure shifts along the edges of a given tumor phylogeny. Our algorithm, PhySigs, solves this problem and includes model selection to identify the number of exposure shifts that best explain the data. We validate our approach on simulated data and identify exposure shifts in lung cancer data, including at least one shift with a matching subclonal driver mutation in the mismatch repair pathway. Moreover, we show that our approach enables the prioritization of alternative phylogenies inferred from the same sequencing data. PhySigs is publicly available at https://github.com/elkebir-group/PhySigs

16

Mutational Signatures

TrackSigFreq: subclonal reconstructions based on mutation signatures and allele frequencies

Caitlin F. Harrigan1,2,4, Yulia Rubanova1,2,4, Quaid Morris1,2,3,4,5,6, Alina Selega2,4

1Department of Computer Science, University of Toronto, Toronto, Canada; 2Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Canada; 3Department of Molecular Genetics, University of Toronto, Toronto, Canada; 4Vector Institute, Toronto, Canada; 5Ontario Institute for Cancer Research, Toronto, Canada; 6Memorial Sloan Kettering Cancer Centre, New York, USA (pending)

Cait Harrigan

Mutational signatures are patterns of mutation types, many of which are linked to known mutagenic processes. Signature activity represents the proportion of mutations a signature generates. In cancer, cells may gain advantageous phenotypes through mutation accumulation, causing rapid growth of that subpopulation within the tumour. The presence of many subclones can make cancers harder to treat and have other clinical implications. Reconstructing changes in signature activities can give insight into the evolution of cells within a tumour. Recently, we introduced a new method, TrackSig, to detect changes in signature activities across time from a single bulk tumour sample. By design, TrackSig is unable to identify mutation populations with different frequencies but little to no difference in signature activity. Here we present an extension of this method, TrackSigFreq, which enables trajectory reconstruction based on both observed density of mutation frequencies and changes in mutational signature activities. TrackSigFreq preserves the advantages of TrackSig, namely optimal and rapid mutation clustering through segmentation, while extending it so that it can identify distinct mutation populations that share similar signature activities.

17

Mutational Signatures

DNA Repair Footprint Uncovers Contribution of DNA Repair Mechanism to Mutational Signatures

Damian Wojtowicz1, Mark D. M. Leiserson2, Roded Sharan3, Teresa M. Przytycka1

1NIH, 2University of Maryland, 3Tel Aviv University

Teresa Przytycka

Cancer genomes accumulate a large number of somatic mutations resulting from imperfection of DNA processing during normal cell cycle as well as from carcinogenic exposures or cancer related aberrations of DNA maintenance machinery. These processes often lead to distinctive patterns of mutations, called mutational signatures. Several computational methods have been developed to uncover such signatures from catalogs of somatic mutations. However, cancer mutational signatures are the end-effect of several interplaying factors including carcinogenic exposures and potential deficiencies of the DNA repair mechanism. To fully understand the nature of each signature, it is important to disambiguate the atomic components that contribute to the final signature. Here, we introduce a new descriptor of mutational signatures, DNA Repair FootPrint (RePrint), and show that it can capture common properties of deficiencies in repair mechanisms contributing to diverse signatures. We validate the method with published mutational signatures from cell lines targeted with CRISPR-Cas9-based knockouts of DNA repair genes.

18

PATTERN RECOGNITION IN BIOMEDICAL DATA: CHALLENGES IN PUTTING BIG DATA TO WORK

PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS

19

Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work

Clinical Concept Embeddings Learned from Massive Sources of Multimodal Medical Data

Andrew L. Beam1, Benjamin Kompa2, Allen Schmaltz1, Inbar Fried3, Griffin Weber2, Nathan Palmer2, Xu Shi1, Tianxi Cai1, Isaac S. Kohane3

1Harvard T.H. Chan School of Public Health, 2Harvard Medical School, 3University of North Carolina School of Medicine

Benjamin Kompa

Word embeddings are a popular approach to unsupervised learning of word relationships that is widely used in natural language processing. In this article, we present a new set of embeddings for medical concepts learned using an extremely large collection of multimodal medical data. Leaning on recent theoretical insights, we demonstrate how an insurance claims database of 60 million members, a collection of 20 million clinical notes, and 1.7 million full text biomedical journal articles can be combined to embed concepts into a common space, resulting in the largest ever set of embeddings for 108,477 medical concepts. To evaluate our approach, we present a new benchmark methodology based on statistical power specifically designed to test embeddings of medical concepts. Our approach, called cui2vec, attains state-of-the-art performance relative to previous methods in most instances. Finally, we provide a downloadable set of pre-trained embeddings for other researchers to use, as well as an online tool for interactive exploration of the cui2vec embeddings.

20

Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work

Assessment of Imputation Methods for Missing Gene Expression Data in Meta-Analysis of Distinct Cohorts of Tuberculosis Patients

Carly A. Bobak, Lauren McDonnell, Matthew D. Nemesure, Justin Lin, Jane E. Hill

Dartmouth College

Carly Bobak

The growth of publicly available repositories, such as the Gene Expression Omnibus, has allowed researchers to conduct meta-analysis of gene expression data across distinct cohorts. In this work, we assess eight imputation methods for their ability to impute gene expression data when values are missing across an entire cohort of Tuberculosis (TB) patients. We investigate how varying proportions of missing data (across 10%, 20%, and 30% of patient samples) influence the imputation results, and test for significantly differentially expressed genes and enriched pathways in patients with active TB. Our results indicate that truncating to common genes observed across cohorts, which is the current method used by researchers, results in the exclusion of important biology and suggest that LASSO and LLS imputation methodologies can reasonably impute genes across cohorts when total missingness rates are below 20%.

21

Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work

Towards identifying drug side effects from social media using active learning and crowdsourcing

Sophie Burkhardt, Julia Siekiera, Josua Glodde, Miguel A. Andrade-Navarro, Stefan Kramer

University of Mainz

Sophie Burkhardt

Motivation: Social media is a largely untapped source of information on side effects of drugs. Twitter in particular is widely used to report on everyday events and personal ailments. However, labeling this noisy data is a difficult problem because labeled training data is sparse and automatic labeling is error-prone. Crowdsourcing can help in such a scenario to obtain more reliable labels, but is expensive in comparison because workers have to be paid. To remedy this, semi-supervised active learning may reduce the number of labeled data needed and focus the manual labeling process on important information.

Results: We extracted data from Twitter using the public API. We subsequently use Amazon Mechanical Turk in combination with a state-of-the-art semi-supervised active learning method to label tweets with their associated drugs and side effects in two stages. Our results show that our method is an effective way of discovering side effects in tweets with an improvement from 53% F-measure to 67% F-measure as compared to a one stage workflow. Additionally, we show the effectiveness of the active learning scheme in reducing the labeling cost in comparison to a non-active baseline.

22

Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work

Microvascular Dynamics from 4D Microscopy Using Temporal Segmentation

Shir Gur, Lior Wolf, Lior Golgher, Pablo Blinder

Tel Aviv University

Lior Wolf

Recently developed methods for rapid continuous volumetric two-photon microscopy facilitate the observation of neuronal activity in hundreds of individual neurons and changes in blood flow in adjacent blood vessels across a large volume of living brain at unprecedented spatio-temporal resolution. However, the high imaging rate necessitates fully automated image analysis, whereas tissue turbidity and photo-toxicity limitations lead to extremely sparse and noisy imagery. In this work, we extend a recently proposed deep learning volumetric blood vessel segmentation network, such that it supports temporal analysis. With this technology, we are able to track changes in cerebral blood volume over time and identify spontaneous arterial dilations that propagate towards the pial surface. This new capability is a promising step towards characterizing the hemodynamic response function upon which functional magnetic resonance imaging (fMRI) is based.

23

Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work

Using Transcriptional Signatures to Find Cancer Drivers with LURE

David Haan, Ruikang Tao, Verena Friedl, Ioannis N. Anastopoulos, Christopher K. Wong, Alana S. Weinstein, Joshua M. Stuart

Dept. of Biomolecular Engineering and UC Santa Cruz Genomics Institute, University Of California Santa Cruz, Santa Cruz, CA 95064 USA

David Haan

Cancer genome projects have produced multidimensional datasets on thousands of samples. Yet, depending on the tumor type, 5-50% of samples have no known driving event. We introduce a semi-supervised method called Learning UnRealized Events (LURE) that uses a progressive label learning framework and minimum spanning analysis to predict cancer drivers based on their altered samples sharing a gene expression signature with the samples of a known event. We demonstrate the utility of the method on the TCGA Pan-Cancer Atlas dataset for which it produced a high-confidence result relating 59 new connections to 18 known mutation events including alterations in the same gene, family, and pathway. We give examples of predicted drivers involved in TP53, telomere maintenance, and MAPK/RTK signaling pathways. LURE identifies connections between genes with no known prior relationship, some of which may offer clues for targeting specific forms of cancer. Code and Supplemental Material are available on the LURE website: https://sysbiowiki.soe.ucsc.edu/lure.

24

Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work

PAGE-Net: Interpretable and Integrative Deep Learning for Survival Analysis Using Histopathological Images and Genomic Data

Jie Hao1, Sai Chandra Kosaraju2, Nelson Zange Tsaku3, Dae Hyun Song4, Mingon Kang2

1University of Pennsylvania, 2University of Nevada Las Vegas, 3Kennesaw State University, 4Gyeongsang National University Changwon Hospital

Jie Hao

The integration of multi-modal data, such as histopathological images and genomic data, is essential for understanding cancer heterogeneity and complexity for personalized treatments, as well as for enhancing survival predictions in cancer study. Histopathology, as a clinical gold-standard tool for diagnosis and prognosis in cancers, allows clinicians to make precise decisions on therapies, whereas high-throughput genomic data have been investigated to dissect the genetic mechanisms of cancers. We propose a biologically interpretable deep learning model (PAGE-Net) that integrates histopathological images and genomic data, not only to improve survival prediction, but also to identify genetic and histopathological patterns that cause different survival rates in patients. PAGE-Net consists of pathology/genome/demography-specific layers, each of which provides comprehensive biological interpretation. In particular, we propose a novel patch-wise texture-based convolutional neural network, with a patch aggregation strategy, to extract global survival-discriminative features, without manual annotation for the pathology-specific layers. We adapted the pathway-based sparse deep neural network, named Cox-PASNet, for the genome-specific layers. The proposed deep learning model was assessed with the histopathological images and the gene expression data of Glioblastoma Multiforme (GBM) at The Cancer Genome Atlas (TCGA) and The Cancer Imaging Archive (TCIA). PAGE-Net achieved a C-index of 0.702, which is higher than the results achieved with only histopathological images (0.509) and Cox-PASNet (0.640). More importantly, PAGE-Net can simultaneously identify histopathological and genomic prognostic factors associated with patients’ survivals. The source code of PAGE-Net is publicly available at https://github.com/DataX-JieHao/PAGE-Net
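
Since the results above are reported as a C-index, a short sketch of how a concordance index can be computed may help in reading them. This follows the usual Harrell-style definition (the abstract does not state which variant was used): among comparable pairs, count how often the subject with the earlier observed event also has the higher predicted risk, with ties in risk counted as one half. The example data are made up.

```python
def concordance_index(times, events, risks):
    """Fraction of comparable pairs that the risk scores order correctly.

    A pair (i, j) is comparable when subject i had an observed event
    (events[i] is truthy) strictly before subject j's recorded time.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical cohort: survival times, event flags (0 = censored), risk scores.
TIMES = [2, 4, 6, 8]
EVENTS = [1, 1, 0, 1]
RISKS = [0.9, 0.7, 0.8, 0.1]

print(concordance_index(TIMES, EVENTS, RISKS))  # 0.8
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which puts the reported 0.509, 0.640, and 0.702 in context.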

25

Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work

Machine learning algorithms for simultaneous supervised detection of peaks in multiple samples and cell types

Toby Dylan Hocking1, Guillaume Bourque2

1Northern Arizona University, 2McGill University

Toby Hocking

Joint peak detection is a central problem when comparing samples in epigenomic data analysis, but current algorithms for this task are unsupervised and limited to at most two sample types. We propose PeakSegPipeline, a new genome-wide multi-sample peak calling pipeline for epigenomic data sets. It performs peak detection using a constrained maximum likelihood segmentation model with essentially only one free parameter that needs to be tuned: the number of peaks. To select the number of peaks, we propose to learn a penalty function based on user-provided labels that indicate genomic regions with or without peaks in specific samples. In comparisons with state-of-the-art peak detection algorithms, PeakSegPipeline achieves similar or better accuracy, and a more interpretable model with overlapping peaks that occur in exactly the same positions across all samples. Our novel approach is able to learn that predicted peak sizes vary by experiment type.

26

Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work

Graph-based information diffusion method for prioritizing functionally related genes in protein-protein interaction networks

Minh Pham, Olivier Lichtarge

Baylor College of Medicine

Minh Pham

Shortest path length methods are routinely used to validate whether genes of interest are functionally related to each other based on biological network information. However, the methods are computationally intensive, impeding extensive utilization of network information. In addition, the non-weighted shortest path length approach, which is more frequently used, often treats all network connections equally without taking into account the confidence levels of the associations. On the other hand, the graph-based information diffusion method, which employs both the presence and confidence weights of network edges, can efficiently explore large networks and has previously detected meaningful biological patterns. Therefore, in this study, we hypothesized that the graph-based information diffusion method could prioritize genes with relevant functions more efficiently and accurately than the shortest path length approaches. We demonstrated that the graph-based information diffusion method substantially differentiated not only genes participating in the same biological pathways (p << 0.0001) but also genes associated with specific human drug-induced clinical symptoms (p << 0.0001) from random. Furthermore, the diffusion method prioritized these functionally related genes faster and more accurately than the shortest path length approaches (pathways: p = 2.7e-28, clinical symptoms: p = 0.032). These data show the graph-based information diffusion method can be routinely used for robust prioritization of functionally related genes, facilitating efficient network validation and hypothesis generation, especially for human phenotype-specific genes.
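The abstract above does not spell out its diffusion algorithm, so the sketch below substitutes a generic random walk with restart to show how edge confidence weights can shape a gene ranking. The function `diffuse`, the restart probability, the iteration count, and the four-node toy network are all hypothetical choices made for illustration.

```python
def diffuse(edges, seeds, restart=0.5, iterations=50):
    """Random walk with restart on an undirected, weighted graph.

    edges: dict mapping (u, v) -> confidence weight of the interaction
    seeds: set of nodes (genes of interest) where the walk restarts
    Returns a dict mapping each node to its diffusion score.
    """
    # Build a symmetric weighted adjacency map.
    neighbors = {}
    for (u, v), w in edges.items():
        neighbors.setdefault(u, {})[v] = w
        neighbors.setdefault(v, {})[u] = w
    nodes = sorted(neighbors)
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    score = dict(start)
    for _ in range(iterations):
        nxt = {n: restart * start[n] for n in nodes}
        for u in nodes:
            total = sum(neighbors[u].values())
            for v, w in neighbors[u].items():
                # Each node passes on its score in proportion to edge weight.
                nxt[v] += (1.0 - restart) * score[u] * w / total
        score = nxt
    return score


# Hypothetical toy network: A-B is a high-confidence edge, A-C a weak one.
edges = {("A", "B"): 0.9, ("A", "C"): 0.1, ("B", "D"): 0.5, ("C", "D"): 0.5}
scores = diffuse(edges, seeds={"A"})
ranking = sorted((n for n in scores if n != "A"), key=scores.get, reverse=True)
print(ranking)
```

In an unweighted shortest-path view, B and C are both one step from the seed A and cannot be told apart; here the confidence weights place B well ahead of C, which is the kind of distinction the abstract attributes to diffusion methods.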

27

Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work

A Literature-Based Knowledge Graph Embedding Method for Identifying Drug Repurposing Opportunities in Rare Diseases

Daniel N. Sosa, Alexander Derry, Margaret Guo, Eric Wei, Connor Brinton, Russ B. Altman

Stanford University

Daniel Sosa

Millions of Americans are affected by rare diseases, many of which have poor survival rates. However, the small market size of individual rare diseases, combined with the time and capital requirements of pharmaceutical R&D, has hindered the development of new drugs for these cases. A promising alternative is drug repurposing, whereby existing FDA-approved drugs might be used to treat diseases different from their original indications. In order to generate drug repurposing hypotheses in a systematic and comprehensive fashion, it is essential to integrate information from across the literature of pharmacology, genetics, and pathology. To this end, we leverage a newly developed knowledge graph, the Global Network of Biomedical Relationships (GNBR). GNBR is a large, heterogeneous knowledge graph comprising drug, disease, and gene (or protein) entities linked by a small set of semantic “themes” derived from the abstracts of biomedical literature. We apply a knowledge graph embedding method that explicitly models the uncertainty associated with literature-derived relationships and uses link prediction to generate drug repurposing hypotheses. This approach achieves high performance on a gold-standard test set of known drug indications (AUROC = 0.89) and is capable of generating novel repurposing hypotheses, which we independently validate using external literature sources and protein interaction networks. Finally, we demonstrate the ability of our model to produce explanations of its predictions.

28

Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work

Two-stage ML Classifier for Identifying Host Protein Targets of the Dengue Protease

Jacob T. Stanley, Alison R. Gilchrist, Alex C. Stabell, Mary A. Allen, Sara L. Sawyer, Robin D. Dowell

Department of Molecular, Cellular and Developmental Biology; BioFrontiers Institute; University of Colorado Boulder (all authors have the same affiliation)

Jacob Stanley

Flaviviruses such as dengue encode a protease that is essential for viral replication. The protease functions by cleaving well-conserved positions in the viral polyprotein. In addition to the viral polyprotein, the dengue protease cleaves at least one host protein involved in immune response. This raises the question, what other host proteins are targeted and cleaved? Here we present a new computational method for identifying putative host protein targets of the dengue virus protease. Our method relies on biochemical and secondary structure features at the known cleavage sites in the viral polyprotein in a two-stage classification process to identify putative cleavage targets. The accuracy of our predictions scaled inversely with evolutionary distance when we applied it to the known cleavage sites of several other flaviviruses, a good indication of the validity of our predictions. Ultimately, our classifier identified 257 human protein sites possessing both a similar target motif and accessible local structure. These proteins are promising candidates for further investigation. As the number of viral sequences expands, our method could be adopted to predict host targets of other flaviviruses.

29

Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work

Enhancing Model Interpretability and Accuracy for Disease Progression Prediction via Phenotype-Based Patient Similarity Learning

Yue Wang1, Tong Wu1,2, Yunlong Wang1, Gao Wang3

1IQVIA Inc., 2University of Minnesota, 3University of Chicago

Yue Wang

Models have been proposed to extract temporal patterns from longitudinal electronic health records (EHR) for clinical predictive models. However, the common relations among patients (e.g., receiving the same medical treatments) were rarely considered. In this paper, we propose to learn patient similarity features as phenotypes from the aggregated patient-medical service matrix using non-negative matrix factorization. On real-world medical claim data, we show that the learned phenotypes are coherent within each group, and also explanatory and indicative of targeted diseases. We conducted experiments to predict the diagnoses for Chronic Lymphocytic Leukemia (CLL) patients. Results show that the phenotype-based similarity features can improve prediction over multiple baselines, including logistic regression, random forest, convolutional neural network, and more.

30

PRECISION MEDICINE: ADDRESSING THE CHALLENGES OF SHARING, ANALYSIS, AND PRIVACY AT SCALE

PROCEEDINGS PAPERS WITH ORAL PRESENTATIONS

31

Precision medicine: addressing the challenges of sharing, analysis, and privacy at scale

Integrated Cancer Subtyping using Heterogeneous Genome-Scale Molecular Datasets

Suzan Arslanturk1, Sorin Draghici1, Tin Nguyen2

1Wayne State University, 2University of Nevada

Sorin Draghici

Vast repositories of heterogeneous data from existing sources present unique opportunities. Taken individually, each of the datasets offers solutions to important domain and source-specific questions. Collectively, they represent complementary views of related data entities with an aggregate information value often well exceeding the sum of its parts. Integration of heterogeneous data is therefore paramount to i) obtain a more unified picture and comprehensive view of the relations, ii) achieve more robust results, iii) improve the accuracy and integrity, and iv) illuminate the complex interactions among data features. In this paper, we have proposed a data integration methodology to identify subtypes of cancer using multiple data types (mRNA, methylation, microRNA and somatic variants) and different data scales that come from different platforms (microarray, sequencing, etc.). The Cancer Genome Atlas (TCGA) dataset is used to build the data integration and cancer subtyping framework. The proposed data integration and disease subtyping approach accurately identifies novel subgroups of patients with significantly different survival profiles. With current availability of vast genomics, and variant data for cancer, the proposed data integration system will better differentiate cancer and patient subtypes for risk and outcome prediction and targeted treatment planning without additional cost and precious lost time.

32

Precision medicine: addressing the challenges of sharing, analysis, and privacy at scale

Assessment of coverage for endogenous metabolites and exogenous chemical compounds using an untargeted metabolomics platform

Sek Won Kong1, Carles Hernandez-Ferrer2

1Computational Health Informatics Program, Boston Children’s Hospital, 300 Longwood Avenue Boston, MA 02115, USA; 2Department of Pediatrics, Harvard Medical School, Boston, MA 02115, USA

Sek Won Kong

Physiological status and pathological changes in an individual can be captured by a metabolic state that reflects the influence of both genetic variants and environmental factors such as diet, lifestyle and gut microbiome. The totality of environmental exposure throughout a lifetime (i.e., the exposome) is difficult to measure with current technologies. However, targeted measurement of exogenous chemicals and untargeted profiling of endogenous metabolites have been widely used to discover biomarkers of pathophysiologic changes and to understand functional impacts of genetic variants. To investigate the coverage of chemical space and interindividual variation related to demographic and pathological conditions, we profiled 169 plasma samples using an untargeted metabolomics platform. On average, 1,009 metabolites were quantified in each individual (range 906–1,038) out of 1,244 total chemical compounds detected in our cohort. Of note, age was positively correlated with the total number of detected metabolites in both males and females. Using the robust Qn estimator, we found metabolite outliers in each sample (mean 22, range from 7 to 86). A total of 50 metabolites were outliers in a patient with phenylketonuria, including the ones known for the phenylalanine pathway, suggesting multiple metabolic pathways perturbed in this patient. The largest number of outliers (N=86) was found in a 5-year-old boy with alpha-1-antitrypsin deficiency who was waiting for liver transplantation due to cirrhosis. Xenobiotics including drugs, diets and environmental chemicals were significantly correlated with diverse endogenous metabolites, and the use of antibiotics significantly changed gut microbial products detected in host circulation. Several challenges such as annotation of features, reference range and variance for each feature per age group and gender, and population scale reference datasets need to be addressed; however, untargeted metabolomics could be immediately deployed as a biomarker discovery platform and to evaluate the impact of genomic variants and exposures on metabolic pathways for some diseases.

33

Precision medicine: addressing the challenges of sharing, analysis, and privacy at scale

Coverage profile correction of shallow-depth circulating cell-free DNA sequencing via multi-distance learning

Nicholas B. Larson, Melissa C. Larson, Jie Na, Carlos P. Sosa, Chen Wang, Jean-Pierre Kocher, Ross Rowsey

Mayo Clinic College of Medicine and Sciences

Nicholas Larson

Shallow-depth whole-genome sequencing (WGS) of circulating cell-free DNA (ccfDNA) is a popular approach for non-invasive genomic screening assays, including liquid biopsy for early detection of invasive tumors as well as non-invasive prenatal screening (NIPS) for common fetal trisomies. In contrast to nuclear DNA WGS, ccfDNA WGS exhibits extensive inter- and intra-sample coverage variability that is not fully explained by typical sources of variation in WGS, such as GC content. This variability may inflate false positive and false negative screening rates of copy-number alterations and aneuploidy, particularly if these features are present at a relatively low proportion of total sequenced content. Herein, we propose an empirically-driven coverage correction strategy that leverages prior annotation information in a multi-distance learning context to improve within-sample coverage profile correction. Specifically, we train a weighted k-nearest neighbors-style method on non-pregnant female donor ccfDNA WGS samples, and apply it to NIPS samples to evaluate coverage profile variability reduction. We additionally characterize improvement in the discrimination of positive fetal trisomy cases relative to normal controls, and compare our results against a more traditional regression-based approach to profile coverage correction based on GC content and mappability. Under cross-validation, performance measures indicated benefit to combining the two feature sets relative to either in isolation. We also observed substantial improvement in coverage profile variability reduction in leave-out clinical NIPS samples, with variability reduced by 26.5-53.5% relative to the standard regression-based method as quantified by median absolute deviation. Finally, we observed improved discrimination for screening positive trisomy cases, reducing ccfDNA WGS coverage variability while additionally improving NIPS trisomy screening assay performance. Overall, our results indicate that machine learning approaches can substantially improve ccfDNA WGS coverage profile correction and downstream analyses.
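The variability reduction in the abstract above is quantified by median absolute deviation (MAD). The short sketch below shows how such a percentage could be computed; it uses the unscaled MAD, and the normalized coverage values are invented for illustration rather than taken from the study.

```python
from statistics import median


def median_absolute_deviation(values):
    """Unscaled MAD: the median of absolute deviations from the median."""
    center = median(values)
    return median([abs(v - center) for v in values])


def percent_reduction(before, after):
    """Percentage drop in MAD from one coverage profile to another."""
    mad_before = median_absolute_deviation(before)
    mad_after = median_absolute_deviation(after)
    return 100.0 * (mad_before - mad_after) / mad_before


# Hypothetical normalized bin coverages for one sample under two
# correction methods (illustrative numbers only).
baseline_corrected = [0.8, 1.0, 1.3, 0.7, 1.2]
alternative_corrected = [0.9, 1.0, 1.1, 0.95, 1.05]
reduction = percent_reduction(baseline_corrected, alternative_corrected)
print(f"MAD reduced by {reduction:.1f}%")
```

Because the MAD depends on medians rather than means, a few extreme bins have little influence on it, which makes it a natural summary for noisy coverage profiles.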

34

Precision medicine: addressing the challenges of sharing, analysis, and privacy at scale

PGxMine: Text mining for curation of PharmGKB

Jake Lever1, Julia M. Barbarino2, Li Gong2, Rachel Huddart2, Katrin Sangkuhl2, Ryan Whaley2, Michelle Whirl-Carrillo2, Mark Woon2, Teri E. Klein2,3, Russ B. Altman1,2,3

1Department of Bioengineering, Stanford University, Stanford, CA, 94305; 2Department of Biomedical Data Science, Stanford University, Stanford, CA, 94305; 3Department of Medicine, Stanford University, Stanford, CA, 94305

Jake Lever

Precision medicine tailors treatment to individuals' personal data, including differences in their genome. The Pharmacogenomics Knowledgebase (PharmGKB) provides highly curated information on the effect of genetic variation on drug response and side effects for a wide range of drugs. PharmGKB’s scientific curators triage, review and annotate a large number of papers each year but the task is challenging. We present the PGxMine resource, a text-mined resource of pharmacogenomic associations from all accessible published literature to assist in the curation of PharmGKB. We developed a supervised machine learning pipeline to extract associations between a variant (DNA and protein changes, star alleles and dbSNP identifiers) and a chemical. PGxMine covers 452 chemicals and 2,426 variants and contains 19,930 mentions of pharmacogenomic associations across 7,170 papers. An evaluation by PharmGKB curators found that 57 of the top 100 associations not found in PharmGKB led to 83 curatable papers and a further 24 associations would likely lead to curatable papers through citations. The results can be viewed at https://pgxmine.pharmgkb.org/ and code can be downloaded at https://github.com/jakelever/pgxmine.

35

Precision medicine: addressing the challenges of sharing, analysis, and privacy at scale

The power of dynamic social networks to predict individuals' mental health

Shikang Liu1, David Hachen1, Omar Lizardo2, Christian Poellabauer1, Aaron Striegel1, Tijana Milenkovic1

1University of Notre Dame, 2University of California Los Angeles

Shikang Liu

Precision medicine has received attention both in and outside the clinic. We focus on the latter, by exploiting the relationship between individuals' social interactions and their mental health to predict one's likelihood of being depressed or anxious from rich dynamic social network data. Existing studies differ from our work in at least one aspect: they do not model social interaction data as a network; they do so but analyze static network data; they examine "correlation" between social networks and health but without making any predictions; or they study other individual traits but not mental health. In a comprehensive evaluation, we show that our predictive model that uses dynamic social network data is superior to its static network as well as non-network equivalents when run on the same data.

36

Precision medicine: addressing the challenges of sharing, analysis, and privacy at scale

Implementing a Cloud Based Method for Protected Clinical Trial Data Sharing

Gaurav Luthria, Qingbo Wang

Harvard University

Gaurav Luthria

Clinical trials generate a large amount of data that have been underutilized due to obstacles that prevent data sharing including risking patient privacy, data misrepresentation, and invalid secondary analyses. In order to address these obstacles, we developed a novel data sharing method which ensures patient privacy while also protecting the interests of clinical trial investigators. Our flexible and robust approach involves two components: (1) an advanced cloud-based querying language that allows users to test hypotheses without direct access to the real clinical trial data and (2) corresponding synthetic data for the query of interest that allows for exploratory research and model development. Both components can be modified by the clinical trial investigator depending on factors such as the type of trial or number of patients enrolled. To test the effectiveness of our system, we first implement a simple and robust permutation based synthetic data generator. We then use the synthetic data generator coupled with our querying language to identify significant relationships among variables in a realistic clinical trial dataset.
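The abstract above mentions a permutation-based synthetic data generator without describing it. One plausible reading, offered here purely as an assumption, is to shuffle each variable independently across patients. The function name, the seed parameter, and the toy records below are hypothetical.

```python
import random


def permute_columns(rows, seed=0):
    """Build synthetic records by shuffling every column independently.

    rows: list of equal-length tuples, one tuple per patient
    seed: fixed seed so the example is reproducible
    """
    rng = random.Random(seed)
    shuffled = []
    for column in zip(*rows):      # transpose rows into columns
        values = list(column)
        rng.shuffle(values)        # break the link to the original patient
        shuffled.append(values)
    return [tuple(record) for record in zip(*shuffled)]


# Hypothetical trial records: (age, treatment arm, biomarker level).
rows = [(34, "A", 1.2), (51, "B", 0.7), (47, "A", 2.3), (62, "B", 1.9)]
synthetic = permute_columns(rows)
for record in synthetic:
    print(record)
```

Under this reading, each variable keeps its exact marginal distribution while associations between variables are scrambled. That would make the synthetic table useful for developing and debugging queries, with real relationships tested only through queries run against the protected data.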

37

Precision medicine: addressing the challenges of sharing, analysis, and privacy at scale

Pathway and network embedding methods for prioritizing psychiatric drugs

Yash Pershad1, Margaret Guo2, Russ B. Altman3

1Stanford University Department of Bioengineering, 2Stanford University Biomedical Informatics Program, 3Stanford University Departments of Bioengineering, Genetics, & Medicine

Yash Pershad

One in five Americans experience mental illness, and roughly 75% of psychiatric prescriptions do not successfully treat the patient’s condition. Extensive evidence implicates genetic factors and signaling disruption in the pathophysiology of these diseases. Changes in transcription often underlie this molecular pathway dysregulation; individual patient transcriptional data can improve the efficacy of diagnosis and treatment. Recent large-scale genomic studies have uncovered shared genetic modules across multiple psychiatric disorders, providing an opportunity for an integrated multi-disease approach for diagnosis. Moreover, network-based models informed by gene expression can represent pathological biological mechanisms and suggest new genes for diagnosis and treatment. Here, we use patient gene expression data from multiple studies to classify psychiatric diseases, integrate knowledge from expert-curated databases and publicly available experimental data to create augmented disease-specific gene sets, and use these to recommend disease-relevant drugs. From Gene Expression Omnibus, we extract expression data from 145 cases of schizophrenia, 82 cases of bipolar disorder, 190 cases of major depressive disorder, and 307 shared controls. We use pathway-based approaches to predict psychiatric disease diagnosis with a random forest model (78% accuracy) and derive important features to augment available drug and disease signatures. Using protein-protein-interaction networks and embedding-based methods, we build a pipeline to prioritize treatments for psychiatric diseases that achieves a 3.4-fold improvement over a background model. Thus, we demonstrate that gene-expression-derived pathway features can diagnose psychiatric diseases and that molecular insights derived from this classification task can inform treatment prioritization for psychiatric diseases.

38

Precision medicine: addressing the challenges of sharing, analysis, and privacy at scale

Robust-ODAL: Learning from heterogeneous health systems without sharing patient-level data

Jiayi Tong1, Rui Duan1, Ruowang Li1, Martijn J. Scheuemie2, Jason H. Moore1, Yong Chen1

1University of Pennsylvania, 2Janssen Research and Development LLC

Jiayi Tong

Electronic Health Records (EHR) contain extensive patient data on various health outcomes and risk predictors, providing an efficient and wide-reaching source for health research. Integrated EHR data can provide a larger sample size of the population to improve estimation and prediction accuracy. To overcome the obstacle of sharing patient-level data, distributed algorithms were developed to conduct statistical analyses across multiple clinical sites through sharing only aggregated information. However, the heterogeneity of data across sites is often ignored by existing distributed algorithms, which leads to substantial bias when studying the association between the outcomes and exposures. In this study, we propose a privacy-preserving and communication-efficient distributed algorithm which accounts for the heterogeneity caused by a small number of the clinical sites. We evaluated our algorithm through a systematic simulation study motivated by real-world scenarios and applied our algorithm to multiple claims datasets from the Observational Health Data Sciences and Informatics (OHDSI) network. The results showed that the proposed method performed better than the existing distributed algorithm ODAL and a meta-analysis method.

39

Precision medicine: addressing the challenges of sharing, analysis, and privacy at scale

Computationally efficient, exact, covariate-adjusted genetic principal component analysis by leveraging individual marker summary statistics from large biobanks

Jack Wolf1, Martha Barnard1, Xueting Xia2, Nathan Ryder3, Jason Westra4, Nathan Tintle4

1St. Olaf College, 2Texas Tech University, 3Colorado State University, 4Dordt University

Nathan Tintle

The popularization of biobanks provides an unprecedented amount of genetic and phenotypic information that can be used to research the relationship between genetics and human health. Despite the opportunities these datasets provide, they also pose many problems associated with computational time and costs, data size and transfer, and privacy and security. The publishing of summary statistics from these biobanks, and the use of them in a variety of downstream statistical analyses, alleviates many of these logistical problems. However, major questions remain about how to use summary statistics in all but the simplest downstream applications. Here, we present a novel approach to utilize basic summary statistics (estimates from single marker regressions on single phenotypes) to evaluate more complex phenotypes using multivariate methods. In particular, we present a covariate-adjusted method for conducting principal component analysis (PCA) utilizing only biobank summary statistics. We validate exact formulas for this method, as well as provide a framework of estimation when specific summary statistics are not available, through simulation. We apply our method to a real dataset of fatty acid and genomic data.

40

ARTIFICIAL INTELLIGENCE FOR ENHANCING CLINICAL MEDICINE

PROCEEDINGS PAPERS WITH POSTER PRESENTATIONS

41

Artificial Intelligence for Enhancing Clinical Medicine

Multiclass Disease Classification from Microbial Whole-Community Metagenomes

Saad Khan, Libusha Kelly

Albert Einstein College of Medicine

Saad Khan

The microbiome, the community of microorganisms living within an individual, is a promising avenue for developing non-invasive methods for disease screening and diagnosis. Here, we utilize 5643 aggregated, annotated whole-community metagenomes to implement the first multiclass microbiome disease classifier of this scale, able to discriminate between 18 different diseases and healthy. We compared three different machine learning models: random forests, deep neural nets, and a novel graph convolutional architecture which exploits the graph structure of phylogenetic trees as its input. We show that the graph convolutional model outperforms deep neural nets in terms of accuracy (achieving 75% average test-set accuracy), receiver-operator-characteristics (92.1% average area-under-ROC (AUC)), and precision-recall (50% average area-under-precision-recall (AUPR)). Additionally, the convolutional net's performance complements that of the random forest, showing a lower propensity for Type-I errors (false-positives) while the random forest makes fewer Type-II errors (false-negatives). Lastly, we are able to achieve over 90% average top-3 accuracy across all of our models. Together, these results indicate that there are predictive, disease-specific signatures across microbiomes that can be used for diagnostic purposes.
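The top-3 accuracy reported in the abstract above counts a prediction as correct when the true class is among the three highest-scoring classes. A minimal sketch of that metric follows; the function name and the small score matrix are illustrative assumptions, not data from the study.

```python
def top_k_accuracy(scores, labels, k=3):
    """Share of samples whose true class is among the k top-scoring classes.

    scores: one list of per-class scores per sample
    labels: index of the true class for each sample
    """
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score, cut at k.
        top = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += label in top
    return hits / len(labels)


# Hypothetical class probabilities for three samples and four classes.
scores = [
    [0.10, 0.50, 0.30, 0.10],
    [0.40, 0.30, 0.20, 0.10],
    [0.05, 0.05, 0.10, 0.80],
]
labels = [2, 3, 3]
top3 = top_k_accuracy(scores, labels, k=3)
top1 = top_k_accuracy(scores, labels, k=1)
print(round(top3, 3), round(top1, 3))
```

With 19 classes (18 diseases plus healthy), top-3 accuracy is a looser criterion than plain accuracy, so the two figures in the abstract are not directly comparable.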

42

Artificial Intelligence for Enhancing Clinical Medicine

LitGen: Genetic Literature Recommendation Guided by Human Explanations

Allen Nie1, Arturo L. Pineda1, Matt W. Wright1, Hannah Wand1, Bryan Wulf1, Helio A. Costa1, Ronak Y. Patel2, Carlos D. Bustamante1, James Zou1

1Stanford University, 2Baylor College of Medicine

Allen Nie

As genetic sequencing costs decrease, the lack of clinical interpretation of variants has become the bottleneck in using genetics data. A major rate limiting step in clinical interpretation is the manual curation of evidence in the genetic literature by highly trained biocurators. What makes curation particularly time-consuming is that the curator needs to identify papers that study variant pathogenicity using different types of approaches and evidence (e.g., biochemical assays or case control analysis). In collaboration with the Clinical Genomic Resource (ClinGen), the flagship NIH program for clinical curation, we propose the first machine learning system, LitGen, that can retrieve papers for a particular variant and filter them by specific evidence types used by curators to assess for pathogenicity. LitGen uses semi-supervised deep learning to predict the type of evidence provided by each paper. It is trained on papers annotated by ClinGen curators and systematically evaluated on new test data collected by ClinGen. LitGen further leverages rich human explanations and unlabeled data to gain 7.9%-12.6% relative performance improvement over models learned only on the annotated papers. It is a useful framework to improve clinical variant curation.

43

Artificial Intelligence for Enhancing Clinical Medicine

Multilevel Self-Attention Model and its Use on Medical Risk Prediction

Xianlong Zeng1,2, Yunyi Feng1,2, Soheil Moosavinasab2, Deborah Lin2, Simon Lin2, Chang Liu1

1School of Electrical Engineering and Computer Science, Ohio University, Athens, OH, USA; 2The Research Institute at Nationwide Children’s Hospital, Columbus, OH, USA

Xianlong Zeng

Various deep learning models have been developed for different healthcare predictive tasks using Electronic Health Records and have shown promising performance. In these models, medical codes are often aggregated into visit representation without considering their heterogeneity, e.g., the same diagnosis might imply different healthcare concerns with different procedures or medications. Then the visits are often fed into deep learning models, such as recurrent neural networks, sequentially without considering the irregular temporal information and dependencies among visits. To address these limitations, we developed a Multilevel Self-Attention Model (MSAM) that can capture the underlying relationships between medical codes and between medical visits. We compared MSAM with various baseline models on two predictive tasks, i.e., future disease prediction and future medical cost prediction, with two large datasets, i.e., MIMIC-3 and PFK. In the experiments, MSAM consistently outperformed baseline models. Additionally, for future medical cost prediction, we used disease prediction as an auxiliary task, which not only guides the model to achieve a stronger and more stable financial prediction, but also allows managed care organizations to provide better care coordination.

44

Artificial Intelligence for Enhancing Clinical Medicine

Identifying Transitional High Cost Users from Unstructured Patient Profiles Written by Primary Care Physicians

Haoran Zhang1,2,3, Elisa Candido3, Andrew S. Wilton3, Raquel Duchen3, Liisa Jaakkimainen3, Walter Wodchis3,4,5, Quaid Morris1,2,6,7

1Department of Computer Science, University of Toronto; 2Vector Institute for Artificial Intelligence, Toronto, Ontario, Canada; 3ICES, Toronto, Ontario, Canada; 4Institute of Health Policy, Management, and Evaluation, University of Toronto; 5Institute for Better Health, Trillium Health Partners, Mississauga, Ontario, Canada; 6Terrence Donnelly Center for Cellular and Biomolecular Research, University of Toronto; 7Department of Molecular Genetics, University of Toronto

Haoran Zhang

Identification and subsequent intervention of patients at risk of becoming High Cost Users (HCUs) presents the opportunity to improve outcomes while also providing significant savings for the healthcare system. In this paper, the 2016 HCU status of patients was predicted using free-form text data from the 2015 cumulative patient profiles within the electronic medical records of family care practices in Ontario. These unstructured notes make substantial use of domain-specific spellings and abbreviations; we show that word embeddings derived from the same context provide more informative features than pre-trained ones based on Wikipedia, MIMIC, and Pubmed. We further demonstrate that a model using features derived from aggregated word embeddings (EmbEncode) provides a significant performance improvement over the bag-of-words representation (82.48±0.35% versus 81.85±0.36% held-out AUROC, p = 3.2E-4), using far fewer input features (5,492 versus 214,750) and fewer non-zero coefficients (1,177 versus 4,284). The future HCUs of greatest interest are the transitional ones who are not already HCUs, because they provide the greatest scope for interventions. Predicting these new HCUs is challenging because most HCUs recur. We show that removing recurrent HCUs from the training set improves the ability of EmbEncode to predict new HCUs, while only slightly decreasing its ability to predict recurrent ones.

45

Artificial Intelligence for Enhancing Clinical Medicine

Obtaining dual-energy computed tomography (CT) information from a single-energy CT image for quantitative imaging analysis of living subjects by using deep learning

Wei Zhao1, Tianling Lv2, Rena Lee3, Yang Chen2, Lei Xing1

1Stanford University, 2Southeast University, 3Ehwa Womens University

Lei Xing

Computed tomography (CT) is a fundamental imaging modality to generate cross-sectional views of internal anatomy in a living subject or interrogate material composition of an object, and it has been routinely used in clinical applications and nondestructive testing. In a standard CT image, pixels having the same Hounsfield Units (HU) can correspond to different materials, and it is therefore challenging to differentiate and quantify materials. Dual-energy CT (DECT) is desirable to differentiate multiple materials, but the costly DECT scanners are not as widely available as single-energy CT (SECT) scanners. Recent advancement in deep learning provides an enabling tool to map images between different modalities with incorporated prior knowledge. Here we develop a deep learning approach to perform DECT imaging by using the standard SECT data. The end point of the approach is a model capable of providing the high-energy CT image for a given input low-energy CT image. The feasibility of the deep learning-based DECT imaging method using SECT data is demonstrated using contrast-enhanced DECT images and evaluated using clinically relevant indexes. This work opens new opportunities for numerous DECT clinical applications with standard SECT data and may enable significantly simplified hardware design, scanning dose, and image cost reduction for future DECT systems.

46

INTRINSICALLY DISORDERED PROTEINS (IDPS) AND THEIR FUNCTIONS

PROCEEDINGS PAPERS WITH POSTER PRESENTATIONS

47

Intrinsically Disordered Proteins (IDPs) and Their Functions

Many-to-one binding by intrinsically disordered protein regions

Wei-Lun Alterovitz1*, Eshel Faraggi1,2,3*, Christopher J. Oldfield1, Jingwei Meng1, Bin Xue1, Fei Huang1, Pedro Romero1, Andrzej Kloczkowski2, Vladimir N. Uversky1, A. Keith Dunker1

1Center for Computational Biology and Bioinformatics, Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, 410 W. 10th St, HS 5000, Indianapolis, IN 46202, USA ([email protected]); 2Battelle Center for Mathematical Medicine, and the Nationwide Children’s Hospital, Department of Pediatrics, The Ohio State University, Columbus, OH 43210, USA; 3Research and Information Systems, LLC, 1620 E. 72nd St. Indianapolis, IN 46240 USA

*Contributed equally ([email protected], [email protected])

Keith Dunker

Disordered binding regions (DBRs), which are embedded within intrinsically disordered proteins or regions (IDPs or IDRs), enable IDPs or IDRs to mediate multiple protein-protein interactions. DBR-protein complexes were collected from the Protein Data Bank for which two or more DBRs having different amino acid sequences bind to the same (100% sequence identical) globular protein partner, a type of interaction herein called many-to-one binding. Two distinct binding profiles were identified: independent and overlapping. For the overlapping binding profiles, the distinct DBRs interact by means of almost identical binding sites (herein called “similar”), or the binding sites contain both common and divergent interaction residues (herein called “intersecting”). Further analysis of the sequence and structural differences among these three groups indicates how IDP flexibility allows different segments to adjust to similar, intersecting, and independent binding pockets.

48

MUTATIONAL SIGNATURES

PROCEEDINGS PAPERS WITH POSTER PRESENTATIONS

49

Mutational Signatures

Impact of mutational signatures on microRNA and their response elements

Eirini Stamoulakatou1, Pietro Pinoli1, Stefano Ceri1, Rosario Piro2

1Politecnico di Milano, 2Freie Universität Berlin

Eirini Stamoulakatou

MicroRNAs are a class of small non-coding RNA molecules with great importance for regulating a large number of diverse biological processes in health and disease, mostly by binding to complementary microRNA response elements (MREs) on protein-coding messenger RNAs and other non-coding RNAs and subsequently inducing their degradation. A growing body of evidence indicates that the dysregulation of certain microRNAs may either drive or suppress oncogenesis. The seed region of a microRNA is of crucial importance for its target recognition. Mutations in these seed regions may disrupt the binding of microRNAs to their target genes. In this study, we investigate the theoretical impact of cancer-associated mutagenic processes and their mutational signatures on microRNA seeds and their MREs. To our knowledge, this is the first study that provides a probabilistic framework for microRNA and MRE sequence alteration analysis based on mutational signatures and computationally assesses the disruptive impact of mutational signatures on human microRNA–target interactions.

50

Mutational Signatures

Genome Gerrymandering: optimal division of the genome into regions with cancer type specific differences in mutation rates

Adamo Young, Jacob Chmura, Yoonsik Park, Quaid Morris, Gurnit Atwal

University of Toronto

Adamo Young

The activity of mutational processes differs across the genome, and is influenced by chromatin state and spatial genome organization. At the scale of one megabase-pair (Mb), regional mutation density correlates strongly with chromatin features, and mutation density at this scale can be used to accurately identify cancer type. Here, we explore the relationship between genomic region and mutation rate by developing an information theory driven, dynamic programming algorithm for dividing the genome into regions with differing relative mutation rates between cancer types. Our algorithm improves mutual information when compared to the naive approach, effectively reducing the average number of mutations required to identify cancer type. Our approach provides an efficient method for associating regional mutation density with mutation labels, and has future applications in exploring the role of somatic mutations in a number of diseases.
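The objective named in this abstract, mutual information between genomic region and cancer type, can be illustrated with a small sketch. This is not the authors' dynamic programming algorithm, which the abstract does not specify. It only computes the mutual information of a region-by-cancer-type table of mutation counts and shows that merging two adjacent regions can never increase it. The function names and the counts are hypothetical.

```python
import math


def mutual_information(counts):
    """Mutual information (in bits) between row and column labels.

    counts[i][j] is the number of mutations in region i for cancer type j.
    """
    total = sum(sum(row) for row in counts)
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    mi = 0.0
    for i, row in enumerate(counts):
        for j, n in enumerate(row):
            if n > 0:  # empty cells contribute nothing
                ratio = n * total / (row_totals[i] * col_totals[j])
                mi += (n / total) * math.log2(ratio)
    return mi


def merge_adjacent(counts, i):
    """Return a new table in which regions i and i + 1 are merged."""
    merged = [a + b for a, b in zip(counts[i], counts[i + 1])]
    return counts[:i] + [merged] + counts[i + 2:]


# Hypothetical mutation counts: 4 regions (rows) x 2 cancer types (columns).
bins = [[30, 5], [28, 6], [4, 25], [10, 10]]
print(mutual_information(bins))
print(mutual_information(merge_adjacent(bins, 0)))
```

Because merging regions can only lose information, a segmentation method has to choose which boundaries to keep; the sketch exposes the quantity being traded off, not how the trade-off is solved.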

51

PATTERN RECOGNITION IN BIOMEDICAL DATA: CHALLENGES IN PUTTING BIG DATA TO WORK

PROCEEDINGS PAPERS WITH POSTER PRESENTATIONS

52

Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work

Learning a Latent Space of Highly Multidimensional Cancer Data

Benjamin Kompa1, Beau Coker2

1Harvard Medical School, 2Harvard School of Public Health

Benjamin Kompa

We introduce a Unified Disentanglement Network (UFDN) trained on The Cancer Genome Atlas (TCGA), which we refer to as UFDN-TCGA. We demonstrate that UFDN-TCGA learns a biologically relevant, low-dimensional latent space of high-dimensional gene expression data by applying our network to two classification tasks of cancer status and cancer type. UFDN-TCGA performs comparably to random forest methods. The UFDN allows for continuous, partial interpolation between distinct cancer types. Furthermore, we perform an analysis of differentially expressed genes between skin cutaneous melanoma (SKCM) samples and the same samples interpolated into glioblastoma (GBM). We demonstrate that our interpolations consist of relevant metagenes that recapitulate known glioblastoma mechanisms.

53

Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work

Scaling structural learning with NO-BEARS to infer causal transcriptome networks

Hao-Chih Lee1,3, Matteo Danieletto1,2,3, Riccardo Miotto1,2,3, Sarah T. Cherng1,3, Joel T. Dudley1,2,3

1Institute for Next Generation Healthcare, 2Hasso Plattner Institute for Digital Health, 3Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10065, USA

Hao-Chih Lee

Constructing gene regulatory networks is a critical step in revealing disease mechanisms from transcriptomic data. In this work, we present NO-BEARS, a novel algorithm for estimating gene regulatory networks. The NO-BEARS algorithm is built on the basis of the NO-TEARS algorithm with two improvements. First, we propose a new constraint and its fast approximation to reduce the computational cost of the NO-TEARS algorithm. Next, we introduce a polynomial regression loss to handle non-linearity in gene expressions. Our implementation utilizes modern GPU computation that can decrease the time of hours-long CPU computation to seconds. Using synthetic data, we demonstrate improved performance, both in processing time and accuracy, on inferring gene regulatory networks from gene expression data.
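For context, the NO-TEARS algorithm that this abstract builds on expresses acyclicity of a weighted adjacency matrix W as h(W) = tr(exp(W ∘ W)) − d = 0, where ∘ is the element-wise product and d is the number of nodes. The sketch below evaluates that baseline constraint with a truncated power series in plain Python. It does not show the new NO-BEARS constraint, whose form the abstract does not give. The function names, the truncation length, and the example matrices are assumptions made for illustration.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def acyclicity_penalty(w, terms=30):
    """Approximate h(W) = tr(exp(W o W)) - d with a truncated power series.

    The k = 0 term of the series equals d and cancels the "- d", so the
    loop only accumulates tr(M^k) / k! for k >= 1, where M = W o W.
    """
    d = len(w)
    squared = [[x * x for x in row] for row in w]  # element-wise square
    power = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    factorial = 1.0
    penalty = 0.0
    for k in range(1, terms + 1):
        power = matmul(power, squared)
        factorial *= k
        penalty += sum(power[i][i] for i in range(d)) / factorial
    return penalty


# Hypothetical 3-gene chain (acyclic) and a 2-gene feedback loop (cyclic).
dag = [[0.0, 0.8, 0.0], [0.0, 0.0, -1.2], [0.0, 0.0, 0.0]]
cycle = [[0.0, 1.0], [1.0, 0.0]]
print(acyclicity_penalty(dag))
print(acyclicity_penalty(cycle))
```

Each evaluation needs repeated d × d matrix products, which gives a sense of why a cheaper constraint and GPU execution matter for networks with many genes.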

54

Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work

PathFlowAI: A High-Throughput Workflow for Preprocessing, Deep Learning and Interpretation in Digital Pathology

Joshua J. Levy1, Lucas A. Salas1, Brock C. Christensen1, Aravindhan Sriharan2, Louis J. Vaickus2

1Geisel School of Medicine at Dartmouth, 2Dartmouth Hitchcock Medical Center

Joshua Levy

The diagnosis of disease often requires analysis of a biopsy. Many diagnoses depend not only on the presence of certain features but on their location within the tissue. Recently, a number of deep learning diagnostic aids have been developed to classify digitized biopsy slides. Clinical workflows often involve processing of more than 500 slides per day. However, clinical use of deep learning diagnostic aids would require a preprocessing workflow that is cost-effective, flexible, scalable, rapid, interpretable, and transparent. Here, we present such a workflow, optimized using Dask and mixed precision training via APEX, capable of handling any patch-level or slide-level classification and prediction problem. The workflow uses a flexible and fast preprocessing and deep learning analytics pipeline, incorporates model interpretation and has a highly storage-efficient audit trail. We demonstrate the utility of this package on the analysis of a prototypical anatomic pathology specimen, liver biopsies for evaluation of hepatitis from a prospective cohort. The preliminary data indicate that PathFlowAI may become a cost-effective and time-efficient tool for clinical use of Artificial Intelligence (AI) algorithms.

55

Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work

Improving survival prediction using a novel feature selection and feature reduction framework based on the integration of clinical and molecular data*

Lisa Neums, Richard Meier, Devin C. Koestler, Jeffrey A. Thompson

Department of Biostatistics and Data Science, University of Kansas Medical Center, and University of Kansas Cancer Center

Lisa Neums

The accurate prediction of a cancer patient’s risk of progression or death can guide clinicians in the selection of treatment and help patients in planning personal affairs. Predictive models based on patient-level data represent a tool for determining risk. Ideally, predictive models will use multiple sources of data (e.g., clinical, demographic, molecular, etc.). However, there are many challenges associated with data integration, such as overfitting and redundant features. In this paper we aim to address those challenges through the development of a novel feature selection and feature reduction framework that can handle correlated data. Our method begins by computing a survival distance score for gene expression, which in combination with a score for clinical independence, results in the selection of highly predictive genes that are non-redundant with clinical features. The survival distance score is a measure of variation of gene expression over time, weighted by the variance of the gene expression over all patients. Selected genes, in combination with clinical data, are used to build a predictive model for survival. We benchmark our approach against commonly used methods, namely lasso- as well as ridge-penalized Cox proportional hazards models, using three publicly available cancer data sets: kidney cancer (521 samples), lung cancer (454 samples) and bladder cancer (335 samples). Across all data sets, our approach built on the training set outperformed the clinical data alone in the test set in terms of predictive power with a c.Index of 0.773 vs 0.755 for kidney cancer, 0.695 vs 0.664 for lung cancer and 0.648 vs 0.636 for bladder cancer. Further, we were able to show increased predictive performance of our method compared to lasso-penalized models fit to both gene expression and clinical data, which had a c.Index of 0.767, 0.677, and 0.645, as well as increased or comparable predictive power compared to ridge models, which had a c.Index of 0.773, 0.668 and 0.650 for the kidney, lung, and bladder cancer data sets, respectively. Therefore, our score for clinical independence improves prognostic performance as compared to modeling approaches that do not consider combining non-redundant data. Future work will concentrate on optimizing the survival distance score in order to achieve improved results for all types of cancer.
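The results in this abstract are reported as a concordance index (c.Index). As a reading aid, here is a minimal sketch of Harrell's concordance index for right-censored survival data. It is a generic textbook version and not the authors' evaluation code. The simplification that pairs with tied survival times are skipped, the function name, and the toy data are assumptions.

```python
def concordance_index(times, events, risks):
    """Harrell's C: share of comparable pairs that the risk score orders correctly.

    times  -- observed follow-up time per patient
    events -- 1 if the event (e.g. death) was observed, 0 if censored
    risks  -- predicted risk score; higher should mean earlier event
    A pair (i, j) is comparable when patient i had an observed event
    strictly before patient j's follow-up time. Tied risks count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical cohort of four patients; the third is censored at time 3.
times = [1, 2, 3, 4]
events = [1, 1, 0, 1]
print(concordance_index(times, events, [4, 3, 2, 1]))  # risks in the ideal order
print(concordance_index(times, events, [1, 1, 1, 1]))  # uninformative risks
```

A value of 0.5 corresponds to an uninformative risk score and 1.0 to a perfect ordering, which is the scale on which the differences quoted above (for example 0.773 vs 0.755) should be read.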

56

Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work

Bayesian semi-nonnegative matrix tri-factorization to identify pathways associated with cancer phenotypes

Sunho Park1, Nabhonil Kar1, Jae-Ho Cheong2, Tae Hyun Hwang1

1Cleveland Clinic, 2Yonsei University College of Medicine

Sunho Park

Accurate identification of pathways associated with cancer phenotypes (e.g., cancer subtypes and treatment outcome) could lead to discovering reliable prognostic and/or predictive biomarkers for better patient stratification and treatment guidance. In our previous work, we have shown that non-negative matrix tri-factorization (NMTF) can be successfully applied to identify pathways associated with specific cancer types or disease classes as a prognostic and predictive biomarker. However, one key limitation of non-negative factorization methods, including various non-negative bi-factorization methods, is their limited ability to handle negative input data. For example, many molecular data that consist of real values containing both positive and negative values (e.g., normalized/log transformed gene expression data where a negative value represents down-regulated expression of genes) are not suitable input for these algorithms. In addition, most previous methods provide just a single point estimate and hence cannot deal with uncertainty effectively. To address these limitations, we propose a Bayesian semi-nonnegative matrix tri-factorization method to identify pathways associated with cancer phenotypes from a real-valued input matrix, e.g., gene expression values. Motivated by semi-nonnegative factorization, we allow one of the factor matrices, the centroid matrix, to be real-valued so that each centroid can express either the up- or down-regulation of the member genes in a pathway. In addition, we place structured spike-and-slab priors (which are encoded with the pathways and a gene-gene interaction (GGI) network) on the centroid matrix so that even a set of genes that is not initially contained in the pathways (due to the incompleteness of the current pathway database) can be involved in the factorization in a stochastic way; specifically, if those genes are connected to the member genes of the pathways on the GGI network. We also present update rules for the posterior distributions in the framework of variational inference. As a full Bayesian method, our proposed method has several advantages over the current NMTF methods, which are demonstrated using synthetic datasets in experiments. Using The Cancer Genome Atlas (TCGA) gastric cancer and metastatic gastric cancer immunotherapy clinical-trial datasets, we show that our method could identify biologically and clinically relevant pathways associated with the molecular subtypes and immunotherapy response, respectively. Finally, we show that those pathways identified by the proposed method could be used as prognostic biomarkers to stratify patients with distinct survival outcome in two independent validation datasets. Additional information and codes can be found at https://github.com/parks-cs-ccf/BayesianSNMTF.

57

Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work

Tree-Weighting for Multi-Study Ensemble Learners

Maya Ramchandran1, Prasad Patil1,2, Giovanni Parmigiani1,2

1Department of Biostatistics, Harvard T.H. Chan School of Public Health; 2Department of Data Sciences, Dana-Farber Cancer Institute

Maya Ramchandran

Multi-study learning uses multiple training studies, separately trains classifiers on each, and forms an ensemble with weights rewarding members with better cross-study prediction ability. This article considers novel weighting approaches for constructing tree-based ensemble learners in this setting. Using Random Forests as a single-study learner, we compare weighting each forest to form the ensemble, to extracting the individual trees trained by each Random Forest and weighting them directly. We find that incorporating multiple layers of ensembling in the training process by weighting trees increases the robustness of the resulting predictor. Furthermore, we explore how ensembling weights correspond to tree structure, to shed light on the features that determine whether weighting trees directly is advantageous. Finally, we apply our approach to genomic datasets and show that weighting trees improves upon the basic multi-study learning paradigm. Code and supplementary material are available at https://github.com/m-ramchandran/tree-weighting.

58

Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work

PTRExplorer: An approach to identify and explore Post Transcriptional Regulatory mechanisms using proteogenomics

Arunima Srivastava1, Michael Sharpnack1, Kun Huang2, Parag Mallick3, Raghu Machiraju1

1The Ohio State University, 2Indiana University School of Medicine, 3Stanford University

Arunima Srivastava

Integration of transcriptomic and proteomic data should reveal multi-layered regulatory processes governing cancer cell behaviors. Traditional correlation-based analyses have demonstrated limited ability to identify the post-transcriptional regulatory (PTR) processes that drive the non-linear relationship between transcript and protein abundances. In this work, we ideate an integrative approach to explore the variety of post-transcriptional mechanisms that dictate relationships between genes and corresponding proteins. The proposed workflow utilizes the intuitive technique of scatterplot diagnostics, or scagnostics, to characterize and examine the diverse scatterplots built from transcript and protein abundances in a proteogenomic experiment. The workflow includes representing gene-protein relationships as scatterplots, clustering on geometric scagnostic features of these scatterplots, and finally identifying and grouping the potential gene-protein relationships according to their disposition to various PTR mechanisms. Our study verifies the efficacy of the implemented approach to excavate possible regulatory mechanisms by utilizing comprehensive tests on a synthetic dataset. We also propose a variety of 2D pattern-specific downstream analysis methodologies, such as mixture modeling and mapping miRNA post-transcriptional effects, to explore each mechanism further. This work suggests that the proposed methodology has the potential for discovering and categorizing post-transcriptional regulatory mechanisms, manifesting in proteogenomic trends. These trends subsequently provide evidence for cancer specificity, miRNA targeting, and identification of regulation impacted by biological functionality and different types of degradation.

59

Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work

Network Representation of Large-Scale Heterogeneous RNA Sequences with Integration of Diverse Multi-omics, Interactions, and Annotations Data

Nhat Tran, Jean Gao

The University of Texas at Arlington

Jean Gao

Long non-coding RNA (lncRNA), microRNA, and messenger RNA enable key regulations of various biological processes through a variety of diverse interaction mechanisms. Identifying the interactions and cross-talk between these heterogeneous RNA classes is essential in order to uncover the functional role of individual RNA transcripts, especially for unannotated and sparsely discovered RNA sequences with no known interactions. Recently, sequence-based deep learning and network embedding methods have been gaining traction as high-performing and flexible approaches that can either predict RNA-RNA interactions from sequence or infer missing interactions from patterns that may exist in the network topology. However, most of the current methods have several limitations, e.g., the inability to perform inductive predictions, to distinguish the directionality of interactions, or to integrate various sequence, interaction, expression, and genomic annotation datasets. We propose a novel deep learning framework, rna2rna, which learns from RNA sequences to produce a low-dimensional embedding that preserves proximities in both the interaction topology and the functional affinity topology. In this proposed embedding space, the two-part "source and target contexts" capture the receptive fields of each RNA transcript to encapsulate heterogeneous cross-talk interactions between lncRNAs and microRNAs. The proximity between RNAs in this embedding space also uncovers the second-order relationships that allow for accurate inference of novel directed interactions or functional similarities between any two RNA sequences. In a prospective evaluation, our method exhibits superior performance compared to state-of-the-art approaches at predicting missing interactions from several RNA-RNA interaction databases. Additional results suggest that our proposed framework can capture a manifold for heterogeneous RNA sequences to discover novel functional annotations.

60

Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work

Hadoop and PySpark for reproducibility and scalability of genomic sequencing studies

Nicholas R. Wheeler1, Penelope Benchek1, Brian W. Kunkle2, Kara L. Hamilton-Nelson2, Mike Warfe1, Jeremy R. Fondran1, Jonathan L. Haines1, William S. Bush1

1Case Western Reserve University, 2University of Miami

William Bush

Modern genomic studies are rapidly growing in scale, and the analytical approaches used to analyze genomic data are increasing in complexity. Genomic data management poses logistic and computational challenges, and analyses are increasingly reliant on genomic annotation resources that create their own data management and versioning issues. As a result, genomic datasets are increasingly handled in ways that limit the rigor and reproducibility of many analyses. In this work, we examine the use of the Spark infrastructure for the management, access, and analysis of genomic data in comparison to traditional genomic workflows on typical cluster environments. We validate the framework by reproducing previously published results from the Alzheimer’s Disease Sequencing Project. Using the framework and analyses designed using Jupyter notebooks, Spark provides improved workflows, reduces user-driven data partitioning, and enhances the portability and reproducibility of distributed analyses required for large-scale genomic studies.

61

Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work

CERENKOV3: Clustering and molecular network-derived features improve computational prediction of functional noncoding SNPs

Yao Yao, Stephen A. Ramsey

Oregon State University

Yao Yao

Identification of causal noncoding single nucleotide polymorphisms (SNPs) is important for maximizing the knowledge dividend from human genome-wide association studies (GWAS). Recently, diverse machine learning-based methods have been used for functional SNP identification; however, this task remains a fundamental challenge in computational biology. We report CERENKOV3, a machine learning pipeline that leverages clustering-derived and molecular network-derived features to improve prediction accuracy of regulatory SNPs (rSNPs) in the context of post-GWAS analysis. The clustering-derived feature, locus size (number of SNPs in the locus), derives from our locus partitioning procedure and represents the sizes of clusters based on SNP locations. We generated two molecular network-derived features from representation learning on a network representing SNP-gene and gene-gene relations. Based on empirical studies using a ground-truth SNP dataset, CERENKOV3 significantly improves rSNP recognition performance in AUPRC, AUROC, and AVGRANK (a locus-wise rank-based measure of classification accuracy we previously proposed).
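To make the locus size feature concrete, the sketch below assigns each SNP the number of SNPs in its locus. The abstract does not describe the locus partitioning procedure, so the rule used here is an assumption: a locus is a maximal run of SNPs on one chromosome whose consecutive positions differ by at most `max_gap` base pairs. The function name, the threshold, and the positions are hypothetical.

```python
def locus_sizes(positions, max_gap):
    """Return, for each SNP, the number of SNPs in its locus.

    Assumed partitioning rule (not taken from the source): sort SNPs by
    position and start a new locus whenever the gap to the previous SNP
    exceeds max_gap. All positions are on a single chromosome.
    """
    order = sorted(range(len(positions)), key=lambda i: positions[i])
    sizes = [0] * len(positions)
    start = 0  # index in `order` where the current locus begins
    for k in range(1, len(order) + 1):
        at_end = k == len(order)
        if at_end or positions[order[k]] - positions[order[k - 1]] > max_gap:
            for idx in order[start:k]:
                sizes[idx] = k - start
            start = k
    return sizes


# Hypothetical SNP positions in base pairs, deliberately unsorted.
snp_positions = [100, 150, 5000, 120, 5050]
print(locus_sizes(snp_positions, max_gap=1000))
```

The output is aligned with the input order, so it can be appended as one extra column to a per-SNP feature matrix.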

62

PRECISION MEDICINE: ADDRESSING THE CHALLENGES OF SHARING, ANALYSIS, AND PRIVACY AT SCALE

PROCEEDINGS PAPERS WITH POSTER PRESENTATIONS

63

Precision medicine: addressing the challenges of sharing, analysis, and privacy at scale

AnomiGAN: Generative Adversarial Networks for Anonymizing Private Medical Data

Ho Bae, Dahuin Jung, Hyun-Soo Choi, Sungroh Yoon

Seoul National University

Ho Bae

Typical personal medical data contains sensitive information about individuals. Storing or sharing the personal medical data is thus often risky. For example, a short DNA sequence can provide information that can identify not only an individual, but also his or her relatives. Nonetheless, most countries and researchers agree on the necessity of collecting personal medical data. This stems from the fact that medical data, including genomic data, are an indispensable resource for further research and development regarding disease prevention and treatment. To prevent personal medical data from being misused, techniques to reliably preserve sensitive information should be developed for real world applications. In this paper, we propose a framework called anonymized generative adversarial networks (AnomiGAN), to preserve the privacy of personal medical data, while also maintaining high prediction performance. We compared our method to state-of-the-art techniques and observed that our method preserves the same level of privacy as differential privacy (DP) and provides better prediction results. We also observed that there is a trade-off between privacy and prediction results that depends on the degree of preservation of the original data. Here, we provide a mathematical overview of our proposed model and demonstrate its validation using UCI machine learning repository datasets in order to highlight its utility in practice. The code is available at https://github.com/hobae/AnomiGAN/

64

Precision medicine: addressing the challenges of sharing, analysis, and privacy at scale

Frequency of ClinVar pathogenic variants in chronic kidney disease patients surveyed for return of research results at a Cleveland public hospital

Dana C. Crawford1,2,3, John Lin1, Jessica N. Cooke Bailey1,2, Tyler Kinzy1, John R. Sedor4,5, John F. O'Toole5, William S. Bush1,2,3

1Cleveland Institute for Computational Biology, 2Departments of Population and Quantitative Health Sciences, and 3Genetics and Genome Sciences, Case Western Reserve University; 4Department of Physiology and Biophysics, Case Western Reserve University; and 5Department of Nephrology and Hypertension, Glickman Urology and Kidney and Lerner Research Institute, Cleveland Clinic

Dana Crawford

Return of results is not common in research settings as standards are not yet in place for what to return, how to return, and to whom. As a pioneer of large-scale return of research results, the Precision Medicine Initiative Cohort, now known as All of Us, plans to return pharmacogenomic results and variants of clinical significance to its participants starting late 2019. To better understand the local landscape of possibilities regarding return of research results, we assessed the frequency of pathogenic variants and APOL1 renal risk variants in a small diverse cohort of chronic kidney disease (CKD) patients ascertained from a public hospital in Cleveland, Ohio, genotyped on the Illumina Infinium MegaEX. Of the 23,720 ClinVar-designated variants directly assayed by the MegaEX, 8,355 (35%) had at least one alternate allele in the 130 participants genotyped. Of these, 18 ClinVar variants deemed pathogenic by multiple submitters with no conflicts in interpretation were distributed across 27 participants. The majority of these pathogenic ClinVar variants (14/18) were associated with autosomal recessive disorders. Of note were four African American carriers of TTR rs76992529 associated with amyloidogenic transthyretin amyloidosis, otherwise known as familial transthyretin amyloidosis (FTA). FTA, an autosomal dominant disorder with variable penetrance, is more common among African-descent populations compared with European-descent populations. Also common in this CKD population were APOL1 renal risk alleles G1 (rs73885319) and G2 (rs71785313) with 60% of the study population carrying at least one renal risk allele. Both pathogenic ClinVar variants and APOL1 renal risk alleles were distributed among participants who wanted actionable genetic results returned, wanted genetic results returned regardless of actionability, and wanted no results returned. Results from this local genetic study highlight challenges in which variants to report, how to interpret them, and the participant’s potential for follow-up, only some of the challenges in return of research results likely facing larger studies such as All of Us.

65

Precision medicine: addressing the challenges of sharing, analysis, and privacy at scale

Network-Based Matching of Patients and Targeted Therapies for Precision Oncology

Qingzhi Liu1, Min Jin Ha2, Rupam Bhattacharyya1, Lana Garmire3, Veerabhadran Baladandayuthapani1

1Department of Biostatistics, University of Michigan; 2Department of Biostatistics, The University of Texas MD Anderson Cancer Center; 3Department of Computational Medicine and Bioinformatics, University of Michigan

Qingzhi Liu

The extensive acquisition of high-throughput molecular profiling data across model systems (human tumors and cancer cell lines) and drug sensitivity data makes precision oncology possible, allowing clinicians to match the right drug to the right patient. Current supervised models for drug sensitivity prediction often use cell lines as exemplars of patient tumors and for model training. However, these models are limited in their ability to accurately predict drug sensitivity of individual cancer patients to a large set of drugs, given the paucity of patient drug sensitivity data used for testing and high variability across different drugs. To address these challenges, we developed a multilayer network-based approach to impute individual patients’ responses to a large set of drugs. This approach considers the triplet of patients, cell lines and drugs as one inter-connected holistic system. We first use the omics profiles to construct a patient-cell line network and determine best matching cell lines for patient tumors based on robust measures of network similarity. Subsequently, these results are used to impute the “missing link” between each individual patient and each drug, called Personalized Imputed Drug Sensitivity Score (PIDS-Score), which can be construed as a measure of the therapeutic potential of a drug or therapy. We applied our method to two subtypes of lung cancer patients, matched these patients with cancer cell lines derived from 19 tissue types based on their functional proteomics profiles, and computed their PIDS-Scores to 251 drugs and experimental compounds. We identified the best representative cell lines that conserve lung cancer biology and molecular targets. The PIDS-Score based top sensitive drugs for the entire patient cohort as well as individual patients are highly related to lung cancer in terms of their targets, and their PIDS-Scores are significantly associated with patient clinical outcomes. These findings provide evidence that our method is useful to narrow the scope of possible effective patient-drug matchings for implementing evidence-based personalized medicine strategies.

66

Precision medicine: addressing the challenges of sharing, analysis, and privacy at scale

Phenome-wide association studies on cardiovascular health and fatty acids considering phenotype quality control practices for epidemiological data

Kristin Passero1, Xi He1, Jiayan Zhou1, Bertram Mueller-Myhsok2,3,4, Marcus E. Kleber5, Winfried Maerz5,6,7, Molly A. Hall1

1Penn State; 2Max Planck Institute of Psychiatry; 3Munich Cluster of Systems Biology; 4University of Liverpool; 5Heidelberg University; 6SYNLAB Academy; 7Medical University of Graz

Kristin Passero

Phenome-wide association studies (PheWAS) allow agnostic investigation of common genetic variants in relation to a variety of phenotypes, but preserving the power of PheWAS requires careful phenotypic quality control (QC) procedures. While QC of genetic data is well-defined, no established QC practices exist for multi-phenotypic data. Manually imposing sample size restrictions, identifying variable types/distributions, and locating problems such as missing data or outliers is arduous in large, multivariate datasets. In this paper, we perform two PheWAS on epidemiological data and, utilizing the novel software CLARITE (CLeaning to Analysis: Reproducibility-based Interface for Traits and Exposures), showcase a transparent and replicable phenome QC pipeline which we believe is a necessity for the field. Using data from the Ludwigshafen Risk and Cardiovascular (LURIC) Health Study we ran two PheWAS, one on cardiac-related diseases and the other on polyunsaturated fatty acid levels. These phenotypes underwent a stringent quality control screen and were regressed on a genome-wide sample of single nucleotide polymorphisms (SNPs). Seven SNPs were significant in association with dihomo-γ-linolenic acid, of which five were within fatty acid desaturases FADS1 and FADS2. PheWAS is a useful tool to elucidate the genetic architecture of complex disease phenotypes within a single experimental framework. However, to reduce computational and multiple-comparisons burden, careful assessment of phenotype quality and removal of low-quality data is prudent. Herein we perform two PheWAS while applying a detailed phenotype QC process, for which we provide a replicable pipeline that is modifiable for application to other large datasets with heterogeneous phenotypes. As investigation of complex traits continues beyond traditional genome-wide association studies (GWAS), such QC considerations and tools such as CLARITE are crucial in the analysis of non-genetic big data such as clinical measurements, lifestyle habits, and polygenic traits.

67

Precision medicine: addressing the challenges of sharing, analysis, and privacy at scale

aTEMPO: Pathway-Specific Temporal Anomalies for Precision Therapeutics

Christopher Michael Pietras, Liam Power, Donna K. Slonim

Tufts University

Christopher Pietras

Dynamic processes are inherently important in disease, and identifying disease-related disruptions of normal dynamic processes can provide information about individual patients. We have previously characterized individuals' disease states via pathway-based anomalies in expression data, and we have identified disease-correlated disruption of predictable dynamic patterns by modeling a virtual time series in static data. Here we combine the two approaches, using an anomaly detection model and virtual time series to identify anomalous temporal processes in specific disease states. We demonstrate that this approach can informatively characterize individual patients, suggesting personalized therapeutic approaches.

68

Precision medicine: addressing the challenges of sharing, analysis, and privacy at scale

Feature Selection and Dimension Reduction of Social Autism Data

Peter Washington1, Kelley Marie Paskov1, Haik Kalantarian1, Nathaniel Stockham1, Catalin Voss1, Aaron Kline1, Ritik Patnaik2, Brianna Chrisman1, Maya Varma1, Qandeel Tariq1, Kaitlyn Dunlap1, Jessey Schwartz1, Nick Haber1, Dennis P. Wall1

1Stanford University, 2Massachusetts Institute of Technology

Peter Washington

Autism Spectrum Disorder (ASD) is a complex neuropsychiatric condition with a highly heterogeneous phenotype. Following the work of Duda et al., which uses a reduced feature set from the Social Responsiveness Scale, Second Edition (SRS) to distinguish ASD from ADHD, we performed item-level question selection on answers to the SRS to determine whether ASD can be distinguished from non-ASD using a similarly small subset of questions. To explore feature redundancies between the SRS questions, we performed filter, wrapper, and embedded feature selection analyses. To explore the linearity of the SRS-related ASD phenotype, we then compressed the 65-question SRS into low-dimension representations using PCA, t-SNE, and a denoising autoencoder. We measured the performance of a multi-layer perceptron (MLP) classifier with the top-ranking questions as input. Classification using only the top-rated question resulted in an AUC of over 92% for SRS-derived diagnoses and an AUC of over 83% for dataset-specific diagnoses. High redundancy of features has implications towards replacing the social behaviors that are targeted in behavioral diagnostics and interventions, where digital quantification of certain features may be obfuscated due to privacy concerns. We similarly evaluated the performance of an MLP classifier trained on the low-dimension representations of the SRS, finding that the denoising autoencoder achieved slightly higher performance than the PCA and t-SNE representations.

69

ARTIFICIAL INTELLIGENCE FOR ENHANCING CLINICAL MEDICINE

POSTER PRESENTATIONS

70

Artificial Intelligence for Enhancing Clinical Medicine

Prioritizing Copy Number Variants using Phenotype and Gene Functional Similarity

Azza Althagafi, Jun Chen, Robert Hoehndorf

Computer, Electrical & Mathematical Science and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), 4700 KAUST, 23955-6900, Thuwal, Kingdom of Saudi Arabia

Azza Althagafi

There are many types of genetic variation in the human genome, ranging from large chromosome anomalies to single nucleotide variants (SNVs). It is becoming necessary to develop methods for distinguishing disease-causing variants from the large number of neutral genetic variants in an individual. This problem is also relevant to Copy Number Variants (CNVs), a class of genetic variation in which large segments of the genome differ in copy number among individuals. Over the past several years, much progress has been made in CNV detection and in understanding the role of CNVs in human diseases. We now understand that CNVs account for much of human variability. Correspondingly, several methods have been introduced to find disease-associated genes and SNVs. Different methods have been developed for predicting and prioritizing the pathogenicity of SNVs found within a genome. Constructing similar methods for CNVs is challenging due to the heterogeneity in variant size and type and the possibility of multiple genes being affected by large CNVs. CNV impact prediction methods should consider these factors in order to robustly prioritize pathogenic variants. We have built a method that incorporates biological background knowledge about the relation between phenotypes resulting from a loss of function in mouse genes, gene functions as described using the Gene Ontology (GO), and the anatomical site of gene expression, along with SVScore, a score that predicts the pathogenicity of CNVs. We use this information to build a machine learning model that ranks CNVs based on their predicted pathogenicity and the relation between genes affected by the CNV and the phenotype we observe in affected individuals. Additionally, our approach considers several genomic features of each CNV, such as the length of the coding sequence overlapping with the CNV, haploinsufficiency and triplosensitivity scores to measure the dosage sensitivity of genes/regions, and GC content. Our results show that incorporating this information leads to improvement over a baseline model which uses only similarity scores between gene-phenotype associations and disease-associated phenotypes, as well as improvement over using only pathogenicity prediction methods for CNVs. Our method achieves an F-score of 80.85%, with 82.05% precision and 79.67% recall in our evaluation set. The results demonstrate that phenotype, functional, and gene expression information may be utilized to identify causative CNVs. Future work is required to evaluate and improve our model using patient-derived WGS data.
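The reported F-score can be checked against the reported precision and recall. The sketch below assumes the F-score is the balanced F1 (the harmonic mean of precision and recall); the abstract does not state which F-measure was used, and `f_beta` is a name chosen here for illustration only.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta=1 gives the harmonic mean of precision and recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall as reported in the abstract (already rounded to two decimals).
f1 = f_beta(0.8205, 0.7967)
print(f"F1 = {f1:.4f}")
```

Under this assumption the result is about 0.808, which agrees with the reported 80.85% to within the rounding of the two inputs.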

71

Artificial Intelligence for Enhancing Clinical Medicine

Inferring the Reward Functions that Guide Cancer Progression

John Kalantari1, Heidi Nelson2, Nicholas Chia3

1Microbiome Program, Center for Individualized Medicine, Mayo Clinic, Rochester, MN, USA; 2Colon and Rectal Surgery, Mayo Clinic, Rochester, MN, USA; 3Division of Surgical Research, Department of Surgery, Mayo Clinic, Rochester, MN, USA

John Kalantari

Cancer can occur in patients with different genetic backgrounds via a multi-step evolutionary process, i.e., one driven by modification and selection, that can accumulate different genetic alterations. Despite these differences, many cancer subtypes are unified by similar mechanisms or types of genetic changes. In other words, there are multiple etiological paths tied together by specific events that share commonality in their causal mechanism. Understanding these common mechanisms will enable the development of better therapies and preventative measures. It will also enable improved prediction of recurrence and metastatic advancement of cancer, directly impacting the 606,880 annual cancer deaths in the United States alone. Our work is built upon the central proposition that the Markov Decision Process (MDP) can better represent the process by which cancer arises and progresses. More specifically, by encoding a cancer cell's complex behavior as an MDP, we seek to model the series of genetic changes, or evolutionary trajectory, that leads to cancer as an optimal decision process. We posit that using an Inverse Reinforcement Learning (IRL) approach will enable us to reverse engineer an optimal policy and reward function based on a set of expert demonstrations extracted from the DNA of patient tumors. The inferred reward function and optimal policy can subsequently be used to extrapolate the evolutionary trajectory of any tumor. We introduce a novel data-agnostic artificial intelligence framework which can infer reward functions describing the causal mechanisms that best explain the observed behavior of an 'optimally-behaving agent': the cancer cell. Using multi-omic data from 27 colorectal cancer (CRC) patients as proof-of-principle, we show that IRL provides a systematic and scalable approach to formally stating and solving the problem of cancer evolution. By providing a lineage path (i.e., sequences of alterations) obtained via subclonal reconstruction for each tumor, we are able to reduce this complex problem to the recovery of an associated reinforcement learning reward function. These reward functions have the potential to model unknown molecular mechanisms driving intratumor heterogeneity and to elucidate cancer etiologies.

72

Artificial Intelligence for Enhancing Clinical Medicine

Predicting disease-associated mutation of metal-binding sites in proteins using a deep learning approach

Mohamad Koohi-Moghadam, Haibo Wang, Yuchuan Wang, Xinming Yang, Hongyan Li, Junwen Wang, Hongzhe Sun

Department of Chemistry, The University of Hong Kong, Hong Kong, China; Department of Health Sciences, Mayo Clinic, Scottsdale, AZ, USA; Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic, Scottsdale, AZ, USA; Center for Individualized Medicine, Mayo Clinic, Scottsdale, AZ, USA; College of Health Solutions, Arizona State University, Scottsdale, AZ, USA

Junwen Wang

Metalloproteins play important roles in many biological processes. Mutations at the metal-binding sites may functionally disrupt metalloproteins, initiating severe diseases; however, until now there has been no effective approach to predict such mutations. Here we develop a deep learning approach to successfully predict disease-associated mutations that occur at the metal-binding sites of metalloproteins. We generate energy-based affinity grid maps and physicochemical features of the metal-binding pockets (obtained from different databases as spatial and sequential features) and subsequently feed these features into a multichannel convolutional neural network. After training, the network can successfully predict disease-associated mutations that occur at the first and second coordination spheres of zinc-binding sites with an area under the curve of 0.90 and an accuracy of 0.82. Our approach is the first deep learning approach for the prediction of disease-associated metal-relevant site mutations in metalloproteins, providing a new platform to tackle human diseases.

73

Artificial Intelligence for Enhancing Clinical Medicine

GENERAL

POSTER PRESENTATIONS

74

General

Ranking RAS pathway mutations using evolutionary history of MEK1

Katia Andrianova, Igor Jouline

Ohio State University, Department of Microbiology, Columbus, Ohio 43210

Katia Andrianova

The Ras/MAPK (rat sarcoma/mitogen-activated protein kinase) signaling pathway is involved in essentially all aspects of organismal development, from the first cell divisions in the early embryo to postnatal development and growth. Given its critical function, it is not surprising that deregulated Ras/MAPK signaling, resulting from either genetic or environmental perturbations, can lead to cancer and developmental abnormalities. A large class of such abnormalities, known as RASopathies, is associated with activating germ-line mutations in many components of the Ras pathway. Over the past decade, as next-generation sequencing (NGS) has become a valuable and cost-effective tool for research applications and clinical diagnostics of Mendelian diseases, simultaneous sequencing of multiple genes in MAPK signaling pathways has yielded many reports with hundreds of mutations possibly associated with RASopathies and cancer. In particular, multiple new mutations were identified in MEK1 kinase. The majority of newly discovered coding variations have neither been described in other individuals nor been studied or functionally analyzed in cellular or animal models, leaving clinicians to rely on in silico predictions of the consequences of these “variants of uncertain significance” made with computational software such as PolyPhen and SIFT. Automated sequence searches used in these methods do not distinguish possible duplication events in the genes’ histories; hence, multiple sequence alignment (MSA) sets usually include both ortholog and paralog copies. As purifying selection acts on one of the duplicate copies, that copy can become associated with a different phenotype compared to its paralogous sibling and/or to the parental gene. In most cases of Mendelian diseases, only one specific duplicate of the gene in the human genome is associated with a disease. This indicates the importance of considering both common ancestors and a gene’s duplication history for variant interpretation. The presence of seven human MEK proteins increases the chances of including paralogs in the analysis and therefore substantially limits mutation interpretation. In this study, we established the first precise description of the evolutionary history of MEK kinases and identified potential duplication events. We determined that MEK1 is an ancestor of the entire MEK family. In-depth analysis of the orthologous proteins showed that essentially all experimentally proven pathogenic mutations were predicted as “damaging” by our approach. By comparing our results with the predictions made by PolyPhen-2 and SIFT, we showed how careful analysis of the evolutionary history of a gene may improve the accuracy of predicting missense mutation outcomes.

75

General

Integrative Analysis of COPD and Lung Cancer Metadata Reveals Shared Alterations in Immune Response, PTEN and PI3K-AKT Pathways

Dannielle Skander1, Arda Durmaz1, Mohammed Orloff2, Gurkan Bebek1

1Case Western Reserve University, 2University of Arkansas for Medical Sciences

Gurkan Bebek

Chronic obstructive pulmonary disease (COPD) and lung cancer are among the leading causes of death worldwide. While it is believed the two diseases are related, the mechanisms behind this relationship remain unclear. We investigate the relationship between COPD and lung cancer using an integrative-omics approach. Integration of epigenetic and mRNA gene expression data allows us to discover the functionally relevant genes, i.e., the genes crucial for disease development. Using this approach, our study suggests that the mechanisms driving the development of both diseases are related to the interleukin immune response (IL4 and IL17), PTEN and PI3K-AKT pathways. Understanding this relationship between COPD and lung cancer is crucial for future prevention and treatment options of both COPD and lung cancer.

76

General

Investigating sources of irreproducibility in analysis of gene expression data

Carly A. Bobak, Jane E. Hill

Dartmouth College

Carly Bobak

The use of big data promises to change the landscape of biomedical research; however, irreproducibility of results remains a problem. In this work, we set out to investigate proposed methods to increase reproducibility of gene expression results. Specifically, we test the following three hypotheses: (1) results from pathway enrichment will be more similar across datasets than results on differentially expressed (DE) genes; (2) similarity across smaller datasets will be lower than similarity in larger datasets; (3) results from multi-cohort data will be more similar than results from single-cohort data. We selected three unique datasets from the Gene Expression Omnibus that include active TB patients, spanning pediatric and adult patients. In each dataset we ranked DE genes by their association with TB vs. other (healthy controls, other diseases, or latent tuberculosis infection). We then calculated the rank biased overlap (RBO) of the ranked genes across each dataset. RBO is a similarity measure scaled between 0 and 1 and can be interpreted as the average agreement between two lists. Gene set enrichment analysis (GSEA) was performed, and we calculated a rank for the pathway hits and compared RBO for associated pathways between datasets. On average, the RBO increased by a fold change of 1.83×10^4 when comparing similarity of associated pathways to similarity of DE genes. We then divided each dataset in half and repeated the analysis on all sub-datasets. Sub-datasets from the same parent dataset had similar results (mean RBO of 0.60, sd = 0.24) as opposed to subsets from a different parent dataset (mean = 0.10, sd = 0.15). Contradicting our original hypothesis, overall RBO calculated between subsets from different parent datasets did not necessarily decrease compared to the initial RBO calculation; in fact, half of the RBO comparisons increased in the sub-datasets compared to using the whole datasets. To test the final hypothesis, we co-normalized, merged, and then randomly divided datasets into three approximately equal pieces. We repeated the DE analysis on each piece of the merged dataset. Across mixed datasets, the mean RBO was 0.023 (sd = 0.43). Heterogeneous datasets were more alike than unique datasets, but less alike than a single divided dataset. However, the RBOs from mixed datasets compared to original datasets were not statistically significantly different from the RBOs comparing results from the original datasets. Thus, we demonstrated that associated pathways are greatly more reproducible than associated genes. Further study is necessary to investigate the conditions under which statistical power and heterogeneity of data influence reproducibility of findings from gene expression studies.
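To make the similarity measure concrete, here is a minimal sketch of rank biased overlap for two ranked lists, using the extrapolated form for lists truncated to the same depth. The persistence parameter `p = 0.9`, the function name, and the gene symbols in the example are assumptions for illustration; the abstract does not say which RBO variant or parameter the authors used.

```python
from typing import Hashable, Sequence


def rank_biased_overlap(s: Sequence[Hashable], t: Sequence[Hashable], p: float = 0.9) -> float:
    """Extrapolated RBO of two duplicate-free rankings, compared to their common depth k.

    A_d = |s[:d] & t[:d]| / d is the agreement at depth d;
    RBO_ext = A_k * p**k + ((1 - p) / p) * sum_{d=1..k} A_d * p**d.
    """
    k = min(len(s), len(t))
    if k == 0:
        return 0.0
    seen_s, seen_t = set(), set()
    overlap = 0          # size of the intersection of the two prefixes
    weighted = 0.0       # running sum of A_d * p**d
    agreement = 0.0
    for d in range(1, k + 1):
        a, b = s[d - 1], t[d - 1]
        if a == b:
            overlap += 1
        else:
            overlap += (a in seen_t) + (b in seen_s)
        seen_s.add(a)
        seen_t.add(b)
        agreement = overlap / d
        weighted += agreement * p ** d
    return agreement * p ** k + (1 - p) / p * weighted


# Hypothetical gene rankings from two datasets (symbols chosen only as an example).
ranking_a = ["GBP5", "DUSP3", "KLF2", "BATF2", "FCGR1A"]
ranking_b = ["DUSP3", "GBP5", "FCGR1A", "KLF2", "ANKRD22"]
score = rank_biased_overlap(ranking_a, ranking_b)
print(round(score, 3))
```

Because the weights decay geometrically with depth, disagreements near the top of the two rankings lower the score more than disagreements further down, which suits comparisons of top-ranked DE genes or pathways.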

77

General

Ethereum and MultiChain blockchains as secure tools for individualized medicine

Charlotte Brannon, Gamze Gursoy, Sarah Wagner, Mark Gerstein

Yale University Computational Biology and Bioinformatics Program

Charlotte Brannon

With the rapidly decreasing cost of genome sequencing and advent of individualized medicine, reliance on individual genomic data will soon be integral to medical treatment decisions. For example, a patient’s personal genomic sequence will provide physicians with information on which to base tests and diagnoses. Similarly, pharmacogenomics data will reveal the most effective prescriptions for a particular patient. Genomic data will need to be shared efficiently among multiple parties. However, because these are sensitive personal data which will directly impact medical treatment decisions, they must be maintained in a secure, high-integrity fashion. Blockchain technology is one way to achieve secure, high-integrity data storage. We present two proof-of-concept solutions, one for storing and querying personal genomic sequence data in a MultiChain blockchain designed for direct sharing with physicians; and one for storing and querying gene-drug interaction data in an Ethereum blockchain smart contract designed for shared access among permissioned researchers and physicians. Despite the high security and integrity that comes with blockchain data storage, there is a trade-off with data access efficiency and storage costs. We overcome these challenges by developing novel storage techniques. When storing personal genomic sequence data, we do not store the actual sequence data but rather a set of meta-data which can be used in combination with a reference genome to reconstruct the original sequences. When storing pharmacogenomics data, we use an index-based, multi-mapping approach to provide time- and space-efficient insertion and querying.
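The idea of storing meta-data instead of the sequence itself can be illustrated with a minimal reference-based encoding. This is a sketch under strong assumptions, not the authors' format: it handles substitutions only (no insertions or deletions), and the names `encode`, `decode`, and `Diff` are hypothetical.

```python
from typing import List, Tuple

Diff = Tuple[int, str]  # (0-based position, observed base); hypothetical record layout


def encode(reference: str, sequence: str) -> List[Diff]:
    """Keep only the positions where the sequence differs from the reference."""
    if len(reference) != len(sequence):
        raise ValueError("this sketch handles substitutions only")
    return [(i, b) for i, (r, b) in enumerate(zip(reference, sequence)) if r != b]


def decode(reference: str, diffs: List[Diff]) -> str:
    """Rebuild the original sequence by applying the stored differences."""
    bases = list(reference)
    for pos, base in diffs:
        bases[pos] = base
    return "".join(bases)


reference = "ACGTACGTAC"
sample = "ACGTTCGTAA"
diffs = encode(reference, sample)      # two records instead of ten bases
restored = decode(reference, diffs)
print(diffs, restored == sample)
```

Only the differences need to be written to the chain; anyone holding the same reference can rebuild the sequence, which is where the storage saving would come from.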

78

General

Genomic predictors of L-asparaginase-induced pancreatitis in pediatric cancer patients

Britt I. Drögemöller, Galen E.B. Wright, Shahrad R. Rassekh, Shinya Ito, Bruce C. Carleton, Colin J.D. Ross, The Canadian Pharmacogenomics Network for Drug Safety Consortium

Faculty of Pharmaceutical Sciences, University of British Columbia, Vancouver, BC, Canada; BC Children’s Hospital Research Institute, University of British Columbia, Vancouver, BC, Canada; Department of Pediatrics, Faculty of Medicine, University of British Columbia, Vancouver, BC, Canada; Clinical Pharmacology and Toxicology, The Hospital for Sick Children, University of Toronto, Toronto, ON, Canada; Pharmaceutical Outcomes Programme, BC Children’s Hospital, Vancouver, BC, Canada

Britt Drögemöller

Background: L-asparaginase is highly effective in the treatment of pediatric acute lymphoblastic leukemia. Unfortunately, the use of this treatment is limited by the occurrence of pancreatitis, a severe and potentially lethal adverse drug reaction, which occurs in 2-18% of patients. As previous studies have been unable to identify strong associations between clinical variables and susceptibility to L-asparaginase-induced pancreatitis, genetic factors are expected to play an important role in this adverse drug reaction. Objectives: We sought to explore the role of these genetic susceptibility factors in L-asparaginase-induced pancreatitis in pediatric cancer patients. Methods: Patients who were treated with L-asparaginase were recruited from 13 pediatric oncology units across Canada (n=284) and extensive clinical data were collected for all patients. Genotyping was performed using the Illumina HumanOmniExpress and Global Screening Arrays, and pancreatic gene expression profiles were imputed in these individuals using GTEx v7 and S-PrediXcan. Genome- and transcriptome-wide association studies (GWAS and TWAS) were performed to identify associations with L-asparaginase-induced pancreatitis. Results: GWAS analyses identified significant associations between genetic variants in HLA-DQA1 and -DRB1 and pancreatitis, while TWAS revealed that individuals experiencing L-asparaginase-induced pancreatitis exhibited lower expression levels of HLA-DRB5. Further interrogation of the TWAS data revealed an enrichment in genes involved in the somatic diversification of immune receptors. Conclusions: These analyses uncovered an association between genetic variation in immune-related genes and the development of L-asparaginase-induced pancreatitis. These associations mirror previous associations of the HLA region with (i) pancreatitis induced by other drugs and (ii) L-asparaginase-induced hypersensitivity.

79

General

NITECAP: A novel method and interface for the identification of circadian behavior in highly parallel time-course data

Thomas G. Brooks1, Cris W. Lawrence1, Nicholas F. Lahens1, Soumyashant Nayak1, Dimitra Sarantopoulou1, Garret A. FitzGerald1,2, Gregory R. Grant3

1Institute for Translational Medicine and Therapeutics (ITMAT), University of Pennsylvania; 2Systems Pharmacology and Translational Therapeutics; 3Department of Genetics, University of Pennsylvania

Thomas Brooks

We introduce a new tool called NITECAP for the task of identifying circadian behavior in massively parallel measurements of biological entities; for example, finding circadian genes from gene expression time-course data measured by RNA-Seq or microarrays. NITECAP employs a permutation-based approach which uses a novel statistic designed to be sensitive to circadian behavior. NITECAP also uses an approach to multiple testing which produces q-values directly, without first generating p-values that then need to be adjusted. Our approach has several advantages, particularly when individual p-values are underpowered or unreliable. Importantly, we have developed an intuitive, user-friendly web-based interface which enables investigators to perform robust circadian analyses of this type directly, without expert informatics support. Users can quickly scroll through time-course profiles sorted by effect size, greatly facilitating the choice of significance thresholds, which currently requires making blind choices of numerical cutoffs. Putting this type of analysis in the hands of the investigators can significantly streamline their research. The website also provides other standard significance tests such as JTK and ANOVA, and tools to perform comparative studies, such as finding phase or amplitude differences between different conditions. NITECAP is freely available for public use at: http://www.nitecap.org

80

General

The Interplay of Obesity and Race/Ethnicity on Major Perinatal Complications

Yaadira Brown, MPH1; Olubode A. Olufajo, MD, MPH2; Edward E. Cornwell III, MD2; William Southerland, PhD3

1Research Centers in Minority Institutions: Howard University, Howard University College of Medicine; 2Research Centers in Minority Institutions: Howard University, Clive Callender Howard-Harvard Health Sciences Outcomes Research Center; 3Research Centers in Minority Institutions: Howard University

Yaadira Brown

Background: It has been established that a significant disparity exists in the rates of adverse perinatal outcomes across different racial/ethnic groups, with non-Hispanic Black women generally being most impacted. There is also evidence that obesity is associated with adverse perinatal outcomes. Although some studies have examined the impact of race/ethnicity and obesity on adverse perinatal outcomes, most studies have done so using local or statewide data. This study aims to use a national sample to determine the role of obesity in the racial/ethnic disparities seen in adverse perinatal outcomes in the United States. Methods: Data from the National Inpatient Sample were used to select pregnant women admitted for delivery between 2010 and 2014. Demographics (race/ethnicity, insurance type, household income, co-morbidities) and hospital characteristics were extracted. Race/ethnicity was categorized as Non-Hispanic Whites (NHW), Non-Hispanic Blacks (NHB), and Hispanics. Outcomes of interest were gestational diabetes, pre-eclampsia, pre-term birth, and hospital mortality. Multivariate logistic regressions were performed to determine the independent predictors of the outcomes, using two sets of models: one which included obesity as a variable in the model and one which did not. The differences between the two sets of models were compared by performing the Wald test. Results: Our cohort consisted of 15,561,942 pregnant individuals admitted for delivery. There were 9,247,729 (59.43%) NHW, 2,552,569 (16.4%) NHB, and 3,761,644 (24.17%) Hispanic. Compared to other groups, NHB had significantly higher rates of pre-eclampsia (5.1%), pre-term birth (9.4%), and hospital mortality (0.11%). They also had the highest rates of obesity (9.0%). On multivariate analysis, NHB were more likely to have pre-eclampsia (Adjusted Odds Ratio [aOR] 1.26; 95% Confidence Interval [CI] 1.23-1.29), pre-term birth (aOR 1.38; 95% CI 1.34-1.41), and hospital mortality (aOR 2.05; 95% CI 1.2-3.38) when compared to NHW. However, they had a similar risk for gestational diabetes (aOR 0.94; 95% CI 0.91-0.96) as NHW. Obesity was significantly associated with gestational diabetes (aOR 3.08; 95% CI 3.02-3.15), pre-eclampsia (aOR 2.14; 95% CI 2.09-2.19), and pre-term birth (aOR 1.04; 95% CI 1.01-1.06). Although the differences were minimal, the regression models that included obesity as a variable better predicted the outcomes than those that did not when assessing gestational diabetes, pre-eclampsia, and pre-term birth. Conclusion: These findings further confirm that racial/ethnic disparities exist amongst adverse perinatal outcomes, with NHB being disproportionately affected. They also suggest that obesity plays a significant role in the racial/ethnic disparities that do exist for the adverse perinatal outcomes measured, other than hospital mortality. These data suggest that addressing obesity in the population may be beneficial in improving perinatal outcomes, but they also suggest that more research is needed to identify the major factors that drive the racial/ethnic disparities that exist amongst perinatal outcomes in the United States.
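The cohort composition reported in the Results can be re-derived from the group counts. The short check below uses only the numbers given in the abstract; the variable names are chosen here for illustration.

```python
# Group counts as reported in the abstract.
counts = {"NHW": 9_247_729, "NHB": 2_552_569, "Hispanic": 3_761_644}

total = sum(counts.values())
# Percentage share of each group, rounded to two decimals as in the abstract.
shares = {group: round(100 * n / total, 2) for group, n in counts.items()}

print(total)
print(shares)
```

The three counts sum to the stated cohort size of 15,561,942, and the rounded shares match the reported 59.43%, 16.4%, and 24.17%.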

81

General

A Comparison of Pharmacogenomic Information in FDA-Approved Drug Labels and CPIC Guidelines

Katherine I. Carrillo1, Teri E. Klein2

1Henry M. Gunn High School, Palo Alto, CA; 2Stanford University, Stanford, CA

Katherine Carrillo

Pharmacogenomics (PGx) is useful in helping to predict a patient’s likely reaction to a medication based on their genotype, allowing for personalized medicine. The FDA maintains a “Table of Pharmacogenomic Biomarkers in Drug Labeling” (https://www.fda.gov/drugs/science-and-research-drugs/table-pharmacogenomic-biomarkers-drug-labeling) consisting of pharmacogenomic information found in the drug labeling. However, many labels on the list do not contain advice for a clinician about how or when to use a patient’s genetic information. Guidelines created by the Clinical Pharmacogenetics Implementation Consortium (CPIC; https://cpicpgx.org/) contain information about how to use patient genetic information when prescribing drugs. Also, CPIC provides guidelines for some drugs not currently on the FDA biomarker list, though it does not provide guidelines for every drug on the biomarker list. Using PharmGKB-annotated FDA-approved labels (through October 2019), we evaluated label information to determine (1) which labels contained any kind of prescribing information, including a suggested alternate drug, dosing information, or special considerations based on the patient’s genotype/metabolizer status, (2) which PharmGKB-annotated labels were present on the FDA biomarker list, and (3) what genes were involved. We did not include FDA labels annotated for genetic variation in cancer cells; only germline variation was included. We compared all available CPIC guideline recommendations to the information from the labels and identified where the labels and guidelines are similar and where they differ. PharmGKB has 223 annotations (not including 82 annotations for cancer cell DNA variation) based on 219 FDA-approved drug labels. Of these, 199 labels are currently on the biomarker list and 17 were on the biomarker list at one time but have been removed by the FDA. Twenty labels have dosing information and 35 recommend an alternate drug based on genotype/metabolizer phenotype. Another 34 labels have some other special consideration, but most labels on the biomarker list (136) have no guidance for clinicians about what to do about the biomarker, if anything. There are 45 drugs with published CPIC guidelines (https://cpicpgx.org/genes-drugs/). Thirty-six of the drugs have a label on the FDA biomarker list, but the information on the label does not always match the guideline. Only 21 of the CPIC drugs have labels with guidance. For some drugs, the PGx information on the labels is similar to the CPIC guidelines, but for many others it is different. The FDA biomarker list covers more drugs than CPIC has written guidelines for, and in some cases the labels tell clinicians when they should test a patient, while CPIC doesn’t talk about testing. However, for most drugs, the labels don’t give clinicians a lot of information about what to do with their patients’ genetic test results. For the drugs with CPIC guidelines, there is more information about how to use genetic test results and why. Funded by NIH/NIGMS R24 GM61374.

82

General

xTEA: a transposable element insertion analyzer for genome sequencing data from multiple technologies

Chong Chu1, Rebeca Monroy2, Soohyun Lee1, E. Alice Lee2, Peter J. Park1

1Harvard Medical School, 2Boston Children's Hospital

E. Alice Lee

Transposable elements (TEs) comprise nearly 50% of the human genome. Although most of the TEs are now silent, several types of retrotransposons including LINE-1, Alu, and SVA are still active. Somatic TE insertions have been shown to occur frequently in multiple tumor types [1,2] and at a low rate in neurons of phenotypically normal individuals [3]. Multiple tools have been developed to call TE insertions from genome sequencing data, but an efficient tool that can identify both germline and somatic TE insertions with high sensitivity and specificity is still lacking. Moreover, newer technologies such as 10X Linked-Read and PacBio or Nanopore long-read sequencing provide an unprecedented opportunity to study TEs; however, current methods do not take advantage of these data types. Here, we present a new computational tool, xTEA, building on our previous algorithm TEA [1]. This tool identifies TE insertions from Illumina paired-end reads, 10X Linked-Reads, long reads, or a combined dataset. xTEA outperforms MELT [4] and Traffic-mem [5] on normal and tumor Illumina data, respectively. A comparison of different sequencing platforms reveals that the analysis of long reads had greater sensitivity and specificity, especially in repetitive regions. Both 10X Linked-Reads and long reads demonstrated clear advantages over short reads in constructing full-length TE insertions. Better performance was achieved on hybrid data compared to single-platform data. Using 22 human samples with either PacBio or Nanopore long reads and matched short reads, we uncovered LINE-1 internal SV hotspots and SVA internal VNTR expansion. xTEA is a comprehensive cross-platform TE insertion-calling tool. It can be deployed on a computing cluster, AWS, and Google Cloud, and is efficient for large cohort analysis. xTEA is publicly available at https://github.com/parklab/xTEA.

References
[1] Lee, Eunjung, et al. "Landscape of somatic retrotransposition in human cancers." Science 337.6097 (2012): 967-971.
[2] Rodriguez-Martin, Bernardo, et al. "Pan-cancer analysis of whole genomes reveals driver rearrangements promoted by LINE-1 retrotransposition in human tumours." bioRxiv (2017): 179705.
[3] Evrony, Gilad D., et al. "Cell lineage analysis in human brain using endogenous retroelements." Neuron 85.1 (2015): 49-59.
[4] Gardner, Eugene J., et al. "The Mobile Element Locator Tool (MELT): population-scale mobile element discovery and biology." Genome Research 27.11 (2017): 1916-1929.
[5] Tubio, Jose M.C., et al. "Extensive transduction of nonrepetitive DNA mediated by L1 retrotransposition in cancer genomes." Science 345.6196 (2014): 1251343.


83

General

Go Get Data (GGD): simple, reproducible access to scientific data

Michael Cormier1, Jon Belyeu1, Brent Pedersen1, Joe Brown1, Johannes Koster2, Aaron R. Quinlan1

1Department of Human Genetics, University of Utah, Salt Lake City, UT, USA; 2Algorithms for reproducible bioinformatics, Institute of Human Genetics, University of Duisburg-Essen, Essen, NRW, Germany

Aaron Quinlan

Genomics research is complicated by the difficulty of identifying, collecting, and integrating the numerous datasets and annotations germane to our experiments. Furthermore, these data exist in disparate sources, and are stored in diverse, often abused formats pertaining to different genome builds. These complexities waste time, inhibit reproducibility, and curtail research creativity. Inspired by the success of software package managers, we have developed Go Get Data (GGD; https://gogetdata.github.io/) as a fast, reproducible approach to install standardized packages of data and annotations for genomics research.


84

General

Global epigenomic regulation of gene expression and cellular proliferation in T-cell leukemia

Sinisa Dovat, Yali Ding, Bo Zhang, Jonathon L. Payne, Feng Yue

Pennsylvania State University College of Medicine, Hershey, PA, USA

Sinisa Dovat

Ikaros encodes a DNA-binding protein that functions as a tumor suppressor in T-cell acute lymphoblastic leukemia (T-ALL). Deletion and/or functional inactivation of Ikaros results in the development of high-risk leukemia. The mechanisms through which Ikaros regulates gene expression and tumor suppression in T-ALL are unknown. Ikaros haplo-knockout mice develop T-ALL with 100% penetrance with arrest of T-cell differentiation. During the process of malignant transformation to T-ALL, Ikaros haploinsufficient thymocytes lose their remaining wild type Ikaros allele. Re-introduction of Ikaros into Ikaros-null T-ALL cells results in cessation of cellular proliferation and induction of T-cell differentiation. Thus, this is an optimal system for studying Ikaros tumor suppressor function because it captures the role of Ikaros in the transition from a malignant state (Ikaros-null T-ALL) to a non-malignant state (following Ikaros re-introduction). We used ATAC-seq and ChIP-seq of H3K4me1, H3K4me3, H3K27ac, and Ikaros to perform dynamic, global epigenomic and gene expression analyses at several time points in Ikaros-null T-ALL and following Ikaros re-introduction in order to determine the mechanisms of Ikaros’ tumor suppressor activity. Expression analysis identified a large number of novel signaling pathways that are directly regulated by Ikaros and Ikaros-induced enhancers, and that are responsible for the cessation of proliferation and induction of T-cell differentiation in T-ALL cells. Epigenomic analysis identified novel Ikaros functions in the epigenetic regulation of gene expression: Ikaros directly regulates de novo formation and depletion of enhancers; de novo formation of active enhancers and activation of poised enhancers; and Ikaros directly induces the formation of super-enhancers. Global analysis of chromatin accessibility revealed that Ikaros binding resulted in the opening of over 3400 previously-inaccessible chromatin sites. This is accompanied by de novo enrichment of H3K4me1 and H3K4me3 modifications and formation of de novo enhancers and promoters. These data demonstrate that Ikaros has pioneer activity and triggers coordinated regulation of gene expression. Ikaros pioneering activity was further determined by direct binding of Ikaros to reconstituted nucleosomes by electromobility shift assay. Dynamic analyses demonstrate the long-lasting effects of Ikaros’ DNA binding on enhancer activation, de novo formation of enhancers and super-enhancers, and chromatin accessibility. In conclusion, our results establish that Ikaros’ tumor suppressor function occurs via global regulation of the enhancer and super-enhancer landscape, along with regulation of chromatin accessibility, and identify novel tumor suppressor regulatory pathways in T-ALL.


85

General

A pharmacogenomic investigation of the cardiac safety profile of ondansetron in children and in pregnant women

Galen E.B. Wright, Britt I. Drögemöller, Jessica Trueman, Kaitlyn Shaw, Michelle Staub, Shahnaz Chaudhry, Sholeh Ghayoori, Fudan Miao, Michelle Higginson, Gabriella S.S. Groeneweg, James Brown, Laura A. Magee, Simon D. Whyte, Nicholas West, Sonia Brodie, Geert ’t Jong, Howard Berger, Shinya Ito, Shahrad R. Rassekh, Shubhayan Sanatani, Colin J.D. Ross, Bruce C. Carleton

British Columbia Children’s Hospital Research Institute, Vancouver, British Columbia, Canada; Pharmaceutical Outcomes Programme, British Columbia Children’s Hospital, Vancouver, British Columbia, Canada; Division of Translational Therapeutics, Department of Pediatrics, University of British Columbia, Vancouver, British Columbia, Canada; Faculty of Pharmaceutical Sciences, University of British Columbia, Vancouver, British Columbia, Canada; Clinical Research Unit, Children's Hospital Research Institute of Manitoba, Winnipeg, Manitoba, Canada; Division of Clinical Pharmacology and Toxicology, The Hospital for Sick Children, Toronto, Ontario, Canada; British Columbia Women’s Hospital and Health Centre, Vancouver, British Columbia, Canada; Department of Anesthesiology, Pharmacology and Therapeutics, University of British Columbia, Vancouver, British Columbia, Canada; School of Life Course Sciences, Faculty of Life Sciences and Medicine, King's College, London, United Kingdom; Department of Pediatric Anesthesia, British Columbia Children's Hospital, Vancouver, British Columbia, Canada; Max Rady College of Medicine, Rady Faculty of Health Sciences, University of Manitoba, Winnipeg, Manitoba, Canada; Department of Obstetrics and Gynecology, St. Michael's Hospital, Toronto, Ontario, Canada; EpiMethods Consulting, Toronto, Ontario, Canada; Division of Cardiology, Department of Pediatrics, Children's Heart Centre, BC Children's Hospital, University of British Columbia, Vancouver, Canada

Galen Wright

Background: 5-HT3 receptor antagonists, such as ondansetron, are highly effective medications for the treatment of nausea and vomiting. However, these medications are also associated with prolongation of the QT interval, placing patients at risk of cardiac adverse events. Pharmacogenomic information for therapeutic response to ondansetron exists, particularly pertaining to CYP2D6, but no study has been performed on genetic factors that influence the cardiac safety of this medication.

Objectives: Determine ondansetron-induced cardiac electrophysiological changes in three unique patient cohorts and identify pharmacogenomic predictors of QT interval prolongation.

Methods: Three patient groups receiving ondansetron for the prevention of nausea and vomiting were recruited and followed prospectively (pediatric post-surgical patients n=101; pediatric oncology patients n=98; pregnant women n=62). Electrocardiograms were conducted at baseline and post-ondansetron administration. Pharmacogenomic associations were then assessed via analyses of comprehensive CYP2D6 genotyping data and genome-wide association analyses.

Results: In the entire cohort, 62 patients (24.1%) were defined as cases based on Bazett-corrected QTc values. The most significant shift from baseline occurred at five minutes post-administration (P=9.8x10^-4). Genome-wide analyses identified novel candidate genes for this drug-induced phenotype. The two most significant associations were observed for a missense variant in TLR3 (rs3775291; P=2.00x10^-7) and an eQTL for SLC36A1 (rs34124313; P=1.97x10^-7). These genes are implicated in serotonin- and QT-related traits and therefore likely represent biologically relevant findings. CYP2D6 activity score was not associated with case-control status.

Conclusions: The results of this study provide the first step towards understanding the genomic basis of cardiac changes occurring after ondansetron use in children and pregnant women, with the overall goal to improve the safety of these commonly used antiemetic medications.
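The Bazett correction named in the Results divides the measured QT interval by the square root of the RR interval expressed in seconds. The following is a minimal Python sketch of that formula only. The function names and the example values are illustrative and are not study data, and the abstract does not state the QTc cut-off used to define cases, so none is assumed here.

```python
import math


def rr_from_heart_rate(bpm: float) -> float:
    """RR interval in seconds for a heart rate given in beats per minute."""
    if bpm <= 0:
        raise ValueError("heart rate must be positive")
    return 60.0 / bpm


def bazett_qtc(qt_ms: float, rr_s: float) -> float:
    """Bazett-corrected QT in ms: QTc = QT / sqrt(RR), with RR in seconds."""
    if rr_s <= 0:
        raise ValueError("RR interval must be positive")
    return qt_ms / math.sqrt(rr_s)


# Illustrative values only (not from the study): QT of 400 ms at 60 bpm.
qtc_example = bazett_qtc(400.0, rr_from_heart_rate(60.0))
```

At 60 beats per minute the RR interval is one second, so the corrected and uncorrected values coincide; at faster heart rates the correction raises the QTc relative to the raw QT.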


86

General

TREND: a platform for exploring protein function in prokaryotes using phylogenetics, domain architectures, and gene neighborhoods information.

Vadim M. Gumerov, Igor B. Zhulin

The Ohio State University

Vadim Gumerov

Key steps in a computational study of protein function involve analysis of (i) relationships between homologous proteins, (ii) protein domain architecture, and (iii) gene neighborhoods the corresponding proteins are encoded in. Each of these steps requires a separate computational task and sets of tools. Combining the results into a complete analysis is usually done by hand, which is time-consuming and error-prone. Here we present a new platform, TREND (tree-based exploration of neighborhoods and domains), which can perform all the necessary steps in automated fashion and put the derived information into phylogenomic context, thus making evolutionary based protein function analysis more efficient. TREND is freely available at http://trend.zhulinlab.org. TREND consists of two pipelines: (1) Domains, which identifies protein domains, transmembrane regions and low-complexity segments, and maps this information on the phylogenetic tree, and (2) Neighborhoods, which identifies gene neighborhoods for the given set of protein sequences, clusters the genes based on shared domains of the encoded proteins, identifies operons and puts the derived data into phylogenomic context. Locally stored databases of the Pfam profile Hidden Markov models (HMMs) and CDD position-specific scoring matrices are used as a source of models for domains identification. Another source is a rich collection of signal-transduction specific profile HMMs derived from MiST database. The pipelines are highly customizable. On start, both pipelines first align provided proteins and build phylogenetic trees. These steps can be skipped if a researcher already has an alignment or a tree and would like to use them instead. Optionally redundancy of the sequences can be reduced. Instead of protein sequences, protein identifiers can be provided as input; corresponding sequences will be fetched from RefSeq and MiST databases. Results of the pipelines are presented as interactive pictures with cross-links to Pfam, CDD, RefSeq and MiST databases. All produced results can be downloaded for subsequent analysis.


87

General

TrackSigFreq: subclonal reconstructions based on mutation signatures and allele frequencies

Caitlin F. Harrigan1,2,4, Yulia Rubanova1,2,4, Quaid Morris1,2,3,4,5,6, Alina Selega2,4

1Department of Computer Science, University of Toronto, Toronto, Canada; 2Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Canada; 3Department of Molecular Genetics, University of Toronto, Toronto, Canada; 4Vector Institute, Toronto, Canada; 5Ontario Institute for Cancer Research, Toronto, Canada; 6Memorial Sloan Kettering Cancer Centre, New York, USA (pending)

Cait Harrigan

Mutational signatures are patterns of mutation types, many of which are linked to known mutagenic processes. Signature activity represents the proportion of mutations a signature generates. In cancer, cells may gain advantageous phenotypes through mutation accumulation, causing rapid growth of that subpopulation within the tumour. The presence of many subclones can make cancers harder to treat and have other clinical implications. Reconstructing changes in signature activities can give insight into the evolution of cells within a tumour. Recently, we introduced a new method, TrackSig, to detect changes in signature activities across time from a single bulk tumour sample. By design, TrackSig is unable to identify mutation populations with different frequencies but little to no difference in signature activity. Here we present an extension of this method, TrackSigFreq, which enables trajectory reconstruction based on both observed density of mutation frequencies and changes in mutational signature activities. TrackSigFreq preserves the advantages of TrackSig, namely optimal and rapid mutation clustering through segmentation, while extending it so that it can identify distinct mutation populations that share similar signature activities.


88

General

A Flexible Pipeline for the Prediction of Biomarkers Relevant to Drug Sensitivity

V. Keith Hughitt1, Sayeh Gorjifard1, Aleksandra M. Michalowski1, John K. Simmons2, Ryan Dale1, Eric C. Polley3, Jonathan J. Keats4, Beverly A. Mock1

1NCI, 2Personal Genome Diagnostics, 3Mayo Clinic, Rochester, 4TGen

V. Keith Hughitt

Recent years have seen an explosion in the availability of paired molecular profiling and drug screen data, providing an unprecedented opportunity for the development of targeted therapies based on an individual’s genetic background. Despite a number of recent successes in diseases ranging from cystic fibrosis to cancer, significant hurdles remain in our ability to accurately predict treatments based on molecular profiling data. In particular, few tools exist that allow the integration of heterogeneous data types (e.g. genomic, transcriptomic, and somatic mutations), along with high-throughput drug screen data, to make predictions about treatment efficacy. Here, we describe a generalized open-source pipeline developed for the analysis of precision medicine data, Pharmacogenomics Prediction Pipeline, or “P3”. The modular design of P3 enables the inclusion of arbitrary input data types and the selection from multiple alternative machine learning algorithms, while automated statistical and visualization reporting steps incorporated throughout the pipeline assist in parameter tuning and early detection of problematic data elements. By incorporating external biological annotations from sources such as The Molecular Signatures Database (MSigDB), Drug Signatures Database (DSigDB), and DrugBank, P3 is able to detect important pathways correlated with drug sensitivity, while the inclusion of molecular profiling and clinical data from external patient and cell lines datasets allows P3 to focus its efforts on genes which are most likely to play a role in therapeutic response. To demonstrate the use of P3 for preclinical biomarker prediction, we applied P3 to an unpublished multiple myeloma dataset consisting of exome, RNA-Seq, and drug screen data for 1900 compounds across 45 tumor cell lines. Furthermore, gene expression and clinical data from 20 additional publicly available patient and cell line multiple myeloma datasets (>5,500 samples in total), along with data from the GDSC and CCLE drug sensitivity experiments, were also analyzed, providing a rich source of information with respect to the biological relevance of putative biomarkers detected by the pipeline.


89

General

Creating a Metabolic Syndrome Research Resource (MetSRR)

Willysha Jenkins1, Christian Richardson2, ClarLynda Williams-DeVane PhD1

1Fisk University Nashville TN, 2Duke University Durham NC

Willysha Jenkins

Metabolic syndrome (MetS) is a multifaceted syndrome. Risk factors include visceral adiposity, dyslipidemia, hyperglycemia, hypertension, and environmental factors. An established component of chronic disease sequela, MetS leads to an increased risk of cardiovascular disease and type 2 diabetes. MetS also leads to an increased risk of stroke. Comparative studies have identified heterogeneity in the pathology of MetS across groups; however, the etiology of these differences has yet to be elucidated. Despite the presence of public repositories of biological MetS-related data, accessing and working with said data has its challenges. The process of querying databases, wrestling with software and wrangling data into workable formats prior to analysis is both cumbersome and time consuming. The Metabolic Syndrome Research Resource (MetSRR) is a curated database that provides access to MetS associated biological and ancillary data. It is an amalgamation of current and potential biomarkers of MetS extracted from relevant National Health and Nutrition Examination Survey (NHANES) data from 1999-2016. Each potential biomarker selection was driven by insights elucidated by the review of over 100 peer-reviewed articles. It includes 28 demographic, survey and known MetS related variables. There are 9 curated categorical variables and 42 potentially novel biomarkers. All measures are captured from over 90,000 individuals. This biocuration effort will provide increased access to curated MetS related data. It will also serve as a hypothesis generation tool for disparate MetS etiology discovery, providing the ability to generate and export ethnic group/race, sex, and age-specific curated datasets. MetSRR seeks to broaden participation in research efforts to identify clinically evaluative disparate MetS biomarkers. To the best of our knowledge, MetSRR is the only MetS specific database targeted at uncovering the disparate etiology of MetS through biocuration.


90

General

Utilizing cohort information to find causative variants

Senay Kafkas, Robert Hoehndorf

Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955-6900 Saudi Arabia

Senay Kafkas

Identification of causative variants in genomic data is challenging. Current studies focus on prioritizing variants within individual genomes, or apply statistical methods (e.g. GWAS) to large cohorts. With the rapid advancements and cost decrease in NGS, scientists are able to produce sequence data from large disease cohorts and healthy populations. For example, UK Biobank makes available genotype to phenotype relations for >500,000 individuals and whole exome sequencing (WES) data for 50,000 individuals. Patients with the same/similar set of phenotypes may share the same/biologically related genetic abnormalities and risk factors. The availability of these datasets may allow us to stratify individuals by their phenotype and use this information to identify causative variants within large cohorts. We propose a new method that stratifies patients by their phenotypes and identifies the set of causative variants which can explain phenotypes in most individuals within a cohort from WES/WGS. First, we generated and used synthetic disease cohorts to evaluate our method. We used the human genotype-phenotype associations from ClinVar and the sequence data from 1000 Genomes and generated synthetic cohorts with different population sizes for 200 randomly selected diseases from ClinVar. To generate a synthetic disease cohort of size N, first we picked randomly N individuals from 1000 Genomes and then for each individual, we picked randomly one of the variants of the given disease and added it to the genotype of the given individual. We pre-processed the sequence data by annotating with CADD and selecting only the most deleterious variant of a given gene for each individual. Furthermore, we “normalize” pathogenicity scores based on their frequencies within a population in order to account for different distribution within genes based on their length. We then apply our method on UK Biobank. We developed a method that identifies causative variants by utilizing information about shared phenotypes within a cohort and compared them against individually prioritizing variants using WES/WGS data and average gene ranks. Our approach relies on a machine learning model trained on a pathogenicity prediction score (e.g. CADD), the frequency of observing a pathogenicity score above a certain threshold in the same gene within a population, and uses this cohort and phenotype-derived information as features to predict causative variants within individual genome sequences. Our method can identify causative variants in small and medium-sized cohorts (2 to 100 individuals). As the disease becomes more complex (i.e. involving harmful variants in multiple genes), our machine learning model improves over established methods in particular in larger cohorts (>80 individuals). Currently, we applied our method on UK Biobank and suggest candidate causative variants for 1499 complex diseases.
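The synthetic-cohort construction and the per-gene pre-processing step described in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names, the dictionary-based data layout, and sampling individuals without replacement are all hypothetical choices, and `score_of` merely stands in for a pathogenicity score such as CADD.

```python
import random


def make_synthetic_cohort(genomes, disease_variants, n, seed=0):
    """Pick n individuals at random and add one random disease variant to each.

    genomes: dict of individual id -> set of variant ids (background genotype)
    disease_variants: list of variant ids associated with one disease
    Returns dict of individual id -> (genotype incl. added variant, added variant).
    The input genomes are left unmodified.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(genomes), n)  # n distinct individuals (assumption)
    cohort = {}
    for individual in chosen:
        causal = rng.choice(disease_variants)
        cohort[individual] = (set(genomes[individual]) | {causal}, causal)
    return cohort


def most_deleterious_per_gene(variants, gene_of, score_of):
    """Keep only the highest-scoring variant of each gene for one individual."""
    best = {}
    for variant in variants:
        gene = gene_of[variant]
        if gene not in best or score_of[variant] > score_of[best[gene]]:
            best[gene] = variant
    return best
```

Recording which variant was added to each individual gives the ground truth against which a prioritization method can later be scored.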


91

General

Integrated analysis of JAK-STAT pathway in homeostasis, simulated inflammation and tumour

Milica Krunic1, Anzhelika Karjalainen1, Mojoyinola Joanna Ola1, Stephen Shoebridge1, Sabine Macho-Maschler1, Caroline Lassnig1, Andrea Poelzl1, Matthias Farlik2, Nikolaus Fortelny2, Christoph Bock2, Birgit Strobl1, Mathias Mueller1

1Institute of Animal Breeding and Genetics and Biomodels Austria University of Veterinary Medicine Vienna Austria; 2CeMM – Center for Molecular Medicine Austrian Academy of Sciences Vienna Austria

Milica Krunic

Janus kinases (JAKs) and signal transducers and activators of transcription (STATs) play a key role in cytokine signalling and in the defence against infection and cancer. JAK-STAT signalling components interact with chromatin remodelling proteins and change chromatin architecture/landscape during cell differentiation and recognition and elimination of pathogens. Using different sequencing approaches (ATAC-Seq, ChIPmentation, single-cell RNA-Seq, RNA-Seq), our goal is to untangle the roles of JAK-STAT proteins in shaping chromatin landscapes of myeloid and lymphoid cells in homeostasis, sterile (simulated) inflammation and within tumour microenvironment. Additionally, we are investigating how evolutionary conserved STAT protein isoforms interact with chromatin and co-regulatory proteins to induce cell type- and gene-specific responses. The poster shows our summarised findings as a result of integration of different approaches.


92

General

BEERS2: The Next Generation of RNA-Seq Simulator

Nicholas F. Lahens1, Thomas G. Brooks1, Dimitra Sarantopoulou1, Soumyashant Nayak1, Cris W. Lawrence1, Anand Srinivasan2, Jonathan Schug3,4, Garret A. FitzGerald1,5, John B. Hogenesch6, Yoseph Barash4, Gregory R. Grant1,4

1Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA; 2PMACS Enterprise Research Applications and High Performance Computing, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA; 3Institute for Diabetes, Obesity, and Metabolism, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA; 4Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA; 5Department of System Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA; 6Division of Human Genetics, Department of Pediatrics, Center for Chronobiology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH

Nicholas Lahens

The accurate interpretation of RNA-Seq data presents a moving target as scientists continue to introduce new experimental techniques and analysis algorithms. This challenge has led researchers to perform a substantial number of benchmarking studies in order to determine best analysis practices. Simulated datasets have proven to be an invaluable tool in these efforts. Despite this strong need for simulated data, only a few RNA-Seq simulators have been released in the public domain, and all of them are based on simplifying assumptions that limit their utility. To address these shortcomings and generate realistic simulated data we are developing the Benchmarker for Evaluating the Effectiveness of RNA-Seq Software (BEERS) 2: an open-source, modular simulator that models each step in the process of converting RNA molecules into sequencing reads. We take an empirical approach to generating realistic RNA samples reflecting biological variability, alternative splicing, and allele-specific expression, which uses real data to train the parameters. Next, we model biochemical reactions and biases from each step in library construction as separate modules. Using an object-oriented paradigm, each module has well-defined inputs and outputs allowing users to easily substitute new modules. This design gives BEERS2 the flexibility to model changes to library construction and sequencing protocols, evolving in parallel with sequencing technology. BEERS2 is open source, freely available, and will be a crucial tool for the community as we continue to develop standards for transcriptome analysis.
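The idea of library-construction steps as interchangeable modules with well-defined inputs and outputs can be illustrated with a small object-oriented sketch. Everything below is hypothetical: the class names, the `run` method, and the representation of a molecule as a plain nucleotide string are assumptions made for illustration and do not describe the actual BEERS2 code.

```python
from typing import List, Sequence


class Fragmentation:
    """Hypothetical module: cut each molecule into consecutive pieces of at most max_len."""

    def __init__(self, max_len: int):
        self.max_len = max_len

    def run(self, molecules: List[str]) -> List[str]:
        return [m[i:i + self.max_len]
                for m in molecules
                for i in range(0, len(m), self.max_len)]


class SizeSelection:
    """Hypothetical module: keep only molecules of at least min_len bases."""

    def __init__(self, min_len: int):
        self.min_len = min_len

    def run(self, molecules: List[str]) -> List[str]:
        return [m for m in molecules if len(m) >= self.min_len]


def run_pipeline(molecules: List[str], modules: Sequence) -> List[str]:
    """Feed the output of each module into the next one, in order."""
    for module in modules:
        molecules = module.run(molecules)
    return molecules
```

Because every module takes and returns the same type, a step can be swapped or reordered without touching the others, for example `run_pipeline(["AAAACCCCGG"], [Fragmentation(4), SizeSelection(3)])`.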


93

General

Effect Modification by Age on a Diagnostic Three-Gene-Signature in Patients with Active Tuberculosis

Lauren McDonnell1, Carly A. Bobak1,2, Matthew Nemesure1, Justin Lin1, Jane E. Hill1

1Thayer School of Engineering at Dartmouth College, 2Geisel School of Medicine at Dartmouth College

Lauren McDonnell

Introduction
Tuberculosis (TB) is the leading cause of death from a single infectious agent worldwide (1). In 2017, there were 10 million reported cases of TB and another 1.3 million deaths from the disease (1). It is currently the leading killer for individuals who are HIV positive (1). In 2014, the WHO developed the ambitious Sustainable Development Goals (SDGs) which included "End TB", a major program aiming to eradicate the TB epidemic by 2030 (2). Accomplishing this will require more advanced diagnostics that are less invasive and determine the disease status more quickly and more reliably. In our analysis, we aim to model risk factors associated with the development of TB. Here, we are looking at demographic features from multi-cohort studies pulling data from thirty different countries from the Gene Expression Omnibus examining patients with active TB, latent TB, other diseases, and healthy controls. The data is pulled predominantly from developing countries, but also includes samples from developed countries, including the UK, France, Germany, and the United States. In total, the dataset includes 3,096 participants. Meta-analyses of similar datasets have proposed a three-gene-score as a "global" tuberculosis metric (3). This type of analysis suggests that all active TB patients, regardless of other factors, will express this gene score. Our hypothesis is that this active TB will be additionally mediated by demographic factors such as age and HIV status that are associated with TB.

Methodology
We performed a multivariate logistic regression analysis to identify demographic features associated with culture-confirmed Tuberculosis. The model features included age, HIV status, and gene expressions for each gene individually (GBP5, DUSP3, and KLF2), as well as an interaction term for HIV and age with each of the three genes.

Results
The results of our multivariate logistic regression suggest that age modifies all three genes in the proposed global gene signatures (p-values of 5.38e-05, 6.75e-05, and 0.01012, for GBP5, KLF2 and DUSP3 respectively). Initial findings also indicate that HIV status is a mediator of the effect of GBP5 (p-value of 0.03437). Knowing that the relationship between the gene expression of these three genes varies by demographics may change the way that a diagnostic is implemented in clinic. Our hope is that this analysis will be used to further refine the three-gene signature for specific demographic groups where it may be most effective in diagnosing active TB.

Citations
(1) WHO Global Tuberculosis Report 2018 www.who.int/tb/publications/global_report/en/
(2) Ending Tuberculosis by 2030: Can We Do It? A. B. Suthar, R. Zachariah, Harries https://www.ingentaconnect.com/contentone/iuatld/ijtld/2016/00000020/00000009/art00007?crawler=true
(3) Genome-Wide Expression for Diagnosis of Pulmonary Tuberculosis: a Multicohort Analysis https://www.ncbi.nlm.nih.gov/pubmed/26907218
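One way to write down the logistic regression described in the Methodology is given below, as a sketch only. The notation is an assumption introduced here: A denotes age, H an indicator for HIV status, x_g the expression of gene g, and the beta, gamma and delta coefficients are illustrative symbols. The abstract does not give the exact parameterization, so reading "an interaction term for HIV and age with each of the three genes" as separate age-by-gene and HIV-by-gene terms is an interpretation.

```latex
\operatorname{logit} \Pr(\mathrm{TB} = 1)
  = \beta_0 + \beta_A A + \beta_H H
  + \sum_{g \in \{\mathrm{GBP5},\, \mathrm{DUSP3},\, \mathrm{KLF2}\}}
    \left( \beta_g x_g + \gamma_g A\, x_g + \delta_g H\, x_g \right)
```

Under this reading, effect modification by age on gene g corresponds to a non-zero gamma_g, and modification by HIV status to a non-zero delta_g.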


94

General

Classification and mutation prediction from gastrointestinal cancer histopathology images using deep learning

Sung Hak Lee1, Hyun-Jong Jang2

1Department of Hospital Pathology, Seoul St. Mary’s Hospital, College of Medicine, The Catholic University of Korea, 2Department of Physiology, College of Medicine, The Catholic University of Korea

Sung Hak Lee

BACKGROUND: Although microscopic analysis of tissue slides has been the basis for disease diagnosis for decades, intra- and inter-observer variabilities remain issues to be resolved. The recent introduction of digital scanners has allowed researchers to use deep learning in the analysis of tissue images because many H&E whole slide images (WSIs) are available. In the present study, we investigated the possibility of a deep learning-based, fully automated, computer-aided diagnosis system with WSIs from a gastric adenocarcinoma (STAD) dataset. In addition, we trained the network to predict several commonly mutated genes in STAD. Furthermore, we showed that deep learning can predict MSI directly from H&E images.

MATERIALS AND METHODS: We studied the automatic classification of ‘normal’ and ‘tumor’ regions using a total of 432 H&E-stained WSIs from TCGA gastric cancer image dataset. The slides were tiled in non-overlapping 360x360 pixel windows at a magnification of 20x. We used 70% of those tiles for training, 15% for validation, and 15% for final testing. The deep learning with convolutional neural networks was performed based on inception v3 architecture. To study the prediction of gene mutations from H&E images, average area under the curve (AUC) values for KRAS and SMAD4 mutation (93 and 88 cases, respectively) were calculated using our automatic tumor classification deep-learning approach. To study the prediction of MSI (MSS vs. MSI-H) from H&E images, 383 cases were enrolled using the same approach.

RESULTS: The performance of our method is comparable to that of pathologists, with an AUC of up to 0.999. Furthermore, we trained the network to predict two commonly mutated genes in STAD (KRAS and SMAD) and investigated whether they can be predicted from pathology H&E images. We found that KRAS and SMAD mutation can be predicted from pathology images, with AUCs of 0.711 to 0.737, similar to results from previous studies with non-small cell lung cancer histopathology images using deep learning. For the prediction of MSI, patch-level and patient-level AUCs were 0.843 and 0.912, respectively, which is superior to the previous studies with TCGA-COAD and -STAD histopathology images.

CONCLUSIONS: These findings suggest that deep-learning models can assist pathologists in the detection of cancer subtypes and in the prediction of gene mutations and MSI status. After training on larger datasets and prospective validation, this approach has the potential to provide immunotherapy to a much broader subset of patients with STAD.
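The tiling and data split described in the Materials and Methods can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, partial tiles at the image border are simply dropped (an assumption), and the split is done at the tile level because the abstract states that 70% of the tiles were used for training. Image loading and the network itself are out of scope here.

```python
import random


def tile_origins(width, height, tile=360):
    """Top-left corners of non-overlapping tile x tile windows fully inside the image."""
    return [(x, y)
            for y in range(0, height - tile + 1, tile)
            for x in range(0, width - tile + 1, tile)]


def split_tiles(tiles, train_pct=70, val_pct=15, seed=0):
    """Shuffle tiles and split into train/validation/test; the remainder goes to test."""
    tiles = list(tiles)
    random.Random(seed).shuffle(tiles)
    n_train = len(tiles) * train_pct // 100
    n_val = len(tiles) * val_pct // 100
    return (tiles[:n_train],
            tiles[n_train:n_train + n_val],
            tiles[n_train + n_val:])
```

A fixed seed keeps the split reproducible. When tiles from one slide can land in both training and test sets, a split by slide or by patient is a common alternative, though the abstract does not say which was used beyond the tile percentages.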


95

General

Mapping the Emergence and Migration of Hematopoietic Stem Cells and Progenitors During Human Development at Single Cell Resolution

Feiyang Ma, Vincenzo Calvanese, Sandra Capellera-Garcia, Sophia Ekstrand, Matteo Pellegrini, Hanna K.A. Mikkola

Department of Molecular, Cell and Developmental Biology, UCLA, Los Angeles, CA, USA

Feiyang Ma

Hematopoiesis is established during development through multiple waves of blood cell production, starting with lineage-primed progenitors required for the embryo's needs, and culminating in the generation of self-renewing hematopoietic stem cells (HSCs) for life-long hematopoiesis. Although hematopoietic ontogeny has been studied extensively in mice, we lack knowledge of the anatomical, temporal and molecular map for hematopoietic development in human. Prior studies suggest that HSCs emerge from hemogenic endothelium in the aorta-gonad-mesonephros (AGM) region between 4-6 weeks of human gestation. Extraembryonic sites including the placenta, umbilical and vitelline arteries, and the yolk sac, have been proposed to generate HSCs in the mouse. However, whether the same sites generate HSCs in human is unclear, mainly due to the limited access to developmental tissues and lack of reliable methods to identify developing human HSCs. We created a single-cell transcriptome map of hemato-vascular cells (CD34+ and/or CD31+) from human hematopoietic tissues at 1st and 2nd trimester. Using a molecular signature of self-renewing HSCs defined in our previous molecular and functional studies, we could identify CD34+Thy1+RUNX1+HOXA7+MLLT3+HLF+ cells as HSCs throughout development. Analyses of 5-wk AGM revealed a distinct population of newly emerged HSCs that vanished by 7 wks. HSCs colonized the fetal liver by 6 wks, where they expanded and differentiated beyond 15 wks. A small but distinct population expressing HSC molecular markers was reproducibly detected in 5 wk placentas. At this time, the heart, umbilical cord and fetal liver lacked clear HSC populations, implying minimal spreading through circulating blood. Interestingly, preceding HSC colonization, the 5 wk fetal liver already harbored CD34+Thy1-RUNX1+HOXA7-MLLT3-HLF- progenitors that co-expressed markers associated with erythro-myeloid and lympho-myeloid potential. Comparable populations were abundant in the yolk sac, suggestive of their origin. This data-set provides an unprecedented resource to dissect the dynamics and molecular pathways governing the emergence and progression of distinct waves of hematopoietic cells during human development, and serves as a reference map for the generation of HSCs in vitro for therapeutic purposes.


96

General

Large-scale Machine Learning and Graph Analytics for Functional Prediction of Pathogen Proteins

Jason McDermott1, Song Feng1, William Nelson1, Joon-Yong Lee1, Sayan Ghosh1, Ariful Khan1, Mahantesh Halappanavar1, Justine Nguyen2, Jonathan Pruneda2, David Baltrus3, Joshua Adkins1

1Pacific Northwest National Laboratory, 2Oregon Health & Science University, 3University of Arizona

Jason McDermott

Proteins enact the functionality encoded by genomes and so understanding protein function is critical to many areas of biology. Prediction of protein function from sequence is possible because of evolutionary relationships between proteins with similar functions, and existing algorithms can identify the corresponding sequence similarity. However, many proteins have similar functions but diverse sequences, which thwart existing methods, and driven by advances in sequencing technology the number of protein sequences with no known function or similarity to proteins of known function is large and growing rapidly. We use reduced amino acid alphabet mapping and kmer-based protein sequence representation to detect functional similarities between proteins and apply this method to bacterial and viral proteins that mimic eukaryotic ubiquitin ligases and deubiquitinases and classes of bacteriocins. These models allow prediction of novel examples that are not detected by traditional sequence similarity, and can provide insight into active sites or other functional domains for the proteins. To explore sequence space in a more discovery-oriented way we have applied this approach to a very large set of bacterial protein sequences (>20 million sequences) and use a GPU-based algorithm to quickly calculate a similarity graph based on protein features beyond traditional sequence similarity. Exascale graph analytics methods are used to identify groups of closely related sequences from the similarity graph. We show that this method can recapitulate known relationships between proteins, highlight inconsistencies in the underlying protein database, and provide hypotheses for functions of novel proteins thus providing a large-scale sequence landscape.
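The reduced-alphabet, k-mer representation mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the six-group alphabet in `REDUCED_GROUPS`, the k-mer length `k=3`, and the use of Jaccard similarity are all assumptions made here for illustration, since the abstract does not state which mapping, k-mer length, or similarity measure was used.

```python
from collections import Counter

# Hypothetical reduced alphabet: the 20 standard residues grouped into six
# rough physicochemical classes. The grouping is an assumption for this sketch.
REDUCED_GROUPS = {
    "A": "AVLIMC",  # aliphatic / hydrophobic
    "F": "FWYH",    # aromatic
    "S": "STNQ",    # polar, uncharged
    "K": "KR",      # positively charged
    "D": "DE",      # negatively charged
    "G": "GP",      # small / conformationally special
}
RESIDUE_TO_GROUP = {aa: g for g, members in REDUCED_GROUPS.items() for aa in members}


def reduce_sequence(seq):
    """Map each residue to its group letter; unknown residues become 'X'."""
    return "".join(RESIDUE_TO_GROUP.get(aa, "X") for aa in seq.upper())


def kmer_counts(seq, k=3):
    """Count overlapping k-mers of the reduced-alphabet sequence."""
    reduced = reduce_sequence(seq)
    return Counter(reduced[i:i + k] for i in range(len(reduced) - k + 1))


def jaccard_similarity(seq_a, seq_b, k=3):
    """Jaccard similarity between the reduced k-mer sets of two sequences."""
    kmers_a, kmers_b = set(kmer_counts(seq_a, k)), set(kmer_counts(seq_b, k))
    if not kmers_a and not kmers_b:
        return 0.0
    return len(kmers_a & kmers_b) / len(kmers_a | kmers_b)
```

Two sequences with no identical residues, such as `MKV` and `LRI`, map to the same reduced string under this grouping, which is the sense in which a reduced alphabet can expose similarity that direct sequence comparison misses.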

97

General

Gene-set analysis using GWAS summary statistics and GTEx database

Masahiro Nakatochi

Department of Nursing, Nagoya University Graduate School of Medicine

Masahiro Nakatochi

Recently, sample sizes of genome-wide association studies (GWASs) have been increasing rapidly. Consequently, many genetic loci associated with traits have been identified. It is difficult to interpret how the many loci identified by GWAS contribute to the traits. One function of a SNP is regulation of gene expression level; such SNPs are called expression quantitative trait loci (eQTLs). The GTEx project revealed many eQTLs in many human tissues. In this study, I propose a gene set analysis approach using GWAS summary statistics and the GTEx database to investigate how the genetic loci identified by GWAS contribute to the trait. This approach has three steps. First, trait-associated SNPs are identified by GWAS. Second, genes whose expression level is associated with trait-associated SNPs in at least one tissue in the GTEx database are searched. These genes are classified as either positively or negatively correlated genes. Finally, gene set enrichment analyses of positively correlated genes and negatively correlated genes are performed with the modified Fisher’s exact test to identify trait-associated pathways or gene sets. Using this approach, I found serum uric acid (SUA)-associated gene sets based on a SUA GWAS. Gene set enrichment analysis of UniProt terms found the terms “Williams-Beuren syndrome”, “sodium”, “transport”, “sodium transport”, and “alternative splicing” were enriched for the positively correlated genes. This approach provides another insight into the SNPs identified by GWAS.
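The enrichment step in this abstract can be illustrated with a one-sided Fisher's exact test, computed as a hypergeometric tail probability. This is a sketch of the standard test only: the abstract refers to a "modified" Fisher's exact test without specifying the modification, so that part is not reproduced, and the function name and arguments are hypothetical.

```python
from math import comb


def enrichment_p_value(hits_in_set, list_size, set_size, background_size):
    """One-sided Fisher's exact test for over-representation.

    Returns P(X >= hits_in_set), where X is the number of gene-set members
    found in a list of `list_size` genes drawn without replacement from a
    background of `background_size` genes containing `set_size` members.
    """
    max_hits = min(list_size, set_size)
    total = comb(background_size, list_size)
    tail = sum(
        comb(set_size, x) * comb(background_size - set_size, list_size - x)
        for x in range(hits_in_set, max_hits + 1)
    )
    return tail / total
```

Under these assumptions the test would be run twice per gene set, once with the positively correlated genes as the list and once with the negatively correlated genes.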

98

General

Targeting Cancer via Signaling Pathways: A Novel Approach to the Discovery of Gene CCDC191's Double-agent Function using Differential Gene Expression, Heat Map Analyses through AI Deep Learning, and Mathematical Modeling

Annie Ostojic

Purdue University

Annie Ostojic

According to a recent Johns Hopkins University study posted in May of 2018, the number of total genes in the genome was recalculated to be 43,162 genes comprised of 21,306 protein-coded genes and 21,856 non-coded genes. With completion of base pair sequencing in the Human Genome Project back in 2003, hope existed for acceleration of new medical treatments and disease intervention. However, earlier bioinformatic processes were unable to produce results quickly enough, so many gene functions remain unknown to date. A need exists to analyze gene functions in pathways to meet a changing medical industry of pharmacogenomics, personalized medicine, and cancer treatments relative to gene expression patterns. New methodology for determining functions of unstudied genes to rapidly extrapolate, classify, and correlate their gene expressions to biological pathways is at the forefront of bioinformatic studies. This research discovered the function of gene CCDC191, a coiled-coil domain-containing protein-coding gene, whose function had not been fully studied nor defined. A novel approach was utilized to determine the function of CCDC191 by combining gene expression analysis, patient survival analysis, differential gene expression, heat map with AI deep learning, and reverse engineering mathematical modeling. This study presents analyses and insights into gene CCDC191 which have not been performed prior, and it provides a replicable methodology which incorporates AI deep learning image classification, and reverse engineering mathematical modeling to determine gene functions in pathways and cancer connectedness.

99

General

RFEX: Simple Random Forest Model and Sample Explainer for non-Machine Learning experts

Dragutin Petkovic, Ali Alavi, DanDan Cai, Jizhou Yang, Sabiha Barlaskar

San Francisco State University (all authors)

Dragutin Petkovic

Machine Learning (ML) is becoming an increasingly critical technology in many areas. However, its complexity and its frequent “non-transparency” create significant challenges, especially in the biomedical and health areas. One of the critical components in addressing the above challenges is the explainability or transparency of ML systems, which refers to the model (related to the whole data) and sample explainability (related to specific samples). Our research focuses on both model and sample explainability of Random Forest (RF) classifiers. Our RF explainer, RFEX, is designed from the ground up with non-ML experts in mind, and with simplicity and familiarity, e.g. providing a one-page tabular output and measures familiar to most users. In this paper we present significant improvement in RFEX Model explainer compared to the version published previously, a new RFEX Sample explainer that provides explanation of how the RF classifies a particular data sample and is designed to directly relate to RFEX Model explainer, and a RFEX Model and Sample explainer case study from our collaboration with the J. Craig Venter Institute (JCVI). We show that our approach offers a simple yet powerful means of explaining RF classification at the model and sample levels, and in some cases even points to areas of new investigation. RFEX is easy to implement using available RF tools and its tabular format offers easy-to-understand representations for non-experts, enabling them to better leverage the RF technology.

100

General

Apparent bias toward long gene misregulation in MeCP2 syndromes disappears after controlling for baseline variations

Ayush T. Raman1,2, Amy E. Pohodich2, Ying-Wooi Wan2, Hari Krishna Yalamanchili2, William E. Lowry3, Huda Y. Zoghbi2, Zhandong Liu2

1Broad Institute of MIT and Harvard, 2Baylor College of Medicine, 3University of California Los Angeles

Ayush Raman

Background: Rett syndrome is a neurodevelopmental disorder caused by mutations in MECP2, a methyl-binding protein whose task is to orchestrate gene expression, and MeCP2 mutations disrupt the expression of several thousand genes. Over the past ten years, a number of studies observed that Rett syndrome and other disorders that affect neuronal synapses seem to preferentially dysregulate genes that are longer than 100 Kb. These length-dependent transcriptional changes in MeCP2-mutant samples are modest, but, given the low sensitivity of high-throughput transcriptome profiling technology, here we re-evaluate the statistical significance of these results.

Results: We develop a robust statistical approach to estimate noise accurately and identify statistically significant gene length-dependent changes. We find that the apparent length-dependent trends previously observed in MeCP2 microarray and RNA-sequencing datasets disappear after estimating baseline variability (i.e., intra-sample differences) from randomized control samples across 17 different publicly available MeCP2 datasets. We show that even MAQC/SEQC Phase-III benchmark datasets, which do not include MeCP2 or its effects on expression, are prone to the long gene bias, suggesting that the bias is not an inherent feature of gene expression following MeCP2 disruption. We hypothesized that PCR amplification, a process shared by both microarray and RNA-seq technologies, might introduce the observed bias in long gene expression. We find no bias with nanoString technology, a technique that does not use PCR amplification, for SEQC/MAQC samples or Mecp2 mutant samples. This confirmed our notion that the previous observations of long-gene bias resulted from amplification-based technologies and the failure to establish a proper baseline.

Conclusions: We conclude that accurate characterization of length-dependent (or other) trends requires establishing a baseline from randomized control samples. We propose that smaller fold changes in transcription observed after PCR amplification lead to an overestimation of long gene expression levels.

101

General

Prediction of chronological and biological age from laboratory data

Luke Sagers1, Luke Melas-Kyriazi2, Chirag J. Patel3, Arjun K. Manrai1

1Boston Children’s Hospital Computational Health Informatics Program, 2Harvard University Department of Mathematics, 3Harvard Medical School Department of Biomedical Informatics

Luke Sagers

Aging has pronounced effects on blood laboratory biomarkers used in the clinic. Prior studies have largely investigated a single biomarker or population at a time, limiting a comprehensive view of biomarker variation and aging across different populations. Here we develop a supervised machine learning approach to study the aging process using 356 blood biomarkers measured in 67,536 individuals across demographically diverse populations. Our model predicts age with a mean absolute error (MAE) in held-out data of 4.76 years and an R² value of 0.92. Age prediction was highly accurate for the pediatric cohort (MAE = 0.87, R² = 0.94) but inaccurate for ages 65+ (MAE = 4.30, R² = 0.25). Extensive variability was observed in which biomarkers carry the most predictive power across different age groups, genders, and race/ethnicity groups, and novel candidate biomarkers of aging were identified for specific age ranges (e.g. Vitamin E for ages 18-45). We further show that predictors accurate for one age group may fail to generalize to other groups, and find that nearly a third of all biomarkers exhibit non-linearity near adulthood. As populations worldwide undergo major demographic changes, it will be increasingly important to catalogue biomarker variation across age groups and discover new biomarkers to distinguish chronological and biological aging.
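The two metrics reported in this abstract, MAE and R², can move apart when the age range of a cohort is narrow, which is one way a subgroup can have a moderate MAE and a low R² at the same time. The sketch below uses the standard definitions of both metrics; the ages and predictions are hypothetical values chosen for illustration, not data from the study.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical cohorts: every prediction is off by exactly 4 years in both.
wide_ages, wide_pred = [5, 25, 45, 65, 85], [9, 21, 49, 61, 89]
narrow_ages, narrow_pred = [66, 68, 70, 72, 74], [70, 64, 74, 68, 78]

print(mean_absolute_error(wide_ages, wide_pred), r_squared(wide_ages, wide_pred))
print(mean_absolute_error(narrow_ages, narrow_pred), r_squared(narrow_ages, narrow_pred))
```

Both cohorts have the same MAE, but R² is far lower in the narrow cohort because the total variance of age there is small relative to the prediction error.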

102

General

Whole genome sequencing analysis of influenza C virus in Korea

Sooyeon Lim, Han Sol Lee, Ji Yun Noh, Joon Young Song, Hee Jin Cheong, Woo Joo Kim

Division of Infectious Diseases, Department of Internal Medicine, Korea University College of Medicine, Seoul, South Korea; Division of Brain Korea 21 Program for Biomedicine Science, College of Medicine, Korea University, Seoul, South Korea; Asia Pacific Influenza Institute, Korea University College of Medicine, Seoul, South Korea

Sooyeon Lim

Through the Hospital-based Influenza Morbidity and Mortality (HIMM) surveillance system, 973 nasopharyngeal swab specimens from children under 2 years of age were collected and tested for influenza viruses using real-time PCR. Among the tested specimens, 383 were positive for influenza A and/or B virus. Influenza C virus was confirmed in five specimens. In this study, we used five influenza C virus positive specimens and a cell-cultured influenza C virus. Viral RNA was isolated using the QIAamp viral RNA mini kit (Qiagen, Hilden, Germany) following the manufacturer’s instructions. All isolated RNA was finally eluted with 60 µL of distilled water. The reverse transcription reaction was performed with the Primescript 1st strand cDNA synthesis kit (Takara, Shiga, Japan) using the uni-5’ primer. The genome-wide amplification of the influenza C virus was performed using Taq polymerase. The amplified gene fragments were prepared for sequencing using the Nextera XT DNA Library Prep kit (Illumina), according to the manufacturer’s protocol. This study was the first report of influenza C virus using NGS analysis in South Korea. In this study, young children with influenza C virus infections had acute respiratory illnesses, such as fever, rhinorrhea, and cough, but no pneumonia or severe respiratory illness was observed. Based on NGS analysis, we can expand our understanding of the various symptoms of influenza C virus.

103

General

Mining the Humuhumunukunukuapua and the Shaka of Autism with Big Data Biomedical Data Science

Peter Washington, Brianna Chrisman, Kaiti Dunlap, Aaron Kline, Arman Husic, Michael Ning, Kelley Marie Paskov, Nathaniel Stockham, Maya Varma, Emilie LeBlanc, Jack Kent, Yordan Penev, Min Woo Sun, Jae-Yoon Jung, Catalin Voss, Nick Haber, Dennis P. Wall

Departments of Pediatrics (Systems Medicine) and Biomedical Data Science, Stanford University

Dennis Wall

Mental health is arguably at the core of all health, and early childhood mental health predicts a long-term healthy life course. Yet, finding, treating, and preventing mental health disorders in children is limited by reach and scalable methods. Thankfully, advances in AI and ubiquitous technology have ushered in unparalleled opportunities for scalable mobile health. We have constructed a series of mobile solutions that treat and track while simultaneously building novel computer vision libraries for precision models. These solutions function as mobile games that are highly engaging and designed for the individual, encouraging compliance with the required “dose” while passively collecting metrics to measure, and ultimately predict, outcomes. We can quantify or digitize a child’s phenotype through these passively collected data, not just once, but many times, as the child plays our games and learns through playing. These games engender trust and as they do, we “crowd” build a community of stakeholders that not only shares Phenome data, but also data on their Genome and the Environment. With the 3 modalities, we use data fusion multivariate techniques to resolve the G+E=P equation for autism and set the stage for doing the same in other spectrum disorders across mental health.

104

General

Development of a recurrence prediction model for early lung adenocarcinoma using radiomics-based artificial intelligence

Hee Chul Yang, Gunseok Park, Ji Eun Oh

Division of Convergence Technology, National Cancer Center Research Institute

Hee Chul Yang

Purpose: This study aimed at predicting the recurrence after curative resection for the patients with lung adenocarcinoma (ADC) using the phenotypic radiomics features obtained from the CT images.

Material: From January 1, 2010, to December 31, 2015, a total of 604 primary lung ADC patients who had the tumor size of 1-3 cm underwent curative resection at a single institution.

Method: A total of 604 patients’ preoperative CT images were used for feature extraction. The final dataset was randomized into a training set (n=424) and a test set (n=180) with the ratio of 7:3. Radiomics features were selected from t-test (P<0.05) and a radiomics signature was classified by the logistic regression model. The optimal model was evaluated through a ROC curve.

Result: In a logistic regression analysis, 6 radiomics features were finally selected from 51 features to build a radiomics signature that was significantly associated with recurrence. The optimal model was built with features associated with the dependent variable. They presented good performance in the prediction of recurrence alone with an AUC of 76.2% accuracy. The test set validated 72.2% accuracy.

Conclusion: The radiomics signature can be a useful recurrence prediction tool even in small-sized lung ADC.

105

General

DRLPC: Dimension Reduction of Sequencing Data using Local Principal Components

Yun Joo Yoo1, Fatemeh Yavartanu1, Shelley B. Bull2

1Seoul National University, 2The Lunenfeld-Tanenbaum Research Institute

Yun Joo Yoo

Genome-wide association studies (GWAS) using single nucleotide polymorphism (SNP) data usually have millions of variables with complex correlation structure resulting from linkage disequilibrium. When multi-SNP joint analysis using multiple regression is applied, a dimension reduction method such as principal component analysis can be considered. Replacing SNP data with principal components can resolve multi-collinearity which often occurs in regression using high-density sequencing or imputed SNP data. However, the principal components constructed from all SNP variables in a region are hard to interpret as a biological entity and are not useful for localization and fine mapping. In this study, we propose an algorithm DRLPC (Dimension Reduction using Local Principal Components) to reduce the dimension for regression analysis by selecting clusters of SNPs in high correlation and replacing each cluster by a local principal component constructed from the SNPs in the cluster. The algorithm aims to resolve multicollinearity between updated variables by considering variance inflation factor (VIF) and removing variables with high VIF. We examined the behaviour of DRLPC by applying the algorithm to the 1000 Genomes Project data. Chromosome 22 SNP sets of three populations (EUR, ASN, AFR) were dimension reduced for each gene region separately comparing several choices of threshold values for clustering and principal components selection. When averaged across the genes, the ratio of the number of final variables over the number of original variables was 50% for the genes with 5~10 SNPs and as low as 10% for the genes with more than 1,000 SNPs. The reduction rate was smaller for the AFR population compared to the other populations EUR and ASN, possibly due to weaker LD in the African population. We also compared the power of multi-SNP tests constructed based on regression results obtained from the original data and dimension reduced data. These tests include generalized Wald, LC (linear combination) tests, and MLC (Multi-bins linear combination) tests. LC tests and MLC tests are also dimension reduction techniques in the sense that LC combines all individual effects into a one degree of freedom test and MLC combines the individual effects into a linear combination within a bin (cluster) and constructs a test with degrees of freedom equal to the number of clusters. Since DRLPC uses the same clustering algorithm based on clique partitioning as MLC we compared results of MLC with original data to DRLPC Wald test with processed data under the same clustering threshold and found that they yield similar power. We conclude that DRLPC can provide efficient dimension reduction while resolving multi-collinearity and also lessens the problem of interpretability because these principal components represent smaller sized regions, possibly short haplotypes.
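The variance inflation factor used in this abstract as a multicollinearity check can be computed from the correlation matrix of the predictors: the VIF of variable j is the j-th diagonal entry of the inverse correlation matrix. The sketch below shows only that calculation in plain Python. It does not reproduce DRLPC's clustering or local principal component steps, and the function names are hypothetical.

```python
from math import sqrt


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length numeric columns."""
    n = len(columns[0])
    centered = []
    for col in columns:
        mean = sum(col) / n
        centered.append([v - mean for v in col])
    norms = [sqrt(sum(v * v for v in col)) for col in centered]
    return [
        [sum(a * b for a, b in zip(ci, cj)) / (ni * nj)
         for cj, nj in zip(centered, norms)]
        for ci, ni in zip(centered, norms)
    ]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def variance_inflation_factors(columns):
    """VIF per predictor: diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]
```

For two predictors with correlation r, this reduces to 1 / (1 - r²) for both, so uncorrelated predictors have a VIF of 1 and the value grows without bound as r approaches 1.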

106

General

Meta-analysis in exhausted T cells from Homo sapiens and Mus musculus provides novel targets for immunotherapy

Lin Zhang1, Yicheng Guo2, Hafumi Nishi1

1Tohoku University Graduate School of Information Sciences, 2Columbia University, Department of Systems Biology

Lin Zhang

Antibodies targeting immune checkpoint inhibitors to reverse T cell exhaustion are a promising approach for immunotherapy of cancers. However, the therapeutic efficacy is still low for known immune checkpoint inhibitors, such as PD1 and CTLA4. T cell exhaustion is a state of T cell dysfunction during chronic infections and cancers. It exhibits several characteristic features, such as poor effector functions in a hierarchical manner, impaired memory T cell potential, and sustained upregulation and co-expression of multiple inhibitory receptors. The mechanism and pathways for T cell exhaustion remain to be fully described. In this study, we performed a meta-analysis with 7 datasets from both humans and mice to uncover the molecular mechanism of T cell dysfunction. Through gene set enrichment analysis, the predefined exhaustion gene sets were observed to be significantly enriched in the exhausted T cells. The differential expression analyses showed an overlap of 21 upregulated and 37 downregulated genes shared by exhausted T cells in humans and mice. These genes were significantly enriched in exhaustion response-related pathways, such as signal transduction, immune system process, and regulation of cytokine production. In addition, co-expression analysis identified 175 genes that were highly correlated with the exhaustion trait in humans and mice. Above all, our study revealed that TOX and CD200R1 might be considered as potential and highly efficient targets for immunotherapy.

107

INTRINSICALLY DISORDERED PROTEINS (IDPS) AND THEIR FUNCTIONS

POSTER PRESENTATIONS

108

Intrinsically Disordered Proteins (IDPs) and Their Functions

Disordered Function Conjunction: On the in-silico function annotation of intrinsically disordered regions

Sina Ghadermarzi, Akila Katuwawala, Christopher J. Oldfield, Amita Barik, Lukasz Kurgan

Virginia Commonwealth University

Sina Ghadermarzi

Intrinsically disordered regions (IDRs) lack a stable structure, yet perform biological functions. The functions of IDRs include mediating interactions with other molecules, including proteins, DNA, or RNA, and entropic functions, including domain linkers. Computational predictors provide residue-level indications of function for disordered proteins, which contrasts with the need to functionally annotate the thousands of experimentally and computationally discovered IDRs. In this work, we investigate the feasibility of using residue-level prediction methods for region-level function predictions. For an initial examination of the multiple function region-level prediction problem, we constructed a dataset of (likely) single function IDRs in proteins that are dissimilar to the training datasets of the residue-level function predictors. We find that available residue-level prediction methods are only modestly useful in predicting multiple region-level functions. Classification is enhanced by simultaneous use of multiple residue-level function predictions and is further improved by inclusion of amino acid content extracted from the protein sequence. We conclude that multifunction prediction for IDRs is feasible and benefits from the results produced by current residue-level function predictors; however, it has to accommodate inaccuracy in functional annotations.

109

MUTATIONAL SIGNATURES

POSTER PRESENTATIONS

110

Mutational Signatures

Transcription-associated regional mutation rates and signatures in regulatory elements across 2,500 whole cancer genomes

Jüri Reimand

Ontario Institute for Cancer Research, University of Toronto

Jüri Reimand

The genomes of healthy and cancerous cells accumulate somatic mutations over time with complex variations across tissues and genomic contexts. Certain classes of functional elements of the genome are subject to differential mutation rates due to regionalized activities of mutational processes. To investigate regional mutations, we developed RM4RM, a statistical framework for detecting differential mutation rates and trinucleotide signatures in sets of genomic regulatory elements. To validate our model, we first analyzed CTCF binding sites across >2,500 whole cancer genomes of 39 cancer types of the ICGC-TCGA PCAWG cohort. We found significant mutation enrichments in CTCF sites in liver, esophageal, breast and other cancer types that were primarily driven by T>C/G mutations and multiple rare mutation signatures of unknown etiology. Transcription start sites of protein-coding genes and a broader set of experimentally-defined regulatory elements derived from primary tumors of the TCGA project also showed significantly elevated regional mutation rates in multiple cancer types. TSS-specific regional mutation enrichment was particularly dominant in highly transcribed genes of matching tumors while none was apparent in silenced genes. In contrast, no mutation enrichment dependency on transcript abundance was observed in distal regulatory elements. These data indicate a transcription initiation-coupled mutational process active in multiple cancer types supported by multiple mutational processes and trinucleotide signatures specifically enriched in highly-transcribed TSSs. Our findings and statistical model enable detailed studies of the mechanisms of somatic mutagenesis and advance our understanding of genetic drivers of disease.

111

Mutational Signatures

Complex mosaic structural variations in human fetal brains

Shobana Sekar1, Livia Tomasini2, Maria Kalyva3, Taejeong Bae1, Logan Manlove1, Bo Zhou4, Jessica Mariani2, Fritz Sedlazeck5, Alexander E. Urban4, Christos Proukakis3, Flora M. Vaccarino2, Alexej Abyzov1

1Mayo Clinic, 2Yale University, 3University College London, 4Stanford University, 5Baylor College of Medicine

Alexej Abyzov

Somatic mosaicism in cells of the human brain is common and may have functional consequences that lead to diseases including neurological ones. Mosaic variations in brain can be point mutations, insertions of mobile elements, and structural changes. Previously we detected and described 200-400 mosaic point mutations per single cell clones from cortices of three human fetuses (15 to 21 weeks post-conception). Here we describe four mosaic structural variations (SVs) in the same brains. The SVs were of kilobase scale and complex, i.e., consisting of deletion(s) and a few rearranged genomic fragments that sometimes originated from different chromosomes. Sequences at breakpoints at the rearrangements had microhomologies suggesting their origin from replication errors. One SV was found in two clones and we timed its origin to ~14 weeks post-conception. Our study reveals the existence of mosaic SVs, likely arising from cell proliferation, in the human brain in mid-neurogenesis.

112

PATTERN RECOGNITION IN BIOMEDICAL DATA: CHALLENGES IN PUTTING BIG DATA TO WORK

POSTER PRESENTATIONS

113

Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work

Stratification of kidney transplant recipients based on temporal disease trajectories

Isabella Friis Jørgensen PhD1, Søren Schwartz Sørensen PhD2, Søren Brunak PhD1

1Novo Nordisk Foundation Center for Protein Research - Faculty of Health and Medical Sciences - University of Copenhagen - Blegdamsvej 3B - DK-2200 Copenhagen N - Denmark; 2Department of Nephrology - Rigshospitalet - Copenhagen University Hospital - Blegdamsvej 9 - DK-2100 Copenhagen Ø - Denmark

Isabella Friis Jørgensen

Organ transplantations often improve the life of chronically sick patients. However, immune-suppressive medication given to transplant recipients increases the risk of complications, especially infections and infection-related death. One in five kidney transplant recipients dies from infection. We want to stratify kidney transplant recipients into groups of patients with different patterns of infectious diseases and mortality to predict which patients have higher risk of specific infections. We use the Danish National Patient Registry (DNPR) that contains hospital diagnoses for 6.9 million patients from the entire Danish population from 1994 to 2018. We use a previously published method to identify significant time-dependent disease trajectories for all patients with a kidney transplantation. Subsequently, we use hierarchical clustering of Jaccard distances between the disease trajectories to find distinct groups of trajectories from kidney transplant recipients. In the DNPR, we identified 5,644 patients with a kidney transplantation resulting in 43 significant disease trajectories that consist of three consecutive diseases including several infection-related diagnoses. More than 87% of the kidney transplantation recipients follow at least one of these trajectories; hence they are diagnosed with the three diseases in the order the trajectory specifies. Clustering reveals two main groups of temporal disease trajectories. We identify patients following the two groups of disease trajectories and discover significant differences in mortality after kidney transplantation between patients following different disease trajectories. This study used previous disease history from large-scale hospital diagnoses to stratify common, temporal disease trajectories into two distinct groups. Depending on the type of trajectory kidney transplantation recipients follow, significant differences in mortality are seen. These methods can be used to guide clinicians about higher risks of certain infections and mortality of certain groups of kidney transplant recipients.
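The Jaccard distance on which the clustering in this abstract is based can be sketched as follows. The abstract does not say which sets the distance is computed over, so this example assumes each trajectory is compared as a set of diagnosis codes. The codes in the usage note are illustrative placeholders, and the hierarchical clustering step itself is not shown.

```python
def jaccard_distance(a, b):
    """1 - |A intersect B| / |A union B|; defined as 0.0 for two empty sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def distance_matrix(trajectories):
    """Symmetric matrix of pairwise Jaccard distances between trajectories."""
    return [[jaccard_distance(x, y) for y in trajectories] for x in trajectories]
```

For example, two hypothetical three-diagnosis trajectories `("N18", "Z94", "A41")` and `("N18", "Z94", "N39")` share two of four distinct codes and so have a distance of 0.5. A matrix of such distances is the usual input to an agglomerative clustering routine.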

114

Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work

Modeling Gene Expression Levels from Epigenetic Markers Using a Dynamical Systems Approach

James Brunner1, Jacob Kim2, Kord M. Kober3

1Mayo Clinic, Rochester, MN; 2Columbia University, New York, NY; 3University of California, San Francisco, CA

Kord Kober

Gene regulation is an important fundamental biological process and involves a number of complex biological processes that are essential for development and adaptation to the environment. Understanding the role of epigenetic changes in gene expression is a fundamental question of molecular biology. Predicting gene expression from epigenetic data is an active area of research and previous studies have used statistical approaches for building prediction models. Dynamical systems can be used to generate a model to predict gene expression using epigenetic data and a gene regulatory network (GRN). By dynamically simulating hypothesized mechanisms of transcriptional regulation, we provide predictions based directly on these biological hypotheses. Furthermore, a stochastic dynamical system provides us with a distribution of gene expression estimates, representing the possibilities that may occur within the cell. The purpose of this study is to develop and evaluate a stochastic dynamical systems model predicting gene expression levels from epigenetic data for a given GRN. We model gene regulation using a piecewise-deterministic Markov process (PDMP) where transcription factor (TF) binding is a Boolean random variable representing the bound/unbound state of a binding site region of DNA. TF binding is given as the difference of two Poisson jump processes (i.e., binding and unbinding), so that time between binding and unbinding events is exponentially distributed with propensities taken to be linear functions of the available TF. Epigenetic modification of the TF binding site impacts the binding propensity of TF and is measured as the percentage of methylated bases (i.e., beta). We use a linear ordinary differential equation based on the underlying GRN to determine the value of the transcript between TF binding or unbinding events. We include baseline transcription and decay and are able to solve exactly between jumps of binding/unbinding events. In a discrete space, continuous time Markov process, the equilibrium distribution can be estimated by sampling from a realization of the process. For our continuous space PDMP we can estimate the equilibrium distribution in a similar manner using kernel density estimation with a Gaussian kernel. We estimate the marginal distributions of various gene variables with a 1-dimensional kernel. We use a GRN assumed to be known to create a model of gene regulation that includes TF binding dynamics. We associate binding sites with the genes that they regulate and use these associations to create a bipartite graph. The GRN and training/testing data are created from publicly available data. The epigenetic parameter is assumed to be measurable. The remaining parameters are estimated using a negative log-likelihood minimization procedure. We can compute a log-likelihood for a set of paired epigenetic and transcription samples by time averaging a sample path against a Gaussian kernel. We report on the design and evaluation of the model’s performance.
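The piecewise-deterministic Markov process described in this abstract can be illustrated for the simplest case of one gene with one binding site. This is a sketch under stated assumptions, not the authors' model: the binding rate `k_on * tf_level * (1 - beta)` is one hypothetical way for methylation to reduce the binding propensity, all parameter values are invented, and the GRN coupling, kernel density estimation, and likelihood fitting are not shown.

```python
import math
import random


def simulate_pdmp(tf_level, beta, t_end, *, k_on=1.0, k_off=0.5,
                  baseline=0.2, gain=2.0, decay=1.0, x0=0.0, seed=0):
    """Simulate transcript level x(t) for one gene with one TF binding site.

    Between jumps x follows dx/dt = baseline + gain * bound - decay * x,
    which is solved exactly. Jumps (binding or unbinding) occur after
    exponentially distributed waiting times.
    Returns a list of (time, transcript_level, bound) tuples.
    """
    rng = random.Random(seed)
    t, x, bound = 0.0, x0, False
    path = [(t, x, bound)]
    while t < t_end:
        # Assumed propensities: unbinding is constant, binding is linear in
        # the available TF and scaled down by the methylation fraction beta.
        rate = k_off if bound else k_on * tf_level * (1.0 - beta)
        wait = rng.expovariate(rate) if rate > 0 else math.inf
        remaining = t_end - t
        dt = min(wait, remaining)
        # Exact solution of the linear ODE over the interval dt.
        target = (baseline + (gain if bound else 0.0)) / decay
        x = target + (x - target) * math.exp(-decay * dt)
        if wait < remaining:
            t += dt
            bound = not bound  # binding or unbinding event
        else:
            t = t_end
        path.append((t, x, bound))
    return path
```

With `beta=1.0` the site never binds in this sketch and the transcript relaxes deterministically toward `baseline / decay`. Sampling the transcript level along a long path is the kind of realization from which an equilibrium distribution could then be estimated.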

115

Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work

Translating Big Data neuroimaging findings into measurements of individual vulnerability

Peter Kochunov1, Paul Thompson2, Neda Jahanshad2, Elliot Hong1

1University of Maryland School of Medicine, Maryland, USA; 2University of Southern California, California, USA

Peter Kochunov

We propose an intuitive, anatomically informed approach to derive an index of similarity between individual brain patterns and the expected patterns of neuropsychiatric disorders based on Big Data neuroimaging studies. Big Data neuroimaging studies, such as those performed by the Enhancing NeuroImaging Genetics through Meta-Analysis (ENIGMA) consortium, have provided the scientific community with the regional patterns of effect sizes in common neuropsychiatric disorders such as schizophrenia (SZ), bipolar and major depressive disorders (BP and MDD), epilepsy (EP), Alzheimer’s dementia (AD), mild cognitive impairment (MCI) and others. These patterns describe regional deficits using standardized sMRI, dMRI and rsfMRI workflows. They are derived from statistically powerful and inclusive samples and are highly reproducible (r=0.8-0.9) in independent samples. We developed the “Regional Vulnerability Index” (RVI) to measure similarity between an individual and the expected pattern of patient-control differences. RVI can be calculated for a single imaging modality or across modalities. A single-modality RVI, illustrated here with the fractional anisotropy (FA) measure from dMRI, is calculated as follows. FA for each of the 23 major white matter regions, as defined by the ENIGMA atlas, in an individual is converted to z-values by (A) calculating the residual values after regressing out age and sex effects for this region, (B) subtracting the average value for the region, and (C) dividing by the standard deviation calculated from the healthy controls. This produces a vector of 23 z-values (one per region) for each individual in the sample. RVI is calculated as the correlation coefficient between the 23 region-wise z-values for the subject and the patient-control effect sizes in ENIGMA. RVI takes values from 1 (individual pattern is aligned with the disorder pattern) to -1 (individual pattern is in anti-alignment). For cross-modality research, RVI can be expanded hierarchically by building a combined vector that includes multiple phenotypes. For example, the RVI-White Matter calculation uses a vector of 69 values that combines tract-wise FA, radial (RaD) and axial (AxD) diffusivity values per person. To merge effect sizes across diverse domains, we use a pseudo-ordinary transformation that maps effect sizes between 0 and 1 while preserving the relative distance between them. We first demonstrated that RVI-SZ values are significantly elevated in patients with SZ and are also predictive of treatment resistance. That is, subjects who developed resistance to modern antipsychotic medications had significantly higher RVI-SZ values than those who responded to treatment. We next demonstrated that RVI for SZ was significantly correlated with RVI for AD, but not MCI, due to significant overlap in deficit patterns between these disorders. We next showed that calculating RVI across multiple modalities produces vulnerability measures that are more sensitive to patient-control differences in the independent datasets and show stronger sensitivity to cognitive deficits and negative symptoms. The RVI calculator tools are distributed with the solar-eclipse software (www.solar-eclipse-genetics.org).
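The single-modality calculation in this abstract can be sketched as follows. This is an illustrative reading of steps (B) and (C) plus the final correlation, not the solar-eclipse implementation: the function names are hypothetical, the input values are assumed to be already residualized for age and sex (step A), and a toy example would use far fewer than the 23 regions of the ENIGMA atlas.

```python
import math
from statistics import mean, stdev

def pearson(u, v):
    """Pearson correlation coefficient between two equal-length sequences."""
    mu, mv = mean(u), mean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

def z_scores(subject, controls):
    """Standardize each regional value against the healthy-control sample.

    subject  -- regional values for one person (assumed age/sex-residualized)
    controls -- list of such vectors, one per healthy control
    """
    z = []
    for region, value in enumerate(subject):
        reference = [c[region] for c in controls]
        z.append((value - mean(reference)) / stdev(reference))
    return z

def rvi(subject, controls, effect_sizes):
    """Correlate the subject's z-vector with the patient-control effect sizes."""
    return pearson(z_scores(subject, controls), effect_sizes)
```

A subject whose regional deviations from the control mean are proportional to the disorder's effect sizes scores close to 1, and a subject with the opposite pattern scores close to -1, matching the range stated in the abstract.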


116

Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work

Automating new-user cohort construction with indication embeddings

Rachel D. Melamed

Department of Computational Biomedicine and Biomedical Data, University of Chicago

Rachel Melamed

The electronic health record is a rising resource for quantifying medical practice and discovering adverse effects of drugs. One of the challenges of healthcare data is the high dimensionality of the health record. Any study of patterns in health data must account for tens of thousands of potentially relevant diagnoses or treatments. In this work, we develop indication embeddings, a way to reduce the dimensionality of health data while capturing the information relevant to treatment decisions. We demonstrate that these embeddings recover therapeutic uses of drugs. Then we use these embeddings as an informative representation of relationships between drugs, between health history events and drug prescriptions, and between patients at a particular time in their health history. We show the application of these embeddings in areas of current research. For drug safety studies, particularly retrospective cohort studies, our low-dimensional representation helps in finding comparator drugs and constructing comparator cohorts. This enables us to develop an automated approach to choose comparator cohorts for a treated population.


117

Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work

Reproducibility-optimized statistical testing for omics studies

Tomi Suomi, Laura Elo

Turku Bioscience Centre, University of Turku and Åbo Akademi University, Turku, Finland

Laura Elo

Differential expression analysis is one of the most common types of analyses performed on various biological and biomedical data, including, e.g., RNA-sequencing and mass spectrometry proteomics. It is the process that detects features, such as genes or proteins, showing statistically significant differences between the sample groups under comparison. However, as different test statistics perform well in different datasets, the choice of an appropriate test statistic has remained a major challenge. To address the challenge, our reproducibility-optimized test statistic (ROTS) optimizes the statistic on the basis of the data by maximizing the reproducibility of the top-ranked features through a bootstrap procedure. Finally, it provides a ranking of the features according to their statistical evidence for differential expression between the sample groups. We have shown the robust performance of ROTS in a range of studies from transcriptomics to proteomics, covering both bulk and single cell measurements. ROTS is freely available as an R package in Bioconductor.


118

Pattern Recognition in Biomedical Data: Challenges in Putting Big Data to Work

Data Integration Expectation Maps: Towards more informed 'omic data integration

Tia Tate1, Christian Richardson2, ClarLynda Williams-DeVane3

1University of North Carolina-Charlotte, 2Duke University, 3Fisk University

ClarLynda Williams-DeVane

Innovative data technologies and decreasing costs have expanded the scope of available data relating to various diseases. A vast amount of -omics data generated at diverse levels (DNA, RNA, protein, metabolite and epigenetic) has revealed relationships of various biological processes. Generally, these diverse data types are considered independently, while combinations of two or more data types are less explored. This narrow approach often fails to identify the intricate interactions responsible for the etiology of complex disease. Complete biological models of complex diseases are only likely to be discovered if the various levels of -omic mechanisms are considered from an integrative perspective. Integrative models often require the integration of biological, computational, mathematical, and statistical domains. However, a well-documented shortage of researchers with a command of multiple domains exists. Thus, we have proposed the use of Data Integration Expectation Maps (DIEMs) as visual tools for facilitating the understanding of integrating various -omic data types to understand complex diseases by filling in gaps in biological knowledge. DIEMs provide a user-friendly format for understanding integrative model development in complex diseases by 1) identifying data formats that can be and/or have been integrated, 2) providing guidance on the best method to integrate the data, and 3) providing an expectation of biological insight to be gained from the integration.


119

PRECISION MEDICINE: ADDRESSING THE CHALLENGES OF SHARING, ANALYSIS, AND PRIVACY AT SCALE

POSTER PRESENTATIONS


120

Precision medicine: addressing the challenges of sharing, analysis, and privacy at scale

Integrated omics data mining of synergistic gene pairs for cancer precision medicine

Euna Jeong, Choa Park, Sukjoon Yoon

Sookmyung Women's University

Euna Jeong

Current high-throughput technologies enable simultaneous acquisition of multi-level omics and RNAi/chemical screening data in cancers. Production and integration of these data help identify associations of drug targets and synergistic biomarkers (mutations or gene expression), thus accelerating their clinical applications and patient stratification. We have extensively carried out cancer big data mining and phenotypic siRNA library screening for finding the optimal combination of targets and biomarkers for advanced cancer therapies such as regulating cancer stem-like cells (CSLCs) and oncogenic transcription factors. Our multiplexed screening dissects phenotypic responses into sensitivity and resistance to the target knockdown. Combined with mutaome and transcriptome data of screened cell lines, targetome-wide knockdown data reveal the functional aspect of synergistic effects between target siRNAs and mutation/transcription signatures, leading to the discovery of novel synthetic lethal gene pairs. Production and integration of these data enabled us to identify target-biomarker combinations for accelerating their clinical applications and patient stratification.


121

Precision medicine: addressing the challenges of sharing, analysis, and privacy at scale

The power of dynamic social networks to predict individuals' mental health

Shikang Liu1, David Hachen1, Omar Lizardo2, Christian Poellabauer1, Aaron Striegel1, Tijana Milenkovic1

1University of Notre Dame, 2University of California at Los Angeles

Shikang Liu

Precision medicine has received attention both in and outside the clinic. We focus on the latter, by exploiting the relationship between individuals' social interactions and their mental health to predict one's likelihood of being depressed or anxious from rich dynamic social network data. Existing studies differ from our work in at least one aspect: they do not model social interaction data as a network; they do so but analyze static network data; they examine "correlation" between social networks and health but without making any predictions; or they study other individual traits but not mental health. In a comprehensive evaluation, we show that our predictive model that uses dynamic social network data is superior to its static network as well as non-network equivalents when run on the same data.


122

Precision medicine: addressing the challenges of sharing, analysis, and privacy at scale

Robust-ODAL: Learning from heterogeneous health systems without sharing patient-level data

Jiayi Tong1, Rui Duan1, Ruowang Li1, Martijn J. Scheuemie2, Jason H. Moore1, Yong Chen1

1University of Pennsylvania, 2Janssen Research and Development LLC

Jiayi Tong

Electronic Health Records (EHR) contain extensive patient data on various health outcomes and risk predictors, providing an efficient and wide-reaching source for health research. Integrated EHR data can provide a larger sample size of the population to improve estimation and prediction accuracy. To overcome the obstacle of sharing patient-level data, distributed algorithms were developed to conduct statistical analyses across multiple clinical sites through sharing only aggregated information. However, the heterogeneity of data across sites is often ignored by existing distributed algorithms, which leads to substantial bias when studying the association between the outcomes and exposures. In this study, we propose a privacy-preserving and communication-efficient distributed algorithm which accounts for the heterogeneity caused by a small number of the clinical sites. We evaluated our algorithm through a systematic simulation study motivated by real-world scenarios and applied our algorithm to multiple claims datasets from the Observational Health Data Sciences and Informatics (OHDSI) network. The results showed that the proposed method performed better than the existing distributed algorithm ODAL and a meta-analysis method.


123

Precision medicine: addressing the challenges of sharing, analysis, and privacy at scale

PharmGKB: Automated Literature Annotations

Michelle Whirl-Carrillo1, Li Gong1, Rachel Huddart1, Katrin Sangkuhl1, Ryan Whaley1, Mark Woon1, Julia M. Barbarino2, Jake Lever3, Russ B. Altman4, Teri E. Klein5

1Department of Biomedical Data Science, Stanford University; 2Formerly Department of Biomedical Data Science, Stanford University; 3Department of Bioengineering, Stanford University; 4Departments of Bioengineering, Medicine and Genetics, Stanford University; 5Departments of Biomedical Data Science and Medicine, Stanford University

Michelle Whirl-Carrillo

PharmGKB is the largest publicly available resource for pharmacogenomics (PGx) discovery and implementation. Its mission is to collect, curate, integrate and disseminate knowledge about how human genetic variation influences drug response. PharmGKB scientists manually curate the primary literature to capture details of published pharmacogenomic studies such as variant-gene-drug-phenotype associations, statistical significance, study size and population characteristics. PharmGKB refers to these manually created annotations as “Variant Annotations.”


124

PACKAGING BIOCOMPUTING SOFTWARE TO MAXIMIZE DISTRIBUTION AND REUSE

WORKSHOP POSTER PRESENTATIONS


125

Workshop: Packaging Biocomputing Software to Maximize Distribution and Reuse

Apollo provides Collaborative Genome Annotation Editing with the power of JBrowse

Nathan Dunn1, Colin Diesh2, Robert Buels2, Helena Rasche3, Anthony Bretaudeau4, Nomi Harris1, Ian Holmes2

1Lawrence Berkeley National Lab, 2UC Berkeley, 3University of Freiburg, 4INRA

Nathan Dunn

Genome annotation projects involve multi-step workflows that are largely automated. However, even with a fully automated annotation pipeline, visual inspection and refinement of diverse types of information such as genomic and transcriptome alignments and predictive models based on sequence elements are critical to assure and improve the accuracy of the genome annotations prior to publication. To this end, Apollo (https://github.com/GMOD/Apollo/) is a web application that provides responsive and customizable visualization and editing of genomic elements. Built on top of the JBrowse genome browser (http://jbrowse.org/) and its large registry of plugins (https://gmod.github.io/jbrowse-registry/), Apollo supports efficient annotation curation through drag-and-drop editing, a large suite of automated structural edit operations, the ability to pre-define curator comments and annotation status to maintain consistency, attribution of annotation authors, fine-grained user and group access and edit permissions, and a visual history of revertible annotation edits. Setting up a new genome annotation in Apollo is straightforward. Apollo can be run from Docker or from provided AWS instances, and genomes with feature evidence can be retrieved from an existing JBrowse directory. We have also recently enabled researchers to upload their genome sequence and features (in FASTA, VCF, BAM, or GFF3 format) directly to Apollo, minimizing the need for scripting or server access. It is also possible to create annotations on the fly from BLAT or BLAST search results, which provides a way to initiate a gene previously annotated on a closely related species. Apollo provides a Python library that wraps the web services (https://github.com/galaxy-genome-annotation/python-apollo), so that workflow environments such as Galaxy can be automated and the output of an automated workflow can directly create genome projects, provide evidence, and manage access to an Apollo instance. Apollo supports several popular formats for data export. Structural genome annotations can be exported as FASTA, GFF3, or VCF (if annotating variants) along with any associated metadata. Functional annotations mapped to Gene Ontology terms can be exported in GPAD2 or GPI2 format. Apollo is an open-source tool used in over one hundred genome annotation projects around the world, ranging from the annotation of a single species to lineage-specific efforts supporting the annotation of dozens of genomes.

https://github.com/GMOD/Apollo/
https://genomearchitect.readthedocs.io/


126

Workshop: Packaging Biocomputing Software to Maximize Distribution and Reuse

g:Profiler - One functional enrichment analysis tool, many interfaces serving life science communities

Liis Kolberg, Uku Raudvere, Ivan Kuzmin, Jaak Vilo, Hedi Peterson

University of Tartu

Making sense of gene lists plays an important role in the majority of biological and biomedical experiments. Several methods and tools help scientists handle the computational load of these tasks. One such tool is g:Profiler (https://biit.cs.ut.ee/gprofiler), a widely used toolset for functional interpretation and conversion of gene lists from hundreds of species. g:Profiler has served the community since 2007 and continues to provide life scientists with the most up-to-date data and methods to this day. Keeping the service trustworthy and the results reproducible and transparent has been the main goal of the team developing g:Profiler. Success to this end is indicated by the increasing number of user requests per year, which in 2019 alone was close to 9 million queries. These millions of queries originating across the world reflect the diversity of usage preferences, skill sets and research goals of the scientific community. We, as the developers of g:Profiler, have taken this into account by developing and supporting different access options, which, in hindsight, has been a major factor in the increasing user traffic. On the one hand, the g:Profiler web application provides researchers who want quick and easily interpretable results with clear visualizations, searchable tables and data export possibilities. On the other hand, there is a large bioinformatics community whose members prefer to analyze gene lists in an automated manner. We support them by offering standardized access through public APIs. And, as R and Python are the most popular programming languages among life scientists with informatics expertise, we have simplified the usage of the APIs by wrapping them into corresponding packages named gprofiler2 and gprofiler-official, respectively. For the users somewhere in between, g:Profiler is also available from the Galaxy platform, which is a popular framework for data-intensive biomedical research pipelines run in a graphical user interface. It is clear that the tools in such an interdisciplinary field need to be flexible in order to fully benefit the research community. However, in our experience, the complexity of providing a widely distributed toolset lies in the maintenance of the services rather than in the development, and this is the core reason for deprecation of tools. In g:Profiler the separate interfaces all use the data and methods from a shared hub, making them reliable and consistent with each other even after the frequent data updates. We are positive that g:Profiler has been able to help thousands of researchers across the life science community because our priorities have been to reuse high-quality and regularly updated data, and to maximize the access options so that we would not leave any life science subcommunity behind.


127

Workshop: Packaging Biocomputing Software to Maximize Distribution and Reuse

Increasing usability and dissemination of the PathFX algorithm using web applications and Docker systems

Jennifer Wilson1, Nicholas Stepanov2, Ajinkya Chalke2, Mike Wong3, Dragutin Petkovic2, Russ B. Altman4

1Department of Chemical & Systems Biology at Stanford University; 2Computer Science Dept at San Francisco State University; 3COSE Computing for Life Sciences at San Francisco State University; 4Helix Group at Stanford University

Mike Wong

Limited efficacy and unacceptable safety confound therapeutic development. Identifying potential liabilities earlier in drug development could significantly improve success rates. Recently, in collaboration with the US FDA, we developed the PathFX algorithm and the openly available PathFX web application for better understanding pathway-level safety and efficacy phenotypes associated with a drug’s target(s). Running the PathFX algorithm locally would enable improved efficiency, security, and privacy; however, installation of PathFX and its dependencies is challenging for non-computational scientists and prevents dissemination. In addition, while PathFX-web quickly analyzes network associations, the phenotype clustering feature has high computational costs that limit the efficiency of the shared cloud server. To resolve these challenges, we developed the PathFX-web Docker container, which provides an easy-to-install, easy-to-use web interface, a standalone command-line formulation of PathFX, and added security/privacy, and allows leveraging the computational power of the user’s hardware.


128

TRANSLATIONAL BIOINFORMATICS WORKSHOP: BIOBANKS IN THE PRECISION MEDICINE ERA

WORKSHOP POSTER PRESENTATIONS


129

Workshop: TBI workshop

Identification of biomarkers related to autism spectrum disorder using genomic information

Leena Sait, Martha Gizaw, Iosif Vaisman

School of Systems Biology, George Mason University

Leena Sait

Autism spectrum disorder (ASD) is one of the most common neurodevelopmental disorders. Worldwide, ASD has an estimated prevalence of one per 132 persons, and the CDC’s Autism and Developmental Disabilities Monitoring Network estimates a prevalence of 1 in 59 children. To date, no effective medical treatments for the core symptoms of ASD exist. However, biomarkers capable of detecting and diagnosing ASD can help to translate experimental research results to bedside clinical practice. Biomarker discovery in ASD is complicated by the diversity of core symptoms, which comprise deficits in social communication, presence of rigid, repetitive and stereotypical behaviors, and comorbid medical (e.g., epilepsy) or psychiatric symptoms. The EU-AIMS Longitudinal European Autism Project (LEAP), the largest such consortium, has made a great advance in the discovery of biomarkers for ASD. It seeks to identify stratification biomarkers using neurobiological or neurocognitive measures, neuroimaging, electrophysiology, biochemistry and genetics. This work is aimed at the identification of single nucleotide polymorphisms (SNPs) based on SNP genotyping in genomic DNA in a large cohort of ASD patients and unaffected related individuals to help understand the exact genetic causes of ASD. We hypothesized that ranking the genes based on distance in the space of the allele frequencies between affected and unaffected populations can be used to identify new putative biomarkers. The dataset retrieved from the Gene Expression Omnibus database (GSE6754) contains more than 6000 samples from 1,400 families. Our results show that the SNPs that are highly ranked by the distance in three-dimensional genotype count space between all the affected and unaffected subjects in the cohort are more likely to be linked to ASD. These results can open new possibilities for further investigation in identifying the genetic mechanisms of ASD.
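One way to read the ranking described in this abstract is sketched below. It is an assumption-laden illustration, not the authors' pipeline: each SNP is represented by its three genotype counts (for example AA, AB, BB) in the affected and unaffected groups, the counts are converted to fractions so that unequal group sizes do not dominate (a choice made here, not stated in the abstract), and SNPs are ranked by the Euclidean distance between the two points. The function names and the toy SNP identifiers in any usage are hypothetical.

```python
import math

def genotype_distance(affected, unaffected):
    """Euclidean distance between two groups in 3-D genotype space.

    affected, unaffected -- (n_AA, n_AB, n_BB) genotype counts for one SNP.
    Counts are normalized to fractions before the distance is taken.
    """
    fa = [c / sum(affected) for c in affected]
    fu = [c / sum(unaffected) for c in unaffected]
    return math.sqrt(sum((a - u) ** 2 for a, u in zip(fa, fu)))

def rank_snps(counts):
    """Return SNP identifiers sorted from largest to smallest distance.

    counts -- dict mapping SNP id -> (affected_counts, unaffected_counts)
    """
    return sorted(counts, key=lambda snp: genotype_distance(*counts[snp]),
                  reverse=True)
```

Under this reading, a SNP whose genotype distribution is identical in both groups has distance zero and falls to the bottom of the ranking, while SNPs with the most divergent distributions rise to the top as candidate markers.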


130

Workshop: TBI workshop

A pan-cancer 3-gene signature to predict dormancy

Ivy Tran1, Anchal Sharma2, Subhajyoti De2

1Rutgers University-Camden, 2Rutgers Cancer Institute of New Jersey

Ivy Tran

Tumor dormancy is characterized by the dissemination of hibernating tumor cells that do not proliferate until years after apparently successful removal of patients’ primary cancer, resulting in the late relapse of the cancer. Distinguishing between the risk of early (≤8 months) and late (≥5 years) relapse in cancer patients is important for the targeted treatment of the tumor. In this study, we identified 53 genes that were significantly up-regulated or down-regulated in dormant cells, from which three genes, CD300LG, OCIAD2, and VSIG4, were determined by recursive feature elimination to be the most important features in predicting tumor dormancy. Using this three-gene signature, we trained a Random Forest algorithm on a cross-validated (10-fold, repeated 3 times) dataset (n=422) randomly subsetted into training data (75%) and test data (25%), consisting of seven different tumor types: testicular cancer, breast cancer, glioblastoma multiforme, lung cancer, colorectal cancer, kidney cancer and melanoma. The tuned prediction model yielded 80.19% prediction accuracy using confusion matrix analysis, and 82.74% prediction accuracy when using the AUC of a ROC curve as the accuracy metric. When independently testing the model on a validation set (n=44) of liver cancer downloaded from ICGC, confusion matrix analysis yielded a 67.44% accuracy and the AUC of a ROC curve yielded a 60.48% accuracy. This identified 3-gene signature can be useful in predicting early or late relapse of cancer in patients in clinical practice.
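The abstract reports two different figures under the heading of accuracy, one from a confusion matrix and one from the area under a ROC curve. The sketch below shows how the two quantities are computed and why they can differ for the same model. The function names are hypothetical, and any labels or scores passed to them in an example are made up; this is not the authors' Random Forest pipeline.

```python
def confusion_matrix(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = late relapse)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == 1:
            counts["tp" if truth == 1 else "fp"] += 1
        else:
            counts["tn" if truth == 0 else "fn"] += 1
    return counts

def accuracy(y_true, y_pred):
    """Fraction of correct predictions: (tp + tn) / n."""
    cm = confusion_matrix(y_true, y_pred)
    return (cm["tp"] + cm["tn"]) / len(y_true)

def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win; this equals the area under the ROC curve.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Accuracy depends on the single threshold used to turn scores into class predictions, whereas the AUC summarizes ranking quality over all thresholds, so the two numbers need not agree.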


131

AUTHOR INDEX

’t Jong, Geert · 85

A

Abyzov, Alexej · 111
Adkins, Joshua · 96
Agrawal, Monica · 3
Alavi, Ali · 99
Allen, Mary A. · 28
Alterovitz, Wei-Lun · 47
Althagafi, Azza · 70
Altman, Russ B. · 27, 34, 37, 123, 127
Anastopoulos, Ioannis N. · 23
Andrade-Navarro, Miguel A. · 21
Andrianova, Katia · 74
Arslanturk, Suzan · 31
Atwal, Gurnit · 50

B

Bae, Ho · 63
Bae, Taejeong · 111
Baladandayuthapani, Veerabhadran · 65
Baltrus, David · 96
Barash, Yoseph · 92
Barbarino, Julia M. · 34, 123
Barik, Amita · 10, 108
Barlaskar, Sabiha · 99
Barnard, Martha · 39
Beam, Andrew L. · 19
Bebek, Gurkan · 75
Belyeu, Jon · 83
Benchek, Penelope · 60
Berger, Howard · 85
Bhattacharyya, Rupam · 65
Blinder, Pablo · 22
Bobak, Carly A. · 20, 76, 93
Bock, Christoph · 91
Bourque, Guillaume · 25
Branch, Andrea · 7
Brand, Lodewijk · 2
Brannon, Charlotte · 77
Bretaudeau, Anthony · 125
Brinton, Connor · 27
Brodie, Sonia · 85
Brooks, Thomas G. · 79, 92
Brown, James · 85
Brown, Joe · 83
Brown, Yaadira · 80
Brunak, Søren · 113
Brunner, James · 114
Buels, Robert · 125
Bui, Nam · 4
Bull, Shelley B. · 105
Burkhardt, Sophie · 21
Bush, William S. · 60, 64
Bustamante, Carlos D. · 42

C

Cai, Chunhui · 6
Cai, DanDan · 99
Cai, Tianxi · 19
Calvanese, Vincenzo · 95
Candido, Elisa · 44
Capellera-Garcia, Sandra · 95
Carleton, Bruce C. · 78, 85
Carrillo, Katherine I. · 81
Ceri, Stefano · 49
Chalke, Ajinkya · 127
Chaudhry, Shahnaz · 85
Chen, Irene Y. · 3
Chen, Jessica W. · 4
Chen, Jianhan · 12
Chen, Jun · 70
Chen, Yang · 45
Chen, Yong · 38, 122
Cheong, Hee Jin · 102
Cheong, Jae-Ho · 56
Cherng, Sarah T. · 53
Chia, Nicholas · 71
Chmura, Jacob · 50
Choi, Hyun-Soo · 63
Chrisman, Brianna · 68, 103
Christensen, Brock C. · 54
Christensen, Sarah · 15
Chu, Chong · 82
Cohen, William W. · 6
Coker, Beau · 52
Cooke Bailey, Jessica N. · 64
Cormier, Michael · 83
Cornwell III, Edward E. · 80
Costa, Helio A. · 4, 42
Crawford, Dana C. · 64
Crowell, Andrea · 5
Cui, Tianyi · 8

D

Dale, Ryan · 88
Danieletto, Matteo · 53
De, Subhajyoti · 130
Derry, Alexander · 27
Diesh, Colin · 125
Ding, Yali · 84


132

Dovat, Sinisa · 84
Dowell, Robin D. · 28
Draghici, Sorin · 31
Drögemöller, Britt I. · 78, 85
Duan, Rui · 38, 122
Duchen, Raquel · 44
Dudley, Joel T. · 7, 53
Dunker, A. Keith · 47
Dunlap, Kaiti · 103
Dunlap, Kaitlyn · 68
Dunn, Nathan · 125
Durmaz, Arda · 75

E

Ekstrand, Sophia · 95
El-Kebir, Mohammed · 15
Elo, Laura · 117

F

Faraggi, Eshel · 47
Farlik, Matthias · 91
Feng, Song · 96
Feng, Yunyi · 43
FitzGerald, Garret A. · 79, 92
Fondran, Jeremy R. · 60
Fortelny, Matthias · 91
Fried, Inbar · 19
Friedl, Verena · 23

G

Gao, Jean · 59
Garmire, Lana · 65
Gerstein, Mark · 77
Ghadermarzi, Sina · 10, 108
Ghayoori, Sholeh · 85
Ghosh, Sayan · 96
Gilchrist, Alison R. · 28
Gizaw, Martha · 129
Glodde, Josua · 21
Golgher, Lior · 22
Gong, Li · 34, 123
Gorjifard, Sayeh · 88
Grant, Gregory R. · 79, 92
Groeneweg, Gabriella S.S. · 85
Gumerov, Vadim M. · 86
Guo, Margaret · 27, 37
Guo, Yicheng · 106
Gur, Shir · 22
Gursoy, Gamze · 77

H

Ha, Min Jin · 65
Haan, David · 23
Haber, Nick · 68, 103
Hachen, David · 35, 121
Haines, Jonathan L. · 60
Halappanavar, Mahantesh · 96
Hall, Molly A. · 66
Hamilton-Nelson, Kara L. · 60
Hao, Jie · 24
Harati, Sahar · 5
Harrigan, Caitlin F. · 16, 87
Harris, Nomi · 125
Hauskrecht, Milos · 8
He, Xi · 66
Hernandez-Ferrer, Carles · 32
Higginson, Michelle · 85
Hill, Jane E. · 20, 76, 93
Hocking, Toby Dylan · 25
Hoehndorf, Robert · 70, 90
Hogenesch, John B. · 92
Hogue, Christopher W.V. · 11
Holmes, Ian · 125
Hong, Elliot · 115
Horng, Steven · 3
Huang, Fei · 47
Huang, Heng · 2
Huang, Kun · 58
Huddart, Rachel · 34, 123
Hughitt, V. Keith · 88
Husic, Arman · 103
Hwang, Tae Hyun · 56

I

Ito, Shinya · 78, 85

J

Jaakkimainen, Liisa · 44
Jagannathan, N. Suhas · 11
Jahanshad, Neda · 115
Jang, Hyun-Jong · 94
Jenkins, Willysha · 89
Jeong, Euna · 120
Jørgensen, Isabella Friis · 113
Jouline, Igor · 74
Jun, Tomi · 7
Jung, Dahuin · 63
Jung, Jae-Yoon · 103


133

K

Kafkas, Senay · 90
Kalantari, John · 71
Kalantarian, Haik · 68
Kalyva, Maria · 111
Kang, Mingon · 24
Kar, Nabhonil · 56
Karjalainen, Anzhelika · 91
Katuwawala, Akila · 10, 108
Keats, Jonathan J. · 88
Kelly, Libusha · 41
Kent, Jack · 103
Khan, Ariful · 96
Khan, Saad · 41
Kim, Jacob · 114
Kim, Woo Joo · 102
Kinzy, Tyler · 64
Kleber, Marcus E. · 66
Klein, Teri E. · 34, 81, 123
Kline, Aaron · 68, 103
Kloczkowski, Andrzej · 47
Kober, Kord M. · 114
Kocher, Jean-Pierre · 33
Kochunov, Peter · 115
Koestler, Devin C. · 55
Kohane, Isaac S. · 19
Kolberg, Liis · 126
Kompa, Benjamin · 19, 52
Kong, Sek Won · 32
Koohi-Moghadam, Mohamad · 72
Kosaraju, Sai Chandra · 24
Koster, Johannes · 83
Kramer, Stefan · 21
Kriwacki, Richard W. · 13
Krunic, Milica · 91
Kunder, Christian A. · 4
Kunkle, Brian W. · 60
Kurgan, Lukasz · 10, 108
Kuzmin, Ivan · 126

L

Lahens, Nicholas F. · 79, 92
Larson, Melissa C. · 33
Larson, Nicholas B. · 33
Lassnig, Caroline · 91
Lawrence, Cris W. · 79, 92
LeBlanc, Emilie · 103
Lee, E. Alice · 82
Lee, Han Sol · 102
Lee, Hao-Chih · 53
Lee, Joon-Yong · 96
Lee, Rena · 45
Lee, Soohyun · 82
Lee, Sung Hak · 94
Leiserson, Mark D.M. · 15, 17
Lever, Jake · 34, 123
Levy, Joshua J. · 54
Li, Hongyan · 72
Li, Li · 7
Li, Ruowang · 38, 122
Lichtarge, Olivier · 26
Lim, Sooyeon · 102
Lin, Deborah · 43
Lin, John · 64
Lin, Justin · 20, 93
Lin, Simon · 43
Liu, Chang · 43
Liu, Qingzhi · 65
Liu, Shikang · 35, 121
Liu, Xiaorong · 12
Liu, Zhandong · 100
Lizardo, Omar · 35, 121
Lowry, William E. · 100
Lu, Xinghua · 6
Luthria, Gaurav · 36
Lv, Tianling · 45

M

Ma, Feiyang · 95
Machiraju, Raghu · 58
Macho-Maschler, Sabine · 91
Maerz, Winfried · 66
Magee, Laura A. · 85
Mallick, Parag · 58
Manlove, Logan · 111
Manrai, Arjun K. · 101
Mariani, Jessica · 111
Mayberg, Helen · 5
McDermott, Jason · 96
McDonnell, Lauren · 20, 93
Meier, Richard · 55
Melamed, Rachel D. · 116
Melas-Kyriazi, Luke · 101
Meng, Jingwei · 47
Miao, Fudan · 85
Michalowski, Aleksandra M. · 88
Mikkola, Hanna K.A. · 95
Milenkovic, Tijana · 35, 121
Miotto, Riccardo · 53
Mitrea, Diana M. · 13
Mock, Beverly A. · 88
Monroy, Rebeca · 82
Moore, Jason H. · 38, 122
Moosavinasab, Soheil · 43
Morris, Quaid · 16, 44, 50, 87
Mueller, Mathias · 91
Mueller-Myhsok, Bertram · 66

N

Na, Jie · 33
Nakatochi, Masahiro · 97
Nayak, Soumyashant · 79, 92
Nelson, Heidi · 71


134

Nelson,William·96·5Nemati,Shamim

Nemesure,Matthew·93·20Nemesure,MatthewD.

·55Neums,LisaNguyen,Justine·96

·31Nguyen,Tin·2Nichols,Kai

·42Nie,AllenNing,Michael·103Nishi,Hafumi·106Noh,JiYun·102

O

·64O'Toole,JohnF.Oh,JiEun·104Ola,MojoyinolaJoanna·91

·10,47,108Oldfield,ChristopherJ.Olufajo,OlubodeA.·80Orloff,Mohammed·75Ostojic,Annie·98

P

Palmer, Nathan · 19
Park, Choa · 120
Park, Gunseok · 104
Park, Peter J. · 82
Park, Sunho · 56
Park, Yoonsik · 50
Parmigiani, Giovanni · 57
Paskov, KelleyMarie · 68, 103
Passero, Kristin · 66
Patel, Chirag J. · 101
Patel, Ronak Y. · 42
Patil, Prasad · 57
Patnaik, Ritik · 68
Payne, Jonathon L. · 84
Pedersen, Brent · 83
Pellegrini, Matteo · 95
Penev, Yordan · 103
Pershad, Yash · 37
Perumalswami, Ponni · 7
Peterson, Hedi · 126
Petkovic, Dragutin · 99, 127
Pham, Minh · 26
Pietras, ChristopherMichael · 67
Pineda, Arturo L. · 42
Pinoli, Pietro · 49
Piro, Rosario · 49
Poellabauer, Christian · 35, 121
Poelzl, Andrea · 91
Pohodich, Amy E. · 100
Polley, Eric C. · 88
Power, Liam · 67
Proukakis, Christos · 111
Pruneda, Jonathan · 96
Przytycka, Teresa M. · 17

Q

Quinlan, Aaron R. · 83

R

Raman, Ayush T. · 100
Ramchandran, Maya · 57
Ramsey, Stephen A. · 61
Rasche, Helena · 125
Rassekh, Shahrad · 78
Rassekh, Shahrad R. · 85
Raudvere, Uku · 126
Reimand, Jüri · 110
Richardson, Christian · 89, 118
Romero, Pedro · 47
Ross, Colin J.D. · 78, 85
Rowsey, Ross · 33
Rubanova, Yulia · 16, 87
Ryder, Nathan · 39

S

Sagers, Luke · 101
Sait, Leena · 129
Salas, Lucas A. · 54
Sanatani, Shubhayan · 85
Sangkuhl, Katrin · 34, 123
Sarantopoulou, Dimitra · 79, 92
Sawyer, Sara L. · 28
Scheuemie, Martijn J. · 38, 122
Schmaltz, Allen · 19
Schug, Jonathan · 92
Schwartz, Jessey · 68
Sedlazeck, Fritz · 111
Sedor, John R. · 64
Sekar, Shobana · 111
Selega, Alina · 16, 87
Sharan, Roded · 17
Sharma, Anchal · 130
Sharpnack, Michael · 58
Shaw, Kaitlyn · 85
Shen, Li · 2
Shi, Xu · 19
Shoebridge, Stephen · 91
Siekiera, Julia · 21
Simmons, John K. · 88
Skander, Dannielle · 75
Slonim, Donna K. · 67
Somjee, Ramiz · 13
Song, DaeHyun · 24
Song, JoonYoung · 102
Sontag, David · 3
Sørensen, SørenSchwartz · 113
Sosa, Carlos P. · 33

Sosa, Daniel N. · 27
Southerland, William · 80
Sriharan, Aravindhan · 54
Srinivasan, Anand · 92
Srivastava, Arunima · 58
Stabell, Alex C. · 28
Stamoulakatou, Eirini · 49
Stanley, Jacob T. · 28
Staub, Michelle · 85
Stehr, Henning · 4
Stepanov, Nicholas · 127
Stockham, Nathaniel · 68, 103
Striegel, Aaron · 35, 121
Strobl, Birgit · 91
Stuart, Joshua M. · 23
Sun, Hongzhe · 72
Sun, MinWoo · 103
Suomi, Tomi · 117

T

Tao, Ruikang · 23
Tao, Yifeng · 6
Tariq, Qandeel · 68
Tate, Tia · 118
Thompson, Jeffrey A. · 55
Thompson, Paul · 115
Tintle, Nathan · 39
Tomasini, Livia · 111
Tong, Jiayi · 38, 122
Tran, Ivy · 130
Tran, Nhat · 59
Trueman, Jessica · 85
Tsaku, NelsonZange · 24
Tucker-Kellogg, Lisa · 11

U

Urban, Alexander E. · 111
Uversky, Vladimir N. · 47

V

Vaccarino, Flora M. · 111
Vaickus, Louis J. · 54
Vaisman, Iosif · 129
Vandromme, Maxence · 7
Varma, Maya · 68, 103
Vilo, Jaak · 126
Voss, Catalin · 68, 103

W

Wagner, Sarah · 77
Wall, Dennis P. · 68, 103
Wan, Ying-Wooi · 100
Wand, Hannah · 42
Wang, Chen · 33
Wang, Gao · 29
Wang, Haibo · 72
Wang, Hua · 2
Wang, Junwen · 72
Wang, Qingbo · 36
Wang, Yuchuan · 72
Wang, Yue · 29
Wang, Yunlong · 29
Warfe, Mike · 60
Washington, Peter · 68, 103
Weber, Griffin · 19
Wei, Eric · 27
Weinstein, Alana S. · 23
West, Nicholas · 85
Westra, Jason · 39
Whaley, Ryan · 34, 123
Wheeler, Nicholas R. · 60
Whirl-Carrillo, Michelle · 34, 123
Whyte, Simon D. · 85
Williams-DeVane, ClarLynda · 89, 118
Wilson, Jennifer · 127
Wilton, Andrew S. · 44
Wodchis, Walter · 44
Wojtowicz, Damian · 17
Wolf, Jack · 39
Wolf, Lior · 22
Wong, Christopher K. · 23
Wong, Mike · 127
Woon, Mark · 34, 123
Wright, Galen E.B. · 78, 85
Wright, Matt W. · 42
Wu, Tong · 29
Wulf, Bryan · 42

X

Xia, Xueting · 39
Xing, Lei · 45
Xue, Bin · 47

Y

Yalamanchili, HariKrishna · 100
Yang, HeeChul · 104
Yang, Jizhou · 99
Yang, Xinming · 72
Yao, Yao · 61
Yavartanu, Fatemeh · 105
Yoo, YunJoo · 105
Yoon, Sukjoon · 120
Yoon, Sungroh · 63
Young, Adamo · 50
Yu, Ke · 8
Yue, Feng · 84

Z

Zehnder, James L. · 4
Zeng, Xianlong · 43
Zhang, Bo · 84
Zhang, Haoran · 44
Zhang, Lin · 106
Zhang, Mingda · 8
Zhao, Wei · 45
Zhou, Bo · 111
Zhou, Jiayan · 66
Zhulin, Igor B. · 86
Zoghbi, Huda Y. · 100
Zou, James · 42

