The Drosophila Gene Expression Tool (DGET) for expression analyses
Yanhui Hu1, Aram Comjean1, Norbert Perrimon1,2, Stephanie Mohr1
1. Dept. of Genetics, Harvard Medical School, Boston, MA 0211; 2. Howard Hughes Medical Institute
Corresponding author: Stephanie Mohr
Abstract
Background
Next-generation sequencing technologies have greatly increased our ability to identify gene expression levels, including at specific developmental stages and in specific tissues. Gene expression data can help researchers understand the diverse functions of genes and gene networks, as well as help in the design of specific and efficient functional studies, such as by helping researchers choose the most appropriate tissue for a study of a group of genes, or conversely, by limiting a long list of gene candidates to the subset that are normally expressed at a given stage or in a given tissue.
Results
We report a Drosophila Gene Expression Tool (DGET, www.flyrnai.org/tools/dget/web/), which stores and facilitates search of RNA-Seq based expression profiles available from the modENCODE consortium and other public data sets. Using DGET, researchers are able to look up gene expression profiles, filter results based on threshold expression values, and compare expression data across different developmental stages, tissues and treatments. In addition, at DGET a researcher can analyze tissue or stage-specific enrichment for an inputted list of genes (e.g. ‘hits’ from a screen) and search for additional genes with similar expression patterns. We performed a number of analyses to demonstrate the quality and robustness of the resource. In particular, we show that evolutionarily conserved genes expressed at high or moderate levels in both fly and human tend to be expressed in similar tissues. Using DGET, we compared whole tissue profile and sub-region/cell-type specific datasets and estimated the potential cause of false positives in one dataset. We also demonstrated the usefulness of DGET for synexpression studies by querying for genes with an expression profile similar to that of the mesodermal master regulator Twist.
Conclusion
Altogether, DGET provides a flexible tool for expression data retrieval and analysis with short or long lists of Drosophila genes, which can help scientists to design stage- or tissue-specific in vivo studies and do other subsequent analyses.
Keywords
bioRxiv preprint doi: https://doi.org/10.1101/075358; this version posted September 15, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-NC 4.0 International license.
Drosophila, RNA-Seq, expression profile, synexpression
Background
The application of next-generation sequence technologies to RNA analysis has opened the door to relatively rapid, large-scale analyses of gene expression. ‘Standard’ RNA-seq analysis, for example, can provide a snapshot of gene expression in specific cell types or tissues (Wang, Gerstein et al. 2009), and related technologies such as Ribo-seq (Michel and Baranov 2013) provide more refined views, such as a snapshot of what genes are actively translated in a given cell or tissue. For Drosophila, efforts such as the modENCODE project (mod, Roy et al. 2010, Cherbas, Willingham et al. 2011, Graveley, Brooks et al. 2011, Boley, Wan et al. 2014) have provided a baseline overview of expression under standard laboratory conditions for various cultured cell types, developmental stages, and tissues, as well as treatment conditions. Moreover, studies such as those investigating expression in sub-regions of the fly gut (Marianes and Spradling 2013, Dutta, Dobson et al. 2015) are providing increasingly detailed views of the baseline expression levels of various genes in various tissues, cell types and sub-regions. Altogether, these RNA-seq data resources provide helpful starting points for analysis of other gene lists.
Resources such as FlyBase (dos Santos, Schroeder et al. 2015) make it possible to quickly view modENCODE data for a given gene and make these data generally accessible to the community. The value of these data to the community can be further increased by facilitating search of lists of genes. For example, for gene lists originating from whole-animal or cultured cell studies, or for studies based on a list of orthologs of genes from another species, it can be very helpful to get a picture of what stages or tissues normally express those genes, as that will help focus stage- or tissue-specific in vivo studies and other subsequent analyses. We implemented DGET to help scientists retrieve modENCODE expression data in batch mode. DGET also hosts other relevant RNA-Seq data sets published in individual studies, such as profiles of specific sub-regions and cell types of the Drosophila gut (Marianes and Spradling 2013, Dutta, Dobson et al. 2015). Here, we describe DGET and perform a number of analyses to demonstrate the quality and robustness of the resource.
Results and Discussion
Database content and features of the user interface (UI)
The DGET database contains processed RNA-Seq data from the modENCODE consortium (mod, Roy et al. 2010, Cherbas, Willingham et al. 2011, Graveley, Brooks et al. 2011, Boley, Wan et al. 2014), as released by FlyBase (dos Santos, Schroeder et al. 2015), as well as other published data sets we obtained from supplemental tables (Marianes and Spradling 2013, Dutta, Dobson et al. 2015, Clough and Barrett 2016). The DGET UI has two tabs (Figure 1).
At the “Search Gene Expression” tab, users can enter a list of genes or choose one of the predefined gene classes from GLAD (Hu, Comjean et al. 2015), e.g. kinases, then specify the data sets to be displayed. There are two search options, “look at expression” and “enrichment analysis.” The results page for “look at expression” displays expression values in a heatmap format. At this results page, users have the option to download the relevant expression values; download the heatmap; and further filter the list by defining a cutoff, limiting to specific data set(s), or filtering out genes, for example those with an RPKM value of less than 1 based on carcass and/or digestive system expression in 1-day-old adults. We used an RPKM cutoff of 1 because this is considered the cutoff for ‘no or extremely low expression’ at FlyBase. The results page for an enrichment analysis displays the distribution of genes at different expression levels using a bar graph and heatmap. The cutoff values for different levels are defined based on FlyBase guidelines.
Using the “Search Similar Genes” tab, users can enter a gene of interest and search for other genes with a similar expression pattern based on Pearson correlation score. Users have the options to download the list of genes with similar expression patterns, a heatmap, and a normalized heatmap.
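A similar-gene search of this kind can be illustrated with a minimal Python sketch. It is not the DGET implementation (which the Implementation section describes as PHP with MySQL); the function names, the default cutoff and the toy RPKM values are assumptions made only for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # a flat profile has no defined correlation
    return cov / math.sqrt(var_x * var_y)

def similar_genes(query, profiles, cutoff=0.7):
    """Rank all other genes by correlation with the query gene's profile."""
    target = profiles[query]
    hits = []
    for gene, values in profiles.items():
        if gene == query:
            continue
        r = pearson(target, values)
        if not math.isnan(r) and r >= cutoff:
            hits.append((gene, r))
    return sorted(hits, key=lambda item: item[1], reverse=True)

# Toy RPKM profiles across four samples (invented values, not DGET data).
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],  # rises in step with geneA
    "geneC": [4.0, 3.0, 2.0, 1.0],  # falls as geneA rises
    "geneD": [5.0, 5.0, 5.0, 5.0],  # flat profile, skipped
}
print(similar_genes("geneA", profiles))
```

In this toy input only geneB passes the cutoff. Because Pearson correlation is insensitive to scale, a gene expressed at twice the level of the query but with the same shape of profile still scores as highly similar.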
Expression pattern of Drosophila regulatory genes
When genome-scale screening is not practical to do, a common approach is to select a specific subset of genes to start with, such as a group of genes with related activities. The most frequently screened sub-sets of genes are important regulatory genes including genes that encode kinases, phosphatases, transcription factors, or canonical signal transduction pathways components. Our expectation is that these regulatory genes, which appear to be re-used in many contexts, will be expressed in many tissues. To test this, we analyzed the expression patterns of several Drosophila regulatory gene classes defined by GLAD (Hu, Comjean et al. 2015). These included canonical signal transduction pathway genes, kinases, phosphatases, transcription factors, secreted proteins, and receptors. The percentages of expressed genes were calculated across all tissues profiled using a RPKM of 1 or above as a cutoff for expressed versus not expressed (Figure 2). About 70-90% of the genes categorized as encoding canonical signal transduction pathway components, kinases, phosphatases, or transcription factors are expressed in each of the major tissue categories profiled, whereas only 30-60% of receptor or secreted proteins are detected in any given tissue.
Correlation of expression with confidence in an ortholog relationship
It is well established that the evolutionary conservation of proteins correlates with conservation at the level of biological and/or biochemical functions. Drosophila is a model organism of particular interest for which a wide variety of molecular genetic tools are readily available. Particularly, Drosophila models have been developed for a number of human diseases (Perrimon, Bonini et al. 2016). According to DIOPT, 9,705 of 13,902 protein-coding genes in Drosophila are predicted to have human ortholog(s) (Hu, Flockhart et al. 2011). Using DGET we analyzed the expression levels of the subset of Drosophila genes for which there is evidence that they are conserved in the human genome. Specifically, we analyzed subsets of genes scoring as putative human orthologs of fly genes at different levels of confidence, as defined by the DIOPT score (Hu, Flockhart et al. 2011). We found a strong correlation of percent expressed genes with DIOPT score (Figure 3). For example, for genes that have a high-confidence ortholog relationship (DIOPT score of 7 or above), almost all are expressed across all tissues. By contrast, for genes for which DIOPT analysis suggests that there is no evidence of a human ortholog (i.e. none of the 10 ortholog algorithms queried with DIOPT predict an ortholog), only 20-50% are expressed in each of the major tissue categories profiled. We suspect that this correlation is driven by essential genes, which are more conserved evolutionarily. We also note that gene set enrichment for the set of high-confidence orthologs indicates that “kinases” and “nucleotide binding” are among the top 20 enriched sets, indicating that the set of regulatory genes analyzed above has overlap with this set.
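The percent-expressed measure used here and for the regulatory gene classes above (the fraction of a gene group with RPKM of 1 or above in each tissue) can be sketched as follows. This is an illustrative sketch only: the function name, the data layout, the group labels and the RPKM values are all assumptions, and treating a missing tissue value as 0 is a simplification.

```python
def percent_expressed(gene_sets, rpkm, cutoff=1.0):
    """Return {group: {tissue: percent of genes with RPKM >= cutoff}}.

    gene_sets maps a group label to a list of gene identifiers.
    rpkm maps a gene identifier to {tissue: RPKM value}.
    Genes without any expression record are ignored.
    """
    result = {}
    for group, genes in gene_sets.items():
        present = [g for g in genes if g in rpkm]
        tissues = sorted({t for g in present for t in rpkm[g]})
        per_tissue = {}
        for tissue in tissues:
            # A missing tissue value is treated as 0 (assumption).
            n_expr = sum(1 for g in present if rpkm[g].get(tissue, 0.0) >= cutoff)
            per_tissue[tissue] = 100.0 * n_expr / len(present)
        result[group] = per_tissue
    return result

# Invented values for four hypothetical genes in two tissues.
rpkm = {
    "g1": {"gut": 12.0, "head": 3.5},
    "g2": {"gut": 1.0, "head": 0.2},
    "g3": {"gut": 0.0, "head": 0.4},
    "g4": {"gut": 0.5, "head": 2.0},
}
gene_sets = {"score 7-10": ["g1", "g2"], "no ortholog": ["g3", "g4"]}
print(percent_expressed(gene_sets, rpkm))
```

Grouping genes by DIOPT score bin or by GLAD class only changes the contents of `gene_sets`; the per-tissue calculation is the same.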
We next analyzed the 418 Drosophila essential genes identified by Spradling et al. (Spradling, Stern et al. 1999) using a large-scale single P-element insertion fly stock collection. The proportions of essential genes expressed at detectable levels in various tissues are very similar to those of the genes with DIOPT score 7-10 (Figure 3, light purple and dark purple bars), with a Pearson correlation coefficient equal to 0.92.
Expression patterns of Drosophila orthologs of human genes that are highly expressed in specific tissues
Next, we asked whether genes conserved between human and Drosophila are also expressed in similar patterns. We used the tissue-based human proteome annotation available at the Human Protein Atlas (HPA) (www.proteinatlas.org) (Uhlen, Fagerberg et al. 2015) as the source for tissue-specific expression, and retrieved the set of human genes that are expressed in specific tissues. Next, we mapped these human genes to Drosophila orthologs using DIOPT (Hu, Flockhart et al. 2011), filtering out low rank ortholog pairs (see Methods), and analyzed the expression patterns of these high-confidence orthologs in Drosophila tissues using DGET (Figure 4). The results of our analysis using all annotated proteins without a filter did not clearly demonstrate conservation of expression patterns. However, an analysis limited to genes expressed at high or moderate levels (as annotated by HPA) with high-confidence annotation (i.e. excluding HPA “reliability” value of “uncertain”) indicates that gene expression patterns are conserved in similar tissues in Drosophila. For example, as a group, orthologs of genes highly expressed in the human cerebellum, cerebral cortex, lateral ventricle or hippocampus are highly expressed in the Drosophila central nervous system (CNS) or head, at both larval and adult stages, and orthologs of genes highly expressed in human testis are also highly expressed in the Drosophila testis. Moreover, orthologs of genes from some organs of the human digestive system, such as stomach, duodenum or small intestine, are also highly expressed in the Drosophila digestive system. To further compare the expression patterns of genes expressed in the human and Drosophila digestive systems, we analyzed the Drosophila gut sub-region data from Dutta et al. (Dutta, Dobson et al. 2015) (Figure 5). Orthologs of genes highly expressed in the human salivary gland and esophagus are highly expressed in the R1 upstream region, and orthologs of genes highly expressed in the human rectum, colon or appendix are more biased towards expression in the R5 downstream region. Fly orthologs of genes highly expressed in the human stomach, duodenum and small intestine are detected throughout the samples corresponding to R1 to R5.
Mining information from distinct but related fly gut gene expression data sets
We next sought to compare the results of whole-gut profiling with results from profiling of specific sub-regions or cell types with the goal of identifying genes only expressed in specific sub-populations. Our rationale for the analysis was to determine the likelihood that genes expressed in a sub-population are missed in expression analysis of an entire organ. This type of false negative analysis should provide helpful information for interpreting results of whole-organ or whole-tissue studies. Thus, we compared the whole gut profiling data obtained by the modENCODE consortium for 20 day old adult flies (mod, Roy et al. 2010) with data generated by profiling sub-regions of the midgut in 16-20 day old adult flies (Marianes and Spradling 2013). Whole gut profiling indicates that 9,109 genes are expressed in the gut of 20 day old adult flies (RPKM cutoff value of 1). Among the 4,790 protein-coding genes not detected as expressed in the whole-gut study, 136 genes are detected in at least 3 sub-regions of the gut (RPKM >= 3). These genes are either false negatives in whole gut profiling or false positives in sub-region profiling. We next did a gene set enrichment analysis with these 136 genes and found that stress response genes, such as heat-shock genes (Hsp70Aa, Hsp70Ab, Hsp70Ba, Hsp70Bbb), are enriched (P value = 3.05E-07). This suggests that the sample used for sub-region profiling was associated with some level of stress. Comparing the list of 136 genes with the Drosophila essential gene list, we found only one overlapping gene. In addition, only 23 of the 136 genes have DIOPT score 7-10 when mapping to human genes. Thus, a small fraction of these genes might be false negatives of whole tissue profiling, while the majority of the genes are likely to be false positives not normally present in the gut under non-stress conditions.
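The selection of candidate genes in this comparison (below the RPKM cutoff of 1 in the whole gut, but at RPKM >= 3 in at least 3 sub-regions) can be written as a small filter. The sketch below is a hedged illustration rather than the code used for the analysis; the function name, the data layout and all values are hypothetical, and genes absent from the whole-gut data are assumed to be not expressed.

```python
def missed_in_whole_tissue(whole, subregions, whole_cutoff=1.0,
                           sub_cutoff=3.0, min_regions=3):
    """Genes below whole_cutoff in whole-tissue data but at or above
    sub_cutoff in at least min_regions sub-regions."""
    candidates = []
    for gene, region_values in subregions.items():
        # Assumption: a gene missing from the whole-tissue data counts as 0.
        if whole.get(gene, 0.0) >= whole_cutoff:
            continue
        n_detected = sum(1 for v in region_values.values() if v >= sub_cutoff)
        if n_detected >= min_regions:
            candidates.append(gene)
    return sorted(candidates)

# Invented RPKM values for four hypothetical genes.
whole_gut = {"gene1": 0.2, "gene2": 15.0, "gene3": 0.0, "gene4": 0.4}
midgut_regions = {
    "gene1": {"R1": 5.0, "R2": 4.1, "R3": 3.0, "R4": 0.5},
    "gene2": {"R1": 20.0, "R2": 18.0, "R3": 22.0, "R4": 19.0},
    "gene3": {"R1": 6.0, "R2": 0.1, "R3": 0.0, "R4": 2.9},
    "gene4": {"R1": 3.5, "R2": 3.2, "R3": 0.0, "R4": 1.0},
}
print(missed_in_whole_tissue(whole_gut, midgut_regions))
```

Only gene1 qualifies in this toy input: gene2 is already detected in the whole gut, while gene3 and gene4 pass the sub-region threshold in fewer than three regions.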
Synexpression analysis for transcription factor Twist
Expression profiling is a powerful approach to identify functionally related genes, as genes showing synexpression often operate in similar pathways and/or processes (see for example (Dequeant, Fagegaltier et al. 2015)). We tested DGET for its usefulness for synexpression studies by querying for genes with an expression profile similar to that of the mesodermal master regulator Twist. DGET preferentially retrieved Twist target genes with cell line data as well as development data. For example, among the top 27 genes that share similar expression with Twist in cell lines (Pearson correlation coefficient cutoff = 0.7), 11 are Twist target genes based on ChIP-on-chip data (Sandmann, Girardot et al. 2007), and 8 of the 11 genes are high-confidence targets (Table 1). The enrichment p-value is 8.70E-04 for all Twist target genes and 3.05E-05 for high-confidence targets. We observed a less significant enrichment with development data (p-value of 5.00E-02 for all Twist target genes and p-value of 2.70E-03 for high-confidence targets), likely reflecting the diversity of cell types present in the developmental data and that not enough cells express twist. Thus, we expect DGET to be most powerful when applied to RNA-seq data sets from single cells or groups of homogeneous cell populations.
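The text does not state which statistical test produced these enrichment p-values. One common choice for this kind of overlap is a one-sided hypergeometric test, sketched below under that assumption. The function name and the numbers in the example are hypothetical and are not meant to reproduce the reported values, since the background size and the total number of Twist targets are not given here.

```python
from math import comb

def hypergeom_enrichment_p(n_background, n_targets, n_selected, n_overlap):
    """One-sided hypergeometric p-value: probability of seeing at least
    n_overlap targets when n_selected genes are drawn without replacement
    from n_background genes, of which n_targets are targets."""
    upper = min(n_targets, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_targets, k) * comb(n_background - n_targets, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Hypothetical numbers: 20 background genes, 5 of them targets,
# 4 genes retrieved by the similarity search, 2 of which are targets.
p = hypergeom_enrichment_p(20, 5, 4, 2)
print(round(p, 4))
```

To apply this to the Twist query, `n_selected` would be 27 and `n_overlap` 11 (or 8 for high-confidence targets), with `n_background` and `n_targets` taken from the gene universe and target list actually used.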
Concluding Remarks
In summary, DGET makes it possible to retrieve and compare Drosophila gene expression patterns generated by various groups using RNA-Seq. The tool can help scientists design experiments based on expression and analyze experiment results. The backend database for DGET is designed to easily accommodate the addition of new high quality RNA-Seq data sets as they become available. Finally, although the anatomy of human and Drosophila are quite different, by using DGET, we demonstrate that expression patterns of genes that are conserved and highly expressed are conserved between human and Drosophila in many matching tissues, underscoring the utility of the Drosophila model to understand the role of human genes with unknown functions.
Methods
Data retrieval
Processed modENCODE data were retrieved from FlyBase (ftp://ftp.flybase.net/releases/current/precomputed_files/genes/gene_rpkm_report_fb_2015_05.tsv.gz). Data published by Marianes and Spradling (Marianes and Spradling 2013) were retrieved from NCBI Gene Expression Omnibus at (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE47780). Data published by Dutta et al. (Dutta, Dobson et al. 2015) were retrieved from the flygut-seq website (http://flygutseq.buchonlab.com/resources). Data retrieved were mapped to FlyBase identifiers from release 2015_5 and formatted for upload into the FlyRNAi database (Hu, Flockhart et al. 2011).
Expression pattern analysis
Human protein expression data were retrieved from proteinatlas.org and tissue-specific genes were selected using the file “ProteinAtlas_Normal_tissue_vs14.” Proteins with high or medium expression levels with a reliability value of “supportive” were selected. Proteins expressed in a broad range of tissues (i.e. more than 5 tissues) were filtered out. DIOPT vs5 was used to map genes from human to Drosophila (Hu, Flockhart et al. 2011). ‘Ortholog pair rank’ was added at recent DIOPT release 5.2.1 (http://www.flyrnai.org/DRSC-ORH.html#versions). Drosophila genes with high or moderate rank were selected. The high/moderate rank mapping includes the gene pairs that have the best score in either forward or reverse mapping (and DIOPT score > 1) as well as gene pairs with DIOPT score > 3 if not the best score either way.
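The high/moderate rank rule described above can be expressed as a short predicate. The sketch below encodes only the two conditions stated in the text; the function name, the record fields and the example pairs are hypothetical and do not reflect the DIOPT data format.

```python
def is_high_or_moderate_rank(diopt_score, best_forward, best_reverse):
    """Apply the rank rule described in the text to one ortholog pair.

    best_forward / best_reverse: whether the pair has the best score in
    the forward or the reverse mapping, respectively.
    """
    if best_forward or best_reverse:
        return diopt_score > 1
    return diopt_score > 3

# Hypothetical ortholog pairs (invented identifiers and scores).
pairs = [
    {"fly_gene": "flyA", "score": 8, "best_forward": True, "best_reverse": True},
    {"fly_gene": "flyB", "score": 1, "best_forward": True, "best_reverse": False},
    {"fly_gene": "flyC", "score": 4, "best_forward": False, "best_reverse": False},
    {"fly_gene": "flyD", "score": 3, "best_forward": False, "best_reverse": False},
    {"fly_gene": "flyE", "score": 2, "best_forward": False, "best_reverse": True},
]
kept = [
    p["fly_gene"]
    for p in pairs
    if is_high_or_moderate_rank(p["score"], p["best_forward"], p["best_reverse"])
]
print(kept)
```

In this toy list flyB is dropped despite being a best forward match because its score is not above 1, and flyD is dropped because it is neither a best match nor above a score of 3.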
Implementation
DGET was implemented using PHP and JavaScript with a MySQL database for data storage. It is hosted on a server provided by the Research IT Group (RITG) at Harvard Medical School. The MySQL database is also hosted on a server provided by RITG. Plotting of heat-maps for svg download is done in R using the gplot heatmap function. Website bar charts are drawn using the 3C.js plotting package. The PHP Symfony framework scaffold is used to create DGET web pages and forms.
Declarations
Funding
Work at the DRSC is supported by NIGMS R01 GM067761, NIGMS R01 GM084947, and ORIP/NCRR R24 RR032668. S.E.M. is additionally supported in part by NCI Cancer Center Support Grant NIH 5 P30 CA06516 (E. Benz, PI). N.P. is an Investigator of the Howard Hughes Medical Institute.
Competing interest
The authors declare that they have no competing interests.
Authors’ contributions
YH designed and tested the application, implemented the back-end of the application, performed the analysis and drafted the manuscript. AC implemented the user interface and contributed to the back-end of the application. NP provided critical input on key features and the analysis as well as edited the manuscript. SEM provided oversight and critical input on key features and the analysis, and helped draft the manuscript. All authors read and approved the final manuscript.
Table 1. DGET similar gene search results for Twist with cell line data

FBgn          Gene         Correlation score   Twist target?*
FBgn0005636   nvy          0.910987            Yes, high confidence
FBgn0031738   CG9171       0.88094             Yes, high confidence
FBgn0015568   alpha-Est1   0.831094
FBgn0035733   CG8641       0.816603
FBgn0034997   CG3376       0.813417            Yes, low confidence
FBgn0040091   Ugt58Fa      0.799835
FBgn0039827   CG1544       0.773761
FBgn0010389   htl          0.772568            Yes, high confidence
FBgn0001250   if           0.769353            Yes, high confidence
FBgn0039799   CG15543      0.765649
FBgn0038755   Hs6st        0.76281             Yes, high confidence
FBgn0265577   CR44404      0.76095
FBgn0037439   CG10286      0.745739
FBgn0025682   scf          0.744414
FBgn0003301   rut          0.74375             Yes, low confidence
FBgn0036147   Plod         0.73896
FBgn0000723   FER          0.738927            Yes, low confidence
FBgn0034804   CG3831       0.735346
FBgn0051075   CG31075      0.731916
FBgn0263144   bin3         0.72961             Yes, high confidence
FBgn0000575   emc          0.728894            Yes, high confidence
FBgn0038353   CG5399       0.724139
FBgn0085407   Pvf3         0.720044            Yes, high confidence
FBgn0036857   CG9629       0.716929
FBgn0039073   CG4408       0.714359
FBgn0037632   Tcp-1eta     0.702547
FBgn0038804   CG10877      0.701509

* Twist targets as defined in (Sandmann, Girardot et al. 2007)
Figure 1. The DGET user interface.
1a. On the “Search Gene Expression” page, users can input a gene list by pasting Drosophila gene or protein symbols or IDs into the text box, or by uploading a file. The specific identifiers accepted are FlyBase Gene Identifier (FBgn), gene symbol, CG number, and full gene name. Users can choose to look at expression patterns or perform an enrichment analysis of the inputted list as compared with the underlying RNA-Seq data.
1b. At the “Search Similar Genes” page, users can enter a gene symbol (or other accepted identifier) to find genes with similar expression patterns.
Figure 2. Expression patterns of genes in major Drosophila regulatory gene groups.
Figure 3. Relationship between expression levels and gene conservation.
Drosophila genes that are conserved in the human genome at different confidence levels (i.e. different DIOPT scores) were analyzed by DGET. We found that across all tissues, expression levels correlate with confidence in the ortholog relationship. That is, in general, genes with higher DIOPT scores vs. human genes have higher expression levels. Genes with DIOPT scores of 7-10 (light purple bars) have similar expression patterns as compared with Drosophila essential genes (dark purple bars); i.e. in both cases, the genes are likely to be expressed in many or all tissues.
Figure 4. Comparison of gene expression patterns in humans and Drosophila.
High-confidence Drosophila orthologs of genes that are highly expressed in the small intestine, ovary, testis, cerebellum, cerebral cortex, or other tissues were analyzed using DGET. For at least some tissues, we see a correlation between genes highly expressed in specific human tissues (e.g. cerebellum, testis) and the expression of orthologs in cognate tissue sample(s) available for Drosophila (e.g. CNS or head, testis).
Figure 5. Comparison of Drosophila gut sub-region data with the human digestive system.
References
Boley, N., K. H. Wan, P. J. Bickel and S. E. Celniker (2014). "Navigating and mining modENCODE data." Methods 68(1): 38-47.
Cherbas, L., A. Willingham, D. Zhang, L. Yang, Y. Zou, B. D. Eads, J. W. Carlson, J. M. Landolin, P. Kapranov, J. Dumais, A. Samsonova, J. H. Choi, J. Roberts, C. A. Davis, H. Tang, M. J. van Baren, S. Ghosh, A. Dobin, K. Bell, W. Lin, L. Langton, M. O. Duff, A. E. Tenney, C. Zaleski, M. R. Brent, R. A. Hoskins, T. C. Kaufman, J. Andrews, B. R. Graveley, N. Perrimon, S. E. Celniker, T. R. Gingeras and P. Cherbas (2011). "The transcriptional diversity of 25 Drosophila cell lines." Genome Res 21(2): 301-314.
Clough, E. and T. Barrett (2016). "The Gene Expression Omnibus Database." Methods Mol Biol 1418: 93-110.
Dequeant, M. L., D. Fagegaltier, Y. Hu, K. Spirohn, A. Simcox, G. J. Hannon and N. Perrimon (2015). "Discovery of progenitor cell signatures by time-series synexpression analysis during Drosophila embryonic cell immortalization." Proc Natl Acad Sci U S A 112(42): 12974-12979.
dos Santos, G., A. J. Schroeder, J. L. Goodman, V. B. Strelets, M. A. Crosby, J. Thurmond, D. B. Emmert, W. M. Gelbart and C. FlyBase (2015). "FlyBase: introduction of the Drosophila melanogaster Release 6 reference genome assembly and large-scale migration of genome annotations." Nucleic Acids Res 43(Database issue): D690-697.
Dutta, D., A. J. Dobson, P. L. Houtz, C. Glasser, J. Revah, J. Korzelius, P. H. Patel, B. A. Edgar and N. Buchon (2015). "Regional Cell-Specific Transcriptome Mapping Reveals Regulatory Complexity in the Adult Drosophila Midgut." Cell Rep 12(2): 346-358.
Graveley, B. R., A. N. Brooks, J. W. Carlson, M. O. Duff, J. M. Landolin, L. Yang, C. G. Artieri, M. J. van Baren, N. Boley, B. W. Booth, J. B. Brown, L. Cherbas, C. A. Davis, A. Dobin, R. Li, W. Lin, J. H. Malone, N. R. Mattiuzzo, D. Miller, D. Sturgill, B. B. Tuch, C. Zaleski, D. Zhang, M. Blanchette, S. Dudoit, B. Eads, R. E. Green, A. Hammonds, L. Jiang, P. Kapranov, L. Langton, N. Perrimon, J. E. Sandler, K. H. Wan, A. Willingham, Y. Zhang, Y. Zou, J. Andrews, P. J. Bickel, S. E. Brenner, M. R. Brent, P. Cherbas, T. R. Gingeras, R. A. Hoskins, T. C. Kaufman, B. Oliver and S. E. Celniker (2011). "The developmental transcriptome of Drosophila melanogaster." Nature 471(7339): 473-479.
Hu, Y., A. Comjean, L. A. Perkins, N. Perrimon and S. E. Mohr (2015). "GLAD: an Online Database of Gene List Annotation for Drosophila." J Genomics 3: 75-81.
Hu, Y., I. Flockhart, A. Vinayagam, C. Bergwitz, B. Berger, N. Perrimon and S. E. Mohr (2011). "An integrative approach to ortholog prediction for disease-focused and other functional studies." BMC Bioinformatics 12: 357.
Marianes, A. and A. C. Spradling (2013). "Physiological and stem cell compartmentalization within the Drosophila midgut." Elife 2: e00886.
Michel, A. M. and P. V. Baranov (2013). "Ribosome profiling: a Hi-Def monitor for protein synthesis at the genome-wide scale." Wiley Interdiscip Rev RNA 4(5): 473-490.
mod, E. C., S. Roy, J. Ernst, P. V. Kharchenko, P. Kheradpour, N. Negre, M. L. Eaton, J. M. Landolin, C. A. Bristow, L. Ma, M. F. Lin, S. Washietl, B. I. Arshinoff, F. Ay, P. E. Meyer, N. Robine, N. L. Washington, L. Di Stefano, E. Berezikov, C. D. Brown, R. Candeias, J. W. Carlson, A. Carr, I. Jungreis, D. Marbach, R. Sealfon, M. Y. Tolstorukov, S. Will, A. A. Alekseyenko, C. Artieri, B. W. Booth, A. N. Brooks, Q. Dai, C. A. Davis, M. O. Duff, X. Feng, A. A. Gorchakov, T. Gu, J. G. Henikoff, P. Kapranov, R. Li, H. K. MacAlpine, J. Malone, A. Minoda, J. Nordman, K. Okamura, M. Perry, S. K. Powell, N. C. Riddle, A. Sakai, A. Samsonova, J. E. Sandler, Y. B. Schwartz, N. Sher, R. Spokony, D. Sturgill, M. van Baren, K. H. Wan, L. Yang, C. Yu, E. Feingold, P. Good, M. Guyer, R. Lowdon, K. Ahmad, J. Andrews, B. Berger, S. E. Brenner, M. R. Brent, L. Cherbas, S. C. Elgin, T. R. Gingeras, R. Grossman, R. A. Hoskins, T. C. Kaufman, W. Kent, M. I. Kuroda, T. Orr-Weaver, N. Perrimon, V. Pirrotta, J. W. Posakony, B. Ren, S. Russell, P. Cherbas, B. R. Graveley, S. Lewis, G. Micklem, B. Oliver, P. J. Park, S. E. Celniker, S. Henikoff, G. H. Karpen, E. C. Lai, D. M. MacAlpine, L. D. Stein, K. P. White and M. Kellis (2010). "Identification of functional elements and regulatory circuits by Drosophila modENCODE." Science 330(6012): 1787-1797.
Perrimon, N., N. M. Bonini and P. Dhillon (2016). "Fruit flies on the front line: the translational impact of Drosophila." Dis Model Mech 9(3): 229-231.
Sandmann, T., C. Girardot, M. Brehme, W. Tongprasit, V. Stolc and E. E. Furlong (2007). "A core transcriptional network for early mesoderm development in Drosophila melanogaster." Genes Dev 21(4): 436-449.
Spradling, A. C., D. Stern, A. Beaton, E. J. Rhem, T. Laverty, N. Mozden, S. Misra and G. M. Rubin (1999). "The Berkeley Drosophila Genome Project gene disruption project: Single P-element insertions mutating 25% of vital Drosophila genes." Genetics 153(1): 135-177.
Uhlen, M., L. Fagerberg, B. M. Hallstrom, C. Lindskog, P. Oksvold, A. Mardinoglu, A. Sivertsson, C. Kampf, E. Sjostedt, A. Asplund, I. Olsson, K. Edlund, E. Lundberg, S. Navani, C. A. Szigyarto, J. Odeberg, D. Djureinovic, J. O. Takanen, S. Hober, T. Alm, P. H. Edqvist, H. Berling, H. Tegel, J. Mulder, J. Rockberg, P. Nilsson, J. M. Schwenk, M. Hamsten, K. von Feilitzen, M. Forsberg, L. Persson, F. Johansson, M. Zwahlen, G. von Heijne, J. Nielsen and F. Ponten (2015). "Proteomics. Tissue-based map of the human proteome." Science 347(6220): 1260419.
Wang, Z., M. Gerstein and M. Snyder (2009). "RNA-Seq: a revolutionary tool for transcriptomics." Nat Rev Genet 10(1): 57-63.