

Date post: 28-Jan-2021
New Perspectives on Gene Family Evolution: losses in reconciliation and a link with supertrees. By: Cedric Chauve and Nadia El-Mabrouk. Presentation by Julie Hudson for MAT5313, March 10, 2017
Transcript
  • New Perspectives on Gene Family Evolution: losses in reconciliation and a link with supertrees

    By: Cedric Chauve and Nadia El-Mabrouk

    Presentation by Julie Hudson for MAT5313, March 10, 2017

  • Overview

    •  Two main problems:
       – Reconciling a gene tree with a known species tree
       – Determining a probable species tree given only gene trees

    •  Both are optimization problems

    •  This presentation leaves theorems without proof (or out entirely) and instead aims to broadly introduce reconciliation

    •  Experimental results are presented at the end

  • Gene Family

    •  Genes that evolved from a common ancestor through speciation and duplication

    •  Contain orthologs (gene copies in different species) and paralogs (copies evolved by duplication)

    •  Also important are the gene losses, arising through pseudogenization (function lost through a disruption to the coding sequence)

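  • Tree Terminology

    •  Species Tree: binary tree whose leaves are labelled by G = {1, 2, …, g}, one leaf for every species

    •  Gene Tree: binary tree where each leaf is labelled from G and represents a gene copy

    •  L(x): the genome set of vertex x, that is, the labels of the leaves below x

    •  Forest: a set of gene trees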
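The terminology above can be made concrete with a small data structure. The following is a minimal sketch, not taken from the presentation: the `Node` class, the `genome_set` function, and the example trees are hypothetical names and data chosen for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass
class Node:
    """A vertex of a binary tree; a leaf carries a genome label from G."""
    label: Optional[int] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def genome_set(x: Node) -> FrozenSet[int]:
    """L(x): the set of genome labels found on the leaves below x."""
    if x.is_leaf():
        return frozenset({x.label})
    return genome_set(x.left) | genome_set(x.right)


# Hypothetical species tree on G = {1, 2, 3}: ((1, 2), 3)
species_tree = Node(left=Node(left=Node(1), right=Node(2)), right=Node(3))

# Hypothetical gene tree with two gene copies in genome 1: ((1, 1), 3)
gene_tree = Node(left=Node(left=Node(1), right=Node(1)), right=Node(3))

# A forest is simply a collection of gene trees.
forest = [gene_tree]
```

Note that a gene tree may repeat a label (two copies in genome 1 here), while a species tree uses each label exactly once.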

  • Reconciliation

    •  The most commonly used methods for inferring evolutionary scenarios are reconciliation approaches

    •  Reconciliation is a map between a gene tree and a species tree where incongruences are explained through hypothesized gene duplications and losses

  • Reconciliation

    •  Defined in terms of subtree insertions

    •  After undergoing subtree insertions, a tree is said to be an extension of the original

    •  A reconciliation between a gene tree (T) and a species tree (S) is an extension of T that is DS-consistent with S

    •  DS-consistent: for every vertex x of T such that |L(x)| ≥ 2, there exists a vertex u of S such that L(x) = L(u) and one of the following conditions holds: L(xr) = L(xl) [duplication event] or L(xr) = L(ur) and L(xl) = L(ul) [speciation event]
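The DS-consistency condition can be checked mechanically. The sketch below is an illustration under stated assumptions, not the authors' code: trees are encoded as nested tuples (a leaf is an `int`, an internal vertex is a pair), the function names are hypothetical, and the left/right order of children is treated as arbitrary, so the speciation condition is accepted in either order.

```python
def genomes(t):
    """L(t): genome labels on the leaves of t (leaf = int, internal vertex = pair)."""
    if isinstance(t, int):
        return frozenset({t})
    return genomes(t[0]) | genomes(t[1])


def species_vertices(s):
    """Map L(u) -> (L(u_l), L(u_r)) for every internal vertex u of the species tree."""
    table = {}
    if not isinstance(s, int):
        table[genomes(s)] = (genomes(s[0]), genomes(s[1]))
        table.update(species_vertices(s[0]))
        table.update(species_vertices(s[1]))
    return table


def is_ds_consistent(t, s):
    """True if every vertex x of t with |L(x)| >= 2 satisfies condition (D) or (S)."""
    table = species_vertices(s)

    def check(x):
        if isinstance(x, int):
            return True
        lx, ll, lr = genomes(x), genomes(x[0]), genomes(x[1])
        if len(lx) >= 2:
            if lx not in table:          # no vertex u of S with L(x) = L(u)
                return False
            ul, ur = table[lx]
            duplication = ll == lr                            # condition (D)
            speciation = (ll, lr) in ((ul, ur), (ur, ul))     # condition (S)
            if not (duplication or speciation):
                return False
        return check(x[0]) and check(x[1])

    return check(t)


S = ((1, 2), 3)
print(is_ds_consistent(((1, 2), 3), S))                  # same shape as S
print(is_ds_consistent(((1, 3), 2), S))                  # incongruent with S
print(is_ds_consistent((((1, 2), 3), ((1, 2), 3)), S))   # duplication at the root
```

A gene tree that fails this check is exactly one that needs subtree insertions before it becomes a reconciliation.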

  • Algorithm Minimum-Reconciliation

    •  Theorem 2: Algorithm Minimum-Reconciliation reconstructs the unique reconciliation between T and S that minimizes the number of gene losses

    •  A series of subtree insertions on T corresponding to speciation events of S

    •  Roughly: visit a leaf on T -> check its sibling -> does it match the sibling on S? -> insert if it doesn’t

    •  Turn our leaves into cherries!

  • Algorithm Minimum-Reconciliation

    [Figure: Species Tree (S) on leaves 1 to 5, and Gene Tree (T)]

  • Algorithm Minimum-Reconciliation

    [Figure: Species Tree (S) and Gene Tree (T); visited leaves annotated "Sibling: 2" and "Sibling: NA"]

  • Algorithm Minimum-Reconciliation

    [Figure: Species Tree (S) and Gene Tree (T) after an insertion; visited leaves annotated "Sibling: 2" and "Sibling: 2"]

  • Algorithm Minimum-Reconciliation

    Checking for matching pattern from S in T: all these match

    [Figure: Species Tree (S) and Reconciled Tree (T)]

  • Algorithm Minimum-Reconciliation

    Finishing first iteration

    [Figure: Species Tree (S) and Reconciled Tree (T)]

  • Algorithm Minimum-Reconciliation

    Final Reconciled Tree

    [Figure: Reconciled Tree (T) and Species Tree (S)]

  • Algorithm Minimum-Reconciliation

    •  MinR(S,T)

    •  Visits each vertex exactly once

    •  Runs in linear time

    •  From the iterations of MinR(S,T), an evolutionary scenario can be drawn

    expanded leaf of T is a vertex x such that |L(x)| = 1 and L(x) ≠ L(xp), or x is the root of T. A cherry of a tree is an internal vertex x for which both children are expanded leaves.

    Reconciliation. There are several definitions of reconciliation between a gene tree and a species tree. Here we define reconciliation in terms of subtree insertions, following an approach used in [16, 7]. A subtree insertion in a tree T consists in grafting a new subtree onto an existing branch of T. A tree T′ is said to be an extension of T if it can be obtained from T by a sequence of subtree insertions in T.

    Given a gene tree T on G and a species tree S on G, T is said to be DS-consistent with S (following the terminology used in [7]) if, for every vertex x of T such that |L(x)| ≥ 2, there exists a vertex u of S such that L(x) = L(u) and one of the two following conditions (D) or (S) holds: (D) either L(xr) = L(xℓ), or (S) L(xr) = L(ur) and L(xℓ) = L(uℓ).

    A reconciliation between a gene tree T and a species tree S is an extension R of T that is DS-consistent with S (this definition is easily shown to be equivalent to other definitions of reconciliation [3, 12]). Such a reconciliation between T and S implies an unambiguous evolution scenario for the gene family T where a vertex of R that satisfies property (D) represents a duplication (the number of duplications induced by R is denoted by d(R,S)), and an inserted subtree represents a gene loss (the number of gene losses induced by R is denoted by ℓ(R,S)). Vertices of R that satisfy property (S) represent speciation events (see Fig. 1).

    [Figure 1: three panels, (a) S, (b), and (c) H. Panel (c) shows Genomes 1 to 4 with events labelled Speciation 1,3; Speciation 1,2; Speciation 3,4; Duplication; and Gene loss.]

    Fig. 1. (a) A species tree S; (b) The reconciliation R of S with the gene tree T represented by plain lines. Dotted lines represent subtree insertions (3 insertions). The correspondence between vertices of R and S is indicated by vertices labels. Circles represent duplications. All other internal vertices of R are speciation vertices; (c) Evolution scenario resulting from R. Each oval is a gene copy.

    Given a gene tree T, it is immediate to see that every vertex x of T such that L(xℓ) ∩ L(xr) ≠ ∅ will always be a duplication vertex in any

  • Problem 2

    •  What about when we don’t have a species tree to inform the evolutionary scenarios?

    •  Goal: find an evolutionary history that is compatible with as many of the gene trees in the forest as possible

    •  Note that minimizing loss costs does NOT minimize duplication costs in this problem, so duplications are the focus

  • Supertree!

    •  An induced species tree from a set of uniquely leaf-labelled gene trees

    •  Treating our problem as a supertree problem means that the heuristics used on supertrees could be useful here, IF we can show this to be a supertree problem of sorts

    •  Issue: our gene trees are not uniquely labelled

  • Solving Supertree through Bipartitions

    •  Bipartition: a “collapsed”, uniquely labelled tree where only 3 internal vertices exist: a parent and two children. Below these non-binary vertices, leaves of all species in the genome set are present (B)

    •  C(B,T) is the number of bipartitions in B that are not consistent with the tree T (here, the species tree)

    •  Minimizing this solves the supertree problem

    [Figure: a tree on leaves 1 to 5 becomes a bipartition on leaves 1 to 5]
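Counting inconsistent bipartitions can be sketched in a few lines. This is an illustration under assumptions that go beyond the slide: a bipartition is encoded as a pair of disjoint genome sets, and it is taken to be consistent with a tree when some internal vertex separates the two sets into its two child subtrees. The function names and the example data are hypothetical.

```python
def genomes(t):
    """Genome labels on the leaves of t (leaf = int, internal vertex = pair)."""
    if isinstance(t, int):
        return frozenset({t})
    return genomes(t[0]) | genomes(t[1])


def child_sets(t):
    """Yield (L(u_l), L(u_r)) for every internal vertex u of the tree."""
    if isinstance(t, int):
        return
    yield genomes(t[0]), genomes(t[1])
    yield from child_sets(t[0])
    yield from child_sets(t[1])


def is_consistent(bipartition, tree):
    """Assumed criterion: some vertex puts one side in each of its child subtrees."""
    a, b = bipartition
    return any(
        (a <= left and b <= right) or (a <= right and b <= left)
        for left, right in child_sets(tree)
    )


def inconsistency_count(bipartitions, tree):
    """C(B, T): number of bipartitions in B that are not consistent with the tree."""
    return sum(1 for bp in bipartitions if not is_consistent(bp, tree))


tree = ((1, 2), (3, (4, 5)))
B = [
    (frozenset({1, 2}), frozenset({3, 4})),  # separated at the root
    (frozenset({1, 3}), frozenset({2})),     # no vertex separates these
    (frozenset({4}), frozenset({5})),        # separated at the (4, 5) cherry
]
print(inconsistency_count(B, tree))
```

Under this criterion, a supertree heuristic would search for the tree that makes this count as small as possible.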

  • Relation to a Supertree

    •  IF we let each gene tree be a single speciation event,

    •  Theorem 3: Let F be a forest of gene trees on G and k be the number of apparent duplications present in the trees of F. Then for any species tree S on G, d(F,S) = k + C(B(F),S)

    •  Translation: the duplication cost (which we want to minimize) is a function of the number of inconsistent bipartitions
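The constant k in Theorem 3 depends only on the gene trees, so it can be computed before any species tree is chosen. The sketch below assumes that an apparent duplication is a vertex x whose two children share at least one genome, L(xl) ∩ L(xr) ≠ ∅; trees are nested tuples, and the function names and example forest are hypothetical.

```python
def genomes(t):
    """Genome labels on the leaves of t (leaf = int, internal vertex = pair)."""
    if isinstance(t, int):
        return frozenset({t})
    return genomes(t[0]) | genomes(t[1])


def apparent_duplications(t):
    """Count vertices x of t whose children share a genome (assumed definition)."""
    if isinstance(t, int):
        return 0
    here = 1 if genomes(t[0]) & genomes(t[1]) else 0
    return here + apparent_duplications(t[0]) + apparent_duplications(t[1])


def forest_apparent_duplications(forest):
    """k: total number of apparent duplications over all gene trees of the forest."""
    return sum(apparent_duplications(t) for t in forest)


forest = [
    ((1, 1), 2),               # the (1, 1) vertex has two copies from genome 1
    (((1, 2), 3), (1, 4)),     # the root has genome 1 on both sides
    ((1, 2), 3),               # no shared genome anywhere
]
print(forest_apparent_duplications(forest))
```

Since k is fixed by the forest, minimizing d(F,S) over species trees S amounts to minimizing the bipartition term alone.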

  • So….?

    •  We can apply supertree heuristics to the minimum duplication optimization!

    •  In particular, the min-cut algorithm is a greedy approach to solve it

    •  Runs in polynomial time

    •  Major result: an MD forest will be a compatible gene tree forest!

    •  Compatibility is, loosely speaking, where the gene trees and species tree agree at every point you can cut them

    •  This gives ONE species tree that is the most parsimonious evolution history

  • Experimental Results

    •  250 gene trees were simulated using a 12-species Drosophila tree for 4 different gene gain/loss rates

    •  From these gene forests, informative bipartitions were used to compute a species tree (using a min-cut algorithm)

    Rate   Nb. of         Nb. of   Nb. of   Nb. of Int.   Nb. of Apparent   Nb. of
           Duplications   Losses   Genes    vertices      duplications      Bipartitions
    0.02   1080           976      3014     2752          1057              831
    0.05   2018           1366     3622     3360          1948              593
    0.1    3126           1603     4376     4114          3007              358
    0.2    6123           2552     7709     7447          5875              429

    Table 1. Characteristics of simulated gene trees. Considered bipartitions are those containing more than two species.
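One quick reading of Table 1 is the share of simulated duplications that show up as apparent duplications. The snippet below is a small arithmetic sketch using only the figures from the table; the variable and function names are hypothetical.

```python
# Rows of Table 1: (rate, duplications, losses, genes, internal vertices,
#                   apparent duplications, bipartitions)
TABLE_1 = [
    (0.02, 1080, 976, 3014, 2752, 1057, 831),
    (0.05, 2018, 1366, 3622, 3360, 1948, 593),
    (0.1, 3126, 1603, 4376, 4114, 3007, 358),
    (0.2, 6123, 2552, 7709, 7447, 5875, 429),
]


def apparent_share(rows):
    """Return {rate: apparent duplications / duplications} for each row."""
    return {row[0]: row[5] / row[1] for row in rows}


for rate, share in apparent_share(TABLE_1).items():
    print(f"rate {rate}: {share:.1%} of duplications are apparent")
```

At every rate the share stays above 95%, and it decreases slightly as the gain/loss rate grows.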

    Modified Min-Cut algorithm described in [27] to compute a species tree from these bipartitions. With rates 0.02 and 0.05, this species tree is the correct species tree, while with rate 0.1, it differs from the correct one by a single branch swap, and with rate 0.2, it differs from the correct one by the fact that two consecutive binary nodes have been replaced by a single quaternary node. The fit statistic associated with the inferred species tree, which measures how well it agrees with the bipartitions, is very high, ranging from 0.98 to 0.855 (maximum fit is 1). This shows the effectiveness of the supertree approach using bipartitions, at least on a dataset of relatively close species where few vertices indicating a speciation are false positive.

    We also studied the phylogenetic signal given by triplets of species that were split by non-apparent duplication vertices. With rates 0.02 and 0.05, for each triplet of species, there is a phylogeny that appears in most cases. However, with rates 0.1 and 0.2, among the triplets that appear a significant number of times (at least 50 times), the ones where the dominant phylogeny appears in less than 90% of the bipartitions splitting this triplet contain the two species involved in the branch swap or species involved in the unresolved node that differs from the correct species tree. This illustrates the interest in using triplets of species that are split by non-apparent duplication vertices to point at possible locations of an inferred species tree that are associated with a weaker phylogenetic signal.

    6 Conclusion

    In this paper, we show that minimizing losses is a more constraining criterion than minimizing duplications for reconciliation. This highlights the importance of the former criterion from a combinatorial point of view, although it has been rarely considered alone in reconciliation approaches.

  • Experimental Results

    [Figure: two inferred species trees on the 12 Drosophila species, with leaves YAK, SEC, WIL, MOJ, SIM, GRI, PSE, PER, ANA, VIR, ERE, MEL]

    Tree from lowest rates of gain/loss
    Fit Statistic: 0.98

    Tree from highest rate of gain/loss
    Fit Statistic: 0.85

  • Conclusion

    •  Optimization problems have efficiency as a goal

    •  In problem 1, a simple-to-implement algorithm for building a reconciliation was introduced; it runs in linear time

    •  Problem 2 was first recast as a supertree problem, so that well-tested supertree algorithms can be used to solve the minimum duplication problem

    •  Fairly obviously, errors in gene trees can lead to erroneous duplication/loss histories and species trees. Supertree methods highlight potential errors for tree pruning purposes (using split triplets to detect weak phylogenetic signal)

