  • Bioinformatics

    Jan T. Kim

    Introduction

    Mol. Bio. Basics

    Plant Phylogeny

    FMD Transmission

    Sequence Analysis

    Pairwise Alignment

    BLAST

    NGS

    MSA

    Bioinformatics and Advanced Programming

    Jan T. Kim

    BCS Advanced Programming SG, 11 Dec 2014

    Abstract

    Bioinformatics can be defined strictly as the science of information in biological systems, or more broadly as developing and applying computational tools for analysing biological data. The biological processes that generate this information, particularly evolution, are highly complex, and therefore analysis of biological information is often computationally challenging. I will present the following selected topics and highlight the advanced computing challenges they involve, and also outline advances in the biosciences that have been enabled by tackling these challenges.

    Many bioinformatics analyses are based on DNA sequences, which today can be determined at very high volume through “Next Generation Sequencing” (NGS) techniques. As a result, the volume of publicly available sequence data has reached the range of petabytes. Searching this body of data requires highly optimised computational approaches, such as BLAST (“Basic Local Alignment Search Tool”).

    More recent NGS methods generate very large numbers of “short reads”, i.e. strings of sequence. Central computational challenges resulting from these new technologies are “de novo assembly” of the original long sequence(s) from short reads, and mapping very large numbers of short reads to a known reference sequence.

    Phylogeny analysis, i.e. reconstruction of ancestry relationships among species, is a classical field of bioinformatics which typically involves two steps: first, a multiple alignment of the sequences is computed, which is subsequently used to compute a tree. Computing multiple alignments is an optimisation problem that can only be approximately solved.

    The Pirbright Institute

    Preventing and controlling viral diseases

    Core funding:

    Project funding by BBSRC and many others.

    Outline

    1 Introduction
        Molecular Biology Basics
        Resolving the Phylogeny of Land Plants
        Reconstructing Foot and Mouth Disease Transmission Trees

    2 Sequence Analysis
        Pairwise Alignment
        BLAST

    3 “Next Generation” Sequencing Challenges

    4 Multiple Sequence Alignment (MSA)

    Bioinformatics: Definition(s)

    • Scientific inquiry into information in biological systems.

    • Computational analysis of biological data.

    • Computer assisted mining of biological literature.

    DNA: Structure

    [Figure: chemical structure of a DNA strand]

    Base Complementarity

    http://commons.wikimedia.org/wiki/File:DNA_chemical_structure.svg

    The “Central Dogma”

    http://en.wikipedia.org/wiki/Central_dogma_of_molecular_biology
    http://en.wikipedia.org/wiki/File:Cdmb.svg

    The Success of Bioinformatics

    • The Object: Information in biological systems:

        In living systems, a dynamics of information has gained control over the dynamics of energy, which determines the behavior of most non-living systems.

        [Langton, 1992]

    • Genetic information is digital.

    TACCGTCAC

    CTACACCAT

    ACCTACATG

    TTCACATTA
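
    One way to make the "digital" point concrete is to store each base in two bits. The sketch below is illustrative only: the mapping A=0, C=1, G=2, T=3 and the helper names `pack` and `unpack` are assumptions, not something taken from the slides.

```python
# Hypothetical 2-bit code for the four DNA bases (the mapping is an assumption).
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq):
    """Pack a DNA string into one integer, two bits per base."""
    value = 0
    for base in seq:
        value = (value << 2) | CODE[base]
    return value


def unpack(value, length):
    """Recover a DNA string of the given length from its packed integer."""
    bases = []
    for _ in range(length):
        bases.append(BASES[value & 3])  # lowest two bits hold the last base
        value >>= 2
    return "".join(reversed(bases))


packed = pack("TACCGTCAC")
restored = unpack(packed, 9)
```

    The length has to be stored separately, because leading A bases are encoded as zero bits and would otherwise be lost.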

    Sequence Data Is Big Data

    • NCBI-GenBank Flat File Release 204.0 (15 Oct 2014): ftp://ftp.ncbi.nih.gov/genbank/gbrel.txt

        • 178,322,253 loci,
        • 181,563,676,918 bases.

    [Crosswell and Thornton, 2012]

    http://www.ebi.ac.uk/ena/about/statistics

    Land Plant Phylogeny

    Angiosperms (flowering plants)

    Gnetales

    Gymnosperms

    Phylogeny of MADS Proteins

    [Figure: phylogenetic tree of MADS proteins with node support values. Labelled subfamilies: AG, TM3, AGL6, AGL2, SQUA, GGM4, AGL17, CRM3, GLO, DEF, DEF/GLO, CRM1, AGL15. Legend: Angiosperms, Gnetales, Gymnosperms.]

    The AG and AGL6 Subfamilies

    [Figure: subtrees of the MADS protein phylogeny, with node support values.]

    AG subfamily: ZMM1 (Zea), FBP11 (Petunia), BAG1 (Brassica), AG (Arabidopsis), PLE (Antirrhinum), GGM3 (Gnetum), DAL2 (Picea), GGM10 (Gnetum)

    AGL6 subfamily panel: AGL13 (Arabidopsis), AGL6 (Arabidopsis), Zag5 (Zea), ZAG3 (Zea), PRMADS2 (Pinus), GBM1 (Ginkgo), PRMADS3 (Pinus), DAL1 (Picea), GGM11 (Gnetum), GGM9 (Gnetum), ZMM6 (Zea)

    Legend: Angiosperms, Gnetales, Gymnosperms

    The DEF and GLO Subfamilies

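    [Figure: subtree of the MADS protein phylogeny covering the DEF and GLO subfamilies, with node support values.]

    Taxa shown: NTGLO (Nicotiana), GLO (Antirrhinum), PI (Arabidopsis), DAPI (Delphinium), OSMADS2 (Oryza), PMADS1 (Petunia), NTDEF (Nicotiana), DEF (Antirrhinum), TM6 (Lycopersicon), BOBAP3 (Brassica), AP3 (Arabidopsis), DAL13 (Picea), GGM2 (Gnetum), GGM13 (Gnetum)

    Legend: Angiosperms, Gnetales, Gymnosperms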

    Conclusion: Land Plant Phylogeny

    MADS-Box Genes Reveal That Gnetophytes Are More Closely Related to Conifers than to Flowering Plants [Winter et al., 1999].

    Angiosperms

    Gnetales

    Gymnosperms

    Transmission Trees

    Consensus Sequence Data

    site            sequences
    p0 (ancestor)   2
    p1b             6
    p2b             7
    p2c             3
    p3b             8
    p3c             2
    p4b             2
    p5              5
    p6b             3
    p7              8
    p8              1
    sum             47
    combinations    967680

    • 2007 FMD outbreak in UK

    • 10 premises, 2 ancestor samples (p0)
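
    The "sum" and "combinations" rows follow from the per-site counts: the sum is the total number of consensus sequences, and the number of combinations is the number of ways to pick exactly one sequence per site, i.e. the product of the counts. A minimal check, with the counts copied from the table (the variable names are illustrative):

```python
from math import prod

# Number of consensus sequences per site, as listed on the slide.
sequences_per_site = {
    "p0": 2, "p1b": 6, "p2b": 7, "p2c": 3, "p3b": 8, "p3c": 2,
    "p4b": 2, "p5": 5, "p6b": 3, "p7": 8, "p8": 1,
}

total_sequences = sum(sequences_per_site.values())
# One sequence chosen independently at each site: multiply the counts.
combinations = prod(sequences_per_site.values())
```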

    Reference Transmission Tree

    [Figure: reference transmission tree over the nodes p0, p1b, p2b, p2c, p3b, p3c, p4b, p5, p6b, p7, p8]

    Based on a TCS genealogy of all 47 samples, and additional background knowledge.

    Flowchart

    consensus sequences

    sample with 1 sequence / premise

    Hamming distances

    TCS genealogy

    rooted genealogy

    rpetal, rfugal, closest, MST

    Analysis carried out for 1000 random samples containing one consensus sequence from each site.
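
    Two of the steps named in the flowchart, Hamming distances and the minimum spanning tree (MST), can be sketched as follows. This is a toy illustration under stated assumptions, not the pipeline used in the study: the four sequences are invented, the MST is built with Prim's algorithm on the complete distance graph, and the TCS genealogy steps are not modelled.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def minimum_spanning_tree(seqs):
    """Prim's algorithm over all pairwise Hamming distances.

    seqs maps a site name to its sequence; returns (u, v, distance) edges.
    """
    names = list(seqs)
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        # Cheapest edge leaving the current tree; ties are broken by name.
        dist, u, v = min(
            (hamming(seqs[u], seqs[v]), u, v)
            for u in in_tree
            for v in names
            if v not in in_tree
        )
        edges.append((u, v, dist))
        in_tree.add(v)
    return edges


# Invented toy data: one consensus sequence per hypothetical site.
toy = {"p0": "ACGTACGT", "p1": "ACGTACGA", "p2": "ACGTTCGA", "p3": "TCGTACGT"}
mst = minimum_spanning_tree(toy)
```

    An MST is undirected; reading it as a transmission tree additionally requires choosing a root, for which the ancestor p0 is the natural candidate.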

    Results: Tree Topology

    [Figure, rpetal: two panels over the tree topologies t0 to t63. Axes: frequency (0 to 200) and topoDist to ref. tree (0 to 10), each plotted against topology.]

    radipetal: branch nodes merged towards root (p0)

    Results: Tree Topology

    [Figure, rfugal: two panels over the tree topologies t0 to t39. Axes: frequency (0 to 200) and topoDist to ref. tree (0 to 10), each plotted against topology.]

    radifugal: branch nodes merged away from root (p0)

    Results: Tree Topology

    [Figure, closest: two panels over the tree topologies t0 to t41. Axes: frequency (0 to 200) and topoDist to ref. tree (0 to 10), each plotted against topology.]

    closest: branch nodes merged towards closest premise

    Results: Tree Topology

    [Figure, mst: two panels over the tree topologies t0 to t17. Axes: frequency (0 to 200) and topoDist to ref. tree (0 to 10), each plotted against topology.]

    MST

    Summary: FMD Transmission Trees

    • Algorithms for constructing transmission trees
        • from TCS genealogies: radipetal, radifugal, closest,
        • minimum spanning tree (MST).

    • Comparison based on the 2007 outbreak.
        • Among the TCS-based transmission trees, closest provides the best precision.
        • MST provides even marginally better precision.

    • Outlook:
        • Try more sophisticated distance measures.
        • Include further transmission tree reconstruction methods.
        • Larger data sets, NGS “beyond the consensus”

    [Wright et al., 2011]

    Pairwise Alignment: Idea

    S = ACATCTCG
    T = ACTGTA

    alignment

    S^a = ACATCTCG
          || | | :
    T^a = AC-TGT-A

    Formal Definition

    • Extend sequences S and T by inserting gaps to S^a and T^a.
        • aligned sequences have equal length: |S^a| = |T^a|
        • gaps cannot be paired with gaps

    • Biological background: homology, symbols in a column should derive from same common ancestor.

    • Match: column with equal symbols in S^a and T^a.

    • Indel: column with a gap symbol in S^a or T^a.

    • Mismatch: column with different symbols (non-gap) in S^a and T^a.

    ACATCTCG

    AC-TGT-A
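
    The three column types can be read off mechanically from two aligned strings. The following sketch applies the definitions above to the example alignment; the function name `classify_columns` and the use of "-" as the gap symbol are illustrative choices.

```python
def classify_columns(sa, ta, gap="-"):
    """Label each alignment column as 'match', 'indel' or 'mismatch'."""
    if len(sa) != len(ta):
        raise ValueError("aligned sequences must have equal length")
    kinds = []
    for x, y in zip(sa, ta):
        if x == gap and y == gap:
            raise ValueError("gaps cannot be paired with gaps")
        if x == gap or y == gap:
            kinds.append("indel")
        elif x == y:
            kinds.append("match")
        else:
            kinds.append("mismatch")
    return kinds


columns = classify_columns("ACATCTCG", "AC-TGT-A")
```

    For the example this gives four matches, two indels and two mismatches.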

    Scoring of Alignments

    The score m(k) of column k is

    • the space penalty m(k) = −g, if one symbol is the gap symbol (here: g = 2),

    • otherwise the pair score m(k) = µ(s^a(k), t^a(k)), here

        \mu(x, y) = \begin{cases} 1, & \text{if } x = y, \\ -1, & \text{otherwise} \end{cases}

    S^a   =   A  C  A  T  C  T  C  G
    T^a   =   A  C  -  T  G  T  -  A
    score:   +1 +1 −2 +1 −1 +1 −2 −1  = −2
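
    The column-wise scoring can be reproduced in a few lines. This sketch uses the slide's parameters (g = 2, pair score +1 for equal symbols and −1 otherwise); the function names and the "-" gap symbol are illustrative.

```python
def column_scores(sa, ta, g=2, gap="-"):
    """Score m(k) for every column k of an alignment."""
    def mu(x, y):
        # Pair score from the slide: +1 for equal symbols, -1 otherwise.
        return 1 if x == y else -1

    return [-g if gap in (x, y) else mu(x, y) for x, y in zip(sa, ta)]


def alignment_score(sa, ta, g=2, gap="-"):
    """Total alignment score: the sum of the column scores."""
    return sum(column_scores(sa, ta, g, gap))


scores = column_scores("ACATCTCG", "AC-TGT-A")
total = alignment_score("ACATCTCG", "AC-TGT-A")
```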

    Optimal Alignments

    • Objective: Find the alignment with maximal score.

    • Problem: The number of alignments is

        \binom{|S| + |T|}{|S|} \cdot \binom{|S| + |T|}{|T|}

        • Trying out all alignments is impossible.
        • Recursion results in trying out all alignments.

    Optimal Alignments

    • Observation: A prefix alignment of an optimal alignment is optimal (as well).
        • Otherwise, a contradiction results: The optimal alignment could be improved by changing the prefix.

    • Dynamic programming: Tabulate optimal scores of prefix alignments

    ACATCTCG

    AC-TGT-A
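
    Tabulating optimal prefix scores can be sketched as below. This is a minimal version of the scheme commonly known as Needleman-Wunsch global alignment, using the scoring parameters from the earlier slide (g = 2, match +1, mismatch −1). The function name `nw_score` is illustrative, and only the optimal score is computed, not the alignment itself.

```python
def nw_score(s, t, gap=2, match=1, mismatch=-1):
    """Optimal global alignment score of s and t by dynamic programming.

    d[i][j] holds the best score for aligning the prefixes s[:i] and t[:j].
    """
    rows, cols = len(s) + 1, len(t) + 1
    d = [[0] * cols for _ in range(rows)]
    # A prefix aligned against the empty prefix consists of gaps only.
    for i in range(1, rows):
        d[i][0] = -gap * i
    for j in range(1, cols):
        d[0][j] = -gap * j
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if s[i - 1] == t[j - 1] else mismatch
            d[i][j] = max(
                d[i - 1][j - 1] + pair,  # last column pairs s[i-1] with t[j-1]
                d[i - 1][j] - gap,       # last column pairs s[i-1] with a gap
                d[i][j - 1] - gap,       # last column pairs a gap with t[j-1]
            )
    return d[-1][-1]


best = nw_score("ACATCTCG", "ACTGTA")
```

    The table has (|S| + 1) · (|T| + 1) entries and each entry is computed from three neighbours, so the running time grows with the product of the sequence lengths instead of with the number of possible alignments.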

    Table of Prefix-Alignment Scores

                -      A      G      A      C
         -    0.0   -2.0   -4.0   -6.0   -8.0
         A   -2.0    1.0   -1.0   -3.0   -5.0
         G   -4.0   -1.0    2.0    0.0   -2.0
         C   -6.0   -3.0    0.0    1.0    1.0

    The optimal alignment score is 1.0.

    ☞ Notice the O(n^2) complexity.
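The tabulation can be sketched in a few lines of Python. The scoring scheme is not stated on the slide; the values used here (match +1, mismatch -1, gap -2) are an assumption, chosen because they reproduce every cell of the table above. The function name is made up for this illustration.

```python
def prefix_score_table(rows, cols, match=1.0, mismatch=-1.0, gap=-2.0):
    """Tabulate optimal global alignment scores for all pairs of prefixes.

    The scoring parameters are assumed, not taken from the slides.
    """
    table = [[0.0] * (len(cols) + 1) for _ in range(len(rows) + 1)]
    # Aligning a prefix against the empty prefix costs one gap per symbol.
    for i in range(1, len(rows) + 1):
        table[i][0] = i * gap
    for j in range(1, len(cols) + 1):
        table[0][j] = j * gap
    # Each remaining cell extends the best of three shorter prefix alignments.
    for i in range(1, len(rows) + 1):
        for j in range(1, len(cols) + 1):
            pair = match if rows[i - 1] == cols[j - 1] else mismatch
            table[i][j] = max(
                table[i - 1][j - 1] + pair,  # align the two last symbols
                table[i - 1][j] + gap,       # last symbol of rows against a gap
                table[i][j - 1] + gap,       # last symbol of cols against a gap
            )
    return table


table = prefix_score_table("AGC", "AGAC")
for row in table:
    print(" ".join(f"{score:5.1f}" for score in row))
```

The two nested loops over the prefixes are where the quadratic cost comes from.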

    Backtracking the Alignment

                -        A        G        A        C
         -    0.0     -2.0 ←   -4.0 ←   -6.0 ←   -8.0 ←
         A   -2.0 ↑    1.0 ↖   -1.0 ←   -3.0 ↖←  -5.0 ←
         G   -4.0 ↑   -1.0 ↑    2.0 ↖    0.0 ←   -2.0 ←
         C   -6.0 ↑   -3.0 ↑    0.0 ↑    1.0 ↖    1.0 ↖

    (Arrows point to the cell from which each score was obtained.)

    AG-C

    AGAC
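Backtracking can be sketched as follows: fill the score table, then walk from the bottom-right cell back to the origin, at each step moving to a neighbour that explains the current score. The scoring values (match +1, mismatch -1, gap -2) are assumed, as they are not given on the slides, and the function name is hypothetical. Where several moves are possible the sketch prefers the diagonal, so it returns one optimal alignment rather than all of them.

```python
def global_alignment(a, b, match=1.0, mismatch=-1.0, gap=-2.0):
    """Return (score, aligned_a, aligned_b) for one optimal global alignment."""
    n, m = len(a), len(b)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + pair,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Backtrack from the bottom-right cell to the top-left cell.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            top.append(a[i - 1]); bottom.append(b[j - 1])   # diagonal move
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-")        # upward move
            i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1])        # leftward move
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


result = global_alignment("AGC", "AGAC")
print(result)
```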

    Outline

    1 Introduction
        Molecular Biology Basics
        Resolving the Phylogeny of Land Plants
        Reconstructing Foot and Mouth Disease Transmission Trees

    2 Sequence Analysis
        Pairwise Alignment
        BLAST

    3 “Next Generation” Sequencing Challenges

    4 Multiple Sequence Alignment (MSA)

    BLAST: Basic Local Alignment Search Tool

    • Objective: Given a query sequence, find similar sequences in a database.

    • The size of the database prohibits pairwise alignment of the query to all entries.

    • Algorithm outline [Altschul et al., 1997]:

      1 Scan for hits, i.e. gapless short word alignments exceeding a threshold score.

      2 Extend hits maximally to obtain HSPs (high-scoring segment pairs).

      3 Combine HSPs into (gapped) alignments.

    • E-values indicate the expected number of HSPs with at least the given score that would arise by chance.

    • Interesting HSPs have E ≪ 1.
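Steps 1 and 2 of the outline can be illustrated with a much simplified seed-and-extend sketch. It is not the BLAST implementation: seeds are exact word matches only (BLAST also accepts similar words above a threshold score), scoring is a plain match/mismatch scheme, and the names `find_hsps`, `extend` and `x_drop` are chosen for this illustration.

```python
from collections import defaultdict


def extend(query, subject, i, j, word, match, mismatch, x_drop):
    """Extend a word hit at query[i:], subject[j:] without gaps in both directions."""
    def walk(qi, sj, step):
        best = score = 0
        best_len = n = 0
        while 0 <= qi < len(query) and 0 <= sj < len(subject):
            score += match if query[qi] == subject[sj] else mismatch
            n += 1
            if score > best:
                best, best_len = score, n
            elif best - score > x_drop:   # give up once the score has dropped too far
                break
            qi += step
            sj += step
        return best, best_len

    right, right_len = walk(i + word, j + word, +1)
    left, left_len = walk(i - 1, j - 1, -1)
    return (word * match + left + right,      # HSP score
            i - left_len, j - left_len,       # start in query, start in subject
            word + left_len + right_len)      # HSP length


def find_hsps(query, subject, word=3, match=1, mismatch=-1, x_drop=2):
    """Return ungapped HSPs as (score, query_start, subject_start, length)."""
    index = defaultdict(list)                 # word -> positions in the query
    for i in range(len(query) - word + 1):
        index[query[i:i + word]].append(i)
    hsps = set()
    for j in range(len(subject) - word + 1):  # scan the subject for word hits
        for i in index.get(subject[j:j + word], ()):
            hsps.add(extend(query, subject, i, j, word, match, mismatch, x_drop))
    return sorted(hsps, key=lambda hsp: -hsp[0])


hsps = find_hsps("GATTACA", "CCGATTACATT")
print(hsps)
```

Several word hits inside the same region extend to the same HSP, which is why the results are collected in a set.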

    BLAST: Search Engine for Sequences

    http://blast.ncbi.nlm.nih.gov/

    “Next Generation” Sequencing

    • Long DNA sequences cannot be read like a tape.

    • Short fragments from random genomic locations can be sequenced.

    • NGS generates very large numbers of (very) short sequencing reads.

    http://www.illumina.com/systems/miseq.ilmn

    Illumina NGS Sequencing

    • Massive numbers of sequencing reactions take place in one flow cell.

    • DNA is fragmented and adapters are ligated.

    • Fragments (with adapters) are attached to the slide in a flow cell.

    • The slide is studded with primers, facilitating bridge amplification, resulting in double-stranded fragments, which are then denatured.

    • Multiple rounds of amplification result in a cluster from each initial fragment.

    • Reversible terminator nucleotides are added.

    • Incorporated nucleotides fluoresce at different wavelengths.

    • After removal of the terminator, the next nucleotide is added, and the fluorescent light is imaged.

    • Each image yields one base for each cluster. After five images:

      read0 = GCTGA

      read1 = TAAGT

      read2 = CTTAG

      read3 = AGCCG

    • Reads are further processed, e.g. in sequence assembly.

    http://www.illumina.com/documents/products/techspotlights/techspotlight sequencing.pdf
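The way reads grow by one base per image can be written as a transposition. In this sketch each string holds the base calls of one imaging cycle, with character i belonging to cluster i; the cycle strings are read off the four example reads in the slides, and the function name is made up for the illustration.

```python
# Base calls per imaging cycle; position i in each string is cluster i.
cycles = ["GTCA", "CATG", "TATC", "GGAC", "ATGG"]


def reads_from_cycles(cycles):
    """Transpose per-cycle base calls into one read per cluster."""
    return ["".join(bases) for bases in zip(*cycles)]


reads = reads_from_cycles(cycles)
for number, read in enumerate(reads):
    print(f"read{number} = {read}")
```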

    Applications of “Next Generation” Sequencing

    • Novel genome sequencing

    • Re-sequencing to discover genomic variation
      • Single nucleotide polymorphisms (SNPs), and their association to phenotypic traits,
      • Evolution of genomic variation patterns.

    • Metagenomics

    • *-Seq techniques
      • gene expression measurement: RNA-Seq
      • binding sites: ChIP-Seq
      • microRNA-Seq

    Mapping NGS Reads

    • Task: align billions of short reads to a known reference genome.

    • Not feasible using dynamic programming.

    • Feasible using advanced indexing of the reference.
      • e.g. Burrows-Wheeler transform
      • Pigeonhole principle

    GACTAGAGTAGACGATGAGACCCATGACA

    GGC  GAGTAGACGAT  GACCCATGATA
    GGCT GAGTAGCCGATG   CCCATGACA
    GGCT  AGTAGCCGATGAG CTCATGACA
    GGCTAG GTAGACGATGAGA  CATGACA
    GGCTAGA  AGACGATGAGA   ATGACA
    GGCTAGAG AGCCGATGAGACC ATGACA
    GGCTAGAGT GACGATGAGACCC
               CCGATGAGACCCAT
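Both indexing ideas can be sketched on the reference from the slide. The sketch builds a Burrows-Wheeler index naively (by sorting all suffixes, which real read mappers avoid), finds exact matches by backward search, and uses the pigeonhole principle for inexact matches: a read with at most k mismatches, cut into k + 1 pieces, contains at least one piece that matches exactly. All function names are made up for this illustration; only substitutions are handled, not insertions or deletions.

```python
def build_index(reference):
    """Naive Burrows-Wheeler index: suffix array, first-column offsets, rank table."""
    text = reference + "$"
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)
    counts = {}
    for c in text:
        counts[c] = counts.get(c, 0) + 1
    first, total = {}, 0              # first[c]: number of symbols smaller than c
    for c in sorted(counts):
        first[c] = total
        total += counts[c]
    occ = {c: [0] * (len(bwt) + 1) for c in counts}   # occ[c][i]: c's in bwt[:i]
    for i, symbol in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (symbol == c)
    return sa, first, occ


def find_exact(pattern, index):
    """Backward search: positions at which pattern occurs exactly."""
    sa, first, occ = index
    lo, hi = 0, len(sa)
    for c in reversed(pattern):
        if c not in first:
            return []
        lo = first[c] + occ[c][lo]
        hi = first[c] + occ[c][hi]
        if lo >= hi:
            return []
    return sorted(sa[k] for k in range(lo, hi))


def map_read(read, reference, index, max_mismatches=1):
    """Pigeonhole seeding: exact hits of read pieces, verified by counting mismatches."""
    pieces = max_mismatches + 1
    size = len(read) // pieces
    hits = set()
    for p in range(pieces):
        start = p * size
        end = len(read) if p == pieces - 1 else start + size
        for pos in find_exact(read[start:end], index):
            origin = pos - start
            if 0 <= origin <= len(reference) - len(read):
                window = reference[origin:origin + len(read)]
                if sum(x != y for x, y in zip(read, window)) <= max_mismatches:
                    hits.add(origin)
    return sorted(hits)


reference = "GACTAGAGTAGACGATGAGACCCATGACA"
index = build_index(reference)
print(find_exact("GAGTAGACGAT", index))          # read matching the reference exactly
print(map_read("GGCTAGAGT", reference, index))   # read with one mismatch
```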

    Finding Single Nucleotide Polymorphisms (SNPs)

    GACTAGAGTAGACGATGAGACCCATGACA

    GGC  GAGTAGACGAT  GACCCATGATA
    GGCT GAGTAGCCGATG   CCCATGACA
    GGCT  AGTAGCCGATGAG CTCATGACA
    GGCTAG GTAGACGATGAGA  CATGACA
    GGCTAGA  AGACGATGAGA   ATGACA
    GGCTAGAG AGCCGATGAGACC ATGACA
    GGCTAGAGT GACGATGAGACCC
               CCGATGAGACCCAT
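A simple way to find candidate SNPs in such a pile of mapped reads is to count the bases in each column and report columns in which a non-reference base is frequent. The sketch below uses the reads from the slide, with mapping offsets worked out by matching each read against the reference. The 30% threshold and the function name are assumptions made for the illustration; real variant callers also use base qualities and statistical models.

```python
from collections import Counter

REFERENCE = "GACTAGAGTAGACGATGAGACCCATGACA"

# (offset in the reference, read); offsets derived by hand from the slide.
MAPPED_READS = [
    (0, "GGC"), (5, "GAGTAGACGAT"), (18, "GACCCATGATA"),
    (0, "GGCT"), (5, "GAGTAGCCGATG"), (20, "CCCATGACA"),
    (0, "GGCT"), (6, "AGTAGCCGATGAG"), (20, "CTCATGACA"),
    (0, "GGCTAG"), (7, "GTAGACGATGAGA"), (22, "CATGACA"),
    (0, "GGCTAGA"), (9, "AGACGATGAGA"), (23, "ATGACA"),
    (0, "GGCTAGAG"), (9, "AGCCGATGAGACC"), (23, "ATGACA"),
    (0, "GGCTAGAGT"), (10, "GACGATGAGACCC"),
    (11, "CCGATGAGACCCAT"),
]


def call_snps(reference, mapped_reads, min_fraction=0.3):
    """Return (position, reference base, variant base, variant count, depth)."""
    columns = [Counter() for _ in reference]
    for offset, read in mapped_reads:
        for k, base in enumerate(read):
            columns[offset + k][base] += 1
    snps = []
    for pos, counts in enumerate(columns):
        depth = sum(counts.values())
        for base, n in sorted(counts.items()):
            if base != reference[pos] and n / depth >= min_fraction:
                snps.append((pos, reference[pos], base, n, depth))
    return snps


snps = call_snps(REFERENCE, MAPPED_READS)
for snp in snps:
    print(snp)
```

With these settings, the mismatches that occur in only one read (at positions 21 and 27, counting from 0) are treated as sequencing errors, while the variants at positions 1 and 11 are reported.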

    Aligning Reads to Reference Sequences

    Example: RNA-Seq

    Assembling NGS Reads

    GCTGATGTGCCGCCTCACTCCGGTGG

    CACTCCGGTGG

    CTCACTCCTGTGG

    GCTGATGTGCCACCTCA

    GATGTGCCGCCTCACTC

    GTGCCACCTCACTCCGG

    CTCCGGTGG

    • Many copies of a genome are fragmented.

    • Each base has a quality value, giving its probability of being correct.

    NGS Assembly: Example

    Reads (each base followed by its quality), aligned by their overlaps:

    C99A99G99A98G95C96A93
    C99A99G99A99G99C99A97G95A94C96A89
                      A99G99A99C99A99A45C26T57A87A85G84T78
                                              A99A99G99T99G99C98T99A96T91C88A82
                                                             C99T99A99T99C99A99A96C94T95
                                                                T99A99T99C99A99A94C97T95A91G88
                                                                            A99A99C99T98A91G93

    Consensus:

    C99A99G99A99G99C99A99G99A99C99A99A99C99T99A99A99G99T99G99C99T99A99T99C99A99A99C99T99A99G99

    Consensus with the poorly supported base (C26, covered by a single read) masked as N:

    C99A99G99A99G99C99A99G99A99C99A99A99N99T99A99A99G99T99G99C99T99A99T99C99A99A99C99T99A99G99
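How a consensus with masked bases might be derived from aligned reads is sketched below. The read offsets are worked out from the overlaps in the example. The rule used here (take the best-quality call per column, and report N if that quality is below 40) is an assumption for illustration, as the slides do not say how the consensus was computed; the threshold 40 is simply a value that separates C26 from the other calls.

```python
import re

# (offset in the assembly, read as base/quality tokens); offsets derived by hand.
ALIGNED_READS = [
    (0, "C99A99G99A98G95C96A93"),
    (0, "C99A99G99A99G99C99A97G95A94C96A89"),
    (6, "A99G99A99C99A99A45C26T57A87A85G84T78"),
    (14, "A99A99G99T99G99C98T99A96T91C88A82"),
    (19, "C99T99A99T99C99A99A96C94T95"),
    (20, "T99A99T99C99A99A94C97T95A91G88"),
    (24, "A99A99C99T98A91G93"),
]


def consensus(aligned_reads, min_quality=40):
    """Best-quality base per column; columns without a good call become N."""
    best = {}  # column -> (quality, base) of the best call seen so far
    for offset, read in aligned_reads:
        tokens = re.findall(r"([ACGT])(\d{2})", read)
        for k, (base, quality) in enumerate(tokens):
            call = (int(quality), base)
            best[offset + k] = max(best.get(offset + k, call), call)
    calls = (best.get(pos, (0, "N")) for pos in range(max(best) + 1))
    return "".join(base if quality >= min_quality else "N" for quality, base in calls)


masked = consensus(ALIGNED_READS)
unmasked = consensus(ALIGNED_READS, min_quality=0)
print(unmasked)
print(masked)
```

The masked position is covered by a single read only, so additional sequencing depth would be needed to resolve it.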

    NGS Sequence Assembly

    • Assembly depends on overlaps among reads.

    • Quality of bases must be taken into account.

    • Reads that are too short are not informative.

    • Repetitive sequences make assembly difficult.

    • Insufficient depth results in multiple contigs.

    • Sufficient depth is a key success factor:
      • Joining of contigs depends on sufficient overlap (N50 value).
      • Resolving low quality bases depends on depth.
      • Depth does not help resolve repetitive sequences.
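The N50 value mentioned above summarises how contiguous an assembly is. A minimal sketch of its usual definition: N50 is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly. The function name and the example lengths are chosen for the illustration.

```python
def n50(contig_lengths):
    """Length L such that contigs of length >= L cover at least half the assembly."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # no contigs at all


example = n50([2, 3, 4, 5, 6, 7, 8, 9, 10])
print(example)
```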

    NGS Assembly: Overlap Approach

    ATTCCCGTA

    CCCGTAA

    TAATCTACGACTAAG

    ATTAAGTCA

    CTACGAT

    GTCACAACC

    [Figure: overlap graph of these reads; each edge is labelled with the length of the overlap between two reads, e.g. 6 between ATTCCCGTA and CCCGTAA.]
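The edge weights of an overlap graph can be computed as sketched below: for every ordered pair of reads, find the longest suffix of the first read that equals a prefix of the second. This is a naive all-against-all comparison with exact matching, which is one reason why the overlap approach is only feasible for smaller read sets; the function names are made up for the illustration.

```python
from itertools import permutations


def overlap(a, b, min_length=1):
    """Length of the longest suffix of a that equals a prefix of b (0 if none)."""
    for length in range(min(len(a), len(b)), min_length - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def overlap_graph(reads, min_length=1):
    """Map each ordered pair of reads with an overlap to the overlap length."""
    return {(a, b): n
            for a, b in permutations(reads, 2)
            if (n := overlap(a, b, min_length)) > 0}


reads = ["ATTCCCGTA", "CCCGTAA", "TAATCTACGACTAAG",
         "ATTAAGTCA", "CTACGAT", "GTCACAACC"]
graph = overlap_graph(reads, min_length=3)
for (a, b), n in sorted(graph.items(), key=lambda item: -item[1]):
    print(f"{a} -> {b}: {n}")
```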

    De Bruijn Graph

    http://commons.wikimedia.org/wiki/File:DeBruijn-as-line-digraph.svg

    NGS Assembly: De Bruijn Approach

    Compeau, Pevzner & Tesler, Nature Biotechnology 29 (2011): 987–991, Fig. 3
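The idea can be sketched on a toy example: every k-mer found in the reads becomes an edge between its (k-1)-mer prefix and its (k-1)-mer suffix, and a walk that uses every edge once spells out an assembly. The sketch makes strong simplifying assumptions (error-free reads, each k-mer used once, a single linear genome) and the function names are made up; practical assemblers handle errors, repeats and k-mer multiplicities.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Nodes are (k-1)-mers; each distinct k-mer in the reads adds one edge."""
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    graph = defaultdict(list)
    for kmer in sorted(kmers):
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_walk(graph):
    """Visit every edge exactly once (Hierholzer's algorithm)."""
    in_degree = defaultdict(int)
    for successors in graph.values():
        for node in successors:
            in_degree[node] += 1
    # Start where one more edge leaves than enters, if such a node exists.
    start = next((node for node in graph
                  if len(graph[node]) - in_degree[node] == 1), next(iter(graph)))
    remaining = {node: list(successors) for node, successors in graph.items()}
    stack, walk = [start], []
    while stack:
        node = stack[-1]
        if remaining.get(node):
            stack.append(remaining[node].pop())
        else:
            walk.append(stack.pop())
    walk.reverse()
    return walk


def assemble(reads, k):
    """Spell the sequence along an Eulerian walk through the de Bruijn graph."""
    walk = eulerian_walk(de_bruijn_graph(reads, k))
    return walk[0] + "".join(node[-1] for node in walk[1:])


contig = assemble(["ATGCG", "GCGTC", "GTCA"], k=3)
print(contig)
```

Unlike the overlap approach, no read is ever compared with another read: the work is proportional to the number of k-mers.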

    Polymorphisms and de Bruijn Assembly

    [Leggett et al., 2013, Fig. 1]

    Summary: NGS Data Analysis

    • Mapping to a reference sequence, using indexing
      • resequencing,
      • detection of SNPs and other variants,
      • identification of genes (RNA-seq).

    • De novo assembly of genomes or transcriptomes.
      • Resource intensive (particularly memory)
      • Overlap assembly: feasible with smaller sets
      • De Bruijn graph assembly of k-mers

    • NGS metagenomics . . .

    Software has limitations and is evolving rapidly.

    Multiple Alignment

    • Extend pairwise approach?
      • 2 sequences: table of n^2 prefix alignments
      • 3 sequences: table of n^3 prefix alignments
      • Warning: Very large numbers ahead

    • Aligning 100 sequences of 300 symbols: about 300^100 ≈ 10^248 prefix alignments.

    • How much computing time does the universe have?
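The size of the problem can be made concrete with a short calculation. Counting one table cell per combination of prefixes, k sequences of length n need n^k cells. The sketch compares this with the number of cells a machine could fill in the age of the universe (about 13.8 billion years, or 4.35 × 10^17 seconds); the rate of 10^18 cells per second is a hypothetical figure chosen only for the comparison.

```python
import math

AGE_OF_UNIVERSE_SECONDS = 4.35e17   # about 13.8 billion years
CELLS_PER_SECOND = 1e18             # hypothetical machine, for illustration only


def log10_table_size(num_sequences, length):
    """log10 of the number of prefix alignments, i.e. of length ** num_sequences."""
    return num_sequences * math.log10(length)


exponent = log10_table_size(100, 300)
budget = math.log10(AGE_OF_UNIVERSE_SECONDS * CELLS_PER_SECOND)
print(f"table size  ~ 10^{exponent:.0f} cells")
print(f"cell budget ~ 10^{budget:.0f} cells")
```

Under these assumptions the table exceeds the budget by more than 200 orders of magnitude, which is why exact multiple alignment by dynamic programming is out of reach.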

    Progressive Multiple Alignment

    • Compute all pairwise alignments

    • Use alignment dissimilarities to produce a guide tree.

    • Align the most similar pair of sequences and merge them into a profile.

    • Progressively align profiles.

    • Result: All sequences aligned (and merged into one profile).

    • Programs: clustal, muscle
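The guide tree step can be sketched as a simple agglomerative clustering: repeatedly join the two most similar clusters until one tree remains. This is only an illustration of the principle, not the method used by clustal or muscle. As a stand-in for alignment dissimilarity it counts mismatches between equal-length sequences, and it joins clusters by average distance; both choices, the example sequences and the function names are assumptions.

```python
from itertools import combinations


def mismatches(a, b):
    """Stand-in dissimilarity: number of differing positions (equal lengths assumed)."""
    return sum(x != y for x, y in zip(a, b))


def guide_tree(sequences, distance=mismatches):
    """Return a nested tuple describing the order in which to align sequences."""
    clusters = [(seq, [seq]) for seq in sequences]   # (subtree, member sequences)

    def average_distance(pair):
        (_, members1), (_, members2) = pair
        total = sum(distance(a, b) for a in members1 for b in members2)
        return total / (len(members1) * len(members2))

    while len(clusters) > 1:
        first, second = min(combinations(clusters, 2), key=average_distance)
        clusters = [c for c in clusters if c is not first and c is not second]
        clusters.append(((first[0], second[0]), first[1] + second[1]))
    return clusters[0][0]


tree = guide_tree(["AAAA", "AAAT", "GGGG", "GGCC"])
print(tree)
```

A progressive aligner would then follow the tree from the leaves upwards, aligning sequences or profiles at each join.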

    More Uses of Continuous Sequences

    • Profile searches (mostly superseded by HMMs)

    • Progressive multiple alignment

    Sequences: ACAC, ACCC, AGT, AGAT, AGCT

    Profile of AGAT and AGCT:

        a  1.0  0.0  0.5  0.0
        c  0.0  0.0  0.5  0.0
        g  0.0  1.0  0.0  0.0
        t  0.0  0.0  0.0  1.0
        -  0.0  0.0  0.0  0.0

    Profile of ACAC and ACCC:

        a  1.0  0.0  0.5  0.0
        c  0.0  1.0  0.5  1.0
        g  0.0  0.0  0.0  0.0
        t  0.0  0.0  0.0  0.0
        -  0.0  0.0  0.0  0.0

    Profile of AGT, AGAT and AGCT (AGT aligned as AG-T):

        a  1.0  0.0  0.3  0.0
        c  0.0  0.0  0.3  0.0
        g  0.0  1.0  0.0  0.0
        t  0.0  0.0  0.0  1.0
        -  0.0  0.0  0.3  0.0

    Profile of all five sequences:

        a  1.0  0.0  0.4  0.0
        c  0.0  0.4  0.4  0.4
        g  0.0  0.6  0.0  0.0
        t  0.0  0.0  0.0  0.6
        -  0.0  0.0  0.2  0.0
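A profile of this kind is simply the relative frequency of each symbol in each alignment column. The sketch below computes it for sequences that are already aligned (gaps written as "-"); the function name is made up, and values are rounded to one decimal place to match the way the numbers are shown on the slide.

```python
def profile(aligned, alphabet="acgt-"):
    """Relative frequency of each symbol in each column of an alignment."""
    length = len(aligned[0])
    return {
        symbol: [
            round(sum(seq[i].lower() == symbol for seq in aligned) / len(aligned), 1)
            for i in range(length)
        ]
        for symbol in alphabet
    }


final = profile(["ACAC", "ACCC", "AG-T", "AGAT", "AGCT"])
for symbol, row in final.items():
    print(symbol, " ".join(f"{value:.1f}" for value in row))
```

Because each position is a distribution rather than a single symbol, a profile can be aligned against another profile in the same way as a sequence, with a column score that averages over the symbols.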

    Multiple Alignment

    (program: clustalx)

    http://www.clustal.org/

    Overview of Molecular Phylogeny

    sequences → alignment → aligned sequences

    aligned sequences → distance calculation → distance matrix → neighbor joining → tree

    aligned sequences → parsimony → tree

    aligned sequences → maximum likelihood → tree

    Acknowledgements

    • Kai-Uwe Winter, Thomas Münster, Luzie U. Wingen, Günter Theißen, Heinz Saedler

    • Begoña Valdazo-Gonzalez, Nick Knowles, Don King

    • Jan Gewehr, Thomas Martinetz, Daniel Polani, Simon Moxon, Vincent Moulton

    • Anyela Camargo, Alessandra Devoto, John Turner

    References

    Altschul, S. F., Madden, T. L., Schäffer, A. A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D. J. (1997). Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Research, 25:3389–3402. http://nar.oupjournals.org/cgi/content/full/25/17/3389.

    Crosswell, L. C. and Thornton, J. M. (2012). ELIXIR: A distributed infrastructure for European biological data. Trends in Biotechnology, 30:241–242.

    Langton, C. G. (1992). Preface. In Langton, C. G., Taylor, C., Farmer, J. D., and Rasmussen, S., editors, Artificial Life II, volume X of Santa Fe Institute Studies in the Sciences of Complexity, Proceedings, pages xiii–xviii, Redwood City, CA. Addison-Wesley.

    Leggett, R. M., Ramirez-Gonzalez, R. H., Verweij, W., Kawashima, C. G., Iqbal, Z., Jones, J. D., Caccamo, M., and MacLean, D. (2013). Identifying and classifying trait linked polymorphisms in non-reference species by walking coloured de Bruijn graphs. PLoS One, 8:e60058.

    Winter, K.-U., Becker, A., Münster, T., Kim, J. T., Saedler, H., and Theißen, G. (1999). MADS-box genes reveal that gnetophytes are more closely related to conifers than to flowering plants. Proceedings of the National Academy of Sciences, USA, 96:7342–7347.

    Wright, C. F., Morelli, M. J., Thébaud, G., Knowles, N. J., Merzyk, P., Paton, D. J., Haydon, D. T., and King, D. P. (2011). Beyond the consensus: Dissecting within-host viral population diversity of foot-and-mouth disease virus by using next-generation genome sequencing. Journal of Virology, 85:2266–2275.

    Thank You

    for your attention and participation
