
Multiple Sequence Alignment

Date post: 25-Feb-2016
Upload: oral
Description:
Multiple Sequence Alignment. Definition: given N sequences x_1, x_2, …, x_N, insert gaps (-) in each sequence x_i such that all sequences have the same length L and the score of the global map is maximum. Applications. Scoring function: Sum of Pairs. Definition: induced pairwise alignment.
Page 1: Multiple Sequence Alignment

http://cs273a.stanford.edu [Bejerano Fall16/17] 1

CS273A

Lecture 17: Cross Species Comparisons

Page 2: Multiple Sequence Alignment

Announcements

• Your project should be coming along nicely!

Page 3: Multiple Sequence Alignment

TTATATTGAATTTTCAAAAATTCTTACTTTTTTTTTGGATGGACGCAAAGAAGTTTAATAATCATATTACATGGCATTACCACCATATACATATCCATATCTAATCTTACTTATATGTTGTGGAAATGTAAAGAGCCCCATTATCTTAGCCTAAAAAAACCTTCTCTTTGGAACTTTCAGTAATACGCTTAACTGCTCATTGCTATATTGAAGTACGGATTAGAAGCCGCCGAGCGGGCGACAGCCCTCCGACGGAAGACTCTCCTCCGTGCGTCCTCGTCTTCACCGGTCGCGTTCCTGAAACGCAGATGTGCCTCGCGCCGCACTGCTCCGAACAATAAAGATTCTACAATACTAGCTTTTATGGTTATGAAGAGGAAAAATTGGCAGTAACCTGGCCCCACAAACCTTCAAATTAACGAATCAAATTAACAACCATAGGATGATAATGCGATTAGTTTTTTAGCCTTATTTCTGGGGTAATTAATCAGCGAAGCGATGATTTTTGATCTATTAACAGATATATAAATGGAAAAGCTGCATAACCACTTTAACTAATACTTTCAACATTTTCAGTTTGTATTACTTCTTATTCAAATGTCATAAAAGTATCAACAAAAAATTGTTAATATACCTCTATACTTTAACGTCAAGGAGAAAAAACTATAATGACTAAATCTCATTCAGAAGAAGTGATTGTACCTGAGTTCAATTCTAGCGCAAAGGAATTACCAAGACCATTGGCCGAAAAGTGCCCGAGCATAATTAAGAAATTTATAAGCGCTTATGATGCTAAACCGGATTTTGTTGCTAGATCGCCTGGTAGAGTCAATCTAATTGGTGAACATATTGATTATTGTGACTTCTCGGTTTTACCTTTAGCTATTGATTTTGATATGCTTTGCGCCGTCAAAGTTTTGAACGATGAGATTTCAAGTCTTAAAGCTATATCAGAGGGCTAAGCATGTGTATTCTGAATCTTTAAGAGTCTTGAAGGCTGTGAAATTAATGACTACAGCGAGCTTTACTGCCGACGAAGACTTTTTCAAGCAATTTGGTGCCTTGATGAACGAGTCTCAAGCTTCTTGCGATAAACTTTACGAATGTTCTTGTCCAGAGATTGACAAAATTTGTTCCATTGCTTTGTCAAATGGATCATATGGTTCCCGTTTGACCGGAGCTGGCTGGGGTGGTTGTACTGTTCACTTGGTTCCAGGGGGCCCAAATGGCAACATAGAAAAGGTAAAAGAAGCCCTTGCCAATGAGTTCTACAAGGTCAAGTACCCTAAGATCACTGATGCTGAGCTAGAAAATGCTATCATCGTCTCTAAACCAGCATTGGGCAGCTGTCTATATGAATTAGTCAAGTATACTTCTTTTTTTTACTTTGTTCAGAACAACTTCTCATTTTTTTCTACTCATAACTTTAGCATCACAAAATACGCAATAATAACGAGTAGTAACACTTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCACAAACTTTAAAACACAGGGACAAAATTCTTGATATGCTTTCAACCGCTGCGTTTTGGATACCTATTCTTGACATGATATGACTACCATTTTGTTATTGTACGTGGGGCAGTTGACGTCTTATCATATGTCAAAGTTGCGAAGTTCTTGGCAAGTTGCCAACTGACGAGATGCAGTAACACTTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCACAAACTTTAAAACACAGGGACAAAATTCTTGATATGCTTTCAACCGCTGCGTTTTGGATACCTATTCTTGACATGATATGACTACCATTTTGTTATTGTACGTGGGGCAGTTGACGTCTTATCATATGTCAAAGTCATTTGCGAAGTTCTTGGCAAGTTGCCAACTGACGAGATGCAGTTTCCTACGCATAA
TAAGAATAGGAGGGAATATCAAGCCAGACAATCTATCATTACATTTAAGCGGCTCTTCAAAAAGATTGAACTCTCGCCAACTTATGGAATCTTCCAATGAGACCTTTGCGCCAAATAATGTGGATTTGGAAAAAGAGTATAAGTCATCTCAGAGTAATATAACTACCGAAGTTTATGAGGCATCGAGCTTTGAAGAAAAAGTAAGCTCAGAAAAACCTCAATACAGCTCATTCTGGAAGAAAATCTATTATGAATATGTGGTCGTTGACAAATCAATCTTGGGTGTTTCTATTCTGGATTCATTTATGTACAACCAGGACTTGAAGCCCGTCGAAAAAGAAAGGCGGGTTTGGTCCTGGTACAATTATTGTTACTTCTGGCTTGCTGAATGTTTCAATATCAACACTTGGCAAATTGCAGCTACAGGTCTACAACTGGGTCTAAATTGGTGGCAGTGTTGGATAACAATTTGGATTGGGTACGGTTTCGTTGGTGCTTTTGTTGTTTTGGCCTCTAGAGTTGGATCTGCTTATCATTTGTCATTCCCTATATCATCTAGAGCATCATTCGGTATTTTCTTCTCTTTATGGCCCGTTATTAACAGAGTCGTCATGGCCATCGTTTGGTATAGTGTCCAAGCTTATATTGCGGCAACTCCCGTATCATTAATGCTGAAATCTATCTTTGGAAAAGATTTACAATGATTGTACGTGGGGCAGTTGACGTCTTATCATATGTCAAAGTCATTTGCGAAGTTCTTGGCAAGTTGCCAACTGACGAGATGCAGTAACACTTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCACAAACTTTAAAACACAGGGACAAAATTCTTGATATGCTTTCAACCGCTGCGTTTTGGATACCTATTCTTGACATGATATGACTACCATTTTGTTATTGTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATAAAG


Page 4: Multiple Sequence Alignment

Terminology

Orthologs: Genes related via speciation (e.g. C, M, H3)
Paralogs: Genes related through duplication (e.g. H1, H2, H3)
Homologs: Genes that share a common origin (e.g. C, M, H1, H2, H3)

[Figure: a species tree and a gene tree descending from a single ancestral gene, with speciation, duplication, and loss events marked]
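
The terminology on this slide can be sketched in code: two genes are orthologs if their last common ancestor in the gene tree is a speciation event, and paralogs if it is a duplication. The tree below is hypothetical (the slide's figure is not recoverable from the transcript); it was chosen only so that the result agrees with the slide's examples, and the names `TREE`, `leaves`, and `relation` are illustrative.

```python
# Hypothetical gene tree: internal nodes are (event, left, right), leaves are gene names.
# An old duplication gives two copies; one copy was lost in the C and M lineages.
TREE = ("duplication",
        ("speciation", "C", ("speciation", "M", "H3")),
        ("duplication", "H1", "H2"))

def leaves(node):
    """Return the set of gene names under a node."""
    if isinstance(node, str):
        return {node}
    _, left, right = node
    return leaves(left) | leaves(right)

def relation(node, a, b):
    """Classify two distinct genes by the event at their last common ancestor."""
    event, left, right = node
    for child in (left, right):
        # Descend while a single child subtree still contains both genes.
        if not isinstance(child, str) and {a, b} <= leaves(child):
            return relation(child, a, b)
    return "orthologs" if event == "speciation" else "paralogs"

print(relation(TREE, "C", "H3"))   # orthologs
print(relation(TREE, "H1", "H3"))  # paralogs
```

All five genes are homologs, since they share the single ancestral gene at the root.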

Page 5: Multiple Sequence Alignment


Chains join together related local alignments

Protease Regulatory Subunit 3

likely ortholog

likely paralogs

shared domain?
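
How chains join related local alignments can be sketched as a small dynamic program: each local alignment block may extend the best chain ending at an earlier block, provided the two are in the same order in both genomes. This is a simplified illustration, not the scoring UCSC's chaining uses; the linear `gap_cost`, the block tuple layout, and the function name `best_chain` are assumptions.

```python
def best_chain(blocks, gap_cost=1):
    """Find the highest-scoring colinear chain of local alignment blocks.

    blocks: list of (t_start, t_end, q_start, q_end, score), all on one strand.
    Returns (chain_score, list_of_blocks_in_chain).
    """
    if not blocks:
        return 0, []
    blocks = sorted(blocks)                # order by target start
    best = [b[4] for b in blocks]          # best chain score ending at each block
    prev = [None] * len(blocks)            # back-pointer to the previous block
    for i, (ts, te, qs, qe, score) in enumerate(blocks):
        for j in range(i):
            _, pte, _, pqe, _ = blocks[j]
            if pte <= ts and pqe <= qs:    # j precedes i in both genomes
                gap = (ts - pte) + (qs - pqe)
                candidate = best[j] + score - gap_cost * gap
                if candidate > best[i]:
                    best[i], prev[i] = candidate, j
    i = max(range(len(blocks)), key=best.__getitem__)
    chain_score, chain = best[i], []
    while i is not None:                   # walk the back-pointers
        chain.append(blocks[i])
        i = prev[i]
    return chain_score, chain[::-1]

A = (0, 10, 0, 10, 100)
B = (20, 30, 22, 32, 100)
C = (15, 25, 100, 110, 80)   # far away in the query: a poor fit for this chain
D = (40, 50, 42, 52, 100)
print(best_chain([A, B, C, D]))  # score 258, chain A, B, D
```

Block C stays out of the chain because including it would break colinearity with B and D.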

Page 6: Multiple Sequence Alignment


Before and After Chaining

Page 7: Multiple Sequence Alignment

Netting Alignments

Commonly, multiple mouse alignments can be found for a particular human region, including for most coding regions.

Net finds the best mouse match for each human region.
Highest scoring chains are used first.
Lower scoring chains fill in gaps within chains, inducing a natural hierarchy.
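
The netting idea above (highest scoring chains first, lower scoring chains filling the gaps) can be sketched as a greedy assignment of target bases. This is a per-base simplification under stated assumptions, not the actual netter: real nets work on whole chains and gaps and track nesting levels. The tuple layout and the name `net_coverage` are illustrative.

```python
def net_coverage(chains, target_len):
    """Assign each target base to at most one chain, best chains first.

    chains: list of (name, score, blocks), where blocks are (start, end)
    half-open aligned intervals on the target.
    Returns a list with the owning chain name (or None) for every target base.
    """
    owner = [None] * target_len
    for name, _score, blocks in sorted(chains, key=lambda c: -c[1]):
        for start, end in blocks:
            for pos in range(start, end):
                if owner[pos] is None:     # single coverage on the target
                    owner[pos] = name
    return owner

chains = [
    ("B", 50, [(2, 8)]),             # lower score: only fills what A leaves open
    ("A", 100, [(0, 3), (7, 10)]),   # top-level chain with a gap from 3 to 7
]
print("".join(net_coverage(chains, 10)))  # AAABBBBAAA
```

Chain B ends up nested inside the gap of chain A, which is the hierarchy the slide describes.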

Page 8: Multiple Sequence Alignment


Net highlights rearrangements

A large gap in the top level of the net is filled by an inversion containing two genes. Numerous smaller gaps are filled in by local duplications and processed pseudo-genes.

Page 9: Multiple Sequence Alignment


Nets attempt to computationally capture orthologs

(they also hide everything else)

Page 10: Multiple Sequence Alignment


Nets/chains can reveal retrogenes (and when they jumped in!)

Page 11: Multiple Sequence Alignment


Nets

• a net is a hierarchical collection of chains, with the highest-scoring non-overlapping chains on top, and their gaps filled in where possible by lower-scoring chains, for several levels.

• a net is single-coverage for target but not for query.

• because it's single-coverage in the target, it's no longer symmetrical.

• the netter has two outputs, one of which we usually ignore: the target-centric net in query coordinates. The reciprocal best process uses that output: the query-referenced (but target-centric / target single-cov) net is turned back into component chains, and then those are netted to get single coverage in the query too; the two outputs of that netting are reciprocal-best in query and target coords. Reciprocal-best nets are symmetrical again.

• nets do a good job of filtering out massive pileups by collapsing them down to (usually) a single level.

• GB: for human inspection always prefer looking at the chains!

[Angie Hinrichs, UCSC wiki]

Page 12: Multiple Sequence Alignment


Before and After Netting

Page 13: Multiple Sequence Alignment

Convert / LiftOver

"LiftOver chains" are actually chains extracted from nets, or chains filtered by the netting process.

LiftOver – batch utility
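
Conceptually, lifting a coordinate means finding the aligned block of a chain that contains it and applying that block's offset. The sketch below shows only this idea; it is not the LiftOver utility or its file format, it ignores strand, and the block layout `(t_start, q_start, size)` and the name `lift` are assumptions made for illustration.

```python
def lift(pos, blocks):
    """Map a target position to query coordinates through ungapped blocks.

    blocks: list of (t_start, q_start, size) aligned blocks of one chain.
    Returns the query position, or None if pos falls in a gap of the chain.
    """
    for t_start, q_start, size in blocks:
        if t_start <= pos < t_start + size:
            return q_start + (pos - t_start)
    return None

chain_blocks = [(100, 500, 50), (160, 560, 40)]   # hypothetical chain
print(lift(120, chain_blocks))  # 520
print(lift(155, chain_blocks))  # None: position lies between blocks
```

Positions that fall in chain gaps have no counterpart in the other assembly, which is why a batch conversion can report some coordinates as unmapped.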

Page 14: Multiple Sequence Alignment

Drawbacks

• Inversions not handled optimally

Chains:
> > > > chr1 > > >
> > > > chr1 > > >
< < < < chr1 < < < <
< < < < chr5 < < < <

Nets:
> > > > chr1 > > >
> > > > chr1 > > >
< < < < chr5 < < < <

Page 15: Multiple Sequence Alignment


What nets can’t show, but chains will

Page 16: Multiple Sequence Alignment

Same Region…

same in all the other fish

Page 17: Multiple Sequence Alignment

Drawbacks

• High copy number genes can break orthology


Page 18: Multiple Sequence Alignment

Gene Families


Page 19: Multiple Sequence Alignment

Self Chain reveals (some) paralogs

(self net is meaningless)

Page 20: Multiple Sequence Alignment

The Biggest Challenge in Genomics…

… is computational:

How does this encode this

[Figure labels: Program, Output]

Page 21: Multiple Sequence Alignment


Xkcd Take – It’s Actually Not That Bad


Page 22: Multiple Sequence Alignment


Why compare to Chimp?

Page 23: Multiple Sequence Alignment

Humans and Chimpanzees Possess Many Vastly Different Phenotypes

[Figure panels: A: Chimp, B: Human]

[Varki, A. and Altheide, T., Genome Res., 2005]

Page 24: Multiple Sequence Alignment


Disease Susceptibility Differences

Page 25: Multiple Sequence Alignment


What human-chimp changes do we find?

Small

Large

Medium

Page 26: Multiple Sequence Alignment


Large differences

Fusion (HSA 2) 18 pericentromeric inversions

Page 27: Multiple Sequence Alignment

Medium Sized Differences

Gene families expand and contract

Mobile element insertion and mediated deletion

Page 28: Multiple Sequence Alignment


Small Differences

1% difference at the base level

Page 29: Multiple Sequence Alignment

Genetic basis of human phenotypes?

[Figure labels: Phenotype, Genotype. Axis label: Number of rearrangements]

Most mutations are near/neutral.
How do we know? 4D (fourfold degenerate) sites, ARs (ancestral repeats).

Page 30: Multiple Sequence Alignment

The Genotype - Phenotype divide


Can we find evolutionary patterns that are distinct enough to be phenotypically revealing?

Species A

Species B

Problem #1:

Too many nucleotide changes between any pair of related species (or individuals).

The vast majority of these are near/neutral.

Page 31: Multiple Sequence Alignment

Is it in our protein coding genes?

70-80% of all human-chimp orthologous proteins differ.
On average they differ by 1-2 amino acids.
• Which amino acid changes matter?
• One can also compare non-synonymous amino acid substitutions with synonymous changes, and look for proteins unusually enriched for the former. Those may be evolving under positive selection.
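
As a rough illustration of comparing non-synonymous with synonymous changes, the sketch below counts codon differences of each kind between two aligned, in-frame coding sequences using the standard genetic code. It is only a raw count: a proper analysis also normalizes by the number of synonymous and non-synonymous sites and corrects for multiple substitutions. The function name and example sequences are hypothetical.

```python
# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

def count_codon_changes(seq1, seq2):
    """Return (synonymous, non_synonymous) counts of differing codons."""
    syn = nonsyn = 0
    for i in range(0, min(len(seq1), len(seq2)) - 2, 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        # Skip identical codons and codons with gaps or ambiguous bases.
        if c1 == c2 or c1 not in CODON_TABLE or c2 not in CODON_TABLE:
            continue
        if CODON_TABLE[c1] == CODON_TABLE[c2]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn

# GCT -> GCC keeps alanine (synonymous); AAA -> AGA changes lysine to arginine.
print(count_codon_changes("ATGGCTAAA", "ATGGCCAGA"))  # (1, 1)
```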

Page 32: Multiple Sequence Alignment


Positive and negative gene selection in the human genome

Page 33: Multiple Sequence Alignment


Candidate genes for human specific evolution

...

Page 34: Multiple Sequence Alignment

What if we did an unbiased search?

Human-specific substitutions in conserved sequences

[Figure labels: Chimp: conserved; Human: rapid change]

HAR1:
• Novel ncRNA
• 18 unique human substitutions

[Pollard, K. et al., Nature, 2006] [Beniaminov, A. et al., RNA, 2008]
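
A human-specific substitution in a conserved sequence can be pictured as an alignment column where all other species agree and only human differs. The sketch below counts such columns; it is a toy illustration with made-up sequences, not the statistical test used in the cited papers, and the function name is hypothetical.

```python
def human_specific_substitutions(human, others):
    """Count columns where every non-human sequence agrees and human differs.

    human: aligned human sequence; others: list of aligned sequences from
    other species, all of the same length as human.
    """
    count = 0
    for i, h in enumerate(human):
        column = {seq[i] for seq in others}
        conserved = len(column) == 1 and "-" not in column
        if conserved and h != "-" and h not in column:
            count += 1
    return count

human   = "ACGTTGCA"
chimp   = "ACGATGCA"
mouse   = "ACGATGCA"
chicken = "ACGATGTA"   # differs at one column, so that column is not conserved
print(human_specific_substitutions(human, [chimp, mouse, chicken]))  # 1
```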

Page 35: Multiple Sequence Alignment

Different Unbiased Search: Loss vs Gain

Human Accelerated Regions
[Figure labels: Chimp: conserved; Human: rapid change]
• 4-18 unique human substitutions
• Pollard, K. et al., Nature, 2006
• Prabhakar, S. et al., Science, 2008

Human Conserved Sequence Deletions (hCONDELs)
[Figure labels: Chimp: conserved; Human: deleted!]
• Complete human loss of sequence
• Likely to confer human-specific phenotypes

[McLean, Reno, Pollen et al., Nature, 2011]

Page 36: Multiple Sequence Alignment

Identifying hCONDELs

[Figure labels: Chimp: conserved; Human: deleted!]
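
The core of the hCONDEL definition (conserved in chimp, missing in human) can be sketched as an interval filter: keep conserved chimp elements that overlap no human-aligned block. This is a minimal sketch under stated assumptions, not the published pipeline, which the transcript does not describe; the coordinates and the name `candidate_deletions` are hypothetical.

```python
def candidate_deletions(conserved, human_aligned):
    """Return conserved chimp intervals with no overlapping human alignment.

    conserved, human_aligned: lists of half-open (start, end) intervals,
    both in chimp coordinates on the same chromosome.
    """
    return [(s, e) for s, e in conserved
            if not any(a_s < e and s < a_e for a_s, a_e in human_aligned)]

conserved_elements = [(100, 200), (500, 650), (900, 950)]
human_blocks = [(0, 300), (700, 1000)]   # human aligns here, nothing in between
print(candidate_deletions(conserved_elements, human_blocks))  # [(500, 650)]
```

As later slides note for forward genomics, assembly gaps and low quality sequence can mimic such losses, so candidates like these would still need to be checked against them.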

Page 37: Multiple Sequence Alignment

hCONDEL genomic distribution

• Median size: 2.8kb
• Not enriched in highly variable genomic regions
• Most do not disrupt proteins: only 1 validated exonic deletion

Page 38: Multiple Sequence Alignment

Deletions of functional non-coding DNA

[Figure: rows of genes with conserved elements between them; some conserved elements are marked ( ) as hCONDELs. Legend: Gene with function (e.g. “neuronal gene”); Gene without function; hCONDEL; Conserved element]

[McLean et al., Nat. Biotechnol., 2010]

http://great.stanford.edu
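
One way to ask whether deletions land near genes of a given function more often than expected is a binomial test over genomic regions: if genes with the function control a fraction p of the genome, how surprising is it that k of n elements fall there? The sketch below is a generic upper-tail binomial calculation with invented numbers; it is offered as an illustration of the idea, not as the exact procedure behind the slide.

```python
from math import comb

def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical inputs, chosen only for illustration.
n_elements = 500           # deleted elements tested
fraction_of_genome = 0.02  # share of the genome near genes with the function
hits = 25                  # elements that fall in that share (10 expected)
p_value = binomial_tail(hits, n_elements, fraction_of_genome)
print(p_value)
```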

Page 39: Multiple Sequence Alignment

Functional enrichments of hCONDELs

Ontology | Term | p-value
Gene Ontology | Steroid hormone receptor activity | 3.73 x 10^-4
InterPro | Fibronectin, type III | 1.01 x 10^-4
InterPro | Zinc finger, nuclear hormone receptor type | 1.80 x 10^-4
InterPro | CD80-like, immunoglobulin C2 set | 1.37 x 10^-3
Entrez Gene | Neuronal genes | 1.11 x 10^-4
Monoallelically-Expressed Genes | Monoallelic expression | 8.62 x 10^-3

These enrichments are unique to hCONDELs

http://great.stanford.edu

Page 40: Multiple Sequence Alignment

hCONDEL near Androgen Receptor

The deletion appears fixed in humans and appears deleted in Neandertal.

Page 41: Multiple Sequence Alignment

Androgen Receptor chimpanzee enhancer assay

[Phil Reno, David Kingsley]

Androgen Receptor

Human

Chimp

Genomic fragment Hsp68 promoter LacZ reporter gene


Page 42: Multiple Sequence Alignment

The human deletion near AR acts as an enhancer within known AR expression domains

[Figure: reporter expression panels. Rows: Chimp enhancer, Mouse enhancer. Panels: Sensory whiskers, Genital tubercle, Penile spines. Stages: E16.5, 8 weeks]

[Phil Reno, David Kingsley]

Page 43: Multiple Sequence Alignment

Androgen Receptor

[Figure labels: Cell, Nucleus, Testosterone, Androgen Receptor, AR+T dimer; Androgen Receptor locus in Human and Chimp]

Page 44: Multiple Sequence Alignment

Androgen responsiveness in domains of expression

[Figure panels: Sensory whiskers, Penile spines. Axis label: Sensory whisker length (mm). Species label: Galago]

Mice with Ar coding region mutations lack penile spines [Murakami, 1987]

[Dixson, 1976] [Ibrahim & Wright 1983]

Page 45: Multiple Sequence Alignment

Could sequence loss lead to tissue gain?

• hCONDELs enriched for suppressors of cell proliferation or cell migration expressed in cortex (P = 1.3 x 10^-3)

Non-human mammals: Suppress proliferation
Humans ( ): Do not suppress proliferation

Page 46: Multiple Sequence Alignment

The Genotype - Phenotype divide


Can we find evolutionary patterns that are distinct enough to be phenotypically revealing?

Species A

Species B

Problem #1:

Too many nucleotide changes between any pair of related species (or individuals).

The vast majority of these are near/neutral.

Page 47: Multiple Sequence Alignment

Genotype -> Phenotype screens

Define a “dramatic” (non-neutral) genomic scenario: hCONDEL

[Figure labels: Chimp: conserved; Human: deleted!]

[McLean, Pollen, Reno et al, 2011]

Problem #2:

What is the phenotype?

Page 48: Multiple Sequence Alignment

Testing is Exciting… and Humbling

These are “wild rides”: often not what we expected, often not what we can understand.
Are we looking at the right place?
Did we test at the right time?

[McLean, Pollen, Reno et al, 2011]

We are creating the humanized mice KOs

Page 49: Multiple Sequence Alignment

What about a tree of related species?


What if we could find evolutionary patterns that were distinct enough to be phenotypically revealing?

ancestor

Species A

Species H

Genomes: Inherited and Modified.

Traits: Come and Go.

Species B...

Page 50: Multiple Sequence Alignment

ancestral trait information

Trait information is no longer under selection

Erodes away over evolutionary time

ancestor

What happens when an ancestral trait “goes”?

Phenotype Genome


Page 51: Multiple Sequence Alignment

ancestral trait information

Trait information is no longer under selection

Erodes away over evolutionary time

ancestor

Phenotype Genome

A lot of DNA and many traits vary between any two species.

Page 52: Multiple Sequence Alignment

ancestral trait information

Trait information is no longer under selection

Erodes away over evolutionary time

ancestor

Phenotype Genome

A lot of DNA and many traits vary between any two species.

What about independent trait loss?

vitamin C synthesis, tail, body hair, dentition features, etc. etc.

Page 53: Multiple Sequence Alignment

ancestral trait information

Trait information is no longer under selection

Erodes away over evolutionary time

ancestor

Phenotype Genome


Page 54: Multiple Sequence Alignment


matches trait presence/absence pattern

The PG screen

[Hiller et al., 2012a]

Page 55: Multiple Sequence Alignment

The PG screen

Capture the independent genomic switch from purifying selection to neutral evolution in all and only the trait loss species.

Robust to:
• Different trait disabling times.
• Different trait disabling mutations.

Page 56: Multiple Sequence Alignment

Forward Genetics: Search for mutations that segregate with a trait of interest

Forward Genomics: Search for regions that are lost only in species lacking the trait

phenotype → genotype

Branding ;-)

But does it work?

Page 57: Multiple Sequence Alignment

Vitamin C Synthesis

synthesize vitamin C: rats & mice
cannot synthesize vitamin C: human

Page 58: Multiple Sequence Alignment

The Vitamin C synthesis “phenotree”

Vitamin C synthesis was lost 3-4 times independently in mammalian evolution.

Fwd Genomics asks: Do one or more genomic loci look like THAT?
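
What "lost independently" means on a phenotree can be sketched by counting the maximal clades in which every species lacks the trait, assuming the trait was present in the common ancestor and never regained. The tree below is a toy topology with a handful of species, not the slide's phenotree or an accurate phylogeny; names and trait values are illustrative.

```python
def count_losses(node, has_trait):
    """Return (number_of_losses, clade_entirely_lost) for the clade at node.

    node: a species name or a tuple of child nodes.
    """
    if isinstance(node, str):
        return 0, not has_trait[node]
    results = [count_losses(child, has_trait) for child in node]
    if all(lost for _, lost in results):
        return 0, True                      # counted once, higher up the tree
    # Each entirely-lost child clade is one independent loss event.
    return sum(n + (1 if lost else 0) for n, lost in results), False

def independent_losses(tree, has_trait):
    n, lost = count_losses(tree, has_trait)
    return 1 if lost else n

# Toy tree and trait table (illustrative only).
tree = ((("human", "chimp"), ("mouse", "guinea_pig")), ("bat", ("dog", "cow")))
has_trait = {"human": False, "chimp": False, "mouse": True,
             "guinea_pig": False, "bat": False, "dog": True, "cow": True}
print(independent_losses(tree, has_trait))  # 3
```

Human and chimp share a single loss, while guinea pig and bat each count separately.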

Page 59: Multiple Sequence Alignment

We quantify divergence by comparing sequences to the reconstructed ancestral sequence

Reconstruct the ancestral sequence from species 1, species 2, and an outgroup:

species 1  ACCCTATCGATTGCA
species 2  TCCGTATCG-TT-CA
outgroup   ACTCT-TCGATT-AA

The outgroup answers questions such as:
Mutation in species 1 or 2?
Insertion in species 1 or deletion in species 2?

Compare each species to the ancestor (percent of identical bases):

ancestor   ACCCTATCGATT-CA
species 1  ACCCTATCGATTGCA   14 identical bases   93%
species 2  TCCGTATCG-TT-CA   11 identical bases   79% (more diverged)
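
The percent-identity measure on this slide can be reproduced with a few lines of code. The gap handling is an assumption (columns that are gaps in both sequences are skipped, and a gap opposite a base counts as a difference); it was chosen because it yields the slide's counts of 14 and 11 identical bases. The function name is illustrative.

```python
def percent_identity(ancestor, species):
    """Percent of compared alignment columns where species matches the ancestor."""
    identical = compared = 0
    for a, s in zip(ancestor, species):
        if a == "-" and s == "-":
            continue                # nothing to compare in this column
        compared += 1
        if a == s:
            identical += 1
    return 100.0 * identical / compared

ancestor = "ACCCTATCGATT-CA"
species1 = "ACCCTATCGATTGCA"
species2 = "TCCGTATCG-TT-CA"
print(round(percent_identity(ancestor, species1)))  # 93
print(round(percent_identity(ancestor, species2)))  # 79
```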

Page 60: Multiple Sequence Alignment

Sequencing errors mimic divergence

ancestor   ACCCTATCGATT-CAATGG
species 1  ACCCTATCGATTGCAAGGG   89% identical bases
species 2  TCCGTAACG--T-CTATCG   61% identical bases

Species 2 has a high sequencing error rate (see the sequence quality scores): treat species 2 as missing data.

Page 61: Multiple Sequence Alignment

Assembly gaps mimic divergence

[Figure: a conserved region aligned across species 1 to 5; in species 1 the region falls in an assembly gap (?????????) between Sanger reads]

treat species 1 as missing data

Page 62: Multiple Sequence Alignment

...

Reconstruct the evolutionary history of all conserved regions, coding and non-coding

• Reconstruct ancestral sequence
• Measure extant species divergence
• Avoid
  • Low quality sequence
  • Assembly gaps
• Seek perfect phenotree match

matrix: 33 species x 544,549 conserved regions

[Figure: reconstruct ancestral locus; example percent identities of extant species: 85%, 70%, 93%]

Page 63: Multiple Sequence Alignment

We quantify the match to the vitamin C pattern by counting the number of species that violate the pattern

[Figure: two example loci on a percent identity axis (0 to 100), one with 1 violation and one with 2 violations]
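
Counting violating species can be sketched as follows. The transcript does not spell out the exact rule, so this uses one plausible definition as an assumption: the smallest number of species that must be ignored so that every trait-loss species has lower percent identity than every trait-preserving species. Species with missing data (`None`) are skipped. All identity values are invented.

```python
def count_violations(identity, has_trait):
    """Minimum number of species that break a clean loss/preserved separation.

    identity: species -> percent identity to the ancestor, or None if missing.
    has_trait: species -> True if the species still has the trait.
    """
    scored = {sp: pid for sp, pid in identity.items() if pid is not None}
    thresholds = sorted(set(scored.values())) + [float("inf")]
    best = len(scored)
    for t in thresholds:
        # Trait species should sit at or above t, trait-loss species below it.
        v = sum(1 for sp, pid in scored.items()
                if (has_trait[sp] and pid < t) or (not has_trait[sp] and pid >= t))
        best = min(best, v)
    return best

has_trait = {"mouse": True, "rat": True, "dog": True,
             "human": False, "guinea_pig": False, "bat": False}
perfect = {"mouse": 95, "rat": 93, "dog": 94, "human": 70, "guinea_pig": 65, "bat": 72}
one_off = dict(perfect, bat=96)      # a trait-loss species that looks conserved
print(count_violations(perfect, has_trait))  # 0
print(count_violations(one_off, has_trait))  # 1
```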

Page 64: Multiple Sequence Alignment

Regions matching the vitamin C trait are clustered

[Figure: the 544,549 conserved regions plotted by number of violating species, from 0 (perfect match) to 10 (no match)]

these conserved regions are all exons of a single gene

Page 65: Multiple Sequence Alignment

This gene is more diverged in all non-vitamin C synthesizing species


Page 66: Multiple Sequence Alignment

What is the function of this gene ?

Gulo - gulonolactone (L-) oxidase

encodes the enzyme responsible for vitamin C biosynthesis

[Figure: Vitamin C pattern; 33 genomes x 544,549 regions]

Note:
1. No likely shared disabling mutation.
2. We learned about both evolution and function.

Page 67: Multiple Sequence Alignment

The Power of Forward Genomics


Vitamin C pattern

Gulo - gulonolactone (L-) oxidase

33 genomes X 544,549 regions

Forward genomics works.
Can it work for continuous traits?
With only two independent losses?
And many unknown values?

Page 68: Multiple Sequence Alignment

Bile

Bile is a fluid produced by the liver that aids the digestion of lipids in the small intestine.

Page 69: Multiple Sequence Alignment

Bile Phospholipids


Different mammals have remarkably different levels of biliary phospholipids:

Page 70: Multiple Sequence Alignment

ABCB4 is a phospholipid transporter


Page 71: Multiple Sequence Alignment

Find “Cure” Models for Human Disease

Human ABCB4 mutations lower patient biliary phospholipid levels to guinea pig levels but are detrimental.

Our discovery: Guinea pig and horse have inactivated the Abcb4 gene in their natural state. How can they do it?

create KO gene: try to fix/treat

Natural KO: find nature’s cure!

Page 72: Multiple Sequence Alignment

Reverse Genomics

Reverse Genetics: Pick interesting loci, mutate and try to figure out phenotype/s

Reverse Genomics: Compute independent loss for ALL genomic loci, match to traits

phenotype ← genotype

We have now collected
• Million genomic loci by Fifty mammals
• Thousands of scored mammalian traits

And we are playing MATCH and TEST.
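
The "match" half of this idea can be sketched as a lookup: compute, for every locus, the set of species in which it looks lost, then report traits that are absent in exactly that set of species. This is a toy illustration with invented loci, traits, and species sets; a real screen would tolerate missing data and some violations, and the function name is hypothetical.

```python
def match_loci_to_traits(lost_in, absent_in):
    """Pair each locus with every trait absent in exactly the same species.

    lost_in: locus -> frozenset of species where the locus appears lost.
    absent_in: trait -> frozenset of species lacking the trait.
    """
    by_species_set = {}
    for trait, species in absent_in.items():
        by_species_set.setdefault(species, []).append(trait)
    return [(locus, trait)
            for locus, species in lost_in.items()
            for trait in by_species_set.get(species, [])]

# Hypothetical inputs.
lost_in = {
    "locus_1": frozenset({"human", "guinea_pig", "bat"}),
    "locus_2": frozenset({"horse", "guinea_pig"}),
    "locus_3": frozenset({"dog"}),
}
absent_in = {
    "trait_A": frozenset({"human", "guinea_pig", "bat"}),
    "trait_B": frozenset({"horse", "guinea_pig"}),
}
print(match_loci_to_traits(lost_in, absent_in))
# [('locus_1', 'trait_A'), ('locus_2', 'trait_B')]
```

Each reported pair is only a hypothesis to carry into the "test" step.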

Page 73: Multiple Sequence Alignment

Reverse Genomics of Enhancers


Page 74: Multiple Sequence Alignment

Back of an Envelope Wish


Page 75: Multiple Sequence Alignment

Poster Child Example


Page 76: Multiple Sequence Alignment


