Page 1: Sequence Analysis Fall ‘18 · Sequence Analysis Fall ‘18. Are you ready for this course? •Computer Science prerequisites ... Title: lecture1_2018_tools Author: Chris Bystroff

Sequence Analysis Fall ‘18

Page 2:

Are you ready for this course?

• Computer Science prerequisites
  Can you program?
  Can you write pseudocode?
  Can you download, install, and run apps?

• Biology prerequisites
  Do you know the central dogma of biology?
  Have you memorized the amino acids and their properties?
  Do you know how evolution works?

Page 3:

You need this:

Zvelebil & Baum = Z&B

Page 4:

How Biologists Think

• Spatio-temporo-chemical phenomena
• Experimental methods/design
• Macromolecular structure
• Evolution

Page 5:

How Computer Scientists Think

• Convert desired input/output to a set of Data Structures and Instructions
• Query a database.
• Simulate.
• Optimize.

http://shaunskeen.com/

Page 6:

How a computational biologist thinks

Biology side of brain:
• Understands biological systems.
• Designs experiments to produce new data.
• Compares predictions to experimental results.

CS side of brain:
• Converts biological systems into data structures.
• Converts hypotheses into instructions.
• Writes database queries.

The two sides are linked by modeling and validating.

Page 7:

The Scientific Process -- DIPA

                      Biology               Computer Science
• Data generation     experiments           filling databases
• Interpretation      Informal models       Formal models
• Prediction          Outcomes              Simulations
• Action              Experimental design   Obtain more data

Page 8:

An example of thinking like a computational biologist: The Evolution of Chromosomes ...

Page 9:

Syntenic groups: genes that stay together, generally for survival

Trp operon genes in bacteria

Page 10:

Large scale evolutionary changes in chromosomes...

• Inversions
  A syntenic group appears on the opposite strand, different location.

• Duplications
  “...the most important evolutionary force since the emergence of the universal common ancestor.” -- Susumu Ohno

• Transpositions
  A syntenic group appears on a different chromosome.

Page 11:

...have led to chromosomes that are mosaics of other species' chromosomes

• Mouse and human chromosomes have the same content, but gene locations are scrambled!

NOTE: no transpositions from/to X, Y chromosomes!

Page 12:

How?

Let's look at meiosis, Prophase I

http://www.phschool.com/science/biology_place/labbench/lab3/concepts2.html

Page 13:

Chiasmata: where crossovers occur

http://ib.bioninja.com.au/higher-level/topic-10-genetics-and-evolu/101-meiosis/chiasmata.html

Page 14:

Normal prophase 1

non-sister chromatids synapse, sometimes cross over

alleles swapped, gene order conserved

[Figure: chromatids A B C D and a b c d pair up, a tetrad forms, and crossovers (x) exchange segments.]

...then, tetrads line up (metaphase I), separate (anaphase I)

Page 15:

Abnormal prophase 1

illegitimate synapses form, sometimes cross over

alleles swapped, genes conserved, order reversed

tetrad forms, something goes wrong.... it is backwards.

[Figure: chromatid A B C D paired with chromatid a b c d in reversed orientation; crossovers marked x.]

...then, tetrads line up (metaphase I), separate (anaphase I)

Page 16:

Normal prophase 1:
A B C D + a b c d => A b c D + a B C d

Abnormal prophase 1:
A B C D + a b c d => A c b D + a C B d
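The two outcomes above can be reproduced with plain list operations. This is a minimal sketch, assuming each chromatid is a list of allele labels and that a crossover exchanges the segment between two cut points; the function name `crossover` and its `inverted` flag are illustrative and do not come from the lecture.

```python
def crossover(chrom1, chrom2, start, end, inverted=False):
    """Swap the segment [start:end) between two homologous chromatids.

    With inverted=True the exchanged segments are also reversed,
    mimicking the abnormal (backwards) pairing.
    """
    seg1, seg2 = chrom1[start:end], chrom2[start:end]
    if inverted:
        seg1, seg2 = seg1[::-1], seg2[::-1]
    new1 = chrom1[:start] + seg2 + chrom1[end:]
    new2 = chrom2[:start] + seg1 + chrom2[end:]
    return new1, new2


upper = ["A", "B", "C", "D"]
lower = ["a", "b", "c", "d"]

normal = crossover(upper, lower, 1, 3)
abnormal = crossover(upper, lower, 1, 3, inverted=True)
print(normal)    # (['A', 'b', 'c', 'D'], ['a', 'B', 'C', 'd'])
print(abnormal)  # (['A', 'c', 'b', 'D'], ['a', 'C', 'B', 'd'])
```

In both cases the same alleles are exchanged; only the abnormal case changes gene order.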

Page 17:

..or maybe it happens this way.

legitimate synapses form, but due to a loop, they get too close, crossovers are swapped.

[Figure: chromatid A B C D paired with a looped chromatid a b c d; crossovers marked x.]

same result

Page 18:

Accumulation of inversions over (how much?) evolutionary time

Mouse chromosomes colored according to homology to human chromosomes.

Page 19:

rug rat mouse rat

Can we map our ancestry using inversions?

Page 20:

rug rat mouse rat

How many inversions: mouse-to-human, rat-to-human, rat-to-mouse?

How many inversions have happened? How do we count them? The answer tells us the evolutionary distance.

Page 21:

How many inversions is this?

Page 22:

The way a computational biologist thinks of the problem of counting inversions.

Page 23:

The Pancake Flipping Problem

A sloppy cook at a pancake diner makes pancakes of all different sizes and stacks them haphazardly.

A meticulous waiter likes the pancakes to be stacked with the largest on the bottom and the smallest on top. On the way to the table, using only one hand with a spatula, he flips the pancakes until they are arranged by size, largest on bottom, smallest on top.

•What is the algorithm for flipping?

•What is the algorithm for finding the fewest flips?
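One possible answer to the first question is sketched below. It assumes the stack is a list of sizes with index 0 at the top, and that one flip reverses the top k pancakes (the one-spatula move). The names `flip` and `pancake_sort` are illustrative. This simple strategy uses at most two flips per pancake; it does not, in general, find the fewest flips.

```python
def flip(stack, k):
    """Flip the top k pancakes (index 0 is the top of the stack)."""
    return stack[:k][::-1] + stack[k:]


def pancake_sort(stack):
    """Sort using prefix flips only; returns (sorted_stack, flip_sizes)."""
    stack = list(stack)
    flips = []
    for size in range(len(stack), 1, -1):
        # Largest pancake among the top `size` that are still unsorted.
        biggest = stack.index(max(stack[:size]))
        if biggest == size - 1:
            continue  # already in its final position
        if biggest != 0:
            stack = flip(stack, biggest + 1)  # bring it to the top
            flips.append(biggest + 1)
        stack = flip(stack, size)  # send it down to position size - 1
        flips.append(size)
    return stack, flips


result, flips = pancake_sort([6, 4, 2, 3, 1, 5])
print(result, flips)
```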

Page 24:

In class exercise:

Given the arrangement below, flip the pancakes until they are in order. How many flips? (You can order the numbers instead of the pancakes.)

642315 ---> 123456
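For a stack this small, the fewest flips can be found by brute force. The sketch below is a breadth-first search over all stacks reachable by flipping the top k pancakes; it assumes the same one-spatula move as the problem statement, and the name `min_flips` is illustrative. It is only practical for short stacks, since the number of arrangements grows factorially.

```python
from collections import deque


def min_flips(start):
    """Breadth-first search over prefix flips; returns a shortest list of flip sizes."""
    start = tuple(start)
    goal = tuple(sorted(start))
    parent = {start: None}  # state -> (previous state, flip size used)
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            break
        for k in range(2, len(state) + 1):
            nxt = state[:k][::-1] + state[k:]
            if nxt not in parent:
                parent[nxt] = (state, k)
                queue.append(nxt)
    # Walk back from the goal to the start to recover the flips.
    path = []
    state = goal
    while parent[state] is not None:
        state, k = parent[state]
        path.append(k)
    return path[::-1]


print(min_flips([6, 4, 2, 3, 1, 5]))
```

Each number printed is the count of pancakes lifted by the spatula for that flip.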

Page 25:

In class exercise: work in pairs

•Write detailed instructions on how to stack six pancakes in order by flipping. The instructions should not depend on the starting order.

•Generate a random pancake order. Apply your algorithm. 125436 ---> ... ---> ... ---> 123456

•Pick a volunteer: Write pseudocode on the board.

Page 26:

Flipping changes chain direction

Before flipping:
5’ TATTAGCCCGTGACAAACTAAGCCTATG 3’
3’ ATAATCGGGCACTGTTTGATTCGGATAC 5’

After flipping the middle segment:
5’ TATTAGCCTAGTTTGTCACGAGCCTATG 3’
3’ ATAATCGGATCAAACAGTGCTCGGATAC 5’

If we make the reverse complement negative numbers, 1 2 3 4 becomes 1 -3 -2 4.
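The flip can be checked directly on the sequence above. This is a minimal sketch, assuming 0-based half-open indices and an uppercase ACGT alphabet; the helper names `invert_segment` and `signed_reversal` are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


def invert_segment(seq, start, end):
    """Replace seq[start:end] by its reverse complement."""
    return seq[:start] + reverse_complement(seq[start:end]) + seq[end:]


def signed_reversal(blocks, start, end):
    """Reverse blocks[start:end] and negate each block, e.g. 2 3 -> -3 -2."""
    return blocks[:start] + [-b for b in reversed(blocks[start:end])] + blocks[end:]


top = "TATTAGCCCGTGACAAACTAAGCCTATG"
print(invert_segment(top, 8, 20))           # TATTAGCCTAGTTTGTCACGAGCCTATG
print(signed_reversal([1, 2, 3, 4], 1, 3))  # [1, -3, -2, 4]
```

The signed integers carry the same information as the sequence flip: order records position and sign records strand.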

Page 27:

Evolution by reversals.

“-” indicates reverse complement.

ancestral mammal:
 1  2  3  4  5  6  7  8  9

One lineage:
-4 -3 -2 -1  5  6  7  8  9
-4 -3 -6 -5  1  2  7  8  9
-4 -3  5  6  1  2  7  8  9
-4 -3  5  6 -9 -8 -7 -2 -1

The other lineage:
 1  2  3  4 -9 -8 -7 -6 -5
 1  2  3  7  8  9 -4 -6 -5
 1  2  3  7  8  9  6  4 -5
-8 -7 -3 -2 -1  9  6  4 -5

The two lineages end at human and mouse.
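Each step in such a lineage should be exactly one signed reversal. The sketch below checks this by trying every segment; the function `find_reversal` is illustrative, and the list `lineage` is one of the two lineages from the slide, typed in with explicit spacing.

```python
def signed_reversal(blocks, start, end):
    """Reverse blocks[start:end] and negate each block."""
    return blocks[:start] + [-b for b in reversed(blocks[start:end])] + blocks[end:]


def find_reversal(before, after):
    """Return (start, end) if one signed reversal turns before into after, else None."""
    n = len(before)
    for start in range(n):
        for end in range(start + 1, n + 1):
            if signed_reversal(before, start, end) == after:
                return start, end
    return None


lineage = [
    [1, 2, 3, 4, 5, 6, 7, 8, 9],
    [-4, -3, -2, -1, 5, 6, 7, 8, 9],
    [-4, -3, -6, -5, 1, 2, 7, 8, 9],
    [-4, -3, 5, 6, 1, 2, 7, 8, 9],
    [-4, -3, 5, 6, -9, -8, -7, -2, -1],
]
steps = [find_reversal(a, b) for a, b in zip(lineage, lineage[1:])]
print(steps)  # [(0, 4), (2, 6), (2, 4), (4, 9)]
```

Indices are 0-based and half-open, so `(0, 4)` means the first four blocks were reversed.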

Page 28:

Pancakes-burned-on-one-side Problem

A sloppy cook at a pancake diner makes pancakes of all different sizes and stacks them haphazardly. He’s also a bad cook. All of the pancakes are burned on one side.

The waiter likes the pancakes to be stacked with the largest on the bottom and the smallest on top and with all of the burned sides down!

He has two spatulas and can flip any number of adjacent pancakes at a time.

•How does he arrange the pancakes in the fewest flips?

Page 29:

In class exercise, part 2:

•Write instructions for the Two-Spatula-Pancakes-burned-on-one-side Problem. When flipped, pancakes change order and sign. Flip any segment.

•Generate a random pancake arrangement. Apply your algorithm 1 -2 5 4 -3 6 ---> ... ---> ... ---> 123456
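For six pancakes, a shortest sequence of flips can be found by exhaustive search. The sketch below is a breadth-first search in which one move reverses any segment and negates its signs. It assumes the input is a signed arrangement of 1..n, and the name `sort_by_reversals` is illustrative. It is a brute-force check for small stacks, not an efficient algorithm.

```python
from collections import deque


def sort_by_reversals(start):
    """Breadth-first search over signed reversals of any segment.

    Returns a shortest list of arrangements from start to 1..n.
    Assumes start is a signed permutation of 1..n; small n only.
    """
    start = tuple(start)
    n = len(start)
    goal = tuple(range(1, n + 1))
    parent = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            break
        for i in range(n):
            for j in range(i + 1, n + 1):
                flipped = tuple(-x for x in reversed(state[i:j]))
                nxt = state[:i] + flipped + state[j:]
                if nxt not in parent:
                    parent[nxt] = state
                    queue.append(nxt)
    # Walk back from the goal to the start.
    path = [goal]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path[::-1]


for arrangement in sort_by_reversals([1, -2, 5, 4, -3, 6]):
    print(arrangement)
```

The number of flips is one less than the number of arrangements printed.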

Page 30:

"Dot plot"

A data structure for comparing sequences

sequence 1: 1 2 3 4 5 6 7 8 9
sequence 2: 2 9 5 4 6 8 7 1 3

Place a dot where sequences are identical

Data structure!

Page 31:

Simpler dot plot: line segments

1 2 3 4 5 6 7 8 9

9 5 4 6 7 8 1 2 3

If dots are bases, lines are sequences. If dots are sequences, lines are syntenic groups.

Page 32:

+/- integers are direct/rc strands

Z&B p. 158

 1  2  3  4  5  6  7  8  9
-4 -3  5  6 -9 -8 -7 -2 -1

Each number represents a line segment

Each negative number represents its reverse complement

[Figure legend: direct alignment; reverse complement alignment]

Page 33:

Breakpoint graph decomposition algorithm for counting minimal genomic reversals in the mammalian X chromosome: human versus mouse.

Connect breakpoints
Decompose graph into cycles

P. Pevzner et al, Trends in Genetics. Volume 20, Issue 12, December 2004, Pages 631–639
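Before any graph is built, the breakpoints themselves are easy to count. The sketch below is not the cycle decomposition shown in the cited figure; it only counts breakpoints in a signed permutation and derives a simple lower bound on the number of reversals. It assumes the common convention of framing the permutation with 0 and n+1, and the function names are illustrative.

```python
def breakpoints(perm):
    """Count breakpoints in a signed permutation of 1..n.

    The permutation is framed with 0 and n+1; neighbours (a, b) form an
    adjacency when b - a == 1, otherwise they form a breakpoint.
    """
    framed = [0] + list(perm) + [len(perm) + 1]
    return sum(1 for a, b in zip(framed, framed[1:]) if b - a != 1)


def reversal_lower_bound(perm):
    """A reversal changes only two neighbour pairs, so it removes at most 2 breakpoints."""
    return (breakpoints(perm) + 1) // 2


print(breakpoints([1, 2, -7, -6, -5, -4, -3, 8, 9]))  # 2
print(reversal_lower_bound([1, -3, -2, 4]))            # 1
```

The bound is not always tight, which is one reason the cycle structure of the breakpoint graph is needed for an exact count.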

Page 34:

Breakpoint graph decomposition

1  2  3  4  5  6  7  8  9

1  2 -7 -6 -5 -4 -3  8  9

1  2 -7 -8  3  4  5  6  9

1 -3  8  7 -2  4  5  6  9

Page 35:

Breakpoint graph decomposition 1

1  2  3  4  5  6  7  8  9

1  2 -7 -6 -5 -4 -3  8  9

1  2 -7 -8  3  4  5  6  9

1 -3  8  7 -2  4  5  6  9

Number of inversions = number of cycles

Page 36:

Breakpoint graph decomposition 2

1  2  3  4  5  6  7  8  9

1  2 -7 -6 -5 -4 -3  8  9

1  2  6  7 -5 -4 -3  8  9

1  2  6  7 -5  4  3  8  9

Number of inversions = N/2 - 1, where N = number of vertices

Page 37:

Breakpoint graph decomposition

How many inversions?

P. Pevzner et al, Trends in Genetics. Volume 20, Issue 12, December 2004, Pages 631–639

Page 38:

How do I find the alignment matrix given the sequences?

Page 39:

A similarity matrix

... in its most basic form:

• Identical characters get score=1.
• Non-identical gets score=0.

    A C T G A A C C T
A   1 0 0 0 1 1 0 0 0
C   0 1 0 0 0 0 1 1 0
T   0 0 1 0 0 0 0 0 1
G   0 0 0 1 0 0 0 0 0
A   1 0 0 0 1 1 0 0 0
A   1 0 0 0 1 1 0 0 0
C   0 1 0 0 0 0 1 1 0
C   0 1 0 0 0 0 1 1 0
T   0 0 1 0 0 0 0 0 1
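The basic similarity matrix takes only a few lines to compute. This is a minimal sketch for the sequence on the slide compared against itself; the function name `similarity_matrix` is illustrative.

```python
def similarity_matrix(a, b):
    """S[i][j] = 1 if a[i] == b[j], else 0."""
    return [[1 if x == y else 0 for y in b] for x in a]


seq = "ACTGAACCT"
S = similarity_matrix(seq, seq)

print("  " + " ".join(seq))
for label, row in zip(seq, S):
    print(label + " " + " ".join(map(str, row)))
```

Rows correspond to the first sequence and columns to the second; more realistic scoring would replace the 1/0 rule with a substitution score.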

Page 40:

An alignment matrix.

1=aligned (associated)
0=not aligned

• Boolean matrix.
• Max one “1” per row.
• Max one “1” per column.

[Figure: 9 × 9 Boolean alignment matrix, ACTGAACCT versus ACTGAACCT.]

Page 41:

An alignment matrix.

Alignments may be “sequential” or "non-sequential". If the alignment is sequential, these rules apply:

• If A(i,j)==1, then A(m,n)=0 for all (m<i && n>j)
• If A(i,j)==1, then A(m,n)=0 for all (m>i && n<j)

i.e. SW and NE of “1” must be zero.

[Figure: 9 × 9 Boolean alignment matrix, ACTGAACCT versus ACTGAACCT.]
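The alignment-matrix rules translate directly into checks on a 0/1 matrix. This is a minimal sketch, assuming the matrix is a list of rows; the function names `is_alignment` and `is_sequential` are illustrative.

```python
def is_alignment(A):
    """Boolean matrix with at most one 1 per row and at most one 1 per column."""
    rows_ok = all(sum(row) <= 1 for row in A)
    cols_ok = all(sum(col) <= 1 for col in zip(*A))
    return rows_ok and cols_ok


def is_sequential(A):
    """A valid alignment in which no two 1s lie SW/NE of each other."""
    ones = [(i, j) for i, row in enumerate(A) for j, v in enumerate(row) if v]
    return is_alignment(A) and all(
        not ((m < i and n > j) or (m > i and n < j))
        for (i, j) in ones
        for (m, n) in ones
    )


diagonal = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
crossed = [[0, 1], [1, 0]]
print(is_sequential(diagonal))  # True
print(is_alignment(crossed), is_sequential(crossed))  # True False
```

The crossed example is a legal alignment matrix but is non-sequential, because its two 1s sit NE and SW of each other.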

Page 42:

A Dot Plot is a similarity matrix

Each position in the matrix D[i,j] is either

dot, if A[i] == B[j]

blank, otherwise.

Sequences: AAGACGTTTA and GACGTACT
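A text-mode dot plot for these two sequences can be sketched as follows. Which sequence goes on which axis is an assumption here (A as rows, B as columns), and the function name `dot_plot` is illustrative.

```python
def dot_plot(a, b):
    """Return text rows: one row per character of a, one column per character of b."""
    rows = ["  " + b]
    for x in a:
        rows.append(x + " " + "".join("*" if x == y else " " for y in b))
    return rows


A = "AAGACGTTTA"
B = "GACGTACT"
print("\n".join(dot_plot(A, B)))
```

Runs of `*` along a diagonal mark stretches where the two sequences agree position by position.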

Page 43:

Dot plot diagonals are alignments

To find short, unbroken alignments, we set window size and stringency and mark all dots that pass with a line.

Sequences: AAGACGTTTA and GACGTACT

Show all diagonals with at least 4 out of 5 matches.
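This filter can be sketched as a scan over every diagonal window. The code assumes `window` is the number of consecutive diagonal cells examined and `stringency` is the minimum number of matching cells among them; the function name `windowed_hits` is illustrative.

```python
def windowed_hits(a, b, window=5, stringency=4):
    """Start positions (i, j) of diagonal windows with at least `stringency` matches."""
    hits = []
    for i in range(len(a) - window + 1):
        for j in range(len(b) - window + 1):
            matches = sum(a[i + k] == b[j + k] for k in range(window))
            if matches >= stringency:
                hits.append((i, j))
    return hits


print(windowed_hits("AAGACGTTTA", "GACGTACT"))
```

Each reported pair is the upper-left corner of a window that passes, so a long alignment shows up as a run of consecutive pairs along one diagonal.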

"window size" is the length of a diagonal, "stringency" is minimum number of matches in the window.

Page 44:

Dot matrix with reverse complement

Sequences: AAGACGTTTA and GACGTACT

Base matches its complement

Reverse diagonal means inverse alignment

Now we have two types of dots
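The two kinds of dots can be kept in separate sets. This is a minimal sketch, assuming an uppercase ACGT alphabet; the function name `two_kinds_of_dots` and the short test sequences are illustrative and are not the ones on the slide.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def two_kinds_of_dots(a, b):
    """Classify cells: 'direct' if a[i] == b[j], 'complement' if a[i] pairs with b[j]."""
    direct, complement = set(), set()
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if x == y:
                direct.add((i, j))
            elif COMPLEMENT.get(x) == y:
                complement.add((i, j))
    return direct, complement


a = "AAGC"
b = "GCTT"  # reverse complement of a
direct, complement = two_kinds_of_dots(a, b)
anti_diagonal = {(i, len(b) - 1 - i) for i in range(len(a))}
print(anti_diagonal <= complement)  # True
```

When one sequence is the reverse complement of the other, the complement dots fill the reverse diagonal, which is the signature of an inverted alignment.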

Page 45:

Install UGENE

• ugene.net/
• Download UGENE manual.
• Follow instructions for installing UGENE on your system.

Page 46:

Take-home exercise 1: UGENE
Turn in Thurs. Sep 6

• Open FASTA/human_T1.fa (or download from "UGENE files" link)
• Right-click in sequence window. Select Analyze...Build Dotplot... Set x=y=human_T1. Direct repeats and inverted. Set minimum length = 50, 100% identity. Click OK.
• Navigate the dotplot window by zooming and scrolling.
• Find locus of the longest repeat in chromosome coordinates.
• Annotate it as a repeat unit, call it “longest repeat”.
• Next class (or by the end of class), turn in a small strip of paper with your name, the location of the longest direct or reverse complement repeat in the sequence, and the first 5 bases of the 5' end.

Page 47:

We're done. Review.

• How is genome rearrangement like flipping a stack of burnt pancakes?
• What is a similarity matrix?
• What is an alignment matrix?
• What is the significance of a row of dots in a dotplot?
• What is the first step of algorithm development?
  (a) define data structures? (b) define loop structures? (c) google stackoverflow?

See you next time.