
Date post: 17-Mar-2018
STAT 157 Homework #1 Problem #4 Jiejing Zhao, Mei Zhou, Yang Yang Liu, Robin Hestir, Henry Ye Li, Yi (Eve) Song Team 1
Transcript
Page 1: STAT 157 Homework #1, Problem #4 Part 2 (nolan/vigre/reports/Team1.pdf). Team 1: Jiejing Zhao, Mei Zhou, Yang Yang Liu, Robin Hestir, Henry Ye Li, Yi (Eve) Song

STAT 157 Homework #1

Problem #4

Jiejing Zhao, Mei Zhou, Yang Yang Liu, Robin Hestir, Henry Ye Li, Yi (Eve) Song

Team 1


Agenda

• Problem

• Method

– Random sampling

– Alignment

– Ordering reads

– Construction of contigs

• Summary of findings

• Comparison


Problem

Download the set of 40 sequence reads (8 nucleotides long each). Perform the sequence assembly using the graph greedy algorithm.

Suppose that you were told that reads 2 and 23* are from the same clone and around 70 bp apart. How would this affect your sequence assembly?

Now we sample 25 reads without replacement from the set of 40 reads. Perform the assembly again as above on the 25 reads. How does this affect the sizes of the assembled contigs and the number of contigs?


Method

• Alignment

• Ordering reads

• Construction of contigs

Find weights and their directions of overlap

Find reads to be used in contig construction (greedy algorithm)

Alignment of base pairs and linking of reads


Random sampling

Write code to select a random sample of 25 reads without replacement

Our PROBLEM !!!
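A minimal sketch of this sampling step in R. The 40 reads below are randomly generated stand-ins (an assumption, so that the example runs without the course data file), and the seed is arbitrary.

```r
set.seed(157)  # arbitrary seed, only so the sketch is reproducible

# Hypothetical stand-in data: 40 random reads of 8 nucleotides each
all_reads <- replicate(40, paste(sample(c("A", "C", "G", "T"), 8, replace = TRUE),
                                 collapse = ""))

# Draw 25 distinct positions, so no read is selected twice
picked <- sample(length(all_reads), 25, replace = FALSE)
sampled_reads <- all_reads[picked]
```

Sampling positions rather than the reads themselves keeps the draw without replacement even if two reads happen to have the same sequence.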


Alignment

Find the number of overlapping base pairs for each combination of reads in both directions.

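Example:

Read 1: ACTGGTTG
Read 2: TTGAGCAC

Read 1 followed by read 2 (overlap TTG):

ACTGGTTG
     TTGAGCAC

Weight: 3

Read 2 followed by read 1 (overlap AC):

TTGAGCAC
      ACTGGTTG

Weight: 2

Weight matrix for reads 1 and 2: the diagonal entries are 8 and the off-diagonal entries are 3 and 2.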
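The overlap weights in this example can be reproduced with a short R sketch. The function name `overlap` and the convention that `w[i, j]` holds the overlap between the end of read i and the start of read j are choices made for this illustration, not necessarily the team's own layout; the diagonal is left at 0 because a read is never linked to itself.

```r
# Length of the longest suffix of `a` that equals a prefix of `b`
overlap <- function(a, b) {
  max_len <- min(nchar(a), nchar(b))
  for (k in rev(seq_len(max_len))) {          # try the longest overlap first
    if (substring(a, nchar(a) - k + 1) == substring(b, 1, k)) return(k)
  }
  0                                           # no overlap at all
}

reads <- c("ACTGGTTG", "TTGAGCAC")            # the two reads from the example
n <- length(reads)
w <- matrix(0, n, n)
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    if (i != j) w[i, j] <- overlap(reads[i], reads[j])
  }
}
w
```

Here `w[1, 2]` is 3 (TTG) and `w[2, 1]` is 2 (AC), matching the two weights shown in the example.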


Ordering reads

Order the reads by decreasing weights. If a tie occurs, pick randomly.

Follow the greedy algorithm

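Example:

   W  S  E
1  8  2  3
2  8  3  2
3  7  8  5
4  7  6  5

W = Weight, S = Start, E = End

Randomly order reads w/ same weight

Reads used for contig construction: rows 1 and 3 (row 2 would link reads 2 and 3 a second time, and row 4 would give read 5 a second predecessor)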
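A sketch of the ordering and greedy selection in R, under stated assumptions: the `pairs` values are illustrative, the seed is arbitrary, and `greedy_select` is a hypothetical helper name. A pair is accepted only if its start read has no successor yet, its end read has no predecessor yet, and the link would not close a cycle.

```r
set.seed(1)  # arbitrary seed so the random tie-breaking is reproducible

# Illustrative candidate overlaps: read `start` is followed by read `end`
pairs <- data.frame(
  weight = c(7, 8, 7, 8),
  start  = c(8, 2, 6, 3),
  end    = c(5, 3, 5, 2)
)

# Decreasing weight first; a random secondary key shuffles ties
ordered <- pairs[order(-pairs$weight, runif(nrow(pairs))), ]

greedy_select <- function(ordered) {
  n_reads <- max(ordered$start, ordered$end)
  has_successor   <- rep(FALSE, n_reads)
  has_predecessor <- rep(FALSE, n_reads)
  chain <- seq_len(n_reads)  # chain label of each read, used to reject cycles
  keep <- logical(nrow(ordered))
  for (r in seq_len(nrow(ordered))) {
    s <- ordered$start[r]
    e <- ordered$end[r]
    if (!has_successor[s] && !has_predecessor[e] && chain[s] != chain[e]) {
      keep[r] <- TRUE
      has_successor[s] <- TRUE
      has_predecessor[e] <- TRUE
      chain[chain == chain[e]] <- chain[s]  # merge the two chains
    }
  }
  ordered[keep, ]
}

selected <- greedy_select(ordered)
selected
```

With these values, exactly one of the two weight-8 pairs and one of the two weight-7 pairs survive, whichever way the ties fall.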


Construction of contigs

Match the read starts with read ends and vice versa

   S  E
1  2  3
2  8  5
3  5  9
4  6  2
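One way to carry out this matching, sketched in R under the assumption that the selected links form simple chains (each read has at most one successor and one predecessor, and there are no cycles). The reads, the `links` table, and the function name `build_contigs` are hypothetical.

```r
# Hypothetical reads; read 4 overlaps with nothing and stays on its own
reads <- c("ACTGGTTG", "TTGAGCAC", "CACGTTAA", "GGGGCCCC")

# Selected links: read `start` is followed by read `end`, sharing `weight` bases
links <- data.frame(start = c(2, 1), end = c(3, 2), weight = c(3, 3))

build_contigs <- function(reads, links) {
  # A read that never appears as an `end` has no predecessor,
  # so it is the first read of a contig
  heads <- setdiff(seq_along(reads), links$end)
  contigs <- character(0)
  for (h in heads) {
    contig <- reads[h]
    current <- h
    repeat {
      row <- which(links$start == current)
      if (length(row) == 0) break            # no successor: the contig is complete
      nxt <- links$end[row[1]]
      # append the next read minus the bases it shares with the contig so far
      contig <- paste0(contig, substring(reads[nxt], links$weight[row[1]] + 1))
      current <- nxt
    }
    contigs <- c(contigs, contig)
  }
  contigs
}

contigs <- build_contigs(reads, links)
contigs
```

Reads 1, 2 and 3 merge into one contig of 8 + 5 + 5 = 18 bases, and read 4 forms a contig by itself.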


Computer Simulation


Number of contigs

       40 Reads   Sample 25 Reads
Min.   14.00      9.00
Mean   15.47      9.45
Max    17.00      10.00


Length of contigs

       40 Reads   Sample 25 Reads
Min.   161.00     95.00
Mean   166.67     98.15
Max    173.00     102.00

Note: Total number of bp is 320 (40 reads of 8 bp each)


Average length of contigs

       40 Reads   Sample 25 Reads
Min.   10.18      10.20
Mean   10.80      10.40
Max    11.50      10.56
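The quantities reported in these slides (number of contigs, total contig length, and average contig length) are related by average = total / number. A sketch of how such summaries might be computed in R, assuming each simulation run yields a character vector of contig sequences; the `runs` list is hypothetical data, not the team's output.

```r
# Hypothetical results of two assembly runs
runs <- list(
  c("ACTGGTTGAGCAC", "GGGGCCCC"),
  c("ACTGGTTG", "TTGAGCAC", "GGGGCCCC")
)

n_contigs <- sapply(runs, length)                        # contigs per run
total_bp  <- sapply(runs, function(x) sum(nchar(x)))     # total contig length per run
avg_len   <- total_bp / n_contigs                        # average contig length per run

summary(n_contigs)
summary(total_bp)
summary(avg_len)
```

`summary()` reports the minimum, quartiles, mean and maximum across runs.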


APPENDIX


Sample 25 reads w/o replacement

### input

reads <- read.table("http://www.cmb.usc.edu/deonier/data/data_files/r.reads.txt", skip=1)

# keep 25 of the rows, chosen at random without replacement
reads <- reads[sample(nrow(reads), 25, replace=FALSE), ]


Getting the weights

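# weights[x, y] = length of the longest prefix of read x that equals a suffix of read y
# (reads.v is the character vector of 8-nucleotide reads; weights must already
# exist as a square matrix with one row and one column per read)
for(x in 1:dim(weights)[1]) {
  for(y in 1:dim(weights)[2]) {
    if(x==y) weights[x,y] <- 0
    else weights[x,y] <- max(sapply(1:8, function(ss)
      # compare the first ss letters of read x with the last ss letters of read y
      attr(gregexpr(substring(reads.v[x], 1, ss),
                    substring(reads.v[y], 9-ss))[[1]], 'match.length')))
  }
}

# gregexpr reports -1 when there is no match, so map that to a weight of 0
weights <- replace(weights, weights==-1, 0)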


Ordering the reads

ordered.weights <- ss.weights[order(ss.weights$weights, decreasing=TRUE), ]


Tie breaker

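# Shuffle the rows within each weight level so that ties are broken at random.
# idx[sample.int(length(idx))] is used instead of sample(idx, length(idx)),
# because sample() on a single number k draws from 1:k rather than returning k.
rand.order.weights <- ordered.weights[c(rev(lapply(
  min(ordered.weights$weights):max(ordered.weights$weights),
  function(x) {
    idx <- which(ordered.weights$weights==x)
    idx[sample.int(length(idx))]
  })), recursive=TRUE), ]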


Constructing the Contig

dataframename <- 0
repeat {
  for(i in 1:nrow(reads.use)) {
    # seed a new contig with the first remaining link
    contig <- reads.use[1, ]
    repeat {
      # extend on the left (links ending at the contig's first read)
      # and on the right (links starting at the contig's last read)
      contig <- rbind(reads.use[which(contig$start[1]==reads.use$end), ],
                      contig,
                      reads.use[which(contig$end[nrow(contig)]==reads.use$start), ])
      # drop the links that are now part of the contig
      reads.use <- reads.use[-(which(rownames(reads.use) %in% rownames(contig))), ]
      if(length(which(contig$start[1]==reads.use$end))==0 &
         length(which(contig$end[nrow(contig)]==reads.use$start))==0)
        break
    }
    dataframename[i] <- paste("contigseq", i, sep='')
    assign(dataframename[i], contig)
  }
  if(dim(reads.use)[1]==0)
    break
}

# keep only the contigs that contain at least one non-NA entry
nona <- sapply(1:length(dataframename), function(x) length(which(!is.na(get(dataframename[x])))))
nona <- which(nona>0)
sapply(nona, function(x) print(get(dataframename[x])))


Getting the letterings

contig.name <- 0

for(cc in 1:length(nona)) {
  ctg <- get(dataframename[cc])
  # start from the first read of the contig
  contig.letter <- reads.v[ctg$start[1]]
  for(i in 1:nrow(ctg)) {
    # drop the overlapping tail, then append the next read in full
    contig.letter <- paste(substring(contig.letter, 1, nchar(contig.letter)-ctg$weight[i]),
                           reads.v[ctg$end[i]], sep='')
  }
  contig.name[cc] <- paste("full.contig.letter", cc, sep='')
  assign(contig.name[cc], contig.letter)
}

sapply(1:length(contig.name), function(x) get(contig.name[x]))


Summary Statistics: 25 reads

Average size of contigs

Min. 1st Qu. Median Mean 3rd Qu. Max.

10.20 10.20 10.56 10.40 10.56 10.56

Total number of base pairs

Min. 1st Qu. Median Mean 3rd Qu. Max.

95.00 95.00 95.00 98.15 102.00 102.00

Number of contigs

Min. 1st Qu. Median Mean 3rd Qu. Max.

9.00 9.00 9.00 9.45 10.00 10.00


Summary Statistics: 40 reads

Average size of contigs

Min. 1st Qu. Median Mean 3rd Qu. Max.

10.18 10.38 10.87 10.80 10.93 11.50

Total number of base pairs

Min. 1st Qu. Median Mean 3rd Qu. Max.

161.0 163.8 166.0 166.7 170.0 173.0

Number of contigs

Min. 1st Qu. Median Mean 3rd Qu. Max.

14.00 15.00 15.00 15.47 16.00 17.00

