Posted on 02-Jan-2016
LESSON 9: Analyzing DNA Sequences and DNA Barcoding
PowerPoint slides to accompany Using Bioinformatics: Genetic Research
Chowning, J., Kovarik, D., Porter, S., Griswold, J., Spitze, J., Farris, C., Petersen, K., and Caraballo, T. Using Bioinformatics: Genetic Research. Published Online October 2012. figshare. http://dx.doi.org/10.6084/m9.figshare.936568
Image Source: Wikimedia Commons
How DNA Sequence Data is Obtained for Genetic Research
Genetic Data
…TTCACCAACAGGCCCACA…
Obtain Samples: Blood, Saliva, Hair Follicles, Feathers, Scales
Extract DNA from Cells
Sequence DNA
Compare DNA Sequences to One Another
TTCAACAACAGGCCCAC
TTCACCAACAGGCCCAC
TTCATCAACAGGCCCAC
GOALS:
• Identify the organism from which the DNA was obtained.
• Compare DNA sequences to each other.
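The second goal, comparing DNA sequences to each other, comes down to finding the positions where already-aligned sequences differ. A minimal sketch, assuming the run of letters on the slide is three 17-base reads placed end to end and that the reads are already aligned with no gaps (the helper name `differences` is hypothetical):

```python
# Three short example reads from the slide (assumed to be 17 bases each).
reads = [
    "TTCAACAACAGGCCCAC",
    "TTCACCAACAGGCCCAC",
    "TTCATCAACAGGCCCAC",
]


def differences(a: str, b: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, base in a, base in b) for every mismatch.

    Assumes the two sequences are already aligned and of equal length,
    so insertions and deletions (gaps) are not considered.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return [(i + 1, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Compare every pair of reads once.
for i in range(len(reads)):
    for j in range(i + 1, len(reads)):
        print(f"read {i + 1} vs read {j + 1}:", differences(reads[i], reads[j]))
```

Each pair differs at a single position (position 5), which is the kind of single-base variation that tools like BLAST report when sequences are compared.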
Overview of DNA Sequencing
DNA Sample
Mix with primers
Perform sequencing reaction
…T T C A C C A A C T G G C C C A C A…
DNA Sequence Chromatogram
Image Source: Wikimedia Commons
Sequence Both Strands of DNA
Sequence #1: Top Strand
A T G A C G G A T C A G C
Sequence #2: Bottom Strand
T A C T G C C T A G T C G
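The bottom strand is fixed by base pairing: every A on the top strand faces a T, and every C faces a G. A minimal sketch of that rule, using the top-strand sequence from the slide (the function name `complement` is illustrative):

```python
# Watson-Crick base pairing: A pairs with T, C pairs with G.
PAIRS = str.maketrans("ACGT", "TGCA")


def complement(seq: str) -> str:
    """Return the base-by-base complement, kept in the same left-to-right order."""
    return seq.upper().translate(PAIRS)


top = "ATGACGGATCAGC"  # Sequence #1: top strand from the slide
bottom = complement(top)
print(bottom)  # TACTGCCTAGTCG, matching Sequence #2 on the slide
```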
Compare the Two Sequences
Sequence #1: Top (“F”)
5’- A T G A C G G A T C A G C – 3’
Sequence #2: Bottom (“R”)
3’- T A C T G C C T A G T C G – 5’
Bioinformatics tools like BLAST can be used to compare the sequences from the two strands.
Image Source: Wikimedia Commons
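A read from the bottom strand is reported 5’ to 3’, which is the reverse of how that strand is drawn on the slide, so it must be reverse-complemented before it lines up base for base with the top-strand read. BLAST's nucleotide search checks both strands by default, so it handles this itself; the sketch below only makes the step explicit. It assumes the "R" read is exactly the bottom strand read 5’ to 3’, with no sequencing errors:

```python
PAIRS = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse so the result reads 5' to 3'."""
    return seq.upper().translate(PAIRS)[::-1]


forward_read = "ATGACGGATCAGC"  # top strand ("F"), written 5' to 3'
bottom_as_drawn = "TACTGCCTAGTCG"  # bottom strand ("R") as drawn, 3' to 5'
reverse_read = bottom_as_drawn[::-1]  # the same strand written 5' to 3'

aligned = reverse_complement(reverse_read)
print(reverse_read, aligned, aligned == forward_read)
```

After this step the two reads can be compared position by position, and any remaining mismatch points to a base call that needs review in the chromatograms.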
Image Source: NCBI, FinchTV, BOLD.
Analyzing DNA Sequences
Day One:
1. Obtain two chromatograms (one for each strand) for each sample.
2. Align the two sequences with BLAST.
Day Two:
3. Visualize the chromatograms using FinchTV, and compare the BLAST alignment against the base calls in each chromatogram.
4. Review any differences and determine which base is most likely correct.
5. Edit and trim the DNA sequence using quality data.
Day Three:
6. Translate the sequence to check for stop codons.
7. Use BLAST to identify the origin of the sequence.
8. Use BOLD to confirm the identity and build a phylogenetic tree.
Slide figures: Sequence #1 (top strand) A T G A C G G A T C A G C; Sequence #2 (bottom strand) T A C T G C C T A G T C G; translation example ATG CCG TAA → M P STOP.
Quality Values Represent the Accuracy of Each Base Call
A quality value estimates how likely it is that the sequencing software called the base at a given position incorrectly. Higher values mean more reliable base calls.
Quality Value: Q = -10 × log10(P), where P is the probability that the base call is wrong.
Q10 = probability of 1 in 10 of being misidentified.
Q20 = probability of 1 in 100 of being misidentified.
Q30 = probability of 1 in 1,000 of being misidentified.
Q40 = probability of 1 in 10,000 of being misidentified.
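The formula can be run in both directions: from an error probability to Q, and from Q back to an error probability. A minimal sketch (the function names are illustrative, not part of FinchTV or any other tool):

```python
import math


def error_probability(q: float) -> float:
    """Probability that a base call with quality value q is wrong: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def quality_value(p: float) -> float:
    """Quality value for an error probability p: Q = -10 * log10(P)."""
    return -10 * math.log10(p)


# Reproduce the table: each step of 10 in Q is a tenfold drop in error probability.
for q in (10, 20, 30, 40):
    p = error_probability(q)
    print(f"Q{q}: P(error) = {p:g} (1 in {round(1 / p):,})")
```

The same function shows why a Q12 call is treated as ambiguous: `error_probability(12)` is about 0.063, roughly a 1 in 16 chance that the base is wrong.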
Quality Values Are Used When Comparing Sequences
Image Source: FinchTV
Examples of Chromatogram Data
Circle #1: Example of a run of the same nucleotide (many T’s in a row). Notice that a distinct highest peak is visible at each position.
Circle #2: Example of an ambiguous base call. Notice that the T (red) at position 57 (highlighted in blue) is just below a green peak (A) at the same position. Note the poor quality score (Q12) at the bottom left of the screen. An A may be the actual nucleotide at this position.
Circle #3: Example of two A’s in a row. The two peaks look different, but each is the highest peak at its position.
Image Source: FinchTV
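Steps 4 and 5 of the workflow, choosing the more likely base where the two reads disagree and trimming low-quality ends, can be sketched with per-base quality values. This is a simplified illustration under stated assumptions, not how FinchTV or BLAST work internally: both reads are assumed already aligned in the same orientation, disagreements go to the higher-quality call (ties become N), and the Q20 trimming threshold and all example data are hypothetical:

```python
def resolve(fwd, fwd_q, rev, rev_q):
    """Merge two aligned, same-orientation reads into one consensus.

    Agreeing positions keep the base and the higher of the two quality values.
    Disagreeing positions keep the base with the higher quality value; a tie
    becomes N (IUPAC code for "any base") with quality 0.
    """
    if not (len(fwd) == len(fwd_q) == len(rev) == len(rev_q)):
        raise ValueError("reads and quality lists must all be the same length")
    bases, quals = [], []
    for b1, q1, b2, q2 in zip(fwd, fwd_q, rev, rev_q):
        if b1 == b2:
            bases.append(b1)
            quals.append(max(q1, q2))
        elif q1 > q2:
            bases.append(b1)
            quals.append(q1)
        elif q2 > q1:
            bases.append(b2)
            quals.append(q2)
        else:
            bases.append("N")
            quals.append(0)
    return "".join(bases), quals


def trim_ends(seq, quals, min_q=20):
    """Drop bases from both ends until the first and last bases have quality >= min_q."""
    start, end = 0, len(seq)
    while start < end and quals[start] < min_q:
        start += 1
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return seq[start:end]


# Hypothetical example data (not from the lesson): the forward read mis-calls
# position 6 as T with low quality, like the ambiguous call in Circle #2.
fwd = "ATGACTGATCAGC"
fwd_q = [8, 15, 35, 40, 38, 12, 40, 39, 36, 40, 30, 18, 9]
rev = "ATGACGGATCAGC"  # assumed already reverse-complemented to match fwd
rev_q = [10, 12, 37, 40, 40, 30, 40, 38, 35, 40, 33, 16, 11]

consensus, quals = resolve(fwd, fwd_q, rev, rev_q)
print(consensus)
print(trim_ends(consensus, quals))
```

Here the low-quality T is replaced by the G from the other read, and the weak bases at both ends are trimmed away. In practice a person still checks each disagreement against the chromatogram peaks rather than trusting the numbers alone.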
Transcription and Translation Begin at the Start Codon
Sequence #1: 5’- A T G A C G G A T G A G C – 3’
Sequence #2: 3’- T A C T G C C T A C T C G – 5’
Reading Frame +1: M T D E
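Translation starts at the first ATG and reads one codon (three bases) at a time until a stop codon. A minimal sketch using the standard genetic code, assuming the sequence contains only A, C, G and T and is already written 5’ to 3’ (the function name is illustrative):

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in the order TTT, TTC, TTA, ... GGG.
# "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_from_start(seq: str) -> str:
    """Find the first ATG and translate codon by codon until a stop codon.

    Returns one-letter amino acid codes, with '*' for the stop codon.
    Incomplete trailing codons are ignored. Returns '' if there is no ATG.
    """
    seq = seq.upper()
    start = seq.find("ATG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        protein.append(aa)
        if aa == "*":
            break
    return "".join(protein)


print(translate_from_start("ATGACGGATGAGC"))  # top strand from the slide
print(translate_from_start("ATGCCGTAA"))  # translation example from the workflow slide
```

The fourth codon here, GAG, codes for glutamate (E); CAG, as in the sequences on the earlier slides, would code for glutamine (Q). The second call stops at TAA, which is the kind of stop codon that step 6 of the workflow checks for.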
There Are Six Potential Reading Frames in DNA
Sequence #1: 5’- A T G A C G G A T G A G C – 3’
Sequence #2: 3’- T A C T G C C T A C T C G – 5’
Reading Frame +1: M T D E
Reading Frame +2
Reading Frame +3
Reading Frame -1
Reading Frame -2
Reading Frame -3
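Each strand can be read starting at its first, second or third base, which gives three frames per strand and six in total. The sketch below translates all six without looking for a start codon. It assumes frames -1 to -3 start at the first, second and third base of the reverse complement; this is a common convention, but tools number the reverse frames differently, so check the convention of whichever tool you compare against:

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TTT, TTC, TTA, ... GGG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq: str) -> str:
    """Translate complete codons from the first base; leftover bases are ignored."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def six_frames(seq: str) -> dict[str, str]:
    """Translate frames +1..+3 (top strand) and -1..-3 (reverse complement)."""
    seq = seq.upper()
    rc = seq.translate(COMPLEMENT)[::-1]  # bottom strand written 5' to 3'
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
    for offset in range(3):
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


for name, protein in six_frames("ATGACGGATGAGC").items():
    print(name, protein)
```

For the slide's sequence this gives +1 MTDE, +2 *RMS, +3 DG*, -1 AHPS, -2 LIRH and -3 SSV. Only frame +1 begins with the start codon (M) and contains no stop codon, which is why it is the frame worth translating.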