Hepatitis C
Analysis of Sequence Data from ARUP and NCBI databases
By Ian Odell
What information can we get from ARUP sequencing data?
Data is from January 2002 – July 2004.
5’ untranslated region (UTR) of types 1–6:
• Number of unique sequences by type.
• Frequency of unique sequences for each type.
• Frequency of each base in each type seen in a position weight matrix.
• Regions of high and low variation seen in graphs of a Position Weight Matrix.
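The first two bullets amount to grouping sequences by type and counting exact duplicates. A minimal sketch of that bookkeeping follows; the `(type, sequence)` record layout, the function name, and the toy sequences are assumptions for illustration, not the ARUP data or the author's actual pipeline.

```python
from collections import Counter, defaultdict

def unique_sequences_by_type(records):
    """Group (hcv_type, sequence) pairs by type and count exact duplicates.

    Returns {hcv_type: Counter mapping each distinct sequence to its count}.
    """
    by_type = defaultdict(Counter)
    for hcv_type, seq in records:
        by_type[hcv_type][seq.upper()] += 1
    return dict(by_type)

# Toy records (hypothetical, not ARUP data).
records = [
    ("1", "GCCATGG"),
    ("1", "GCCATGG"),
    ("1", "GCTATGG"),
    ("2", "GCCTTGG"),
]

counts = unique_sequences_by_type(records)
for hcv_type, counter in sorted(counts.items()):
    total = sum(counter.values())
    print(f"type {hcv_type}: {total} total, {len(counter)} unique")
    # Frequency of each unique sequence within its type, most common first.
    for seq, n in counter.most_common():
        print(f"  {seq}  count={n}  freq={n / total:.2f}")
```

Sorting each type's counter by count also gives the highly represented sequences directly.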
Unique Sequences by type:
HCV Type   Total Sequences   Unique Sequences   Unambiguous Unique Sequences
1          16151             1320               750
2          2862              585                373
3          2430              404                232
4          284               99                 68
5          7                 5                  4
6          44                20                 17
total      21778             2434               1444
*** Ambiguous bases cause unique sequences to be overrepresented.
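One plausible reading of the note on ambiguous bases: a read containing an IUPAC ambiguity code (for example R, meaning A or G) may describe the same molecule as an existing unambiguous sequence, yet an exact string comparison counts it as a new unique sequence. The sketch below illustrates this with toy sequences; treating the gap character as unambiguous is an assumption, and the slides do not state how the "unambiguous" column was actually derived.

```python
# Assumption: A, C, G, T and the alignment gap "-" count as unambiguous.
UNAMBIGUOUS = set("ACGT-")

def is_unambiguous(seq):
    """True if the sequence contains no IUPAC ambiguity codes."""
    return set(seq.upper()) <= UNAMBIGUOUS

# Toy data (hypothetical). "GCCRTGG" is compatible with "GCCATGG"
# because R stands for A or G, but it is a different string.
seqs = ["GCCATGG", "GCCATGG", "GCCRTGG", "GCTATGG"]

unique = set(seqs)
unambiguous_unique = {s for s in unique if is_unambiguous(s)}

print(len(seqs), len(unique), len(unambiguous_unique))  # prints: 4 3 2
```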
Conclusions
1. Each type has a ‘profile’ sequence.
2. Do the log v log graphs give us insight into the distribution of mutations within the Hepatitis C population?
NEXT: Look for variation between and within types from the unique sequences that are highly represented in the population (i.e. those that have many duplicates).
Stuyver et al. 1996. “Second-generation line probe assay for hepatitis C virus genotyping.” J. Clin. Microbiol. 34:2259-2266.
In R5, the six selected probes were used for types 1 (line 4), 3 (line 15), 4 and 10 (line 18), and 5 (line 20), as well as for subtypes 2a/2c (line 11), 2b (line 12), and 3b (line 18).
Weight Matrices
• From Profiles, we can see areas of variation between types and their conservation within each type.
• Next, we want to see what these look like for all sequences in each type.
Example Weight Matrix
First 10 base positions of Type 2 HCV
Base   1         2         3         4         5         6         7         8         9         10
A      0.000433  0.999752  0.000124  0.000186  0         0         0.999629  0         0         0.000124
C      0         0         6.20E-05  6.20E-05  6.20E-05  6.20E-05  0         6.20E-05  6.20E-05  0
G      6.20E-05  0         0.000124  0.999567  0.000495  0         6.20E-05  0.000186  6.20E-05  0.000186
T      0.999257  6.20E-05  0         0         0.998824  6.20E-05  0.000248  6.20E-05  0.999629  6.20E-05
DASH   0.000186  0.000124  0.99969   0.000124  6.20E-05  0.999876  6.20E-05  0.99969   0.000124  0.999629
AMBIG  6.20E-05  6.20E-05  0         6.20E-05  0.000557  0         0         0         0.000124  0
This allows us to see the variation within a type at each nucleotide.
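A weight matrix with these six rows can be built by tallying, at every alignment column, how often each symbol occurs and dividing by the number of sequences. The following is a minimal sketch under the assumption that all sequences are already aligned to equal length; the row names mirror the table, while the function names and toy alignment are hypothetical.

```python
ROWS = ["A", "C", "G", "T", "DASH", "AMBIG"]

def classify(ch):
    """Map one alignment character to a weight-matrix row name."""
    ch = ch.upper()
    if ch in ("A", "C", "G", "T"):
        return ch
    if ch == "-":
        return "DASH"
    return "AMBIG"  # any other symbol, e.g. IUPAC codes such as N or R

def weight_matrix(aligned):
    """Return {row: [frequency at each position]} for aligned sequences."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to equal length")
    matrix = {row: [0.0] * length for row in ROWS}
    for seq in aligned:
        for i, ch in enumerate(seq):
            matrix[classify(ch)][i] += 1
    n = len(aligned)
    # Convert counts to frequencies so every column sums to 1.
    return {row: [c / n for c in matrix[row]] for row in ROWS}

# Toy alignment (hypothetical, not ARUP data).
aligned = ["TA-GT", "TA-GT", "TA-GC", "TANGT"]
pwm = weight_matrix(aligned)
for row in ROWS:
    print(f"{row:<6}", " ".join(f"{v:.2f}" for v in pwm[row]))
```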
Graphical Type 1 Weight Matrix
[Figure: graphical weight matrix for type 1; region R5 is marked on the plot]
The sum of all points at each x-value is 1.
The y-value gives the fraction of sequences in which each base is found at that index.
We are looking for a region of conservation in all types; later we can look for variation between types.
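Because each column sums to 1, the largest value in a column is a simple conservation score for that position, and conserved regions show up as runs of positions whose score stays near 1. The sketch below is one way to find such runs within a single matrix; the threshold, the minimum run length, and the toy matrix are assumptions, not values from the slides.

```python
def conservation(pwm):
    """Per-position conservation: frequency of the most common symbol."""
    return [max(column) for column in zip(*pwm.values())]

def conserved_runs(scores, threshold=0.99, min_len=3):
    """Return half-open, 0-based (start, end) runs with score >= threshold."""
    runs, start = [], None
    for i, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i))
            start = None
    # Close a run that extends to the last position.
    if start is not None and len(scores) - start >= min_len:
        runs.append((start, len(scores)))
    return runs

# Toy weight matrix (hypothetical); every column sums to 1.
toy_pwm = {
    "A": [1.0, 0.0, 0.0, 0.6, 0.0, 0.0, 1.0],
    "C": [0.0, 1.0, 0.0, 0.4, 0.0, 1.0, 0.0],
    "G": [0.0, 0.0, 1.0, 0.0, 0.5, 0.0, 0.0],
    "T": [0.0, 0.0, 0.0, 0.0, 0.5, 0.0, 0.0],
}
scores = conservation(toy_pwm)
print(scores)
print(conserved_runs(scores))  # prints: [(0, 3)]
```

To look for a region conserved in all types, the same run search could be applied to each type's matrix and the resulting intervals intersected.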
What information can we get from NCBI data?
• Look at complete HCV genome publications, because BLAST searches with the 5’ UTR primers are biased towards what those primers amplify (i.e. BLAST returns the most similar hits, and we want to look for variation).
• Are there mismatches under the ARUP primers? Do ARUP primers bias the sequence data by not amplifying a certain group?
• Regions of low and high variation in the complete genome. Compare to 5’ UTR. The alignment is not good enough for an accurate analysis.
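The primer question can be approached by comparing the primer with its binding site in each genome and tallying how many sequences have 0, 1, 2, ... mismatches. A minimal sketch follows; the primer, the sequences, and the binding-site offset are all hypothetical (the ARUP primers are not given in these slides), and it only handles a primer on the same strand as the sequences, so a reverse primer would first need to be reverse-complemented.

```python
from collections import Counter

def mismatches(primer, site):
    """Number of positions where the primer and its binding site differ."""
    if len(primer) != len(site):
        raise ValueError("primer and site must have the same length")
    return sum(1 for p, s in zip(primer.upper(), site.upper()) if p != s)

def mismatch_profile(primer, sequences, start):
    """Tally sequences by mismatch count under the primer at a fixed offset."""
    profile = Counter()
    for seq in sequences:
        site = seq[start:start + len(primer)]
        profile[mismatches(primer, site)] += 1
    return profile

# Hypothetical primer and sequences, for illustration only.
primer = "CATGG"
sequences = ["GCCATGGCG", "GCCATGGCG", "GCCTTGGCG", "GCGTTGACG"]
profile = mismatch_profile(primer, sequences, start=2)

for n_mismatch, n_seqs in sorted(profile.items()):
    print(f"{n_mismatch} mismatches: {n_seqs} sequences")
```

A group of sequences with several mismatches under a primer would be a candidate for the amplification bias raised in the bullet above.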
Graphical Weight Matrix of ARUP (5’ UTR) Amplicon
[Figure: graphical weight matrix of the ARUP 5’ UTR amplicon; the forward (For) and reverse (Rev) primer regions are marked on the plot]
Data is from 239 aligned complete HCV genomes downloaded from GenBank.
Graphical Weight Matrix: ARUP forward primer region in BLAST complete genome alignment
Figure annotations: 1, 5, 32 Ins; 17 SNPs / 239 sequences
SNPs and insertions under ARUP forward primer
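Counts like these can be derived column by column from the alignment. The sketch below uses working definitions that are assumptions rather than the author's: a column is an insertion column when most sequences have a gap there (so the few sequences with a base carry an insertion), and a SNP column when it is not an insertion column and shows more than one distinct base. The alignment and region bounds are toy values.

```python
from collections import Counter

def column_variation(aligned, start, end):
    """Classify alignment columns in [start, end) as SNP or insertion columns.

    Returns (snp_columns, insertion_columns), where insertion_columns maps
    a column index to the number of sequences with a non-gap symbol there.
    """
    snp_cols, ins_cols = [], {}
    n = len(aligned)
    for i in range(start, end):
        column = [seq[i].upper() for seq in aligned]
        gaps = column.count("-")
        bases = Counter(ch for ch in column if ch in ("A", "C", "G", "T"))
        if gaps > n / 2:
            # Mostly gaps: the minority with a symbol here carry an insertion.
            if n - gaps > 0:
                ins_cols[i] = n - gaps
        elif len(bases) > 1:
            snp_cols.append(i)
    return snp_cols, ins_cols

# Toy alignment (hypothetical): one insertion at column 2, one SNP at column 3.
aligned = [
    "GC-ATGG",
    "GC-ATGG",
    "GCTATGG",
    "GC-TTGG",
]
snp_cols, ins_cols = column_variation(aligned, 0, 7)
print("SNP columns:", snp_cols)
print("insertion columns:", ins_cols)
```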