Hepatitis C Analysis of Sequence Data from ARUP and NCBI databases By Ian Odell.

What information can we get from ARUP sequencing data?

Data is from January 2002 – July 2004.

5’ untranslated region (5’ UTR) of types 1 – 6:

• Number of unique sequences by type.

• Frequency of unique sequences for each type.

• Frequency of each base in each type seen in a position weight matrix.

• Regions of high and low variation seen in graphs of a Position Weight Matrix.

Unique sequences by type:

HCV Type   Total Sequences   Unique Sequences   Unambiguous Unique Sequences
1          16151             1320               750
2          2862              585                373
3          2430              404                232
4          284               99                 68
5          7                 5                  4
6          44                20                 17
total      21778             2434               1444

Note: Ambiguous bases cause unique sequences to be overrepresented.
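The three counts in the table can be illustrated with a minimal Python sketch. It assumes that any character other than A, C, G, T or a gap dash is an ambiguity code; the function name `summarize` and the toy sequences are hypothetical and are not ARUP data.

```python
from collections import Counter

# Assumption: anything outside this set is treated as an ambiguous base.
UNAMBIGUOUS = set("ACGT-")

def summarize(sequences):
    """Return (total, unique, unambiguous unique) counts for one HCV type."""
    counts = Counter(s.upper() for s in sequences)
    unambiguous = [s for s in counts if set(s) <= UNAMBIGUOUS]
    return len(sequences), len(counts), len(unambiguous)

# Toy data (hypothetical). "ACGN" and "ACGR" may be the same virus as "ACGT",
# but each is counted as its own unique sequence.
toy = ["ACGT", "ACGT", "ACGN", "ACGR", "ACTT"]
total, unique, unambiguous_unique = summarize(toy)
```

In the toy data, the two sequences containing ambiguity codes inflate the unique count, which is the overrepresentation the note describes.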

Frequency of unique sequences for type 1:











Conclusions

1. Each type has a ‘profile’ sequence.

2. Do the log v log graphs give us insight into the distribution of mutations within the Hepatitis C population?

NEXT: Look for variation between and within types from the unique sequences that are highly represented in the population (i.e. those that have many duplicates).

Open Profiles
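One way to examine question 2 is sketched below in Python. It assumes the log v log graphs plot the copy count of each unique sequence against its rank; the slides do not state the axes, so this is only one possible reading. The helper names and toy copy counts are hypothetical.

```python
import math
from collections import Counter

def rank_frequency(sequences):
    """Return (log10 rank, log10 copy count) pairs, most common sequence first."""
    counts = sorted(Counter(sequences).values(), reverse=True)
    return [(math.log10(rank), math.log10(n))
            for rank, n in enumerate(counts, start=1)]

def slope(points):
    """Least-squares slope of y on x for a list of (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in points)
    den = sum((x - mean_x) ** 2 for x, _ in points)
    return num / den

# Hypothetical copy counts: one dominant sequence and two rarer variants.
toy = ["A"] * 100 + ["B"] * 10 + ["C"] * 1
points = rank_frequency(toy)
```

A roughly straight line in this log-log space would suggest a power-law-like distribution of duplicates; `slope(points)` gives its steepness.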

Stuyver et al. 1996. “Second-generation line probe assay for hepatitis C virus genotyping.” J. Clin. Microbiol. 34:2259-2266.

In R5, the six selected probes were used for types 1 (line 4), 3 (line 15), 4 and 10 (line 18), and 5 (line 20), as well as for subtypes 2a/2c (line 11), 2b (line 12), and 3b (line 18).

Weight Matrices

• From Profiles, we can see areas of variation between types and their conservation within each type.

• Next, we want to see what these look like for all sequences in each type.

Example Weight Matrix

First 10 base positions of Type 2 HCV

Position  1         2         3         4         5         6         7         8         9         10
A         0.000433  0.999752  0.000124  0.000186  0         0         0.999629  0         0         0.000124
C         0         0         6.20E-05  6.20E-05  6.20E-05  6.20E-05  0         6.20E-05  6.20E-05  0
G         6.20E-05  0         0.000124  0.999567  0.000495  0         6.20E-05  0.000186  6.20E-05  0.000186
T         0.999257  6.20E-05  0         0         0.998824  6.20E-05  0.000248  6.20E-05  0.999629  6.20E-05
DASH      0.000186  0.000124  0.99969   0.000124  6.20E-05  0.999876  6.20E-05  0.99969   0.000124  0.999629
AMBIG     6.20E-05  6.20E-05  0         6.20E-05  0.000557  0         0         0         0.000124  0

This allows us to see the variation within a type at each nucleotide.
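A matrix of this shape can be built from aligned sequences with a few lines of Python. The sketch below assumes equal-length aligned input, a dash for gaps, and that any other non-ACGT character counts as ambiguous. The row names follow the example matrix; the function name `weight_matrix` and the toy alignment are hypothetical.

```python
ROWS = ["A", "C", "G", "T", "DASH", "AMBIG"]

def weight_matrix(aligned):
    """Return {row: [frequency per column]} for equal-length aligned sequences."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to the same length")
    counts = {row: [0] * length for row in ROWS}
    for seq in aligned:
        for i, base in enumerate(seq.upper()):
            if base in "ACGT":
                counts[base][i] += 1
            elif base == "-":
                counts["DASH"][i] += 1
            else:
                # Assumption: every other character is an ambiguity code.
                counts["AMBIG"][i] += 1
    n = len(aligned)
    return {row: [c / n for c in col] for row, col in counts.items()}

# Toy alignment (hypothetical, not ARUP data).
toy = ["TA-G", "TA-G", "TAAG", "TN-G"]
pwm = weight_matrix(toy)
```

Each column of `pwm` sums to 1 across the six rows, as in the example matrix.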

Graphical Type 1 Weight Matrix

[Figure: graphical weight matrix for type 1, with region R5 marked.]

The frequencies at each x-value sum to 1. The y-value gives the fraction of sequences in which each base is found at that position. We are looking for a region of conservation in all types; later we can look for variation between types.
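The search for conserved regions can be sketched in Python as a threshold on the most frequent symbol at each position. The threshold of 0.99, the function name `conserved_positions`, and the toy matrix are assumptions for illustration; the slides do not state a cutoff.

```python
def conserved_positions(pwm, threshold=0.99):
    """Return 1-based positions whose most frequent symbol reaches the threshold."""
    length = len(next(iter(pwm.values())))
    hits = []
    for i in range(length):
        if max(row[i] for row in pwm.values()) >= threshold:
            hits.append(i + 1)
    return hits

# Toy matrix (hypothetical): positions 1 and 2 are conserved, position 3 is not.
toy_pwm = {
    "A":    [0.00, 1.00, 0.25],
    "T":    [1.00, 0.00, 0.00],
    "DASH": [0.00, 0.00, 0.75],
}
conserved = conserved_positions(toy_pwm)
```

Gap rows are included here, so a column that is almost always a dash also counts as conserved; drop the `DASH` row first if only bases should qualify.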






What information can we get from NCBI data?

• Look at complete HCV genome publications, because BLAST searches with the 5’ UTR primers are biased towards what those primers amplify (i.e. BLAST returns the most similar hits, and we want to look for variation).

• Are there mismatches under the ARUP primers? Do ARUP primers bias the sequence data by not amplifying a certain group?

• Regions of low and high variation in the complete genome. Compare to the 5’ UTR. The alignment is not good enough for an accurate analysis.

Graphical Weight Matrix of ARUP (5’ UTR) Amplicon

[Figure: graphical weight matrix of the ARUP amplicon, with the forward and reverse primer regions marked.]

Data is from 239 aligned complete HCV genomes downloaded from GenBank.

Graphical Weight Matrix: ARUP forward primer region in BLAST complete genome alignment

[Figure labels: 1, 5, 32, Ins]

17 SNPs / 239 sequences

SNPs and insertions under ARUP forward primer

Graphical Weight Matrix: ARUP reverse primer in BLAST complete genome alignment

3 SNPs / 239 sequences

SNPs and insertions under ARUP reverse primer
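Mismatches under a primer can be tallied per position with a short Python sketch. It assumes the primer is written in the same strand orientation as the alignment (a reverse primer would need to be reverse-complemented first) and that its starting column in the alignment is known. The function name, toy alignment, and toy primer are hypothetical and are not the ARUP primers.

```python
def primer_mismatches(aligned, primer, start):
    """Count, for each primer position, the sequences that differ from the primer.

    `start` is the 0-based alignment column where the primer begins.
    Gaps ("-") count as mismatches.
    """
    counts = []
    for offset, primer_base in enumerate(primer.upper()):
        column = (seq[start + offset].upper() for seq in aligned)
        counts.append(sum(1 for base in column if base != primer_base))
    return counts

# Toy data (hypothetical): the third primer position differs in two sequences.
toy_alignment = ["GGACGTAA", "GGACTTAA", "GGAC-TAA"]
toy_primer = "ACGT"
mismatches = primer_mismatches(toy_alignment, toy_primer, start=2)
```

Positions with a nonzero count are candidate SNP sites under the primer; how these map onto the per-primer totals on the slides is not stated in the source.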
