Date post: 11-Jul-2020
Unique HCV Data and Analysis Tools in the Virus Pathogen Resource (ViPR) Yun Zhang J. Craig Venter Institute, San Diego, USA
Page 1: Unique HCV Data and Analysis Tools in the Virus Pathogen ... · 3/27/2017 Virus Pathogen Database and Analysis Resource (ViPR) - Flaviviridae - Genome database with visualization

Unique HCV Data and Analysis Tools in the Virus Pathogen Resource (ViPR)

Yun Zhang

J. Craig Venter Institute, San Diego, USA

Page 2

Outline

• New HCV Typing Pipeline

• Improvement of Subtype Annotations in Virus Pathogen Resource (ViPR)

• Leveraging Annotation Results for Analysis

• Future Plan

Page 3

Objective for HCV Subtyping

https://talk.ictvonline.org/ictv_wikis/flaviviridae/w/sg_flavi/56/hcv-classification

• Phylogenetically-principled subtyping

• Consistently subtype all HCV genomes in public domain

• Make annotations available via ViPR

• Be consistent with the ICTV subtype classification

Page 4

Phylogeny-based Subtype Classification

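[Figure: three example trees, each containing a query sequence q.]

Tree 1 (leaf labels A, q, A, X): q is of A-type because it is bracketed by A and A.

Tree 2 (leaf labels A, q, B, X; internal node AB-anc): q is of unknown type because it is bracketed by A and B (it could be "C", "A", or "A.x"). Naïvely, it looks like q must be of A-type, but we do not know at which point along the branch from the AB ancestor to A the type changes from AB-ancestor type to A-type.

Tree 3 (leaf labels A.1, q, A.2, X): q is of A-type because it is bracketed by A.1 and A.2 (it could be "A.3", "A.1", or "A.1.x").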
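The bracketing rule on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline's actual implementation: the function name `bracketed_type` is hypothetical, and type labels are assumed to be dot-separated hierarchies such as "A" or "A.1", as in the slide's examples. The query receives the deepest level shared by both bracketing labels, or no type when they share nothing.

```python
from typing import Optional


def bracketed_type(first_label: str, second_label: str) -> Optional[str]:
    """Return the type shared by the two labels that bracket a query.

    Hypothetical helper: labels are assumed to be dot-separated
    hierarchies ("A", "A.1", "A.1.2"). The result is their longest
    common prefix, or None when the labels disagree at the top level.
    """
    shared = []
    for a, b in zip(first_label.split("."), second_label.split(".")):
        if a != b:
            break
        shared.append(a)
    return ".".join(shared) if shared else None


# The three cases from the slide.
print(bracketed_type("A", "A"))      # bracketed by A and A
print(bracketed_type("A", "B"))      # bracketed by A and B
print(bracketed_type("A.1", "A.2"))  # bracketed by A.1 and A.2
```

Returning `None` for the A/B case mirrors the slide's point that the position of the type change along the ancestral branch is not known, so no type is assigned.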

Page 5

Novel HCV Typing Pipeline

Genotyping/Subtyping Report (Beta) (SOP)

Your analysis contains 1 records

Query Identifier: AB677533
Query Length: 9471
Type, Consensus Assignment, Support:
– Matching Clades: 1b, 1.0
– Matching Down-tree Bracketing Clades: 1b, 1.0
– Matching Up-tree Bracketing Clades: 1b, 1.0
Phylogenetic Tree: View
Report: Input alignment (FASTA), Output tree (Newick), Subtype assignment (text)

Page 6

HCV Typing in ViPR

[Screenshot of the ViPR Hepatitis C virus home page, https://www.viprbrc.org/brc/home.spg?decorator=flavi_hcv, printed 3/27/2017]

Hepatitis C Virus

Taxonomy: Group IV ((+)ssRNA); Flaviviridae; Hepacivirus; Hepatitis C virus
Virion: 50 nm, icosahedral, enveloped
Genome: 9.6 kilobase positive-sense, single-stranded RNA
Proteome: single polyprotein, co- & post-translationally cleaved into 10 mature proteins
Infection: initiates by E2 protein interacting with cell surface heparan sulfate proteoglycans
RNA Transcript: 5' internal ribosomal entry site (IRES), no 3' poly-A tail
Transmission: infects humans & chimps via blood-to-blood contact
Phylogeny: 6 distinct genotypes identified, each with multiple subtypes
Epidemiology: 2-3 million infected each year worldwide, almost 200 million infected
Clinical: causes cirrhosis, hepatocellular carcinoma, and liver failure

Search
Search our comprehensive database for:
• Genomes
• Genes & proteins
• Sequence Feature Variant Types
• Immune epitopes
• 3D protein structures
• Host Factor Data
• Antiviral Drugs

Analyze
Analyze data online:
• Sequence Alignment
• Phylogenetic Tree
• Sequence Variation (SNP)
• Metadata-driven Sequence Analysis
• Genome Annotator
• BLAST

Save to Workbench
Use your workbench to:
• Store and share data
• Combine working sets
• Integrate your data with ViPR data
• Store and share analyses
• Custom search alert

Data on host response to Influenza and SARS infections is now available!

Host-virus interaction data produced by laboratories associated with the NIAID-funded Systems Biology for Infectious Diseases Research Program is now available in ViPR.

This release increases the amount of host factor data for a total of 46 microarray, 16 proteomics and 4 lipidomics (in vivo and in vitro) experiments for various SARS- and MERS-CoV strains as well as H5N1, H3N2 and H1N1 influenza A viruses. In this release, the capability is now available to search for a single host factor across multiple experiments through the 'Host Factor Results' button on the 'Host Factor Biosets' page. In addition, display of the Reactome pathway(s) containing one (or more) host factors is now available from both the 'Patterns' and 'Boolean Operator' pages. Additional experiments using various '-omics' technologies, as well as analytical and visualization tools, will become available in future releases of ViPR.

For more details about these studies, or to view the results, click on the "Host Factor Data" link from the "Search Data" menu.

Highlights

Decoration options let you color tree leaves by metadata.Export image and legend, or download trees as Newick or

Multiple Sequence Alignment

Compute and visualize multiple sequence alignments together with derived consensus sequence and conservation score within ViPR. Perform custom alignments using the MUSCLE algorithm. Alignments can be saved to the ViPR WorkBench or downloaded in various formats.

Key Highlights:

• Align multiple virus sequences

• Visualize alignments; customize alignment display

• Save alignment to ViPR workbench


[Screenshot of the ViPR sequence search page, https://www.viprbrc.org/brc/vipr_genome_search.spg?method=ShowCleanSearch&decorator=flavi_hcv, printed 10/9/2018. Search form panels: Data to Return (Genome, Protein, Strain); Select Virus(es) to Include in Search; Complete Genome (Complete Genome Only); Collection Year (Start, End); Geographic Grouping; Country; Host Selection; Host Attributes (Host Gender: All, Male, Female); Sample Attributes (Sample Source); Virus Attributes (Subtype, Infection Type).]

Results matching your criteria: 542,434

Subtype 1: 7796 strains, 38 complete genomes
Subtype 1a: 28683 strains, 587 complete genomes
Subtype 1b: 28283 strains, 830 complete genomes
Subtype 1b/2k: 1 strain, 0 complete genomes
Subtype 1c: 85 strains, 19 complete genomes


Gene/Protein Search

Search for virus protein/gene and related information. You can search for the whole virus family or search for specified genus, species etc. You can also find your strain or genome record if you have its information, such as strain name, accession. Protein/Gene searches for Dengue virus or Hepatitis C virus can be augmented with clinical metadata criteria. Selecting the appropriate nodes in the taxonomy browser (Flavivirus, Dengue virus, Hepacivirus, Hepatitis C virus) will add metadata search panels and enable you to include these criteria. Some sequences have more metadata fields defined than others. Queries based on metadata only retrieve sequences for which those fields are defined.


Page 7

Improved Annotations in ViPR

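[Figure: chart comparing subtype annotations between ViPR and GenBank (GB), by number of sequences. Categories: Identical types between ViPR & GB; ViPR new annotations; GB unique annotations; ViPR improved type precision; ViPR lost type precision; Different types between ViPR & GB; Others. Count labels as extracted: 83603, 27336, 443760, 12498, 1319 2550. Percentage labels: 65%, 21%, 3%, 10%, 1%. Count axis: 20000 to 100000.]

Total sequences: 223,324
Sequences >= 400 nt: 128,815

Jul 30, 2018 release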
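As a quick arithmetic check, four of the count labels on this slide reproduce four of the percentage labels when divided by the number of sequences >= 400 nt. The sketch below assumes that denominator and uses only the counts that survived extraction unambiguously. Which category each count belongs to is not asserted here, and the variable names are illustrative.

```python
# Counts that are legible on the slide, and the ">= 400 nt" total.
TOTAL_GE_400_NT = 128_815
legible_counts = [83_603, 27_336, 12_498, 1_319]


def share_percent(count: int, total: int) -> int:
    """Share of the total, rounded to a whole percent."""
    return round(100 * count / total)


shares = [share_percent(c, TOTAL_GE_400_NT) for c in legible_counts]
print(shares)  # [65, 21, 10, 1]
```

The remaining 3% label cannot be checked this way because its count label is fused with neighbouring text in the extraction.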

Page 8

Genotype/Subtype Distribution

[Figure, first panel: log2 number of sequences per genotype/subtype, July 2018 release (axis 0.00 to 16.00).]

[Figure, second panel: percentage of sequences from each continent (Africa, Asia, Europe, North America, Oceania, South America) per genotype/subtype (axis 0% to 100%).]

Genotypes/subtypes shown in both panels: 1, 1a, 1b, 1c, 1e, 1g, 1h, 1l, 1m, 1n, 2, 2a, 2b, 2c, 2f, 2i, 2j, 2k, 2m, 2q, 3, 3a, 3b, 3g, 3h, 3i, 3k, 4, 4a, 4d, 4f, 4g, 4k, 4l, 4m, 4n, 4o, 4r, 4v, 4w, 5, 5a, 6, 6a, 6c, 6e, 6f, 6g, 6h, 6i, 6j, 6l, 6m, 6n, 6o, 6p, 6q, 6r, 6s, 6t, 6u, 6v, 6w, 6xa, 6xb, 6xd, 6xe, 6xf, 7, 7a

GT1 78%, GT3 11%

Page 9

Leveraging Annotation Results for Analysis

[Figure with groups labeled GT1, GT2, GT3, GT4, GT5, GT6.]

Page 10

Other Comparative Analysis Tools in ViPR

SEQUENCE FEATURE DEFINITION

Protein Name: NS5a
Sequence Feature Name: Hepatitis C Virus_NS5a_RAS_31(1)
Sequence Feature ID: Hepatitis C virus_NS5a_SF3
Reference Strain: H77-1a
Reference Sequence Accession: NC_004102
Reference Position: 31

SOURCE STRAIN(S)

Columns: Source Strain; VT Number; Source Position; Source Accession; 3D Protein Structure; Publication; Evidence Codes; Comment
Values as extracted: H77-1a; -N/A-; 31; NC_004102; 1CWX; EXP; L31F/M/V substitutions conferred resistance to NS5A inhibitor treatment for certain genotype infections. Source: HCV Guidance: Recommendations for Testing, Managing, and Treating Hepatitis C [http://www.hcvguidelines.org/print/92]

VARIANT TYPES

Strain Count   Variant Type   Phenotypic Variant Type   Sequence Variation (position 31)   Total Variation
11933          VT-1           No                        L                                  0
670            VT-2           Yes                       M                                  1
55             VT-3           Yes                       V                                  1
25             VT-4           No                        I                                  1
5              VT-5           No                        P                                  1
3              VT-6           No                        S                                  1
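The variant-type table for NS5a position 31 lends itself to a small worked example. The sketch below reads the flattened table on this slide as rows of (strain count, variant type, phenotypic flag, residue at position 31). That reading is an assumption about the column order, and the variable names are illustrative rather than part of any ViPR interface. It tallies how many strains carry a variant flagged as phenotypic.

```python
# Rows as read from the slide: (strain_count, variant_type, phenotypic, residue).
# The column order is an assumption recovered from the flattened table.
variant_types = [
    (11933, "VT-1", False, "L"),
    (670, "VT-2", True, "M"),
    (55, "VT-3", True, "V"),
    (25, "VT-4", False, "I"),
    (5, "VT-5", False, "P"),
    (3, "VT-6", False, "S"),
]

total_strains = sum(count for count, _, _, _ in variant_types)
phenotypic_strains = sum(count for count, _, flagged, _ in variant_types if flagged)
phenotypic_fraction = phenotypic_strains / total_strains

print(total_strains)                  # 12691
print(phenotypic_strains)             # 725
print(round(phenotypic_fraction, 3))  # 0.057
```

Under this reading, the two flagged residues (M and V) match two of the L31F/M/V substitutions named in the sequence feature comment.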

Page 11

Improving Typing Tool via Data Mining

Known limitations
– Reference tree defines subtype boundaries
– Limited gold standard annotations curated by experts

Big data
– Genome sequences: 223,324
– Fragments
– Highly similar
– Regional diversity between subtypes

[Figure: number of sequences (axis ticks 1000 to 4000) versus CD-HIT threshold (0.80, 0.90, 1.00) for CDS. Data labels as extracted: 2862, 1888, 1005, 235840.]

[Figure: a section of HCV reference alignment.]
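To illustrate why the number of retained sequences falls as the CD-HIT threshold is lowered, here is a toy greedy clustering in Python. It is not CD-HIT and does not reproduce its algorithm or its numbers. It assumes pre-aligned sequences of equal length and a simple per-position identity, and the names `identity` and `greedy_representatives` are hypothetical.

```python
from typing import List


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_representatives(seqs: List[str], threshold: float) -> List[str]:
    """Keep a sequence only if no earlier representative matches it at >= threshold."""
    representatives: List[str] = []
    for seq in seqs:
        if not any(identity(seq, rep) >= threshold for rep in representatives):
            representatives.append(seq)
    return representatives


# Made-up toy sequences, not HCV data.
toy = ["ACGTACGTAC", "ACGTACGTAA", "ACGTTTTTAC", "ACGTACGTAC", "ACGTACGTTT"]
for threshold in (0.80, 0.90, 1.00):
    print(threshold, len(greedy_representatives(toy, threshold)))
```

On the toy input, the representative count shrinks from 4 to 3 to 2 as the threshold drops from 1.00 to 0.90 to 0.80, the same direction of change the slide's chart plots for real data.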

Page 12

Expand the Reference Tree

Leverage subtype metadata in GenBank

[Figure: current reference tree, GT1b clade.]

[Figure: expanded reference tree, GT1b clade, with added leaves labeled GB-type_accession|new.]

Testing expanded reference tree

Testing data: 8186 sequences annotated as GT1 in ViPR/GT1b in GenBank

                                         Count   Fraction
Unique testing sequences                 6381    1.00
Typed as GT1b using the expanded tree    5699    0.89
Typed as GT1 using the expanded tree     668     0.10
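The fractions reported for the expanded reference tree follow directly from the counts on this slide. A minimal check is shown below. The variable names are illustrative, and the interpretation of the remainder as "sequences typed as neither GT1b nor GT1" is an assumption, since the slide does not break it down.

```python
# Counts from the slide's test of the expanded reference tree.
unique_testing = 6381
typed_gt1b = 5699
typed_gt1 = 668

fraction_gt1b = typed_gt1b / unique_testing
fraction_gt1 = typed_gt1 / unique_testing
# Assumed to be sequences in neither row; the slide does not describe them.
remainder = unique_testing - typed_gt1b - typed_gt1

print(round(fraction_gt1b, 2))  # 0.89
print(round(fraction_gt1, 2))   # 0.1
print(remainder)                # 14
```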

Page 13

Summary & Plan

• An automated HCV subtyping pipeline
– accurate
– efficient

• Comprehensive, improved subtype annotations in ViPR

• Many comparative genomic analysis tools in ViPR

• Future plan: Verify new candidate reference sequences

Page 14

Acknowledgements

Christian Zmasek
Richard Scheuermann

Sherry He
Christian Suloway
Jyothsna Reddy
Xiaomei Li
Sam Zaremba

Donald Smith

HHSN272201400028C

