Post on 03-Jan-2016
1
SNPTrack 1.1.0 Quick Start
National Center for Toxicological Research, U.S. Food and Drug Administration
3900 NCTR Road, Jefferson, AR 72079
2
SNPTrack Overview
Mission – To support the FDA’s research in pharmacogenetics and the regulation of personalized medicine (8 out of 40 VGDS submissions contain SNP data)
What is SNPTrack? – An integrated system for SNP (Single Nucleotide Polymorphism) and GWAS (Genome-Wide Association Study) data management, analysis, and interpretation.
3
SNPTrack - An integrated system for SNP data
• Oracle relational database hosting
– Phenotype data (study information): sample ID, sex, age, race, family history, etc., plus other demographic and phenotypic information
– Genotype data (genotype results): genotype calls from SNP chip assays or individual genotyping, usually generated by software provided by the manufacturer
– SNP panel: a list of the SNPs that are typed, or a SNP chip from Affymetrix or Illumina
  Affymetrix (SNP 6.0, SNP 5.0, Mapping 500K, Mapping 100K, Mapping 10K, DMET 2K)
  Illumina BeadChips (100K, 300K, 550K, and 1 million)
– SNP Library and other libraries from ArrayTrack
4
SNPTrack - An integrated system for SNP data
• Association analysis
– SNP filtering by Hardy-Weinberg equilibrium
– PLINK (allelic association, genotypic association, linear/logistic regression, etc.)
– Haploview (table view, correlation plot, LD plot, haplotype block view, combined p-value and FDR filtering, and combining and merging tables)
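To make the Hardy-Weinberg filtering step concrete, here is a minimal sketch of a textbook chi-square HWE test on genotype counts. The slides do not say which HWE test or threshold SNPTrack applies, so the function names, the 0.001 cutoff, and the example counts are all assumptions for illustration.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test (1 df) against Hardy-Weinberg proportions."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        # Monomorphic SNP: no deviation can be measured.
        return 0.0, 1.0
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def filter_by_hwe(genotype_counts, threshold=0.001):
    """Keep SNPs whose HWE p-value is at or above the (assumed) threshold."""
    return [
        snp
        for snp, counts in genotype_counts.items()
        if hwe_chi_square(*counts)[1] >= threshold
    ]


# Hypothetical genotype counts (AA, AB, BB) for two made-up SNPs.
counts = {"snp_in_hwe": (25, 50, 25), "snp_out_of_hwe": (50, 0, 50)}
kept = filter_by_hwe(counts)
print(kept)
```

HWE filtering is commonly applied to control samples only, so that a true disease association is not filtered out; whether SNPTrack does this is not stated in the slides.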
5
SNPTrack - An integrated system for SNP data
• Links from the SNP Library for data interpretation
– dbSNP, HapMap, Ensembl, and UCSC Browser
– Gene Library for extended annotation and links to other libraries and resources, such as the Protein Library, Orthologene Library, EntrezGene, GenBank, GeneCards, Swiss-Prot, OMIM, chromosomal map, etc.
– GOFFA and pathway analysis for gene ontology, pathway, and disease information
6
7
SNPTrack for Genetic Data Management, Analysis and Interpretation
[Diagram labels: Microarray DB; Libraries; SNP data analysis]
8
SNP DB
Data is organized in a tree structure:
Exp Owner
Exp Name
SNP List
Study Dataset
Input Data
Formatted Data
9
Exploring DB
SNP Panel (a list of the SNPs that are typed)
Phenotype (study information)
Genotype data (genotype results from SNP array or sequencing)
10
Exploring DB (creating a SNP Exp, data security, sharing, and data importing)
11
Searching SNP Library
12
Analysis Tools
13
Analysis Tools
Allelic Association
SNPs are taken one by one. For every SNP, a 2x2 contingency table is built by counting how many times each allele of the SNP appears in case and control samples.

            Minor Allele   Major Allele
Control     F_U            1 - F_U
Case        F_A            1 - F_A

Test results output
F_A: Minor allele frequency in cases
F_U: Minor allele frequency in controls
CHISQ: Allelic test chi-square
P: Fisher’s exact test
OR: Odds ratio
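The allelic test can be sketched in a few lines. This is an illustration, not SNPTrack's or PLINK's implementation: the allele counts are hypothetical, the function name is made up, and the p-value is the asymptotic chi-square p-value (1 df) rather than the Fisher's exact p-value listed in the output.

```python
import math


def allelic_association(case_minor, case_major, ctrl_minor, ctrl_major):
    """Allelic chi-square test on a 2x2 table of allele counts."""
    n_case = case_minor + case_major
    n_ctrl = ctrl_minor + ctrl_major
    n_minor = case_minor + ctrl_minor
    n_major = case_major + ctrl_major
    n = n_case + n_ctrl

    f_a = case_minor / n_case  # minor allele frequency in cases
    f_u = ctrl_minor / n_ctrl  # minor allele frequency in controls

    # Pearson chi-square for a 2x2 table, no continuity correction.
    cross = case_minor * ctrl_major - case_major * ctrl_minor
    chisq = n * cross ** 2 / (n_case * n_ctrl * n_minor * n_major)

    # Asymptotic p-value, chi-square with 1 degree of freedom.
    p = math.erfc(math.sqrt(chisq / 2.0))

    odds_ratio = (case_minor * ctrl_major) / (case_major * ctrl_minor)
    return {"F_A": f_a, "F_U": f_u, "CHISQ": chisq, "P": p, "OR": odds_ratio}


# Hypothetical counts: 100 cases and 100 controls, i.e. 200 alleles per group.
result = allelic_association(case_minor=60, case_major=140,
                             ctrl_minor=40, ctrl_major=160)
print(result)
```

Each diploid sample contributes two alleles, so the table totals are twice the number of samples.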
14
Analysis Tools
Genotypic Association Test Results (Full Model Association Tests)
Tests of association between a disease and a variant can go beyond the basic allelic test (which compares allele frequencies in cases versus controls). In addition to the basic allelic test:
• Cochran-Armitage trend test (additive)
• Genotypic (2 df) test (2x3 contingency table)
• Dominant gene action test
• Recessive gene action test
Results: ALLELIC, TREND, GENO, DOM, or REC test with CHISQ, DF, and corresponding P values.

        AA    AB    BB
Ctrl    27   101   142
Case    55   124    91
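As a worked illustration, the sketch below applies the genotypic (2 df) test and the Cochran-Armitage trend test to the 2x3 table on this slide. It uses textbook formulas without continuity correction; the function names and the additive weights (0, 1, 2) for AA, AB, BB are assumptions, and the slides do not show the values SNPTrack or PLINK report for this table.

```python
import math


def genotypic_test(case, ctrl):
    """Pearson chi-square on the 2x3 genotype table (2 degrees of freedom)."""
    col = [a + u for a, u in zip(case, ctrl)]
    n_case, n_ctrl = sum(case), sum(ctrl)
    n = n_case + n_ctrl
    chisq = 0.0
    for row, row_total in ((case, n_case), (ctrl, n_ctrl)):
        for observed, col_total in zip(row, col):
            expected = row_total * col_total / n
            chisq += (observed - expected) ** 2 / expected
    # For 2 df the chi-square survival function is exp(-x / 2).
    return chisq, 2, math.exp(-chisq / 2.0)


def trend_test(case, ctrl, weights=(0, 1, 2)):
    """Cochran-Armitage trend test written as a 1 df chi-square."""
    col = [a + u for a, u in zip(case, ctrl)]
    r = sum(case)
    n = r + sum(ctrl)
    s_wr = sum(w * a for w, a in zip(weights, case))
    s_wn = sum(w * c for w, c in zip(weights, col))
    s_w2n = sum(w * w * c for w, c in zip(weights, col))
    chisq = n * (n * s_wr - r * s_wn) ** 2 / (
        r * (n - r) * (n * s_w2n - s_wn ** 2)
    )
    return chisq, 1, math.erfc(math.sqrt(chisq / 2.0))


# Genotype counts (AA, AB, BB) from the table above.
ctrl = (27, 101, 142)
case = (55, 124, 91)

geno_chisq, geno_df, geno_p = genotypic_test(case, ctrl)
trend_chisq, trend_df, trend_p = trend_test(case, ctrl)
print("GENO ", geno_chisq, geno_df, geno_p)
print("TREND", trend_chisq, trend_df, trend_p)
```

The dominant and recessive tests follow the same pattern as a 2x2 chi-square after merging AB with one of the homozygous columns.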
15
Analysis Tools
Linear/Logistic Regression (Quantitative Trait Association)
Quantitative traits can also be tested for association, using either asymptotic (likelihood ratio test and Wald test) or empirical significance values. If the chosen phenotype is quantitative (i.e., contains values other than 1, 2, 0, or missing), PLINK automatically treats the analysis as a quantitative trait analysis. That is, the same command is used as for disease-trait association.
BETA: Regression coefficient
SE: Standard error
R2: Regression r-squared
T: Wald test (based on t-distribution)
P: Wald test asymptotic p-value
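The output columns can be illustrated with an ordinary least-squares fit of the trait on allele dosage (0, 1, or 2 copies of one allele). This is a minimal sketch under that additive-coding assumption, with made-up data and a hypothetical function name. The P column is left out because it needs the t-distribution CDF, which the Python standard library does not provide.

```python
import math


def quantitative_trait_assoc(dosages, trait):
    """Simple linear regression of a quantitative trait on allele dosage."""
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(trait) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    syy = sum((y - mean_y) ** 2 for y in trait)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, trait))

    beta = sxy / sxx                      # BETA: regression coefficient
    intercept = mean_y - beta * mean_x
    rss = sum((y - (intercept + beta * x)) ** 2
              for x, y in zip(dosages, trait))
    se = math.sqrt(rss / (n - 2) / sxx)   # SE: standard error of BETA
    r2 = 1.0 - rss / syy                  # R2: regression r-squared
    t = beta / se                         # T: Wald statistic, n - 2 df
    return {"BETA": beta, "SE": se, "R2": r2, "T": t}


# Made-up data: six samples, allele dosage and a quantitative phenotype.
dosages = [0, 0, 1, 1, 2, 2]
trait = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]

fit = quantitative_trait_assoc(dosages, trait)
print(fit)
```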
16
SNP List
Create Display Import Export Delete
17
SNP List – Venn Diagram
Common and unique SNPs
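The regions of a two-list Venn diagram correspond to plain set operations. The sketch below assumes each SNP list is a set of identifiers; the list contents and the function name are placeholders, not data from the study.

```python
def venn_regions(list_a, list_b):
    """Split two SNP lists into the three regions of a two-set Venn diagram."""
    a, b = set(list_a), set(list_b)
    return {
        "only_a": a - b,   # SNPs unique to the first list
        "common": a & b,   # SNPs present in both lists
        "only_b": b - a,   # SNPs unique to the second list
    }


# Placeholder SNP identifiers for illustration only.
list_a = ["snp_1", "snp_2", "snp_3"]
list_b = ["snp_2", "snp_3", "snp_4"]

regions = venn_regions(list_a, list_b)
print({name: sorted(snps) for name, snps in regions.items()})
```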
18
Study Case 1 - CFS (Chronic Fatigue Syndrome), in collaboration with CDC
This is part of the Wichita CFS Surveillance Study. Microarray data (MWG 20K Human) were generated from 167 participants, and genotype data were generated from 80 participants with the Affymetrix GeneChip (Mapping 100K). 62 participants are common to both platforms: 35 CFS and 27 non-CFS.
175 genes were identified as significant (P<0.05 and fold change >4), and 65 SNPs were identified as significant in allelic association tests (P<0.01). Two genes are common to both results (NPAS2: rs356653; GRIK2: rs2247215 and rs2247218).
19
LD plot (linkage disequilibrium) (they are interchangeable)
Gene GRIK2 (glutamate receptor, ionotropic, kainate), HapMap view: rs2247215 and rs2247218 are in an intron
20
UCSC browser on NPAS2_rs356653 (neuronal PAS domain protein 2)
(Raj did one SNP at a time for the 65 SNPs; SNPTrack shows a table of SNP information from UCSC with one click.)
21
UCSC browser on NPAS2_rs356653 (a closer view)
22
HapMap view, gene NPAS2: NPAS2_rs356653 is closer to an exon
LD plot (linkage disequilibrium)
23
NPAS2_rs356653 HapMap view
LD plot (linkage disequilibrium)
24
Study Case 2 - Parkinson’s Disease, SNPTrack live demo (Hon-Chung Fung et al., Laboratory of Neurogenetics, The Lancet Neurology, Vol. 6, No. 5, pp. 414-420, May 2007)
273 Parkinson’s disease samples and 275 controls, for a total of 548 samples. SNP 100K (Illumina Infinium I) and HumanHap 300K were used; a total of 400K SNPs were typed.
25
Interpretation
Link SNPs to various libraries for data interpretation
26
Interpretation (continued)
KEGG – Kyoto Encyclopedia of Genes and Genomes
http://www.genome.jp/kegg/
KEGG is a suite of databases and associated software. The KEGG Pathway database provides information on metabolic, regulatory, and disease pathways; most of them are metabolic pathways.
27
Interpretation (continued)
PathArt (Jubilant) – a pathway database
• Over 600 mammalian disease and signaling pathways
• The pathways are a collection of manually curated information from the literature and public-domain databases.

In ArrayTrack:
           Human   Rat   Mouse
KEGG       203     195   197
PathArt    587     151   297
28
Interpretation (continued): PathArt
Genes; Pathways; Physiology/disease; Statistical significance of the pathway
29
Accessing SNPTrack and Support
Currently, SNPTrack is available only within the FDA:
http://weblaunch.nctr.fda.gov/jnlp/arraytrack/snptrack.html
NCTRBioinformaticsSupport@nctr.fda.gov
SNPTrack is developed by the U.S. Food and Drug Administration, National Center for Toxicological Research (FDA/NCTR). FDA/NCTR reserves all rights to the software.
30
Thank you!
National Center for Toxicological Research, U.S. Food and Drug Administration