Date posted: 14-Jun-2015 | Category: Education | Uploaded by: dan-bolser
Potato SNPs
Dan Bolser and David Martin
Next Gen Bug, Dundee, 01/18/2010
Aims of the work
1) Learn about handling RNASeq: create a SNP calling pipeline
2) Select SNPs for genetic mapping using Illumina's GoldenGate SNP chip (OPA)
Creating a SNP calling pipeline
Align (using BWA)
1) Index the potato genome assembly
bwa index [-a bwtsw|div|is] [-c] <in.fasta>
2) Perform the alignment (bwa aln writes suffix-array coordinates, .sai, to standard output)
bwa aln [options] <in.fasta> <in.fq> > <in.sai>
3) Output results in SAM format (single end)
bwa samse <in.fasta> <in.sai> <in.fq> > <out.sam>
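To make the SAM output a little more concrete, here is a minimal Python sketch (not part of the original pipeline) that splits one SAM alignment line into the 11 mandatory columns defined by the SAM specification, decodes two FLAG bits (0x4 unmapped, 0x10 reverse strand) and works out where the alignment ends on the reference from its CIGAR string. The function names and the example read are hypothetical.

```python
import re

# The 11 mandatory SAM columns, in order (SAM specification).
SAM_FIELDS = ["qname", "flag", "rname", "pos", "mapq", "cigar",
              "rnext", "pnext", "tlen", "seq", "qual"]

# One CIGAR operation: a length followed by an operation code.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_sam_line(line):
    """Parse one SAM alignment line (not an @header line) into a dict."""
    cols = line.rstrip("\n").split("\t")
    rec = dict(zip(SAM_FIELDS, cols[:11]))
    for key in ("flag", "pos", "mapq", "pnext", "tlen"):
        rec[key] = int(rec[key])
    rec["tags"] = cols[11:]                      # optional TAG:TYPE:VALUE fields, left unparsed
    rec["unmapped"] = bool(rec["flag"] & 0x4)    # segment unmapped
    rec["reverse"] = bool(rec["flag"] & 0x10)    # SEQ is reverse complemented
    return rec


def reference_end(pos, cigar):
    """1-based inclusive end of a mapped alignment on the reference.

    Only M, D, N, = and X consume reference bases; I, S, H and P do not.
    """
    span = sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in "MDN=X")
    return pos + span - 1
```

For example, a read at POS 244520 with CIGAR `20M2D10M` spans 32 reference bases and therefore ends at 244551.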
Align (using Bowtie)
1) Index the potato genome assembly
bowtie-build [options] <in.fasta> <ebwt>
2) Perform the alignment and output results in SAM format (-S)
bowtie -S [options] <ebwt> <in.fq> > <out.sam>
Convert (using SAMtools)
1) Convert SAM to BAM for sorting
samtools view -S -b <in.sam> > <in.bam>
2) Sort BAM for SNP calling
samtools sort <in.bam> <out.bam.s>
Alignments are compressed (BAM) for long-term storage and sorted by coordinate for variant discovery.
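Coordinate sorting means ordering alignments by reference sequence (in the order the header lists them) and then by position, so that all reads covering a given base sit next to each other and a caller can stream through the file. A minimal Python sketch of that ordering on (rname, pos, qname) tuples, with unplaced unmapped reads (RNAME `*`) put last; the function name and the records are hypothetical:

```python
def coordinate_sort(records, ref_order):
    """Order (rname, pos, qname) tuples like a coordinate-sorted BAM.

    ref_order: reference names in header (@SQ) order. Records whose rname
    is not in the header, such as unmapped reads with rname '*', sort last.
    """
    rank = {name: i for i, name in enumerate(ref_order)}

    def key(rec):
        rname, pos, _qname = rec
        return (rank.get(rname, len(rank)), pos)

    return sorted(records, key=key)
```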
Coverage profiles / depth vectors
SAMtools: dump a coverage profile
samtools mpileup -f <in.fasta> <my.bam.s>
Example output (columns: sequence name, position, reference base, read depth, read bases, base qualities):
P1 244526 A 10 ...,.,,,.. BBQa`aaaa[
P1 244527 A 10 ...,.,,,.. BBZ_`^a_a[
P1 244528 C 10 .$.$.,.,,,.. >>RaZ`aaaa
P1 244529 C 8 .,.,,,.. NaXaaaa`
P1 244530 T 8 .,.,,,.. Xa\_aaa`
P1 244531 C 8 .,.,,,.. Rb\abbaa
P1 244532 T 9 .,.,,,..^~. EE^^^^^^A
P1 244533 T 9 .,.,,,... BB\\\\\\B
P1 244534 T 9 .$,$.,,,... @@^^^^^^E
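The read-bases column of pileup output uses a small set of symbols: `.` and `,` match the reference on the forward and reverse strand, `ACGTN`/`acgtn` are mismatches, `^` marks a read start and is followed by one mapping-quality character, `$` marks a read end, and `+N`/`-N` followed by N bases is an indel. A minimal Python sketch that tallies those symbols for one line (function names are hypothetical; the test input is taken from the output above):

```python
import re


def parse_pileup_line(line):
    """Split one pileup text line into its six columns."""
    chrom, pos, ref, depth, bases, quals = line.split()
    return chrom, int(pos), ref.upper(), int(depth), bases, quals


def count_read_bases(bases):
    """Tally the read-bases column of a pileup line."""
    counts = {"match": 0, "mismatch": {}, "starts": 0, "ends": 0, "indels": 0}
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":                  # read start; the next char is its mapping quality
            counts["starts"] += 1
            i += 2
            continue
        if c in "+-":                 # indel: +N<seq> or -N<seq>, skip the whole token
            digits = re.match(r"\d+", bases[i + 1:]).group()
            counts["indels"] += 1
            i += 1 + len(digits) + int(digits)
            continue
        if c == "$":                  # read end
            counts["ends"] += 1
        elif c in ".,":               # matches the reference (forward / reverse strand)
            counts["match"] += 1
        elif c.upper() in "ACGTN":    # mismatch (upper case forward, lower case reverse)
            b = c.upper()
            counts["mismatch"][b] = counts["mismatch"].get(b, 0) + 1
        # anything else (e.g. '*' for a deleted base) is ignored in this sketch
        i += 1
    return counts
```

For position 244532 this reports 9 reference matches, one of them from a read starting there, which agrees with the depth column.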
SAMtools / Bio::DB::Sam (BioPerl): dump a coverage profile (2)
P41630
Matches: 9
0 2 3 3 3 3 3 3 3 3 3 3 3 3 4 5 5 5 5 5 5 5 5 5 5 6 6 6 7 7 8 8 8 8 8 8 8 8 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 7 6 6 6 6 6 6 6 6 6 6 6 6 5 4 4 4 4 4 4 4 4 4 4 3 3 3 2 2 1 1 1 1 1 1 1 1 0 0 0
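A depth vector like the one above can be computed from alignment start and end positions alone. The Python sketch below uses a difference array (+1 where a read starts, -1 just after it ends, then a running sum); it is an illustration of the idea, not the Bio::DB::Sam API used on the slide, and the function name is hypothetical.

```python
def depth_vector(length, alignments):
    """Per-base read depth for a reference sequence of `length` bases.

    alignments: (start, end) pairs, 1-based and inclusive, e.g. SAM POS and
    the alignment's reference end. Intervals are clipped to the sequence.
    """
    diff = [0] * (length + 1)
    for start, end in alignments:
        start, end = max(start, 1), min(end, length)
        if start > end:
            continue
        diff[start - 1] += 1     # depth rises where the read starts
        diff[end] -= 1           # and falls just after it ends
    depth, running = [], 0
    for i in range(length):
        running += diff[i]
        depth.append(running)
    return depth
```

One caveat for RNASeq: a spliced alignment (CIGAR `N`) treated as a single interval would also count its intron as covered, so such reads would need to be split into their aligned blocks first.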
mpileup
samtools mpileup collects summary information from the input BAMs, computes the likelihood of the data given each possible genotype and stores the likelihoods in BCF format.
bcftools view applies the prior and does the actual calling.
Finally, we filter.
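The split described above (likelihoods first, then a prior) can be illustrated with a deliberately simplified diploid model: each base is wrong with probability 10^(-Q/10), errors are spread evenly over the other three bases, reads are independent, and a genotype's likelihood per read is the average over its two alleles. This is a teaching sketch, not the model samtools implements, and the function names and prior values are hypothetical.

```python
import math


def base_prob(obs, allele, qual):
    """P(observed base | true allele) under a simple Phred error model."""
    err = 10 ** (-qual / 10)
    return 1 - err if obs == allele else err / 3


def genotype_log10_likelihoods(reads, ref, alt):
    """log10 P(reads | genotype) for a diploid site, assuming independent reads.

    reads: list of (base, phred_quality) tuples.
    """
    genotypes = {"RR": (ref, ref), "RA": (ref, alt), "AA": (alt, alt)}
    return {
        name: sum(math.log10(0.5 * base_prob(b, a1, q) + 0.5 * base_prob(b, a2, q))
                  for b, q in reads)
        for name, (a1, a2) in genotypes.items()
    }


def genotype_posteriors(log10_lik, prior):
    """Apply a genotype prior and normalize; scaling by the maximum avoids underflow."""
    top = max(log10_lik.values())
    weights = {g: 10 ** (ll - top) * prior[g] for g, ll in log10_lik.items()}
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}
```

With a hypothetical prior of 0.998 / 0.001 / 0.001 for RR / RA / AA, five Q30 `A` reads and five Q30 `G` reads still come out heterozygous, while eight `A` reads come out homozygous reference.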
SNP call
1) Index the potato genome assembly (again!)
samtools faidx in.fasta
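2) Run 'mpileup' to compute genotype likelihoods in BCF format (-g: BCF output; -u: uncompressed)
samtools mpileup -ug -f in.fasta my1.bam.s my2.bam.s > my.raw.bcf
Actually, all we have done so far is convert the alignments (BAM) into per-site genotype likelihoods (BCF); no SNPs have been called yet.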
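samtools faidx writes a .fai index with five columns per sequence: name, length, byte offset of the first base, bases per line and bytes per line. Those numbers let any region be fetched by arithmetic instead of scanning the file. A minimal Python sketch of building and using such an index in memory; the function names and the toy FASTA are hypothetical, and it assumes, as faidx requires, that all lines of a sequence except the last have the same length.

```python
def build_fai(data):
    """Index FASTA bytes: [(name, length, offset, linebases, linewidth)]."""
    entries, cur, offset = [], None, 0
    for line in data.splitlines(keepends=True):
        if line.startswith(b">"):
            if cur:
                entries.append(tuple(cur))
            name = line[1:].split()[0].decode()       # name = first word of the header
            cur = [name, 0, offset + len(line), 0, 0]
        elif cur is not None:
            bases = len(line.rstrip(b"\r\n"))
            if cur[3] == 0:                           # first sequence line sets the layout
                cur[3], cur[4] = bases, len(line)
            cur[1] += bases
        offset += len(line)
    if cur:
        entries.append(tuple(cur))
    return entries


def fetch(data, entry, start, end):
    """Return bases start..end (1-based, inclusive) by offset arithmetic."""
    _name, length, off, linebases, linewidth = entry
    out = bytearray()
    for p in range(start - 1, min(end, length)):      # p is a 0-based base index
        byte = off + (p // linebases) * linewidth + p % linebases
        out += data[byte:byte + 1]
    return out.decode()
```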
VCF format
A standard format for sequence variation: SNPs, indels and structural variants.
Can be compressed (bgzip) and indexed (tabix).
Developed for the 1000 Genomes Project.
VCFtools is to VCF what SAMtools is to SAM.
Specification and tools available from http://vcftools.sourceforge.net
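A VCF data line has eight fixed tab-separated columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO), optionally followed by FORMAT and one column per sample. A minimal Python sketch that parses the fixed columns; the function name and the example record are hypothetical, and for real work VCFtools or an established parser is the safer choice.

```python
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_vcf_line(line):
    """Parse the eight fixed columns of a VCF data line ('#' header lines excluded)."""
    cols = line.rstrip("\n").split("\t")
    rec = dict(zip(VCF_COLUMNS, cols[:8]))
    rec["POS"] = int(rec["POS"])
    rec["ALT"] = rec["ALT"].split(",")                 # several alternate alleles allowed
    rec["QUAL"] = None if rec["QUAL"] == "." else float(rec["QUAL"])
    info = {}
    if rec["INFO"] != ".":
        for item in rec["INFO"].split(";"):
            key, sep, value = item.partition("=")
            info[key] = value if sep else True         # flags carry no '=value'
    rec["INFO"] = info
    rec["samples"] = cols[8:]                          # FORMAT and sample columns, unparsed
    return rec
```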
SNP call and filter
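1) Call SNPs (-b: BCF output; -v: variant sites only; -c: call SNPs; -g: call genotypes)
bcftools view -bvcg my.raw.bcf > my.var.bcf
2) Filter SNPs (bcftools view decodes the BCF to VCF text, which varFilter reads from standard input)
bcftools view my.var.bcf | vcfutils.pl varFilter > my.var.filt.vcf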
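vcfutils.pl varFilter applies its own set of filters. To show only the general idea of site filtering, here is a hypothetical Python filter that passes header lines through and keeps VCF data lines whose QUAL and INFO/DP fall within made-up thresholds; it does not reproduce varFilter's rules or defaults.

```python
def passes_filter(line, min_qual=20.0, min_depth=3, max_depth=1000):
    """Hypothetical site filter on QUAL and INFO/DP (thresholds are illustrative)."""
    cols = line.rstrip("\n").split("\t")
    qual = cols[5]
    if qual == "." or float(qual) < min_qual:
        return False
    info = dict(item.split("=", 1) for item in cols[7].split(";") if "=" in item)
    if "DP" not in info:
        return False
    return min_depth <= int(info["DP"]) <= max_depth


def filter_vcf(lines, **thresholds):
    """Yield header lines unchanged and data lines that pass the filter."""
    for line in lines:
        if line.startswith("#") or passes_filter(line, **thresholds):
            yield line
```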
Selecting SNPs for genetic mapping using Illumina's GoldenGate SNP chip (OPA)
SNP chip (OPA) construction
A set of DM SNP positions was provided by the SolCAP project (RNASeq-derived).
A subset was selected for developing OPAs (Illumina’s SNP chip technology).
OPAs were run, and their results have now been compared with the RNASeq SNP calls.
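One simple way to express such a comparison is genotype concordance at the positions both methods called. A minimal Python sketch, assuming (hypothetically) that each set of calls is a dict from (sequence, position) to a two-letter genotype string; the data and function names are made up for illustration and are not the comparison actually run here.

```python
def normalize(gt):
    """Treat genotypes as unordered: 'GA' and 'AG' are the same call."""
    return "".join(sorted(gt.upper()))


def concordance(opa_calls, rnaseq_calls):
    """Compare two {position: genotype} maps at the positions they share.

    Returns (n_agree, n_shared, sorted list of discordant positions).
    """
    shared = sorted(set(opa_calls) & set(rnaseq_calls))
    discordant = [p for p in shared
                  if normalize(opa_calls[p]) != normalize(rnaseq_calls[p])]
    return len(shared) - len(discordant), len(shared), discordant
```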
Comparison (using an early SAMtools)
Comparison (using new SAMtools)
Looking into the RNASeq data…
[Figure panels: Potato genome assembly; RNASeq read library]
A lot more questions to answer…
Track down more ‘strange’ SNPs based on the expected allele frequency spectrum (AFS) of the two samples.
Go beyond biallelic SNPs.
Check the OPA base... Was the right base probed by the chip?
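A first pass at both questions could look at per-site allele fractions from the read counts. The Python sketch below flags sites with more than two alleles and sites whose non-reference fraction is far from every expected value. The expected fractions (0, 0.5, 1, as for a diploid), the tolerance and the noise cut-off are all hypothetical settings; the real expectation depends on the ploidy and genotype of the two samples (a tetraploid, for instance, would add 0.25 and 0.75).

```python
def allele_fractions(counts):
    """{base: read count} at one site -> {base: fraction of reads}."""
    total = sum(counts.values())
    return {b: n / total for b, n in counts.items()} if total else {}


def classify_site(counts, ref, expected=(0.0, 0.5, 1.0), tol=0.15, min_frac=0.05):
    """Label a site 'no data', 'multiallelic', 'unexpected' or 'ok'.

    expected: hypothetical acceptable non-reference allele fractions.
    tol: how far from an expected value is still accepted.
    min_frac: minimum fraction for a base to count as an allele, not noise.
    """
    fractions = allele_fractions(counts)
    if not fractions:
        return "no data"
    alleles = [b for b, f in fractions.items() if f >= min_frac]
    if len(alleles) > 2:
        return "multiallelic"
    non_ref = 1.0 - fractions.get(ref, 0.0)
    if min(abs(non_ref - e) for e in expected) > tol:
        return "unexpected"
    return "ok"
```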
Thank you for your patience!
OPAs in 5 steps...
The DNA sample is activated for binding to paramagnetic particles.
Three oligos are designed for each SNP locus: two allele-specific oligos (ASOs), one for each allele of the SNP, and one locus-specific oligo (LSO).
Several wash steps remove excess and mis-hybridized oligos.
Extension of the appropriate ASO and ligation to the LSO joins information about the genotype to the address sequence on the LSO.
The single-stranded, dye-labeled DNAs are hybridized to their complement bead type through their unique address sequences.
Key to the assay:
Scalable, multiplexed sample preparation (one-tube reaction).
Highly parallel array-based read-out.
High-quality data: average call rates above 99%.