Effect of Single Nucleotide Polymorphism in Affymetrix probes
Olivia Sanchez-Graillet
Departments of Biological Sciences and Mathematical Sciences
University of Essex (UK)
December 2008
Single Nucleotide Polymorphisms (SNPs)
SNPs: a single base pair differs between one individual and another.
Polymorphism: a variation for which at least two variants have frequencies > 1% in a population.
SNPs are the most common type of sequence variation between individuals.
SNPs are markers of phenotypes and diseases.
SNPs may alter gene expression and may or may not change the amino acid sequence.
Other common variations:
DIP: deletion/insertion polymorphism, e.g. -/T, C/-
STR: short tandem repeat (microsatellite) polymorphism, e.g. (CA)19/20/21/22/23/24/25/26
MIXED: cluster containing submissions from 2 or more allelic classes, e.g. -/AAA/AAAAA/AAAACCAAAAAAAAAAAAAAA
MNP: multiple nucleotide polymorphism with alleles of common length > 1, e.g. AAA/CCC
We are studying the relationships between probe intensities on Affymetrix GeneChips.
Affymetrix GeneChips contain thousands of probes.
Probes map to different exons. Because of alternative splicing, some of the exons may be upregulated whereas others may be downregulated. We therefore focus on probes within exons.
Probes mapping to the same exon should behave similarly.
What causes Affymetrix probes to behave as outliers with respect to other probes within a single exon?
Objective: Study the impact of SNPs and other common variations upon Affymetrix probes on GeneChips. Explore whether the existence of a SNP causes a probe to behave differently from other probes which map uniquely to a single exon.
Previous research on how SNPs might affect gene expression:
Allele A is over-expressed compared to allele B, or vice versa, or both alleles are equally expressed (Kumari et al., 2007).
Hybridization differences resulting from sequence variation might mislead the interpretation of data from individual genes, even if only a single probe is affected (Alberts et al., 2007).
In 15 of 25 probesets, SNPs caused a difference in hybridization. Not every SNP causes a difference in hybridization (Alberts et al., 2007).
When a SNP is located at the very beginning or end of a probe, it might have little or no effect on hybridization (Hughes et al., 2001).
Method:
A) Generation of exon heatmaps
B) Identification of probes containing SNPs.
C) Study of SNP-probes which are outliers.
1. CEL files are downloaded from the GEO database.
2. Calibration of microarray data:
Quality control: detection of spatial flaws.
Row Quantile Normalisation.
3. Correlate the intensities for groups of probes, using many thousands of GeneChip experiments.
(A) Generation of exon heatmaps
Example flaw in CEL file
W. B. Langdon et al. (2008). A Survey of Spatial Defects in Homo Sapiens Affymetrix GeneChips. In IEEE/ACM Transactions on Computational Biology and Bioinformatics.
Probe correlations
The correlation in log intensities between Probe 9 and Probe 11 from probeset 208772_at, obtained from 5,638 HG-U133A GeneChips.
The number in each square is the correlation multiplied by 10 and rounded.
Blue = low correlation
Yellow = high correlation
[Heatmap annotations: average intensity in GEO; relative probe position on exon; standard deviation in GEO; probe number on heatmap]
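The heatmap cells described above can be sketched as follows. This is only an illustration: the slides do not state the correlation type or the log base, so Pearson correlation and log2 are assumptions, and the function name heatmap_cells and the simulated intensities are hypothetical.

```python
import numpy as np

def heatmap_cells(intensities):
    """Return (correlation matrix, cell labels) for one exon.

    intensities: array of shape (n_chips, n_probes) with positive raw values.
    Assumption: Pearson correlation of log2 intensities.
    """
    log_int = np.log2(intensities)
    corr = np.corrcoef(log_int, rowvar=False)   # probes are the columns
    cells = np.rint(corr * 10).astype(int)      # correlation x 10, rounded
    return corr, cells

# Simulated data (not from GEO): three probes share an exon-level signal,
# the fourth probe is unrelated to the others.
rng = np.random.default_rng(0)
n_chips = 500
signal = rng.normal(8.0, 1.0, size=n_chips)
log_values = signal[:, None] + rng.normal(0.0, 0.3, size=(n_chips, 4))
log_values[:, 3] = rng.normal(8.0, 1.0, size=n_chips)

corr, cells = heatmap_cells(2.0 ** log_values)
print(cells)
```

In this toy example the first three probes get high cell values against each other, while the fourth probe gets values near zero against all of them, which is the pattern an outlier probe would show on the heatmap.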
4. Unique mappings (alignments) of probes to individual exons (Sanchez-Graillet et al., 2008. Widespread existence of uncorrelated probe intensities from within the same probeset on Affymetrix GeneChips. In Journal of Integrative Bioinformatics, 5(2):98):
Avoid cross-hybridization and multiple targeting.
Sense direction only (antisense is avoided).
[Diagram: example probe alignments]
probe 1: exon 1, transcript 1 (25 bases, 100% identity)
probe 2: exon 2, transcript 2 (25 bases, 100% identity)
probe 2: exon 3, transcript 3 (25 bases, 96% identity) X
(B,C) Identification of probes containing SNPs and outlier
SNP probes
1. SNP data downloaded from Ensembl 48: 3' untranslated region (UTR), 5' UTR, and coding regions.
[Diagram: chromosome 10, showing Gene 1 and Gene 2 with transcripts 1 to 3, and the regions 5' upstream, 3' UTR and 3' downstream]

snp_id      chrom_name  chrom_position  allele  biotype      trans_id         gene_id
rs11000776  10          75213225        G/A     3downstream  ENST00000372837  ENSG00000172586
rs11000776  10          75213225        G/A     3utr         ENST00000372833  ENSG00000172586
rs11000776  10          75213225        G/A     5upstream    ENST00000391642  ENSG00000212959
2. Identification of exons with SNPs by using transcript information and chromosomal positions.
3. Selection of unique exons and probes:
Only unique exons with more than 4 probes.
SNP positions on the probes uniquely
mapping to exons are obtained.
4. Identification of SNP-probes which are outliers:
The overall correlation matrix median (OMM) is compared with each SNP-probe median (SPM).
A SNP-probe is an outlier if OMM - SPM >= 0.15.
Example (OMM = 0.87):
Probe   SPM    Difference   Result
SPM_8   0.84   0.03 < 0.15  SNP in a non-outlier probe
SPM_9   0.21   0.66 > 0.15  SNP in an outlier probe
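A minimal sketch of the outlier test, assuming that OMM is the median of all off-diagonal entries of an exon's probe correlation matrix and that SPM is the median of one probe's correlations with the other probes. The slides do not spell out these definitions, so they are assumptions, and the function names and the 5-probe matrix are hypothetical. Only the 0.15 threshold and the 0.87 / 0.84 / 0.21 values come from the slide.

```python
import numpy as np

THRESHOLD = 0.15  # threshold given on the slide

def overall_median(corr):
    """OMM (assumed): median over the upper triangle, diagonal excluded."""
    upper = np.triu_indices_from(corr, k=1)
    return float(np.median(corr[upper]))

def probe_median(corr, probe):
    """SPM (assumed): median of one probe's correlations with the others."""
    return float(np.median(np.delete(corr[probe], probe)))

def is_outlier(omm, spm, threshold=THRESHOLD):
    """A SNP-probe is flagged when OMM - SPM >= threshold."""
    return (omm - spm) >= threshold

# Values from the slide example.
print(is_outlier(0.87, 0.84))  # False
print(is_outlier(0.87, 0.21))  # True

# Hypothetical exon with 5 probes; probe 4 correlates poorly with the rest.
corr = np.full((5, 5), 0.9)
corr[4, :] = corr[:, 4] = 0.2
np.fill_diagonal(corr, 1.0)

omm = overall_median(corr)
flags = [is_outlier(omm, probe_median(corr, p)) for p in range(5)]
print(omm, flags)
```

Using medians rather than means keeps a single poorly correlated probe from pulling the overall value down, so the comparison stays sensitive to that probe.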
Results
ENSE00001454795 (HG_U133_Plus_2)
[Heatmap; probe labels: O, N, N, N, N]
ENSE00001191156 (HG_U95A)
SNP in overlapping probes.
The same SNP is in outlier probes and non-outlier probes.

heatmap_position  probe_id           snp_id      snp_position  allele  sequence                   outlier
10                1045_s_at-109-625  rs45612038  14            T/C     CTTCAAGAGCATCATGAAGAAGAGT  O
9                 1045_s_at-237-557  rs45612038  16            T/C     ACCTTCAAGAGCATCATGAAGAAGA  O
8                 1045_s_at-357-497  rs45612038  18            T/C     AGACCTTCAAGAGCATCATGAAGAA  N
7                 1045_s_at-586-137  rs45612038  20            T/C     TGAGACCTTCAAGAGCATCATGAAG  N
6                 1045_s_at-233-503  rs45612038  23            T/C     ATATGAGACCTTCAAGAGCATCATG  N
5                 1045_s_at-153-611  rs45612038  25            T/C     ACATATGAGACCTTCAAGAGCATCA  N
ENSE0000129003 (HG_U133A)
SNPs in only non-outlier probes

snp_id     probe_id             probe_position_heatmap  snp_position_probe  allele  sequence                   outlier
rs11038    221667_s_at-512-441  10                      13                  A/G     GTTTATGATCTGACCTAGGTCCCCC  N
rs6413487  221667_s_at-570-641  9                       7                   C/G     TAAGGACGCTGGGAGCCTGTCAGTT  N
ENSE00001416163, HG_U133A (5,374 CEL files)
SNP in only outlier probes

snp_id   probe_id           probe_position_heatmap  snp_position_probe  allele  sequence                   outlier
rs13505  219768_at-2-233    8                       24                  C/A     CTGAATTTAGATCTCCAGACCCTGC  O
rs13505  219768_at-602-267  9                       4                   C/A     CCTGCCTGGCCACAATTCAAATTAA  O
ENSE00001416163, HG_U133_Plus_2 (2,572 CEL files)
SNP in both outlier and non-outlier probes

snp_id   probe_id           probe_position_heatmap  snp_position_probe  allele  sequence                   outlier
rs13505  219768_at-765-395  8                       24                  C/A     CTGAATTTAGATCTCCAGACCCTGC  N
rs13505  219768_at-507-443  9                       4                   C/A     CCTGCCTGGCCACAATTCAAATTAA  O
ENSE00001416163, HG_U133A_2 (159 CEL files)
SNP in only non-outlier probes

snp_id   probe_id           probe_position_heatmap  snp_position_probe  allele  sequence                   outlier
rs13505  219768_at-432-225  8                       24                  C/A     CTGAATTTAGATCTCCAGACCCTGC  N
rs13505  219768_at-534-259  9                       4                   C/A     CCTGCCTGGCCACAATTCAAATTAA  N
~60,000 SNPs distributed in unique exons of ten array designs:
11% are in unique exons in which all probes that contain the same SNP are outliers.
5% are in unique exons in which only some of the probes containing the same SNP are outliers.
84% are in unique exons in which none of the probes containing the SNP are outliers.
These numbers may vary according to the Ensembl
version used and the threshold for outliers chosen.
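The three categories above can be illustrated with a short sketch that groups probes by SNP and checks their outlier flags. The function name classify_snps and the category labels are hypothetical. The example rows reuse flags shown on the results slides, pooled from different exons and arrays purely for illustration.

```python
from collections import defaultdict

def classify_snps(rows):
    """Group (snp_id, flag) rows by SNP; flag is 'O' (outlier) or 'N'."""
    flags = defaultdict(list)
    for snp_id, flag in rows:
        flags[snp_id].append(flag == "O")
    categories = {}
    for snp_id, is_out in flags.items():
        if all(is_out):
            categories[snp_id] = "all probes are outliers"
        elif any(is_out):
            categories[snp_id] = "some probes are outliers"
        else:
            categories[snp_id] = "no probes are outliers"
    return categories

rows = [
    # rs45612038 on HG_U95A: two outlier and four non-outlier probes
    ("rs45612038", "O"), ("rs45612038", "O"), ("rs45612038", "N"),
    ("rs45612038", "N"), ("rs45612038", "N"), ("rs45612038", "N"),
    # rs13505 on HG_U133A: both probes are outliers
    ("rs13505", "O"), ("rs13505", "O"),
    # rs11038 on HG_U133A: a single non-outlier probe
    ("rs11038", "N"),
]

for snp_id, category in classify_snps(rows).items():
    print(snp_id, "->", category)
```

In a real analysis the grouping key would presumably also include the exon and the array design, since the same SNP can fall in different categories on different arrays, as rs13505 does on the results slides.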
Cross-validation for HG_U133_Plus_2
Examination of SNP-Outlier Associations

           Outlier (Yes)      Outlier (No)        Total
SNP (Yes)  11.4% (n=1,788)    88.6% (n=13,869)    100%
SNP (No)   11.6% (n=17,231)   88.4% (n=131,035)   100%

Phi = -0.002
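The phi coefficient can be recomputed from the four counts in the table. The sketch below uses the standard formula for a 2x2 contingency table, phi = (ad - bc) / sqrt((a+b)(c+d)(a+c)(b+d)). The function name is hypothetical and this is not necessarily the software used for the slide.

```python
from math import sqrt

def phi_coefficient(a, b, c, d):
    """Phi for the 2x2 table [[a, b], [c, d]].

    a: SNP and outlier        b: SNP and not outlier
    c: no SNP and outlier     d: no SNP and not outlier
    """
    numerator = a * d - b * c
    denominator = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return numerator / denominator

# Counts from the cross-validation table for HG_U133_Plus_2.
a, b, c, d = 1788, 13869, 17231, 131035

phi = phi_coefficient(a, b, c, d)
print(round(phi, 3))                  # -0.002
print(round(100 * a / (a + b), 1))    # 11.4 (% of SNP probes that are outliers)
print(round(100 * c / (c + d), 1))    # 11.6 (% of non-SNP probes that are outliers)
```

A phi this close to zero means that the outlier rate is almost the same for probes with and without a SNP.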
Median differences and positions of SNPs on probes in HG_U133_Plus_2
Median differences and main alleles (A,C,T,G) found in SNPs in HG_U133_Plus_2
We have identified other causes of outlier probes:
Probes containing a contiguous run of 4 or more guanines: formation of G-quadruplexes occurring on the surface of a GeneChip. (Upton et al., BMC Genomics (in press)).
Probes located next to bright probes, such as at the edge of the GeneChip, are affected by blur.
Motifs or any other “problematic” subsequences.
Outlier SNP-probes in HG_U133_Plus_2 with “problematic” subsequences (PS): runs of G (>=4), CCTCC, CCACC, GGTGG
[Pie charts: With PS / Without PS. Values: 11%, 89%; 40%, 60%. Panel labels: Outlier probes, No-outlier probes. Legend: Gs, CCTCC, CCACC, GGTGG]
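A check for the “problematic” subsequences listed above can be sketched with regular expressions. The motif list comes from the slide. The function name problematic_subsequences and the first test sequence are hypothetical, and the second sequence is the HG_U95A probe from the results table.

```python
import re

# Motifs named on the slide: runs of 4 or more Gs, CCTCC, CCACC, GGTGG.
PATTERNS = {
    "G-run (>=4)": re.compile(r"G{4,}"),
    "CCTCC": re.compile(r"CCTCC"),
    "CCACC": re.compile(r"CCACC"),
    "GGTGG": re.compile(r"GGTGG"),
}

def problematic_subsequences(sequence):
    """Return the names of the motifs found in a probe sequence."""
    seq = sequence.upper().replace(" ", "")
    return [name for name, pattern in PATTERNS.items() if pattern.search(seq)]

# Hypothetical 25-mer containing a G-run and CCTCC.
print(problematic_subsequences("ACGTGGGGTACCTCCATGCATGCAT"))
# Probe sequence from the HG_U95A results table: no motif present.
print(problematic_subsequences("CTTCAAGAGCATCATGAAGAAGAGT"))
```

Counting probes with a non-empty result, separately for outlier and non-outlier SNP-probes, would give With PS / Without PS proportions of the kind shown in the pie charts.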
Conclusions
We have not found a common behaviour when SNPs are present in a probe.
SNPs do not seem to cause outliers in groups of probes representing individual exons.
SNPs may influence other biological events, such as alternative polyadenylation (poly(A)).
The genomic region where SNPs are found, the position of the SNP in a probe, the main allele, and the number of SNPs in a probe do not make a probe an outlier in the correlation heatmap.
Bioinformatics Group
Dr Andrew Harrison, Physics
Dr Berthold Lausen, Statistics
Dr Abdel Salhi, Mathematics
Professor Graham Upton, Statistics
Dr William Langdon, Physics and Computer Sc.
Dr Olivia Sanchez, Computer Sc.
Dr Maria Stalteri, Inorganic Chemistry & Bioinformatics
Jose Arteaga-Salas, Statistics
Rohmatul Fajriyah, Statistics
Abdelhak Kheniche, Pharmacology & Mathematics
Rahim Bux Khokhar, Mathematics
Zain-Ul-Abdin Khurho, Mathematics
Farhat Memon, Computer Sc.
Joanna Rowsell, Mathematics
Thank you!
Adjacent probes within a cell on a GeneChip have the same sequence, so a run of guanines will result in closely packed DNA with just the right properties to form quadruplexes.