SW-ARRAY: a dynamic programming solution for the identification of copy-number
changes in genomic DNA using array comparative genome
hybridization data
Motivation
• Chromosomal changes cause genetic diseases
  – Aneusomies
    • Easy to detect
  – Copy number changes of genes
    • Not so easy to detect
Array CGH
• Comparative Genome Hybridization (CGH) applied to DNA microarrays
• Method for detecting copy number changes
  – Data analyzed using thresholds
  – Not reliable for detecting single-copy gains or losses when using large insert clones as probes
  – High rates of false positives and false negatives
  – Inconsistent for probes from different chromosomal regions
• Cannot be used for clinical diagnostic applications!
Data Adjustment
• Normalization and correction
  – Reason: variations between probes
  – Control vs. control data ratio
    • Find mean and SD
  – Divide control vs. test ratios by that mean
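The adjustment above can be sketched in a few lines. This is only an illustration under stated assumptions: the slide does not say how the control data are organized, so the sketch assumes several control vs. control runs, each a list of per-probe ratios in the same probe order. The names `probe_stats` and `normalize` are hypothetical.

```python
from statistics import mean, stdev

def probe_stats(control_runs):
    """Per-probe mean and SD of control vs. control ratios.

    control_runs: list of runs, each a list of ratios in the same probe order
    (an assumed layout; the slides do not specify one).
    """
    per_probe = list(zip(*control_runs))  # regroup values probe by probe
    means = [mean(values) for values in per_probe]
    sds = [stdev(values) for values in per_probe]
    return means, sds

def normalize(test_ratios, probe_means):
    """Divide each control vs. test ratio by that probe's control mean."""
    return [r / m for r, m in zip(test_ratios, probe_means)]

control_runs = [[1.0, 0.9, 1.2], [1.0, 1.1, 1.2]]
means, sds = probe_stats(control_runs)
normalized = normalize([0.5, 1.0, 1.8], means)
```

A probe that reads consistently high in control vs. control experiments is scaled down by the same factor in the test data, which is the stated reason for the correction.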
Threshold method
• Compare each data point from the control vs. test experiment to threshold values
  – Below 0.8 = deletion
  – Above 1.2 = polysomy
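As a minimal sketch, the threshold rule amounts to a per-probe comparison. The cutoffs 0.8 and 1.2 come from the slide; the function name `classify` and the label "normal" for in-between values are assumptions made for illustration.

```python
def classify(ratio, low=0.8, high=1.2):
    """Label one normalized ratio using the fixed thresholds from the slide."""
    if ratio < low:
        return "deletion"
    if ratio > high:
        return "polysomy"
    return "normal"  # assumed label for ratios between the thresholds

calls = [classify(r) for r in [0.5, 1.0, 1.5]]
```

Each probe is judged in isolation here, with no use of its neighbours along the chromosome.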
SW-ARRAY
• Smith-Waterman algorithm adapted for Array CGH
• New way to analyze Array CGH data
• Reason:
  – Log ratio data form a contiguous one-dimensional series, where runs of high values may indicate polysomic regions and runs of low values deletions
SW-ARRAY
• Step 1:
  – Remove outlying probes
    • Log intensity ratio more than 2.5 MAD away from the median of the other probes in the array
    • MAD = Mean Absolute Deviation
      – Robust measure of standard deviation

$$\mathrm{MAD} = \frac{1}{n} \sum_{i=1}^{n} \left| x_i - \bar{x} \right|$$
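A rough sketch of Step 1 follows, assuming the mean absolute deviation defined on this slide and a leave-one-out reading of "other probes" (each probe is compared with the median and MAD of the remaining probes). The helper names are hypothetical, and the quadratic loop is chosen for clarity rather than speed.

```python
from statistics import mean, median

def mean_abs_dev(xs):
    """Mean absolute deviation about the mean, as defined on the slide."""
    m = mean(xs)
    return mean(abs(x - m) for x in xs)

def remove_outliers(log_ratios, k=2.5):
    """Keep probes within k MAD of the median of the other probes."""
    kept = []
    for i, x in enumerate(log_ratios):
        others = log_ratios[:i] + log_ratios[i + 1:]  # leave probe i out
        if abs(x - median(others)) <= k * mean_abs_dev(others):
            kept.append(x)
    return kept

data = [0.0, 0.1, -0.1, 0.05, -0.05, 3.0]
cleaned = remove_outliers(data)
```

Leaving the probe under test out of its own reference statistics keeps a single extreme value from inflating the MAD it is judged against.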
SW-ARRAY
• Step 2:
  – Subtract a threshold t0 from the log ratio data
  – Ensures that the mean of the adjusted data is negative
    • t0 = median + 0.2 × MAD
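Step 2 might look like the following, assuming the same mean absolute deviation as in Step 1. The function name `adjust` and the parameter name `c` for the 0.2 multiplier are illustrative choices, not from the slides.

```python
from statistics import mean, median

def mean_abs_dev(xs):
    """Mean absolute deviation about the mean."""
    m = mean(xs)
    return mean(abs(x - m) for x in xs)

def adjust(log_ratios, c=0.2):
    """Subtract t0 = median + c * MAD from every log ratio."""
    t0 = median(log_ratios) + c * mean_abs_dev(log_ratios)
    return [x - t0 for x in log_ratios], t0

adjusted, t0 = adjust([0.0, 0.1, -0.1, 0.05, -0.05])
```

After the shift, a typical probe contributes a slightly negative score, so only a run of elevated probes can build up a positive total in the island search.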
SW-ARRAY
• Step 3:
  – Search for high-scoring islands
    • Definition
      – Locally high-scoring segment: a positive-scoring segment whose score cannot be increased by shrinking or expanding the segment boundaries
SW-ARRAY
$$T(p, q) = \sum_{i=p}^{q} X(i)$$
T(p,q) = score of the segment from probe p to probe q
X(i) = score for the ith probe, ordered along the genome
SW-ARRAY
• Iterate through the probe locations along the genome
• Search where scores > 0
  – Find max-scoring island
  – Record data
  – Set island = 0
  – Find next max-scoring island
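The loop above can be sketched with a one-dimensional running-score recursion, in which the running score restarts whenever it drops to zero or below. This is an illustration of the idea under stated assumptions, not the authors' implementation: the function names are hypothetical, and "Set island = 0" is read as zeroing the scores inside the recorded island before the next pass.

```python
def max_island(scores):
    """Return (score, start, end) of the maximum-scoring segment.

    Indices are inclusive; returns (0.0, None, None) if no segment is positive.
    """
    best = (0.0, None, None)
    running, start = 0.0, 0
    for i, x in enumerate(scores):
        if running <= 0:
            running, start = 0.0, i  # restart the segment at this probe
        running += x
        if running > best[0]:
            best = (running, start, i)
    return best

def find_islands(scores):
    """Repeatedly record the best island, zero it out, and search again."""
    work = list(scores)
    islands = []
    while True:
        score, p, q = max_island(work)
        if p is None:  # no positive-scoring segment remains
            break
        islands.append((score, p, q))
        for i in range(p, q + 1):
            work[i] = 0.0
    return islands

islands = find_islands([-1, 2, 3, -4, 1, 1, -2])
```

Each island's score is T(p,q), the sum of the adjusted probe scores from p to q, and the islands come out in decreasing order of score.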
SW-ARRAY
• Statistical significance
  – In 1000 runs with permuted log ratios for each probe
    • Find the highest-scoring island in each run and tabulate the frequency of those scores
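One way to turn the permutation runs into a significance level is sketched below, assuming the p-value is estimated as the fraction of permuted runs whose best island scores at least as high as the observed island. The slides do not spell out this formula, so treat it as an assumption; the function names and the fixed seed are likewise illustrative.

```python
import random

def max_island_score(scores):
    """Score of the best contiguous segment (0.0 if none is positive)."""
    best = running = 0.0
    for x in scores:
        running = max(0.0, running + x)  # restart when the sum goes negative
        best = max(best, running)
    return best

def island_p_value(scores, observed, runs=1000, seed=0):
    """Fraction of permuted runs whose best island reaches the observed score."""
    rng = random.Random(seed)  # seeded only to make the sketch reproducible
    shuffled = list(scores)
    hits = 0
    for _ in range(runs):
        rng.shuffle(shuffled)
        if max_island_score(shuffled) >= observed:
            hits += 1
    return hits / runs

scores = [-1.0] * 20 + [3.0] * 5 + [-1.0] * 20
observed = max_island_score(scores)
p = island_p_value(scores, observed)
```

Shuffling destroys the ordering of probes along the genome while keeping the same set of values, so a high score that survives only in the original order points to a genuinely contiguous change.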
Experiment
• Test group
  – DNA from subjects with well-characterized monosomies
• Control groups
• Data analyzed using 2 methods
  – Threshold
  – SW-ARRAY
Experiment Results
• Threshold method
  – 78.1% correct identification of copy-number changes
• SW-ARRAY
  – Identified 13/14 of the monosomic regions with high significance levels in the 14 blind tests
Ideal Conditions for SW-ARRAY
• Numerous probes border the region of copy-number change
• Long sequences, for which edge effects are minimized