
Quality and Error Control Coding for DNA Microarrays

Date post: 30-Dec-2015
Page 1: Quality and Error Control Coding for DNA Microarrays

Quality and Error Control Coding for DNA Microarrays

Olgica Milenkovic

ECE Department

University of Colorado, Boulder

IEEE Denver ComSoc

Page 2: Quality and Error Control Coding for DNA Microarrays

Outline

• DNA Microarrays
• VLSIPS (Very Large Scale Immobilized Polymer Synthesis)
• Production of DNA Microarrays (http://www.affymetrix.com/)
  – Base Scheduling
  – Mask Design
  – Quality-Control Coding
• Error-Correcting DNA Microarrays (Multiplexed Arrays)
• Production of Multiplexed DNA Microarrays
  – Base/Color Scheduling
  – Mask Design
  – Quality-Control Coding

Page 3: Quality and Error Control Coding for DNA Microarrays

DNA microarrays I

[Figure, shown for two cells: protein coding sequence → transcription → translation → protein, with control of transcription and translation]

Gene expression and co-regulation

Goal: determine which genes are expressed (active) and which are unexpressed (inactive)

Comparative gene expression study of multiple cells

Page 4: Quality and Error Control Coding for DNA Microarrays

DNA microarrays II

Creating the `cell cultures’ to be compared (“color coding”):

`Green’ cell culture: creation of tagged cDNA sequences from the first cell type

DNA subsequence: 3’-AATTTCGC…-5’
mRNA: 5’-UUAAAGCG…-3’
cDNA: 3’-AATTTCGC…-5’

`Red’ cell culture: creation of tagged cDNA sequences from the second cell type

DNA subsequence: 3’-AATTTCGC…-5’
mRNA: 5’-UUAAAGCG…-3’
cDNA: 3’-AATTTCGC…-5’
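The DNA → mRNA → cDNA chain on this slide can be checked with a few lines of Python. This is a minimal sketch that only applies base-pairing rules to the example strings; the function names `transcribe` and `reverse_transcribe` are illustrative and not taken from the slides, and strand orientation is tracked only in the comments.

```python
# Base-pairing rules: DNA template -> mRNA, and mRNA -> cDNA.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}

def transcribe(template_3to5):
    """mRNA (read 5'->3') paired with a DNA template written 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3to5)

def reverse_transcribe(mrna_5to3):
    """cDNA (written 3'->5') paired with an mRNA written 5'->3'."""
    return "".join(RNA_TO_DNA[base] for base in mrna_5to3)

mrna = transcribe("AATTTCGC")
cdna = reverse_transcribe(mrna)
print(mrna, cdna)
```

With the slide's subsequence as input, the cDNA comes back identical to the original DNA template, which is why the tagged cDNA can stand in for the expressed gene.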

Page 5: Quality and Error Control Coding for DNA Microarrays

DNA microarrays III

Hybridization: complementary sequences hybridize with each other, forming stable double helices

3’-AAGCT-5’
5’-TTCGA-3’

The DNA microarray is scanned by laser light of different wavelengths

[Figure labels: gene probes, spots]
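As a rough illustration of the hybridization rule, the sketch below tests whether two strands are perfectly complementary position by position. It assumes a perfect-match model only (real hybridization also tolerates partial matches, which this ignores); `hybridizes` is a hypothetical helper name.

```python
# Watson-Crick pairs for DNA.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def hybridizes(strand_3to5, strand_5to3):
    """True if the two strands, written antiparallel, pair at every position."""
    if len(strand_3to5) != len(strand_5to3):
        return False
    return all(PAIR[x] == y for x, y in zip(strand_3to5, strand_5to3))

print(hybridizes("AAGCT", "TTCGA"))  # the pair shown on the slide
print(hybridizes("AAGCT", "TTCGG"))  # one mismatch at the last position
```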

Page 6: Quality and Error Control Coding for DNA Microarrays

Probe synthesis in microarrays I

VLSIPS (Gene Chip, AFFYMETRIX, Array Manufacturing Manual)

[Figure labels: quartz wafer, linkers, linker activation, mask]

Page 7: Quality and Error Control Coding for DNA Microarrays

Probe synthesis in microarrays II

VLSIPS (Gene Chip, AFFYMETRIX, Array Manufacturing Manual)

[Figure labels: solution of one DNA base (A or T or G or C); solution of one DNA base (A)]

Page 8: Quality and Error Control Coding for DNA Microarrays

Base scheduling I

Fixed probe length: N

Synchronous schedule (length 4N): A T G C A T G C A T G C A T G C

[Figure: spots 1–5 versus production steps; example probes CTGA and ACAA]
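A minimal sketch of how a synchronous schedule determines the masks, assuming that base i of every probe is deposited during the i-th repetition of the period ATGC. The function name `synchronous_masks` and the 0/1 mask encoding are illustrative choices, not taken from the slides.

```python
def synchronous_masks(probes, period="ATGC"):
    """Return the schedule and, per step, a 0/1 list saying which spots are exposed."""
    n = len(probes[0])                    # fixed probe length N
    schedule = period * n                 # 4N production steps
    masks = []
    for step, base in enumerate(schedule):
        cycle = step // len(period)       # which probe position this step serves
        masks.append([int(p[cycle] == base) for p in probes])
    return schedule, masks

schedule, masks = synchronous_masks(["CTGA", "ACAA"])
for spot, probe in enumerate(["CTGA", "ACAA"]):
    steps = [s for s, m in enumerate(masks) if m[spot]]
    print(probe, "exposed at steps", steps)
```

Each spot is exposed exactly N times, once per period, which is why the schedule always has length 4N regardless of the probes.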

Page 9: Quality and Error Control Coding for DNA Microarrays

Base scheduling II

Asynchronous schedule: A G G C T T G C T T G C C C G C

[Figure: spots 1–5 versus production steps]
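In an asynchronous schedule a probe can be synthesized whenever it is a subsequence of the schedule. The sketch below finds the leftmost such embedding; the probes AGCT and GTGC are hypothetical examples chosen for illustration, and `embed` is an assumed helper name.

```python
def embed(probe, schedule):
    """Leftmost production steps that synthesize `probe`, or None if impossible."""
    steps, pos = [], 0
    for base in probe:
        pos = schedule.find(base, pos)    # next step depositing this base
        if pos < 0:
            return None                   # probe is not a subsequence
        steps.append(pos)
        pos += 1
    return steps

schedule = "AGGCTTGCTTGCCCGC"             # asynchronous schedule from the slide
print(embed("AGCT", schedule))
print(embed("GTGC", schedule))
print(embed("CTGA", schedule))            # no A after the first step, so None
```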

Page 10: Quality and Error Control Coding for DNA Microarrays

Base Scheduling III

• Shortest asynchronous base schedule
  – Shortest common supersequence of a set of M sequences (NP-hard)

$ES_N(M,k)$: expected length of a longest common subsequence of M randomly chosen sequences of length N over an alphabet of size k

$$\lim_{N \to \infty} \frac{ES_N(M,k)}{N}$$

[Equation: further expressions for this limit involving k, M, z and logarithms; garbled in the transcript]

No significant gain for N ≈ 20–30

Periodic schedule used instead (length 4N)
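For the special case M = 2 the quantities on this slide are easy to compute: the longest common subsequence follows from a standard dynamic program, and the shortest common supersequence of two strings has length len(a) + len(b) − LCS(a, b). The sketch below also gives a Monte Carlo estimate of ES_N(2,k)/N. It does not address general M, where the supersequence problem is NP-hard; the function names, trial count and seed are arbitrary illustrative choices.

```python
import random

def lcs_length(a, b):
    """Length of a longest common subsequence of a and b (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def scs_length(a, b):
    """Shortest common supersequence length; this identity holds for two strings only."""
    return len(a) + len(b) - lcs_length(a, b)

def estimate_ratio(n, k=4, trials=200, seed=0):
    """Monte Carlo estimate of ES_N(2,k)/N for random sequences of length n."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        a = [rng.randrange(k) for _ in range(n)]
        b = [rng.randrange(k) for _ in range(n)]
        total += lcs_length(a, b)
    return total / (trials * n)

print(lcs_length("AGCAT", "GAC"), scs_length("AGCAT", "GAC"))
print(estimate_ratio(25))
```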

Page 11: Quality and Error Control Coding for DNA Microarrays

Mask Design

Border-length minimization

Feldman and Pevzner, 1994
Hannenhalli et al., 2002
Kahng et al., 2003, 2004

Key idea: arrange the probes on the array so that the total border length of all masks is minimal

Border-length graph: complete graph on M vertices, with the weight of each edge equal to the Hamming distance between the probes

Greedy traveling salesman algorithm + threading (discrete space-filling curve)
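A simplified sketch of the greedy step: order the probes with a nearest-neighbour tour under Hamming distance, so that consecutive probes differ in few positions. It covers only the one-dimensional ordering and omits the threading onto a space-filling curve; the probe set and the names `greedy_tour` and `tour_cost` are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length probes differ."""
    return sum(x != y for x, y in zip(a, b))

def greedy_tour(probes):
    """Nearest-neighbour ordering starting from the first probe."""
    remaining = list(probes[1:])
    tour = [probes[0]]
    while remaining:
        nxt = min(remaining, key=lambda p: hamming(tour[-1], p))
        remaining.remove(nxt)
        tour.append(nxt)
    return tour

def tour_cost(tour):
    """Sum of Hamming distances between consecutive probes."""
    return sum(hamming(a, b) for a, b in zip(tour, tour[1:]))

probes = ["AAAA", "TTTT", "AAAT", "TTTA"]
tour = greedy_tour(probes)
print(tour, tour_cost(probes), "->", tour_cost(tour))
```

The greedy heuristic gives no optimality guarantee; it is shown here only because the slide names it.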

Page 12: Quality and Error Control Coding for DNA Microarrays

Quality Control

Quality control (fidelity) spots

Hubbell and Pevzner, 1999
Sengupta and Tompa, 2002
Colbourn et al., 2002

Manufacture identical probes at several quality-control spots in order to test the precision of production steps

Page 13: Quality and Error Control Coding for DNA Microarrays

Relevant coding-theoretic ideas

Balanced code (Sengupta and Tompa, 2002): a b×v binary matrix such that

• each row has weight k;
• each column has weight bounded between l and b−l, for some constant l;
• any pair of columns is at Hamming distance at least d.

Superimposed designs in Rényi’s search model (Kautz and Singleton, 1964; Dyachkov and Rykov, 1983): a b×v binary matrix such that

• all Boolean sums composed of no more than s columns are distinct;
• each row has weight exactly t.

Additional constraints: the Boolean sums form an error-correcting code with prescribed minimum distance d.
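The superimposed-design condition can be checked by brute force for small matrices. The sketch below enumerates all subsets of between 1 and s columns (treating the empty sum as excluded, which is an assumption) and tests whether their Boolean sums are pairwise distinct; the two test matrices are hypothetical.

```python
from itertools import combinations

def boolean_sum(columns):
    """Component-wise OR of a collection of 0/1 columns."""
    return tuple(int(any(bits)) for bits in zip(*columns))

def sums_distinct(matrix, s):
    """True if all Boolean sums of 1..s columns of `matrix` are distinct."""
    columns = list(zip(*matrix))          # matrix is given as a list of rows
    seen = set()
    for size in range(1, s + 1):
        for subset in combinations(columns, size):
            total = boolean_sum(subset)
            if total in seen:
                return False
            seen.add(total)
    return True

identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
bad = [[1, 0, 1], [0, 1, 1]]              # third column equals the OR of the first two
print(sums_distinct(identity, 2), sums_distinct(bad, 2))
```

The enumeration grows combinatorially with the number of columns, so this is only a verification aid for toy examples.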

Page 14: Quality and Error Control Coding for DNA Microarrays

Error-correcting microarray design

• Probe multiplexing (Khan et al., 2003)

G (rows: spots; columns: probes):

0011
1001
1010
1100
0101
0110

X: vector of RNA levels corresponding to N genes

Y: total concentration of RNA at all spots

$$Y = T\,G\,S\,X$$

S: hybridization affinity matrix; T: spot quality matrix

Excluding hybridization effects and spot formation quality, and under iid measurement noise:

$$\min_G \operatorname{tr}\left(G^{*T} G^{*}\right), \qquad G^{*} = (G^T G)^{-1} G^T,$$

subject to: G is an n×k matrix with n ≥ k, rank(G) = k, G(i,j) ∈ {0,1}, and $\sum_j G(i,j) = c = \text{const}$.

Decoding algorithm: numerical optimization
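Under the simplified model Y = GX (taking S and T as identity matrices and ignoring noise, all of which are assumptions), X can be recovered by least squares through the pseudo-inverse G* = (GᵀG)⁻¹Gᵀ. The slide states only that decoding is done by numerical optimization, so the sketch below is an illustration rather than the authors' decoder. It uses exact rational arithmetic from the standard library, and the helper names and the vector x are hypothetical.

```python
from fractions import Fraction

# Multiplexing matrix from the slide: 6 spots (rows), 4 probes (columns).
G = [[0, 0, 1, 1], [1, 0, 0, 1], [1, 0, 1, 0],
     [1, 1, 0, 0], [0, 1, 0, 1], [0, 1, 1, 0]]

def transpose(a):
    return [list(row) for row in zip(*a)]

def matmul(a, b):
    cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def inverse(a):
    """Gauss-Jordan inverse of a square, invertible matrix."""
    n = len(a)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def pseudo_inverse(g):
    gt = transpose(g)
    return matmul(inverse(matmul(gt, g)), gt)

def decode(g, y):
    """Least-squares estimate of X from spot readings y."""
    return [row[0] for row in matmul(pseudo_inverse(g), [[v] for v in y])]

def design_cost(g):
    """tr(G*^T G*), the sum of squared entries of the pseudo-inverse."""
    return sum(v * v for row in pseudo_inverse(g) for v in row)

x = [3, 1, 4, 2]                                   # hypothetical RNA levels
y = [sum(g * v for g, v in zip(row, x)) for row in G]
print(y, decode(G, y), design_cost(G))
```

Because every pair of probes shares exactly one spot in this G, the matrix GᵀG has 3 on the diagonal and 1 elsewhere, so it is invertible and the noise-free readings determine x uniquely.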

Page 15: Quality and Error Control Coding for DNA Microarrays

VLSIPS/analysis for multiplexed arrays

Features:

• Multiple polymer synthesis at one given spot (for simplicity, we consider only two probes per spot)
• Two different classes of linkers, sensitive to different wavelengths, can be used to select probes for extension (say, `blue’, `green’ and `cyan’)

Schedule (bases): A T G C A T G C A T G C A T G C

Schedule (colors): g b g b c c b c g g c b b g b g

[Figure: spots 1–6 versus production steps]
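One way to read the colored schedule is as follows: a `b` step extends only the blue probe at an exposed spot, a `g` step only the green probe, and a `c` step extends both at once. That reading of cyan is an assumption, not stated on the slide. Under it, the sketch below decides whether a given blue/green probe pair can be synthesized at one spot by tracking every reachable pair of prefix lengths. The probe pairs and the name `synthesizable` are hypothetical.

```python
def synthesizable(blue, green, bases, colors):
    """True if both probes at one spot can be built by some choice of masks."""
    states = {(0, 0)}                     # (blue bases built, green bases built)
    for base, color in zip(bases, colors):
        new_states = set(states)          # option: spot is masked at this step
        for i, j in states:
            blue_ok = i < len(blue) and blue[i] == base
            green_ok = j < len(green) and green[j] == base
            if color == "b" and blue_ok:
                new_states.add((i + 1, j))
            elif color == "g" and green_ok:
                new_states.add((i, j + 1))
            elif color == "c" and blue_ok and green_ok:
                new_states.add((i + 1, j + 1))   # cyan extends both probes
        states = new_states
    return (len(blue), len(green)) in states

bases = "ATGCATGCATGCATGC"                # schedule from the slide
colors = "gbgbccbcggcbbgbg"
print(synthesizable("TC", "AG", bases, colors))
print(synthesizable("AAA", "G", bases, colors))
```

The second pair fails because only one purely blue step deposits A, and the single cyan A step would also extend the green probe with the wrong base.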

Page 16: Quality and Error Control Coding for DNA Microarrays

VLSIPS/analysis for multiplexed arrays

Scheduling: shortest schedule of bases/colors

(Using results from V. Dančík, Expected Length of Longest Common Subsequences, 1994)

Set-up: two identical sets of M `blue’ and M `green’ sequences of length N, chosen uniformly at random over an alphabet of size four

Length of shortest schedule:

[Equation: limit as N → ∞ expressed through the Chvátal–Sankoff constants for an alphabet of size four; garbled in the transcript]

Synchronous schedule, no `cyan’ colored steps: 8N

Page 17: Quality and Error Control Coding for DNA Microarrays

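Mask design:

Two arrangements of spots s1–s4 on a 2×2 array:

First arrangement:
s1 s4
s3 s2

Second arrangement:
s1 s3
s2 s4

Schedule (bases): A C G T A C G T

Schedule (colors): b g c c g c c b

Probes at each spot:

S1: AT, CA
S2: AC, CC
S3: GT, GA
S4: TT, TA

Border lengths L(M) of the masks, first arrangement:

L(M)=4, L(M)=4, L(M)=2, L(M)=2, L(M)=2, L(M)=2, L(M)=2

Border lengths L(M) of the masks, second arrangement:

L(M)=2, L(M)=2, L(M)=2, L(M)=2, L(M)=2, L(M)=2, L(M)=2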
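The two rows of L(M) values on this slide can be reproduced by counting, for each mask, the adjacent spot pairs in the 2×2 grid where one spot is exposed and the other is not. The exposed sets below were derived by hand under two assumptions: the first probe at each spot is the blue one, and a cyan step extends both probes. The step in which no spot is exposed is skipped. `border_length` is an illustrative helper name.

```python
def border_length(grid, exposed):
    """Count adjacent grid cells separated by the mask boundary."""
    rows, cols = len(grid), len(grid[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((0, 1), (1, 0)):       # right and down neighbours
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols:
                    total += (grid[r][c] in exposed) != (grid[rr][cc] in exposed)
    return total

# Spots exposed at steps A/b, C/g, G/c, T/c, A/g, C/c, T/b (hand-derived, see above).
masks = [{1, 2}, {1, 2}, {3}, {4}, {1, 3, 4}, {2}, {1, 3, 4}]
first = [[1, 4], [3, 2]]
second = [[1, 3], [2, 4]]
print([border_length(first, m) for m in masks])
print([border_length(second, m) for m in masks])
```

Under these assumptions the second arrangement has total border length 14 against 18 for the first, which matches the values listed on the slide.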

Page 18: Quality and Error Control Coding for DNA Microarrays

Mask Design / Scheduling

Neighborhood graph: complete graph with M vertices, each labeled by two distinct sequences $(p_1, p_2)$

No `cyan’ steps: the weight of the edge between two vertices $(p_{11}, p_{12})$ and $(p_{21}, p_{22})$ is the sum of Hamming distances

Issues:

• For reasons of controlled hybridization, different probes (blue and green) at the same spot should have fairly large Hamming distance (Milenkovic and Kashyap, 2005)
• Border-length minimization becomes less effective
• With cyan colored steps involved, the distance measure also depends on the longest common subsequence of the probes at the same spot

Page 19: Quality and Error Control Coding for DNA Microarrays

Quality Control Coding

Theorem: Assume that there exists a linear error-control code with parameters [n,k,d] containing the all-ones codeword. Then one can construct a quality control array for a multiplexed DNA chip with $2(2^k-2)$ disjoint blue and green production steps and M probes such that the length of each quality control probe is $2^{k-1}-1$, and the weights w of the columns in the quality control array satisfy

$$d \le w \le n - d.$$

Furthermore, with such an array any collection of fewer than n/(n−d) failed blue or green steps, respectively, can be uniquely identified.

Open question: how does one extend this result to schedules involving `cyan’ colored production steps, and to `spot’ failures?
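The counts in the theorem can be sanity-checked on a toy code, reading the garbled exponents as 2^k − 2 and 2^(k−1) − 1 and the weight bound as d ≤ w ≤ n − d (all three readings are assumptions). The sketch uses the [4,3,2] even-weight code, takes the nontrivial codewords as the columns (production steps of one color) and the code coordinates as spots. It checks only the counting and does not implement the construction or the failure identification.

```python
from itertools import product

def codewords(generator):
    """All codewords of the binary linear code spanned by the generator rows."""
    n = len(generator[0])
    words = set()
    for msg in product((0, 1), repeat=len(generator)):
        words.add(tuple(sum(m * g[i] for m, g in zip(msg, generator)) % 2
                        for i in range(n)))
    return words

def quality_steps(generator):
    """Codewords other than all-zeros and all-ones, one per production step."""
    n = len(generator[0])
    trivial = {(0,) * n, (1,) * n}
    return sorted(codewords(generator) - trivial)

generator = [(1, 1, 0, 0), (0, 1, 1, 0), (0, 0, 1, 1)]   # [4,3,2] even-weight code
n, k, d = 4, 3, 2
steps = quality_steps(generator)
step_weights = [sum(word) for word in steps]                       # column weights w
probe_lengths = [sum(word[i] for word in steps) for i in range(n)]  # per spot
print(len(steps), step_weights, probe_lengths)
```

For this code each color has 2^3 − 2 = 6 steps, every step has weight 2 = d = n − d, and every spot is exposed in 2^2 − 1 = 3 steps.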

