DNA Computing on Surfaces
Anne Condon, Computer Science, UBC
Robert Corn, Chemistry, U. Wisconsin
Max Lagally, Materials Science, U. Wisconsin
Lloyd Smith, Chemistry, U. Wisconsin
Goals
• Encode information in DNA strands
• Compute on many strands in parallel: chemical manipulations = logical operations
(Adleman, Science 266:1994)
“…the number of operations per second … would exceed that of current supercomputers by a thousandfold…remarkable energy efficiency… information density a dramatic improvement over existing storage media”
Len Adleman, Science 266:1994
“for certain intrinsically complex problems…where existing electronic computers are very inefficient and where massively parallel searches can be organized to take advantage of the operations that molecular biology currently provides, molecular computation might compete with electronic computation in the near term”
Outline
• Background
– What is computation? What is DNA?
– DNA computation: in the biotech industry; in the solution of combinatorial problems
– Research on DNA computation
• DNA Computing on Surfaces
– Models
– Experiments
• Conclusions
What is Computation? (very simple view)
• Input: string over finite alphabet
• Process: determine if input satisfies some property
• Output: yes or no
Satisfy a Property: Binary Inputs
• Set the output of a circuit to 1
[Figure: circuit of "and", "or" and "not" gates, with labeled Input and Output rows and bit values (0 or 1) on the wires]
Satisfy a Property: Non-binary Inputs
• Set the output of a generalized circuit to a given value
[Figure: generalized circuit with a labeled Output row; the wires carry the symbols A, C, G and T instead of bits]
Simple Parallel Computation
• Input: set of strings
• Process: independently for each input, determine if it satisfies a common circuit
• Output: indicate whether there exists an input satisfying the circuit
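The input/process/output view above can be sketched in a few lines of Python. This is only an illustration: the example circuit `(b0 OR b1) AND NOT (b2 AND b3)` and the function names are assumptions, not taken from the slides.

```python
from typing import Callable, Iterable


def circuit(bits: str) -> bool:
    """Hypothetical common circuit: (b0 OR b1) AND NOT (b2 AND b3)."""
    b = [c == "1" for c in bits]
    return (b[0] or b[1]) and not (b[2] and b[3])


def exists_satisfying(inputs: Iterable[str], prop: Callable[[str], bool]) -> bool:
    """Output 'yes' (True) iff at least one input satisfies the property."""
    # Each input is tested independently of the others, which is what
    # makes the process step parallel in principle.
    return any(prop(s) for s in inputs)


# Input: the set of all 4-bit strings.
all_inputs = [format(i, "04b") for i in range(16)]
survivors = [s for s in all_inputs if circuit(s)]
print(exists_satisfying(all_inputs, circuit), len(survivors))
```

In the DNA setting each string would be a strand and the independent tests would happen simultaneously in one chemical step, rather than in a loop.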
What is DNA?
“DNA Computation:” Affymetrix Arrays
• Input: strings over {A,C,G,T}, (represented as the corresponding single-stranded DNA)
Photolithography used to synthesize and array DNA strands on a planar surface
• Process: e.g. for each input, test if it approximately matches a given string
(i.e. hybridizes to Watson-Crick complement of given string)
• Output: fluorescence detection
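The process step above (approximate matching by hybridization to a Watson-Crick complement) can be modeled very roughly as a mismatch count. The sketch below assumes equal-length strands and a hypothetical `max_mismatches` threshold; real hybridization depends on thermodynamics, not just mismatch counts.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def watson_crick_complement(strand: str) -> str:
    """Return the reverse complement, i.e. the strand that pairs antiparallel."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))


def mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length strands differ."""
    if len(a) != len(b):
        raise ValueError("strands must have equal length")
    return sum(x != y for x, y in zip(a, b))


def hybridizes(array_strand: str, sample_strand: str, max_mismatches: int = 1) -> bool:
    """Crude model: the array strand binds the sample strand if it is
    within max_mismatches of the sample's Watson-Crick complement."""
    target = watson_crick_complement(sample_strand)
    return mismatches(array_strand, target) <= max_mismatches


given = "ACGTAC"                         # the string to match against
sample = watson_crick_complement(given)  # what is washed over the array
for array_strand in ["ACGTAC", "ACGTAA", "TTGTAA"]:
    print(array_strand, hybridizes(array_strand, sample))
```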
Adleman’s Hamiltonian Path Experiment
• Input: generate random paths
• Process:
– select paths from S to T
– select paths with 7 nodes
– select paths entering all nodes at least once
• Output: “yes” iff path remains
[Figure: graph on the nodes S, 1, 2, 3, 4, 5, T]
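The three selection steps above can be sketched as successive filters over a pool of candidate paths. The edge set below is hypothetical (the graph in the figure does not survive in this text), and random walks stand in for the chemical generation of paths.

```python
import random

# Hypothetical directed graph on the seven nodes named on the slide.
EDGES = {
    "S": ["1", "2"],
    "1": ["2", "3"],
    "2": ["3", "4"],
    "3": ["4", "5"],
    "4": ["5", "T"],
    "5": ["T", "1"],
    "T": [],
}
NODES = list(EDGES)


def random_path(rng: random.Random, max_len: int = 10) -> list:
    """Random walk along edges, standing in for random strand assembly."""
    node = rng.choice(NODES)
    path = [node]
    while EDGES[node] and len(path) < max_len and rng.random() < 0.9:
        node = rng.choice(EDGES[node])
        path.append(node)
    return path


def hamiltonian_paths(paths: list) -> list:
    """Apply the three selection steps; the answer is 'yes' iff any path remains."""
    pool = [p for p in paths if p[0] == "S" and p[-1] == "T"]  # from S to T
    pool = [p for p in pool if len(p) == len(NODES)]           # exactly 7 nodes
    for node in NODES:                                         # enters every node
        pool = [p for p in pool if node in p]
    return pool


rng = random.Random(0)
candidates = [random_path(rng) for _ in range(5000)]
print("yes" if hamiltonian_paths(candidates) else "no")
```

The filters never inspect edges: only walks that follow edges are generated in the first place, which mirrors how the chemistry only joins compatible strands.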
Generate Random Paths
• Associate DNA strands with nodes and edges
• Join edge strands in test tube to form double-stranded “paths” (hybridization, ligation)
• Wash to form single-stranded paths
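One way to picture the node and edge strands is the half-and-half scheme described in Adleman's 1994 paper: each node gets a 20-mer, and the strand for edge u→v is the second half of u's sequence followed by the first half of v's. The sketch below assumes that scheme; the random sequences and helper names are illustrative only, and ligation is modeled as plain string concatenation.

```python
import random

HALF = 10  # half the length of a node strand (assumed 20-mers)


def random_oligo(rng: random.Random, length: int = 2 * HALF) -> str:
    """Random DNA sequence used as a node strand."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def edge_strand(node_seq: dict, u: str, v: str) -> str:
    """Second half of node u followed by first half of node v."""
    return node_seq[u][HALF:] + node_seq[v][:HALF]


def path_strand(node_seq: dict, path: list) -> str:
    """Concatenate the edge strands along a path (models ligation)."""
    return "".join(edge_strand(node_seq, u, v) for u, v in zip(path, path[1:]))


rng = random.Random(1)
node_seq = {name: random_oligo(rng) for name in ["S", "1", "2", "3", "4", "5", "T"]}
strand = path_strand(node_seq, ["S", "1", "2", "3"])
# Interior nodes appear in full, so a probe for node 2 can find this path.
print(node_seq["2"] in strand)
```

Because consecutive edge strands share a node, the full sequence of every interior node reappears in the joined strand, which is what later selection by node relies on.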
Adleman’s Experiment: Select Paths That Enter Node 2
• Attach strand associated with node 2 to beads and introduce to test tube
• The paths that enter node 2 hybridize to strands on the beads
• Remove beads; wash and detach desired paths
Biomolecular Computation Research
• “Classical” DNA/RNA computation (e.g. search-and-prune)
• O(1)-biostep computation (e.g. self-assembly of 3-D DNA molecules)
• Splicing-based computation
• Non-computational applications (e.g. exquisite detection, DNA2DNA computation, DNA nanotechnology, DNA tags)
DNA Computing on Surfaces
• Advantages over “solution phase” chemistry:
– Facile purification steps
– Reduced interference between strands
– Easily automated
• Disadvantages:
– Loss of information density (2D)
– Lower surface hybridization efficiency
– Slower surface enzyme kinetics
DNA Surface Model: Input
DNA strands representing the set {0,1}^n are synthesized and subsequently immobilized on a surface in a non-addressed fashion
Encoding of Binary Information in DNA Strands
A strand is composed of words. Each word is a short DNA strand (16mer) representing one or more bits.
[Figure: a strand (ACCT...) divided into words, with Word and Bit rows numbering positions 1, 2, 3, 4, ...]
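A minimal sketch of the word encoding, assuming one bit per 16mer word and a separate pair of words for each bit position. The eight sequences in `WORDS` are made up for illustration and are not the words used in the experiments.

```python
WORD_LEN = 16

# Hypothetical word table: (bit position, bit value) -> 16mer.
WORDS = {
    (0, "0"): "ACCTGATCGGATCCTA",
    (0, "1"): "TGGACTAGCCTAGGAT",
    (1, "0"): "CATGGTCAATGCTTGC",
    (1, "1"): "GTACCAGTTACGAACG",
    (2, "0"): "TTCAAGGCCGTAGCAT",
    (2, "1"): "AAGTTCCGGCATCGTA",
    (3, "0"): "CGATATGCTCAGTGAC",
    (3, "1"): "GCTATACGAGTCACTG",
}
DECODE = {word: key for key, word in WORDS.items()}


def encode(bits: str) -> str:
    """Concatenate one word per bit to form the strand."""
    return "".join(WORDS[(pos, bit)] for pos, bit in enumerate(bits))


def decode(strand: str) -> str:
    """Split the strand into 16mers and look each one up."""
    bits = []
    for pos in range(len(strand) // WORD_LEN):
        word = strand[pos * WORD_LEN:(pos + 1) * WORD_LEN]
        word_pos, bit = DECODE[word]
        if word_pos != pos:
            raise ValueError(f"word for position {word_pos} found at position {pos}")
        bits.append(bit)
    return "".join(bits)


strand = encode("1010")
print(len(strand), decode(strand))
```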
DNA Word Design Problem
• Requirements of a “DNA code”:
– Success in specific hybridization between a DNA code word and its Watson-Crick complement
– Few false positive signals
• Virtually all designs enforce combinatorial constraints on the code words
• Applications:
– Information storage, retrieval for DNA computing
– Molecular bar codes for chemical libraries
What combinatorial constraints are placed on DNA Codes?
• Hamming: distance between two code words should be large
• Reverse complement: distance between a word and the reverse complement of another word should be large
• Also: frame shift, distinct sub-words, forbidden sub-words, …
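The first two constraints above can be checked mechanically. The sketch below tests a candidate code against a minimum distance `d` for both the Hamming and the reverse-complement constraint; the function names and the toy 8mer words are assumptions for illustration, and the remaining constraints (frame shift, sub-words) are not covered.

```python
from itertools import combinations

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length words differ."""
    if len(a) != len(b):
        raise ValueError("words must have equal length")
    return sum(x != y for x, y in zip(a, b))


def reverse_complement(word: str) -> str:
    """Complement every base, then reverse the word."""
    return word.translate(_COMPLEMENT)[::-1]


def satisfies_constraints(words: list, d: int) -> bool:
    """True iff every pair of distinct words is at Hamming distance >= d,
    and every word is at distance >= d from the reverse complement of
    every other word."""
    for a, b in combinations(words, 2):
        if hamming(a, b) < d:
            return False
        if hamming(a, reverse_complement(b)) < d:
            return False
        if hamming(b, reverse_complement(a)) < d:
            return False
    return True


print(satisfies_constraints(["AAAAAAAA", "CCCCCCCC"], 4))  # far apart in both senses
print(satisfies_constraints(["AAAAAAAA", "TTTTTTTT"], 4))  # each is the other's reverse complement
```

The second pair shows why the reverse-complement constraint matters: the two words differ at every position, yet each would hybridize perfectly to the other.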
Work on DNA code design
• Seeman (1990): de novo design of sequences for nucleic acid structural engineering
• Brenner (1997): sorting polynucleotides using DNA tags
• Shoemaker et al. (1996): analysis of yeast deletion mutants using a parallel molecular bar-coding strategy
• Many other examples in DNA computing
Word Design Example
DNA Surface Model: Process
• MARK strands in which bit j = 0 (or 1): hybridize with Watson-Crick complements of word containing bit j, followed by polymerization
• DESTROY unmarked strands: exonuclease degradation
• UNMARK strands: wash in distilled water
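The three operations can be simulated on ordinary sets of bit strings. This is an abstract sketch only: the class and method names are hypothetical, and the chemistry (hybridization, exonuclease, washing) is replaced by set operations.

```python
class Surface:
    """Toy model of a surface holding every n-bit strand, unaddressed."""

    def __init__(self, n: int) -> None:
        self.strands = {format(i, f"0{n}b") for i in range(2 ** n)}
        self.marked = set()

    def mark(self, j: int, value: str) -> None:
        """MARK every remaining strand whose bit j equals value."""
        self.marked |= {s for s in self.strands if s[j] == value}

    def destroy(self) -> None:
        """DESTROY all unmarked strands."""
        self.strands &= self.marked

    def unmark(self) -> None:
        """UNMARK all strands so a new cycle can start."""
        self.marked = set()


surface = Surface(3)
surface.mark(0, "1")   # keep strands with bit 0 = 1 ...
surface.mark(2, "0")   # ... or bit 2 = 0
surface.destroy()
surface.unmark()
print(sorted(surface.strands))
```

Several MARK operations before one DESTROY act as a logical OR, since a strand survives if any of the marks applied to it.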
DNA Surface Model: Output
• Detect remaining strands (if any) by detaching strands from surface and amplifying using PCR (polymerase chain reaction).
Computational Power of DNA Surface Model
Theorem: Any CNFSAT formula of size m can be computed using O(m) mark, unmark and destroy operations.
Theorem: Any circuit of size m can be computed using O(m) mark, unmark, destroy, and append operations.
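A sketch of why the CNF bound is linear, under the assumption that each clause is handled by one MARK per literal followed by one DESTROY and one UNMARK. The clause representation and the example formula are hypothetical; the append operation needed for general circuits is not modeled.

```python
from itertools import product


def solve_cnf(n: int, clauses: list) -> tuple:
    """Run mark/destroy/unmark cycles over all n-bit strands.

    Each clause is a list of (variable index, wanted bit) literals.
    Returns the surviving strands and the number of operations used.
    """
    strands = {"".join(bits) for bits in product("01", repeat=n)}
    ops = 0
    for clause in clauses:
        marked = set()
        for var, bit in clause:
            # MARK: strands satisfying this literal
            marked |= {s for s in strands if s[var] == bit}
            ops += 1
        strands &= marked   # DESTROY unmarked strands
        marked = set()      # UNMARK
        ops += 2
    return strands, ops


# Hypothetical formula: (x0 OR NOT x1) AND (x1 OR x2)
clauses = [[(0, "1"), (1, "0")], [(1, "1"), (2, "1")]]
survivors, ops = solve_cnf(3, clauses)
print(sorted(survivors), ops)
```

The operation count is the number of literals plus two per clause, so it grows linearly with the size of the formula. The number of strands, by contrast, is exponential in the number of variables.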
Surface DNA Computation: the Satisfiability Problem
• Input: 16 strands
• Process:
– MARK if bit z = 1; MARK if bit w = 1; MARK if bit y = 0; DESTROY; UNMARK
– MARK if bit w = 0; MARK if bit y = 0; DESTROY; UNMARK
– …
• Output: exactly those strands that satisfy the circuit remain on the surface.
[Figure: circuit over the inputs w, x, y, z built from "and", "or" and "not" gates]
DNA Computing on Surfaces: Experiments
Students: Tony Frutos, Susan Gillmor, Zhen Guo, Qinghua Liu, Andy Thiel, Liman Wang
MARK Operation: 4-Base Mismatch Word Design
Repeated MARK, DESTROY, UNMARK Operations
Append (DNA Ligase)
A. Hybridize with Cb
B. Hybridize with Cab, Wb
C. Ligate; Wash; Hybridize with Cb.
Two-Word Mark and Destroy
A. Mark C1a, C1b, C2b
B. Ligate; Melt single words
C. Destroy; Unmark; Mark C1a, C1b, C2b.
Surface Attachment Chemistry
Word Readout Strategy
•PCR amplify words remaining on surface
•Detect PCR products on single word readout arrays
4-Variable SAT Demo
• Synthesize; Attach
• Mark
• Destroy
• Unmark
• Readout
[Cycle: Mark, Destroy, Unmark]
Conclusions
• DNA computing has expanded the notion of what is computation
• Solid-phase chemistry is a promising approach to DNA computing
• DNA computing will require greatly improved DNA surface attachment chemistries and control of chemical and enzymatic processes
• New research problems in combinatorics, complexity theory and algorithms
Open Problem: DNA Strand Engineering
Given a DNA strand, there are polynomial-time algorithms that predict the secondary structure of the strand.
Inverse Problem: find an efficient algorithm that, given a desired secondary structure, generates a strand with that structure.
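As an illustration of the forward direction, here is a minimal sketch of a Nussinov-style dynamic program, which finds the maximum number of nested Watson-Crick base pairs in O(n^3) time. It is a simplified stand-in for the energy-based predictors used in practice, and the `min_loop` parameter (minimum number of unpaired bases inside a hairpin) is an assumption of this sketch.

```python
PAIRS = {("A", "T"), ("T", "A"), ("C", "G"), ("G", "C")}


def max_base_pairs(strand: str, min_loop: int = 3) -> int:
    """Maximum number of nested base pairs in a single DNA strand."""
    n = len(strand)
    if n == 0:
        return 0
    # best[i][j] = max pairs within strand[i..j]; entries with i > j stay 0.
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i + 1][j]  # case 1: base i stays unpaired
            for k in range(i + min_loop + 1, j + 1):
                if (strand[i], strand[k]) in PAIRS:
                    # case 2: base i pairs with base k, splitting the interval
                    inside = best[i + 1][k - 1]
                    outside = best[k + 1][j] if k + 1 <= j else 0
                    score = max(score, 1 + inside + outside)
            best[i][j] = score
    return best[0][n - 1]


print(max_base_pairs("GGGAAAACCC"))  # a hairpin: three G-C pairs around an AAAA loop
```

The inverse problem asks for the opposite direction: given the desired pairing, produce a sequence that folds into it. This sketch does not attempt that.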