Post on 15-Jan-2016

Chip arrays and gene expression data

With chip array technology, one can measure the expression of 10,000 (~all) genes at once. This can answer questions such as:

1. Which genes are expressed in a muscle cell?

2. Which genes are expressed during the first week of pregnancy in the mother? In the new baby?

3. Which genes are expressed in cancer?

4. If one mutates a TF: which genes are not expressed following this change?

5. Which genes are not expressed in the brain of a baby with an intellectual disability?

6. Which genes are expressed when one is asleep versus when the same person is awake?

DNA chip: each spot holds a specific DNA molecule (the probe). Upon hybridization with a labeled mRNA (or cDNA) molecule, the intensity of the hybridization can be quantified by light.

Affymetrix: The base is a “wafer” (a thin crystalline semiconductor substrate).

The wafer is coated with a light-sensitive chemical compound that prevents coupling between the wafer and the first nucleotide of the DNA probe being created.

The blue “cap” is light sensitive. A mask covers some of the cells. When the chip is illuminated, a reaction with a nucleotide can happen only in the cells that the light reaches.

The nucleotide that is added is also chemically linked with a new “cap” (light sensitive).

The entire process is called photolithography

Affymetrix: each probe is 25 bp – a part of an exon.

Figures: the reader; the chip itself.

In one cm², more than 10⁶ different oligos.

Affymetrix: each probe is 25 nucleotides. Above this, a technological problem exists: the synthesis becomes inaccurate.

With such short probes, each mRNA can hybridize to more than one probe. The solution: each gene is “covered” by several probes.
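A minimal sketch of how the readings of several probes covering one gene could be combined into a single value. The median is used here only as an illustrative assumption (it is robust to one cross-hybridizing probe); the slides do not say which summary the vendor software uses, and the function name and intensities are hypothetical.

```python
from statistics import median


def gene_level_intensity(probe_intensities):
    """Summarize all probes that cover one gene by their median intensity."""
    if not probe_intensities:
        raise ValueError("a gene needs at least one probe")
    return median(probe_intensities)


# Hypothetical readings for five probes of one gene; the fourth probe
# cross-hybridizes with another mRNA and reads far too high.
probes = [410.0, 395.0, 402.0, 1250.0, 388.0]
summary = gene_level_intensity(probes)
print(summary)
```

The outlier barely affects the median, whereas a plain mean would be pulled upward by it.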

Affymetrix: one can buy ready-made chips (human genome, mouse genome), or design (“print”) a custom chip (more expensive).

Detection: mRNA is isolated from the tissue (cells, viruses). cDNA is synthesized and fluorescently labeled. Sometimes, the cDNA is amplified using PCR. The intensity in each cell (probe) is measured by “the reader”.

Agilent: developed DNA printers. In each spot, picoliters of nucleotides are added. They can make probes of up to 60-mers (Agilent was spun off from Hewlett-Packard).

Standard phosphoramidite chemistry

Hybridization to the longer Agilent probes is more accurate.

If there is hybridization to a probe, the gene it represents is probably expressed.

However, it is impossible to know how many probe molecules are in each cell, so absolute fluorescence intensities are meaningless.

Solution: in the same experiment, hybridize samples from two conditions, for example mRNA from healthy cells (red) versus tumor cells (green).

The Agilent reader will give the ratio of the two colors.
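A small sketch of working with the two-color ratio. Taking log2 of the ratio is an assumption added here (a common convention that makes up- and down-regulation symmetric), not something the slides state; the function name, the floor value, and the spot intensities are all hypothetical.

```python
import math


def log2_ratio(red, green, floor=1.0):
    """log2(red / green) for one spot; intensities are floored to avoid log(0)."""
    return math.log2(max(red, floor) / max(green, floor))


# Hypothetical (red, green) intensities per spot.
# Following the slides: red = healthy sample, green = tumor sample.
spots = {"g1": (200.0, 800.0), "g2": (500.0, 500.0), "g3": (1200.0, 300.0)}
ratios = {gene: log2_ratio(red, green) for gene, (red, green) in spots.items()}

for gene, value in ratios.items():
    print(gene, value)
```

With this color assignment, a negative value means the gene is higher in the tumor sample, zero means no difference, and a positive value means it is higher in the healthy sample.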

Stanford cDNA chips

In this approach, long cDNA sequences (>300 bp) are produced in a cell (a clone) and are linked to each chip cell. Producing long cDNAs this way is cheaper than synthesizing them one nucleotide at a time.

As in the case of Agilent, it is impossible to control the number of probes in each cell.

Output

              w.t    Brain tumor, males    Brain tumor, females
Gene 1
Gene 2
Gene 3
...
Gene 25,000

Each cell is either an absolute number or a relative one, depending on the technology used.

Repeats

              w.t    Brain tumor, male1    Brain tumor, male2    Brain tumor, female1
Gene 1
Gene 2
Gene 3
...
Gene 25,000

A repeat can be either a technical repeat (the same sample on a different chip) or a “real” biological repeat (a different sample).

Expression profile

      wt1  wt2  wt3  wt4  bt1  bt2  bt3  bt4
g1      4    3    5    4   15   16   17   23
g2      7    5    4    6    6    3    7    9
g3      2    3    2    5   25   26   30   60

Genes 1 and 3 show the same trend (both go high under the same conditions). That is, they have the same expression profile.
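One way to put a number on "the same expression profile" is the Pearson correlation between two rows of the table. This is a sketch under assumptions: the slides do not name a similarity measure, and the values below are one reading of the table above (wt1..wt4 followed by bt1..bt4).

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long profiles (neither may be constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


# Expression values across wt1..wt4, bt1..bt4 (as read from the table).
g1 = [4, 3, 5, 4, 15, 16, 17, 23]
g2 = [7, 5, 4, 6, 6, 3, 7, 9]
g3 = [2, 3, 2, 5, 25, 26, 30, 60]

print("g1 vs g3:", round(pearson(g1, g3), 2))
print("g1 vs g2:", round(pearson(g1, g2), 2))
```

Genes 1 and 3 come out strongly correlated, while gene 2 correlates only weakly with gene 1.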

Clustering

      wt1  wt2  wt3  wt4  bt1  bt2  bt3  bt4
g1      4    3    5    4   15   16   17   23
g2      7    5    4    6    6    3    7    9
g3      2    3    2    5   25   26   30   60

In general, we want to find all the genes that share the same expression profile → suggestive of a functional linkage.

There are clustering algorithms, which do exactly that.
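A minimal sketch of one simple grouping rule, not a specific algorithm from the slides: two genes are linked when the Pearson correlation of their profiles reaches a threshold, and every connected group of linked genes forms a cluster. The threshold of 0.8 and the function names are assumptions; the data are one reading of the table above.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equally long, non-constant profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def cluster_by_correlation(profiles, threshold=0.8):
    """Merge genes into one cluster whenever their correlation is >= threshold."""
    cluster_of = {gene: {gene} for gene in profiles}
    for a, b in combinations(profiles, 2):
        linked = pearson(profiles[a], profiles[b]) >= threshold
        if linked and cluster_of[a] is not cluster_of[b]:
            merged = cluster_of[a] | cluster_of[b]
            for gene in merged:
                cluster_of[gene] = merged
    # Several genes share the same set object; keep each set once.
    unique = {id(members): members for members in cluster_of.values()}
    return sorted(sorted(members) for members in unique.values())


profiles = {
    "g1": [4, 3, 5, 4, 15, 16, 17, 23],
    "g2": [7, 5, 4, 6, 6, 3, 7, 9],
    "g3": [2, 3, 2, 5, 25, 26, 30, 60],
}
clusters = cluster_by_correlation(profiles)
print(clusters)
```

On these three genes the rule puts g1 and g3 together and leaves g2 on its own. Real clustering tools differ in the similarity measure and in how clusters are merged.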

      wt1  wt2  wt3  wt4  bt1  bt2  bt3  bt4
g1      4    3    5    4    0   22    0   23
g2      7    5    4    6    0    8    0    9
g3      2    3    2    5   60    1   66    1

Clustering of the conditions can suggest two types of brain tumor (bt)

Bi-clustering: both on the conditions and the genes.
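Clustering the conditions can be sketched by treating each tumor sample as a vector of gene values and splitting the samples into two groups. The rule below (seed with the two most distant samples, then assign each sample to the nearer seed) is an illustrative assumption, not a method named in the slides, and the numbers are one reading of the flattened table above.

```python
import math
from itertools import combinations


def split_in_two(samples):
    """Seed two groups with the two most distant samples, then assign
    every sample to whichever seed is nearer (Euclidean distance)."""
    seed_a, seed_b = max(
        combinations(samples, 2),
        key=lambda pair: math.dist(samples[pair[0]], samples[pair[1]]),
    )
    groups = {seed_a: [], seed_b: []}
    for name, values in samples.items():
        to_a = math.dist(values, samples[seed_a])
        to_b = math.dist(values, samples[seed_b])
        groups[seed_a if to_a <= to_b else seed_b].append(name)
    return sorted(groups.values())


# (g1, g2, g3) values of the four brain tumor samples, as read from the table.
tumor_samples = {
    "bt1": (0, 0, 60),
    "bt2": (22, 8, 1),
    "bt3": (0, 0, 66),
    "bt4": (23, 9, 1),
}
tumor_types = split_in_two(tumor_samples)
print(tumor_types)
```

On this reading, bt1 and bt3 fall into one group and bt2 and bt4 into the other, which is the kind of split that would suggest two tumor types.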

Applications

Think of growing E. coli at increasing glucose concentrations and running a chip array at each concentration.

One can potentially discover all genes in the glucose pathway.

Knocking out a gene → discover all genes that interact with it.
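A sketch of how a knock-out experiment might be screened: compare each gene's level in the wild type and in the knock-out, and keep the genes that change by at least a chosen fold. Gene names, expression values, the 2-fold cutoff, and the pseudocount are all hypothetical. A change in expression only suggests a dependence on the knocked-out gene; it is not proof of a direct interaction.

```python
import math


def affected_genes(wild_type, knockout, min_abs_log2_fc=1.0, pseudocount=1.0):
    """Return {gene: log2 fold change} for genes whose expression changes
    by at least 2**min_abs_log2_fc between wild type and knock-out."""
    hits = {}
    for gene, wt_level in wild_type.items():
        fold_change = math.log2(
            (knockout[gene] + pseudocount) / (wt_level + pseudocount)
        )
        if abs(fold_change) >= min_abs_log2_fc:
            hits[gene] = fold_change
    return hits


# Hypothetical expression levels before and after knocking out one gene.
wild_type = {"geneA": 31.0, "geneB": 15.0, "geneC": 7.0}
knockout = {"geneA": 3.0, "geneB": 16.0, "geneC": 31.0}
hits = affected_genes(wild_type, knockout)
print(hits)
```

Here geneA drops and geneC rises after the knock-out, while geneB is essentially unchanged and is filtered out.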

Analyzing expression of genes can help reveal the gene network of a given organism.

Gene network

Clinical

Does someone have a brain tumor? Expression levels measured in the new sample:

g1   11
g2    4
g3    0

Reference samples:

      wt1  wt2  wt3  wt4  bt1  bt2  bt3  bt4
g1      4    3    5    4    0   22    0   23
g2      7    5    4    6    0    8    0    9
g3      2    3    2    5   60    1   66    1
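A toy sketch of how a new sample could be compared with the reference samples: find the reference whose (g1, g2, g3) vector is closest in Euclidean distance. The nearest-neighbor rule is an assumption made for illustration, the numbers are one reading of the flattened slide tables, and nothing here is a diagnostic procedure.

```python
import math


def nearest_reference(sample, reference):
    """Name of the reference sample with the smallest Euclidean distance."""
    return min(reference, key=lambda name: math.dist(sample, reference[name]))


# (g1, g2, g3) per reference sample, as read from the table.
reference = {
    "wt1": (4, 7, 2), "wt2": (3, 5, 3), "wt3": (5, 4, 2), "wt4": (4, 6, 5),
    "bt1": (0, 0, 60), "bt2": (22, 8, 1), "bt3": (0, 0, 66), "bt4": (23, 9, 1),
}
new_sample = (11, 4, 0)

closest = nearest_reference(new_sample, reference)
print(closest, round(math.dist(new_sample, reference[closest]), 2))
```

With only three genes and a handful of references the answer is fragile; a real classifier would use many genes, repeats, and a validated decision rule.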

Sequencing by hybridization

It was thought that the following procedure could work for sequencing a genome:

1. Make a chip containing all x-mers (e.g., x = 25).

2. Hybridize a genome to the chip.

3. By analyzing all the hybridizations with their overlaps, assemble the genome.

Problem: it doesn’t work.
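The slides do not say why it fails. One well-known difficulty, shown in the sketch below, is that repeated subsequences make the reconstruction ambiguous: two different sequences can contain exactly the same set of x-mers, so the chip readout cannot tell them apart. The toy sequences and x = 3 are chosen for illustration only.

```python
def spectrum(sequence, k):
    """Sorted list of all k-mers in `sequence`, i.e. what an ideal chip
    carrying every k-mer would report after hybridization."""
    return sorted(sequence[i:i + k] for i in range(len(sequence) - k + 1))


seq_a = "ATGCGTGGCA"
seq_b = "ATGGCGTGCA"

same_spectrum = spectrum(seq_a, 3) == spectrum(seq_b, 3)
print(seq_a != seq_b, same_spectrum)
```

Both sequences yield the same eight 3-mers even though the sequences differ. In a real genome the same ambiguity arises wherever a repeat is longer than the probe length.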

ChIP-on-chip: A method for measuring protein-DNA interaction.

Proteins that bind DNA include:

Those responsible for transcription regulation

Transcription factors (TFs)

Replication proteins

Histones…

ChIP-on-chip: The first “ChIP” stands for Chromatin ImmunoPrecipitation and the second “chip” refers to the DNA microarray.

The method is used mostly to detect TF binding sites.

Tiling arrays

Here the chip array should include not only protein coding genes but also control regions, or simply – the entire genome.
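A sketch of how probes for a tiling array could be laid out: fixed-length probes taken at regular steps along the sequence so that neighboring probes overlap. The probe length of 25, the step of 10, and the toy sequence are assumptions for illustration; real designs also filter probes for uniqueness and hybridization properties.

```python
def tiling_probes(sequence, probe_length=25, step=10):
    """Yield (start, probe) pairs that tile `sequence` with overlapping probes."""
    for start in range(0, len(sequence) - probe_length + 1, step):
        yield start, sequence[start:start + probe_length]


region = "ACGT" * 15  # hypothetical 60-nt genomic region
probes = list(tiling_probes(region))

for start, probe in probes:
    print(start, probe)
```

With these settings the last few nucleotides of the region are not covered by any probe; a smaller step, or an extra probe anchored at the end of the sequence, would close that gap.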

Protein-Protein interaction

Some facts:

• Human genome: 20,000-30,000 genes, more than 500,000 proteins. At a given time, 10,000 proteins are present in a cell (the proteome).

• An estimated >80% of proteins interact with other proteins.

• The network includes hubs.
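A hub is a protein with many interaction partners. A minimal sketch of spotting hubs by counting partners per protein; the interaction list and the cutoff of four partners are hypothetical.

```python
from collections import Counter


def degrees(interactions):
    """Count the number of interactions in which each protein takes part."""
    counts = Counter()
    for protein_a, protein_b in interactions:
        counts[protein_a] += 1
        counts[protein_b] += 1
    return counts


# Hypothetical list of pairwise interactions.
interactions = [
    ("P1", "P2"), ("P1", "P3"), ("P1", "P4"),
    ("P1", "P5"), ("P2", "P3"), ("P5", "P6"),
]
hubs = [protein for protein, degree in degrees(interactions).items() if degree >= 4]
print(hubs)
```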

Large scale studies of protein-protein interactions (PPIs) give very noisy data:

40-80% of interactions are false negatives (true interactions that are unidentified).

30-60% of interactions are false positives (interactions that are inferred but are not real).

Method 1: affinity tag purification of complexes in vivo.

Say we want to know what interacts with protein X. We construct a plasmid with the gene coding for X (filled box) fused to a bait (empty box).

In the cell, protein X fused to the bait is expressed, and interacts with some proteins.

The cells are lysed and the protein complex is isolated using a solid support linked to a ligand that can interact with the bait.

Bound proteins are eluted, separated on a gel and identified using mass spectrometry (MS).

The method is biased towards proteins of high abundance.

Method 2: yeast two hybrid system.

Some transcription factors are composed of two domains: the BD, which binds the DNA, and the AD (in red), which activates transcription. The two domains need to interact in order for the gene to be expressed.

In order to check if protein A (bait) interacts with protein B (prey), protein A is expressed fused to the BD, and protein B fused to the AD. Only if A and B interact will the reporter gene be expressed.

Databases of protein-protein interactions:

DIP, IntAct, MINT, MIPS, iHOP

Protein-protein interactions are fundamental for functional annotation.

If X interacts with Y, and Y is known to be related to muscle development, maybe X is also related to muscle development.

“Guilt by association”
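A minimal sketch of guilt by association as a majority vote: an unannotated protein is assigned the function that is most common among its annotated interaction partners. The voting rule, protein names, and annotations are hypothetical, and with noisy interaction data such a prediction is only a hint.

```python
from collections import Counter, defaultdict


def predict_function(protein, interactions, known_function):
    """Most common annotation among the protein's interaction partners,
    or None if no partner is annotated."""
    neighbors = defaultdict(set)
    for a, b in interactions:
        neighbors[a].add(b)
        neighbors[b].add(a)
    votes = Counter(
        known_function[partner]
        for partner in neighbors[protein]
        if partner in known_function
    )
    return votes.most_common(1)[0][0] if votes else None


# Hypothetical interaction list and annotations.
interactions = [("X", "Y"), ("X", "Z"), ("X", "W"), ("W", "V")]
known_function = {
    "Y": "muscle development",
    "Z": "muscle development",
    "W": "DNA repair",
}
prediction = predict_function("X", interactions, known_function)
print(prediction)
```

Two of the three annotated partners of X are related to muscle development, so that is the function suggested for X.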