Gene Expression
• Expressed in the transcriptome
• Every eukaryotic genome contains between 5,000 and 60,000 protein-coding genes
• Only a small subset of those genes are transcribed
Gene expression is regulated in several basic ways
• by region (e.g. brain versus kidney)
• in development (e.g. fetal versus adult tissue)
• in dynamic response to environmental signals (e.g. immediate-early response genes)
• in disease states
• by gene activity
Page 157
Expression Databases & Analyses
• UniGene: for the comparison of cDNA libraries. Goals: (1) create one unique entry for each gene, (2) collect all the ESTs associated with each gene
• SAGE: Serial Analysis of Gene Expression library
• DNA microarrays
Fig. 6.3, Page 161
[Figure: a gene with three exons (exon 1, exon 2, exon 3) separated by two introns, drawn 5’ to 3’. Steps shown: transcription; RNA splicing (remove introns); polyadenylation (AAAAA tail at the 3’ end); export to cytoplasm.]
Analysis of gene expression in cDNA libraries
A fundamental approach to studying gene expression is through cDNA libraries.
• Isolate RNA (always from a specific organism, region, and time point)
• Convert RNA to complementary DNA
• Subclone into a vector
• Sequence the cDNA inserts. The resulting sequences are expressed sequence tags (ESTs)
Pages 162-163
[Figure: a vector carrying a cDNA insert]
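The "convert RNA to complementary DNA" step can be pictured at the sequence level. The sketch below is only an illustration: the function name `first_strand_cdna` and the example mRNA fragment are made up here, and real library construction is a laboratory procedure (reverse transcription), not a string operation.

```python
# Hypothetical helper: sequence of the first-strand cDNA copied from an mRNA.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def first_strand_cdna(mrna: str) -> str:
    """Return the DNA strand complementary to an mRNA, written 5' to 3'."""
    # Walk the mRNA from its 3' end and complement each base, so the
    # result reads 5' -> 3'.
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(mrna.upper()))


# Made-up mRNA fragment ending in a short poly(A) tail.
mrna = "AUGGCCAAAAA"
print(first_strand_cdna(mrna))
```

The poly(A) tail of the mRNA appears as a run of T at the 5' end of the complementary strand.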
UniGene: unique genes via ESTs
• Find UniGene at NCBI: www.ncbi.nlm.nih.gov/UniGene
• UniGene clusters contain many ESTs
• UniGene data come from many cDNA libraries. Thus, when you look up a gene in UniGene you get information on its abundance and its regional distribution.
Page 164
Cluster sizes in UniGene
This is a gene with 1 EST associated; the cluster size is 1
Page 164 & Fig. 2.3, Page 23
Cluster sizes in UniGene (human)
Cluster size      Number of clusters
1                 10,400
2                 7,100
3-4               6,800
5-8               5,300
9-16              3,800
17-32             3,100

500-1000          1,500
2000-4000         130
8000-16,000       12
16,000-30,000     3

UniGene build 186, 9/05
Page 164
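A table like this one can be produced by counting the ESTs in each cluster and then grouping the cluster sizes into bins. The sketch below assumes doubling bins (1, 2, 3-4, 5-8, 9-16, 17-32), which match the first six rows of the table; the larger bins shown on the slide do not follow exact powers of two. The EST and cluster identifiers and the function names are hypothetical, not UniGene's own.

```python
from collections import Counter


def cluster_sizes(est_to_cluster):
    """Count how many ESTs are assigned to each cluster."""
    return Counter(est_to_cluster.values())


def size_bin(size):
    """Return the (low, high) doubling bin that contains a cluster size."""
    high = 1
    while high < size:
        high *= 2
    low = high // 2 + 1 if high > 1 else 1
    return (low, high)


# Hypothetical assignment of ESTs to clusters (made-up identifiers).
est_to_cluster = {
    "est01": "clusterA", "est02": "clusterA", "est03": "clusterA",
    "est04": "clusterB",
    "est05": "clusterC", "est06": "clusterC",
}

sizes = cluster_sizes(est_to_cluster)
histogram = Counter(size_bin(n) for n in sizes.values())

for (low, high), count in sorted(histogram.items()):
    label = str(low) if low == high else f"{low}-{high}"
    print(label, count)
```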
Ten largest human UniGene clusters
Cluster size   Gene
22,925         eukary. translation EF (Hs.522463)
22,320         eukary. translation EF (Hs.4395522)
16,562         actin, gamma 1 (Hs.514581)
16,309         GAPDH (Hs.169476)
16,231         actin, beta (Hs.520640)
11,076         ribosomal prot. L3 (Hs.119598)
10,517         dehydrin (Hs.524390)
10,087         enolase 1 (alpha) (Hs.517145)
9,973          ferritin (Hs.433670)
8,966          metastasis associated (Hs.187199)

UniGene build 186, 9/05
Table 6.2, Page 165
Fisher’s exact test provides a p value
Digital differential display (DDD) results in UniGene are assessed for significance using Fisher’s exact test to generate a p value.

$$p = \frac{N_A!\, N_B!\, c!\, C!}{(N_A + N_B)!\; g_{1A}!\; g_{1B}!\; (N_A - g_{1A})!\; (N_B - g_{1B})!}$$

The null hypothesis (that gene 1 is not differentially regulated in a comparison of two libraries) is rejected when p < 0.05/G (where G = the number of UniGene clusters analyzed).
Page 165
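As a rough illustration of this calculation, the sketch below evaluates the factorial expression for one 2x2 table of EST counts and applies the p < 0.05/G cutoff. It assumes g1A and g1B are the EST counts for gene 1 in pools A and B, NA and NB are the total ESTs in each pool, c = g1A + g1B, and C = (NA - g1A) + (NB - g1B); these meanings are inferred from the form of the formula, not stated on the slide. The function names and example counts are hypothetical, and the two-sided sum is one common convention that may differ from what the DDD tool itself reports.

```python
from math import exp, lgamma


def log_factorial(n):
    """log(n!) via the log-gamma function, to avoid huge intermediate numbers."""
    return lgamma(n + 1)


def table_probability(g1a, n_a, g1b, n_b):
    """Probability of one 2x2 table, using the factorial formula from the slide."""
    c = g1a + g1b                      # assumed meaning of c: gene 1 ESTs in both pools
    big_c = (n_a - g1a) + (n_b - g1b)  # assumed meaning of C: ESTs from all other genes
    log_p = (
        log_factorial(n_a) + log_factorial(n_b)
        + log_factorial(c) + log_factorial(big_c)
        - log_factorial(n_a + n_b)
        - log_factorial(g1a) - log_factorial(g1b)
        - log_factorial(n_a - g1a) - log_factorial(n_b - g1b)
    )
    return exp(log_p)


def fisher_two_sided(g1a, n_a, g1b, n_b):
    """Sum the probabilities of all tables no more likely than the observed one."""
    c = g1a + g1b
    p_observed = table_probability(g1a, n_a, g1b, n_b)
    total = 0.0
    # x = possible gene 1 count in pool A, with pool sizes and c held fixed.
    for x in range(max(0, c - n_b), min(c, n_a) + 1):
        p = table_probability(x, n_a, c - x, n_b)
        if p <= p_observed * (1 + 1e-7):  # small tolerance for rounding
            total += p
    return min(total, 1.0)


def rejects_null(p, n_clusters, alpha=0.05):
    """Apply the p < 0.05/G threshold, where G is the number of clusters analyzed."""
    return p < alpha / n_clusters


# Made-up counts: 30 of 10,000 ESTs in pool A versus 5 of 12,000 in pool B.
p = fisher_two_sided(30, 10_000, 5, 12_000)
print(p, rejects_null(p, n_clusters=20_000))
```

Dividing 0.05 by G makes the cutoff stricter as more clusters are tested, which is why a p value that looks small in isolation may still fail to reject the null hypothesis.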
Pitfalls in interpreting cDNA library data
• bias in library construction
• variable depth of sequencing
• library normalization
• error rate in sequencing
• contamination (chimeric sequences)
Pages 166-168
Serial analysis of gene expression (SAGE)
• 9 to 11 base “tags” correspond to genes
• measure of gene expression in different biological samples
• SAGE tags can be compared electronically
Page 169
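Comparing SAGE tags electronically amounts to counting how often each tag occurs in each library and putting the counts on a common scale. The sketch below is a minimal illustration under stated assumptions: the 10-base tags and the function names are made up, and scaling to tags per million is one simple way to adjust for libraries of different sizes, not necessarily what any particular SAGE tool does.

```python
from collections import Counter


def tags_per_million(tags):
    """Scale raw tag counts so libraries of different sizes can be compared."""
    counts = Counter(tags)
    total = sum(counts.values())
    return {tag: 1_000_000 * n / total for tag, n in counts.items()}


def compare_libraries(tags_a, tags_b):
    """Return {tag: (scaled count in A, scaled count in B)} for every tag seen."""
    tpm_a = tags_per_million(tags_a)
    tpm_b = tags_per_million(tags_b)
    all_tags = sorted(set(tpm_a) | set(tpm_b))
    return {tag: (tpm_a.get(tag, 0.0), tpm_b.get(tag, 0.0)) for tag in all_tags}


# Made-up 10-base tags from two hypothetical samples.
library_a = ["AAAAAAAAAA"] * 3 + ["CCCCCCCCCC"]
library_b = ["CCCCCCCCCC"] * 2

for tag, (a, b) in compare_libraries(library_a, library_b).items():
    print(tag, a, b)
```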