
1. Bacterial genomes - genes tightly packed, no introns...

Date post: 04-Jan-2016
Upload: heavynne-mays
Page 1

1. Bacterial genomes - genes tightly packed, no introns...

HOW TO FIND GENES WITHIN A DNA SEQUENCE?

Scan for ORFs (open reading frames)

- check all 6 reading frames (both strands)

- look for significant distance between potential start and stop codon (e.g. 100 codons)

Fig. 5.2

… but when examining short sequences, start codon (or stop codon) might be located further upstream (or downstream)
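The ORF scan described above can be sketched in a few lines of Python. This is a minimal illustration, not the algorithm used by ORF Finder or any other named program; the function names, the coordinate convention and the `starts` parameter are assumptions made for the example. It reports only ORFs that have both a start and an in-frame stop inside the sequence, so (as noted above) an ORF whose start or stop lies outside a short query will be missed.

```python
# Minimal six-frame ORF scan (illustrative sketch only).
STOPS = {"TAA", "TAG", "TGA"}  # standard genetic code stop codons


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(seq, min_codons=100, starts=("ATG",)):
    """Return (strand, frame, start, end, n_codons) for each ORF found.

    Coordinates are 0-based on the scanned strand; `end` is exclusive and
    includes the stop codon. `n_codons` counts the start codon but not the
    stop codon. Only the first start codon after a stop is used.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon in starts:
                    start = i  # open a candidate ORF
                elif start is not None and codon in STOPS:
                    n_codons = (i - start) // 3
                    if n_codons >= min_codons:
                        orfs.append((strand, frame, start, i + 3, n_codons))
                    start = None  # close the ORF and keep scanning
    return orfs


# Toy usage with a very low threshold so that a short ORF is reported.
toy = "CC" + "ATG" + "GCT" * 5 + "TAA" + "G"
toy_orfs = find_orfs(toy, min_codons=3)
```

Passing extra start codons through `starts` is one way to allow for initiation codons other than ATG.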

Page 2

Use computer programs to search for ORFs:

Query: 3 kb sequence

Potential problems?

- if initiation codon other than ATG (relatively rare)

- if overlapping genes (rare)

- if gene contains intron(s)

- if deviation from standard genetic code (can change default)

Page 3

2. Eukaryotic genomes (such as human) - genes usually far apart, long introns & short exons

Fig.5.4

Would an ORF scan work here?

Page 4

Can also use algorithms to look for:

1. Exon-intron boundaries

- “GT-AG” rule, but consensus sequences very short

2. Regulatory motifs - upstream promoters, downstream polyA addition signals…

- but consensus sequences usually very short

3. Codon bias patterns

- synonymous codons are not all used equally

- patterns differ among organisms

Table 5.1, Brown 1st ed.
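As a rough illustration of codon bias, the sketch below counts how often each codon is used relative to its synonymous alternatives in a single coding sequence. It assumes the standard genetic code, an in-frame sequence containing only A, C, G and T, and a hypothetical function name; real codon usage tables are built from many genes of an organism, not from one sequence.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_codon_usage(cds):
    """Return {codon: fraction of its amino acid's codons} for an in-frame CDS."""
    cds = cds.upper()
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    per_amino_acid = defaultdict(int)
    for codon, n in counts.items():
        per_amino_acid[CODON_TABLE[codon]] += n
    return {codon: n / per_amino_acid[CODON_TABLE[codon]]
            for codon, n in counts.items()}


# Toy example: leucine is encoded three times by CTG and once by CTA.
usage = synonymous_codon_usage("CTGCTGCTGCTAAAA")
```

Comparing such fractions between a candidate ORF and known genes of the same organism is one way the bias can be used as evidence for a real gene.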

(see Fig.5.5)

See Fig. 5.10 which shows results from various bioinformatics tools used to analyze 15 kb of human genome

Page 5

4. Homologous sequences in databank

BLAST searches (Basic Local Alignment Search Tool)

www.ncbi.nlm.nih.gov/BLAST/

- search programs to look for similarity between your sequence of interest (protein or DNA) and entries in global data banks

BLASTP – search at protein level

BLASTN – search at nucleotide level

BLASTX – search nt sequence against protein databases (automatic 6-reading frame conceptual translation)

tBLASTN – protein query vs. conceptual translation of DNA database
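The "6-reading frame conceptual translation" mentioned for BLASTX can be sketched as follows. This is only an illustration of the idea under the standard genetic code; the function names and the frame labels (+1 to +3, -1 to -3) are choices made for the example and it is not BLAST's own code.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame_translation(dna):
    """Return a dict mapping frame label to conceptual translation."""
    dna = dna.upper()
    frames = {}
    for strand, s in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(s[offset:])
    return frames


frames = six_frame_translation("ATGGCCTAA")
```

Each of the six protein strings would then be compared against the protein database, so a coding region can be detected without knowing its strand or frame in advance.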

Page 6

Query = yeast mitochondrial ribosomal protein L8 (238 aa)

Fungal

Bacterial

Page 7

E-values: statistical measure of likelihood that sequences with this degree of similarity occur randomly

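i.e. reflects number of hits with at least this score expected by chance in a database of this size (the lower the E-value, the more significant the match)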
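For orientation, BLAST E-values are usually described with the Karlin-Altschul relation E = K·m·n·e^(−λS), or equivalently E = m·n·2^(−S') for a bit score S' = (λS − ln K)/ln 2, where m and n are the query and database lengths. The sketch below just evaluates these formulas. The λ and K values in the usage lines are placeholders chosen for illustration (real values depend on the scoring matrix and gap penalties), and the database length is hypothetical; BLAST itself also applies length corrections that are omitted here.

```python
import math


def bit_score(raw_score, lam, k):
    """Convert a raw alignment score to a bit score: (lam*S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expected_hits(bits, query_len, db_len):
    """E-value: number of hits with at least this bit score expected by chance."""
    return query_len * db_len * 2.0 ** (-bits)


# Illustrative numbers only: a 238 aa query against a hypothetical
# database of 10**9 residues, with placeholder lam and k.
bits = bit_score(raw_score=100, lam=0.3, k=0.1)
e_value = expected_hits(bits, query_len=238, db_len=10**9)
```

Because E scales with m·n, the same alignment score gives a larger (less significant) E-value in a larger database.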

Nomenclature may differ among organisms - called L17 in Streptococcus but L8 in yeast

Page 8

Query = yeast mitochondrial ribosomal protein L8 gene (including promoter & UTRs)

What if this search was done at nucleotide (instead of protein) level?

Only got “hits” with other yeast entries, in this case

Homologous genes from divergent organisms typically show greater similarity at amino acid level than at nt level

Degeneracy of genetic code

Codon bias among organisms

Probability of specific stretch of nucleotides occurring by random chance (“spurious hits”) is higher than for the same length of amino acids
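The effect of code degeneracy can be seen with a toy pair of coding sequences that differ only by synonymous changes. The two sequences below are invented for the example (they are not from the yeast L8 search), and the helper names are assumptions; the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds):
    """Translate an in-frame coding sequence (A, C, G, T only)."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def percent_identity(a, b):
    """Percent identical positions between two equal-length, ungapped strings."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


# Invented sequences: every codon pair is synonymous (Leu, Ala, Lys, Gly).
seq1 = "CTGGCCAAAGGT"
seq2 = "TTAGCGAAGGGC"
nt_identity = percent_identity(seq1, seq2)                        # 7 of 12 nt match
aa_identity = percent_identity(translate(seq1), translate(seq2))  # all 4 aa match
```

Here the nucleotide identity is well below the amino acid identity, which is the pattern expected when two organisms with different codon preferences encode the same protein.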

Page 9

To illustrate the power of amino acid level searches, text shows 2 sequences with 76% nt identity … but only 28% aa identity

Fig. 5.18

But it’s a rather artificial example… because if 2 DNA stretches of 300 bp or so (normal default length in ORF Finder) showed 76% nt identity, it’s very improbable that such similarity occurred by chance
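A back-of-envelope check of the claim above: if two unrelated sequences are modelled as independent random strings with equal base frequencies (a simplifying assumption), each aligned position matches with probability 0.25, and the chance of at least 76% identity over 300 bp is a binomial tail. The function name is hypothetical, and the model ignores gaps and the fact that a database search tries many alignments.

```python
from math import comb


def prob_at_least(matches, length, p=0.25):
    """P(at least `matches` identical positions out of `length`), binomial model."""
    return sum(comb(length, k) * p**k * (1 - p)**(length - k)
               for k in range(matches, length + 1))


# 76% of 300 positions is 228 matches.
p_chance = prob_at_least(228, 300)
```

Under this model the probability is vanishingly small, which is why the slide calls the 76% nt identity example artificial.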

Page 10

HOMOLOGOUS GENES (share common evolutionary origin)

1. Orthologous - homologous genes in different organisms

(e.g. -globin genes from mouse and human)

2. Paralogous - homologous genes in same organism

(e.g. multi-gene family members, -globin and -globin from mouse)

Two genes are either evolutionarily related or they are not … so instead of “…% homologous”, use “…% identity”

(p.145)

Page 11

ARE TWO SEQUENCES HOMOLOGOUS OR INDEPENDENT IN ORIGIN?

Factors to consider:

1. Length of sequence

- short sequences more likely to occur by chance

2. Base composition

- highly biased (e.g. if only AT) more likely to occur by chance (“low complexity regions”)

3. Similarity at amino acid level (if protein-coding region)

- high % identity is strong argument for homology

- usually implies common protein function

- nt changes such that minimal effect on aa sequence
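One simple way to put a number on biased base composition is the Shannon entropy of the sequence: 2 bits per position when all four bases are equally frequent, lower when the composition is skewed. This is a minimal sketch with a hypothetical function name; it is not the filter that any particular search program uses to mask low complexity regions.

```python
import math
from collections import Counter


def shannon_entropy(seq):
    """Shannon entropy (bits per position) of the symbol composition of seq."""
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq.upper())
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


balanced = shannon_entropy("ACGT" * 5)   # all four bases equally frequent
at_only = shannon_entropy("ATATATAT")    # only A and T
single = shannon_entropy("AAAAAAAA")     # one base only
```

A match within a low-entropy stretch is weaker evidence for homology than a match of the same length within a region of balanced composition.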

Page 12

Comparison of homologous regions from multiple genomes

MultiPipMaker program (percent identity plot)

- gives overview of sequence relationships for genomic region shared among organisms

- score of % nt sequence similarity (blocks compared vs. reference sequence)

Human chr 7 (1.8 Mbp region)

“Numbered boxes correspond to exons”

Thomas Nature 424:788, 2003
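The idea behind a percent identity plot can be approximated with a sliding window over two sequences that have already been aligned. This is a simplified stand-in and not MultiPipMaker's actual method; the function name, the window and step sizes, and the use of "-" for gaps are assumptions made for the sketch.

```python
def windowed_identity(ref, other, window=100, step=50):
    """Return [(window_start, percent_identity), ...] along an aligned pair.

    `ref` and `other` must be aligned strings of equal length; a gap ("-")
    never counts as a match.
    """
    if len(ref) != len(other):
        raise ValueError("aligned sequences must have equal length")
    points = []
    for start in range(0, len(ref) - window + 1, step):
        pairs = zip(ref[start:start + window], other[start:start + window])
        matches = sum(a == b and a != "-" for a, b in pairs)
        points.append((start, 100.0 * matches / window))
    return points


# Toy example: first half conserved, second half completely different.
profile = windowed_identity("AAAAAAAAAA", "AAAAATTTTT", window=5, step=5)
```

Plotting such values against position in the reference sequence shows conserved blocks (for example exons) as peaks.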

Page 13

EXPERIMENTAL TECHNIQUES TO FIND GENES

1. Zoo blot (Southern) analysis

- find regions homologous to DNA from other organisms

- to determine presence/absence of gene among different organisms

Heterologous hybridization - use conditions of “reduced stringency” (e.g. lower temp) so that duplex hybrids with some mismatches are stable

Fig. 5.12

Interpretation of data shown in figure?

Page 14

2. Northern blot analysis - to identify expressed regions of genomes (detect transcripts)

Fig. 5.11

Lanes: Brain, Kidney, Heart, Lung, Liver

gene X probe

Probe: tagged DNA (e.g. PCR product, restriction fragment, cDNA clone…) in denatured form or oligomer or antisense (synthetic) RNA …

(but note that many identical copies of that particular mRNA are present on blot)

In situ hybridization - to determine cellular location

35S-labeled -myosin antisense probe hybridizing to heart ventricle in 13-day embryonic mouse

Strachan & Read Fig. 5.17

Page 15

Only a subset of genes is expressed at a given time and mRNA levels can vary greatly among genes

Some protein genes are constitutively expressed …

“housekeeping gene” products needed at all times

… whereas others are differentially expressed

- during development

- in specific tissue type

- in response to environmental cues

~10,000 – 15,000 different mRNAs present in “typical” mammalian cell type under given condition

(may be ~ 20,000 different proteins present)

Aside: RNA-sequencing studies suggest ~ 8000 genes ubiquitously expressed in human tissues (Ramskold PLoS 2009)

(higher than predicted from microarray analysis, to be discussed in Topic 7)

Page 16

EXPERIMENTAL TECHNIQUES TO FIND CODING REGIONS WITHIN GENES

1. Sequencing of cDNA (or EST) clones

... & compare to genomic sequences to determine positions of introns

ESTs - short sequences obtained by sequence analysis of cDNA clones

… but if low abundance mRNA may not be in bank

Fig. 3.36

[Diagram: mRNA with 5’ cap and poly(A) tail (AAAAAAAAAn); strand ends labelled 3’ and 5’]

e.g. for primer can use mixture of “anchored” oligo(dT)s with A, C or G in the 3’ position

… or cDNA may not be full-length

If so, which end would you expect to be missing?
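The “anchored” oligo(dT) primer mixture mentioned on this slide can be written out explicitly. The run of 18 T residues is an assumed, illustrative length (the slide does not give one), and the function name is hypothetical.

```python
def anchored_oligo_dt(t_length=18, anchors="ACG"):
    """Return the primer mixture: a run of T followed by one non-T 3' base.

    Sequences are written 5' to 3'. `t_length` is an illustrative choice.
    """
    return ["T" * t_length + base for base in anchors]


primers = anchored_oligo_dt()
```

Because the 3’ base cannot pair with A, each primer is expected to anneal where the poly(A) tail meets the rest of the mRNA, not at arbitrary positions within the tail.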

Page 17

Human phosphatidylinositol glycan gene (chromosome 18)

- additional info from RNA level data

~60% of RefSeq genes could be extended at 5’ and 3’ ends (based on additional EST data = UTRs)

Nusbaum Nature 437:551, 2005 (Fig S2)

RefSeq: gene data agreed upon by everyone

Page 18

2. To obtain sequence info corresponding to termini of mRNAs:

RACE – rapid amplification of cDNA ends

“specialized” RT-PCR strategy

Fig. 5.13

5’ RACE

where NNN… might be restriction site (e.g. to aid in cloning RACE product)

- mapping 5’ end of mRNA useful in locating position of promoter

- promoter immediately upstream of transcription start site

How would you carry out 3’ RACE (to determine exact position of 3’end of mRNA)?

