
An Introduction to Bioinformatics

Date post: 02-Feb-2016
Upload: shanta
Page 1: An Introduction to Bioinformatics

An Introduction to Bioinformatics

Finding genes in prokaryotes

Page 2: An Introduction to Bioinformatics

AIMS

To establish the concept of ORFs and their relationship to genes

To describe the features used by software to find ORFs/genes

To become familiar with Web-based programmes used to find ORFs/genes

OBJECTIVES

To be able to distinguish between the concepts of ORF and gene

Use ORF Finder to find ORFs in prokaryotic nucleotide sequences

Page 3: An Introduction to Bioinformatics

Usually the primary challenge that follows the sequencing of anything from a small segment of DNA to a complete genome is to establish the location of functional elements such as:

genes (intron/exon boundaries), promoters, terminators, etc.

DNA sequences that may potentially encode proteins are called Open Reading Frames (ORFs)

The situation in prokaryotes is relatively straightforward since scarcely any eubacterial and archaeal genes contain introns

Page 4: An Introduction to Bioinformatics

FINDING ORFs

The simplest method in prokaryotes is to scan the DNA for start and stop codons

The DNA is double stranded and each strand has three potential reading frames (codons are groups of 3 bases)

THE CAT ATE THE RAT Frame 1

T HEC ATA TET HER AT Frame 2

TH ECA TAT ETH ERA T Frame 3

The scan must look at all 6 reading frames
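
As a rough illustration of the six-frame scan, the Python sketch below splits a DNA string into codons for the three forward frames and the three frames of the reverse complement. The function names and the frame labels (+1 to -3) are illustrative choices, not taken from any particular tool.

```python
def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def codons(seq: str, offset: int) -> list[str]:
    """Split seq into complete 3-base codons, starting at the given offset."""
    return [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]


def six_frames(dna: str) -> dict[str, list[str]]:
    """Codons for frames +1..+3 (given strand) and -1..-3 (reverse complement)."""
    forward = dna.upper()
    reverse = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = codons(forward, offset)
        frames[f"-{offset + 1}"] = codons(reverse, offset)
    return frames


for label, frame in six_frames("ATGAAATAG").items():
    print(label, " ".join(frame))
```

Incomplete codons at the end of a frame are simply dropped, which mirrors the trailing "AT" and "T" in the CAT/RAT example.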

Page 5: An Introduction to Bioinformatics

Any region of DNA between a start codon and a stop codon in the same reading frame could potentially code for a polypeptide and is therefore an ORF

Start: AUG (methionine)    Stop: UAA, UAG, UGA

Small potential coding sequences like this will occur frequently by chance; therefore, the longer an ORF is, the more likely it is to represent a real coding region (a gene)
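
A minimal sketch of such a scan in Python, assuming DNA input (so the start codon is ATG and the stops are TAA, TAG and TGA) and the standard genetic code. The function name find_orfs, the min_codons parameter and the tuple layout are illustrative. Coordinates for the "-" strand refer to the reverse-complemented sequence.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(dna: str, min_codons: int = 3):
    """Return (strand, frame, start, end, sequence) for each ORF found.

    An ORF here runs from the first ATG after the previous stop up to and
    including the next in-frame stop codon. Lengths count the stop codon.
    """
    orfs = []
    for strand, seq in (("+", dna.upper()), ("-", reverse_complement(dna))):
        for frame in range(3):
            start = None  # index of the currently open ATG, if any
            for i in range(frame, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if start is None and codon == START:
                    start = i
                elif start is not None and codon in STOPS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, frame + 1, start, i + 3, seq[start:i + 3]))
                    start = None
    return orfs


print(find_orfs("CCATGAAATTTTAGGG"))
```

Raising min_codons discards short chance ORFs, at the cost of missing genuinely small genes. Because only the first ATG is reported, a true start codon that lies further inside the ORF is not identified.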

Problems

Small genes may be missed

The actual start codon may be internal to the ORF

There may be overlapping genes

Page 6: An Introduction to Bioinformatics

The simplest tool for finding ORFs is ORF Finder at NCBI

It simply scans all 6 reading frames and shows the position of the ORFs which are greater than a user-defined minimum size

The genetic code used for the analysis can be altered by the user

This would be important if, e.g., mitochondrial or ciliate nuclear DNA were being analysed
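
To illustrate why the choice of genetic code matters, the Python sketch below compares where an ORF ends under three stop-codon sets. The sets follow the NCBI translation tables named in the keys (in the vertebrate mitochondrial code TGA encodes tryptophan; in the ciliate nuclear code TAA and TAG encode glutamine). The function orf_length and the test sequences are made up for illustration and are not part of ORF Finder.

```python
STOP_CODONS = {
    "standard (table 1)": {"TAA", "TAG", "TGA"},
    "vertebrate mitochondrial (table 2)": {"TAA", "TAG", "AGA", "AGG"},
    "ciliate nuclear (table 6)": {"TGA"},
}


def orf_length(dna: str, stops: set[str]):
    """Bases from the first codon through the first in-frame stop, or None."""
    dna = dna.upper()
    for i in range(0, len(dna) - 2, 3):
        if dna[i:i + 3] in stops:
            return i + 3
    return None


# Hypothetical sequences, each read in frame 1 from the ATG.
for seq in ("ATGTGGTGAAAATAA", "ATGTAAGGGTGA"):
    for name, stops in STOP_CODONS.items():
        print(seq, name, orf_length(seq, stops))
```

The same sequence therefore yields ORFs of different length depending on the code selected.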

Page 16: An Introduction to Bioinformatics

To overcome the limitations of ORF Finder, more sophisticated programmes detect compositional biases to increase the reliability of gene detection

These compositional biases are regular, though very diffuse, and arise for a variety of reasons:

in many organisms there is a detectable preference for G or C over A and T in the third ("wobble") position in a codon

organisms do not use synonymous codons with equal frequency; consequently there is a codon bias

there is an unequal usage of amino acids in proteins, sufficient to cause a bias in all three positions of codons and increase the overall codon bias

Page 17: An Introduction to Bioinformatics

the %GC content of the first two codon positions of the universal genetic code is approximately 50%; therefore, organisms which have a low or high %GC content will exhibit a marked bias at the third position of codons to achieve their overall %GC content
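
These biases can be measured directly from a candidate coding sequence. The Python sketch below computes the GC fraction at each codon position and counts codon usage; the function names and the example sequence are illustrative assumptions, and real programs combine such statistics in more elaborate ways.

```python
from collections import Counter


def gc_by_codon_position(cds: str) -> list[float]:
    """GC fraction at codon positions 1, 2 and 3 of an in-frame sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    n_codons = usable // 3
    if n_codons == 0:
        return [0.0, 0.0, 0.0]
    counts = [0, 0, 0]
    for i in range(usable):
        if cds[i] in "GC":
            counts[i % 3] += 1
    return [c / n_codons for c in counts]


def codon_usage(cds: str) -> Counter:
    """Count how often each complete codon occurs in an in-frame sequence."""
    cds = cds.upper()
    return Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))


example = "ATGGCCAAC"  # hypothetical in-frame fragment: ATG GCC AAC
print(gc_by_codon_position(example))
print(codon_usage(example).most_common())
```

A third-position GC fraction well above or below that of the first two positions is the kind of signal that separates a true reading frame from the other five.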

The most recent approaches to using compositional features to distinguish coding from non-coding regions employ ‘Markov models’

such approaches include the popular GENEMARK and GLIMMER programs
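
The idea behind Markov-model scoring can be sketched in a few lines of Python. This is a deliberately simplified first-order model with add-one pseudocounts, trained on made-up toy sequences. It is not the method implemented in GENEMARK or GLIMMER, which use more elaborate models (inhomogeneous and interpolated Markov models, respectively). All names below are illustrative.

```python
import math

ALPHABET = "ACGT"


def train(sequences):
    """Estimate P(next base | current base) with add-one pseudocounts."""
    counts = {a: {b: 1 for b in ALPHABET} for a in ALPHABET}
    for seq in sequences:
        seq = seq.upper()
        for a, b in zip(seq, seq[1:]):
            if a in counts and b in counts[a]:  # skip ambiguous bases such as N
                counts[a][b] += 1
    return {
        a: {b: counts[a][b] / sum(counts[a].values()) for b in ALPHABET}
        for a in ALPHABET
    }


def log_odds(seq, coding, noncoding):
    """Sum of log(P_coding / P_noncoding) over all adjacent base pairs."""
    seq = seq.upper()
    score = 0.0
    for a, b in zip(seq, seq[1:]):
        if a in coding and b in coding[a]:
            score += math.log(coding[a][b] / noncoding[a][b])
    return score


# Toy training sets, invented purely to make the example run.
coding_model = train(["GCGCGCGCGC"])
noncoding_model = train(["ATATATATAT"])
print(log_odds("GCGCGC", coding_model, noncoding_model))
print(log_odds("ATATAT", coding_model, noncoding_model))
```

A positive score means the sequence looks more like the coding training set than the non-coding one; a negative score means the opposite.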

Page 22: An Introduction to Bioinformatics

Finding Genes in Eukaryotes

An Introduction to Bioinformatics

Page 23: An Introduction to Bioinformatics

AIMS

To establish the concept of ORFs and their relationship to genes

To describe the features used by software to find ORFs/genes

To become familiar with Web-based programmes used to find ORFs/genes

OBJECTIVES

To be able to distinguish between the concepts of ORF and gene

Use ORF Finder to find ORFs in prokaryotic nucleotide sequences

To describe the complications of the eukaryote “signals”

To be aware of the Web-based programmes

To be able to use the eukaryote programmes for a number of organisms

Page 24: An Introduction to Bioinformatics

Eukaryotes are organisms whose cells have a membrane-bound nucleus and many specialised structures located within the cell boundary.

In these organisms, genetic material is organized into chromosomes that reside in the nucleus.

Page 25: An Introduction to Bioinformatics

Principles

• Content - codon usage
– often species or class specific

• Signals - PWMs (position weight matrices)
– principle is the same, signals are different
– complication of introns/exons
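
Signal detection with a position weight matrix (PWM) can be sketched as follows in Python. The aligned example sites are invented TATA-like strings, the background is assumed to be uniform (0.25 per base), and the function names are illustrative. Input sequences are assumed to contain only A, C, G and T.

```python
import math


def build_pwm(sites, pseudocount=1.0):
    """Log2-odds PWM from equal-length aligned sites, uniform background."""
    total = len(sites) + 4 * pseudocount
    pwm = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / 0.25)
            for base in "ACGT"
        })
    return pwm


def score_window(window, pwm):
    """Sum the per-position scores for one window of the PWM's width."""
    return sum(column[base] for column, base in zip(pwm, window))


def best_hit(seq, pwm):
    """Return (position, score) of the highest-scoring window in seq."""
    width = len(pwm)
    positions = range(len(seq) - width + 1)
    best = max(positions, key=lambda i: score_window(seq[i:i + width], pwm))
    return best, score_window(seq[best:best + width], pwm)


sites = ["TATAAA", "TATAAA", "TATATA", "TATAAG"]  # hypothetical aligned sites
pwm = build_pwm(sites)
print(best_hit("GCGCGCTATAAAGCGC", pwm))
```

The same scoring scheme applies to any signal; only the training sites, and hence the matrix, change.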

Page 26: An Introduction to Bioinformatics

Eukaryotic promoter

[Diagram: promoter elements, 5’ to 3’: GC box, CAAT box, TATA box; positions marked: -110, -40, -25; +1 = start of the mRNA]

In addition - transcription factor binding sites

Genes can be enormous!

Controlled by “distant” enhancers

Page 27: An Introduction to Bioinformatics

Signals on the mRNA

Kozak sequence at the translational start (AUG)

STOP

Polyadenylation sequence (AAUAAA), ~12 bp upstream of the polyA tail (AAAAA…)

Page 28: An Introduction to Bioinformatics

Introns and Exons

Chicken 12 collagen gene: 38 kb, > 50 introns

Muscular dystrophy gene is 2.5 Mb and has ? exons!

Page 29: An Introduction to Bioinformatics

Splicing signals

5’ exon  (C/A)AG | GT(A/G)AGT ... intron ... (T/C)>11 N (C/T)AG | G  3’ exon

GT-AG rule
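
A naive use of the GT-AG rule is to list every stretch that begins with GT and ends with AG. The Python sketch below does exactly that; the function name, the min_len parameter and the test sequence are illustrative assumptions, and no real gene finder relies on this rule alone.

```python
def candidate_introns(dna: str, min_len: int = 20):
    """Return (start, end) pairs where dna[start:end] begins GT and ends AG."""
    dna = dna.upper()
    donors = [i for i in range(len(dna) - 1) if dna[i:i + 2] == "GT"]
    acceptors = [i + 2 for i in range(len(dna) - 1) if dna[i:i + 2] == "AG"]
    return [(d, a) for d in donors for a in acceptors if a - d >= min_len]


# Hypothetical sequence: exon, 20-base intron (GT...AG), exon.
dna = "ATGCC" + "GTAAGTCCCCCCCCTTTCAG" + "GCATAA"
for start, end in candidate_introns(dna, min_len=10):
    print(start, end, dna[start:end])
```

Even this short sequence yields more than one candidate, because GT and AG dinucleotides are common. This is why the extended consensus around each splice site and content statistics are needed to choose between candidates.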

Page 30: An Introduction to Bioinformatics

Exon finding

• Initial exons, from the initiation codon to the first splice site;

• Internal exons from splice site to splice site;

• Terminal exons from splice site to stop codon;

• Single exons corresponding to uninterrupted, intronless genes, i.e., running from initiation codon to stop codon.

Page 31: An Introduction to Bioinformatics

Integrated Gene Parsing

• Search for signals

• Perform a content analysis

• Define the intron/exon boundaries

Page 32: An Introduction to Bioinformatics

Gene finding web sites

http://www.tigr.org/~salzberg/appendixa.html

>25 listed sites

GENSCAN

FGENES
