Intro to Probabilistic Models: PSSMs
Computational Genomics, Lecture 6b
Partially based on slides by Metsada Pasmanik-Chor
Biological Motifs
A large number of biological units with common functions tend to exhibit similarities at the sequence level. These range from very short “motifs”, such as gene splice sites and DNA regulatory binding sites recognized by transcription factors (proteins that bind to the promoter and control gene expression), through microRNAs, all the way to protein families.
Often it is desirable to model such motifs, to enable searching for new instances. Probabilistic models are very useful for this. Today we deal with the PSSM, the simplest such model.
Promoter…
Regulation of Genes
[Figure, three panels. Labels: DNA, Regulatory Element, Gene, Transcription Factor (protein), RNA polymerase (protein); the last panel adds "New protein". Source: www.cs.washington.edu/homes/tompa/papers/binding.ppt]
Motif Logo
• Motifs can mutate on less important bases.
• The five motifs at top right have mutations in positions 3 and 5.
• Representations called motif logos illustrate the conserved regions of a motif.
http://weblogo.berkeley.edu
http://fold.stanford.edu/eblocks/acsearch.html
Position: 1234567
          TGGGGGA
          TGAGAGA
          TGGGGGA
          TGAGAGA
          TGAGGGA
Example: Calmodulin-Binding Motif (calcium-binding proteins)
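The idea that a logo highlights conserved positions can be made concrete with a small sketch. It computes, for the five motif instances above, the per-position information content in bits (2 minus the Shannon entropy for a four-letter DNA alphabet), which is the quantity a sequence logo usually draws as the total stack height. The function name is hypothetical, and the small-sample correction that some logo tools apply is deliberately left out.

```python
import math
from collections import Counter

# The five aligned motif instances from the slide.
motifs = ["TGGGGGA", "TGAGAGA", "TGGGGGA", "TGAGAGA", "TGAGGGA"]

def information_content(seqs):
    """Per-column information content in bits: 2 - Shannon entropy (DNA).

    Assumption: no small-sample correction is applied.
    """
    n = len(seqs)
    bits = []
    for column in zip(*seqs):  # iterate over alignment columns
        counts = Counter(column)
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        bits.append(2.0 - entropy)
    return bits

ic = information_content(motifs)
for pos, b in enumerate(ic, start=1):
    print(f"position {pos}: {b:.3f} bits")
```

Fully conserved columns reach the maximum of 2 bits, while positions 3 and 5, where the instances disagree, score lower.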
PSSM Starting Point
• A gap-less multiple sequence alignment (MSA) of known instances of a given motif.
• The motif is represented by either:
1. Consensus.
2. Position Specific Scoring Matrix (PSSM).
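Both representations can be sketched in a few lines, starting from a gap-less alignment. The example below is one possible construction, not necessarily the one used in the lecture: it takes the most frequent base per column as the consensus, and builds a log-odds PSSM assuming a pseudocount of 1 per base and a uniform background of 0.25. All function names are hypothetical. It then scans a made-up sequence for the best-scoring window, which is how a PSSM is typically used to search for new instances.

```python
import math
from collections import Counter

ALPHABET = "ACGT"
# Gap-less alignment of known motif instances (the five from the logo slide).
motifs = ["TGGGGGA", "TGAGAGA", "TGGGGGA", "TGAGAGA", "TGAGGGA"]

def consensus(seqs):
    """Most frequent base in each alignment column."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))

def build_pssm(seqs, pseudocount=1.0, background=0.25):
    """Log-odds PSSM: one dict {base: log2(freq / background)} per column.

    Assumptions: additive pseudocount, uniform background frequency.
    """
    n = len(seqs)
    pssm = []
    for col in zip(*seqs):
        counts = Counter(col)
        total = n + pseudocount * len(ALPHABET)
        pssm.append({
            base: math.log2(((counts[base] + pseudocount) / total) / background)
            for base in ALPHABET
        })
    return pssm

def score(pssm, window):
    """Sum of per-position log-odds scores for a window of motif length."""
    return sum(col[base] for col, base in zip(pssm, window))

def best_hit(pssm, sequence):
    """Return (start index, score) of the highest-scoring window."""
    width = len(pssm)
    starts = range(len(sequence) - width + 1)
    best = max(starts, key=lambda i: score(pssm, sequence[i:i + width]))
    return best, score(pssm, sequence[best:best + width])

pssm = build_pssm(motifs)
print(consensus(motifs))
print(best_hit(pssm, "CCCTGAGGGACCC"))  # hypothetical query sequence
```

Unlike the consensus, the PSSM keeps the information that positions 3 and 5 tolerate both A and G, so a window such as TGGGAGA still scores well even though it differs from the consensus.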
Consider now a specific “motif server”, called ConSite.
Sequence logos: Visualizing PSSMs
Sequence logos: Visualizing PSSMs (2)