
Bioinformatics

Date post: 25-Feb-2016
Description:
2010-2011. Bioinformatics. Lecture 3: Finding Motifs. Dr. Aladdin Hamwieh, Khalid Al-shamaa, Abdulqader Jighly. Aleppo University, Faculty of Technical Engineering, Department of Biotechnology. Main Lines: Definition, Motif types, Motifs problem, Motifs: Profiles and Consensus
Transcript
Page 1: Bioinformatics

Bioinformatics

Dr. Aladdin Hamwieh, Khalid Al-shamaa, Abdulqader Jighly

2010-2011

Lecture 3: Finding Motifs

Aleppo University
Faculty of Technical Engineering
Department of Biotechnology

Page 2: Bioinformatics

Main Lines

• Definition
• Motif types
• Motifs problem
• Motifs: Profiles and Consensus
• Motif Logo
• Motif Search in Local Database

Page 3: Bioinformatics

Definition

• A motif is a short conserved sequence pattern associated with distinct functions of a protein or DNA.

Page 4: Bioinformatics

Motif Types

1. Regulatory sequences

Page 5: Bioinformatics

Combinatorial Gene Regulation

• A microarray experiment showed that when gene X is knocked out, 20 other genes are not expressed

– How can one gene have such drastic effects?

Page 6: Bioinformatics

Combinatorial Gene Regulation

• Gene X encodes a regulatory protein, a.k.a. a transcription factor (TF)

• The 20 unexpressed genes rely on gene X’s TF to induce transcription

• A single TF may regulate multiple genes

Regulatory Protein

Page 7: Bioinformatics

• Every gene contains a regulatory region (RR), typically stretching 100-1000 bp upstream of the transcriptional start site

• Located within the RR are the Transcription Factor Binding Sites (TFBS), also known as motifs, specific for a given transcription factor

• TFs influence gene expression by binding to a specific location in the respective gene’s regulatory region: the TFBS

Regulatory Regions

Page 8: Bioinformatics

• A TFBS can be located anywhere within the Regulatory Region.

• TFBS may vary slightly across different regulatory regions since non-essential bases could mutate

Transcription Factor Binding Sites

Page 9: Bioinformatics

[Figure: five genes, each preceded by a motif instance in its upstream region: ATCCCG, TTCCGG, ATCCCG, ATGCCG, ATGCCC]

Motifs and Transcriptional Start Sites
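The point that binding sites vary only at non-essential bases can be checked on the five instances from this slide. The following is a minimal sketch; the function name `conserved_positions` is illustrative and not part of the lecture.

```python
# Motif instances as read off the slide, one per gene.
instances = ["ATCCCG", "TTCCGG", "ATCCCG", "ATGCCG", "ATGCCC"]


def conserved_positions(sites):
    """Return the 1-based positions where every site carries the same base."""
    return [
        i + 1
        for i, column in enumerate(zip(*sites))
        if len(set(column)) == 1
    ]


print(conserved_positions(instances))  # [2, 4]
```

Only positions 2 and 4 are identical in all five instances; every other position tolerates at least one substitution.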

Page 10: Bioinformatics

[Figure: promoter layout upstream of the transcription start site]

-35 hexamer (TTGACA), spacer of 15 - 19 bases, -10 hexamer (TATAAT), interval of 5 - 9 bases, transcription start site

A weight matrix contains more information (based on ~450 known promoters)

-35 hexamer
      1    2    3    4    5    6
A    0.1  0.1  0.1  0.5  0.2  0.5
T    0.7  0.7  0.2  0.2  0.2  0.2
G    0.1  0.1  0.5  0.1  0.1  0.2
C    0.1  0.1  0.2  0.2  0.5  0.1

-10 hexamer
      1    2    3    4    5    6
A    0.1  0.7  0.2  0.6  0.5  0.1
T    0.7  0.1  0.5  0.2  0.2  0.8
G    0.1  0.1  0.1  0.1  0.1  0.0
C    0.1  0.1  0.2  0.1  0.1  0.1

Consensus considerations
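To illustrate why a weight matrix carries more information than a consensus string, the sketch below scores hexamers against the -35 matrix. It assumes that the first four rows of numbers on the slide form the -35 matrix in A, T, G, C order and that positions are independent. The names `PWM_35`, `site_probability` and `best_site` are illustrative.

```python
import math

# Weight matrix for the -35 hexamer, rows A, T, G, C and columns 1-6.
# Assumption: this row assignment is reconstructed from the slide's numbers.
PWM_35 = {
    "A": [0.1, 0.1, 0.1, 0.5, 0.2, 0.5],
    "T": [0.7, 0.7, 0.2, 0.2, 0.2, 0.2],
    "G": [0.1, 0.1, 0.5, 0.1, 0.1, 0.2],
    "C": [0.1, 0.1, 0.2, 0.2, 0.5, 0.1],
}


def site_probability(pwm, site):
    """Probability of `site` under the matrix, treating positions as independent."""
    return math.prod(pwm[base][i] for i, base in enumerate(site.upper()))


def best_site(pwm):
    """Most probable site: the highest-weight base in each column."""
    width = len(next(iter(pwm.values())))
    return "".join(max(pwm, key=lambda b: pwm[b][i]) for i in range(width))


print(best_site(PWM_35))  # TTGACA
```

The consensus treats TTGACC and GGGGGG alike as non-matches, whereas the matrix ranks TTGACC far above GGGGGG because five of its six bases are the preferred ones.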

Page 11: Bioinformatics

• GAL4 in Yeast
  – Activator of galactose-induced genes (convert galactose to glucose)
  – Protein structure determines motif
    • DNA-protein interactions require certain bases at specified locations
    • Motif reflects homodimer structure

Example

Page 12: Bioinformatics

Motif Types

2. Motifs in protein structure

Page 13: Bioinformatics

Importance

• Functional relationships between proteins cannot be distinguished through a simple BLAST or FASTA database search.

• Proteins often perform multiple functions that cannot be fully described using a single annotation.

• To resolve these issues, identification of the motifs and domains becomes very useful.

Page 14: Bioinformatics

atgaccgggatactgataccgtatttggcctaggcgtacacattagataaacgtatgaagtacgttagactcggcgccgccg

acccctattttttgagcagatttagtgacctggaaaaaaaatttgagtacaaaacttttccgaatactgggcataaggtaca

tgagtatccctgggatgacttttgggaacactatagtgctctcccgatttttgaatatgtaggatcattcgccagggtccga

gctgagaattggatgaccttgtaagtgttttccacgcaatcgcgaaccaacgcggacccaaaggcaagaccgataaaggaga

tcccttttgcggtaatgtgccgggaggctggttacgtagggaagccctaacggacttaatggcccacttagtccacttatag

gtcaatcatgttcttgtgaatggatttttaactgagggcatagaccgcttggcgcacccaaattcagtgtgggcgagcgcaa

cggttttggcccttgttagaggcccccgtactgatggaaactttcaattatgagagagctaatctatcgcgtgcgtgttcat

aacttgagttggtttcgaaaatgctctggggcacatacaagaggagtcttccttatcagttaatgctgtatgacactatgta

ttggcccattggctaaaagcccaacttgacaaatggaagatagaatccttgcatttcaacgtatgccgaaccgaaagggaag

ctggtgagcaacgacagattcttacgtgcattagctcgcttccggggatctaatagcacgaagcttctgggtactgatagca

Random Sample

Page 15: Bioinformatics

Implanting Motif AAAAAAAAGGGGGGG

atgaccgggatactgatAAAAAAAAGGGGGGGggcgtacacattagataaacgtatgaagtacgttagactcggcgccgccg

acccctattttttgagcagatttagtgacctggaaaaaaaatttgagtacaaaacttttccgaataAAAAAAAAGGGGGGGa

tgagtatccctgggatgacttAAAAAAAAGGGGGGGtgctctcccgatttttgaatatgtaggatcattcgccagggtccga

gctgagaattggatgAAAAAAAAGGGGGGGtccacgcaatcgcgaaccaacgcggacccaaaggcaagaccgataaaggaga

tcccttttgcggtaatgtgccgggaggctggttacgtagggaagccctaacggacttaatAAAAAAAAGGGGGGGcttatag

gtcaatcatgttcttgtgaatggatttAAAAAAAAGGGGGGGgaccgcttggcgcacccaaattcagtgtgggcgagcgcaa

cggttttggcccttgttagaggcccccgtAAAAAAAAGGGGGGGcaattatgagagagctaatctatcgcgtgcgtgttcat

aacttgagttAAAAAAAAGGGGGGGctggggcacatacaagaggagtcttccttatcagttaatgctgtatgacactatgta

ttggcccattggctaaaagcccaacttgacaaatggaagatagaatccttgcatAAAAAAAAGGGGGGGaccgaaagggaag

ctggtgagcaacgacagattcttacgtgcattagctcgcttccggggatctaatagcacgaagcttAAAAAAAAGGGGGGGa
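Implanting a motif, as done on this slide, overwrites a stretch of the background sequence without changing its length. The helper below is a minimal sketch; the name `implant` and the 0-based offset are assumptions, and the example uses only the opening bases of the first sample sequence.

```python
MOTIF = "A" * 8 + "G" * 7  # the 15-base motif implanted on this slide


def implant(sequence, motif, position):
    """Overwrite `sequence` with `motif` starting at 0-based `position`."""
    if position < 0 or position + len(motif) > len(sequence):
        raise ValueError("motif does not fit at this position")
    return sequence[:position] + motif + sequence[position + len(motif):]


# Opening bases of the first random sequence; the motif is written at offset 17.
prefix = "atgaccgggatactgataccgtatttggcctaggcgtacac"
print(implant(prefix, MOTIF, 17))
# atgaccgggatactgatAAAAAAAAGGGGGGGggcgtacac
```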

Page 16: Bioinformatics

• Hard to identify
  – Relatively short sequences (as small as 6 bases)
  – Many positions not well conserved

• Factors improving identification
  – Usually localized in certain proximity of a gene (search within 3 kb upstream)
  – Some positions highly conserved
  – Use other data (Microarray?)

The Challenge

Page 17: Bioinformatics

• Find a motif in a sample of:
  – 20 “random” sequences (e.g. 600 nt long)
  – each sequence containing an implanted pattern of length 15
  – each pattern appearing with 4 mismatches, i.e. as a (15,4) motif

Challenge Problem
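A test instance of the challenge problem can be generated along the following lines. This is a sketch under assumptions: backgrounds are uniform random bases, the implant position is uniform, and each copy is mutated at exactly 4 positions. The function names and the `seed` parameter are illustrative.

```python
import random


def mutate(motif, mismatches, rng):
    """Return a copy of `motif` with exactly `mismatches` positions changed."""
    letters = list(motif)
    for i in rng.sample(range(len(motif)), mismatches):
        # Pick a base different from the current one so the position really changes.
        letters[i] = rng.choice([b for b in "ACGT" if b != letters[i]])
    return "".join(letters)


def challenge_sample(motif, t=20, n=600, mismatches=4, seed=0):
    """Build t random sequences of length n, each holding one mutated motif copy."""
    rng = random.Random(seed)
    sample = []
    for _ in range(t):
        background = "".join(rng.choice("acgt") for _ in range(n))
        start = rng.randrange(n - len(motif) + 1)
        variant = mutate(motif, mismatches, rng)
        sample.append(background[:start] + variant + background[start + len(motif):])
    return sample


sample = challenge_sample("AAAAAAAAGGGGGGG")
print(len(sample), len(sample[0]))  # 20 600
```

The background is lowercase and the implanted copy uppercase only so that the implant is easy to spot when inspecting the output; a motif finder would have to ignore case.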

Page 18: Bioinformatics

atgaccgggatactgatagaagaaaggttgggggcgtacacattagataaacgtatgaagtacgttagactcggcgccgccg

acccctattttttgagcagatttagtgacctggaaaaaaaatttgagtacaaaacttttccgaatacaataaaacggcggga

tgagtatccctgggatgacttaaaataatggagtggtgctctcccgatttttgaatatgtaggatcattcgccagggtccga

gctgagaattggatgcaaaaaaagggattgtccacgcaatcgcgaaccaacgcggacccaaaggcaagaccgataaaggaga

tcccttttgcggtaatgtgccgggaggctggttacgtagggaagccctaacggacttaatataataaaggaagggcttatag

gtcaatcatgttcttgtgaatggatttaacaataagggctgggaccgcttggcgcacccaaattcagtgtgggcgagcgcaa

cggttttggcccttgttagaggcccccgtataaacaaggagggccaattatgagagagctaatctatcgcgtgcgtgttcat

aacttgagttaaaaaatagggagccctggggcacatacaagaggagtcttccttatcagttaatgctgtatgacactatgta

ttggcccattggctaaaagcccaacttgacaaatggaagatagaatccttgcatactaaaaaggagcggaccgaaagggaag

ctggtgagcaacgacagattcttacgtgcattagctcgcttccggggatctaatagcacgaagcttactaaaaaggagcgga

Where is the Motif???

Page 19: Bioinformatics

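AgAAgAAAGGttGGG
..|..|||.|..|||
cAAtAAAAcGGcGGG

Two instances that each carry 4 mutations can differ from each other in as many as 8 positions; these two differ in 7.

Why Is Finding a (15,4) Motif Difficult?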

atgaccgggatactgatAgAAgAAAGGttGGGggcgtacacattagataaacgtatgaagtacgttagactcggcgccgccg

acccctattttttgagcagatttagtgacctggaaaaaaaatttgagtacaaaacttttccgaatacAAtAAAAcGGcGGGa

tgagtatccctgggatgacttAAAAtAAtGGaGtGGtgctctcccgatttttgaatatgtaggatcattcgccagggtccga

gctgagaattggatgcAAAAAAAGGGattGtccacgcaatcgcgaaccaacgcggacccaaaggcaagaccgataaaggaga

tcccttttgcggtaatgtgccgggaggctggttacgtagggaagccctaacggacttaatAtAAtAAAGGaaGGGcttatag

gtcaatcatgttcttgtgaatggatttAAcAAtAAGGGctGGgaccgcttggcgcacccaaattcagtgtgggcgagcgcaa

cggttttggcccttgttagaggcccccgtAtAAAcAAGGaGGGccaattatgagagagctaatctatcgcgtgcgtgttcat

aacttgagttAAAAAAtAGGGaGccctggggcacatacaagaggagtcttccttatcagttaatgctgtatgacactatgta

ttggcccattggctaaaagcccaacttgacaaatggaagatagaatccttgcatActAAAAAGGaGcGGaccgaaagggaag

ctggtgagcaacgacagattcttacgtgcattagctcgcttccggggatctaatagcacgaagcttActAAAAAGGaGcGGa
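The difficulty can be quantified with the Hamming distance: each implanted instance is close to the original motif, but two instances can be far from each other. The sketch below compares case-insensitively, since lowercase on the slide only marks the mutated positions. The function name `hamming` is illustrative.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ (ignoring case)."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a.upper(), b.upper()))


motif = "A" * 8 + "G" * 7    # implanted pattern
first = "AgAAgAAAGGttGGG"    # instance in the first sequence
second = "cAAtAAAAcGGcGGG"   # instance in the second sequence

print(hamming(motif, first), hamming(motif, second), hamming(first, second))
# 4 4 7
```

A pairwise comparison of instances therefore gives a much weaker signal than a comparison against the unknown motif itself.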

Page 20: Bioinformatics

            a G g t a c T t
            C c A t a c g t
Alignment   a c g t T A g t
            a c g t C c A t
            C c g t a c g G
            _________________
          A 3 0 1 0 3 1 1 0
Profile   C 2 4 0 0 1 4 0 0
          G 0 1 4 0 0 0 3 1
          T 0 0 0 5 1 0 1 4
            _________________
Consensus   A C G T A C G T

• Line up the patterns by their start indexes

s = (s1, s2, …, st)

• Construct matrix profile with frequencies of each nucleotide in columns

• Consensus nucleotide in each position has the highest score in column

Motifs: Profiles and Consensus
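The profile and consensus on this slide can be reproduced with a few lines of Python. This is a minimal sketch: the alignment rows are taken from the slide, while the function names `profile` and `consensus` are illustrative. Ties, which do not occur in this example, would be resolved in A, C, G, T order.

```python
from collections import Counter

# The five aligned patterns from the slide (case only marks mutations).
alignment = ["aGgtacTt", "CcAtacgt", "acgtTAgt", "acgtCcAt", "CcgtacgG"]


def profile(rows):
    """Count each nucleotide in every column of the alignment."""
    counts = [Counter(column) for column in zip(*(row.upper() for row in rows))]
    return {base: [c[base] for c in counts] for base in "ACGT"}


def consensus(prof):
    """Pick the most frequent nucleotide in each column of the profile."""
    width = len(prof["A"])
    return "".join(max("ACGT", key=lambda b: prof[b][i]) for i in range(width))


prof = profile(alignment)
for base in "ACGT":
    print(base, *prof[base])
print(consensus(prof))  # ACGTACGT
```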

Page 21: Bioinformatics

Motif Search in Local Database

Page 22: Bioinformatics
