Searching for TFBSs with TRANSFAC - Hot topics in Bioinformatics
Searching for transcription factor binding sites
with TRANSFAC
George Bell, Ph.D.
Bioinformatics and Research Computing
Hot Topics – October 2009
Outline
• What is known about your favorite TFs?
• In what regulatory DNA should we search?
• How can we search for an inexact sequence motif like a TFBS?
• What related resources are available?
Transcription control is complex
Lodish et al. Molecular Cell Biology. Model for cooperative assembly of an activated transcription-initiation complex at the TTR promoter in hepatocytes
Kettenberger et al., 2004. (1y1w) Complete RNA Polymerase II elongation complex (12 subunits)
TRANSFAC at Biobase
Connect from Whitehead network
TRANSFAC introduction
• Created in 1988
• Contains information about transcription factors that have been experimentally determined to bind DNA
• Includes eukaryotic cis-acting regulatory DNA elements and trans-acting factors, in organisms ranging from yeast to humans.
• The majority of information has been manually curated from the primary literature.
Browsing transcription factors
Select species; detailed info
Types of TRANSFAC data
• Gene – curated info
• Promoter – TSS coordinates from Ensembl, FANTOM, etc.
• Functional Region – describes published regulatory regions
• Composite Element (with two or more nearby binding sites)
• Site – describes published TFBSs
• ChIP-chip – shows data by target
• Matrix – contains published aligned binding sites and positional probabilities
Transcription factor matrix
A C G T Consensus
1 2 2 0 S
2 1 2 0 R
3 0 1 1 A
0 5 0 0 C
5 0 0 0 A
0 0 4 1 G
0 1 4 0 G
0 0 0 5 T
0 0 5 0 G
0 1 2 2 K
0 2 0 3 Y
1 0 3 1 G
Example: V$MYOD_01 vertebrate MyoD matrix 1
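The consensus column can be derived from the counts. The sketch below assumes a commonly cited rule that is not stated on the slide: report a single base if it accounts for more than 50% of the column and is at least twice as frequent as the runner-up; otherwise report a two-base IUPAC code if the two most frequent bases together exceed 75%; otherwise report N. The names `COUNTS`, `consensus_letter` and `consensus` are illustrative only.

```python
# Count matrix for V$MYOD_01 copied from the slide; columns are A, C, G, T.
COUNTS = [
    (1, 2, 2, 0),
    (2, 1, 2, 0),
    (3, 0, 1, 1),
    (0, 5, 0, 0),
    (5, 0, 0, 0),
    (0, 0, 4, 1),
    (0, 1, 4, 0),
    (0, 0, 0, 5),
    (0, 0, 5, 0),
    (0, 1, 2, 2),
    (0, 2, 0, 3),
    (1, 0, 3, 1),
]

# IUPAC codes for two-base ambiguity.
TWO_BASE = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def consensus_letter(row):
    """Return the IUPAC letter for one matrix row (assumed rule, see text)."""
    total = sum(row)
    # Rank bases by count, highest first; ties keep A, C, G, T order.
    ranked = sorted(zip(row, "ACGT"), key=lambda pair: -pair[0])
    (n1, b1), (n2, b2) = ranked[0], ranked[1]
    if n1 > 0.5 * total and n1 >= 2 * n2:
        return b1
    if n1 + n2 > 0.75 * total:
        return TWO_BASE[frozenset(b1 + b2)]
    return "N"


def consensus(counts):
    """Join the per-position letters into a consensus string."""
    return "".join(consensus_letter(row) for row in counts)


print(consensus(COUNTS))
```

With the counts above, this rule reproduces the consensus column shown on the slide, read top to bottom.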
Matrix identifiers
• Examples: V$MYOD_01, V$AP1_Q4_01
V$ = vertebrates; I$ = insects; P$ = plants; F$ = fungi;
N$ = nematodes; B$ = bacteria
MYOD = factor or family name
01 = matrix number 1 for MYOD
Q* = matrix reliability/quality (1 – 6)
1 Functionally confirmed transcription factor binding site
2 Binding of pure protein (purified or recombinant)
3 Immunologically characterized binding activity of a cellular extract
4 Binding activity characterized via a known binding sequence
5 Binding of uncharacterized extract protein to a bona fide element
6 No quality assigned
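The naming scheme above can be unpacked mechanically. This is a minimal sketch, assuming identifiers always follow the pattern prefix, `$`, factor name, optional `_Q<digit>`, optional `_<number>`; real identifiers may deviate from that. The function name `parse_matrix_id` and the returned keys are hypothetical.

```python
import re

# Taxon prefixes as listed on the slide.
TAXA = {
    "V": "vertebrates", "I": "insects", "P": "plants",
    "F": "fungi", "N": "nematodes", "B": "bacteria",
}

# Assumed layout: <prefix>$<factor>[_Q<quality>][_<number>]
_MATRIX_ID = re.compile(r"([A-Z])\$(.+?)(?:_Q(\d))?(?:_(\d+))?")


def parse_matrix_id(identifier):
    """Split a matrix identifier into taxon, factor, quality and number."""
    match = _MATRIX_ID.fullmatch(identifier)
    if match is None:
        raise ValueError(f"unrecognized matrix identifier: {identifier!r}")
    prefix, factor, quality, number = match.groups()
    return {
        "taxon": TAXA.get(prefix, "unknown"),
        "factor": factor,
        "quality": int(quality) if quality else None,
        "number": int(number) if number else None,
    }


for name in ("V$MYOD_01", "V$AP1_Q4_01", "V$MYOD_Q6"):
    print(name, parse_matrix_id(name))
```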
Matrices are redundant
V$MYOD_01
V$MYOD_Q6
V$MYOD_Q6_01
Extracting regulatory regions
• One, many or all genes?
• Promoters or all potential regions (introns, intergenic)?
• Sources of genomic sequence:
  – UCSC genome browser (click on “DNA”)
  – Ensembl BioMart (“Sequences” for output)
  – Published datasets
Starting MATCH
MATCH profiles (sets of matrices)
Taxon:
• all
• bacteria
• fungi
• insects
• invertebrates
• nematodes
• plants
• vertebrate_non_redundant
• vertebrate_non_redundant_minFN
• vertebrate_non_redundant_minFP
• vertebrate_non_redundant_minSUM
• vertebrates
Tissue:
• adipocyte_specific
• immune_cell_specific
• liver_specific
• lung_specific
• muscle_specific
• nerve_system_specific
• pancreatic_beta_cell_specific
• pituitary_specific
• redox_specific
Biological process:
• cell_cycle_specific
User defined:
• Muscle_george
MATCH output
Core == first 5 most conserved positions
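To illustrate how a candidate site can be scored against a matrix, the sketch below follows the similarity score as I understand it from the published MATCH description: each position is weighted by an information term I(i) = Σ f·ln(4f), and the score is (Current − Min) / (Max − Min). This formula is an assumption here, not something shown on the slide, and the code is not the MATCH implementation. The `positions` argument is a hypothetical way to restrict the sum to a subset of positions such as a core.

```python
import math

# Count matrix for V$MYOD_01 copied from the matrix slide; columns are A, C, G, T.
COUNTS = [
    (1, 2, 2, 0), (2, 1, 2, 0), (3, 0, 1, 1), (0, 5, 0, 0),
    (5, 0, 0, 0), (0, 0, 4, 1), (0, 1, 4, 0), (0, 0, 0, 5),
    (0, 0, 5, 0), (0, 1, 2, 2), (0, 2, 0, 3), (1, 0, 3, 1),
]
BASES = "ACGT"


def frequencies(counts):
    """Convert each row of counts to relative frequencies."""
    return [[n / sum(row) for n in row] for row in counts]


def information(freq_row):
    """I(i) = sum of f * ln(4 f); bases with f == 0 contribute nothing."""
    return sum(f * math.log(4 * f) for f in freq_row if f > 0)


def similarity_score(counts, site, positions=None):
    """Score a site of matrix length; 1.0 is the best match, 0.0 the worst."""
    freqs = frequencies(counts)
    if positions is None:
        positions = range(len(freqs))
    current = minimum = maximum = 0.0
    for i in positions:
        weight = information(freqs[i])
        current += weight * freqs[i][BASES.index(site[i])]
        minimum += weight * min(freqs[i])
        maximum += weight * max(freqs[i])
    return (current - minimum) / (maximum - minimum)


print(round(similarity_score(COUNTS, "CAACAGGTGGTG"), 3))
print(round(similarity_score(COUNTS, "GGACAGGTGTCA"), 3))
```

A site built from the most frequent base at every position scores 1.0; any mismatch at a well-conserved position lowers the score more than one at a variable position.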
Creating a custom matrix: input
Creating a custom matrix: output
MATCH Profiler - input
MATCH Profiler - output
MATCH with our custom profile
Related resources
• UCSC Genome Browser (hg18):
  – “TFBS Conserved” track (human/mouse/rat)
• JASPAR (public database of transcription factor binding profiles):
  – http://jaspar.genereg.net/
• Create a sequence logo: http://weblogo.berkeley.edu
• Command-line tools:
  – TRANSFAC; tffind; HMMER1; MAST (MEME Suite)
• Search for “patterns” (e.g., CAxxTGx[TC]):
  – EMBOSS: fuzznuc; dreg
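For a quick pattern search without EMBOSS, a pattern like the one above maps directly onto a regular expression. This is a minimal sketch, assuming `x` means any of A, C, G or T and that bracketed alternatives use the same syntax as regular expressions; it scans the forward strand only and does not reproduce fuzznuc or dreg options. The function names and the test sequence are made up for illustration.

```python
import re


def pattern_to_regex(pattern):
    """Translate a pattern such as CAxxTGx[TC] into a compiled regex.

    The pattern is wrapped in a lookahead so overlapping hits are reported.
    """
    return re.compile("(?=(" + pattern.replace("x", "[ACGT]") + "))")


def find_sites(pattern, sequence):
    """Return (1-based start, matched text) for every forward-strand hit."""
    regex = pattern_to_regex(pattern)
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence.upper())]


# Made-up sequence containing two sites that fit the pattern.
hits = find_sites("CAxxTGx[TC]", "ttCAGGTGGTaaCACCTGACgg")
for start, text in hits:
    print(start, text)
```

To cover both strands, the same search can be repeated on the reverse complement of the sequence.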