
HMMER 3 & Community Profiling

Date post: 27-May-2015
Upload: morgan-langille
Description:
This is my first lab presentation during my post-doc in Jonathan Eisen's lab. I discuss new features and changes in HMMER 3. I also discuss how I used the new version to identify Pfams in all 80 samples of the GOS metagenomic datasets, with the hope of testing whether "community profiling" may work.
Transcript
Page 1: HMMER 3 & Community Profiling

HMMER 3 & COMMUNITY PROFILING

Morgan Langille

UC Davis

Page 2: HMMER 3 & Community Profiling

HMMER 3 – What’s new?

Much faster
  100x HMMER 2
  ≈ BLAST

More sensitive

Page 3: HMMER 3 & Community Profiling

What’s new?

Alignment column confidence
  Each residue is given a posterior probability annotation
  * = 95-100%, 9 = 85-95%, 8 = 75-85%, etc.

fn3 2 saPenlsvsevtstsltlsWsppkdgggpitgYeveyqekgegeewqevtvprtttsvtltgLepgteYefrVqavngagegp 84
saP ++ + ++ l ++W p + +gpi+gY++++++++++ + e+ vp+ s+ +++L++gt+Y++ + +n++gegp
7LESS_DROME 439 SAPVIEHLMGLDDSHLAVHWHPGRFTNGPIEGYRLRLSSSEGNA-TSEQLVPAGRGSYIFSQLQAGTNYTLALSMINKQGEGP 520
78999999999*****************************9998.**********************************9997 PP
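The PP annotation can be decoded with a few lines of Python. This is a minimal sketch, not part of HMMER: the function name `pp_range` and the 85% cutoff are assumptions for illustration, and the `0` and `.` cases follow the same binning as the `*`, `9`, `8` symbols listed on the slide (`.` marks a gap column).

```python
def pp_range(symbol):
    """Return the (low, high) posterior probability range in percent, or None for a gap."""
    if symbol == "*":
        return (95, 100)
    if symbol == ".":
        return None  # gap column: there is no residue to annotate
    digit = int(symbol)
    if digit == 0:
        return (0, 5)
    return (digit * 10 - 5, digit * 10 + 5)


# PP line copied from the fn3 / 7LESS_DROME alignment on this slide.
pp_line = "78999999999*****************************9998.**********************************9997"

# 0-based alignment columns whose PP bin lies entirely at or below 85%.
uncertain = [
    i for i, s in enumerate(pp_line)
    if pp_range(s) is not None and pp_range(s)[1] <= 85
]
print(uncertain)
```

In this example only the columns at the two ends of the alignment and the one next to the gap fall below the cutoff, which is the usual pattern: alignment confidence drops at domain edges and around indels.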

Page 4: HMMER 3 & Community Profiling

What’s new?

Sequence scores, not alignment scores
  Scoring just a single best alignment can break down if the target is a remote homolog
  HMMER 3 scores sequences by integrating over alignment uncertainty
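The difference between the two scoring ideas can be shown with a toy calculation. This is only a sketch under stated assumptions: real HMMER computes these quantities by dynamic programming over the profile HMM, whereas here each candidate alignment is simply given a hypothetical log-probability. Taking the maximum corresponds to scoring the single best alignment; summing the probabilities (log-sum-exp) corresponds to integrating over all alignments.

```python
import math


def best_alignment_score(log_probs):
    """Score of the single best alignment: the maximum log-probability."""
    return max(log_probs)


def summed_score(log_probs):
    """Log of the total probability over all alignments (stable log-sum-exp)."""
    m = max(log_probs)
    return m + math.log(sum(math.exp(lp - m) for lp in log_probs))


# Hypothetical log-probabilities of alternative alignments for two sequences.
clear_homolog = [-10.0, -25.0, -30.0]          # one alignment dominates
remote_homolog = [-20.0, -20.2, -20.5, -20.7]  # several near-equal alignments

for name, lps in [("clear", clear_homolog), ("remote", remote_homolog)]:
    gain = summed_score(lps) - best_alignment_score(lps)
    print(f"{name}: best={best_alignment_score(lps):.2f} "
          f"summed={summed_score(lps):.2f} gain={gain:.3f}")
```

For the clear homolog the two scores are nearly identical, while for the remote homolog the summed score is noticeably higher because the signal is spread across many plausible alignments.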

Page 5: HMMER 3 & Community Profiling

Single Sequence Queries
phmmer ≈ BLASTP
  Search a sequence against a sequence database.
jackhmmer ≈ PSI-BLAST
  Iteratively search a sequence against a sequence database.
Internally they produce a profile HMM from the query sequence, then run an HMM search.

Page 6: HMMER 3 & Community Profiling

Small Changes
hmmpfam -> hmmscan
  Search a sequence against a profile HMM database
hmmcalibrate -> built into hmmbuild
hmmpress
  Creates binary HMM files so hmmscan is faster
  Similar idea to formatting BLAST databases using formatdb
New output format options
  --tblout (seq score, best domain score)
  --domtblout (seq score, all domain scores with coordinates)
  Gives tabular (space-delimited) output without alignments
  1/5 the file size of regular output
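A --tblout file is straightforward to read from a script. The sketch below assumes the standard HMMER 3 layout of 18 whitespace-separated columns followed by a free-text description, with comment lines starting with `#`. The function name `parse_tblout` and the sample line (including its scores) are made up for illustration.

```python
def parse_tblout(lines):
    """Yield one dict per hit from the lines of a --tblout file."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        # 18 fixed columns, then the description (which may contain spaces).
        fields = line.split(None, 18)
        yield {
            "target": fields[0],            # for hmmscan: the profile (Pfam) name
            "query": fields[2],             # for hmmscan: the sequence name
            "seq_evalue": float(fields[4]),
            "seq_score": float(fields[5]),
            "best_dom_score": float(fields[8]),
        }


# Illustrative input only; the numbers are invented, not real HMMER output.
sample = """\
# target name  accession  query name  accession  E-value  score  bias ...
fn3  PF00041.13  7LESS_DROME  -  1.9e-57  178.0  0.4  1.2e-16  47.2  0.9  9.4  9  0  0  9  9  9  9  Fibronectin type III domain
"""

hits = list(parse_tblout(sample.splitlines()))
for hit in hits:
    print(hit["query"], hit["target"], hit["seq_score"], hit["best_dom_score"])
```

Splitting on whitespace with a maximum of 18 splits keeps the description intact as a single trailing field.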

Page 7: HMMER 3 & Community Profiling

Upcoming changes

Parallelization
  Multi-threaded, MPI (cluster), GPU
Translated comparisons
  BLASTX, TBLASTN, TBLASTX
More input sequence formats
  GenBank, EMBL, etc.
  Clustal format

Page 8: HMMER 3 & Community Profiling

Problems/Issues

hmmconvert
  Used to convert HMMER2 profiles into HMMER3 profiles
  Only converts file format
    ○ Good: get HMMER3 speedup
    ○ Bad: get HMMER2 sensitivity/specificity
  Should rebuild old HMMER2 HMMs using hmmbuild

Page 9: HMMER 3 & Community Profiling

Glocal vs local alignments
Local
  Any portion of the HMM can align to any portion of the sequence
Glocal
  The entire HMM is aligned to any portion of the sequence
HMMER2
  Had both, but local was not as sensitive as glocal
HMMER3
  Local was improved so that glocal was thought to be not needed (and was not included in HMMER3)
  However, some models do very poorly
  Short, extremely diverse seed alignments such as zinc finger transcription factors may be missed

Page 10: HMMER 3 & Community Profiling

Community Profiling

Page 11: HMMER 3 & Community Profiling

Phylogenetic profiling
C. hydrogenoformans
  Identified presence or absence of homologs in all other completely sequenced genomes
  Identified many hypothetical proteins that had the same profile as other sporulation proteins
Wu, et al., PLOS Genetics, 2005
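The idea of matching presence/absence profiles can be sketched in a few lines. This is not the method from Wu et al., only a minimal illustration that groups genes with identical profiles; the gene names and the five-genome presence vectors are hypothetical.

```python
from collections import defaultdict


def group_by_profile(presence):
    """Group genes whose presence/absence pattern across genomes is identical."""
    groups = defaultdict(list)
    for gene, profile in presence.items():
        groups[tuple(profile)].append(gene)
    return dict(groups)


# Hypothetical data: 1 = homolog present in that genome, 0 = absent.
presence = {
    "known_sporulation_gene": (1, 1, 0, 0, 1),
    "hypothetical_protein_A": (1, 1, 0, 0, 1),
    "housekeeping_gene": (1, 1, 1, 1, 1),
}

groups = group_by_profile(presence)
for profile, genes in groups.items():
    print(profile, genes)
```

A hypothetical protein that lands in the same group as a known sporulation gene becomes a candidate for a sporulation-related function.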

Page 12: HMMER 3 & Community Profiling

Community Profiling
KEGG, COG

DeLong, et al., Science, 2006

Page 13: HMMER 3 & Community Profiling

Community Profiling

Look across multiple metagenomic samples
Gene families that have similar profiles may have similar function
  Similar to using co-expression to identify similarly functioning genes
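One way to quantify "similar profiles" is a correlation between the per-sample counts of two gene families. The slides do not say which similarity measure was used, so Pearson correlation here is an assumption, and the family names and counts are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length abundance profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat profile carries no correlation signal
    return cov / (sd_x * sd_y)


# Hypothetical hit counts for three gene families across four samples.
family_a = [10, 20, 30, 40]
family_b = [1, 2, 3, 4]      # rises and falls with family_a
family_c = [40, 30, 20, 10]  # opposite trend

print(pearson(family_a, family_b))
print(pearson(family_a, family_c))
```

Families with a correlation near 1 across many samples would be the candidates for shared function, in the same spirit as co-expression analysis.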

Page 14: HMMER 3 & Community Profiling

So what have I done?
Downloaded the GOS peptide file
  41M sequences, 80 samples
  43GB -> 7GB, by removing extra information
  Split into ~100 smaller files
Downloaded HMMER 3 Pfams (email request)
  Containing 11098 Pfams
Ran hmmscan on genbeo
  4 days later
  12.5M Pfam predictions
    ○ Some sequences contain >1 Pfam
  9643 Pfams
Used “cluster” to group genes and samples
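The step between the hmmscan predictions and the clustering is a Pfam-by-sample count matrix. The sketch below shows one way to build it, assuming the hits are available as (sequence ID, Pfam) pairs and that a lookup from sequence ID to sample exists. The function name, the IDs, and the sample names are all hypothetical; the slides do not describe how this was actually done.

```python
from collections import Counter, defaultdict


def count_matrix(hits, seq_to_sample):
    """Return {pfam: Counter({sample: number_of_hits})} from (sequence_id, pfam) pairs."""
    matrix = defaultdict(Counter)
    for seq_id, pfam in hits:
        matrix[pfam][seq_to_sample[seq_id]] += 1
    return matrix


# Hypothetical inputs. A sequence may carry more than one Pfam.
seq_to_sample = {"seq1": "sampleA", "seq2": "sampleA", "seq3": "sampleB"}
hits = [
    ("seq1", "fn3"),
    ("seq1", "Pkinase"),
    ("seq2", "fn3"),
    ("seq3", "fn3"),
]

matrix = count_matrix(hits, seq_to_sample)
for pfam, per_sample in matrix.items():
    print(pfam, dict(per_sample))
```

Each row of this matrix is one Pfam's profile across samples, which is the input a clustering tool needs to group both Pfams and samples.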

Page 15: HMMER 3 & Community Profiling

Results
Red = above avg. number of pfams
Green = below avg. number of pfams
Have not normalized for:
  Number of sequences per sample
  Number of pfams

Figure axes: GOS Metagenomic Samples, Pfams
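The slide notes that the counts were not normalized. One simple option, shown here only as a sketch, is to divide each Pfam count by the number of sequences in its sample, so that large samples do not look enriched for everything. The function name and all numbers are hypothetical.

```python
def normalize_by_sample_size(counts, seqs_per_sample):
    """Convert raw Pfam hit counts into hits per sequence for each sample."""
    return {
        pfam: {
            sample: n / seqs_per_sample[sample]
            for sample, n in per_sample.items()
        }
        for pfam, per_sample in counts.items()
    }


# Hypothetical data: the same raw count in a large and a small sample.
counts = {"example_pfam": {"big_sample": 50, "small_sample": 50}}
seqs_per_sample = {"big_sample": 1000, "small_sample": 100}

normalized = normalize_by_sample_size(counts, seqs_per_sample)
print(normalized)
```

After normalization the small sample shows a tenfold higher rate for this Pfam, a difference that the raw counts hide.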

Page 16: HMMER 3 & Community Profiling

Example of phage Pfams clustering together

Page 17: HMMER 3 & Community Profiling

Future Community Profiling

Include other (all) metagenomic samples
Try to group Pfams by GO category to see how strong the correlation is between branch length and function
Examine if some functional categories are more easily predicted by this profiling strategy (i.e. HGTs)
Identify novel gene families and sub-families
  Clustering genes, building HMMs, scanning, … repeat.
  Community profiling may help in annotation of these

