BLAST

Date post: 22-Jan-2016
Upload: tamira
Page 1: BLAST

BLAST

Objectives

• Gain familiarity with sequence searches and comparisons via web-based BLAST

• To understand the BLAST algorithm

• To understand the principles of BLAST scoring and BLAST statistics

• To understand scoring matrices

• To become aware of other BLAST services and applications

Page 2: BLAST

BLAST

• Basic Local Alignment Search Tool

• Developed in 1990 and extended in 1997 with gapped BLAST and PSI-BLAST (Altschul et al.)

• A heuristic method for performing local alignments by searching for high-scoring segment pairs (HSPs)

• First to use statistics to predict significance of initial matches – saves on false leads

• Offers both sensitivity and speed
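
A minimal sketch of how an ungapped high-scoring segment might be grown from an initial match, assuming a flat match/mismatch scheme and an X-drop stopping rule. The function name, the scores (+1/-2) and the `x_drop` value are illustrative assumptions, not BLAST's actual code or defaults; real BLAST extends in both directions and uses a scoring matrix for proteins.

```python
def extend_right(query, subject, q_start, s_start, match=1, mismatch=-2, x_drop=5):
    """Extend an ungapped alignment to the right from (q_start, s_start).

    Returns (best_score, best_length): the highest running score seen and
    the number of aligned positions needed to reach it.
    """
    score = best_score = 0
    best_length = 0
    i = 0
    while q_start + i < len(query) and s_start + i < len(subject):
        score += match if query[q_start + i] == subject[s_start + i] else mismatch
        i += 1
        if score > best_score:
            best_score, best_length = score, i
        elif best_score - score > x_drop:
            break  # the running score fell too far below the best seen so far
    return best_score, best_length


# Eight identical positions followed by mismatches: extension stops early.
result = extend_right("ACGTACGTTTTT", "ACGTACGTAAAA", 0, 0)
```

The extension keeps the best-scoring prefix rather than the last position reached, which is why a run of mismatches at the end does not lower the reported score.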

Page 3: BLAST

BLAST

• Looks for clusters of nearby or locally dense “similar or homologous” words/k-tuples

• Uses look-up tables to shorten the search time

• Uses a larger “word size” than FASTA to accelerate the search process

• Performs local alignment; when similarity spans the whole sequence, the reported alignment can cover its full length

• Among the fastest and most frequently used sequence alignment tools – the de facto standard
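
The look-up table idea can be sketched as follows. This is a simplified illustration that indexes only exact words; the function names and the word size of 4 are assumptions chosen for the example, and protein BLAST additionally considers similar ("neighborhood") words, which is not shown here.

```python
from collections import defaultdict


def build_word_table(sequence, word_size):
    """Map every word of length word_size to the positions where it starts."""
    table = defaultdict(list)
    for i in range(len(sequence) - word_size + 1):
        table[sequence[i:i + word_size]].append(i)
    return table


def find_seeds(query, subject, word_size=4):
    """Return (query_pos, subject_pos) pairs where an identical word occurs."""
    table = build_word_table(subject, word_size)
    seeds = []
    for q in range(len(query) - word_size + 1):
        for s in table.get(query[q:q + word_size], []):
            seeds.append((q, s))
    return seeds


seeds = find_seeds("GATTACA", "TTGATTACAGG", word_size=4)
```

Each query word costs one dictionary look-up instead of a scan of the subject. In this example all seeds lie on the same diagonal (subject position minus query position is constant), which is the kind of cluster of nearby words the slide refers to.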

Page 4: BLAST

BLAST

• NCBI BLAST: http://www.ncbi.nih.gov/BLAST/

• European Bioinformatics Institute

  • NCBI BLAST: http://www.ebi.ac.uk/Tools/sss/ncbiblast/

  • WUBLAST: http://www.ebi.ac.uk/Tools/sss/wublast/

• Rosaceae BLAST (www.rosaceae.org)

• Legume BLAST (http://lis.comparative-legumes.org; www.gabcsfl.org)

• Grasses BLAST (www.gramene.org)

Page 5: BLAST

Types of BLAST

• BLASTP – protein query against protein DB

• BLASTN – DNA/RNA query against DNA DB

• BLASTX – 6 frame translation of DNA query against protein DB

• TBLASTN – protein query against 6 frame translation of DNA DB

• TBLASTX – 6 frame translation of DNA query against 6 frame translation of DNA DB

• BLAST2SEQ – for performing pairwise alignments for 2 chosen sequences
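
The "6 frame translation" used by BLASTX, TBLASTN and TBLASTX can be sketched like this. It is a minimal illustration assuming the standard genetic code; the function names and frame labels (+1 to -3) are choices made for the example, not BLAST's internal representation.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def reverse_complement(dna):
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate codon by codon; unknown codons (e.g. containing N) become X."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frame_translation(dna):
    """Return a dict of the three forward and three reverse reading frames."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


frames = six_frame_translation("ATGGCCTAA")
```

Stop codons are shown as `*`. A translated search compares every one of the six frames, which is part of why these programs are slower than BLASTN or BLASTP.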

Page 6: BLAST

Types of BLAST

• PSI-BLAST - protein “profile” query against protein DB

• PHI-BLAST – protein pattern against protein DB

• RPS-BLAST – Conserved Domain Detection

• MEGABLAST – for comparison of large sets of long DNA sequences

• Primer BLAST – uses Primer3 to design PCR primers

• Genomic BLAST – for alignments against completed genomes

• VecScreen – for detecting cloning vector contamination in sequenced data.

See last week's handout for the rest of them

Page 7: BLAST

Types of Comparison

• Which program will best suit your query and desired output?

• When compared using simple nucleotide substitution scores, DNA sequences carry less information for deducing homology than the protein sequences they encode: 20 aa vs 4 nt!

• Protein comparisons give more meaningful results

• Moderately similar nt sequences often encode highly similar protein sequences
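
A small worked example of the last point, using two made-up DNA sequences that differ only at third codon positions. The codon table below is deliberately partial (only the codons used here, all from the standard genetic code), and the sequences and helper names are hypothetical.

```python
# Partial codon table: just the synonymous codon pairs used in this example.
PARTIAL_CODON_TABLE = {
    "CTT": "L", "CTG": "L",
    "TCT": "S", "TCA": "S",
    "CGT": "R", "CGA": "R",
    "GGT": "G", "GGC": "G",
}


def translate(dna):
    return "".join(PARTIAL_CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def percent_identity(a, b):
    """Percent of identical positions between two equal-length, ungapped sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


dna_1 = "CTTTCTCGTGGT"
dna_2 = "CTGTCACGAGGC"
nt_identity = percent_identity(dna_1, dna_2)
aa_identity = percent_identity(translate(dna_1), translate(dna_2))
```

Here the nucleotide sequences agree at 8 of 12 positions while the encoded peptides are identical, so the protein-level comparison sees a perfect match where the DNA-level comparison sees only a moderate one.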

Page 8: BLAST

NCBI WEB BLAST

Step 1: Select a BLAST program

In this example we will choose nucleotide blast

Page 9: BLAST

Basic BLAST Options

Using Nucleotide BLAST

Step 2: Type in your sequence in FASTA format, type in a GI or accession number, or upload a file

>my protein MT08976
KIQIYTGTCANGTCKIQIYTGTCANGTCKIQ
IYGTCANGTCKIQIYTGTCANGTC

MEGABLAST is specifically designed to efficiently find long alignments between very similar sequences

Step 3: Give your search a name/title

Step 4: Choose a database to search
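
For reference, the FASTA layout asked for in Step 2 is a header line starting with `>` followed by one or more sequence lines. A minimal parser sketch, assuming plain text input; the function name is hypothetical and the example record is the one shown on the slide.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)  # sequence lines may be wrapped; join them later
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


example = """>my protein MT08976
KIQIYTGTCANGTCKIQIYTGTCANGTCKIQ
IYGTCANGTCKIQIYTGTCANGTC
"""
records = parse_fasta(example)
```

Wrapped sequence lines are joined into one string, so the line breaks in the input do not matter.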

Page 10: BLAST

NCBI WEB BLAST

Page 11: BLAST

Basic BLAST Options

Using Nucleotide BLAST

Step 2: Type in your sequence in FASTA format, type in a GI or accession number, or upload a file

>my protein MT08976
KIQIYTGTCANGTCKIQIYTGTCANGTCKIQ
IYGTCANGTCKIQIYTGTCANGTC

Note: you can also restrict the range of your query to be searched

Step 3: Give your search a name/title

Step 4: Choose a database to search

and/or restrict the database selection, e.g. Viridiplantae [ORGN] of the nr database, which means: restrict my search to just plant proteins
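
The same organism restriction can in principle be expressed when submitting a search programmatically. The sketch below only assembles the form fields and sends nothing. The field names (`CMD`, `PROGRAM`, `DATABASE`, `QUERY`, `ENTREZ_QUERY`) and the endpoint follow my understanding of NCBI's BLAST URL API and should be treated as assumptions to check against the current NCBI documentation; the helper function is hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint of the NCBI BLAST URL API; verify before use.
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_blast_submission(query, program="blastp", database="nr", entrez_query=None):
    """Return the form fields for a search submission; nothing is sent."""
    params = {"CMD": "Put", "PROGRAM": program, "DATABASE": database, "QUERY": query}
    if entrez_query:
        # Entrez-style restriction, e.g. limit hits to green plants.
        params["ENTREZ_QUERY"] = entrez_query
    return params


params = build_blast_submission("NP_001031578.1", entrez_query="Viridiplantae[ORGN]")
encoded = urlencode(params)
```

Building the fields separately from sending them makes it easy to inspect exactly what restriction would be applied before any request is made.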

Page 12: BLAST

Basic BLAST Options

Using Nucleotide BLAST

Step 5: Choose a BLAST program

Megablast is intended for comparing a query to closely related sequences. It is very fast and works best if the target percent identity is 95% or more.

Discontiguous megablast uses an initial seed that ignores some bases (allowing mismatches) and is intended for cross-species comparisons.

BlastN is slow, but allows a word size down to seven bases.
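
A rough sketch of why word size and seed shape matter, using two made-up 16-base sequences that differ at a single position. The function names and the seed template are hypothetical; in particular, the template is not one of the actual discontiguous megablast templates.

```python
def shared_words(a, b, word_size):
    """Count distinct words of length word_size that occur in both sequences."""
    words_a = {a[i:i + word_size] for i in range(len(a) - word_size + 1)}
    words_b = {b[i:i + word_size] for i in range(len(b) - word_size + 1)}
    return len(words_a & words_b)


def spaced_match(a, b, template):
    """Compare two equal-length words, checking only positions marked '1'."""
    return all(x == y for x, y, t in zip(a, b, template) if t == "1")


seq_a = "ACGTACGGTCAATCGA"
seq_b = "ACGTACGCTCAATCGA"  # one substitution in the middle

short_word_hits = shared_words(seq_a, seq_b, 7)
long_word_hits = shared_words(seq_a, seq_b, 11)
contiguous_ok = spaced_match("ACGGT", "ACGCT", "11111")
spaced_ok = spaced_match("ACGGT", "ACGCT", "11101")
```

With a word size of 7 a few exact words still survive the substitution, while with a word size of 11 none do, so a long contiguous word is fast but only finds very similar sequences. A seed that ignores the mismatched position still matches, which is the idea behind discontiguous seeds.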

Page 13: BLAST
Page 14: BLAST
Page 15: BLAST
Page 16: BLAST

Basic BLAST Options

Using Amino Acid BLAST

Steps 2 – 4 are the same as in nt blast

Step 5: Choose a BLAST program

BlastP simply compares a protein query to a protein database. It is used for finding similar sequences in protein databases. It is designed to find local regions of similarity but when sequence similarity spans the whole sequence, blastp will also report a global alignment.

PSI-BLAST is the most sensitive BLAST program, making it useful for finding very distantly related proteins or new members of a protein family.

PHI-BLAST performs the search but limits alignments to those that match a pattern in the query.

Page 17: BLAST

Basic BLAST Options

BLASTP RESULTS for NP_001031578.1

Steps 2 – 4 are the same as in nt blast

Page 18: BLAST

Basic BLAST Options

BLASTP RESULTS for NP_001031578.1

Steps 2 – 4 are the same as in nt blast

Page 19: BLAST

BLAST OUTPUT by Column

• The sequence accession name: this takes you to the database entry that contains the sequence.

• Description of the match organism and any assigned/putative function

• The alignment score. Higher scoring hits are at the top

• Query coverage is how much of your sequence aligned to the match

• The expectation value (E Value) which provides an estimate of statistical significance. This tells you the number of times you could have expected such a good match only by chance. The E value provides you with the most important measure of statistical significance.
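
The columns above can be modelled as a small record. This is a sketch only: the field names are hypothetical, the two hits are made-up values for illustration, and query coverage is computed from a single aligned region, whereas a real report may combine several aligned regions per hit.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    accession: str
    description: str
    score: float       # alignment score
    query_start: int   # 1-based, inclusive
    query_end: int     # 1-based, inclusive
    evalue: float


def query_coverage(hit, query_length):
    """Percent of the query covered by the aligned region of one hit."""
    return 100.0 * (hit.query_end - hit.query_start + 1) / query_length


# Made-up hits for a hypothetical 200-residue query.
hits = [
    Hit("HYPOTHETICAL_2", "made-up partial match", 55.0, 101, 150, 2e-6),
    Hit("HYPOTHETICAL_1", "made-up full-length match", 410.0, 1, 200, 1e-120),
]
ranked = sorted(hits, key=lambda h: h.score, reverse=True)
coverages = [query_coverage(h, 200) for h in ranked]
```

Sorting by score puts the strongest hit first, as in the results table, and the coverage column shows at a glance whether a hit spans the whole query or only a fragment of it.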

Page 20: BLAST

Interpreting Significance of BLAST Results

• In general:

DNA to DNA alignment

For nucleotide sequences at least 100 bp long, if 70% of your nucleotides are identical to your match sequence, the two can be considered homologous

AA to AA alignment

For amino acid sequences at least 100 aa long, if 25% of your aa are identical to your match sequence, the two can be considered homologous

Below these values, the alignments are considered to be in the twilight zone!

However, how do you tell the difference between 60 matched residues spread over a 100 residue segment and 120 matches spread over a 200 residue segment? The longer one is probably the more meaningful, but the percent identity says nothing about this!
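
The two cases in the question can be put side by side with a toy calculation. The flat +1/-1 scoring scheme and the function names are assumptions for illustration; real BLAST scores come from a substitution matrix and gap penalties.

```python
def percent_identity(matches, aligned_length):
    return 100.0 * matches / aligned_length


def toy_raw_score(matches, aligned_length, match=1, mismatch=-1):
    """Score an ungapped segment with a flat match/mismatch scheme."""
    return matches * match + (aligned_length - matches) * mismatch


short_identity = percent_identity(60, 100)
long_identity = percent_identity(120, 200)
short_score = toy_raw_score(60, 100)
long_score = toy_raw_score(120, 200)
```

Both segments are 60% identical, yet the longer one scores twice as high under this scheme. A score that accumulates over the alignment length captures what percent identity alone cannot.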

Page 21: BLAST

Interpreting Significance of BLAST Results

• So we use E values. In theory any match with an E value below 1 could be trusted. In practice this is NOT true because BLAST uses an approximate formula for computing E values and strongly underestimates them.

• Rule of thumb: look for E values below 1e-4 (0.0001). So if you want to be certain of homology, your E value must be lower than 0.0001

• Caveat – if you are doing a BLAST search with thousands of query sequences, you need to take into account the number of queries and lower the E value cutoff further.
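
For orientation, the approximate formula usually given for the E value is the Karlin-Altschul expectation, E = K · m · n · exp(-λS), where S is the raw score, m the query length and n the database length. The sketch below assumes that form; the λ and K values are illustrative numbers in the range reported for ungapped protein scoring, and the query length, database size and score are made up.

```python
import math


def evalue(raw_score, query_length, database_length, lam, k):
    """Karlin-Altschul expectation: E = K * m * n * exp(-lambda * S)."""
    return k * query_length * database_length * math.exp(-lam * raw_score)


def bit_score(raw_score, lam, k):
    """Normalized score: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


# Illustrative, assumed parameters; real values depend on matrix and gap costs.
LAMBDA, K = 0.318, 0.134
m, n = 250, 5_000_000

e = evalue(80, m, n, LAMBDA, K)
e_bigger_db = evalue(80, m, 2 * n, LAMBDA, K)
```

The E value grows in proportion to the size of the search space, so the same alignment is less significant in a larger database, and it falls exponentially as the score rises.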

Page 22: BLAST

Interpreting Significance of BLAST Results

15,000 EST query sequences

• An E-value cutoff of 1e-3 means that you should expect one false positive in 1000 searches.

• Thus with 15,000 searches, we should expect 15 false positives with a cutoff of 1e-3.

• To reduce the chances of identifying a false positive, set the E-value cutoff lower.

• For 15,000 searches, an E-value cutoff of 1e-5 means that you should expect 0.15 false positives. Most of the time we make it even lower: below 1e-6
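
The arithmetic for the 15,000-query example, following the slide's reasoning that each search contributes roughly the cutoff's worth of expected false positives. The function names are hypothetical.

```python
def expected_false_positives(n_queries, evalue_cutoff):
    """Expected number of chance hits across all searches at a given cutoff."""
    return n_queries * evalue_cutoff


def cutoff_for_target(n_queries, target_false_positives):
    """E-value cutoff needed to keep the expected false positives at a target."""
    return target_false_positives / n_queries


at_1e3 = expected_false_positives(15_000, 1e-3)
at_1e5 = expected_false_positives(15_000, 1e-5)
needed = cutoff_for_target(15_000, 0.15)
```

Working backwards from an acceptable number of false positives gives the cutoff directly, which is often easier than trying several cutoffs by hand.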

Page 23: BLAST
Page 24: BLAST

Database Searching Questions

• What database should I search?

• What kind of sequences should I search with?

• What E-value is significant?

• What can I reliably infer about the function of my sequence based on homology?

Sequence Analysis I

Page 25: BLAST

Databases

• Bigger databases have more sequences.

• Bigger databases are also more redundant, which can skew the statistics.

• Bigger databases are often more poorly annotated (homology with an "unidentified sequence" doesn't really tell you much)

• Bigger databases take lots of time to search.

Page 26: BLAST

Databases Cont.

• Smaller databases (like Swiss-Prot) are often better curated and annotated.

• Smaller databases are much less redundant.

• Smaller databases can contain only phylogenetically relevant sequences (e.g. all plant)

• Smaller databases are much faster to search.

