
Bio-informatics tools

Date post: 30-May-2018
Upload: dr-rajesh-kumar

Transcript
  • 8/14/2019 Bio-informatics tools

    1/23

    Genome & Protein Sequence Analysis Programs: application in establishing Epidemiology and Variability

    RAJESH KUMAR

    Ph.D 1st yr

    Dairy Microbiology Division

    N.D.R.I

    Major Research efforts of Bio-informatics:-

    Sequence analysis / alignment.

    Gene finding.

    Genome assembly.

    Protein structure alignment.

    Protein structure prediction.

    Prediction of gene expression and protein-protein interactions.

    Modeling of evolution.

    Sequence Analysis

    Encompasses the use of various bioinformatic methods to determine the biological function and structure of genes and the proteins they encode.

    DNA sequences -> Decoded -> Stored in electronic databases -> Analysis

    Comparative Genomics

    Phylogenetic Tree

    Shotgun Sequencing

    Used in genetics for sequencing long DNA strands.

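    The DNA is broken into small segments, the segments are sequenced, and computer programs assemble the overlapping reads into a continuous sequence.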

    Sequence Alignment:- arrangement of two or more sequences and highlighting of their similarity.

    tcctctgcctctgccatcat---caaccccaaagt
    |||| ||| ||||| |||||   ||||||||||||
    tcctgtgcatctgcaatcatgggcaaccccaaagt
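
The alignment display above can be reproduced with a few lines of code. The following is a minimal Python sketch, assuming the two rows are already aligned and use "-" for gaps; the function names match_line and percent_identity are illustrative and do not come from any alignment package.

```python
def match_line(a: str, b: str, gap: str = "-") -> str:
    """Return a marker row with '|' wherever two aligned rows hold the same base."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return "".join("|" if x == y and x != gap else " " for x, y in zip(a, b))


def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Identical columns as a percentage of all alignment columns."""
    return 100.0 * match_line(a, b, gap).count("|") / len(a)


top = "tcctctgcctctgccatcat---caaccccaaagt"
bottom = "tcctgtgcatctgcaatcatgggcaaccccaaagt"

print(top)
print(match_line(top, bottom))
print(bottom)
```

Identity is computed here over all alignment columns, gaps included; other conventions (for example, ignoring gap columns) are equally common.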

    Structural Alignment

    More reliable over long evolutionary distances.

    Useful in identifying structurally-conserved regions.

    Multiple Alignment

    Extension of pairwise alignment to incorporate more than two sequences into an alignment.

    Helps in the identification of common regions between the sequences.

    Programs

    Clustal is used in cladistics to build phylogenetic trees.

    Framesearch

    It is an extension of Smith-Waterman for pairwise alignment between a protein sequence and a nucleotide sequence.

    It dynamically considers every possible single-nucleotide insertion or deletion to generate the translation that best matches the protein sequence.

    Software:-

    Ssearch

    Smith-Waterman remains the gold standard for protein-protein or nucleotide-nucleotide pairwise alignment.
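
As an illustration of the Smith-Waterman idea, the sketch below computes only the best local alignment score by dynamic programming. The scoring values (match 2, mismatch -1, linear gap penalty -2) are arbitrary assumptions rather than the defaults of Ssearch, and no traceback is performed, so the aligned region itself is not reported.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b with a linear gap penalty."""
    prev = [0] * (len(b) + 1)  # previous row of the score matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below zero: that is what makes the alignment local.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("xxACGTxx", "yyACGTyy"))
```

Keeping only two rows of the matrix is enough for the score; recovering the alignment would require the full matrix or a traceback structure.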

    BLAST

    An algorithm for comparing biological sequences.

    One of the most widely used tools for searching protein and DNA databases for sequence similarities.

    It helps answer questions such as:-

    Which bacterial species have a protein that is related in lineage to a certain protein whose amino-acid sequence I know?

    Where does the DNA that I've just sequenced come from?

    What other genes encode proteins that exhibit structures or motifs such as the one I've just determined?

    To run, BLAST requires two inputs:

    a query sequence (also called the target sequence)

    a sequence database.

    BLAST searches for high-scoring sequence alignments between the query and the database sequences.

    Three stages of BLAST:-

    1st stage: BLAST searches for exact matches of a small fixed length W between the query and sequences in the database.

    2nd stage: BLAST tries to extend the match in both directions, starting at the seed.

    If a high-scoring ungapped alignment is found, the database sequence is passed on to the 3rd stage.

    In the 3rd stage, BLAST performs a gapped alignment between the query sequence and the database sequence.
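
The first two stages can be sketched in Python as follows. This is a simplified illustration, not BLAST's actual implementation: the function names, the word length W = 4, the +1/-1 scoring and the drop-off threshold are all assumptions chosen for readability, and the gapped third stage is left out.

```python
from collections import defaultdict


def find_seeds(query: str, subject: str, w: int = 4):
    """Stage 1: (i, j) pairs where a length-w word of query occurs exactly in subject."""
    index = defaultdict(list)
    for j in range(len(subject) - w + 1):
        index[subject[j:j + w]].append(j)
    return [(i, j) for i in range(len(query) - w + 1)
            for j in index.get(query[i:i + w], [])]


def extend_ungapped(query, subject, i, j, w=4, match=1, mismatch=-1, drop=2):
    """Stage 2: extend a seed in both directions without gaps.

    Extension stops once the running score falls `drop` below the best seen.
    Returns (score, query_start, query_end) with query_end exclusive.
    """
    def walk(qi, sj, step):
        score = best = best_len = steps = 0
        while 0 <= qi < len(query) and 0 <= sj < len(subject):
            score += match if query[qi] == subject[sj] else mismatch
            steps += 1
            if score > best:
                best, best_len = score, steps
            elif best - score >= drop:
                break
            qi += step
            sj += step
        return best, best_len

    right_score, right_len = walk(i + w, j + w, 1)
    left_score, left_len = walk(i - 1, j - 1, -1)
    return w * match + left_score + right_score, i - left_len, i + w + right_len


query = "TTTACGTACCC"
subject = "GGGACGTACAAA"
for i, j in find_seeds(query, subject):
    print((i, j), extend_ungapped(query, subject, i, j))
```

In this toy example the three overlapping seeds all extend to the same ungapped segment, which is why real implementations merge or skip seeds that lie on an already extended diagonal.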

    An alternative to BLAST is BLAT (BLAST-Like Alignment Tool).

    FASTA:-

    Slower but more sensitive than BLAST.

    DNA and Protein sequence alignment software package.

    The original FASTP program was designed for protein sequence similarity searching.

    FASTA provided a more sophisticated shuffling program for evaluating statistical significance.

    Programs in this package:-

    FASTA is pronounced "FAST-Aye" and stands for "FAST-All".

    "FAST-P" (protein) alignment.

    "FAST-N" (nucleotide) alignment.

    Current FASTA package contains programs for:-

    protein:protein

    DNA:DNA

    protein:translated DNA

    ordered or unordered peptide searches.

    Recent versions of the FASTA package include special translated search algorithms that correctly handle frameshift errors when comparing nucleotide to protein sequence data.

    Clustal

    Clustal is a widely used multiple alignment computer program.

    i) ClustalW ii) ClustalX

    Sequence Analysis Programmes:-

    EMBOSS

    European Molecular Biology Open Software Suite (EMBOSS) is a program suite for nucleic acid and protein sequence analysis.

    EMBOSS programs manipulate, analyze, and display nucleic acid and protein sequences.

    Similar in functionality to the commercial GCG Wisconsin Software.

    PhyloGibbs

    Designed to identify where regulatory molecules bind to DNA.

    PhyloGibbs compares DNA from multiple species in order to identify areas in which the genetic code is statistically similar and to filter out the segments that are most likely to be of interest to scientists.

    AutoEditor: Automated correction of sequencing and basecaller errors

    A tool for correcting sequencing and basecaller errors using sequence alignment and chromatogram data.

    On average AutoEditor corrects 80% of erroneous base calls.

    It also greatly improves our ability to discover SNPs between closely related strains and isolates of the same species.

    MUMmer

    System for aligning whole genome sequences. Using an efficient data structure called a suffix tree, the system is able to rapidly align sequences containing millions of nucleotides.
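
To give a feel for the kind of match MUMmer reports, the sketch below finds exact matches that are unique in both sequences and cannot be extended in either direction. It is a deliberately naive stand-in and makes several assumptions: it uses a k-mer dictionary instead of a suffix tree, it only finds matches that contain a k-mer unique in both sequences, it looks at the forward strand only, and the function names are illustrative.

```python
from collections import Counter


def unique_kmers(seq: str, k: int) -> dict:
    """Map each k-mer that occurs exactly once in seq to its start position."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(kmers)
    return {kmer: i for i, kmer in enumerate(kmers) if counts[kmer] == 1}


def anchored_unique_matches(a: str, b: str, k: int = 4):
    """Maximal exact matches that contain a k-mer unique in both a and b.

    Returns sorted (start_in_a, start_in_b, length) tuples.
    """
    ua, ub = unique_kmers(a, k), unique_kmers(b, k)
    found = set()
    for kmer, i in ua.items():
        if kmer not in ub:
            continue
        j = ub[kmer]
        start_a, start_b = i, j
        # Extend to the left while the preceding bases agree.
        while start_a > 0 and start_b > 0 and a[start_a - 1] == b[start_b - 1]:
            start_a -= 1
            start_b -= 1
        end_a, end_b = i + k, j + k
        # Extend to the right while the following bases agree.
        while end_a < len(a) and end_b < len(b) and a[end_a] == b[end_b]:
            end_a += 1
            end_b += 1
        found.add((start_a, start_b, end_a - start_a))
    return sorted(found)


seq_a = "AAAACGTGCAGGGG"
seq_b = "CCCCACGTGCATTTT"
print(anchored_unique_matches(seq_a, seq_b))
```

A suffix tree lets the real tool find such matches in time roughly linear in genome length; the dictionary approach here is only practical for short examples.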

    MUMmer 3.0

    Open source.

    Improved efficiency.

    Ability to find non-unique, repetitive matches as well as unique matches.

    New graphical output modules.

    Applications:-

    MUMmer 1.0 was used to detect numerous large-scale inversions in bacterial genomes.

    MUMmer 2.1 was used to align all human chromosomes to one another and to detect numerous large-scale.

    PROmer was used to compare the human and mouse malaria parasites P. falciparum and P. yoelii.

    Current use of MUMmer 3.0:-

    1) Identifying SNPs and other mutations in a large collection of Bacillus anthracis strains.

    2) Comparing different assemblies of the same genome at different stages of sequencing and finishing.

    PSORT WWW Server

    PSORT is a computer program for the prediction of protein localization sites in cells.

    WoLF PSORT

    WoLF PSORT Prediction

    PSORT II (Recommended for animal/yeast sequences)

    PSORT II Users' Manual

    PSORT II Prediction

    PSORT (Old version; for bacterial/plant sequences)

    PSORT-B (Recommended for Gram-negative bacteria)

    PSORT-B Prediction

    PSORT-B, a program applicable to the sequences of Gram-negative bacteria.

    E. coli K12 vs. E. coli O157:H7

    S. cerevisiae vs. S. pombe

    A. fumigatus vs. A. nidulans

    P. falciparum vs. P. yoelii

    PSORT Prediction

    Source of Input Sequence:

    Gram-positive bacterium

    Gram-negative bacterium

    yeast

    animal

    plant

    Sequence ID (Default is MYSEQ):

    Enter your Amino Acid sequence below (by copy & paste):

    Characters other than the 20 standard amino acid codes will be removed.

    To submit the query, press this button: Submit

    PHIRE

    This Visual Basic program performs an algorithmic string-based search on bacteriophage genome sequences, discovering and extracting blocks that display sequence similarity without any prior experimental or predictive knowledge.

    MB Advanced DNA Analysis

    MB is a relatively small and easy-to-use program.

    Main features of MB are:

    restriction analysis

    amino acids analysis

    multiple sequence alignment tool

    dot plot

    calculation of molecular weights and chemical properties of proteins

    prediction of 3D structures for small amino acid sequences.

    UniPro DPview

    This is a tool for finding and analyzing matches between genomes.

    SEQtools

    Program package for routine handling and analysis of DNA and protein sequences.

    The package includes general facilities for sequence and contig editing, restriction enzyme mapping, translation, and repeat identification.

    DNA Club

    DNA analysis software.

    Features:- remove vector sequence, find ORF, sequence editing, translate to protein sequence, protein sequence editing, RE Map, RE Map with translation, PCR primer selection, primer or probe evaluation.
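
The "find ORF" feature listed above can be illustrated with a short Python sketch. It is not DNA Club's algorithm: it assumes the standard start codon ATG and the three standard stop codons, scans only the three forward reading frames, and uses an illustrative function name and minimum-length parameter.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 2):
    """Return (start, end) pairs for ATG...stop ORFs on the forward strand.

    `end` is exclusive and includes the stop codon; `min_codons` counts the
    codons from ATG up to, but not including, the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(dna) - 2, 3):
            codon = dna[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos  # first ATG opens the ORF in this frame
            elif start is not None and codon in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3))
                start = None  # look for the next ORF after the stop
    return sorted(orfs)


sequence = "CCATGAAATTTTAGGG"
for start, end in find_orfs(sequence):
    print(start, end, sequence[start:end])
```

A fuller tool would also scan the reverse complement and translate each ORF to protein.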

    ZCURVE

    New highly accurate system for recognizing protein coding genes in bacterial and archaeal genomes based on the Z curve theory of DNA sequence.

    DNA for Windows

    A compact, easy to use DNA analysis program, ideal for small-scale sequencing projects.

    Webcutter

    A free on-line tool to help restriction map nucleotide sequences.

    Features:-

    a simple, customizable interface

    worldwide platform-independent accessibility via the web

    seamless interfaces to NCBI's GenBank DNA sequence database and a restriction enzyme database.
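
Restriction mapping of the kind Webcutter offers comes down to locating enzyme recognition sites in a sequence. The Python sketch below assumes a tiny hand-written table of three well-known enzymes and reports site start positions only; it does not query any enzyme database or compute cut positions or fragment sizes, and the names used are illustrative.

```python
# Recognition sequences for three well-known enzymes (a tiny illustrative table).
ENZYMES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}


def restriction_map(dna: str, enzymes: dict = ENZYMES) -> dict:
    """Map each enzyme name to the 0-based start positions of its recognition site."""
    dna = dna.upper()
    sites = {}
    for name, site in enzymes.items():
        hits = []
        pos = dna.find(site)
        while pos != -1:
            hits.append(pos)
            pos = dna.find(site, pos + 1)  # step by one so overlapping sites are kept
        sites[name] = hits
    return sites


print(restriction_map("ttGAATTCaaGGATCCttGAATTC"))
```

All three sites in the table are palindromic, so searching one strand is sufficient for them; non-palindromic sites would also need a reverse-complement search.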

    Multilocus sequence typing (MLST)

    Compares sequence variation in numerous housekeeping gene targets.

    Developed for Neisseria gonorrhoeae, Streptococcus pneumoniae, and S. aureus.

    Based on the classic multilocus enzyme electrophoresis (MLEE) method used to study the genetic variability of a species.

    Drawbacks:- labor-intensive, time-consuming, and costly.
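
One common way to summarize multilocus data is to number the distinct alleles seen at each locus and treat the resulting tuple of allele numbers as the isolate's profile. The Python sketch below illustrates that bookkeeping only; the isolate names, locus names and sequences are hypothetical, and the numbering is local to this example rather than taken from any curated MLST database.

```python
def assign_sequence_types(isolates: dict) -> dict:
    """isolates: name -> {locus: sequence}. Returns name -> (allelic profile, type number)."""
    allele_ids = {}   # (locus, sequence) -> allele number, in order of first appearance
    next_allele = {}  # locus -> highest allele number used so far
    type_ids = {}     # allelic profile -> sequence type number
    result = {}
    for name, loci in isolates.items():
        profile = []
        for locus in sorted(loci):
            key = (locus, loci[locus])
            if key not in allele_ids:
                next_allele[locus] = next_allele.get(locus, 0) + 1
                allele_ids[key] = next_allele[locus]
            profile.append(allele_ids[key])
        profile = tuple(profile)
        type_ids.setdefault(profile, len(type_ids) + 1)
        result[name] = (profile, type_ids[profile])
    return result


# Hypothetical isolates with two made-up housekeeping loci.
isolates = {
    "iso1": {"geneA": "ACGT", "geneB": "TTGA"},
    "iso2": {"geneA": "ACGT", "geneB": "TTGC"},
    "iso3": {"geneA": "ACGT", "geneB": "TTGA"},
}
print(assign_sequence_types(isolates))
```

Isolates that share a profile (iso1 and iso3 here) are indistinguishable at the loci examined.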

    Single-locus sequence typing (SLST)

    Compares sequence variation of a single target.

    Provides an inexpensive, rapid, objective, and portable genotyping method to subspeciate bacteria.

    Using a single target depends on finding a region for sequencing that is sufficiently polymorphic to provide useful strain resolution.

    Loci with short sequence repeat (SSR) regions may have suitable variability for discriminating outbreaks.

    Two S. aureus genes conserved within the species, protein A (spa) and coagulase (coa), have variable SSR regions constructed from closely related 24- and 81-bp tandem repeat units, respectively.

    The genetic alterations in SSR regions include both point mutations and intragenic recombination that arise by slipped-strand mispairing during chromosomal replication and that result in a high degree of polymorphism.
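
Because an SSR region is built from tandem repeat units of fixed length, one simple way to compare isolates is to cut the region into units and write it as a string of unit codes. The Python sketch below assumes the region has already been extracted and is a whole number of units long; the default unit length of 24 follows the spa figure above, while the function names, the "r1, r2, ..." codes and the toy sequences (which use 4-base units for brevity) are hypothetical.

```python
def repeat_units(region: str, unit_len: int = 24):
    """Split an SSR region into consecutive units of unit_len bases."""
    if len(region) % unit_len:
        raise ValueError("region length is not a whole number of repeat units")
    return [region[i:i + unit_len] for i in range(0, len(region), unit_len)]


def encode_profile(units, codebook: dict) -> str:
    """Replace each unit by a short code, numbering unseen units as they appear."""
    codes = []
    for unit in units:
        codes.append(codebook.setdefault(unit, f"r{len(codebook) + 1}"))
    return "-".join(codes)


codebook = {}  # shared across isolates so identical units get identical codes
isolate_1 = encode_profile(repeat_units("AAAACCCCAAAA", unit_len=4), codebook)
isolate_2 = encode_profile(repeat_units("AAAACCCC", unit_len=4), codebook)
print(isolate_1)
print(isolate_2)
```

Two isolates with the same code string carry the same repeats in the same order; differences in unit sequence or unit count show up directly in the string.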

