Date post: | 11-May-2015 |
Category: |
Technology |
Upload: | martin-arrieta |
Introduction to Bioinformatics
Universidad de Sucre.
Medicine Program.
Alveiro Pérez-Doria
What is Bioinformatics?
Introduction
• Quantitation and quantitative tools are indispensable in modern
biology. Gregor Mendel and Thomas Morgan, by simply counting
genetic variations of plants and fruit flies, were able to
discover the principles of genetic inheritance.
• More sophisticated uses of quantitative tools include modeling
animal behavior and evolution, or using millions of nonlinear
partial differential equations to model cardiac blood flow.
It is clear that mathematical and computational tools
have become an integral part of modern-day biological research.
What is Bioinformatics?
• A union of biology and informatics: bioinformatics
involves the technology that uses computers for storage,
retrieval, manipulation, and distribution of information
related to biological macromolecules such as DNA,
RNA, and proteins.
What is Bioinformatics?
• Not every use of computation in biology is bioinformatics.
For example, mathematical modeling of ecosystems,
population dynamics, and phylogenetic construction
using fossil records all employ computational tools, but
do not necessarily involve biological macromolecules.
• Bioinformatics is limited to sequence, structural, and
functional analysis of genes and genomes and their
corresponding products, and is therefore often regarded as
computational molecular biology.
Bioinformatics
Bioinformatics consists of two subfields: the development of
computational tools and databases and the application of these
tools and databases in generating biological knowledge to better
understand living systems.
Bioinformatics
These two subfields are complementary to each other. Tool
development includes writing software for sequence, structural, and
functional analysis, as well as the construction and curation of
biological databases.
The analyses of biological data often generate new problems and
challenges that in turn spur the development of new and better
computational tools.
GOALS
• The goal is to better understand a living cell and how it functions
at the molecular level. By analyzing raw molecular sequence and
structural data, bioinformatics research can generate new insights
and provide a “global” perspective of the cell.
GOALS
• The functions of a cell can be better understood by analyzing
sequence data because the flow of genetic information is dictated
by the “central dogma” of biology, in which DNA is transcribed to
RNA, which is translated to proteins.
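The flow DNA → RNA → protein can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the input is taken to be the coding (sense) strand, the standard genetic code is used, and the function names `transcribe` and `translate` are hypothetical, not part of any library.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def transcribe(dna: str) -> str:
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return dna.upper().replace("T", "U")


def translate(rna: str) -> str:
    """Translate an mRNA, codon by codon, stopping at the first stop codon."""
    protein = []
    for start in range(0, len(rna) - 2, 3):
        codon = rna[start:start + 3].upper().replace("U", "T")
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGGCCTAA")
print(mrna, translate(mrna))
```

Real sequence data would also need handling of ambiguous bases and alternative genetic codes, which this sketch leaves out.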
APPLICATIONS
• Bioinformatics has not only become essential for basic genomic
and molecular biology research, but is having a major impact on
many areas of biotechnology and biomedical sciences:
• Knowledge-based drug design.
• Forensic DNA analysis.
• Agricultural biotechnology.
APPLICATIONS
• Knowledge-based drug design (identification of
novel leads for synthetic drugs):
• Knowledge of the three-dimensional structures of
proteins allows molecules to be designed that are
capable of binding to the receptor site of a target
protein with great affinity and specificity.
• Compared with the traditional trial-and-error approach,
this informatics-based approach significantly reduces
the time and cost necessary to develop drugs with
higher potency, fewer side effects, and less toxicity.
APPLICATIONS
• Forensic DNA analysis: results from molecular phylogenetic analysis have been accepted as evidence in criminal courts. Uses include:
• Identify potential suspects whose DNA may match evidence left at crime scenes
• Exonerate persons wrongly accused of crimes
• Identify crime and catastrophe victims
• Establish paternity and other family relationships
• Identify endangered and protected species
• Detect bacteria and other organisms that may pollute air, water, soil, and food
• Match organ donors with recipients for transplants
• Determine pedigree for seed or livestock breeds
• Authenticate consumables such as caviar and wine
APPLICATIONS
• Biomedical science: genomics and bioinformatics are now poised to
revolutionize our healthcare system by developing personalized and
customized medicine.
APPLICATIONS
• Biomedical science: high-speed
genomic sequencing coupled with
sophisticated informatics technology will
allow clinicians to quickly sequence a
patient’s genome, easily detect potentially
harmful mutations, and engage in
early diagnosis and effective treatment
of diseases.
Single-nucleotide polymorphism (SNP)
• What is a SNP? It is a substitution of one base for another, e.g., C to T or A
to G.
• SNP is the most common variation in the human genome and occurs approximately
once every 100 to 300 bases.
• SNP is terminologically distinguished from mutation based on an arbitrary population
frequency cutoff value: 1%, with SNP > 1% and mutation < 1%.
• A key aspect of research in genetics is associating sequence variations with
heritable phenotypes. Because SNPs are expected to facilitate large-scale
association genetics studies, there has been an increasing interest in SNP discovery
and detection.
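The two ideas above (a SNP as a single-base substitution, and the 1% population-frequency cutoff) can be illustrated with a small Python sketch. It assumes the two sequences are already aligned and of equal length; the function names are hypothetical, and the treatment of a frequency of exactly 1% is an arbitrary choice because the cutoff as stated leaves it undefined.

```python
def find_substitutions(reference: str, sample: str) -> list:
    """List (1-based position, reference base, sample base) for each mismatch.

    Assumes the sequences are pre-aligned and the same length.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (position, ref_base, alt_base)
        for position, (ref_base, alt_base)
        in enumerate(zip(reference.upper(), sample.upper()), start=1)
        if ref_base != alt_base
    ]


def classify_variant(population_frequency: float, cutoff: float = 0.01) -> str:
    """Apply the frequency cutoff: above it is a SNP, otherwise a mutation.

    A frequency exactly at the cutoff is labeled "mutation" here (assumption).
    """
    return "SNP" if population_frequency > cutoff else "mutation"


print(find_substitutions("ACGTC", "ACGTT"))
print(classify_variant(0.05), classify_variant(0.004))
```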
SNP and Human Disease
• Identification of SNPs that contribute to susceptibility to common diseases will
provide highly accurate diagnostic information that will facilitate early diagnosis,
prevention, and treatment of human diseases.
• Common SNPs, with minor allele frequencies ranging from 5% to more than 20%, are of
interest because it has been argued that common genetic variation can explain a
proportion of common human disease: the common variant/common disease (CV/CD)
hypothesis.
• There are two types of coding SNPs: nonsynonymous SNPs and synonymous SNPs.
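The distinction between the two types of coding SNP can be sketched as follows: a synonymous SNP leaves the encoded amino acid unchanged, while a nonsynonymous SNP changes it. The sketch assumes the standard genetic code and a single-base change within one codon; `classify_coding_snp` is a hypothetical helper name.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_coding_snp(ref_codon: str, alt_codon: str) -> str:
    """Return "synonymous" or "nonsynonymous" for a one-base codon change.

    A change to or from a stop codon ("*") is reported as nonsynonymous here.
    """
    ref, alt = ref_codon.upper(), alt_codon.upper()
    if len(ref) != 3 or len(alt) != 3:
        raise ValueError("codons must be three bases long")
    if sum(a != b for a, b in zip(ref, alt)) != 1:
        raise ValueError("codons must differ at exactly one position")
    same_residue = CODON_TABLE[ref] == CODON_TABLE[alt]
    return "synonymous" if same_residue else "nonsynonymous"


print(classify_coding_snp("CGC", "CGT"))  # both codons encode arginine
print(classify_coding_snp("CGC", "CAC"))  # arginine to histidine
```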
SNP and Human Disease
• “Direct” approach: because nonsynonymous SNPs directly affect protein function, many
investigators focus on the genotyping of coding SNPs in genetic association studies.
The difficulty of this approach lies in predicting or determining a priori which SNPs
are likely to be causative or predictive of the phenotype of interest.
• The “indirect” approach to genetic association studies differs from the direct approach
in that the causal SNP is not assayed directly.
• The assumption is that the assayed or genotyped SNPs will be in linkage
disequilibrium or associated with the causative SNP.
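Linkage disequilibrium between two biallelic SNPs is commonly summarized by the coefficient D and the squared correlation r². The sketch below uses the standard definitions D = p_AB - p_A·p_B and r² = D² / (p_A(1 - p_A)·p_B(1 - p_B)); the function name and the example frequencies are hypothetical.

```python
def linkage_disequilibrium(p_ab: float, p_a: float, p_b: float) -> tuple:
    """Return (D, r_squared) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a:  frequency of allele A at the first locus
    p_b:  frequency of allele B at the second locus
    """
    for frequency in (p_a, p_b):
        if not 0.0 < frequency < 1.0:
            raise ValueError("allele frequencies must lie strictly between 0 and 1")
    d = p_ab - p_a * p_b
    r_squared = d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))
    return d, r_squared


# Alleles always inherited together: complete disequilibrium.
print(linkage_disequilibrium(0.5, 0.5, 0.5))
# Haplotype frequency equals the product of allele frequencies: no association.
print(linkage_disequilibrium(0.25, 0.5, 0.5))
```

When r² is close to 1, genotyping the assayed SNP carries nearly the same information as genotyping the causal SNP, which is the assumption the indirect approach relies on.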
Cystic Fibrosis
• Cystic fibrosis is caused by mutations in a single gene encoding CFTR, yet the disease has a
variable clinical phenotype.
• The most common mutation associated with cystic fibrosis, deletion of a phenylalanine at position
508 (frequency, 67%), is associated with severe disease. Some mutations, however, are associated
with milder disease: replacement of arginine by histidine at residue 117 (R117H; 0.8%), by
tryptophan at 334 (0.4%), or by proline at 347 (0.5%).
• NCBI: CFTR cystic fibrosis transmembrane conductance regulator (ATP-binding cassette
sub-family C, member 7) [ Homo sapiens ]
APPLICATIONS
• Agricultural biotechnology: Plant genome databases and gene
expression profile analyses have played an important role in the
development of new crop varieties that have higher productivity
and greater resistance to disease, insecticides, and insects.
LIMITATIONS
• Bioinformatics and experimental biology are
independent, but complementary, activities.
Bioinformatics depends on experimental science to
produce raw data for analysis.
• Bioinformatics predictions are not formal proofs of any
concepts. They do not replace the traditional
experimental research methods of actually testing
hypotheses.
LIMITATIONS
• The quality of bioinformatics predictions depends on the
quality of data and the sophistication of the algorithms
being used. Sequence data from high-throughput
analysis often contain errors, so it is important to
maintain a realistic perspective of the role of
bioinformatics.
NEW THEMES
• In addition to providing more reliable and more rigorous computational tools for sequence, structural, and functional analysis, the major challenge for future bioinformatics development is to develop tools for elucidating the functions and interactions of all gene products in a cell.
• This molecular simulation of all the cellular processes is termed systems biology. The ultimate goal of this endeavor is to transform biology from a qualitative science to a quantitative and predictive science.
Databases
DNA Sequence Data
• EBI (European Bioinformatics Institute): http://www.ebi.ac.uk/
• NCBI (National Center for Biotechnology Information): http://www.ncbi.nlm.nih.gov/
• DDBJ: http://www.ddbj.nig.ac.jp/
• Sequence Retrieval System
Introduction
• One of the hallmarks of modern genomic research is the generation of
enormous amounts of raw sequence data.
• As the volume of genomic data grows, sophisticated computational
methodologies are required to manage the data deluge.
• The very first challenge in the genomics era is to store and handle the
staggering volume of information through the establishment and use of
computer databases.
• The development of databases to handle the vast amount of molecular
biological data is thus a fundamental task of bioinformatics.
WHAT IS A DATABASE?
• A database is a computerized archive used to store and organize
data in such a way that information can be retrieved easily via a
variety of search criteria.
• Databases are composed of computer hardware and software for
data management.
• Each record, also called an entry, should contain a number of fields
that hold the actual data items, for example, fields for names, phone
numbers, addresses, dates.
• To retrieve a particular record from the database, a user can
specify a particular piece of information, called value, to be found in
a particular field and expect the computer to retrieve the whole data
record. This process is called making a query.
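The record, field, value, and query vocabulary above can be illustrated with a short Python sketch. The `Record` fields follow the examples named in the text (names, phone numbers, addresses); the sample entries and the `query` helper are hypothetical, and a real database would use a database management system rather than an in-memory list.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """One database entry; each attribute is a field holding a data item."""
    name: str
    phone: str
    address: str


def query(records: list, field: str, value: str) -> list:
    """Return every whole record whose given field matches the value."""
    return [record for record in records if getattr(record, field) == value]


# Made-up entries, for illustration only.
database = [
    Record("Ana", "555-0101", "Sincelejo"),
    Record("Luis", "555-0102", "Cartagena"),
    Record("Marta", "555-0103", "Sincelejo"),
]

for match in query(database, "address", "Sincelejo"):
    print(match)
```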
Which courses are students from Texas taking?
• Construction and query of an object-oriented database. Three objects are constructed and are linked by pointers shown as arrows. Finding specific information relies on navigating through the objects by way of pointers. For simplicity, some of the pointers are omitted
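The navigation described in this caption can be sketched in Python, where object references play the role of pointers. The class names, the sample students, and the `courses_for_state` helper are all hypothetical; the original figure's objects are not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Course:
    title: str


@dataclass
class Student:
    name: str
    state: str
    # References to Course objects act as the "pointers" between objects.
    courses: list = field(default_factory=list)


def courses_for_state(students: list, state: str) -> list:
    """Follow each matching student's references and collect course titles."""
    titles = set()
    for student in students:
        if student.state == state:
            for course in student.courses:
                titles.add(course.title)
    return sorted(titles)


# Made-up objects, for illustration only.
biology = Course("Biology")
chemistry = Course("Chemistry")
math = Course("Math")
students = [
    Student("Student 1", "Texas", [biology, chemistry]),
    Student("Student 2", "Ohio", [math]),
    Student("Student 3", "Texas", [chemistry]),
]

print(courses_for_state(students, "Texas"))
```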
MODERN DNA SEQUENCING
Historically there are two main methods of DNA sequencing:
• Maxam & Gilbert, using chemical sequencing
• Sanger, using dideoxynucleotides
Modern sequencing equipment uses the principles of the Sanger technique.
How to obtain sequences
Step 1 - Before submission for sequencing, DNA purity and concentration are checked
with the ‘Nanodrop’
A Nanodrop readout of known concentration to be run as a control
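A spectrophotometer readout of this kind is usually interpreted with two simple calculations, sketched below. The conversion factor (an A260 of 1.0 corresponds to about 50 ng/µL of double-stranded DNA at a 1 cm path length) and the A260/A280 ratio of roughly 1.8 for pure DNA are widely used conventions, not values from these slides; the acceptance window of 1.7 to 2.0 and the function names are assumptions for illustration.

```python
DSDNA_NG_PER_UL_PER_A260 = 50.0  # conventional factor for double-stranded DNA


def dsdna_concentration(a260: float, dilution_factor: float = 1.0) -> float:
    """Estimate dsDNA concentration in ng/uL from absorbance at 260 nm."""
    return a260 * DSDNA_NG_PER_UL_PER_A260 * dilution_factor


def purity_ratio(a260: float, a280: float) -> float:
    """Return the A260/A280 ratio used as a rough purity indicator."""
    return a260 / a280


def looks_pure(a260: float, a280: float,
               low: float = 1.7, high: float = 2.0) -> bool:
    """Check the ratio against an assumed acceptance window around 1.8."""
    return low <= purity_ratio(a260, a280) <= high


print(dsdna_concentration(0.5))
print(purity_ratio(0.9, 0.5), looks_pure(0.9, 0.5))
```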
Step 2 - Samples are received and
stored in the refrigerator, and a
request is filed
Cost?
Samples arrive in Eppendorf tubes
Step 3 - Paperwork. Each request is assigned a ‘well’ in the sample tray, and volumes of primers, water, dye, etc. are calculated. A typical ‘run’ has samples
from a number of researchers.
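The well assignment in this step can be sketched as a simple mapping. The sketch assumes the common 96-well layout of 8 rows (A to H) by 12 columns, filled row by row; the function name and the request identifiers are hypothetical, and reagent volumes are not computed because the slides do not give them.

```python
from string import ascii_uppercase


def assign_wells(sample_ids: list, rows: int = 8, cols: int = 12) -> dict:
    """Map well labels (A1, A2, ..., H12) to sample IDs, filling row by row."""
    capacity = rows * cols
    if len(sample_ids) > capacity:
        raise ValueError(f"a tray holds at most {capacity} samples")
    wells = [
        f"{ascii_uppercase[row]}{col}"
        for row in range(rows)
        for col in range(1, cols + 1)
    ]
    # zip stops at the shorter input, so unused wells are simply left out.
    return dict(zip(wells, sample_ids))


# Made-up request IDs from several researchers, plus a control sample.
requests = ["control", "lab1-s1", "lab1-s2", "lab2-s1"]
for well, sample in assign_wells(requests).items():
    print(well, sample)
```

Keeping the mapping in one place makes it easier to check that each sample goes into its assigned well when the tray is loaded.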
Step 4 - Samples are agitated, then centrifuged in an Ultracentrifuge to be sure they are at the bottom of their Eppendorf
tubes.
Step 5 - Reagents
• Each reaction requires several reagents:
• Specific primers for the DNA in question
• Fluorescent dye attached to dideoxynucleotides (Big Dye)
• Deionised water
• DNA polymerase
• Additionally, a ‘control’ sample of a known DNA is prepared so it can run at the same time as the experimental DNA
Micropipettors come in a range of sizes. They have disposable tips that hold tiny amounts of required reagents.
Step 6 - Preparing the wells
• The sample wells are loaded with the DNA to be sequenced. Great care needs to be taken to ensure that each sample goes into its assigned well.
• Reagents are added (water, dye, primers) in the required amounts.
• The sample wells are ‘spun’ to ensure that the DNA and reagents are mixed and at the bottom of the sample wells.
Sample tray and micropipettor. Each tray holds 96 samples
Step 7 - The samples are run through a cycle
sequencing process so that the fluorescent dyes are incorporated into
the DNA. The DNA and reagents are
alternately heated and cooled over a 2 1/2 hour period.
A Sequence print-out from a control sample
THANK YOU