Introduction to Bioinformatics

Universidad de Sucre.

Programa de Medicina.

Alveiro Pérez-Doria

alveiroperez@gmail.com

What is Bioinformatics?

Introduction

• Quantitative tools are indispensable in modern biology. Gregor Mendel and Thomas Morgan, simply by counting genetic variations of plants and fruit flies, were able to discover the principles of genetic inheritance.

• More sophisticated uses of quantitative tools include modeling animal behavior and evolution, or using millions of nonlinear partial differential equations to model cardiac blood flow. It is clear that mathematical and computational tools have become an integral part of modern-day biological research.

What is Bioinformatics?

• Bioinformatics is a union of biology and informatics: it involves the technology that uses computers for storage, retrieval, manipulation, and distribution of information related to biological macromolecules such as DNA, RNA, and proteins.

What is Bioinformatics?

• Not all computational work in biology is bioinformatics. For example, mathematical modeling of ecosystems, population dynamics, and phylogenetic construction using fossil records all employ computational tools, but do not necessarily involve biological macromolecules.

Computational molecular biology.

• Bioinformatics is limited to sequence, structural, and functional analysis of genes and genomes and their corresponding products, which is why it is often described as computational molecular biology.

Bioinformatics

Bioinformatics consists of two subfields: the development of

computational tools and databases and the application of these

tools and databases in generating biological knowledge to better

understand living systems.

Bioinformatics

These two subfields are complementary to each other. The tool

development includes writing software for sequence, structural, and

functional analysis, as well as the construction and curation of

biological databases.

The analyses of biological data often generate new problems and

challenges that in turn spur the development of new and better

computational tools.

• The goal of bioinformatics is to better understand a living cell and how it functions at the

molecular level. By analyzing raw molecular sequence and

structural data, bioinformatics research can generate new insights

and provide a “global” perspective of the cell.

• The reason that the functions of a cell can be better understood

by analyzing sequence data is ultimately because the flow of

genetic information is dictated by the “central dogma” of biology in

which DNA is transcribed to RNA, which is translated to proteins.
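
The flow of information in the central dogma can be sketched in a few lines of Python. This is only an illustration: the function names `transcribe` and `translate` and the short example sequence are assumptions made for this sketch, and the codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate mRNA codon by codon, stopping at the first stop codon."""
    protein = []
    for start in range(0, len(mrna) - 2, 3):
        codon = mrna[start:start + 3].replace("U", "T")
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


dna = "ATGTTTGGTTAA"  # hypothetical example sequence
mrna = transcribe(dna)
print(mrna)             # AUGUUUGGUUAA
print(translate(mrna))  # MFG
```

Because the protein sequence follows directly from the DNA sequence in this way, sequence data alone already carries information about the products a cell can make.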

APPLICATIONS

• Bioinformatics has not only become essential for basic genomic

and molecular biology research, but is having a major impact on

many areas of biotechnology and biomedical sciences.

• Knowledge-based drug design.

• Forensic DNA analysis.

• Agricultural biotechnology.

APPLICATIONS

• Knowledge-based drug design: identification of novel leads for synthetic drugs.

• Knowledge of the three-dimensional structures of

proteins allows molecules to be designed that are

capable of binding to the receptor site of a target

protein with great affinity and specificity.

• This informatics-based approach significantly reduces the time and cost necessary to develop drugs with higher potency, fewer side effects, and less toxicity than the traditional trial-and-error approach.

APPLICATIONS

• Forensic DNA analysis: results from molecular phylogenetic analysis have been accepted as evidence in criminal courts.

• Identify potential suspects whose DNA may match evidence left at crime scenes

• Exonerate persons wrongly accused of crimes

• Identify crime and catastrophe victims

• Establish paternity and other family relationships

• Identify endangered and protected species

• Detect bacteria and other organisms that may pollute air, water, soil, and food

• Match organ donors with recipients in transplant programs

• Determine pedigree for seed or livestock breeds

• Authenticate consumables such as caviar and wine

APPLICATIONS

• Biomedical science: genomics and bioinformatics are now poised to

revolutionize our healthcare system by developing personalized and

customized medicine.

APPLICATIONS

• Biomedical science: high-speed genomic sequencing coupled with sophisticated informatics technology will allow clinicians to quickly sequence a patient’s genome, easily detect potentially harmful mutations, and engage in early diagnosis and effective treatment of diseases.

Single-nucleotide polymorphism (SNP)

• What is a SNP? It represents a substitution of one base for another, e.g., C to T or A.

• SNPs are the most common type of variation in the human genome, occurring approximately once every 100 to 300 bases.

• A SNP is distinguished from a mutation by an arbitrary population frequency cutoff of 1%: a variant with a frequency above 1% is called a SNP, and one below 1% a mutation.

• A key aspect of research in genetics is associating sequence variations with

heritable phenotypes. Because SNPs are expected to facilitate large-scale

association genetics studies, there has been an increasing interest in SNP discovery

and detection.
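
The definitions above can be illustrated with a minimal Python sketch. It assumes two sequences that are already aligned and of equal length; the function names and example sequences are hypothetical, and treating a frequency of exactly 1% as a mutation is an arbitrary choice made here because the cutoff itself is arbitrary.

```python
def find_substitutions(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """List (1-based position, reference base, sample base) for each mismatch.

    Assumes the two sequences are already aligned and of equal length.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        (position, ref_base, alt_base)
        for position, (ref_base, alt_base) in enumerate(zip(reference, sample), start=1)
        if ref_base != alt_base
    ]


def classify_variant(allele_frequency: float, cutoff: float = 0.01) -> str:
    """Apply the 1% population-frequency convention: above it SNP, otherwise mutation."""
    return "SNP" if allele_frequency > cutoff else "mutation"


reference = "ACGTCCGTA"  # hypothetical reference sequence
sample = "ACGTTCGTA"     # hypothetical individual, one base changed

print(find_substitutions(reference, sample))  # [(5, 'C', 'T')]
print(classify_variant(0.12))                 # SNP
print(classify_variant(0.004))                # mutation
```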

SNP and Human Disease

• Identification of SNPs that contribute to susceptibility to common diseases will

provide highly accurate diagnostic information that will facilitate early diagnosis,

prevention, and treatment of human diseases.

• Common SNPs, with minor allele frequencies ranging from 5% to more than 20%, are of interest because it has been argued that common genetic variation can explain a proportion of common human disease: the common variant/common disease (CV/CD) hypothesis.

• There are two types of coding SNPs: nonsynonymous SNPs and synonymous SNPs.
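
The distinction can be made concrete with a short Python sketch: a coding SNP is synonymous if the changed codon still encodes the same amino acid and nonsynonymous otherwise. The function name `classify_coding_snp` and the example codons are assumptions for illustration; the codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_snp(codon: str, position: int, new_base: str) -> str:
    """Classify a single-base change inside one codon (position is 0, 1, or 2)."""
    mutated = codon[:position] + new_base + codon[position + 1:]
    same_amino_acid = CODON_TABLE[codon] == CODON_TABLE[mutated]
    return "synonymous" if same_amino_acid else "nonsynonymous"


# GGT and GGC both encode glycine, so the protein is unchanged.
print(classify_coding_snp("GGT", 2, "C"))  # synonymous
# CGC encodes arginine, CAC encodes histidine.
print(classify_coding_snp("CGC", 1, "A"))  # nonsynonymous
```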

SNP and Human Disease

• “Direct” approach: because nonsynonymous SNPs directly affect protein function, many investigators focus on genotyping coding SNPs in genetic association studies. The difficulty with this approach lies in predicting or determining a priori which SNPs are likely to be causative or predictive of the phenotype of interest.

• The “indirect” approach to genetic association studies differs from the direct approach

in that the causal SNP is not assayed directly.

• The assumption is that the assayed or genotyped SNPs will be in linkage

disequilibrium or associated with the causative SNP.
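
Linkage disequilibrium between two biallelic sites is commonly summarized by D = p(AB) - p(A)p(B) and by r² = D² / [p(A)(1 - p(A)) p(B)(1 - p(B))]. The following Python sketch computes both from haplotype counts. The function name, the haplotype labels, and the counts are hypothetical and serve only to illustrate the calculation.

```python
def linkage_disequilibrium(haplotype_counts: dict[str, int]) -> tuple[float, float]:
    """Return (D, r_squared) for two biallelic loci.

    Keys are two-letter haplotypes: 'AB', 'Ab', 'aB', 'ab'.
    """
    total = sum(haplotype_counts.values())
    count_AB = haplotype_counts.get("AB", 0)
    p_AB = count_AB / total
    p_A = (count_AB + haplotype_counts.get("Ab", 0)) / total
    p_B = (count_AB + haplotype_counts.get("aB", 0)) / total

    d = p_AB - p_A * p_B
    denominator = p_A * (1 - p_A) * p_B * (1 - p_B)
    r_squared = d * d / denominator if denominator > 0 else 0.0
    return d, r_squared


# The genotyped SNP perfectly tracks the causal SNP: r squared is 1.
print(linkage_disequilibrium({"AB": 50, "Ab": 0, "aB": 0, "ab": 50}))
# The two sites are independent: D and r squared are both 0.
print(linkage_disequilibrium({"AB": 25, "Ab": 25, "aB": 25, "ab": 25}))
```

The closer r² is to 1, the better the genotyped SNP stands in for the causal SNP that was not assayed.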

Cystic Fibrosis

• Although cystic fibrosis is caused by mutations in a single gene encoding CFTR, the disease has a variable clinical phenotype.

• The most common mutation associated with cystic fibrosis, deletion of a phenylalanine at position 508 (frequency, 67%), is associated with severe disease. But some mutations, in which arginine is replaced by histidine at residue 117 (R117H; 0.8%), by tryptophan at 334 (0.4%), or by proline at 347 (0.5%), are associated with milder disease.

• NCBI: CFTR cystic fibrosis transmembrane conductance regulator (ATP-binding cassette sub-family C, member 7) [ Homo sapiens ]

APPLICATIONS

• Agricultural biotechnology: Plant genome databases and gene

expression profile analyses have played an important role in the

development of new crop varieties that have higher productivity

and more resistance to disease, insecticides and insects.

LIMITATIONS

• Bioinformatics and experimental biology are

independent, but complementary, activities.

Bioinformatics depends on experimental science to

produce raw data for analysis.

• Bioinformatics predictions are not formal proofs of any

concepts. They do not replace the traditional

experimental research methods of actually testing

hypotheses.

LIMITATIONS

• The quality of bioinformatics predictions depends on the quality of data and the sophistication of the algorithms being used. Sequence data from high-throughput analysis often contain errors, so it is important to maintain a realistic perspective of the role of bioinformatics.

NEW THEMES

• In addition to providing more reliable and more rigorous computational tools for sequence, structural, and functional analysis, the major challenge for future bioinformatics development is to develop tools for elucidating the functions and interactions of all gene products in a cell.

• This molecular simulation of all the cellular processes is termed systems biology. The ultimate goal of this endeavor is to transform biology from a qualitative science to a quantitative and predictive science.

Databases

DNA Sequence Data

• EBI (European Bioinformatics Institute): http://www.ebi.ac.uk/

• NCBI (National Center for Biotechnology Information): http://www.ncbi.nlm.nih.gov/

• DDBJ: http://www.ddbj.nig.ac.jp/

• Sequence Retrieval System

Introduction

• One of the hallmarks of modern genomic research is the generation of

enormous amounts of raw sequence data.

• As the volume of genomic data grows, sophisticated computational

methodologies are required to manage the data deluge.

• The very first challenge in the genomics era is to store and handle the

staggering volume of information through the establishment and use of

computer databases.

• The development of databases to handle the vast amount of molecular

biological data is thus a fundamental task of bioinformatics.

WHAT IS A DATABASE?

• A database is a computerized archive used to store and organize

data in such a way that information can be retrieved easily via a

variety of search criteria.

• Databases are composed of computer hardware and software for

data management.

• Each record, also called an entry, should contain a number of fields

that hold the actual data items, for example, fields for names, phone

numbers, addresses, dates.

• To retrieve a particular record from the database, a user can

specify a particular piece of information, called a value, to be found in

a particular field and expect the computer to retrieve the whole data

record. This process is called making a query.
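
The ideas of records, fields, values, and queries can be sketched in Python as follows. The records, identifiers, and field names are hypothetical placeholders, not entries from any real database, and `query` is an illustrative helper rather than the interface of an actual database system.

```python
# Hypothetical records; each dictionary is one entry and each key is a field.
records = [
    {"id": "EX0001", "organism": "Homo sapiens", "molecule": "DNA", "gene": "CFTR"},
    {"id": "EX0002", "organism": "Homo sapiens", "molecule": "protein", "gene": "CFTR"},
    {"id": "EX0003", "organism": "Drosophila melanogaster", "molecule": "DNA", "gene": "white"},
]


def query(entries, field, value):
    """Return every whole record whose given field holds the given value."""
    return [entry for entry in entries if entry.get(field) == value]


hits = query(records, "organism", "Homo sapiens")
print([entry["id"] for entry in hits])  # ['EX0001', 'EX0002']
```

The user supplies only one field and one value, and the whole matching records come back, which is the behavior described above.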

Which courses are students from Texas taking?

• Construction and query of an object-oriented database. Three objects are constructed and are linked by pointers shown as arrows. Finding specific information relies on navigating through the objects by way of pointers. For simplicity, some of the pointers are omitted
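
A minimal Python sketch of this kind of navigation is shown below. The original figure is not reproduced here, so the three object types (`Student`, `Address`, `Course`), their attributes, and the sample data are assumptions chosen to answer the example question; object references play the role of the pointers.

```python
from dataclasses import dataclass, field


@dataclass
class Address:
    state: str


@dataclass
class Course:
    title: str


@dataclass
class Student:
    name: str
    address: Address                             # pointer to an Address object
    courses: list = field(default_factory=list)  # pointers to Course objects


def courses_taken_in(students, state):
    """Follow student -> address and student -> courses to answer the query."""
    titles = set()
    for student in students:
        if student.address.state == state:
            titles.update(course.title for course in student.courses)
    return sorted(titles)


biology = Course("Biology")
chemistry = Course("Chemistry")
math = Course("Math")
students = [
    Student("Student 1", Address("Texas"), [biology, chemistry]),
    Student("Student 2", Address("Ohio"), [math]),
    Student("Student 3", Address("Texas"), [chemistry]),
]

print(courses_taken_in(students, "Texas"))  # ['Biology', 'Chemistry']
```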

MODERN DNA SEQUENCING

Historically there are two main methods of DNA sequencing:

Maxam & Gilbert, using chemical sequencing

Sanger, using dideoxynucleotides. Modern sequencing equipment uses the principles of the Sanger technique.
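
The principle behind dideoxy sequencing can be illustrated with a highly idealized Python sketch: synthesis stops wherever a labeled dideoxynucleotide is incorporated, so the reaction yields fragments of every length, and reading the terminal base of each fragment in order of length recovers the sequence. The function names and the example strand are hypothetical, and the sketch assumes exactly one fragment per length with no errors.

```python
def terminated_fragments(new_strand: str) -> list[str]:
    """Every prefix of the newly synthesized strand, one per termination site."""
    return [new_strand[:end] for end in range(1, len(new_strand) + 1)]


def read_by_length(fragments: list[str]) -> str:
    """Sort fragments by length, as separation by size does, and read each terminal base."""
    return "".join(fragment[-1] for fragment in sorted(fragments, key=len))


strand = "GATTACA"                              # hypothetical newly synthesized strand
fragments = terminated_fragments(strand)[::-1]  # arbitrary starting order
print(read_by_length(fragments))                # GATTACA
```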

How to obtain sequences

Step 1 - Before submission for sequencing, DNA purity and concentration are checked with the ‘Nanodrop’.

A Nanodrop readout of known concentration to be run as a control
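
As a rough illustration of what such a check involves, the sketch below applies two widely used spectrophotometry conventions: an absorbance of 1.0 at 260 nm corresponds to about 50 ng/µL of double-stranded DNA (1 cm path length), and an A260/A280 ratio near 1.8 is generally taken to indicate pure DNA. The function names and the acceptance window of 1.7 to 2.0 are assumptions for this sketch, not the sequencing facility's actual criteria.

```python
def dna_concentration_ng_per_ul(a260: float, dilution_factor: float = 1.0) -> float:
    """Estimate dsDNA concentration: A260 of 1.0 is taken as about 50 ng/uL."""
    return a260 * 50.0 * dilution_factor


def purity_ratio(a260: float, a280: float) -> float:
    """A260/A280 ratio; values near 1.8 are generally taken to indicate pure DNA."""
    return a260 / a280


def looks_pure(a260: float, a280: float, low: float = 1.7, high: float = 2.0) -> bool:
    """Check the ratio against an assumed acceptance window (hypothetical thresholds)."""
    return low <= purity_ratio(a260, a280) <= high


# Hypothetical readings for one sample.
a260, a280 = 0.9, 0.5
print(dna_concentration_ng_per_ul(a260))  # 45.0
print(round(purity_ratio(a260, a280), 2))  # 1.8
print(looks_pure(a260, a280))              # True
```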

Step 2 - Samples are received and stored in the refrigerator, and a request is filed.

Samples arrive in Eppendorf tubes

Step 3 - Paperwork. Each request is assigned a ‘well’ in the sample tray, and volumes of primers, water, dye, etc. are calculated. A typical ‘run’ has samples from a number of researchers.
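
The well assignment in this step can be sketched in Python, assuming the standard 96-well layout of 8 rows (A to H) by 12 columns and a simple row-by-row filling order. The function names, the filling order, and the request labels are hypothetical; a real facility may assign wells differently.

```python
from string import ascii_uppercase


def plate_wells(rows: int = 8, columns: int = 12) -> list[str]:
    """Well labels in row-major order: A1..A12, B1..B12, and so on."""
    return [
        f"{ascii_uppercase[row]}{column}"
        for row in range(rows)
        for column in range(1, columns + 1)
    ]


def assign_wells(request_ids: list[str]) -> dict[str, str]:
    """Map each well to one request, refusing more requests than wells."""
    wells = plate_wells()
    if len(request_ids) > len(wells):
        raise ValueError("more requests than wells on one tray")
    return dict(zip(wells, request_ids))


# Hypothetical run with a control and samples from two researchers.
requests = ["control", "researcher1-s1", "researcher1-s2", "researcher2-s1"]
print(assign_wells(requests))
# {'A1': 'control', 'A2': 'researcher1-s1', 'A3': 'researcher1-s2', 'A4': 'researcher2-s1'}
```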

Step 4 - Samples are agitated, then centrifuged in an ultracentrifuge to be sure they are at the bottom of their Eppendorf tubes.

Step 5 - Reagents

• Each reaction requires several reagents:

• Specific primers for the DNA in question

• Fluorescent dye attached to dideoxynucleotides (Big Dye)

• Deionised water

• DNA polymerase

• Additionally, a ‘control’ sample of a known DNA is prepared so it can run at the same time as the experimental DNA

Micropipettors come in a range of sizes. They have disposable tips that hold tiny amounts of required reagents.

Step 6 - Preparing the wells

• The Sample wells are loaded with DNA to be sequenced. Great care needs to be taken to ensure that each sample goes into its assigned well.

• Reagents are added (water, dye, primers) in required amounts

• The sample wells are ‘spun’ to ensure that the DNA and reagents are mixed and at the bottom of the sample wells.

Sample tray and micropipettor. Each tray holds 96 samples

Step 7 - The samples are run through a cycle sequencing process to get the fluorescent dyes incorporated into the DNA. The DNA and reagents are alternately heated and cooled over a 2 1/2 hour period.

A Sequence print-out from a control sample

THANK YOU
