Features of biological databases

Post on 14-Apr-2017


Features of Biological Databases

CHARU SHARMA
B.Sc. (H) Botany, 3rd Year

Biological Database
- A collection of data that is structured, searchable, updated periodically and cross-referenced.
- Stores biological data in electronic form.

Purpose:
- Systematization of data
- Availability of biological data
- Analysis of computed biological data

HISTORY
- Insulin was the first protein to be sequenced; it is composed of 51 amino acids.
- Protein sequences were compiled in the "Atlas of Protein Sequence and Structure", published in 1965 by Margaret Dayhoff. It became the basis for the PIR database.
- The first nucleic acid to be sequenced was a yeast tRNA, composed of 77 nucleotides.
- The first free-living organism whose genome was sequenced was the bacterium Haemophilus influenzae, in 1995, by Craig Venter's group.

Features of Biological Databases
1. Heterogeneity
2. High volume data
3. Uncertainty
4. Data curation
5. Data integration
6. Data sharing
7. Dynamics

1. Data Heterogeneity
Availability of diverse and complex data types.
Data types:
- Sequence: nucleotide, protein.
- Graph: data describing relationships among entities can be captured as a graph. This includes pathway data, genetic maps and structural taxonomy.
- High-dimensional data: data generated from microarray experiments that involve thousands of genes and hundreds of experimental conditions.
- Shapes: 3D molecular structural data. Example: docking.
- Temporal data: used for studying the dynamics of a biological system. Example: developmental biology.
- Patterns: patterns within the genome that characterize biological entities. Example: regulatory sequences (promoters).
- Scalar and vector fields.
- Extracted-feature data: numerical data obtained from one or a combination of the above data types.
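The data types listed above map naturally onto different in-memory structures. The following is a minimal sketch, assuming toy values throughout; the sequences, pathway nodes, gene names and coordinates are hypothetical and not taken from any database.

```python
from collections import deque

# Sequence data: plain strings over a nucleotide or amino-acid alphabet.
nucleotide = "ATGGCCATTGTAATGGGCCGC"  # hypothetical sequence

# Graph data: a pathway as an adjacency mapping (node -> downstream nodes).
pathway = {
    "glucose": ["glucose-6-phosphate"],
    "glucose-6-phosphate": ["fructose-6-phosphate"],
    "fructose-6-phosphate": [],
}

# High-dimensional data: expression values, one row per gene,
# one column per experimental condition (toy numbers).
expression = {
    "geneA": [0.1, 2.3, 1.7],
    "geneB": [1.9, 0.2, 0.4],
}

# Shape data: 3D atom coordinates as (name, x, y, z).
atoms = [("N", 0.0, 0.0, 0.0), ("CA", 1.5, 0.0, 0.0)]

# Temporal data: (time point, measurement) pairs.
time_series = [(0, 0.5), (6, 1.2), (12, 2.8)]


def gc_content(seq: str) -> float:
    """Extracted feature: fraction of G and C bases in a nucleotide string."""
    return sum(base in "GC" for base in seq) / len(seq)


def reachable(graph: dict, start: str) -> set:
    """Nodes that can be reached from `start` by following graph edges."""
    seen, queue = set(), deque([start])
    while queue:
        for nxt in graph.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen
```

Each structure needs its own kind of query (substring search, graph traversal, matrix arithmetic), which is one reason heterogeneous data are hard to hold in a single schema.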

2. High volume data
In addition to being highly heterogeneous, biological data are voluminous, since they must support comprehensive investigations in many fields and directions.

3. Uncertainty
Biological data carry a great deal of uncertainty, as they represent biological phenomena that are partly observed and partly assumed.

4. Data curation
- Biological data are collected from various sources across different structural and functional boundaries.
- There is always a chance of missing links.
- To fill these gaps, the data are analyzed and curated via automated methods.
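One simple form of automated curation is to scan records for gaps before a curator looks at them. The sketch below is only an illustration under stated assumptions: the record layout, the `REQUIRED_FIELDS` tuple and the `xrefs` field are hypothetical, not the format of any particular database.

```python
# Hypothetical set of fields every record is expected to carry.
REQUIRED_FIELDS = ("accession", "organism", "sequence")


def find_gaps(records: list) -> dict:
    """Map each problematic record to a list of the problems found in it."""
    known = {rec["accession"] for rec in records if rec.get("accession")}
    report = {}
    for index, rec in enumerate(records):
        # A field that is absent or empty counts as missing.
        problems = [f"missing {field}" for field in REQUIRED_FIELDS
                    if not rec.get(field)]
        # A cross-reference to an accession we do not hold is a missing link.
        problems += [f"dangling cross-reference {ref}"
                     for ref in rec.get("xrefs", []) if ref not in known]
        if problems:
            report[rec.get("accession") or f"record#{index}"] = problems
    return report


records = [
    {"accession": "X1", "organism": "Homo sapiens", "sequence": "ATG",
     "xrefs": ["X2", "X9"]},
    {"accession": "X2", "organism": "", "sequence": "GGC"},
]
gaps = find_gaps(records)
```

A report like this only locates the gaps; deciding how to fill them still requires analysis of the underlying data.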

5. Data integration
Data collected over years of research, across different structural and functional scales and from laboratories worldwide, are integrated in a database and made available for use.
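Integration can be pictured as merging records that describe the same entity but come from different laboratories. This is a minimal sketch, assuming every record carries a shared `id` key; the source names and field names are hypothetical.

```python
def integrate(*sources):
    """Merge records from several (source_name, records) pairs by their 'id'.

    Returns the merged records and a list of conflicts, where a conflict is
    (id, field, kept_value, rejected_value, rejected_source).
    """
    merged, conflicts = {}, []
    for source_name, records in sources:
        for rec in records:
            target = merged.setdefault(rec["id"], {})
            for field, value in rec.items():
                if field in target and target[field] != value:
                    # Keep the first value seen and log the disagreement.
                    conflicts.append(
                        (rec["id"], field, target[field], value, source_name))
                else:
                    target[field] = value
    return merged, conflicts


lab_a = ("lab_a", [{"id": "P1", "name": "kinase X", "length": 310}])
lab_b = ("lab_b", [{"id": "P1", "length": 312, "organism": "yeast"}])
merged, conflicts = integrate(lab_a, lab_b)
```

Keeping the first value is an arbitrary policy chosen for brevity; a real integration effort would need an explicit rule for resolving contradictory entries.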

6. Data sharing
Biological data are shared via databases.
Purpose:
- For the scientific community's inspection
- For cross-verification
- To prevent repetition of work and to allow validation of data

7. Dynamics
- New data are generated every day in laboratories.
- Sometimes the new data contradict the old data.
- So it is necessary to develop new organizational database schemes to incorporate the new data.

CLASSIFICATION

Classification of biological databases
o Data type
o Maintainer status
o Data access
o Data source
o Database design
o Organism

1. Data type
- Sequence database
  a. Nucleotide database: GenBank, EMBL-Bank
  b. Protein database: Swiss-Prot, PIR
- Structure database: PDB, NDB, DALI, MSD
- Microarray database: ArrayExpress (follows the MIAME reporting standard)
- Chemical database: PubChem
- Pathway database: KEGG, BioSilico
- Enzyme database: ExPASy ENZYME, REBASE
- Disease database: OMIM, OMIA
- Literature database: PubMed, Scopus

2. Maintainer status
- Public institution (e.g. NCBI, EMBL)
- Academic group or scientist
- Commercial company

3. Data access
- Publicly available
- Available with copyright
- Browsing only: accessible but not downloadable
- Academic, but not freely available
- Restricted

4. Data source
a) Primary database (archival)
 - Holds original data submitted by researchers.
 Examples:
 - Nucleotide: GenBank, EMBL, DDBJ
 - Protein: UniProt
 - Structure: PDB
 - Literature: Medline (PubMed)
b) Secondary database (curated)
 - Holds results of the analysis of primary databases.
 - Curated either manually or by automated methods.
 Examples: Prosite, Pfam, RefSeq
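The step from primary to secondary data can be illustrated by deriving motif annotations from raw sequences. The sketch below assumes a simplified subset of PROSITE-style pattern syntax (`x`, `[..]`, `{..}`, `(n)`, `<`, `>`) and uses the N-glycosylation motif `N-{P}-[ST]-{P}` as its example; the function names and the test sequence are hypothetical, and this is not the tool any database actually runs.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern into a Python regex."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # {P} means "any residue except P".
        element = element.replace("{", "[^").replace("}", "]")
        # (n) or (n,m) is a repeat count.
        element = element.replace("(", "{").replace(")", "}")
        # x is any residue; < and > anchor the sequence ends.
        element = element.replace("x", ".")
        element = element.replace("<", "^").replace(">", "$")
        parts.append(element)
    return "".join(parts)


def find_sites(sequence: str, pattern: str) -> list:
    """0-based start positions of all (possibly overlapping) matches."""
    regex = prosite_to_regex(pattern)
    # The lookahead lets overlapping sites be reported.
    return [m.start() for m in re.finditer(f"(?=({regex}))", sequence)]


sites = find_sites("MKNGSAANPSL", "N-{P}-[ST]-{P}")
```

The raw sequence is the primary record; the list of site positions is the kind of derived annotation a secondary database stores.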

5. Database design
- Flat files
- Relational database (SQL)
- Object-oriented database
- Exchange/publication technologies (FTP, HTML, SOAP, CORBA, XML)
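The difference between a flat file and a relational design can be seen by loading the same records into both. This is a minimal sketch using Python's built-in `sqlite3` module; the FASTA-like text, the accession numbers, the table layout and the convention of putting the organism in the header line are all assumptions made for the example.

```python
import sqlite3

# Hypothetical flat file: a header line starting with '>' followed by
# one or more lines of sequence.
FLAT_FILE = """>SEQ001 Homo sapiens
ATGGCC
ATTGTA
>SEQ002 Escherichia coli
ATGAAA
"""


def parse_flat_file(text: str) -> list:
    """Return (accession, organism, residues) tuples from the flat text."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            accession, _, organism = line[1:].partition(" ")
            records.append([accession, organism, ""])
        elif records:
            records[-1][2] += line  # sequence may span several lines
    return [tuple(rec) for rec in records]


def load(records: list) -> sqlite3.Connection:
    """Store the parsed records in an in-memory relational table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sequence ("
                 "accession TEXT PRIMARY KEY, organism TEXT, residues TEXT)")
    conn.executemany("INSERT INTO sequence VALUES (?, ?, ?)", records)
    return conn


def sequences_for(conn: sqlite3.Connection, organism: str) -> list:
    """Query by organism, which the flat file supports only by rescanning."""
    cursor = conn.execute(
        "SELECT accession, residues FROM sequence "
        "WHERE organism = ? ORDER BY accession", (organism,))
    return cursor.fetchall()


conn = load(parse_flat_file(FLAT_FILE))
human = sequences_for(conn, "Homo sapiens")
```

The flat file is easy to exchange and read, while the relational table makes field-based queries and cross-references straightforward.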

6. Organism
- Bacteria
- Virus
- Human

THANK YOU