8/13/2019 List of Public Databases
1/54
Canadian Bioinformatics Workshops
www.bioinformatics.ca
Module 4
Databases for Chemical, Spectral and
Biological Data
David Wishart
Two Solitudes
Bioinformatics
Cheminformatics
Cheminformatics vs. Bioinformatics
Cheminformatics:
Established in the 1960s
Designed for the needs of organic chemists
User-pay, limited public access
Funded by large companies (MDL, Beilstein, Sigma, CAS)
Bioinformatics:
Established in the 1990s
Designed for the needs of molecular biologists
Web-based, open access model
Funded by large govt agencies (NCBI, EBI, NIH, GC)
What's a Database For?
Information consolidation & linkage
Information retrieval (query matching)
Reference values, reference data, reference sequences, reference images
Data for training/testing algorithms
Similarity searching (image, spectra, structure, sequence, text)
Prediction (structure, function, property, phylogeny, activity, relationship)
Database Evolution
Hobby database (flat file): limited coverage, limited depth
Curated, non-redundant (relational, warehouse): greater coverage, greater depth
Archived, open deposit, redundant (relational, distributed): extensive coverage, modest depth
Increasing along this progression: cost and resources, size of user community
Database Evolution
Hobby database (flat file)
Curated, non-redundant (relational, warehouse)
Archived, open deposit, redundant (relational, distributed)
Increasing along this progression: need for standardization, dependence on automation, querying capabilities
The Problem with Metabolomics
?
Genomics: Gene IDs + Transcript Abundance
Proteomics: Protein IDs + Concentrations
Metabolomics: Metabolite IDs + Concentrations
Metabolomics Databases
Most data for metabolomics is still in textbooks or print journals (100+ years of clinical chemistry, 75 years of classic biochemistry)
Field lags behind genomics/proteomics by about 20 years
Challenge is to appeal to different user communities (metabolomics researchers, analytical chemists, plant chemists, clinical chemists, physicians, drug researchers, NMR specialists, MS specialists, bioinformaticians, standards setters, etc.)
Databases for Metabolomics
NMR spectral databases
Primarily small molecule spectra, not all metabolites
MS or MS/MS spectral databases
Primarily small molecule spectra, not all metabolites
Compound databases
Mostly compound names, structures, IDs, physical properties
Pathway databases
Mix of metabolite, drug, protein, signaling pathways
Comprehensive metabolomic databases
Combines most/all of the above, focus on metabolites
NMR Spectral DBs
SDBS, NMRShiftDB, MMCD, BMRB
SDBS
http://riodb01.ibase.aist.go.jp/sdbs/cgi-bin/direct_frame_top.cgi
SDBS
Maintained in Japan by AIST (since the 1970s)
Includes 24,700 MS spectra, 15,400 1H NMR spectra, 13,600 13C NMR spectra, 52,500 FT-IR spectra on 34,000 compounds
Extensive suite of spectral search tools
Most compounds are not metabolites, but still very useful for many researchers
BioMagResBank
http://www.bmrb.wisc.edu/metabolomics/
BioMagResBank
868 reference metabolites
5-6 NMR spectra (1H, 13C, 1D, 2D) per compound
Search by name, synonyms, InChI, formula, SMILES
Focus primarily on plant metabolites (Arabidopsis) although now includes other mammalian metabolites
No assignments available
NMRShiftDB
46,606 1H/13C NMR spectra
38,802 structures
http://www.ebi.ac.uk/nmrshiftdb/
NMRShiftDB
Database developed by Christoph Steinbeck (who also leads ChEBI)
Not restricted to metabolites, includes many organic compounds
Supports chemical shift prediction
Can search by name, structure or chemical shifts (peaks and JCAMP file)
Includes chemical shift assignments (but in organic solvents)
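Searching by chemical shifts can be pictured as peak matching within a tolerance. The sketch below is only an illustration of that idea, not NMRShiftDB's actual algorithm; the function names, the 0.05 ppm tolerance, and the two placeholder library entries (with made-up shift values) are all assumptions.

```python
def match_score(query_shifts, reference_shifts, tolerance=0.05):
    """Fraction of query peaks that pair with an unused reference peak
    lying within `tolerance` ppm (simple greedy matching)."""
    unused = list(reference_shifts)
    matched = 0
    for q in sorted(query_shifts):
        # Pick the closest reference peak that has not been paired yet.
        best = min(unused, key=lambda r: abs(r - q), default=None)
        if best is not None and abs(best - q) <= tolerance:
            matched += 1
            unused.remove(best)
    return matched / len(query_shifts) if query_shifts else 0.0


def search(query_shifts, library, tolerance=0.05):
    """Rank library entries by match score, best first."""
    hits = [(name, match_score(query_shifts, shifts, tolerance))
            for name, shifts in library.items()]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Placeholder library: names and shift values are invented for illustration.
LIBRARY = {
    "compound_A": [1.20, 3.65],
    "compound_B": [2.10, 7.30, 7.45],
}

if __name__ == "__main__":
    for name, score in search([1.21, 3.66], LIBRARY):
        print(name, score)
```

Greedy pairing keeps the example short; a production search would likely weight peaks by intensity and handle solvent-dependent shift differences.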
MMCD
http://mmcd.nmrfam.wisc.edu/
20,306 compounds
791 1H NMR
791 13C NMR
791 TOCSY
791 13C HSQC
300 1H NMR (Lit)
907 13C NMR (Lit)
525 HSQC (Lit)
2021 MS (Lit)
MMCD
Supports structure, name, NMR (shifts), MS (peaks) searches
Data includes chemical formula, names and synonyms, structure, physical and chemical properties, NMR and MS data, NMR chemical shifts, species associations and extensive links to images, references, and other public databases
MS Spectral DBs
NIST/AMDIS, Metlin, GolmDB, MassBank
MassBank
http://www.massbank.jp/
MassBank
Very nicely maintained and easily searchable collection of mostly metabolite MS spectra
Includes ESI-QTOF, ESI-QqQ, GC-EI-TOF, EI, ESI-FTICR, Ion-trap, etc.
Covers 30,857 MS spectra from approximately 14,500 compounds
Archives data from ~20 different sources (Japan, Germany, US, etc.)
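One common way to compare a query MS spectrum against a library spectrum is cosine similarity over binned peaks. The slides do not say how MassBank scores matches, so the following is a generic sketch under that assumption; the function names and the 1.0 m/z bin width are illustrative choices.

```python
import math


def bin_spectrum(peaks, bin_width=1.0):
    """Sum the intensities of (m/z, intensity) peaks into fixed-width m/z bins."""
    binned = {}
    for mz, intensity in peaks:
        key = int(mz // bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_similarity(peaks_a, peaks_b, bin_width=1.0):
    """Cosine of the angle between two binned spectra (0 = no overlap, 1 = same shape)."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Made-up peak lists, for illustration only.
    query = [(100.1, 50.0), (150.2, 100.0)]
    shifted = [(300.0, 80.0), (420.5, 20.0)]
    print(cosine_similarity(query, query))    # identical spectra
    print(cosine_similarity(query, shifted))  # no shared bins
```

Because the score depends only on relative intensities, two spectra of the same compound recorded at different overall signal levels still score close to 1.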
Compound DBs
ChEBI, PubChem, ChemSpider, Ligand Expo
ChEBI
Pronounced KEBEE
Chemical Entities of Biological Interest
Contains 25,518 3-star compounds
Most compounds are from KEGG, LipidMaps, DrugBank, Patents
Most data is on names, ontology, synonyms, MW, formula and structure
Searchable by name, formula, structure
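Formula and MW fields are closely related: a molecular weight can be derived from a formula string. The sketch below assumes simple formulas without parentheses, charges, or isotopes, and covers only six elements; the function names are hypothetical and this is not how ChEBI itself computes its values.

```python
import re

# Standard average atomic masses for a few common elements.
AVERAGE_MASS = {
    "C": 12.011, "H": 1.008, "N": 14.007,
    "O": 15.999, "P": 30.974, "S": 32.06,
}


def parse_formula(formula):
    """Turn a flat formula such as 'C6H12O6' into {'C': 6, 'H': 12, 'O': 6}."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def molecular_weight(formula):
    """Average molecular weight; raises ValueError for elements not in the table."""
    total = 0.0
    for element, count in parse_formula(formula).items():
        if element not in AVERAGE_MASS:
            raise ValueError(f"no mass stored for element {element!r}")
        total += AVERAGE_MASS[element] * count
    return total


if __name__ == "__main__":
    print(parse_formula("C6H12O6"))
    print(round(molecular_weight("C6H12O6"), 3))  # glucose
```

Parsing the formula into element counts first also makes a formula search straightforward, since two formulas written in different element order compare equal as dictionaries.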
PubChem
NIH database of 31 million compounds and 75 million substances, 1644 HT screens of compounds
Compound must have
ChemSpider
Contains 25 million compounds from 400 data sources
Searchable by name, synonym, InChI, structure, registry #, SMILES, calculated properties (but not by formula or mass)
Data includes names, synonyms, Wikipedia articles, descriptions, data sources, suppliers, patents, articles, properties, MeSH headings, pharmacology links, spectra (UV, IR, NMR, MS) sourced from other sites
Ligand Expo
Contains the small molecules in the PDB
Useful because it links chemicals/metabolites/drugs to their targets
Also provides 3D structure coordinates
Searchable via 3-letter chemical identifier code, molecular name, molecular formula, SMILES description, InChI, 3D structure
Other Compound DBs
3DMet, KNApSAcK, ZINC, LipidMaps
Other DBs
3DMet
3D structure database of natural metabolites
KNApSAcK
Database of 50,000 plant metabolites linked to species information
ZINC
Database of 2.7 million commercially available chemicals (mostly drug-like compounds)
LipidMaps
Database with 30,000 lipids (fatty acyls, glycerolipids, glycerophospholipids, sphingolipids, sterols, prenols, saccharolipids, polyketides)
Pathway DBs
KEGG, SMPDB, BioCyc/MetaCyc, Reactome
Pathway DBs
Rich source of biological data that relates metabolites to genes, proteins, diseases, signaling events and processes
Provide various tools to permit visualization and gene/metabolite mapping
Often cover multiple species
KEGG: Kyoto Encyclopedia of Genes and Genomes
http://www.genome.jp/kegg/
The Small Molecule
Pathway Database (SMPDB)
http://www.smpdb.ca
SMPDB
350 hand-drawn pathways relevant to human/mammalian metabolism
155 drug pathways
72 disease pathways
12 signalling pathways
70 standard metabolic pathways
Searching and browsing capabilities
Metabolite mapping capabilities
Captures structure, organelle, cellular and organ information
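Metabolite mapping amounts to looking up which pathways contain the metabolites in a query list. The sketch below shows the idea with a tiny hand-written membership table; the table, the function name, and the ranking by hit count are assumptions for illustration and do not reflect SMPDB's data model or interface.

```python
def map_metabolites(query, pathway_members):
    """Return (pathway, matched metabolites) pairs, most matches first.

    Matching is case-insensitive on the metabolite name.
    """
    query_set = {name.lower() for name in query}
    hits = {}
    for pathway, members in pathway_members.items():
        found = sorted(m for m in members if m.lower() in query_set)
        if found:
            hits[pathway] = found
    return sorted(hits.items(), key=lambda item: len(item[1]), reverse=True)


# Toy membership table (a few well-known intermediates only, not SMPDB content).
PATHWAYS = {
    "Glycolysis": ["glucose", "glucose 6-phosphate", "pyruvate"],
    "TCA cycle": ["citrate", "succinate", "fumarate", "malate"],
}

if __name__ == "__main__":
    for pathway, found in map_metabolites(["Glucose", "Pyruvate", "Citrate"], PATHWAYS):
        print(pathway, found)
```

Real mapping tools typically match on database identifiers rather than names, since synonyms make name matching unreliable.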
Exploring Pathways with SMPDB
Mapping Metabolites with SMPDB
Mapping Metabolite Concentrations with SMPDB
HMDB Features/Content
7969 metabolites
120 bacterial (gut microbe) metabolites
Normal/abnormal concentrations
700+ disease links
1700 NMR spectra
2600 MS spectra
310 GC-MS spectra
Sequence search tools
Spectral search tools
Extensive browsing tools
Pathway search tools
Structure searches
Biofluid browsing
Text search tools
Full data downloads
The Human Metabolome Project
$7.5 million Genome Canada Project launched in Jan. 2005, based at the University of Alberta
Mandate to quantify and identify all metabolites in biofluids such as urine, CSF and blood as well as tissues using HT experiments and text analysis (~8000 compounds to date)
Associate metabolite concentrations to ~500 diseases or conditions
Make all data freely and electronically accessible (HMDB, DrugBank, FooDB, T3DB)
Develop novel technologies and software to improve metabolome coverage and metabolomic throughput
The Human Metabolomes
Concentration axis: M, mM, µM, nM, pM, fM
Endogenous metabolites: 8000 (HMDB)
Drugs: 1450 (DrugBank)
Food additives/Phytochemicals: 30,000 (FooDB)
Drug metabolites: 500 (DrugMet)
Toxins/Env. Chemicals: 3100 (T3DB)
Meet the Metabolomes
http://www.foodb.ca
http://www.drugbank.ca
http://www.hmdb.ca
http://www.T3DB.org
Inside the HMDB
Inside the HMDB
HMDB Databrowser
102 data fields
HMDB Spectral Searching
HMDB: Pathway Tools
- Enter metabolites
- Link to metabolic pathways
- Explore pathway images
The HMDB Biofluid Database
Reference metabolite concentrations for >450 different diseases & conditions
Abnormal and normal metabolite concentrations for >15 biofluids and >4500 different metabolites
Designed for clinical chemists & physicians
Largest & most complete resource of its kind
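Reference concentrations are typically used by comparing a measured value against a normal range. The sketch below illustrates that comparison only; the metabolite names, ranges, and units are invented placeholders rather than HMDB values, and the function names are hypothetical.

```python
def classify(value, low, high):
    """Label a measured concentration relative to a reference range [low, high]."""
    if value < low:
        return "low"
    if value > high:
        return "high"
    return "normal"


def flag_sample(measurements, reference_ranges):
    """Classify each measured metabolite that has a stored reference range."""
    report = {}
    for name, value in measurements.items():
        if name in reference_ranges:
            low, high = reference_ranges[name]
            report[name] = classify(value, low, high)
    return report


# Placeholder ranges in arbitrary units; NOT real reference data.
REFERENCE_RANGES = {
    "metabolite_X": (10.0, 50.0),
    "metabolite_Y": (0.5, 2.0),
}

if __name__ == "__main__":
    sample = {"metabolite_X": 72.0, "metabolite_Y": 1.1, "metabolite_Z": 3.0}
    print(flag_sample(sample, REFERENCE_RANGES))
```

Metabolites without a stored range (here `metabolite_Z`) are skipped rather than guessed at, which is the safer default for clinical-style reporting.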
Inside DrugBank
http://www.drugbank.ca
Query Tools
PharmaBrowse
ChemQuery
Query Tools
SeqSearch
DataExtractor
Database Comparison
Databases compared: HMDB, KEGG, PubChem, MMCD, ChEBI, SDBS, Reactome, Metlin, Cyc DBs
Features compared: MS spectra, NMR spectra, pathways, structures, descriptions, chem. props, physiol. data, nomenclat., links + refs