Date post: | 16-Jan-2016 |
Category: | Documents |
Upload: | opal-harrington |
PubChem—Substance, Compound, BioAssay
Part 3:
Essentials
Global Entrez Search Page
All[Filter]
Overall Goal:
An on-line resource providing comprehensive information on the
biological activities of small molecules
Why Are Small Molecules Important?
Constituents of all macromolecules (DNA, RNA, proteins, carbohydrates, etc.)
Serve as cofactors and signaling molecules for thousands of proteins
The chemistry part of “biochemistry”
Most drug entities and drug types are small molecules
Most biomarkers used in clinical chemistry are small molecules
PubChem Databases and Tools: http://pubchem.ncbi.nlm.nih.gov/
The Molecular Libraries Roadmap: An Integrated Initiative
Chemical Diversity
Technology Development
Screening
Instrumentation
Assay Development
Predictive ADMET
Compound Repository (MLSMR)
Informatics
Chem-informatics Research Centers
Molecular Libraries Screening Centers Network (MLSCN)
PubChem = Repository for small molecules and bioactivity assay data
Part of the Entrez search and linking system
Links to other NCBI databases, e.g.,
• PubMed, MeSH
• Protein structures (MMDB)
• Protein/Nucleotide sequences (GenPept/GenBank)
Contains complete chemical structures
Standardized for uniformity
Small set of computed properties
Structure similarity searching
and more…
Other Depositors to PubChem
PubChem: Bird’s Eye View
Depositors
PubChem BioAssays
PubChem Compound
PubChem Substance
Chemical Structure Similarity
How does data get into PubChem?
PubChem integration in Entrez
Protein Sequences
Literature
VAST Structure Similarity
Bioactivity Assay Results
Small Molecule Structures
3D Structures
Term Frequency Statistics
Chemical Structure Similarity
Activity Profile Similarity
Primary Database
Depositor Data
• No “Global” rules or standards
– Based on organizational needs
– Lots of data overlap
– Often based on individual scientist preferences
• PubChem accepts data from many organizations
– Previously unseen data representation
– Combinatorial explosion of ways for drawing the same structure
Redundancy, mixtures
Mixture
Derivative Database
Chemical structures may be represented in many different ways
Compound
Substance
Known stereochemistry
Unknown stereo
Unknown E/Z isomers
Compound
Substance
Substances (heterogens) from Protein 3D structures (PDB)
PDB lacks chemical detail
– no bond order information
– no hydrogens
Deposited structure receives
– bond information
– hydrogens
– stereochemistry (where possible)
Most molecules come out right, even complex ones
Vancomycin
Sometimes there is a need to fix problems, e.g. bond orders
Need to fix heme bond orders
Result
Dopamine
PubChem Compound Processing
• Chemical Data Verification
– Atom description (label, element?)
– Functional group clean-up
– Atom valence verification to prevent nonsense
• “Normalize” and “Standardize”
– Valence-bond canonicalization (for tautomer invariance)
– Aromaticity detection and self-consistency
– Stereochemistry detection
– Explicit hydrogen assignment
• Calculation
– 2-D coordinate generation
– Image depictions
– Fingerprints
– IUPAC name
– SMILES, InChI, hash codes
– XLogP, TPSA, HBD, HBA, MW, MF
Chemical Structure “Sanitization”
Chemical Structures that fail Sanitization
– Are not part of the aggregated PubChem Compound Database
– Still “searchable” via PubChem Substance Database
Keeps the PubChem Compound Database “Clean” for Chemical Informatic Analysis
Collapses structures represented in various ways into a uniform, identical representation
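The collapsing step can be pictured with a toy sketch. It is not PubChem's actual pipeline: the lookup table below stands in for real structure standardization, and the SIDs, depositor names, and structure strings are all made up for illustration. Records whose drawn structure maps to the same standardized form are grouped into one compound; a record that fails sanitization stays substance-only.

```python
from collections import defaultdict

# Stand-in for real structure standardization (an assumption for this sketch):
# a hand-written table mapping each drawn form to one agreed form.
# None marks a structure that fails sanitization.
STANDARD_FORM = {
    "NCCc1ccc(O)c(O)c1": "C1=CC(=C(C=C1CCN)O)O",     # dopamine, drawn one way
    "C1=CC(=C(C=C1CCN)O)O": "C1=CC(=C(C=C1CCN)O)O",  # dopamine, drawn another way
    "OCC": "CCO",                                    # ethanol, written backwards
    "CCO": "CCO",
    "C(C)(C)(C)(C)(C)C": None,                       # carbon with six bonds
}

# Hypothetical deposited records: (SID, depositor, structure as drawn).
SUBSTANCES = [
    (101, "DepositorA", "NCCc1ccc(O)c(O)c1"),
    (102, "DepositorB", "C1=CC(=C(C=C1CCN)O)O"),
    (103, "DepositorA", "OCC"),
    (104, "DepositorC", "CCO"),
    (105, "DepositorC", "C(C)(C)(C)(C)(C)C"),
]


def aggregate(substances):
    """Group SIDs by standardized structure; collect SIDs that fail."""
    groups = defaultdict(list)
    rejected = []
    for sid, _depositor, drawn in substances:
        standard = STANDARD_FORM.get(drawn)
        if standard is None:
            rejected.append(sid)   # remains searchable as a substance only
        else:
            groups[standard].append(sid)
    return dict(groups), rejected


compounds, substance_only = aggregate(SUBSTANCES)
for structure, sids in compounds.items():
    print(structure, "<-", sids)
print("substance-only SIDs:", substance_only)
```

Five deposited records end up as two compound groups plus one substance-only record, which is the redundancy reduction the slide describes.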
Compound for mixture
Component compounds
Components of a mixture
Substance vs. Compound
Substance summary Compound summary
Examples of queries
"InChI=1/Ca.3H2O/h;3*1H2/q+2;;;/p-3/fCa.3HO/h;3*1h/qm;3*-1"[InChI]
200[MW]
300:500[MW]
"dopamine"[CompleteSynonym]
"pcsubstance structure"[Filter]
"ca"[Element] AND 300:500[MW] AND "chemidplus"[SourceName]
"lipinski"[Filter] AND "antineoplastic agents"[PharmAction]
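Fielded queries like these can also be assembled programmatically. The sketch below is a minimal illustration, not an official client: the helper names (`field`, `num_range`, `all_of`, `esearch_url`) are made up here, and it only builds a URL for the NCBI E-utilities `esearch` endpoint without sending a request. Whether a given field or filter name is still indexed today is an assumption to check against the current PubChem help pages.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint; this sketch only builds the URL.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def field(value, name):
    """Quote a text value and tag it with an index field: "ca"[Element]."""
    return f'"{value}"[{name}]'


def num_range(low, high, name):
    """Numeric range on an index field: 300:500[MW]."""
    return f"{low}:{high}[{name}]"


def all_of(*terms):
    """Combine terms so that every one must match."""
    return " AND ".join(terms)


def esearch_url(db, term):
    """URL-encode the query for the given Entrez database name."""
    return ESEARCH + "?" + urlencode({"db": db, "term": term})


term = all_of(
    field("ca", "Element"),
    num_range(300, 500, "MW"),
    field("chemidplus", "SourceName"),
)
url = esearch_url("pcsubstance", term)
print(term)
print(url)
```

The printed term matches the combined example from the slide; the same helpers can target `pccompound` or `pcassay` by changing the database name.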
Lipinski rule of 5 -- a molecule is likely to be orally bioavailable if it has:
• not more than 5 hydrogen bond donors (OH and NH groups)
• not more than 10 hydrogen bond acceptors (N or O atoms)
• a molecular weight under 500
• a LogP under 5
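The rule is simple enough to express as a check over four computed properties. This is a minimal sketch, not how the "lipinski" filter is implemented inside PubChem: the `Properties` class and function name are made up here, the property values are illustrative only, and the acceptor limit uses the usual "no more than 10" reading of the rule.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Properties:
    """Computed properties for one compound (values supplied by the caller)."""
    hbd: int     # hydrogen bond donor count
    hba: int     # hydrogen bond acceptor count
    mw: float    # molecular weight
    logp: float  # partition coefficient estimate, e.g. XLogP


def lipinski_violations(p: Properties) -> List[str]:
    """Return a description of each rule the compound breaks."""
    violations = []
    if p.hbd > 5:
        violations.append("more than 5 hydrogen bond donors")
    if p.hba > 10:
        violations.append("more than 10 hydrogen bond acceptors")
    if p.mw >= 500:
        violations.append("molecular weight not under 500")
    if p.logp >= 5:
        violations.append("LogP not under 5")
    return violations


# Illustrative values only, not taken from any PubChem record.
small = Properties(hbd=3, hba=3, mw=153.2, logp=-1.0)
large = Properties(hbd=8, hba=14, mw=812.0, logp=6.3)

print(lipinski_violations(small))
print(lipinski_violations(large))
```

Returning the list of broken rules, rather than a single yes/no, makes it easy to see why a compound was excluded.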
Examples of PubChem Index Fields …
All [ALL] -- All of the following fields are searched; default search field.
Uid [UID] -- The integer represents the SID for the PCSubstance database. By default, an integer without a field alias is recognized as a UID. Same as [SID].
Filter [Filter] -- Limits the records to various indexed filters.
ActiveAid [AA] -- Active BioAssay identifier, integer.
ActiveAidCount [AC, ACNT] -- Number of bioassays where tested active.
AtomChiralCount [ACC, ACCNT] -- Total count of chiral atoms in a given compound.
BioAssayID [BAID, AID] -- BioAssay identifier.
BondChiralCount [BCC, BCCNT] -- Number of chiral bonds.
Comment [CMT] -- Substance or bioassay comment.
CompleteSynonym [CSYN, CSYNO] -- Exactly matching name for substance/compound.
CompoundID [CID] -- Compound identifier, integer.
DepositDate [DDAT, DEPDAT] -- Deposition timestamp for a substance.
Element [ELMT, EL] -- Chemical element in a substance/compound.
ExactMass [EMAS, EXMASS] -- The calculated mass of an ion or a molecule with the most likely isotopic composition for a single random molecule, corresponding to the mass of the most intense ion/molecule peak in a mass spectrum. A real number.
HeavyAtomCount [HAC, HACNT] -- Atom count in a compound excluding hydrogen, integer.
HydrogenBondAcceptorCount [HBAC, HBACNT] -- Hydrogen bond acceptors for a compound, integer.
HydrogenBondDonorCount [HBDC, HBDCNT] -- Hydrogen bond donors for a compound, integer.
InChI [inchi] -- IUPAC International Chemical Identifier.
Examples of PubChem Index Fields, contd.
IUPACName [UPAC, IUPAC] -- Standard IUPAC name for compound.
MeSHDescription [MHD]
MeSHTerm [MSHT, MESHT] -- Medical Subject Heading term.
MeSHTreeNode [MSHN, MESHTN] -- Medical Subject Heading tree node (tree structures).
MolecularWeight [MW, MWT, MOLWT] -- Mass of a molecule calculated using the average mass of each element weighted for its natural isotopic abundance. E.g., carbon has two natural isotopes, 12 and 13, with relative abundances of 98.9% and 1.1%, yielding an average mass of 12.011 g/mol. A real number.
MonoisotopicMass [MMAS, MIMASS] -- Mass of a molecule calculated using the mass of the most abundant isotope of each element. E.g., carbon has a monoisotopic mass of 12.000 g/mol. A real number.
PharmAction [PHMA, PHARMA] -- MeSH pharmacological actions heading.
RotatableBondCount [RBC, RBCNT] -- Number of rotatable bonds.
SourceCategory [SRCC, SRCCAT, SRCCATG] -- Depositor categories.
SourceID [SRID, SRCID] -- Depositor's external ID.
SourceName [SRC, SRCNAM, SRCNAME] -- Official depositor name.
SubstanceID [SID] -- Substance ID. Same as [UID].
Synonym [SYNO] -- Synonyms for substance.
TautomerCount [TC, TCNT, TTMC] -- Possible tautomer count for each given structure, ≤ 200.
TotalFormalCharge [TFC, CHG, CHRG] -- Total formal charge.
TPSA [TPSA] -- Topological Polar Surface Area.
XLogP [XLGP, LOGP]
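The difference between MolecularWeight and MonoisotopicMass is easy to check by hand. The sketch below is a small illustration, assuming rounded textbook values for the element masses and a deliberately simple formula parser (no parentheses, charges, or isotope labels); the function names are made up here, and PubChem's own calculation may use more precise tables. It uses dopamine, C8H11NO2, as the example.

```python
import re

# Rounded reference masses (assumed values for this sketch).
AVERAGE_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}
MONOISOTOPIC_MASS = {"C": 12.000, "H": 1.007825, "N": 14.003074, "O": 15.994915}


def parse_formula(formula):
    """Count atoms in a flat formula such as 'C8H11NO2'."""
    counts = {}
    for symbol, digits in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(digits) if digits else 1)
    return counts


def mass(formula, table):
    """Sum the per-element masses from the chosen table."""
    return sum(table[symbol] * n for symbol, n in parse_formula(formula).items())


# Abundance-weighted average for carbon, using the percentages in the text
# and an assumed mass of 13.00335 for carbon-13.
carbon_average = 0.989 * 12.000 + 0.011 * 13.00335

dopamine = "C8H11NO2"
print(round(carbon_average, 3))
print(round(mass(dopamine, AVERAGE_MASS), 3))
print(round(mass(dopamine, MONOISOTOPIC_MASS), 4))
```

The two dopamine values differ by roughly 0.1, which matters when matching a compound against a mass spectrum peak rather than weighing out a sample.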
Preview/Index Tab
History Tab
Substances of MW 300-500 Da having antineoplastic properties and obeying Lipinski rule of 5
Links
For the whole set or only selected records
Property Report
SDF format
Medical Subject Headings (MeSH)
MeSH is the National Library of Medicine's controlled vocabulary thesaurus.
Consists of sets of terms naming descriptors in a hierarchical and alphabetic structure, e.g.: “Mental Disorders”, “Pharmacological action”, “Catecholamine hormones”, etc.
Permits searching at various levels of specificity
MeSH thesaurus is used for indexing articles for the MEDLINE/PubMed database
MeSH is continually updated
PubChem assigns MeSH headings to Compound records
Contains bioactivity screens of chemical substances described in PubChem Substance
Provides searchable descriptions of each bioassay, including descriptions of the conditions and readouts specific to a screening protocol
Depositor decides on data definitions and interpretation
Data can be plotted as graphs or statistical histograms
Cross-indexed to other Entrez databases
Primary Database
NCBI FTP >> PubChem Folder
Entrez PubChem: Help and Tabs
Brief Summary
PubChem is part of the NIH Molecular Libraries Roadmap for Medicine Initiative
PubChem consists of 3 databases, Substance, Compound and BioAssay, and a powerful Structure Search engine
Substance = samples; Compound = calculated structures, properties
PubChem is integrated into NCBI’s Entrez Search and Linking system of databases
Records are indexed using a number of terms
Records are linked to each other and to other databases at NCBI
For More Information…
E-mail addresses
• General Help: [email protected]
• [email protected]
Telephone
• Voice: +1 (301) 496-2475
• Fax: +1 (301) 480-9241
The (free!) NCBI Newsletter
http://www.ncbi.nih.gov/About/newsletter.html
The NCBI Handbook
Follow the link from the NCBI Home Page
The NCBI Education Page
http://www.ncbi.nih.gov/Education/index.html