PubChem—Substance, Compound, BioAssay Part 3: Essentials.

Date post: 16-Jan-2016
Upload: opal-harrington
Transcript
Page 1: PubChem—Substance, Compound, BioAssay Part 3: Essentials.

PubChem—Substance, Compound, BioAssay

Part 3:

Essentials

Page 2: PubChem—Substance, Compound, BioAssay Part 3: Essentials.

Global Entrez Search Page

All[Filter]

Page 3: PubChem—Substance, Compound, BioAssay Part 3: Essentials.

Overall Goal:

An on-line resource providing comprehensive information on the biological activities of small molecules

Page 4: PubChem—Substance, Compound, BioAssay Part 3: Essentials.

Why Are Small Molecules Important?

• Constituents of all macromolecules (DNA, RNA, protein, carbohydrates, etc.)
• Serve as cofactors and signaling molecules to thousands of proteins
• The chemistry part of “biochemistry”
• Most drug entities and drug types are small molecules
• Most biomarkers used in clinical chemistry are small molecules

Page 5: PubChem—Substance, Compound, BioAssay Part 3: Essentials.

PubChem Databases and Tools: http://pubchem.ncbi.nlm.nih.gov/

Page 6: PubChem—Substance, Compound, BioAssay Part 3: Essentials.

The Molecular Libraries Roadmap: An Integrated Initiative

[Diagram labels]
• Molecular Libraries Screening Centers Network (MLSCN)
• Compound Repository (MLSMR)
• Cheminformatics Research Centers
• Informatics
• Technology Development
• Chemical Diversity
• Screening Instrumentation
• Assay Development
• Predictive ADMET

Page 7: PubChem—Substance, Compound, BioAssay Part 3: Essentials.

PubChem = Repository for small molecules and bioactivity assay data

• Part of the Entrez search and linking system
• Links to other NCBI databases, e.g.,
– PubMed, MeSH
– Protein structures (MMDB)
– Protein/Nucleotide sequences (GenPept/GenBank)
• Contains complete chemical structures
• Standardized for uniformity
• Small set of computed properties
• Structure similarity searching

Page 8: PubChem—Substance, Compound, BioAssay Part 3: Essentials.

Other Depositors to PubChem

and more…

Page 9: PubChem—Substance, Compound, BioAssay Part 3: Essentials.

PubChem: Bird’s Eye View

Depositors

PubChem BioAssays

PubChem Compound

PubChem Substance

Chemical Structure Similarity

Page 10: PubChem—Substance, Compound, BioAssay Part 3: Essentials.

How does data get into PubChem?

Page 11: PubChem—Substance, Compound, BioAssay Part 3: Essentials.

PubChem integration in Entrez

[Diagram labels]
• Protein Sequences
• Literature
• VAST
• Structure Similarity
• Bioactivity Assay Results
• Small Molecule Structures
• 3D Structures
• Term Frequency Statistics
• Chemical Structure Similarity
• Activity Profile Similarity

Page 12: PubChem—Substance, Compound, BioAssay Part 3: Essentials.

Page 13: PubChem—Substance, Compound, BioAssay Part 3: Essentials.

Primary Database

Page 14: PubChem—Substance, Compound, BioAssay Part 3: Essentials.

Depositor Data

• No “Global” rules or standards
– Based on organizational needs
– Lots of data overlap
– Often based on individual scientist preferences

• PubChem accepts data from many organizations
– Previously unseen data representation
– Combinatorial explosion of ways for drawing the same structure

Page 15: PubChem—Substance, Compound, BioAssay Part 3: Essentials.

Redundancy, mixtures

Mixture

Page 16: PubChem—Substance, Compound, BioAssay Part 3: Essentials.

Derivative Database

Page 17: PubChem—Substance, Compound, BioAssay Part 3: Essentials.

Chemical Structures may be represented in many different ways

Page 18: PubChem—Substance, Compound, BioAssay Part 3: Essentials.

Chemical Structures may be represented in many different ways

Page 19: PubChem—Substance, Compound, BioAssay Part 3: Essentials.

Compound

Substance

Page 20: PubChem—Substance, Compound, BioAssay Part 3: Essentials.

Known stereochemistry

Unknown stereo

Unknown E/Z isomers

Compound

Substance

Page 21: PubChem—Substance, Compound, BioAssay Part 3: Essentials.

Substances (heterogens) from Protein 3D structures (PDB)

PDB lacks chemical detail
– no bond order information
– no hydrogens

Deposited structure receives
– bond information
– hydrogens
– stereochemistry (where possible)

Most molecules come out right, even complex ones

[Figure labels: Dopamine; Vancomycin]

Sometimes there is a need to fix problems, e.g. bond orders

[Figure labels: Need to fix heme bond orders; Result]

Page 22: PubChem—Substance, Compound, BioAssay Part 3: Essentials.

PubChem Compound Processing

• Chemical Data Verification
– Atom description (label, element?)
– Functional group clean-up
– Atom valence verification to prevent nonsense

• “Normalize” and “Standardize”
– Valence-bond canonicalization (for tautomer invariance)
– Aromaticity detection and self-consistency
– Stereochemistry detection
– Explicit hydrogen assignment

• Calculation
– 2-D coordinate generation
– Image depictions
– Fingerprints
– IUPAC Name
– SMILES, InChI, Hash Codes
– xLogP, TPSA, HBD, HBA, MW, MF
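The "atom valence verification" step above can be illustrated with a small sketch. This is not PubChem's actual procedure: the valence table, the function name `valence_errors`, and the atom/bond representation are all assumptions made for illustration, and formal charges are ignored.

```python
# Maximum common valences for a few elements. This tiny table is an
# illustrative assumption, not PubChem's rule set, and it ignores charges.
MAX_VALENCE = {"H": 1, "C": 4, "N": 3, "O": 2}


def valence_errors(atoms, bonds):
    """Return (index, element, bond-order sum) for every suspicious atom.

    atoms: list of element symbols, e.g. ["C", "O", "H", "H"]
    bonds: list of (i, j, order) tuples referring to atom indices
    """
    used = [0] * len(atoms)
    for i, j, order in bonds:
        used[i] += order
        used[j] += order

    errors = []
    for idx, (element, total) in enumerate(zip(atoms, used)):
        limit = MAX_VALENCE.get(element)
        # Flag unknown atom labels as well as atoms with too many bonds.
        if limit is None or total > limit:
            errors.append((idx, element, total))
    return errors


# Formaldehyde, H2C=O: every atom is within its limit.
formaldehyde = (["C", "O", "H", "H"], [(0, 1, 2), (0, 2, 1), (0, 3, 1)])
# A nonsense structure: one carbon carrying five single bonds.
bad = (["C", "H", "H", "H", "H", "H"], [(0, k, 1) for k in range(1, 6)])

print(valence_errors(*formaldehyde))  # []
print(valence_errors(*bad))           # [(0, 'C', 5)]
```

A real verifier would also need charge-aware valence rules (for example, ammonium nitrogen), which this sketch deliberately leaves out.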

Page 23: PubChem—Substance, Compound, BioAssay Part 3: Essentials.

Chemical Structure “Sanitization”

Chemical structures that fail sanitization:
• Are not part of the aggregated PubChem Compound database
• Are still “searchable” via the PubChem Substance database

Sanitization:
• Keeps the PubChem Compound database “clean” for cheminformatics analysis
• Collapses structures represented in various ways into a uniform, identical representation
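The idea of collapsing differently drawn depositor structures into one record can be sketched as a grouping step. Everything here is hypothetical: the SIDs are made up, and `normalize` is a lookup table standing in for real structure standardization, which is far more involved.

```python
from collections import defaultdict

# Hypothetical depositor records: (SID, structure string as deposited).
substances = [
    (101, "OCC"),      # ethanol written one way
    (102, "CCO"),      # ethanol written another way
    (103, "C(O)C"),    # and a third way
    (104, "CC(=O)O"),  # acetic acid
    (105, "C1CC"),     # ring never closed: cannot be sanitized
]

# Toy stand-in for standardization: maps each known drawing to one
# canonical form. A real pipeline computes this; it does not look it up.
CANONICAL = {
    "OCC": "CCO",
    "CCO": "CCO",
    "C(O)C": "CCO",
    "CC(=O)O": "CC(=O)O",
}


def normalize(structure):
    """Return the canonical form, or None if sanitization fails."""
    return CANONICAL.get(structure)


def aggregate(records):
    """Group substance IDs by canonical structure; collect failures."""
    compounds = defaultdict(list)
    rejected = []
    for sid, structure in records:
        key = normalize(structure)
        if key is None:
            rejected.append(sid)  # stays a substance only
        else:
            compounds[key].append(sid)
    return dict(compounds), rejected


compounds, rejected = aggregate(substances)
print(compounds)  # {'CCO': [101, 102, 103], 'CC(=O)O': [104]}
print(rejected)   # [105]
```

The rejected record is not lost in this model: it simply has no compound entry, mirroring the slide's point that failed structures remain searchable as substances.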

Page 24: PubChem—Substance, Compound, BioAssay Part 3: Essentials.

Compound for mixture

Component compounds

Page 25: PubChem—Substance, Compound, BioAssay Part 3: Essentials.

Components of a mixture

Page 26: PubChem—Substance, Compound, BioAssay Part 3: Essentials.

Substance vs. Compound

Substance summary

Compound summary

Page 27: PubChem—Substance, Compound, BioAssay Part 3: Essentials.

Substance vs. Compound

Page 28: PubChem—Substance, Compound, BioAssay Part 3: Essentials.

Examples of queries

"InChI=1/Ca.3H2O/h;3*1H2/q+2;;;/p-3/fCa.3HO/h;3*1h/qm;3*-1"[InChI]

200[MW]

300:500[MW]

"dopamine"[CompleteSynonym]

"pcsubstance structure"[Filter]

"ca"[Element] AND 300:500[MW] AND "chemidplus"[SourceName]

"lipinski"[Filter] AND "antineoplastic agents"[PharmAction]

Lipinski rule of 5 -- a molecule is likely to be orally active (drug-like) if it has:
• not more than 5 hydrogen bond donors (OH and NH groups)
• not more than 10 hydrogen bond acceptors (N or O atoms)
• a molecular weight under 500
• a LogP not greater than 5
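As a rough illustration of the rule of 5, the check below uses the commonly cited thresholds: no more than 5 hydrogen bond donors, no more than 10 acceptors, molecular weight under 500, and logP not above 5. The function name and the example property values are assumptions for illustration; the dopamine logP in particular is an approximate figure, not a value taken from this deck.

```python
def lipinski_violations(hbd, hba, mw, logp):
    """List which rule-of-5 criteria a molecule breaks (empty list = none)."""
    violations = []
    if hbd > 5:
        violations.append("more than 5 hydrogen bond donors")
    if hba > 10:
        violations.append("more than 10 hydrogen bond acceptors")
    if mw >= 500:
        violations.append("molecular weight not under 500")
    if logp > 5:
        violations.append("logP above 5")
    return violations


# Dopamine (C8H11NO2): 3 donors, 3 acceptors, MW about 153.18.
# The logP of -1.0 is an approximate, illustrative value.
print(lipinski_violations(hbd=3, hba=3, mw=153.18, logp=-1.0))  # []

# A made-up large, greasy molecule that breaks all four criteria.
print(len(lipinski_violations(hbd=6, hba=12, mw=650.0, logp=5.6)))  # 4
```

Returning the list of broken criteria rather than a single yes/no makes it easy to apply a looser policy, such as tolerating one violation.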

Page 29: PubChem—Substance, Compound, BioAssay Part 3: Essentials.

Examples of PubChem Index Fields …

All [ALL] -- All of the following fields are searched; default search field.
Uid [UID] -- The integer represents the SID for the PCSubstance database. By default, an integer without a field alias is recognized as a UID. Same as [SID].
Filter [Filter] -- Limits the records to various indexed filters.
ActiveAid [AA] -- Active BioAssay identifier, integer.
ActiveAidCount [AC, ACNT] -- Number of bioassays where tested active.
AtomChiralCount [ACC, ACCNT] -- Total count of chiral atoms in a given compound.
BioAssayID [BAID, AID] -- BioAssay identifier.
BondChiralCount [BCC, BCCNT] -- Number of chiral bonds.
Comment [CMT] -- Substance or bioassay comment.
CompleteSynonym [CSYN, CSYNO] -- Exactly matching name for a substance/compound.
CompoundID [CID] -- Compound identifier, integer.
DepositDate [DDAT, DEPDAT] -- Deposition timestamp for a substance.
Element [ELMT, EL] -- Chemical element in a substance/compound.
ExactMass [EMAS, EXMASS] -- The calculated mass of an ion or a molecule containing the most likely isotopic composition for a single random molecule, corresponding to the mass of the most intense ion/molecule peak in a mass spectrum. A real number.
HeavyAtomCount [HAC, HACNT] -- Atom count in a compound except hydrogen, integer.
HydrogenBondAcceptorCount [HBAC, HBACNT] -- Hydrogen bond acceptors for a compound, integer.
HydrogenBondDonorCount [HBDC, HBDCNT] -- Hydrogen bond donors for a compound, integer.
InChI [inchi] -- IUPAC International Chemical Identifier.

Page 30: PubChem—Substance, Compound, BioAssay Part 3: Essentials.

Examples of PubChem Index Fields, contd.

IUPACName [UPAC, IUPAC] -- Standard IUPAC name for a compound.
MeSHDescription [MHD]
MeSHTerm [MSHT, MESHT] -- Medical Subject Heading term.
MeSHTreeNode [MSHN, MESHTN] -- Medical Subject Heading tree node (tree structures).
MolecularWeight [MW, MWT, MOLWT] -- Mass of a molecule calculated using the average mass of each element weighted for its natural isotopic abundance. E.g., carbon has two stable natural isotopes, 12C and 13C, with relative abundances of 98.9% and 1.1%, which yield an average mass of 12.011 g/mol. A real number.
MonoisotopicMass [MMAS, MIMASS] -- Mass of a molecule calculated using the mass of the most abundant isotope of each element. E.g., carbon has a monoisotopic mass of 12.000 g/mol. A real number.
PharmAction [PHMA, PHARMA] -- MeSH pharmacological actions heading.
RotatableBondCount [RBC, RBCNT] -- Number of rotatable bonds.
SourceCategory [SRCC, SRCCAT, SRCCATG] -- Depositor categories.
SourceID [SRID, SRCID] -- Depositor's external ID.
SourceName [SRC, SRCNAM, SRCNAME] -- Official depositor name.
SubstanceID [SID] -- Substance ID. Same as [UID].
Synonym [SYNO] -- Synonyms for a substance.
TautomerCount [TC, TCNT, TTMC] -- Possible tautomer count for each given structure, ≤ 200.
TotalFormalCharge [TFC, CHG, CHRG] -- Total formal charge.
TPSA [TPSA] -- Topological Polar Surface Area.
XLogP [XLGP, LOGP]
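The difference between MolecularWeight and MonoisotopicMass can be made concrete with a small worked example for dopamine (C8H11NO2). This is a minimal sketch: the two mass tables hold rounded standard values for only four elements, and the helper names `parse_formula` and `mass` are made up for illustration.

```python
import re

# Average atomic masses (weighted by natural isotopic abundance), g/mol.
AVERAGE = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}
# Masses of the most abundant isotope of each element (12C, 1H, 14N, 16O).
MONOISOTOPIC = {"C": 12.000000, "H": 1.007825, "N": 14.003074, "O": 15.994915}


def parse_formula(formula):
    """Parse a simple formula such as 'C8H11NO2' into element counts.

    Parentheses, charges, and isotope labels are not handled.
    """
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def mass(formula, table):
    """Sum the per-element masses from the chosen table."""
    return sum(table[el] * n for el, n in parse_formula(formula).items())


dopamine = "C8H11NO2"
print(round(mass(dopamine, AVERAGE), 3))       # 153.181
print(round(mass(dopamine, MONOISOTOPIC), 4))  # 153.079
```

The two numbers differ by about 0.1 g/mol even for this small molecule, which is why a range query on [MW] and a mass-spectrometry lookup by monoisotopic mass are not interchangeable.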

Page 31: PubChem—Substance, Compound, BioAssay Part 3: Essentials.

Preview/Index Tab

Page 32: PubChem—Substance, Compound, BioAssay Part 3: Essentials.

History Tab

Substances of MW 300-500Da having antineoplastic properties and obeying Lipinski rule of 5
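The combined search described on this slide can also be written as a single fielded query and sent programmatically. The sketch below only builds the request URL; it assumes NCBI's E-utilities ESearch endpoint, which this deck does not itself mention, and it assumes the index field names shown in the query examples are still accepted. The helper names are hypothetical, and no network call is made.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# NCBI E-utilities ESearch endpoint (an assumption; not named in the slides).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_query(*clauses):
    """Join fielded Entrez clauses with AND."""
    return " AND ".join(clauses)


def esearch_url(db, term, retmax=20):
    """Return an ESearch URL for the given Entrez database and query term."""
    return ESEARCH + "?" + urlencode({"db": db, "term": term, "retmax": retmax})


term = build_query(
    "300:500[MW]",
    '"antineoplastic agents"[PharmAction]',
    '"lipinski"[Filter]',
)
url = esearch_url("pcsubstance", term)

print(term)
# 300:500[MW] AND "antineoplastic agents"[PharmAction] AND "lipinski"[Filter]
print(url)
```

Building the term from separate clauses mirrors how the History tab combines earlier searches, and `urlencode` takes care of escaping the quotes, brackets, and spaces.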

Page 33: PubChem—Substance, Compound, BioAssay Part 3: Essentials.

Links

For the whole set or only selected records

Page 34: PubChem—Substance, Compound, BioAssay Part 3: Essentials.

Property Report

Page 35: PubChem—Substance, Compound, BioAssay Part 3: Essentials.

SDF format

Page 36: PubChem—Substance, Compound, BioAssay Part 3: Essentials.

Page 37: PubChem—Substance, Compound, BioAssay Part 3: Essentials.

Page 38: PubChem—Substance, Compound, BioAssay Part 3: Essentials.

Page 39: PubChem—Substance, Compound, BioAssay Part 3: Essentials.

Medical Subject Headings (MeSH)

MeSH is the National Library of Medicine's controlled vocabulary thesaurus.

Consists of sets of terms naming descriptors in a hierarchical and alphabetic structure, e.g., "Mental Disorders", "Pharmacological action", "Catecholamine hormones", etc.

Permits searching at various levels of specificity

MeSH thesaurus is used for indexing articles for the MEDLINE/PubMed database

MeSH is continually updated

PubChem assigns MeSH headings to Compound records

Page 40: PubChem—Substance, Compound, BioAssay Part 3: Essentials.

Contains bioactivity screens of chemical substances described in PubChem Substance

Provides searchable descriptions of each bioassay, including descriptions of the conditions and readouts specific to a screening protocol

Depositor decides on data definitions and interpretation

Data can be plotted as graphs of statistical histograms

Cross-indexed to other Entrez databases

Primary Database

Page 41: PubChem—Substance, Compound, BioAssay Part 3: Essentials.

Page 42: PubChem—Substance, Compound, BioAssay Part 3: Essentials.

Page 43: PubChem—Substance, Compound, BioAssay Part 3: Essentials.

Page 44: PubChem—Substance, Compound, BioAssay Part 3: Essentials.

Page 45: PubChem—Substance, Compound, BioAssay Part 3: Essentials.

Page 46: PubChem—Substance, Compound, BioAssay Part 3: Essentials.

Page 47: PubChem—Substance, Compound, BioAssay Part 3: Essentials.

Click to view structure

Page 48: PubChem—Substance, Compound, BioAssay Part 3: Essentials.

Page 49: PubChem—Substance, Compound, BioAssay Part 3: Essentials.

NCBI FTP >> PubChem Folder

Page 50: PubChem—Substance, Compound, BioAssay Part 3: Essentials.

Entrez PubChem: Help and Tabs

Page 51: PubChem—Substance, Compound, BioAssay Part 3: Essentials.

Brief Summary

PubChem is part of the NIH Molecular Libraries Roadmap for Medicine Initiative

PubChem consists of 3 databases, Substance, Compound and BioAssay, and a powerful Structure Search engine

Substance = samples; Compounds = calculated structures, properties

PubChem is integrated into NCBI’s Entrez Search and Linking system of databases

Records are indexed using a number of terms

Records are linked to each other and to other databases at NCBI

Page 52: PubChem—Substance, Compound, BioAssay Part 3: Essentials.

For More Information…

Page 53: PubChem—Substance, Compound, BioAssay Part 3: Essentials.

For More Information…

E-mail addresses
• General Help: [email protected], [email protected]

Telephone
• Voice: +1 (301) 496-2475
• Fax: +1 (301) 480-9241

The (free!) NCBI Newsletter
http://www.ncbi.nih.gov/About/newsletter.html

The NCBI Education Page
http://www.ncbi.nih.gov/Education/index.html

The NCBI Handbook
Follow the link from the NCBI Home Page

