Klaus Gubernator, Craig James, eMolecules Inc. ACS 232nd National Meeting Division of Chemical Information San Francisco, September 14, 2006 Chemical Structure Search Engines in Cyberspace
Transcript
Page 1

Klaus Gubernator, Craig James, eMolecules Inc.

ACS 232nd National Meeting

Division of Chemical Information

San Francisco, September 14, 2006

Chemical Structure Search Engines in Cyberspace

Page 2

Chemistry on the Internet

The web has revolutionized the way we retrieve information

Chemistry is a late participant in this revolution

Page 3

Search Google Images for “Aspirin”

Page 4

Structure of “triazolo thiadiazine”

http://scripts.iucr.org/cgi-bin/paper?cnor=a12172&buy=yes

Acta Cryst. (1975). B31, 1427-1429. The crystal structure of 7-amino-2H,4H-vic-triazolo[4,5-c]-1,2,6-thiadiazine 1,1-dioxide (ATT). C. Foces-Foces, F. H. Cano and S. García-Blanco

Buy online: You may purchase this article in PDF and/or HTML formats. For purchasers in the UK, and for purchasers elsewhere in the European Community who do not have a VAT number, VAT will be added to the price of the article.

Format: PDF (US $40, plus US $7 for EC purchases)

Page 5

Cheminformatics.org

Datasets (which are, in contrast to other dataset lists, available in a structural format). This list will be expanded continuously. Please don't hesitate to make published datasets publicly available here.

Currently available: 44 datasets. Note: The Briem/Lessel and Hert/Willett datasets are only available as MDDR IDs due to license reasons. Please contact MDL for further information on the database. The datasets have nonetheless been included here because they are standard datasets for similarity searching. – Andreas Bender

Binary (active/inactive) datasets, QSAR datasets, QSPR datasets, Toxicity datasets, Metabolism datasets, Permeability datasets, Docking datasets, Mechanistic datasets, Mixed/Other datasets

Page 6

Stahl dataset

CS(=O)(=O)Nc1ccc(cc1OC2CCCCC2)N(=O)=O
CS(=O)(=O)Nc1cc2CCC(=O)c2cc1Oc3ccc(F)cc3F
CS(=O)(=O)Nc1cc2CCC(=O)c2cc1Sc3ccc(F)cc3F
CS(=O)(=O)Nc1ccc(cc1Sc2ccc(F)cc2F)C(=O)N
CS(=O)(=O)Nc1ccc(cc1Sc2ccc(Cl)cc2Cl)S(=O)(=O)N
COc1ccc(cc1)c2sc(nc2c3ccc(cc3)S(=O)(=O)C)c4ccccc4Cl
COc1ccc(cc1)c2sc(nc2c3ccc(SC)cc3)c4ccccc4Cl
CS(=O)(=O)c1ccc(cc1)n2nc(cc2c3ccc(F)cc3)C(F)(F)F
CS(=O)(=O)c1ccc(cc1)n2nc(cc2c3ccc(Br)cc3)C(F)(F)F
Cc1ccc(cc1)c2cc(nn2c3ccc(cc3)S(=O)(=O)N)C(F)(F)F
CS(=O)(=O)c1ccc(cc1)c2snnc2c3ccc(F)cc3
CC(=O)c1nc(c(o1)c2ccc(c(F)c2)S(=O)(=O)N)c3ccccc3
Cc1nc(C2CCCCC2)c(o1)c3ccc(c(F)c3)S(=O)(=O)N
CS(=O)(=O)c1ccc(cc1)c2[nH]c(nc2C3CCCCC3)C(F)(F)F
CS(=O)(=O)c1ccc(cc1)c2[nH]c(nc2c3ccc(F)cc3)C(F)(F)F
CS(=O)(=O)c1ccc(cc1)C2=C(C(=O)OC32CC3)c4ccccc4
CS(=O)(=O)c1ccc(cc1)C2=C(C(=O)OC32CCCC3)c4ccccc4
CS(=O)(=O)c1ccc(cc1)c2cnn(Cc3ccccc3)c(=O)c2c4ccccc4
CS(=O)(=O)c1ccc(cc1)c2nn(Cc3ccccc3)c(c2c4ccc(F)cc4)C(F)(F)F
NS(=O)(=O)c1ccc(cc1)c2c(CO)onc2c3ccccc3
CS(=O)(=O)c1ccc(cc1)c2cc(Cl)nn2c3ccc(F)cc3
NS(=O)(=O)c1ccc(cc1)c2cc(nn2c3ccc(F)cc3)C(F)(F)F
NS(=O)(=O)c1ccc(cc1)n2nc(cc2c3nc4cccc(F)c4s3)C(F)F

Page 7

Yokoyama dataset

Unnamed
-MTS- 06200418093D 0 0.00000 0.00000 0

 13 13  0  0  0  0  0  0  0  0  1 V2000
    0.0180   -0.0030    0.0000 S   0  0  0  0  0  0  0  0  0  0  0  0
    1.7880    0.0070    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.4880    0.0110    1.2120 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.8880    0.0200    1.2120 C   0  0  0  0  0  0  0  0  0  0  0  0
    4.5880    0.0240    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    6.0030    0.0330    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
    6.6610    1.1880    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    6.0400    2.2500    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    8.1410    1.1970    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    8.7570    0.1440    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    8.7890    2.3360    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    3.8880    0.0200   -1.2130 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.4880    0.0120   -1.2120 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  2  3  2  0  0  0  0
  3  4  1  0  0  0  0
  4  5  2  0  0  0  0
…
M  END
> <BIO>
48.00

$$$$
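The record above is in MDL SD-file format: a V2000 molfile (title and program lines, a comment line, a fixed-width counts line, the atom and bond blocks, `M  END`) followed by named data items such as `> <BIO>` and a `$$$$` terminator. As a rough sketch of how such a dataset might be read, the Python below splits records on `$$$$`, takes the atom and bond counts from the first two 3-character fields of the counts line, and collects the data items. The function names and the two-atom sample record are hypothetical, and the parser ignores much of the full specification (V3000 files, property lines, malformed records).

```python
def parse_sdf(text):
    """Split SD-file text into records terminated by '$$$$' lines."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            records.append(parse_record(current))
            current = []
        else:
            current.append(line)
    return records  # any text after the last '$$$$' is ignored


def parse_record(lines):
    """Extract title, atom/bond counts and '> <NAME>' data items from one record."""
    title = lines[0].strip()
    counts = lines[3]  # 4th line of a V2000 molfile: fixed-width counts line
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])
    # Data items start after the 'M  END' line that closes the molfile.
    end = next((i for i, l in enumerate(lines) if l.startswith("M  END")), len(lines))
    data, i = {}, end + 1
    while i < len(lines):
        line = lines[i]
        if line.startswith(">") and "<" in line and ">" in line[line.index("<"):]:
            start = line.index("<") + 1
            name = line[start:line.index(">", start)]
            values = []
            i += 1
            # A data value runs until the next blank line.
            while i < len(lines) and lines[i].strip():
                values.append(lines[i])
                i += 1
            data[name] = "\n".join(values)
        i += 1
    return {"title": title, "n_atoms": n_atoms, "n_bonds": n_bonds, "data": data}


# Hypothetical two-atom record in the same layout as the slide's example.
SAMPLE = """Unnamed
  hypothetical two-atom record

  2  1  0  0  0  0  0  0  0  0  1 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.4300    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
M  END
> <BIO>
48.00

$$$$
"""

record = parse_sdf(SAMPLE)[0]
print(record["n_atoms"], record["n_bonds"], record["data"])  # 2 1 {'BIO': '48.00'}
```

Reading the counts by column position rather than by splitting on whitespace follows the fixed-width layout of V2000 molfiles, where adjacent fields can run together once values reach three digits.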

Page 8

Search GenBank for “aattccgg”

Page 9

Page 10

Page 11

Why is so little chemistry on the web?

Tradition?

Strong providers of subscription services?

Searching for chemical structures is significantly more difficult than text searching?

Chemical identifiers are not standardized?

Page 12

Open Access Chemical Search Engines

PubChem – NIH

ChemBank – Harvard

ZINC – UCSF

ChemDB – UC Irvine

ChemExper – Lausanne

ChemFinder – CambridgeSoft

Page 13

www.emolecules.com

New chemistry search engine

A large database of publicly available molecular structures

Launched November 2005

50,000 searches per month, rapidly growing

Page 14

www.emolecules.com

Free chemistry search site for publicly available chemical information

Page 15
Page 16

Advanced Search

Powerful features:

Hit list management: union, intersect, subtract, difference

Manual selection

Export lists in many formats

Persistent hit lists
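The four hit-list operations map naturally onto set algebra over compound IDs. The sketch below assumes "subtract" means the entries of the first list that are not in the second, and "difference" means the symmetric difference (entries in exactly one of the two lists). That reading, the helper name `combine` and the integer IDs are assumptions for illustration, not a description of how the site implements these features.

```python
# Hypothetical mapping from operation name to set algebra over compound IDs.
# Assumption: "subtract" = in A but not B; "difference" = in exactly one list.
HIT_LIST_OPS = {
    "union": lambda a, b: a | b,
    "intersect": lambda a, b: a & b,
    "subtract": lambda a, b: a - b,
    "difference": lambda a, b: a ^ b,
}


def combine(hits_a, hits_b, op):
    """Combine two hit lists (iterables of compound IDs); returns a sorted list."""
    return sorted(HIT_LIST_OPS[op](set(hits_a), set(hits_b)))


first = [101, 102, 103, 104]   # hypothetical IDs from one search
second = [103, 104, 105]       # hypothetical IDs from another search
for op in HIT_LIST_OPS:
    print(op, combine(first, second, op))
# union [101, 102, 103, 104, 105]
# intersect [103, 104]
# subtract [101, 102]
# difference [101, 102, 105]
```

Converting to sets first also removes duplicate IDs within a single list, which is usually the desired behavior when the same structure appears in several catalogs.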

Page 17

Page 18
Page 19

Content: 16M entries, 5.6M structures

Academic and government databases: NIST WebBook, DrugBank, protein ligands

Chemical suppliers: 150 electronic catalogs included

Future goal: all publicly available chemical information

Page 20

Why is it so fast?

Novel chemical search engine technology

Method represents a major departure from previously known algorithms:

- Molecular keys (MDL)

- Fingerprints (Daylight)

- Feature Trees (BioSolveIT)
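For context on the conventional approaches listed above: key- and fingerprint-based systems typically use bit strings as a screen before an exact atom-by-atom substructure match. If the query sets a bit that the target lacks, the target cannot contain the query; the converse does not hold, so molecules that pass still need verification. The sketch below is a generic hashed fingerprint in the spirit of path-based fingerprints (MDL-style keys instead assign each bit to a predefined fragment). It is not MDL's or Daylight's code; the feature strings, `N_BITS` and the function names are illustrative assumptions.

```python
import zlib

N_BITS = 64  # illustrative width; production fingerprints are usually much longer


def fingerprint(features):
    """Fold a set of feature strings into an N_BITS-wide integer bit mask."""
    fp = 0
    for feature in features:
        # crc32 is deterministic across runs, unlike Python's salted str hash()
        fp |= 1 << (zlib.crc32(feature.encode("utf-8")) % N_BITS)
    return fp


def passes_screen(query_fp, target_fp):
    """True if every bit set in the query is also set in the target.

    False means the target cannot contain the query; True only means it might.
    """
    return (query_fp & target_fp) == query_fp


# Hypothetical feature sets (e.g. atom paths) for a stored molecule and a query.
target = fingerprint({"C", "O", "C-C", "C-O", "C=O", "C-C-O"})
query = fingerprint({"C", "O", "C-O"})
print(passes_screen(query, target))  # True: the query's features are a subset
```

Because several features can hash to the same bit, a short fingerprint lets more non-matching molecules through the screen; that costs extra verification time but never drops a true hit.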

Page 21

Search engine technology

Analyze each molecule for distinguishing structural features

Generate all features algorithmically

Normalize features and use them for indexing

Result: very fast searches
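The slide does not say which features eMolecules generates or how they are normalized. As one hypothetical illustration of the listed steps (generate features algorithmically, normalize them, use them for indexing), the sketch below enumerates all simple atom paths of up to three bonds as element-label strings, normalizes each path so that it and its reverse yield the same string, and builds an inverted index from feature to molecule IDs. A query returns the molecules that contain all of its features. Bond orders and hydrogens are ignored, so this works as a screen and candidates would still need an exact match; the path features, `max_bonds=3` and the class name `FeatureIndex` are assumptions.

```python
from collections import defaultdict


def path_features(atoms, bonds, max_bonds=3):
    """All simple paths of 0..max_bonds bonds, as normalized element-label strings."""
    adjacency = defaultdict(list)
    for i, j in bonds:
        adjacency[i].append(j)
        adjacency[j].append(i)
    features = set()

    def walk(path):
        labels = [atoms[k] for k in path]
        forward, backward = "-".join(labels), "-".join(reversed(labels))
        # Normalize: a path read in either direction is the same feature.
        features.add(min(forward, backward))
        if len(path) - 1 < max_bonds:
            for nxt in adjacency[path[-1]]:
                if nxt not in path:  # simple paths only: never revisit an atom
                    walk(path + [nxt])

    for start in range(len(atoms)):
        walk([start])
    return features


class FeatureIndex:
    """Inverted index: feature string -> IDs of molecules containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, mol_id, atoms, bonds):
        for feature in path_features(atoms, bonds):
            self.postings[feature].add(mol_id)

    def candidates(self, atoms, bonds):
        """Molecules containing every query feature (a screen: may include false positives)."""
        posting_sets = [self.postings.get(f, set()) for f in path_features(atoms, bonds)]
        return set.intersection(*posting_sets) if posting_sets else set()


# Hypothetical heavy-atom graphs: atom labels plus bonds as index pairs.
index = FeatureIndex()
index.add(1, ["C", "C", "O"], [(0, 1), (1, 2)])               # ethanol
index.add(2, ["C", "N"], [(0, 1)])                            # methylamine
index.add(3, ["C", "C", "O", "O"], [(0, 1), (1, 2), (1, 3)])  # acetic acid
print(sorted(index.candidates(["C", "O"], [(0, 1)])))         # [1, 3]
```

A query then costs a few dictionary lookups and set intersections rather than a pass over every stored structure, which is the general reason inverted indexes make text and structure search fast.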

Page 22

Who is eMolecules?

Klaus Gubernator

Craig A. James

Rashmi Mistry

Page 23

Summary

Free for depositors and users

Very fast search engine

High-quality user interface

Rich functionality

Complementary with other engines

Page 24

Contact Information

Klaus Gubernator

[email protected]

Skype: emolecules

+1-858-764-1941

