April 2006

Use of Chemical Information in Organic Synthesis

Reaction Information for the Practicing Synthetic Chemist: The Search for Relevant Answers

Guenter Grethe May, 2006

AGENDA:

• Available information
• Introduction to reaction data searching
• Concepts and problems
• Basis of reaction classification
• DiscoveryGate
• Retrieving relevant information for the synthesis of new compounds
• Questions & Answers


Information Needs of Synthetic Organic Chemists in Basic Research and Development

General: searching for information on molecules precedes retrieval of synthetic methodology data

• new preparation of intermediates and starting materials
• well established, high yield preparations (experimental procedures)
• new synthetic methodologies (new reagents, catalysts etc.)
• information on starting materials (availability, price, physical data etc.)
• physical properties of reagents, solvents and catalysts
• access to the primary, secondary, and tertiary literature
• spectral information of related compounds

Differences in Molecule vs. Reaction Searching

Molecules: [Structure: compound bearing Cl, CN, and NO2 groups]

Query: Is this particular molecule, or a similar one, known? Are specific data available?
Answer: Yes or no, from existing databases, including patents

Reactions: [Scheme: compound bearing Cl, CN, and NO2 groups → product bearing Cl, NO2, and NH2 groups; reaction conditions?]

Query: How can the nitrile group be reduced selectively (transformation)?
Answer: Pointers to relevant examples in the literature
Criteria:
• Efficient transformation
• Functional group compatibility
• Reaction conditions

Available Reaction Databases

Online:
• CASREACT (CAS) (ca. 10.5 million reactions, including Spresi database, 1985 - present)
• Spresi (InfoChem) (ca. 4.5 million, 1974 - 2004)
• CrossFireplusReactions (Elsevier MDL, STN) (ca. 10 million, 1779 - present)
• ChemInform RX on STN (FIZ Chemie) (ca. 0.8 million)
• CCR (Thomson Scientific) (ca. 0.6 million)

In-house:
• ChemInform Reaction Library (Elsevier MDL)
• Spresi (InfoChem)
• CrossFire Beilstein (Elsevier MDL)
• Specialty databases (several vendors)
• Proprietary databases

For a good review see: Zass, E. "Reaction Databases", In: Encyclopedia of Computational Chemistry, Schleyer, P. von R.; Allinger, N.L.; Clark, T.; Gasteiger, J.; Kollman, P.A.; Schaefer, H.F.; Schreiner, P.R. (Eds.). Wiley, Chichester, 1998, 4, 2402-2420 (call number QD39.3.E46 E53 1998).

Use of Available Information in Synthesis

Preparation of a distinct compound requires access to
• information about new synthetic methodologies in journals and databases
• experimental details for the preparation of known intermediates and starting materials from databases, journals and other sources
• tools to plan syntheses and select optimal reaction conditions

Preparation of a library of diverse compounds requires
• all of the above
• knowledge about the characteristics of functional groups
• information about available building blocks

Process development requirements are defined by
• access to information about various reaction conditions of a reaction
• knowledge about the characteristics of molecules or their fragments under the required reaction conditions
• tools to calculate the behavior of reagents, solvents, and catalysts

Barriers Impeding the Use of Available Information by End-users

• multiple access systems
• different user interfaces
• different modi operandi
• difficult query formulation
• substructure concept
• keyword inconsistencies
• limited post-search management of large hitlists
• some integrated access to other information sources

Most importantly: failure of available systems to recognize and to facilitate the integration of the vast knowledge of synthetic chemists

Search Modes

Structure-Based Searches
• Full structure: only for reactions with known molecules (not very useful)
• Reaction substructure (RSS): most frequently used mode (difficult for end-users to formulate effective query)
• Reaction similarity: various methodologies using different parameters (results often vary greatly, good for browsing and idea generation)
• Reaction classification: several methodologies, mostly based on structural information about reaction centers and immediate environment (good indexing tool, improvement over reaction similarity)
• Reagents, solvents: full structure and substructure searches for molecules (not available in all databases, used mostly in conjunction with other structural searches)

Data-Based Searches
• Keywords: intellectually derived terms for name reactions, reaction types etc. (incomplete, not very useful)
• Journal, author, title, yields, etc.: text or numeric data searches (mostly used in conjunction with structural searches)

Problems with Reaction Searching

Synthetic Problem: [Scheme: reactant and product structures containing N, O, and CH3O groups; the reaction substructure query is the colored fragment]

• Full Structure Search: no hits*
• Reaction Substructure Search (colored fragment): 119 hits*
• Keyword Search “Michael Addition”: 2972 hits*
• Class Code Search: 672 hits* (broad, reaction center only)

*Results were obtained from Elsevier MDL’s combined reaction databases (ca. 1 million reactions); 2006

Problems with Substructure Searching

DATABASE SIZE: ca. 1 million reactions

• Narrowly Defined Query [Scheme: fully drawn structures; labels shown: N, Cl, NO2, NH2]: 0 hits
• Oversimplified Query (nitrile to primary amine): 737 hits

Problems:
- how to avoid excessively large hitlists
- how to formulate “reasonable” search queries

Solutions:
- combination of several queries (expert approach)
- indexing of reactions (focusing on relevant reactions)
- facilitating query building (non-expert approach, intuitive)


Goal for an Efficient Reaction Data Management System

Create an environment that allows for combining the intelligence and creativity of synthetic chemists with the processing and simulating power of computers and the wealth of information in databases to meet the challenges in the laboratory for developing efficient syntheses.

Requirements to Facilitate End-user Searching

• User interfaces based on users’ tasks and capabilities (e.g. CrossFire Web, DiscoveryGate, Reaction Browser, SciFinder) (see “A Framework for the Evaluation of Chemical Structure Databases”, Cooke, F.; Schofield, H. J. Chem. Inf. Comput. Sci. 2001, 41, 1131-1140)
• Hierarchical thesauri for keywords and reaction types
• Effective indexing of databases (e.g. classification)
• Simplification of the querying process (natural, not rule dependent)
• Efficient post-search management tools (e.g. clustering)
• Seamless integration of various information sources (web environment, point-and-click)

Most importantly: available tools must simulate the chemist’s problem solving process


Databases in DiscoveryGate

Reaction Classification as Indexing Tool

‘Do We Still Need a Classification of Organic Reactions?’

Reasons
• alternate method for indexing databases - complement to structure-based retrieval systems
• access to “generic” types of information in retrieval systems
• post-search management of large hitlists
• simplification of query generation
• linking of reaction information from different sources
• source for deriving knowledge bases for reaction prediction and synthesis design
• automatic procedures for analyses and correlations, e.g. quality control and overlap studies

Examples of some recent work

• Horace: An Automatic System for the Hierarchical Classification of Chemical Reactions. Rose, J.R., Gasteiger, J. J. Chem. Inf. Comput. Sci. 1994, 34, 74
• COGNOS: A Beilstein-Type System for Organizing Organic Reactions. Hendrickson, J.B., Sander, T. J. Chem. Inf. Comput. Sci. 1995, 35, 251
• Knowledge Discovery in Reaction Databases: Landscaping Organic Reactions by a Self-Organizing Neural Network. Chen, L., Gasteiger, J. J. Am. Chem. Soc. 1997, 119, 4033
• Classification of Organic Reactions: Similarity of Reactions Based on Changes in the Electronic Features of Oxygen Atoms at the Reaction Sites. Satoh, H., Sacher, O., Nakata, T., Chen, L., Gasteiger, J., Funatsu, K. J. Chem. Inf. Comput. Sci. 1998, 38, 210
• Topology-Based Reaction Classification: An Important Tool for the Efficient Management of Reaction Information. Kraut, H., Löw, P., Matuszczyk, H., Saller, H., Grethe, G. Proceed. 5th Internat. Conf. Chem. Struct., Noordwijkerhout, The Netherlands 1999, 26
• Analysis of Reaction Information. Grethe, G. In “Handbook of Chemoinformatics”, Gasteiger, J. (Ed.), Wiley-VCH, Weinheim, 2003, Volume 4, 1407-1427

Reaction Indexing through Classification

[Scheme: reactant and product structures containing N, O, and CH3O groups, shown twice on the slide]

Based on:
• Keywords: Michael addition, Michael reaction, ring closure, ...
• Molecule Type: N-heterocycle, isoquinoline, quinolizidine, ...
• Reaction Type: reaction centers

Reaction Classification - Background

Rules and Definitions

• Classify v.2.5, developed by InfoChem, Munich
• Based on InfoChem’s reaction center perception algorithm
• A bond is defined as a reaction center if it is made or broken
• An atom is defined as a reaction center if it changes
  - number of implicit hydrogens
  - number of valencies
  - number of π-electrons
  - atomic charge
• An atom is also a reaction center if the connecting bond is a reaction center
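The bond and atom rules above can be illustrated with a minimal Python sketch. It is not InfoChem's perception algorithm: it assumes the reaction is already atom-mapped and supplied as plain dictionaries and sets, and the names `Atom` and `reaction_centers` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    """The four atom properties named in the rules (illustrative container)."""
    implicit_h: int
    valence: int
    pi_electrons: int
    charge: int


def reaction_centers(r_atoms, r_bonds, p_atoms, p_bonds):
    """Return (center_bonds, center_atoms) for one atom-mapped reaction.

    r_atoms / p_atoms: dict mapping atom-map number -> Atom
    r_bonds / p_bonds: set of frozenset({a, b}) bonds between map numbers
    """
    # A bond is a reaction center if it is made or broken, i.e. it is
    # present on only one side of the reaction.
    center_bonds = r_bonds ^ p_bonds

    center_atoms = set()
    # An atom is a reaction center if any of its four properties changes.
    for idx in r_atoms.keys() & p_atoms.keys():
        if r_atoms[idx] != p_atoms[idx]:
            center_atoms.add(idx)
    # An atom is also a reaction center if a connecting bond is one.
    for bond in center_bonds:
        center_atoms |= bond
    return center_bonds, center_atoms


# Nitrile reduction R-C#N -> R-CH2-NH2 with map numbers 1 (C), 2 (N), 3 (R carbon).
unchanged = Atom(implicit_h=3, valence=4, pi_electrons=0, charge=0)
reactant_atoms = {1: Atom(0, 4, 2, 0), 2: Atom(0, 3, 2, 0), 3: unchanged}
product_atoms = {1: Atom(2, 4, 0, 0), 2: Atom(2, 3, 0, 0), 3: unchanged}
bonds = {frozenset({1, 2}), frozenset({1, 3})}
example_bonds, example_atoms = reaction_centers(reactant_atoms, bonds, product_atoms, bonds)
```

In this example no bond is made or broken, so only the nitrile carbon and nitrogen are perceived as reaction centers, through their changed hydrogen and π-electron counts.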

Hashcodes are calculated for all reaction centers, taking into account atom properties:
• atom type
• valence state
• total number of bonded hydrogens (implicit plus explicitly drawn)
• number of π-electrons
• aromaticity
• formal charges
• reaction center information

The sum of all reaction center hashcodes of all reactants and one product of a reaction provides the unique reaction classification code: ‘ClassCode’
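As a rough illustration of how summed hashcodes yield a single code per reaction, the sketch below hashes a tuple of atom properties and adds the results. The hash function actually used by Classify is not given in these slides; SHA-256 truncated to 32 bits is only a stand-in, and `atom_hashcode` and `class_code` are hypothetical names.

```python
import hashlib


def atom_hashcode(props):
    """Hash one reaction-center atom from a tuple of its properties.

    props: (atom_type, valence_state, total_h, pi_electrons,
            aromatic, formal_charge, center_info)
    """
    text = "|".join(str(p) for p in props)
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    # Stand-in 32-bit hashcode; the real hashing scheme is an assumption here.
    return int.from_bytes(digest[:4], "big")


def class_code(reactant_centers, product_centers):
    """Sum the hashcodes of all reactant centers and of one product's centers."""
    all_centers = list(reactant_centers) + list(product_centers)
    return sum(atom_hashcode(props) for props in all_centers)


# Nitrile carbon and nitrogen before and after reduction (illustrative values).
nitrile_c = ("C", 4, 0, 2, False, 0, "changed")
nitrile_n = ("N", 3, 0, 2, False, 0, "changed")
amine_c = ("C", 4, 2, 0, False, 0, "changed")
amine_n = ("N", 3, 2, 0, False, 0, "changed")
code = class_code([nitrile_c, nitrile_n], [amine_c, amine_n])
```

Because the code is a plain sum, it does not depend on the order in which the reaction centers are listed, so the same transformation drawn differently receives the same code.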

Inclusion of atoms in the immediate environment (spheres)
• reaction centers only (0-sphere = BROAD)
• reaction centers + α-atoms (1-sphere = MEDIUM)
• reaction centers + β-atoms (2-sphere = NARROW)
• inclusion of one sp3-atom during sphere expansion

Atom equivalency
• atoms in the same group of the periodic table, with the exception of row-2 elements, are considered equivalent

Multiple occurrences of identical transformations are handled as one
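The sphere levels can be pictured as a breadth-first expansion from the reaction centers over the molecular graph. The sketch below shows only that expansion; it deliberately ignores the sp3 restriction and the atom-equivalency rule, and `expand_spheres` is a hypothetical name.

```python
def expand_spheres(adjacency, centers, n_spheres):
    """Return the atoms within n_spheres bonds of any reaction center.

    adjacency: dict atom -> iterable of neighbouring atoms (hydrogens omitted)
    centers:   iterable of reaction-center atoms
    n_spheres: 0 (BROAD), 1 (MEDIUM) or 2 (NARROW)
    """
    included = set(centers)
    frontier = set(centers)
    for _ in range(n_spheres):
        # Next sphere: neighbours of the current frontier not yet included.
        frontier = {
            nb for atom in frontier for nb in adjacency.get(atom, ())
        } - included
        included |= frontier
    return included


# Linear chain 1-2-3-4-5 with atom 1 as the only reaction center.
chain = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
broad = expand_spheres(chain, {1}, 0)
medium = expand_spheres(chain, {1}, 1)
narrow = expand_spheres(chain, {1}, 2)
```

Each additional sphere adds structural context to the classified fragment, which is why the narrow level matches fewer reactions than the broad one.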

[Scheme: example structures (labels shown: N, CN, CC, H) illustrating each sphere level]

• 0-Sphere (Broad): reaction centers only, similar to broadly based substructure search; large-sized cluster or hitlist
• 1-Sphere (Medium): reaction centers plus alpha atoms, excluding hydrogens; medium-sized cluster or hitlist
• 2-Sphere (Narrow): reaction centers plus beta atoms, excluding consecutive sp3-atoms; small-sized cluster or hitlist

Number of hits from CIRX97 (70060 rxns) for identical transformation at different classification levels

[Figure: number of hits (values shown: 700, 300, 50) versus topological specificity (broad, medium, narrow) for one transformation (labels shown: O, O, OH, OH); class codes ...655778, ...151297, ...077692]

Reaction Classification – Clustering of Search Results

Classification codes are data stored in the database, usable for sorting (clustering)

RSS-Search Query: (in red) [Scheme: reactant and product structures containing N, O, and CH3O groups]

Result: 156 hits

Clustered by Classification Code “MEDIUM”: 72 clusters
1. Cluster (20 rxns)
2. Cluster (15 rxns)
3. Cluster (13 rxns)
4. Cluster (8 rxns)

[Schemes: one representative reaction per cluster]
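Because the classification code is stored as data with each reaction, clustering a hitlist amounts to grouping the hits by that code. A minimal sketch, assuming each hit is a (reaction id, class code) pair; the function name and the sample codes are hypothetical:

```python
from collections import defaultdict


def cluster_by_class_code(hits):
    """Group (reaction_id, class_code) pairs into clusters, largest first."""
    clusters = defaultdict(list)
    for reaction_id, code in hits:
        clusters[code].append(reaction_id)
    # Sort clusters by size so the most populated transformation comes first.
    return sorted(clusters.items(), key=lambda item: len(item[1]), reverse=True)


hitlist = [("rxn1", "A"), ("rxn2", "B"), ("rxn3", "A"), ("rxn4", "C"), ("rxn5", "A")]
clustered = cluster_by_class_code(hitlist)
```

Browsing one representative per cluster, rather than every hit, is what turns a hitlist of 156 reactions into 72 clusters to inspect.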

Classification by Reaction Names

• Chemists are familiar with Name Reactions (Diels-Alder, Michael etc.)
  - Papers in one issue of JOC (22, 2004) mentioned 20 name reactions, known and lesser known, some multiple times, e.g. Mitsunobu reaction, Nazarov reaction, Wolff rearrangement etc.
  - Several books dealing exclusively with Name Reactions* (ca. 700 reactions)
• Use of Name Reactions facilitates reaction retrieval
  - Complementary to other searches
  - Used in combination with other data
  - Easier alternative to formulating complex RSS queries
• Excellent browsing tool
  - Overview of scope and limitations of a given reaction, e.g. Aldol reaction
  - Combining different reaction types leading to same compound class:
    Hantzsch pyridine synthesis from dihydropyridines or β-keto esters
    Fischer indole synthesis from hydrazines or hydrazones
    Darzens reaction of epoxides from esters, amides, sulfones, or nitriles

*References
• Named Organic Reactions, Laue, T. and Plagens, A., Eds., John Wiley & Sons, 1st Edition 1999, 2nd Edition 2005
• Organic Syntheses Based on Name Reactions, Hassner, A. and Stumer, C., Eds., Elsevier Science, 1st Edition 1994; 2nd Edition 2002
• Name Reactions, Li, J. J., Ed., Springer, 2002
• Strategic Applications of Named Reactions, Kürti, L. and Czakó, B., Eds., Elsevier, 2005
• Name Reactions and Reagents in Organic Synthesis, Mundy, B.P.; Ellerd, M.G. and Favaloro, F.G., Jr., Wiley Interscience, 2005

Note: The work on classification by reaction names is being developed at InfoChem (Munich) in consultation with G. Grethe

Classification by Reaction Names - Requirements

• Established electronically, not intellectually
• NOW – Intellectually derived
  - Inclusion of intellectually derived keywords varies greatly from database to database, depends on the abstractors, and is either too inclusive or not comprehensive
  - Example: “Michael addition”: 184 hits (keywords) vs. 89 hits (RSS search), 52 hits (reaction name keywords)
• FUTURE – Electronically derived
  - Assignments based on single or multiple RSS searches
  - Boolean logic is applied to combine and/or subtract search results (queries)
  - Assignments are pre-processed and added as data to database(s)
• Name reactions are aligned in hierarchical order
  - Based on main reaction categories (addition, substitution, rearrangements, eliminations, oxidations, reductions)
  - Reactions can be listed in multiple categories, e.g. Baeyer-Villiger oxidation in Oxidation and Rearrangement
  - Hierarchy must be able to accommodate non-name reactions (future project)
• Reactions containing n reactions (e.g. tandem reactions) are listed in n categories
  - Individual name reactions have to be recognizable
  - Otherwise, stored under “Miscellaneous”
• Queries and corresponding names are stored in spreadsheet
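The requirement that one name reaction may appear under several main categories, with a “Miscellaneous” fallback, can be sketched as a simple inverted index. The data layout and the name `build_category_index` are assumptions for illustration, not the InfoChem implementation.

```python
from collections import defaultdict


def build_category_index(assignments):
    """Map each main category to the sorted name reactions listed under it.

    assignments: dict name reaction -> list of main categories
    (an empty list means no category could be recognized)
    """
    index = defaultdict(list)
    for name, categories in assignments.items():
        # Unrecognized entries fall back to "Miscellaneous".
        for category in categories or ["Miscellaneous"]:
            index[category].append(name)
    return {category: sorted(names) for category, names in index.items()}


assignments = {
    "Baeyer-Villiger oxidation": ["Oxidations", "Rearrangements"],
    "Michael reaction": ["Addition"],
    "unrecognized tandem sequence": [],
}
category_index = build_category_index(assignments)
```

A reaction listed in n categories simply appears in n entries of the index, so browsing by either category finds it.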

Classification by Reaction Names - Hierarchy

[Table: columns are Main categories, First Level, Second Level, Third Level; the row alignment is not recoverable, so entries are listed by kind]

Main categories: Addition, Substitution, Elimination, Rearrangements, Reductions, Oxidations, Heterocyclic Synthesis, Miscellaneous

Level entries: 1,2-Addition; 1,4-Addition; Cycloaddition; Aromatic electrophilic; Aliphatic nucleophilic; Free radical; Sigmatropic; Nucleophilic; Radical; Sulfones; Intermolecular

Name reactions, with the entry printed next to each where one is shown:
• Darzens condensation
• Michael reaction
• Schotten-Baumann reaction
• Diels-Alder reaction / 4+2 Cycloadditions
• Friedel-Crafts acylation / Intramolecular
• Gomberg-Bachmann reaction / Intermolecular
• Hofmann rearrangement / Alkyl
• Claisen rearrangement / [3,3] Sigmatropic rearrangement
• Cope reaction
• Cannizzaro reaction
• Baeyer-Villiger oxidation / Lactones
• Hantzsch pyridine synthesis / Modified
• Alper reaction / Cyclocarbonylation
• Chugaev reaction

Classification by Reaction Names – Keyword Generation

Example: Intermolecular Mannich reaction with CH-acidic compounds

Procedure:
- generate query for general search
- check hitlist for non-relevant hits
- formulate queries to eliminate negatives
- combine queries using Boolean operators

[Schemes: generic reaction queries drawn with numbered query atoms ([C,H], A, Q, N(s*), C(s*)), each with an example reaction]

• Mannich reaction: Query Q1

Elimination of negative hits:
• Biginelli reaction: Query Q2
• Aza Diels-Alder reaction: Query Q3

Query set for intermolecular Mannich reaction with CH-acidic compounds: Q1 – (Q2+Q3)
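The query set Q1 – (Q2+Q3) is ordinary set arithmetic on hitlists: take the general hits and subtract the union of the negative queries. A minimal sketch, assuming each query result is a collection of reaction identifiers; `combine_queries` and the sample identifiers are hypothetical:

```python
def combine_queries(q1_hits, *negative_hits):
    """Return Q1 minus the union of the negative queries, e.g. Q1 - (Q2 + Q3)."""
    negatives = set().union(*negative_hits)
    return set(q1_hits) - negatives


q1 = {101, 102, 103, 104, 105}   # general Mannich-type query
q2 = {102}                       # hits that are really Biginelli reactions
q3 = {104, 106}                  # hits that are really aza Diels-Alder reactions
mannich_hits = combine_queries(q1, q2, q3)
```

Identifiers that occur only in a negative query (106 here) have no effect, since they were never part of Q1.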

Classification by Reaction Names

Example of query menu (partial view) from InfoChem’s SpresiWeb

The paradigm in an ideal electronic world

“The design of organic syntheses by chemists without the help of computers proceeds in anything but a systematic stepwise manner from the target molecule to available starting materials. A systematic stepwise approach is more the exception than the rule.”

“The human mind solves problems by lateral thinking, jumping from one idea to the next, from one question to a different one, from retrosynthetic thinking to considering the course and outcome of a reaction, etc.”

Gasteiger, J.; Ihlenfeldt, W.D.; Roese, P. Recl. Trav. Chim. Pays-Bas 1992, 111, 270.

[Diagram: Journals, Major Reference Works, Books, Databases, E-Labjournal + Knowledge, Intuition, and Experience of Synthetic Chemist]

Integrated Major Reference Works (iMRW)

[Diagram labels: Databases (Reaction Databases, DiscoveryGate; Elsevier MDL, Third Party, Proprietary etc.); Major Reference Works (MRWs); Tertiary Sources; Primary Journals. Links: ClassCodes, iMRW links, LinkFinderPlus (citations). Legend: present status, future links]

Integrated Major Reference Works - Concept

• Simulating chemists’ approach of gathering information from various sources (lateral approach) for solving synthetic problems through a simple point-and-click mechanism
• Assisting chemists with the synthesis of new compounds by providing complementary information
  - With examples for synthetic methodologies from reaction databases
  - From summaries, critically evaluated by experts, describing reaction mechanisms; principles of stereo-controlled reactions; applications, preparations, and properties of reagents; and other information generally not found in reaction databases
  - Through one-click linking to the primary literature when combined with LinkFinderPlus

Integrated Major Reference Works - Summary

iMRW...
• is a unique collaboration between Elsevier MDL, InfoChem and leading scientific publishers (Elsevier Science, Georg Thieme Verlag, and Springer-Verlag)
• provides one-click, bi-directional linking based on reaction type between synthetic methodology databases and electronic versions of major reference works (MRWs) or between individual MRWs, i.e. a true integration of information
• allows text and (sub)structure searching over multiple major reference works from a single user interface

Major Reference Works in iMRW

• Detailed information about methodologies based on reaction type
• Information about scope and limitations of reactions
• Evaluated experimental procedures
• Information about reaction mechanism, stereo-control, effect of substituents and ligands, and other factors influencing a reaction
• Information about reagents and catalysts, their preparation and properties

Updates for each of them are planned or under consideration by the publishers and will be added when available

Comprehensive Asymmetric Catalysis (CAC) - Summary

CAC is an innovative reference work that reviews in three volumes catalytic methods for asymmetric organic synthesis, a major challenge in synthetic chemistry today. Illustrated by over 6,000 reactions critically evaluated by 60 leading experts in the field, the basic principles, mechanisms, basis for stereoinduction, and scope and limitations of asymmetric reactions are covered in-depth.

Editors: Eric N. Jacobsen, Andreas Pfaltz, Hisashi Yamamoto (1999)

Comprehensive Organic Functional Group Transformations (COFGT) – Summary

COFGT covers in 40,000 reactions and seven volumes the vast subject of organic synthesis in terms of the introduction and interconversion of functional groups. The editors have adopted a rather rigorous, logical and formal treatment on the basis of structure, which enables a detailed analysis of all known, and indeed of some as yet unknown, functional groups. Therefore, the treatise deals rationally and comprehensively with the method of their construction.

Editors-in-Chief: Alan R. Katritzky, Otto Meth-Kohn, Charles W. Rees (1995)

Science of Synthesis - Summary
Houben-Weyl Methods of Molecular Transformations

Science of Synthesis is the authoritative and comprehensive reference work for the entire field of organic and organometallic synthesis. The series of 48 volumes will be published over a period of 8 years. It will present 15,000 selected synthetic methods for all classes of compounds, illustrated by 150,000 reactions, and it includes
- Methods critically evaluated by leading scientists
- Background information and detailed experimental procedures
- Schemes and tables which illustrate the reaction scope

Editorial Board: D. Bellus, S. V. Ley, R. Noyori, M. Regitz, P. J. Reider, E. Schaumann, I. Shinkai, E. J. Thomas, B. M. Trost (2001)

Collecting Information for the Synthesis of a new Compound

Target molecule: [Structure: labels shown are N, NH2, Me, and EtO2C]

Muray, E.; Rifé, J.; Branchadell, V.; Ortuño, R.M. J. Org. Chem. 2002, 67, 4520-4525

(The paper describes the syntheses of cyclopropyl nucleosides as potential antiviral and antitumor agents)

Synthesis Plan

Retrosynthetic Analysis: N1-alkylation of adenine

[Scheme: target molecule disconnected into fragments A + B; labels shown are N, NH, NH2, Me, EtO2C, and X]

1. Step: general information about the alkylation reaction
2. Step: information about the preparation of A, including stereochemistry
3. Step: information about scope and limitations, effect of substituents, applicable reagents etc.

Reaction Substructure + Data Search in DiscoveryGate

[Scheme: reaction example; labels shown are N, Cl, and I]

Search for Similar Reactions in iMRW

Literature Linking: COFGT chapter

Text Search in iMRW

Information about Enantioselective Cyclopropanation from CAC

Text Search Results from COFGT and Linking to Literature

Integration of iMRW with Reaction Database

Conclusion

• DiscoveryGate provides chemists, in a single interactive system, with the relevant information from different sources required for solving synthetic problems
• Access is provided from an intuitive user interface by a simple point-and-click mechanism
• The system very closely simulates the lateral information gathering process of synthetic chemists

