April 2006
Use of Chemical Information in Organic Synthesis
Reaction Information for the Practicing Synthetic Chemist: The Search for Relevant Answers
Guenter Grethe May, 2006
AGENDA:
• Available information
• Introduction to reaction data searching
• Concepts and problems
• Basis of reaction classification
• DiscoveryGate
• Retrieving relevant information for the synthesis of new compounds
• Questions & Answers
Information Needs of Synthetic Organic Chemists in Basic Research and Development
General: searching for information on molecules precedes retrieval of synthetic methodology data
• new preparation of intermediates and starting materials
• well established, high yield preparations (experimental procedures)
• new synthetic methodologies (new reagents, catalysts etc.)
• information on starting materials (availability, price, physical data etc.)
• physical properties of reagents, solvents and catalysts
• access to the primary, secondary, and tertiary literature
• spectral information of related compounds
Differences in Molecule vs. Reaction Searching
Molecules:
Query: Is this particular molecule or similar ones known? Specific data?
Answer: Yes or No from existing databases, including patents
Reactions:
Query: How to selectively reduce the nitrile group (transformation)?
Answer: Pointers to relevant examples in the literature
Criteria:
• Efficient transformation
• Functional group compatibility
• Reaction conditions
[Scheme: molecule bearing Cl, CN, and NO2 groups; conversion of the CN group to an NH2-containing group with Cl and NO2 retained; reaction conditions?]
Available Reaction Databases
online:
• CASREACT (CAS) (ca. 10.5 million reactions, including Spresi database, 1985 - present)
• Spresi (InfoChem) (ca. 4.5 million reactions, 1974 - 2004)
• CrossFireplusReactions (Elsevier MDL, STN) (ca. 10 million reactions, 1779 - present)
• ChemInform RX on STN (FIZ Chemie) (ca. 0.8 million reactions)
• CCR (Thomson Scientific) (ca. 0.6 million reactions)
inhouse:
• ChemInform Reaction Library (Elsevier MDL)
• Spresi (InfoChem)
• CrossFire Beilstein (Elsevier MDL)
• Specialty Databases (several vendors)
• Proprietary Databases
For a good review see: Zass, E. "Reaction Databases", In: Encyclopedia of Computational Chemistry, Schleyer, P. von R.; Allinger, N.L.; Clark, T.; Gasteiger, J.; Kollman, P.A.; Schaefer, H.F.; Schreiner, P.R. (Eds.). Wiley, Chichester, 4, 2402-2420. QD39.3.E46 E53 1998
Use of Available Information in Synthesis
Preparation of a distinct compound requires access to
• information about new synthetic methodologies in journals and databases
• experimental details for the preparation of known intermediates and starting materials from databases, journals and other sources
• tools to plan syntheses and select optimal reaction conditions
Preparation of a library of diverse compounds requires
• all of the above
• knowledge about the characteristics of functional groups
• information about available building blocks
Process development requirements are defined by
• access to information about various reaction conditions of a reaction
• knowledge about the characteristics of molecules or their fragments under required reaction conditions
• tools to calculate the behavior of reagents, solvents, and catalysts
Barriers Impeding the Use of Available Information by Endusers
• multiple access systems
• different user interfaces
• different modi operandi
• difficult query formulation
• substructure concept
• keyword inconsistencies
• limited post-search management of large hitlists
• some integrated access to other information sources
Most importantly: failure of available systems to recognize and to facilitate the integration of the vast knowledge of synthetic chemists
Search Modes
Structure-Based Searches
• Full structure: only for reactions with known molecules (not very useful)
• Reaction substructure (RSS): most frequently used mode (difficult for end-users to formulate effective query)
• Reaction similarity: various methodologies using different parameters (results often vary greatly, good for browsing and idea generation)
• Reaction classification: several methodologies, mostly based on structural information about reaction centers and immediate environment (good indexing tool, improvement over reaction similarity)
• Reagents, Solvents: full structure and substructure searches for molecules (not available in all databases, used mostly in conjunction with other structural searches)
Data-Based Searches
• Keywords: intellectually derived terms for name reactions, reaction types etc. (incomplete, not very useful)
• Journal, author, title, yields, etc.: text or numeric data searches (mostly used in conjunction with structural searches)
Problems with Reaction Searching
Synthetic Problem: [reaction scheme not reproduced; structures contain N, O, and CH3O groups, and the reaction substructure query is the colored fragment]
• Full Structure Search: No hits*
• Reaction Substructure Search (colored fragment): 119 hits*
• Class Code Search: 672 hits* (broad, reaction center only)
• Keyword Search “Michael Addition”: 2972 hits*
*Results were obtained from Elsevier MDL’s combined reaction databases (ca. 1 million reactions); 2006
Problems with Substructure Searching
DATABASE SIZE: ca. 1 million reactions
[Query schemes not reproduced; structures contain nitrile, Cl, NO2, and NH2 groups]
• Oversimplified Query (nitrile to primary amine): 737 Hits
• Narrowly Defined Query: 0 Hits
Problems:
- how to avoid excessively large hitlists
- how to formulate “reasonable” search queries
Solutions:
- combination of several queries (expert approach)
- indexing of reactions (focusing on relevant reactions)
- facilitating query building (non-expert approach, intuitive)
Goal for an Efficient Reaction Data Management System
Create an environment that allows for combining the intelligence and creativity of synthetic chemists with the processing and simulating power of computers and the wealth of information in databases to meet the challenges in the laboratory for developing efficient syntheses.
Requirements to Facilitate Enduser Searching
• User interfaces based on users’ tasks and capabilities (e.g. CrossFire Web, DiscoveryGate, Reaction Browser, SciFinder) (see “A Framework for the Evaluation of Chemical Structure Databases”, Cooke, F.; Schofield, H. J. Chem. Inf. Comput. Sci. 2001, 41, 1131-1140)
• Hierarchical thesauri for keywords and reaction types
• Effective indexing of databases (e.g. classification)
• Simplification of the querying process (natural, not rule dependent)
• Efficient post-search management tools (e.g. clustering)
• Seamless integration of various information sources (web environment, point-and-click)
Most importantly: available tools must simulate the chemist’s problem solving process
Reaction Classification as Indexing Tool
‘Do We Still Need a Classification of Organic Reactions?’
Reasons
• alternate method for indexing databases - complement to structure-based retrieval systems
• access to “generic” types of information in retrieval systems
• post-search management of large hitlists
• simplification of query generation
• linking of reaction information from different sources
• source for deriving knowledge bases for reaction prediction and synthesis design
• automatic procedures for analyses and correlations, e.g. quality control and overlap studies
Reaction Classification as Indexing Tool
Examples of some recent work
• Horace: An Automatic System for the Hierarchical Classification of Chemical Reactions. Rose, J.R., Gasteiger, J. J. Chem. Inf. Comput. Sci. 1994, 34, 74
• COGNOS: A Beilstein-Type System for Organizing Organic Reactions. Hendrickson, J.B., Sander, T. J. Chem. Inf. Comput. Sci. 1995, 35, 251
• Knowledge Discovery in Reaction Databases: Landscaping Organic Reactions by a Self-Organizing Neural Network. Chen, L., Gasteiger, J. J. Am. Chem. Soc. 1997, 119, 4033
• Classification of Organic Reactions: Similarity of Reactions Based on Changes in the Electronic Features of Oxygen Atoms at the Reaction Sites. Satoh, H., Sacher, O., Nakata, T., Chen, L., Gasteiger, J., Funatsu, K. J. Chem. Inf. Comput. Sci. 1998, 38, 210
• Topology-Based Reaction Classification: An Important Tool for the Efficient Management of Reaction Information. Kraut, H., Löw, P., Matuszczyk, H., Saller, H., Grethe, G. Proceed. 5th Internat. Conf. Chem. Struct., Noordwijkerhout, The Netherlands 1999, 26
• Analysis of Reaction Information. Grethe, G. In “Handbook of Chemoinformatics”, Gasteiger, J. (Ed.), Wiley-VCH, Weinheim, 2003, Volume 4, 1407 – 1427
Reaction Indexing through Classification
[Reaction scheme not reproduced; structures contain N, O, and CH3O groups]
Based on:
• Keywords: Michael addition, Michael reaction, ring closure ...
• Molecule Type: N-heterocycle, isoquinoline, quinolizidine ...
• Reaction Type: reaction centers
Reaction Classification - Background
Rules and Definitions
Classify v.2.5, developed by InfoChem, Munich
Based on InfoChem’s reaction center perception algorithm
• A bond is defined as a reaction center if it is made or broken
• An atom is defined as a reaction center if it changes its
  - number of implicit hydrogens
  - number of valencies
  - number of π-electrons
  - atomic charge
  or if the connecting bond is a reaction center
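The reaction center rules on this slide can be illustrated with a minimal Python sketch. It assumes atom-mapped reactants and products supplied as plain dictionaries; the `Atom` record, the function names, and the data layout are hypothetical and only mirror the stated definitions, not InfoChem's actual Classify implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    """Hypothetical per-atom record holding the properties named in the rules."""
    element: str
    implicit_h: int
    valence: int
    pi_electrons: int
    charge: int


def bond_centers(reactant_bonds, product_bonds):
    """Bonds made or broken: present on exactly one side of the reaction.

    Each bond is a frozenset of two atom-map numbers.
    """
    return set(reactant_bonds) ^ set(product_bonds)


def atom_centers(reactant_atoms, product_atoms, reactant_bonds, product_bonds):
    """Atoms whose listed properties change, plus atoms on a changed bond."""
    centers = set()
    for idx in reactant_atoms.keys() & product_atoms.keys():
        before, after = reactant_atoms[idx], product_atoms[idx]
        if (before.implicit_h, before.valence, before.pi_electrons, before.charge) != (
            after.implicit_h, after.valence, after.pi_electrons, after.charge
        ):
            centers.add(idx)
    for bond in bond_centers(reactant_bonds, product_bonds):
        centers |= bond
    return centers


# Illustrative input: R-CH2-C#N -> R-CH2-CH2-NH2 (atoms 1, 2, 3 = CH2, C, N).
# The property values are simplified assumptions for demonstration only.
reactant_atoms = {1: Atom("C", 2, 4, 0, 0), 2: Atom("C", 0, 4, 2, 0), 3: Atom("N", 0, 3, 2, 0)}
product_atoms = {1: Atom("C", 2, 4, 0, 0), 2: Atom("C", 2, 4, 0, 0), 3: Atom("N", 2, 3, 0, 0)}
bonds = {frozenset({1, 2}), frozenset({2, 3})}

bonds_changed = bond_centers(bonds, bonds)
atoms_changed = atom_centers(reactant_atoms, product_atoms, bonds, bonds)
```

In this example no bond is made or broken, yet the nitrile carbon and nitrogen are perceived as reaction centers because their hydrogen and π-electron counts change.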
Reaction Classification - Background
Rules and Definitions
Hashcodes are calculated for all reaction centers taking into account atom properties:
• atom type
• valence state
• total number of bonded hydrogens (implicit plus explicitly drawn)
• number of π-electrons
• aromaticity
• formal charges
• reaction center information
The sum of all reaction center hashcodes of all reactants and one product of a reaction provides the unique reaction classification code: ‘ClassCode’
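As a rough illustration of the summation idea, the following Python sketch hashes one property tuple per reaction center and adds the results. The SHA-256 hash, the 12-digit code length, and the tuple layout are assumptions made for this example; the slide does not specify how InfoChem computes its hashcodes.

```python
import hashlib


def center_hashcode(props):
    """Stable integer hash of one reaction center's property tuple.

    `props` is assumed to be (atom type, valence state, total hydrogens,
    pi-electrons, aromatic flag, formal charge, reaction center info).
    """
    digest = hashlib.sha256(repr(props).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def class_code(reactant_centers, product_centers, digits=12):
    """Sum the hashcodes of all reaction centers and return a fixed-length code.

    Summation makes the code independent of the order in which atoms are listed.
    """
    total = sum(center_hashcode(p) for p in reactant_centers)
    total += sum(center_hashcode(p) for p in product_centers)
    return str(total % 10**digits).zfill(digits)


# Illustrative property tuples for a nitrile-to-amine reduction (values assumed).
nitrile_c = ("C", 4, 0, 2, False, 0, "center")
nitrile_n = ("N", 3, 0, 2, False, 0, "center")
amine_c = ("C", 4, 2, 0, False, 0, "center")
amine_n = ("N", 3, 2, 0, False, 0, "center")

code = class_code([nitrile_c, nitrile_n], [amine_c, amine_n])
```

Because the code is a sum, two records of the same transformation drawn with different atom orderings receive the same value, which is what makes it usable as an index key.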
Reaction Classification - Background
Rules and Definitions
Inclusion of atoms in the immediate environment (spheres)
• reaction centers only (0-sphere = BROAD)
• reaction centers + α-atoms (1-sphere = MEDIUM)
• reaction centers + β-atoms (2-sphere = NARROW)
• inclusion of one sp3-atom during sphere expansion
Atom equivalency
• atoms in the same group of the periodic table, with the exception of row-2 elements, are considered equivalent
Multiple occurrences of identical transformations are handled as one
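A minimal Python sketch of the sphere and equivalency rules follows. The adjacency-dictionary input, the function names, and the small element table are assumptions for illustration; "row-2" is read here as the second period (Li to Ne), and the sp3 restriction on sphere expansion is deliberately left out.

```python
def expand_sphere(centers, neighbors, level):
    """Atoms within `level` bonds of the reaction centers.

    level 0 = BROAD, 1 = MEDIUM, 2 = NARROW. `neighbors` maps an atom index
    to the indices of its bonded atoms. The sp3 rule is not modeled here.
    """
    selected = set(centers)
    frontier = set(centers)
    for _ in range(level):
        frontier = {n for atom in frontier for n in neighbors.get(atom, ())} - selected
        selected |= frontier
    return selected


# Partial periodic-table lookup, enough for the example (an assumption).
GROUP_OF = {"C": 14, "Si": 14, "N": 15, "P": 15, "As": 15,
            "O": 16, "S": 16, "Se": 16, "F": 17, "Cl": 17, "Br": 17, "I": 17}
SECOND_PERIOD = {"Li", "Be", "B", "C", "N", "O", "F", "Ne"}


def equivalence_key(element):
    """Same key for same-group elements, except second-period elements."""
    if element in SECOND_PERIOD or element not in GROUP_OF:
        return element
    return f"group-{GROUP_OF[element]}"


# Linear chain 1-2-3-4-5 with atom 3 as the only reaction center.
chain = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
broad = expand_sphere({3}, chain, 0)
medium = expand_sphere({3}, chain, 1)
narrow = expand_sphere({3}, chain, 2)
```

Under this reading, Cl, Br, and I share one key while F keeps its own, so a chloride and a bromide undergoing the same transformation would fall into the same class.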
Reaction Classification - Background
Rules and Definitions
[Reaction schemes for the three sphere levels not reproduced]
• 0-Sphere (Broad): reaction centers only, similar to broadly based substructure search; large-sized cluster or hitlist
• 1-Sphere (Medium): reaction centers plus alpha atoms, excluding hydrogens; medium-sized cluster or hitlist
• 2-Sphere (Narrow): reaction centers plus beta atoms, excluding consecutive sp3-atoms; small-sized cluster or hitlist
Number of hits from CIRX97 (70060 rxns) for identical transformation at different classification levels
[Chart: number of hits versus topological specificity (broad, medium, narrow); axis values 700, 300, 50; ClassCodes ...655778, ...151297, ...077692; reaction structures not reproduced]
Reaction Classification – Clustering of Search Results
Classification codes are data stored in the database, usable for sorting (clustering)
RSS-Search Query: (in red) [query and cluster structures not reproduced]
Result: 156 hits
Clustered by Classification Code “MEDIUM”: 72 clusters
• 1. Cluster (20 rxns)
• 2. Cluster (15 rxns)
• 3. Cluster (13 rxns)
• 4. Cluster (8 rxns)
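Because the classification codes are stored as data, clustering a hitlist amounts to grouping by that field. The Python sketch below shows the idea under the assumption that each hit is a dictionary with an `id` and a `class_codes` entry per level; the field names and the sample codes are made up for the example.

```python
from collections import defaultdict


def cluster_by_class_code(hits, level="medium"):
    """Group hit IDs by the ClassCode stored for the given level.

    Returns (code, [ids]) pairs, largest cluster first; ties are ordered by code.
    """
    clusters = defaultdict(list)
    for hit in hits:
        clusters[hit["class_codes"][level]].append(hit["id"])
    return sorted(clusters.items(), key=lambda item: (-len(item[1]), item[0]))


# Made-up hitlist; real codes would come from the database records.
hits = [
    {"id": "rxn-1", "class_codes": {"medium": "111"}},
    {"id": "rxn-2", "class_codes": {"medium": "222"}},
    {"id": "rxn-3", "class_codes": {"medium": "111"}},
]
clusters = cluster_by_class_code(hits)
```

Sorting by cluster size puts the most common transformation types at the top of the list, which matches the ordering of the clusters shown on the slide.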
Classification by Reaction Names
• Chemists are familiar with Name Reactions (Diels-Alder, Michael etc.)
  - Papers in one issue of JOC (22, 2004) mentioned 20 name reactions, known and lesser known, some multiple times, e.g., Mitsunobu reaction, Nazarov reaction, Wolff rearrangement etc.
  - Several books dealing exclusively with Name Reactions* (ca. 700 reactions)
• Use of Name Reactions facilitates reaction retrieval
  - Complementary to other searches
  - Used in combination with other data
  - Easier alternative to formulating complex RSS queries
• Excellent browsing tool
  - Overview of scope and limitations of a given reaction, e.g. Aldol reaction
  - Combining different reaction types leading to same compound class
    Hantzsch pyridine synthesis from dihydropyridines or β-keto esters
    Fischer Indole synthesis from hydrazines or hydrazones
    Darzens reaction of epoxides from esters, amides, sulfones, or nitriles
*References
- Named Organic Reactions, Laue, T. and Plagens, A., Eds., John Wiley & Sons, 1st Edition 1999, 2nd Edition 2005
- Organic Syntheses Based on Name Reactions, Hassner, A. and Stumer, C., Eds., Elsevier Science, 1st Edition 1994; 2nd Edition 2002
- Name Reactions, Li, J. J., Ed., Springer, 2002
- Strategic Applications of Named Reactions, Kürti, L. and Czakó, B., Eds., Elsevier, 2005
- Name Reactions and Reagents in Organic Synthesis, Mundy, B.P.; Ellerd, M.G. and Favaloro, F.G., Jr., Wiley Interscience, 2005
Note: The work on classification by reaction names is being developed at InfoChem (Munich) in consultation with G. Grethe
Classification by Reaction Names - Requirements
Established electronically, not intellectually
NOW – Intellectually derived
• Inclusion of intellectually derived keywords varies greatly from database to database and depends on abstractors; keywords are either too inclusive or not comprehensive
• Example: “Michael addition” 184 hits (keywords) vs. 89 hits (RSS search), 52 hits (reaction name keywords)
FUTURE – Electronically derived
• Assignments based on single or multiple RSS searches
  - Boolean logic is applied to combine and/or subtract search results (queries)
  - Assignments are pre-processed and added as data to database(s)
• Name reactions are aligned in hierarchical order
  - Based on main reaction categories (addition, substitution, rearrangements, eliminations, oxidations, reductions)
  - Reactions can be listed in multiple categories, e.g.: Baeyer-Villiger oxidation in Oxidation and Rearrangement
  - Hierarchy must be able to accommodate non-name reactions (future project)
• Reactions containing n reactions (e.g., tandem reactions) are listed in n categories
  - Individual name reactions have to be recognizable
  - Otherwise, stored under “Miscellaneous”
• Queries and corresponding names are stored in spreadsheet
Classification by Reaction Names - Hierarchy
Main categories | First Level | Second Level | Third Level
Addition
Elimination
Rearrangements
Reductions
Oxidations
Heterocyclic Synthesis
Miscellaneous
1,2-Addition
1,4-Addition
Cycloaddition
Aromatic electrophilic
Aliphatic Nucleophilic
Free radical
Sigmatropic
Substitution
Nucleophilic
Darzens condensation
Michael reaction
Schotten-Baumann reaction
Sulfones
Intermolecular
Diels-Alder reaction | 4+2 Cycloadditions
Friedel-Crafts acylation | Intramolecular
Gomberg-Bachmann reaction | Intermolecular
Hofmann rearrangement | Alkyl
[3,3] Sigmatropic rearrangement | Claisen rearrangement
Radical
Cope reaction
Cannizzaro reaction
Baeyer-Villiger oxidation | Lactones
Hantzsch pyridine synthesis | Modified
Alper reaction | Cyclocarbonylation
Chugaev reaction
Intermolecular
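One way to picture a hierarchy in which a name reaction can appear under several categories is an index keyed by category path. The Python sketch below is a hypothetical data structure, not the SpresiWeb implementation; the three sample paths are illustrative, with the double listing of the Baeyer-Villiger oxidation taken from the requirements stated earlier in the talk.

```python
from collections import defaultdict

# Illustrative assignments: each name reaction maps to one or more category paths.
NAME_REACTION_PATHS = {
    "Michael reaction": [("Addition", "1,4-Addition")],
    "Diels-Alder reaction": [("Addition", "Cycloaddition")],
    "Baeyer-Villiger oxidation": [("Oxidations",), ("Rearrangements",)],
}


def build_index(paths_by_name):
    """Index every prefix of every path so browsing works at any level."""
    index = defaultdict(set)
    for name, paths in paths_by_name.items():
        for path in paths:
            for depth in range(1, len(path) + 1):
                index[path[:depth]].add(name)
    return index


def browse(index, *path):
    """Name reactions filed at or below the given category path."""
    return sorted(index.get(tuple(path), set()))


index = build_index(NAME_REACTION_PATHS)
additions = browse(index, "Addition")
rearrangements = browse(index, "Rearrangements")
```

Indexing each path prefix means a query at a main category returns everything below it, while a reaction with n paths is found from each of its n categories.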
Classification by Reaction Names - Keyword Generation
Example: Intermolecular Mannich reaction with CH-acidic compounds
[Generic query structures and example reactions not reproduced]
Procedure:
- generate query for general search
- check hitlist for non-relevant hits
- formulate queries to eliminate negatives
- combine queries using Boolean operators
Mannich reaction: Query Q1
Elimination of negative hits:
- Biginelli reaction: Query Q2
- Aza Diels-Alder reaction: Query Q3
Query set for intermolecular Mannich reaction with CH-acidic compounds: Q1 – (Q2+Q3)
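The query set Q1 – (Q2+Q3) is a set difference over hitlists: take the general hits and subtract the union of the negative hits. A minimal Python sketch, assuming each query result is available as a collection of reaction IDs (the IDs below are made up):

```python
def combine(positive_hits, *negative_hitlists):
    """Return positive hits minus the union of all negative hitlists."""
    excluded = set().union(*negative_hitlists)
    return set(positive_hits) - excluded


# Made-up reaction IDs standing in for real search results.
q1_mannich = {"r1", "r2", "r3", "r4", "r5"}   # general Mannich query
q2_biginelli = {"r2"}                          # negatives: Biginelli reaction
q3_aza_diels_alder = {"r4", "r9"}              # negatives: aza Diels-Alder reaction

mannich_assigned = combine(q1_mannich, q2_biginelli, q3_aza_diels_alder)
```

The resulting ID set is what would be pre-processed and stored in the database as the name-reaction assignment.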
Example of query menu (partial view) from InfoChem’s SpresiWeb
Classification by Reaction Names
“The design of organic syntheses by chemists without the help of computers proceeds in anything but a systematic stepwise manner from the target molecule to available starting materials. A systematic stepwise approach is more the exception than the rule”.
“The human mind solves problems by lateral thinking, jumping from one idea to the next, from one question to a different one, from retrosynthetic thinking to considering the course and outcome of a reaction, etc.”
Gasteiger, J.; Ihlenfeldt, W.D.; Roese, P. Recl. Trav. Chim. Pays-Bas 1992, 111, 270.
The paradigm in an ideal electronic world
[Diagram: Journals, Major Reference Works, Books, Databases, E-Labjournal + Knowledge, Intuition, and Experience of Synthetic Chemist]
Integrated Major Reference Works (iMRW)
[Diagram elements: Databases (Reaction Databases, DiscoveryGate) (Elsevier MDL, Third Party, Proprietary etc.); Major Reference Works (MRWs), Tertiary Sources; Primary Journals. Link labels: ClassCodes; iMRW links; LinkFinderPlus (citations). Legend: present status; future links]
Integrated Major Reference Works - Concept
• Simulating chemists’ approach of gathering information from various sources (lateral approach) for solving synthetic problems through a simple point-and-click mechanism
• Assisting chemists with the synthesis of new compounds by providing complementary information
  - With examples for synthetic methodologies from reaction databases
  - From summaries, critically evaluated by experts, describing reaction mechanisms; principles of stereo-controlled reactions; applications, preparations, and properties of reagents; and other information generally not found in reaction databases
  - Through one-click linking to the primary literature when combined with LinkFinderPlus
Integrated Major Reference Works - Summary
iMRW ...
• is a unique collaboration between Elsevier MDL, InfoChem and leading scientific publishers (Elsevier Science, Georg Thieme Verlag, and Springer-Verlag)
• provides one-click, bi-directional linking based on reaction type between synthetic methodology databases and electronic versions of major reference works (MRWs) or between individual MRWs, i.e. a true integration of information
• allows text and (sub)structure searching over multiple major reference works from a single user interface
Major Reference Works in iMRW
• Detailed information about methodologies based on reaction type
• Information about scope and limitations of reactions
• Evaluated experimental procedures
• Information about reaction mechanism, stereo-control, effect of substituents and ligands, and other factors influencing a reaction
• Information about reagents and catalysts, their preparation and properties
Updates for each of them are planned or under consideration by the publishers and will be added when available
Comprehensive Asymmetric Catalysis (CAC) - Summary (1999)
CAC is an innovative three-volume reference work that reviews catalytic methods for asymmetric organic synthesis, a major challenge in synthetic chemistry today. It is illustrated by over 6,000 reactions critically evaluated by 60 leading experts in the field, and it covers in depth the basic principles, mechanisms, basis for stereoinduction, and scope and limitations of asymmetric reactions.
Editors: Eric N. Jacobsen, Andreas Pfaltz, Hisashi Yamamoto
Comprehensive Organic Functional Group Transformations (COFGT) – Summary (1995)
COFGT covers, in 40,000 reactions and seven volumes, the vast subject of organic synthesis in terms of the introduction and interconversion of functional groups. The editors have adopted a rather rigorous, logical and formal treatment on the basis of structure, which enables a detailed analysis of all known, and indeed of some as yet unknown, functional groups. Therefore, the treatise deals rationally and comprehensively with the method of their construction.
Editors-in-Chief: Alan R. Katritzky, Otto Meth-Cohn, Charles W. Rees
Science of Synthesis - Summary
Houben-Weyl Methods of Molecular Transformations (2001)
Editorial Board: D. Bellus, S. V. Ley, R. Noyori, M. Regitz, P. J. Reider, E. Schaumann, I. Shinkai, E. J. Thomas, B. M. Trost
Science of Synthesis is the authoritative and comprehensive reference work for the entire field of organic and organometallic synthesis. The series of 48 volumes will be published over a period of 8 years and will present 15,000 selected synthetic methods for all classes of compounds, illustrated by 150,000 reactions. It includes
- Methods critically evaluated by leading scientists
- Background information and detailed experimental procedures
- Schemes and tables which illustrate the reaction scope
Collecting Information for the Synthesis of a new Compound
Target molecule: [structure not reproduced; labeled groups include Me, EtO2C, and NH2]
Muray, E.; Rifé, J.; Branchadell, V.; Ortuño, R.M. J. Org. Chem. 2002, 67, 4520 – 4525
(The paper describes the syntheses of cyclopropyl nucleosides as potential antiviral and antitumor agents)
Synthesis Plan
Retrosynthetic Analysis: N1-alkylation of adenine
[Scheme not reproduced: target molecule disconnected into fragments A + B]
Step 1: general information about the alkylation reaction
Step 2: information about the preparation of A, including stereochemistry
Step 3: information about scope and limitations, effect of substituents, applicable reagents etc.
Reaction Substructure + Data Search in DiscoveryGate
Information about Enantioselective Cyclopropanation from CAC
Text Search Results from COFGT and Linking to Literature
Integration of iMRW with Reaction Database
Conclusion
• DiscoveryGate provides chemists, in a single interactive system, with relevant information from different sources required for solving synthetic problems
• Access is provided from an intuitive user interface by a simple point-and-click mechanism
• The system very closely simulates the lateral information gathering process of synthetic chemists