Date post: | 03-Jan-2016 |
Category: |
Documents |
Upload: | ursula-marsh |
Simplified molecular input line entry specification
The simplified molecular input line entry specification, or SMILES, is a specification for unambiguously describing the structure of chemical molecules using short ASCII strings.
SMILES strings can be imported by most molecule editors for conversion back into two-dimensional drawings or three-dimensional models of the molecules.
SMILES
Simplified Molecular Input Line Entry System (SMILES)
– Widely used and computationally efficient
– Uses atomic symbols and a set of intuitive rules
– Uses hydrogen-suppressed molecular graphs (HSMG)
Canonical SMILES and Isomeric SMILES
The term Canonical SMILES refers to the version of the SMILES specification that includes rules for ensuring that each distinct chemical molecule has a single unique SMILES representation.
– A common application of Canonical SMILES is for indexing and ensuring uniqueness of molecules in a database.
The term Isomeric SMILES refers to the version of the SMILES specification that includes extensions to support the specification of isotopes, chirality, and configuration about double bonds.
– A notable feature of these rules is that they allow rigorous partial specification of chirality.
Graph-based definition
In terms of a graph-based computational procedure, SMILES is a string obtained by printing the symbol nodes encountered in a depth-first tree traversal of a chemical graph.
The chemical graph is first trimmed to remove hydrogen atoms, and cycles are broken to turn it into a spanning tree.
Where cycles have been broken, numeric suffix labels are included to indicate the connected nodes.
Parentheses are used to indicate points of branching on the tree.
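The procedure above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions that are not part of the SMILES specification: the graph is already hydrogen-suppressed, every bond is a single bond, and there are at most nine ring closures. The function name write_smiles and its arguments are hypothetical.

```python
from collections import defaultdict

def write_smiles(atoms, bonds, start=0):
    """Depth-first SMILES writer (sketch).

    atoms: list of atom symbols, hydrogens already removed.
    bonds: list of (i, j) index pairs, all treated as single bonds.
    """
    adjacency = defaultdict(list)
    for a, b in bonds:
        adjacency[a].append(b)
        adjacency[b].append(a)

    visited = set()
    seen_edges = set()
    tree_children = defaultdict(list)  # spanning-tree edges
    closures = defaultdict(list)       # atom index -> ring-closure digits
    next_digit = 1

    def explore(u):
        # Pass 1: build the spanning tree and label the broken cycle bonds.
        nonlocal next_digit
        visited.add(u)
        for v in adjacency[u]:
            edge = frozenset((u, v))
            if edge in seen_edges:
                continue
            seen_edges.add(edge)
            if v in visited:
                # This bond would close a cycle: break it and give both
                # end atoms the same numeric suffix label.
                closures[v].append(next_digit)
                closures[u].append(next_digit)
                next_digit += 1
            else:
                tree_children[u].append(v)
                explore(v)

    def emit(u):
        # Pass 2: print symbols in depth-first order; every child except
        # the last one is a branch and goes inside parentheses.
        text = atoms[u] + "".join(str(d) for d in closures[u])
        children = tree_children[u]
        for child in children[:-1]:
            text += "(" + emit(child) + ")"
        if children:
            text += emit(children[-1])
        return text

    explore(start)
    return emit(start)
```

Called with the atoms C, C, O, C, C and the bonds (0,1), (1,2), (1,3), (3,4), the sketch produces CC(O)CC; a six-membered carbon ring produces C1CCCCC1. The output depends on the starting atom and the neighbor order, which is why a canonical form needs extra rules.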
SMILES Branches
– Represented by enclosure in parentheses
– Can be nested or stacked
Examples:
– CC(O)CC is 2-Butanol
– OCC(C)C is iso-Butanol
– OC(C)(C)C is tert-Butanol
SMILES Bonds
– Ethene: C=C
– Chloroethene: ClC=C
– 1,1-Dichloroethene: ClC(Cl)=C
– cis-1,2-Dichloroethene: ClC=CCl (written without / or \, so the string itself does not encode the cis geometry)
– Trichloroethene: ClC(Cl)=CCl
– Perchloroethene: ClC(Cl)=C(Cl)Cl
SMILES Symbols
– String of alphanumeric characters and certain punctuation symbols
– Terminates at the first space encountered when read left to right
– The ORGANIC SUBSET: B, C, N, O, P, S, F, Cl, Br, I
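As a rough illustration of these two rules, the sketch below cuts the input at the first space and then picks out the organic-subset atoms. The function name organic_atoms is hypothetical, and the handling of bracketed atoms (they are simply skipped) is an assumption made for this example; lowercase aromatic atoms are not matched either.

```python
import re

# Two-letter symbols are listed first so that "Cl" is not read as "C"
# followed by an unknown character, and "Br" is not read as "B".
ORGANIC_ATOM = re.compile(r"Cl|Br|[BCNOPSFI]")

def organic_atoms(line):
    """Return the organic-subset atom symbols of a SMILES string (sketch)."""
    # The SMILES string terminates at the first space, reading left to right.
    smiles = line.split(" ", 1)[0]
    # Assumption: atoms written in square brackets are not part of the
    # organic subset, so they are removed before matching.
    smiles = re.sub(r"\[[^\]]*\]", "", smiles)
    return ORGANIC_ATOM.findall(smiles)
```

For example, organic_atoms("CC(O)CC 2-Butanol") ignores the name after the space and returns the five atoms C, C, O, C, C.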
Other SMILES Atoms
– Aliphatic or nonaromatic carbon: C
– Atom in aromatic ring: lowercase letter
– Designate ring closure with pairs of matching digits, e.g. c1ccccc1 is Benzene, whereas C1CCCCC1 is Cyclohexane
SMILES Charges
– Specify attached hydrogens and charges in square brackets
– Number of attached hydrogens is the symbol H followed by an optional digit
Examples:
– [H+]: proton
– [OH-]: hydroxide anion
– [OH3+]: hydronium cation
– [Fe++]: iron(II) cation
– [NH4+]: ammonium cation
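The bracket notation can be read mechanically. The sketch below is a minimal parser for bracketed atoms of the kind listed above; it assumes only an element symbol, an optional hydrogen count, and an optional charge appear inside the brackets (no isotopes or chirality marks). The name parse_bracket_atom is hypothetical.

```python
import re

BRACKET_ATOM = re.compile(
    r"\[(?P<symbol>[A-Z][a-z]?)"        # element symbol, e.g. O or Fe
    r"(?P<h>H\d*)?"                     # attached hydrogens: H, H3, H4, ...
    r"(?P<charge>\++|-+|[+-]\d+)?\]"    # charge: +, ++, -, or +2 / -2
)

def parse_bracket_atom(text):
    """Return (symbol, hydrogen_count, charge) for a bracketed atom (sketch)."""
    match = BRACKET_ATOM.fullmatch(text)
    if match is None:
        raise ValueError(f"not a simple bracket atom: {text!r}")

    h = match.group("h")
    # "H" alone means one hydrogen; "H3" means three.
    hydrogens = 0 if h is None else int(h[1:] or 1)

    c = match.group("charge")
    if c is None:
        charge = 0
    elif c[1:].isdigit():
        charge = int(c)                 # forms such as +2 or -1
    else:
        # Repeated signs such as ++ count one unit of charge each.
        charge = len(c) if c[0] == "+" else -len(c)

    return match.group("symbol"), hydrogens, charge
```

With this sketch, [NH4+] is read as nitrogen with four attached hydrogens and a charge of +1, and [Fe++] as iron with no attached hydrogens and a charge of +2.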
SMILES Cyclic Structures
– Break one single or one aromatic bond in each ring
– Number in any order
– Designate ring-breaking atoms by the same digit following the atomic symbol
Cyclic Structures
– Numbers indicate the start and stop of a ring
– The same number marks the start and end of the ring, entered immediately following the start/end atoms
– Only numbers 1 to 9 are used
– A number should appear only twice
– An atom can be associated with 2 consecutive numbers, e.g., Naphthalene: c12ccccc1cccc2
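To make the digit-pairing rule concrete, the following sketch pairs up ring-closure digits and reports which atoms each digit connects. It assumes single digits 1 to 9, organic-subset or lowercase aromatic atoms only, and no bracketed atoms (digits inside brackets would be misread). The function name ring_closures is hypothetical.

```python
import re

# Atoms (two-letter symbols first), lowercase aromatic atoms, and digits.
TOKEN = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]|\d")

def ring_closures(smiles):
    """Return (start_atom, end_atom) index pairs for each ring bond (sketch).

    Atoms are numbered from 0 in the order they appear in the string.
    """
    open_rings = {}      # digit -> index of the atom that opened the ring
    pairs = []
    atom_index = -1
    for token in TOKEN.findall(smiles):
        if token.isdigit():
            if atom_index < 0:
                raise ValueError("ring digit must follow an atom")
            if token in open_rings:
                # Second occurrence of the digit: the ring is closed here.
                pairs.append((open_rings.pop(token), atom_index))
            else:
                # First occurrence: remember the ring-opening atom.
                open_rings[token] = atom_index
        else:
            atom_index += 1
    if open_rings:
        raise ValueError(f"unclosed ring digits: {sorted(open_rings)}")
    return pairs
```

For naphthalene, c12ccccc1cccc2, the first atom carries two consecutive digits, so the sketch reports two ring bonds that both start at atom 0.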
SMILES Conventions
– Avoid two consecutive left parentheses if possible
– Strive for the fewest possible branches
– Tautomeric bonds are not designated; enter the appropriate form
Further Restrictions
– A branch cannot begin a SMILES notation
– A branch cannot immediately follow a double- or triple-bond symbol
– Example: C=(CC)C is invalid, but C(=CC)C or C(CC)=C are valid SMILES
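The two restrictions lend themselves to a simple string check. The sketch below covers only the rules stated above plus balanced parentheses; it is not a full SMILES validator, and the name branch_violations is hypothetical.

```python
def branch_violations(smiles):
    """Return a list of branch-rule problems found in a SMILES string (sketch)."""
    problems = []
    if smiles.startswith("("):
        problems.append("a branch cannot begin a SMILES notation")

    depth = 0
    for i, ch in enumerate(smiles):
        if ch == "(":
            # '=' is the double-bond symbol and '#' the triple-bond symbol.
            if i > 0 and smiles[i - 1] in "=#":
                problems.append(
                    f"branch at position {i} immediately follows a bond symbol"
                )
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                problems.append(f"unmatched ')' at position {i}")
                depth = 0
    if depth > 0:
        problems.append("unclosed '('")
    return problems
```

Applied to the example, C=(CC)C yields one problem, while C(=CC)C and C(CC)=C yield an empty list.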
SMILES Fragments
– Nitro: N(=O)(=O)
– Nitrate: ON(=O)(=O)
– Nitrite: ON(=O)
– Sulfonic acid: S(=O)(=O)O
– Cyanide/Nitrile: C#N
– Azide: N=N#N
– Azido: N+=N-
SMILES Metals
[Al] [As] [Au] [Be]
[Bi] [Cd] [Ca] [Fe]
[Hg] [K] [Li] [Mg]
[Na] [Ni] [Pt] [Sb]
[Sn] [Zn] [Zr]
Isomeric and Chiral SMILES
– Isomeric configuration indicated by forward and backward slashes: / \
Examples:
– trans-1,2-dibromoethene: Br/C=C/Br
– cis-1,2-dibromoethene: Br/C=C\Br
– Chirality indicated by the “@” symbol
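The slash convention in the two examples can be read as follows: matching slashes on either side of the double bond mean trans, differing slashes mean cis. The sketch below applies that reading to the simplest possible pattern only, one substituent, then C=C, then one substituent. The name double_bond_geometry and the restricted pattern are assumptions made for illustration; real isomeric SMILES allows far more general forms.

```python
import re

# substituent, slash, C=C, slash, substituent (letters only on each side)
SIMPLE_ALKENE = re.compile(r"^[A-Za-z]+([/\\])C=C([/\\])[A-Za-z]+$")

def double_bond_geometry(smiles):
    """Classify a simple X/C=C/Y style string as cis or trans (sketch)."""
    match = SIMPLE_ALKENE.match(smiles)
    if match is None:
        # No slashes, or a pattern this sketch does not cover.
        return "unspecified"
    # Same slash on both sides: substituents on opposite sides (trans).
    # Different slashes: substituents on the same side (cis).
    return "trans" if match.group(1) == match.group(2) else "cis"
```

A string without slashes, such as ClC=CCl, is reported as unspecified because it carries no geometric information.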
Another Application
SMILESCAS Database
http://esc.syrres.com/interkow/smilecas.htm
– Over 103,000 SMILES notations
– Input CAS Registry Number
– Leads to SMILES and thence to a structure search