Cefic LRI AMBIT2 with IUCLID6 support and extended search capabilities
Workshop on CEFIC LRI Project EEM9.4
LRI AMBIT with IUCLID6 support and extended search capabilities
AMBIT Cheminformatics system
Nina Jeliazkova, Nikolay Kochev
Ideaconsult Ltd.
Sofia, Bulgaria
Content
– Introduction
– Substance data integration in AMBIT (different input formats)
– Search functionalities
- Structures, substances and endpoint data
- Structure standardization, transformation, tautomers
– Tools integration – via common API
- Toxtree, VEGA, other models, descriptors
– User management system to grant access rights via roles
– The read across workflow
- A use case integrating the above functionalities
– IT requirements
AMBIT2 Hands-on Training Workshop 29.09.2017, Brussels, Belgium
AMBIT Chemoinformatics System
Developed within a CEFIC Long-Range Initiative (LRI)
EEM9.3 (2005,2008), EEM9.3-IC (2013-2015), EEM9.4 (2016-ongoing)
Continuously developed and extended through various projects
An Open Source Application with the following functions
Search for structure(s) [exact, similar, substructure] and meta data
Assigning structures to constituents, impurities …
Assessment tools (read across/category formation)
Prediction tools, e.g. Toxtree (including Cramer rules, protein binding, etc.), descriptor calculation, pKa, etc.;
Data analysis tools, e.g. regression, classification, clustering, etc.;
Data management: flexible import/export of data
Data exchange tools: manual or automated via REST Web services API;
Read across workflow
AMBIT: Chemical structures database & machine learning with web services API
http://ambit.sourceforge.net
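AMBIT: Data integration via common data model
[Diagram: data formats and access routes connected through the AMBIT common data model (ambitlri.ideaconsult.net)]
– Excel spreadsheets
– IUCLID5, IUCLID6
– Other formats
– Reports (Excel, Word)
– JSON
– Other formats (RDF, ISA-TAB, etc.)
– Free text search
– REST API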
IUCLID6 support in AMBIT2
– IUCLID6: Completely new XML schema of all objects
- 372 schema files, 111 endpoint study record files
- Different approach of linking between objects (compared to IUCLID5)
– Implementation
- Java classes generated from the XML schema (via JAXB)
- AMBIT code converts the generated classes to the internal data model so that they can be stored in the database
- Use existing code for writing into the database
- And existing UI to show the data
– Transparent from the user's point of view: select an .i6z or .i5z file
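The "select .i6z or .i5z" step implies some dispatch on the uploaded file type. A minimal sketch of such a dispatch follows, assuming only that the file extension identifies the IUCLID version; the function name and return labels are hypothetical and are not AMBIT's actual code.

```python
from pathlib import Path

# Hypothetical mapping from file extension to the IUCLID version it implies.
IUCLID_FORMATS = {".i5z": "IUCLID5", ".i6z": "IUCLID6"}


def detect_iuclid_format(filename: str) -> str:
    """Return the IUCLID version implied by the file extension (case-insensitive)."""
    suffix = Path(filename).suffix.lower()
    try:
        return IUCLID_FORMATS[suffix]
    except KeyError:
        raise ValueError(f"not an IUCLID archive: {filename!r}") from None


print(detect_iuclid_format("dossier.i6z"))
```

An importer could use the returned label to pick the matching parser while the rest of the upload flow stays the same for both versions.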
Spreadsheets for substance data import
configurable parser for spreadsheet data templates
EFSA OpenFoodTox data: https://www.zenodo.org/record/344883
• Excel files
• Not only chemical structures and data
• Relationships between structures
• Imported into the AMBIT database with the help of a JSON configuration
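To illustrate the idea of a JSON-configured spreadsheet import, the sketch below maps spreadsheet columns to named fields. The configuration keys (`skip_rows`, `columns`) and field names are invented for illustration and do not reproduce the actual AMBIT configuration schema.

```python
import json

# Hypothetical configuration: which column index holds which field.
# The key names are illustrative, not AMBIT's actual schema.
CONFIG_JSON = """
{
  "skip_rows": 1,
  "columns": {"substance_name": 0, "cas": 1, "endpoint": 2, "value": 3, "unit": 4}
}
"""


def parse_rows(rows, config):
    """Turn raw spreadsheet rows into dictionaries using the column mapping."""
    records = []
    for row in rows[config["skip_rows"]:]:
        record = {field: row[index] for field, index in config["columns"].items()}
        records.append(record)
    return records


# Rows as they might be read from a worksheet (header row first).
rows = [
    ["Name", "CAS", "Endpoint", "Value", "Unit"],
    ["Benzene", "71-43-2", "Melting point", 5.5, "°C"],
]
records = parse_rows(rows, json.loads(CONFIG_JSON))
print(records[0]["substance_name"], records[0]["value"], records[0]["unit"])
```

Keeping the layout in a configuration file means a new spreadsheet template needs a new JSON file rather than new parser code.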
Search substances by endpoint data
• Check one or more checkboxes and click Update results.
• The selected endpoints are combined with AND.
• In the example shown, only two substances have data for all three selected endpoints (Appearance, Melting point and Dissociation constant), although there are 16 substances with appearance data, 36 with melting point values and 15 with dissociation constant data.
• Endpoints are grouped in four categories: P-Chem, Env Fate, Eco Tox, Tox.
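The AND combination of endpoints amounts to a set intersection, which is why the combined result can be much smaller than any single endpoint's count. A small sketch with made-up substance identifiers (not data from the AMBIT database):

```python
def substances_with_all(endpoint_index, selected):
    """Return substances that have data for every selected endpoint (logical AND)."""
    sets = [endpoint_index[name] for name in selected]
    return set.intersection(*sets) if sets else set()


# Made-up substance identifiers, for illustration only.
endpoint_index = {
    "Appearance": {"S1", "S2", "S3"},
    "Melting point": {"S2", "S3", "S4"},
    "Dissociation constant": {"S2", "S3", "S5"},
}
hits = substances_with_all(
    endpoint_index, ["Appearance", "Melting point", "Dissociation constant"]
)
print(sorted(hits))
```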
Free text search (experimental)
AMBIT Search for Structures & Endpoint data
1) Find Structure(s)
2) Find Substance(s)
3) Display data
Combining information from other data sources and prediction results
The vertical sidebar allows collating data and model information with the search results.
Structure Diagram Editor
• Click to show/hide the editor.
• The structure editor is JavaScript based.
• To use the drawn structure for search, click the Use button.
• To show the structure specified as SMILES in the search bar, click the Draw button.
Substructure search
The substructure search query can be defined by drawing the
structure, selecting a SMARTS from the predefined list of
SMARTS, or entering a SMARTS, SMILES or chemical name in
the text box
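The same kind of query could also be issued programmatically through the REST API. The sketch below only builds the request URL and sends nothing. The application root is inferred from the VEGA link given elsewhere in this deck, and the `/query/smarts` path and `search` parameter are assumptions to be checked against the API documentation of the installed AMBIT instance.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumption: application root inferred from the VEGA link in this deck.
BASE_URL = "https://ambitlri.ideaconsult.net/tool2"
# Assumption: path of the substructure search resource and its query parameter.
SMARTS_PATH = "/query/smarts"


def substructure_query_url(query: str, base_url: str = BASE_URL) -> str:
    """Build a GET URL for a substructure search; the query is percent-encoded."""
    return f"{base_url}{SMARTS_PATH}?{urlencode({'search': query})}"


url = substructure_query_url("[NX3;H2][CX4]")
# Round trip: decoding the URL gives back the original SMARTS.
decoded = parse_qs(urlsplit(url).query)["search"][0]
print(url)
print(decoded)
```

Percent-encoding matters here because SMARTS uses characters such as `[`, `]`, `;` and `#` that are not safe inside a URL query string.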
Substance tab
• Use the folder icon to open the details.
• The Substances tab shows the substances related to the chemical structure, and the role of the chemical structure (last column, e.g. Constituent, Impurity, Additive).
Enabling Structure Search: Structure Standardization
[Diagram: standardisation workflow, illustrated with a sodium salt as input structure]
– Conversion to implicit hydrogens
– Keep the largest fragment
– Kekulisation
– (i) Molecule neutralization, (ii) custom reaction transformations
– Isotopes cleanup
– Structure conversion to a canonic tautomer
Output: SMILES, InChI
N=CCCCC(=O)O
InChI=1/C5H9NO2/c6-4-2-1-3-5(7)8/h4,6H,1-3H2,(H,7,8)
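As an illustration of one standardisation step, "keep the largest fragment", the sketch below works directly on a SMILES string. It is a crude string-level approximation that counts atom tokens with a regular expression. It is not how AMBIT implements the step (AMBIT operates on parsed molecule objects), and the input SMILES is an illustrative salt chosen for the example.

```python
import re

# Matches bracket atoms such as [Na+] or [O-], then two-letter halogens,
# then the remaining organic-subset atoms (aliphatic and aromatic).
ATOM_PATTERN = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]")


def atom_count(fragment: str) -> int:
    """Approximate number of atoms written explicitly in a SMILES fragment."""
    return len(ATOM_PATTERN.findall(fragment))


def keep_largest_fragment(smiles: str) -> str:
    """Split on '.' (disconnected components) and keep the one with most atoms."""
    return max(smiles.split("."), key=atom_count)


print(keep_largest_fragment("NC=CCCC(=O)[O-].[Na+]"))
```

After this step the counter-ion is gone, and the remaining anion would still need the neutralisation step listed in the workflow.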
Canonic tautomer generation (a component of the standardisation procedure)
[Diagram: input structure, generation of all tautomers (rule instance search), ranking, canonical tautomer]
Ranking of the generated tautomers:
0.0 C(C=CN)C=C(O)O
-0.1 C(C=CN)CC(=O)O
-0.05 C(CC=N)C=C(O)O
-0.15 C(CC=N)CC(=O)O
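Selecting the canonical tautomer from a ranked list can be sketched as follows, using the four scored SMILES from the slide. The sketch assumes that the lowest score marks the preferred form. That reading is consistent with the standardisation example in this deck, whose output N=CCCCC(=O)O is the same structure as the entry scored -0.15, but it is not stated explicitly on the slide.

```python
# Scores and SMILES as read from the slide.
ranked_tautomers = [
    (0.0, "C(C=CN)C=C(O)O"),
    (-0.1, "C(C=CN)CC(=O)O"),
    (-0.05, "C(CC=N)C=C(O)O"),
    (-0.15, "C(CC=N)CC(=O)O"),
]


def canonical_tautomer(ranked):
    """Pick the tautomer with the lowest score (assumed to be the preferred one)."""
    score, smiles = min(ranked, key=lambda pair: pair[0])
    return smiles


print(canonical_tautomer(ranked_tautomers))
```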
• Automatic generation of all tautomeric forms of a given organic compound.
• Customizable rules for tautomeric transformations.
• The predefined knowledge base covers 1–3, 1–5 and 1–7 proton tautomeric shifts. Typical supported tautomerism rules are keto-enol, imin-amin, nitroso-oxime, azo-hydrazone, thioketo-thioenol, thionitroso-thiooxime, amidine-imidine, diazoamino-diazoamino, thioamide-iminothiol and nitrosamine-diazohydroxide.
• Simple energy based system for tautomer ranking implemented by a set of
empirically derived rules.
AMBIT TAUTOMER
[Diagram: AMBIT-Tautomer workflow]
– Input structure, e.g. OC(O)=C(N)C
– Rule selection and flag settings
– Generation of tautomeric forms:
- Combinatorial method
- Combinatorial method improved
- Incremental method (IA-DFS)
– Recursion
– Post-generation filtering (structure is removed)
– Ranking
– Canonical
– Result
Structure transformation: AMBIT SMARTS/SMIRKS
(1) Efficient representation of SMARTS queries (full Daylight syntax)
(2) Fast structure isomorphism (mapping)
(3) Support of recursive SMARTS and stereo
(4) Syntax extensions
(5) Parsing of SMIRKS
(6) Transformation of the target chemical objects
Transformation modes:
(1) single
(2) non-overlapping,
(3) non-identical,
(4) non-homomorphic or
(5) externally specified list of sites.
Recursive expressions explicitly define
the environment around S atom.
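The "non-overlapping" transformation mode listed above can be pictured as filtering the reactant-pattern matches so that no atom is transformed twice. The sketch below uses a greedy pass in match order over tuples of atom indices. This is one plausible reading of the mode name, not a description of AMBIT's actual selection strategy, and the match data are invented.

```python
def non_overlapping(matches):
    """Greedily keep matches that share no atom index with an already kept match."""
    used = set()
    kept = []
    for match in matches:
        if used.isdisjoint(match):
            kept.append(match)
            used.update(match)
    return kept


# Invented matches: each tuple lists the atom indices hit by the pattern.
matches = [(0, 1, 2), (2, 3, 4), (5, 6, 7), (6, 7, 8)]
print(non_overlapping(matches))
```

Under the same reading, the "single" mode would apply the transformation to one match only, for example `matches[:1]`.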
Structure standardization in large datasets
– Flexible standardisation workflow
- The rules are synchronised with pharma companies
– Datasets standardised with AMBIT (H2020 FET ExCAPE project)
- PubChem, ChEMBL, eMolecules, SureChem, ZINC, tox datasets (> 80 million compounds)
- http://ambit.sf.net/ambitcli_standardisation.html
- ExCAPE DB (1 million compounds, 70 million SAR data points), AMBIT-hosted, open access
- Possible future integration with LRI AMBIT
Communications with other systems
[Diagram: LRI AMBIT, supporting read across and category formation, exchanges data with other systems]
– Company IUCLID DB and ECHA IUCLID DB as major data sources
- Transfer of 14570 dossiers
- Transfer via web service or *.i6z files
– Other tools (data transfer)
– Other databases (data transfer)
AMBIT Web API / UI for data analysis
[Screenshots: Dataset, Models, Visualisation]
Descriptor calculation, feature selection;
Classification and regression algorithms;
Rule based algorithms;
Applicability domain algorithms;
Visualization, similarity and substructure queries;
Composite algorithms (workflows);
Structure optimization (MOPAC), metabolite generation, tautomer generation, etc.
Integration with external tools
https://www.vegahub.eu
Command line Java application provided by IRFMN
Integration with external tools: VEGA
https://ambitlri.ideaconsult.net/tool2/ui/vega
– REST model wrapper
– Same API as other models (e.g. Toxtree)
– Same user interface
– Predictions automatically stored
– Straightforward integration with read across matrix
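Because wrapped models share one API, a client can treat a VEGA model and a Toxtree model identically. The sketch below only constructs the HTTP requests and sends nothing. It assumes the OpenTox-style convention of POSTing a `dataset_uri` form parameter to a model URI; the model and dataset identifiers are hypothetical placeholders, and the application root is inferred from the VEGA link above.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: application root inferred from the VEGA link in this deck.
BASE_URL = "https://ambitlri.ideaconsult.net/tool2"


def build_prediction_request(model_uri: str, dataset_uri: str) -> Request:
    """Prepare a POST that asks a model to predict for every entry of a dataset."""
    # Assumption: OpenTox-style form parameter name.
    body = urlencode({"dataset_uri": dataset_uri}).encode("ascii")
    return Request(
        model_uri,
        data=body,
        method="POST",
        headers={"Accept": "application/json"},
    )


# Hypothetical identifiers: the same call shape regardless of the model behind it.
dataset = f"{BASE_URL}/dataset/1"
prepared = [
    build_prediction_request(f"{BASE_URL}/model/1", dataset),
    build_prediction_request(f"{BASE_URL}/model/2", dataset),
]
for request in prepared:
    print(request.get_method(), request.full_url)
```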
AMBIT users management
The authorization is role based.
• Default roles: user, data manager, admin, read-across
• Roles can be assigned on the users page by an admin user
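The rule that only an admin user assigns roles can be expressed as a simple check. The sketch below is a hypothetical illustration of that rule using the four default role names from the slide; it is not the AMBIT user management code.

```python
# Default role names as listed on the slide.
DEFAULT_ROLES = {"user", "data manager", "admin", "read-across"}


def assign_role(acting_user_roles, target_user_roles, role):
    """Return the target user's new role set; only an admin may assign roles."""
    if "admin" not in acting_user_roles:
        raise PermissionError("only an admin user may assign roles")
    if role not in DEFAULT_ROLES:
        raise ValueError(f"unknown role: {role!r}")
    return set(target_user_roles) | {role}


print(sorted(assign_role({"user", "admin"}, {"user"}, "read-across")))
```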
Restricted access to assessments
The read across workflow: integrated view of data and predictions
AMBIT publications and contributing projects
Peer reviewed publications (excerpt)
1. J. Sun, N. Jeliazkova, et al, ExCAPE-DB: an integrated large scale dataset facilitating Big Data analysis in chemogenomics, J. Cheminform., vol. 9, no. 1, p. 17, Mar. 2017.
2. N. Jeliazkova, et al, The eNanoMapper database for nanomaterial safety information, Beilstein J. Nanotechnol., vol. 6, pp. 1609–1634, Jul. 2015.
3. N. Kochev, V. Paskaleva, and N. Jeliazkova, AMBIT-Tautomer: An open source tool for tautomer generation, Mol. Inform., vol. 32, pp. 1–24, 2013.
4. N. Jeliazkova and V. Jeliazkov, AMBIT RESTful web services: an implementation of the OpenTox application programming interface, J. Cheminform., vol. 3, no. 1, p. 18, Jan. 2011.
5. N. Jeliazkova, J. Jaworska, and A. Worth, Open Source Tools for Read-Across and Category Formation, in In Silico Toxicology, M. Cronin and J. Madden, Eds. Cambridge: Royal Society of Chemistry, 2010, pp. 408–445.
CEFIC LRI EEM9.3
P&G (J.Jaworska), Nina Jeliazkova
CEFIC LRI EEM9.3-IC, EEM9.4 (ongoing):
IdeaConsult Ltd., UM, Clariant
Projects contributed to the development
EC FP7 OpenTox (2008-2011)
EC FP7 ToxBank (2011-2015)
EC FP7 eNanoMapper (2014-2017)
EC H2020 ExCAPE (2015-2018)
(and more)
Open source libraries
The Chemistry Development Kit
(and many more)
AMBIT2 can be downloaded or consulted online:
Publicly available: https://ambitlri.ideaconsult.net
- Clients only need a web browser
More information and download links
- http://cefic-lri.org/news/cefic-launches-ambit-chemical-safety-prediction-software/
Installation options
LOCAL on a LAPTOP/DESKTOP
- Local database, local webserver
SERVER (on company INTRANET)
- Shared database and web server. Clients only need a web browser.
Requirements
- Java 7, MySQL 5.7, Web server (servlet container, e.g. Apache Tomcat 7.x)
Technical support: contact Ideaconsult Ltd, Sofia, www.ideaconsult.net, email: [email protected]
Acknowledgements
CEFIC LRI EEM9.3-IC/EEM9.4
o Bruno Hubesch
Project idea for LRI EEM9.3-IC
o Volker Koch, Clariant
Project input:
Clariant CompTox Team
o Udo Jensch (Toxicologist)
o Volker Koch (Ecotoxicologist)
o Qiang Li (Toxicologist)
o Joachim Schneider-Reigl (Ecotoxicologist)
Project implementation
Ideaconsult Ltd. www.ideaconsult.net