Instant JChem: enabling new ways of working with data and access to new data to work with
Dana Vanderwall, Bristol-Myers Squibb
Research Information Technology & Automation
Chemaxon US UGM, Sept 2014
Initial State in Chemistry Analytics
Diagram: data pulled from the CDR and SI Forms and assembled by hand into a master spreadsheet
• Sources: CDR, SI Forms, additional data, compound structures & IDs
• Knowledge: annotation, folks-onomies
• Assembly: export, manual copy & paste, typing
• Master spreadsheet (Excel, Word), with Excel used for visualization, e.g. a scatter plot of HPLC log P vs. rat Vds (fit: y = 0.0344x + 3.886, R² = 0.2737) and columns such as Rat Pct Bound
• Additional chemical structure analyses:
  – SAR R-group analysis
  – Clustering (CADD and in-house solutions)
  – Predictive models (hERG, solubility, permeability; FACT)
The DARE Project (Data & Analytics for Research)
Simplify
• Replace legacy app/workflow… with integrated tools for analytics
• Decrease the number of stand-alone docs/reports
• Put any needed calculations & predicted properties where they’re needed
Modernize
• A new product that maintains the functionality of form view…
• Plus a richer set of views, tables, in-place conditional formatting, graphs, & more chemistry functionality
• Learn by doing; established a base camp in the 1st year, then ramped up
Phased approach to development & migration
• Gradually phasing in IJC over 2013-2014
DARE technology map
• User interface
  – Drill down: web service for concentration-response curves & secondary results
  – SOLR index for text queries (IBM Patent DB only)
  – Data Alerts
  – Annotations
• Data Marts: new data layer for access & integration
  – Data common to most DWGs: Lead Evaluation (PAMPA, Cellular CYP Inh), Profiling (Enzymes, MetStab, CYP Induction)
  – Data unique to a DWG: DWG A Enzyme, DWG A Cellular, DWG A Selectivity; DWG B Receptor, DWG B Cellular, DWG B Selectivity
• Informatica; web services
• Chemical structures, properties, calculated fields; metadata annotation
• Sources: Central Data Repository (CDR), Operational Screening, BioBook
DARE Timelines (H1-2012, H2-2012, H1-2013, H2-2013, H1-2014, H2-2014)
• Phase 1: Informatica & ChemAxon set-up, prototyping and build
• Phase 2: GUI & Datamart – prototyping and build
• Phase 3: GUI & Mart – deployment
• Phase 4: build; GUI & Mart – deployment; decommissions
• Deliverables: 5 DWGs deployed, then ~20 DWGs deployed, then ~40 DWGs deployed
Start with the basics & build up
Foundation
• Program-specific forms and use cases
• Universal forms (profiling platforms or compilations of commonly used data)
Extended use cases
• Use cases requiring bespoke data structures, scripting, or visualization
• Unique data sources, combinations of data, all biological data
Extended functionality
• Hooks into internal web services: drill down for curves/secondary data
• Query to SOLR index
• Data Alerts
Datamart Infrastructure
General
• ETL primarily from the CDR, plus some additional sources
• Provides an environment to create tables & other data structures for IJC
• Tables in IJC are not enormously popular with users
• Users are more comfortable with, and better oriented by, data in text boxes fixed in position on a form
‘Cell Factory’
• Cell = entity in Oracle that effectively provides the data for one assay; CDR queries sometimes require a complex set of conditions
• Captures the metadata associated with cell creation, keeps cells unique, etc.
Incremental updates
• Via Informatica PowerCenter, 15-30 min incremental updates
• Gentle failure in the face of long-running jobs
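The incremental-update idea above can be illustrated with a watermark-based refresh. This is a minimal sketch in plain Python, not how Informatica PowerCenter is actually configured. The `IncrementalRefresher` class and the row layout (`id`, `modified`) are hypothetical. The skip-if-busy rule is only one assumed reading of "gentle failure".

```python
import threading
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class IncrementalRefresher:
    """Hypothetical watermark-based refresh: copy only rows changed since the last run."""
    watermark: datetime = datetime.min
    _lock: threading.Lock = field(default_factory=threading.Lock, repr=False)

    def refresh(self, source_rows, mart):
        # Assumed "gentle failure": if a previous long-running refresh still holds
        # the lock, skip this cycle and let the next scheduled run pick up the changes.
        if not self._lock.acquire(blocking=False):
            return 0
        try:
            changed = [r for r in source_rows if r["modified"] > self.watermark]
            for row in changed:
                mart[row["id"]] = row  # upsert keyed on the (hypothetical) row ID
            if changed:
                self.watermark = max(r["modified"] for r in changed)
            return len(changed)
        finally:
            self._lock.release()


# Example: two scheduled runs, e.g. 15-30 min apart.
rows = [
    {"id": 1, "modified": datetime(2014, 9, 1, 10, 0), "value": 5.2},
    {"id": 2, "modified": datetime(2014, 9, 1, 10, 20), "value": 7.1},
]
mart = {}
refresher = IncrementalRefresher()
first = refresher.refresh(rows, mart)   # both rows are new
second = refresher.refresh(rows, mart)  # nothing changed since the watermark
```

Skipping a busy cycle keeps a slow job from piling up overlapping runs. The watermark still guarantees that the next successful run picks up everything that changed in the meantime.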
Data management v1
• BA catalogs the data required for a new project team
• Passes it to a DB developer to define the new ETL
• Manually:
  – New tables/entities promoted to IJC
  – New data tree created
  – Build form
  – Add new edges, cells/columns to form
Flow: CDR → ETL → DARE data mart → IJC schema → IJC forms
• Manual coding/scripting is the rate-determining step
• DB development is not self-documenting
Automated data management
Flow: User → Metadata UI → Metadata Repository (creates cell/table definition) → ETL from the CDR into the DARE data mart → Promote Queue → Auto Promotion → IJC schema → IJC forms
Metadata UI
• UI to search/define/create cells, tables, calc. fields
• Consumes metadata & creates the metadata ‘cell’ definition
Auto Promotion
• Promotes the new table / new fields into IJC
• If it’s a new entity, then:
  o Creates a new data tree using a data tree template
  o Adds the new table to the new data tree
  o Creates a new form on the new data tree
• Creates edges
Data mart
• Creates tables & columns immediately upon cell ‘activation’
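The activation-and-promotion flow above can be sketched as metadata-driven DDL plus a queue. This is a minimal sketch under several assumptions. SQLite stands in for the Oracle data mart. `CellDefinition`, `activate`, `auto_promote` and the table name `dwg_a_enzyme_ic50` are all hypothetical. The "data tree" and "form" entries are plain dictionary placeholders, not the IJC API.

```python
import sqlite3
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class CellDefinition:
    """Hypothetical metadata record for one assay 'cell'."""
    table: str
    columns: tuple  # ((column_name, sql_type), ...)


def activate(conn, registry, promote_queue, cell):
    """Create the mart table from metadata and queue it for promotion into IJC."""
    if cell.table in registry:
        raise ValueError(f"cell {cell.table!r} already exists")  # keep cells unique
    # Identifiers come from curated metadata; a real system should still validate them.
    col_sql = ", ".join(f"{name} {sql_type}" for name, sql_type in cell.columns)
    conn.execute(f"CREATE TABLE {cell.table} ({col_sql})")
    registry[cell.table] = cell
    promote_queue.append(cell.table)


def auto_promote(promote_queue, ijc_model):
    """Drain the queue; a new entity gets its own data tree and form (placeholder names)."""
    while promote_queue:
        table = promote_queue.popleft()
        ijc_model.setdefault(table, {"data_tree": f"{table}_tree", "form": f"{table}_form"})


conn = sqlite3.connect(":memory:")
registry, queue, ijc_model = {}, deque(), {}
cell = CellDefinition("dwg_a_enzyme_ic50",
                      (("compound_id", "TEXT"), ("lot_id", "TEXT"), ("ic50_nm", "REAL")))
activate(conn, registry, queue, cell)
auto_promote(queue, ijc_model)
```

Because the DDL is generated from the stored definition, the metadata repository also documents the schema. That addresses the "not self-documenting" problem of the manual v1 process.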
Scale
Instant JChem
• 1455 forms + grids
• 288 saved queries
• 474 saved lists
• 10 scripts
Data
• 8 schemas
• 211 data trees
• 526 ‘entities’
• 2400 assays
• 41,571 ‘fields’
Traffic
• 631 users (to date)
• 1000-2000 db connections daily
The flexibility of [datamart + IJC] has enabled solutions well beyond the standard ‘program’ form
Data sources and applications built on IJC + datamarts: IBM Patent DB (external data source), HT Metabolite, Mutagenesis DB, Chiral Alliance data, Chiral Separations, Drug Safety Warehouse
• Novel & multiple data structures & presentations
• Visualizations; integration of custom scripts & calculations
• Data access: integration of active & historical BMS data
• Integration of BMS data not in the CDR
High-Throughput Mutagenesis: SAR, but different
• Lead Evaluation, Applied Genomics, Research IT & Automation, and Computer-Aided Drug Design designed & built a cloning and screening platform
• >150 mutants, testing >30 compounds
A Different Data Scale
For each of 30 compounds, compare WT to 150 mutants
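The numbers on this slide show the scale difference. Each compound is compared against WT for every mutant. For a single result type this gives:

$$30 \text{ compounds} \times 150 \text{ mutants} = 4500 \text{ mutant-vs-WT comparisons}$$

The total grows by the same amount for each additional result type.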
Endpoint variation over mutants by compound
Datamart
• Mutagenesis datamart created from data in 2 operational data sources
• ’Mart generation automated & refreshed as new data become available
• Datamart structure heavily augmented based on the needs of Instant JChem (IJC)
• Uses IJC’s flexible entity relationship model & charting functions to aid data visualization
All compounds per endpoint: variation over mutants
• Offered a summary bird’s-eye view of all compounds by each individual result type (EC50, WTRATIO, KBWTRATIO, etc.) to identify trends
• Compound as column header: a novel pivot
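The "compound as column header" pivot can be sketched in a few lines. This is a plain-Python illustration, not the IJC implementation. The `(compound_id, mutant_id, result_type, value)` record layout and the IDs `CPD-1` and `M001` are hypothetical. For one result type, the output has one row per mutant and one column per compound, which would be 150 rows × 30 columns at the scale described above.

```python
from collections import defaultdict


def pivot_by_compound(records, result_type):
    """Pivot long-format results so that compounds become column headers.

    records: iterable of (compound_id, mutant_id, result_type, value) tuples
    (a hypothetical layout). Returns (header, rows) with one row per mutant;
    missing measurements are None.
    """
    table = defaultdict(dict)
    compounds = set()
    for compound, mutant, rtype, value in records:
        if rtype == result_type:
            table[mutant][compound] = value
            compounds.add(compound)
    columns = sorted(compounds)
    rows = [[mutant] + [table[mutant].get(c) for c in columns] for mutant in sorted(table)]
    return ["mutant"] + columns, rows


records = [
    ("CPD-1", "WT", "EC50", 12.0),
    ("CPD-1", "M001", "EC50", 30.0),
    ("CPD-2", "WT", "EC50", 5.0),
    ("CPD-2", "M001", "WTRATIO", 2.0),
]
header, rows = pivot_by_compound(records, "EC50")
```

Calling the function once per result type (EC50, WTRATIO, KBWTRATIO, etc.) produces one summary view per endpoint.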
Shift workload from querying & discovering to alerting & reporting
• Define what the teams want to monitor
• Automate the delivery of new data packages
Base case: go find the data & construct the analysis
Open SI Forms → open form query → select data → export data → import data → table or visualization
“Is my data there yet?”
New capability: push data alerts
User data alert parameters → query data source in IJC (datamart) → automated email to the user when new data arrive, with a spreadsheet & a link to open the project-form-list in IJC → Instant JChem with new data
Alert manager (internal GWT), 2-way integration with IJC
• Grab the active data tree ID and bring it to the alert tool
• Take all the assays under the data tree as the selection source for the data alert
• Store this information and match it against DMART
• Create a hit list using compound ID/lot ID as a ‘permanent list’
• Send the link to subscribers
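The matching steps above can be sketched as a single function. This is a minimal illustration under stated assumptions, not the internal GWT alert manager. `Subscription` and `run_alert` are hypothetical, as are the row fields (`assay`, `compound_id`, `lot_id`, `loaded_at`). The `ijc.example.com` link is a placeholder, not the real IJC URL scheme.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Subscription:
    """Hypothetical alert subscription captured from the active IJC data tree."""
    data_tree_id: int
    assays: frozenset   # all assays under the data tree
    subscribers: tuple  # e-mail addresses (illustrative)
    last_checked: datetime


def run_alert(sub, mart_rows, now):
    """Match newly loaded datamart rows against the subscription.

    Returns (hit_list, notifications): unique (compound_id, lot_id) pairs in
    first-seen order, and one (address, link) pair per subscriber.
    """
    hits, seen = [], set()
    for row in mart_rows:
        if row["assay"] in sub.assays and sub.last_checked < row["loaded_at"] <= now:
            key = (row["compound_id"], row["lot_id"])
            if key not in seen:
                seen.add(key)
                hits.append(key)
    sub.last_checked = now
    if not hits:
        return [], []
    # Placeholder link; the real tool links to a saved permanent list in IJC.
    link = f"https://ijc.example.com/trees/{sub.data_tree_id}/lists/alert-{now:%Y%m%d%H%M}"
    return hits, [(address, link) for address in sub.subscribers]


sub = Subscription(42, frozenset({"A1", "A2"}), ("chemist@example.com",), datetime(2014, 9, 1))
rows = [
    {"assay": "A1", "compound_id": "CPD-1", "lot_id": "L1", "loaded_at": datetime(2014, 9, 1, 10)},
    {"assay": "A2", "compound_id": "CPD-1", "lot_id": "L1", "loaded_at": datetime(2014, 9, 1, 11)},
    {"assay": "A3", "compound_id": "CPD-2", "lot_id": "L1", "loaded_at": datetime(2014, 9, 1, 12)},
]
hits, notes = run_alert(sub, rows, now=datetime(2014, 9, 2))
```

Keying hits on compound ID/lot ID collapses several new results for the same lot into one list entry. Advancing `last_checked` keeps the next run from re-sending the same data.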
What do the users think about all this?
• Change is never easy
• Sub-populations are attracted to new capabilities and adopt new tools and practices
• Others need more encouragement; stability is critical
• Maintaining the capabilities of the familiar and well-understood in the new environment is a prerequisite for complete migration
• We’re getting there
Legacy application usage vs. Instant JChem
[Chart: unique users per month, Mar to Jun 2014, DARE vs. SI Forms (y-axis 0 to 900), annotated “Announcement of SI Forms retirement”]
Reduced number of data sets exported for analysis
[Chart: number of data exports per month, Mar to Jun 2014 (y-axis 1900 to 2700)]
Monitor URL sharing in IJC
[Chart: launched URLs per month in 2014 (months 1 to 6), series Total, Form URLs, List URLs, Query URLs (y-axis 0 to 80)]
A moment for reflection
cause for dancing
• Conditional formatting!
• Grid view
• Query builder
• Query/browse performance!
• Tabbed panes
• URL sharing*
• Help from CXN!!
what we learned
• Train just in time
• Listen; listen some more
• STOP the presses if it’s not right; they’d rather wait
• Simple >> rich
• Provide a thread of continuity to lead through new tools
• Don’t disrupt the workflow, let it evolve
coaching
• More thorough regression testing
• Clearer release notes
• Login/start-up performance
• List query result retains original order
• Cleaner Excel export, keep structure orientation
• More conversations!
• Web services
• Plexus!
The DARE team
Acknowledgements (Core Team, Scientific Computing, Scientific Computing Services):
Heather Artman, Dong Li, Dawn Cohen, John Duncan, Mark Manfredi, Minimol Mathew, Ray Reichard, Padma Vellanki, Ramesh Durvasula, James Ewen, Lisa Johnson, Christa Musial, Matthias Nolte, Anusha Ramanathan, Thomas Curneal, Mike Beluch, Sangeet Khullar, David Vanderbrooke, Dana Vanderwall, Nelly Masias, Mahesh Nawade
BMS Internal
End user support
• Support email group
• User Community SharePoint
  – Training and reference, FAQs, external links, contact info
  – All reported issues and status [open, in progress, scheduled fix/improvement, resolved]
• Internal BMS User Group Meeting: 1-hr monthly session covering demonstration & training for 2-3 special topics or features
  Topics drawn from suggestions and requests for more info or training; topics covered to date:
  – IJC: Query Builder; Visualization; Sharing by URL; exporting; working list (pick list); R-group decomposition; Markush draw/search
  – JChem4XL: patent doc creation
  – IBM Patent Database; Metabolite Database
Assay metadata
• Describe assay protocol & conditions in a controlled vocabulary
• Protocols would have a minimum set of fields that must be populated before going into production
• Opportunity for business rules that guide protocol registration
• All downstream systems would use the same framework & metadata
• Propose adopting an established standard, aligning/collaborating with the NIH BARD Project & BioAssay Ontology (BAO)
• Requires process & roles for maintaining up-to-date dictionaries and governance
Example metadata fields:
• Biological description: target, gene name (look-up, and capture LocusLink), species, cell type
• Assay description: assay type, assay mode, detection method
• Results: result type, modifier, units, etc.
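The "minimum set of fields" and "business rules" ideas above can be sketched as a pre-production check. This is a minimal sketch only. The required field list is a hypothetical choice loosely based on the field groups on this slide. The vocabulary terms are illustrative placeholders, not actual BAO terms. `check_protocol` is a hypothetical name.

```python
# Hypothetical minimum field set, loosely following the biological description /
# assay description / results groupings on this slide.
REQUIRED_FIELDS = ("target", "gene_name", "species",
                   "assay_type", "assay_mode", "detection_method",
                   "result_type", "units")


def check_protocol(protocol, vocabulary):
    """Return a list of problems; an empty list means the protocol may go to production.

    vocabulary maps a field name to its set of allowed controlled terms.
    """
    problems = [f"missing: {name}" for name in REQUIRED_FIELDS
                if not str(protocol.get(name) or "").strip()]
    for name, allowed in vocabulary.items():
        value = protocol.get(name)
        if value and value not in allowed:
            problems.append(f"not in vocabulary: {name}={value}")
    return problems


# Illustrative terms only; a real dictionary might be drawn from BAO.
vocabulary = {"assay_mode": {"agonist", "antagonist", "inhibition"},
              "detection_method": {"fluorescence", "luminescence", "radiometric"}}
protocol = {"target": "Kinase X", "gene_name": "GENE1", "species": "human",
            "assay_type": "enzymatic", "assay_mode": "inhibition",
            "detection_method": "luminescence", "result_type": "IC50", "units": "nM"}
```

Returning every problem at once, rather than stopping at the first, lets the registration UI show all required corrections together.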
BAO scope and purpose
• BAO to describe assays and screening results
  – Defines relevant assays and result annotations
  – Provides controlled terminology
  – Formalizes knowledge of assays and screening results
  – Describes and formalizes screening campaigns, i.e. the relationship between assays in terms of their use
• BAO addresses problems with using data and facilitates
  – Leveraging existing data in discovery projects
  – Global analysis across diverse data sets
  – Integration of data from different resources
What do we need to describe assays?