Exploring SAR between Patents and PubChem

Date post: 11-May-2015
Upload: chris-southan
Description:
2012 ChemAxon users meeting
Transcript
Page 1: Exploring SAR between Patents and PubChem

[1]

Using Chemicalize.org with Other Open Resources to Extract SAR from Patents and Explore Intersects in PubChem

Christopher Southan

ChrisDS Consulting, Göteborg, Sweden,

Prepared for the ChemAxon UGM, May 2012, version 2nd May

Page 2: Exploring SAR between Patents and PubChem

[2]

Key Relationships in Patents and Papers

Document Assay Result Compound Target

MAQALPWLLLWMGAGVLPAHGTQHGIRLPLRSGLGGAPLGLRLPRETDEEPEEPGRRGSFVEMVDNLRGKSGQGYYVEMTVGSPPQTLNILVDTGSSNFAVGAAPHPFLHRYYQRQLSSTYRDLRKGVYVPYTQGKWEGELGTDLVSIPHGPNVTVRANIAAITESDKFFINGSNWEGILGLAYAEIARPDDSLEPFFDSLVKQTHVPNLFSLQLCGAGFPLNQSEVLASVGGSMIIGGIDHSLYTGSLWYTPIRREWYYEVIIVRVEINGQDLKMDCKEYNYDKSIVDSGTTNLRLPKKVFEAAVKSIKAASSTEKFPDGFWLGEQLVCWQAGTTPWNIFPVISLYLMGEVTNQSFRITILPQQYLRPVEDVATSQDDCYKFAISQSSTGTVMGAVIMEGFYVVFDRARKRIGFAVSACHVHDEFRTAAVEGPFVTLDMEDCGYNIPQTDESTLMTIAYVMAAICALFMLPLCLMVCQWRCLRCLRQQHDDFADDISLLK

2011 http://www.ncbi.nlm.nih.gov/pubmed/21569515

2010 http://www.citeulike.org/user/cdsouthan/article/8637426

2012 http://www.slideshare.net/cdsouthan/southan-bio-it2012patents

Discerning and mapping these relationships from documents is crucial and demanding

Chemicalize.org is a significant advance in open chemistry extraction

Page 3: Exploring SAR between Patents and PubChem

[3]

Practical Utilities

• Name-to-structure (n>s) for selected or batch conversions from patents, papers, abstracts, web pages and other sources
• Intersect different content at identity or similarity level
• Molecular properties and bulk download
• Extracted structures archived, searchable and sharable
• Similarity display of analogue series from a document
• Bulk upload to PubChem for intersects and triage
• Result display in JChem for Excel
• Can iterate with OPSIN for IUPAC fixes

Page 4: Exploring SAR between Patents and PubChem

[4]

Chemicalize.org Exploitation Challenges

• Specific retrieval of patent or other source (e.g. target recall)
• Working different sources (e.g. CiteXplore/espace/Scibite for retrieval, Google for cross-checks, WIPO for images and tables, Freepatentsonline for deeper queries)
• Eyeballing original documents for relevant sections
• Locating exemplified drug-relevant/lead-like structures with data links
• For many patents examples >> activity data links > potent structures
• Selecting best sources/family members for optimal IUPAC extraction quality (e.g. US pats and FPO)
• Filtering novel structures from common chemistry
• Need to be PubChem cognizant for effective triage
• For a variety of reasons some documents have low extraction rates
• Tricks and workarounds enhance exploitation

Page 5: Exploring SAR between Patents and PubChem

[5]

Target Recall: CiteXplore

• Title only "DPPIV": Medline = 37, Patents = 31
• Title + abstract "DPPIV": Medline = 402, Patents = 144
• Title + abstract "dipeptidyl peptidase": Medline = 4,838, Patents = 1,520
• Title + abstract "inhibitor": Medline = 772,053, Patents = 124,516
• Title + abstract "diabetes": Medline = 431,299, Patents = 36,792
• Title + abstract "DPPIV OR dipeptidyl peptidase AND inhibitor AND diabetes": Medline = 1,105, Patents = 604

CiteXplore is restricted to EBI patent abstracts, so you can get higher recall at full-text sources such as SureChemOpen, EPO/espace, WIPO and FPO (but you cannot search Medline in parallel there)

Page 6: Exploring SAR between Patents and PubChem

[6]

Target Alerts: SciBite

US2012040982 DPPIV

Boehringer Ingelheim, Feb 2012

Page 7: Exploring SAR between Patents and PubChem

[7]

Slicing and Dicing US2012040982 (I)

• Chemicalize converted 1,390 structures from the FreePatentsOnline (FPO) URL
• Of the 497 examples, 486 converted
• Need to scan the document and iterate with scroll bar to spot lead-like structures
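The example conversion rate follows directly from the two counts quoted above. A minimal sketch of that arithmetic; the function and variable names are illustrative only and are not part of Chemicalize:

```python
# Counts quoted on the slide for US2012040982 (FreePatentsOnline text).
EXAMPLES_IN_PATENT = 497
EXAMPLES_CONVERTED = 486


def conversion_rate(converted: int, total: int) -> float:
    """Return the fraction of example names that yielded a structure."""
    if total <= 0:
        raise ValueError("total must be positive")
    return converted / total


rate = conversion_rate(EXAMPLES_CONVERTED, EXAMPLES_IN_PATENT)
missed = EXAMPLES_IN_PATENT - EXAMPLES_CONVERTED
print(f"Example conversion rate: {rate:.1%} ({missed} names missed)")
```

The missed names are the candidates for a second pass with another name-to-structure tool.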

Page 8: Exploring SAR between Patents and PubChem

[8]

Slicing and Dicing US2012040982 (II)

• OPSIN picks up some of what chemicalize misses (e.g. 389 above) but not all
• OPSIN error reports may help fix a series for Chemicalize (e.g. 1 vs. L)
• Practically more important if that example has potent activity

Page 9: Exploring SAR between Patents and PubChem

[9]

Slicing and Dicing US2012040982 (III)

• Similarity display clearly picks out the lead-like analog series (top)
• Select via FPO text > example list only > Word > PDF > chemicalize upload > SDF download 486 structures (bottom)
• However, from the partial descriptions these may include prophetic examples
• Also download 28 claimed examples via PDF

Page 10: Exploring SAR between Patents and PubChem

[10]

Slicing and Dicing US2012040982 (IV)

• Can locate an SAR table with 11 point IC50s
• But only 9 examples are below 100 nM; example 25 is 56 nM
• The designation of series 1 and 2 obfuscates their example identity
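Once the IC50 values are transcribed from the table, picking out the potent examples is a simple filter. A minimal sketch, assuming the values are held in a dictionary keyed by example number; only the example 25 value comes from the slide, and the pIC50 conversion is a common convention added here, not something the slide uses:

```python
import math

# IC50 values in nM keyed by example number. Only example 25 (56 nM) is
# quoted on the slide; the remaining table rows would be added by hand.
ic50_nm = {"25": 56.0}

POTENCY_CUTOFF_NM = 100.0  # the "below 100 nM" cutoff from the slide


def pic50(ic50_nanomolar: float) -> float:
    """Convert an IC50 in nM to pIC50 = -log10(IC50 in mol/L)."""
    if ic50_nanomolar <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50_nanomolar * 1e-9)


# Keep only examples under the cutoff, reported as pIC50 to two decimals.
potent = {
    example: round(pic50(value), 2)
    for example, value in ic50_nm.items()
    if value < POTENCY_CUTOFF_NM
}
print(potent)
```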

Page 11: Exploring SAR between Patents and PubChem

[11]

PubChem Triage of Chemicalize Output (I)

• Example 25 SMILES > neither an exact match nor a tautomer, thus novel
• Repeat search at 95% Tanimoto > 289 neighbors > cluster
• Closest PubChem analog > ChemSpider > SureChem > Novo Nordisk DPPIV patent from 2005
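For readers unfamiliar with the 95% Tanimoto cutoff: the coefficient is the number of fingerprint bits two structures share divided by the number of bits set in either. A minimal sketch using plain Python sets as stand-in fingerprints; the bit positions and compound names are hypothetical, and this is not PubChem's own fingerprint or search code:

```python
def tanimoto(fp_a: set, fp_b: set) -> float:
    """Tanimoto coefficient: shared on-bits / on-bits set in either fingerprint."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def neighbours(query: set, library: dict, threshold: float = 0.95) -> list:
    """Return (name, score) pairs at or above the threshold, best first."""
    scored = ((name, tanimoto(query, fp)) for name, fp in library.items())
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Toy fingerprints (hypothetical bit positions, not real PubChem fingerprints).
query_fp = set(range(20))
toy_library = {
    "analogue_A": set(range(20)) | {20},  # 20 shared of 21 bits, about 0.952
    "analogue_B": set(range(10)),         # 10 shared of 20 bits, 0.5
}
print(neighbours(query_fp, toy_library))
```

Lowering `threshold` to 0.5 also returns `analogue_B`, which mirrors relaxing the search to reach more distant analogs.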

Page 12: Exploring SAR between Patents and PubChem

[12]

PubChem Triage of Chemicalize Output (II)

• Total extraction from US2012040982 > 1,390 SDFs > 1,387 uploaded > 7 “failed”
• 493 exact matches (= preexisting PubChem CIDs)
• 486 example-only SDFs > upload > 21 exact-match CIDs
• 34 claims-only give 9 exact-match CIDs, primary sources were:
  • 5 from ChEMBL from a Boehringer Ingelheim 2007 publication
  • 7 from Thomson Pharma
  • 2 from ChemSpider with SureChem links to Boehringer Ingelheim patents
• Thus 461 examples chemicalized from US2012040982 are “novel” structures
• However, cannot check enantiomeric or tautomeric inexact matches from PubChem interface (only for existing CIDs)

Page 13: Exploring SAR between Patents and PubChem

[13]

PubChem Triage of Chemicalize Output (III)

• Chemicalize examples-plus-claims US2012040982 = 29 CIDs (search 36 above)
• Thomson Pharma/DiscoveryGate intersect is ~ Derwent WPI (search 31)
• This matched 20 from the 29 (search 36), presumably DWPI extractions
• ChEMBL (7) matched 6 from 29 (i.e. extracted from papers)
• SLING matched 8 from 29 (i.e. extracted from EPO patents)
• It was thus possible to intersect the chemicalize extractions from this patent with four independent primary sources in PubChem from patents and publications
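These intersects are set operations on CID lists. A minimal sketch, assuming each source's CIDs have already been downloaded as a set of integers; the CIDs and source names below are hypothetical placeholders, not the real PubChem results:

```python
def intersect_report(extracted: set, sources: dict) -> dict:
    """Count extracted CIDs also held by each source, plus those in none."""
    report = {name: len(extracted & cids) for name, cids in sources.items()}
    covered = set().union(*sources.values()) & extracted
    report["unmatched"] = len(extracted - covered)
    return report


# Hypothetical CIDs standing in for the patent extraction and three depositors.
extracted_cids = {1, 2, 3, 4, 5, 6}
source_cids = {
    "source_A": {1, 2, 3, 9},
    "source_B": {3, 4},
    "source_C": {10},
}
print(intersect_report(extracted_cids, source_cids))
```

Per-source counts can sum to more than the number of matched CIDs, because one structure may have been deposited by several sources.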

Page 14: Exploring SAR between Patents and PubChem

[14]

Patent "Walking" from Chemicalize similarity results (I)

• The similarity results from one example gave 1,734 matches out to Tanimoto 0.5, extending "beyond" the example space of US2012040982.
• Scrolling through these shows that matches at Tanimoto 0.6 (shared substructures in blue) connect to a different, older DPPIV patent, US7772226, from Eisai

Page 15: Exploring SAR between Patents and PubChem

[15]

Patent "Walking" from Chemicalize similarity results (II)

• US7772226 from FPO converted 1,127 (i.e. more than the 992 from PatBase)
• 680 matched PubChem CIDs
• Example 228 CC#CCN1C(=NC2=C1C(=O)NC(OC1=CC=C(C=C1)C(=O)OC(=O)C(F)(F)F)=N2)N1CCNCC1 had a 12 nM IC50 for DPPIV
• Can even "walk" to a third DPPIV patent WO2007071738 from Novartis

Page 16: Exploring SAR between Patents and PubChem

[16]

Extracting from CiteXplore ChEMBL

• CiteXplore lists ChEMBL IUPACs and IDs

• Can chemicalize all ChEMBL structures from one paper

• Difficult to ID these in ChEMBL

• Upload 8 structures to PubChem

• 7 match ChEMBL IDs

• Only one matches the 29 from US2012040982

• Thus the paper probably draws on multiple patents

Page 17: Exploring SAR between Patents and PubChem

[17]

Mining PubMed Central Full-text Papers (I)

• Only a few examples converted directly
• So > WordPad > direct chemicalize (iterate) > web page (Google Sites)
• Download > Upload to JChem for Excel
• Add in IC50 values from paper

Page 18: Exploring SAR between Patents and PubChem

[18]

Mining PubMed Central Full-text Papers (II)

• Add the SAR data from the paper into the structure table

• These had no exact matches in PubChem

Page 19: Exploring SAR between Patents and PubChem

[19]

Chemicalizing the DrugBank Entry for DPPIV

41 conversions of inhibitors, many are PDB ligands

Page 20: Exploring SAR between Patents and PubChem

[20]

Can Even Extract Catalogues that have no SMILES or InChIs....

Tocris DPPIV inhibitor > chemicalize > PubChem > 6 analogs

Page 21: Exploring SAR between Patents and PubChem

[21]

Conclusions

• Chemicalize.org is powerful, flexible and free, as in beer....
• Significantly enables small-scale roll-your-own patent mining
• Ditto for journal article/abstract mining (e.g. for papers not captured in ChEMBL)
• You still need perspicacity to discern SAR details
• Complementary to commercial patent databases populated by manual extraction (e.g. you can extract more structures)
• Commercial automated patent extraction databases typically combine ChemAxon n>s with other algorithms (e.g. http://www.chemaxon.com/library/benchmarking-chemaxon%E2%80%99s-name-to-structure-batch-tool-on-patent-text/)
• While they thus out-perform chemicalize, it is still very useful for intersecting journal articles or other sources against any databases
• Significant novel content (w.r.t. public databases) is accumulating via "default crowdsourcing" in the chemicalize archive, which becomes an important cross-check source and can be "walked" between documents
• Combined with OPSIN and OSRA, structures from most sources are extractable
• Synergies with sources such as PubChem, PubMed Central, ChEMBL and SureChemOpen will advance academic drug discovery and chemical biology

Page 22: Exploring SAR between Patents and PubChem

[22]

Questions Welcome

ChrisDS Consulting: http://www.cdsouthan.info/Consult/CDS_cons.htm
Mobile: +46(0)702-530710
Skype: cdsouthan
Email: cdsouthan – at - hotmail.com
Twitter: http://twitter.com/#!/cdsouthan
Blog: http://cdsouthan.blogspot.com/ (includes postings on patent themes)
LinkedIn: http://www.linkedin.com/in/cdsouthan
Website: http://www.cdsouthan.info/CDS_prof.htm
Publications: http://www.citeulike.org/user/cdsouthan/publications/order/year
Citations: http://scholar.google.com/citations?user=y1DsHJ8AAAAJ&hl=en
Presentations: http://www.slideshare.net/cdsouthan

