
Patent Search: An important new test bed for IR

Patent Search: An important new test bed for IR. Presented at the 9th Dutch-Belgian Information Retrieval Workshop (DIR 2009), Enschede, The Netherlands. http://dir2009.cs.utwente.nl/
Page 1: Patent Search: An important new test bed for IR

Patent Search: An important new test bed for IR

J. Tait, M. Lupu (1)

H. Berger, G. Roda, M. Dittenbach, A. Pesenhofer (2)

E. Graf, K. van Rijsbergen (3)

(1) Information Retrieval Facility, Vienna, Austria

(2) Matrixware, Vienna, Austria

(3) University of Glasgow, Dept. of Computing Science, Glasgow, UK

DIR 2009 / Feb. 2-3, 2009

Page 2: Patent Search: An important new test bed for IR

Patent Search.

Patent search is a highly specialized form of information search. It is characterized by its:

target data

type of information needs

legal and economic implications

Page 3: Patent Search: An important new test bed for IR

Target data

Data for patent retrieval comes mainly from:

patent databases from patent authorities (EPO, USPTO, JPO, SIPO, WIPO, etc.)

scientific publications

prior art databases (IP.com)

A new acronym

SIPO: State Intellectual Property Office of the People’s Republic of China

Page 4: Patent Search: An important new test bed for IR

Target data

Characteristics of patent documents

multilingual and ‘legalese’

non-uniform formats

some are OCR’d

figures, images, chemical formulas, DNA sequences

include references to patent and non-patent literature

A new acronym

NPL: Non-Patent Literature

Page 5: Patent Search: An important new test bed for IR

Information Needs.

K.H. Atkinson, Towards a more rational patent search paradigm:

depending on what group is doing the asking, the types of patent search requested may include simple patentability, clearance to market a product, validity, opposition to a patent being sought by another, infringement watch, creating IP landscapes for business development or R&D, infringement defense, litigation, prosecution support, and creation of portfolios for assignments, investments, mergers and acquisitions [ . . . ]

Page 6: Patent Search: An important new test bed for IR

Legal and economic implications.

patents are legal documents

patent portfolios are assets for enterprises

a single patent search can be worth several days of work

High recall searches

Missing even a single relevant document can have severe financial and economic impact, for example when a granted patent is invalidated because of a document omitted at application time.
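To make the recall requirement concrete, the sketch below computes recall at a cutoff k for one ranked result list. It is a minimal illustration, not part of the original slides: the function name `recall_at_k` and the document identifiers are hypothetical.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top k results."""
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("recall is undefined without relevant documents")
    retrieved = set(ranked_ids[:k])
    return len(retrieved & relevant) / len(relevant)


# Hypothetical identifiers, for illustration only.
ranking = ["EP-A", "EP-B", "EP-C", "EP-D", "EP-E"]
prior_art = {"EP-B", "EP-E", "EP-Z"}

print(recall_at_k(ranking, prior_art, 3))
print(recall_at_k(ranking, prior_art, 5))
```

In this toy example "EP-Z" is never retrieved, so recall stays below 1 at every cutoff. In a high-recall setting, that one missed document is exactly the case that matters.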

Page 7: Patent Search: An important new test bed for IR

Introduction

Patent Search

A modern IR test bed

Promoting take up of research

Conclusion

We have characterized the patent search problem by describing its target data, types of information needs, legal and economic implications.

Next:

evaluating IR techniques in the patent domain

previous initiatives in the area of patent retrieval

the CLEF-IP and TREC-Chem initiatives

promoting take-up of research

Tait et al. Patent Search: An important new test bed for IR

Page 8: Patent Search: An important new test bed for IR

Test collections

Test collections in Information Retrieval play a pivotal role in the evaluation of retrieval models.

Domain-specific test collections already exist for:

Web pages

news stories

legal documents

blogs

genomics

patents

Page 9: Patent Search: An important new test bed for IR

Pioneering work in patent retrieval.

Patent retrieval task at the NTCIR Workshop (1) since 2001.

produced test collections primarily targeting Japanese patents

retrieval tasks

ad-hoc (goal: find patents on a given topic)

invalidity search (goal: find patents invalidating a given claim)

patent classification according to the F-term system

Two new acronyms

F-term (abbreviation of File-forming term) is the classification system used in Japan as a complement to IPC (International Patent Classification)

(1) http://research.nii.ac.jp/ntcir

Page 10: Patent Search: An important new test bed for IR

Evaluation tracks.

The IRF has engaged in two pilot evaluation tracks on patent retrieval

CLEF-IP: www.ir-facility.org/the_irf/clef-ip09-track

TREC-Chem: www.ir-facility.org/the_irf/trec_chem.htm

Page 11: Patent Search: An important new test bed for IR

CLEF-Intellectual Property Initiative.

CLEF-IP

coordinated by the IRF

part of the Cross-Language Evaluation Forum (2)

will focus on the task of prior art search

European patents as target data

automatic extraction of relevance assessments

Prior art search

Prior art search consists in identifying all information (including NPL) that might be relevant to a patent’s claim of novelty.

(2) http://www.clef-campaign.org

Page 12: Patent Search: An important new test bed for IR

Prior art search.

The most common type of patent search. Performed at various stages of the patent life-cycle and with different intentions:

before filing an application (novelty search or patentability search) to determine whether the invention fulfills the requirements of

novelty

inventive step

before grant: results go into a search report attached to the patent

invalidity search: post-grant search used to unveil prior art that invalidates a patent’s claims of originality

Page 13: Patent Search: An important new test bed for IR

Target data.

The CLEF-IP evaluation track will restrict target data to patents.

Target data:

comprising 16 years (filing date between 1985 and 2000) of EPO patents

1.9 million patent documents corresponding to 1 million patents

75 GB, in XML format

documents are in English, German, and French

Page 14: Patent Search: An important new test bed for IR

Automatic extraction of relevance assessments.

The data resulting from prior art searches is saved in the EPO or USPTO databases as:

citations in patent applications

citations in search report

citations in opposition’s legal files

The CLEF-IP track is going to extract this information (as much as possible) automatically in order to form a large set of topics.
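A minimal sketch of how citation records could be turned into relevance assessments. The record layout, the source names and the identifiers are assumptions made for illustration. They are not the actual CLEF-IP extraction pipeline or data format.

```python
from collections import defaultdict

# Hypothetical citation records: (topic patent, cited document, citation source).
citations = [
    ("EP-1", "EP-7", "application"),
    ("EP-1", "EP-9", "search_report"),
    ("EP-1", "EP-9", "opposition"),
    ("EP-2", "EP-7", "search_report"),
]


def build_qrels(records):
    """Map each topic patent to the set of documents cited as prior art."""
    qrels = defaultdict(set)
    for topic, cited, _source in records:
        # A document cited through several sources is still one judgment.
        qrels[topic].add(cited)
    return dict(qrels)


qrels = build_qrels(citations)
print(qrels)
```

Each patent with at least one citation becomes a topic, and its cited documents become the relevant set. No manual judging is involved, which is what allows a large topic set.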

Page 15: Patent Search: An important new test bed for IR

Prior art from opposition procedures.

Under European patent law, a granted patent may be opposed.

The opponent often provides new prior art that invalidates the invention’s claim of originality.

Patents cited in opposition procedures are highly relevant prior art documents.

They are the result of a very thorough invalidity search.

Page 16: Patent Search: An important new test bed for IR

Crowdsourcing extraction of relevance assessments.

Need to extract citations from documents arising from opposition procedures

These documents are only available as scanned images (3)

Will be using crowdsourcing for extracting these citations.

A new word from business jargon

Crowdsourcing.

(3) at http://www.epoline.org

Page 17: Patent Search: An important new test bed for IR

Relevance and evaluation measures.

Labels used in search reports:

label   means that the cited document is

X       relevant when taken alone

Y       relevant in combination with other documents

A       relevant but not prejudicial to novelty or inventive step

How to use these labels for defining new evaluation measures?
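One possible direction, sketched here purely as an illustration, is to treat the labels as graded relevance levels and plug them into a standard graded measure such as nDCG. The gain values (X = 3, Y = 2, A = 1) are an assumption of this sketch, not something the slides propose, and the document identifiers are hypothetical.

```python
import math

# Assumed gain per search-report label; these weights are illustrative only.
GAIN = {"X": 3, "Y": 2, "A": 1}


def dcg(gains):
    """Discounted cumulative gain of a list of gains in rank order."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(ranked_ids, labels, k):
    """nDCG@k, where `labels` maps a document id to its label X, Y or A."""
    gains = [GAIN.get(labels.get(doc_id), 0) for doc_id in ranked_ids[:k]]
    ideal = sorted((GAIN[label] for label in labels.values()), reverse=True)[:k]
    best = dcg(ideal)
    return dcg(gains) / best if best > 0 else 0.0


labels = {"d1": "X", "d2": "Y", "d3": "A"}
print(ndcg(["d1", "d2", "d3"], labels, 3))
print(ndcg(["d3", "d2", "d1"], labels, 3))
```

Under these assumed gains, ranking the X document first scores higher than ranking it last. Whether that ordering of labels matches how patent professionals value results is the open question the slide raises.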

Page 18: Patent Search: An important new test bed for IR

Challenges.

As a result of the CLEF-IP track we expect to obtain new insights on:

how to represent the information need expressed by a patent

query reformulation

evaluation metrics for patent retrieval

using machine translation for improving retrieval effectiveness
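As a rough illustration of the first two challenges, the sketch below derives a keyword query from the text of a patent claim by keeping its most frequent non-stopword terms. This is a deliberately naive baseline written for this transcript. The stopword list, the function name `top_terms` and the sample claim are all hypothetical, and this is not a method described by the authors.

```python
import re
from collections import Counter

# Tiny illustrative stopword list, including some patent boilerplate words.
STOPWORDS = {
    "a", "an", "the", "of", "and", "or", "to", "in", "for", "is",
    "said", "wherein", "with", "by", "on",
}


def top_terms(text, k=5):
    """Return the k most frequent content terms of `text` as a keyword query."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    return [term for term, _count in counts.most_common(k)]


# Hypothetical claim text, for illustration only.
claim = (
    "A rotor blade for a wind turbine, wherein the rotor blade "
    "comprises a heated blade edge."
)
print(top_terms(claim, 2))
```

Raw term frequency ignores how common a term is across the collection. A more realistic representation would weight terms against collection statistics and handle the three languages of the corpus.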

Page 19: Patent Search: An important new test bed for IR

TREC Chemistry track.

Ad-hoc search

Target data:

academic papers (Royal Society of Chemistry)

chemical patent documents (class C in the IPC)

Will use automatic extraction of citations for relevance assessments

Challenges:

chemical names and structures

chemical interactions, relations, transformations, properties

Page 20: Patent Search: An important new test bed for IR


The IRF is contributing to the creation of new patent test collections by organizing two tracks within the CLEF and TREC evaluation campaigns.

In addition to the TREC and CLEF contributions, the IRF, together with Matrixware, is promoting several initiatives aimed at facilitating and improving the patent retrieval process.


Page 21: Patent Search: An important new test bed for IR


Promoting take up of research

Next:

presentation of the IRF and Matrixware

promoting take up of research

the IRF symposium

the PaIR workshop

providing the tools

funding research in the area of patent retrieval


Page 22: Patent Search: An important new test bed for IR

IRF: the Information Retrieval Facility.

New international not-for-profit foundation, based in Vienna.

Its mission:

to bridge the gap between the needs of the industry and the academic know-how

to promote and facilitate research in large scale information retrieval

to maintain a facility that enables large scale information retrieval and in-depth data processing

Page 23: Patent Search: An important new test bed for IR

Matrixware.

Founded 2005 in Vienna

80 Employees

> 15 Academic Partners Worldwide

Implements solutions for access to patent information

Page 24: Patent Search: An important new test bed for IR

Promoting research.

Matrixware and the IRF have engaged in several initiatives aimed at promoting research and raising awareness in the area of patent retrieval.

the Information Retrieval Facility Symposium: an annual symposium held in Vienna to foster knowledge exchange between IR experts and IP professionals

the PaIR workshop: a workshop on Patent Information Retrieval hosted by the CIKM conference

Page 25: Patent Search: An important new test bed for IR

Providing the tools.

Successful IR research conventionally depends on three elements:

1. the availability of test collections

2. access to suitable software systems on which to run experiments

3. access to sufficiently powerful hardware

The IRF, supported by Matrixware, is providing all three of these.

Page 26: Patent Search: An important new test bed for IR

Current University Projects.

Accessibility of Information (Glasgow)

Large Scale Logical Retrieval (Glasgow)

Semantic Analysis of Patent Data (Sheffield and Nijmegen)

Language Modeling for Patent Retrieval (UMass Amherst)

OCR for patents (UMass Amherst)

Page 27: Patent Search: An important new test bed for IR

Concluding remarks

Patent retrieval is an interesting and important open challenge for IR researchers.

The IRF and Matrixware have engaged in several projects aimed at promoting research in this area.

Page 28: Patent Search: An important new test bed for IR


Invitation.

You are invited to:

join one of the evaluation tracks

CLEF-IP

TREC-Chem

participate in the PaIR workshop

participate in the Information Retrieval Facility Symposium


Page 29: Patent Search: An important new test bed for IR

Thank you for your attention.

