Opportunity and Rewards for Pathology Data Sharing
Association of Pathology Chairs
July 23, 2004
Mt. Tremblant, Quebec
Jules J. Berman, Ph.D., M.D.*
Program Director, Pathology Informatics
Cancer Diagnosis Program, NCI, NIH
email: bermanj@mail.nih.gov
*Opinions do not necessarily represent the policies/opinions of the U.S. federal government.
UFO Abductees
Lots of them
They often say about the same thing (independent confirmations)
All walks of life
Mostly honest and rational people
Minority are a little crazy
One problem: no evidence
Researchers who don’t publish their primary data
Lots of them
They often say about the same thing (independent confirmations)
All walks of life
Mostly honest and rational people
Minority are a little crazy
One problem: no evidence
Pathology research is data-intensive.
Example: A tissue microarray study can easily involve terabytes of data.
After your research data reaches a certain size, the data becomes the research, and the journal articles become tiny editorials that describe or interpret the data
Think of the relationship between the earth and the sun.
The sun is hundreds of thousands of times more massive than the earth. Consequently, it’s the earth that orbits the sun.
In data-intensive research, the manuscripts are tiny satellites of editorial comment orbiting a central large BLOB of data.
Data sharing requirements (from funding organizations):
NIH Statement on Data Sharing
http://grants.nih.gov/grants/guide/notice-files/NOT-OD-03-032.html
Data sharing requirements (from journals)
National Research Council UPSIDE (Uniform Principle for Sharing Integral Data and Materials Expeditiously)
http://books.nap.edu/books/0309088593/html/R1.html
Data sharing requirements (from Congress, in addition to FOIA):
On June 26, 2003, the "Public Access to Science Act" (H.R. 2613) was introduced in the House by Congressman Sabo.
Purpose: “to exclude from copyright protection works resulting from scientific research substantially funded by the Federal Government.”
Latest Major Action: 9/4/2003 Referred to House subcommittee. Status: Referred to the Subcommittee on Courts, the Internet, and Intellectual Property
Data Quality requirements (from federal government):
The Data Quality Act (DQA) was passed as part of the FY 2001 Consolidated Appropriations Act (Pub. L. No. 106-554, codified at 44 U.S.C. § 3516 note).
The DQA requires the Office of Management and Budget ("OMB") to develop government-wide standards for data quality in the form of guidelines, which OMB has done through a series of rulemakings.
Data Quality requirements (from courts):
In 1999, the Supreme Court of Pennsylvania carved a large exception out of the immunity doctrine for expert witnesses. In LLMD of Michigan, Inc. v. Jackson-Cross Co. (1999), the court held that a client could sue his own expert witness for negligence.
In 2002, the Supreme Court of Appeals of West Virginia took the issue further in Davis v. Wallace (2002), suggesting that an expert witness could be sued for negligence not only by his own client but also by the opposing party against whom the expert testifies.
NIH Funding for data sharing
Shared Pathology Informatics Network
http://grants.nih.gov/grants/guide/rfa-files/RFA-CA-01-006.html
Tools for collaborations that involve data sharing
http://grants1.nih.gov/grants/guide/pa-files/PAR-03-134.html
Infrastructure for data sharing and archiving
http://grants.nih.gov/grants/guide/rfa-files/RFA-HD-03-032.html
caBIG
http://cabig.nci.nih.gov/
How are we doing?
The Scientist
Volume 18 | Issue 3 | 47 | Feb. 16, 2004
Scientists Abandon their Software
Good biology programs abound in universities, but academia offers little incentive to keep them current
By Sam Jaffe
http://www.isse.gmu.edu/~adinh/wchap1.html
GAO investigation of fed-funded software projects
29% never delivered
47% never used
19% reworked or abandoned after delivery
3% needed modifications after delivery
2% could be used as delivered
For the IRS There's No EZ Fix
“By assembling a star-studded team of vendors, the IRS thought its $8 billion modernization project would manage itself.
The IRS thought wrong. Now the agency's ability to collect revenue, conduct audits and go after tax evaders has been severely compromised.”
BY ELANA VARON
Apr. 1, 2004 Issue of CIO Magazine
http://www.cio.com/archive/040104/irs.html
Not unusual for a large medical center to spend over $100 million on Information Technology
What use are we getting from all that data?
Do we even have the fundamental tools needed to share data?
1. Standard ways of obtaining medical research data (confidentiality methods)
2. Standard ways of organizing data (nomenclatures, taxonomies, ontologies, classifications, data structures)
3. Standard ways of exchanging and merging data
Ensuring confidentiality
Berman JJ. Zero-check: a zero-knowledge protocol for reconciling patient identities across institutions. Arch Pathol Lab Med. 2004 Mar;128(3):344-6.
Berman JJ. Racing to share pathology data. Am J Clin Pathol. 2004 Feb;121(2):169-71 (editorial).
Berman JJ. Concept-Match Medical Data Scrubbing: How pathology datasets can be used in research. Arch Pathol Lab Med. 2003 Jun;127(6):680-6.
Berman JJ. Threshold protocol for the exchange of confidential medical data. BMC Medical Research Methodology, 2002, 2:12.
Organizing data
Developmental Lineage Classification and Taxonomy of Neoplasms
Free, open access, soon to be merged into NCI Thesaurus
Comprehensive: 102,000+ terms (7+ megabytes)
Heritable class structure with a unique class location for each tumor
XML document that can be cross-annotated with molecular biology databases
Preserves current tumor names, while abandoning purely morphologic categories (e.g. epithelial/stromal)
Berman JJ. Tumor classification: molecular analysis meets Aristotle. BMC Cancer 2004 4:10, 17 March 2004
Article
http://www.biomedcentral.com/1471-2407/4/10
XML file (gzipped)
http://12.183.10.150/jjb/neoclxml.gz
Flat file (gzipped)
http://12.183.10.150/jjb/neoself.gz
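The "heritable class structure with a unique class location for each tumor" can be pictured as a single-parent tree: each tumor name maps to exactly one class and inherits every class above it. What follows is only a minimal sketch of that idea. The class names, the two tumor assignments, and the helpers `lineage` and `is_a` are illustrative assumptions, not entries or code taken from the published classification files.

```python
# Hypothetical miniature of a single-parent classification.
# Every class has exactly one parent, so each tumor has one class location.
PARENT = {
    "neoplasm": None,  # root of the tree
    "endoderm_ectoderm": "neoplasm",
    "mesoderm": "neoplasm",
    "surface_epithelium": "endoderm_ectoderm",
    "connective_tissue": "mesoderm",
}

# Tumor name -> its single class (illustrative assignments only).
TUMOR_CLASS = {
    "squamous cell carcinoma": "surface_epithelium",
    "fibrosarcoma": "connective_tissue",
}


def lineage(tumor):
    """Return the chain of classes from the tumor's own class up to the root."""
    chain = []
    cls = TUMOR_CLASS[tumor]
    while cls is not None:
        chain.append(cls)
        cls = PARENT[cls]
    return chain


def is_a(tumor, ancestor):
    """True if the tumor inherits membership in the ancestor class."""
    return ancestor in lineage(tumor)


print(lineage("fibrosarcoma"))
print(is_a("squamous cell carcinoma", "endoderm_ectoderm"))
```

Because each class has one parent, a query such as "all tumors of mesodermal lineage" needs no special handling for tumors that would otherwise sit in two categories at once.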
Exchanging data
Real world example: The Tissue Microarray Data Exchange Specification
The greatest value of TMAs is the ability to link TMA data with data from other TMAs and from other databases that inform on the data contained in the TMA database.
That value is essentially untapped because there has been no way to publish, exchange, merge and link TMA datasets in a manner that everyone can use and understand.
The TMA Specification is an open access document that can be used without any restriction.
Its development was sponsored by the NCI and by the Association for Pathology Informatics
Jules J Berman, Mary Edgerton and Bruce Friedman. The tissue microarray data exchange specification: a community-based, open source tool for sharing tissue microarray data. BMC Med Inform Decis Mak. 2003 May 23;3:5
Jules J Berman, Milton Datta, Andre Kajdacsy-Balla, Jonathan Melamed, Jan Orenstein, Kevin Dobbin, Ashok Patel, Rajiv Dhir, Michael J Becich. The tissue microarray data exchange specification: implementation by the Cooperative Prostate Cancer Tissue Resource. BMC Bioinformatics 2004 Feb 27, 5:19
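To make "publish, exchange, merge and link TMA datasets" concrete, here is a rough sketch of merging two small TMA files once both sides agree on a shared format and a shared core identifier. It assumes an XML representation. The tag and attribute names (`tma`, `core`, `id`, `diagnosis`, `stain`, `score`) are invented for illustration and are not the data elements defined by the TMA specification.

```python
import xml.etree.ElementTree as ET

# Two hypothetical TMA datasets. Tag and attribute names are invented
# for illustration and are NOT the elements defined by the TMA specification.
TMA_A = """<tma>
  <core id="c1" diagnosis="prostate adenocarcinoma"/>
  <core id="c2" diagnosis="benign prostate"/>
</tma>"""

TMA_B = """<tma>
  <core id="c1" stain="p53" score="2"/>
  <core id="c3" stain="p53" score="0"/>
</tma>"""


def cores_by_id(xml_text):
    """Map each core's id to a dict of its remaining attributes."""
    root = ET.fromstring(xml_text)
    result = {}
    for core in root.iter("core"):
        attrs = dict(core.attrib)
        result[attrs.pop("id")] = attrs
    return result


def merge(*datasets):
    """Combine attribute dicts for cores that share the same id."""
    merged = {}
    for data in datasets:
        for core_id, attrs in data.items():
            merged.setdefault(core_id, {}).update(attrs)
    return merged


merged = merge(cores_by_id(TMA_A), cores_by_id(TMA_B))
for core_id in sorted(merged):
    print(core_id, merged[core_id])
```

The merge is only this simple because both files use the same tags and the same identifier scheme, which is the kind of agreement a common exchange specification is meant to supply.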
Querying/collecting dispersed and heterogeneous data
Shared Pathology Informatics Network
http://grants.nih.gov/grants/guide/rfa-files/RFA-CA-01-006.html
MGH and affiliates (Isaac Kohane PI)
UCLA and affiliate hospitals (Jonathan Braun PI)
Indiana University and affiliates (Clem McDonald PI)
U. of Pittsburgh (Mike Becich PI)
Program Director, Jules Berman
Shared Pathology Informatics Network – NCI’s most ambitious pathology data sharing effort
Peer-to-peer network of institutions that open their databases to queries on their surgical pathology data, providing de-identified records linked to specimens (first public demo on May 28, 2004)
The individual informatics systems are all different, but they share a common data exchange and query language
Current prototype has millions of annotated specimens
Future: expanding number of SPIN participants, expanding the kinds of data that can be queried
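The design described above (different local systems, one common query language, de-identified replies) can be sketched as a set of adapters, one per site. This is a toy illustration, not SPIN's actual protocol or software. The site data, field names, query format, and the helpers `node` and `query_all` are all hypothetical, and real de-identification involves far more than dropping a few fields.

```python
# Each hypothetical site stores its records in its own layout.
SITE_A = [
    {"mrn": "111", "dx": "melanoma", "block": "A-7"},
]
SITE_B = [
    {"patient": "Jane Roe", "diagnosis_text": "melanoma", "specimen": "B-2"},
    {"patient": "John Doe", "diagnosis_text": "lipoma", "specimen": "B-9"},
]


def node(records, dx_field, specimen_field):
    """Wrap a local database so it answers the shared query format."""
    def answer(query):
        # The reply carries only diagnosis and specimen id;
        # local identifiers (mrn, patient name) never leave the site.
        return [
            {"diagnosis": r[dx_field], "specimen": r[specimen_field]}
            for r in records
            if r[dx_field] == query["diagnosis"]
        ]
    return answer


# The network is the list of site adapters; there is no central database.
NETWORK = [
    node(SITE_A, "dx", "block"),
    node(SITE_B, "diagnosis_text", "specimen"),
]


def query_all(query):
    """Send one common query to every node and pool the replies."""
    hits = []
    for ask in NETWORK:
        hits.extend(ask(query))
    return hits


results = query_all({"diagnosis": "melanoma"})
print(results)
```

Adding a participant in this sketch means writing one more adapter for the new site's layout; the query format and the other sites stay unchanged.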
end