
RESEARCH Open Access

A European inventory of common electronic health record data elements for clinical trial feasibility

Justin Doods 1, Florence Botteri 2, Martin Dugas 1, Fleur Fritz 1* and on behalf of EHR4CR WP7

Abstract

Background: Clinical studies are a necessity for new medications and therapies. Many studies, however, struggle to meet their recruitment numbers in time or have problems in meeting them at all. With increasing numbers of electronic health records (EHRs) in hospitals, huge databanks emerge that could be utilized to support research. The Innovative Medicine Initiative (IMI) funded project 'Electronic Health Records for Clinical Research' (EHR4CR) created a standardized and homogenous inventory of data elements to support research by utilizing EHRs. Our aim was to develop a Data Inventory that contains elements required for site feasibility analysis.

Methods: The Data Inventory was created in an iterative, consensus-driven approach by a group of up to 30 people consisting of pharmaceutical experts and informatics specialists. An initial list was subsequently expanded by data elements of simplified eligibility criteria from clinical trial protocols. Each element was manually reviewed by pharmaceutical experts, and standard definitions were identified and added. To verify their availability, data exports of the source systems at eleven university hospitals throughout Europe were conducted and evaluated.

Results: The Data Inventory consists of 75 data elements that, on the one hand, are frequently used in clinical studies and, on the other hand, are available in European EHR systems. Rankings of data elements were created from the results of the data exports. In addition, a sub-list was created with 21 data elements that were separated from the Data Inventory because of their low usage in routine documentation.

Conclusion: The data elements in the Data Inventory were identified with the knowledge of domain experts from pharmaceutical companies. Currently, not all information that is frequently used in site feasibility is documented in routine patient care.

Keywords: Electronic health record, Data elements, Feasibility criteria, Clinical information system, Clinical trials

* Correspondence: Fleur.Fritz@uni-muenster.de
1 Institute of Medical Informatics, University Münster, Albert-Schweitzer-Campus 1/A11, D-48149 Münster, Germany
Full list of author information is available at the end of the article

© 2014 Doods et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Doods et al. Trials 2014, 15:18
http://www.trialsjournal.com/content/15/1/18

Background

Many clinical studies have difficulties in recruiting the required number of patients within the specified time frame, and subsequent protocol amendments are costly. Between a third and half of the studies meet their recruitment numbers, and over half take longer than planned to reach their goal [1,2].

At the same time, clinical documentation is increasingly carried out in electronic health record (EHR) systems and, therefore, huge amounts of data are stored in an electronic format. EHRs are used for routine patient care but can also be used to support the identification of eligible patients for clinical research [3].

If recruitment could be optimized by a better selection of clinical research centers, and if better trial protocols could be created through an improved and more accurate site feasibility analysis, clinical studies could be completed faster and be more cost-efficient. It is common practice during feasibility analysis that pharmaceutical companies ask physicians how many patients they treat under certain conditions in a certain period of time. The physicians then estimate the number of suitable patients per site. However, how these results are compiled is non-transparent for the study sponsor. An improvement of the site feasibility analysis could be achieved through re-use of EHR data to generate more reliable patient count estimates for pharmaceutical sponsors of clinical studies and sponsors of investigator initiated trials (IITs) alike. If the protocol designers know how the feasibility numbers come about, they can redefine their criteria to improve the protocols.

Even though EHRs are being adopted in more and more hospitals, data is not necessarily reusable. Data can be captured fully structured, semi-structured or in free text. Structured data are documented during routine patient care through the use of national value sets or terminologies, but it is currently unclear what kind of data, besides data for reimbursement, are available across European EHR systems. It is also unclear how much of these data are relevant and specific enough for clinical research, and which data elements are most relevant for feasibility analyses of clinical studies.

To tackle those issues, the IMI [4] funded project 'Electronic Health Records for Clinical Research' (EHR4CR) [5] aims to support clinical trials, including site feasibility analysis, through the re-use of EHR data. The project runs over four years (2011 to 2014) and, being a public-private partnership, consists of 33 partners from industry and academia. Clinical partners are located in France, Germany, Poland, Switzerland and the United Kingdom. Scenarios which will be addressed are 'clinical protocol feasibility', 'patient identification and recruitment', 'clinical trial execution' and 'adverse event reporting'. EHR4CR is focused on the following disease areas: oncology, inflammatory diseases, neuroscience, diabetes, cardiovascular and respiratory diseases. The project will utilize existing or specifically created clinical data warehouses and connect those databases to the 'EHR4CR Platform' in secure technical ways and comply with European data protection laws.

As part of the workpackage 'Pilots' (WP7), we aim to obtain an overview of the data content and frequency in EHRs, which allow electronic support for protocol feasibility. Our objective is to develop an inventory of available core data elements of European EHRs for all the disease areas of the EHR4CR project. These data elements have to be relevant for clinical research according to clinical trial experts from the European Federation of Pharmaceutical Industries and Associations (EFPIA). Our motivation is to foster secondary use of EHR data for research, and our research question therefore is: What are the common data elements in Europe relevant for site feasibility analyses, and what is currently available in EHRs to create a valid and EFPIA accepted inventory?

Methods

Data element

The term 'data element' is used in several contexts with multiple possible meanings. The ISO/IEC 11179 Standard defines a data element in Part 1 [6] as follows:

'A data element is produced when a representation is associated with a data element concept. The representation describes the form of the data, including a value domain, datatype, representation class (optionally), and, if necessary, a unit of measure.'

In the following, we focus on a consented definition with a data element concept comprising two parts. The first part is assigned to identify groups of related data elements (data group), for example 'Findings', while the second specifies the datatype in more detail (data item), for example 'Weight'. Links to Unified Medical Language System (UMLS) [7] codes are provided to identify the underlying medical concepts. Representations as defined in ISO/IEC 11179, with value domains, data types and units of measurement for each data element, were not specified for the Data Inventory because data sources with different languages were analyzed. Instead, examples of typical values were provided. Table 1 shows an example of such a data element.

Data inventory

The Data Inventory is a catalog of data elements. Every data element consists of a data group and data item part, which together correspond to ISO 11179's data element concept. Elements also contain a sequential ID, an example for a possible data value, a definition and a link to the UMLS code of its medical concept.
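
The structure of a Data Inventory entry described above can be pictured as a small record type. The following is only an illustrative sketch, not the project's actual data model: the class name `DataElement`, its field names and the sequential ID `1` are assumptions made for this example, while the values for 'Findings/Weight' are taken from Table 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElement:
    """One Data Inventory entry (field names are hypothetical)."""

    element_id: int     # sequential ID within the inventory
    data_group: str     # first part of the data element concept, e.g. 'Findings'
    data_item: str      # second part of the data element concept, e.g. 'Weight'
    example_value: str  # example of a typical value (no formal value domain)
    definition: str     # consensus definition
    umls_cui: str       # UMLS Concept Unique Identifier of the medical concept

    @property
    def concept(self) -> str:
        """Data element concept written as 'data group/data item'."""
        return f"{self.data_group}/{self.data_item}"


# Values from Table 1; the ID 1 is an assumption for illustration only.
weight = DataElement(
    element_id=1,
    data_group="Findings",
    data_item="Weight",
    example_value="80 kg",
    definition="The weight of a subject",
    umls_cui="C0005910",
)
print(weight.concept)
```

Keeping only an example value, rather than a typed value domain and unit, mirrors the decision stated above not to specify ISO/IEC 11179 representations.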

Material

The Data Inventory was created from an initial list provided by the pharmaceutical companies with data elements they consider most important for their studies. In addition, the inventory contains data elements from 17 studies from acute or chronic diseases in oncology, neurology, diabetes, cardiovascular and inflammatory diseases (see Table 2). These studies were selected from the EFPIA companies in the EHR4CR project and had finished their feasibility phase as of end 2011. The selection excluded Phase I and non-interventional studies. Additional criteria for the selection were that the studies should have run at least at one EHR4CR data provider site (the participating hospitals are Assistance Publique - Hôpitaux de Paris, Friedrich-Alexander-Universität Erlangen-Nürnberg, Hôpitaux Universitaires de Genève, Kings College London, Medical University of Warsaw, Université de Rennes, University College London, University of Dundee, University of Glasgow, University of Manchester, Westfälische Wilhelms-Universität Münster), with preference given to those studies that ran at the most sites, and that each EFPIA company (participating companies are AMGEN, AstraZeneca, Bayer Health Care, F. Hoffmann-La Roche Ltd, GlaxoSmithKline, Johnson & Johnson, Lilly, MERCK KGaA, Novartis Pharma AG, Sanofi-Aventis) was represented with at least one study. With the exception of one company, the criteria could be met for the current version of the Data Inventory.

Data sources used for the project depend on the access to the systems by the local partners. In total, 15 EHRs were surveyed, because some sites used data from their whole EHR and others data from one or more departmental subsystems, for example specific systems for breast cancer or diabetes.

Table 1 Data element example

Data element concept: Findings/Weight
Example: 80 kg
Consensus definition: The weight of a subject
Link: http://ncim.nci.nih.gov/ncimbrowser/pages/concept_details.jsf?dictionary=NCI%20MetaThesaurus&code=C0005910

Example for the definition of data elements. Data element concepts consist of a data group and data item part. The elements also contain an example, a definition and a link to the NCI Metathesaurus referencing its UMLS concept.

Table 2 Overview of the companies, the numbers and disease areas of studies used

Cardiovascular Diabetes Inflammatory Oncology Neurology
AMGEN 1
AstraZeneca 1
Bayer Health Care 2
GlaxoSmithKline 3
Johnson & Johnson 1 1
MERCK KGaA 2
Novartis Pharma AG 1 1 1
F. Hoffmann-La Roche Ltd 1
Sanofi-Aventis 2

Methods

The process to create the Data Inventory was iterative and consensus-driven. The main steps of our iterative approach are summarized in Figure 1. Face-to-face meetings and telephone conferences were carried out to achieve common understandings and agreements. Between ten and 30 people attended the meetings and calls, depending on their availability. As a starting point, pharmaceutical companies were asked to provide a list of the most commonly used data elements for the feasibility phase based on their own personal experience. Elements were grouped by their context to create the data groups, and afterwards the initial list was iteratively extended by data elements from a total of 17 studies. The data elements from the study protocols were extracted by expert-driven manual 'simplification' of eligibility criteria [8]: feasibility and recruitment experts from the companies removed unnecessary text phrases or unimportant information until the core information for 'feasibility criteria' remained. 'Patient with confirmed deep vein thrombosis' was, for example, simplified to 'Diagnosis/Text: deep vein thrombosis'. Data exports at the eleven EHR4CR sites were conducted to capture the availability of each element (available yes/no) and the frequency of documentation (measured in relative percentages) at the source systems. To distinguish between data elements that are not available in EHRs and those that are simply not documented (both could in theory be represented as '0'), availability and frequency of a data element were captured separately. To avoid privacy concerns and allow for comparability between sites, the relative percentage of each element was captured instead of absolute numbers. Relative percentages were calculated by first identifying how many patients had an entry in the EHR for each data element and then dividing it by the number of all patients seen in the respective time frame. These exports were then analyzed by creating rankings and heat maps displaying the general availability and usage of the elements by using different colors. Microsoft Excel [9] was used for the analysis and creation of the heat maps. The heat maps were created using the conditional formatting feature.

Figure 1 Main steps of the iterative approach to create the data inventory. A list of data elements (DI (previous iteration)) is extended by data elements of simplified eligibility criteria. The availability of data elements from the extended list (Preliminary DI) is then validated through data exports (DE) at the sites, which are afterwards analyzed. Data elements which are hardly used or not available at the sites are removed from the DI and added to the wish list. The remaining elements form a new version of the Data Inventory (DI (new version)).

For each iteration, consensus on the data elements in the Data Inventory was reached, for example on splitting the element 'Child bearing status' up into four separate elements ('Currently pregnant', 'Pregnancy number', 'Menopausal status', 'Lactation'). Other decisions were that elements were moved to a separate list, referred to as 'wish list', because they are not available or hardly used at any of the sites.

Each of the data elements was manually reviewed by a peer group of ten pharmaceutical and informatics experts. The background of the pharmaceutical experts is in feasibility assessment/management, drug safety, data management/analytics and clinical operations. The review was needed to determine whether the elements were viable for the feasibility stage and whether the meaning of the element name was clear to all peer group members. Once a common understanding was reached, definitions were identified and added to each data element.
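
The calculation of relative percentages described in the Methods, with availability kept separate from frequency, could be sketched as follows. This is an illustrative reading only: the function name `relative_percentage`, the use of `None` for 'not available' and the patient counts in the usage lines are assumptions made for this example, not part of the exports performed in the project.

```python
from typing import Optional


def relative_percentage(patients_with_entry: Optional[int],
                        patients_seen: int) -> Optional[float]:
    """Share of patients (in percent) with an entry for one data element.

    patients_with_entry is None when the element does not exist in the
    source system, so that 'not available' stays distinct from
    'available but never documented' (0.0).
    """
    if patients_seen <= 0:
        raise ValueError("patients_seen must be positive")
    if patients_with_entry is None:
        return None  # element not available at this site
    if not 0 <= patients_with_entry <= patients_seen:
        raise ValueError("patients_with_entry must be between 0 and patients_seen")
    return 100.0 * patients_with_entry / patients_seen


# Made-up counts for illustration:
documented = relative_percentage(490, 500)     # available and widely documented
undocumented = relative_percentage(0, 500)     # available but never documented
unavailable = relative_percentage(None, 500)   # not available in the system
print(documented, undocumented, unavailable)
```

Reporting only the resulting percentage, and not the two counts, matches the stated aim of avoiding absolute patient numbers in the exports.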

Results

In the following section, the Data Inventory in its current version is described, as well as the wish list and result from the latest data exports.

Data groups

The data groups define the context of the data elements. Data groups which a data element can belong to are demographics, medical history, diagnosis, procedure, findings, laboratory findings, medication, scores and classifications, or patient characteristics.

Data Inventory

The Data Inventory in its current version is composed of 75 elements. It consists of 5 demographics, 4 diagnosis, 7 findings, 41 laboratory findings, 8 medical history, 7 medication and 3 procedure data elements. The definition of each element contains a link to the corresponding UMLS Concept Unique Identifiers [10] at the NCI Metathesaurus (NCIm) [11] and a textual description. In case the NCIm reference did not contain a textual definition, a suitable one was created by the expert group.

An overview of examples from the Data Inventory containing data elements from each data group can be seen in Figure 2. The whole Data Inventory can be found in the additional material (Additional file 1).

Wish list

Data elements that showed through the data exports to be not available or hardly documented were removed from the Data Inventory and put on a separate list, which contains 21 data elements. The rarely available data elements in EHRs are listed in Table 3.

Availability of data elements

Data exports captured the availability and the usage of the data elements. Elements which are highly available are from the data groups demographics, diagnosis, procedures and the majority of the laboratory findings. Rarely documented are the elements from the groups 'scores and classifications' and medical history, with the exception of allergies and hypersensitivity reactions. Medication and findings data elements are generally available, but not in all of the systems.

The color-coded heat map (Figure 3) gives an overview of the general availability of the data elements of the Data Inventory. The six least available data elements in this figure were moved to the wish list after the analysis of the heat map.

Figure 2 Examples from the data inventory. Each element in the data inventory contains a sequential number, the data group and item, an example of a possible value, the textual definition and the corresponding NCIm link.
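
The ranking behind the heat map (average usage over all sites, sorted from most to least available) can be illustrated with a short sketch. The function names, the element names and all percentages below are invented for this example, and the treatment of 'not available' sites, which are skipped when averaging here, is an assumption; the text does not state how such sites entered the average.

```python
from typing import Dict, List, Optional, Tuple


def average_usage(site_values: List[Optional[float]]) -> Optional[float]:
    """Mean relative percentage over the sites where the element is available.

    Sites reporting None (not available) are skipped; this is an
    assumption of this sketch, not a rule stated in the text.
    """
    available = [v for v in site_values if v is not None]
    if not available:
        return None
    return sum(available) / len(available)


def rank_elements(
    exports: Dict[str, List[Optional[float]]]
) -> List[Tuple[str, Optional[float]]]:
    """Order elements by average usage, highest first; all-NA elements last."""
    averaged = [(name, average_usage(values)) for name, values in exports.items()]
    return sorted(
        averaged,
        key=lambda item: (item[1] is None, -(item[1] if item[1] is not None else 0.0)),
    )


# Invented export values for three sites:
exports = {
    "Findings/Weight": [60.0, None, 40.0],
    "Demographics/Date of birth": [100.0, 100.0, 100.0],
    "Scores/ECOG": [None, None, None],
}
ranking = rank_elements(exports)
for name, avg in ranking:
    print(name, "NA" if avg is None else f"{avg:.1f}%")
```

Elements at the bottom of such a ranking are the candidates for the wish list.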

Discussion

The overall goal of this work was to identify data elements that are needed for site feasibility analysis in clinical studies and are at the same time commonly documented in European EHR systems. The heat map was created to determine the availability across the data provider sites. The coloring of each cell was considered a good means to give an overview of how frequently each element is documented and especially highlight those which are generally not used. Widely available data elements are from data groups demographics, diagnosis, procedures and laboratory findings. Under-documented elements are those captured in the wish list, which are from the groups 'scores and classifications' and medical history. We chose to use own groups instead of using Weng et al./Luo et al.'s [12,13] semantic classes because our focus was not only on clinical trials but also on EHRs. With the data groups we also wanted to indicate where data could be found in EHRs. For example, procedures are used in Europe for Diagnosis Related Groups (DRG). Consequently, diagnostic and therapeutic procedures would be covered by the same data group. Each element was reviewed by feasibility and recruitment experts and contains a definition and link to the NCIm to ensure a clear understanding and avoid confusion of the exact meaning. Value lists, for example diagnoses that are frequently used for the project's disease areas, are not specified and out of scope of this work; therefore they were not taken into consideration. The focus is on availability and frequency of data elements at EHR4CR pilot sites, so only examples for values are given. The high availability of the elements from the aforementioned data groups is most likely because they are needed for reimbursement and quality management. Laboratories have started structuring their data very early because in most cases special laboratory information systems are connected to the main EHR. Because the data have to be available there, laboratory findings are also highly available. The exact reasons, however, were not investigated. Despite the fact that a lot of laboratory findings are available in EHRs, many laboratories do not yet use standard terminologies like Logical Observation Identifiers Names and Codes (LOINC). Classifications like the International Statistical Classification of Diseases and Related Health Problems 10th revision (ICD10), on the other hand, are standard in European EHRs. That is also the reason why we used general data elements for diagnoses and procedures, because it is generally easier to identify diagnosis and procedures data in EHRs than it is to find non-standardized data elements.

Data elements were ranked according to the availability of the exports so that elements of low availability could be identified. The wish list contains those additional data elements that are relevant for clinical research but not documented in a structured manner during routine care. To enhance secondary use of data for clinical research, EHR systems could be extended to allow a structured documentation of these elements, like the Eastern Cooperative Oncology Group (ECOG). This would not necessarily result in more documentation work, but rather in a different representation of the same content. Instead of free text 'patient is bedridden', an ECOG score of 4 could be documented. We assume that scores like the Mini-Mental State Examination (MMSE) and medical history elements like 'current method of contraception' are primarily documented for research purposes. There is generally no direct incentive for documentation of intensive scales with many data elements, especially considering that physicians already spend equal or more time on patient documentation than on direct patient care [14,15]. This might indicate why those elements dropped out of the Data Inventory and into the wish list. Likewise, this might be the case for the medical history data elements 'Trial title', 'Inclusion date' and 'End of participation date', which are also related to research. We further assume that data elements from the data group medical history are more frequently documented than our exports show, but most likely in free text and not as structured data. Natural language processing is not within the scope of EHR4CR and therefore only structured data were used. The number of data elements that can be found in free text was not further investigated and is subject for future research.

Table 3 Data elements of the wish list

Data group: Data items
Findings: QTc interval, Left ventricular ejection fraction
Laboratory findings: MAGE-A3 status
Medical history: Trial title, Inclusion date, End of participation date, Current method of contraception, Vaccines, HIV status, Lactation
Patient characteristics: Willingness to participate in clinical trials
Scores/classifications: Date of score/Classifications, Karnofsky-score, Eastern Cooperative Oncology Group - performance status, TNM-classification, New York Heart Association - status, Response Evaluation Criteria in Solid Tumors, Hoehn and Yahr scale, GRID-Hamilton Depression Rating Scale, Mini-Mental State Examination, Unified Parkinson's Disease Rating Scale Section 1

The wish list contains data elements that are currently not or very rarely available in European EHRs but that are frequently requested in study protocols.

To create a valid and sponsor (EFPIA or IIT)-accepted Data Inventory, we decided to compile the list through an iterative and consensus-driven process with partners from academia and strong participation from domain experts of European pharmaceutical companies.

The EFPIA partners in the project are among the largest researching pharmaceutical companies in Europe with many studies each year. This way both sides added their perspectives and increased the acceptance of such a list. The international character of the focus group and the validation at several university hospitals makes the data inventory meaningful beyond national borders. No references on settings of average European hospitals were found, so whether the data exports at non-university hospitals would have resulted in similar availability numbers cannot be stated. An iterative approach was chosen as a pragmatic way to see if our method was feasible and to improve single steps of the process as we went along. One example is that we verified the availability of data elements in the first data export in percentage groups (6: 100%; 5: <100% to 75%; 4: <75% to 50%; 3: <50% to 25%; 2: <25% to 10%; 1: <10% to >0%; 0: 0%; NA: not available), while in the second round exact relative percentages (for example 98%) were requested. Each iteration included the simplification and identification of data elements from eligibility criteria of study protocols. This was done by feasibility and recruitment specialists from the pharmaceutical companies themselves. Analysis and processing of eligibility criteria has been done by other groups [12,16,17], but our aim was not to follow a specific representation or create a new format. We wanted to extract the most important information out of those free text criteria and display them in a simple, comprehensible way for all stakeholders. By doing so, we were able to identify the underlying data elements and add new ones to the Data Inventory.

Figure 3 Heat map of the data exports from the data inventory, current version. The first two columns describe the ISO 11179 data element concept (data group/data item). The third column shows the average usage of the data element over all sites, while the following columns (site 1 to site 9) display the frequency at the individual sites. The Data Inventory is ordered by the average usage, sorted in descending order from most available to least. The frequency ranges from 100% (dark green) to 0% (dark red). Data elements that are not available at a site are shown as Not Available (NA) (black).

The subsequent validation exports at eleven data provider sites with varying disease specific focuses were either done on whole EHRs or on subsystems of the hospitals, depending on available data sources. This is also the reason why some sites have many black cells in Figure 3, when they used a specialized departmental subsystem instead of the whole EHR. A bias by those sites that used specific subsystems can therefore not be excluded, but while general elements like 'date of birth' were not negatively affected, disease or gender specific elements were influenced positively. 'Medical history/menopausal status' was, for example, seldom documented in the majority of the systems, but was always available in a breast cancer system.

Because the Data Inventory contains data elements with clear definitions, it can be used as a reference for important data elements when new forms are created in EHRs. Through both lists, the Data Inventory and the wish list, it is clearer what to expect if EHRs are to be used for clinical research. In general, EHR systems could be accredited for their compliance with catalogs of important data elements in the future. This could demonstrate that the respective product is more suited to support secondary use of health care data for clinical research than non-compliant EHR systems.

The simplification of eligibility criteria was a manual task focused on clinical trial feasibility. It is possible that different criteria would have been identified by other people. The Data Inventory is created as part of the EHR4CR project and therefore the studies were selected to include each company and each data provider site. This means that other studies, other disease areas and different companies might also have resulted in different data elements. However, given the large number of involved countries, hospitals and trial experts, this Data Inventory represents an important consensus. Given that EHR4CR covers six major disease areas, we assume that the Data Inventory will in general cover a large part of clinical studies.
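
The percentage groups used in the first data export, as listed earlier in this Discussion, amount to a simple binning of relative percentages. The sketch below is one possible reading: the function name `percentage_group` is hypothetical, and treating the lower bound of each range as belonging to that group (so that exactly 75% falls into group 5) is an assumption, since the group definitions do not state this explicitly.

```python
from typing import Optional, Union


def percentage_group(value: Optional[float]) -> Union[int, str]:
    """Map a relative percentage to a group from 6 down to 0, or 'NA'.

    Assumed reading of the group boundaries: 6 = 100%, 5 = 75% to <100%,
    4 = 50% to <75%, 3 = 25% to <50%, 2 = 10% to <25%, 1 = >0% to <10%,
    0 = 0%, 'NA' = element not available at the site.
    """
    if value is None:
        return "NA"
    if not 0 <= value <= 100:
        raise ValueError("value must be a percentage between 0 and 100")
    if value == 100:
        return 6
    if value >= 75:
        return 5
    if value >= 50:
        return 4
    if value >= 25:
        return 3
    if value >= 10:
        return 2
    if value > 0:
        return 1
    return 0


# The 98% example from the second export round falls into group 5:
print(percentage_group(98))
```

Exact percentages, as requested in the second round, can always be mapped to these groups afterwards, whereas the reverse is not possible; this is one way to see why the later exports are more informative.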

Related work
In the following we compare the Data Inventory against work that relates to ours.

Weintraub et al. [18] compiled a list of 100 cardiovascular ‘data fields’ that were identified from existing data standards as being a ‘base set of terms with maximal value’ to specified criteria. The list is intended to be used in EHRs and facilitate secondary use, but was not validated with data exports from EHR systems.

A comparison of the data fields and the data elements of the Data Inventory showed that, because of the different scope and definitions of data elements, both lists cannot readily be compared. Out of the 100 data fields, 20 exactly match data elements of the Data Inventory, while 46 are not directly captured in the Data Inventory but would rather be values of our data elements, and 34 do not match at all. An example of a data field that would be a value in the Data Inventory is ‘diabetes’, which we would consider a value of the data element ‘Diagnosis/text’. Table 4 shows in more detail how the data fields correspond to the data elements.

In contrast to our work, the ‘key data elements of a base cardiovascular vocabulary’ describe elements that should be documented in EHRs to support the exchange of information throughout care, while the Data Inventory is a catalog of available data elements in EHRs that are important for clinical research.

Table 4 Comparison of the Data Inventory with US cardiovascular data fields [18]

Number of data fields matching data elements of the Data Inventory

                                                    Exact match   Data field as value of a data element   No match
History and physical examination elements                     8                                      24          5
Pharmacological therapy data elements                         0                                      20          0
Laboratory results elements                                  10                                       0          1
Diagnostic and therapeutic procedures elements                2                                       2         26
Outcomes data elements                                        0                                       0          2

Exact matches are available in both lists, no matches means that the data field is not represented in the Data Inventory, and ‘data field as value of a data element’ means that data fields can be matched to data elements because they refer to similar concepts (for example, data field ‘Diabetes’ corresponds to data elements ‘Diagnosis/Text’).

Häyrinen et al. [19] describe the core data elements that were introduced in Finland for a national electronic health record. Similar to our approach, Häyrinen et al. defined a list of data elements, added a definition to each item and furthermore added the terminology or code systems that should be used, if suitable systems were existent. Häyrinen’s list contains elements that would be implemented for the national EHR. In contrast, for our Data Inventory we identified elements that are currently documented in EHRs. A comparison of the ‘core data elements’ and the data elements of the Data Inventory showed several similarities; for example, ‘health problems and diagnosis’ should use ICD-10 or ICPC (International Classification of Primary Care) codes, which correlates to ‘diagnosis/code’ of the Data Inventory. We did not specify which classification should be used, but ICD-10 or ICPC codes would be values of this data element as well.

Weng et al. [12] and Luo et al. [13] describe in related publications a semi-automatic approach that allows annotating free text eligibility criteria using semantic representation. In contrast to our expert-driven simplification approach - intended to reduce complexity with a focus on trial feasibility - those methods aim at semi-automatically extracting the complete information out of free text. Due to the different approaches there is some overlap with our Data Inventory. Out of the 27 semantic classes from the Weng and Luo publications, only ‘Age’ and ‘Gender’ match directly, ten classes correspond to one or more data groups and 15 are not represented in the Data Inventory at all. Table 5 shows how the semantic classes correspond to the data groups.

Köpcke et al. [20] describe in their work the data completeness of five German hospital EHRs from data elements of 15 studies, re-using Luo’s semantic classes. Köpcke’s and our work show similar tendencies for completeness and usage. For example, age and gender are highly available and used for both lists, although information with respect to pregnancies is not so readily available in both. Although [20] is focused on patient recruitment and the Data Inventory on feasibility, the tendencies of availability and usage are similar.

Lessons learned
Eligibility criteria in clinical trial protocols are usually described using long and complicated free text sentences, which cannot readily be used for further processing. Through a process of simplifying the criteria, the information content can be reduced or split up in several parts until single data elements are left, which can be represented in a formal, consistent way. When doing the simplification we also identified a difference between required data elements for trial feasibility and recruitment. While the criteria for feasibility are fewer in number and more general, data elements for recruitment have to be more precise. From the experiences made of the simplification task, best practice principles for simplifying eligibility criteria [8] were created. They describe how eligibility criteria should be formulated to be clearly understandable and computer readable with little additional effort. When comparing the Data Inventory with billing data, in particular DRG data [21], one can see that EHRs nowadays already contain more data elements that can be re-used for research than just diagnosis and procedure codes; laboratory findings, for example.

Weiskopf and Weng [22] identified five common dimensions of data quality reviewing 95 articles (completeness, correctness, plausibility, concordance and currency). Evaluating correctness, plausibility and currency is a laborious task, which requires medical knowledge and can therefore not be automated readily. To investigate correctness, for example, patient charts would have to be checked manually to determine if the documented data are correct. A mapping between the data export and the elements in the EHRs was done with knowledge of the local project partners; the concordance, however, was not evaluated in detail. However, through the data exports we did capture the availability of the data elements (completeness), as is shown in Figure 2. During the creation of the Data Inventory we have seen in several instances that data quality is a critical issue. Automatic transfer of EHR data into an electronic data capture system, for example, can only be performed if the data quality is ‘high enough’. The aim of the inventory, however, was not to address these issues, but to obtain an overview of what is available and what is required.

Limitations
Certain disease areas (for example, diabetes or inflammatory diseases) are not yet fully covered by the Data Inventory (see Table 2). The current version is focused on site feasibility, so data elements for patient recruitment or clinical trial execution are not considered. Our approach was to create a global Data Inventory based on relevant elements for research that are available in EHRs. It does not take into account varying documentation needs of different disease areas, but should be more easily implemented in contrast to several disease-specific data inventories.

Table 5 Semantic classes from Weng [12]/Luo [13] corresponding to the data groups

Data groups (this work)      Semantic classes (Weng et al./Luo et al.)
Diagnosis                    Disease, Symptom and signs, Neoplasm status
Procedures                   Therapy or surgery, Diagnostic or lab results
Laboratory findings          Diagnostic or lab results
Findings                     Diagnostic or lab results
Medical history              Pregnancy-related activity, Addictive behavior
Scores and Classification    Neoplasm status, Disease stage
Medication                   Pharmaceutical substance or drug
Demographics                 Age, Gender

Comparison between data groups of this work and semantic classes according to Weng [12]/Luo [13]. Some of the semantic classes are listed more than once because they correspond to more than one data group. Similarly, one data group can correspond to one or more semantic class.

Outlook
We expect the Data Inventory to be constantly evolving based upon future releases of EHR systems and analyses of more trials from different disease areas. In a review of 89 papers, Häyrinen [23] reported that use of information systems leads to more complete and detailed documentation and structured data entry increases completeness and accuracy of data. Future work will focus on appropriate methods and procedures to further improve EHR data completeness and include those data elements in routine care that are currently missing. Scope, as well as benefits and costs, need to be taken into account when aiming to include new elements into the routine documentation. The data quality of EHRs in general, not only the completeness, will need to be analyzed to ensure that the data can be used for clinical studies.

Conclusion
Today, EHR systems already provide many data elements that can be used for feasibility analysis of clinical studies. An inventory of elements was created in a combined effort between experts from pharmaceutical companies and academic sites. It provides a common set of data elements that are frequently used in clinical research and at the same time available for re-use from current hospital information systems.

Additional file

Additional file 1: Complete Data Inventory with all data elements. The Data Inventory contains data elements for feasibility analysis that were extracted from clinical trial protocols and that were verified to be available in European EHR systems. The inventory contains data element concepts (data group + data item), optional examples, the definitions and links to NCIm.

Abbreviations
DRG: Diagnosis related group; ECOG: Eastern Cooperative Oncology Group; EFPIA: European Federation of Pharmaceutical Industries and Associations; EHR4CR: Electronic Health Records for Clinical Research; EHR: Electronic health record; ICD10: International Statistical Classification of Diseases and Related Health Problems 10th Revision; IIT: Investigator initiated trial; IMI: Innovative Medicine Initiative; LOINC: Logical Observation Identifiers Names and Codes; MMSE: Mini-Mental State Examination; NCIm: NCI Metathesaurus; UMLS: Unified Medical Language System.

Competing interests
The authors declare that they have no competing interests.

Authors’ contributions
JD helped with the simplification, was part of the peer review group, collected the data, analyzed it and wrote the manuscript. MD was part of the peer review group and helped to draft the manuscript. FF helped with the simplification, was part of the peer review group, supervised the methodological approach and helped to draft the manuscript. All authors read and approved the final manuscript.

Acknowledgments
We would like to thank all EHR4CR work package 7 members who helped in creating the Data Inventory and who contributed to the simplification of eligibility criteria. We would like to especially thank Anouk Deruaz, Silke Ewald, Kerstin Holzapfel, Andy Sykes, Nadine Ulliac-Sagnes and Christel Wouters, who were part of the peer group. Also, we would like to thank all the project partners who contributed to the data exports, especially Dionisio Acosta-Mena, Bernhard Breil, Marc Cuggia, James Cunningham, Thomas Ganslandt, Sharon Kean, Sebastian Mate, Mark McGilchrist, Cezary Szmigielski and Eric Zapletal.
We would also like to thank Bartholomäus Kahl, who helped in the creation of the first version of the Data Inventory, and Dipak Kalra, who provided critical review of the draft manuscript.
The research leading to these results has received support from the Innovative Medicines Initiative Joint Undertaking under grant agreement number 115189, resources of which are composed of financial contribution from the European Union’s Seventh Framework Program (FP7/2007-2013) and EFPIA companies’ in kind contribution.

Author details
1 Institute of Medical Informatics, University Münster, Albert-Schweitzer-Campus 1/A11, D-48149 Münster, Germany. 2 Integrated Information Sciences, Development, Novartis Pharma AG, CH-4002 Basel, Switzerland.

Received: 5 August 2013. Accepted: 13 December 2013. Published: 10 January 2014.

References
1. McDonald AM, Knight RC, Campbell MK, Entwistle VA, Grant AM, Cook JA, Elbourne DR, Francis D, Garcia J, Roberts I, Snowdon C: What influences recruitment to randomized controlled trials? A review of trials funded by two UK funding agencies. Trials 2006, 7:9.
2. Van der Wouden JC, Blankenstein AH, Huibers MJH, van der Windt DAWM, Stalman WAB, Verhagen AP: Survey among 78 studies showed that Lasagna’s law holds in Dutch primary care research. J Clin Epidemiol 2007, 60:819–824.
3. Dugas M, Lange M, Müller-Tidow C, Kirchhof P, Prokosch H-U: Routine data from hospital information systems can support patient recruitment for clinical studies. Clinical Trials (London, England) 2010, 7:183–189.
4. Innovative Medicines Initiative. [http://www.imi.europa.eu] Last accessed November 2013.
5. Electronic Health Records for Clinical Research. [www.ehr4cr.eu] Last accessed 15/11/2013.
6. International Organization for Standardization/International Electrotechnical Commission: ISO/IEC 11179, Information Technology - Metadata Registries (MDR) - Part 1: Framework. 2004. [http://standards.iso.org/ittf/PubliclyAvailableStandards/c035343_ISO_IEC_11179-1_2004(E).zip] Last accessed 29/12/2013.
7. Unified Medical Language System. [http://www.nlm.nih.gov/research/umls] Last accessed November 2013.
8. Doods J, Holzapfel K, Dugas M, Fritz F: Development of best practice principles for simplifying eligibility criteria. Stud Health Technol Inform 2013, 192:1153.
9. Microsoft. [www.microsoft.com] Last accessed 29/12/2013.
10. NCIm CUI. [http://ncimeta.nci.nih.gov/ncimbrowser/ConceptReport.jsp?dictionary=NCIMetaThesaurus&code=C2348662] Last accessed November 2013.
11. NCI Metathesaurus. [http://ncimeta.nci.nih.gov/ncimbrowser] Last accessed November 2013.
12. Weng C, Wu X, Luo Z, Boland MR, Theodoratos D, Johnson SB: EliXR: an approach to eligibility criteria extraction and representation. J Am Med Inform Assoc 2011, 18(Suppl 1):i116–i124.
13. Luo Z, Yetisgen-Yildiz M, Weng C: Dynamic categorization of clinical research eligibility criteria by hierarchical clustering. J Biomed Inform 2011, 44:927–935.
14. Ammenwerth E, Spötl H-P: The time needed for clinical documentation versus direct patient care. Methods Inf Med 2009, 48:84–91.
15. Tipping M, Forth V: Where did the day go? - A time-motion study of hospitalists. J Hospital 2010, 5:323–328.
16. Tu S, Peleg M, Carini S, Rubin D, Sim I: ERGO: a template-based expression language for encoding eligibility criteria. In Technical report. 2008.
17. Wang SJ, Ohno-machado L, Mar P, Boxwala AA, Greenes RA: Enhancing Arden Syntax for clinical trial eligibility criteria. 1999, 8280.
18. Weintraub WS, Karlsberg RP, Tcheng JE, Boris JR, Buxton AE, Dove JT, Fonarow GC, Goldberg LR, Heidenreich P, Hendel RC, Jacobs AK, Lewis W, Mirro MJ, Shahian DM, Bozkurt B, Jacobs JP, Peterson PN, Roger VL, Smith EE, Wang T: ACCF/AHA 2011 key data elements and definitions of a base cardiovascular vocabulary for electronic health records: a report of the American College of Cardiology Foundation/American Heart Association Task Force on Clinical Data Standards. Circulation 2011, 124:103–123.
19. Häyrinen K, Saranto K: The core data elements of electronic health record in Finland. Stud Health Technol Inform 2005, 116:131–136.
20. Köpcke F, Trinczek B, Majeed RW, Schreiweis B, Wenk J, Leusch T, Ganslandt T, Ohmann C, Bergh B, Röhrig R, Dugas M, Prokosch HU: Evaluation of data completeness in the electronic health record for the purpose of patient recruitment into clinical trials: a retrospective analysis of element presence. BMC Med Inform Decis Mak 2013, 13:37.
21. Schreyögg J, Stargardt T, Tiemann O, Busse R: Methods to determine reimbursement rates for diagnosis related groups (DRG): a comparison of nine European countries. Health Care Manag Sci 2006, 9:215–223.
22. Weiskopf NG, Weng C: Methods and dimensions of electronic health record data quality assessment: enabling reuse for clinical research. J Am Med Inform Assoc 2013, 20:144–151.
23. Häyrinen K, Saranto K, Nykänen P: Definition, structure, content, use and impacts of electronic health records: a review of the research literature. Int J Med Inform 2008, 77:291–304.

doi:10.1186/1745-6215-15-18
Cite this article as: Doods et al.: A European inventory of common electronic health record data elements for clinical trial feasibility. Trials 2014, 15:18.


analysis could be achieved through re-use of EHR data to generate more reliable patient count estimates for pharmaceutical sponsors of clinical studies and sponsors of investigator initiated trials (IITs) alike. If the protocol designers know how the feasibility numbers come about, they can redefine their criteria to improve the protocols.

Even though EHRs are being adopted in more and more hospitals, data is not necessarily reusable. Data can be captured fully structured, semi-structured or in free text. Structured data are documented during routine patient care through the use of national value sets or terminologies, but it is currently unclear what kind of data, besides data for reimbursement, are available across European EHR systems. It is also unclear how much of these data are relevant and specific enough for clinical research, and which data elements are most relevant for feasibility analyses of clinical studies.

To tackle those issues, the IMI [4] funded project ‘Electronic Health Records for Clinical Research’ (EHR4CR) [5] aims to support clinical trials, including site feasibility analysis, through the re-use of EHR data. The project runs over four years (2011 to 2014) and, being a public-private partnership, consists of 33 partners from industry and academia. Clinical partners are located in France, Germany, Poland, Switzerland and the United Kingdom. Scenarios which will be addressed are ‘clinical protocol feasibility’, ‘patient identification and recruitment’, ‘clinical trial execution’ and ‘adverse event reporting’. EHR4CR is focused on the following disease areas: oncology, inflammatory diseases, neuroscience, diabetes, cardiovascular and respiratory diseases. The project will utilize existing or specifically created clinical data warehouses and connect those databases to the ‘EHR4CR Platform’ in secure technical ways and comply with European data protection laws.

As part of the workpackage ‘Pilots’ (WP7), we aim to obtain an overview of the data content and frequency in EHRs which allow electronic support for protocol feasibility. Our objective is to develop an inventory of available core data elements of European EHRs for all the disease areas of the EHR4CR project. These data elements have to be relevant for clinical research according to clinical trial experts from the European Federation of Pharmaceutical Industries and Associations (EFPIA). Our motivation is to foster secondary use of EHR data for research, and our research question therefore is: What are the common data elements in Europe relevant for site feasibility analyses, and what is currently available in EHRs to create a valid and EFPIA accepted inventory?

Methods
Data element
The term ‘data element’ is used in several contexts with multiple possible meanings. The ISO/IEC 11179 Standard defines a data element in Part 1 [6] as follows:

‘A data element is produced when a representation is associated with a data element concept. The representation describes the form of the data, including a value domain, datatype, representation class (optionally), and if necessary a unit of measure.’

In the following we focus on a consented definition with a data element concept comprising two parts. The first part is assigned to identify groups of related data elements (data group), for example ‘Findings’, while the second specifies the datatype in more detail (data item), for example ‘Weight’. Links to Unified Medical Language System (UMLS) [7] codes are provided to identify the underlying medical concepts. Representations as defined in ISO/IEC 11179, with value domains, data types and units of measurement for each data element, were not specified for the Data Inventory because data sources with different languages were analyzed. Instead, examples of typical values were provided. Table 1 shows an example of such a data element.

Data inventory
The Data Inventory is a catalog of data elements. Every data element consists of a data group and data item part, which together correspond to ISO 11179’s data element concept. Elements also contain a sequential ID, an example for a possible data value, a definition and a link to the UMLS code of its medical concept.
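The structure described above can be sketched as a small record type. This is only an illustration, not the format used in the project: the class name `DataElement`, its field names, the ID value and the `/` separator between data group and data item are assumptions; the example values are those of Table 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElement:
    """One Data Inventory entry (hypothetical field names)."""
    element_id: int   # sequential ID within the inventory
    data_group: str   # context of the element, e.g. "Findings"
    data_item: str    # the specific item, e.g. "Weight"
    example: str      # example of a possible value
    definition: str   # consensus definition
    umls_code: str    # code of the underlying UMLS concept

    @property
    def concept(self) -> str:
        # Data group and data item together form the data element concept.
        return f"{self.data_group}/{self.data_item}"


weight = DataElement(
    element_id=1,  # hypothetical ID, not taken from the inventory
    data_group="Findings",
    data_item="Weight",
    example="80 kg",
    definition="The weight of a subject",
    umls_code="C0005910",
)
print(weight.concept)
```

No value domain, datatype or unit is modeled here, which mirrors the decision to give example values instead of full ISO/IEC 11179 representations.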

Material
The Data Inventory was created from an initial list provided by the pharmaceutical companies with data elements they consider most important for their studies. In addition, the inventory contains data elements from 17 studies from acute or chronic diseases in oncology, neurology, diabetes, cardiovascular and inflammatory diseases (see Table 2). These studies were selected from the EFPIA companies in the EHR4CR project and had finished their feasibility phase as of end 2011. The selection excluded Phase I and non-interventional studies. Additional criteria for the selection were that the studies should have run at least at one EHR4CR data provider site (the participating hospitals are: Assistance Publique - Hôpitaux de Paris, Friedrich-Alexander-Universität Erlangen-Nürnberg, Hôpitaux Universitaires de Genève, Kings College London, Medical University of Warsaw, Université de Rennes, University College London, University of Dundee, University of Glasgow, University of Manchester, Westfälische Wilhelms-Universität Münster) - preference was given to those studies that ran at the most - and that each EFPIA company (participating companies are: AMGEN, AstraZeneca, Bayer Health Care, F. Hoffmann-La Roche Ltd, GlaxoSmithKline, Johnson & Johnson, Lilly, MERCK KGaA, Novartis Pharma AG, Sanofi-Aventis) was represented with at least one study. With the exception of one company, the criteria could be met for the current version of the Data Inventory.

Data sources used for the project depend on the access to the systems by the local partners. In total, 15 EHRs were surveyed, because some sites used data from their whole EHR and others data from one or more departmental subsystems, for example specific systems for breast cancer or diabetes.

Table 1 Data element example

Data element concept: Findings/Weight
Example: 80 kg
Consensus definition: The weight of a subject
Link: http://ncim.nci.nih.gov/ncimbrowser/pages/concept_details.jsf?dictionary=NCI%20MetaThesaurus&code=C0005910

Example for the definition of data elements. Data element concepts consist of a data group and data item part. The elements also contain an example, a definition and a link to the NCI Metathesaurus referencing its UMLS concept.

Methods
The process to create the Data Inventory was iterative and consensus driven. An overview of the main steps of our iterative approach is summarized in Figure 1. Face-to-face meetings and telephone conferences were carried out to achieve common understandings and agreements. Between ten and 30 people attended the meetings and calls, depending on their availability. As a starting ground, pharmaceutical companies were asked to provide a list of the most commonly used data elements for the feasibility phase, based on their own personal experience. Elements were grouped by their context to create the data groups, and afterwards the initial list was iteratively extended by data elements from a total of 17 studies. The data elements from the study protocols were extracted by expert-driven, manual ‘simplification’ of eligibility criteria [8]: feasibility and recruitment experts from the companies removed unnecessary text phrases or unimportant information until the core information for ‘feasibility criteria’ remained. ‘Patient with confirmed deep vein thrombosis’ was, for example, simplified to ‘Diagnosis/Text: deep vein thrombosis’. Data exports at the eleven EHR4CR sites were conducted to capture the availability of each element (available: yes/no) and the frequency of documentation (measured in relative percentages) at the source systems. To distinguish between data elements that are not available in EHRs and those that are simply not documented - both could in theory be represented as ‘0’ - availability and frequency of a data element were captured separately. To avoid privacy concerns and allow for comparability between sites, the relative percentage of each element was captured instead of absolute numbers. Relative percentages were calculated by first identifying how many patients had an entry in the EHR for each data element and then dividing it by the number of all patients seen in the respective time frame. These exports were then analyzed by creating rankings and heat maps displaying the general availability and usage of the elements by using different colors. Microsoft Excel [9] was used for the analysis and creation of the heat maps. The heat maps were created using the conditional formatting feature.

For each iteration, consensuses on the data elements in the Data Inventory were agreed, for example splitting the element ‘Child bearing status’ up into four separate elements (‘Currently pregnant’, ‘Pregnancy number’, ‘Menopausal status’, ‘Lactation’). Other decisions were that elements were moved to a separate list, referred to as ‘wish list’, because they are not available or hardly used at any of the sites.

Each of the data elements was manually reviewed by a peer group of ten pharmaceutical and informatics experts. The background of the pharmaceutical experts is in feasibility assessment/management, drug safety, data management/analytics and clinical operations. The review was needed to determine whether the elements were viable for the feasibility stage and whether the meaning of the element name was clear to all peer group members. Once a common understanding was reached, definitions were identified and added to each data element.

Table 2 Overview of the companies, the numbers and disease areas of studies used

Cardiovascular Diabetes Inflammatory Oncology Neurology
AMGEN 1
AstraZeneca 1
Bayer Health Care 2
GlaxoSmithKline 3
Johnson & Johnson 1 1
MERCK KGaA 2
Novartis Pharma AG 1 1 1
F. Hoffmann-La Roche Ltd 1
Sanofi-Aventis 2

Figure 1 Main steps of the iterative approach to create the data inventory. A list of data elements (DI (previous iteration)) is extended by data elements of simplified eligibility criteria. The availability of data elements from the extended list (Preliminary DI) then gets validated through data exports (DE) at the sites, which afterwards get analyzed. Data elements which are hardly used or not available at the sites get removed from the DI and are added to the wish list. The remaining elements form a new version of the Data Inventory (DI (new version)).
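The export calculation described in this section, a relative percentage per data element with availability recorded separately from frequency, can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the dictionary layout are hypothetical and are not taken from the EHR4CR export specification.

```python
from typing import Optional


def relative_frequency(patients_with_entry: int, patients_seen: int) -> float:
    """Percentage of all patients seen in the time frame who have an entry."""
    if patients_seen <= 0:
        raise ValueError("patients_seen must be positive")
    return 100.0 * patients_with_entry / patients_seen


def export_record(available: bool,
                  patients_with_entry: Optional[int],
                  patients_seen: int) -> dict:
    """Keep availability and frequency apart (hypothetical record layout).

    An element that does not exist in the source system gets no frequency
    at all, so it cannot be confused with an element that exists but was
    never documented (frequency 0.0).
    """
    if not available:
        return {"available": False, "frequency_pct": None}
    if patients_with_entry is None:
        raise ValueError("a patient count is required for an available element")
    return {
        "available": True,
        "frequency_pct": relative_frequency(patients_with_entry, patients_seen),
    }


print(export_record(True, 250, 1000))    # documented for a quarter of patients
print(export_record(True, 0, 1000))      # field exists but is never filled in
print(export_record(False, None, 1000))  # field does not exist in the system
```

Only the percentage leaves the site in this sketch, which corresponds to the stated goal of avoiding absolute patient numbers.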

Results
In the following section the Data Inventory in its current version is described, as well as the wish list and results from the latest data exports.

Data groups
The data groups define the context of the data elements. Data groups which a data element can belong to are: demographics, medical history, diagnosis, procedure, findings, laboratory findings, medication, scores and classifications, or patient characteristics.

Data Inventory
The Data Inventory in its current version is composed of 75 elements. It consists of 5 demographics, 4 diagnosis, 7 findings, 41 laboratory findings, 8 medical history, 7 medication and 3 procedure data elements. The definition of each element contains a link to the corresponding UMLS Concept Unique Identifiers [10] at the NCI Metathesaurus (NCIm) [11] and a textual description. In case the NCIm reference did not contain a textual definition, a suitable one was created by the expert group.

An overview of examples from the Data Inventory containing data elements from each data group can be seen in Figure 2. The whole Data Inventory can be found in the additional material (Additional file 1).

Wish list
Data elements that showed through the data exports to be not available or hardly documented were removed from the Data Inventory and put on a separate list, which contains 21 data elements. The rarely available data elements in EHRs are listed in Table 3.
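The separation into Data Inventory and wish list can be pictured with a short sketch. The decision in the project was reached by consensus after reviewing the exports, not by a fixed rule, so the cut-off `min_pct`, the function name and all percentages below are hypothetical and serve only to illustrate the idea of ‘not available or hardly used at any of the sites’.

```python
def split_inventory(exports, min_pct=1.0):
    """Split elements into (inventory, wish_list).

    `exports` maps an element name to its per-site relative percentages;
    None means the element is not available at that site. An element stays
    in the inventory if at least one site reaches `min_pct` (an assumed,
    illustrative threshold).
    """
    inventory, wish_list = [], []
    for element, site_values in exports.items():
        # Highest documented frequency across sites; 0.0 if no site has it.
        best = max((v for v in site_values if v is not None), default=0.0)
        if best >= min_pct:
            inventory.append(element)
        else:
            wish_list.append(element)
    return inventory, wish_list


# Made-up percentages for three sites, for illustration only.
exports = {
    "Demographics/Date of birth": [100.0, 99.5, 100.0],
    "Findings/QTc interval": [None, 0.2, None],
    "Medical history/Lactation": [None, None, None],
}
inventory, wish_list = split_inventory(exports)
print(inventory)
print(wish_list)
```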

Availability of data elements
Data exports captured the availability and the usage of the data elements. Elements which are highly available are from the data groups demographics, diagnosis, procedures and the majority of the laboratory findings. Rarely documented are the elements from the groups ‘scores and classifications’ and medical history, with the exception of allergies and hypersensitivity reactions. Medication and findings data elements are generally available, but not in all of the systems.

The color-coded heat map (Figure 3) gives an overview of the general availability of the data elements of the Data Inventory. The six least available data elements in this figure were moved to the wish list after the analysis of the heat map.

Figure 2 Examples from the data inventory. Each element in the data inventory contains a sequential number, the data group and item, an example of a possible value, the textual definition and the corresponding NCIm link.

Discussion
The overall goal of this work was to identify data elements that are needed for site feasibility analysis in clinical studies and are at the same time commonly documented in European EHR systems. The heat map was created to determine the availability across the data provider sites. The coloring of each cell was considered a good means to give an overview of how frequently each element is documented and especially highlight those which are generally not used. Widely available data elements are from data groups demographics, diagnosis, procedures and laboratory findings. Under-documented elements are those captured in the wish list, which are from the groups ‘scores and classifications’ and medical history. We chose to use our own groups instead of using Weng et al./Luo et al.’s [12,13] semantic classes, because our focus was not only on clinical trials but also on EHRs. With the data groups we also wanted to indicate where data could be found in EHRs. For example, procedures are used in Europe for Diagnosis Related Groups (DRG). Consequently, diagnostic and therapeutic procedures would be covered by the same data group. Each element was reviewed by feasibility and recruitment experts and contains a definition and link to the NCIm to ensure a clear understanding and avoid confusion of the exact meaning. Value lists, for example diagnoses that are frequently used for the project’s disease areas, are not specified and out of scope of this work; therefore they were not taken into consideration. The focus is on availability and frequency of data elements at EHR4CR pilot sites, so only examples for values are given. The high availability of the elements from the aforementioned data groups is most likely because they are needed for reimbursement and quality management. Laboratories have started structuring their data very early, because in most cases special laboratory information systems are connected to the main EHR. Because the data have to be available there, laboratory findings are also highly available. The exact reasons, however, were not investigated. Despite the fact that a lot of laboratory findings are available in EHRs, many laboratories do not yet use standard terminologies like Logical Observation Identifiers Names and Codes (LOINC). Classifications like the International Statistical Classification of Diseases and Related Health Problems 10th revision (ICD10), on the other hand, are standard in European EHRs. That is also the reason why we used general data elements for diagnoses and procedures, because it is generally easier to identify diagnosis and procedures data in EHRs than it is to find non-standardized data elements.

Data elements were ranked according to the availability of the exports, so that elements of low availability could be identified. The wish list contains those additional data elements that are relevant for clinical research but not documented in a structured manner during routine care. To enhance secondary use of data for clinical research, EHR systems could be extended to allow a structured documentation of these elements, like the Eastern Cooperative Oncology Group (ECOG) score. This would not necessarily result in more documentation work, but rather in a different representation of the same content. Instead of free text ‘patient is bedridden’, an ECOG score of 4 could be documented. We assume that scores like the Mini-Mental State Examination (MMSE) and medical history elements like ‘current method of contraception’ are primarily documented for research purposes. There is generally no direct incentive for documentation of intensive scales with many data elements, especially considering that physicians already spend equal or more time on patient documentation than on direct patient care [14,15]. This might indicate why those elements dropped out of the Data Inventory and into the wish list. Likewise, this might be the case for the medical history data elements ‘Trial title’, ‘Inclusion date’ and ‘End of participation date’, which are also related to research. We further assume that data elements from the data group medical history are more frequently documented than our exports show, but most likely in free text and not as structured data. Natural language processing is not within the scope of EHR4CR and therefore only structured data were used. The number of data elements that can be found in free text was not further investigated and is subject for future research.

Table 3 Data elements of the wish list

Data group | Data items
Findings | QTc interval; Left ventricular ejection fraction
Laboratory findings | MAGE-A3 status
Medical history | Trial title; Inclusion date; End of participation date; Current method of contraception; Vaccines; HIV status; Lactation
Patient characteristics | Willingness to participate in clinical trials
Scores/classifications | Date of score/classifications; Karnofsky score; Eastern Cooperative Oncology Group - performance status; TNM classification; New York Heart Association - status; Response Evaluation Criteria in Solid Tumors; Hoehn and Yahr scale; GRID-Hamilton Depression Rating Scale; Mini-Mental State Examination; Unified Parkinson’s Disease Rating Scale, Section 1

The wish list contains data elements that are currently not, or very rarely, available in European EHRs but that are frequently requested in study protocols.

Doods et al. Trials 2014, 15:18. Page 5 of 10. http://www.trialsjournal.com/content/15/1/18

To create a valid and sponsor (EFPIA or IIT)-accepted Data Inventory, we decided to compile the list through an iterative and consensus-driven process with partners from academia and strong participation from domain experts of European pharmaceutical companies.

The EFPIA partners in the project are among the largest researching pharmaceutical companies in Europe, with many studies each year. This way both sides added their perspectives and increased the acceptance of such a list. The international character of the focus group and the validation at several university hospitals make the Data Inventory meaningful beyond national borders. No references on settings of average European hospitals were found, so whether the data exports at non-university hospitals would have resulted in similar availability numbers cannot be stated. An iterative approach was chosen as a pragmatic way to see if our method was feasible and to improve single steps of the process as we went along. One example is that we verified the availability of data elements in the first data export in percentage groups (6: 100%; 5: <100% to 75%; 4: <75% to 50%; 3: <50% to 25%; 2: <25% to 10%; 1: <10% to >0%; 0: 0%; NA: not available), while in the second round exact relative percentages (for example 98%) were requested. Each iteration included the simplification and identification of data elements from eligibility criteria of study protocols. This was done by feasibility and recruitment specialists from the pharmaceutical companies themselves. Analysis and processing of eligibility criteria has been done by other groups [12,16,17], but our aim was not to follow a specific representation or create a new format. We wanted to extract the most important information out of those free text criteria and display it in a simple, comprehensible way for all stakeholders. By doing so, we were able to identify the underlying data elements and add new ones to the Data Inventory.

Figure 3 Heat map of the data exports from the data inventory, current version. The first two columns describe the ISO 11179 data element concept (data group/data item). The third column shows the average usage of the data element over all sites, while the following columns (site 1 to site 9) display the frequency at the individual sites. The Data Inventory is ordered by the average usage, sorted in descending order from most available to least. The frequency ranges from 100% (dark green) to 0% (dark red). Data elements that are not available at a site are shown as Not Available (NA) (black).

The subsequent validation exports at eleven data provider sites with varying disease-specific focuses were either done on whole EHRs or on subsystems of the hospitals, depending on available data sources. This is also the reason why some sites have many black cells in Figure 3: they used a specialized departmental subsystem instead of the whole EHR. A bias by those sites that used specific subsystems can therefore not be excluded, but while general elements like ‘date of birth’ were not negatively affected, disease- or gender-specific elements were influenced positively. ‘Medical history/menopausal status’ was, for example, seldom documented in the majority of the systems but was always available in a breast cancer system.

Because the Data Inventory contains data elements with clear definitions, it can be used as a reference for important data elements when new forms are created in EHRs. Through both lists, the Data Inventory and the wish list, it is clearer what to expect if EHRs are to be used for clinical research. In general, EHR systems could be accredited for their compliance with catalogs of important data elements in the future. This could demonstrate that the respective product is more suited to support secondary use of health care data for clinical research than non-compliant EHR systems.

The simplification of eligibility criteria was a manual task focused on clinical trial feasibility. It is possible that different criteria would have been identified by other people. The Data Inventory was created as part of the EHR4CR project and therefore the studies were selected to include each company and each data provider site. This means that other studies, other disease areas and different companies might also have resulted in different data elements. However, given the large number of involved countries, hospitals and trial experts, this Data Inventory represents an important consensus. Given that EHR4CR covers six major disease areas, we assume that the Data Inventory will in general cover a large part of clinical studies.
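The percentage groups used in the first export round can be written down as a small lookup. The following Python sketch is illustrative only and is not part of the EHR4CR tooling: the function name `percentage_group` is hypothetical, and it assumes that the lower bound of each group is inclusive (for example, exactly 75% falls into group 5), which the text does not state explicitly.

```python
from typing import Optional, Union


def percentage_group(frequency: Optional[float]) -> Union[int, str]:
    """Map a relative documentation frequency (in percent) to a group.

    None stands for a data element that is not available at the site.
    Assumption: lower bounds are inclusive, e.g. 75.0 -> group 5.
    """
    if frequency is None:
        return "NA"
    if not 0 <= frequency <= 100:
        raise ValueError("frequency must be between 0 and 100 percent")
    if frequency == 100:
        return 6
    # (lower bound, group) pairs, checked from highest to lowest.
    for lower_bound, group in ((75, 5), (50, 4), (25, 3), (10, 2)):
        if frequency >= lower_bound:
            return group
    # Anything above zero but below 10% is group 1; exactly zero is group 0.
    return 1 if frequency > 0 else 0


print(percentage_group(98))    # 5
print(percentage_group(None))  # NA
```

Keeping "NA" separate from group 0 mirrors the distinction between an element that does not exist in a system and one that exists but is never documented.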

Related work
In the following we compare the Data Inventory against work that relates to ours.

Weintraub et al. [18] compiled a list of 100 cardiovascular ‘data fields’ that were identified from existing data standards as being a ‘base set of terms with maximal value’ to specified criteria. The list is intended to be used in EHRs and facilitate secondary use, but was not validated with data exports from EHR systems.

A comparison of the data fields and the data elements of the Data Inventory showed that, because of the different scope and definitions of data elements, both lists cannot readily be compared. Out of the 100 data fields, 20 exactly match data elements of the Data Inventory, while 46 are not directly captured in the Data Inventory but would rather be values of our data elements, and 34 do not match at all. An example of a data field that would be a value in the Data Inventory is ‘diabetes’, which we would consider a value of the data element ‘Diagnosis/text’. Table 4 shows in more detail how the data fields correspond to the data elements.

In contrast to our work, the ‘key data elements of a base cardiovascular vocabulary’ describe elements that should be documented in EHRs to support the exchange of information throughout care, while the Data Inventory is a catalog of available data elements in EHRs that are important for clinical research.

Table 4 Comparison of the Data Inventory with US cardiovascular data fields [18]

Number of data fields matching data elements of the Data Inventory
Data field group | Exact match | Data field as value of a data element | No match
History and physical examination elements | 8 | 24 | 5
Pharmacological therapy data elements | 0 | 20 | 0
Laboratory results elements | 10 | 0 | 1
Diagnostic and therapeutic procedures elements | 2 | 2 | 26
Outcomes data elements | 0 | 0 | 2

Exact matches are available in both lists; ‘no match’ means that the data field is not represented in the Data Inventory; and ‘data field as value of a data element’ means that data fields can be matched to data elements because they refer to similar concepts (for example, data field ‘Diabetes’ corresponds to data element ‘Diagnosis/Text’).

Häyrinen et al. [19] describe the core data elements that were introduced in Finland for a national electronic health record. Similar to our approach, Häyrinen et al. defined a list of data elements, added a definition to each item and furthermore added the terminology or code systems that should be used if suitable systems existed. Häyrinen’s list contains elements that would be implemented for the national EHR. In contrast, for our Data Inventory we identified elements that are currently documented in EHRs. A comparison of the ‘core data elements’ and the data elements of the Data Inventory showed several similarities; for example, ‘health problems and diagnosis’ should use ICD-10 or ICPC (International Classification of Primary Care) codes, which correlates to ‘diagnosis/code’ of the Data Inventory. We did not specify which classification should be used, but ICD-10 or ICPC codes would be values of this data element as well.

Weng et al. [12] and Luo et al. [13] describe in related publications a semi-automatic approach that allows annotating free text eligibility criteria using semantic representation. In contrast to our expert-driven simplification approach, which is intended to reduce complexity with a focus on trial feasibility, those methods aim at semi-automatically extracting the complete information out of free text. Because of the different approaches, there is only partial overlap with our Data Inventory. Out of the 27 semantic classes from the Weng and Luo publications, only ‘Age’ and ‘Gender’ match directly, ten classes correspond to one or more data groups, and 15 are not represented in the Data Inventory at all. Table 5 shows how the semantic classes correspond to the data groups.

Köpcke et al. [20] describe in their work the data completeness of five German hospital EHRs from data elements of 15 studies, re-using Luo’s semantic classes. Köpcke’s and our work show similar tendencies for completeness and usage. For example, age and gender are highly available and used for both lists, whereas information with respect to pregnancies is not so readily available in either. Although [20] is focused on patient recruitment and the Data Inventory on feasibility, the tendencies of availability and usage are similar.
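The counts quoted for the comparison with the cardiovascular data fields can be cross-checked against the rows of Table 4. The short Python sketch below only re-adds the published numbers; the variable names are illustrative and not taken from the paper.

```python
# Rows of Table 4 as (exact match, data field as value, no match).
TABLE_4 = {
    "History and physical examination": (8, 24, 5),
    "Pharmacological therapy": (0, 20, 0),
    "Laboratory results": (10, 0, 1),
    "Diagnostic and therapeutic procedures": (2, 2, 26),
    "Outcomes": (0, 0, 2),
}

# Transpose the rows into columns and add each column up.
column_totals = tuple(sum(column) for column in zip(*TABLE_4.values()))

print(column_totals)       # (20, 46, 34)
print(sum(column_totals))  # 100
```

The column totals reproduce the 20 exact matches, 46 value-level matches and 34 non-matches reported in the text, and together account for all 100 data fields.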

Lessons learned
Eligibility criteria in clinical trial protocols are usually described using long and complicated free text sentences, which cannot readily be used for further processing. Through a process of simplifying the criteria, the information content can be reduced or split up into several parts until single data elements are left, which can be represented in a formal, consistent way. When doing the simplification we also identified a difference between required data elements for trial feasibility and recruitment. While the criteria for feasibility are fewer in number and more general, data elements for recruitment have to be more precise. From the experiences made in the simplification task, best practice principles for simplifying eligibility criteria [8] were created. They describe how eligibility criteria should be formulated to be clearly understandable and computer readable with little additional effort. When comparing the Data Inventory with billing data, in particular DRG data [21], one can see that EHRs nowadays already contain more data elements that can be re-used for research than just diagnosis and procedure codes; laboratory findings, for example.

Weiskopf and Weng [22] identified five common dimensions of data quality by reviewing 95 articles (completeness, correctness, plausibility, concordance and currency). Evaluating correctness, plausibility and currency is a laborious task which requires medical knowledge and can therefore not be automated readily. To investigate correctness, for example, patient charts would have to be checked manually to determine if the documented data are correct. A mapping between the data export and the elements in the EHRs was done with knowledge of the local project partners; the concordance, however, was not evaluated in detail. Through the data exports we did, however, capture the availability of the data elements (completeness), as is shown in the heat map (Figure 3). During the creation of the Data Inventory we have seen in several instances that data quality is a critical issue. Automatic transfer of EHR data into an electronic data capture system, for example, can only be performed if the data quality is ‘high enough’. The aim of the inventory, however, was not to address these issues but to obtain an overview of what is available and what is required.

Limitations
Certain disease areas (for example, diabetes or inflammatory diseases) are not yet fully covered by the Data Inventory (see Table 2). The current version is focused on site feasibility, so data elements for patient recruitment or clinical trial execution are not considered. Our approach was to create a global Data Inventory based on relevant elements for research that are available in EHRs. It does not take into account varying documentation needs of different disease areas, but should be more easily implemented in contrast to several disease-specific data inventories.

Table 5 Semantic classes from Weng [12]/Luo [13] corresponding to the data groups

Data groups (this work) | Semantic classes (Weng et al./Luo et al.)
Diagnosis | Disease, Symptom and signs; Neoplasm status
Procedures | Therapy or surgery; Diagnostic or lab results
Laboratory findings | Diagnostic or lab results
Findings | Diagnostic or lab results
Medical history | Pregnancy-related activity; Addictive behavior
Scores and Classification | Neoplasm status; Disease stage
Medication | Pharmaceutical substance or drug
Demographics | Age; Gender

Comparison between data groups of this work and semantic classes according to Weng [12]/Luo [13]. Some of the semantic classes are listed more than once because they correspond to more than one data group. Similarly, one data group can correspond to one or more semantic classes.

Outlook
We expect the Data Inventory to be constantly evolving based upon future releases of EHR systems and analyses of more trials from different disease areas. In a review of 89 papers, Häyrinen [23] reported that use of information systems leads to more complete and detailed documentation, and that structured data entry increases completeness and accuracy of data. Future work will focus on appropriate methods and procedures to further improve EHR data completeness and include those data elements in routine care that are currently missing. Scope as well as benefits and costs need to be taken into account when aiming to include new elements into the routine documentation. The data quality of EHRs in general, not only the completeness, will need to be analyzed to ensure that the data can be used for clinical studies.

Conclusion
Today, EHR systems already provide many data elements that can be used for feasibility analysis of clinical studies. An inventory of elements was created in a combined effort between experts from pharmaceutical companies and academic sites. It provides a common set of data elements that are frequently used in clinical research and at the same time available for re-use from current hospital information systems.

Additional file

Additional file 1: Complete Data Inventory with all data elements. The Data Inventory contains data elements for feasibility analysis that were extracted from clinical trial protocols and that were verified to be available in European EHR systems. The inventory contains data element concepts (data group + data item), optional examples, the definitions and links to NCIm.

Abbreviations
DRG: Diagnosis related group; ECOG: Eastern Cooperative Oncology Group; EFPIA: European Federation of Pharmaceutical Industries and Associations; EHR4CR: Electronic Health Records for Clinical Research; EHR: Electronic health record; ICD10: International Statistical Classification of Diseases and Related Health Problems, 10th Revision; IIT: Investigator initiated trial; IMI: Innovative Medicine Initiative; LOINC: Logical Observation Identifiers Names and Codes; MMSE: Mini-Mental State Examination; NCIm: NCI Metathesaurus; UMLS: Unified Medical Language System.

Competing interests
The authors declare that they have no competing interests.

Authors’ contributions
JD helped with the simplification, was part of the peer review group, collected the data, analyzed it and wrote the manuscript. MD was part of the peer review group and helped to draft the manuscript. FF helped with the simplification, was part of the peer review group, supervised the methodological approach and helped to draft the manuscript. All authors read and approved the final manuscript.

Acknowledgments
We would like to thank all EHR4CR work package 7 members who helped in creating the Data Inventory and who contributed to the simplification of eligibility criteria. We would like to especially thank Anouk Deruaz, Silke Ewald, Kerstin Holzapfel, Andy Sykes, Nadine Ulliac-Sagnes and Christel Wouters, who were part of the peer group. Also, we would like to thank all the project partners who contributed to the data exports, especially Dionisio Acosta-Mena, Bernhard Breil, Marc Cuggia, James Cunningham, Thomas Ganslandt, Sharon Kean, Sebastian Mate, Mark McGilchrist, Cezary Szmigielski and Eric Zapletal.
We would also like to thank Bartholomäus Kahl, who helped in the creation of the first version of the Data Inventory, and Dipak Kalra, who provided critical review of the draft manuscript.
The research leading to these results has received support from the Innovative Medicines Initiative Joint Undertaking under grant agreement number 115189, resources of which are composed of financial contribution from the European Union’s Seventh Framework Program (FP7/2007-2013) and EFPIA companies’ in kind contribution.

Author details
1 Institute of Medical Informatics, University Münster, Albert-Schweitzer-Campus 1/A11, D-48149 Münster, Germany. 2 Integrated Information Sciences, Development, Novartis Pharma AG, CH-4002 Basel, Switzerland.

Received: 5 August 2013. Accepted: 13 December 2013. Published: 10 January 2014.

References
1. McDonald AM, Knight RC, Campbell MK, Entwistle VA, Grant AM, Cook JA, Elbourne DR, Francis D, Garcia J, Roberts I, Snowdon C: What influences recruitment to randomized controlled trials? A review of trials funded by two UK funding agencies. Trials 2006, 7:9.
2. Van der Wouden JC, Blankenstein AH, Huibers MJH, van der Windt DAWM, Stalman WAB, Verhagen AP: Survey among 78 studies showed that Lasagna’s law holds in Dutch primary care research. J Clin Epidemiol 2007, 60:819–824.
3. Dugas M, Lange M, Müller-Tidow C, Kirchhof P, Prokosch H-U: Routine data from hospital information systems can support patient recruitment for clinical studies. Clinical Trials (London, England) 2010, 7:183–189.
4. Innovative Medicines Initiative. [http://www.imi.europa.eu] Last accessed November 2013.
5. Electronic Health Records for Clinical Research. [www.ehr4cr.eu] Last accessed 15.11.2013.
6. International Organization for Standardization/International Electrotechnical Commission: ISO/IEC 11179 Information Technology - Metadata Registries (MDR) - Part 1: Framework. 2004. [http://standards.iso.org/ittf/PubliclyAvailableStandards/c035343_ISO_IEC_11179-1_2004(E).zip] Last accessed 29.12.2013.
7. Unified Medical Language System. [http://www.nlm.nih.gov/research/umls] Last accessed November 2013.
8. Doods J, Holzapfel K, Dugas M, Fritz F: Development of best practice principles for simplifying eligibility criteria. Stud Health Technol Inform 2013, 192:1153.
9. Microsoft. [www.microsoft.com] Last accessed 29.12.2013.
10. NCIm CUI. [http://ncimeta.nci.nih.gov/ncimbrowser/ConceptReport.jsp?dictionary=NCIMetaThesaurus&code=C2348662] Last accessed November 2013.
11. NCI Metathesaurus. [http://ncimeta.nci.nih.gov/ncimbrowser] Last accessed November 2013.
12. Weng C, Wu X, Luo Z, Boland MR, Theodoratos D, Johnson SB: EliXR: an approach to eligibility criteria extraction and representation. J Am Med Inform Assoc 2011, 18(Suppl 1):i116–i124.
13. Luo Z, Yetisgen-Yildiz M, Weng C: Dynamic categorization of clinical research eligibility criteria by hierarchical clustering. J Biomed Inform 2011, 44:927–935.
14. Ammenwerth E, Spötl H-P: The time needed for clinical documentation versus direct patient care. Methods Inf Med 2009, 48:84–91.
15. Tipping M, Forth V: Where did the day go? - A time-motion study of hospitalists. J Hospital 2010, 5:323–328.
16. Tu S, Peleg M, Carini S, Rubin D, Sim I: ERGO: a template-based expression language for encoding eligibility criteria. In Technical report. 2008.
17. Wang SJ, Ohno-Machado L, Mar P, Boxwala AA, Greenes RA: Enhancing Arden Syntax for clinical trial eligibility criteria. 1999, 8:280.
18. Weintraub WS, Karlsberg RP, Tcheng JE, Boris JR, Buxton AE, Dove JT, Fonarow GC, Goldberg LR, Heidenreich P, Hendel RC, Jacobs AK, Lewis W, Mirro MJ, Shahian DM, Bozkurt B, Jacobs JP, Peterson PN, Roger VL, Smith EE, Wang T: ACCF/AHA 2011 key data elements and definitions of a base cardiovascular vocabulary for electronic health records: a report of the American College of Cardiology Foundation/American Heart Association Task Force on Clinical Data Standards. Circulation 2011, 124:103–123.
19. Häyrinen K, Saranto K: The core data elements of electronic health record in Finland. Stud Health Technol Inform 2005, 116:131–136.
20. Köpcke F, Trinczek B, Majeed RW, Schreiweis B, Wenk J, Leusch T, Ganslandt T, Ohmann C, Bergh B, Röhrig R, Dugas M, Prokosch HU: Evaluation of data completeness in the electronic health record for the purpose of patient recruitment into clinical trials: a retrospective analysis of element presence. BMC Med Inform Decis Mak 2013, 13:37.
21. Schreyögg J, Stargardt T, Tiemann O, Busse R: Methods to determine reimbursement rates for diagnosis related groups (DRG): a comparison of nine European countries. Health Care Manag Sci 2006, 9:215–223.
22. Weiskopf NG, Weng C: Methods and dimensions of electronic health record data quality assessment: enabling reuse for clinical research. J Am Med Inform Assoc 2013, 20:144–151.
23. Häyrinen K, Saranto K, Nykänen P: Definition, structure, content, use and impacts of electronic health records: a review of the research literature. Int J Med Inform 2008, 77:291–304.

doi:10.1186/1745-6215-15-18
Cite this article as: Doods et al.: A European inventory of common electronic health record data elements for clinical trial feasibility. Trials 2014, 15:18.



had finished their feasibility phase as of end 2011. The selection excluded Phase I and non-interventional studies. Additional criteria for the selection were that the studies should have run at least at one EHR4CR data provider site (the participating hospitals are Assistance Publique - Hôpitaux de Paris, Friedrich-Alexander-Universität Erlangen-Nürnberg, Hôpitaux Universitaires de Genève, Kings College London, Medical University of Warsaw, Université de Rennes, University College London, University of Dundee, University of Glasgow, University of Manchester, Westfälische Wilhelms-Universität Münster), with preference given to those studies that ran at the most sites, and that each EFPIA company (participating companies are AMGEN, AstraZeneca, Bayer Health Care, F. Hoffmann-La Roche Ltd, GlaxoSmithKline, Johnson & Johnson, Lilly, MERCK KGaA, Novartis Pharma AG, Sanofi-Aventis) was represented with at least one study. With the exception of one company, the criteria could be met for the current version of the Data Inventory.

Data sources used for the project depend on the access to the systems by the local partners. In total, 15 EHRs were surveyed, because some sites used data from their whole EHR and others data from one or more departmental subsystems, for example specific systems for breast cancer or diabetes.

Methods
The process to create the Data Inventory was iterative and consensus driven. An overview of the main steps of our iterative approach is given in Figure 1. Face-to-face meetings and telephone conferences were carried out to achieve common understandings and agreements. Between ten and 30 people attended the meetings and calls, depending on their availability. As a starting ground, pharmaceutical companies were asked to provide a list of the most commonly used data elements for the feasibility phase based on their own personal experience. Elements were grouped by their context to create the data groups, and afterwards the initial list was iteratively extended by data elements from a total of 17 studies. The data elements from the study protocols were extracted by expert-driven manual ‘simplification’ of eligibility criteria [8]: feasibility and recruitment experts from the companies removed unnecessary text phrases or unimportant information until the core information for ‘feasibility criteria’ remained. ‘Patient with confirmed deep vein thrombosis’ was, for example, simplified to ‘Diagnosis/Text: deep vein thrombosis’. Data exports at the eleven EHR4CR sites were conducted to capture the availability of each element (available: yes/no) and the frequency of documentation (measured in relative percentages) at the source systems. To distinguish between data elements that are not available in EHRs and those that are simply not documented (both could in theory be represented as ‘0’), availability and frequency of a data element were captured separately. To avoid privacy concerns and allow for comparability between sites, the relative percentage of each element was captured instead of absolute numbers. Relative percentages were calculated by first identifying how many patients had an entry in the EHR for each data element and then dividing that number by the number of all patients seen in the respective time frame. These exports were then analyzed by creating rankings and heat maps, displaying the general availability and usage of the elements by using different colors. Microsoft Excel [9] was used for the analysis and creation of the heat maps. The heat maps were created using the conditional formatting feature.

Table 2 Overview of the companies, the numbers and disease areas of studies used

Cardiovascular Diabetes Inflammatory Oncology Neurology
AMGEN 1
AstraZeneca 1
Bayer Health Care 2
GlaxoSmithKline 3
Johnson & Johnson 1 1
MERCK KGaA 2
Novartis Pharma AG 1 1 1
F. Hoffmann-La Roche Ltd 1
Sanofi-Aventis 2

Figure 1 Main steps of the iterative approach to create the data inventory. A list of data elements (DI (previous iteration)) is extended by data elements of simplified eligibility criteria. The availability of data elements from the extended list (Preliminary DI) is then validated through data exports (DE) at the sites, which are afterwards analyzed. Data elements which are hardly used or not available at the sites are removed from the DI and added to the wish list. The remaining elements form a new version of the Data Inventory (DI (new version)).

For each iteration, consensus on the data elements in the Data Inventory was reached, for example on splitting the element ‘Child bearing status’ up into four separate elements (‘Currently pregnant’, ‘Pregnancy number’, ‘Menopausal status’, ‘Lactation’). Other decisions were that elements were moved to a separate list, referred to as the ‘wish list’, because they are not available or hardly used at any of the sites.

Each of the data elements was manually reviewed by a peer group of ten pharmaceutical and informatics experts. The background of the pharmaceutical experts is in feasibility assessment/management, drug safety, data management/analytics and clinical operations. The review was needed to determine whether the elements were viable for the feasibility stage and whether the meaning of the element name was clear to all peer group members. Once a common understanding was reached, definitions were identified and added to each data element.
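The separation of availability from documentation frequency, and the calculation of the relative percentage, can be illustrated with a short Python sketch. It is not the export procedure used at the sites (which relied on local queries and Microsoft Excel); the function names `relative_percentage` and `export_row` and the example counts are hypothetical.

```python
from typing import Dict, Optional


def relative_percentage(patients_with_entry: int, patients_seen: int) -> float:
    """Share of patients (in percent) with an entry for one data element."""
    if patients_seen <= 0:
        raise ValueError("patients_seen must be positive")
    if not 0 <= patients_with_entry <= patients_seen:
        raise ValueError("patients_with_entry must be within 0..patients_seen")
    return 100.0 * patients_with_entry / patients_seen


def export_row(available: bool, patients_with_entry: int,
               patients_seen: int) -> Dict[str, Optional[float]]:
    """Report availability and frequency separately.

    An element that does not exist in the source system gets frequency None
    (shown as NA), so it is not confused with an element that exists but was
    never documented (frequency 0.0).
    """
    frequency = (relative_percentage(patients_with_entry, patients_seen)
                 if available else None)
    return {"available": available, "frequency": frequency}


# Hypothetical counts: 49 of 50 patients have an entry.
print(export_row(True, 49, 50))   # {'available': True, 'frequency': 98.0}
print(export_row(False, 0, 50))   # {'available': False, 'frequency': None}
```

Only the relative percentage leaves the site in this sketch, which matches the stated intention of avoiding absolute patient numbers.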

Results
In the following section the Data Inventory in its current version is described, as well as the wish list and the results from the latest data exports.

Data groups
The data groups define the context of the data elements. Data groups which a data element can belong to are demographics, medical history, diagnosis, procedure, findings, laboratory findings, medication, scores and classifications, or patient characteristics.

Data Inventory
The Data Inventory in its current version is composed of 75 elements. It consists of 5 demographics, 4 diagnosis, 7 findings, 41 laboratory findings, 8 medical history, 7 medication and 3 procedure data elements. The definition of each element contains a link to the corresponding UMLS Concept Unique Identifiers [10] at the NCI Metathesaurus (NCIm) [11] and a textual description. In case the NCIm reference did not contain a textual definition, a suitable one was created by the expert group.

An overview of examples from the Data Inventory containing data elements from each data group can be seen in Figure 2. The whole Data Inventory can be found in the additional material (Additional file 1).
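The structure of a Data Inventory entry described above (sequential number, data group, data item, example value, definition and NCIm link) can be sketched as a simple record. The Python class below is an illustration only: the class and field names are assumptions, and the definition and link are placeholders rather than content from Additional file 1.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataElement:
    number: int             # sequential number in the inventory
    data_group: str         # e.g. "Diagnosis"
    data_item: str          # e.g. "Text"
    example: Optional[str]  # optional example of a possible value
    definition: str         # textual definition
    ncim_link: str          # link to the NCIm concept

    @property
    def concept(self) -> str:
        """Data element concept written as data group/data item."""
        return f"{self.data_group}/{self.data_item}"


# Number of elements per data group as reported in the text.
GROUP_COUNTS = {
    "Demographics": 5, "Diagnosis": 4, "Findings": 7,
    "Laboratory findings": 41, "Medical history": 8,
    "Medication": 7, "Procedure": 3,
}

# Placeholder entry; number, definition and link are not the published ones.
element = DataElement(
    number=1, data_group="Diagnosis", data_item="Text",
    example="deep vein thrombosis",
    definition="<textual definition>", ncim_link="<NCIm link>",
)

print(element.concept)             # Diagnosis/Text
print(sum(GROUP_COUNTS.values()))  # 75
```

The per-group counts add up to the 75 elements of the current version, and the `concept` property mirrors the data group/data item notation used for the simplified criteria.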

Wish listData elements that showed through the data exports tobe not available or hardly documented were removedfrom the Data Inventory and put on a separate list whichcontains 21 data elements The rarely available data ele-ments in EHRs are listed in Table 3

Figure 2 Examples from the data inventory. Each element in the data inventory contains a sequential number, the data group and item, an example of a possible value, the textual definition and the corresponding NCIm link.

Availability of data elements
Data exports captured the availability and the usage of the data elements. Elements which are highly available are from the data groups demographics, diagnosis, procedures and the majority of the laboratory findings. Rarely documented are the elements from the groups ‘scores and classifications’ and medical history, with the exception of allergies and hypersensitivity reactions. Medication and findings data elements are generally available, but not in all of the systems.
The color-coded heat map (Figure 3) gives an overview of the general availability of the data elements of the Data Inventory. The six least available data elements in this figure were moved to the wish list after the analysis of the heat map.
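The ranking behind the heat map can be sketched in a few lines. This is a minimal illustration under stated assumptions: the element names and percentages are invented, `None` stands for "not available" (NA), and skipping NA sites when averaging is an assumption, since the text does not say how NA entries enter the average.

```python
from statistics import mean

# Hypothetical export results: usage in percent per site; None marks NA.
exports = {
    "Demographics/Date of birth": [100.0, 100.0, 99.0],
    "Diagnosis/Code": [100.0, 95.0, 90.0],
    "Medical history/Menopausal status": [2.0, None, 100.0],
    "Scores/ECOG performance status": [0.0, None, 5.0],
}


def average_usage(values):
    """Mean over sites that reported a value; NA sites are skipped (assumption)."""
    reported = [v for v in values if v is not None]
    return mean(reported) if reported else None


def rank_by_availability(data):
    """Return (element, average) pairs, most available first; all-NA elements last."""
    scored = [(name, average_usage(vals)) for name, vals in data.items()]
    return sorted(scored, key=lambda pair: (pair[1] is None, -(pair[1] or 0.0)))


ranking = rank_by_availability(exports)

# In this toy example only the single least available element is selected;
# in the work described here the six least available elements were moved.
n_least = 1
least_available = [name for name, _ in ranking[-n_least:]]
```

With these invented numbers the score element ends up at the bottom of the ranking, which mirrors the pattern reported for ‘scores and classifications’.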

Discussion
The overall goal of this work was to identify data elements that are needed for site feasibility analysis in clinical studies and are at the same time commonly documented in European EHR systems. The heat map was created to determine the availability across the data provider sites. The coloring of each cell was considered a good means to give an overview of how frequently each element is documented and especially to highlight those which are generally not used. Widely available data elements are from the data groups demographics, diagnosis, procedures and laboratory findings. Under-documented elements are those captured in the wish list, which are from the groups ‘scores and classifications’ and medical history. We chose to use our own groups instead of using Weng et al./Luo et al.’s [12,13] semantic classes because our focus was not only on clinical trials but also on EHRs. With the data groups we also wanted to indicate where data could be found in EHRs. For example, procedures are used in Europe for Diagnosis Related Groups (DRG). Consequently, diagnostic and therapeutic procedures would be covered by the same data group. Each element was reviewed by feasibility and recruitment experts and contains a definition and link to the NCIm to ensure a clear understanding and avoid confusion about the exact meaning. Value lists, for example diagnoses that are frequently used for the project’s disease areas, are not specified and are out of scope of this work; therefore they were not taken into consideration. The focus is on availability and frequency of data elements at EHR4CR pilot sites, so only examples for values are given. The high availability of the elements from the aforementioned data groups is most likely because they are needed for reimbursement and quality management. Laboratories started structuring their data very early because in most cases special laboratory information systems are connected to the main EHR. Because the data have to be available there, laboratory findings are also highly available. The exact reasons, however, were not investigated. Despite the fact that a lot of laboratory findings are available in EHRs, many laboratories do not yet use standard terminologies like Logical Observation Identifiers Names and Codes (LOINC). Classifications like the International Statistical Classification of Diseases and Related Health Problems, 10th revision (ICD10), on the other hand, are standard in European EHRs. That is also the reason why we used general data elements for diagnoses and procedures, because it is generally easier to identify diagnosis and procedure data in EHRs than it is to find non-standardized data elements.
Data elements were ranked according to their availability in the exports so that elements of low availability could be identified. The wish list contains those additional data elements that are relevant for clinical research but not documented in a structured manner during routine care. To enhance secondary use of data for clinical research, EHR systems could be extended to allow a structured documentation of these elements, like the Eastern Cooperative Oncology Group (ECOG) performance status. This would not necessarily result in more documentation work but rather in a different representation of the same content. Instead of the free text ‘patient is bedridden’, an ECOG score of 4 could be documented. We assume that scores like the Mini-Mental State Examination (MMSE) and medical history elements like ‘current method of contraception’ are primarily documented for research purposes. There is generally no direct incentive for documentation of intensive scales with many data elements, especially considering that physicians already spend equal or more time on patient documentation than on direct patient care [14,15]. This might indicate why those elements dropped out of the Data Inventory and into the wish list. Likewise, this might be the case for the medical history data elements ‘Trial title’, ‘Inclusion date’ and ‘End of participation date’, which are also related to research. We further assume that data elements from the data group medical history are more frequently documented than our exports show, but most likely in free text and not as structured data. Natural language processing is not within the scope of EHR4CR and therefore only structured data were used. The number of data elements that can be found in free text was not further investigated and is a subject for future research.
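To make the contrast between free text and structured documentation concrete, the ECOG example could be captured as a small validated record. This is a sketch, not a proposal for any specific EHR system: the class and field names are assumptions, and the grade descriptions are abbreviated paraphrases of the ECOG performance status scale.

```python
from dataclasses import dataclass
from datetime import date

# Abbreviated paraphrases of the ECOG performance status grades.
ECOG_GRADES = {
    0: "fully active",
    1: "restricted in strenuous activity, ambulatory",
    2: "ambulatory, capable of self-care, unable to work",
    3: "limited self-care, in bed or chair more than half of waking hours",
    4: "completely disabled, totally confined to bed or chair",
    5: "dead",
}


@dataclass(frozen=True)
class EcogScore:
    """Structured entry replacing a free-text note (names are illustrative)."""
    grade: int
    recorded_on: date  # corresponds to a 'date of score' style element

    def __post_init__(self):
        # Reject anything that is not a defined grade, so the value stays queryable.
        if self.grade not in ECOG_GRADES:
            raise ValueError(f"ECOG grade must be one of {sorted(ECOG_GRADES)}")

    def description(self) -> str:
        return ECOG_GRADES[self.grade]


# 'Patient is bedridden' as a structured value; the date is a placeholder.
score = EcogScore(grade=4, recorded_on=date(2013, 1, 1))
```

A feasibility query can then filter on `grade` directly instead of searching notes for wording such as "bedridden".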

Table 3 Data elements of the wish list

Data group | Data items
Findings | QTc interval, Left ventricular ejection fraction
Laboratory findings | MAGE-A3 status
Medical history | Trial title, Inclusion date, End of participation date, Current method of contraception, Vaccines, HIV status, Lactation
Patient characteristics | Willingness to participate in clinical trials
Scores/classifications | Date of score/Classifications, Karnofsky-score, Eastern Cooperative Oncology Group - performance status, TNM-classification, New York Heart Association - status, Response Evaluation Criteria in Solid Tumors, Hoehn and Yahr scale, GRID-Hamilton Depression Rating Scale, Mini-Mental State Examination, Unified Parkinson’s Disease Rating Scale Section 1

The wish list contains data elements that are currently not or very rarely available in European EHRs but that are frequently requested in study protocols.


To create a valid and sponsor (EFPIA or IIT)-accepted Data Inventory, we decided to compile the list through an iterative and consensus-driven process with partners from academia and strong participation from domain experts of European pharmaceutical companies.
The EFPIA partners in the project are among the largest researching pharmaceutical companies in Europe, with many studies each year. This way both sides added their perspectives and increased the acceptance of such a list. The international character of the focus group and the validation at several university hospitals make the data inventory meaningful beyond national borders. No references on settings of average European hospitals were found, so whether the data exports at non-university hospitals would have resulted in similar availability numbers cannot be stated. An iterative approach was chosen as a pragmatic way to see if our method was feasible and to improve single steps of the process as we went along. One example is that we verified the availability of data elements in the first data export in percentage groups (6: 100%; 5: <100% to 75%; 4: <75% to 50%; 3: <50% to 25%; 2: <25% to 10%; 1: <10% to >0%; 0: 0%; NA: not available), while in the second round exact relative percentages (for example 98%) were requested. Each iteration included the simplification and identification of data elements from eligibility criteria of study protocols. This was done by feasibility and recruitment specialists from the pharmaceutical companies themselves. Analysis and processing of eligibility criteria has been done by other groups [12,16,17], but our aim was not to follow a specific representation or create a new format. We wanted to extract the most important information out of those free text criteria and display it in a simple, comprehensible way for all stakeholders. By doing so we were able to identify the underlying data elements and add new ones to the Data Inventory.
The subsequent validation exports at eleven data provider sites with varying disease-specific focuses were either done on whole EHRs or on subsystems of the hospitals, depending on available data sources. This is also the reason why some sites have many black cells in Figure 3, when they used a specialized departmental subsystem instead of the whole EHR. A bias by those sites that used specific subsystems can therefore not be excluded, but while general elements like ‘date of birth’ were not negatively affected, disease- or gender-specific elements were influenced positively. ‘Medical history/menopausal status’ was, for example, seldom documented in the majority of the systems but was always available in a breast cancer system.

Figure 3 Heat map of the data exports from the data inventory, current version. The first two columns describe the ISO 11179 data element concept (data group/data item). The third column shows the average usage of the data element over all sites, while the following columns (site 1 to site 9) display the frequency at the individual sites. The Data Inventory is ordered by the average usage, sorted in descending order from most available to least. The frequency ranges from 100% (dark green) to 0% (dark red). Data elements that are not available at a site are shown as Not Available (NA) (black).

Because the Data Inventory contains data elements with clear definitions, it can be used as a reference for important data elements when new forms are created in EHRs. Through both lists, the Data Inventory and the wish list, it is clearer what to expect if EHRs are to be used for clinical research. In general, EHR systems could be accredited for their compliance with catalogs of important data elements in the future. This could demonstrate that the respective product is more suited to support secondary use of health care data for clinical research than non-compliant EHR systems.
The simplification of eligibility criteria was a manual task focused on clinical trial feasibility. It is possible that different criteria would have been identified by other people. The Data Inventory was created as part of the EHR4CR project and therefore the studies were selected to include each company and each data provider site. This means that other studies, other disease areas and different companies might also have resulted in different data elements. However, given the large number of involved countries, hospitals and trial experts, this Data Inventory represents an important consensus. Given that EHR4CR covers six major disease areas, we assume that the Data Inventory will in general cover a large part of clinical studies.
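The percentage groups used in the first export round, as listed in this section, can be written as a small mapping function. This is a sketch under assumptions: the function name is hypothetical, `None` represents "not available", and treating the lower bound of each group as inclusive (for example, exactly 75% falls into group 5) is one reading of the group definitions.

```python
def percentage_group(usage):
    """Map a relative usage in percent to a first-round group label.

    None means the element is not available at the site (NA).
    Lower bounds are treated as inclusive, which is an assumption.
    """
    if usage is None:
        return "NA"
    if not 0 <= usage <= 100:
        raise ValueError("usage must be between 0 and 100 percent")
    if usage == 100:
        return "6"
    if usage >= 75:
        return "5"
    if usage >= 50:
        return "4"
    if usage >= 25:
        return "3"
    if usage >= 10:
        return "2"
    if usage > 0:
        return "1"
    return "0"


# The exact value requested in the second round (for example 98%)
# would have been reported only as group 5 in the first round.
group_for_98 = percentage_group(98.0)
```

The example shows why exact relative percentages were requested in the second round: the grouped value hides the difference between, say, 76% and 98%.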

Related work
In the following, we compare the Data Inventory against work that relates to ours.
Weintraub et al. [18] compiled a list of 100 cardiovascular ‘data fields’ that were identified from existing data standards as being a ‘base set of terms with maximal value’ to specified criteria. The list is intended to be used in EHRs and to facilitate secondary use, but was not validated with data exports from EHR systems.
A comparison of the data fields and the data elements of the Data Inventory showed that, because of the different scope and definitions of data elements, both lists cannot readily be compared. Out of the 100 data fields, 20 exactly match data elements of the Data Inventory, while 46 are not directly captured in the Data Inventory but would rather be values of our data elements, and 34 do not match at all. An example of a data field that would be a value in the Data Inventory is ‘diabetes’, which we would consider a value of the data element ‘Diagnosis/text’. Table 4 shows in more detail how the data fields correspond to the data elements.
In contrast to our work, the ‘key data elements of a base cardiovascular vocabulary’ describe elements that should be documented in EHRs to support the exchange of information throughout care, while the Data Inventory is a catalog of available data elements in EHRs that are important for clinical research.

Table 4 Comparison of the Data Inventory with US cardiovascular data fields [18]

Number of data fields matching data elements of the Data Inventory | Exact match | Data field as value of a data element | No match
History and physical examination elements | 8 | 24 | 5
Pharmacological therapy data elements | 0 | 20 | 0
Laboratory results elements | 10 | 0 | 1
Diagnostic and therapeutic procedures elements | 2 | 2 | 26
Outcomes data elements | 0 | 0 | 2

Exact matches are available in both lists; no match means that the data field is not represented in the Data Inventory; and ‘data field as value of a data element’ means that data fields can be matched to data elements because they refer to similar concepts (for example, data field ‘Diabetes’ corresponds to data element ‘Diagnosis/Text’).

Häyrinen et al. [19] describe the core data elements that were introduced in Finland for a national electronic health record. Similar to our approach, Häyrinen et al. defined a list of data elements, added a definition to each item and furthermore added the terminology or code systems that should be used if suitable systems were existent. Häyrinen’s list contains elements that would be implemented for the national EHR. In contrast, for our Data Inventory we identified elements that are currently documented in EHRs. A comparison of the ‘core data elements’ and the data elements of the Data Inventory showed several similarities; for example, ‘health problems and diagnosis’ should use ICD-10 or ICPC (International Classification of Primary Care) codes, which correlates to ‘diagnosis/code’ of the Data Inventory. We did not specify which classification should be used, but ICD-10 or ICPC codes would be values of this data element as well.
Weng et al. [12] and Luo et al. [13] describe in related publications a semi-automatic approach that allows annotating free text eligibility criteria using semantic representation. In contrast to our expert-driven simplification approach - intended to reduce complexity with a focus on trial feasibility - those methods aim at semi-automatically extracting the complete information out of free text. Because the approaches differ, the overlap with our Data Inventory is only partial. Out of the 27 semantic classes from the Weng and Luo publications, only ‘Age’ and ‘Gender’ match directly, ten classes correspond to one or more data groups, and 15 are not represented in the Data Inventory at all. Table 5 shows how the semantic classes correspond to the data groups.
Köpcke et al. [20] describe in their work the data completeness of five German hospital EHRs from data elements of 15 studies, re-using Luo’s semantic classes. Köpcke’s and our work show similar tendencies for completeness and usage. For example, age and gender are highly available and used in both lists, whereas information with respect to pregnancies is not so readily available in either. Although [20] is focused on patient recruitment and the Data Inventory on feasibility, the tendencies of availability and usage are similar.
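The totals quoted for the comparison with the cardiovascular data fields can be cross-checked against the rows of Table 4. The snippet below is only an arithmetic check; the dictionary layout and variable names are chosen for illustration.

```python
# Rows of Table 4: (exact match, data field as value of a data element, no match).
TABLE_4 = {
    "History and physical examination elements": (8, 24, 5),
    "Pharmacological therapy data elements": (0, 20, 0),
    "Laboratory results elements": (10, 0, 1),
    "Diagnostic and therapeutic procedures elements": (2, 2, 26),
    "Outcomes data elements": (0, 0, 2),
}

# Sum each column across all categories.
exact, as_value, no_match = (sum(column) for column in zip(*TABLE_4.values()))

print(exact, as_value, no_match)      # 20 46 34
print(exact + as_value + no_match)    # 100 data fields in total
```

The column sums agree with the 20 exact matches, 46 value-level matches and 34 non-matches out of 100 data fields stated in the text.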

Lessons learned
Eligibility criteria in clinical trial protocols are usually described using long and complicated free text sentences, which cannot readily be used for further processing. Through a process of simplifying the criteria, the information content can be reduced or split up into several parts until single data elements are left, which can be represented in a formal, consistent way. When doing the simplification, we also identified a difference between required data elements for trial feasibility and recruitment. While the criteria for feasibility are fewer in number and more general, data elements for recruitment have to be more precise. From the experience gained in the simplification task, best practice principles for simplifying eligibility criteria [8] were created. They describe how eligibility criteria should be formulated to be clearly understandable and computer readable with little additional effort. When comparing the Data Inventory with billing data, in particular DRG data [21], one can see that EHRs nowadays already contain more data elements that can be re-used for research than just diagnosis and procedure codes: laboratory findings, for example.
Weiskopf and Weng [22] identified five common dimensions of data quality by reviewing 95 articles (completeness, correctness, plausibility, concordance and currency). Evaluating correctness, plausibility and currency is a laborious task which requires medical knowledge and can therefore not be automated readily. To investigate correctness, for example, patient charts would have to be checked manually to determine if the documented data are correct. A mapping between the data export and the elements in the EHRs was done with knowledge of the local project partners; the concordance, however, was not evaluated in detail. Through the data exports we did, however, capture the availability of the data elements (completeness), as is shown in Figure 2. During the creation of the Data Inventory we have seen in several instances that data quality is a critical issue. Automatic transfer of EHR data into an electronic data capture system, for example, can only be performed if the data quality is ‘high enough’. The aim of the inventory, however, was not to address these issues but to obtain an overview of what is available and what is required.

Limitations
Certain disease areas (for example, diabetes or inflammatory diseases) are not yet fully covered by the Data Inventory (see Table 2). The current version is focused on site feasibility, so data elements for patient recruitment or clinical trial execution are not considered. Our approach was to create a global Data Inventory based on relevant elements for research that are available in EHRs. It does not take into account varying documentation needs of different disease areas, but should be more easily implemented than several disease-specific data inventories.

Table 5 Semantic classes from Weng [12]/Luo [13] corresponding to the data groups

Data groups (this work) | Semantic classes (Weng et al./Luo et al.)
Diagnosis | Disease, Symptom and signs, Neoplasm status
Procedures | Therapy or surgery, Diagnostic or lab results
Laboratory findings | Diagnostic or lab results
Findings | Diagnostic or lab results
Medical history | Pregnancy-related activity, Addictive behavior
Scores and Classification | Neoplasm status, Disease stage
Medication | Pharmaceutical substance or drug
Demographics | Age, Gender

Comparison between data groups of this work and semantic classes according to Weng [12]/Luo [13]. Some of the semantic classes are listed more than once because they correspond to more than one data group. Similarly, one data group can correspond to one or more semantic classes.
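The many-to-many relation described for Table 5 can be made explicit by inverting the table. This is an illustrative sketch: the dictionary reproduces the table rows, and treating ‘Disease, symptom and signs’ as a single semantic class is an assumption, because the separators between class names were lost in the extracted table.

```python
# Table 5 as a mapping from data group to semantic classes.
# Assumption: "Disease, symptom and signs" is one class name.
TABLE_5 = {
    "Diagnosis": ["Disease, symptom and signs", "Neoplasm status"],
    "Procedures": ["Therapy or surgery", "Diagnostic or lab results"],
    "Laboratory findings": ["Diagnostic or lab results"],
    "Findings": ["Diagnostic or lab results"],
    "Medical history": ["Pregnancy-related activity", "Addictive behavior"],
    "Scores and Classification": ["Neoplasm status", "Disease stage"],
    "Medication": ["Pharmaceutical substance or drug"],
    "Demographics": ["Age", "Gender"],
}


def groups_per_class(table):
    """Invert the table: semantic class -> data groups it corresponds to."""
    inverse = {}
    for group, classes in table.items():
        for semantic_class in classes:
            inverse.setdefault(semantic_class, []).append(group)
    return inverse


inverse = groups_per_class(TABLE_5)

# Semantic classes that correspond to more than one data group.
shared = {cls: groups for cls, groups in inverse.items() if len(groups) > 1}
```

In this reading, ‘Diagnostic or lab results’ and ‘Neoplasm status’ are the classes that appear under more than one data group.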

Outlook
We expect the Data Inventory to be constantly evolving based upon future releases of EHR systems and analyses of more trials from different disease areas. In a review of 89 papers, Häyrinen [23] reported that use of information systems leads to more complete and detailed documentation and that structured data entry increases completeness and accuracy of data. Future work will focus on appropriate methods and procedures to further improve EHR data completeness and to include those data elements in routine care that are currently missing. Scope as well as benefits and costs need to be taken into account when aiming to include new elements in the routine documentation. The data quality of EHRs in general, not only the completeness, will need to be analyzed to ensure that the data can be used for clinical studies.

Conclusion
Today, EHR systems already provide many data elements that can be used for feasibility analysis of clinical studies. An inventory of elements was created in a combined effort between experts from pharmaceutical companies and academic sites. It provides a common set of data elements that are frequently used in clinical research and at the same time available for re-use from current hospital information systems.

Additional file

Additional file 1: Complete Data Inventory with all data elements. The Data Inventory contains data elements for feasibility analysis that were extracted from clinical trial protocols and that were verified to be available in European EHR systems. The inventory contains data element concepts (data group + data item), optional examples, the definitions and links to NCIm.

Abbreviations
DRG: Diagnosis related group; ECOG: Eastern Cooperative Oncology Group; EFPIA: European Federation of Pharmaceutical Industries and Associations; EHR4CR: Electronic Health Records for Clinical Research; EHR: Electronic health record; ICD10: International Statistical Classification of Diseases and Related Health Problems, 10th Revision; IIT: Investigator initiated trial; IMI: Innovative Medicine Initiative; LOINC: Logical Observation Identifiers Names and Codes; MMSE: Mini-Mental State Examination; NCIm: NCI Metathesaurus; UMLS: Unified Medical Language System.

Competing interests
The authors declare that they have no competing interests.

Authors’ contributions
JD helped with the simplification, was part of the peer review group, collected the data, analyzed it and wrote the manuscript. MD was part of the peer review group and helped to draft the manuscript. FF helped with the simplification, was part of the peer review group, supervised the methodological approach and helped to draft the manuscript. All authors read and approved the final manuscript.

Acknowledgments
We would like to thank all EHR4CR work package 7 members who helped in creating the Data Inventory and who contributed to the simplification of eligibility criteria. We would like to especially thank Anouk Deruaz, Silke Ewald, Kerstin Holzapfel, Andy Sykes, Nadine Ulliac-Sagnes and Christel Wouters, who were part of the peer group. Also, we would like to thank all the project partners who contributed to the data exports, especially Dionisio Acosta-Mena, Bernhard Breil, Marc Cuggia, James Cunningham, Thomas Ganslandt, Sharon Kean, Sebastian Mate, Mark McGilchrist, Cezary Szmigielski and Eric Zapletal.
We would also like to thank Bartholomäus Kahl, who helped in the creation of the first version of the Data Inventory, and Dipak Kalra, who provided critical review of the draft manuscript.
The research leading to these results has received support from the Innovative Medicines Initiative Joint Undertaking under grant agreement number 115189, resources of which are composed of financial contribution from the European Union’s Seventh Framework Program (FP7/2007-2013) and EFPIA companies’ in kind contribution.

Author details
1Institute of Medical Informatics, University Münster, Albert-Schweitzer-Campus 1/A11, D-48149 Münster, Germany. 2Integrated Information Sciences, Development, Novartis Pharma AG, CH-4002 Basel, Switzerland.

Received: 5 August 2013. Accepted: 13 December 2013. Published: 10 January 2014.

References
1. McDonald AM, Knight RC, Campbell MK, Entwistle VA, Grant AM, Cook JA, Elbourne DR, Francis D, Garcia J, Roberts I, Snowdon C: What influences recruitment to randomized controlled trials? A review of trials funded by two UK funding agencies. Trials 2006, 7:9.
2. Van der Wouden JC, Blankenstein AH, Huibers MJH, van der Windt DAWM, Stalman WAB, Verhagen AP: Survey among 78 studies showed that Lasagna’s law holds in Dutch primary care research. J Clin Epidemiol 2007, 60:819–824.
3. Dugas M, Lange M, Müller-Tidow C, Kirchhof P, Prokosch H-U: Routine data from hospital information systems can support patient recruitment for clinical studies. Clinical Trials (London, England) 2010, 7:183–189.
4. Innovative Medicines Initiative. [http://www.imi.europa.eu] Last accessed November 2013.
5. Electronic Health Records for Clinical Research. [www.ehr4cr.eu] Last accessed 15 November 2013.
6. International Organization for Standardization/International Electrotechnical Commission: ISO/IEC 11179, Information Technology - Metadata Registries (MDR) - Part 1: Framework; 2004. [http://standards.iso.org/ittf/PubliclyAvailableStandards/c035343_ISO_IEC_11179-1_2004(E).zip] Last accessed 29 December 2013.
7. Unified Medical Language System. [http://www.nlm.nih.gov/research/umls] Last accessed November 2013.
8. Doods J, Holzapfel K, Dugas M, Fritz F: Development of best practice principles for simplifying eligibility criteria. Stud Health Technol Inform 2013, 192:1153.
9. Microsoft. [www.microsoft.com] Last accessed 29 December 2013.
10. NCIm CUI. [http://ncimeta.nci.nih.gov/ncimbrowser/ConceptReport.jsp?dictionary=NCIMetaThesaurus&code=C2348662] Last accessed November 2013.
11. NCI Metathesaurus. [http://ncimeta.nci.nih.gov/ncimbrowser] Last accessed November 2013.
12. Weng C, Wu X, Luo Z, Boland MR, Theodoratos D, Johnson SB: EliXR: an approach to eligibility criteria extraction and representation. J Am Med Inform Assoc 2011, 18(Suppl 1):i116–i124.
13. Luo Z, Yetisgen-Yildiz M, Weng C: Dynamic categorization of clinical research eligibility criteria by hierarchical clustering. J Biomed Inform 2011, 44:927–935.
14. Ammenwerth E, Spötl H-P: The time needed for clinical documentation versus direct patient care. Methods Inf Med 2009, 48:84–91.
15. Tipping M, Forth V: Where did the day go? - A time-motion study of hospitalists. J Hospital 2010, 5:323–328.
16. Tu S, Peleg M, Carini S, Rubin D, Sim I: ERGO: a template-based expression language for encoding eligibility criteria. In Technical report; 2008.
17. Wang SJ, Ohno-Machado L, Mar P, Boxwala AA, Greenes RA: Enhancing Arden Syntax for clinical trial eligibility criteria. 1999:8280.
18. Weintraub WS, Karlsberg RP, Tcheng JE, Boris JR, Buxton AE, Dove JT, Fonarow GC, Goldberg LR, Heidenreich P, Hendel RC, Jacobs AK, Lewis W, Mirro MJ, Shahian DM, Bozkurt B, Jacobs JP, Peterson PN, Roger VL, Smith EE, Wang T: ACCF/AHA 2011 key data elements and definitions of a base cardiovascular vocabulary for electronic health records: a report of the American College of Cardiology Foundation/American Heart Association Task Force on Clinical Data Standards. Circulation 2011, 124:103–123.
19. Häyrinen K, Saranto K: The core data elements of electronic health record in Finland. Stud Health Technol Inform 2005, 116:131–136.
20. Köpcke F, Trinczek B, Majeed RW, Schreiweis B, Wenk J, Leusch T, Ganslandt T, Ohmann C, Bergh B, Röhrig R, Dugas M, Prokosch HU: Evaluation of data completeness in the electronic health record for the purpose of patient recruitment into clinical trials: a retrospective analysis of element presence. BMC Med Inform Decis Mak 2013, 13:37.
21. Schreyögg J, Stargardt T, Tiemann O, Busse R: Methods to determine reimbursement rates for diagnosis related groups (DRG): a comparison of nine European countries. Health Care Manag Sci 2006, 9:215–223.
22. Weiskopf NG, Weng C: Methods and dimensions of electronic health record data quality assessment: enabling reuse for clinical research. J Am Med Inform Assoc 2013, 20:144–151.
23. Häyrinen K, Saranto K, Nykänen P: Definition, structure, content, use and impacts of electronic health records: a review of the research literature. Int J Med Inform 2008, 77:291–304.

doi:10.1186/1745-6215-15-18
Cite this article as: Doods et al.: A European inventory of common electronic health record data elements for clinical trial feasibility. Trials 2014, 15:18.


source systems. To distinguish between data elements that are not available in EHRs and those that are simply not documented - both could in theory be represented as ‘0’ - availability and frequency of a data element were captured separately. To avoid privacy concerns and allow for comparability between sites, the relative percentage of each element was captured instead of absolute numbers. Relative percentages were calculated by first identifying how many patients had an entry in the EHR for each data element and then dividing that number by the number of all patients seen in the respective time frame. These exports were then analyzed by creating rankings and heat maps displaying the general availability and usage of the elements by using different colors. Microsoft Excel [9] was used for the analysis and creation of the heat maps. The heat maps were created using the conditional formatting feature.
For each iteration, consensus on the data elements in the Data Inventory was reached, for example splitting the element ‘Child bearing status’ up into four separate elements (‘Currently pregnant’, ‘Pregnancy number’, ‘Menopausal status’, ‘Lactation’). Other decisions were that elements were moved to a separate list, referred to as the ‘wish list’, because they are not available or hardly used at any of the sites.
Each of the data elements was manually reviewed by a peer group of ten pharmaceutical and informatics experts. The background of the pharmaceutical experts is in feasibility assessment/management, drug safety, data management/analytics and clinical operations. The review was needed to determine whether the elements were viable for the feasibility stage and whether the meaning of the element name was clear to all peer group members. Once a common understanding was reached, definitions were identified and added to each data element.
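The calculation of relative percentages and the separation of "not available" from "available but not documented" can be sketched as follows. The function names and the use of `None` for NA are assumptions for illustration; the exports themselves were analyzed in Microsoft Excel.

```python
def relative_percentage(patients_with_entry, patients_seen):
    """Share of patients with an entry for a data element, in percent."""
    if patients_seen <= 0:
        raise ValueError("patients_seen must be positive")
    if not 0 <= patients_with_entry <= patients_seen:
        raise ValueError("patients_with_entry must be between 0 and patients_seen")
    return 100.0 * patients_with_entry / patients_seen


def export_value(element_available, patients_with_entry, patients_seen):
    """Keep 'not available' (None) distinct from 'never documented' (0.0)."""
    if not element_available:
        return None  # the source system has no such element (NA)
    return relative_percentage(patients_with_entry, patients_seen)


# Hypothetical counts for one site and time frame.
documented = export_value(True, 49, 50)      # 98.0
never_used = export_value(True, 0, 50)       # 0.0: element exists, no entries
not_available = export_value(False, 0, 50)   # None: element does not exist
```

Reporting only the percentage, rather than the two absolute counts, keeps patient numbers out of the export while still allowing comparison between sites.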

ResultsIn the following section the Data Inventory in its currentversion is described as well as the wish list and result fromthe latest data exports

Data groupsThe data groups define the context of the data elementsData groups which a data element can belong to aredemographics medical history diagnosis procedure

findings laboratory findings medication scores andclassifications or patient characteristics

Data InventoryThe Data Inventory in its current version is composedof 75 elements It consists of 5 demographics 4 diagno-sis 7 findings 41 laboratory findings 8 medical history7 medication and 3 procedure data elements The defin-ition of each element contains a link to the correspond-ing UMLS Concept Unique Identifiers [10] at the NCIMetathesaurus (NCIm) [11] and a textual description Incase the NCIm reference did not contain a textual defin-ition a suitable one was created by the expert groupAn overview of examples from the Data Inventory

containing data elements from each data group can beseen in Figure 2 The whole Data Inventory can be foundin the additional material (Additional file 1)

Wish listData elements that showed through the data exports tobe not available or hardly documented were removedfrom the Data Inventory and put on a separate list whichcontains 21 data elements The rarely available data ele-ments in EHRs are listed in Table 3

Figure 2 Examples from the data inventory. Each element in the data inventory contains a sequential number, the data group and item, an example of a possible value, the textual definition and the corresponding NCIm link.

Doods et al. Trials 2014, 15:18
http://www.trialsjournal.com/content/15/1/18

Availability of data elements
Data exports captured the availability and the usage of the data elements. Elements which are highly available are from the data groups demographics, diagnosis, procedures and the majority of the laboratory findings. Rarely documented are the elements from the groups ‘scores and classifications’ and medical history, with the exception of allergies and hypersensitivity reactions. Medication and findings data elements are generally available, but not in all of the systems.
The color-coded heat map (Figure 3) gives an overview of the general availability of the data elements of the Data Inventory. The six least available data elements in this figure were moved to the wish list after the analysis of the heat map.
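The ranking behind the heat map can be illustrated with a short sketch: average each element's usage over the sites and sort in descending order. This is not the project's actual export tooling. The function name, the sample values and the site count are hypothetical, and excluding 'not available' sites from the average is an assumption, since the text does not state how NA entries were treated.

```python
from statistics import mean
from typing import Dict, List, Optional, Tuple

# Usage per site in percent; None stands for "Not Available (NA)" at that site.
Export = Dict[str, List[Optional[float]]]


def rank_by_average(export: Export) -> List[Tuple[str, Optional[float]]]:
    """Return (element, average usage) pairs, most available first.

    Assumption: NA sites are left out of the average; elements that are
    NA at every site are placed at the end of the ranking.
    """
    ranked = []
    for element, values in export.items():
        reported = [v for v in values if v is not None]
        average = mean(reported) if reported else None
        ranked.append((element, average))
    ranked.sort(key=lambda pair: (pair[1] is None, -(pair[1] or 0.0)))
    return ranked


# Hypothetical export for three sites (values are made up for illustration).
SAMPLE: Export = {
    "Medical history / Menopausal status": [5.0, None, 100.0],
    "Scores and classifications / ECOG": [0.0, None, 2.0],
    "Demographics / Date of birth": [100.0, 100.0, 100.0],
}

RANKED = rank_by_average(SAMPLE)
```

With a ranking of this kind, the least available elements are simply the tail of the list, which is where candidates for the wish list would be read off.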

Discussion
The overall goal of this work was to identify data elements that are needed for site feasibility analysis in clinical studies and are at the same time commonly documented in European EHR systems. The heat map was created to determine the availability across the data provider sites. The coloring of each cell was considered a good means to give an overview of how frequently each element is documented and especially to highlight those which are generally not used. Widely available data elements are from the data groups demographics, diagnosis, procedures and laboratory findings. Under-documented elements are those captured in the wish list, which are from the groups ‘scores and classifications’ and medical history. We chose to use our own groups instead of Weng et al./Luo et al.’s [12,13] semantic classes because our focus was not only on clinical trials but also on EHRs. With the data groups we also wanted to indicate where data could be found in EHRs. For example, procedures are used in Europe for Diagnosis Related Groups (DRG). Consequently, diagnostic and therapeutic procedures would be covered by the same data group. Each element was reviewed by feasibility and recruitment experts and contains a definition and a link to the NCIm to ensure a clear understanding and avoid confusion about the exact meaning. Value lists, for example diagnoses that are frequently used for the project’s disease areas, are not specified and are out of scope of this work; therefore they were not taken into consideration. The focus is on availability and frequency of data elements at EHR4CR pilot sites, so only examples for values are given. The high availability of the elements from the aforementioned data groups is most likely because they are needed for reimbursement and quality management. Laboratories started structuring their data very early because in most cases special laboratory information systems are connected to the main EHR. Because the data have to be available there, laboratory findings are also highly available. The exact reasons, however, were not investigated. Despite the fact that a lot of laboratory findings are available in EHRs, many laboratories do not yet use standard terminologies like Logical Observation Identifiers Names and Codes (LOINC). Classifications like the International Statistical Classification of Diseases and Related Health Problems, 10th revision (ICD10), on the other hand, are standard in European EHRs. That is also the reason why we used general data elements for diagnoses and procedures, because it is generally easier to identify diagnosis and procedures data in EHRs than it is to find non-standardized data elements.
Data elements were ranked according to their availability in the exports so that elements of low availability could be identified. The wish list contains those additional data elements that are relevant for clinical research but not documented in a structured manner during routine care. To enhance secondary use of data for clinical research, EHR systems could be extended to allow a structured documentation of these elements, like the Eastern Cooperative Oncology Group (ECOG) performance status. This would not necessarily result in more documentation work, but rather in a different representation of the same content. Instead of the free text ‘patient is bedridden’, an ECOG score of 4 could be documented. We assume that scores like the Mini-Mental State Examination (MMSE) and medical history elements like ‘current method of contraception’ are primarily documented for research purposes. There is generally no direct incentive for documentation of intensive scales with many data elements, especially considering that physicians already spend equal or more time on patient documentation than on direct patient care [14,15]. This might indicate why those elements dropped out of the Data Inventory and into the wish list. Likewise, this might be the case for the medical history data elements ‘Trial title’, ‘Inclusion date’ and ‘End of participation date’, which are also related to research. We further assume that data elements from the data group medical history are more frequently documented than our exports show, but most likely in free text and not as structured data. Natural language processing is not within the scope of EHR4CR and therefore only structured data were used. The number of data elements that can be found in free text was not further investigated and is a subject for future research.

Table 3 Data elements of the wish list

Data group | Data items
Findings | QTc interval; Left ventricular ejection fraction
Laboratory findings | MAGE-A3 status
Medical history | Trial title; Inclusion date; End of participation date; Current method of contraception; Vaccines; HIV status; Lactation
Patient characteristics | Willingness to participate in clinical trials
Scores/classifications | Date of score/classifications; Karnofsky-score; Eastern Cooperative Oncology Group - performance status; TNM-classification; New York Heart Association - status; Response Evaluation Criteria in Solid Tumors; Hoehn and Yahr scale; GRID-Hamilton Depression Rating Scale; Mini-Mental State Examination; Unified Parkinson’s Disease Rating Scale, Section 1

The wish list contains data elements that are currently not or very rarely available in European EHRs, but that are frequently requested in study protocols.


Figure 3 Heat map of the data exports from the data inventory, current version. The first two columns describe the ISO 11179 data element concept (data group/data item). The third column shows the average usage of the data element over all sites, while the following columns (site 1 to site 9) display the frequency at the individual sites. The Data Inventory is ordered by the average usage, sorted in descending order from most available to least. The frequency ranges from 100% (dark green) to 0% (dark red). Data elements that are not available at a site are shown as Not Available (NA) (black).

To create a valid and sponsor (EFPIA or IIT)-accepted Data Inventory, we decided to compile the list through an iterative and consensus-driven process with partners from academia and strong participation from domain experts of European pharmaceutical companies. The EFPIA partners in the project are among the largest researching pharmaceutical companies in Europe, with many studies each year. This way both sides added their perspectives and increased the acceptance of such a list. The international character of the focus group and the validation at several university hospitals make the data inventory meaningful beyond national borders. No references on settings of average European hospitals were found, so whether the data exports at non-university hospitals would have resulted in similar availability numbers cannot be stated. An iterative approach was chosen as a pragmatic way to see if our method was feasible and to improve single steps of the process as we went along. One example is that we verified the availability of data elements in the first data export in percentage groups (6: 100%; 5: <100% to 75%; 4: <75% to 50%; 3: <50% to 25%; 2: <25% to 10%; 1: <10% to >0%; 0: 0%; NA: not available), while in the second round exact relative percentages (for example, 98%) were requested. Each iteration included the simplification and identification of data elements from eligibility criteria of study protocols. This was done by feasibility and recruitment specialists from the pharmaceutical companies themselves. Analysis and processing of eligibility criteria has been done by other groups [12,16,17], but our aim was not to follow a specific representation or create a new format. We wanted to extract the most important information out of those free text criteria and display it in a simple, comprehensible way for all stakeholders. By doing so we were able to identify the underlying data elements and add new ones to the Data Inventory.
The subsequent validation exports at eleven data provider sites with varying disease-specific focuses were done either on whole EHRs or on subsystems of the hospitals, depending on available data sources. This is also the reason why some sites have many black cells in Figure 3: they used a specialized departmental subsystem instead of the whole EHR. A bias by those sites that used specific subsystems can therefore not be excluded, but while general elements like ‘date of birth’ were not negatively affected, disease- or gender-specific elements were influenced positively. ‘Medical history/menopausal status’ was, for example, seldom documented in the majority of the systems, but was always available in a breast cancer system.
Because the Data Inventory contains data elements with clear definitions, it can be used as a reference for important data elements when new forms are created in EHRs. Through both lists, the Data Inventory and the wish list, it is clearer what to expect if EHRs are to be used for clinical research. In general, EHR systems could be accredited for their compliance with catalogs of important data elements in the future. This could demonstrate that the respective product is more suited to support secondary use of health care data for clinical research than non-compliant EHR systems.
The simplification of eligibility criteria was a manual task focused on clinical trial feasibility. It is possible that different criteria would have been identified by other people. The Data Inventory was created as part of the EHR4CR project and therefore the studies were selected to include each company and each data provider site. This means that other studies, other disease areas and different companies might also have resulted in different data elements. However, given the large number of involved countries, hospitals and trial experts, this Data Inventory represents an important consensus. Given that EHR4CR covers six major disease areas, we assume that the Data Inventory will in general cover a large part of clinical studies.
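The percentage groups used in the first data export can be written down as a simple mapping from an exact percentage to a group number. The sketch below is an illustration only: the function name is hypothetical, and treating each lower bound as inclusive (so that 75% falls into group 5) is an assumption read off from the group boundaries given in the Discussion.

```python
from typing import Optional, Union


def percentage_group(pct: Optional[float]) -> Union[int, str]:
    """Map an availability percentage to the first-round group (6..0 or 'NA').

    Groups: 6 = 100; 5 = <100 to 75; 4 = <75 to 50; 3 = <50 to 25;
    2 = <25 to 10; 1 = <10 to >0; 0 = 0; 'NA' = not available.
    """
    if pct is None:
        return "NA"
    if not 0 <= pct <= 100:
        raise ValueError(f"percentage out of range: {pct}")
    if pct == 100:
        return 6
    if pct >= 75:
        return 5
    if pct >= 50:
        return 4
    if pct >= 25:
        return 3
    if pct >= 10:
        return 2
    if pct > 0:
        return 1
    return 0
```

Under this reading, the value of 98% mentioned for the second round would have been reported as group 5 in the first round, which shows how much resolution the exact percentages added.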

Related work
In the following we compare the Data Inventory against work that relates to ours.
Weintraub et al. [18] compiled a list of 100 cardiovascular ‘data fields’ that were identified from existing data standards as being a ‘base set of terms with maximal value’ to specified criteria. The list is intended to be used in EHRs and to facilitate secondary use, but was not validated with data exports from EHR systems.
A comparison of the data fields and the data elements of the Data Inventory showed that, because of the different scope and definitions of data elements, both lists cannot readily be compared. Out of the 100 data fields, 20 exactly match data elements of the Data Inventory, while 46 are not directly captured in the Data Inventory but would rather be values of our data elements, and 34 do not match at all. An example of a data field that would be a value in the Data Inventory is ‘diabetes’, which we would consider a value of the data element ‘Diagnosis/text’. Table 4 shows in more detail how the data fields correspond to the data elements.
In contrast to our work, the ‘key data elements of a base cardiovascular vocabulary’ describe elements that should be documented in EHRs to support the exchange of information throughout care, while the Data Inventory is a catalog of available data elements in EHRs that are important for clinical research.
Häyrinen et al. [19] describe the core data elements that were introduced in Finland for a national electronic health record. Similar to our approach, Häyrinen et al. defined a list of data elements, added a definition to each item and furthermore added the terminology or code systems that should be used if suitable systems existed. Häyrinen’s list contains elements that would be implemented for the national EHR. In contrast, for our Data Inventory we identified elements that are currently documented in EHRs. A comparison of the ‘core data elements’ and the data elements of the Data Inventory showed several similarities; for example, ‘health problems and diagnosis’ should use ICD-10 or ICPC (International Classification of Primary Care) codes, which correlates to ‘diagnosis/code’ of the Data Inventory. We did not specify which classification should be used, but ICD-10 or ICPC codes would be values of this data element as well.
Weng et al. [12] and Luo et al. [13] describe in related publications a semi-automatic approach that allows annotating free text eligibility criteria using semantic representation. In contrast to our expert-driven simplification approach - intended to reduce complexity with a focus on trial feasibility - those methods aim at semi-automatically extracting the complete information out of free text. Due to the different approaches there is only partial overlap with our Data Inventory. Out of the 27 semantic classes from the Weng and Luo publications, only ‘Age’ and ‘Gender’ match directly, ten classes correspond to one or more data groups and 15 are not represented in the Data Inventory at all. Table 5 shows how the semantic classes correspond to the data groups.
Köpcke et al. [20] describe in their work the data completeness of five German hospital EHRs for data elements of 15 studies, re-using Luo’s semantic classes. Köpcke’s and our work show similar tendencies for completeness and usage. For example, age and gender are highly available and used in both lists, whereas information with respect to pregnancies is not so readily available in either. Although [20] is focused on patient recruitment and the Data Inventory on feasibility, the tendencies of availability and usage are similar.

Table 4 Comparison of the Data Inventory with US cardiovascular data fields [18]

Number of data fields matching data elements of the Data Inventory | Exact match | Data field as value of a data element | No match
History and physical examination elements | 8 | 24 | 5
Pharmacological therapy data elements | 0 | 20 | 0
Laboratory results elements | 10 | 0 | 1
Diagnostic and therapeutic procedures elements | 2 | 2 | 26
Outcomes data elements | 0 | 0 | 2

Exact matches are available in both lists, no match means that the data field is not represented in the Data Inventory and ‘data field as value of a data element’ means that data fields can be matched to data elements because they refer to similar concepts (for example, data field ‘Diabetes’ corresponds to data element ‘Diagnosis/Text’).
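As a plausibility check on the comparison with the cardiovascular data fields, the per-category counts of Table 4 can be added up and compared with the totals quoted in the text (20 exact matches, 46 data fields as values, 34 without a match). The sketch below only restates the table; the variable and function names are hypothetical.

```python
from typing import Dict, Tuple

# Rows of Table 4: (exact match, data field as value of a data element, no match)
TABLE_4: Dict[str, Tuple[int, int, int]] = {
    "History and physical examination elements": (8, 24, 5),
    "Pharmacological therapy data elements": (0, 20, 0),
    "Laboratory results elements": (10, 0, 1),
    "Diagnostic and therapeutic procedures elements": (2, 2, 26),
    "Outcomes data elements": (0, 0, 2),
}


def column_totals(table: Dict[str, Tuple[int, int, int]]) -> Tuple[int, int, int]:
    """Sum each of the three match columns over all rows."""
    exact = sum(row[0] for row in table.values())
    as_value = sum(row[1] for row in table.values())
    no_match = sum(row[2] for row in table.values())
    return exact, as_value, no_match


TOTALS = column_totals(TABLE_4)
```

The three column totals add up to the 100 data fields of the cardiovascular list, so the table and the running text agree.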

Lessons learned
Eligibility criteria in clinical trial protocols are usually described using long and complicated free text sentences, which cannot readily be used for further processing. Through a process of simplifying the criteria, the information content can be reduced or split up in several parts until single data elements are left, which can be represented in a formal, consistent way. When doing the simplification we also identified a difference between required data elements for trial feasibility and recruitment. While the criteria for feasibility are fewer in number and more general, data elements for recruitment have to be more precise. From the experience gained in the simplification task, best practice principles for simplifying eligibility criteria [8] were created. They describe how eligibility criteria should be formulated to be clearly understandable and computer readable with little additional effort. When comparing the Data Inventory with billing data, in particular DRG data [21], one can see that EHRs nowadays already contain more data elements that can be re-used for research than just diagnosis and procedure codes, laboratory findings for example.
Weiskopf and Weng [22] identified five common dimensions of data quality by reviewing 95 articles (completeness, correctness, plausibility, concordance and currency). Evaluating correctness, plausibility and currency is a laborious task which requires medical knowledge and can therefore not be automated readily. To investigate correctness, for example, patient charts would have to be checked manually to determine if the documented data are correct. A mapping between the data export and the elements in the EHRs was done with knowledge of the local project partners; the concordance, however, was not evaluated in detail. Through the data exports we did, however, capture the availability of the data elements (completeness), as is shown in Figure 3. During the creation of the Data Inventory we have seen in several instances that data quality is a critical issue. Automatic transfer of EHR data into an electronic data capture system, for example, can only be performed if the data quality is ‘high enough’. The aim of the inventory, however, was not to address these issues but to obtain an overview of what is available and what is required.

Limitations
Certain disease areas (for example, diabetes or inflammatory diseases) are not yet fully covered by the Data Inventory (see Table 2). The current version is focused on site feasibility, so data elements for patient recruitment or clinical trial execution are not considered. Our approach was to create a global Data Inventory based on relevant elements for research that are available in EHRs. It does not take into account varying documentation needs of different disease areas, but should be more easily implemented in contrast to several disease-specific data inventories.

Table 5 Semantic classes from Weng [12]/Luo [13] corresponding to the data groups

Data groups (this work) | Semantic classes (Weng et al./Luo et al.)
Diagnosis | Disease, Symptom and signs; Neoplasm status
Procedures | Therapy or surgery; Diagnostic or lab results
Laboratory findings | Diagnostic or lab results
Findings | Diagnostic or lab results
Medical history | Pregnancy-related activity; Addictive behavior
Scores and Classification | Neoplasm status; Disease stage
Medication | Pharmaceutical substance or drug
Demographics | Age; Gender

Comparison between data groups of this work and semantic classes according to Weng [12]/Luo [13]. Some of the semantic classes are listed more than once because they correspond to more than one data group. Similarly, one data group can correspond to one or more semantic classes.
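The correspondence in Table 5 between data groups and the Weng/Luo semantic classes can be held as a plain mapping and inverted to see which semantic classes are listed under more than one data group. This is a sketch for illustration: the names are hypothetical, and reading ‘Disease, Symptom and signs’ as a single semantic class is an assumption about how the flattened table cell splits.

```python
from collections import defaultdict
from typing import Dict, List

# Data group -> semantic classes, as listed in Table 5.
GROUP_TO_CLASSES: Dict[str, List[str]] = {
    "Diagnosis": ["Disease, Symptom and signs", "Neoplasm status"],
    "Procedures": ["Therapy or surgery", "Diagnostic or lab results"],
    "Laboratory findings": ["Diagnostic or lab results"],
    "Findings": ["Diagnostic or lab results"],
    "Medical history": ["Pregnancy-related activity", "Addictive behavior"],
    "Scores and Classification": ["Neoplasm status", "Disease stage"],
    "Medication": ["Pharmaceutical substance or drug"],
    "Demographics": ["Age", "Gender"],
}


def classes_in_multiple_groups(mapping: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert the mapping and keep classes that belong to several data groups."""
    inverse: Dict[str, List[str]] = defaultdict(list)
    for group, classes in mapping.items():
        for semantic_class in classes:
            inverse[semantic_class].append(group)
    return {c: groups for c, groups in inverse.items() if len(groups) > 1}


SHARED = classes_in_multiple_groups(GROUP_TO_CLASSES)
```

In this reading, ‘Diagnostic or lab results’ and ‘Neoplasm status’ are the classes that appear under several data groups, which matches the table note that some semantic classes are listed more than once.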

Outlook
We expect the Data Inventory to be constantly evolving, based upon future releases of EHR systems and analyses of more trials from different disease areas. In a review of 89 papers, Häyrinen [23] reported that use of information systems leads to more complete and detailed documentation and that structured data entry increases completeness and accuracy of data. Future work will focus on appropriate methods and procedures to further improve EHR data completeness and to include those data elements in routine care that are currently missing. Scope as well as benefits and costs need to be taken into account when aiming to include new elements in the routine documentation. The data quality of EHRs in general, not only the completeness, will need to be analyzed to ensure that the data can be used for clinical studies.

Conclusion
Today, EHR systems already provide many data elements that can be used for feasibility analysis of clinical studies. An inventory of elements was created in a combined effort between experts from pharmaceutical companies and academic sites. It provides a common set of data elements that are frequently used in clinical research and at the same time available for re-use from current hospital information systems.

Additional file

Additional file 1: Complete Data Inventory with all data elements. The Data Inventory contains data elements for feasibility analysis that were extracted from clinical trial protocols and that were verified to be available in European EHR systems. The inventory contains data element concepts (data group + data item), optional examples, the definitions and links to NCIm.

Abbreviations
DRG: Diagnosis related group; ECOG: Eastern Cooperative Oncology Group; EFPIA: European Federation of Pharmaceutical Industries and Associations; EHR4CR: Electronic Health Records for Clinical Research; EHR: Electronic health record; ICD10: International Statistical Classification of Diseases and Related Health Problems, 10th Revision; IIT: Investigator initiated trial; IMI: Innovative Medicine Initiative; LOINC: Logical Observation Identifiers Names and Codes; MMSE: Mini-Mental State Examination; NCIm: NCI Metathesaurus; UMLS: Unified Medical Language System.

Competing interests
The authors declare that they have no competing interests.

Authors’ contributions
JD helped with the simplification, was part of the peer review group, collected the data, analyzed it and wrote the manuscript. MD was part of the peer review group and helped to draft the manuscript. FF helped with the simplification, was part of the peer review group, supervised the methodological approach and helped to draft the manuscript. All authors read and approved the final manuscript.

Acknowledgments
We would like to thank all EHR4CR work package 7 members who helped in creating the Data Inventory and who contributed to the simplification of eligibility criteria. We would like to especially thank Anouk Deruaz, Silke Ewald, Kerstin Holzapfel, Andy Sykes, Nadine Ulliac-Sagnes and Christel Wouters, who were part of the peer group. Also, we would like to thank all the project partners who contributed to the data exports, especially Dionisio Acosta-Mena, Bernhard Breil, Marc Cuggia, James Cunningham, Thomas Ganslandt, Sharon Kean, Sebastian Mate, Mark McGilchrist, Cezary Szmigielski and Eric Zapletal.
We would also like to thank Bartholomäus Kahl, who helped in the creation of the first version of the Data Inventory, and Dipak Kalra, who provided critical review of the draft manuscript.
The research leading to these results has received support from the Innovative Medicines Initiative Joint Undertaking under grant agreement number 115189, resources of which are composed of financial contribution from the European Union’s Seventh Framework Program (FP7/2007-2013) and EFPIA companies’ in kind contribution.

Author details
1 Institute of Medical Informatics, University Münster, Albert-Schweitzer-Campus 1/A11, D-48149 Münster, Germany. 2 Integrated Information Sciences, Development, Novartis Pharma AG, CH-4002 Basel, Switzerland.

Received: 5 August 2013. Accepted: 13 December 2013. Published: 10 January 2014.

References
1. McDonald AM, Knight RC, Campbell MK, Entwistle VA, Grant AM, Cook JA, Elbourne DR, Francis D, Garcia J, Roberts I, Snowdon C: What influences recruitment to randomized controlled trials? A review of trials funded by two UK funding agencies. Trials 2006, 7:9.
2. Van der Wouden JC, Blankenstein AH, Huibers MJH, van der Windt DAWM, Stalman WAB, Verhagen AP: Survey among 78 studies showed that Lasagna’s law holds in Dutch primary care research. J Clin Epidemiol 2007, 60:819–824.
3. Dugas M, Lange M, Müller-Tidow C, Kirchhof P, Prokosch H-U: Routine data from hospital information systems can support patient recruitment for clinical studies. Clinical Trials (London, England) 2010, 7:183–189.
4. Innovative Medicines Initiative. [http://www.imi.europa.eu] Last accessed November 2013.
5. Electronic Health Records for Clinical Research. [www.ehr4cr.eu] Last accessed 15.11.2013.
6. International Organization for Standardization/International Electrotechnical Commission: ISO/IEC 11179, Information Technology - Metadata Registries (MDR) - Part 1: Framework. 2004. [http://standards.iso.org/ittf/PubliclyAvailableStandards/c035343_ISO_IEC_11179-1_2004(E).zip] Last accessed 29.12.2013.
7. Unified Medical Language System. [http://www.nlm.nih.gov/research/umls] Last accessed November 2013.
8. Doods J, Holzapfel K, Dugas M, Fritz F: Development of best practice principles for simplifying eligibility criteria. Stud Health Technol Inform 2013, 192:1153.
9. Microsoft. [www.microsoft.com] Last accessed 29.12.2013.
10. NCIm CUI. [http://ncimeta.nci.nih.gov/ncimbrowser/ConceptReport.jsp?dictionary=NCIMetaThesaurus&code=C2348662] Last accessed November 2013.
11. NCI Metathesaurus. [http://ncimeta.nci.nih.gov/ncimbrowser] Last accessed November 2013.
12. Weng C, Wu X, Luo Z, Boland MR, Theodoratos D, Johnson SB: EliXR: an approach to eligibility criteria extraction and representation. J Am Med Inform Assoc 2011, 18(Suppl 1):i116–i124.
13. Luo Z, Yetisgen-Yildiz M, Weng C: Dynamic categorization of clinical research eligibility criteria by hierarchical clustering. J Biomed Inform 2011, 44:927–935.
14. Ammenwerth E, Spötl H-P: The time needed for clinical documentation versus direct patient care. Methods Inf Med 2009, 48:84–91.
15. Tipping M, Forth V: Where did the day go? - A time-motion study of hospitalists. J Hospital 2010, 5:323–328.
16. Tu S, Peleg M, Carini S, Rubin D, Sim I: ERGO: a template-based expression language for encoding eligibility criteria. In Technical report. 2008.
17. Wang SJ, Ohno-Machado L, Mar P, Boxwala AA, Greenes RA: Enhancing Arden Syntax for clinical trial eligibility criteria. 1999:8280.
18. Weintraub WS, Karlsberg RP, Tcheng JE, Boris JR, Buxton AE, Dove JT, Fonarow GC, Goldberg LR, Heidenreich P, Hendel RC, Jacobs AK, Lewis W, Mirro MJ, Shahian DM, Bozkurt B, Jacobs JP, Peterson PN, Roger VL, Smith EE, Wang T: ACCF/AHA 2011 key data elements and definitions of a base cardiovascular vocabulary for electronic health records: a report of the American College of Cardiology Foundation/American Heart Association Task Force on Clinical Data Standards. Circulation 2011, 124:103–123.
19. Häyrinen K, Saranto K: The core data elements of electronic health record in Finland. Stud Health Technol Inform 2005, 116:131–136.
20. Köpcke F, Trinczek B, Majeed RW, Schreiweis B, Wenk J, Leusch T, Ganslandt T, Ohmann C, Bergh B, Röhrig R, Dugas M, Prokosch HU: Evaluation of data completeness in the electronic health record for the purpose of patient recruitment into clinical trials: a retrospective analysis of element presence. BMC Med Inform Decis Mak 2013, 13:37.
21. Schreyögg J, Stargardt T, Tiemann O, Busse R: Methods to determine reimbursement rates for diagnosis related groups (DRG): a comparison of nine European countries. Health Care Manag Sci 2006, 9:215–223.
22. Weiskopf NG, Weng C: Methods and dimensions of electronic health record data quality assessment: enabling reuse for clinical research. J Am Med Inform Assoc 2013, 20:144–151.
23. Häyrinen K, Saranto K, Nykänen P: Definition, structure, content, use and impacts of electronic health records: a review of the research literature. Int J Med Inform 2008, 77:291–304.

doi:10.1186/1745-6215-15-18
Cite this article as: Doods et al.: A European inventory of common electronic health record data elements for clinical trial feasibility. Trials 2014, 15:18.



Data Inventory The six least available data elements inthis figure were moved to the wish list after the analysisof the heat map

DiscussionThe overall goal of this work was to identify data ele-ments that are needed for site feasibility analysis in clin-ical studies and are at the same time commonlydocumented in European EHR systems The heat mapwas created to determine the availability across the dataprovider sites The coloring of each cell was considereda good means to give an overview of how frequentlyeach element is documented and especially highlightthose which are generally not used Widely available dataelements are from data groups demographics diagnosisprocedures and laboratory findings Under-documentedelements are those captured in the wish list which arefrom the groups lsquoscores and classificationsrsquo and medicalhistory We chose to use own groups instead of usingWeng et alLuo et alrsquos [1213] semantic classes becauseour focus was not only on clinical trials but also onEHRs With the data groups we also wanted to indicatewhere data could be found in EHRs For example proce-dures are used in Europe for Diagnosis Related Groups(DRG) Consequently diagnostic and therapeutic proce-dures would be covered by the same data group Eachelement was reviewed by feasibility and recruitment ex-perts and contains a definition and link to the NCIm toensure a clear understanding and avoid confusion of theexact meaning Value lists for example diagnoses thatare frequently used for the projectrsquos disease areas arenot specified and out of scope of this work thereforethey were not taken into consideration The focus is onavailability and frequency of data elements at EHR4CR

pilot sites so only examples for values are given Thehigh availability of the elements from the aforemen-tioned data groups is most likely because they areneeded for reimbursement and quality managementLaboratories have started structuring their data veryearly because in most cases special laboratory informa-tion systems are connected to the main EHR Becausethe data have to be available there laboratory findingsare also highly available The exact reasons howeverwere not investigated Despite the fact that a lot oflaboratory findings are available in EHRs many labora-tories do not yet use standard terminologies like LogicalObservation Identifiers Names and Codes (LOINC)Classifications like the International Statistical Classifica-tion of Diseases and Related Health Problems 10threvision (ICD10) on the other hand are standard inEuropean EHRs That is also the reason why we usedgeneral data elements for diagnoses and procedures be-cause it is generally easier to identify diagnosis and pro-cedures data in EHRs than it is to find non standardizeddata elementsData elements were ranked according to the availability

of the exports so that elements of low availability couldbe identified The wish list contains those additional dataelements that are relevant for clinical research but notdocumented in a structured manner during routine careTo enhance secondary use of data for clinical researchEHR systems could be extended to allow a structureddocumentation of these elements like the Eastern Co-operative Oncology Group (ECOG) This would not ne-cessarily result in more documentation work but ratherin a different representation of the same content Insteadof free text lsquopatient is bedriddenrsquo an ECOG score of 4could be documented We assume that scores like theMini-Mental State Examination (MMSE) and medicalhistory elements like lsquocurrent method of contraceptionrsquoare primarily documented for research purposes There isgenerally no direct incentive for documentation of inten-sive scales with many data elements especially consider-ing that physicians already spend equal or more timeon patient documentation than on direct patient care[1415] This might indicate why those elements droppedout of the Data Inventory and into the wish list Likewisethis might be the case for the medical history dataelements lsquoTrial titlersquo lsquoInclusion datersquo and lsquoEnd of participa-tion datersquo which are also related to research We furtherassume that data elements from the data group medicalhistory are more frequently documented than our ex-ports show but most likely in free text and not as struc-tured data Natural language processing is not within thescope of EHR4CR and therefore only structured datawere used The number of data elements that can befound in free text was not further investigated and is sub-ject for future research

Table 3 Data elements of the wish list

Data group: Data items
Findings: QTc interval; Left ventricular ejection fraction
Laboratory findings: MAGE-A3 status
Medical history: Trial title; Inclusion date; End of participation date; Current method of contraception; Vaccines; HIV status; Lactation
Patient characteristics: Willingness to participate in clinical trials
Scores/classifications: Date of score/Classifications; Karnofsky-score; Eastern Cooperative Oncology Group - performance status; TNM-classification; New York Heart Association - status; Response Evaluation Criteria in Solid Tumors; Hoehn and Yahr scale; GRID-Hamilton Depression Rating Scale; Mini-Mental State Examination; Unified Parkinson’s Disease Rating Scale Section 1

The wish list contains data elements that are currently not, or very rarely, available in European EHRs but that are frequently requested in study protocols.

Doods et al. Trials 2014, 15:18 Page 5 of 10 http://www.trialsjournal.com/content/15/1/18

To create a valid and sponsor (EFPIA or IIT)-accepted Data Inventory, we decided to compile the list through an iterative and consensus-driven process with partners from academia and strong participation from domain experts of European pharmaceutical companies. The EFPIA partners in the project are among the largest researching pharmaceutical companies in Europe, with many studies each year. This way, both sides added their perspectives and increased the acceptance of such a list. The international character of the focus group and the validation at several university hospitals make the Data Inventory meaningful beyond national borders. No references on settings of average European hospitals were found, so whether the data exports at non-university hospitals would have resulted in similar availability numbers cannot be stated. An iterative approach was chosen as a pragmatic way to see if our method was feasible and to improve single steps of the process as we went along. One example is that we verified the availability of data elements in the first data export in percentage groups (6 = 100%; 5 = <100% to 75%; 4 = <75% to 50%; 3 = <50% to 25%; 2 = <25% to 10%; 1 = <10% to >0%; 0 = 0%; NA = not available), while in the second round exact relative percentages (for example, 98%) were requested. Each iteration included the simplification and identification of data elements from eligibility criteria of study protocols. This was done by feasibility and recruitment specialists from the pharmaceutical companies themselves. Analysis and processing of eligibility criteria has been done by other groups [12,16,17], but our aim was not to follow a specific representation or create a new format. We wanted to extract the most important information out of those free text criteria and display it in a simple, comprehensible way for all stakeholders. By doing so we were able to identify the underlying data elements and add new ones to the Data Inventory.

The subsequent validation exports at eleven data provider sites with varying disease-specific focuses were done either on whole EHRs or on subsystems of the hospitals, depending on available data sources. This is also the reason why some sites have many black cells in Figure 3: they used a specialized departmental subsystem instead of the whole EHR. A bias by those sites that used specific subsystems can therefore not be excluded, but while general elements like ‘date of birth’ were not negatively affected, disease- or gender-specific elements were influenced positively. ‘Medical history/menopausal status’ was, for example, seldom documented in the majority of the systems but was always available in a breast cancer system.

Figure 3 Heat map of the data exports for the current version of the Data Inventory. The first two columns describe the ISO 11179 data element concept (data group/data item). The third column shows the average usage of the data element over all sites, while the following columns (site 1 to site 9) display the frequency at the individual sites. The Data Inventory is ordered by the average usage, sorted in descending order from most available to least. The frequency ranges from 100% (dark green) to 0% (dark red). Data elements that are not available at a site are shown as Not Available (NA) (black).

Because the Data Inventory contains data elements with clear definitions, it can be used as a reference for important data elements when new forms are created in EHRs. Through both lists, the Data Inventory and the wish list, it is clearer what to expect if EHRs are to be used for clinical research. In general, EHR systems could be accredited for their compliance with catalogs of important data elements in the future. This could demonstrate that the respective product is more suited to support secondary use of health care data for clinical research than non-compliant EHR systems.

The simplification of eligibility criteria was a manual task focused on clinical trial feasibility. It is possible that different criteria would have been identified by other people. The Data Inventory was created as part of the EHR4CR project, and therefore the studies were selected to include each company and each data provider site. This means that other studies, other disease areas and different companies might also have resulted in different data elements. However, given the large number of involved countries, hospitals and trial experts, this Data Inventory represents an important consensus. Given that EHR4CR covers six major disease areas, we assume that the Data Inventory will in general cover a large part of clinical studies.
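The percentage groups used in the first data export can be written down as a small lookup function. The following Python sketch is only an illustration: the function name is hypothetical, and treating the lower bound of each group (75, 50, 25, 10) as inclusive is an assumption, since the text does not spell out the boundary handling.

```python
def availability_group(percent):
    """Map an availability percentage to a first-round group code (6 to 0) or 'NA'.

    Lower bounds (75, 50, 25, 10) are treated as inclusive, which is an
    assumption made for this sketch.
    """
    if percent is None:
        return "NA"  # element not available at the site
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    if percent == 100:
        return 6
    if percent >= 75:
        return 5
    if percent >= 50:
        return 4
    if percent >= 25:
        return 3
    if percent >= 10:
        return 2
    if percent > 0:
        return 1
    return 0
```

An exact value such as 98% from the second round falls into group 5 under this mapping, which shows how much detail the grouped first-round answers hide.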

Related work
In the following, we compare the Data Inventory against work that relates to ours.

Weintraub et al. [18] compiled a list of 100 cardiovascular ‘data fields’ that were identified from existing data standards as being a ‘base set of terms with maximal value’ with respect to specified criteria. The list is intended to be used in EHRs and to facilitate secondary use, but it was not validated with data exports from EHR systems.

A comparison of the data fields and the data elements of the Data Inventory showed that, because of the different scope and definitions of data elements, both lists cannot readily be compared. Out of the 100 data fields, 20 exactly match data elements of the Data Inventory, 46 are not directly captured in the Data Inventory but would rather be values of our data elements, and 34 do not match at all. An example of a data field that would be a value in the Data Inventory is ‘diabetes’, which we would consider a value of the data element ‘Diagnosis/text’. Table 4 shows in more detail how the data fields correspond to the data elements.

Table 4 Comparison of the Data Inventory with US cardiovascular data fields [18]

Number of data fields matching data elements of the Data Inventory: Exact match / Data field as value of a data element / No match
History and physical examination elements: 8 / 24 / 5
Pharmacological therapy data elements: 0 / 20 / 0
Laboratory results elements: 10 / 0 / 1
Diagnostic and therapeutic procedures elements: 2 / 2 / 26
Outcomes data elements: 0 / 0 / 2

Exact matches are available in both lists; no match means that the data field is not represented in the Data Inventory; and ‘data field as value of a data element’ means that data fields can be matched to data elements because they refer to similar concepts (for example, data field ‘Diabetes’ corresponds to data element ‘Diagnosis/text’).

In contrast to our work, the ‘key data elements of a base cardiovascular vocabulary’ describe elements that should be documented in EHRs to support the exchange of information throughout care, while the Data Inventory is a catalog of available data elements in EHRs that are important for clinical research.

Häyrinen et al. [19] describe the core data elements that were introduced in Finland for a national electronic health record. Similar to our approach, Häyrinen et al. defined a list of data elements, added a definition to each item and furthermore added the terminology or code systems that should be used, if suitable systems existed. Häyrinen’s list contains elements that would be implemented for the national EHR. In contrast, for our Data Inventory we identified elements that are currently documented in EHRs. A comparison of the ‘core data elements’ and the data elements of the Data Inventory showed several similarities. For example, ‘health problems and diagnosis’ should use ICD-10 or ICPC (International Classification of Primary Care) codes, which corresponds to ‘diagnosis/code’ of the Data Inventory. We did not specify which classification should be used, but ICD-10 or ICPC codes would be values of this data element as well.

Weng et al. [12] and Luo et al. [13] describe in related publications a semi-automatic approach that allows annotating free text eligibility criteria using a semantic representation. In contrast to our expert-driven simplification approach, which is intended to reduce complexity with a focus on trial feasibility, those methods aim at semi-automatically extracting the complete information out of free text. Because the approaches differ, there is only some overlap with our Data Inventory. Out of the 27 semantic classes from the Weng and Luo publications, only ‘Age’ and ‘Gender’ match directly, ten classes correspond to one or more data groups, and 15 are not represented in the Data Inventory at all. Table 5 shows how the semantic classes correspond to the data groups.

Köpcke et al. [20] describe the data completeness of five German hospital EHRs for data elements of 15 studies, re-using Luo’s semantic classes. Köpcke’s and our work show similar tendencies for completeness and usage. For example, age and gender are highly available and used in both lists, whereas information with respect to pregnancies is not so readily available in either. Although [20] is focused on patient recruitment and the Data Inventory on feasibility, the tendencies of availability and usage are similar.
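The totals quoted for the comparison with the cardiovascular data fields (20 exact matches, 46 data fields as values, 34 without a match) can be re-derived from the per-category counts in Table 4. The short Python sketch below does only this arithmetic; the variable and function names are chosen for illustration.

```python
# Rows of Table 4 as (exact match, data field as value of a data element, no match).
TABLE_4 = {
    "History and physical examination elements": (8, 24, 5),
    "Pharmacological therapy data elements": (0, 20, 0),
    "Laboratory results elements": (10, 0, 1),
    "Diagnostic and therapeutic procedures elements": (2, 2, 26),
    "Outcomes data elements": (0, 0, 2),
}


def column_totals(table):
    """Sum each of the three match columns over all categories."""
    return tuple(sum(column) for column in zip(*table.values()))


exact, as_value, no_match = column_totals(TABLE_4)
total_fields = exact + as_value + no_match
```

The three column sums add up to the 100 data fields of the cardiovascular list, so the per-category counts and the totals in the text are consistent.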

Lessons learned
Eligibility criteria in clinical trial protocols are usually described using long and complicated free text sentences, which cannot readily be used for further processing. Through a process of simplifying the criteria, the information content can be reduced or split up into several parts until single data elements are left, which can be represented in a formal, consistent way. When doing the simplification, we also identified a difference between the data elements required for trial feasibility and those required for recruitment. While the criteria for feasibility are fewer in number and more general, data elements for recruitment have to be more precise. From the experience gained in the simplification task, best practice principles for simplifying eligibility criteria [8] were created. They describe how eligibility criteria should be formulated to be clearly understandable and computer readable with little additional effort. When comparing the Data Inventory with billing data, in particular DRG data [21], one can see that EHRs nowadays already contain more data elements that can be re-used for research than just diagnosis and procedure codes, laboratory findings for example.

Weiskopf and Weng [22] reviewed 95 articles and identified five common dimensions of data quality (completeness, correctness, plausibility, concordance and currency). Evaluating correctness, plausibility and currency is a laborious task that requires medical knowledge and can therefore not readily be automated. To investigate correctness, for example, patient charts would have to be checked manually to determine if the documented data are correct. A mapping between the data export and the elements in the EHRs was done with the knowledge of the local project partners; the concordance, however, was not evaluated in detail. Through the data exports we did, however, capture the availability of the data elements (completeness), as is shown in Figure 2. During the creation of the Data Inventory we have seen in several instances that data quality is a critical issue. Automatic transfer of EHR data into an electronic data capture system, for example, can only be performed if the data quality is ‘high enough’. The aim of the inventory, however, was not to address these issues but to obtain an overview of what is available and what is required.

Table 5 Semantic classes from Weng [12]/Luo [13] corresponding to the data groups

Data groups (this work): Semantic classes (Weng et al./Luo et al.)
Diagnosis: Disease, Symptom and signs, Neoplasm status
Procedures: Therapy or surgery, Diagnostic or lab results
Laboratory findings: Diagnostic or lab results
Findings: Diagnostic or lab results
Medical history: Pregnancy-related activity, Addictive behavior
Scores and Classification: Neoplasm status, Disease stage
Medication: Pharmaceutical substance or drug
Demographics: Age, Gender

Comparison between data groups of this work and semantic classes according to Weng [12]/Luo [13]. Some of the semantic classes are listed more than once because they correspond to more than one data group. Similarly, one data group can correspond to one or more semantic classes.

Limitations
Certain disease areas (for example, diabetes or inflammatory diseases) are not yet fully covered by the Data Inventory (see Table 2). The current version is focused on site feasibility, so data elements for patient recruitment or clinical trial execution are not considered. Our approach was to create a global Data Inventory based on relevant elements for research that are available in EHRs. It does not take into account the varying documentation needs of different disease areas, but it should be easier to implement than several disease-specific data inventories.

Outlook
We expect the Data Inventory to be constantly evolving, based upon future releases of EHR systems and analyses of more trials from different disease areas. In a review of 89 papers, Häyrinen [23] reported that the use of information systems leads to more complete and detailed documentation and that structured data entry increases completeness and accuracy of data. Future work will focus on appropriate methods and procedures to further improve EHR data completeness and to include those data elements in routine care that are currently missing. Scope as well as benefits and costs need to be taken into account when aiming to include new elements in the routine documentation. The data quality of EHRs in general, not only the completeness, will need to be analyzed to ensure that the data can be used for clinical studies.

Conclusion
Today, EHR systems already provide many data elements that can be used for feasibility analysis of clinical studies. An inventory of elements was created in a combined effort between experts from pharmaceutical companies and academic sites. It provides a common set of data elements that are frequently used in clinical research and at the same time available for re-use from current hospital information systems.

Additional file

Additional file 1: Complete Data Inventory with all data elements. The Data Inventory contains data elements for feasibility analysis that were extracted from clinical trial protocols and that were verified to be available in European EHR systems. The inventory contains data element concepts (data group + data item), optional examples, the definitions and links to NCIm.

Abbreviations
DRG: Diagnosis related group; ECOG: Eastern Cooperative Oncology Group; EFPIA: European Federation of Pharmaceutical Industries and Associations; EHR4CR: Electronic Health Records for Clinical Research; EHR: Electronic health record; ICD10: International Statistical Classification of Diseases and Related Health Problems, 10th Revision; IIT: Investigator initiated trial; IMI: Innovative Medicine Initiative; LOINC: Logical Observation Identifiers Names and Codes; MMSE: Mini-Mental State Examination; NCIm: NCI Metathesaurus; UMLS: Unified Medical Language System.

Competing interests
The authors declare that they have no competing interests.

Authors’ contributions
JD helped with the simplification, was part of the peer review group, collected the data, analyzed it and wrote the manuscript. MD was part of the peer review group and helped to draft the manuscript. FF helped with the simplification, was part of the peer review group, supervised the methodological approach and helped to draft the manuscript. All authors read and approved the final manuscript.

Acknowledgments
We would like to thank all EHR4CR work package 7 members who helped in creating the Data Inventory and who contributed to the simplification of eligibility criteria. We would like to especially thank Anouk Deruaz, Silke Ewald, Kerstin Holzapfel, Andy Sykes, Nadine Ulliac-Sagnes and Christel Wouters, who were part of the peer group. We would also like to thank all the project partners who contributed to the data exports, especially Dionisio Acosta-Mena, Bernhard Breil, Marc Cuggia, James Cunningham, Thomas Ganslandt, Sharon Kean, Sebastian Mate, Mark McGilchrist, Cezary Szmigielski and Eric Zapletal.
We would also like to thank Bartholomäus Kahl, who helped in the creation of the first version of the Data Inventory, and Dipak Kalra, who provided critical review of the draft manuscript.
The research leading to these results has received support from the Innovative Medicines Initiative Joint Undertaking under grant agreement number 115189, resources of which are composed of financial contribution from the European Union’s Seventh Framework Program (FP7/2007-2013) and EFPIA companies’ in kind contribution.

Author details
1 Institute of Medical Informatics, University Münster, Albert-Schweitzer-Campus 1/A11, D-48149 Münster, Germany. 2 Integrated Information Sciences, Development, Novartis Pharma AG, CH-4002 Basel, Switzerland.

Received: 5 August 2013 Accepted: 13 December 2013 Published: 10 January 2014

References
1. McDonald AM, Knight RC, Campbell MK, Entwistle VA, Grant AM, Cook JA, Elbourne DR, Francis D, Garcia J, Roberts I, Snowdon C: What influences recruitment to randomized controlled trials? A review of trials funded by two UK funding agencies. Trials 2006, 7:9.
2. Van der Wouden JC, Blankenstein AH, Huibers MJH, van der Windt DAWM, Stalman WAB, Verhagen AP: Survey among 78 studies showed that Lasagna’s law holds in Dutch primary care research. J Clin Epidemiol 2007, 60:819–824.
3. Dugas M, Lange M, Müller-Tidow C, Kirchhof P, Prokosch H-U: Routine data from hospital information systems can support patient recruitment for clinical studies. Clinical Trials (London, England) 2010, 7:183–189.
4. Innovative Medicines Initiative. [http://www.imi.europa.eu] Last accessed November 2013.
5. Electronic Health Records for Clinical Research. [www.ehr4cr.eu] Last accessed 15.11.2013.
6. International Organization for Standardization/International Electrotechnical Commission: ISO/IEC 11179, Information Technology - Metadata Registries (MDR) - Part 1: Framework. 2004. [http://standards.iso.org/ittf/PubliclyAvailableStandards/c035343_ISO_IEC_11179-1_2004(E).zip] Last accessed 29.12.2013.
7. Unified Medical Language System. [http://www.nlm.nih.gov/research/umls] Last accessed November 2013.
8. Doods J, Holzapfel K, Dugas M, Fritz F: Development of best practice principles for simplifying eligibility criteria. Stud Health Technol Inform 2013, 192:1153.
9. Microsoft. [www.microsoft.com] Last accessed 29.12.2013.
10. NCIm CUI. [http://ncimeta.nci.nih.gov/ncimbrowser/ConceptReport.jsp?dictionary=NCIMetaThesaurus&code=C2348662] Last accessed November 2013.
11. NCI Metathesaurus. [http://ncimeta.nci.nih.gov/ncimbrowser] Last accessed November 2013.
12. Weng C, Wu X, Luo Z, Boland MR, Theodoratos D, Johnson SB: EliXR: an approach to eligibility criteria extraction and representation. J Am Med Inform Assoc 2011, 18(Suppl 1):i116–i124.
13. Luo Z, Yetisgen-Yildiz M, Weng C: Dynamic categorization of clinical research eligibility criteria by hierarchical clustering. J Biomed Inform 2011, 44:927–935.
14. Ammenwerth E, Spötl H-P: The time needed for clinical documentation versus direct patient care. Methods Inf Med 2009, 48:84–91.
15. Tipping M, Forth V: Where did the day go? - A time-motion study of hospitalists. J Hospital 2010, 5:323–328.
16. Tu S, Peleg M, Carini S, Rubin D, Sim I: ERGO: a template-based expression language for encoding eligibility criteria. In Technical report. 2008.
17. Wang SJ, Ohno-Machado L, Mar P, Boxwala AA, Greenes RA: Enhancing Arden Syntax for clinical trial eligibility criteria. 1999, 8280.
18. Weintraub WS, Karlsberg RP, Tcheng JE, Boris JR, Buxton AE, Dove JT, Fonarow GC, Goldberg LR, Heidenreich P, Hendel RC, Jacobs AK, Lewis W, Mirro MJ, Shahian DM, Bozkurt B, Jacobs JP, Peterson PN, Roger VL, Smith EE, Wang T: ACCF/AHA 2011 key data elements and definitions of a base cardiovascular vocabulary for electronic health records: a report of the American College of Cardiology Foundation/American Heart Association Task Force on Clinical Data Standards. Circulation 2011, 124:103–123.
19. Häyrinen K, Saranto K: The core data elements of electronic health record in Finland. Stud Health Technol Inform 2005, 116:131–136.
20. Köpcke F, Trinczek B, Majeed RW, Schreiweis B, Wenk J, Leusch T, Ganslandt T, Ohmann C, Bergh B, Röhrig R, Dugas M, Prokosch HU: Evaluation of data completeness in the electronic health record for the purpose of patient recruitment into clinical trials: a retrospective analysis of element presence. BMC Med Inform Decis Mak 2013, 13:37.
21. Schreyögg J, Stargardt T, Tiemann O, Busse R: Methods to determine reimbursement rates for diagnosis related groups (DRG): a comparison of nine European countries. Health Care Manag Sci 2006, 9:215–223.
22. Weiskopf NG, Weng C: Methods and dimensions of electronic health record data quality assessment: enabling reuse for clinical research. J Am Med Inform Assoc 2013, 20:144–151.
23. Häyrinen K, Saranto K, Nykänen P: Definition, structure, content, use and impacts of electronic health records: a review of the research literature. Int J Med Inform 2008, 77:291–304.

doi:10.1186/1745-6215-15-18
Cite this article as: Doods et al.: A European inventory of common electronic health record data elements for clinical trial feasibility. Trials 2014, 15:18.



To create a valid and sponsor (EFPIA or IIT)-acceptedData Inventory we decided to compile the list throughan iterative and consensus driven process with partnersfrom academia and strong participation from domainexperts of European pharmaceutical companiesThe EFPIA partners in the project are among the largest

researching pharmaceutical companies in Europe withmany studies each year This way both sides added their

perspectives and increased the acceptance of such a listThe international character of the focus group and thevalidation at several university hospitals makes the datainventory meaningful beyond national borders No ref-erences on settings of average European hospitals werefound so whether the data exports at non-universityhospitals would have resulted in similar availabilitynumbers cannot be stated An iterative approach was

Figure 3 Heat map of the data exports from the data inventory current version The first two columns describe the ISO 11179 dataelement concept (data groupdata item) The third column shows the average usage of the data element over all sites while the followingcolumns (site 1 to site 9) display the frequency at the individual sites The Data Inventory is ordered by the average usage sorted in descendingorder from most available to least The frequency ranges from 100 (dark green) to 0 (dark red) Data elements that are not available at a siteare shown as Not Available (NA) (black)

Doods et al Trials 2014 1518 Page 6 of 10httpwwwtrialsjournalcomcontent15118

chosen as a pragmatic way to see if our method wasfeasible and to improve single steps of the process as wewent along One example is that we verified the avail-ability of data elements in the first data export in per-centage groups (6 100 5 lt 100 to 75 4 lt 75 to50 3 lt 50 to 25 2 lt 25 to 10 1 lt 10 to gt 0 00 NA not available) while in the second roundexact relative percentages (for example 98) were re-quested Each iteration included the simplification andidentification of data elements from eligibility criteria ofstudy protocols This was done by feasibility and re-cruitment specialists from the pharmaceutical compan-ies themselves Analysis and processing of eligibilitycriteria has been done by other groups [121617] butour aim was not to follow a specific representation orcreate a new format We wanted to extract the most im-portant information out of those free text criteria anddisplay them in a simple comprehensible way for allstakeholders By doing so we were able to identify theunderlying data elements and add new ones to the DataInventoryThe subsequent validation exports at eleven data pro-

vider sites with varying disease specific focuses were ei-ther done on whole EHRs or on subsystems of thehospitals depending on available data sources This isalso the reason why some sites have many black cells inFigure 3 when they used a specialized departmentalsubsystem instead of the whole EHR A bias by thosesites that used specific subsystems can therefore not beexcluded but while general elements like lsquodate of birthrsquowere not negatively affected disease or gender specificelements were influenced positively lsquoMedical historymenopausal statusrsquo was for example seldom docu-mented in the majority of the systems but was alwaysavailable in a breast cancer systemBecause the Data Inventory contains data elements

with clear definitions it can be used as a reference forimportant data elements when new forms are created inEHRs Through both lists the Data Inventory and thewish list it is clearer what to expect if EHRs are to beused for clinical research In general EHR systems couldbe accredited for their compliance with catalogs of im-portant data elements in the future This could demon-strate that the respective product is more suited tosupport secondary use of health care data for clinical re-search than non-compliant EHR systemsThe simplification of eligibility criteria was a manual

task focused on clinical trial feasibility It is possible thatdifferent criteria would have been identified by otherpeople The Data Inventory is created as part of theEHR4CR project and therefore the studies were selectedto include each company and each data provider siteThis means that other studies other disease areas anddifferent companies might also have resulted in different

data elements However given the large number of in-volved countries hospitals and trial experts this DataInventory represents an important consensus Given thatEHR4CR covers six major disease areas we assume thatthe Data Inventory will in general cover a large part ofclinical studies

Related workIn the following we compare the Data Inventory againstwork that relate to oursWeintraub et al [18] compiled a list of 100 cardiovas-

cular lsquodata fieldsrsquo that were identified from existing datastandards as being a lsquobase set of terms with maximalvaluersquo to specified criteria The list is intended to be usedin EHRs and facilitate secondary use but was not vali-dated with data exports from EHR systemsA comparison of the data fields and the data elements

of the Data Inventory showed that because of the differ-ent scope and definitions of data elements both listscannot readily be compared Out of the 100 data fields20 exactly match data elements of the Data Inventorywhile 46 are not directly captured in the Data Inventorybut would rather be values of our data elements and 34do not match at all An example of a data field thatwould be a value in the Data Inventory is lsquodiabetesrsquowhich we would consider a value of the data elementlsquoDiagnosistextrsquo Table 4 shows in more detail how thedata fields correspond to the data elementsIn contrast to our work the lsquokey data elements of a

base cardiovascular vocabularyrsquo describe elements thatshould be documented in EHRs to support the exchangeof information throughout care while the Data Inventoryis a catalog of available data elements in EHRs that areimportant for clinical researchHaumlyrinen et al [19] describe the core data elements

that were introduced in Finland for a national electronichealth record Similar to our approach Haumlyrinen et al

Table 4 Comparison of the Data Inventory with UScardiovascular data fields [18]

Number of data fieldsmatching data elementsof the Data Inventory

Exact match Data field asvalue of adata element

No match

History and physicalexamination elements

8 24 5

Pharmacological therapydata elements

0 20 0

Laboratory results elements 10 0 1

Diagnostic and therapeuticprocedures elements

2 2 26

Outcomes data elements 0 0 2

Exact matches are available in both lists no matches means that the data fieldis not represented in the Data Inventory and lsquodata field as value of a dataelementrsquo means that data fields can be matched to data elements becausethey refer to similar concepts (for example data field lsquoDiabetesrsquo corresponds todata elements lsquoDiagnosisTextrsquo)

Doods et al Trials 2014 1518 Page 7 of 10httpwwwtrialsjournalcomcontent15118

defined a list of data elements, added a definition to each item and furthermore added the terminology or code systems that should be used if suitable systems existed. Häyrinen’s list contains elements that would be implemented for the national EHR. In contrast, for our Data Inventory we identified elements that are currently documented in EHRs. A comparison of the ‘core data elements’ and the data elements of the Data Inventory showed several similarities; for example, ‘health problems and diagnosis’ should use ICD-10 or ICPC (International Classification of Primary Care) codes, which corresponds to ‘Diagnosis/code’ of the Data Inventory. We did not specify which classification should be used, but ICD-10 or ICPC codes would be values of this data element as well.

Weng et al. [12] and Luo et al. [13] describe in related publications a semi-automatic approach that allows annotating free-text eligibility criteria using semantic representation. In contrast to our expert-driven simplification approach, which is intended to reduce complexity with a focus on trial feasibility, those methods aim at semi-automatically extracting the complete information out of free text. Because the approaches differ, the overlap with our Data Inventory is only partial. Out of the 27 semantic classes from the Weng and Luo publications, only ‘Age’ and ‘Gender’ match directly, ten classes correspond to one or more data groups, and 15 are not represented in the Data Inventory at all. Table 5 shows how the semantic classes correspond to the data groups.

Köpcke et al. [20] describe in their work the data completeness of five German hospital EHRs for data elements of 15 studies, re-using Luo’s semantic classes. Köpcke’s and our work show similar tendencies for completeness and usage. For example, age and gender are highly available and used in both lists, whereas information on pregnancies is not so readily available in either. Although [20] is focused on patient recruitment and the Data Inventory on feasibility, the tendencies of availability and usage are similar.

Lessons learned
Eligibility criteria in clinical trial protocols are usually described using long and complicated free-text sentences, which cannot readily be used for further processing. Through a process of simplifying the criteria, the information content can be reduced or split up into several parts until single data elements are left, which can be represented in a formal, consistent way. When doing the simplification we also identified a difference between the data elements required for trial feasibility and those required for recruitment: while the criteria for feasibility are fewer in number and more general, data elements for recruitment have to be more precise. From the experience gained in the simplification task, best practice principles for simplifying eligibility criteria [8] were created. They describe how eligibility criteria should be formulated to be clearly understandable and computer readable with little additional effort. When comparing the Data Inventory with billing data, in particular DRG data [21], one can see that EHRs nowadays already contain more data elements that can be re-used for research than just diagnosis and procedure codes, laboratory findings for example.

Weiskopf and Weng [22] identified five common dimensions of data quality by reviewing 95 articles (completeness, correctness, plausibility, concordance and currency). Evaluating correctness, plausibility and currency is a laborious task that requires medical knowledge and therefore cannot readily be automated. To investigate correctness, for example, patient charts would have to be checked manually to determine whether the documented data are correct. A mapping between the data export and the elements in the EHRs was done with the knowledge of the local project partners; the concordance, however, was not evaluated in detail. Through the data exports we did capture the availability of the data elements (completeness), as is shown in Figure 2. During the creation of the Data Inventory we saw in several instances that data quality is a critical issue. Automatic transfer of EHR data into an electronic data capture system, for example, can only be performed if the data quality is ‘high enough’. The aim of the inventory, however, was not to address these issues but to obtain an overview of what is available and what is required.
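The completeness dimension captured through the data exports can be illustrated with a minimal sketch. It assumes a hypothetical export in which each patient record is a dictionary keyed by data element name, with None or an empty string marking an undocumented value. The element names, the sample records and the `completeness` function are illustrative assumptions and not part of the EHR4CR tooling.

```python
from typing import Optional

# Hypothetical export format: one dict per patient record,
# None (or an empty string) means the element was not documented.
Record = dict[str, Optional[str]]


def completeness(records: list[Record], elements: list[str]) -> dict[str, float]:
    """Return, per data element, the percentage of records with a documented value."""
    if not records:
        return {element: 0.0 for element in elements}
    result = {}
    for element in elements:
        documented = sum(1 for r in records if r.get(element) not in (None, ""))
        result[element] = 100.0 * documented / len(records)
    return result


# Synthetic records for illustration only.
records = [
    {"Demographics/date of birth": "1950-03-01", "Medical history/menopausal status": None},
    {"Demographics/date of birth": "1962-11-20", "Medical history/menopausal status": "post"},
    {"Demographics/date of birth": "1971-07-09", "Medical history/menopausal status": None},
    {"Demographics/date of birth": "1980-01-15", "Medical history/menopausal status": None},
]
elements = ["Demographics/date of birth", "Medical history/menopausal status"]
print(completeness(records, elements))
```

A count of this kind only measures whether a value is present. Correctness, plausibility and currency would still need the manual review described above.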

Limitations
Certain disease areas (for example, diabetes or inflammatory diseases) are not yet fully covered by the Data Inventory (see Table 2). The current version is focused on site feasibility, so data elements for patient recruitment or clinical trial execution are not considered. Our approach was to create a global Data Inventory based on relevant elements for research that are available in EHRs. It does not take into account the varying documentation needs of different disease areas, but should be easier to implement than several disease-specific data inventories.

Table 5 Semantic classes from Weng [12]/Luo [13] corresponding to the data groups

Data groups (this work)      Semantic classes (Weng et al./Luo et al.)
Diagnosis                    Disease, Symptom and signs, Neoplasm status
Procedures                   Therapy or surgery, Diagnostic or lab results
Laboratory findings          Diagnostic or lab results
Findings                     Diagnostic or lab results
Medical history              Pregnancy-related activity, Addictive behavior
Scores and Classification    Neoplasm status, Disease stage
Medication                   Pharmaceutical substance or drug
Demographics                 Age, Gender

Comparison between data groups of this work and semantic classes according to Weng [12]/Luo [13]. Some of the semantic classes are listed more than once because they correspond to more than one data group. Similarly, one data group can correspond to one or more semantic classes.
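The correspondence in Table 5 can also be read in the reverse direction, from semantic class to data group. The following is a minimal sketch that assumes the table is transcribed as a Python dictionary. The variable and function names are illustrative, and treating ‘Disease, Symptom and signs’ as a single semantic class is an assumption about how the table row should be split.

```python
from collections import defaultdict

# Table 5 transcribed as data group -> semantic classes.
# Assumption: "Disease, Symptom and signs" is treated as one class.
DATA_GROUP_TO_CLASSES = {
    "Diagnosis": ["Disease, Symptom and signs", "Neoplasm status"],
    "Procedures": ["Therapy or surgery", "Diagnostic or lab results"],
    "Laboratory findings": ["Diagnostic or lab results"],
    "Findings": ["Diagnostic or lab results"],
    "Medical history": ["Pregnancy-related activity", "Addictive behavior"],
    "Scores and Classification": ["Neoplasm status", "Disease stage"],
    "Medication": ["Pharmaceutical substance or drug"],
    "Demographics": ["Age", "Gender"],
}


def invert(mapping):
    """Build semantic class -> list of data groups from the table rows."""
    inverted = defaultdict(list)
    for group, classes in mapping.items():
        for semantic_class in classes:
            inverted[semantic_class].append(group)
    return dict(inverted)


CLASS_TO_DATA_GROUPS = invert(DATA_GROUP_TO_CLASSES)

# Classes that appear in more than one row map to several data groups.
shared = {c: g for c, g in CLASS_TO_DATA_GROUPS.items() if len(g) > 1}
print(shared)
```

The inverted view makes the one-to-many relationship explicit: a class such as ‘Diagnostic or lab results’ is spread over several data groups of the Data Inventory.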

Outlook
We expect the Data Inventory to evolve constantly, based upon future releases of EHR systems and analyses of more trials from different disease areas. In a review of 89 papers, Häyrinen et al. [23] reported that the use of information systems leads to more complete and detailed documentation, and that structured data entry increases the completeness and accuracy of data. Future work will focus on appropriate methods and procedures to further improve EHR data completeness and to include those data elements in routine care that are currently missing. Scope as well as benefits and costs need to be taken into account when aiming to include new elements in routine documentation. The data quality of EHRs in general, not only the completeness, will need to be analyzed to ensure that the data can be used for clinical studies.

Conclusion
Today, EHR systems already provide many data elements that can be used for feasibility analysis of clinical studies. An inventory of elements was created in a combined effort between experts from pharmaceutical companies and academic sites. It provides a common set of data elements that are frequently used in clinical research and at the same time available for re-use from current hospital information systems.

Additional file

Additional file 1: Complete Data Inventory with all data elements. The Data Inventory contains data elements for feasibility analysis that were extracted from clinical trial protocols and that were verified to be available in European EHR systems. The inventory contains data element concepts (data group + data item), optional examples, the definitions and links to NCIm.

Abbreviations
DRG: Diagnosis related group; ECOG: Eastern Cooperative Oncology Group; EFPIA: European Federation of Pharmaceutical Industries and Associations; EHR4CR: Electronic Health Records for Clinical Research; EHR: Electronic health record; ICD-10: International Statistical Classification of Diseases and Related Health Problems, 10th Revision; IIT: Investigator initiated trial; IMI: Innovative Medicines Initiative; LOINC: Logical Observation Identifiers Names and Codes; MMSE: Mini-Mental State Examination; NCIm: NCI Metathesaurus; UMLS: Unified Medical Language System.

Competing interests
The authors declare that they have no competing interests.

Authors’ contributions
JD helped with the simplification, was part of the peer review group, collected the data, analyzed it and wrote the manuscript. MD was part of the peer review group and helped to draft the manuscript. FF helped with the simplification, was part of the peer review group, supervised the methodological approach and helped to draft the manuscript. All authors read and approved the final manuscript.

Acknowledgments
We would like to thank all EHR4CR work package 7 members who helped in creating the Data Inventory and who contributed to the simplification of eligibility criteria. We would like to especially thank Anouk Deruaz, Silke Ewald, Kerstin Holzapfel, Andy Sykes, Nadine Ulliac-Sagnes and Christel Wouters, who were part of the peer group. We would also like to thank all the project partners who contributed to the data exports, especially Dionisio Acosta-Mena, Bernhard Breil, Marc Cuggia, James Cunningham, Thomas Ganslandt, Sharon Kean, Sebastian Mate, Mark McGilchrist, Cezary Szmigielski and Eric Zapletal.
We would also like to thank Bartholomäus Kahl, who helped in the creation of the first version of the Data Inventory, and Dipak Kalra, who provided critical review of the draft manuscript.
The research leading to these results has received support from the Innovative Medicines Initiative Joint Undertaking under grant agreement number 115189, resources of which are composed of financial contribution from the European Union’s Seventh Framework Program (FP7/2007-2013) and EFPIA companies’ in-kind contribution.

Author details
1Institute of Medical Informatics, University Münster, Albert-Schweitzer-Campus 1/A11, D-48149 Münster, Germany. 2Integrated Information Sciences, Development, Novartis Pharma AG, CH-4002 Basel, Switzerland.

Received: 5 August 2013. Accepted: 13 December 2013. Published: 10 January 2014.

References
1. McDonald AM, Knight RC, Campbell MK, Entwistle VA, Grant AM, Cook JA, Elbourne DR, Francis D, Garcia J, Roberts I, Snowdon C: What influences recruitment to randomized controlled trials? A review of trials funded by two UK funding agencies. Trials 2006, 7:9.

2. Van der Wouden JC, Blankenstein AH, Huibers MJH, van der Windt DAWM, Stalman WAB, Verhagen AP: Survey among 78 studies showed that Lasagna’s law holds in Dutch primary care research. J Clin Epidemiol 2007, 60:819–824.

3. Dugas M, Lange M, Müller-Tidow C, Kirchhof P, Prokosch H-U: Routine data from hospital information systems can support patient recruitment for clinical studies. Clinical Trials (London, England) 2010, 7:183–189.

4. Innovative Medicines Initiative. [http://www.imi.europa.eu] Last accessed November 2013.

5. Electronic Health Records for Clinical Research. [www.ehr4cr.eu] Last accessed 15 November 2013.

6. International Organization for Standardization/International Electrotechnical Commission: ISO/IEC 11179, Information Technology - Metadata Registries (MDR) - Part 1: Framework. 2004. [http://standards.iso.org/ittf/PubliclyAvailableStandards/c035343_ISO_IEC_11179-1_2004(E).zip] Last accessed 29 December 2013.

7. Unified Medical Language System. [http://www.nlm.nih.gov/research/umls] Last accessed November 2013.

8. Doods J, Holzapfel K, Dugas M, Fritz F: Development of best practice principles for simplifying eligibility criteria. Stud Health Technol Inform 2013, 192:1153.

9. Microsoft. [www.microsoft.com] Last accessed 29 December 2013.

10. NCIm CUI. [http://ncimeta.nci.nih.gov/ncimbrowser/ConceptReport.jsp?dictionary=NCIMetaThesaurus&code=C2348662] Last accessed November 2013.

11. NCI Metathesaurus. [http://ncimeta.nci.nih.gov/ncimbrowser] Last accessed November 2013.

12. Weng C, Wu X, Luo Z, Boland MR, Theodoratos D, Johnson SB: EliXR: an approach to eligibility criteria extraction and representation. J Am Med Inform Assoc 2011, 18(Suppl 1):i116–i124.

13. Luo Z, Yetisgen-Yildiz M, Weng C: Dynamic categorization of clinical research eligibility criteria by hierarchical clustering. J Biomed Inform 2011, 44:927–935.

14. Ammenwerth E, Spötl H-P: The time needed for clinical documentation versus direct patient care. Methods Inf Med 2009, 48:84–91.

15. Tipping M, Forth V: Where did the day go? - A time-motion study of hospitalists. J Hospital 2010, 5:323–328.

16. Tu S, Peleg M, Carini S, Rubin D, Sim I: ERGO: a template-based expression language for encoding eligibility criteria. In Technical report; 2008.

17. Wang SJ, Ohno-Machado L, Mar P, Boxwala AA, Greenes RA: Enhancing Arden Syntax for clinical trial eligibility criteria. 19998280.

18. Weintraub WS, Karlsberg RP, Tcheng JE, Boris JR, Buxton AE, Dove JT, Fonarow GC, Goldberg LR, Heidenreich P, Hendel RC, Jacobs AK, Lewis W, Mirro MJ, Shahian DM, Bozkurt B, Jacobs JP, Peterson PN, Roger VL, Smith EE, Wang T: ACCF/AHA 2011 key data elements and definitions of a base cardiovascular vocabulary for electronic health records: a report of the American College of Cardiology Foundation/American Heart Association Task Force on Clinical Data Standards. Circulation 2011, 124:103–123.

19. Häyrinen K, Saranto K: The core data elements of electronic health record in Finland. Stud Health Technol Inform 2005, 116:131–136.

20. Köpcke F, Trinczek B, Majeed RW, Schreiweis B, Wenk J, Leusch T, Ganslandt T, Ohmann C, Bergh B, Röhrig R, Dugas M, Prokosch HU: Evaluation of data completeness in the electronic health record for the purpose of patient recruitment into clinical trials: a retrospective analysis of element presence. BMC Med Inform Decis Mak 2013, 13:37.

21. Schreyögg J, Stargardt T, Tiemann O, Busse R: Methods to determine reimbursement rates for diagnosis related groups (DRG): a comparison of nine European countries. Health Care Manag Sci 2006, 9:215–223.

22. Weiskopf NG, Weng C: Methods and dimensions of electronic health record data quality assessment: enabling reuse for clinical research. J Am Med Inform Assoc 2013, 20:144–151.

23. Häyrinen K, Saranto K, Nykänen P: Definition, structure, content, use and impacts of electronic health records: a review of the research literature. Int J Med Inform 2008, 77:291–304.

doi:10.1186/1745-6215-15-18
Cite this article as: Doods et al.: A European inventory of common electronic health record data elements for clinical trial feasibility. Trials 2014, 15:18.

defined a list of data elements added a definition to eachitem and furthermore added the terminology or codesystems that should be used if suitable systems were ex-istent Haumlyrinenrsquos list contains elements that would beimplemented for the national EHR In contrast for ourData Inventory we identified elements that are currentlydocumented in EHRs A comparison of the lsquocore data el-ementsrsquo and the data elements of the Data Inventoryshowed several similarities for example lsquohealth problemsand diagnosisrsquo should use ICD-10 or ICPC (InternationalClassification of Primary Care) codes which correlatesto lsquodiagnosiscodersquo of the Data Inventory We did notspecify which classification should be used but ICD-10or ICPC codes would be values of this data element aswellWeng et al [12] and Luo et al [13] describe in related

publications a semi-automatic approach that allows an-notating free text eligibility criteria using semantic repre-sentation In contrast to our expert-driven simplificationapproach - intended to reduce complexity with a focus ontrial feasibility - those methods aim at semi-automaticallyextracting the complete information out of free text Dueto the different approaches there is some overlap withour Data Inventory Out of the 27 semantic classes fromthe Weng and Luo publications only lsquoAgersquo and lsquoGenderrsquomatch directly ten classes correspond to one or moredata groups and 15 are not represented in the DataInventory at all Table 5 shows how the semantic classescorrespond to the data groupsKoumlpcke et al [20] describe in their work the data com-

pleteness of five German hospital EHRs from data elementsof 15 studies re-using Luorsquos semantic classes Koumlpckersquos andour work show similar tendencies for completeness andusage For example age and gender are highly available andused for both lists although information with respect topregnancies is not so readily available in both Although[20] is focused on patient recruitment and the Data

Inventory on feasibility the tendencies of availabilityand usage are similar

Lessons learnedEligibility criteria in clinical trial protocols are usually de-scribed using long and complicated free text sentenceswhich cannot readily be used for further processingThrough a process of simplifying the criteria the informa-tion content can be reduced or split up in several partsuntil single data elements are left which can be repre-sented in a formal consistent way When doing the simpli-fication we also identified a difference between requireddata elements for trial feasibility and recruitment Whilethe criteria for feasibility are fewer in number and moregeneral data elements for recruitment have to be moreprecise From the experiences made of the simplificationtask best practice principles for simplifying eligibility cri-teria [8] were created They describe how eligibility criteriashould be formulated to be clearly understandable andcomputer readable with little additional effort When com-paring the Data Inventory with billing data in particularDRG data [21] one can see that EHRs nowadays alreadycontain more data elements that can be re-used for re-search than just diagnosis and procedure codes laboratoryfindings for exampleWeiskopf and Weng [22] identified five common dimen-

sions of data quality reviewing 95 articles (completenesscorrectness plausibility concordance and currency)Evaluating correctness plausibility and currency is a la-borious task which requires medical knowledge and cantherefore not be automated readily To investigate cor-rectness for example patient charts would have to bechecked manually to determine if the documented dataare correct A mapping between the data export and theelements in the EHRs was done with knowledge of thelocal project partners the concordance however wasnot evaluated in detail However through the data ex-ports we did capture the availability of the data elements(completeness) as is shown in Figure 2 During the creationof the Data Inventory we have seen in several instancesthat data quality is a critical issue Automatic transfer ofEHR data into an electronic data capture system for ex-ample can only be performed if the data quality is lsquohighenoughrsquo The aim of the inventory however was not toaddress these issues but to obtain an overview of what isavailable and what is required

LimitationsCertain disease areas (for example diabetes or inflam-matory diseases) are not yet fully covered by the DataInventory (see Table 2) The current version is focusedon site feasibility so data elements for patient recruit-ment or clinical trial execution are not considered Ourapproach was to create a global Data Inventory based on

Table 5 Semantic classes from Weng [12]Luo [13]corresponding to the data groups

Data groups (this work) Semantic classes (Weng et alLuo et al)

Diagnosis Disease Symptom and signs Neoplasm status

Procedures Therapy or surgery Diagnostic or lab results

Laboratory findings Diagnostic or lab results

Findings Diagnostic or lab results

Medical history Pregnancy-related activity Addictive behavior

Scores and Classification Neoplasm status Disease stage

Medication Pharmaceutical substance or drug

Demographics Age Gender

Comparison between data groups of this work and semantic classes accordingto Weng [12]Luo [13] Some of the semantic classes are listed more than oncebecause they correspond to more than one data group Similarly one datagroup can correspond to one or more semantic class

Doods et al Trials 2014 1518 Page 8 of 10httpwwwtrialsjournalcomcontent15118

relevant elements for research that are available in EHRsIt does not take into account varying documentationneeds of different disease areas but should be more eas-ily implemented in contrast to several disease-specificdata inventories

OutlookWe expect the Data Inventory to be constantly evolvingbased upon future releases of EHR systems and analysesof more trials from different disease areas In a review of89 papers Haumlyrinen [23] reported that use of informationsystems leads to more complete and detailed documenta-tion and structured data entry increases completenessand accuracy of data Future work will focus on appropri-ate methods and procedures to further improve EHRdata completeness and include those data elements inroutine care that are currently missing Scope as well asbenefits and costs need to be taken into account whenaiming to include new elements into the routine docu-mentation The data quality of EHRs in general not onlythe completeness will need to be analyzed to ensure thatthe data can be used for clinical studies

ConclusionToday EHR systems already provide many data elementsthat can be used for feasibility analysis of clinical studiesAn inventory of elements was created in a combinedeffort between experts from pharmaceutical companiesand academic sites It provides a common set of dataelements that are frequently used in clinical research andat the same time available for re-use from current hos-pital information systems

Additional file

Additional file 1 Complete Data Inventory with all data elementsThe Data Inventory contains data elements for feasibility analysis thatwere extracted from clinical trial protocols and that were verified to beavailable in European EHR systems The inventory contains data elementconcepts (data group + data item) optional examples the definitionsand links to NCIm

AbbreviationsDRG Diagnosis related group ECOG Eastern Cooperative Oncology GroupEFPIA European Federation of Pharmaceutical Industries and AssociationsEHR4CR Electronic Health Records for Clinical Research EHR Electronichealth record ICD10 International Statistical Classification of Diseases andRelated Health Problems 10th Revision IIT Investigator initiated trialIMI Innovative Medicine Initiative LOINC Logical Observation IdentifiersNames and Codes MMSE Mini-Mental State Examination NCIm NCIMetathesaurus UMLS Unified Medical Language System

Competing interestsThe authors declare that they have no competing interests

Authorsrsquo contributionsJD helped with the simplification was part of the peer review groupcollected the data analyzed it and wrote the manuscript MD was part ofthe peer review group and helped to draft the manuscript FF helped with

the simplification was part of the peer review group supervised themethodological approach and helped to draft the manuscript All authorsread and approved the final manuscript

AcknowledgmentsWe would like to thank all EHR4CR work package 7 members who helped increating the Data Inventory and who contributed to the simplification ofeligibility criteria We would like to especially thank Anouk Deruaz SilkeEwald Kerstin Holzapfel Andy Sykes Nadine Ulliac-Sagnes and ChristelWouters who were part of the peer group Also we would like to thank allthe project partners who contributed to the data exports especially DionisioAcosta-Mena Bernhard Breil Marc Cuggia James Cunningham ThomasGanslandt Sharon Kean Sebastian Mate Mark McGilchrist Cezary Szmigielskiand Eric ZapletalWe also like to thank Bartholomaumlus Kahl who helped in the creation of thefirst version of the Data Inventory and Dipak Kalra who provided criticalreview of the draft manuscriptThe research leading to these results has received support from theInnovative Medicines Initiative Joint Undertaking under grant agreementnumber 115189 resources of which are composed of financial contributionfrom the European Unionrsquos Seventh Framework Program (FP72007-2013)and EFPIA companiesrsquo in kind contribution

Author details1Institute of Medical Informatics University Muumlnster Albert-Schweitzer-Campus1A11 D-48149 Muumlnster Germany 2Integrated Information SciencesDevelopment Novartis Pharma AG CH-4002 Basel Switzerland

Received 5 August 2013 Accepted 13 December 2013Published 10 January 2014

References1 McDonald AM Knight RC Campbell MK Entwistle V a Grant AM Cook J a

Elbourne DR Francis D Garcia J Roberts I Snowdon C What influencesrecruitment to randomized controlled trials A review of trials funded bytwo UK funding agencies Trials 2006 79

2 Van der Wouden JC Blankenstein AH Huibers MJH van der Windt D a WMStalman W a B Verhagen AP Survey among 78 studies showed thatLasagnarsquos law holds in Dutch primary care research J Clin Epidemiol 200760819ndash824

3 Dugas M Lange M Muumlller-Tidow C Kirchhof P Prokosch H-U Routine datafrom hospital information systems can support patient recruitment forclinical studies Clinical Trials (London England) 2010 7183ndash189

4 Innovative Medicines Initiative [httpwwwimieuropaeu] Last accessedNovember 2013

5 Electronic Health Records for Clinical Research [wwwehr4creu] Lastaccessed 15112013

6 International Organization for StandardizationInternational ElectrotechnicalCommission ISOIEC 11179 Information Technology - Metadata Registries(MDR) - Part 1 Framework 2004 [httpstandardsisoorgittfPubliclyAvailableStandardsc035343_ISO_IEC_11179-1_2004(E)zip]Last accessed 29122013

7 Unified Medical Language System [httpwwwnlmnihgovresearchumls] Last accessed November 2013

8 Doods J Holzapfel K Dugas M Fritz F Development of best practiceprinciples for simplifying eligibility criteria Stud Health Technol Inform2013 1921153

9 Microsoft [wwwmicrosoftcom] Last accessed 2912201310 NCIm CUI [httpncimetancinihgovncimbrowserConceptReportjsp

defined a list of data elements, added a definition to each item and furthermore added the terminology or code systems that should be used if suitable systems were existent. Häyrinen's list contains elements that would be implemented for the national EHR. In contrast, for our Data Inventory we identified elements that are currently documented in EHRs. A comparison of the 'core data elements' and the data elements of the Data Inventory showed several similarities. For example, 'health problems and diagnosis' should use ICD-10 or ICPC (International Classification of Primary Care) codes, which corresponds to 'diagnosis/code' of the Data Inventory. We did not specify which classification should be used, but ICD-10 or ICPC codes would be values of this data element as well.

Weng et al. [12] and Luo et al. [13] describe in related publications a semi-automatic approach that allows annotating free-text eligibility criteria using semantic representation. In contrast to our expert-driven simplification approach (intended to reduce complexity with a focus on trial feasibility), those methods aim at semi-automatically extracting the complete information out of free text. Because the approaches differ, the overlap with our Data Inventory is only partial. Out of the 27 semantic classes from the Weng and Luo publications, only 'Age' and 'Gender' match directly, ten classes correspond to one or more data groups and 15 are not represented in the Data Inventory at all. Table 5 shows how the semantic classes correspond to the data groups.

Köpcke et al. [20] describe in their work the data completeness of five German hospital EHRs from data elements of 15 studies, re-using Luo's semantic classes. Köpcke's and our work show similar tendencies for completeness and usage. For example, age and gender are highly available and used in both lists, whereas information with respect to pregnancies is not so readily available in either. Although [20] is focused on patient recruitment and the Data Inventory on feasibility, the tendencies of availability and usage are similar.

Lessons learned
Eligibility criteria in clinical trial protocols are usually described using long and complicated free-text sentences, which cannot readily be used for further processing. Through a process of simplifying the criteria, the information content can be reduced or split up into several parts until single data elements are left, which can be represented in a formal, consistent way. When doing the simplification we also identified a difference between the data elements required for trial feasibility and those required for recruitment. While the criteria for feasibility are fewer in number and more general, data elements for recruitment have to be more precise. From the experience gained in the simplification task, best practice principles for simplifying eligibility criteria [8] were created. They describe how eligibility criteria should be formulated to be clearly understandable and computer readable with little additional effort. When comparing the Data Inventory with billing data, in particular DRG data [21], one can see that EHRs nowadays already contain more data elements that can be re-used for research than just diagnosis and procedure codes; laboratory findings are one example.

Weiskopf and Weng [22] identified five common dimensions of data quality by reviewing 95 articles: completeness, correctness, plausibility, concordance and currency. Evaluating correctness, plausibility and currency is a laborious task that requires medical knowledge and therefore cannot readily be automated. To investigate correctness, for example, patient charts would have to be checked manually to determine whether the documented data are correct. A mapping between the data export and the elements in the EHRs was done with the knowledge of the local project partners; the concordance, however, was not evaluated in detail. Through the data exports we did capture the availability of the data elements (completeness), as shown in Figure 2. During the creation of the Data Inventory we saw in several instances that data quality is a critical issue. Automatic transfer of EHR data into an electronic data capture system, for example, can only be performed if the data quality is 'high enough'. The aim of the inventory, however, was not to address these issues but to obtain an overview of what is available and what is required.
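As an illustration only, and not the procedure used for the data exports in this study, the completeness dimension can be sketched in Python as the share of patient records in which a data element carries a value. The record layout, the site names, the function name and the 'data group/data item' element names are assumptions made for this sketch.

```python
def completeness(exports):
    """Return, per data element, the fraction of records with a documented value.

    `exports` maps a (hypothetical) site name to a list of patient records.
    Each record is a dict from data element name to a value; a missing key,
    None or an empty string counts as "not documented".
    """
    documented = {}
    total = 0
    for records in exports.values():
        for record in records:
            total += 1
            for element, value in record.items():
                # Register the element even if it is never filled in.
                documented.setdefault(element, 0)
                if value not in (None, ""):
                    documented[element] += 1
    if total == 0:
        return {}
    return {element: count / total for element, count in documented.items()}


# Toy export from two invented sites (not data from the study).
exports = {
    "site_a": [
        {"demographics/gender": "f", "diagnosis/code": "E11.9"},
        {"demographics/gender": "m", "diagnosis/code": None},
    ],
    "site_b": [
        {"demographics/gender": "f", "medical history/pregnancy": None},
        {"demographics/gender": "m"},
    ],
}

result = completeness(exports)
# Rank the data elements by availability, highest first.
for element, share in sorted(result.items(), key=lambda kv: -kv[1]):
    print(f"{element}: {share:.0%}")
```

A measure of this kind only addresses completeness. Correctness, plausibility and currency would still require manual review with medical knowledge, as noted above.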

Limitations
Certain disease areas (for example, diabetes or inflammatory diseases) are not yet fully covered by the Data Inventory (see Table 2). The current version is focused on site feasibility, so data elements for patient recruitment or clinical trial execution are not considered. Our approach was to create a global Data Inventory based on relevant elements for research that are available in EHRs. It does not take into account varying documentation needs of different disease areas, but should be more easily implemented in contrast to several disease-specific data inventories.

Table 5 Semantic classes from Weng [12]/Luo [13] corresponding to the data groups

Data groups (this work) | Semantic classes (Weng et al./Luo et al.)
Diagnosis | Disease, Symptom and signs, Neoplasm status
Procedures | Therapy or surgery, Diagnostic or lab results
Laboratory findings | Diagnostic or lab results
Findings | Diagnostic or lab results
Medical history | Pregnancy-related activity, Addictive behavior
Scores and Classification | Neoplasm status, Disease stage
Medication | Pharmaceutical substance or drug
Demographics | Age, Gender

Comparison between data groups of this work and semantic classes according to Weng [12]/Luo [13]. Some of the semantic classes are listed more than once because they correspond to more than one data group. Similarly, one data group can correspond to one or more semantic classes.

Outlook
We expect the Data Inventory to be constantly evolving, based upon future releases of EHR systems and analyses of more trials from different disease areas. In a review of 89 papers, Häyrinen [23] reported that use of information systems leads to more complete and detailed documentation, and that structured data entry increases completeness and accuracy of data. Future work will focus on appropriate methods and procedures to further improve EHR data completeness and include those data elements in routine care that are currently missing. Scope as well as benefits and costs need to be taken into account when aiming to include new elements into the routine documentation. The data quality of EHRs in general, not only the completeness, will need to be analyzed to ensure that the data can be used for clinical studies.

Conclusion
Today, EHR systems already provide many data elements that can be used for feasibility analysis of clinical studies. An inventory of elements was created in a combined effort between experts from pharmaceutical companies and academic sites. It provides a common set of data elements that are frequently used in clinical research and at the same time available for re-use from current hospital information systems.

Additional file

Additional file 1: Complete Data Inventory with all data elements. The Data Inventory contains data elements for feasibility analysis that were extracted from clinical trial protocols and that were verified to be available in European EHR systems. The inventory contains data element concepts (data group + data item), optional examples, the definitions and links to NCIm.

Abbreviations
DRG: Diagnosis related group; ECOG: Eastern Cooperative Oncology Group; EFPIA: European Federation of Pharmaceutical Industries and Associations; EHR4CR: Electronic Health Records for Clinical Research; EHR: Electronic health record; ICD10: International Statistical Classification of Diseases and Related Health Problems, 10th Revision; IIT: Investigator initiated trial; IMI: Innovative Medicine Initiative; LOINC: Logical Observation Identifiers Names and Codes; MMSE: Mini-Mental State Examination; NCIm: NCI Metathesaurus; UMLS: Unified Medical Language System.

Competing interests
The authors declare that they have no competing interests.

Authors' contributions
JD helped with the simplification, was part of the peer review group, collected the data, analyzed it and wrote the manuscript. MD was part of the peer review group and helped to draft the manuscript. FF helped with the simplification, was part of the peer review group, supervised the methodological approach and helped to draft the manuscript. All authors read and approved the final manuscript.

Acknowledgments
We would like to thank all EHR4CR work package 7 members who helped in creating the Data Inventory and who contributed to the simplification of eligibility criteria. We would like to especially thank Anouk Deruaz, Silke Ewald, Kerstin Holzapfel, Andy Sykes, Nadine Ulliac-Sagnes and Christel Wouters, who were part of the peer group. Also, we would like to thank all the project partners who contributed to the data exports, especially Dionisio Acosta-Mena, Bernhard Breil, Marc Cuggia, James Cunningham, Thomas Ganslandt, Sharon Kean, Sebastian Mate, Mark McGilchrist, Cezary Szmigielski and Eric Zapletal.
We would also like to thank Bartholomäus Kahl, who helped in the creation of the first version of the Data Inventory, and Dipak Kalra, who provided critical review of the draft manuscript.
The research leading to these results has received support from the Innovative Medicines Initiative Joint Undertaking under grant agreement number 115189, resources of which are composed of financial contribution from the European Union's Seventh Framework Program (FP7/2007-2013) and EFPIA companies' in kind contribution.

Author details
1 Institute of Medical Informatics, University Münster, Albert-Schweitzer-Campus 1/A11, D-48149 Münster, Germany. 2 Integrated Information Sciences Development, Novartis Pharma AG, CH-4002 Basel, Switzerland.

Received: 5 August 2013. Accepted: 13 December 2013. Published: 10 January 2014.

References
1. McDonald AM, Knight RC, Campbell MK, Entwistle VA, Grant AM, Cook JA, Elbourne DR, Francis D, Garcia J, Roberts I, Snowdon C: What influences recruitment to randomized controlled trials? A review of trials funded by two UK funding agencies. Trials 2006, 7:9.

2. Van der Wouden JC, Blankenstein AH, Huibers MJH, van der Windt DAWM, Stalman WAB, Verhagen AP: Survey among 78 studies showed that Lasagna's law holds in Dutch primary care research. J Clin Epidemiol 2007, 60:819–824.

3. Dugas M, Lange M, Müller-Tidow C, Kirchhof P, Prokosch H-U: Routine data from hospital information systems can support patient recruitment for clinical studies. Clinical Trials (London, England) 2010, 7:183–189.

4. Innovative Medicines Initiative. [http://www.imi.europa.eu] Last accessed November 2013.

5. Electronic Health Records for Clinical Research. [www.ehr4cr.eu] Last accessed 15.11.2013.

6. International Organization for Standardization/International Electrotechnical Commission: ISO/IEC 11179, Information Technology - Metadata Registries (MDR) - Part 1: Framework. 2004. [http://standards.iso.org/ittf/PubliclyAvailableStandards/c035343_ISO_IEC_11179-1_2004(E).zip] Last accessed 29.12.2013.

7. Unified Medical Language System. [http://www.nlm.nih.gov/research/umls] Last accessed November 2013.

8. Doods J, Holzapfel K, Dugas M, Fritz F: Development of best practice principles for simplifying eligibility criteria. Stud Health Technol Inform 2013, 192:1153.

9. Microsoft. [www.microsoft.com] Last accessed 29.12.2013.

10. NCIm CUI. [http://ncimeta.nci.nih.gov/ncimbrowser/ConceptReport.jsp?dictionary=NCIMetaThesaurus&code=C2348662] Last accessed November 2013.

11. NCI Metathesaurus. [http://ncimeta.nci.nih.gov/ncimbrowser] Last accessed November 2013.

12. Weng C, Wu X, Luo Z, Boland MR, Theodoratos D, Johnson SB: EliXR: an approach to eligibility criteria extraction and representation. J Am Med Inform Assoc 2011, 18(Suppl 1):i116–i124.

13. Luo Z, Yetisgen-Yildiz M, Weng C: Dynamic categorization of clinical research eligibility criteria by hierarchical clustering. J Biomed Inform 2011, 44:927–935.

14. Ammenwerth E, Spötl H-P: The time needed for clinical documentation versus direct patient care. Methods Inf Med 2009, 48:84–91.

15. Tipping M, Forth V: Where did the day go? - A time-motion study of hospitalists. J Hospital 2010, 5:323–328.

16. Tu S, Peleg M, Carini S, Rubin D, Sim I: ERGO: a template-based expression language for encoding eligibility criteria. In Technical report. 2008.

17. Wang SJ, Ohno-Machado L, Mar P, Boxwala AA, Greenes RA: Enhancing Arden Syntax for clinical trial eligibility criteria. 19998280.

18. Weintraub WS, Karlsberg RP, Tcheng JE, Boris JR, Buxton AE, Dove JT, Fonarow GC, Goldberg LR, Heidenreich P, Hendel RC, Jacobs AK, Lewis W, Mirro MJ, Shahian DM, Bozkurt B, Jacobs JP, Peterson PN, Roger VL, Smith EE, Wang T: ACCF/AHA 2011 key data elements and definitions of a base cardiovascular vocabulary for electronic health records: a report of the American College of Cardiology Foundation/American Heart Association Task Force on Clinical Data Standards. Circulation 2011, 124:103–123.

19. Häyrinen K, Saranto K: The core data elements of electronic health record in Finland. Stud Health Technol Inform 2005, 116:131–136.

20. Köpcke F, Trinczek B, Majeed RW, Schreiweis B, Wenk J, Leusch T, Ganslandt T, Ohmann C, Bergh B, Röhrig R, Dugas M, Prokosch HU: Evaluation of data completeness in the electronic health record for the purpose of patient recruitment into clinical trials: a retrospective analysis of element presence. BMC Med Inform Decis Mak 2013, 13:37.

21. Schreyögg J, Stargardt T, Tiemann O, Busse R: Methods to determine reimbursement rates for diagnosis related groups (DRG): a comparison of nine European countries. Health Care Manag Sci 2006, 9:215–223.

22. Weiskopf NG, Weng C: Methods and dimensions of electronic health record data quality assessment: enabling reuse for clinical research. J Am Med Inform Assoc 2013, 20:144–151.

23. Häyrinen K, Saranto K, Nykänen P: Definition, structure, content, use and impacts of electronic health records: a review of the research literature. Int J Med Inform 2008, 77:291–304.

doi:10.1186/1745-6215-15-18
Cite this article as: Doods et al.: A European inventory of common electronic health record data elements for clinical trial feasibility. Trials 2014, 15:18.
