HL7 Clinical Genomics – RIM Constraining Issues
May 2008
The HL7 Clinical Genomics SIG
Amnon Shabo (Shvo), PhD
HL7 Clinical Genomics SIG Co-chair and Modeling Facilitator
HL7 Structured Documents TC
CDA R2 Co-editor
CCD Implementation Guide Co-editor
Haifa Research Lab
To achieve semantic interoperability…
…we need standard specs derived from a Central Health RIM:
  Clinical Trials
  Imaging
  EHR
  Orders & Observations
  Pharmacy
  Clinical Guidelines
  Clinical Documents
  Clinical Genomics
But how do we cope with the challenge of personalized healthcare?
Need to bring mass genomic data into healthcare-oriented standards!
Bioinformatics Data Models
Our solution: Encapsulate & bubble-up
The HL7 v3 DSTU GeneticLocus Model - Focal Areas:
  The Locus and its Alleles
  Sequence Variations
  Expression Data
  Sequence and Proteomics
  Clinical Phenotypes

Genetic Locus (POCG_RM000010)
The entry point to the GeneticLocus model is any locus on the genome.

THE LOCUS AND ITS ALLELES

GeneticLocus
  classCode*: <= OBS
  moodCode*: <= EVN
  id: II [0..1]
  code: CE CWE [0..1] (e.g., ALLELIC, NON_ALLELIC)
  text: ED [0..1]
  effectiveTime: IVL<TS> [0..1]
  confidentialityCode: SET<CE> CWE [0..*] <= Confidentiality
  uncertaintyCode: CE CNE [0..1] <= Uncertainty
  value: CD [0..1] (identifying a gene through GenBank GeneID with an optional translation to HUGO name)
  methodCode: SET<CE> CWE [0..*]
  Note: This class is a placeholder for a specific locus on the genome - that is - a position of a particular given sequence in the subject's genome or linkage map. Note that the semantics of the locus (e.g., gene, marker, variation, etc.) is defined by data assigned in the code & value attributes of this class, and also by placing additional data relating to this locus into the classes associated with this class like Sequence, Expression, etc.
  Note: Use the associations to the shadow classes when the data set type (e.g., expression) is not at deeper levels (e.g., allelic level) and needs to be associated directly with the locus (e.g., the expression level is the translational result of both alleles).
  reference: 0..* geneticLocus (typeCode*: <= REFR)
  Note: A related gene that is on a different locus, and still has significant interrelation with the source gene (similar to the recursive association of an IndividualAllele).

IndividualAllele
  classCode*: <= OBS
  moodCode*: <= EVN
  id: II [0..1]
  negationInd: BL [0..1]
  text: ED [0..1]
  effectiveTime: GTS [0..1]
  value: CD [0..1] (allele code, drawn from HUGO-HGVS or OMIM)
  methodCode: SET<CE> CWE [0..*]
  Note: The term 'Individual Allele' doesn't refer necessarily to a known variant of the gene/locus, rather it refers to the individual patient data regarding the gene/locus and might well contain personal variations w/unknown significance.
  Note: There could be zero to many IndividualAllele objects in a specific instance. A typical case would be an allele pair, one on the paternal chromosome and one on the maternal chromosome.
  reference: 0..* individualAllele (typeCode*: <= REFR)
  Note: A related allele that is on a different locus, and has interrelation with the source allele, e.g., translocated duplicates of the gene.

GeneticLoci
  classCode*: <= OBS
  moodCode*: <= EVN
  id: SET<II> [0..*]
  code: CD CWE [0..1]
  effectiveTime: GTS [0..1]
  value: ANY [0..1]
  Note: Use this class to indicate a set of genetic loci to which this locus belongs. The loci set could be a haplotype, a genetic profile and so forth. Use the id attribute to point to the GeneticLoci instance if available. The other attributes serve as a minimal data set about the loci group.

SEQUENCES & PROTEOMICS

Sequence
  classCode*: <= OBS
  moodCode*: <= EVN
  id: II [0..1]
  code: CD CWE [1..1] (the sequence standard code, e.g. BSML)
  text: ED [0..1] (sequence's annotations)
  effectiveTime: GTS [0..1]
  uncertaintyCode: CE CNE [0..1] <= Uncertainty
  value: ED [1..1] (the actual sequence)
  interpretationCode: SET<CE> CWE [0..*] <= ObservationInterpretation
  methodCode: SET<CE> CWE [0..*] (the sequencing method)
  Constraint: Sequence.value
    Constrained to a restricted BSML content model, specified in a separate schema.
  Note: This recursive association enables the association of an RNA sequence derived from a DNA sequence and a polypeptide sequence derived from the RNA sequence.

Polypeptide
  classCode*: <= OBS
  moodCode*: <= EVN
  id: II [0..1]
  text: ED [0..1]
  effectiveTime: GTS [0..1]
  value: CD [0..1] (protein code, drawn from SwissProt, PDB, PIR, HUPO, etc.)
  methodCode: SET<CE> CWE [0..*]
  Note: This is a computed outcome, i.e., the lab does not test for the actual protein, but secondary processes populate this class with the translational protein.

DeterminantPeptides
  classCode*: <= OBS
  moodCode*: <= EVN
  id: II [0..1]
  text: ED [0..1]
  effectiveTime: GTS [0..1]
  value: CD [0..1] (peptide code, drawn from reference databases like those used in the Polypeptide class)
  methodCode: SET<CE> CWE [0..*]
  Note: Key peptides in the protein that determine its function.

EXPRESSION DATA

Expression
  classCode*: <= OBS
  moodCode*: <= EVN
  id: II [0..1]
  code: CE CWE [1..1] (the standard's code, e.g., MAGE-ML identifier)
  negationInd: BL [0..1]
  text: ED [0..1]
  effectiveTime: GTS [0..1]
  uncertaintyCode: CE CNE [0..1] <= Uncertainty
  value: ED [1..1] (the actual gene or protein expression levels)
  interpretationCode: SET<CE> CWE [0..*] <= ObservationInterpretation
  methodCode: SET<CE> CWE [0..*]
  Constraint: Expression.value
    Constrained to a restricted MAGE-ML content model, specified in a separate schema.
  Constraint: GeneExpression.value
    Constrained to a restricted MAGE-ML constrained schema, specified separately.
  Note: The Expression class refers to both gene and protein expression levels. It is an encapsulating class that allows the encapsulation of raw expression data in its value attribute.

SEQUENCE VARIATIONS

SequenceVariation
  classCode*: <= OBS
  moodCode*: <= EVN
  id: II [0..1]
  code: CD CWE [0..1]
  negationInd: BL [0..1]
  text: ED [0..1]
  effectiveTime: GTS [0..1]
  value: ANY [0..1] (The variation itself expressed with recognized notation like 269T>C or markup like BSML or drawn from an external reference like OMIM or dbSNP.)
  interpretationCode: SET<CE> CWE [0..*] <= ObservationInterpretation
  methodCode: SET<CE> CWE [0..*]
  Note: The code attribute indicates in what molecule the variation occurs, i.e., DNA, RNA or Protein.

PHENOTYPES

ClinicalPhenotype
  classCode*: <= ORGANIZER
  moodCode*: <= EVN
  component1: 0..* observedClinicalPhenotype (typeCode*: <= COMP)
  component2: 0..* knownClinicalPhenotype (typeCode*: <= COMP)
  component3: 0..* externalObservedClinicalPhenotype (typeCode*: <= COMP)
  Constraint: ClinicalPhenotype
    At least one of the target acts of the three component act relationships should be populated, since this is just a wrapper class.

ObservedClinicalPhenotype
  CMET: (ACT) A_SupportingClinicalInformation [universal] (COCT_MT200000)
  Note: A phenotype which has been actually observed in the patient represented internally in this model.
  Note: This CMET might be replaced with the Clinical Statement Shared Model for richer expressivity, when that model is approved (currently in ballot).

KnownClinicalPhenotype
  classCode*: <= OBS
  moodCode*: <= DEF
  code: CD CWE [0..1]
  text: ED [0..1]
  effectiveTime: GTS [0..1]
  uncertaintyCode: CE CNE [0..1] <= ActUncertainty
  value: ANY [0..1]
  Note: These phenotypes are not the actual (observed) phenotypes for the patient, rather they are the scientifically known phenotypes of the source genomic observation (e.g., known risks of a mutation or known responsiveness to a medication).

ExternalObservedClinicalPhenotype
  classCode*: <= OBS
  moodCode*: <= EVN
  id*: II [1..1] (The unique id of an external observation residing outside of the instance)
  code: CD CWE [0..1]
  text: ED [0..1]
  effectiveTime: GTS [0..1]
  Note: An external observation is preferably a valid observation instance existing in any other HL7-compliant instance, e.g., a document or a message. Use the id attribute of this class to point to the unique instance identifier of that observation.

SHARED CLASSES

AssociatedObservation
  classCode*: <= OBS
  moodCode*: <= EVN
  id: SET<II> [0..*]
  code: CD CWE [0..1]
  text: ED [0..1]
  effectiveTime: GTS [0..1]
  value: ANY [0..1]
  methodCode: SET<CE> CWE [0..*]
  Note: Any observation related to the variation and is not an inherent part of the variation observation (the latter should be represented in the AssociatedProperty class). For example, the zygosity of the variation.
  Note: Any observation related to the sequence and is not an inherent part of the sequence observation (the latter should be represented in the AssociatedProperty class). For example, splicing alternatives.
  Note: Any observation related to the expression assay and is not an inherent part of the expression observation.

AssociatedProperty
  classCode*: <= OBS
  moodCode*: <= EVN
  code: CD CWE [0..1]
  text: ED [0..1]
  value: ANY [0..1]
  Note: Use this class for inherent data about the locus, e.g. chromosome no.

Notes on code/value pairs:
  Note: Code: COPY_NUMBER, ZYGOSITY, DOMINANCY, GENE_FAMILY, etc. For example, if code = COPY_NUMBER, then the value is of type INT and is holding the no. of copies of this gene or allele.
  Note: Code: CLASSIFICATION, etc. For example, if code = CLASSIFICATION, then the value is of type CV and is holding either KNOWN or NOVEL.
  Note: The code attribute could hold codes like NORMALIZED_INTENSITY, P_VALUE, etc. The value attribute is populated based on the selected code and its data type is then set up accordingly during instance creation.
  Note: The code attribute could hold codes like TYPE, POSITION.GENOME, LENGTH, REFERENCE, REGION, etc. The value attribute is populated based on the selected code and its data type is then set up accordingly during instance creation. Here are a few examples:
    If code = TYPE, then the value is of type CV and holds one of the following: SNP (tagSNP), INSERTION, DELETION, TRANSLOCATION, etc.
    If code = POSITION, then value is of type INT and holds the actual numeric value representing the variation position along the gene.
    If code = LENGTH, then value is of type INT and holds the actual numeric value representing the variation length.
    If code = POSITION.GENE, then value is of type CV and is one of the following codes: INTRON, EXON, UTR, PROMOTER, etc.
    If code = POSITION.GENOME, then value is of type CV and is one of the following codes: NORMAL_LOCUS, ECTOPIC, TRANSLOCATION, etc.
    If code = REFERENCE, then value is of type CD and holds the reference gene identifier drawn from a reference database like GenBank.
    The full description of the allowed vocabularies for codes and their respective values can be found in the specification.

Further notes from the diagram:
  Note: Use this class to point to a variation group to which this variation belongs. For example, a SNP haplotype.
  Note: Use this class to show an allele haplotype like in HLA.
  Note:
    - code should indicate the type of source, e.g., OMIM
    - text could contain pieces from research papers
    - value could contain a phenotype code if known (e.g., if it's a disease, then the disease code)

PARTICIPATIONS

IdentifiedEntity
  classCode*: <= IDENT
  id: SET<II> [0..*]
  code: CE CWE [0..1] <= RoleCode
  Note: Use this role to identify a different subject (e.g., healthy tissue, virus, etc.) than the one propagated from the wrapping message or payload (e.g., GeneticLoci).
  0..1 scopedRoleName

ScopingEntity
  classCode*: <= LIV
  determinerCode*: <= INSTANCE
  id: SET<II> [0..*]
  code: CE CWE [0..1] <= EntityCode

AssignedEntity
  CMET: (ASSIGNED) R_AssignedEntity [universal] (COCT_MT090000)

Associations shown in the diagram:
  component, component1 to component5 (typeCode*: <= COMP): 0..* associatedObservation, 0..1 associatedObservation, 0..* individualAllele, 0..* sequence, 0..* sequenceVariation, 0..* expression
  componentOf (typeCode*: <= COMP): 0..* geneticLoci
  derivedFrom, derivedFrom1 to derivedFrom5 (typeCode*: <= DRIV): 0..* associatedProperty, 0..* sequence, 0..* sequenceVariation, 0..* polypeptide, 0..* determinantPeptides
  pertinentInformation (typeCode*: <= PERT): 0..* clinicalPhenotype
  subject (typeCode*: <= SBJ; contextControlCode: CS CNE [0..1] <= ContextControl "OP"): 0..1 identifiedEntity
  performer, performer1, performer2 (typeCode*: <= PRF; contextControlCode: CS CNE [0..1] <= ContextControl "OP"): 0..* assignedEntity
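The diagram notes say that the data type of a property's value follows from its code (TYPE gives CV, POSITION and LENGTH give INT, REFERENCE gives CD). A minimal Python sketch of that rule follows. The class name AssociatedProperty and the code lists are from the notes. The function check_property, the VALUE_RULES table and the representation of CV/CD values as plain strings are assumptions made for illustration. The code lists end with "etc." in the notes, so they are examples rather than complete value sets.

```python
from dataclasses import dataclass
from typing import Any, List

# code -> (HL7 data type named in the notes, example codes or None).
# The example sets are NOT exhaustive: the notes end each list with "etc."
VALUE_RULES = {
    "TYPE": ("CV", {"SNP", "INSERTION", "DELETION", "TRANSLOCATION"}),
    "POSITION": ("INT", None),
    "LENGTH": ("INT", None),
    "POSITION.GENE": ("CV", {"INTRON", "EXON", "UTR", "PROMOTER"}),
    "POSITION.GENOME": ("CV", {"NORMAL_LOCUS", "ECTOPIC", "TRANSLOCATION"}),
    "REFERENCE": ("CD", None),
}


@dataclass
class AssociatedProperty:
    code: str
    value: Any  # CV/CD values are simplified to plain strings (assumption)


def check_property(prop: AssociatedProperty) -> List[str]:
    """Return a list of problems; an empty list means the value fits its code."""
    rule = VALUE_RULES.get(prop.code)
    if rule is None:
        return [f"no rule for code {prop.code!r}"]
    data_type, examples = rule
    if data_type == "INT":
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(prop.value, bool) or not isinstance(prop.value, int):
            return [f"{prop.code} expects an INT value"]
        return []
    if not isinstance(prop.value, str):
        return [f"{prop.code} expects a coded ({data_type}) value"]
    if examples is not None and prop.value not in examples:
        return [f"{prop.value!r} is not among the example codes for {prop.code}"]
    return []
```

For instance, check_property(AssociatedProperty("LENGTH", "three")) reports one problem, while a POSITION of 269 passes.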
The Underlying Paradigm: Encapsulate & Bubble-up
End-user applications for clinical practice & research
Decision Support Applications
EHR System
Genomic Data Sources
Knowledge (KBs, ontologies, registries, reference DBs, papers, etc.)
Bridging is the challenge…
Message flows:
  HL7 CG messages with mainly encapsulating HL7 objects
  HL7 CG messages with encapsulated data associated with HL7 clinical objects (phenotypes)
Encapsulation by predefined & constrained bioinformatics schemas
Bubbling-up is done continuously by specialized Decision Support applications
Bubble up the most clinically-significant raw genomic data into specialized HL7 objects and link them with clinical data from the patient EHR
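One way to picture the paradigm is the Python sketch below. It rests on stated assumptions: the raw payload is kept untouched in an encapsulating object, a separate (hypothetical) detection step supplies variation notations such as 269T>C, and a toy KNOWLEDGE_BASE dictionary stands in for the knowledge sources. The function bubble_up and the phenotype text are illustrative, not part of the HL7 specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SequenceVariation:
    value: str                               # notation such as "269T>C"
    known_phenotypes: List[str] = field(default_factory=list)


@dataclass
class Sequence:
    raw: str                                 # encapsulated payload, kept as-is
    variations: List[SequenceVariation] = field(default_factory=list)


# Hypothetical knowledge base: variation notation -> known phenotype texts.
KNOWLEDGE_BASE: Dict[str, List[str]] = {
    "269T>C": ["hypothetical phenotype A"],
}


def bubble_up(sequence: Sequence, detected: List[str],
              kb: Dict[str, List[str]]) -> Sequence:
    """Promote variations the knowledge base knows about into their own objects.

    `detected` is assumed to come from an external analysis of `sequence.raw`;
    parsing the raw payload is out of scope for this sketch.
    """
    for notation in detected:
        if notation in kb:
            sequence.variations.append(
                SequenceVariation(notation, list(kb[notation])))
    return sequence
```

The raw data stays encapsulated. Only the variations with known clinical relevance become first-class objects that can be linked to phenotypes.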
The Refinement Process Starts in the RIM
Observation: Act Specialization
DiagnosticImage: Observation Specialization
PublicHealthCase: Observation Specialization
So why not a genomic specialization?
Where do we draw the line and stop specializing?
The core classes: An Entity plays a Role which Participates in an Act
Generic ontology
Refining a RIM Class in a Static Model
This process of refinement is needed in Clinical Genomics
Clinical Genomics brings new & unique concepts into HL7
Therefore we proposed new RIM classes, but they were rejected
  e.g., SequenceVariance with attributes like position, length, etc.
We then developed the GEN code hierarchy instead…
Examples from the RCRIM CT Lab Model:
classCode attr.   Clone Business Name   code attr.
OBS               BaseBattery           LOINC code for Test Battery/Panel
OBS               BaseUnitaryResult     LOINC code for Test
OBS               ToxicityGrade         LOINC
The GEN Hierarchy (in ActClass vocabulary)
The GEN code hierarchy was accepted by RIM Harmonization already in July 2006!
It identifies core classes of the genomic models
  Enables further specialization of classes like sequence variation by using the code attribute
It allows CDSS to implement the bubble-up paradigm, i.e., creating and looking for genotype-phenotype associations
We need minor refinements of this hierarchy, e.g.:
  Add ALLELE code to the GEN hierarchy
    This also addresses Bob Dolin's ballot comment about SEQVAR for IndividualAllele wild type, which is not an appropriate code
  PHN (Phenotype) is a genomic-related observation (edit GEN definition)
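A decision support application could use the class codes programmatically along the following lines. This Python sketch is an illustration under assumptions: only OBS, GEN, SEQVAR and PHN are named in the text, ALLELE is the proposed addition, and the parent links in PARENT are an assumed shape of the hierarchy, not the published ActClass vocabulary. The functions is_a and candidate_pairs are hypothetical helpers.

```python
from typing import Dict, List, Optional, Tuple

# Child -> parent links (assumed tree shape, see the note above).
PARENT: Dict[str, str] = {
    "GEN": "OBS",
    "SEQVAR": "GEN",
    "PHN": "GEN",
    "ALLELE": "GEN",   # proposed code, not yet in the hierarchy
}


def is_a(code: Optional[str], ancestor: str) -> bool:
    """True if `code` equals `ancestor` or specializes it."""
    while code is not None:
        if code == ancestor:
            return True
        code = PARENT.get(code)
    return False


def candidate_pairs(observations: List[dict]) -> List[Tuple[str, str]]:
    """Naively pair every genomic (non-phenotype) observation with every phenotype."""
    genomic = [o for o in observations
               if is_a(o["classCode"], "GEN") and not is_a(o["classCode"], "PHN")]
    phenotypes = [o for o in observations if is_a(o["classCode"], "PHN")]
    return [(g["id"], p["id"]) for g in genomic for p in phenotypes]
```

The cross product is only a starting point. A real application would filter the candidates using the explicit associations in the instance and its knowledge sources.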
Complex Genotype-Phenotype Associations
“Some patients, particularly young children, may be born with a genetic mutation which means they are at risk of hearing loss after taking antibiotics called aminoglycosides. There is now a drive to consider screening patients for the genetic mutation known as m.1555A-G which is held in around 1 in 1,611 newborns in the USA, 1 in 206 newborns in New Zealand and 1 in 40,000 newborns in the UK. Aminoglycosides are valuable antibiotics used for serious infections such as complicated urinary tract infections, TB and septicemia. They are known to potentially cause damage to the ear - ototoxicity. Individuals holding the mutation have an inherited predisposition which makes them extremely sensitive to the effects - they can end up with severe and permanent hearing loss.”
The PHN (Phenotype) Class Code
Signifies that an observation is a phenotype
Typically it is associated with a source genomic observation
  But not necessarily, e.g., in cases where the exact genetic source is not yet known
A genomic observation could be associated with other genomic observations or with other observations not necessarily classified as phenotypes
Prepares the ground for a 'disease model' expressed in RIM so that patient data could be better checked against it
The IBM Clinical Genomics solution (as used in the Hypergenes project - http://www.hypergenes.eu/) relies on the GEN hierarchy and performs semantic computations based on these codes
General Vocabulary Issues
What are the clear-cut criteria that distinguish between ActClass and ActCode, preferably in a way that a designer of a clinical decision support application could use them programmatically?
ActCode should specialize ActClass, but current guidance and the actual values in the two vocabularies sometimes blur the differences between the two, e.g.:
  _MedicationObservationType act code doesn't specialize any act class
  SpecimenTransport act code could be specializing TRANS or SPCTRT
  _ActCognitiveProfessionalServiceCode act code doesn't specialize any act class
If indeed one specializes the other, why not merge them into one coherent vocabulary where sub-typing will become explicit?
  The fields classCode, [clone name] and code will bind to the appropriate layer in this proposed consolidated vocabulary
What if GEN is Moved to ActCode…
Lacking class code & clone name, we need to rely on the code attribute; however…
This poses the following problems:
Code attribute is often drawn from external terminology
Giving up class code & clone name leads to loss of granularity in the data represented by the class
Inferring the generic GEN code from a value in the code attribute might not be easy, especially if drawn from external terminology
Clone Business Names DO Carry Semantics
Current guidance: Clone business names should not carry semantics
Rebuttal: These names are meaningful to committee members and are balloted and approved by the HL7 membership
An example – CDA AuthoringDevice vs. Device
  The CDA R2 AuthoringDevice and Device clones have the same class code (DEV) and the same binding of the code attribute (EntityCode).
  In fact, they have the very same set of attributes refined exactly the same (cardinality, coding strength, vocab, etc.).
  The only way to distinguish between the two is through the clone name (unless you rely on the traversal path).
  The two clones are semantically distinct: AuthoringDevice has to do with the authoring of the document and Device has to do with some specimen mentioned in the body of the document.
In addition, the clone AuthoringDevice does carry semantics which cannot be found in either EntityClass or EntityCode vocabularies.
Device
  classCode*: <= DEV
  determinerCode*: <= INSTANCE
  code: CE CWE [0..1] <= EntityCode
  manufacturerModelName: SC CWE [0..1] <= ManufacturerModelName
  softwareName: SC CWE [0..1] <= SoftwareName
AuthoringDevice
  classCode*: <= DEV
  determinerCode*: <= INSTANCE
  code: CE CWE [0..1] <= EntityCode
  manufacturerModelName: SC CWE [0..1] <= ManufacturerModelName
  softwareName: SC CWE [0..1] <= SoftwareName
???
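The point can be checked mechanically. In the Python sketch below, the attribute refinements are copied from the two clone definitions on this slide. The dictionary representation and the helper structurally_identical are assumptions made for illustration.

```python
# Attribute name -> refinement, as listed for the CDA R2 clones on this slide.
DEVICE = {
    "classCode*": "<= DEV",
    "determinerCode*": "<= INSTANCE",
    "code": "CE CWE [0..1] <= EntityCode",
    "manufacturerModelName": "SC CWE [0..1] <= ManufacturerModelName",
    "softwareName": "SC CWE [0..1] <= SoftwareName",
}
AUTHORING_DEVICE = dict(DEVICE)  # identical refinements, different clone name

CLONES = {"Device": DEVICE, "AuthoringDevice": AUTHORING_DEVICE}


def structurally_identical(a: dict, b: dict) -> bool:
    """True if two clones have the same attributes with the same refinements."""
    return a == b


def indistinguishable_clones(clones: dict) -> list:
    """List pairs of clone names that only the name itself tells apart."""
    names = sorted(clones)
    return [(x, y) for i, x in enumerate(names) for y in names[i + 1:]
            if structurally_identical(clones[x], clones[y])]
```

A receiver that ignores clone names (and the traversal path) has nothing left to separate the two.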
Clone Business Names (cont.)
The CDA R2 Encounter and EncompassingEncounter have the same class code (ENC) and the same binding of the code attribute (ActEncounterCode). Semantically they're distinct: the EncompassingEncounter is the encounter documented in the CDA instance and the other clone represents an encounter you may refer to in the body of the document.
A different example of clone business name semantics comes from the Patient Administration (Emergency Encounter) model: the clone ValuablesLocation has class code <= OBS, no code attribute, and a value attribute that is not bound to any vocabulary (it also has mood & negation attributes). Obviously, the semantics carried by ValuablesLocation cannot be found under OBS, nor in either the ActClass or the ActCode vocabulary.
Encounter
  classCode*: <= ENC
  moodCode*: <= x_DocumentEncounterMood
  id: SET<II> [0..*]
  code: CD CWE [0..1] <= ActEncounterCode
  text: ED [0..1]
  statusCode: CS CNE [0..1] <= ActStatus
  effectiveTime: IVL<TS> [0..1]
  priorityCode: CE CWE [0..1] <= ActPriority
EncompassingEncounter
  classCode*: <= ENC
  moodCode*: <= EVN
  id: SET<II> [0..*]
  code: CE CWE [0..1] <= ActEncounterCode
  effectiveTime*: IVL<TS> [1..1]
  dischargeDispositionCode: CE CWE [0..1] <= EncounterDischargeDisposition
???
Formalize Clone Business Names…
Formalize the clone business name into a vocabulary domain, intertwined in the consolidated ActClass/Code
Adding a definition for each clone name is already done in the walk-through of each model, and possibly in the glossary
In this way, we can relate to clone names as codes in the cascading 'identification' of a class
This could ease the burden of class identification by the class code attribute, if indeed the wish is to keep the ActClass as minimal as possible
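The cascading identification could look roughly like the Python sketch below. It assumes a consolidated vocabulary in which clone names sit one layer below class codes. The CONSOLIDATED table holds only the clones discussed in this deck, and both the table and the identify function are hypothetical illustrations of the proposal, not an existing HL7 artifact.

```python
from typing import Dict, Optional

# Term -> parent term in the proposed consolidated vocabulary (hypothetical).
CONSOLIDATED: Dict[str, Optional[str]] = {
    "DEV": None,                      # class-code layer
    "Device": "DEV",                  # clone-name layer
    "AuthoringDevice": "DEV",
    "ENC": None,
    "Encounter": "ENC",
    "EncompassingEncounter": "ENC",
}


def identify(class_code: str,
             clone_name: Optional[str] = None,
             code: Optional[str] = None) -> Optional[str]:
    """Return the most specific term reached by cascading through the layers.

    Each layer is accepted only if its term is a direct child of the term
    accepted at the layer above; the cascade stops at the first gap.
    """
    if class_code not in CONSOLIDATED:
        return None
    most_specific = class_code
    for term in (clone_name, code):
        if term is None or CONSOLIDATED.get(term) != most_specific:
            break
        most_specific = term
    return most_specific
```

With this layout, identify("DEV", "AuthoringDevice") resolves to AuthoringDevice, while an inconsistent pair such as identify("DEV", "Encounter") falls back to DEV.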