D. 1.5.4-Fourth Half-Yearly report MD-Paedigree - FP7-ICT-2011-9 (600932)
Model Driven Paediatric European Digital Repository
Call identifier: FP7-ICT-2011-9 - Grant agreement No.: 600932
Thematic Priority: ICT - ICT-2011.5.2: Virtual Physiological Human
Deliverable 1.5.4
Fourth Half-Yearly report
Due date of delivery: September 30th, 2016
Actual submission date: February 9th, 2017
Start of the Project: 1st March 2013
Ending Date: 31st May 2017
Partner responsible for this deliverable: LYNKEUS
Dissemination Level: Reserved
Document Classification
Title Fourth Half-Yearly report
Deliverable 1.5.4
Reporting Period 3
Authors Lynkeus
Work Package WP1
Security R
Nature RE
Keyword(s) Half yearly report, targets, deliverables.
Document History
Name Remark Version Date
Antonella Trezzani 1.0
Mirko De Maldé 2.0
Davide Zaccagnini 3.0
Mirko De Maldé 4.0
List of Contributors
Name Affiliation
Mirko De Maldé LYNK
Marcello Chinali OPBG
Lorenza Putignani OPBG
Tobias Heimann Siemens
Alex Jones UCL
Olivier Pauly Siemens
Clara Malattia IGG
Marjolein Van Der Krogt VUmc
Claudia Mazzà USFD
Emilie Pasche HES-SO
Sebastien Gaspard GNUBILA
Harry Dimitropoulos ATHENA
Omiros Metaxas ATHENA
Marcus Kelm DHZB
Frans Steenbrink MOTEK
Maria Costa SAG
Reiner Thiel Empirica
List of Reviewers
Name Partner
Edwin Morley-Fletcher Lynkeus
Bruno Dallapiccola OPBG
Table of contents
Nature of this Deliverable
Short description
MD-Paedigree’s objectives in the first six months of the Fourth Phase
Deliverables due by month M36-46
Main achievements in the current reporting period
WP1 – Coordination & Project Management
WP2 – Clinical and technical user requirements for disease modelling
WP3 – Data acquisition and processing for Cardiomyopathies
WP4 – Data acquisition and processing for the estimation of CVD risk in obese children
WP5 – Data acquisition and processing for Juvenile Idiopathic Arthritis
WP6 – Data acquisition and processing for NND
WP7 – Genetic and metagenomic analytics
WP8 – Modelling and simulation for Cardiomyopathies
WP9 – Modelling cardiovascular risk in the obese child and adolescent
WP10 – Modelling and simulation for JIA
WP11 – Modelling and simulation for NND
WP12 – Models validation, outcome analysis and clinical workflows
WP13 – Requirements and Compliance for the MD-Paedigree Infostructure
WP14 – Grid-Cloud Services Provision and GPU Services Integration
WP15 – Semantic Data Representation and Information access
WP16 – Biomedical Knowledge Discovery and Simulation for Model-guided Personalised Medicine
WP17 – Testing and validation
WP18 – Dissemination & Training
WP19 – Exploitation, HTA, and Medical Device Conformity
Financial, administrative and consortium management relevant information
Financial and administrative information
Consortium Management
MD-Paedigree’s Meetings
Physical meetings
Dissemination activities
Conferences, Workshops attended/organised/foreseen
Conclusion
Nature of this Deliverable
This Deliverable provides a brief account of the work carried out during the period M36-M44 in the
MD-Paedigree Project. The report was initially meant to cover the period M36-M42, but it was deemed
reasonable to extend the reporting period to take into account the three-month project extension, which
moves the completion of the project to M51 (from M48).
Given the fact that the period covered by this report does not coincide with any yearly reporting period, the
deliverable doesn’t follow the template provided for by the Commission for the official Periodic Reports.
Short description
During the past months of work, the main achievements have been the following:
• Preparation and submission of the Annual Report and relevant Review
• Thorough discussion within the consortium regarding the EC comments after the review (with the
preparation of comprehensive discussion documents, attached as appendices to this report)
• Finalisation and submission of the Third Amendment to the DoW
• Request to the EC for a three-month extension of the project, with the preparation of the relevant
support letter (attached as an appendix to this report)
• Start of the preparation of the MD-Paedigree Newsletter n.4-5
• Preparation of the fourth half-yearly meeting in Leuven (12th-13th September, 2016)
During this period, several meetings and telephone conferences (TCs) were held for each of the five
Project’s subgroups: Coordination & Management, Cardiomyopathies & Obesity, JIA, Neurologic &
neuromuscular diseases (NND), Infostructure.
The half-yearly general meeting was held in Leuven (Belgium), kindly hosted by KU Leuven, on the 12th-13th
September 2016.
The third amendment to the DoW was deemed necessary in order to take into account the suggestions
coming from the Annual Review and to update the work plan on the basis of the current status of the
work. Its preparation was finalised and the amendment was subsequently submitted to the EC.
MD-Paedigree’s objectives in the first six months of the Fourth Phase
During these first eight months of Phase 4, the main objectives of the Project have been:
• On the management side, to submit the amendment and to obtain a three-month extension of the
project in order to complete the activities
• On the clinical side, to provide expert support for the validation process of the implemented models
• On the technical side, to validate the models' capabilities and set them up in advanced prototypes, and to
finalise the integration of all the analytics and visualisation tools within the Infostructure
• On the exploitation side, to work on a business model for the Project’s spin-off and to prepare a global RDA
Alliance group focused on “Big Data in healthcare” related topics
Deliverables due by month M36-46
Following the EC approval of the three-month extension of the project, a few deliverables which were due in
this six-month period have been postponed, as shown in the following table.
Deliverable n.   Deliverable title                                            WP number   Delivery date   New delivery date
D4.3             Report on patient follow up                                  4           M 40            M 44
D8.4             Whole heart coupled FSI simulation report                    8           M 42            M 45
D10.5            Report on multidimensional modelling of the disease course   10          M 42            M 45
Main achievements in the current reporting period
WP1 – Coordination & Project Management
Beneficiary partner: OPBG
The activities of the WP in the past six months focused mainly on the finalisation of the Third
Annual Report, as well as on the preparation of the Third Annual Review.
As usual, the preparation of the Third Annual Report required a specific coordination effort across all the
partners, together with checks on the consistency between the activities performed and the costs
claimed.
To this end, several online sessions were organised, involving all the partners of the Project, each of
whom was asked to present the results achieved during the second reporting period.
A three-month extension of the project was requested on the basis of the outcomes of the preceding
review with the Commission, and was granted to allow the consortium to complete follow-up data
acquisition, validation activities and the integration of tools in the Infostructure.
Furthermore, the fourth Biannual Meeting was organised in Leuven on 12-13 September 2016.
WP2 – Clinical and technical user requirements for disease modelling
Responsible partner: OPBG
Completed at M 36
WP3 – Data acquisition and processing for Cardiomyopathies
Responsible partner: OPBG
Completed at M 36
WP4 – Data acquisition and processing for the estimation of CVD risk in obese children
Responsible partner: UCL
Clinical data & Routine laboratory test data collection.
Baseline data
The enrolment period has now ended, and patient enrolment has been completed at OPBG and UCL. By
M42 (August 2016), OPBG had enrolled 104 patients, with baseline imaging data in 84 patients (84
Echocardiography, 71 MRI and 104 ECG), and other baseline data in 104 patients. This greatly exceeded
OPBG's enrolment quota (80 patients at baseline): 24 additional patients were enrolled to compensate for
the expected low return rate at follow-up. Compliance of obese patients with long-term observation is
known to be poor, as most patients do not perceive their health condition as life-threatening (Obesity 2012;
20:1319-1324; BMC Pediatr 2014; 19:14-53).
Table 1. Baseline data collection (OPBG)
Data type                      Data acquired (%)*   Data shared with infostructure (%)*
Questionnaire                  104 (130%)           65 (81%)
Anthropometry                  104 (130%)           80 (100%)
Baseline bloods and cortisol   104 (130%)           80 (100%)
OGTT                           104 (130%)           80 (100%)
Genetics bloods                98 (116%)            90 (112%)
ECG                            104 (130%)           80 (100%)
Echocardiography               84 (105%)            0 (0%)
Whole Body fat                 71 (89%)             71 (89%)
Baseline MRI                   71 (89%)             71 (89%)
Stool microbiome               75 (94%)             65 (81%)
(*) percentage based on 80 patients (expected target number for OPBG)
Echocardiograms have been acquired in 84 patients; the images are in the process of being anonymized.
MRIs have been acquired in 71 patients, all of them anonymized, processed and shared. ECG data have
been acquired and shared in the Infostructure. The electronic case report forms (eCRF) for the
questionnaires have been acquired for all the patients and 65 eCRFs have been shared. For the
estimation of adipokines, low-grade inflammation and insulin resistance, 104 assays have
been collected and stored for batch processing. 60 samples were sent to Luminex for analysis.
At UCL, enrolment was already complete and was described in the third annual report. Therefore, data
are only briefly summarised again here. UCL participants completed a meal challenge protocol instead
of the 1-year follow-up planned for OPBG and DHZB patients.
Table 2. Baseline data collection (UCL)
Data type                      Data acquired (%)*   Data shared with infostructure (%)*
Questionnaire                  82 (103%)            82 (100%)
Anthropometry                  82 (103%)            82 (100%)
Baseline bloods and cortisol   82 (103%)            82 (100%)
OGTT / meal challenge          82 (103%)            82 (100%)
Genetics bloods                64 (80%) **          64 (100%)
ECG                            82 (103%)            82 (100%)
Echocardiography               72 (90%)             72 (100%)
Whole Body fat                 82 (103%)            82 (100%)
Baseline MRI                   82 (103%)            82 (100%)
Stool microbiome               76 (95%)             52 (68%) ***
(*) percentage based on 80 patients (expected target number for UCL); (**) Some samples were lost due
to incorrect processing in the genetics laboratory; (***) Some samples awaiting laboratory processing.
DHZB has acquired and shared 8/20 full datasets (40%) at baseline and no follow-up data.
Follow up data (OPBG)
Follow-up examinations are ongoing. Fifty-four patients out of 104 (51.9%) refused to participate in the
follow-up study or were lost to periodic clinical follow-up. As of October 2016 (M44), follow-up data
had been collected for 50 patients, including echocardiograms in all of them. MRI has been acquired in 12
patients out of 50 (24%). The remaining ones declined the procedure, citing discomfort or time consumption,
or could not undergo it for clinical reasons such as the intervening implantation of metal devices or
prostheses. Table 3 summarises follow-up data collection at OPBG.
Table 3. Follow-up data collection
Data type          Data acquired (%)*   Data shared with infostructure (%)*
Questionnaire      50 (100%)            0 (0%)
Anthropometry      50 (100%)            12 (24%)
Blood sample       50 (100%)            12 (24%)
OGTT               50 (100%)            12 (24%)
ECG                50 (100%)            0 (0%)
Echocardiography   50 (100%)            0 (0%)
Whole Body fat     12 (24%)             0 (0%)
MRI                12 (24%)             12 (24%)
(*) percentage based on 50 patients
Estimation of adipokines, low-grade inflammation and insulin resistance.
At OPBG, 104 assays are stored for batch processing; 60 were sent to Luminex and 60 measurements have
been acquired. Not all measurements were technically possible at DHZB due to limited equipment
availability; therefore, some data are missing. UCL has acquired all samples at baseline and at multiple
time-points following the meal challenge in 82/80 participants. All data have been laboratory processed
except for GIP, Ghrelin and GLP-1, which will be completed by December.
Image acquisition, clinical annotation and data processing.
Images have been acquired at UCL as per the full meal challenge protocol for 82/80 patients. At OPBG,
89% of participants at baseline (71/80) and 24% at follow-up have MRI data. Manual data processing at
UCL is complete for flow measures and cardiac volumes in systole and diastole. Automated processing
techniques are also being developed by the technical partners.
DHZB – Cardiac, vascular, and full body MRI sequences have been acquired in 8 patients.
Systolic and diastolic markers of cardiac dysfunction of US and CMR.
These markers are available for all recruited patients at OPBG and UCL. DHZB has acquired cardiovascular sequences
according to the study protocol.
WP5 – Data acquisition and processing for Juvenile Idiopathic Arthritis
Responsible partner: IGG
T5.1 Data collection protocols and informed consent forms have been finalized, approved, and implemented
in all three centres. This task is now completed.
T5.2 Clinical data collection [M 4-46]
Enrolment of new patients finished in December 2015. In total, 169 patients have been enrolled. Clinical
follow-up is continuing. At the time of writing of the report, 161 6-month visits, 129 12-month visits,
91 18-month visits, 54 24-month visits and 25 flare visits have been performed.
T5.3 Routine laboratory tests [M4-46]
All patients underwent blood tests at baseline.
T5.4 Synovial and blood Cytokine and inflammatory mediators profile [M 4-46]
Blood and synovial fluid samples for Luminex were collected from 137 (81%) and 58 (34%) patients,
respectively, at baseline. Samples were also collected at follow-up if a patient presented with clinically
inactive disease or disease flare and routine laboratory tests were performed.
All samples were sent to UMCU and 136 baseline samples were analysed. The results of this analysis will be
put together in one large database, together with the clinical data, microbiota data (T5.5) and ultrasound
data (T5.6) and sent to the infostructure group for the development of the patient-specific computerised
prediction model. Newly available (follow up) data will be sent once available and will be incorporated in
the model.
The Luminex data will furthermore be explored for relations with clinical parameters, such as JIA subtype
and disease activity.
T5.5 Meta-genomic data analysis (Microbiota: metaxonomy and metabolomics-based analyses to
produce microbiome analytics data) [M 4-46]
Gut microbiota mapping has been performed in terms of targeted metagenomics by producing
alpha-diversity indexes through the Shannon and Chao1 algorithms for each patient and CTRL group (Figure 1,
panels A and B). In detail, all permutations amongst CTRLs, baseline, inactive disease and persistent
activity were analysed by post hoc analysis to identify statistically significant differences in alpha
diversity between CTRLs and patients and within patients’ subgroups (Figure 2).
The analysis was also performed stratifying patients on the basis of geographical subgroups, including
CTRLs from UMCU and OPBG (Figure 3, panels A and B, and Figure 4). The approach was repeated for
β-diversity, evaluated by unweighted Unifrac analysis (Figures 5 and 6). Additionally, the Kruskal-Wallis
test allowed us to identify JIA-related microbial biomarkers at phylum, family and species levels
(Figures 7 and 8). Spearman correlations were also computed for patient subgroups and CTRLs to identify
ecologically co-existing bacterial groups, which enhance or exclude the presence of specific taxa, with the
aim of describing bacterial community balances (Figure 9, panels A-D). From this analysis, a Venn diagram
highlighted potential shared or excluded “core” and “variant” groups of bacteria for each patient group (Figure 10).
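As an illustration of the two alpha-diversity indexes mentioned above, the following minimal Python sketch computes the Shannon index and the bias-corrected Chao1 richness estimate from a vector of OTU read counts. It is not the pipeline used in the project: the function names and the example counts are hypothetical, and the bias-corrected form of Chao1 is an assumption, since the report does not state which variant was applied.

```python
import math


def shannon_index(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def chao1_index(counts):
    """Bias-corrected Chao1 richness: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)).

    F1 and F2 are the numbers of OTUs seen exactly once and exactly twice.
    """
    observed = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))


# Hypothetical OTU read counts for a single stool sample (not project data).
example_counts = [10, 10, 1, 1, 2, 0]
print("Shannon:", round(shannon_index(example_counts), 3))
print("Chao1:", chao1_index(example_counts))
```

Shannon rewards both richness and evenness, whereas Chao1 estimates total richness by extrapolating from rare OTUs, which is why the two indexes are usually reported side by side.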
Figure 1. Evaluation of alpha-diversity for JIA disease-related enterophenotype index. Panel A. Shannon
Index for the evaluation of alpha-diversity. Panel B, Chao1 index.
Figure 2. Alpha diversity post hoc analysis for all JIA patients and CTRLs subgroups for disease-related
enterophenotype index.
Figure 3. Evaluation of alpha-diversity for geography-related JIA enterophenotype index. Panel A.
Shannon Index for the evaluation of alpha-diversity. Panel B, Chao1 index for the evaluation of
alpha-diversity.
Figure 4. Alpha diversity post hoc analysis for all JIA patients and CTRLs subgroups for geography-related
enterophenotype index.
Figure 5. Unweighted Unifrac analysis box plot
Figure 6. Unweighted Unifrac analysis box plot for each substratified group of patients and CTRLs,
including geography-dependent determinants.
Figure 7. Kruskal-Wallis test employed to assess disease microbial biomarkers for microbiome analytics
procedures at Phylum and Family levels (all values are p<0.05).
Figure 8. Kruskal-Wallis test employed to assess disease microbial biomarkers for microbiome analytics
procedures at species level (all values are p<0.05).
Figure 9. Panel A. Spearman correlation at species level for ITA_CTRLs.
Figure 9. Panel B. Spearman correlation at species level for UMCU_CTRLs.
Figure 9. Panel C. Spearman correlation at species level for ITA_JIA
Figure 9. Panel D. Spearman correlation at species level for UMCU_JIA.
Figure 10. Venn diagram of common OTUs among the JIA_IT, CTRL_IT, CTRL_UMCU and JIA_UMCU sets.
Status of data collection and integration. Targeted-metagenomics data from 319 samples including 212
from patients and 107 CTRL samples (OPBG biobank and UTRECHT reference subjects) were obtained by
454 NGS pyrosequencing analyses, microbiome maps-phenotypes-SLA (Supervised learning algorithms).
Samples from Italian patients (a more homogeneous data set, OPBG plus IGG patients) underwent
metabolomic analysis by GC-MS/SPME and 1H-NMR for volatile and non-volatile metabolites
(Figure 11). Of the 100 samples analysed for metabolomics, only 86 were also integrated
with metagenomics data (Figure 12).
Figure 11. Sample list of JIA and CTRL: 100 samples analysed by GC-MS/SPME and 1H-NMR.
Figure 12. Sample list of JIA and CTRL integrated for GC-MS/SPME, 1H-NMR and metagenomics (86 samples).
INTEGRATION MODEL: Strategy for the multivariate analysis of MG, GC/MS and NMR MB data and low-level
data fusion (integrated platforms).
PLS-DA method with double cross-validation. Partial least squares discriminant analysis (PLS-DA) is one of
the most widely used classification methods in metabolomics. PLS-DA consists of a classical PLS regression
where the dependent variable y is categorical and represents sample class membership; e.g. y can be a vector
with values of -1 and 1, where -1 marks each sample belonging to the class of controls and 1 marks each
sample belonging to the class of cases. By making use of class information, PLS-DA tends to improve the
separation between the (two) groups of samples. Model building involves two steps: 1) the selection of the
optimal model complexity, e.g. the optimal number of latent variables (#LV), and 2) the assessment of the
overall quality of the model. A double cross-validation scheme consists of two nested loops, CV1 and CV2
(see Smit et al. 2007). The aim of CV1 is to optimize the complexity of the PLS-DA model and the aim of CV2
is to assess final model performance. In the outer loop (CV2) the complete dataset is split into a test set
and a rest set: the test set is set aside and the rest set is used in a single cross-validation (inner loop,
CV1). In CV1 the rest set is again split into a validation (sometimes called optimization) set and a
training set. The statistical significance of each PLS-DA model is estimated by comparing the value of a
diagnostic statistic (number of misclassifications, NMC, or area under the receiver operating
characteristic curve, AUROC) with the values of its null reference distribution H0 obtained by permutation
tests. The discriminant Q2 (DQ2) is also shown. However, DQ2 (in contrast to NMC and AUROC) prefers PLS-DA
models with lower complexity. NMC and AUROC are more efficient and more reliable diagnostic statistics and
have been recommended for two-group discrimination in metabolomic studies [Szymanska E et al.
Metabolomics 2012 Jun; 8(Suppl 1):3-16].
The samples analysed by both NMR and GC numbered 86: 24 CTRLs, 21 baseline, 19 inactive and 22 persistent.
At family and species level, OTUs were retained only if present in at least 70% of subjects.
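The nested structure of the double cross-validation scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a toy nearest-centroid classifier restricted to the first k features stands in for PLS-DA (k plays the role of the number of latent variables), folds are assigned round-robin rather than at random, and the data are synthetic. All function names are hypothetical.

```python
def folds(n_items, n_folds):
    """Assign item positions to folds in round-robin order."""
    return [[i for i in range(n_items) if i % n_folds == f] for f in range(n_folds)]


def fit_centroids(samples, k):
    """Toy stand-in for PLS-DA: class centroids on the first k features.

    Assumes both classes (-1 and 1) are present in `samples`.
    """
    centroids = {}
    for label in (-1, 1):
        rows = [x[:k] for x, y in samples if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids, x, k):
    """Return the label of the nearest centroid (squared Euclidean distance)."""
    def dist(label):
        return sum((a - b) ** 2 for a, b in zip(x[:k], centroids[label]))
    return min((-1, 1), key=dist)


def count_misclassified(train, test, k):
    """Number of misclassifications (NMC) on `test` for a model fit on `train`."""
    model = fit_centroids(train, k)
    return sum(1 for x, y in test if predict(model, x, k) != y)


def select_complexity(rest, candidates, n_folds):
    """CV1 (inner loop): pick the complexity with the lowest cross-validated NMC.

    Ties are resolved in favour of the first (lowest) candidate.
    """
    best_k, best_nmc = None, None
    for k in candidates:
        nmc = 0
        for val_idx in folds(len(rest), n_folds):
            validation = [rest[i] for i in val_idx]
            training = [rest[i] for i in range(len(rest)) if i not in val_idx]
            nmc += count_misclassified(training, validation, k)
        if best_nmc is None or nmc < best_nmc:
            best_k, best_nmc = k, nmc
    return best_k


def double_cross_validation(data, candidates, outer_folds=4, inner_folds=3):
    """CV2 (outer loop): total NMC on test sets that were never used for tuning."""
    total_nmc, chosen = 0, []
    for test_idx in folds(len(data), outer_folds):
        test = [data[i] for i in test_idx]
        rest = [data[i] for i in range(len(data)) if i not in test_idx]
        k = select_complexity(rest, candidates, inner_folds)
        chosen.append(k)
        total_nmc += count_misclassified(rest, test, k)
    return total_nmc, chosen


# Synthetic data (not project data): 12 controls (-1) followed by 12 cases (1);
# only the first feature carries class information, the others are small noise.
toy_data = []
for i in range(24):
    label = -1 if i < 12 else 1
    noise = [((i * m) % 5 - 2) / 10 for m in (1, 3, 7)]
    toy_data.append(([2 * label + noise[0], noise[1], noise[2]], label))

nmc, chosen = double_cross_validation(toy_data, [1, 2, 3])
print("NMC over all outer test sets:", nmc)
print("Complexity chosen in each outer fold:", chosen)
```

The key property of the scheme is visible in `double_cross_validation`: the test samples of each outer fold never enter `select_complexity`, so the reported NMC is not biased by the tuning step. A permutation test would repeat the whole procedure on label-shuffled copies of the data to build the null distribution H0.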
A graphical illustration of the use of the diagnostic statistics NMC, AUROC and DQ2 in the double
cross-validation procedure of PLS-DA is reported below: a) use of diagnostic statistics in the selection
of the optimal number of latent variables in CV1; b) use of diagnostic statistics in the assessment of
overall PLS-DA model quality after the double cross-validation procedure (CV2) (Figure 13).
Figure 13. Principles of integration model.
Strategy for the multivariate analysis of metagenomics, GC/MS and NMR based metabolomics data and
low-level data fusion (integrated platforms)
Data integration: metagenomic data were integrated with metabolomic data (1H-NMR, GC-MS). Data
usage: a first set of data was uploaded to Gnubila as Excel files for the generation of the microbiota
model. The outcomes obtained were compared with the former ones. Integration patterns and data validation
have been completed: see the presentation on MG and MB omics integration (Figures 14-19).
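For readers unfamiliar with the term, low-level data fusion generally means concatenating the variables measured on different platforms into a single matrix before multivariate analysis. The Python sketch below illustrates one common variant, assuming autoscaling of each variable and a 1/sqrt(p) weight per block so that platforms with many variables do not dominate. The report does not specify the scaling actually used, so these choices, the function names and the example values are assumptions.

```python
import math


def autoscale(block):
    """Mean-centre each column and scale it to unit sample standard deviation."""
    scaled_columns = []
    for col in zip(*block):
        mean = sum(col) / len(col)
        variance = sum((v - mean) ** 2 for v in col) / (len(col) - 1)
        sd = math.sqrt(variance) or 1.0  # constant columns stay at zero
        scaled_columns.append([(v - mean) / sd for v in col])
    return [list(row) for row in zip(*scaled_columns)]


def low_level_fusion(blocks):
    """Concatenate autoscaled blocks sample by sample, weighting each by 1/sqrt(p).

    Every block must list the same samples in the same order (rows = samples).
    """
    fused = [[] for _ in blocks[0]]
    for block in blocks:
        weight = 1 / math.sqrt(len(block[0]))
        for fused_row, row in zip(fused, autoscale(block)):
            fused_row.extend(v * weight for v in row)
    return fused


# Hypothetical measurements for three samples (not project data).
mg_block = [[1.0, 4.0], [2.0, 5.0], [3.0, 9.0]]                # metagenomics, 2 OTUs
gcms_block = [[10.0], [20.0], [60.0]]                          # GC-MS, 1 metabolite
nmr_block = [[0.1, 0.2, 0.5], [0.2, 0.1, 0.5], [0.4, 0.4, 0.5]]  # 1H-NMR, 3 signals

fused = low_level_fusion([mg_block, gcms_block, nmr_block])
print(len(fused), "samples x", len(fused[0]), "fused variables")
```

The fused matrix can then be passed to a single classifier, such as the PLS-DA model described earlier, in place of the per-platform matrices.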
Figure 14. GC-MS Controls vs JIA active (baseline + persistent). Panel A.
Figure 14. GC-MS Controls vs JIA active (baseline + persistent). Panel B.
Figure 15. GC-MS JIA Inactive vs JIA active (baseline + persistent). Panel A.
Figure 15. GC-MS JIA Inactive vs JIA active (baseline + persistent). Panel B.
Figure 16. GC-MS JIA Inactive vs JIA persistent active.
Figure 17. GC-MS Control vs JIA. Panel A.
Figure 17. GC-MS Control vs JIA. Panel B.
Figure 18. Low Level Data Fusion Controls vs JIA active (baseline + persistent)
Figure 19. Low Level DF Control vs JIA.
The results of these analyses will be presented at the Annual congress of the Paediatric Rheumatology
European Society in September. Two papers are in preparation: one focused on random forest analysis to
identify microbial biomarkers in JIA patients compared with CTRLs, and one focused on the predictive
functional model of the JIA microbiota. Both papers are being prepared as back-to-back articles for
Nature Medicine.
Currently, the metabolomics analysis of these samples is underway. Once finished, the data will be
analysed and the results of the microbiota and metabolomics analysis will be published together in one
paper.
T5.6 Image acquisition and clinical annotation [M 4-46]
Baseline ultrasound has been performed in almost all patients (92%). These data will be sent to the
infostructure group together with clinical, Luminex and microbiota data for the prediction model.
Moreover, analysis of the ultrasound data itself is underway, such as an analysis of the prevalence and
significance of subclinical disease activity, determined with ultrasound.
MRI and CGA analysis for the development of the biomechanical model is still being performed in both
new-onset patients and patients with long-term involvement of the ankle. Outcomes to be predicted using
the biomechanical model have been defined and a tight schedule has been set up to create the model in
time for all available data sets.
An extensive scoring system for the scoring of different disease aspects visible on the ankle MRI (such as
synovitis, tenosynovitis, joint damage progression, bone marrow oedema and cartilage erosions) has been
developed and tested among the radiologists and rheumatologists involved in the project. The
biomechanical model will be tested against these scorings (e.g. if altered joint loadings predict joint damage
progression). To this end, a specific meeting has been held on January 12th at the IGG premises.
T5.7 Gait cycle analysis [M 4-40]
Clinical gait analysis has been performed alongside the MRIs. These exams, too, have been anonymised and
uploaded to the platform for incorporation in the biomechanical model (see T5.6). Furthermore, analysis of
the gait data has commenced, aimed at correlating the data with clinical parameters, such as disease
activity and extension.
T5.8 Data Upload and Integration into the Infostructure [M 24-46]
Clinical data has been uploaded to the platform in the form of anonymised MS Access databases and MS
Excel files. Work has been done to automate this process, converting the data to the platform format, while
maintaining the clinically useful data structure, already present in the MS Access database. Rules have been
applied to maintain and secure anonymity.
CGA and MRI files have been uploaded to the platform as well.
The database containing all clinical, microbiota, ultrasound and Luminex information at baseline, plus the
observed outcomes at the various follow up visits for all patients, which will be used by the infostructure
group for the development of the personalised prediction model, will be uploaded to the platform as well.
WP6 – Data acquisition and processing for NND
Responsible partner: VUmc
Brief overview of the work done in M37-M42
Data collection has progressed according to plan over the last 6 months. Data integration to
infostructure is behind schedule, but progress has been made and almost all procedures are in place now
to start uploading data.
TASKS
We worked on the following tasks during this time period. The progress in each of these tasks is discussed.
T 6.2 - Gait analysis collection for CP [M01 - M36]
T 6.3 - Gait analysis collection for DMD and CMT [M12 - M44]
T 6.4 - Image acquisition [M03 - M36]
T 6.5 - Data Upload and Integration into the Infostructure [M 24-44]
ISSUES
1) Some measurements are behind schedule (as reported earlier)
2) Data integration is more complex than foreseen
CORRECTIVE ACTIONS
A new timeline has been agreed upon to schedule 1) the final data collection as well as 2) data
integration. Monthly teleconference meetings with all the NND and Infostructure team members involved
will be planned to make sure this timeline is followed.
Overview data-collection status
NND OPBG
Patient Reference                 Complete/Acquired   GOAL
TOTAL OVERALL                     298                 290
Total CP prospective extended     8                   10
Total CP prospective clinical     54                  40
Total CP retrospective            200                 200
Total DMD T0                      9                   10
Total DMD T1                      8                   10
Total CMT T0                      10                  10
Total CMT T1                      9                   10
Healthy MRI                       22                  22
NND KUL
Patient Reference                 Complete            GOAL
TOTAL OVERALL                     514                 490
Total CP prospective extended     7                   10
Total CP prospective clinical     30                  40
Total CP retrospective            451                 400
Total DMD T0                      11                  10
Total DMD T1                      5                   10
Total CMT T0                      8                   10
Total CMT T1                      0                   10
NND VUMC
Patient Reference                 Complete            GOAL
TOTAL OVERALL                     50                  50
Total CP prospective extended     9                   10
Total CP prospective clinical     41                  40
Total TD reference data GRAIL     25                  20
T 6.2 - Gait analysis collection for CP [M01 - M36]
OPBG
Retrospective data
All retrospective data have been collected. Preparations to extract the clinical relevant outcome
parameters (CROPs) are finalized and we have uploaded the data on the infostrucutre. Further, all clinical
exam data has been extracted from the hospital database and uploaded on infostrucutre.
Prospective data
Standard protocol
A total of 54 gait analyses and clinical exam data sets have been collected, exceeding the required 40. The data
were uploaded to the infostructure.
Extended protocol
So far, 8 children have been studied. During the rest of the project, precedence will be given to
the standard protocol to ensure completion of prospective data collection.
KU Leuven
Retrospective data
All retrospective data have been collected and curated. Preparations to extract the clinically relevant outcome
parameters (CROPs) are finalized and the data will soon be loaded into the infostructure. Further, all clinical exam
data have been extracted from the hospital database.
Prospective data
Standard protocol
A total of 30 of the required 40 gait analyses and clinical exam data sets have been collected, of which 10 are
post measurements. Data-collection for the extended protocol is now finalized and gait analyses for the
standard protocol will resume, to be finished before the end of 2016.
Extended protocol
Data-collection for the extended protocol is going well, although, in some instances, not all data could be
collected. Main reasons were fatigue due to the length of the protocol or discomfort with some of the
measurements (mainly MRI and O2-measurement). Currently seven children have been measured, two
children are scheduled (in September and December 2016) and the parents of one child have agreed to
participate, so that the required number of children (10) will be reached by the end of 2016.
VUmc.
Prospective data
Standard protocol
All prospective data have been collected, for a total of 40 gait analyses and clinical exams. Datasets are
available for 28 children with CP (15 male, 13 female), of whom 12 were measured pre- and post-treatment
and 16 only pre-treatment. Ten patients were studied on a treadmill. The other patients were studied in
the overground gait lab. These patients were measured pre-SEML (6), pre-SDR (8), pre-post SDR (1) and
pre-post ITB (1).
Extended protocol
At this moment, 9 CP patients (7 male, 2 female) have been studied according to the extended protocol.
For all patients, extended clinical data, gait data (including functional knee and hip calibration) and MRI
data were assembled. Data collection is expected to finish next month.
T 6.3 - Gait analysis collection for DMD and CMT [M12 - M44]
OPBG
Prospective data
DMD
Currently 9 children have been studied. Data-collection for the follow-up measurements is expected to
be completed by the end of 2016.
CMT
Data-collection of the baseline measurements of 10 CMT patients has been finalized. Data-collection for
the follow-up measurements will (hopefully) be completed by the end of 2016.
KU Leuven
Prospective data
DMD.
Data-collection of the baseline measurements is finalized and follow-up measurements are scheduled.
Follow-up data of five children are collected and one measurement is scheduled in October. Two
children have declined the second measurement due to an operation and fast progression of the disease.
Data-collection for the follow-up measurements is almost completed.
CMT.
Data-collection of the baseline measurements of eight CMT children has been finalized. Unfortunately,
the required number of 10 will not be met. This will be compensated for by an extra extended DMD
measurement (11 instead of 10), several standard measurements of DMD children and additional
retrospective data (±450 instead of 400).
Two follow-up measurements have been scheduled and the other children will be contacted in
September. Data-collection is expected to be finalized at the end of February 2017.
T 6.4 - Image acquisition [M03 - M36]
OPBG.
MRIs
MRI data has been collected on all extended CP patients so far (N=7), and all T0 DMD (N=9) and CMT
(N=10, complete).
Echocardiography and ECG.
At OPBG, echocardiography was collected following the normal follow-up protocol for the
underlying disease. All patients had echocardiography performed close to the baseline data (within a range
of +/- 3 months).
KU Leuven
MRIs
Of all recruited extended CP patients, MRIs have been acquired.
Echocardiography and ECG.
At KU Leuven it was decided, because of the length of the protocol (feasibility), that echocardiography and
ECG data will be acquired when requested as standard care. Currently six echocardiography images and
six ECGs have been collected.
VUmc
MRIs
MRIs have been acquired for all CP children that were measured according to the extended protocol.
Since the quality of the first MRI scans was considerably lower than that of the scans acquired at OPBG and
KUL, data collection was put on hold to solve this problem. After extensive consultations with local MRI
experts and multiple test scans, we agreed on a protocol with sufficient quality for the analysis.
The problems have now been solved and data collection has resumed.
T 6.5 - Data Upload and Integration into the infostructure [M 24-44]
OPBG
Gait data
We have uploaded all the CROPs files and the C3D files to Gnubila. For the retrospective data we have
uploaded only the C3D files; extraction of the CROPs files is still in progress.
Clinical data
We have uploaded the Excel files (CP retrospective and prospective, DMD and CMT) to Gnubila. Note
that our Excel files are not homogeneous, because three different file formats were used.
Imaging data
We have acquired MRI for all 8 CP children (extended protocol) and for all DMD and CMT patients enrolled for
the baseline evaluation.
KU Leuven
Gait data
C3D-files of 13 gait analyses containing the ‘raw’ measurement data have been anonymized and
uploaded to the file-sharing system. Preparations for extracting the CROP data have been finalized, and a
final file format has been decided upon together with the infostructure and the other two centers. This
has proven to be a time-consuming task, while resources (budget and man-months) of both the clinical
partners and technical partners are limited. KU Leuven and infostructure are working hard to solve this.
Clinical data
The clinical exam data of KU Leuven have been extracted from the hospital database into an Excel-based format.
Together with Sheffield, KU Leuven is currently mapping its clinical data into the format VUmc
and OPBG are using. Although time-consuming, this is expected to be finalized by early to mid
September for the prospective data. Since the retrospective data largely resemble the prospective data,
converting the retrospective data should take less time. Once the clinical data of KU Leuven have been
properly converted, all available clinical data will be uploaded to the file-sharing system.
Imaging data
All available imaging data has been anonymized and uploaded to the file sharing system.
VUmc
Gait data
A custom-made MATLAB script was created to extract the clinically relevant outcome parameters (CROPs)
from the collected gait data. All preparations for extracting the CROP data have been finalized.
Clinical data
Clinical data of 37 measurements (7 extended, 30 standard prospective) were uploaded to the database.
Data of the remaining measurements have been collected in Excel spreadsheets and converted to a
Microsoft Access database. Files are ready for upload.
Imaging data
Available imaging data for the first 7 patients has been anonymized and uploaded to the file sharing
system.
WP7 – Genetic and metagenomic analytics for obese patients
Responsible partner: OPBG
Status of Data Collection: targeted-metagenomics data from 207 samples, including 128 from patients
(different phenotypes*) and 79 CTRL samples (OPBG biobank reference subjects), were obtained by 454
NGS pyrosequencing analyses (microbiome maps and phenotypes).
Data Usage: upload to Gnubila of the first set of data (Excel files); generation of the NGS microbiota model.
The outcomes obtained were compared with the former ones.
Next steps: integration patterns and data validation.
Stool samples were stratified according to geographical origin, and the groups were therefore analysed separately for
microbiota mapping.
Italian subgroups: 64 obese. UK group: 40 out of 52 in total.
Gut microbiota mapping has been performed in terms of targeted metagenomics by producing alpha-diversity
indexes through the Shannon and Chao1 algorithms for each patient and CTRL group (Figure 20, panels
A and B).
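As an illustration of the two alpha-diversity indexes named above, the following is a minimal sketch and not the pipeline used in the project. It assumes the input is a list of per-OTU read counts for one sample, uses the natural logarithm for the Shannon index and the bias-corrected form of Chao1 (one common variant; the report does not state which form was used). The function names and example counts are hypothetical.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def chao1_index(counts):
    """Bias-corrected Chao1 richness: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)).

    F1 and F2 are the numbers of taxa observed exactly once and exactly twice.
    """
    observed = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))


# Hypothetical OTU counts for a single sample (illustrative only, not project data).
sample_counts = [5, 1, 1, 2, 0]
print(f"Shannon = {shannon_index(sample_counts):.3f}")
print(f"Chao1   = {chao1_index(sample_counts):.1f}")
```

Computed per sample, such values can then be compared between patient and CTRL groups.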
Figure 20, panels A and B. Evaluation of alpha-diversity for obese disease-related enterophenotype index.
Panel A. Shannon index for the evaluation of alpha-diversity. Panel B. Chao1 index.
Figure 21. Alpha diversity post hoc analysis for all obese patients and CTRLs subgroups for disease-related
enterophenotype index.
The analysis was also performed stratifying patients and CTRLs from OPBG for β-diversity assignment,
evaluating unweighted UniFrac analysis (Figure 22). Additionally, the Kruskal-Wallis test allowed us to identify
JIA-related microbial biomarkers at phylum, family and species levels (Figures 23-24).
Figure 22. Unweighted UniFrac analysis box plot
Figure 23. Kruskal-Wallis test employed to assess disease microbial biomarkers for microbiome analytics
procedures at Phylum and Family levels (all values are p<0.05).
Figure 24. Kruskal-Wallis test employed to assess disease microbial biomarkers for microbiome analytics
procedures at species level (all values are p<0.05).
UK UCL patients were analysed on the basis of the BMI stratification and OTUs distribution addressed, as
follows:
Spearman correlations were also performed between patient subgroups and CTRLs to identify ecologically
co-existing bacterial groups that enhance or exclude the presence of specific taxa, with the aim of describing
bacterial community balances (Figures 25-27).
Figure 25. Spearman correlation at Phylum level.
Figure 26. Spearman Correlation: Family level
Figure 27. Spearman Correlation: OTUs level
Currently the remaining 12 UCL patient samples are under metagenomic analysis.
Two papers have been produced with MD-Paedigree acknowledgments: (1) Del Chierico F, Nobili V,
Vernocchi P, Russo A, Stefanis C, Gnani D, Furlanello C, Zandonà A, Paci P, Capuani G, Dallapiccola B,
Miccheli A, Alisi A, Putignani L. Gut microbiota profiling of pediatric nonalcoholic fatty liver disease and
obese patients unveiled by an integrated meta-omics-based approach. Hepatology. 2017 Feb;65(2):451-464.
doi: 10.1002/hep.28572; (2) a manuscript under review at the Journal of Hepatology (JoH), 2017.
WP8 – Modelling and simulation for Cardiomyopathies
Responsible partner: SAG
Task 8.1 - Personalised anatomical and structural heart modelling (M04 - M48):
The prototype with improved, consolidated MRI segmentation capabilities described in the last report has led to an efficient processing of new cases, which arrived at a rapid pace as many of the data anonymization issues were resolved. During this period we generated segmentations of cardiac chambers from all available baseline and follow-up MRI data and are in the process of generating segmentations of the mitral and aortic valves from all feasible echo data. In parallel, algorithms to fuse the chamber and valve segmentations are being incorporated into the prototype. The next step would be to finalize the chamber-valve fusion pipeline and generate fused meshes for all feasible cases.
Task 8.2 - Electrophysiological and biomechanical modelling and simulation (M04 - M48):
At INRIA, a multiscale model of the heart which enables a very fast and robust personalisation of the 3D heart model was built. This model led to the successful personalisation of 35 patients by fitting the volume curve. This was done by estimating optimal values for the following parameters: Maximal Contraction, Stiffness, Resting Mesh, time of Ventricular activation and 3D damping. All target observations (Minimal Volume, Maximal Volume, Time of minimal Volume, Volume at the beginning of the contraction of the atrium, Flow at the beginning of the contraction of the atrium) were fitted to the data values to within 6% of their mean population value. In addition to this main personalisation performed on all cases, other sets of parameters and observations were investigated on a reduced set of patients for further exploitation of the personalised hearts (an effort which is still ongoing).
Due to the technological development work being almost finalized at Siemens, the focus was put on the other tasks of WP8 during the last 6 months. The automatic personalization pipeline is however ready and the remaining patient datasets, which arrived in large quantities over the last months, will be processed soon.
Task 8.3 - Hemodynamic modelling and simulation (M13 - M48):
With both the reduced flow model and the full 3D CFD model finalized, there were only minor updates at the technological level. Data processing continues but it is limited in bandwidth due to data availability.
The personalization algorithm for the lumped parameter whole body circulation model has been further refined. The predictive capability of the model was evaluated using baseline and follow-up data of 12 patients. The model was personalized for the baseline state of each patient, and next, the heart rate was changed to that of the follow-up exam: the model was able to predict well the ejection fraction at follow-up, with a correlation of 0.87 and a mean absolute difference of only 4.58%. Thus we confirmed that the personalized model is able to identify subtle variations in cardiac status, as for example caused by minor changes in beta blocker therapy.
For the 3D modelling, 4 new datasets with complete anatomy, flow and valves data were available and have been processed to generate the moving geometry. 3D flow computations and analysis are currently being performed.
Task 8.4 - Whole-heart coupled Fluid-Structure-Interaction simulation (M22 - M48):
Implementation of the 2-way FSI solver has been finished and first verification tests were performed using newly segmented cases. Initial results indicate that applying the 3D fluid stress to the solid walls does not produce a significant change compared with a reduced (0D) version of the stress, while requiring significantly longer computation time (an order of magnitude higher) with the current implementation.
New imaging data have been processed to create geometric heart and valve meshes, which are being used for 2-way FSI computations. These include patients 15, 31 and 33 from OPBG and patient 71 from UCL.
Task 8.5 - Statistical shape, flow and physiological properties modelling and analysis (M04 - M48):
On the shape analysis part, we have computed an atlas of the left-ventricular (LV) shape based on a segmentation of MRI at end-diastole. The study was performed using 35 patients, and more data are being processed for inclusion. We have computed an average shape (template) representing the cardiomyopathy patients and performed a statistical analysis of the variation seen in the population around this average template. We are now extending this study to the data from all three clinical centres and investigating more clinical variables, using variable selection techniques to select the most relevant ones.
On the longitudinal part, we have studied the evolution of cardiac motion over time based on 3D+t cardiac image registration techniques using the polyaffine transformation model. Using the latest improvements of the polyaffine registration methodology, we have extracted parameters for 9 patients at 2 timepoints (baseline and follow-up). We have compared the parameters of the two timepoints in order to derive an average evolution and to link the evolution with known clinical variables. We first computed an average evolution based on all the patients to get a first view of the mean change in cardiac motion in a population with cardiomyopathies. As not all patients have the same type of evolution, we are now linking the evolution seen in the motion of different patients to pre-defined categories in order to cluster the patients according to their clinical evolution.
WP9 – Modelling cardiovascular risk in the obese child and adolescent
Responsible partner: SAG
T9.1 Heart model adaptation to the obese heart:
0. Data Collection: More data have been made available to technical partners over the last few months (see Figure 1). From OPBG, 51 cases are currently available. More have been acquired; however, they are still in the process of being anonymized by Gnubila. Follow-ups are currently being acquired at OPBG. From UCL, 81 cases have been transferred, containing all the different time points during the meal challenge. From DHZB, no further cases have been acquired or made available to technical partners. The number of cases received from DHZB so far is 8.
Figure 1: Data collection status
1. Heart Segmentation: Segmentation of cardiac chambers from MRI data moved at a rapid pace
during this period. The improved, consolidated prototype for MRI segmentation described in the
previous report led to an efficient processing of new cases, which arrived at a rapid pace as many of the data anonymization issues were resolved. During this period we generated segmentations of cardiac chambers from approximately 80% of the available baseline MRI data. While more data will be made available from OPBG, all time point acquisitions from UCL are currently being processed. The current status of data segmentation can be seen in Figure 2. Note that each UCL case has between 4 and 7 acquisitions, and that the number of MRI data sets processed so far is around 64x3 = 192.
Figure 2: Heart segmentation status
2. Whole-body circulation: Using the output of the segmentation components, as well as blood
pressure and heart rate information, whole-body circulation lumped models have been personalized for the cases collected. The current status of personalization is summarized in Figure 3.
Figure 3: Whole-body circulation lumped model personalization status
3. Strain computation: During the last few months, significant improvements have been made to automate the strain computation tool. In addition to the improved segmentation quality, it is now possible to process cases in batch mode, which has greatly improved the pace of processing. Currently, all available OPBG cases have been processed and the results are being reviewed to ensure quality. Next, the rest of the UCL cases and the DHZB cases will be processed. The current status is summarized in Figure 4.
Figure 4: Strain computation status
4. Polyaffine transform parameters: based on the motion tracking, an analysis of the motion has
been performed by computing regional affine parameters giving a reduced order representation of the motion at each frame. These polyaffine parameters have been computed for a total of 38 patients (9 from OPBG, 22 from UCL and 7 from DHZB). These parameters were then projected onto a 2D space built from the first two modes of a retrospective population of healthy and infarcted patients. The position of each patient within this space gives a representation of the efficiency of the patient's cardiac motion compared to the healthy and infarcted population.
T9.2 Automated assessment of body fat distribution from MRI and ultrasound data:
Main focus of Fraunhofer was to process the available datasets to quantify liver fat. In regard to this task, the current status is as follows:
• All available retrospective datasets from UCL were manually (82 datasets) and automatically (68 datasets) segmented. Both groups were used to quantify fat. The manually segmented datasets served as ground-truth and were used to compare them to the automatically segmented datasets using a Bland-Altman plot, which showed that the error lies between +0.4 and -0.4%.
• All available prospective datasets from UCL were automatically segmented and quantified (75 datasets).
• All available and usable prospective datasets from OPBG were automatically segmented and quantified (56 datasets).
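The Bland-Altman comparison of manual and automatic fat quantification mentioned in the first item above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Fraunhofer tool: the limits of agreement are taken as bias ± 1.96 standard deviations of the paired differences, and the fat fraction values are hypothetical placeholders.

```python
from statistics import mean, stdev


def bland_altman(manual, automatic):
    """Return (bias, lower limit, upper limit) of agreement for paired measurements.

    bias is the mean of the paired differences (automatic - manual); the limits
    are bias -/+ 1.96 times the sample standard deviation of the differences.
    """
    diffs = [a - m for m, a in zip(manual, automatic)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread


# Hypothetical liver fat fractions in percent (placeholders, not project data).
manual_fat = [10.0, 12.0, 14.0, 16.0]
automatic_fat = [10.2, 11.8, 14.2, 15.8]

bias, lower, upper = bland_altman(manual_fat, automatic_fat)
print(f"bias = {bias:.2f}%, limits of agreement = [{lower:.2f}%, {upper:.2f}%]")
```

In a full Bland-Altman plot, each difference would additionally be plotted against the mean of the two paired measurements.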
Another focus of Fraunhofer was to improve the Subcutaneous Adipose Tissue (SAT) quantification tool. For a reasonable range of parameters, we performed the described quantification which is illustrated in the following picture:
The following plot shows the result for the best five parameter sets:
It can be seen that the best five parameter sets performed equally well. Further, it can be seen that there is a big improvement in the middle of the dataset where huge MR field inhomogeneities were present.
The following images are examples of non-corrected and corrected slices corresponding to the above plot (slice 64, 67, 86, 104).
T9.3 Multi-scale data integration and virtual phenotype generation:
1. Learning patient representation: To address the challenge of complex multimodal data sources, with varying scale and distribution, we have been developing an approach based on deep learning, a form of non-linear dimensionality reduction. Starting from classical deep autoencoders, we developed multi-task deep autoencoders that aim at solving two tasks simultaneously: (i) the reconstruction of the input data after compression and (ii) the prediction of relevant targets or pseudo-targets (see Figure 5).
Figure 5: Multi-task deep autoencoders
To train such models, we use a two-phase strategy. First, we perform a layer-wise pretraining, where
each layer is trained independently, using a denoising objective. Then, training is performed using a
hybrid loss, i.e. a linear combination of a reconstruction term, a supervised classification/regression
term and a regularization term. Our model has been validated using 3 non-linear
dimensionality reduction toy examples, which can be seen in Figure 6, Figure 7 and
Figure 8. In all cases, the combination of both reconstruction and supervised
classification/regression provided very promising results.
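The hybrid loss described above can be illustrated with the following minimal sketch. The report does not specify the individual terms, so the choices here are assumptions: mean squared error for both the reconstruction and the supervised (regression) term, an L2 penalty on the weights as regularization, and hypothetical mixing coefficients `alpha`, `beta` and `lam`.

```python
def mean_squared_error(predicted, target):
    """Mean of squared element-wise differences."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)


def l2_penalty(weights):
    """Sum of squared weights (assumed form of the regularization term)."""
    return sum(w * w for w in weights)


def hybrid_loss(x, x_reconstructed, y, y_predicted, weights,
                alpha=1.0, beta=1.0, lam=1e-3):
    """Linear combination of reconstruction, supervised and regularization terms.

    alpha, beta and lam are hypothetical mixing coefficients, not project values.
    """
    reconstruction = mean_squared_error(x_reconstructed, x)
    supervised = mean_squared_error(y_predicted, y)
    return alpha * reconstruction + beta * supervised + lam * l2_penalty(weights)


# Toy values: input, its reconstruction, target, prediction and flattened weights.
loss = hybrid_loss(x=[1.0, 2.0], x_reconstructed=[2.0, 2.0],
                   y=[1.0], y_predicted=[0.0], weights=[1.0, 1.0],
                   alpha=2.0, beta=3.0, lam=0.5)
print(loss)
```

Setting `beta` to zero reduces this to the loss of a plain regularized autoencoder, which is one way to see what the supervised term adds.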
Figure 6: Two-circles data - After 100 epochs of training, the multi-task DAE reconstructs the 2 circles well from 1D.
Figure 7: Swiss roll data - After 100 epochs of training, the multi-task DAE reconstructs the 3D roll from 2D.
Figure 8: Swiss roll data - After 100 epochs of training, the multi-task DAE recovers the roll structure even after compressing
the data to 1D.
2. DeepReasoner in the cloud: During the last few months, we developed our first case-based
reasoning prototype based on deep autoencoders that works fully in the cloud. As shown in
Figure 9, our prototype is based on Azure and AzureML: the web app is served to the user
by a server running on the virtual machine, which communicates through a tunnel server with a web
service we developed using AzureML. Given a POST request, this web service sends back a prediction
for a given set of parameters. Communication between the different servers/services is performed
using HTTPS requests.
Figure 9: DeepReasoner - case-based reasoning prototype based on deep learning in the cloud
Two workflows are implemented: (i) a CSV file can be dropped into the browser to get back a
prediction from our DeepReasoner, (ii) data uploaded to the Gnubila platform can be sent directly
to the DeepReasoner to perform prediction (see Figure 10).
Figure 10: DeepReasoner - data can be sent to our web app for prediction through the Gnubila platform
We implemented a very first use case for our DeepReasoner using data from the NND disease area to
classify gait signals from the knee during swing into one of 6 different joint pattern classes for
patients with cerebral palsy. Screenshots of the current prototype can be seen in Figure 11.
Figure 11: DeepReasoner - Performing query for given patient
T9.4 Cardiovascular risk stratification and predictive disease and therapy modelling:
In the current setting, we have at hand a few patient samples with a large number of features extracted
from anthropometry, blood tests, imaging, microbiome and others. Many of those features might be
uninformative or not relevant for a given target of interest. To address this challenge, a statistical feature
analysis has been defined, to be performed prior to the training of deep learning models, to remove
non-informative features and potentially pre-identify interesting correlations. Note that if enough data were
available, the deep learning models should also be able to select only interesting features, and this can
be controlled by applying sparsity constraints during the training (such as L1 regularization). The proposed
statistical feature analysis consists of the following steps: (i) considering a specific target of interest, separate
patients into two groups, those having low values and those having high values with respect to the
median value, (ii) assuming normal distributions with unequal variances, perform Welch’s t-test to obtain
p-values for each single feature, (iii) filter out features that have a p-value above a certain threshold (e.g.
p = 0.05). Using the features available for OPBG, we started our first feature analysis experiments. Very
first results are shown below, relating microbiome features to clinical targets from OPBG (see Figure 12,
Figure 13 and Figure 14). The color within the matrices represents the
corresponding p-value between a given feature (row) and a specific target (column) (the lower the better,
i.e. the “cooler”). In the next steps, we will perform feature analyses to assess the correlation between
different families of features and targets, depending on the clinical use case. Results will be discussed with
our clinical partners to see whether these correlations seem meaningful and confirm hypotheses or
generate new ones. Subsequently we will perform some cross-validation experiments using our
DeepReasoner approach.
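The three steps of the statistical feature analysis described above (median split, Welch's t-test, p-value filter) can be sketched as follows. This is an illustration under stated assumptions rather than the project's implementation: patients whose target equals the median are assigned to the low group, each group is assumed to have non-zero variance for every feature, and the p-value is obtained by numerically integrating the Student t density with Simpson's rule so that only the standard library is needed. All names and data are hypothetical.

```python
import math
from statistics import mean, median, variance


def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value of Student's t, via Simpson integration over [0, |t|]."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

    h = upper / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    central_mass = 2 * total * h / 3  # probability that |T| <= |t|
    return max(0.0, 1.0 - central_mass)


def welch_t_test(a, b):
    """Welch's unequal-variance t-test; returns (t statistic, two-sided p-value)."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, t_two_sided_p(t, df)


def select_features(features, target, threshold=0.05):
    """Keep features whose p-value between low and high target groups is <= threshold."""
    cut = median(target)
    low = [i for i, v in enumerate(target) if v <= cut]   # ties go to the low group
    high = [i for i, v in enumerate(target) if v > cut]
    kept = {}
    for name, values in features.items():
        _, p = welch_t_test([values[i] for i in low], [values[i] for i in high])
        if p <= threshold:
            kept[name] = p
    return kept


# Hypothetical data for eight patients (placeholders, not project data).
target = [1, 2, 3, 4, 5, 6, 7, 8]
features = {
    "informative": [1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.9, 5.1],
    "noise": [1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 1.0, 2.0],
}
kept = select_features(features, target)
print(sorted(kept))
```

With many features tested against several targets, a correction for multiple comparisons may also be worth considering; the report does not mention one, so none is applied here.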
Figure 12: Feature Analysis - Correlations between microbiome L2 and blood test parameters as well as z-score BMI per age
Figure 13: Feature Analysis - Correlations between microbiome L5 and blood test parameters as well as z-score BMI per age
Figure 14: Feature Analysis - Correlations between microbiome L6 and blood test parameters as well as z-score BMI per age
WP10 – Modelling and simulation for JIA
Responsible partner: USFD
Brief overview of the work done in months 37-42
No major problems have to be reported on the technical side. In the last six months the focus shifted
mainly to processing and extracting biomarkers to answer the clinical questions identified with the
clinicians. The pipelines and methodologies in place for generating the geometrical and the
biomechanical models have been applied to the available data and patient-specific outputs are being
generated.
Tasks
T10.1 – Patient-specific anatomical modelling based on image data (M4 -51)
Both models for the ankle and the lower limbs have been improved in the past months by adding more
training data. The ankle models for the left and the right foot are now built using 23 datasets each. The
lower limbs model is built using 11 datasets.
Furthermore, additional pre-processing steps have been introduced. Anisotropic smoothing and
Gaussian filtering are used to despeckle the MRI images while preserving edges. To address MRI-specific
image inhomogeneity, non-uniform intensity normalization is now applied to input datasets
before adapting the model, resulting in more precise segmentations. The pipeline for the automatic
segmentation of ankle bones is now almost finished. Given a 3D-WATS sequence without severe image
inhomogeneities, segmentations can be produced automatically for all ankle bones except
the phalanges, which are still work in progress.
Regarding the automatic segmentation of the lower limbs bones, similar processing steps are used to
enhance the quality of the images. However, a new challenge was that the lower limbs of some patients
are divided into three or more parts which have different intensity distributions.
T10.2 – Automatic biomarker extraction (M7-45)
After establishing a pipeline for the automatic extraction of the inflamed regions, our work is now
focused on the quantification of the detected regions. For each pre- and post-CM MRI dataset, we can
currently extract the following biomarkers:
• Total number of detected inflamed regions
• The volume of each region and the total sum of the volumes in mm³
• Number of affected joints
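The first two biomarkers in the list above can be illustrated with a minimal sketch. It is not the project's extraction pipeline: it assumes that a binary inflammation mask is already available as a nested list indexed `mask[z][y][x]`, that a "region" is a 6-connected component of that mask, and that the voxel spacing is known in millimetres. The function name and the toy mask are hypothetical.

```python
from collections import deque

# Offsets of the six face-adjacent neighbours of a voxel.
NEIGHBOURS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def region_volumes(mask, spacing_mm):
    """Return the volume in mm^3 of each 6-connected region of a binary 3D mask."""
    nz, ny, nx = len(mask), len(mask[0]), len(mask[0][0])
    voxel_volume = spacing_mm[0] * spacing_mm[1] * spacing_mm[2]
    seen = set()
    volumes = []
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                if not mask[z][y][x] or (z, y, x) in seen:
                    continue
                # Breadth-first flood fill to count the voxels of this region.
                queue = deque([(z, y, x)])
                seen.add((z, y, x))
                count = 0
                while queue:
                    cz, cy, cx = queue.popleft()
                    count += 1
                    for dz, dy, dx in NEIGHBOURS:
                        n = (cz + dz, cy + dy, cx + dx)
                        inside = 0 <= n[0] < nz and 0 <= n[1] < ny and 0 <= n[2] < nx
                        if inside and mask[n[0]][n[1]][n[2]] and n not in seen:
                            seen.add(n)
                            queue.append(n)
                volumes.append(count * voxel_volume)
    return volumes


# Toy single-slice mask with two separate regions (illustrative only).
toy_mask = [[[1, 1, 0, 0],
             [1, 0, 0, 1],
             [0, 0, 0, 1]]]
volumes = region_volumes(toy_mask, spacing_mm=(2.0, 0.5, 0.5))
print(len(volumes), volumes, sum(volumes))
```

Counting affected joints would additionally require a joint label for each region, which this sketch does not model.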
For the evaluation of the inflamed regions extraction as well as the automatic detection of bone erosion,
input from the clinical partners is needed. For eight datasets we produced manual segmentations of
inflamed regions in ankle joints which need to be validated by a radiologist before we can use them for
evaluation.
The clinical inputs needed regarding the detection of bone erosion are examples of bone erosion from
our available data pool.
T10.3 – Biomechanical simulation based on image-based modelling and gait analysis (M13-45)
The activities of USFD in the last six months focused on processing the clinical gait data made available by the
clinical partners and the bone geometries produced by FhG in order to generate the musculoskeletal
models and run the biomechanical simulations of gait. In this respect, all the datasets received in a
complete and usable form from FhG and the clinical partners have been processed and the following
models have been generated: 18 ankles at baseline, 6 bilateral lower limbs at month 6 and 12 ankles at
month 12.
For six patients, for whom data were available at months 0, 6 and 12, complete musculoskeletal models of
the lower limbs were generated at all time points by merging the models of the ankle and foot with
those of the full lower limb (obtained at month 6). Finally, simulations have been run for these models;
biomarker extraction is expected to be discussed with the clinical partners, finalized and
presented at the upcoming biannual meeting in Leuven (September 2016).
T10.4 – Multi-dimensional modelling of disease course (M24 -45)
Having finalised the multi-body models, we have focused onto extracting the multi-scale models. An
attempt of building a finite element model of the cartilage has been finalised on one patient leading to
unsatisfactory results, due to the unsuitable image quality. We hence decided to approach cartilage
stress estimation at the ankle joint using a Hertzian contact model, which will be less sensitive to the
quality of the data as it considers ideal shapes to be in contact. This work is under development.
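As an illustration of the kind of estimate a Hertzian model provides, the sketch below computes the contact radius and the peak contact pressure for two elastic spheres pressed together by a normal force. The sphere-on-sphere idealisation, the function name and all numerical values are assumptions made for illustration only; they are not the geometry or material parameters used in the project.

```python
import math

def hertz_sphere_contact(force, r1, r2, e1, e2, nu1, nu2):
    """Classical Hertz solution for two elastic spheres in frictionless contact.

    force is in N, radii in m, Young's moduli in Pa.
    Returns (contact radius in m, peak contact pressure in Pa).
    """
    # Effective radius and effective (plane-strain) modulus of the pair.
    r_eff = 1.0 / (1.0 / r1 + 1.0 / r2)
    e_eff = 1.0 / ((1.0 - nu1**2) / e1 + (1.0 - nu2**2) / e2)
    # Radius of the circular contact patch.
    a = (3.0 * force * r_eff / (4.0 * e_eff)) ** (1.0 / 3.0)
    # Peak pressure is 1.5 times the mean pressure over the patch.
    p0 = 3.0 * force / (2.0 * math.pi * a**2)
    return a, p0

# Hypothetical values: 500 N load, 20 mm and 30 mm radii, soft tissue-like moduli.
contact_radius, peak_pressure = hertz_sphere_contact(
    500.0, 0.02, 0.03, 1.0e7, 1.0e7, 0.45, 0.45
)
```

Because only idealised radii and elastic constants enter the formula, the estimate does not depend on a detailed cartilage segmentation, which is the property that makes this approach less sensitive to image quality.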
Concerning the integration with the results coming from the clinical and anatomical assessments, as
mentioned in the 3rd annual report, we agreed with the clinical partners to investigate three questions:
1. Do biomechanical alterations correctly predict the location of arthritis in the lower limbs?
2. Do biomechanical alterations correctly discriminate responsive and non-responsive patients?
3. Do biomechanical alterations affect structural damage progression?
The first question has been further subdivided into identifying, from the available biomechanical
biomarkers, the laterality of the affected side and discriminating single-joint from multi-joint cases. The
determinants of the clinical progression of JIA are unknown; we suspect both systemic and local
mechanisms. The ability of anatomically located biomarkers to correctly discriminate which side is
affected would confirm that local mechanisms play a major role.
In the past three months we started evaluating the predictivity of the generated biomechanical models
with respect to identification of the affected side in patients presenting mono-lateral inflammation. The
assessment will then proceed as follows: all patient-control cases will be separated in left, right, and
bilateral and initially we will compare the average value of each biomarker in the left and right groups,
using a Student’s t-test, or a Mann-Whitney test when variables are not normally distributed. Given the
large number of biomarkers we will use exploratory analysis techniques to explore the independence of
the biomarkers, and their difference in relation to laterality. If the exploratory analysis suggests that
combinations of biomarkers might be necessary for an effective discrimination, we will then use logistic
regression models to test the discriminative power of independent biomarkers in combination. Finally, if
one or more biomarkers are confirmed to discriminate laterality, we will use the Area Under the
Receiver-Operating-Characteristic Curve (ROC-AUC) as a measure of the predictive accuracy of that
biomarker for laterality; in case of combination of biomarkers we will calculate the ROC-AUC of the
model provided by the logistic regression.
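For a single biomarker, the Mann-Whitney statistic and the ROC-AUC mentioned above are directly related: the AUC equals U divided by the product of the two group sizes. The following minimal sketch, in plain Python, shows this relation. The function names and the biomarker values are hypothetical and serve only as an illustration.

```python
def mann_whitney_u(group_a, group_b):
    """U statistic of group_a: number of (a, b) pairs with a > b, ties counting 0.5."""
    u = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u

def roc_auc_single_biomarker(group_a, group_b):
    """ROC-AUC of one biomarker: probability that a value drawn from group_a
    exceeds a value drawn from group_b (ties counted as one half)."""
    return mann_whitney_u(group_a, group_b) / (len(group_a) * len(group_b))

# Hypothetical values of one biomarker in left-affected and right-affected cases.
left_affected = [2.9, 3.4, 3.1, 4.0]
right_affected = [2.1, 2.5, 3.0, 2.2, 2.8]
auc = roc_auc_single_biomarker(left_affected, right_affected)
```

An AUC close to 0.5 would indicate that the biomarker carries no information on laterality, while values close to 0 or 1 indicate strong discrimination.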
ISSUES
In building the biomechanical models, USFD had to process data that did not pass the quality check because
they did not conform to the established protocol (e.g. segmentations of MRI images not including the
entire foot segments, absence of MRI-visible markers on some of the bony landmarks, gait data needing
partial relabelling, or missing EMG signals). This happened for all of the processed datasets and led to
an exceptional amount of work that has now been successfully completed.
CORRECTIVE ACTIONS
Issues with data quality affecting the model generation were overcome by developing and applying
specific techniques, such as the use of bone geometries from other time points and marker calibration using
gait analysis static acquisitions. The extent to which this affects the accuracy of the final model predictions is
yet to be quantified.
WP11 – Modelling and simulation for NND
Responsible partner: MOTEK
T 11.1 - Construction of a scalable mass distribution model suitable for the paediatric population [M01 - M48]
The mass scaling model for HBM was developed and tested. Deliverable 11.1 and 11.2 were submitted.
The developed mass distribution model is part of the workflow to derive patient specific musculoskeletal
models from MRI images.
T 11.2 - Development of a personalised disease specific skeletal model [M 12-48]
A personalized disease specific skeletal model was developed. Deliverable 11.3 was submitted. Different
methods were developed to construct patient specific skeletal models either from MRI data sets or from
functional calibration methods. The functional calibration method was implemented in the clinical gait
analysis software and used by the clinical partners. TUD, VUmc and Motek are setting up an extensive
validation project to compare musculoskeletal modelling outcomes between the different methods and
the generic model.
T 11.4 - Design of models driven by the dynamics of gait perturbations [M12 - M36]
A dedicated protocol for advanced perturbations of the walking surface was developed. For this, a
servo-motor was used to rapidly accelerate and decelerate one of the two belts, while at the same time joint
powers and muscle activation (using EMG) can be measured in order to estimate reflexes and spastic
activity during walking.
ISSUES
There was a delay in the availability of data from WP6 in terms of MRI and clinical gait data.
CORRECTIVE ACTIONS
The data turned out to be available but had not been uploaded to the database. In the last few weeks this has been
done, and the modelling pipeline, which was already in place, is now fully operational for developing the
patient specific models.
WP12 – Models validation, outcome analysis and clinical workflows
Responsible partner: OPBG
Given that the models are nearly ready and follow-up data collection is nearly complete, clinical validation is
now actively being performed in all disease areas.
In order to clearly establish and represent the clinical validity of the mechanical and statistical models, we
have identified detailed clinical use cases for each disease area, defined the clinical decisions which might
eventually be influenced by the performed modelling and documented how the outcome of
technical/modelling activities may influence clinical practice.
Accordingly, a robust validation protocol has been prepared, beyond the ongoing technical proofs of
accuracy, highlighting key tasks and relevant timelines.
T 12.1 - Clinical assessment and validation and integrated clinical workflows and personalized
treatment models (OPBG, M13 - M48):
As an example we report the detailed plan from the Juvenile Idiopathic Arthritis Group (WP 5):
WP-5 Clinical Validation Outline
Clinical prediction model
The clinical prediction model developed within the framework of MD-Paedigree will use baseline
predictors (i.e. available at the time of diagnosis) to predict the disease course in the first one or two
years of disease. This timeframe was chosen since a) the ultimate goal is to induce disease remission
within this time frame in all children; and b) short-term disease remission has been shown to be a
predictor of longer-term disease outcome.
Accordingly, validation of the clinical prediction model has been split up into three stages:
1. Protocol validation
2. Internal model validation
3. External model validation
Only stages 1 and 2 will be performed within the framework of MD-Paedigree. For stage 3, recruitment of
additional patients, not used for model development, is necessary. This therefore needs to be done at a
later stage.
Protocol validation
For the clinical prediction model, different predictors are taken into account, which divide themselves
into five broad categories: clinical, laboratory, microbiological, immunological and imaging.
As far as the clinical and laboratory parameters are concerned, these are routine parameters collected
during all visits of patients with (suspected) paediatric rheumatic disorders. The standardized collection and
use of these parameters have been described in a myriad of studies and can therefore be considered
validated.
Concerning the microbiological (faeces samples) and immunological (Luminex analysis) parameters,
similar remarks need to be made about both. Samples for these analyses will be collected in three
different clinical centres using slightly different approaches (e.g. collecting faeces samples at home and
sending them within 24 hours by express courier, versus storing them in a domestic freezer and bringing them
to the subsequent visit). Protocol validation for these samples will therefore consider comparability of
the samples across the different centres. In particular, during analysis of the samples, the quality of the
samples will be assessed by evaluating the quantity and viability of the material. Furthermore, during
statistical analysis, the occurrence of clustering per centre will be assessed using principal components
analysis and clustering techniques, such as k-means clustering.
Finally, the use of imaging parameters will be validated, taking into account the ease of acquiring these
data (with respect to equipment, needed training and time to acquire the images). Validation and
standardization of imaging protocols is currently being discussed and studied by dedicated working
parties (e.g. the OMERACT ultrasound working group) and falls outside the scope of this project.
Internal model validation
Performance of the model in classifying patients with a binary outcome (active disease versus inactive
disease) is being evaluated using the area under the receiver-operating characteristics curve (AUROC).
Furthermore, the sensitivity of the model to the data, and in particular its optimism in the predictions
(the phenomenon that a model always performs best in the data set it was fitted to) is tested using 10-
fold cross-validation, a technique aimed at reducing the optimism of the model, thus yielding an
estimate of its real performance in independent data sets.
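The notion of optimism can be sketched as the difference between the apparent performance (model fitted and scored on the same data) and the cross-validated performance. The plain-Python example below uses a deliberately simple threshold classifier and accuracy instead of the AUROC in order to stay short. The classifier, the function names and the default seed are assumptions for illustration, not the project's prediction model.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the sample indices and deal them into k disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def fit_threshold(xs, ys):
    """Toy model (hypothetical): threshold halfway between the two class means."""
    n_pos = sum(ys)
    mean_pos = sum(x for x, y in zip(xs, ys) if y == 1) / max(1, n_pos)
    mean_neg = sum(x for x, y in zip(xs, ys) if y == 0) / max(1, len(ys) - n_pos)
    return (mean_pos + mean_neg) / 2.0

def accuracy(threshold, xs, ys):
    """Fraction of cases where 'x above threshold' matches the active-disease label."""
    return sum((x > threshold) == bool(y) for x, y in zip(xs, ys)) / len(xs)

def cross_validated_accuracy(xs, ys, k=10, seed=0):
    """Average held-out accuracy over k folds (requires at least k samples)."""
    scores = []
    for fold in k_fold_indices(len(xs), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        thr = fit_threshold([xs[i] for i in train], [ys[i] for i in train])
        scores.append(accuracy(thr, [xs[i] for i in fold], [ys[i] for i in fold]))
    return sum(scores) / len(scores)

def optimism(xs, ys, k=10, seed=0):
    """Apparent accuracy minus cross-validated accuracy."""
    apparent = accuracy(fit_threshold(xs, ys), xs, ys)
    return apparent - cross_validated_accuracy(xs, ys, k, seed)
```

A large positive optimism indicates that the apparent performance overstates what can be expected in independent data sets.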
Biomechanical ankle model
Detailed clinical use case for validation
Four predictive medicine use cases were identified:
1. Identify from the biomarkers which side is affected, in mono-lateral cases. The ability of
anatomically located biomarkers to correctly discriminate which side is affected, would confirm
that local mechanisms play a major role, and justify the following use cases.
2. Identify from biomarkers single-joint from multi-joint cases: test whether the biomarkers can
correctly separate those patients who have multiple joints involved, from those who only have
one joint. This could cast some light on the mechanisms beyond this significant difference in the
clinical manifestation of the disease.
3. Treatment stratification: JIA patients are stratified, at the 6-month control, into those who
responded to the first-line treatment and went into remission, and those who did not: responders
and non-responders. Prolonged disease activity in the non-responder patients, who at the
second control start a more aggressive treatment, increases the risk of irreversible joint damage,
which could be reduced or avoided altogether if we could identify from the outset which patients
will not respond to the first-line therapy. If testing shows that biomarkers can accurately separate
these two groups on the basis of the data collected at the first visit, we could modify the clinical
pathway, so that those patients who are considered, on the basis of these biomarkers, to be at
risk of non-response could be immediately treated with the more aggressive therapies.
4. JIA progression: this is characterised by two major clinical signs, local inflammation and cartilage
damage or bone erosions. While all patients who have a prolonged inflammatory flare will
eventually develop some cartilage damage, the delay between the beginning of the flare and
that of the cartilage damage, and the rate at which such damage progresses, vary considerably
between patients. The most fundamental question is: do biomechanical alterations affect
structural damage progression? Accordingly we imagine a clinical pathway where the subject-
specific model is used to personalise the life style recommendations or the use of assistive
devices until the inflammatory flare recedes.
Validation protocol
Robustness
The robustness of the data acquisition, processing, and modelling will be quantified through the results
of the quality assurance protocol applied over the entire workflow:
a) Data availability: for each dataset to be collected (for each patient-control) confirm if it has been
collected and properly uploaded to the project repository;
b) Data processability: for each dataset and for each data processing step, confirm that the data
can be processed into an output of quality sufficient for its subsequent use;
c) Data modellability: for each dataset and for each data processing step, confirm that the data
processing outputs are of quality sufficient to inform the subject-specific model, without
significant degradation of the expected predictive accuracy.
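One possible way of tabulating the three checks per dataset and deriving pass rates is sketched below. The record layout, field names and function name are hypothetical and do not describe the quality assurance tooling actually used in the project.

```python
from dataclasses import dataclass

@dataclass
class DatasetQA:
    """Outcome of the three quality-assurance checks for one dataset (hypothetical record)."""
    dataset_id: str
    available: bool    # a) collected and uploaded to the project repository
    processable: bool  # b) processing yields output of sufficient quality
    modellable: bool   # c) output is good enough to inform the subject-specific model

def robustness_summary(records):
    """Fraction of all datasets passing each check; each check is counted only
    for datasets that also passed the previous ones."""
    if not records:
        raise ValueError("at least one dataset record is required")
    available = [r for r in records if r.available]
    processable = [r for r in available if r.processable]
    modellable = [r for r in processable if r.modellable]
    n = len(records)
    return {
        "availability": len(available) / n,
        "processability": len(processable) / n,
        "modellability": len(modellable) / n,
    }
```

Reporting the three fractions side by side shows at which step of the workflow datasets are lost.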
Clinical accuracy – laterality discrimination
All patient-control cases will be separated in left, right, and bilateral.
Initially we will compare the average value of each biomarker in the left and right groups, using a
Student’s t-test, or a Mann-Whitney test when variables are not normally distributed.
Given the large number of biomarkers we will use exploratory analysis techniques to explore the
independence of the biomarkers, and their difference in relation to laterality.
If the exploratory analysis suggests that combinations of biomarkers might be necessary for an effective
discrimination, we will then use logistic regression models to test the discriminative power of
independent biomarkers in combination. Depending on the completeness of the biomarkers matrix
some cases might have to be dropped from this analysis, as sparseness degrades the performance of
logistic regressions.
Finally, if one or more biomarkers are confirmed to discriminate laterality, we will use the Area Under the
Receiver-Operating-Characteristic Curve (ROC-AUC) as a measure of the predictive accuracy of that
biomarker for laterality; in case of combination of biomarkers we will calculate the ROC-AUC of the
model provided by the logistic regression.
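The step from single biomarkers to a combination can be sketched as follows: a logistic regression is fitted on two biomarkers and the ROC-AUC of its predicted probabilities is compared with that of one biomarker alone. This is a minimal plain-Python illustration using batch gradient descent. The toy data, learning rate and number of epochs are assumptions, and a real analysis would rely on an established statistical package.

```python
import math

def predict_proba(weights, x):
    """Probability of the positive class; weights = [bias, w1, ..., wp]."""
    z = weights[0] + sum(w * xj for w, xj in zip(weights[1:], x))
    z = max(-30.0, min(30.0, z))  # clamp so that exp() stays well behaved
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit a logistic regression by batch gradient descent on the mean log-loss."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            err = predict_proba(weights, x) - y
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights

def roc_auc(scores, labels):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy data: two biomarkers per case, label 1 = left side affected.
rows = [(2, 1), (1, 2), (2, 2), (1, 1), (0, 1), (1, 0)]
labels = [1, 1, 1, 0, 0, 0]
weights = fit_logistic(rows, labels)
auc_combined = roc_auc([predict_proba(weights, r) for r in rows], labels)
auc_first_only = roc_auc([r[0] for r in rows], labels)
```

In this toy example neither biomarker separates the groups on its own, whereas their combination does, which is the situation the logistic regression step is meant to detect.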
Clinical accuracy – Single-multi joint discrimination
The protocol is identical to the previous one, except that in this case we will group the biomarker values of
patient-controls into single-joint and multiple-joint groups.
Clinical accuracy – Treatment stratification
All patients for whom it was possible to obtain a given biomarker at the zero and six-month controls will be
grouped into responder and non-responder groups, based on the clinical assessment at the second
control. The two groups will then be analysed with the same approach described in the previous points.
So far only 3 patients were not responsive to the treatment, so this assessment might be challenging.
Clinical accuracy – Cartilage damage prediction
This is the most complex use case. Assuming for simplicity that we can reduce this complex process to a
binary condition (for example patients who do have cartilage damage at 6 months, as opposed to those
who have observable damage only at 12 months), the assessment protocol would be similar to those
above.
Clinical validation work plan
The validation protocol will assess the feasibility of the various protocols (data collection, processing, etc.)
adopted in the project and the clinical accuracy of the developed models. Summarizing the content of
the previous sections, the individual tasks to be accomplished can be listed as follows:
1) Assessment of data acquisition, processing and modelling procedures
2) Validation of model accuracy in predicting flare laterality
3) Validation of model accuracy in identifying the number of affected joints
4) Validation of model accuracy in predicting treatment response (stratification)
5) Validation of model accuracy in predicting cartilage damage
The proposed working plan is presented in Table 1 as a Gantt chart reporting the objectives of the
validation protocol at 3 month intervals.
Table 1 Gantt chart for the proposed project plan and timelines of the validation protocol.
Tasks    M43    M45    M48    M50
Assessment of procedures
Clinical accuracy: Laterality
Clinical accuracy: Multi-joint discrimination
Clinical accuracy: Treatment stratification
Clinical accuracy: cartilage damage prediction
Issues: due to the significant heterogeneity and complexity of the different diseases analyzed by the
NMD group, delivery of the model has been delayed. Accordingly, the validation process has suffered
some delay.
Corrective actions: WP12, with the agreement of the MD-Paedigree consortium, has successfully
guided a reallocation of funds specifically dedicated to NMD validation.
WP13 – Requirements and Compliance for the MD-Paedigree Infostructure
Responsible partner: HES-SO
Completed at M36
WP14 – Grid-Cloud Services Provision and GPU Services Integration
Responsible partner: MAAT/GNUBILA
T14.1 Adaptation and Extension of Sim-e-Child Platform (MAAT, ATHENA, LYN) [M1-48]
The Beta Release is taking place, with the last functionalities (such as access rights) under configuration
according to project specifications.
The system is used by several users with different profiles:
- IT partners querying data using the query management tool.
- IT partners querying data using the web service.
- Physicians using the Patient Browser to check specific patient information.
- Modellers downloading patients’ data through the patient cat interface to test modelling tools.
T14.3 Athena Distributed Processing (ADP) Engine Integration (ATHENA, MAAT, UTBV) [M1-48]
ATHENA has continued development of the open source project EXAREME (ex. ADP+madIS,
http://www.exareme.org/) which offers a relational processing engine able to support scalable
distributed execution of complex, resource and time-consuming data processing flows mainly related to
data mining and decision support. In addition, data mining algorithms can be implemented with
EXAREME in a privacy-preserving way, transmitting only aggregated hospital data (sufficient statistics).
The IT infrastructure typically required to execute a distributed and privacy-preserving algorithm
through EXAREME comprises four main components:
(1) a Worker component of EXAREME which is deployed inside each data node, accesses the local
database and processes the local data.
(2) a Master component of EXAREME which is deployed inside each data node, accesses the worker
components of all the data nodes and processes the aggregated results which are computed and
transmitted by the worker components.
(3) a Repository which stores the source code of algorithms in the form of query templates. The repository
is hosted by a version control system (VCS). It can be accessed by the worker and master components as
well as by the users. In the future, this repository should also provide access control (authentication and
authorisation) as well as audit trail/logging capabilities. Authentication and authorization will be
achieved through the use of user credentials (passwords, access keys, etc.). A code review system will be
used to review all changes to the repository. A bug tracking/ticketing system will be used to collect all
user-submitted bugs and feature requests.
(4) a Gateway which enables communication between the user and the system. The
Gateway is in reality a part of the Master Component.
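The division of labour between worker and master components can be illustrated with a toy computation of a mean and standard deviation, in which each data node returns only aggregated sufficient statistics (count, sum, sum of squares) and the master merges them. This is a plain-Python sketch of the principle and not the EXAREME API or its query templates. The function names and the data are hypothetical.

```python
import math

def worker_sufficient_stats(local_values):
    """Runs at a data node: reduce local records to (count, sum, sum of squares).
    Only these aggregates leave the node, never the individual values."""
    return (len(local_values), sum(local_values), sum(v * v for v in local_values))

def master_aggregate(stats_from_workers):
    """Runs at the master: merge the aggregates and derive the global mean
    and sample standard deviation (requires at least two records in total)."""
    n = sum(s[0] for s in stats_from_workers)
    total = sum(s[1] for s in stats_from_workers)
    total_sq = sum(s[2] for s in stats_from_workers)
    mean = total / n
    variance = (total_sq - n * mean * mean) / (n - 1)
    return mean, math.sqrt(max(variance, 0.0))

# Hypothetical measurements held at two hospitals.
hospital_a = [1.0, 2.0, 3.0]
hospital_b = [4.0, 5.0]
global_mean, global_sd = master_aggregate(
    [worker_sufficient_stats(hospital_a), worker_sufficient_stats(hospital_b)]
)
```

The result is identical to what would be obtained by pooling the raw data, although the master never sees individual patient values.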
EXAREME currently supports the following functionalities for supporting distributed data mining
algorithms in a privacy-preserving way:
1. Get list of the available algorithms such as K-Means, Linear Regression, Covariance Matrix, Standard
Deviation, Summary Statistics.
2. Submit any of the available algorithms for execution.
3. Get the execution status of a submitted algorithm.
4. Get the execution results of a completed algorithm.
T14.4 SOKU Implementation (MAAT, HES-SO, LYN) [M13-48]
T14.4.1 SOA design for VPH-Share and open source technologies integration (MAAT)
The VPH-DPS tool has been generalised to several import processes, allowing the same sources to be
redirected to VPH with no additional coding. In addition, whether or not data are imported via DPS, the
web-service-based standard followed by the repository for data retrieval allows easy automated transfer
to trusted platforms.
T14.4.2 SOA Governance Layer (SGL) development (HES-SO, MAAT)
Technical access rights ability has been added to the system. A governance policy has been proposed to
the partners at the semi-annual meeting. The system will be configured to follow this policy.
T14.5 Privacy and security issues (LYN, ALL) [M1-48]
A data control process has been defined and is taking place.
Issues
Data collection is still an issue. It is still difficult for the infostructure to handle all the sources because of
the complexity and the plurality of modalities. Completeness of patients’ data is not easy to define, as
it depends on pathology, follow-up and response to treatment. The original definition used for the
calculation of completeness of collected data is not applicable and distorts the indicators.
Corrective actions
Data managers and controllers have been defined for each disease group at each centre to streamline
communications. A data follow-up procedure, based on the data provided and on the objectives, has been
developed and will guide the final steps of the collection. Close coordination between the Quality manager
and the infostructure partners is under way in order to redefine and complete the existing indicators so that
they reflect the real state of progress.
WP15 – Semantic Data Representation and Information access
Responsible partner: HES-SO
Brief overview of the last six months
T15.1 Data curation and validation tool (ATHENA, MAAT, URLS) [M6-48]
In the past half year, the Data Curation and Validation (DCV) tool was tested with multiple users and it
performed effectively in terms of the response time of the various actions. Furthermore, new feedback
about DCV functionality with regards to user friendliness and interactivity was given by the clinicians and
biomedical personnel, driving each further release of the tool (i.e. bugs were corrected and the user-
interface and interactive visualizations were improved based on this feedback). Extensive study was
conducted in terms of enhancing DCV with the addition of PAROS, a usage profiling component, so that
recommendations may be given to further support users in their exploration and curation goals. Once
the work is completed, the profiler will monitor the users’ actions and choices, as they interact with the
data, and build their profiles. The recommender engine will then be able to provide recommendations
on any relevant possible future actions. For example, it may suggest visualization approaches that have
proven useful to other users on similar data or corrections on erroneous data, by generalizing the
pattern of errors and their corrections performed earlier. Concerning the gait data, C3D files have been
checked by URLS for anonymization and completeness, which was useful for subsequent processing and
modelling as expected from the DoW. Where relevant, collected data have been rejected.
T15.2 Semantic data representation and interoperability (SAG, ATHENA, HES-SO, LYN) [M1-48]
The service to automatically attribute MeSH descriptors to discharge summaries is ready. This service has
been integrated to perform several tasks (indexing, machine translation…).
T15.3 Case-based querying (ATHENA, HES-SO, SAG, LYN) [M1-48]
The strategy explored in task 15.2 has been used to expand the CBR functionalities to integrate content-
based image retrieval and legacy literature retrieval. Based on a set of episodes of care judged as
relevant by the users and documented in Italian, MeSH terms are automatically assigned. The Italian
MeSH terms are then transcoded into English descriptors. More generic/specific descriptors can be
added using the UMLS meta-thesaurus. In parallel, normalized MeSH terms are used to query Shambala
and Europe PMC, the Open Institutional Archive for life and health sciences.
In addition, ATHENA has focused on developing a similarity-based querying service. This service has
been incorporated into the DCV data curation and analytics platform (T15.1 & T16.1) and at this point is
mainly based on the well-established k-NN algorithm and several similarity functions. Moreover, for free-text
attributes (e.g., discharge summaries) and/or genetic or other Bag of Words (BoW) data, similarity analysis is
based on an innovative multi-view probabilistic topic modelling engine.
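The principle of k-NN based similarity querying with a pluggable similarity function can be sketched in a few lines of plain Python. The patient identifiers, feature vectors and function names below are hypothetical and are not taken from the DCV platform.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two numeric feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_nearest_patients(query, patients, k=3, distance=euclidean):
    """Return the ids of the k patients closest to the query vector.

    patients maps a patient id to its feature vector; distance can be
    replaced by any other function of two vectors."""
    ranked = sorted(patients.items(), key=lambda item: distance(query, item[1]))
    return [patient_id for patient_id, _ in ranked[:k]]

# Hypothetical normalised feature vectors for four patients.
patients = {"p1": [0.0, 0.0], "p2": [1.0, 1.0], "p3": [5.0, 5.0], "p4": [0.5, 0.0]}
most_similar = k_nearest_patients([0.0, 0.0], patients, k=2)
```

Passing a different distance function changes the notion of similarity without changing the query logic.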
T15.4 Multimodal case-based retrieval and query reformulation (HES-SO, SAG, ATHENA, URLS, MAAT)
[M1-48]
Further development on the case-based retrieval service has been implemented, based on the
suggestions/feedback obtained during both the face-to-face evaluation session held in Rome in January
2016 and the training session held in Rome in February 2016. In particular, the processing of the
relevance feedback has been improved: 1) improvement of the Rocchio-based refinement, 2) addition
of a MeSH refinement based on the service developed within task 15.2; and 3) exclusion of cases
judged as non-relevant. Alternative feedback features are also being investigated, such as latent semantic
indexing (LSI) in cooperation with UTBV. Cross-links to expand user queries using content-based image
retrieval (radiology reports) and literature retrieval have been added to the GUI. As future work, the
integration of the updated GUI should be achieved on the MD-PAEDIGREE portal.
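For reference, the classical Rocchio update on which this kind of relevance feedback is based moves the query vector towards the centroid of the cases judged relevant and away from the centroid of those judged non-relevant. The sketch below shows the textbook form on sparse term-weight vectors. The default coefficients are common textbook values and, like the function name and the example terms, are assumptions rather than the settings of the project's service, which in addition excludes non-relevant cases from the results.

```python
from collections import defaultdict

def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Classical Rocchio update on sparse vectors (dicts mapping term -> weight)."""
    updated = defaultdict(float)
    for term, weight in query.items():
        updated[term] += alpha * weight
    # Add the relevant centroid and subtract the non-relevant centroid.
    for docs, coeff in ((relevant, beta), (non_relevant, -gamma)):
        for doc in docs:
            for term, weight in doc.items():
                updated[term] += coeff * weight / len(docs)
    # Terms whose weight becomes non-positive are dropped from the query.
    return {term: weight for term, weight in updated.items() if weight > 0}

# Hypothetical feedback: one relevant and one non-relevant case.
refined = rocchio(
    {"arthritis": 1.0},
    relevant=[{"arthritis": 1.0, "ankle": 1.0}],
    non_relevant=[{"obesity": 1.0}],
)
```

In this example the refined query gains the term "ankle" from the relevant case, while "obesity" is not added.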
Regarding image-based retrieval itself, a use-case scenario involving cardiomyopathy patients has been
identified. It is currently being used to adapt the legacy tools Shangri-La and Shambala so that they
are able to deal with the variety of imaging modalities encountered in the MD-Paedigree repository. New
feature extraction methods are also being developed in order to facilitate the assessment of patient
similarity. Future work includes the addition of the tools to the MD-PAEDIGREE portal and cross-linking
with the text-based retrieval tools.
T15.5 Data Modelling and Support (MAAT, USFD, ATHENA, SAG, HES-SO, URLS)
Preliminary work on anonymization of discharge summaries has been performed by HES-SO based on a
specific EHR document template provided by GOSH.
By using the protocol already developed and validated in WP11, URLS conducted a detailed analysis of
strength measurements in healthy subjects and children with Duchenne muscular dystrophy. A Quality
Assurance Model was developed as well as a reporting system for each strength trial. Personalized
results and quality analysis were produced for each subject.
A quality coefficient was computed for each trial, making it possible to discard unreliable trials and to
ensure the overall quality of strength measurements.
WP16 – Biomedical Knowledge Discovery and Simulation for Model-guided Personalised Medicine
Responsible partner: ATHENA
Brief overview of the last six months
In T16.1 “General data analysis and knowledge discovery tools”, we have continued developing
ATHENA’s end-to-end data cleaning, analysis and KDD platform. The developments of the past six
months have been driven by specific use-cases provided by clinicians from the NND, JIA and obesity
groups.
More specifically, for the NND use-case predictive models were created to automate the clinicians' CP
gait classification. White-box techniques (Random Forests) were used in order to provide clinicians with
the rules constructed for each classification. Having completed the classification step, we are now working on
identifying similar patients in each new classification group by applying different distance functions. In order
to combine and accelerate our analysis, we also applied the k-NN classifier to both classify the patients
and search for similar ones simultaneously. Models were assessed using a stratified 10-fold cross-
validation and the results showed that our models achieve prediction accuracies over 85%. Therefore,
the models seem to be reliable for classification purposes, thus facilitating the identification of patient
similarities.
For JIA, we pre-processed with DCV the JIA clinical and Luminex datasets, followed by a preliminary
clustering and dimensionality reduction analysis. The Luminex borderline cases appear not to form their
own sub-cluster, showing that a more sophisticated calculation of baseline threshold should be designed.
The projection onto two-dimensional space also showed three patient outliers. Next, we will
design the supervised analysis once the enriched datasets become available.
Following the previous study on Gait Analysis data storage and its criticalities related to the C3D file
format, URLS developed a software tool, with graphical interface, to access kinematics and kinetics Gait
tracks and convert them to other accessible formats. The tool can now be used to extract relevant data
from C3D and share it anonymously with other partners.
For the NND group, URLS designed a protocol to validate strength measurements and conducted a study
about the quality of knee strength assessment in healthy subjects. A paper containing the results was
prepared and it is currently under review at IEEE Transactions on Instrumentation and Measurement.
The software for processing strength trials has now been updated with a graphical user interface to speed up
processing and help clinicians use it. The tool can show graphical reports or export data in
accessible formats. It computes a quality coefficient for each strength trial, which makes it possible
to identify and discard low-quality trials.
In T16.2 “PAROS Personalisation Platform”, work is under way on enhancing DCV with the addition of
PAROS, a usage profiling component, so that recommendations may be given to further support
users in their goals. The expected outcome is that the profiler will monitor the users’ actions and choices as
they interact with the data and build their profiles. The recommender engine will then provide
recommendations on any relevant aspect of possible future actions, e.g., the actions themselves or
whole sequences of them (workflows), their results, source datasets, visualizations, so as to further
improve the exploration and curation tasks. For example, it may recommend a search or analysis on a
particular area of a dataset, on the basis that its results may reveal similar errors to those the user
corrected earlier in a different area of the dataset. Or it may recommend a more specialized search than
those already posed by the user, based on how other similar users proceeded with their exploration and
eventually succeeded, having followed similar earlier searches as well. It may also suggest specific data
transformations, based on user-defined mathematical formulas that have been used on similar data;
additions to the logical profile of data (e.g., functional dependencies), when elements of the
corresponding statistical profile appear (almost) universal; visualization approaches that have proved
useful to other users on similar data; or corrections on erroneous data, by generalizing the pattern of
errors and their corrections performed earlier. Beyond actions, it may also directly recommend results of
searches similar to the ones posed (generalizations or alternative specializations usually), actual
visualizations, and possible repairs produced by profiling, detection, and cleaning algorithms.
In T16.3 “AITION Knowledge Discovery & Simulation Framework”, we performed a preliminary analysis of the
NND and JIA datasets using exact structure learning algorithms. Our goal is to discover associations among
specific sets of variables for each case and to build specific statistical simulation models that can be
used later on for targeted prediction tasks and “what-if” analysis.
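The report does not state which exact structure learning algorithm was used. As a hedged illustration of the general idea, the sketch below finds the best-scoring Bayesian network structure by dynamic programming over variable subsets with a BIC score. This is one well-known exact approach, assumed here for illustration; the function names and the toy data are hypothetical, and the method is only practical for a small number of discrete variables.

```python
import math
from collections import Counter
from functools import lru_cache
from itertools import combinations

def bic_local(data, child, parents):
    """BIC score of one variable given a candidate parent set (discrete data)."""
    n = len(data)
    joint = Counter((tuple(r[p] for p in parents), r[child]) for r in data)
    parent_counts = Counter(tuple(r[p] for p in parents) for r in data)
    loglik = sum(c * math.log(c / parent_counts[pa]) for (pa, _), c in joint.items())
    r_child = len({r[child] for r in data})
    q = 1
    for p in parents:
        q *= len({r[p] for r in data})
    # Penalty: 0.5 * log(n) per free parameter
    return loglik - 0.5 * math.log(n) * q * (r_child - 1)

def learn_exact(data, n_vars):
    """Return (score, edges) of a highest-scoring DAG over variables 0..n_vars-1."""

    @lru_cache(maxsize=None)
    def best_parents(child, candidates):
        # Exhaustively score every subset of the candidate parents.
        options = (ps for k in range(len(candidates) + 1)
                   for ps in combinations(candidates, k))
        return max((bic_local(data, child, ps), ps) for ps in options)

    @lru_cache(maxsize=None)
    def best_net(variables):
        if not variables:
            return 0.0, ()
        best = None
        # Every DAG has a sink: try each variable as the sink of this subset.
        for sink in variables:
            rest = tuple(v for v in variables if v != sink)
            rest_score, rest_edges = best_net(rest)
            sink_score, parents = best_parents(sink, rest)
            candidate = (rest_score + sink_score,
                         rest_edges + tuple((p, sink) for p in parents))
            if best is None or candidate[0] > best[0]:
                best = candidate
        return best

    return best_net(tuple(range(n_vars)))

# Toy data: variable 1 copies variable 0, variable 2 is independent of both.
data = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)] * 10
score, edges = learn_exact(data, 3)
print(edges)
```

On the toy data the search links variables 0 and 1 and leaves variable 2 isolated. The direction of the single edge is arbitrary because both orientations receive the same BIC score.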
T16.4 “Data-driven drug and trial design”
Minor maintenance tasks (security patches, terminology updates...).
WP17 – Testing and validation
Responsible partner: ATHENA
Brief overview of the last six months
T17.1 MD-Paedigree Infrastructure testing and validation [M18-48]
The infrastructure is in place and running, and several users with different profiles use it. Physicians,
modellers and IT staff are successfully using the infrastructure through all the available kinds of
connection. Problems are reported and frequently fixed by the infrastructure team in an agile process. The
last data format has been identified and the import process has been defined. A quality controller has been
chosen for each disease group and each centre to check and validate data completeness and format in the
repository.
T17.2 Case- and ontology-based retrieval service testing and validation [M24-36]
Implementation of the suggestions/feedback provided by the clinicians during both the face-to-face
evaluation held in Rome in January 2016 and the training session held in Rome in February 2016 has
started and has been reported in Deliverable D15.3.
In T17.3 Beta Prototype of KDD & Simulation Platform testing and validation [M36-48], we continued
with a number of additional development & testing iterations (similar to agile sprints), which were driven
by the requirements of the NND, JIA and Obesity use-cases of T16.1 and their end-users (clinicians, data
analysts, researchers, etc.), following the quick production process that we have already adopted for
making the functionalities usable (and so testable) as soon as possible. In addition, we prioritized and
began implementing the suggestions/feedback provided by clinicians during the beta prototype training
session in Rome in mid-February 2016 (the feedback was reported in Deliverable D17.5 “Test on Beta
Prototype of KDD and Simulation Platform”). Bugs were corrected and the user-interface and interactive
visualizations were improved based on this feedback.
WP18 – Dissemination & Training
Responsible partner: LYN
A series of dissemination and training activities were carried out during the last 6 months of work in
WP18.
T18.1 Project Web-site
The website was updated with the latest activities performed by the project, in particular with regard to
public outreach activities.
Two new sections have been added:
1) Publications;
2) Public Deliverables (as per a specific request from the EC).
Also, the banners have been updated.
Furthermore, a new Liferay theme was developed for the private section of the website, namely the platform.
T18.2 Dissemination Materials
No new dissemination material has been produced in the past six months.
T18.3 Training
With regard to the training activity, a 3rd training and demonstration session took place during
MD-Paedigree’s Biannual meeting, held in Leuven on the 12th and 13th of September 2016. During the meeting,
the training team showcased the latest developments of the various tools implemented during the project,
namely: the data curation and validation tools (mainly based on some end-users’ cases) by Athena RC;
CaseReasoner by Siemens; and the Case-based Retrieval service by HES-SO. At the end of the presentations,
the clinicians were guided through the direct usage of some of the key features of the different systems,
allowing an interactive training session with direct questions and answers.
The partners also agreed to prepare a last training session during the final MD-Paedigree conference (to
be held in May 2017), where the clinicians themselves will demonstrate the tools to the external
audience attending the meeting.
T18.4 Seminars, Workshops, Concertation Activities with Other ICT Funded Projects, and Scenario
Analysis Sessions
MD-Paedigree successfully attended the 50th Meeting of the Association of European Paediatric
Cardiologists (AEPC 2016), held in Rome on June 1st-4th, 2016, through a joint booth organised with its
cognate project CARDIOPROOF. The various videos presenting the project results were shown at the booth, and
the annual newsletter was distributed.
T18.5 Newsletter
As part of the dissemination activities, a newsletter is currently being published, reporting on the latest
achievements and future activities. A special focus is devoted to the MD-Paedigree Final Conference (to
be held in May 2017).
T18.6 Community Liaison and Feedback
As already mentioned, the liaison with CARDIOPROOF resulted in joint participation in the AEPC 2016
conference.
T18.7 Engaging Parent and Patient Associations
Further contacts with patients’ associations were established in view of the final conference. Also,
dedicated materials for communicating effectively with these associations have been produced.
WP19 – Exploitation, HTA, and Medical Device Conformity
Responsible partner: EMP
The WP focused on refining the economic model delivered within D19.4 (within which, for estimating the
costs of the clinical pathways, the exact costs of the technologies used in each pathway as well as the
percentages of the risk patients that go through each pathway were assessed, collected, and computed, as
reported in the yearly report).
In particular, the work has focused on feeding the model with new datasets, thus allowing the illustration
of how the transformation of bio-computational modelling and VPH technologies into a future patient
flow will supplement and improve the management of the specific diseases.
As a result, the upcoming D19.6 will provide ample space for thorough impact assessment calculations
and quantification (including data from GOSH). The deliverable will represent a major step forward
towards an MD-Paedigree health economic model, even if only for the case of cardiomyopathy. To
complete the deliverable, the model was trained with initial data, most of it based on real costs and solid
estimates. The quantitative results and most of the assumptions must nevertheless be treated with great
caution, since the assumptions served primarily to drive the model development rather than to generate
sound quantitative results. First indications of how simulation technologies can shift hospital costs and
increase the cost-effectiveness of available diagnostic tools are now available.
With regard to exploitation, several partners have been involved in preparing and supporting a new
proposal submitted within the SME Instrument framework. All partners have been asked to support the
initiative by signing specific letters of intent, declaring their availability to share their tools and
expertise, in an attempt to realise a commercial product to be tested with the primary end users, the
hospitals. The clinical centres were also involved in the initiative, in the role of key customers.
The proposal outlined a specific business model, mainly revolving around the MD-Paedigree platform,
and exploring the possible ways of data valorisation and exploitation. Also, the proposal outlined a
number of services to be offered on the market to different stakeholders (research centres, biomedical
and pharma industries, hospitals, etc.).
Financial, administrative and consortium management relevant information
Financial and administrative information
No financial issues were raised in this period.
Consortium Management
No particular activities have been deemed necessary for the management of the Consortium.
MD-Paedigree’s Meetings
The following table reports the Project’s cooperation activities performed in the last six months.
Physical meetings
Meeting | Location and date
4th Biannual Meeting - Training Workshop | Leuven (Belgium), September 12-13
Dissemination activities
Conferences, Workshops attended/organised/foreseen
A list of the external meetings (conferences, workshops, etc.) with date and place held during the reporting
period or foreseen for the next reporting period is given in the table below with a brief description of type,
scope and number of persons attending events.
Title of the event | Date & Place | Attendees from the Project
50th meeting of the Association of European Paediatric Cardiologists (AEPC 2016) | Rome (Italy), June 1-4, 2016 | LYNKEUS, DHZB, OPBG
8th Research Data Alliance (RDA) Plenary Meeting | Denver (USA) September 11-17 | LYNKEUS, ATHENA
VPH Workshop On Clinical Data Management and Sustainability | Amsterdam, 28th September 2016 | Edwin Morley-Fletcher,
Conclusion
MD-Paedigree has entered Phase 4 of its multi-layered progress. Although various administrative and
organisational issues occasionally occurred, all relevant milestones have been reached, and all the partners
are steadily working on the completion of the validation tasks for all the models developed. Accordingly,
the forthcoming milestone (M51) includes the Final Data Collection and Prototypes, Clinical Validation, and
Deployment.
Three work packages completed their activity at month 36 of the project (February 2016) and their final
deliverables have been delivered to the PO.
To guarantee the future sustainability of the developed platform, a specific effort was made to conclude
a Cooperation Agreement with the EU-funded project CARDIOPROOF, also to ensure the accessibility of the
(duly curated and anonymised) datasets collected within the project for future research initiatives, as
requested by the EC.
The consortium was also involved in exploring a joint exploitation path, through the subsequent SME
Instrument project proposals, focusing on the development of a business model revolving around the
platform developed within the project, in an attempt to valorise the datasets, the data management,
anonymisation, curation, and analytics tools, as well as the models and model validation techniques
implemented during the project.
This was made possible in particular by the most recent progress in the development of the
MD-Paedigree Infostructure and of the associated data management tools, and by the specific effort devoted
to the integration of such tools in a single technological framework, associated with the datasets.
This also allowed the organisation of the third demonstration and training session, held in Leuven during
the fourth half-yearly meeting, in September 2016.
Besides the effort on validation, which will surely represent the most significant part of the work to be
performed until the end of the project, the consortium is also working on the refinement of the economic
model associated with the introduction, in the hospitals, of the advanced CDSS developed during the
project, in particular trying to understand the new workflows and the associated potential cost reductions.
Finally, the consortium is working on the organisation of a public event at the end of the project, to be held
in May 2017, to present the outcomes of the project to a wider community of stakeholders, to demonstrate
the features of the developed tools, and to launch new research initiatives building on those results.