
The SAIL DataBank - HRB · Gateway – massively extended to provide a secure platform for data...

Date post: 30-Aug-2020
The SAIL DataBank A National e-Research Platform for Wales David V Ford Professor of Health Informatics Swansea University Medical School
Transcript
Page 1:

The SAIL DataBank

A National e-Research Platform for Wales

David V Ford

Professor of Health Informatics

Swansea University Medical School

Page 2:

Agenda

1. A quick overview of me and what my group does

2. A description of the SAIL Databank as it operates in Wales

3. The technologies and systems we now have available for others to use

4. Questions / discussion

Page 3:

My major affiliations

1. Director, SAIL DataBank in Wales, UK

2. Director, Administrative Data Research Centre Wales, part of the ADRNetwork

3. Deputy Director, CIPHER – part of the UK Farr Institute of Health Informatics Research

4. Current Director of the International Population Data Linkage Network

Page 4:

New Data Science Building at Swansea

• Funded by MRC (Farr), ESRC (ADRN) and Welsh Government

• High security home to Farr Institute, ADRC Wales, SAIL Databank and many other data-intensive projects

• Office space for NHS and other public sector staff and industry to work alongside university staff

Page 5:

Research infrastructures at Swansea

• Secure Anonymised Information Linkage (SAIL) system

• The MRC-led multi-funder Farr Institute Centre for the Improvement of Population Health through E-records Research (Farr Institute CIPHER)

• The MRC Cloud Infrastructure for Microbial Bioinformatics (CLIMB) Centre

• The Analysis Platform for the MRC UK Dementias Platform (DPUK)

• The ESRC Administrative Data Research Centre in Wales (ADRC-W)

• National Centre for Population Health and Wellbeing Research (NCPHWR)

• NHS Prudent Healthcare Intelligence Unit

Page 6:

What is the SAIL Databank?

• Designed to safely provide a means of linking together all person-based data within Wales for use in research and public-benefit enquiry

• Assembling population-scaled life stories from data across time and across organisations (datasets)

• Initial focus on health, now broadened into wellbeing and beyond (+ local and central government) – social justice, housing, education, policing, employment, etc.

• Purpose: to support evaluation of natural experiments (i.e. policy and service changes); epidemiology; “e” and hybrid cohort studies; intervention studies (clinical trials); system modelling and many more

Page 7:

The data challenge

Weber GM, Mandl KD, Kohane IS. Finding the missing link for big biomedical data. JAMA. 2014;311:2479-80.

Page 8:

Built on well established principles

• Built on the international best-practice “Separation Principle”

• Use of our automated, untouched-by-human-hands Trusted Third Party (TTP) data linkage system, in a totally separate organisation

• Identifiers never given to SAIL

• Sensitive data never given to TTP

• Fully automated if required

• Data never leaves the Databank (instead, access to it is granted)

• All projects approved by independent IG panel (in 30% public)

• Only the minimised data that is needed is provided to projects

• Compulsory training for all data users. Strict legal agreements.

Page 9:

SAIL Split File Principle

Supplier Data

ID  Name         Address         BP      Diag
56  Fred Bloggs  The Big house   120/80  G33..
78  Jim Jones    87 Peterson Rd  135/45  P123.
45  Harry Lucas  19 Meirwen      125/75  G77..

Demographics + Link Key

ID  Name         Address
56  Fred Bloggs  The Big house
78  Jim Jones    87 Peterson Rd
45  Harry Lucas  19 Meirwen

Clinical (s) + Link Key

ID  BP      Diag
56  120/80  G33..
78  135/45  P123.
45  125/75  G77..

Linkage

ID  ALF       Conf
56  65276573  88
78  32377722  97
45  27638236  95

Load into SAIL

ALF_E  BP      Diag
4252   120/80  G33..
7482   135/45  P123.
8436   125/75  G77..

Other diagram labels: File 1, File 2, File 3, Add this field
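
The flow on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not SAIL's actual implementation: the function names, the demo keys and the use of a keyed hash (HMAC-SHA256) to stand in for both the TTP's ALF assignment and the ALF_E encryption are all hypothetical. The slide's linkage table also carries a match confidence (Conf), which this sketch does not model.

```python
import hashlib
import hmac

# Rows copied from the slide's "Supplier Data" table.
SUPPLIER_ROWS = [
    {"ID": 56, "Name": "Fred Bloggs", "Address": "The Big house", "BP": "120/80", "Diag": "G33.."},
    {"ID": 78, "Name": "Jim Jones", "Address": "87 Peterson Rd", "BP": "135/45", "Diag": "P123."},
    {"ID": 45, "Name": "Harry Lucas", "Address": "19 Meirwen", "BP": "125/75", "Diag": "G77.."},
]


def split_supplier_file(rows):
    """Split each record into a demographics part and a clinical part, both keyed by ID."""
    demographics = [{"ID": r["ID"], "Name": r["Name"], "Address": r["Address"]} for r in rows]
    clinical = [{"ID": r["ID"], "BP": r["BP"], "Diag": r["Diag"]} for r in rows]
    return demographics, clinical


def ttp_assign_alf(demographics, ttp_key):
    """Hypothetical TTP step: sees identifiers only and returns an ID -> ALF map."""
    linkage = {}
    for row in demographics:
        identifiers = f"{row['Name']}|{row['Address']}".lower().encode("utf-8")
        # Assumption: a keyed hash stands in for the real matching process.
        linkage[row["ID"]] = hmac.new(ttp_key, identifiers, hashlib.sha256).hexdigest()[:8]
    return linkage


def load_clinical(clinical, linkage, databank_key):
    """Hypothetical load step: swap the supplier ID for a second-stage key (ALF_E)."""
    loaded = []
    for row in clinical:
        alf = linkage[row["ID"]]
        # Assumption: a keyed hash stands in for the real ALF -> ALF_E encryption.
        alf_e = hmac.new(databank_key, alf.encode("utf-8"), hashlib.sha256).hexdigest()[:8]
        loaded.append({"ALF_E": alf_e, "BP": row["BP"], "Diag": row["Diag"]})
    return loaded


demographics, clinical = split_supplier_file(SUPPLIER_ROWS)
linkage = ttp_assign_alf(demographics, ttp_key=b"ttp-demo-key")
loaded = load_clinical(clinical, linkage, databank_key=b"databank-demo-key")
```

The point of the split is visible in the function signatures: `ttp_assign_alf` never receives BP or Diag, and `load_clinical` never receives Name or Address.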

Page 10:

SAIL Split File Principle

Additional project-level encryption of ALF_E to give PALF_E
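
One way to picture project-level encryption is a per-project key applied to ALF_E. The sketch below is an assumption for illustration only: the function name `project_alf`, the demo keys and the choice of HMAC-SHA256 (a keyed hash rather than reversible encryption) are hypothetical and not taken from the slides. A plausible reading of the slide is that the same person keeps one identifier within a project but gets a different one in every other project.

```python
import hashlib
import hmac


def project_alf(alf_e: str, project_key: bytes) -> str:
    """Derive a project-specific identifier (PALF_E) from ALF_E and a per-project key."""
    digest = hmac.new(project_key, alf_e.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


# "4252" is an ALF_E value shown on the previous slide; the keys are made up.
project_a = project_alf("4252", b"project-a-demo-key")
project_a_again = project_alf("4252", b"project-a-demo-key")
project_b = project_alf("4252", b"project-b-demo-key")
```

Here `project_a` and `project_a_again` are equal, so records still link within a project, while `project_b` differs, so extracts from two projects cannot be joined on the identifier alone.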

Page 11:

It’s all about data linkage

Page 12:

• SAIL = Secure Anonymised Information Linkage

• >12 billion records of the people of Wales, >5 million people

• 500+ feeder systems from Wales, inc >350 GP practices (>80%)

• Much data goes back 20-25 years

• All pre-linked then de-identified

• £5m+ investment in high performance IT

• Strong privacy protection & IG

• Currently supporting externally funded projects with value >£90m

• Over 300 registered users and 140+ active SAIL projects.

• >100 staff in Swansea working on Health Informatics-related projects

• Average 35 day turnaround from application to data

• Applications open to all

Page 13:

Page 14:

• Built to consume (new data) swiftly

• Secure sharing for projects, based on project specific data views.

• Total population coverage

• Used for:

  observational research (case control; e-cohort studies; etc.)

  trial feasibility

  outcome data for trials

  extending traditional cohorts

  post marketing surveillance

  new technology evaluation

  evaluation of natural experiments (i.e. service and policy change)

• Lots of trial and cohort study participants embedded in SAIL
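
A project-specific data view can be thought of as a filter that exposes only the approved columns to a project. The sketch below is a minimal illustration, assuming rows are held as dictionaries; the function name `project_view` and the approved column list are hypothetical and say nothing about how the SAIL Gateway implements views.

```python
from typing import Dict, Iterable, List

Row = Dict[str, object]


def project_view(rows: Iterable[Row], approved_columns: List[str]) -> List[Row]:
    """Return copies of the rows restricted to the columns approved for a project."""
    return [{col: row[col] for col in approved_columns} for row in rows]


# Values taken from the "Load into SAIL" table earlier in the deck.
databank_rows = [
    {"ALF_E": "4252", "BP": "120/80", "Diag": "G33.."},
    {"ALF_E": "7482", "BP": "135/45", "Diag": "P123."},
    {"ALF_E": "8436", "BP": "125/75", "Diag": "G77.."},
]

# Hypothetical approval: this project may see diagnoses but not blood pressure.
view = project_view(databank_rows, approved_columns=["ALF_E", "Diag"])
```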

Page 15:

Data resources

Core holdings:

• Annual District Birth Extract (ADBE)

• Annual District Death Extract (ADDE)

• Bowel Screening Wales (BSW)

• Breast Test Wales (BTW)

• Cervical Screening Wales (CSW)

• Congenital Anomaly Register and Information Service (CARIS)

• Emergency Department Data Set (EDDS)

• National Community Child Health Database (NCCHD)

• Outpatient Dataset (OPD)

• Patient Episode Database for Wales (PEDW)

• Primary Care GP dataset

• Welsh Cancer Intelligence and Surveillance Unit (WCISU)

• Welsh Demographic Service (WDS)

• Many more!

Project-specific holdings:

• Clinical trials participants

• Conventional cohort participants

• Cross sectional survey participants

• Many, many others!!!

Reference data:

• Data quality reports

• Extract histories

• Coding and mapping information

• Metadata

• Organisation codes

• Lots more!

Page 16:

Sharing what we have learned . . . .

Now available under “research collaborations”

1. National Research Data Appliances (NRDAs): New concentrator technology for NHS and other data owning organisations, including automated matching, anonymisation, data management, metadata capture, data quality assessment, etc.

2. UK Secure E-Research Platform (UKSeRP) – based on the SAIL Gateway – massively extended to provide a secure platform for data sharing across the UK – not just SAIL data (for Farr and ADRN)

3. New focus on the capture and analysis of electronic free text data (on-board NLP in the Appliances)

4. Initially funded by MRC Farr Institute and ESRC ADRN grants

Page 17:

National Research Data Appliances

“Everything we know, on a box”

David V Ford

Professor of Health Informatics

Swansea University Medical School

Page 18:

National Research Data Appliance (NRDA)

• Brings many of SAIL's capabilities onto combined hardware and software

• Shrink wrapped, ready to go.

• Easy to use, low expertise barrier

• Multiple Appliances work together as a larger whole

• Purposes: concentrate data, make it research ready

• Provide utility to data owners and partners

• Initial development funded by MRC

• Potentially provided free to our collaborators

Page 19:

Key features of relevance

• Appliances federate with each other to create a sharing network

• Network can be hierarchical or peer-to-peer

• IG Controls available at every point

• Metadata builds automatically and publishes to a global catalogue

• Data quality measurement automated

• NLP addresses rich free-text datasets, converting them to SNOMED CT

• High quality identity reconciliation is automatic; de-identification is optional

• UKSeRP provides a scalable, performant analytics platform, with full IG controls
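
As an illustration of what an automated data quality measurement might compute, the sketch below reports per-column completeness for an uploaded table. It is an assumption for illustration only: the slides do not say which measures the Appliances use, and the function name `completeness`, the definition of "missing" and the sample rows (including the made-up fourth row) are hypothetical.

```python
from typing import Dict, List, Optional


def completeness(rows: List[Dict[str, Optional[str]]]) -> Dict[str, float]:
    """Fraction of non-missing values per column (missing means None or an empty string)."""
    columns = sorted({col for row in rows for col in row})
    total = len(rows)
    report = {}
    for col in columns:
        filled = sum(1 for row in rows if row.get(col) not in (None, ""))
        report[col] = filled / total if total else 0.0
    return report


# Sample rows loosely based on the split-file slide, with gaps added for illustration.
rows = [
    {"ID": "56", "BP": "120/80", "Diag": "G33.."},
    {"ID": "78", "BP": "", "Diag": "P123."},
    {"ID": "45", "BP": "125/75", "Diag": None},
    {"ID": "12", "BP": "118/76", "Diag": "G77.."},
]
report = completeness(rows)
```

With these rows, `report` is `{"BP": 0.75, "Diag": 0.75, "ID": 1.0}`. A real quality report would likely add further checks (value ranges, code validity, duplicates), which are outside this sketch.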

Page 20:

RDA Use-case

[Diagram: four appliance panels (Large scale data sharing platform (SeRP), Organisation A, Organisation B, Organisation C), each listing the functions Upload, Link, Anonymise, Measure, Catalogue, Manage, Share, Analyse; plus a TTP box and three "Share" connectors.]

Page 21:

Easy, non-technical use

Page 22:

Page 23:
Page 24:

Data “schema” automatically computed based on data contained in uploaded file
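
This slide's caption says the data schema is computed automatically from the uploaded file. A minimal sketch of that idea is shown below, assuming a CSV upload with a header row. The function names, the candidate types, the ISO date format and the sample file (including the made-up "Recorded" column) are all hypothetical; the slides do not describe the Appliance's actual inference rules.

```python
import csv
import io
from datetime import datetime


def parse_iso_date(value):
    """Parse YYYY-MM-DD, raising ValueError for anything else."""
    return datetime.strptime(value, "%Y-%m-%d")


# Candidate types are tried in order; the first parser that accepts every value wins.
CANDIDATES = (("integer", int), ("float", float), ("date", parse_iso_date))


def infer_type(values):
    """Guess a column type from its non-empty values."""
    non_empty = [v for v in values if v]
    if not non_empty:
        return "empty"
    for name, parser in CANDIDATES:
        try:
            for value in non_empty:
                parser(value)
            return name
        except ValueError:
            continue
    return "text"


def infer_schema(csv_text):
    """Return a {column name: inferred type} mapping for a CSV document."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    return {field: infer_type([row[field] for row in rows]) for field in reader.fieldnames}


sample = (
    "ID,BP,Diag,Recorded\n"
    "56,120/80,G33..,2014-03-01\n"
    "78,135/45,P123.,2014-03-02\n"
)
schema = infer_schema(sample)
```

For this sample, `schema` is `{"ID": "integer", "BP": "text", "Diag": "text", "Recorded": "date"}`.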

Page 25:

Publish Dataset – depends on configuration/capabilities. Data will now be available

Page 26:

Data Catalogue – Key Component

Additional points following previous sessions: all DA carry a DC; DS can inherit from other DS DC entries; DC related to Programme/Security domain; DCs replicate to Regional/Global DC. Road map: DC used to define and create DS

Page 27:

A Dataset

Specific version & Date

All section attach files

Contact

Request

VIMO

Theme / Type / Level

Tags

Page 28:

A Dataset (cont.)

DDI, SPSS, SAS, STATA

Page 29:

Data Catalogue – a specific table

Page 30:

Secure e-Research Platform (SeRP)

“Combine and share your data and stay in complete control”

David V Ford

Professor of Health Informatics

Swansea University Medical School

Page 31:

Secure e-Research Platform

• Modelled on the SAIL Gateway, now available as tenancies for other organisations

• Allows users to view, manipulate and analyse data using powerful and familiar tools

• No data need leave the Gateway – output vetting process on the box

• Data owners (NHS or academic) remain in total charge. They operate sharing according to their own IG

• Full suite of IG facilities available to tenants

• All servers based at Swansea in ISO27001 and HSCIC-approved systems

• Multiple projects using different data configurations with multiple users all possible

• Full audit trails available on every user and every action

• A SeRP tenancy can connect automatically to any number of NRDAs if required, or can be used alone (UKSeRP is powered by its own RDA)

Page 32:

UK Se-RP

Page 33:

UKSeRP Example

Page 34:

Dementias Platform UK

Page 35:

DPUK Cohort Matrix

Page 36:

DPUK - expand

WP2: Shared Space Concept

[Diagram: cohorts C1 to C9 around a temporary shared space for analyses, with Analysts; cohort C6 is shown again alongside the labels Data, Imaging, Omics, Sensors.]

Page 37:

DPUK Operational model

Page 38:

FARR (Wales) enabling DPUK – Imaging WP

Cohorts

……

Oxford UCL Cambs Edinburgh Imperial Newcastle

Central Hub

Data catalogue

Imaging an important modality. All UKSeRP now image enabled

Page 39:

Image storage, HPC Cluster, Transmart, EMIF

DPUK - UK SeRP

[Architecture diagram, labels only: Incoming Dataset; Data Appliance; UKDP Specific Infrastructure; XNAT (×2); XNAT DICOM; XNATfs; TransMart (×2); Symantec; Harmonisation *; Data Model Transformation *; Load balancer (×2); PostgreSQL (×2); PG Cluster; Storage Server; Storage 360TB (×4); Storage Backup; Job Scheduler; Compute Node (×8); Backup Shared Infrastructure; OpenStack, 10 to 15 servers, Intel 40 core, 96GB+ each.]

Page 40:

Questions?

