Page 1:

Data challenge in health care and life science

Bo Andersson, AstraZeneca R&D Lund

Semantic Web for Health Care and Life Sciences Interest Group

20 October 2008, F2F Meeting, Mandelieu, France

Page 2:

Outline

• Data challenge
  - Drug development process
  - Complex requirements for new health care paradigm
  - Research scientists' needs
• Activities in AZ with SW components
  - Clinical data repository
  - Clinical study information
  - Large Knowledge Collider (LarKC)
• Summary
  - Some thoughts for the future

Page 3:

AstraZeneca R&D is a knowledge organization in which teams create, use, search, combine, interpret, and manage information to develop drugs and services.

[Figure labels: TI; HI; LO; Early clinical development; DfL; Reg; LCM; PoC]

Project information well managed => NDA + more projects (with less risk)

Knowledge Gap

Page 4:

The BIG 3 concept

[Figure labels: Susceptible individuals; Smokers/Noxious gases; COPD; Lung cancer; CV; Mechanisms?; Systemic disorder; Mortality; Hospitalisation; Screening tool; Early diagnosis; Treatment for smokers and ex-smokers; Influence guidelines]

Maria Gerhardsson de Verdier, MD, PhD, AstraZeneca R&D Lund

Page 5:

Improve the capability to integrate and interpret heterogeneous data

• Build information management capability to support drug development:
  - Biological and environmental risk factors for developing a disease and prognosis for patients
  - Hypotheses for causal chains of diseases (early diagnosis)
  - Hypotheses about patient characteristics and other factors that can explain segmentation criteria

[Figure labels: Risk; Knowledge]

Share many needs with Health Care

Page 6:

Project knowledge repository

• Build knowledge management capability to support the early clinical project team:
  - Disease and patient segmentation
  - Risk factors for drug class and biological target
  - How others do it
  - Patient availability
  - Animal to human models
  - Known problems/failures

[Figure labels: Knowledge; Risk]

Page 7:

Identifying biomarkers and target mechanisms

• Data interpretation is a non-trivial process that requires overcoming:
  - Syntax differences in the generated format
  - Semantic differences in the format, e.g. the identifiers used
  - Verifying, validating and comparing experimental results with other established data sets
  - Vast heterogeneity of the interpreted information
  - Efficient secondary usage of past experimental results and analysis conducted in later phases

[Figure labels: Knowledge; Risk]
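The first two obstacles on this slide (syntax and identifier differences) can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the scheme names, the identifiers and the cross-reference table are hypothetical placeholders, not real database entries, and a real project would load such mappings from a curated source.

```python
from typing import Dict, Optional, Tuple

# Hypothetical cross-reference table: (source scheme, local id) -> canonical id.
# All identifiers below are made-up placeholders for illustration only.
XREF: Dict[Tuple[str, str], str] = {
    ("uniprot", "PX0001"): "target:0001",
    ("entrez-gene", "900001"): "target:0001",
    ("uniprot", "PX0002"): "target:0002",
}


def normalize(scheme: str, local_id: str) -> Tuple[str, str]:
    """Remove purely syntactic variation (case, whitespace, separators)."""
    return scheme.strip().lower().replace("_", "-"), local_id.strip().upper()


def canonical_id(scheme: str, local_id: str) -> Optional[str]:
    """Resolve a source-specific identifier to a shared one, if it is known."""
    return XREF.get(normalize(scheme, local_id))


def same_entity(a: Tuple[str, str], b: Tuple[str, str]) -> bool:
    """Two records refer to the same entity only if both resolve to the same id."""
    ca, cb = canonical_id(*a), canonical_id(*b)
    return ca is not None and ca == cb


if __name__ == "__main__":
    print(same_entity(("UniProt ", "px0001"), ("Entrez_Gene", "900001")))
    print(same_entity(("uniprot", "PX0001"), ("uniprot", "PX0002")))
```

Unknown identifiers resolve to `None` and are never treated as equal, which keeps unmapped records visible instead of merging them silently.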

Page 8:

Signal evaluation of adverse drug event reports

• During signal evaluation the safety expert will evaluate if there is a causal relationship between the drug and the adverse event (method RUCAM):
  - Time to onset of the reaction
  - Course of the reaction
  - Risk factors for drug reaction
  - Concomitant drug(s)
  - Non-drug related causes of event
  - Previous information on the drug
  - Response to readministration

[Figure labels: Knowledge; Risk]

Page 9:

Outline

• Data challenge
  - Drug development process
  - Complex request for new health care paradigm
  - Scientists' needs
• Activities in AZ with SW components
  - Clinical data repository
  - Clinical study information
  - Large Knowledge Collider (LarKC)
• Summary
  - Some ideas for the future

Page 10:

Consolidated clinical data repository

• The CRL and CCDS are designed based on the assumption that diversity in clinical data is part of “doing research”.
  - Driver: Business value achieved by effective use of clinical data across studies and over time
  - So, in CRL we will be able to specify the variances in what we observe on subjects in clinical studies, and the information about these observations.
  - CCDS will connect these specifications to the actual data, and thereby enable us to take informed decisions when we want to utilize data across variances.
• Enforcement of standards to reduce diversity is a line organization decision.
  - Driver: Operational efficiency by rationalization of processes and tools for new studies.
  - So, CRL will make this task easier by making the preferred (standardized) variant of the specification available as the first option when we set up new studies and acquire new information.

[Figure labels: Existing Studies; New Studies]

Page 11:

Clinical Observation Concepts

To store the clinical observation within the CRL data model we need to define some terminology.

What are we trying to measure?
Systolic blood pressure (carrier of topic)

Could it be measured in a different way and would that affect the result? YES
• Patient position (qualifier)
• Method/Tool/Equipment (qualifier)
• Location/Site - where you measure it (qualifier)

For the clinical trial is there anything I need to know? YES
• When was it measured, date (context)

[Figure label: Concepts]
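The topic / qualifier / context terminology above could be modelled roughly as follows. This is a hypothetical Python sketch, not the CRL data model: the class name, the `value` and `unit` fields, and the comparability rule are assumptions added for illustration.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict


@dataclass
class ClinicalObservation:
    topic: str                      # what is measured, e.g. systolic blood pressure
    value: float                    # assumed field: the measured result
    unit: str                       # assumed field: unit of the result
    qualifiers: Dict[str, str] = field(default_factory=dict)  # affect the result
    context: Dict[str, object] = field(default_factory=dict)  # needed by the trial


def directly_comparable(a: ClinicalObservation, b: ClinicalObservation) -> bool:
    """Assumed rule: same topic, unit and qualifiers; context may differ."""
    return (a.topic, a.unit, a.qualifiers) == (b.topic, b.unit, b.qualifiers)


sitting = ClinicalObservation(
    topic="Systolic blood pressure", value=128.0, unit="mmHg",
    qualifiers={"position": "sitting", "method": "cuff", "site": "left arm"},
    context={"measured_on": date(2008, 10, 20)},
)
supine = ClinicalObservation(
    topic="Systolic blood pressure", value=121.0, unit="mmHg",
    qualifiers={"position": "supine", "method": "cuff", "site": "left arm"},
    context={"measured_on": date(2008, 10, 20)},
)
repeat = ClinicalObservation(
    topic="Systolic blood pressure", value=126.0, unit="mmHg",
    qualifiers={"position": "sitting", "method": "cuff", "site": "left arm"},
    context={"measured_on": date(2008, 10, 21)},
)
```

Keeping qualifiers separate from context makes the distinction on the slide explicit: a different patient position changes what was measured, while a different date only changes when it was measured.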

Page 12:

The core part of JANUS has been normalized and implemented in CCDS

• Clinical Observations – “what happened”
  - Findings, Test types, Domains
  - Events
  - Interventions
• Protocol – “what was supposed to happen”
  - Trial structure (arms, visits)
  - Planned assessments
    - Like actual findings, but no result
  - Planned interventions
    - Like actual interventions
• Analysis plans and results
  - Analysis datasets (query rule)
  - Analytic plans
  - Analytic results

Page 13:

Clinical study information

[Figure labels: Opportunity; Desirable situation; cause; effect; consequence. Statements shown in the figure:]
• Information can be managed for better and easier use
• A collaborative environment where scientists can explore existing information!
• Knowledge will provide better decision OPTIONS!

Page 14:

Clinical study information: Conceptual model

[Figure labels:]
• Sources: Doc Mgmt, CTMS, Study DB, Trial Trove, …, Articles, …
• Information Extraction
• Study Knowledge Base: Confirmed/Trusted Data; Not confirmed Data
• Information Service (Navigation, Feedback & Retrieval) (API)
• Scientist, Project teams

Page 15:

LarKC in a Nutshell

• “Web Scale and Style Reasoning”
• Giving up 100% correctness:
  - trading quality for size
  - often completeness is not needed
  - sometimes even soundness is not needed

[Figure: plot with axes precision (soundness) and recall (completeness); labels: logic, IR, Semantic Web]
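The two axes of the figure can be made concrete with a short Python sketch. It assumes a reference set of all correct answers is available for evaluation; the function name and the toy answer sets are hypothetical and are not taken from LarKC.

```python
from typing import Set, Tuple


def precision_recall(returned: Set[str], correct: Set[str]) -> Tuple[float, float]:
    """Precision measures soundness, recall measures completeness."""
    hits = len(returned & correct)
    precision = hits / len(returned) if returned else 1.0
    recall = hits / len(correct) if correct else 1.0
    return precision, recall


# Toy reference set: every answer a sound and complete reasoner would return.
correct = {"a", "b", "c", "d"}

# A sound but incomplete reasoner: nothing wrong, but half is missing.
sound_incomplete = {"a", "b"}

# A complete but unsound reasoner: everything found, plus one wrong answer.
complete_unsound = {"a", "b", "c", "d", "x"}

if __name__ == "__main__":
    print(precision_recall(sound_incomplete, correct))
    print(precision_recall(complete_unsound, correct))
```

Classical logic-based reasoning targets the corner where both values are 1.0; the slide argues that for web-scale data it can be acceptable to move away from that corner.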

Page 16:

Main Innovations

• Enriching current logic-based Semantic Web reasoning
• Employing cognitively inspired approaches and techniques
• Achieve scalability through giving up completeness
• Achieve scalability through parallelization
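As a toy illustration of "scalability through giving up completeness", the Python sketch below computes a transitive closure but stops after a fixed number of rounds. This is not LarKC's algorithm; the function name, the round budget and the example relation are assumptions chosen only to show that a bounded reasoner stays sound while missing some conclusions.

```python
from typing import Set, Tuple

Fact = Tuple[int, int]


def bounded_closure(edges: Set[Fact], max_rounds: int) -> Set[Fact]:
    """Derive (a, c) from (a, b) and (b, c), but give up after max_rounds."""
    facts = set(edges)
    for _ in range(max_rounds):
        derived = {(a, d) for (a, b) in facts for (c, d) in facts if b == c}
        new = derived - facts
        if not new:          # nothing left to derive: the closure is complete
            break
        facts |= new
    return facts


# Hypothetical chain relation 1 -> 2 -> 3 -> 4 -> 5.
chain = {(1, 2), (2, 3), (3, 4), (4, 5)}

partial = bounded_closure(chain, max_rounds=1)   # cheap, incomplete
full = bounded_closure(chain, max_rounds=10)     # runs until nothing is new

if __name__ == "__main__":
    print(len(partial), len(full))
    print(sorted(full - partial))   # conclusions the bounded run missed
```

Every fact in `partial` is also in `full`, so the bounded run never returns a wrong answer; it only returns fewer of them.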

Page 17:

LinkedLifeData

• Platform developed in the context of LarKC
• Automates the process of:
  - Transforming structured data sources to RDF
  - Loading and reasoning on top of huge amounts of data
  - Providing a web interface to access the data
• Currently running on top of BigOWLIM
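The first automated step, turning a structured source into RDF, can be sketched in a few lines of Python using only the standard library. This is not the LinkedLifeData pipeline: the `http://example.org/` namespace, the CSV layout and the function names are placeholders, and every value is emitted as a plain string literal in N-Triples syntax.

```python
import csv
import io
from typing import Iterator
from urllib.parse import quote

BASE = "http://example.org/"  # placeholder namespace, not a real vocabulary


def escape_literal(value: str) -> str:
    """Escape the characters that N-Triples does not allow raw in a literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))


def rows_to_ntriples(csv_text: str, id_column: str) -> Iterator[str]:
    """Yield one N-Triples line per non-empty cell, keyed by the id column."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}record/{quote(row[id_column], safe='')}>"
        for column, value in row.items():
            if column == id_column or not value:
                continue
            predicate = f"<{BASE}schema/{quote(column, safe='')}>"
            yield f'{subject} {predicate} "{escape_literal(value)}" .'


# Hypothetical input table; the identifiers are made up for illustration.
SAMPLE = "id,symbol,organism\nG1,ABC1,Homo sapiens\nG2,XYZ2,\n"

if __name__ == "__main__":
    for line in rows_to_ntriples(SAMPLE, "id"):
        print(line)
```

A production transformation would map columns to an agreed schema (the dataset table in this deck lists custom RDF schemas and BioPAX) instead of deriving predicates from column names.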

Page 18:

Knowledge base for Early Clinical Drug Development

Integration and interpretation of heterogeneous genes-proteins-pathways-target-diseases-drug-patient data

[Figure labels: Genomics; Proteomics; Chemicals; Drugs; Disease; Patients; Biomedical controlled vocabularies; LinkedLifeData - Translational Medicine; LinkedLifeData - Pathway & Interaction KB (PIKB)]

Page 19:

Pathway and Interaction Knowledge Base

• Dataset loaded in LinkedLifeData
• Integrates BioPAX and the related data sources
• First evaluation try!
• Take everything with a pinch of salt!

Page 20:

13 October 2008

Database                           | Dataset         | Schema                                | Description
-----------------------------------+-----------------+---------------------------------------+---------------------------------------------------
UniProt                            | Curated entries | Original by the provider              | Protein sequences and annotations
Entrez-Gene                        | Complete        | Custom RDF schema                     | Genes and annotation
iProClass                          | Complete        | Custom RDF schema                     | Protein cross-references
Gene Ontology                      | Complete        | Schema by the provider                | Gene and gene product annotation thesaurus
BioGRID                            | Complete        | BioPAX 2.0 (custom generated)         | Protein interactions extracted from the literature
NCI - Pathway Interaction Database | Complete        | BioPAX 2.0 (original by the provider) | Human pathway interaction database
The Cancer Cell Map                | Complete        | BioPAX 2.0 (original by the provider) | Cancer pathways database
Reactome                           | Complete        | BioPAX 2.0 (original by the provider) | Human pathways and interactions
BioCarta                           | Complete        | BioPAX 2.0 (original by the provider) | Pathway database
KEGG                               | Complete        | BioPAX 1.0 (original by the provider) | Molecular Interaction
BioCyc                             | Complete        | BioPAX 1.0 (original by the provider) | Pathway database
NCBI Taxonomy                      | Complete        | Custom RDF schema                     | Organisms

Page 21:

LinkedLifeData - PIKB

• Number of statements: 1,159,857,602
• Number of explicit statements: 403,361,589
• Number of entities: 128,948,564
• Publicly available at: http://www.linkedlifedata.com
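The three counts on this slide allow a quick back-of-the-envelope calculation. The Python sketch below assumes that every statement not counted as explicit was added by inference; the slide does not say this outright, so treat the derived figures as indicative only.

```python
# Figures copied from the slide.
TOTAL_STATEMENTS = 1_159_857_602
EXPLICIT_STATEMENTS = 403_361_589
ENTITIES = 128_948_564

# Assumption: total = explicit + inferred.
inferred = TOTAL_STATEMENTS - EXPLICIT_STATEMENTS

# How much the repository grows relative to the loaded (explicit) data.
expansion_factor = TOTAL_STATEMENTS / EXPLICIT_STATEMENTS

# Average number of statements per entity.
statements_per_entity = TOTAL_STATEMENTS / ENTITIES

if __name__ == "__main__":
    print(f"inferred statements: {inferred:,}")           # 756,496,013
    print(f"expansion factor: {expansion_factor:.2f}")
    print(f"statements per entity: {statements_per_entity:.2f}")
```

Under that assumption roughly two thirds of the statements in the repository are inferred, which is why loading and reasoning at this scale is listed as a core task of the platform.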

Page 22:

Outline

• Data challenge
  - Drug development process
  - Complex request for new health care paradigm
  - Scientists' needs
• Activities in AZ with SW components
  - Clinical data repository
  - Clinical study information
  - Large Knowledge Collider (LarKC)
• Summary
  - Some ideas for the future

Page 23:

Summary

• Information integration and interpretation are huge challenges for scientists
• SW technology has shown potential
• Research scientists must be closely involved
• LarKC includes many of the components we expect to need in the future

Page 24:

Some ideas for the future

• We need better solutions to describe information so that other humans and computers can use it, e.g. ontologies, identifiers, standards etc.
• We need personalized smooth tools to search, find, integrate and interpret information.
• We need computational support for “annotation”, “reading” and writing
• We believe Semantic Web technologies will be an important part of the solution!

Page 25:

Read more about LarKC:

http://www.larkc.eu

http://www.linkedlifedata.com

Contributions from:

Maria Gerhardsson, AstraZeneca R&D Lund
Kerstin Forsberg, AstraZeneca R&D Mölndal
Vassil Momtchev, OntoText Bulgaria

