
Date post: 20-Sep-2020
Big Data Training for Translational Omics Research BigTaP: Week II Wanqing Liu, PhD Assistant Professor Department of Medicinal Chemistry and Molecular Pharmacology Purdue University Min Zhang, MD, PhD Professor Department of Statistics Purdue University

Big Data Training for Translational Omics Research

BigTaP: Week II

Wanqing Liu, PhD
Assistant Professor
Department of Medicinal Chemistry and Molecular Pharmacology
Purdue University

Min Zhang, MD, PhD
Professor
Department of Statistics
Purdue University


Learning Objectives
• Main topics:
  – Biomarker discovery and development using omics data
  – GWAS for binary and quantitative traits
• By taking this course, you will be able to:
  – Reinforce what you learned in Week I
  – Understand the principles of biomarker discovery and GWAS
    • Study design
    • Process
    • Potential challenges
  – Understand the basic operational steps of translational omics data analysis
    • Data downloading
    • Data processing
    • Data analysis
    • Data visualization
    • Validation
    • Result interpretation
  – Understand the literature in relevant areas
  – Discuss with biostatisticians and bioinformaticians


Teaching Logistics
• http://www.stat.purdue.edu/bigtap/schedule.html
• Lecture → operation → lecture → operation…
• Online resources: databases and tools
• Case studies:
  – http://www.ncbi.nlm.nih.gov/pubmed/15193263
  – http://www.ncbi.nlm.nih.gov/pubmed/27010727
  – You’ll be tested!
• Homework:
  – Transcriptome biomarker: case-control/continuous
  – GWAS: case-control/continuous
• Grouping


Session 1: Principles of Biomarker Discovery and Development in Translational Medicine

Liu, Day 1, Session I, 8-10 am


Philosophy of Translational Research

• As a biomedical researcher, how can I produce something that benefits patients?

• I am working on cell lines and mice. How can the omics approach help me understand the mechanism, especially causality?

• Can the key molecule(s) I identified in cells and animals be used in humans?

Lab researchers, grant writers, physicians…


Key Words
• Biomarker: A characteristic that is objectively measured and evaluated as an indicator of normal biological processes, pathogenic processes, or pharmacologic responses to a therapeutic intervention.
  NIH Biomarkers Definition Working Group
• Translational: Translational research aims to aid in the transformation of biological knowledge into solutions that can be applied in a clinical setting.

Atkinson, et al., Clin Pharm Ther, 2001. Azuaje F. Bioinformatics and Biomarker Discovery, 2010.


Why Biomarker?


A Core Question in Modern Medicine: How to Address Patient Heterogeneity?


Patient Heterogeneity


Biomarker → Personalized Medicine

Patient population              Drug        Response rate (RR)
CML patients                    Gleevec     90%
All breast cancer patients      Herceptin   10–15%
HER2+ breast cancer patients    Herceptin   35–45%
All NSCLC patients              Iressa      10–15%
EGFR MT+ NSCLC patients         Iressa      60–70%

Slamon et al. NEJM 2001; Kantarjian et al. NEJM 2002; Vogel et al. JCO 2002. 20:3; Douillard et al. JCO 2010.

Biomarkers are especially important in diseases with low response rates in the overall population


Biomarker → Personalized Medicine

[Figure: discovery, drug development, and implementation of precision molecules. Cancer panel: EGFR, KRAS, ALK, HER2, BRAF, with drugs gefitinib, ARS-853 (?), crizotinib, Herceptin, and vemurafenib. Other common diseases panel: placeholder genes Gene A to Gene E.]


Precision Medicine
To deliver the right treatment to the right patient at the right dose and at the right time


Clinical Application of Biomarker

• Deal with patient heterogeneity
  – Early risk assessment
  – Disease prevention
  – Assist diagnosis
  – Optimize treatment: high effectiveness, low risk
  – Match the patient to a therapeutic strategy
  – Monitor therapy success/disease recurrence
  – Long-term management

[Figure labels: Risk, Diagnosis, Treatment, Monitoring]


Biomarker in Preclinical Studies
• To characterize the phenotype
• To monitor the response
• To identify potential translational biomarkers for humans


Omics Approach in Basic Research
• Explore molecular mechanisms
• Hypothesis generation
• Identify therapeutic targets and strategies
• Establish intermediate phenotypes


Types of Biomarkers
• Prognostic marker (a): before treatment
• Predictive marker (b): before treatment
• Pharmacodynamic marker (c): after treatment
• Surrogate marker (d): during treatment

Gosho, et al. Sensors 2012, 12, 8966-8986


Prognostic Marker
• Signature separates a population with respect to the outcome (risk)
• Regardless of the types of therapies or treatments
  – Markers associated with overall survival regardless of treatment
• Distinguishes outcome (poor or good) following the test and standard treatments
• Cannot guide the choice of a particular treatment
• Can determine the aggressiveness of treatment

Ballman KL, JCO. 2015.63.3651


Predictive Biomarker

Ballman KL, JCO. 2015.63.3651

• Predicts the differential outcome of a particular therapy or treatment
• Prospectively identifies patients who are likely to have a favorable clinical outcome from a specific treatment; therefore, a predictive biomarker can guide the choice of treatment


Prognostic and Predictive Markers

Ballman KL, JCO. 2015.63.3651

• A biomarker can be both prognostic (of disease susceptibility or progression) and predictive (of certain treatment outcomes)
  – ER status and breast cancer: prognostic
  – ER status and antiestrogen therapy: predictive
• Be careful with the word “prediction”


Pharmacodynamic Markers
• PD biomarkers provide information about the pharmacologic effects of a drug on its target
• Measured after treatment
• A clinical endpoint to be measured
• Application:
  – Proof of mechanism: does the drug hit its intended target?
  – Proof of concept: does hitting the drug target alter the biology of the tumor?
  – Selection of optimal biologic dosing
  – Understanding response/resistance mechanisms
• Examples:
  – Protein phosphorylation markers, e.g. p-EGFR and p-ERK, to evaluate changes in target protein phosphorylation or the activation status of downstream signaling/adapter molecules
  – Apoptosis (TUNEL assay) to assess the pharmacologic effect on cell death


Surrogate Biomarker
• Substitute for a clinical endpoint
• Expected to predict clinical benefit (or harm, or lack of benefit) based on epidemiologic, therapeutic, pathophysiologic, or other scientific evidence
• During or after treatment
• Examples:
  – Glucose level for monitoring diabetes treatment
  – Imaging-based measurement for anti-cancer therapy


Questions

§ What kind of biomarker is HOXB13:IL17BR in the first paper?

§ What kind of biomarker is blood concentration of R-/S-methadone?


Examples of FDA Approved Biomarkers

Gosho, et al. Sensors 2012, 12, 8966-8986


Gosho, et al. Sensors 2012, 12, 8966-8986

Examples of FDA Approved Biomarkers


Biomarker Discovery and Development in the Omics Era

[Figure: timeline with marks at 1970s, 1980s, 1990s, and >2005]


Biomarker Discovery and Development in the Omics Era

• Genomics
• Transcriptomics
• miRNomics
• lncRNomics
• Epigenomics
• Proteomics
• Metabolomics
• Lipidomics
• Exposomics


Prognostic-Diagnostic Markers
• Genes for ~50% of rare diseases identified

Nature Reviews Genetics 14, 681–691 (2013)


Prognostic-Diagnostic Markers
• 11,907 SNPs strongly associated with common diseases


Pharmacogenomic Markers
• 166 FDA-approved PGx markers for drug treatment


Transcriptomic Biomarkers
• MammaPrint test
  – Agendia
  – 70-gene signature for breast cancer prognosis
• Oncotype Dx test
  – Genomic Health
  – 21 gene-expression biomarkers for predicting recurrence in breast cancer patients and response to both chemotherapy and radiation therapy
• H/I test
  – AviaraDx
  – 2-gene signature used to estimate the risk of recurrence and the response to therapy in breast cancer patients


Biomarker Development Pipeline

[Figure: Discovery → Confirmation → Assay development → Validation/Refinement → Clinical Validation → Clinical Adoption, shown with technical development and with the stages target selection, lead identification, preclinical, retrospective, clinical trials, and marketing/clinical use. Axes: number of analytes, number of samples. Source: https://is.muni.cz]

• Discovery platforms: genomics, transcriptomics, proteomics, metabolomics, lipidomics, epigenomics, exposomics, imaging
• Integrated technologies and platforms; multi-analyte assays
• Robust validated assays; clinical-grade assays; accurate, specific, reproducible, reliable
• Clinical-grade assays; instruments


Institute of Medicine Roadmap for omics-based tumor biomarker test development

Hayes BMC Medicine 2013, 11:221


Institute of Medicine Roadmap for omics-based tumor biomarker test development

Hayes BMC Medicine 2013, 11:221


Data Acquisition Strategies
• Retrospective:
  – Clinical samples collected before the design of the biomarker study, and before comparison with control samples
  – Looks back at past, recorded data to find evidence of marker-disease relationships
  – Inexpensive, rapid
  – Potentially biased, noisy
  – Weak evidence
• Prospective:
  – The biomarker-based prediction or classification model is applied to patients at the time of patient enrolment
  – Clinical outcomes or disease occurrence are unknown at the time of enrolment
  – Less biased
  – Strong evidence
  – Expensive, time-consuming
• Pro-retrospective

FDA approval!!


Study Design Considerations
• Biomarker discovery studies require careful planning and design
• Study style: retrospective, prospective, pro-retrospective
• Sample collection
• Phenotype
• Sample size and power estimation
• Other covariates
• Data collection
• Platform
• Replication, validation and application
• Data analysis plan


Sample Collection, Assay Design, Data Analysis Plan

• Establish methods
  – Specimen collection
  – Processing
  – Storage
• Establish criteria
  – Quantity and quality
  – Minimum amount
• Feasibility
  – Obtaining specimens
• Assay design
  – Communication with core/service provider
• Data analysis
  – Communication with biostatistician and bioinformatician


Sample and Materials
• Biospecimen
  – Tissue
  – Blood
  – Oral swab
  – Hair
  – Tear
  – Urine
  – Saliva
  – Feces
  – …
• Test materials
  – DNA
  – RNA
  – Protein
  – Small molecules
  – Lipids
• Principles:
  – Non-invasive
  – Reproducible
  – Reliable
  – Specific
  – Accurate
  – Inexpensive
  – Point-of-care

[Figure label: invasiveness]


Ethical, Legal, and Regulatory Issues

• Establish communication with regulatory agencies, e.g. IRB, FDA
• Regulatory approvals
• Documents:
  – Informed consent
  – Study protocol
• Intellectual property issues
• CLIA-lab based test for clinical trials involving patient selection


Sample Size and Power Estimation
• Power setting: 0.8
• Statistical significance:
  – Discovery: multiple hypotheses (p corrected according to the number of tests)
  – Validation: usually one hypothesis (p<0.05)
• Input parameters: previous publication or pilot study
• Online tools:
  – piface.jar by Lenth (2006)
    • http://homepage.stat.uiowa.edu/~rlenth/Power/
  – Microarray power/sample size estimation
    • http://bioinformatics.mdanderson.org/MicroarraySampleSize/
  – RNA-seq data:
    • Scotty: http://bioinformatics.bc.edu/marthlab/scotty/scotty.php
    • RnaSeqSampleSize: https://cqs.mc.vanderbilt.edu/shiny/RnaSeqSampleSize/
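
As a rough illustration of how the power setting and the corrected significance level interact, the sketch below estimates the number of samples per group for a two-group comparison of means. It is not one of the tools listed above: it uses the textbook normal approximation, and the function name, the standardized effect size of 1.0, and the figure of 1000 markers are assumptions chosen only for illustration.

```python
from math import ceil
from statistics import NormalDist


def samples_per_group(effect_size, alpha=0.05, power=0.8, n_tests=1):
    """Approximate n per group for a two-sided, two-sample comparison of means.

    effect_size is the standardized mean difference (Cohen's d).
    n_tests applies a Bonferroni correction to alpha (discovery stage).
    The normal approximation slightly underestimates n for small samples.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - (alpha / n_tests) / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Validation: one hypothesis at p < 0.05.
validation_n = samples_per_group(effect_size=1.0)
# Discovery: the same effect size, corrected for a hypothetical 1000 markers.
discovery_n = samples_per_group(effect_size=1.0, n_tests=1000)
print(validation_n, discovery_n)
```

The discovery-stage estimate is larger than the validation-stage estimate because the per-test significance level is stricter.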


Key Principles: Big Data in Biomarker

[Figure: Phenotype (“digits”) × Molecular profiles (“digits”), linked through statistics, bioinformatics, and network analysis]


Always Start Your Design and Analysis From Data Evaluation!

• What kind of phenotypic and marker data do I have, should I use, or should I collect?
• Are my data normally distributed?
• What kind of models should I choose?
• What factors may confound my analyses?
• How do covariate data correlate with my phenotype?


Phenotype to Digits
• Nominal data: no order
  – Yes or no (binary): disease vs normal, response vs no response
  – Cancer type: breast, lung, colon…
• Ordinal data: some order
  – Pathologic tumor stage: I, II, III
  – Disease progression: no, mild, severe, death
• Continuous data:
  – Glucose level, LDL, drug concentration, gene expression
• Survival data: time to event
  – Death, occurrence of disease, onset of toxicity, in hr, day, wk, month, yr, etc.
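
A minimal sketch of turning phenotype records into digits, assuming a small set of made-up patient records. The field names, category labels, and numeric codes are hypothetical. The binary codes are arbitrary labels, whereas the stage codes preserve order.

```python
# Hypothetical phenotype records; field names and values are illustrative only.
patients = [
    {"status": "disease", "stage": "II", "ldl": 131.0},
    {"status": "normal", "stage": "I", "ldl": 98.5},
    {"status": "disease", "stage": "III", "ldl": 160.2},
]

BINARY = {"normal": 0, "disease": 1}        # nominal (binary): labels only
STAGE_ORDER = {"I": 1, "II": 2, "III": 3}   # ordinal: the order is meaningful


def encode(patient):
    """Return (binary status, ordinal stage, continuous LDL) as numbers."""
    return (
        BINARY[patient["status"]],
        STAGE_ORDER[patient["stage"]],
        float(patient["ldl"]),
    )


encoded = [encode(p) for p in patients]
print(encoded)
```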


Molecular Data Collection

[Figure: platform → raw data → “digits”, either ordinal data (0, 1, 2) or continuous variables (-1.2, -1.1, 0.58, 1.09, 2.34…), for genomics, transcriptomics, miRNomics, lncRNomics, epigenomics, proteomics, metabolomics, and lipidomics]


Basic Statistical Methods

[Figure: phenotype (numerical data: nominal, ordinal, continuous, survival) × molecular profiles (numerical data: nominal, ordinal, continuous), matched to the chi-square test, t-test, ANOVA, correlation, log-rank test, and statistical models. Caption: descriptive and exploratory association]


Basic Statistical Methods

• Continuous data
  – Normally distributed: parametric method
  – Non-normal distribution/ordinal data: non-parametric method
    • Winsorization
    • Log transformation: log2

Parametric             Non-parametric
t-test                 Mann-Whitney rank-sum test
Paired t-test          Wilcoxon signed-rank test
ANOVA                  Kruskal-Wallis test
Pearson correlation    Spearman correlation
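
The two transformations named on this slide can be sketched as follows. The quantile rule (nearest lower rank) and the pseudocount of 1 are assumptions made for this illustration; statistical packages offer several variants of both.

```python
from math import log2


def winsorize(values, lower=0.05, upper=0.95):
    """Clamp values outside the given quantiles to those quantile values."""
    ordered = sorted(values)
    n = len(ordered)
    lo = ordered[int(lower * (n - 1))]   # nearest lower-rank quantile
    hi = ordered[int(upper * (n - 1))]
    return [min(max(v, lo), hi) for v in values]


def log2_transform(values, pseudocount=1.0):
    """Return log2(x + pseudocount); the pseudocount keeps zeros finite."""
    return [log2(v + pseudocount) for v in values]


raw = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1000]        # one extreme outlier
clipped = winsorize(raw, lower=0.1, upper=0.9)
print(clipped)
print(log2_transform([0, 1, 3]))
```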


Statistical Models

• Univariate models
  – Logistic regression: binary/categorical phenotype
  – Linear regression: continuous phenotype
  – Kaplan-Meier (KM) method: survival phenotype
• Multivariate models
  – Multivariate regressions: linear or logistic
  – Cox regression: survival phenotype
• Other sophisticated models


Multiple Testing Issue

• Example
  – P value cutoff = 0.05
  – 1000 genes: 50 genes expected by chance (error) at this significance level
  – If 60 genes have p<0.05, many might be due to noise (false positives)
• Common correction methods
  – Bonferroni correction
    • Adjusted p value: p × n, e.g. p=0.0005, n=1000 genes, adjusted p = 0.0005 × 1000 = 0.5
    • Corrected p value cutoff = 0.05/n
    • Explanation: among all genes selected, the probability of at least one false positive is ≤0.05
  – False discovery rate (FDR)
    • FDR=0.1 means that among all genes selected (e.g. 100), we would expect 10 to be false positives
    • FDR as high as 0.5 may be acceptable to biologists
    • Several different approaches to estimation (Benjamini & Hochberg, B&H, is the most popular)
• Data filtering in the processing step can also reduce the number of genes
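
Both corrections are short enough to write out. The sketch below is a plain-Python illustration with hypothetical function names. Established statistics packages provide tested implementations and would normally be preferred for real analyses.

```python
def bonferroni(pvalues):
    """Bonferroni-adjusted p values: p * n, capped at 1."""
    n = len(pvalues)
    return [min(1.0, p * n) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.5]   # made-up p values for four genes
print(bonferroni(pvals))
print(benjamini_hochberg(pvals))
```

Genes whose Benjamini-Hochberg adjusted value falls below the chosen FDR (for example 0.1) are the ones reported as significant.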


Azuaje F. Bioinformatics and Biomarker Discovery, 2010

Basic Biomarker Discovery Pipeline


Data Processing
• Data pre-processing
  – Data filtering and QC
    • Remove samples with failed experiments
    • Exclude markers with very low variance
    • Exclude markers with very low expression levels, e.g. RNA-seq
  – Data normalization
    • To transform the data into a format that is compatible or comparable between different samples or assays
    • To level potential differences caused by experimental factors, such as labelling and hybridization
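
The filtering step can be sketched as below. The gene names, expression values, and both cutoffs are invented for illustration. Suitable thresholds depend on the platform and on how the data were normalized.

```python
from statistics import mean, pvariance

# Hypothetical expression matrix: gene -> values across four samples.
expression = {
    "GENE_A": [5.1, 7.9, 6.4, 9.2],
    "GENE_B": [3.0, 3.0, 3.1, 3.0],   # nearly constant across samples
    "GENE_C": [0.1, 0.0, 0.2, 0.1],   # barely expressed
}


def filter_markers(matrix, min_mean=1.0, min_variance=0.05):
    """Keep markers whose mean and variance both meet the cutoffs."""
    return {
        gene: values
        for gene, values in matrix.items()
        if mean(values) >= min_mean and pvariance(values) >= min_variance
    }


kept = filter_markers(expression)
print(sorted(kept))
```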


Why Remove Genes with Low Variance?

[Figure: gene expression (axis 0 to 4) in case vs. control for two example genes; p=0.004 and p=0.008]


Data Reduction
• Focus on smaller sets of potentially novel and interesting data patterns (e.g. groups of samples or gene sets)
• Confirm initial hypotheses about the relevance of the available features and guide future experimental and computational analysis
• Exploratory univariate analyses
  – t-test
  – Chi-square test
  – Correlation
  – Univariate regression


Data Matrix

• Data matrix
• Color-coded representations of absolute or relative expression levels

[Figure: data matrix; axes: expression, samples]


Data Visualization

[Figure: dendrogram]

• Statistical plotting: GraphPad
• Dendrogram and heatmap: R, GENE-E, Gitools


Exploratory Analysis
• Univariate analysis
• Single marker vs phenotype
• Multiple-hypothesis testing corrections
  – DEG (differentially expressed genes)
  – Fold change
  – Statistical model: t-test, correlation, univariate regression
  – P values and other cut-offs
• Unsupervised classification (clustering) and visualization
• Filtering: to remove uninformative, highly noisy or redundant markers for subsequent analyses
• Supervised classification


Data Integration
• Further reduction
• Which markers to choose for predictive model construction
• To estimate the potential relevance of the identified markers and relationships
• To discover other significant genes and relationships (e.g. gene-gene or gene-disease) not found in previous data-driven analysis steps
• Tools:
  – Human gene annotation databases (e.g. GO)
  – Metabolic pathway databases (e.g. KEGG)
  – Gene-disease association extractors from public databases (e.g. Endeavour)
  – Other functional catalogues
• Resulting data- and knowledge-driven findings, patterns or predictions provide a selected catalogue of genes, pathways and (gene-gene and gene-disease) relationships relevant to the phenotype classes investigated

IPA


Don’t Forget Covariate Data!
• Don’t forget these:
  – Demographic
    • Age, gender, race (often a PCA component), smoking, drinking, lifestyle, etc.
  – Physiological
    • BMI, weight, height, etc.
  – Clinical
    • Blood tests, urine tests, other analytes
• Integrate information
  – Molecular data
  – Knowledge-driven data
  – Covariates
• Multivariate regression
  – Model training
  – Model validation
  – Model assessment
    • ROC


Data Integration is Critical
• Provides more reliable information
• Increases the predictive value
• Gives insight into the mechanism
• Supports reliable hypothesis generation
• But can be biased as well

[Figure: DNA → RNA → protein → metabolites (transcription, translation, catalysis); genome → transcriptome → proteome → metabolome/lipidome → clinical endpoint; dysregulation through genetic and environmental effects]


Examples of Cardiovascular Biomarkers with Integrated Data

Vasan, 2006; Gerszten and Wang, 2008


Building Predictive Models

If… then…: build a model based on selected markers

• Discovery set
• Validation set
• Pro-retrospective set
• Prospective set

$$Y = \hat{\beta}_0 + \hat{\beta}_1 X_1 + \hat{\beta}_2 X_2 + \dots + \hat{\beta}_i X_i$$
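
To make the regression equation concrete, the sketch below applies a set of already-estimated coefficients to one subject's marker values, and also shows the logistic form used for a binary phenotype. The coefficients and marker values are hypothetical. Estimating them from a discovery set is a separate step not shown here.

```python
from math import exp


def linear_predictor(intercept, coefficients, markers):
    """Return b0 + b1*x1 + ... + bi*xi for one subject."""
    if len(coefficients) != len(markers):
        raise ValueError("need one coefficient per marker")
    return intercept + sum(b * x for b, x in zip(coefficients, markers))


def predicted_probability(intercept, coefficients, markers):
    """Logistic form: map the linear predictor to a probability in (0, 1)."""
    return 1.0 / (1.0 + exp(-linear_predictor(intercept, coefficients, markers)))


# Hypothetical coefficients, as if estimated on a discovery set.
b0, betas = -1.0, [0.5, 2.0]
score = linear_predictor(b0, betas, [2.0, 0.25])
risk = predicted_probability(b0, betas, [2.0, 0.25])
print(score, risk)
```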


Predictive Models
• Multivariable models
  – Linear regression
    • Continuous data
  – Logistic regression
    • Presence/absence of disease
  – Cox regression
    • Survival data
• Algorithmic models (machine learning)
  – Support vector machines (SVM)
  – Artificial neural networks (ANN)

Page 61: Big Data Training for Translational Omics Research BigTaP ... 1 day 1-Liu.pdf · In Translational Medicine Liu Day 1 Session I ... • Identify therapeutic targets and strategies

Validation Strategies
• Internal validation
– Cross-validation
– Random/non-random split of samples into training and test sets
• External validation
– Independent sample and dataset
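Cross-validation can be sketched as follows: the samples are shuffled once, cut into k folds, and each fold serves once as the test set while the remaining folds form the training set. The function name, the choice of k = 5 and the sample count of 60 are illustrative assumptions.

```python
import random

def k_fold_splits(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)           # random assignment to folds
    folds = [indices[i::k] for i in range(k)]      # k roughly equal folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# Example: 60 samples, 5 folds, so 48 training and 12 test samples per round.
splits = list(k_fold_splits(n_samples=60, k=5))
for round_no, (train, test) in enumerate(splits, start=1):
    print(f"round {round_no}: train={len(train)} test={len(test)}")
```

Every sample is used for testing exactly once, which is what distinguishes cross-validation from a single training/test split.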

Page 62: Big Data Training for Translational Omics Research BigTaP ... 1 day 1-Liu.pdf · In Translational Medicine Liu Day 1 Session I ... • Identify therapeutic targets and strategies

Assessment of Performance
• Basic parameters

– Sensitivity: the proportion of the true positive outcomes (e.g. truly diseased subjects) that are predicted to be positive

– Specificity: the proportion of the true negative outcomes (e.g. truly disease-free subjects) that are predicted to be negative
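Written with confusion-matrix counts, these two definitions become the formulas below. The symbols TP, FN, TN and FP (true positives, false negatives, true negatives, false positives) are introduced here for illustration and do not appear on the slide.

$$\text{Sensitivity} = \frac{TP}{TP + FN}, \qquad \text{Specificity} = \frac{TN}{TN + FP}$$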

Page 63: Big Data Training for Translational Omics Research BigTaP ... 1 day 1-Liu.pdf · In Translational Medicine Liu Day 1 Session I ... • Identify therapeutic targets and strategies

Assessment of Performance
• Receiver Operating Characteristic (ROC) curve
• Area under the curve (AUC), also written "AUROC"
– AUC = 0.5: no association (prediction no better than chance)
– AUC = 1: perfect association
– AUC < 0.6: no medical value
– AUC > 0.75: reasonable
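One way to see what the AUC measures: it equals the probability that a randomly chosen positive subject receives a higher score than a randomly chosen negative subject, with ties counted as one half. The sketch below computes it by direct pairwise comparison; the scores are made-up values for illustration only.

```python
def auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0      # positive correctly ranked above negative
            elif sp == sn:
                wins += 0.5      # ties count as half
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical predicted scores for diseased and disease-free subjects.
pos_scores = [0.9, 0.8, 0.6, 0.4]
neg_scores = [0.7, 0.5, 0.3, 0.2]

example_auc = auc(pos_scores, neg_scores)
print(f"AUC = {example_auc}")
```

Here 13 of the 16 pairs are ranked correctly, so the AUC is 0.8125, which the rule of thumb above would call reasonable.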

Page 64: Big Data Training for Translational Omics Research BigTaP ... 1 day 1-Liu.pdf · In Translational Medicine Liu Day 1 Session I ... • Identify therapeutic targets and strategies


Case study #1

Cancer Cell. 2004;5(6):607-16. PMID: 15193263

Page 65: Big Data Training for Translational Omics Research BigTaP ... 1 day 1-Liu.pdf · In Translational Medicine Liu Day 1 Session I ... • Identify therapeutic targets and strategies


Hormone receptor status and tamoxifen response

Page 66: Big Data Training for Translational Omics Research BigTaP ... 1 day 1-Liu.pdf · In Translational Medicine Liu Day 1 Session I ... • Identify therapeutic targets and strategies

Hormone receptor status and tamoxifen response

[Figure: bar chart of non-responsiveness rate (0% to 100%) by hormone receptor status: ER+/PR+, ER+/PR-, ER-/PR+, ER-/PR-]

Page 67: Big Data Training for Translational Omics Research BigTaP ... 1 day 1-Liu.pdf · In Translational Medicine Liu Day 1 Session I ... • Identify therapeutic targets and strategies

A biomarker is needed!
• Who would respond to tamoxifen (TAM)?
• Alternative therapy
– Aromatase inhibitors
– HER1 and HER2 inhibitors
– Other chemotherapy
• Save time

Page 68: Big Data Training for Translational Omics Research BigTaP ... 1 day 1-Liu.pdf · In Translational Medicine Liu Day 1 Session I ... • Identify therapeutic targets and strategies

Design

[Figure: study-design flowchart for the discovery set]
• Sample selection: frozen biopsies; 103 ER+ female BC patients at MGH
• Phenotyping: 60 female breast cancer patients uniformly treated with TAM alone
– 53% (N=32) disease free >10 yrs
– 47% (N=28) metastatic (recurrent) ~4 yrs
• Data collection: microarray, 22k genes
• Data filtering: <25% variance; expression level?; 5,475 genes
• Data reduction: t-test (P=0.001); permutation test (p<0.04); 19 DEG; LCD: 9 DEG; 3 DEG
• Technical validation
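The permutation test named in the design can be sketched generically: the group labels are shuffled many times, and the p-value is the fraction of shuffles whose difference in group means is at least as large as the observed one. This is a minimal illustration under assumed settings (difference in means as the statistic, 2000 permutations, made-up expression values); the slide does not specify the procedure actually used in the study.

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_p_value(group_a, group_b, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                      # randomly reassign group labels
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (extreme + 1) / (n_perm + 1)

# Hypothetical expression values of one gene in the two outcome groups.
recurrent = [5.1, 4.8, 5.6, 5.3, 4.9, 5.4]
non_recurrent = [3.9, 4.2, 3.7, 4.4, 4.0, 4.1]

p_value = permutation_p_value(recurrent, non_recurrent)
print(f"permutation p-value = {p_value:.4f}")
```

Because it makes no distributional assumption, a permutation test is a common complement to the t-test when many genes are screened in a small sample.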

Page 69: Big Data Training for Translational Omics Research BigTaP ... 1 day 1-Liu.pdf · In Translational Medicine Liu Day 1 Session I ... • Identify therapeutic targets and strategies


Technical Validation

Page 70: Big Data Training for Translational Omics Research BigTaP ... 1 day 1-Liu.pdf · In Translational Medicine Liu Day 1 Session I ... • Identify therapeutic targets and strategies

Data Evaluation, Refinement and Selection

[Figure: evaluation workflow comparing TAM recurrent and TAM non-recurrent patients]
• 3 DEG: HOXB13, IL17BR, EST (unknown)
• Univariate analysis: t-test and AUC of ROC for HOXB13, IL17BR and the expression ratio; AUC ROC comparison
• Multivariate analysis (logistic model): expression ratio together with age, tumor size, grade, lymph node status, ERBB2, EGFR, ESR1 and PGR

Page 71: Big Data Training for Translational Omics Research BigTaP ... 1 day 1-Liu.pdf · In Translational Medicine Liu Day 1 Session I ... • Identify therapeutic targets and strategies

Association Model: to determine the predictive factors
• Univariate analysis: select factors
• Multivariate model: test dependence
• Predictive model: include the independent factors

$$Y = \hat{\beta}_0 + \hat{\beta}_1 X_1 + \hat{\beta}_2 X_2 + \dots + \hat{\beta}_i X_i$$

Page 72: Big Data Training for Translational Omics Research BigTaP ... 1 day 1-Liu.pdf · In Translational Medicine Liu Day 1 Session I ... • Identify therapeutic targets and strategies

Validation
• Cross-validation?
• Independent validation
• Technical validation: qPCR (why?)
• FFPE (why?)

Page 73: Big Data Training for Translational Omics Research BigTaP ... 1 day 1-Liu.pdf · In Translational Medicine Liu Day 1 Session I ... • Identify therapeutic targets and strategies

Predictive Model
• qPCR established a feasible strategy for diagnostic purposes
• The ratio is the only predictive factor after controlling for the other factors identified by univariate analysis (e.g. tumor size, other genes)
• Can this ratio accurately predict the recurrence status?

[Figure: the HOXB13:IL17BR ratio (X) entered into the model $Y = \beta_1 X + \beta_0$ to separate TAM recurrent from TAM non-recurrent patients]

Page 74: Big Data Training for Translational Omics Research BigTaP ... 1 day 1-Liu.pdf · In Translational Medicine Liu Day 1 Session I ... • Identify therapeutic targets and strategies

Predictive model evaluation
• Sensitivity, specificity and accuracy

                 Recur    Non-recur
Predicted          21         5
Non-predicted       6        27

• Accuracy = (21+27)/59 = 81%
• Sensitivity = 21/(21+6) = 78%
• Specificity = 27/(27+5) = 84%
• Positive predictive value = 21/(21+5) = 81%
• Negative predictive value = 27/(27+6) = 82%
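The five figures above all come from the same four counts. A small helper makes the arithmetic explicit; the function name and the keyword names are choices made for this sketch, while the counts (21 recurrences predicted, 6 missed, 27 non-recurrences correctly classified, 5 falsely predicted) are read off the slide's formulas.

```python
def classification_metrics(tp, fn, tn, fp):
    """Standard performance measures from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # recurrences that were predicted
        "specificity": tn / (tn + fp),   # non-recurrences correctly cleared
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
    }

# Discovery-set counts taken from the slide.
discovery = classification_metrics(tp=21, fn=6, tn=27, fp=5)
for name, value in discovery.items():
    print(f"{name}: {value:.0%}")
```

The printed percentages reproduce the values on the slide (81%, 78%, 84%, 81%, 82%).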

Page 75: Big Data Training for Translational Omics Research BigTaP ... 1 day 1-Liu.pdf · In Translational Medicine Liu Day 1 Session I ... • Identify therapeutic targets and strategies

Independent Evaluation for the Predictive Model

[Figure: the HOXB13:IL17BR ratio model $Y = \beta_1 X + \beta_0$ applied to the validation set of 20 FFPE samples to separate TAM recurrent from TAM non-recurrent patients]

• Accuracy = (9+7)/20 = 80%
• Sensitivity = 7/(7+3) = 70%
• Specificity = 9/(9+1) = 90%
• Positive predictive value =
• Negative predictive value =
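The two predictive values left open on this slide can be worked out from the same counts. Assuming, as the sensitivity and specificity formulas imply, 7 true positives, 3 false negatives, 9 true negatives and 1 false positive:

$$\text{PPV} = \frac{TP}{TP + FP} = \frac{7}{7 + 1} = 87.5\%, \qquad \text{NPV} = \frac{TN}{TN + FN} = \frac{9}{9 + 3} = 75\%$$

With only 20 samples, each misclassified case shifts these percentages by several points, so they should be read as rough estimates.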

Page 76: Big Data Training for Translational Omics Research BigTaP ... 1 day 1-Liu.pdf · In Translational Medicine Liu Day 1 Session I ... • Identify therapeutic targets and strategies


Evaluation: Other outcomes

Page 77: Big Data Training for Translational Omics Research BigTaP ... 1 day 1-Liu.pdf · In Translational Medicine Liu Day 1 Session I ... • Identify therapeutic targets and strategies

Why these two genes? Mechanism
• Correlative studies in tissue samples: association
• Mechanistic studies in BC cell lines: causality
• Q: Is this step necessary?

Page 78: Big Data Training for Translational Omics Research BigTaP ... 1 day 1-Liu.pdf · In Translational Medicine Liu Day 1 Session I ... • Identify therapeutic targets and strategies


Our Validation Set

http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=gse6532

Page 79: Big Data Training for Translational Omics Research BigTaP ... 1 day 1-Liu.pdf · In Translational Medicine Liu Day 1 Session I ... • Identify therapeutic targets and strategies

Evening Session I
• GenomeSpace: http://www.genomespace.org/
• cBioPortal: http://www.cbioportal.org/

