Multivariate Analysis of Pathways. Multivariate Approaches to Gene Set Selection.

Date post: 17-Dec-2015
Upload: lambert-page
Key Multivariate Ideas

• PCA (Principal Components Analysis)

• SVD (Singular Value Decomposition)

• MDS (Multi-dimensional Scaling)

• Hotelling’s T²

PCA

Three correlated variables: PC1 lies along the direction of maximal variation; PC2 lies at right angles to PC1, along the direction of next-highest variation.

Multivariate Representation of Pathways

• BAD pathway (figure; sample groups in legend: Normal, IBC, Other BC)

• Clear separation between groups

• Variation differences

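Hotelling’s T²

• Compute the distance between sample means using a (common) metric of covariation:

$$T^2 = \frac{n_1 n_2}{n_1 + n_2}\,(\bar{x}_1 - \bar{x}_2)^\top S^{-1} (\bar{x}_1 - \bar{x}_2)$$

• Where $S = \dfrac{(n_1 - 1) S_1 + (n_2 - 1) S_2}{n_1 + n_2 - 2}$ is the pooled covariance of the two groups

• Multidimensional analog of the t (actually F) statistic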
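As a minimal sketch of the pooled two-sample statistic, the code below computes T² = n1·n2/(n1+n2) · d' S⁻¹ d, where d is the difference of group means and S is the pooled covariance. The function names (`hotelling_t2`, `solve`, and the helpers) are illustrative, not from the slides, and the layout of one row per sample and one column per gene is an assumption.

```python
def mean_vector(rows):
    """Column means of a list of equal-length sample rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def scatter_matrix(rows, mean):
    """Sum of outer products of deviations from the mean (p x p)."""
    p = len(mean)
    s = [[0.0] * p for _ in range(p)]
    for row in rows:
        d = [x - m for x, m in zip(row, mean)]
        for i in range(p):
            for j in range(p):
                s[i][j] += d[i] * d[j]
    return s


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("pooled covariance is singular (too few samples?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def hotelling_t2(group1, group2):
    """Two-sample Hotelling T^2 with a pooled (common) covariance."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = mean_vector(group1), mean_vector(group2)
    s1, s2 = scatter_matrix(group1, m1), scatter_matrix(group2, m2)
    p = len(m1)
    pooled = [[(s1[i][j] + s2[i][j]) / (n1 + n2 - 2) for j in range(p)]
              for i in range(p)]
    diff = [a - b for a, b in zip(m1, m2)]
    w = solve(pooled, diff)  # w = S^{-1} d, without forming the inverse
    return (n1 * n2 / (n1 + n2)) * sum(d * wi for d, wi in zip(diff, w))
```

With a single gene the result reduces to the square of the pooled two-sample t statistic. The singular-matrix error is what the small-sample critique (N < p) looks like in practice.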

Principles of Kong et al Method

• Normal covariation generally acts to preserve homeostasis

• The transcription of genes that participate in many processes will be changed

• The joint changes in genes will be most distinctive for those genes active in pathways that are working differently

Critiques of Hotelling’s T²

• Small samples: unreliable covariance estimates
– N < p

• Estimates of the mean $\bar{x}$ and covariance $S$ are not robust to outliers

• Assumes the same covariance in each group
– $\Sigma_1 = \Sigma_2$? Usually not in disease
– Kong et al. propose an analog of the Welch t-test
– Permutation of samples for significance

Making it Stable

1. Insufficient information to capture all relationships – too much correlation!

– Power of Hotelling’s method comes from identifying directions of rare variation

– Many (spurious) directions of 0 variation

2. Random variation in data leads to random variation in PCA

• Regularization strategy: force covariance to be more like IID
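One common way to force a covariance estimate toward IID is linear shrinkage toward a scaled identity matrix. The sketch below is one such scheme and is not necessarily the one used in the original analysis; the name `shrink_covariance` and the `weight` parameter are assumptions for illustration.

```python
def shrink_covariance(cov, weight):
    """Blend cov with a scaled identity matrix.

    weight = 0 returns cov unchanged; weight = 1 returns the IID target
    (average variance on the diagonal, zero covariance elsewhere).
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    p = len(cov)
    avg_var = sum(cov[i][i] for i in range(p)) / p
    return [
        [(1.0 - weight) * cov[i][j] + (weight * avg_var if i == j else 0.0)
         for j in range(p)]
        for i in range(p)
    ]
```

Any weight above zero lifts the zero-variance directions away from zero, so the shrunken matrix can be inverted even when the raw estimate is singular.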

Making it Robust

• Microarray data has many outliers

• Multivariate methods are very much distorted by outliers

• Robust estimates of covariance could give robust PCA

• Simple approach: trim outliers

Handling Changes of Covariance

• Power of Hotelling’s method comes from identifying directions of rare variation

• If one group shows little covariation in one direction but the other shows more, how to test for changes?

• If one group is the control, then its directions of rare covariation should be taken as the standard
– Robust measure of means in both groups

Detecting changes of covariance

Meaning of Covariance Change

• Meaning of covariance across individuals
– Homeostasis in the face of individual variation
– e.g. BAD pathway: largest loadings of PC1 on PRKARB & ADCY1
– PRKARB represses CREB1; ADCY activates CREB1

• Gene sets whose covariance diminishes may
– be responding to different inputs
– have escaped their usual regulatory control

• Characteristic of cancers

Testing Covariance Changes

• Idea: directions of small variation in one should match directions of small variation in other

• Mathematical approach
– Find the solutions $\lambda$ of $|S_1 - \lambda S_2| = 0$
– Solutions should all be near 1, if no change
– Test statistic: easily computed, $\sum_{i=1,\ldots,p} (\lambda_i - 1)^2$

• Computational approach
– Ratio of largest to smallest: $\lambda_{\max} / \lambda_{\min}$
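To make the idea concrete for the smallest case, the sketch below solves det(S1 − λ·S2) = 0 for two 2×2 covariance matrices by expanding the determinant into a quadratic in λ. The function names are illustrative. The sum of squared deviations from 1 is only one plausible easily computed summary, an assumption here rather than a confirmed statistic from the slides. Real gene sets need a general eigenvalue solver.

```python
import math


def relative_eigenvalues_2x2(s1, s2):
    """Roots of det(S1 - lam * S2) = 0 for symmetric 2x2 matrices.

    Each matrix is [[a, b], [b, c]]; S2 must be positive definite.
    """
    (a1, b1), (_, c1) = s1
    (a2, b2), (_, c2) = s2
    qa = a2 * c2 - b2 * b2                      # det(S2)
    if qa <= 0:
        raise ValueError("s2 must be positive definite")
    qb = -(a1 * c2 + a2 * c1 - 2.0 * b1 * b2)
    qc = a1 * c1 - b1 * b1                      # det(S1)
    disc = max(qb * qb - 4.0 * qa * qc, 0.0)    # guard against round-off
    root = math.sqrt(disc)
    return sorted([(-qb - root) / (2.0 * qa), (-qb + root) / (2.0 * qa)])


def covariance_change_summaries(s1, s2):
    """Return (sum of squared deviations from 1, largest/smallest ratio)."""
    lams = relative_eigenvalues_2x2(s1, s2)
    sum_sq = sum((lam - 1.0) ** 2 for lam in lams)
    ratio = lams[-1] / lams[0] if lams[0] > 0 else float("inf")
    return sum_sq, ratio
```

When the two covariance matrices are equal, both roots are 1, so the first summary is 0 and the ratio is 1.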

Network Connectivity Methods

Network Topology

• Connections represent interactions:
– Regulatory (one-way)
– Protein interaction (two-way)

• Hubs are genes with many connections

• Bottlenecks are single genes that connect two parts of a functional network

Devising Tests Based on Topology

• Issues: how to weight more heavily the genes that are hubs

• How to assess directionality of change

• How to measure co-operativity (activation or repression changes in appropriate ways)

Draghici et al. Approach

• Overall measure

• Effective contribution (perturbation factor)
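The slide's formulas are not part of this transcript. As a hedged sketch, the perturbation factor in the published impact-factor method is usually written as PF(g) = ΔE(g) + Σ_u β_ug · PF(u) / N_ds(u), summing over the genes u directly upstream of g, where β is +1 for activation and −1 for repression and N_ds(u) is the number of genes directly downstream of u. The code below assumes an acyclic pathway; the gene names and the `(upstream, downstream, beta)` edge format are hypothetical.

```python
def perturbation_factors(delta_e, edges):
    """Perturbation factor of every gene in an acyclic pathway.

    delta_e: dict gene -> measured expression change (0 if missing).
    edges:   list of (upstream, downstream, beta) with beta = +1 for
             activation and -1 for repression.
    """
    n_down = {}      # number of direct downstream targets per gene
    upstream = {}    # gene -> list of (regulator, beta)
    for u, g, beta in edges:
        n_down[u] = n_down.get(u, 0) + 1
        upstream.setdefault(g, []).append((u, beta))

    cache = {}

    def pf(gene, stack=()):
        if gene in cache:
            return cache[gene]
        if gene in stack:
            raise ValueError("cycle detected; this sketch needs a DAG")
        total = delta_e.get(gene, 0.0)
        for u, beta in upstream.get(gene, []):
            # each regulator's perturbation is shared among its targets
            total += beta * pf(u, stack + (gene,)) / n_down[u]
        cache[gene] = total
        return total

    genes = set(delta_e) | set(n_down) | set(upstream)
    return {g: pf(g) for g in genes}
```

For example, if a hypothetical gene A (ΔE = 2) activates B (ΔE = 1) and represses C (ΔE = 0), then PF(B) = 1 + 2/2 = 2 and PF(C) = 0 − 2/2 = −1.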

Analysis of Outliers

Outliers: Clues to Disease Process?

• Outliers usually reflect idiosyncratic events

• Recurrent outliers reflect rare events that are selected

• If a particular pathway is disrupted in disease, but by many different mechanisms, then the expression profiles should
– Lose healthy covariance
– Show recurrent outliers

• How to test for ‘consistent’ outliers?

• COPA: a method for flagging recurrent outliers in expression data
– Finds consistent fusion gene

A Test Statistic for Consistent Outliers

• Ratio of quantile differences to normal variation: $\dfrac{(q_{.90} - q_{.10})_{\text{tumor}}}{\max\big((q_{.90} - q_{.10})_{\text{normal}},\ 0.4\big)}$

• Compare to null distribution by permutation

• Many genes show much higher ratios
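A minimal sketch of the range-ratio statistic for a single gene follows. The linear-interpolation quantile rule and the function names are assumptions; the floor of 0.4 on the normal spread is the value given on the slide.

```python
def quantile(values, q):
    """Quantile by linear interpolation between order statistics, q in [0, 1]."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


def range_ratio(tumor, normal, floor=0.4):
    """(q.90 - q.10) in tumors over max((q.90 - q.10) in normals, floor)."""
    spread_tumor = quantile(tumor, 0.9) - quantile(tumor, 0.1)
    spread_normal = quantile(normal, 0.9) - quantile(normal, 0.1)
    return spread_tumor / max(spread_normal, floor)
```

The floor keeps genes with almost no variation among the normals from producing very large ratios out of noise alone.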

Statistical Significance

• Find confidence limits on the number of false positives by permutation

• Several hundred genes appear significant at 10-20% FDR
– Actual scores: 267 scores are greater than 5, where 90% of permutations have fewer than 34 scores over 5
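The permutation step can be sketched as follows: shuffle the tumor/normal labels, rescore every gene, and record how many scores exceed the threshold. The data layout (a dict from gene to a list of values), the `mean_diff` scorer, and all function names are assumptions for illustration; any per-gene score, such as a range ratio, could be passed in instead.

```python
import random


def mean_diff(values, labels):
    """Example scorer: mean of tumor values minus mean of normal values."""
    tumor = [v for v, is_tumor in zip(values, labels) if is_tumor]
    normal = [v for v, is_tumor in zip(values, labels) if not is_tumor]
    return sum(tumor) / len(tumor) - sum(normal) / len(normal)


def permutation_counts(expr, labels, score_fn, threshold, n_perm=200, seed=0):
    """Sorted counts of genes scoring above threshold under shuffled labels.

    expr:   dict gene -> list of values, aligned with labels.
    labels: list of booleans, True for tumor samples.
    """
    rng = random.Random(seed)
    shuffled = list(labels)
    counts = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        scores = [score_fn(values, shuffled) for values in expr.values()]
        counts.append(sum(1 for s in scores if s > threshold))
    return sorted(counts)


def upper_limit(sorted_counts, level=0.9):
    """Count that roughly a fraction `level` of permutations do not exceed."""
    idx = min(int(level * len(sorted_counts)), len(sorted_counts) - 1)
    return sorted_counts[idx]
```

Dividing the upper limit by the observed count gives a rough bound on the false discovery rate. With the figures on the slide, 34 / 267 is about 13%, which sits inside the quoted 10-20% range.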

A Test for Functional Groups

• For each group G of genes

• sG <- sum(scores[G])/sqrt(length(G))

• Scores: t-scores or range ratios

• PAGE (BMC Bioinformatics, 2005)
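The group score above can be sketched in Python as follows. The dict-based layout (gene name to score) and the name `group_score` are assumptions; skipping group members that have no score is also a choice made here, not something stated on the slide.

```python
import math


def group_score(scores, group):
    """sum(scores[G]) / sqrt(length(G)) over the scored genes in the group."""
    members = [g for g in group if g in scores]
    if not members:
        raise ValueError("no gene in the group has a score")
    return sum(scores[g] for g in members) / math.sqrt(len(members))
```

Dividing by the square root of the group size keeps groups of different sizes comparable: if the individual scores were independent with unit variance, the group score would also have unit variance.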

Do Genes Make Sense?

• Quantile Ratio
– [1] "DNA replication"
– [2] "response to pathogenic fungi"
– [6] "cleavage of lamin"
– [7] "spindle organization and biogenesis"
– [15] "response to osmotic stress"
– [16] "nutrient import"
– [22] "response to mercury ion"

• T-test
– [2] "sodium ion homeostasis"
– [3] "leukocyte adhesive activation"
– [4] "positive regulation of calcium-independent cell-cell adhesion"
– [5] "oxytocin receptor activity"
– [6] "ADP biosynthesis"
– [7] "dADP biosynthesis"
– [10] "regulation of muscle contraction"
– [11] "caveolar membrane"
– [12] "response to cold"
– [16] "stress fiber formation"
– [18] "positive regulation of complement activation"
– [19] "astrocyte activation"
– [22] "regulation of long-term neuronal synaptic plasticity"
– [24] "positive regulation of endocytosis"
– [25] "embryonic hemopoiesis"

Cancer Functional Groups

• Do very probable cancer genes show high discrepancy in few samples?

• Program: identify genes that might contribute to cancer processes: growth signaling, loss of cell-matrix adhesion, apoptosis

1. Do most samples from these categories show at least one gross mis-regulation?

2. Are they the same genes in most samples?

Example: Cell Growth

• Select genes in GO:0001558 ‘regulation of cell growth’

• Expect most samples to have at least one very seriously mis-regulated gene from this category.

• Compute maximum aberration score across category

Aberrations

• Aberration score indicated by color: vanilla: 0; red: 4

• Nine normals at left

• No gene misregulated in even 50% of samples

• BUT: Only a few genes commonly misregulated

Simplest Summary

• Maximum aberration score for samples

Testing the Pathway for Outliers

• Many genes show aberrations in tumor group

• Null distribution: medians of maxima from randomly selected gene groups of size 37

• P < .01

NB. The results for cell-matrix interaction are very similar; angiogenesis not so strong
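The pathway test can be sketched as follows: take the maximum aberration score over the gene group in each sample, summarize by the median across samples, and compare with the same summary for randomly drawn gene groups of equal size (37 in the example above). How the aberration scores themselves are computed is not specified here, so they are taken as given; the data layout and function names are assumptions.

```python
import random
import statistics


def median_of_maxima(aberration, genes):
    """Median over samples of the per-sample maximum score in the group.

    aberration: dict gene -> list of per-sample aberration scores.
    """
    per_sample_max = [max(col) for col in zip(*(aberration[g] for g in genes))]
    return statistics.median(per_sample_max)


def random_group_p_value(aberration, genes, n_draws=1000, seed=0):
    """Share of random same-size gene groups scoring at least as high."""
    rng = random.Random(seed)
    observed = median_of_maxima(aberration, genes)
    all_genes = sorted(aberration)
    hits = 0
    for _ in range(n_draws):
        draw = rng.sample(all_genes, len(genes))
        if median_of_maxima(aberration, draw) >= observed:
            hits += 1
    # add-one correction so the estimate is never exactly zero
    return (hits + 1) / (n_draws + 1)
```

Drawing random groups of the same size matters because the maximum over a larger group tends to be larger by chance alone.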

