Plaid Models 1

Plaid Models and Microarrays

Laura Lazzeroni and Art Owen

Stanford University

May 30, 2000

Sequoia Hall 200

Expression data

For genes $i = 1, \dots, n$ and samples $j = 1, \dots, p$, $Y_{ij}$ measures the expression level.

Samples: from different organs, individuals, times, etc.

Genes: many or all of the organism’s genes.

Expression: activity level of gene $i$ in sample $j$.

Starting points

1. Eisen, Spellman, Brown, Botstein: PNAS (1998)

2. Hastie, Tibshirani, Eisen, Brown, Ross, Scherf, Weinstein, Alizadeh, Staudt, Botstein: S.U. Tech. Report (2000)

Transposable Data

Observations   Variables   Data Dimension
Genes          Samples     [illegible in source]
Samples        Genes       [illegible in source]
Movies         Viewers     [illegible in source]
Viewers        Movies      [illegible in source]
Words          Documents   [illegible in source]
Documents      Words       [illegible in source]

Good statistics problems:

1. (When) should the model be symmetric?

2. Can’t have both [illegible in source] and [illegible in source]

3. How best to bootstrap?

4. How to use row/col specific covariates?

Managing the data

$n$ is usually large

$p$ can be large

... graphical methods

Eisen et al. (1998) data

2467 yeast genes

79 samples

multiple experiments

Modeling Approach

Try:

$Y_{ij} \doteq \mu_0 + \sum_{k=1}^{K} \mu_k \rho_{ik} \kappa_{jk}$

$\rho_{ik} \in \{0, 1\}$

$\kappa_{jk} \in \{0, 1\}$

Interpretation:

$\mu_0$ is a background level

There are $K$ “layers”, with levels $\mu_k$

$\rho_{ik}$ for gene membership

$\kappa_{jk}$ for sample membership

$\mu_k > 0$, for upregulation

$\mu_k < 0$, for downregulation
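
As a minimal sketch of how the layered mean is assembled, the function below adds a constant level to every cell that belongs to both the gene set and the sample set of a layer. The function name `plaid_mean`, the tuple layout of `layers`, and the toy numbers are assumptions made for illustration; they do not come from the slides.

```python
def plaid_mean(mu0, layers, n, p):
    """Return the n x p table mu0 + sum_k mu_k * rho_ik * kappa_jk.

    `layers` is a list of (mu_k, rho_k, kappa_k), where rho_k and kappa_k
    are 0/1 lists of length n (genes) and p (samples).
    """
    fit = [[mu0] * p for _ in range(n)]
    for mu_k, rho, kappa in layers:
        for i in range(n):
            for j in range(p):
                # A cell gets the layer level only if gene i AND sample j belong.
                fit[i][j] += mu_k * rho[i] * kappa[j]
    return fit


# Toy example: two overlapping layers on a 3 x 4 table.
layers = [
    (2.0, [1, 1, 0], [1, 1, 0, 0]),   # upregulated block (mu_k > 0)
    (-1.0, [0, 1, 1], [0, 1, 1, 0]),  # downregulated block (mu_k < 0)
]
fit = plaid_mean(0.5, layers, n=3, p=4)
for row in fit:
    print(row)
```

Cell (2, 2) belongs to both layers, so its fitted value is the background plus both levels; this overlap is what distinguishes the layered form from a partition into disjoint clusters.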

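Bigger models

$Y_{ij} \doteq \mu_0 + \sum_{k=1}^{K} \mu_k \rho_{ik} \kappa_{jk}$

$Y_{ij} \doteq \mu_0 + \sum_{k=1}^{K} (\mu_k + \alpha_{ik}) \rho_{ik} \kappa_{jk}$

$Y_{ij} \doteq \mu_0 + \sum_{k=1}^{K} (\mu_k + \beta_{jk}) \rho_{ik} \kappa_{jk}$

$Y_{ij} \doteq \mu_0 + \sum_{k=1}^{K} (\mu_k + \alpha_{ik} + \beta_{jk}) \rho_{ik} \kappa_{jk}$

Subject to

$\sum_{i=1}^{n} \rho_{ik} \alpha_{ik} = 0 \quad \forall k$

$\sum_{j=1}^{p} \kappa_{jk} \beta_{jk} = 0 \quad \forall k$

Anova-lets, but without the orthogonality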
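
The sum-to-zero constraints can be imposed by centering the row (or column) effects over the members of a layer and absorbing the removed mean into the layer level. The sketch below assumes binary memberships; the helper name `center_within_layer` and the toy values are hypothetical.

```python
def center_within_layer(effects, membership):
    """Center effects over layer members so sum(membership * effects) == 0.

    `membership` is a 0/1 list. The removed mean is returned so the caller
    can add it to the layer level mu_k, leaving fitted values unchanged.
    """
    size = sum(membership)
    if size == 0:
        raise ValueError("layer has no members")
    mean = sum(m * a for m, a in zip(membership, effects)) / size
    centered = [(a - mean) if m else 0.0 for m, a in zip(membership, effects)]
    return centered, mean


alpha_raw = [1.0, 3.0, 5.0]  # toy gene effects
rho = [1, 1, 0]              # the third gene is outside the layer
alpha, shift = center_within_layer(alpha_raw, rho)
mu_k = 0.0 + shift           # absorb the removed mean into the layer level
print(alpha, mu_k)
```

Without such a constraint, a constant could be moved freely between the level and the effects, so the individual parameters would not be identifiable.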

Geometry of a layer

Include $\beta_{jk}$ but not $\alpha_{ik}$

Drop subscript $k$

Let [definitions illegible in source]

Like a cluster of genes around $\mu + \beta$

$\rho$: some, maybe not all, genes

$\kappa$: some, maybe not all, samples

$\mu + \beta_j$ gives an “expression pattern”

Importance of sample $j$ given by $\mu + \beta_j$

Adding layers: lets genes be in multiple clusters

Converse: get cluster of samples wrt some genes

More geometry

Consider $\mu + \alpha_i + \beta_j$

Genes $i$ cluster around a line through [illegible in source]

Samples $j$ cluster around a line through [illegible in source]

Most important/typical genes: large $\mu + \alpha_i$

Even Bigger models

1. Write $Y_{ij} \doteq \mu_0 + \sum_{k=1}^{K} \rho_{ik} \kappa_{jk} (\mu_k + \alpha_{ik} + \beta_{jk} + \gamma_k \alpha_{ik} \beta_{jk})$

2. Incorporates Tukey’s 1 df for non-additivity

3. Clusters genes around a more general line in $\mathbb{R}^p$

4. We can mix/match layer types.

5. We can replace the background $\mu_0$ by a model layer.

SVD and others

$Y_{ij} \doteq \sum_{k=1}^{K} \mu_k \rho_{ik} \kappa_{jk}$

Methods compared by their constraints on $\mu_k$, $\rho_{ik}$ and $\kappa_{jk}$: SVD, SDD, NND, VQ (two variants), Shave, ADDCL, Plaid. [Table entries illegible in source.]

Plaid replaces $\mu_k$ by a model

Algorithm

Seek small value of

$\sum_{i=1}^{n} \sum_{j=1}^{p} \Bigl( Y_{ij} - \sum_{k=0}^{K} \theta_{ijk} \rho_{ik} \kappa_{jk} \Bigr)^2$

Where

$\rho_{ik}, \kappa_{jk} \in \{0, 1\}$ and,

$\theta_{ijk} = \mu_k$, or $\mu_k + \alpha_{ik}$, or $\mu_k + \beta_{jk}$, or $\mu_k + \alpha_{ik} + \beta_{jk}$

1. Likely to be NP-hard ... even clustering is

2. We pick one layer at a time ...

3. ... using an interior point algorithm

4. Larger clusters are more attractive

5. Clusters near background not attractive
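
To make the criterion concrete, here is a small sketch that evaluates the sum of squared residuals for a given set of layers. The dict layout of a layer and the name `plaid_sse` are assumptions for illustration; the background is treated as a layer that every gene and sample belongs to.

```python
def plaid_sse(Y, layers):
    """Sum over i, j of (Y_ij - sum_k theta_ijk * rho_ik * kappa_jk)^2.

    Each layer is a dict with keys mu, alpha, beta, rho, kappa. A layer
    without row or column effects passes lists of zeros for alpha or beta.
    """
    total = 0.0
    for i, row in enumerate(Y):
        for j, y in enumerate(row):
            fit = sum(
                (L["mu"] + L["alpha"][i] + L["beta"][j]) * L["rho"][i] * L["kappa"][j]
                for L in layers
            )
            total += (y - fit) ** 2
    return total


Y = [[1.0, 1.0], [3.0, 1.0]]
zeros = [0.0, 0.0]
background = {"mu": 1.0, "alpha": zeros, "beta": zeros, "rho": [1, 1], "kappa": [1, 1]}
bump = {"mu": 2.0, "alpha": zeros, "beta": zeros, "rho": [0, 1], "kappa": [1, 0]}

sse_background = plaid_sse(Y, [background])
sse_both = plaid_sse(Y, [background, bump])
print(sse_background, sse_both)
```

Adding the second layer removes the one cell the background cannot explain, which is the sense in which a good layer produces a "small value" of the criterion.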

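Finding one layer

Residual:

$Z_{ij} = Y_{ij} - \sum_{k=0}^{K-1} \theta_{ijk} \rho_{ik} \kappa_{jk}$

Drop $K$ and write:

$Q = \frac{1}{2} \sum_{i} \sum_{j} (Z_{ij} - \theta_{ij} \rho_i \kappa_j)^2$

We want to minimize $Q$ over $\theta$, $\rho$, $\kappa$

1. Start with arbitrary $\rho_i, \kappa_j \in (0, 1)$

2. Update $\theta_{ij}$ given $\rho_i$ and $\kappa_j$

3. Update $\rho_i$ and $\kappa_j$ given $\theta_{ij}$

Alternate 2, 3 above, but:

1. Keep $\rho_i$, $\kappa_j$ away from 0 and 1 early on

2. Force $\rho_i$, $\kappa_j$ to 0 or 1 later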
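
The slide does not spell out how memberships are forced to 0 or 1. One possible schedule, given purely as an assumption for illustration, widens a forbidden band around 0.5 linearly with the step number until only 0 and 1 remain. The name `push_toward_binary` and the linear form are hypothetical.

```python
def push_toward_binary(values, s, S):
    """Move fuzzy memberships toward 0/1 at step s of S (1 <= s <= S).

    Values above 0.5 are raised to at least 0.5 + s/(2S); the rest are
    lowered to at most 0.5 - s/(2S). This linear schedule is an assumption.
    """
    hi = 0.5 + s / (2 * S)
    lo = 0.5 - s / (2 * S)
    return [max(v, hi) if v > 0.5 else min(v, lo) for v in values]


rho = [0.9, 0.6, 0.4, 0.1]
early = push_toward_binary(rho, s=1, S=4)  # mild push
final = push_toward_binary(rho, s=4, S=4)  # fully binary for values in [0, 1]
print(early, final)
```

In a full fit this step would be interleaved with the updates of the layer parameters and memberships, so that early iterations can still move a gene or sample in or out of the layer.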

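Fuzzy anova

Minimize:

$\frac{1}{2} \sum_{i} \sum_{j} \bigl( Z_{ij} - (\mu + \alpha_i + \beta_j) \rho_i \kappa_j \bigr)^2$

Subject to:

$0 = \sum_{i} \rho_i^2 \alpha_i = \sum_{j} \kappa_j^2 \beta_j$

By taking:

$\mu = \frac{\sum_i \sum_j \rho_i \kappa_j Z_{ij}}{(\sum_i \rho_i^2)(\sum_j \kappa_j^2)}$

$\alpha_i = \frac{\sum_j (Z_{ij} - \mu \rho_i \kappa_j) \kappa_j}{\rho_i \sum_j \kappa_j^2}$

$\beta_j = \frac{\sum_i (Z_{ij} - \mu \rho_i \kappa_j) \rho_i}{\kappa_j \sum_i \rho_i^2}$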
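
A minimal sketch of this update, assuming the closed forms $\mu = \sum_i\sum_j \rho_i\kappa_j Z_{ij} / (\sum_i\rho_i^2 \sum_j\kappa_j^2)$, $\alpha_i = \sum_j (Z_{ij} - \mu\rho_i\kappa_j)\kappa_j / (\rho_i \sum_j \kappa_j^2)$ and the analogous expression for $\beta_j$. The function name `update_theta` is hypothetical, and the code assumes every $\rho_i$ and $\kappa_j$ is nonzero, as is the case for fuzzy memberships strictly between 0 and 1.

```python
def update_theta(Z, rho, kappa):
    """Return (mu, alpha, beta) for fixed fuzzy memberships rho, kappa."""
    n, p = len(rho), len(kappa)
    r2 = sum(r * r for r in rho)
    k2 = sum(k * k for k in kappa)
    mu = sum(
        rho[i] * kappa[j] * Z[i][j] for i in range(n) for j in range(p)
    ) / (r2 * k2)
    alpha = [
        sum((Z[i][j] - mu * rho[i] * kappa[j]) * kappa[j] for j in range(p))
        / (rho[i] * k2)
        for i in range(n)
    ]
    beta = [
        sum((Z[i][j] - mu * rho[i] * kappa[j]) * rho[i] for i in range(n))
        / (kappa[j] * r2)
        for j in range(p)
    ]
    return mu, alpha, beta


# With all memberships equal to 1 this reduces to ordinary two-way anova.
Z = [[1.0, 2.0], [3.0, 4.0]]
mu, alpha, beta = update_theta(Z, rho=[1.0, 1.0], kappa=[1.0, 1.0])
print(mu, alpha, beta)
```

The toy case is a useful sanity check: with full membership the estimates are the grand mean and the centered row and column means.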

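Updating $\rho_i$ and $\kappa_j$

Minimize:

$\frac{1}{2} \sum_{i} \sum_{j} \bigl( Z_{ij} - (\mu + \alpha_i + \beta_j) \rho_i \kappa_j \bigr)^2$

Let:

$\theta_{ij} = \mu + \alpha_i + \beta_j$

$\rho_i = \frac{\sum_j \theta_{ij} \kappa_j Z_{ij}}{\sum_j \theta_{ij}^2 \kappa_j^2}$

$\kappa_j = \frac{\sum_i \theta_{ij} \rho_i Z_{ij}}{\sum_i \theta_{ij}^2 \rho_i^2}$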

Notes

1. The $\rho_i$ update only uses gene $i$’s data

2. The $\alpha_i$ update only uses gene $i$’s data

3. Avoids [illegible in source] costs

4. Similarly for $\kappa_j$, $\beta_j$ and [illegible in source]
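
The membership update is a separate least squares problem for each gene and each sample. The sketch below assumes the forms $\rho_i = \sum_j \theta_{ij}\kappa_j Z_{ij} / \sum_j \theta_{ij}^2\kappa_j^2$ and $\kappa_j = \sum_i \theta_{ij}\rho_i Z_{ij} / \sum_i \theta_{ij}^2\rho_i^2$, and it computes both from the previous values of the other set; that ordering is a choice made here, not something stated on the slide. The name `update_memberships` is hypothetical.

```python
def update_memberships(Z, theta, rho, kappa):
    """Return new (rho, kappa) given the layer values theta[i][j]."""
    n, p = len(rho), len(kappa)
    new_rho = [
        # Row i only needs row i of Z and theta.
        sum(theta[i][j] * kappa[j] * Z[i][j] for j in range(p))
        / sum((theta[i][j] * kappa[j]) ** 2 for j in range(p))
        for i in range(n)
    ]
    new_kappa = [
        # Column j only needs column j of Z and theta.
        sum(theta[i][j] * rho[i] * Z[i][j] for i in range(n))
        / sum((theta[i][j] * rho[i]) ** 2 for i in range(n))
        for j in range(p)
    ]
    return new_rho, new_kappa


Z = [[2.0, 2.0], [0.0, 0.0]]      # only the first gene shows the effect
theta = [[2.0, 2.0], [2.0, 2.0]]  # constant layer, mu = 2
new_rho, new_kappa = update_memberships(Z, theta, rho=[1.0, 1.0], kappa=[1.0, 1.0])
print(new_rho, new_kappa)
```

In the toy data the second gene has no signal, so its membership drops to 0 after one update while the first gene keeps membership 1.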

Some details

Starting values: SVD finds a $\mu$-only plaid layer. Rescale singular vectors to start $\rho$ and $\kappa$.

Backfitting: Given $\rho_{ik}, \kappa_{jk} \in \{0, 1\}$ for $k = 1, \dots, K$ it is cheap to re-estimate all the $\theta_{ijk}$.

Choosing K: Permute row contents, then columns. Stop if the algorithm finds more structure in the permuted data. Negative binomial regularization.

Stepping: Use $S$ steps to get $\rho_i$, $\kappa_j$ into $\{0, 1\}$.

Unisign: We may want a common sign for the effects within a layer.

Robustness: Inspect each newly found layer: release any rows or columns not well explained.
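
One reading of "permute row contents, then columns" is to shuffle the entries within each row independently and then shuffle the entries within each column. The sketch below implements that reading only; the function name, the `seed` parameter, and the interpretation itself are assumptions.

```python
import random


def permute_rows_then_columns(Z, seed=0):
    """Shuffle entries within each row, then within each column."""
    rng = random.Random(seed)
    rows = [rng.sample(row, len(row)) for row in Z]
    # zip(*rows) yields the columns; shuffle each one independently.
    cols = [rng.sample(col, len(col)) for col in zip(*rows)]
    # Transpose back so the result has the same shape as Z.
    return [list(row) for row in zip(*cols)]


Z = [[1, 2, 3], [4, 5, 6]]
out = permute_rows_then_columns(Z, seed=1)
print(out)
```

The permuted table keeps the same values but has no real row-by-column structure, so a layer found in it that looks as strong as the candidate layer in the real data is a signal to stop adding layers.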

Food data

$n$ = [illegible in source] foods

$p = 6$ measures:

1. Fat proportion

2. Saturated fat proportion

3. Calories per gram

4. Cholesterol proportion

5. Protein proportion

6. Carbohydrate proportion

For each column: subtract mean, divide by st.dev.

Source:

http://www.ntwrks.com/~mikev/chart1.html
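
The column scaling can be sketched as follows. The slide does not say whether the standard deviation uses the $n$ or $n - 1$ divisor; the sample version (`statistics.stdev`) is assumed here, and the function name and toy table are illustrative.

```python
import statistics


def standardize_columns(X):
    """Subtract each column's mean and divide by its sample st.dev."""
    cols = list(zip(*X))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]  # n - 1 divisor (assumption)
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in X]


X = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]  # toy table: 3 foods, 2 measures
scaled = standardize_columns(X)
print(scaled)
```

Scaling puts measures with very different units, such as calories per gram and proportions, on a comparable footing before layers are sought.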

Yeast data

Name            Samples
Alpha           1–18
Elutriation     19–32
CDC             33–47
Sporulation     48–53
Sporulation-5   54–56
Sporulation-    57–58
Heat Shock      59–64
DTT             65–68
Cold            69–72
Diauxic Shift   73–79

Eisen, Spellman, Brown, Botstein: PNAS (1998)
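
When interpreting a layer it helps to map the sample indices it selects back to experiments. The lookup below copies the ranges from the table above, including the truncated name "Sporulation-" exactly as it appears; the names `EXPERIMENTS` and `experiment_of` are hypothetical.

```python
# (name, first sample, last sample), copied from the yeast data table.
EXPERIMENTS = [
    ("Alpha", 1, 18),
    ("Elutriation", 19, 32),
    ("CDC", 33, 47),
    ("Sporulation", 48, 53),
    ("Sporulation-5", 54, 56),
    ("Sporulation-", 57, 58),  # name is truncated in the source
    ("Heat Shock", 59, 64),
    ("DTT", 65, 68),
    ("Cold", 69, 72),
    ("Diauxic Shift", 73, 79),
]


def experiment_of(sample):
    """Return the experiment name for a 1-based sample index."""
    for name, first, last in EXPERIMENTS:
        if first <= sample <= last:
            return name
    raise ValueError(f"sample index out of range: {sample}")


layer_samples = [1, 33, 79]  # toy selection of samples in a layer
names = [experiment_of(s) for s in layer_samples]
print(names)
```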

Data Analysis

• Analyze log expression

• Few missing values: imputed by additive model

• Background Layer

– Full model $\mu + \alpha_i + \beta_j$

– All genes ($\rho_i = 1$)

– All samples ($\kappa_j = 1$)

• Mine the interaction, with up to [illegible in source] layers

– unisign

– 50% threshold

– [illegible in source] permutations per round

• Search stopped at [illegible in source] layers: the last one found had zero genes

Future directions

• Refine existing algorithm

• Explore information retrieval applications

• Explore recommender system applications

• Find less greedy version

• Incorporate predictors

• Extend to higher way tables of data

• Larger data sets (99% missing)

• Use covariates

• Are there “plaid-lets”?

Code

Available for academic research

www-stat.stanford.edu/~owen/clickwrap/plaid.html

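Refinements

Replace

minimize $\frac{1}{2} \sum_{i} \sum_{j} \bigl( Z_{ij} - (\mu + \alpha_i + \beta_j) \rho_i \kappa_j \bigr)^2$

s.t. $0 = \sum_{i} \rho_i^2 \alpha_i = \sum_{j} \kappa_j^2 \beta_j$, with $\rho_i, \kappa_j \in \mathbb{R}$

By:

minimize [criterion illegible in source]

s.t. $0 = \sum_{i} \rho_i \alpha_i = \sum_{j} \kappa_j \beta_j$, with $\rho_i, \kappa_j \in \{0, 1\}$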

Updates become:

Model parts

[update formulas for $\mu$, $\alpha_i$ and $\beta_j$ illegible in source]

Memberships

[rules setting $\rho_i = 1$ and $\kappa_j = 1$ illegible in source]

So far:

1. Seems to find slightly better layers

2. Harder to frame multi-layer model

Another refinement

Optimize “Mean Square” instead of “Sum of Squares”

Gets smaller more intense layers

Individual layers more interpretable

But more of them required

Requires a tradeoff of intensity vs sum of squares
