Refined blood-borne miRNome of human diseases via PCA-based feature extraction

Posted on 16-Jul-2015


Y-h. Taguchi, Department of Physics, Chuo University

Yoshiki Murakami, Center for Genomic Medicine, Kyoto University

Caution:

The main results of the collaboration with Prof. Murakami are based upon his own experiments (*), but those results are related to a planned patent proposal. Therefore, we present here our methods applied to alternative public data.

(*) To be submitted to Journal of Hepatology

1. The concept of PCA based feature extraction

2. What is miRNA? (will be skipped)

3. Previous Work (Dry + Wet)

4. Proposed method + Results

5. Summary & Conclusion

1. The concept of PCA based feature extraction

Why feature extraction?

・Avoiding overfitting

・Need for experimental validation: too many genes/proteins cannot all be tested.

・Several methods require fewer state variables than observations.

One of the problems: feature extraction itself rarely passes the cross-validation test.
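A minimal illustration of the third point (our own sketch with hypothetical random data, not from the slides): methods such as LDA invert a covariance matrix estimated from the observations, and when there are fewer observations than state variables that estimate is singular.

```python
import numpy as np

# Hypothetical sizes: 10 observations (samples) of 50 state variables (e.g. miRNAs).
rng = np.random.default_rng(0)
n_samples, n_features = 10, 50
X = rng.normal(size=(n_samples, n_features))

# Sample covariance matrix of the variables (50 x 50), as estimated for LDA-type methods.
cov = np.cov(X, rowvar=False)

# Centering costs one degree of freedom, so the rank is at most n_samples - 1 (< n_features):
# the matrix is singular and cannot be inverted.
rank = np.linalg.matrix_rank(cov)
print(cov.shape, rank)
```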

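[Figure: Conventional vs. proposed workflow. Conventional: the samples (Group1, Group2, Group3) are divided into a training set and a test set; feature extraction and model construction are performed on the training set, followed by validation on the test set. Proposed: feature extraction is performed without knowledge about the classification/target variable, so it does not depend on the training/test division; only model construction uses the training set, followed by validation on the test set.]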

2. What is miRNA?

miRNA is a kind of non-coding RNA. miRNAs are believed to suppress target gene expression by degradation of mRNAs. Important features:

・Typically, hundreds of kinds of miRNAs are found in each species (ca. 1000 or more for human).

・Each miRNA targets hundreds of genes or more.

・miRNAs mainly contribute to cell-type changes (e.g., cancer, differentiation, diseases).

・The influence of a miRNA on target gene expression is subtle (~10%) and context dependent.

・In spite of that, miRNAs contribute critically to the related processes (e.g., induction of cell cycle arrest).

3. Previous Work (Dry + Wet)

Toward the blood-borne miRNome of human diseases, A. Keller et al., Nature Methods (2011).

Discrimination between diseases using miRNA in blood

Feature (miRNA) selection: P-value (t-test)

Discrimination: SVC (support vector classifier) with several types of kernels + grid-based optimal parameter search

cf. Nature Methods, 10 miRNAs: < 0.7
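The t-test selection step can be sketched as below. This is a minimal sketch, assuming an unpaired two-sample Student t-test with pooled variance (the slides do not say which variant Keller et al. used); `ttest_top_k` and the data are hypothetical, and the SVC grid search is not reproduced.

```python
import numpy as np

def ttest_top_k(X, y, k=10):
    """Return indices of the k features with the smallest t-test P-values.

    X: (n_samples, n_features) expression matrix; y: array of 0/1 class labels.
    Assumes an unpaired two-sample Student t-test with pooled variance: every
    feature then has the same degrees of freedom, so sorting by |t| is the
    same as sorting by P-value.
    """
    a, b = X[y == 0], X[y == 1]
    n1, n2 = len(a), len(b)
    sp2 = ((n1 - 1) * a.var(axis=0, ddof=1) + (n2 - 1) * b.var(axis=0, ddof=1)) / (n1 + n2 - 2)
    t = (a.mean(axis=0) - b.mean(axis=0)) / np.sqrt(sp2 * (1 / n1 + 1 / n2))
    return np.argsort(-np.abs(t))[:k]

# Hypothetical data: 40 samples x 200 miRNAs; miRNAs 0-4 are shifted in class 1.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 200))
y = np.repeat([0, 1], 20)
X[y == 1, :5] += 3.0
selected = ttest_top_k(X, y, k=10)
print(sorted(selected[:5].tolist()))
```

Because the labels `y` enter the selection, the conventional scheme has to repeat this step inside every training fold.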

4. Proposed method + Results

Data → PCA → Feature selection (without classification information) → LDA

[Figure: PCA of samples (diseases/cancers); ◯ control, △ lung cancer]

Feature extraction (miRNAs)

PCA (miRNAs) → 10 outlier miRNAs

Why outliers? They make the main contribution to the PCA embeddings of the samples.

Why 10? To compare with the results of the Nature Methods paper.
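The outlier-based selection can be sketched as follows. This is a minimal sketch under our own assumptions: miRNAs are the rows (the objects embedded by PCA), each sample column is centered, and "outlier" means farthest from the origin in the first two PCs. The number of PCs, the distance criterion, and the name `pca_outlier_mirnas` are hypothetical, not taken from the slides. Note that no class labels are used.

```python
import numpy as np

def pca_outlier_mirnas(X, n_select=10, n_pcs=2):
    """Select miRNAs that are outliers in PC space, without any class labels.

    X: (n_mirnas, n_samples) expression matrix with miRNAs as rows, i.e. the
    miRNAs are the objects embedded by PCA. Returns the indices of the
    n_select miRNAs farthest from the origin in the first n_pcs PCs.
    """
    Xc = X - X.mean(axis=0)                 # center each sample (column) over miRNAs
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_pcs] * S[:n_pcs]       # PC scores of each miRNA
    dist = np.linalg.norm(scores, axis=1)   # distance from the origin
    return np.argsort(-dist)[:n_select]

# Hypothetical data: 200 miRNAs x 30 samples; miRNAs 0-9 are highly expressed in all samples.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 30))
X[:10] += 10.0
selected = pca_outlier_mirnas(X)
print(sorted(selected.tolist()))
```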

PCA, again (samples after feature extraction)

[Figure: PCA of samples (diseases/cancers) after feature extraction; ◯ control, △ lung cancer]
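The re-applied PCA and the LDA on its leading PCs can be sketched as below. The first 5 PCs follow the slides; everything else is our assumption: a two-class Fisher LDA with pooled covariance, leave-one-out cross-validation, hypothetical data, and hypothetical helper names. PCA is label-free, so it is computed once on all samples; only the LDA is refit in each fold.

```python
import numpy as np

def pc_scores(X, n_pcs=5):
    """PCA of samples: X is (n_samples, n_selected_mirnas); returns the first n_pcs PC scores."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_pcs] * S[:n_pcs]

def lda_fit(Z, y):
    """Two-class Fisher LDA: pooled covariance, threshold shifted by the log class-prior ratio."""
    m0, m1 = Z[y == 0].mean(axis=0), Z[y == 1].mean(axis=0)
    n0, n1 = np.sum(y == 0), np.sum(y == 1)
    S = ((n0 - 1) * np.cov(Z[y == 0], rowvar=False)
         + (n1 - 1) * np.cov(Z[y == 1], rowvar=False)) / (n0 + n1 - 2)
    w = np.linalg.solve(S, m1 - m0)
    b = -w @ (m0 + m1) / 2 + np.log(n1 / n0)
    return w, b

def loo_confusion(Z, y):
    """Leave-one-out LDA; returns a 2x2 matrix with rows = prediction, columns = actual."""
    cm = np.zeros((2, 2), dtype=int)
    for i in range(len(y)):
        train = np.arange(len(y)) != i
        w, b = lda_fit(Z[train], y[train])
        pred = int(Z[i] @ w + b > 0)   # 1 = lung cancer, 0 = control
        cm[pred, y[i]] += 1
    return cm

# Hypothetical data: 60 samples x 10 selected miRNAs, class 1 shifted.
rng = np.random.default_rng(3)
y = np.repeat([0, 1], 30)
X = rng.normal(size=(60, 10))
X[y == 1] += 1.5
Z = pc_scores(X, n_pcs=5)   # label-free, computed once on all samples
cm = loo_confusion(Z, y)
print(cm)
```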

Control vs. lung cancer: LDA with PCA (after feature extraction, up to the 5th PC)

                        Actual
Prediction        control   lung cancer
control              56          8
lung cancer          14         24

Accuracy 0.784, Specificity 0.800, Sensitivity 0.750, Precision 0.632
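The four reported values follow from the confusion matrix when rows are read as predictions, columns as actual classes, and lung cancer is the positive class (e.g., accuracy = (56 + 24)/102). A short check, with a hypothetical helper `metrics`:

```python
def metrics(cm):
    """cm[pred][actual], class order (control, lung cancer); lung cancer is the positive class."""
    tn, fn = cm[0]   # predicted control: actual control, actual lung cancer
    fp, tp = cm[1]   # predicted lung cancer: actual control, actual lung cancer
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "specificity": tn / (tn + fp),
        "sensitivity": tp / (tp + fn),
        "precision": tp / (tp + fp),
    }

cm = [[56, 8], [14, 24]]
for name, value in metrics(cm).items():
    print(f"{name}: {value:.3f}")
# accuracy: 0.784
# specificity: 0.800
# sensitivity: 0.750
# precision: 0.632
```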

cf. Nature Methods:

250 miRNAs: 0.813 0.844 0.781 (relatively best)

150 miRNAs: 0.867 0.867 0.844 (relatively worst)

(+)/(-): comparison with the 10-miRNA results in Nature Methods

>0.70

Selected miRNAs, diseases/cancers vs. normal. (+)/(-): up-/downregulated after the transformation by PCA+LDA; (*): not selected independently of diseases/cancers

Advantages of the proposed method

・No classification information is needed for feature selection.

・Feature selection is independent of the training/test set division (thus, stable).
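The stability point can be illustrated with a small sketch on hypothetical data: a supervised t-test selection recomputed on each training fold tends to pick different miRNA sets, whereas a label-free selection is computed once from all samples and is therefore shared by every fold. Both selectors below are our own simplified stand-ins (pooled-variance t-test; distance from the origin in the first two PCs), not the authors' exact procedures.

```python
import numpy as np

def ttest_top_k(X, y, k=10):
    """Supervised: top-k features by |t| (pooled-variance two-sample t-test); uses labels y."""
    a, b = X[y == 0], X[y == 1]
    n1, n2 = len(a), len(b)
    sp2 = ((n1 - 1) * a.var(axis=0, ddof=1) + (n2 - 1) * b.var(axis=0, ddof=1)) / (n1 + n2 - 2)
    t = (a.mean(axis=0) - b.mean(axis=0)) / np.sqrt(sp2 * (1 / n1 + 1 / n2))
    return frozenset(np.argsort(-np.abs(t))[:k].tolist())

def pca_top_k(X, k=10, n_pcs=2):
    """Label-free: top-k features (columns of X) farthest from the origin in PC space."""
    F = X.T - X.T.mean(axis=0)   # features as rows; center each sample column
    U, S, _ = np.linalg.svd(F, full_matrices=False)
    dist = np.linalg.norm(U[:, :n_pcs] * S[:n_pcs], axis=1)
    return frozenset(np.argsort(-dist)[:k].tolist())

# Hypothetical data: 50 samples x 200 miRNAs with a weak signal in miRNAs 0-2.
rng = np.random.default_rng(4)
X = rng.normal(size=(50, 200))
y = np.repeat([0, 1], 25)
X[y == 1, :3] += 0.5

idx = np.arange(len(y))
folds = np.array_split(rng.permutation(idx), 5)

# Conventional: the supervised selection is redone on each training fold.
ttest_sets = set()
for test in folds:
    train = np.setdiff1d(idx, test)
    ttest_sets.add(ttest_top_k(X[train], y[train]))

# Proposed: the label-free selection is computed once from all samples (no y needed),
# so by construction every fold uses this same set.
pca_set = pca_top_k(X)

print(len(ttest_sets), "distinct t-test selections across 5 folds; 1 label-free selection")
```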

5. Summary & Conclusion