Small data: practical modeling issues in human-model -omic data
Defense for the degree of Ph.D.
Einar Holsbø
February 8th, 2019
Act I: “Boy Bitten by a Lizard” (1590s)
–Eiliv Lund, 4.5 years ago, quote made up
Can we predict breast cancer metastasis from blood samples?
Metastasis is the spread of cancer in the body
[Figure: five-year survival probability for various cancers by stage at diagnosis (Local, Regional, Distant), y-axis from 0.0 to 1.0; female breast cancer is highlighted.]
Data source: Siegel, R. L., Miller, K. D. and Jemal, A. (2017), Cancer statistics, 2017. CA: A Cancer Journal for Clinicians, 67: 7-30. doi:10.3322/caac.21387
Goal: predict it, win the Nobel prize 🏅
Norwegian Women and Cancer
• Prospective population-based cohort that tracks 34% (170 000) of all Norwegian women born between 1943 and 1957.
• Data collection in NOWAC started in 1991. It includes blood samples from 50 000 women, as well as more than 300 biopsies.
• Now contains various -omics material: microarray mRNA, miRNA, methylation, metabolomics, and RNA-seq.
Prospective
[Diagram: enrollment, then follow-up along a time axis (Time →)]
Nested case–control
[Diagram: participants grouped into case–control pairs (cc-pairs)]
Prospective design is nice because recruitment is blinded to outcome and exposure
Low bias
Gene expression
[Diagram: DNA (AT GC CG TA TA CG …) → mRNA (U C G A A G …) → some useful protein]
[Diagram: mRNA (U C G A A G) → 💡]
How much light do we see?
Data at a glance
dim(gene_expression)
## [1]    88 12404
summary(days_to_diagnosis)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##     6.0   117.8   189.5   186.8   269.2   358.0
summary(metastasis)
## FALSE  TRUE
##    66    22
table(metastasis, stratum)
##           stratum
## metastasis screening interval clinical
##      FALSE        43       10       13
##      TRUE          6        6       10
These are “small data” & we should be careful with them
A computer scientist’s guide to precision medicine
• Step 1: pick some models
• Step 2: pick some scoring rules/performance metrics
• Step 3: “classification”
Scoring rule examples (aka. loss functions, aka. metrics)
• Accuracy: what fraction of all predictions did we get right?
• Precision: what fraction of our “success” predictions were correct?
• Recall: what fraction of the true successes did we detect?
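The three scores in the list above can be sketched in a few lines. This is only an illustration: the function name `threshold_scores`, the toy outcomes and the toy forecasts are made up here, and "success" is predicted whenever the forecast probability exceeds a chosen threshold.

```python
def threshold_scores(y_true, p_hat, threshold=0.5):
    """Accuracy, precision and recall when we predict "success" if p_hat > threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, p_hat):
        predicted_success = p > threshold
        if predicted_success and y == 1:
            tp += 1  # predicted success, was a success
        elif predicted_success:
            fp += 1  # predicted success, was a failure
        elif y == 1:
            fn += 1  # predicted failure, was a success
        else:
            tn += 1  # predicted failure, was a failure
    n = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / n,
        # Precision is undefined if we never predict success.
        "precision": tp / (tp + fp) if tp + fp > 0 else float("nan"),
        # Recall is undefined if there are no true successes.
        "recall": tp / (tp + fn) if tp + fn > 0 else float("nan"),
    }


# Hypothetical toy data: 3 successes, 5 failures.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
p_hat = [0.9, 0.6, 0.3, 0.4, 0.2, 0.7, 0.1, 0.05]

at_half = threshold_scores(y_true, p_hat, threshold=0.5)
at_quarter = threshold_scores(y_true, p_hat, threshold=0.25)
print(at_half)
print(at_quarter)
```

On this toy example the accuracy is the same at both thresholds, while precision goes down and recall goes up when the threshold is lowered. All three scores depend on where the threshold is put.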
p > .5? something else?
Decoupling score and decision threshold
• AUC: the probability of ranking a success higher than a failure (aka. concordance probability)
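Read as a concordance probability, AUC can be computed by brute force over all (success, failure) pairs. The sketch below assumes that ties count as half a concordant pair, which is a common convention but not something the slides specify; the function name `concordance` and the toy data are illustrative.

```python
def concordance(y_true, p_hat):
    """Fraction of (success, failure) pairs where the success gets the higher forecast."""
    successes = [p for y, p in zip(y_true, p_hat) if y == 1]
    failures = [p for y, p in zip(y_true, p_hat) if y == 0]
    if not successes or not failures:
        raise ValueError("need at least one success and one failure")
    concordant = 0.0
    for ps in successes:
        for pf in failures:
            if ps > pf:
                concordant += 1.0
            elif ps == pf:
                concordant += 0.5  # assumption: a tie counts as half
    return concordant / (len(successes) * len(failures))


# Hypothetical toy data: 3 successes, 5 failures.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
p_hat = [0.9, 0.6, 0.3, 0.4, 0.2, 0.7, 0.1, 0.05]

auc = concordance(y_true, p_hat)
auc_squared = concordance(y_true, [p * p for p in p_hat])  # same ranking
auc_constant = concordance(y_true, [0.25] * len(y_true))   # no ranking at all
print(auc, auc_squared, auc_constant)
```

No threshold appears anywhere: only the ranking of the forecasts matters, so any order-preserving transformation of the forecasts gives the same AUC, and a constant forecast gives .5.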
Just trying some methods & scores
$\log \frac{p}{1 - p} = \beta_0 + \beta_1 x_1 + \dots + \beta_d x_d$
“lasso”: $\sum_i |\beta_i| \le t$
“ridge”: $\sum_i \beta_i^2 \le t$
“lasso” + “ridge” = “ElasticNet”: $\sum_i \left[ \alpha \beta_i^2 + (1 - \alpha) |\beta_i| \right] \le t$
Figures from Hastie, Tibshirani, and Friedman: The Elements of Statistical Learning
Tradeoff between penalty types, controls “roundness”
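A minimal sketch of the ElasticNet constraint as written on this slide, with alpha weighting the squared term and (1 − alpha) the absolute-value term. The function name and the coefficient vector are made up for illustration. Note that some software uses the reverse convention; glmnet, for instance, puts alpha on the absolute-value term, so that alpha = 1 is the lasso there.

```python
def elastic_net_penalty(beta, alpha):
    """Sum over coefficients of alpha * b^2 + (1 - alpha) * |b| (the slide's form)."""
    return sum(alpha * b * b + (1.0 - alpha) * abs(b) for b in beta)


# Hypothetical coefficient vector, intercept excluded.
beta = [0.5, -2.0, 0.0]

pure_absolute = elastic_net_penalty(beta, alpha=0.0)  # sum of |b|, the "lasso" term
pure_squared = elastic_net_penalty(beta, alpha=1.0)   # sum of b^2, the "ridge" term
half_half = elastic_net_penalty(beta, alpha=0.5)      # halfway between the two
print(pure_absolute, pure_squared, half_half)
```

The mixed penalty is a plain weighted average of the two pure penalties, which is one way to read "tradeoff between penalty types".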
Trying different alphas
[Figure: three panels of AUC against log(Lambda), binomial family. Panel titles: “ElasticNet, alpha=0.5”, “Lasso”, “Ridge”. Top-axis counts run from 111 to 10 (ElasticNet), from 35 to 0 (Lasso), and stay at 12295 throughout (Ridge). Panel labels: Alpha = 1, Alpha = 0, Alpha = .5. Annotated AUC values in the final build: .7, .8, .5.]
Figures show concordance (higher is better)
Finding the “best” parameter alpha by cross-validation
[Figure: best AUC for varying alpha; x-axis alpha from 0.0 to 1.0, y-axis AUC from 0.5 to 0.8]
???????????? (this is the lizard)
❧ intermission ☙
Act II: When you are engulfed in flames
Finding the “best” parameter alpha by cross-validation
[Figure repeated: best AUC for varying alpha; x-axis alpha, y-axis AUC]
Some “technical” sources of variation
• The big classic one: sample size
• Scoring rule
• Validation procedure
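The first item on the list, sample size, can be illustrated with a small simulation. Everything here is an assumption made for the sketch: forecasts are drawn from two unit-variance normal distributions half a standard deviation apart, the small sample borrows the class counts from "Data at a glance" (22 and 66), and the large sample is simply five times bigger.

```python
import random
import statistics


def concordance(successes, failures):
    """Fraction of (success, failure) pairs ranked correctly; ties count half."""
    concordant = 0.0
    for ps in successes:
        for pf in failures:
            if ps > pf:
                concordant += 1.0
            elif ps == pf:
                concordant += 0.5
    return concordant / (len(successes) * len(failures))


def auc_mean_and_sd(n_success, n_failure, shift, n_rep, seed):
    """Mean and standard deviation of AUC over n_rep simulated data sets."""
    rng = random.Random(seed)
    aucs = []
    for _ in range(n_rep):
        successes = [rng.gauss(shift, 1.0) for _ in range(n_success)]
        failures = [rng.gauss(0.0, 1.0) for _ in range(n_failure)]
        aucs.append(concordance(successes, failures))
    return statistics.mean(aucs), statistics.stdev(aucs)


mean_small, sd_small = auc_mean_and_sd(22, 66, shift=0.5, n_rep=100, seed=1)
mean_large, sd_large = auc_mean_and_sd(110, 330, shift=0.5, n_rep=100, seed=1)
print(f"n = 88:  AUC {mean_small:.2f}, sd {sd_small:.3f}")
print(f"n = 440: AUC {mean_large:.2f}, sd {sd_large:.3f}")
```

Both settings estimate the same underlying concordance, but the spread between repeated small data sets is noticeably wider than between repeated large ones.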
Small data: sample size is more or less fixed in the human model
[Figure: typical sample sizes in transcriptomics; axis ticks at 4, 9, 21, 56, 176, 614, 3372, 18736; n = 1178]
Ethics, economy, logistics limit access to human obs.
Yet another scoring rule
The Brier score is the mean squared error of predicted probabilities
$n^{-1} \sum_i (\hat{p}_i - p_i)^2$
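A minimal sketch of the Brier score. The assumption here is that the observed 0/1 outcome stands in for $p_i$; the function name and the toy forecasts are illustrative. The second example uses the class counts from "Data at a glance" (22 metastasis cases out of 88) and forecasts the overall rate for everyone.

```python
def brier_score(y_true, p_hat):
    """Mean squared difference between forecast probability and 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_hat)) / len(y_true)


# Hypothetical toy forecasts for four observations.
toy = brier_score([1, 0, 0, 1], [0.9, 0.2, 0.1, 0.6])

# Constant forecast of the overall rate, 22 / 88 = 0.25, for every observation.
outcomes = [1] * 22 + [0] * 66
constant_forecast = brier_score(outcomes, [0.25] * len(outcomes))
print(toy, constant_forecast)
```

The constant forecast comes out at 0.1875, which shows that a model with no ability to tell cases apart can still get a respectable Brier score.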
Some risk surfaces
$\log \frac{p}{1 - p} = 1 + x, \quad x \sim U[-6, 6]$
(risk = expected loss)
[Figure: risk surfaces over intercept (x-axis) and slope (y-axis), both from −0.5 to 2.0, drawn as contour plots for three scoring rules in turn: Brier, Accuracy, Concordance. A ● marks a single point in each panel.]
Brighter is better
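A risk surface of this kind can be sketched for the Brier score without any simulation. Under the stated model, logit p = 1 + x with x uniform on [−6, 6], the expected squared error of a forecast q is p(1 − q)² + (1 − p)q², and the average over x can be approximated on a fine grid. The grid sizes and function names below are choices made for this sketch, not taken from the slides.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def brier_risk(intercept, slope, n_grid=2000):
    """Expected Brier loss of logit q = intercept + slope * x under the true model."""
    total = 0.0
    for i in range(n_grid):
        x = -6.0 + 12.0 * (i + 0.5) / n_grid  # midpoint rule over U[-6, 6]
        p = sigmoid(1.0 + x)                  # true success probability
        q = sigmoid(intercept + slope * x)    # candidate forecast
        total += p * (1.0 - q) ** 2 + (1.0 - p) * q ** 2
    return total / n_grid


# Coarse grid of candidate parameters: -0.5, 0.0, ..., 2.0 on both axes.
values = [i * 0.5 - 0.5 for i in range(6)]
risks = {(a, b): brier_risk(a, b) for a in values for b in values}
best = min(risks, key=risks.get)
print(best, risks[best])
```

The lowest risk on the grid sits at the true parameters (intercept 1, slope 1), as it should for a scoring rule whose expected loss is minimized by forecasting the true probability.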
Validation
• Holdout data
• Cross-validation
• Repeat CV
• The Bootstrap
Holdout data
i) Fit model
ii) Calculate score
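The two steps can be sketched as follows. The "model" is a deliberately trivial stand-in (forecast the training-set rate for everyone), the outcomes reuse the class counts from "Data at a glance", and the 25% holdout fraction and the seed are arbitrary choices for this sketch.

```python
import random


def holdout_split(n, test_fraction, seed):
    """Shuffle the indices 0..n-1 and set aside a fraction of them for testing."""
    rng = random.Random(seed)
    indices = list(range(n))
    rng.shuffle(indices)
    n_test = round(n * test_fraction)
    return indices[n_test:], indices[:n_test]  # (train, test)


def brier_score(y_true, p_hat):
    return sum((p - y) ** 2 for y, p in zip(y_true, p_hat)) / len(y_true)


y = [1] * 22 + [0] * 66
train_idx, test_idx = holdout_split(len(y), test_fraction=0.25, seed=2019)

# i) Fit model: here just the success rate in the training part.
base_rate = sum(y[i] for i in train_idx) / len(train_idx)

# ii) Calculate score on the part the model has not seen.
y_test = [y[i] for i in test_idx]
score = brier_score(y_test, [base_rate] * len(y_test))
print(len(train_idx), len(test_idx), score)
```

The score rests on the held-out quarter alone, and a different seed gives a different split and a different score.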
Cross validation
i) Fit model
ii) Score
iii) Fit model
iv) Score
&c., &c.
xi) Summarize by mean, sd
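The fit/score/summarize loop above can be sketched like this. As a stand-in for a real model, each fit is just the success rate of the training folds, scored by Brier score on the held-out fold; the outcomes reuse the 22/66 class counts, and k = 5 is an arbitrary choice for the sketch.

```python
import random
import statistics


def kfold_indices(n, k, seed):
    """Shuffle the indices 0..n-1 and deal them into k folds of near-equal size."""
    rng = random.Random(seed)
    indices = list(range(n))
    rng.shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(y, k, seed):
    """Return the per-fold Brier scores of a rate-only stand-in model."""
    scores = []
    for held_out in kfold_indices(len(y), k, seed):
        held = set(held_out)
        train = [y[i] for i in range(len(y)) if i not in held]
        test = [y[i] for i in held_out]
        rate = sum(train) / len(train)                                 # fit model
        scores.append(sum((rate - t) ** 2 for t in test) / len(test))  # score
    return scores


y = [1] * 22 + [0] * 66
folds = kfold_indices(len(y), k=5, seed=2019)
fold_scores = cross_validate(y, k=5, seed=2019)

# Summarize by mean, sd
cv_mean = statistics.mean(fold_scores)
cv_sd = statistics.stdev(fold_scores)
print(cv_mean, cv_sd)
```

Every observation is held out exactly once, so all of the data contribute to the score, unlike with a single holdout split.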
Repeated cross validation
It’s exactly what you’d expect
Bootstrap
[Diagram: repeated resamples, &c., &c., &c.]
$\hat{F} \sim F$
$\hat{F}^* \sim \hat{F}$
$T(\hat{F}^*, \hat{F}) \sim T(\hat{F}, F)$
“The bootstrap principle”
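A minimal sketch of the principle in use: resample the observed data with replacement, recompute the statistic each time, and take the spread of the replicates as a stand-in for the spread of the statistic itself. The statistic here is a plain proportion on outcomes with the 22/66 class counts, chosen because its standard error has a textbook formula to compare against; the function name and the number of resamples are assumptions of the sketch.

```python
import math
import random
import statistics


def bootstrap_se(data, statistic, n_boot, seed):
    """Standard deviation of the statistic over n_boot resamples with replacement."""
    rng = random.Random(seed)
    n = len(data)
    replicates = []
    for _ in range(n_boot):
        resample = [data[rng.randrange(n)] for _ in range(n)]  # a draw from F-hat
        replicates.append(statistic(resample))
    return statistics.stdev(replicates)


y = [1] * 22 + [0] * 66
se_bootstrap = bootstrap_se(y, statistics.mean, n_boot=2000, seed=2019)

# Textbook standard error of a proportion, for comparison.
rate = sum(y) / len(y)
se_formula = math.sqrt(rate * (1.0 - rate) / len(y))
print(se_bootstrap, se_formula)
```

The appeal is that the same recipe works when `statistic` is something without a handy formula, such as a cross-validated score.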
Relative efficiency of two estimators
For two estimators, $T_1$, $T_2$, of the same quantity:
$\frac{\mathrm{Var}(T_1)}{\mathrm{Var}(T_2)}$
All else being equal, pick the less variable one
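The variance ratio can be estimated by simulation. The sketch below uses a classic textbook pair rather than anything from the slides: the sample median as $T_1$ and the sample mean as $T_2$, both estimating the center of a standard normal distribution. Sample size, number of repetitions and seed are arbitrary.

```python
import random
import statistics


def relative_efficiency(n, n_rep, seed):
    """Estimate Var(T1) / Var(T2) with T1 = sample median, T2 = sample mean."""
    rng = random.Random(seed)
    medians, means = [], []
    for _ in range(n_rep):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        medians.append(statistics.median(sample))
        means.append(statistics.mean(sample))
    return statistics.variance(medians) / statistics.variance(means)


ratio = relative_efficiency(n=25, n_rep=2000, seed=2019)
print(ratio)
```

A ratio above 1 means $T_2$ is the less variable estimator, and it can be read as roughly how many times more observations $T_1$ would need to match it.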
Relative efficiency of two estimators
[Figure: density of error estimates, p=2, k=2; x-axis from 0.10 to 0.25; curves for split sample, bootstrap, repeated cv, cv]
Brier score estimated in different ways
Relative efficiency to split sample:
Bootstrap: 3.5
CV: 3.6
Repeat CV: 3.6
Need 3–4 times as many obs. w/ split sample!
Some lessons
1. Small data: new observations are hard to get
2. Optimize a less weird scoring rule
3. Estimate with less variance
❧ intermission ☙
Act III: Hold Fast
Brier score + Bootstrap

Concordance: Higher better, random guess is .5

            model […]     model […]
LIMMA-t     .44 ± .30     .76 ± .20
SAM         .46 ± .26     .75 ± .24
ANOVA-fs    .51 ± .29     .75 ± .16
ANOVA-s     .41 ± .57     .75 ± .38
t-test      .65 ± 1.5     .74 ± .71
ANOVA-f     .44 ± .25     .72 ± .21

intercept   .5
stratum     .49 ± .055
lasso       .36 ± 1.4
ridge       .81 ± 3.3

Table […]: AUC presented as point estimate plus/minus two standard errors. Measures the probability of forecasting a higher probability of metastasis for a randomly chosen metastasis case than for a randomly chosen non-metastasis case: higher is better. Model number refers to the equations in Section […]. Model […] includes stratum as a predictor. Below the break are the four baseline models.

The collected results for model […] suggest some reason for optimism. Due to the size of the standard errors we must necessarily be uncertain about even the first significant digit of our point estimates. But even accounting for uncertainty there seems to be predictive information better than random guess. As in the simulations, there is not too much difference between the different methods, perhaps apart from the simple t-test, for which we observe much variance. Note that both SAM and LIMMA are flexible frameworks and we could have accounted for stratum and followup in either. Our comparison is between using this information and various ways of not using it, and there is no reason to believe that either framework should perform poorly if we were to use more refined models there.

Table […] shows the predictor set stability as point estimate plus/minus two standard errors. Stability is in general very low, and the standard errors suggest that there is even some uncertainty to the order of magnitude of the point estimates. A possible interpretation is that the correlation between genes is such that many different genes hold similar information. It is at least clear that we need much more data if we want to find a stable set of predictor genes. If we take the point estimates at face value, Table […] reflects the fact that we see lower uncertainty using ANOVA-f/fs in Tables […] and […].

… issue mentioned in the preamble to this chapter. For details see Section […] and Section […].

Brier score: Lower better, null model is .19

            model […]     model […]
t-test      .17 ± .45     .17 ± .33
ANOVA-fs    .27 ± .13     .18 ± .10
SAM         .34 ± .11     .20 ± .15
ANOVA-s     .33 ± .22     .20 ± .25
ANOVA-f     .31 ± .084    .21 ± .11
LIMMA-t     .35 ± .14     .20 ± .17

intercept   .19 ± .010
stratum     .22 ± .029
lasso       .27 ± .19
ridge       .23 ± .30

Table […]: Brier scores presented as point estimate plus/minus two standard errors. Measures error in forecast probability: lower is better. Model number refers to the equations in Section […]. Model […] includes stratum as a predictor. Below the break are the four baseline models.

… but it is noteworthy that the intercept-only model is among the best-calibrated. The uncertainty is large enough that it is difficult to say that any selection method is better than any other. It is clear that the interaction with detection method in model […] improves calibration for all models. There is also lower uncertainty in the ANOVA-f/fs models.

AUC or concordance probability is a measure of a model’s ability to discriminate between outcomes: the higher the better. Brier score alone does not provide full information about predictive performance; the intercept-only model is well-calibrated but cannot be used for prediction at all. Random guess (or forecasting a constant for every observation) yields AUC of .5; perfect discrimination yields AUC of unity. Table […] shows AUC as point estimate plus/minus two standard errors in decreasing order by model […]. Again the clearest signal is that the added information from detection method is very important. Point estimates improve markedly and standard errors generally decrease. Here, too, the use of stratification and followup time in preselection reduces uncertainty.

The ridge regression baseline performance has a very good AUC point estimate, but the standard error is very large. Too large: it is a theorem that the upper bound on standard deviation in a variable $\in [0, 1]$ is $\frac{1}{2}$. This says something about the imperfection of the jackknife as an estimator of standard error. The blame lies at least in part with the correctional factor $\frac{n-1}{n}$ in Equation […], which was originally defined heuristically. Since it is difficult to suggest a sensible alternative, we choose to live with this.

[…]. This was really the result of nesting a cross-validation in the bootstrap: the methodology …
Brier score + Bootstrap�� C H A P T E R � M E TA S TA S I S P R E D I C T I O N
model �.� model �.�LIMMA-t .44 ± .30 .76 ± .20SAM .46 ± .26 .75 ± .24ANOVA-fs .51 ± .29 .75 ± .16ANOVA-s .41 ± .57 .75 ± .38t-test .65 ± 1.5 .74 ± .71ANOVA-f .44 ± .25 .72 ± .21
intercept .5stratum .49 ± .055lasso .36 ± 1.4ridge .81 ± 3.3
Table �.�: AUC presented as point estimate plus/minus two standard errors. Measuresthe probability of forecasting a higher probability of metastasis for a ran-domly chosen metastasis case than for a randomly chosen non-metastasiscase: higher is better. Model number refers to the equations in Section �.A.�.Model �.� includes stratum as a predictor. Below the break are the fourbaseline models.
The collected results for model �.� suggest some reason for optimism. Due tothe size of the standard errors we must necessarily be uncertain about even thefirst significant digit of our point estimates. But even accounting for uncertaintythere seems to be predictive information better than random guess. As in thesimulations, there is not too much difference between the different methods,perhaps apart from the simple t-test, for which we observe much variance.Note that both SAM and LIMMA are flexible frameworks and we could haveaccounted for stratum and followup in either. Our comparison is between usingthis information and various ways of not using it, and there is no reason tobelieve that either framework should perform poorly if we were to use morerefined models there.
Table �.� shows the predictor set stability as point estimate plus/minus twostandard errors. Stability is in general very low, and the standard errors suggestthat there is even some uncertainty to the order of magnitude of the pointestimates. A possible interpretation is that the correlation between genes issuch that many different genes hold similar information. It is at least clear thatwe need much more data if we want to find a stable set of predictor genes. Ifwe take the point estimates at face value, Table �.� reflects the fact that we seelower uncertainty using ANOVA-f/fs in Tables �.� and �.�.
issue mentioned in the preamble to this chapter. For details see Section �.�.� and Section�.�.
�.A A P P E N D I X: VA R I A B L E S E L E C T I O N M E T H O D S ��
model �.� model �.�t-test .17 ± .45 .17 ± .33ANOVA-fs .27 ± .13 .18 ± .10SAM .34 ± .11 .20 ± .15ANOVA-s .33 ± .22 .20 ± .25ANOVA-f .31 ± .084 .21 ± .11LIMMA-t .35 ± .14 .20 ± .17
intercept .19 ± .010stratum .22 ± .029lasso .27 ± .19ridge .23 ± .30
Table �.�: Brier scores presented as point estimate plus/minus two standard errors.Measures error in forecast probability: lower is better. Model number refersto the equations in Section �.A.�. Model �.� includes stratum as a predictor.Below the break are the four baseline models.
but it is noteworthy that the intercept-only model is among the best-calibrated.The uncertainty is large enough that is difficult to say that any selection methodis better than any other. It is clear that the interaction with detection method inmodel �.� improves calibration for all models. There is also lower uncertaintyin the ANOVA-f/fs models.
AUC or concordance probability is a measure of a model’s ability to discriminatebetween outcomes: the higher the better. Brier score alone does not providefull information about predictive performance; the intercept-only model is well-calibrated but cannot be used for prediction at all. Random guess (or forecastinga constant for every observation) yields AUC of .�; perfect discrimination yieldsAUC of unity. Table �.� shows AUC as point estimate plus/minus two standarderrors in decreasing order by model �.�. Again the clearest signal is that theadded information from detection method is very important. Point estimatesimprove markedly and standard errors generally decrease. Also here does useof stratification and followup time in preselection reduce uncertainty.
The ridge regression baseline performance has a very good AUC point estimate,but the standard error is very large. Too large: it is a theorem that the upperbound on standard deviation in a variable 2 [0, 1] is 1
2 . This says somethingabout the imperfection of the jackknife as an estimator of standard error. Theblame lies at least in part with the correctional factor n�1n in Equation �.�, whichwas originally defined heuristically. Since it is difficult to suggest a sensiblealternative, we choose to live with this.�
�. This was really the result of nesting a cross-validation in the bootstrap: the methodology
Concordance Brier
Concordance: Higher better, random guess is .5
Brier score: Lower better, null model is .19
Brier score + Bootstrap�� C H A P T E R � M E TA S TA S I S P R E D I C T I O N
model �.� model �.�LIMMA-t .44 ± .30 .76 ± .20SAM .46 ± .26 .75 ± .24ANOVA-fs .51 ± .29 .75 ± .16ANOVA-s .41 ± .57 .75 ± .38t-test .65 ± 1.5 .74 ± .71ANOVA-f .44 ± .25 .72 ± .21
intercept .5stratum .49 ± .055lasso .36 ± 1.4ridge .81 ± 3.3
Table �.�: AUC presented as point estimate plus/minus two standard errors. Measuresthe probability of forecasting a higher probability of metastasis for a ran-domly chosen metastasis case than for a randomly chosen non-metastasiscase: higher is better. Model number refers to the equations in Section �.A.�.Model �.� includes stratum as a predictor. Below the break are the fourbaseline models.
The collected results for model �.� suggest some reason for optimism. Due to the size of the standard errors we must necessarily be uncertain about even the first significant digit of our point estimates. But even accounting for uncertainty there seems to be predictive information better than random guess. As in the simulations, there is not too much difference between the different methods, perhaps apart from the simple t-test, for which we observe much variance. Note that both SAM and LIMMA are flexible frameworks and we could have accounted for stratum and followup in either. Our comparison is between using this information and various ways of not using it, and there is no reason to believe that either framework should perform poorly if we were to use more refined models there.
Table �.� shows the predictor set stability as point estimate plus/minus two standard errors. Stability is in general very low, and the standard errors suggest that there is even some uncertainty to the order of magnitude of the point estimates. A possible interpretation is that the correlation between genes is such that many different genes hold similar information. It is at least clear that we need much more data if we want to find a stable set of predictor genes. If we take the point estimates at face value, Table �.� reflects the fact that we see lower uncertainty using ANOVA-f/fs in Tables �.� and �.�.
issue mentioned in the preamble to this chapter. For details see Section �.�.� and Section �.�.
�.A APPENDIX: VARIABLE SELECTION METHODS

             model �.�      model �.�
t-test       .17 ± .45      .17 ± .33
ANOVA-fs     .27 ± .13      .18 ± .10
SAM          .34 ± .11      .20 ± .15
ANOVA-s      .33 ± .22      .20 ± .25
ANOVA-f      .31 ± .084     .21 ± .11
LIMMA-t      .35 ± .14      .20 ± .17

intercept    .19 ± .010
stratum      .22 ± .029
lasso        .27 ± .19
ridge        .23 ± .30

Table �.�: Brier scores presented as point estimate plus/minus two standard errors. Measures error in forecast probability: lower is better. Model number refers to the equations in Section �.A.�. Model �.� includes stratum as a predictor. Below the break are the four baseline models.
but it is noteworthy that the intercept-only model is among the best-calibrated. The uncertainty is large enough that it is difficult to say that any selection method is better than any other. It is clear that the interaction with detection method in model �.� improves calibration for all models. There is also lower uncertainty in the ANOVA-f/fs models.
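To see why an intercept-only model can score well on calibration while being useless for prediction, it helps to compute the Brier score of a constant forecast. A small Python sketch on made-up outcomes (the function name `brier_score` and the data are assumptions for illustration):

```python
def brier_score(probs, outcomes):
    """Mean squared difference between forecast probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(outcomes)


# Hypothetical outcomes (1 = metastasis), not data from the study.
outcomes = [1, 0, 0, 0, 1, 0, 0, 0]
base_rate = sum(outcomes) / len(outcomes)

# Intercept-only model: forecast the base rate for every observation.
null_brier = brier_score([base_rate] * len(outcomes), outcomes)

# For a constant forecast at the base rate p, the Brier score equals p * (1 - p).
closed_form = base_rate * (1 - base_rate)
```

The constant forecast is right on average, so its Brier score is modest, yet it assigns the same probability to every woman and therefore has no discriminative value.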
In short: more lizards ahead
Reminder of likelihood penalties
$\sum_i |\beta_i| \le t$
$\sum_i \beta_i^2 \le t$
$\sum_i \left[ \alpha \beta_i^2 + (1 - \alpha) |\beta_i| \right] \le t$
Need to choose t (aka lambda)
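The three constraints on this slide are the lasso (sum of absolute values), ridge (sum of squares), and elastic net (a mix of the two) penalties. A minimal Python sketch that evaluates each for one coefficient vector; the function names, the coefficients, and the budget `t` are all hypothetical:

```python
def lasso_penalty(beta):
    """Sum of absolute coefficient values."""
    return sum(abs(b) for b in beta)


def ridge_penalty(beta):
    """Sum of squared coefficient values."""
    return sum(b * b for b in beta)


def elastic_net_penalty(beta, alpha):
    """Mix of the two in the form shown on the slide: alpha weights the squared term."""
    return sum(alpha * b * b + (1 - alpha) * abs(b) for b in beta)


beta = [0.5, -2.0, 0.0]  # hypothetical coefficients
t = 3.0                  # hypothetical budget

within_budget = {
    "lasso": lasso_penalty(beta) <= t,
    "ridge": ridge_penalty(beta) <= t,
    "elastic net": elastic_net_penalty(beta, alpha=0.5) <= t,
}
```

Note that glmnet's own `alpha` weights the absolute-value term (alpha = 1 gives the lasso), the reverse of the form above.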
Risky procedure
# nested cv in bootstrap (pseudocode; both helper functions are placeholders)
boot <- bootstrap_samples()            # list of bootstrap resamples of the data
for (b in boot) {
  lambda <- cross_validate_glmnet(b)   # choose lambda by cv within resample b
}
i) train
ii) test
Bias toward !!!!!!!!!
Risky procedure
[Figure: "Chosen lambda, p=100, k=1000". Density of the chosen shrinkage parameter lambda (x-axis, 0.00 to 0.20) for three procedures: cv, cv in bootstrap, and deduplicated cv in bootstrap.]
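One way to see why cross-validating inside a bootstrap resample is risky: a resample contains repeated observations, so copies of the same observation can land in both the training and the held-out folds, and the held-out error becomes optimistic. The Python sketch below counts how often that happens. It is an illustration under assumed settings (100 observations, 10 folds), not the simulation behind the figure, and `leaked_fraction` is a hypothetical helper.

```python
import random


def leaked_fraction(n, k_folds, rng, deduplicate=False):
    """Draw one bootstrap resample of n observations, split it into k folds,
    and return the fraction of held-out rows whose original observation
    also appears in the corresponding training folds."""
    sample = [rng.randrange(n) for _ in range(n)]  # indices drawn with replacement
    if deduplicate:
        sample = list(dict.fromkeys(sample))       # keep each observation once
    rng.shuffle(sample)
    leaked = 0
    for fold in range(k_folds):
        test = sample[fold::k_folds]
        train = {obs for i, obs in enumerate(sample) if i % k_folds != fold}
        leaked += sum(1 for obs in test if obs in train)
    return leaked / len(sample)


rng = random.Random(1)
fractions = [leaked_fraction(100, 10, rng) for _ in range(200)]
mean_leak = sum(fractions) / len(fractions)

# After deduplication no observation can sit on both sides of a split.
dedup_leak = leaked_fraction(100, 10, rng, deduplicate=True)
```

In this setup a large share of held-out rows have a twin in the training data, which is the motivation for the deduplicated variant in the legend.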
Instead choose lambda by AIC
[Figure: "AIC as a function of shrinkage parameter". x-axis: lambda (0.00 to 0.30); y-axis: AIC (−250 to 0). Annotations: scatterplot smoother, max curvature.]
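The slide names two ingredients, a scatterplot smoother and the point of maximum curvature, without giving details. The Python sketch below is one possible reading under stated assumptions: a centered moving average stands in for the smoother, and the absolute second difference on an evenly spaced grid stands in for curvature. The function names and the synthetic AIC curve are hypothetical.

```python
def moving_average(values, half_width=1):
    """Centered moving average; returns only points that have a full window."""
    w = half_width
    return [sum(values[i - w:i + w + 1]) / (2 * w + 1)
            for i in range(w, len(values) - w)]


def lambda_at_max_curvature(lambdas, aic, half_width=1):
    """Smooth the AIC curve, then return the lambda where the absolute second
    difference (a simple curvature proxy on an even grid) is largest."""
    smooth = moving_average(aic, half_width)
    grid = lambdas[half_width:len(lambdas) - half_width]
    best_i = max(
        range(1, len(smooth) - 1),
        key=lambda i: abs(smooth[i + 1] - 2 * smooth[i] + smooth[i - 1]),
    )
    return grid[best_i]


# Synthetic elbow-shaped curve, not the AIC values from the slide.
lambdas = [i / 100 for i in range(31)]
aic = [-250 + 40 * i if i <= 5 else -50 + 2 * (i - 5) for i in range(31)]
best_lambda = lambda_at_max_curvature(lambdas, aic)
```

On this synthetic curve the selected lambda falls at the elbow near 0.05; a real AIC path would need a smoother suited to its noise level.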
ElasticNet, alpha = .5
[Figure: two penalty constraint regions combined (+) into the elastic net constraint region (=).]
Figures from Hastie, Tibshirani, and Friedman: The Elements of Statistical Learning
$\sum_i \left[ \alpha \beta_i^2 + (1 - \alpha) |\beta_i| \right] \le t$
ElasticNet, alpha = .5
[Figure: "Bootstrapped estimates". Three histograms with Frequency on the y-axis: Brier score (x-axis 0.04 to 0.16), Concordance (x-axis 0.65 to 0.95), Stability (x-axis 0.05 to 0.35).]
ElasticNet, alpha = .5
[Slide: the bootstrapped elastic net estimates (Brier score, Concordance, Stability histograms) shown next to the AUC and Brier score tables for the selection-method models.]
Use stratum information
Does not use stratum information
ElasticNet, alpha = .5
[Figure: "Calibration curve for predictions". x-axis: Predicted metastasis probability (0.0 to 1.0); y-axis: Proportion of metastases (0.0 to 1.0). Legend: expected, middle 80%.]
Overestimation
Underestimation
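A calibration curve of this kind can be built by binning the predicted probabilities and comparing each bin's mean prediction with the observed proportion of metastases. The Python sketch below uses equal-width bins and made-up data; the binning scheme and the name `calibration_curve` are assumptions, not the procedure behind the plot.

```python
def calibration_curve(probs, outcomes, n_bins=5):
    """Return (mean predicted probability, observed proportion, count)
    for every non-empty equal-width bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[index].append((p, y))
    curve = []
    for members in bins:
        if members:
            mean_p = sum(p for p, _ in members) / len(members)
            observed = sum(y for _, y in members) / len(members)
            curve.append((mean_p, observed, len(members)))
    return curve


# Hypothetical forecasts and outcomes (1 = metastasis), not data from the study.
probs = [0.1, 0.1, 0.3, 0.3, 0.7, 0.7, 0.9, 0.9]
outcomes = [0, 0, 0, 1, 0, 1, 1, 1]
curve = calibration_curve(probs, outcomes, n_bins=5)

# A bin overestimates when its mean prediction exceeds the observed proportion.
labels = ["overestimation" if mean_p > observed
          else "underestimation" if mean_p < observed
          else "calibrated"
          for mean_p, observed, _ in curve]
```

Bins where the prediction exceeds the observed proportion fall below the diagonal (overestimation); bins above it indicate underestimation.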
108 genes selected
�.� RESULTS
likely. This is a natural consequence of doing variable selection: “redundant” information may shrink out of the model.

Table �.�: Resampling selection probability for the 108 elasticnet-selected genes.

GRK� [a]           0.853   C�orf���           0.290   ANO�               0.221   FBLN�       0.157
GPATCH�            0.682   LOC������          0.287   PTTG�IP            0.219   BLMH        0.156
GNGT�              0.474   RNF���             0.280   �NDg�gVCd... [b]   0.218   FCRL�       0.149
PDGFD [c]          0.467   SULT�A�            0.278   USF�               0.216   TDRD�       0.143
FAM��B             0.457   ZNF���             0.271   BCCIP              0.210   ACY�        0.142
PTPRN�             0.442   USE�               0.267   MGC�����           0.209   ZFP��       0.142
CBLB               0.440   DNMT�A             0.267   GRK� [a]           0.207   SLIC�       0.138
PDCL               0.410   LOC������          0.266   WTIP               0.205   PICK�       0.135
RASA�              0.380   CNTNAP�            0.265   BCL��              0.204   RTN�IP�     0.134
C��orf��           0.376   IL�RA              0.265   DLGAP�             0.200   CDCA�L      0.132
TCEB�              0.374   CCT�               0.264   HRAS               0.199   BEX�        0.131
CAPN�              0.354   R�HDM�             0.263   RAD�               0.189   FCAR        0.130
STK��              0.351   MRPL��             0.260   PRKCE              0.187   ANKRD��     0.111
GUCY�A�            0.348   SLC��A�            0.256   UBAP�L             0.186   USP��       0.109
ZDHHC��            0.345   GNG�               0.255   BPI                0.186   KIAA����    0.106
SULT�A�            0.336   PLA�G�C            0.251   DTX�               0.184   BRI�BP      0.106
Z�FIQGkeo... [d]   0.335   TCF�               0.248   LASS�              0.182   TUBA�A      0.105
FAM��A             0.328   uX��cu�f_... [e]   0.247   GSTT�              0.182   IDH�        0.102
rh��dQX��... [f]   0.324   C��orf���          0.245   SPATA��            0.182   DDX��       0.100
LANCL�             0.323   VCL                0.242   IGLL�              0.172   ANKRD��     0.094
SERPINE�           0.318   EZH�               0.242   SPG�A              0.172   TFG         0.087
ADIPOR�            0.314   PRPSAP�            0.237   PPAP�A             0.172   LILRA�      0.080
GPR���             0.312   ISY�               0.235   NOTCH�NL           0.172   C�orf��     0.078
PDGFD [c]          0.299   UGDH               0.234   TAF�               0.168   WDR��       0.075
LOC������          0.294   ABCF�              0.230   CCDC��B            0.166   AHCYL�      0.068
WEE�               0.293   C��orf�            0.229   LOC������          0.158   HAUS�       0.068
ITM�C              0.291   VAV�               0.225   CDH�               0.157   MAD�L�      0.053
a. Two probes map to the same gene GRK�. Combined selection probability is �.��, implyingthat both get selected together at least some of the time.
b. Illumina probe id �NDg�gVCdQkNdcg.Ko, missing annotation.c. Two probes map to the same gene PDGFD. Combined selection probability is �.���.d. Ilummina probe id Z�FIQGkeoCSiVAoKeg, missing annotation.e. Illumina probe id uX��cu�f_VUIuXoST�, missing annotation.f. Illumina probe id rh��dQX��hUS�uOpRQ, missing annotation.
Figure �.� shows the (log fold change) expression levels in each of the ���selected genes for the metastasized and non-metastasized observations. Theshaded area shows the middle .� of the bootstrap distribution for differencein medians between the two groups; the white notch shows the expectationof this distribution, by which the genes are ordered. The black snake-shapedline marks the two group medians. The non-metastasized median is usuallyaround zero, so the difference in medians is mostly dominated by the medianfold change of the metastasized observations. In other words, for these genesthe average case–control pair is similar in the non-metastasized group, whilethe average pair is dissimilar in the metastasized group.
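The inference in footnote a of the selection-probability table can be spelled out with inclusion-exclusion. As a sketch, let A and B denote the events that the first and second GRK5 probe, respectively, are selected in a given resample (these event names are introduced here for illustration only):

```latex
\[
P(A \cap B) \;=\; P(A) + P(B) - P(A \cup B)
\;\ge\; 0.853 + 0.207 - 1 \;=\; 0.060 ,
\]
```

since P(A ∪ B) cannot exceed 1. Under this reading the two GRK5 probes must be selected together in at least 6% of the resamples. For the two PDGFD probes the same bound gives 0.467 + 0.299 − 1 < 0, so the combined probability alone says nothing about whether they are co-selected.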
Low selection frequencies: unstable signatures
Please turn to page 22 in the required reading
[Figure: Expression levels, group medians, and difference in group medians for selected genes. Two panels, one row per gene, x-axis from −1.0 to 1.5 in both. Legend: metastasized and non-metastasized observations, each with its median; bootstrapped difference in medians with middle 80% of bootstrap distribution.
Genes in the first panel: PRKCE, BRI3BP, LOC647460, PDCL, KIAA0495, DDX52, BCCIP, ACY1, RASA2, R3HDM1, FAM24B, WTIP, C16orf5, SULT1A3, LOC649210, DTX1, USP39, GRK5, ZFP57, TCEB1, GRK5, PDGFD, PDGFD, BCL10, USE1, BEX4, CDH2, SERPINE2, EZH2, GNG8, 3NDg8gVCd..., PRPSAP2, rh13dQX04..., PTPRN2, C11orf48, ISY1, NOTCH2NL, UBAP2L, ADIPOR2, CNTNAP2, PLA2G4C, GNGT2, VCL, TUBA4A, FCRL3, GSTT1, GUCY1A3, ITM2C, TCF4, SLC38A1, MGC29506, CBLB, IGLL1, VAV3.
Genes in the second panel: LOC654055, SPATA20, BPI, SULT1A1, GPR177, TFG, USF1, ANKRD57, HAUS4, LASS5, PTTG1IP, LOC731486, STK19, FCAR, PICK1, GPATCH4, CAPN3, RNF214, WDR60, ZNF365, C1orf115, FBLN5, C6orf47, CCT5, BLMH, MRPL43, Z6FIQGkeo..., ANKRD35, uX15cu4f_..., C20orf107, HRAS, MAD2L2, ABCF2, IL2RA, UGDH, TDRD9, LILRA6, LANCL2, ZDHHC11, FAM89A, RTN4IP1, RAD1, CDCA7L, AHCYL2, TAF6, PPAP2A, SPG3A, DLGAP2, WEE1, SLIC1, DNMT3A, CCDC90B, ANO8, IDH1.]
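The bootstrapped difference in medians shown in the figure can be sketched as follows for a single gene. This assumes the two groups are resampled independently with replacement and that the "middle 80%" is the interval between the 10th and 90th percentiles of the bootstrap distribution; the thesis may have used a different resampling scheme. The function name and the toy values are hypothetical.

```python
import random
import statistics


def bootstrap_median_difference(group_a, group_b, n_boot=2000, seed=0):
    """Bootstrap the difference median(group_a) - median(group_b).

    Returns (expectation, lower, upper), where lower and upper bound the
    middle 80% of the bootstrap distribution (10th to 90th percentile).
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        # Resample each group with replacement, keeping group sizes fixed
        resample_a = rng.choices(group_a, k=len(group_a))
        resample_b = rng.choices(group_b, k=len(group_b))
        diffs.append(statistics.median(resample_a) - statistics.median(resample_b))
    diffs.sort()
    lower = diffs[n_boot // 10]
    upper = diffs[(9 * n_boot) // 10 - 1]
    return statistics.mean(diffs), lower, upper


# Hypothetical log fold changes for one gene, not from the thesis
metastasized = [0.9, 1.1, 0.7, 1.3, 1.0, 0.8, 1.2]
non_metastasized = [-0.1, 0.1, 0.0, -0.2, 0.2, 0.05, -0.05]

expectation, lower, upper = bootstrap_median_difference(metastasized, non_metastasized)
print(f"expected difference {expectation:.2f}, middle 80%: [{lower:.2f}, {upper:.2f}]")
```

Repeating this per gene and sorting by `expectation` would give an ordering like the one described for the figure.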
Some genes tend to be selected together
[Figure: Co-selection heatmap. Color scale from 0 to 0.6.]

Chapter �: Metastasis prediction
data even under resampling.

Table �.�: Genes that tend to be selected together, ordered alphabetically.

ADIPOR2    FAM89A    LANCL2      PTPRN2        SULT1A1
C11orf48   GNG8      LOC������   R3HDM1        TCEB1
C1orf115   GNGT2     LOC������   RASA2         TCF4
CAPN3      GPATCH4   PDCL        rh13dQX0...   WEE1
CBLB       GRK5      PDGFD       SERPINE2      Z6FIQGkeo...
DNMT3A     GUCY1A3   PDGFD       STK19         ZDHHC11
FAM24B     ITM2C     PRPSAP2     SULT1A3       ZNF365
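A minimal sketch of how selection and co-selection frequencies of this kind can be tallied, assuming the model has already been refit on each resample and we are given the set of genes it selected each time. The function name `selection_summary` and the toy selections are hypothetical; only the gene names come from the tables.

```python
from collections import Counter
from itertools import combinations


def selection_summary(selected_sets):
    """Tally selection and pairwise co-selection frequencies.

    selected_sets: one collection of selected gene names per resample.
    Returns (sel_prob, cosel_prob); cosel_prob is keyed by sorted gene pairs.
    """
    n = len(selected_sets)
    single = Counter()
    pair = Counter()
    for genes in selected_sets:
        genes = sorted(set(genes))
        single.update(genes)
        pair.update(combinations(genes, 2))  # each unordered pair once
    sel_prob = {gene: count / n for gene, count in single.items()}
    cosel_prob = {genes: count / n for genes, count in pair.items()}
    return sel_prob, cosel_prob


# Hypothetical selections from four resamples, not from the thesis
selected_sets = [
    {"GRK5", "PDGFD", "CBLB"},
    {"GRK5", "PDGFD"},
    {"GRK5"},
    {"CBLB"},
]

sel_prob, cosel_prob = selection_summary(selected_sets)
print(sel_prob["GRK5"])                # 0.75
print(cosel_prob[("GRK5", "PDGFD")])   # 0.5
```

The `cosel_prob` values are what a co-selection heatmap would display, with one cell per gene pair.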
�.� Conclusion

We have demonstrated predictability of metastasis in these data. We can, with a high probability, rank case–control pairs in terms of predicted metastasis probability. However, we should not count the model itself as a reliable tool due to poor calibration and stability, and since these results stem from exploratory modeling we should be moderate in our expectations; further investigation is needed to establish reliable results.

We provide 108 candidate predictor genes as an avenue for future research. We are currently investigating their biological properties. An interesting statistical investigation may be to review the importance of the stratification and how to build this into a shrinkage model, as the results in the appendix below indicate that this may lead to improvements. We believe, however, that it is necessary to obtain independent data to be able to make any inference stronger than general indication.

�.A Appendix: variable selection methods

In addition to the main results presented above, we previously explored various ad-hoc variable selection schemes. The results of these explorations are not competitive compared with the above penalized likelihood model, but I present them here for completeness and comparison. To make the next sections complete we must define the followup time of a case. This is the number of days between provision of the blood sample and the eventual diagnosis of cancer. Although followup introduces a time aspect, these are not time series data in the strictest technical sense. Each observation stems from a different woman, so there should be no autocorrelation to speak of, and followup time is random.
At this point maybe call a biologist
https://commons.wikimedia.org/wiki/File:Biologist_Victoria_Achkasova_20150529.jpg
Thesis: Small data require particular care
• 1000s of measurements, maybe 100 observations
• Validation matters more than you think
• Model search is difficult
• I suggest making more assumptions
Thesis: Small data require particular care
[Diagram with labels: Fast, Good, Cheap]
Thesis: Small data require particular care
[Diagram with labels: Agnostic modeling, Careful validation, Small data. Overlaid text: NO FREE LUNCHES]
Mosteller & Tukey’s green book
Naturally, we all desire an adequate assessment of both the indications and their uncertainties, but we shouldn’t refuse good cake only because we can’t have frosting too.
Closing curtain: Thank you.