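Constructing a Prediction Model from the National Health Insurance Database for Head and Neck Cancer Patients at High Risk of Aspiration Pneumonia

Yen-Hsien Lee* (李彥賢), Department of Information Management, National Chiayi University
Chia-Hsuan Lai (賴家玄), Department of Radiation Oncology, Chiayi Chang Gung Memorial Hospital
Jia-Ling Cai (蔡佳玲), Hon Hai Precision Industry Co., Ltd.

Abstract

Preventive medicine replaces the treatment of disease with its prevention. Its main goals are health promotion and disease prevention: by raising public awareness of disease and changing attitudes, health is managed through prevention. In recent years, shifts in population structure and disease patterns have drawn growing attention to preventive medicine. According to 2014 statistics from Taiwan's Ministry of Health and Welfare, head and neck cancer ranked fifth in mortality among all cancers. Depending on the patient's condition, treatment usually includes surgery, radiotherapy, and chemotherapy. However, treatment sequelae or the tumor location often cause swallowing problems that lead to choking and, in severe cases, to aspiration pneumonia. Research indicates that the 12-month mortality of head and neck cancer patients who develop aspiration pneumonia is nearly 10%. Prior studies have identified possible factors behind aspiration pneumonia in head and neck cancer, but they observed different variables and reported somewhat different results, and in practice no assessment criteria have been established for physicians to evaluate patients. Based on National Health Insurance claims data, this study uses classification learning techniques from data mining to construct a prediction model that helps identify head and neck cancer patients at high risk of aspiration pneumonia, so that patients can receive appropriate health education to prevent aspiration pneumonia or detect its symptoms early, thereby reducing mortality risk and related medical costs. The experimental evaluation shows that the sampling method used to create the training data significantly affects classifier performance. In terms of the predictive ability of the ensemble learning methods, Boosting generally predicts better than Bagging; the performance of Bagging varies with the base algorithm used, with Decision Tree performing best. Even so, all five algorithms evaluated in this study achieved quite good predictive performance, and Bagging with RBF-Kernel SVM as the learning algorithm showed considerable predictive ability for the non-target class in the training data, namely head and neck cancer patients without aspiration pneumonia.

Keywords: head and neck cancer, aspiration pneumonia, National Health Insurance data, propensity score matching, ensemble learning algorithms

* Corresponding author. Email: [email protected]. Submitted 2016/04/28; revised 2017/01/25; accepted 2017/03/03.
Lee, Y.H., Lai, C.H. and Cai, J.L. (2017), ‘Constructing a prediction model from the National Health Insurance database for head and neck cancer patients at high risk of aspiration pneumonia’, Journal of Information Management, Vol. 24, No. 3, pp. 341-367.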
A Prediction Model for Head and Neck Cancer Patient Complicated with Aspiration Pneumonia
Yen-Hsien Lee*
Chia-Hsuan Lai
Jia-Ling Cai
Abstract
Purpose: The treatment-related adverse effects of head and neck cancer and/or
the anatomic location of tumors are likely to cause swallowing problems that may lead
to complications such as choking, malnutrition, and aspiration pneumonia. Prior
research indicates that the 12-month death rate of head and neck cancer patients with the
complication of aspiration pneumonia is nearly 10%. The factors that cause
aspiration pneumonia have been examined in prior studies, but the findings are
inconclusive. This study aims to mine Taiwan’s National Health Insurance Research
Database, the most comprehensive record of medical insurance claims in Taiwan, to
construct a prediction model for head and neck cancer patients who are at risk of
aspiration pneumonia.
Design/methodology/approach: We reviewed the literature to identify a
collective set of thirteen factors that are relevant to head and neck cancer patients
with the complication of aspiration pneumonia and whose data values are available in
Taiwan’s National Health Insurance Research Database, and adopted them as
independent variables. We used propensity score matching to create the training dataset and
implemented bagging-based and boosting-based ensemble learning methods with …
* Corresponding author. Email: [email protected]. Received 2016/04/28; revised 2017/01/25; accepted 2017/03/03.
Lee, Y.H., Lai, C.H. and Cai, J.L. (2017), ‘A prediction model for head and neck cancer patient complicated with aspiration pneumonia’, Journal of Information Management, Vol. 24, No. 3, pp. 341-367.
Findings: The results suggest that the five investigated approaches were
effective in predicting head and neck cancer patients at risk of aspiration pneumonia.
The boosting-based ensemble learning methods achieved better prediction performance
than the bagging-based ones. Overall, the proposed approach is promising
for constructing a prediction model for head and neck cancer patients at
higher risk of aspiration pneumonia using Taiwan’s National Health Insurance Research
Database.
Research limitations/implications: This study applies ensemble learning to
construct a prediction model for head and neck cancer patients at risk of
aspiration pneumonia. The evaluation results show the effectiveness and
practicability of the proposed method, which builds the prediction model from a
health insurance database. The study thus contributes to research on health data
mining. Nevertheless, the independent variables used to construct the prediction model
are limited to medical insurance claim records. Future research should
incorporate other data sets, such as medical records, into the construction of prediction
models.
Practical implications: The proposed method can be developed into a decision
support system that helps physicians assess head and neck cancer patients who
are at risk of aspiration pneumonia. Such patients can be educated in advance to
prevent the occurrence of aspiration pneumonia. Developing such a system is
feasible because the medical insurance claim records required for constructing the
prediction model are readily available.
Originality/value: This study investigated the factors that may cause the
complication of aspiration pneumonia and used them to construct a prediction model based on
the health insurance database that predicts which head and neck cancer patients are at
risk. We developed a method for database preprocessing, training dataset creation, and
prediction model construction. The evaluation results suggest the practicability and
effectiveness of the proposed method.
Keywords: head and neck cancer, aspiration pneumonia, National Health Insurance
Research Database, propensity score matching, ensemble learning
Dubray-

1
3% 2012
Siegel et al. 2012
3 6
80%
Radio-Therapy; CCRTChu et al. 2013

Mittal et al. 2003
Rosen et
10%Eisbruch et al. 2002


Chu et al. 2013; Langerman et al. 2007; Mortensen et al. 2013; Xu et al. 2015
1 http://www.mohw.gov.tw/news/531349778






20


2001




A B



2009

Adnet & Baud 1996

Irwin et al.
1999
Marik 2001

Daniels et al. 1998 10%
Roy et al. 1989

2002 8 130



χ² test
Chu 2013


χ² test, Fisher’s exact test, t-test, logistic regression


Xu 2015


Gray’s test; multivariate predictors; Gray regression models


Langerman et al.2007 6










B
1995 99%

DOCPER
HVHOXDRUG
IDHOSP_GRADHOSTDTL
LICDT
CTDD
DOCDOO
GDGO
IDGDDGOD


Classification
2 http://nhird.nhri.org.tw/date_01.html
ID3, C4.5 (Mingers 1989); Naïve Bayes (Mitchell 1997);
Neural Network, Backpropagation Neural Network (Berry & Linoff 1997);
Support-Vector-Machine, SVM (Cortes & Vapnik 1995);
K-Nearest-Neighbors (Henley & Hand 1996)


UCI 27 20




Bootstrap Bagging
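As an illustration of Bagging by bootstrap resampling, the following is a minimal sketch rather than the study's Weka implementation. It assumes binary labels 0/1 and uses a one-feature threshold rule (decision stump) as a stand-in base learner; the names `fit_stump`, `bagging_fit`, `bagging_predict`, and the toy data are hypothetical.

```python
import random
from collections import Counter


def fit_stump(X, y):
    """Fit a one-feature threshold rule that minimizes training errors."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for sign in (1, -1):
                preds = [1 if sign * (row[j] - thr) >= 0 else 0 for row in X]
                errors = sum(p != t for p, t in zip(preds, y))
                if best is None or errors < best[0]:
                    best = (errors, j, thr, sign)
    _, j, thr, sign = best
    return lambda row: 1 if sign * (row[j] - thr) >= 0 else 0


def bagging_fit(X, y, n_models=25, seed=0):
    """Train n_models base learners, each on its own bootstrap sample."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        # Bootstrap: draw n training rows with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models


def bagging_predict(models, row):
    """Combine the base learners by unweighted majority vote."""
    votes = Counter(model(row) for model in models)
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): one feature, two well-separated classes.
X_demo = [[1], [2], [3], [7], [8], [9]]
y_demo = [0, 0, 0, 1, 1, 1]
models = bagging_fit(X_demo, y_demo)
```

An odd number of base learners avoids tied votes in the two-class case.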

learnable
Kearns Valiant1989
weak learnability

AdaBoost
Bauer & Kohavi 1999; Dietterich 2000
Freund Schapire1999 AdaBoost
AdaBoost
& Schapire 1996noise
mislabeled Bagging
Boosting AdaBoost
Freund & Schapire
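The reweighting idea behind AdaBoost can be sketched as follows. This is a minimal illustration under stated assumptions, not the Weka implementation used in the study: labels are coded as +1/-1, the weak learner is a weighted decision stump, and all function names and the toy data are hypothetical.

```python
import math


def fit_weighted_stump(X, y, w):
    """Return (error, feature, threshold, sign) with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for sign in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (sign if row[j] >= thr else -sign) != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best


def adaboost_fit(X, y, rounds=10):
    """Train a weighted ensemble of stumps; y must contain +1/-1 labels."""
    n = len(X)
    w = [1.0 / n] * n  # start from uniform instance weights
    ensemble = []
    for _ in range(rounds):
        err, j, thr, sign = fit_weighted_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this stump
        ensemble.append((alpha, j, thr, sign))
        # Raise the weight of misclassified instances, lower the rest.
        w = [wi * math.exp(-alpha * yi * (sign if row[j] >= thr else -sign))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def adaboost_predict(ensemble, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * (sign if row[j] >= thr else -sign)
                for alpha, j, thr, sign in ensemble)
    return 1 if score >= 0 else -1


# Toy data (hypothetical): no single stump separates these labels.
X_demo = [[1], [2], [3], [4], [5], [6]]
y_demo = [1, 1, -1, -1, 1, 1]
ensemble = adaboost_fit(X_demo, y_demo, rounds=3)
```

On this toy set no single stump classifies every instance correctly, but the weighted vote of three stumps does, which is the sense in which boosting turns weak learners into a stronger one.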

2

Langerman et al. 2007; Mortensen et al. 2013; Chu et al. 2013; Xu et al. 2015
Langerman et al. 2007; Mortensen et al. 2013; Chu et al. 2013; Xu et al. 2015


Langerman et al. 2007; Mortensen et al. 2013; Chu et al. 2013; Xu et al. 2015

Mortensen et al. 2013; Chu et al. 2013; Xu et al. 2015

Mortensen et al. 2013; Chu et al. 2013; Xu et al. 2015
Chu et al. 2013; Xu et al. 2015












2.
5. DD CD
DD_all CD_all

2. DD_allCD_all
HNC_all


1. 90
HNC_PN_ all

2. DD_all
3. CD_all
4. ID
1.
2.
HNC_PN_ all
11
DD_all
CD_all



923 D1 V581 9925
D2


ICD-9 codes 140-149, 160, 161
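A cohort filter on these ICD-9 categories could look like the sketch below. It is only an illustration: the function name is hypothetical, and it assumes diagnosis codes arrive as strings either with a decimal point ("161.0") or without one ("1610"), so only the three-digit category is compared.

```python
# ICD-9 categories named in the text: 140-149, 160, and 161.
HNC_CATEGORIES = set(range(140, 150)) | {160, 161}


def is_head_neck_cancer(icd9_code):
    """True if the code's three-digit category is 140-149, 160, or 161."""
    category = icd9_code.strip().split(".")[0][:3]
    # V-codes and E-codes start with a letter and are rejected here.
    return category.isdigit() and int(category) in HNC_CATEGORIES
```

For example, `is_head_neck_cancer("161.0")` is true, while `is_head_neck_cancer("486")` is false.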



147 148 161
160

PN_all

Chu


430-438, 530.11, 530.81, 787.1, 332
ID
Oversampling (Chawla et al. 2002; Lewis & Catlett 1994); Undersampling
(Lin et al. 2009; Liu et al. 2009)




Propensity
logistic regression model; independent
variable; dependent variable
0-1 (2014)
Becker and Ichino (2002); PSM
nearest neighbor matching
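Nearest neighbor matching on a propensity score can be sketched as follows. This is a minimal illustration, not the study's procedure: it assumes the 0-1 propensity scores have already been estimated (for example by a logistic regression model), matches 1:1 greedily without replacement, and uses hypothetical names, an optional hypothetical `caliper` parameter, and made-up scores.

```python
def nearest_neighbor_match(treated, controls, caliper=None):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    treated, controls: dicts mapping a patient id to a propensity score.
    caliper: optional maximum allowed score difference for a match.
    Returns a list of (treated_id, control_id) pairs.
    """
    available = dict(controls)
    pairs = []
    # Visit treated cases in score order so the result is deterministic.
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        # Closest remaining control; ties are broken by id.
        c_id = min(available,
                   key=lambda cid: (abs(available[cid] - t_score), str(cid)))
        if caliper is not None and abs(available[c_id] - t_score) > caliper:
            continue  # no acceptable control for this case
        pairs.append((t_id, c_id))
        del available[c_id]  # each control is used at most once
    return pairs


# Made-up propensity scores for illustration only.
cases = {"p1": 0.80, "p2": 0.30}
non_cases = {"c1": 0.28, "c2": 0.75, "c3": 0.10, "c4": 0.82}
matched = nearest_neighbor_match(cases, non_cases)
```

Matching each case to one non-case with a similar score yields a balanced training set, which is one way an undersampled training dataset can be drawn from a much larger pool of non-cases.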


Bauer & Kohavi 1999; Dietterich 2000; Freund & Schapire 1996
Boosting
Boosting
decision
Dietterich 2000
Bagging: Naïve Bayes, C4.5, RBF-Kernel SVM
Boosting: AdaBoost, LogitBoost

3


13 4

(c)
1          2       3    4    6    7        8      13   22   24     43
357/15346  36/528  0/5  0/2  1/5  30/1061  3/205  0/8  0/5  0/129  0/4
(d)

(46)/(2,845) (1)
(1)/(13) (1)
0.000 0.000 0.000 0.000
0.000 0.002 0.000 0.000



10
10


Bagging: Naïve Bayes, Decision Tree, and Radial Basis Function (RBF) Kernel Support
Vector Machine (SVM). Boosting: AdaBoost and LogitBoost.
Software: Weka 3.7.13, SPSS 22.0.
Environment: Intel Core 2 Duo CPU 2.83GHz, 4GB RAM, Windows 8.1 x64.

FN

accuracy; ROC; AUC; ROC
TP/(TP+FN)
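The ratio TP/(TP+FN) is the sensitivity (true positive rate). The sketch below shows how sensitivity, specificity, accuracy, and the area under the ROC curve can be computed from raw counts and scores. The function names and the example numbers are hypothetical and are not taken from the study's results.

```python
def classification_measures(tp, fn, tn, fp):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of actual positives detected
    specificity = tn / (tn + fp)  # share of actual negatives detected
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


def roc_auc(scores_pos, scores_neg):
    """AUC as the chance that a positive outscores a negative (ties count 0.5)."""
    wins = sum((p > n) + 0.5 * (p == n)
               for p in scores_pos for n in scores_neg)
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical counts: 100 positives and 100 negatives.
sens, spec, acc = classification_measures(tp=90, fn=10, tn=80, fp=20)
auc = roc_auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
```

When the two classes are the same size, as in this example, accuracy equals the mean of sensitivity and specificity.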


SVM Bagging AdaBoost LogitBoost
Weka
Decision Tree RBF-Kernel SVM
RBF-Kernel SVM

Lüdemann et al. 2006; Metz 1978; Obuchowski 2003
Boosting
Bagging Bagging

Bagging-Naïve Bayes      0.939  0.967  0.953  0.986
Bagging-Decision Tree    0.974  0.998  0.986  0.983
Bagging-RBF Kernel SVM   0.810  0.972  0.891  0.933
AdaBoost                 0.972  0.998  0.985  0.989
LogitBoost               0.972  0.995  0.984  0.990
undersampling



7 RBF-Kernel SVM 0.717
Specificity 71.7%

RBF-Kernel SVM



Bagging RBF-Kernel SVM

Chu 6 0.794 0.986 0.890 0.914 0.650
3 0.799 0.988 0.893 0.919 0.370

9


Bagging-Naïve Bayes      0.801  0.506  0.653  0.698
Bagging-Decision Tree    0.671  0.610  0.641  0.684
Bagging-RBF Kernel SVM   0.851  0.356  0.603  0.640
AdaBoost                 0.680  0.577  0.628  0.663
LogitBoost               0.706  0.579  0.643  0.688

Boosting AdaBoost LogitBoost
ROC 0.933
Bagging
RBF-Kernel SVM 89%
95% Boosting Decision Tree




S 51-67
2014propensity score
eNews
363-372
Adnet, F. and Baud, F. (1996), ‘Relation between Glasgow Coma Scale and aspiration
pneumonia’, Lancet, Vol. 348, No. 9020, pp. 123-124.
Bauer, E. and Kohavi, R. (1999), ‘An Empirical Comparison of Voting Classification
Algorithms: Bagging, Boosting, and Variants’, Machine Learning, Vol. 36, No. 1,
pp. 105-139.
Baum, G.L., Crapo, J.D., Celli, B.R. and Karlinsky, J.B. (1998), Textbook of Pulmonary
Diseases, Lippincott Williams & Wilkins, Philadelphia.
Beasley, R.P., Lin, C.C., Hwang, L.Y. and Chien, C.S. (1981), ‘Hepatocellular
carcinoma and hepatitis B virus: a prospective study of 22 707 men in Taiwan’,
Lancet, Vol. 318, No. 8256, pp. 1129-1133.
Becker, S.O. and Ichino, A. (2002), ‘Estimation of average treatment effects based on
propensity scores’, The Stata Journal, Vol. 2, No. 4, pp. 358-377.
Berry, M.J. and Linoff, G.S. (1997), Data Mining Techniques: For Marketing, Sales, and
Customer Support, Wiley Publishing, Inc., Indianapolis, Indiana.
Breiman, L. (1996), ‘Bagging predictors’, Machine Learning, Vol. 24, No. 2, pp. 123-140.
Chawla, N.V., Bowyer, K.W., Hall, L.O. and Kegelmeyer, W.P. (2002), ‘SMOTE:
Synthetic Minority Over-sampling Technique’, Journal of Artificial Intelligence
Research, Vol. 16, No. 1, pp. 321-357.
Chu, C.N., Muo, C.H., Chen, S.W., Lyu, S.Y. and Morisky, D.E. (2013), ‘Incidence of
pneumonia and risk factors among patients with head and neck cancer undergoing
radiotherapy’, BMC Cancer, Vol. 13, No. 370.
Cortes, C. and Vapnik, V. (1995), ‘Support-vector networks’, Machine Learning, Vol. 20,
No. 3, pp. 273-297.
Daniels, S.K., Brailey, K., Priestly, D.H., Herrington, L.R., Weisberg, L.A. and Foundas,
A.L. (1998), ‘Aspiration in patients with acute stroke’, Archives of Physical
Medicine and Rehabilitation, Vol. 79, No. 1, pp. 14-19.
Delgado, M., Sánchez, D., Martín-Bautista, M.J. and Vila, M.A. (2001), ‘Mining
association rules with improved semantics in medical databases’, Artificial
Intelligence in Medicine, Vol. 21, No. 1, pp. 241-245.
Dietterich, T.G. (2000), ‘Ensemble methods in machine learning’, Proceedings of the
First International Workshop on Multiple Classifier Systems, Cagliari, Italy, June
21-23, pp. 1-15.
Dubray-Vautrin, A., Ballivet de Régloix, S., Girod, A., Jouffroy, T. and Rodriguez, J.
(2015), ‘Epidemiology, diagnosis and treatment of head and neck cancers’, Soins,
Vol. 60, No. 798, pp. 32-35.
Eisbruch, A., Lyden, T., Bradford, C.R., Dawson, L.A., Haxer, M.J., Miller, A.E. and
Wolf, G.T. (2002), ‘Objective assessment of swallowing dysfunction and aspiration
after radiation concurrent with chemotherapy for head-and-neck cancer’,
International Journal of Radiation Oncology, Biology, Physics, Vol. 53, No. 1, pp.
23-28.
Frawley, W.J., Piatetsky-Shapiro, G. and Matheus, C.J. (1991), ‘Knowledge discovery in
databases: An overview’, AI Magazine, Vol. 13, No. 3, pp. 57-70.
Freund, Y. and Schapire, R.E. (1996), ‘Experiments with a new boosting algorithm’,
Proceedings of the Thirteenth International Conference on Machine Learning
(ICML '96), Bari, Italy, July 3-6, pp. 148-156.
Freund, Y. and Schapire, R.E. (1997), ‘A decision-theoretic generalization of on-line
learning and an application to boosting’, Journal of Computer and System Sciences,
Vol. 55, No. 1, pp. 119-139.
Freund, Y. and Schapire, R.E. (1999), ‘A short introduction to boosting’, Journal of
Japanese Society for Artificial Intelligence, Vol. 14, No. 5, pp. 771-780.
He, H. and Garcia, E.A. (2009), ‘Learning from imbalanced data’, IEEE Transactions
on Knowledge and Data Engineering, Vol. 21, No. 9, pp. 1263-1284.
Henley, W.E. and Hand, D.J. (1996), ‘A k-nearest-neighbour classifier for assessing
consumer credit risk’, The Statistician, Vol. 45, No. 1, pp. 77-95.
Irwin, R.S., Cerra, F.B. and Rippe, J.M. (1999), Irwin and Rippe’s Intensive Care
Medicine, Lippincott Williams & Wilkins, Philadelphia.
Kearns, M. and Valiant, L. (1989), ‘Cryptographic limitations on learning boolean
formulae and finite automata’, Proceedings of the Twenty-First Annual ACM
Symposium on Theory of Computing, Seattle, WA, USA, May 14-17, pp. 433-444.
Lüdemann, L., Grieger, W., Wurm, R., Wust, P. and Zimmer, C. (2006), ‘Glioma
assessment using quantitative blood volume maps generated by T1-weighted
dynamic contrast-enhanced magnetic resonance imaging: A receiver operating
characteristic study’, Acta Radiol, Vol. 47, No. 3, pp. 303-310.
Langerman, A., MacCracken, E., Kasza, K., Haraf, D.J., Vokes, E.E. and Stenson, K.M.
(2007), ‘Aspiration in chemoradiated patients with head and neck cancer’, Archives
of Otolaryngology–Head & Neck Surgery, Vol. 133, No. 12, pp. 1289-1295.
Lee, Y.H., Hu, P., Cheng, T.H., Huang, T.C. and Chuang, W.Y. (2013), ‘A preclustering-
based ensemble learning technique for acute appendicitis diagnoses’, Artificial
Intelligence in Medicine, Vol. 58, No. 2, pp. 115-124.
Lewis, D. and Catlett, J. (1994), ‘Heterogeneous uncertainty sampling for supervised
learning’, Proceedings of the 11th International Conference on Machine Learning,
New Brunswick, NJ, pp. 148-156.
Lin, Z., Hao, Z., Yang, X. and Liu, X. (2009), ‘Several SVM ensemble methods
integrated with under-sampling for imbalanced data learning’, Proceedings of the
Fifth International Conference on Advanced Data Mining and Applications
(ADMA’09), Beijing, China, August 17-19, pp. 536-544.
Liu, X.Y., Wu, J. and Zhou, Z.H. (2009), ‘Exploratory undersampling for class-
imbalance learning’, IEEE Transactions on Systems, Man, and Cybernetics, Part B:
Cybernetics, Vol. 39, No. 2, pp. 539-550.
Marik, P.E. (2001), ‘Aspiration pneumonitis and aspiration pneumonia’, New England
Journal of Medicine, Vol. 344, No. 9, pp. 665-671.
Metz, C.E. (1978), ‘Basic principles of ROC analysis’, Seminars in Nuclear Medicine,
Vol. 8, No. 4, pp. 283-298.
Mingers, J. (1989), ‘An empirical comparison of pruning methods for decision tree
induction’, Machine Learning, Vol. 4, No. 2, pp. 227-243.
Mitchell, T.M. (1997), Machine learning, McGraw Hill.
Mittal, B.B., Pauloski, B.R., Haraf, D.J., Pelzer, H.J., Argiris, A., Vokes, E.E.,
Rademaker, A. and Logemann, J.A. (2003), ‘Swallowing dysfunction--preventative
and rehabilitation strategies in patients with head-and-neck cancers treated with
surgery, radiotherapy, and chemotherapy: a critical review’, International Journal
of Radiation Oncology, Biology, Physics, Vol. 57, No. 5, pp. 1219-1230.
Mortensen, H.R., Jensen, K. and Grau, C. (2013), ‘Aspiration pneumonia in patients
treated with radiotherapy for head and neck cancer’, Acta Oncologica, Vol. 52, No.
2, pp. 270-276.
Obuchowski, N.A. (2003), ‘Receiver operating characteristic curves and their use in
radiology’, Radiology, Vol. 229, No. 1, pp. 3-8.
Parsons, L.S. (2001), ‘Reducing bias in a propensity score matched-pair sample using
greedy matching techniques’, Proceedings of the Twenty-Sixth Annual SAS® Users
Group International Conference, Long Beach, California, USA, April 22-25, pp.
214-226.
Rosen, A., Rhee, T.H. and Kaufman, R. (2001), ‘Prediction of aspiration in patients with
newly diagnosed untreated advanced head and neck cancer’, Archives of
Otolaryngology-Head & Neck Surgery, Vol. 127, No. 8, pp. 975-979.
Roy, T.M., Ossorio, M.A., Cipolla, L.M., Fields, C.L., Snider, H.L. and Anderson, W.H.
(1989), ‘Pulmonary complications after tricyclic antidepressant overdose’, CHEST
Journal, Vol. 96, No. 4, pp. 852-856.
Siegel, R., Naishadham, D. and Jemal, A. (2012), ‘Cancer statistics, 2012’, CA: A
Cancer Journal for Clinicians, Vol. 62, No. 1, pp. 10-29.
Valiant, L.G. (1984), ‘A theory of the learnable’, Communications of the ACM, Vol. 27, No.
11, pp. 1134-1142.
Ward, E., Jemal, A., Cokkinides, V., Singh, G.K., Cardinez, C., Ghafoor, A. and Thun,
M. (2004), ‘Cancer disparities by race/ethnicity and socioeconomic status’, CA: A
Cancer Journal for Clinicians, Vol. 52, No. 4, pp. 78-93.
Xu, B., Boero, I.J., Hwang, L., Le, Q.T., Moiseenko, V., Sanghvi, P.R., Cohen, E.E.,
Mell, L.K. and Murphy, J.D. (2015), ‘Aspiration pneumonia after concurrent
chemoradiotherapy for head and neck cancer’, Cancer, Vol. 121, No. 8, pp. 1303-1311.
Zorman, M., Eich, H.P., Kokol, P. and Ohmann, C. (2001), ‘Comparison of three
databases with a decision tree approach in the medical field of acute appendicitis’,
Studies in Health Technology and Informatics, Vol. 84, No. 2, pp. 1414-1418.