+ All Categories
Home > Documents > IEEE TRANSACTIONS ON MULTIMEDIA, VOL. XX, NO. X, …

IEEE TRANSACTIONS ON MULTIMEDIA, VOL. XX, NO. X, …

Date post: 16-Nov-2021
Category:
Upload: others
View: 6 times
Download: 0 times
Share this document with a friend
18
IEEE TRANSACTIONS ON MULTIMEDIA, VOL. XX, NO. X, XXXX 2016 1 ConfidentCare: A Clinical Decision Support System for Personalized Breast Cancer Screening Ahmed M. Alaa, Member, IEEE, Kyeong H. Moon, William Hsu, Member, IEEE, and Mihaela van der Schaar, Fellow, IEEE Abstract—Breast cancer screening policies attempt to achieve timely diagnosis by regularly screening healthy women. Various clinical decisions are needed to manage the screening process: selecting initial screening tests, interpreting test results, and deciding if further diagnostic tests are required. Such decisions are currently guided by clinical practice guidelines (CPGs), which represent a “one-size-fits-all” approach, designed to work well (on average) for a population. Since the risks and benefits of screening tests are functions of each patient’s features, person- alized screening policies tailored to the features of individuals are desirable. To address this issue, we developed ConfidentCare: a computer-aided clinical decision support system that learns a personalized screening policy from electronic health record (EHR) data. By “personalized screening policy”, we mean a clustering of women’s features, and a set of customized screening guidelines for each cluster. ConfidentCare operates by computing clusters of patients with similar features, then learning the “best” screening procedure for each cluster using a supervised learning algorithm. ConfidentCare utilizes an iterative algorithm that applies risk-based clustering of the women’s feature space, followed by learning an active classifier for every cluster. The algorithm ensures that the learned screening policy satisfies a predefined accuracy requirement with a high level of confidence for every cluster. By applying ConfidentCare to real-world data, we show that it outperforms the current CPGs in terms of cost- efficiency and false positive rates. Index Terms—Breast cancer, Confidence measures, Clinical decision support, Personalized medicine, Supervised learning. I. I NTRODUCTION P ERSONALIZED medicine is a healthcare paradigm that aims to move beyond the current “one-size-fits-all” ap- proach to medicine that does not take into account the features and traits of individual patients (e.g. their micro-biomes, environments, and lifestyles) [1]-[3]. Vast attention has been dedicated to research in personalized medicine that builds on data science and machine learning techniques to customize healthcare policies. For instance, the White House has led the “precision medicine initiative” [4], which is scheduled for discussion in the American Association for the Advancement of Science annual meeting for the year 2016 [5]. Breast cancer screening is an important healthcare process that can benefit from personalization. Screening is carried out in order to diagnose a woman with no apparent symptoms in a timely manner [6]-[10]. However, the screening process entails both A. M. Alaa, K. H. Moon, and M. van der Schaar are with the De- partment of Electrical Engineering, University of California Los Angeles, UCLA, Los Angeles, CA, 90024, USA (e-mail: [email protected], [email protected]). This work was supported by the NSF. W. Hsu is with the Department of Radiological Sciences, UCLA, Los Angeles, CA 90024, USA (email: [email protected]). benefits and costs that can differ from one patient to another [11]. This signals the need for personalized screening policies that balance such benefits and costs in a customized manner. 
In this paper we present ConfidentCare: a clinical de- cision support system (CDSS) that is capable of learning and implementing a personalized screening policy for breast cancer. The personalized screening policy is learned from data in the electronic health record (EHR), and is aimed to issue recommendations for different women with different features on which sequence of screening tests they should take. ConfidentCare discovers subgroups of “similar” patients from the EHR data, and learns how to construct a screening policy that will work well for each subgroup with a high level of confidence. Our approach can provide significant gains in terms of both the cost-efficiency, and the accuracy of the screening process as compared to other “one-size-fits-all” approaches suggested by current clinical practice guidelines (CPGs) that apply the same policy on all patients. A. Breast cancer screening and the need for personalization While breast cancer screening is believed to reduce mortal- ity rates [10], it is associated with the risks of “overscreening”, which leads to unnecessary costs, and “overdiagnosis”, which corresponds to false positive diagnoses that lead the patients to receive unnecessary treatments [11]. While different patients have different levels of risks for developing breast cancer [12]-[16], different tests have different monetary costs, and different levels of accuracy that depend on the features of the patient [17], common CPGs are aimed at populations, and are not typically tailored to specific individuals or significant subgroups [18]-[21]. Being designed to work well on “average” for a population of patients, following CPGs may lead to overscreening or overdiagnosis for specific subgroups of patients, such as young women at a high risk of developing breast cancer, or healthy older women who may have a relatively longer expected lifespan [22]. Moreover, some screening tests may work well for some patients, but not for others (e.g. a mammogram test will exhibit low accuracy for patients with high breast density [17]), which can either lead to “underdiagnosis” or poor tumor detection performance. Migrating from the “one-size-fits-all” screening and diagnosis policies adopted by CPGs to more individualized policies that recognize and approach subgroups of patients with different features is the essence of applying the personalized medicine paradigm to the breast cancer screening process [17], [22]-[25].
Transcript
Page 1: IEEE TRANSACTIONS ON MULTIMEDIA, VOL. XX, NO. X, …

IEEE TRANSACTIONS ON MULTIMEDIA, VOL. XX, NO. X, XXXX 2016 1

ConfidentCare: A Clinical Decision Support Systemfor Personalized Breast Cancer Screening

Ahmed M. Alaa, Member, IEEE, Kyeong H. Moon, William Hsu, Member, IEEE, andMihaela van der Schaar, Fellow, IEEE

Abstract—Breast cancer screening policies attempt to achievetimely diagnosis by regularly screening healthy women. Variousclinical decisions are needed to manage the screening process:selecting initial screening tests, interpreting test results, anddeciding if further diagnostic tests are required. Such decisionsare currently guided by clinical practice guidelines (CPGs), whichrepresent a “one-size-fits-all” approach, designed to work well(on average) for a population. Since the risks and benefits ofscreening tests are functions of each patient’s features, person-alized screening policies tailored to the features of individualsare desirable. To address this issue, we developed ConfidentCare:a computer-aided clinical decision support system that learnsa personalized screening policy from electronic health record(EHR) data. By “personalized screening policy”, we mean aclustering of women’s features, and a set of customized screeningguidelines for each cluster. ConfidentCare operates by computingclusters of patients with similar features, then learning the“best” screening procedure for each cluster using a supervisedlearning algorithm. ConfidentCare utilizes an iterative algorithmthat applies risk-based clustering of the women’s feature space,followed by learning an active classifier for every cluster. Thealgorithm ensures that the learned screening policy satisfies apredefined accuracy requirement with a high level of confidencefor every cluster. By applying ConfidentCare to real-world data,we show that it outperforms the current CPGs in terms of cost-efficiency and false positive rates.

Index Terms—Breast cancer, Confidence measures, Clinicaldecision support, Personalized medicine, Supervised learning.

I. INTRODUCTION

PERSONALIZED medicine is a healthcare paradigm thataims to move beyond the current “one-size-fits-all” ap-

proach to medicine that does not take into account the featuresand traits of individual patients (e.g. their micro-biomes,environments, and lifestyles) [1]-[3]. Vast attention has beendedicated to research in personalized medicine that builds ondata science and machine learning techniques to customizehealthcare policies. For instance, the White House has ledthe “precision medicine initiative” [4], which is scheduled fordiscussion in the American Association for the Advancementof Science annual meeting for the year 2016 [5]. Breast cancerscreening is an important healthcare process that can benefitfrom personalization. Screening is carried out in order todiagnose a woman with no apparent symptoms in a timelymanner [6]-[10]. However, the screening process entails both

A. M. Alaa, K. H. Moon, and M. van der Schaar are with the De-partment of Electrical Engineering, University of California Los Angeles,UCLA, Los Angeles, CA, 90024, USA (e-mail: [email protected],[email protected]). This work was supported by the NSF.

W. Hsu is with the Department of Radiological Sciences, UCLA, LosAngeles, CA 90024, USA (email: [email protected]).

benefits and costs that can differ from one patient to another[11]. This signals the need for personalized screening policiesthat balance such benefits and costs in a customized manner.

In this paper we present ConfidentCare: a clinical de-cision support system (CDSS) that is capable of learningand implementing a personalized screening policy for breastcancer. The personalized screening policy is learned fromdata in the electronic health record (EHR), and is aimed toissue recommendations for different women with differentfeatures on which sequence of screening tests they shouldtake. ConfidentCare discovers subgroups of “similar” patientsfrom the EHR data, and learns how to construct a screeningpolicy that will work well for each subgroup with a highlevel of confidence. Our approach can provide significantgains in terms of both the cost-efficiency, and the accuracy ofthe screening process as compared to other “one-size-fits-all”approaches suggested by current clinical practice guidelines(CPGs) that apply the same policy on all patients.

A. Breast cancer screening and the need for personalization

While breast cancer screening is believed to reduce mortal-ity rates [10], it is associated with the risks of “overscreening”,which leads to unnecessary costs, and “overdiagnosis”, whichcorresponds to false positive diagnoses that lead the patients toreceive unnecessary treatments [11]. While different patientshave different levels of risks for developing breast cancer[12]-[16], different tests have different monetary costs, anddifferent levels of accuracy that depend on the features of thepatient [17], common CPGs are aimed at populations, andare not typically tailored to specific individuals or significantsubgroups [18]-[21].

Being designed to work well on “average” for a populationof patients, following CPGs may lead to overscreening oroverdiagnosis for specific subgroups of patients, such as youngwomen at a high risk of developing breast cancer, or healthyolder women who may have a relatively longer expectedlifespan [22]. Moreover, some screening tests may work wellfor some patients, but not for others (e.g. a mammogram testwill exhibit low accuracy for patients with high breast density[17]), which can either lead to “underdiagnosis” or poor tumordetection performance. Migrating from the “one-size-fits-all”screening and diagnosis policies adopted by CPGs to moreindividualized policies that recognize and approach subgroupsof patients with different features is the essence of applying thepersonalized medicine paradigm to the breast cancer screeningprocess [17], [22]-[25].

Page 2: IEEE TRANSACTIONS ON MULTIMEDIA, VOL. XX, NO. X, …

IEEE TRANSACTIONS ON MULTIMEDIA, VOL. XX, NO. X, XXXX 2016 2

B. Contributions

ConfidentCare is a computer-aided clinical decision supportsystem that assists clinicians in making decisions on which(sequence of) screening tests a woman should take given herfeatures. ConfidentCare resorts to the realm of supervisedlearning in order to learn a personalized screening policy thatis tailored to granular subgroups of patients. In particular, thesystem recognizes different subgroups of patients, learns thepolicy that fits each subgroup, and prompts recommendationsfor screening tests and clinical decisions that if followed, willlead to a desired accuracy requirement with a desired levelof confidence. Fig. 1 offers a system-level illustration forConfidentCare1. The system operates in two stages: an offlinestage in which it learns from the EHR data how to clusterpatients, and what policy to follow for every cluster, and anexecution stage in which it applies the learned policy to everywoman by first matching her with the closest cluster of patientsin the EHR, and then approach her with the policy associatedwith that cluster. The main features of ConfidentCare are:• ConfidentCare discovers a set of patients’ subgroups.

Given accuracy requirements and confidence levels thatare set by the clinicians, ConfidentCare ensures that everysubgroup of patients experiences a diagnostic accuracy,and a confidence level on that accuracy, that meets theserequirements. Thus, unlike CPGs that perform well onlyon average, ConfidentCare ensures that the accuracy ishigh for every subgroup of patients.

• ConfidentCare ensures cost-efficiency, i.e. patients arenot overscreened, and the sequence of recommendedscreening tests minimizes the screening costs.

The design of ConfidentCare is grounded to a new theoreticalframework for supervised learning which entails the followingtechnical contributions:• We develop a new formulation for supervised learning

problems where the learning task entails ensuring a highconfidence level on the performance of the learner fordifferent, disjoint partitions of the feature space, ratherthan the conventional formulation of supervised learningwhich focuses only on the average performance.

• We introduce a new notion of learnability that suits thescenarios where the goal is to carry out a constrainedminimization of a cost function.

• We develop an iterative algorithm that uses breast cancerrisk assessment to partition the feature space and learns acost-sensitive, high-confidence screening policy for everypartition.

We show that ConfidentCare can improve the screeningcost-efficiency when compared with CPGs, and can offerperformance guarantees for individual subgroups of patientswith a desired level of confidence. Moreover, we show thatConfidentCare can achieve a finer granularity in its learnedpolicy with respect to the patients feature space when it isprovided with more training data. Our results emphasize thevalue of personalization in breast cancer screening process,

1We will revisit this figure and give a more detailed explanation for thesystem components in the next Section

and represent a first step towards individualizing breast cancerscreening, diagnosis and treatment.

C. Related works

1) Personalized (precision) medicine: While medicalstudies investigate the feasibility, potential and impact ofapplying the concepts of personalized medicine in the breastcancer screening process [1]-[3], [17]-[27], [30][31], none ofthese works provided specific tools or methods for buildinga personalized healthcare environment. For instance, in [17],it was shown that CPGs, which recommend screening testsonly based on the age ranges, such as the European Societyfor Medical Oncology (ESMO) CPG and the AmericanCancer Society (ACS) CPG, are not cost-efficient for manysubgroups of patients, where cost-efficiency was measuredin terms of “costs per quality-adjusted life-year”, and theauthors recommended that screening should be personalizedon the basis of a patient’s age, breast density, history of breastbiopsy, and the family history of breast cancer [28][29].Similar results were reported in other medical studies [25]-[27], all suggesting that personalization using dimensionsother than the age can yield more cost efficiency.

2) Dynamic treatment regimes: The work that relatesmost to ours is that on Dynamic treatment regimes (DTRs)[33]-[37]. A DTR is typically a sequence of decision rules,with one rule per stage of clinical intervention, where eachrule maps up-to-date patient information to a recommendedtreatment [33]. DTRs aim to find an “optimal treatmentpolicy”: a sequential mapping of the patient’s informationto recommended treatments that would maximize thepatient’s long term reward. Such policies are constructedvia reinforcement learning techniques, such as Q-learning.However, these works profoundly differ from the setting weconsider in the following aspects: 1) DTRs are only focusedon recommending treatments and do not consider screeningand diagnoses; 2) DTRs does not consider cost-efficiency inthe design of policies since they only consider the “valueof information” in recommending treatments; 3) DTRs’complexity becomes huge when the number of patient“states” increases; 4) while confidence measures can becomputed for policies in DTRs [35], the policies themselvesare not designed in a way that guarantees to the clinician acertain level of reliability for every subgroup of patients.

3) Active classification for medical diagnosis: Screeningand diagnostic clinical decisions typically involve “purchasingcostly information” for the patients, which relates to theparadigm of active learning [38]-[45]. We note that in our set-ting, clinicians “purchase” costly features of the patients ratherthan purchasing unobserved labels, which makes our settingdifferent from the conventional active learning framework[38]-[40]. Classification problems in which some features arecostly are referred to as “active classification” [41], or “activesensing” [44]. Such problems have been addressed in thecontext of medical diagnosis in [41]-[45], but all these workscorrespond to solving an unconstrained optimization problem

Page 3: IEEE TRANSACTIONS ON MULTIMEDIA, VOL. XX, NO. X, …

IEEE TRANSACTIONS ON MULTIMEDIA, VOL. XX, NO. X, XXXX 2016 3

Clinician

Patient Decide the

patient’s class

Feature space partitions and associated classifiers

FNR and confidence

level satisfied? Active

classifier

Mammogram Ultrasound MRI Screening features

Personal features

Recommended Test

Policy execution stage

Electronic Health Record (EHR)

FNR /FPR requirements

Confidence level

Learning best classifier for each

patient class

Offline policy construction

Clustering Algorithm

Learning Algorithm

Personalized screening policy

Stratify patients into

subgroups

Patient subgroups

Fig. 1: Schematic of ConfidentCare described in Section II illustrating the offline policy construction and policy execution stage for newpatients.

TABLE I: Comparison against existing literature

Method PersonalizationAccuracy and

confidenceguarantees

Cost-efficiency

DTRs Yes No NoActive

classification No No Yes

ConfidentCare Yes Yes Yes

that targets the whole population, for which no personalizedaccuracy or confidence guarantees can be provided. Table IIpositions our paper with respect to the existing literature byconsidering various aspects.

The rest of the paper is organized as follows. In SectionII, we present a high-level view for the system componentsand operation of ConfidentCare. Next, in Section III, wepresent the technical problem formulation. In Section IV, wepresent the ConfidentCare algorithm, and we carry out variousexperiments using a dataset collected at the UCLA medicalcenter in Section V. Finally, in Section VI, we draw ourconclusions.

II. CONFIDENTCARE: SYSTEM COMPONENTS ANDOPERATION

A. System operation

ConfidentCare is a computer-aided clinical decision supportsystem that learns a personalized screening policy from theEHR data. By a “personalized screening policy” we mean:

a procedure for recommending an action for the clinician totake based on the individual features of the patient, and theoutcomes of the screening tests taken by that patient. An actioncan be: letting the patient take an additional screening test,proceed to a diagnostic test (e.g. biopsy), or just recommenda regular follow-up.

The tasks that ConfidentCare carries out can be summarizedas follows:

• Discover the granularity of the patient’s population:The system is provided with training data from the EHRthat summarizes previous experiences of patients in termsof the screening tests they took, their test results, and theirdiagnoses. From such data, ConfidentCare recognizesdifferent subgroups or clusters of patients who are similarin their features and can be approached using the samescreening policy.

• Learn the best policy for each subgroup of patients:Having discovered the distinct subgroups of patients fromthe training data, ConfidentCare finds the best screeningpolicy for each of these subgroups; by a “best” policy wemean: a policy that minimizes the screening costs whilemaintaining a desired level of diagnostic accuracy, witha high level of confidence that is set by the clinicians.The more training data provided to ConfidentCare, themore “granular” the learned policy leading to increasedpersonalized recommendations for patients.

• Identify the incoming patients’ subgroups and executetheir personalized policies: After being trained, Confi-dentCare handles an incoming patient by observing her

Page 4: IEEE TRANSACTIONS ON MULTIMEDIA, VOL. XX, NO. X, …

IEEE TRANSACTIONS ON MULTIMEDIA, VOL. XX, NO. X, XXXX 2016 4

features, identifying the subgroup to which she belongs,and suggests the appropriate screening policy.

ConfidentCare can be thought of as an algorithm thatstratifies patients into clusters, and automatically generatesmultiple CPGs, one for each cluster, in order to issue the bestcustomized guidelines to follow for each cluster. The algorithmensures that the accuracy of clinical decisions for each clustersatisfy a certain requirement with a certain confidence level.

B. Idiosyncrasies of breast cancer screening

Patients’ features fall into two categories: personal features,and screening features. Personal features are observable at nocost, and are accessible without the need for taking any screen-ing tests, for that they are provided by the patient herself viaa questionnaire, etc. The personal features include numericaland categorical features such as: age, age at menarche, numberof previous biopsies, breast density, age at first child birth, andthe family history [17].

Screening tests reveal a set of costly features for the patient,which we call: the screening features. The screening featurescomprise the radiological assessment of breast images, usuallyencoded in the form of BI-RADS (Breast Imaging Report andData System) scores [28]. The BI-RADS scores take valuesfrom the set {1, 2, 3, 4A, 4B, 4C, 5, 6}, the interpretation ofwhich is given in Table II. BI-RADS scores of 3 or aboveare usually associated with followup tests or biopsy. Thedescriptions of all the personal and screening features areshown in Table III.

ConfidentCare considers three possible multimedia-basedscreening tests in the screening stage, which represent threedifferent imaging modalities: mammogram (MG), ultrasound(US), and magnetic resonance imaging (MRI). Every screeningtest is associated with different costs and risks, which are func-tions of the patients’ personal features. We consider a generalcost function that incorporates both the misclassification costsin addition to the monetary costs (the detailed cost model isprovided in the next subsection) [27]. ConfidentCare togetherwith the theoretical framework developed in this section canoperate upon a general class of features and tests, includinggenetic tests.

ConfidentCare recommends an action upon observing theoutcome of a specific screening test. The actions can include:recommend a regular (1 year) followup, recommend a diag-nostic test (biopsy), or an intermediate recommendation for anadditional (costly) screening test (short-term followup). Thefinal action recommended by the screening policy is eitherto proceed to a diagnostic test, or to take a regular followup(screening) test after 1 or 2 years. The accuracy measures thatwe adopt in this paper are: the false positive rate (FPR) andthe false negative rate (FNR), which are defined as follows:the FPR is the probability that a patient with a negative truediagnosis (benign or no tumor) is recommended to proceedto a diagnostic test, whereas the FNR is the probability thata patient with a positive true diagnosis (malignant tumor) isrecommended to take a regular followup screening test [32].

TABLE II: BI-RADS scores interpretation

Score Interpretation0 Incomplete.1 Negative.2 Benign.3 Probably benign.

4A Low suspicion for malignancy.4B Intermediate suspicion of malignancy.4C Moderate concern.5 Highly suggestive of malignancy.6 Known biopsy – proven malignancy.

TABLE III: Personal and screening features

Personal feature Description and range of values

Age information Age at screening test time-age atmenarche-age at first child birth.

Family history

Number of first degree relatives whodeveloped breast cancer (First degree

relatives are: mother, sister, anddaughter).

Number ofprevious biopsies An integer number of biopsies.

Screeningfeatures Description

Breast density

Described by four categories:• Category 1: The breast is almost

entirely fat (fibrous and glandulartissue < 25%).

• Category 2: There are scat-tered fibro-glandular densities (fi-brous and glandular tissue 25%to 50%).

• Category 3: The breast tissueis heterogeneously dense (fibrousand glandular tissue 50% to75%).

• Category 4: The breast tissueis extremely dense (fibrous andglandular tissue > 75%).

MG BI-RADS Radiological assessment of themammogram imaging.

US BI-RADS Radiological assessment of theultrasound test.

MRI BI-RADS Radiological assessment of the MRItest.

C. System components

ConfidentCare is required to deal with the environmentspecified above and carry out the three tasks mentionedearlier, which are: discovering the granularity of the patients’population, learning the appropriate policies for each subgroupof patients, and handling incoming patients by executing thelearned, personalized policy that best matches their observedfeatures and traits. In the following, we describe the Con-fidentCare algorithm, which implements those tasks usingsupervised learning.

The algorithm requires the following inputs from the clini-cian:

• A training set comprising a set of patients with theirassociated features, screening tests taken, and their truediagnoses.

• A restrictions on the maximum tolerable FNR.

Page 5: IEEE TRANSACTIONS ON MULTIMEDIA, VOL. XX, NO. X, …

IEEE TRANSACTIONS ON MULTIMEDIA, VOL. XX, NO. X, XXXX 2016 5

• A desired confidence level on the FNR in the diagnosesissued by the system.

Provided by the inputs above, ConfidentCare operates throughtwo basic stages:

• Offline policy construction stage: Given the trainingdata and all the system inputs, ConfidentCare implementsan iterative algorithm to cluster the patients’ personalfeature space, and then learns a separate active classifierfor each cluster of patients. Each active classifierassociated with a cluster of patients is designed suchthat it minimizes the overall screening costs, and meetsthe FNR and confidence requirements. The algorithmruns iteratively until it maximizes the number of patientclusters for which there exist active classifiers that canguarantee the performance and confidence requirementsset by the clinician. This ensures the maximum level ofpersonalization, i.e. ensure that the space of all patients’personal features is segmented into the finer possible setof partitions, where the performance requirements holdfor each of such partitions.

• Policy execution stage: Having learned a policy basedon the training data, ConfidentCare executes the policy byobserving the personal features of an incoming patient,associates her with a cluster (and consequently, an alreadylearned active classifier), and then the classifier associatedto that cluster handles the patient by recommendingscreening tests and observing the test outcomes, until afinal action is recommended.

Fig. 1 illustrates the components and operation of Confident-Care. In the offline policy construction stage, ConfidentCareis provided with training data from the EHR, the maximumtolerable FNR, and the desired level of confidence. Confi-dentCare runs an iterative algorithm that clusters the patients’personal feature space, and learns the best active classifier(the most cost-efficient classifier that meets the FNR accuracyand confidence requirements) for each cluster. In the policyexecution stage, ConfidentCare observes the personal featuresof the incoming patient, associates her with a patients cluster,and then recommends a sequence of screening tests to thatpatient until it issues a final recommendation.

To clarify the operation of ConfidentCare, consider thefollowing illustrative example. Assume that the set of personalfeatures are given by a tuple (Age, breast density, numberof first degree relatives with breast cancer). A patient witha personal features vector (55, 40%,0) is approached byConfidentCare. The system associates the patient with a certaincluster of patients that it has learned from the EHR data.Let the best policy for screening patients in that cluster, ascomputed by ConfidentCare, is to start with mammogram. Ifthe clinician followed such a recommendation, ConfidentCareobserved the mammogram BI-RADS score, say a score of1, and then it decides to issue a final recommendation for aregular followup. If the BI-RADS score was higher, say a scoreof 4A, then the system recommends an additional imaging test,e.g. an MRI, and then observes the BI-RADS score of the MRIbefore issuing further recommendations. The process proceeds

until a final recommendation is issued.

III. THE PERSONALIZED SCREENING POLICY DESIGN

ConfidentCare uses supervised learning to learn apersonalized screening policy from the EHR. In thissubsection, we formally present the learning model underconsideration.

1) Patients’ features: Let Xd, Xs, and Y be three spaces,where Xd is the patients’ d-dimensional personal feature space,Xs = Bs is the s-dimensional space of all screening features,where B = {1, 2, 3, 4A, 4B, 4C, 5, 6}, and Y is the space of allpossible diagnoses, i.e. Y = {0, 1}, where 0 corresponds to anegative diagnosis, and 1 corresponds to a positive diagnosis.The patients’ feature space is (d+s)-dimensional and is givenby X = Xd×Xs. Each instance in the feature space is a (d+s)-dimensional vector x = (xd,xs) ∈ X ,xd ∈ Xd,xs ∈ Xs,the entries of which correspond to the personal and screeningfeatures listed in Table III, and are drawn from an unknownstationary distribution D on X × Y , i.e. (x, y) ∼ D, wherey ∈ Y , and Dx is the marginal distribution of the patients’features, i.e. x ∼ Dx. The set of s available tests is denotedby T , where |T | = s.

The personal features are accessible by ConfidentCare withno cost, whereas the screening features are costly, for that thepatient needs to take screening tests to reveal their values.Initially, the entries of xs are blocked, i.e. they are all set toan unspecified value 〈∗〉, and they are observable only whenthe corresponding screening tests are taken, and their costsare paid. We denote the space of all possible screening testobservations as X ∗s = {B, 〈∗〉}s. ConfidentCare issues recom-mendations and decisions based on both the fully observedpersonal features xd, and a partially observed version of xs,which we denote as x∗s ∈ X ∗s . The screening feature vectorxs can indeed be fully observed, but this would be the caseonly if all the screening tests were carried out for a specificpatient.

In order to clarify the different types of features and theirobservability, consider the following illustrative example. As-sume that we only have two personal features: the age and thenumber of first degree relatives who developed breast cancer,whereas we have three screening tests T = {MG,MRI,US}.That is, we have that d = 2 and s = 3. Initially, ConfidentCareonly observes the personal features, e.g. observing a featurevector (42, 1, 〈∗〉 , 〈∗〉 , 〈∗〉) means that the patient’s age is 42years, she has one first degree relative with breast cancer,and she took no screening tests. Based on the learned policy,ConfidentCare then decides which test should the patient take.For instance, if the policy decides that the patient should take amammogram test, then the feature vector can then be updatedto be (42, 1, 2, 〈∗〉 , 〈∗〉), which means that the BI-RADS scoreof the mammogram is 2. ConfidentCare can then decide whataction should be recommended given that the BI-RADS scoreof the mammogram is 2: classify the patient as one who needsto proceed to a diagnostic test, or classify the patient as onewho just needs to take a regular followup test in a 1 yearperiod, or request an additional screening test result in orderto be able to issue a confident classification for the patient.

Page 6: IEEE TRANSACTIONS ON MULTIMEDIA, VOL. XX, NO. X, …

IEEE TRANSACTIONS ON MULTIMEDIA, VOL. XX, NO. X, XXXX 2016 6

Patientinstance

(x, y) ∼D

Sm ∼D⊗m

Learner

ActiveClassifier

δ

FNRrequirement

Confidenceparameter

η

xs

xd

Finalrecommendation

Result of test T

Request resultof test T ∈ T

Pay testcost cT

Personalfeatures

Training data

Fig. 2: Framework for the active classifier construction andoperation.

2) Active classification: The process described in the pre-vious subsection is a typical active classification process: aclassifier aims to issue either a positive or a negative diagnosis(biopsy or regular followup) for patients based on their costlyfeatures (test outcomes). Such a classifier is active in the sensethat it can query the clinician for costly feature informationrather than passively dealing with a given chunk of data [41].This setting should not be confused with conventional activelearning, where labels (and not features) are the costly pieceof information which the classifier may need to purchase[38][39]. In the following, we formally define an activeclassifier.

Definition 1: (Active classifier) An active classifier is ahypothesis (function)

h : X ∗s → Y ∪ T .

Thus, the active classifier either recommends a test in T ,or issues a final recommendation y ∈ Y , where y = 1corresponds to recommending a biopsy (positive screeningtest result) and y = 0 is recommending a regular followup(negative screening test result), given the current, partiallyobserved screening feature vector x∗s ∈ X ∗s . Whenever a testis taken, the screening feature vector is updated, based uponwhich the classifier either issues a new recommendation.

For instance, the range of the function h in our settingcan be {0, 1,MG,MRI,US}, i.e. Y = {0, 1} and T ={MG,MRI,US}. If h(x∗s) = 0 (or 1), then the classifier issues-with high confidence on the accuracy- a final recommendationfor a biopsy or a regular followup for the patient with ascreening feature vector x∗s ∈ X ∗s , whereas if h(x∗s) = MG,then the classifier recommends the patient with a screeningfeature vector x∗s to take a mammogram test. Note that ifh((〈∗〉 , 〈∗〉 , 〈∗〉)) = 0, then the classifier recommends no testsfor any patient.

3) Designing active classifiers: Designing an active clas-sifier for the breast cancer screening and diagnosis problemunder consideration cannot rely on conventional loss functions.This is because the classification problem involves costly

decision making under uncertainty, and different types ofdiagnostic errors (false negatives and false positives) have verydifferent consequences. Hence, our notion of learning needs tobe decision-theoretic, and new objective functions and learningalgorithms need to be defined and formulated.

We use an inductive bias approach for designing the ac-tive classifier; we restrict our learning algorithm to pickone hypothesis h from a specific hypothesis class H. Thatis, we compensate our lack of knowledge of the stationarydistribution D by inducing a prior knowledge on the set ofpossible hypothesis that the learning algorithm can output:a common approach for designing agnostic learners [47].Unlike the conventional supervised learning paradigm whichpicks a hypothesis that minimizes a loss function, we willdesign a learning algorithm that picks a hypothesis from H,such that the overall cost of screening is minimized, whilemaintaining the FNR to be below a predefined threshold, witha desired level of confidence; a common design objectivefor breast cancer clinical systems [29]. The screening costinvolves both the monetary costs of the screening tests, as wellas the misclassification cost reflected by the FPR. The FNRexperienced by the patients when using an active classifier his given by

FNR(h) = P (h(x∗s) = 0 |h(x∗s) ∈ Y, y = 1) , (1)

whereas the FPR is given by

FPR(h) = P (h(x∗s) = 1 |h(x∗s) ∈ Y, y = 0) . (2)

That is, the FNR is the probability that classifier h rec-ommends a regular followup (outputs a 0) for a screeningfeature vector xs, when the patient takes all the recommendedtests, given that the true diagnosis was 1, whereas the FPRis the probability that the classifier recommends a biopsy(outputs a 1) when the true diagnosis is 0. Both types oferror are very different in terms of their implications, andone can easily see that the FNR is more crucial, since itcorresponds to misdiagnosing a patient with breast cancer asbeing healthy [30]. Thus, the system must impose restrictionson the maximum tolerable FNR. On the other hand, theFPR is considered as a misclassification cost that we aim atminimizing given a constraint on the FNR [27].

Now we define the screening cost function. Let cT be themonetary cost of test T ∈ T , which is the same for all patients,and let cT be the normalized monetary cost of test T , givenby cT = cT∑

T′∈T cT ′

. Let c(h(xs)) be the total (normalized)monetary test costs that classifier h will pay in order to reacha final recommendation for a patient with screening featurevector xs. The average monetary cost of a hypothesis h isdenoted as c(h), and is given by c(h) = E [c(h(xs))] , wherethe expectation is taken over the randomness of the screeningtest results. To illustrate how the cost of a hypothesis iscomputed, consider the following example. Let the normalizedcosts of MG, US, and MRI be 0.1, 0.2 and 0.7 respectively.Initially, the classifier observes x∗s = (〈∗〉 , 〈∗〉 , 〈∗〉) . Assumea hypothesis h1 and a patient with a screening features vectorxs = (3, 1, 1). The hypothesis h1 has the following functionalform: h1((〈∗〉 , 〈∗〉 , 〈∗〉)) = MG, i.e. it initially recommends a

Page 7: IEEE TRANSACTIONS ON MULTIMEDIA, VOL. XX, NO. X, …

IEEE TRANSACTIONS ON MULTIMEDIA, VOL. XX, NO. X, XXXX 2016 7

mammogram for every patient, h1((3, 〈∗〉 , 〈∗〉)) = MRI, andh1((3, 1, 〈∗〉)) = 0. Hence, using h1, the screening cost is 0.8.Let h2 be another hypothesis with h2((〈∗〉 , 〈∗〉 , 〈∗〉)) = MG,h2((3, 〈∗〉 , 〈∗〉)) = 0. In this case, we have that c(h2) = 0.1,which is less than c(h1) = 0.8, yet it is clear that h2 has ahigher risk for a false negative diagnosis.

Let C(h) be the cost function for hypothesis h, whichincorporates both the average monetary costs and the aver-age misclassification costs incurred by h. Formally, the costfunction is defined as

C(h) = γ FPR(h) + (1− γ) c(h), (3)

where γ ∈ [0, 1] is a parameter that balances the importanceof the misclassification costs compared to the monetary cost.γ = 0 means that ConfidentCare builds the classifiers bysolely minimizing monetary costs, whereas γ = 1 means thatConfidentCare cares only about the misclassification costs. Anoptimal active classifier is denoted by h∗, and is the one thatsolves the following optimization problem

minh∈H

C(h)

s.t. FNR(h) ≤ η.(4)

Obtaining the optimal solution for (4) requires knowledge ofthe distribution D, in order to compute the average FNR andcost in (4). However, D is not available for the (agnostic)learner. Instead, the learner relies on a size-m training sampleSm = (xi, yi)i∈[m], with Sm

i.i.d∼ D⊗m, where D⊗m isthe product distribution of the m patient-diagnosis instances(xi, yi)i∈[m]. The training sample Sm feeds a learning algo-rithm A : Sm → H, where Sm is the space of all possiblesize-m training samples. The learning algorithm A simplytries to solve (4) by picking a hypothesis in H based onlyon the observed training sample Sm, and without knowing theunderlying distribution D. Fig. 2 depicts the framework forlearning and implementing an active classifier.

4) Learnability of active classifiers: In order to evaluatethe learner, and its ability to construct a reasonable solutionfor (4), we define a variant of the probably approximatelycorrect (PAC) criterion for learning active classifiers thatminimize the classification costs with a constraint on the FNR(conventional definitions for PAC-learnability can be foundin [41] and [47]). Our problem setting, and our notion oflearning depart from conventional supervised learning in thatthe learner is concerned with finding a feasible, and (almost)optimal solution for a constrained optimization problem, ratherthan being concerned with minimizing an unconstrained lossfunction.

In the following, we define a variant for the notion ofPAC-learnability, the probably approximately optimal (PAO)learnability, of a hypothesis set H that fits our problem setting.

Definition 2: (PAO-learning of active classifiers) We saythat active classifiers drawn from the hypothesis set H arePAO-learnable using an algorithm A if:

• H∗ = {h : ∀h ∈ H,FNR(h) ≤ η} 6= ∅, with h∗ =arg infh∈H∗ C(h), and h∗ ∈ H∗.

• For every (εc, ε, δ) ∈ [0, 1]3, there exists a polynomialfunction N∗H(ε, εc, δ) = poly( 1

εc, 1ε ,

1δ ), such that for

every m ≥ N∗H(ε, εc, δ), we have that

PSm∼D⊗m (C (A (Sm)) ≥ C(h∗) + εc) ≤ δ,

PSm∼D⊗m (FNR(A (Sm)) ≥ FNR(h∗) + ε) ≤ δ,

where N∗H(ε, εc, δ) is the sample complexity of the clas-sification problem.

PAO-learnability reflects the nature of the learning task of theactive classifier; a learning algorithm is “good” if it picks thehypothesis that, with a probability 1−δ, is within an ε from theregion of feasible region, and within an εc from the optimalsolution. In that sense, a hypothesis set is PAO-learnable ifthere exists a learning algorithm that can find, with a certainlevel of confidence, a probably approximately feasible andoptimal solution to (4).

The sample complexity N∗H(ε, εc, δ) does not depend onη, yet the feasibility of the optimization problem in (4), andhence the learnability of the hypothesis class, depends onboth the value of η and the hypotheses in H. From a bias-variance decomposition point of view, one can view η as arestriction on the amount of inductive bias a hypothesis setcan have with respect to the FNR, whereas ε, εc and δ arerestrictions on the true cost and accuracy estimation errorsthat the agnostic learner would encounter. The threshold ηqualifies or disqualifies the whole hypothesis set H from beinga feasible set for learning the active classifier, whereas thetuple (ε, εc, δ) decides how many training samples do we needin order to learn a qualified hypothesis set H. The notionof PAO-learnability can be thought of as a decision-theoreticvariant of the conventional PAC-learnability, since the learneris effectively solving a constrained cost-minimization problem.

5) Patients feature space partitioning: ConfidentCarelearns a different classifier separately for every subgroup of“similar” patients, which is the essence of personalization.However, the clustering of patients into subgroups is not aninput to the system, but rather a task that it has to carry out;ConfidentCare has to bundle patients into M subgroups, andto each subgroup a different active classifier that is tailored tothe features of the patients in that subgroup. The value of Mreflects the level of personalization, i.e. the larger M is, thelarger is the number of possible classifiers that are customizedfor every subgroup. Partitioning the patient’s population intosubgroups is carried out on the basis of the personal features ofthe patients; patients are categorized based on their personal,fully observable features.

Let (Xd, dx) be a metric space associated with the personalfeature space Xd, where dx is a distance metric, i.e. dx :Xd × Xd → R+. We define an M -partitioning πM (Xd, dx)over the metric space (Xd, dx) as a set of disjoint subsetsof Xd, i.e. πM (Xd, dx) = {C1, C2, . . ., CM}, where Ci ⊆ Xd,⋃Mi=1 Ci = Xd, and Cj

⋂Ci = ∅,∀i 6= j. We define a function

πM (Xd, dx;xd) as a map from the patient’s personal featurevector xd to the index of the partition to which she belongs,i.e. πM (Xd, dx;xd) = j if xd ∈ Cj .

Each partition is simply a subgroup of patients who arebelieved to be “similar”, where similarity is quantified by a

Page 8: IEEE TRANSACTIONS ON MULTIMEDIA, VOL. XX, NO. X, …

IEEE TRANSACTIONS ON MULTIMEDIA, VOL. XX, NO. X, XXXX 2016 8

distance metric. By “similar” patients, we mean patients whohave similar risks of developing breast cancer, and experiencesimilar levels of accuracy for the different screening tests.

6) Personalization and ConfidentCare’s optimizationproblem: A personalized screening policy is a tuple(πM (Xd, dx), [hj ]

Mj=1), i.e. a set of partitions over the

personal feature space and the screening guidelines associatedwith each partition. Given a certain partitioning πM (Xd, dx)of the personal feature space, the task of the learner is tolearn an active classifier hj ∈ H for each partition Cj , thatprovides (average) performance guarantees for the patients inthat partition if the size of the training set is large enough,i.e. larger than the sample complexity2. This may not befeasible if the size of the training sample is not large enoughin every partition, or if the hypothesis set has no feasiblehypothesis that have a true FNR less than η for the patientsin that partition. The following definition captures the extentof granularity with which a screening policy can handle thepatient’s population.

Definition 3: (M -personalizable problems) We say thatthe problem (H, Sm, δ, ε, εc,D) is M -personalizable if thereexists an M -partitioning πM (Xd, dx), such that for everypartition Cj ∈ πM (Xd, dx), H is PAO-learnable, and wehave that mj ≥ N∗H(ε, εc, δ), where mj =

∣∣Sjm∣∣, andSjm = |{(xi, yi) : i ∈ [m],xi,d ∈ Cj}|.That is, a problem is M -personalizable if H has a non-emptyset of feasible hypotheses for every partition, and the numberof training samples in every partition is greater than the samplecomplexity for learning H.

ConfidentCare constructs a feature space partitioning, i.e.the system recognizes the maximum number of patient sub-groups for which it can construct separate active classifiersthat meet the accuracy requirements. Designing a personalizedscreening policy involves partitioning Xd and designing anactive classifier for every partition is equivalent to . Fig.3 depicts the envisioned output of ConfidentCare for a 2Dpersonal feature space: the feature space is partitioned into4 partitions, and with each partition, an active classifier (adecision tree) is associated.

Let Π be the set of all possible partitioning maps forthe feature space as defined in (5). ConfidentCare aims atmaximizing the granularity of its screening policy by parti-tioning the feature space into the maximum possible numberof patient subgroups, such that the active classifier associatedwith each subgroup of patients ensures that the FNR of thissubgroup does not exceed η, with a confidence level of 1− δ.Thus, ConfidentCare is required to solve the optimizationproblem in (6). Once the optimal partitioning π∗M (Xd, dx) isfound by solving (6), the associated cost-optimal classifiersare constructed by solving (4).

Designing a screening policy computation algorithm isequivalent to designing a partitioning algorithm Apart : Sm →Π, and a learning algorithm A : Sjm → H. ConfidentCarewould operate by running the partitioning algorithm Apart tocreate a set of partitions of the personal feature space, and

2Note that the training set Sm is drawn from the total population of patients,but each active classifier associated with a certain partition is trained usingtraining instances that belong to that partition only.

Age

Bre

ast

den

sity

Active classifier

1 3

2 4

Personal feature

space

MG

US MRI

MRI

US

Negative

Negative Positive

Negative Negative Positive

B: BI-RADS

score

Fig. 3: An exemplary decision tree designed for a specificpatient subgroup.

then running the learning algorithm A once for each partitionin order to find the appropriate hypothesis for that partition.ConfidentCare computes an optimal screening policy if thepartitioning found by Apart is a solution to (6).

IV. CONFIDENTCARE ALGORITHM: ANALYSIS ANDDESIGN

In this section we introduce the optimal screening policyand the ConfidentCare Algorithm.

A. Optimal screening policies: analysis and technical chal-lenges

Theorem 1 provides an upper bound on the maximumnumber of clusters that can be constructed for a given dataset

Theorem 1: The maximum level of personalization thatcan be achieved for the problem (H, Sm, ε, εc, δ,D) is upper-bounded by

M∗ ≤⌊

m

N∗H(δ, ε, εc)

⌋,

where M∗ is the solution for (6).

Proof See Appendix A.

Theorem 1 captures the intuitive dependencies of the level ofpersonalization on m and (ε, εc, δ). As the training samplesize increases, a finer granularity of the screening policy canbe achieved, whereas decreasing any of (ε, εc, δ) will lead toa coarser policy that has less level of personalization.

While Theorem 1 gives an upper-bound on the possiblelevel of personalization, it does not tell whether such a boundis indeed achievable, i.e. is there a computationally-efficientpartitioning algorithm Apart, and a learning algorithm A,through which we can we construct an optimal personalizedscreening policy given a hypothesis set H and a trainingsample Sm? In fact, it can be shown that for any hypothesisclass H, the problem of finding the maximum achievablelevel of personalization in (6) is NP-hard. Thus, there is no

Page 9: IEEE TRANSACTIONS ON MULTIMEDIA, VOL. XX, NO. X, …

IEEE TRANSACTIONS ON MULTIMEDIA, VOL. XX, NO. X, XXXX 2016 9

Π =

{πM (Xd, dx) = {C1, . . ., CM}

∣∣∣∣∣∀Ci ∩ Cj = ∅,M⋃i=1

Ci = Xd, Ci ∀M ∈ {1, 2, . . ., |Xd|}

}. (5)

maxπM (Xd,dx)∈Π

M

s.t. (H, Sm, ε, δ, εc,D) is M -personalizable over πM (Xd, dx).(6)

efficient polynomial-time algorithm Apart that can find theoptimal partitioning of the personal feature space, and henceConfidentCare has to discover the granularity of the personalfeature space via a heuristic algorithm as we will show in thenext subsection.

Given that we have applied a heuristic partitioning algo-rithm Apart to the training data, and obtained a (suboptimal)partitioning πM (Xd, dx), what hypothesis set H should weuse, and what learning algorithm A should we chose inorder to learn the best active classifier for every partition?In order to answer such a question, we need to select bothan appropriate hypothesis set and a corresponding learningalgorithm. We start by studying the learnability of a specificclass of hypothesis sets.

Theorem 2: A finite hypothesis set H, with |H| < ∞, isPAO-learnable over a partition Cj ∈ πM (Xd, dx) if and onlyif infh∈H FNRj(h) ≤ η, where FNRj is the FNR of patientsin partition Cj .

Proof See Appendix B.

While the finiteness of the hypothesis set H is known to thedesigner, one cannot determine whether such a hypothesis setcan support an FNR that is less than η since the distributionD is unknown to the learner. Thus, the learnability of ahypothesis set can only be determined in the learner’s trainingphase, where the learner can infer from the training FNRestimate whether or not infh∈H FNR(h) ≤ η. Theorem 2 alsoimplies that solving the FNR-constrained cost minimizationproblem using the empirical estimates of both the cost andthe FNR will lead to a solution that with probability 1 − δwill be within εc from the optimal value, and within ε fromthe FNR constraint. Thus, an algorithm A that solves theconstrained optimization problem in (4) “empirically” is a“good” learner for the hypothesis set H. The key for theresult of Theorem 2 is that if |H| < ∞, then the FNR andcost functions are Glivenko-Cantelli classes [47], for whichthe uniform convergence property is satisfied, i.e. every largeenough training sample can be used to obtain a “faithful”estimate of the costs and the accuracies of all the hypothesesin the set H. We call the class of algorithms that solveoptimization problem in (4) using the empirical cost and FNRmeasures as empirical constrained cost-minimizers (ECCM).

B. ConfidentCare design rationale

Based on Theorem 2 and the fact that (6) is NP-hard, weknow that ConfidentCare will comprise a heuristic partitioningalgorithm Apart that obtains an approximate solution for (6),

and an ECCM learning algorithm A that picks a hypothesis inH for every partition. Since problem (6) is NP-hard, we usea Divide-and-Conquer approach to partition the feature space:we use a simple risk assessment-based 2-mean clusteringalgorithm Apart to split the a given partition in the personalfeature space, and we iteratively construct a decision tree usingA for each partition of the feature space, and then split allpartitions using Apart, until the algorithm A finds no feasiblesolution for (7) for any of the existing partitions if they are tobe split further.

The algorithm A can be any ECCM algorithm, i.e. A solvesthe following optimization problem

A(Sjm) = arg minh∈H

1

mj

∑(x,y)∈Sj

m

c (h(xs))

s.t.

∑(x,y)∈Sj

mI{h(xs) 6=y,y=1}∑

(x,y)∈SjmI{y=1}

≤ η −

√log (|H|) + log

(4δ

)2mj

,

(7)where the constraint in (7) follows from the sample complexityof H, which is N∗H (ε, εc, δ) = log(4|H|/δ)

2 min{ε2,ε2c}.

C. ConfidentCare algorithm

The inputs to ConfidentCare algorithm can be formallygiven by

• the size-m training data set Sm = (xi, yi)i∈[m].• the FNR restriction η.• the confidence level 1− δ.

The operation of ConfidentCare relies on a clustering algo-rithm that is a variant of Lloyd’s K-means clustering algorithm[48]. However, our clustering algorithm will be restricted tosplitting an input space into two clusters, thus we implement arisk assessment-based 2-means clustering algorithm, for whichwe also exploit some prior information on the input space.That is, we exploit the risk assessments computed via theGail model in order to initialize the clusters centroids [13]-[16], thereby ensuring fast convergence. Let G : Xd → [0, 1]be Gail’s risk assessment function, i.e. a mapping from apatient’s personal feature to a risk of developing breast cancer.Moreover, we use a distance metric that incorporates the riskassessment as computed by the Gail model in order to measurethe distance between patients. The distance metric used by ouralgorithm is

d(x, x′) =

d∑i=1

βi|xi,d − x′

i,d|+ βd+1|G(xd, τ)−G(x′

d, τ)|,

Page 10: IEEE TRANSACTIONS ON MULTIMEDIA, VOL. XX, NO. X, …

IEEE TRANSACTIONS ON MULTIMEDIA, VOL. XX, NO. X, XXXX 2016 10

Age

Bre

ast

den

sity

Age

Bre

ast

den

sity

Age

Bre

ast

den

sity

Personal feature space Personal feature space Personal feature space

Learned classifiers in each iteration

Fir

st i

tera

tio

n

Sec

on

d i

tera

tio

n

Th

ird

ite

rati

on

Fig. 4: Demonstration for the operation of ConfidentCare iterative algorithm. In each iteration, the personal feature space issplit and a decision tree is learned for the newly emerging partition of the space.

where G(xid, τ) is the probability that a patient with a featurevector xd would develop a breast cancer in the next τ years.The parameter β quantifies how much information from theGail model is utilized to measure the similarity betweenpatients. In our algorithm, we adopt a risk-based clusteringapproach, which assigns explicitly a weight of β to the `1-norm of the difference between feature values, and a weightof 1− β to the difference in their risk assessments. Thus, thedistance metric can be written as follows

d(x, x′) = β||x− x

′||+ (1− β)|G(x, τ)−G(x

′, τ)|.

Such a formulation explicitly merges the information extractedfrom the data (feature values), and the information extractedfrom medical domain-knowledge (risk assessment models),using a single parameter β. The value of the parameter βindicates to what extent we rely on prior (domain-knowledge)information in clustering the patients. Setting β = 0 isequivalent to stratifying the risk space, whereas β = 1 isequivalent to stratifying the feature space. The value of β needsto be learned as we show later in Section V-B.

Our clustering function, which we call Split(Xd, dx, τ, ∆), takes as inputs: a size-N subset of the personal feature space (training set) Xd = {x¹_d, x²_d, ..., x^N_d} ⊂ Xd, a distance metric dx, a Gail model parameter τ, and a precision level ∆. The function carries out the following steps:

• Compute the risk assessments {G(x^i_d, τ)}_{i=1}^N for all vectors in the (finite) input space using the Gail model. The parameter τ corresponds to the time interval over which the risk is assessed.
• Set the initial centroids to be μ1 = x^{i*}_d, where i* = argmin_i G(x^i_d, τ), and μ2 = x^{i*}_d, where i* = argmax_i G(x^i_d, τ).
• Create two empty sets C1 and C2, which represent the members of each cluster.
• Until convergence (where the stopping criterion is determined by ∆), repeat the following: assign every vector x^i_d to C1 if dx(x^i_d, μ1) < dx(x^i_d, μ2), and assign it to C2 otherwise. Update the clusters' centroids as follows:

μ_j = (1/|C_j|) Σ_{i=1}^N I{x^i_d ∈ C_j} x^i_d,  j ∈ {1, 2}.

• Return the clusters' centroids μ1 and μ2.

The rationale behind selecting the initial centroids as the feature vectors with the maximum and minimum risk assessments is that these two patients' features are the most likely to end up residing in different clusters. A detailed pseudocode for the clustering function is given in Algorithm 1. As we will show later, ConfidentCare utilizes this function to iteratively partition the personal feature space.

For a given feature space partitioning, ConfidentCare builds an active classifier that emulates a "virtual CPG" for the set of patients within the partition. Designing the active classifier is equivalent to: following an inductive bias approach in which a specific hypothesis class H is picked, and designing an algorithm A that takes the training set Sm as an input and picks the "best" hypothesis in H, i.e. A(Sm) ∈ H.

Adopting decision trees as a hypothesis set is advantageous since such a classifier is widely used and easily interpretable for medical applications [42]-[45]. As shown in Fig. 3, ConfidentCare will associate a decision tree active classifier with every partition of the personal feature space. Such a tree represents the policy to follow with patients who belong to that partition: which tests to recommend, and how to map the BI-RADS scores resulting from one test to a new test recommendation or a diagnostic decision.

Learning the optimal decision tree h* ∈ H is known to be an NP-hard problem [49]. Thus, we resort to a greedy algorithm A, which we call the confidence-based cost-sensitive decision tree induction algorithm (ConfidentTree).


Algorithm 1: Split(Xd, dx, τ, ∆)
1 Input: A set of N training vectors Xd, a distance metric dx, a Gail model parameter τ, and a precision level ∆.
2 Output: Two centroids μ1 and μ2.
3 Initialize D_{−1} = 1, D_0 = 0, k = 0, and μ1 = x^{i*}_d, i* = argmin_i G(x^i_d, τ);
4 μ2 = x^{i*}_d, i* = argmax_i G(x^i_d, τ);
5 C1 = ∅, C2 = ∅;
6 while (D_{k−1} − D_k)/D_k > ∆ do
7   C1 = {x^i_d | ∀x^i_d ∈ Xd, dx(x^i_d, μ1) < dx(x^i_d, μ2)};
8   C2 = Xd \ C1;
9   μ1 = (1/|C1|) Σ_{i=1}^N I{x^i_d ∈ C1} x^i_d;
10  μ2 = (1/|C2|) Σ_{i=1}^N I{x^i_d ∈ C2} x^i_d;
11  Set k ← k + 1;
12  Compute the 2-means objective function D_k = (1/N) Σ_{j=1}^2 Σ_{i=1}^N I{x^i_d ∈ C_j} dx(x^i_d, μ_j);
13 end
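For concreteness, the following Python sketch mirrors Algorithm 1: the two centroids are seeded at the minimum- and maximum-risk patients and refined by alternating assignment and averaging until the relative drop in the 2-means objective falls below ∆ (degenerate cases such as an empty cluster are not handled; all names are illustrative).

```python
import numpy as np

def split_two_means(X, risk, dist, delta_stop):
    """Risk-initialized 2-means (Algorithm 1 sketch). X: (N, d) array of
    personal feature vectors; risk: length-N Gail assessments; dist: the
    hybrid metric; delta_stop: the precision level Delta."""
    mu = [X[np.argmin(risk)].copy(), X[np.argmax(risk)].copy()]
    d_prev = np.inf
    while True:
        # Assign each patient to the closer centroid.
        labels = np.array([0 if dist(x, mu[0]) < dist(x, mu[1]) else 1
                           for x in X])
        mu = [X[labels == j].mean(axis=0) for j in (0, 1)]
        d_curr = np.mean([dist(x, mu[j]) for x, j in zip(X, labels)])
        if (d_prev - d_curr) <= delta_stop * d_curr:
            return mu  # relative improvement below Delta: converged
        d_prev = d_curr
```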

The main idea of ConfidentTree is to select tests (nodes of the tree) in a greedy manner using a splitting rule that operates as follows: in each step, label the leaves that come out of each possible test such that the pessimistic estimate of the FNR (given the confidence level 1 − δ) is less than η, and then pick the test that maximizes the ratio between the information gain and the test cost. After growing such a tree, we apply post-pruning based on confidence intervals of the error estimates [50]. If there is no labeling of the tree leaves that satisfies the FNR requirement, the algorithm reports the infeasibility of the FNR and confidence levels set by the clinician given the training set provided to the program. More precisely, the algorithm ConfidentTree(Sm, πM(Xd, dx), j, η, 1 − δ) takes the following inputs:

• the size-m training set Sm;
• the personal feature space partitioning πM(Xd, dx);
• the index j of the partition for which we are designing the active classifier;
• the FNR constraint η; and
• the confidence level 1 − δ.

Given these inputs, the algorithm executes the following steps:

• Extract the training instances that belong to partition Cj.
• Grow a decision tree whose nodes are the screening tests in T. The edges are the BI-RADS scores with the following thresholds: BI-RADS < 3, BI-RADS ∈ {3, 4}, and BI-RADS > 4. This classification is based on domain knowledge [28]; the first category corresponds to a probably negative diagnosis, the second to a suspicious outcome, and the third to a probably malignant tumor.
• Split the tree attributes as follows: for each test, label the leaves such that the pessimistic estimate of the FNR (see [50] for confidence intervals and error estimates in the C4.5 algorithm) is equal to η, then compute the cost function for each test, and select the test that maximizes the ratio between the information gain and the cost function.
• Apply post-pruning based on confidence intervals of the error estimates, as in the C4.5 algorithm [50]. This step is carried out in order to avoid overfitting.
• Report the infeasibility of constructing a decision tree with the given FNR and confidence requirements if the pessimistic estimate of the FNR exceeds η.

A detailed pseudocode for ConfidentTree is given in Algorithm 2. ConfidentCare invokes this algorithm whenever the personal feature space is partitioned and the active classifiers need to be constructed.

Algorithm 2: ConfidentTree(Sm, πM(Xd, dx), j, η, 1 − δ)
1 Input: A set of training instances Sm, a partitioning πM(Xd, dx), a partition index j, maximum tolerable FNR η, and confidence level 1 − δ.
2 Output: A cost-sensitive decision tree hj that can be used as an active classifier for partition Cj.
3 Let B1 be the event that BI-RADS < 3, B2 that BI-RADS ∈ {3, 4}, and B3 that BI-RADS > 4;
4 Extract the training set that belongs to the targeted partition: S^j_m = {(xi, yi) | ∀i ∈ [m], x_{i,d} ∈ Cj};
5 For each test, label the leaves attached to edges B1, B2, and B3 such that the empirical FNR is less than the solution F of the following equation:

η = [ F + Q⁻¹(δ)²/(2n) + Q⁻¹(δ) √( F/n − F²/n + Q⁻¹(δ)²/(4n²) ) ] / [ 1 + Q⁻¹(δ)²/n ],

where Q(·) is the Q-function and n is the number of training instances covered by the leaves for which the classification is 1;
6 Given this labeling, let Fp be the empirical value of the false positive rate; then pick the test s ∈ T that maximizes I(s; S^j_m) / (γ Fp + (1 − γ) c_s), where I(x; y) is the mutual information between x and y;
7 Apply post-pruning using confidence intervals for error estimates: a node's sub-tree is pruned (replaced by a leaf) if the error estimate of the node is no higher than the error estimate of its induced sub-tree.
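The leaf-labeling rule in line 5 can be implemented by inverting the confidence bound numerically, or, equivalently (since the bound is increasing in F), by computing the pessimistic (upper-confidence) FNR of a candidate labeling and checking it against η, as sketched below. Note that Q⁻¹(δ) = Φ⁻¹(1 − δ) for the standard normal CDF Φ; the SciPy call reflects this, and the function name is illustrative.

```python
from scipy.stats import norm

def pessimistic_fnr(f_hat, n, delta):
    """C4.5-style upper confidence bound on the FNR of a labeling:
    f_hat is the empirical FNR, n the number of training instances
    covered by leaves labeled 1. A labeling is admissible when this
    bound is at most eta."""
    z = norm.ppf(1.0 - delta)  # Q^{-1}(delta)
    radicand = f_hat / n - f_hat ** 2 / n + z ** 2 / (4.0 * n ** 2)
    upper = f_hat + z ** 2 / (2.0 * n) + z * radicand ** 0.5
    return upper / (1.0 + z ** 2 / n)
```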

ConfidentCare uses the modules ConfidentTree and Split in order to iteratively partition the feature space and construct active classifiers for each partition. ConfidentCare runs in two stages: the offline policy computation stage and the policy execution stage. In the offline policy computation stage, the following steps are carried out:

1) Use the Split function to split all current partitions of the personal feature space.
2) Use ConfidentTree to create new active classifiers for the split partitions; if constructing a decision tree for a specific partition is infeasible, stop splitting this partition, otherwise go to step (1).

After computing the policy, ConfidentCare handles the incoming patients in the policy execution stage as follows:


1) Observe the personal features of the incoming patient, measure the distance between her feature vector and the centroids of the learned partitions, and associate her with the closest partition and its active classifier.
2) Apply active classification to the patient. After each test outcome, ConfidentCare prompts a recommended test (the next node in the decision tree) and an intermediate diagnosis together with an associated confidence interval. The clinician and the patient then decide whether or not to proceed and take the next test.
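Step 1 of the execution stage is a plain nearest-centroid lookup; a minimal sketch (names illustrative):

```python
import numpy as np

def assign_partition(x_new, centroids, dist):
    """Route an incoming patient to the partition, and hence the
    active classifier, of the nearest learned centroid."""
    return int(np.argmin([dist(x_new, mu) for mu in centroids]))
```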

The pseudocode for ConfidentCare in both the offline and online modes is given in Algorithm 3. As shown by the theorems proved in the Appendices, the greedy ConfidentCare algorithm can guarantee a reasonable performance.

Algorithm 3: ConfidentCare(Sm, δ, η)
1 Input: A training set Sm, required confidence level δ, and FNR constraint η.
2 Output: A sequence of recommendations, intermediate diagnoses with confidence intervals, and a final diagnosis.
3 Offline policy computation stage:
4 Initialize M = ∞, q = 0;
5 Initialize μ = ∅ (set of centroids of the personal feature space);
6 Hyper-parameters τ, γ, and ∆ can be tuned through a validation set;
7 while q ≠ M do
8   M = |μ|;
9   Create a partitioning Part(Xd, dx) based on the centroids in μ;
10  For j = 1 to M:
11    μ → Split(Xd, dx, τ, ∆);
12    hj = ConfidentTree(Sm, πM(Xd, dx), j, η, 1 − δ);
13    If hj is infeasible: q ← q + 1;
14  EndFor
15 end
16 Policy execution stage:
17 For the incoming patient i, find the partition she belongs to by computing the distance dx(x_{i,d}, μ_j) for every partition Cj, and associate her with the partition j* that gives the minimum distance;
18 Use classifier h_{j*} to recommend tests and issue diagnoses.
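The offline stage can equivalently be viewed as the recursive divide-and-conquer loop described earlier in this section; the sketch below makes that view explicit, with `split_fn` standing in for Split and `tree_fn` for ConfidentTree (both hypothetical callables, `tree_fn` returning None when no confident tree exists).

```python
def offline_policy(partition, eta, delta, split_fn, tree_fn):
    """Recursively split a partition while confident trees exist for
    both halves; otherwise keep the partition with its own tree."""
    left, right = split_fn(partition)
    t_left = tree_fn(left, eta, delta)
    t_right = tree_fn(right, eta, delta)
    if t_left is None or t_right is None:
        # A further split is infeasible: keep this partition's tree.
        return [(partition, tree_fn(partition, eta, delta))]
    return (offline_policy(left, eta, delta, split_fn, tree_fn)
            + offline_policy(right, eta, delta, split_fn, tree_fn))
```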

Fig. 4 demonstrates the operation of the iterative algorithm: in each iteration, partitions are split as long as decision trees for the new partitions are feasible, and the corresponding decision trees are learned. The end result is a set of decision trees for the different partitions, representing the different policies to be followed for every class of patients. Following the CPGs corresponds to having a single decision tree for the entire personal feature space, which may consistently perform poorly over specific partitions of the feature space, i.e. specific subgroups of patients.

Fig. 5: Optimal selection of the distance metric parameter β for η = 0.1 and δ = 0.05.

V. CONFIDENTCARE IN UCLA MEDICAL CENTER

In this section, we use real-world data to illustrate the performance gains achievable with the use of ConfidentCare for breast cancer screening. Moreover, we evaluate the performance of ConfidentCare and the added value of personalization by comparing it with CPGs and with policies that are designed in a "one-size-fits-all" fashion.

A. Real-World Dataset for Breast Cancer Patients

A de-identified dataset of 25,594 individuals who underwent screening via mammograms (MG), magnetic resonance imaging (MRI), and ultrasound (US) at the UCLA medical center is utilized to gain insight into the performance of ConfidentCare. The features associated with each individual are: age, breast density, ethnicity, gender, family history, age at menarche, and age at first child birth. Each individual has undergone at least one of the three screening tests: an MG, an MRI, a US, or a combination of those. A BI-RADS score is associated with each test taken. Table IV shows the entries of the dataset and the features associated with every patient. The dataset is labeled by 0 for patients who have a negative diagnosis, and 1 for patients who have a positive diagnosis (malignant tumor). The dataset is imbalanced (or biased): there are significantly more instances of MG compared with US or MRI, and most patients exhibited negative test results. Table V lists the percentages of patients who took each screening test, and the percentage of patients with positive diagnoses. All features were converted into numerical values and normalized. The normalized monetary costs for MG, US, and MRI were set to 0.1, 0.2, and 0.7, respectively, and γ is set to 0.5. In the following subsection, we demonstrate the operation of ConfidentCare.
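With these normalized costs, the ConfidentTree splitting score of Algorithm 2, line 6 (information gain per unit of FPR-weighted cost) can be evaluated directly; a small illustrative helper, using the costs and γ reported above:

```python
TEST_COSTS = {"MG": 0.1, "US": 0.2, "MRI": 0.7}  # normalized costs above
GAMMA = 0.5

def split_score(info_gain, fpr, test):
    """Score of candidate test s: I(s; S) / (gamma*Fp + (1-gamma)*c_s)."""
    return info_gain / (GAMMA * fpr + (1.0 - GAMMA) * TEST_COSTS[test])
```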

B. Learning the distance metric

Recall from Section IV that clustering of the patients' personal feature space was carried out using a distance metric that combines both the feature values and the risk assessments computed by the Gail risk model through the parameter β.


TABLE IV: De-identified breast cancer screening tests dataset

Patient ID | Age | Breast density                  | Ethnicity | Gender | Family history  | MG BI-RADS | MRI BI-RADS | US BI-RADS
1          | 71  | Almost entirely fat (<25%)      | U         | F      | Maternal Aunt   | 1          | -           | -
2          | 72  | Almost entirely fat (<25%)      | U         | F      | Maternal Cousin | 2          | 1           | -
3          | 60  | Heterogeneously dense (51%-75%) | B         | F      | -               | 2          | -           | -
4          | 66  | Almost entirely fat (<25%)      | W         | F      | Sister          | 1          | -           | -
5          | 56  | Heterogeneously dense (51%-75%) | W         | F      | -               | 1          | -           | -
...        | ... | ...                             | ...       | ...    | ...             | ...        | ...         | ...
11,733     | 39  | Heterogeneously dense (51%-75%) | A         | F      | -               | 2          | 1           | -
...        | ... | ...                             | ...       | ...    | ...             | ...        | ...         | ...
25,594     | 67  | Heterogeneously dense (51%-75%) | W         | F      | Mother          | 2          | 2           | 1

TABLE V: Statistics for the dataset involved in the experiments

Category                      | Percentage
MG BI-RADS                    | 93.39%
MRI BI-RADS                   | 2.75%
US BI-RADS                    | 9.21%
Patients with malignant tumor | 8.33%

Fig. 6: Histogram of the Gail risk assessments for patients in the dataset.

Setting the parameter β = 0 corresponds to risk stratification, whereas setting β = 1 corresponds to stratifying the personal feature space while disregarding the prior information provided by the Gail model. Since the Gail model does not incorporate all of the patients' features (e.g. family history), one expects the best choice of β to lie strictly between 0 and 1, since both the personal features and the risk assessments of the patients contain (non-redundant) information about patients' similarity. As shown in Fig. 5, for an FNR constraint of η = 0.1 and a confidence parameter of δ = 0.05, we find that β = 0.75 is the best choice for the distance metric, since it minimizes both the FNR and the FPR.

Fig. 7: The expected number of partitions (clusters) of the personal feature space versus the size of the training set.

This means that for η = 0.1 and δ = 0.05, it is better to incorporate more information from the personal features than from the risk assessment. Our interpretation of this result is that, since most of the patients in the dataset have low to average risk, as shown in the histogram plotted in Fig. 6, the information contained in the Gail risk assessment alone is not enough to differentiate between patients and bundle them into clusters. Therefore, we use the value β = 0.75 when running ConfidentCare in all the experiments. All average performance measures in this paper were obtained via 50-fold cross-validation.
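The grid search over β described above can be sketched as follows; `train_fn` and `eval_fn` are hypothetical stand-ins for fitting the full ConfidentCare pipeline on a fold and returning its (FNR, FPR) pair on the held-out part.

```python
import numpy as np

def tune_beta(train_fn, eval_fn, folds, betas=np.linspace(0.0, 1.0, 21)):
    """Pick the beta that minimizes the cross-validated FNR + FPR."""
    avg_err = [np.mean([sum(eval_fn(train_fn(tr, b), te))
                        for tr, te in folds]) for b in betas]
    return float(betas[int(np.argmin(avg_err))])
```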

C. ConfidentCare Performance Evaluation

In this subsection, we investigate the operation and performance of ConfidentCare in terms of clustering and policy construction, endured monetary costs, and accuracy. As we can see in Fig. 7, ConfidentCare can (on average) discover more subgroups of patients for whom it can construct a screening policy with the desired confidence level as the size of the training data increases.


Fig. 8: Average normalized monetary cost endured by ConfidentCare for patients with different risk assessments.

Fig. 9: FNR and FPR of ConfidentCare for different partitions of the personal feature space.

This agrees with our expectation that the more training examples are provided to ConfidentCare, the larger the number of clusters that can be constructed with guaranteed performance bounds. Note that for different settings of the constraint η, the possible levels of stratification differ. For a fixed size of the training data, as the FNR constraint becomes tighter, the level of personalization decreases. For instance, we can see in Fig. 7 that the expected number of partitions for η = 0.2 is greater than that for η = 0.1, whereas for η = 0.02 the system can never find any feasible partitioning of the feature space regardless of the size of the training data.

Fig. 8 shows the average (normalized) monetary costs endured by ConfidentCare for patients with different risk assessments. As the risk level increases, the costs increase accordingly, since ConfidentCare recommends more tests (including the expensive MRI test) to patients with a high risk of developing breast cancer. As seen, the personalized screening policy is different for each cluster.

In Fig. 9, we plot the FNR and FPR with respect to every partition constructed by the algorithm in a specific realization of ConfidentCare that discovered 4 partitions. It is clear that the FNR satisfies the constraint of η = 0.1 for all partitions. The FPR varies across partitions; for instance, partition 2 has an FPR of 0, whereas the other partitions have non-zero FPRs. In Fig. 10, we show the partitions (in a 2D subspace of the original personal feature space) and the constructed policy corresponding to each cluster. It can be seen that patients who are young and have low breast density are recommended to take no tests, whereas the other subgroups are recommended to take an MG test. We also note that the policy is more "aggressive" for patients with high breast density: for partition 3, a relatively low BI-RADS score from an MG can still lead to a recommendation for an additional US or an MRI, whereas for the other subgroups the policy is more conservative in terms of recommending additional screening tests. This is because higher breast densities make tumor detection more difficult.

Note that Fig. 9 represents just a single realization of ConfidentCare, and thus it does not reveal the amount of confidence we have in the algorithm satisfying the FNR constraint with high probability. In order to verify the confidence level of the policy constructed by ConfidentCare, we ran the algorithm 100 times and measured the fraction of runs in which the FNR on the testing set for any partition exceeds the threshold η. As shown in Fig. 11, this fraction is bounded by the specified confidence level δ.

D. ConfidentCare and Standard CPGs

We compare the performance of ConfidentCare with that of current clinical guidelines in order to assess the value of personalization in terms of cost-efficiency. Specifically, we compare the monetary cost of ConfidentCare with that of the American Cancer Society (ACS) screening guidelines issued in 2015 [51]. The reason for selecting this specific CPG is that it already applies a coarse form of risk stratification: low-, average-, and high-risk women are recommended different sets of tests. In Fig. 12, we plot the distribution of the normalized monetary cost of ConfidentCare together with that of the ACS over different levels of risk. ConfidentCare is expected to reduce screening costs since it supports a finer stratification of the patients, and thus recommends screening tests only to patients who need them based on their features and previous test results. The comparison in Fig. 12 is, of course, subject to the selection of η and δ by clinicians (or institutions): the more we relax the FNR and confidence constraints, the more savings we attain in terms of monetary costs.

Finally, we compare the accuracy of ConfidentCare with that of a single decision tree of tests that is designed in a "one-size-fits-all" fashion. In particular, we build a tree of tests using the well-known C4.5 algorithm [50], and then compare its performance with that of ConfidentCare with respect to every partition found by ConfidentCare. From Fig. 13, we can see that for the same realization illustrated in Figs. 9 and 10, both approaches have a comparable FNR, but ConfidentCare outperforms the single decision tree in terms of the FPR for all 4 partitions.


Fig. 10: The personal feature space partitions and the corresponding screening policy.

Fig. 11: The probability that the FNR of ConfidentCare is below η versus the confidence parameter δ.

Fig. 12: Average normalized monetary cost versus risk assessment for ConfidentCare and the ACS guidelines.

This is because ConfidentCare deals differently with women belonging to different subgroups, as shown in Fig. 10; for instance, women in partition 2 are not recommended to take any tests.

Fig. 13: FNR and FPR of ConfidentCare and a single decision tree of screening tests.

TABLE VI: FNR and FPR for ConfidentCare (with η = 0.1 and δ = 0.05) and a single C4.5 decision tree

Algorithm                 | FNR    | FPR
Single C4.5 decision tree | 0.0501 | 0.0488
ConfidentCare             | 0.0512 | 0.037

In other words, ConfidentCare avoids recommending unnecessary tests, which reduces the rate of false positives. The average values of the FNR and FPR over 50 runs of ConfidentCare and a single decision tree are reported in Table VI, which shows a gain of 31.91% with respect to the FPR.

E. Discussion and future work

The screening policy we developed aims at managing the short-term screening procedure: the policy recommends a sequence of screening tests for the patient based on the outcomes of those tests, and such tests are expected to be taken within a relatively short time interval. Our framework can be extended to design policies that are concerned with long-term patient outcomes, and that are capable not only of recommending tests to the patient, but also of recommending the frequency with which screening should be carried out for different subgroups of patients.


Fig. 14: Risk assessment over time for the representative patients (centroids) constructed by ConfidentCare.

To see how our framework can be extended to handle such a setting, we plot in Fig. 14 the risk of developing breast cancer over time for the representative patients (centroids) of 4 clusters constructed in one realization of the algorithm. Each cluster exhibits a different rate of risk growth over time: for instance, while clusters 3 and 4 in Fig. 14 comprise women of almost the same age, patients in cluster 3 develop a risk for breast cancer more quickly than patients in cluster 4 due to other factors (e.g. family history). Thus, ConfidentCare can be modified not only to recommend a sequence of tests to patients in different clusters, but also to compute the optimal frequency of screening (the steps over time at which the patient needs to be regularly screened) that would maximize a long-term objective function. Intuitively, the frequency of screening would depend on the slope of the risk assessment over time: clusters with steeper slopes would demand more frequent screening. Our framework is well suited to capture such a setting, and the ConfidentCare algorithm can be modified to construct a screening policy that maximizes long-term outcomes with high levels of confidence.

VI. CONCLUSIONS

In this paper, we developed ConfidentCare: a clinical decision support system that learns a personalized screening policy from electronic health record data. ConfidentCare operates by stratifying the space of a woman's features and learning cost-effective and accurate personalized screening policies with guaranteed performance bounds. The ConfidentCare algorithm iteratively stratifies the patients' feature space into disjoint clusters and learns an active classifier associated with each cluster. We have shown that the proposed algorithm improves the cost-efficiency and accuracy of the screening process compared to current clinical practice guidelines and to state-of-the-art algorithms that do not consider personalization.

ACKNOWLEDGMENT

We would like to thank Dr. Camelia Davtyan (Ronald Reagan UCLA Medical Center) for her valuable help and precious comments on the medical aspects of the paper. We also thank Dr. William Hoiles (UCLA) for the valuable discussions that we had with him on this paper.

APPENDIX A
PROOF OF THEOREM 1

Recall that learning the optimal hypothesis for every partition requires the same sample complexity N*_H(δ, ε, ε_c). Since the training set has m samples, the maximum number of partitions is achieved if we can find a partitioning of the personal feature space π_{M*}(Xd, dx) such that:

1) The hypothesis set H is PAC-learnable for every partition, i.e. inf_{h∈H} FNR_j(h) ≤ η for every partition Cj in π_{M*}(Xd, dx), and H has a finite VC-dimension [47].
2) There are N*_H(δ, ε, ε_c) training samples in every partition.

Condition (1) has to be satisfied by any hypothesis class that can attain the optimal level of personalization, since otherwise problem (4) is infeasible, whereas condition (2) implies that we can have at most M* = ⌊m / N*_H(δ, ε, ε_c)⌋ partitions in π_{M*}(Xd, dx).

APPENDIX B
PROOF OF THEOREM 2

The theorem states that for all finite hypothesis classes H with |H| < ∞, a sufficient condition for PAO-learnability is that inf_{h∈H} FNR(h) ≤ η. If H is PAO-learnable, then problem (4) has a feasible solution, which is only possible if inf_{h∈H} FNR(h) ≤ η; thus, for any PAO-learnable finite hypothesis class H, the condition inf_{h∈H} FNR(h) ≤ η has to be satisfied.

Now we prove the converse, and show that for every finite hypothesis set H, the condition inf_{h∈H} FNR(h) ≤ η implies PAO-learnability. Recall from the definition of PAO-learnability that H is learnable if

1) H* = {h : ∀h ∈ H, FNR_j(h) ≤ η} ≠ ∅, with h* = arg inf_{h∈H*} C(h).
2) For every (ε_c, ε, δ) ∈ [0, 1]³, there exists a polynomial function N*_H(ε, ε_c, δ) = poly(1/ε_c, 1/ε, 1/δ) such that for any m ≥ N*_H(ε, ε_c, δ), there is a learning algorithm A for which

P_{Sm∼D⊗m}( C(A(Sm)) ≥ C(h*) + ε_c ) ≤ δ,
P_{Sm∼D⊗m}( FNR(A(Sm)) ≥ FNR(h*) + ε ) ≤ δ.

Condition (1) is already satisfied since inf_{h∈H} FNR(h) ≤ η. Thus, it remains to show that the finiteness of the hypothesis set implies that condition (2) is satisfied. Given that inf_{h∈H} FNR(h) ≤ η, which implies the feasibility of the learning problem, it suffices to prove that the functions FNR_j(h) and C_j(h) are Glivenko-Cantelli classes (i.e. classes that exhibit uniform convergence for any distribution D) with respect to partition Cj and the hypothesis set H in order to prove learnability. Note that the FNR and the cost functions are Glivenko-Cantelli classes if

lim_{m→∞} P( sup_{h∈H} |FNR_j(h) − F̂NR_j(h, S^j_m)| = 0 ) = 1,
lim_{m→∞} P( sup_{h∈H} |C_j(h) − Ĉ_j(h, S^j_m)| = 0 ) = 1,

where F̂NR_j(h, S^j_m) = Σ_{(x,y)∈S^j_m} I{h_j(x_s) ≠ y, y = 1} / Σ_{(x,y)∈S^j_m} I{y = 1} is the empirical FNR of hypothesis h measured on the sample S^j_m, and similarly Ĉ_j(h, S^j_m) is the empirical cost. In the following, we prove that both the FNR and the cost function exhibit uniform convergence. For uniform convergence to hold, the following conditions must be satisfied:

P_{Sm∼D⊗m}( ⋀_{h∈H} ( |F̂NR_j(h, S^j_m) − FNR_j(h)| ≤ ε ) ) ≥ 1 − δ,
P_{Sm∼D⊗m}( ⋀_{h∈H} ( |Ĉ_j(h, S^j_m) − C_j(h)| ≤ ε_c ) ) ≥ 1 − δ,

which can be combined as

P_{Sm∼D⊗m}( ⋀_{h∈H} ( |Ĉ_j(h, S^j_m) − C_j(h)| ≤ ε_c ∧ |F̂NR_j(h, S^j_m) − FNR_j(h)| ≤ ε ) ) ≥ 1 − δ,    (B.2)

rewritten as a union of events as

P_{Sm∼D⊗m}( ⋁_{h∈H} ( |Ĉ_j(h, S^j_m) − C_j(h)| ≥ ε_c ∨ |F̂NR_j(h, S^j_m) − FNR_j(h)| ≥ ε ) ) ≤ δ,    (B.3)

and then upper-bounded using a union bound followed by an application of Hoeffding's inequality:

P_{Sm∼D⊗m}( ⋁_{h∈H} ( |Ĉ_j(h, S^j_m) − C_j(h)| ≥ ε_c ∨ |F̂NR_j(h, S^j_m) − FNR_j(h)| ≥ ε ) )
  ≤ Σ_{h∈H} P_{Sm∼D⊗m}( |Ĉ_j(h, S^j_m) − C_j(h)| ≥ ε_c ∨ |F̂NR_j(h, S^j_m) − FNR_j(h)| ≥ ε )    (B.4)
  ≤ 2|H| ( exp(−2 m_j ε²) + exp(−2 m_j ε_c²) )    (B.5)
  ≤ 4|H| exp(−2 m_j min{ε_c², ε²}).    (B.6)

From (B.6), it can be seen that condition (2) is satisfied for any finite hypothesis set with |H| < ∞, for which N*_H(ε, ε_c, δ) = log(4|H|/δ) / (2 min{ε², ε_c²}).
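For completeness, the last step is the standard rearrangement of (B.6): requiring the right-hand side to be at most δ and solving for m_j yields the stated sample complexity.

```latex
4\,|\mathcal{H}|\, e^{-2 m_j \min\{\varepsilon_c^2,\, \varepsilon^2\}} \le \delta
\;\Longleftrightarrow\;
m_j \ge \frac{\log\!\left(4|\mathcal{H}|/\delta\right)}{2\min\{\varepsilon^2, \varepsilon_c^2\}}
 = N^{*}_{\mathcal{H}}(\varepsilon, \varepsilon_c, \delta).
```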

REFERENCES

[1] M. A. Hamburg and F. S. Collins, "The path to personalized medicine," New England Journal of Medicine, vol. 363, no. 4, pp. 301-304, Jul. 2010.
[2] L. Chin, J. N. Andersen, and P. A. Futreal, "Cancer genomics: from discovery science to personalized medicine," Nature Medicine, vol. 17, no. 3, pp. 297-303, 2011.
[3] L. Hood and S. H. Friend, "Predictive, personalized, preventive, participatory (P4) cancer medicine," Nature Reviews Clinical Oncology, vol. 8, no. 3, pp. 184-187, Mar. 2011.
[4] https://www.whitehouse.gov/precision-medicine.
[5] D. Patil, "The White House Precision Medicine Initiative: Technology Turned Innovative Solutions," AAAS Annual Meeting 2016, AAAS, Feb. 11-15, 2016.
[6] M. X. Ribeiro, A. J. M. Traina, C. Traina, and P. M. Azevedo-Marques, "An Association Rule-Based Method to Support Medical Image Diagnosis With Efficiency," IEEE Trans. Multimedia, vol. 10, no. 2, pp. 277-285, Feb. 2008.
[7] F. Liu, Y. Zhang, S. Liu, B. Zhang, Q. Liu, Y. Yang, B. Zhang, J. Luo, B. Shan, and J. Bai, "Monitoring of Tumor Response to Au Nanorod-Indocyanine Green Conjugates Mediated Therapy With Fluorescence Imaging and Positron Emission Tomography," IEEE Trans. Multimedia, vol. 15, no. 5, pp. 1025-1030, Aug. 2013.
[8] L. Tabar, et al., "Reduction in mortality from breast cancer after mass screening with mammography: randomised trial from the Breast Cancer Screening Working Group of the Swedish National Board of Health and Welfare," The Lancet, vol. 325, no. 8433, pp. 829-832, 1985.
[9] H. Weedon-Fekjaer, R. R. Pal, and J. V. Lars, "Modern mammography screening and breast cancer mortality: population study," BMJ, vol. 348, no. 3701, pp. 1-8, 2014.
[10] C. Harding, F. Pompei, D. Burmistrov, H. G. Welch, R. Abebe, and R. Wilson, "Breast cancer screening, incidence, and mortality across US counties," JAMA Internal Medicine, vol. 175, no. 9, pp. 1483-1489, Sep. 2015.
[11] N. J. Wald, "Guidance on terminology," Journal of Medical Screening, vol. 15, no. 1, pp. 50-50, 2008.
[12] B. B. Spear, M. Heath-Chiozzi, and J. Huff, "Clinical application of pharmacogenetics," Trends Mol. Med., vol. 7, no. 5, pp. 201-204, May 2001.
[13] J. A. Tice, et al., "Mammographic breast density and the Gail model for breast cancer risk prediction in a screening population," Breast Cancer Research and Treatment, vol. 94, no. 2, pp. 115-122, 2005.
[14] M. H. Gail, L. A. Brinton, D. P. Byar, D. K. Corle, S. B. Green, C. Schairer, and J. J. Mulvihill, "Projecting individualized probabilities of developing breast cancer for white females who are being examined annually," J. Natl. Cancer Inst., vol. 81, no. 24, pp. 1879-1886, Dec. 1989.
[15] J. P. Costantino, M. H. Gail, D. Pee, S. Anderson, C. K. Redmond, J. Benichou, and H. S. Wieand, "Validation studies for models projecting the risk of invasive and total breast cancer incidence," J. Natl. Cancer Inst., vol. 15, no. 91, pp. 1541-1548, Sep. 1999.
[16] M. H. Gail, J. P. Costantino, J. Bryant, R. Croyle, L. Freedman, K. Helzlsouer, and V. Vogel, "Weighing the Risks and Benefits of Tamoxifen Treatment for Preventing Breast Cancer," Journal of the National Cancer Institute, vol. 91, no. 21, pp. 1829-1846, 1999.
[17] J. T. Schousboe, K. Kerlikowske, A. Loh, and S. R. Cummings, "Personalizing mammography by breast density and other risk factors for breast cancer: analysis of health benefits and cost-effectiveness," Annals of Internal Medicine, vol. 155, no. 1, pp. 10-20, Jul. 2011.
[18] F. Cardoso, et al., "Locally recurrent or metastatic breast cancer: ESMO Clinical Practice Guidelines for diagnosis, treatment and follow-up," Annals of Oncology, vol. 23, no. 7, 2012.
[19] S. Aebi, et al., "Primary breast cancer: ESMO Clinical Practice Guidelines for diagnosis, treatment and follow-up," Annals of Oncology, vol. 22, no. 6, 2011.
[20] R. A. Smith, V. Cokkinides, A. C. von Eschenbach, B. Levin, C. Cohen, C. D. Runowicz, S. Sener, D. Saslow, and H. J. Eyre, "American Cancer Society guidelines for the early detection of cancer," CA: A Cancer Journal for Clinicians, vol. 52, no. 1, pp. 8-22, Jan. 2002.
[21] A. C. von Eschenbach, "NCI remains committed to current mammography guidelines," The Oncologist, vol. 7, no. 3, pp. 170-171, 2002.
[22] T. Onega, et al., "Breast cancer screening in an era of personalized regimens: A conceptual model and National Cancer Institute initiative for risk-based and preference-based approaches at a population level," Cancer, vol. 120, no. 19, pp. 2955-2964, 2014.
[23] S. Molinaro, S. Pieroni, F. Mariani, and M. N. Liebman, "Personalized medicine: Moving from correlation to causality in breast cancer," New Horizons in Translational Medicine, vol. 2, no. 2, Jan. 2015.
[24] S. A. Feig, "Personalized Screening for Breast Cancer: A Wolf in Sheep's Clothing?," American Journal of Roentgenology, vol. 205, no. 6, pp. 1365-1371, Dec. 2015.
[25] S.-H. Cho, J. Jeon, and S. I. Kim, "Personalized Medicine in Breast Cancer: A Systematic Review," J. Breast Cancer, vol. 15, no. 3, pp. 265-272, Sep. 2012.
[26] J. S. Mandelblatt, N. Stout, and A. Trentham-Dietz, "To screen or not to screen women in their 40s for breast cancer: Is personalized risk-based screening the answer?," Annals of Internal Medicine, vol. 155, no. 1, pp. 58-60, 2011.
[27] J. D. Keen, "Analysis of health benefits and cost-effectiveness of mammography for breast cancer," Annals of Internal Medicine, vol. 155, no. 8, 2011.
[28] "ACR BI-RADS breast imaging and reporting data system: breast imaging Atlas, 5th Edition," American College of Radiology, 2013.
[29] L. Liberman and J. H. Menell, "Breast imaging reporting and data system (BI-RADS)," Radiologic Clinics of North America, vol. 40, no. 3, pp. 409-430, 2002.
[30] R. D. Rosenberg, et al., "Effects of age, breast density, ethnicity, and estrogen replacement therapy on screening mammographic sensitivity and cancer stage at diagnosis: review of 183,134 screening mammograms in Albuquerque, New Mexico," Radiology, vol. 209, no. 2, pp. 511-518, 1998.
[31] D. B. Rubin and M. J. van der Laan, "Statistical issues and limitations in personalized medicine research with clinical trials," The International Journal of Biostatistics, vol. 8, no. 1, Jul. 2012.
[32] K. Kerlikowske, et al., "Likelihood ratios for modern screening mammography: risk of breast cancer based on age and mammographic interpretation," JAMA, vol. 276, no. 1, pp. 39-43, 1996.
[33] S. A. Murphy, "Optimal dynamic treatment regimes," Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 65, no. 2, pp. 331-355, May 2003.
[34] E. B. Laber, D. J. Lizotte, M. Qian, W. E. Pelham, and S. A. Murphy, "Dynamic treatment regimes: Technical challenges and applications," Electronic Journal of Statistics, vol. 8, no. 1, Jan. 2014.


[35] B. Chakraborty and S. A. Murphy, "Dynamic treatment regimes," Annual Review of Statistics and Its Application, p. 447, 2014.
[36] E. E. Moodie, T. S. Richardson, and D. A. Stephens, "Demystifying optimal dynamic treatment regimes," Biometrics, vol. 63, no. 2, pp. 447-455, Jun. 2007.
[37] F. J. Diaz, M. R. Cogollo, E. Spina, V. Santoro, D. M. Rendon, and J. de Leon, "Drug dosage individualization based on a random-effects linear model," Journal of Biopharmaceutical Statistics, vol. 22, no. 3, pp. 463-484, May 2012.
[38] R. S. Kulkarni, K. M. Sanjoy, and J. N. Tsitsiklis, "Active learning using arbitrary binary valued queries," Machine Learning, vol. 11, no. 1, pp. 23-35, 1993.
[39] S. Tong and D. Koller, "Support vector machine active learning with applications to text classification," The Journal of Machine Learning Research, vol. 2, pp. 45-66, 2002.
[40] C. Persello and L. Bruzzone, "Active and semisupervised learning for the classification of remote sensing images," IEEE Trans. Geosci. and Remote Sens., vol. 52, no. 11, pp. 6937-6956, Nov. 2014.
[41] R. Greiner, A. J. Grove, and D. Roth, "Learning cost-sensitive active classifiers," Artificial Intelligence, vol. 139, no. 2, pp. 137-174, Aug. 2002.
[42] A. Freitas, A. Costa-Pereira, and P. Brazdil, "Cost-sensitive decision trees applied to medical data," in Data Warehousing and Knowledge Discovery, Springer Berlin Heidelberg, pp. 303-312, Jan. 2007.
[43] C. X. Ling, Q. Yang, J. Wang, and S. Zhang, "Decision trees with minimal costs," in Proc. of ICML, p. 69, Jul. 2004.
[44] S. Yu, B. Krishnapuram, R. Rosales, and R. B. Rao, "Active sensing," International Conference on Artificial Intelligence and Statistics, 2009.
[45] S. Lomax and S. Vadera, "A survey of cost-sensitive decision tree induction algorithms," ACM Computing Surveys (CSUR), vol. 45, no. 2, 2013.
[46] V. Bryant, "Metric Spaces: Iteration and Application," Cambridge University Press, 1985.
[47] S. Shalev-Shwartz and S. Ben-David, "Understanding Machine Learning: From Theory to Algorithms," Cambridge University Press, 2014.
[48] S. P. Lloyd, "Least squares quantization in PCM," IEEE Trans. Info. Theory, vol. 28, no. 2, pp. 129-137, 1982.
[49] L. Hyafil and R. L. Rivest, "Constructing optimal binary decision trees is NP-complete," Information Processing Letters, vol. 5, no. 1, pp. 15-17, 1976.
[50] J. R. Quinlan, "C4.5: Programs for Machine Learning," Elsevier, 2014.
[51] http://www.cancer.org/cancer/breastcancer/moreinformation/breastcancerearlydetection/breast-cancer-early-detection-acs-recs.

