Grading breast cancer on microarray samples: comparison with Nottingham grade, and use of boosting classification

Leslie Dalton & David L Page¹

South Austin Medical Center, Austin, TX, and ¹Department of Pathology, Vanderbilt University, Nashville, TN, USA

Date of submission 9 November 2011
Accepted for publication 21 December 2011

Dalton L & Page D L (2012) Histopathology. DOI: 10.1111/j.1365-2559.2012.04254.x

Address for correspondence: L Dalton, Laboratory/Pathology, 901 W Ben White Blvd, South Austin Medical Center, Austin, TX 78704, USA. e-mail: [email protected]

© 2012 Blackwell Publishing Limited.

Aims: Nottingham breast cancer grade (NG) is a subjective morphological assessment based on evaluation of the entire tumour. The value of many novel immunohistochemical and molecular markers is being assessed on tiny microarray samples of tumour and compared with NG. The aim of this study was to investigate whether tumour morphology in microarray samples would correlate with NG.

Methods and results: We examined over 40 morphological features in each of 568 breast tumour samples on a microarray obtained from the US National Cancer Institute. Evaluations were subjective, and features were recorded as being present or absent in each tumour. Subsequently, on the basis of binary results, a boosting classification algorithm was implemented to help assign a 'microarray score' and 'microarray grade' to each tumour. Microarray grade was significantly correlated with NG (P < 0.01). High-grade versus low-grade discrepancies were rare (five of 568 samples).

Conclusions: The strong correlation of microarray grade with NG supports pathologist reproducibility in subjective evaluations.

Keywords: breast carcinoma grade, microarray boosting, reproducibility

Abbreviations: NG, Nottingham grade; NGMA3, Nottingham grade score 3 mitotic activity

Introduction

Nottingham grading of breast cancer is well established as providing valuable prognostic information [1], and remains the standard of comparison in the evaluation of newer markers. However, such markers have been appraised on minuscule samples of tumours in microarrays, and subsequent data analysis is often performed with recently developed statistical algorithms. Our aim was to adopt a mindset whereby morphological variables would be studied in much the same manner.

The evaluation of morphological parameters in our study was subjective. The subjectivity of grading and corresponding concerns about its reproducibility continue to be points of discussion [2]. It has been proposed that an advantage of genetic profiling studies is the avoidance of a lack of reproducibility in grading by pathologists [3], although many of the genetic profiling techniques themselves have not undergone adequate reproducibility studies. However, this critique aside, can assessment of morphology in microarray samples provide another means of studying the reproducibility of grading?

Although newer molecular and immunohistochemical methods promise advantages in grading, consideration should be given to the advances in mathematics that have paralleled the advances in methodologies. The mathematics used in arriving at a Nottingham grade (NG) relies on simple addition. NG is derived from the Bloom–Richardson method [4], and many data mining algorithms were developed long after the Bloom–Richardson method was first introduced. A component

Table 1. A list of the morphological features studied, with an example photomicrograph of each feature shown in Figures 1–3. The number of cases with a given feature is divided into the three Nottingham grades, and intermediate grade is further subdivided into Nottingham scores 6 and 7

| Feature | Example (Figures 1–3) | Low grade | Intermediate grade | High grade | Score 6 | Score 7 |
|---|---|---|---|---|---|---|
| Comparison; large nuclei (judged by comparison with nearby tumours) | 1A | 8 | 40 | 62 | 21 | 19 |
| Comparison; small nuclei | 1A | 65 | 30 | 5 | 25 | 5 |
| Mitoses, two or more | 1B | 3 | 12 | 39 | 3 | 9 |
| Bloom–Richardson mitotic count (hyperchromatic figures and cellularity adjustment) | 1C | 13 | 33 | 55 | 12 | 21 |
| Atypical mitotic figures | 1B | 0 | 3 | 8 | 1 | 2 |
| Large nuclei (comparison with stromal cells) | 1C | 8 | 41 | 58 | 17 | 24 |
| Vesicular nuclei | 1D | 20 | 56 | 61 | 26 | 30 |
| Prominent nucleoli | 1E | 8 | 25 | 43 | 8 | 17 |
| Increased variation in size and shape of nuclei | 1C | 42 | 79 | 54 | 51 | 28 |
| Blastic nuclei | 1F | 1 | 11 | 39 | 3 | 8 |
| Nuclear overlapping | 1F | 21 | 66 | 66 | 39 | 27 |
| Small dark nuclei | 1G | 20 | 37 | 12 | 28 | 9 |
| Small nuclei with open chromatin pattern | 1H | 89 | 62 | 9 | 49 | 13 |
| Grooved nuclei (thyroid-like) | 2A | 2 | 4 | 1 | 2 | 2 |
| Small-cell carcinoma | 2B | 0 | 2 | 1 | 2 | 0 |
| Lobular features (broadly defined) | 1G | 62 | 75 | 11 | 65 | 10 |
| Classic lobular (single filing of cells and low-grade nuclei) | 2C | 7 | 5 | 2 | 5 | 0 |
| Tubulo-lobular | 2D | 14 | 7 | 1 | 7 | 0 |
| Pleomorphic lobular | 2E | 11 | 14 | 3 | 12 | 2 |
| Signet cells | 2E | 1 | 2 | 1 | 1 | 1 |
| Single filing of tumour cells (five or more cells, not restricted to lobular type) | 2C | 44 | 49 | 2 | 41 | 8 |
| Spaces (ducts, cribriform; any glandular spaces) | 2F | 80 | 36 | 4 | 24 | 12 |
| Single discrete ducts present (would include tubular) | 2G | 27 | 7 | 0 | 4 | 3 |
| Intracytoplasmic lumina | 2H | 18 | 16 | 1 | 13 | 3 |
| High-grade adenocarcinoma (high nuclear grade with well-defined ducts) | 3A | 13 | 16 | 3 | 10 | 6 |


of our study was designed to take advantage of 'new' mathematics in order to evaluate 'old' morphology.

Materials and methods

A set of microarray slides containing samples from 590 separate breast cancers was obtained from the US National Cancer Institute Cooperative Breast Cancer Tissue Resource Program. The cases were contributed by sites at four geographically distinct locations in the USA (Fox Chase Cancer Center in Philadelphia, PA; Kaiser Permanente Northwest in Portland, OR; Washington University School of Medicine in St Louis, MO; and the University of Miami School of Medicine in Miami, FL). Pathologists from each of these institutions marked the foci from which microarray cores were to be taken. On preparation of the microarray slides, the tumour samples were randomly arranged.

Pathologists at each of the four sites worked closely together in order to have optimal consensus in their grading. Each institution was responsible for assessing tumour grade for the cases that it submitted. The data included a 1–3 score for each of nuclear grade, tubule formation, and mitotic activity. From this, a Nottingham score and final combined grade were calculated. On the basis of prior work, we saw no reason to attempt to grade the tumours again [5].

The microarray slides were received unstained, and a routine haematoxylin and eosin stain was performed on the same run as slides being readied for daily sign-out. A review of multiple morphological features of each tumour sample was then performed, and the features that were studied are listed in Table 1. An example photomicrograph of each feature is given in Figures 1–3, as referenced in Table 1. Each feature was

Table 1. (Continued)

| Feature | Example (Figures 1–3) | Low grade | Intermediate grade | High grade | Score 6 | Score 7 |
|---|---|---|---|---|---|---|
| Classic tubular | 2G | 5 | 1 | 0 | 0 | 1 |
| Squamo-basaloid architecture | 1B | 2 | 4 | 26 | 3 | 1 |
| Larger tumour nests with rounded, sharp contours | 1C | 6 | 24 | 30 | 13 | 11 |
| Not adenocarcinoma (if metastatic, breast origin might be missed) | 1C | 7 | 53 | 71 | 24 | 29 |
| Pattern 1 | 3B | 3 | 8 | 2 | 3 | 5 |
| Pattern 2 | 3C | 4 | 2 | 0 | 1 | 1 |
| Near-normal cytoplasm (not voluminous, cell borders seen) | 2G | 54 | 32 | 6 | 22 | 10 |
| Voluminous cytoplasm (clear cells, apocrine, oncocytic, granular) | 3D | 6 | 19 | 12 | 9 | 10 |
| Mucinous tumour | 3E | 1 | 1 | 0 | 1 | 0 |
| Micropapillary architecture ('classic') | 3F | 2 | 4 | 0 | 4 | 0 |
| Retraction artefact | 3G | 24 | 42 | 10 | 28 | 14 |
| Numerous stromal lymphocytes | 1C | 14 | 43 | 46 | 27 | 16 |
| Stromal hyalinization | 3B | 17 | 32 | 5 | 19 | 13 |
| Fibroblastic stroma | 3H | 26 | 51 | 26 | 32 | 19 |
| Cellular dyscohesion | 1G | 41 | 108 | 47 | 79 | 29 |
| Tumour-related necrosis | 1C | 1 | 0 | 2 | 0 | 0 |
| Dirty background (cellular debris) | 2B | 48 | 112 | 74 | 76 | 36 |

Figure 1. Example photomicrographs of tumours, panels A–H (see Table 1).

Figure 2. Example photomicrographs of tumours, panels A–H (see Table 1).

Figure 3. Example photomicrographs of tumours, panels A–H (see Table 1).

evaluated as being present or absent. None of the features were considered to be mutually exclusive. Conceivably, a case could be marked as having both large and small nuclei. Among the 590 microarray samples, we found 568 to have acceptable quality for morphological analysis.

Microarray samples were 0.7 mm in diameter, and the microscope used for evaluation displayed a field diameter of 0.65 mm at ×400. Therefore, a microarray mitotic count was essentially the number of mitoses in a single high-power field. Whether or not a sample had two or more mitotic figures was one of the morphological predictors.
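As a rough check on the 'single high-power field' statement, the areas implied by the two diameters can be compared. This is a minimal sketch that assumes both the core and the field of view are perfect circles; the function name is illustrative and not part of the study.

```python
import math


def circle_area_mm2(diameter_mm: float) -> float:
    """Area of a circle in mm^2, given its diameter in mm."""
    return math.pi * (diameter_mm / 2) ** 2


core_area = circle_area_mm2(0.7)    # microarray core diameter quoted in the text
field_area = circle_area_mm2(0.65)  # field diameter at x400 quoted in the text

# Pi cancels in the ratio, so this equals (0.7 / 0.65) ** 2.
fields_per_core = core_area / field_area

print(f"core area  = {core_area:.3f} mm^2")
print(f"field area = {field_area:.3f} mm^2")
print(f"ratio      = {fields_per_core:.2f}")
```

Under these assumptions a core covers slightly more than one field (a ratio of about 1.16), which is consistent with treating a count over the whole core as roughly a single high-power field count.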

The microarray slides were evaluated one morphological feature at a time, and no two features were examined on the same day. The purpose of this was to mirror how one might evaluate the results of an immunohistochemical marker. Upon review of a feature there was blinding as to the markings of prior evaluations. The time from examination of the first feature to the last was five months.

A list of the morphological features to be reviewed had been prepared before beginning the evaluations. However, on examining a given feature, a finding in a sample might garner attention. If so, a feature would be added to the list and studied at a later date. For example, the tumour shown in Figure 2A prompted a separate study of 'nuclear grooves', and perhaps this case is an example of a 'thyroid-like tall cell variant' cancer, as recently described [6]. Stromal hyalinization was another feature added on the basis of findings in several cases. Certain unique patterns were also studied. Figure 3B shows a 'pattern 1' tumour that presents dense stromal hyalinization and mitotic activity, but in which the nuclei are rather bland.

The microarray platform allowed for remarkable ease in comparing the morphology of a tumour with that of neighbouring samples. It followed that several features were comparison measures, and the size of nuclei was of particular interest. At ×40 power, 12 tumour samples could be viewed, and finding tumours with the largest nuclei in a low-power field was one of the features examined. Figure 1A demonstrates how two tumours, when compared side to side, can show an obvious difference in the size of nuclei.

Statistical analysis

Central to data analysis was the use of STATISTICA data mining software [7]. Within the STATISTICA package, a two-step process was followed, whereby the first step was to evaluate correlations among the features, in order to decide which features might be separately evaluated in a given pass through a data mining program. The second was the actual processing by a data mining algorithm.

In the feature selection step, predictor variables were examined with factor analysis (Figure 4), chi-square classification, the Ward method clustering algorithm, and feedback from a boosting algorithm. The Ward method was obtained from the R statistical project [8].
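One of the screening steps named above is a chi-square comparison of a feature against grade. The sketch below computes a Pearson chi-square statistic for a present/absent feature across the three Nottingham grades. It is an illustration only, not the STATISTICA procedure: the 'present' counts are the Table 1 row for 'Mitoses, two or more', while the per-grade totals are hypothetical placeholders chosen for the example.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of feature and grade.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Columns: low, intermediate, high Nottingham grade.
present = [3, 12, 39]             # Table 1, "Mitoses, two or more"
grade_totals = [200, 260, 100]    # HYPOTHETICAL totals, for illustration only
absent = [total - p for total, p in zip(grade_totals, present)]

stat = chi_square_statistic([present, absent])
degrees_of_freedom = (2 - 1) * (3 - 1)
print(f"chi-square = {stat:.1f} on {degrees_of_freedom} degrees of freedom")
```

A large statistic relative to its degrees of freedom suggests that the feature is unevenly distributed across grades and is therefore a candidate predictor; the numerical result here depends entirely on the placeholder totals.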

After feature selection, five data mining algorithms were evaluated. This was accomplished by following the rapid deployment option available in the STATISTICA software. In an automated manner, the software steps through several data mining models [9]. The models examined were neural networks, classification and regression trees, boosted classification, random forests, and a support vector machine. Boosted classification was the algorithm chosen [10–12].

Output from the boosted classification algorithm provided a fractional probability of a target variable being present, based on the binary predictor variables for each row of data; each row corresponded to a single patient. As can be seen from a portion of a data spreadsheet (Table 2), the last two columns show continuous variables that the boosted classification algorithm calculated from multiple binary inputs. The column headers, other than the last two, identify predictor variables.

For the purposes of correlating a microarray grade with NG, there were two separate passes of data through the boosting algorithm. Low NG was the target variable for one pass, and subsequently the target variable was reset to high NG. The last two columns as

Figure 4. Factor loadings, factor 1 versus factor 2 (extraction: principal components; rotation: unrotated). A factor analysis plot shows the relationship among selected features. Nottingham score 6 tumours are included as a point of reference (scor6). Note the proximity of scor6 to tumours having lobular morphology (lobfe). Also note that nucleoli (ncleo) are in close proximity to mitotic activity (mact2).


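Table 2. A portion of a spreadsheet shows data pertaining to each patient. A 1 or 0 has been assigned for the presence or absence, respectively, of a given feature. The last two columns list the fractional probability that each case is high (AHIG) or low (APLOG) Nottingham grade as per output from the boosting algorithm

| sfile | iclum | lymph | ncleo | vesic | ignuc | lobfe | retar | smdrk | AHIG | APLOG |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.10314 | 0.83327 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.09574 | 0.67859 |
| 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0.27322 | 0.64628 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.09574 | 0.84734 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0.0994 | 0.83675 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.14665 | 0.4394 |
| 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0.93908 | 0.14865 |
| 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0.10693 | 0.66098 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.10929 | 0.45155 |
| 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0.9661 | 0.14865 |
| 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0.97605 | 0.14865 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.14747 | 0.44257 |
| 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0.14542 | 0.35465 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0.18279 | 0.4394 |
| 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0.11747 | 0.66365 |
| 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0.19113 | 0.4394 |
| 0 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 0.20392 | 0.25726 |
| 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0.18759 | 0.64564 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.93229 | 0.21701 |
| 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0.94387 | 0.12991 |
| 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0.2206 | 0.38544 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.14747 | 0.6555 |
| 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.14002 | 0.64494 |
| 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0.34959 | 0.4053 |
| 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0.22872 | 0.31207 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.12825 | 0.65253 |
| 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.16333 | 0.63495 |
| 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0.86456 | 0.18585 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.23494 | 0.43047 |
| 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0.23453 | 0.61629 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.14665 | 0.4394 |
| 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0.21743 | 0.44257 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.09574 | 0.84734 |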


seen in Table 2 are the fractional probabilities of each patient having a low-grade and a high-grade tumour. Because these come from two separate passes, the probabilities of a patient having a low NG and a high NG do not necessarily add up to one.
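The two-pass idea can be sketched with a small AdaBoost implementation in the spirit of Freund and Schapire [10], using one-feature decision stumps as weak learners. This is not the STATISTICA boosted classification used in the study: the toy data, the six unnamed binary features (three favouring high grade, three favouring low grade), and the mapping from boosted score to probability via 1 / (1 + exp(-2F)) are all assumptions made for illustration.

```python
import math


def stump_predict(row, j, polarity):
    """Weak learner: vote `polarity` if feature j is present, else `-polarity`."""
    return polarity if row[j] else -polarity


def fit_adaboost_stumps(X, y, n_rounds=50):
    """Discrete AdaBoost on binary features; y holds +1 / -1 labels.

    Returns a list of (feature_index, polarity, alpha) tuples.
    """
    n = len(X)
    weights = [1.0 / n] * n
    model = []
    for _ in range(n_rounds):
        best = None  # (weighted_error, feature_index, polarity)
        for j in range(len(X[0])):
            for polarity in (1, -1):
                err = sum(w for w, row, label in zip(weights, X, y)
                          if stump_predict(row, j, polarity) != label)
                if best is None or err < best[0]:
                    best = (err, j, polarity)
        err, j, polarity = best
        if err >= 0.5:  # no stump beats chance on the reweighted data
            break
        perfect = err <= 1e-10
        err = max(err, 1e-10)  # avoid log(0) when a stump is perfect
        alpha = 0.5 * math.log((1.0 - err) / err)
        model.append((j, polarity, alpha))
        if perfect:
            break
        # Up-weight misclassified cases, down-weight correct ones, renormalise.
        weights = [w * math.exp(-alpha * label * stump_predict(row, j, polarity))
                   for w, row, label in zip(weights, X, y)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model


def predict_probability(model, row):
    """Map the boosted score F(x) to a probability with 1 / (1 + exp(-2F))."""
    score = sum(alpha * stump_predict(row, j, polarity)
                for j, polarity, alpha in model)
    return 1.0 / (1.0 + math.exp(-2.0 * score))


# HYPOTHETICAL toy data: six binary features per case, Nottingham grade 1-3.
X = [
    [1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 0], [0, 1, 1, 0, 0, 1], [1, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1], [0, 0, 0, 1, 1, 0], [0, 1, 0, 0, 1, 1], [0, 0, 0, 1, 0, 1],
    [0, 1, 0, 1, 0, 0], [1, 0, 0, 0, 1, 0], [0, 0, 1, 0, 0, 1], [0, 0, 0, 0, 0, 0],
]
grade = [3, 3, 3, 3, 1, 1, 1, 1, 2, 2, 2, 2]

# Two separate passes: high NG as the target, then low NG as the target.
high_model = fit_adaboost_stumps(X, [1 if g == 3 else -1 for g in grade])
low_model = fit_adaboost_stumps(X, [1 if g == 1 else -1 for g in grade])

for row in ([1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1], [0, 0, 0, 0, 0, 0]):
    p_high = predict_probability(high_model, row)
    p_low = predict_probability(low_model, row)
    print(row, f"p_high={p_high:.3f}", f"p_low={p_low:.3f}", f"sum={p_high + p_low:.3f}")
```

Because each pass fits its own model, nothing constrains the two outputs to sum to one; in this toy set the intermediate-grade case with no features present receives a low probability from both passes.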

A plot of the probability of the patients having a low-NG versus a high-NG tumour is shown in Figure 5. From this plot, along with examination of receiver operator curves, cut-points were selected in order to build a microarray score with a scale of seven categories. Of course, a Nottingham score is also composed of seven possible scores.
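One way to turn the two probabilities into a seven-category score is to bin a single index at six cut-points. The sketch below is an assumption-laden illustration: the text does not report the cut-points or how the two probabilities were combined, so the difference `p_high - p_low` and the values in `CUT_POINTS` are hypothetical. The merge into three grades follows the usual Nottingham convention (scores 3–5, 6–7 and 8–9).

```python
from bisect import bisect_right

# HYPOTHETICAL cut-points on (p_high - p_low); the study's values are not reported.
CUT_POINTS = [-0.6, -0.35, -0.1, 0.15, 0.4, 0.65]


def microarray_score(p_high: float, p_low: float) -> int:
    """Bin the difference of the two boosted probabilities into a 3-9 score."""
    return 3 + bisect_right(CUT_POINTS, p_high - p_low)


def grade_from_score(score: int) -> int:
    """Merge a 3-9 score into grades 1-3 (3-5 -> 1, 6-7 -> 2, 8-9 -> 3)."""
    if score <= 5:
        return 1
    if score <= 7:
        return 2
    return 3


# Two rows of Table 2: (high-grade probability, low-grade probability).
for p_high, p_low in [(0.10314, 0.83327), (0.93908, 0.14865)]:
    score = microarray_score(p_high, p_low)
    print(p_high, p_low, "score", score, "grade", grade_from_score(score))
```

With these placeholder cut-points the first row falls in the lowest category and the second in the highest; different cut-points would shift cases between neighbouring categories.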

Results

A histogram comparison of the microarray score with Nottingham score is shown in Figure 6. The microarray score was highly correlated with Nottingham score (Table 3). On merging the seven scores into three grades, there were only five low-grade versus high-grade discrepancies of a microarray grade with NG (Table 4). A weighted kappa statistic comparing microarray score with Nottingham score was 0.68, which indicated high reproducibility.
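For readers unfamiliar with the statistic, a weighted kappa can be computed directly from a square cross-tabulation such as Table 3. The sketch below is a generic implementation of Cohen's weighted kappa; the text does not state whether linear or quadratic weights were used, so both are shown, and neither is claimed to reproduce the reported value.

```python
def weighted_kappa(table, weighting="linear"):
    """Cohen's weighted kappa for a square table of counts."""
    k = len(table)
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    power = 1 if weighting == "linear" else 2
    observed = expected = 0.0
    for i in range(k):
        for j in range(k):
            # Disagreement weight grows with the distance between categories.
            weight = (abs(i - j) / (k - 1)) ** power
            observed += weight * table[i][j] / n
            expected += weight * row_totals[i] * col_totals[j] / n ** 2
    return 1.0 - observed / expected


# Counts from Table 3 (scores 3-9 on both axes).
table3 = [
    [10, 3, 1, 6, 1, 0, 0],
    [17, 9, 5, 13, 4, 0, 0],
    [29, 23, 19, 34, 18, 0, 2],
    [7, 25, 29, 68, 42, 5, 7],
    [4, 4, 10, 17, 28, 6, 14],
    [1, 1, 1, 8, 17, 3, 24],
    [0, 0, 0, 2, 7, 0, 43],
]

kappa_linear = weighted_kappa(table3, "linear")
kappa_quadratic = weighted_kappa(table3, "quadratic")
print(f"linear weights:    {kappa_linear:.2f}")
print(f"quadratic weights: {kappa_quadratic:.2f}")
```

Weighted kappa penalises a disagreement of several categories more heavily than a disagreement of one, which suits ordered scales such as a 3–9 score; quadratic weights penalise distant disagreements more strongly than linear weights do.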

Another output from the boosting classification is the relative importance or weighting that the algorithm assigned to the various predictors in order for the algorithm to optimize classification. Table 5 shows a ranking of the predictor variables when the target variables were, respectively, low, intermediate and high NG.

Other target variables were examined. Of particular interest were those tumours with NG score three mitotic activity (NGMA3). NGMA3 was set as the target variable, and those microarray morphological predictors that required no identification of mitotic figures were selected as morphological predictors. In this manner, a morphological probability of proliferation was obtained, despite the inability of a microarray sample to yield a 10 high-power field mitotic count. There was a significant correlation of non-mitotic microarray predictors with NGMA3 (P < 0.05,

Figure 5. Each data point represents an individual case. The x-axis (APHIG) is the probability of a patient having a high-grade tumour, and the y-axis (APLOG) is the probability of a low-grade tumour. This is a plot of the last two columns shown in Table 2.

Figure 6. Histograms comparing number of tumours having Nottingham scores of 3–9 with tumours having a 'microarray score' of 3–9. Panels: histogram of microarray score and histogram of Nottingham scores (x-axis, score; y-axis, frequency).


R statistical package; Spearman). The relevant receiver operator curve is shown in Figure 7, and from this a likelihood ratio of 5.7 and an odds ratio of 10.7 were calculated.
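The reported likelihood ratio appears consistent with a positive likelihood ratio taken at the operating point annotated on Figure 7 (sensitivity 0.7757, specificity 0.8652); that reading is an assumption, and the derivation of the odds ratio is not described in the text, so it is not attempted here. The sketch also shows the 'closest to the upper left corner' rule for choosing a cut-off; the two neighbouring ROC points are hypothetical.

```python
import math


def positive_likelihood_ratio(sensitivity: float, specificity: float) -> float:
    """LR+ = sensitivity / (1 - specificity)."""
    return sensitivity / (1.0 - specificity)


def closest_to_corner(roc_points):
    """Return the (cutoff, sensitivity, specificity) point nearest to the
    upper left corner of ROC space, i.e. 1 - specificity = 0 and sensitivity = 1."""
    return min(roc_points, key=lambda p: math.hypot(1.0 - p[2], 1.0 - p[1]))


# Operating point annotated on Figure 7.
lr_plus = positive_likelihood_ratio(0.7757, 0.8652)
print(f"LR+ = {lr_plus:.2f}")

# The middle point is from Figure 7; the other two are HYPOTHETICAL neighbours.
points = [(0.30, 0.95, 0.60), (0.5562, 0.7757, 0.8652), (0.80, 0.40, 0.97)]
print("chosen cut-off:", closest_to_corner(points)[0])
```

The computed LR+ of about 5.75 agrees with the quoted 5.7 to the precision given, which supports the assumed reading.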

When the target variable was set to NG score 6 tumours, we found a relative abundance of tumours with lobular features. As shown by factor analysis (Figure 4), NG score 6 tumours (scor6) are located in the same area as tumours showing lobular features (lobfe). A listing of how often tumours were distributed among the three NGs is given in Table 1, and Nottingham intermediate-grade tumours are further subdivided into Nottingham score 6 and score 7 categories.

Discussion

A microarray grade was developed that showed a strong correlation with NG, despite the use of different methodologies in assessing grade. In our study, the assessment of NG was performed by those pathologists who helped prepare the samples for the microarrays. Our evaluation of the microarray samples was blinded to the NG that had originally been assigned. First and foremost, we show reproducibility in grading, but with a unique approach.

The morphological parameters used to arrive at the NG have become well entrenched. This has not always been the case. As an example, the 1957 Bloom and Richardson [4] paper mentions breast cancer as being classified as 'adenocarcinoma' and 'undifferentiated spheroidal cell carcinoma'. Our use of a 'non-adenocarcinoma' predictor was conceptualized from this discussion. As another example, the observation of squamoid morphology in what otherwise might be classified as a 'no special type' tumour was recalled [13].

Decades ago, in a text that was well regarded at the time, Haagensen [14] stated that 'Elaborate systems of grading depending upon many microscopic features and employing four to five grades are not realistic.' Modern data mining algorithms now suggest the opposite, as such algorithms can easily accommodate multiple predictors and multiple levels of stratification. The design of the boosting algorithm is intended to enable the construction of a strong classifier from

Table 3. A comparison of Nottingham score with 'microarray score'

| Nottingham score | Microarray score 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|
| 3 | 10 | 3 | 1 | 6 | 1 | 0 | 0 |
| 4 | 17 | 9 | 5 | 13 | 4 | 0 | 0 |
| 5 | 29 | 23 | 19 | 34 | 18 | 0 | 2 |
| 6 | 7 | 25 | 29 | 68 | 42 | 5 | 7 |
| 7 | 4 | 4 | 10 | 17 | 28 | 6 | 14 |
| 8 | 1 | 1 | 1 | 8 | 17 | 3 | 24 |
| 9 | 0 | 0 | 0 | 2 | 7 | 0 | 43 |

R statistical software. Spearman's rank correlation rho. Correlation test (Nottingham score, microarray score, Spearman method). S = 8 142 644; P < 2.2e-16. Alternative hypothesis: true ρ ≠ 0; ρ = 0.656158.

Table 4. A comparison of Nottingham grade with 'microarray grade'

| Nottingham grade | Microarray grade 1 | 2 | 3 |
|---|---|---|---|
| 1 | 116 | 79 | 3 |
| 2 | 77 | 154 | 34 |
| 3 | 2 | 32 | 70 |

Figure 7. A receiver operator curve (sensitivity versus 1 − specificity) with target variable of Nottingham score 3 mitotic count and predictor variables of those microarray features that did not involve identification of mitotic figures. The boosting algorithm produces a continuous variable from multiple binary variables, such that examination of receiver operator curves becomes applicable. Plot annotation: cut-off 0.5562, sensitivity 0.7757, specificity 0.8652, AUC 0.9044; the cut-off is the one that minimizes the distance between the curve and the upper left corner.


multiple weak classifiers [10]. As it is not possible to anticipate which of multiple weak classifiers might become important, there is an incentive to maintain as many predictors as reasonably possible. This allows for a robust set of predictors to 'aim' at different targets.

An advantage of boosting is the ability to easily reset target variables. When the target variable was set to NGMA3, we found significant correlations with predictors that did not require identification of mitotic figures. Referring once again to the factor analysis plot (Figure 4), we can see the proximity of nucleoli (ncleo) to the presence of mitotic figures (mact2). Helpap [15] previously reported an association of nucleoli with tumour proliferation. Our study is supportive of his work, and indicates that, morphologically, a quantitative evaluation of proliferation need not be limited to a mitotic count. Boosting provides a mechanism for integrating many binary morphological features into a probability quantified for an individual patient. Similar to what is seen in Table 2, we could obtain a probability for each patient having an increased mitotic count.

The boosted classification algorithm chosen here not only helped in quantitative analysis, but also indirectly aided morphological evaluation. There was relative ease in concentrating on one feature at a time. It has been reported that high nuclear grade tumours tend to have large, vesicular nuclei with prominent nucleoli, along with an increase in variation of the size and shape of nuclei [16]. The pathologist, in assessing nuclear grade, must then internally mould these features without any particular advice on how to weigh one feature against the other. With our method, each of the separate 'traditional' nuclear grade features can simply be marked as being present or absent, and the boosting algorithm can take over the weighting from that point on.

It is conceivable that algorithm selection might result in a more heated debate than would choice of morphological predictors. Needless to say, we were aware of the concerns related to algorithm selection [17]. Boosting classification is not well known among pathologists, but this is not the case in the data mining community [12]. It has been used with gene expression data, and the ability of boosting to provide direct class membership probabilities was mentioned as an advantage [18].

Hopefully, additional studies could help to determine which might be the best morphological features to focus on, and the best statistical or data mining algorithms for analysis of the data. Tumour grading (and typing) is a low-cost test, and many of the data mining algorithms can be obtained at no cost, including several variations of boosting [19]. On a worldwide basis, only a minority of breast cancer patients can afford

Table 5. Relative importance of features in classifying Nottingham grades

Nottingham high grade

| Predictor (feature) | Variable rank |
|---|---|
| Squamo-basaloid | 100 |
| Blastic nuclei | 91 |
| Nucleoli | 91 |
| Variation in size/shape | 90 |
| Background lymphocytes | 88 |
| Not adenocarcinoma | 88 |
| Nuclear overlapping | 81 |
| Vesicular nuclei | 80 |
| Large nuclei | 80 |
| Bloom–Richardson mitotic count | 78 |

Nottingham intermediate grade

| Predictor (feature) | Variable rank |
|---|---|
| Small open nuclei | 100 |
| Not adenocarcinoma | 96 |
| Large nuclei | 94 |
| 'Dirty' background | 92 |
| Vesicular nuclei | 91 |
| Single filing of cells | 76 |
| Variation in size/shape | 74 |
| Background lymphocytes | 73 |
| Nuclear overlapping | 71 |
| Spaces | 71 |

Nottingham low grade

| Predictor (feature) | Variable rank |
|---|---|
| Bloom–Richardson mitotic count | 100 |
| Blastic nuclei | 84 |
| Tumour necrosis | 83 |
| Nucleoli | 81 |
| Squamo-basaloid | 77 |
| At least 2 mitoses | 76 |
| Small open nuclei | 71 |
| Cellular dyscohesion | 68 |
| Hyalinized stroma | 60 |
| 'Dirty' background | 58 |


esoteric tests. Further study of the information that can be derived from morphology is a worthy goal.

Acknowledgements

The authors thank Linda Le for technical assistance and Katherine Daniel for editorial help.

References

1. Elston CW, Ellis IO. Pathological prognostic factors in breast cancer I. The value of histological grade in breast cancer: experience from a large study with long-term follow-up. Histopathology 1991; 19; 403–410.
2. Rakha EA, El-Sayed ME, Lee AH et al. Prognostic significance of Nottingham histologic grade in invasive breast carcinoma. J. Clin. Oncol. 2008; 26; 3153–3158.
3. Paik S, Shak S, Tang G et al. A multigene assay to predict recurrence of tamoxifen-treated, node negative breast cancer. N. Engl. J. Med. 2004; 351; 2817–2826.
4. Bloom HJ, Richardson WW. Histologic grading and prognosis in breast cancer; a study of 1409 cases of which 359 have been followed for 15 years. Br. J. Cancer 1957; 11; 359–377.
5. Dalton LW, Page DL, Dupont WD. Histologic grading of breast carcinoma. A reproducibility study. Cancer 1994; 73; 2765–2770.
6. Eusebi V, Damiani S, Ellis IO, Azzopardi JG, Rosai J. Breast tumor resembling tall cell variant of papillary thyroid carcinoma. Am. J. Surg. Pathol. 2003; 27; 1114–1118.
7. Statistica Data Miner. Tulsa, OK: Statsoft. Available from: http://www.statsoft.com/
8. Cesar del CP, Pardo CE. Hierarchic classification by Ward's method. The R Foundation for Statistical Computing. 2010. Available from: http://cran.r-project.org/
9. Widmer CG, Miner G. Dentistry: facial pain study based on 84 predictor variables (both categorical and continuous). In Nisbet R, Elder J, Miner G eds. Handbook of statistical analysis and data mining application. San Diego, CA: Academic Press, 2009; 623–650.
10. Freund Y, Schapire RE. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comp. Syst. Sci. 1997; 55; 119–139.
11. Schonlau M. Boosted regression (boosting): an introductory tutorial and a Stata plug-in. Stata J. 2005; 5; 330–354.
12. Wu X, Kumar V, Quinlan JR et al. The top ten algorithms in data mining. Knowl. Inf. Syst. 2008; 14; 1–37.
13. Page DL, Sakamoto G. Infiltrating carcinoma: major histologic types. In Page DL, Anderson TJ eds. Diagnostic histopathology of the breast. Edinburgh: Churchill Livingstone, 1987; 193–215.
14. Haagensen CD. Special pathological forms of breast carcinoma. In Haagensen CD. Diseases of the breast. Philadelphia, PA: W.B. Saunders, 1971; 585–616.
15. Helpap B. Nucleolar grading of breast cancer. Comparative studies on frequency and localization of nucleoli and histology, stage, hormonal receptor status and lectin histochemistry. Virchows Arch. A Pathol. Anat. Histopathol. 1989; 415; 501–508.
16. Elston CW, Ellis IO. Assessment of histologic grade. In Elston CW, Ellis IO eds. The breast. Edinburgh: Harcourt Brace and Company, 1998; 365–384.
17. Dupuy A, Simon RM. Critical review of published microarray studies for cancer guidelines on statistical analysis and reporting. J. Natl Cancer Inst. 2007; 99; 147–157.
18. Dettling M, Buhlmann P. Boosting for tumor classification with gene expression data. Bioinformatics 2003; 19; 1061–1069.
19. Ridgeway G. Generalized boosting regression models. The R Foundation for Statistical Computing. 2007. Available from: http://cran.r-project.org/web/packages/gbm/index.html.
