
Inter-observer agreement in describing the ultrasound...

Date post: 03-May-2018
agreement in ultrasound assessment of ovarian masses 1 1 Inter-observer agreement in describing the ultrasound appearance of adnexal masses and in calculating the risk of malignancy using logistic regression models Povilas Sladkevicius, Lil Valentin Department of Obstetrics and Gynecology, Skåne University Hospital Malmö, Lund University, S-20502 Malmö, Sweden Short title: agreement in ultrasound assessment of ovarian masses Keywords: ovarian neoplasms; ultrasonography; power Doppler ultrasound; reproducibility of results Grant support This work was supported by the Swedish Medical Research Council (grant no. B0012201 and D0228201); funds administered by Skåne University Hospital; Allmänna Sjukhusets i Malmö Stiftelse för bekämpande av cancer (the Malmö General Hospital Foundation for fighting against cancer); Landstingsfinansierad regional forskning and ALF-medel (i.e., two Swedish governmental grants from the region of Scania); Funds administered by Skåne University Hospital. Research. on June 14, 2018. © 2014 American Association for Cancer clincancerres.aacrjournals.org Downloaded from Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited. Author Manuscript Published OnlineFirst on November 25, 2014; DOI: 10.1158/1078-0432.CCR-14-0906
Transcript


Inter-observer agreement in describing the ultrasound appearance of adnexal masses and in calculating the risk of malignancy using logistic regression models

Povilas Sladkevicius, Lil Valentin

Department of Obstetrics and Gynecology, Skåne University Hospital Malmö, Lund University, S-20502 Malmö, Sweden

Short title: agreement in ultrasound assessment of ovarian masses

Keywords: ovarian neoplasms; ultrasonography; power Doppler ultrasound; reproducibility of results

Grant support

This work was supported by the Swedish Medical Research Council (grant no. B0012201 and D0228201); funds administered by Skåne University Hospital; Allmänna Sjukhusets i Malmö Stiftelse för bekämpande av cancer (the Malmö General Hospital Foundation for fighting against cancer); and Landstingsfinansierad regional forskning and ALF-medel (i.e., two Swedish governmental grants from the region of Scania).


Corresponding author: Povilas Sladkevicius

Department of Obstetrics and Gynecology

Skåne University Hospital Malmö, S-20502 Malmö, Sweden

Telephone: +46 40 332636; Fax: +46 40 962600

Email: Povilas.Sladkevicius@med.lu.se

Conflict of interest statement

Both authors declare that they have no conflicts of interest.

Word count, excluding abstract (250 words) and references: 3688

Total number of figures: 1 to publish and 5 as supplemental material

Total number of tables: 5 to publish and 3 as supplemental material


Statement of translational relevance

The International Ovarian Tumor Analysis (IOTA) group has developed two logistic regression models (LR1 and LR2), including clinical and ultrasound variables, for calculation of the risk of malignancy in adnexal masses. It has been suggested that LR1 and LR2 can be used to counsel patients about their individual risk of malignancy and so may have a role in personalized medicine. In this work we found large inter-observer differences (>25 percentage units) in the calculated risk of malignancy in about 10% of cases. The differences were explained by ultrasound examiners interpreting ultrasound images differently. We suggest measures to improve inter-observer agreement. Until better inter-observer agreement in the calculated risk of malignancy using LR1 and LR2 has been shown, one should be cautious with using the risk estimate for individual patient counselling.


Abstract

Purpose: To estimate inter-observer agreement with regard to describing adnexal masses using the International Ovarian Tumor Analysis (IOTA) terminology and the risk of malignancy calculated using IOTA logistic regression models LR1 and LR2, and to elucidate what explained the largest inter-observer differences in calculated risk of malignancy.

Experimental design: 117 women with adnexal masses were examined with transvaginal gray scale and power Doppler ultrasound by two independent experienced sonologists who described the masses using IOTA terminology. The risk of malignancy was calculated using LR1 and LR2. A predetermined risk of malignancy cutoff of 10% indicated malignancy.

Results: There were 94 benign, four borderline, and 19 invasively malignant tumors. There was substantial variability between the two sonologists in measurement results and some variability in assessment of categorical variables (agreement 40-98%, Kappa 0.30-0.91). Inter-observer agreement when classifying tumors as benign or malignant was 84% (98/117), Kappa 0.68 for LR1, and 85% (99/117), Kappa 0.68 for LR2. When using LR1 and LR2, the inter-observer difference in calculated risk was >25 percentage units in 9% (11/117) and 12% (14/117) of tumors, respectively. Differences in assessment of wall irregularity, acoustic shadowing, color score, and color flow in papillary projections explained most of these largest differences.

Conclusions: Inter-observer agreement in classifying tumors as benign or malignant using the risk of malignancy cutoff of 10% for LR1 and LR2 was good. However, because risk estimates may differ substantially between sonologists, one should be cautious with using the risk value for counseling patients about their individual risk.


Introduction

One of the first successful attempts to use ultrasound to discriminate between benign and malignant adnexal masses was made by Granberg and coworkers (1). They classified adnexal masses into five categories (unilocular, unilocular solid, multilocular, multilocular solid, and solid tumors) and found that unilocular cysts, unilocular solid cysts, and multilocular cysts were rarely malignant. Later, subjective interpretation of ultrasound images of adnexal masses (pattern recognition) proved to be an excellent method for discriminating between benign and malignant adnexal masses (2-5) and also for making a specific diagnosis (e.g., endometrioma, hydrosalpinx, etcetera) (3,6,7). As an alternative to pattern recognition, several research teams (8-10) created logistic regression models including clinical and ultrasound information to calculate the individual risk of malignancy in adnexal masses. Because of unclear definitions of many of the ultrasound variables included in these models, the International Ovarian Tumor Analysis (IOTA) group suggested standardized terms and definitions to be used when describing ultrasound images of adnexal masses (11). The IOTA group also created and validated several mathematical models in which these standardized terms and definitions were used to calculate the risk of malignancy for each individual adnexal mass (12-14). Of these models, the logistic regression models LR1 and LR2, including 12 and six variables, respectively (see Table 1), were suggested to be suitable for use in clinical practice (13,14). However, even when using standardized terms and definitions, ultrasound examiners may evaluate the features of an adnexal mass differently. There may also be variability in measurement results. This means that the risk of malignancy calculated by LR1 or LR2 may vary both within and between ultrasound examiners. We have shown that this is indeed the case when experienced ultrasound examiners analyze three-dimensional (3D) ultrasound volumes of adnexal masses (15). However, analysis of 3D ultrasound volumes does not necessarily reflect a situation where live examinations are performed.


The aims of this study were to estimate inter-observer agreement when live ultrasound scans are performed with regard to 1) describing adnexal masses using the IOTA terminology and 2) the risk of malignancy calculated using the IOTA logistic regression models LR1 and LR2, and 3) to elucidate what explains large inter-observer differences in calculated risk of malignancy.


Materials and methods

The Ethics Committee of Lund University approved the study protocol. Informed consent was obtained from all participants or the participant's guardian after the nature of the procedures had been fully explained.

This is a prospective observational study of real-time live ultrasound examinations of adnexal masses. Consecutive patients referred for an ultrasound examination and found to have an adnexal mass judged to need surgical removal were scanned according to the research protocol by sonologist 1 (PS) as part of the clinical ultrasound examination. A second ultrasound examination was carried out before surgery by sonologist 2 (LV). Both examiners used the standardized IOTA examination and measurement technique and the IOTA terminology (11) to describe their ultrasound findings and noted their results in a dedicated paper form. Sonologist 2 was blinded to the results of sonologist 1. Information on the clinical variables included in LR1 and LR2 (personal history of ovarian cancer, current hormonal therapy, age of the patient) was obtained at the preoperative ultrasound examination by sonologist 2. All patients were operated on within 90 days after the preoperative ultrasound examination performed by sonologist 2. The excised tissues underwent histological examination, and tumors were classified according to the criteria recommended by the International Federation of Gynecology and Obstetrics (16). Borderline tumors were classified as malignant.

The patients were examined in the lithotomy position with an empty urinary bladder (11). Abdominal ultrasound examination was added when needed. The ultrasound variables assessed with regard to inter-observer reproducibility are shown in Table 1. The size of the lesion and that of its largest solid component were measured (largest diameter and mean of three orthogonal diameters) using calipers on the frozen ultrasound image. A color score was assigned on the basis of subjective assessment of the color content of the tumor scan at power Doppler ultrasound examination. A color score of 1 indicates absence of color Doppler signals, a color score of 2 a minimal amount of color Doppler signals, a color score of 3 a moderate amount of color Doppler signals, and a color score of 4 a large amount of color Doppler signals in the tumor (11).

The ultrasound systems used were GE Voluson 730 Expert or GE Voluson E8 (GE Healthcare, Zipf, Austria) with a 5–9-MHz transvaginal transducer. For power Doppler ultrasound examinations the following settings were used: for the Voluson 730 Expert system, frequency 6-9 ("normal") MHz, pulse repetition frequency 0.6 kHz, gain 0.8, wall motion filter low 1 (40 Hz); and for the Voluson E8, frequency 6-9 ("normal") MHz, pulse repetition frequency 0.6 kHz, gain -4.0, wall motion filter low 1 (40 Hz).

Statistical analysis

The IOTA3 study screen (astraia GmbH, Munich, Germany) was used to calculate the risk of malignancy according to LR1. Weighted Kappa indices were calculated using the statistical program Stata, Version 10.1 for Windows (StataCorp LP, College Station, TX, USA). For all other statistical calculations, including calculation of the risk of malignancy when using LR2, we used the Statistical Package for the Social Sciences (SPSS program; IBM Corp., New York, NY, USA; PASW version 18.0).

Inter-observer agreement in the assessment of categorical variables was estimated by calculating the percentage agreement. Cohen's kappa was used to estimate by how much the observed agreement exceeded that expected by chance (17). Weighted kappa values are presented where appropriate (18). It has been suggested that kappa values >0.81 indicate very good agreement beyond chance, kappa values between 0.61 and 0.80 good agreement beyond chance, kappa values between 0.41 and 0.60 moderate agreement beyond chance, kappa values between 0.21 and 0.40 fair agreement beyond chance, and kappa values <0.20 poor agreement beyond chance (19).
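As an illustration of the two agreement measures described above, the following minimal Python sketch computes percentage agreement and unweighted Cohen's kappa for two observers rating the same cases. It is not the authors' Stata or SPSS code. The function names and the toy ratings are hypothetical, and the cut points in `agreement_label` (0.80, 0.60, 0.40, 0.20) are an assumption made to close the small gaps between the bands quoted in the text.

```python
from collections import Counter


def percent_agreement(ratings_a, ratings_b):
    """Share of cases in which both observers chose the same category."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two equally long, non-empty rating lists")
    same = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return same / len(ratings_a)


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(ratings_a)
    p_o = percent_agreement(ratings_a, ratings_b)
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    # Agreement expected by chance, from each observer's marginal frequencies.
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


def agreement_label(kappa):
    """Verbal bands from the text; the exact cut points are an assumption."""
    if kappa > 0.80:
        return "very good"
    if kappa > 0.60:
        return "good"
    if kappa > 0.40:
        return "moderate"
    if kappa > 0.20:
        return "fair"
    return "poor"


# Hypothetical toy data: acoustic shadowing judged by two observers in 10 masses.
observer_1 = ["yes"] * 4 + ["no"] * 6
observer_2 = ["yes", "yes", "yes", "no", "yes"] + ["no"] * 5
kappa = cohen_kappa(observer_1, observer_2)
print(percent_agreement(observer_1, observer_2), round(kappa, 2), agreement_label(kappa))
```

In the toy data the observers agree in 8 of 10 cases, but because half of that agreement is expected by chance, kappa is clearly lower than the raw percentage agreement. Weighted kappa, which the study used for ordinal variables such as color score, is not covered by this sketch.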


Inter-observer reproducibility of measurement results, including the calculated risks of malignancy using LR1 and LR2, was described as the difference between two measurement results. The differences between the measured values were plotted against the mean of the two measurements (Bland-Altman plots) to assess the relationship between the differences and the magnitude of the measurements (20). Systematic bias between two measurements was estimated by calculating the 95% confidence interval (CI) of the mean difference (mean difference ± 2 SE). If zero lay within this interval, no bias was assumed to exist between the two measurements. Inter-observer agreement was expressed as the mean difference and limits of agreement (20). Ninety-five percent of differences between any future measurements are estimated to fall between the lower and upper limit of agreement. Inter-observer reliability of measurement results was estimated by calculating the intra-class correlation coefficient (ICC) using analysis of variance (two-way random model, absolute agreement; this allows generalization of the results to a population of observers). The ICC indicates the proportion of the total variance in measurement results that can be explained by differences between the individuals examined. It depends both on the magnitude of measurement errors and on the true heterogeneity in the population in which measurements are made. The more variable the population investigated, the greater the ICC, and the less variable the population, the smaller the ICC (21). It has been suggested that ICC values >0.90 are needed for a test to be used in clinical practice (22).
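The quantities in this paragraph can be sketched in a few lines of Python. This is an illustration, not the authors' SPSS procedure. The function names and the four paired measurements are hypothetical. Two assumptions are made: the limits of agreement are taken as the mean difference ± 1.96 SD of the differences (the text does not state the multiplier), and the ICC is the single-measure form of the two-way random, absolute-agreement model (the text does not say whether single or average measures were reported).

```python
import math
from statistics import mean, stdev


def bland_altman(obs1, obs2):
    """Mean difference, its approximate 95% CI (mean +/- 2 SE), and limits of agreement."""
    if len(obs1) != len(obs2) or len(obs1) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [a - b for a, b in zip(obs1, obs2)]
    d_mean = mean(diffs)
    d_sd = stdev(diffs)  # sample SD of the paired differences
    se = d_sd / math.sqrt(len(diffs))
    ci = (d_mean - 2 * se, d_mean + 2 * se)
    # Assumption: limits of agreement = mean difference +/- 1.96 SD.
    loa = (d_mean - 1.96 * d_sd, d_mean + 1.96 * d_sd)
    return {
        "mean_diff": d_mean,
        "ci_mean_diff": ci,
        "limits_of_agreement": loa,
        # Bias is assumed only if zero lies outside the CI of the mean difference.
        "systematic_bias": not (ci[0] <= 0 <= ci[1]),
    }


def icc_absolute_agreement(obs1, obs2):
    """Single-measure ICC, two-way random model, absolute agreement, two observers."""
    n, k = len(obs1), 2
    rows = list(zip(obs1, obs2))
    grand = mean(list(obs1) + list(obs2))
    row_means = [mean(r) for r in rows]
    col_means = [mean(obs1), mean(obs2)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between observers
    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    ss_err = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)


# Hypothetical largest diameters (mm) of four masses measured by two observers.
sonologist_1 = [10, 20, 30, 40]
sonologist_2 = [8, 21, 27, 40]
print(bland_altman(sonologist_1, sonologist_2))
print(round(icc_absolute_agreement(sonologist_1, sonologist_2), 3))
```

The toy ICC is very high because the four hypothetical masses differ greatly in size relative to the measurement differences, which illustrates the point made in the text that the ICC depends on the heterogeneity of the population as well as on measurement error.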

Sensitivity and specificity with regard to malignancy were calculated for LR1 and LR2 using the information recorded by sonologist 1 and by sonologist 2.
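A minimal sketch of this calculation, assuming each mass has a calculated risk between 0 and 1 and a histological outcome in which borderline tumors count as malignant, as stated above. The function names and the example values are hypothetical. Treating a risk exactly equal to the 10% cutoff as test-positive is also an assumption, because the text does not say how that boundary case was handled.

```python
def classify(risk, cutoff=0.10):
    """Return True for test-positive (malignant).

    Assumption: a risk exactly at the cutoff counts as positive.
    """
    return risk >= cutoff


def sensitivity_specificity(risks, is_malignant, cutoff=0.10):
    """Sensitivity and specificity of a risk model against histology."""
    tp = fn = tn = fp = 0
    for risk, truth in zip(risks, is_malignant):
        predicted = classify(risk, cutoff)
        if truth and predicted:
            tp += 1
        elif truth:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one malignant and one benign case")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical calculated risks and outcomes for six masses.
risks = [0.02, 0.08, 0.15, 0.40, 0.95, 0.30]
malignant = [False, False, False, True, True, True]
sens, spec = sensitivity_specificity(risks, malignant)
print(f"sensitivity {sens:.0%}, specificity {spec:.0%}")
```

Running the same function once with the risks derived from sonologist 1 and once with those derived from sonologist 2 would give the per-observer comparison described in the text.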


Results

In all, 117 consecutive women with adnexal masses who underwent surgery were examined with ultrasound by the two sonologists as described above. Thirty-four women had bilateral adnexal masses. The most complex mass, or the largest one if both masses had similar ultrasound morphology, was used in our statistical analysis, the mass to be included being selected retrospectively to ensure that both sonologists contributed the same mass (right or left) to the analysis. Thus, 117 adnexal masses from 117 patients constitute our study population. The women's age ranged between 14 and 88 years (median 53), and 63 (54%) women were postmenopausal. There were 94 benign, four borderline, and 19 invasively malignant adnexal masses (Table 2).

The time elapsed between the ultrasound examinations of sonologist 1 and 2 was median 61 days (10th and 90th percentiles 13 and 132; range 1-204) for the tumors with benign histology and median 14 days (10th and 90th percentiles 2 and 31; range 1-41) for the tumors with malignant histology. There was no relationship between the number of days between the scans and the differences in measurement results or inter-observer agreement for discrete variables (Supplementary Fig. S1-S5 and Supplementary Table S1).

Inter-observer reproducibility of measurement results is shown in Table 3. Bland-Altman plots showed no clear trend for inter-observer differences in measurement results to change with the magnitude of the measurement values. Limits of agreement were wide for all measurements. There was one systematic difference between the two sonologists: sonologist 1 (who always performed the first examination) obtained higher measurement values for the maximum diameter of the mass. The least reliable measurement was the height of the largest papillary projection.

Inter-observer agreement when assessing categorical ultrasound variables is shown in Table 4. For most categorical ultrasound variables, inter-observer agreement beyond chance was good or very good (19). Inter-observer agreement beyond chance for variables included in LR1 or LR2 was poorest for color score (agreement 40%, weighted Kappa 0.36), presence of blood flow in papillary projection (agreement 90%, Kappa 0.48), irregular cyst wall (agreement 79%, Kappa 0.56), and acoustic shadowing (agreement 85%, Kappa 0.58).

Bland-Altman plots illustrating the relationship between the magnitude of the estimated risk of malignancy calculated using LR1 and LR2 and the inter-observer difference in calculated risk are shown in Figure 1. The plots manifest a diamond shape, i.e., the inter-observer differences are smallest for the lowest and highest risks, and they are very small for risks <2.5% and >95%. Logarithmic transformation of the data (20) did not substantially change the shape of the scatter plot. Therefore, we present our results as absolute inter-observer differences in calculated risk (in percentage units); see Table 5. There were no
systematic differences in calculated risks between the two sonologists, and reliability, reflected by the ICC values, was good (22), with ICC values of 0.911 for LR1 and 0.832 for LR2. When classifying tumors as having a risk of malignancy <10% (benign) or >10% (malignant) using LR1 or LR2, the inter-observer agreement was good for both models: inter-observer agreement 84% (98/117), Kappa 0.68 for model LR1, and inter-observer agreement 85% (99/117), Kappa 0.68 for model LR2. In the 19 cases where the two sonologists obtained different results with regard to malignancy when using LR1, the absolute inter-observer differences in calculated risk ranged from 0.7 to 59.6 percentage units: in six of the 19 cases the absolute inter-observer difference in calculated risk was <10.0 percentage units, in nine cases it was 10.0–24.9 percentage units, and in four cases it was >25.0 percentage units. In the 18 cases where the two sonologists obtained different results with regard to malignancy when using LR2, the absolute inter-observer difference in calculated risk ranged from 8.8 to 67.9 percentage units: in two of the 18 cases the absolute inter-observer difference in calculated risk was <10.0 percentage units, in ten cases it was 10.0–24.9 percentage units, and in six cases it was >25 percentage units.

The Bland-Altman plots (Figure 1) illustrate that for some tumors there were substantial inter-observer differences in the calculated risk of malignancy: when using LR1, the inter-observer difference in calculated risk was >25 percentage units in 11 tumors (9% of all tumors), and when using LR2, the inter-observer difference in calculated risk was >25 percentage units in 14 tumors (12% of all tumors). To elucidate which inter-observer differences explained these largest inter-observer differences in calculated risk, we scrutinized each case where the difference was >25 percentage units. The results are shown in Supplementary Tables S2 and S3. When using LR1, a discrepancy for one single categorical variable explained the difference in four of the 11 cases, while a discrepancy for two categorical variables explained the difference in one case (differences in measurements being <5 mm in these five cases). In six cases there were differences in one or two categorical variables but also substantial differences (6-61 mm) in at least one measurement result. In no case was the large difference in calculated risk explained exclusively by differences in measurement results. The categorical variables judged differently by the two sonologists in these 11 cases were color score (n = 5), irregular cyst wall (n = 5), flow in papillary projection (n = 3), and acoustic shadowing (n = 2).

When using LR2, a discrepancy for one single categorical variable explained the large difference in calculated risk (>25 percentage units) in eight of the 14 cases (differences in measurements being <5 mm in these eight cases), and in four of the eight cases the sonologists judged acoustic shadowing differently. In five cases there were differences in one categorical variable but also a substantial difference (9-61 mm) in the measurement of the largest solid component. In yet another case there were differences in two categorical variables as well as in the measurement of the largest solid component. The categorical variables judged differently by the two sonologists in these 14 cases were acoustic shadowing (n = 5), irregular cyst wall (n = 5), ascites (n = 3), and flow in papillary projection (n = 2).

The sensitivity with regard to malignancy when using LR1 (10% risk cutoff) was 100% (23/23; 95% CI, 82-100) for both sonologists; the specificity was 74% (70/94; 95% CI, 64-82) for sonologist 1 and 63% (59/94; 95% CI, 53-72) for sonologist 2. The sensitivity when using LR2 was 100% (23/23; 95% CI, 82-100) for sonologist 1 and 91% (21/23; 95% CI, 72-98) for sonologist 2, and the specificity was 75.5% (71/94; 95% CI, 65-84) for both sonologists.
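As a quick arithmetic check, the percentages in this paragraph can be recomputed from the counts. The sketch below assumes the counts read as 23/23, 70/94, 59/94, 21/23 and 71/94, which is an interpretation of an extraction that lost its punctuation. The dictionary labels and the helper name are hypothetical. The confidence intervals are not recomputed because the text does not state which interval method was used.

```python
# Counts as read from the paragraph above (numerator, denominator).
reported = {
    "LR1 sensitivity, both sonologists": (23, 23),
    "LR1 specificity, sonologist 1": (70, 94),
    "LR1 specificity, sonologist 2": (59, 94),
    "LR2 sensitivity, sonologist 1": (23, 23),
    "LR2 sensitivity, sonologist 2": (21, 23),
    "LR2 specificity, both sonologists": (71, 94),
}


def percent(numerator, denominator):
    """Proportion expressed as a percentage."""
    return 100.0 * numerator / denominator


for label, (num, den) in reported.items():
    print(f"{label}: {num}/{den} = {percent(num, den):.1f}%")
```

The denominators are consistent with the study population: 23 malignant masses (four borderline plus 19 invasive) and 94 benign masses.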


Discussion

We have shown substantial inter-observer variability in the results of measurements taken in adnexal masses (wide limits of agreement). Inter-observer agreement beyond chance was very good or good for most categorical variables, but it was only moderate or fair for some. Inter-observer agreement above chance was poorest for variables heavily dependent on subjective evaluation and/or machine settings, i.e., color score, presence of color Doppler signals in papillary projections, irregular cyst walls, acoustic shadowing (all four variables being included in LR1 or LR2), echogenicity of cyst fluid, and ovarian crescent sign. Despite this, there was good inter-observer agreement when classifying tumors as benign or malignant using the predetermined risk of malignancy cutoff of 10%. However, in some cases there were substantial differences in the calculated risk of malignancy between the two sonologists, the difference being >25.0 percentage units in 9% of all tumors when using LR1 and in 12% of all tumors when using LR2.

The strength of our study is that it provides new information. To the best of our knowledge, there is only one publication reporting on inter-observer agreement with regard to describing ultrasound findings in adnexal masses using the IOTA terminology (11) when performing live ultrasound examinations (23). However, that study (23) evaluated inter-observer agreement with regard to the ten ultrasound features in the IOTA simple rules (24,25), not the variables included in the IOTA logistic regression models LR1 and LR2, and agreement was estimated between examiners with different levels of experience. The variable with poorest agreement beyond chance in the study cited was acoustic shadowing (Kappa 0.36). We have found no published study that has estimated inter-observer reproducibility of the calculated risk of malignancy using LR1 or LR2 after live scanning.

It is a limitation of our study that up to 204 days elapsed between the scans of the two sonologists (up to 41 days for malignant masses). Because days elapsed between the scans, theoretically the inter-observer differences could be explained by the lesions having changed in size or morphology between the scans. We find this highly unlikely for the following reasons. First, there was no relationship between the differences in measurement results and the number of days between the scans (Supplementary Fig. S1-S5). Nor was there a clear tendency for inter-observer agreement for discrete variables to depend on the time between the scans (Supplementary Table S1). Second, one would expect a lesion and its components to increase in size with time, but sonologist 1, performing the first scan, obtained higher measurement values than sonologist 2. Third, it is our experience after having performed gynecological scans for more than 20 years that the ultrasound morphology of both benign and malignant adnexal masses remains constant over time, that benign adnexal lesions grow slowly, and that malignant masses do not change appreciably in size even during 1 month of observation. Therefore, we believe that the discrepancies between the two sonologists reflect true inter-observer differences and not a change of the masses over time. A second limitation is that we did not include estimation of the reproducibility of retrieving anamnestic information (current hormonal therapy, personal history of ovarian cancer), the anamnestic information collected by the second sonologist being used in all cases. It cannot be entirely excluded that patients would answer differently when asked by different sonologists or that sonologists could interpret the answers of the patients differently. A third limitation is that we did not estimate intra-observer reproducibility. We considered four scans (two per sonologist) likely to be unacceptable to patients. For the same reason only two sonologists were involved in this study, and our results are generalizable only to sonologists with a similar level of experience.

The results of this live scanning study are similar to those of another study in which the same sonologists assessed the same variables using 3D ultrasound volumes from adnexal masses in another tumor population (15). The similarity in results between the two studies is surprising, because the conditions when assessing 3D ultrasound volumes are different from those during a live scan. When evaluating ultrasound volumes, sonologists are exposed to the same ultrasound images, and so any inter-observer difference should be explained exclusively by differences in interpreting the ultrasound information. During a live scan there are more sources of bias. This could result either in poorer or in better inter-observer agreement than when 3D ultrasound volumes are assessed: poorer, because ultrasound examiners are likely to use different machine settings and scanning conditions may change from one minute to another; better, because the dynamic nature of live scanning facilitates discrimination between solid components and amorphous tissue.

Our results showed that two experienced sonologists agreed quite well in their classification of masses as benign or malignant using the 10% risk of malignancy cutoff of LR1 and LR2, and that the diagnostic performance of LR1 and LR2 with regard to discrimination between benign and malignant tumors was similar for the two sonologists and similar to that reported by others (14, 26-28). This is reassuring, because the main purpose of using models LR1 and LR2 is to classify tumors as benign or malignant. Potentially, however, LR1 and LR2 can be used not only to classify adnexal masses as benign or malignant but also to counsel a patient about her individual risk of malignancy (13). To use the calculated risk for individual counseling, one must be reasonably certain not only that the estimated risk agrees well with the true risk (when externally validated, both LR1 and LR2 underestimated the true risk, especially in the risk interval 30-70% (14)) but also that the risk estimates are reproducible, i.e., that different examiners will obtain similar risk estimates. Our results show that risk estimates may differ substantially between experienced observers, the difference in estimated risk being >25 percentage units in 9% and 12% of cases when using LR1 and LR2, respectively. Interobserver agreement above chance was poorest for those variables in the models that are heavily dependent on subjective evaluation, i.e., color score, presence of color Doppler signals in papillary projections, irregular cyst walls, and acoustic shadowing. Indeed, differences in these explained most of the largest inter-observer differences in calculated risk of malignancy. In models based on few variables, changing the value of only one variable may result in large differences in predicted risks, while a model with many variables is less vulnerable to a change in one or even a few variables. Our results illustrate this (Supplementary Tables S2 and S3). When using LR2 (which includes six variables), a change in value for one single categorical variable explained an inter-observer difference in calculated risk >25 percentage units in eight of 14 cases, while when using LR1 (which includes 12 variables), a change in value for one single categorical variable explained an inter-observer difference in calculated risk >25 percentage units in only four of 11 cases. Acoustic shadowing is a strong variable in both LR1 and LR2 and has great impact on the calculated risk in LR2, which has only six variables. In our hands, as well as in those of Ruiz de Gauna et al. (23), inter-observer agreement for acoustic shadowing was at most moderate. The interobserver agreement for color score was only fair in our study, and color score is an important variable in LR1.
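The effect of a single disputed finding on LR2 can be illustrated with a minimal sketch. It uses the LR2 coefficients given in the footnote to Table 1; the patient (age 60, a 30 mm solid component, irregular cyst walls) and the dictionary key names are hypothetical and chosen only for illustration. The two "observers" differ solely in whether they record acoustic shadows.

```python
import math

# LR2 coefficients as listed in the Table 1 footnote of this paper.
LR2_INTERCEPT = -5.3718
LR2_WEIGHTS = {
    "age_years": 0.0354,
    "ascites": 1.6159,
    "flow_in_papillary_projection": 1.1768,
    "solid_max_mm": 0.0697,        # no further increase above 50 mm
    "irregular_cyst_walls": 0.9586,
    "acoustic_shadows": -2.9486,
}


def lr2_risk_percent(findings):
    """Return the LR2 risk of malignancy in percent for one set of findings."""
    values = dict(findings)
    values["solid_max_mm"] = min(values["solid_max_mm"], 50)
    z = LR2_INTERCEPT + sum(w * values[name] for name, w in LR2_WEIGHTS.items())
    return 100.0 / (1.0 + math.exp(-z))


# Hypothetical case: identical findings except for acoustic shadows.
observer_1 = {
    "age_years": 60,
    "ascites": 0,
    "flow_in_papillary_projection": 0,
    "solid_max_mm": 30,
    "irregular_cyst_walls": 1,
    "acoustic_shadows": 0,
}
observer_2 = dict(observer_1, acoustic_shadows=1)

risk_1 = lr2_risk_percent(observer_1)
risk_2 = lr2_risk_percent(observer_2)
print(f"observer 1: {risk_1:.1f}%  observer 2: {risk_2:.1f}%  "
      f"difference: {risk_1 - risk_2:.1f} percentage units")
```

In this invented case the disagreement on one binary variable moves the estimate across the 10% cutoff and changes it by more than 25 percentage units, which is the kind of discrepancy described above.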

To improve inter-observer reproducibility of calculated risks based on LR1 and LR2, inter-observer differences in descriptions and measurements of adnexal masses using the IOTA terminology and measurement technique need to be reduced. One way to achieve this could be by providing courses on, and training in, how to examine and describe adnexal masses using the IOTA terms. Interactive courses in which a large number of ultrasound images are discussed with the course participants are likely to be very valuable in this respect. More precise definitions of the IOTA terms, for example by providing ample imaging material, would probably also help improve inter-observer agreement. Special attention should be given to the variables with poorest reproducibility, i.e., the color score, wall irregularity, acoustic shadowing, and detection of blood flow in papillary projections. Until better inter-observer agreement in the calculated risk of malignancy using LR1 and LR2 has been shown, one should be cautious about using the risk estimates for individual patient counselling.


Acknowledgements

None


References

1. Granberg S, Norström A, Wikland M. Tumors in the lower pelvis as imaged by vaginal sonography. Gynecol Oncol 1990;37:224-9.
2. Benacerraf BR, Finkler NJ, Wojciechowski C, Knapp RC. Sonographic accuracy in the diagnosis of ovarian masses. J Reprod Med 1990;35:491-5.
3. Valentin L. Pattern recognition of pelvic masses by gray-scale ultrasound imaging: the contribution of Doppler ultrasound. Ultrasound Obstet Gynecol 1999;14:338-47.
4. Valentin L. Prospective cross-validation of Doppler ultrasound examination and gray-scale ultrasound imaging for discrimination of benign and malignant pelvic masses. Ultrasound Obstet Gynecol 1999;14:273-83.
5. Timmerman D, Schwärzler P, Collins WP, Claerhout F, Coenen M, Amant F, et al. Subjective assessment of adnexal masses with the use of ultrasonography: an analysis of interobserver variability and experience. Ultrasound Obstet Gynecol 1999;13:11-6.
6. Sokalska A, Timmerman D, Testa AC, Van Holsbeke C, Lissoni AA, Leone FPG, et al. Diagnostic accuracy of transvaginal ultrasound examination for assigning a specific diagnosis to adnexal masses. Ultrasound Obstet Gynecol 2009;34:462-70.
7. Valentin L. Use of morphology to characterize and manage common adnexal masses. Best Pract Res Clin Obstet Gynaecol 2004;18:71-89.
8. Tailor A, Jurkovic D, Bourne TH, Collins WP, Campbell S. Sonographic prediction of malignancy in adnexal masses using multivariate logistic regression analysis. Ultrasound Obstet Gynecol 1997;10:41-7.
9. Timmerman D, Bourne TH, Tailor A, Collins WP, Verrelst H, Vandenberghe K, et al. A comparison of methods for the preoperative discrimination between benign and malignant adnexal masses: the development of a new logistic regression model. Am J Obstet Gynecol 1999;181:57-65.


10. Alcazar JL, Jurado M. Prospective evaluation of logistic model based on sonographic morphologic and color Doppler findings developed to predict adnexal malignancy. J Ultrasound Med 1999;18:837-42.
11. Timmerman D, Valentin L, Bourne TH, Collins WP, Verrelst H, Vergote I. Terms, definitions and measurements to describe the sonographic features of adnexal tumors: a consensus opinion from the International Ovarian Tumor Analysis (IOTA) Group. Ultrasound Obstet Gynecol 2000;16:500-5.
12. Timmerman D, Testa AC, Bourne T, Ferrazzi E, Ameye L, Konstantinovic ML, et al. Logistic regression model to distinguish between the benign and malignant adnexal mass before surgery: a multicenter study by the International Ovarian Tumor Analysis Group. J Clin Oncol 2005;34:8794-801.
13. Kaijser J, Bourne T, Valentin L, Sayasneh A, Van Holsbeke C, Vergote I, et al. Improving strategies for diagnosing ovarian cancer: a summary of the International Ovarian Tumor Analysis (IOTA) studies. Ultrasound Obstet Gynecol 2013;41:9-20.
14. Timmerman D, Van Calster B, Testa AC, Guerriero S, Fischerova D, Lissoni AA, et al. Ovarian cancer prediction in adnexal masses using ultrasound-based logistic regression models: a temporal and external validation study by the IOTA group. Ultrasound Obstet Gynecol 2010;36:226-34.
15. Sladkevicius P, Valentin L. Intra- and inter-observer agreement when describing adnexal masses using the International Ovarian Tumour Analysis (IOTA) terms and definitions: a study on three-dimensional (3D) ultrasound volumes. Ultrasound Obstet Gynecol 2013;41:318-27.
16. Heintz APM, Odicino F, Maisonneuve P, Beller U, Benedet JL, Creasman WT, et al. Carcinoma of the Ovary. 25th Annual Report on the Results of Treatment in Gynecological Cancer. Int J Gynecol Obstet 2003;83:S135-S166 (suppl 1).


17. Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas 1960;20:37-46.
18. Kundel HL, Polansky M. Measurement of observer agreement. Radiology 2003;228:303-8.
19. Brennan P, Silman A. Statistical methods for assessing observer variability in clinical measures. BMJ 1992;304:1491-4.
20. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986;1:307-10.
21. Bartlett JW, Frost C. Reliability, repeatability and reproducibility: analysis of measurement errors in continuous variables. Ultrasound Obstet Gynecol 2008;31:466-75.
22. Kottner J, Audigé L, Brorson S, Donner A, Gajewski BJ, Hróbjartsson A, et al. Guidelines for Reporting Reliability and Agreement Studies (GRRAS) were proposed. J Clin Epidemiol 2011;64:96-106.
23. Ruiz de Gauna B, Sanchez P, Pineda L, Utrilla-Layna J, Juez L, Alcázar JL. Inter-observer agreement with regard to describing adnexal masses using the IOTA simple rules in a real-time setting and when using three-dimensional ultrasound volumes and digital clips. Ultrasound Obstet Gynecol 2014;44:95-100.
24. Timmerman D, Testa AC, Bourne T, Ameye L, Jurkovic D, Van Holsbeke C, et al. Simple ultrasound-based rules for the diagnosis of ovarian cancer. Ultrasound Obstet Gynecol 2008;31:681-90.
25. Timmerman D, Ameye L, Fischerova D, Epstein E, Melis GB, Guerriero S, et al. Simple ultrasound rules to distinguish between benign and malignant adnexal masses before surgery: prospective validation by IOTA group. BMJ 2010;341:c6839.


26. Van Holsbeke C, Van Calster B, Testa AC, Domali E, Lu C, Van Huffel S, et al. Prospective internal validation of mathematical models to predict malignancy in adnexal masses: results from the international ovarian tumor analysis study. Clin Cancer Res 2009;15:684-91.
27. Nunes N, Yazbek J, Ambler G, Hoo W, Naftalin J, Jurkovic D. Prospective evaluation of the IOTA logistic regression model LR2 for the diagnosis of ovarian cancer. Ultrasound Obstet Gynecol 2012;40:355-9.
28. Nunes N, Ambler G, Hoo WL, Naftalin J, Foo X, Widschwendter M, et al. A prospective validation of the IOTA logistic regression models (LR1 and LR2) in comparison to subjective pattern recognition for the diagnosis of ovarian cancer. Int J Gynecol Cancer 2013;23:1583-9.


Table 1. Variables for which inter-observer reproducibility was estimated
___________________________________________________________________________
Variables included in logistic regression models LR1 and LR2 or of importance for these models
  Maximum diameter of adnexal mass: mm
  Maximum diameter of largest solid component: mm
  Maximum diameter of largest solid component: ≤50, >50 mm
  Presence of an entirely solid adnexal mass: yes, no
  Papillary projections present: yes, no
  Irregular internal cyst walls: yes, no
  Acoustic shadows: yes, no
  Color Doppler signals in papillary projection: yes, no
  Color score: 1, 2, 3, 4
  Tenderness of adnexal mass at scan: yes, no
  Ascites: yes, no
Other IOTA variables
  Continuous IOTA variables
    Mean diameter of adnexal mass: mm
    Mean diameter of largest solid component: mm
    Height of the largest papillary projection: mm
  Categorical IOTA variables
    Type of tumor: unilocular, unilocular solid, multilocular, multilocular solid, solid
    Mean diameter of adnexal mass: ≤40, 41-60, 61-80, 81-100 and >100 mm
    Number of cyst locules: 0, 1, 2, 3, 4, 5, 6-10, 11-20 and >20; ≤10, >10†
    Septum present: yes, no
    Incomplete septum present: yes, no
    Solid component: yes, no
    Papillary projections: 0, 1, 2, 3, ≥4; 1, 2, 3, ≥4‡; <4, >4†
    Echogenicity of cyst fluid: anechoic, low level, ground glass, mixed, no cyst fluid
    Color Doppler signals detectable: yes, no
    Ovarian crescent sign: yes, no
Diagnosis based on LR1 and LR2
  Calculated risk of malignancy using LR1: ≤10%, >10%
  Calculated risk of malignancy using LR2: ≤10%, >10%
___________________________________________________________________________
IOTA, International Ovarian Tumor Analysis group; LR1, logistic regression model 1; LR2, logistic regression model 2.

The risk of malignancy using LR1 is derived as $y = 1/(1 + e^{-z})$, where $z = -6.7468 + 1.5985\,(a) - 0.9983\,(b) + 0.0326\,(c) + 0.00841\,(d) - 0.8577\,(e) + 1.5513\,(f) + 1.1737\,(g) + 0.9281\,(h) + 0.0496\,(i) + 1.1421\,(j) - 2.3550\,(k) + 0.4916\,(l)$ and $e$ is the mathematical constant and base value of natural logarithms. The 12 variables included in LR1 are: (a) personal history of ovarian cancer (yes = 1, no = 0); (b) current hormonal therapy (yes = 1, no = 0); (c) age of the patient (in years); (d) maximum diameter of the lesion (in millimeters); (e) mass tender at scan (yes = 1, no = 0); (f) the presence of ascites (yes = 1, no = 0); (g) the presence of blood flow within a solid papillary projection (yes = 1, no = 0); (h) the presence of a purely solid tumor (yes = 1, no = 0); (i) maximal diameter of the solid component (expressed in millimeters, but with no increase > 50 mm); (j) irregular internal cyst walls (yes = 1, no = 0); (k) the presence of acoustic shadows (yes = 1, no = 0); and (l) color score (1, 2, 3, or 4).

The risk of malignancy using LR2 is derived as $y = 1/(1 + e^{-z})$, where $z = -5.3718 + 0.0354\,(c) + 1.6159\,(f) + 1.1768\,(g) + 0.0697\,(i) + 0.9586\,(j) - 2.9486\,(k)$ (12).

A calculated risk of malignancy of more than 10% classified the adnexal mass as malignant.
† includes 0. ‡ includes only cases where papillary projections were registered at both examinations.
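The two formulas in the footnote can be sketched in a few lines of Python. This is an illustrative transcription only, not the IOTA group's software: the coefficients are those printed above, while the function names, dictionary keys, and the example patient are hypothetical.

```python
import math

# Coefficients from the Table 1 footnote; letters refer to the variables listed there.
LR1_INTERCEPT = -6.7468
LR1_WEIGHTS = {
    "history_ovarian_cancer": 1.5985,        # (a) yes = 1, no = 0
    "hormonal_therapy": -0.9983,             # (b) yes = 1, no = 0
    "age_years": 0.0326,                     # (c)
    "lesion_max_mm": 0.00841,                # (d)
    "tender_at_scan": -0.8577,               # (e) yes = 1, no = 0
    "ascites": 1.5513,                       # (f) yes = 1, no = 0
    "flow_in_papillary_projection": 1.1737,  # (g) yes = 1, no = 0
    "purely_solid": 0.9281,                  # (h) yes = 1, no = 0
    "solid_max_mm": 0.0496,                  # (i) no increase above 50 mm
    "irregular_cyst_walls": 1.1421,          # (j) yes = 1, no = 0
    "acoustic_shadows": -2.3550,             # (k) yes = 1, no = 0
    "color_score": 0.4916,                   # (l) 1, 2, 3 or 4
}
LR2_INTERCEPT = -5.3718
LR2_WEIGHTS = {
    "age_years": 0.0354,
    "ascites": 1.6159,
    "flow_in_papillary_projection": 1.1768,
    "solid_max_mm": 0.0697,
    "irregular_cyst_walls": 0.9586,
    "acoustic_shadows": -2.9486,
}


def logistic_risk(intercept, weights, patient):
    """Evaluate y = 1 / (1 + e^(-z)) for one patient record."""
    values = dict(patient)
    # Variable (i) is entered with no increase above 50 mm.
    values["solid_max_mm"] = min(values["solid_max_mm"], 50)
    z = intercept + sum(w * values[name] for name, w in weights.items())
    return 1.0 / (1.0 + math.exp(-z))


def lr1_risk(patient):
    return logistic_risk(LR1_INTERCEPT, LR1_WEIGHTS, patient)


def lr2_risk(patient):
    return logistic_risk(LR2_INTERCEPT, LR2_WEIGHTS, patient)


# Hypothetical patient, for illustration only.
example = {
    "history_ovarian_cancer": 0, "hormonal_therapy": 0, "age_years": 55,
    "lesion_max_mm": 80, "tender_at_scan": 0, "ascites": 0,
    "flow_in_papillary_projection": 1, "purely_solid": 0, "solid_max_mm": 25,
    "irregular_cyst_walls": 1, "acoustic_shadows": 0, "color_score": 3,
}
for name, risk in (("LR1", lr1_risk(example)), ("LR2", lr2_risk(example))):
    label = "malignant" if risk > 0.10 else "benign"
    print(f"{name}: {100 * risk:.1f}% -> classified as {label}")
```

The 10% cutoff in the last lines mirrors the classification rule stated in the footnote.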


Table 2. Histological diagnoses of the masses
___________________________________________________________________________
Histological diagnosis of tumor                 Number
___________________________________________________________________________
Benign tumors                                       94
  Benign simple cyst                                 7
  Endometrioma                                      10
  Dermoid cyst                                      16
  Serous cystadenoma                                16
  Mucinous cystadenoma                              18
  Myoma/fibroma                                      9
  Cystadenofibroma                                  11
  Paraovarian cyst                                   5
  Sactosalpinx/chronic salpingitis                   1
  Leydig cell tumor                                  1
Borderline tumors                                    4
  Serous                                             2
  Mucinous                                           1
  Endometrioid                                       1
Invasive malignancy                                 19
  Primary ovarian adenocarcinoma                    13
  Granulosa cell tumor                               3
  Dysgerminoma                                       1
  Leiomyosarcoma                                     1
  Malignant aggressive B-cell lymphoma               1
___________________________________________________________________________


Table 3. Inter-observer reproducibility of continuous variables used in the risk calculation models LR1 and LR2 and of other continuous variables used to describe adnexal masses
___________________________________________________________________________
Columns: Parameter | Measurement results, both sonologists: median (range) | Difference in mm between two measurements made by sonologists 1 and 2 (a): mean (95% CI) | Limits of agreement | Intra-CC: point estimate (95% CI)
___________________________________________________________________________
Variables used in models LR1 and LR2
Maximum diameter of adnexal mass, mm | 70 (10 – 313), n=234 | 3.80 (1.12 – 6.48), n=117 | -25.24 – 32.82 | 0.958 (0.937 – 0.971)
Maximum diameter of largest solid component, mm (b) | 29.50 (5 – 180), n=122 | 1.92 (-1.74 – 5.58), n=61 | -26.66 – 30.50 | 0.942 (0.905 – 0.964)

Other variables used to describe adnexal mass
Mean diameter of adnexal mass, mm | 58.5 (9 – 240), n=234 | 1.05 (-0.15 – 1.95), n=117 | -8.61 – 10.72 | 0.971 (0.958 – 0.980)
Mean diameter of largest solid component, mm (b) | 22 (4 – 156), n=122 | 0.59 (-1.82 – 2.98), n=61 | -18.16 – 19.32 | 0.962 (0.937 – 0.977)
Height of largest papillary projection, mm (c) | 8 (3 – 25), n=42 | -0.51 (-2.93 – 1.91), n=21 | -11.61 – 10.59 | 0.609 (0.245 – 0.821)
___________________________________________________________________________
CI, confidence interval; Intra-CC, intra-class correlation coefficient.
(a) The measurements made by sonologist 2 were subtracted from the measurements made by sonologist 1.
(b) Includes only cases where solid components were registered by both observers.
(c) Includes only cases where papillary projections were registered by both observers. Using logarithmically transformed data, the results were as follows: mean difference 1.03 (95% CI 0.78 – 1.36), limits of agreement 0.31 – 3.61, Intra-CC 0.547 (0.153 – 0.789). These results can be used for comparison with our published results on inter-observer agreement when analyzing ultrasound volumes obtained by three-dimensional ultrasound (15).
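For readers unfamiliar with the quantities in Table 3, the following sketch shows one common way to compute a mean difference, its 95% confidence interval, and limits of agreement in the style of Bland and Altman (20). It assumes the limits are mean ± 1.96 standard deviations of the differences and uses a normal approximation for the confidence interval; the text reproduced here does not state the exact procedure used in the study, and the measurements below are invented.

```python
import math
import statistics


def limits_of_agreement(observer_1, observer_2):
    """Mean difference, its approximate 95% CI, and limits of agreement.

    Differences are observer 1 minus observer 2, as in footnote (a) of Table 3.
    """
    diffs = [a - b for a, b in zip(observer_1, observer_2)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)           # sample standard deviation
    se_mean = sd_diff / math.sqrt(len(diffs))   # standard error of the mean difference
    ci = (mean_diff - 1.96 * se_mean, mean_diff + 1.96 * se_mean)
    loa = (mean_diff - 1.96 * sd_diff, mean_diff + 1.96 * sd_diff)
    return mean_diff, ci, loa


# Invented maximum-diameter measurements (mm) for six masses.
sonologist_1 = [42, 65, 88, 120, 57, 73]
sonologist_2 = [40, 70, 85, 110, 58, 69]

mean_diff, ci, loa = limits_of_agreement(sonologist_1, sonologist_2)
print(f"mean difference {mean_diff:.2f} mm, "
      f"95% CI {ci[0]:.2f} to {ci[1]:.2f}, "
      f"limits of agreement {loa[0]:.2f} to {loa[1]:.2f}")
```

The limits of agreement describe how far apart two observers' measurements of the same mass are likely to be, whereas the confidence interval describes the uncertainty in the systematic (mean) difference.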


Table 4. Inter-observer agreement for categorical variables included in the risk calculation models LR1 and LR2 and for other categorical variables used to describe adnexal masses
____________________________________________________________________________________________
Variable | Agreement | Kappa value
____________________________________________________________________________________________
Variables used in models LR1 and LR2
Presence of entirely solid tumor (yes, no) | 97% (114/117) | 0.91
Maximum diameter of largest solid component (≤50 mm, >50 mm) | 95% (111/117) | 0.83
Maximum diameter of largest solid component (≤50 mm, >50 mm) (a) | 92% (56/61) | 0.82
Tenderness of adnexal mass (yes, no) | 97% (113/117) | 0.78
Ascites (yes, no) | 97% (113/117) | 0.73
Papillary projections (yes, no) | 87% (102/117) | 0.65
Solid component (yes, no) | 84% (98/117) | 0.67
Irregular cyst wall (yes, no) | 79% (92/117) | 0.56
Acoustic shadows (yes, no) | 85% (99/117) | 0.58
Color Doppler signals in papillary projections (yes, no) (b) | 90% (105/117) | 0.48
Color Doppler signals in papillary projections (yes, no) (c) | 86% (18/21) | 0.71
Color score (1, 2, 3, 4) | 40% (47/117) | 0.36 (d)

Other variables used to describe adnexal masses
Tumor type (unilocular, unilocular solid, multilocular, multilocular solid, solid) | 97% (113/117) | 0.70
Mean diameter of tumor (mm): ≤40, 41-60, 61-80, 81-100, >100 | 66% (77/117) | 0.76 (d)
Number of locules: 0, 1, 2, 3, 4, 5, 6-10, 11-20, >20 | 58% (68/117) | 0.79 (d)
Number of locules: ≤10, >10 (e) | 94% (110/117) | 0.80
Septum (yes, no) | 88% (103/117) | 0.76
Incomplete septum (yes, no) | 98% (115/117) | ¶
Number of papillary projections: 0, 1, 2, 3, ≥4 | 81% (95/117) | 0.64 (d)
Number of papillary projections: 1, 2, 3, ≥4 (c) | 67% (14/21) | 0.63 (d)
Number of papillary projections: <4, >4 (e) | 96% (112/117) | 0.68
Echogenicity of cyst fluid (anechoic, low level, ground glass, mixed, no fluid) | 67% (78/117) | 0.56
Color Doppler signals detectable (yes, no) | 84% (98/117) | 0.30
Ovarian crescent sign (yes, no) | 78% (91/117) | 0.49
Fluid in Douglas pouch (yes, no) | 87% (102/117) | 0.72
____________________________________________________________________________________________
(a) Includes only cases where solid components were registered at both examinations.
(b) This variable reflects disagreement both with regard to the presence of papillary projections and the presence of blood flow in papillary projections (i.e., if one observer recorded the presence of papillary projections and the other did not, there was disagreement with regard to this variable, and if both observers recorded the presence of papillary projections but disagreed with regard to the presence of blood flow in the papillary projections, there was disagreement with regard to this variable).
(c) Includes only cases where papillary projections were registered at both examinations.
(d) Weighted Kappa index.
(e) Includes 0.
¶ Absent kappa values are explained by asymmetric field tables making it impossible to calculate them.
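The relationship between percentage agreement and Cohen's kappa (17) in Table 4 can be sketched as follows. The 2×2 counts are invented so that raw agreement equals 99/117; they are not the study's cross-tabulation, and the function name is hypothetical. The sketch implements unweighted kappa only, not the weighted index marked (d).

```python
def agreement_and_kappa(table):
    """Return (observed agreement, Cohen's kappa) for a square contingency table.

    table[i][j] is the number of cases that observer 1 put in category i
    and observer 2 put in category j.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance from the two observers' marginal frequencies.
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Invented counts: rows = observer 1 (yes, no), columns = observer 2 (yes, no).
hypothetical = [[20, 10],
                [8, 79]]
observed, kappa = agreement_and_kappa(hypothetical)
print(f"agreement {100 * observed:.0f}%, kappa {kappa:.2f}")
```

Because kappa discounts agreement expected by chance, a variable that is rarely present can show high raw agreement but a modest kappa, as seen for several rows of the table.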


Table 5. Inter-observer reproducibility of the risk of malignancy calculated using the risk calculation models LR1 and LR2
___________________________________________________________________________
Columns: Parameter | Calculated risk of malignancy, both sonologists: median (range) | Difference between the risk calculated by sonologists 1 and 2: mean (95% CI) | Limits of agreement | Intra-CC: point estimate (95% CI)
___________________________________________________________________________
Risk of malignancy calculated using LR1 | 7.85 (0.10 – 99.10), n=234 | -0.53 (-3.07 – 2.01), n=117 | -28.05 – 26.99 | 0.911 (0.874 – 0.937)
Risk of malignancy calculated using LR2 | 6.65 (0.10 – 98.40), n=234 | 0.02 (-3.06 – 3.10), n=117 | -33.22 – 33.26 | 0.832 (0.766 – 0.880)
___________________________________________________________________________
CI, confidence interval; Intra-CC, intra-class correlation coefficient.


Legends for figure

Figure 1. a) Scatterplot showing the relationship between inter-observer difference (observer 1 minus observer 2) in calculated risk and magnitude of calculated risk when using logistic regression model LR1. The plot manifests a diamond shape, the differences being smallest for the lowest and highest risks. For risks < 25 and > 95, the differences are very small. LOA, limits of agreement. b) Scatterplot showing the relationship between inter-observer difference in calculated risk and magnitude of calculated risk when using logistic regression model LR2. The plot manifests a diamond shape, the differences being smallest for the lowest and highest risks.


Published OnlineFirst November 25, 2014. Clin Cancer Res. Povilas Sladkevicius and Lil Valentin. Inter-observer agreement in describing the ultrasound appearance of adnexal masses and in calculating the risk of malignancy using logistic regression models.

Updated version: Access the most recent version of this article at doi:10.1158/1078-0432.CCR-14-0906

Supplementary Material: Access the most recent supplemental material at http://clincancerres.aacrjournals.org/content/suppl/2014/11/26/1078-0432.CCR-14-0906.DC1



Corresponding author: Povilas Sladkevicius
Department of Obstetrics and Gynecology,
Skåne University Hospital Malmö, S-20502 Malmö, Sweden
Telephone: +46 40 332636; Fax: +46 40 962600
Email: Povilas.Sladkevicius@med.lu.se

Conflict of interest statement
Both authors declare that they have no conflicts of interest.

The word count, excluding abstract (250 words) and references: 3688
Total number of figures: 1 to publish and 5 as supplemental material
Total number of tables: 5 to publish and 3 as supplemental material


Statement of translational relevance

The International Ovarian Tumor Analysis (IOTA) group has developed two logistic regression models (LR1 and LR2), including clinical and ultrasound variables, for calculation of the risk of malignancy in adnexal masses. It has been suggested that LR1 and LR2 can be used to counsel patients about their individual risk of malignancy and so may have a role in personalized medicine. In this work, we found large inter-observer differences (>25 percentage units) in the calculated risk of malignancy in about 10% of cases. The differences were explained by ultrasound examiners interpreting ultrasound images differently. We suggest measures to improve inter-observer agreement. Until better inter-observer agreement in the calculated risk of malignancy using LR1 and LR2 has been shown, one should be cautious about using the risk estimate for individual patient counselling.


Abstract

Purpose: To estimate inter-observer agreement with regard to describing adnexal masses using the International Ovarian Tumor Analysis (IOTA) terminology and the risk of malignancy calculated using IOTA logistic regression models LR1 and LR2, and to elucidate what explained the largest inter-observer differences in calculated risk of malignancy.

Experimental design: 117 women with adnexal masses were examined with transvaginal gray scale and power Doppler ultrasound by two independent experienced sonologists, who described the masses using IOTA terminology. The risk of malignancy was calculated using LR1 and LR2. A predetermined risk of malignancy cutoff of 10% indicated malignancy.

Results: There were 94 benign, four borderline, and 19 invasively malignant tumors. There was substantial variability between the two sonologists in measurement results and some variability in assessment of categorical variables (agreement 40-98%, Kappa 0.30-0.91). Inter-observer agreement when classifying tumors as benign or malignant was 84% (98/117), Kappa 0.68, for LR1 and 85% (99/117), Kappa 0.68, for LR2. When using LR1 and LR2, the inter-observer difference in calculated risk was >25 percentage units in 9% (11/117) and 12% (14/117) of tumors, respectively. Differences in assessment of wall irregularity, acoustic shadowing, color score, and color flow in papillary projections explained most of these largest differences.

Conclusions: Inter-observer agreement in classifying tumors as benign or malignant using the risk of malignancy cutoff of 10% for LR1 and LR2 was good. However, because risk estimates may differ substantially between sonologists, one should be cautious about using the risk value for counseling patients about their individual risk.


Introduction

One of the first successful attempts to use ultrasound to discriminate between benign and malignant adnexal masses was made by Granberg and coworkers (1). They classified adnexal masses into five categories (unilocular, unilocular solid, multilocular, multilocular solid, and solid tumors) and found that unilocular cysts, unilocular solid cysts, and multilocular cysts were rarely malignant. Later, subjective interpretation of ultrasound images of adnexal masses (pattern recognition) proved to be an excellent method for discriminating between benign and malignant adnexal masses (2-5) and also for making a specific diagnosis (e.g., endometrioma, hydrosalpinx, etcetera) (3, 6, 7). As an alternative to pattern recognition, several research teams (8-10) created logistic regression models including clinical and ultrasound information to calculate the individual risk of malignancy in adnexal masses. Because of unclear definitions of many of the ultrasound variables included in these models, the International Ovarian Tumor Analysis (IOTA) group suggested standardized terms and definitions to be used when describing ultrasound images of adnexal masses (11). The IOTA group also created and validated several mathematical models in which these standardized terms and definitions were used to calculate the risk of malignancy for each individual adnexal mass (12-14). Of these models, the logistic regression models LR1 and LR2, including 12 and six variables, respectively (see Table 1), were suggested to be suitable for use in clinical practice (13, 14). However, even when using standardized terms and definitions, ultrasound examiners may evaluate the features of an adnexal mass differently. There may also be variability in measurement results. This means that the risk of malignancy calculated by LR1 or LR2 may vary both within and between ultrasound examiners. We have shown that this is indeed the case when experienced ultrasound examiners analyze three-dimensional (3D) ultrasound volumes of adnexal masses (15). However, analysis of 3D ultrasound volumes does not necessarily reflect a situation where live examinations are performed.


The aims of this study were to estimate interobserver agreement, when live ultrasound scans are performed, with regard to 1) describing adnexal masses using the IOTA terminology and 2) the risk of malignancy calculated using the IOTA logistic regression models LR1 and LR2, and 3) to elucidate what explains large interobserver differences in calculated risk of malignancy.


Materials and methods

The Ethics Committee of Lund University approved the study protocol. Informed consent was obtained from all participants or the participant's guardian after the nature of the procedures had been fully explained.

This is a prospective observational study of real-time live ultrasound examinations of adnexal masses. Consecutive patients referred for an ultrasound examination and found to have an adnexal mass judged to need surgical removal were scanned according to the research protocol by sonologist 1 (PS) as part of the clinical ultrasound examination. A second ultrasound examination was carried out before surgery by sonologist 2 (LV). Both examiners used the standardized IOTA examination and measurement technique and the IOTA terminology (11) to describe their ultrasound findings and noted their results in a dedicated paper form. Sonologist 2 was blinded to the results of sonologist 1. Information on the clinical variables included in LR1 and LR2 (personal history of ovarian cancer, current hormonal therapy, age of the patient) was obtained at the preoperative ultrasound examination by sonologist 2. All patients were operated on within 90 days after the preoperative ultrasound examination performed by sonologist 2. The excised tissues underwent histological examination, and tumors were classified according to the criteria recommended by the International Federation of Gynecology and Obstetrics (16). Borderline tumors were classified as malignant.

The patients were examined in the lithotomy position with an empty urinary bladder (11). Abdominal ultrasound examination was added when needed. The ultrasound variables assessed with regard to interobserver reproducibility are shown in Table 1. The size of the lesion and that of its largest solid component were measured (largest diameter and mean of three orthogonal diameters) using calipers on the frozen ultrasound image. A color score was assigned on the basis of subjective assessment of the color content of the tumor scan at power Doppler ultrasound examination. A color score of 1 indicates absence of color Doppler signals, a color score of 2 a minimal amount of color Doppler signals, a color score of 3 a moderate amount of color Doppler signals, and a color score of 4 a large amount of color Doppler signals in the tumor (11).

The ultrasound systems used were GE Voluson 730 Expert or GE Voluson E8 (GE Healthcare, Zipf, Austria) with a 5–9-MHz transvaginal transducer. For power Doppler ultrasound examinations, the following settings were used: for the Voluson 730 Expert system, frequency 6-9 MHz ("normal"), pulse repetition frequency 0.6 kHz, gain 0.8, wall motion filter "low 1" (40 Hz); and for the Voluson E8, frequency 6-9 MHz ("normal"), pulse repetition frequency 0.6 kHz, gain -4.0, wall motion filter "low 1" (40 Hz).

Statistical analysis

The IOTA3 study screen (astraia GmbH, Munich, Germany) was used to calculate the risk of malignancy according to LR1. Weighted kappa indices were calculated using the statistical program Stata, version 10.1 for Windows (StataCorp LP, College Station, TX, USA). For all other statistical calculations, including calculation of the risk of malignancy when using LR2, we used the Statistical Package for the Social Sciences (SPSS; IBM Corp., New York, NY, USA; PASW version 18.0).

Inter-observer agreement in the assessment of categorical variables was estimated by calculating the percentage agreement. Cohen's kappa was used to estimate by how much the observed agreement exceeded that expected by chance (17). Weighted kappa values are presented where appropriate (18). It has been suggested that kappa values >0.81 indicate very good agreement beyond chance, kappa values between 0.61 and 0.80 good agreement beyond chance, kappa values between 0.41 and 0.60 moderate agreement beyond chance, kappa values between 0.21 and 0.40 fair agreement beyond chance, and kappa values <0.20 poor agreement beyond chance (19).
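As an illustration of how these indices can be obtained, the following minimal sketch computes percentage agreement, Cohen's kappa, and a weighted kappa for two examiners. It is not the Stata or SPSS routine used in the study: the function names, the choice of linear weights, and the handling of the band boundaries (0.80, 0.60, 0.40, 0.20) are assumptions made for illustration, and the example ratings are hypothetical.

```python
def percent_agreement(ratings_a, ratings_b):
    """Proportion of cases in which the two examiners chose the same category."""
    same = sum(1 for a, b in zip(ratings_a, ratings_b) if a == b)
    return same / len(ratings_a)


def cohen_kappa(ratings_a, ratings_b, categories, weighted=False):
    """Cohen's kappa; weighted=True applies linear weights (an assumed choice)
    over the ordered list of categories."""
    n = len(ratings_a)
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}

    # Cross-tabulate the paired ratings as counts.
    counts = [[0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        counts[index[a]][index[b]] += 1

    # Marginal proportions for examiner A (rows) and examiner B (columns).
    row = [sum(counts[i]) / n for i in range(k)]
    col = [sum(counts[i][j] for i in range(k)) / n for j in range(k)]

    def weight(i, j):
        if not weighted:
            return 1.0 if i == j else 0.0
        return 1.0 - abs(i - j) / (k - 1)

    cells = [(i, j) for i in range(k) for j in range(k)]
    p_obs = sum(weight(i, j) * counts[i][j] / n for i, j in cells)
    p_exp = sum(weight(i, j) * row[i] * col[j] for i, j in cells)
    return (p_obs - p_exp) / (1.0 - p_exp)


def interpret_kappa(kappa):
    """Verbal label following the bands quoted in the text (boundaries assumed)."""
    if kappa > 0.80:
        return "very good"
    if kappa > 0.60:
        return "good"
    if kappa > 0.40:
        return "moderate"
    if kappa > 0.20:
        return "fair"
    return "poor"


# Hypothetical ratings of acoustic shadowing in four masses.
examiner_1 = ["yes", "yes", "no", "no"]
examiner_2 = ["yes", "no", "no", "no"]
print(percent_agreement(examiner_1, examiner_2))            # 0.75
print(cohen_kappa(examiner_1, examiner_2, ["yes", "no"]))   # 0.5
```

The example shows why percentage agreement alone can be misleading: three of four ratings agree, yet half of that agreement is expected by chance given the marginal frequencies.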


Inter-observer reproducibility of measurement results, including the calculated risks of malignancy using LR1 and LR2, was described as the difference between two measurement results. The differences between the measured values were plotted against the mean of the two measurements (Bland-Altman plots) to assess the relationship between the differences and the magnitude of the measurements (20). Systematic bias between two measurements was estimated by calculating the 95% confidence interval (CI) of the mean difference (mean difference ± 2 SE). If zero lay within this interval, no bias was assumed to exist between the two measurements. Inter-observer agreement was expressed as the mean difference and limits of agreement (20). Ninety-five percent of differences between any future measurements are estimated to fall between the lower and upper limit of agreement. Inter-observer reliability of measurement results was estimated by calculating the intra-class correlation coefficient (ICC) using analysis of variance (two-way random model, absolute agreement; this allows generalization of the results to a population of observers). The ICC indicates the proportion of the total variance in measurement results that can be explained by differences between the individuals examined. It depends both on the magnitude of measurement errors and on the true heterogeneity in the population in which measurements are made. The more variable the population investigated, the greater the ICC, and the less variable the population, the smaller the ICC (21). It has been suggested that ICC values >0.90 are needed for a test to be used in clinical practice (22).
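The calculations described above can be sketched as follows. This is an illustrative outline, not the SPSS procedure used in the study: the limits of agreement are assumed to be the mean difference ± 1.96 SD of the differences, the ICC is assumed to be the single-measurement form of the two-way random, absolute-agreement model, and the function names and the example diameters are hypothetical.

```python
import math
from statistics import mean, stdev


def bland_altman(first, second):
    """Mean difference, its 95% CI (mean +/- 2 SE), limits of agreement, and a bias flag."""
    diffs = [a - b for a, b in zip(first, second)]
    mean_diff = mean(diffs)
    sd = stdev(diffs)                      # sample SD of the paired differences
    se = sd / math.sqrt(len(diffs))        # standard error of the mean difference
    ci = (mean_diff - 2 * se, mean_diff + 2 * se)
    limits = (mean_diff - 1.96 * sd, mean_diff + 1.96 * sd)
    bias = not (ci[0] <= 0.0 <= ci[1])     # bias assumed only if zero lies outside the CI
    return mean_diff, ci, limits, bias


def icc_absolute_agreement(first, second):
    """ICC from a two-way ANOVA: random effects, absolute agreement, single measurement."""
    n, k = len(first), 2
    rows = list(zip(first, second))
    grand = mean(list(first) + list(second))

    ss_rows = k * sum((mean(r) - grand) ** 2 for r in rows)             # between masses
    ss_cols = n * sum((mean(c) - grand) ** 2 for c in (first, second))  # between examiners
    ss_total = sum((v - grand) ** 2 for r in rows for v in r)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)


# Hypothetical maximum diameters (mm) of four masses measured by two examiners.
sonologist_1 = [10, 20, 30, 40]
sonologist_2 = [12, 18, 33, 41]
mean_diff, ci, limits, bias = bland_altman(sonologist_1, sonologist_2)
icc = icc_absolute_agreement(sonologist_1, sonologist_2)
print(mean_diff, ci, limits, bias, icc)
```

Because the ICC compares measurement error with the spread between masses, the same measurement error yields a lower ICC in a more homogeneous sample, which is why the limits of agreement and the ICC are reported side by side.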

The sensitivity and specificity of LR1 and LR2 with regard to malignancy were calculated using the information provided by sonologists 1 and 2.


Results

In all, 117 consecutive women with adnexal masses who underwent surgery were examined with ultrasound by the two sonologists as described above. Thirty-four women had bilateral adnexal masses. The most complex mass, or the largest one if both masses had similar ultrasound morphology, was used in our statistical analysis, the mass to be included being selected retrospectively to ensure that both sonologists contributed the same mass (right or left) to the analysis. Thus, 117 adnexal masses from 117 patients constitute our study population. The women's age ranged between 14 and 88 years (median, 53), and 63 (54%) women were postmenopausal. There were 94 benign, four borderline, and 19 invasively malignant adnexal masses (Table 2).

The time elapsed between the ultrasound examinations of sonologists 1 and 2 was median 61 days (10th and 90th percentiles, 13 and 132; range, 1-204) for the tumors with benign histology and median 14 days (10th and 90th percentiles, 2 and 31; range, 1-41) for the tumors with malignant histology. There was no relationship between the number of days between the scans and the differences in measurement results or inter-observer agreement for discrete variables (Supplementary Fig. S1-S5 and Supplementary Table S1).

Inter-observer reproducibility of measurement results is shown in Table 3. Bland-Altman plots showed no clear trend for inter-observer differences in measurement results to change with the magnitude of the measurement values. Limits of agreement were wide for all measurements. There was one systematic difference between the two sonologists: sonologist 1 (who always performed the first examination) obtained higher measurement values for the maximum diameter of the mass. The least reliable measurement was the height of the largest papillary projection.

Inter-observer agreement when assessing categorical ultrasound variables is shown in Table 4. For most categorical ultrasound variables, inter-observer agreement beyond chance was good or very good (19). Inter-observer agreement beyond chance for variables included in LR1 or LR2 was poorest for color score (agreement 40%, weighted kappa 0.36), presence of blood flow in papillary projection (agreement 90%, kappa 0.48), irregular cyst wall (agreement 79%, kappa 0.56), and acoustic shadowing (agreement 85%, kappa 0.58).

Bland-Altman plots illustrating the relationship between the magnitudes of the estimated risk of malignancy calculated using LR1 and LR2 and the interobserver difference in calculated risk are shown in Figure 1. The plots manifest a diamond shape, i.e., the interobserver differences are smallest for the lowest and highest risks, and they are very small for risks <25% and >95%. Logarithmic transformation of the data (20) did not substantially change the shape of the scatter plot. Therefore, we present our results as absolute inter-observer differences in calculated risk (in percentage units); see Table 5. There were no systematic differences in calculated risks between the two sonologists, and reliability, reflected by the ICC values, was good (22), with ICC values of 0.911 for LR1 and 0.832 for LR2. When classifying tumors as having a risk of malignancy <10% (benign) or >10% (malignant) using LR1 or LR2, the inter-observer agreement was good for both models: inter-observer agreement 84% (98/117), kappa 0.68, for model LR1 and inter-observer agreement 85% (99/117), kappa 0.68, for model LR2. In the 19 cases where the two sonologists obtained different results with regard to malignancy when using LR1, the absolute interobserver differences in calculated risk ranged from 0.7 to 59.6 percentage units: in six of the 19 cases the absolute interobserver difference in calculated risk was <10.0 percentage units, in nine cases it was 10.0–24.9 percentage units, and in four cases it was >25.0 percentage units. In the 18 cases where the two sonologists obtained different results with regard to malignancy when using LR2, the absolute interobserver difference in calculated risk ranged from 8.8 to 67.9 percentage units: in two of the 18 cases the absolute interobserver difference in calculated risk was <10.0 percentage units, in ten cases it was 10.0–24.9 percentage units, and in six cases it was >25 percentage units.

The Bland-Altman plots (Figure 1) illustrate that for some tumors there were substantial interobserver differences in the calculated risk of malignancy: when using LR1, the interobserver difference in calculated risk was >25 percentage units in 11 tumors (9% of all tumors), and when using LR2, the interobserver difference in calculated risk was >25 percentage units in 14 tumors (12% of all tumors). To elucidate which interobserver differences explained these largest interobserver differences in calculated risk, we scrutinized each case where the difference was >25 percentage units. The results are shown in Supplementary Tables S2 and S3. When using LR1, a discrepancy for one single categorical variable explained the difference in four of the 11 cases, while a discrepancy for two categorical variables explained the difference in one case (differences in measurements being <5 mm in these five cases). In six cases, there were differences in one or two categorical variables but also substantial differences (6-61 mm) in at least one measurement result. In no case was the large difference in calculated risk explained exclusively by differences in measurement results. The categorical variables judged differently by the two sonologists in these 11 cases were color score (n = 5), irregular cyst wall (n = 5), flow in papillary projection (n = 3), and acoustic shadowing (n = 2).

When using LR2, a discrepancy for one single categorical variable explained the large difference in calculated risk (>25 percentage units) in eight of the 14 cases (differences in measurements being <5 mm in these eight cases), and in four of the eight cases the sonologists judged acoustic shadowing differently. In five cases, there were differences in one categorical variable but also a substantial difference (9-61 mm) in the measurement of the largest solid component. In yet another case, there were differences in two categorical variables as well as in the measurement of the largest solid component. The categorical variables judged differently by the two sonologists in these 14 cases were acoustic shadowing (n = 5), irregular cyst wall (n = 5), ascites (n = 3), and flow in papillary projection (n = 2).

The sensitivity with regard to malignancy when using LR1 (10% risk cutoff) was 100% (23/23; 95% CI, 82-100) for both sonologists; the specificity was 74% (70/94; 95% CI, 64-82) for sonologist 1 and 63% (59/94; 95% CI, 53-72) for sonologist 2. The sensitivity when using LR2 was 100% (23/23; 95% CI, 82-100) for sonologist 1 and 91% (21/23; 95% CI, 72-98) for sonologist 2, and the specificity was 75.5% (71/94; 95% CI, 65-84) for both sonologists.
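For readers who wish to reproduce such figures from calculated risks, the following minimal sketch shows how sensitivity and specificity follow from a risk cutoff. The function name and the example data are hypothetical, and treating a risk of exactly 10% as test-negative is an assumption; the method used to derive the confidence intervals is not stated in the text and is therefore not reproduced here.

```python
def sensitivity_specificity(risks, malignant, cutoff=0.10):
    """Classify a mass as test-positive when its calculated risk exceeds the cutoff,
    then compare with histology (True = malignant, borderline counted as malignant)."""
    tp = fn = tn = fp = 0
    for risk, is_malignant in zip(risks, malignant):
        test_positive = risk > cutoff
        if is_malignant and test_positive:
            tp += 1
        elif is_malignant:
            fn += 1
        elif test_positive:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical calculated risks and histological outcomes for five masses.
risks = [0.02, 0.15, 0.80, 0.05, 0.30]
malignant = [False, False, True, False, True]
sens, spec = sensitivity_specificity(risks, malignant)
print(sens, spec)
```

Applied to the counts reported above, 70 correctly classified masses out of 94 benign masses gives a specificity of 70/94, which rounds to 74%.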


Discussion

We have shown substantial inter-observer variability in the results of measurements taken in adnexal masses (wide limits of agreement). Inter-observer agreement beyond chance was very good or good for most categorical variables, but it was only moderate or fair for some. Inter-observer agreement beyond chance was poorest for variables heavily dependent on subjective evaluation and/or machine settings, i.e., color score, presence of color Doppler signals in papillary projections, irregular cyst walls, acoustic shadowing (all four variables being included in LR1 or LR2), echogenicity of cyst fluid, and ovarian crescent sign. Despite this, there was good inter-observer agreement when classifying tumors as benign or malignant using the predetermined risk of malignancy cut-off of 10%. However, in some cases there were substantial differences in the calculated risk of malignancy between the two sonologists, the difference being >25.0 percentage units in 9% of all tumors when using LR1 and in 12% of all tumors when using LR2.

The strength of our study is that it provides new information. To the best of our knowledge, there is only one publication reporting on interobserver agreement with regard to describing ultrasound findings in adnexal masses using the IOTA terminology (11) when performing live ultrasound examinations (23). However, that study (23) evaluated interobserver agreement with regard to the ten ultrasound features in the IOTA simple rules (24, 25), not the variables included in the IOTA logistic regression models LR1 and LR2, and agreement was estimated between examiners with different levels of experience. The variable with poorest agreement beyond chance in the study cited was acoustic shadowing (kappa 0.36). We have found no published study that has estimated inter-observer reproducibility of the calculated risk of malignancy using LR1 or LR2 after live scanning.

It is a limitation of our study that up to 204 days elapsed between the scans of the two sonologists (up to 41 days for malignant masses). Because days elapsed between the scans, theoretically the inter-observer differences could be explained by the lesions having changed in size or morphology between the scans. We find this highly unlikely for the following reasons. First, there was no relationship between the differences in measurement results and the number of days between the scans (Supplementary Fig. S1-S5). Nor was there a clear tendency for inter-observer agreement for discrete variables to depend on the time between the scans (Supplementary Table S1). Second, one would expect a lesion and its components to increase in size with time, but sonologist 1, performing the first scan, obtained higher measurement values than sonologist 2. Third, it is our experience, after having performed gynecological scans for more than 20 years, that the ultrasound morphology of both benign and malignant adnexal masses remains constant over time, that benign adnexal lesions grow slowly, and that malignant masses do not change appreciably in size even during 1 month of observation. Therefore, we believe that the discrepancies between the two sonologists reflect true inter-observer differences and not a change of the masses over time. A second limitation is that we did not include estimation of the reproducibility of retrieving anamnestic information (current hormonal therapy, personal history of ovarian cancer), the anamnestic information collected by the second sonologist being used in all cases. It cannot be entirely excluded that patients would answer differently when asked by different sonologists or that sonologists could interpret the answers of the patients differently. A third limitation is that we did not estimate intra-observer reproducibility. We considered four scans (two per sonologist) likely to be unacceptable to patients. For the same reason, only two sonologists were involved in this study, and our results are generalizable only to sonologists with a similar level of experience.

The results of this live scanning study are similar to those of another study in which the same sonologists assessed the same variables using 3D ultrasound volumes from adnexal masses in another tumor population (15). The similarity in results between the two studies is surprising, because the conditions when assessing 3D ultrasound volumes are different from those during a live scan. When evaluating ultrasound volumes, sonologists are exposed to the same ultrasound images, and so any interobserver difference should be explained exclusively by differences in interpreting the ultrasound information. During a live scan, there are more sources of bias. This could result in either poorer or better interobserver agreement than when 3D ultrasound volumes are assessed: poorer because ultrasound examiners are likely to use different machine settings and scanning conditions may change from one minute to another, better because the dynamic nature of live scanning facilitates discrimination between solid components and amorphous tissue.

Our results showed that two experienced sonologists agreed quite well in their classification of masses as benign or malignant using the 10% risk of malignancy cutoff of LR1 and LR2 and that the diagnostic performance of LR1 and LR2 with regard to discrimination between benign and malignant tumors was similar for the two sonologists and similar to that reported by others (14, 26-28). This is reassuring, because the main purpose of using models LR1 and LR2 is to classify tumors as benign or malignant. Potentially, however, LR1 and LR2 can be used not only to classify adnexal masses as benign or malignant but also to counsel a patient about her individual risk of malignancy (13). To use the calculated risk for individual counseling, one must be reasonably certain not only that the estimated risk agrees well with the true risk (when externally validated, both LR1 and LR2 underestimated the true risk, especially in the risk interval 30%-70% (14)) but also that the risk estimates are reproducible, i.e., that different examiners will obtain similar risk estimates. Our results show that risk estimates may differ substantially between experienced observers, the difference in estimated risk being >25.0 percentage units in 9% and 12% of cases when using LR1 and LR2, respectively. Interobserver agreement beyond chance was poorest for those variables in the models that are heavily dependent on subjective evaluation, i.e., color score, presence of color Doppler signals in papillary projections, irregular cyst walls, and acoustic shadowing. Indeed, differences in these explained most of the largest inter-observer differences in calculated risk of malignancy. In models based on few variables, changing values in only one variable may result in large differences in predicted risks, while a model with many variables is less vulnerable to a change in one or even a few variables. Our results illustrate this (Supplementary Tables S2 and S3). When using LR2 (which includes six variables), a change in value for one single categorical variable explained an inter-observer difference in calculated risk >25 percentage units in eight of 14 cases, while when using LR1 (which includes 12 variables), a change in value for one single categorical variable explained an inter-observer difference in calculated risk >25 percentage units in only four of 11 cases. Acoustic shadowing is a strong variable in both LR1 and LR2 and has great impact on the calculated risk in LR2, with only six variables. In our hands, as well as in those of Ruiz de Gauna et al. (23), inter-observer agreement for acoustic shadowing was at most moderate. The interobserver agreement for color score was only fair in our study, and color score is an important variable in LR1.

To improve inter-observer reproducibility of calculated risks based on LR1 and LR2, inter-observer differences in descriptions and measurements of adnexal masses using the IOTA terminology and measurement technique need to be reduced. One way to achieve this could be by providing courses on and training in how to examine and describe adnexal masses using the IOTA terms. Interactive courses in which a large number of ultrasound images are discussed with the course participants are likely to be very valuable in this respect. More precise definitions of the IOTA terms, for example by providing ample imaging material, would probably also help improve inter-observer agreement. Special attention should be given to the variables with poorest reproducibility, i.e., the color score, wall irregularity, acoustic shadowing, and detection of blood flow in papillary projections. Until better inter-observer agreement in the calculated risk of malignancy using LR1 and LR2 has been shown, one should be cautious about using the risk estimates for individual patient counselling.


Acknowledgements

None


References

1. Granberg S, Norström A, Wikland M. Tumors in the lower pelvis as imaged by vaginal sonography. Gynecol Oncol 1990;37:224-9.

2. Benacerraf BR, Finkler NJ, Wojciechowski C, Knapp RC. Sonographic accuracy in the diagnosis of ovarian masses. J Reprod Med 1990;35:491-5.

3. Valentin L. Pattern recognition of pelvic masses by gray-scale ultrasound imaging: the contribution of Doppler ultrasound. Ultrasound Obstet Gynecol 1999;14:338-47.

4. Valentin L. Prospective cross-validation of Doppler ultrasound examination and gray-scale ultrasound imaging for discrimination of benign and malignant pelvic masses. Ultrasound Obstet Gynecol 1999;14:273-83.

5. Timmerman D, Schwärzler P, Collins WP, Claerhout F, Coenen M, Amant F, et al. Subjective assessment of adnexal masses with the use of ultrasonography: an analysis of interobserver variability and experience. Ultrasound Obstet Gynecol 1999;13:11-6.

6. Sokalska A, Timmerman D, Testa AC, Van Holsbeke C, Lissoni AA, Leone FPG, et al. Diagnostic accuracy of transvaginal ultrasound examination for assigning a specific diagnosis to adnexal masses. Ultrasound Obstet Gynecol 2009;34:462-70.

7. Valentin L. Use of morphology to characterize and manage common adnexal masses. Best Pract Res Clin Obstet Gynaecol 2004;18:71-89.

8. Tailor A, Jurkovic D, Bourne TH, Collins WP, Campbell S. Sonographic prediction of malignancy in adnexal masses using multivariate logistic regression analysis. Ultrasound Obstet Gynecol 1997;10:41-7.

9. Timmerman D, Bourne TH, Tailor A, Collins WP, Verrelst H, Vandenberghe K, et al. A comparison of methods for the preoperative discrimination between benign and malignant adnexal masses: the development of a new logistic regression model. Am J Obstet Gynecol 1999;181:57-65.


10. Alcazar JL, Jurado M. Prospective evaluation of logistic model based on sonographic morphologic and color Doppler findings developed to predict adnexal malignancy. J Ultrasound Med 1999;18:837-42.

11. Timmerman D, Valentin L, Bourne TH, Collins WP, Verrelst H, Vergote I. Terms, definitions and measurements to describe the sonographic features of adnexal tumors: a consensus opinion from the International Ovarian Tumor Analysis (IOTA) Group. Ultrasound Obstet Gynecol 2000;16:500-5.

12. Timmerman D, Testa AC, Bourne T, Ferrazzi E, Ameye L, Konstantinovic ML, et al. Logistic regression model to distinguish between the benign and malignant adnexal mass before surgery: a multicenter study by the International Ovarian Tumor Analysis Group. J Clin Oncol 2005;34:8794-801.

13. Kaijser J, Bourne T, Valentin L, Sayasneh A, Van Holsbeke C, Vergote I, et al. Improving strategies for diagnosing ovarian cancer: a summary of the International Ovarian Tumor Analysis (IOTA) studies. Ultrasound Obstet Gynecol 2013;41:9-20.

14. Timmerman D, Van Calster B, Testa AC, Guerriero S, Fischerova D, Lissoni AA, et al. Ovarian cancer prediction in adnexal masses using ultrasound-based logistic regression models: a temporal and external validation study by the IOTA group. Ultrasound Obstet Gynecol 2010;36:226-34.

15. Sladkevicius P, Valentin L. Intra- and inter-observer agreement when describing adnexal masses using the International Ovarian Tumour Analysis (IOTA) terms and definitions: a study on three-dimensional (3D) ultrasound volumes. Ultrasound Obstet Gynecol 2013;41:318-27.

16. Heintz APM, Odicino F, Maisonneuve P, Beller U, Benedet JL, Creasman WT, et al. Carcinoma of the ovary. 25th Annual Report on the Results of Treatment in Gynecological Cancer. Int J Gynecol Obstet 2003;83 (suppl 1):S135-S166.


17. Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas 1960;20:37-46.

18. Kundel HL, Polansky M. Measurement of observer agreement. Radiology 2003;228:303-8.

19. Brennan P, Silman A. Statistical methods for assessing observer variability in clinical measures. BMJ 1992;304:1491-4.

20. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986;1:307-10.

21. Bartlett JW, Frost C. Reliability, repeatability and reproducibility: analysis of measurement errors in continuous variables. Ultrasound Obstet Gynecol 2008;31:466-75.

22. Kottner J, Audigé L, Brorson S, Donner A, Gajewski BJ, Hróbjartsson A, et al. Guidelines for Reporting Reliability and Agreement Studies (GRRAS) were proposed. J Clin Epidemiol 2011;64:96-106.

23. Ruiz de Gauna B, Sanchez P, Pineda L, Utrilla-Layna J, Juez L, Alcázar JL. Inter-observer agreement with regard to describing adnexal masses using the IOTA simple rules in a real-time setting and when using three-dimensional ultrasound volumes and digital clips. Ultrasound Obstet Gynecol 2014;44:95-100.

24. Timmerman D, Testa AC, Bourne T, Ameye L, Jurkovic D, Van Holsbeke C, et al. Simple ultrasound-based rules for the diagnosis of ovarian cancer. Ultrasound Obstet Gynecol 2008;31:681-90.

25. Timmerman D, Ameye L, Fischerova D, Epstein E, Melis GB, Guerriero S, et al. Simple ultrasound rules to distinguish between benign and malignant adnexal masses before surgery: prospective validation by IOTA group. BMJ 2010;341:c6839.


26. Van Holsbeke C, Van Calster B, Testa AC, Domali E, Lu C, Van Huffel S, et al. Prospective internal validation of mathematical models to predict malignancy in adnexal masses: results from the international ovarian tumor analysis study. Clin Cancer Res 2009;15:684-91.
27. Nunes N, Yazbek J, Ambler G, Hoo W, Naftalin J, Jurkovic D. Prospective evaluation of the IOTA logistic regression model LR2 for the diagnosis of ovarian cancer. Ultrasound Obstet Gynecol 2012;40:355-9.
28. Nunes N, Ambler G, Hoo WL, Naftalin J, Foo X, Widschwendter M, et al. A prospective validation of the IOTA logistic regression models (LR1 and LR2) in comparison to subjective pattern recognition for the diagnosis of ovarian cancer. Int J Gynecol Cancer 2013;23:1583-9.

Table 1. Variables for which inter-observer reproducibility was estimated
___________________________________________________________________________
Variables included in logistic regression models LR1 and LR2 or of importance for these models
    Maximum diameter of adnexal mass, mm
    Maximum diameter of largest solid component, mm
    Maximum diameter of largest solid component: ≤50, >50 mm
    Presence of an entirely solid adnexal mass: yes, no
    Papillary projections present: yes, no
    Irregular internal cyst walls: yes, no
    Acoustic shadows: yes, no
    Color Doppler signals in papillary projection: yes, no
    Color score: 1, 2, 3, 4
    Tenderness of adnexal mass at scan: yes, no
    Ascites: yes, no
Other IOTA variables
  Continuous IOTA variables
    Mean diameter of adnexal mass, mm
    Mean diameter of largest solid component, mm
    Height of the largest papillary projection, mm
  Categorical IOTA variables
    Type of tumor: unilocular, unilocular solid, multilocular, multilocular solid, solid
    Mean diameter of adnexal mass: ≤40, 41-60, 61-80, 81-100 and >100 mm
    Number of cyst locules: 0, 1, 2, 3, 4, 5, 6-10, 11-20 and >20; ≤10, >10†
    Septum present: yes, no
    Incomplete septum present: yes, no
    Solid component: yes, no
    Papillary projections: 0, 1, 2, 3, ≥4; 1, 2, 3, ≥4‡; <4, >4†
    Echogenicity of cyst fluid: anechoic, low level, ground glass, mixed, no cyst fluid
    Color Doppler signals detectable: yes, no
    Ovarian crescent sign: yes, no
Diagnosis based on LR1 and LR2
    Calculated risk of malignancy using LR1: ≤10%, >10%
    Calculated risk of malignancy using LR2: ≤10%, >10%
___________________________________________________________________________
IOTA, international ovarian tumor analysis group; LR1, logistic regression model 1; LR2, logistic regression model 2.

The risk of malignancy using LR1 is derived as

$$y = \frac{1}{1 + e^{-z}}$$

where

$$z = -6.7468 + 1.5985\,(a) - 0.9983\,(b) + 0.0326\,(c) + 0.00841\,(d) - 0.8577\,(e) + 1.5513\,(f) + 1.1737\,(g) + 0.9281\,(h) + 0.0496\,(i) + 1.1421\,(j) - 2.3550\,(k) + 0.4916\,(l)$$

and e is the mathematical constant and base value of natural logarithms. The 12 variables included in LR1 are: (a) personal history of ovarian cancer (yes = 1, no = 0); (b) current hormonal therapy (yes = 1, no = 0); (c) age of the patient (in years); (d) maximum diameter of the lesion (in millimeters); (e) mass tender at scan (yes = 1, no = 0); (f) the presence of ascites (yes = 1, no = 0); (g) the presence of blood flow within a solid papillary projection (yes = 1, no = 0); (h) the presence of a purely solid tumor (yes = 1, no = 0); (i) maximal diameter of the solid component (expressed in millimeters, but with no increase > 50 mm); (j) irregular internal cyst walls (yes = 1, no = 0); (k) the presence of acoustic shadows (yes = 1, no = 0); and (l) color score (1, 2, 3, or 4).

The risk of malignancy using LR2 is derived as $y = 1/(1 + e^{-z})$, where

$$z = -5.3718 + 0.0354\,(c) + 1.6159\,(f) + 1.1768\,(g) + 0.0697\,(i) + 0.9586\,(j) - 2.9486\,(k)$$

(12). A calculated risk of malignancy of more than 10% classified the adnexal mass as malignant.

† includes 0. ‡ includes only cases where papillary projections were registered at both examinations.
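The two formulas in the footnote to Table 1 translate directly into a few lines of code. The following is a minimal Python sketch, not the authors' or the IOTA group's software: the coefficients are those of the footnote with decimal points restored, whereas the function names, the dictionary keys and the example patient are hypothetical and chosen only for illustration.

```python
import math

# Coefficients from the footnote to Table 1 (letters refer to that footnote).
LR1_INTERCEPT = -6.7468
LR1_COEFFS = {
    "history_ovarian_cancer": 1.5985,   # (a) yes = 1, no = 0
    "hormonal_therapy": -0.9983,        # (b) yes = 1, no = 0
    "age_years": 0.0326,                # (c)
    "max_diameter_mm": 0.00841,         # (d)
    "tender_at_scan": -0.8577,          # (e) yes = 1, no = 0
    "ascites": 1.5513,                  # (f) yes = 1, no = 0
    "papillary_flow": 1.1737,           # (g) yes = 1, no = 0
    "purely_solid": 0.9281,             # (h) yes = 1, no = 0
    "max_solid_diameter_mm": 0.0496,    # (i) no increase above 50 mm
    "irregular_walls": 1.1421,          # (j) yes = 1, no = 0
    "acoustic_shadows": -2.3550,        # (k) yes = 1, no = 0
    "color_score": 0.4916,              # (l) 1, 2, 3 or 4
}
LR2_INTERCEPT = -5.3718
LR2_COEFFS = {
    "age_years": 0.0354,                # (c)
    "ascites": 1.6159,                  # (f)
    "papillary_flow": 1.1768,           # (g)
    "max_solid_diameter_mm": 0.0697,    # (i) no increase above 50 mm
    "irregular_walls": 0.9586,          # (j)
    "acoustic_shadows": -2.9486,        # (k)
}


def _risk(intercept, coeffs, features):
    """Return y = 1 / (1 + e^-z) for the given model and feature values."""
    values = dict(features)
    # Variable (i) contributes no further increase beyond 50 mm.
    values["max_solid_diameter_mm"] = min(values["max_solid_diameter_mm"], 50)
    z = intercept + sum(c * values[name] for name, c in coeffs.items())
    return 1.0 / (1.0 + math.exp(-z))


def lr1_risk(features):
    return _risk(LR1_INTERCEPT, LR1_COEFFS, features)


def lr2_risk(features):
    return _risk(LR2_INTERCEPT, LR2_COEFFS, features)


def classify(risk, cutoff=0.10):
    """A calculated risk of more than 10% classifies the mass as malignant."""
    return "malignant" if risk > cutoff else "benign"


# Hypothetical patient: 50 years old, 60 mm cyst without solid components,
# no other risk features, color score 1.
EXAMPLE = {
    "history_ovarian_cancer": 0, "hormonal_therapy": 0, "age_years": 50,
    "max_diameter_mm": 60, "tender_at_scan": 0, "ascites": 0,
    "papillary_flow": 0, "purely_solid": 0, "max_solid_diameter_mm": 0,
    "irregular_walls": 0, "acoustic_shadows": 0, "color_score": 1,
}
```

Because several of the inputs are yes/no assessments with coefficients above 1 in absolute value, a single disagreement between observers shifts z by more than one unit, which is the mechanism behind the large inter-observer differences in calculated risk examined in this study.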

Table 2. Histological diagnoses of the masses
___________________________________________________________________________
Histological diagnosis of tumor               Number
___________________________________________________________________________
Benign tumors                                 94
    Benign simple cyst                        7
    Endometrioma                              10
    Dermoid cyst                              16
    Serous cystadenoma                        16
    Mucinous cystadenoma                      18
    Myoma/fibroma                             9
    Cystadenofibroma                          11
    Paraovarian cyst                          5
    Sactosalpinx, chronic salpingitis         1
    Leydig cell tumor                         1
Borderline tumors                             4
    Serous                                    2
    Mucinous                                  1
    Endometrioid                              1
Invasive malignancy                           19
    Primary ovarian adenocarcinoma            13
    Granulosa cell tumor                      3
    Dysgerminoma                              1
    Leiomyosarcoma                            1
    Malignant aggressive B-cell lymphoma      1
___________________________________________________________________________

Table 3. Inter-observer reproducibility of continuous variables used in the risk calculation models LR1 and LR2 and of other continuous variables used to describe adnexal masses

                                                     Measurement results        Difference in mm between two measurements          Intra-CC
                                                     (both sonologists)         made by sonologists 1 and 2 (a)
Parameter                                            Median         Range       Mean           95% CI         Limits of agreement  Point estimate (95% CI)

Variables used in models LR1 and LR2
Maximum diameter of adnexal mass, mm                 70 (n=234)     10 – 313    3.80 (n=117)   1.12 – 6.48    -25.24 – 32.82       0.958 (0.937 – 0.971)
Maximum diameter of largest solid component, mm (b)  29.50 (n=122)  5 – 180     1.92 (n=61)    -1.74 – 5.58   -26.66 – 30.50       0.942 (0.905 – 0.964)

Other variables used to describe adnexal mass
Mean diameter of adnexal mass, mm                    58.5 (n=234)   9 – 240     1.05 (n=117)   -0.15 – 1.95   -8.61 – 10.72        0.971 (0.958 – 0.980)
Mean diameter of largest solid component, mm (b)     22 (n=122)     4 – 156     0.59 (n=61)    -1.82 – 2.98   -18.16 – 19.32       0.962 (0.937 – 0.977)
Height of largest papillary projection, mm (c)       8 (n=42)       3 – 25      -0.51 (n=21)   -2.93 – 1.91   -11.61 – 10.59       0.609 (0.245 – 0.821)

CI, confidence interval; Intra-CC, intra-class correlation coefficient.
(a) The measurements made by sonologist 2 were subtracted from the measurements made by sonologist 1.
(b) Includes only cases where solid components were registered by both observers.
(c) Includes only cases where papillary projections were registered by both observers. Using logarithmically transformed data, the results were as follows: mean difference 1.03 (95% CI, 0.78 – 1.36); limits of agreement 0.31 – 3.61; Intra-CC 0.547 (0.153 – 0.789). These results can be used for comparison with our published results on inter-observer agreement when analyzing ultrasound volumes obtained by three-dimensional ultrasound (15).

Table 4. Inter-observer agreement for categorical variables included in the risk calculation models LR1 and LR2 and for other categorical variables used to describe adnexal masses
____________________________________________________________________________________________
                                                                                      Agreement         Kappa value
Variables used in models LR1 and LR2
Presence of entirely solid tumor (yes, no)                                            97% (114/117)     0.91
Maximum diameter of largest solid component (≤50 mm, >50 mm)                          95% (111/117)     0.83
Maximum diameter of largest solid component (≤50 mm, >50 mm) (a)                      92% (56/61)       0.82
Tenderness of adnexal mass (yes, no)                                                  97% (113/117)     0.78
Ascites (yes, no)                                                                     97% (113/117)     0.73
Papillary projections (yes, no)                                                       87% (102/117)     0.65
Solid component (yes, no)                                                             84% (98/117)      0.67
Irregular cyst wall (yes, no)                                                         79% (92/117)      0.56
Acoustic shadows (yes, no)                                                            85% (99/117)      0.58
Color Doppler signals in papillary projections (yes, no) (b)                          90% (105/117)     0.48
Color Doppler signals in papillary projections (yes, no) (c)                          86% (18/21)       0.71
Color score (1, 2, 3, 4)                                                              40% (47/117)      0.36 (d)

Other variables used to describe adnexal masses
Tumor type (unilocular, unilocular solid, multilocular, multilocular solid, solid)    97% (113/117)     0.70
Mean diameter of tumor, mm (≤40, 41-60, 61-80, 81-100, >100)                          66% (77/117)      0.76 (d)
Number of locules (0, 1, 2, 3, 4, 5, 6-10, 11-20, >20)                                58% (68/117)      0.79 (d)
Number of locules (≤10, >10) (e)                                                      94% (110/117)     0.80
Septum (yes, no)                                                                      88% (103/117)     0.76
Incomplete septum (yes, no)                                                           98% (115/117)     ¶
Number of papillary projections (0, 1, 2, 3, ≥4)                                      81% (95/117)      0.64 (d)
Number of papillary projections (1, 2, 3, ≥4) (c)                                     67% (14/21)       0.63 (d)
Number of papillary projections (<4, >4) (e)                                          96% (112/117)     0.68
Echogenicity of cyst fluid (anechoic, low level, ground glass, mixed, no fluid)       67% (78/117)      0.56
Color Doppler signals detectable (yes, no)                                            84% (98/117)      0.30
Ovarian crescent sign (yes, no)                                                       78% (91/117)      0.49
Fluid in Douglas pouch (yes, no)                                                      87% (102/117)     0.72
____________________________________________________________________________________________
(a) Includes only cases where solid components were registered at both examinations.
(b) This variable reflects disagreement both with regard to the presence of papillary projections and the presence of blood flow in papillary projections (i.e., if one observer recorded the presence of papillary projections and the other did not, there was disagreement with regard to this variable, and if both observers recorded the presence of papillary projections but disagreed with regard to the presence of blood flow in the papillary projections, there was disagreement with regard to this variable).
(c) Includes only cases where papillary projections were registered at both examinations.
(d) Weighted Kappa index.
(e) Includes 0.
¶ Absent kappa values are explained by asymmetric field tables making it impossible to calculate them.

Table 5. Inter-observer reproducibility of the risk of malignancy calculated using the risk calculation models LR1 and LR2

                                              Calculated risk of malignancy   Difference between the risk calculated              Intra-CC
                                              (both sonologists)              by sonologist 1 and 2
Parameter                                     Median         Range            Mean            95% CI         Limits of agreement  Point estimate (95% CI)

Risk of malignancy calculated using LR1       7.85 (n=234)   0.10 – 99.10     -0.53 (n=117)   -3.07 – 2.01   -28.05 – 26.99       0.911 (0.874 – 0.937)
Risk of malignancy calculated using LR2       6.65 (n=234)   0.10 – 98.40     0.02 (n=117)    -3.06 – 3.10   -33.22 – 33.26       0.832 (0.766 – 0.880)

CI, confidence interval; Intra-CC, intra-class correlation coefficient.
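The columns of Table 5 can be cross-checked against one another. The short Python sketch below is an illustration rather than part of the authors' analysis. It assumes that the limits of agreement are the mean difference ± 2 SD of the differences and that the 95% CI is the mean difference ± 2 SE, and it uses the Table 5 values with decimal points restored (risks in percent, n = 117); the function name is hypothetical.

```python
import math


def ci_from_limits_of_agreement(mean_diff, loa_lower, loa_upper, n):
    """Recover the 95% CI of the mean difference from the limits of agreement.

    Assumes limits of agreement = mean +/- 2 SD and 95% CI = mean +/- 2 SE.
    """
    sd = (loa_upper - loa_lower) / 4.0   # full width of the limits is 4 SD
    se = sd / math.sqrt(n)               # standard error of the mean difference
    return (mean_diff - 2 * se, mean_diff + 2 * se)


# Values as reported in Table 5 (percentage units).
lr1_ci = ci_from_limits_of_agreement(-0.53, -28.05, 26.99, 117)
lr2_ci = ci_from_limits_of_agreement(0.02, -33.22, 33.26, 117)
```

Under these assumptions the recovered intervals agree with the reported ones (-3.07 to 2.01 for LR1 and -3.06 to 3.10 for LR2) to within rounding. In both cases zero lies inside the interval, which corresponds to the absence of systematic bias between the two sonologists.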

Legends for figure

Figure 1. a) Scatterplot showing the relationship between inter-observer difference (observer 1 minus observer 2) in calculated risk and magnitude of calculated risk when using logistic regression model LR1. The plot manifests a diamond shape, the differences being smallest for the lowest and highest risks. For risks <2.5% and >95%, the differences are very small. LOA, limits of agreement. b) Scatterplot showing the relationship between inter-observer difference in calculated risk and magnitude of calculated risk when using logistic regression model LR2. The plot manifests a diamond shape, the differences being smallest for the lowest and highest risks.

Published OnlineFirst November 25, 2014. Clin Cancer Res. Povilas Sladkevicius and Lil Valentin. Inter-observer agreement in describing the ultrasound appearance of adnexal masses and in calculating the risk of malignancy using logistic regression models.

Updated version: access the most recent version of this article at doi:10.1158/1078-0432.CCR-14-0906

Supplementary material: access the most recent supplemental material at http://clincancerres.aacrjournals.org/content/suppl/2014/11/26/1078-0432.CCR-14-0906.DC1

Statement of translational relevance

The International Ovarian Tumor Analysis (IOTA) group has developed two logistic regression models (LR1 and LR2), including clinical and ultrasound variables, for calculation of the risk of malignancy in adnexal masses. It has been suggested that LR1 and LR2 can be used to counsel patients about their individual risk of malignancy and so may have a role in personalized medicine. In this work, we found large inter-observer differences (>25 percentage units) in the calculated risk of malignancy in about 10% of cases. The differences were explained by ultrasound examiners interpreting ultrasound images differently. We suggest measures to improve inter-observer agreement. Until better inter-observer agreement in the calculated risk of malignancy using LR1 and LR2 has been shown, one should be cautious with using the risk estimate for individual patient counselling.

Abstract

Purpose: To estimate inter-observer agreement with regard to describing adnexal masses using the International Ovarian Tumor Analysis (IOTA) terminology and the risk of malignancy calculated using IOTA logistic regression models LR1 and LR2, and to elucidate what explained the largest inter-observer differences in calculated risk of malignancy.

Experimental design: 117 women with adnexal masses were examined with transvaginal gray scale and power Doppler ultrasound by two independent experienced sonologists who described the masses using IOTA terminology. The risk of malignancy was calculated using LR1 and LR2. A predetermined risk of malignancy cutoff of 10% indicated malignancy.

Results: There were 94 benign, four borderline, and 19 invasively malignant tumors. There was substantial variability between the two sonologists in measurement results and some variability in assessment of categorical variables (agreement 40-98%, Kappa 0.30-0.91). Inter-observer agreement when classifying tumors as benign or malignant was 84% (98/117), Kappa 0.68, for LR1 and 85% (99/117), Kappa 0.68, for LR2. When using LR1 and LR2, the inter-observer difference in calculated risk was >25 percentage units in 9% (11/117) and 12% (14/117) of tumors, respectively. Differences in assessment of wall irregularity, acoustic shadowing, color score, and color flow in papillary projections explained most of these largest differences.

Conclusions: Inter-observer agreement in classifying tumors as benign or malignant using the risk of malignancy cutoff of 10% for LR1 and LR2 was good. However, because risk estimates may differ substantially between sonologists, one should be cautious with using the risk value for counseling patients about their individual risk.

Introduction

One of the first successful attempts to use ultrasound to discriminate between benign and malignant adnexal masses was made by Granberg and coworkers (1). They classified adnexal masses into five categories: unilocular, unilocular solid, multilocular, multilocular solid, and solid tumors, and found that unilocular cysts, unilocular solid cysts, and multilocular cysts were rarely malignant. Later, subjective interpretation of ultrasound images of adnexal masses - pattern recognition - proved to be an excellent method for discriminating between benign and malignant adnexal masses (2-5) and also for making a specific diagnosis (e.g., endometrioma, hydrosalpinx, etcetera) (3,6,7). As an alternative to pattern recognition, several research teams (8-10) created logistic regression models including clinical and ultrasound information to calculate the individual risk of malignancy in adnexal masses. Because of unclear definitions of many of the ultrasound variables included in these models, the International Ovarian Tumor Analysis (IOTA) group suggested standardized terms and definitions to be used when describing ultrasound images of adnexal masses (11). The IOTA group also created and validated several mathematical models in which these standardized terms and definitions were used to calculate the risk of malignancy for each individual adnexal mass (12-14). Of these models, the logistic regression models LR1 and LR2, including 12 and six variables, respectively (see Table 1), were suggested to be suitable for use in clinical practice (13,14). However, even when using standardized terms and definitions, ultrasound examiners may evaluate the features of an adnexal mass differently. There may also be variability in measurement results. This means that the risk of malignancy calculated by LR1 or LR2 may vary both within and between ultrasound examiners. We have shown that this is indeed the case when experienced ultrasound examiners analyze three-dimensional (3D) ultrasound volumes of adnexal masses (15). However, analysis of 3D ultrasound volumes does not necessarily reflect a situation where live examinations are performed.

The aims of this study were to estimate inter-observer agreement, when live ultrasound scans are performed, with regard to 1) describing adnexal masses using the IOTA terminology and 2) the risk of malignancy calculated using the IOTA logistic regression models LR1 and LR2, and 3) to elucidate what explains large inter-observer differences in calculated risk of malignancy.

Materials and methods

The Ethics Committee of Lund University approved the study protocol. Informed consent was obtained from all participants or the participant's guardian after the nature of the procedures had been fully explained.

This is a prospective observational study of real-time live ultrasound examinations of adnexal masses. Consecutive patients referred for an ultrasound examination and found to have an adnexal mass judged to need surgical removal were scanned according to the research protocol by sonologist 1 (PS) as part of the clinical ultrasound examination. A second ultrasound examination was carried out before surgery by sonologist 2 (LV). Both examiners used the standardized IOTA examination and measurement technique and the IOTA terminology (11) to describe their ultrasound findings and noted their results in a dedicated paper form. Sonologist 2 was blinded to the results of sonologist 1. Information on the clinical variables included in LR1 and LR2 (personal history of ovarian cancer, current hormonal therapy, age of the patient) was obtained at the preoperative ultrasound examination by sonologist 2. All patients were operated on within 90 days after the preoperative ultrasound examination performed by sonologist 2. The excised tissues underwent histological examination, and tumors were classified according to the criteria recommended by the International Federation of Gynecology and Obstetrics (16). Borderline tumors were classified as malignant.

The patients were examined in the lithotomy position with an empty urinary bladder (11). Abdominal ultrasound examination was added when needed. The ultrasound variables assessed with regard to inter-observer reproducibility are shown in Table 1. The size of the lesion and that of its largest solid component were measured (largest diameter and mean of three orthogonal diameters) using calipers on the frozen ultrasound image. A color score was assigned on the basis of subjective assessment of the color content of the tumor scan at power Doppler ultrasound examination. A color score of 1 indicates absence of color Doppler signals, a color score of 2 a minimal amount of color Doppler signals, a color score of 3 a moderate amount of color Doppler signals, and a color score of 4 a large amount of color Doppler signals in the tumor (11).

The ultrasound systems used were GE Voluson 730 Expert or GE Voluson E8 (GE Healthcare, Zipf, Austria) with a 5–9-MHz transvaginal transducer. For power Doppler ultrasound examinations, the following settings were used for the Voluson 730 Expert system: frequency 6-9 ("normal") MHz, pulse repetition frequency 0.6 kHz, gain 0.8, wall motion filter "low 1" (40 Hz); and for Voluson E8: frequency 6-9 ("normal") MHz, pulse repetition frequency 0.6 kHz, gain -4.0, wall motion filter "low 1" (40 Hz).

Statistical analysis

The IOTA3 study screen (astraia GmbH, Munich, Germany) was used to calculate the risk of malignancy according to LR1. Weighted Kappa indices were calculated using the statistical program Stata Version 10.1 for Windows (StataCorp LP, College Station, TX, USA). For all other statistical calculations, including calculation of the risk of malignancy when using LR2, we used the Statistical Package for the Social Sciences (SPSS program, IBM Corp., New York, NY, USA; PASW version 18.0).

Inter-observer agreement in the assessment of categorical variables was estimated by calculating the percentage agreement. Cohen's kappa was used to estimate by how much the observed agreement exceeded that expected by chance (17). Weighted kappa values are presented where appropriate (18). It has been suggested that kappa values >0.81 indicate very good agreement beyond chance, kappa values between 0.61 and 0.80 good agreement beyond chance, kappa values between 0.41 and 0.60 moderate agreement beyond chance, kappa values between 0.21 and 0.40 fair agreement beyond chance, and kappa values <0.20 poor agreement beyond chance (19).
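The percentage agreement and the unweighted kappa described above can be sketched in a few lines of Python. This is an illustrative sketch and not the Stata or SPSS routines used in the study: the function names are hypothetical, the example ratings are invented, and the cut points between the verbal categories are assumed to lie at 0.80, 0.60, 0.40 and 0.20, because the quoted ranges leave small gaps between categories.

```python
from collections import Counter


def percent_agreement_and_kappa(ratings_1, ratings_2):
    """Return (observed agreement, Cohen's unweighted kappa) for two raters."""
    n = len(ratings_1)
    observed = sum(a == b for a, b in zip(ratings_1, ratings_2)) / n
    freq_1, freq_2 = Counter(ratings_1), Counter(ratings_2)
    # Agreement expected by chance from the two raters' marginal frequencies.
    expected = sum(freq_1[c] * freq_2[c] for c in set(freq_1) | set(freq_2)) / n ** 2
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return observed, (observed - expected) / (1.0 - expected)


def interpret_kappa(kappa):
    """Verbal label for agreement beyond chance (assumed cut points)."""
    if kappa > 0.80:
        return "very good"
    if kappa > 0.60:
        return "good"
    if kappa > 0.40:
        return "moderate"
    if kappa > 0.20:
        return "fair"
    return "poor"


# Invented example: 100 masses rated yes/no by two observers.
# Both yes: 20, both no: 70, observer 1 only: 5, observer 2 only: 5.
obs_1 = ["yes"] * 20 + ["no"] * 70 + ["yes"] * 5 + ["no"] * 5
obs_2 = ["yes"] * 20 + ["no"] * 70 + ["no"] * 5 + ["yes"] * 5
agreement, kappa = percent_agreement_and_kappa(obs_1, obs_2)
```

In this invented example the observed agreement is 90%, the agreement expected by chance is 62.5%, and kappa is therefore (0.90 - 0.625) / (1 - 0.625), about 0.73. The example shows why a high percentage agreement can coexist with a modest kappa when one category dominates.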

Inter-observer reproducibility of measurement results, including the calculated risks of malignancy using LR1 and LR2, was described as the difference between two measurement results. The differences between the measured values were plotted against the mean of the two measurements (Bland-Altman plots) to assess the relationship between the differences and the magnitude of the measurements (20). Systematic bias between two measurements was estimated by calculating the 95% confidence interval (CI) of the mean difference (mean difference ± 2 SE). If zero lay within this interval, no bias was assumed to exist between the two measurements. Inter-observer agreement was expressed as the mean difference and limits of agreement (20). Ninety-five percent of differences between any future measurements are estimated to fall between the lower and upper limit of agreement. Inter-observer reliability of measurement results was estimated by calculating the intra-class correlation coefficient (ICC) using analysis of variance (two-way random model, absolute agreement; this allows generalization of the results to a population of observers). The ICC indicates the proportion of the total variance in measurement results that can be explained by differences between the individuals examined. It depends both on the magnitude of measurement errors and the true heterogeneity in the population in which measurements are made. The more variable the population investigated, the greater the ICC, and the less variable the population, the smaller the ICC (21). It has been suggested that ICC values >0.90 are needed for a test to be used in clinical practice (22).
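The quantities described in this paragraph can be computed as follows. This is a minimal Python sketch under stated assumptions, not the SPSS procedure used in the study: the limits of agreement are taken as the mean difference ± 2 SD, the ICC is the single-measures form of the two-way random, absolute-agreement coefficient (the text does not say whether single or average measures were used), and the function names and example measurements are invented for illustration.

```python
import math
from statistics import mean, stdev


def bland_altman(obs_1, obs_2):
    """Mean difference, its 95% CI (+/- 2 SE) and limits of agreement (+/- 2 SD)."""
    diffs = [a - b for a, b in zip(obs_1, obs_2)]   # observer 1 minus observer 2
    d = mean(diffs)
    sd = stdev(diffs)                               # sample SD of the differences
    se = sd / math.sqrt(len(diffs))
    ci = (d - 2 * se, d + 2 * se)
    return {
        "mean_diff": d,
        "ci95": ci,
        "limits_of_agreement": (d - 2 * sd, d + 2 * sd),
        "systematic_bias": not (ci[0] <= 0 <= ci[1]),  # zero outside the CI
    }


def icc_absolute_agreement(obs_1, obs_2):
    """Two-way random, absolute-agreement, single-measures ICC for two observers."""
    rows = list(zip(obs_1, obs_2))
    n, k = len(rows), 2
    grand = mean(v for row in rows for v in row)
    ss_subjects = k * sum((mean(row) - grand) ** 2 for row in rows)
    ss_observers = n * sum((mean(col) - grand) ** 2 for col in (obs_1, obs_2))
    ss_total = sum((v - grand) ** 2 for row in rows for v in row)
    ms_subjects = ss_subjects / (n - 1)
    ms_observers = ss_observers / (k - 1)
    ms_error = (ss_total - ss_subjects - ss_observers) / ((n - 1) * (k - 1))
    return (ms_subjects - ms_error) / (
        ms_subjects + (k - 1) * ms_error + k * (ms_observers - ms_error) / n
    )


# Invented diameters (mm) for four masses measured by two observers.
sonologist_1 = [10, 20, 30, 40]
sonologist_2 = [12, 19, 33, 38]
result = bland_altman(sonologist_1, sonologist_2)
icc = icc_absolute_agreement(sonologist_1, sonologist_2)
```

For the invented data the mean difference is -0.5 mm with zero inside its confidence interval, so no systematic bias would be inferred. The ICC is close to 1 because the spread between masses (10 to 40 mm) is large relative to the observer differences, which illustrates the dependence of the ICC on population heterogeneity noted above.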

The sensitivity and specificity with regard to malignancy of LR1 and LR2 were calculated using the information of sonologist 1 and of sonologist 2.

Results

In all, 117 consecutive women with adnexal masses who underwent surgery were examined with ultrasound by the two sonologists as described above. Thirty-four women had bilateral adnexal masses. The most complex mass - or the largest one if both masses had similar ultrasound morphology - was used in our statistical analysis, the mass to be included being selected retrospectively to ensure that both sonologists contributed the same mass (right or left) to the analysis. Thus, 117 adnexal masses from 117 patients constitute our study population. The women's age ranged between 14 and 88 years (median 53), and 63 (54%) women were postmenopausal. There were 94 benign, four borderline, and 19 invasively malignant adnexal masses (Table 2).

The time elapsed between the ultrasound examinations of sonologist 1 and 2 was median 61 days (10th and 90th percentiles 13 and 132, range 1-204) for the tumors with benign histology and median 14 days (10th and 90th percentiles 2 and 31, range 1-41) for the tumors with malignant histology. There was no relationship between the number of days between the scans and the differences in measurement results or inter-observer agreement for discrete variables (Supplementary Fig. S1-S5 and Supplementary Table S1).

Inter-observer reproducibility of measurement results is shown in Table 3. Bland-Altman plots showed no clear trend for inter-observer differences in measurement results to change with the magnitude of the measurement values. Limits of agreement were wide for all measurements. There was one systematic difference between the two sonologists, sonologist 1 (who always performed the first examination) obtaining higher measurement values for the maximum diameter of the mass. The least reliable measurement was the height of the largest papillary projection.

Inter-observer agreement when assessing categorical ultrasound variables is shown in Table 4. For most categorical ultrasound variables, inter-observer agreement beyond chance was good or very good (19). Inter-observer agreement beyond chance for variables included in LR1 or LR2 was poorest for color score (agreement 40%, weighted Kappa 0.36), presence of blood flow in papillary projection (agreement 90%, Kappa 0.48), irregular cyst wall (agreement 79%, Kappa 0.56), and acoustic shadowing (agreement 85%, Kappa 0.58).

Bland-Altman plots illustrating the relationship between the magnitude of the estimated risk of malignancy calculated using LR1 and LR2 and the interobserver difference in calculated risk are shown in Figure 1. The plots manifest a diamond shape, i.e., the interobserver differences are smallest for the lowest and highest risks, and they are very small for risks <2.5% and >95%. Logarithmic transformation of the data (20) did not substantially change the shape of the scatter plot. Therefore, we present our results as absolute inter-observer differences in calculated risk (in percentage units); see Table 5. There were no systematic differences in calculated risks between the two sonologists, and reliability reflected by the ICC values was good (22), with ICC values of 0.911 for LR1 and 0.832 for LR2. When classifying tumors as having a risk of malignancy <10% (benign) or >10% (malignant) using LR1 or LR2, the inter-observer agreement was good for both models: inter-observer agreement 84% (98/117), Kappa value 0.68 for model LR1, and inter-observer agreement 85% (99/117), Kappa 0.68 for model LR2. In the 19 cases where the two sonologists obtained different results with regard to malignancy when using LR1, the absolute interobserver differences in calculated risk ranged from 0.7 to 59.6 percentage units: in six of the 19 cases the absolute interobserver difference in calculated risk was <10.0 percentage units, in nine cases it was 10.0–24.9 percentage units, and in four cases it was >25.0 percentage units. In the 18 cases where the two sonologists obtained different results with regard to malignancy when using LR2, the absolute interobserver difference in calculated risk ranged from 8.8 to 67.9 percentage units: in two of the 18 cases the absolute interobserver difference in calculated risk was <10.0 percentage units, in ten cases it was 10.0–24.9 percentage units, and in six cases it was >25 percentage units.
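The quantities behind a Bland-Altman analysis (mean interobserver difference and 95% limits of agreement) can be sketched as follows. This is an illustrative sketch only: the function name and the four pairs of risk values are hypothetical, and the limits are taken as the mean difference plus or minus 1.96 sample standard deviations, which is the usual convention but is an assumption about the exact computation used here.

```python
from statistics import mean, stdev


def bland_altman(observer1, observer2):
    """Return (mean difference, lower limit, upper limit) for paired values.

    Differences are observer 1 minus observer 2; the limits of agreement are
    mean difference +/- 1.96 * sample standard deviation of the differences.
    """
    diffs = [a - b for a, b in zip(observer1, observer2)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical calculated risks (%) for four masses, NOT study data.
risk_observer1 = [10.0, 20.0, 30.0, 40.0]
risk_observer2 = [12.0, 18.0, 33.0, 37.0]
bias, lower, upper = bland_altman(risk_observer1, risk_observer2)
print(f"mean difference {bias:.2f}, limits of agreement {lower:.2f} to {upper:.2f}")
```

A mean difference close to zero corresponds to the absence of a systematic difference between observers, while the width of the limits reflects how far individual risk estimates can diverge.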

The Bland-Altman plots (Figure 1) illustrate that for some tumors there were substantial interobserver differences in the calculated risk of malignancy: when using LR1, the interobserver difference in calculated risk was >25 percentage units in 11 tumors (9% of all tumors), and when using LR2, the interobserver difference in calculated risk was >25 percentage units in 14 tumors (12% of all tumors). To elucidate which interobserver differences explained these largest interobserver differences in calculated risk, we scrutinized each case where the difference was >25 percentage units. The results are shown in Supplementary Tables S2 and S3. When using LR1, a discrepancy for one single categorical variable explained the difference in four of the 11 cases, while a discrepancy for two categorical variables explained the difference in one case (differences in measurements being <5 mm in these five cases). In six cases, there were differences in one or two categorical variables but also substantial differences (6-61 mm) in at least one measurement result. In no case was the large difference in calculated risk explained exclusively by differences in measurement results. The categorical variables judged differently by the two sonologists in these 11 cases were color score (n = 5), irregular cyst wall (n = 5), flow in papillary projection (n = 3), and acoustic shadowing (n = 2).

When using LR2, a discrepancy for one single categorical variable explained the large difference in calculated risk (>25 percentage units) in eight of the 14 cases (differences in measurements being <5 mm in these eight cases), and in four of the eight cases the sonologists judged acoustic shadowing differently. In five cases, there were differences in one categorical variable but also a substantial difference (9-61 mm) in the measurement of the largest solid component. In yet another case, there were differences in two categorical variables as well as in the measurement of the largest solid component. The categorical variables judged differently by the two sonologists in these 14 cases were acoustic shadowing (n = 5), irregular cyst wall (n = 5), ascites (n = 3), and flow in papillary projection (n = 2).

The sensitivity with regard to malignancy when using LR1 (10% risk cutoff) was 100% (23/23; 95% CI, 82-100) for both sonologists; the specificity was 74% (70/94; 95% CI, 64-82) for sonologist 1 and 63% (59/94; 95% CI, 53-72) for sonologist 2. The sensitivity when using LR2 was 100% (23/23; 95% CI, 82-100) for sonologist 1 and 91% (21/23; 95% CI, 72-98) for sonologist 2, and the specificity was 75.5% (71/94; 95% CI, 65-84) for both sonologists.
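The percentages above follow directly from the reported counts (23 malignant and 94 benign masses). A small sketch, using a hypothetical helper function and the LR1 counts quoted in the text, shows the arithmetic; the confidence intervals are not recomputed here because the method used for them is not stated in this section.

```python
def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float]:
    """Return (sensitivity, specificity) as proportions from a 2x2 count table."""
    return tp / (tp + fn), tn / (tn + fp)


# LR1 at the 10% cutoff, counts as quoted in the text:
# sonologist 1: 23/23 malignant and 70/94 benign masses correctly classified
# sonologist 2: 23/23 malignant and 59/94 benign masses correctly classified
sens1, spec1 = sensitivity_specificity(tp=23, fn=0, tn=70, fp=24)
sens2, spec2 = sensitivity_specificity(tp=23, fn=0, tn=59, fp=35)
print(f"sonologist 1: sensitivity {sens1:.0%}, specificity {spec1:.0%}")
print(f"sonologist 2: sensitivity {sens2:.0%}, specificity {spec2:.0%}")
```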

Discussion

We have shown substantial inter-observer variability in the results of measurements taken in adnexal masses (wide limits of agreement). Inter-observer agreement beyond chance was very good or good for most categorical variables, but it was only moderate or fair for some. Inter-observer agreement above chance was poorest for variables heavily dependent on subjective evaluation and/or machine settings, i.e., color score, presence of color Doppler signals in papillary projections, irregular cyst walls, acoustic shadowing (all four variables being included in LR1 or LR2), echogenicity of cyst fluid, and ovarian crescent sign. Despite this, there was good inter-observer agreement when classifying tumors as benign or malignant using the predetermined risk of malignancy cut-off of 10%. However, in some cases there were substantial differences in the calculated risk of malignancy between the two sonologists, the difference being >25.0 percentage units in 9% of all tumors when using LR1 and in 12% of all tumors when using LR2.

The strength of our study is that it provides new information. To the best of our knowledge, there is only one publication reporting on interobserver agreement with regard to describing ultrasound findings in adnexal masses using the IOTA terminology (11) when performing live ultrasound examinations (23). However, that study (23) evaluated interobserver agreement with regard to the ten ultrasound features in the IOTA simple rules (24, 25), not the variables included in the IOTA logistic regression models LR1 and LR2, and agreement was estimated between examiners with different levels of experience. The variable with poorest agreement beyond chance in the study cited was acoustic shadowing (Kappa 0.36). We have found no published study that has estimated inter-observer reproducibility of the calculated risk of malignancy using LR1 or LR2 after live scanning.

It is a limitation of our study that up to 204 days elapsed between the scans of the two sonologists (up to 41 days for malignant masses). Because days elapsed between the scans, theoretically the inter-observer differences could be explained by the lesions having changed in size or morphology between the scans. We find this highly unlikely for the following reasons. First, there was no relationship between the differences in measurement results and the number of days between the scans (Supplementary Fig. S1-S5). Nor was there a clear tendency for inter-observer agreement for discrete variables to depend on the time between the scans (Supplementary Table S1). Second, one would expect a lesion and its components to increase in size with time, but sonologist 1, performing the first scan, obtained higher measurement values than sonologist 2. Third, it is our experience, after having performed gynecological scans for more than 20 years, that the ultrasound morphology of both benign and malignant adnexal masses remains constant over time, that benign adnexal lesions grow slowly, and that malignant masses do not change appreciably in size even during 1 month of observation. Therefore, we believe that the discrepancies between the two sonologists reflect true inter-observer differences and not a change of the masses over time. A second limitation is that we did not include estimation of the reproducibility of retrieving anamnestic information (current hormonal therapy, personal history of ovarian cancer), the anamnestic information collected by the second sonologist being used in all cases. It cannot be entirely excluded that patients would answer differently when asked by different sonologists, or that sonologists could interpret the answers of the patients differently. A third limitation is that we did not estimate intra-observer reproducibility. We considered four scans (two per sonologist) likely to be unacceptable to patients. For the same reason, only two sonologists were involved in this study, and our results are generalizable only to sonologists with a similar level of experience.

The results of this live scanning study are similar to those of another study in which the same sonologists assessed the same variables using 3D ultrasound volumes from adnexal masses in another tumor population (15). The similarity in results between the two studies is surprising, because the conditions when assessing 3D ultrasound volumes are different from those during a live scan. When evaluating ultrasound volumes, sonologists are exposed to the same ultrasound images, and so any interobserver difference should be explained exclusively by differences in interpreting the ultrasound information. During a live scan, there are more sources of bias. This could result either in poorer or better interobserver agreement than when 3D ultrasound volumes are assessed: poorer, because ultrasound examiners are likely to use different machine settings and scanning conditions may change from one minute to another; better, because the dynamic nature of live scanning facilitates discrimination between solid components and amorphous tissue.

Our results showed that two experienced sonologists agreed quite well in their classification of masses as benign or malignant using the 10% risk of malignancy cutoff of LR1 and LR2, and that the diagnostic performance of LR1 and LR2 with regard to discrimination between benign and malignant tumors was similar for the two sonologists and similar to that reported by others (14, 26-28). This is reassuring, because the main purpose of using models LR1 and LR2 is to classify tumors as benign or malignant. Potentially, however, LR1 and LR2 can be used not only to classify adnexal masses as benign or malignant but also to counsel a patient about her individual risk of malignancy (13). If the calculated risk is to be used for individual counseling, one must be reasonably certain not only that the estimated risk agrees well with the true risk (when externally validated, both LR1 and LR2 underestimated the true risk, especially in the risk interval 30%-70% (14)), but also that the risk estimates are reproducible, i.e., that different examiners will obtain similar risk estimates. Our results show that risk estimates may differ substantially between experienced observers, the difference in estimated risk being >25.0 percentage units in 9% and 12% of cases when using LR1 and LR2, respectively. Interobserver agreement above chance was poorest for those variables in the models that are heavily dependent on subjective evaluation, i.e., color score, presence of color Doppler signals in papillary projections, irregular cyst walls, and acoustic shadowing. Indeed, differences in these variables explained most of the largest inter-observer differences in calculated risk of malignancy. In models based on few variables, changing the value of only one variable may result in large differences in predicted risks, while a model with many variables is less vulnerable to a change in one or even a few variables. Our results illustrate this (Supplementary Tables S2 and S3). When using LR2 (which includes six variables), a change in value for one single categorical variable explained an inter-observer difference in calculated risk >25 percentage units in eight of 14 cases, while when using LR1 (which includes 12 variables), a change in value for one single categorical variable explained an inter-observer difference in calculated risk >25 percentage units in only four of 11 cases. Acoustic shadowing is a strong variable in both LR1 and LR2 and has great impact on the calculated risk in LR2, which has only six variables. In our hands, as well as in those of Ruiz de Gauna et al. (23), inter-observer agreement for acoustic shadowing was at most moderate. The interobserver agreement for color score was only fair in our study, and color score is an important variable in LR1.
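The impact of a single discordant categorical variable on the LR2 risk can be made concrete with a short numerical sketch. The coefficients are those quoted for LR2 in the footnote to Table 1; the function name and the patient values (age 60 years, 50 mm solid component, no other findings) are hypothetical and chosen only for illustration.

```python
import math


def lr2_risk(age_years, ascites, papillary_flow, max_solid_mm,
             irregular_walls, acoustic_shadows):
    """LR2 risk of malignancy (0-1) using the coefficients quoted in Table 1."""
    solid = min(max_solid_mm, 50.0)  # solid-component diameter counts up to 50 mm only
    z = (-5.3718 + 0.0354 * age_years + 1.6159 * ascites
         + 1.1768 * papillary_flow + 0.0697 * solid
         + 0.9586 * irregular_walls - 2.9486 * acoustic_shadows)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical mass: the two observers disagree only on acoustic shadowing.
risk_without_shadow = lr2_risk(60, 0, 0, 50, 0, 0)
risk_with_shadow = lr2_risk(60, 0, 0, 50, 0, 1)
difference_points = 100 * (risk_without_shadow - risk_with_shadow)
print(f"{risk_without_shadow:.1%} vs {risk_with_shadow:.1%} "
      f"(difference {difference_points:.1f} percentage units)")
```

In this made-up example, disagreement on acoustic shadowing alone moves the calculated risk by roughly 50 percentage units, which is in line with the observation that single categorical discrepancies explained many of the largest differences when using LR2.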

To improve inter-observer reproducibility of calculated risks based on LR1 and LR2, inter-observer differences in descriptions and measurements of adnexal masses using the IOTA terminology and measurement technique need to be reduced. One way to achieve this could be by providing courses on, and training in, how to examine and describe adnexal masses using the IOTA terms. Interactive courses in which a large number of ultrasound images are discussed with the course participants are likely to be very valuable in this respect. More precise definitions of the IOTA terms, for example by providing ample imaging material, would probably also help improve inter-observer agreement. Special attention should be given to the variables with poorest reproducibility, i.e., the color score, wall irregularity, acoustic shadowing, and detection of blood flow in papillary projections. Until better inter-observer agreement in the calculated risk of malignancy using LR1 and LR2 has been shown, one should be cautious with using the risk estimates for individual patient counselling.

Acknowledgements

None

References

1. Granberg S, Norström A, Wikland M. Tumors in the lower pelvis as imaged by vaginal sonography. Gynecol Oncol 1990;37:224-9.
2. Benacerraf BR, Finkler NJ, Wojciechowski C, Knapp RC. Sonographic accuracy in the diagnosis of ovarian masses. J Reprod Med 1990;35:491-5.
3. Valentin L. Pattern recognition of pelvic masses by gray-scale ultrasound imaging: the contribution of Doppler ultrasound. Ultrasound Obstet Gynecol 1999;14:338-47.
4. Valentin L. Prospective cross-validation of Doppler ultrasound examination and gray-scale ultrasound imaging for discrimination of benign and malignant pelvic masses. Ultrasound Obstet Gynecol 1999;14:273-83.
5. Timmerman D, Schwärzler P, Collins WP, Claerhout F, Coenen M, Amant F, et al. Subjective assessment of adnexal masses with the use of ultrasonography: an analysis of interobserver variability and experience. Ultrasound Obstet Gynecol 1999;13:11-6.
6. Sokalska A, Timmerman D, Testa AC, Van Holsbeke C, Lissoni AA, Leone FPG, et al. Diagnostic accuracy of transvaginal ultrasound examination for assigning a specific diagnosis to adnexal masses. Ultrasound Obstet Gynecol 2009;34:462-70.
7. Valentin L. Use of morphology to characterize and manage common adnexal masses. Best Pract Res Clin Obstet Gynaecol 2004;18:71-89.
8. Tailor A, Jurkovic D, Bourne TH, Collins WP, Campbell S. Sonographic prediction of malignancy in adnexal masses using multivariate logistic regression analysis. Ultrasound Obstet Gynecol 1997;10:41-7.
9. Timmerman D, Bourne TH, Tailor A, Collins WP, Verrelst H, Vandenberghe K, et al. A comparison of methods for the preoperative discrimination between benign and malignant adnexal masses: the development of a new logistic regression model. Am J Obstet Gynecol 1999;181:57-65.

10. Alcazar JL, Jurado M. Prospective evaluation of logistic model based on sonographic morphologic and color Doppler findings developed to predict adnexal malignancy. J Ultrasound Med 1999;18:837-42.
11. Timmerman D, Valentin L, Bourne TH, Collins WP, Verrelst H, Vergote I. Terms, definitions and measurements to describe the sonographic features of adnexal tumors: a consensus opinion from the International Ovarian Tumor Analysis (IOTA) Group. Ultrasound Obstet Gynecol 2000;16:500-5.
12. Timmerman D, Testa AC, Bourne T, Ferrazzi E, Ameye L, Konstantinovic ML, et al. Logistic regression model to distinguish between the benign and malignant adnexal mass before surgery: a multicenter study by the International Ovarian Tumor Analysis Group. J Clin Oncol 2005;34:8794-801.
13. Kaijser J, Bourne T, Valentin L, Sayasneh A, Van Holsbeke C, Vergote I, et al. Improving strategies for diagnosing ovarian cancer: a summary of the International Ovarian Tumor Analysis (IOTA) studies. Ultrasound Obstet Gynecol 2013;41:9-20.
14. Timmerman D, Van Calster B, Testa AC, Guerriero S, Fischerova D, Lissoni AA, et al. Ovarian cancer prediction in adnexal masses using ultrasound-based logistic regression models: a temporal and external validation study by the IOTA group. Ultrasound Obstet Gynecol 2010;36:226-34.
15. Sladkevicius P, Valentin L. Intra- and inter-observer agreement when describing adnexal masses using the International Ovarian Tumour Analysis (IOTA) terms and definitions: a study on three-dimensional (3D) ultrasound volumes. Ultrasound Obstet Gynecol 2013;41:318-27.
16. Heintz APM, Odicino F, Maisonneuve P, Beller U, Benedet JL, Creasman WT, et al. Carcinoma of the Ovary. 25th Annual Report on the Results of Treatment in Gynecological Cancer. Int J Gynecol Obstet 2003;83:S135-S166 (suppl 1).

17. Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas 1960;20:37–46.
18. Kundel HL, Polansky M. Measurement of observer agreement. Radiology 2003;228:303-8.
19. Brennan P, Silman A. Statistical methods for assessing observer variability in clinical measures. BMJ 1992;304:1491-4.
20. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986;1:307-10.
21. Bartlett JW, Frost C. Reliability, repeatability and reproducibility: analysis of measurement errors in continuous variables. Ultrasound Obstet Gynecol 2008;31:466-75.
22. Kottner J, Audigé L, Brorson S, Donner A, Gajewski BJ, Hróbjartsson A, et al. Guidelines for Reporting Reliability and Agreement Studies (GRRAS) were proposed. J Clin Epidemiol 2011;64:96-106.
23. Ruiz de Gauna B, Sanchez P, Pineda L, Utrilla-Layna J, Juez L, Alcázar JL. Inter-observer agreement with regard to describing adnexal masses using the IOTA simple rules in a real-time setting and when using three-dimensional ultrasound volumes and digital clips. Ultrasound Obstet Gynecol 2014;44:95-100.
24. Timmerman D, Testa AC, Bourne T, Ameye L, Jurkovic D, Van Holsbeke C, et al. Simple ultrasound-based rules for the diagnosis of ovarian cancer. Ultrasound Obstet Gynecol 2008;31:681-90.
25. Timmerman D, Ameye L, Fischerova D, Epstein E, Melis GB, Guerriero S, et al. Simple ultrasound rules to distinguish between benign and malignant adnexal masses before surgery: prospective validation by IOTA group. BMJ 2010;341:c6839.

26. Van Holsbeke C, Van Calster B, Testa AC, Domali E, Lu C, Van Huffel S, et al. Prospective internal validation of mathematical models to predict malignancy in adnexal masses: results from the international ovarian tumor analysis study. Clin Cancer Res 2009;15:684-91.
27. Nunes N, Yazbek J, Ambler G, Hoo W, Naftalin J, Jurkovic D. Prospective evaluation of the IOTA logistic regression model LR2 for the diagnosis of ovarian cancer. Ultrasound Obstet Gynecol 2012;40:355-9.
28. Nunes N, Ambler G, Hoo WL, Naftalin J, Foo X, Widschwendter M, et al. A prospective validation of the IOTA logistic regression models (LR1 and LR2) in comparison to subjective pattern recognition for the diagnosis of ovarian cancer. Int J Gynecol Cancer 2013;23:1583-9.

Table 1. Variables for which inter-observer reproducibility was estimated
___________________________________________________________________________
Variable | Unit or categories
___________________________________________________________________________
Variables included in logistic regression models LR1 and LR2 or of importance for these models
  Maximum diameter of adnexal mass | mm
  Maximum diameter of largest solid component | mm
  Maximum diameter of largest solid component | ≤50, >50 mm
  Presence of an entirely solid adnexal mass | yes, no
  Papillary projections present | yes, no
  Irregular internal cyst walls | yes, no
  Acoustic shadows | yes, no
  Color Doppler signals in papillary projection | yes, no
  Color score | 1, 2, 3, 4
  Tenderness of adnexal mass at scan | yes, no
  Ascites | yes, no
Other IOTA variables
 Continuous IOTA variables
  Mean diameter of adnexal mass | mm
  Mean diameter of largest solid component | mm
  Height of the largest papillary projection | mm
 Categorical IOTA variables
  Type of tumor | unilocular, unilocular solid, multilocular, multilocular solid, solid
  Mean diameter of adnexal mass | ≤40, 41-60, 61-80, 81-100, and >100 mm
  Number of cyst locules | 0, 1, 2, 3, 4, 5, 6-10, 11-20, and >20; ≤10, >10†
  Septum present | yes, no
  Incomplete septum present | yes, no
  Solid component | yes, no
  Papillary projections | 0, 1, 2, 3, ≥4; 1, 2, 3, ≥4‡; <4, >4†
  Echogenicity of cyst fluid | anechoic, low level, ground glass, mixed, no cyst fluid
  Color Doppler signals detectable | yes, no
  Ovarian crescent sign | yes, no
Diagnosis based on LR1 and LR2
  Calculated risk of malignancy using LR1 | ≤10%, >10%
  Calculated risk of malignancy using LR2 | ≤10%, >10%
___________________________________________________________________________
IOTA, International Ovarian Tumor Analysis group; LR1, logistic regression model 1; LR2, logistic regression model 2.

The risk of malignancy using LR1 is derived as $y = 1/(1 + e^{-z})$, where $z = -6.7468 + 1.5985(a) - 0.9983(b) + 0.0326(c) + 0.00841(d) - 0.8577(e) + 1.5513(f) + 1.1737(g) + 0.9281(h) + 0.0496(i) + 1.1421(j) - 2.3550(k) + 0.4916(l)$ and $e$ is the mathematical constant and base value of natural logarithms. The 12 variables included in LR1 are: (a) personal history of ovarian cancer (yes = 1, no = 0); (b) current hormonal therapy (yes = 1, no = 0); (c) age of the patient (in years); (d) maximum diameter of the lesion (in millimeters); (e) mass tender at scan (yes = 1, no = 0); (f) the presence of ascites (yes = 1, no = 0); (g) the presence of blood flow within a solid papillary projection (yes = 1, no = 0); (h) the presence of a purely solid tumor (yes = 1, no = 0); (i) maximal diameter of the solid component (expressed in millimeters, but with no increase >50 mm); (j) irregular internal cyst walls (yes = 1, no = 0); (k) the presence of acoustic shadows (yes = 1, no = 0); and (l) color score (1, 2, 3, or 4).

The risk of malignancy using LR2 is derived as $y = 1/(1 + e^{-z})$, where $z = -5.3718 + 0.0354(c) + 1.6159(f) + 1.1768(g) + 0.0697(i) + 0.9586(j) - 2.9486(k)$ (12). A calculated risk of malignancy of more than 10% classified the adnexal mass as malignant.

† includes 0. ‡ includes only cases where papillary projections were registered at both examinations.
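For readers who wish to reproduce the risk calculation, the two formulas in the footnote to Table 1 can be written out as a short sketch. The coefficients are those given in the footnote; the function and parameter names are our own hypothetical choices, and the cap of the solid-component diameter at 50 mm is our reading of "no increase >50 mm".

```python
import math


def _logistic(z: float) -> float:
    """y = 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))


def lr1_risk(history_ovarian_cancer, hormonal_therapy, age_years, max_lesion_mm,
             tender, ascites, papillary_flow, purely_solid, max_solid_mm,
             irregular_walls, acoustic_shadows, color_score) -> float:
    """LR1 risk (0-1); binary inputs are 1 for yes and 0 for no."""
    solid = min(max_solid_mm, 50.0)  # variable (i): no increase above 50 mm
    z = (-6.7468 + 1.5985 * history_ovarian_cancer - 0.9983 * hormonal_therapy
         + 0.0326 * age_years + 0.00841 * max_lesion_mm - 0.8577 * tender
         + 1.5513 * ascites + 1.1737 * papillary_flow + 0.9281 * purely_solid
         + 0.0496 * solid + 1.1421 * irregular_walls
         - 2.3550 * acoustic_shadows + 0.4916 * color_score)
    return _logistic(z)


def lr2_risk(age_years, ascites, papillary_flow, max_solid_mm,
             irregular_walls, acoustic_shadows) -> float:
    """LR2 risk (0-1), using six of the LR1 variables."""
    solid = min(max_solid_mm, 50.0)
    z = (-5.3718 + 0.0354 * age_years + 1.6159 * ascites
         + 1.1768 * papillary_flow + 0.0697 * solid
         + 0.9586 * irregular_walls - 2.9486 * acoustic_shadows)
    return _logistic(z)


# Hypothetical patient: 60 years, 80 mm lesion with a 50 mm solid component,
# color score 2, no other findings. Risks above 0.10 classify as malignant.
example_lr1 = lr1_risk(0, 0, 60, 80, 0, 0, 0, 0, 50, 0, 0, 2)
example_lr2 = lr2_risk(60, 0, 0, 50, 0, 0)
print(f"LR1 {example_lr1:.1%}, LR2 {example_lr2:.1%}")
```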

Table 2. Histological diagnoses of the masses
___________________________________________________________________________
Histological diagnosis of tumor | Number
___________________________________________________________________________
Benign tumors | 94
  Benign simple cyst | 7
  Endometrioma | 10
  Dermoid cyst | 16
  Serous cystadenoma | 16
  Mucinous cystadenoma | 18
  Myoma/fibroma | 9
  Cystadenofibroma | 11
  Paraovarian cyst | 5
  Sactosalpinx, chronic salpingitis | 1
  Leydig cell tumor | 1
Borderline tumors | 4
  Serous | 2
  Mucinous | 1
  Endometrioid | 1
Invasive malignancy | 19
  Primary ovarian adenocarcinoma | 13
  Granulosa cell tumor | 3
  Dysgerminoma | 1
  Leiomyosarcoma | 1
  Malignant aggressive B-cell lymphoma | 1
___________________________________________________________________________

Table 3. Inter-observer reproducibility of continuous variables used in the risk calculation models LR1 and LR2 and of other continuous variables used to describe adnexal masses
___________________________________________________________________________
Parameter | Measurement results, both sonologists: median (n) | Measurement results, both sonologists: range | Difference in mm between two measurements made by sonologists 1 and 2 [a]: mean (n) | 95% CI | Limits of agreement | Intra-CC: point estimate (95% CI)
___________________________________________________________________________
Variables used in models LR1 and LR2
  Maximum diameter of adnexal mass, mm | 70 (n=234) | 10 – 313 | 3.80 (n=117) | 1.12 to 6.48 | -25.24 to 32.82 | 0.958 (0.937 – 0.971)
  Maximum diameter of largest solid component, mm [b] | 29.50 (n=122) | 5 – 180 | 1.92 (n=61) | -1.74 to 5.58 | -26.66 to 30.50 | 0.942 (0.905 – 0.964)
Other variables used to describe adnexal mass
  Mean diameter of adnexal mass, mm | 58.5 (n=234) | 9 – 240 | 1.05 (n=117) | -0.15 to 1.95 | -8.61 to 10.72 | 0.971 (0.958 – 0.980)
  Mean diameter of largest solid component, mm [b] | 22 (n=122) | 4 – 156 | 0.59 (n=61) | -1.82 to 2.98 | -18.16 to 19.32 | 0.962 (0.937 – 0.977)
  Height of largest papillary projection, mm [c] | 8 (n=42) | 3 – 25 | -0.51 (n=21) | -2.93 to 1.91 | -11.61 to 10.59 | 0.609 (0.245 – 0.821)
___________________________________________________________________________
CI, confidence interval; Intra-CC, intra-class correlation coefficient.
[a] The measurements made by sonologist 2 were subtracted from the measurements made by sonologist 1.
[b] Includes only cases where solid components were registered by both observers.
[c] Includes only cases where papillary projections were registered by both observers. Using logarithmically transformed data, the results were as follows: mean difference 1.03 (95% CI, 0.78 – 1.36); limits of agreement 0.31 – 3.61; Intra-CC 0.547 (0.153 – 0.789). These results can be used for comparison with our published results on inter-observer agreement when analyzing ultrasound volumes obtained by three-dimensional ultrasound (15).

Table 4. Inter-observer agreement for categorical variables included in the risk calculation models LR1 and LR2 and for other categorical variables used to describe adnexal masses
____________________________________________________________________________________________
Variable | Agreement | Kappa value
____________________________________________________________________________________________
Variables used in models LR1 and LR2
  Presence of entirely solid tumor (yes, no) | 97% (114/117) | 0.91
  Maximum diameter of largest solid component (≤50 mm, >50 mm) | 95% (111/117) | 0.83
  Maximum diameter of largest solid component (≤50 mm, >50 mm) [a] | 92% (56/61) | 0.82
  Tenderness of adnexal mass (yes, no) | 97% (113/117) | 0.78
  Ascites (yes, no) | 97% (113/117) | 0.73
  Papillary projections (yes, no) | 87% (102/117) | 0.65
  Solid component (yes, no) | 84% (98/117) | 0.67
  Irregular cyst wall (yes, no) | 79% (92/117) | 0.56
  Acoustic shadows (yes, no) | 85% (99/117) | 0.58
  Color Doppler signals in papillary projections (yes, no) [b] | 90% (105/117) | 0.48
  Color Doppler signals in papillary projections (yes, no) [c] | 86% (18/21) | 0.71
  Color score (1, 2, 3, 4) | 40% (47/117) | 0.36 [d]
Other variables used to describe adnexal masses
  Tumor type (unilocular, unilocular solid, multilocular, multilocular solid, solid) | 97% (113/117) | 0.70
  Mean diameter of tumor (mm): ≤40, 41-60, 61-80, 81-100, >100 | 66% (77/117) | 0.76 [d]
  Number of locules: 0, 1, 2, 3, 4, 5, 6-10, 11-20, >20 | 58% (68/117) | 0.79 [d]
  Number of locules: ≤10, >10 [e] | 94% (110/117) | 0.80
  Septum (yes, no) | 88% (103/117) | 0.76
  Incomplete septum (yes, no) | 98% (115/117) | ¶
  Number of papillary projections: 0, 1, 2, 3, ≥4 | 81% (95/117) | 0.64 [d]
  Number of papillary projections: 1, 2, 3, ≥4 [c] | 67% (14/21) | 0.63 [d]
  Number of papillary projections: <4, >4 [e] | 96% (112/117) | 0.68
  Echogenicity of cyst fluid (anechoic, low level, ground glass, mixed, no fluid) | 67% (78/117) | 0.56
  Color Doppler signals detectable (yes, no) | 84% (98/117) | 0.30
  Ovarian crescent sign (yes, no) | 78% (91/117) | 0.49
  Fluid in Douglas pouch (yes, no) | 87% (102/117) | 0.72
____________________________________________________________________________________________
[a] Includes only cases where solid components were registered at both examinations.
[b] This variable reflects disagreement both with regard to the presence of papillary projections and the presence of blood flow in papillary projection (i.e., if one observer recorded the presence of papillary projections and the other did not, there was disagreement with regard to this variable; and if both observers recorded the presence of papillary projections but disagreed with regard to the presence of blood flow in the papillary projections, there was disagreement with regard to this variable).
[c] Includes only cases where papillary projections were registered at both examinations.
[d] Weighted Kappa index.
[e] Includes 0.
¶ Absent Kappa values are explained by asymmetric field tables making it impossible to calculate them.

Table 5. Inter-observer reproducibility of the risk of malignancy calculated using the risk calculation models LR1 and LR2
___________________________________________________________________________
Parameter | Calculated risk of malignancy, both sonologists: median (n) | Calculated risk of malignancy, both sonologists: range | Difference between the risk calculated by sonologist 1 and 2: mean (n) | 95% CI | Limits of agreement | Intra-CC: point estimate (95% CI)
___________________________________________________________________________
Risk of malignancy calculated using LR1 | 7.85 (n=234) | 0.10 – 99.10 | -0.53 (n=117) | -3.07 to 2.01 | -28.05 to 26.99 | 0.911 (0.874 – 0.937)
Risk of malignancy calculated using LR2 | 6.65 (n=234) | 0.10 – 98.40 | 0.02 (n=117) | -3.06 to 3.10 | -33.22 to 33.26 | 0.832 (0.766 – 0.880)
___________________________________________________________________________
CI, confidence interval; Intra-CC, intra-class correlation coefficient.


Legends for figure

Figure 1. a) Scatterplot showing the relationship between inter-observer difference (observer 1 minus observer 2) in calculated risk and magnitude of calculated risk when using logistic regression model LR1. The plot manifests a diamond shape, the differences being smallest for the lowest and highest risks. For risks <2.5% and >95%, the differences are very small. LOA, limits of agreement. b) Scatterplot showing the relationship between inter-observer difference in calculated risk and magnitude of calculated risk when using logistic regression model LR2. The plot manifests a diamond shape, the differences being smallest for the lowest and highest risks.


Povilas Sladkevicius and Lil Valentin. Inter-observer agreement in describing the ultrasound appearance of adnexal masses and in calculating the risk of malignancy using logistic regression models. Clin Cancer Res. Published OnlineFirst November 25, 2014.

Updated version: Access the most recent version of this article at doi:10.1158/1078-0432.CCR-14-0906

Supplementary Material: Access the most recent supplemental material at http://clincancerres.aacrjournals.org/content/suppl/2014/11/26/1078-0432.CCR-14-0906.DC1


Abstract

Purpose: To estimate inter-observer agreement with regard to describing adnexal masses using the International Ovarian Tumor Analysis (IOTA) terminology and the risk of malignancy calculated using IOTA logistic regression models LR1 and LR2, and to elucidate what explained the largest inter-observer differences in calculated risk of malignancy.

Experimental design: 117 women with adnexal masses were examined with transvaginal gray scale and power Doppler ultrasound by two independent experienced sonologists who described the masses using IOTA terminology. The risk of malignancy was calculated using LR1 and LR2. A predetermined risk of malignancy cutoff of 10% indicated malignancy.

Results: There were 94 benign, four borderline, and 19 invasively malignant tumors. There was substantial variability between the two sonologists in measurement results and some variability in assessment of categorical variables (agreement 40-98%, Kappa 0.30-0.91). Inter-observer agreement when classifying tumors as benign or malignant was 84% (98/117), Kappa 0.68, for LR1, and 85% (99/117), Kappa 0.68, for LR2. When using LR1 and LR2, the inter-observer difference in calculated risk was >25 percentage units in 9% (11/117) and 12% (14/117) of tumors, respectively. Differences in assessment of wall irregularity, acoustic shadowing, color score, and color flow in papillary projections explained most of these largest differences.

Conclusions: Inter-observer agreement in classifying tumors as benign or malignant using the risk of malignancy cutoff of 10% for LR1 and LR2 was good. However, because risk estimates may differ substantially between sonologists, one should be cautious with using the risk value for counseling patients about their individual risk.


Introduction

One of the first successful attempts to use ultrasound to discriminate between benign and malignant adnexal masses was made by Granberg and coworkers (1). They classified adnexal masses into five categories: unilocular, unilocular solid, multilocular, multilocular solid, and solid tumors, and found that unilocular cysts, unilocular solid cysts, and multilocular cysts were rarely malignant. Later, subjective interpretation of ultrasound images of adnexal masses (pattern recognition) proved to be an excellent method for discriminating between benign and malignant adnexal masses (2-5) and also for making a specific diagnosis (e.g., endometrioma, hydrosalpinx, etcetera) (3,6,7). As an alternative to pattern recognition, several research teams (8-10) created logistic regression models including clinical and ultrasound information to calculate the individual risk of malignancy in adnexal masses. Because of unclear definitions of many of the ultrasound variables included in these models, the International Ovarian Tumor Analysis (IOTA) group suggested standardized terms and definitions to be used when describing ultrasound images of adnexal masses (11). The IOTA group also created and validated several mathematical models in which these standardized terms and definitions were used to calculate the risk of malignancy for each individual adnexal mass (12-14). Of these models, the logistic regression models LR1 and LR2, including 12 and six variables, respectively (see Table 1), were suggested to be suitable for use in clinical practice (13,14). However, even when using standardized terms and definitions, ultrasound examiners may evaluate the features of an adnexal mass differently. There may also be variability in measurement results. This means that the risk of malignancy calculated by LR1 or LR2 may vary both within and between ultrasound examiners. We have shown that this is indeed the case when experienced ultrasound examiners analyze three-dimensional (3D) ultrasound volumes of adnexal masses (15). However, analysis of 3D ultrasound volumes does not necessarily reflect a situation where live examinations are performed.


The aims of this study were to estimate interobserver agreement when live ultrasound scans are performed with regard to 1) describing adnexal masses using the IOTA terminology and 2) the risk of malignancy calculated using the IOTA logistic regression models LR1 and LR2, and 3) to elucidate what explains large interobserver differences in calculated risk of malignancy.


Materials and methods

The Ethics Committee of Lund University approved the study protocol. Informed consent was obtained from all participants or the participant's guardian after the nature of the procedures had been fully explained.

This is a prospective observational study of real-time live ultrasound examinations of adnexal masses. Consecutive patients referred for an ultrasound examination and found to have an adnexal mass judged to need surgical removal were scanned according to the research protocol by sonologist 1 (PS) as part of the clinical ultrasound examination. A second ultrasound examination was carried out before surgery by sonologist 2 (LV). Both examiners used the standardized IOTA examination and measurement technique and the IOTA terminology (11) to describe their ultrasound findings and noted their results in a dedicated paper form. Sonologist 2 was blinded to the results of sonologist 1. Information on the clinical variables included in LR1 and LR2 (personal history of ovarian cancer, current hormonal therapy, age of the patient) was obtained at the preoperative ultrasound examination by sonologist 2. All patients were operated on within 90 days after the preoperative ultrasound examination performed by sonologist 2. The excised tissues underwent histological examination, and tumors were classified according to the criteria recommended by the International Federation of Gynecology and Obstetrics (16). Borderline tumors were classified as malignant.

The patients were examined in the lithotomy position with an empty urinary bladder (11). Abdominal ultrasound examination was added when needed. The ultrasound variables assessed with regard to interobserver reproducibility are shown in Table 1. The size of the lesion and that of its largest solid component were measured (largest diameter and mean of three orthogonal diameters) using calipers on the frozen ultrasound image. A color score was assigned on the basis of subjective assessment of the color content of the tumor scan at power Doppler ultrasound examination. A color score of 1 indicates absence of color Doppler signals, a color score of 2 a minimal amount of color Doppler signals, a color score of 3 a moderate amount of color Doppler signals, and a color score of 4 a large amount of color Doppler signals in the tumor (11).

The ultrasound systems used were GE Voluson 730 Expert or GE Voluson E8 (GE Healthcare, Zipf, Austria) with a 5–9-MHz transvaginal transducer. For power Doppler ultrasound examinations, the following settings were used: for the Voluson 730 Expert system, frequency 6-9 ("normal") MHz, pulse repetition frequency 0.6 kHz, gain 0.8, wall motion filter "low 1" (40 Hz); and for the Voluson E8, frequency 6-9 ("normal") MHz, pulse repetition frequency 0.6 kHz, gain -4.0, wall motion filter "low 1" (40 Hz).

Statistical analysis

The IOTA3 study screen (astraia GmbH, Munich, Germany) was used to calculate the risk of malignancy according to LR1. Weighted Kappa indices were calculated using the statistical program Stata, Version 10.1 for Windows (StataCorp LP, College Station, TX, USA). For all other statistical calculations, including calculation of the risk of malignancy when using LR2, we used the Statistical Package for the Social Sciences (SPSS program, IBM Corp., New York, NY, USA; PASW version 18.0).

Inter-observer agreement in the assessment of categorical variables was estimated by calculating the percentage agreement. Cohen's kappa was used to estimate by how much the observed agreement exceeded that expected by chance (17). Weighted kappa values are presented where appropriate (18). It has been suggested that kappa values >0.81 indicate very good agreement beyond chance, kappa values between 0.61 and 0.80 good agreement beyond chance, kappa values between 0.41 and 0.60 moderate agreement beyond chance, kappa values between 0.21 and 0.40 fair agreement beyond chance, and kappa values <0.20 poor agreement beyond chance (19).
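The percentage agreement and Cohen's kappa described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Stata or SPSS routines used in the study: the function names and the two short rating lists are hypothetical, only the unweighted kappa is shown, and the cut points in `interpret_kappa` simply encode the bands quoted from reference 19 (the boundaries between bands are an assumption, since the quoted bands leave small gaps).

```python
from collections import Counter


def percent_agreement(ratings_1, ratings_2):
    """Proportion of cases in which both observers chose the same category."""
    if len(ratings_1) != len(ratings_2) or not ratings_1:
        raise ValueError("need two non-empty rating lists of equal length")
    matches = sum(a == b for a, b in zip(ratings_1, ratings_2))
    return matches / len(ratings_1)


def cohens_kappa(ratings_1, ratings_2):
    """Unweighted Cohen's kappa: (p_observed - p_expected) / (1 - p_expected)."""
    n = len(ratings_1)
    p_observed = percent_agreement(ratings_1, ratings_2)
    counts_1 = Counter(ratings_1)
    counts_2 = Counter(ratings_2)
    # Chance agreement: product of the observers' marginal proportions,
    # summed over the categories (Counter returns 0 for unseen categories).
    p_expected = sum(counts_1[c] * counts_2[c] for c in counts_1) / (n * n)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_expected) / (1.0 - p_expected)


def interpret_kappa(kappa):
    """Verbal bands as quoted in the text; exact boundaries are an assumption."""
    if kappa >= 0.81:
        return "very good"
    if kappa >= 0.61:
        return "good"
    if kappa >= 0.41:
        return "moderate"
    if kappa >= 0.21:
        return "fair"
    return "poor"


# Hypothetical ratings of ten masses by two observers (not study data).
EXAMPLE_OBSERVER_1 = ["benign"] * 6 + ["malignant"] * 4
EXAMPLE_OBSERVER_2 = ["benign"] * 5 + ["malignant"] * 4 + ["benign"]

if __name__ == "__main__":
    kappa = cohens_kappa(EXAMPLE_OBSERVER_1, EXAMPLE_OBSERVER_2)
    agreement = percent_agreement(EXAMPLE_OBSERVER_1, EXAMPLE_OBSERVER_2)
    print(f"agreement {agreement:.0%}, kappa {kappa:.2f} ({interpret_kappa(kappa)})")
```

The example shows why both figures are reported: a high percentage agreement can coexist with a more modest kappa once agreement expected by chance is subtracted.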


Inter-observer reproducibility of measurement results, including the calculated risks of malignancy using LR1 and LR2, was described as the difference between two measurement results. The differences between the measured values were plotted against the mean of the two measurements (Bland-Altman plots) to assess the relationship between the differences and the magnitude of the measurements (20). Systematic bias between two measurements was estimated by calculating the 95% confidence interval (CI) of the mean difference (mean difference ±2 SE). If zero lay within this interval, no bias was assumed to exist between the two measurements. Inter-observer agreement was expressed as the mean difference and limits of agreement (20). Ninety-five percent of differences between any future measurements are estimated to fall between the lower and upper limit of agreement. Inter-observer reliability of measurement results was estimated by calculating the intra-class correlation coefficient (ICC) using analysis of variance (two-way random model, absolute agreement; this allows generalization of the results to a population of observers). The ICC indicates the proportion of the total variance in measurement results that can be explained by differences between the individuals examined. It depends both on the magnitude of measurement errors and the true heterogeneity in the population in which measurements are made. The more variable the population investigated, the greater the ICC, and the less variable the population, the smaller the ICC (21). It has been suggested that ICC values >0.90 are needed for a test to be used in clinical practice (22).
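As a rough illustration of the ICC described above, the following sketch computes a two-way random, absolute-agreement ICC from the usual analysis-of-variance mean squares. It is not the SPSS procedure used in the study. The single-measures form is an assumption, because the text does not say whether single or average measures were reported, and the function name and toy data are hypothetical.

```python
def icc_two_way_random_absolute(scores):
    """ICC, two-way random model, absolute agreement, single measures.

    `scores` holds one row per subject and one column per observer.
    """
    n = len(scores)
    k = len(scores[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need at least 2 subjects and 2 observers, no gaps")

    grand_mean = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Partition the total sum of squares into subjects, observers, residual.
    ss_total = sum((x - grand_mean) ** 2 for row in scores for x in row)
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)                # between-subject mean square
    ms_cols = ss_cols / (k - 1)                # between-observer mean square
    ms_error = ss_error / ((n - 1) * (k - 1))  # residual mean square

    denominator = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    if denominator == 0:
        raise ValueError("ICC is undefined when all scores are identical")
    return (ms_rows - ms_error) / denominator


# Hypothetical data: observer 2 always reads one unit higher than observer 1.
EXAMPLE_SCORES = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]

if __name__ == "__main__":
    print(round(icc_two_way_random_absolute(EXAMPLE_SCORES), 3))
```

Because absolute agreement is required, a constant offset between observers lowers the coefficient even though the two sets of readings are perfectly correlated.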

The sensitivity and specificity of LR1 and LR2 with regard to malignancy were calculated separately using the information recorded by sonologist 1 and by sonologist 2.


Results

In all, 117 consecutive women with adnexal masses who underwent surgery were examined with ultrasound by the two sonologists as described above. Thirty-four women had bilateral adnexal masses. The most complex mass, or the largest one if both masses had similar ultrasound morphology, was used in our statistical analysis, the mass to be included being selected retrospectively to ensure that both sonologists contributed the same mass (right or left) to the analysis. Thus, 117 adnexal masses from 117 patients constitute our study population. The women's age ranged between 14 and 88 years (median 53), and 63 (54%) women were postmenopausal. There were 94 benign, four borderline, and 19 invasively malignant adnexal masses (Table 2).

The time elapsed between the ultrasound examination of sonologist 1 and 2 was median 61 days (10th and 90th percentiles 13 and 132, range 1-204) for the tumors with benign histology and median 14 days (10th and 90th percentiles 2 and 31, range 1-41) for the tumors with malignant histology. There was no relationship between the number of days between the scans and the differences in measurement results or inter-observer agreement for discrete variables (Supplementary Fig. S1-S5 and Supplementary Table S1).

Inter-observer reproducibility of measurement results is shown in Table 3. Bland-Altman plots showed no clear trend for inter-observer differences in measurement results to change with the magnitude of the measurement values. Limits of agreement were wide for all measurements. There was one systematic difference between the two sonologists: sonologist 1 (who always performed the first examination) obtained higher measurement values for the maximum diameter of the mass. The least reliable measurement was the height of the largest papillary projection.
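The quantities reported for measurement results (mean inter-observer difference, its 95% CI as mean ± 2 SE, and limits of agreement) can be sketched as follows. This is an illustrative sketch rather than the SPSS procedure used in the study: the function name and the sample diameters are hypothetical, and the 1.96 SD multiplier for the limits of agreement is an assumption based on the usual Bland-Altman convention, since the text does not state it.

```python
import math
import statistics


def bland_altman_summary(observer_1, observer_2):
    """Summarise paired measurements in the Bland-Altman manner."""
    if len(observer_1) != len(observer_2) or len(observer_1) < 2:
        raise ValueError("need two equally long lists with at least 2 pairs")

    # Difference is observer 1 minus observer 2, as in the figure legend.
    differences = [a - b for a, b in zip(observer_1, observer_2)]
    pair_means = [(a + b) / 2 for a, b in zip(observer_1, observer_2)]

    mean_diff = statistics.mean(differences)
    sd_diff = statistics.stdev(differences)  # sample standard deviation
    se_diff = sd_diff / math.sqrt(len(differences))

    # 95% CI of the mean difference as described in the text: mean +/- 2 SE.
    ci_95 = (mean_diff - 2 * se_diff, mean_diff + 2 * se_diff)
    # Limits of agreement: mean +/- 1.96 SD (assumed multiplier).
    limits = (mean_diff - 1.96 * sd_diff, mean_diff + 1.96 * sd_diff)

    return {
        "pair_means": pair_means,    # x-axis of the Bland-Altman plot
        "differences": differences,  # y-axis of the Bland-Altman plot
        "mean_difference": mean_diff,
        "sd_difference": sd_diff,
        "ci_95": ci_95,
        "limits_of_agreement": limits,
        # Bias is assumed only if zero lies outside the 95% CI.
        "systematic_bias": not (ci_95[0] <= 0 <= ci_95[1]),
    }


# Hypothetical maximum diameters in mm from two observers (not study data).
EXAMPLE_DIAMETERS_1 = [52.0, 81.0, 37.0, 110.0, 64.0]
EXAMPLE_DIAMETERS_2 = [50.0, 78.0, 38.0, 104.0, 63.0]

if __name__ == "__main__":
    summary = bland_altman_summary(EXAMPLE_DIAMETERS_1, EXAMPLE_DIAMETERS_2)
    print(summary["mean_difference"], summary["limits_of_agreement"])
```

Plotting `differences` against `pair_means` gives the scatter used to judge whether the differences change with the magnitude of the measurement.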

Inter-observer agreement when assessing categorical ultrasound variables is shown in Table 4. For most categorical ultrasound variables, inter-observer agreement beyond chance was good or very good (19). Inter-observer agreement beyond chance for variables included in LR1 or LR2 was poorest for color score (agreement 40%, weighted Kappa 0.36), presence of blood flow in papillary projection (agreement 90%, Kappa 0.48), irregular cyst wall (agreement 79%, Kappa 0.56), and acoustic shadowing (agreement 85%, Kappa 0.58).

Bland-Altman plots illustrating the relationship between the magnitudes of the estimated risk of malignancy calculated using LR1 and LR2 and the interobserver difference in calculated risk are shown in Figure 1. The plots manifest a diamond shape, i.e., the interobserver differences are smallest for the lowest and highest risks, and they are very small for risks <2.5% and >95%. Logarithmic transformation of the data (20) did not substantially change the shape of the scatter plot. Therefore, we present our results as absolute inter-observer differences in calculated risk (in percentage units), see Table 5. There were no systematic differences in calculated risks between the two sonologists, and reliability reflected by the ICC values was good (22), with ICC values of 0.911 for LR1 and 0.832 for LR2. When classifying tumors as having a risk of malignancy <10% (benign) or >10% (malignant) using LR1 or LR2, the inter-observer agreement was good for both models: inter-observer agreement 84% (98/117), Kappa 0.68, for model LR1, and inter-observer agreement 85% (99/117), Kappa 0.68, for model LR2. In the 19 cases where the two sonologists obtained different results with regard to malignancy when using LR1, the absolute interobserver differences in calculated risk ranged from 0.7 to 59.6 percentage units; in six of the 19 cases the absolute interobserver difference in calculated risk was <10.0 percentage units, in nine cases it was 10.0–24.9 percentage units, and in four cases it was >25.0 percentage units. In the 18 cases where the two sonologists obtained different results with regard to malignancy when using LR2, the absolute interobserver difference in calculated risk ranged from 8.8 to 67.9 percentage units; in two of the 18 cases the absolute interobserver difference in calculated risk was <10.0 percentage units, in ten cases it was 10.0–24.9 percentage units, and in six cases it was >25 percentage units.

The Bland-Altman plots (Figure 1) illustrate that for some tumors there were substantial interobserver differences in the calculated risk of malignancy: when using LR1, the interobserver difference in calculated risk was >25 percentage units in 11 tumors (9% of all tumors), and when using LR2, the interobserver difference in calculated risk was >25 percentage units in 14 tumors (12% of all tumors). To elucidate which interobserver differences explained these largest interobserver differences in calculated risk, we scrutinized each case where the difference was >25 percentage units. The results are shown in Supplementary Tables S2 and S3. When using LR1, a discrepancy for one single categorical variable explained the difference in four of the 11 cases, while a discrepancy for two categorical variables explained the difference in one case (differences in measurements being <5 mm in these five cases). In six cases there were differences in one or two categorical variables but also substantial differences (6-61 mm) in at least one measurement result. In no case was the large difference in calculated risk explained exclusively by differences in measurement results. The categorical variables judged differently by the two sonologists in these 11 cases were color score (n = 5), irregular cyst wall (n = 5), flow in papillary projection (n = 3), and acoustic shadowing (n = 2).

When using LR2, a discrepancy for one single categorical variable explained the large difference in calculated risk (>25 percentage units) in eight of the 14 cases (differences in measurements being <5 mm in these eight cases), and in four of the eight cases the sonologists judged acoustic shadowing differently. In five cases there were differences in one categorical variable but also a substantial difference (9-61 mm) in the measurement of the largest solid component. In yet another case there were differences in two categorical variables as well as in the measurement of the largest solid component. The categorical variables judged differently by the two sonologists in these 14 cases were acoustic shadowing (n = 5), irregular cyst wall (n = 5), ascites (n = 3), and flow in papillary projection (n = 2).

The sensitivity with regard to malignancy when using LR1 (10% risk cutoff) was 100% (23/23; 95% CI, 82-100) for both sonologists; the specificity was 74% (70/94; 95% CI, 64-82) for sonologist 1 and 63% (59/94; 95% CI, 53-72) for sonologist 2. The sensitivity when using LR2 was 100% (23/23; 95% CI, 82-100) for sonologist 1 and 91% (21/23; 95% CI, 72-98) for sonologist 2, and the specificity was 75.5% (71/94; 95% CI, 65-84) for both sonologists.
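The sensitivity and specificity figures above follow directly from the reported counts. The sketch below recomputes them for sonologist 1 with LR1 (23/23 malignant masses and 70/94 benign masses correctly classified). The confidence-interval method is not stated in the text, so the Wilson score interval used here is an assumption and its bounds may differ by a percentage point or so from the reported ones; the function names are hypothetical.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a proportion (assumed method, see lead-in)."""
    if total <= 0 or not 0 <= successes <= total:
        raise ValueError("need 0 <= successes <= total and total > 0")
    p = successes / total
    denominator = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denominator
    half_width = (
        z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total))
        / denominator
    )
    return max(0.0, center - half_width), min(1.0, center + half_width)


def sensitivity_specificity(true_pos, false_neg, true_neg, false_pos):
    """Return (sensitivity, specificity), each with its Wilson interval."""
    malignant = true_pos + false_neg
    benign = true_neg + false_pos
    sensitivity = true_pos / malignant
    specificity = true_neg / benign
    return (
        (sensitivity, wilson_interval(true_pos, malignant)),
        (specificity, wilson_interval(true_neg, benign)),
    )


if __name__ == "__main__":
    # Counts taken from the text: sonologist 1, model LR1, 10% risk cutoff.
    (sens, sens_ci), (spec, spec_ci) = sensitivity_specificity(
        true_pos=23, false_neg=0, true_neg=70, false_pos=24
    )
    print(f"sensitivity {sens:.0%}, CI {sens_ci[0]:.0%} to {sens_ci[1]:.0%}")
    print(f"specificity {spec:.0%}, CI {spec_ci[0]:.0%} to {spec_ci[1]:.0%}")
```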


Discussion

We have shown substantial inter-observer variability in the results of measurements taken in adnexal masses (wide limits of agreement). Inter-observer agreement beyond chance was very good or good for most categorical variables, but it was only moderate or fair for some. Inter-observer agreement above chance was poorest for variables heavily dependent on subjective evaluation and/or machine settings, i.e., color score, presence of color Doppler signals in papillary projections, irregular cyst walls, acoustic shadowing (all four variables being included in LR1 or LR2), echogenicity of cyst fluid, and ovarian crescent sign. Despite this, there was good inter-observer agreement when classifying tumors as benign or malignant using the predetermined risk of malignancy cutoff of 10%. However, in some cases there were substantial differences in the calculated risk of malignancy between the two sonologists, the difference being >25.0 percentage units in 9% of all tumors when using LR1 and in 12% of all tumors when using LR2.

The strength of our study is that it provides new information. To the best of our knowledge, there is only one publication reporting on interobserver agreement with regard to describing ultrasound findings in adnexal masses using the IOTA terminology (11) when performing live ultrasound examinations (23). However, that study (23) evaluated interobserver agreement with regard to the ten ultrasound features in the IOTA simple rules (24,25), not the variables included in the IOTA logistic regression models LR1 and LR2, and agreement was estimated between examiners with different levels of experience. The variable with poorest agreement beyond chance in the study cited was acoustic shadowing (Kappa 0.36). We have found no published study that has estimated inter-observer reproducibility of the calculated risk of malignancy using LR1 or LR2 after live scanning.

It is a limitation of our study that up to 204 days elapsed between the scans of the two sonologists (up to 41 days for malignant masses). Because days elapsed between the scans, theoretically the inter-observer differences could be explained by the lesions having changed in size or morphology between the scans. We find this highly unlikely for the following reasons. First, there was no relationship between the differences in measurement results and the number of days between the scans (Supplementary Fig. S1-S5). Nor was there a clear tendency for inter-observer agreement for discrete variables to depend on the time between the scans (Supplementary Table S1). Second, one would expect a lesion and its components to increase in size with time, but sonologist 1, performing the first scan, obtained higher measurement values than sonologist 2. Third, it is our experience, after having performed gynecological scans for more than 20 years, that the ultrasound morphology of both benign and malignant adnexal masses remains constant over time, that benign adnexal lesions grow slowly, and that malignant masses do not change appreciably in size even during 1 month of observation. Therefore, we believe that the discrepancies between the two sonologists reflect true inter-observer differences and not a change of the masses over time. A second limitation is that we did not include estimation of the reproducibility of retrieving anamnestic information (current hormonal therapy, personal history of ovarian cancer), the anamnestic information collected by the second sonologist being used in all cases. It cannot be entirely excluded that patients would answer differently when asked by different sonologists, or that sonologists could interpret the answers of the patients differently. A third limitation is that we did not estimate intra-observer reproducibility. We considered four scans (two per sonologist) likely to be unacceptable to patients. For the same reason, only two sonologists were involved in this study, and our results are generalizable only to sonologists with a similar level of experience.

The results of this live scanning study are similar to those of another study in which the same sonologists assessed the same variables using 3D ultrasound volumes from adnexal masses in another tumor population (15). The similarity in results between the two studies is surprising, because the conditions when assessing 3D ultrasound volumes are different from those during a live scan. When evaluating ultrasound volumes, sonologists are exposed to the same ultrasound images, and so any interobserver difference should be explained exclusively by differences in interpreting the ultrasound information. During a live scan there are more sources of bias. This could result either in poorer or better interobserver agreement than when 3D ultrasound volumes are assessed: poorer because ultrasound examiners are likely to use different machine settings and scanning conditions may change from one minute to another, better because the dynamic nature of live scanning facilitates discrimination between solid components and amorphous tissue.

Our results showed that two experienced sonologists agreed quite well in their classification of masses as benign or malignant using the 10% risk of malignancy cutoff of LR1 and LR2, and that the diagnostic performance of LR1 and LR2 with regard to discrimination between benign and malignant tumors was similar for the two sonologists and similar to that reported by others (14, 26-28). This is reassuring, because the main purpose of using models LR1 and LR2 is to classify tumors as benign or malignant. Potentially, however, LR1 and LR2 can be used not only to classify adnexal masses as benign or malignant but also to counsel a patient about her individual risk of malignancy (13). If the calculated risk is to be used for individual counseling, one must be reasonably certain not only that the estimated risk agrees well with the true risk (when externally validated, both LR1 and LR2 underestimated the true risk, especially in the risk interval 30-70% (14)), but also that the risk estimates are reproducible, i.e., that different examiners will obtain similar risk estimates. Our results show that risk estimates may differ substantially between experienced observers, the difference in estimated risk being >25.0 percentage units in 9% and 12% of cases when using LR1 and LR2, respectively. Interobserver agreement above chance was poorest for those variables in the models that are heavily dependent on subjective evaluation, i.e., color score, presence of color Doppler signals in papillary projections, irregular cyst walls, and acoustic shadowing. Indeed, differences in these explained most of the largest inter-observer differences in calculated risk of malignancy. In models based on few variables, changing values in only one variable may result in large differences in predicted risks, while a model with many variables is less vulnerable to a change in one or even a few variables. Our results illustrate this (Supplementary Tables S2 and S3). When using LR2 (which includes six variables), a change in value for one single categorical variable explained an inter-observer difference in calculated risk >25 percentage units in eight of 14 cases, while when using LR1 (which includes 12 variables), a change in value for one single categorical variable explained an inter-observer difference in calculated risk >25 percentage units in only four of 11 cases. Acoustic shadowing is a strong variable in both LR1 and LR2 and has great impact on the calculated risk in LR2, with only six variables. In our hands, as well as in those of Ruiz de Gauna et al. (23), inter-observer agreement for acoustic shadowing was at most moderate. The interobserver agreement for color score was only fair in our study, and color score is an important variable in LR1.

To improve inter-observer reproducibility of calculated risks based on LR1 and LR2, inter-observer differences in descriptions and measurements of adnexal masses using the IOTA terminology and measurement technique need to be reduced. One way to achieve this could be to provide courses on, and training in, how to examine and describe adnexal masses using the IOTA terms. Interactive courses in which a large number of ultrasound images are discussed with the course participants are likely to be very valuable in this respect. More precise definitions of the IOTA terms, for example by providing ample imaging material, would probably also help improve inter-observer agreement. Special attention should be given to the variables with the poorest reproducibility, i.e., the color score, wall irregularity, acoustic shadowing, and detection of blood flow in papillary projections. Until better inter-observer agreement in the calculated risk of malignancy using LR1 and LR2 has been shown, one should be cautious about using the risk estimates for individual patient counselling.


Acknowledgements

None


References

1. Granberg S, Norström A, Wikland M. Tumors in the lower pelvis as imaged by vaginal sonography. Gynecol Oncol 1990;37:224-9.

2. Benacerraf BR, Finkler NJ, Wojciechowski C, Knapp RC. Sonographic accuracy in the diagnosis of ovarian masses. J Reprod Med 1990;35:491-5.

3. Valentin L. Pattern recognition of pelvic masses by gray-scale ultrasound imaging: the contribution of Doppler ultrasound. Ultrasound Obstet Gynecol 1999;14:338-47.

4. Valentin L. Prospective cross-validation of Doppler ultrasound examination and gray-scale ultrasound imaging for discrimination of benign and malignant pelvic masses. Ultrasound Obstet Gynecol 1999;14:273-83.

5. Timmerman D, Schwärzler P, Collins WP, Claerhout F, Coenen M, Amant F, et al. Subjective assessment of adnexal masses with the use of ultrasonography: an analysis of interobserver variability and experience. Ultrasound Obstet Gynecol 1999;13:11-6.

6. Sokalska A, Timmerman D, Testa AC, Van Holsbeke C, Lissoni AA, Leone FPG, et al. Diagnostic accuracy of transvaginal ultrasound examination for assigning a specific diagnosis to adnexal masses. Ultrasound Obstet Gynecol 2009;34:462-70.

7. Valentin L. Use of morphology to characterize and manage common adnexal masses. Best Pract Res Clin Obstet Gynaecol 2004;18:71-89.

8. Tailor A, Jurkovic D, Bourne TH, Collins WP, Campbell S. Sonographic prediction of malignancy in adnexal masses using multivariate logistic regression analysis. Ultrasound Obstet Gynecol 1997;10:41-7.

9. Timmerman D, Bourne TH, Tailor A, Collins WP, Verrelst H, Vandenberghe K, et al. A comparison of methods for the preoperative discrimination between benign and malignant adnexal masses: the development of a new logistic regression model. Am J Obstet Gynecol 1999;181:57-65.


10. Alcazar JL, Jurado M. Prospective evaluation of logistic model based on sonographic morphologic and color Doppler findings developed to predict adnexal malignancy. J Ultrasound Med 1999;18:837-42.

11. Timmerman D, Valentin L, Bourne TH, Collins WP, Verrelst H, Vergote I. Terms, definitions and measurements to describe the sonographic features of adnexal tumors: a consensus opinion from the International Ovarian Tumor Analysis (IOTA) Group. Ultrasound Obstet Gynecol 2000;16:500-5.

12. Timmerman D, Testa AC, Bourne T, Ferrazzi E, Ameye L, Konstantinovic ML, et al. Logistic regression model to distinguish between the benign and malignant adnexal mass before surgery: a multicenter study by the International Ovarian Tumor Analysis Group. J Clin Oncol 2005;34:8794-801.

13. Kaijser J, Bourne T, Valentin L, Sayasneh A, Van Holsbeke C, Vergote I, et al. Improving strategies for diagnosing ovarian cancer: a summary of the International Ovarian Tumor Analysis (IOTA) studies. Ultrasound Obstet Gynecol 2013;41:9-20.

14. Timmerman D, Van Calster B, Testa AC, Guerriero S, Fischerova D, Lissoni AA, et al. Ovarian cancer prediction in adnexal masses using ultrasound-based logistic regression models: a temporal and external validation study by the IOTA group. Ultrasound Obstet Gynecol 2010;36:226-34.

15. Sladkevicius P, Valentin L. Intra- and inter-observer agreement when describing adnexal masses using the International Ovarian Tumour Analysis (IOTA) terms and definitions: a study on three-dimensional (3D) ultrasound volumes. Ultrasound Obstet Gynecol 2013;41:318-27.

16. Heintz APM, Odicino F, Maisonneuve P, Beller U, Benedet JL, Creasman WT, et al. Carcinoma of the Ovary. 25th Annual Report on the Results of Treatment in Gynecological Cancer. Int J Gynecol Obstet 2003;83:S135-S166 (suppl 1).


17. Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas 1960;20:37-46.

18. Kundel HL, Polansky M. Measurement of observer agreement. Radiology 2003;228:303-8.

19. Brennan P, Silman A. Statistical methods for assessing observer variability in clinical measures. BMJ 1992;304:1491-4.

20. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986;1:307-10.

21. Bartlett JW, Frost C. Reliability, repeatability and reproducibility: analysis of measurement errors in continuous variables. Ultrasound Obstet Gynecol 2008;31:466-75.

22. Kottner J, Audigé L, Brorson S, Donner A, Gajewski BJ, Hróbjartsson A, et al. Guidelines for Reporting Reliability and Agreement Studies (GRRAS) were proposed. J Clin Epidemiol 2011;64:96-106.

23. Ruiz de Gauna B, Sanchez P, Pineda L, Utrilla-Layna J, Juez L, Alcázar JL. Inter-observer agreement with regard to describing adnexal masses using the IOTA simple rules in a real-time setting and when using three-dimensional ultrasound volumes and digital clips. Ultrasound Obstet Gynecol 2014;44:95-100.

24. Timmerman D, Testa AC, Bourne T, Ameye L, Jurkovic D, Van Holsbeke C, et al. Simple ultrasound-based rules for the diagnosis of ovarian cancer. Ultrasound Obstet Gynecol 2008;31:681-90.

25. Timmerman D, Ameye L, Fischerova D, Epstein E, Melis GB, Guerriero S, et al. Simple ultrasound rules to distinguish between benign and malignant adnexal masses before surgery: prospective validation by IOTA group. BMJ 2010;341:c6839.


26. Van Holsbeke C, Van Calster B, Testa AC, Domali E, Lu C, Van Huffel S, et al. Prospective internal validation of mathematical models to predict malignancy in adnexal masses: results from the international ovarian tumor analysis study. Clin Cancer Res 2009;15:684-91.

27. Nunes N, Yazbek J, Ambler G, Hoo W, Naftalin J, Jurkovic D. Prospective evaluation of the IOTA logistic regression model LR2 for the diagnosis of ovarian cancer. Ultrasound Obstet Gynecol 2012;40:355-9.

28. Nunes N, Ambler G, Hoo WL, Naftalin J, Foo X, Widschwendter M, et al. A prospective validation of the IOTA logistic regression models (LR1 and LR2) in comparison to subjective pattern recognition for the diagnosis of ovarian cancer. Int J Gynecol Cancer 2013;23:1583-9.


Table 1. Variables for which inter-observer reproducibility was estimated
_______________________________________________________________________
Variables included in logistic regression models LR1 and LR2 or of importance for these models
  Maximum diameter of adnexal mass, mm
  Maximum diameter of largest solid component, mm
  Maximum diameter of largest solid component: ≤50, >50 mm
  Presence of an entirely solid adnexal mass: yes, no
  Papillary projections present: yes, no
  Irregular internal cyst walls: yes, no
  Acoustic shadows: yes, no
  Color Doppler signals in papillary projection: yes, no
  Color score: 1, 2, 3, 4
  Tenderness of adnexal mass at scan: yes, no
  Ascites: yes, no
Other IOTA variables
  Continuous IOTA variables
    Mean diameter of adnexal mass, mm
    Mean diameter of largest solid component, mm
    Height of the largest papillary projection, mm
  Categorical IOTA variables
    Type of tumor: unilocular, unilocular solid, multilocular, multilocular solid, solid
    Mean diameter of adnexal mass: ≤40, 41-60, 61-80, 81-100, and >100 mm
    Number of cyst locules: 0, 1, 2, 3, 4, 5, 6-10, 11-20, and >20; ≤10, >10†
    Septum present: yes, no
    Incomplete septum present: yes, no
    Solid component: yes, no
    Papillary projections: 0, 1, 2, 3, ≥4; 1, 2, 3, ≥4‡; <4, >4†
    Echogenicity of cyst fluid: anechoic, low level, ground glass, mixed, no cyst fluid
    Color Doppler signals detectable: yes, no
    Ovarian crescent sign: yes, no
Diagnosis based on LR1 and LR2
  Calculated risk of malignancy using LR1: ≤10%, >10%
  Calculated risk of malignancy using LR2: ≤10%, >10%
___________________________________________________________________________
IOTA, international ovarian tumor analysis group; LR1, logistic regression model 1; LR2, logistic regression model 2.
The risk of malignancy using LR1 is derived as y = 1/(1 + e^{-z}), where z = -6.7468 + 1.5985 (a) - 0.9983 (b) + 0.0326 (c) + 0.00841 (d) - 0.8577 (e) + 1.5513 (f) + 1.1737 (g) + 0.9281 (h) + 0.0496 (i) + 1.1421 (j) - 2.3550 (k) + 0.4916 (l), and e is the mathematical constant and base value of natural logarithms. The 12 variables included in LR1 are (a) personal history of ovarian cancer (yes = 1, no = 0), (b) current hormonal therapy (yes = 1, no = 0), (c) age of the patient (in years), (d) maximum diameter of the lesion (in millimeters), (e) mass tender at scan (yes = 1, no = 0), (f) the presence of ascites (yes = 1, no = 0), (g) the presence of blood flow within a solid papillary projection (yes = 1, no = 0), (h) the presence of a purely solid tumor (yes = 1, no = 0), (i) maximal diameter of the solid component (expressed in millimeters, but with no increase >50 mm), (j) irregular internal cyst walls (yes = 1, no = 0), (k) the presence of acoustic shadows (yes = 1, no = 0), and (l) color score (1, 2, 3, or 4).
The risk of malignancy using LR2 is derived as y = 1/(1 + e^{-z}), where z = -5.3718 + 0.0354 (c) + 1.6159 (f) + 1.1768 (g) + 0.0697 (i) + 0.9586 (j) - 2.9486 (k) (12).
A calculated risk of malignancy of more than 10% classified the adnexal mass as malignant.
† includes 0
‡ includes only cases where papillary projections were registered at both examinations
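The two formulas in the footnote to Table 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the coefficients are those printed in the footnote, but the dictionary keys, the function names and the way a case is represented are hypothetical and are not taken from the IOTA software used in the study.

```python
import math

# Coefficients as printed in the footnote to Table 1; key names are illustrative.
LR1 = (-6.7468, {
    "history_ovarian_cancer": 1.5985,  # (a) yes = 1, no = 0
    "hormonal_therapy": -0.9983,       # (b) yes = 1, no = 0
    "age_years": 0.0326,               # (c)
    "max_diameter_mm": 0.00841,        # (d)
    "tender_at_scan": -0.8577,         # (e) yes = 1, no = 0
    "ascites": 1.5513,                 # (f) yes = 1, no = 0
    "papillary_flow": 1.1737,          # (g) yes = 1, no = 0
    "purely_solid": 0.9281,            # (h) yes = 1, no = 0
    "max_solid_mm": 0.0496,            # (i) capped at 50 mm
    "irregular_walls": 1.1421,         # (j) yes = 1, no = 0
    "acoustic_shadows": -2.3550,       # (k) yes = 1, no = 0
    "color_score": 0.4916,             # (l) 1, 2, 3 or 4
})
LR2 = (-5.3718, {
    "age_years": 0.0354,
    "ascites": 1.6159,
    "papillary_flow": 1.1768,
    "max_solid_mm": 0.0697,
    "irregular_walls": 0.9586,
    "acoustic_shadows": -2.9486,
})


def predicted_risk(model, case):
    """Return y = 1 / (1 + e^(-z)) for one case, as a proportion between 0 and 1."""
    intercept, coefficients = model
    values = dict(case)
    # Variable (i): the solid component contributes "with no increase >50 mm".
    values["max_solid_mm"] = min(values["max_solid_mm"], 50)
    z = intercept + sum(weight * values[name] for name, weight in coefficients.items())
    return 1.0 / (1.0 + math.exp(-z))


def classified_malignant(risk):
    """Apply the 10% cutoff described in the footnote."""
    return risk > 0.10
```

Calling `predicted_risk(LR2, case)` twice, once with `acoustic_shadows` set to 0 and once set to 1, shows how strongly a single disputed categorical variable can move the estimate in the six-variable model, which is the kind of inter-observer effect the study examines.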


Table 2. Histological diagnoses of the masses
___________________________________________________________________________
Histological diagnosis of tumor              Number
___________________________________________________________________________
Benign tumors                                94
  Benign simple cyst                         7
  Endometrioma                               10
  Dermoid cyst                               16
  Serous cystadenoma                         16
  Mucinous cystadenoma                       18
  Myoma/fibroma                              9
  Cystadenofibroma                           11
  Paraovarian cyst                           5
  Sactosalpinx, chronic salpingitis          1
  Leydig cell tumor                          1
Borderline tumors                            4
  Serous                                     2
  Mucinous                                   1
  Endometrioid                               1
Invasive malignancy                          19
  Primary ovarian adenocarcinoma             13
  Granulosa cell tumor                       3
  Dysgerminoma                               1
  Leiomyosarcoma                             1
  Malignant aggressive B-cell lymphoma       1
___________________________________________________________________________


Table 3. Inter-observer reproducibility of continuous variables used in the risk calculation models LR1 and LR2 and of other continuous variables used to describe adnexal masses

For each parameter the entries are: measurement results of both sonologists (median, range); difference in mm between the two measurements made by sonologists 1 and 2 (a) (mean, 95% CI, limits of agreement); Intra-CC point estimate (95% CI).

Variables used in models LR1 and LR2

Maximum diameter of adnexal mass, mm: median 70 (n=234); range 10 to 313; mean difference 3.80 (n=117); 95% CI 1.12 to 6.48; limits of agreement -25.24 to 32.82; Intra-CC 0.958 (0.937 to 0.971)

Maximum diameter of largest solid component, mm (b): median 29.50 (n=122); range 5 to 180; mean difference 1.92 (n=61); 95% CI -1.74 to 5.58; limits of agreement -26.66 to 30.50; Intra-CC 0.942 (0.905 to 0.964)

Other variables used to describe adnexal mass

Mean diameter of adnexal mass, mm: median 58.5 (n=234); range 9 to 240; mean difference 1.05 (n=117); 95% CI -0.15 to 1.95; limits of agreement -8.61 to 10.72; Intra-CC 0.971 (0.958 to 0.980)

Mean diameter of largest solid component, mm (b): median 22 (n=122); range 4 to 156; mean difference 0.59 (n=61); 95% CI -1.82 to 2.98; limits of agreement -18.16 to 19.32; Intra-CC 0.962 (0.937 to 0.977)

Height of largest papillary projection, mm (c): median 8 (n=42); range 3 to 25; mean difference -0.51 (n=21); 95% CI -2.93 to 1.91; limits of agreement -11.61 to 10.59; Intra-CC 0.609 (0.245 to 0.821)

CI, confidence interval; Intra-CC, intra-class correlation coefficient.
(a) The measurements made by sonologist 2 were subtracted from the measurements made by sonologist 1.
(b) Includes only cases where solid components were registered by both observers.
(c) Includes only cases where papillary projections were registered by both observers. Using logarithmically transformed data, the results were as follows: mean difference 1.03 (95% CI 0.78 to 1.36), limits of agreement 0.31 to 3.61, Intra-CC 0.547 (0.153 to 0.789). These results can be used for comparison with our published results on inter-observer agreement when analyzing ultrasound volumes obtained by three-dimensional ultrasound (15).


Table 4. Inter-observer agreement for categorical variables included in the risk calculation models LR1 and LR2 and for other categorical variables used to describe adnexal masses

Each row gives: variable; agreement, % (n/N); kappa value.

Variables used in models LR1 and LR2
  Presence of entirely solid tumor (yes, no); 97% (114/117); 0.91
  Maximum diameter of largest solid component (≤50 mm, >50 mm); 95% (111/117); 0.83
  Maximum diameter of largest solid component (≤50 mm, >50 mm) (a); 92% (56/61); 0.82
  Tenderness of adnexal mass (yes, no); 97% (113/117); 0.78
  Ascites (yes, no); 97% (113/117); 0.73
  Papillary projections (yes, no); 87% (102/117); 0.65
  Solid component (yes, no); 84% (98/117); 0.67
  Irregular cyst wall (yes, no); 79% (92/117); 0.56
  Acoustic shadows (yes, no); 85% (99/117); 0.58
  Color Doppler signals in papillary projections (yes, no) (b); 90% (105/117); 0.48
  Color Doppler signals in papillary projections (yes, no) (c); 86% (18/21); 0.71
  Color score (1, 2, 3, 4); 40% (47/117); 0.36 (d)

Other variables used to describe adnexal masses
  Tumor type (unilocular, unilocular solid, multilocular, multilocular solid, solid); 97% (113/117); 0.70
  Mean diameter of tumor, mm (≤40, 41-60, 61-80, 81-100, >100); 66% (77/117); 0.76 (d)
  Number of locules (0, 1, 2, 3, 4, 5, 6-10, 11-20, >20); 58% (68/117); 0.79 (d)
  Number of locules (≤10, >10) (e); 94% (110/117); 0.80
  Septum (yes, no); 88% (103/117); 0.76
  Incomplete septum (yes, no); 98% (115/117); ¶
  Number of papillary projections (0, 1, 2, 3, ≥4); 81% (95/117); 0.64 (d)
  Number of papillary projections (1, 2, 3, ≥4) (c); 67% (14/21); 0.63 (d)
  Number of papillary projections (<4, >4) (e); 96% (112/117); 0.68
  Echogenicity of cyst fluid (anechoic, low level, ground glass, mixed, no fluid); 67% (78/117); 0.56
  Color Doppler signals detectable (yes, no); 84% (98/117); 0.30
  Ovarian crescent sign (yes, no); 78% (91/117); 0.49
  Fluid in Douglas pouch (yes, no); 87% (102/117); 0.72
____________________________________________________________________________________________
(a) Includes only cases where solid components were registered at both examinations.
(b) This variable reflects disagreement both with regard to the presence of papillary projections and the presence of blood flow in papillary projections (i.e., if one observer recorded the presence of papillary projections and the other did not, there was disagreement with regard to this variable, and if both observers recorded the presence of papillary projections but disagreed with regard to the presence of blood flow in the papillary projections, there was disagreement with regard to this variable).
(c) Includes only cases where papillary projections were registered at both examinations.
(d) Weighted kappa index.
(e) Includes 0.
¶ Absent kappa values are explained by asymmetric field tables making it impossible to calculate them.


Table 5. Inter-observer reproducibility of the risk of malignancy calculated using the risk calculation models LR1 and LR2

For each model the entries are: calculated risk of malignancy for both sonologists (median, range); difference between the risks calculated by sonologists 1 and 2 (mean, 95% CI, limits of agreement); Intra-CC point estimate (95% CI).

Risk of malignancy calculated using LR1: median 7.85 (n=234); range 0.10 to 99.10; mean difference -0.53 (n=117); 95% CI -3.07 to 2.01; limits of agreement -28.05 to 26.99; Intra-CC 0.911 (0.874 to 0.937)

Risk of malignancy calculated using LR2: median 6.65 (n=234); range 0.10 to 98.40; mean difference 0.02 (n=117); 95% CI -3.06 to 3.10; limits of agreement -33.22 to 33.26; Intra-CC 0.832 (0.766 to 0.880)

CI, confidence interval; Intra-CC, intra-class correlation coefficient.


Legends for figure

Figure 1. a) Scatterplot showing the relationship between inter-observer difference (observer 1 minus observer 2) in calculated risk and magnitude of calculated risk when using logistic regression model LR1. The plot manifests a diamond shape, the differences being smallest for the lowest and highest risks. For risks <25 and >95 the differences are very small. LOA, limits of agreement. b) Scatterplot showing the relationship between inter-observer difference in calculated risk and magnitude of calculated risk when using logistic regression model LR2. The plot manifests a diamond shape, the differences being smallest for the lowest and highest risks.


Published OnlineFirst November 25, 2014. Clin Cancer Res. Povilas Sladkevicius and Lil Valentin. Inter-observer agreement in describing the ultrasound appearance of adnexal masses and in calculating the risk of malignancy using logistic regression models. doi: 10.1158/1078-0432.CCR-14-0906



Introduction

One of the first successful attempts to use ultrasound to discriminate between benign and malignant adnexal masses was made by Granberg and coworkers (1). They classified adnexal masses into five categories (unilocular, unilocular solid, multilocular, multilocular solid, and solid tumors) and found that unilocular cysts, unilocular solid cysts, and multilocular cysts were rarely malignant. Later, subjective interpretation of ultrasound images of adnexal masses (pattern recognition) proved to be an excellent method for discriminating between benign and malignant adnexal masses (2-5) and also for making a specific diagnosis (e.g., endometrioma, hydrosalpinx, etc.) (3, 6, 7). As an alternative to pattern recognition, several research teams (8-10) created logistic regression models including clinical and ultrasound information to calculate the individual risk of malignancy in adnexal masses. Because of unclear definitions of many of the ultrasound variables included in these models, the International Ovarian Tumor Analysis (IOTA) group suggested standardized terms and definitions to be used when describing ultrasound images of adnexal masses (11). The IOTA group also created and validated several mathematical models in which these standardized terms and definitions were used to calculate the risk of malignancy for each individual adnexal mass (12-14). Of these models, the logistic regression models LR1 and LR2, including 12 and six variables, respectively (see Table 1), were suggested to be suitable for use in clinical practice (13, 14). However, even when using standardized terms and definitions, ultrasound examiners may evaluate the features of an adnexal mass differently. There may also be variability in measurement results. This means that the risk of malignancy calculated by LR1 or LR2 may vary both within and between ultrasound examiners. We have shown that this is indeed the case when experienced ultrasound examiners analyze three-dimensional (3D) ultrasound volumes of adnexal masses (15). However, analysis of 3D ultrasound volumes does not necessarily reflect a situation where live examinations are performed.


The aims of this study were 1) to estimate inter-observer agreement, when live ultrasound scans are performed, with regard to describing adnexal masses using the IOTA terminology, 2) to estimate inter-observer agreement with regard to the risk of malignancy calculated using the IOTA logistic regression models LR1 and LR2, and 3) to elucidate what explains large inter-observer differences in calculated risk of malignancy.


Materials and methods

The Ethics Committee of Lund University approved the study protocol. Informed consent was obtained from all participants or the participant's guardian after the nature of the procedures had been fully explained.

This is a prospective observational study of real-time, live ultrasound examinations of adnexal masses. Consecutive patients referred for an ultrasound examination and found to have an adnexal mass judged to need surgical removal were scanned according to the research protocol by sonologist 1 (PS) as part of the clinical ultrasound examination. A second ultrasound examination was carried out before surgery by sonologist 2 (LV). Both examiners used the standardized IOTA examination and measurement technique and the IOTA terminology (11) to describe their ultrasound findings and noted their results on a dedicated paper form. Sonologist 2 was blinded to the results of sonologist 1. Information on the clinical variables included in LR1 and LR2 (personal history of ovarian cancer, current hormonal therapy, age of the patient) was obtained at the preoperative ultrasound examination by sonologist 2. All patients were operated on within 90 days after the preoperative ultrasound examination performed by sonologist 2. The excised tissues underwent histological examination, and tumors were classified according to the criteria recommended by the International Federation of Gynecology and Obstetrics (16). Borderline tumors were classified as malignant.

The patients were examined in the lithotomy position with an empty urinary bladder (11). Abdominal ultrasound examination was added when needed. The ultrasound variables assessed with regard to inter-observer reproducibility are shown in Table 1. The size of the lesion and that of its largest solid component were measured (largest diameter and mean of three orthogonal diameters) using calipers on the frozen ultrasound image. A color score was assigned on the basis of subjective assessment of the color content of the tumor scan at power Doppler ultrasound examination. A color score of 1 indicates absence of color Doppler signals, a color score of 2 a minimal amount of color Doppler signals, a color score of 3 a moderate amount of color Doppler signals, and a color score of 4 a large amount of color Doppler signals in the tumor (11).

The ultrasound systems used were GE Voluson 730 Expert or GE Voluson E8 (GE Healthcare, Zipf, Austria) with a 5-9-MHz transvaginal transducer. For power Doppler ultrasound examinations, the following settings were used. For the Voluson 730 Expert system: frequency 6-9 ("normal") MHz, pulse repetition frequency 0.6 kHz, gain 0.8, wall motion filter "low 1" (40 Hz). For the Voluson E8: frequency 6-9 ("normal") MHz, pulse repetition frequency 0.6 kHz, gain -4.0, wall motion filter "low 1" (40 Hz).

Statistical analysis

The IOTA3 study screen (astraia GmbH, Munich, Germany) was used to calculate the risk of malignancy according to LR1. Weighted kappa indices were calculated using the statistical program Stata, version 10.1 for Windows (StataCorp LP, College Station, TX, USA). For all other statistical calculations, including calculation of the risk of malignancy when using LR2, we used the Statistical Package for the Social Sciences (SPSS; IBM Corp., New York, NY, USA; PASW version 18.0).

Inter-observer agreement in the assessment of categorical variables was estimated by calculating the percentage agreement. Cohen's kappa was used to estimate by how much the observed agreement exceeded that expected by chance (17). Weighted kappa values are presented where appropriate (18). It has been suggested that kappa values >0.81 indicate very good agreement beyond chance, kappa values between 0.61 and 0.80 good agreement beyond chance, kappa values between 0.41 and 0.60 moderate agreement beyond chance, kappa values between 0.21 and 0.40 fair agreement beyond chance, and kappa values <0.20 poor agreement beyond chance (19).
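As an illustration of the calculation described above, the following minimal sketch computes percentage agreement and unweighted Cohen's kappa from a square cross-tabulation of two observers' ratings. It is not the Stata or SPSS routine used in the study, the function names are hypothetical, and weighted kappa is not covered. The verbal labels follow the bands quoted in the text; how values falling exactly between two bands (for example 0.805) are labelled is an assumption of this sketch.

```python
def percentage_agreement(table):
    """Share of cases on the diagonal of a square observer-by-observer table."""
    total = sum(sum(row) for row in table)
    agreed = sum(table[i][i] for i in range(len(table)))
    return agreed / total


def cohen_kappa(table):
    """Unweighted Cohen's kappa: (observed - expected) / (1 - expected)."""
    size = len(table)
    total = sum(sum(row) for row in table)
    observed = percentage_agreement(table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(size)) for j in range(size)]
    # Agreement expected by chance from the two observers' marginal totals.
    expected = sum(row_totals[i] * col_totals[i] for i in range(size)) / total ** 2
    if expected == 1.0:
        return float("nan")  # kappa is undefined when chance agreement is complete
    return (observed - expected) / (1.0 - expected)


def kappa_label(kappa):
    """Verbal label; handling of boundary values is an assumption."""
    if kappa > 0.80:
        return "very good"
    if kappa > 0.60:
        return "good"
    if kappa > 0.40:
        return "moderate"
    if kappa > 0.20:
        return "fair"
    return "poor"
```

Rows of `table` hold the categories assigned by one observer and columns those assigned by the other, so a yes/no variable is a 2 x 2 table of counts.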


Inter-observer reproducibility of measurement results, including the calculated risks of malignancy using LR1 and LR2, was described as the difference between two measurement results. The differences between the measured values were plotted against the mean of the two measurements (Bland-Altman plots) to assess the relationship between the differences and the magnitude of the measurements (20). Systematic bias between two measurements was estimated by calculating the 95% confidence interval (CI) of the mean difference (mean difference ± 2 SE). If zero lay within this interval, no bias was assumed to exist between the two measurements. Inter-observer agreement was expressed as the mean difference and limits of agreement (20). Ninety-five percent of differences between any future measurements are estimated to fall between the lower and upper limit of agreement. Inter-observer reliability of measurement results was estimated by calculating the intra-class correlation coefficient (ICC) using analysis of variance (two-way random model, absolute agreement; this allows generalization of the results to a population of observers). The ICC indicates the proportion of the total variance in measurement results that can be explained by differences between the individuals examined. It depends both on the magnitude of measurement errors and on the true heterogeneity in the population in which measurements are made. The more variable the population investigated, the greater the ICC, and the less variable the population, the smaller the ICC (21). It has been suggested that ICC values >0.90 are needed for a test to be used in clinical practice (22).
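The quantities described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the software used in the study: the function names are hypothetical, the limits of agreement are taken as mean difference ± 2 SD (the text does not state the multiplier), and the ICC is the single-measure, two-way random, absolute-agreement form computed from the analysis-of-variance mean squares.

```python
from math import sqrt
from statistics import mean, stdev


def bland_altman(first, second):
    """Mean difference, its 95% CI (mean +/- 2 SE) and limits of agreement."""
    diffs = [a - b for a, b in zip(first, second)]
    md = mean(diffs)
    sd = stdev(diffs)
    se = sd / sqrt(len(diffs))
    ci = (md - 2 * se, md + 2 * se)
    return {
        "mean_diff": md,
        "ci": ci,
        # Assumption: limits of agreement as mean +/- 2 SD
        "limits": (md - 2 * sd, md + 2 * sd),
        # Bias is assumed only if zero lies outside the CI
        "bias": not (ci[0] <= 0 <= ci[1]),
    }


def icc_absolute_agreement(ratings):
    """Single-measure ICC, two-way random model, absolute agreement.

    ratings[i][j] = measurement of subject i by observer j.
    """
    n, k = len(ratings), len(ratings[0])
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ms_rows = ss_rows / (n - 1)                      # between subjects
    ms_cols = ss_cols / (k - 1)                      # between observers
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Hypothetical diameters (mm) measured by two observers, not study data.
obs1 = [10, 20, 30, 40]
obs2 = [8, 21, 27, 40]
print(bland_altman(obs1, obs2))
print(icc_absolute_agreement([list(pair) for pair in zip(obs1, obs2)]))
```

Because the between-subject mean square appears in both the numerator and the denominator, the same measurement error yields a higher ICC in a more heterogeneous population, which is the dependence described in the text.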

The sensitivity and specificity with regard to malignancy of LR1 and LR2 were calculated separately using the information of sonologist 1 and that of sonologist 2.


Results

In all, 117 consecutive women with adnexal masses who underwent surgery were examined with ultrasound by the two sonologists as described above. Thirty-four women had bilateral adnexal masses. The most complex mass (or the largest one if both masses had similar ultrasound morphology) was used in our statistical analysis, the mass to be included being selected retrospectively to ensure that both sonologists contributed the same mass (right or left) to the analysis. Thus, 117 adnexal masses from 117 patients constitute our study population. The women's age ranged between 14 and 88 years (median 53), and 63 (54%) women were postmenopausal. There were 94 benign, four borderline and 19 invasively malignant adnexal masses (Table 2).

The time elapsed between the ultrasound examinations of sonologist 1 and 2 was median 61 days (10th and 90th percentiles 13 and 132; range 1-204) for the tumors with benign histology and median 14 days (10th and 90th percentiles 2 and 31; range 1-41) for the tumors with malignant histology. There was no relationship between the number of days between the scans and the differences in measurement results or inter-observer agreement for discrete variables (Supplementary Fig. S1-S5 and Supplementary Table S1).

Inter-observer reproducibility of measurement results is shown in Table 3. Bland-Altman plots showed no clear trend for inter-observer differences in measurement results to change with the magnitude of the measurement values. Limits of agreement were wide for all measurements. There was one systematic difference between the two sonologists: sonologist 1 (who always performed the first examination) obtained higher measurement values for the maximum diameter of the mass. The least reliable measurement was the height of the largest papillary projection.

Inter-observer agreement when assessing categorical ultrasound variables is shown in Table 4. For most categorical ultrasound variables, inter-observer agreement beyond chance was good or very good (19). Inter-observer agreement beyond chance for variables included in LR1 or LR2 was poorest for color score (agreement 40%, weighted kappa 0.36), presence of blood flow in papillary projection (agreement 90%, kappa 0.48), irregular cyst wall (agreement 79%, kappa 0.56) and acoustic shadowing (agreement 85%, kappa 0.58).

Bland-Altman plots illustrating the relationship between the magnitude of the estimated risk of malignancy calculated using LR1 and LR2 and the interobserver difference in calculated risk are shown in Figure 1. The plots manifest a diamond shape, i.e., the interobserver differences are smallest for the lowest and highest risks, and they are very small for risks <2.5% and >95%. Logarithmic transformation of the data (20) did not substantially change the shape of the scatter plot. Therefore, we present our results as absolute inter-observer differences in calculated risk (in percentage units); see Table 5. There were no systematic differences in calculated risks between the two sonologists, and reliability reflected by the ICC values was good (22), with ICC values of 0.911 for LR1 and 0.832 for LR2. When classifying tumors as having a risk of malignancy <10% (benign) or >10% (malignant) using LR1 or LR2, the inter-observer agreement was good for both models: inter-observer agreement 84% (98/117), kappa 0.68 for model LR1, and inter-observer agreement 85% (99/117), kappa 0.68 for model LR2. In the 19 cases where the two sonologists obtained different results with regard to malignancy when using LR1, the absolute interobserver differences in calculated risk ranged from 0.7 to 59.6 percentage units; in six of the 19 cases the absolute interobserver difference in calculated risk was <10.0 percentage units, in nine cases it was 10.0-24.9 percentage units, and in four cases it was >25.0 percentage units. In the 18 cases where the two sonologists obtained different results with regard to malignancy when using LR2, the absolute interobserver difference in calculated risk ranged from 8.8 to 67.9 percentage units; in two of the 18 cases the absolute interobserver difference in calculated risk was <10.0 percentage units, in ten cases it was 10.0-24.9 percentage units, and in six cases it was >25 percentage units.

The Bland-Altman plots (Figure 1) illustrate that for some tumors there were substantial interobserver differences in the calculated risk of malignancy: when using LR1, the interobserver difference in calculated risk was >25 percentage units in 11 tumors (9% of all tumors), and when using LR2, the interobserver difference in calculated risk was >25 percentage units in 14 tumors (12% of all tumors). To elucidate which interobserver differences explained these largest interobserver differences in calculated risk, we scrutinized each case where the difference was >25 percentage units. The results are shown in Supplementary Tables S2 and S3. When using LR1, a discrepancy for one single categorical variable explained the difference in four of the 11 cases, while a discrepancy for two categorical variables explained the difference in one case (differences in measurements being <5 mm in these five cases). In six cases there were differences in one or two categorical variables but also substantial differences (6-61 mm) in at least one measurement result. In no case was the large difference in calculated risk explained exclusively by differences in measurement results. The categorical variables judged differently by the two sonologists in these 11 cases were color score (n = 5), irregular cyst wall (n = 5), flow in papillary projection (n = 3) and acoustic shadowing (n = 2).

When using LR2, a discrepancy for one single categorical variable explained the large difference in calculated risk (>25 percentage units) in eight of the 14 cases (differences in measurements being <5 mm in these eight cases), and in four of the eight cases the sonologists judged acoustic shadowing differently. In five cases there were differences in one categorical variable but also a substantial difference (9-61 mm) in the measurement of the largest solid component. In yet another case there were differences in two categorical variables as well as in the measurement of the largest solid component. The categorical variables judged differently by the two sonologists in these 14 cases were acoustic shadowing (n = 5), irregular cyst wall (n = 5), ascites (n = 3) and flow in papillary projection (n = 2).

The sensitivity with regard to malignancy when using LR1 (10% risk cutoff) was 100% (23/23; 95% CI, 82-100) for both sonologists; the specificity was 74% (70/94; 95% CI, 64-82) for sonologist 1 and 63% (59/94; 95% CI, 53-72) for sonologist 2. The sensitivity when using LR2 was 100% (23/23; 95% CI, 82-100) for sonologist 1 and 91% (21/23; 95% CI, 72-98) for sonologist 2, and the specificity was 75.5% (71/94; 95% CI, 65-84) for both sonologists.
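The proportions above follow directly from the reported counts, as the short Python sketch below shows. The helper name is hypothetical. The Wilson score interval is used only as one common choice of confidence interval for a proportion; the text does not state which method the authors used, so the intervals printed here need not match the published ones.

```python
from math import sqrt


def proportion_with_wilson_ci(successes, total, z=1.96):
    """Proportion and Wilson score interval (an assumed CI method)."""
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return p, max(0.0, centre - half), min(1.0, centre + half)


# Counts as reported in the text (23 malignant incl. borderline, 94 benign).
reported = {
    "LR1 sensitivity, both sonologists": (23, 23),
    "LR1 specificity, sonologist 1": (70, 94),
    "LR1 specificity, sonologist 2": (59, 94),
    "LR2 sensitivity, sonologist 1": (23, 23),
    "LR2 sensitivity, sonologist 2": (21, 23),
    "LR2 specificity, both sonologists": (71, 94),
}

for label, (x, n) in reported.items():
    p, lo, hi = proportion_with_wilson_ci(x, n)
    print(f"{label}: {100 * p:.1f}% ({x}/{n}), Wilson 95% CI {100 * lo:.0f}-{100 * hi:.0f}")
```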


Discussion

We have shown substantial inter-observer variability in the results of measurements taken in adnexal masses (wide limits of agreement). Inter-observer agreement beyond chance was very good or good for most categorical variables, but it was only moderate or fair for some. Inter-observer agreement above chance was poorest for variables heavily dependent on subjective evaluation and/or machine settings, i.e., color score, presence of color Doppler signals in papillary projections, irregular cyst walls, acoustic shadowing (all four variables being included in LR1 or LR2), echogenicity of cyst fluid, and ovarian crescent sign. Despite this, there was good inter-observer agreement when classifying tumors as benign or malignant using the predetermined risk of malignancy cut-off of 10%. However, in some cases there were substantial differences in the calculated risk of malignancy between the two sonologists, the difference being >25.0 percentage units in 9% of all tumors when using LR1 and in 12% of all tumors when using LR2.

The strength of our study is that it provides new information. To the best of our knowledge, there is only one publication reporting on interobserver agreement with regard to describing ultrasound findings in adnexal masses using the IOTA terminology (11) when performing live ultrasound examinations (23). However, that study (23) evaluated interobserver agreement with regard to the ten ultrasound features in the IOTA simple rules (24, 25), not the variables included in the IOTA logistic regression models LR1 and LR2, and agreement was estimated between examiners with different levels of experience. The variable with poorest agreement beyond chance in the study cited was acoustic shadowing (kappa 0.36). We have found no published study that has estimated inter-observer reproducibility of the calculated risk of malignancy using LR1 or LR2 after live scanning.

It is a limitation of our study that up to 204 days elapsed between the scans of the two sonologists (up to 41 days for malignant masses). Because days elapsed between the scans, the inter-observer differences could theoretically be explained by the lesions having changed in size or morphology between the scans. We find this highly unlikely for the following reasons. First, there was no relationship between the differences in measurement results and the number of days between the scans (Supplementary Fig. S1-S5). Nor was there a clear tendency for inter-observer agreement for discrete variables to depend on the time between the scans (Supplementary Table S1). Second, one would expect a lesion and its components to increase in size with time, but sonologist 1, performing the first scan, obtained higher measurement values than sonologist 2. Third, it is our experience, after having performed gynecological scans for more than 20 years, that the ultrasound morphology of both benign and malignant adnexal masses remains constant over time, that benign adnexal lesions grow slowly, and that malignant masses do not change appreciably in size even during 1 month of observation. Therefore, we believe that the discrepancies between the two sonologists reflect true inter-observer differences and not a change of the masses over time. A second limitation is that we did not include estimation of the reproducibility of retrieving anamnestic information (current hormonal therapy, personal history of ovarian cancer), the anamnestic information collected by the second sonologist being used in all cases. It cannot be entirely excluded that patients would answer differently when asked by different sonologists or that sonologists could interpret the answers of the patients differently. A third limitation is that we did not estimate intra-observer reproducibility. We considered four scans (two per sonologist) likely to be unacceptable to patients. For the same reason, only two sonologists were involved in this study, and our results are generalizable only to sonologists with a similar level of experience.

The results of this live scanning study are similar to those of another study in which the same sonologists assessed the same variables using 3D ultrasound volumes from adnexal masses in another tumor population (15). The similarity in results between the two studies is surprising, because the conditions when assessing 3D ultrasound volumes are different from those during a live scan. When evaluating ultrasound volumes, sonologists are exposed to the same ultrasound images, and so any interobserver difference should be explained exclusively by differences in interpreting the ultrasound information. During a live scan there are more sources of bias. This could result in either poorer or better interobserver agreement than when 3D ultrasound volumes are assessed: poorer because ultrasound examiners are likely to use different machine settings and scanning conditions may change from one minute to another, better because the dynamic nature of live scanning facilitates discrimination between solid components and amorphous tissue.

Our results showed that two experienced sonologists agreed quite well in their classification of masses as benign or malignant using the 10% risk of malignancy cutoff of LR1 and LR2, and that the diagnostic performance of LR1 and LR2 with regard to discrimination between benign and malignant tumors was similar for the two sonologists and similar to that reported by others (14, 26-28). This is reassuring, because the main purpose of using models LR1 and LR2 is to classify tumors as benign or malignant. Potentially, however, LR1 and LR2 can be used not only to classify adnexal masses as benign or malignant but also to counsel a patient about her individual risk of malignancy (13). To use the calculated risk for individual counseling, one must be reasonably certain not only that the estimated risk agrees well with the true risk (when externally validated, both LR1 and LR2 underestimated the true risk, especially in the risk interval 30%-70% (14)) but also that the risk estimates are reproducible, i.e., that different examiners will obtain similar risk estimates. Our results show that risk estimates may differ substantially between experienced observers, the difference in estimated risk being >25.0 percentage units in 9% and 12% of cases when using LR1 and LR2, respectively. Interobserver agreement above chance was poorest for those variables in the models that are heavily dependent on subjective evaluation, i.e., color score, presence of color Doppler signals in papillary projections, irregular cyst walls, and acoustic shadowing. Indeed, differences in these variables explained most of the largest inter-observer differences in calculated risk of malignancy. In models based on few variables, changing the value of only one variable may result in large differences in predicted risks, while a model with many variables is less vulnerable to a change in one or even a few variables. Our results illustrate this (Supplementary Tables S2 and S3). When using LR2 (which includes six variables), a change in value for one single categorical variable explained an inter-observer difference in calculated risk >25 percentage units in eight of 14 cases, while when using LR1 (which includes 12 variables), a change in value for one single categorical variable explained an inter-observer difference in calculated risk >25 percentage units in only four of 11 cases. Acoustic shadowing is a strong variable in both LR1 and LR2 and has great impact on the calculated risk in LR2, with only six variables. In our hands, as well as in those of Ruiz de Gauna et al. (23), inter-observer agreement for acoustic shadowing was at most moderate. The interobserver agreement for color score was only fair in our study, and color score is an important variable in LR1.

To improve inter-observer reproducibility of calculated risks based on LR1 and LR2, inter-observer differences in descriptions and measurements of adnexal masses using the IOTA terminology and measurement technique need to be reduced. One way to achieve this could be by providing courses on and training in how to examine and describe adnexal masses using the IOTA terms. Interactive courses in which a large number of ultrasound images are discussed with the course participants are likely to be very valuable in this respect. More precise definitions of the IOTA terms, for example by providing ample imaging material, would probably also help improve inter-observer agreement. Special attention should be given to the variables with poorest reproducibility, i.e., the color score, wall irregularity, acoustic shadowing, and detection of blood flow in papillary projections. Until better inter-observer agreement in the calculated risk of malignancy using LR1 and LR2 has been shown, one should be cautious about using the risk estimates for individual patient counselling.


Acknowledgements

None


References

1. Granberg S, Norström A, Wikland M. Tumors in the lower pelvis as imaged by vaginal sonography. Gynecol Oncol 1990;37:224-9.

2. Benacerraf BR, Finkler NJ, Wojciechowski C, Knapp RC. Sonographic accuracy in the diagnosis of ovarian masses. J Reprod Med 1990;35:491-5.

3. Valentin L. Pattern recognition of pelvic masses by gray-scale ultrasound imaging: the contribution of Doppler ultrasound. Ultrasound Obstet Gynecol 1999;14:338-47.

4. Valentin L. Prospective cross-validation of Doppler ultrasound examination and gray-scale ultrasound imaging for discrimination of benign and malignant pelvic masses. Ultrasound Obstet Gynecol 1999;14:273-83.

5. Timmerman D, Schwärzler P, Collins WP, Claerhout F, Coenen M, Amant F, et al. Subjective assessment of adnexal masses with the use of ultrasonography: an analysis of interobserver variability and experience. Ultrasound Obstet Gynecol 1999;13:11-6.

6. Sokalska A, Timmerman D, Testa AC, Van Holsbeke C, Lissoni AA, Leone FPG, et al. Diagnostic accuracy of transvaginal ultrasound examination for assigning a specific diagnosis to adnexal masses. Ultrasound Obstet Gynecol 2009;34:462-70.

7. Valentin L. Use of morphology to characterize and manage common adnexal masses. Best Pract Res Clin Obstet Gynaecol 2004;18:71-89.

8. Tailor A, Jurkovic D, Bourne TH, Collins WP, Campbell S. Sonographic prediction of malignancy in adnexal masses using multivariate logistic regression analysis. Ultrasound Obstet Gynecol 1997;10:41-7.

9. Timmerman D, Bourne TH, Tailor A, Collins WP, Verrelst H, Vandenberghe K, et al. A comparison of methods for the preoperative discrimination between benign and malignant adnexal masses: the development of a new logistic regression model. Am J Obstet Gynecol 1999;181:57-65.


10. Alcazar JL, Jurado M. Prospective evaluation of logistic model based on sonographic morphologic and color Doppler findings developed to predict adnexal malignancy. J Ultrasound Med 1999;18:837-42.

11. Timmerman D, Valentin L, Bourne TH, Collins WP, Verrelst H, Vergote I. Terms, definitions and measurements to describe the sonographic features of adnexal tumors: a consensus opinion from the International Ovarian Tumor Analysis (IOTA) Group. Ultrasound Obstet Gynecol 2000;16:500-5.

12. Timmerman D, Testa AC, Bourne T, Ferrazzi E, Ameye L, Konstantinovic ML, et al. Logistic regression model to distinguish between the benign and malignant adnexal mass before surgery: a multicenter study by the International Ovarian Tumor Analysis Group. J Clin Oncol 2005;34:8794-801.

13. Kaijser J, Bourne T, Valentin L, Sayasneh A, Van Holsbeke C, Vergote I, et al. Improving strategies for diagnosing ovarian cancer: a summary of the International Ovarian Tumor Analysis (IOTA) studies. Ultrasound Obstet Gynecol 2013;41:9-20.

14. Timmerman D, Van Calster B, Testa AC, Guerriero S, Fischerova D, Lissoni AA, et al. Ovarian cancer prediction in adnexal masses using ultrasound-based logistic regression models: a temporal and external validation study by the IOTA group. Ultrasound Obstet Gynecol 2010;36:226-34.

15. Sladkevicius P, Valentin L. Intra- and inter-observer agreement when describing adnexal masses using the International Ovarian Tumour Analysis (IOTA) terms and definitions: a study on three-dimensional (3D) ultrasound volumes. Ultrasound Obstet Gynecol 2013;41:318-27.

16. Heintz APM, Odicino F, Maisonneuve P, Beller U, Benedet JL, Creasman WT, et al. Carcinoma of the Ovary. 25th Annual Report on the Results of Treatment in Gynecological Cancer. Int J Gynecol Obstet 2003;83:S135-S166 (suppl 1).


17. Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas 1960;20:37-46.

18. Kundel HL, Polansky M. Measurement of observer agreement. Radiology 2003;228:303-8.

19. Brennan P, Silman A. Statistical methods for assessing observer variability in clinical measures. BMJ 1992;304:1491-4.

20. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986;1:307-10.

21. Bartlett JW, Frost C. Reliability, repeatability and reproducibility: analysis of measurement errors in continuous variables. Ultrasound Obstet Gynecol 2008;31:466-75.

22. Kottner J, Audigé L, Brorson S, Donner A, Gajewski BJ, Hróbjartsson A, et al. Guidelines for Reporting Reliability and Agreement Studies (GRRAS) were proposed. J Clin Epidemiol 2011;64:96-106.

23. Ruiz de Gauna B, Sanchez P, Pineda L, Utrilla-Layna J, Juez L, Alcázar JL. Inter-observer agreement with regard to describing adnexal masses using the IOTA simple rules in a real-time setting and when using three-dimensional ultrasound volumes and digital clips. Ultrasound Obstet Gynecol 2014;44:95-100.

24. Timmerman D, Testa AC, Bourne T, Ameye L, Jurkovic D, Van Holsbeke C, et al. Simple ultrasound-based rules for the diagnosis of ovarian cancer. Ultrasound Obstet Gynecol 2008;31:681-90.

25. Timmerman D, Ameye L, Fischerova D, Epstein E, Melis GB, Guerriero S, et al. Simple ultrasound rules to distinguish between benign and malignant adnexal masses before surgery: prospective validation by IOTA group. BMJ 2010;341:c6839.


26. Van Holsbeke C, Van Calster B, Testa AC, Domali E, Lu C, Van Huffel S, et al. Prospective internal validation of mathematical models to predict malignancy in adnexal masses: results from the international ovarian tumor analysis study. Clin Cancer Res 2009;15:684-91.

27. Nunes N, Yazbek J, Ambler G, Hoo W, Naftalin J, Jurkovic D. Prospective evaluation of the IOTA logistic regression model LR2 for the diagnosis of ovarian cancer. Ultrasound Obstet Gynecol 2012;40:355-9.

28. Nunes N, Ambler G, Hoo WL, Naftalin J, Foo X, Widschwendter M, et al. A prospective validation of the IOTA logistic regression models (LR1 and LR2) in comparison to subjective pattern recognition for the diagnosis of ovarian cancer. Int J Gynecol Cancer 2013;23:1583-9.


Table 1. Variables for which inter-observer reproducibility was estimated
___________________________________________________________________________
Variables included in logistic regression models LR1 and LR2 or of importance for these models
  Maximum diameter of adnexal mass: mm
  Maximum diameter of largest solid component: mm
  Maximum diameter of largest solid component: ≤50, >50 mm
  Presence of an entirely solid adnexal mass: yes, no
  Papillary projections present: yes, no
  Irregular internal cyst walls: yes, no
  Acoustic shadows: yes, no
  Color Doppler signals in papillary projection: yes, no
  Color score: 1, 2, 3, 4
  Tenderness of adnexal mass at scan: yes, no
  Ascites: yes, no

Other IOTA variables
 Continuous IOTA variables
  Mean diameter of adnexal mass: mm
  Mean diameter of largest solid component: mm
  Height of the largest papillary projection: mm
 Categorical IOTA variables
  Type of tumor: unilocular, unilocular solid, multilocular, multilocular solid, solid
  Mean diameter of adnexal mass: ≤40, 41-60, 61-80, 81-100 and >100 mm
  Number of cyst locules: 0, 1, 2, 3, 4, 5, 6-10, 11-20 and >20; ≤10, >10†
  Septum present: yes, no
  Incomplete septum present: yes, no
  Solid component: yes, no
  Papillary projections: 0, 1, 2, 3, ≥4; 1, 2, 3, ≥4‡; <4, >4†
  Echogenicity of cyst fluid: anechoic, low level, ground glass, mixed, no cyst fluid
  Color Doppler signals detectable: yes, no
  Ovarian crescent sign: yes, no

Diagnosis based on LR1 and LR2
  Calculated risk of malignancy using LR1: ≤10%, >10%
  Calculated risk of malignancy using LR2: ≤10%, >10%
___________________________________________________________________________
IOTA, international ovarian tumor analysis group; LR1, logistic regression model 1; LR2, logistic regression model 2.

The risk of malignancy using LR1 is derived as

$$y = \frac{1}{1 + e^{-z}}$$

where

$$z = -6.7468 + 1.5985\,(a) - 0.9983\,(b) + 0.0326\,(c) + 0.00841\,(d) - 0.8577\,(e) + 1.5513\,(f) + 1.1737\,(g) + 0.9281\,(h) + 0.0496\,(i) + 1.1421\,(j) - 2.3550\,(k) + 0.4916\,(l)$$

and e is the mathematical constant and base value of natural logarithms. The 12 variables included in LR1 are (a) personal history of ovarian cancer (yes = 1, no = 0), (b) current hormonal therapy (yes = 1, no = 0), (c) age of the patient (in years), (d) maximum diameter of the lesion (in millimeters), (e) mass tender at scan (yes = 1, no = 0), (f) the presence of ascites (yes = 1, no = 0), (g) the presence of blood flow within a solid papillary projection (yes = 1, no = 0), (h) the presence of a purely solid tumor (yes = 1, no = 0), (i) maximal diameter of the solid component (expressed in millimeters, but with no increase >50 mm), (j) irregular internal cyst walls (yes = 1, no = 0), (k) the presence of acoustic shadows (yes = 1, no = 0), and (l) color score (1, 2, 3, or 4).

The risk of malignancy using LR2 is derived as $y = 1/(1 + e^{-z})$, where

$$z = -5.3718 + 0.0354\,(c) + 1.6159\,(f) + 1.1768\,(g) + 0.0697\,(i) + 0.9586\,(j) - 2.9486\,(k)$$

(12).

A calculated risk of malignancy of more than 10% classified the adnexal mass as malignant.
† includes 0.
‡ includes only cases where papillary projections were registered at both examinations.
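The two risk formulas in the footnote to Table 1 can be written out as a short Python sketch. The coefficients and the cap at 50 mm for the solid component are those given in the footnote; the function and parameter names, and the example patient, are hypothetical and chosen only for illustration. This is not a validated clinical calculator.

```python
from math import exp


def lr1_risk(history_ovarian_cancer, hormonal_therapy, age_years,
             max_lesion_mm, tender, ascites, papillary_flow, purely_solid,
             max_solid_mm, irregular_walls, acoustic_shadows, color_score):
    """LR1 risk of malignancy (0-1); binary inputs are 1 = yes, 0 = no."""
    z = (-6.7468
         + 1.5985 * history_ovarian_cancer      # (a)
         - 0.9983 * hormonal_therapy            # (b)
         + 0.0326 * age_years                   # (c)
         + 0.00841 * max_lesion_mm              # (d)
         - 0.8577 * tender                      # (e)
         + 1.5513 * ascites                     # (f)
         + 1.1737 * papillary_flow              # (g)
         + 0.9281 * purely_solid                # (h)
         + 0.0496 * min(max_solid_mm, 50)       # (i), no increase >50 mm
         + 1.1421 * irregular_walls             # (j)
         - 2.3550 * acoustic_shadows            # (k)
         + 0.4916 * color_score)                # (l), 1 to 4
    return 1 / (1 + exp(-z))


def lr2_risk(age_years, ascites, papillary_flow, max_solid_mm,
             irregular_walls, acoustic_shadows):
    """LR2 risk of malignancy (0-1), using variables (c), (f), (g), (i), (j), (k)."""
    z = (-5.3718
         + 0.0354 * age_years
         + 1.6159 * ascites
         + 1.1768 * papillary_flow
         + 0.0697 * min(max_solid_mm, 50)
         + 0.9586 * irregular_walls
         - 2.9486 * acoustic_shadows)
    return 1 / (1 + exp(-z))


# Hypothetical 60-year-old patient: 30 mm solid component, irregular walls.
# The two observers differ only in whether acoustic shadows are present.
without_shadows = lr2_risk(60, 0, 0, 30, 1, 0)
with_shadows = lr2_risk(60, 0, 0, 30, 1, 1)
print(f"LR2 risk without shadows: {100 * without_shadows:.1f}%")
print(f"LR2 risk with shadows:    {100 * with_shadows:.1f}%")
```

In this invented example, a disagreement on acoustic shadowing alone moves the LR2 estimate from one side of the 10% cutoff to the other, which is the kind of single-variable effect the Discussion describes for a model with few variables.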


Table 2. Histological diagnoses of the masses
___________________________________________________________________________
Histological diagnosis of tumor                Number
___________________________________________________________________________
Benign tumors                                  94
  Benign simple cyst                           7
  Endometrioma                                 10
  Dermoid cyst                                 16
  Serous cystadenoma                           16
  Mucinous cystadenoma                         18
  Myoma/fibroma                                9
  Cystadenofibroma                             11
  Paraovarian cyst                             5
  Sactosalpinx, chronic salpingitis            1
  Leydig cell tumor                            1
Borderline tumors                              4
  Serous                                       2
  Mucinous                                     1
  Endometrioid                                 1
Invasive malignancy                            19
  Primary ovarian adenocarcinoma               13
  Granulosa cell tumor                         3
  Dysgerminoma                                 1
  Leiomyosarcoma                               1
  Malignant aggressive B-cell lymphoma         1
___________________________________________________________________________


Table 3. Inter-observer reproducibility of continuous variables used in the risk calculation models LR1 and LR2 and of other continuous variables used to describe adnexal masses

For each parameter the following are given: measurement results of both sonologists (median, range); the difference in mm between the two measurements made by sonologists 1 and 2 (a) (mean, 95% CI, limits of agreement); and the Intra-CC (point estimate, 95% CI).

Variables used in models LR1 and LR2

Maximum diameter of adnexal mass, mm
  Median (range): 70 (10 to 313), n = 234
  Mean difference (95% CI): 3.80 (1.12 to 6.48), n = 117
  Limits of agreement: -25.24 to 32.82
  Intra-CC (95% CI): 0.958 (0.937 to 0.971)

Maximum diameter of largest solid component, mm (b)
  Median (range): 29.50 (5 to 180), n = 122
  Mean difference (95% CI): 1.92 (-1.74 to 5.58), n = 61
  Limits of agreement: -26.66 to 30.50
  Intra-CC (95% CI): 0.942 (0.905 to 0.964)

Other variables used to describe adnexal mass

Mean diameter of adnexal mass, mm
  Median (range): 58.5 (9 to 240), n = 234
  Mean difference (95% CI): 1.05 (-0.15 to 1.95), n = 117
  Limits of agreement: -8.61 to 10.72
  Intra-CC (95% CI): 0.971 (0.958 to 0.980)

Mean diameter of largest solid component, mm (b)
  Median (range): 22 (4 to 156), n = 122
  Mean difference (95% CI): 0.59 (-1.82 to 2.98), n = 61
  Limits of agreement: -18.16 to 19.32
  Intra-CC (95% CI): 0.962 (0.937 to 0.977)

Height of largest papillary projection, mm (c)
  Median (range): 8 (3 to 25), n = 42
  Mean difference (95% CI): -0.51 (-2.93 to 1.91), n = 21
  Limits of agreement: -11.61 to 10.59
  Intra-CC (95% CI): 0.609 (0.245 to 0.821)

CI, confidence interval; Intra-CC, intra-class correlation coefficient.
(a) The measurements made by sonologist 2 were subtracted from the measurements made by sonologist 1.
(b) Includes only cases where solid components were registered by both observers.
(c) Includes only cases where papillary projections were registered by both observers. Using logarithmically transformed data, the results were as follows: mean difference 1.03 (95% CI, 0.78 to 1.36); limits of agreement 0.31 to 3.61; Intra-CC 0.547 (0.153 to 0.789). These results can be used for comparison with our published results on inter-observer agreement when analyzing ultrasound volumes obtained by three-dimensional ultrasound (15).

Table 4. Inter-observer agreement for categorical variables included in the risk calculation models LR1 and LR2 and for other categorical variables used to describe adnexal masses

____________________________________________________________________________________________________________________
Variable                                                                            Agreement        Kappa value
____________________________________________________________________________________________________________________
Variables used in models LR1 and LR2
  Presence of entirely solid tumor (yes, no)                                        97% (114/117)    0.91
  Maximum diameter of largest solid component (≤50 mm, >50 mm)                      95% (111/117)    0.83
  Maximum diameter of largest solid component (≤50 mm, >50 mm) (a)                  92% (56/61)      0.82
  Tenderness of adnexal mass (yes, no)                                              97% (113/117)    0.78
  Ascites (yes, no)                                                                 97% (113/117)    0.73
  Papillary projections (yes, no)                                                   87% (102/117)    0.65
  Solid component (yes, no)                                                         84% (98/117)     0.67
  Irregular cyst wall (yes, no)                                                     79% (92/117)     0.56
  Acoustic shadows (yes, no)                                                        85% (99/117)     0.58
  Color Doppler signals in papillary projections (yes, no) (b)                      90% (105/117)    0.48
  Color Doppler signals in papillary projections (yes, no) (c)                      86% (18/21)      0.71
  Color score (1, 2, 3, 4)                                                          40% (47/117)     0.36 (d)
Other variables used to describe adnexal masses
  Tumor type (unilocular, unilocular solid, multilocular,
    multilocular solid, solid)                                                      97% (113/117)    0.70
  Mean diameter of tumor (mm): ≤40, 41-60, 61-80, 81-100, >100                      66% (77/117)     0.76 (d)
  Number of locules: 0, 1, 2, 3, 4, 5, 6-10, 11-20, >20                             58% (68/117)     0.79 (d)
  Number of locules: ≤10, >10 (e)                                                   94% (110/117)    0.80
  Septum (yes, no)                                                                  88% (103/117)    0.76
  Incomplete septum (yes, no)                                                       98% (115/117)    ¶
  Number of papillary projections: 0, 1, 2, 3, ≥4                                   81% (95/117)     0.64 (d)
  Number of papillary projections: 1, 2, 3, ≥4 (c)                                  67% (14/21)      0.63 (d)
  Number of papillary projections: <4, >4 (e)                                       96% (112/117)    0.68
  Echogenicity of cyst fluid (anechoic, low level, ground glass,
    mixed, no fluid)                                                                67% (78/117)     0.56
  Color Doppler signals detectable (yes, no)                                        84% (98/117)     0.30
  Ovarian crescent sign (yes, no)                                                   78% (91/117)     0.49
  Fluid in Douglas pouch (yes, no)                                                  87% (102/117)    0.72
____________________________________________________________________________________________________________________

(a) Includes only cases where solid components were registered at both examinations.
(b) This variable reflects disagreement both with regard to the presence of papillary projections and with regard to the presence of blood flow in papillary projections (i.e., if one observer recorded the presence of papillary projections and the other did not, there was disagreement with regard to this variable, and if both observers recorded the presence of papillary projections but disagreed with regard to the presence of blood flow in the papillary projections, there was disagreement with regard to this variable).
(c) Includes only cases where papillary projections were registered at both examinations.
(d) Weighted Kappa index.
(e) Includes 0.
¶ Absent kappa values are explained by asymmetric field tables, making it impossible to calculate them.

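Table 5. Inter-observer reproducibility of the risk of malignancy calculated using the risk calculation models LR1 and LR2

___________________________________________________________________________________________________________________________________________________
                                               Calculated risk of malignancy   Difference between the risk calculated
                                               (both sonologists)              by sonologist 1 and 2                            Intra-CC
Parameter                                      Median          Range           Mean            95% CI          Limits of agreement  Point estimate (95% CI)
___________________________________________________________________________________________________________________________________________________
Risk of malignancy calculated using LR1        7.85 (n=234)    0.10 to 99.10   -0.53 (n=117)   -3.07 to 2.01   -28.05 to 26.99      0.911 (0.874 to 0.937)
Risk of malignancy calculated using LR2        6.65 (n=234)    0.10 to 98.40   0.02 (n=117)    -3.06 to 3.10   -33.22 to 33.26      0.832 (0.766 to 0.880)
___________________________________________________________________________________________________________________________________________________

CI, confidence interval; Intra-CC, intra-class correlation coefficient.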

Legends for figure

Figure 1. a) Scatterplot showing the relationship between inter-observer difference (observer 1 minus observer 2) in calculated risk and magnitude of calculated risk when using logistic regression model LR1. The plot manifests a diamond shape, the differences being smallest for the lowest and highest risks. For risks <2.5% and >95% the differences are very small. LOA, limits of agreement. b) Scatterplot showing the relationship between inter-observer difference in calculated risk and magnitude of calculated risk when using logistic regression model LR2. The plot manifests a diamond shape, the differences being smallest for the lowest and highest risks.

Povilas Sladkevicius and Lil Valentin. Inter-observer agreement in describing the ultrasound appearance of adnexal masses and in calculating the risk of malignancy using logistic regression models. Clin Cancer Res. Published OnlineFirst November 25, 2014. doi: 10.1158/1078-0432.CCR-14-0906

The aims of this study were 1) to estimate inter-observer agreement in describing adnexal masses using the IOTA terminology when live ultrasound scans are performed, 2) to estimate inter-observer agreement in the risk of malignancy calculated using the IOTA logistic regression models LR1 and LR2, and 3) to elucidate what explains large inter-observer differences in the calculated risk of malignancy.

Materials and methods

The Ethics Committee of Lund University approved the study protocol. Informed consent was obtained from all participants or the participant's guardian after the nature of the procedures had been fully explained.

This is a prospective observational study of real-time (live) ultrasound examinations of adnexal masses. Consecutive patients referred for an ultrasound examination and found to have an adnexal mass judged to need surgical removal were scanned according to the research protocol by sonologist 1 (PS) as part of the clinical ultrasound examination. A second ultrasound examination was carried out before surgery by sonologist 2 (LV). Both examiners used the standardized IOTA examination and measurement technique and the IOTA terminology (11) to describe their ultrasound findings and noted their results in a dedicated paper form. Sonologist 2 was blinded to the results of sonologist 1. Information on the clinical variables included in LR1 and LR2 (personal history of ovarian cancer, current hormonal therapy, age of the patient) was obtained at the preoperative ultrasound examination by sonologist 2. All patients were operated on within 90 days after the preoperative ultrasound examination performed by sonologist 2. The excised tissues underwent histological examination, and tumors were classified according to the criteria recommended by the International Federation of Gynecology and Obstetrics (16). Borderline tumors were classified as malignant.

The patients were examined in the lithotomy position with an empty urinary bladder (11). Abdominal ultrasound examination was added when needed. The ultrasound variables assessed with regard to inter-observer reproducibility are shown in Table 1. The size of the lesion and that of its largest solid component were measured (largest diameter and mean of three orthogonal diameters) using calipers on the frozen ultrasound image. A color score was assigned on the basis of subjective assessment of the color content of the tumor scan at power Doppler ultrasound examination. A color score of 1 indicates absence of color Doppler signals, a color score of 2 a minimal amount of color Doppler signals, a color score of 3 a moderate amount of color Doppler signals, and a color score of 4 a large amount of color Doppler signals in the tumor (11).

The ultrasound systems used were GE Voluson 730 Expert or GE Voluson E8 (GE Healthcare, Zipf, Austria) with a 5–9-MHz transvaginal transducer. For power Doppler ultrasound examinations the following settings were used. For the Voluson 730 Expert system: frequency 6-9 ("normal") MHz, pulse repetition frequency 0.6 kHz, gain 0.8, wall motion filter "low 1" (40 Hz). For the Voluson E8: frequency 6-9 ("normal") MHz, pulse repetition frequency 0.6 kHz, gain -4.0, wall motion filter "low 1" (40 Hz).

Statistical analysis

The IOTA3 study screen (astraia GmbH, Munich, Germany) was used to calculate the risk of malignancy according to LR1. Weighted Kappa indices were calculated using the statistical program Stata, version 10.1 for Windows (StataCorp LP, College Station, TX, USA). For all other statistical calculations, including calculation of the risk of malignancy when using LR2, we used the Statistical Package for the Social Sciences (SPSS program; IBM Corp., New York, NY, USA; PASW version 18.0).

Inter-observer agreement in the assessment of categorical variables was estimated by calculating the percentage agreement. Cohen's kappa was used to estimate by how much the observed agreement exceeded that expected by chance (17). Weighted kappa values are presented where appropriate (18). It has been suggested that kappa values >0.81 indicate very good agreement beyond chance, kappa values between 0.61 and 0.80 good agreement beyond chance, kappa values between 0.41 and 0.60 moderate agreement beyond chance, kappa values between 0.21 and 0.40 fair agreement beyond chance, and kappa values <0.20 poor agreement beyond chance (19).
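
As an illustration of the two agreement measures just described, the following minimal Python sketch computes percentage agreement and unweighted Cohen's kappa for two observers and maps a kappa value to the verbal categories cited above. It is not the Stata or SPSS procedure used in the study; the function names are hypothetical, weighted kappa is not covered, and values falling in the small gaps between the published bands (for example 0.805) are assigned to the lower band boundary shown in the code, which is an assumption.

```python
from collections import Counter


def percent_agreement(obs1, obs2):
    """Percentage of cases in which the two observers gave the same rating."""
    same = sum(a == b for a, b in zip(obs1, obs2))
    return 100.0 * same / len(obs1)


def cohens_kappa(obs1, obs2):
    """Unweighted Cohen's kappa: (observed - expected) / (1 - expected).

    Undefined (division by zero) when the expected agreement equals 1.
    """
    n = len(obs1)
    p_observed = sum(a == b for a, b in zip(obs1, obs2)) / n
    counts1, counts2 = Counter(obs1), Counter(obs2)
    categories = set(counts1) | set(counts2)
    # Chance agreement from the marginal frequencies of each observer.
    p_expected = sum(counts1[c] * counts2[c] for c in categories) / (n * n)
    return (p_observed - p_expected) / (1.0 - p_expected)


def interpret_kappa(kappa):
    """Verbal label following the bands quoted in the text (boundaries assumed)."""
    if kappa > 0.80:
        return "very good"
    if kappa > 0.60:
        return "good"
    if kappa > 0.40:
        return "moderate"
    if kappa > 0.20:
        return "fair"
    return "poor"
```

With hypothetical ratings such as `obs1 = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]` and `obs2 = [1, 1, 0, 0, 0, 0, 1, 0, 1, 1]`, the observers agree in 8 of 10 cases while chance alone would give 50% agreement, so kappa is well below the raw percentage agreement.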

Inter-observer reproducibility of measurement results, including the calculated risks of malignancy using LR1 and LR2, was described as the difference between two measurement results. The differences between the measured values were plotted against the mean of the two measurements (Bland-Altman plots) to assess the relationship between the differences and the magnitude of the measurements (20). Systematic bias between two measurements was estimated by calculating the 95% confidence interval (CI) of the mean difference (mean difference ± 2 SE). If zero lay within this interval, no bias was assumed to exist between the two measurements. Inter-observer agreement was expressed as the mean difference and limits of agreement (20). Ninety-five percent of differences between any future measurements are estimated to fall between the lower and upper limit of agreement. Inter-observer reliability of measurement results was estimated by calculating the intra-class correlation coefficient (ICC) using analysis of variance (two-way random model, absolute agreement; this allows generalization of the results to a population of observers). The ICC indicates the proportion of the total variance in measurement results that can be explained by differences between the individuals examined. It depends both on the magnitude of measurement errors and on the true heterogeneity in the population in which measurements are made. The more variable the population investigated, the greater the ICC, and the less variable the population, the smaller the ICC (21). It has been suggested that ICC values >0.90 are needed for a test to be used in clinical practice (22).
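
The quantities described in this paragraph can be sketched as follows. This is an illustrative Python sketch rather than the SPSS analysis used in the study, and all names are hypothetical. Two points are assumptions: the limits of agreement are taken as the mean difference ± 2 SD of the differences (the text specifies ± 2 SE only for the confidence interval), and the ICC is computed in its single-measure form for the two-way random, absolute-agreement model, since the text does not say whether single or average measures were reported.

```python
import math
import statistics


def bland_altman(obs1, obs2):
    """Mean difference (observer 1 minus observer 2), its 95% CI, and limits of agreement."""
    diffs = [a - b for a, b in zip(obs1, obs2)]
    mean_diff = statistics.mean(diffs)
    sd = statistics.stdev(diffs)          # sample SD of the differences
    se = sd / math.sqrt(len(diffs))       # standard error of the mean difference
    ci95 = (mean_diff - 2 * se, mean_diff + 2 * se)
    limits = (mean_diff - 2 * sd, mean_diff + 2 * sd)  # assumed: mean +/- 2 SD
    return {
        "mean_diff": mean_diff,
        "ci95": ci95,
        "limits_of_agreement": limits,
        # Systematic bias is assumed only if zero lies outside the 95% CI.
        "systematic_bias": not (ci95[0] <= 0 <= ci95[1]),
    }


def icc_absolute_agreement(obs1, obs2):
    """Single-measure ICC, two-way random model, absolute agreement, two observers."""
    n, k = len(obs1), 2
    values = list(obs1) + list(obs2)
    grand = sum(values) / (n * k)
    subject_means = [(a + b) / 2 for a, b in zip(obs1, obs2)]
    observer_means = [sum(obs1) / n, sum(obs2) / n]
    ss_subjects = k * sum((m - grand) ** 2 for m in subject_means)
    ss_observers = n * sum((m - grand) ** 2 for m in observer_means)
    ss_total = sum((x - grand) ** 2 for x in values)
    ss_error = ss_total - ss_subjects - ss_observers
    ms_subjects = ss_subjects / (n - 1)
    ms_observers = ss_observers / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_subjects - ms_error) / (
        ms_subjects + (k - 1) * ms_error + k * (ms_observers - ms_error) / n
    )
```

Because the absolute-agreement form penalizes a constant offset between observers, two observers whose measurements differ by a fixed 2 mm obtain an ICC slightly below 1 even though their measurements are perfectly correlated.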

Sensitivity and specificity with regard to malignancy were calculated for LR1 and LR2 using the information recorded by sonologist 1 and by sonologist 2.

Results

In all, 117 consecutive women with adnexal masses who underwent surgery were examined with ultrasound by the two sonologists as described above. Thirty-four women had bilateral adnexal masses. The most complex mass, or the largest one if both masses had similar ultrasound morphology, was used in our statistical analysis, the mass to be included being selected retrospectively to ensure that both sonologists contributed the same mass (right or left) to the analysis. Thus, 117 adnexal masses from 117 patients constitute our study population. The women's age ranged between 14 and 88 years (median 53), and 63 (54%) women were postmenopausal. There were 94 benign, four borderline, and 19 invasively malignant adnexal masses (Table 2).

The time elapsed between the ultrasound examinations of sonologists 1 and 2 was median 61 days (10th and 90th percentiles 13 and 132; range 1-204) for the tumors with benign histology and median 14 days (10th and 90th percentiles 2 and 31; range 1-41) for the tumors with malignant histology. There was no relationship between the number of days between the scans and the differences in measurement results or inter-observer agreement for discrete variables (Supplementary Fig. S1-S5 and Supplementary Table S1).

Inter-observer reproducibility of measurement results is shown in Table 3. Bland-Altman plots showed no clear trend for inter-observer differences in measurement results to change with the magnitude of the measurement values. Limits of agreement were wide for all measurements. There was one systematic difference between the two sonologists: sonologist 1 (who always performed the first examination) obtained higher measurement values for the maximum diameter of the mass. The least reliable measurement was the height of the largest papillary projection.

Inter-observer agreement when assessing categorical ultrasound variables is shown in Table 4. For most categorical ultrasound variables, inter-observer agreement beyond chance was good or very good (19). Inter-observer agreement beyond chance for variables included in LR1 or LR2 was poorest for color score (agreement 40%, weighted kappa 0.36), presence of blood flow in papillary projection (agreement 90%, kappa 0.48), irregular cyst wall (agreement 79%, kappa 0.56), and acoustic shadowing (agreement 85%, kappa 0.58).

Bland-Altman plots illustrating the relationship between the magnitude of the estimated risk of malignancy calculated using LR1 and LR2 and the inter-observer difference in calculated risk are shown in Figure 1. The plots manifest a diamond shape, i.e., the inter-observer differences are smallest for the lowest and highest risks, and they are very small for risks <2.5% and >95%. Logarithmic transformation of the data (20) did not substantially change the shape of the scatter plot. Therefore, we present our results as absolute inter-observer differences in calculated risk (in percentage units); see Table 5. There were no systematic differences in calculated risks between the two sonologists, and reliability, reflected by the ICC values, was good (22), with ICC values of 0.911 for LR1 and 0.832 for LR2. When classifying tumors as having a risk of malignancy <10% (benign) or >10% (malignant) using LR1 or LR2, the inter-observer agreement was good for both models: inter-observer agreement 84% (98/117), kappa 0.68 for model LR1, and inter-observer agreement 85% (99/117), kappa 0.68 for model LR2. In the 19 cases where the two sonologists obtained different results with regard to malignancy when using LR1, the absolute inter-observer differences in calculated risk ranged from 0.7 to 59.6 percentage units; in six of the 19 cases the absolute inter-observer difference in calculated risk was <10.0 percentage units, in nine cases it was 10.0–24.9 percentage units, and in four cases it was >25.0 percentage units. In the 18 cases where the two sonologists obtained different results with regard to malignancy when using LR2, the absolute inter-observer difference in calculated risk ranged from 8.8 to 67.9 percentage units; in two of the 18 cases the absolute inter-observer difference in calculated risk was <10.0 percentage units, in ten cases it was 10.0–24.9 percentage units, and in six cases it was >25 percentage units.

The Bland-Altman plots (Figure 1) illustrate that for some tumors there were substantial inter-observer differences in the calculated risk of malignancy: when using LR1, the inter-observer difference in calculated risk was >25 percentage units in 11 tumors (9% of all tumors), and when using LR2, the inter-observer difference in calculated risk was >25 percentage units in 14 tumors (12% of all tumors). To elucidate which inter-observer differences explained these largest inter-observer differences in calculated risk, we scrutinized each case where the difference was >25 percentage units. The results are shown in Supplementary Tables S2 and S3. When using LR1, a discrepancy for one single categorical variable explained the difference in four of the 11 cases, while a discrepancy for two categorical variables explained the difference in one case (differences in measurements being <5 mm in these five cases). In six cases there were differences in one or two categorical variables but also substantial differences (6-61 mm) in at least one measurement result. In no case was the large difference in calculated risk explained exclusively by differences in measurement results. The categorical variables judged differently by the two sonologists in these 11 cases were color score (n = 5), irregular cyst wall (n = 5), flow in papillary projection (n = 3), and acoustic shadowing (n = 2).

When using LR2, a discrepancy for one single categorical variable explained the large difference in calculated risk (>25 percentage units) in eight of the 14 cases (differences in measurements being <5 mm in these eight cases), and in four of the eight cases the sonologists judged acoustic shadowing differently. In five cases there were differences in one categorical variable but also a substantial difference (9-61 mm) in the measurement of the largest solid component. In yet another case there were differences in two categorical variables as well as in the measurement of the largest solid component. The categorical variables judged differently by the two sonologists in these 14 cases were acoustic shadowing (n = 5), irregular cyst wall (n = 5), ascites (n = 3), and flow in papillary projection (n = 2).

The sensitivity with regard to malignancy when using LR1 (10% risk cutoff) was 100% (23/23; 95% CI, 82-100) for both sonologists; the specificity was 74% (70/94; 95% CI, 64-82) for sonologist 1 and 63% (59/94; 95% CI, 53-72) for sonologist 2. The sensitivity when using LR2 was 100% (23/23; 95% CI, 82-100) for sonologist 1 and 91% (21/23; 95% CI, 72-98) for sonologist 2, and the specificity was 75.5% (71/94; 95% CI, 65-84) for both sonologists.
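
The point estimates in this paragraph follow directly from the reported counts, as the short Python sketch below shows. The function names are hypothetical; the counts of false negatives and false positives are derived by subtraction from the reported fractions (23 malignant and 94 benign masses). The confidence intervals are not recomputed here because the text does not state which interval method was used.

```python
def sensitivity(true_pos, false_neg):
    """Percentage of malignant masses classified as malignant."""
    return 100.0 * true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Percentage of benign masses classified as benign."""
    return 100.0 * true_neg / (true_neg + false_pos)


# Counts taken from the reported fractions; FN and FP derived by subtraction.
reported = {
    ("LR1", "sonologist 1"): {"tp": 23, "fn": 0, "tn": 70, "fp": 24},
    ("LR1", "sonologist 2"): {"tp": 23, "fn": 0, "tn": 59, "fp": 35},
    ("LR2", "sonologist 1"): {"tp": 23, "fn": 0, "tn": 71, "fp": 23},
    ("LR2", "sonologist 2"): {"tp": 21, "fn": 2, "tn": 71, "fp": 23},
}

for (model, observer), c in reported.items():
    sens = sensitivity(c["tp"], c["fn"])
    spec = specificity(c["tn"], c["fp"])
    print(f"{model}, {observer}: sensitivity {sens:.1f}%, specificity {spec:.1f}%")
```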

Discussion

We have shown substantial inter-observer variability in the results of measurements taken in adnexal masses (wide limits of agreement). Inter-observer agreement beyond chance was very good or good for most categorical variables, but it was only moderate or fair for some. Inter-observer agreement above chance was poorest for variables heavily dependent on subjective evaluation and/or machine settings, i.e., color score, presence of color Doppler signals in papillary projections, irregular cyst walls, acoustic shadowing (all four variables being included in LR1 or LR2), echogenicity of cyst fluid, and ovarian crescent sign. Despite this, there was good inter-observer agreement when classifying tumors as benign or malignant using the predetermined risk of malignancy cut-off of 10%. However, in some cases there were substantial differences in the calculated risk of malignancy between the two sonologists, the difference being >25.0 percentage units in 9% of all tumors when using LR1 and in 12% of all tumors when using LR2.

The strength of our study is that it provides new information. To the best of our knowledge, there is only one publication reporting on inter-observer agreement with regard to describing ultrasound findings in adnexal masses using the IOTA terminology (11) when performing live ultrasound examinations (23). However, that study (23) evaluated inter-observer agreement with regard to the ten ultrasound features in the IOTA simple rules (24, 25), not the variables included in the IOTA logistic regression models LR1 and LR2, and agreement was estimated between examiners with different levels of experience. The variable with poorest agreement beyond chance in the study cited was acoustic shadowing (kappa 0.36). We have found no published study that has estimated inter-observer reproducibility of the calculated risk of malignancy using LR1 or LR2 after live scanning.

It is a limitation of our study that up to 204 days elapsed between the scans of the two sonologists (up to 41 days for malignant masses). Because days elapsed between the scans, theoretically the inter-observer differences could be explained by the lesions having changed in size or morphology between the scans. We find this highly unlikely for the following reasons. First, there was no relationship between the differences in measurement results and the number of days between the scans (Supplementary Fig. S1-S5). Nor was there a clear tendency for inter-observer agreement for discrete variables to depend on the time between the scans (Supplementary Table S1). Second, one would expect a lesion and its components to increase in size with time, but sonologist 1, performing the first scan, obtained higher measurement values than sonologist 2. Third, it is our experience, after having performed gynecological scans for more than 20 years, that the ultrasound morphology of both benign and malignant adnexal masses remains constant over time, that benign adnexal lesions grow slowly, and that malignant masses do not change appreciably in size even during 1 month of observation. Therefore, we believe that the discrepancies between the two sonologists reflect true inter-observer differences and not a change of the masses over time. A second limitation is that we did not include estimation of the reproducibility of retrieving anamnestic information (current hormonal therapy, personal history of ovarian cancer), the anamnestic information collected by the second sonologist being used in all cases. It cannot be entirely excluded that patients would answer differently when asked by different sonologists or that sonologists could interpret the answers of the patients differently. A third limitation is that we did not estimate intra-observer reproducibility. We considered four scans (two per sonologist) likely to be unacceptable to patients. For the same reason, only two sonologists were involved in this study, and our results are generalizable only to sonologists with a similar level of experience.

The results of this live scanning study are similar to those of another study in which the same sonologists assessed the same variables using 3D ultrasound volumes from adnexal masses in another tumor population (15). The similarity in results between the two studies is surprising, because the conditions when assessing 3D ultrasound volumes are different from those during a live scan. When evaluating ultrasound volumes, sonologists are exposed to the same ultrasound images, and so any inter-observer difference should be explained exclusively by differences in interpreting the ultrasound information. During a live scan there are more sources of bias. This could result in either poorer or better inter-observer agreement than when 3D ultrasound volumes are assessed: poorer because ultrasound examiners are likely to use different machine settings and scanning conditions may change from one minute to another, better because the dynamic nature of live scanning facilitates discrimination between solid components and amorphous tissue.

Our results showed that two experienced sonologists agreed quite well in their classification

of masses as benign or malignant using the 10 risk of malignancy cutoff of LR1 and LR2

and that the diagnostic performance of LR1 and LR2 with regard to discrimination between

benign and malignant tumors was similar for the two sonologists and similar to that reported

by others (14 26-28) This is reassuring because the main purpose of using model LR1 and

LR2 is to classify tumors as benign or malignant Potentially however LR1 and LR2 can be

used not only to classify adnexal masses as benign or malignant but also to counsel a patient

about her individual risk of malignancy (13) If to use the calculated risk for individual

counseling one must be reasonably certain not only that the estimated risk agrees well with the

true risk (when externally validated both LR1 and LR2 underestimated the true risk especially

in the risk interval 30-70 (14) but also that the risk estimates are reproducible ie that

different examiners will obtain similar risk estimates Our results show that risks estimates may

differ substantially between experienced observers the difference in estimated risk being gt250

percentage units in 9 and 12 of cases when using LR1 and LR2 respectively Interobserver

agreement above chance was poorest for those variables in the models that are heavily

dependent on subjective evaluation ie color score presence of color Doppler signals in

papillary projections irregular cyst walls and acoustic shadowing Indeed differences in these

Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from

Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906


explained most of the largest inter-observer differences in calculated risk of malignancy. In models based on few variables, changing the value of only one variable may result in large differences in predicted risks, while a model with many variables is less vulnerable to a change in one or even a few variables. Our results illustrate this (Supplementary Tables S2 and S3). When using LR2 (which includes six variables), a change in value for one single categorical variable explained an inter-observer difference in calculated risk >25 percentage units in eight of 14 cases, while when using LR1 (which includes 12 variables), a change in value for one single categorical variable explained an inter-observer difference in calculated risk >25 percentage units in only four of 11 cases. Acoustic shadowing is a strong variable in both LR1 and LR2 and has great impact on the calculated risk in LR2, which has only six variables. In our hands, as well as in those of Ruiz de Gauna et al. (23), inter-observer agreement for acoustic shadowing was at most moderate. The inter-observer agreement for color score was only fair in our study, and color score is an important variable in LR1.
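A short numerical sketch may make the effect of a single disagreement concrete. On the logit scale, recording acoustic shadows as present instead of absent shifts z by the shadowing coefficient (-2.3550 in LR1 and -2.9486 in LR2, as printed in the Table 1 footnote), whatever the values of the other variables. The function names and the 50% baseline risk below are illustrative assumptions and are not part of the study.

```python
import math

# Acoustic-shadow coefficients as printed in the Table 1 footnote.
SHADOW_COEF = {"LR1": -2.3550, "LR2": -2.9486}


def logit(p):
    """Log-odds of a probability p (0 < p < 1)."""
    return math.log(p / (1.0 - p))


def logistic(z):
    """Inverse of logit: y = 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))


def risk_after_flip(baseline_risk, coefficient):
    """Risk once a binary variable changes from 0 to 1, all else unchanged."""
    return logistic(logit(baseline_risk) + coefficient)


def shadow_disagreement(baseline_risk):
    """Risk difference (percentage units) between an observer who records
    no acoustic shadows and one who records shadows, per model."""
    return {
        model: 100.0 * (baseline_risk - risk_after_flip(baseline_risk, coef))
        for model, coef in SHADOW_COEF.items()
    }


if __name__ == "__main__":
    # Hypothetical baseline: 50% risk when no shadows are recorded.
    for model, diff in shadow_disagreement(0.50).items():
        print(f"{model}: {diff:.1f} percentage units")
```

Under these assumptions, one disagreement on shadowing at a 50% baseline already exceeds 25 percentage units for both models, and the shift is larger for LR2 because its shadowing coefficient is larger in magnitude.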

To improve inter-observer reproducibility of calculated risks based on LR1 and LR2, inter-observer differences in descriptions and measurements of adnexal masses using the IOTA terminology and measurement technique need to be reduced. One way to achieve this could be to provide courses on, and training in, how to examine and describe adnexal masses using the IOTA terms. Interactive courses in which a large number of ultrasound images are discussed with the course participants are likely to be very valuable in this respect. More precise definitions of the IOTA terms, for example by providing ample imaging material, would probably also help improve inter-observer agreement. Special attention should be given to the variables with poorest reproducibility, i.e., the color score, wall irregularity, acoustic shadowing and detection of blood flow in papillary projections. Until better inter-observer agreement in the calculated risk of malignancy using LR1 and LR2 has been shown, one should be cautious about using the risk estimates for individual patient counselling.


Acknowledgements

None


References

1. Granberg S, Norström A, Wikland M. Tumors in the lower pelvis as imaged by vaginal sonography. Gynecol Oncol 1990;37:224-9.

2. Benacerraf BR, Finkler NJ, Wojciechowski C, Knapp RC. Sonographic accuracy in the diagnosis of ovarian masses. J Reprod Med 1990;35:491-5.

3. Valentin L. Pattern recognition of pelvic masses by gray-scale ultrasound imaging: the contribution of Doppler ultrasound. Ultrasound Obstet Gynecol 1999;14:338-47.

4. Valentin L. Prospective cross-validation of Doppler ultrasound examination and gray-scale ultrasound imaging for discrimination of benign and malignant pelvic masses. Ultrasound Obstet Gynecol 1999;14:273-83.

5. Timmerman D, Schwärzler P, Collins WP, Claerhout F, Coenen M, Amant F, et al. Subjective assessment of adnexal masses with the use of ultrasonography: an analysis of interobserver variability and experience. Ultrasound Obstet Gynecol 1999;13:11-6.

6. Sokalska A, Timmerman D, Testa AC, Van Holsbeke C, Lissoni AA, Leone FPG, et al. Diagnostic accuracy of transvaginal ultrasound examination for assigning a specific diagnosis to adnexal masses. Ultrasound Obstet Gynecol 2009;34:462-70.

7. Valentin L. Use of morphology to characterize and manage common adnexal masses. Best Pract Res Clin Obstet Gynaecol 2004;18:71-89.

8. Tailor A, Jurkovic D, Bourne TH, Collins WP, Campbell S. Sonographic prediction of malignancy in adnexal masses using multivariate logistic regression analysis. Ultrasound Obstet Gynecol 1997;10:41-7.

9. Timmerman D, Bourne TH, Tailor A, Collins WP, Verrelst H, Vandenberghe K, et al. A comparison of methods for the preoperative discrimination between benign and malignant adnexal masses: the development of a new logistic regression model. Am J Obstet Gynecol 1999;181:57-65.


10. Alcazar JL, Jurado M. Prospective evaluation of logistic model based on sonographic morphologic and color Doppler findings developed to predict adnexal malignancy. J Ultrasound Med 1999;18:837-42.

11. Timmerman D, Valentin L, Bourne TH, Collins WP, Verrelst H, Vergote I. Terms, definitions and measurements to describe the sonographic features of adnexal tumors: a consensus opinion from the International Ovarian Tumor Analysis (IOTA) Group. Ultrasound Obstet Gynecol 2000;16:500-5.

12. Timmerman D, Testa AC, Bourne T, Ferrazzi E, Ameye L, Konstantinovic ML, et al. Logistic regression model to distinguish between the benign and malignant adnexal mass before surgery: a multicenter study by the International Ovarian Tumor Analysis Group. J Clin Oncol 2005;23:8794-801.

13. Kaijser J, Bourne T, Valentin L, Sayasneh A, Van Holsbeke C, Vergote I, et al. Improving strategies for diagnosing ovarian cancer: a summary of the International Ovarian Tumor Analysis (IOTA) studies. Ultrasound Obstet Gynecol 2013;41:9-20.

14. Timmerman D, Van Calster B, Testa AC, Guerriero S, Fischerova D, Lissoni AA, et al. Ovarian cancer prediction in adnexal masses using ultrasound-based logistic regression models: a temporal and external validation study by the IOTA group. Ultrasound Obstet Gynecol 2010;36:226-34.

15. Sladkevicius P, Valentin L. Intra- and inter-observer agreement when describing adnexal masses using the International Ovarian Tumour Analysis (IOTA) terms and definitions: a study on three-dimensional (3D) ultrasound volumes. Ultrasound Obstet Gynecol 2013;41:318-27.

16. Heintz APM, Odicino F, Maisonneuve P, Beller U, Benedet JL, Creasman WT, et al. Carcinoma of the Ovary. 25th Annual Report on the Results of Treatment in Gynecological Cancer. Int J Gynecol Obstet 2003;83(suppl 1):S135-S166.


17. Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas 1960;20:37-46.

18. Kundel HL, Polansky M. Measurement of observer agreement. Radiology 2003;228:303-8.

19. Brennan P, Silman A. Statistical methods for assessing observer variability in clinical measures. BMJ 1992;304:1491-4.

20. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986;1:307-10.

21. Bartlett JW, Frost C. Reliability, repeatability and reproducibility: analysis of measurement errors in continuous variables. Ultrasound Obstet Gynecol 2008;31:466-75.

22. Kottner J, Audigé L, Brorson S, Donner A, Gajewski BJ, Hróbjartsson A, et al. Guidelines for Reporting Reliability and Agreement Studies (GRRAS) were proposed. J Clin Epidemiol 2011;64:96-106.

23. Ruiz de Gauna B, Sanchez P, Pineda L, Utrilla-Layna J, Juez L, Alcázar JL. Inter-observer agreement with regard to describing adnexal masses using the IOTA simple rules in a real-time setting and when using three-dimensional ultrasound volumes and digital clips. Ultrasound Obstet Gynecol 2014;44:95-100.

24. Timmerman D, Testa AC, Bourne T, Ameye L, Jurkovic D, Van Holsbeke C, et al. Simple ultrasound-based rules for the diagnosis of ovarian cancer. Ultrasound Obstet Gynecol 2008;31:681-90.

25. Timmerman D, Ameye L, Fischerova D, Epstein E, Melis GB, Guerriero S, et al. Simple ultrasound rules to distinguish between benign and malignant adnexal masses before surgery: prospective validation by IOTA group. BMJ 2010;341:c6839.


26. Van Holsbeke C, Van Calster B, Testa AC, Domali E, Lu C, Van Huffel S, et al. Prospective internal validation of mathematical models to predict malignancy in adnexal masses: results from the international ovarian tumor analysis study. Clin Cancer Res 2009;15:684-91.

27. Nunes N, Yazbek J, Ambler G, Hoo W, Naftalin J, Jurkovic D. Prospective evaluation of the IOTA logistic regression model LR2 for the diagnosis of ovarian cancer. Ultrasound Obstet Gynecol 2012;40:355-9.

28. Nunes N, Ambler G, Hoo WL, Naftalin J, Foo X, Widschwendter M, et al. A prospective validation of the IOTA logistic regression models (LR1 and LR2) in comparison to subjective pattern recognition for the diagnosis of ovarian cancer. Int J Gynecol Cancer 2013;23:1583-9.


Table 1. Variables for which inter-observer reproducibility was estimated
___________________________________________________________________________
Variables included in logistic regression models LR1 and LR2, or of importance for these models
    Maximum diameter of adnexal mass: mm
    Maximum diameter of largest solid component: mm
    Maximum diameter of largest solid component: ≤50, >50 mm
    Presence of an entirely solid adnexal mass: yes, no
    Papillary projections present: yes, no
    Irregular internal cyst walls: yes, no
    Acoustic shadows: yes, no
    Color Doppler signals in papillary projection: yes, no
    Color score: 1, 2, 3, 4
    Tenderness of adnexal mass at scan: yes, no
    Ascites: yes, no

Other IOTA variables
  Continuous IOTA variables
    Mean diameter of adnexal mass: mm
    Mean diameter of largest solid component: mm
    Height of the largest papillary projection: mm
  Categorical IOTA variables
    Type of tumor: unilocular, unilocular solid, multilocular, multilocular solid, solid
    Mean diameter of adnexal mass: ≤40, 41-60, 61-80, 81-100 and >100 mm
    Number of cyst locules: 0, 1, 2, 3, 4, 5, 6-10, 11-20 and >20; ≤10, >10†
    Septum present: yes, no
    Incomplete septum present: yes, no
    Solid component: yes, no
    Papillary projections: 0, 1, 2, 3, ≥4; 1, 2, 3, ≥4‡; <4, >4†
    Echogenicity of cyst fluid: anechoic, low level, ground glass, mixed, no cyst fluid
    Color Doppler signals detectable: yes, no
    Ovarian crescent sign: yes, no

Diagnosis based on LR1 and LR2
    Calculated risk of malignancy using LR1: ≤10%, >10%
    Calculated risk of malignancy using LR2: ≤10%, >10%
___________________________________________________________________________
IOTA, international ovarian tumor analysis group; LR1, logistic regression model 1; LR2, logistic regression model 2.

The risk of malignancy using LR1 is derived as $y = 1/(1 + e^{-z})$, where

$z = -6.7468 + 1.5985(a) - 0.9983(b) + 0.0326(c) + 0.00841(d) - 0.8577(e) + 1.5513(f) + 1.1737(g) + 0.9281(h) + 0.0496(i) + 1.1421(j) - 2.3550(k) + 0.4916(l)$

and e is the mathematical constant and base value of natural logarithms. The 12 variables included in LR1 are:
    (a) personal history of ovarian cancer (yes = 1, no = 0)
    (b) current hormonal therapy (yes = 1, no = 0)
    (c) age of the patient (in years)
    (d) maximum diameter of the lesion (in millimeters)
    (e) mass tender at scan (yes = 1, no = 0)
    (f) the presence of ascites (yes = 1, no = 0)
    (g) the presence of blood flow within a solid papillary projection (yes = 1, no = 0)
    (h) the presence of a purely solid tumor (yes = 1, no = 0)
    (i) maximal diameter of the solid component (expressed in millimeters, but with no increase >50 mm)
    (j) irregular internal cyst walls (yes = 1, no = 0)
    (k) the presence of acoustic shadows (yes = 1, no = 0)
    (l) color score (1, 2, 3 or 4)

The risk of malignancy using LR2 is derived as $y = 1/(1 + e^{-z})$, where

$z = -5.3718 + 0.0354(c) + 1.6159(f) + 1.1768(g) + 0.0697(i) + 0.9586(j) - 2.9486(k)$ (12).

A calculated risk of malignancy of more than 10% classified the adnexal mass as malignant.
† includes 0
‡ includes only cases where papillary projections were registered at both examinations
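The two formulas in the Table 1 footnote can be sketched in code as follows. This is a minimal illustration and not the software used in the study (the study used the IOTA3 study screen for LR1 and SPSS for LR2). The coefficients are those printed in the footnote; the dictionary keys, function names and the example patient are hypothetical.

```python
import math

# Coefficients as printed in the Table 1 footnote; key names are illustrative.
LR1_INTERCEPT = -6.7468
LR1_COEF = {
    "history_ovarian_cancer": 1.5985,        # (a) yes = 1, no = 0
    "hormonal_therapy": -0.9983,             # (b) yes = 1, no = 0
    "age_years": 0.0326,                     # (c)
    "max_lesion_diameter_mm": 0.00841,       # (d)
    "tender_at_scan": -0.8577,               # (e) yes = 1, no = 0
    "ascites": 1.5513,                       # (f) yes = 1, no = 0
    "flow_in_papillary_projection": 1.1737,  # (g) yes = 1, no = 0
    "purely_solid_tumor": 0.9281,            # (h) yes = 1, no = 0
    "max_solid_diameter_mm": 0.0496,         # (i) no increase above 50 mm
    "irregular_cyst_walls": 1.1421,          # (j) yes = 1, no = 0
    "acoustic_shadows": -2.3550,             # (k) yes = 1, no = 0
    "color_score": 0.4916,                   # (l) 1, 2, 3 or 4
}
LR2_INTERCEPT = -5.3718
LR2_COEF = {
    "age_years": 0.0354,                     # (c)
    "ascites": 1.6159,                       # (f)
    "flow_in_papillary_projection": 1.1768,  # (g)
    "max_solid_diameter_mm": 0.0697,         # (i) no increase above 50 mm
    "irregular_cyst_walls": 0.9586,          # (j)
    "acoustic_shadows": -2.9486,             # (k)
}


def _risk(intercept, coef, case):
    """Return y = 1 / (1 + e^-z) for one case given as a dict of variables."""
    values = dict(case)
    # Variable (i) contributes no further increase above 50 mm.
    values["max_solid_diameter_mm"] = min(values["max_solid_diameter_mm"], 50.0)
    z = intercept + sum(weight * values[name] for name, weight in coef.items())
    return 1.0 / (1.0 + math.exp(-z))


def lr1_risk(case):
    return _risk(LR1_INTERCEPT, LR1_COEF, case)


def lr2_risk(case):
    return _risk(LR2_INTERCEPT, LR2_COEF, case)


def classify(risk, cutoff=0.10):
    """A risk of more than 10% classifies the mass as malignant."""
    return "malignant" if risk > cutoff else "benign"


if __name__ == "__main__":
    # Hypothetical patient, for illustration only.
    example = {
        "history_ovarian_cancer": 0, "hormonal_therapy": 0, "age_years": 53,
        "max_lesion_diameter_mm": 70, "tender_at_scan": 0, "ascites": 0,
        "flow_in_papillary_projection": 0, "purely_solid_tumor": 0,
        "max_solid_diameter_mm": 30, "irregular_cyst_walls": 1,
        "acoustic_shadows": 0, "color_score": 2,
    }
    for name, model in (("LR1", lr1_risk), ("LR2", lr2_risk)):
        risk = model(example)
        print(f"{name}: {100 * risk:.1f}% -> {classify(risk)}")
```

Running the same function on the values recorded by each observer and subtracting the two results gives the inter-observer difference in calculated risk, in the sense used in Table 5.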


Table 2. Histological diagnoses of the masses
___________________________________________________________________________
Histological diagnosis of tumor                Number
___________________________________________________________________________
Benign tumors                                      94
    Benign simple cyst                              7
    Endometrioma                                   10
    Dermoid cyst                                   16
    Serous cystadenoma                             16
    Mucinous cystadenoma                           18
    Myoma/fibroma                                   9
    Cystadenofibroma                               11
    Paraovarian cyst                                5
    Sactosalpinx/chronic salpingitis                1
    Leydig cell tumor                               1
Borderline tumors                                   4
    Serous                                          2
    Mucinous                                        1
    Endometrioid                                    1
Invasive malignancy                                19
    Primary ovarian adenocarcinoma                 13
    Granulosa cell tumor                            3
    Dysgerminoma                                    1
    Leiomyosarcoma                                  1
    Malignant aggressive B-cell lymphoma            1
___________________________________________________________________________


Table 3. Inter-observer reproducibility of continuous variables used in the risk calculation models LR1 and LR2 and of other continuous variables used to describe adnexal masses

Column groups: measurement results of both sonologists (median, range); difference in mm between the two measurements made by sonologists 1 and 2 (a) (mean, 95% CI, limits of agreement); Intra-CC (point estimate, 95% CI).

Parameter                                              Median         Range       Mean difference   95% CI           Limits of agreement   Intra-CC (95% CI)
--------------------------------------------------------------------------------------------------------------------------------------------------------------
Variables used in models LR1 and LR2
  Maximum diameter of adnexal mass, mm                 70 (n=234)     10 to 313   3.80 (n=117)      1.12 to 6.48     -25.24 to 32.82       0.958 (0.937 to 0.971)
  Maximum diameter of largest solid component, mm (b)  29.50 (n=122)  5 to 180    1.92 (n=61)       -1.74 to 5.58    -26.66 to 30.50       0.942 (0.905 to 0.964)
Other variables used to describe adnexal mass
  Mean diameter of adnexal mass, mm                    58.5 (n=234)   9 to 240    1.05 (n=117)      -0.15 to 1.95    -8.61 to 10.72        0.971 (0.958 to 0.980)
  Mean diameter of largest solid component, mm (b)     22 (n=122)     4 to 156    0.59 (n=61)       -1.82 to 2.98    -18.16 to 19.32       0.962 (0.937 to 0.977)
  Height of largest papillary projection, mm (c)       8 (n=42)       3 to 25     -0.51 (n=21)      -2.93 to 1.91    -11.61 to 10.59       0.609 (0.245 to 0.821)

CI, confidence interval; Intra-CC, intra-class correlation coefficient.
(a) The measurements made by sonologist 2 were subtracted from the measurements made by sonologist 1.
(b) Includes only cases where solid components were registered by both observers.
(c) Includes only cases where papillary projections were registered by both observers. Using logarithmically transformed data, the results were as follows: mean difference 1.03 (95% CI, 0.78 to 1.36); limits of agreement, 0.31 to 3.61; Intra-CC, 0.547 (0.153 to 0.789). These results can be used for comparison with our published results on inter-observer agreement when analyzing ultrasound volumes obtained by three-dimensional ultrasound (15).


Table 4. Inter-observer agreement for categorical variables included in the risk calculation models LR1 and LR2 and for other categorical variables used to describe adnexal masses

Variable                                                                              Agreement        Kappa value
____________________________________________________________________________________________
Variables used in models LR1 and LR2
  Presence of entirely solid tumor (yes, no)                                          97% (114/117)    0.91
  Maximum diameter of largest solid component (≤50 mm, >50 mm)                        95% (111/117)    0.83
  Maximum diameter of largest solid component (≤50 mm, >50 mm) (a)                    92% (56/61)      0.82
  Tenderness of adnexal mass (yes, no)                                                97% (113/117)    0.78
  Ascites (yes, no)                                                                   97% (113/117)    0.73
  Papillary projections (yes, no)                                                     87% (102/117)    0.65
  Solid component (yes, no)                                                           84% (98/117)     0.67
  Irregular cyst wall (yes, no)                                                       79% (92/117)     0.56
  Acoustic shadows (yes, no)                                                          85% (99/117)     0.58
  Color Doppler signals in papillary projections (yes, no) (b)                        90% (105/117)    0.48
  Color Doppler signals in papillary projections (yes, no) (c)                        86% (18/21)      0.71
  Color score (1, 2, 3, 4)                                                            40% (47/117)     0.36 (d)
Other variables used to describe adnexal masses
  Tumor type (unilocular, unilocular solid, multilocular, multilocular solid, solid)  97% (113/117)    0.70
  Mean diameter of tumor (mm): ≤40, 41-60, 61-80, 81-100, >100                        66% (77/117)     0.76 (d)
  Number of locules: 0, 1, 2, 3, 4, 5, 6-10, 11-20, >20                               58% (68/117)     0.79 (d)
      ≤10, >10 (e)                                                                    94% (110/117)    0.80
  Septum (yes, no)                                                                    88% (103/117)    0.76
  Incomplete septum (yes, no)                                                         98% (115/117)    ¶
  Number of papillary projections: 0, 1, 2, 3, ≥4                                     81% (95/117)     0.64 (d)
      1, 2, 3, ≥4 (c)                                                                 67% (14/21)      0.63 (d)
      <4, >4 (e)                                                                      96% (112/117)    0.68
  Echogenicity of cyst fluid (anechoic, low level, ground glass, mixed, no fluid)     67% (78/117)     0.56
  Color Doppler signals detectable (yes, no)                                          84% (98/117)     0.30
  Ovarian crescent sign (yes, no)                                                     78% (91/117)     0.49
  Fluid in Douglas pouch (yes, no)                                                    87% (102/117)    0.72
____________________________________________________________________________________________
(a) Includes only cases where solid components were registered at both examinations.
(b) This variable reflects disagreement both with regard to the presence of papillary projections and the presence of blood flow in papillary projections (i.e., if one observer recorded the presence of papillary projections and the other did not, there was disagreement with regard to this variable, and if both observers recorded the presence of papillary projections but disagreed with regard to the presence of blood flow in the papillary projections, there was disagreement with regard to this variable).
(c) Includes only cases where papillary projections were registered at both examinations.
(d) Weighted Kappa index.
(e) Includes 0.
¶ Absent kappa values are explained by asymmetric field tables, making it impossible to calculate them.


Table 5. Inter-observer reproducibility of the risk of malignancy calculated using the risk calculation models LR1 and LR2

Column groups: calculated risk of malignancy, % (both sonologists) (median, range); difference between the risk calculated by sonologists 1 and 2, in percentage units (mean, 95% CI, limits of agreement); Intra-CC (point estimate, 95% CI).

Parameter                                  Median         Range            Mean difference   95% CI           Limits of agreement   Intra-CC (95% CI)
------------------------------------------------------------------------------------------------------------------------------------------------------
Risk of malignancy calculated using LR1    7.85 (n=234)   0.10 to 99.10    -0.53 (n=117)     -3.07 to 2.01    -28.05 to 26.99       0.911 (0.874 to 0.937)
Risk of malignancy calculated using LR2    6.65 (n=234)   0.10 to 98.40    0.02 (n=117)      -3.06 to 3.10    -33.22 to 33.26       0.832 (0.766 to 0.880)

CI, confidence interval; Intra-CC, intra-class correlation coefficient.


Legends for figure

Figure 1. a) Scatterplot showing the relationship between inter-observer difference (observer 1 minus observer 2) in calculated risk and magnitude of calculated risk when using logistic regression model LR1. The plot manifests a diamond shape, the differences being smallest for the lowest and highest risks. For risks <2.5% and >95% the differences are very small. LOA, limits of agreement. b) Scatterplot showing the relationship between inter-observer difference in calculated risk and magnitude of calculated risk when using logistic regression model LR2. The plot manifests a diamond shape, the differences being smallest for the lowest and highest risks.


Published OnlineFirst November 25, 2014. Clin Cancer Res. Povilas Sladkevicius and Lil Valentin. Inter-observer agreement in describing the ultrasound appearance of adnexal masses and in calculating the risk of malignancy using logistic regression models. doi: 10.1158/1078-0432.CCR-14-0906



Materials and methods

The Ethics Committee of Lund University approved the study protocol. Informed consent was obtained from all participants or the participant's guardian after the nature of the procedures had been fully explained.

This is a prospective observational study of real-time live ultrasound examinations of adnexal masses. Consecutive patients referred for an ultrasound examination and found to have an adnexal mass judged to need surgical removal were scanned according to the research protocol by sonologist 1 (PS) as part of the clinical ultrasound examination. A second ultrasound examination was carried out before surgery by sonologist 2 (LV). Both examiners used the standardized IOTA examination and measurement technique and the IOTA terminology (11) to describe their ultrasound findings and noted their results in a dedicated paper form. Sonologist 2 was blinded to the results of sonologist 1. Information on the clinical variables included in LR1 and LR2 (personal history of ovarian cancer, current hormonal therapy, age of the patient) was obtained at the preoperative ultrasound examination by sonologist 2. All patients were operated on within 90 days after the preoperative ultrasound examination performed by sonologist 2. The excised tissues underwent histological examination, and tumors were classified according to the criteria recommended by the International Federation of Gynecology and Obstetrics (16). Borderline tumors were classified as malignant.

The patients were examined in the lithotomy position with an empty urinary bladder (11). Abdominal ultrasound examination was added when needed. The ultrasound variables assessed with regard to inter-observer reproducibility are shown in Table 1. The size of the lesion and that of its largest solid component were measured (largest diameter and mean of three orthogonal diameters) using calipers on the frozen ultrasound image. A color score was assigned on the basis of subjective assessment of the color content of the tumor scan at power


Doppler ultrasound examination. A color score of 1 indicates absence of color Doppler signals, a color score of 2 a minimal amount of color Doppler signals, a color score of 3 a moderate amount of color Doppler signals, and a color score of 4 a large amount of color Doppler signals in the tumor (11).

The ultrasound systems used were GE Voluson 730 Expert or GE Voluson E8 (GE Healthcare, Zipf, Austria) with a 5-9-MHz transvaginal transducer. For power Doppler ultrasound examinations the following settings were used for the Voluson 730 Expert system: frequency 6-9 ("normal") MHz, pulse repetition frequency 0.6 kHz, gain 0.8, wall motion filter "low 1" (40 Hz); and for the Voluson E8: frequency 6-9 ("normal") MHz, pulse repetition frequency 0.6 kHz, gain -4.0, wall motion filter "low 1" (40 Hz).

Statistical analysis

The IOTA3 study screen (astraia GmbH, Munich, Germany) was used to calculate the risk of malignancy according to LR1. Weighted Kappa indices were calculated using the statistical program Stata, Version 10.1 for Windows (StataCorp LP, College Station, TX, USA). For all other statistical calculations, including calculation of the risk of malignancy when using LR2, we used the Statistical Package for the Social Sciences (SPSS program; IBM Corp., New York, NY, USA; PASW version 18.0).

Inter-observer agreement in the assessment of categorical variables was estimated by calculating the percentage agreement. Cohen's kappa was used to estimate by how much the observed agreement exceeded that expected by chance (17). Weighted kappa values are presented where appropriate (18). It has been suggested that kappa values >0.81 indicate very good agreement beyond chance, kappa values between 0.61 and 0.80 good agreement beyond chance, kappa values between 0.41 and 0.60 moderate agreement beyond chance, kappa values between 0.21 and 0.40 fair agreement beyond chance, and kappa values <0.20 poor agreement beyond chance (19).
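As a rough illustration of the two quantities just described, the sketch below computes percentage agreement and unweighted Cohen's kappa from two observers' ratings and maps a kappa value onto the verbal categories quoted above. It is not the Stata or SPSS routine used in the study; the function names are hypothetical, weighted kappa is not covered, and the cut points in `interpret_kappa` close the small gaps between the quoted bands (for example 0.80 to 0.81), which is an assumption.

```python
from collections import Counter


def percentage_agreement(ratings1, ratings2):
    """Share of cases (in %) in which both observers gave the same rating."""
    same = sum(a == b for a, b in zip(ratings1, ratings2))
    return 100.0 * same / len(ratings1)


def cohen_kappa(ratings1, ratings2):
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(ratings1)
    p_observed = sum(a == b for a, b in zip(ratings1, ratings2)) / n
    counts1, counts2 = Counter(ratings1), Counter(ratings2)
    # Agreement expected by chance from each observer's marginal frequencies.
    p_expected = sum(
        counts1[c] * counts2[c] for c in counts1.keys() | counts2.keys()
    ) / (n * n)
    if p_expected == 1.0:
        return float("nan")  # kappa is undefined when chance agreement is 100%
    return (p_observed - p_expected) / (1.0 - p_expected)


def interpret_kappa(kappa):
    """Verbal label following the bands quoted in the text (gaps closed)."""
    if kappa > 0.80:
        return "very good"
    if kappa > 0.60:
        return "good"
    if kappa > 0.40:
        return "moderate"
    if kappa > 0.20:
        return "fair"
    return "poor"


if __name__ == "__main__":
    # Hypothetical yes/no ratings (1 = yes, 0 = no) for four masses.
    observer1 = [1, 1, 0, 0]
    observer2 = [1, 0, 0, 0]
    k = cohen_kappa(observer1, observer2)
    print(percentage_agreement(observer1, observer2), k, interpret_kappa(k))
```

The example shows why both numbers are reported: percentage agreement can be high while kappa is modest when one category dominates, because most of the agreement is then expected by chance.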


Inter-observer reproducibility of measurement results, including the calculated risks of malignancy using LR1 and LR2, was described as the difference between two measurement results. The differences between the measured values were plotted against the mean of the two measurements (Bland-Altman plots) to assess the relationship between the differences and the magnitude of the measurements (20). Systematic bias between two measurements was estimated by calculating the 95% confidence interval (CI) of the mean difference (mean difference ± 2 SE). If zero lay within this interval, no bias was assumed to exist between the two measurements. Inter-observer agreement was expressed as the mean difference and limits of agreement (20). Ninety-five percent of differences between any future measurements are estimated to fall between the lower and upper limit of agreement. Inter-observer reliability of measurement results was estimated by calculating the intra-class correlation coefficient (ICC) using analysis of variance (two-way random model, absolute agreement; this allows generalization of the results to a population of observers). The ICC indicates the proportion of the total variance in measurement results that can be explained by differences between the individuals examined. It depends both on the magnitude of measurement errors and on the true heterogeneity in the population in which measurements are made. The more variable the population investigated, the greater the ICC, and the less variable the population, the smaller the ICC (21). It has been suggested that ICC values >0.90 are needed for a test to be used in clinical practice (22).
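The calculations described in this paragraph can be sketched as follows. This is an illustration under stated assumptions and not the SPSS procedure used in the study: the limits of agreement are taken as the mean difference ± 2 SD of the differences, following Bland and Altman (20), and the ICC is computed as the single-measure, two-way random, absolute-agreement coefficient for two observers (whether single or average measures were reported is not stated in the text). All function names are hypothetical.

```python
import math
import statistics


def bland_altman(obs1, obs2):
    """Mean difference, its 95% CI (mean +/- 2 SE) and limits of agreement."""
    diffs = [a - b for a, b in zip(obs1, obs2)]  # observer 1 minus observer 2
    mean_diff = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)                 # sample SD of the differences
    se = sd / math.sqrt(len(diffs))
    ci = (mean_diff - 2 * se, mean_diff + 2 * se)
    return {
        "mean_diff": mean_diff,
        "ci95": ci,
        # Assumed here as mean +/- 2 SD.
        "limits_of_agreement": (mean_diff - 2 * sd, mean_diff + 2 * sd),
        # Bias is assumed to exist only if zero lies outside the CI.
        "systematic_bias": not (ci[0] <= 0.0 <= ci[1]),
    }


def icc_absolute_agreement(obs1, obs2):
    """Single-measure ICC, two-way random model, absolute agreement."""
    rows = list(zip(obs1, obs2))                 # one row per mass
    n, k = len(rows), 2
    grand = statistics.fmean(x for row in rows for x in row)
    row_means = [statistics.fmean(row) for row in rows]
    col_means = [statistics.fmean(obs1), statistics.fmean(obs2)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in rows for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)                  # between-subject mean square
    ms_cols = ss_cols / (k - 1)                  # between-observer mean square
    ms_error = ss_error / ((n - 1) * (k - 1))    # residual mean square
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


if __name__ == "__main__":
    # Hypothetical diameters (mm) measured by two observers.
    first = [10.0, 12.0, 14.0]
    second = [9.0, 12.0, 12.0]
    print(bland_altman(first, second))
    print(icc_absolute_agreement(first, second))
```

Because the absolute-agreement form includes the between-observer mean square in the denominator, a constant offset between observers lowers the ICC even when the two sets of measurements are perfectly correlated.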

The sensitivity and specificity of LR1 and LR2 with regard to malignancy were calculated separately using the information of sonologist 1 and that of sonologist 2.
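For illustration, sensitivity and specificity at the 10% cutoff could be computed per observer as sketched below. The function name and the example data are hypothetical; a mass counts as test-positive when its calculated risk exceeds the cutoff, and borderline tumors would be coded as malignant in the reference standard, as stated in the methods.

```python
def sensitivity_specificity(risks, is_malignant, cutoff=0.10):
    """Sensitivity and specificity of a risk model at a given cutoff.

    risks        -- calculated risks (0..1), one per mass
    is_malignant -- histology, True for malignant (including borderline)
    """
    tp = fn = tn = fp = 0
    for risk, malignant in zip(risks, is_malignant):
        test_positive = risk > cutoff
        if malignant and test_positive:
            tp += 1
        elif malignant:
            fn += 1
        elif test_positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Hypothetical risks from one observer and the matching histology.
    risks = [0.50, 0.05, 0.20, 0.01]
    histology = [True, True, False, False]
    print(sensitivity_specificity(risks, histology))
```

Calling the function once with each observer's calculated risks, against the same histology, gives the two pairs of values that can then be compared.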


Results

In all 117 consecutive women with adnexal masses who underwent surgery were

examined with ultrasound by the two sonologists as described above Thirty-four women had

bilateral adnexal masses The most complex mass - or the largest one if both masses had

similar ultrasound morphology - was used in our statistical analysis the mass to be included

being selected retrospectively to ensure that both sonologists contributed the same mass (right

or left) to the analysis Thus 117 adnexal masses from 117 patients constitute our study

population The womenrsquos age ranged between 14 and 88 years (median 53) and 63 (54)

women were postmenopausal There were 94 benign four borderline and 19 invasively

malignant adnexal masses (Table 2)

The time elapsed between the ultrasound examination of sonologist 1 and 2 was median 61

days (10th

and 90th

percentiles 13 and 132 range 1-204) for the tumors with benign histology

and median 14 days (10th

and 90th

percentiles 2 and 31 range 1-41) for the tumors with

malignant histology There was no relationship between the number of days between the

scans and the differences in measurement results or inter-observer agreement for discrete

variables (Supplementary Fig S1-S5 and Supplementary Table S1)

Inter-observer reproducibility of measurement results is shown in Table 3. Bland-Altman plots showed no clear trend for inter-observer differences in measurement results to change with the magnitude of the measurement values. Limits of agreement were wide for all measurements. There was one systematic difference between the two sonologists: sonologist 1 (who always performed the first examination) obtained higher measurement values for the maximum diameter of the mass. The least reliable measurement was the height of the largest papillary projection.
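
The quantities reported in Table 3 (mean difference, its 95% CI and the limits of agreement) can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function name `bland_altman`, the normal-approximation multiplier 1.96 and the five measurement pairs are assumptions made for the example.

```python
import math
import statistics


def bland_altman(first, second, z=1.96):
    """Summarize paired measurements in the Bland-Altman manner.

    Differences are computed as first minus second, mirroring the
    'sonologist 1 minus sonologist 2' convention of Table 3.
    """
    diffs = [a - b for a, b in zip(first, second)]
    mean_diff = statistics.mean(diffs)
    sd = statistics.stdev(diffs)      # sample SD of the differences
    se = sd / math.sqrt(len(diffs))   # standard error of the mean difference
    return {
        "mean_diff": mean_diff,
        "ci_95": (mean_diff - z * se, mean_diff + z * se),
        "limits_of_agreement": (mean_diff - z * sd, mean_diff + z * sd),
    }


# Hypothetical maximum diameters in mm (not study data).
sonologist_1 = [70, 45, 120, 88, 61]
sonologist_2 = [66, 47, 112, 85, 60]
summary = bland_altman(sonologist_1, sonologist_2)
print(summary)
```

A 95% CI of the mean difference that excludes zero corresponds to what the text calls a systematic difference, while the width of the limits of agreement reflects how far two individual measurements of the same mass may lie apart.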

Inter-observer agreement when assessing categorical ultrasound variables is shown in Table 4. For most categorical ultrasound variables, inter-observer agreement beyond chance was good or very good (19). Inter-observer agreement beyond chance for variables included in LR1 or LR2 was poorest for color score (agreement 40%, weighted Kappa 0.36), presence of blood flow in papillary projection (agreement 90%, Kappa 0.48), irregular cyst wall (agreement 79%, Kappa 0.56) and acoustic shadowing (agreement 85%, Kappa 0.58).
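
To make the distinction between percentage agreement and agreement beyond chance concrete, the sketch below computes Cohen's Kappa from a cross-tabulation of two observers. The function name `cohen_kappa`, the linear weighting used when `weighted=True` and the counts in `table` are assumptions for illustration; the individual cross-tabulations of the study are not given in the text, and the weighting scheme used for the weighted Kappa is not stated in this section.

```python
def cohen_kappa(table, weighted=False):
    """Cohen's Kappa for a square table (rows: observer 1, columns: observer 2).

    With weighted=True, disagreements are penalized in proportion to the
    distance between ordered categories (linear weights, an assumption).
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]

    def penalty(i, j):
        if weighted:
            return abs(i - j) / (k - 1)
        return 0.0 if i == j else 1.0

    cells = [(i, j) for i in range(k) for j in range(k)]
    observed = sum(penalty(i, j) * table[i][j] for i, j in cells) / n
    expected = sum(penalty(i, j) * row_tot[i] * col_tot[j] for i, j in cells) / n ** 2
    return 1.0 - observed / expected


# Hypothetical 2x2 table for a rare finding (yes/no), 117 masses in total:
# both 'yes' = 10, discordant = 6 + 6, both 'no' = 95.
table = [[10, 6],
         [6, 95]]
agreement = (table[0][0] + table[1][1]) / 117
kappa = cohen_kappa(table)
print(f"agreement {agreement:.0%}, Kappa {kappa:.2f}")
```

With a rare finding, most of the raw agreement comes from cases that both observers call negative, so a high percentage agreement can coexist with only moderate Kappa, as seen for blood flow in papillary projections.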

Bland-Altman plots illustrating the relationship between the magnitude of the estimated risk of malignancy calculated using LR1 and LR2 and the interobserver difference in calculated risk are shown in Figure 1. The plots manifest a diamond shape, i.e., the interobserver differences are smallest for the lowest and highest risks, and they are very small for risks <2.5% and >95%. Logarithmic transformation of the data (20) did not substantially change the shape of the scatter plot. Therefore, we present our results as absolute inter-observer differences in calculated risk (in percentage units); see Table 5. There were no systematic differences in calculated risks between the two sonologists, and reliability as reflected by the ICC values was good (22), with ICC values of 0.911 for LR1 and 0.832 for LR2. When classifying tumors as having a risk of malignancy <10% (benign) or >10% (malignant) using LR1 or LR2, the inter-observer agreement was good for both models: inter-observer agreement 84% (98/117), Kappa 0.68 for model LR1, and inter-observer agreement 85% (99/117), Kappa 0.68 for model LR2. In the 19 cases where the two sonologists obtained different results with regard to malignancy when using LR1, the absolute interobserver difference in calculated risk ranged from 0.7 to 59.6 percentage units: in six of the 19 cases it was <10.0 percentage units, in nine cases it was 10.0–24.9 percentage units, and in four cases it was >25.0 percentage units. In the 18 cases where the two sonologists obtained different results with regard to malignancy when using LR2, the absolute interobserver difference in calculated risk ranged from 8.8 to 67.9 percentage units: in two of the 18 cases it was <10.0 percentage units, in ten cases it was 10.0–24.9 percentage units, and in six cases it was >25 percentage units.

The Bland-Altman plots (Figure 1) illustrate that for some tumors there were substantial interobserver differences in the calculated risk of malignancy: when using LR1, the interobserver difference in calculated risk was >25 percentage units in 11 tumors (9% of all tumors), and when using LR2 it was >25 percentage units in 14 tumors (12% of all tumors). To elucidate which interobserver differences in ultrasound variables explained these largest interobserver differences in calculated risk, we scrutinized each case where the difference was >25 percentage units. The results are shown in Supplementary Tables S2 and S3. When using LR1, a discrepancy for one single categorical variable explained the difference in four of the 11 cases, while a discrepancy for two categorical variables explained the difference in one case (differences in measurements being <5 mm in these five cases). In six cases there were differences in one or two categorical variables but also substantial differences (6-61 mm) in at least one measurement result. In no case was the large difference in calculated risk explained exclusively by differences in measurement results. The categorical variables judged differently by the two sonologists in these 11 cases were color score (n = 5), irregular cyst wall (n = 5), flow in papillary projection (n = 3) and acoustic shadowing (n = 2).

When using LR2, a discrepancy for one single categorical variable explained the large difference in calculated risk (>25 percentage units) in eight of the 14 cases (differences in measurements being <5 mm in these eight cases), and in four of the eight cases the sonologists judged acoustic shadowing differently. In five cases there were differences in one categorical variable but also a substantial difference (9-61 mm) in the measurement of the largest solid component. In yet another case there were differences in two categorical variables as well as in the measurement of the largest solid component. The categorical variables judged differently by the two sonologists in these 14 cases were acoustic shadowing (n = 5), irregular cyst wall (n = 5), ascites (n = 3) and flow in papillary projection (n = 2).

The sensitivity with regard to malignancy when using LR1 (10% risk cutoff) was 100% (23/23; 95% CI, 82-100) for both sonologists; the specificity was 74% (70/94; 95% CI, 64-82) for sonologist 1 and 63% (59/94; 95% CI, 53-72) for sonologist 2. The sensitivity when using LR2 was 100% (23/23; 95% CI, 82-100) for sonologist 1 and 91% (21/23; 95% CI, 72-98) for sonologist 2, and the specificity was 75.5% (71/94; 95% CI, 65-84) for both sonologists.
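
The point estimates above follow directly from the reported counts (23 borderline or invasively malignant masses, 94 benign masses). The short sketch below redoes that arithmetic; the helper name `percent` and the dictionary layout are assumptions for illustration, and the confidence intervals are not recomputed because the interval method is not stated in this section.

```python
def percent(numerator, denominator):
    """Proportion expressed as a percentage."""
    return 100.0 * numerator / denominator


# (correctly classified, total) as reported in the text.
counts = {
    ("LR1", "sensitivity", "sonologist 1"): (23, 23),
    ("LR1", "sensitivity", "sonologist 2"): (23, 23),
    ("LR1", "specificity", "sonologist 1"): (70, 94),
    ("LR1", "specificity", "sonologist 2"): (59, 94),
    ("LR2", "sensitivity", "sonologist 1"): (23, 23),
    ("LR2", "sensitivity", "sonologist 2"): (21, 23),
    ("LR2", "specificity", "sonologist 1"): (71, 94),
    ("LR2", "specificity", "sonologist 2"): (71, 94),
}

results = {key: percent(*value) for key, value in counts.items()}
for (model, measure, observer), value in results.items():
    print(f"{model} {measure} {observer}: {value:.1f}%")
```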

Discussion

We have shown substantial inter-observer variability in the results of measurements taken in adnexal masses (wide limits of agreement). Inter-observer agreement beyond chance was very good or good for most categorical variables, but it was only moderate or fair for some. Inter-observer agreement above chance was poorest for variables heavily dependent on subjective evaluation and/or machine settings, i.e., color score, presence of color Doppler signals in papillary projections, irregular cyst walls, acoustic shadowing (all four variables being included in LR1 or LR2), echogenicity of cyst fluid, and ovarian crescent sign. Despite this, there was good inter-observer agreement when classifying tumors as benign or malignant using the predetermined risk of malignancy cut-off of 10%. However, in some cases there were substantial differences in the calculated risk of malignancy between the two sonologists, the difference being >25.0 percentage units in 9% of all tumors when using LR1 and in 12% of all tumors when using LR2.

The strength of our study is that it provides new information. To the best of our knowledge, there is only one publication reporting on interobserver agreement with regard to describing ultrasound findings in adnexal masses using the IOTA terminology (11) when performing live ultrasound examinations (23). However, that study (23) evaluated interobserver agreement with regard to the ten ultrasound features in the IOTA simple rules (24, 25), not the variables included in the IOTA logistic regression models LR1 and LR2, and agreement was estimated between examiners with different levels of experience. The variable with poorest agreement beyond chance in the study cited was acoustic shadowing (Kappa 0.36). We have found no published study that has estimated inter-observer reproducibility of the calculated risk of malignancy using LR1 or LR2 after live scanning.

It is a limitation of our study that up to 204 days elapsed between the scans of the two sonologists (up to 41 days for malignant masses). Because of this interval, the inter-observer differences could theoretically be explained by the lesions having changed in size or morphology between the scans. We find this highly unlikely for the following reasons. First, there was no relationship between the differences in measurement results and the number of days between the scans (Supplementary Fig. S1-S5). Nor was there a clear tendency for inter-observer agreement for discrete variables to depend on the time between the scans (Supplementary Table S1). Second, one would expect a lesion and its components to increase in size with time, but sonologist 1, performing the first scan, obtained higher measurement values than sonologist 2. Third, it is our experience, after having performed gynecological scans for more than 20 years, that the ultrasound morphology of both benign and malignant adnexal masses remains constant over time, that benign adnexal lesions grow slowly, and that malignant masses do not change appreciably in size even during 1 month of observation. Therefore, we believe that the discrepancies between the two sonologists reflect true inter-observer differences and not a change of the masses over time. A second limitation is that we did not include estimation of the reproducibility of retrieving anamnestic information (current hormonal therapy, personal history of ovarian cancer), the anamnestic information collected by the second sonologist being used in all cases. It cannot be entirely excluded that patients would answer differently when asked by different sonologists, or that sonologists could interpret the answers of the patients differently. A third limitation is that we did not estimate intra-observer reproducibility. We considered four scans (two per sonologist) likely to be unacceptable to patients. For the same reason only two sonologists were involved in this study, and our results are generalizable only to sonologists with a similar level of experience.

The results of this live scanning study are similar to those of another study in which the same sonologists assessed the same variables using 3D ultrasound volumes from adnexal masses in another tumor population (15). The similarity in results between the two studies is surprising, because the conditions when assessing 3D ultrasound volumes are different from those during a live scan. When evaluating ultrasound volumes, sonologists are exposed to the same ultrasound images, and so any interobserver difference should be explained exclusively by differences in interpreting the ultrasound information. During a live scan there are more sources of bias. This could result either in poorer or better interobserver agreement than when 3D ultrasound volumes are assessed: poorer because ultrasound examiners are likely to use different machine settings and scanning conditions may change from one minute to another; better because the dynamic nature of live scanning facilitates discrimination between solid components and amorphous tissue.

Our results showed that two experienced sonologists agreed quite well in their classification of masses as benign or malignant using the 10% risk of malignancy cutoff of LR1 and LR2, and that the diagnostic performance of LR1 and LR2 with regard to discrimination between benign and malignant tumors was similar for the two sonologists and similar to that reported by others (14, 26-28). This is reassuring, because the main purpose of using models LR1 and LR2 is to classify tumors as benign or malignant. Potentially, however, LR1 and LR2 can be used not only to classify adnexal masses as benign or malignant but also to counsel a patient about her individual risk of malignancy (13). To use the calculated risk for individual counseling, one must be reasonably certain not only that the estimated risk agrees well with the true risk (when externally validated, both LR1 and LR2 underestimated the true risk, especially in the risk interval 30-70% (14)), but also that the risk estimates are reproducible, i.e., that different examiners will obtain similar risk estimates. Our results show that risk estimates may differ substantially between experienced observers, the difference in estimated risk being >25.0 percentage units in 9% and 12% of cases when using LR1 and LR2, respectively. Interobserver agreement above chance was poorest for those variables in the models that are heavily dependent on subjective evaluation, i.e., color score, presence of color Doppler signals in papillary projections, irregular cyst walls and acoustic shadowing. Indeed, differences in these explained most of the largest inter-observer differences in calculated risk of malignancy. In models based on few variables, changing values in only one variable may result in large differences in predicted risks, while a model with many variables is less vulnerable to a change in one or even a few variables. Our results illustrate this (Supplementary Tables S2 and S3). When using LR2 (which includes six variables), a change in value for one single categorical variable explained an inter-observer difference in calculated risk >25 percentage units in eight of 14 cases, while when using LR1 (which includes 12 variables), a change in value for one single categorical variable explained an inter-observer difference in calculated risk >25 percentage units in only four of 11 cases. Acoustic shadowing is a strong variable in both LR1 and LR2 and has great impact on the calculated risk in LR2, which has only six variables. In our hands, as well as in those of Ruiz de Gauna et al. (23), inter-observer agreement for acoustic shadowing was at most moderate. The interobserver agreement for color score was only fair in our study, and color score is an important variable in LR1.
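
As a rough worked illustration (not part of the original analysis), the effect of a disagreement on one binary variable can be read off the model equations given in the footnote to Table 1. Holding all other variables fixed, recording acoustic shadows as present instead of absent lowers z by the corresponding coefficient, which divides the odds of malignancy by a fixed factor. The starting value z = 0 is an assumed example chosen only to keep the arithmetic simple.

```latex
\begin{align*}
y &= \frac{1}{1 + e^{-z}}, \qquad \frac{y}{1-y} = e^{z} \\[4pt]
\text{LR2:}\quad \frac{\text{odds (no shadows)}}{\text{odds (shadows)}} &= e^{2.9486} \approx 19.1 \\
\text{LR1:}\quad \frac{\text{odds (no shadows)}}{\text{odds (shadows)}} &= e^{2.3550} \approx 10.5 \\[4pt]
\text{LR2, assumed } z = 0 \text{ without shadows:}\quad y &= 50\% \\
\text{same mass with shadows recorded:}\quad y &= \frac{1}{1 + e^{2.9486}} \approx 5.0\%
\end{align*}
```

Under this assumption, a single discordant judgment on acoustic shadowing moves the LR2 estimate by about 45 percentage units, which fits the observation that one categorical discrepancy alone explained eight of the 14 largest differences for LR2.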

To improve inter-observer reproducibility of calculated risks based on LR1 and LR2, inter-observer differences in descriptions and measurements of adnexal masses using the IOTA terminology and measurement technique need to be reduced. One way to achieve this could be by providing courses on and training in how to examine and describe adnexal masses using the IOTA terms. Interactive courses in which a large number of ultrasound images are discussed with the course participants are likely to be very valuable in this respect. More precise definitions of the IOTA terms, for example by providing ample imaging material, would probably also help improve inter-observer agreement. Special attention should be given to the variables with poorest reproducibility, i.e., the color score, wall irregularity, acoustic shadowing and detection of blood flow in papillary projections. Until better inter-observer agreement in the calculated risk of malignancy using LR1 and LR2 has been shown, one should be cautious with using the risk estimates for individual patient counselling.

Acknowledgements

None

References

1. Granberg S, Norström A, Wikland M. Tumors in the lower pelvis as imaged by vaginal sonography. Gynecol Oncol 1990;37:224-9.

2. Benacerraf BR, Finkler NJ, Wojciechowski C, Knapp RC. Sonographic accuracy in the diagnosis of ovarian masses. J Reprod Med 1990;35:491-5.

3. Valentin L. Pattern recognition of pelvic masses by gray-scale ultrasound imaging: the contribution of Doppler ultrasound. Ultrasound Obstet Gynecol 1999;14:338-47.

4. Valentin L. Prospective cross-validation of Doppler ultrasound examination and gray-scale ultrasound imaging for discrimination of benign and malignant pelvic masses. Ultrasound Obstet Gynecol 1999;14:273-83.

5. Timmerman D, Schwärzler P, Collins WP, Claerhout F, Coenen M, Amant F, et al. Subjective assessment of adnexal masses with the use of ultrasonography: an analysis of interobserver variability and experience. Ultrasound Obstet Gynecol 1999;13:11-6.

6. Sokalska A, Timmerman D, Testa AC, Van Holsbeke C, Lissoni AA, Leone FPG, et al. Diagnostic accuracy of transvaginal ultrasound examination for assigning a specific diagnosis to adnexal masses. Ultrasound Obstet Gynecol 2009;34:462-70.

7. Valentin L. Use of morphology to characterize and manage common adnexal masses. Best Pract Res Clin Obstet Gynaecol 2004;18:71-89.

8. Tailor A, Jurkovic D, Bourne TH, Collins WP, Campbell S. Sonographic prediction of malignancy in adnexal masses using multivariate logistic regression analysis. Ultrasound Obstet Gynecol 1997;10:41-7.

9. Timmerman D, Bourne TH, Tailor A, Collins WP, Verrelst H, Vandenberghe K, et al. A comparison of methods for the preoperative discrimination between benign and malignant adnexal masses: the development of a new logistic regression model. Am J Obstet Gynecol 1999;181:57-65.

10. Alcazar JL, Jurado M. Prospective evaluation of logistic model based on sonographic morphologic and color Doppler findings developed to predict adnexal malignancy. J Ultrasound Med 1999;18:837-42.

11. Timmerman D, Valentin L, Bourne TH, Collins WP, Verrelst H, Vergote I. Terms, definitions and measurements to describe the sonographic features of adnexal tumors: a consensus opinion from the International Ovarian Tumor Analysis (IOTA) Group. Ultrasound Obstet Gynecol 2000;16:500-5.

12. Timmerman D, Testa AC, Bourne T, Ferrazzi E, Ameye L, Konstantinovic ML, et al. Logistic regression model to distinguish between the benign and malignant adnexal mass before surgery: a multicenter study by the International Ovarian Tumor Analysis Group. J Clin Oncol 2005;23:8794-801.

13. Kaijser J, Bourne T, Valentin L, Sayasneh A, Van Holsbeke C, Vergote I, et al. Improving strategies for diagnosing ovarian cancer: a summary of the International Ovarian Tumor Analysis (IOTA) studies. Ultrasound Obstet Gynecol 2013;41:9-20.

14. Timmerman D, Van Calster B, Testa AC, Guerriero S, Fischerova D, Lissoni AA, et al. Ovarian cancer prediction in adnexal masses using ultrasound-based logistic regression models: a temporal and external validation study by the IOTA group. Ultrasound Obstet Gynecol 2010;36:226-34.

15. Sladkevicius P, Valentin L. Intra- and inter-observer agreement when describing adnexal masses using the International Ovarian Tumour Analysis (IOTA) terms and definitions: a study on three-dimensional (3D) ultrasound volumes. Ultrasound Obstet Gynecol 2013;41:318-27.

16. Heintz APM, Odicino F, Maisonneuve P, Beller U, Benedet JL, Creasman WT, et al. Carcinoma of the Ovary. 25th Annual Report on the Results of Treatment in Gynecological Cancer. Int J Gynecol Obstet 2003;83(suppl 1):S135-S166.

17. Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas 1960;20:37–46.

18. Kundel HL, Polansky M. Measurement of observer agreement. Radiology 2003;228:303-8.

19. Brennan P, Silman A. Statistical methods for assessing observer variability in clinical measures. BMJ 1992;304:1491-4.

20. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986;1:307-10.

21. Bartlett JW, Frost C. Reliability, repeatability and reproducibility: analysis of measurement errors in continuous variables. Ultrasound Obstet Gynecol 2008;31:466-75.

22. Kottner J, Audigé L, Brorson S, Donner A, Gajewski BJ, Hróbjartsson A, et al. Guidelines for Reporting Reliability and Agreement Studies (GRRAS) were proposed. J Clin Epidemiol 2011;64:96-106.

23. Ruiz de Gauna B, Sanchez P, Pineda L, Utrilla-Layna J, Juez L, Alcázar JL. Inter-observer agreement with regard to describing adnexal masses using the IOTA simple rules in a real-time setting and when using three-dimensional ultrasound volumes and digital clips. Ultrasound Obstet Gynecol 2014;44:95-100.

24. Timmerman D, Testa AC, Bourne T, Ameye L, Jurkovic D, Van Holsbeke C, et al. Simple ultrasound-based rules for the diagnosis of ovarian cancer. Ultrasound Obstet Gynecol 2008;31:681-90.

25. Timmerman D, Ameye L, Fischerova D, Epstein E, Melis GB, Guerriero S, et al. Simple ultrasound rules to distinguish between benign and malignant adnexal masses before surgery: prospective validation by IOTA group. BMJ 2010;341:c6839.

26. Van Holsbeke C, Van Calster B, Testa AC, Domali E, Lu C, Van Huffel S, et al. Prospective internal validation of mathematical models to predict malignancy in adnexal masses: results from the international ovarian tumor analysis study. Clin Cancer Res 2009;15:684-91.

27. Nunes N, Yazbek J, Ambler G, Hoo W, Naftalin J, Jurkovic D. Prospective evaluation of the IOTA logistic regression model LR2 for the diagnosis of ovarian cancer. Ultrasound Obstet Gynecol 2012;40:355-9.

28. Nunes N, Ambler G, Hoo WL, Naftalin J, Foo X, Widschwendter M, et al. A prospective validation of the IOTA logistic regression models (LR1 and LR2) in comparison to subjective pattern recognition for the diagnosis of ovarian cancer. Int J Gynecol Cancer 2013;23:1583-9.

Table 1. Variables for which inter-observer reproducibility was estimated
___________________________________________________________________________
Variables included in logistic regression models LR1 and LR2 or of importance for these models
  Maximum diameter of adnexal mass                      mm
  Maximum diameter of largest solid component           mm
  Maximum diameter of largest solid component           ≤50, >50 mm
  Presence of an entirely solid adnexal mass            yes, no
  Papillary projections present                         yes, no
  Irregular internal cyst walls                         yes, no
  Acoustic shadows                                      yes, no
  Color Doppler signals in papillary projection         yes, no
  Color score                                           1, 2, 3, 4
  Tenderness of adnexal mass at scan                    yes, no
  Ascites                                               yes, no
Other IOTA variables
 Continuous IOTA variables
  Mean diameter of adnexal mass                         mm
  Mean diameter of largest solid component              mm
  Height of the largest papillary projection            mm
 Categorical IOTA variables
  Type of tumor                                         unilocular, unilocular solid, multilocular, multilocular solid, solid
  Mean diameter of adnexal mass                         ≤40, 41-60, 61-80, 81-100 and >100 mm
  Number of cyst locules                                0, 1, 2, 3, 4, 5, 6-10, 11-20 and >20; ≤10, >10†
  Septum present                                        yes, no
  Incomplete septum present                             yes, no
  Solid component                                       yes, no
  Papillary projections                                 0, 1, 2, 3, ≥4; 1, 2, 3, ≥4‡; <4, >4†
  Echogenicity of cyst fluid                            anechoic, low level, ground glass, mixed, no cyst fluid
  Color Doppler signals detectable                      yes, no
  Ovarian crescent sign                                 yes, no
Diagnosis based on LR1 and LR2
  Calculated risk of malignancy using LR1               ≤10%, >10%
  Calculated risk of malignancy using LR2               ≤10%, >10%
___________________________________________________________________________
IOTA, international ovarian tumor analysis group; LR1, logistic regression model 1; LR2, logistic regression model 2.

The risk of malignancy using LR1 is derived as $y = 1/(1 + e^{-z})$, where $z = -6.7468 + 1.5985\,(a) - 0.9983\,(b) + 0.0326\,(c) + 0.00841\,(d) - 0.8577\,(e) + 1.5513\,(f) + 1.1737\,(g) + 0.9281\,(h) + 0.0496\,(i) + 1.1421\,(j) - 2.3550\,(k) + 0.4916\,(l)$ and $e$ is the mathematical constant and base value of natural logarithms. The 12 variables included in LR1 are: (a) personal history of ovarian cancer (yes = 1, no = 0); (b) current hormonal therapy (yes = 1, no = 0); (c) age of the patient (in years); (d) maximum diameter of the lesion (in millimeters); (e) mass tender at scan (yes = 1, no = 0); (f) the presence of ascites (yes = 1, no = 0); (g) the presence of blood flow within a solid papillary projection (yes = 1, no = 0); (h) the presence of a purely solid tumor (yes = 1, no = 0); (i) maximal diameter of the solid component (expressed in millimeters, but with no increase >50 mm); (j) irregular internal cyst walls (yes = 1, no = 0); (k) the presence of acoustic shadows (yes = 1, no = 0); and (l) color score (1, 2, 3 or 4).

The risk of malignancy using LR2 is derived as $y = 1/(1 + e^{-z})$, where $z = -5.3718 + 0.0354\,(c) + 1.6159\,(f) + 1.1768\,(g) + 0.0697\,(i) + 0.9586\,(j) - 2.9486\,(k)$ (12).

A calculated risk of malignancy of more than 10% classified the adnexal mass as malignant.
† includes 0
‡ includes only cases where papillary projections were registered at both examinations
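
The two model equations in the footnote above can be sketched in code as follows. The coefficients are those given for LR1 and LR2; the dictionary keys, the function name `malignancy_risk` and the example patient are hypothetical names and values introduced only for illustration, and the sketch is not a validated clinical calculator.

```python
import math

# Coefficients from the footnote to Table 1; key names are illustrative.
LR1 = {
    "intercept": -6.7468,
    "history_ovarian_cancer": 1.5985,   # (a) yes = 1, no = 0
    "hormonal_therapy": -0.9983,        # (b)
    "age_years": 0.0326,                # (c)
    "max_diameter_mm": 0.00841,         # (d)
    "tender": -0.8577,                  # (e)
    "ascites": 1.5513,                  # (f)
    "papillary_flow": 1.1737,           # (g)
    "purely_solid": 0.9281,             # (h)
    "max_solid_mm": 0.0496,             # (i) no increase above 50 mm
    "irregular_walls": 1.1421,          # (j)
    "acoustic_shadows": -2.3550,        # (k)
    "color_score": 0.4916,              # (l) 1, 2, 3 or 4
}
LR2 = {
    "intercept": -5.3718,
    "age_years": 0.0354,
    "ascites": 1.6159,
    "papillary_flow": 1.1768,
    "max_solid_mm": 0.0697,
    "irregular_walls": 0.9586,
    "acoustic_shadows": -2.9486,
}


def malignancy_risk(model, findings):
    """Return y = 1 / (1 + exp(-z)) for the given model and findings."""
    values = dict(findings)
    # The solid-component diameter contributes no further above 50 mm.
    values["max_solid_mm"] = min(values["max_solid_mm"], 50)
    z = model["intercept"] + sum(
        coef * values[name] for name, coef in model.items() if name != "intercept"
    )
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical patient; the two observers differ only on acoustic shadows.
observer_1 = {"age_years": 53, "ascites": 0, "papillary_flow": 0,
              "max_solid_mm": 20, "irregular_walls": 1, "acoustic_shadows": 0}
observer_2 = dict(observer_1, acoustic_shadows=1)

risk_1 = malignancy_risk(LR2, observer_1)
risk_2 = malignancy_risk(LR2, observer_2)
print(f"observer 1: {risk_1:.1%}, observer 2: {risk_2:.1%}")
```

In this made-up case the two estimates fall on opposite sides of the 10% cutoff, although the observers disagree on a single yes/no variable.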

Table 2. Histological diagnoses of the masses
___________________________________________________________________________
Histological diagnosis of tumor                  Number
___________________________________________________________________________
Benign tumors                                      94
  Benign simple cyst                                7
  Endometrioma                                     10
  Dermoid cyst                                     16
  Serous cystadenoma                               16
  Mucinous cystadenoma                             18
  Myoma/fibroma                                     9
  Cystadenofibroma                                 11
  Paraovarian cyst                                  5
  Sactosalpinx/chronic salpingitis                  1
  Leydig cell tumor                                 1
Borderline tumors                                   4
  Serous                                            2
  Mucinous                                          1
  Endometrioid                                      1
Invasive malignancy                                19
  Primary ovarian adenocarcinoma                   13
  Granulosa cell tumor                              3
  Dysgerminoma                                      1
  Leiomyosarcoma                                    1
  Malignant aggressive B-cell lymphoma              1
___________________________________________________________________________

Table 3. Inter-observer reproducibility of continuous variables used in the risk calculation models LR1 and LR2 and of other continuous variables used to describe adnexal masses

Median and range: measurement results of both sonologists. Mean, 95% CI and limits of agreement: difference in mm between the two measurements made by sonologists 1 and 2 (a). Intra-CC: point estimate (95% CI).
___________________________________________________________________________
Parameter                                             Median (n)      Range      Mean (n)        95% CI           Limits of agreement    Intra-CC (95% CI)
___________________________________________________________________________
Variables used in models LR1 and LR2
  Maximum diameter of adnexal mass, mm                70 (n=234)      10 – 313   3.80 (n=117)    1.12 – 6.48      -25.24 – 32.82         0.958 (0.937 – 0.971)
  Maximum diameter of largest solid component, mm (b) 29.50 (n=122)   5 – 180    1.92 (n=61)     -1.74 – 5.58     -26.66 – 30.50         0.942 (0.905 – 0.964)
Other variables used to describe adnexal mass
  Mean diameter of adnexal mass, mm                   58.5 (n=234)    9 – 240    1.05 (n=117)    -0.15 – 1.95     -8.61 – 10.72          0.971 (0.958 – 0.980)
  Mean diameter of largest solid component, mm (b)    22 (n=122)      4 – 156    0.59 (n=61)     -1.82 – 2.98     -18.16 – 19.32         0.962 (0.937 – 0.977)
  Height of largest papillary projection, mm (c)      8 (n=42)        3 – 25     -0.51 (n=21)    -2.93 – 1.91     -11.61 – 10.59         0.609 (0.245 – 0.821)
___________________________________________________________________________
CI, confidence interval; Intra-CC, intra-class correlation coefficient.
(a) The measurements made by sonologist 2 were subtracted from the measurements made by sonologist 1.
(b) Includes only cases where solid components were registered by both observers.
(c) Includes only cases where papillary projections were registered by both observers; using logarithmically transformed data the results were as follows: mean difference 1.03 (95% CI, 0.78 – 1.36), limits of agreement 0.31 – 3.61, Intra-CC 0.547 (0.153 – 0.789). These results can be used for comparison with our published results on inter-observer agreement when analyzing ultrasound volumes obtained by three-dimensional ultrasound (15).

Table 4. Inter-observer agreement for categorical variables included in the risk calculation models LR1 and LR2 and for other categorical variables used to describe adnexal masses
____________________________________________________________________________________________
Variable                                                                          Agreement, % (n/N)    Kappa value
____________________________________________________________________________________________
Variables used in models LR1 and LR2
  Presence of entirely solid tumor (yes, no)                                      97 (114/117)          0.91
  Maximum diameter of largest solid component (≤50 mm, >50 mm)                    95 (111/117)          0.83
  Maximum diameter of largest solid component (≤50 mm, >50 mm) (a)                92 (56/61)            0.82
  Tenderness of adnexal mass (yes, no)                                            97 (113/117)          0.78
  Ascites (yes, no)                                                               97 (113/117)          0.73
  Papillary projections (yes, no)                                                 87 (102/117)          0.65
  Solid component (yes, no)                                                       84 (98/117)           0.67
  Irregular cyst wall (yes, no)                                                   79 (92/117)           0.56
  Acoustic shadows (yes, no)                                                      85 (99/117)           0.58
  Color Doppler signals in papillary projections (yes, no) (b)                    90 (105/117)          0.48
  Color Doppler signals in papillary projections (yes, no) (c)                    86 (18/21)            0.71
  Color score (1, 2, 3, 4)                                                        40 (47/117)           0.36 (d)
Other variables used to describe adnexal masses
  Tumor type (unilocular, unilocular solid, multilocular, multilocular solid, solid)  97 (113/117)      0.70
  Mean diameter of tumor (mm): ≤40, 41-60, 61-80, 81-100, >100                    66 (77/117)           0.76 (d)
  Number of locules: 0, 1, 2, 3, 4, 5, 6-10, 11-20, >20                           58 (68/117)           0.79 (d)
  Number of locules: ≤10, >10 (e)                                                 94 (110/117)          0.80
  Septum (yes, no)                                                                88 (103/117)          0.76
  Incomplete septum (yes, no)                                                     98 (115/117)          ¶
  Number of papillary projections: 0, 1, 2, 3, ≥4                                 81 (95/117)           0.64 (d)
  Number of papillary projections: 1, 2, 3, ≥4 (c)                                67 (14/21)            0.63 (d)
  Number of papillary projections: <4, >4 (e)                                     96 (112/117)          0.68
  Echogenicity of cyst fluid (anechoic, low level, ground glass, mixed, no fluid) 67 (78/117)           0.56
  Color Doppler signals detectable (yes, no)                                      84 (98/117)           0.30
  Ovarian crescent sign (yes, no)                                                 78 (91/117)           0.49
  Fluid in Douglas pouch (yes, no)                                                87 (102/117)          0.72
____________________________________________________________________________________________
(a) Includes only cases where solid components were registered at both examinations.
(b) This variable reflects disagreement both with regard to the presence of papillary projections and the presence of blood flow in papillary projections (i.e., if one observer recorded the presence of papillary projections and the other did not, there was disagreement with regard to this variable, and if both observers recorded the presence of papillary projections but disagreed with regard to the presence of blood flow in the papillary projections, there was disagreement with regard to this variable).
(c) Includes only cases where papillary projections were registered at both examinations.
(d) Weighted Kappa index.
(e) Includes 0.
¶ Absent Kappa values are explained by asymmetric field tables, making it impossible to calculate them.


Table 5. Inter-observer reproducibility of the risk of malignancy calculated using the risk calculation models LR1 and LR2

| Parameter | Calculated risk of malignancy (both sonologists): median | Calculated risk of malignancy (both sonologists): range | Difference between the risk calculated by sonologist 1 and 2: mean | Difference: 95% CI | Difference: limits of agreement | Intra-CC: point estimate (95% CI) |
|---|---|---|---|---|---|---|
| Risk of malignancy calculated using LR1 | 7.85 (n=234) | 0.10–99.10 | -0.53 (n=117) | -3.07 – 2.01 | -28.05 – 26.99 | 0.911 (0.874–0.937) |
| Risk of malignancy calculated using LR2 | 6.65 (n=234) | 0.10–98.40 | 0.02 (n=117) | -3.06 – 3.10 | -33.22 – 33.26 | 0.832 (0.766–0.880) |

CI, confidence interval; Intra-CC, intra-class correlation coefficient.


Legends for figure

Figure 1. a) Scatterplot showing the relationship between inter-observer difference (observer 1 minus observer 2) in calculated risk and magnitude of calculated risk when using logistic regression model LR1. The plot manifests a diamond shape, the differences being smallest for the lowest and highest risks. For risks <2.5% and >95%, the differences are very small. LOA, limits of agreement. b) Scatterplot showing the relationship between inter-observer difference in calculated risk and magnitude of calculated risk when using logistic regression model LR2. The plot manifests a diamond shape, the differences being smallest for the lowest and highest risks.


Povilas Sladkevicius and Lil Valentin. Inter-observer agreement in describing the ultrasound appearance of adnexal masses and in calculating the risk of malignancy using logistic regression models. Clin Cancer Res. Published OnlineFirst November 25, 2014. doi: 10.1158/1078-0432.CCR-14-0906


Doppler ultrasound examination. A color score of 1 indicates absence of color Doppler signals, a color score of 2 a minimal amount of color Doppler signals, a color score of 3 a moderate amount of color Doppler signals, and a color score of 4 a large amount of color Doppler signals in the tumor (11).

The ultrasound systems used were GE Voluson 730 Expert or GE Voluson E8 (GE Healthcare, Zipf, Austria) with a 5–9-MHz transvaginal transducer. For power Doppler ultrasound examinations, the following settings were used: for the Voluson 730 Expert system, frequency 6-9 ("normal") MHz, pulse repetition frequency 0.6 kHz, gain 0.8, wall motion filter "low 1" (40 Hz); and for the Voluson E8, frequency 6-9 ("normal") MHz, pulse repetition frequency 0.6 kHz, gain -4.0, wall motion filter "low 1" (40 Hz).

Statistical analysis

The IOTA3 study screen (astraia GmbH, Munich, Germany) was used to calculate the risk of malignancy according to LR1. Weighted Kappa indices were calculated using the statistical program Stata, Version 10.1 for Windows (StataCorp LP, College Station, TX, USA). For all other statistical calculations, including calculation of the risk of malignancy when using LR2, we used the Statistical Package for the Social Sciences (SPSS program, IBM Corp., New York, NY, USA; PASW version 18.0).

Inter-observer agreement in the assessment of categorical variables was estimated by calculating the percentage agreement. Cohen's kappa was used to estimate by how much the observed agreement exceeded that expected by chance (17). Weighted kappa values are presented where appropriate (18). It has been suggested that kappa values >0.81 indicate very good agreement beyond chance, kappa values between 0.61 and 0.80 good agreement beyond chance, kappa values between 0.41 and 0.60 moderate agreement beyond chance, kappa values between 0.21 and 0.40 fair agreement beyond chance, and kappa values <0.20 poor agreement beyond chance (19).
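The kappa calculation described above can be illustrated with a minimal Python sketch. It is not the Stata routine used in the study: the function names, the example ratings, and the choice of linear disagreement weights for the weighted kappa are assumptions made for illustration (the text does not state which weighting scheme was applied), and the handling of values that fall exactly on a band boundary is likewise assumed.

```python
def cohen_kappa(ratings_a, ratings_b, categories, weighted=False):
    """Cohen's kappa for two observers; linear weights if weighted=True (assumed scheme)."""
    n = len(ratings_a)
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    # counts[i][j] = number of cases rated category i by observer A and j by observer B
    counts = [[0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        counts[index[a]][index[b]] += 1
    row = [sum(counts[i]) / n for i in range(k)]
    col = [sum(counts[i][j] for i in range(k)) / n for j in range(k)]

    def weight(i, j):
        if weighted:
            return 1.0 - abs(i - j) / (k - 1)  # partial credit for near misses
        return 1.0 if i == j else 0.0

    cells = [(i, j) for i in range(k) for j in range(k)]
    p_observed = sum(weight(i, j) * counts[i][j] / n for i, j in cells)
    p_expected = sum(weight(i, j) * row[i] * col[j] for i, j in cells)
    return (p_observed - p_expected) / (1.0 - p_expected)


def interpret_kappa(kappa):
    """Map kappa to the verbal bands quoted in the text (boundary handling assumed)."""
    if kappa >= 0.81:
        return "very good"
    if kappa >= 0.61:
        return "good"
    if kappa >= 0.41:
        return "moderate"
    if kappa >= 0.21:
        return "fair"
    return "poor"


# Hypothetical yes/no ratings for 100 masses (not study data).
observer_1 = ["yes"] * 20 + ["no"] * 70 + ["yes"] * 5 + ["no"] * 5
observer_2 = ["yes"] * 20 + ["no"] * 70 + ["no"] * 5 + ["yes"] * 5
kappa = cohen_kappa(observer_1, observer_2, ["yes", "no"])
print(round(kappa, 2), interpret_kappa(kappa))
```

In this made-up example the percentage agreement is 90%, yet kappa is lower because much of that agreement is expected by chance when one category dominates.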


Inter-observer reproducibility of measurement results, including the calculated risks of malignancy using LR1 and LR2, was described as the difference between two measurement results. The differences between the measured values were plotted against the mean of the two measurements (Bland-Altman plots) to assess the relationship between the differences and the magnitude of the measurements (20). Systematic bias between two measurements was estimated by calculating the 95% confidence interval (CI) of the mean difference (mean difference ± 2 SE). If zero lay within this interval, no bias was assumed to exist between the two measurements. Inter-observer agreement was expressed as the mean difference and limits of agreement (20). Ninety-five percent of differences between any future measurements are estimated to fall between the lower and upper limit of agreement. Inter-observer reliability of measurement results was estimated by calculating the intra-class correlation coefficient (ICC) using analysis of variance (two-way random model, absolute agreement; this allows generalization of the results to a population of observers). The ICC indicates the proportion of the total variance in measurement results that can be explained by differences between the individuals examined. It depends both on the magnitude of measurement errors and the true heterogeneity in the population in which measurements are made. The more variable the population investigated, the greater the ICC, and the less variable the population, the smaller the ICC (21). It has been suggested that ICC values >0.90 are needed for a test to be used in clinical practice (22).
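The quantities defined in this paragraph can be sketched in Python as follows. This is an illustration under stated assumptions rather than the SPSS procedure used in the study: the limits of agreement are taken as mean difference ± 2 SD, following the Bland-Altman approach cited as (20); the ICC is computed in its single-measures form for a two-way random model with absolute agreement (the text does not say whether single or average measures were reported); and the function names and example values are hypothetical.

```python
import math
from statistics import mean, stdev


def bland_altman(obs1, obs2):
    """Mean difference, its approximate 95% CI (mean +/- 2 SE) and limits of agreement."""
    diffs = [a - b for a, b in zip(obs1, obs2)]
    d = mean(diffs)
    sd = stdev(diffs)
    se = sd / math.sqrt(len(diffs))
    ci = (d - 2 * se, d + 2 * se)
    return {
        "mean_diff": d,
        "ci95": ci,
        "loa": (d - 2 * sd, d + 2 * sd),       # assumed: mean +/- 2 SD
        "systematic_bias": not (ci[0] <= 0 <= ci[1]),
    }


def icc_absolute_agreement(obs1, obs2):
    """Single-measures ICC, two-way random model, absolute agreement (two observers)."""
    n, k = len(obs1), 2
    values = list(obs1) + list(obs2)
    grand = mean(values)
    subject_means = [(a + b) / 2 for a, b in zip(obs1, obs2)]
    observer_means = [mean(obs1), mean(obs2)]
    ss_subjects = k * sum((m - grand) ** 2 for m in subject_means)
    ss_observers = n * sum((m - grand) ** 2 for m in observer_means)
    ss_total = sum((x - grand) ** 2 for x in values)
    ss_error = ss_total - ss_subjects - ss_observers
    ms_subjects = ss_subjects / (n - 1)
    ms_observers = ss_observers / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_subjects - ms_error) / (
        ms_subjects + (k - 1) * ms_error + k * (ms_observers - ms_error) / n
    )


# Hypothetical paired measurements in mm (not study data).
sonologist_1 = [52.0, 78.0, 33.0, 110.0, 64.0, 45.0]
sonologist_2 = [49.0, 80.0, 30.0, 104.0, 66.0, 41.0]
print(bland_altman(sonologist_1, sonologist_2))
print(icc_absolute_agreement(sonologist_1, sonologist_2))
```

Because the absolute-agreement form penalizes a constant offset between observers, two observers who differ by a fixed amount on every case obtain an ICC below 1 even though their rankings agree perfectly.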

The sensitivity and specificity of LR1 and LR2 with regard to malignancy were calculated separately using the information recorded by sonologist 1 and by sonologist 2.


Results

In all, 117 consecutive women with adnexal masses who underwent surgery were examined with ultrasound by the two sonologists as described above. Thirty-four women had bilateral adnexal masses. The most complex mass, or the largest one if both masses had similar ultrasound morphology, was used in our statistical analysis, the mass to be included being selected retrospectively to ensure that both sonologists contributed the same mass (right or left) to the analysis. Thus, 117 adnexal masses from 117 patients constitute our study population. The women's age ranged between 14 and 88 years (median 53), and 63 (54%) women were postmenopausal. There were 94 benign, four borderline, and 19 invasively malignant adnexal masses (Table 2).

The time elapsed between the ultrasound examinations of sonologist 1 and 2 was median 61 days (10th and 90th percentiles 13 and 132; range 1-204) for the tumors with benign histology and median 14 days (10th and 90th percentiles 2 and 31; range 1-41) for the tumors with malignant histology. There was no relationship between the number of days between the scans and the differences in measurement results or inter-observer agreement for discrete variables (Supplementary Fig. S1-S5 and Supplementary Table S1).

Inter-observer reproducibility of measurement results is shown in Table 3. Bland-Altman plots showed no clear trend for inter-observer differences in measurement results to change with the magnitude of the measurement values. Limits of agreement were wide for all measurements. There was one systematic difference between the two sonologists, sonologist 1 (who always performed the first examination) obtaining higher measurement values for the maximum diameter of the mass. The least reliable measurement was the height of the largest papillary projection.

Inter-observer agreement when assessing categorical ultrasound variables is shown in Table 4. For most categorical ultrasound variables, inter-observer agreement beyond chance was good or very good (19). Inter-observer agreement beyond chance for variables included in LR1 or LR2 was poorest for color score (agreement 40%, weighted Kappa 0.36), presence of blood flow in papillary projection (agreement 90%, Kappa 0.48), irregular cyst wall (agreement 79%, Kappa 0.56), and acoustic shadowing (agreement 85%, Kappa 0.58).

Bland-Altman plots illustrating the relationship between the magnitudes of the estimated risk of malignancy calculated using LR1 and LR2 and the interobserver difference in calculated risk are shown in Figure 1. The plots manifest a diamond shape, i.e., the interobserver differences are smallest for the lowest and highest risks, and they are very small for risks <2.5% and >95%. Logarithmic transformation of the data (20) did not substantially change the shape of the scatter plot. Therefore, we present our results as absolute inter-observer differences in calculated risk (in percentage units); see Table 5. There were no systematic differences in calculated risks between the two sonologists, and reliability, reflected by the ICC values, was good (22), with ICC values of 0.911 for LR1 and 0.832 for LR2. When classifying tumors as having a risk of malignancy <10% (benign) or >10% (malignant) using LR1 or LR2, the inter-observer agreement was good for both models: inter-observer agreement 84% (98/117), Kappa value 0.68 for model LR1, and inter-observer agreement 85% (99/117), Kappa 0.68 for model LR2. In the 19 cases where the two sonologists obtained different results with regard to malignancy when using LR1, the absolute interobserver differences in calculated risk ranged from 0.7 to 59.6 percentage units; in six of the 19 cases the absolute interobserver difference in calculated risk was <10.0 percentage units, in nine cases it was 10.0–24.9 percentage units, and in four cases it was >25.0 percentage units. In the 18 cases where the two sonologists obtained different results with regard to malignancy when using LR2, the absolute interobserver difference in calculated risk ranged from 8.8 to 67.9 percentage units; in two of the 18 cases the absolute interobserver difference in calculated risk was <10.0 percentage units, in ten cases it was 10.0–24.9 percentage units, and in six cases it was >25 percentage units.

The Bland-Altman plots (Figure 1) illustrate that for some tumors there were substantial interobserver differences in the calculated risk of malignancy: when using LR1, the interobserver difference in calculated risk was >25 percentage units in 11 tumors (9% of all tumors), and when using LR2, the interobserver difference in calculated risk was >25 percentage units in 14 tumors (12% of all tumors). To elucidate which interobserver differences explained these largest interobserver differences in calculated risk, we scrutinized each case where the difference was >25 percentage units. The results are shown in Supplementary Tables S2 and S3. When using LR1, a discrepancy for one single categorical variable explained the difference in four of the 11 cases, while a discrepancy for two categorical variables explained the difference in one case (differences in measurements being <5 mm in these five cases). In six cases there were differences in one or two categorical variables but also substantial differences (6-61 mm) in at least one measurement result. In no case was the large difference in calculated risk explained exclusively by differences in measurement results. The categorical variables judged differently by the two sonologists in these 11 cases were color score (n = 5), irregular cyst wall (n = 5), flow in papillary projection (n = 3), and acoustic shadowing (n = 2).

When using LR2, a discrepancy for one single categorical variable explained the large difference in calculated risk (>25 percentage units) in eight of the 14 cases (differences in measurements being <5 mm in these eight cases), and in four of the eight cases the sonologists judged acoustic shadowing differently. In five cases there were differences in one categorical variable but also a substantial difference (9-61 mm) in the measurement of the largest solid component. In yet another case there were differences in two categorical variables as well as in the measurement of the largest solid component. The categorical variables judged differently by the two sonologists in these 14 cases were acoustic shadowing (n = 5), irregular cyst wall (n = 5), ascites (n = 3), and flow in papillary projection (n = 2).

The sensitivity with regard to malignancy when using LR1 (10% risk cutoff) was 100% (23/23; 95% CI, 82-100) for both sonologists; the specificity was 74% (70/94; 95% CI, 64-82) for sonologist 1 and 63% (59/94; 95% CI, 53-72) for sonologist 2. The sensitivity when using LR2 was 100% (23/23; 95% CI, 82-100) for sonologist 1 and 91% (21/23; 95% CI, 72-98) for sonologist 2, and the specificity was 75.5% (71/94; 95% CI, 65-84) for both sonologists.
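The point estimates above follow directly from the reported counts, as the short Python sketch below shows. The confidence interval method is not stated in the text, so the Wilson score interval used here is an assumption and its output need not coincide with the printed intervals; the function name is hypothetical.

```python
import math


def proportion_with_wilson_ci(successes, total, z=1.96):
    """Proportion and Wilson score interval (assumed method; the paper does not name one)."""
    p = successes / total
    denom = 1 + z ** 2 / total
    center = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return p, max(0.0, center - half), min(1.0, center + half)


# Counts taken from the text: 23 malignant (including borderline) and 94 benign masses.
reported = {
    "LR1 sensitivity, both sonologists": (23, 23),
    "LR1 specificity, sonologist 1": (70, 94),
    "LR1 specificity, sonologist 2": (59, 94),
    "LR2 sensitivity, sonologist 2": (21, 23),
    "LR2 specificity, both sonologists": (71, 94),
}
for label, (x, n) in reported.items():
    p, low, high = proportion_with_wilson_ci(x, n)
    print(f"{label}: {100 * p:.1f}% ({100 * low:.0f}-{100 * high:.0f})")
```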


Discussion

We have shown substantial inter-observer variability in the results of measurements taken in adnexal masses (wide limits of agreement). Inter-observer agreement beyond chance was very good or good for most categorical variables, but it was only moderate or fair for some. Inter-observer agreement above chance was poorest for variables heavily dependent on subjective evaluation and/or machine settings, i.e., color score, presence of color Doppler signals in papillary projections, irregular cyst walls, acoustic shadowing (all four variables being included in LR1 or LR2), echogenicity of cyst fluid, and ovarian crescent sign. Despite this, there was good inter-observer agreement when classifying tumors as benign or malignant using the predetermined risk of malignancy cut-off of 10%. However, in some cases there were substantial differences in the calculated risk of malignancy between the two sonologists, the difference being >25.0 percentage units in 9% of all tumors when using LR1 and in 12% of all tumors when using LR2.

The strength of our study is that it provides new information. To the best of our knowledge, there is only one publication reporting on interobserver agreement with regard to describing ultrasound findings in adnexal masses using the IOTA terminology (11) when performing live ultrasound examinations (23). However, that study (23) evaluated interobserver agreement with regard to the ten ultrasound features in the IOTA simple rules (24, 25), not the variables included in the IOTA logistic regression models LR1 and LR2, and agreement was estimated between examiners with different levels of experience. The variable with poorest agreement beyond chance in the study cited was acoustic shadowing (Kappa 0.36). We have found no published study that has estimated inter-observer reproducibility of the calculated risk of malignancy using LR1 or LR2 after live scanning.

It is a limitation of our study that up to 204 days elapsed between the scans of the two sonologists (up to 41 days for malignant masses). Because days elapsed between the scans, the inter-observer differences could theoretically be explained by the lesions having changed in size or morphology between the scans. We find this highly unlikely for the following reasons. First, there was no relationship between the differences in measurement results and the number of days between the scans (Supplementary Fig. S1-S5). Nor was there a clear tendency for inter-observer agreement for discrete variables to depend on the time between the scans (Supplementary Table S1). Second, one would expect a lesion and its components to increase in size with time, but sonologist 1, performing the first scan, obtained higher measurement values than sonologist 2. Third, it is our experience, after having performed gynecological scans for more than 20 years, that the ultrasound morphology of both benign and malignant adnexal masses remains constant over time, that benign adnexal lesions grow slowly, and that malignant masses do not change appreciably in size even during 1 month of observation. Therefore, we believe that the discrepancies between the two sonologists reflect true inter-observer differences and not a change of the masses over time. A second limitation is that we did not include estimation of the reproducibility of retrieving anamnestic information (current hormonal therapy, personal history of ovarian cancer), the anamnestic information collected by the second sonologist being used in all cases. It cannot be entirely excluded that patients would answer differently when asked by different sonologists or that sonologists could interpret the answers of the patients differently. A third limitation is that we did not estimate intra-observer reproducibility. We considered four scans (two per sonologist) likely to be unacceptable to patients. For the same reason, only two sonologists were involved in this study, and our results are generalizable only to sonologists with a similar level of experience.

The results of this live scanning study are similar to those of another study in which the same sonologists assessed the same variables using 3D ultrasound volumes from adnexal masses in another tumor population (15). The similarity in results between the two studies is surprising, because the conditions when assessing 3D ultrasound volumes are different from those during a live scan. When evaluating ultrasound volumes, sonologists are exposed to the same ultrasound images, and so any interobserver difference should be explained exclusively by differences in interpreting the ultrasound information. During a live scan there are more sources of bias. This could result either in poorer or better interobserver agreement than when 3D ultrasound volumes are assessed: poorer, because ultrasound examiners are likely to use different machine settings and scanning conditions may change from one minute to another; better, because the dynamic nature of live scanning facilitates discrimination between solid components and amorphous tissue.

Our results showed that two experienced sonologists agreed quite well in their classification of masses as benign or malignant using the 10% risk of malignancy cutoff of LR1 and LR2 and that the diagnostic performance of LR1 and LR2 with regard to discrimination between benign and malignant tumors was similar for the two sonologists and similar to that reported by others (14, 26-28). This is reassuring, because the main purpose of using models LR1 and LR2 is to classify tumors as benign or malignant. Potentially, however, LR1 and LR2 can be used not only to classify adnexal masses as benign or malignant but also to counsel a patient about her individual risk of malignancy (13). To use the calculated risk for individual counseling, one must be reasonably certain not only that the estimated risk agrees well with the true risk (when externally validated, both LR1 and LR2 underestimated the true risk, especially in the risk interval 30%-70% (14)) but also that the risk estimates are reproducible, i.e., that different examiners will obtain similar risk estimates. Our results show that risk estimates may differ substantially between experienced observers, the difference in estimated risk being >25.0 percentage units in 9% and 12% of cases when using LR1 and LR2, respectively. Interobserver agreement above chance was poorest for those variables in the models that are heavily dependent on subjective evaluation, i.e., color score, presence of color Doppler signals in papillary projections, irregular cyst walls, and acoustic shadowing. Indeed, differences in these explained most of the largest inter-observer differences in calculated risk of malignancy. In models based on few variables, changing values in only one variable may result in large differences in predicted risks, while a model with many variables is less vulnerable to a change in one or even a few variables. Our results illustrate this (Supplementary Tables S2 and S3). When using LR2 (which includes six variables), a change in value for one single categorical variable explained an inter-observer difference in calculated risk >25 percentage units in eight of 14 cases, while when using LR1 (which includes 12 variables), a change in value for one single categorical variable explained an inter-observer difference in calculated risk >25 percentage units in only four of 11 cases. Acoustic shadowing is a strong variable in both LR1 and LR2 and has great impact on the calculated risk in LR2 with only six variables. In our hands, as well as in those of Ruiz de Gauna et al. (23), inter-observer agreement for acoustic shadowing was at most moderate. The interobserver agreement for color score was only fair in our study, and color score is an important variable in LR1.
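Why a disagreement on one binary variable can move the calculated risk by many percentage units in the middle of the risk range, but hardly at all at the extremes, can be illustrated with a generic logistic model. The sketch below does not use the LR1 or LR2 coefficients; the coefficient value and the function names are hypothetical and serve only to show the mechanism.

```python
import math


def logit(p):
    """Log-odds of a probability p (0 < p < 1)."""
    return math.log(p / (1 - p))


def logistic(z):
    """Inverse of logit: probability for a linear predictor z."""
    return 1 / (1 + math.exp(-z))


def risk_after_flip(baseline_risk_pct, beta):
    """Risk (%) after one binary variable with coefficient beta changes from 0 to 1."""
    return 100 * logistic(logit(baseline_risk_pct / 100) + beta)


HYPOTHETICAL_BETA = 1.5  # illustrative value only, not a coefficient from LR1 or LR2

for baseline in (1, 10, 50, 90, 99):
    flipped = risk_after_flip(baseline, HYPOTHETICAL_BETA)
    print(f"baseline {baseline:>2}% -> {flipped:5.1f}% (change {flipped - baseline:4.1f} points)")
```

On the log-odds scale the shift is the same everywhere, but after transformation back to a probability the change is largest near 50% and shrinks toward 0% and 100%, which is in line with the diamond shape of the plots in Figure 1.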

To improve inter-observer reproducibility of calculated risks based on LR1 and LR2, inter-observer differences in descriptions and measurements of adnexal masses using the IOTA terminology and measurement technique need to be reduced. One way to achieve this could be by providing courses on and training in how to examine and describe adnexal masses using the IOTA terms. Interactive courses in which a large number of ultrasound images are discussed with the course participants are likely to be very valuable in this respect. More precise definitions of the IOTA terms, for example by providing ample imaging material, would probably also help improve inter-observer agreement. Special attention should be given to the variables with poorest reproducibility, i.e., the color score, wall irregularity, acoustic shadowing, and detection of blood flow in papillary projections. Until better inter-observer agreement in the calculated risk of malignancy using LR1 and LR2 has been shown, one should be cautious with using the risk estimates for individual patient counselling.


Acknowledgements

None


References

1. Granberg S, Norström A, Wikland M. Tumors in the lower pelvis as imaged by vaginal sonography. Gynecol Oncol 1990;37:224-9.

2. Benacerraf BR, Finkler NJ, Wojciechowski C, Knapp RC. Sonographic accuracy in the diagnosis of ovarian masses. J Reprod Med 1990;35:491-5.

3. Valentin L. Pattern recognition of pelvic masses by gray-scale ultrasound imaging: the contribution of Doppler ultrasound. Ultrasound Obstet Gynecol 1999;14:338-47.

4. Valentin L. Prospective cross-validation of Doppler ultrasound examination and gray-scale ultrasound imaging for discrimination of benign and malignant pelvic masses. Ultrasound Obstet Gynecol 1999;14:273-83.

5. Timmerman D, Schwärzler P, Collins WP, Claerhout F, Coenen M, Amant F, et al. Subjective assessment of adnexal masses with the use of ultrasonography: an analysis of interobserver variability and experience. Ultrasound Obstet Gynecol 1999;13:11-6.

6. Sokalska A, Timmerman D, Testa AC, Van Holsbeke C, Lissoni AA, Leone FPG, et al. Diagnostic accuracy of transvaginal ultrasound examination for assigning a specific diagnosis to adnexal masses. Ultrasound Obstet Gynecol 2009;34:462-70.

7. Valentin L. Use of morphology to characterize and manage common adnexal masses. Best Pract Res Clin Obstet Gynaecol 2004;18:71-89.

8. Tailor A, Jurkovic D, Bourne TH, Collins WP, Campbell S. Sonographic prediction of malignancy in adnexal masses using multivariate logistic regression analysis. Ultrasound Obstet Gynecol 1997;10:41-7.

9. Timmerman D, Bourne TH, Tailor A, Collins WP, Verrelst H, Vandenberghe K, et al. A comparison of methods for the preoperative discrimination between benign and malignant adnexal masses: the development of a new logistic regression model. Am J Obstet Gynecol 1999;181:57-65.


10. Alcazar JL, Jurado M. Prospective evaluation of logistic model based on sonographic morphologic and color Doppler findings developed to predict adnexal malignancy. J Ultrasound Med 1999;18:837-42.

11. Timmerman D, Valentin L, Bourne TH, Collins WP, Verrelst H, Vergote I. Terms, definitions and measurements to describe the sonographic features of adnexal tumors: a consensus opinion from the International Ovarian Tumor Analysis (IOTA) Group. Ultrasound Obstet Gynecol 2000;16:500-5.

12. Timmerman D, Testa AC, Bourne T, Ferrazzi E, Ameye L, Konstantinovic ML, et al. Logistic regression model to distinguish between the benign and malignant adnexal mass before surgery: a multicenter study by the International Ovarian Tumor Analysis Group. J Clin Oncol 2005;34:8794-801.

13. Kaijser J, Bourne T, Valentin L, Sayasneh A, Van Holsbeke C, Vergote I, et al. Improving strategies for diagnosing ovarian cancer: a summary of the International Ovarian Tumor Analysis (IOTA) studies. Ultrasound Obstet Gynecol 2013;41:9-20.

14. Timmerman D, Van Calster B, Testa AC, Guerriero S, Fischerova D, Lissoni AA, et al. Ovarian cancer prediction in adnexal masses using ultrasound-based logistic regression models: a temporal and external validation study by the IOTA group. Ultrasound Obstet Gynecol 2010;36:226-34.

15. Sladkevicius P, Valentin L. Intra- and inter-observer agreement when describing adnexal masses using the International Ovarian Tumour Analysis (IOTA) terms and definitions: a study on three-dimensional (3D) ultrasound volumes. Ultrasound Obstet Gynecol 2013;41:318-27.

16. Heintz APM, Odicino F, Maisonneuve P, Beller U, Benedet JL, Creasman WT, et al. Carcinoma of the Ovary. 25th Annual Report on the Results of Treatment in Gynecological Cancer. Int J Gynecol Obstet 2003;83:S135-S166 (suppl 1).


17. Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas 1960;20:37-46.

18. Kundel HL, Polansky M. Measurement of observer agreement. Radiology 2003;228:303-8.

19. Brennan P, Silman A. Statistical methods for assessing observer variability in clinical measures. BMJ 1992;304:1491-4.

20. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986;1:307-10.

21. Bartlett JW, Frost C. Reliability, repeatability and reproducibility: analysis of measurement errors in continuous variables. Ultrasound Obstet Gynecol 2008;31:466-75.

22. Kottner J, Audigé L, Brorson S, Donner A, Gajewski BJ, Hróbjartsson A, et al. Guidelines for Reporting Reliability and Agreement Studies (GRRAS) were proposed. J Clin Epidemiol 2011;64:96-106.

23. Ruiz de Gauna B, Sanchez P, Pineda L, Utrilla-Layna J, Juez L, Alcázar JL. Inter-observer agreement with regard to describing adnexal masses using the IOTA simple rules in a real-time setting and when using three-dimensional ultrasound volumes and digital clips. Ultrasound Obstet Gynecol 2014;44:95-100.

24. Timmerman D, Testa AC, Bourne T, Ameye L, Jurkovic D, Van Holsbeke C, et al. Simple ultrasound-based rules for the diagnosis of ovarian cancer. Ultrasound Obstet Gynecol 2008;31:681-90.

25. Timmerman D, Ameye L, Fischerova D, Epstein E, Melis GB, Guerriero S, et al. Simple ultrasound rules to distinguish between benign and malignant adnexal masses before surgery: prospective validation by IOTA group. BMJ 2010;341:c6839.


26. Van Holsbeke C, Van Calster B, Testa AC, Domali E, Lu C, Van Huffel S, et al. Prospective internal validation of mathematical models to predict malignancy in adnexal masses: results from the international ovarian tumor analysis study. Clin Cancer Res 2009;15:684-91.

27. Nunes N, Yazbek J, Ambler G, Hoo W, Naftalin J, Jurkovic D. Prospective evaluation of the IOTA logistic regression model LR2 for the diagnosis of ovarian cancer. Ultrasound Obstet Gynecol 2012;40:355-9.

28. Nunes N, Ambler G, Hoo WL, Naftalin J, Foo X, Widschwendter M, et al. A prospective validation of the IOTA logistic regression models (LR1 and LR2) in comparison to subjective pattern recognition for the diagnosis of ovarian cancer. Int J Gynecol Cancer 2013;23:1583-9.


Table 1. Variables for which inter-observer reproducibility was estimated
___________________________________________________________________________
Variables included in logistic regression models LR1 and LR2 or of importance for these models
    Maximum diameter of adnexal mass, mm
    Maximum diameter of largest solid component, mm
    Maximum diameter of largest solid component: ≤50, >50 mm
    Presence of an entirely solid adnexal mass: yes, no
    Papillary projections present: yes, no
    Irregular internal cyst walls: yes, no
    Acoustic shadows: yes, no
    Color Doppler signals in papillary projection: yes, no
    Color score: 1, 2, 3, 4
    Tenderness of adnexal mass at scan: yes, no
    Ascites: yes, no
Other IOTA variables
  Continuous IOTA variables
    Mean diameter of adnexal mass, mm
    Mean diameter of largest solid component, mm
    Height of the largest papillary projection, mm
  Categorical IOTA variables
    Type of tumor: unilocular, unilocular solid, multilocular, multilocular solid, solid
    Mean diameter of adnexal mass: ≤40, 41-60, 61-80, 81-100, and >100 mm
    Number of cyst locules: 0, 1, 2, 3, 4, 5, 6-10, 11-20, and >20; ≤10, >10†
    Septum present: yes, no
    Incomplete septum present: yes, no
    Solid component: yes, no
    Papillary projections: 0, 1, 2, 3, ≥4; 1, 2, 3, ≥4‡; <4, >4†
    Echogenicity of cyst fluid: anechoic, low level, ground glass, mixed, no cyst fluid
    Color Doppler signals detectable: yes, no
    Ovarian crescent sign: yes, no
Diagnosis based on LR1 and LR2
    Calculated risk of malignancy using LR1: ≤10%, >10%
    Calculated risk of malignancy using LR2: ≤10%, >10%
___________________________________________________________________________
IOTA, international ovarian tumor analysis group; LR1, logistic regression model 1; LR2, logistic regression model 2.

The risk of malignancy using LR1 is derived as $y = 1/(1 + e^{-z})$, where

$$z = -6.7468 + 1.5985\,(a) - 0.9983\,(b) + 0.0326\,(c) + 0.00841\,(d) - 0.8577\,(e) + 1.5513\,(f) + 1.1737\,(g) + 0.9281\,(h) + 0.0496\,(i) + 1.1421\,(j) - 2.3550\,(k) + 0.4916\,(l)$$

and $e$ is the mathematical constant and base value of natural logarithms. The 12 variables included in LR1 are: (a) personal history of ovarian cancer (yes = 1, no = 0); (b) current hormonal therapy (yes = 1, no = 0); (c) age of the patient (in years); (d) maximum diameter of the lesion (in millimeters); (e) mass tender at scan (yes = 1, no = 0); (f) the presence of ascites (yes = 1, no = 0); (g) the presence of blood flow within a solid papillary projection (yes = 1, no = 0); (h) the presence of a purely solid tumor (yes = 1, no = 0); (i) maximal diameter of the solid component (expressed in millimeters, but with no increase > 50 mm); (j) irregular internal cyst walls (yes = 1, no = 0); (k) the presence of acoustic shadows (yes = 1, no = 0); and (l) color score (1, 2, 3, or 4).

The risk of malignancy using LR2 is derived as $y = 1/(1 + e^{-z})$, where

$$z = -5.3718 + 0.0354\,(c) + 1.6159\,(f) + 1.1768\,(g) + 0.0697\,(i) + 0.9586\,(j) - 2.9486\,(k)$$

(12). A calculated risk of malignancy of more than 10% classified the adnexal mass as malignant.

† includes 0. ‡ includes only cases where papillary projections were registered at both examinations.
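As an illustration of the two formulas in the Table 1 footnote, the following minimal Python sketch evaluates LR1 and LR2 for one mass. The coefficients and the variable labels (a) to (l) are taken from the footnote; the function name `logistic_risk`, the dictionary layout, and the example patient are hypothetical and are not part of the original study.

```python
import math

# Coefficients as printed in the Table 1 footnote; keys follow its labels (a)-(l).
LR1_INTERCEPT = -6.7468
LR1_COEF = {
    "a": 1.5985,   # personal history of ovarian cancer (yes = 1, no = 0)
    "b": -0.9983,  # current hormonal therapy (yes = 1, no = 0)
    "c": 0.0326,   # age in years
    "d": 0.00841,  # maximum diameter of the lesion, mm
    "e": -0.8577,  # mass tender at scan (yes = 1, no = 0)
    "f": 1.5513,   # ascites (yes = 1, no = 0)
    "g": 1.1737,   # blood flow within a solid papillary projection (yes = 1, no = 0)
    "h": 0.9281,   # purely solid tumor (yes = 1, no = 0)
    "i": 0.0496,   # maximal diameter of solid component, mm (no increase > 50 mm)
    "j": 1.1421,   # irregular internal cyst walls (yes = 1, no = 0)
    "k": -2.3550,  # acoustic shadows (yes = 1, no = 0)
    "l": 0.4916,   # color score (1, 2, 3, or 4)
}
LR2_INTERCEPT = -5.3718
LR2_COEF = {"c": 0.0354, "f": 1.6159, "g": 1.1768,
            "i": 0.0697, "j": 0.9586, "k": -2.9486}


def logistic_risk(intercept, coef, values):
    """Return y = 1 / (1 + exp(-z)), with z = intercept + sum(coef * value)."""
    x = dict(values)
    # The footnote states that the solid-component diameter does not increase above 50 mm.
    x["i"] = min(x["i"], 50.0)
    z = intercept + sum(c * x[key] for key, c in coef.items())
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical patient (illustration only): 50 years old, 60 mm mass without
# solid components, no other risk features, color score 1.
patient = {"a": 0, "b": 0, "c": 50, "d": 60, "e": 0, "f": 0,
           "g": 0, "h": 0, "i": 0, "j": 0, "k": 0, "l": 1}

risk_lr1 = logistic_risk(LR1_INTERCEPT, LR1_COEF, patient)
risk_lr2 = logistic_risk(LR2_INTERCEPT, LR2_COEF, patient)
malignant_lr1 = risk_lr1 > 0.10  # cut-off of 10% stated in the footnote
malignant_lr2 = risk_lr2 > 0.10
print(f"LR1 risk: {100 * risk_lr1:.2f}%  LR2 risk: {100 * risk_lr2:.2f}%")
```

Because the risk is a logistic function of a weighted sum, a change in a single binary variable with a large coefficient (for example acoustic shadows) can shift the calculated risk considerably when z is near zero, and hardly at all when the risk is already close to 0% or 100%.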


Table 2. Histological diagnoses of the masses
___________________________________________________________________________
Histological diagnosis of tumor                    Number
___________________________________________________________________________
Benign tumors                                      94
    Benign simple cyst                             7
    Endometrioma                                   10
    Dermoid cyst                                   16
    Serous cystadenoma                             16
    Mucinous cystadenoma                           18
    Myoma/fibroma                                  9
    Cystadenofibroma                               11
    Paraovarian cyst                               5
    Sactosalpinx/chronic salpingitis               1
    Leydig cell tumor                              1
Borderline tumors                                  4
    Serous                                         2
    Mucinous                                       1
    Endometrioid                                   1
Invasive malignancy                                19
    Primary ovarian adenocarcinoma                 13
    Granulosa cell tumor                           3
    Dysgerminoma                                   1
    Leiomyosarcoma                                 1
    Malignant aggressive B-cell lymphoma           1
___________________________________________________________________________


Table 3. Inter-observer reproducibility of continuous variables used in the risk calculation models LR1 and LR2 and of other continuous variables used to describe adnexal masses
___________________________________________________________________________
Column groups: measurement results of both sonologists (median, range); difference in mm between the two measurements made by sonologists 1 and 2 (a) (mean, 95% CI, limits of agreement); Intra-CC (point estimate, 95% CI).

Parameter                                              Median (n)     Range       Mean (n)        95% CI            Limits of agreement   Intra-CC (95% CI)
___________________________________________________________________________
Variables used in models LR1 and LR2
  Maximum diameter of adnexal mass, mm                 70 (n=234)     10 to 313   3.80 (n=117)    1.12 to 6.48      -25.24 to 32.82       0.958 (0.937 to 0.971)
  Maximum diameter of largest solid component, mm (b)  29.50 (n=122)  5 to 180    1.92 (n=61)     -1.74 to 5.58     -26.66 to 30.50       0.942 (0.905 to 0.964)
Other variables used to describe adnexal mass
  Mean diameter of adnexal mass, mm                    58.5 (n=234)   9 to 240    1.05 (n=117)    -0.15 to 1.95     -8.61 to 10.72        0.971 (0.958 to 0.980)
  Mean diameter of largest solid component, mm (b)     22 (n=122)     4 to 156    0.59 (n=61)     -1.82 to 2.98     -18.16 to 19.32       0.962 (0.937 to 0.977)
  Height of largest papillary projection, mm (c)       8 (n=42)       3 to 25     -0.51 (n=21)    -2.93 to 1.91     -11.61 to 10.59       0.609 (0.245 to 0.821)
___________________________________________________________________________
CI, confidence interval; Intra-CC, intra-class correlation coefficient.

(a) The measurements made by sonologist 2 were subtracted from the measurements made by sonologist 1.

(b) Includes only cases where solid components were registered by both observers.

(c) Includes only cases where papillary projections were registered by both observers. Using logarithmically transformed data, the results were as follows: mean difference 1.03 (95% CI, 0.78 to 1.36); limits of agreement 0.31 to 3.61; Intra-CC 0.547 (0.153 to 0.789). These results can be used for comparison with our published results on inter-observer agreement when analyzing ultrasound volumes obtained by three-dimensional ultrasound (15).


Table 4. Inter-observer agreement for categorical variables included in the risk calculation models LR1 and LR2 and for other categorical variables used to describe adnexal masses
____________________________________________________________________________________________
Variable                                                                          Agreement        Kappa value
____________________________________________________________________________________________
Variables used in models LR1 and LR2
  Presence of entirely solid tumor (yes, no)                                      97% (114/117)    0.91
  Maximum diameter of largest solid component (≤50 mm, >50 mm)                    95% (111/117)    0.83
  Maximum diameter of largest solid component (≤50 mm, >50 mm) (a)                92% (56/61)      0.82
  Tenderness of adnexal mass (yes, no)                                            97% (113/117)    0.78
  Ascites (yes, no)                                                               97% (113/117)    0.73
  Papillary projections (yes, no)                                                 87% (102/117)    0.65
  Solid component (yes, no)                                                       84% (98/117)     0.67
  Irregular cyst wall (yes, no)                                                   79% (92/117)     0.56
  Acoustic shadows (yes, no)                                                      85% (99/117)     0.58
  Color Doppler signals in papillary projections (yes, no) (b)                    90% (105/117)    0.48
  Color Doppler signals in papillary projections (yes, no) (c)                    86% (18/21)      0.71
  Color score (1, 2, 3, 4)                                                        40% (47/117)     0.36 (d)
Other variables used to describe adnexal masses
  Tumor type (unilocular, unilocular solid, multilocular, multilocular solid, solid)  97% (113/117)  0.70
  Mean diameter of tumor (mm): ≤40, 41-60, 61-80, 81-100, >100                    66% (77/117)     0.76 (d)
  Number of locules: 0, 1, 2, 3, 4, 5, 6-10, 11-20, >20                           58% (68/117)     0.79 (d)
  Number of locules: ≤10, >10 (e)                                                 94% (110/117)    0.80
  Septum (yes, no)                                                                88% (103/117)    0.76
  Incomplete septum (yes, no)                                                     98% (115/117)    ¶
  Number of papillary projections: 0, 1, 2, 3, ≥4                                 81% (95/117)     0.64 (d)
  Number of papillary projections: 1, 2, 3, ≥4 (c)                                67% (14/21)      0.63 (d)
  Number of papillary projections: <4, >4 (e)                                     96% (112/117)    0.68
  Echogenicity of cyst fluid (anechoic, low level, ground glass, mixed, no fluid) 67% (78/117)     0.56
  Color Doppler signals detectable (yes, no)                                      84% (98/117)     0.30
  Ovarian crescent sign (yes, no)                                                 78% (91/117)     0.49
  Fluid in Douglas pouch (yes, no)                                                87% (102/117)    0.72
____________________________________________________________________________________________
(a) Includes only cases where solid components were registered at both examinations.

(b) This variable reflects disagreement both with regard to the presence of papillary projections and the presence of blood flow in papillary projections (i.e., if one observer recorded the presence of papillary projections and the other did not, there was disagreement with regard to this variable, and if both observers recorded the presence of papillary projections but disagreed with regard to the presence of blood flow in the papillary projections, there was disagreement with regard to this variable).

(c) Includes only cases where papillary projections were registered at both examinations.

(d) Weighted Kappa index.

(e) Includes 0.

¶ Absent kappa values are explained by asymmetric field tables, making it impossible to calculate them.
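To make the two columns of Table 4 concrete, the sketch below computes percentage agreement and unweighted Cohen's kappa (17) from a square contingency table. The function name `cohen_kappa` and both 2x2 tables are hypothetical illustrations, not the study's data; they are chosen only to show that the same percentage agreement can correspond to quite different kappa values depending on how often a finding is recorded.

```python
def cohen_kappa(table):
    """Return (observed agreement, unweighted kappa) for a square table.

    Rows are the categories assigned by observer 1, columns those of observer 2.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    p_observed = sum(table[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance from the marginal totals.
    p_expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2
    kappa = (p_observed - p_expected) / (1 - p_expected)
    return p_observed, kappa


# Hypothetical tables for 117 masses (yes/no finding), both with 105/117 agreement.
common_finding = [[40, 6], [6, 65]]   # finding recorded in about 40% of masses
rare_finding = [[4, 6], [6, 101]]     # finding recorded in under 10% of masses

po_common, kappa_common = cohen_kappa(common_finding)
po_rare, kappa_rare = cohen_kappa(rare_finding)
print(f"common: agreement {po_common:.0%}, kappa {kappa_common:.2f}")
print(f"rare:   agreement {po_rare:.0%}, kappa {kappa_rare:.2f}")
```

The weighted kappa reported for ordinal variables in Table 4 additionally gives partial credit for near-misses; that weighting is not implemented in this sketch.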


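Table 5. Inter-observer reproducibility of the risk of malignancy calculated using the risk calculation models LR1 and LR2
___________________________________________________________________________
Column groups: calculated risk of malignancy of both sonologists (median, range); difference between the risk calculated by sonologist 1 and 2 (mean, 95% CI, limits of agreement); Intra-CC (point estimate, 95% CI).

Parameter                                  Median (n)      Range            Mean (n)        95% CI           Limits of agreement   Intra-CC (95% CI)
___________________________________________________________________________
Risk of malignancy calculated using LR1    7.85 (n=234)    0.10 to 99.10    -0.53 (n=117)   -3.07 to 2.01    -28.05 to 26.99       0.911 (0.874 to 0.937)
Risk of malignancy calculated using LR2    6.65 (n=234)    0.10 to 98.40    0.02 (n=117)    -3.06 to 3.10    -33.22 to 33.26       0.832 (0.766 to 0.880)
___________________________________________________________________________
CI, confidence interval; Intra-CC, intra-class correlation coefficient.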
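The Intra-CC values in Tables 3 and 5 come from a two-way random-effects analysis of variance with absolute agreement. A minimal sketch of such a coefficient is given below. It assumes the single-measurement form, often written ICC(2,1); the manuscript does not state whether single or average measures were used, so this choice, the function name `icc_two_way_random`, and the example data are assumptions made for illustration.

```python
def icc_two_way_random(ratings):
    """ICC for a two-way random model, absolute agreement, single measurement.

    ratings: list of rows, one row per subject, one value per observer.
    """
    n = len(ratings)        # subjects
    k = len(ratings[0])     # observers
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_subjects = k * sum((m - grand) ** 2 for m in row_means)
    ss_observers = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_subjects - ss_observers

    ms_subjects = ss_subjects / (n - 1)
    ms_observers = ss_observers / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_subjects - ms_error) / (
        ms_subjects + (k - 1) * ms_error + k * (ms_observers - ms_error) / n
    )


# Hypothetical calculated risks (%) for five masses, two observers each.
risks = [[2.0, 3.5], [10.0, 8.0], [45.0, 60.0], [90.0, 85.0], [5.0, 5.5]]
print(f"ICC: {icc_two_way_random(risks):.3f}")

# A constant offset between observers lowers an absolute-agreement ICC
# even though the ranking of the subjects is identical.
offset_icc = icc_two_way_random([[1, 2], [2, 3], [3, 4]])
```

In this form a systematic difference between observers enters the denominator, which is why an absolute-agreement ICC penalizes bias between sonologists as well as random disagreement.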


Legends for figure

Figure 1. a) Scatterplot showing the relationship between inter-observer difference (observer 1 minus observer 2) in calculated risk and magnitude of calculated risk when using logistic regression model LR1. The plot manifests a diamond shape, the differences being smallest for the lowest and highest risks. For risks < 2.5% and > 95%, the differences are very small. LOA, limits of agreement. b) Scatterplot showing the relationship between inter-observer difference in calculated risk and magnitude of calculated risk when using logistic regression model LR2. The plot manifests a diamond shape, the differences being smallest for the lowest and highest risks.


Published OnlineFirst November 25, 2014. Clin Cancer Res. Povilas Sladkevicius and Lil Valentin. Inter-observer agreement in describing the ultrasound appearance of adnexal masses and in calculating the risk of malignancy using logistic regression models.

Updated version: access the most recent version of this article at doi: 10.1158/1078-0432.CCR-14-0906

Supplementary material: access the most recent supplemental material at http://clincancerres.aacrjournals.org/content/suppl/2014/11/26/1078-0432.CCR-14-0906.DC1

Author manuscript: author manuscripts have been peer reviewed and accepted for publication but have not yet been edited.


Inter-observer reproducibility of measurement results, including the calculated risks of malignancy using LR1 and LR2, was described as the difference between two measurement results. The differences between the measured values were plotted against the mean of the two measurements (Bland-Altman plots) to assess the relationship between the differences and the magnitude of the measurements (20). Systematic bias between two measurements was estimated by calculating the 95% confidence interval (CI) of the mean difference (mean difference ± 2 SE). If zero lay within this interval, no bias was assumed to exist between the two measurements. Inter-observer agreement was expressed as the mean difference and limits of agreement (20). Ninety-five percent of differences between any future measurements are estimated to fall between the lower and upper limit of agreement. Inter-observer reliability of measurement results was estimated by calculating the intra-class correlation coefficient (ICC) using analysis of variance (two-way random model, absolute agreement; this allows generalization of the results to a population of observers). The ICC indicates the proportion of the total variance in measurement results that can be explained by differences between the individuals examined. It depends both on the magnitude of measurement errors and on the true heterogeneity in the population in which measurements are made. The more variable the population investigated, the greater the ICC, and the less variable the population, the smaller the ICC (21). It has been suggested that ICC values > 0.90 are needed for a test to be used in clinical practice (22).
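The Bland-Altman quantities described above can be sketched in a few lines of Python. The confidence interval follows the "mean difference ± 2 SE" rule stated in the text; the use of the same multiplier of 2 standard deviations for the limits of agreement is an assumption based on the cited method (20). The function name `bland_altman` and the measurement values are hypothetical.

```python
import math
import statistics


def bland_altman(observer_1, observer_2, multiplier=2.0):
    """Mean difference, its approximate 95% CI, and limits of agreement.

    Differences are observer 1 minus observer 2, one pair per mass.
    """
    diffs = [a - b for a, b in zip(observer_1, observer_2)]
    mean_diff = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)            # sample standard deviation
    se = sd / math.sqrt(len(diffs))         # standard error of the mean difference
    ci = (mean_diff - multiplier * se, mean_diff + multiplier * se)
    loa = (mean_diff - multiplier * sd, mean_diff + multiplier * sd)
    # Systematic bias is assumed only if zero lies outside the CI.
    bias = not (ci[0] <= 0.0 <= ci[1])
    return {"mean_diff": mean_diff, "ci": ci, "loa": loa, "bias": bias}


# Hypothetical maximum diameters (mm) measured by two sonologists in six masses.
sonologist_1 = [70, 45, 120, 33, 88, 60]
sonologist_2 = [66, 47, 112, 30, 90, 55]

result = bland_altman(sonologist_1, sonologist_2)
print(result)
```

The limits of agreement are always wider than the confidence interval of the mean difference, because the former describe the spread of individual differences and the latter only the uncertainty of their average.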

The sensitivity and specificity of LR1 and LR2 with regard to malignancy were calculated separately using the information obtained by sonologist 1 and by sonologist 2.


Results

In all, 117 consecutive women with adnexal masses who underwent surgery were examined with ultrasound by the two sonologists as described above. Thirty-four women had bilateral adnexal masses. The most complex mass, or the largest one if both masses had similar ultrasound morphology, was used in our statistical analysis, the mass to be included being selected retrospectively to ensure that both sonologists contributed the same mass (right or left) to the analysis. Thus, 117 adnexal masses from 117 patients constitute our study population. The women's age ranged between 14 and 88 years (median, 53), and 63 (54%) women were postmenopausal. There were 94 benign, four borderline, and 19 invasively malignant adnexal masses (Table 2).

The time elapsed between the ultrasound examinations of sonologists 1 and 2 was median 61 days (10th and 90th percentiles, 13 and 132; range, 1-204) for the tumors with benign histology and median 14 days (10th and 90th percentiles, 2 and 31; range, 1-41) for the tumors with malignant histology. There was no relationship between the number of days between the scans and the differences in measurement results or inter-observer agreement for discrete variables (Supplementary Fig. S1-S5 and Supplementary Table S1).

Inter-observer reproducibility of measurement results is shown in Table 3. Bland-Altman plots showed no clear trend for inter-observer differences in measurement results to change with the magnitude of the measurement values. Limits of agreement were wide for all measurements. There was one systematic difference between the two sonologists: sonologist 1 (who always performed the first examination) obtained higher measurement values for the maximum diameter of the mass. The least reliable measurement was the height of the largest papillary projection.

Inter-observer agreement when assessing categorical ultrasound variables is shown in Table 4. For most categorical ultrasound variables, inter-observer agreement beyond chance was good or very good (19). Inter-observer agreement beyond chance for variables included in LR1 or LR2 was poorest for color score (agreement 40%, weighted Kappa 0.36), presence of blood flow in papillary projection (agreement 90%, Kappa 0.48), irregular cyst wall (agreement 79%, Kappa 0.56), and acoustic shadowing (agreement 85%, Kappa 0.58).

Bland-Altman plots illustrating the relationship between the magnitude of the estimated risk of malignancy calculated using LR1 and LR2 and the inter-observer difference in calculated risk are shown in Figure 1. The plots manifest a diamond shape, i.e., the inter-observer differences are smallest for the lowest and highest risks, and they are very small for risks < 2.5% and > 95%. Logarithmic transformation of the data (20) did not substantially change the shape of the scatter plot. Therefore, we present our results as absolute inter-observer differences in calculated risk (in percentage units); see Table 5. There were no systematic differences in calculated risks between the two sonologists, and reliability, reflected by the ICC values, was good (22), with ICC values of 0.911 for LR1 and 0.832 for LR2. When classifying tumors as having a risk of malignancy < 10% (benign) or > 10% (malignant) using LR1 or LR2, the inter-observer agreement was good for both models: inter-observer agreement 84% (98/117), Kappa value 0.68, for model LR1, and inter-observer agreement 85% (99/117), Kappa 0.68, for model LR2. In the 19 cases where the two sonologists obtained different results with regard to malignancy when using LR1, the absolute inter-observer differences in calculated risk ranged from 0.7 to 59.6 percentage units: in six of the 19 cases the absolute inter-observer difference in calculated risk was < 10.0 percentage units, in nine cases it was 10.0 to 24.9 percentage units, and in four cases it was > 25.0 percentage units. In the 18 cases where the two sonologists obtained different results with regard to malignancy when using LR2, the absolute inter-observer difference in calculated risk ranged from 8.8 to 67.9 percentage units: in two of the 18 cases the absolute inter-observer difference in calculated risk was < 10.0 percentage units, in ten cases it was 10.0 to 24.9 percentage units, and in six cases it was > 25 percentage units.

The Bland-Altman plots (Figure 1) illustrate that for some tumors there were substantial inter-observer differences in the calculated risk of malignancy: when using LR1, the inter-observer difference in calculated risk was > 25 percentage units in 11 tumors (9% of all tumors), and when using LR2, the inter-observer difference in calculated risk was > 25 percentage units in 14 tumors (12% of all tumors). To elucidate which inter-observer differences explained these largest inter-observer differences in calculated risk, we scrutinized each case where the difference was > 25 percentage units. The results are shown in Supplementary Tables S2 and S3. When using LR1, a discrepancy for one single categorical variable explained the difference in four of the 11 cases, while a discrepancy for two categorical variables explained the difference in one case (differences in measurements being < 5 mm in these five cases). In six cases there were differences in one or two categorical variables but also substantial differences (6-61 mm) in at least one measurement result. In no case was the large difference in calculated risk explained exclusively by differences in measurement results. The categorical variables judged differently by the two sonologists in these 11 cases were color score (n = 5), irregular cyst wall (n = 5), flow in papillary projection (n = 3), and acoustic shadowing (n = 2).

When using LR2, a discrepancy for one single categorical variable explained the large difference in calculated risk (> 25 percentage units) in eight of the 14 cases (differences in measurements being < 5 mm in these eight cases), and in four of the eight cases the sonologists judged acoustic shadowing differently. In five cases there were differences in one categorical variable but also a substantial difference (9-61 mm) in the measurement of the largest solid component. In yet another case there were differences in two categorical variables as well as in the measurement of the largest solid component. The categorical variables judged differently by the two sonologists in these 14 cases were acoustic shadowing (n = 5), irregular cyst wall (n = 5), ascites (n = 3), and flow in papillary projection (n = 2).

The sensitivity with regard to malignancy when using LR1 (10% risk cutoff) was 100% (23/23; 95% CI, 82-100) for both sonologists; the specificity was 74% (70/94; 95% CI, 64-82) for sonologist 1 and 63% (59/94; 95% CI, 53-72) for sonologist 2. The sensitivity when using LR2 was 100% (23/23; 95% CI, 82-100) for sonologist 1 and 91% (21/23; 95% CI, 72-98) for sonologist 2, and the specificity was 75.5% (71/94; 95% CI, 65-84) for both sonologists.
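The point estimates above follow directly from the reported counts, as the short sketch below shows. The counts are those given in the text; the denominator of 23 is consistent with the 19 invasive and four borderline tumors of Table 2 being counted together as malignant, which is an inference rather than something stated in this section. The function name is hypothetical, and the confidence intervals are not recomputed because the interval method is not stated.

```python
def sensitivity_specificity(true_pos, false_neg, true_neg, false_pos):
    """Return (sensitivity, specificity) as proportions."""
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Counts as reported in the text (23 malignant and 94 benign masses).
lr1_sonologist_1 = sensitivity_specificity(23, 0, 70, 24)
lr1_sonologist_2 = sensitivity_specificity(23, 0, 59, 35)
lr2_sonologist_1 = sensitivity_specificity(23, 0, 71, 23)
lr2_sonologist_2 = sensitivity_specificity(21, 2, 71, 23)

for label, (sens, spec) in [("LR1, sonologist 1", lr1_sonologist_1),
                            ("LR1, sonologist 2", lr1_sonologist_2),
                            ("LR2, sonologist 1", lr2_sonologist_1),
                            ("LR2, sonologist 2", lr2_sonologist_2)]:
    print(f"{label}: sensitivity {sens:.1%}, specificity {spec:.1%}")
```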


Discussion

We have shown substantial inter-observer variability in the results of measurements taken in adnexal masses (wide limits of agreement). Inter-observer agreement beyond chance was very good or good for most categorical variables, but it was only moderate or fair for some. Inter-observer agreement above chance was poorest for variables heavily dependent on subjective evaluation and/or machine settings, i.e., color score, presence of color Doppler signals in papillary projections, irregular cyst walls, acoustic shadowing (all four variables being included in LR1 or LR2), echogenicity of cyst fluid, and ovarian crescent sign. Despite this, there was good inter-observer agreement when classifying tumors as benign or malignant using the predetermined risk of malignancy cut-off of 10%. However, in some cases there were substantial differences in the calculated risk of malignancy between the two sonologists, the difference being > 25.0 percentage units in 9% of all tumors when using LR1 and in 12% of all tumors when using LR2.

The strength of our study is that it provides new information. To the best of our knowledge, there is only one publication reporting on inter-observer agreement with regard to describing ultrasound findings in adnexal masses using the IOTA terminology (11) when performing live ultrasound examinations (23). However, that study (23) evaluated inter-observer agreement with regard to the ten ultrasound features in the IOTA simple rules (24, 25), not the variables included in the IOTA logistic regression models LR1 and LR2, and agreement was estimated between examiners with different levels of experience. The variable with poorest agreement beyond chance in the study cited was acoustic shadowing (Kappa 0.36). We have found no published study that has estimated inter-observer reproducibility of the calculated risk of malignancy using LR1 or LR2 after live scanning.

It is a limitation of our study that up to 204 days elapsed between the scans of the two sonologists (up to 41 days for malignant masses). Because days elapsed between the scans, the inter-observer differences could theoretically be explained by the lesions having changed in size or morphology between the scans. We find this highly unlikely for the following reasons. First, there was no relationship between the differences in measurement results and the number of days between the scans (Supplementary Fig. S1-S5). Nor was there a clear tendency for inter-observer agreement for discrete variables to depend on the time between the scans (Supplementary Table S1). Second, one would expect a lesion and its components to increase in size with time, but sonologist 1, performing the first scan, obtained higher measurement values than sonologist 2. Third, it is our experience after having performed gynecological scans for more than 20 years that the ultrasound morphology of both benign and malignant adnexal masses remains constant over time, that benign adnexal lesions grow slowly, and that malignant masses do not change appreciably in size even during 1 month of observation. Therefore, we believe that the discrepancies between the two sonologists reflect true inter-observer differences and not a change of the masses over time. A second limitation is that we did not include estimation of the reproducibility of retrieving anamnestic information (current hormonal therapy, personal history of ovarian cancer), the anamnestic information collected by the second sonologist being used in all cases. It cannot be entirely excluded that patients would answer differently when asked by different sonologists, or that sonologists could interpret the answers of the patients differently. A third limitation is that we did not estimate intra-observer reproducibility. We considered four scans (two per sonologist) likely to be unacceptable to patients. For the same reason, only two sonologists were involved in this study, and our results are generalizable only to sonologists with a similar level of experience.

The results of this live scanning study are similar to those of another study in which the

same sonologists assessed the same variables using 3D ultrasound volumes from adnexal

masses in another tumor population (15) The similarity in results between the two studies is

surprising because the conditions when assessing 3D ultrasound volumes are different from

those during a live scan. When evaluating ultrasound volumes, sonologists are exposed to the same ultrasound images, and so any interobserver difference should be explained exclusively by differences in interpreting the ultrasound information. During a live scan there are more sources of bias. This could result either in poorer or better interobserver agreement than when 3D ultrasound volumes are assessed: poorer because ultrasound examiners are likely to use different machine settings and scanning conditions may change from one minute to another, better because the dynamic nature of live scanning facilitates discrimination between solid components and amorphous tissue.

Our results showed that two experienced sonologists agreed quite well in their classification of masses as benign or malignant using the 10% risk of malignancy cutoff of LR1 and LR2, and that the diagnostic performance of LR1 and LR2 with regard to discrimination between benign and malignant tumors was similar for the two sonologists and similar to that reported by others (14, 26-28). This is reassuring, because the main purpose of using models LR1 and LR2 is to classify tumors as benign or malignant. Potentially, however, LR1 and LR2 can be used not only to classify adnexal masses as benign or malignant but also to counsel a patient about her individual risk of malignancy (13). To use the calculated risk for individual counseling, one must be reasonably certain not only that the estimated risk agrees well with the true risk (when externally validated, both LR1 and LR2 underestimated the true risk, especially in the risk interval 30-70% (14)), but also that the risk estimates are reproducible, i.e., that different examiners will obtain similar risk estimates. Our results show that risk estimates may differ substantially between experienced observers, the difference in estimated risk being >25.0 percentage units in 9% and 12% of cases when using LR1 and LR2, respectively. Interobserver agreement above chance was poorest for those variables in the models that are heavily dependent on subjective evaluation, i.e., color score, presence of color Doppler signals in papillary projections, irregular cyst walls, and acoustic shadowing. Indeed, differences in these

explained most of the largest inter-observer differences in calculated risk of malignancy. In models based on few variables, changing values in only one variable may result in large differences in predicted risks, while a model with many variables is less vulnerable to a change in one or even a few variables. Our results illustrate this (Supplementary Tables S2 and S3). When using LR2 (which includes six variables), a change in value for one single categorical variable explained an inter-observer difference in calculated risk >25 percentage units in eight of 14 cases, while when using LR1 (which includes 12 variables), a change in value for one single categorical variable explained an inter-observer difference in calculated risk >25 percentage units in only four of 11 cases. Acoustic shadowing is a strong variable in both LR1 and LR2 and has great impact on the calculated risk in LR2, with only six variables. In our hands, as well as in those of Ruiz de Gauna et al. (23), inter-observer agreement for acoustic shadowing was at most moderate. The interobserver agreement for color score was only fair in our study, and color score is an important variable in LR1.
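
The weight of a single categorical variable can be illustrated with a short worked calculation that uses the coefficients for acoustic shadows given in the footnote to Table 1. The starting point (z = 0, i.e., a calculated risk of 50% when no shadowing is recorded) is a hypothetical choice made for illustration and is not a case from the study. If the other observer records acoustic shadowing and all other variables are unchanged, z decreases by the size of the coefficient:

```latex
\begin{align*}
\text{LR2:}\quad z &= 0 - 2.9486, & y &= \frac{1}{1 + e^{2.9486}} \approx 0.050 \\
\text{LR1:}\quad z &= 0 - 2.3550, & y &= \frac{1}{1 + e^{2.3550}} \approx 0.087
\end{align*}
```

Under these assumptions, a disagreement on acoustic shadowing alone moves the calculated risk from 50% to about 5% with LR2 and to about 9% with LR1, that is, by roughly 45 and 41 percentage units, respectively.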

To improve inter-observer reproducibility of calculated risks based on LR1 and LR2, inter-observer differences in descriptions and measurements of adnexal masses using the IOTA terminology and measurement technique need to be reduced. One way to achieve this could be by providing courses on and training in how to examine and describe adnexal masses using the IOTA terms. Interactive courses in which a large number of ultrasound images are discussed with the course participants are likely to be very valuable in this respect. More precise definitions of the IOTA terms, for example by providing ample imaging material, would probably also help improve inter-observer agreement. Special attention should be given to the variables with poorest reproducibility, i.e., the color score, wall irregularity, acoustic shadowing, and detection of blood flow in papillary projections. Until better inter-observer agreement in the calculated risk of malignancy using LR1 and LR2 has been shown, one should be cautious with using the risk estimates for individual patient counselling.

Acknowledgements

None

References

1. Granberg S, Norström A, Wikland M. Tumors in the lower pelvis as imaged by vaginal sonography. Gynecol Oncol 1990;37:224-9.

2. Benacerraf BR, Finkler NJ, Wojciechowski C, Knapp RC. Sonographic accuracy in the diagnosis of ovarian masses. J Reprod Med 1990;35:491-5.

3. Valentin L. Pattern recognition of pelvic masses by gray-scale ultrasound imaging: the contribution of Doppler ultrasound. Ultrasound Obstet Gynecol 1999;14:338-47.

4. Valentin L. Prospective cross-validation of Doppler ultrasound examination and gray-scale ultrasound imaging for discrimination of benign and malignant pelvic masses. Ultrasound Obstet Gynecol 1999;14:273-83.

5. Timmerman D, Schwärzler P, Collins WP, Claerhout F, Coenen M, Amant F, et al. Subjective assessment of adnexal masses with the use of ultrasonography: an analysis of interobserver variability and experience. Ultrasound Obstet Gynecol 1999;13:11-6.

6. Sokalska A, Timmerman D, Testa AC, Van Holsbeke C, Lissoni AA, Leone FPG, et al. Diagnostic accuracy of transvaginal ultrasound examination for assigning a specific diagnosis to adnexal masses. Ultrasound Obstet Gynecol 2009;34:462-70.

7. Valentin L. Use of morphology to characterize and manage common adnexal masses. Best Pract Res Clin Obstet Gynaecol 2004;18:71-89.

8. Tailor A, Jurkovic D, Bourne TH, Collins WP, Campbell S. Sonographic prediction of malignancy in adnexal masses using multivariate logistic regression analysis. Ultrasound Obstet Gynecol 1997;10:41-7.

9. Timmerman D, Bourne TH, Tailor A, Collins WP, Verrelst H, Vandenberghe K, et al. A comparison of methods for the preoperative discrimination between benign and malignant adnexal masses: the development of a new logistic regression model. Am J Obstet Gynecol 1999;181:57-65.

10. Alcazar JL, Jurado M. Prospective evaluation of logistic model based on sonographic morphologic and color Doppler findings developed to predict adnexal malignancy. J Ultrasound Med 1999;18:837-42.

11. Timmerman D, Valentin L, Bourne TH, Collins WP, Verrelst H, Vergote I. Terms, definitions and measurements to describe the sonographic features of adnexal tumors: a consensus opinion from the International Ovarian Tumor Analysis (IOTA) Group. Ultrasound Obstet Gynecol 2000;16:500-5.

12. Timmerman D, Testa AC, Bourne T, Ferrazzi E, Ameye L, Konstantinovic ML, et al. Logistic regression model to distinguish between the benign and malignant adnexal mass before surgery: a multicenter study by the International Ovarian Tumor Analysis Group. J Clin Oncol 2005;34:8794-801.

13. Kaijser J, Bourne T, Valentin L, Sayasneh A, Van Holsbeke C, Vergote I, et al. Improving strategies for diagnosing ovarian cancer: a summary of the International Ovarian Tumor Analysis (IOTA) studies. Ultrasound Obstet Gynecol 2013;41:9-20.

14. Timmerman D, Van Calster B, Testa AC, Guerriero S, Fischerova D, Lissoni AA, et al. Ovarian cancer prediction in adnexal masses using ultrasound-based logistic regression models: a temporal and external validation study by the IOTA group. Ultrasound Obstet Gynecol 2010;36:226-34.

15. Sladkevicius P, Valentin L. Intra- and inter-observer agreement when describing adnexal masses using the International Ovarian Tumour Analysis (IOTA) terms and definitions: a study on three-dimensional (3D) ultrasound volumes. Ultrasound Obstet Gynecol 2013;41:318-27.

16. Heintz APM, Odicino F, Maisonneuve P, Beller U, Benedet JL, Creasman WT, et al. Carcinoma of the Ovary. 25th Annual Report on the Results of Treatment in Gynecological Cancer. Int J Gynecol Obstet 2003;83:S135-S166 (suppl 1).

17. Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas 1960;20:37–46.

18. Kundel HL, Polansky M. Measurement of observer agreement. Radiology 2003;228:303-8.

19. Brennan P, Silman A. Statistical methods for assessing observer variability in clinical measures. BMJ 1992;304:1491-4.

20. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986;1:307-10.

21. Bartlett JW, Frost C. Reliability, repeatability and reproducibility: analysis of measurement errors in continuous variables. Ultrasound Obstet Gynecol 2008;31:466-75.

22. Kottner J, Audigé L, Brorson S, Donner A, Gajewski BJ, Hróbjartsson A, et al. Guidelines for Reporting Reliability and Agreement Studies (GRRAS) were proposed. J Clin Epidemiol 2011;64:96-106.

23. Ruiz de Gauna B, Sanchez P, Pineda L, Utrilla-Layna J, Juez L, Alcázar JL. Inter-observer agreement with regard to describing adnexal masses using the IOTA simple rules in a real-time setting and when using three-dimensional ultrasound volumes and digital clips. Ultrasound Obstet Gynecol 2014;44:95-100.

24. Timmerman D, Testa AC, Bourne T, Ameye L, Jurkovic D, Van Holsbeke C, et al. Simple ultrasound-based rules for the diagnosis of ovarian cancer. Ultrasound Obstet Gynecol 2008;31:681-90.

25. Timmerman D, Ameye L, Fischerova D, Epstein E, Melis GB, Guerriero S, et al. Simple ultrasound rules to distinguish between benign and malignant adnexal masses before surgery: prospective validation by IOTA group. BMJ 2010;341:c6839.

26. Van Holsbeke C, Van Calster B, Testa AC, Domali E, Lu C, Van Huffel S, et al. Prospective internal validation of mathematical models to predict malignancy in adnexal masses: results from the international ovarian tumor analysis study. Clin Cancer Res 2009;15:684-91.

27. Nunes N, Yazbek J, Ambler G, Hoo W, Naftalin J, Jurkovic D. Prospective evaluation of the IOTA logistic regression model LR2 for the diagnosis of ovarian cancer. Ultrasound Obstet Gynecol 2012;40:355-9.

28. Nunes N, Ambler G, Hoo WL, Naftalin J, Foo X, Widschwendter M, et al. A prospective validation of the IOTA logistic regression models (LR1 and LR2) in comparison to subjective pattern recognition for the diagnosis of ovarian cancer. Int J Gynecol Cancer 2013;23:1583-9.

Table 1. Variables for which inter-observer reproducibility was estimated

Variables included in logistic regression models LR1 and LR2 or of importance for these models
  Maximum diameter of adnexal mass, mm
  Maximum diameter of largest solid component, mm
  Maximum diameter of largest solid component: ≤50, >50 mm
  Presence of an entirely solid adnexal mass: yes, no
  Papillary projections present: yes, no
  Irregular internal cyst walls: yes, no
  Acoustic shadows: yes, no
  Color Doppler signals in papillary projection: yes, no
  Color score: 1, 2, 3, 4
  Tenderness of adnexal mass at scan: yes, no
  Ascites: yes, no

Other IOTA variables
  Continuous IOTA variables
    Mean diameter of adnexal mass, mm
    Mean diameter of largest solid component, mm
    Height of the largest papillary projection, mm
  Categorical IOTA variables
    Type of tumor: unilocular, unilocular solid, multilocular, multilocular solid, solid
    Mean diameter of adnexal mass: ≤40, 41-60, 61-80, 81-100 and >100 mm
    Number of cyst locules: 0, 1, 2, 3, 4, 5, 6-10, 11-20 and >20; ≤10, >10†
    Septum present: yes, no
    Incomplete septum present: yes, no
    Solid component: yes, no
    Papillary projections: 0, 1, 2, 3, ≥4; 1, 2, 3, ≥4‡; <4, >4†
    Echogenicity of cyst fluid: anechoic, low level, ground glass, mixed, no cyst fluid
    Color Doppler signals detectable: yes, no
    Ovarian crescent sign: yes, no

Diagnosis based on LR1 and LR2
  Calculated risk of malignancy using LR1: ≤10%, >10%
  Calculated risk of malignancy using LR2: ≤10%, >10%

IOTA, international ovarian tumor analysis group; LR1, logistic regression model 1; LR2, logistic regression model 2.

The risk of malignancy using LR1 is derived as

$$y = \frac{1}{1 + e^{-z}}$$

where

$$z = -6.7468 + 1.5985\,(a) - 0.9983\,(b) + 0.0326\,(c) + 0.00841\,(d) - 0.8577\,(e) + 1.5513\,(f) + 1.1737\,(g) + 0.9281\,(h) + 0.0496\,(i) + 1.1421\,(j) - 2.3550\,(k) + 0.4916\,(l)$$

and e is the mathematical constant and base value of natural logarithms. The 12 variables included in LR1 are: (a) personal history of ovarian cancer (yes = 1, no = 0); (b) current hormonal therapy (yes = 1, no = 0); (c) age of the patient (in years); (d) maximum diameter of the lesion (in millimeters); (e) mass tender at scan (yes = 1, no = 0); (f) the presence of ascites (yes = 1, no = 0); (g) the presence of blood flow within a solid papillary projection (yes = 1, no = 0); (h) the presence of a purely solid tumor (yes = 1, no = 0); (i) maximal diameter of the solid component (expressed in millimeters, but with no increase > 50 mm); (j) irregular internal cyst walls (yes = 1, no = 0); (k) the presence of acoustic shadows (yes = 1, no = 0); and (l) color score (1, 2, 3, or 4).

The risk of malignancy using LR2 is derived as $y = 1/(1 + e^{-z})$ where

$$z = -5.3718 + 0.0354\,(c) + 1.6159\,(f) + 1.1768\,(g) + 0.0697\,(i) + 0.9586\,(j) - 2.9486\,(k)$$

(12). A calculated risk of malignancy of more than 10% classified the adnexal mass as malignant.

† includes 0. ‡ includes only cases where papillary projections were registered at both examinations.
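
The LR1 and LR2 formulas in the footnote to Table 1 can be sketched in code as follows. This is a minimal illustration, not the software used in the study: the coefficients are those given in the footnote, while the dictionary keys, the function name `risk_of_malignancy`, and the example findings are hypothetical names and values introduced here.

```python
import math

# Coefficients from the footnote to Table 1. The key names are hypothetical
# labels chosen for this sketch; the letters refer to the footnote.
LR1 = {
    "intercept": -6.7468,
    "history_ovarian_cancer": 1.5985,        # (a) yes = 1, no = 0
    "hormonal_therapy": -0.9983,             # (b) yes = 1, no = 0
    "age_years": 0.0326,                     # (c)
    "max_lesion_diameter_mm": 0.00841,       # (d)
    "tender_at_scan": -0.8577,               # (e) yes = 1, no = 0
    "ascites": 1.5513,                       # (f) yes = 1, no = 0
    "flow_in_papillary_projection": 1.1737,  # (g) yes = 1, no = 0
    "purely_solid": 0.9281,                  # (h) yes = 1, no = 0
    "max_solid_diameter_mm": 0.0496,         # (i) no increase above 50 mm
    "irregular_cyst_walls": 1.1421,          # (j) yes = 1, no = 0
    "acoustic_shadows": -2.3550,             # (k) yes = 1, no = 0
    "color_score": 0.4916,                   # (l) 1, 2, 3 or 4
}
LR2 = {
    "intercept": -5.3718,
    "age_years": 0.0354,                     # (c)
    "ascites": 1.6159,                       # (f)
    "flow_in_papillary_projection": 1.1768,  # (g)
    "max_solid_diameter_mm": 0.0697,         # (i)
    "irregular_cyst_walls": 0.9586,          # (j)
    "acoustic_shadows": -2.9486,             # (k)
}


def risk_of_malignancy(model, findings):
    """Return y = 1 / (1 + e^-z) for one set of findings."""
    x = dict(findings)
    # Variable (i) is entered in millimeters but with no increase above 50 mm.
    x["max_solid_diameter_mm"] = min(x["max_solid_diameter_mm"], 50)
    z = model["intercept"] + sum(
        coef * x[name] for name, coef in model.items() if name != "intercept"
    )
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical findings for illustration only (not a patient from the study).
example = {
    "history_ovarian_cancer": 0, "hormonal_therapy": 0, "age_years": 50,
    "max_lesion_diameter_mm": 60, "tender_at_scan": 0, "ascites": 0,
    "flow_in_papillary_projection": 0, "purely_solid": 0,
    "max_solid_diameter_mm": 0, "irregular_cyst_walls": 0,
    "acoustic_shadows": 0, "color_score": 2,
}

for label, model in (("LR1", LR1), ("LR2", LR2)):
    risk = risk_of_malignancy(model, example)
    verdict = "malignant" if risk > 0.10 else "benign"  # 10% cutoff
    print(f"{label}: {100 * risk:.1f}% ({verdict})")
```

Because each observer's findings are simply a different input dictionary, calling the function once per observer gives the inter-observer difference in calculated risk for a mass.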

Table 2. Histological diagnoses of the masses

Histological diagnosis of tumor              Number
Benign tumors                                94
  Benign simple cyst                          7
  Endometrioma                               10
  Dermoid cyst                               16
  Serous cystadenoma                         16
  Mucinous cystadenoma                       18
  Myoma/fibroma                               9
  Cystadenofibroma                           11
  Paraovarian cyst                            5
  Sactosalpinx/chronic salpingitis            1
  Leydig cell tumor                           1
Borderline tumors                             4
  Serous                                      2
  Mucinous                                    1
  Endometrioid                                1
Invasive malignancy                          19
  Primary ovarian adenocarcinoma             13
  Granulosa cell tumor                        3
  Dysgerminoma                                1
  Leiomyosarcoma                              1
  Malignant aggressive B-cell lymphoma        1

Table 3. Inter-observer reproducibility of continuous variables used in the risk calculation models LR1 and LR2 and of other continuous variables used to describe adnexal masses

Column groups: measurement results (both sonologists): median, range; difference in mm between two measurements made by sonologists 1 and 2 [a]: mean, 95% CI, limits of agreement; Intra-CC: point estimate (95% CI).

Parameter                                             Median          Range      Mean            95% CI          Limits of agreement   Intra-CC (95% CI)
Variables used in models LR1 and LR2
  Maximum diameter of adnexal mass, mm                70 (n=234)      10 – 313   3.80 (n=117)    1.12 – 6.48     -25.24 – 32.82        0.958 (0.937 – 0.971)
  Maximum diameter of largest solid component, mm [b] 29.50 (n=122)   5 – 180    1.92 (n=61)     -1.74 – 5.58    -26.66 – 30.50        0.942 (0.905 – 0.964)
Other variables used to describe adnexal mass
  Mean diameter of adnexal mass, mm                   58.5 (n=234)    9 – 240    1.05 (n=117)    -0.15 – 1.95    -8.61 – 10.72         0.971 (0.958 – 0.980)
  Mean diameter of largest solid component, mm [b]    22 (n=122)      4 – 156    0.59 (n=61)     -1.82 – 2.98    -18.16 – 19.32        0.962 (0.937 – 0.977)
  Height of largest papillary projection, mm [c]      8 (n=42)        3 – 25     -0.51 (n=21)    -2.93 – 1.91    -11.61 – 10.59        0.609 (0.245 – 0.821)

[a] The measurements made by sonologist 2 were subtracted from the measurements made by sonologist 1.
CI, confidence interval; Intra-CC, intra-class correlation coefficient.
[b] Includes only cases where solid components were registered by both observers.
[c] Includes only cases where papillary projections were registered by both observers; using logarithmically transformed data, the results were as follows: mean difference 1.03 (95% CI 0.78 – 1.36), limits of agreement 0.31 – 3.61, Intra-CC 0.547 (0.153 – 0.789); these results can be used for comparison with our published results on inter-observer agreement when analyzing ultrasound volumes obtained by three-dimensional ultrasound (15).
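
The mean difference, its 95% CI, and the limits of agreement reported in Table 3 follow the Bland-Altman approach (20). The sketch below shows one way such figures can be computed from paired measurements. The function name `bland_altman` and the sample data are hypothetical, and the use of the normal multiplier 1.96 for both the confidence interval and the limits of agreement is an assumption; it is, however, consistent with the figures reported for the maximum diameter of the mass.

```python
import math
import statistics


def bland_altman(first, second):
    """Summarize paired measurements as in a Bland-Altman analysis.

    Differences are taken as first minus second, matching the table
    (sonologist 1 minus sonologist 2).
    """
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of differences
    half_ci = 1.96 * sd / math.sqrt(n)  # assumed normal approximation
    return {
        "mean": mean,
        "ci95": (mean - half_ci, mean + half_ci),
        "limits_of_agreement": (mean - 1.96 * sd, mean + 1.96 * sd),
    }


# Hypothetical diameters in mm (not study data).
sonologist_1 = [70, 45, 120, 33, 88]
sonologist_2 = [66, 47, 112, 30, 90]
print(bland_altman(sonologist_1, sonologist_2))
```

A confidence interval for the mean difference that excludes zero indicates a systematic difference between observers, whereas the limits of agreement describe how far apart two observers' measurements of the same mass can be expected to lie.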

Table 4. Inter-observer agreement for categorical variables included in the risk calculation models LR1 and LR2 and for other categorical variables used to describe adnexal masses

Variable                                                                             Agreement, % (n/N)   Kappa value
Variables used in models LR1 and LR2
  Presence of entirely solid tumor (yes, no)                                         97 (114/117)         0.91
  Maximum diameter of largest solid component (≤50 mm, >50 mm)                       95 (111/117)         0.83
  Maximum diameter of largest solid component (≤50 mm, >50 mm) [a]                   92 (56/61)           0.82
  Tenderness of adnexal mass (yes, no)                                               97 (113/117)         0.78
  Ascites (yes, no)                                                                  97 (113/117)         0.73
  Papillary projections (yes, no)                                                    87 (102/117)         0.65
  Solid component (yes, no)                                                          84 (98/117)          0.67
  Irregular cyst wall (yes, no)                                                      79 (92/117)          0.56
  Acoustic shadows (yes, no)                                                         85 (99/117)          0.58
  Color Doppler signals in papillary projections (yes, no) [b]                       90 (105/117)         0.48
  Color Doppler signals in papillary projections (yes, no) [c]                       86 (18/21)           0.71
  Color score (1, 2, 3, 4)                                                           40 (47/117)          0.36 [d]
Other variables used to describe adnexal masses
  Tumor type (unilocular, unilocular solid, multilocular, multilocular solid, solid) 97 (113/117)         0.70
  Mean diameter of tumor (mm): ≤40, 41-60, 61-80, 81-100, >100                       66 (77/117)          0.76 [d]
  Number of locules: 0, 1, 2, 3, 4, 5, 6-10, 11-20, >20                              58 (68/117)          0.79 [d]
  Number of locules: ≤10, >10 [e]                                                    94 (110/117)         0.80
  Septum (yes, no)                                                                   88 (103/117)         0.76
  Incomplete septum (yes, no)                                                        98 (115/117)         ¶
  Number of papillary projections: 0, 1, 2, 3, ≥4                                    81 (95/117)          0.64 [d]
  Number of papillary projections: 1, 2, 3, ≥4 [c]                                   67 (14/21)           0.63 [d]
  Number of papillary projections: <4, >4 [e]                                        96 (112/117)         0.68
  Echogenicity of cyst fluid (anechoic, low level, ground glass, mixed, no fluid)    67 (78/117)          0.56
  Color Doppler signals detectable (yes, no)                                         84 (98/117)          0.30
  Ovarian crescent sign (yes, no)                                                    78 (91/117)          0.49
  Fluid in Douglas pouch (yes, no)                                                   87 (102/117)         0.72

[a] Includes only cases where solid components were registered at both examinations.
[b] This variable reflects disagreement both with regard to the presence of papillary projections and the presence of blood flow in papillary projections (i.e., if one observer recorded the presence of papillary projections and the other did not, there was disagreement with regard to this variable, and if both observers recorded the presence of papillary projections but disagreed with regard to the presence of blood flow in the papillary projections, there was disagreement with regard to this variable).
[c] Includes only cases where papillary projections were registered at both examinations.
[d] Weighted Kappa index.
[e] Includes 0.
¶ Absent kappa values are explained by asymmetric field tables making it impossible to calculate them.

Table 5. Inter-observer reproducibility of the risk of malignancy calculated using the risk calculation models LR1 and LR2

Column groups: calculated risk of malignancy (both sonologists): median, range; difference between the risk calculated by sonologist 1 and 2: mean, 95% CI, limits of agreement; Intra-CC: point estimate (95% CI).

Parameter                                  Median          Range          Mean            95% CI          Limits of agreement   Intra-CC (95% CI)
Risk of malignancy calculated using LR1    7.85 (n=234)    0.10 – 99.10   -0.53 (n=117)   -3.07 – 2.01    -28.05 – 26.99        0.911 (0.874 – 0.937)
Risk of malignancy calculated using LR2    6.65 (n=234)    0.10 – 98.40   0.02 (n=117)    -3.06 – 3.10    -33.22 – 33.26        0.832 (0.766 – 0.880)

CI, confidence interval; Intra-CC, intra-class correlation coefficient.
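
For readers who wish to reproduce an intra-class correlation coefficient of the kind shown in Table 5, the sketch below computes a two-way, single-measurement, absolute-agreement coefficient from analysis-of-variance mean squares. Which form of the coefficient was used in the study is not stated here, so this choice is an assumption, and the function name `icc_absolute_agreement` and the sample risks are hypothetical.

```python
def icc_absolute_agreement(ratings):
    """Two-way, single-measurement, absolute-agreement ICC.

    ratings: one row per subject, one column per observer.
    """
    n = len(ratings)       # number of subjects
    k = len(ratings[0])    # number of observers
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for subjects, observers, and residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical calculated risks in % for five masses (not study data):
# each row is (sonologist 1, sonologist 2).
risks = [(2.1, 3.0), (8.5, 12.0), (45.0, 30.5), (91.0, 95.2), (0.4, 0.6)]
print(round(icc_absolute_agreement(risks), 3))
```

An absolute-agreement form penalizes systematic differences between observers as well as random scatter, which is why it is a natural candidate when the question is whether two examiners obtain the same risk estimate.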

Legends for figure

Figure 1. a) Scatterplot showing the relationship between inter-observer difference (observer 1 minus observer 2) in calculated risk and magnitude of calculated risk when using logistic regression model LR1. The plot manifests a diamond shape, the differences being smallest for the lowest and highest risks. For risks <2.5% and >95% the differences are very small. LOA, limits of agreement. b) Scatterplot showing the relationship between inter-observer difference in calculated risk and magnitude of calculated risk when using logistic regression model LR2. The plot manifests a diamond shape, the differences being smallest for the lowest and highest risks.

Povilas Sladkevicius and Lil Valentin. Inter-observer agreement in describing the ultrasound appearance of adnexal masses and in calculating the risk of malignancy using logistic regression models. Clin Cancer Res. Published OnlineFirst November 25, 2014. doi: 10.1158/1078-0432.CCR-14-0906

Supplementary material: http://clincancerres.aacrjournals.org/content/suppl/2014/11/26/1078-0432.CCR-14-0906.DC1

Results

In all, 117 consecutive women with adnexal masses who underwent surgery were examined with ultrasound by the two sonologists as described above. Thirty-four women had bilateral adnexal masses. The most complex mass - or the largest one if both masses had similar ultrasound morphology - was used in our statistical analysis, the mass to be included being selected retrospectively to ensure that both sonologists contributed the same mass (right or left) to the analysis. Thus, 117 adnexal masses from 117 patients constitute our study population. The women's age ranged between 14 and 88 years (median 53), and 63 (54%) women were postmenopausal. There were 94 benign, four borderline, and 19 invasively malignant adnexal masses (Table 2).

The time elapsed between the ultrasound examinations of sonologists 1 and 2 was median 61 days (10th and 90th percentiles 13 and 132, range 1-204) for the tumors with benign histology and median 14 days (10th and 90th percentiles 2 and 31, range 1-41) for the tumors with malignant histology. There was no relationship between the number of days between the scans and the differences in measurement results or inter-observer agreement for discrete variables (Supplementary Fig. S1-S5 and Supplementary Table S1).

Inter-observer reproducibility of measurement results is shown in Table 3. Bland-Altman plots showed no clear trend for inter-observer differences in measurement results to change with the magnitude of the measurement values. Limits of agreement were wide for all measurements. There was one systematic difference between the two sonologists, sonologist 1 (who always performed the first examination) obtaining higher measurement values for the maximum diameter of the mass. The least reliable measurement was the height of the largest papillary projection.

Inter-observer agreement when assessing categorical ultrasound variables is shown in Table 4. For most categorical ultrasound variables, inter-observer agreement beyond chance was good

or very good (19). Inter-observer agreement beyond chance for variables included in LR1 or LR2 was poorest for color score (agreement 40%, weighted Kappa 0.36), presence of blood flow in papillary projection (agreement 90%, Kappa 0.48), irregular cyst wall (agreement 79%, Kappa 0.56), and acoustic shadowing (agreement 85%, Kappa 0.58).
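
Agreement beyond chance of the kind quoted above is expressed by Cohen's Kappa (17). The sketch below computes it from a cross-tabulation of the two observers' ratings, with an optional weighted variant for ordered categories such as the color score. The function name `cohen_kappa` and the example table are hypothetical, and the linear weighting scheme is an assumption, since the weights used for the weighted Kappa are not stated here.

```python
def cohen_kappa(table, weighted=False):
    """Cohen's Kappa from a square cross-tabulation.

    table[i][j] is the number of masses placed in category i by observer 1
    and in category j by observer 2. With weighted=True, linear agreement
    weights are used (an assumption made for this sketch).
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        if weighted:
            return 1.0 - abs(i - j) / (k - 1)
        return 1.0 if i == j else 0.0

    cells = [(i, j) for i in range(k) for j in range(k)]
    observed = sum(weight(i, j) * table[i][j] for i, j in cells) / n
    expected = sum(
        weight(i, j) * row_totals[i] * col_totals[j] for i, j in cells
    ) / n ** 2
    return (observed - expected) / (1.0 - expected)


# Hypothetical yes/no cross-tabulation (not study data):
# rows = observer 1 (yes, no), columns = observer 2 (yes, no).
shadowing = [[20, 5], [10, 15]]
print(round(cohen_kappa(shadowing), 2))
```

Because Kappa corrects for the agreement expected from the marginal totals alone, a variable that is rarely recorded as present can show high percentage agreement and still have a modest Kappa, as seen for blood flow in papillary projections.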

Bland-Altman plots illustrating the relationship between the magnitudes of the estimated risk of malignancy calculated using LR1 and LR2 and the interobserver difference in calculated risk are shown in Figure 1. The plots manifest a diamond shape, i.e., the interobserver differences are smallest for the lowest and highest risks, and they are very small for risks <2.5% and >95%. Logarithmic transformation of the data (20) did not substantially change the shape of the scatter plot. Therefore, we present our results as absolute inter-observer differences in calculated risk (in percentage units), see Table 5. There were no systematic differences in calculated risks between the two sonologists, and reliability reflected by the ICC values was good (22), with ICC values for LR1 0.911 and for LR2 0.832. When classifying tumors as having a risk of malignancy <10% (benign) or >10% (malignant) using LR1 or LR2, the inter-observer agreement was good for both models: inter-observer agreement 84% (98/117), Kappa value 0.68 for model LR1, and inter-observer agreement 85% (99/117), Kappa 0.68 for model LR2. In the 19 cases where the two sonologists obtained different results with regard to malignancy when using LR1, the absolute interobserver differences in calculated risk ranged from 0.7 to 59.6 percentage units; in six of the 19 cases the absolute interobserver difference in calculated risk was <10.0 percentage units, in nine cases it was 10.0 – 24.9 percentage units, and in four cases it was >25.0 percentage units. In the 18 cases where the two sonologists obtained different results with regard to malignancy when using LR2, the absolute interobserver difference in calculated risk ranged from 8.8 to 67.9 percentage units; in two of the 18 cases the absolute interobserver

difference in calculated risk was lt100 percentage units in ten cases it was 100 ndash 249

percentage units and in six cases it was gt25 percentage units

The Bland-Altman plots (Figure 1) illustrate that for some tumors there were substantial interobserver differences in the calculated risk of malignancy: when using LR1, the interobserver difference in calculated risk was >25 percentage units in 11 tumors (9% of all tumors), and when using LR2, the interobserver difference in calculated risk was >25 percentage units in 14 tumors (12% of all tumors). To elucidate which interobserver differences explained these largest interobserver differences in calculated risk, we scrutinized each case where the difference was >25 percentage units. The results are shown in Supplementary Tables S2 and S3. When using LR1, a discrepancy for one single categorical variable explained the difference in four of the 11 cases, while a discrepancy for two categorical variables explained the difference in one case (differences in measurements being <5 mm in these five cases). In six cases there were differences in one or two categorical variables but also substantial differences (6-61 mm) in at least one measurement result. In no case was the large difference in calculated risk explained exclusively by differences in measurement results. The categorical variables judged differently by the two sonologists in these 11 cases were color score (n = 5), irregular cyst wall (n = 5), flow in papillary projection (n = 3), and acoustic shadowing (n = 2).

When using LR2, a discrepancy for one single categorical variable explained the large difference in calculated risk (>25 percentage units) in eight of the 14 cases (differences in measurements being <5 mm in these eight cases), and in four of the eight cases the sonologists judged acoustic shadowing differently. In five cases there were differences in one categorical variable but also a substantial difference (9-61 mm) in the measurement of the largest solid component. In yet another case there were differences in two categorical variables as well as in the measurement of the largest solid component. The categorical variables judged differently by the two sonologists in these 14 cases were acoustic shadowing (n = 5), irregular cyst wall (n = 5), ascites (n = 3), and flow in papillary projection (n = 2).

The sensitivity with regard to malignancy when using LR1 (10% risk cutoff) was 100% (23/23; 95% CI, 82-100) for both sonologists; the specificity was 74% (70/94; 95% CI, 64-82) for sonologist 1 and 63% (59/94; 95% CI, 53-72) for sonologist 2. The sensitivity when using LR2 was 100% (23/23; 95% CI, 82-100) for sonologist 1 and 91% (21/23; 95% CI, 72-98) for sonologist 2, and the specificity was 75.5% (71/94; 95% CI, 65-84) for both sonologists.
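The percentage agreement and Kappa values reported above for the benign/malignant classification can be illustrated with a short sketch of Cohen's kappa (17). The 2 x 2 cell counts used here are not given in the text: they are back-calculated, as an assumption of this sketch, from the figures reported for LR2 (46 and 44 masses classified as malignant by sonologists 1 and 2, respectively, and 99 concordant classifications among 117 masses). The function name `cohen_kappa` is likewise only illustrative.

```python
def cohen_kappa(table):
    """Return (observed agreement, Cohen's kappa) for a square agreement table.

    Rows are the categories assigned by rater 1, columns those assigned by rater 2.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance from the marginal totals
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2
    return observed, (observed - expected) / (1 - expected)


# Assumed LR2 table (back-calculated, not taken from the paper):
# rows = sonologist 1 (malignant, benign), columns = sonologist 2 (malignant, benign)
lr2_table = [[36, 10],
             [8, 63]]

agreement, kappa = cohen_kappa(lr2_table)
print(f"agreement = {agreement:.1%}, kappa = {kappa:.2f}")
```

Under these assumed counts the sketch gives an agreement of about 85% and a kappa of about 0.68, in line with the values reported for LR2.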

Discussion

We have shown substantial inter-observer variability in the results of measurements taken in adnexal masses (wide limits of agreement). Inter-observer agreement beyond chance was very good or good for most categorical variables, but it was only moderate or fair for some. Inter-observer agreement above chance was poorest for variables heavily dependent on subjective evaluation and/or machine settings, i.e., color score, presence of color Doppler signals in papillary projections, irregular cyst walls, acoustic shadowing (all four variables being included in LR1 or LR2), echogenicity of cyst fluid, and ovarian crescent sign. Despite this, there was good inter-observer agreement when classifying tumors as benign or malignant using the predetermined risk of malignancy cut-off of 10%. However, in some cases there were substantial differences in the calculated risk of malignancy between the two sonologists, the difference being >25.0 percentage units in 9% of all tumors when using LR1 and in 12% of all tumors when using LR2.

The strength of our study is that it provides new information. To the best of our knowledge, there is only one publication reporting on interobserver agreement with regard to describing ultrasound findings in adnexal masses using the IOTA terminology (11) when performing live ultrasound examinations (23). However, that study (23) evaluated interobserver agreement with regard to the ten ultrasound features in the IOTA simple rules (24, 25), not the variables included in the IOTA logistic regression models LR1 and LR2, and agreement was estimated between examiners with different levels of experience. The variable with poorest agreement beyond chance in the study cited was acoustic shadowing (Kappa 0.36). We have found no published study that has estimated inter-observer reproducibility of the calculated risk of malignancy using LR1 or LR2 after live scanning.

It is a limitation of our study that up to 204 days elapsed between the scans of the two sonologists (up to 41 days for malignant masses). Because days elapsed between the scans, the inter-observer differences could theoretically be explained by the lesions having changed in size or morphology between the scans. We find this highly unlikely for the following reasons. First, there was no relationship between the differences in measurement results and the number of days between the scans (Supplementary Fig. S1-S5). Nor was there a clear tendency for inter-observer agreement for discrete variables to depend on the time between the scans (Supplementary Table S1). Second, one would expect a lesion and its components to increase in size with time, but sonologist 1, who performed the first scan, obtained higher measurement values than sonologist 2. Third, it is our experience, after having performed gynecological scans for more than 20 years, that the ultrasound morphology of both benign and malignant adnexal masses remains constant over time, that benign adnexal lesions grow slowly, and that malignant masses do not change appreciably in size even during 1 month of observation. Therefore, we believe that the discrepancies between the two sonologists reflect true inter-observer differences and not a change of the masses over time. A second limitation is that we did not include estimation of the reproducibility of retrieving anamnestic information (current hormonal therapy, personal history of ovarian cancer), the anamnestic information collected by the second sonologist being used in all cases. It cannot be entirely excluded that patients would answer differently when asked by different sonologists or that sonologists could interpret the answers of the patients differently. A third limitation is that we did not estimate intra-observer reproducibility. We considered four scans (two per sonologist) likely to be unacceptable to patients. For the same reason, only two sonologists were involved in this study, and our results are generalizable only to sonologists with a similar level of experience.

The results of this live scanning study are similar to those of another study in which the same sonologists assessed the same variables using 3D ultrasound volumes from adnexal masses in another tumor population (15). The similarity in results between the two studies is surprising, because the conditions when assessing 3D ultrasound volumes are different from those during a live scan. When evaluating ultrasound volumes, sonologists are exposed to the same ultrasound images, and so any interobserver difference should be explained exclusively by differences in interpreting the ultrasound information. During a live scan there are more sources of bias. This could result either in poorer or better interobserver agreement than when 3D ultrasound volumes are assessed: poorer because ultrasound examiners are likely to use different machine settings and scanning conditions may change from one minute to another; better because the dynamic nature of live scanning facilitates discrimination between solid components and amorphous tissue.

Our results showed that two experienced sonologists agreed quite well in their classification of masses as benign or malignant using the 10% risk of malignancy cutoff of LR1 and LR2, and that the diagnostic performance of LR1 and LR2 with regard to discrimination between benign and malignant tumors was similar for the two sonologists and similar to that reported by others (14, 26-28). This is reassuring, because the main purpose of using models LR1 and LR2 is to classify tumors as benign or malignant. Potentially, however, LR1 and LR2 can be used not only to classify adnexal masses as benign or malignant but also to counsel a patient about her individual risk of malignancy (13). To use the calculated risk for individual counseling, one must be reasonably certain not only that the estimated risk agrees well with the true risk (when externally validated, both LR1 and LR2 underestimated the true risk, especially in the risk interval 30%-70% (14)) but also that the risk estimates are reproducible, i.e., that different examiners will obtain similar risk estimates. Our results show that risk estimates may differ substantially between experienced observers, the difference in estimated risk being >25.0 percentage units in 9% and 12% of cases when using LR1 and LR2, respectively. Interobserver agreement above chance was poorest for those variables in the models that are heavily dependent on subjective evaluation, i.e., color score, presence of color Doppler signals in papillary projections, irregular cyst walls, and acoustic shadowing. Indeed, differences in these variables explained most of the largest inter-observer differences in calculated risk of malignancy. In models based on few variables, changing values in only one variable may result in large differences in predicted risks, while a model with many variables is less vulnerable to a change in one or even a few variables. Our results illustrate this (Supplementary Tables S2 and S3). When using LR2 (which includes six variables), a change in value for one single categorical variable explained an inter-observer difference in calculated risk >25 percentage units in eight of 14 cases, while when using LR1 (which includes 12 variables), a change in value for one single categorical variable explained an inter-observer difference in calculated risk >25 percentage units in only four of 11 cases. Acoustic shadowing is a strong variable in both LR1 and LR2 and has great impact on the calculated risk in LR2, which has only six variables. In our hands, as well as in those of Ruiz de Gauna et al. (23), inter-observer agreement for acoustic shadowing was at most moderate. The interobserver agreement for color score was only fair in our study, and color score is an important variable in LR1.
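The impact of a single discordant binary variable can be made concrete with a small sketch. In a logistic model, changing one yes/no variable shifts the linear predictor z by that variable's coefficient, whatever the other variables are. The coefficients for acoustic shadows below are those given in the footnote to Table 1; the baseline risk of 50% and the function name `shifted_risk` are hypothetical choices made only for illustration.

```python
import math

# Coefficients for acoustic shadows, variable (k), from the Table 1 footnote
SHADOW_COEF_LR1 = -2.3550
SHADOW_COEF_LR2 = -2.9486


def shifted_risk(baseline_risk, coefficient):
    """Risk after adding `coefficient` to the logit of `baseline_risk`."""
    logit = math.log(baseline_risk / (1.0 - baseline_risk))
    return 1.0 / (1.0 + math.exp(-(logit + coefficient)))


# Hypothetical mass: one observer records no acoustic shadows and obtains a
# risk of 50%; the other records shadows, all other variables being equal.
baseline = 0.50
for name, coef in (("LR1", SHADOW_COEF_LR1), ("LR2", SHADOW_COEF_LR2)):
    with_shadows = shifted_risk(baseline, coef)
    print(f"{name}: {baseline:.0%} without shadows, {with_shadows:.1%} with shadows")
```

In this hypothetical case, disagreement on acoustic shadowing alone moves the calculated risk by more than 40 percentage units in either model, with the larger shift in LR2.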

To improve inter-observer reproducibility of calculated risks based on LR1 and LR2, inter-observer differences in descriptions and measurements of adnexal masses using the IOTA terminology and measurement technique need to be reduced. One way to achieve this could be by providing courses on and training in how to examine and describe adnexal masses using the IOTA terms. Interactive courses in which a large number of ultrasound images are discussed with the course participants are likely to be very valuable in this respect. More precise definitions of the IOTA terms, for example by providing ample imaging material, would probably also help improve inter-observer agreement. Special attention should be given to the variables with poorest reproducibility, i.e., the color score, wall irregularity, acoustic shadowing, and detection of blood flow in papillary projections. Until better inter-observer agreement in the calculated risk of malignancy using LR1 and LR2 has been shown, one should be cautious with using the risk estimates for individual patient counselling.

Acknowledgements

None

References

1. Granberg S, Norström A, Wikland M. Tumors in the lower pelvis as imaged by vaginal sonography. Gynecol Oncol 1990;37:224-9.

2. Benacerraf BR, Finkler NJ, Wojciechowski C, Knapp RC. Sonographic accuracy in the diagnosis of ovarian masses. J Reprod Med 1990;35:491-5.

3. Valentin L. Pattern recognition of pelvic masses by gray-scale ultrasound imaging: the contribution of Doppler ultrasound. Ultrasound Obstet Gynecol 1999;14:338-47.

4. Valentin L. Prospective cross-validation of Doppler ultrasound examination and gray-scale ultrasound imaging for discrimination of benign and malignant pelvic masses. Ultrasound Obstet Gynecol 1999;14:273-83.

5. Timmerman D, Schwärzler P, Collins WP, Claerhout F, Coenen M, Amant F, et al. Subjective assessment of adnexal masses with the use of ultrasonography: an analysis of interobserver variability and experience. Ultrasound Obstet Gynecol 1999;13:11-6.

6. Sokalska A, Timmerman D, Testa AC, Van Holsbeke C, Lissoni AA, Leone FPG, et al. Diagnostic accuracy of transvaginal ultrasound examination for assigning a specific diagnosis to adnexal masses. Ultrasound Obstet Gynecol 2009;34:462-70.

7. Valentin L. Use of morphology to characterize and manage common adnexal masses. Best Pract Res Clin Obstet Gynaecol 2004;18:71-89.

8. Tailor A, Jurkovic D, Bourne TH, Collins WP, Campbell S. Sonographic prediction of malignancy in adnexal masses using multivariate logistic regression analysis. Ultrasound Obstet Gynecol 1997;10:41-7.

9. Timmerman D, Bourne TH, Tailor A, Collins WP, Verrelst H, Vandenberghe K, et al. A comparison of methods for the preoperative discrimination between benign and malignant adnexal masses: the development of a new logistic regression model. Am J Obstet Gynecol 1999;181:57-65.

10. Alcazar JL, Jurado M. Prospective evaluation of logistic model based on sonographic morphologic and color Doppler findings developed to predict adnexal malignancy. J Ultrasound Med 1999;18:837-42.

11. Timmerman D, Valentin L, Bourne TH, Collins WP, Verrelst H, Vergote I. Terms, definitions and measurements to describe the sonographic features of adnexal tumors: a consensus opinion from the International Ovarian Tumor Analysis (IOTA) Group. Ultrasound Obstet Gynecol 2000;16:500-5.

12. Timmerman D, Testa AC, Bourne T, Ferrazzi E, Ameye L, Konstantinovic ML, et al. Logistic regression model to distinguish between the benign and malignant adnexal mass before surgery: a multicenter study by the International Ovarian Tumor Analysis Group. J Clin Oncol 2005;34:8794-801.

13. Kaijser J, Bourne T, Valentin L, Sayasneh A, Van Holsbeke C, Vergote I, et al. Improving strategies for diagnosing ovarian cancer: a summary of the International Ovarian Tumor Analysis (IOTA) studies. Ultrasound Obstet Gynecol 2013;41:9-20.

14. Timmerman D, Van Calster B, Testa AC, Guerriero S, Fischerova D, Lissoni AA, et al. Ovarian cancer prediction in adnexal masses using ultrasound-based logistic regression models: a temporal and external validation study by the IOTA group. Ultrasound Obstet Gynecol 2010;36:226-34.

15. Sladkevicius P, Valentin L. Intra- and inter-observer agreement when describing adnexal masses using the International Ovarian Tumour Analysis (IOTA) terms and definitions: a study on three-dimensional (3D) ultrasound volumes. Ultrasound Obstet Gynecol 2013;41:318-27.

16. Heintz APM, Odicino F, Maisonneuve P, Beller U, Benedet JL, Creasman WT, et al. Carcinoma of the Ovary. 25th Annual Report on the Results of Treatment in Gynecological Cancer. Int J Gynecol Obstet 2003;83:S135-S166 (suppl 1).

17. Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas 1960;20:37–46.

18. Kundel HL, Polansky M. Measurement of observer agreement. Radiology 2003;228:303-8.

19. Brennan P, Silman A. Statistical methods for assessing observer variability in clinical measures. BMJ 1992;304:1491-4.

20. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986;1:307-10.

21. Bartlett JW, Frost C. Reliability, repeatability and reproducibility: analysis of measurement errors in continuous variables. Ultrasound Obstet Gynecol 2008;31:466-75.

22. Kottner J, Audigé L, Brorson S, Donner A, Gajewski BJ, Hróbjartsson A, et al. Guidelines for Reporting Reliability and Agreement Studies (GRRAS) were proposed. J Clin Epidemiol 2011;64:96-106.

23. Ruiz de Gauna B, Sanchez P, Pineda L, Utrilla-Layna J, Juez L, Alcázar JL. Inter-observer agreement with regard to describing adnexal masses using the IOTA simple rules in a real-time setting and when using three-dimensional ultrasound volumes and digital clips. Ultrasound Obstet Gynecol 2014;44:95-100.

24. Timmerman D, Testa AC, Bourne T, Ameye L, Jurkovic D, Van Holsbeke C, et al. Simple ultrasound-based rules for the diagnosis of ovarian cancer. Ultrasound Obstet Gynecol 2008;31:681-90.

25. Timmerman D, Ameye L, Fischerova D, Epstein E, Melis GB, Guerriero S, et al. Simple ultrasound rules to distinguish between benign and malignant adnexal masses before surgery: prospective validation by IOTA group. BMJ 2010;341:c6839.

26. Van Holsbeke C, Van Calster B, Testa AC, Domali E, Lu C, Van Huffel S, et al. Prospective internal validation of mathematical models to predict malignancy in adnexal masses: results from the international ovarian tumor analysis study. Clin Cancer Res 2009;15:684-91.

27. Nunes N, Yazbek J, Ambler G, Hoo W, Naftalin J, Jurkovic D. Prospective evaluation of the IOTA logistic regression model LR2 for the diagnosis of ovarian cancer. Ultrasound Obstet Gynecol 2012;40:355-9.

28. Nunes N, Ambler G, Hoo WL, Naftalin J, Foo X, Widschwendter M, et al. A prospective validation of the IOTA logistic regression models (LR1 and LR2) in comparison to subjective pattern recognition for the diagnosis of ovarian cancer. Int J Gynecol Cancer 2013;23:1583-9.

Table 1. Variables for which inter-observer reproducibility was estimated
___________________________________________________________________________
Variables included in logistic regression models LR1 and LR2 or of importance for these models
    Maximum diameter of adnexal mass, mm
    Maximum diameter of largest solid component, mm
    Maximum diameter of largest solid component: ≤50, >50 mm
    Presence of an entirely solid adnexal mass: yes, no
    Papillary projections present: yes, no
    Irregular internal cyst walls: yes, no
    Acoustic shadows: yes, no
    Color Doppler signals in papillary projection: yes, no
    Color score: 1, 2, 3, 4
    Tenderness of adnexal mass at scan: yes, no
    Ascites: yes, no

Other IOTA variables
  Continuous IOTA variables
    Mean diameter of adnexal mass, mm
    Mean diameter of largest solid component, mm
    Height of the largest papillary projection, mm
  Categorical IOTA variables
    Type of tumor: unilocular, unilocular solid, multilocular, multilocular solid, solid
    Mean diameter of adnexal mass: ≤40, 41-60, 61-80, 81-100, and >100 mm
    Number of cyst locules: 0, 1, 2, 3, 4, 5, 6-10, 11-20, and >20; ≤10, >10†
    Septum present: yes, no
    Incomplete septum present: yes, no
    Solid component: yes, no
    Papillary projections: 0, 1, 2, 3, ≥4; 1, 2, 3, ≥4‡; <4, >4†
    Echogenicity of cyst fluid: anechoic, low level, ground glass, mixed, no cyst fluid
    Color Doppler signals detectable: yes, no
    Ovarian crescent sign: yes, no

Diagnosis based on LR1 and LR2
    Calculated risk of malignancy using LR1: ≤10%, >10%
    Calculated risk of malignancy using LR2: ≤10%, >10%
___________________________________________________________________________
IOTA, International Ovarian Tumor Analysis group; LR1, logistic regression model 1; LR2, logistic regression model 2.

The risk of malignancy using LR1 is derived as

\[ y = \frac{1}{1 + e^{-z}} \]

where

\[ z = -6.7468 + 1.5985\,(a) - 0.9983\,(b) + 0.0326\,(c) + 0.00841\,(d) - 0.8577\,(e) + 1.5513\,(f) + 1.1737\,(g) + 0.9281\,(h) + 0.0496\,(i) + 1.1421\,(j) - 2.3550\,(k) + 0.4916\,(l) \]

and e is the mathematical constant and base value of natural logarithms. The 12 variables included in LR1 are: (a) personal history of ovarian cancer (yes = 1, no = 0); (b) current hormonal therapy (yes = 1, no = 0); (c) age of the patient (in years); (d) maximum diameter of the lesion (in millimeters); (e) mass tender at scan (yes = 1, no = 0); (f) the presence of ascites (yes = 1, no = 0); (g) the presence of blood flow within a solid papillary projection (yes = 1, no = 0); (h) the presence of a purely solid tumor (yes = 1, no = 0); (i) maximal diameter of the solid component (expressed in millimeters, but with no increase >50 mm); (j) irregular internal cyst walls (yes = 1, no = 0); (k) the presence of acoustic shadows (yes = 1, no = 0); and (l) color score (1, 2, 3, or 4).

The risk of malignancy using LR2 is derived as

\[ y = \frac{1}{1 + e^{-z}} \]

where

\[ z = -5.3718 + 0.0354\,(c) + 1.6159\,(f) + 1.1768\,(g) + 0.0697\,(i) + 0.9586\,(j) - 2.9486\,(k) \]

(12). A calculated risk of malignancy of more than 10% classified the adnexal mass as malignant.

† includes 0. ‡ includes only cases where papillary projections were registered at both examinations.
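The two risk formulas in the footnote to Table 1 can be sketched in code as follows. The coefficients and the 10% cutoff are those stated in the footnote; the function and parameter names and the example patient are hypothetical, and this is only a minimal illustration, not a validated clinical calculator.

```python
import math


def _logistic(z):
    """y = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def lr1_risk(*, history_ovarian_cancer, hormonal_therapy, age_years,
             max_lesion_mm, tender, ascites, papillary_flow, purely_solid,
             max_solid_mm, irregular_walls, acoustic_shadows, color_score):
    """Risk of malignancy (0-1) from LR1; binary inputs are 1 (yes) or 0 (no)."""
    solid = min(max_solid_mm, 50)  # variable (i): no increase above 50 mm
    z = (-6.7468
         + 1.5985 * history_ovarian_cancer   # (a)
         - 0.9983 * hormonal_therapy         # (b)
         + 0.0326 * age_years                # (c)
         + 0.00841 * max_lesion_mm           # (d)
         - 0.8577 * tender                   # (e)
         + 1.5513 * ascites                  # (f)
         + 1.1737 * papillary_flow           # (g)
         + 0.9281 * purely_solid             # (h)
         + 0.0496 * solid                    # (i)
         + 1.1421 * irregular_walls          # (j)
         - 2.3550 * acoustic_shadows         # (k)
         + 0.4916 * color_score)             # (l)
    return _logistic(z)


def lr2_risk(*, age_years, ascites, papillary_flow, max_solid_mm,
             irregular_walls, acoustic_shadows):
    """Risk of malignancy (0-1) from LR2, which uses six of the LR1 variables."""
    solid = min(max_solid_mm, 50)
    z = (-5.3718 + 0.0354 * age_years + 1.6159 * ascites
         + 1.1768 * papillary_flow + 0.0697 * solid
         + 0.9586 * irregular_walls - 2.9486 * acoustic_shadows)
    return _logistic(z)


def classified_malignant(risk):
    """A calculated risk of more than 10% classifies the mass as malignant."""
    return risk > 0.10


# Hypothetical patient: 50 years old, 60 mm cyst without solid components,
# no other positive findings, color score 1.
example_lr1 = lr1_risk(history_ovarian_cancer=0, hormonal_therapy=0, age_years=50,
                       max_lesion_mm=60, tender=0, ascites=0, papillary_flow=0,
                       purely_solid=0, max_solid_mm=0, irregular_walls=0,
                       acoustic_shadows=0, color_score=1)
example_lr2 = lr2_risk(age_years=50, ascites=0, papillary_flow=0, max_solid_mm=0,
                       irregular_walls=0, acoustic_shadows=0)
print(f"LR1: {example_lr1:.1%}, LR2: {example_lr2:.1%}")
```

For this hypothetical patient both models return a risk well below the 10% cutoff, so the mass would be classified as benign by either model.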

Table 2. Histological diagnoses of the masses
___________________________________________________________________________
Histological diagnosis of tumor              Number
___________________________________________________________________________
Benign tumors                                94
    Benign simple cyst                       7
    Endometrioma                             10
    Dermoid cyst                             16
    Serous cystadenoma                       16
    Mucinous cystadenoma                     18
    Myoma/fibroma                            9
    Cystadenofibroma                         11
    Paraovarian cyst                         5
    Sactosalpinx, chronic salpingitis        1
    Leydig cell tumor                        1
Borderline tumors                            4
    Serous                                   2
    Mucinous                                 1
    Endometrioid                             1
Invasive malignancy                          19
    Primary ovarian adenocarcinoma           13
    Granulosa cell tumor                     3
    Dysgerminoma                             1
    Leiomyosarcoma                           1
    Malignant aggressive B-cell lymphoma     1
___________________________________________________________________________

Table 3. Inter-observer reproducibility of continuous variables used in the risk calculation models LR1 and LR2 and of other continuous variables used to describe adnexal masses
___________________________________________________________________________
                                                       Measurement results            Difference in mm between the two measurements               Intra-CC
                                                       (both sonologists)             made by sonologists 1 and 2 (a)
Parameter                                              Median          Range          Mean            95% CI            Limits of agreement       Point estimate (95% CI)
___________________________________________________________________________
Variables used in models LR1 and LR2
  Maximum diameter of adnexal mass, mm                 70 (n=234)      10 – 313       3.80 (n=117)    1.12 – 6.48       -25.24 – 32.82            0.958 (0.937 – 0.971)
  Maximum diameter of largest solid component, mm (b)  29.50 (n=122)   5 – 180        1.92 (n=61)     -1.74 – 5.58      -26.66 – 30.50            0.942 (0.905 – 0.964)
Other variables used to describe adnexal mass
  Mean diameter of adnexal mass, mm                    58.5 (n=234)    9 – 240        1.05 (n=117)    -0.15 – 1.95      -8.61 – 10.72             0.971 (0.958 – 0.980)
  Mean diameter of largest solid component, mm (b)     22 (n=122)      4 – 156        0.59 (n=61)     -1.82 – 2.98      -18.16 – 19.32            0.962 (0.937 – 0.977)
  Height of largest papillary projection, mm (c)       8 (n=42)        3 – 25         -0.51 (n=21)    -2.93 – 1.91      -11.61 – 10.59            0.609 (0.245 – 0.821)
___________________________________________________________________________
CI, confidence interval; Intra-CC, intra-class correlation coefficient.
(a) The measurements made by sonologist 2 were subtracted from the measurements made by sonologist 1.
(b) Includes only cases where solid components were registered by both observers.
(c) Includes only cases where papillary projections were registered by both observers. Using logarithmically transformed data, the results were as follows: mean difference 1.03 (95% CI, 0.78 – 1.36); limits of agreement 0.31 – 3.61; Intra-CC 0.547 (0.153 – 0.789). These results can be used for comparison with our published results on inter-observer agreement when analyzing ultrasound volumes obtained by three-dimensional ultrasound (15).

Table 4. Inter-observer agreement for categorical variables included in the risk calculation models LR1 and LR2 and for other categorical variables used to describe adnexal masses
____________________________________________________________________________________________
Variable                                                                             Agreement         Kappa value
____________________________________________________________________________________________
Variables used in models LR1 and LR2
  Presence of entirely solid tumor (yes, no)                                         97% (114/117)     0.91
  Maximum diameter of largest solid component (≤50 mm, >50 mm)                       95% (111/117)     0.83
  Maximum diameter of largest solid component (≤50 mm, >50 mm) (a)                   92% (56/61)       0.82
  Tenderness of adnexal mass (yes, no)                                               97% (113/117)     0.78
  Ascites (yes, no)                                                                  97% (113/117)     0.73
  Papillary projections (yes, no)                                                    87% (102/117)     0.65
  Solid component (yes, no)                                                          84% (98/117)      0.67
  Irregular cyst wall (yes, no)                                                      79% (92/117)      0.56
  Acoustic shadows (yes, no)                                                         85% (99/117)      0.58
  Color Doppler signals in papillary projections (yes, no) (b)                       90% (105/117)     0.48
  Color Doppler signals in papillary projections (yes, no) (c)                       86% (18/21)       0.71
  Color score (1, 2, 3, 4)                                                           40% (47/117)      0.36 (d)
Other variables used to describe adnexal masses
  Tumor type (unilocular, unilocular solid, multilocular, multilocular solid, solid) 97% (113/117)     0.70
  Mean diameter of tumor (mm): ≤40, 41-60, 61-80, 81-100, >100                       66% (77/117)      0.76 (d)
  Number of locules: 0, 1, 2, 3, 4, 5, 6-10, 11-20, >20                              58% (68/117)      0.79 (d)
    ≤10, >10 (e)                                                                     94% (110/117)     0.80
  Septum (yes, no)                                                                   88% (103/117)     0.76
  Incomplete septum (yes, no)                                                        98% (115/117)     ¶
  Number of papillary projections: 0, 1, 2, 3, ≥4                                    81% (95/117)      0.64 (d)
    1, 2, 3, ≥4 (c)                                                                  67% (14/21)       0.63 (d)
    <4, >4 (e)                                                                       96% (112/117)     0.68
  Echogenicity of cyst fluid (anechoic, low level, ground glass, mixed, no fluid)    67% (78/117)      0.56
  Color Doppler signals detectable (yes, no)                                         84% (98/117)      0.30
  Ovarian crescent sign (yes, no)                                                    78% (91/117)      0.49
  Fluid in Douglas pouch (yes, no)                                                   87% (102/117)     0.72
____________________________________________________________________________________________
(a) Includes only cases where solid components were registered at both examinations.
(b) This variable reflects disagreement both with regard to the presence of papillary projections and the presence of blood flow in papillary projections (i.e., if one observer recorded the presence of papillary projections and the other did not, there was disagreement with regard to this variable, and if both observers recorded the presence of papillary projections but disagreed with regard to the presence of blood flow in the papillary projections, there was disagreement with regard to this variable).
(c) Includes only cases where papillary projections were registered at both examinations.
(d) Weighted Kappa index.
(e) Includes 0.
¶ Absent kappa values are explained by asymmetric field tables, making it impossible to calculate them.

Table 5. Inter-observer reproducibility of the risk of malignancy calculated using the risk calculation models LR1 and LR2
___________________________________________________________________________
                                           Calculated risk of malignancy    Difference between the risk calculated                      Intra-CC
                                           (both sonologists)               by sonologist 1 and 2
Parameter                                  Median          Range            Mean             95% CI           Limits of agreement       Point estimate (95% CI)
___________________________________________________________________________
Risk of malignancy calculated using LR1    7.85 (n=234)    0.10 – 99.10     -0.53 (n=117)    -3.07 – 2.01     -28.05 – 26.99            0.911 (0.874 – 0.937)
Risk of malignancy calculated using LR2    6.65 (n=234)    0.10 – 98.40     0.02 (n=117)     -3.06 – 3.10     -33.22 – 33.26            0.832 (0.766 – 0.880)
___________________________________________________________________________
CI, confidence interval; Intra-CC, intra-class correlation coefficient.
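As a sketch of how the mean difference, its 95% CI, and the limits of agreement in Table 5 relate to one another, the following assumes the usual Bland-Altman definitions (20): limits of agreement = mean difference ± 1.96 SD of the differences, and 95% CI of the mean = mean ± 1.96 SD/√n. Whether exactly these multipliers were used in the study is an assumption; the function names and the toy differences are hypothetical.

```python
import math
from statistics import mean, stdev

Z95 = 1.96  # assumed normal-approximation multiplier


def bland_altman(differences):
    """Mean difference, its 95% CI, and limits of agreement for paired differences."""
    n = len(differences)
    m = mean(differences)
    sd = stdev(differences)  # sample standard deviation of the differences
    half_ci = Z95 * sd / math.sqrt(n)
    return {
        "mean": m,
        "ci": (m - half_ci, m + half_ci),
        "loa": (m - Z95 * sd, m + Z95 * sd),
    }


def ci_from_reported_loa(mean_diff, loa_low, loa_high, n):
    """Recover the SD from reported limits of agreement and derive the CI of the mean."""
    sd = (loa_high - loa_low) / (2 * Z95)
    half_ci = Z95 * sd / math.sqrt(n)
    return mean_diff - half_ci, mean_diff + half_ci


# Toy inter-observer differences in percentage units (hypothetical data)
toy = bland_altman([2.0, -1.0, 0.5, -3.5, 4.0])

# Consistency check against the LR1 row of Table 5
ci_low, ci_high = ci_from_reported_loa(-0.53, -28.05, 26.99, 117)
print(f"95% CI implied by the LR1 limits of agreement: {ci_low:.2f} to {ci_high:.2f}")
```

Under these assumptions, the interval implied by the LR1 limits of agreement agrees to two decimals with the 95% CI reported in the table.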

Legends for figure

Figure 1. a) Scatterplot showing the relationship between inter-observer difference (observer 1 minus observer 2) in calculated risk and magnitude of calculated risk when using logistic regression model LR1. The plot manifests a diamond shape, the differences being smallest for the lowest and highest risks. For risks <2.5% and >95% the differences are very small. LOA, limits of agreement. b) Scatterplot showing the relationship between inter-observer difference in calculated risk and magnitude of calculated risk when using logistic regression model LR2. The plot manifests a diamond shape, the differences being smallest for the lowest and highest risks.


Povilas Sladkevicius and Lil Valentin. Inter-observer agreement in describing the ultrasound appearance of adnexal masses and in calculating the risk of malignancy using logistic regression models. Clin Cancer Res. Published OnlineFirst November 25, 2014.


or very good (19). Inter-observer agreement beyond chance for variables included in LR1 or LR2 was poorest for color score (agreement 40%, weighted Kappa 0.36), presence of blood flow in papillary projection (agreement 90%, Kappa 0.48), irregular cyst wall (agreement 79%, Kappa 0.56), and acoustic shadowing (agreement 85%, Kappa 0.58).
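
How percentage agreement and Kappa relate can be sketched as follows. This is a minimal illustration of the unweighted Cohen's Kappa (17) for a square table of counts, not the authors' analysis code. The function name and the 2 × 2 cell counts are hypothetical: only the total of 117 masses and the 99 concordant cases for acoustic shadowing come from the paper, and the split of those cases over the cells is invented for the example.

```python
def agreement_and_kappa(table):
    """Return (observed agreement, Cohen's Kappa) for a square table of counts.

    table[i][j] = number of masses put in category i by observer 1
    and in category j by observer 2.
    """
    size = len(table)
    total = sum(sum(row) for row in table)
    # Observed agreement: share of cases on the diagonal.
    observed = sum(table[i][i] for i in range(size)) / total
    # Agreement expected by chance from the marginal totals.
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(size)) for j in range(size)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total**2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical cell counts (yes/no by observer 1 vs. yes/no by observer 2).
# Only the total (117) and the number of concordant cases (99) are from the paper.
hypothetical_table = [[20, 8],
                      [10, 79]]

observed, kappa = agreement_and_kappa(hypothetical_table)
print(f"agreement {observed:.0%}, Kappa {kappa:.2f}")
```

The sketch shows why a variable can have high percentage agreement but only moderate Kappa: when one category dominates, much of the observed agreement is already expected by chance.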

Bland-Altman plots illustrating the relationship between the magnitude of the estimated risk of malignancy calculated using LR1 and LR2 and the interobserver difference in calculated risk are shown in Figure 1. The plots manifest a diamond shape, i.e., the interobserver differences are smallest for the lowest and highest risks, and they are very small for risks <2.5% and >95%. Logarithmic transformation of the data (20) did not substantially change the shape of the scatter plot. Therefore, we present our results as absolute inter-observer differences in calculated risk (in percentage units); see Table 5. There were no systematic differences in calculated risks between the two sonologists, and reliability reflected by the ICC values was good (22), with ICC values of 0.911 for LR1 and 0.832 for LR2. When classifying tumors as having a risk of malignancy <10% (benign) or >10% (malignant) using LR1 or LR2, the inter-observer agreement was good for both models: inter-observer agreement 84% (98/117), Kappa 0.68 for model LR1, and inter-observer agreement 85% (99/117), Kappa 0.68 for model LR2. In the 19 cases where the two sonologists obtained different results with regard to malignancy when using LR1, the absolute interobserver difference in calculated risk ranged from 0.7 to 59.6 percentage units: in six of the 19 cases it was <10.0 percentage units, in nine cases it was 10.0 – 24.9 percentage units, and in four cases it was >25.0 percentage units. In the 18 cases where the two sonologists obtained different results with regard to malignancy when using LR2, the absolute interobserver difference in calculated risk ranged from 8.8 to 67.9 percentage units: in two of the 18 cases it was <10.0 percentage units, in ten cases it was 10.0 – 24.9 percentage units, and in six cases it was >25 percentage units.
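
The quantities behind a Bland-Altman plot can be sketched as follows. This is a minimal illustration of the usual method (20), assuming limits of agreement defined as the mean difference ± 1.96 standard deviations of the differences. The function name and the risk values are made up for the example and are not data from the study.

```python
import statistics


def bland_altman(risks_observer1, risks_observer2):
    """Return the points and summary lines of a Bland-Altman plot.

    Both arguments are calculated risks (in %) for the same masses,
    in the same order.
    """
    pairs = list(zip(risks_observer1, risks_observer2))
    differences = [a - b for a, b in pairs]   # observer 1 minus observer 2
    magnitudes = [(a + b) / 2 for a, b in pairs]  # x-axis: mean of the two risks
    mean_difference = statistics.mean(differences)
    sd_difference = statistics.stdev(differences)  # sample standard deviation
    return {
        "magnitudes": magnitudes,
        "differences": differences,
        "mean_difference": mean_difference,
        "lower_limit": mean_difference - 1.96 * sd_difference,
        "upper_limit": mean_difference + 1.96 * sd_difference,
    }


# Made-up risks (%) for four masses, for illustration only.
result = bland_altman([2.0, 10.0, 50.0, 90.0], [1.0, 14.0, 40.0, 92.0])
print(result["mean_difference"], result["lower_limit"], result["upper_limit"])
```

Because a calculated risk is bounded by 0% and 100%, two risks with a mean close to either bound cannot differ by much, which is consistent with the diamond shape described above.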

The Bland-Altman plots (Figure 1) illustrate that for some tumors there were substantial interobserver differences in the calculated risk of malignancy: when using LR1, the interobserver difference in calculated risk was >25 percentage units in 11 tumors (9% of all tumors), and when using LR2 it was >25 percentage units in 14 tumors (12% of all tumors). To elucidate which interobserver differences explained these largest interobserver differences in calculated risk, we scrutinized each case where the difference was >25 percentage units. The results are shown in Supplementary Tables S2 and S3. When using LR1, a discrepancy for one single categorical variable explained the difference in four of the 11 cases, while a discrepancy for two categorical variables explained the difference in one case (differences in measurements being <5 mm in these five cases). In six cases there were differences in one or two categorical variables but also substantial differences (6-61 mm) in at least one measurement result. In no case was the large difference in calculated risk explained exclusively by differences in measurement results. The categorical variables judged differently by the two sonologists in these 11 cases were color score (n = 5), irregular cyst wall (n = 5), flow in papillary projection (n = 3), and acoustic shadowing (n = 2).

When using LR2, a discrepancy for one single categorical variable explained the large difference in calculated risk (>25 percentage units) in eight of the 14 cases (differences in measurements being <5 mm in these eight cases), and in four of the eight cases the sonologists judged acoustic shadowing differently. In five cases there were differences in one categorical variable but also a substantial difference (9-61 mm) in the measurement of the largest solid component. In yet another case there were differences in two categorical variables as well as in the measurement of the largest solid component. The categorical variables judged differently by the two sonologists in these 14 cases were acoustic shadowing (n = 5), irregular cyst wall (n = 5), ascites (n = 3), and flow in papillary projection (n = 2).

The sensitivity with regard to malignancy when using LR1 (10% risk cutoff) was 100% (23/23; 95% CI, 82-100) for both sonologists; the specificity was 74% (70/94; 95% CI, 64-82) for sonologist 1 and 63% (59/94; 95% CI, 53-72) for sonologist 2. The sensitivity when using LR2 was 100% (23/23; 95% CI, 82-100) for sonologist 1 and 91% (21/23; 95% CI, 72-98) for sonologist 2, and the specificity was 75.5% (71/94; 95% CI, 65-84) for both sonologists.
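
The point estimates above follow directly from the reported fractions. The sketch below recomputes them; the function name is hypothetical, the false-negative and false-positive counts are derived by subtraction from the 23 malignant and 94 benign masses, and the confidence intervals are not recomputed because the interval method is not stated in this part of the text.

```python
def sensitivity_specificity(true_pos, false_neg, true_neg, false_pos):
    """Return (sensitivity, specificity) as proportions."""
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Counts taken from the fractions reported in the text
# (23 malignant and 94 benign masses; 10% risk cutoff).
results = {
    ("LR1", "sonologist 1"): sensitivity_specificity(23, 0, 70, 24),
    ("LR1", "sonologist 2"): sensitivity_specificity(23, 0, 59, 35),
    ("LR2", "sonologist 1"): sensitivity_specificity(23, 0, 71, 23),
    ("LR2", "sonologist 2"): sensitivity_specificity(21, 2, 71, 23),
}

for (model, observer), (sens, spec) in results.items():
    print(f"{model}, {observer}: sensitivity {sens:.1%}, specificity {spec:.1%}")
```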


Discussion

We have shown substantial inter-observer variability in the results of measurements taken in adnexal masses (wide limits of agreement). Inter-observer agreement beyond chance was very good or good for most categorical variables, but it was only moderate or fair for some. Inter-observer agreement above chance was poorest for variables heavily dependent on subjective evaluation and/or machine settings, i.e., color score, presence of color Doppler signals in papillary projections, irregular cyst walls, acoustic shadowing (all four variables being included in LR1 or LR2), echogenicity of cyst fluid, and ovarian crescent sign. Despite this, there was good inter-observer agreement when classifying tumors as benign or malignant using the predetermined risk of malignancy cut-off of 10%. However, in some cases there were substantial differences in the calculated risk of malignancy between the two sonologists, the difference being >25.0 percentage units in 9% of all tumors when using LR1 and in 12% of all tumors when using LR2.

The strength of our study is that it provides new information. To the best of our knowledge, there is only one publication reporting on interobserver agreement with regard to describing ultrasound findings in adnexal masses using the IOTA terminology (11) when performing live ultrasound examinations (23). However, that study (23) evaluated interobserver agreement with regard to the ten ultrasound features in the IOTA simple rules (24, 25), not the variables included in the IOTA logistic regression models LR1 and LR2, and agreement was estimated between examiners with different levels of experience. The variable with poorest agreement beyond chance in the study cited was acoustic shadowing (Kappa 0.36). We have found no published study that has estimated inter-observer reproducibility of the calculated risk of malignancy using LR1 or LR2 after live scanning.

It is a limitation of our study that up to 204 days elapsed between the scans of the two sonologists (up to 41 days for malignant masses). Because days elapsed between the scans, the inter-observer differences could theoretically be explained by the lesions having changed in size or morphology between the scans. We find this highly unlikely for the following reasons. First, there was no relationship between the differences in measurement results and the number of days between the scans (Supplementary Fig. S1-S5). Nor was there a clear tendency for inter-observer agreement for discrete variables to depend on the time between the scans (Supplementary Table S1). Second, one would expect a lesion and its components to increase in size with time, but sonologist 1, performing the first scan, obtained higher measurement values than sonologist 2. Third, it is our experience, after having performed gynecological scans for more than 20 years, that the ultrasound morphology of both benign and malignant adnexal masses remains constant over time, that benign adnexal lesions grow slowly, and that malignant masses do not change appreciably in size even during 1 month of observation. Therefore, we believe that the discrepancies between the two sonologists reflect true inter-observer differences and not a change of the masses over time. A second limitation is that we did not include estimation of the reproducibility of retrieving anamnestic information (current hormonal therapy, personal history of ovarian cancer), the anamnestic information collected by the second sonologist being used in all cases. It cannot be entirely excluded that patients would answer differently when asked by different sonologists or that sonologists could interpret the answers of the patients differently. A third limitation is that we did not estimate intra-observer reproducibility. We considered four scans (two per sonologist) likely to be unacceptable to patients. For the same reason, only two sonologists were involved in this study, and our results are generalizable only to sonologists with a similar level of experience.

The results of this live scanning study are similar to those of another study in which the same sonologists assessed the same variables using 3D ultrasound volumes from adnexal masses in another tumor population (15). The similarity in results between the two studies is surprising, because the conditions when assessing 3D ultrasound volumes are different from those during a live scan. When evaluating ultrasound volumes, sonologists are exposed to the same ultrasound images, and so any interobserver difference should be explained exclusively by differences in interpreting the ultrasound information. During a live scan there are more sources of bias. This could result in either poorer or better interobserver agreement than when 3D ultrasound volumes are assessed: poorer because ultrasound examiners are likely to use different machine settings and scanning conditions may change from one minute to another; better because the dynamic nature of live scanning facilitates discrimination between solid components and amorphous tissue.

Our results showed that two experienced sonologists agreed quite well in their classification of masses as benign or malignant using the 10% risk of malignancy cutoff of LR1 and LR2, and that the diagnostic performance of LR1 and LR2 with regard to discrimination between benign and malignant tumors was similar for the two sonologists and similar to that reported by others (14, 26-28). This is reassuring, because the main purpose of using models LR1 and LR2 is to classify tumors as benign or malignant. Potentially, however, LR1 and LR2 can be used not only to classify adnexal masses as benign or malignant but also to counsel a patient about her individual risk of malignancy (13). To use the calculated risk for individual counseling, one must be reasonably certain not only that the estimated risk agrees well with the true risk (when externally validated, both LR1 and LR2 underestimated the true risk, especially in the risk interval 30-70% (14)), but also that the risk estimates are reproducible, i.e., that different examiners will obtain similar risk estimates. Our results show that risk estimates may differ substantially between experienced observers, the difference in estimated risk being >25.0 percentage units in 9% and 12% of cases when using LR1 and LR2, respectively. Interobserver agreement above chance was poorest for those variables in the models that are heavily dependent on subjective evaluation, i.e., color score, presence of color Doppler signals in papillary projections, irregular cyst walls, and acoustic shadowing. Indeed, differences in these explained most of the largest inter-observer differences in calculated risk of malignancy. In models based on few variables, changing values in only one variable may result in large differences in predicted risks, while a model with many variables is less vulnerable to a change in one or even a few variables. Our results illustrate this (Supplementary Tables S2 and S3). When using LR2 (which includes six variables), a change in value for one single categorical variable explained an inter-observer difference in calculated risk >25 percentage units in eight of 14 cases, while when using LR1 (which includes 12 variables), a change in value for one single categorical variable explained an inter-observer difference in calculated risk >25 percentage units in only four of 11 cases. Acoustic shadowing is a strong variable in both LR1 and LR2 and has great impact on the calculated risk in LR2, which has only six variables. In our hands, as well as in those of Ruiz de Gauna et al. (23), inter-observer agreement for acoustic shadowing was at most moderate. The interobserver agreement for color score was only fair in our study, and color score is an important variable in LR1.
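
The weight of a single categorical variable can be made explicit with a short worked equation. It is a sketch that uses only the coefficients given in the footnote to Table 1, where the risk is y = 1/(1 + e^{-z}); the notation (β for a coefficient, subscripts k and l for acoustic shadows and color score) is introduced here for illustration.

```latex
% With y = 1 / (1 + e^{-z}), the odds of malignancy are
\[
  \frac{y}{1-y} = e^{z} .
\]
% Changing one yes/no variable from 0 to 1 adds its coefficient \beta to z,
% so the odds are multiplied by e^{\beta}, whatever the other variables are.
% Acoustic shadows recorded as present instead of absent:
\[
  \text{LR2: } e^{\beta_k} = e^{-2.9486} \approx 0.052 ,
  \qquad
  \text{LR1: } e^{\beta_k} = e^{-2.3550} \approx 0.095 .
\]
% One step up in color score in LR1:
\[
  e^{\beta_l} = e^{0.4916} \approx 1.63 .
\]
```

A disagreement on acoustic shadowing alone therefore changes the odds of malignancy by a factor of about 19 in LR2 and about 11 in LR1, which is in line with the observation that this variable has great impact on the calculated risk.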

To improve inter-observer reproducibility of calculated risks based on LR1 and LR2, inter-observer differences in descriptions and measurements of adnexal masses using the IOTA terminology and measurement technique need to be reduced. One way to achieve this could be by providing courses on and training in how to examine and describe adnexal masses using the IOTA terms. Interactive courses in which a large number of ultrasound images are discussed with the course participants are likely to be very valuable in this respect. More precise definitions of the IOTA terms, for example by providing ample imaging material, would probably also help improve inter-observer agreement. Special attention should be given to the variables with poorest reproducibility, i.e., the color score, wall irregularity, acoustic shadowing, and detection of blood flow in papillary projections. Until better inter-observer agreement in the calculated risk of malignancy using LR1 and LR2 has been shown, one should be cautious about using the risk estimates for individual patient counselling.


Acknowledgements

None


References

1. Granberg S, Norström A, Wikland M. Tumors in the lower pelvis as imaged by vaginal sonography. Gynecol Oncol 1990;37:224-9.

2. Benacerraf BR, Finkler NJ, Wojciechowski C, Knapp RC. Sonographic accuracy in the diagnosis of ovarian masses. J Reprod Med 1990;35:491-5.

3. Valentin L. Pattern recognition of pelvic masses by gray-scale ultrasound imaging: the contribution of Doppler ultrasound. Ultrasound Obstet Gynecol 1999;14:338-47.

4. Valentin L. Prospective cross-validation of Doppler ultrasound examination and gray-scale ultrasound imaging for discrimination of benign and malignant pelvic masses. Ultrasound Obstet Gynecol 1999;14:273-83.

5. Timmerman D, Schwärzler P, Collins WP, Claerhout F, Coenen M, Amant F, et al. Subjective assessment of adnexal masses with the use of ultrasonography: an analysis of interobserver variability and experience. Ultrasound Obstet Gynecol 1999;13:11-6.

6. Sokalska A, Timmerman D, Testa AC, Van Holsbeke C, Lissoni AA, Leone FPG, et al. Diagnostic accuracy of transvaginal ultrasound examination for assigning a specific diagnosis to adnexal masses. Ultrasound Obstet Gynecol 2009;34:462-70.

7. Valentin L. Use of morphology to characterize and manage common adnexal masses. Best Pract Res Clin Obstet Gynaecol 2004;18:71-89.

8. Tailor A, Jurkovic D, Bourne TH, Collins WP, Campbell S. Sonographic prediction of malignancy in adnexal masses using multivariate logistic regression analysis. Ultrasound Obstet Gynecol 1997;10:41-7.

9. Timmerman D, Bourne TH, Tailor A, Collins WP, Verrelst H, Vandenberghe K, et al. A comparison of methods for the preoperative discrimination between benign and malignant adnexal masses: the development of a new logistic regression model. Am J Obstet Gynecol 1999;181:57-65.


10. Alcazar JL, Jurado M. Prospective evaluation of logistic model based on sonographic morphologic and color Doppler findings developed to predict adnexal malignancy. J Ultrasound Med 1999;18:837-42.

11. Timmerman D, Valentin L, Bourne TH, Collins WP, Verrelst H, Vergote I. Terms, definitions and measurements to describe the sonographic features of adnexal tumors: a consensus opinion from the International Ovarian Tumor Analysis (IOTA) Group. Ultrasound Obstet Gynecol 2000;16:500-5.

12. Timmerman D, Testa AC, Bourne T, Ferrazzi E, Ameye L, Konstantinovic ML, et al. Logistic regression model to distinguish between the benign and malignant adnexal mass before surgery: a multicenter study by the International Ovarian Tumor Analysis Group. J Clin Oncol 2005;23:8794-801.

13. Kaijser J, Bourne T, Valentin L, Sayasneh A, Van Holsbeke C, Vergote I, et al. Improving strategies for diagnosing ovarian cancer: a summary of the International Ovarian Tumor Analysis (IOTA) studies. Ultrasound Obstet Gynecol 2013;41:9-20.

14. Timmerman D, Van Calster B, Testa AC, Guerriero S, Fischerova D, Lissoni AA, et al. Ovarian cancer prediction in adnexal masses using ultrasound-based logistic regression models: a temporal and external validation study by the IOTA group. Ultrasound Obstet Gynecol 2010;36:226-34.

15. Sladkevicius P, Valentin L. Intra- and inter-observer agreement when describing adnexal masses using the International Ovarian Tumour Analysis (IOTA) terms and definitions: a study on three-dimensional (3D) ultrasound volumes. Ultrasound Obstet Gynecol 2013;41:318-27.

16. Heintz APM, Odicino F, Maisonneuve P, Beller U, Benedet JL, Creasman WT, et al. Carcinoma of the Ovary. 25th Annual Report on the Results of Treatment in Gynecological Cancer. Int J Gynecol Obstet 2003;83 (suppl 1):S135-S166.


17. Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas 1960;20:37–46.

18. Kundel HL, Polansky M. Measurement of observer agreement. Radiology 2003;228:303-8.

19. Brennan P, Silman A. Statistical methods for assessing observer variability in clinical measures. BMJ 1992;304:1491-4.

20. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986;1:307-10.

21. Bartlett JW, Frost C. Reliability, repeatability and reproducibility: analysis of measurement errors in continuous variables. Ultrasound Obstet Gynecol 2008;31:466-75.

22. Kottner J, Audigé L, Brorson S, Donner A, Gajewski BJ, Hróbjartsson A, et al. Guidelines for Reporting Reliability and Agreement Studies (GRRAS) were proposed. J Clin Epidemiol 2011;64:96-106.

23. Ruiz de Gauna B, Sanchez P, Pineda L, Utrilla-Layna J, Juez L, Alcázar JL. Inter-observer agreement with regard to describing adnexal masses using the IOTA simple rules in a real-time setting and when using three-dimensional ultrasound volumes and digital clips. Ultrasound Obstet Gynecol 2014;44:95-100.

24. Timmerman D, Testa AC, Bourne T, Ameye L, Jurkovic D, Van Holsbeke C, et al. Simple ultrasound-based rules for the diagnosis of ovarian cancer. Ultrasound Obstet Gynecol 2008;31:681-90.

25. Timmerman D, Ameye L, Fischerova D, Epstein E, Melis GB, Guerriero S, et al. Simple ultrasound rules to distinguish between benign and malignant adnexal masses before surgery: prospective validation by IOTA group. BMJ 2010;341:c6839.


26. Van Holsbeke C, Van Calster B, Testa AC, Domali E, Lu C, Van Huffel S, et al. Prospective internal validation of mathematical models to predict malignancy in adnexal masses: results from the international ovarian tumor analysis study. Clin Cancer Res 2009;15:684-91.

27. Nunes N, Yazbek J, Ambler G, Hoo W, Naftalin J, Jurkovic D. Prospective evaluation of the IOTA logistic regression model LR2 for the diagnosis of ovarian cancer. Ultrasound Obstet Gynecol 2012;40:355-9.

28. Nunes N, Ambler G, Hoo WL, Naftalin J, Foo X, Widschwendter M, et al. A prospective validation of the IOTA logistic regression models (LR1 and LR2) in comparison to subjective pattern recognition for the diagnosis of ovarian cancer. Int J Gynecol Cancer 2013;23:1583-9.


Table 1. Variables for which inter-observer reproducibility was estimated
_______________________________________________________________________
Variables included in logistic regression models LR1 and LR2, or of importance for these models
  Maximum diameter of adnexal mass: mm
  Maximum diameter of largest solid component: mm
  Maximum diameter of largest solid component: ≤50, >50 mm
  Presence of an entirely solid adnexal mass: yes, no
  Papillary projections present: yes, no
  Irregular internal cyst walls: yes, no
  Acoustic shadows: yes, no
  Color Doppler signals in papillary projection: yes, no
  Color score: 1, 2, 3, 4
  Tenderness of adnexal mass at scan: yes, no
  Ascites: yes, no
Other IOTA variables
 Continuous IOTA variables
  Mean diameter of adnexal mass: mm
  Mean diameter of largest solid component: mm
  Height of the largest papillary projection: mm
 Categorical IOTA variables
  Type of tumor: unilocular, unilocular solid, multilocular, multilocular solid, solid
  Mean diameter of adnexal mass: ≤40, 41-60, 61-80, 81-100, and >100 mm
  Number of cyst locules: 0, 1, 2, 3, 4, 5, 6-10, 11-20, and >20; ≤10, >10†
  Septum present: yes, no
  Incomplete septum present: yes, no
  Solid component: yes, no
  Papillary projections: 0, 1, 2, 3, ≥4; 1, 2, 3, ≥4‡; <4, >4†
  Echogenicity of cyst fluid: anechoic, low level, ground glass, mixed, no cyst fluid
  Color Doppler signals detectable: yes, no
  Ovarian crescent sign: yes, no
Diagnosis based on LR1 and LR2
  Calculated risk of malignancy using LR1: ≤10%, >10%
  Calculated risk of malignancy using LR2: ≤10%, >10%
___________________________________________________________________________
IOTA, international ovarian tumor analysis group; LR1, logistic regression model 1; LR2, logistic regression model 2.

The risk of malignancy using LR1 is derived as $y = 1/(1 + e^{-z})$, where

$$z = -6.7468 + 1.5985\,(a) - 0.9983\,(b) + 0.0326\,(c) + 0.00841\,(d) - 0.8577\,(e) + 1.5513\,(f) + 1.1737\,(g) + 0.9281\,(h) + 0.0496\,(i) + 1.1421\,(j) - 2.3550\,(k) + 0.4916\,(l)$$

and $e$ is the mathematical constant and base value of natural logarithms. The 12 variables included in LR1 are: (a) personal history of ovarian cancer (yes = 1, no = 0); (b) current hormonal therapy (yes = 1, no = 0); (c) age of the patient (in years); (d) maximum diameter of the lesion (in millimeters); (e) mass tender at scan (yes = 1, no = 0); (f) the presence of ascites (yes = 1, no = 0); (g) the presence of blood flow within a solid papillary projection (yes = 1, no = 0); (h) the presence of a purely solid tumor (yes = 1, no = 0); (i) maximal diameter of the solid component (expressed in millimeters, but with no increase >50 mm); (j) irregular internal cyst walls (yes = 1, no = 0); (k) the presence of acoustic shadows (yes = 1, no = 0); and (l) color score (1, 2, 3, or 4).

The risk of malignancy using LR2 is derived as $y = 1/(1 + e^{-z})$, where

$$z = -5.3718 + 0.0354\,(c) + 1.6159\,(f) + 1.1768\,(g) + 0.0697\,(i) + 0.9586\,(j) - 2.9486\,(k)$$

(12). A calculated risk of malignancy of more than 10% classified the adnexal mass as malignant.

† Includes 0.
‡ Includes only cases where papillary projections were registered at both examinations.
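
The two formulas in the footnote to Table 1 can be sketched in code as follows. This is a minimal illustration, not software used in the study: the coefficients are those given in the footnote, while the function names, the dictionary keys, and the example patient are hypothetical.

```python
import math

# Coefficients from the footnote to Table 1; letters refer to variables (a)-(l).
# The dictionary keys are hypothetical names chosen for this sketch.
LR1_INTERCEPT = -6.7468
LR1_COEFFICIENTS = {
    "history_ovarian_cancer": 1.5985,   # (a) yes = 1, no = 0
    "hormonal_therapy": -0.9983,        # (b) yes = 1, no = 0
    "age_years": 0.0326,                # (c)
    "max_lesion_diameter_mm": 0.00841,  # (d)
    "tender_at_scan": -0.8577,          # (e) yes = 1, no = 0
    "ascites": 1.5513,                  # (f) yes = 1, no = 0
    "papillary_flow": 1.1737,           # (g) yes = 1, no = 0
    "purely_solid": 0.9281,             # (h) yes = 1, no = 0
    "max_solid_diameter_mm": 0.0496,    # (i) no increase above 50 mm
    "irregular_cyst_walls": 1.1421,     # (j) yes = 1, no = 0
    "acoustic_shadows": -2.3550,        # (k) yes = 1, no = 0
    "color_score": 0.4916,              # (l) 1, 2, 3, or 4
}

LR2_INTERCEPT = -5.3718
LR2_COEFFICIENTS = {
    "age_years": 0.0354,                # (c)
    "ascites": 1.6159,                  # (f)
    "papillary_flow": 1.1768,           # (g)
    "max_solid_diameter_mm": 0.0697,    # (i) no increase above 50 mm
    "irregular_cyst_walls": 0.9586,     # (j)
    "acoustic_shadows": -2.9486,        # (k)
}


def calculated_risk(intercept, coefficients, findings):
    """Return y = 1 / (1 + e^-z) for one mass, as a proportion between 0 and 1."""
    values = dict(findings)
    # Variable (i): the solid component diameter contributes only up to 50 mm.
    values["max_solid_diameter_mm"] = min(values["max_solid_diameter_mm"], 50)
    z = intercept + sum(coef * values[name] for name, coef in coefficients.items())
    return 1.0 / (1.0 + math.exp(-z))


def lr1_risk(findings):
    return calculated_risk(LR1_INTERCEPT, LR1_COEFFICIENTS, findings)


def lr2_risk(findings):
    return calculated_risk(LR2_INTERCEPT, LR2_COEFFICIENTS, findings)


def is_malignant(risk, cutoff=0.10):
    """A calculated risk of more than 10% classifies the mass as malignant."""
    return risk > cutoff


# Hypothetical patient, for illustration only.
example = {
    "history_ovarian_cancer": 0, "hormonal_therapy": 0, "age_years": 50,
    "max_lesion_diameter_mm": 40, "tender_at_scan": 0, "ascites": 0,
    "papillary_flow": 0, "purely_solid": 0, "max_solid_diameter_mm": 0,
    "irregular_cyst_walls": 0, "acoustic_shadows": 0, "color_score": 1,
}
print(f"LR1 {lr1_risk(example):.1%}, LR2 {lr2_risk(example):.1%}")
```

Passing the same dictionary to both functions works because each model reads only the keys it needs; two observers' findings for the same mass can be compared by calling the functions once per observer.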


Table 2. Histological diagnoses of the masses
___________________________________________________________________________
Histological diagnosis of tumor | Number
___________________________________________________________________________
Benign tumors | 94
  Benign simple cyst | 7
  Endometrioma | 10
  Dermoid cyst | 16
  Serous cystadenoma | 16
  Mucinous cystadenoma | 18
  Myoma/fibroma | 9
  Cystadenofibroma | 11
  Paraovarian cyst | 5
  Sactosalpinx, chronic salpingitis | 1
  Leydig cell tumor | 1
Borderline tumors | 4
  Serous | 2
  Mucinous | 1
  Endometrioid | 1
Invasive malignancy | 19
  Primary ovarian adenocarcinoma | 13
  Granulosa cell tumor | 3
  Dysgerminoma | 1
  Leiomyosarcoma | 1
  Malignant aggressive B-cell lymphoma | 1
___________________________________________________________________________


Table 3. Inter-observer reproducibility of continuous variables used in the risk calculation models LR1 and LR2 and of other continuous variables used to describe adnexal masses

Columns: measurement results (both sonologists): median (n) and range; difference in mm between the two measurements made by sonologists 1 and 2 (a): mean (n), 95% CI, and limits of agreement; Intra-CC: point estimate (95% CI).

Parameter                                             Median (n)     Range       Mean (n)       95% CI          Limits of agreement   Intra-CC (95% CI)

Variables used in models LR1 and LR2
Maximum diameter of adnexal mass, mm                  70 (n=234)     10 – 313    3.80 (n=117)   1.12 – 6.48     -25.24 – 32.82        0.958 (0.937 – 0.971)
Maximum diameter of largest solid component, mm (b)   29.50 (n=122)  5 – 180     1.92 (n=61)    -1.74 – 5.58    -26.66 – 30.50        0.942 (0.905 – 0.964)

Other variables used to describe adnexal mass
Mean diameter of adnexal mass, mm                     58.5 (n=234)   9 – 240     1.05 (n=117)   -0.15 – 1.95    -8.61 – 10.72         0.971 (0.958 – 0.980)
Mean diameter of largest solid component, mm (b)      22 (n=122)     4 – 156     0.59 (n=61)    -1.82 – 2.98    -18.16 – 19.32        0.962 (0.937 – 0.977)
Height of largest papillary projection, mm (c)        8 (n=42)       3 – 25      -0.51 (n=21)   -2.93 – 1.91    -11.61 – 10.59        0.609 (0.245 – 0.821)

CI, confidence interval; Intra-CC, intra-class correlation coefficient.
(a) The measurements made by sonologist 2 were subtracted from the measurements made by sonologist 1.
(b) Includes only cases where solid components were registered by both observers.
(c) Includes only cases where papillary projections were registered by both observers. Using logarithmically transformed data, the results were as follows: mean difference 1.03 (95% CI, 0.78 – 1.36); limits of agreement 0.31 – 3.61; Intra-CC 0.547 (0.153 – 0.789). These results can be used for comparison with our published results on inter-observer agreement when analyzing ultrasound volumes obtained by three-dimensional ultrasound (15).
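The mean difference and limits of agreement in Table 3 can be illustrated with a short calculation. This is a minimal sketch, assuming the conventional Bland-Altman definition (mean difference ± 1.96 standard deviations of the differences), which this part of the manuscript does not spell out. The function name and the five paired diameters are hypothetical and are not study data.

```python
import statistics

def bland_altman(obs1, obs2):
    """Return (mean difference, lower limit, upper limit) for paired measurements.

    Differences are taken as observer 1 minus observer 2, as in Table 3.
    Assumes the conventional 95% limits of agreement: mean +/- 1.96 * SD.
    """
    diffs = [a - b for a, b in zip(obs1, obs2)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation of the differences
    return mean_diff, mean_diff - 1.96 * sd_diff, mean_diff + 1.96 * sd_diff

# Hypothetical maximum diameters (mm) of five masses, one value per sonologist.
sonologist_1 = [70.0, 45.0, 120.0, 33.0, 88.0]
sonologist_2 = [66.0, 47.0, 110.0, 35.0, 85.0]

mean_diff, lower, upper = bland_altman(sonologist_1, sonologist_2)
print(f"mean difference {mean_diff:.2f} mm, limits of agreement {lower:.2f} to {upper:.2f} mm")
```

Wide limits relative to the size of the structure measured, as for the papillary projections in Table 3, indicate that two observers can differ by a clinically relevant amount even when the mean difference is close to zero.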


Table 4. Inter-observer agreement for categorical variables included in the risk calculation models LR1 and LR2 and for other categorical variables used to describe adnexal masses

Variable                                                              Agreement, % (n/N)    Kappa value

Variables used in models LR1 and LR2
  Presence of entirely solid tumor (yes, no)                          97 (114/117)          0.91
  Maximum diameter of largest solid component (≤50 mm, >50 mm)        95 (111/117)          0.83
  Maximum diameter of largest solid component (≤50 mm, >50 mm) (a)    92 (56/61)            0.82
  Tenderness of adnexal mass (yes, no)                                97 (113/117)          0.78
  Ascites (yes, no)                                                   97 (113/117)          0.73
  Papillary projections (yes, no)                                     87 (102/117)          0.65
  Solid component (yes, no)                                           84 (98/117)           0.67
  Irregular cyst wall (yes, no)                                       79 (92/117)           0.56
  Acoustic shadows (yes, no)                                          85 (99/117)           0.58
  Color Doppler signals in papillary projections (yes, no) (b)        90 (105/117)          0.48
  Color Doppler signals in papillary projections (yes, no) (c)        86 (18/21)            0.71
  Color score (1, 2, 3, 4)                                            40 (47/117)           0.36 (d)

Other variables used to describe adnexal masses
  Tumor type (unilocular, unilocular solid, multilocular,
    multilocular solid, solid)                                        97 (113/117)          0.70
  Mean diameter of tumor, mm (≤40, 41-60, 61-80, 81-100, >100)        66 (77/117)           0.76 (d)
  Number of locules (0, 1, 2, 3, 4, 5, 6-10, 11-20, >20)              58 (68/117)           0.79 (d)
  Number of locules (≤10, >10) (e)                                    94 (110/117)          0.80
  Septum (yes, no)                                                    88 (103/117)          0.76
  Incomplete septum (yes, no)                                         98 (115/117)          ¶
  Number of papillary projections (0, 1, 2, 3, ≥4)                    81 (95/117)           0.64 (d)
  Number of papillary projections (1, 2, 3, ≥4) (c)                   67 (14/21)            0.63 (d)
  Number of papillary projections (<4, >4) (e)                        96 (112/117)          0.68
  Echogenicity of cyst fluid (anechoic, low level, ground glass,
    mixed, no fluid)                                                  67 (78/117)           0.56
  Color Doppler signals detectable (yes, no)                          84 (98/117)           0.30
  Ovarian crescent sign (yes, no)                                     78 (91/117)           0.49
  Fluid in Douglas pouch (yes, no)                                    87 (102/117)          0.72
____________________________________________________________________________________________
(a) Includes only cases where solid components were registered at both examinations.
(b) This variable reflects disagreement both with regard to the presence of papillary projections and the presence of blood flow in papillary projections (i.e., if one observer recorded the presence of papillary projections and the other did not, there was disagreement with regard to this variable, and if both observers recorded the presence of papillary projections but disagreed with regard to the presence of blood flow in the papillary projections, there was disagreement with regard to this variable).
(c) Includes only cases where papillary projections were registered at both examinations.
(d) Weighted Kappa index.
(e) Includes 0.
¶ Absent kappa values are explained by asymmetric field tables, making it impossible to calculate them.
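The difference between the percentage agreement and the Kappa value in Table 4 is easier to see with a small calculation. The sketch below computes unweighted Cohen's kappa (17) from a cross-tabulation of two observers' ratings. The function name and the 2 x 2 counts are hypothetical, because the manuscript reports only agreement and kappa, not the cell counts. The weighted kappa used for ordinal variables (footnote d) is not covered here.

```python
def cohen_kappa(table):
    """Return (observed agreement, unweighted Cohen's kappa).

    table[i][j] is the number of masses placed in category i by observer 1
    and in category j by observer 2.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    p_observed = sum(table[i][i] for i in range(k)) / n
    row_totals = [sum(table[i]) for i in range(k)]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance from the marginal totals of each observer.
    p_expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2
    return p_observed, (p_observed - p_expected) / (1 - p_expected)

# Hypothetical yes/no ratings of 100 masses (rows: observer 1, columns: observer 2).
hypothetical = [[20, 10],
                [5, 65]]
p_observed, kappa = cohen_kappa(hypothetical)
print(f"agreement {100 * p_observed:.0f}%, kappa {kappa:.3f}")
```

When one category is much more common than the other, chance agreement is high, so a high percentage agreement can coexist with a modest kappa, as for color Doppler signals in papillary projections in Table 4.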


Table 5. Inter-observer reproducibility of the risk of malignancy calculated using the risk calculation models LR1 and LR2

Columns: calculated risk of malignancy (both sonologists): median (n) and range; difference between the risk calculated by sonologist 1 and 2: mean (n), 95% CI, and limits of agreement; Intra-CC: point estimate (95% CI).

Parameter                                  Median (n)     Range           Mean (n)        95% CI          Limits of agreement   Intra-CC (95% CI)

Risk of malignancy calculated using LR1    7.85 (n=234)   0.10 – 99.10    -0.53 (n=117)   -3.07 – 2.01    -28.05 – 26.99        0.911 (0.874 – 0.937)
Risk of malignancy calculated using LR2    6.65 (n=234)   0.10 – 98.40    0.02 (n=117)    -3.06 – 3.10    -33.22 – 33.26        0.832 (0.766 – 0.880)

CI, confidence interval; Intra-CC, intra-class correlation coefficient.


Legends for figure

Figure 1. a) Scatterplot showing the relationship between inter-observer difference (observer 1 minus observer 2) in calculated risk and magnitude of calculated risk when using logistic regression model LR1. The plot manifests a diamond shape, the differences being smallest for the lowest and highest risks. For risks < 25% and > 95%, the differences are very small. LOA, limits of agreement. b) Scatterplot showing the relationship between inter-observer difference in calculated risk and magnitude of calculated risk when using logistic regression model LR2. The plot manifests a diamond shape, the differences being smallest for the lowest and highest risks.


Published OnlineFirst November 25, 2014. Clin Cancer Res. Povilas Sladkevicius and Lil Valentin. Inter-observer agreement in describing the ultrasound appearance of adnexal masses and in calculating the risk of malignancy using logistic regression models.

Updated version: access the most recent version of this article at doi: 10.1158/1078-0432.CCR-14-0906

Supplementary material: access the most recent supplemental material at http://clincancerres.aacrjournals.org/content/suppl/2014/11/26/1078-0432.CCR-14-0906.DC1



difference in calculated risk was <10.0 percentage units, in ten cases it was 10.0 – 24.9 percentage units, and in six cases it was >25 percentage units.

The Bland-Altman plots (Figure 1) illustrate that for some tumors there were substantial inter-observer differences in the calculated risk of malignancy: when using LR1, the inter-observer difference in calculated risk was >25 percentage units in 11 tumors (9% of all tumors), and when using LR2, it was >25 percentage units in 14 tumors (12% of all tumors). To identify which discrepancies in ultrasound variables explained these largest inter-observer differences in calculated risk, we scrutinized each case where the difference was >25 percentage units. The results are shown in Supplementary Tables S2 and S3. When using LR1, a discrepancy for one single categorical variable explained the difference in four of the 11 cases, while a discrepancy for two categorical variables explained the difference in one case (differences in measurements being <5 mm in these five cases). In six cases, there were differences in one or two categorical variables but also substantial differences (6-61 mm) in at least one measurement result. In no case was the large difference in calculated risk explained exclusively by differences in measurement results. The categorical variables judged differently by the two sonologists in these 11 cases were color score (n = 5), irregular cyst wall (n = 5), flow in papillary projection (n = 3), and acoustic shadowing (n = 2).

When using LR2, a discrepancy for one single categorical variable explained the large difference in calculated risk (>25 percentage units) in eight of the 14 cases (differences in measurements being <5 mm in these eight cases), and in four of the eight cases the sonologists judged acoustic shadowing differently. In five cases, there were differences in one categorical variable but also a substantial difference (9-61 mm) in the measurement of the largest solid component. In yet another case, there were differences in two categorical variables as well as in the measurement of the largest solid component. The categorical variables judged differently by the two sonologists in these 14 cases were acoustic shadowing (n = 5), irregular cyst wall (n = 5), ascites (n = 3), and flow in papillary projection (n = 2).

The sensitivity with regard to malignancy when using LR1 (10% risk cutoff) was 100% (23/23; 95% CI, 82-100) for both sonologists; the specificity was 74% (70/94; 95% CI, 64-82) for sonologist 1 and 63% (59/94; 95% CI, 53-72) for sonologist 2. The sensitivity when using LR2 was 100% (23/23; 95% CI, 82-100) for sonologist 1 and 91% (21/23; 95% CI, 72-98) for sonologist 2, and the specificity was 75.5% (71/94; 95% CI, 65-84) for both sonologists.
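The sensitivity and specificity percentages follow directly from the reported counts (23 malignant and 94 benign masses). As a minimal sketch, the arithmetic can be reproduced as below. The helper name `percent` is introduced here for illustration only, and the confidence intervals are not recomputed because this part of the manuscript does not state which interval method was used.

```python
def percent(numerator: int, denominator: int) -> float:
    """Return numerator / denominator expressed as a percentage."""
    return 100.0 * numerator / denominator

# Counts as reported in the text (10% risk cutoff).
lr1_sensitivity = percent(23, 23)       # both sonologists
lr1_specificity_s1 = percent(70, 94)    # sonologist 1
lr1_specificity_s2 = percent(59, 94)    # sonologist 2
lr2_sensitivity_s2 = percent(21, 23)    # sonologist 2
lr2_specificity = percent(71, 94)       # both sonologists

for label, value in [
    ("LR1 sensitivity", lr1_sensitivity),
    ("LR1 specificity, sonologist 1", lr1_specificity_s1),
    ("LR1 specificity, sonologist 2", lr1_specificity_s2),
    ("LR2 sensitivity, sonologist 2", lr2_sensitivity_s2),
    ("LR2 specificity", lr2_specificity),
]:
    print(f"{label}: {value:.1f}%")
```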


Discussion

We have shown substantial inter-observer variability in the results of measurements taken in adnexal masses (wide limits of agreement). Inter-observer agreement beyond chance was very good or good for most categorical variables, but it was only moderate or fair for some. Inter-observer agreement above chance was poorest for variables heavily dependent on subjective evaluation and/or machine settings, i.e., color score, presence of color Doppler signals in papillary projections, irregular cyst walls, acoustic shadowing (all four variables being included in LR1 or LR2), echogenicity of cyst fluid, and ovarian crescent sign. Despite this, there was good inter-observer agreement when classifying tumors as benign or malignant using the predetermined risk of malignancy cut-off of 10%. However, in some cases there were substantial differences in the calculated risk of malignancy between the two sonologists, the difference being >25.0 percentage units in 9% of all tumors when using LR1 and in 12% of all tumors when using LR2.

The strength of our study is that it provides new information. To the best of our knowledge, there is only one publication reporting on inter-observer agreement with regard to describing ultrasound findings in adnexal masses using the IOTA terminology (11) when performing live ultrasound examinations (23). However, that study (23) evaluated inter-observer agreement with regard to the ten ultrasound features in the IOTA simple rules (24, 25), not the variables included in the IOTA logistic regression models LR1 and LR2, and agreement was estimated between examiners with different levels of experience. The variable with poorest agreement beyond chance in the study cited was acoustic shadowing (Kappa 0.36). We have found no published study that has estimated inter-observer reproducibility of the calculated risk of malignancy using LR1 or LR2 after live scanning.

It is a limitation of our study that up to 204 days elapsed between the scans of the two sonologists (up to 41 days for malignant masses). Because days elapsed between the scans, theoretically the inter-observer differences could be explained by the lesions having changed in size or morphology between the scans. We find this highly unlikely for the following reasons. First, there was no relationship between the differences in measurement results and the number of days between the scans (Supplementary Fig. S1-S5). Nor was there a clear tendency for inter-observer agreement for discrete variables to depend on the time between the scans (Supplementary Table S1). Second, one would expect a lesion and its components to increase in size with time, but sonologist 1, who performed the first scan, obtained higher measurement values than sonologist 2. Third, it is our experience, after having performed gynecological scans for more than 20 years, that the ultrasound morphology of both benign and malignant adnexal masses remains constant over time, that benign adnexal lesions grow slowly, and that malignant masses do not change appreciably in size even during 1 month of observation. Therefore, we believe that the discrepancies between the two sonologists reflect true inter-observer differences and not a change of the masses over time. A second limitation is that we did not estimate the reproducibility of retrieving anamnestic information (current hormonal therapy, personal history of ovarian cancer), the anamnestic information collected by the second sonologist being used in all cases. It cannot be entirely excluded that patients would answer differently when asked by different sonologists or that sonologists could interpret the answers of the patients differently. A third limitation is that we did not estimate intra-observer reproducibility. We considered four scans (two per sonologist) likely to be unacceptable to patients. For the same reason, only two sonologists were involved in this study, and our results are generalizable only to sonologists with a similar level of experience.

The results of this live scanning study are similar to those of another study in which the same sonologists assessed the same variables using 3D ultrasound volumes from adnexal masses in another tumor population (15). The similarity in results between the two studies is surprising, because the conditions when assessing 3D ultrasound volumes are different from those during a live scan. When evaluating ultrasound volumes, sonologists are exposed to the same ultrasound images, and so any inter-observer difference should be explained exclusively by differences in interpreting the ultrasound information. During a live scan, there are more sources of bias. This could result either in poorer or better inter-observer agreement than when 3D ultrasound volumes are assessed: poorer, because ultrasound examiners are likely to use different machine settings and scanning conditions may change from one minute to another; better, because the dynamic nature of live scanning facilitates discrimination between solid components and amorphous tissue.

Our results showed that two experienced sonologists agreed quite well in their classification of masses as benign or malignant using the 10% risk of malignancy cutoff of LR1 and LR2, and that the diagnostic performance of LR1 and LR2 with regard to discrimination between benign and malignant tumors was similar for the two sonologists and similar to that reported by others (14, 26-28). This is reassuring, because the main purpose of using models LR1 and LR2 is to classify tumors as benign or malignant. Potentially, however, LR1 and LR2 can be used not only to classify adnexal masses as benign or malignant but also to counsel a patient about her individual risk of malignancy (13). If the calculated risk is to be used for individual counseling, one must be reasonably certain not only that the estimated risk agrees well with the true risk (when externally validated, both LR1 and LR2 underestimated the true risk, especially in the risk interval 30-70% (14)) but also that the risk estimates are reproducible, i.e., that different examiners will obtain similar risk estimates. Our results show that risk estimates may differ substantially between experienced observers, the difference in estimated risk being >25.0 percentage units in 9% and 12% of cases when using LR1 and LR2, respectively. Inter-observer agreement above chance was poorest for those variables in the models that are heavily dependent on subjective evaluation, i.e., color score, presence of color Doppler signals in papillary projections, irregular cyst walls, and acoustic shadowing. Indeed, differences in these variables explained most of the largest inter-observer differences in calculated risk of malignancy. In models based on few variables, changing the value of only one variable may result in large differences in predicted risks, while a model with many variables is less vulnerable to a change in one or even a few variables. Our results illustrate this (Supplementary Tables S2 and S3). When using LR2 (which includes six variables), a change in value for one single categorical variable explained an inter-observer difference in calculated risk >25 percentage units in eight of 14 cases, while when using LR1 (which includes 12 variables), a change in value for one single categorical variable explained an inter-observer difference in calculated risk >25 percentage units in only four of 11 cases. Acoustic shadowing is a strong variable in both LR1 and LR2 and has great impact on the calculated risk in LR2, with only six variables. In our hands, as well as in those of Ruiz de Gauna et al. (23), inter-observer agreement for acoustic shadowing was at most moderate. The inter-observer agreement for color score was only fair in our study, and color score is an important variable in LR1.
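The impact of a single binary variable on the calculated risk can be made concrete with a short worked calculation. The sketch below uses the coefficients for acoustic shadows (k) from Table 1, with decimal points placed as in the published IOTA models (-2.9486 in LR2 and -2.3550 in LR1). The 50% starting risk is a hypothetical value chosen for illustration, not a case from this study.

```latex
% Logistic model: risk y = 1 / (1 + e^{-z}), so the odds are y / (1 - y) = e^{z}.
% Recording acoustic shadows (k = 1 instead of k = 0) adds the coefficient of (k) to z,
% which multiplies the odds of malignancy by e^{coefficient}.
\begin{align*}
\text{LR2:}\quad \frac{\mathrm{odds}_{k=1}}{\mathrm{odds}_{k=0}} &= e^{-2.9486} \approx 0.052 \\
\text{LR1:}\quad \frac{\mathrm{odds}_{k=1}}{\mathrm{odds}_{k=0}} &= e^{-2.3550} \approx 0.095
\end{align*}
% Hypothetical mass with an LR2 risk of 50% (odds = 1) when no shadows are recorded:
\begin{align*}
y_{k=1} &= \frac{0.052}{1 + 0.052} \approx 0.05
\end{align*}
```

Under these assumptions, disagreement on acoustic shadowing alone would move the LR2 estimate from 50% to about 5%, a difference of roughly 45 percentage units.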

To improve inter-observer reproducibility of calculated risks based on LR1 and LR2, inter-observer differences in descriptions and measurements of adnexal masses using the IOTA terminology and measurement technique need to be reduced. One way to achieve this could be by providing courses on and training in how to examine and describe adnexal masses using the IOTA terms. Interactive courses in which a large number of ultrasound images are discussed with the course participants are likely to be very valuable in this respect. More precise definitions of the IOTA terms, for example by providing ample imaging material, would probably also help improve inter-observer agreement. Special attention should be given to the variables with poorest reproducibility, i.e., the color score, wall irregularity, acoustic shadowing, and detection of blood flow in papillary projections. Until better inter-observer agreement in the calculated risk of malignancy using LR1 and LR2 has been shown, one should be cautious with using the risk estimates for individual patient counselling.


Acknowledgements

None


References

1. Granberg S, Norström A, Wikland M. Tumors in the lower pelvis as imaged by vaginal sonography. Gynecol Oncol 1990;37:224-9.

2. Benacerraf BR, Finkler NJ, Wojciechowski C, Knapp RC. Sonographic accuracy in the diagnosis of ovarian masses. J Reprod Med 1990;35:491-5.

3. Valentin L. Pattern recognition of pelvic masses by gray-scale ultrasound imaging: the contribution of Doppler ultrasound. Ultrasound Obstet Gynecol 1999;14:338-47.

4. Valentin L. Prospective cross-validation of Doppler ultrasound examination and gray-scale ultrasound imaging for discrimination of benign and malignant pelvic masses. Ultrasound Obstet Gynecol 1999;14:273-83.

5. Timmerman D, Schwärzler P, Collins WP, Claerhout F, Coenen M, Amant F, et al. Subjective assessment of adnexal masses with the use of ultrasonography: an analysis of interobserver variability and experience. Ultrasound Obstet Gynecol 1999;13:11-6.

6. Sokalska A, Timmerman D, Testa AC, Van Holsbeke C, Lissoni AA, Leone FPG, et al. Diagnostic accuracy of transvaginal ultrasound examination for assigning a specific diagnosis to adnexal masses. Ultrasound Obstet Gynecol 2009;34:462-70.

7. Valentin L. Use of morphology to characterize and manage common adnexal masses. Best Pract Res Clin Obstet Gynaecol 2004;18:71-89.

8. Tailor A, Jurkovic D, Bourne TH, Collins WP, Campbell S. Sonographic prediction of malignancy in adnexal masses using multivariate logistic regression analysis. Ultrasound Obstet Gynecol 1997;10:41-7.

9. Timmerman D, Bourne TH, Tailor A, Collins WP, Verrelst H, Vandenberghe K, et al. A comparison of methods for the preoperative discrimination between benign and malignant adnexal masses: the development of a new logistic regression model. Am J Obstet Gynecol 1999;181:57-65.

10. Alcazar JL, Jurado M. Prospective evaluation of logistic model based on sonographic morphologic and color Doppler findings developed to predict adnexal malignancy. J Ultrasound Med 1999;18:837-42.

11. Timmerman D, Valentin L, Bourne TH, Collins WP, Verrelst H, Vergote I. Terms, definitions and measurements to describe the sonographic features of adnexal tumors: a consensus opinion from the International Ovarian Tumor Analysis (IOTA) Group. Ultrasound Obstet Gynecol 2000;16:500-5.

12. Timmerman D, Testa AC, Bourne T, Ferrazzi E, Ameye L, Konstantinovic ML, et al. Logistic regression model to distinguish between the benign and malignant adnexal mass before surgery: a multicenter study by the International Ovarian Tumor Analysis Group. J Clin Oncol 2005;23:8794-801.

13. Kaijser J, Bourne T, Valentin L, Sayasneh A, Van Holsbeke C, Vergote I, et al. Improving strategies for diagnosing ovarian cancer: a summary of the International Ovarian Tumor Analysis (IOTA) studies. Ultrasound Obstet Gynecol 2013;41:9-20.

14. Timmerman D, Van Calster B, Testa AC, Guerriero S, Fischerova D, Lissoni AA, et al. Ovarian cancer prediction in adnexal masses using ultrasound-based logistic regression models: a temporal and external validation study by the IOTA group. Ultrasound Obstet Gynecol 2010;36:226-34.

15. Sladkevicius P, Valentin L. Intra- and inter-observer agreement when describing adnexal masses using the International Ovarian Tumour Analysis (IOTA) terms and definitions: a study on three-dimensional (3D) ultrasound volumes. Ultrasound Obstet Gynecol 2013;41:318-27.

16. Heintz APM, Odicino F, Maisonneuve P, Beller U, Benedet JL, Creasman WT, et al. Carcinoma of the Ovary. 25th Annual Report on the Results of Treatment in Gynecological Cancer. Int J Gynecol Obstet 2003;83(suppl 1):S135-S166.

17. Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas 1960;20:37-46.

18. Kundel HL, Polansky M. Measurement of observer agreement. Radiology 2003;228:303-8.

19. Brennan P, Silman A. Statistical methods for assessing observer variability in clinical measures. BMJ 1992;304:1491-4.

20. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986;1:307-10.

21. Bartlett JW, Frost C. Reliability, repeatability and reproducibility: analysis of measurement errors in continuous variables. Ultrasound Obstet Gynecol 2008;31:466-75.

22. Kottner J, Audigé L, Brorson S, Donner A, Gajewski BJ, Hróbjartsson A, et al. Guidelines for Reporting Reliability and Agreement Studies (GRRAS) were proposed. J Clin Epidemiol 2011;64:96-106.

23. Ruiz de Gauna B, Sanchez P, Pineda L, Utrilla-Layna J, Juez L, Alcázar JL. Inter-observer agreement with regard to describing adnexal masses using the IOTA simple rules in a real-time setting and when using three-dimensional ultrasound volumes and digital clips. Ultrasound Obstet Gynecol 2014;44:95-100.

24. Timmerman D, Testa AC, Bourne T, Ameye L, Jurkovic D, Van Holsbeke C, et al. Simple ultrasound-based rules for the diagnosis of ovarian cancer. Ultrasound Obstet Gynecol 2008;31:681-90.

25. Timmerman D, Ameye L, Fischerova D, Epstein E, Melis GB, Guerriero S, et al. Simple ultrasound rules to distinguish between benign and malignant adnexal masses before surgery: prospective validation by IOTA group. BMJ 2010;341:c6839.

26. Van Holsbeke C, Van Calster B, Testa AC, Domali E, Lu C, Van Huffel S, et al. Prospective internal validation of mathematical models to predict malignancy in adnexal masses: results from the international ovarian tumor analysis study. Clin Cancer Res 2009;15:684-91.

27. Nunes N, Yazbek J, Ambler G, Hoo W, Naftalin J, Jurkovic D. Prospective evaluation of the IOTA logistic regression model LR2 for the diagnosis of ovarian cancer. Ultrasound Obstet Gynecol 2012;40:355-9.

28. Nunes N, Ambler G, Hoo WL, Naftalin J, Foo X, Widschwendter M, et al. A prospective validation of the IOTA logistic regression models (LR1 and LR2) in comparison to subjective pattern recognition for the diagnosis of ovarian cancer. Int J Gynecol Cancer 2013;23:1583-9.


Table 1. Variables for which inter-observer reproducibility was estimated
_______________________________________________________________________
Variables included in logistic regression models LR1 and LR2 or of importance for these models
  Maximum diameter of adnexal mass: mm
  Maximum diameter of largest solid component: mm
  Maximum diameter of largest solid component: ≤50, >50 mm
  Presence of an entirely solid adnexal mass: yes, no
  Papillary projections present: yes, no
  Irregular internal cyst walls: yes, no
  Acoustic shadows: yes, no
  Color Doppler signals in papillary projection: yes, no
  Color score: 1, 2, 3, 4
  Tenderness of adnexal mass at scan: yes, no
  Ascites: yes, no

Other IOTA variables
 Continuous IOTA variables
  Mean diameter of adnexal mass: mm
  Mean diameter of largest solid component: mm
  Height of the largest papillary projection: mm
 Categorical IOTA variables
  Type of tumor: unilocular, unilocular solid, multilocular, multilocular solid, solid
  Mean diameter of adnexal mass: ≤40, 41-60, 61-80, 81-100, and >100 mm
  Number of cyst locules: 0, 1, 2, 3, 4, 5, 6-10, 11-20, and >20; ≤10, >10 †
  Septum present: yes, no
  Incomplete septum present: yes, no
  Solid component: yes, no
  Papillary projections: 0, 1, 2, 3, ≥4; 1, 2, 3, ≥4 ‡; <4, >4 †
  Echogenicity of cyst fluid: anechoic, low level, ground glass, mixed, no cyst fluid
  Color Doppler signals detectable: yes, no
  Ovarian crescent sign: yes, no

Diagnosis based on LR1 and LR2
  Calculated risk of malignancy using LR1: ≤10%, >10%
  Calculated risk of malignancy using LR2: ≤10%, >10%
___________________________________________________________________________
IOTA, international ovarian tumor analysis group; LR1, logistic regression model 1; LR2, logistic regression model 2.

The risk of malignancy using LR1 is derived as $y = 1/(1 + e^{-z})$, where

$z = -6.7468 + 1.5985\,(a) - 0.9983\,(b) + 0.0326\,(c) + 0.00841\,(d) - 0.8577\,(e) + 1.5513\,(f) + 1.1737\,(g) + 0.9281\,(h) + 0.0496\,(i) + 1.1421\,(j) - 2.3550\,(k) + 0.4916\,(l)$

and $e$ is the mathematical constant and base value of natural logarithms. The 12 variables included in LR1 are: (a) personal history of ovarian cancer (yes = 1, no = 0); (b) current hormonal therapy (yes = 1, no = 0); (c) age of the patient (in years); (d) maximum diameter of the lesion (in millimeters); (e) mass tender at scan (yes = 1, no = 0); (f) the presence of ascites (yes = 1, no = 0); (g) the presence of blood flow within a solid papillary projection (yes = 1, no = 0); (h) the presence of a purely solid tumor (yes = 1, no = 0); (i) maximal diameter of the solid component (expressed in millimeters, but with no increase > 50 mm); (j) irregular internal cyst walls (yes = 1, no = 0); (k) the presence of acoustic shadows (yes = 1, no = 0); and (l) color score (1, 2, 3, or 4).

The risk of malignancy using LR2 is derived as $y = 1/(1 + e^{-z})$, where

$z = -5.3718 + 0.0354\,(c) + 1.6159\,(f) + 1.1768\,(g) + 0.0697\,(i) + 0.9586\,(j) - 2.9486\,(k)$ (12).

A calculated risk of malignancy of more than 10% classified the adnexal mass as malignant.
† Includes 0.
‡ Includes only cases where papillary projections were registered at both examinations.
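The two formulas in the footnote to Table 1 translate directly into a few lines of code. The following is a minimal sketch, not the IOTA group's software: the function and variable names are invented for illustration, the coefficients are those listed in the footnote with decimal points placed as in the published models (12), and the example patient is hypothetical.

```python
import math

# Coefficients from the Table 1 footnote; keys (a)-(l) follow the footnote's lettering.
LR1_INTERCEPT = -6.7468
LR1_COEF = {"a": 1.5985, "b": -0.9983, "c": 0.0326, "d": 0.00841,
            "e": -0.8577, "f": 1.5513, "g": 1.1737, "h": 0.9281,
            "i": 0.0496, "j": 1.1421, "k": -2.3550, "l": 0.4916}
LR2_INTERCEPT = -5.3718
LR2_COEF = {"c": 0.0354, "f": 1.6159, "g": 1.1768,
            "i": 0.0697, "j": 0.9586, "k": -2.9486}

def calculated_risk(intercept, coef, values):
    """Return y = 1 / (1 + e^(-z)) for the variables named in coef."""
    v = dict(values)
    # (i) solid component diameter counts "with no increase > 50 mm".
    v["i"] = min(v["i"], 50)
    z = intercept + sum(coef[name] * v[name] for name in coef)
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical patient: age 50, 70 mm mass, 20 mm solid component,
# irregular cyst wall, color score 2, all other yes/no variables "no".
patient = {"a": 0, "b": 0, "c": 50, "d": 70, "e": 0, "f": 0,
           "g": 0, "h": 0, "i": 20, "j": 1, "k": 0, "l": 2}

lr1 = calculated_risk(LR1_INTERCEPT, LR1_COEF, patient)
lr2 = calculated_risk(LR2_INTERCEPT, LR2_COEF, patient)
for name, risk in (("LR1", lr1), ("LR2", lr2)):
    label = "malignant" if risk > 0.10 else "benign"  # 10% cutoff from Table 1
    print(f"{name}: {100 * risk:.1f}% -> {label}")
```

Because LR2 reads only six of the twelve inputs, the same dictionary can be passed to both models; the extra keys are simply ignored.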


Table 2. Histological diagnoses of the masses
___________________________________________________________________________
Histological diagnosis of tumor              Number
___________________________________________________________________________
Benign tumors                                    94
  Benign simple cyst                              7
  Endometrioma                                   10
  Dermoid cyst                                   16
  Serous cystadenoma                             16
  Mucinous cystadenoma                           18
  Myoma/fibroma                                   9
  Cystadenofibroma                               11
  Paraovarian cyst                                5
  Sactosalpinx/chronic salpingitis                1
  Leydig cell tumor                               1
Borderline tumors                                 4
  Serous                                          2
  Mucinous                                        1
  Endometrioid                                    1
Invasive malignancy                              19
  Primary ovarian adenocarcinoma                 13
  Granulosa cell tumor                            3
  Dysgerminoma                                    1
  Leiomyosarcoma                                  1
  Malignant aggressive B-cell lymphoma            1
___________________________________________________________________________


Table 3. Inter-observer reproducibility of continuous variables used in the risk calculation models LR1 and LR2 and of other continuous variables used to describe adnexal masses
___________________________________________________________________________
                                                 Measurement results          Difference in mm between two measurements
                                                 (both sonologists)           made by sonologists 1 and 2 [a]                      Intra-CC
Parameter                                        Median         Range         Mean            95% CI           Limits of agreement   Point estimate (95% CI)
___________________________________________________________________________
Variables used in models LR1 and LR2
  Maximum diameter of adnexal mass, mm           70 (n=234)     10 to 313     3.80 (n=117)    1.12 to 6.48     -25.24 to 32.82       0.958 (0.937 to 0.971)
  Maximum diameter of largest solid
    component, mm [b]                            29.50 (n=122)  5 to 180      1.92 (n=61)     -1.74 to 5.58    -26.66 to 30.50       0.942 (0.905 to 0.964)
Other variables used to describe adnexal mass
  Mean diameter of adnexal mass, mm              58.5 (n=234)   9 to 240      1.05 (n=117)    -0.15 to 1.95    -8.61 to 10.72        0.971 (0.958 to 0.980)
  Mean diameter of largest solid
    component, mm [b]                            22 (n=122)     4 to 156      0.59 (n=61)     -1.82 to 2.98    -18.16 to 19.32       0.962 (0.937 to 0.977)
  Height of largest papillary
    projection, mm [c]                           8 (n=42)       3 to 25       -0.51 (n=21)    -2.93 to 1.91    -11.61 to 10.59       0.609 (0.245 to 0.821)
___________________________________________________________________________
CI, confidence interval; Intra-CC, intra-class correlation coefficient.
[a] The measurements made by sonologist 2 were subtracted from the measurements made by sonologist 1.
[b] Includes only cases where solid components were registered by both observers.
[c] Includes only cases where papillary projections were registered by both observers. Using logarithmically transformed data, the results were as follows: mean difference 1.03 (95% CI, 0.78 to 1.36); limits of agreement 0.31 to 3.61; Intra-CC 0.547 (0.153 to 0.789). These results can be used for comparison with our published results on inter-observer agreement when analyzing ultrasound volumes obtained by three-dimensional ultrasound (15).
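The columns "Mean", "95% CI", and "Limits of agreement" in Table 3 follow the Bland-Altman approach cited in the reference list (20). A minimal sketch of that calculation is given below. The measurement pairs are invented for illustration, and the normal-approximation multiplier 1.96 for the confidence interval is an assumption, since the text does not state which method the authors used.

```python
import math
import statistics


def bland_altman(obs1, obs2):
    """Mean difference, its 95% CI, and 95% limits of agreement.

    Differences are taken as observer 1 minus observer 2, matching
    footnote [a] of Table 3.
    """
    diffs = [a - b for a, b in zip(obs1, obs2)]
    n = len(diffs)
    mean = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)       # sample SD of the differences
    se = sd / math.sqrt(n)             # standard error of the mean difference
    return {
        "mean": mean,
        "ci95": (mean - 1.96 * se, mean + 1.96 * se),
        "limits_of_agreement": (mean - 1.96 * sd, mean + 1.96 * sd),
    }


# Hypothetical maximum diameters (mm) recorded by two sonologists.
sonologist_1 = [70, 45, 120, 33, 88, 61]
sonologist_2 = [66, 47, 111, 35, 90, 52]

result = bland_altman(sonologist_1, sonologist_2)
print(f"mean difference: {result['mean']:.2f} mm")
print("95% CI: {:.2f} to {:.2f}".format(*result["ci95"]))
print("limits of agreement: {:.2f} to {:.2f}".format(*result["limits_of_agreement"]))
```

The limits of agreement are wider than the confidence interval by a factor of the square root of n, which is why a mean difference near zero can coexist with wide limits of agreement.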


Table 4. Inter-observer agreement for categorical variables included in the risk calculation models LR1 and LR2 and for other categorical variables used to describe adnexal masses
____________________________________________________________________________________________
Variable                                                                            Agreement, % (n/N)   Kappa value
____________________________________________________________________________________________
Variables used in models LR1 and LR2
  Presence of entirely solid tumor (yes, no)                                        97 (114/117)         0.91
  Maximum diameter of largest solid component (≤50 mm, >50 mm)                      95 (111/117)         0.83
  Maximum diameter of largest solid component (≤50 mm, >50 mm) [a]                  92 (56/61)           0.82
  Tenderness of adnexal mass (yes, no)                                              97 (113/117)         0.78
  Ascites (yes, no)                                                                 97 (113/117)         0.73
  Papillary projections (yes, no)                                                   87 (102/117)         0.65
  Solid component (yes, no)                                                         84 (98/117)          0.67
  Irregular cyst wall (yes, no)                                                     79 (92/117)          0.56
  Acoustic shadows (yes, no)                                                        85 (99/117)          0.58
  Color Doppler signals in papillary projections (yes, no) [b]                      90 (105/117)         0.48
  Color Doppler signals in papillary projections (yes, no) [c]                      86 (18/21)           0.71
  Color score (1, 2, 3, 4)                                                          40 (47/117)          0.36 [d]
Other variables used to describe adnexal masses
  Tumor type (unilocular, unilocular solid, multilocular, multilocular solid, solid) 97 (113/117)        0.70
  Mean diameter of tumor, mm (≤40, 41-60, 61-80, 81-100, >100)                      66 (77/117)          0.76 [d]
  Number of locules
    0, 1, 2, 3, 4, 5, 6-10, 11-20, >20                                              58 (68/117)          0.79 [d]
    ≤10, >10 [e]                                                                    94 (110/117)         0.80
  Septum (yes, no)                                                                  88 (103/117)         0.76
  Incomplete septum (yes, no)                                                       98 (115/117)         ¶
  Number of papillary projections
    0, 1, 2, 3, ≥4                                                                  81 (95/117)          0.64 [d]
    1, 2, 3, ≥4 [c]                                                                 67 (14/21)           0.63 [d]
    <4, >4 [e]                                                                      96 (112/117)         0.68
  Echogenicity of cyst fluid (anechoic, low level, ground glass, mixed, no fluid)   67 (78/117)          0.56
  Color Doppler signals detectable (yes, no)                                        84 (98/117)          0.30
  Ovarian crescent sign (yes, no)                                                   78 (91/117)          0.49
  Fluid in Douglas pouch (yes, no)                                                  87 (102/117)         0.72
____________________________________________________________________________________________
[a] Includes only cases where solid components were registered at both examinations.
[b] This variable reflects disagreement both with regard to the presence of papillary projections and the presence of blood flow in papillary projections (i.e., if one observer recorded the presence of papillary projections and the other did not, there was disagreement with regard to this variable, and if both observers recorded the presence of papillary projections but disagreed with regard to the presence of blood flow in the papillary projections, there was disagreement with regard to this variable).
[c] Includes only cases where papillary projections were registered at both examinations.
[d] Weighted Kappa index.
[e] Includes 0.
¶ Absent kappa values are explained by asymmetric field tables, making it impossible to calculate them.
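Table 4 reports both percentage agreement and Cohen's kappa (17), and the two can diverge when one category dominates. The sketch below shows the unweighted kappa calculation for a square contingency table. The cell counts are hypothetical: the paper gives only the totals (for example 98/117 agreement for "Color Doppler signals detectable"), not the individual cells.

```python
def cohen_kappa(table):
    """Unweighted Cohen's kappa for a square table of counts.

    table[i][j] = number of masses placed in category i by observer 1
    and in category j by observer 2.
    """
    n = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(len(table))) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    # Agreement expected by chance, from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    if expected == 1.0:
        return float("nan")  # kappa is undefined when chance agreement is total
    return (observed - expected) / (1.0 - expected)


# Hypothetical 2x2 table for a yes/no finding in 117 masses.
# Rows: observer 1 (no, yes); columns: observer 2 (no, yes).
counts = [[5, 9],
          [10, 93]]

agreement = (counts[0][0] + counts[1][1]) / 117
print(f"agreement: {100 * agreement:.0f}%")
print(f"kappa: {cohen_kappa(counts):.2f}")
```

With these invented counts the observers agree in 98 of 117 masses, yet kappa is only about 0.25, because most of that agreement is expected by chance when nearly all masses fall in one category.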


Table 5. Inter-observer reproducibility of the risk of malignancy calculated using the risk calculation models LR1 and LR2
___________________________________________________________________________
                                             Calculated risk of malignancy    Difference between the risk calculated
                                             (both sonologists)               by sonologists 1 and 2                                Intra-CC
Parameter                                    Median         Range             Mean            95% CI           Limits of agreement   Point estimate (95% CI)
___________________________________________________________________________
Risk of malignancy calculated using LR1      7.85 (n=234)   0.10 to 99.10     -0.53 (n=117)   -3.07 to 2.01    -28.05 to 26.99       0.911 (0.874 to 0.937)
Risk of malignancy calculated using LR2      6.65 (n=234)   0.10 to 98.40     0.02 (n=117)    -3.06 to 3.10    -33.22 to 33.26       0.832 (0.766 to 0.880)
___________________________________________________________________________
CI, confidence interval; Intra-CC, intra-class correlation coefficient.


Legends for figure

Figure 1. a) Scatterplot showing the relationship between inter-observer difference (observer 1 minus observer 2) in calculated risk and magnitude of calculated risk when using logistic regression model LR1. The plot manifests a diamond shape, the differences being smallest for the lowest and highest risks. For risks < 25 and > 95, the differences are very small. LOA, limits of agreement. b) Scatterplot showing the relationship between inter-observer difference in calculated risk and magnitude of calculated risk when using logistic regression model LR2. The plot manifests a diamond shape, the differences being smallest for the lowest and highest risks.


Povilas Sladkevicius and Lil Valentin. Inter-observer agreement in describing the ultrasound appearance of adnexal masses and in calculating the risk of malignancy using logistic regression models. Clin Cancer Res. Published OnlineFirst November 25, 2014. doi: 10.1158/1078-0432.CCR-14-0906


by the two sonologists in these 14 cases were acoustic shadowing (n = 5), irregular cyst wall (n = 5), ascites (n = 3), and flow in papillary projection (n = 2).

The sensitivity with regard to malignancy when using LR1 (10% risk cutoff) was 100% (23/23; 95% CI, 82-100) for both sonologists; the specificity was 74% (70/94; 95% CI, 64-82) for sonologist 1 and 63% (59/94; 95% CI, 53-72) for sonologist 2. The sensitivity when using LR2 was 100% (23/23; 95% CI, 82-100) for sonologist 1 and 91% (21/23; 95% CI, 72-98) for sonologist 2, and the specificity was 75.5% (71/94; 95% CI, 65-84) for both sonologists.
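The percentages in the paragraph above follow from the counts given in parentheses: 23 malignant masses (19 invasive and 4 borderline, Table 2) and 94 benign masses. The short sketch below reproduces that arithmetic only. The dictionary layout is an illustrative assumption, and the confidence intervals are not recomputed because the text does not state which interval method was used.

```python
# Counts taken from the paragraph above: (correctly classified, total).
results = {
    ("LR1", "sonologist 1"): {"sensitivity": (23, 23), "specificity": (70, 94)},
    ("LR1", "sonologist 2"): {"sensitivity": (23, 23), "specificity": (59, 94)},
    ("LR2", "sonologist 1"): {"sensitivity": (23, 23), "specificity": (71, 94)},
    ("LR2", "sonologist 2"): {"sensitivity": (21, 23), "specificity": (71, 94)},
}


def percent(correct, total):
    """Proportion of correctly classified masses, as a percentage."""
    return 100.0 * correct / total


for (model, observer), measures in results.items():
    sens = percent(*measures["sensitivity"])
    spec = percent(*measures["specificity"])
    print(f"{model}, {observer}: sensitivity {sens:.1f}%, specificity {spec:.1f}%")
```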


Discussion

We have shown substantial inter-observer variability in the results of measurements taken in adnexal masses (wide limits of agreement). Inter-observer agreement beyond chance was very good or good for most categorical variables, but it was only moderate or fair for some. Inter-observer agreement above chance was poorest for variables heavily dependent on subjective evaluation and/or machine settings, i.e., color score, presence of color Doppler signals in papillary projections, irregular cyst walls, acoustic shadowing (all four variables being included in LR1 or LR2), echogenicity of cyst fluid, and ovarian crescent sign. Despite this, there was good inter-observer agreement when classifying tumors as benign or malignant using the predetermined risk of malignancy cut-off of 10%. However, in some cases there were substantial differences in the calculated risk of malignancy between the two sonologists, the difference being >25.0 percentage units in 9% of all tumors when using LR1 and in 12% of all tumors when using LR2.

The strength of our study is that it provides new information. To the best of our knowledge, there is only one publication reporting on interobserver agreement with regard to describing ultrasound findings in adnexal masses using the IOTA terminology (11) when performing live ultrasound examinations (23). However, that study (23) evaluated interobserver agreement with regard to the ten ultrasound features in the IOTA simple rules (24, 25), not the variables included in the IOTA logistic regression models LR1 and LR2, and agreement was estimated between examiners with different levels of experience. The variable with poorest agreement beyond chance in the study cited was acoustic shadowing (Kappa 0.36). We have found no published study that has estimated inter-observer reproducibility of the calculated risk of malignancy using LR1 or LR2 after live scanning.

It is a limitation of our study that up to 204 days elapsed between the scans of the two sonologists (up to 41 days for malignant masses). Because days elapsed between the scans, theoretically the inter-observer differences could be explained by the lesions having changed in size or morphology between the scans. We find this highly unlikely for the following reasons. First, there was no relationship between the differences in measurement results and the number of days between the scans (Supplementary Fig. S1-S5). Nor was there a clear tendency for inter-observer agreement for discrete variables to depend on the time between the scans (Supplementary Table S1). Second, one would expect a lesion and its components to increase in size with time, but sonologist 1, performing the first scan, obtained higher measurement values than sonologist 2. Third, it is our experience, after having performed gynecological scans for more than 20 years, that the ultrasound morphology of both benign and malignant adnexal masses remains constant over time, that benign adnexal lesions grow slowly, and that malignant masses do not change appreciably in size even during 1 month of observation. Therefore, we believe that the discrepancies between the two sonologists reflect true inter-observer differences and not a change of the masses over time. A second limitation is that we did not include estimation of the reproducibility of retrieving anamnestic information (current hormonal therapy, personal history of ovarian cancer), the anamnestic information collected by the second sonologist being used in all cases. It cannot be entirely excluded that patients would answer differently when asked by different sonologists, or that sonologists could interpret the answers of the patients differently. A third limitation is that we did not estimate intra-observer reproducibility. We considered four scans (two per sonologist) likely to be unacceptable to patients. For the same reason, only two sonologists were involved in this study, and our results are generalizable only to sonologists with a similar level of experience.

The results of this live scanning study are similar to those of another study in which the same sonologists assessed the same variables using 3D ultrasound volumes from adnexal masses in another tumor population (15). The similarity in results between the two studies is surprising, because the conditions when assessing 3D ultrasound volumes are different from those during a live scan. When evaluating ultrasound volumes, sonologists are exposed to the same ultrasound images, and so any interobserver difference should be explained exclusively by differences in interpreting the ultrasound information. During a live scan there are more sources of bias. This could result in either poorer or better interobserver agreement than when 3D ultrasound volumes are assessed: poorer because ultrasound examiners are likely to use different machine settings and scanning conditions may change from one minute to another; better because the dynamic nature of live scanning facilitates discrimination between solid components and amorphous tissue.

Our results showed that two experienced sonologists agreed quite well in their classification of masses as benign or malignant using the 10% risk of malignancy cutoff of LR1 and LR2, and that the diagnostic performance of LR1 and LR2 with regard to discrimination between benign and malignant tumors was similar for the two sonologists and similar to that reported by others (14, 26-28). This is reassuring, because the main purpose of using models LR1 and LR2 is to classify tumors as benign or malignant. Potentially, however, LR1 and LR2 can be used not only to classify adnexal masses as benign or malignant but also to counsel a patient about her individual risk of malignancy (13). If the calculated risk is to be used for individual counseling, one must be reasonably certain not only that the estimated risk agrees well with the true risk (when externally validated, both LR1 and LR2 underestimated the true risk, especially in the risk interval 30-70% (14)) but also that the risk estimates are reproducible, i.e., that different examiners will obtain similar risk estimates. Our results show that risk estimates may differ substantially between experienced observers, the difference in estimated risk being >25.0 percentage units in 9% and 12% of cases when using LR1 and LR2, respectively. Interobserver agreement above chance was poorest for those variables in the models that are heavily dependent on subjective evaluation, i.e., color score, presence of color Doppler signals in papillary projections, irregular cyst walls, and acoustic shadowing. Indeed, differences in these variables explained most of the largest inter-observer differences in calculated risk of malignancy. In models based on few variables, changing values in only one variable may result in large differences in predicted risks, while a model with many variables is less vulnerable to a change in one or even a few variables. Our results illustrate this (Supplementary Tables S2 and S3). When using LR2 (which includes six variables), a change in value for one single categorical variable explained an inter-observer difference in calculated risk >25 percentage units in eight of 14 cases, while when using LR1 (which includes 12 variables), a change in value for one single categorical variable explained an inter-observer difference in calculated risk >25 percentage units in only four of 11 cases. Acoustic shadowing is a strong variable in both LR1 and LR2 and has great impact on the calculated risk in LR2, which has only six variables. In our hands, as well as in those of Ruiz de Gauna et al. (23), inter-observer agreement for acoustic shadowing was at most moderate. The interobserver agreement for color score was only fair in our study, and color score is an important variable in LR1.
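The effect of a single disagreement can be made concrete with the LR2 formula from the Table 1 footnote. The sketch below is an illustration under stated assumptions, not an analysis of the study data: the patient values are hypothetical, and the two "observers" differ only in whether acoustic shadows were recorded.

```python
import math

# LR2 coefficients from the Table 1 footnote.
LR2_INTERCEPT = -5.3718
LR2_COEF = {
    "age_years": 0.0354,
    "ascites": 1.6159,
    "flow_in_papillary_projection": 1.1768,
    "solid_component_mm": 0.0697,   # capped at 50 mm
    "irregular_cyst_walls": 0.9586,
    "acoustic_shadows": -2.9486,
}


def lr2_risk(findings):
    """Risk of malignancy y = 1 / (1 + exp(-z)) according to LR2."""
    f = dict(findings)
    f["solid_component_mm"] = min(f["solid_component_mm"], 50)
    z = LR2_INTERCEPT + sum(w * f[name] for name, w in LR2_COEF.items())
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical mass: 60-year-old patient, 30 mm solid component,
# irregular cyst walls, no ascites, no flow in papillary projections.
observer_1 = {"age_years": 60, "ascites": 0, "flow_in_papillary_projection": 0,
              "solid_component_mm": 30, "irregular_cyst_walls": 1,
              "acoustic_shadows": 0}
# The second observer records the same findings but sees acoustic shadows.
observer_2 = dict(observer_1, acoustic_shadows=1)

risk_1 = lr2_risk(observer_1)
risk_2 = lr2_risk(observer_2)
print(f"observer 1: {100 * risk_1:.1f}%")
print(f"observer 2: {100 * risk_2:.1f}%")
print(f"difference: {100 * (risk_1 - risk_2):.1f} percentage units")
```

In this invented case the single disagreement moves the calculated risk by roughly 40 percentage units and changes the classification at the 10% cutoff, which is the kind of sensitivity to one variable that the paragraph above describes for LR2.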

To improve inter-observer reproducibility of calculated risks based on LR1 and LR2, inter-observer differences in descriptions and measurements of adnexal masses using the IOTA terminology and measurement technique need to be reduced. One way to achieve this could be by providing courses on and training in how to examine and describe adnexal masses using the IOTA terms. Interactive courses in which a large number of ultrasound images are discussed with the course participants are likely to be very valuable in this respect. More precise definitions of the IOTA terms, for example by providing ample imaging material, would probably also help improve inter-observer agreement. Special attention should be given to the variables with poorest reproducibility, i.e., the color score, wall irregularity, acoustic shadowing, and detection of blood flow in papillary projections. Until better inter-observer agreement in the calculated risk of malignancy using LR1 and LR2 has been shown, one should be cautious about using the risk estimates for individual patient counseling.


Acknowledgements

None


References

1. Granberg S, Norström A, Wikland M. Tumors in the lower pelvis as imaged by vaginal sonography. Gynecol Oncol 1990;37:224-9.

2. Benacerraf BR, Finkler NJ, Wojciechowski C, Knapp RC. Sonographic accuracy in the diagnosis of ovarian masses. J Reprod Med 1990;35:491-5.

3. Valentin L. Pattern recognition of pelvic masses by gray-scale ultrasound imaging: the contribution of Doppler ultrasound. Ultrasound Obstet Gynecol 1999;14:338-47.

4. Valentin L. Prospective cross-validation of Doppler ultrasound examination and gray-scale ultrasound imaging for discrimination of benign and malignant pelvic masses. Ultrasound Obstet Gynecol 1999;14:273-83.

5. Timmerman D, Schwärzler P, Collins WP, Claerhout F, Coenen M, Amant F, et al. Subjective assessment of adnexal masses with the use of ultrasonography: an analysis of interobserver variability and experience. Ultrasound Obstet Gynecol 1999;13:11-6.

6. Sokalska A, Timmerman D, Testa AC, Van Holsbeke C, Lissoni AA, Leone FPG, et al. Diagnostic accuracy of transvaginal ultrasound examination for assigning a specific diagnosis to adnexal masses. Ultrasound Obstet Gynecol 2009;34:462-70.

7. Valentin L. Use of morphology to characterize and manage common adnexal masses. Best Pract Res Clin Obstet Gynaecol 2004;18:71-89.

8. Tailor A, Jurkovic D, Bourne TH, Collins WP, Campbell S. Sonographic prediction of malignancy in adnexal masses using multivariate logistic regression analysis. Ultrasound Obstet Gynecol 1997;10:41-7.

9. Timmerman D, Bourne TH, Tailor A, Collins WP, Verrelst H, Vandenberghe K, et al. A comparison of methods for the preoperative discrimination between benign and malignant adnexal masses: the development of a new logistic regression model. Am J Obstet Gynecol 1999;181:57-65.


10. Alcazar JL, Jurado M. Prospective evaluation of logistic model based on sonographic morphologic and color Doppler findings developed to predict adnexal malignancy. J Ultrasound Med 1999;18:837-42.

11. Timmerman D, Valentin L, Bourne TH, Collins WP, Verrelst H, Vergote I. Terms, definitions and measurements to describe the sonographic features of adnexal tumors: a consensus opinion from the International Ovarian Tumor Analysis (IOTA) Group. Ultrasound Obstet Gynecol 2000;16:500-5.

12. Timmerman D, Testa AC, Bourne T, Ferrazzi E, Ameye L, Konstantinovic ML, et al. Logistic regression model to distinguish between the benign and malignant adnexal mass before surgery: a multicenter study by the International Ovarian Tumor Analysis Group. J Clin Oncol 2005;34:8794-801.

13. Kaijser J, Bourne T, Valentin L, Sayasneh A, Van Holsbeke C, Vergote I, et al. Improving strategies for diagnosing ovarian cancer: a summary of the International Ovarian Tumor Analysis (IOTA) studies. Ultrasound Obstet Gynecol 2013;41:9-20.

14. Timmerman D, Van Calster B, Testa AC, Guerriero S, Fischerova D, Lissoni AA, et al. Ovarian cancer prediction in adnexal masses using ultrasound-based logistic regression models: a temporal and external validation study by the IOTA group. Ultrasound Obstet Gynecol 2010;36:226-34.

15. Sladkevicius P, Valentin L. Intra- and inter-observer agreement when describing adnexal masses using the International Ovarian Tumour Analysis (IOTA) terms and definitions: a study on three-dimensional (3D) ultrasound volumes. Ultrasound Obstet Gynecol 2013;41:318-27.

16. Heintz APM, Odicino F, Maisonneuve P, Beller U, Benedet JL, Creasman WT, et al. Carcinoma of the Ovary. 25th Annual Report on the Results of Treatment in Gynecological Cancer. Int J Gynecol Obstet 2003;83:S135-S166 (suppl 1).


17. Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas 1960;20:37-46.

18. Kundel HL, Polansky M. Measurement of observer agreement. Radiology 2003;228:303-8.

19. Brennan P, Silman A. Statistical methods for assessing observer variability in clinical measures. BMJ 1992;304:1491-4.

20. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986;1:307-10.

21. Bartlett JW, Frost C. Reliability, repeatability and reproducibility: analysis of measurement errors in continuous variables. Ultrasound Obstet Gynecol 2008;31:466-75.

22. Kottner J, Audigé L, Brorson S, Donner A, Gajewski BJ, Hróbjartsson A, et al. Guidelines for Reporting Reliability and Agreement Studies (GRRAS) were proposed. J Clin Epidemiol 2011;64:96-106.

23. Ruiz de Gauna B, Sanchez P, Pineda L, Utrilla-Layna J, Juez L, Alcázar JL. Inter-observer agreement with regard to describing adnexal masses using the IOTA simple rules in a real-time setting and when using three-dimensional ultrasound volumes and digital clips. Ultrasound Obstet Gynecol 2014;44:95-100.

24. Timmerman D, Testa AC, Bourne T, Ameye L, Jurkovic D, Van Holsbeke C, et al. Simple ultrasound-based rules for the diagnosis of ovarian cancer. Ultrasound Obstet Gynecol 2008;31:681-90.

25. Timmerman D, Ameye L, Fischerova D, Epstein E, Melis GB, Guerriero S, et al. Simple ultrasound rules to distinguish between benign and malignant adnexal masses before surgery: prospective validation by IOTA group. BMJ 2010;341:c6839.


26. Van Holsbeke C, Van Calster B, Testa AC, Domali E, Lu C, Van Huffel S, et al. Prospective internal validation of mathematical models to predict malignancy in adnexal masses: results from the international ovarian tumor analysis study. Clin Cancer Res 2009;15:684-91.

27. Nunes N, Yazbek J, Ambler G, Hoo W, Naftalin J, Jurkovic D. Prospective evaluation of the IOTA logistic regression model LR2 for the diagnosis of ovarian cancer. Ultrasound Obstet Gynecol 2012;40:355-9.

28. Nunes N, Ambler G, Hoo WL, Naftalin J, Foo X, Widschwendter M, et al. A prospective validation of the IOTA logistic regression models (LR1 and LR2) in comparison to subjective pattern recognition for the diagnosis of ovarian cancer. Int J Gynecol Cancer 2013;23:1583-9.

Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from

Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906

Table 1 Variables for which inter-observer reproducibility was estimated _______________________________________________________________________ Variables included in logistic regression models LR1 and LR2 or of importance for these models Maximum diameter of adnexal mass mm Maximum diameter of largest solid component mm Maximum diameter of largest solid component le50 gt50 mm Presence of an entirely solid adnexal mass yes no Papillary projections present yes no Irregular internal cyst walls yes no Acoustic shadows yes no Color Doppler signals in papillary projection yes no Color score 1 2 3 4 Tenderness of adnexal mass at scan yes no Ascites yes no Other IOTA variables Continuous IOTA variables Mean diameter of adnexal mass mm Mean diameter of largest solid component mm Height of the largest papillary projection mm Categorical IOTA variables Type of tumor unilocular unilocular solid multilocular multilocular solid solid Mean diameter of adnexal mass le40 41-60 61-80 81-100 and gt100 mm Number of cyst locules 0 1 2 3 4 5 6-10 11-20 and gt20 le10 gt10dagger Septum present yes no Incomplete septum present yes no Solid component yes no Papillary projections 0 1 2 3 ge4 1 2 3 ge4Dagger lt4 gt4dagger Echogenicity of cyst fluid anechoic low level ground glass mixed no cyst fluid Color Doppler signals detectable yes no Ovarian crescent sign yes no Diagnosis based on LR1 and LR2 Calculated risk of malignancy using LR1 le10 gt10 Calculated risk of malignancy using LR2 le10 gt10 ___________________________________________________________________________ IOTA international ovarian tumor analysis group LR1 logistic regression model 1 LR2 logistic regression model 2 The risk of malignancy using the LR1 is derived as y =1(1+e-z) where z = -67468 + 15985 (a) ndash 09983 (b) + 00326 (c) + 000841 (d) ndash 08577 (e) + 15513 (f) + 11737 (g) + 09281 (h) + 00496 (i) + 11421 (j) ndash 23550 (k) + 04916 (l) and e is the mathematical constant and base value of natural logarithms The 12 
variables included in LR1 are (a) personal history of ovarian cancer (yes = 1 no = 0) (b) current hormonal therapy (yes = 1 no = 0) (c) age of the patient (in years) (d) maximum diameter of the lesion (in millimeters) (e) mass tender at scan (yes = 1 no = 0) (f) the presence of ascites (yes = 1 no = 0) (g) the presence of blood flow within a solid papillary projection (yes = 1 no = 0) (h) the presence of a purely solid tumor (yes = 1 no = 0) (i) maximal diameter of the solid component (expressed in millimeters but with no increase gt 50 mm) (j) irregular internal cyst walls (yes = 1 no = 0) (k) the presence of acoustic shadows (yes = 1 no = 0) and (l) color score (1 2 3 or 4) The risk of malignancy using LR2 is derived as y = 1(1 + e-z) where z = - 53718 + 00354 (c) + 16159 (f) + 11768 (g) + 00697 (i) + 09586 (j) - 29486 (k)12 A calculated risk of malignancy of more than 10 classified the adnexal mass as malignant dagger includes 0 Dagger includes only cases where papillary projections were registered at both examinations


Table 2. Histological diagnoses of the masses
___________________________________________________________________________
Histological diagnosis of tumor           Number
___________________________________________________________________________
Benign tumors                             94
  Benign simple cyst                      7
  Endometrioma                            10
  Dermoid cyst                            16
  Serous cystadenoma                      16
  Mucinous cystadenoma                    18
  Myoma/fibroma                           9
  Cystadenofibroma                        11
  Paraovarian cyst                        5
  Sactosalpinx, chronic salpingitis       1
  Leydig cell tumor                       1
Borderline tumors                         4
  Serous                                  2
  Mucinous                                1
  Endometrioid                            1
Invasive malignancy                       19
  Primary ovarian adenocarcinoma          13
  Granulosa cell tumor                    3
  Dysgerminoma                            1
  Leiomyosarcoma                          1
  Malignant aggressive B-cell lymphoma    1
___________________________________________________________________________


Table 3. Inter-observer reproducibility of continuous variables used in the risk calculation models LR1 and LR2 and of other continuous variables used to describe adnexal masses

Median and range: measurement results (both sonologists). Mean, 95% CI and limits of agreement: difference in mm between two measurements made by sonologists 1 and 2 [a]. Intra-CC: point estimate (95% CI).

Parameter                                             Median          Range       Mean            95% CI           Limits of agreement   Intra-CC (95% CI)
Variables used in models LR1 and LR2
Maximum diameter of adnexal mass, mm                  70 (n=234)      10 to 313   3.80 (n=117)    1.12 to 6.48     -25.24 to 32.82       0.958 (0.937–0.971)
Maximum diameter of largest solid component, mm [b]   29.50 (n=122)   5 to 180    1.92 (n=61)     -1.74 to 5.58    -26.66 to 30.50       0.942 (0.905–0.964)
Other variables used to describe adnexal mass
Mean diameter of adnexal mass, mm                     58.5 (n=234)    9 to 240    1.05 (n=117)    -0.15 to 1.95    -8.61 to 10.72        0.971 (0.958–0.980)
Mean diameter of largest solid component, mm [b]      22 (n=122)      4 to 156    0.59 (n=61)     -1.82 to 2.98    -18.16 to 19.32       0.962 (0.937–0.977)
Height of largest papillary projection, mm [c]        8 (n=42)        3 to 25     -0.51 (n=21)    -2.93 to 1.91    -11.61 to 10.59       0.609 (0.245–0.821)

CI, confidence interval; Intra-CC, intra-class correlation coefficient.
[a] The measurements made by sonologist 2 were subtracted from the measurements made by sonologist 1.
[b] Includes only cases where solid components were registered by both observers.
[c] Includes only cases where papillary projections were registered by both observers. Using logarithmically transformed data, the results were as follows: mean difference 1.03 (95% CI, 0.78–1.36), limits of agreement 0.31–3.61, Intra-CC 0.547 (0.153–0.789). These results can be used for comparison with our published results on inter-observer agreement when analyzing ultrasound volumes obtained by three-dimensional ultrasound (15).
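The mean difference and limits of agreement reported in Table 3 can be illustrated with a short Python sketch. It assumes the standard Bland and Altman definition (20), that is, mean difference ± 1.96 standard deviations of the differences, and a normal approximation for the 95% CI of the mean; the source does not show the authors' exact computation here, and the function name is hypothetical.

```python
import math
import statistics


def limits_of_agreement(obs1, obs2):
    """Summarise paired measurements from two observers.

    Differences are observer 1 minus observer 2, as in footnote [a] of Table 3.
    Assumes the usual Bland-Altman limits (mean +/- 1.96 SD) and a
    normal-approximation 95% CI for the mean difference.
    """
    diffs = [a - b for a, b in zip(obs1, obs2)]
    mean = statistics.mean(diffs)
    sd = statistics.stdev(diffs)            # sample standard deviation
    se = sd / math.sqrt(len(diffs))         # standard error of the mean
    return {
        "mean": mean,
        "ci": (mean - 1.96 * se, mean + 1.96 * se),
        "loa": (mean - 1.96 * sd, mean + 1.96 * sd),
    }


# Invented measurements in mm, not study data.
result = limits_of_agreement([10, 20, 30, 40], [8, 21, 27, 41])
print(result)
```

By construction the mean difference lies midway between the two limits of agreement, which gives a quick consistency check on any row of the table.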


Table 4. Inter-observer agreement for categorical variables included in the risk calculation models LR1 and LR2 and for other categorical variables used to describe adnexal masses
____________________________________________________________________________________________
Variable                                                                            Agreement, % (n/N)  Kappa value
____________________________________________________________________________________________
Variables used in models LR1 and LR2
Presence of entirely solid tumor (yes, no)                                          97 (114/117)        0.91
Maximum diameter of largest solid component (≤50 mm, >50 mm)                        95 (111/117)        0.83
Maximum diameter of largest solid component (≤50 mm, >50 mm) [a]                    92 (56/61)          0.82
Tenderness of adnexal mass (yes, no)                                                97 (113/117)        0.78
Ascites (yes, no)                                                                   97 (113/117)        0.73
Papillary projections (yes, no)                                                     87 (102/117)        0.65
Solid component (yes, no)                                                           84 (98/117)         0.67
Irregular cyst wall (yes, no)                                                       79 (92/117)         0.56
Acoustic shadows (yes, no)                                                          85 (99/117)         0.58
Color Doppler signals in papillary projections (yes, no) [b]                        90 (105/117)        0.48
Color Doppler signals in papillary projections (yes, no) [c]                        86 (18/21)          0.71
Color score (1, 2, 3, 4)                                                            40 (47/117)         0.36 [d]
Other variables used to describe adnexal masses
Tumor type (unilocular, unilocular solid, multilocular, multilocular solid, solid)  97 (113/117)        0.70
Mean diameter of tumor, mm (≤40, 41-60, 61-80, 81-100, >100)                        66 (77/117)         0.76 [d]
Number of locules (0, 1, 2, 3, 4, 5, 6-10, 11-20, >20)                              58 (68/117)         0.79 [d]
  (≤10, >10) [e]                                                                    94 (110/117)        0.80
Septum (yes, no)                                                                    88 (103/117)        0.76
Incomplete septum (yes, no)                                                         98 (115/117)        ¶
Number of papillary projections (0, 1, 2, 3, ≥4)                                    81 (95/117)         0.64 [d]
  (1, 2, 3, ≥4) [c]                                                                 67 (14/21)          0.63 [d]
  (<4, >4) [e]                                                                      96 (112/117)        0.68
Echogenicity of cyst fluid (anechoic, low level, ground glass, mixed, no fluid)     67 (78/117)         0.56
Color Doppler signals detectable (yes, no)                                          84 (98/117)         0.30
Ovarian crescent sign (yes, no)                                                     78 (91/117)         0.49
Fluid in Douglas pouch (yes, no)                                                    87 (102/117)        0.72
____________________________________________________________________________________________
[a] Includes only cases where solid components were registered at both examinations.
[b] This variable reflects disagreement both with regard to the presence of papillary projections and the presence of blood flow in papillary projections (i.e., if one observer recorded the presence of papillary projections and the other did not, there was disagreement with regard to this variable; and if both observers recorded the presence of papillary projections but disagreed with regard to the presence of blood flow in the papillary projections, there was disagreement with regard to this variable).
[c] Includes only cases where papillary projections were registered at both examinations.
[d] Weighted Kappa index.
[e] Includes 0.
¶ Absent kappa values are explained by asymmetric field tables, making it impossible to calculate them.
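The two columns of Table 4, percentage agreement and Kappa, can be illustrated with a minimal Python sketch of unweighted Cohen's kappa (17). The function name and the example counts are hypothetical, and the weighted Kappa index used for the ordinal variables marked [d] is not covered here.

```python
def cohen_kappa(table):
    """Unweighted Cohen's kappa for a square table of counts.

    Rows are the categories chosen by observer 1, columns those chosen by
    observer 2. Kappa is (observed - expected) / (1 - expected) agreement.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    return (observed - expected) / (1 - expected)


# Invented 2 x 2 table (yes/no by two observers), not study data.
counts = [[40, 10],
          [5, 45]]
agreement = (counts[0][0] + counts[1][1]) / 100
print(f"agreement {100 * agreement:.0f}%, kappa {cohen_kappa(counts):.2f}")
```

Because the expected agreement depends on how often each category is used, a variable with very high raw agreement can still have a modest kappa when one category is rare.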


Table 5. Inter-observer reproducibility of the risk of malignancy calculated using the risk calculation models LR1 and LR2

Median and range: calculated risk of malignancy, % (both sonologists). Mean, 95% CI and limits of agreement: difference, in percentage units, between the risk calculated by sonologists 1 and 2. Intra-CC: point estimate (95% CI).

Parameter                                  Median          Range            Mean             95% CI           Limits of agreement   Intra-CC (95% CI)
Risk of malignancy calculated using LR1    7.85 (n=234)    0.10 to 99.10    -0.53 (n=117)    -3.07 to 2.01    -28.05 to 26.99       0.911 (0.874–0.937)
Risk of malignancy calculated using LR2    6.65 (n=234)    0.10 to 98.40    0.02 (n=117)     -3.06 to 3.10    -33.22 to 33.26       0.832 (0.766–0.880)

CI, confidence interval; Intra-CC, intra-class correlation coefficient.


Legends for figure

Figure 1. a) Scatterplot showing the relationship between inter-observer difference (observer 1 minus observer 2) in calculated risk and magnitude of calculated risk when using logistic regression model LR1. The plot manifests a diamond shape, the differences being smallest for the lowest and highest risks. For risks < 25 and > 95 the differences are very small. LOA, limits of agreement. b) Scatterplot showing the relationship between inter-observer difference in calculated risk and magnitude of calculated risk when using logistic regression model LR2. The plot manifests a diamond shape, the differences being smallest for the lowest and highest risks.


Clin Cancer Res. Published OnlineFirst November 25, 2014. Povilas Sladkevicius and Lil Valentin. Inter-observer agreement in describing the ultrasound appearance of adnexal masses and in calculating the risk of malignancy using logistic regression models. doi: 10.1158/1078-0432.CCR-14-0906



Discussion

We have shown substantial inter-observer variability in the results of measurements taken in adnexal masses (wide limits of agreement). Inter-observer agreement beyond chance was very good or good for most categorical variables, but it was only moderate or fair for some. Inter-observer agreement above chance was poorest for variables heavily dependent on subjective evaluation and/or machine settings, i.e., color score, presence of color Doppler signals in papillary projections, irregular cyst walls, acoustic shadowing (all four variables being included in LR1 or LR2), echogenicity of cyst fluid and ovarian crescent sign. Despite this, there was good inter-observer agreement when classifying tumors as benign or malignant using the predetermined risk of malignancy cut-off of 10%. However, in some cases there were substantial differences in the calculated risk of malignancy between the two sonologists, the difference being >25.0 percentage units in 9% of all tumors when using LR1 and in 12% of all tumors when using LR2.

The strength of our study is that it provides new information. To the best of our knowledge, there is only one publication reporting on inter-observer agreement with regard to describing ultrasound findings in adnexal masses using the IOTA terminology (11) when performing live ultrasound examinations (23). However, that study (23) evaluated inter-observer agreement with regard to the ten ultrasound features in the IOTA simple rules (24, 25), not the variables included in the IOTA logistic regression models LR1 and LR2, and agreement was estimated between examiners with different levels of experience. The variable with poorest agreement beyond chance in the study cited was acoustic shadowing (Kappa 0.36). We have found no published study that has estimated inter-observer reproducibility of the calculated risk of malignancy using LR1 or LR2 after live scanning.

It is a limitation of our study that up to 204 days elapsed between the scans of the two sonologists (up to 41 days for malignant masses). Because days elapsed between the scans, theoretically the inter-observer differences could be explained by the lesions having changed in size or morphology between the scans. We find this highly unlikely for the following reasons. First, there was no relationship between the differences in measurement results and the number of days between the scans (Supplementary Fig. S1-S5). Nor was there a clear tendency for inter-observer agreement for discrete variables to depend on the time between the scans (Supplementary Table S1). Second, one would expect a lesion and its components to increase in size with time, but sonologist 1, performing the first scan, obtained higher measurement values than sonologist 2. Third, it is our experience, after having performed gynecological scans for more than 20 years, that the ultrasound morphology of both benign and malignant adnexal masses remains constant over time, that benign adnexal lesions grow slowly, and that malignant masses do not change appreciably in size even during 1 month of observation. Therefore, we believe that the discrepancies between the two sonologists reflect true inter-observer differences and not a change of the masses over time. A second limitation is that we did not include estimation of the reproducibility of retrieving anamnestic information (current hormonal therapy, personal history of ovarian cancer), the anamnestic information collected by the second sonologist being used in all cases. It cannot be entirely excluded that patients would answer differently when asked by different sonologists, or that sonologists could interpret the answers of the patients differently. A third limitation is that we did not estimate intra-observer reproducibility. We considered four scans (two per sonologist) likely to be unacceptable to patients. For the same reason, only two sonologists were involved in this study, and our results are generalizable only to sonologists with a similar level of experience.

The results of this live scanning study are similar to those of another study in which the same sonologists assessed the same variables using 3D ultrasound volumes from adnexal masses in another tumor population (15). The similarity in results between the two studies is surprising, because the conditions when assessing 3D ultrasound volumes are different from those during a live scan. When evaluating ultrasound volumes, sonologists are exposed to the same ultrasound images, and so any inter-observer difference should be explained exclusively by differences in interpreting the ultrasound information. During a live scan there are more sources of bias. This could result in either poorer or better inter-observer agreement than when 3D ultrasound volumes are assessed: poorer because ultrasound examiners are likely to use different machine settings and scanning conditions may change from one minute to another, better because the dynamic nature of live scanning facilitates discrimination between solid components and amorphous tissue.

Our results showed that two experienced sonologists agreed quite well in their classification of masses as benign or malignant using the 10% risk of malignancy cut-off of LR1 and LR2, and that the diagnostic performance of LR1 and LR2 with regard to discrimination between benign and malignant tumors was similar for the two sonologists and similar to that reported by others (14, 26-28). This is reassuring, because the main purpose of using models LR1 and LR2 is to classify tumors as benign or malignant. Potentially, however, LR1 and LR2 can be used not only to classify adnexal masses as benign or malignant but also to counsel a patient about her individual risk of malignancy (13). To use the calculated risk for individual counseling, one must be reasonably certain not only that the estimated risk agrees well with the true risk (when externally validated, both LR1 and LR2 underestimated the true risk, especially in the risk interval 30-70% (14)) but also that the risk estimates are reproducible, i.e., that different examiners will obtain similar risk estimates. Our results show that risk estimates may differ substantially between experienced observers, the difference in estimated risk being >25.0 percentage units in 9% and 12% of cases when using LR1 and LR2, respectively. Inter-observer agreement above chance was poorest for those variables in the models that are heavily dependent on subjective evaluation, i.e., color score, presence of color Doppler signals in papillary projections, irregular cyst walls and acoustic shadowing. Indeed, differences in these variables explained most of the largest inter-observer differences in calculated risk of malignancy. In models based on few variables, changing the value of only one variable may result in large differences in predicted risks, while a model with many variables is less vulnerable to a change in one or even a few variables. Our results illustrate this (Supplementary Tables S2 and S3). When using LR2 (which includes six variables), a change in value for one single categorical variable explained an inter-observer difference in calculated risk >25 percentage units in eight of 14 cases, while when using LR1 (which includes 12 variables), a change in value for one single categorical variable explained an inter-observer difference in calculated risk >25 percentage units in only four of 11 cases. Acoustic shadowing is a strong variable in both LR1 and LR2 and has great impact on the calculated risk in LR2, with only six variables. In our hands, as well as in those of Ruiz de Gauna et al. (23), inter-observer agreement for acoustic shadowing was at most moderate. The inter-observer agreement for color score was only fair in our study, and color score is an important variable in LR1.
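The vulnerability of a six-variable model to a single disagreement can be made concrete with a small worked example. The Python sketch below uses the LR2 formula from the Table 1 footnote (decimal placement of the coefficients is an assumption) and an invented mass on which two observers differ only in whether they record acoustic shadows; the function name and patient values are hypothetical and are not taken from the study data.

```python
import math


def lr2_risk(age, ascites, papillary_flow, max_solid_mm, irregular_walls, shadows):
    """LR2 risk of malignancy, y = 1 / (1 + exp(-z)), as in the Table 1 footnote."""
    z = (-5.3718
         + 0.0354 * age
         + 1.6159 * ascites
         + 1.1768 * papillary_flow
         + 0.0697 * min(max_solid_mm, 50)   # no increase above 50 mm
         + 0.9586 * irregular_walls
         - 2.9486 * shadows)
    return 1.0 / (1.0 + math.exp(-z))


# Invented case: 50 years old, 30 mm solid component, irregular walls.
# The observers disagree on acoustic shadows only.
observer_1 = lr2_risk(50, 0, 0, 30, 1, shadows=0)
observer_2 = lr2_risk(50, 0, 0, 30, 1, shadows=1)
difference = 100 * (observer_1 - observer_2)   # percentage units

print(f"observer 1: {100 * observer_1:.1f}%  observer 2: {100 * observer_2:.1f}%")
print(f"difference: {difference:.1f} percentage units")
```

In this invented case the single disagreement moves the calculated risk by more than 25 percentage units and across the 10% cut-off, which is the pattern the paragraph above describes for LR2.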

To improve inter-observer reproducibility of calculated risks based on LR1 and LR2, inter-observer differences in descriptions and measurements of adnexal masses using the IOTA terminology and measurement technique need to be reduced. One way to achieve this could be by providing courses on, and training in, how to examine and describe adnexal masses using the IOTA terms. Interactive courses in which a large number of ultrasound images are discussed with the course participants are likely to be very valuable in this respect. More precise definitions of the IOTA terms, for example by providing ample imaging material, would probably also help improve inter-observer agreement. Special attention should be given to the variables with poorest reproducibility, i.e., the color score, wall irregularity, acoustic shadowing and detection of blood flow in papillary projections. Until better inter-observer agreement in the calculated risk of malignancy using LR1 and LR2 has been shown, one should be cautious about using the risk estimates for individual patient counselling.


Acknowledgements

None


References

1. Granberg S, Norström A, Wikland M. Tumors in the lower pelvis as imaged by vaginal sonography. Gynecol Oncol 1990;37:224-9.

2. Benacerraf BR, Finkler NJ, Wojciechowski C, Knapp RC. Sonographic accuracy in the diagnosis of ovarian masses. J Reprod Med 1990;35:491-5.

3. Valentin L. Pattern recognition of pelvic masses by gray-scale ultrasound imaging: the contribution of Doppler ultrasound. Ultrasound Obstet Gynecol 1999;14:338-47.

4. Valentin L. Prospective cross-validation of Doppler ultrasound examination and gray-scale ultrasound imaging for discrimination of benign and malignant pelvic masses. Ultrasound Obstet Gynecol 1999;14:273-83.

5. Timmerman D, Schwärzler P, Collins WP, Claerhout F, Coenen M, Amant F, et al. Subjective assessment of adnexal masses with the use of ultrasonography: an analysis of interobserver variability and experience. Ultrasound Obstet Gynecol 1999;13:11-6.

6. Sokalska A, Timmerman D, Testa AC, Van Holsbeke C, Lissoni AA, Leone FPG, et al. Diagnostic accuracy of transvaginal ultrasound examination for assigning a specific diagnosis to adnexal masses. Ultrasound Obstet Gynecol 2009;34:462-70.

7. Valentin L. Use of morphology to characterize and manage common adnexal masses. Best Pract Res Clin Obstet Gynaecol 2004;18:71-89.

8. Tailor A, Jurkovic D, Bourne TH, Collins WP, Campbell S. Sonographic prediction of malignancy in adnexal masses using multivariate logistic regression analysis. Ultrasound Obstet Gynecol 1997;10:41-7.

9. Timmerman D, Bourne TH, Tailor A, Collins WP, Verrelst H, Vandenberghe K, et al. A comparison of methods for the preoperative discrimination between benign and malignant adnexal masses: the development of a new logistic regression model. Am J Obstet Gynecol 1999;181:57-65.

10. Alcazar JL, Jurado M. Prospective evaluation of logistic model based on sonographic morphologic and color Doppler findings developed to predict adnexal malignancy. J Ultrasound Med 1999;18:837-42.

11. Timmerman D, Valentin L, Bourne TH, Collins WP, Verrelst H, Vergote I. Terms, definitions and measurements to describe the sonographic features of adnexal tumors: a consensus opinion from the International Ovarian Tumor Analysis (IOTA) Group. Ultrasound Obstet Gynecol 2000;16:500-5.

12. Timmerman D, Testa AC, Bourne T, Ferrazzi E, Ameye L, Konstantinovic ML, et al. Logistic regression model to distinguish between the benign and malignant adnexal mass before surgery: a multicenter study by the International Ovarian Tumor Analysis Group. J Clin Oncol 2005;34:8794-801.

13. Kaijser J, Bourne T, Valentin L, Sayasneh A, Van Holsbeke C, Vergote I, et al. Improving strategies for diagnosing ovarian cancer: a summary of the International Ovarian Tumor Analysis (IOTA) studies. Ultrasound Obstet Gynecol 2013;41:9-20.

14. Timmerman D, Van Calster B, Testa AC, Guerriero S, Fischerova D, Lissoni AA, et al. Ovarian cancer prediction in adnexal masses using ultrasound-based logistic regression models: a temporal and external validation study by the IOTA group. Ultrasound Obstet Gynecol 2010;36:226-34.

15. Sladkevicius P, Valentin L. Intra- and inter-observer agreement when describing adnexal masses using the International Ovarian Tumour Analysis (IOTA) terms and definitions: a study on three-dimensional (3D) ultrasound volumes. Ultrasound Obstet Gynecol 2013;41:318-27.

16. Heintz APM, Odicino F, Maisonneuve P, Beller U, Benedet JL, Creasman WT, et al. Carcinoma of the Ovary. 25th Annual Report on the Results of Treatment in Gynecological Cancer. Int J Gynecol Obstet 2003;83(suppl 1):S135-S166.

17. Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas 1960;20:37-46.

18. Kundel HL, Polansky M. Measurement of observer agreement. Radiology 2003;228:303-8.

19. Brennan P, Silman A. Statistical methods for assessing observer variability in clinical measures. BMJ 1992;304:1491-4.

20. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986;1:307-10.

21. Bartlett JW, Frost C. Reliability, repeatability and reproducibility: analysis of measurement errors in continuous variables. Ultrasound Obstet Gynecol 2008;31:466-75.

22. Kottner J, Audigé L, Brorson S, Donner A, Gajewski BJ, Hróbjartsson A, et al. Guidelines for Reporting Reliability and Agreement Studies (GRRAS) were proposed. J Clin Epidemiol 2011;64:96-106.

23. Ruiz de Gauna B, Sanchez P, Pineda L, Utrilla-Layna J, Juez L, Alcázar JL. Inter-observer agreement with regard to describing adnexal masses using the IOTA simple rules in a real-time setting and when using three-dimensional ultrasound volumes and digital clips. Ultrasound Obstet Gynecol 2014;44:95-100.

24. Timmerman D, Testa AC, Bourne T, Ameye L, Jurkovic D, Van Holsbeke C, et al. Simple ultrasound-based rules for the diagnosis of ovarian cancer. Ultrasound Obstet Gynecol 2008;31:681-90.

25. Timmerman D, Ameye L, Fischerova D, Epstein E, Melis GB, Guerriero S, et al. Simple ultrasound rules to distinguish between benign and malignant adnexal masses before surgery: prospective validation by IOTA group. BMJ 2010;341:c6839.

26. Van Holsbeke C, Van Calster B, Testa AC, Domali E, Lu C, Van Huffel S, et al. Prospective internal validation of mathematical models to predict malignancy in adnexal masses: results from the international ovarian tumor analysis study. Clin Cancer Res 2009;15:684-91.

27. Nunes N, Yazbek J, Ambler G, Hoo W, Naftalin J, Jurkovic D. Prospective evaluation of the IOTA logistic regression model LR2 for the diagnosis of ovarian cancer. Ultrasound Obstet Gynecol 2012;40:355-9.

28. Nunes N, Ambler G, Hoo WL, Naftalin J, Foo X, Widschwendter M, et al. A prospective validation of the IOTA logistic regression models (LR1 and LR2) in comparison to subjective pattern recognition for the diagnosis of ovarian cancer. Int J Gynecol Cancer 2013;23:1583-9.


Table 1 Variables for which inter-observer reproducibility was estimated _______________________________________________________________________ Variables included in logistic regression models LR1 and LR2 or of importance for these models Maximum diameter of adnexal mass mm Maximum diameter of largest solid component mm Maximum diameter of largest solid component le50 gt50 mm Presence of an entirely solid adnexal mass yes no Papillary projections present yes no Irregular internal cyst walls yes no Acoustic shadows yes no Color Doppler signals in papillary projection yes no Color score 1 2 3 4 Tenderness of adnexal mass at scan yes no Ascites yes no Other IOTA variables Continuous IOTA variables Mean diameter of adnexal mass mm Mean diameter of largest solid component mm Height of the largest papillary projection mm Categorical IOTA variables Type of tumor unilocular unilocular solid multilocular multilocular solid solid Mean diameter of adnexal mass le40 41-60 61-80 81-100 and gt100 mm Number of cyst locules 0 1 2 3 4 5 6-10 11-20 and gt20 le10 gt10dagger Septum present yes no Incomplete septum present yes no Solid component yes no Papillary projections 0 1 2 3 ge4 1 2 3 ge4Dagger lt4 gt4dagger Echogenicity of cyst fluid anechoic low level ground glass mixed no cyst fluid Color Doppler signals detectable yes no Ovarian crescent sign yes no Diagnosis based on LR1 and LR2 Calculated risk of malignancy using LR1 le10 gt10 Calculated risk of malignancy using LR2 le10 gt10 ___________________________________________________________________________ IOTA international ovarian tumor analysis group LR1 logistic regression model 1 LR2 logistic regression model 2 The risk of malignancy using the LR1 is derived as y =1(1+e-z) where z = -67468 + 15985 (a) ndash 09983 (b) + 00326 (c) + 000841 (d) ndash 08577 (e) + 15513 (f) + 11737 (g) + 09281 (h) + 00496 (i) + 11421 (j) ndash 23550 (k) + 04916 (l) and e is the mathematical constant and base value of natural logarithms The 12 
variables included in LR1 are (a) personal history of ovarian cancer (yes = 1 no = 0) (b) current hormonal therapy (yes = 1 no = 0) (c) age of the patient (in years) (d) maximum diameter of the lesion (in millimeters) (e) mass tender at scan (yes = 1 no = 0) (f) the presence of ascites (yes = 1 no = 0) (g) the presence of blood flow within a solid papillary projection (yes = 1 no = 0) (h) the presence of a purely solid tumor (yes = 1 no = 0) (i) maximal diameter of the solid component (expressed in millimeters but with no increase gt 50 mm) (j) irregular internal cyst walls (yes = 1 no = 0) (k) the presence of acoustic shadows (yes = 1 no = 0) and (l) color score (1 2 3 or 4) The risk of malignancy using LR2 is derived as y = 1(1 + e-z) where z = - 53718 + 00354 (c) + 16159 (f) + 11768 (g) + 00697 (i) + 09586 (j) - 29486 (k)12 A calculated risk of malignancy of more than 10 classified the adnexal mass as malignant dagger includes 0 Dagger includes only cases where papillary projections were registered at both examinations

Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from

Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906

Table 2. Histological diagnoses of the masses
___________________________________________________________________________
Histological diagnosis of tumor                     Number
___________________________________________________________________________
Benign tumors                                         94
  Benign simple cyst                                   7
  Endometrioma                                        10
  Dermoid cyst                                        16
  Serous cystadenoma                                  16
  Mucinous cystadenoma                                18
  Myoma/fibroma                                        9
  Cystadenofibroma                                    11
  Paraovarian cyst                                     5
  Sactosalpinx/chronic salpingitis                     1
  Leydig cell tumor                                    1
Borderline tumors                                      4
  Serous                                               2
  Mucinous                                             1
  Endometrioid                                         1
Invasive malignancy                                   19
  Primary ovarian adenocarcinoma                      13
  Granulosa cell tumor                                 3
  Dysgerminoma                                         1
  Leiomyosarcoma                                       1
  Malignant aggressive B-cell lymphoma                 1
___________________________________________________________________________

Table 3. Inter-observer reproducibility of continuous variables used in the risk calculation models LR1 and LR2 and of other continuous variables used to describe adnexal masses
___________________________________________________________________________
                                                     Measurement results          Difference in mm between two measurements
                                                     (both sonologists)           made by sonologists 1 and 2^a                          Intra-CC
Parameter                                            Median (n)      Range        Mean (n)        95% CI           Limits of agreement   Point estimate (95% CI)
___________________________________________________________________________
Variables used in models LR1 and LR2
  Maximum diameter of adnexal mass, mm               70 (n=234)      10 to 313    3.80 (n=117)    1.12 to 6.48     -25.24 to 32.82       0.958 (0.937 to 0.971)
  Maximum diameter of largest solid component, mm^b  29.50 (n=122)   5 to 180     1.92 (n=61)     -1.74 to 5.58    -26.66 to 30.50       0.942 (0.905 to 0.964)
Other variables used to describe adnexal mass
  Mean diameter of adnexal mass, mm                  58.5 (n=234)    9 to 240     1.05 (n=117)    -0.15 to 1.95    -8.61 to 10.72        0.971 (0.958 to 0.980)
  Mean diameter of largest solid component, mm^b     22 (n=122)      4 to 156     0.59 (n=61)     -1.82 to 2.98    -18.16 to 19.32       0.962 (0.937 to 0.977)
  Height of largest papillary projection, mm^c       8 (n=42)        3 to 25      -0.51 (n=21)    -2.93 to 1.91    -11.61 to 10.59       0.609 (0.245 to 0.821)
___________________________________________________________________________
CI, confidence interval; Intra-CC, intra-class correlation coefficient.
a The measurements made by sonologist 2 were subtracted from the measurements made by sonologist 1.
b Includes only cases where solid components were registered by both observers.
c Includes only cases where papillary projections were registered by both observers. Using logarithmically transformed data, the results were as follows: mean difference 1.03 (95% CI, 0.78 to 1.36); limits of agreement 0.31 to 3.61; Intra-CC 0.547 (0.153 to 0.789). These results can be used for comparison with our published results on inter-observer agreement when analyzing ultrasound volumes obtained by three-dimensional ultrasound (15).
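The "Mean", "95% CI" and "Limits of agreement" columns of Table 3 follow the Bland and Altman approach (20). The sketch below shows how such figures could be computed from paired measurements. It assumes the usual definitions (limits of agreement as mean difference ± 1.96 SD, and a normal-approximation confidence interval for the mean difference); the function name `bland_altman` and the example measurements are hypothetical and are not study data.

```python
import math
import statistics


def bland_altman(observer1, observer2):
    """Summarise paired measurements as observer 1 minus observer 2."""
    differences = [a - b for a, b in zip(observer1, observer2)]
    n = len(differences)
    mean_difference = statistics.mean(differences)
    sd = statistics.stdev(differences)  # sample standard deviation of the differences
    # Assumption: normal approximation (1.96) for both intervals.
    half_ci = 1.96 * sd / math.sqrt(n)
    return {
        "mean_difference": mean_difference,
        "ci_95": (mean_difference - half_ci, mean_difference + half_ci),
        "limits_of_agreement": (mean_difference - 1.96 * sd, mean_difference + 1.96 * sd),
    }


# Hypothetical maximum diameters in mm measured by two sonologists.
sonologist_1 = [70, 45, 112, 38, 90, 61]
sonologist_2 = [66, 47, 105, 40, 84, 60]
summary = bland_altman(sonologist_1, sonologist_2)
for name, value in summary.items():
    print(name, value)
```

The limits of agreement describe the spread of individual differences, whereas the confidence interval describes the uncertainty of the mean difference, which is why the former are much wider in Table 3.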

Table 4. Inter-observer agreement for categorical variables included in the risk calculation models LR1 and LR2 and for other categorical variables used to describe adnexal masses
____________________________________________________________________________________________
Variable                                                                              Agreement          Kappa value
____________________________________________________________________________________________
Variables used in models LR1 and LR2
  Presence of entirely solid tumor (yes, no)                                          97% (114/117)      0.91
  Maximum diameter of largest solid component (≤50 mm, >50 mm)                        95% (111/117)      0.83
  Maximum diameter of largest solid component (≤50 mm, >50 mm)^a                      92% (56/61)        0.82
  Tenderness of adnexal mass (yes, no)                                                97% (113/117)      0.78
  Ascites (yes, no)                                                                   97% (113/117)      0.73
  Papillary projections (yes, no)                                                     87% (102/117)      0.65
  Solid component (yes, no)                                                           84% (98/117)       0.67
  Irregular cyst wall (yes, no)                                                       79% (92/117)       0.56
  Acoustic shadows (yes, no)                                                          85% (99/117)       0.58
  Color Doppler signals in papillary projections (yes, no)^b                          90% (105/117)      0.48
  Color Doppler signals in papillary projections (yes, no)^c                          86% (18/21)        0.71
  Color score (1, 2, 3, 4)                                                            40% (47/117)       0.36^d
Other variables used to describe adnexal masses
  Tumor type (unilocular, unilocular solid, multilocular, multilocular solid, solid)  97% (113/117)      0.70
  Mean diameter of tumor (mm): ≤40, 41-60, 61-80, 81-100, >100                        66% (77/117)       0.76^d
  Number of locules
    0, 1, 2, 3, 4, 5, 6-10, 11-20, >20                                                58% (68/117)       0.79^d
    ≤10, >10^e                                                                        94% (110/117)      0.80
  Septum (yes, no)                                                                    88% (103/117)      0.76
  Incomplete septum (yes, no)                                                         98% (115/117)      ¶
  Number of papillary projections
    0, 1, 2, 3, ≥4                                                                    81% (95/117)       0.64^d
    1, 2, 3, ≥4^c                                                                     67% (14/21)        0.63^d
    <4, >4^e                                                                          96% (112/117)      0.68
  Echogenicity of cyst fluid (anechoic, low level, ground glass, mixed, no fluid)     67% (78/117)       0.56
  Color Doppler signals detectable (yes, no)                                          84% (98/117)       0.30
  Ovarian crescent sign (yes, no)                                                     78% (91/117)       0.49
  Fluid in Douglas pouch (yes, no)                                                    87% (102/117)      0.72
____________________________________________________________________________________________
a Includes only cases where solid components were registered at both examinations.
b This variable reflects disagreement both with regard to the presence of papillary projections and the presence of blood flow in papillary projections (i.e., if one observer recorded the presence of papillary projections and the other did not, there was disagreement with regard to this variable, and if both observers recorded the presence of papillary projections but disagreed with regard to the presence of blood flow in the papillary projections, there was disagreement with regard to this variable).
c Includes only cases where papillary projections were registered at both examinations.
d Weighted Kappa index.
e Includes 0.
¶ Absent kappa values are explained by asymmetric field tables, making it impossible to calculate them.
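Table 4 reports both raw agreement and kappa, and the two can diverge markedly (for example, 84% agreement but kappa 0.30 for detectable color Doppler signals). The sketch below computes unweighted Cohen's kappa (17) to show why: kappa subtracts the agreement expected by chance from the observers' marginal frequencies. The function name `cohen_kappa` and the example ratings are hypothetical, and the weighted kappa used for ordinal variables in the table is not covered here.

```python
from collections import Counter


def cohen_kappa(ratings_1, ratings_2):
    """Unweighted Cohen's kappa for two observers rating the same cases."""
    n = len(ratings_1)
    observed = sum(a == b for a, b in zip(ratings_1, ratings_2)) / n
    marginals_1 = Counter(ratings_1)
    marginals_2 = Counter(ratings_2)
    categories = set(marginals_1) | set(marginals_2)
    # Chance agreement: product of the two observers' marginal proportions.
    expected = sum(marginals_1[c] * marginals_2[c] for c in categories) / n ** 2
    if expected == 1.0:
        return float("nan")  # both observers used one single category throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical yes/no ratings for a finding that both observers record as
# present in most cases: raw agreement is high, kappa is much lower.
observer_1 = ["yes"] * 9 + ["no"]
observer_2 = ["yes"] * 8 + ["no", "yes"]
agreement = sum(a == b for a, b in zip(observer_1, observer_2)) / len(observer_1)
print(f"agreement = {agreement:.2f}, kappa = {cohen_kappa(observer_1, observer_2):.2f}")
```

When one category dominates, chance agreement is already high, so even a small number of discordant cases lowers kappa substantially.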

Table 5. Inter-observer reproducibility of the risk of malignancy calculated using the risk calculation models LR1 and LR2
___________________________________________________________________________
                                          Calculated risk of malignancy   Difference between the risk calculated
                                          (both sonologists)              by sonologist 1 and 2                                   Intra-CC
Parameter                                 Median (n)     Range            Mean (n)         95% CI           Limits of agreement   Point estimate (95% CI)
___________________________________________________________________________
Risk of malignancy calculated using LR1   7.85 (n=234)   0.10 to 99.10    -0.53 (n=117)    -3.07 to 2.01    -28.05 to 26.99       0.911 (0.874 to 0.937)
Risk of malignancy calculated using LR2   6.65 (n=234)   0.10 to 98.40    0.02 (n=117)     -3.06 to 3.10    -33.22 to 33.26       0.832 (0.766 to 0.880)
___________________________________________________________________________
CI, confidence interval; Intra-CC, intra-class correlation coefficient.

Legends for figure

Figure 1. a) Scatterplot showing the relationship between inter-observer difference (observer 1 minus observer 2) in calculated risk and magnitude of calculated risk when using logistic regression model LR1. The plot manifests a diamond shape, the differences being smallest for the lowest and highest risks. For risks < 25 and > 95, the differences are very small. LOA, limits of agreement. b) Scatterplot showing the relationship between inter-observer difference in calculated risk and magnitude of calculated risk when using logistic regression model LR2. The plot manifests a diamond shape, the differences being smallest for the lowest and highest risks.

Published OnlineFirst November 25, 2014. Clin Cancer Res. Povilas Sladkevicius and Lil Valentin. Inter-observer agreement in describing the ultrasound appearance of adnexal masses and in calculating the risk of malignancy using logistic regression models.

Updated version: Access the most recent version of this article at doi: 10.1158/1078-0432.CCR-14-0906

Supplementary Material: Access the most recent supplemental material at http://clincancerres.aacrjournals.org/content/suppl/2014/11/26/1078-0432.CCR-14-0906.DC1

theoretically, the inter-observer differences could be explained by the lesions having changed in size or morphology between the scans. We find this highly unlikely for the following reasons. First, there was no relationship between the differences in measurement results and the number of days between the scans (Supplementary Figs. S1-S5). Nor was there a clear tendency for inter-observer agreement for discrete variables to depend on the time between the scans (Supplementary Table S1). Second, one would expect a lesion and its components to increase in size with time, but sonologist 1, performing the first scan, obtained higher measurement values than sonologist 2. Third, it is our experience, after having performed gynecological scans for more than 20 years, that the ultrasound morphology of both benign and malignant adnexal masses remains constant over time, that benign adnexal lesions grow slowly, and that malignant masses do not change appreciably in size even during 1 month of observation. Therefore, we believe that the discrepancies between the two sonologists reflect true inter-observer differences and not a change of the masses over time. A second limitation is that we did not include estimation of the reproducibility of retrieving anamnestic information (current hormonal therapy, personal history of ovarian cancer), the anamnestic information collected by the second sonologist being used in all cases. It cannot be entirely excluded that patients would answer differently when asked by different sonologists, or that sonologists could interpret the answers of the patients differently. A third limitation is that we did not estimate intra-observer reproducibility. We considered four scans (two per sonologist) likely to be unacceptable to patients. For the same reason, only two sonologists were involved in this study, and our results are generalizable only to sonologists with a similar level of experience.

The results of this live scanning study are similar to those of another study in which the same sonologists assessed the same variables using 3D ultrasound volumes from adnexal masses in another tumor population (15). The similarity in results between the two studies is surprising because the conditions when assessing 3D ultrasound volumes are different from
those during a live scan. When evaluating ultrasound volumes, sonologists are exposed to the same ultrasound images, and so any inter-observer difference should be explained exclusively by differences in interpreting the ultrasound information. During a live scan there are more sources of bias. This could result in either poorer or better inter-observer agreement than when 3D ultrasound volumes are assessed: poorer because ultrasound examiners are likely to use different machine settings and scanning conditions may change from one minute to another; better because the dynamic nature of live scanning facilitates discrimination between solid components and amorphous tissue.

Our results showed that two experienced sonologists agreed quite well in their classification of masses as benign or malignant using the 10% risk of malignancy cutoff of LR1 and LR2, and that the diagnostic performance of LR1 and LR2 with regard to discrimination between benign and malignant tumors was similar for the two sonologists and similar to that reported by others (14, 26-28). This is reassuring because the main purpose of using models LR1 and LR2 is to classify tumors as benign or malignant. Potentially, however, LR1 and LR2 can be used not only to classify adnexal masses as benign or malignant but also to counsel a patient about her individual risk of malignancy (13). To use the calculated risk for individual counseling, one must be reasonably certain not only that the estimated risk agrees well with the true risk (when externally validated, both LR1 and LR2 underestimated the true risk, especially in the risk interval 30-70% (14)) but also that the risk estimates are reproducible, i.e., that different examiners will obtain similar risk estimates. Our results show that risk estimates may differ substantially between experienced observers, the difference in estimated risk being >25.0 percentage units in 9% and 12% of cases when using LR1 and LR2, respectively. Inter-observer agreement above chance was poorest for those variables in the models that are heavily dependent on subjective evaluation, i.e., color score, presence of color Doppler signals in papillary projections, irregular cyst walls, and acoustic shadowing. Indeed, differences in these
explained most of the largest inter-observer differences in calculated risk of malignancy. In models based on few variables, changing the value of only one variable may result in large differences in predicted risks, while a model with many variables is less vulnerable to a change in one or even a few variables. Our results illustrate this (Supplementary Tables S2 and S3). When using LR2 (which includes six variables), a change in value for one single categorical variable explained an inter-observer difference in calculated risk >25 percentage units in eight of 14 cases, while when using LR1 (which includes 12 variables), a change in value for one single categorical variable explained an inter-observer difference in calculated risk >25 percentage units in only four of 11 cases. Acoustic shadowing is a strong variable in both LR1 and LR2 and has great impact on the calculated risk in LR2, with only six variables. In our hands, as well as in those of Ruiz de Gauna et al. (23), inter-observer agreement for acoustic shadowing was at most moderate. The inter-observer agreement for color score was only fair in our study, and color score is an important variable in LR1.
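The impact of a single disputed variable on LR2 can be illustrated with a short calculation. The sketch below uses the LR2 coefficients from the Table 1 footnote; the function name `lr2_risk` and the example mass are hypothetical and not taken from the study. The two calls differ only in whether acoustic shadows were recorded.

```python
import math


def lr2_risk(age_years, ascites, papillary_flow, max_solid_mm,
             irregular_walls, acoustic_shadows):
    """Risk of malignancy from LR2, y = 1 / (1 + e^(-z))."""
    z = (-5.3718
         + 0.0354 * age_years
         + 1.6159 * ascites
         + 1.1768 * papillary_flow
         + 0.0697 * min(max_solid_mm, 50)  # no increase above 50 mm
         + 0.9586 * irregular_walls
         - 2.9486 * acoustic_shadows)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical mass on which the two observers agree about everything
# except the presence of acoustic shadows.
shared_findings = dict(age_years=50, ascites=0, papillary_flow=0,
                       max_solid_mm=30, irregular_walls=1)
without_shadow = lr2_risk(acoustic_shadows=0, **shared_findings)
with_shadow = lr2_risk(acoustic_shadows=1, **shared_findings)

difference_points = 100 * (without_shadow - with_shadow)
print(f"no shadows: {100 * without_shadow:.1f}%")
print(f"shadows:    {100 * with_shadow:.1f}%")
print(f"difference: {difference_points:.1f} percentage units")
```

For this hypothetical mass the disagreement on one yes/no variable shifts the calculated risk by more than 25 percentage units and moves the mass across the 10% cutoff.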

To improve inter-observer reproducibility of calculated risks based on LR1 and LR2, inter-observer differences in descriptions and measurements of adnexal masses using the IOTA terminology and measurement technique need to be reduced. One way to achieve this could be by providing courses on, and training in, how to examine and describe adnexal masses using the IOTA terms. Interactive courses in which a large number of ultrasound images are discussed with the course participants are likely to be very valuable in this respect. More precise definitions of the IOTA terms, for example by providing ample imaging material, would probably also help improve inter-observer agreement. Special attention should be given to the variables with poorest reproducibility, i.e., the color score, wall irregularity, acoustic shadowing, and detection of blood flow in papillary projections. Until better inter-observer agreement in the calculated risk of malignancy using LR1 and LR2 has been shown, one should be cautious about using the risk estimates for individual patient counselling.

Acknowledgements

None

References

1. Granberg S, Norström A, Wikland M. Tumors in the lower pelvis as imaged by vaginal sonography. Gynecol Oncol 1990;37:224-9.

2. Benacerraf BR, Finkler NJ, Wojciechowski C, Knapp RC. Sonographic accuracy in the diagnosis of ovarian masses. J Reprod Med 1990;35:491-5.

3. Valentin L. Pattern recognition of pelvic masses by gray-scale ultrasound imaging: the contribution of Doppler ultrasound. Ultrasound Obstet Gynecol 1999;14:338-47.

4. Valentin L. Prospective cross-validation of Doppler ultrasound examination and gray-scale ultrasound imaging for discrimination of benign and malignant pelvic masses. Ultrasound Obstet Gynecol 1999;14:273-83.

5. Timmerman D, Schwärzler P, Collins WP, Claerhout F, Coenen M, Amant F, et al. Subjective assessment of adnexal masses with the use of ultrasonography: an analysis of interobserver variability and experience. Ultrasound Obstet Gynecol 1999;13:11-6.

6. Sokalska A, Timmerman D, Testa AC, Van Holsbeke C, Lissoni AA, Leone FPG, et al. Diagnostic accuracy of transvaginal ultrasound examination for assigning a specific diagnosis to adnexal masses. Ultrasound Obstet Gynecol 2009;34:462-70.

7. Valentin L. Use of morphology to characterize and manage common adnexal masses. Best Pract Res Clin Obstet Gynaecol 2004;18:71-89.

8. Tailor A, Jurkovic D, Bourne TH, Collins WP, Campbell S. Sonographic prediction of malignancy in adnexal masses using multivariate logistic regression analysis. Ultrasound Obstet Gynecol 1997;10:41-7.

9. Timmerman D, Bourne TH, Tailor A, Collins WP, Verrelst H, Vandenberghe K, et al. A comparison of methods for the preoperative discrimination between benign and malignant adnexal masses: the development of a new logistic regression model. Am J Obstet Gynecol 1999;181:57-65.

10. Alcazar JL, Jurado M. Prospective evaluation of logistic model based on sonographic morphologic and color Doppler findings developed to predict adnexal malignancy. J Ultrasound Med 1999;18:837-42.

11. Timmerman D, Valentin L, Bourne TH, Collins WP, Verrelst H, Vergote I. Terms, definitions and measurements to describe the sonographic features of adnexal tumors: a consensus opinion from the International Ovarian Tumor Analysis (IOTA) Group. Ultrasound Obstet Gynecol 2000;16:500-5.

12. Timmerman D, Testa AC, Bourne T, Ferrazzi E, Ameye L, Konstantinovic ML, et al. Logistic regression model to distinguish between the benign and malignant adnexal mass before surgery: a multicenter study by the International Ovarian Tumor Analysis Group. J Clin Oncol 2005;23:8794-801.

13. Kaijser J, Bourne T, Valentin L, Sayasneh A, Van Holsbeke C, Vergote I, et al. Improving strategies for diagnosing ovarian cancer: a summary of the International Ovarian Tumor Analysis (IOTA) studies. Ultrasound Obstet Gynecol 2013;41:9-20.

14. Timmerman D, Van Calster B, Testa AC, Guerriero S, Fischerova D, Lissoni AA, et al. Ovarian cancer prediction in adnexal masses using ultrasound-based logistic regression models: a temporal and external validation study by the IOTA group. Ultrasound Obstet Gynecol 2010;36:226-34.

15. Sladkevicius P, Valentin L. Intra- and inter-observer agreement when describing adnexal masses using the International Ovarian Tumour Analysis (IOTA) terms and definitions: a study on three-dimensional (3D) ultrasound volumes. Ultrasound Obstet Gynecol 2013;41:318-27.

16. Heintz APM, Odicino F, Maisonneuve P, Beller U, Benedet JL, Creasman WT, et al. Carcinoma of the Ovary. 25th Annual Report on the Results of Treatment in Gynecological Cancer. Int J Gynecol Obstet 2003;83:S135-S166 (suppl 1).

17. Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas 1960;20:37-46.

18. Kundel HL, Polansky M. Measurement of observer agreement. Radiology 2003;228:303-8.

19. Brennan P, Silman A. Statistical methods for assessing observer variability in clinical measures. BMJ 1992;304:1491-4.

20. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986;1:307-10.

21. Bartlett JW, Frost C. Reliability, repeatability and reproducibility: analysis of measurement errors in continuous variables. Ultrasound Obstet Gynecol 2008;31:466-75.

22. Kottner J, Audigé L, Brorson S, Donner A, Gajewski BJ, Hróbjartsson A, et al. Guidelines for Reporting Reliability and Agreement Studies (GRRAS) were proposed. J Clin Epidemiol 2011;64:96-106.

23. Ruiz de Gauna B, Sanchez P, Pineda L, Utrilla-Layna J, Juez L, Alcázar JL. Inter-observer agreement with regard to describing adnexal masses using the IOTA simple rules in a real-time setting and when using three-dimensional ultrasound volumes and digital clips. Ultrasound Obstet Gynecol 2014;44:95-100.

24. Timmerman D, Testa AC, Bourne T, Ameye L, Jurkovic D, Van Holsbeke C, et al. Simple ultrasound-based rules for the diagnosis of ovarian cancer. Ultrasound Obstet Gynecol 2008;31:681-90.

25. Timmerman D, Ameye L, Fischerova D, Epstein E, Melis GB, Guerriero S, et al. Simple ultrasound rules to distinguish between benign and malignant adnexal masses before surgery: prospective validation by IOTA group. BMJ 2010;341:c6839.

26. Van Holsbeke C, Van Calster B, Testa AC, Domali E, Lu C, Van Huffel S, et al. Prospective internal validation of mathematical models to predict malignancy in adnexal masses: results from the international ovarian tumor analysis study. Clin Cancer Res 2009;15:684-91.

27. Nunes N, Yazbek J, Ambler G, Hoo W, Naftalin J, Jurkovic D. Prospective evaluation of the IOTA logistic regression model LR2 for the diagnosis of ovarian cancer. Ultrasound Obstet Gynecol 2012;40:355-9.

28. Nunes N, Ambler G, Hoo WL, Naftalin J, Foo X, Widschwendter M, et al. A prospective validation of the IOTA logistic regression models (LR1 and LR2) in comparison to subjective pattern recognition for the diagnosis of ovarian cancer. Int J Gynecol Cancer 2013;23:1583-9.

Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from

Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906

Table 1 Variables for which inter-observer reproducibility was estimated _______________________________________________________________________ Variables included in logistic regression models LR1 and LR2 or of importance for these models Maximum diameter of adnexal mass mm Maximum diameter of largest solid component mm Maximum diameter of largest solid component le50 gt50 mm Presence of an entirely solid adnexal mass yes no Papillary projections present yes no Irregular internal cyst walls yes no Acoustic shadows yes no Color Doppler signals in papillary projection yes no Color score 1 2 3 4 Tenderness of adnexal mass at scan yes no Ascites yes no Other IOTA variables Continuous IOTA variables Mean diameter of adnexal mass mm Mean diameter of largest solid component mm Height of the largest papillary projection mm Categorical IOTA variables Type of tumor unilocular unilocular solid multilocular multilocular solid solid Mean diameter of adnexal mass le40 41-60 61-80 81-100 and gt100 mm Number of cyst locules 0 1 2 3 4 5 6-10 11-20 and gt20 le10 gt10dagger Septum present yes no Incomplete septum present yes no Solid component yes no Papillary projections 0 1 2 3 ge4 1 2 3 ge4Dagger lt4 gt4dagger Echogenicity of cyst fluid anechoic low level ground glass mixed no cyst fluid Color Doppler signals detectable yes no Ovarian crescent sign yes no Diagnosis based on LR1 and LR2 Calculated risk of malignancy using LR1 le10 gt10 Calculated risk of malignancy using LR2 le10 gt10 ___________________________________________________________________________ IOTA international ovarian tumor analysis group LR1 logistic regression model 1 LR2 logistic regression model 2 The risk of malignancy using the LR1 is derived as y =1(1+e-z) where z = -67468 + 15985 (a) ndash 09983 (b) + 00326 (c) + 000841 (d) ndash 08577 (e) + 15513 (f) + 11737 (g) + 09281 (h) + 00496 (i) + 11421 (j) ndash 23550 (k) + 04916 (l) and e is the mathematical constant and base value of natural logarithms The 12 
variables included in LR1 are (a) personal history of ovarian cancer (yes = 1 no = 0) (b) current hormonal therapy (yes = 1 no = 0) (c) age of the patient (in years) (d) maximum diameter of the lesion (in millimeters) (e) mass tender at scan (yes = 1 no = 0) (f) the presence of ascites (yes = 1 no = 0) (g) the presence of blood flow within a solid papillary projection (yes = 1 no = 0) (h) the presence of a purely solid tumor (yes = 1 no = 0) (i) maximal diameter of the solid component (expressed in millimeters but with no increase gt 50 mm) (j) irregular internal cyst walls (yes = 1 no = 0) (k) the presence of acoustic shadows (yes = 1 no = 0) and (l) color score (1 2 3 or 4) The risk of malignancy using LR2 is derived as y = 1(1 + e-z) where z = - 53718 + 00354 (c) + 16159 (f) + 11768 (g) + 00697 (i) + 09586 (j) - 29486 (k)12 A calculated risk of malignancy of more than 10 classified the adnexal mass as malignant dagger includes 0 Dagger includes only cases where papillary projections were registered at both examinations

Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from

Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906

Table 2 Histological diagnoses of the masses

___________________________________________________________________________ Histological diagnosis of tumor Number ___________________________________________________________________________ Benign tumors 94

Benign simple cyst 7

Endometrioma 10

Dermoid cyst 16

Serous cystadenoma 16

Mucinous cystadenoma 18

Myomafibroma 9

Cystadenofibroma 11

Paraovarian cyst 5

Sactosalpinx chronic salpingitis 1

Leydig cell tumor 1

Borderline tumors 4

Serous 2

Mucinous 1

Endometrioid 1

Invasive malignancy 19

Primary ovarian adenocarcinoma 13

Granulosa cell tumor 3

Dysgerminoma 1

Leiomyosarcoma 1

Malignant aggressive B-cell lymphoma 1

___________________________________________________________________________

Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from

Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906

Table 3 Inter-observer reproducibility of continuous variables used in the risk calculation models LR1 and LR2 and of other continuous variables

used to describe adnexal masses

Measurement results

(both sonologists)

Difference in mm between two measurements

made by sonologists 1 and 2a

Intra-CC

Parameter Median

Range

Mean 95 CI Limits of

agreement

Point estimate

(95 CI)

Variables used in

models LR1 and LR2

Maximum diameter of

adnexal mass mm

70 (n=234)

10 ndash 313

380 (n=117)

112 ndash 648

-2524 ndash 3282

0958 (0937 ndash 0971)

Maximum diameter of

largest solid component

mmb

2950 (n=122)

5 ndash 180

192 (n=61)

-174 ndash 558

-2666 ndash 3050

0942 (0905 ndash 0-964)

Other variables used to

describe adnexal mass

Mean diameter

of adnexal mass mm

585 (n=234)

9 ndash 240

105 (n=117)

-015 ndash 195

-861 ndash 1072

0971 (0958 ndash 0980)

Mean diameter

of largest solid

component mmb

22 (n=122)

4 ndash 156

059 (n=61)

-182 ndash 298

-1816 ndash 1932

0962 (0937 ndash 0977)

Height of largest papillary

projection mmc

8 (n=42)

3 ndash 25

-051 (n=21)

-293 ndash 191

-1161 ndash 1059

0609 (0245 ndash 0821)

a the measurements made by sonologist 2 were subtracted from the measurements made by sonologist 1

CI confidence interval Intra-CC intra-class correlation coefficient bincludes only cases where solid components were registered by both observers

c includes only cases where papillary projections were registered by both observers using logarithmically transformed data the results were as

follows mean difference 103 (95CI 078 ndash 136) limits of agreement 031 ndash 361 Intra-CC 0547 (0153 ndash 0789) these results can be used for

comparison with our published results on inter-observer agreement when analyzing ultrasound volumes obtained by three-dimensional

ultrasound15

Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from

Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906

Table 4. Inter-observer agreement for categorical variables included in the risk calculation models LR1 and LR2 and for other categorical variables used to describe adnexal masses

Variable; Agreement, % (n/N); Kappa value

Variables used in models LR1 and LR2
  Presence of entirely solid tumor (yes, no); 97 (114/117); 0.91
  Maximum diameter of largest solid component (≤50 mm, >50 mm); 95 (111/117); 0.83
  Maximum diameter of largest solid component (≤50 mm, >50 mm) [a]; 92 (56/61); 0.82
  Tenderness of adnexal mass (yes, no); 97 (113/117); 0.78
  Ascites (yes, no); 97 (113/117); 0.73
  Papillary projections (yes, no); 87 (102/117); 0.65
  Solid component (yes, no); 84 (98/117); 0.67
  Irregular cyst wall (yes, no); 79 (92/117); 0.56
  Acoustic shadows (yes, no); 85 (99/117); 0.58
  Color Doppler signals in papillary projections (yes, no) [b]; 90 (105/117); 0.48
  Color Doppler signals in papillary projections (yes, no) [c]; 86 (18/21); 0.71
  Color score (1, 2, 3, 4); 40 (47/117); 0.36 [d]

Other variables used to describe adnexal masses
  Tumor type (unilocular, unilocular solid, multilocular, multilocular solid, solid); 97 (113/117); 0.70
  Mean diameter of tumor, mm (≤40, 41-60, 61-80, 81-100, >100); 66 (77/117); 0.76 [d]
  Number of locules (0, 1, 2, 3, 4, 5, 6-10, 11-20, >20); 58 (68/117); 0.79 [d]
  Number of locules (≤10, >10) [e]; 94 (110/117); 0.80
  Septum (yes, no); 88 (103/117); 0.76
  Incomplete septum (yes, no); 98 (115/117); ¶
  Number of papillary projections (0, 1, 2, 3, ≥4); 81 (95/117); 0.64 [d]
  Number of papillary projections (1, 2, 3, ≥4) [c]; 67 (14/21); 0.63 [d]
  Number of papillary projections (<4, >4) [e]; 96 (112/117); 0.68
  Echogenicity of cyst fluid (anechoic, low level, ground glass, mixed, no fluid); 67 (78/117); 0.56
  Color Doppler signals detectable (yes, no); 84 (98/117); 0.30
  Ovarian crescent sign (yes, no); 78 (91/117); 0.49
  Fluid in Douglas pouch (yes, no); 87 (102/117); 0.72

[a] Includes only cases where solid components were registered at both examinations.
[b] This variable reflects disagreement both with regard to the presence of papillary projections and the presence of blood flow in papillary projections (i.e., if one observer recorded the presence of papillary projections and the other did not, there was disagreement with regard to this variable; and if both observers recorded the presence of papillary projections but disagreed with regard to the presence of blood flow in the papillary projections, there was disagreement with regard to this variable).
[c] Includes only cases where papillary projections were registered at both examinations.
[d] Weighted Kappa index.
[e] Includes 0.
¶ Absent kappa values are explained by asymmetric field tables, which made it impossible to calculate them.
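Table 4 reports both raw percentage agreement and kappa, and the two can diverge when one category is rare (for example, 90% agreement with kappa 0.48 for color Doppler signals in papillary projections). The following is a minimal sketch of unweighted Cohen's kappa (17), not the authors' own computation. The function name `cohen_kappa` and the 2 x 2 counts are hypothetical and chosen only for illustration; the weighted kappa used for ordinal variables in the table is not implemented here.

```python
def cohen_kappa(table):
    """Return (observed agreement, unweighted Cohen's kappa).

    table[i][j] is the number of masses placed in category i by
    observer 1 and in category j by observer 2 (square table).
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(k)) / n
    row_totals = [sum(table[i]) for i in range(k)]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance from the marginal totals.
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return observed, (observed - expected) / (1 - expected)


# Hypothetical counts (NOT study data): a finding that both observers
# record as present in only a few of 117 masses.
hypothetical = [[6, 6],
                [6, 99]]
agreement, kappa = cohen_kappa(hypothetical)
print(f"agreement = {agreement:.2f}, kappa = {kappa:.2f}")
```

With a rare finding, most of the raw agreement comes from cases both observers call negative, which is also what chance alone would produce, so kappa ends up well below the raw agreement.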


Table 5. Inter-observer reproducibility of the risk of malignancy calculated using the risk calculation models LR1 and LR2

For each model: calculated risk of malignancy (both sonologists) as median and range; difference between the risk calculated by sonologists 1 and 2 as mean, 95% CI, and limits of agreement; Intra-CC as point estimate (95% CI). Risks are given in %.

Risk of malignancy calculated using LR1
  Calculated risk: median 7.85 (n=234); range 0.10 to 99.10
  Difference: mean -0.53 (n=117); 95% CI -3.07 to 2.01; limits of agreement -28.05 to 26.99
  Intra-CC: 0.911 (0.874 to 0.937)

Risk of malignancy calculated using LR2
  Calculated risk: median 6.65 (n=234); range 0.10 to 98.40
  Difference: mean 0.02 (n=117); 95% CI -3.06 to 3.10; limits of agreement -33.22 to 33.26
  Intra-CC: 0.832 (0.766 to 0.880)

CI, confidence interval; Intra-CC, intra-class correlation coefficient.
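The mean difference, its confidence interval, and the limits of agreement in Table 5 follow the Bland and Altman approach (20). The sketch below shows one common way of computing them from paired values. It assumes a multiplier of 1.96 for both the limits of agreement and the confidence interval, which may not be exactly what the authors used; the function name `limits_of_agreement` and the paired risks are hypothetical and are not study data.

```python
import math
import statistics


def limits_of_agreement(observer1, observer2, multiplier=1.96):
    """Summarise paired differences (observer 1 minus observer 2)."""
    diffs = [a - b for a, b in zip(observer1, observer2)]
    mean_diff = statistics.mean(diffs)
    sd = statistics.stdev(diffs)                 # sample standard deviation
    half_ci = multiplier * sd / math.sqrt(len(diffs))
    return {
        "mean": mean_diff,
        "ci": (mean_diff - half_ci, mean_diff + half_ci),
        "loa": (mean_diff - multiplier * sd, mean_diff + multiplier * sd),
    }


# Hypothetical calculated risks (%) for six masses, one list per observer.
risks_observer1 = [2.1, 8.4, 35.0, 60.2, 97.5, 12.0]
risks_observer2 = [1.8, 15.9, 22.5, 71.0, 98.1, 9.5]
summary = limits_of_agreement(risks_observer1, risks_observer2)
print(summary)
```

The limits of agreement describe the range within which most differences between the two observers are expected to fall, whereas the confidence interval describes the uncertainty of the mean difference only.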


Legends for figure

Figure 1. a) Scatterplot showing the relationship between inter-observer difference (observer 1 minus observer 2) in calculated risk and magnitude of calculated risk when using logistic regression model LR1. The plot manifests a diamond shape, the differences being smallest for the lowest and highest risks. For risks <25 and >95, the differences are very small. LOA, limits of agreement. b) Scatterplot showing the relationship between inter-observer difference in calculated risk and magnitude of calculated risk when using logistic regression model LR2. The plot manifests a diamond shape, the differences being smallest for the lowest and highest risks.


Povilas Sladkevicius and Lil Valentin. Inter-observer agreement in describing the ultrasound appearance of adnexal masses and in calculating the risk of malignancy using logistic regression models. Clin Cancer Res. Published OnlineFirst November 25, 2014.


those during a live scan. When evaluating ultrasound volumes, sonologists are exposed to the same ultrasound images, so any interobserver difference should be explained exclusively by differences in interpreting the ultrasound information. During a live scan there are more sources of bias. This could result in either poorer or better interobserver agreement than when 3D ultrasound volumes are assessed: poorer because ultrasound examiners are likely to use different machine settings and scanning conditions may change from one minute to another; better because the dynamic nature of live scanning facilitates discrimination between solid components and amorphous tissue.

Our results showed that two experienced sonologists agreed quite well in their classification of masses as benign or malignant using the 10% risk of malignancy cutoff of LR1 and LR2, and that the diagnostic performance of LR1 and LR2 with regard to discrimination between benign and malignant tumors was similar for the two sonologists and similar to that reported by others (14, 26-28). This is reassuring, because the main purpose of using models LR1 and LR2 is to classify tumors as benign or malignant. Potentially, however, LR1 and LR2 can be used not only to classify adnexal masses as benign or malignant but also to counsel a patient about her individual risk of malignancy (13). If the calculated risk is to be used for individual counseling, one must be reasonably certain not only that the estimated risk agrees well with the true risk (when externally validated, both LR1 and LR2 underestimated the true risk, especially in the risk interval 30-70% (14)), but also that the risk estimates are reproducible, i.e., that different examiners will obtain similar risk estimates. Our results show that risk estimates may differ substantially between experienced observers, the difference in estimated risk being >25.0 percentage units in 9% and 12% of cases when using LR1 and LR2, respectively. Interobserver agreement above chance was poorest for those variables in the models that are heavily dependent on subjective evaluation, i.e., color score, presence of color Doppler signals in papillary projections, irregular cyst walls, and acoustic shadowing. Indeed, differences in these explained most of the largest inter-observer differences in calculated risk of malignancy. In models based on few variables, changing the value of only one variable may result in large differences in predicted risks, while a model with many variables is less vulnerable to a change in one or even a few variables. Our results illustrate this (Supplementary Tables S2 and S3). When using LR2 (which includes six variables), a change in value for one single categorical variable explained an inter-observer difference in calculated risk >25 percentage units in eight of 14 cases, while when using LR1 (which includes 12 variables), a change in value for one single categorical variable explained an inter-observer difference in calculated risk >25 percentage units in only four of 11 cases. Acoustic shadowing is a strong variable in both LR1 and LR2 and has great impact on the calculated risk in LR2, which has only six variables. In our hands, as well as in those of Ruiz de Gauna et al. (23), inter-observer agreement for acoustic shadowing was at most moderate. The interobserver agreement for color score was only fair in our study, and color score is an important variable in LR1.
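As a rough illustration of why one discordant variable can matter so much in LR2, consider two observers who agree on every LR2 variable except acoustic shadows. The worked arithmetic below uses the LR2 coefficient for acoustic shadows given in Table 1 (-2.9486). The starting value z = 0 for the observer who records no shadows is an assumption chosen for illustration, not a case from the study, and the numerical results are rounded.

```latex
\[
z_{\text{no shadows}} - z_{\text{shadows}} = 2.9486,
\qquad
\frac{\text{odds}_{\text{no shadows}}}{\text{odds}_{\text{shadows}}} = e^{2.9486} \approx 19.1
\]
\[
y_{\text{no shadows}} = \frac{1}{1 + e^{0}} = 0.50,
\qquad
y_{\text{shadows}} = \frac{1}{1 + e^{2.9486}} \approx 0.05
\]
```

In this hypothetical case the two calculated risks differ by about 45 percentage units. Because the slope of the logistic function, dy/dz = y(1 - y), is largest at y = 0.5 and approaches zero as y approaches 0 or 1, the same disagreement changes the calculated risk far less when the risk is very low or very high.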

To improve inter-observer reproducibility of calculated risks based on LR1 and LR2, inter-observer differences in descriptions and measurements of adnexal masses using the IOTA terminology and measurement technique need to be reduced. One way to achieve this could be by providing courses on, and training in, how to examine and describe adnexal masses using the IOTA terms. Interactive courses in which a large number of ultrasound images are discussed with the course participants are likely to be very valuable in this respect. More precise definitions of the IOTA terms, for example by providing ample imaging material, would probably also help improve inter-observer agreement. Special attention should be given to the variables with the poorest reproducibility, i.e., the color score, wall irregularity, acoustic shadowing, and detection of blood flow in papillary projections. Until better inter-observer agreement in the calculated risk of malignancy using LR1 and LR2 has been shown, one should be cautious about using the risk estimates for individual patient counselling.


Acknowledgements

None


References

1. Granberg S, Norström A, Wikland M. Tumors in the lower pelvis as imaged by vaginal sonography. Gynecol Oncol 1990;37:224-9.

2. Benacerraf BR, Finkler NJ, Wojciechowski C, Knapp RC. Sonographic accuracy in the diagnosis of ovarian masses. J Reprod Med 1990;35:491-5.

3. Valentin L. Pattern recognition of pelvic masses by gray-scale ultrasound imaging: the contribution of Doppler ultrasound. Ultrasound Obstet Gynecol 1999;14:338-47.

4. Valentin L. Prospective cross-validation of Doppler ultrasound examination and gray-scale ultrasound imaging for discrimination of benign and malignant pelvic masses. Ultrasound Obstet Gynecol 1999;14:273-83.

5. Timmerman D, Schwärzler P, Collins WP, Claerhout F, Coenen M, Amant F, et al. Subjective assessment of adnexal masses with the use of ultrasonography: an analysis of interobserver variability and experience. Ultrasound Obstet Gynecol 1999;13:11-6.

6. Sokalska A, Timmerman D, Testa AC, Van Holsbeke C, Lissoni AA, Leone FPG, et al. Diagnostic accuracy of transvaginal ultrasound examination for assigning a specific diagnosis to adnexal masses. Ultrasound Obstet Gynecol 2009;34:462-70.

7. Valentin L. Use of morphology to characterize and manage common adnexal masses. Best Pract Res Clin Obstet Gynaecol 2004;18:71-89.

8. Tailor A, Jurkovic D, Bourne TH, Collins WP, Campbell S. Sonographic prediction of malignancy in adnexal masses using multivariate logistic regression analysis. Ultrasound Obstet Gynecol 1997;10:41-7.

9. Timmerman D, Bourne TH, Tailor A, Collins WP, Verrelst H, Vandenberghe K, et al. A comparison of methods for the preoperative discrimination between benign and malignant adnexal masses: the development of a new logistic regression model. Am J Obstet Gynecol 1999;181:57-65.


10. Alcazar JL, Jurado M. Prospective evaluation of logistic model based on sonographic morphologic and color Doppler findings developed to predict adnexal malignancy. J Ultrasound Med 1999;18:837-42.

11. Timmerman D, Valentin L, Bourne TH, Collins WP, Verrelst H, Vergote I. Terms, definitions and measurements to describe the sonographic features of adnexal tumors: a consensus opinion from the International Ovarian Tumor Analysis (IOTA) Group. Ultrasound Obstet Gynecol 2000;16:500-5.

12. Timmerman D, Testa AC, Bourne T, Ferrazzi E, Ameye L, Konstantinovic ML, et al. Logistic regression model to distinguish between the benign and malignant adnexal mass before surgery: a multicenter study by the International Ovarian Tumor Analysis Group. J Clin Oncol 2005;34:8794-801.

13. Kaijser J, Bourne T, Valentin L, Sayasneh A, Van Holsbeke C, Vergote I, et al. Improving strategies for diagnosing ovarian cancer: a summary of the International Ovarian Tumor Analysis (IOTA) studies. Ultrasound Obstet Gynecol 2013;41:9-20.

14. Timmerman D, Van Calster B, Testa AC, Guerriero S, Fischerova D, Lissoni AA, et al. Ovarian cancer prediction in adnexal masses using ultrasound-based logistic regression models: a temporal and external validation study by the IOTA group. Ultrasound Obstet Gynecol 2010;36:226-34.

15. Sladkevicius P, Valentin L. Intra- and inter-observer agreement when describing adnexal masses using the International Ovarian Tumour Analysis (IOTA) terms and definitions: a study on three-dimensional (3D) ultrasound volumes. Ultrasound Obstet Gynecol 2013;41:318-27.

16. Heintz APM, Odicino F, Maisonneuve P, Beller U, Benedet JL, Creasman WT, et al. Carcinoma of the Ovary. 25th Annual Report on the Results of Treatment in Gynecological Cancer. Int J Gynecol Obstet 2003;83(suppl 1):S135-S166.


17. Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas 1960;20:37-46.

18. Kundel HL, Polansky M. Measurement of observer agreement. Radiology 2003;228:303-8.

19. Brennan P, Silman A. Statistical methods for assessing observer variability in clinical measures. BMJ 1992;304:1491-4.

20. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986;1:307-10.

21. Bartlett JW, Frost C. Reliability, repeatability and reproducibility: analysis of measurement errors in continuous variables. Ultrasound Obstet Gynecol 2008;31:466-75.

22. Kottner J, Audigé L, Brorson S, Donner A, Gajewski BJ, Hróbjartsson A, et al. Guidelines for Reporting Reliability and Agreement Studies (GRRAS) were proposed. J Clin Epidemiol 2011;64:96-106.

23. Ruiz de Gauna B, Sanchez P, Pineda L, Utrilla-Layna J, Juez L, Alcázar JL. Inter-observer agreement with regard to describing adnexal masses using the IOTA simple rules in a real-time setting and when using three-dimensional ultrasound volumes and digital clips. Ultrasound Obstet Gynecol 2014;44:95-100.

24. Timmerman D, Testa AC, Bourne T, Ameye L, Jurkovic D, Van Holsbeke C, et al. Simple ultrasound-based rules for the diagnosis of ovarian cancer. Ultrasound Obstet Gynecol 2008;31:681-90.

25. Timmerman D, Ameye L, Fischerova D, Epstein E, Melis GB, Guerriero S, et al. Simple ultrasound rules to distinguish between benign and malignant adnexal masses before surgery: prospective validation by IOTA group. BMJ 2010;341:c6839.


26. Van Holsbeke C, Van Calster B, Testa AC, Domali E, Lu C, Van Huffel S, et al. Prospective internal validation of mathematical models to predict malignancy in adnexal masses: results from the international ovarian tumor analysis study. Clin Cancer Res 2009;15:684-91.

27. Nunes N, Yazbek J, Ambler G, Hoo W, Naftalin J, Jurkovic D. Prospective evaluation of the IOTA logistic regression model LR2 for the diagnosis of ovarian cancer. Ultrasound Obstet Gynecol 2012;40:355-9.

28. Nunes N, Ambler G, Hoo WL, Naftalin J, Foo X, Widschwendter M, et al. A prospective validation of the IOTA logistic regression models (LR1 and LR2) in comparison to subjective pattern recognition for the diagnosis of ovarian cancer. Int J Gynecol Cancer 2013;23:1583-9.


Table 1. Variables for which inter-observer reproducibility was estimated

Variables included in logistic regression models LR1 and LR2 or of importance for these models
  Maximum diameter of adnexal mass: mm
  Maximum diameter of largest solid component: mm
  Maximum diameter of largest solid component: ≤50, >50 mm
  Presence of an entirely solid adnexal mass: yes, no
  Papillary projections present: yes, no
  Irregular internal cyst walls: yes, no
  Acoustic shadows: yes, no
  Color Doppler signals in papillary projection: yes, no
  Color score: 1, 2, 3, 4
  Tenderness of adnexal mass at scan: yes, no
  Ascites: yes, no

Other IOTA variables

Continuous IOTA variables
  Mean diameter of adnexal mass: mm
  Mean diameter of largest solid component: mm
  Height of the largest papillary projection: mm

Categorical IOTA variables
  Type of tumor: unilocular, unilocular solid, multilocular, multilocular solid, solid
  Mean diameter of adnexal mass: ≤40, 41-60, 61-80, 81-100, and >100 mm
  Number of cyst locules: 0, 1, 2, 3, 4, 5, 6-10, 11-20, and >20; ≤10, >10 †
  Septum present: yes, no
  Incomplete septum present: yes, no
  Solid component: yes, no
  Papillary projections: 0, 1, 2, 3, ≥4; 1, 2, 3, ≥4 ‡; <4, >4 †
  Echogenicity of cyst fluid: anechoic, low level, ground glass, mixed, no cyst fluid
  Color Doppler signals detectable: yes, no
  Ovarian crescent sign: yes, no

Diagnosis based on LR1 and LR2
  Calculated risk of malignancy using LR1: ≤10%, >10%
  Calculated risk of malignancy using LR2: ≤10%, >10%

IOTA, International Ovarian Tumor Analysis group; LR1, logistic regression model 1; LR2, logistic regression model 2.

The risk of malignancy using LR1 is derived as $y = 1/(1 + e^{-z})$, where $z = -6.7468 + 1.5985\,(a) - 0.9983\,(b) + 0.0326\,(c) + 0.00841\,(d) - 0.8577\,(e) + 1.5513\,(f) + 1.1737\,(g) + 0.9281\,(h) + 0.0496\,(i) + 1.1421\,(j) - 2.3550\,(k) + 0.4916\,(l)$ and $e$ is the mathematical constant and base value of natural logarithms. The 12 variables included in LR1 are: (a) personal history of ovarian cancer (yes = 1, no = 0); (b) current hormonal therapy (yes = 1, no = 0); (c) age of the patient (in years); (d) maximum diameter of the lesion (in millimeters); (e) mass tender at scan (yes = 1, no = 0); (f) the presence of ascites (yes = 1, no = 0); (g) the presence of blood flow within a solid papillary projection (yes = 1, no = 0); (h) the presence of a purely solid tumor (yes = 1, no = 0); (i) maximal diameter of the solid component (expressed in millimeters, but with no increase >50 mm); (j) irregular internal cyst walls (yes = 1, no = 0); (k) the presence of acoustic shadows (yes = 1, no = 0); and (l) color score (1, 2, 3, or 4).

The risk of malignancy using LR2 is derived as $y = 1/(1 + e^{-z})$, where $z = -5.3718 + 0.0354\,(c) + 1.6159\,(f) + 1.1768\,(g) + 0.0697\,(i) + 0.9586\,(j) - 2.9486\,(k)$ (12).

A calculated risk of malignancy of more than 10% classified the adnexal mass as malignant.

† Includes 0.
‡ Includes only cases where papillary projections were registered at both examinations.
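The two formulas in the Table 1 footnote can be written out as a short sketch. The coefficients below are transcribed from that footnote; the dictionary keys, the function names `lr1_risk`, `lr2_risk`, and `classify`, and the example case are hypothetical names and values introduced only for illustration. This is not a validated clinical tool.

```python
import math

# Coefficients from the Table 1 footnote; the key names are illustrative.
LR1_INTERCEPT = -6.7468
LR1_COEFFICIENTS = {
    "history_ovarian_cancer": 1.5985,   # (a) yes = 1, no = 0
    "hormonal_therapy": -0.9983,        # (b) yes = 1, no = 0
    "age_years": 0.0326,                # (c)
    "max_lesion_diameter_mm": 0.00841,  # (d)
    "tenderness": -0.8577,              # (e) yes = 1, no = 0
    "ascites": 1.5513,                  # (f) yes = 1, no = 0
    "papillary_flow": 1.1737,           # (g) yes = 1, no = 0
    "purely_solid": 0.9281,             # (h) yes = 1, no = 0
    "max_solid_diameter_mm": 0.0496,    # (i) no increase above 50 mm
    "irregular_walls": 1.1421,          # (j) yes = 1, no = 0
    "acoustic_shadows": -2.3550,        # (k) yes = 1, no = 0
    "color_score": 0.4916,              # (l) 1, 2, 3, or 4
}
LR2_INTERCEPT = -5.3718
LR2_COEFFICIENTS = {
    "age_years": 0.0354,                # (c)
    "ascites": 1.6159,                  # (f)
    "papillary_flow": 1.1768,           # (g)
    "max_solid_diameter_mm": 0.0697,    # (i) no increase above 50 mm
    "irregular_walls": 0.9586,          # (j)
    "acoustic_shadows": -2.9486,        # (k)
}


def _risk(intercept, coefficients, findings):
    values = dict(findings)
    # Variable (i): the solid component diameter is capped at 50 mm.
    values["max_solid_diameter_mm"] = min(values["max_solid_diameter_mm"], 50)
    z = intercept + sum(w * values[name] for name, w in coefficients.items())
    return 1.0 / (1.0 + math.exp(-z))


def lr1_risk(findings):
    return _risk(LR1_INTERCEPT, LR1_COEFFICIENTS, findings)


def lr2_risk(findings):
    return _risk(LR2_INTERCEPT, LR2_COEFFICIENTS, findings)


def classify(risk, cutoff=0.10):
    # A calculated risk of more than 10% classifies the mass as malignant.
    return "malignant" if risk > cutoff else "benign"


# Hypothetical findings for one mass (NOT a study case).
example = {
    "history_ovarian_cancer": 0, "hormonal_therapy": 0, "age_years": 55,
    "max_lesion_diameter_mm": 80, "tenderness": 0, "ascites": 0,
    "papillary_flow": 0, "purely_solid": 0, "max_solid_diameter_mm": 30,
    "irregular_walls": 1, "acoustic_shadows": 0, "color_score": 2,
}
for name, model in (("LR1", lr1_risk), ("LR2", lr2_risk)):
    risk = model(example)
    print(f"{name}: {100 * risk:.1f}% -> {classify(risk)}")
```

Passing the same findings with a single variable changed (for instance `acoustic_shadows` set to 1) shows directly how much one discordant observation can move the calculated risk in each model.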


Table 2. Histological diagnoses of the masses

Histological diagnosis of tumor; Number

Benign tumors; 94
  Benign simple cyst; 7
  Endometrioma; 10
  Dermoid cyst; 16
  Serous cystadenoma; 16
  Mucinous cystadenoma; 18
  Myoma/fibroma; 9
  Cystadenofibroma; 11
  Paraovarian cyst; 5
  Sactosalpinx, chronic salpingitis; 1
  Leydig cell tumor; 1

Borderline tumors; 4
  Serous; 2
  Mucinous; 1
  Endometrioid; 1

Invasive malignancy; 19
  Primary ovarian adenocarcinoma; 13
  Granulosa cell tumor; 3
  Dysgerminoma; 1
  Leiomyosarcoma; 1
  Malignant aggressive B-cell lymphoma; 1


Table 3. Inter-observer reproducibility of continuous variables used in the risk calculation models LR1 and LR2 and of other continuous variables used to describe adnexal masses

For each variable: measurement results (both sonologists) as median and range; difference in mm between the two measurements made by sonologists 1 and 2 [a] as mean, 95% CI, and limits of agreement; Intra-CC as point estimate (95% CI).

Variables used in models LR1 and LR2

Maximum diameter of adnexal mass, mm
  Measurement results: median 70 (n=234); range 10 to 313
  Difference: mean 3.80 (n=117); 95% CI 1.12 to 6.48; limits of agreement -25.24 to 32.82
  Intra-CC: 0.958 (0.937 to 0.971)

Maximum diameter of largest solid component, mm [b]
  Measurement results: median 29.50 (n=122); range 5 to 180
  Difference: mean 1.92 (n=61); 95% CI -1.74 to 5.58; limits of agreement -26.66 to 30.50
  Intra-CC: 0.942 (0.905 to 0.964)

Other variables used to describe adnexal mass

Mean diameter of adnexal mass, mm
  Measurement results: median 58.5 (n=234); range 9 to 240
  Difference: mean 1.05 (n=117); 95% CI -0.15 to 1.95; limits of agreement -8.61 to 10.72
  Intra-CC: 0.971 (0.958 to 0.980)

Mean diameter of largest solid component, mm [b]
  Measurement results: median 22 (n=122); range 4 to 156
  Difference: mean 0.59 (n=61); 95% CI -1.82 to 2.98; limits of agreement -18.16 to 19.32
  Intra-CC: 0.962 (0.937 to 0.977)

Height of largest papillary projection, mm [c]
  Measurement results: median 8 (n=42); range 3 to 25
  Difference: mean -0.51 (n=21); 95% CI -2.93 to 1.91; limits of agreement -11.61 to 10.59
  Intra-CC: 0.609 (0.245 to 0.821)

[a] The measurements made by sonologist 2 were subtracted from the measurements made by sonologist 1.

CI, confidence interval; Intra-CC, intra-class correlation coefficient.

[b] Includes only cases where solid components were registered by both observers.

[c] Includes only cases where papillary projections were registered by both observers. Using logarithmically transformed data, the results were as follows: mean difference 1.03 (95% CI, 0.78 to 1.36); limits of agreement 0.31 to 3.61; Intra-CC 0.547 (0.153 to 0.789). These results can be used for comparison with our published results on inter-observer agreement when analyzing ultrasound volumes obtained by three-dimensional ultrasound (15).

Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from

Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906

Table 4 Inter-observer agreement for categorical variables included in the risk calculation models LR1 and LR2 and for other categorical variables used to describe adnexal masses

Agreement Kappa value

Variables used in models LR1 and LR2 Presence of entirely solid tumor (yes no) 97 (114117) 091 Maximum diameter of largest solid component (le50 mm gt50 mm) 95 (111117) 083 Maximum diameter of largest solid component (le50 mm gt50 mm)a 92 (5661) 082 Tenderness of adnexal mass (yes no) 97 (113117) 078 Ascites (yes no) 97 (113117) 073 Papillary projections (yes no) 87 (102117) 065 Solid component (yes no) 84 (98117) 067 Irregular cyst wall (yes no) 79 (92117) 056 Acoustic shadows (yes no) 85 (99117) 058 Color Doppler signals in papillary projections (yes no)b 90 (105117) 048 Color Doppler signals in papillary projections (yes no)c 86 (1821) 071 Color score (1 2 3 4) 40 (47117) 036d Other variables used to describe adnexal masses Tumor type (unilocular unilocular solid multilocular multilocular solid solid) 97 (113117) 070 Mean diameter of tumor (mm) le40 41-60 61-80 81-100 gt100 66 (77117) 076d Number of locules 0 1 2 3 4 5 6-10 11-20 gt20 58 (68117) 079d

le10 gt10e 94 (110117) 080 Septum (yes no) 88 (103117) 076 Incomplete septum (yes no) 98 (115117) para Number of papillary projections 0 1 2 3 ge4 81 (95117) 064d 1 2 3 ge4c 67 (1421) 063d lt4 gt4e 96 (112117) 068 Echogenicity of cyst fluid (anechoic low level ground glass mixed no fluid) 67 (78117) 056 Color Doppler signals detectable (yes no) 84 (98117) 030 Ovarian crescent sign (yes no) 78 (91117) 049 Fluid in Douglas pouch (yes no) 87 (102117) 072 ____________________________________________________________________________________________ a includes only cases where solid components were registered at both examinations b This variable reflects disagreement both with regard to the presence of papillary projections and the presence of blood flow in papillary projection (ie if one observer recorded the presence of papillary projections and the other not there was disagreement with regard to this variable and if both observes recorded the presence of papillary projections but disagreed with regard to the presence of blood flow in the papillary projections there was disagreement with regard to this variable) c includes only cases where papillary projections were registered at both examinations d weighted Kappa index e includes 0 para absent kappa values are explained by asymmetric field tables making it impossible to calculate them

Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from

Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906

Table 5 Inter-observer reproducibility of the risk of malignancy calculated using the risk calculation models LR1 and LR2

Calculated risk of malignancy

(both sonologists)

Difference between the risk calculated

by sonologist 1 and 2

Intra-CC

Parameter Median

Range

Mean 95 CI Limits of

agreement

Point estimate

(95 CI)

Risk of malignancy

calculated using LR1

785 (n=234)

010 ndash 9910

-053 (n=117)

-307 ndash 201

-2805 ndash 2699

0911 (0874 ndash 0937)

Risk of malignancy

calculated using LR2

665 (n=234)

010 ndash 9840

002 (n=117)

-306 ndash 310

-3322 ndash 3326

0832 (0766 ndash 0880)

CI confidence interval Intra-CC intra-class correlation coefficient

Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from

Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906

Legends for figure

Figure 1 a) Scatterplot showing the relationship between inter-observer difference (observer

1 minus observer 2) in calculated risk and magnitude of calculated risk when using logistic

regression model LR1 The plot manifests a diamond shape the differences being smallest for

the lowest and highest risks For risks lt 25 and gt 95 the differences are very small

LOA limits of agreement b) Scatterplot showing the relationship between inter-observer

difference in calculated risk and magnitude of calculated risk when using logistic regression

model LR2 The plot manifests a diamond shape the differences being smallest for the lowest

and highest risks

Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from

Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906

Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from

Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906

Published OnlineFirst November 25 2014Clin Cancer Res Povilas Sladkevicius and Lil Valentin malignancy using logistic regression modelsappearance of adnexal masses and in calculating the risk of Inter-observer agreement in describing the ultrasound

Updated version

1011581078-0432CCR-14-0906doi

Access the most recent version of this article at

Material

Supplementary

httpclincancerresaacrjournalsorgcontentsuppl201411261078-0432CCR-14-0906DC1

Access the most recent supplemental material at

Manuscript

Authoredited Author manuscripts have been peer reviewed and accepted for publication but have not yet been

E-mail alerts related to this article or journalSign up to receive free email-alerts

Subscriptions

Reprints and

pubsaacrorgDepartment at

To order reprints of this article or to subscribe to the journal contact the AACR Publications

Permissions

Rightslink site Click on Request Permissions which will take you to the Copyright Clearance Centers (CCC)

httpclincancerresaacrjournalsorgcontentearly201411251078-0432CCR-14-0906To request permission to re-use all or part of this article use this link

Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from

Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906

  • Article File
  • Article File
  • Article File
  • Table 1
  • Table 2
  • Table 3
  • Table 4
  • Table 5
  • Article File
  • Figure 1

agreement in ultrasound assessment of ovarian masses 14

14

explained most of the largest inter-observer differences in calculated risk of malignancy. In models based on few variables, changing values in only one variable may result in large differences in predicted risks, while a model with many variables is less vulnerable to a change in one or even a few variables. Our results illustrate this (Supplementary Tables S2 and S3). When using LR2 (which includes six variables), a change in value for one single categorical variable explained an inter-observer difference in calculated risk >25 percentage units in eight of 14 cases, while when using LR1 (which includes 12 variables), a change in value for one single categorical variable explained an inter-observer difference in calculated risk >25 percentage units in only four of 11 cases. Acoustic shadowing is a strong variable in both LR1 and LR2 and has great impact on the calculated risk in LR2, which has only six variables. In our hands, as well as in those of Ruiz de Gauna et al. (23), inter-observer agreement for acoustic shadowing was at most moderate. The inter-observer agreement for color score was only fair in our study, and color score is an important variable in LR1.
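The effect of a disagreement on one binary variable can be made concrete with a short worked calculation. The sketch below uses the acoustic-shadow coefficients from the footnote to Table 1; the baseline risk of 50% is an illustrative assumption, not a value from the study. Because both models are logistic, recording shadows as present instead of absent adds the coefficient to z, which multiplies the odds of malignancy by e raised to that coefficient.

```latex
% Coefficients are those printed in the footnote to Table 1.
% The baseline risk p = 0.50 is an assumption chosen for illustration.
\[
p = \frac{1}{1 + e^{-z}}, \qquad \frac{p}{1-p} = e^{z}
\]
\[
\text{acoustic shadows } 0 \to 1:\quad z' = z + \beta_k, \qquad
\frac{p'}{1-p'} = e^{\beta_k}\,\frac{p}{1-p}
\]
\[
\text{LR2: } \beta_k = -2.9486,\quad e^{\beta_k} \approx 0.052,\quad
p = 0.50 \;\Rightarrow\; p' \approx \frac{0.052}{1.052} \approx 0.050
\]
\[
\text{LR1: } \beta_k = -2.3550,\quad e^{\beta_k} \approx 0.095,\quad
p = 0.50 \;\Rightarrow\; p' \approx \frac{0.095}{1.095} \approx 0.087
\]
```

Under this assumed baseline, a single disagreement about acoustic shadowing moves the calculated risk by roughly 45 percentage units in LR2 and roughly 41 in LR1. The shift is largest for intermediate risks and shrinks toward zero as the baseline risk approaches 0 or 1.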

To improve inter-observer reproducibility of calculated risks based on LR1 and LR2, inter-observer differences in descriptions and measurements of adnexal masses using the IOTA terminology and measurement technique need to be reduced. One way to achieve this could be by providing courses on, and training in, how to examine and describe adnexal masses using the IOTA terms. Interactive courses in which a large number of ultrasound images are discussed with the course participants are likely to be very valuable in this respect. More precise definitions of the IOTA terms, for example by providing ample imaging material, would probably also help improve inter-observer agreement. Special attention should be given to the variables with the poorest reproducibility, i.e., the color score, wall irregularity, acoustic shadowing and detection of blood flow in papillary projections. Until better inter-observer agreement in the calculated risk of malignancy using LR1 and LR2 has been shown, one should be cautious about using the risk estimates for individual patient counselling.


Acknowledgements

None


References

1. Granberg S, Norström A, Wikland M. Tumors in the lower pelvis as imaged by vaginal sonography. Gynecol Oncol 1990;37:224-9.
2. Benacerraf BR, Finkler NJ, Wojciechowski C, Knapp RC. Sonographic accuracy in the diagnosis of ovarian masses. J Reprod Med 1990;35:491-5.
3. Valentin L. Pattern recognition of pelvic masses by gray-scale ultrasound imaging: the contribution of Doppler ultrasound. Ultrasound Obstet Gynecol 1999;14:338-47.
4. Valentin L. Prospective cross-validation of Doppler ultrasound examination and gray-scale ultrasound imaging for discrimination of benign and malignant pelvic masses. Ultrasound Obstet Gynecol 1999;14:273-83.
5. Timmerman D, Schwärzler P, Collins WP, Claerhout F, Coenen M, Amant F, et al. Subjective assessment of adnexal masses with the use of ultrasonography: an analysis of interobserver variability and experience. Ultrasound Obstet Gynecol 1999;13:11-6.
6. Sokalska A, Timmerman D, Testa AC, Van Holsbeke C, Lissoni AA, Leone FPG, et al. Diagnostic accuracy of transvaginal ultrasound examination for assigning a specific diagnosis to adnexal masses. Ultrasound Obstet Gynecol 2009;34:462-70.
7. Valentin L. Use of morphology to characterize and manage common adnexal masses. Best Pract Res Clin Obstet Gynaecol 2004;18:71-89.
8. Tailor A, Jurkovic D, Bourne TH, Collins WP, Campbell S. Sonographic prediction of malignancy in adnexal masses using multivariate logistic regression analysis. Ultrasound Obstet Gynecol 1997;10:41-7.
9. Timmerman D, Bourne TH, Tailor A, Collins WP, Verrelst H, Vandenberghe K, et al. A comparison of methods for the preoperative discrimination between benign and malignant adnexal masses: the development of a new logistic regression model. Am J Obstet Gynecol 1999;181:57-65.


10. Alcazar JL, Jurado M. Prospective evaluation of logistic model based on sonographic morphologic and color Doppler findings developed to predict adnexal malignancy. J Ultrasound Med 1999;18:837-42.
11. Timmerman D, Valentin L, Bourne TH, Collins WP, Verrelst H, Vergote I. Terms, definitions and measurements to describe the sonographic features of adnexal tumors: a consensus opinion from the International Ovarian Tumor Analysis (IOTA) Group. Ultrasound Obstet Gynecol 2000;16:500-5.
12. Timmerman D, Testa AC, Bourne T, Ferrazzi E, Ameye L, Konstantinovic ML, et al. Logistic regression model to distinguish between the benign and malignant adnexal mass before surgery: a multicenter study by the International Ovarian Tumor Analysis Group. J Clin Oncol 2005;34:8794-801.
13. Kaijser J, Bourne T, Valentin L, Sayasneh A, Van Holsbeke C, Vergote I, et al. Improving strategies for diagnosing ovarian cancer: a summary of the International Ovarian Tumor Analysis (IOTA) studies. Ultrasound Obstet Gynecol 2013;41:9-20.
14. Timmerman D, Van Calster B, Testa AC, Guerriero S, Fischerova D, Lissoni AA, et al. Ovarian cancer prediction in adnexal masses using ultrasound-based logistic regression models: a temporal and external validation study by the IOTA group. Ultrasound Obstet Gynecol 2010;36:226-34.
15. Sladkevicius P, Valentin L. Intra- and inter-observer agreement when describing adnexal masses using the International Ovarian Tumour Analysis (IOTA) terms and definitions: a study on three-dimensional (3D) ultrasound volumes. Ultrasound Obstet Gynecol 2013;41:318-27.
16. Heintz APM, Odicino F, Maisonneuve P, Beller U, Benedet JL, Creasman WT, et al. Carcinoma of the Ovary. 25th Annual Report on the Results of Treatment in Gynecological Cancer. Int J Gynecol Obstet 2003;83:S135-S166 (suppl 1).


17. Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas 1960;20:37-46.
18. Kundel HL, Polansky M. Measurement of observer agreement. Radiology 2003;228:303-8.
19. Brennan P, Silman A. Statistical methods for assessing observer variability in clinical measures. BMJ 1992;304:1491-4.
20. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986;1:307-10.
21. Bartlett JW, Frost C. Reliability, repeatability and reproducibility: analysis of measurement errors in continuous variables. Ultrasound Obstet Gynecol 2008;31:466-75.
22. Kottner J, Audigé L, Brorson S, Donner A, Gajewski BJ, Hróbjartsson A, et al. Guidelines for Reporting Reliability and Agreement Studies (GRRAS) were proposed. J Clin Epidemiol 2011;64:96-106.
23. Ruiz de Gauna B, Sanchez P, Pineda L, Utrilla-Layna J, Juez L, Alcázar JL. Inter-observer agreement with regard to describing adnexal masses using the IOTA simple rules in a real-time setting and when using three-dimensional ultrasound volumes and digital clips. Ultrasound Obstet Gynecol 2014;44:95-100.
24. Timmerman D, Testa AC, Bourne T, Ameye L, Jurkovic D, Van Holsbeke C, et al. Simple ultrasound-based rules for the diagnosis of ovarian cancer. Ultrasound Obstet Gynecol 2008;31:681-90.
25. Timmerman D, Ameye L, Fischerova D, Epstein E, Melis GB, Guerriero S, et al. Simple ultrasound rules to distinguish between benign and malignant adnexal masses before surgery: prospective validation by IOTA group. BMJ 2010;341:c6839.


26. Van Holsbeke C, Van Calster B, Testa AC, Domali E, Lu C, Van Huffel S, et al. Prospective internal validation of mathematical models to predict malignancy in adnexal masses: results from the international ovarian tumor analysis study. Clin Cancer Res 2009;15:684-91.
27. Nunes N, Yazbek J, Ambler G, Hoo W, Naftalin J, Jurkovic D. Prospective evaluation of the IOTA logistic regression model LR2 for the diagnosis of ovarian cancer. Ultrasound Obstet Gynecol 2012;40:355-9.
28. Nunes N, Ambler G, Hoo WL, Naftalin J, Foo X, Widschwendter M, et al. A prospective validation of the IOTA logistic regression models (LR1 and LR2) in comparison to subjective pattern recognition for the diagnosis of ovarian cancer. Int J Gynecol Cancer 2013;23:1583-9.


Table 1. Variables for which inter-observer reproducibility was estimated
_______________________________________________________________________
Variables included in logistic regression models LR1 and LR2 or of importance for these models
  Maximum diameter of adnexal mass: mm
  Maximum diameter of largest solid component: mm
  Maximum diameter of largest solid component: ≤50, >50 mm
  Presence of an entirely solid adnexal mass: yes, no
  Papillary projections present: yes, no
  Irregular internal cyst walls: yes, no
  Acoustic shadows: yes, no
  Color Doppler signals in papillary projection: yes, no
  Color score: 1, 2, 3, 4
  Tenderness of adnexal mass at scan: yes, no
  Ascites: yes, no
Other IOTA variables
  Continuous IOTA variables
    Mean diameter of adnexal mass: mm
    Mean diameter of largest solid component: mm
    Height of the largest papillary projection: mm
  Categorical IOTA variables
    Type of tumor: unilocular, unilocular solid, multilocular, multilocular solid, solid
    Mean diameter of adnexal mass: ≤40, 41-60, 61-80, 81-100 and >100 mm
    Number of cyst locules: 0, 1, 2, 3, 4, 5, 6-10, 11-20 and >20; ≤10, >10†
    Septum present: yes, no
    Incomplete septum present: yes, no
    Solid component: yes, no
    Papillary projections: 0, 1, 2, 3, ≥4; 1, 2, 3, ≥4‡; <4, >4†
    Echogenicity of cyst fluid: anechoic, low level, ground glass, mixed, no cyst fluid
    Color Doppler signals detectable: yes, no
    Ovarian crescent sign: yes, no
Diagnosis based on LR1 and LR2
  Calculated risk of malignancy using LR1: ≤10%, >10%
  Calculated risk of malignancy using LR2: ≤10%, >10%
___________________________________________________________________________
IOTA, international ovarian tumor analysis group; LR1, logistic regression model 1; LR2, logistic regression model 2.

The risk of malignancy using LR1 is derived as

\[ y = \frac{1}{1 + e^{-z}} \]

where

\[ z = -6.7468 + 1.5985\,(a) - 0.9983\,(b) + 0.0326\,(c) + 0.00841\,(d) - 0.8577\,(e) + 1.5513\,(f) + 1.1737\,(g) + 0.9281\,(h) + 0.0496\,(i) + 1.1421\,(j) - 2.3550\,(k) + 0.4916\,(l) \]

and e is the mathematical constant and base value of natural logarithms. The 12 variables included in LR1 are: (a) personal history of ovarian cancer (yes = 1, no = 0); (b) current hormonal therapy (yes = 1, no = 0); (c) age of the patient (in years); (d) maximum diameter of the lesion (in millimeters); (e) mass tender at scan (yes = 1, no = 0); (f) the presence of ascites (yes = 1, no = 0); (g) the presence of blood flow within a solid papillary projection (yes = 1, no = 0); (h) the presence of a purely solid tumor (yes = 1, no = 0); (i) maximal diameter of the solid component (expressed in millimeters, but with no increase > 50 mm); (j) irregular internal cyst walls (yes = 1, no = 0); (k) the presence of acoustic shadows (yes = 1, no = 0); and (l) color score (1, 2, 3 or 4).

The risk of malignancy using LR2 is derived as

\[ y = \frac{1}{1 + e^{-z}} \]

where

\[ z = -5.3718 + 0.0354\,(c) + 1.6159\,(f) + 1.1768\,(g) + 0.0697\,(i) + 0.9586\,(j) - 2.9486\,(k) \]

(12). A calculated risk of malignancy of more than 10% classified the adnexal mass as malignant.

† includes 0
‡ includes only cases where papillary projections were registered at both examinations
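The two formulas in the footnote to Table 1 translate directly into a short calculation. The following is a minimal Python sketch, not the authors' software: the coefficients are those printed in the footnote, while the dictionary keys, function names and the example findings are hypothetical and chosen only for illustration.

```python
import math

# Coefficients as printed in the footnote to Table 1. The dictionary keys are
# hypothetical names chosen for this sketch; they are not IOTA identifiers.
LR1 = {
    "intercept": -6.7468,
    "history_ovarian_cancer": 1.5985,        # (a) yes = 1, no = 0
    "hormonal_therapy": -0.9983,             # (b) yes = 1, no = 0
    "age_years": 0.0326,                     # (c)
    "max_lesion_diameter_mm": 0.00841,       # (d)
    "tender_at_scan": -0.8577,               # (e) yes = 1, no = 0
    "ascites": 1.5513,                       # (f) yes = 1, no = 0
    "flow_in_papillary_projection": 1.1737,  # (g) yes = 1, no = 0
    "purely_solid": 0.9281,                  # (h) yes = 1, no = 0
    "max_solid_diameter_mm": 0.0496,         # (i) no increase above 50 mm
    "irregular_cyst_walls": 1.1421,          # (j) yes = 1, no = 0
    "acoustic_shadows": -2.3550,             # (k) yes = 1, no = 0
    "color_score": 0.4916,                   # (l) 1, 2, 3 or 4
}

LR2 = {
    "intercept": -5.3718,
    "age_years": 0.0354,                     # (c)
    "ascites": 1.6159,                       # (f)
    "flow_in_papillary_projection": 1.1768,  # (g)
    "max_solid_diameter_mm": 0.0697,         # (i) no increase above 50 mm
    "irregular_cyst_walls": 0.9586,          # (j)
    "acoustic_shadows": -2.9486,             # (k)
}


def malignancy_risk(model, findings):
    """Return y = 1 / (1 + e^-z) for one mass; extra keys in findings are ignored."""
    values = dict(findings)
    # Variable (i): the solid-component diameter counts only up to 50 mm.
    values["max_solid_diameter_mm"] = min(values["max_solid_diameter_mm"], 50)
    z = model["intercept"] + sum(
        coef * values[name] for name, coef in model.items() if name != "intercept"
    )
    return 1.0 / (1.0 + math.exp(-z))


def classify(risk, cutoff=0.10):
    """A calculated risk of more than 10% classifies the mass as malignant."""
    return "malignant" if risk > cutoff else "benign"


# Hypothetical findings for one mass (illustrative only, not study data).
example = {
    "history_ovarian_cancer": 0, "hormonal_therapy": 0, "age_years": 55,
    "max_lesion_diameter_mm": 70, "tender_at_scan": 0, "ascites": 0,
    "flow_in_papillary_projection": 0, "purely_solid": 0,
    "max_solid_diameter_mm": 20, "irregular_cyst_walls": 1,
    "acoustic_shadows": 0, "color_score": 2,
}
for name, model in (("LR1", LR1), ("LR2", LR2)):
    risk = malignancy_risk(model, example)
    print(f"{name}: risk = {risk:.3f} -> {classify(risk)}")
```

Running the same findings with `acoustic_shadows` set to 1 instead of 0 gives a quick feel for how much one disputed categorical variable can move the calculated risk in each model.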


Table 2. Histological diagnoses of the masses
___________________________________________________________________________
Histological diagnosis of tumor           Number
___________________________________________________________________________
Benign tumors                             94
  Benign simple cyst                      7
  Endometrioma                            10
  Dermoid cyst                            16
  Serous cystadenoma                      16
  Mucinous cystadenoma                    18
  Myoma/fibroma                           9
  Cystadenofibroma                        11
  Paraovarian cyst                        5
  Sactosalpinx chronic salpingitis        1
  Leydig cell tumor                       1
Borderline tumors                         4
  Serous                                  2
  Mucinous                                1
  Endometrioid                            1
Invasive malignancy                       19
  Primary ovarian adenocarcinoma          13
  Granulosa cell tumor                    3
  Dysgerminoma                            1
  Leiomyosarcoma                          1
  Malignant aggressive B-cell lymphoma    1
___________________________________________________________________________


Table 3. Inter-observer reproducibility of continuous variables used in the risk calculation models LR1 and LR2 and of other continuous variables used to describe adnexal masses

For each parameter the table gives the measurement results for both sonologists (median, range), the difference in mm between the two measurements made by sonologists 1 and 2ᵃ (mean, 95% CI, limits of agreement) and the Intra-CC (point estimate, 95% CI).

Variables used in models LR1 and LR2

Maximum diameter of adnexal mass, mm
  Measurement results: median 70 (n=234); range 10 to 313
  Difference: mean 3.80 (n=117); 95% CI 1.12 to 6.48; limits of agreement -25.24 to 32.82
  Intra-CC: 0.958 (0.937 to 0.971)

Maximum diameter of largest solid component, mmᵇ
  Measurement results: median 29.50 (n=122); range 5 to 180
  Difference: mean 1.92 (n=61); 95% CI -1.74 to 5.58; limits of agreement -26.66 to 30.50
  Intra-CC: 0.942 (0.905 to 0.964)

Other variables used to describe adnexal mass

Mean diameter of adnexal mass, mm
  Measurement results: median 58.5 (n=234); range 9 to 240
  Difference: mean 1.05 (n=117); 95% CI -0.15 to 1.95; limits of agreement -8.61 to 10.72
  Intra-CC: 0.971 (0.958 to 0.980)

Mean diameter of largest solid component, mmᵇ
  Measurement results: median 22 (n=122); range 4 to 156
  Difference: mean 0.59 (n=61); 95% CI -1.82 to 2.98; limits of agreement -18.16 to 19.32
  Intra-CC: 0.962 (0.937 to 0.977)

Height of largest papillary projection, mmᶜ
  Measurement results: median 8 (n=42); range 3 to 25
  Difference: mean -0.51 (n=21); 95% CI -2.93 to 1.91; limits of agreement -11.61 to 10.59
  Intra-CC: 0.609 (0.245 to 0.821)

CI, confidence interval; Intra-CC, intra-class correlation coefficient.
ᵃ the measurements made by sonologist 2 were subtracted from the measurements made by sonologist 1
ᵇ includes only cases where solid components were registered by both observers
ᶜ includes only cases where papillary projections were registered by both observers; using logarithmically transformed data the results were as follows: mean difference 1.03 (95% CI 0.78 to 1.36), limits of agreement 0.31 to 3.61, Intra-CC 0.547 (0.153 to 0.789); these results can be used for comparison with our published results on inter-observer agreement when analyzing ultrasound volumes obtained by three-dimensional ultrasound (15)


Table 4. Inter-observer agreement for categorical variables included in the risk calculation models LR1 and LR2 and for other categorical variables used to describe adnexal masses
____________________________________________________________________________________________
Variable                                                                            Agreement, % (n/N)  Kappa value
____________________________________________________________________________________________
Variables used in models LR1 and LR2
Presence of entirely solid tumor (yes, no)                                          97 (114/117)        0.91
Maximum diameter of largest solid component (≤50 mm, >50 mm)                        95 (111/117)        0.83
Maximum diameter of largest solid component (≤50 mm, >50 mm)ᵃ                       92 (56/61)          0.82
Tenderness of adnexal mass (yes, no)                                                97 (113/117)        0.78
Ascites (yes, no)                                                                   97 (113/117)        0.73
Papillary projections (yes, no)                                                     87 (102/117)        0.65
Solid component (yes, no)                                                           84 (98/117)         0.67
Irregular cyst wall (yes, no)                                                       79 (92/117)         0.56
Acoustic shadows (yes, no)                                                          85 (99/117)         0.58
Color Doppler signals in papillary projections (yes, no)ᵇ                           90 (105/117)        0.48
Color Doppler signals in papillary projections (yes, no)ᶜ                           86 (18/21)          0.71
Color score (1, 2, 3, 4)                                                            40 (47/117)         0.36ᵈ

Other variables used to describe adnexal masses
Tumor type (unilocular, unilocular solid, multilocular, multilocular solid, solid)  97 (113/117)        0.70
Mean diameter of tumor, mm (≤40, 41-60, 61-80, 81-100, >100)                        66 (77/117)         0.76ᵈ
Number of locules (0, 1, 2, 3, 4, 5, 6-10, 11-20, >20)                              58 (68/117)         0.79ᵈ
Number of locules (≤10, >10)ᵉ                                                       94 (110/117)        0.80
Septum (yes, no)                                                                    88 (103/117)        0.76
Incomplete septum (yes, no)                                                         98 (115/117)        ¶
Number of papillary projections (0, 1, 2, 3, ≥4)                                    81 (95/117)         0.64ᵈ
Number of papillary projections (1, 2, 3, ≥4)ᶜ                                      67 (14/21)          0.63ᵈ
Number of papillary projections (<4, >4)ᵉ                                           96 (112/117)        0.68
Echogenicity of cyst fluid (anechoic, low level, ground glass, mixed, no fluid)     67 (78/117)         0.56
Color Doppler signals detectable (yes, no)                                          84 (98/117)         0.30
Ovarian crescent sign (yes, no)                                                     78 (91/117)         0.49
Fluid in Douglas pouch (yes, no)                                                    87 (102/117)        0.72
____________________________________________________________________________________________
ᵃ includes only cases where solid components were registered at both examinations
ᵇ This variable reflects disagreement both with regard to the presence of papillary projections and the presence of blood flow in papillary projections (i.e., if one observer recorded the presence of papillary projections and the other did not, there was disagreement with regard to this variable, and if both observers recorded the presence of papillary projections but disagreed with regard to the presence of blood flow in the papillary projections, there was disagreement with regard to this variable)
ᶜ includes only cases where papillary projections were registered at both examinations
ᵈ weighted Kappa index
ᵉ includes 0
¶ absent kappa values are explained by asymmetric field tables making it impossible to calculate them
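Table 4 reports both percentage agreement and Cohen's kappa (17), and the two can diverge: color Doppler signals detectable shows 84% agreement but a kappa of only 0.30, because kappa discounts the agreement expected by chance from the marginal frequencies. The sketch below shows the unweighted calculation in Python. The function name and the example counts are hypothetical; the study's cross-tabulations are not given in this section, and several rows of the table use a weighted kappa, which this sketch does not implement.

```python
def cohen_kappa(table):
    """Unweighted Cohen's kappa for a square agreement table.

    table[i][j] is the number of masses placed in category i by observer 1
    and in category j by observer 2.
    """
    n = sum(sum(row) for row in table)
    k = len(table)
    # Observed agreement: proportion of masses on the diagonal.
    observed = sum(table[i][i] for i in range(k)) / n
    # Chance agreement: product of the two observers' marginal proportions.
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / (n * n)
    # Undefined (division by zero) when chance agreement equals 1.
    return (observed - expected) / (1 - expected)


# Hypothetical yes/no counts for 117 masses (illustrative, not study data):
# rows = observer 1 (yes, no), columns = observer 2 (yes, no).
example_table = [[10, 9], [10, 88]]
agreement = (example_table[0][0] + example_table[1][1]) / 117
print(f"agreement = {agreement:.0%}, kappa = {cohen_kappa(example_table):.2f}")
```

When one category dominates, as in the hypothetical table above, chance agreement is already high, so a high percentage agreement can coexist with a modest kappa.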


Table 5. Inter-observer reproducibility of the risk of malignancy calculated using the risk calculation models LR1 and LR2

For each model the table gives the calculated risk of malignancy for both sonologists (median, range), the difference between the risk calculated by sonologists 1 and 2 (mean, 95% CI, limits of agreement) and the Intra-CC (point estimate, 95% CI).

Risk of malignancy calculated using LR1
  Calculated risk: median 7.85 (n=234); range 0.10 to 99.10
  Difference: mean -0.53 (n=117); 95% CI -3.07 to 2.01; limits of agreement -28.05 to 26.99
  Intra-CC: 0.911 (0.874 to 0.937)

Risk of malignancy calculated using LR2
  Calculated risk: median 6.65 (n=234); range 0.10 to 98.40
  Difference: mean 0.02 (n=117); 95% CI -3.06 to 3.10; limits of agreement -33.22 to 33.26
  Intra-CC: 0.832 (0.766 to 0.880)

CI, confidence interval; Intra-CC, intra-class correlation coefficient.
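The mean difference and limits of agreement in Table 5 follow the approach of Bland and Altman (20): the limits are the mean inter-observer difference plus and minus 1.96 standard deviations of the differences. In the LR1 row, for instance, the mean of -0.53 lies midway between the limits -28.05 and 26.99. The Python sketch below illustrates the calculation under stated assumptions: the function name and the paired values are hypothetical, and the confidence interval uses a normal approximation (1.96 standard errors), which may differ from the method the authors used.

```python
import math
import statistics


def bland_altman(observer1, observer2):
    """Mean difference, approximate 95% CI and limits of agreement for paired values."""
    # Observer 2 is subtracted from observer 1, as in the table footnotes.
    diffs = [a - b for a, b in zip(observer1, observer2)]
    mean_diff = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    half_ci = 1.96 * sd / math.sqrt(len(diffs))  # normal approximation (assumption)
    return {
        "mean_difference": mean_diff,
        "ci95": (mean_diff - half_ci, mean_diff + half_ci),
        "limits_of_agreement": (mean_diff - 1.96 * sd, mean_diff + 1.96 * sd),
    }


# Hypothetical calculated risks (%) for six masses, one list per observer.
risks_observer1 = [2.1, 15.0, 48.5, 7.9, 91.0, 33.2]
risks_observer2 = [1.8, 22.4, 30.1, 8.3, 93.5, 40.0]
result = bland_altman(risks_observer1, risks_observer2)
print(result)
```

Plotting each difference against the mean of the two calculated risks would give a scatterplot of the kind described in the legend to Figure 1.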


Legends for figure

Figure 1. a) Scatterplot showing the relationship between inter-observer difference (observer 1 minus observer 2) in calculated risk and magnitude of calculated risk when using logistic regression model LR1. The plot manifests a diamond shape, the differences being smallest for the lowest and highest risks. For risks < 25 and > 95 the differences are very small. LOA, limits of agreement. b) Scatterplot showing the relationship between inter-observer difference in calculated risk and magnitude of calculated risk when using logistic regression model LR2. The plot manifests a diamond shape, the differences being smallest for the lowest and highest risks.

Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from

Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906

Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from

Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906

Published OnlineFirst November 25 2014Clin Cancer Res Povilas Sladkevicius and Lil Valentin malignancy using logistic regression modelsappearance of adnexal masses and in calculating the risk of Inter-observer agreement in describing the ultrasound

Updated version

1011581078-0432CCR-14-0906doi

Access the most recent version of this article at

Material

Supplementary

httpclincancerresaacrjournalsorgcontentsuppl201411261078-0432CCR-14-0906DC1

Access the most recent supplemental material at

Manuscript

Authoredited Author manuscripts have been peer reviewed and accepted for publication but have not yet been

E-mail alerts related to this article or journalSign up to receive free email-alerts

Subscriptions

Reprints and

pubsaacrorgDepartment at

To order reprints of this article or to subscribe to the journal contact the AACR Publications

Permissions

Rightslink site Click on Request Permissions which will take you to the Copyright Clearance Centers (CCC)

httpclincancerresaacrjournalsorgcontentearly201411251078-0432CCR-14-0906To request permission to re-use all or part of this article use this link

Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from

Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906

  • Article File
  • Article File
  • Article File
  • Table 1
  • Table 2
  • Table 3
  • Table 4
  • Table 5
  • Article File
  • Figure 1

agreement in ultrasound assessment of ovarian masses 15

15

Acknowledgements

None

Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from

Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906

agreement in ultrasound assessment of ovarian masses 16

16

References

1 Granberg S Norstroumlm A Wikland M Tumors in the lower pelvis as imaged by vaginal

sonography Gynecol Oncol 199037224-9

2 Benacerraf BR Finkler NJ Wojciechowski C Knapp RC Sonographic accuracy in the

diagnosis of ovarian masses J Reprod Med 199035491-5

3 Valentin L Pattern recognition of pelvic masses by gray-scale ultrasound imaging the

contribution of Doppler ultrasound Ultrasound Obstet Gynecol 199914338-47

4 Valentin L Prospective cross-validation of Doppler ultrasound examination and gray-

scale ultrasound imaging for discrimination of benign and malignant pelvic masses

Ultrasound Obstet Gynecol 199914273-83

5 Timmerman D Schwaumlrzler P Collins WP Claerhout F Coenen M Amant F et al

Subjective assessment of adnexal masses with the use of ultrasonography an analysis

of interobserver variability and experience Ultrasound Obstet Gynecol 19991311-6

6 Sokalska A Timmerman D Testa AC Van Holsbeke C Lissoni AA Leone FPG et al

Diagnostic accuracy of transvaginal ultrasound examination for assigning a specific

diagnosis to adnexal masses Ultrasound Obstet Gynecol 200934462-70

7 Valentin L Use of morphology to characterize and manage common adnexal masses

Best Pract Res Clin Obstet Gynaecol 20041871-89

8 Tailor A Jurkovic D Bourne TH Collins WP Campbell S Sonographic prediction of

malignancy in adnexal masses using multivariate logistic regression analysis

Ultrasound Obstet Gynecol 19971041-7

9 Timmerman D Bourne TH Tailor A Collins WP Verrelst H Vandenberghe K et al

A comparison of methods for the preoperative discrimination between benign and

malignant adnexal masses the development of a new logistic regression model Am J

Obstet Gynecol 199918157-65

Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from

Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906

agreement in ultrasound assessment of ovarian masses 17

17

10 Alcazar JL Jurado M Prospective evaluation of logistic model based on sonographic

morphologic and color Doppler findings developed to predict adnexal malignancy J

Ultrasound Med 199918837-42

11 Timmerman D Valentin L Bourne TH Collins WP Verrelst H Vergote I Terms

definitions and measurements to describe the sonographic features of adnexal tumors a

consensus opinion from the International Ovarian Tumor Analysis (IOTA) Group

Ultrasound Obstet Gynecol 200016500-5

12 Timmerman D Testa AC Bourne T Ferrazzi E Ameye L Konstantinovic ML et al

Logistic regression model to distinguish between the benign and malignant adnexal

mass before surgery a multicenter study by the International Ovarian Tumor Analysis

Group J Clin Oncol 2005348794-801

13 Kaijser J Bourne T Valentin L Sayasneh A Van Holsbeke C Vergote I et al

Improving strategies for diagnosing ovarian cancer a summary of the International

Ovarian Tumor Analysis (IOTA) studies Ultrasound Obstet Gynecol 201341 9-20

14 Timmerman D Van Calster B Testa AC Guerriero S Fischerova D Lissoni AA et al

Ovarian cancer prediction in adnexal masses using ultrasound-based logistic regression

models a temporal and external validation study by the IOTA group Ultrasound Obstet

Gynecol 201036226-34

15 Sladkevicius P Valentin L Intra- and inter-observer agreement when describing

adnexal masses using the International Ovarian Tumour Analysis (IOTA) terms and

definitions a study on three-dimensional (3D) ultrasound volumes Ultrasound Obstet

Gynecol 201341318-27

16 Heintz APM Odicino F Maisonneuve P Beller U Benedet JL Creasman WT et al

Carcinoma of the Ovary 25th Annual Report on the Results of Treatment in

Gynecological Cancer Int J Gynecol Obstet 200383S135-S166 (suppl 1)

Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from

Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906

agreement in ultrasound assessment of ovarian masses 18

18

17 Cohen J A coefficient of agreement for nominal scales Educ Psychol Meas 196020

37ndash46

18 Kundel HL Polansky M Measurement of observer agreement Radiology 2003228

303-8

19 Brennan P Silman A Statistical methods for assessing observer variability in clinical

measures BMJ 1992304 1491-4

20 Bland JM Altman DG Statistical methods for assessing agreement between two

methods of clinical measurement Lancet 19861307-10

21 Bartlett JW Frost C Reliability repeatability and reproducibility analysis of

measurement errors in continuous variables Ultrasound Obstet Gynecol 200831466-

75

22 Kottner J Audigeacute L Brorson S Donner A Gajewski BJ Hroacutebjartsson A et al

Guidelines for Reporting Reliability and Agreement Studies (GRRAS) were proposed J

Clin Epidemiol 20116496-106

23 Ruiz de Gauna B Sanchez P Pineda L Utrilla-Layna J Juez L Alcaacutezar JLInter-

observer agreement with regard to describing adnexal masses using the IOTA simple

rules in a real-time setting and when using three-dimensional ultrasound volumes and

digital clips Ultrasound Obstet Gynecol 20144495-100

24 Timmerman D Testa AC Bourne T Ameye L Jurkovic D Van Holsbeke C et al

Simple ultrasound-based rules for the diagnosis of ovarian cancer Ultrasound Obstet

Gynecol 200831681-90

25 Timmerman D Ameye L Fischerova D Epstein E Melis GB Guerriero S et al

Simple ultrasound rules to distinguish between benign and malignant adnexal masses

before surgery prospective validation by IOTA group BMJ 2010341c6839

Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from

Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906

agreement in ultrasound assessment of ovarian masses 19

19

26 Van Holsbeke C Van Calster B Testa AC Domali E Lu C Van Huffel S et al

Prospective internal validation of mathematical models to predict malignancy in

adnexal masses results from the international ovarian tumor analysis study Clin Cancer

Res 200915684-91

27 Nunes N Yazbek J Ambler G Hoo W Naftalin J Jurkovic D Prospective evaluation

of the IOTA logistic regression model LR2 for the diagnosis of ovarian cancer

Ultrasound Obstet Gynecol 201240355-9

28 Nunes N Ambler G Hoo WL Naftalin J Foo X Widschwendter M et al

A prospective validation of the IOTA logistic regression models (LR1 and LR2) in

comparison to subjective pattern recognition for the diagnosis of ovarian cancer

Int J Gynecol Cancer 2013231583-9

Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from

Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906

Table 1. Variables for which inter-observer reproducibility was estimated
_______________________________________________________________________
Variables included in logistic regression models LR1 and LR2, or of importance for these models
  Maximum diameter of adnexal mass (mm)
  Maximum diameter of largest solid component (mm)
  Maximum diameter of largest solid component (≤50, >50 mm)
  Presence of an entirely solid adnexal mass (yes, no)
  Papillary projections present (yes, no)
  Irregular internal cyst walls (yes, no)
  Acoustic shadows (yes, no)
  Color Doppler signals in papillary projection (yes, no)
  Color score (1, 2, 3, 4)
  Tenderness of adnexal mass at scan (yes, no)
  Ascites (yes, no)
Other IOTA variables
  Continuous IOTA variables
    Mean diameter of adnexal mass (mm)
    Mean diameter of largest solid component (mm)
    Height of the largest papillary projection (mm)
  Categorical IOTA variables
    Type of tumor (unilocular, unilocular solid, multilocular, multilocular solid, solid)
    Mean diameter of adnexal mass (≤40, 41-60, 61-80, 81-100 and >100 mm)
    Number of cyst locules (0, 1, 2, 3, 4, 5, 6-10, 11-20 and >20; ≤10, >10†)
    Septum present (yes, no)
    Incomplete septum present (yes, no)
    Solid component (yes, no)
    Papillary projections (0, 1, 2, 3, ≥4; 1, 2, 3, ≥4‡; <4, >4†)
    Echogenicity of cyst fluid (anechoic, low level, ground glass, mixed, no cyst fluid)
    Color Doppler signals detectable (yes, no)
    Ovarian crescent sign (yes, no)
Diagnosis based on LR1 and LR2
  Calculated risk of malignancy using LR1 (≤10%, >10%)
  Calculated risk of malignancy using LR2 (≤10%, >10%)
___________________________________________________________________________
IOTA, International Ovarian Tumor Analysis group; LR1, logistic regression model 1; LR2, logistic regression model 2.

The risk of malignancy using LR1 is derived as

\[
y = \frac{1}{1 + e^{-z}}, \qquad
z = -6.7468 + 1.5985\,(a) - 0.9983\,(b) + 0.0326\,(c) + 0.00841\,(d) - 0.8577\,(e) + 1.5513\,(f) + 1.1737\,(g) + 0.9281\,(h) + 0.0496\,(i) + 1.1421\,(j) - 2.3550\,(k) + 0.4916\,(l)
\]

where e is the mathematical constant and base value of natural logarithms. The 12 variables included in LR1 are:
  (a) personal history of ovarian cancer (yes = 1, no = 0)
  (b) current hormonal therapy (yes = 1, no = 0)
  (c) age of the patient (in years)
  (d) maximum diameter of the lesion (in millimeters)
  (e) mass tender at scan (yes = 1, no = 0)
  (f) the presence of ascites (yes = 1, no = 0)
  (g) the presence of blood flow within a solid papillary projection (yes = 1, no = 0)
  (h) the presence of a purely solid tumor (yes = 1, no = 0)
  (i) maximal diameter of the solid component (expressed in millimeters, but with no increase > 50 mm)
  (j) irregular internal cyst walls (yes = 1, no = 0)
  (k) the presence of acoustic shadows (yes = 1, no = 0)
  (l) color score (1, 2, 3 or 4)

The risk of malignancy using LR2 is derived as

\[
y = \frac{1}{1 + e^{-z}}, \qquad
z = -5.3718 + 0.0354\,(c) + 1.6159\,(f) + 1.1768\,(g) + 0.0697\,(i) + 0.9586\,(j) - 2.9486\,(k)
\]

(ref. 12). A calculated risk of malignancy of more than 10% classified the adnexal mass as malignant.

† includes 0
‡ includes only cases where papillary projections were registered at both examinations
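The two risk formulas in the Table 1 footnote can be sketched in a few lines of Python. This is a minimal illustration, not the authors' software: the coefficients are read from the footnote with their decimal points (for example -6.7468 for the LR1 intercept), the function names `lr1_risk`, `lr2_risk` and `is_malignant` are hypothetical, and the example patient is invented purely to show the call.

```python
import math

# Coefficients as listed in the Table 1 footnote, keyed by the footnote letters.
LR1_INTERCEPT = -6.7468
LR1_COEFFICIENTS = {
    "a": 1.5985,   # personal history of ovarian cancer (1 = yes, 0 = no)
    "b": -0.9983,  # current hormonal therapy (1 = yes, 0 = no)
    "c": 0.0326,   # age in years
    "d": 0.00841,  # maximum diameter of the lesion, mm
    "e": -0.8577,  # mass tender at scan (1 = yes, 0 = no)
    "f": 1.5513,   # ascites (1 = yes, 0 = no)
    "g": 1.1737,   # blood flow within a solid papillary projection (1/0)
    "h": 0.9281,   # purely solid tumor (1 = yes, 0 = no)
    "i": 0.0496,   # maximal diameter of solid component, mm, capped at 50
    "j": 1.1421,   # irregular internal cyst walls (1 = yes, 0 = no)
    "k": -2.3550,  # acoustic shadows (1 = yes, 0 = no)
    "l": 0.4916,   # color score (1, 2, 3 or 4)
}

LR2_INTERCEPT = -5.3718
LR2_COEFFICIENTS = {
    "c": 0.0354, "f": 1.6159, "g": 1.1768,
    "i": 0.0697, "j": 0.9586, "k": -2.9486,
}


def _risk(intercept, coefficients, values):
    """Return y = 1 / (1 + e^-z) for the given model and variable values."""
    capped = dict(values)
    # Variable (i) contributes no further increase above 50 mm.
    capped["i"] = min(capped["i"], 50.0)
    z = intercept + sum(w * capped[name] for name, w in coefficients.items())
    return 1.0 / (1.0 + math.exp(-z))


def lr1_risk(values):
    return _risk(LR1_INTERCEPT, LR1_COEFFICIENTS, values)


def lr2_risk(values):
    return _risk(LR2_INTERCEPT, LR2_COEFFICIENTS, values)


def is_malignant(risk, cutoff=0.10):
    """Classify as malignant when the risk is more than 10%."""
    return risk > cutoff


# Invented example: 50-year-old, 60 mm lesion, no solid component, color score 1.
example = {"a": 0, "b": 0, "c": 50, "d": 60, "e": 0, "f": 0,
           "g": 0, "h": 0, "i": 0, "j": 0, "k": 0, "l": 1}
print(lr1_risk(example), lr2_risk(example), is_malignant(lr1_risk(example)))
```

Keying the inputs by the footnote letters keeps the code easy to check term by term against the printed formulas; a real implementation would presumably use descriptive field names and validate the inputs.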


Table 2. Histological diagnoses of the masses
___________________________________________________________________________
Histological diagnosis of tumor | Number
___________________________________________________________________________
Benign tumors | 94
  Benign simple cyst | 7
  Endometrioma | 10
  Dermoid cyst | 16
  Serous cystadenoma | 16
  Mucinous cystadenoma | 18
  Myoma/fibroma | 9
  Cystadenofibroma | 11
  Paraovarian cyst | 5
  Sactosalpinx/chronic salpingitis | 1
  Leydig cell tumor | 1
Borderline tumors | 4
  Serous | 2
  Mucinous | 1
  Endometrioid | 1
Invasive malignancy | 19
  Primary ovarian adenocarcinoma | 13
  Granulosa cell tumor | 3
  Dysgerminoma | 1
  Leiomyosarcoma | 1
  Malignant aggressive B-cell lymphoma | 1
___________________________________________________________________________


Table 3. Inter-observer reproducibility of continuous variables used in the risk calculation models LR1 and LR2 and of other continuous variables used to describe adnexal masses
___________________________________________________________________________
Column groups: measurement results (both sonologists): median, range; difference in mm between two measurements made by sonologists 1 and 2^a: mean, 95% CI, limits of agreement; Intra-CC: point estimate (95% CI).

Parameter | Median (n) | Range | Mean difference (n) | 95% CI | Limits of agreement | Intra-CC (95% CI)
___________________________________________________________________________
Variables used in models LR1 and LR2
  Maximum diameter of adnexal mass, mm | 70 (n=234) | 10 to 313 | 3.80 (n=117) | 1.12 to 6.48 | -25.24 to 32.82 | 0.958 (0.937 to 0.971)
  Maximum diameter of largest solid component, mm^b | 29.50 (n=122) | 5 to 180 | 1.92 (n=61) | -1.74 to 5.58 | -26.66 to 30.50 | 0.942 (0.905 to 0.964)
Other variables used to describe adnexal mass
  Mean diameter of adnexal mass, mm | 58.5 (n=234) | 9 to 240 | 1.05 (n=117) | -0.15 to 1.95 | -8.61 to 10.72 | 0.971 (0.958 to 0.980)
  Mean diameter of largest solid component, mm^b | 22 (n=122) | 4 to 156 | 0.59 (n=61) | -1.82 to 2.98 | -18.16 to 19.32 | 0.962 (0.937 to 0.977)
  Height of largest papillary projection, mm^c | 8 (n=42) | 3 to 25 | -0.51 (n=21) | -2.93 to 1.91 | -11.61 to 10.59 | 0.609 (0.245 to 0.821)
___________________________________________________________________________
CI, confidence interval; Intra-CC, intra-class correlation coefficient.
a the measurements made by sonologist 2 were subtracted from the measurements made by sonologist 1
b includes only cases where solid components were registered by both observers
c includes only cases where papillary projections were registered by both observers; using logarithmically transformed data the results were as follows: mean difference 1.03 (95% CI 0.78 to 1.36), limits of agreement 0.31 to 3.61, Intra-CC 0.547 (0.153 to 0.789); these results can be used for comparison with our published results on inter-observer agreement when analyzing ultrasound volumes obtained by three-dimensional ultrasound (ref. 15)
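The mean difference, its confidence interval and the limits of agreement reported in Table 3 follow the Bland and Altman approach (ref. 20). A minimal sketch is given below, assuming the conventional definitions: limits of agreement as mean difference ± 1.96 SD of the differences, and a normal-approximation 95% CI for the mean. The exact CI method used by the authors is not stated in this section, and the function name `bland_altman` and the sample measurements are hypothetical.

```python
import math
import statistics


def bland_altman(observer1, observer2):
    """Summarise paired measurements (observer 1 minus observer 2)."""
    differences = [a - b for a, b in zip(observer1, observer2)]
    n = len(differences)
    mean_difference = statistics.mean(differences)
    sd = statistics.stdev(differences)  # sample SD of the differences
    # Assumption: normal approximation for the CI of the mean difference.
    half_width = 1.96 * sd / math.sqrt(n)
    return {
        "mean": mean_difference,
        "ci95": (mean_difference - half_width, mean_difference + half_width),
        "limits_of_agreement": (mean_difference - 1.96 * sd,
                                mean_difference + 1.96 * sd),
    }


# Invented diameters in mm, for illustration only.
summary = bland_altman([10, 20, 30, 40], [8, 21, 27, 40])
print(summary)
```

For the logarithmically transformed analysis in footnote c, the same function could be applied to the logarithms of the measurements and the results back-transformed, which turns differences into ratios.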


Table 4. Inter-observer agreement for categorical variables included in the risk calculation models LR1 and LR2 and for other categorical variables used to describe adnexal masses
____________________________________________________________________________________________
Variable | Agreement, % (n/N) | Kappa value
____________________________________________________________________________________________
Variables used in models LR1 and LR2
  Presence of entirely solid tumor (yes, no) | 97 (114/117) | 0.91
  Maximum diameter of largest solid component (≤50 mm, >50 mm) | 95 (111/117) | 0.83
  Maximum diameter of largest solid component (≤50 mm, >50 mm)^a | 92 (56/61) | 0.82
  Tenderness of adnexal mass (yes, no) | 97 (113/117) | 0.78
  Ascites (yes, no) | 97 (113/117) | 0.73
  Papillary projections (yes, no) | 87 (102/117) | 0.65
  Solid component (yes, no) | 84 (98/117) | 0.67
  Irregular cyst wall (yes, no) | 79 (92/117) | 0.56
  Acoustic shadows (yes, no) | 85 (99/117) | 0.58
  Color Doppler signals in papillary projections (yes, no)^b | 90 (105/117) | 0.48
  Color Doppler signals in papillary projections (yes, no)^c | 86 (18/21) | 0.71
  Color score (1, 2, 3, 4) | 40 (47/117) | 0.36^d
Other variables used to describe adnexal masses
  Tumor type (unilocular, unilocular solid, multilocular, multilocular solid, solid) | 97 (113/117) | 0.70
  Mean diameter of tumor, mm (≤40, 41-60, 61-80, 81-100, >100) | 66 (77/117) | 0.76^d
  Number of locules (0, 1, 2, 3, 4, 5, 6-10, 11-20, >20) | 58 (68/117) | 0.79^d
  Number of locules (≤10, >10)^e | 94 (110/117) | 0.80
  Septum (yes, no) | 88 (103/117) | 0.76
  Incomplete septum (yes, no) | 98 (115/117) | ¶
  Number of papillary projections (0, 1, 2, 3, ≥4) | 81 (95/117) | 0.64^d
  Number of papillary projections (1, 2, 3, ≥4)^c | 67 (14/21) | 0.63^d
  Number of papillary projections (<4, >4)^e | 96 (112/117) | 0.68
  Echogenicity of cyst fluid (anechoic, low level, ground glass, mixed, no fluid) | 67 (78/117) | 0.56
  Color Doppler signals detectable (yes, no) | 84 (98/117) | 0.30
  Ovarian crescent sign (yes, no) | 78 (91/117) | 0.49
  Fluid in Douglas pouch (yes, no) | 87 (102/117) | 0.72
____________________________________________________________________________________________
a includes only cases where solid components were registered at both examinations
b This variable reflects disagreement both with regard to the presence of papillary projections and the presence of blood flow in papillary projections (i.e., if one observer recorded the presence of papillary projections and the other did not, there was disagreement with regard to this variable, and if both observers recorded the presence of papillary projections but disagreed with regard to the presence of blood flow in the papillary projections, there was disagreement with regard to this variable)
c includes only cases where papillary projections were registered at both examinations
d weighted Kappa index
e includes 0
¶ absent kappa values are explained by asymmetric field tables making it impossible to calculate them
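Table 4 pairs raw percentage agreement with Cohen's kappa (ref. 17), which corrects the observed agreement for the agreement expected by chance. The sketch below shows one way to compute both the unweighted and a weighted kappa from two observers' ratings. The weighting scheme behind the weighted Kappa index is not specified in this section, so linear weights are used here as an assumption; the function name `cohen_kappa` and the sample ratings are hypothetical.

```python
def cohen_kappa(ratings1, ratings2, categories, weighted=False):
    """Cohen's kappa for two raters; `categories` must be in ordinal order
    when weighted=True. Returns None when kappa is undefined."""
    n = len(ratings1)
    k = len(categories)
    index = {category: i for i, category in enumerate(categories)}

    # Observed joint proportions: rows = observer 1, columns = observer 2.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings1, ratings2):
        observed[index[a]][index[b]] += 1.0 / n
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        if not weighted or k < 2:
            return 1.0 if i == j else 0.0
        # Assumption: linear agreement weights.
        return 1.0 - abs(i - j) / (k - 1)

    p_observed = sum(weight(i, j) * observed[i][j]
                     for i in range(k) for j in range(k))
    p_expected = sum(weight(i, j) * row_totals[i] * col_totals[j]
                     for i in range(k) for j in range(k))
    if abs(1.0 - p_expected) < 1e-12:
        return None  # chance agreement is complete; kappa cannot be computed
    return (p_observed - p_expected) / (1.0 - p_expected)


# Invented ratings, for illustration only.
observer1 = ["yes", "yes", "no", "no"]
observer2 = ["yes", "no", "no", "no"]
print(cohen_kappa(observer1, observer2, ["yes", "no"]))
```

Because kappa depends on the marginal distribution of the categories, a variable with high raw agreement can still have a modest kappa when one category is rare, which is consistent with several rows of Table 4.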


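Table 5. Inter-observer reproducibility of the risk of malignancy calculated using the risk calculation models LR1 and LR2
___________________________________________________________________________
Column groups: calculated risk of malignancy, % (both sonologists): median, range; difference between the risk calculated by sonologist 1 and 2: mean, 95% CI, limits of agreement; Intra-CC: point estimate (95% CI).

Parameter | Median (n) | Range | Mean difference (n) | 95% CI | Limits of agreement | Intra-CC (95% CI)
___________________________________________________________________________
Risk of malignancy calculated using LR1 | 7.85 (n=234) | 0.10 to 99.10 | -0.53 (n=117) | -3.07 to 2.01 | -28.05 to 26.99 | 0.911 (0.874 to 0.937)
Risk of malignancy calculated using LR2 | 6.65 (n=234) | 0.10 to 98.40 | 0.02 (n=117) | -3.06 to 3.10 | -33.22 to 33.26 | 0.832 (0.766 to 0.880)
___________________________________________________________________________
CI, confidence interval; Intra-CC, intra-class correlation coefficient.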
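The intra-class correlation coefficient reported in Tables 3 and 5 can be computed from a two-way analysis of variance of the paired values. Which form of the coefficient the authors used is not stated in this section, so the sketch below assumes the two-way random-effects, absolute-agreement, single-measurement form, often written ICC(2,1). The function name `icc_two_way_single` and the sample values are hypothetical.

```python
def icc_two_way_single(rows):
    """ICC(2,1) for a list of subjects, each a list of k ratings
    (here k = 2, one value per sonologist)."""
    n = len(rows)
    k = len(rows[0])
    grand_mean = sum(sum(row) for row in rows) / (n * k)
    subject_means = [sum(row) / k for row in rows]
    rater_means = [sum(row[j] for row in rows) / n for j in range(k)]

    # Sums of squares of the two-way layout without replication.
    ss_total = sum((x - grand_mean) ** 2 for row in rows for x in row)
    ss_subjects = k * sum((m - grand_mean) ** 2 for m in subject_means)
    ss_raters = n * sum((m - grand_mean) ** 2 for m in rater_means)
    ss_error = ss_total - ss_subjects - ss_raters

    ms_subjects = ss_subjects / (n - 1)
    ms_raters = ss_raters / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_subjects - ms_error) / (
        ms_subjects + (k - 1) * ms_error + k * (ms_raters - ms_error) / n
    )


# Invented paired risks in %, for illustration only.
pairs = [[5.0, 6.5], [12.0, 10.0], [80.0, 85.0], [40.0, 30.0]]
print(icc_two_way_single(pairs))
```

Unlike the limits of agreement, this coefficient depends on the spread of the true values in the sample: the same absolute disagreement yields a higher coefficient when the subjects differ widely from one another.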


Legends for figure

Figure 1. a) Scatterplot showing the relationship between inter-observer difference (observer 1 minus observer 2) in calculated risk and magnitude of calculated risk when using logistic regression model LR1. The plot manifests a diamond shape, the differences being smallest for the lowest and highest risks. For risks < 25 and > 95 the differences are very small. LOA, limits of agreement. b) Scatterplot showing the relationship between inter-observer difference in calculated risk and magnitude of calculated risk when using logistic regression model LR2. The plot manifests a diamond shape, the differences being smallest for the lowest and highest risks.
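One property of the logistic function is at least consistent with the diamond shape described in the legend: the same disagreement in the linear predictor z moves the calculated risk most when the risk is near 50% and least when it is near 0% or 100%, because the slope of y = 1/(1 + e^-z) equals y(1 - y). The sketch below illustrates this; it is an illustration of the function's behavior, not an analysis from the paper, and the function names and the chosen shift of 0.5 are hypothetical.

```python
import math


def logistic(z):
    """y = 1 / (1 + e^-z), the transformation used by LR1 and LR2."""
    return 1.0 / (1.0 + math.exp(-z))


def risk_shift(z, delta_z):
    """Change in calculated risk when the linear predictor moves by delta_z."""
    return logistic(z + delta_z) - logistic(z)


# The same shift in z at a low, a middle and a high baseline risk.
for z in (-5.0, 0.0, 5.0):
    baseline = logistic(z)
    print(f"baseline risk {baseline:.3f}, shift {risk_shift(z, 0.5):.4f}")
```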


Povilas Sladkevicius and Lil Valentin. Inter-observer agreement in describing the ultrasound appearance of adnexal masses and in calculating the risk of malignancy using logistic regression models. Clin Cancer Res. Published OnlineFirst November 25, 2014. doi: 10.1158/1078-0432.CCR-14-0906



References

1. Granberg S, Norström A, Wikland M. Tumors in the lower pelvis as imaged by vaginal sonography. Gynecol Oncol 1990;37:224-9.

2. Benacerraf BR, Finkler NJ, Wojciechowski C, Knapp RC. Sonographic accuracy in the diagnosis of ovarian masses. J Reprod Med 1990;35:491-5.

3. Valentin L. Pattern recognition of pelvic masses by gray-scale ultrasound imaging: the contribution of Doppler ultrasound. Ultrasound Obstet Gynecol 1999;14:338-47.

4. Valentin L. Prospective cross-validation of Doppler ultrasound examination and gray-scale ultrasound imaging for discrimination of benign and malignant pelvic masses. Ultrasound Obstet Gynecol 1999;14:273-83.

5. Timmerman D, Schwärzler P, Collins WP, Claerhout F, Coenen M, Amant F, et al. Subjective assessment of adnexal masses with the use of ultrasonography: an analysis of interobserver variability and experience. Ultrasound Obstet Gynecol 1999;13:11-6.

6. Sokalska A, Timmerman D, Testa AC, Van Holsbeke C, Lissoni AA, Leone FPG, et al. Diagnostic accuracy of transvaginal ultrasound examination for assigning a specific diagnosis to adnexal masses. Ultrasound Obstet Gynecol 2009;34:462-70.

7. Valentin L. Use of morphology to characterize and manage common adnexal masses. Best Pract Res Clin Obstet Gynaecol 2004;18:71-89.

8. Tailor A, Jurkovic D, Bourne TH, Collins WP, Campbell S. Sonographic prediction of malignancy in adnexal masses using multivariate logistic regression analysis. Ultrasound Obstet Gynecol 1997;10:41-7.

9. Timmerman D, Bourne TH, Tailor A, Collins WP, Verrelst H, Vandenberghe K, et al. A comparison of methods for the preoperative discrimination between benign and malignant adnexal masses: the development of a new logistic regression model. Am J Obstet Gynecol 1999;181:57-65.


10. Alcazar JL, Jurado M. Prospective evaluation of logistic model based on sonographic morphologic and color Doppler findings developed to predict adnexal malignancy. J Ultrasound Med 1999;18:837-42.

11. Timmerman D, Valentin L, Bourne TH, Collins WP, Verrelst H, Vergote I. Terms, definitions and measurements to describe the sonographic features of adnexal tumors: a consensus opinion from the International Ovarian Tumor Analysis (IOTA) Group. Ultrasound Obstet Gynecol 2000;16:500-5.

12. Timmerman D, Testa AC, Bourne T, Ferrazzi E, Ameye L, Konstantinovic ML, et al. Logistic regression model to distinguish between the benign and malignant adnexal mass before surgery: a multicenter study by the International Ovarian Tumor Analysis Group. J Clin Oncol 2005;34:8794-801.

13. Kaijser J, Bourne T, Valentin L, Sayasneh A, Van Holsbeke C, Vergote I, et al. Improving strategies for diagnosing ovarian cancer: a summary of the International Ovarian Tumor Analysis (IOTA) studies. Ultrasound Obstet Gynecol 2013;41:9-20.

14. Timmerman D, Van Calster B, Testa AC, Guerriero S, Fischerova D, Lissoni AA, et al. Ovarian cancer prediction in adnexal masses using ultrasound-based logistic regression models: a temporal and external validation study by the IOTA group. Ultrasound Obstet Gynecol 2010;36:226-34.

15. Sladkevicius P, Valentin L. Intra- and inter-observer agreement when describing adnexal masses using the International Ovarian Tumour Analysis (IOTA) terms and definitions: a study on three-dimensional (3D) ultrasound volumes. Ultrasound Obstet Gynecol 2013;41:318-27.

16. Heintz APM, Odicino F, Maisonneuve P, Beller U, Benedet JL, Creasman WT, et al. Carcinoma of the Ovary. 25th Annual Report on the Results of Treatment in Gynecological Cancer. Int J Gynecol Obstet 2003;83:S135-S166 (suppl 1).


17. Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas 1960;20:37–46.

18. Kundel HL, Polansky M. Measurement of observer agreement. Radiology 2003;228:303-8.

19. Brennan P, Silman A. Statistical methods for assessing observer variability in clinical measures. BMJ 1992;304:1491-4.

20. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986;1:307-10.

21. Bartlett JW, Frost C. Reliability, repeatability and reproducibility: analysis of measurement errors in continuous variables. Ultrasound Obstet Gynecol 2008;31:466-75.

22. Kottner J, Audigé L, Brorson S, Donner A, Gajewski BJ, Hróbjartsson A, et al. Guidelines for Reporting Reliability and Agreement Studies (GRRAS) were proposed. J Clin Epidemiol 2011;64:96-106.

23. Ruiz de Gauna B, Sanchez P, Pineda L, Utrilla-Layna J, Juez L, Alcázar JL. Inter-observer agreement with regard to describing adnexal masses using the IOTA simple rules in a real-time setting and when using three-dimensional ultrasound volumes and digital clips. Ultrasound Obstet Gynecol 2014;44:95-100.

24. Timmerman D, Testa AC, Bourne T, Ameye L, Jurkovic D, Van Holsbeke C, et al. Simple ultrasound-based rules for the diagnosis of ovarian cancer. Ultrasound Obstet Gynecol 2008;31:681-90.

25. Timmerman D, Ameye L, Fischerova D, Epstein E, Melis GB, Guerriero S, et al. Simple ultrasound rules to distinguish between benign and malignant adnexal masses before surgery: prospective validation by IOTA group. BMJ 2010;341:c6839.


26. Van Holsbeke C, Van Calster B, Testa AC, Domali E, Lu C, Van Huffel S, et al. Prospective internal validation of mathematical models to predict malignancy in adnexal masses: results from the international ovarian tumor analysis study. Clin Cancer Res 2009;15:684-91.

27. Nunes N, Yazbek J, Ambler G, Hoo W, Naftalin J, Jurkovic D. Prospective evaluation of the IOTA logistic regression model LR2 for the diagnosis of ovarian cancer. Ultrasound Obstet Gynecol 2012;40:355-9.

28. Nunes N, Ambler G, Hoo WL, Naftalin J, Foo X, Widschwendter M, et al. A prospective validation of the IOTA logistic regression models (LR1 and LR2) in comparison to subjective pattern recognition for the diagnosis of ovarian cancer. Int J Gynecol Cancer 2013;23:1583-9.

Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from

Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906

Table 1 Variables for which inter-observer reproducibility was estimated _______________________________________________________________________ Variables included in logistic regression models LR1 and LR2 or of importance for these models Maximum diameter of adnexal mass mm Maximum diameter of largest solid component mm Maximum diameter of largest solid component le50 gt50 mm Presence of an entirely solid adnexal mass yes no Papillary projections present yes no Irregular internal cyst walls yes no Acoustic shadows yes no Color Doppler signals in papillary projection yes no Color score 1 2 3 4 Tenderness of adnexal mass at scan yes no Ascites yes no Other IOTA variables Continuous IOTA variables Mean diameter of adnexal mass mm Mean diameter of largest solid component mm Height of the largest papillary projection mm Categorical IOTA variables Type of tumor unilocular unilocular solid multilocular multilocular solid solid Mean diameter of adnexal mass le40 41-60 61-80 81-100 and gt100 mm Number of cyst locules 0 1 2 3 4 5 6-10 11-20 and gt20 le10 gt10dagger Septum present yes no Incomplete septum present yes no Solid component yes no Papillary projections 0 1 2 3 ge4 1 2 3 ge4Dagger lt4 gt4dagger Echogenicity of cyst fluid anechoic low level ground glass mixed no cyst fluid Color Doppler signals detectable yes no Ovarian crescent sign yes no Diagnosis based on LR1 and LR2 Calculated risk of malignancy using LR1 le10 gt10 Calculated risk of malignancy using LR2 le10 gt10 ___________________________________________________________________________ IOTA international ovarian tumor analysis group LR1 logistic regression model 1 LR2 logistic regression model 2 The risk of malignancy using the LR1 is derived as y =1(1+e-z) where z = -67468 + 15985 (a) ndash 09983 (b) + 00326 (c) + 000841 (d) ndash 08577 (e) + 15513 (f) + 11737 (g) + 09281 (h) + 00496 (i) + 11421 (j) ndash 23550 (k) + 04916 (l) and e is the mathematical constant and base value of natural logarithms The 12 
variables included in LR1 are (a) personal history of ovarian cancer (yes = 1 no = 0) (b) current hormonal therapy (yes = 1 no = 0) (c) age of the patient (in years) (d) maximum diameter of the lesion (in millimeters) (e) mass tender at scan (yes = 1 no = 0) (f) the presence of ascites (yes = 1 no = 0) (g) the presence of blood flow within a solid papillary projection (yes = 1 no = 0) (h) the presence of a purely solid tumor (yes = 1 no = 0) (i) maximal diameter of the solid component (expressed in millimeters but with no increase gt 50 mm) (j) irregular internal cyst walls (yes = 1 no = 0) (k) the presence of acoustic shadows (yes = 1 no = 0) and (l) color score (1 2 3 or 4) The risk of malignancy using LR2 is derived as y = 1(1 + e-z) where z = - 53718 + 00354 (c) + 16159 (f) + 11768 (g) + 00697 (i) + 09586 (j) - 29486 (k)12 A calculated risk of malignancy of more than 10 classified the adnexal mass as malignant dagger includes 0 Dagger includes only cases where papillary projections were registered at both examinations

Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from

Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906

Table 2 Histological diagnoses of the masses

___________________________________________________________________________ Histological diagnosis of tumor Number ___________________________________________________________________________ Benign tumors 94

Benign simple cyst 7

Endometrioma 10

Dermoid cyst 16

Serous cystadenoma 16

Mucinous cystadenoma 18

Myomafibroma 9

Cystadenofibroma 11

Paraovarian cyst 5

Sactosalpinx chronic salpingitis 1

Leydig cell tumor 1

Borderline tumors 4

Serous 2

Mucinous 1

Endometrioid 1

Invasive malignancy 19

Primary ovarian adenocarcinoma 13

Granulosa cell tumor 3

Dysgerminoma 1

Leiomyosarcoma 1

Malignant aggressive B-cell lymphoma 1

___________________________________________________________________________

Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from

Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906

Table 3 Inter-observer reproducibility of continuous variables used in the risk calculation models LR1 and LR2 and of other continuous variables

used to describe adnexal masses

Measurement results

(both sonologists)

Difference in mm between two measurements

made by sonologists 1 and 2a

Intra-CC

Parameter Median

Range

Mean 95 CI Limits of

agreement

Point estimate

(95 CI)

Variables used in

models LR1 and LR2

Maximum diameter of

adnexal mass mm

70 (n=234)

10 ndash 313

380 (n=117)

112 ndash 648

-2524 ndash 3282

0958 (0937 ndash 0971)

Maximum diameter of

largest solid component

mmb

2950 (n=122)

5 ndash 180

192 (n=61)

-174 ndash 558

-2666 ndash 3050

0942 (0905 ndash 0-964)

Other variables used to

describe adnexal mass

Mean diameter

of adnexal mass mm

585 (n=234)

9 ndash 240

105 (n=117)

-015 ndash 195

-861 ndash 1072

0971 (0958 ndash 0980)

Mean diameter

of largest solid

component mmb

22 (n=122)

4 ndash 156

059 (n=61)

-182 ndash 298

-1816 ndash 1932

0962 (0937 ndash 0977)

Height of largest papillary

projection mmc

8 (n=42)

3 ndash 25

-051 (n=21)

-293 ndash 191

-1161 ndash 1059

0609 (0245 ndash 0821)

a the measurements made by sonologist 2 were subtracted from the measurements made by sonologist 1

CI confidence interval Intra-CC intra-class correlation coefficient bincludes only cases where solid components were registered by both observers

c includes only cases where papillary projections were registered by both observers using logarithmically transformed data the results were as

follows mean difference 103 (95CI 078 ndash 136) limits of agreement 031 ndash 361 Intra-CC 0547 (0153 ndash 0789) these results can be used for

comparison with our published results on inter-observer agreement when analyzing ultrasound volumes obtained by three-dimensional

ultrasound15

Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from

Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906

Table 4 Inter-observer agreement for categorical variables included in the risk calculation models LR1 and LR2 and for other categorical variables used to describe adnexal masses

Agreement Kappa value

Variables used in models LR1 and LR2 Presence of entirely solid tumor (yes no) 97 (114117) 091 Maximum diameter of largest solid component (le50 mm gt50 mm) 95 (111117) 083 Maximum diameter of largest solid component (le50 mm gt50 mm)a 92 (5661) 082 Tenderness of adnexal mass (yes no) 97 (113117) 078 Ascites (yes no) 97 (113117) 073 Papillary projections (yes no) 87 (102117) 065 Solid component (yes no) 84 (98117) 067 Irregular cyst wall (yes no) 79 (92117) 056 Acoustic shadows (yes no) 85 (99117) 058 Color Doppler signals in papillary projections (yes no)b 90 (105117) 048 Color Doppler signals in papillary projections (yes no)c 86 (1821) 071 Color score (1 2 3 4) 40 (47117) 036d Other variables used to describe adnexal masses Tumor type (unilocular unilocular solid multilocular multilocular solid solid) 97 (113117) 070 Mean diameter of tumor (mm) le40 41-60 61-80 81-100 gt100 66 (77117) 076d Number of locules 0 1 2 3 4 5 6-10 11-20 gt20 58 (68117) 079d

le10 gt10e 94 (110117) 080 Septum (yes no) 88 (103117) 076 Incomplete septum (yes no) 98 (115117) para Number of papillary projections 0 1 2 3 ge4 81 (95117) 064d 1 2 3 ge4c 67 (1421) 063d lt4 gt4e 96 (112117) 068 Echogenicity of cyst fluid (anechoic low level ground glass mixed no fluid) 67 (78117) 056 Color Doppler signals detectable (yes no) 84 (98117) 030 Ovarian crescent sign (yes no) 78 (91117) 049 Fluid in Douglas pouch (yes no) 87 (102117) 072 ____________________________________________________________________________________________ a includes only cases where solid components were registered at both examinations b This variable reflects disagreement both with regard to the presence of papillary projections and the presence of blood flow in papillary projection (ie if one observer recorded the presence of papillary projections and the other not there was disagreement with regard to this variable and if both observes recorded the presence of papillary projections but disagreed with regard to the presence of blood flow in the papillary projections there was disagreement with regard to this variable) c includes only cases where papillary projections were registered at both examinations d weighted Kappa index e includes 0 para absent kappa values are explained by asymmetric field tables making it impossible to calculate them

Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from

Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906

Table 5 Inter-observer reproducibility of the risk of malignancy calculated using the risk calculation models LR1 and LR2

Calculated risk of malignancy

(both sonologists)

Difference between the risk calculated

by sonologist 1 and 2

Intra-CC

Parameter Median

Range

Mean 95 CI Limits of

agreement

Point estimate

(95 CI)

Risk of malignancy

calculated using LR1

785 (n=234)

010 ndash 9910

-053 (n=117)

-307 ndash 201

-2805 ndash 2699

0911 (0874 ndash 0937)

Risk of malignancy

calculated using LR2

665 (n=234)

010 ndash 9840

002 (n=117)

-306 ndash 310

-3322 ndash 3326

0832 (0766 ndash 0880)

CI confidence interval Intra-CC intra-class correlation coefficient

Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from

Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906

Legends for figure

Figure 1 a) Scatterplot showing the relationship between inter-observer difference (observer

1 minus observer 2) in calculated risk and magnitude of calculated risk when using logistic

regression model LR1 The plot manifests a diamond shape the differences being smallest for

the lowest and highest risks For risks lt 25 and gt 95 the differences are very small

LOA limits of agreement b) Scatterplot showing the relationship between inter-observer

difference in calculated risk and magnitude of calculated risk when using logistic regression

model LR2 The plot manifests a diamond shape the differences being smallest for the lowest

and highest risks

Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from

Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906

Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from

Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906

Published OnlineFirst November 25 2014Clin Cancer Res Povilas Sladkevicius and Lil Valentin malignancy using logistic regression modelsappearance of adnexal masses and in calculating the risk of Inter-observer agreement in describing the ultrasound

Updated version

1011581078-0432CCR-14-0906doi

Access the most recent version of this article at

Material

Supplementary

httpclincancerresaacrjournalsorgcontentsuppl201411261078-0432CCR-14-0906DC1

Access the most recent supplemental material at

Manuscript

Authoredited Author manuscripts have been peer reviewed and accepted for publication but have not yet been

E-mail alerts related to this article or journalSign up to receive free email-alerts

Subscriptions

Reprints and

pubsaacrorgDepartment at

To order reprints of this article or to subscribe to the journal contact the AACR Publications

Permissions

Rightslink site Click on Request Permissions which will take you to the Copyright Clearance Centers (CCC)

httpclincancerresaacrjournalsorgcontentearly201411251078-0432CCR-14-0906To request permission to re-use all or part of this article use this link

Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from

Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906

  • Article File
  • Article File
  • Article File
  • Table 1
  • Table 2
  • Table 3
  • Table 4
  • Table 5
  • Article File
  • Figure 1

agreement in ultrasound assessment of ovarian masses 17

17

10. Alcazar JL, Jurado M. Prospective evaluation of logistic model based on sonographic morphologic and color Doppler findings developed to predict adnexal malignancy. J Ultrasound Med 1999;18:837-42.

11. Timmerman D, Valentin L, Bourne TH, Collins WP, Verrelst H, Vergote I. Terms, definitions and measurements to describe the sonographic features of adnexal tumors: a consensus opinion from the International Ovarian Tumor Analysis (IOTA) Group. Ultrasound Obstet Gynecol 2000;16:500-5.

12. Timmerman D, Testa AC, Bourne T, Ferrazzi E, Ameye L, Konstantinovic ML, et al. Logistic regression model to distinguish between the benign and malignant adnexal mass before surgery: a multicenter study by the International Ovarian Tumor Analysis Group. J Clin Oncol 2005;34:8794-801.

13. Kaijser J, Bourne T, Valentin L, Sayasneh A, Van Holsbeke C, Vergote I, et al. Improving strategies for diagnosing ovarian cancer: a summary of the International Ovarian Tumor Analysis (IOTA) studies. Ultrasound Obstet Gynecol 2013;41:9-20.

14. Timmerman D, Van Calster B, Testa AC, Guerriero S, Fischerova D, Lissoni AA, et al. Ovarian cancer prediction in adnexal masses using ultrasound-based logistic regression models: a temporal and external validation study by the IOTA group. Ultrasound Obstet Gynecol 2010;36:226-34.

15. Sladkevicius P, Valentin L. Intra- and inter-observer agreement when describing adnexal masses using the International Ovarian Tumour Analysis (IOTA) terms and definitions: a study on three-dimensional (3D) ultrasound volumes. Ultrasound Obstet Gynecol 2013;41:318-27.

16. Heintz APM, Odicino F, Maisonneuve P, Beller U, Benedet JL, Creasman WT, et al. Carcinoma of the Ovary. 25th Annual Report on the Results of Treatment in Gynecological Cancer. Int J Gynecol Obstet 2003;83:S135-S166 (suppl 1).

17. Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas 1960;20:37-46.

18. Kundel HL, Polansky M. Measurement of observer agreement. Radiology 2003;228:303-8.

19. Brennan P, Silman A. Statistical methods for assessing observer variability in clinical measures. BMJ 1992;304:1491-4.

20. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986;1:307-10.

21. Bartlett JW, Frost C. Reliability, repeatability and reproducibility: analysis of measurement errors in continuous variables. Ultrasound Obstet Gynecol 2008;31:466-75.

22. Kottner J, Audigé L, Brorson S, Donner A, Gajewski BJ, Hróbjartsson A, et al. Guidelines for Reporting Reliability and Agreement Studies (GRRAS) were proposed. J Clin Epidemiol 2011;64:96-106.

23. Ruiz de Gauna B, Sanchez P, Pineda L, Utrilla-Layna J, Juez L, Alcázar JL. Inter-observer agreement with regard to describing adnexal masses using the IOTA simple rules in a real-time setting and when using three-dimensional ultrasound volumes and digital clips. Ultrasound Obstet Gynecol 2014;44:95-100.

24. Timmerman D, Testa AC, Bourne T, Ameye L, Jurkovic D, Van Holsbeke C, et al. Simple ultrasound-based rules for the diagnosis of ovarian cancer. Ultrasound Obstet Gynecol 2008;31:681-90.

25. Timmerman D, Ameye L, Fischerova D, Epstein E, Melis GB, Guerriero S, et al. Simple ultrasound rules to distinguish between benign and malignant adnexal masses before surgery: prospective validation by IOTA group. BMJ 2010;341:c6839.

26. Van Holsbeke C, Van Calster B, Testa AC, Domali E, Lu C, Van Huffel S, et al. Prospective internal validation of mathematical models to predict malignancy in adnexal masses: results from the international ovarian tumor analysis study. Clin Cancer Res 2009;15:684-91.

27. Nunes N, Yazbek J, Ambler G, Hoo W, Naftalin J, Jurkovic D. Prospective evaluation of the IOTA logistic regression model LR2 for the diagnosis of ovarian cancer. Ultrasound Obstet Gynecol 2012;40:355-9.

28. Nunes N, Ambler G, Hoo WL, Naftalin J, Foo X, Widschwendter M, et al. A prospective validation of the IOTA logistic regression models (LR1 and LR2) in comparison to subjective pattern recognition for the diagnosis of ovarian cancer. Int J Gynecol Cancer 2013;23:1583-9.

Table 1. Variables for which inter-observer reproducibility was estimated
___________________________________________________________________________
Variables included in logistic regression models LR1 and LR2 or of importance for these models
  Maximum diameter of adnexal mass, mm
  Maximum diameter of largest solid component, mm
  Maximum diameter of largest solid component: ≤50, >50 mm
  Presence of an entirely solid adnexal mass: yes, no
  Papillary projections present: yes, no
  Irregular internal cyst walls: yes, no
  Acoustic shadows: yes, no
  Color Doppler signals in papillary projection: yes, no
  Color score: 1, 2, 3, 4
  Tenderness of adnexal mass at scan: yes, no
  Ascites: yes, no
Other IOTA variables
  Continuous IOTA variables
    Mean diameter of adnexal mass, mm
    Mean diameter of largest solid component, mm
    Height of the largest papillary projection, mm
  Categorical IOTA variables
    Type of tumor: unilocular, unilocular solid, multilocular, multilocular solid, solid
    Mean diameter of adnexal mass: ≤40, 41-60, 61-80, 81-100, and >100 mm
    Number of cyst locules: 0, 1, 2, 3, 4, 5, 6-10, 11-20, and >20; ≤10, >10 †
    Septum present: yes, no
    Incomplete septum present: yes, no
    Solid component: yes, no
    Papillary projections: 0, 1, 2, 3, ≥4; 1, 2, 3, ≥4 ‡; <4, >4 †
    Echogenicity of cyst fluid: anechoic, low level, ground glass, mixed, no cyst fluid
    Color Doppler signals detectable: yes, no
    Ovarian crescent sign: yes, no
Diagnosis based on LR1 and LR2
  Calculated risk of malignancy using LR1: ≤10%, >10%
  Calculated risk of malignancy using LR2: ≤10%, >10%
___________________________________________________________________________
IOTA, international ovarian tumor analysis group; LR1, logistic regression model 1; LR2, logistic regression model 2.

The risk of malignancy using LR1 is derived as $y = 1/(1 + e^{-z})$, where

$$z = -6.7468 + 1.5985\,(a) - 0.9983\,(b) + 0.0326\,(c) + 0.00841\,(d) - 0.8577\,(e) + 1.5513\,(f) + 1.1737\,(g) + 0.9281\,(h) + 0.0496\,(i) + 1.1421\,(j) - 2.3550\,(k) + 0.4916\,(l)$$

and $e$ is the mathematical constant and base value of natural logarithms. The 12 variables included in LR1 are: (a) personal history of ovarian cancer (yes = 1, no = 0); (b) current hormonal therapy (yes = 1, no = 0); (c) age of the patient (in years); (d) maximum diameter of the lesion (in millimeters); (e) mass tender at scan (yes = 1, no = 0); (f) the presence of ascites (yes = 1, no = 0); (g) the presence of blood flow within a solid papillary projection (yes = 1, no = 0); (h) the presence of a purely solid tumor (yes = 1, no = 0); (i) maximal diameter of the solid component (expressed in millimeters, but with no increase > 50 mm); (j) irregular internal cyst walls (yes = 1, no = 0); (k) the presence of acoustic shadows (yes = 1, no = 0); and (l) color score (1, 2, 3, or 4).

The risk of malignancy using LR2 is derived as $y = 1/(1 + e^{-z})$, where

$$z = -5.3718 + 0.0354\,(c) + 1.6159\,(f) + 1.1768\,(g) + 0.0697\,(i) + 0.9586\,(j) - 2.9486\,(k)$$

(12). A calculated risk of malignancy of more than 10% classified the adnexal mass as malignant.

† includes 0. ‡ includes only cases where papillary projections were registered at both examinations.
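The LR1 and LR2 formulas in the footnote to Table 1 can be sketched in code as follows. This is a minimal illustration under stated assumptions, not the software used in the study: the function and parameter names are hypothetical, the coefficients are copied from the footnote, risks are returned as fractions between 0 and 1 (so the 10% cutoff becomes 0.10), and the phrase "no increase > 50 mm" is read as capping the solid-component diameter at 50 mm.

```python
import math


def _logistic(z):
    """Convert the linear predictor z into a risk, y = 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))


def lr1_z(history_ovarian_cancer, hormonal_therapy, age_years, max_lesion_mm,
          tender, ascites, papillary_flow, purely_solid, max_solid_mm,
          irregular_walls, acoustic_shadows, color_score):
    """Linear predictor z of LR1. Binary inputs are 1 (yes) or 0 (no)."""
    solid = min(max_solid_mm, 50.0)  # assumption: variable (i) is capped at 50 mm
    return (-6.7468
            + 1.5985 * history_ovarian_cancer  # (a)
            - 0.9983 * hormonal_therapy        # (b)
            + 0.0326 * age_years               # (c)
            + 0.00841 * max_lesion_mm          # (d)
            - 0.8577 * tender                  # (e)
            + 1.5513 * ascites                 # (f)
            + 1.1737 * papillary_flow          # (g)
            + 0.9281 * purely_solid            # (h)
            + 0.0496 * solid                   # (i)
            + 1.1421 * irregular_walls         # (j)
            - 2.3550 * acoustic_shadows        # (k)
            + 0.4916 * color_score)            # (l)


def lr2_z(age_years, ascites, papillary_flow, max_solid_mm,
          irregular_walls, acoustic_shadows):
    """Linear predictor z of LR2, which uses six of the LR1 variables."""
    solid = min(max_solid_mm, 50.0)  # same cap assumption as in LR1
    return (-5.3718
            + 0.0354 * age_years         # (c)
            + 1.6159 * ascites           # (f)
            + 1.1768 * papillary_flow    # (g)
            + 0.0697 * solid             # (i)
            + 0.9586 * irregular_walls   # (j)
            - 2.9486 * acoustic_shadows) # (k)


def lr1_risk(*args, **kwargs):
    """Risk of malignancy (0 to 1) according to LR1."""
    return _logistic(lr1_z(*args, **kwargs))


def lr2_risk(*args, **kwargs):
    """Risk of malignancy (0 to 1) according to LR2."""
    return _logistic(lr2_z(*args, **kwargs))


def is_malignant(risk, cutoff=0.10):
    """Classify as malignant when the calculated risk exceeds the 10% cutoff."""
    return risk > cutoff
```

For example, `lr2_risk(age_years=50, ascites=0, papillary_flow=0, max_solid_mm=0, irregular_walls=0, acoustic_shadows=0)` evaluates the logistic function at z = -5.3718 + 0.0354 × 50 = -3.6018, which lies below the 10% cutoff. Because every ultrasound variable enters z directly, any inter-observer disagreement on a variable shifts the calculated risk.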

Table 2. Histological diagnoses of the masses
___________________________________________________________________________
Histological diagnosis of tumor             Number
___________________________________________________________________________
Benign tumors                               94
  Benign simple cyst                        7
  Endometrioma                              10
  Dermoid cyst                              16
  Serous cystadenoma                        16
  Mucinous cystadenoma                      18
  Myoma/fibroma                             9
  Cystadenofibroma                          11
  Paraovarian cyst                          5
  Sactosalpinx, chronic salpingitis         1
  Leydig cell tumor                         1
Borderline tumors                           4
  Serous                                    2
  Mucinous                                  1
  Endometrioid                              1
Invasive malignancy                         19
  Primary ovarian adenocarcinoma            13
  Granulosa cell tumor                      3
  Dysgerminoma                              1
  Leiomyosarcoma                            1
  Malignant aggressive B-cell lymphoma      1
___________________________________________________________________________

Table 3. Inter-observer reproducibility of continuous variables used in the risk calculation models LR1 and LR2 and of other continuous variables used to describe adnexal masses
___________________________________________________________________________
For each parameter: measurement results (both sonologists), given as median and range; difference in mm between the two measurements made by sonologists 1 and 2 [a], given as mean, 95% CI and limits of agreement; and Intra-CC, given as point estimate (95% CI).

Variables used in models LR1 and LR2

Maximum diameter of adnexal mass, mm
  Measurement results: median 70 (n=234), range 10 – 313
  Difference: mean 3.80 (n=117), 95% CI 1.12 – 6.48, limits of agreement -25.24 – 32.82
  Intra-CC: 0.958 (0.937 – 0.971)

Maximum diameter of largest solid component, mm [b]
  Measurement results: median 29.50 (n=122), range 5 – 180
  Difference: mean 1.92 (n=61), 95% CI -1.74 – 5.58, limits of agreement -26.66 – 30.50
  Intra-CC: 0.942 (0.905 – 0.964)

Other variables used to describe adnexal mass

Mean diameter of adnexal mass, mm
  Measurement results: median 58.5 (n=234), range 9 – 240
  Difference: mean 1.05 (n=117), 95% CI -0.15 – 1.95, limits of agreement -8.61 – 10.72
  Intra-CC: 0.971 (0.958 – 0.980)

Mean diameter of largest solid component, mm [b]
  Measurement results: median 22 (n=122), range 4 – 156
  Difference: mean 0.59 (n=61), 95% CI -1.82 – 2.98, limits of agreement -18.16 – 19.32
  Intra-CC: 0.962 (0.937 – 0.977)

Height of largest papillary projection, mm [c]
  Measurement results: median 8 (n=42), range 3 – 25
  Difference: mean -0.51 (n=21), 95% CI -2.93 – 1.91, limits of agreement -11.61 – 10.59
  Intra-CC: 0.609 (0.245 – 0.821)
___________________________________________________________________________
CI, confidence interval; Intra-CC, intra-class correlation coefficient.
[a] The measurements made by sonologist 2 were subtracted from the measurements made by sonologist 1.
[b] Includes only cases where solid components were registered by both observers.
[c] Includes only cases where papillary projections were registered by both observers. Using logarithmically transformed data, the results were as follows: mean difference 1.03 (95% CI 0.78 – 1.36), limits of agreement 0.31 – 3.61, Intra-CC 0.547 (0.153 – 0.789). These results can be used for comparison with our published results on inter-observer agreement when analyzing ultrasound volumes obtained by three-dimensional ultrasound (15).
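The mean differences, confidence intervals and limits of agreement in Table 3 are linked arithmetically, and the link can be checked with a short sketch. It assumes the usual Bland-Altman construction (20): limits of agreement equal to the mean difference ± 1.96 SD of the differences, and a 95% CI of the mean equal to the mean ± 1.96 SD/√n. The source does not state the multiplier used, so 1.96 is an assumption, and the function names are hypothetical.

```python
import math


def sd_from_limits(loa_low, loa_high, z=1.96):
    """Back-calculate the SD of the differences from the limits of agreement,
    assuming limits = mean difference +/- z * SD."""
    return (loa_high - loa_low) / (2.0 * z)


def ci_of_mean_difference(mean_diff, loa_low, loa_high, n, z=1.96):
    """95% CI of the mean difference, assuming CI = mean +/- z * SD / sqrt(n)."""
    se = sd_from_limits(loa_low, loa_high, z) / math.sqrt(n)
    return mean_diff - z * se, mean_diff + z * se


# Maximum diameter of adnexal mass: mean difference 3.80 mm,
# limits of agreement -25.24 to 32.82 mm, n = 117 masses.
low, high = ci_of_mean_difference(3.80, -25.24, 32.82, 117)
print(f"{low:.2f} to {high:.2f}")  # prints "1.12 to 6.48"
```

Under these assumptions the sketch reproduces the interval reported for the maximum diameter of the adnexal mass (1.12 – 6.48 mm). It also shows why a narrow CI for the mean difference does not imply close agreement for an individual mass: the CI shrinks with √n, whereas the limits of agreement do not.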

Table 4. Inter-observer agreement for categorical variables included in the risk calculation models LR1 and LR2 and for other categorical variables used to describe adnexal masses
____________________________________________________________________________________________
Variable                                                          Agreement       Kappa value
____________________________________________________________________________________________
Variables used in models LR1 and LR2
Presence of entirely solid tumor (yes, no)                        97% (114/117)   0.91
Maximum diameter of largest solid component (≤50 mm, >50 mm)      95% (111/117)   0.83
Maximum diameter of largest solid component (≤50 mm, >50 mm) [a]  92% (56/61)     0.82
Tenderness of adnexal mass (yes, no)                              97% (113/117)   0.78
Ascites (yes, no)                                                 97% (113/117)   0.73
Papillary projections (yes, no)                                   87% (102/117)   0.65
Solid component (yes, no)                                         84% (98/117)    0.67
Irregular cyst wall (yes, no)                                     79% (92/117)    0.56
Acoustic shadows (yes, no)                                        85% (99/117)    0.58
Color Doppler signals in papillary projections (yes, no) [b]      90% (105/117)   0.48
Color Doppler signals in papillary projections (yes, no) [c]      86% (18/21)     0.71
Color score (1, 2, 3, 4)                                          40% (47/117)    0.36 [d]

Other variables used to describe adnexal masses
Tumor type (unilocular, unilocular solid, multilocular,           97% (113/117)   0.70
  multilocular solid, solid)
Mean diameter of tumor (mm): ≤40, 41-60, 61-80, 81-100, >100      66% (77/117)    0.76 [d]
Number of locules: 0, 1, 2, 3, 4, 5, 6-10, 11-20, >20             58% (68/117)    0.79 [d]
Number of locules: ≤10, >10 [e]                                   94% (110/117)   0.80
Septum (yes, no)                                                  88% (103/117)   0.76
Incomplete septum (yes, no)                                       98% (115/117)   ¶
Number of papillary projections: 0, 1, 2, 3, ≥4                   81% (95/117)    0.64 [d]
Number of papillary projections: 1, 2, 3, ≥4 [c]                  67% (14/21)     0.63 [d]
Number of papillary projections: <4, >4 [e]                       96% (112/117)   0.68
Echogenicity of cyst fluid (anechoic, low level, ground glass,    67% (78/117)    0.56
  mixed, no fluid)
Color Doppler signals detectable (yes, no)                        84% (98/117)    0.30
Ovarian crescent sign (yes, no)                                   78% (91/117)    0.49
Fluid in Douglas pouch (yes, no)                                  87% (102/117)   0.72
____________________________________________________________________________________________
[a] Includes only cases where solid components were registered at both examinations.
[b] This variable reflects disagreement both with regard to the presence of papillary projections and the presence of blood flow in papillary projections (i.e., if one observer recorded the presence of papillary projections and the other did not, there was disagreement with regard to this variable, and if both observers recorded the presence of papillary projections but disagreed with regard to the presence of blood flow in the papillary projections, there was disagreement with regard to this variable).
[c] Includes only cases where papillary projections were registered at both examinations.
[d] Weighted Kappa index.
[e] Includes 0.
¶ Absent kappa values are explained by asymmetric field tables making it impossible to calculate them.
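Table 4 reports percentage agreement together with Cohen's kappa (17), which corrects the observed agreement for the agreement expected by chance. The sketch below computes unweighted kappa from a cross-tabulation of the two observers' classifications. The function name and the cell counts are hypothetical: the source does not report the underlying cross-tabulations, and the counts were chosen only so that overall agreement is 114 of 117, as for the presence of an entirely solid tumor. The weighted kappa used for ordinal variables [d] is not covered.

```python
def cohen_kappa(table):
    """Unweighted Cohen's kappa for a square agreement table.

    table[i][j] is the number of masses placed in category i by observer 1
    and in category j by observer 2.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    # Observed agreement: proportion of masses on the diagonal.
    p_observed = sum(table[i][i] for i in range(k)) / n
    # Chance agreement: product of the two observers' marginal proportions.
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    p_expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical counts (NOT from the study): rows = observer 1 (yes, no),
# columns = observer 2 (yes, no); 19 + 95 = 114 of 117 masses agree.
hypothetical_table = [[19, 2],
                      [1, 95]]
print(round(cohen_kappa(hypothetical_table), 2))  # prints 0.91
```

With the same 97% raw agreement, kappa would be lower if nearly all masses fell into one category, because chance agreement would then be higher. This is one reason why variables with similar percentage agreement in Table 4 can have quite different kappa values.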

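Table 5. Inter-observer reproducibility of the risk of malignancy calculated using the risk calculation models LR1 and LR2
___________________________________________________________________________
For each model: calculated risk of malignancy in % (both sonologists), given as median and range; difference between the risk calculated by sonologists 1 and 2, given as mean, 95% CI and limits of agreement; and Intra-CC, given as point estimate (95% CI).

Risk of malignancy calculated using LR1
  Calculated risk: median 7.85 (n=234), range 0.10 – 99.10
  Difference: mean -0.53 (n=117), 95% CI -3.07 – 2.01, limits of agreement -28.05 – 26.99
  Intra-CC: 0.911 (0.874 – 0.937)

Risk of malignancy calculated using LR2
  Calculated risk: median 6.65 (n=234), range 0.10 – 98.40
  Difference: mean 0.02 (n=117), 95% CI -3.06 – 3.10, limits of agreement -33.22 – 33.26
  Intra-CC: 0.832 (0.766 – 0.880)
___________________________________________________________________________
CI, confidence interval; Intra-CC, intra-class correlation coefficient.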

Legends for figure

Figure 1. a) Scatterplot showing the relationship between inter-observer difference (observer 1 minus observer 2) in calculated risk and magnitude of calculated risk when using logistic regression model LR1. The plot manifests a diamond shape, the differences being smallest for the lowest and highest risks. For risks < 25 and > 95 the differences are very small. LOA, limits of agreement. b) Scatterplot showing the relationship between inter-observer difference in calculated risk and magnitude of calculated risk when using logistic regression model LR2. The plot manifests a diamond shape, the differences being smallest for the lowest and highest risks.

Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from

Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906

Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from

Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906

Published OnlineFirst November 25 2014Clin Cancer Res Povilas Sladkevicius and Lil Valentin malignancy using logistic regression modelsappearance of adnexal masses and in calculating the risk of Inter-observer agreement in describing the ultrasound

Updated version

1011581078-0432CCR-14-0906doi

Access the most recent version of this article at

Material

Supplementary

httpclincancerresaacrjournalsorgcontentsuppl201411261078-0432CCR-14-0906DC1

Access the most recent supplemental material at

Manuscript

Authoredited Author manuscripts have been peer reviewed and accepted for publication but have not yet been

E-mail alerts related to this article or journalSign up to receive free email-alerts

Subscriptions

Reprints and

pubsaacrorgDepartment at

To order reprints of this article or to subscribe to the journal contact the AACR Publications

Permissions

Rightslink site Click on Request Permissions which will take you to the Copyright Clearance Centers (CCC)

httpclincancerresaacrjournalsorgcontentearly201411251078-0432CCR-14-0906To request permission to re-use all or part of this article use this link

Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from

Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906

  • Article File
  • Article File
  • Article File
  • Table 1
  • Table 2
  • Table 3
  • Table 4
  • Table 5
  • Article File
  • Figure 1

agreement in ultrasound assessment of ovarian masses 18

18

17 Cohen J A coefficient of agreement for nominal scales Educ Psychol Meas 196020

37ndash46

18 Kundel HL Polansky M Measurement of observer agreement Radiology 2003228

303-8

19 Brennan P Silman A Statistical methods for assessing observer variability in clinical

measures BMJ 1992304 1491-4

20 Bland JM Altman DG Statistical methods for assessing agreement between two

methods of clinical measurement Lancet 19861307-10

21 Bartlett JW Frost C Reliability repeatability and reproducibility analysis of

measurement errors in continuous variables Ultrasound Obstet Gynecol 200831466-

75

22 Kottner J Audigeacute L Brorson S Donner A Gajewski BJ Hroacutebjartsson A et al

Guidelines for Reporting Reliability and Agreement Studies (GRRAS) were proposed J

Clin Epidemiol 20116496-106

23 Ruiz de Gauna B Sanchez P Pineda L Utrilla-Layna J Juez L Alcaacutezar JLInter-

observer agreement with regard to describing adnexal masses using the IOTA simple

rules in a real-time setting and when using three-dimensional ultrasound volumes and

digital clips Ultrasound Obstet Gynecol 20144495-100

24 Timmerman D Testa AC Bourne T Ameye L Jurkovic D Van Holsbeke C et al

Simple ultrasound-based rules for the diagnosis of ovarian cancer Ultrasound Obstet

Gynecol 200831681-90

25 Timmerman D Ameye L Fischerova D Epstein E Melis GB Guerriero S et al

Simple ultrasound rules to distinguish between benign and malignant adnexal masses

before surgery prospective validation by IOTA group BMJ 2010341c6839

Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from

Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906

agreement in ultrasound assessment of ovarian masses 19

19

26 Van Holsbeke C Van Calster B Testa AC Domali E Lu C Van Huffel S et al

Prospective internal validation of mathematical models to predict malignancy in

adnexal masses results from the international ovarian tumor analysis study Clin Cancer

Res 200915684-91

27 Nunes N Yazbek J Ambler G Hoo W Naftalin J Jurkovic D Prospective evaluation

of the IOTA logistic regression model LR2 for the diagnosis of ovarian cancer

Ultrasound Obstet Gynecol 201240355-9

28 Nunes N Ambler G Hoo WL Naftalin J Foo X Widschwendter M et al

A prospective validation of the IOTA logistic regression models (LR1 and LR2) in

comparison to subjective pattern recognition for the diagnosis of ovarian cancer

Int J Gynecol Cancer 2013231583-9

Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from

Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906

Table 1 Variables for which inter-observer reproducibility was estimated _______________________________________________________________________ Variables included in logistic regression models LR1 and LR2 or of importance for these models Maximum diameter of adnexal mass mm Maximum diameter of largest solid component mm Maximum diameter of largest solid component le50 gt50 mm Presence of an entirely solid adnexal mass yes no Papillary projections present yes no Irregular internal cyst walls yes no Acoustic shadows yes no Color Doppler signals in papillary projection yes no Color score 1 2 3 4 Tenderness of adnexal mass at scan yes no Ascites yes no Other IOTA variables Continuous IOTA variables Mean diameter of adnexal mass mm Mean diameter of largest solid component mm Height of the largest papillary projection mm Categorical IOTA variables Type of tumor unilocular unilocular solid multilocular multilocular solid solid Mean diameter of adnexal mass le40 41-60 61-80 81-100 and gt100 mm Number of cyst locules 0 1 2 3 4 5 6-10 11-20 and gt20 le10 gt10dagger Septum present yes no Incomplete septum present yes no Solid component yes no Papillary projections 0 1 2 3 ge4 1 2 3 ge4Dagger lt4 gt4dagger Echogenicity of cyst fluid anechoic low level ground glass mixed no cyst fluid Color Doppler signals detectable yes no Ovarian crescent sign yes no Diagnosis based on LR1 and LR2 Calculated risk of malignancy using LR1 le10 gt10 Calculated risk of malignancy using LR2 le10 gt10 ___________________________________________________________________________ IOTA international ovarian tumor analysis group LR1 logistic regression model 1 LR2 logistic regression model 2 The risk of malignancy using the LR1 is derived as y =1(1+e-z) where z = -67468 + 15985 (a) ndash 09983 (b) + 00326 (c) + 000841 (d) ndash 08577 (e) + 15513 (f) + 11737 (g) + 09281 (h) + 00496 (i) + 11421 (j) ndash 23550 (k) + 04916 (l) and e is the mathematical constant and base value of natural logarithms The 12 
variables included in LR1 are (a) personal history of ovarian cancer (yes = 1 no = 0) (b) current hormonal therapy (yes = 1 no = 0) (c) age of the patient (in years) (d) maximum diameter of the lesion (in millimeters) (e) mass tender at scan (yes = 1 no = 0) (f) the presence of ascites (yes = 1 no = 0) (g) the presence of blood flow within a solid papillary projection (yes = 1 no = 0) (h) the presence of a purely solid tumor (yes = 1 no = 0) (i) maximal diameter of the solid component (expressed in millimeters but with no increase gt 50 mm) (j) irregular internal cyst walls (yes = 1 no = 0) (k) the presence of acoustic shadows (yes = 1 no = 0) and (l) color score (1 2 3 or 4) The risk of malignancy using LR2 is derived as y = 1(1 + e-z) where z = - 53718 + 00354 (c) + 16159 (f) + 11768 (g) + 00697 (i) + 09586 (j) - 29486 (k)12 A calculated risk of malignancy of more than 10 classified the adnexal mass as malignant dagger includes 0 Dagger includes only cases where papillary projections were registered at both examinations

Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from

Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906

Table 2 Histological diagnoses of the masses

___________________________________________________________________________ Histological diagnosis of tumor Number ___________________________________________________________________________ Benign tumors 94

Benign simple cyst 7

Endometrioma 10

Dermoid cyst 16

Serous cystadenoma 16

Mucinous cystadenoma 18

Myomafibroma 9

Cystadenofibroma 11

Paraovarian cyst 5

Sactosalpinx chronic salpingitis 1

Leydig cell tumor 1

Borderline tumors 4

Serous 2

Mucinous 1

Endometrioid 1

Invasive malignancy 19

Primary ovarian adenocarcinoma 13

Granulosa cell tumor 3

Dysgerminoma 1

Leiomyosarcoma 1

Malignant aggressive B-cell lymphoma 1

___________________________________________________________________________

Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from

Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906

Table 3 Inter-observer reproducibility of continuous variables used in the risk calculation models LR1 and LR2 and of other continuous variables

used to describe adnexal masses

Measurement results

(both sonologists)

Difference in mm between two measurements

made by sonologists 1 and 2a

Intra-CC

Parameter Median

Range

Mean 95 CI Limits of

agreement

Point estimate

(95 CI)

Variables used in

models LR1 and LR2

Maximum diameter of

adnexal mass mm

70 (n=234)

10 ndash 313

380 (n=117)

112 ndash 648

-2524 ndash 3282

0958 (0937 ndash 0971)

Maximum diameter of

largest solid component

mmb

2950 (n=122)

5 ndash 180

192 (n=61)

-174 ndash 558

-2666 ndash 3050

0942 (0905 ndash 0-964)

Other variables used to

describe adnexal mass

Mean diameter

of adnexal mass mm

585 (n=234)

9 ndash 240

105 (n=117)

-015 ndash 195

-861 ndash 1072

0971 (0958 ndash 0980)

Mean diameter

of largest solid

component mmb

22 (n=122)

4 ndash 156

059 (n=61)

-182 ndash 298

-1816 ndash 1932

0962 (0937 ndash 0977)

Height of largest papillary

projection mmc

8 (n=42)

3 ndash 25

-051 (n=21)

-293 ndash 191

-1161 ndash 1059

0609 (0245 ndash 0821)

a the measurements made by sonologist 2 were subtracted from the measurements made by sonologist 1

CI confidence interval Intra-CC intra-class correlation coefficient bincludes only cases where solid components were registered by both observers

c includes only cases where papillary projections were registered by both observers using logarithmically transformed data the results were as

follows mean difference 103 (95CI 078 ndash 136) limits of agreement 031 ndash 361 Intra-CC 0547 (0153 ndash 0789) these results can be used for

comparison with our published results on inter-observer agreement when analyzing ultrasound volumes obtained by three-dimensional

ultrasound15

Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from

Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906

Table 4 Inter-observer agreement for categorical variables included in the risk calculation models LR1 and LR2 and for other categorical variables used to describe adnexal masses

Agreement Kappa value

Variables used in models LR1 and LR2 Presence of entirely solid tumor (yes no) 97 (114117) 091 Maximum diameter of largest solid component (le50 mm gt50 mm) 95 (111117) 083 Maximum diameter of largest solid component (le50 mm gt50 mm)a 92 (5661) 082 Tenderness of adnexal mass (yes no) 97 (113117) 078 Ascites (yes no) 97 (113117) 073 Papillary projections (yes no) 87 (102117) 065 Solid component (yes no) 84 (98117) 067 Irregular cyst wall (yes no) 79 (92117) 056 Acoustic shadows (yes no) 85 (99117) 058 Color Doppler signals in papillary projections (yes no)b 90 (105117) 048 Color Doppler signals in papillary projections (yes no)c 86 (1821) 071 Color score (1 2 3 4) 40 (47117) 036d Other variables used to describe adnexal masses Tumor type (unilocular unilocular solid multilocular multilocular solid solid) 97 (113117) 070 Mean diameter of tumor (mm) le40 41-60 61-80 81-100 gt100 66 (77117) 076d Number of locules 0 1 2 3 4 5 6-10 11-20 gt20 58 (68117) 079d

le10 gt10e 94 (110117) 080 Septum (yes no) 88 (103117) 076 Incomplete septum (yes no) 98 (115117) para Number of papillary projections 0 1 2 3 ge4 81 (95117) 064d 1 2 3 ge4c 67 (1421) 063d lt4 gt4e 96 (112117) 068 Echogenicity of cyst fluid (anechoic low level ground glass mixed no fluid) 67 (78117) 056 Color Doppler signals detectable (yes no) 84 (98117) 030 Ovarian crescent sign (yes no) 78 (91117) 049 Fluid in Douglas pouch (yes no) 87 (102117) 072 ____________________________________________________________________________________________ a includes only cases where solid components were registered at both examinations b This variable reflects disagreement both with regard to the presence of papillary projections and the presence of blood flow in papillary projection (ie if one observer recorded the presence of papillary projections and the other not there was disagreement with regard to this variable and if both observes recorded the presence of papillary projections but disagreed with regard to the presence of blood flow in the papillary projections there was disagreement with regard to this variable) c includes only cases where papillary projections were registered at both examinations d weighted Kappa index e includes 0 para absent kappa values are explained by asymmetric field tables making it impossible to calculate them

Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from

Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906

Table 5 Inter-observer reproducibility of the risk of malignancy calculated using the risk calculation models LR1 and LR2

Calculated risk of malignancy

(both sonologists)

Difference between the risk calculated

by sonologist 1 and 2

Intra-CC

Parameter Median

Range

Mean 95 CI Limits of

agreement

Point estimate

(95 CI)

Risk of malignancy

calculated using LR1

785 (n=234)

010 ndash 9910

-053 (n=117)

-307 ndash 201

-2805 ndash 2699

0911 (0874 ndash 0937)

Risk of malignancy

calculated using LR2

665 (n=234)

010 ndash 9840

002 (n=117)

-306 ndash 310

-3322 ndash 3326

0832 (0766 ndash 0880)

CI confidence interval Intra-CC intra-class correlation coefficient

Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from

Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906

Legends for figure

Figure 1. a) Scatterplot showing the relationship between inter-observer difference (observer 1 minus observer 2) in calculated risk and magnitude of calculated risk when using logistic regression model LR1. The plot manifests a diamond shape, the differences being smallest for the lowest and highest risks. For risks < 25 and > 95 the differences are very small. LOA, limits of agreement. b) Scatterplot showing the relationship between inter-observer difference in calculated risk and magnitude of calculated risk when using logistic regression model LR2. The plot manifests a diamond shape, the differences being smallest for the lowest and highest risks.

Povilas Sladkevicius and Lil Valentin. Inter-observer agreement in describing the ultrasound appearance of adnexal masses and in calculating the risk of malignancy using logistic regression models. Clin Cancer Res. Published OnlineFirst November 25, 2014. doi: 10.1158/1078-0432.CCR-14-0906

26. Van Holsbeke C, Van Calster B, Testa AC, Domali E, Lu C, Van Huffel S, et al. Prospective internal validation of mathematical models to predict malignancy in adnexal masses: results from the international ovarian tumor analysis study. Clin Cancer Res 2009;15:684-91.

27. Nunes N, Yazbek J, Ambler G, Hoo W, Naftalin J, Jurkovic D. Prospective evaluation of the IOTA logistic regression model LR2 for the diagnosis of ovarian cancer. Ultrasound Obstet Gynecol 2012;40:355-9.

28. Nunes N, Ambler G, Hoo WL, Naftalin J, Foo X, Widschwendter M, et al. A prospective validation of the IOTA logistic regression models (LR1 and LR2) in comparison to subjective pattern recognition for the diagnosis of ovarian cancer. Int J Gynecol Cancer 2013;23:1583-9.

Table 1. Variables for which inter-observer reproducibility was estimated
_______________________________________________________________________
Variables included in logistic regression models LR1 and LR2 or of importance for these models
  Maximum diameter of adnexal mass, mm
  Maximum diameter of largest solid component, mm
  Maximum diameter of largest solid component: ≤50, >50 mm
  Presence of an entirely solid adnexal mass: yes, no
  Papillary projections present: yes, no
  Irregular internal cyst walls: yes, no
  Acoustic shadows: yes, no
  Color Doppler signals in papillary projection: yes, no
  Color score: 1, 2, 3, 4
  Tenderness of adnexal mass at scan: yes, no
  Ascites: yes, no
Other IOTA variables
  Continuous IOTA variables
    Mean diameter of adnexal mass, mm
    Mean diameter of largest solid component, mm
    Height of the largest papillary projection, mm
  Categorical IOTA variables
    Type of tumor: unilocular, unilocular solid, multilocular, multilocular solid, solid
    Mean diameter of adnexal mass: ≤40, 41-60, 61-80, 81-100 and >100 mm
    Number of cyst locules: 0, 1, 2, 3, 4, 5, 6-10, 11-20 and >20; ≤10, >10 †
    Septum present: yes, no
    Incomplete septum present: yes, no
    Solid component: yes, no
    Papillary projections: 0, 1, 2, 3, ≥4; 1, 2, 3, ≥4 ‡; <4, >4 †
    Echogenicity of cyst fluid: anechoic, low level, ground glass, mixed, no cyst fluid
    Color Doppler signals detectable: yes, no
    Ovarian crescent sign: yes, no
Diagnosis based on LR1 and LR2
  Calculated risk of malignancy using LR1: ≤10%, >10%
  Calculated risk of malignancy using LR2: ≤10%, >10%
___________________________________________________________________________
IOTA, international ovarian tumor analysis group; LR1, logistic regression model 1; LR2, logistic regression model 2.

The risk of malignancy using LR1 is derived as $y = 1/(1 + e^{-z})$, where

$z = -6.7468 + 1.5985(a) - 0.9983(b) + 0.0326(c) + 0.00841(d) - 0.8577(e) + 1.5513(f) + 1.1737(g) + 0.9281(h) + 0.0496(i) + 1.1421(j) - 2.3550(k) + 0.4916(l)$

and e is the mathematical constant and base value of natural logarithms. The 12 variables included in LR1 are:
  (a) personal history of ovarian cancer (yes = 1, no = 0)
  (b) current hormonal therapy (yes = 1, no = 0)
  (c) age of the patient (in years)
  (d) maximum diameter of the lesion (in millimeters)
  (e) mass tender at scan (yes = 1, no = 0)
  (f) the presence of ascites (yes = 1, no = 0)
  (g) the presence of blood flow within a solid papillary projection (yes = 1, no = 0)
  (h) the presence of a purely solid tumor (yes = 1, no = 0)
  (i) maximal diameter of the solid component (expressed in millimeters, but with no increase > 50 mm)
  (j) irregular internal cyst walls (yes = 1, no = 0)
  (k) the presence of acoustic shadows (yes = 1, no = 0)
  (l) color score (1, 2, 3, or 4)

The risk of malignancy using LR2 is derived as $y = 1/(1 + e^{-z})$, where

$z = -5.3718 + 0.0354(c) + 1.6159(f) + 1.1768(g) + 0.0697(i) + 0.9586(j) - 2.9486(k)$ (ref. 12).

A calculated risk of malignancy of more than 10% classified the adnexal mass as malignant.
† Includes 0.
‡ Includes only cases where papillary projections were registered at both examinations.
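The two risk formulas in the Table 1 footnote can be evaluated directly. Below is a minimal Python sketch, assuming the coefficients exactly as printed in the footnote and keyed by the footnote's letters (a) to (l). The names `LR1`, `LR2`, `predicted_risk`, and the example patient are hypothetical and exist only for this illustration; this is not the authors' software.

```python
import math

# (intercept, coefficients) as printed in the Table 1 footnote.
LR1 = (-6.7468, {"a": 1.5985, "b": -0.9983, "c": 0.0326, "d": 0.00841,
                 "e": -0.8577, "f": 1.5513, "g": 1.1737, "h": 0.9281,
                 "i": 0.0496, "j": 1.1421, "k": -2.3550, "l": 0.4916})
LR2 = (-5.3718, {"c": 0.0354, "f": 1.6159, "g": 1.1768,
                 "i": 0.0697, "j": 0.9586, "k": -2.9486})

MALIGNANCY_CUTOFF = 0.10  # "more than 10%" classifies the mass as malignant


def predicted_risk(model, values):
    """Return y = 1 / (1 + exp(-z)) for one mass.

    values maps the footnote letters to the recorded findings
    (yes = 1, no = 0; age in years; diameters in mm; color score 1-4).
    """
    intercept, coefficients = model
    z = intercept
    for letter, weight in coefficients.items():
        x = values[letter]
        if letter == "i":
            x = min(x, 50)  # solid component: "no increase > 50 mm"
        z += weight * x
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical patient: 50 years old, 60 mm lesion, color score 1, no other findings.
patient = {"a": 0, "b": 0, "c": 50, "d": 60, "e": 0, "f": 0,
           "g": 0, "h": 0, "i": 0, "j": 0, "k": 0, "l": 1}

for name, model in (("LR1", LR1), ("LR2", LR2)):
    risk = predicted_risk(model, patient)
    label = "malignant" if risk > MALIGNANCY_CUTOFF else "benign"
    print(f"{name}: risk {100 * risk:.2f}% -> classified {label}")
```

Because the risk is a logistic function of z, a disagreement on a single binary finding shifts z by that finding's coefficient, and the resulting change in risk is largest when the risk is in the middle of the range and smallest near 0 or 1.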

Table 2. Histological diagnoses of the masses
___________________________________________________________________________
Histological diagnosis of tumor               Number
___________________________________________________________________________
Benign tumors                                 94
  Benign simple cyst                          7
  Endometrioma                                10
  Dermoid cyst                                16
  Serous cystadenoma                          16
  Mucinous cystadenoma                        18
  Myoma/fibroma                               9
  Cystadenofibroma                            11
  Paraovarian cyst                            5
  Sactosalpinx, chronic salpingitis           1
  Leydig cell tumor                           1
Borderline tumors                             4
  Serous                                      2
  Mucinous                                    1
  Endometrioid                                1
Invasive malignancy                           19
  Primary ovarian adenocarcinoma              13
  Granulosa cell tumor                        3
  Dysgerminoma                                1
  Leiomyosarcoma                              1
  Malignant aggressive B-cell lymphoma        1
___________________________________________________________________________

Table 3. Inter-observer reproducibility of continuous variables used in the risk calculation models LR1 and LR2 and of other continuous variables used to describe adnexal masses

                                                       Measurement results       Difference in mm between two measurements
                                                       (both sonologists)        made by sonologists 1 and 2 [a]                     Intra-CC
Parameter                                              Median        Range       Mean            95% CI         Limits of agreement  Point estimate (95% CI)

Variables used in models LR1 and LR2
  Maximum diameter of adnexal mass, mm                 70 (n=234)    10 – 313    3.80 (n=117)    1.12 – 6.48    -25.24 – 32.82       0.958 (0.937 – 0.971)
  Maximum diameter of largest solid component, mm [b]  29.50 (n=122) 5 – 180     1.92 (n=61)     -1.74 – 5.58   -26.66 – 30.50       0.942 (0.905 – 0.964)

Other variables used to describe adnexal mass
  Mean diameter of adnexal mass, mm                    58.5 (n=234)  9 – 240     1.05 (n=117)    -0.15 – 1.95   -8.61 – 10.72        0.971 (0.958 – 0.980)
  Mean diameter of largest solid component, mm [b]     22 (n=122)    4 – 156     0.59 (n=61)     -1.82 – 2.98   -18.16 – 19.32       0.962 (0.937 – 0.977)
  Height of largest papillary projection, mm [c]       8 (n=42)      3 – 25      -0.51 (n=21)    -2.93 – 1.91   -11.61 – 10.59       0.609 (0.245 – 0.821)

CI, confidence interval; Intra-CC, intra-class correlation coefficient.
[a] The measurements made by sonologist 2 were subtracted from the measurements made by sonologist 1.
[b] Includes only cases where solid components were registered by both observers.
[c] Includes only cases where papillary projections were registered by both observers. Using logarithmically transformed data, the results were as follows: mean difference 1.03 (95% CI, 0.78 – 1.36), limits of agreement 0.31 – 3.61, Intra-CC 0.547 (0.153 – 0.789). These results can be used for comparison with our published results on inter-observer agreement when analyzing ultrasound volumes obtained by three-dimensional ultrasound (ref. 15).

Table 4. Inter-observer agreement for categorical variables included in the risk calculation models LR1 and LR2 and for other categorical variables used to describe adnexal masses

Variable                                                            Agreement       Kappa value

Variables used in models LR1 and LR2
  Presence of entirely solid tumor (yes, no)                        97% (114/117)   0.91
  Maximum diameter of largest solid component (≤50 mm, >50 mm)      95% (111/117)   0.83
  Maximum diameter of largest solid component (≤50 mm, >50 mm) [a]  92% (56/61)     0.82
  Tenderness of adnexal mass (yes, no)                              97% (113/117)   0.78
  Ascites (yes, no)                                                 97% (113/117)   0.73
  Papillary projections (yes, no)                                   87% (102/117)   0.65
  Solid component (yes, no)                                         84% (98/117)    0.67
  Irregular cyst wall (yes, no)                                     79% (92/117)    0.56
  Acoustic shadows (yes, no)                                        85% (99/117)    0.58
  Color Doppler signals in papillary projections (yes, no) [b]      90% (105/117)   0.48
  Color Doppler signals in papillary projections (yes, no) [c]      86% (18/21)     0.71
  Color score (1, 2, 3, 4)                                          40% (47/117)    0.36 [d]

Other variables used to describe adnexal masses
  Tumor type (unilocular, unilocular solid, multilocular,           97% (113/117)   0.70
    multilocular solid, solid)
  Mean diameter of tumor, mm (≤40, 41-60, 61-80, 81-100, >100)      66% (77/117)    0.76 [d]
  Number of locules (0, 1, 2, 3, 4, 5, 6-10, 11-20, >20)            58% (68/117)    0.79 [d]
  Number of locules (≤10, >10) [e]                                  94% (110/117)   0.80
  Septum (yes, no)                                                  88% (103/117)   0.76
  Incomplete septum (yes, no)                                       98% (115/117)   ¶
  Number of papillary projections (0, 1, 2, 3, ≥4)                  81% (95/117)    0.64 [d]
  Number of papillary projections (1, 2, 3, ≥4) [c]                 67% (14/21)     0.63 [d]
  Number of papillary projections (<4, >4) [e]                      96% (112/117)   0.68
  Echogenicity of cyst fluid (anechoic, low level, ground glass,    67% (78/117)    0.56
    mixed, no fluid)
  Color Doppler signals detectable (yes, no)                        84% (98/117)    0.30
  Ovarian crescent sign (yes, no)                                   78% (91/117)    0.49
  Fluid in Douglas pouch (yes, no)                                  87% (102/117)   0.72
____________________________________________________________________________________________
[a] Includes only cases where solid components were registered at both examinations.
[b] This variable reflects disagreement both with regard to the presence of papillary projections and the presence of blood flow in papillary projections (i.e., if one observer recorded the presence of papillary projections and the other did not, there was disagreement with regard to this variable, and if both observers recorded the presence of papillary projections but disagreed with regard to the presence of blood flow in the papillary projections, there was disagreement with regard to this variable).
[c] Includes only cases where papillary projections were registered at both examinations.
[d] Weighted Kappa index.
[e] Includes 0.
¶ Absent kappa values are explained by asymmetric field tables, making it impossible to calculate them.
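Table 4 reports two quantities per variable: the raw percentage agreement and a kappa value, which corrects agreement for chance. The following is a minimal Python sketch of both, assuming two equally long lists of ratings. The function names, the `weights` options, and the example data are assumptions of this sketch; in particular, the weighting scheme behind the weighted kappa values in the table is not stated in this section.

```python
from collections import Counter


def percent_agreement(r1, r2):
    """Share of cases (in %) on which both observers chose the same category."""
    return 100.0 * sum(a == b for a, b in zip(r1, r2)) / len(r1)


def kappa(r1, r2, categories, weights="none"):
    """Cohen's kappa; 'linear' or 'quadratic' weights give a weighted kappa.

    categories must list every category in its natural order
    (the order only matters for the weighted variants).
    """
    n = len(r1)
    index = {c: i for i, c in enumerate(categories)}

    def disagreement(i, j):
        if weights == "none":
            return 0.0 if i == j else 1.0
        if weights == "linear":
            return float(abs(i - j))
        if weights == "quadratic":
            return float((i - j) ** 2)
        raise ValueError(f"unknown weights: {weights}")

    observed = Counter(zip(r1, r2))          # joint counts
    m1, m2 = Counter(r1), Counter(r2)        # marginal counts per observer
    obs_dis = sum(disagreement(index[a], index[b]) * count / n
                  for (a, b), count in observed.items())
    exp_dis = sum(disagreement(index[a], index[b]) * m1[a] * m2[b] / n ** 2
                  for a in categories for b in categories)
    if exp_dis == 0:
        raise ValueError("kappa is undefined: no disagreement expected by chance")
    return 1.0 - obs_dis / exp_dis


# Made-up yes/no ratings for 50 masses, for illustration only.
obs1 = ["yes"] * 20 + ["no"] * 15 + ["yes"] * 5 + ["no"] * 10
obs2 = ["yes"] * 20 + ["no"] * 15 + ["no"] * 5 + ["yes"] * 10
print(f"agreement {percent_agreement(obs1, obs2):.0f}%, "
      f"kappa {kappa(obs1, obs2, ['yes', 'no']):.2f}")
```

In this made-up example the observers agree on 35 of 50 cases, yet half of the cases would be expected to agree by chance, so kappa is far lower than the raw agreement suggests. The same effect explains rows in Table 4 where a high percentage agreement is paired with a modest kappa value.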

Table 3. Inter-observer reproducibility of continuous variables used in the risk calculation models LR1 and LR2 and of other continuous variables used to describe adnexal masses

___________________________________________________________________________
                                                        Measurement results        Difference in mm between two measurements          Intra-CC
                                                        (both sonologists)         made by sonologists 1 and 2 (a)                    Point estimate (95% CI)
Parameter                                               Median          Range      Mean           95% CI         Limits of agreement
___________________________________________________________________________
Variables used in models LR1 and LR2
  Maximum diameter of adnexal mass, mm                  70 (n=234)      10 – 313   3.80 (n=117)   1.12 – 6.48    -25.24 – 32.82       0.958 (0.937 – 0.971)
  Maximum diameter of largest solid component, mm (b)   29.50 (n=122)   5 – 180    1.92 (n=61)    -1.74 – 5.58   -26.66 – 30.50       0.942 (0.905 – 0.964)
Other variables used to describe adnexal mass
  Mean diameter of adnexal mass, mm                     58.5 (n=234)    9 – 240    1.05 (n=117)   -0.15 – 1.95   -8.61 – 10.72        0.971 (0.958 – 0.980)
  Mean diameter of largest solid component, mm (b)      22 (n=122)      4 – 156    0.59 (n=61)    -1.82 – 2.98   -18.16 – 19.32       0.962 (0.937 – 0.977)
  Height of largest papillary projection, mm (c)        8 (n=42)        3 – 25     -0.51 (n=21)   -2.93 – 1.91   -11.61 – 10.59       0.609 (0.245 – 0.821)
___________________________________________________________________________
CI, confidence interval; Intra-CC, intra-class correlation coefficient.
(a) The measurements made by sonologist 2 were subtracted from the measurements made by sonologist 1.
(b) Includes only cases where solid components were registered by both observers.
(c) Includes only cases where papillary projections were registered by both observers. Using logarithmically transformed data, the results were as follows: mean difference 1.03 (95% CI 0.78 – 1.36), limits of agreement 0.31 – 3.61, Intra-CC 0.547 (0.153 – 0.789). These results can be used for comparison with our published results on inter-observer agreement when analyzing ultrasound volumes obtained by three-dimensional ultrasound (15).
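As an aid to reading Table 3, the mean difference, its 95% CI, and the limits of agreement can be sketched as follows. This is a minimal illustration that assumes the conventional Bland-Altman definition (mean difference ± 1.96 SD of the paired differences) and a normal approximation for the CI; this section does not state the exact formulas used. The function name `limits_of_agreement` and the diameters are hypothetical and are not study data.

```python
import math
import statistics


def limits_of_agreement(obs1, obs2, z=1.96):
    """Mean difference, approximate 95% CI, and 95% limits of agreement.

    Assumes the conventional Bland-Altman definition; differences are
    observer 1 minus observer 2, matching footnote (a) of Table 3.
    """
    diffs = [a - b for a, b in zip(obs1, obs2)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)      # sample SD (n - 1 denominator)
    se = sd_diff / math.sqrt(n)            # standard error of the mean difference
    return {
        "mean": mean_diff,
        "ci": (mean_diff - z * se, mean_diff + z * se),
        "loa": (mean_diff - z * sd_diff, mean_diff + z * sd_diff),
    }


# Hypothetical maximum diameters in mm (NOT data from the study).
sonologist_1 = [70, 45, 120, 33, 88, 61]
sonologist_2 = [66, 47, 110, 35, 80, 60]
result = limits_of_agreement(sonologist_1, sonologist_2)
print(result)
```

Under the same assumption, the figures in Table 3 are internally consistent: for the maximum diameter of the adnexal mass, the half-width of the limits of agreement is (32.82 + 25.24) / 2 = 29.03 mm, which implies an SD of the differences of about 14.8 mm and, with n = 117, a 95% CI of roughly 3.80 ± 2.68 mm.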


Table 4. Inter-observer agreement for categorical variables included in the risk calculation models LR1 and LR2 and for other categorical variables used to describe adnexal masses

____________________________________________________________________________________________
                                                                                        Agreement       Kappa value
____________________________________________________________________________________________
Variables used in models LR1 and LR2
  Presence of entirely solid tumor (yes, no)                                            97% (114/117)   0.91
  Maximum diameter of largest solid component (≤50 mm, >50 mm)                          95% (111/117)   0.83
  Maximum diameter of largest solid component (≤50 mm, >50 mm) (a)                      92% (56/61)     0.82
  Tenderness of adnexal mass (yes, no)                                                  97% (113/117)   0.78
  Ascites (yes, no)                                                                     97% (113/117)   0.73
  Papillary projections (yes, no)                                                       87% (102/117)   0.65
  Solid component (yes, no)                                                             84% (98/117)    0.67
  Irregular cyst wall (yes, no)                                                         79% (92/117)    0.56
  Acoustic shadows (yes, no)                                                            85% (99/117)    0.58
  Color Doppler signals in papillary projections (yes, no) (b)                          90% (105/117)   0.48
  Color Doppler signals in papillary projections (yes, no) (c)                          86% (18/21)     0.71
  Color score (1, 2, 3, 4)                                                              40% (47/117)    0.36 (d)
Other variables used to describe adnexal masses
  Tumor type (unilocular, unilocular solid, multilocular, multilocular solid, solid)    97% (113/117)   0.70
  Mean diameter of tumor (mm): ≤40, 41-60, 61-80, 81-100, >100                          66% (77/117)    0.76 (d)
  Number of locules: 0, 1, 2, 3, 4, 5, 6-10, 11-20, >20                                 58% (68/117)    0.79 (d)
    ≤10, >10 (e)                                                                        94% (110/117)   0.80
  Septum (yes, no)                                                                      88% (103/117)   0.76
  Incomplete septum (yes, no)                                                           98% (115/117)   ¶
  Number of papillary projections: 0, 1, 2, 3, ≥4                                       81% (95/117)    0.64 (d)
    1, 2, 3, ≥4 (c)                                                                     67% (14/21)     0.63 (d)
    <4, >4 (e)                                                                          96% (112/117)   0.68
  Echogenicity of cyst fluid (anechoic, low level, ground glass, mixed, no fluid)       67% (78/117)    0.56
  Color Doppler signals detectable (yes, no)                                            84% (98/117)    0.30
  Ovarian crescent sign (yes, no)                                                       78% (91/117)    0.49
  Fluid in Douglas pouch (yes, no)                                                      87% (102/117)   0.72
____________________________________________________________________________________________
(a) Includes only cases where solid components were registered at both examinations.
(b) This variable reflects disagreement both with regard to the presence of papillary projections and the presence of blood flow in papillary projections (i.e., if one observer recorded the presence of papillary projections and the other did not, there was disagreement with regard to this variable, and if both observers recorded the presence of papillary projections but disagreed with regard to the presence of blood flow in the papillary projections, there was disagreement with regard to this variable).
(c) Includes only cases where papillary projections were registered at both examinations.
(d) Weighted Kappa index.
(e) Includes 0.
¶ Absent kappa values are explained by asymmetric field tables, making it impossible to calculate them.
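Table 4 reports percentage agreement next to kappa, and the two can diverge: solid component and detectable color Doppler signals both show 84% (98/117) agreement, yet their kappa values are 0.67 and 0.30. The sketch below, assuming unweighted Cohen's kappa computed from a square cross-tabulation, illustrates how this can happen when one category dominates. Both 2×2 tables are hypothetical (chosen only so that 98 of 117 cases agree) and are not the study's cross-tabulations, which this section does not give; `cohen_kappa` is an illustrative name.

```python
def cohen_kappa(table):
    """Unweighted Cohen's kappa from a square contingency table.

    table[i][j] = number of cases rated category i by observer 1
    and category j by observer 2.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance from the marginal totals.
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical tables (NOT study data); each has 98/117 cases on the diagonal.
balanced = [[40, 9], [10, 58]]   # both categories common
skewed = [[93, 9], [10, 5]]      # one category dominates

kappa_balanced = cohen_kappa(balanced)
kappa_skewed = cohen_kappa(skewed)
print(round(kappa_balanced, 2), round(kappa_skewed, 2))
```

Identical percentage agreement yields a much lower kappa in the skewed table because chance agreement is already high when nearly all cases fall in one category. The weighted kappa marked (d) in Table 4 additionally gives partial credit for near-misses on ordered categories and is not covered by this sketch.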


Table 5. Inter-observer reproducibility of the risk of malignancy calculated using the risk calculation models LR1 and LR2

___________________________________________________________________________
                                           Calculated risk of malignancy  Difference between the risk calculated              Intra-CC
                                           (both sonologists)             by sonologist 1 and 2                               Point estimate (95% CI)
Parameter                                  Median          Range          Mean            95% CI         Limits of agreement
___________________________________________________________________________
Risk of malignancy calculated using LR1    7.85 (n=234)    0.10 – 99.10   -0.53 (n=117)   -3.07 – 2.01   -28.05 – 26.99       0.911 (0.874 – 0.937)
Risk of malignancy calculated using LR2    6.65 (n=234)    0.10 – 98.40   0.02 (n=117)    -3.06 – 3.10   -33.22 – 33.26       0.832 (0.766 – 0.880)
___________________________________________________________________________
CI, confidence interval; Intra-CC, intra-class correlation coefficient.
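To make the Intra-CC column of Tables 3 and 5 more concrete, the following is a minimal sketch of one common form of the intra-class correlation coefficient. It assumes the two-way random-effects, absolute-agreement, single-measurement form, often written ICC(2,1); this section does not state which form the authors used, so the choice is an assumption. The function name `icc_2_1` and the risk values are hypothetical and are not study data.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    ratings[i][j] = value for subject i recorded by observer j.
    The choice of this ICC form is an assumption, not taken from the source.
    """
    n = len(ratings)           # subjects
    k = len(ratings[0])        # observers
    if n < 2 or k < 2:
        raise ValueError("need at least two subjects and two observers")
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)                 # between-subject mean square
    ms_cols = ss_cols / (k - 1)                 # between-observer mean square
    ms_error = ss_error / ((n - 1) * (k - 1))   # residual mean square
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical calculated risks in percent (NOT study data):
# each row is one mass, columns are sonologist 1 and sonologist 2.
risks = [[2.1, 3.0], [15.4, 9.8], [48.0, 61.5], [88.2, 90.1], [5.5, 4.9]]
print(round(icc_2_1(risks), 3))
```

Because this form measures absolute agreement, a systematic offset between the two observers lowers the coefficient even when their values are perfectly correlated.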


Legends for figure

Figure 1. a) Scatterplot showing the relationship between inter-observer difference (observer 1 minus observer 2) in calculated risk and magnitude of calculated risk when using logistic regression model LR1. The plot manifests a diamond shape, the differences being smallest for the lowest and highest risks. For risks < 25 and > 95 the differences are very small. LOA, limits of agreement. b) Scatterplot showing the relationship between inter-observer difference in calculated risk and magnitude of calculated risk when using logistic regression model LR2. The plot manifests a diamond shape, the differences being smallest for the lowest and highest risks.


Clin Cancer Res. Published OnlineFirst November 25, 2014.

Povilas Sladkevicius and Lil Valentin. Inter-observer agreement in describing the ultrasound appearance of adnexal masses and in calculating the risk of malignancy using logistic regression models.

Updated version: Access the most recent version of this article at doi: 10.1158/1078-0432.CCR-14-0906
