agreement in ultrasound assessment of ovarian masses 1
Inter-observer agreement in describing the ultrasound appearance of adnexal
masses and in calculating the risk of malignancy using logistic regression
models
Povilas Sladkevicius, Lil Valentin
Department of Obstetrics and Gynecology, Skåne University Hospital Malmö, Lund
University, S-20502 Malmö, Sweden
Short title: agreement in ultrasound assessment of ovarian masses
Keywords: ovarian neoplasms, ultrasonography, power Doppler ultrasound, reproducibility
of results
Grant support
This work was supported by the Swedish Medical Research Council (grant nos. B0012201 and
D0228201); funds administered by Skåne University Hospital; Allmänna Sjukhusets i Malmö
Stiftelse för bekämpande av cancer (the Malmö General Hospital Foundation for fighting
against cancer); Landstingsfinansierad regional forskning and ALF-medel (i.e., two Swedish
governmental grants from the region of Scania); and funds administered by Skåne University
Hospital.
Downloaded from clincancerres.aacrjournals.org on June 14, 2018. © 2014 American Association for Cancer Research.
Author Manuscript Published OnlineFirst on November 25, 2014; DOI: 10.1158/1078-0432.CCR-14-0906. Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited.
Corresponding author: Povilas Sladkevicius
Department of Obstetrics and Gynecology
Skåne University Hospital Malmö, S-20502 Malmö, Sweden
Telephone: +46 40 332636; Fax: +46 40 962600
Email: Povilas.Sladkevicius@med.lu.se
Conflict of interest statement
Both authors declare that they have no conflicts of interest.
Word count, excluding abstract (250 words) and references: 3688
Total number of figures: 1 to publish and 5 as supplemental material
Total number of tables: 5 to publish and 3 as supplemental material
Statement of translational relevance
The International Ovarian Tumor Analysis (IOTA) group has developed two logistic
regression models (LR1 and LR2), including clinical and ultrasound variables, for calculation
of the risk of malignancy in adnexal masses. It has been suggested that LR1 and LR2 can be
used to counsel patients about their individual risk of malignancy and so may have a role in
personalized medicine. In this work, we found large inter-observer differences (>25
percentage units) in the calculated risk of malignancy in about 10% of cases. The differences
were explained by ultrasound examiners interpreting ultrasound images differently. We
suggest measures to improve inter-observer agreement. Until better inter-observer agreement
in the calculated risk of malignancy using LR1 and LR2 has been shown, one should be
cautious about using the risk estimate for individual patient counselling.
Abstract
Purpose: To estimate inter-observer agreement with regard to describing adnexal masses
using the International Ovarian Tumor Analysis (IOTA) terminology and the risk of
malignancy calculated using IOTA logistic regression models LR1 and LR2, and to elucidate
what explained the largest inter-observer differences in calculated risk of malignancy.
Experimental design: 117 women with adnexal masses were examined with transvaginal
gray scale and power Doppler ultrasound by two independent experienced sonologists who
described the masses using IOTA terminology. The risk of malignancy was calculated using
LR1 and LR2. A predetermined risk of malignancy cutoff of 10% indicated malignancy.
Results: There were 94 benign, four borderline and 19 invasively malignant tumors. There
was substantial variability between the two sonologists in measurement results and some
variability in assessment of categorical variables (agreement 40-98%, Kappa 0.30-0.91).
Inter-observer agreement when classifying tumors as benign or malignant was 84% (98/117),
Kappa 0.68, for LR1 and 85% (99/117), Kappa 0.68, for LR2. When using LR1 and LR2, the
inter-observer difference in calculated risk was >25 percentage units in 9% (11/117) and 12%
(14/117) of tumors, respectively. Differences in assessment of wall irregularity, acoustic
shadowing, color score and color flow in papillary projections explained most of these largest
differences.
Conclusions: Inter-observer agreement in classifying tumors as benign or malignant
using the risk of malignancy cutoff of 10% for LR1 and LR2 was good. However, because
risk estimates may differ substantially between sonologists, one should be cautious about using
the risk value for counseling patients about their individual risk.
Introduction
One of the first successful attempts to use ultrasound to discriminate between benign and
malignant adnexal masses was made by Granberg and coworkers (1). They classified adnexal
masses into five categories (unilocular, unilocular solid, multilocular, multilocular solid and
solid tumors) and found that unilocular cysts, unilocular solid cysts and multilocular cysts
were rarely malignant. Later, subjective interpretation of ultrasound images of adnexal masses
- pattern recognition - proved to be an excellent method for discriminating between benign
and malignant adnexal masses (2-5) and also for making a specific diagnosis (e.g.,
endometrioma, hydrosalpinx, etcetera) (3,6,7). As an alternative to pattern recognition,
several research teams (8-10) created logistic regression models including clinical and
ultrasound information to calculate the individual risk of malignancy in adnexal masses.
Because of unclear definitions of many of the ultrasound variables included in these models,
the International Ovarian Tumor Analysis (IOTA) group suggested standardized terms and
definitions to be used when describing ultrasound images of adnexal masses (11). The IOTA
group also created and validated several mathematical models in which these standardized
terms and definitions were used to calculate the risk of malignancy for each individual
adnexal mass (12-14). Of these models, the logistic regression models LR1 and LR2,
including 12 and six variables, respectively (see Table 1), were suggested to be suitable for use
in clinical practice (13,14). However, even when using standardized terms and definitions,
ultrasound examiners may evaluate the features of an adnexal mass differently. There may
also be variability in measurement results. This means that the risk of malignancy calculated
by LR1 or LR2 may vary both within and between ultrasound examiners. We have shown that
this is indeed the case when experienced ultrasound examiners analyze three-dimensional
(3D) ultrasound volumes of adnexal masses (15). However, analysis of 3D ultrasound
volumes does not necessarily reflect a situation where live examinations are performed.
The aims of this study were to estimate interobserver agreement when live ultrasound
scans are performed with regard to 1) describing adnexal masses using the IOTA terminology
and 2) the risk of malignancy calculated using the IOTA logistic regression models LR1 and LR2,
and 3) to elucidate what explains large interobserver differences in calculated risk of
malignancy.
Materials and methods
The Ethics Committee of Lund University approved the study protocol. Informed consent
was obtained from all participants or the participant's guardian after the nature of the procedures
had been fully explained.
This is a prospective observational study of real-time live ultrasound examinations of
adnexal masses. Consecutive patients referred for an ultrasound examination and found to have
an adnexal mass judged to need surgical removal were scanned according to the research
protocol by sonologist 1 (PS) as part of the clinical ultrasound examination. A second
ultrasound examination was carried out before surgery by sonologist 2 (LV). Both examiners
used the standardized IOTA examination and measurement technique and the IOTA
terminology (11) to describe their ultrasound findings and noted their results in a dedicated
paper form. Sonologist 2 was blinded to the results of sonologist 1. Information on the clinical
variables included in LR1 and LR2 (personal history of ovarian cancer, current hormonal
therapy, age of the patient) was obtained at the preoperative ultrasound examination by
sonologist 2. All patients were operated on within 90 days after the preoperative ultrasound
examination performed by sonologist 2. The excised tissues underwent histological
examination, and tumors were classified according to the criteria recommended by the
International Federation of Gynecology and Obstetrics (16). Borderline tumors were classified
as malignant.
The patients were examined in the lithotomy position with an empty urinary bladder (11).
Abdominal ultrasound examination was added when needed. The ultrasound variables
assessed with regard to interobserver reproducibility are shown in Table 1. The size of the
lesion and that of its largest solid component were measured (largest diameter and mean of
three orthogonal diameters) using calipers on the frozen ultrasound image. A color score was
assigned on the basis of subjective assessment of the color content of the tumor scan at power
Doppler ultrasound examination. A color score of 1 indicates absence of color Doppler
signals, a color score of 2 a minimal amount of color Doppler signals, a color score of 3 a
moderate amount of color Doppler signals, and a color score of 4 a large amount of color
Doppler signals in the tumor (11).
The ultrasound systems used were GE Voluson 730 Expert or GE Voluson E8 (GE
Healthcare, Zipf, Austria) with a 5–9-MHz transvaginal transducer. For power Doppler
ultrasound examinations the following settings were used: for the Voluson 730 Expert system,
frequency 6-9 (normal) MHz, pulse repetition frequency 0.6 kHz, gain 0.8, wall motion
filter low 1 (40 Hz); and for the Voluson E8, frequency 6-9 (normal) MHz, pulse repetition
frequency 0.6 kHz, gain -40, wall motion filter low 1 (40 Hz).
Statistical analysis
The IOTA3 study screen (astraia GmbH, Munich, Germany) was used to calculate the risk
of malignancy according to LR1. Weighted Kappa indices were calculated using the statistical
program Stata Version 10.1 for Windows (StataCorp LP, College Station, TX, USA). For all
other statistical calculations, including calculation of the risk of malignancy when using LR2,
we used the Statistical Package for the Social Sciences (SPSS program, IBM Corp., New
York, NY, USA; PASW version 18.0).
Inter-observer agreement in the assessment of categorical variables was estimated by
calculating the percentage agreement. Cohen's kappa was used to estimate by how much the
observed agreement exceeded that expected by chance (17). Weighted kappa values are
presented where appropriate (18). It has been suggested that kappa values >0.81 indicate very
good agreement beyond chance, kappa values between 0.61 and 0.80 good agreement beyond
chance, kappa values between 0.41 and 0.60 moderate agreement beyond chance, kappa values
between 0.21 and 0.40 fair agreement beyond chance, and kappa values <0.20 poor agreement
beyond chance (19).
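The agreement statistics described above can be illustrated with a minimal sketch. It is not the Stata or SPSS procedure used in the study: the function names and the example ratings are hypothetical, only the unweighted kappa is shown, and the band thresholds close the small gaps between the quoted ranges (for example, a value of 0.805 is labeled "very good"), which is an assumption.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two raters scoring the same cases."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two equally long, non-empty rating lists")
    n = len(ratings_a)
    # Observed proportion of cases on which the raters agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category
    return (observed - expected) / (1.0 - expected)


def agreement_label(kappa):
    """Verbal band for a kappa value, following the ranges quoted in the text (19)."""
    # Assumption: thresholds at 0.80/0.60/0.40/0.20 bridge the gaps between bands.
    if kappa > 0.80:
        return "very good"
    if kappa > 0.60:
        return "good"
    if kappa > 0.40:
        return "moderate"
    if kappa > 0.20:
        return "fair"
    return "poor"


# Hypothetical example: 1 = feature present, 0 = feature absent, ten masses.
sonologist_1 = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
sonologist_2 = [1, 1, 0, 1, 1, 0, 0, 1, 0, 0]
kappa = cohen_kappa(sonologist_1, sonologist_2)
```

In this made-up example the raters agree on 8 of 10 masses (80%), while chance alone would give 50%, so kappa is approximately 0.6.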
Inter-observer reproducibility of measurement results, including the calculated risks of
malignancy using LR1 and LR2, was described as the difference between two measurement
results. The differences between the measured values were plotted against the mean of the two
measurements (Bland-Altman plots) to assess the relationship between the differences and the
magnitude of the measurements (20). Systematic bias between two measurements was
estimated by calculating the 95% confidence interval (CI) of the mean difference (mean
difference ± 2 SE). If zero lay within this interval, no bias was assumed to exist between the
two measurements. Inter-observer agreement was expressed as the mean difference and limits
of agreement (20). Ninety-five percent of differences between any future measurements are
estimated to fall between the lower and upper limit of agreement. Inter-observer reliability of
measurement results was estimated by calculating the intra-class correlation coefficient
(ICC) using analysis of variance (two-way random model, absolute agreement; this allows
generalization of the results to a population of observers). The ICC indicates the proportion of
the total variance in measurement results that can be explained by differences between the
individuals examined. It depends both on the magnitude of measurement errors and on the true
heterogeneity in the population in which measurements are made. The more variable the
population investigated, the greater the ICC, and the less variable the population, the smaller
the ICC (21). It has been suggested that ICC values >0.90 are needed for a test to be used in
clinical practice (22).
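The reproducibility measures described in this paragraph can be sketched as follows. This is an illustration under stated assumptions, not the SPSS analysis used in the study: the function names and data are hypothetical, the limits of agreement are taken as the mean difference ± 1.96 SD (the multiplier is not stated in the text), and the ICC is the single-measurement form of the two-way random, absolute agreement model.

```python
import math
from statistics import mean, stdev


def bland_altman(first, second):
    """Mean difference, its approximate 95% CI, and limits of agreement."""
    diffs = [a - b for a, b in zip(first, second)]
    d_mean = mean(diffs)
    d_sd = stdev(diffs)
    se = d_sd / math.sqrt(len(diffs))
    ci = (d_mean - 2 * se, d_mean + 2 * se)  # mean difference +/- 2 SE, as in the text
    return {
        "mean_difference": d_mean,
        "bias_ci": ci,
        # Assumption: limits of agreement are mean difference +/- 1.96 SD.
        "limits_of_agreement": (d_mean - 1.96 * d_sd, d_mean + 1.96 * d_sd),
        # Bias is assumed only if zero lies outside the confidence interval.
        "systematic_bias": not (ci[0] <= 0 <= ci[1]),
    }


def icc_absolute_agreement(first, second):
    """ICC, two-way random model, absolute agreement, single measurement, two observers."""
    n, k = len(first), 2
    values = list(first) + list(second)
    grand = mean(values)
    subject_means = [(a + b) / 2 for a, b in zip(first, second)]
    observer_means = [mean(first), mean(second)]
    ss_subjects = k * sum((m - grand) ** 2 for m in subject_means)
    ss_observers = n * sum((m - grand) ** 2 for m in observer_means)
    ss_total = sum((v - grand) ** 2 for v in values)
    ss_error = ss_total - ss_subjects - ss_observers
    ms_subjects = ss_subjects / (n - 1)
    ms_observers = ss_observers / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_subjects - ms_error) / (
        ms_subjects + (k - 1) * ms_error + k * (ms_observers - ms_error) / n
    )


# Hypothetical diameters (mm): observer 2 reads 2 mm higher on every mass.
obs_1 = [10, 20, 30, 40]
obs_2 = [12, 22, 32, 42]
result = bland_altman(obs_1, obs_2)
icc = icc_absolute_agreement(obs_1, obs_2)
```

In this made-up example the constant 2 mm offset is flagged as systematic bias, and it also lowers the ICC below 1 because the absolute agreement model penalizes differences between observers.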
The sensitivity and specificity of LR1 and LR2 with regard to malignancy were calculated
separately using the information of sonologist 1 and that of sonologist 2.
Results
In all, 117 consecutive women with adnexal masses who underwent surgery were
examined with ultrasound by the two sonologists as described above. Thirty-four women had
bilateral adnexal masses. The most complex mass - or the largest one if both masses had
similar ultrasound morphology - was used in our statistical analysis, the mass to be included
being selected retrospectively to ensure that both sonologists contributed the same mass (right
or left) to the analysis. Thus, 117 adnexal masses from 117 patients constitute our study
population. The women's age ranged between 14 and 88 years (median 53), and 63 (54%)
women were postmenopausal. There were 94 benign, four borderline and 19 invasively
malignant adnexal masses (Table 2).
The time elapsed between the ultrasound examinations of sonologists 1 and 2 was median 61
days (10th and 90th percentiles 13 and 132; range 1-204) for the tumors with benign histology
and median 14 days (10th and 90th percentiles 2 and 31; range 1-41) for the tumors with
malignant histology. There was no relationship between the number of days between the
scans and the differences in measurement results or inter-observer agreement for discrete
variables (Supplementary Fig. S1-S5 and Supplementary Table S1).
Inter-observer reproducibility of measurement results is shown in Table 3. Bland-Altman
plots showed no clear trend for inter-observer differences in measurement results to change
with the magnitude of the measurement values. Limits of agreement were wide for all
measurements. There was one systematic difference between the two sonologists: sonologist 1
(who always performed the first examination) obtained higher measurement values for the
maximum diameter of the mass. The least reliable measurement was the height of the largest
papillary projection.
Inter-observer agreement when assessing categorical ultrasound variables is shown in Table
4. For most categorical ultrasound variables, inter-observer agreement beyond chance was good
or very good (19). Inter-observer agreement beyond chance for variables included in LR1 or
LR2 was poorest for color score (agreement 40%, weighted Kappa 0.36), presence of blood
flow in papillary projection (agreement 90%, Kappa 0.48), irregular cyst wall (agreement 79%,
Kappa 0.56) and acoustic shadowing (agreement 85%, Kappa 0.58).
Bland-Altman plots illustrating the relationship between the magnitudes of the estimated
risk of malignancy calculated using LR1 and LR2 and the interobserver difference in
calculated risk are shown in Figure 1. The plots manifest a diamond shape, i.e., the
interobserver differences are smallest for the lowest and highest risks and they are very small
for risks lt25 and gt95 Logarithmic transformation of the data (20) did not substantially
change the shape of the scatter plot. Therefore, we present our results as absolute
inter-observer differences in calculated risk (in percentage units); see Table 5. There were no
systematic differences in calculated risks between the two sonologists, and reliability,
reflected by the ICC values, was good (22), with ICC values of 0.911 for LR1 and 0.832 for
LR2. When classifying tumors as having a risk of malignancy <10% (benign) or >10%
(malignant) using LR1 or LR2, the inter-observer agreement was good for both models:
inter-observer agreement 84% (98/117), Kappa value 0.68, for model LR1 and inter-observer
agreement 85% (99/117), Kappa 0.68, for model LR2. In the 19 cases where the two
sonologists obtained different results with regard to malignancy when using LR1, the absolute
interobserver differences in calculated risk ranged from 0.7 to 59.6 percentage units: in six of
the 19 cases the absolute interobserver difference in calculated risk was <10.0 percentage
units, in nine cases it was 10.0–24.9 percentage units, and in four cases it was >25.0
percentage units. In the 18 cases where the two sonologists obtained different results with
regard to malignancy when using LR2, the absolute interobserver difference in calculated risk
ranged from 8.8 to 67.9 percentage units: in two of the 18 cases the absolute interobserver
difference in calculated risk was <10.0 percentage units, in ten cases it was 10.0–24.9
percentage units, and in six cases it was >25 percentage units.
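The bookkeeping behind these counts can be sketched as follows: each pair of calculated risks is classified with the 10% cutoff, and the absolute inter-observer difference is sorted into the three bands used above. This is an illustration only. The function names and risk pairs are hypothetical, and the handling of a risk exactly at 10% and of differences between 24.9 and 25.0 percentage units is an assumption, since the text does not specify it.

```python
def classify(risk_percent, cutoff=10.0):
    """Label a calculated risk (in %) using the 10% cutoff described in the text."""
    # Assumption: a risk exactly at the cutoff is counted as malignant.
    return "malignant" if risk_percent >= cutoff else "benign"


def difference_band(risk_a, risk_b):
    """Band of the absolute inter-observer difference, in percentage units."""
    diff = abs(risk_a - risk_b)
    if diff < 10.0:
        return "<10"
    if diff <= 25.0:  # assumption: 25.0 itself falls in the middle band
        return "10-25"
    return ">25"


def summarise(risk_pairs):
    """Classification agreement and difference bands for (observer 1, observer 2) risks."""
    agree = sum(classify(a) == classify(b) for a, b in risk_pairs)
    bands = {"<10": 0, "10-25": 0, ">25": 0}
    for a, b in risk_pairs:
        bands[difference_band(a, b)] += 1
    return {
        "n": len(risk_pairs),
        "classification_agreement": agree / len(risk_pairs),
        "bands": bands,
    }


# Hypothetical calculated risks (%) for four masses.
pairs = [(3.0, 4.5), (8.0, 35.0), (12.0, 9.3), (60.0, 95.0)]
summary = summarise(pairs)
```

The made-up pairs show why the two measures are reported separately: the third pair is classified discordantly although the risks differ by less than 3 percentage units, whereas the fourth pair is classified concordantly although the risks differ by 35 percentage units.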
The Bland-Altman plots (Figure 1) illustrate that for some tumors there were substantial
interobserver differences in the calculated risk of malignancy: when using LR1, the
interobserver difference in calculated risk was >25 percentage units in 11 tumors (9% of all
tumors), and when using LR2, the interobserver difference in calculated risk was >25
percentage units in 14 tumors (12% of all tumors). To elucidate which differences in
assessment explained these largest interobserver differences in calculated risk, we scrutinized
each case where the difference was >25 percentage units. The results are shown in
Supplementary Tables S2 and S3. When using LR1, a discrepancy for one single categorical
variable explained the difference in four of the 11 cases, while a discrepancy for two
categorical variables explained the difference in one case (differences in measurements being
<5 mm in these five cases). In six cases, there were differences in one or two categorical
variables but also substantial differences (6-61 mm) in at least one measurement result. In no
case was the large difference in calculated risk explained exclusively by differences in
measurement results. The categorical variables judged differently by the two sonologists in
these 11 cases were color score (n = 5), irregular cyst wall (n = 5), flow in papillary projection
(n = 3) and acoustic shadowing (n = 2).
When using LR2, a discrepancy for one single categorical variable explained the large
difference in calculated risk (>25 percentage units) in eight of the 14 cases (differences in
measurements being <5 mm in these eight cases), and in four of the eight cases the sonologists
judged acoustic shadowing differently. In five cases, there were differences in one categorical
variable but also a substantial difference (9-61 mm) in the measurement of the largest solid
component. In yet another case, there were differences in two categorical variables as well as in
the measurement of the largest solid component. The categorical variables judged differently
by the two sonologists in these 14 cases were acoustic shadowing (n = 5), irregular cyst wall (n
= 5), ascites (n = 3) and flow in papillary projection (n = 2).
The sensitivity with regard to malignancy when using LR1 (10% risk cutoff) was 100%
(23/23; 95% CI, 82-100) for both sonologists; the specificity was 74% (70/94; 95% CI, 64-82)
for sonologist 1 and 63% (59/94; 95% CI, 53-72) for sonologist 2. The sensitivity when using
LR2 was 100% (23/23; 95% CI, 82-100) for sonologist 1 and 91% (21/23; 95% CI, 72-98) for
sonologist 2, and the specificity was 75.5% (71/94; 95% CI, 65-84) for both sonologists.
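The point estimates above follow directly from the reported counts, as the short sketch below shows. The function name is hypothetical, and confidence intervals are left out because the interval method is not stated in this section.

```python
def sensitivity_specificity(true_pos, false_neg, true_neg, false_pos):
    """Return (sensitivity, specificity) as proportions from 2x2 table counts."""
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Counts reported in the text: 23 malignant (including borderline) and 94 benign masses.
# LR1, sonologist 1: 23/23 malignant and 70/94 benign masses correctly classified.
lr1_s1 = sensitivity_specificity(true_pos=23, false_neg=0, true_neg=70, false_pos=24)
# LR2, sonologist 2: 21/23 malignant and 71/94 benign masses correctly classified.
lr2_s2 = sensitivity_specificity(true_pos=21, false_neg=2, true_neg=71, false_pos=23)
```

Multiplied by 100, these proportions reproduce the percentages in the text before rounding (for example, 70/94 is about 74.5%, reported as 74%).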
Discussion
We have shown substantial inter-observer variability in the results of measurements taken in
adnexal masses (wide limits of agreement). Inter-observer agreement beyond chance was very
good or good for most categorical variables, but it was only moderate or fair for some.
Inter-observer agreement above chance was poorest for variables heavily dependent on subjective
evaluation and/or machine settings, i.e., color score, presence of color Doppler signals in
papillary projections, irregular cyst walls, acoustic shadowing (all four variables being
included in LR1 or LR2), echogenicity of cyst fluid, and ovarian crescent sign. Despite this,
there was good inter-observer agreement when classifying tumors as benign or malignant using
the predetermined risk of malignancy cutoff of 10%. However, in some cases there were
substantial differences in the calculated risk of malignancy between the two sonologists, the
difference being >25.0 percentage units in 9% of all tumors when using LR1 and in 12% of all
tumors when using LR2.
The strength of our study is that it provides new information. To the best of our knowledge,
there is only one publication reporting on interobserver agreement with regard to describing
ultrasound findings in adnexal masses using the IOTA terminology (11) when performing live
ultrasound examinations (23). However, that study (23) evaluated interobserver agreement
with regard to the ten ultrasound features in the IOTA simple rules (24,25), not the variables
included in the IOTA logistic regression models LR1 and LR2, and agreement was estimated
between examiners with different levels of experience. The variable with poorest agreement
beyond chance in the study cited was acoustic shadowing (Kappa 0.36). We have found no
published study that has estimated inter-observer reproducibility of the calculated risk of
malignancy using LR1 or LR2 after live scanning.
It is a limitation of our study that up to 204 days elapsed between the scans of the two
sonologists (up to 41 days for malignant masses). Because days elapsed between the scans,
theoretically the inter-observer differences could be explained by the lesions having changed
in size or morphology between the scans. We find this highly unlikely for the following
reasons. First, there was no relationship between the differences in measurement results and
the number of days between the scans (Supplementary Fig. S1-S5). Nor was there a clear
tendency for inter-observer agreement for discrete variables to depend on the time between the
scans (Supplementary Table S1). Second, one would expect a lesion and its components to
increase in size with time, but sonologist 1, performing the first scan, obtained higher
measurement values than sonologist 2. Third, it is our experience, after having performed
gynecological scans for more than 20 years, that the ultrasound morphology of both benign and
malignant adnexal masses remains constant over time, that benign adnexal lesions grow
slowly, and that malignant masses do not change appreciably in size even during 1 month of
observation. Therefore, we believe that the discrepancies between the two sonologists reflect
true inter-observer differences and not a change of the masses over time. A second limitation is
that we did not include estimation of the reproducibility of retrieving anamnestic information
(current hormonal therapy, personal history of ovarian cancer), the anamnestic information
collected by the second sonologist being used in all cases. It cannot be entirely excluded that
patients would answer differently when asked by different sonologists or that sonologists
could interpret the answers of the patients differently. A third limitation is that we did not
estimate intra-observer reproducibility. We considered four scans (two per sonologist) likely to
be unacceptable to patients. For the same reason, only two sonologists were involved in this
study, and our results are generalizable only to sonologists with a similar level of experience.
The results of this live scanning study are similar to those of another study in which the
same sonologists assessed the same variables using 3D ultrasound volumes from adnexal
masses in another tumor population (15). The similarity in results between the two studies is
surprising, because the conditions when assessing 3D ultrasound volumes are different from
those during a live scan. When evaluating ultrasound volumes, sonologists are exposed to the
same ultrasound images, and so any interobserver difference should be explained exclusively
by differences in interpreting the ultrasound information. During a live scan, there are more
sources of bias. This could result either in poorer or better interobserver agreement than when
3D ultrasound volumes are assessed: poorer because ultrasound examiners are likely to use
different machine settings and scanning conditions may change from one minute to another;
better because the dynamic nature of live scanning facilitates discrimination between solid
components and amorphous tissue.
Our results showed that two experienced sonologists agreed quite well in their classification
of masses as benign or malignant using the 10% risk of malignancy cutoff of LR1 and LR2,
and that the diagnostic performance of LR1 and LR2 with regard to discrimination between
benign and malignant tumors was similar for the two sonologists and similar to that reported
by others (14,26-28). This is reassuring, because the main purpose of using models LR1 and
LR2 is to classify tumors as benign or malignant. Potentially, however, LR1 and LR2 can be
used not only to classify adnexal masses as benign or malignant but also to counsel a patient
about her individual risk of malignancy (13). If the calculated risk is to be used for individual
counseling, one must be reasonably certain not only that the estimated risk agrees well with the
true risk (when externally validated, both LR1 and LR2 underestimated the true risk, especially
in the risk interval 30-70% (14)), but also that the risk estimates are reproducible, i.e., that
different examiners will obtain similar risk estimates. Our results show that risk estimates may
differ substantially between experienced observers, the difference in estimated risk being >25.0
percentage units in 9% and 12% of cases when using LR1 and LR2, respectively. Interobserver
agreement above chance was poorest for those variables in the models that are heavily
dependent on subjective evaluation, i.e., color score, presence of color Doppler signals in
papillary projections, irregular cyst walls and acoustic shadowing. Indeed, differences in these
explained most of the largest inter-observer differences in calculated risk of malignancy. In
models based on few variables, changing the value of only one variable may result in large
differences in predicted risk, whereas a model with many variables is less vulnerable to a change
in one or even a few variables. Our results illustrate this (Supplementary Tables S2 and S3).
When using LR2 (which includes six variables), a change in value for one single categorical
variable explained an inter-observer difference in calculated risk >25 percentage units in eight
of 14 cases, whereas when using LR1 (which includes 12 variables), a change in value for one
single categorical variable explained an inter-observer difference in calculated risk >25
percentage units in only four of 11 cases. Acoustic shadowing is a strong variable in both LR1
and LR2 and has a great impact on the calculated risk in LR2, which has only six variables. In our
hands, as well as in those of Ruiz de Gauna et al. (23), inter-observer agreement for acoustic
shadowing was at most moderate. The inter-observer agreement for color score was only fair in
our study, and color score is an important variable in LR1.
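
The effect of a disagreement on one binary variable can be illustrated numerically. The following is a minimal sketch, not the authors' analysis: it assumes the acoustic shadowing coefficients given in the footnote to Table 1 (-2.3550 for LR1, -2.9486 for LR2) and uses hypothetical baseline risks. Because both models are logistic, recording shadows as present instead of absent shifts the linear predictor z by exactly that coefficient while all other variables stay fixed.

```python
import math

# Coefficients for acoustic shadows (variable k), taken from the Table 1 footnote.
SHADOW_COEF = {"LR1": -2.3550, "LR2": -2.9486}


def logit(p: float) -> float:
    """Convert a probability into the linear predictor z."""
    return math.log(p / (1.0 - p))


def inv_logit(z: float) -> float:
    """Convert the linear predictor z into a probability, y = 1/(1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))


def risk_after_flip(baseline_risk: float, coef: float) -> float:
    """Risk when one binary variable with coefficient `coef` changes from 0 to 1,
    all other variables being unchanged."""
    return inv_logit(logit(baseline_risk) + coef)


if __name__ == "__main__":
    # Hypothetical baseline risks (shadows recorded as absent); illustration only.
    for model, coef in SHADOW_COEF.items():
        for baseline in (0.02, 0.50):
            flipped = risk_after_flip(baseline, coef)
            diff = 100.0 * (baseline - flipped)
            print(f"{model}: {100 * baseline:.0f}% -> {100 * flipped:.1f}% "
                  f"(difference {diff:.1f} percentage units)")
```

Under these assumptions, a mass with a 50% risk when shadows are recorded as absent drops to roughly 5% in LR2 when the other observer records shadows as present, whereas the same disagreement at a baseline of 2% changes the risk by less than two percentage units.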
To improve inter-observer reproducibility of calculated risks based on LR1 and LR2,
inter-observer differences in descriptions and measurements of adnexal masses using the IOTA
terminology and measurement technique need to be reduced. One way to achieve this could be
to provide courses on, and training in, how to examine and describe adnexal masses using the
IOTA terms. Interactive courses in which a large number of ultrasound images are discussed with
the course participants are likely to be very valuable in this respect. More precise definitions of
the IOTA terms, for example by providing ample imaging material, would probably also help
improve inter-observer agreement. Special attention should be given to the variables with the poorest
reproducibility, i.e., the color score, wall irregularity, acoustic shadowing, and detection of blood
flow in papillary projections. Until better inter-observer agreement in the calculated risk of
malignancy using LR1 and LR2 has been shown, one should be cautious about using the risk
estimates for individual patient counseling.
Acknowledgements
None
References
1. Granberg S, Norström A, Wikland M. Tumors in the lower pelvis as imaged by vaginal
sonography. Gynecol Oncol 1990;37:224-9.
2. Benacerraf BR, Finkler NJ, Wojciechowski C, Knapp RC. Sonographic accuracy in the
diagnosis of ovarian masses. J Reprod Med 1990;35:491-5.
3. Valentin L. Pattern recognition of pelvic masses by gray-scale ultrasound imaging: the
contribution of Doppler ultrasound. Ultrasound Obstet Gynecol 1999;14:338-47.
4. Valentin L. Prospective cross-validation of Doppler ultrasound examination and gray-scale
ultrasound imaging for discrimination of benign and malignant pelvic masses.
Ultrasound Obstet Gynecol 1999;14:273-83.
5. Timmerman D, Schwärzler P, Collins WP, Claerhout F, Coenen M, Amant F, et al.
Subjective assessment of adnexal masses with the use of ultrasonography: an analysis
of interobserver variability and experience. Ultrasound Obstet Gynecol 1999;13:11-6.
6. Sokalska A, Timmerman D, Testa AC, Van Holsbeke C, Lissoni AA, Leone FPG, et al.
Diagnostic accuracy of transvaginal ultrasound examination for assigning a specific
diagnosis to adnexal masses. Ultrasound Obstet Gynecol 2009;34:462-70.
7. Valentin L. Use of morphology to characterize and manage common adnexal masses.
Best Pract Res Clin Obstet Gynaecol 2004;18:71-89.
8. Tailor A, Jurkovic D, Bourne TH, Collins WP, Campbell S. Sonographic prediction of
malignancy in adnexal masses using multivariate logistic regression analysis.
Ultrasound Obstet Gynecol 1997;10:41-7.
9. Timmerman D, Bourne TH, Tailor A, Collins WP, Verrelst H, Vandenberghe K, et al.
A comparison of methods for the preoperative discrimination between benign and
malignant adnexal masses: the development of a new logistic regression model. Am J
Obstet Gynecol 1999;181:57-65.
10. Alcazar JL, Jurado M. Prospective evaluation of logistic model based on sonographic
morphologic and color Doppler findings developed to predict adnexal malignancy. J
Ultrasound Med 1999;18:837-42.
11. Timmerman D, Valentin L, Bourne TH, Collins WP, Verrelst H, Vergote I. Terms,
definitions and measurements to describe the sonographic features of adnexal tumors: a
consensus opinion from the International Ovarian Tumor Analysis (IOTA) Group.
Ultrasound Obstet Gynecol 2000;16:500-5.
12. Timmerman D, Testa AC, Bourne T, Ferrazzi E, Ameye L, Konstantinovic ML, et al.
Logistic regression model to distinguish between the benign and malignant adnexal
mass before surgery: a multicenter study by the International Ovarian Tumor Analysis
Group. J Clin Oncol 2005;23:8794-801.
13. Kaijser J, Bourne T, Valentin L, Sayasneh A, Van Holsbeke C, Vergote I, et al.
Improving strategies for diagnosing ovarian cancer: a summary of the International
Ovarian Tumor Analysis (IOTA) studies. Ultrasound Obstet Gynecol 2013;41:9-20.
14. Timmerman D, Van Calster B, Testa AC, Guerriero S, Fischerova D, Lissoni AA, et al.
Ovarian cancer prediction in adnexal masses using ultrasound-based logistic regression
models: a temporal and external validation study by the IOTA group. Ultrasound Obstet
Gynecol 2010;36:226-34.
15. Sladkevicius P, Valentin L. Intra- and inter-observer agreement when describing
adnexal masses using the International Ovarian Tumour Analysis (IOTA) terms and
definitions: a study on three-dimensional (3D) ultrasound volumes. Ultrasound Obstet
Gynecol 2013;41:318-27.
16. Heintz APM, Odicino F, Maisonneuve P, Beller U, Benedet JL, Creasman WT, et al.
Carcinoma of the Ovary. 25th Annual Report on the Results of Treatment in
Gynecological Cancer. Int J Gynecol Obstet 2003;83(suppl 1):S135-S166.
17. Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas 1960;20:37-46.
18. Kundel HL, Polansky M. Measurement of observer agreement. Radiology 2003;228:303-8.
19. Brennan P, Silman A. Statistical methods for assessing observer variability in clinical
measures. BMJ 1992;304:1491-4.
20. Bland JM, Altman DG. Statistical methods for assessing agreement between two
methods of clinical measurement. Lancet 1986;1:307-10.
21. Bartlett JW, Frost C. Reliability, repeatability and reproducibility: analysis of
measurement errors in continuous variables. Ultrasound Obstet Gynecol 2008;31:466-75.
22. Kottner J, Audigé L, Brorson S, Donner A, Gajewski BJ, Hróbjartsson A, et al.
Guidelines for Reporting Reliability and Agreement Studies (GRRAS) were proposed. J
Clin Epidemiol 2011;64:96-106.
23. Ruiz de Gauna B, Sanchez P, Pineda L, Utrilla-Layna J, Juez L, Alcázar JL.
Inter-observer agreement with regard to describing adnexal masses using the IOTA simple
rules in a real-time setting and when using three-dimensional ultrasound volumes and
digital clips. Ultrasound Obstet Gynecol 2014;44:95-100.
24. Timmerman D, Testa AC, Bourne T, Ameye L, Jurkovic D, Van Holsbeke C, et al.
Simple ultrasound-based rules for the diagnosis of ovarian cancer. Ultrasound Obstet
Gynecol 2008;31:681-90.
25. Timmerman D, Ameye L, Fischerova D, Epstein E, Melis GB, Guerriero S, et al.
Simple ultrasound rules to distinguish between benign and malignant adnexal masses
before surgery: prospective validation by IOTA group. BMJ 2010;341:c6839.
26. Van Holsbeke C, Van Calster B, Testa AC, Domali E, Lu C, Van Huffel S, et al.
Prospective internal validation of mathematical models to predict malignancy in
adnexal masses: results from the international ovarian tumor analysis study. Clin Cancer
Res 2009;15:684-91.
27. Nunes N, Yazbek J, Ambler G, Hoo W, Naftalin J, Jurkovic D. Prospective evaluation
of the IOTA logistic regression model LR2 for the diagnosis of ovarian cancer.
Ultrasound Obstet Gynecol 2012;40:355-9.
28. Nunes N, Ambler G, Hoo WL, Naftalin J, Foo X, Widschwendter M, et al.
A prospective validation of the IOTA logistic regression models (LR1 and LR2) in
comparison to subjective pattern recognition for the diagnosis of ovarian cancer.
Int J Gynecol Cancer 2013;23:1583-9.
Table 1. Variables for which inter-observer reproducibility was estimated
___________________________________________________________________________
Variables included in logistic regression models LR1 and LR2 or of importance for these models
  Maximum diameter of adnexal mass (mm)
  Maximum diameter of largest solid component (mm)
  Maximum diameter of largest solid component (≤50, >50 mm)
  Presence of an entirely solid adnexal mass (yes, no)
  Papillary projections present (yes, no)
  Irregular internal cyst walls (yes, no)
  Acoustic shadows (yes, no)
  Color Doppler signals in papillary projection (yes, no)
  Color score (1, 2, 3, 4)
  Tenderness of adnexal mass at scan (yes, no)
  Ascites (yes, no)
Other IOTA variables
  Continuous IOTA variables
    Mean diameter of adnexal mass (mm)
    Mean diameter of largest solid component (mm)
    Height of the largest papillary projection (mm)
  Categorical IOTA variables
    Type of tumor (unilocular, unilocular solid, multilocular, multilocular solid, solid)
    Mean diameter of adnexal mass (≤40, 41-60, 61-80, 81-100 and >100 mm)
    Number of cyst locules (0, 1, 2, 3, 4, 5, 6-10, 11-20 and >20; ≤10, >10†)
    Septum present (yes, no)
    Incomplete septum present (yes, no)
    Solid component (yes, no)
    Papillary projections (0, 1, 2, 3, ≥4; 1, 2, 3, ≥4‡; <4, >4†)
    Echogenicity of cyst fluid (anechoic, low level, ground glass, mixed, no cyst fluid)
    Color Doppler signals detectable (yes, no)
    Ovarian crescent sign (yes, no)
Diagnosis based on LR1 and LR2
  Calculated risk of malignancy using LR1 (≤10%, >10%)
  Calculated risk of malignancy using LR2 (≤10%, >10%)
___________________________________________________________________________
IOTA, International Ovarian Tumor Analysis group; LR1, logistic regression model 1; LR2, logistic regression model 2.
The risk of malignancy using LR1 is derived as y = 1/(1 + e^{-z}), where
  z = -6.7468 + 1.5985 (a) - 0.9983 (b) + 0.0326 (c) + 0.00841 (d) - 0.8577 (e) + 1.5513 (f)
      + 1.1737 (g) + 0.9281 (h) + 0.0496 (i) + 1.1421 (j) - 2.3550 (k) + 0.4916 (l)
and e is the mathematical constant and base value of natural logarithms. The 12 variables included in LR1 are:
  (a) personal history of ovarian cancer (yes = 1, no = 0)
  (b) current hormonal therapy (yes = 1, no = 0)
  (c) age of the patient (in years)
  (d) maximum diameter of the lesion (in millimeters)
  (e) mass tender at scan (yes = 1, no = 0)
  (f) the presence of ascites (yes = 1, no = 0)
  (g) the presence of blood flow within a solid papillary projection (yes = 1, no = 0)
  (h) the presence of a purely solid tumor (yes = 1, no = 0)
  (i) maximal diameter of the solid component (expressed in millimeters, but with no increase >50 mm)
  (j) irregular internal cyst walls (yes = 1, no = 0)
  (k) the presence of acoustic shadows (yes = 1, no = 0)
  (l) color score (1, 2, 3, or 4)
The risk of malignancy using LR2 is derived as y = 1/(1 + e^{-z}), where
  z = -5.3718 + 0.0354 (c) + 1.6159 (f) + 1.1768 (g) + 0.0697 (i) + 0.9586 (j) - 2.9486 (k) (12).
A calculated risk of malignancy of more than 10% classified the adnexal mass as malignant.
† includes 0. ‡ includes only cases where papillary projections were registered at both examinations.
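
The two formulas in the footnote to Table 1 can be written out as a short calculation. This is a minimal sketch under stated assumptions, not the software used in the study (LR1 was calculated with the IOTA3 study screen and LR2 with SPSS): the coefficients are those listed in the footnote, the class and function names are hypothetical, and the 50 mm cap on the solid component implements the footnote's "no increase >50 mm".

```python
import math
from dataclasses import dataclass


@dataclass
class Findings:
    """Clinical and ultrasound variables (a) to (l) from the Table 1 footnote."""
    history_ovarian_cancer: bool          # (a)
    hormonal_therapy: bool                # (b)
    age_years: float                      # (c)
    max_lesion_diameter_mm: float         # (d)
    tender_at_scan: bool                  # (e)
    ascites: bool                         # (f)
    flow_in_papillary_projection: bool    # (g)
    purely_solid: bool                    # (h)
    max_solid_diameter_mm: float          # (i), capped at 50 mm in the models
    irregular_cyst_walls: bool            # (j)
    acoustic_shadows: bool                # (k)
    color_score: int                      # (l), 1 to 4


def _risk(z: float) -> float:
    """Logistic transform y = 1/(1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))


def lr1_risk(f: Findings) -> float:
    solid = min(f.max_solid_diameter_mm, 50.0)
    z = (-6.7468 + 1.5985 * f.history_ovarian_cancer - 0.9983 * f.hormonal_therapy
         + 0.0326 * f.age_years + 0.00841 * f.max_lesion_diameter_mm
         - 0.8577 * f.tender_at_scan + 1.5513 * f.ascites
         + 1.1737 * f.flow_in_papillary_projection + 0.9281 * f.purely_solid
         + 0.0496 * solid + 1.1421 * f.irregular_cyst_walls
         - 2.3550 * f.acoustic_shadows + 0.4916 * f.color_score)
    return _risk(z)


def lr2_risk(f: Findings) -> float:
    solid = min(f.max_solid_diameter_mm, 50.0)
    z = (-5.3718 + 0.0354 * f.age_years + 1.6159 * f.ascites
         + 1.1768 * f.flow_in_papillary_projection + 0.0697 * solid
         + 0.9586 * f.irregular_cyst_walls - 2.9486 * f.acoustic_shadows)
    return _risk(z)


def classify(risk: float, cutoff: float = 0.10) -> str:
    """A risk of more than 10% classifies the mass as malignant."""
    return "malignant" if risk > cutoff else "benign"


if __name__ == "__main__":
    # Hypothetical patient, for illustration only.
    example = Findings(False, False, 50, 60, False, False, False, False, 0, False, False, 1)
    for name, fn in (("LR1", lr1_risk), ("LR2", lr2_risk)):
        r = fn(example)
        print(f"{name}: {100 * r:.1f}% -> {classify(r)}")
```

Calling both functions on the findings recorded by each observer and subtracting the results gives the inter-observer difference in calculated risk for that patient.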
Table 2. Histological diagnoses of the masses
___________________________________________________________________________
Histological diagnosis of tumor                Number
___________________________________________________________________________
Benign tumors                                      94
  Benign simple cyst                                7
  Endometrioma                                     10
  Dermoid cyst                                     16
  Serous cystadenoma                               16
  Mucinous cystadenoma                             18
  Myoma/fibroma                                     9
  Cystadenofibroma                                 11
  Paraovarian cyst                                  5
  Sactosalpinx/chronic salpingitis                  1
  Leydig cell tumor                                 1
Borderline tumors                                   4
  Serous                                            2
  Mucinous                                          1
  Endometrioid                                      1
Invasive malignancy                                19
  Primary ovarian adenocarcinoma                   13
  Granulosa cell tumor                              3
  Dysgerminoma                                      1
  Leiomyosarcoma                                    1
  Malignant aggressive B-cell lymphoma              1
___________________________________________________________________________
Table 3. Inter-observer reproducibility of continuous variables used in the risk calculation models LR1 and LR2 and of other continuous variables
used to describe adnexal masses
_________________________________________________________________________________________________________________________________________________
                                                  Measurement results          Difference in mm between two measurements          Intra-CC
                                                  (both sonologists)           made by sonologists 1 and 2 (a)                    point estimate
Parameter                                         Median          Range        Mean            95% CI           Limits of         (95% CI)
                                                                                                                agreement
_________________________________________________________________________________________________________________________________________________
Variables used in models LR1 and LR2
  Maximum diameter of adnexal mass, mm            70 (n=234)      10 to 313    3.80 (n=117)    1.12 to 6.48     -25.24 to 32.82   0.958 (0.937 to 0.971)
  Maximum diameter of largest solid
  component, mm (b)                               29.50 (n=122)   5 to 180     1.92 (n=61)     -1.74 to 5.58    -26.66 to 30.50   0.942 (0.905 to 0.964)
Other variables used to describe adnexal mass
  Mean diameter of adnexal mass, mm               58.5 (n=234)    9 to 240     1.05 (n=117)    -0.15 to 1.95    -8.61 to 10.72    0.971 (0.958 to 0.980)
  Mean diameter of largest solid
  component, mm (b)                               22 (n=122)      4 to 156     0.59 (n=61)     -1.82 to 2.98    -18.16 to 19.32   0.962 (0.937 to 0.977)
  Height of largest papillary
  projection, mm (c)                              8 (n=42)        3 to 25      -0.51 (n=21)    -2.93 to 1.91    -11.61 to 10.59   0.609 (0.245 to 0.821)
_________________________________________________________________________________________________________________________________________________
CI, confidence interval; Intra-CC, intra-class correlation coefficient.
(a) The measurements made by sonologist 2 were subtracted from the measurements made by sonologist 1.
(b) Includes only cases where solid components were registered by both observers.
(c) Includes only cases where papillary projections were registered by both observers. Using logarithmically transformed data, the results were as
follows: mean difference 1.03 (95% CI, 0.78 to 1.36); limits of agreement 0.31 to 3.61; Intra-CC 0.547 (0.153 to 0.789). These results can be used for
comparison with our published results on inter-observer agreement when analyzing ultrasound volumes obtained by three-dimensional
ultrasound (15).
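
The mean difference, its confidence interval, and the limits of agreement reported in Table 3 can be reproduced from paired measurements along the following lines. This is a minimal sketch, assuming the conventional Bland and Altman definition (limits of agreement = mean difference ± 1.96 standard deviations of the differences) and a normal approximation for the 95% CI of the mean; the function name and the sample data are hypothetical, and the exact procedure used by the authors may differ.

```python
import math
import statistics


def bland_altman(obs1, obs2, z=1.96):
    """Mean difference (obs1 - obs2), approximate 95% CI of the mean,
    and limits of agreement for paired measurements."""
    if len(obs1) != len(obs2) or len(obs1) < 2:
        raise ValueError("need two equally long series with at least two pairs")
    diffs = [a - b for a, b in zip(obs1, obs2)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)             # sample standard deviation
    se_mean = sd_diff / math.sqrt(len(diffs))     # standard error of the mean difference
    return {
        "mean": mean_diff,
        "ci": (mean_diff - z * se_mean, mean_diff + z * se_mean),
        "loa": (mean_diff - z * sd_diff, mean_diff + z * sd_diff),
    }


if __name__ == "__main__":
    # Hypothetical diameters in mm measured by sonologists 1 and 2.
    sonologist_1 = [70, 45, 112, 38, 90]
    sonologist_2 = [66, 47, 105, 40, 84]
    result = bland_altman(sonologist_1, sonologist_2)
    print(f"mean difference {result['mean']:.2f} mm")
    print(f"limits of agreement {result['loa'][0]:.2f} to {result['loa'][1]:.2f} mm")
```

The limits of agreement describe the range within which about 95% of the differences between the two observers are expected to fall, so they are much wider than the confidence interval of the mean difference.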
Table 4. Inter-observer agreement for categorical variables included in the risk calculation models LR1 and LR2 and for other categorical variables used to describe adnexal masses
____________________________________________________________________________________________
                                                                        Agreement        Kappa value
____________________________________________________________________________________________
Variables used in models LR1 and LR2
  Presence of entirely solid tumor (yes, no)                            97% (114/117)    0.91
  Maximum diameter of largest solid component (≤50 mm, >50 mm)          95% (111/117)    0.83
  Maximum diameter of largest solid component (≤50 mm, >50 mm) (a)      92% (56/61)      0.82
  Tenderness of adnexal mass (yes, no)                                  97% (113/117)    0.78
  Ascites (yes, no)                                                     97% (113/117)    0.73
  Papillary projections (yes, no)                                       87% (102/117)    0.65
  Solid component (yes, no)                                             84% (98/117)     0.67
  Irregular cyst wall (yes, no)                                         79% (92/117)     0.56
  Acoustic shadows (yes, no)                                            85% (99/117)     0.58
  Color Doppler signals in papillary projections (yes, no) (b)          90% (105/117)    0.48
  Color Doppler signals in papillary projections (yes, no) (c)          86% (18/21)      0.71
  Color score (1, 2, 3, 4)                                              40% (47/117)     0.36 (d)
Other variables used to describe adnexal masses
  Tumor type (unilocular, unilocular solid, multilocular,
  multilocular solid, solid)                                            97% (113/117)    0.70
  Mean diameter of tumor (mm): ≤40, 41-60, 61-80, 81-100, >100          66% (77/117)     0.76 (d)
  Number of locules: 0, 1, 2, 3, 4, 5, 6-10, 11-20, >20                 58% (68/117)     0.79 (d)
  Number of locules: ≤10, >10 (e)                                       94% (110/117)    0.80
  Septum (yes, no)                                                      88% (103/117)    0.76
  Incomplete septum (yes, no)                                           98% (115/117)    ¶
  Number of papillary projections: 0, 1, 2, 3, ≥4                       81% (95/117)     0.64 (d)
  Number of papillary projections: 1, 2, 3, ≥4 (c)                      67% (14/21)      0.63 (d)
  Number of papillary projections: <4, >4 (e)                           96% (112/117)    0.68
  Echogenicity of cyst fluid (anechoic, low level, ground glass,
  mixed, no fluid)                                                      67% (78/117)     0.56
  Color Doppler signals detectable (yes, no)                            84% (98/117)     0.30
  Ovarian crescent sign (yes, no)                                       78% (91/117)     0.49
  Fluid in Douglas pouch (yes, no)                                      87% (102/117)    0.72
____________________________________________________________________________________________
(a) Includes only cases where solid components were registered at both examinations.
(b) This variable reflects disagreement both with regard to the presence of papillary projections and the presence of blood flow in papillary projections (i.e., if one observer recorded the presence of papillary projections and the other did not, there was disagreement with regard to this variable, and if both observers recorded the presence of papillary projections but disagreed with regard to the presence of blood flow in the papillary projections, there was disagreement with regard to this variable).
(c) Includes only cases where papillary projections were registered at both examinations.
(d) Weighted Kappa index.
(e) Includes 0.
¶ Absent kappa values are explained by asymmetric field tables, making it impossible to calculate them.
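Table 5. Inter-observer reproducibility of the risk of malignancy calculated using the risk calculation models LR1 and LR2
_____________________________________________________________________________________________________________________________________
                                 Calculated risk of malignancy    Difference between the risk calculated                Intra-CC
                                 (both sonologists)               by sonologists 1 and 2                                point estimate
Parameter                        Median          Range            Mean             95% CI           Limits of           (95% CI)
                                                                                                    agreement
_____________________________________________________________________________________________________________________________________
Risk of malignancy
calculated using LR1             7.85 (n=234)    0.10 to 99.10    -0.53 (n=117)    -3.07 to 2.01    -28.05 to 26.99     0.911 (0.874 to 0.937)
Risk of malignancy
calculated using LR2             6.65 (n=234)    0.10 to 98.40    0.02 (n=117)     -3.06 to 3.10    -33.22 to 33.26     0.832 (0.766 to 0.880)
_____________________________________________________________________________________________________________________________________
CI, confidence interval; Intra-CC, intra-class correlation coefficient.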
Legends for figure
Figure 1. a) Scatterplot showing the relationship between inter-observer difference (observer
1 minus observer 2) in calculated risk and magnitude of calculated risk when using logistic
regression model LR1. The plot manifests a diamond shape, the differences being smallest for
the lowest and highest risks. For risks < 25 and > 95 the differences are very small.
LOA, limits of agreement. b) Scatterplot showing the relationship between inter-observer
difference in calculated risk and magnitude of calculated risk when using logistic regression
model LR2. The plot manifests a diamond shape, the differences being smallest for the lowest
and highest risks.
Povilas Sladkevicius and Lil Valentin. Inter-observer agreement in describing the ultrasound appearance of adnexal masses and in calculating the risk of malignancy using logistic regression models. Clin Cancer Res. Published OnlineFirst November 25, 2014. doi: 10.1158/1078-0432.CCR-14-0906
Statement of translational relevance
The International Ovarian Tumor Analysis (IOTA) group has developed two logistic
regression models (LR1 and LR2), including clinical and ultrasound variables, for calculation
of the risk of malignancy in adnexal masses. It has been suggested that LR1 and LR2 can be
used to counsel patients about their individual risk of malignancy and so may have a role in
personalized medicine. In this work, we found large inter-observer differences (>25
percentage units) in the calculated risk of malignancy in about 10% of cases. The differences
were explained by ultrasound examiners interpreting ultrasound images differently. We
suggest measures to improve inter-observer agreement. Until better inter-observer agreement
in the calculated risk of malignancy using LR1 and LR2 has been shown, one should be
cautious about using the risk estimate for individual patient counseling.
Abstract
Purpose: To estimate inter-observer agreement with regard to describing adnexal masses
using the International Ovarian Tumor Analysis (IOTA) terminology and with regard to the risk of
malignancy calculated using IOTA logistic regression models LR1 and LR2, and to elucidate
what explained the largest inter-observer differences in calculated risk of malignancy.
Experimental design: 117 women with adnexal masses were examined with transvaginal
gray scale and power Doppler ultrasound by two independent experienced sonologists, who
described the masses using IOTA terminology. The risk of malignancy was calculated using
LR1 and LR2. A predetermined risk of malignancy cutoff of 10% indicated malignancy.
Results: There were 94 benign, four borderline, and 19 invasively malignant tumors. There
was substantial variability between the two sonologists in measurement results and some
variability in assessment of categorical variables (agreement 40-98%, Kappa 0.30-0.91).
Inter-observer agreement when classifying tumors as benign or malignant was 84% (98/117),
Kappa 0.68, for LR1 and 85% (99/117), Kappa 0.68, for LR2. When using LR1 and LR2, the
inter-observer difference in calculated risk was >25 percentage units in 9% (11/117) and 12%
(14/117) of tumors, respectively. Differences in assessment of wall irregularity, acoustic
shadowing, color score, and color flow in papillary projections explained most of these largest
differences.
Conclusions: Inter-observer agreement in classifying tumors as benign or malignant
using the risk of malignancy cutoff of 10% for LR1 and LR2 was good. However, because
risk estimates may differ substantially between sonologists, one should be cautious about using
the risk value for counseling patients about their individual risk.
Introduction
One of the first successful attempts to use ultrasound to discriminate between benign and
malignant adnexal masses was made by Granberg and coworkers (1). They classified adnexal
masses into five categories (unilocular, unilocular solid, multilocular, multilocular solid, and
solid tumors) and found that unilocular cysts, unilocular solid cysts, and multilocular cysts
were rarely malignant. Later, subjective interpretation of ultrasound images of adnexal masses
(pattern recognition) proved to be an excellent method for discriminating between benign
and malignant adnexal masses (2-5) and also for making a specific diagnosis (e.g.,
endometrioma, hydrosalpinx, etc.) (3,6,7). As an alternative to pattern recognition,
several research teams (8-10) created logistic regression models including clinical and
ultrasound information to calculate the individual risk of malignancy in adnexal masses.
Because of unclear definitions of many of the ultrasound variables included in these models,
the International Ovarian Tumor Analysis (IOTA) group suggested standardized terms and
definitions to be used when describing ultrasound images of adnexal masses (11). The IOTA
group also created and validated several mathematical models in which these standardized
terms and definitions were used to calculate the risk of malignancy for each individual
adnexal mass (12-14). Of these models, the logistic regression models LR1 and LR2,
including 12 and six variables, respectively (see Table 1), were suggested to be suitable for use
in clinical practice (13,14). However, even when using standardized terms and definitions,
ultrasound examiners may evaluate the features of an adnexal mass differently. There may
also be variability in measurement results. This means that the risk of malignancy calculated
by LR1 or LR2 may vary both within and between ultrasound examiners. We have shown that
this is indeed the case when experienced ultrasound examiners analyze three-dimensional
(3D) ultrasound volumes of adnexal masses (15). However, analysis of 3D ultrasound
volumes does not necessarily reflect a situation where live examinations are performed.
The aims of this study were 1) to estimate inter-observer agreement, when live ultrasound
scans are performed, with regard to describing adnexal masses using the IOTA terminology;
2) to estimate inter-observer agreement with regard to the risk of malignancy calculated using
the IOTA logistic regression models LR1 and LR2; and 3) to elucidate what explains large
inter-observer differences in calculated risk of malignancy.
Materials and methods
The Ethics Committee of Lund University approved the study protocol. Informed consent
was obtained from all participants or the participant's guardian after the nature of the procedures
had been fully explained.
This is a prospective observational study of real-time (live) ultrasound examinations of
adnexal masses. Consecutive patients referred for an ultrasound examination and found to have
an adnexal mass judged to need surgical removal were scanned according to the research
protocol by sonologist 1 (PS) as part of the clinical ultrasound examination. A second
ultrasound examination was carried out before surgery by sonologist 2 (LV). Both examiners
used the standardized IOTA examination and measurement technique and the IOTA
terminology (11) to describe their ultrasound findings and noted their results on a dedicated
paper form. Sonologist 2 was blinded to the results of sonologist 1. Information on the clinical
variables included in LR1 and LR2 (personal history of ovarian cancer, current hormonal
therapy, age of the patient) was obtained at the preoperative ultrasound examination by
sonologist 2. All patients were operated on within 90 days after the preoperative ultrasound
examination performed by sonologist 2. The excised tissues underwent histological
examination, and tumors were classified according to the criteria recommended by the
International Federation of Gynecology and Obstetrics (16). Borderline tumors were classified
as malignant.
The patients were examined in the lithotomy position with an empty urinary bladder (11).
Abdominal ultrasound examination was added when needed. The ultrasound variables
assessed with regard to inter-observer reproducibility are shown in Table 1. The size of the
lesion and that of its largest solid component were measured (largest diameter and mean of
three orthogonal diameters) using calipers on the frozen ultrasound image. A color score was
assigned on the basis of subjective assessment of the color content of the tumor scan at power
Doppler ultrasound examination. A color score of 1 indicates absence of color Doppler
signals, a color score of 2 a minimal amount of color Doppler signals, a color score of 3 a
moderate amount of color Doppler signals, and a color score of 4 a large amount of color
Doppler signals in the tumor (11).
The ultrasound systems used were GE Voluson 730 Expert or GE Voluson E8 (GE
Healthcare, Zipf, Austria) with a 5–9-MHz transvaginal transducer. For power Doppler
ultrasound examinations the following settings were used: for the Voluson 730 Expert system,
frequency 6-9 ("normal") MHz, pulse repetition frequency 0.6 kHz, gain 0.8, wall motion
filter "low 1" (40 Hz); and for the Voluson E8, frequency 6-9 ("normal") MHz, pulse repetition
frequency 0.6 kHz, gain -40, wall motion filter "low 1" (40 Hz).
Statistical analysis
The IOTA3 study screen (astraia GmbH, Munich, Germany) was used to calculate the risk
of malignancy according to LR1. Weighted kappa indices were calculated using the statistical
program Stata, Version 10.1 for Windows (StataCorp LP, College Station, TX, USA). For all
other statistical calculations, including calculation of the risk of malignancy when using LR2,
we used the Statistical Package for the Social Sciences (SPSS program, IBM Corp., New
York, NY, USA; PASW version 18.0).
Inter-observer agreement in the assessment of categorical variables was estimated by
calculating the percentage agreement. Cohen's kappa was used to estimate by how much the
observed agreement exceeded that expected by chance (17). Weighted kappa values are
presented where appropriate (18). It has been suggested that kappa values >0.81 indicate very
good agreement beyond chance, kappa values between 0.61 and 0.80 good agreement beyond
chance, kappa values between 0.41 and 0.60 moderate agreement beyond chance, kappa values
between 0.21 and 0.40 fair agreement beyond chance, and kappa values <0.20 poor agreement
beyond chance (19).
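The kappa calculation described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the Stata or SPSS procedure used in the study. The function names (`cohen_kappa`, `describe_kappa`) are hypothetical, the linear weighting scheme is an assumption (the weights actually used are not stated in this section), and the handling of values that fall exactly on a band boundary is also an assumption.

```python
def cohen_kappa(ratings_a, ratings_b, categories, weighted=False):
    """Cohen's kappa for two raters.

    `categories` lists the possible values in order. With weighted=True,
    linear agreement weights 1 - |i - j| / (k - 1) are applied
    (an assumed weighting scheme, not necessarily the one used in the study).
    """
    n = len(ratings_a)
    k = len(categories)
    if n == 0 or n != len(ratings_b) or k < 2:
        raise ValueError("need paired ratings and at least two categories")
    index = {category: i for i, category in enumerate(categories)}

    # joint[i][j] = proportion of masses rated i by rater A and j by rater B.
    joint = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        joint[index[a]][index[b]] += 1.0 / n
    row = [sum(joint[i]) for i in range(k)]
    col = [sum(joint[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        if weighted:
            return 1.0 - abs(i - j) / (k - 1)
        return 1.0 if i == j else 0.0

    cells = [(i, j) for i in range(k) for j in range(k)]
    observed = sum(weight(i, j) * joint[i][j] for i, j in cells)
    expected = sum(weight(i, j) * row[i] * col[j] for i, j in cells)
    # Undefined (division by zero) if chance agreement is already perfect.
    return (observed - expected) / (1.0 - expected)


def describe_kappa(kappa):
    """Map kappa to the verbal bands quoted in the text (reference 19).

    Boundaries are placed at 0.80, 0.60, 0.40 and 0.20 (an assumption,
    because the quoted bands leave small gaps between them).
    """
    if kappa > 0.80:
        return "very good"
    if kappa > 0.60:
        return "good"
    if kappa > 0.40:
        return "moderate"
    if kappa > 0.20:
        return "fair"
    return "poor"
```

For an ordinal variable such as the color score, one would call `cohen_kappa(scores_1, scores_2, [1, 2, 3, 4], weighted=True)`; for a yes/no variable such as acoustic shadowing, the unweighted form applies.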
Inter-observer reproducibility of measurement results, including the calculated risks of
malignancy using LR1 and LR2, was described as the difference between two measurement
results. The differences between the measured values were plotted against the mean of the two
measurements (Bland-Altman plots) to assess the relationship between the differences and the
magnitude of the measurements (20). Systematic bias between two measurements was
estimated by calculating the 95% confidence interval (CI) of the mean difference (mean
difference ± 2 SE). If zero lay within this interval, no bias was assumed to exist between the
two measurements. Inter-observer agreement was expressed as the mean difference and limits
of agreement (20). Ninety-five percent of differences between any future measurements are
estimated to fall between the lower and upper limit of agreement. Inter-observer reliability of
measurement results was estimated by calculating the intra-class correlation coefficient
(ICC) using analysis of variance (two-way random model, absolute agreement; this allows
generalization of the results to a population of observers). The ICC indicates the proportion of
the total variance in measurement results that can be explained by differences between the
individuals examined. It depends both on the magnitude of measurement errors and the true
heterogeneity in the population in which measurements are made. The more variable the
population investigated, the greater the ICC, and the less variable the population, the smaller
the ICC (21). It has been suggested that ICC values >0.90 are needed for a test to be used in
clinical practice (22).
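The agreement and reliability statistics described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the SPSS procedure used in the study: the function names are hypothetical, the limits of agreement are taken as the mean difference ± 1.96 SD of the differences (the multiplier is an assumption), and the ICC is the single-measures form of the two-way random, absolute-agreement model (whether single or average measures were reported is not stated in this section).

```python
import math
import statistics


def bland_altman(first, second):
    """Mean difference, its approximate 95% CI, and limits of agreement."""
    diffs = [a - b for a, b in zip(first, second)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)          # sample SD of the differences
    se = sd_diff / math.sqrt(len(diffs))       # standard error of the mean
    ci = (mean_diff - 2 * se, mean_diff + 2 * se)  # mean difference +/- 2 SE
    limits = (mean_diff - 1.96 * sd_diff, mean_diff + 1.96 * sd_diff)
    return {
        "mean_difference": mean_diff,
        "ci_mean_difference": ci,
        "limits_of_agreement": limits,
        # Bias is assumed only if zero lies outside the CI.
        "systematic_bias": not (ci[0] <= 0.0 <= ci[1]),
    }


def icc_two_way_random_absolute(table):
    """Single-measures ICC, two-way random model, absolute agreement.

    `table` holds one row per mass and one column per observer.
    """
    n = len(table)         # number of masses
    k = len(table[0])      # number of observers
    grand = sum(sum(row) for row in table) / (n * k)
    row_means = [sum(row) / k for row in table]
    col_means = [sum(row[j] for row in table) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)                  # between-mass mean square
    ms_cols = ss_cols / (k - 1)                  # between-observer mean square
    ms_error = ss_error / ((n - 1) * (k - 1))    # residual mean square
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
```

Because the absolute-agreement form includes the between-observer mean square in the denominator, a constant offset between the two observers lowers the ICC even when the ranking of the masses is identical.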
Sensitivity and specificity with regard to malignancy were calculated for LR1 and LR2
using the information recorded by sonologist 1 and by sonologist 2.
Results
In all, 117 consecutive women with adnexal masses who underwent surgery were
examined with ultrasound by the two sonologists as described above. Thirty-four women had
bilateral adnexal masses. The most complex mass, or the largest one if both masses had
similar ultrasound morphology, was used in our statistical analysis, the mass to be included
being selected retrospectively to ensure that both sonologists contributed the same mass (right
or left) to the analysis. Thus, 117 adnexal masses from 117 patients constitute our study
population. The women's age ranged between 14 and 88 years (median, 53), and 63 (54%)
women were postmenopausal. There were 94 benign, four borderline, and 19 invasively
malignant adnexal masses (Table 2).
The time elapsed between the ultrasound examinations of sonologist 1 and 2 was median 61
days (10th and 90th percentiles, 13 and 132; range, 1-204) for the tumors with benign histology
and median 14 days (10th and 90th percentiles, 2 and 31; range, 1-41) for the tumors with
malignant histology. There was no relationship between the number of days between the
scans and the differences in measurement results or inter-observer agreement for discrete
variables (Supplementary Fig. S1-S5 and Supplementary Table S1).
Inter-observer reproducibility of measurement results is shown in Table 3. Bland-Altman
plots showed no clear trend for inter-observer differences in measurement results to change
with the magnitude of the measurement values. Limits of agreement were wide for all
measurements. There was one systematic difference between the two sonologists, sonologist 1
(who always performed the first examination) obtaining higher measurement values for the
maximum diameter of the mass. The least reliable measurement was the height of the largest
papillary projection.
Inter-observer agreement when assessing categorical ultrasound variables is shown in Table
4. For most categorical ultrasound variables, inter-observer agreement beyond chance was good
or very good (19). Inter-observer agreement beyond chance for variables included in LR1 or
LR2 was poorest for color score (agreement 40%, weighted kappa 0.36), presence of blood
flow in papillary projection (agreement 90%, kappa 0.48), irregular cyst wall (agreement 79%,
kappa 0.56), and acoustic shadowing (agreement 85%, kappa 0.58).
Bland-Altman plots illustrating the relationship between the magnitudes of the estimated
risk of malignancy calculated using LR1 and LR2 and the interobserver difference in
calculated risk are shown in Figure 1. The plots manifest a diamond shape, i.e., the
interobserver differences are smallest for the lowest and highest risks, and they are very small
for risks <25% and >95%. Logarithmic transformation of the data (20) did not substantially
change the shape of the scatter plot. Therefore, we present our results as absolute
inter-observer differences in calculated risk (in percentage units); see Table 5. There were no
systematic differences in calculated risks between the two sonologists, and reliability,
reflected by the ICC values, was good (22), with ICC values for LR1 of 0.911 and for LR2 of
0.832. When classifying tumors as having a risk of malignancy <10% (benign) or >10%
(malignant) using LR1 or LR2, the inter-observer agreement was good for both models:
inter-observer agreement 84% (98/117), kappa value 0.68 for model LR1, and inter-observer
agreement 85% (99/117), kappa 0.68 for model LR2. In the 19 cases where the two
sonologists obtained different results with regard to malignancy when using LR1, the absolute
interobserver differences in calculated risk ranged from 0.7 to 59.6 percentage units: in six of
the 19 cases the absolute interobserver difference in calculated risk was <10.0 percentage
units, in nine cases it was 10.0–24.9 percentage units, and in four cases it was >25.0
percentage units. In the 18 cases where the two sonologists obtained different results with
regard to malignancy when using LR2, the absolute interobserver difference in calculated risk
ranged from 8.8 to 67.9 percentage units: in two of the 18 cases the absolute interobserver
difference in calculated risk was <10.0 percentage units, in ten cases it was 10.0–24.9
percentage units, and in six cases it was >25 percentage units.
The Bland-Altman plots (Figure 1) illustrate that for some tumors there were substantial
interobserver differences in the calculated risk of malignancy: when using LR1, the
interobserver difference in calculated risk was >25 percentage units in 11 tumors (9% of all
tumors), and when using LR2, the interobserver difference in calculated risk was >25
percentage units in 14 tumors (12% of all tumors). To elucidate which interobserver
differences explained these largest interobserver differences in calculated risk, we scrutinized
each case where the difference was >25 percentage units. The results are shown in
Supplementary Tables S2 and S3. When using LR1, a discrepancy for one single categorical
variable explained the difference in four of the 11 cases, while a discrepancy for two
categorical variables explained the difference in one case (differences in measurements being
<5 mm in these five cases). In six cases there were differences in one or two categorical
variables but also substantial differences (6-61 mm) in at least one measurement result. In no
case was the large difference in calculated risk explained exclusively by differences in
measurement results. The categorical variables judged differently by the two sonologists in
these 11 cases were color score (n = 5), irregular cyst wall (n = 5), flow in papillary projection
(n = 3), and acoustic shadowing (n = 2).
When using LR2, a discrepancy for one single categorical variable explained the large
difference in calculated risk (>25 percentage units) in eight of the 14 cases (differences in
measurements being <5 mm in these eight cases), and in four of the eight cases the sonologists
judged acoustic shadowing differently. In five cases there were differences in one categorical
variable but also a substantial difference (9-61 mm) in the measurement of the largest solid
component. In yet another case there were differences in two categorical variables as well as in
the measurement of the largest solid component. The categorical variables judged differently
by the two sonologists in these 14 cases were acoustic shadowing (n = 5), irregular cyst wall
(n = 5), ascites (n = 3), and flow in papillary projection (n = 2).
The sensitivity with regard to malignancy when using LR1 (10% risk cutoff) was 100%
(23/23; 95% CI, 82-100) for both sonologists; the specificity was 74% (70/94; 95% CI, 64-82)
for sonologist 1 and 63% (59/94; 95% CI, 53-72) for sonologist 2. The sensitivity when using
LR2 was 100% (23/23; 95% CI, 82-100) for sonologist 1 and 91% (21/23; 95% CI, 72-98) for
sonologist 2, and the specificity was 75.5% (71/94; 95% CI, 65-84) for both sonologists.
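The sensitivity and specificity figures above are simple proportions of the counts given in parentheses, and the sketch below shows how they could be recomputed. The function names are hypothetical. The confidence interval uses the Wilson score method, which is an assumption: the method behind the published intervals is not stated in this section, so the interval bounds produced here need not match them.

```python
import math


def proportion_with_wilson_ci(events, total, z=1.96):
    """Proportion with a Wilson score interval (assumed CI method)."""
    p = events / total
    denom = 1.0 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return p, max(0.0, centre - half), min(1.0, centre + half)


def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return {
        "sensitivity": proportion_with_wilson_ci(tp, tp + fn),
        "specificity": proportion_with_wilson_ci(tn, tn + fp),
    }


# Counts reported for sonologist 1 with LR1: 23/23 malignant masses detected,
# 70/94 benign masses correctly classified (hence 24 false positives).
lr1_sonologist_1 = sensitivity_specificity(tp=23, fn=0, tn=70, fp=24)
```

Each entry of the returned dictionary is a tuple of (point estimate, lower bound, upper bound) expressed as proportions rather than percentages.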
Discussion
We have shown substantial inter-observer variability in the results of measurements taken in
adnexal masses (wide limits of agreement). Inter-observer agreement beyond chance was very
good or good for most categorical variables, but it was only moderate or fair for some.
Inter-observer agreement above chance was poorest for variables heavily dependent on subjective
evaluation and/or machine settings, i.e., color score, presence of color Doppler signals in
papillary projections, irregular cyst walls, acoustic shadowing (all four variables being
included in LR1 or LR2), echogenicity of cyst fluid, and ovarian crescent sign. Despite this,
there was good inter-observer agreement when classifying tumors as benign or malignant using
the predetermined risk of malignancy cut-off of 10%. However, in some cases there were
substantial differences in the calculated risk of malignancy between the two sonologists, the
difference being >25.0 percentage units in 9% of all tumors when using LR1 and in 12% of all
tumors when using LR2.
The strength of our study is that it provides new information. To the best of our knowledge,
there is only one publication reporting on interobserver agreement with regard to describing
ultrasound findings in adnexal masses using the IOTA terminology (11) when performing live
ultrasound examinations (23). However, that study (23) evaluated interobserver agreement
with regard to the ten ultrasound features in the IOTA simple rules (24, 25), not the variables
included in the IOTA logistic regression models LR1 and LR2, and agreement was estimated
between examiners with different levels of experience. The variable with poorest agreement
beyond chance in the study cited was acoustic shadowing (kappa 0.36). We have found no
published study that has estimated inter-observer reproducibility of the calculated risk of
malignancy using LR1 or LR2 after live scanning.
It is a limitation of our study that up to 204 days elapsed between the scans of the two
sonologists (up to 41 days for malignant masses). Because days elapsed between the scans,
theoretically the inter-observer differences could be explained by the lesions having changed
in size or morphology between the scans. We find this highly unlikely for the following
reasons. First, there was no relationship between the differences in measurement results and
the number of days between the scans (Supplementary Fig. S1-S5). Nor was there a clear
tendency for inter-observer agreement for discrete variables to depend on the time between the
scans (Supplementary Table S1). Second, one would expect a lesion and its components to
increase in size with time, but sonologist 1, performing the first scan, obtained higher
measurement values than sonologist 2. Third, it is our experience, after having performed
gynecological scans for more than 20 years, that the ultrasound morphology of both benign and
malignant adnexal masses remains constant over time, that benign adnexal lesions grow
slowly, and that malignant masses do not change appreciably in size even during 1 month of
observation. Therefore, we believe that the discrepancies between the two sonologists reflect
true inter-observer differences and not a change of the masses over time. A second limitation is
that we did not include estimation of the reproducibility of retrieving anamnestic information
(current hormonal therapy, personal history of ovarian cancer), the anamnestic information
collected by the second sonologist being used in all cases. It cannot be entirely excluded that
patients would answer differently when asked by different sonologists or that sonologists
could interpret the answers of the patients differently. A third limitation is that we did not
estimate intra-observer reproducibility. We considered four scans (two per sonologist) likely to
be unacceptable to patients. For the same reason, only two sonologists were involved in this
study, and our results are generalizable only to sonologists with a similar level of experience.
The results of this live scanning study are similar to those of another study in which the
same sonologists assessed the same variables using 3D ultrasound volumes from adnexal
masses in another tumor population (15). The similarity in results between the two studies is
surprising, because the conditions when assessing 3D ultrasound volumes are different from
those during a live scan. When evaluating ultrasound volumes, sonologists are exposed to the
same ultrasound images, and so any interobserver difference should be explained exclusively
by differences in interpreting the ultrasound information. During a live scan there are more
sources of bias. This could result either in poorer or better interobserver agreement than when
3D ultrasound volumes are assessed: poorer because ultrasound examiners are likely to use
different machine settings and scanning conditions may change from one minute to another,
better because the dynamic nature of live scanning facilitates discrimination between solid
components and amorphous tissue.
Our results showed that two experienced sonologists agreed quite well in their classification
of masses as benign or malignant using the 10% risk of malignancy cutoff of LR1 and LR2,
and that the diagnostic performance of LR1 and LR2 with regard to discrimination between
benign and malignant tumors was similar for the two sonologists and similar to that reported
by others (14, 26-28). This is reassuring, because the main purpose of using models LR1 and
LR2 is to classify tumors as benign or malignant. Potentially, however, LR1 and LR2 can be
used not only to classify adnexal masses as benign or malignant but also to counsel a patient
about her individual risk of malignancy (13). If the calculated risk is to be used for individual
counseling, one must be reasonably certain not only that the estimated risk agrees well with the
true risk (when externally validated, both LR1 and LR2 underestimated the true risk, especially
in the risk interval 30%-70% (14)), but also that the risk estimates are reproducible, i.e., that
different examiners will obtain similar risk estimates. Our results show that risk estimates may
differ substantially between experienced observers, the difference in estimated risk being >25.0
percentage units in 9% and 12% of cases when using LR1 and LR2, respectively. Interobserver
agreement above chance was poorest for those variables in the models that are heavily
dependent on subjective evaluation, i.e., color score, presence of color Doppler signals in
papillary projections, irregular cyst walls, and acoustic shadowing. Indeed, differences in these
explained most of the largest inter-observer differences in calculated risk of malignancy. In
models based on few variables, changing values in only one variable may result in large
differences in predicted risks, while a model with many variables is less vulnerable to a change
in one or even a few variables. Our results illustrate this (Supplementary Tables S2 and S3).
When using LR2 (which includes six variables), a change in value for one single categorical
variable explained an inter-observer difference in calculated risk >25 percentage units in eight
of 14 cases, while when using LR1 (which includes 12 variables), a change in value for one
single categorical variable explained an inter-observer difference in calculated risk >25
percentage units in only four of 11 cases. Acoustic shadowing is a strong variable in both LR1
and LR2 and has great impact on the calculated risk in LR2, with only six variables. In our
hands, as well as in those of Ruiz de Gauna et al. (23), inter-observer agreement for acoustic
shadowing was at most moderate. The interobserver agreement for color score was only fair in
our study, and color score is an important variable in LR1.
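As a rough illustration of why a single discordant categorical variable can move the LR2 estimate so far, the worked example below uses only the LR2 coefficient for acoustic shadows given in the footnote to Table 1 (-2.9486). The starting risk of 50% is a hypothetical value chosen for convenience and does not correspond to any case in the study.

```latex
\begin{align*}
\text{odds multiplier for acoustic shadows} &= e^{-2.9486} \approx 0.052 \\
z_{\text{no shadows}} = 0 \;&\Rightarrow\; y = \frac{1}{1 + e^{0}} = 0.50 \\
z_{\text{shadows}} = 0 - 2.9486 \;&\Rightarrow\; y = \frac{1}{1 + e^{2.9486}} \approx 0.050
\end{align*}
```

Under these assumptions, two sonologists who agree on every other LR2 variable but disagree about acoustic shadowing would obtain risks of about 50% and 5%, a difference of roughly 45 percentage units and a different classification at the 10% cutoff.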
To improve inter-observer reproducibility of calculated risks based on LR1 and LR2,
inter-observer differences in descriptions and measurements of adnexal masses using the IOTA
terminology and measurement technique need to be reduced. One way to achieve this could be by
providing courses on and training in how to examine and describe adnexal masses using the
IOTA terms. Interactive courses in which a large number of ultrasound images are discussed with
the course participants are likely to be very valuable in this respect. More precise definitions of
the IOTA terms, for example by providing ample imaging material, would probably also help
improve inter-observer agreement. Special attention should be given to the variables with poorest
reproducibility, i.e., the color score, wall irregularity, acoustic shadowing, and detection of blood
flow in papillary projections. Until better inter-observer agreement in the calculated risk of
malignancy using LR1 and LR2 has been shown, one should be cautious with using the risk
estimates for individual patient counselling.
Acknowledgements
None
References
1. Granberg S, Norström A, Wikland M. Tumors in the lower pelvis as imaged by vaginal
sonography. Gynecol Oncol 1990;37:224-9.
2. Benacerraf BR, Finkler NJ, Wojciechowski C, Knapp RC. Sonographic accuracy in the
diagnosis of ovarian masses. J Reprod Med 1990;35:491-5.
3. Valentin L. Pattern recognition of pelvic masses by gray-scale ultrasound imaging: the
contribution of Doppler ultrasound. Ultrasound Obstet Gynecol 1999;14:338-47.
4. Valentin L. Prospective cross-validation of Doppler ultrasound examination and
gray-scale ultrasound imaging for discrimination of benign and malignant pelvic masses.
Ultrasound Obstet Gynecol 1999;14:273-83.
5. Timmerman D, Schwärzler P, Collins WP, Claerhout F, Coenen M, Amant F, et al.
Subjective assessment of adnexal masses with the use of ultrasonography: an analysis
of interobserver variability and experience. Ultrasound Obstet Gynecol 1999;13:11-6.
6. Sokalska A, Timmerman D, Testa AC, Van Holsbeke C, Lissoni AA, Leone FPG, et al.
Diagnostic accuracy of transvaginal ultrasound examination for assigning a specific
diagnosis to adnexal masses. Ultrasound Obstet Gynecol 2009;34:462-70.
7. Valentin L. Use of morphology to characterize and manage common adnexal masses.
Best Pract Res Clin Obstet Gynaecol 2004;18:71-89.
8. Tailor A, Jurkovic D, Bourne TH, Collins WP, Campbell S. Sonographic prediction of
malignancy in adnexal masses using multivariate logistic regression analysis.
Ultrasound Obstet Gynecol 1997;10:41-7.
9. Timmerman D, Bourne TH, Tailor A, Collins WP, Verrelst H, Vandenberghe K, et al.
A comparison of methods for the preoperative discrimination between benign and
malignant adnexal masses: the development of a new logistic regression model. Am J
Obstet Gynecol 1999;181:57-65.
10. Alcazar JL, Jurado M. Prospective evaluation of logistic model based on sonographic
morphologic and color Doppler findings developed to predict adnexal malignancy. J
Ultrasound Med 1999;18:837-42.
11. Timmerman D, Valentin L, Bourne TH, Collins WP, Verrelst H, Vergote I. Terms,
definitions and measurements to describe the sonographic features of adnexal tumors: a
consensus opinion from the International Ovarian Tumor Analysis (IOTA) Group.
Ultrasound Obstet Gynecol 2000;16:500-5.
12. Timmerman D, Testa AC, Bourne T, Ferrazzi E, Ameye L, Konstantinovic ML, et al.
Logistic regression model to distinguish between the benign and malignant adnexal
mass before surgery: a multicenter study by the International Ovarian Tumor Analysis
Group. J Clin Oncol 2005;34:8794-801.
13. Kaijser J, Bourne T, Valentin L, Sayasneh A, Van Holsbeke C, Vergote I, et al.
Improving strategies for diagnosing ovarian cancer: a summary of the International
Ovarian Tumor Analysis (IOTA) studies. Ultrasound Obstet Gynecol 2013;41:9-20.
14. Timmerman D, Van Calster B, Testa AC, Guerriero S, Fischerova D, Lissoni AA, et al.
Ovarian cancer prediction in adnexal masses using ultrasound-based logistic regression
models: a temporal and external validation study by the IOTA group. Ultrasound Obstet
Gynecol 2010;36:226-34.
15. Sladkevicius P, Valentin L. Intra- and inter-observer agreement when describing
adnexal masses using the International Ovarian Tumour Analysis (IOTA) terms and
definitions: a study on three-dimensional (3D) ultrasound volumes. Ultrasound Obstet
Gynecol 2013;41:318-27.
16. Heintz APM, Odicino F, Maisonneuve P, Beller U, Benedet JL, Creasman WT, et al.
Carcinoma of the Ovary. 25th Annual Report on the Results of Treatment in
Gynecological Cancer. Int J Gynecol Obstet 2003;83:S135-S166 (suppl 1).
17. Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas
1960;20:37–46.
18. Kundel HL, Polansky M. Measurement of observer agreement. Radiology
2003;228:303-8.
19. Brennan P, Silman A. Statistical methods for assessing observer variability in clinical
measures. BMJ 1992;304:1491-4.
20. Bland JM, Altman DG. Statistical methods for assessing agreement between two
methods of clinical measurement. Lancet 1986;1:307-10.
21. Bartlett JW, Frost C. Reliability, repeatability and reproducibility: analysis of
measurement errors in continuous variables. Ultrasound Obstet Gynecol
2008;31:466-75.
22. Kottner J, Audigé L, Brorson S, Donner A, Gajewski BJ, Hróbjartsson A, et al.
Guidelines for Reporting Reliability and Agreement Studies (GRRAS) were proposed. J
Clin Epidemiol 2011;64:96-106.
23. Ruiz de Gauna B, Sanchez P, Pineda L, Utrilla-Layna J, Juez L, Alcázar JL.
Inter-observer agreement with regard to describing adnexal masses using the IOTA simple
rules in a real-time setting and when using three-dimensional ultrasound volumes and
digital clips. Ultrasound Obstet Gynecol 2014;44:95-100.
24. Timmerman D, Testa AC, Bourne T, Ameye L, Jurkovic D, Van Holsbeke C, et al.
Simple ultrasound-based rules for the diagnosis of ovarian cancer. Ultrasound Obstet
Gynecol 2008;31:681-90.
25. Timmerman D, Ameye L, Fischerova D, Epstein E, Melis GB, Guerriero S, et al.
Simple ultrasound rules to distinguish between benign and malignant adnexal masses
before surgery: prospective validation by IOTA group. BMJ 2010;341:c6839.
26. Van Holsbeke C, Van Calster B, Testa AC, Domali E, Lu C, Van Huffel S, et al.
Prospective internal validation of mathematical models to predict malignancy in
adnexal masses: results from the international ovarian tumor analysis study. Clin Cancer
Res 2009;15:684-91.
27. Nunes N, Yazbek J, Ambler G, Hoo W, Naftalin J, Jurkovic D. Prospective evaluation
of the IOTA logistic regression model LR2 for the diagnosis of ovarian cancer.
Ultrasound Obstet Gynecol 2012;40:355-9.
28. Nunes N, Ambler G, Hoo WL, Naftalin J, Foo X, Widschwendter M, et al.
A prospective validation of the IOTA logistic regression models (LR1 and LR2) in
comparison to subjective pattern recognition for the diagnosis of ovarian cancer.
Int J Gynecol Cancer 2013;23:1583-9.
Table 1. Variables for which inter-observer reproducibility was estimated
_______________________________________________________________________
Variables included in logistic regression models LR1 and LR2, or of importance for these models
  Maximum diameter of adnexal mass (mm)
  Maximum diameter of largest solid component (mm)
  Maximum diameter of largest solid component (≤50, >50 mm)
  Presence of an entirely solid adnexal mass (yes, no)
  Papillary projections present (yes, no)
  Irregular internal cyst walls (yes, no)
  Acoustic shadows (yes, no)
  Color Doppler signals in papillary projection (yes, no)
  Color score (1, 2, 3, 4)
  Tenderness of adnexal mass at scan (yes, no)
  Ascites (yes, no)
Other IOTA variables
  Continuous IOTA variables
    Mean diameter of adnexal mass (mm)
    Mean diameter of largest solid component (mm)
    Height of the largest papillary projection (mm)
  Categorical IOTA variables
    Type of tumor (unilocular, unilocular solid, multilocular, multilocular solid, solid)
    Mean diameter of adnexal mass (≤40, 41-60, 61-80, 81-100, and >100 mm)
    Number of cyst locules (0, 1, 2, 3, 4, 5, 6-10, 11-20, and >20; ≤10, >10†)
    Septum present (yes, no)
    Incomplete septum present (yes, no)
    Solid component (yes, no)
    Papillary projections (0, 1, 2, 3, ≥4; 1, 2, 3, ≥4‡; <4, >4†)
    Echogenicity of cyst fluid (anechoic, low level, ground glass, mixed, no cyst fluid)
    Color Doppler signals detectable (yes, no)
    Ovarian crescent sign (yes, no)
Diagnosis based on LR1 and LR2
  Calculated risk of malignancy using LR1 (≤10%, >10%)
  Calculated risk of malignancy using LR2 (≤10%, >10%)
___________________________________________________________________________
IOTA, International Ovarian Tumor Analysis group; LR1, logistic regression model 1; LR2, logistic regression model 2.
The risk of malignancy using LR1 is derived as
$$y = \frac{1}{1 + e^{-z}}$$
where
$$z = -6.7468 + 1.5985\,(a) - 0.9983\,(b) + 0.0326\,(c) + 0.00841\,(d) - 0.8577\,(e) + 1.5513\,(f) + 1.1737\,(g) + 0.9281\,(h) + 0.0496\,(i) + 1.1421\,(j) - 2.3550\,(k) + 0.4916\,(l)$$
and e is the mathematical constant and base value of natural logarithms. The 12 variables included in LR1 are:
  (a) personal history of ovarian cancer (yes = 1, no = 0)
  (b) current hormonal therapy (yes = 1, no = 0)
  (c) age of the patient (in years)
  (d) maximum diameter of the lesion (in millimeters)
  (e) mass tender at scan (yes = 1, no = 0)
  (f) the presence of ascites (yes = 1, no = 0)
  (g) the presence of blood flow within a solid papillary projection (yes = 1, no = 0)
  (h) the presence of a purely solid tumor (yes = 1, no = 0)
  (i) maximal diameter of the solid component (expressed in millimeters, but with no increase >50 mm)
  (j) irregular internal cyst walls (yes = 1, no = 0)
  (k) the presence of acoustic shadows (yes = 1, no = 0)
  (l) color score (1, 2, 3, or 4)
The risk of malignancy using LR2 is derived as
$$y = \frac{1}{1 + e^{-z}}$$
where
$$z = -5.3718 + 0.0354\,(c) + 1.6159\,(f) + 1.1768\,(g) + 0.0697\,(i) + 0.9586\,(j) - 2.9486\,(k)$$
(12). A calculated risk of malignancy of more than 10% classified the adnexal mass as malignant.
† includes 0
‡ includes only cases where papillary projections were registered at both examinations
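The two formulas in the Table 1 footnote can be sketched in Python as follows. This is a minimal illustration, not the IOTA study software: the function and parameter names are hypothetical, the coefficients are copied from the footnote, and applying the 50 mm cap on the solid component in LR2 as well as LR1 is an assumption based on LR2 reusing variable (i).

```python
import math


def logistic(z):
    """Convert the linear predictor z into a probability, y = 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))


def lr1_risk(*, history_ovarian_cancer, hormonal_therapy, age_years,
             max_lesion_mm, tender_at_scan, ascites,
             flow_in_papillary_projection, purely_solid, max_solid_mm,
             irregular_cyst_walls, acoustic_shadows, color_score):
    """Risk of malignancy according to LR1 (binary inputs coded 1 = yes, 0 = no)."""
    solid = min(max_solid_mm, 50.0)  # variable (i): no increase above 50 mm
    z = (-6.7468
         + 1.5985 * history_ovarian_cancer        # (a)
         - 0.9983 * hormonal_therapy              # (b)
         + 0.0326 * age_years                     # (c)
         + 0.00841 * max_lesion_mm                # (d)
         - 0.8577 * tender_at_scan                # (e)
         + 1.5513 * ascites                       # (f)
         + 1.1737 * flow_in_papillary_projection  # (g)
         + 0.9281 * purely_solid                  # (h)
         + 0.0496 * solid                         # (i)
         + 1.1421 * irregular_cyst_walls          # (j)
         - 2.3550 * acoustic_shadows              # (k)
         + 0.4916 * color_score)                  # (l)
    return logistic(z)


def lr2_risk(*, age_years, ascites, flow_in_papillary_projection,
             max_solid_mm, irregular_cyst_walls, acoustic_shadows):
    """Risk of malignancy according to LR2 (six variables)."""
    solid = min(max_solid_mm, 50.0)  # assumed: same cap as in LR1
    z = (-5.3718
         + 0.0354 * age_years                     # (c)
         + 1.6159 * ascites                       # (f)
         + 1.1768 * flow_in_papillary_projection  # (g)
         + 0.0697 * solid                         # (i)
         + 0.9586 * irregular_cyst_walls          # (j)
         - 2.9486 * acoustic_shadows)             # (k)
    return logistic(z)


def classify(risk, cutoff=0.10):
    """Apply the 10% cutoff: a risk above the cutoff is classified as malignant."""
    return "malignant" if risk > cutoff else "benign"
```

Keyword-only arguments are used so that the twelve LR1 inputs cannot be passed in the wrong order by accident.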
Table 2. Histological diagnoses of the masses
___________________________________________________________________________
Histological diagnosis of tumor              Number
___________________________________________________________________________
Benign tumors                                94
  Benign simple cyst                          7
  Endometrioma                               10
  Dermoid cyst                               16
  Serous cystadenoma                         16
  Mucinous cystadenoma                       18
  Myoma/fibroma                               9
  Cystadenofibroma                           11
  Paraovarian cyst                            5
  Sactosalpinx, chronic salpingitis           1
  Leydig cell tumor                           1
Borderline tumors                             4
  Serous                                      2
  Mucinous                                    1
  Endometrioid                                1
Invasive malignancy                          19
  Primary ovarian adenocarcinoma             13
  Granulosa cell tumor                        3
  Dysgerminoma                                1
  Leiomyosarcoma                              1
  Malignant aggressive B-cell lymphoma        1
___________________________________________________________________________
Table 3. Inter-observer reproducibility of continuous variables used in the risk calculation models LR1 and LR2 and of other continuous variables used to describe adnexal masses
___________________________________________________________________________
Column groups: measurement results of both sonologists (median, range); difference in mm between the two measurements made by sonologists 1 and 2 (a) (mean, 95% CI, limits of agreement); Intra-CC (point estimate and 95% CI)

Parameter                                           Median (n)      Range     Mean difference (n)   95% CI           Limits of agreement   Intra-CC (95% CI)
Variables used in models LR1 and LR2
  Maximum diameter of adnexal mass, mm              70 (n=234)      10-313    3.80 (n=117)          1.12 to 6.48     -25.24 to 32.82       0.958 (0.937-0.971)
  Maximum diameter of largest solid component,
    mm (b)                                          29.50 (n=122)   5-180     1.92 (n=61)           -1.74 to 5.58    -26.66 to 30.50       0.942 (0.905-0.964)
Other variables used to describe adnexal mass
  Mean diameter of adnexal mass, mm                 58.5 (n=234)    9-240     1.05 (n=117)          -0.15 to 1.95    -8.61 to 10.72        0.971 (0.958-0.980)
  Mean diameter of largest solid component,
    mm (b)                                          22 (n=122)      4-156     0.59 (n=61)           -1.82 to 2.98    -18.16 to 19.32       0.962 (0.937-0.977)
  Height of largest papillary projection, mm (c)    8 (n=42)        3-25      -0.51 (n=21)          -2.93 to 1.91    -11.61 to 10.59       0.609 (0.245-0.821)
___________________________________________________________________________
CI, confidence interval; Intra-CC, intra-class correlation coefficient.
(a) The measurements made by sonologist 2 were subtracted from the measurements made by sonologist 1.
(b) Includes only cases where solid components were registered by both observers.
(c) Includes only cases where papillary projections were registered by both observers. Using logarithmically transformed data, the results were as follows: mean difference 1.03 (95% CI 0.78 to 1.36), limits of agreement 0.31 to 3.61, Intra-CC 0.547 (0.153-0.789). These results can be used for comparison with our published results on inter-observer agreement when analyzing ultrasound volumes obtained by three-dimensional ultrasound (15).
Table 4. Inter-observer agreement for categorical variables included in the risk calculation models LR1 and LR2 and for other categorical variables used to describe adnexal masses
____________________________________________________________________________________________
Variable                                                                         Agreement, % (n/N)   Kappa value
____________________________________________________________________________________________
Variables used in models LR1 and LR2
  Presence of entirely solid tumor (yes, no)                                     97 (114/117)         0.91
  Maximum diameter of largest solid component (≤50 mm, >50 mm)                   95 (111/117)         0.83
  Maximum diameter of largest solid component (≤50 mm, >50 mm) (a)               92 (56/61)           0.82
  Tenderness of adnexal mass (yes, no)                                           97 (113/117)         0.78
  Ascites (yes, no)                                                              97 (113/117)         0.73
  Papillary projections (yes, no)                                                87 (102/117)         0.65
  Solid component (yes, no)                                                      84 (98/117)          0.67
  Irregular cyst wall (yes, no)                                                  79 (92/117)          0.56
  Acoustic shadows (yes, no)                                                     85 (99/117)          0.58
  Color Doppler signals in papillary projections (yes, no) (b)                   90 (105/117)         0.48
  Color Doppler signals in papillary projections (yes, no) (c)                   86 (18/21)           0.71
  Color score (1, 2, 3, 4)                                                       40 (47/117)          0.36 (d)
Other variables used to describe adnexal masses
  Tumor type (unilocular, unilocular solid, multilocular, multilocular solid,
    solid)                                                                       97 (113/117)         0.70
  Mean diameter of tumor (≤40, 41-60, 61-80, 81-100, >100 mm)                    66 (77/117)          0.76 (d)
  Number of locules
    0, 1, 2, 3, 4, 5, 6-10, 11-20, >20                                           58 (68/117)          0.79 (d)
    ≤10, >10 (e)                                                                 94 (110/117)         0.80
  Septum (yes, no)                                                               88 (103/117)         0.76
  Incomplete septum (yes, no)                                                    98 (115/117)         ¶
  Number of papillary projections
    0, 1, 2, 3, ≥4                                                               81 (95/117)          0.64 (d)
    1, 2, 3, ≥4 (c)                                                              67 (14/21)           0.63 (d)
    <4, >4 (e)                                                                   96 (112/117)         0.68
  Echogenicity of cyst fluid (anechoic, low level, ground glass, mixed,
    no fluid)                                                                    67 (78/117)          0.56
  Color Doppler signals detectable (yes, no)                                     84 (98/117)          0.30
  Ovarian crescent sign (yes, no)                                                78 (91/117)          0.49
  Fluid in Douglas pouch (yes, no)                                               87 (102/117)         0.72
____________________________________________________________________________________________
(a) Includes only cases where solid components were registered at both examinations.
(b) This variable reflects disagreement both with regard to the presence of papillary projections and the presence of blood flow in papillary projections (i.e., if one observer recorded the presence of papillary projections and the other did not, there was disagreement with regard to this variable, and if both observers recorded the presence of papillary projections but disagreed with regard to the presence of blood flow in the papillary projections, there was disagreement with regard to this variable).
(c) Includes only cases where papillary projections were registered at both examinations.
(d) Weighted Kappa index.
(e) Includes 0.
¶ Absent kappa values are explained by asymmetric contingency tables, which made it impossible to calculate them.
Table 5. Inter-observer reproducibility of the risk of malignancy calculated using the risk calculation models LR1 and LR2
___________________________________________________________________________
Column groups: calculated risk of malignancy, both sonologists (median, range); difference between the risk calculated by sonologist 1 and 2 (mean, 95% CI, limits of agreement); Intra-CC (point estimate and 95% CI). Risks and differences are in percent and percentage units.

Parameter                                   Median (n)     Range         Mean difference (n)   95% CI           Limits of agreement   Intra-CC (95% CI)
Risk of malignancy calculated using LR1     7.85 (n=234)   0.10-99.10    -0.53 (n=117)         -3.07 to 2.01    -28.05 to 26.99       0.911 (0.874-0.937)
Risk of malignancy calculated using LR2     6.65 (n=234)   0.10-98.40    0.02 (n=117)          -3.06 to 3.10    -33.22 to 33.26       0.832 (0.766-0.880)
___________________________________________________________________________
CI, confidence interval; Intra-CC, intra-class correlation coefficient.
Legends for figure
Figure 1. a) Scatterplot showing the relationship between inter-observer difference (observer
1 minus observer 2) in calculated risk and magnitude of calculated risk when using logistic
regression model LR1. The plot manifests a diamond shape, the differences being smallest for
the lowest and highest risks. For risks <2.5% and >95% the differences are very small.
LOA, limits of agreement. b) Scatterplot showing the relationship between inter-observer
difference in calculated risk and magnitude of calculated risk when using logistic regression
model LR2. The plot manifests a diamond shape, the differences being smallest for the lowest
and highest risks.
Povilas Sladkevicius and Lil Valentin. Inter-observer agreement in describing the ultrasound appearance of adnexal masses and in calculating the risk of malignancy using logistic regression models. Clin Cancer Res. Published OnlineFirst November 25, 2014.
doi: 10.1158/1078-0432.CCR-14-0906
Statement of translational relevance
The International Ovarian Tumor Analysis (IOTA) group has developed two logistic
regression models (LR1 and LR2), including clinical and ultrasound variables, for calculation
of the risk of malignancy in adnexal masses. It has been suggested that LR1 and LR2 can be
used to counsel patients about their individual risk of malignancy and so may have a role in
personalized medicine. In this work we found large inter-observer differences (>25
percentage units) in the calculated risk of malignancy in about 10% of cases. The differences
were explained by ultrasound examiners interpreting ultrasound images differently. We
suggest measures to improve inter-observer agreement. Until better inter-observer agreement
in the calculated risk of malignancy using LR1 and LR2 has been shown, one should be
cautious about using the risk estimate for individual patient counselling.
Abstract
Purpose: To estimate inter-observer agreement with regard to describing adnexal masses
using the International Ovarian Tumor Analysis (IOTA) terminology and the risk of
malignancy calculated using IOTA logistic regression models LR1 and LR2, and to elucidate
what explained the largest inter-observer differences in calculated risk of malignancy.
Experimental design: 117 women with adnexal masses were examined with transvaginal
gray scale and power Doppler ultrasound by two independent experienced sonologists who
described the masses using IOTA terminology. The risk of malignancy was calculated using
LR1 and LR2. A predetermined risk of malignancy cutoff of 10% indicated malignancy.
Results: There were 94 benign, four borderline, and 19 invasively malignant tumors. There
was substantial variability between the two sonologists in measurement results and some
variability in assessment of categorical variables (agreement 40-98%, Kappa 0.30-0.91).
Inter-observer agreement when classifying tumors as benign or malignant was 84% (98/117),
Kappa 0.68, for LR1 and 85% (99/117), Kappa 0.68, for LR2. When using LR1 and LR2 the
inter-observer difference in calculated risk was >25 percentage units in 9% (11/117) and 12%
(14/117) of tumors, respectively. Differences in assessment of wall irregularity, acoustic
shadowing, color score, and color flow in papillary projections explained most of these largest
differences.
Conclusions: Inter-observer agreement in classifying tumors as benign or malignant
using the risk of malignancy cutoff of 10% for LR1 and LR2 was good. However, because
risk estimates may differ substantially between sonologists, one should be cautious about using
the risk value for counseling patients about their individual risk.
Introduction
One of the first successful attempts to use ultrasound to discriminate between benign and
malignant adnexal masses was made by Granberg and coworkers (1). They classified adnexal
masses into five categories (unilocular, unilocular solid, multilocular, multilocular solid, and
solid tumors) and found that unilocular cysts, unilocular solid cysts, and multilocular cysts
were rarely malignant. Later, subjective interpretation of ultrasound images of adnexal masses
(pattern recognition) proved to be an excellent method for discriminating between benign
and malignant adnexal masses (2-5) and also for making a specific diagnosis (e.g.,
endometrioma, hydrosalpinx, etc.) (3,6,7). As an alternative to pattern recognition,
several research teams (8-10) created logistic regression models including clinical and
ultrasound information to calculate the individual risk of malignancy in adnexal masses.
Because of unclear definitions of many of the ultrasound variables included in these models,
the International Ovarian Tumor Analysis (IOTA) group suggested standardized terms and
definitions to be used when describing ultrasound images of adnexal masses (11). The IOTA
group also created and validated several mathematical models in which these standardized
terms and definitions were used to calculate the risk of malignancy for each individual
adnexal mass (12-14). Of these models, the logistic regression models LR1 and LR2,
including 12 and six variables, respectively (see Table 1), were suggested to be suitable for use
in clinical practice (13,14). However, even when using standardized terms and definitions,
ultrasound examiners may evaluate the features of an adnexal mass differently. There may
also be variability in measurement results. This means that the risk of malignancy calculated
by LR1 or LR2 may vary both within and between ultrasound examiners. We have shown that
this is indeed the case when experienced ultrasound examiners analyze three-dimensional
(3D) ultrasound volumes of adnexal masses (15). However, analysis of 3D ultrasound
volumes does not necessarily reflect a situation where live examinations are performed.
The aims of this study were to estimate interobserver agreement when live ultrasound
scans are performed with regard to 1) describing adnexal masses using the IOTA terminology
and 2) the risk of malignancy calculated using the IOTA logistic regression models LR1 and LR2,
and 3) to elucidate what explains large interobserver differences in calculated risk of
malignancy.
Materials and methods
The Ethics Committee of Lund University approved the study protocol. Informed consent
was obtained from all participants or the participant's guardian after the nature of the procedures
had been fully explained.
This is a prospective observational study of real-time live ultrasound examinations of
adnexal masses. Consecutive patients referred for an ultrasound examination and found to have
an adnexal mass judged to need surgical removal were scanned according to the research
protocol by sonologist 1 (PS) as part of the clinical ultrasound examination. A second
ultrasound examination was carried out before surgery by sonologist 2 (LV). Both examiners
used the standardized IOTA examination and measurement technique and the IOTA
terminology (11) to describe their ultrasound findings and noted their results in a dedicated
paper form. Sonologist 2 was blinded to the results of sonologist 1. Information on the clinical
variables included in LR1 and LR2 (personal history of ovarian cancer, current hormonal
therapy, age of the patient) was obtained at the preoperative ultrasound examination by
sonologist 2. All patients were operated on within 90 days after the preoperative ultrasound
examination performed by sonologist 2. The excised tissues underwent histological
examination, and tumors were classified according to the criteria recommended by the
International Federation of Gynecology and Obstetrics (16). Borderline tumors were classified
as malignant.
The patients were examined in the lithotomy position with an empty urinary bladder (11).
Abdominal ultrasound examination was added when needed. The ultrasound variables
assessed with regard to interobserver reproducibility are shown in Table 1. The size of the
lesion and that of its largest solid component were measured (largest diameter and mean of
three orthogonal diameters) using calipers on the frozen ultrasound image. A color score was
assigned on the basis of subjective assessment of the color content of the tumor scan at power
Doppler ultrasound examination. A color score of 1 indicates absence of color Doppler
signals, a color score of 2 a minimal amount of color Doppler signals, a color score of 3 a
moderate amount of color Doppler signals, and a color score of 4 a large amount of color
Doppler signals in the tumor (11).
The ultrasound systems used were GE Voluson 730 Expert or GE Voluson E8 (GE
Healthcare, Zipf, Austria) with a 5-9 MHz transvaginal transducer. For power Doppler
ultrasound examinations the following settings were used. For the Voluson 730 Expert system:
frequency 6-9 MHz ("normal"), pulse repetition frequency 0.6 kHz, gain 0.8, wall motion
filter "low 1" (40 Hz). For the Voluson E8: frequency 6-9 MHz ("normal"), pulse repetition
frequency 0.6 kHz, gain -4.0, wall motion filter "low 1" (40 Hz).
Statistical analysis
The IOTA3 study screen (astraia GmbH, Munich, Germany) was used to calculate the risk
of malignancy according to LR1. Weighted Kappa indices were calculated using the statistical
program Stata, Version 10.1 for Windows (StataCorp LP, College Station, TX, USA). For all
other statistical calculations, including calculation of the risk of malignancy when using LR2,
we used the Statistical Package for the Social Sciences (SPSS program, IBM Corp., New
York, NY, USA; PASW version 18.0).
Inter-observer agreement in the assessment of categorical variables was estimated by
calculating the percentage agreement. Cohen's kappa was used to estimate by how much the
observed agreement exceeded that expected by chance (17). Weighted kappa values are
presented where appropriate (18). It has been suggested that kappa values >0.81 indicate very
good agreement beyond chance, kappa values between 0.61 and 0.80 good agreement beyond
chance, kappa values between 0.41 and 0.60 moderate agreement beyond chance, kappa values
between 0.21 and 0.40 fair agreement beyond chance, and kappa values <0.20 poor agreement
beyond chance (19).
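The agreement statistics described above can be sketched as follows. This is an illustration only, not the Stata or SPSS routines used in the study: the function names are hypothetical, the linear weighting scheme is an assumption (the text does not state which weights were used), and the handling of values that fall between the published category boundaries is also an assumption.

```python
def cohen_kappa(table, weights=None):
    """Cohen's kappa from a square table of counts.

    table[i][j] = number of masses placed in category i by observer 1
    and in category j by observer 2.
    weights=None gives unweighted kappa; weights="linear" gives a weighted
    kappa with agreement weights 1 - |i - j| / (k - 1) (assumed scheme).
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]

    def w(i, j):
        if weights is None:
            return 1.0 if i == j else 0.0
        if weights == "linear":
            return 1.0 - abs(i - j) / (k - 1)
        raise ValueError("unknown weighting scheme")

    # Observed and chance-expected (weighted) agreement.
    p_obs = sum(w(i, j) * table[i][j] for i in range(k) for j in range(k)) / n
    p_exp = sum(w(i, j) * row_tot[i] * col_tot[j]
                for i in range(k) for j in range(k)) / n ** 2
    return (p_obs - p_exp) / (1.0 - p_exp)


def interpret_kappa(kappa):
    """Verbal label following the scale cited in the text (19)."""
    if kappa >= 0.81:
        return "very good"
    if kappa >= 0.61:
        return "good"
    if kappa >= 0.41:
        return "moderate"
    if kappa >= 0.21:
        return "fair"
    return "poor"
```

For a hypothetical 2 x 2 table `[[40, 10], [5, 45]]`, observed agreement is 85% and chance-expected agreement is 50%, so `cohen_kappa` returns about 0.70, which `interpret_kappa` labels as good agreement beyond chance.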
Inter-observer reproducibility of measurement results, including the calculated risks of
malignancy using LR1 and LR2, was described as the difference between two measurement
results. The differences between the measured values were plotted against the mean of the two
measurements (Bland-Altman plots) to assess the relationship between the differences and the
magnitude of the measurements (20). Systematic bias between two measurements was
estimated by calculating the 95% confidence interval (CI) of the mean difference (mean
difference ± 2 SE). If zero lay within this interval, no bias was assumed to exist between the
two measurements. Inter-observer agreement was expressed as the mean difference and limits
of agreement (20). Ninety-five percent of differences between any future measurements are
estimated to fall between the lower and upper limit of agreement. Inter-observer reliability of
measurement results was estimated by calculating the intra-class correlation coefficient
(ICC) using analysis of variance (two-way random model, absolute agreement; this allows
generalization of the results to a population of observers). The ICC indicates the proportion of
the total variance in measurement results that can be explained by differences between the
individuals examined. It depends both on the magnitude of measurement errors and the true
heterogeneity in the population in which measurements are made. The more variable the
population investigated, the greater the ICC, and the less variable the population, the smaller
the ICC (21). It has been suggested that ICC values >0.90 are needed for a test to be used in
clinical practice (22).
The sensitivity and specificity with regard to malignancy of LR1 and LR2, calculated using
the information of sonologist 1 and 2, were calculated.
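The reproducibility measures described above can be sketched as follows. This is a minimal illustration rather than the SPSS procedure used in the study. The function names are hypothetical; the multiplier of 1.96 SD for the limits of agreement and the choice of the single-measure form of the two-way random, absolute-agreement ICC are assumptions, since the text does not state them.

```python
import math
from statistics import mean, stdev


def bland_altman(obs1, obs2, ci_mult=2.0, loa_mult=1.96):
    """Mean difference (observer 1 minus observer 2), its 95% CI
    (mean difference +/- 2 SE, as in the text) and limits of agreement
    (mean difference +/- loa_mult SD; the multiplier is an assumption)."""
    diffs = [a - b for a, b in zip(obs1, obs2)]
    m = mean(diffs)
    sd = stdev(diffs)
    se = sd / math.sqrt(len(diffs))
    return {
        "mean_diff": m,
        "ci95": (m - ci_mult * se, m + ci_mult * se),
        "limits_of_agreement": (m - loa_mult * sd, m + loa_mult * sd),
    }


def icc_two_way_random_absolute(ratings):
    """Single-measure ICC, two-way random model, absolute agreement.

    ratings: one row per mass, one column per observer, e.g. [(70, 66), ...].
    """
    n = len(ratings)      # number of masses
    k = len(ratings[0])   # number of observers
    grand = mean(v for row in ratings for v in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]

    # Sums of squares from the two-way analysis of variance.
    ss_rows = k * sum((rm - grand) ** 2 for rm in row_means)
    ss_cols = n * sum((cm - grand) ** 2 for cm in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)            # between masses
    ms_cols = ss_cols / (k - 1)            # between observers
    ms_err = ss_err / ((n - 1) * (k - 1))  # residual
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)
```

A systematic offset between observers lowers this ICC even when the observers rank the masses identically, which is the reason for choosing absolute agreement rather than consistency: for the hypothetical ratings `[(1, 2), (2, 3), (3, 4)]` the function returns 2/3 rather than 1.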
Results
In all, 117 consecutive women with adnexal masses who underwent surgery were
examined with ultrasound by the two sonologists as described above. Thirty-four women had
bilateral adnexal masses. The most complex mass, or the largest one if both masses had
similar ultrasound morphology, was used in our statistical analysis, the mass to be included
being selected retrospectively to ensure that both sonologists contributed the same mass (right
or left) to the analysis. Thus, 117 adnexal masses from 117 patients constitute our study
population. The women's age ranged between 14 and 88 years (median 53), and 63 (54%)
women were postmenopausal. There were 94 benign, four borderline, and 19 invasively
malignant adnexal masses (Table 2).
The time elapsed between the ultrasound examinations of sonologist 1 and 2 was median 61
days (10th and 90th percentiles 13 and 132, range 1-204) for the tumors with benign histology
and median 14 days (10th and 90th percentiles 2 and 31, range 1-41) for the tumors with
malignant histology. There was no relationship between the number of days between the
scans and the differences in measurement results or inter-observer agreement for discrete
variables (Supplementary Fig. S1-S5 and Supplementary Table S1).
Inter-observer reproducibility of measurement results is shown in Table 3. Bland-Altman
plots showed no clear trend for inter-observer differences in measurement results to change
with the magnitude of the measurement values. Limits of agreement were wide for all
measurements. There was one systematic difference between the two sonologists, sonologist 1
(who always performed the first examination) obtaining higher measurement values for the
maximum diameter of the mass. The least reliable measurement was the height of the largest
papillary projection.
Inter-observer agreement when assessing categorical ultrasound variables is shown in Table
4. For most categorical ultrasound variables, inter-observer agreement beyond chance was good
or very good (19). Inter-observer agreement beyond chance for variables included in LR1 or
LR2 was poorest for color score (agreement 40%, weighted Kappa 0.36), presence of blood
flow in papillary projection (agreement 90%, Kappa 0.48), irregular cyst wall (agreement 79%,
Kappa 0.56), and acoustic shadowing (agreement 85%, Kappa 0.58).
Bland-Altman plots illustrating the relationship between the magnitudes of the estimated
risk of malignancy calculated using LR1 and LR2 and the interobserver difference in
calculated risk are shown in Figure 1. The plots manifest a diamond shape, i.e., the
interobserver differences are smallest for the lowest and highest risks, and they are very small
for risks <2.5% and >95%. Logarithmic transformation of the data (20) did not substantially
change the shape of the scatter plot. Therefore, we present our results as absolute
inter-observer differences in calculated risk (in percentage units); see Table 5. There were no
systematic differences in calculated risks between the two sonologists, and reliability
reflected by the ICC values was good (22), with ICC values of 0.911 for LR1 and 0.832 for
LR2. When classifying tumors as having a risk of malignancy <10% (benign) or >10%
(malignant) using LR1 or LR2, the inter-observer agreement was good for both models:
inter-observer agreement 84% (98/117), Kappa value 0.68, for model LR1 and inter-observer
agreement 85% (99/117), Kappa 0.68, for model LR2. In the 19 cases where the two
sonologists obtained different results with regard to malignancy when using LR1, the absolute
interobserver differences in calculated risk ranged from 0.7 to 59.6 percentage units; in six of
the 19 cases the absolute interobserver difference in calculated risk was <10.0 percentage
units, in nine cases it was 10.0-24.9 percentage units, and in four cases it was >25.0
percentage units. In the 18 cases where the two sonologists obtained different results with
regard to malignancy when using LR2, the absolute interobserver difference in calculated risk
ranged from 8.8 to 67.9 percentage units; in two of the 18 cases the absolute interobserver
difference in calculated risk was <10.0 percentage units, in ten cases it was 10.0-24.9
percentage units, and in six cases it was >25 percentage units.
The Bland-Altman plots (Figure 1) illustrate that for some tumors there were substantial
interobserver differences in the calculated risk of malignancy: when using LR1 the
interobserver difference in calculated risk was >25 percentage units in 11 tumors (9% of all
tumors), and when using LR2 the interobserver difference in calculated risk was >25
percentage units in 14 tumors (12% of all tumors). To elucidate which interobserver
differences explained these largest interobserver differences in calculated risk, we scrutinized
each case where the difference was >25 percentage units. The results are shown in
Supplementary Tables S2 and S3. When using LR1, a discrepancy for one single categorical
variable explained the difference in four of the 11 cases, while a discrepancy for two
categorical variables explained the difference in one case (differences in measurements being
<5 mm in these five cases). In six cases there were differences in one or two categorical
variables but also substantial differences (6-61 mm) in at least one measurement result. In no
case was the large difference in calculated risk explained exclusively by differences in
measurement results. The categorical variables judged differently by the two sonologists in
these 11 cases were color score (n = 5), irregular cyst wall (n = 5), flow in papillary projection
(n = 3), and acoustic shadowing (n = 2).
When using LR2, a discrepancy for one single categorical variable explained the large
difference in calculated risk (>25 percentage units) in eight of the 14 cases (differences in
measurements being <5 mm in these eight cases), and in four of the eight cases the sonologists
judged acoustic shadowing differently. In five cases there were differences in one categorical
variable but also a substantial difference (9-61 mm) in the measurement of the largest solid
component. In yet another case there were differences in two categorical variables as well as in
the measurement of the largest solid component. The categorical variables judged differently
by the two sonologists in these 14 cases were acoustic shadowing (n = 5) irregular cyst wall (n
= 5) ascites (n = 3) and flow in papillary projection (n = 2)
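Why disagreement on one yes/no variable can move the calculated risk by more than 25 percentage units follows from the logistic form of the models: changing a binary variable multiplies the odds of malignancy by e raised to that variable's coefficient. The sketch below illustrates this with the LR2 coefficient for acoustic shadows (-2.9486, from Table 1). The function name and the baseline risks are hypothetical and chosen only for illustration.

```python
import math


def risk_after_flip(baseline_risk, coefficient):
    """Risk after a binary model variable changes from 0 to 1.

    In a logistic regression model this multiplies the odds by
    exp(coefficient), whatever the values of the other variables.
    """
    odds = baseline_risk / (1.0 - baseline_risk)
    new_odds = odds * math.exp(coefficient)
    return new_odds / (1.0 + new_odds)


ACOUSTIC_SHADOWS_LR2 = -2.9486  # coefficient (k) in LR2, Table 1

# Hypothetical baseline risks as calculated by an observer who sees no shadows.
for baseline in (0.01, 0.10, 0.40, 0.70):
    with_shadows = risk_after_flip(baseline, ACOUSTIC_SHADOWS_LR2)
    diff_units = 100.0 * (baseline - with_shadows)
    print(f"risk without shadows {100 * baseline:5.1f}%, "
          f"with shadows {100 * with_shadows:5.1f}%, "
          f"difference {diff_units:5.1f} percentage units")
```

With a baseline risk of 1% the difference stays below one percentage unit, whereas with a baseline risk of 40% it exceeds 25 percentage units, so the same disagreement matters little for clearly benign-looking masses and a great deal in the mid-range of risk.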
The sensitivity with regard to malignancy when using LR1 (10% risk cutoff) was 100%
(23/23; 95% CI, 82-100) for both sonologists; the specificity was 74% (70/94; 95% CI, 64-82)
for sonologist 1 and 63% (59/94; 95% CI, 53-72) for sonologist 2. The sensitivity when using
LR2 was 100% (23/23; 95% CI, 82-100) for sonologist 1 and 91% (21/23; 95% CI, 72-98) for
sonologist 2, and the specificity was 75.5% (71/94; 95% CI, 65-84) for both sonologists.
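
The proportions above follow directly from the reported counts. The sketch below recomputes them in Python and attaches a Wilson score interval. The interval method is an assumption made for illustration (this section does not state which method the authors used), so the bounds it prints need not coincide with the published confidence intervals. The function name is hypothetical.

```python
from math import sqrt


def proportion_with_wilson_ci(successes: int, total: int, z: float = 1.96):
    """Return (proportion, lower, upper) using the Wilson score interval.

    The Wilson interval is an illustrative choice, not necessarily the
    method used for the intervals reported in the text.
    """
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return p, max(0.0, centre - half), min(1.0, centre + half)


# Counts taken from the text for LR2, sonologist 2.
sensitivity = proportion_with_wilson_ci(21, 23)  # malignant masses classified as malignant
specificity = proportion_with_wilson_ci(71, 94)  # benign masses classified as benign

for label, (p, low, high) in (("sensitivity", sensitivity), ("specificity", specificity)):
    print(f"{label}: {p:.1%} (Wilson 95% CI {low:.1%} to {high:.1%})")
```

The denominator of 23 corresponds to the 19 invasive and four borderline tumors taken together, and 94 to the benign tumors.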
Discussion
We have shown substantial inter-observer variability in the results of measurements taken in
adnexal masses (wide limits of agreement). Inter-observer agreement beyond chance was very
good or good for most categorical variables, but it was only moderate or fair for some.
Inter-observer agreement above chance was poorest for variables heavily dependent on subjective
evaluation and/or machine settings, i.e., color score, presence of color Doppler signals in
papillary projections, irregular cyst walls, acoustic shadowing (all four variables being
included in LR1 or LR2), echogenicity of cyst fluid, and ovarian crescent sign. Despite this,
there was good inter-observer agreement when classifying tumors as benign or malignant using
the predetermined risk of malignancy cut-off of 10%. However, in some cases there were
substantial differences in the calculated risk of malignancy between the two sonologists, the
difference being >25.0 percentage units in 9% of all tumors when using LR1 and in 12% of all
tumors when using LR2.
The strength of our study is that it provides new information. To the best of our knowledge,
there is only one publication reporting on inter-observer agreement with regard to describing
ultrasound findings in adnexal masses using the IOTA terminology (11) when performing live
ultrasound examinations (23). However, that study (23) evaluated inter-observer agreement
with regard to the ten ultrasound features in the IOTA simple rules (24, 25), not the variables
included in the IOTA logistic regression models LR1 and LR2, and agreement was estimated
between examiners with different levels of experience. The variable with poorest agreement
beyond chance in the study cited was acoustic shadowing (Kappa 0.36). We have found no
published study that has estimated inter-observer reproducibility of the calculated risk of
malignancy using LR1 or LR2 after live scanning.
It is a limitation of our study that up to 204 days elapsed between the scans of the two
sonologists (up to 41 days for malignant masses). Because days elapsed between the scans,
theoretically, the inter-observer differences could be explained by the lesions having changed
in size or morphology between the scans. We find this highly unlikely for the following
reasons. First, there was no relationship between the differences in measurement results and
the number of days between the scans (Supplementary Figs. S1-S5). Nor was there a clear
tendency for inter-observer agreement for discrete variables to depend on the time between the
scans (Supplementary Table S1). Second, one would expect a lesion and its components to
increase in size with time, but sonologist 1, performing the first scan, obtained higher
measurement values than sonologist 2. Third, it is our experience, after having performed
gynecological scans for more than 20 years, that the ultrasound morphology of both benign and
malignant adnexal masses remains constant over time, that benign adnexal lesions grow
slowly, and that malignant masses do not change appreciably in size even during 1 month of
observation. Therefore, we believe that the discrepancies between the two sonologists reflect
true inter-observer differences and not a change of the masses over time. A second limitation is
that we did not include estimation of the reproducibility of retrieving anamnestic information
(current hormonal therapy, personal history of ovarian cancer), the anamnestic information
collected by the second sonologist being used in all cases. It cannot be entirely excluded that
patients would answer differently when asked by different sonologists, or that sonologists
could interpret the answers of the patients differently. A third limitation is that we did not
estimate intra-observer reproducibility. We considered four scans (two per sonologist) likely to
be unacceptable to patients. For the same reason, only two sonologists were involved in this
study, and our results are generalizable only to sonologists with a similar level of experience.
The results of this live scanning study are similar to those of another study in which the
same sonologists assessed the same variables using 3D ultrasound volumes from adnexal
masses in another tumor population (15). The similarity in results between the two studies is
surprising, because the conditions when assessing 3D ultrasound volumes are different from
those during a live scan. When evaluating ultrasound volumes, sonologists are exposed to the
same ultrasound images, and so any inter-observer difference should be explained exclusively
by differences in interpreting the ultrasound information. During a live scan there are more
sources of bias. This could result in either poorer or better inter-observer agreement than when
3D ultrasound volumes are assessed: poorer because ultrasound examiners are likely to use
different machine settings and scanning conditions may change from one minute to another;
better because the dynamic nature of live scanning facilitates discrimination between solid
components and amorphous tissue.
Our results showed that two experienced sonologists agreed quite well in their classification
of masses as benign or malignant using the 10% risk of malignancy cutoff of LR1 and LR2,
and that the diagnostic performance of LR1 and LR2 with regard to discrimination between
benign and malignant tumors was similar for the two sonologists and similar to that reported
by others (14, 26-28). This is reassuring, because the main purpose of using models LR1 and
LR2 is to classify tumors as benign or malignant. Potentially, however, LR1 and LR2 can be
used not only to classify adnexal masses as benign or malignant but also to counsel a patient
about her individual risk of malignancy (13). If the calculated risk is to be used for individual
counseling, one must be reasonably certain not only that the estimated risk agrees well with the
true risk (when externally validated, both LR1 and LR2 underestimated the true risk, especially
in the risk interval 30%-70% (14)), but also that the risk estimates are reproducible, i.e., that
different examiners will obtain similar risk estimates. Our results show that risk estimates may
differ substantially between experienced observers, the difference in estimated risk being >25.0
percentage units in 9% and 12% of cases when using LR1 and LR2, respectively. Inter-observer
agreement above chance was poorest for those variables in the models that are heavily
dependent on subjective evaluation, i.e., color score, presence of color Doppler signals in
papillary projections, irregular cyst walls, and acoustic shadowing. Indeed, differences in these variables
explained most of the largest inter-observer differences in calculated risk of malignancy. In
models based on few variables, changing the value of only one variable may result in large
differences in predicted risks, while a model with many variables is less vulnerable to a change
in one or even a few variables. Our results illustrate this (Supplementary Tables S2 and S3).
When using LR2 (which includes six variables), a change in value for one single categorical
variable explained an inter-observer difference in calculated risk >25 percentage units in eight
of 14 cases, while when using LR1 (which includes 12 variables), a change in value for one
single categorical variable explained an inter-observer difference in calculated risk >25
percentage units in only four of 11 cases. Acoustic shadowing is a strong variable in both LR1
and LR2 and has great impact on the calculated risk in LR2, which has only six variables. In our
hands, as well as in those of Ruiz de Gauna et al. (23), inter-observer agreement for acoustic
shadowing was at most moderate. The inter-observer agreement for color score was only fair in
our study, and color score is an important variable in LR1.
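
As a worked illustration of how a disagreement on one categorical variable can move the calculated risk, consider the acoustic-shadowing coefficients given in the footnote to Table 1 (-2.3550 in LR1 and -2.9486 in LR2). The starting risk of 50% used below is a hypothetical value chosen for arithmetic convenience, not a case from the study.

```latex
% Both models have the form y = 1/(1 + e^{-z}), so the odds of malignancy are y/(1-y) = e^{z}.
% Recording acoustic shadows (k = 1 instead of k = 0) adds the coefficient \beta_k to z,
% which multiplies the odds by a fixed factor:
\frac{\mathrm{odds}_{k=1}}{\mathrm{odds}_{k=0}} = e^{\beta_k}, \qquad
e^{-2.3550} \approx 0.095 \ (\mathrm{LR1}), \qquad
e^{-2.9486} \approx 0.052 \ (\mathrm{LR2}).

% Hypothetical mass with z = 0 (calculated risk 50%) when no shadows are recorded;
% calculated risk for the same mass when shadows are recorded:
y_{\mathrm{LR1}} = \frac{1}{1 + e^{2.3550}} \approx 0.087, \qquad
y_{\mathrm{LR2}} = \frac{1}{1 + e^{2.9486}} \approx 0.050 .
```

In this hypothetical case, a disagreement on acoustic shadowing alone would change the calculated risk from 50% to roughly 9% (LR1) or 5% (LR2), i.e., by more than 25 percentage units in either model.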
To improve inter-observer reproducibility of calculated risks based on LR1 and LR2,
inter-observer differences in descriptions and measurements of adnexal masses using the IOTA
terminology and measurement technique need to be reduced. One way to achieve this could be by
providing courses on and training in how to examine and describe adnexal masses using the
IOTA terms. Interactive courses in which a large number of ultrasound images are discussed with
the course participants are likely to be very valuable in this respect. More precise definitions of
the IOTA terms, for example by providing ample imaging material, would probably also help
improve inter-observer agreement. Special attention should be given to the variables with poorest
reproducibility, i.e., the color score, wall irregularity, acoustic shadowing, and detection of blood
flow in papillary projections. Until better inter-observer agreement in the calculated risk of
malignancy using LR1 and LR2 has been shown, one should be cautious about using the risk
estimates for individual patient counseling.
Acknowledgements
None
References
1. Granberg S, Norström A, Wikland M. Tumors in the lower pelvis as imaged by vaginal
sonography. Gynecol Oncol 1990;37:224-9.
2. Benacerraf BR, Finkler NJ, Wojciechowski C, Knapp RC. Sonographic accuracy in the
diagnosis of ovarian masses. J Reprod Med 1990;35:491-5.
3. Valentin L. Pattern recognition of pelvic masses by gray-scale ultrasound imaging: the
contribution of Doppler ultrasound. Ultrasound Obstet Gynecol 1999;14:338-47.
4. Valentin L. Prospective cross-validation of Doppler ultrasound examination and
gray-scale ultrasound imaging for discrimination of benign and malignant pelvic masses.
Ultrasound Obstet Gynecol 1999;14:273-83.
5. Timmerman D, Schwärzler P, Collins WP, Claerhout F, Coenen M, Amant F, et al.
Subjective assessment of adnexal masses with the use of ultrasonography: an analysis
of interobserver variability and experience. Ultrasound Obstet Gynecol 1999;13:11-6.
6. Sokalska A, Timmerman D, Testa AC, Van Holsbeke C, Lissoni AA, Leone FPG, et al.
Diagnostic accuracy of transvaginal ultrasound examination for assigning a specific
diagnosis to adnexal masses. Ultrasound Obstet Gynecol 2009;34:462-70.
7. Valentin L. Use of morphology to characterize and manage common adnexal masses.
Best Pract Res Clin Obstet Gynaecol 2004;18:71-89.
8. Tailor A, Jurkovic D, Bourne TH, Collins WP, Campbell S. Sonographic prediction of
malignancy in adnexal masses using multivariate logistic regression analysis.
Ultrasound Obstet Gynecol 1997;10:41-7.
9. Timmerman D, Bourne TH, Tailor A, Collins WP, Verrelst H, Vandenberghe K, et al.
A comparison of methods for the preoperative discrimination between benign and
malignant adnexal masses: the development of a new logistic regression model. Am J
Obstet Gynecol 1999;181:57-65.
10. Alcazar JL, Jurado M. Prospective evaluation of logistic model based on sonographic
morphologic and color Doppler findings developed to predict adnexal malignancy. J
Ultrasound Med 1999;18:837-42.
11. Timmerman D, Valentin L, Bourne TH, Collins WP, Verrelst H, Vergote I. Terms,
definitions and measurements to describe the sonographic features of adnexal tumors: a
consensus opinion from the International Ovarian Tumor Analysis (IOTA) Group.
Ultrasound Obstet Gynecol 2000;16:500-5.
12. Timmerman D, Testa AC, Bourne T, Ferrazzi E, Ameye L, Konstantinovic ML, et al.
Logistic regression model to distinguish between the benign and malignant adnexal
mass before surgery: a multicenter study by the International Ovarian Tumor Analysis
Group. J Clin Oncol 2005;23:8794-801.
13. Kaijser J, Bourne T, Valentin L, Sayasneh A, Van Holsbeke C, Vergote I, et al.
Improving strategies for diagnosing ovarian cancer: a summary of the International
Ovarian Tumor Analysis (IOTA) studies. Ultrasound Obstet Gynecol 2013;41:9-20.
14. Timmerman D, Van Calster B, Testa AC, Guerriero S, Fischerova D, Lissoni AA, et al.
Ovarian cancer prediction in adnexal masses using ultrasound-based logistic regression
models: a temporal and external validation study by the IOTA group. Ultrasound Obstet
Gynecol 2010;36:226-34.
15. Sladkevicius P, Valentin L. Intra- and inter-observer agreement when describing
adnexal masses using the International Ovarian Tumour Analysis (IOTA) terms and
definitions: a study on three-dimensional (3D) ultrasound volumes. Ultrasound Obstet
Gynecol 2013;41:318-27.
16. Heintz APM, Odicino F, Maisonneuve P, Beller U, Benedet JL, Creasman WT, et al.
Carcinoma of the Ovary. 25th Annual Report on the Results of Treatment in
Gynecological Cancer. Int J Gynecol Obstet 2003;83(suppl 1):S135-S166.
17. Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas 1960;20:
37-46.
18. Kundel HL, Polansky M. Measurement of observer agreement. Radiology 2003;228:
303-8.
19. Brennan P, Silman A. Statistical methods for assessing observer variability in clinical
measures. BMJ 1992;304:1491-4.
20. Bland JM, Altman DG. Statistical methods for assessing agreement between two
methods of clinical measurement. Lancet 1986;1:307-10.
21. Bartlett JW, Frost C. Reliability, repeatability and reproducibility: analysis of
measurement errors in continuous variables. Ultrasound Obstet Gynecol 2008;31:466-75.
22. Kottner J, Audigé L, Brorson S, Donner A, Gajewski BJ, Hróbjartsson A, et al.
Guidelines for Reporting Reliability and Agreement Studies (GRRAS) were proposed. J
Clin Epidemiol 2011;64:96-106.
23. Ruiz de Gauna B, Sanchez P, Pineda L, Utrilla-Layna J, Juez L, Alcázar JL.
Inter-observer agreement with regard to describing adnexal masses using the IOTA simple
rules in a real-time setting and when using three-dimensional ultrasound volumes and
digital clips. Ultrasound Obstet Gynecol 2014;44:95-100.
24. Timmerman D, Testa AC, Bourne T, Ameye L, Jurkovic D, Van Holsbeke C, et al.
Simple ultrasound-based rules for the diagnosis of ovarian cancer. Ultrasound Obstet
Gynecol 2008;31:681-90.
25. Timmerman D, Ameye L, Fischerova D, Epstein E, Melis GB, Guerriero S, et al.
Simple ultrasound rules to distinguish between benign and malignant adnexal masses
before surgery: prospective validation by IOTA group. BMJ 2010;341:c6839.
26. Van Holsbeke C, Van Calster B, Testa AC, Domali E, Lu C, Van Huffel S, et al.
Prospective internal validation of mathematical models to predict malignancy in
adnexal masses: results from the international ovarian tumor analysis study. Clin Cancer
Res 2009;15:684-91.
27. Nunes N, Yazbek J, Ambler G, Hoo W, Naftalin J, Jurkovic D. Prospective evaluation
of the IOTA logistic regression model LR2 for the diagnosis of ovarian cancer.
Ultrasound Obstet Gynecol 2012;40:355-9.
28. Nunes N, Ambler G, Hoo WL, Naftalin J, Foo X, Widschwendter M, et al.
A prospective validation of the IOTA logistic regression models (LR1 and LR2) in
comparison to subjective pattern recognition for the diagnosis of ovarian cancer.
Int J Gynecol Cancer 2013;23:1583-9.
Table 1. Variables for which inter-observer reproducibility was estimated
___________________________________________________________________________
Variable                                          Unit or categories
___________________________________________________________________________
Variables included in logistic regression models LR1 and LR2 or of importance for these models
  Maximum diameter of adnexal mass                mm
  Maximum diameter of largest solid component     mm
  Maximum diameter of largest solid component     ≤50, >50 mm
  Presence of an entirely solid adnexal mass      yes, no
  Papillary projections present                   yes, no
  Irregular internal cyst walls                   yes, no
  Acoustic shadows                                yes, no
  Color Doppler signals in papillary projection   yes, no
  Color score                                     1, 2, 3, 4
  Tenderness of adnexal mass at scan              yes, no
  Ascites                                         yes, no
Other IOTA variables
Continuous IOTA variables
  Mean diameter of adnexal mass                   mm
  Mean diameter of largest solid component        mm
  Height of the largest papillary projection      mm
Categorical IOTA variables
  Type of tumor                                   unilocular, unilocular solid, multilocular, multilocular solid, solid
  Mean diameter of adnexal mass                   ≤40, 41-60, 61-80, 81-100 and >100 mm
  Number of cyst locules                          0, 1, 2, 3, 4, 5, 6-10, 11-20 and >20; ≤10, >10†
  Septum present                                  yes, no
  Incomplete septum present                       yes, no
  Solid component                                 yes, no
  Papillary projections                           0, 1, 2, 3, ≥4; 1, 2, 3, ≥4‡; <4, >4†
  Echogenicity of cyst fluid                      anechoic, low level, ground glass, mixed, no cyst fluid
  Color Doppler signals detectable                yes, no
  Ovarian crescent sign                           yes, no
Diagnosis based on LR1 and LR2
  Calculated risk of malignancy using LR1         ≤10%, >10%
  Calculated risk of malignancy using LR2         ≤10%, >10%
___________________________________________________________________________
IOTA, International Ovarian Tumor Analysis group; LR1, logistic regression model 1; LR2, logistic regression model 2.
The risk of malignancy using LR1 is derived as $y = 1/(1 + e^{-z})$, where
$z = -6.7468 + 1.5985\,(a) - 0.9983\,(b) + 0.0326\,(c) + 0.00841\,(d) - 0.8577\,(e) + 1.5513\,(f) + 1.1737\,(g) + 0.9281\,(h) + 0.0496\,(i) + 1.1421\,(j) - 2.3550\,(k) + 0.4916\,(l)$
and $e$ is the mathematical constant and base value of natural logarithms. The 12 variables included in LR1 are:
(a) personal history of ovarian cancer (yes = 1, no = 0);
(b) current hormonal therapy (yes = 1, no = 0);
(c) age of the patient (in years);
(d) maximum diameter of the lesion (in millimeters);
(e) mass tender at scan (yes = 1, no = 0);
(f) the presence of ascites (yes = 1, no = 0);
(g) the presence of blood flow within a solid papillary projection (yes = 1, no = 0);
(h) the presence of a purely solid tumor (yes = 1, no = 0);
(i) maximal diameter of the solid component (expressed in millimeters, but with no increase >50 mm);
(j) irregular internal cyst walls (yes = 1, no = 0);
(k) the presence of acoustic shadows (yes = 1, no = 0); and
(l) color score (1, 2, 3, or 4).
The risk of malignancy using LR2 is derived as $y = 1/(1 + e^{-z})$, where
$z = -5.3718 + 0.0354\,(c) + 1.6159\,(f) + 1.1768\,(g) + 0.0697\,(i) + 0.9586\,(j) - 2.9486\,(k)$ (12).
A calculated risk of malignancy of more than 10% classified the adnexal mass as malignant.
† includes 0. ‡ includes only cases where papillary projections were registered at both examinations.
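
The two formulas in the footnote to Table 1 can be written out as a short Python sketch. The coefficients and the 10% cutoff are taken from the footnote. The function and parameter names are hypothetical, and the code is meant only to make the arithmetic explicit, not to serve as a validated clinical calculator.

```python
from math import exp


def _logistic(z: float) -> float:
    """Map the linear predictor z to a probability, y = 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + exp(-z))


def lr1_risk(history_ovarian_cancer, hormonal_therapy, age_years,
             max_lesion_diameter_mm, tender, ascites, papillary_flow,
             purely_solid, max_solid_diameter_mm, irregular_walls,
             acoustic_shadows, color_score):
    """Risk of malignancy (0-1) from LR1; yes/no inputs are booleans."""
    solid = min(max_solid_diameter_mm, 50)  # (i): no increase above 50 mm
    z = (-6.7468
         + 1.5985 * int(history_ovarian_cancer)   # (a)
         - 0.9983 * int(hormonal_therapy)         # (b)
         + 0.0326 * age_years                     # (c)
         + 0.00841 * max_lesion_diameter_mm       # (d)
         - 0.8577 * int(tender)                   # (e)
         + 1.5513 * int(ascites)                  # (f)
         + 1.1737 * int(papillary_flow)           # (g)
         + 0.9281 * int(purely_solid)             # (h)
         + 0.0496 * solid                         # (i)
         + 1.1421 * int(irregular_walls)          # (j)
         - 2.3550 * int(acoustic_shadows)         # (k)
         + 0.4916 * color_score)                  # (l), 1 to 4
    return _logistic(z)


def lr2_risk(age_years, ascites, papillary_flow, max_solid_diameter_mm,
             irregular_walls, acoustic_shadows):
    """Risk of malignancy (0-1) from LR2, which uses six of the LR1 variables."""
    solid = min(max_solid_diameter_mm, 50)  # (i): no increase above 50 mm
    z = (-5.3718
         + 0.0354 * age_years                     # (c)
         + 1.6159 * int(ascites)                  # (f)
         + 1.1768 * int(papillary_flow)           # (g)
         + 0.0697 * solid                         # (i)
         + 0.9586 * int(irregular_walls)          # (j)
         - 2.9486 * int(acoustic_shadows))        # (k)
    return _logistic(z)


def classified_malignant(risk: float, cutoff: float = 0.10) -> bool:
    """A calculated risk of more than 10% classifies the mass as malignant."""
    return risk > cutoff
```

Calling `lr2_risk` twice for the same hypothetical mass, once with `acoustic_shadows=True` and once with `False`, gives a direct view of how much a single disagreement between observers can change the calculated risk.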
Table 2. Histological diagnoses of the masses
___________________________________________________________________________
Histological diagnosis of tumor                 Number
___________________________________________________________________________
Benign tumors                                   94
  Benign simple cyst                            7
  Endometrioma                                  10
  Dermoid cyst                                  16
  Serous cystadenoma                            16
  Mucinous cystadenoma                          18
  Myoma/fibroma                                 9
  Cystadenofibroma                              11
  Paraovarian cyst                              5
  Sactosalpinx, chronic salpingitis             1
  Leydig cell tumor                             1
Borderline tumors                               4
  Serous                                        2
  Mucinous                                      1
  Endometrioid                                  1
Invasive malignancy                             19
  Primary ovarian adenocarcinoma                13
  Granulosa cell tumor                          3
  Dysgerminoma                                  1
  Leiomyosarcoma                                1
  Malignant aggressive B-cell lymphoma          1
___________________________________________________________________________
Table 3. Inter-observer reproducibility of continuous variables used in the risk calculation models LR1 and LR2 and of other continuous variables
used to describe adnexal masses
___________________________________________________________________________
                                                      Measurement results       Difference in mm between two measurements            Intra-CC
                                                      (both sonologists)        made by sonologists 1 and 2 (a)
Parameter                                             Median (n)     Range      Mean (n)       95% CI          Limits of agreement   Point estimate (95% CI)
___________________________________________________________________________
Variables used in models LR1 and LR2
Maximum diameter of adnexal mass, mm                  70 (n=234)     10 to 313  3.80 (n=117)   1.12 to 6.48    -25.24 to 32.82       0.958 (0.937 to 0.971)
Maximum diameter of largest solid component, mm (b)   29.50 (n=122)  5 to 180   1.92 (n=61)    -1.74 to 5.58   -26.66 to 30.50       0.942 (0.905 to 0.964)
Other variables used to describe adnexal mass
Mean diameter of adnexal mass, mm                     58.5 (n=234)   9 to 240   1.05 (n=117)   -0.15 to 1.95   -8.61 to 10.72        0.971 (0.958 to 0.980)
Mean diameter of largest solid component, mm (b)      22 (n=122)     4 to 156   0.59 (n=61)    -1.82 to 2.98   -18.16 to 19.32       0.962 (0.937 to 0.977)
Height of largest papillary projection, mm (c)        8 (n=42)       3 to 25    -0.51 (n=21)   -2.93 to 1.91   -11.61 to 10.59       0.609 (0.245 to 0.821)
___________________________________________________________________________
CI, confidence interval; Intra-CC, intra-class correlation coefficient.
(a) The measurements made by sonologist 2 were subtracted from the measurements made by sonologist 1.
(b) Includes only cases where solid components were registered by both observers.
(c) Includes only cases where papillary projections were registered by both observers. Using logarithmically transformed data, the results were as
follows: mean difference 1.03 (95% CI, 0.78 to 1.36), limits of agreement 0.31 to 3.61, Intra-CC 0.547 (0.153 to 0.789). These results can be used for
comparison with our published results on inter-observer agreement when analyzing ultrasound volumes obtained by three-dimensional
ultrasound (15).
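
The mean difference and limits of agreement reported in Table 3 are the quantities described by Bland and Altman (20). A minimal Python sketch of that calculation follows. The paired measurements are invented for illustration, the function name is hypothetical, and the confidence interval for the mean uses a normal approximation (z = 1.96) rather than a t-distribution, which is an assumption and may differ slightly from the method used for the table.

```python
from math import sqrt
from statistics import mean, stdev


def bland_altman(observer1, observer2, z: float = 1.96):
    """Mean difference, its approximate 95% CI, and limits of agreement.

    Differences are taken as observer 1 minus observer 2, as in Table 3.
    """
    diffs = [a - b for a, b in zip(observer1, observer2)]
    n = len(diffs)
    mean_diff = mean(diffs)
    sd = stdev(diffs)       # sample standard deviation of the differences
    se = sd / sqrt(n)       # standard error of the mean difference
    return {
        "mean_difference": mean_diff,
        "ci_mean": (mean_diff - z * se, mean_diff + z * se),
        "limits_of_agreement": (mean_diff - z * sd, mean_diff + z * sd),
    }


# Hypothetical paired maximum diameters (mm) for five masses.
sonologist_1 = [52, 40, 75, 33, 61]
sonologist_2 = [50, 43, 70, 30, 60]
print(bland_altman(sonologist_1, sonologist_2))
```

Wide limits of agreement relative to the size of the structures measured indicate that two observers may obtain clearly different values for the same mass, even when the mean difference is close to zero.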
Table 4. Inter-observer agreement for categorical variables included in the risk calculation models LR1 and LR2 and for other categorical variables used to describe adnexal masses
____________________________________________________________________________________________
Variable                                                                              Agreement, % (n/N)    Kappa value
____________________________________________________________________________________________
Variables used in models LR1 and LR2
Presence of entirely solid tumor (yes, no)                                            97 (114/117)          0.91
Maximum diameter of largest solid component (≤50 mm, >50 mm)                          95 (111/117)          0.83
Maximum diameter of largest solid component (≤50 mm, >50 mm) (a)                      92 (56/61)            0.82
Tenderness of adnexal mass (yes, no)                                                  97 (113/117)          0.78
Ascites (yes, no)                                                                     97 (113/117)          0.73
Papillary projections (yes, no)                                                       87 (102/117)          0.65
Solid component (yes, no)                                                             84 (98/117)           0.67
Irregular cyst wall (yes, no)                                                         79 (92/117)           0.56
Acoustic shadows (yes, no)                                                            85 (99/117)           0.58
Color Doppler signals in papillary projections (yes, no) (b)                          90 (105/117)          0.48
Color Doppler signals in papillary projections (yes, no) (c)                          86 (18/21)            0.71
Color score (1, 2, 3, 4)                                                              40 (47/117)           0.36 (d)
Other variables used to describe adnexal masses
Tumor type (unilocular, unilocular solid, multilocular, multilocular solid, solid)    97 (113/117)          0.70
Mean diameter of tumor, mm (≤40, 41-60, 61-80, 81-100, >100)                          66 (77/117)           0.76 (d)
Number of locules (0, 1, 2, 3, 4, 5, 6-10, 11-20, >20)                                58 (68/117)           0.79 (d)
Number of locules (≤10, >10) (e)                                                      94 (110/117)          0.80
Septum (yes, no)                                                                      88 (103/117)          0.76
Incomplete septum (yes, no)                                                           98 (115/117)          ¶
Number of papillary projections (0, 1, 2, 3, ≥4)                                      81 (95/117)           0.64 (d)
Number of papillary projections (1, 2, 3, ≥4) (c)                                     67 (14/21)            0.63 (d)
Number of papillary projections (<4, >4) (e)                                          96 (112/117)          0.68
Echogenicity of cyst fluid (anechoic, low level, ground glass, mixed, no fluid)       67 (78/117)           0.56
Color Doppler signals detectable (yes, no)                                            84 (98/117)           0.30
Ovarian crescent sign (yes, no)                                                       78 (91/117)           0.49
Fluid in Douglas pouch (yes, no)                                                      87 (102/117)          0.72
____________________________________________________________________________________________
(a) Includes only cases where solid components were registered at both examinations.
(b) This variable reflects disagreement both with regard to the presence of papillary projections and the presence of blood flow in papillary projections (i.e., if one observer recorded the presence of papillary projections and the other did not, there was disagreement with regard to this variable, and if both observers recorded the presence of papillary projections but disagreed with regard to the presence of blood flow in the papillary projections, there was disagreement with regard to this variable).
(c) Includes only cases where papillary projections were registered at both examinations.
(d) Weighted Kappa index.
(e) Includes 0.
¶ Absent Kappa values are explained by asymmetric field tables, making it impossible to calculate them.
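
Table 4 reports both percentage agreement and Kappa (17). The Python sketch below shows how the two are obtained from a cross-tabulation of the two observers' ratings. The cell counts are hypothetical: they were chosen only so that 117 masses and 99 agreements match the totals given for acoustic shadows, because the individual cells are not reported in this section. The function name is likewise hypothetical, and the weighted Kappa used for ordinal variables is not covered.

```python
def cohen_kappa(table):
    """Return (observed agreement, unweighted Cohen's kappa).

    table[i][j] is the number of masses rated category i by observer 1
    and category j by observer 2.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance, from the marginal totals.
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / n**2
    return observed, (observed - expected) / (1 - expected)


# Hypothetical 2 x 2 table (rows: observer 1 yes/no; columns: observer 2 yes/no).
shadows = [[20, 8],
           [10, 79]]
agreement, kappa = cohen_kappa(shadows)
print(f"agreement {agreement:.1%}, kappa {kappa:.2f}")
```

Because Kappa corrects for agreement expected by chance, a variable that is rarely present can show high percentage agreement and still have a modest Kappa, which is the pattern seen for several variables in Table 4.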
Table 5. Inter-observer reproducibility of the risk of malignancy calculated using the risk calculation models LR1 and LR2
___________________________________________________________________________
                                             Calculated risk of malignancy  Difference between the risk calculated                Intra-CC
                                             (both sonologists)             by sonologists 1 and 2
Parameter                                    Median (n)     Range           Mean (n)        95% CI          Limits of agreement   Point estimate (95% CI)
___________________________________________________________________________
Risk of malignancy calculated using LR1, %   7.85 (n=234)   0.10 to 99.10   -0.53 (n=117)   -3.07 to 2.01   -28.05 to 26.99       0.911 (0.874 to 0.937)
Risk of malignancy calculated using LR2, %   6.65 (n=234)   0.10 to 98.40   0.02 (n=117)    -3.06 to 3.10   -33.22 to 33.26       0.832 (0.766 to 0.880)
___________________________________________________________________________
CI, confidence interval; Intra-CC, intra-class correlation coefficient.
Legends for figure
Figure 1. a) Scatterplot showing the relationship between inter-observer difference (observer
1 minus observer 2) in calculated risk and magnitude of calculated risk when using logistic
regression model LR1. The plot manifests a diamond shape, the differences being smallest for
the lowest and highest risks. For risks < 25 and > 95, the differences are very small.
LOA, limits of agreement. b) Scatterplot showing the relationship between inter-observer
difference in calculated risk and magnitude of calculated risk when using logistic regression
model LR2. The plot manifests a diamond shape, the differences being smallest for the lowest
and highest risks.
Published OnlineFirst November 25, 2014. Clin Cancer Res.
Povilas Sladkevicius and Lil Valentin. Inter-observer agreement in describing the ultrasound appearance of adnexal masses and in calculating the risk of malignancy using logistic regression models.
doi: 10.1158/1078-0432.CCR-14-0906
Abstract
Purpose: To estimate inter-observer agreement with regard to describing adnexal masses
using the International Ovarian Tumor Analysis (IOTA) terminology and the risk of
malignancy calculated using IOTA logistic regression models LR1 and LR2, and to elucidate
what explained the largest inter-observer differences in calculated risk of malignancy.
Experimental design: 117 women with adnexal masses were examined with transvaginal
gray-scale and power Doppler ultrasound by two independent experienced sonologists, who
described the masses using IOTA terminology. The risk of malignancy was calculated using
LR1 and LR2. A predetermined risk of malignancy cutoff of 10% indicated malignancy.
Results: There were 94 benign, four borderline, and 19 invasively malignant tumors. There
was substantial variability between the two sonologists in measurement results and some
variability in assessment of categorical variables (agreement 40%-98%, Kappa 0.30-0.91).
Inter-observer agreement when classifying tumors as benign or malignant was 84% (98/117),
Kappa 0.68, for LR1 and 85% (99/117), Kappa 0.68, for LR2. When using LR1 and LR2, the
inter-observer difference in calculated risk was >25 percentage units in 9% (11/117) and 12%
(14/117) of tumors, respectively. Differences in assessment of wall irregularity, acoustic
shadowing, color score, and color flow in papillary projections explained most of these largest
differences.
Conclusions: Inter-observer agreement in classifying tumors as benign or malignant
using the risk of malignancy cutoff of 10% for LR1 and LR2 was good. However, because
risk estimates may differ substantially between sonologists, one should be cautious about using
the risk value for counseling patients about their individual risk.
Introduction
One of the first successful attempts to use ultrasound to discriminate between benign and
malignant adnexal masses was made by Granberg and coworkers (1). They classified adnexal
masses into five categories (unilocular, unilocular solid, multilocular, multilocular solid, and
solid tumors) and found that unilocular cysts, unilocular solid cysts, and multilocular cysts
were rarely malignant. Later, subjective interpretation of ultrasound images of adnexal masses
- pattern recognition - proved to be an excellent method for discriminating between benign
and malignant adnexal masses (2-5) and also for making a specific diagnosis (e.g.,
endometrioma, hydrosalpinx, etc.) (3, 6, 7). As an alternative to pattern recognition,
several research teams (8-10) created logistic regression models including clinical and
ultrasound information to calculate the individual risk of malignancy in adnexal masses.
Because of unclear definitions of many of the ultrasound variables included in these models,
the International Ovarian Tumor Analysis (IOTA) group suggested standardized terms and
definitions to be used when describing ultrasound images of adnexal masses (11). The IOTA
group also created and validated several mathematical models in which these standardized
terms and definitions were used to calculate the risk of malignancy for each individual
adnexal mass (12-14). Of these models, the logistic regression models LR1 and LR2,
including 12 and six variables, respectively (see Table 1), were suggested to be suitable for use
in clinical practice (13, 14). However, even when using standardized terms and definitions,
ultrasound examiners may evaluate the features of an adnexal mass differently. There may
also be variability in measurement results. This means that the risk of malignancy calculated
by LR1 or LR2 may vary both within and between ultrasound examiners. We have shown that
this is indeed the case when experienced ultrasound examiners analyze three-dimensional
(3D) ultrasound volumes of adnexal masses (15). However, analysis of 3D ultrasound
volumes does not necessarily reflect a situation where live examinations are performed.
The aims of this study were to estimate interobserver agreement, when live ultrasound
scans are performed, with regard to (1) describing adnexal masses using the IOTA terminology
and (2) the risk of malignancy calculated using the IOTA logistic regression models LR1 and LR2,
and (3) to elucidate what explains large interobserver differences in calculated risk of
malignancy.
Materials and methods
The Ethics Committee of Lund University approved the study protocol. Informed consent
was obtained from all participants or the participant's guardian after the nature of the procedures
had been fully explained.
This is a prospective observational study of real-time live ultrasound examinations of
adnexal masses. Consecutive patients referred for an ultrasound examination and found to have
an adnexal mass judged to need surgical removal were scanned according to the research
protocol by sonologist 1 (PS) as part of the clinical ultrasound examination. A second
ultrasound examination was carried out before surgery by sonologist 2 (LV). Both examiners
used the standardized IOTA examination and measurement technique and the IOTA
terminology (11) to describe their ultrasound findings and noted their results in a dedicated
paper form. Sonologist 2 was blinded to the results of sonologist 1. Information on the clinical
variables included in LR1 and LR2 (personal history of ovarian cancer, current hormonal
therapy, age of the patient) was obtained at the preoperative ultrasound examination by
sonologist 2. All patients were operated on within 90 days after the preoperative ultrasound
examination performed by sonologist 2. The excised tissues underwent histological
examination, and tumors were classified according to the criteria recommended by the
International Federation of Gynecology and Obstetrics (16). Borderline tumors were classified
as malignant.
The patients were examined in the lithotomy position with an empty urinary bladder (11).
Abdominal ultrasound examination was added when needed. The ultrasound variables
assessed with regard to interobserver reproducibility are shown in Table 1. The size of the
lesion and that of its largest solid component were measured (largest diameter and mean of
three orthogonal diameters) using calipers on the frozen ultrasound image. A color score was
assigned on the basis of subjective assessment of the color content of the tumor scan at power
Doppler ultrasound examination. A color score of 1 indicates absence of color Doppler
signals, a color score of 2 a minimal amount of color Doppler signals, a color score of 3 a
moderate amount of color Doppler signals, and a color score of 4 a large amount of color
Doppler signals in the tumor (11).
The ultrasound systems used were GE Voluson 730 Expert or GE Voluson E8 (GE
Healthcare, Zipf, Austria) with a 5-9-MHz transvaginal transducer. For power Doppler
ultrasound examinations, the following settings were used for the Voluson 730 Expert system:
frequency 6-9 ("normal") MHz, pulse repetition frequency 0.6 kHz, gain 0.8, wall motion
filter "low 1" (40 Hz); and for Voluson E8: frequency 6-9 ("normal") MHz, pulse repetition
frequency 0.6 kHz, gain -4.0, wall motion filter "low 1" (40 Hz).
Statistical analysis
The IOTA3 study screen (astraia GmbH, Munich, Germany) was used to calculate the risk
of malignancy according to LR1. Weighted Kappa indices were calculated using the statistical
program Stata, Version 10.1 for Windows (StataCorp LP, College Station, TX, USA). For all
other statistical calculations, including calculation of the risk of malignancy when using LR2,
we used the Statistical Package for the Social Sciences (SPSS program; IBM Corp., New
York, NY, USA; PASW version 18.0).
Inter-observer agreement in the assessment of categorical variables was estimated by
calculating the percentage agreement. Cohen's kappa was used to estimate by how much the
observed agreement exceeded that expected by chance (17). Weighted kappa values are
presented where appropriate (18). It has been suggested that Kappa values >0.81 indicate very
good agreement beyond chance, kappa values between 0.61 and 0.80 good agreement beyond
chance, kappa values between 0.41 and 0.60 moderate agreement beyond chance, kappa values
between 0.21 and 0.40 fair agreement beyond chance, and kappa values <0.20 poor agreement
beyond chance (19).
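
As an illustration of the agreement statistics described above, the following minimal Python sketch computes percentage agreement and unweighted Cohen's kappa for two observers and maps kappa to the descriptive bands cited in the text. It is not the Stata or SPSS procedure used in the study; the function names, the example ratings, and the handling of values that fall exactly between two bands are assumptions made for illustration.

```python
from collections import Counter


def percent_agreement(ratings_1, ratings_2):
    """Proportion of cases in which both observers chose the same category."""
    if len(ratings_1) != len(ratings_2) or not ratings_1:
        raise ValueError("need two non-empty rating lists of equal length")
    same = sum(a == b for a, b in zip(ratings_1, ratings_2))
    return same / len(ratings_1)


def cohens_kappa(ratings_1, ratings_2):
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(ratings_1)
    p_o = percent_agreement(ratings_1, ratings_2)
    counts_1, counts_2 = Counter(ratings_1), Counter(ratings_2)
    categories = set(counts_1) | set(counts_2)
    # Agreement expected by chance: product of the observers' marginal proportions.
    p_e = sum(counts_1[c] * counts_2[c] for c in categories) / n ** 2
    if p_e == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_o - p_e) / (1 - p_e)


def agreement_band(kappa):
    """Descriptive band; the cut points between bands are an assumption."""
    if kappa > 0.80:
        return "very good"
    if kappa > 0.60:
        return "good"
    if kappa > 0.40:
        return "moderate"
    if kappa > 0.20:
        return "fair"
    return "poor"


# Hypothetical ratings of 100 masses (not study data).
observer_1 = ["benign"] * 45 + ["malignant"] * 35 + ["benign"] * 12 + ["malignant"] * 8
observer_2 = ["benign"] * 45 + ["malignant"] * 35 + ["malignant"] * 12 + ["benign"] * 8

kappa = cohens_kappa(observer_1, observer_2)
print(f"agreement {percent_agreement(observer_1, observer_2):.0%}, "
      f"kappa {kappa:.2f} ({agreement_band(kappa)})")
```

The example shows why both figures are reported: 80% raw agreement corresponds here to a kappa of only about 0.60, because half of the agreement would be expected by chance alone.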
Inter-observer reproducibility of measurement results, including the calculated risks of
malignancy using LR1 and LR2, was described as the difference between two measurement
results. The differences between the measured values were plotted against the mean of the two
measurements (Bland-Altman plots) to assess the relationship between the differences and the
magnitude of the measurements (20). Systematic bias between two measurements was
estimated by calculating the 95% confidence interval (CI) of the mean difference (mean
difference ± 2 SE). If zero lay within this interval, no bias was assumed to exist between the
two measurements. Inter-observer agreement was expressed as the mean difference and limits
of agreement (20). Ninety-five percent of differences between any future measurements are
estimated to fall between the lower and upper limit of agreement. Inter-observer reliability of
measurement results was estimated by calculating the intra-class correlation coefficient
(ICC) using analysis of variance (two-way random model, absolute agreement; this allows
generalization of the results to a population of observers). The ICC indicates the proportion of
the total variance in measurement results that can be explained by differences between the
individuals examined. It depends both on the magnitude of measurement errors and the true
heterogeneity in the population in which measurements are made. The more variable the
population investigated, the greater the ICC, and the less variable the population, the smaller
the ICC (21). It has been suggested that ICC values >0.90 are needed for a test to be used in
clinical practice (22).
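
The reproducibility measures described above can be sketched in a few lines of Python. This is a minimal illustration, not the SPSS procedure used in the study. The bias interval follows the text (mean difference ± 2 SE); the multiplier of 1.96 SD for the limits of agreement and the single-measure form of the two-way random, absolute-agreement ICC are assumptions, as the text does not state them. Function names and the example measurements are hypothetical.

```python
import math
from statistics import mean, stdev


def bland_altman(obs_1, obs_2):
    """Mean inter-observer difference, its approximate 95% CI, and limits of agreement."""
    if len(obs_1) != len(obs_2) or len(obs_1) < 2:
        raise ValueError("need two equally long lists with at least two cases")
    diffs = [a - b for a, b in zip(obs_1, obs_2)]
    mean_diff = mean(diffs)
    sd_diff = stdev(diffs)                       # sample SD of the differences
    se = sd_diff / math.sqrt(len(diffs))         # standard error of the mean difference
    bias_ci = (mean_diff - 2 * se, mean_diff + 2 * se)
    limits = (mean_diff - 1.96 * sd_diff, mean_diff + 1.96 * sd_diff)  # assumed multiplier
    return {
        "mean_diff": mean_diff,
        "bias_ci": bias_ci,
        # Bias is assumed only if zero lies outside the interval.
        "systematic_bias": not (bias_ci[0] <= 0.0 <= bias_ci[1]),
        "limits": limits,
    }


def icc_two_way_random_absolute(obs_1, obs_2):
    """Single-measure ICC for two observers (two-way random, absolute agreement)."""
    n, k = len(obs_1), 2
    rows = list(zip(obs_1, obs_2))
    grand = (sum(obs_1) + sum(obs_2)) / (n * k)
    row_means = [mean(r) for r in rows]          # one mean per mass
    col_means = [mean(obs_1), mean(obs_2)]       # one mean per observer
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical largest diameters in mm measured by two observers (not study data).
observer_1 = [10, 20, 30, 40, 50]
observer_2 = [12, 19, 33, 41, 48]

result = bland_altman(observer_1, observer_2)
print(result)
print(f"ICC {icc_two_way_random_absolute(observer_1, observer_2):.3f}")
```

In this example the interval for the mean difference includes zero, so no systematic bias would be assumed, and the ICC is high because the masses differ far more from each other than the two observers differ on any one mass.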
The sensitivity and specificity with regard to malignancy of LR1 and LR2 were calculated
using the information of sonologist 1 and that of sonologist 2.
Results
In all, 117 consecutive women with adnexal masses who underwent surgery were
examined with ultrasound by the two sonologists as described above. Thirty-four women had
bilateral adnexal masses. The most complex mass, or the largest one if both masses had
similar ultrasound morphology, was used in our statistical analysis, the mass to be included
being selected retrospectively to ensure that both sonologists contributed the same mass (right
or left) to the analysis. Thus, 117 adnexal masses from 117 patients constitute our study
population. The women's age ranged between 14 and 88 years (median 53), and 63 (54%)
women were postmenopausal. There were 94 benign, four borderline, and 19 invasively
malignant adnexal masses (Table 2).
The time elapsed between the ultrasound examination of sonologist 1 and 2 was median 61
days (10th and 90th percentiles, 13 and 132; range, 1-204) for the tumors with benign histology
and median 14 days (10th and 90th percentiles, 2 and 31; range, 1-41) for the tumors with
malignant histology. There was no relationship between the number of days between the
scans and the differences in measurement results or inter-observer agreement for discrete
variables (Supplementary Fig. S1-S5 and Supplementary Table S1).
Inter-observer reproducibility of measurement results is shown in Table 3. Bland-Altman
plots showed no clear trend for inter-observer differences in measurement results to change
with the magnitude of the measurement values. Limits of agreement were wide for all
measurements. There was one systematic difference between the two sonologists: sonologist 1
(who always performed the first examination) obtained higher measurement values for the
maximum diameter of the mass. The least reliable measurement was the height of the largest
papillary projection.
Inter-observer agreement when assessing categorical ultrasound variables is shown in Table
4. For most categorical ultrasound variables, inter-observer agreement beyond chance was good
or very good (19). Inter-observer agreement beyond chance for variables included in LR1 or
LR2 was poorest for color score (agreement 40%, weighted Kappa 0.36), presence of blood
flow in papillary projection (agreement 90%, Kappa 0.48), irregular cyst wall (agreement 79%,
Kappa 0.56), and acoustic shadowing (agreement 85%, Kappa 0.58).
Bland-Altman plots illustrating the relationship between the magnitudes of the estimated
risk of malignancy calculated using LR1 and LR2 and the interobserver difference in
calculated risk are shown in Figure 1. The plots manifest a diamond shape, i.e., the
interobserver differences are smallest for the lowest and highest risks, and they are very small
for risks <2.5% and >95%. Logarithmic transformation of the data (20) did not substantially
change the shape of the scatter plot. Therefore, we present our results as absolute inter-observer
differences in calculated risk (in percentage units); see Table 5. There were no
systematic differences in calculated risks between the two sonologists, and reliability
reflected by the ICC values was good (22), with ICC values of 0.911 for LR1 and 0.832 for
LR2. When classifying tumors as having a risk of malignancy <10% (benign) or >10%
(malignant) using LR1 or LR2, the inter-observer agreement was good for both models:
inter-observer agreement 84% (98/117), Kappa value 0.68, for model LR1, and inter-observer
agreement 85% (99/117), Kappa 0.68, for model LR2. In the 19 cases where the two
sonologists obtained different results with regard to malignancy when using LR1, the absolute
interobserver differences in calculated risk ranged from 0.7 to 59.6 percentage units: in six of
the 19 cases the absolute interobserver difference in calculated risk was <10.0 percentage
units, in nine cases it was 10.0-24.9 percentage units, and in four cases it was >25.0
percentage units. In the 18 cases where the two sonologists obtained different results with
regard to malignancy when using LR2, the absolute interobserver difference in calculated risk
ranged from 8.8 to 67.9 percentage units: in two of the 18 cases the absolute interobserver
difference in calculated risk was <10.0 percentage units, in ten cases it was 10.0-24.9
percentage units, and in six cases it was >25 percentage units.
The Bland-Altman plots (Figure 1) illustrate that for some tumors there were substantial
interobserver differences in the calculated risk of malignancy: when using LR1, the
interobserver difference in calculated risk was >25 percentage units in 11 tumors (9% of all
tumors), and when using LR2, the interobserver difference in calculated risk was >25
percentage units in 14 tumors (12% of all tumors). To elucidate which interobserver
differences explained these largest interobserver differences in calculated risk, we scrutinized
each case where the difference was >25 percentage units. The results are shown in
Supplementary Tables S2 and S3. When using LR1, a discrepancy for one single categorical
variable explained the difference in four of the 11 cases, while a discrepancy for two
categorical variables explained the difference in one case (differences in measurements being
<5 mm in these five cases). In six cases there were differences in one or two categorical
variables but also substantial differences (6-61 mm) in at least one measurement result. In no
case was the large difference in calculated risk explained exclusively by differences in
measurement results. The categorical variables judged differently by the two sonologists in
these 11 cases were color score (n = 5), irregular cyst wall (n = 5), flow in papillary projection
(n = 3), and acoustic shadowing (n = 2).
When using LR2, a discrepancy for one single categorical variable explained the large
difference in calculated risk (>25 percentage units) in eight of the 14 cases (differences in
measurements being <5 mm in these eight cases), and in four of the eight cases the sonologists
judged acoustic shadowing differently. In five cases there were differences in one categorical
variable but also a substantial difference (9-61 mm) in the measurement of the largest solid
component. In yet another case there were differences in two categorical variables as well as in
the measurement of the largest solid component. The categorical variables judged differently
by the two sonologists in these 14 cases were acoustic shadowing (n = 5), irregular cyst wall
(n = 5), ascites (n = 3), and flow in papillary projection (n = 2).
The sensitivity with regard to malignancy when using LR1 (10% risk cutoff) was 100%
(23/23; 95% CI, 82-100) for both sonologists; the specificity was 74% (70/94; 95% CI, 64-82)
for sonologist 1 and 63% (59/94; 95% CI, 53-72) for sonologist 2. The sensitivity when using
LR2 was 100% (23/23; 95% CI, 82-100) for sonologist 1 and 91% (21/23; 95% CI, 72-98) for
sonologist 2, and the specificity was 75.5% (71/94; 95% CI, 65-84) for both sonologists.
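
The point estimates above follow directly from the reported counts, as the short Python sketch below shows. The counts are taken from the text (23 malignant masses including borderline tumors, and 94 benign masses); the dictionary layout and function names are illustrative assumptions, and the confidence intervals are not recomputed because the interval method is not stated.

```python
def percentage(numerator, denominator):
    """Proportion expressed as a percentage."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return 100.0 * numerator / denominator


# Correctly classified malignant and benign masses, as reported in the text.
reported_counts = {
    ("LR1", "sonologist 1"): {"malignant_detected": 23, "benign_detected": 70},
    ("LR1", "sonologist 2"): {"malignant_detected": 23, "benign_detected": 59},
    ("LR2", "sonologist 1"): {"malignant_detected": 23, "benign_detected": 71},
    ("LR2", "sonologist 2"): {"malignant_detected": 21, "benign_detected": 71},
}
N_MALIGNANT = 23   # 19 invasive + 4 borderline tumors
N_BENIGN = 94


def sensitivity_specificity(counts):
    """Return (sensitivity %, specificity %) for one model and observer."""
    return (
        percentage(counts["malignant_detected"], N_MALIGNANT),
        percentage(counts["benign_detected"], N_BENIGN),
    )


for (model, observer), counts in reported_counts.items():
    sens, spec = sensitivity_specificity(counts)
    print(f"{model}, {observer}: sensitivity {sens:.1f}%, specificity {spec:.1f}%")
```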
Discussion
We have shown substantial inter-observer variability in the results of measurements taken in
adnexal masses (wide limits of agreement). Inter-observer agreement beyond chance was very
good or good for most categorical variables, but it was only moderate or fair for some.
Inter-observer agreement above chance was poorest for variables heavily dependent on subjective
evaluation and/or machine settings, i.e., color score, presence of color Doppler signals in
papillary projections, irregular cyst walls, acoustic shadowing (all four variables being
included in LR1 or LR2), echogenicity of cyst fluid, and ovarian crescent sign. Despite this,
there was good inter-observer agreement when classifying tumors as benign or malignant using
the predetermined risk of malignancy cutoff of 10%. However, in some cases there were
substantial differences in the calculated risk of malignancy between the two sonologists, the
difference being >25.0 percentage units in 9% of all tumors when using LR1 and in 12% of all
tumors when using LR2.
The strength of our study is that it provides new information. To the best of our knowledge,
there is only one publication reporting on interobserver agreement with regard to describing
ultrasound findings in adnexal masses using the IOTA terminology (11) when performing live
ultrasound examinations (23). However, that study (23) evaluated interobserver agreement
with regard to the ten ultrasound features in the IOTA simple rules (24, 25), not the variables
included in the IOTA logistic regression models LR1 and LR2, and agreement was estimated
between examiners with different levels of experience. The variable with poorest agreement
beyond chance in the study cited was acoustic shadowing (Kappa 0.36). We have found no
published study that has estimated inter-observer reproducibility of the calculated risk of
malignancy using LR1 or LR2 after live scanning.
It is a limitation of our study that up to 204 days elapsed between the scans of the two
sonologists (up to 41 days for malignant masses). Because days elapsed between the scans,
theoretically the inter-observer differences could be explained by the lesions having changed
in size or morphology between the scans. We find this highly unlikely for the following
reasons. First, there was no relationship between the differences in measurement results and
the number of days between the scans (Supplementary Fig. S1-S5). Nor was there a clear
tendency for inter-observer agreement for discrete variables to depend on the time between the
scans (Supplementary Table S1). Second, one would expect a lesion and its components to
increase in size with time, but sonologist 1, performing the first scan, obtained higher
measurement values than sonologist 2. Third, it is our experience, after having performed
gynecological scans for more than 20 years, that the ultrasound morphology of both benign and
malignant adnexal masses remains constant over time, that benign adnexal lesions grow
slowly, and that malignant masses do not change appreciably in size even during 1 month of
observation. Therefore, we believe that the discrepancies between the two sonologists reflect
true inter-observer differences and not a change of the masses over time. A second limitation is
that we did not include estimation of the reproducibility of retrieving anamnestic information
(current hormonal therapy, personal history of ovarian cancer), the anamnestic information
collected by the second sonologist being used in all cases. It cannot be entirely excluded that
patients would answer differently when asked by different sonologists or that sonologists
could interpret the answers of the patients differently. A third limitation is that we did not
estimate intra-observer reproducibility. We considered four scans (two per sonologist) likely to
be unacceptable to patients. For the same reason, only two sonologists were involved in this
study, and our results are generalizable only to sonologists with a similar level of experience.
The results of this live scanning study are similar to those of another study in which the
same sonologists assessed the same variables using 3D ultrasound volumes from adnexal
masses in another tumor population (15). The similarity in results between the two studies is
surprising, because the conditions when assessing 3D ultrasound volumes are different from
those during a live scan. When evaluating ultrasound volumes, sonologists are exposed to the
same ultrasound images, and so any interobserver difference should be explained exclusively
by differences in interpreting the ultrasound information. During a live scan there are more
sources of bias. This could result in either poorer or better interobserver agreement than when
3D ultrasound volumes are assessed: poorer because ultrasound examiners are likely to use
different machine settings and scanning conditions may change from one minute to another;
better because the dynamic nature of live scanning facilitates discrimination between solid
components and amorphous tissue.
Our results showed that two experienced sonologists agreed quite well in their classification
of masses as benign or malignant using the 10% risk of malignancy cutoff of LR1 and LR2,
and that the diagnostic performance of LR1 and LR2 with regard to discrimination between
benign and malignant tumors was similar for the two sonologists and similar to that reported
by others (14, 26-28). This is reassuring, because the main purpose of using models LR1 and
LR2 is to classify tumors as benign or malignant. Potentially, however, LR1 and LR2 can be
used not only to classify adnexal masses as benign or malignant but also to counsel a patient
about her individual risk of malignancy (13). If the calculated risk is to be used for individual
counseling, one must be reasonably certain not only that the estimated risk agrees well with the
true risk (when externally validated, both LR1 and LR2 underestimated the true risk, especially
in the risk interval 30%-70% (14)) but also that the risk estimates are reproducible, i.e., that
different examiners will obtain similar risk estimates. Our results show that risk estimates may
differ substantially between experienced observers, the difference in estimated risk being >25.0
percentage units in 9% and 12% of cases when using LR1 and LR2, respectively. Interobserver
agreement above chance was poorest for those variables in the models that are heavily
dependent on subjective evaluation, i.e., color score, presence of color Doppler signals in
papillary projections, irregular cyst walls, and acoustic shadowing. Indeed, differences in these
explained most of the largest inter-observer differences in calculated risk of malignancy. In
models based on few variables, changing values in only one variable may result in large
differences in predicted risks, while a model with many variables is less vulnerable to a change
in one or even a few variables. Our results illustrate this (Supplementary Tables S2 and S3).
When using LR2 (which includes six variables), a change in value for one single categorical
variable explained an inter-observer difference in calculated risk >25 percentage units in eight
of 14 cases, while when using LR1 (which includes 12 variables), a change in value for one
single categorical variable explained an inter-observer difference in calculated risk >25
percentage units in only four of 11 cases. Acoustic shadowing is a strong variable in both LR1
and LR2 and has great impact on the calculated risk in LR2, with only six variables. In our
hands, as well as in those of Ruiz de Gauna et al. (23), inter-observer agreement for acoustic
shadowing was at most moderate. The interobserver agreement for color score was only fair in
our study, and color score is an important variable in LR1.
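
To illustrate how disagreement on one categorical variable can shift a logistic regression risk estimate, the Python sketch below flips a single binary predictor in a generic logistic model. The coefficient and baseline log-odds are hypothetical values chosen for illustration; they are not the published LR1 or LR2 coefficients, which are not given in this text. Only the 10% cutoff is taken from the study.

```python
import math

CUTOFF = 0.10                    # 10% risk-of-malignancy cutoff used in the study
HYPOTHETICAL_COEFFICIENT = -1.6  # assumed log-odds shift when a feature is judged present


def risk_from_log_odds(z):
    """Logistic function: converts log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def classify(risk):
    """Classify a mass using the 10% cutoff."""
    return "malignant" if risk >= CUTOFF else "benign"


def risk_difference_points(baseline_log_odds, coefficient):
    """Absolute difference in risk (percentage units) when one observer scores the
    binary feature as present and the other scores it as absent."""
    risk_absent = risk_from_log_odds(baseline_log_odds)
    risk_present = risk_from_log_odds(baseline_log_odds + coefficient)
    return 100.0 * abs(risk_absent - risk_present)


# Hypothetical baseline log-odds contributed by all other variables.
for baseline in (-4.0, -1.5, 0.0):
    absent = risk_from_log_odds(baseline)
    present = risk_from_log_odds(baseline + HYPOTHETICAL_COEFFICIENT)
    print(
        f"baseline {absent:.1%} ({classify(absent)}) vs "
        f"{present:.1%} ({classify(present)}): "
        f"difference {risk_difference_points(baseline, HYPOTHETICAL_COEFFICIENT):.1f} points"
    )
```

Under these assumptions the same disagreement changes the risk by about 1 percentage unit when the baseline risk is very low, but by more than 30 percentage units when the baseline risk is near 50%, and it can move a mass across the 10% cutoff. This is consistent with the diamond shape of the Bland-Altman plots reported in the Results.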
To improve inter-observer reproducibility of calculated risks based on LR1 and LR2,
inter-observer differences in descriptions and measurements of adnexal masses using the IOTA
terminology and measurement technique need to be reduced. One way to achieve this could be by
providing courses on and training in how to examine and describe adnexal masses using the
IOTA terms. Interactive courses in which a large number of ultrasound images are discussed with
the course participants are likely to be very valuable in this respect. More precise definitions of
the IOTA terms, for example by providing ample imaging material, would probably also help
improve inter-observer agreement. Special attention should be given to the variables with poorest
reproducibility, i.e., the color score, wall irregularity, acoustic shadowing, and detection of blood
flow in papillary projections. Until better inter-observer agreement in the calculated risk of
malignancy using LR1 and LR2 has been shown, one should be cautious about using the risk
estimates for individual patient counselling.
Acknowledgements
None
References
1. Granberg S, Norström A, Wikland M. Tumors in the lower pelvis as imaged by vaginal
sonography. Gynecol Oncol 1990;37:224-9.
2. Benacerraf BR, Finkler NJ, Wojciechowski C, Knapp RC. Sonographic accuracy in the
diagnosis of ovarian masses. J Reprod Med 1990;35:491-5.
3. Valentin L. Pattern recognition of pelvic masses by gray-scale ultrasound imaging: the
contribution of Doppler ultrasound. Ultrasound Obstet Gynecol 1999;14:338-47.
4. Valentin L. Prospective cross-validation of Doppler ultrasound examination and gray-scale
ultrasound imaging for discrimination of benign and malignant pelvic masses.
Ultrasound Obstet Gynecol 1999;14:273-83.
5. Timmerman D, Schwärzler P, Collins WP, Claerhout F, Coenen M, Amant F, et al.
Subjective assessment of adnexal masses with the use of ultrasonography: an analysis
of interobserver variability and experience. Ultrasound Obstet Gynecol 1999;13:11-6.
6. Sokalska A, Timmerman D, Testa AC, Van Holsbeke C, Lissoni AA, Leone FPG, et al.
Diagnostic accuracy of transvaginal ultrasound examination for assigning a specific
diagnosis to adnexal masses. Ultrasound Obstet Gynecol 2009;34:462-70.
7. Valentin L. Use of morphology to characterize and manage common adnexal masses.
Best Pract Res Clin Obstet Gynaecol 2004;18:71-89.
8. Tailor A, Jurkovic D, Bourne TH, Collins WP, Campbell S. Sonographic prediction of
malignancy in adnexal masses using multivariate logistic regression analysis.
Ultrasound Obstet Gynecol 1997;10:41-7.
9. Timmerman D, Bourne TH, Tailor A, Collins WP, Verrelst H, Vandenberghe K, et al.
A comparison of methods for the preoperative discrimination between benign and
malignant adnexal masses: the development of a new logistic regression model. Am J
Obstet Gynecol 1999;181:57-65.
10. Alcazar JL, Jurado M. Prospective evaluation of logistic model based on sonographic
morphologic and color Doppler findings developed to predict adnexal malignancy. J
Ultrasound Med 1999;18:837-42.
11. Timmerman D, Valentin L, Bourne TH, Collins WP, Verrelst H, Vergote I. Terms,
definitions and measurements to describe the sonographic features of adnexal tumors: a
consensus opinion from the International Ovarian Tumor Analysis (IOTA) Group.
Ultrasound Obstet Gynecol 2000;16:500-5.
12. Timmerman D, Testa AC, Bourne T, Ferrazzi E, Ameye L, Konstantinovic ML, et al.
Logistic regression model to distinguish between the benign and malignant adnexal
mass before surgery: a multicenter study by the International Ovarian Tumor Analysis
Group. J Clin Oncol 2005;34:8794-801.
13. Kaijser J, Bourne T, Valentin L, Sayasneh A, Van Holsbeke C, Vergote I, et al.
Improving strategies for diagnosing ovarian cancer: a summary of the International
Ovarian Tumor Analysis (IOTA) studies. Ultrasound Obstet Gynecol 2013;41:9-20.
14. Timmerman D, Van Calster B, Testa AC, Guerriero S, Fischerova D, Lissoni AA, et al.
Ovarian cancer prediction in adnexal masses using ultrasound-based logistic regression
models: a temporal and external validation study by the IOTA group. Ultrasound Obstet
Gynecol 2010;36:226-34.
15. Sladkevicius P, Valentin L. Intra- and inter-observer agreement when describing
adnexal masses using the International Ovarian Tumour Analysis (IOTA) terms and
definitions: a study on three-dimensional (3D) ultrasound volumes. Ultrasound Obstet
Gynecol 2013;41:318-27.
16. Heintz APM, Odicino F, Maisonneuve P, Beller U, Benedet JL, Creasman WT, et al.
Carcinoma of the ovary. 25th Annual Report on the Results of Treatment in
Gynecological Cancer. Int J Gynecol Obstet 2003;83:S135-S166 (suppl 1).
17. Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas
1960;20:37-46.
18. Kundel HL, Polansky M. Measurement of observer agreement. Radiology
2003;228:303-8.
19. Brennan P, Silman A. Statistical methods for assessing observer variability in clinical
measures. BMJ 1992;304:1491-4.
20. Bland JM, Altman DG. Statistical methods for assessing agreement between two
methods of clinical measurement. Lancet 1986;1:307-10.
21. Bartlett JW, Frost C. Reliability, repeatability and reproducibility: analysis of
measurement errors in continuous variables. Ultrasound Obstet Gynecol
2008;31:466-75.
22. Kottner J, Audigé L, Brorson S, Donner A, Gajewski BJ, Hróbjartsson A, et al.
Guidelines for Reporting Reliability and Agreement Studies (GRRAS) were proposed. J
Clin Epidemiol 2011;64:96-106.
23. Ruiz de Gauna B, Sanchez P, Pineda L, Utrilla-Layna J, Juez L, Alcázar JL.
Inter-observer agreement with regard to describing adnexal masses using the IOTA simple
rules in a real-time setting and when using three-dimensional ultrasound volumes and
digital clips. Ultrasound Obstet Gynecol 2014;44:95-100.
24. Timmerman D, Testa AC, Bourne T, Ameye L, Jurkovic D, Van Holsbeke C, et al.
Simple ultrasound-based rules for the diagnosis of ovarian cancer. Ultrasound Obstet
Gynecol 2008;31:681-90.
25. Timmerman D, Ameye L, Fischerova D, Epstein E, Melis GB, Guerriero S, et al.
Simple ultrasound rules to distinguish between benign and malignant adnexal masses
before surgery: prospective validation by IOTA group. BMJ 2010;341:c6839.
26. Van Holsbeke C, Van Calster B, Testa AC, Domali E, Lu C, Van Huffel S, et al.
Prospective internal validation of mathematical models to predict malignancy in
adnexal masses: results from the international ovarian tumor analysis study. Clin Cancer
Res 2009;15:684-91.
27. Nunes N, Yazbek J, Ambler G, Hoo W, Naftalin J, Jurkovic D. Prospective evaluation
of the IOTA logistic regression model LR2 for the diagnosis of ovarian cancer.
Ultrasound Obstet Gynecol 2012;40:355-9.
28. Nunes N, Ambler G, Hoo WL, Naftalin J, Foo X, Widschwendter M, et al.
A prospective validation of the IOTA logistic regression models (LR1 and LR2) in
comparison to subjective pattern recognition for the diagnosis of ovarian cancer.
Int J Gynecol Cancer 2013;23:1583-9.
Table 1. Variables for which inter-observer reproducibility was estimated
___________________________________________________________________________
Variables included in logistic regression models LR1 and LR2 or of importance for these models
    Maximum diameter of adnexal mass: mm
    Maximum diameter of largest solid component: mm
    Maximum diameter of largest solid component: ≤50, >50 mm
    Presence of an entirely solid adnexal mass: yes, no
    Papillary projections present: yes, no
    Irregular internal cyst walls: yes, no
    Acoustic shadows: yes, no
    Color Doppler signals in papillary projection: yes, no
    Color score: 1, 2, 3, 4
    Tenderness of adnexal mass at scan: yes, no
    Ascites: yes, no
Other IOTA variables
  Continuous IOTA variables
    Mean diameter of adnexal mass: mm
    Mean diameter of largest solid component: mm
    Height of the largest papillary projection: mm
  Categorical IOTA variables
    Type of tumor: unilocular, unilocular solid, multilocular, multilocular solid, solid
    Mean diameter of adnexal mass: ≤40, 41-60, 61-80, 81-100 and >100 mm
    Number of cyst locules: 0, 1, 2, 3, 4, 5, 6-10, 11-20 and >20; ≤10, >10†
    Septum present: yes, no
    Incomplete septum present: yes, no
    Solid component: yes, no
    Papillary projections: 0, 1, 2, 3, ≥4; 1, 2, 3, ≥4‡; <4, >4†
    Echogenicity of cyst fluid: anechoic, low level, ground glass, mixed, no cyst fluid
    Color Doppler signals detectable: yes, no
    Ovarian crescent sign: yes, no
Diagnosis based on LR1 and LR2
    Calculated risk of malignancy using LR1: ≤10%, >10%
    Calculated risk of malignancy using LR2: ≤10%, >10%
___________________________________________________________________________
IOTA, International Ovarian Tumor Analysis group; LR1, logistic regression model 1; LR2, logistic regression model 2.

The risk of malignancy using LR1 is derived as

\[ y = \frac{1}{1 + e^{-z}}, \]

where

\[ z = -6.7468 + 1.5985\,a - 0.9983\,b + 0.0326\,c + 0.00841\,d - 0.8577\,e + 1.5513\,f + 1.1737\,g + 0.9281\,h + 0.0496\,i + 1.1421\,j - 2.3550\,k + 0.4916\,l \]

and e in the exponential is the mathematical constant and base value of natural logarithms. The 12 variables included in LR1 are:
    (a) personal history of ovarian cancer (yes = 1, no = 0)
    (b) current hormonal therapy (yes = 1, no = 0)
    (c) age of the patient (in years)
    (d) maximum diameter of the lesion (in millimeters)
    (e) mass tender at scan (yes = 1, no = 0)
    (f) the presence of ascites (yes = 1, no = 0)
    (g) the presence of blood flow within a solid papillary projection (yes = 1, no = 0)
    (h) the presence of a purely solid tumor (yes = 1, no = 0)
    (i) maximal diameter of the solid component (expressed in millimeters, but with no increase >50 mm)
    (j) irregular internal cyst walls (yes = 1, no = 0)
    (k) the presence of acoustic shadows (yes = 1, no = 0)
    (l) color score (1, 2, 3 or 4)

The risk of malignancy using LR2 is derived as \( y = 1/(1 + e^{-z}) \), where

\[ z = -5.3718 + 0.0354\,c + 1.6159\,f + 1.1768\,g + 0.0697\,i + 0.9586\,j - 2.9486\,k \]

(12). A calculated risk of malignancy of more than 10% classified the adnexal mass as malignant.
† includes 0. ‡ includes only cases where papillary projections were registered at both examinations.
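The two formulas in the footnote can be made concrete with a short calculation. The Python sketch below is illustrative only and is not the software used in the study; the function names (`lr1_risk`, `lr2_risk`, `is_malignant`), the dictionary layout keyed by the footnote letters, and the example mass are assumptions introduced here. It also assumes that the 50 mm cap on variable (i) applies in both models, because (i) is defined only once in the footnote.

```python
import math

# Coefficients transcribed from the Table 1 footnote; keys are the footnote
# letters (a)-(l). The data layout is an illustrative assumption.
LR1_INTERCEPT = -6.7468
LR1_COEF = {
    "a": 1.5985,   # personal history of ovarian cancer (1/0)
    "b": -0.9983,  # current hormonal therapy (1/0)
    "c": 0.0326,   # age in years
    "d": 0.00841,  # maximum diameter of lesion, mm
    "e": -0.8577,  # mass tender at scan (1/0)
    "f": 1.5513,   # ascites (1/0)
    "g": 1.1737,   # blood flow within a papillary projection (1/0)
    "h": 0.9281,   # purely solid tumor (1/0)
    "i": 0.0496,   # maximal diameter of solid component, mm (capped at 50)
    "j": 1.1421,   # irregular internal cyst walls (1/0)
    "k": -2.3550,  # acoustic shadows (1/0)
    "l": 0.4916,   # color score (1-4); must be supplied for LR1
}
LR2_INTERCEPT = -5.3718
LR2_COEF = {"c": 0.0354, "f": 1.6159, "g": 1.1768,
            "i": 0.0697, "j": 0.9586, "k": -2.9486}


def _logistic_risk(intercept, coef, values):
    """Return y = 1 / (1 + exp(-z)); variables not supplied count as 0."""
    v = dict(values)
    v["i"] = min(v.get("i", 0), 50)  # no increase above 50 mm
    z = intercept + sum(w * v.get(name, 0) for name, w in coef.items())
    return 1.0 / (1.0 + math.exp(-z))


def lr1_risk(values):
    return _logistic_risk(LR1_INTERCEPT, LR1_COEF, values)


def lr2_risk(values):
    return _logistic_risk(LR2_INTERCEPT, LR2_COEF, values)


def is_malignant(risk, cutoff=0.10):
    """A calculated risk of more than 10% classifies the mass as malignant."""
    return risk > cutoff


# Hypothetical mass: the two observers differ only in whether they record
# irregular internal cyst walls (j).
observer_1 = {"c": 55, "d": 60, "i": 10, "j": 0, "l": 2}
observer_2 = {**observer_1, "j": 1}
for label, obs in (("observer 1", observer_1), ("observer 2", observer_2)):
    print(label, round(100 * lr1_risk(obs), 1), round(100 * lr2_risk(obs), 1))
```

With these made-up inputs, the single discrepancy in variable (j) moves the calculated risk from below to above the 10% cutoff in both models, which is the kind of effect the study sets out to quantify.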
Table 2. Histological diagnoses of the masses
___________________________________________________________________________
Histological diagnosis of tumor                     Number
___________________________________________________________________________
Benign tumors                                       94
    Benign simple cyst                              7
    Endometrioma                                    10
    Dermoid cyst                                    16
    Serous cystadenoma                              16
    Mucinous cystadenoma                            18
    Myoma/fibroma                                   9
    Cystadenofibroma                                11
    Paraovarian cyst                                5
    Sactosalpinx, chronic salpingitis               1
    Leydig cell tumor                               1
Borderline tumors                                   4
    Serous                                          2
    Mucinous                                        1
    Endometrioid                                    1
Invasive malignancy                                 19
    Primary ovarian adenocarcinoma                  13
    Granulosa cell tumor                            3
    Dysgerminoma                                    1
    Leiomyosarcoma                                  1
    Malignant aggressive B-cell lymphoma            1
___________________________________________________________________________
Table 3. Inter-observer reproducibility of continuous variables used in the risk calculation models LR1 and LR2 and of other continuous variables used to describe adnexal masses
___________________________________________________________________________
                                                         Measurement results          Difference in mm between two measurements                Intra-CC
                                                         (both sonologists)           made by sonologists 1 and 2 (a)
Parameter                                                Median          Range        Mean             95% CI           Limits of agreement    Point estimate (95% CI)
___________________________________________________________________________
Variables used in models LR1 and LR2
  Maximum diameter of adnexal mass, mm                   70 (n=234)      10 – 313     3.80 (n=117)     1.12 – 6.48      -25.24 – 32.82         0.958 (0.937 – 0.971)
  Maximum diameter of largest solid component, mm (b)    29.50 (n=122)   5 – 180      1.92 (n=61)      -1.74 – 5.58     -26.66 – 30.50         0.942 (0.905 – 0.964)
Other variables used to describe adnexal mass
  Mean diameter of adnexal mass, mm                      58.5 (n=234)    9 – 240      1.05 (n=117)     -0.15 – 1.95     -8.61 – 10.72          0.971 (0.958 – 0.980)
  Mean diameter of largest solid component, mm (b)       22 (n=122)      4 – 156      0.59 (n=61)      -1.82 – 2.98     -18.16 – 19.32         0.962 (0.937 – 0.977)
  Height of largest papillary projection, mm (c)         8 (n=42)        3 – 25       -0.51 (n=21)     -2.93 – 1.91     -11.61 – 10.59         0.609 (0.245 – 0.821)
___________________________________________________________________________
CI, confidence interval; Intra-CC, intra-class correlation coefficient.
(a) The measurements made by sonologist 2 were subtracted from the measurements made by sonologist 1.
(b) Includes only cases where solid components were registered by both observers.
(c) Includes only cases where papillary projections were registered by both observers. Using logarithmically transformed data, the results were as
follows: mean difference 1.03 (95% CI 0.78 – 1.36), limits of agreement 0.31 – 3.61, Intra-CC 0.547 (0.153 – 0.789). These results can be used for
comparison with our published results on inter-observer agreement when analyzing ultrasound volumes obtained by three-dimensional
ultrasound (15).
Table 4. Inter-observer agreement for categorical variables included in the risk calculation models LR1 and LR2 and for other categorical variables used to describe adnexal masses
____________________________________________________________________________________________
Variable                                                                          Agreement         Kappa value
____________________________________________________________________________________________
Variables used in models LR1 and LR2
  Presence of entirely solid tumor (yes, no)                                      97% (114/117)     0.91
  Maximum diameter of largest solid component (≤50 mm, >50 mm)                    95% (111/117)     0.83
  Maximum diameter of largest solid component (≤50 mm, >50 mm) (a)                92% (56/61)       0.82
  Tenderness of adnexal mass (yes, no)                                            97% (113/117)     0.78
  Ascites (yes, no)                                                               97% (113/117)     0.73
  Papillary projections (yes, no)                                                 87% (102/117)     0.65
  Solid component (yes, no)                                                       84% (98/117)      0.67
  Irregular cyst wall (yes, no)                                                   79% (92/117)      0.56
  Acoustic shadows (yes, no)                                                      85% (99/117)      0.58
  Color Doppler signals in papillary projections (yes, no) (b)                    90% (105/117)     0.48
  Color Doppler signals in papillary projections (yes, no) (c)                    86% (18/21)       0.71
  Color score (1, 2, 3, 4)                                                        40% (47/117)      0.36 (d)
Other variables used to describe adnexal masses
  Tumor type (unilocular, unilocular solid, multilocular, multilocular solid, solid)  97% (113/117) 0.70
  Mean diameter of tumor (mm): ≤40, 41-60, 61-80, 81-100, >100                    66% (77/117)      0.76 (d)
  Number of locules: 0, 1, 2, 3, 4, 5, 6-10, 11-20, >20                           58% (68/117)      0.79 (d)
  Number of locules: ≤10, >10 (e)                                                 94% (110/117)     0.80
  Septum (yes, no)                                                                88% (103/117)     0.76
  Incomplete septum (yes, no)                                                     98% (115/117)     ¶
  Number of papillary projections: 0, 1, 2, 3, ≥4                                 81% (95/117)      0.64 (d)
  Number of papillary projections: 1, 2, 3, ≥4 (c)                                67% (14/21)       0.63 (d)
  Number of papillary projections: <4, >4 (e)                                     96% (112/117)     0.68
  Echogenicity of cyst fluid (anechoic, low level, ground glass, mixed, no fluid) 67% (78/117)      0.56
  Color Doppler signals detectable (yes, no)                                      84% (98/117)      0.30
  Ovarian crescent sign (yes, no)                                                 78% (91/117)      0.49
  Fluid in Douglas pouch (yes, no)                                                87% (102/117)     0.72
____________________________________________________________________________________________
(a) Includes only cases where solid components were registered at both examinations.
(b) This variable reflects disagreement both with regard to the presence of papillary projections and the presence of blood flow in papillary projections (i.e., if one observer recorded the presence of papillary projections and the other did not, there was disagreement with regard to this variable, and if both observers recorded the presence of papillary projections but disagreed with regard to the presence of blood flow in the papillary projections, there was disagreement with regard to this variable).
(c) Includes only cases where papillary projections were registered at both examinations.
(d) Weighted kappa index.
(e) Includes 0.
¶ Absent kappa values are explained by asymmetric field tables making it impossible to calculate them.
Table 5. Inter-observer reproducibility of the risk of malignancy calculated using the risk calculation models LR1 and LR2
___________________________________________________________________________
                                             Calculated risk of malignancy, %   Difference between the risk calculated                 Intra-CC
                                             (both sonologists)                 by sonologist 1 and 2
Parameter                                    Median          Range              Mean             95% CI          Limits of agreement   Point estimate (95% CI)
___________________________________________________________________________
Risk of malignancy calculated using LR1      7.85 (n=234)    0.10 – 99.10       -0.53 (n=117)    -3.07 – 2.01    -28.05 – 26.99        0.911 (0.874 – 0.937)
Risk of malignancy calculated using LR2      6.65 (n=234)    0.10 – 98.40       0.02 (n=117)     -3.06 – 3.10    -33.22 – 33.26        0.832 (0.766 – 0.880)
___________________________________________________________________________
CI, confidence interval; Intra-CC, intra-class correlation coefficient.
Legends for figure
Figure 1. a) Scatterplot showing the relationship between inter-observer difference (observer
1 minus observer 2) in calculated risk and magnitude of calculated risk when using logistic
regression model LR1. The plot manifests a diamond shape, the differences being smallest for
the lowest and highest risks. For risks <25 and >95 the differences are very small.
LOA, limits of agreement. b) Scatterplot showing the relationship between inter-observer
difference in calculated risk and magnitude of calculated risk when using logistic regression
model LR2. The plot manifests a diamond shape, the differences being smallest for the lowest
and highest risks.
Published OnlineFirst November 25, 2014. Clin Cancer Res. Povilas Sladkevicius and Lil Valentin. Inter-observer agreement in describing the ultrasound appearance of adnexal masses and in calculating the risk of malignancy using logistic regression models.
Introduction
One of the first successful attempts to use ultrasound to discriminate between benign and
malignant adnexal masses was made by Granberg and coworkers (1). They classified adnexal
masses into five categories (unilocular, unilocular solid, multilocular, multilocular solid and
solid tumors) and found that unilocular cysts, unilocular solid cysts and multilocular cysts
were rarely malignant. Later, subjective interpretation of ultrasound images of adnexal masses
(pattern recognition) proved to be an excellent method for discriminating between benign
and malignant adnexal masses (2-5) and also for making a specific diagnosis (e.g.,
endometrioma, hydrosalpinx, etc.) (3,6,7). As an alternative to pattern recognition,
several research teams (8-10) created logistic regression models including clinical and
ultrasound information to calculate the individual risk of malignancy in adnexal masses.
Because of unclear definitions of many of the ultrasound variables included in these models,
the International Ovarian Tumor Analysis (IOTA) group suggested standardized terms and
definitions to be used when describing ultrasound images of adnexal masses (11). The IOTA
group also created and validated several mathematical models in which these standardized
terms and definitions were used to calculate the risk of malignancy for each individual
adnexal mass (12-14). Of these models, the logistic regression models LR1 and LR2,
including 12 and six variables, respectively (see Table 1), were suggested to be suitable for use
in clinical practice (13,14). However, even when using standardized terms and definitions,
ultrasound examiners may evaluate the features of an adnexal mass differently. There may
also be variability in measurement results. This means that the risk of malignancy calculated
by LR1 or LR2 may vary both within and between ultrasound examiners. We have shown that
this is indeed the case when experienced ultrasound examiners analyze three-dimensional
(3D) ultrasound volumes of adnexal masses (15). However, analysis of 3D ultrasound
volumes does not necessarily reflect a situation where live examinations are performed.
The aims of this study were to estimate interobserver agreement, when live ultrasound
scans are performed, with regard to 1) describing adnexal masses using the IOTA terminology
and 2) the risk of malignancy calculated using the IOTA logistic regression models LR1 and LR2,
and 3) to elucidate what explains large interobserver differences in calculated risk of
malignancy.
Materials and methods
The Ethics Committee of Lund University approved the study protocol. Informed consent
was obtained from all participants or the participant's guardian after the nature of the procedures
had been fully explained.
This is a prospective observational study of real-time (live) ultrasound examinations of
adnexal masses. Consecutive patients referred for an ultrasound examination and found to have
an adnexal mass judged to need surgical removal were scanned according to the research
protocol by sonologist 1 (PS) as part of the clinical ultrasound examination. A second
ultrasound examination was carried out before surgery by sonologist 2 (LV). Both examiners
used the standardized IOTA examination and measurement technique and the IOTA
terminology (11) to describe their ultrasound findings and noted their results in a dedicated
paper form. Sonologist 2 was blinded to the results of sonologist 1. Information on the clinical
variables included in LR1 and LR2 (personal history of ovarian cancer, current hormonal
therapy, age of the patient) was obtained at the preoperative ultrasound examination by
sonologist 2. All patients were operated on within 90 days after the preoperative ultrasound
examination performed by sonologist 2. The excised tissues underwent histological
examination, and tumors were classified according to the criteria recommended by the
International Federation of Gynecology and Obstetrics (16). Borderline tumors were classified
as malignant.
The patients were examined in the lithotomy position with an empty urinary bladder (11).
Abdominal ultrasound examination was added when needed. The ultrasound variables
assessed with regard to interobserver reproducibility are shown in Table 1. The size of the
lesion and that of its largest solid component were measured (largest diameter and mean of
three orthogonal diameters) using calipers on the frozen ultrasound image. A color score was
assigned on the basis of subjective assessment of the color content of the tumor scan at power
Doppler ultrasound examination. A color score of 1 indicates absence of color Doppler
signals, a color score of 2 a minimal amount of color Doppler signals, a color score of 3 a
moderate amount of color Doppler signals, and a color score of 4 a large amount of color
Doppler signals in the tumor (11).
The ultrasound systems used were GE Voluson 730 Expert or GE Voluson E8 (GE
Healthcare, Zipf, Austria) with a 5–9-MHz transvaginal transducer. For power Doppler
ultrasound examinations the following settings were used: for the Voluson 730 Expert system,
frequency 6-9 MHz (normal), pulse repetition frequency 0.6 kHz, gain 0.8, wall motion
filter low 1 (40 Hz); and for the Voluson E8, frequency 6-9 MHz (normal), pulse repetition
frequency 0.6 kHz, gain -4.0, wall motion filter low 1 (40 Hz).
Statistical analysis
The IOTA3 study screen (astraia GmbH, Munich, Germany) was used to calculate the risk
of malignancy according to LR1. Weighted kappa indices were calculated using the statistical
program Stata version 10.1 for Windows (StataCorp LP, College Station, TX, USA). For all
other statistical calculations, including calculation of the risk of malignancy when using LR2,
we used the Statistical Package for the Social Sciences (SPSS program, IBM Corp., New
York, NY, USA; PASW version 18.0).
Inter-observer agreement in the assessment of categorical variables was estimated by
calculating the percentage agreement. Cohen's kappa was used to estimate by how much the
observed agreement exceeded that expected by chance (17). Weighted kappa values are
presented where appropriate (18). It has been suggested that kappa values >0.81 indicate very
good agreement beyond chance, kappa values between 0.61 and 0.80 good agreement beyond
chance, kappa values between 0.41 and 0.60 moderate agreement beyond chance, kappa values
between 0.21 and 0.40 fair agreement beyond chance, and kappa values <0.20 poor agreement
beyond chance (19).
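As an illustration of the two agreement measures described above, the following Python sketch computes percentage agreement and unweighted Cohen's kappa for two raters. It is not the authors' Stata or SPSS analysis; the function names and the example ratings are made up for illustration, and the handling of values that fall exactly on a band boundary in `interpret_kappa` is an assumption, since the quoted bands leave small gaps.

```python
from collections import Counter


def percent_agreement(rater1, rater2):
    """Share of cases in which both raters chose the same category."""
    return sum(a == b for a, b in zip(rater1, rater2)) / len(rater1)


def cohen_kappa(rater1, rater2):
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(rater1)
    p_o = percent_agreement(rater1, rater2)
    c1, c2 = Counter(rater1), Counter(rater2)
    # Agreement expected by chance, from the raters' marginal frequencies.
    p_e = sum(c1[cat] * c2[cat] for cat in set(c1) | set(c2)) / n ** 2
    if p_e == 1.0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return (p_o - p_e) / (1.0 - p_e)


def interpret_kappa(kappa):
    """Verbal labels for the bands quoted in the text (boundary handling assumed)."""
    if kappa >= 0.81:
        return "very good"
    if kappa >= 0.61:
        return "good"
    if kappa >= 0.41:
        return "moderate"
    if kappa >= 0.21:
        return "fair"
    return "poor"


# Made-up ratings of a yes/no variable in 100 masses:
# 20 yes/yes, 5 yes/no, 5 no/yes and 70 no/no.
rater_1 = ["yes"] * 25 + ["no"] * 75
rater_2 = ["yes"] * 20 + ["no"] * 5 + ["yes"] * 5 + ["no"] * 70
kappa = cohen_kappa(rater_1, rater_2)
print(percent_agreement(rater_1, rater_2), round(kappa, 2), interpret_kappa(kappa))
```

In this example the raters agree in 90% of cases, but because most masses are rated "no" by both, a large part of that agreement is expected by chance and kappa is correspondingly lower than the raw agreement.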
Inter-observer reproducibility of measurement results, including the calculated risks of
malignancy using LR1 and LR2, was described as the difference between two measurement
results. The differences between the measured values were plotted against the mean of the two
measurements (Bland-Altman plots) to assess the relationship between the differences and the
magnitude of the measurements (20). Systematic bias between two measurements was
estimated by calculating the 95% confidence interval (CI) of the mean difference (mean
difference ± 2 SE). If zero lay within this interval, no bias was assumed to exist between the
two measurements. Inter-observer agreement was expressed as the mean difference and limits
of agreement (20). Ninety-five percent of differences between any future measurements are
estimated to fall between the lower and upper limit of agreement. Inter-observer reliability of
measurement results was estimated by calculating the intra-class correlation coefficient
(ICC) using analysis of variance (two-way random model, absolute agreement; this allows
generalization of the results to a population of observers). The ICC indicates the proportion of
the total variance in measurement results that can be explained by differences between the
individuals examined. It depends both on the magnitude of measurement errors and the true
heterogeneity in the population in which measurements are made. The more variable the
population investigated, the greater the ICC, and the less variable the population, the smaller
the ICC (21). It has been suggested that ICC values >0.90 are needed for a test to be used in
clinical practice (22).
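The quantities described in this paragraph can be sketched in a few lines of Python. The code below is a minimal illustration, not the SPSS procedure used in the study. The function names and the example measurements are hypothetical. Two choices are assumptions: the limits of agreement are computed as mean difference ± 2 SD (Bland and Altman describe both 2 and 1.96 as multipliers), and the ICC is the single-measure form of the two-way random, absolute-agreement model, since the text does not state whether single or average measures were used.

```python
import math
from statistics import mean, stdev


def bland_altman(obs1, obs2, k_loa=2.0):
    """Mean difference, CI as mean +/- 2 SE, and limits of agreement."""
    diffs = [a - b for a, b in zip(obs1, obs2)]  # observer 1 minus observer 2
    d_mean = mean(diffs)
    d_sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    se = d_sd / math.sqrt(len(diffs))
    return {
        "mean_difference": d_mean,
        "ci95": (d_mean - 2 * se, d_mean + 2 * se),
        "limits_of_agreement": (d_mean - k_loa * d_sd, d_mean + k_loa * d_sd),
    }


def icc_absolute_agreement(obs1, obs2):
    """Single-measure ICC, two-way random effects, absolute agreement."""
    n, k = len(obs1), 2  # n subjects, k observers
    rows = list(zip(obs1, obs2))
    grand = (sum(obs1) + sum(obs2)) / (n * k)
    ss_rows = k * sum((mean(r) - grand) ** 2 for r in rows)          # between subjects
    ss_cols = n * sum((mean(c) - grand) ** 2 for c in (obs1, obs2))  # between observers
    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Made-up diameters in mm, for illustration only.
sonologist_1 = [10, 20, 30, 40]
sonologist_2 = [8, 21, 27, 40]
result = bland_altman(sonologist_1, sonologist_2)
no_bias = result["ci95"][0] <= 0 <= result["ci95"][1]  # zero inside the interval
print(result, no_bias, icc_absolute_agreement(sonologist_1, sonologist_2))
```

Because the absolute-agreement form includes the between-observer variance in the denominator, a constant offset between two observers lowers the ICC even when their measurements are perfectly correlated.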
The sensitivity and specificity of LR1 and LR2 with regard to malignancy were calculated
using the information of sonologist 1 and that of sonologist 2.
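A minimal Python sketch of this calculation follows, assuming the 10% risk cutoff given in Table 1 and treating borderline tumors as malignant, as stated above. The function name and the example values are hypothetical and do not reproduce study data.

```python
def sensitivity_specificity(risks, malignant, cutoff=0.10):
    """Sensitivity and specificity when a risk above the cutoff is a positive test."""
    calls = [r > cutoff for r in risks]
    tp = sum(c and m for c, m in zip(calls, malignant))
    fn = sum((not c) and m for c, m in zip(calls, malignant))
    tn = sum((not c) and (not m) for c, m in zip(calls, malignant))
    fp = sum(c and (not m) for c, m in zip(calls, malignant))
    return tp / (tp + fn), tn / (tn + fp)


# Made-up calculated risks from one sonologist and the histological outcome
# (True = borderline or invasively malignant).
risks = [0.02, 0.08, 0.15, 0.60, 0.95, 0.05]
malignant = [False, False, False, True, True, True]
sens, spec = sensitivity_specificity(risks, malignant)
print(round(sens, 2), round(spec, 2))
```

Running the same function once per sonologist gives one sensitivity and specificity pair per observer for each model.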
Results
In all, 117 consecutive women with adnexal masses who underwent surgery were
examined with ultrasound by the two sonologists as described above. Thirty-four women had
bilateral adnexal masses. The most complex mass, or the largest one if both masses had
similar ultrasound morphology, was used in our statistical analysis, the mass to be included
being selected retrospectively to ensure that both sonologists contributed the same mass (right
or left) to the analysis. Thus, 117 adnexal masses from 117 patients constitute our study
population. The women's age ranged between 14 and 88 years (median 53), and 63 (54%)
women were postmenopausal. There were 94 benign, four borderline and 19 invasively
malignant adnexal masses (Table 2).
The time elapsed between the ultrasound examinations of sonologists 1 and 2 was median 61
days (10th and 90th percentiles 13 and 132, range 1-204) for the tumors with benign histology
and median 14 days (10th and 90th percentiles 2 and 31, range 1-41) for the tumors with
malignant histology. There was no relationship between the number of days between the
scans and the differences in measurement results or inter-observer agreement for discrete
variables (Supplementary Fig. S1-S5 and Supplementary Table S1).
Inter-observer reproducibility of measurement results is shown in Table 3. Bland-Altman
plots showed no clear trend for inter-observer differences in measurement results to change
with the magnitude of the measurement values. Limits of agreement were wide for all
measurements. There was one systematic difference between the two sonologists, sonologist 1
(who always performed the first examination) obtaining higher measurement values for the
maximum diameter of the mass. The least reliable measurement was the height of the largest
papillary projection.
Inter-observer agreement when assessing categorical ultrasound variables is shown in Table
4. For most categorical ultrasound variables, inter-observer agreement beyond chance was good
or very good (19). Inter-observer agreement beyond chance for variables included in LR1 or
LR2 was poorest for color score (agreement 40%, weighted kappa 0.36), presence of blood
flow in papillary projection (agreement 90%, kappa 0.48), irregular cyst wall (agreement 79%,
kappa 0.56) and acoustic shadowing (agreement 85%, kappa 0.58).
Bland-Altman plots illustrating the relationship between the magnitudes of the estimated
risk of malignancy calculated using LR1 and LR2 and the interobserver difference in
calculated risk are shown in Figure 1. The plots manifest a diamond shape, i.e., the
interobserver differences are smallest for the lowest and highest risks, and they are very small
for risks <25 and >95. Logarithmic transformation of the data (20) did not substantially
change the shape of the scatter plot. Therefore, we present our results as absolute
inter-observer differences in calculated risk (in percentage units); see Table 5. There were no
systematic differences in calculated risks between the two sonologists, and reliability,
reflected by the ICC values, was good (22), with ICC values of 0.911 for LR1 and 0.832 for
LR2. When classifying tumors as having a risk of malignancy <10% (benign) or >10%
(malignant) using LR1 or LR2, the inter-observer agreement was good for both models:
inter-observer agreement 84% (98/117), kappa 0.68 for model LR1, and inter-observer
agreement 85% (99/117), kappa 0.68 for model LR2. In the 19 cases where the two
sonologists obtained different results with regard to malignancy when using LR1, the absolute
interobserver differences in calculated risk ranged from 0.7 to 59.6 percentage units: in six of
the 19 cases the absolute interobserver difference in calculated risk was <10.0 percentage
units, in nine cases it was 10.0–24.9 percentage units, and in four cases it was >25.0
percentage units. In the 18 cases where the two sonologists obtained different results with
regard to malignancy when using LR2, the absolute interobserver difference in calculated risk
ranged from 8.8 to 67.9 percentage units: in two of the 18 cases the absolute interobserver
difference in calculated risk was <10.0 percentage units, in ten cases it was 10.0–24.9
percentage units, and in six cases it was >25 percentage units.
The Bland-Altman plots (Figure 1) illustrate that for some tumors there were substantial
inter-observer differences in the calculated risk of malignancy: when using LR1 the
inter-observer difference in calculated risk was >25 percentage units in 11 tumors (9% of all
tumors), and when using LR2 the inter-observer difference in calculated risk was >25
percentage units in 14 tumors (12% of all tumors). To elucidate which inter-observer
differences in the ultrasound variables explained these largest inter-observer differences in
calculated risk, we scrutinized each case where the difference was >25 percentage units. The
results are shown in Supplementary Tables S2 and S3. When using LR1, a discrepancy for one
single categorical variable explained the difference in four of the 11 cases, while a discrepancy
for two categorical variables explained the difference in one case (differences in measurements
being <5 mm in these five cases). In six cases there were differences in one or two categorical
variables but also substantial differences (6-61 mm) in at least one measurement result. In no
case was the large difference in calculated risk explained exclusively by differences in
measurement results. The categorical variables judged differently by the two sonologists in
these 11 cases were color score (n = 5), irregular cyst wall (n = 5), flow in papillary projection
(n = 3), and acoustic shadowing (n = 2).
When using LR2, a discrepancy for one single categorical variable explained the large
difference in calculated risk (>25 percentage units) in eight of the 14 cases (differences in
measurements being <5 mm in these eight cases), and in four of the eight cases the sonologists
judged acoustic shadowing differently. In five cases there were differences in one categorical
variable but also a substantial difference (9-61 mm) in the measurement of the largest solid
component. In yet another case there were differences in two categorical variables as well as in
the measurement of the largest solid component. The categorical variables judged differently
by the two sonologists in these 14 cases were acoustic shadowing (n = 5), irregular cyst wall
(n = 5), ascites (n = 3), and flow in papillary projection (n = 2).
The sensitivity with regard to malignancy when using LR1 (10% risk cutoff) was 100%
(23/23; 95% CI, 82-100) for both sonologists; the specificity was 74% (70/94; 95% CI, 64-82)
for sonologist 1 and 63% (59/94; 95% CI, 53-72) for sonologist 2. The sensitivity when using
LR2 was 100% (23/23; 95% CI, 82-100) for sonologist 1 and 91% (21/23; 95% CI, 72-98) for
sonologist 2, and the specificity was 75.5% (71/94; 95% CI, 65-84) for both sonologists.
Discussion
We have shown substantial inter-observer variability in the results of measurements taken in
adnexal masses (wide limits of agreement). Inter-observer agreement beyond chance was very
good or good for most categorical variables, but it was only moderate or fair for some.
Inter-observer agreement beyond chance was poorest for variables heavily dependent on subjective
evaluation and/or machine settings, i.e., color score, presence of color Doppler signals in
papillary projections, irregular cyst walls, acoustic shadowing (all four variables being
included in LR1 or LR2), echogenicity of cyst fluid, and ovarian crescent sign. Despite this,
there was good inter-observer agreement when classifying tumors as benign or malignant using
the predetermined risk of malignancy cut-off of 10%. However, in some cases there were
substantial differences in the calculated risk of malignancy between the two sonologists, the
difference being >25.0 percentage units in 9% of all tumors when using LR1 and in 12% of all
tumors when using LR2.
The strength of our study is that it provides new information. To the best of our knowledge,
there is only one publication reporting on inter-observer agreement with regard to describing
ultrasound findings in adnexal masses using the IOTA terminology (11) when performing live
ultrasound examinations (23). However, that study (23) evaluated inter-observer agreement
with regard to the ten ultrasound features in the IOTA simple rules (24, 25), not the variables
included in the IOTA logistic regression models LR1 and LR2, and agreement was estimated
between examiners with different levels of experience. The variable with poorest agreement
beyond chance in the study cited was acoustic shadowing (Kappa 0.36). We have found no
published study that has estimated inter-observer reproducibility of the calculated risk of
malignancy using LR1 or LR2 after live scanning.
It is a limitation of our study that up to 204 days elapsed between the scans of the two
sonologists (up to 41 days for malignant masses). Because days elapsed between the scans,
theoretically the inter-observer differences could be explained by the lesions having changed
in size or morphology between the scans. We find this highly unlikely for the following
reasons. First, there was no relationship between the differences in measurement results and
the number of days between the scans (Supplementary Fig. S1-S5). Nor was there a clear
tendency for inter-observer agreement for discrete variables to depend on the time between the
scans (Supplementary Table S1). Second, one would expect a lesion and its components to
increase in size with time, but sonologist 1, performing the first scan, obtained higher
measurement values than sonologist 2. Third, it is our experience, after having performed
gynecological scans for more than 20 years, that the ultrasound morphology of both benign and
malignant adnexal masses remains constant over time, that benign adnexal lesions grow
slowly, and that malignant masses do not change appreciably in size even during 1 month of
observation. Therefore, we believe that the discrepancies between the two sonologists reflect
true inter-observer differences and not a change of the masses over time. A second limitation is
that we did not include estimation of the reproducibility of retrieving anamnestic information
(current hormonal therapy, personal history of ovarian cancer), the anamnestic information
collected by the second sonologist being used in all cases. It cannot be entirely excluded that
patients would answer differently when asked by different sonologists or that sonologists
could interpret the answers of the patients differently. A third limitation is that we did not
estimate intra-observer reproducibility. We considered four scans (two per sonologist) likely to
be unacceptable to patients. For the same reason only two sonologists were involved in this
study, and our results are generalizable only to sonologists with a similar level of experience.
The results of this live scanning study are similar to those of another study in which the
same sonologists assessed the same variables using 3D ultrasound volumes from adnexal
masses in another tumor population (15). The similarity in results between the two studies is
surprising, because the conditions when assessing 3D ultrasound volumes are different from
those during a live scan. When evaluating ultrasound volumes, sonologists are exposed to the
same ultrasound images, and so any inter-observer difference should be explained exclusively
by differences in interpreting the ultrasound information. During a live scan there are more
sources of bias. This could result in either poorer or better inter-observer agreement than when
3D ultrasound volumes are assessed: poorer because ultrasound examiners are likely to use
different machine settings and scanning conditions may change from one minute to another,
better because the dynamic nature of live scanning facilitates discrimination between solid
components and amorphous tissue.
Our results showed that two experienced sonologists agreed quite well in their classification
of masses as benign or malignant using the 10% risk of malignancy cutoff of LR1 and LR2,
and that the diagnostic performance of LR1 and LR2 with regard to discrimination between
benign and malignant tumors was similar for the two sonologists and similar to that reported
by others (14, 26-28). This is reassuring, because the main purpose of using models LR1 and
LR2 is to classify tumors as benign or malignant. Potentially, however, LR1 and LR2 can be
used not only to classify adnexal masses as benign or malignant but also to counsel a patient
about her individual risk of malignancy (13). If the calculated risk is to be used for individual
counseling, one must be reasonably certain not only that the estimated risk agrees well with the
true risk (when externally validated, both LR1 and LR2 underestimated the true risk, especially
in the risk interval 30%-70% (14)) but also that the risk estimates are reproducible, i.e., that
different examiners will obtain similar risk estimates. Our results show that risk estimates may
differ substantially between experienced observers, the difference in estimated risk being >25.0
percentage units in 9% and 12% of cases when using LR1 and LR2, respectively. Inter-observer
agreement beyond chance was poorest for those variables in the models that are heavily
dependent on subjective evaluation, i.e., color score, presence of color Doppler signals in
papillary projections, irregular cyst walls, and acoustic shadowing. Indeed, differences in these
explained most of the largest inter-observer differences in calculated risk of malignancy. In
models based on few variables, changing the value of only one variable may result in large
differences in predicted risks, while a model with many variables is less vulnerable to a change
in one or even a few variables. Our results illustrate this (Supplementary Tables S2 and S3).
When using LR2 (which includes six variables), a change in value for one single categorical
variable explained an inter-observer difference in calculated risk >25 percentage units in eight
of 14 cases, while when using LR1 (which includes 12 variables), a change in value for one
single categorical variable explained an inter-observer difference in calculated risk >25
percentage units in only four of 11 cases. Acoustic shadowing is a strong variable in both LR1
and LR2 and has great impact on the calculated risk in LR2, with only six variables. In our
hands, as well as in those of Ruiz de Gauna et al. (23), inter-observer agreement for acoustic
shadowing was at most moderate. The inter-observer agreement for color score was only fair in
our study, and color score is an important variable in LR1.
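The impact of a single discordant variable can be illustrated with the published coefficients for acoustic shadowing (variable k in the footnote to Table 1). The following worked calculation is an illustration only and not part of the original analysis; the starting risk of 50% is a hypothetical value chosen for the example, and all other variables are assumed to be judged identically by both observers.

```latex
% Logistic model: risk p and odds as functions of the linear predictor z
\[
p = \frac{1}{1 + e^{-z}}, \qquad \frac{p}{1 - p} = e^{z}
\]
% Recording acoustic shadows (k = 1 instead of k = 0) adds \beta_k to z,
% i.e. multiplies the odds by e^{\beta_k}
\[
\mathrm{LR2:}\; e^{\beta_k} = e^{-2.9486} \approx 0.052, \qquad
\mathrm{LR1:}\; e^{\beta_k} = e^{-2.3550} \approx 0.095
\]
% Hypothetical starting risk p = 0.50 (odds = 1) without shadowing
\[
p'_{\mathrm{LR2}} = \frac{0.052}{1 + 0.052} \approx 0.05, \qquad
p'_{\mathrm{LR1}} = \frac{0.095}{1 + 0.095} \approx 0.09
\]
```

Under these assumptions, disagreement on acoustic shadowing alone would shift the calculated risk from 50% to roughly 5% with LR2 and to roughly 9% with LR1, i.e., by more than 25 percentage units in either model, with the larger shift in LR2.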
To improve inter-observer reproducibility of calculated risks based on LR1 and LR2,
inter-observer differences in descriptions and measurements of adnexal masses using the IOTA
terminology and measurement technique need to be reduced. One way to achieve this could be by
providing courses on, and training in, how to examine and describe adnexal masses using the
IOTA terms. Interactive courses in which a large number of ultrasound images are discussed with
the course participants are likely to be very valuable in this respect. More precise definitions of
the IOTA terms, for example by providing ample imaging material, would probably also help
improve inter-observer agreement. Special attention should be given to the variables with poorest
reproducibility, i.e., the color score, wall irregularity, acoustic shadowing, and detection of blood
flow in papillary projections. Until better inter-observer agreement in the calculated risk of
malignancy using LR1 and LR2 has been shown, one should be cautious about using the risk
estimates for individual patient counselling.
Acknowledgements
None
References
1. Granberg S, Norström A, Wikland M. Tumors in the lower pelvis as imaged by vaginal
sonography. Gynecol Oncol 1990;37:224-9.
2. Benacerraf BR, Finkler NJ, Wojciechowski C, Knapp RC. Sonographic accuracy in the
diagnosis of ovarian masses. J Reprod Med 1990;35:491-5.
3. Valentin L. Pattern recognition of pelvic masses by gray-scale ultrasound imaging: the
contribution of Doppler ultrasound. Ultrasound Obstet Gynecol 1999;14:338-47.
4. Valentin L. Prospective cross-validation of Doppler ultrasound examination and
gray-scale ultrasound imaging for discrimination of benign and malignant pelvic masses.
Ultrasound Obstet Gynecol 1999;14:273-83.
5. Timmerman D, Schwärzler P, Collins WP, Claerhout F, Coenen M, Amant F, et al.
Subjective assessment of adnexal masses with the use of ultrasonography: an analysis
of interobserver variability and experience. Ultrasound Obstet Gynecol 1999;13:11-6.
6. Sokalska A, Timmerman D, Testa AC, Van Holsbeke C, Lissoni AA, Leone FPG, et al.
Diagnostic accuracy of transvaginal ultrasound examination for assigning a specific
diagnosis to adnexal masses. Ultrasound Obstet Gynecol 2009;34:462-70.
7. Valentin L. Use of morphology to characterize and manage common adnexal masses.
Best Pract Res Clin Obstet Gynaecol 2004;18:71-89.
8. Tailor A, Jurkovic D, Bourne TH, Collins WP, Campbell S. Sonographic prediction of
malignancy in adnexal masses using multivariate logistic regression analysis.
Ultrasound Obstet Gynecol 1997;10:41-7.
9. Timmerman D, Bourne TH, Tailor A, Collins WP, Verrelst H, Vandenberghe K, et al.
A comparison of methods for the preoperative discrimination between benign and
malignant adnexal masses: the development of a new logistic regression model. Am J
Obstet Gynecol 1999;181:57-65.
10. Alcazar JL, Jurado M. Prospective evaluation of logistic model based on sonographic
morphologic and color Doppler findings developed to predict adnexal malignancy. J
Ultrasound Med 1999;18:837-42.
11. Timmerman D, Valentin L, Bourne TH, Collins WP, Verrelst H, Vergote I. Terms,
definitions and measurements to describe the sonographic features of adnexal tumors: a
consensus opinion from the International Ovarian Tumor Analysis (IOTA) Group.
Ultrasound Obstet Gynecol 2000;16:500-5.
12. Timmerman D, Testa AC, Bourne T, Ferrazzi E, Ameye L, Konstantinovic ML, et al.
Logistic regression model to distinguish between the benign and malignant adnexal
mass before surgery: a multicenter study by the International Ovarian Tumor Analysis
Group. J Clin Oncol 2005;23(34):8794-801.
13. Kaijser J, Bourne T, Valentin L, Sayasneh A, Van Holsbeke C, Vergote I, et al.
Improving strategies for diagnosing ovarian cancer: a summary of the International
Ovarian Tumor Analysis (IOTA) studies. Ultrasound Obstet Gynecol 2013;41:9-20.
14. Timmerman D, Van Calster B, Testa AC, Guerriero S, Fischerova D, Lissoni AA, et al.
Ovarian cancer prediction in adnexal masses using ultrasound-based logistic regression
models: a temporal and external validation study by the IOTA group. Ultrasound Obstet
Gynecol 2010;36:226-34.
15. Sladkevicius P, Valentin L. Intra- and inter-observer agreement when describing
adnexal masses using the International Ovarian Tumour Analysis (IOTA) terms and
definitions: a study on three-dimensional (3D) ultrasound volumes. Ultrasound Obstet
Gynecol 2013;41:318-27.
16. Heintz APM, Odicino F, Maisonneuve P, Beller U, Benedet JL, Creasman WT, et al.
Carcinoma of the Ovary. 25th Annual Report on the Results of Treatment in
Gynecological Cancer. Int J Gynecol Obstet 2003;83 (suppl 1):S135-S166.
17. Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas
1960;20:37–46.
18. Kundel HL, Polansky M. Measurement of observer agreement. Radiology
2003;228:303-8.
19. Brennan P, Silman A. Statistical methods for assessing observer variability in clinical
measures. BMJ 1992;304:1491-4.
20. Bland JM, Altman DG. Statistical methods for assessing agreement between two
methods of clinical measurement. Lancet 1986;1:307-10.
21. Bartlett JW, Frost C. Reliability, repeatability and reproducibility: analysis of
measurement errors in continuous variables. Ultrasound Obstet Gynecol
2008;31:466-75.
22. Kottner J, Audigé L, Brorson S, Donner A, Gajewski BJ, Hróbjartsson A, et al.
Guidelines for Reporting Reliability and Agreement Studies (GRRAS) were proposed. J
Clin Epidemiol 2011;64:96-106.
23. Ruiz de Gauna B, Sanchez P, Pineda L, Utrilla-Layna J, Juez L, Alcázar JL.
Inter-observer agreement with regard to describing adnexal masses using the IOTA simple
rules in a real-time setting and when using three-dimensional ultrasound volumes and
digital clips. Ultrasound Obstet Gynecol 2014;44:95-100.
24. Timmerman D, Testa AC, Bourne T, Ameye L, Jurkovic D, Van Holsbeke C, et al.
Simple ultrasound-based rules for the diagnosis of ovarian cancer. Ultrasound Obstet
Gynecol 2008;31:681-90.
25. Timmerman D, Ameye L, Fischerova D, Epstein E, Melis GB, Guerriero S, et al.
Simple ultrasound rules to distinguish between benign and malignant adnexal masses
before surgery: prospective validation by IOTA group. BMJ 2010;341:c6839.
26. Van Holsbeke C, Van Calster B, Testa AC, Domali E, Lu C, Van Huffel S, et al.
Prospective internal validation of mathematical models to predict malignancy in
adnexal masses: results from the international ovarian tumor analysis study. Clin Cancer
Res 2009;15:684-91.
27. Nunes N, Yazbek J, Ambler G, Hoo W, Naftalin J, Jurkovic D. Prospective evaluation
of the IOTA logistic regression model LR2 for the diagnosis of ovarian cancer.
Ultrasound Obstet Gynecol 2012;40:355-9.
28. Nunes N, Ambler G, Hoo WL, Naftalin J, Foo X, Widschwendter M, et al.
A prospective validation of the IOTA logistic regression models (LR1 and LR2) in
comparison to subjective pattern recognition for the diagnosis of ovarian cancer.
Int J Gynecol Cancer 2013;23:1583-9.
Table 1. Variables for which inter-observer reproducibility was estimated
___________________________________________________________________________
Variables included in logistic regression models LR1 and LR2 or of importance for these models
  Maximum diameter of adnexal mass (mm)
  Maximum diameter of largest solid component (mm)
  Maximum diameter of largest solid component: ≤50, >50 mm
  Presence of an entirely solid adnexal mass: yes, no
  Papillary projections present: yes, no
  Irregular internal cyst walls: yes, no
  Acoustic shadows: yes, no
  Color Doppler signals in papillary projection: yes, no
  Color score: 1, 2, 3, 4
  Tenderness of adnexal mass at scan: yes, no
  Ascites: yes, no
Other IOTA variables
  Continuous IOTA variables
    Mean diameter of adnexal mass (mm)
    Mean diameter of largest solid component (mm)
    Height of the largest papillary projection (mm)
  Categorical IOTA variables
    Type of tumor: unilocular, unilocular solid, multilocular, multilocular solid, solid
    Mean diameter of adnexal mass: ≤40, 41-60, 61-80, 81-100, and >100 mm
    Number of cyst locules: 0, 1, 2, 3, 4, 5, 6-10, 11-20, and >20; ≤10, >10 †
    Septum present: yes, no
    Incomplete septum present: yes, no
    Solid component: yes, no
    Papillary projections: 0, 1, 2, 3, ≥4; 1, 2, 3, ≥4 ‡; <4, >4 †
    Echogenicity of cyst fluid: anechoic, low level, ground glass, mixed, no cyst fluid
    Color Doppler signals detectable: yes, no
    Ovarian crescent sign: yes, no
Diagnosis based on LR1 and LR2
  Calculated risk of malignancy using LR1: ≤10%, >10%
  Calculated risk of malignancy using LR2: ≤10%, >10%
___________________________________________________________________________
IOTA, International Ovarian Tumor Analysis group; LR1, logistic regression model 1; LR2, logistic regression model 2.
The risk of malignancy using LR1 is derived as $y = 1/(1 + e^{-z})$, where
$$z = -6.7468 + 1.5985\,(a) - 0.9983\,(b) + 0.0326\,(c) + 0.00841\,(d) - 0.8577\,(e) + 1.5513\,(f) + 1.1737\,(g) + 0.9281\,(h) + 0.0496\,(i) + 1.1421\,(j) - 2.3550\,(k) + 0.4916\,(l)$$
and $e$ is the mathematical constant and base value of natural logarithms. The 12 variables included in LR1 are: (a) personal history of ovarian cancer (yes = 1, no = 0); (b) current hormonal therapy (yes = 1, no = 0); (c) age of the patient (in years); (d) maximum diameter of the lesion (in millimeters); (e) mass tender at scan (yes = 1, no = 0); (f) the presence of ascites (yes = 1, no = 0); (g) the presence of blood flow within a solid papillary projection (yes = 1, no = 0); (h) the presence of a purely solid tumor (yes = 1, no = 0); (i) maximal diameter of the solid component (expressed in millimeters, but with no increase > 50 mm); (j) irregular internal cyst walls (yes = 1, no = 0); (k) the presence of acoustic shadows (yes = 1, no = 0); and (l) color score (1, 2, 3, or 4).
The risk of malignancy using LR2 is derived as $y = 1/(1 + e^{-z})$, where
$$z = -5.3718 + 0.0354\,(c) + 1.6159\,(f) + 1.1768\,(g) + 0.0697\,(i) + 0.9586\,(j) - 2.9486\,(k)$$
(12). A calculated risk of malignancy of more than 10% classified the adnexal mass as malignant.
† includes 0. ‡ includes only cases where papillary projections were registered at both examinations.
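The two formulas in the footnote to Table 1 can be sketched in code as follows. This is a minimal illustration under stated assumptions, not the authors' or the IOTA group's software: the function and parameter names are hypothetical, the coefficients are copied from the footnote, and the "no increase > 50 mm" rule for the solid component is implemented as a simple cap at 50 mm. The patient values in the usage lines are invented.

```python
import math


def _logistic(z):
    """Convert the linear predictor z into a probability y = 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))


def lr1_risk(history_ovarian_cancer, hormonal_therapy, age_years,
             max_lesion_diameter_mm, tender_at_scan, ascites, papillary_flow,
             purely_solid, max_solid_diameter_mm, irregular_walls,
             acoustic_shadows, color_score):
    """Risk of malignancy (0-1) from LR1; binary inputs are True/False or 1/0."""
    solid = min(max_solid_diameter_mm, 50)  # assumed reading of "no increase > 50 mm"
    z = (-6.7468
         + 1.5985 * history_ovarian_cancer   # (a)
         - 0.9983 * hormonal_therapy         # (b)
         + 0.0326 * age_years                # (c)
         + 0.00841 * max_lesion_diameter_mm  # (d)
         - 0.8577 * tender_at_scan           # (e)
         + 1.5513 * ascites                  # (f)
         + 1.1737 * papillary_flow           # (g)
         + 0.9281 * purely_solid             # (h)
         + 0.0496 * solid                    # (i)
         + 1.1421 * irregular_walls          # (j)
         - 2.3550 * acoustic_shadows         # (k)
         + 0.4916 * color_score)             # (l)
    return _logistic(z)


def lr2_risk(age_years, ascites, papillary_flow, max_solid_diameter_mm,
             irregular_walls, acoustic_shadows):
    """Risk of malignancy (0-1) from LR2, which uses variables c, f, g, i, j, k."""
    solid = min(max_solid_diameter_mm, 50)
    z = (-5.3718
         + 0.0354 * age_years
         + 1.6159 * ascites
         + 1.1768 * papillary_flow
         + 0.0697 * solid
         + 0.9586 * irregular_walls
         - 2.9486 * acoustic_shadows)
    return _logistic(z)


def classify(risk, cutoff=0.10):
    """Apply the 10% cutoff: a risk of more than 10% is classified as malignant."""
    return "malignant" if risk > cutoff else "benign"


# Invented example: 50-year-old, 60 mm cyst, no solid component, color score 1.
r1 = lr1_risk(False, False, 50, 60, False, False, False, False, 0, False, False, 1)
r2 = lr2_risk(50, False, False, 0, False, False)
print(f"LR1 {r1:.1%} ({classify(r1)}), LR2 {r2:.1%} ({classify(r2)})")
```

For this invented case both models return a risk of roughly 2% to 3%, well below the 10% cutoff. Re-running the functions with one binary input changed (for example `acoustic_shadows` or `irregular_walls`) gives a quick feel for how strongly a single discordant observation moves the calculated risk.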
Table 2. Histological diagnoses of the masses
___________________________________________________________________________
Histological diagnosis of tumor              Number
___________________________________________________________________________
Benign tumors                                94
  Benign simple cyst                         7
  Endometrioma                               10
  Dermoid cyst                               16
  Serous cystadenoma                         16
  Mucinous cystadenoma                       18
  Myoma/fibroma                              9
  Cystadenofibroma                           11
  Paraovarian cyst                           5
  Sactosalpinx, chronic salpingitis          1
  Leydig cell tumor                          1
Borderline tumors                            4
  Serous                                     2
  Mucinous                                   1
  Endometrioid                               1
Invasive malignancy                          19
  Primary ovarian adenocarcinoma             13
  Granulosa cell tumor                       3
  Dysgerminoma                               1
  Leiomyosarcoma                             1
  Malignant aggressive B-cell lymphoma       1
___________________________________________________________________________
Table 3. Inter-observer reproducibility of continuous variables used in the risk calculation models LR1 and LR2 and of other continuous variables
used to describe adnexal masses
Median and range: measurement results (both sonologists). Mean, 95% CI, and limits of agreement: difference in mm between two measurements
made by sonologists 1 and 2 (a).

Parameter                                               Median (n)     Range     Mean (n)        95% CI           Limits of agreement   Intra-CC, point estimate (95% CI)
Variables used in models LR1 and LR2
  Maximum diameter of adnexal mass, mm                  70 (n=234)     10–313    3.80 (n=117)    1.12 to 6.48     -25.24 to 32.82       0.958 (0.937–0.971)
  Maximum diameter of largest solid component, mm (b)   29.50 (n=122)  5–180     1.92 (n=61)     -1.74 to 5.58    -26.66 to 30.50       0.942 (0.905–0.964)
Other variables used to describe adnexal mass
  Mean diameter of adnexal mass, mm                     58.5 (n=234)   9–240     1.05 (n=117)    -0.15 to 1.95    -8.61 to 10.72        0.971 (0.958–0.980)
  Mean diameter of largest solid component, mm (b)      22 (n=122)     4–156     0.59 (n=61)     -1.82 to 2.98    -18.16 to 19.32       0.962 (0.937–0.977)
  Height of largest papillary projection, mm (c)        8 (n=42)       3–25      -0.51 (n=21)    -2.93 to 1.91    -11.61 to 10.59       0.609 (0.245–0.821)

(a) The measurements made by sonologist 2 were subtracted from the measurements made by sonologist 1.
CI, confidence interval; Intra-CC, intra-class correlation coefficient.
(b) Includes only cases where solid components were registered by both observers.
(c) Includes only cases where papillary projections were registered by both observers. Using logarithmically transformed data, the results were as
follows: mean difference 1.03 (95% CI, 0.78–1.36), limits of agreement 0.31–3.61, Intra-CC 0.547 (0.153–0.789). These results can be used for
comparison with our published results on inter-observer agreement when analyzing ultrasound volumes obtained by three-dimensional
ultrasound (15).
Table 4. Inter-observer agreement for categorical variables included in the risk calculation models LR1 and LR2 and for other categorical variables used to describe adnexal masses

Variable                                                                                 Agreement        Kappa value
Variables used in models LR1 and LR2
  Presence of entirely solid tumor (yes, no)                                             97% (114/117)    0.91
  Maximum diameter of largest solid component (≤50 mm, >50 mm)                           95% (111/117)    0.83
  Maximum diameter of largest solid component (≤50 mm, >50 mm) (a)                       92% (56/61)      0.82
  Tenderness of adnexal mass (yes, no)                                                   97% (113/117)    0.78
  Ascites (yes, no)                                                                      97% (113/117)    0.73
  Papillary projections (yes, no)                                                        87% (102/117)    0.65
  Solid component (yes, no)                                                              84% (98/117)     0.67
  Irregular cyst wall (yes, no)                                                          79% (92/117)     0.56
  Acoustic shadows (yes, no)                                                             85% (99/117)     0.58
  Color Doppler signals in papillary projections (yes, no) (b)                           90% (105/117)    0.48
  Color Doppler signals in papillary projections (yes, no) (c)                           86% (18/21)      0.71
  Color score (1, 2, 3, 4)                                                               40% (47/117)     0.36 (d)
Other variables used to describe adnexal masses
  Tumor type (unilocular, unilocular solid, multilocular, multilocular solid, solid)     97% (113/117)    0.70
  Mean diameter of tumor (mm): ≤40, 41-60, 61-80, 81-100, >100                           66% (77/117)     0.76 (d)
  Number of locules: 0, 1, 2, 3, 4, 5, 6-10, 11-20, >20                                  58% (68/117)     0.79 (d)
  Number of locules: ≤10, >10 (e)                                                        94% (110/117)    0.80
  Septum (yes, no)                                                                       88% (103/117)    0.76
  Incomplete septum (yes, no)                                                            98% (115/117)    ¶
  Number of papillary projections: 0, 1, 2, 3, ≥4                                        81% (95/117)     0.64 (d)
  Number of papillary projections: 1, 2, 3, ≥4 (c)                                       67% (14/21)      0.63 (d)
  Number of papillary projections: <4, >4 (e)                                            96% (112/117)    0.68
  Echogenicity of cyst fluid (anechoic, low level, ground glass, mixed, no fluid)        67% (78/117)     0.56
  Color Doppler signals detectable (yes, no)                                             84% (98/117)     0.30
  Ovarian crescent sign (yes, no)                                                        78% (91/117)     0.49
  Fluid in Douglas pouch (yes, no)                                                       87% (102/117)    0.72
____________________________________________________________________________________________
(a) Includes only cases where solid components were registered at both examinations.
(b) This variable reflects disagreement both with regard to the presence of papillary projections and the presence of blood flow in papillary projection (i.e., if one observer recorded the presence of papillary projections and the other did not, there was disagreement with regard to this variable, and if both observers recorded the presence of papillary projections but disagreed with regard to the presence of blood flow in the papillary projections, there was disagreement with regard to this variable).
(c) Includes only cases where papillary projections were registered at both examinations.
(d) Weighted Kappa index.
(e) Includes 0.
¶ Absent Kappa values are explained by asymmetric field tables, making it impossible to calculate them.
Table 5. Inter-observer reproducibility of the risk of malignancy calculated using the risk calculation models LR1 and LR2
Median and range: calculated risk of malignancy (both sonologists). Mean, 95% CI, and limits of agreement: difference between the risk
calculated by sonologist 1 and 2.

Parameter                                  Median (n)     Range        Mean (n)         95% CI           Limits of agreement   Intra-CC, point estimate (95% CI)
Risk of malignancy calculated using LR1    7.85 (n=234)   0.10–99.10   -0.53 (n=117)    -3.07 to 2.01    -28.05 to 26.99       0.911 (0.874–0.937)
Risk of malignancy calculated using LR2    6.65 (n=234)   0.10–98.40   0.02 (n=117)     -3.06 to 3.10    -33.22 to 33.26       0.832 (0.766–0.880)

CI, confidence interval; Intra-CC, intra-class correlation coefficient.
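The columns of Table 5 follow the Bland-Altman approach (20). The sketch below shows one common way to obtain a mean difference, its 95% CI, and the limits of agreement from a list of paired differences. It is an illustration under assumptions, not the authors' code: the function names are hypothetical, the multiplier 1.96 (normal approximation) is assumed because the text does not state which multiplier was used, and the short list of differences is invented.

```python
import math
import statistics


def bland_altman(differences, z=1.96):
    """Mean difference, its 95% CI, and limits of agreement for paired differences."""
    n = len(differences)
    mean_diff = statistics.mean(differences)
    sd = statistics.stdev(differences)      # sample standard deviation
    half_ci = z * sd / math.sqrt(n)         # half-width of the CI of the mean
    return {
        "mean": mean_diff,
        "ci": (mean_diff - half_ci, mean_diff + half_ci),
        "loa": (mean_diff - z * sd, mean_diff + z * sd),
    }


def sd_from_loa(lower, upper, z=1.96):
    """Back-calculate the SD of the differences from reported limits of agreement."""
    return (upper - lower) / (2 * z)


# Invented differences (observer 1 minus observer 2), in percentage units.
print(bland_altman([2, -1, 0, 3, -4]))

# Consistency check against the LR1 row of Table 5 (n = 117).
sd_lr1 = sd_from_loa(-28.05, 26.99)
half_ci_lr1 = 1.96 * sd_lr1 / math.sqrt(117)
print(round(sd_lr1, 2), round(half_ci_lr1, 2))
```

Under the assumed multiplier, the limits of agreement reported for LR1 imply an SD of the differences of about 14 percentage units and a CI half-width of about 2.5, which is consistent with the reported 95% CI of -3.07 to 2.01 around the mean difference of -0.53.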
Legends for figure
Figure 1. a) Scatterplot showing the relationship between inter-observer difference (observer
1 minus observer 2) in calculated risk and magnitude of calculated risk when using logistic
regression model LR1. The plot manifests a diamond shape, the differences being smallest for
the lowest and highest risks. For risks <2.5% and >95% the differences are very small.
LOA, limits of agreement. b) Scatterplot showing the relationship between inter-observer
difference in calculated risk and magnitude of calculated risk when using logistic regression
model LR2. The plot manifests a diamond shape, the differences being smallest for the lowest
and highest risks.
Inter-observer agreement in describing the ultrasound appearance of adnexal masses and in
calculating the risk of malignancy using logistic regression models
Povilas Sladkevicius and Lil Valentin
Clin Cancer Res. Published OnlineFirst November 25, 2014.
Updated version: Access the most recent version of this article at doi:10.1158/1078-0432.CCR-14-0906
Supplementary Material: Access the most recent supplemental material at
http://clincancerres.aacrjournals.org/content/suppl/2014/11/26/1078-0432.CCR-14-0906.DC1
The aims of this study were 1) to estimate interobserver agreement in describing adnexal
masses using the IOTA terminology when live ultrasound scans are performed, 2) to estimate
interobserver agreement in the risk of malignancy calculated using the IOTA logistic regression
models LR1 and LR2, and 3) to elucidate what explains large interobserver differences in
calculated risk of malignancy.
Materials and methods
The Ethics Committee of Lund University approved the study protocol. Informed consent
was obtained from all participants, or the participant's guardian, after the nature of the
procedures had been fully explained.
This is a prospective observational study of real-time live ultrasound examinations of
adnexal masses. Consecutive patients referred for an ultrasound examination and found to have
an adnexal mass judged to need surgical removal were scanned according to the research
protocol by sonologist 1 (PS) as part of the clinical ultrasound examination. A second
ultrasound examination was carried out before surgery by sonologist 2 (LV). Both examiners
used the standardized IOTA examination and measurement technique and the IOTA
terminology (11) to describe their ultrasound findings and noted their results in a dedicated
paper form. Sonologist 2 was blinded to the results of sonologist 1. Information on the clinical
variables included in LR1 and LR2 (personal history of ovarian cancer, current hormonal
therapy, age of the patient) was obtained at the preoperative ultrasound examination by
sonologist 2. All patients were operated on within 90 days after the preoperative ultrasound
examination performed by sonologist 2. The excised tissues underwent histological
examination, and tumors were classified according to the criteria recommended by the
International Federation of Gynecology and Obstetrics (16). Borderline tumors were classified
as malignant.
The patients were examined in the lithotomy position with an empty urinary bladder (11).
Abdominal ultrasound examination was added when needed. The ultrasound variables
assessed with regard to interobserver reproducibility are shown in Table 1. The size of the
lesion and that of its largest solid component were measured (largest diameter and mean of
three orthogonal diameters) using calipers on the frozen ultrasound image. A color score was
assigned on the basis of subjective assessment of the color content of the tumor scan at power
Doppler ultrasound examination. A color score of 1 indicates absence of color Doppler
signals, a color score of 2 a minimal amount of color Doppler signals, a color score of 3 a
moderate amount of color Doppler signals, and a color score of 4 a large amount of color
Doppler signals in the tumor (11).
The ultrasound systems used were GE Voluson 730 Expert or GE Voluson E8 (GE
Healthcare, Zipf, Austria) with a 5-9-MHz transvaginal transducer. For power Doppler
ultrasound examinations the following settings were used: for the Voluson 730 Expert system,
frequency 6-9 ("normal") MHz, pulse repetition frequency 0.6 kHz, gain 0.8, wall motion
filter "low 1" (40 Hz); and for the Voluson E8, frequency 6-9 ("normal") MHz, pulse repetition
frequency 0.6 kHz, gain -4.0, wall motion filter "low 1" (40 Hz).
Statistical analysis
The IOTA3 study screen (astraia GmbH, Munich, Germany) was used to calculate the risk
of malignancy according to LR1. Weighted kappa indices were calculated using the statistical
program Stata, Version 10.1 for Windows (StataCorp LP, College Station, TX, USA). For all
other statistical calculations, including calculation of the risk of malignancy when using LR2,
we used the Statistical Package for the Social Sciences (SPSS program, IBM Corp., New
York, NY, USA; PASW version 18.0).
Inter-observer agreement in the assessment of categorical variables was estimated by
calculating the percentage agreement. Cohen's kappa was used to estimate by how much the
observed agreement exceeded that expected by chance (17). Weighted kappa values are
presented where appropriate (18). It has been suggested that kappa values >0.81 indicate very
good agreement beyond chance, kappa values between 0.61 and 0.80 good agreement beyond
chance, kappa values between 0.41 and 0.60 moderate agreement beyond chance, kappa values
between 0.21 and 0.40 fair agreement beyond chance, and kappa values <0.20 poor agreement
beyond chance (19).
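The calculation of percentage agreement, kappa and weighted kappa can be sketched as follows. This is a minimal illustration and not the Stata or SPSS routine used in the study: the ratings are invented color scores, linear weights are an assumption (the weighting scheme is not stated in the text), and the cut-points in `interpret` close the small gaps between the published bands.

```python
def cohen_kappa(r1, r2, categories, weights=None):
    """Cohen's kappa for two raters; weights='linear' gives a weighted kappa."""
    k = len(categories)
    idx = {c: i for i, c in enumerate(categories)}
    n = len(r1)
    # Observed joint proportions (rows: rater 1, columns: rater 2)
    obs = [[0.0] * k for _ in range(k)]
    for a, b in zip(r1, r2):
        obs[idx[a]][idx[b]] += 1.0 / n
    row = [sum(obs[i]) for i in range(k)]
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]

    def w(i, j):
        if weights == "linear":
            return 1.0 - abs(i - j) / (k - 1)
        return 1.0 if i == j else 0.0

    p_obs = sum(w(i, j) * obs[i][j] for i in range(k) for j in range(k))
    p_exp = sum(w(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    return (p_obs - p_exp) / (1.0 - p_exp)

def interpret(kappa):
    """Agreement bands as quoted in the text (boundaries simplified)."""
    if kappa > 0.80:
        return "very good"
    if kappa > 0.60:
        return "good"
    if kappa > 0.40:
        return "moderate"
    if kappa > 0.20:
        return "fair"
    return "poor"

# Hypothetical color scores (1-4) from two observers for ten masses
obs1 = [1, 1, 2, 2, 3, 3, 4, 4, 2, 3]
obs2 = [1, 2, 2, 3, 3, 3, 4, 3, 2, 4]

agreement = sum(a == b for a, b in zip(obs1, obs2)) / len(obs1)
kappa = cohen_kappa(obs1, obs2, categories=[1, 2, 3, 4])
kappa_w = cohen_kappa(obs1, obs2, categories=[1, 2, 3, 4], weights="linear")
print(f"agreement {agreement:.0%}, kappa {kappa:.2f} ({interpret(kappa)}), "
      f"weighted kappa {kappa_w:.2f} ({interpret(kappa_w)})")
```

In this invented example all disagreements are between adjacent scores, so the weighted kappa is higher than the unweighted one; for an ordinal variable such as the color score this is the reason for presenting weighted values.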
Inter-observer reproducibility of measurement results, including the calculated risks of
malignancy using LR1 and LR2, was described as the difference between two measurement
results. The differences between the measured values were plotted against the mean of the two
measurements (Bland-Altman plots) to assess the relationship between the differences and the
magnitude of the measurements (20). Systematic bias between two measurements was
estimated by calculating the 95% confidence interval (CI) of the mean difference (mean
difference ± 2 SE). If zero lay within this interval, no bias was assumed to exist between the
two measurements. Inter-observer agreement was expressed as the mean difference and limits
of agreement (20). Ninety-five percent of differences between any future measurements are
estimated to fall between the lower and upper limit of agreement. Inter-observer reliability of
measurement results was estimated by calculating the intra-class correlation coefficient
(ICC) using analysis of variance (two-way random model, absolute agreement; this allows
generalization of the results to a population of observers). The ICC indicates the proportion of
the total variance in measurement results that can be explained by differences between the
individuals examined. It depends both on the magnitude of measurement errors and the true
heterogeneity in the population in which measurements are made. The more variable the
population investigated, the greater the ICC, and the less variable the population, the smaller
the ICC (21). It has been suggested that ICC values >0.90 are needed for a test to be used in
clinical practice (22).
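The quantities described above can be sketched in a few lines. This is an illustration under stated assumptions, not the SPSS procedure used in the study: the measurements are invented, the multiplier 2 is used both for the CI of the mean difference (as stated in the text) and for the limits of agreement (an assumption), and the ICC is the single-measures form of the two-way random, absolute agreement model.

```python
import math

def bland_altman(x1, x2, k=2.0):
    """Mean difference, its CI (mean +/- k*SE) and limits of agreement (mean +/- k*SD)."""
    d = [a - b for a, b in zip(x1, x2)]
    n = len(d)
    mean = sum(d) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in d) / (n - 1))
    se = sd / math.sqrt(n)
    ci = (mean - k * se, mean + k * se)
    return {
        "mean": mean,
        "ci": ci,
        "loa": (mean - k * sd, mean + k * sd),
        # Bias is assumed only if zero lies outside the CI of the mean difference
        "bias": not (ci[0] <= 0.0 <= ci[1]),
    }

def icc_absolute_agreement(x1, x2):
    """Two-way random, absolute agreement, single-measures ICC for two observers."""
    n, k = len(x1), 2
    grand = (sum(x1) + sum(x2)) / (n * k)
    row_means = [(a + b) / k for a, b in zip(x1, x2)]
    col_means = [sum(x1) / n, sum(x2) / n]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between observers
    ss_total = sum((v - grand) ** 2 for v in list(x1) + list(x2))
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Hypothetical largest diameters (mm) measured by two observers in five masses
obs1 = [10, 20, 30, 40, 50]
obs2 = [12, 18, 33, 39, 53]

ba = bland_altman(obs1, obs2)
icc = icc_absolute_agreement(obs1, obs2)
print(ba, round(icc, 3))
```

Because the ICC divides between-subject variance by total variance, the same measurement error yields a lower ICC in a more homogeneous sample, which is the dependence on population heterogeneity noted above.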
The sensitivity and specificity with regard to malignancy of LR1 and LR2 were calculated
separately using the information of sonologist 1 and that of sonologist 2.
Results
In all, 117 consecutive women with adnexal masses who underwent surgery were
examined with ultrasound by the two sonologists as described above. Thirty-four women had
bilateral adnexal masses. The most complex mass - or the largest one if both masses had
similar ultrasound morphology - was used in our statistical analysis, the mass to be included
being selected retrospectively to ensure that both sonologists contributed the same mass (right
or left) to the analysis. Thus, 117 adnexal masses from 117 patients constitute our study
population. The women's age ranged between 14 and 88 years (median 53), and 63 (54%)
women were postmenopausal. There were 94 benign, four borderline and 19 invasively
malignant adnexal masses (Table 2).
The time elapsed between the ultrasound examinations of sonologist 1 and 2 was median 61
days (10th and 90th percentiles 13 and 132, range 1-204) for the tumors with benign histology
and median 14 days (10th and 90th percentiles 2 and 31, range 1-41) for the tumors with
malignant histology. There was no relationship between the number of days between the
scans and the differences in measurement results or inter-observer agreement for discrete
variables (Supplementary Fig. S1-S5 and Supplementary Table S1).
Inter-observer reproducibility of measurement results is shown in Table 3. Bland-Altman
plots showed no clear trend for inter-observer differences in measurement results to change
with the magnitude of the measurement values. Limits of agreement were wide for all
measurements. There was one systematic difference between the two sonologists: sonologist 1
(who always performed the first examination) obtained higher measurement values for the
maximum diameter of the mass. The least reliable measurement was the height of the largest
papillary projection.
Inter-observer agreement when assessing categorical ultrasound variables is shown in Table
4. For most categorical ultrasound variables, inter-observer agreement beyond chance was good
or very good (19). Inter-observer agreement beyond chance for variables included in LR1 or
LR2 was poorest for color score (agreement 40%, weighted kappa 0.36), presence of blood
flow in papillary projection (agreement 90%, kappa 0.48), irregular cyst wall (agreement 79%,
kappa 0.56) and acoustic shadowing (agreement 85%, kappa 0.58).
Bland-Altman plots illustrating the relationship between the magnitudes of the estimated
risk of malignancy calculated using LR1 and LR2 and the interobserver difference in
calculated risk are shown in Figure 1. The plots manifest a diamond shape, i.e., the
interobserver differences are smallest for the lowest and highest risks, and they are very small
for risks <2.5% and >95%. Logarithmic transformation of the data (20) did not substantially
change the shape of the scatter plot. Therefore, we present our results as absolute
inter-observer differences in calculated risk (in percentage units), see Table 5. There were no
systematic differences in calculated risks between the two sonologists, and reliability,
reflected by the ICC values, was good (22), with ICC values of 0.911 for LR1 and 0.832 for
LR2. When classifying tumors as having a risk of malignancy <10% (benign) or >10%
(malignant) using LR1 or LR2, the inter-observer agreement was good for both models:
inter-observer agreement 84% (98/117), kappa 0.68 for model LR1, and inter-observer
agreement 85% (99/117), kappa 0.68 for model LR2. In the 19 cases where the two
sonologists obtained different results with regard to malignancy when using LR1, the absolute
interobserver differences in calculated risk ranged from 0.7 to 59.6 percentage units: in six of
the 19 cases the absolute interobserver difference in calculated risk was <10.0 percentage
units, in nine cases it was 10.0 to 24.9 percentage units, and in four cases it was >25.0
percentage units. In the 18 cases where the two sonologists obtained different results with
regard to malignancy when using LR2, the absolute interobserver difference in calculated risk
ranged from 8.8 to 67.9 percentage units: in two of the 18 cases the absolute interobserver
difference in calculated risk was <10.0 percentage units, in ten cases it was 10.0 to 24.9
percentage units, and in six cases it was >25 percentage units.
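One simple arithmetic contribution to a diamond-shaped plot, offered here as a sketch and not as the authors' stated explanation, is that a risk is bounded between 0 and 100%. With the symbols $r_1$ and $r_2$ (the two observers' calculated risks in percent), $m$ (their mean) and $d$ (their difference), all introduced here for illustration:

```latex
\[
r_1 = m + \tfrac{d}{2}, \qquad r_2 = m - \tfrac{d}{2}, \qquad 0 \le r_1, r_2 \le 100
\;\Longrightarrow\; |d| \le 2\,\min(m,\; 100 - m).
\]
```

For a mean risk below 2.5% the difference therefore cannot exceed 5 percentage units, and for a mean risk above 95% it cannot exceed 10 percentage units, whereas around a mean of 50% differences of any size up to 100 percentage units are arithmetically possible.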
The Bland-Altman plots (Figure 1) illustrate that for some tumors there were substantial
interobserver differences in the calculated risk of malignancy: when using LR1, the
interobserver difference in calculated risk was >25 percentage units in 11 tumors (9% of all
tumors), and when using LR2, the interobserver difference in calculated risk was >25
percentage units in 14 tumors (12% of all tumors). To elucidate which interobserver
differences in ultrasound findings explained these largest interobserver differences in
calculated risk, we scrutinized each case where the difference was >25 percentage units. The
results are shown in Supplementary Tables S2 and S3. When using LR1, a discrepancy for one
single categorical variable explained the difference in four of the 11 cases, while a
discrepancy for two categorical variables explained the difference in one case (differences in
measurements being <5 mm in these five cases). In six cases there were differences in one or
two categorical variables but also substantial differences (6-61 mm) in at least one
measurement result. In no case was the large difference in calculated risk explained
exclusively by differences in measurement results. The categorical variables judged differently
by the two sonologists in these 11 cases were color score (n = 5), irregular cyst wall (n = 5),
flow in papillary projection (n = 3) and acoustic shadowing (n = 2).
When using LR2, a discrepancy for one single categorical variable explained the large
difference in calculated risk (>25 percentage units) in eight of the 14 cases (differences in
measurements being <5 mm in these eight cases), and in four of the eight cases the sonologists
judged acoustic shadowing differently. In five cases there were differences in one categorical
variable but also a substantial difference (9-61 mm) in the measurement of the largest solid
component. In yet another case there were differences in two categorical variables as well as in
the measurement of the largest solid component. The categorical variables judged differently
by the two sonologists in these 14 cases were acoustic shadowing (n = 5), irregular cyst wall
(n = 5), ascites (n = 3) and flow in papillary projection (n = 2).
The sensitivity with regard to malignancy when using LR1 (10% risk cutoff) was 100%
(23/23; 95% CI 82-100) for both sonologists; the specificity was 74% (70/94; 95% CI 64-82)
for sonologist 1 and 63% (59/94; 95% CI 53-72) for sonologist 2. The sensitivity when using
LR2 was 100% (23/23; 95% CI 82-100) for sonologist 1 and 91% (21/23; 95% CI 72-98) for
sonologist 2, and the specificity was 75.5% (71/94; 95% CI 65-84) for both sonologists.
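The point estimates above follow directly from the reported counts, as the short sketch below shows. Confidence intervals are deliberately not recomputed, because the interval method is not stated in the text; the dictionary layout and the helper name `percent` are illustrative only.

```python
# Counts as reported in the text (numerator, denominator), 10% risk cutoff.
# Sensitivity: malignant masses classified as malignant; specificity: benign as benign.
reported = {
    ("LR1", "sonologist 1"): {"sensitivity": (23, 23), "specificity": (70, 94)},
    ("LR1", "sonologist 2"): {"sensitivity": (23, 23), "specificity": (59, 94)},
    ("LR2", "sonologist 1"): {"sensitivity": (23, 23), "specificity": (71, 94)},
    ("LR2", "sonologist 2"): {"sensitivity": (21, 23), "specificity": (71, 94)},
}

def percent(numerator, denominator):
    """Proportion expressed in percent."""
    return 100.0 * numerator / denominator

summary = {
    key: {name: round(percent(*counts), 1) for name, counts in measures.items()}
    for key, measures in reported.items()
}

for (model, observer), measures in summary.items():
    print(f"{model}, {observer}: sensitivity {measures['sensitivity']}%, "
          f"specificity {measures['specificity']}%")
```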
Discussion
We have shown substantial inter-observer variability in the results of measurements taken in
adnexal masses (wide limits of agreement). Inter-observer agreement beyond chance was very
good or good for most categorical variables, but it was only moderate or fair for some.
Inter-observer agreement above chance was poorest for variables heavily dependent on subjective
evaluation and/or machine settings, i.e., color score, presence of color Doppler signals in
papillary projections, irregular cyst walls, acoustic shadowing (all four variables being
included in LR1 or LR2), echogenicity of cyst fluid, and ovarian crescent sign. Despite this,
there was good inter-observer agreement when classifying tumors as benign or malignant using
the predetermined risk of malignancy cut-off of 10%. However, in some cases there were
substantial differences in the calculated risk of malignancy between the two sonologists, the
difference being >25.0 percentage units in 9% of all tumors when using LR1 and in 12% of all
tumors when using LR2.
The strength of our study is that it provides new information. To the best of our knowledge,
there is only one publication reporting on interobserver agreement with regard to describing
ultrasound findings in adnexal masses using the IOTA terminology (11) when performing live
ultrasound examinations (23). However, that study (23) evaluated interobserver agreement
with regard to the ten ultrasound features in the IOTA simple rules (24, 25), not the variables
included in the IOTA logistic regression models LR1 and LR2, and agreement was estimated
between examiners with different levels of experience. The variable with poorest agreement
beyond chance in the study cited was acoustic shadowing (kappa 0.36). We have found no
published study that has estimated inter-observer reproducibility of the calculated risk of
malignancy using LR1 or LR2 after live scanning.
It is a limitation of our study that up to 204 days elapsed between the scans of the two
sonologists (up to 41 days for malignant masses). Because days elapsed between the scans,
theoretically the inter-observer differences could be explained by the lesions having changed
in size or morphology between the scans. We find this highly unlikely for the following
reasons. First, there was no relationship between the differences in measurement results and
the number of days between the scans (Supplementary Fig. S1-S5). Nor was there a clear
tendency for inter-observer agreement for discrete variables to depend on the time between the
scans (Supplementary Table S1). Second, one would expect a lesion and its components to
increase in size with time, but sonologist 1, performing the first scan, obtained higher
measurement values than sonologist 2. Third, it is our experience, after having performed
gynecological scans for more than 20 years, that the ultrasound morphology of both benign and
malignant adnexal masses remains constant over time, that benign adnexal lesions grow
slowly, and that malignant masses do not change appreciably in size even during 1 month of
observation. Therefore, we believe that the discrepancies between the two sonologists reflect
true inter-observer differences and not a change of the masses over time. A second limitation is
that we did not include estimation of the reproducibility of retrieving anamnestic information
(current hormonal therapy, personal history of ovarian cancer), the anamnestic information
collected by the second sonologist being used in all cases. It cannot be entirely excluded that
patients would answer differently when asked by different sonologists, or that sonologists
could interpret the answers of the patients differently. A third limitation is that we did not
estimate intra-observer reproducibility. We considered four scans (two per sonologist) likely to
be unacceptable to patients. For the same reason only two sonologists were involved in this
study, and our results are generalizable only to sonologists with a similar level of experience.
The results of this live scanning study are similar to those of another study in which the
same sonologists assessed the same variables using 3D ultrasound volumes from adnexal
masses in another tumor population (15). The similarity in results between the two studies is
surprising, because the conditions when assessing 3D ultrasound volumes are different from
those during a live scan. When evaluating ultrasound volumes, sonologists are exposed to the
same ultrasound images, and so any interobserver difference should be explained exclusively
by differences in interpreting the ultrasound information. During a live scan there are more
sources of bias. This could result either in poorer or better interobserver agreement than when
3D ultrasound volumes are assessed: poorer because ultrasound examiners are likely to use
different machine settings and scanning conditions may change from one minute to another,
better because the dynamic nature of live scanning facilitates discrimination between solid
components and amorphous tissue.
Our results showed that two experienced sonologists agreed quite well in their classification
of masses as benign or malignant using the 10% risk of malignancy cutoff of LR1 and LR2,
and that the diagnostic performance of LR1 and LR2 with regard to discrimination between
benign and malignant tumors was similar for the two sonologists and similar to that reported
by others (14, 26-28). This is reassuring, because the main purpose of using models LR1 and
LR2 is to classify tumors as benign or malignant. Potentially, however, LR1 and LR2 can be
used not only to classify adnexal masses as benign or malignant but also to counsel a patient
about her individual risk of malignancy (13). To use the calculated risk for individual
counseling, one must be reasonably certain not only that the estimated risk agrees well with the
true risk (when externally validated, both LR1 and LR2 underestimated the true risk, especially
in the risk interval 30-70% (14)) but also that the risk estimates are reproducible, i.e., that
different examiners will obtain similar risk estimates. Our results show that risk estimates may
differ substantially between experienced observers, the difference in estimated risk being >25.0
percentage units in 9% and 12% of cases when using LR1 and LR2, respectively. Interobserver
agreement above chance was poorest for those variables in the models that are heavily
dependent on subjective evaluation, i.e., color score, presence of color Doppler signals in
papillary projections, irregular cyst walls and acoustic shadowing. Indeed, differences in these
explained most of the largest inter-observer differences in calculated risk of malignancy. In
models based on few variables, changing values in only one variable may result in large
differences in predicted risks, while a model with many variables is less vulnerable to a change
in one or even a few variables. Our results illustrate this (Supplementary Tables S2 and S3).
When using LR2 (which includes six variables), a change in value for one single categorical
variable explained an inter-observer difference in calculated risk >25 percentage units in eight
of 14 cases, while when using LR1 (which includes 12 variables), a change in value for one
single categorical variable explained an inter-observer difference in calculated risk >25
percentage units in only four of 11 cases. Acoustic shadowing is a strong variable in both LR1
and LR2 and has great impact on the calculated risk in LR2, with only six variables. In our
hands, as well as in those of Ruiz de Gauna et al. (23), inter-observer agreement for acoustic
shadowing was at most moderate. The interobserver agreement for color score was only fair in
our study, and color score is an important variable in LR1.
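How a disagreement on a single binary variable can move a logistic-regression risk by tens of percentage units can be sketched as follows. The coefficient of -3.0 is purely hypothetical and is not taken from LR1 or LR2; it merely stands in for a strong protective feature that one observer records and the other does not. The function names are illustrative.

```python
import math

def logit(p):
    """Log-odds of a probability p (0 < p < 1)."""
    return math.log(p / (1.0 - p))

def expit(z):
    """Inverse of logit: probability corresponding to log-odds z."""
    return 1.0 / (1.0 + math.exp(-z))

def risk_after_flip(risk_percent, beta):
    """Risk (%) after adding one coefficient beta to the linear predictor."""
    return 100.0 * expit(logit(risk_percent / 100.0) + beta)

BETA = -3.0  # hypothetical coefficient, NOT a published LR1/LR2 value

for baseline in (2.0, 50.0):
    flipped = risk_after_flip(baseline, BETA)
    print(f"baseline {baseline:.1f}% -> {flipped:.2f}% "
          f"(difference {baseline - flipped:.1f} percentage units)")
```

With these assumed numbers the same disagreement changes a 50% risk by about 45 percentage units but a 2% risk by less than 2 percentage units, which is in line with large differences occurring at intermediate risks and small ones at the extremes.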
To improve inter-observer reproducibility of calculated risks based on LR1 and LR2,
inter-observer differences in descriptions and measurements of adnexal masses using the IOTA
terminology and measurement technique need to be reduced. One way to achieve this could be by
providing courses on, and training in, how to examine and describe adnexal masses using the
IOTA terms. Interactive courses in which a large number of ultrasound images are discussed with
the course participants are likely to be very valuable in this respect. More precise definitions of
the IOTA terms, for example by providing ample imaging material, would probably also help
improve inter-observer agreement. Special attention should be given to the variables with poorest
reproducibility, i.e., the color score, wall irregularity, acoustic shadowing and detection of blood
flow in papillary projections. Until better inter-observer agreement in the calculated risk of
malignancy using LR1 and LR2 has been shown, one should be cautious with using the risk
estimates for individual patient counselling.
Acknowledgements
None
References
1. Granberg S, Norström A, Wikland M. Tumors in the lower pelvis as imaged by vaginal
sonography. Gynecol Oncol 1990;37:224-9.
2. Benacerraf BR, Finkler NJ, Wojciechowski C, Knapp RC. Sonographic accuracy in the
diagnosis of ovarian masses. J Reprod Med 1990;35:491-5.
3. Valentin L. Pattern recognition of pelvic masses by gray-scale ultrasound imaging: the
contribution of Doppler ultrasound. Ultrasound Obstet Gynecol 1999;14:338-47.
4. Valentin L. Prospective cross-validation of Doppler ultrasound examination and
gray-scale ultrasound imaging for discrimination of benign and malignant pelvic masses.
Ultrasound Obstet Gynecol 1999;14:273-83.
5. Timmerman D, Schwärzler P, Collins WP, Claerhout F, Coenen M, Amant F, et al.
Subjective assessment of adnexal masses with the use of ultrasonography: an analysis
of interobserver variability and experience. Ultrasound Obstet Gynecol 1999;13:11-6.
6. Sokalska A, Timmerman D, Testa AC, Van Holsbeke C, Lissoni AA, Leone FPG, et al.
Diagnostic accuracy of transvaginal ultrasound examination for assigning a specific
diagnosis to adnexal masses. Ultrasound Obstet Gynecol 2009;34:462-70.
7. Valentin L. Use of morphology to characterize and manage common adnexal masses.
Best Pract Res Clin Obstet Gynaecol 2004;18:71-89.
8. Tailor A, Jurkovic D, Bourne TH, Collins WP, Campbell S. Sonographic prediction of
malignancy in adnexal masses using multivariate logistic regression analysis.
Ultrasound Obstet Gynecol 1997;10:41-7.
9. Timmerman D, Bourne TH, Tailor A, Collins WP, Verrelst H, Vandenberghe K, et al.
A comparison of methods for the preoperative discrimination between benign and
malignant adnexal masses: the development of a new logistic regression model. Am J
Obstet Gynecol 1999;181:57-65.
10. Alcazar JL, Jurado M. Prospective evaluation of logistic model based on sonographic
morphologic and color Doppler findings developed to predict adnexal malignancy. J
Ultrasound Med 1999;18:837-42.
11. Timmerman D, Valentin L, Bourne TH, Collins WP, Verrelst H, Vergote I. Terms,
definitions and measurements to describe the sonographic features of adnexal tumors: a
consensus opinion from the International Ovarian Tumor Analysis (IOTA) Group.
Ultrasound Obstet Gynecol 2000;16:500-5.
12. Timmerman D, Testa AC, Bourne T, Ferrazzi E, Ameye L, Konstantinovic ML, et al.
Logistic regression model to distinguish between the benign and malignant adnexal
mass before surgery: a multicenter study by the International Ovarian Tumor Analysis
Group. J Clin Oncol 2005;34:8794-801.
13. Kaijser J, Bourne T, Valentin L, Sayasneh A, Van Holsbeke C, Vergote I, et al.
Improving strategies for diagnosing ovarian cancer: a summary of the International
Ovarian Tumor Analysis (IOTA) studies. Ultrasound Obstet Gynecol 2013;41:9-20.
14. Timmerman D, Van Calster B, Testa AC, Guerriero S, Fischerova D, Lissoni AA, et al.
Ovarian cancer prediction in adnexal masses using ultrasound-based logistic regression
models: a temporal and external validation study by the IOTA group. Ultrasound Obstet
Gynecol 2010;36:226-34.
15. Sladkevicius P, Valentin L. Intra- and inter-observer agreement when describing
adnexal masses using the International Ovarian Tumour Analysis (IOTA) terms and
definitions: a study on three-dimensional (3D) ultrasound volumes. Ultrasound Obstet
Gynecol 2013;41:318-27.
16. Heintz APM, Odicino F, Maisonneuve P, Beller U, Benedet JL, Creasman WT, et al.
Carcinoma of the Ovary. 25th Annual Report on the Results of Treatment in
Gynecological Cancer. Int J Gynecol Obstet 2003;83:S135-S166 (suppl 1).
17. Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas 1960;20:
37-46.
18. Kundel HL, Polansky M. Measurement of observer agreement. Radiology 2003;228:
303-8.
19. Brennan P, Silman A. Statistical methods for assessing observer variability in clinical
measures. BMJ 1992;304:1491-4.
20. Bland JM, Altman DG. Statistical methods for assessing agreement between two
methods of clinical measurement. Lancet 1986;1:307-10.
21. Bartlett JW, Frost C. Reliability, repeatability and reproducibility: analysis of
measurement errors in continuous variables. Ultrasound Obstet Gynecol 2008;31:466-75.
22. Kottner J, Audigé L, Brorson S, Donner A, Gajewski BJ, Hróbjartsson A, et al.
Guidelines for Reporting Reliability and Agreement Studies (GRRAS) were proposed. J
Clin Epidemiol 2011;64:96-106.
23. Ruiz de Gauna B, Sanchez P, Pineda L, Utrilla-Layna J, Juez L, Alcázar JL.
Inter-observer agreement with regard to describing adnexal masses using the IOTA simple
rules in a real-time setting and when using three-dimensional ultrasound volumes and
digital clips. Ultrasound Obstet Gynecol 2014;44:95-100.
24. Timmerman D, Testa AC, Bourne T, Ameye L, Jurkovic D, Van Holsbeke C, et al.
Simple ultrasound-based rules for the diagnosis of ovarian cancer. Ultrasound Obstet
Gynecol 2008;31:681-90.
25. Timmerman D, Ameye L, Fischerova D, Epstein E, Melis GB, Guerriero S, et al.
Simple ultrasound rules to distinguish between benign and malignant adnexal masses
before surgery: prospective validation by IOTA group. BMJ 2010;341:c6839.
26. Van Holsbeke C, Van Calster B, Testa AC, Domali E, Lu C, Van Huffel S, et al.
Prospective internal validation of mathematical models to predict malignancy in
adnexal masses: results from the international ovarian tumor analysis study. Clin Cancer
Res 2009;15:684-91.
27. Nunes N, Yazbek J, Ambler G, Hoo W, Naftalin J, Jurkovic D. Prospective evaluation
of the IOTA logistic regression model LR2 for the diagnosis of ovarian cancer.
Ultrasound Obstet Gynecol 2012;40:355-9.
28. Nunes N, Ambler G, Hoo WL, Naftalin J, Foo X, Widschwendter M, et al.
A prospective validation of the IOTA logistic regression models (LR1 and LR2) in
comparison to subjective pattern recognition for the diagnosis of ovarian cancer.
Int J Gynecol Cancer 2013;23:1583-9.
Table 1. Variables for which inter-observer reproducibility was estimated
_______________________________________________________________________
Variables included in logistic regression models LR1 and LR2 or of importance for these models
  Maximum diameter of adnexal mass, mm
  Maximum diameter of largest solid component, mm
  Maximum diameter of largest solid component: ≤50, >50 mm
  Presence of an entirely solid adnexal mass: yes, no
  Papillary projections present: yes, no
  Irregular internal cyst walls: yes, no
  Acoustic shadows: yes, no
  Color Doppler signals in papillary projection: yes, no
  Color score: 1, 2, 3, 4
  Tenderness of adnexal mass at scan: yes, no
  Ascites: yes, no
Other IOTA variables
  Continuous IOTA variables
    Mean diameter of adnexal mass, mm
    Mean diameter of largest solid component, mm
    Height of the largest papillary projection, mm
  Categorical IOTA variables
    Type of tumor: unilocular, unilocular solid, multilocular, multilocular solid, solid
    Mean diameter of adnexal mass: ≤40, 41-60, 61-80, 81-100 and >100 mm
    Number of cyst locules: 0, 1, 2, 3, 4, 5, 6-10, 11-20 and >20; ≤10, >10†
    Septum present: yes, no
    Incomplete septum present: yes, no
    Solid component: yes, no
    Papillary projections: 0, 1, 2, 3, ≥4; 1, 2, 3, ≥4‡; <4, >4†
    Echogenicity of cyst fluid: anechoic, low level, ground glass, mixed, no cyst fluid
    Color Doppler signals detectable: yes, no
    Ovarian crescent sign: yes, no
Diagnosis based on LR1 and LR2
  Calculated risk of malignancy using LR1: ≤10%, >10%
  Calculated risk of malignancy using LR2: ≤10%, >10%
___________________________________________________________________________
IOTA, International Ovarian Tumor Analysis group; LR1, logistic regression model 1; LR2, logistic regression model 2.
The risk of malignancy using LR1 is derived as $y = 1/(1 + e^{-z})$, where $z = -6.7468 + 1.5985\,(a) - 0.9983\,(b) + 0.0326\,(c) + 0.00841\,(d) - 0.8577\,(e) + 1.5513\,(f) + 1.1737\,(g) + 0.9281\,(h) + 0.0496\,(i) + 1.1421\,(j) - 2.3550\,(k) + 0.4916\,(l)$ and $e$ is the mathematical constant and base value of natural logarithms. The 12 variables included in LR1 are: (a) personal history of ovarian cancer (yes = 1, no = 0); (b) current hormonal therapy (yes = 1, no = 0); (c) age of the patient (in years); (d) maximum diameter of the lesion (in millimeters); (e) mass tender at scan (yes = 1, no = 0); (f) the presence of ascites (yes = 1, no = 0); (g) the presence of blood flow within a solid papillary projection (yes = 1, no = 0); (h) the presence of a purely solid tumor (yes = 1, no = 0); (i) maximal diameter of the solid component (expressed in millimeters, but with no increase >50 mm); (j) irregular internal cyst walls (yes = 1, no = 0); (k) the presence of acoustic shadows (yes = 1, no = 0); and (l) color score (1, 2, 3 or 4).
The risk of malignancy using LR2 is derived as $y = 1/(1 + e^{-z})$, where $z = -5.3718 + 0.0354\,(c) + 1.6159\,(f) + 1.1768\,(g) + 0.0697\,(i) + 0.9586\,(j) - 2.9486\,(k)$ (12).
A calculated risk of malignancy of more than 10% classified the adnexal mass as malignant.
† includes 0. ‡ includes only cases where papillary projections were registered at both examinations.
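As an illustration only, the two formulas in the footnote to Table 1 can be sketched in Python as follows. This is not the software used in the study (the IOTA3 study screen and SPSS were used); the function names, the dictionary layout and the example findings are hypothetical, and only the coefficients and the 10% cutoff are taken from the footnote.

```python
import math

# Coefficients copied from the Table 1 footnote; letters (a)-(l) follow that footnote.
LR1 = {"intercept": -6.7468, "a": 1.5985, "b": -0.9983, "c": 0.0326, "d": 0.00841,
       "e": -0.8577, "f": 1.5513, "g": 1.1737, "h": 0.9281, "i": 0.0496,
       "j": 1.1421, "k": -2.3550, "l": 0.4916}
LR2 = {"intercept": -5.3718, "c": 0.0354, "f": 1.6159, "g": 1.1768,
       "i": 0.0697, "j": 0.9586, "k": -2.9486}


def risk_of_malignancy(model, findings):
    """Return y = 1 / (1 + exp(-z)); `findings` maps variable letters to values."""
    values = dict(findings)
    # Variable (i): solid-component diameter in mm, with no increase above 50 mm.
    values["i"] = min(values["i"], 50)
    z = model["intercept"] + sum(
        coef * values[name] for name, coef in model.items() if name != "intercept"
    )
    return 1.0 / (1.0 + math.exp(-z))


def classify(risk, cutoff=0.10):
    """A calculated risk of more than 10% classifies the mass as malignant."""
    return "malignant" if risk > cutoff else "benign"


# Hypothetical findings for one mass, recorded by two observers who
# disagree only on the presence of acoustic shadows (k).
observer_1 = {"c": 50, "f": 0, "g": 0, "i": 20, "j": 1, "k": 1}
observer_2 = dict(observer_1, k=0)
r1 = risk_of_malignancy(LR2, observer_1)
r2 = risk_of_malignancy(LR2, observer_2)
print(f"LR2 risk: {r1:.1%} vs {r2:.1%} -> {classify(r1)} vs {classify(r2)}")
```

In this made-up example a disagreement on a single yes/no variable is enough to move the mass across the 10% cutoff, which is the kind of effect the inter-observer analysis is designed to quantify.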
Table 2. Histological diagnoses of the masses
___________________________________________________________________________
Histological diagnosis of tumor                      Number
___________________________________________________________________________
Benign tumors                                        94
  Benign simple cyst                                 7
  Endometrioma                                       10
  Dermoid cyst                                       16
  Serous cystadenoma                                 16
  Mucinous cystadenoma                               18
  Myoma/fibroma                                      9
  Cystadenofibroma                                   11
  Paraovarian cyst                                   5
  Sactosalpinx, chronic salpingitis                  1
  Leydig cell tumor                                  1
Borderline tumors                                    4
  Serous                                             2
  Mucinous                                           1
  Endometrioid                                       1
Invasive malignancy                                  19
  Primary ovarian adenocarcinoma                     13
  Granulosa cell tumor                               3
  Dysgerminoma                                       1
  Leiomyosarcoma                                     1
  Malignant aggressive B-cell lymphoma               1
___________________________________________________________________________
Table 3. Inter-observer reproducibility of continuous variables used in the risk calculation models LR1 and LR2 and of other continuous variables
used to describe adnexal masses
_________________________________________________________________________________________________________________________________________________
                                                        Measurement results     Difference in mm between two measurements
                                                        (both sonologists)      made by sonologists 1 and 2 (a)                   Intra-CC
Parameter                                               Median         Range    Mean           95% CI          Limits of          Point estimate
                                                                                                               agreement          (95% CI)
_________________________________________________________________________________________________________________________________________________
Variables used in models LR1 and LR2
  Maximum diameter of adnexal mass, mm                  70 (n=234)     10-313   3.80 (n=117)   1.12 to 6.48    -25.24 to 32.82    0.958 (0.937-0.971)
  Maximum diameter of largest solid component, mm (b)   29.50 (n=122)  5-180    1.92 (n=61)    -1.74 to 5.58   -26.66 to 30.50    0.942 (0.905-0.964)
Other variables used to describe adnexal mass
  Mean diameter of adnexal mass, mm                     58.5 (n=234)   9-240    1.05 (n=117)   -0.15 to 1.95   -8.61 to 10.72     0.971 (0.958-0.980)
  Mean diameter of largest solid component, mm (b)      22 (n=122)     4-156    0.59 (n=61)    -1.82 to 2.98   -18.16 to 19.32    0.962 (0.937-0.977)
  Height of largest papillary projection, mm (c)        8 (n=42)       3-25     -0.51 (n=21)   -2.93 to 1.91   -11.61 to 10.59    0.609 (0.245-0.821)
_________________________________________________________________________________________________________________________________________________
CI, confidence interval; Intra-CC, intra-class correlation coefficient.
(a) The measurements made by sonologist 2 were subtracted from the measurements made by sonologist 1.
(b) Includes only cases where solid components were registered by both observers.
(c) Includes only cases where papillary projections were registered by both observers. Using logarithmically transformed data, the results were as
follows: mean difference 1.03 (95% CI, 0.78 to 1.36), limits of agreement 0.31 to 3.61, Intra-CC 0.547 (0.153-0.789). These results can be used for
comparison with our published results on inter-observer agreement when analyzing ultrasound volumes obtained by three-dimensional
ultrasound (15).
Table 4. Inter-observer agreement for categorical variables included in the risk calculation models LR1 and LR2 and for other categorical variables used to describe adnexal masses
____________________________________________________________________________________________________________________________
Variable                                                                                  Agreement, % (n/N)   Kappa value
____________________________________________________________________________________________________________________________
Variables used in models LR1 and LR2
  Presence of entirely solid tumor (yes, no)                                              97 (114/117)         0.91
  Maximum diameter of largest solid component (≤50 mm, >50 mm)                            95 (111/117)         0.83
  Maximum diameter of largest solid component (≤50 mm, >50 mm) (a)                        92 (56/61)           0.82
  Tenderness of adnexal mass (yes, no)                                                    97 (113/117)         0.78
  Ascites (yes, no)                                                                       97 (113/117)         0.73
  Papillary projections (yes, no)                                                         87 (102/117)         0.65
  Solid component (yes, no)                                                               84 (98/117)          0.67
  Irregular cyst wall (yes, no)                                                           79 (92/117)          0.56
  Acoustic shadows (yes, no)                                                              85 (99/117)          0.58
  Color Doppler signals in papillary projections (yes, no) (b)                            90 (105/117)         0.48
  Color Doppler signals in papillary projections (yes, no) (c)                            86 (18/21)           0.71
  Color score (1, 2, 3, 4)                                                                40 (47/117)          0.36 (d)
Other variables used to describe adnexal masses
  Tumor type (unilocular, unilocular solid, multilocular, multilocular solid, solid)      97 (113/117)         0.70
  Mean diameter of tumor (mm): ≤40, 41-60, 61-80, 81-100, >100                            66 (77/117)          0.76 (d)
  Number of locules: 0, 1, 2, 3, 4, 5, 6-10, 11-20, >20                                   58 (68/117)          0.79 (d)
  Number of locules: ≤10, >10 (e)                                                         94 (110/117)         0.80
  Septum (yes, no)                                                                        88 (103/117)         0.76
  Incomplete septum (yes, no)                                                             98 (115/117)         ¶
  Number of papillary projections: 0, 1, 2, 3, ≥4                                         81 (95/117)          0.64 (d)
  Number of papillary projections: 1, 2, 3, ≥4 (c)                                        67 (14/21)           0.63 (d)
  Number of papillary projections: <4, >4 (e)                                             96 (112/117)         0.68
  Echogenicity of cyst fluid (anechoic, low level, ground glass, mixed, no fluid)         67 (78/117)          0.56
  Color Doppler signals detectable (yes, no)                                              84 (98/117)          0.30
  Ovarian crescent sign (yes, no)                                                         78 (91/117)          0.49
  Fluid in Douglas pouch (yes, no)                                                        87 (102/117)         0.72
____________________________________________________________________________________________________________________________
(a) Includes only cases where solid components were registered at both examinations.
(b) This variable reflects disagreement both with regard to the presence of papillary projections and the presence of blood flow in papillary projections (i.e., if one observer recorded the presence of papillary projections and the other did not, there was disagreement with regard to this variable, and if both observers recorded the presence of papillary projections but disagreed with regard to the presence of blood flow in the papillary projections, there was disagreement with regard to this variable).
(c) Includes only cases where papillary projections were registered at both examinations.
(d) Weighted kappa index.
(e) Includes 0.
¶ Absent kappa values are explained by asymmetric field tables, making it impossible to calculate them.
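Table 5. Inter-observer reproducibility of the risk of malignancy calculated using the risk calculation models LR1 and LR2
___________________________________________________________________________________________________________________________________
                                          Calculated risk of malignancy, %   Difference between the risk calculated
                                          (both sonologists)                 by sonologist 1 and 2                               Intra-CC
Parameter                                 Median         Range               Mean            95% CI          Limits of           Point estimate
                                                                                                             agreement           (95% CI)
___________________________________________________________________________________________________________________________________
Risk of malignancy calculated using LR1   7.85 (n=234)   0.10-99.10          -0.53 (n=117)   -3.07 to 2.01   -28.05 to 26.99     0.911 (0.874-0.937)
Risk of malignancy calculated using LR2   6.65 (n=234)   0.10-98.40          0.02 (n=117)    -3.06 to 3.10   -33.22 to 33.26     0.832 (0.766-0.880)
___________________________________________________________________________________________________________________________________
CI, confidence interval; Intra-CC, intra-class correlation coefficient.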
Legends for figure
Figure 1. a) Scatterplot showing the relationship between inter-observer difference (observer
1 minus observer 2) in calculated risk and magnitude of calculated risk when using logistic
regression model LR1. The plot manifests a diamond shape, the differences being smallest for
the lowest and highest risks. For risks <2.5% and >95% the differences are very small.
LOA, limits of agreement. b) Scatterplot showing the relationship between inter-observer
difference in calculated risk and magnitude of calculated risk when using logistic regression
model LR2. The plot manifests a diamond shape, the differences being smallest for the lowest
and highest risks.
Povilas Sladkevicius and Lil Valentin. Inter-observer agreement in describing the ultrasound appearance of adnexal masses and in calculating the risk of malignancy using logistic regression models. Clin Cancer Res. Published OnlineFirst November 25, 2014.
Updated version: access the most recent version of this article at doi: 10.1158/1078-0432.CCR-14-0906
Supplementary material: access the most recent supplemental material at
http://clincancerres.aacrjournals.org/content/suppl/2014/11/26/1078-0432.CCR-14-0906.DC1
Materials and methods
The Ethics Committee of Lund University approved the study protocol. Informed consent
was obtained from all participants or the participant's guardian after the nature of the procedures
had been fully explained.
This is a prospective observational study of real-time live ultrasound examinations of
adnexal masses. Consecutive patients referred for an ultrasound examination and found to have
an adnexal mass judged to need surgical removal were scanned according to the research
protocol by sonologist 1 (PS) as part of the clinical ultrasound examination. A second
ultrasound examination was carried out before surgery by sonologist 2 (LV). Both examiners
used the standardized IOTA examination and measurement technique and the IOTA
terminology (11) to describe their ultrasound findings, and noted their results in a dedicated
paper form. Sonologist 2 was blinded to the results of sonologist 1. Information on the clinical
variables included in LR1 and LR2 (personal history of ovarian cancer, current hormonal
therapy, age of the patient) was obtained at the preoperative ultrasound examination by
sonologist 2. All patients were operated on within 90 days after the preoperative ultrasound
examination performed by sonologist 2. The excised tissues underwent histological
examination, and tumors were classified according to the criteria recommended by the
International Federation of Gynecology and Obstetrics (16). Borderline tumors were classified
as malignant.
The patients were examined in the lithotomy position with an empty urinary bladder (11).
Abdominal ultrasound examination was added when needed. The ultrasound variables
assessed with regard to inter-observer reproducibility are shown in Table 1. The size of the
lesion and that of its largest solid component were measured (largest diameter and mean of
three orthogonal diameters) using calipers on the frozen ultrasound image. A color score was
assigned on the basis of subjective assessment of the color content of the tumor scan at power
Doppler ultrasound examination. A color score of 1 indicates absence of color Doppler
signals, a color score of 2 a minimal amount of color Doppler signals, a color score of 3 a
moderate amount of color Doppler signals, and a color score of 4 a large amount of color
Doppler signals in the tumor (11).
The ultrasound systems used were GE Voluson 730 Expert or GE Voluson E8 (GE
Healthcare, Zipf, Austria) with a 5–9-MHz transvaginal transducer. For power Doppler
ultrasound examinations the following settings were used: for the Voluson 730 Expert system,
frequency 6-9 ("normal") MHz, pulse repetition frequency 0.6 kHz, gain 0.8, wall motion
filter "low 1" (40 Hz); and for the Voluson E8, frequency 6-9 ("normal") MHz, pulse repetition
frequency 0.6 kHz, gain -4.0, wall motion filter "low 1" (40 Hz).
Statistical analysis
The IOTA3 study screen (astraia GmbH, Munich, Germany) was used to calculate the risk
of malignancy according to LR1. Weighted kappa indices were calculated using the statistical
program Stata, Version 10.1 for Windows (StataCorp LP, College Station, TX, USA). For all
other statistical calculations, including calculation of the risk of malignancy when using LR2,
we used the Statistical Package for the Social Sciences (SPSS program, IBM Corp., New
York, NY, USA; PASW version 18.0).
Inter-observer agreement in the assessment of categorical variables was estimated by
calculating the percentage agreement. Cohen's kappa was used to estimate by how much the
observed agreement exceeded that expected by chance (17). Weighted kappa values are
presented where appropriate (18). It has been suggested that kappa values >0.81 indicate very
good agreement beyond chance, kappa values between 0.61 and 0.80 good agreement beyond
chance, kappa values between 0.41 and 0.60 moderate agreement beyond chance, kappa values
between 0.21 and 0.40 fair agreement beyond chance, and kappa values <0.20 poor agreement
beyond chance (19).
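The percentage agreement and the unweighted Cohen's kappa described above can be sketched as follows. This is a minimal illustration, not the Stata or SPSS routines used in the study: the function names and the example ratings are hypothetical, weighted kappa is not covered, and the handling of values that fall exactly on a band boundary (for example 0.20 or 0.805) is an assumption, since the bands quoted from (19) leave small gaps.

```python
from collections import Counter


def percent_agreement(ratings_1, ratings_2):
    """Proportion of cases in which the two observers chose the same category."""
    return sum(a == b for a, b in zip(ratings_1, ratings_2)) / len(ratings_1)


def cohen_kappa(ratings_1, ratings_2):
    """Unweighted kappa: (observed - expected agreement) / (1 - expected agreement)."""
    n = len(ratings_1)
    observed = percent_agreement(ratings_1, ratings_2)
    counts_1, counts_2 = Counter(ratings_1), Counter(ratings_2)
    expected = sum(counts_1[c] * counts_2[c] for c in set(counts_1) | set(counts_2)) / n**2
    if expected == 1.0:
        return float("nan")  # both observers used a single category: kappa undefined
    return (observed - expected) / (1.0 - expected)


def agreement_label(kappa):
    """Verbal bands quoted in the text (19); boundary handling is an assumption."""
    k = round(kappa, 2)
    if k > 0.80:
        return "very good"
    if k > 0.60:
        return "good"
    if k > 0.40:
        return "moderate"
    if k > 0.20:
        return "fair"
    return "poor"


# Hypothetical yes/no (1/0) ratings of ten masses by two observers.
obs_1 = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
obs_2 = [1, 1, 0, 0, 0, 0, 1, 0, 1, 1]
kappa = cohen_kappa(obs_1, obs_2)
print(percent_agreement(obs_1, obs_2), round(kappa, 2), agreement_label(kappa))
```

The example shows why both figures are reported: the percentage agreement alone does not account for the agreement that two observers would reach by chance given how often each uses every category.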
Inter-observer reproducibility of measurement results, including the calculated risks of
malignancy using LR1 and LR2, was described as the difference between two measurement
results. The differences between the measured values were plotted against the mean of the two
measurements (Bland-Altman plots) to assess the relationship between the differences and the
magnitude of the measurements (20). Systematic bias between two measurements was
estimated by calculating the 95% confidence interval (CI) of the mean difference (mean
difference ±2 SE). If zero lay within this interval, no bias was assumed to exist between the
two measurements. Inter-observer agreement was expressed as the mean difference and limits
of agreement (20). Ninety-five percent of differences between any future measurements are
estimated to fall between the lower and upper limit of agreement. Inter-observer reliability of
measurement results was estimated by calculating the intra-class correlation coefficient
(ICC) using analysis of variance (two-way random model, absolute agreement; this allows
generalization of the results to a population of observers). The ICC indicates the proportion of
the total variance in measurement results that can be explained by differences between the
individuals examined. It depends both on the magnitude of measurement errors and on the true
heterogeneity in the population in which measurements are made. The more variable the
population investigated, the greater the ICC, and the less variable the population, the smaller
the ICC (21). It has been suggested that ICC values >0.90 are needed for a test to be used in
clinical practice (22).
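The quantities described in this paragraph can be sketched as follows. This is a hedged illustration rather than the SPSS procedure used in the study: the function names and example measurements are hypothetical, the limits of agreement are taken as mean difference ± 2 SD (an assumption that appears consistent with the intervals reported in Tables 3 and 5), and the ICC is computed as the single-measure, two-way random, absolute-agreement form, which is assumed to be the variant meant in the text.

```python
import math
from statistics import mean, stdev


def bland_altman(obs_1, obs_2):
    """Mean difference (observer 1 minus observer 2), its 95% CI and limits of agreement."""
    diffs = [a - b for a, b in zip(obs_1, obs_2)]
    mean_diff = mean(diffs)
    sd = stdev(diffs)                      # sample SD of the differences
    se = sd / math.sqrt(len(diffs))        # standard error of the mean difference
    return {
        "mean_diff": mean_diff,
        "ci95": (mean_diff - 2 * se, mean_diff + 2 * se),   # mean ± 2 SE, as in the text
        "loa": (mean_diff - 2 * sd, mean_diff + 2 * sd),    # assumed multiplier of 2
        "bias": not (mean_diff - 2 * se <= 0 <= mean_diff + 2 * se),
    }


def icc_absolute_single(obs_1, obs_2):
    """Two-way random, absolute agreement, single-measure ICC for two observers."""
    n, k = len(obs_1), 2
    rows = list(zip(obs_1, obs_2))
    grand = (sum(obs_1) + sum(obs_2)) / (n * k)
    row_means = [sum(r) / k for r in rows]
    col_means = [sum(obs_1) / n, sum(obs_2) / n]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between observers
    ss_total = sum((v - grand) ** 2 for r in rows for v in r)
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)


# Hypothetical diameters (mm) of four masses measured by two observers.
sonologist_1 = [10, 20, 30, 40]
sonologist_2 = [12, 18, 33, 41]
ba = bland_altman(sonologist_1, sonologist_2)
icc = icc_absolute_single(sonologist_1, sonologist_2)
print(ba, round(icc, 3))
```

With only four widely spread measurements the ICC is high even though the limits of agreement span several millimeters, which illustrates the dependence of the ICC on the heterogeneity of the population noted above.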
The sensitivity and specificity of LR1 and LR2 with regard to malignancy were calculated
separately from the information recorded by sonologist 1 and by sonologist 2.
Results
In all, 117 consecutive women with adnexal masses who underwent surgery were
examined with ultrasound by the two sonologists as described above. Thirty-four women had
bilateral adnexal masses. The most complex mass, or the largest one if both masses had
similar ultrasound morphology, was used in our statistical analysis, the mass to be included
being selected retrospectively to ensure that both sonologists contributed the same mass (right
or left) to the analysis. Thus, 117 adnexal masses from 117 patients constitute our study
population. The women's age ranged between 14 and 88 years (median 53), and 63 (54%)
women were postmenopausal. There were 94 benign, four borderline and 19 invasively
malignant adnexal masses (Table 2).
The time elapsed between the ultrasound examinations of sonologists 1 and 2 was median 61
days (10th and 90th percentiles 13 and 132; range 1-204) for the tumors with benign histology
and median 14 days (10th and 90th percentiles 2 and 31; range 1-41) for the tumors with
malignant histology. There was no relationship between the number of days between the
scans and the differences in measurement results or inter-observer agreement for discrete
variables (Supplementary Fig. S1-S5 and Supplementary Table S1).
Inter-observer reproducibility of measurement results is shown in Table 3. Bland-Altman
plots showed no clear trend for inter-observer differences in measurement results to change
with the magnitude of the measurement values. Limits of agreement were wide for all
measurements. There was one systematic difference between the two sonologists, sonologist 1
(who always performed the first examination) obtaining higher measurement values for the
maximum diameter of the mass. The least reliable measurement was the height of the largest
papillary projection.
Inter-observer agreement when assessing categorical ultrasound variables is shown in Table
4. For most categorical ultrasound variables, inter-observer agreement beyond chance was good
or very good (19). Inter-observer agreement beyond chance for variables included in LR1 or
LR2 was poorest for color score (agreement 40%, weighted kappa 0.36), presence of blood
flow in papillary projection (agreement 90%, kappa 0.48), irregular cyst wall (agreement 79%,
kappa 0.56) and acoustic shadowing (agreement 85%, kappa 0.58).
Bland-Altman plots illustrating the relationship between the magnitudes of the estimated
risk of malignancy calculated using LR1 and LR2 and the interobserver difference in
calculated risk are shown in Figure 1. The plots manifest a diamond shape, i.e., the
interobserver differences are smallest for the lowest and highest risks, and they are very small
for risks <2.5% and >95%. Logarithmic transformation of the data (20) did not substantially
change the shape of the scatter plot. Therefore, we present our results as absolute
inter-observer differences in calculated risk (in percentage units); see Table 5. There were no
systematic differences in calculated risks between the two sonologists, and reliability,
reflected by the ICC values, was good (22), with ICC values of 0.911 for LR1 and 0.832 for
LR2. When classifying tumors as having a risk of malignancy <10% (benign) or >10%
(malignant) using LR1 or LR2, the inter-observer agreement was good for both models:
inter-observer agreement 84% (98/117), kappa 0.68 for model LR1, and inter-observer
agreement 85% (99/117), kappa 0.68 for model LR2. In the 19 cases where the two
sonologists obtained different results with regard to malignancy when using LR1, the absolute
interobserver differences in calculated risk ranged from 0.7 to 59.6 percentage units: in six of
the 19 cases the absolute interobserver difference in calculated risk was <10.0 percentage
units, in nine cases it was 10.0–24.9 percentage units, and in four cases it was >25.0
percentage units. In the 18 cases where the two sonologists obtained different results with
regard to malignancy when using LR2, the absolute interobserver difference in calculated risk
ranged from 8.8 to 67.9 percentage units: in two of the 18 cases the absolute interobserver
difference in calculated risk was <10.0 percentage units, in ten cases it was 10.0–24.9
percentage units, and in six cases it was >25 percentage units.
The Bland-Altman plots (Figure 1) illustrate that for some tumors there were substantial
interobserver differences in the calculated risk of malignancy: when using LR1, the
interobserver difference in calculated risk was >25 percentage units in 11 tumors (9% of all
tumors), and when using LR2, the interobserver difference in calculated risk was >25
percentage units in 14 tumors (12% of all tumors). To elucidate which interobserver
differences in ultrasound findings explained these largest differences in calculated risk, we scrutinized
each case where the difference was >25 percentage units. The results are shown in
Supplementary Tables S2 and S3. When using LR1, a discrepancy for one single categorical
variable explained the difference in four of the 11 cases, while a discrepancy for two
categorical variables explained the difference in one case (differences in measurements being
<5 mm in these five cases). In six cases there were differences in one or two categorical
variables but also substantial differences (6-61 mm) in at least one measurement result. In no
case was the large difference in calculated risk explained exclusively by differences in
measurement results. The categorical variables judged differently by the two sonologists in
these 11 cases were color score (n = 5), irregular cyst wall (n = 5), flow in papillary projection
(n = 3) and acoustic shadowing (n = 2).
When using LR2, a discrepancy for one single categorical variable explained the large
difference in calculated risk (>25 percentage units) in eight of the 14 cases (differences in
measurements being <5 mm in these eight cases), and in four of the eight cases the sonologists
judged acoustic shadowing differently. In five cases there were differences in one categorical
variable but also a substantial difference (9-61 mm) in the measurement of the largest solid
component. In yet another case there were differences in two categorical variables as well as in
the measurement of the largest solid component. The categorical variables judged differently
by the two sonologists in these 14 cases were acoustic shadowing (n = 5), irregular cyst wall (n
= 5), ascites (n = 3) and flow in papillary projection (n = 2).
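The following worked step, which is an illustration derived from the LR2 formula in the footnote to Table 1 and not an analysis reported by the authors, shows why disagreement on one binary variable can by itself shift the calculated risk by more than 25 percentage units. If observer A records acoustic shadows (k = 1) and observer B does not (k = 0), with all other findings identical, the linear predictors differ by the coefficient for (k), and the odds of malignancy differ by a fixed factor. The starting risk of 3% is a hypothetical value chosen for the example.

```latex
\[
z_B = z_A + 2.9486
\quad\Longrightarrow\quad
\frac{y_B}{1 - y_B} = e^{2.9486}\,\frac{y_A}{1 - y_A} \approx 19.1\,\frac{y_A}{1 - y_A}
\]
\[
y_A = 0.03:\qquad
\frac{y_A}{1 - y_A} \approx 0.031,\qquad
\frac{y_B}{1 - y_B} \approx 19.1 \times 0.031 \approx 0.59,\qquad
y_B \approx \frac{0.59}{1.59} \approx 0.37
\]
```

In this hypothetical case the calculated risk moves from about 3% to about 37%, a difference of roughly 34 percentage units that also crosses the 10% cutoff.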
The sensitivity with regard to malignancy when using LR1 (10% risk cutoff) was 100%
(23/23; 95% CI, 82-100) for both sonologists; the specificity was 74% (70/94; 95% CI, 64-82)
for sonologist 1 and 63% (59/94; 95% CI, 53-72) for sonologist 2. The sensitivity when using
LR2 was 100% (23/23; 95% CI, 82-100) for sonologist 1 and 91% (21/23; 95% CI, 72-98) for
sonologist 2, and the specificity was 75.5% (71/94; 95% CI, 65-84) for both sonologists.
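The proportions in this paragraph can be recomputed from the reported counts with a short sketch such as the one below. The manuscript does not state which method was used for the 95% confidence intervals; the Wilson score interval used here is an assumption chosen for illustration, so the intervals it gives need not match the reported ones exactly. The function name is hypothetical.

```python
import math


def proportion_with_wilson_ci(successes, total, z=1.96):
    """Return (proportion, lower, upper) using the Wilson score interval."""
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return p, max(0.0, centre - half), min(1.0, centre + half)


# Counts as reported in the text: 23 malignant (borderline included) and 94 benign masses.
reported = {
    "LR1 sensitivity, both sonologists": (23, 23),
    "LR1 specificity, sonologist 1": (70, 94),
    "LR1 specificity, sonologist 2": (59, 94),
    "LR2 sensitivity, sonologist 2": (21, 23),
    "LR2 specificity, both sonologists": (71, 94),
}
for label, (k, n) in reported.items():
    p, lo, hi = proportion_with_wilson_ci(k, n)
    print(f"{label}: {p:.1%} ({k}/{n}; Wilson 95% CI {lo:.1%} to {hi:.1%})")
```

Because only 23 malignant masses were included, the interval around a sensitivity of 100% remains wide whichever interval method is used.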
Discussion
We have shown substantial inter-observer variability in the results of measurements taken in
adnexal masses (wide limits of agreement). Inter-observer agreement beyond chance was very
good or good for most categorical variables, but it was only moderate or fair for some.
Inter-observer agreement above chance was poorest for variables heavily dependent on subjective
evaluation and/or machine settings, i.e., color score, presence of color Doppler signals in
papillary projections, irregular cyst walls, acoustic shadowing (all four variables being
included in LR1 or LR2), echogenicity of cyst fluid, and ovarian crescent sign. Despite this,
there was good inter-observer agreement when classifying tumors as benign or malignant using
the predetermined risk of malignancy cut-off of 10%. However, in some cases there were
substantial differences in the calculated risk of malignancy between the two sonologists, the
difference being >25.0 percentage units in 9% of all tumors when using LR1 and in 12% of all
tumors when using LR2.
The strength of our study is that it provides new information. To the best of our knowledge,
there is only one publication reporting on inter-observer agreement with regard to describing
ultrasound findings in adnexal masses using the IOTA terminology (11) when performing live
ultrasound examinations (23). However, that study (23) evaluated inter-observer agreement
with regard to the ten ultrasound features in the IOTA simple rules (24, 25), not the variables
included in the IOTA logistic regression models LR1 and LR2, and agreement was estimated
between examiners with different levels of experience. The variable with the poorest agreement
beyond chance in the study cited was acoustic shadowing (Kappa 0.36). We have found no
published study that has estimated inter-observer reproducibility of the calculated risk of
malignancy using LR1 or LR2 after live scanning.
It is a limitation of our study that up to 204 days elapsed between the scans of the two
sonologists (up to 41 days for malignant masses). Because days elapsed between the scans,
theoretically the inter-observer differences could be explained by the lesions having changed
in size or morphology between the scans. We find this highly unlikely for the following
reasons. First, there was no relationship between the differences in measurement results and
the number of days between the scans (Supplementary Figs. S1-S5). Nor was there a clear
tendency for inter-observer agreement for discrete variables to depend on the time between the
scans (Supplementary Table S1). Second, one would expect a lesion and its components to
increase in size with time, but sonologist 1, who performed the first scan, obtained higher
measurement values than sonologist 2. Third, it is our experience, after having performed
gynecological scans for more than 20 years, that the ultrasound morphology of both benign and
malignant adnexal masses remains constant over time, that benign adnexal lesions grow
slowly, and that malignant masses do not change appreciably in size even during 1 month of
observation. Therefore, we believe that the discrepancies between the two sonologists reflect
true inter-observer differences and not a change of the masses over time. A second limitation is
that we did not estimate the reproducibility of retrieving anamnestic information
(current hormonal therapy, personal history of ovarian cancer), the anamnestic information
collected by the second sonologist being used in all cases. It cannot be entirely excluded that
patients would answer differently when asked by different sonologists or that sonologists
could interpret the answers of the patients differently. A third limitation is that we did not
estimate intra-observer reproducibility. We considered four scans (two per sonologist) likely to
be unacceptable to patients. For the same reason, only two sonologists were involved in this
study, and our results are generalizable only to sonologists with a similar level of experience.
The results of this live scanning study are similar to those of another study in which the
same sonologists assessed the same variables using 3D ultrasound volumes from adnexal
masses in another tumor population (15). The similarity in results between the two studies is
surprising because the conditions when assessing 3D ultrasound volumes are different from
those during a live scan. When evaluating ultrasound volumes, sonologists are exposed to the
same ultrasound images, and so any inter-observer difference should be explained exclusively
by differences in interpreting the ultrasound information. During a live scan, there are more
sources of bias. This could result in either poorer or better inter-observer agreement than when
3D ultrasound volumes are assessed: poorer because ultrasound examiners are likely to use
different machine settings and scanning conditions may change from one minute to another;
better because the dynamic nature of live scanning facilitates discrimination between solid
components and amorphous tissue.
Our results showed that two experienced sonologists agreed quite well in their classification
of masses as benign or malignant using the 10% risk of malignancy cutoff of LR1 and LR2,
and that the diagnostic performance of LR1 and LR2 with regard to discrimination between
benign and malignant tumors was similar for the two sonologists and similar to that reported
by others (14, 26-28). This is reassuring because the main purpose of using models LR1 and
LR2 is to classify tumors as benign or malignant. Potentially, however, LR1 and LR2 can be
used not only to classify adnexal masses as benign or malignant but also to counsel a patient
about her individual risk of malignancy (13). To use the calculated risk for individual
counseling, one must be reasonably certain not only that the estimated risk agrees well with the
true risk (when externally validated, both LR1 and LR2 underestimated the true risk, especially
in the risk interval 30%-70% (14)), but also that the risk estimates are reproducible, i.e., that
different examiners will obtain similar risk estimates. Our results show that risk estimates may
differ substantially between experienced observers, the difference in estimated risk being >25.0
percentage units in 9% and 12% of cases when using LR1 and LR2, respectively. Inter-observer
agreement above chance was poorest for those variables in the models that are heavily
dependent on subjective evaluation, i.e., color score, presence of color Doppler signals in
papillary projections, irregular cyst walls, and acoustic shadowing. Indeed, differences in these
explained most of the largest inter-observer differences in calculated risk of malignancy. In
models based on few variables, changing the value of only one variable may result in large
differences in predicted risks, while a model with many variables is less vulnerable to a change
in one or even a few variables. Our results illustrate this (Supplementary Tables S2 and S3).
When using LR2 (which includes six variables), a change in value for one single categorical
variable explained an inter-observer difference in calculated risk >25 percentage units in eight
of 14 cases, while when using LR1 (which includes 12 variables), a change in value for one
single categorical variable explained an inter-observer difference in calculated risk >25
percentage units in only four of 11 cases. Acoustic shadowing is a strong variable in both LR1
and LR2 and has great impact on the calculated risk in LR2, which has only six variables. In our
hands, as well as in those of Ruiz de Gauna et al. (23), inter-observer agreement for acoustic
shadowing was at most moderate. The inter-observer agreement for color score was only fair in
our study, and color score is an important variable in LR1.
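As a hedged illustration of why disagreement on one binary variable can move the calculated risk so much, the following worked equation uses the acoustic shadowing coefficients given in the footnote to Table 1 (-2.9486 in LR2, -2.3550 in LR1). In a logistic model the odds equal e^z, so switching a yes/no variable from 0 to 1 multiplies the odds by e raised to that variable's coefficient. The 50% starting risk is a hypothetical value chosen only to make the arithmetic easy to follow; it is not taken from the study data.

```latex
% Odds in a logistic model, and the effect of switching one binary variable x_k from 0 to 1
\[
\frac{p}{1-p} = e^{z}, \qquad
\frac{\mathrm{odds}(x_k = 1)}{\mathrm{odds}(x_k = 0)} = e^{\beta_k}
\]
% Acoustic shadowing (variable k in Table 1)
\[
\text{LR2: } e^{-2.9486} \approx 0.052, \qquad
\text{LR1: } e^{-2.3550} \approx 0.095
\]
% Hypothetical mass with a 50% risk (odds = 1) when no shadowing is recorded;
% risk when the other observer records shadowing, all else being equal
\[
p_{\mathrm{LR2}} = \frac{0.052}{1 + 0.052} \approx 0.05, \qquad
p_{\mathrm{LR1}} = \frac{0.095}{1 + 0.095} \approx 0.087
\]
```

Under these assumptions, a single disagreement on acoustic shadowing moves the calculated risk by roughly 45 percentage units in LR2 and roughly 41 in LR1.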
To improve inter-observer reproducibility of calculated risks based on LR1 and LR2,
inter-observer differences in descriptions and measurements of adnexal masses using the IOTA
terminology and measurement technique need to be reduced. One way to achieve this could be by
providing courses on and training in how to examine and describe adnexal masses using the
IOTA terms. Interactive courses in which a large number of ultrasound images are discussed with
the course participants are likely to be very valuable in this respect. More precise definitions of
the IOTA terms, for example by providing ample imaging material, would probably also help
improve inter-observer agreement. Special attention should be given to the variables with poorest
reproducibility, i.e., the color score, wall irregularity, acoustic shadowing, and detection of blood
flow in papillary projections. Until better inter-observer agreement in the calculated risk of
malignancy using LR1 and LR2 has been shown, one should be cautious about using the risk
estimates for individual patient counselling.
Acknowledgements
None
References
1. Granberg S, Norström A, Wikland M. Tumors in the lower pelvis as imaged by vaginal
sonography. Gynecol Oncol 1990;37:224-9.
2. Benacerraf BR, Finkler NJ, Wojciechowski C, Knapp RC. Sonographic accuracy in the
diagnosis of ovarian masses. J Reprod Med 1990;35:491-5.
3. Valentin L. Pattern recognition of pelvic masses by gray-scale ultrasound imaging: the
contribution of Doppler ultrasound. Ultrasound Obstet Gynecol 1999;14:338-47.
4. Valentin L. Prospective cross-validation of Doppler ultrasound examination and gray-scale
ultrasound imaging for discrimination of benign and malignant pelvic masses.
Ultrasound Obstet Gynecol 1999;14:273-83.
5. Timmerman D, Schwärzler P, Collins WP, Claerhout F, Coenen M, Amant F, et al.
Subjective assessment of adnexal masses with the use of ultrasonography: an analysis
of interobserver variability and experience. Ultrasound Obstet Gynecol 1999;13:11-6.
6. Sokalska A, Timmerman D, Testa AC, Van Holsbeke C, Lissoni AA, Leone FPG, et al.
Diagnostic accuracy of transvaginal ultrasound examination for assigning a specific
diagnosis to adnexal masses. Ultrasound Obstet Gynecol 2009;34:462-70.
7. Valentin L. Use of morphology to characterize and manage common adnexal masses.
Best Pract Res Clin Obstet Gynaecol 2004;18:71-89.
8. Tailor A, Jurkovic D, Bourne TH, Collins WP, Campbell S. Sonographic prediction of
malignancy in adnexal masses using multivariate logistic regression analysis.
Ultrasound Obstet Gynecol 1997;10:41-7.
9. Timmerman D, Bourne TH, Tailor A, Collins WP, Verrelst H, Vandenberghe K, et al.
A comparison of methods for the preoperative discrimination between benign and
malignant adnexal masses: the development of a new logistic regression model. Am J
Obstet Gynecol 1999;181:57-65.
10. Alcazar JL, Jurado M. Prospective evaluation of logistic model based on sonographic
morphologic and color Doppler findings developed to predict adnexal malignancy. J
Ultrasound Med 1999;18:837-42.
11. Timmerman D, Valentin L, Bourne TH, Collins WP, Verrelst H, Vergote I. Terms,
definitions and measurements to describe the sonographic features of adnexal tumors: a
consensus opinion from the International Ovarian Tumor Analysis (IOTA) Group.
Ultrasound Obstet Gynecol 2000;16:500-5.
12. Timmerman D, Testa AC, Bourne T, Ferrazzi E, Ameye L, Konstantinovic ML, et al.
Logistic regression model to distinguish between the benign and malignant adnexal
mass before surgery: a multicenter study by the International Ovarian Tumor Analysis
Group. J Clin Oncol 2005;34:8794-801.
13. Kaijser J, Bourne T, Valentin L, Sayasneh A, Van Holsbeke C, Vergote I, et al.
Improving strategies for diagnosing ovarian cancer: a summary of the International
Ovarian Tumor Analysis (IOTA) studies. Ultrasound Obstet Gynecol 2013;41:9-20.
14. Timmerman D, Van Calster B, Testa AC, Guerriero S, Fischerova D, Lissoni AA, et al.
Ovarian cancer prediction in adnexal masses using ultrasound-based logistic regression
models: a temporal and external validation study by the IOTA group. Ultrasound Obstet
Gynecol 2010;36:226-34.
15. Sladkevicius P, Valentin L. Intra- and inter-observer agreement when describing
adnexal masses using the International Ovarian Tumour Analysis (IOTA) terms and
definitions: a study on three-dimensional (3D) ultrasound volumes. Ultrasound Obstet
Gynecol 2013;41:318-27.
16. Heintz APM, Odicino F, Maisonneuve P, Beller U, Benedet JL, Creasman WT, et al.
Carcinoma of the Ovary. 25th Annual Report on the Results of Treatment in
Gynecological Cancer. Int J Gynecol Obstet 2003;83(suppl 1):S135-S166.
17. Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas 1960;20:37-46.
18. Kundel HL, Polansky M. Measurement of observer agreement. Radiology 2003;228:303-8.
19. Brennan P, Silman A. Statistical methods for assessing observer variability in clinical
measures. BMJ 1992;304:1491-4.
20. Bland JM, Altman DG. Statistical methods for assessing agreement between two
methods of clinical measurement. Lancet 1986;1:307-10.
21. Bartlett JW, Frost C. Reliability, repeatability and reproducibility: analysis of
measurement errors in continuous variables. Ultrasound Obstet Gynecol 2008;31:466-75.
22. Kottner J, Audigé L, Brorson S, Donner A, Gajewski BJ, Hróbjartsson A, et al.
Guidelines for Reporting Reliability and Agreement Studies (GRRAS) were proposed. J
Clin Epidemiol 2011;64:96-106.
23. Ruiz de Gauna B, Sanchez P, Pineda L, Utrilla-Layna J, Juez L, Alcázar JL.
Inter-observer agreement with regard to describing adnexal masses using the IOTA simple
rules in a real-time setting and when using three-dimensional ultrasound volumes and
digital clips. Ultrasound Obstet Gynecol 2014;44:95-100.
24. Timmerman D, Testa AC, Bourne T, Ameye L, Jurkovic D, Van Holsbeke C, et al.
Simple ultrasound-based rules for the diagnosis of ovarian cancer. Ultrasound Obstet
Gynecol 2008;31:681-90.
25. Timmerman D, Ameye L, Fischerova D, Epstein E, Melis GB, Guerriero S, et al.
Simple ultrasound rules to distinguish between benign and malignant adnexal masses
before surgery: prospective validation by IOTA group. BMJ 2010;341:c6839.
26. Van Holsbeke C, Van Calster B, Testa AC, Domali E, Lu C, Van Huffel S, et al.
Prospective internal validation of mathematical models to predict malignancy in
adnexal masses: results from the international ovarian tumor analysis study. Clin Cancer
Res 2009;15:684-91.
27. Nunes N, Yazbek J, Ambler G, Hoo W, Naftalin J, Jurkovic D. Prospective evaluation
of the IOTA logistic regression model LR2 for the diagnosis of ovarian cancer.
Ultrasound Obstet Gynecol 2012;40:355-9.
28. Nunes N, Ambler G, Hoo WL, Naftalin J, Foo X, Widschwendter M, et al.
A prospective validation of the IOTA logistic regression models (LR1 and LR2) in
comparison to subjective pattern recognition for the diagnosis of ovarian cancer.
Int J Gynecol Cancer 2013;23:1583-9.
Table 1. Variables for which inter-observer reproducibility was estimated
___________________________________________________________________________
Variables included in logistic regression models LR1 and LR2 or of importance for these models
    Maximum diameter of adnexal mass, mm
    Maximum diameter of largest solid component, mm
    Maximum diameter of largest solid component: ≤50, >50 mm
    Presence of an entirely solid adnexal mass: yes, no
    Papillary projections present: yes, no
    Irregular internal cyst walls: yes, no
    Acoustic shadows: yes, no
    Color Doppler signals in papillary projection: yes, no
    Color score: 1, 2, 3, 4
    Tenderness of adnexal mass at scan: yes, no
    Ascites: yes, no
Other IOTA variables
  Continuous IOTA variables
    Mean diameter of adnexal mass, mm
    Mean diameter of largest solid component, mm
    Height of the largest papillary projection, mm
  Categorical IOTA variables
    Type of tumor: unilocular, unilocular solid, multilocular, multilocular solid, solid
    Mean diameter of adnexal mass: ≤40, 41-60, 61-80, 81-100, and >100 mm
    Number of cyst locules: 0, 1, 2, 3, 4, 5, 6-10, 11-20, and >20; ≤10, >10†
    Septum present: yes, no
    Incomplete septum present: yes, no
    Solid component: yes, no
    Papillary projections: 0, 1, 2, 3, ≥4; 1, 2, 3, ≥4‡; <4, >4†
    Echogenicity of cyst fluid: anechoic, low level, ground glass, mixed, no cyst fluid
    Color Doppler signals detectable: yes, no
    Ovarian crescent sign: yes, no
Diagnosis based on LR1 and LR2
    Calculated risk of malignancy using LR1: ≤10%, >10%
    Calculated risk of malignancy using LR2: ≤10%, >10%
___________________________________________________________________________
IOTA, international ovarian tumor analysis group; LR1, logistic regression model 1; LR2, logistic regression model 2.
The risk of malignancy using LR1 is derived as y = 1/(1 + e^{-z}), where
z = -6.7468 + 1.5985 (a) - 0.9983 (b) + 0.0326 (c) + 0.00841 (d) - 0.8577 (e) + 1.5513 (f) + 1.1737 (g) + 0.9281 (h) + 0.0496 (i) + 1.1421 (j) - 2.3550 (k) + 0.4916 (l)
and e is the mathematical constant and base value of natural logarithms. The 12 variables included in LR1 are: (a) personal history of ovarian cancer (yes = 1, no = 0); (b) current hormonal therapy (yes = 1, no = 0); (c) age of the patient (in years); (d) maximum diameter of the lesion (in millimeters); (e) mass tender at scan (yes = 1, no = 0); (f) the presence of ascites (yes = 1, no = 0); (g) the presence of blood flow within a solid papillary projection (yes = 1, no = 0); (h) the presence of a purely solid tumor (yes = 1, no = 0); (i) maximal diameter of the solid component (expressed in millimeters, but with no increase >50 mm); (j) irregular internal cyst walls (yes = 1, no = 0); (k) the presence of acoustic shadows (yes = 1, no = 0); and (l) color score (1, 2, 3, or 4).
The risk of malignancy using LR2 is derived as y = 1/(1 + e^{-z}), where
z = -5.3718 + 0.0354 (c) + 1.6159 (f) + 1.1768 (g) + 0.0697 (i) + 0.9586 (j) - 2.9486 (k) (12).
A calculated risk of malignancy of more than 10% classified the adnexal mass as malignant.
† includes 0. ‡ includes only cases where papillary projections were registered at both examinations.
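The two formulas in the footnote to Table 1 can be written as a short Python sketch. The coefficients and the 50 mm cap on the solid component come from the footnote; the function and parameter names are hypothetical, and the example patient is invented purely for illustration. This is not the software used in the study (the IOTA3 study screen and SPSS).

```python
import math

def _logistic(z):
    """Convert the linear predictor z into a probability y = 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))

def lr1_risk(history_ovarian_cancer, hormonal_therapy, age_years, max_lesion_mm,
             tender, ascites, papillary_flow, purely_solid, max_solid_mm,
             irregular_walls, shadows, color_score):
    """LR1 risk of malignancy; binary inputs are 1 (yes) or 0 (no)."""
    z = (-6.7468
         + 1.5985 * history_ovarian_cancer    # (a)
         - 0.9983 * hormonal_therapy          # (b)
         + 0.0326 * age_years                 # (c)
         + 0.00841 * max_lesion_mm            # (d)
         - 0.8577 * tender                    # (e)
         + 1.5513 * ascites                   # (f)
         + 1.1737 * papillary_flow            # (g)
         + 0.9281 * purely_solid              # (h)
         + 0.0496 * min(max_solid_mm, 50)     # (i), no increase above 50 mm
         + 1.1421 * irregular_walls           # (j)
         - 2.3550 * shadows                   # (k)
         + 0.4916 * color_score)              # (l), 1 to 4
    return _logistic(z)

def lr2_risk(age_years, ascites, papillary_flow, max_solid_mm,
             irregular_walls, shadows):
    """LR2 risk of malignancy; binary inputs are 1 (yes) or 0 (no)."""
    z = (-5.3718
         + 0.0354 * age_years                 # (c)
         + 1.6159 * ascites                   # (f)
         + 1.1768 * papillary_flow            # (g)
         + 0.0697 * min(max_solid_mm, 50)     # (i), no increase above 50 mm
         + 0.9586 * irregular_walls           # (j)
         - 2.9486 * shadows)                  # (k)
    return _logistic(z)

# Hypothetical 40-year-old patient with no solid component.
low = lr2_risk(age_years=40, ascites=0, papillary_flow=0, max_solid_mm=0,
               irregular_walls=0, shadows=0)
# Same patient, but ascites and irregular cyst walls are recorded.
high = lr2_risk(age_years=40, ascites=1, papillary_flow=0, max_solid_mm=0,
                irregular_walls=1, shadows=0)

for risk in (low, high):
    label = "malignant" if risk > 0.10 else "benign"
    print(f"calculated risk {100 * risk:.1f}% -> classified as {label}")
```

The 10% cutoff in the last lines mirrors the classification rule stated in the footnote.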
Table 2. Histological diagnoses of the masses
___________________________________________________________________________
Histological diagnosis of tumor                  Number
___________________________________________________________________________
Benign tumors                                        94
    Benign simple cyst                                7
    Endometrioma                                     10
    Dermoid cyst                                     16
    Serous cystadenoma                               16
    Mucinous cystadenoma                             18
    Myoma/fibroma                                     9
    Cystadenofibroma                                 11
    Paraovarian cyst                                  5
    Sactosalpinx/chronic salpingitis                  1
    Leydig cell tumor                                 1
Borderline tumors                                     4
    Serous                                            2
    Mucinous                                          1
    Endometrioid                                      1
Invasive malignancy                                  19
    Primary ovarian adenocarcinoma                   13
    Granulosa cell tumor                              3
    Dysgerminoma                                      1
    Leiomyosarcoma                                    1
    Malignant aggressive B-cell lymphoma              1
___________________________________________________________________________
Table 3. Inter-observer reproducibility of continuous variables used in the risk calculation models LR1 and LR2 and of other continuous variables
used to describe adnexal masses
__________________________________________________________________________________________________________________________________________
                                               Measurement results       Difference in mm between two measurements
                                               (both sonologists)        made by sonologists 1 and 2 (a)                  Intra-CC
Parameter                                      Median (n)     Range      Mean (n)        95% CI           Limits of agreement   Point estimate (95% CI)
__________________________________________________________________________________________________________________________________________
Variables used in models LR1 and LR2
  Maximum diameter of adnexal mass, mm         70 (n=234)     10 – 313   3.80 (n=117)    1.12 – 6.48      -25.24 – 32.82        0.958 (0.937 – 0.971)
  Maximum diameter of largest solid
  component, mm (b)                            29.50 (n=122)  5 – 180    1.92 (n=61)     -1.74 – 5.58     -26.66 – 30.50        0.942 (0.905 – 0.964)
Other variables used to describe adnexal mass
  Mean diameter of adnexal mass, mm            58.5 (n=234)   9 – 240    1.05 (n=117)    -0.15 – 1.95     -8.61 – 10.72         0.971 (0.958 – 0.980)
  Mean diameter of largest solid
  component, mm (b)                            22 (n=122)     4 – 156    0.59 (n=61)     -1.82 – 2.98     -18.16 – 19.32        0.962 (0.937 – 0.977)
  Height of largest papillary
  projection, mm (c)                           8 (n=42)       3 – 25     -0.51 (n=21)    -2.93 – 1.91     -11.61 – 10.59        0.609 (0.245 – 0.821)
__________________________________________________________________________________________________________________________________________
CI, confidence interval; Intra-CC, intra-class correlation coefficient.
(a) The measurements made by sonologist 2 were subtracted from the measurements made by sonologist 1.
(b) Includes only cases where solid components were registered by both observers.
(c) Includes only cases where papillary projections were registered by both observers. Using logarithmically transformed data, the results were as
follows: mean difference 1.03 (95% CI, 0.78 – 1.36), limits of agreement 0.31 – 3.61, Intra-CC 0.547 (0.153 – 0.789). These results can be used for
comparison with our published results on inter-observer agreement when analyzing ultrasound volumes obtained by three-dimensional
ultrasound (15).
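The mean difference, its 95% CI, and the limits of agreement reported in Table 3 can be computed along the following lines. This is a minimal sketch, not the SPSS procedure used in the study: the 95% CI uses mean difference ± 2 SE as described in the statistical analysis section, while the 1.96 SD multiplier for the limits of agreement is an assumption based on the usual Bland-Altman convention. The measurements in the example are invented.

```python
import math
import statistics

def bland_altman(first_observer, second_observer):
    """Summarise paired measurements as mean difference, 95% CI and limits of agreement."""
    # Differences are observer 1 minus observer 2, as in footnote (a) of Table 3.
    diffs = [a - b for a, b in zip(first_observer, second_observer)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd = statistics.stdev(diffs)          # sample standard deviation of the differences
    se = sd / math.sqrt(n)                # standard error of the mean difference
    ci95 = (mean_diff - 2 * se, mean_diff + 2 * se)
    limits = (mean_diff - 1.96 * sd, mean_diff + 1.96 * sd)  # assumed multiplier
    return {
        "mean_difference": mean_diff,
        "ci95": ci95,
        "limits_of_agreement": limits,
        "bias_assumed": not (ci95[0] <= 0 <= ci95[1]),
    }

# Hypothetical maximum diameters (mm) measured by two sonologists in five masses.
sonologist_1 = [70, 45, 120, 33, 81]
sonologist_2 = [66, 47, 110, 35, 80]
result = bland_altman(sonologist_1, sonologist_2)
print(result)
```

The `bias_assumed` flag follows the rule in the methods: if zero lies within the 95% CI of the mean difference, no systematic bias is assumed.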
Table 4. Inter-observer agreement for categorical variables included in the risk calculation models LR1 and LR2 and for other categorical variables used to describe adnexal masses
____________________________________________________________________________________________________________________
Variable                                                                            Agreement, % (n/N)    Kappa value
____________________________________________________________________________________________________________________
Variables used in models LR1 and LR2
  Presence of entirely solid tumor (yes, no)                                        97 (114/117)          0.91
  Maximum diameter of largest solid component (≤50 mm, >50 mm)                      95 (111/117)          0.83
  Maximum diameter of largest solid component (≤50 mm, >50 mm) (a)                  92 (56/61)            0.82
  Tenderness of adnexal mass (yes, no)                                              97 (113/117)          0.78
  Ascites (yes, no)                                                                 97 (113/117)          0.73
  Papillary projections (yes, no)                                                   87 (102/117)          0.65
  Solid component (yes, no)                                                         84 (98/117)           0.67
  Irregular cyst wall (yes, no)                                                     79 (92/117)           0.56
  Acoustic shadows (yes, no)                                                        85 (99/117)           0.58
  Color Doppler signals in papillary projections (yes, no) (b)                      90 (105/117)          0.48
  Color Doppler signals in papillary projections (yes, no) (c)                      86 (18/21)            0.71
  Color score (1, 2, 3, 4)                                                          40 (47/117)           0.36 (d)
Other variables used to describe adnexal masses
  Tumor type (unilocular, unilocular solid, multilocular, multilocular solid, solid) 97 (113/117)         0.70
  Mean diameter of tumor (mm): ≤40, 41-60, 61-80, 81-100, >100                      66 (77/117)           0.76 (d)
  Number of locules: 0, 1, 2, 3, 4, 5, 6-10, 11-20, >20                             58 (68/117)           0.79 (d)
  Number of locules: ≤10, >10 (e)                                                   94 (110/117)          0.80
  Septum (yes, no)                                                                  88 (103/117)          0.76
  Incomplete septum (yes, no)                                                       98 (115/117)          ¶
  Number of papillary projections: 0, 1, 2, 3, ≥4                                   81 (95/117)           0.64 (d)
  Number of papillary projections: 1, 2, 3, ≥4 (c)                                  67 (14/21)            0.63 (d)
  Number of papillary projections: <4, >4 (e)                                       96 (112/117)          0.68
  Echogenicity of cyst fluid (anechoic, low level, ground glass, mixed, no fluid)   67 (78/117)           0.56
  Color Doppler signals detectable (yes, no)                                        84 (98/117)           0.30
  Ovarian crescent sign (yes, no)                                                   78 (91/117)           0.49
  Fluid in Douglas pouch (yes, no)                                                  87 (102/117)          0.72
____________________________________________________________________________________________________________________
(a) Includes only cases where solid components were registered at both examinations.
(b) This variable reflects disagreement both with regard to the presence of papillary projections and the presence of blood flow in papillary projections (i.e., if one observer recorded the presence of papillary projections and the other did not, there was disagreement with regard to this variable, and if both observers recorded the presence of papillary projections but disagreed with regard to the presence of blood flow in the papillary projections, there was disagreement with regard to this variable).
(c) Includes only cases where papillary projections were registered at both examinations.
(d) Weighted Kappa index.
(e) Includes 0.
¶ Absent kappa values are explained by asymmetric field tables, making it impossible to calculate them.
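Table 5. Inter-observer reproducibility of the risk of malignancy calculated using the risk calculation models LR1 and LR2
__________________________________________________________________________________________________________________________________________
                                               Calculated risk of malignancy, %   Difference between the risk calculated
                                               (both sonologists)                 by sonologist 1 and 2                            Intra-CC
Parameter                                      Median (n)     Range               Mean (n)        95% CI          Limits of agreement   Point estimate (95% CI)
__________________________________________________________________________________________________________________________________________
Risk of malignancy calculated using LR1        7.85 (n=234)   0.10 – 99.10        -0.53 (n=117)   -3.07 – 2.01    -28.05 – 26.99        0.911 (0.874 – 0.937)
Risk of malignancy calculated using LR2        6.65 (n=234)   0.10 – 98.40        0.02 (n=117)    -3.06 – 3.10    -33.22 – 33.26        0.832 (0.766 – 0.880)
__________________________________________________________________________________________________________________________________________
CI, confidence interval; Intra-CC, intra-class correlation coefficient.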
Legends for figure
Figure 1. a) Scatterplot showing the relationship between inter-observer difference (observer
1 minus observer 2) in calculated risk and magnitude of calculated risk when using logistic
regression model LR1. The plot manifests a diamond shape, the differences being smallest for
the lowest and highest risks. For risks <25% and >95%, the differences are very small.
LOA, limits of agreement. b) Scatterplot showing the relationship between inter-observer
difference in calculated risk and magnitude of calculated risk when using logistic regression
model LR2. The plot manifests a diamond shape, the differences being smallest for the lowest
and highest risks.
Published OnlineFirst November 25, 2014. Clin Cancer Res. Povilas Sladkevicius and Lil Valentin.
Inter-observer agreement in describing the ultrasound appearance of adnexal masses and in calculating the risk of malignancy using logistic regression models.
Updated version: access the most recent version of this article at doi: 10.1158/1078-0432.CCR-14-0906
Supplementary material: access the most recent supplemental material at
http://clincancerres.aacrjournals.org/content/suppl/2014/11/26/1078-0432.CCR-14-0906.DC1
Doppler ultrasound examination. A color score of 1 indicates absence of color Doppler
signals, a color score of 2 a minimal amount of color Doppler signals, a color score of 3 a
moderate amount of color Doppler signals, and a color score of 4 a large amount of color
Doppler signals in the tumor (11).
The ultrasound systems used were GE Voluson 730 Expert or GE Voluson E8 (GE
Healthcare, Zipf, Austria) with a 5–9-MHz transvaginal transducer. For power Doppler
ultrasound examinations, the following settings were used for the Voluson 730 Expert system:
frequency 6-9 (normal) MHz, pulse repetition frequency 0.6 kHz, gain 0.8, wall motion
filter low 1 (40 Hz); and for Voluson E8: frequency 6-9 (normal) MHz, pulse repetition
frequency 0.6 kHz, gain -4.0, wall motion filter low 1 (40 Hz).
Statistical analysis
The IOTA3 study screen (astraia GmbH, Munich, Germany) was used to calculate the risk
of malignancy according to LR1. Weighted Kappa indices were calculated using the statistical
program Stata, Version 10.1 for Windows (StataCorp LP, College Station, TX, USA). For all
other statistical calculations, including calculation of the risk of malignancy when using LR2,
we used the Statistical Package for the Social Sciences (SPSS program, IBM Corp., New
York, NY, USA; PASW version 18.0).
Inter-observer agreement in the assessment of categorical variables was estimated by
calculating the percentage agreement. Cohen's kappa was used to estimate by how much the
observed agreement exceeded that expected by chance (17). Weighted kappa values are
presented where appropriate (18). It has been suggested that kappa values >0.81 indicate very
good agreement beyond chance, kappa values between 0.61 and 0.80 good agreement beyond
chance, kappa values between 0.41 and 0.60 moderate agreement beyond chance, kappa values
between 0.21 and 0.40 fair agreement beyond chance, and kappa values <0.20 poor agreement
beyond chance (19).
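The agreement measures described above can be sketched in a few lines of Python. This is only an illustration, not the Stata or SPSS routines the authors used: the function names are hypothetical, linear weights are assumed for the weighted kappa (the text does not state the weighting scheme), and the handling of values falling exactly between the published bands (for example 0.805) is an assumption.

```python
def percentage_agreement(ratings_1, ratings_2):
    """Percentage of cases in which the two observers gave the same rating."""
    same = sum(1 for a, b in zip(ratings_1, ratings_2) if a == b)
    return 100.0 * same / len(ratings_1)


def cohen_kappa(ratings_1, ratings_2, categories, weighted=False):
    """Cohen's kappa for two observers.

    With weighted=True, linear weights are used (an assumption), which only
    makes sense for ordered categories such as the color score 1-4.
    """
    n = len(ratings_1)
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}

    # Cross-tabulate the paired ratings as counts.
    table = [[0] * k for _ in range(k)]
    for a, b in zip(ratings_1, ratings_2):
        table[index[a]][index[b]] += 1

    row_tot = [sum(table[i]) for i in range(k)]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        if weighted:
            return 1.0 - abs(i - j) / (k - 1)
        return 1.0 if i == j else 0.0

    cells = [(i, j) for i in range(k) for j in range(k)]
    observed = sum(weight(i, j) * table[i][j] for i, j in cells) / n
    expected = sum(weight(i, j) * row_tot[i] * col_tot[j] for i, j in cells) / (n * n)
    return (observed - expected) / (1.0 - expected)


def interpret_kappa(kappa):
    """Verbal label following the bands quoted in the text (19).

    Band edges between the published ranges are assigned here by
    simple thresholds, which is an assumption.
    """
    if kappa > 0.80:
        return "very good"
    if kappa > 0.60:
        return "good"
    if kappa > 0.40:
        return "moderate"
    if kappa > 0.20:
        return "fair"
    return "poor"
```

For a binary variable such as acoustic shadowing one would call `cohen_kappa(obs1, obs2, ["no", "yes"])`, and for the color score `cohen_kappa(obs1, obs2, [1, 2, 3, 4], weighted=True)`.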
Inter-observer reproducibility of measurement results, including the calculated risks of
malignancy using LR1 and LR2, was described as the difference between two measurement
results. The differences between the measured values were plotted against the mean of the two
measurements (Bland-Altman plots) to assess the relationship between the differences and the
magnitude of the measurements (20). Systematic bias between two measurements was
estimated by calculating the 95% confidence interval (CI) of the mean difference (mean
difference ± 2 SE). If zero lay within this interval, no bias was assumed to exist between the
two measurements. Inter-observer agreement was expressed as the mean difference and limits
of agreement (20). Ninety-five percent of differences between any future measurements are
estimated to fall between the lower and upper limit of agreement. Inter-observer reliability of
measurement results was estimated by calculating the intra-class correlation coefficient
(ICC) using analysis of variance (two-way random model, absolute agreement; this allows
generalization of the results to a population of observers). The ICC indicates the proportion of
the total variance in measurement results that can be explained by differences between the
individuals examined. It depends both on the magnitude of measurement errors and on the true
heterogeneity in the population in which measurements are made. The more variable the
population investigated, the greater the ICC, and the less variable the population, the smaller
the ICC (21). It has been suggested that ICC values >0.90 are needed for a test to be used in
clinical practice (22).
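As a rough illustration of the calculations described above, the sketch below computes the mean inter-observer difference with its 95% CI (mean ± 2 SE, as stated in the text), the limits of agreement, and an ICC from a two-way ANOVA. Several points are assumptions rather than details given by the authors: the limits of agreement are taken as mean ± 1.96 SD following Bland and Altman (20), the ICC is the single-measures form of the two-way random, absolute-agreement model, and all function names are hypothetical.

```python
import math
import statistics


def bland_altman(obs1, obs2):
    """Mean difference, its 95% CI (mean +/- 2 SE) and limits of agreement."""
    diffs = [a - b for a, b in zip(obs1, obs2)]
    mean_diff = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    se = sd / math.sqrt(len(diffs))
    ci = (mean_diff - 2 * se, mean_diff + 2 * se)
    # Limits of agreement: mean +/- 1.96 SD (assumed multiplier).
    limits = (mean_diff - 1.96 * sd, mean_diff + 1.96 * sd)
    # No systematic bias is assumed if zero lies within the CI.
    bias = not (ci[0] <= 0.0 <= ci[1])
    return {"mean_diff": mean_diff, "ci95": ci, "limits": limits, "bias": bias}


def icc_two_way_random_absolute(obs1, obs2):
    """Single-measures ICC, two-way random model, absolute agreement (k = 2)."""
    n, k = len(obs1), 2
    values = list(obs1) + list(obs2)
    grand = sum(values) / (n * k)
    subject_means = [(a + b) / 2 for a, b in zip(obs1, obs2)]
    observer_means = [sum(obs1) / n, sum(obs2) / n]

    ss_subjects = k * sum((m - grand) ** 2 for m in subject_means)
    ss_observers = n * sum((m - grand) ** 2 for m in observer_means)
    ss_total = sum((v - grand) ** 2 for v in values)
    ss_error = ss_total - ss_subjects - ss_observers

    ms_subjects = ss_subjects / (n - 1)
    ms_observers = ss_observers / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_subjects - ms_error) / (
        ms_subjects + (k - 1) * ms_error + k * (ms_observers - ms_error) / n
    )
```

Because the absolute-agreement ICC penalizes a constant offset between observers, a systematic difference such as one sonologist measuring consistently larger diameters lowers the ICC even when the two sets of measurements are perfectly correlated.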
The sensitivity and specificity of LR1 and LR2 with regard to malignancy were calculated
separately using the information provided by sonologist 1 and by sonologist 2.
Results
In all, 117 consecutive women with adnexal masses who underwent surgery were
examined with ultrasound by the two sonologists as described above. Thirty-four women had
bilateral adnexal masses. The most complex mass, or the largest one if both masses had
similar ultrasound morphology, was used in our statistical analysis, the mass to be included
being selected retrospectively to ensure that both sonologists contributed the same mass (right
or left) to the analysis. Thus, 117 adnexal masses from 117 patients constitute our study
population. The women's age ranged between 14 and 88 years (median 53), and 63 (54%)
women were postmenopausal. There were 94 benign, four borderline and 19 invasively
malignant adnexal masses (Table 2).
The time elapsed between the ultrasound examinations of sonologists 1 and 2 was median 61
days (10th and 90th percentiles 13 and 132, range 1-204) for the tumors with benign histology
and median 14 days (10th and 90th percentiles 2 and 31, range 1-41) for the tumors with
malignant histology. There was no relationship between the number of days between the
scans and the differences in measurement results or inter-observer agreement for discrete
variables (Supplementary Fig. S1-S5 and Supplementary Table S1).
Inter-observer reproducibility of measurement results is shown in Table 3. Bland-Altman
plots showed no clear trend for inter-observer differences in measurement results to change
with the magnitude of the measurement values. Limits of agreement were wide for all
measurements. There was one systematic difference between the two sonologists: sonologist 1
(who always performed the first examination) obtained higher measurement values for the
maximum diameter of the mass. The least reliable measurement was the height of the largest
papillary projection.
Inter-observer agreement when assessing categorical ultrasound variables is shown in Table
4. For most categorical ultrasound variables, inter-observer agreement beyond chance was good
or very good (19). Inter-observer agreement beyond chance for variables included in LR1 or
LR2 was poorest for color score (agreement 40%, weighted Kappa 0.36), presence of blood
flow in papillary projection (agreement 90%, Kappa 0.48), irregular cyst wall (agreement 79%,
Kappa 0.56) and acoustic shadowing (agreement 85%, Kappa 0.58).
Bland-Altman plots illustrating the relationship between the magnitudes of the estimated
risk of malignancy calculated using LR1 and LR2 and the interobserver difference in
calculated risk are shown in Figure 1. The plots manifest a diamond shape, i.e. the
interobserver differences are smallest for the lowest and highest risks, and they are very small
for risks <2.5% and >95%. Logarithmic transformation of the data (20) did not substantially
change the shape of the scatter plot. Therefore, we present our results as absolute
inter-observer differences in calculated risk (in percentage units), see Table 5. There were no
systematic differences in calculated risks between the two sonologists, and reliability
reflected by the ICC values was good (22), with ICC values of 0.911 for LR1 and 0.832 for
LR2. When classifying tumors as having a risk of malignancy <10% (benign) or >10%
(malignant) using LR1 or LR2, the inter-observer agreement was good for both models:
inter-observer agreement 84% (98/117), Kappa value 0.68 for model LR1, and inter-observer
agreement 85% (99/117), Kappa 0.68 for model LR2. In the 19 cases where the two
sonologists obtained different results with regard to malignancy when using LR1, the absolute
interobserver differences in calculated risk ranged from 0.7 to 59.6 percentage units: in six of
the 19 cases the absolute interobserver difference in calculated risk was <10.0 percentage
units, in nine cases it was 10.0 – 24.9 percentage units, and in four cases it was >25.0
percentage units. In the 18 cases where the two sonologists obtained different results with
regard to malignancy when using LR2, the absolute interobserver difference in calculated risk
ranged from 8.8 to 67.9 percentage units: in two of the 18 cases the absolute interobserver
difference in calculated risk was <10.0 percentage units, in ten cases it was 10.0 – 24.9
percentage units, and in six cases it was >25 percentage units.
The Bland-Altman plots (Figure 1) illustrate that for some tumors there were substantial
interobserver differences in the calculated risk of malignancy: when using LR1 the
interobserver difference in calculated risk was >25 percentage units in 11 tumors (9% of all
tumors), and when using LR2 the interobserver difference in calculated risk was >25
percentage units in 14 tumors (12% of all tumors). To elucidate which discrepancies in
ultrasound variables explained these largest interobserver differences in calculated risk, we
scrutinized each case where the difference was >25 percentage units. The results are shown in
Supplementary Tables S2 and S3. When using LR1, a discrepancy for one single categorical
variable explained the difference in four of the 11 cases, while a discrepancy for two
categorical variables explained the difference in one case (differences in measurements being
<5 mm in these five cases). In six cases there were differences in one or two categorical
variables but also substantial differences (6-61 mm) in at least one measurement result. In no
case was the large difference in calculated risk explained exclusively by differences in
measurement results. The categorical variables judged differently by the two sonologists in
these 11 cases were color score (n = 5), irregular cyst wall (n = 5), flow in papillary projection
(n = 3) and acoustic shadowing (n = 2).
When using LR2, a discrepancy for one single categorical variable explained the large
difference in calculated risk (>25 percentage units) in eight of the 14 cases (differences in
measurements being <5 mm in these eight cases), and in four of the eight cases the sonologists
judged acoustic shadowing differently. In five cases there were differences in one categorical
variable but also a substantial difference (9 mm-61 mm) in the measurement of the largest solid
component. In yet another case there were differences in two categorical variables as well as in
the measurement of the largest solid component. The categorical variables judged differently
by the two sonologists in these 14 cases were acoustic shadowing (n = 5), irregular cyst wall
(n = 5), ascites (n = 3) and flow in papillary projection (n = 2).
The sensitivity with regard to malignancy when using LR1 (10% risk cutoff) was 100%
(23/23, 95% CI 82-100) for both sonologists; the specificity was 74% (70/94, 95% CI 64-82)
for sonologist 1 and 63% (59/94, 95% CI 53-72) for sonologist 2. The sensitivity when using
LR2 was 100% (23/23, 95% CI 82-100) for sonologist 1 and 91% (21/23, 95% CI 72-98) for
sonologist 2, and the specificity was 75.5% (71/94, 95% CI 65-84) for both sonologists.
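The sensitivity and specificity figures above follow from dichotomizing each calculated risk at the 10% cutoff and comparing the result with histology. A minimal sketch of that calculation is given below. The function names are hypothetical, and the Wilson score interval is an assumption: this section does not state which method was used for the 95% CIs, so intervals computed this way may differ slightly from the published ones.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a proportion (assumed CI method)."""
    p = successes / total
    denom = 1.0 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def sensitivity_specificity(risks, is_malignant, cutoff=0.10):
    """Classify risk > cutoff as malignant and compare with histology.

    Borderline tumors are counted as malignant, matching the 23 malignant
    cases (19 invasive + 4 borderline) used as denominator in the text.
    Both classes must be present, otherwise a division by zero occurs.
    """
    tp = fn = tn = fp = 0
    for risk, malignant in zip(risks, is_malignant):
        predicted_malignant = risk > cutoff
        if malignant and predicted_malignant:
            tp += 1
        elif malignant:
            fn += 1
        elif predicted_malignant:
            fp += 1
        else:
            tn += 1
    return {
        "sensitivity": tp / (tp + fn),
        "sensitivity_ci": wilson_interval(tp, tp + fn),
        "specificity": tn / (tn + fp),
        "specificity_ci": wilson_interval(tn, tn + fp),
    }
```

Running the function once per sonologist, on the risks each obtained for the same 117 masses, gives the per-observer comparison reported in this paragraph.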
Discussion
We have shown substantial inter-observer variability in the results of measurements taken in
adnexal masses (wide limits of agreement). Inter-observer agreement beyond chance was very
good or good for most categorical variables, but it was only moderate or fair for some.
Inter-observer agreement beyond chance was poorest for variables heavily dependent on
subjective evaluation and/or machine settings, i.e. color score, presence of color Doppler
signals in papillary projections, irregular cyst walls, acoustic shadowing (all four variables
being included in LR1 or LR2), echogenicity of cyst fluid and ovarian crescent sign. Despite
this, there was good inter-observer agreement when classifying tumors as benign or malignant
using the predetermined risk of malignancy cut-off of 10%. However, in some cases there were
substantial differences in the calculated risk of malignancy between the two sonologists, the
difference being >25.0 percentage units in 9% of all tumors when using LR1 and in 12% of all
tumors when using LR2.
The strength of our study is that it provides new information. To the best of our knowledge,
there is only one publication reporting on interobserver agreement with regard to describing
ultrasound findings in adnexal masses using the IOTA terminology (11) when performing live
ultrasound examinations (23). However, that study (23) evaluated interobserver agreement
with regard to the ten ultrasound features in the IOTA simple rules (24, 25), not the variables
included in the IOTA logistic regression models LR1 and LR2, and agreement was estimated
between examiners with different levels of experience. The variable with poorest agreement
beyond chance in the study cited was acoustic shadowing (Kappa 0.36). We have found no
published study that has estimated inter-observer reproducibility of the calculated risk of
malignancy using LR1 or LR2 after live scanning.
It is a limitation of our study that up to 204 days elapsed between the scans of the two
sonologists (up to 41 days for malignant masses). Because days elapsed between the scans,
theoretically the inter-observer differences could be explained by the lesions having changed
in size or morphology between the scans. We find this highly unlikely for the following
reasons. First, there was no relationship between the differences in measurement results and
the number of days between the scans (Supplementary Fig. S1-S5). Nor was there a clear
tendency for inter-observer agreement for discrete variables to depend on the time between the
scans (Supplementary Table S1). Second, one would expect a lesion and its components to
increase in size with time, but sonologist 1, performing the first scan, obtained higher
measurement values than sonologist 2. Third, it is our experience, after having performed
gynecological scans for more than 20 years, that the ultrasound morphology of both benign and
malignant adnexal masses remains constant over time, that benign adnexal lesions grow
slowly, and that malignant masses do not change appreciably in size even during 1 month of
observation. Therefore, we believe that the discrepancies between the two sonologists reflect
true inter-observer differences and not a change of the masses over time. A second limitation is
that we did not include estimation of the reproducibility of retrieving anamnestic information
(current hormonal therapy, personal history of ovarian cancer), the anamnestic information
collected by the second sonologist being used in all cases. It cannot be entirely excluded that
patients would answer differently when asked by different sonologists, or that sonologists
could interpret the answers of the patients differently. A third limitation is that we did not
estimate intra-observer reproducibility. We considered four scans (two per sonologist) likely to
be unacceptable to patients. For the same reason, only two sonologists were involved in this
study, and our results are generalizable only to sonologists with a similar level of experience.
The results of this live scanning study are similar to those of another study in which the
same sonologists assessed the same variables using 3D ultrasound volumes from adnexal
masses in another tumor population (15). The similarity in results between the two studies is
surprising, because the conditions when assessing 3D ultrasound volumes are different from
those during a live scan. When evaluating ultrasound volumes, sonologists are exposed to the
same ultrasound images, and so any interobserver difference should be explained exclusively
by differences in interpreting the ultrasound information. During a live scan there are more
sources of bias. This could result in either poorer or better interobserver agreement than when
3D ultrasound volumes are assessed: poorer because ultrasound examiners are likely to use
different machine settings and scanning conditions may change from one minute to another,
better because the dynamic nature of live scanning facilitates discrimination between solid
components and amorphous tissue.
Our results showed that two experienced sonologists agreed quite well in their classification
of masses as benign or malignant using the 10% risk of malignancy cutoff of LR1 and LR2,
and that the diagnostic performance of LR1 and LR2 with regard to discrimination between
benign and malignant tumors was similar for the two sonologists and similar to that reported
by others (14, 26-28). This is reassuring, because the main purpose of using models LR1 and
LR2 is to classify tumors as benign or malignant. Potentially, however, LR1 and LR2 can be
used not only to classify adnexal masses as benign or malignant but also to counsel a patient
about her individual risk of malignancy (13). If the calculated risk is to be used for individual
counseling, one must be reasonably certain not only that the estimated risk agrees well with the
true risk (when externally validated, both LR1 and LR2 underestimated the true risk, especially
in the risk interval 30-70% (14)) but also that the risk estimates are reproducible, i.e. that
different examiners will obtain similar risk estimates. Our results show that risk estimates may
differ substantially between experienced observers, the difference in estimated risk being >25.0
percentage units in 9% and 12% of cases when using LR1 and LR2, respectively. Interobserver
agreement beyond chance was poorest for those variables in the models that are heavily
dependent on subjective evaluation, i.e. color score, presence of color Doppler signals in
papillary projections, irregular cyst walls and acoustic shadowing. Indeed, differences in these
variables
explained most of the largest inter-observer differences in calculated risk of malignancy. In
models based on few variables, changing the value of only one variable may result in large
differences in predicted risks, while a model with many variables is less vulnerable to a change
in one or even a few variables. Our results illustrate this (Supplementary Tables S2 and S3).
When using LR2 (which includes six variables), a change in value for one single categorical
variable explained an inter-observer difference in calculated risk >25 percentage units in eight
of 14 cases, while when using LR1 (which includes 12 variables) a change in value for one
single categorical variable explained an inter-observer difference in calculated risk >25
percentage units in only four of 11 cases. Acoustic shadowing is a strong variable in both LR1
and LR2 and has great impact on the calculated risk in LR2, with only six variables. In our
hands, as well as in those of Ruiz de Gauna et al. (23), inter-observer agreement for acoustic
shadowing was at most moderate. The interobserver agreement for color score was only fair in
our study, and color score is an important variable in LR1.
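The size of the effect of a single disagreement can be made concrete with a short worked calculation. It assumes the LR2 coefficient for acoustic shadows in the Table 1 footnote is read as -2.9486 and the LR1 coefficient for color score as 0.4916 (decimal points restored), and the starting risk of 50% is chosen purely for illustration.

```latex
% Both models have the form y = 1/(1 + e^{-z}), so the odds are y/(1-y) = e^{z}.
% If one sonologist records acoustic shadows and the other does not, z in LR2
% differs by 2.9486 and the odds differ by the factor
\[
\frac{\mathrm{odds}_{\text{shadows}}}{\mathrm{odds}_{\text{no shadows}}}
  = e^{-2.9486} \approx 0.052 .
\]
% Illustration: a mass with z = 0 (risk 50%) without shadows has, with shadows,
\[
y = \frac{1}{1 + e^{2.9486}} \approx 0.05 ,
\]
% i.e. a difference of about 45 percentage units from one binary variable.
% In LR1, each one-step difference in color score multiplies the odds by
\[
e^{0.4916} \approx 1.63 .
\]
```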
To improve inter-observer reproducibility of calculated risks based on LR1 and LR2,
inter-observer differences in descriptions and measurements of adnexal masses using the IOTA
terminology and measurement technique need to be reduced. One way to achieve this could be by
providing courses on and training in how to examine and describe adnexal masses using the
IOTA terms. Interactive courses in which a large number of ultrasound images are discussed with
the course participants are likely to be very valuable in this respect. More precise definitions of
the IOTA terms, for example by providing ample imaging material, would probably also help
improve inter-observer agreement. Special attention should be given to the variables with poorest
reproducibility, i.e. the color score, wall irregularity, acoustic shadowing and detection of blood
flow in papillary projections. Until better inter-observer agreement in the calculated risk of
malignancy using LR1 and LR2 has been shown, one should be cautious about using the risk
estimates for individual patient counselling.
Acknowledgements
None
References
1. Granberg S, Norström A, Wikland M. Tumors in the lower pelvis as imaged by vaginal
sonography. Gynecol Oncol 1990;37:224-9.
2. Benacerraf BR, Finkler NJ, Wojciechowski C, Knapp RC. Sonographic accuracy in the
diagnosis of ovarian masses. J Reprod Med 1990;35:491-5.
3. Valentin L. Pattern recognition of pelvic masses by gray-scale ultrasound imaging: the
contribution of Doppler ultrasound. Ultrasound Obstet Gynecol 1999;14:338-47.
4. Valentin L. Prospective cross-validation of Doppler ultrasound examination and
gray-scale ultrasound imaging for discrimination of benign and malignant pelvic masses.
Ultrasound Obstet Gynecol 1999;14:273-83.
5. Timmerman D, Schwärzler P, Collins WP, Claerhout F, Coenen M, Amant F, et al.
Subjective assessment of adnexal masses with the use of ultrasonography: an analysis
of interobserver variability and experience. Ultrasound Obstet Gynecol 1999;13:11-6.
6. Sokalska A, Timmerman D, Testa AC, Van Holsbeke C, Lissoni AA, Leone FPG, et al.
Diagnostic accuracy of transvaginal ultrasound examination for assigning a specific
diagnosis to adnexal masses. Ultrasound Obstet Gynecol 2009;34:462-70.
7. Valentin L. Use of morphology to characterize and manage common adnexal masses.
Best Pract Res Clin Obstet Gynaecol 2004;18:71-89.
8. Tailor A, Jurkovic D, Bourne TH, Collins WP, Campbell S. Sonographic prediction of
malignancy in adnexal masses using multivariate logistic regression analysis.
Ultrasound Obstet Gynecol 1997;10:41-7.
9. Timmerman D, Bourne TH, Tailor A, Collins WP, Verrelst H, Vandenberghe K, et al.
A comparison of methods for the preoperative discrimination between benign and
malignant adnexal masses: the development of a new logistic regression model. Am J
Obstet Gynecol 1999;181:57-65.
10. Alcazar JL, Jurado M. Prospective evaluation of logistic model based on sonographic
morphologic and color Doppler findings developed to predict adnexal malignancy. J
Ultrasound Med 1999;18:837-42.
11. Timmerman D, Valentin L, Bourne TH, Collins WP, Verrelst H, Vergote I. Terms,
definitions and measurements to describe the sonographic features of adnexal tumors: a
consensus opinion from the International Ovarian Tumor Analysis (IOTA) Group.
Ultrasound Obstet Gynecol 2000;16:500-5.
12. Timmerman D, Testa AC, Bourne T, Ferrazzi E, Ameye L, Konstantinovic ML, et al.
Logistic regression model to distinguish between the benign and malignant adnexal
mass before surgery: a multicenter study by the International Ovarian Tumor Analysis
Group. J Clin Oncol 2005;34:8794-801.
13. Kaijser J, Bourne T, Valentin L, Sayasneh A, Van Holsbeke C, Vergote I, et al.
Improving strategies for diagnosing ovarian cancer: a summary of the International
Ovarian Tumor Analysis (IOTA) studies. Ultrasound Obstet Gynecol 2013;41:9-20.
14. Timmerman D, Van Calster B, Testa AC, Guerriero S, Fischerova D, Lissoni AA, et al.
Ovarian cancer prediction in adnexal masses using ultrasound-based logistic regression
models: a temporal and external validation study by the IOTA group. Ultrasound Obstet
Gynecol 2010;36:226-34.
15. Sladkevicius P, Valentin L. Intra- and inter-observer agreement when describing
adnexal masses using the International Ovarian Tumour Analysis (IOTA) terms and
definitions: a study on three-dimensional (3D) ultrasound volumes. Ultrasound Obstet
Gynecol 2013;41:318-27.
16. Heintz APM, Odicino F, Maisonneuve P, Beller U, Benedet JL, Creasman WT, et al.
Carcinoma of the Ovary. 25th Annual Report on the Results of Treatment in
Gynecological Cancer. Int J Gynecol Obstet 2003;83:S135-S166 (suppl 1).
17. Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas 1960;20:
37–46.
18. Kundel HL, Polansky M. Measurement of observer agreement. Radiology 2003;228:
303-8.
19. Brennan P, Silman A. Statistical methods for assessing observer variability in clinical
measures. BMJ 1992;304:1491-4.
20. Bland JM, Altman DG. Statistical methods for assessing agreement between two
methods of clinical measurement. Lancet 1986;1:307-10.
21. Bartlett JW, Frost C. Reliability, repeatability and reproducibility: analysis of
measurement errors in continuous variables. Ultrasound Obstet Gynecol 2008;31:466-75.
22. Kottner J, Audigé L, Brorson S, Donner A, Gajewski BJ, Hróbjartsson A, et al.
Guidelines for Reporting Reliability and Agreement Studies (GRRAS) were proposed. J
Clin Epidemiol 2011;64:96-106.
23. Ruiz de Gauna B, Sanchez P, Pineda L, Utrilla-Layna J, Juez L, Alcázar JL.
Inter-observer agreement with regard to describing adnexal masses using the IOTA simple
rules in a real-time setting and when using three-dimensional ultrasound volumes and
digital clips. Ultrasound Obstet Gynecol 2014;44:95-100.
24. Timmerman D, Testa AC, Bourne T, Ameye L, Jurkovic D, Van Holsbeke C, et al.
Simple ultrasound-based rules for the diagnosis of ovarian cancer. Ultrasound Obstet
Gynecol 2008;31:681-90.
25. Timmerman D, Ameye L, Fischerova D, Epstein E, Melis GB, Guerriero S, et al.
Simple ultrasound rules to distinguish between benign and malignant adnexal masses
before surgery: prospective validation by IOTA group. BMJ 2010;341:c6839.
26. Van Holsbeke C, Van Calster B, Testa AC, Domali E, Lu C, Van Huffel S, et al.
Prospective internal validation of mathematical models to predict malignancy in
adnexal masses: results from the international ovarian tumor analysis study. Clin Cancer
Res 2009;15:684-91.
27. Nunes N, Yazbek J, Ambler G, Hoo W, Naftalin J, Jurkovic D. Prospective evaluation
of the IOTA logistic regression model LR2 for the diagnosis of ovarian cancer.
Ultrasound Obstet Gynecol 2012;40:355-9.
28. Nunes N, Ambler G, Hoo WL, Naftalin J, Foo X, Widschwendter M, et al.
A prospective validation of the IOTA logistic regression models (LR1 and LR2) in
comparison to subjective pattern recognition for the diagnosis of ovarian cancer.
Int J Gynecol Cancer 2013;23:1583-9.
Table 1. Variables for which inter-observer reproducibility was estimated
___________________________________________________________________________
Variables included in logistic regression models LR1 and LR2 or of importance for these models
  Maximum diameter of adnexal mass                 mm
  Maximum diameter of largest solid component      mm
  Maximum diameter of largest solid component      ≤50, >50 mm
  Presence of an entirely solid adnexal mass       yes, no
  Papillary projections present                    yes, no
  Irregular internal cyst walls                    yes, no
  Acoustic shadows                                 yes, no
  Color Doppler signals in papillary projection    yes, no
  Color score                                      1, 2, 3, 4
  Tenderness of adnexal mass at scan               yes, no
  Ascites                                          yes, no
Other IOTA variables
 Continuous IOTA variables
  Mean diameter of adnexal mass                    mm
  Mean diameter of largest solid component         mm
  Height of the largest papillary projection       mm
 Categorical IOTA variables
  Type of tumor                                    unilocular, unilocular solid, multilocular, multilocular solid, solid
  Mean diameter of adnexal mass                    ≤40, 41-60, 61-80, 81-100 and >100 mm
  Number of cyst locules                           0, 1, 2, 3, 4, 5, 6-10, 11-20 and >20; ≤10, >10†
  Septum present                                   yes, no
  Incomplete septum present                        yes, no
  Solid component                                  yes, no
  Papillary projections                            0, 1, 2, 3, ≥4; 1, 2, 3, ≥4‡; <4, >4†
  Echogenicity of cyst fluid                       anechoic, low level, ground glass, mixed, no cyst fluid
  Color Doppler signals detectable                 yes, no
  Ovarian crescent sign                            yes, no
Diagnosis based on LR1 and LR2
  Calculated risk of malignancy using LR1          ≤10%, >10%
  Calculated risk of malignancy using LR2          ≤10%, >10%
___________________________________________________________________________
IOTA, international ovarian tumor analysis group; LR1, logistic regression model 1; LR2, logistic regression model 2.
The risk of malignancy using LR1 is derived as $y = 1/(1 + e^{-z})$, where $z = -6.7468 + 1.5985(a) - 0.9983(b) + 0.0326(c) + 0.00841(d) - 0.8577(e) + 1.5513(f) + 1.1737(g) + 0.9281(h) + 0.0496(i) + 1.1421(j) - 2.3550(k) + 0.4916(l)$ and $e$ is the mathematical constant and base value of natural logarithms. The 12 variables included in LR1 are (a) personal history of ovarian cancer (yes = 1, no = 0), (b) current hormonal therapy (yes = 1, no = 0), (c) age of the patient (in years), (d) maximum diameter of the lesion (in millimeters), (e) mass tender at scan (yes = 1, no = 0), (f) the presence of ascites (yes = 1, no = 0), (g) the presence of blood flow within a solid papillary projection (yes = 1, no = 0), (h) the presence of a purely solid tumor (yes = 1, no = 0), (i) maximal diameter of the solid component (expressed in millimeters, but with no increase > 50 mm), (j) irregular internal cyst walls (yes = 1, no = 0), (k) the presence of acoustic shadows (yes = 1, no = 0), and (l) color score (1, 2, 3 or 4).
The risk of malignancy using LR2 is derived as $y = 1/(1 + e^{-z})$, where $z = -5.3718 + 0.0354(c) + 1.6159(f) + 1.1768(g) + 0.0697(i) + 0.9586(j) - 2.9486(k)$ (12). A calculated risk of malignancy of more than 10% classified the adnexal mass as malignant.
† includes 0. ‡ includes only cases where papillary projections were registered at both examinations.
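The two formulas in the Table 1 footnote translate directly into code. The sketch below is an illustration only, not the IOTA3 study screen or the SPSS syntax the authors used. The coefficients are those of the footnote with decimal points restored, the function and parameter names are hypothetical, and applying the 50 mm cap on the solid component to LR2 as well as LR1 is an assumption based on the shared definition of variable (i).

```python
import math


def _logistic(z):
    """y = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def lr1_risk(history_ovarian_cancer, hormonal_therapy, age_years,
             max_lesion_diameter_mm, tender_at_scan, ascites,
             flow_in_papillary_projection, purely_solid,
             max_solid_diameter_mm, irregular_walls, acoustic_shadows,
             color_score):
    """Risk of malignancy (0-1) according to LR1; binary inputs are 0 or 1."""
    solid = min(max_solid_diameter_mm, 50.0)  # no increase above 50 mm
    z = (-6.7468
         + 1.5985 * history_ovarian_cancer
         - 0.9983 * hormonal_therapy
         + 0.0326 * age_years
         + 0.00841 * max_lesion_diameter_mm
         - 0.8577 * tender_at_scan
         + 1.5513 * ascites
         + 1.1737 * flow_in_papillary_projection
         + 0.9281 * purely_solid
         + 0.0496 * solid
         + 1.1421 * irregular_walls
         - 2.3550 * acoustic_shadows
         + 0.4916 * color_score)
    return _logistic(z)


def lr2_risk(age_years, ascites, flow_in_papillary_projection,
             max_solid_diameter_mm, irregular_walls, acoustic_shadows):
    """Risk of malignancy (0-1) according to LR2; binary inputs are 0 or 1."""
    solid = min(max_solid_diameter_mm, 50.0)  # cap assumed to apply here too
    z = (-5.3718
         + 0.0354 * age_years
         + 1.6159 * ascites
         + 1.1768 * flow_in_papillary_projection
         + 0.0697 * solid
         + 0.9586 * irregular_walls
         - 2.9486 * acoustic_shadows)
    return _logistic(z)


def classify(risk, cutoff=0.10):
    """A risk of more than 10% classifies the mass as malignant."""
    return "malignant" if risk > cutoff else "benign"
```

Calling `lr2_risk` twice with the same inputs except `acoustic_shadows` shows how strongly a single disagreement between observers can move the estimate, which is the effect examined in the Results and Discussion.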
Table 2. Histological diagnoses of the masses
___________________________________________________________________________
Histological diagnosis of tumor             Number
___________________________________________________________________________
Benign tumors                               94
  Benign simple cyst                        7
  Endometrioma                              10
  Dermoid cyst                              16
  Serous cystadenoma                        16
  Mucinous cystadenoma                      18
  Myoma/fibroma                             9
  Cystadenofibroma                          11
  Paraovarian cyst                          5
  Sactosalpinx, chronic salpingitis         1
  Leydig cell tumor                         1
Borderline tumors                           4
  Serous                                    2
  Mucinous                                  1
  Endometrioid                              1
Invasive malignancy                         19
  Primary ovarian adenocarcinoma            13
  Granulosa cell tumor                      3
  Dysgerminoma                              1
  Leiomyosarcoma                            1
  Malignant aggressive B-cell lymphoma      1
___________________________________________________________________________
Table 3. Inter-observer reproducibility of continuous variables used in the risk calculation models LR1 and LR2 and of other continuous variables
used to describe adnexal masses
Median and range: measurement results of both sonologists. Mean difference, 95% CI and limits of agreement: difference in mm between the two
measurements made by sonologists 1 and 2 (a). Intra-CC: point estimate (95% CI).

Parameter                                             Median (n)    Range       Mean diff. (n)   95% CI          Limits of agreement   Intra-CC (95% CI)
Variables used in models LR1 and LR2
Maximum diameter of adnexal mass, mm                  70 (n=234)    10 to 313   3.80 (n=117)     1.12 to 6.48    -25.24 to 32.82       0.958 (0.937 to 0.971)
Maximum diameter of largest solid component, mm (b)   29.50 (n=122) 5 to 180    1.92 (n=61)      -1.74 to 5.58   -26.66 to 30.50       0.942 (0.905 to 0.964)
Other variables used to describe adnexal mass
Mean diameter of adnexal mass, mm                     58.5 (n=234)  9 to 240    1.05 (n=117)     -0.15 to 1.95   -8.61 to 10.72        0.971 (0.958 to 0.980)
Mean diameter of largest solid component, mm (b)      22 (n=122)    4 to 156    0.59 (n=61)      -1.82 to 2.98   -18.16 to 19.32       0.962 (0.937 to 0.977)
Height of largest papillary projection, mm (c)        8 (n=42)      3 to 25     -0.51 (n=21)     -2.93 to 1.91   -11.61 to 10.59       0.609 (0.245 to 0.821)

CI, confidence interval; Intra-CC, intra-class correlation coefficient.
(a) The measurements made by sonologist 2 were subtracted from the measurements made by sonologist 1.
(b) Includes only cases where solid components were registered by both observers.
(c) Includes only cases where papillary projections were registered by both observers. Using logarithmically transformed data, the results were as
follows: mean difference 1.03 (95% CI, 0.78 to 1.36), limits of agreement 0.31 to 3.61, Intra-CC 0.547 (0.153 to 0.789). These results can be used for
comparison with our published results on inter-observer agreement when analyzing ultrasound volumes obtained by three-dimensional
ultrasound (15).
Table 4. Inter-observer agreement for categorical variables included in the risk calculation models LR1 and LR2 and for other categorical variables used to describe adnexal masses

Variable                                                                              Agreement         Kappa value
Variables used in models LR1 and LR2
Presence of entirely solid tumor (yes, no)                                            97% (114/117)     0.91
Maximum diameter of largest solid component (≤50 mm, >50 mm)                          95% (111/117)     0.83
Maximum diameter of largest solid component (≤50 mm, >50 mm) (a)                      92% (56/61)       0.82
Tenderness of adnexal mass (yes, no)                                                  97% (113/117)     0.78
Ascites (yes, no)                                                                     97% (113/117)     0.73
Papillary projections (yes, no)                                                       87% (102/117)     0.65
Solid component (yes, no)                                                             84% (98/117)      0.67
Irregular cyst wall (yes, no)                                                         79% (92/117)      0.56
Acoustic shadows (yes, no)                                                            85% (99/117)      0.58
Color Doppler signals in papillary projections (yes, no) (b)                          90% (105/117)     0.48
Color Doppler signals in papillary projections (yes, no) (c)                          86% (18/21)       0.71
Color score (1, 2, 3, 4)                                                              40% (47/117)      0.36 (d)
Other variables used to describe adnexal masses
Tumor type (unilocular, unilocular solid, multilocular, multilocular solid, solid)    97% (113/117)     0.70
Mean diameter of tumor, mm (≤40, 41-60, 61-80, 81-100, >100)                          66% (77/117)      0.76 (d)
Number of locules (0, 1, 2, 3, 4, 5, 6-10, 11-20, >20)                                58% (68/117)      0.79 (d)
Number of locules (≤10, >10) (e)                                                      94% (110/117)     0.80
Septum (yes, no)                                                                      88% (103/117)     0.76
Incomplete septum (yes, no)                                                           98% (115/117)     ¶
Number of papillary projections (0, 1, 2, 3, ≥4)                                      81% (95/117)      0.64 (d)
Number of papillary projections (1, 2, 3, ≥4) (c)                                     67% (14/21)       0.63 (d)
Number of papillary projections (<4, >4) (e)                                          96% (112/117)     0.68
Echogenicity of cyst fluid (anechoic, low level, ground glass, mixed, no fluid)       67% (78/117)      0.56
Color Doppler signals detectable (yes, no)                                            84% (98/117)      0.30
Ovarian crescent sign (yes, no)                                                       78% (91/117)      0.49
Fluid in Douglas pouch (yes, no)                                                      87% (102/117)     0.72
____________________________________________________________________________________________
(a) Includes only cases where solid components were registered at both examinations.
(b) This variable reflects disagreement both with regard to the presence of papillary projections and the presence of blood flow in papillary
projections (i.e., if one observer recorded the presence of papillary projections and the other did not, there was disagreement with regard to
this variable, and if both observers recorded the presence of papillary projections but disagreed with regard to the presence of blood flow in
the papillary projections, there was disagreement with regard to this variable).
(c) Includes only cases where papillary projections were registered at both examinations.
(d) Weighted Kappa index.
(e) Includes 0.
¶ Absent kappa values are explained by asymmetric field tables, making it impossible to calculate them.
Table 5. Inter-observer reproducibility of the risk of malignancy calculated using the risk calculation models LR1 and LR2
Median and range: calculated risk of malignancy (both sonologists). Mean difference, 95% CI and limits of agreement: difference between the risk
calculated by sonologist 1 and 2. Intra-CC: point estimate (95% CI).

Parameter                                 Median (n)    Range           Mean diff. (n)   95% CI          Limits of agreement   Intra-CC (95% CI)
Risk of malignancy calculated using LR1   7.85 (n=234)  0.10 to 99.10   -0.53 (n=117)    -3.07 to 2.01   -28.05 to 26.99       0.911 (0.874 to 0.937)
Risk of malignancy calculated using LR2   6.65 (n=234)  0.10 to 98.40   0.02 (n=117)     -3.06 to 3.10   -33.22 to 33.26       0.832 (0.766 to 0.880)

CI, confidence interval; Intra-CC, intra-class correlation coefficient.
Legends for figure
Figure 1. (a) Scatterplot showing the relationship between inter-observer difference (observer
1 minus observer 2) in calculated risk and magnitude of calculated risk when using logistic
regression model LR1. The plot manifests a diamond shape, the differences being smallest for
the lowest and highest risks. For risks <2.5% and >95% the differences are very small.
LOA, limits of agreement. (b) Scatterplot showing the relationship between inter-observer
difference in calculated risk and magnitude of calculated risk when using logistic regression
model LR2. The plot manifests a diamond shape, the differences being smallest for the lowest
and highest risks.
Povilas Sladkevicius and Lil Valentin. Inter-observer agreement in describing the ultrasound appearance of adnexal masses and in calculating the risk of malignancy using logistic regression models. Clin Cancer Res. Published OnlineFirst November 25, 2014. doi: 10.1158/1078-0432.CCR-14-0906
Inter-observer reproducibility of measurement results, including the calculated risks of
malignancy using LR1 and LR2, was described as the difference between two measurement
results. The differences between the measured values were plotted against the mean of the two
measurements (Bland-Altman plots) to assess the relationship between the differences and the
magnitude of the measurements (20). Systematic bias between two measurements was
estimated by calculating the 95% confidence interval (CI) of the mean difference (mean
difference ± 2 SE). If zero lay within this interval, no bias was assumed to exist between the
two measurements. Inter-observer agreement was expressed as the mean difference and limits
of agreement (20). Ninety-five percent of differences between any future measurements are
estimated to fall between the lower and upper limit of agreement. Inter-observer reliability of
measurement results was estimated by calculating the intra-class correlation coefficient
(ICC) using analysis of variance (two-way random model, absolute agreement; this allows
generalization of the results to a population of observers). The ICC indicates the proportion of
the total variance in measurement results that can be explained by differences between the
individuals examined. It depends both on the magnitude of measurement errors and on the true
heterogeneity in the population in which measurements are made. The more variable the
population investigated, the greater the ICC, and the less variable the population, the smaller
the ICC (21). It has been suggested that ICC values >0.90 are needed for a test to be used in
clinical practice (22).
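The statistics described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function names and the toy measurements are hypothetical, the limits of agreement are taken as the mean difference ± 2 SD (an assumption; 1.96 SD is also commonly used), and the ICC is computed as the single-measure two-way random, absolute-agreement coefficient, which is one reading of the model named in the text.

```python
import math
from statistics import mean, stdev


def bland_altman(obs1, obs2):
    """Mean difference (obs1 - obs2), its 95% CI (mean +/- 2 SE), and limits
    of agreement (mean +/- 2 SD; the multiplier 2 is an assumption)."""
    diffs = [a - b for a, b in zip(obs1, obs2)]
    m = mean(diffs)
    sd = stdev(diffs)                 # sample SD of the differences
    se = sd / math.sqrt(len(diffs))   # standard error of the mean difference
    return {
        "mean": m,
        "ci": (m - 2 * se, m + 2 * se),
        "loa": (m - 2 * sd, m + 2 * sd),
        "bias": not (m - 2 * se <= 0 <= m + 2 * se),  # zero outside the CI
    }


def icc_absolute_agreement(obs1, obs2):
    """Single-measure ICC, two-way random effects, absolute agreement,
    for two observers, from the two-way ANOVA mean squares."""
    n, k = len(obs1), 2
    rows = list(zip(obs1, obs2))
    grand = mean(list(obs1) + list(obs2))
    ss_rows = k * sum((mean(r) - grand) ** 2 for r in rows)          # subjects
    ss_cols = n * sum((mean(c) - grand) ** 2 for c in (obs1, obs2))  # observers
    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Toy diameters in mm for four masses (hypothetical data).
sonologist_1 = [10, 20, 30, 40]
sonologist_2 = [12, 18, 33, 39]
ba = bland_altman(sonologist_1, sonologist_2)
icc = icc_absolute_agreement(sonologist_1, sonologist_2)
print(ba, round(icc, 3))
```

Because the ICC divides between-subject variance by total variance, the same measurement error yields a higher ICC in a more heterogeneous population, which is the dependence on population variability noted in the text.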
The sensitivity and specificity of LR1 and LR2 with regard to malignancy were calculated
using the information from sonologist 1 and from sonologist 2.
Results
In all, 117 consecutive women with adnexal masses who underwent surgery were
examined with ultrasound by the two sonologists as described above. Thirty-four women had
bilateral adnexal masses. The most complex mass, or the largest one if both masses had
similar ultrasound morphology, was used in our statistical analysis, the mass to be included
being selected retrospectively to ensure that both sonologists contributed the same mass (right
or left) to the analysis. Thus, 117 adnexal masses from 117 patients constitute our study
population. The women's age ranged between 14 and 88 years (median 53), and 63 (54%)
women were postmenopausal. There were 94 benign, four borderline and 19 invasively
malignant adnexal masses (Table 2).
The time elapsed between the ultrasound examinations of sonologists 1 and 2 was median 61
days (10th and 90th percentiles 13 and 132, range 1-204) for the tumors with benign histology
and median 14 days (10th and 90th percentiles 2 and 31, range 1-41) for the tumors with
malignant histology. There was no relationship between the number of days between the
scans and the differences in measurement results or inter-observer agreement for discrete
variables (Supplementary Fig. S1-S5 and Supplementary Table S1).
Inter-observer reproducibility of measurement results is shown in Table 3. Bland-Altman
plots showed no clear trend for inter-observer differences in measurement results to change
with the magnitude of the measurement values. Limits of agreement were wide for all
measurements. There was one systematic difference between the two sonologists: sonologist 1
(who always performed the first examination) obtained higher measurement values for the
maximum diameter of the mass. The least reliable measurement was the height of the largest
papillary projection.
Inter-observer agreement when assessing categorical ultrasound variables is shown in Table
4. For most categorical ultrasound variables, inter-observer agreement beyond chance was good
or very good (19). Inter-observer agreement beyond chance for variables included in LR1 or
LR2 was poorest for color score (agreement 40%, weighted Kappa 0.36), presence of blood
flow in papillary projection (agreement 90%, Kappa 0.48), irregular cyst wall (agreement 79%,
Kappa 0.56) and acoustic shadowing (agreement 85%, Kappa 0.58).
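The distinction between raw agreement and agreement beyond chance can be illustrated with a short sketch of Cohen's kappa. This is not the authors' code: the function name and both cross-tables are hypothetical, and the linear weighting used for the weighted variant is an assumption, since the text does not state which weighting scheme was applied.

```python
def cohen_kappa(table, weighted=False):
    """Cohen's kappa from a square cross-table.

    table[i][j] is the number of masses placed in category i by observer 1
    and in category j by observer 2. With weighted=True, linear weights
    1 - |i - j| / (k - 1) give partial credit to near-misses on an ordinal scale.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        if not weighted:
            return 1.0 if i == j else 0.0
        return 1.0 - abs(i - j) / (k - 1)

    cells = [(i, j) for i in range(k) for j in range(k)]
    observed = sum(weight(i, j) * table[i][j] for i, j in cells) / n
    expected = sum(weight(i, j) * row_tot[i] * col_tot[j] for i, j in cells) / n ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical yes/no variable: 85% raw agreement.
binary = [[40, 10],
          [5, 45]]
kappa_binary = cohen_kappa(binary)

# Hypothetical three-level ordinal variable: disagreements are all near-misses.
ordinal = [[10, 2, 0],
           [3, 10, 2],
           [0, 3, 10]]
kappa_plain = cohen_kappa(ordinal)
kappa_weighted = cohen_kappa(ordinal, weighted=True)
print(kappa_binary, kappa_plain, kappa_weighted)
```

When one category dominates, agreement expected by chance is high, so a variable can show high raw agreement and still have a modest kappa, as seen for blood flow in papillary projections (agreement 90%, Kappa 0.48).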
Bland-Altman plots illustrating the relationship between the magnitude of the estimated
risk of malignancy calculated using LR1 and LR2 and the interobserver difference in
calculated risk are shown in Figure 1. The plots manifest a diamond shape, i.e., the
interobserver differences are smallest for the lowest and highest risks, and they are very small
for risks <2.5% and >95%. Logarithmic transformation of the data (20) did not substantially
change the shape of the scatter plot. Therefore, we present our results as absolute
inter-observer differences in calculated risk (in percentage units); see Table 5. There were no
systematic differences in calculated risks between the two sonologists, and reliability,
reflected by the ICC values, was good (22), with ICC values of 0.911 for LR1 and 0.832 for
LR2. When classifying tumors as having a risk of malignancy ≤10% (benign) or >10%
(malignant) using LR1 or LR2, the inter-observer agreement was good for both models:
inter-observer agreement 84% (98/117), Kappa 0.68, for model LR1 and inter-observer
agreement 85% (99/117), Kappa 0.68, for model LR2. In the 19 cases where the two
sonologists obtained different results with regard to malignancy when using LR1, the absolute
interobserver differences in calculated risk ranged from 0.7 to 59.6 percentage units: in six of
the 19 cases the absolute interobserver difference in calculated risk was <10.0 percentage
units, in nine cases it was 10.0-24.9 percentage units, and in four cases it was >25.0
percentage units. In the 18 cases where the two sonologists obtained different results with
regard to malignancy when using LR2, the absolute interobserver difference in calculated risk
ranged from 8.8 to 67.9 percentage units: in two of the 18 cases the absolute interobserver
difference in calculated risk was <10.0 percentage units, in ten cases it was 10.0-24.9
percentage units, and in six cases it was >25 percentage units.
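One way to see why a diamond shape is to be expected in these plots is sketched below. This is an explanatory note added for illustration, not part of the original analysis; the symbols $p_1$, $p_2$, $m$ and $d$ are introduced here and do not appear in the paper. It uses only the fact that both calculated risks lie between 0 and 100%.

```latex
% p_1, p_2: risks (in %) calculated by sonologists 1 and 2 for the same mass
% m: their mean (horizontal axis); d: their difference (vertical axis)
\[
m = \frac{p_1 + p_2}{2}, \qquad d = p_1 - p_2
\quad\Longrightarrow\quad
p_1 = m + \frac{d}{2}, \qquad p_2 = m - \frac{d}{2}.
\]
% Requiring 0 <= p_1 <= 100 and 0 <= p_2 <= 100 gives
\[
|d| \;\le\; 2\,\min(m,\; 100 - m).
\]
```

For example, a mean risk of 5% caps the absolute difference at 10 percentage units, whereas a mean risk of 50% allows differences of up to 100 percentage units. Small differences at the extremes of risk are therefore partly a consequence of the bounded scale.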
The Bland-Altman plots (Figure 1) illustrate that for some tumors there were substantial
interobserver differences in the calculated risk of malignancy: when using LR1, the
interobserver difference in calculated risk was >25 percentage units in 11 tumors (9% of all
tumors), and when using LR2, the interobserver difference in calculated risk was >25
percentage units in 14 tumors (12% of all tumors). To elucidate which interobserver
differences in individual variables explained these largest interobserver differences in
calculated risk, we scrutinized each case where the difference was >25 percentage units. The
results are shown in Supplementary Tables S2 and S3. When using LR1, a discrepancy for one
single categorical variable explained the difference in four of the 11 cases, while a
discrepancy for two categorical variables explained the difference in one case (differences in
measurements being <5 mm in these five cases). In six cases there were differences in one or
two categorical variables but also substantial differences (6-61 mm) in at least one
measurement result. In no case was the large difference in calculated risk explained
exclusively by differences in measurement results. The categorical variables judged
differently by the two sonologists in these 11 cases were color score (n = 5), irregular cyst
wall (n = 5), flow in papillary projection (n = 3) and acoustic shadowing (n = 2).
When using LR2, a discrepancy for one single categorical variable explained the large
difference in calculated risk (>25 percentage units) in eight of the 14 cases (differences in
measurements being <5 mm in these eight cases), and in four of the eight cases the sonologists
judged acoustic shadowing differently. In five cases there were differences in one categorical
variable but also a substantial difference (9-61 mm) in the measurement of the largest solid
component. In yet another case there were differences in two categorical variables as well as in
the measurement of the largest solid component. The categorical variables judged differently
by the two sonologists in these 14 cases were acoustic shadowing (n = 5), irregular cyst wall
(n = 5), ascites (n = 3) and flow in papillary projection (n = 2).
The sensitivity with regard to malignancy when using LR1 (10% risk cutoff) was 100%
(23/23; 95% CI, 82-100) for both sonologists; the specificity was 74% (70/94; 95% CI, 64-82)
for sonologist 1 and 63% (59/94; 95% CI, 53-72) for sonologist 2. The sensitivity when using
LR2 was 100% (23/23; 95% CI, 82-100) for sonologist 1 and 91% (21/23; 95% CI, 72-98) for
sonologist 2, and the specificity was 75.5% (71/94; 95% CI, 65-84) for both sonologists.
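The point estimates above follow directly from the reported counts, as the short sketch below shows. The helper name `percent` and the dictionary layout are hypothetical, and confidence intervals are deliberately left out because the text does not state which interval method was used.

```python
def percent(numerator, denominator):
    """Proportion expressed as a percentage."""
    return 100.0 * numerator / denominator


# Counts as reported in the text: 23 malignant masses (19 invasive and
# 4 borderline) and 94 benign masses, with a 10% risk cutoff.
reported = {
    ("LR1", "sonologist 1"): {"true_pos": 23, "true_neg": 70},
    ("LR1", "sonologist 2"): {"true_pos": 23, "true_neg": 59},
    ("LR2", "sonologist 1"): {"true_pos": 23, "true_neg": 71},
    ("LR2", "sonologist 2"): {"true_pos": 21, "true_neg": 71},
}
N_MALIGNANT, N_BENIGN = 23, 94

results = {
    key: {
        "sensitivity": percent(c["true_pos"], N_MALIGNANT),
        "specificity": percent(c["true_neg"], N_BENIGN),
    }
    for key, c in reported.items()
}
for (model, who), r in results.items():
    print(f"{model} {who}: sensitivity {r['sensitivity']:.1f}%, "
          f"specificity {r['specificity']:.1f}%")
```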
Discussion
We have shown substantial inter-observer variability in the results of measurements taken in
adnexal masses (wide limits of agreement). Inter-observer agreement beyond chance was very
good or good for most categorical variables, but it was only moderate or fair for some.
Inter-observer agreement above chance was poorest for variables heavily dependent on subjective
evaluation and/or machine settings, i.e., color score, presence of color Doppler signals in
papillary projections, irregular cyst walls, acoustic shadowing (all four variables being
included in LR1 or LR2), echogenicity of cyst fluid and ovarian crescent sign. Despite this,
there was good inter-observer agreement when classifying tumors as benign or malignant using
the predetermined risk of malignancy cut-off of 10%. However, in some cases there were
substantial differences in the calculated risk of malignancy between the two sonologists, the
difference being >25.0 percentage units in 9% of all tumors when using LR1 and in 12% of all
tumors when using LR2.
The strength of our study is that it provides new information. To the best of our knowledge,
there is only one publication reporting on interobserver agreement with regard to describing
ultrasound findings in adnexal masses using the IOTA terminology (11) when performing live
ultrasound examinations (23). However, that study (23) evaluated interobserver agreement
with regard to the ten ultrasound features in the IOTA simple rules (24, 25), not the variables
included in the IOTA logistic regression models LR1 and LR2, and agreement was estimated
between examiners with different levels of experience. The variable with poorest agreement
beyond chance in the study cited was acoustic shadowing (Kappa 0.36). We have found no
published study that has estimated inter-observer reproducibility of the calculated risk of
malignancy using LR1 or LR2 after live scanning.
It is a limitation of our study that up to 204 days elapsed between the scans of the two
sonologists (up to 41 days for malignant masses). Because time elapsed between the scans,
theoretically the inter-observer differences could be explained by the lesions having changed
in size or morphology between the scans. We find this highly unlikely for the following
reasons. First, there was no relationship between the differences in measurement results and
the number of days between the scans (Supplementary Fig. S1-S5). Nor was there a clear
tendency for inter-observer agreement for discrete variables to depend on the time between the
scans (Supplementary Table S1). Second, one would expect a lesion and its components to
increase in size with time, but sonologist 1, performing the first scan, obtained higher
measurement values than sonologist 2. Third, it is our experience, after having performed
gynecological scans for more than 20 years, that the ultrasound morphology of both benign and
malignant adnexal masses remains constant over time, that benign adnexal lesions grow
slowly, and that malignant masses do not change appreciably in size even during 1 month of
observation. Therefore, we believe that the discrepancies between the two sonologists reflect
true inter-observer differences and not a change of the masses over time. A second limitation is
that we did not include estimation of the reproducibility of retrieving anamnestic information
(current hormonal therapy, personal history of ovarian cancer), the anamnestic information
collected by the second sonologist being used in all cases. It cannot be entirely excluded that
patients would answer differently when asked by different sonologists or that sonologists
could interpret the answers of the patients differently. A third limitation is that we did not
estimate intra-observer reproducibility. We considered four scans (two per sonologist) likely to
be unacceptable to patients. For the same reason only two sonologists were involved in this
study, and our results are generalizable only to sonologists with a similar level of experience.
The results of this live scanning study are similar to those of another study in which the
same sonologists assessed the same variables using 3D ultrasound volumes from adnexal
masses in another tumor population (15). The similarity in results between the two studies is
surprising, because the conditions when assessing 3D ultrasound volumes are different from
those during a live scan. When evaluating ultrasound volumes, sonologists are exposed to the
same ultrasound images, and so any interobserver difference should be explained exclusively
by differences in interpreting the ultrasound information. During a live scan there are more
sources of bias. This could result in either poorer or better interobserver agreement than when
3D ultrasound volumes are assessed: poorer because ultrasound examiners are likely to use
different machine settings and scanning conditions may change from one minute to another,
better because the dynamic nature of live scanning facilitates discrimination between solid
components and amorphous tissue.
Our results showed that two experienced sonologists agreed quite well in their classification
of masses as benign or malignant using the 10% risk of malignancy cutoff of LR1 and LR2,
and that the diagnostic performance of LR1 and LR2 with regard to discrimination between
benign and malignant tumors was similar for the two sonologists and similar to that reported
by others (14, 26-28). This is reassuring, because the main purpose of using models LR1 and
LR2 is to classify tumors as benign or malignant. Potentially, however, LR1 and LR2 can be
used not only to classify adnexal masses as benign or malignant but also to counsel a patient
about her individual risk of malignancy (13). To use the calculated risk for individual
counseling, one must be reasonably certain not only that the estimated risk agrees well with the
true risk (when externally validated, both LR1 and LR2 underestimated the true risk, especially
in the risk interval 30-70% (14)) but also that the risk estimates are reproducible, i.e., that
different examiners will obtain similar risk estimates. Our results show that risk estimates may
differ substantially between experienced observers, the difference in estimated risk being >25.0
percentage units in 9% and 12% of cases when using LR1 and LR2, respectively. Interobserver
agreement above chance was poorest for those variables in the models that are heavily
dependent on subjective evaluation, i.e., color score, presence of color Doppler signals in
papillary projections, irregular cyst walls and acoustic shadowing. Indeed, differences in these
explained most of the largest inter-observer differences in calculated risk of malignancy. In
models based on few variables, changing values in only one variable may result in large
differences in predicted risks, while a model with many variables is less vulnerable to a change
in one or even a few variables. Our results illustrate this (Supplementary Tables S2 and S3).
When using LR2 (which includes six variables), a change in value for one single categorical
variable explained an inter-observer difference in calculated risk >25 percentage units in eight
of 14 cases, while when using LR1 (which includes 12 variables), a change in value for one
single categorical variable explained an inter-observer difference in calculated risk >25
percentage units in only four of 11 cases. Acoustic shadowing is a strong variable in both LR1
and LR2 and has great impact on the calculated risk in LR2, with only six variables. In our
hands, as well as in those of Ruiz de Gauna et al (23), inter-observer agreement for acoustic
shadowing was at most moderate. The interobserver agreement for color score was only fair in
our study, and color score is an important variable in LR1.
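The sensitivity of LR2 to a single categorical variable can be illustrated with a small worked example. This is a sketch under stated assumptions: the coefficients are those given for LR2 in the Table 1 footnote, while the function name `lr2_risk` and the patient values (age 50, a 30 mm solid component, irregular cyst walls, no ascites, no flow in papillary projections) are invented for illustration.

```python
import math


def lr2_risk(age, ascites, papillary_flow, solid_mm, irregular_wall, shadows):
    """LR2 risk of malignancy, coefficients from the Table 1 footnote."""
    z = (-5.3718
         + 0.0354 * age
         + 1.6159 * ascites
         + 1.1768 * papillary_flow
         + 0.0697 * min(solid_mm, 50)  # no increase beyond 50 mm
         + 0.9586 * irregular_wall
         - 2.9486 * shadows)
    return 1.0 / (1.0 + math.exp(-z))


# Same invented mass; the two observers differ only on acoustic shadowing.
common = dict(age=50, ascites=0, papillary_flow=0, solid_mm=30, irregular_wall=1)
risk_no_shadow = lr2_risk(shadows=0, **common)
risk_shadow = lr2_risk(shadows=1, **common)

difference = 100 * (risk_no_shadow - risk_shadow)  # percentage units
print(f"no shadows: {100 * risk_no_shadow:.1f}%  "
      f"shadows: {100 * risk_shadow:.1f}%  difference: {difference:.1f}")
```

In this invented case, disagreement on acoustic shadowing alone moves the calculated risk from roughly 37% to roughly 3%, a difference of more than 25 percentage units that also changes the classification at the 10% cutoff. The shadowing coefficient of -2.9486 corresponds to dividing the odds of malignancy by about 19.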
To improve inter-observer reproducibility of calculated risks based on LR1 and LR2,
inter-observer differences in descriptions and measurements of adnexal masses using the IOTA
terminology and measurement technique need to be reduced. One way to achieve this could be by
providing courses on and training in how to examine and describe adnexal masses using the
IOTA terms. Interactive courses in which a large number of ultrasound images are discussed with
the course participants are likely to be very valuable in this respect. More precise definitions of
the IOTA terms, for example by providing ample imaging material, would probably also help
improve inter-observer agreement. Special attention should be given to the variables with poorest
reproducibility, i.e., the color score, wall irregularity, acoustic shadowing and detection of blood
flow in papillary projections. Until better inter-observer agreement in the calculated risk of
malignancy using LR1 and LR2 has been shown, one should be cautious with using the risk
estimates for individual patient counselling.
Acknowledgements
None
References
1. Granberg S, Norström A, Wikland M. Tumors in the lower pelvis as imaged by vaginal
sonography. Gynecol Oncol 1990;37:224-9.
2. Benacerraf BR, Finkler NJ, Wojciechowski C, Knapp RC. Sonographic accuracy in the
diagnosis of ovarian masses. J Reprod Med 1990;35:491-5.
3. Valentin L. Pattern recognition of pelvic masses by gray-scale ultrasound imaging: the
contribution of Doppler ultrasound. Ultrasound Obstet Gynecol 1999;14:338-47.
4. Valentin L. Prospective cross-validation of Doppler ultrasound examination and
gray-scale ultrasound imaging for discrimination of benign and malignant pelvic masses.
Ultrasound Obstet Gynecol 1999;14:273-83.
5. Timmerman D, Schwärzler P, Collins WP, Claerhout F, Coenen M, Amant F, et al.
Subjective assessment of adnexal masses with the use of ultrasonography: an analysis
of interobserver variability and experience. Ultrasound Obstet Gynecol 1999;13:11-6.
6. Sokalska A, Timmerman D, Testa AC, Van Holsbeke C, Lissoni AA, Leone FPG, et al.
Diagnostic accuracy of transvaginal ultrasound examination for assigning a specific
diagnosis to adnexal masses. Ultrasound Obstet Gynecol 2009;34:462-70.
7. Valentin L. Use of morphology to characterize and manage common adnexal masses.
Best Pract Res Clin Obstet Gynaecol 2004;18:71-89.
8. Tailor A, Jurkovic D, Bourne TH, Collins WP, Campbell S. Sonographic prediction of
malignancy in adnexal masses using multivariate logistic regression analysis.
Ultrasound Obstet Gynecol 1997;10:41-7.
9. Timmerman D, Bourne TH, Tailor A, Collins WP, Verrelst H, Vandenberghe K, et al.
A comparison of methods for the preoperative discrimination between benign and
malignant adnexal masses: the development of a new logistic regression model. Am J
Obstet Gynecol 1999;181:57-65.
10. Alcazar JL, Jurado M. Prospective evaluation of logistic model based on sonographic
morphologic and color Doppler findings developed to predict adnexal malignancy. J
Ultrasound Med 1999;18:837-42.
11. Timmerman D, Valentin L, Bourne TH, Collins WP, Verrelst H, Vergote I. Terms,
definitions and measurements to describe the sonographic features of adnexal tumors: a
consensus opinion from the International Ovarian Tumor Analysis (IOTA) Group.
Ultrasound Obstet Gynecol 2000;16:500-5.
12. Timmerman D, Testa AC, Bourne T, Ferrazzi E, Ameye L, Konstantinovic ML, et al.
Logistic regression model to distinguish between the benign and malignant adnexal
mass before surgery: a multicenter study by the International Ovarian Tumor Analysis
Group. J Clin Oncol 2005;34:8794-801.
13. Kaijser J, Bourne T, Valentin L, Sayasneh A, Van Holsbeke C, Vergote I, et al.
Improving strategies for diagnosing ovarian cancer: a summary of the International
Ovarian Tumor Analysis (IOTA) studies. Ultrasound Obstet Gynecol 2013;41:9-20.
14. Timmerman D, Van Calster B, Testa AC, Guerriero S, Fischerova D, Lissoni AA, et al.
Ovarian cancer prediction in adnexal masses using ultrasound-based logistic regression
models: a temporal and external validation study by the IOTA group. Ultrasound Obstet
Gynecol 2010;36:226-34.
15. Sladkevicius P, Valentin L. Intra- and inter-observer agreement when describing
adnexal masses using the International Ovarian Tumour Analysis (IOTA) terms and
definitions: a study on three-dimensional (3D) ultrasound volumes. Ultrasound Obstet
Gynecol 2013;41:318-27.
16. Heintz APM, Odicino F, Maisonneuve P, Beller U, Benedet JL, Creasman WT, et al.
Carcinoma of the Ovary. 25th Annual Report on the Results of Treatment in
Gynecological Cancer. Int J Gynecol Obstet 2003;83:S135-S166 (suppl 1).
17. Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas
1960;20:37-46.
18. Kundel HL, Polansky M. Measurement of observer agreement. Radiology
2003;228:303-8.
19. Brennan P, Silman A. Statistical methods for assessing observer variability in clinical
measures. BMJ 1992;304:1491-4.
20. Bland JM, Altman DG. Statistical methods for assessing agreement between two
methods of clinical measurement. Lancet 1986;1:307-10.
21. Bartlett JW, Frost C. Reliability, repeatability and reproducibility: analysis of
measurement errors in continuous variables. Ultrasound Obstet Gynecol
2008;31:466-75.
22. Kottner J, Audigé L, Brorson S, Donner A, Gajewski BJ, Hróbjartsson A, et al.
Guidelines for Reporting Reliability and Agreement Studies (GRRAS) were proposed. J
Clin Epidemiol 2011;64:96-106.
23. Ruiz de Gauna B, Sanchez P, Pineda L, Utrilla-Layna J, Juez L, Alcázar JL.
Inter-observer agreement with regard to describing adnexal masses using the IOTA simple
rules in a real-time setting and when using three-dimensional ultrasound volumes and
digital clips. Ultrasound Obstet Gynecol 2014;44:95-100.
24. Timmerman D, Testa AC, Bourne T, Ameye L, Jurkovic D, Van Holsbeke C, et al.
Simple ultrasound-based rules for the diagnosis of ovarian cancer. Ultrasound Obstet
Gynecol 2008;31:681-90.
25. Timmerman D, Ameye L, Fischerova D, Epstein E, Melis GB, Guerriero S, et al.
Simple ultrasound rules to distinguish between benign and malignant adnexal masses
before surgery: prospective validation by IOTA group. BMJ 2010;341:c6839.
26. Van Holsbeke C, Van Calster B, Testa AC, Domali E, Lu C, Van Huffel S, et al.
Prospective internal validation of mathematical models to predict malignancy in
adnexal masses: results from the international ovarian tumor analysis study. Clin Cancer
Res 2009;15:684-91.
27. Nunes N, Yazbek J, Ambler G, Hoo W, Naftalin J, Jurkovic D. Prospective evaluation
of the IOTA logistic regression model LR2 for the diagnosis of ovarian cancer.
Ultrasound Obstet Gynecol 2012;40:355-9.
28. Nunes N, Ambler G, Hoo WL, Naftalin J, Foo X, Widschwendter M, et al.
A prospective validation of the IOTA logistic regression models (LR1 and LR2) in
comparison to subjective pattern recognition for the diagnosis of ovarian cancer.
Int J Gynecol Cancer 2013;23:1583-9.
Table 1. Variables for which inter-observer reproducibility was estimated
___________________________________________________________________________
Variables included in logistic regression models LR1 and LR2 or of importance for these models
  Maximum diameter of adnexal mass: mm
  Maximum diameter of largest solid component: mm
  Maximum diameter of largest solid component: ≤50, >50 mm
  Presence of an entirely solid adnexal mass: yes, no
  Papillary projections present: yes, no
  Irregular internal cyst walls: yes, no
  Acoustic shadows: yes, no
  Color Doppler signals in papillary projection: yes, no
  Color score: 1, 2, 3, 4
  Tenderness of adnexal mass at scan: yes, no
  Ascites: yes, no
Other IOTA variables
 Continuous IOTA variables
  Mean diameter of adnexal mass: mm
  Mean diameter of largest solid component: mm
  Height of the largest papillary projection: mm
 Categorical IOTA variables
  Type of tumor: unilocular, unilocular solid, multilocular, multilocular solid, solid
  Mean diameter of adnexal mass: ≤40, 41-60, 61-80, 81-100 and >100 mm
  Number of cyst locules: 0, 1, 2, 3, 4, 5, 6-10, 11-20 and >20; ≤10, >10†
  Septum present: yes, no
  Incomplete septum present: yes, no
  Solid component: yes, no
  Papillary projections: 0, 1, 2, 3, ≥4; 1, 2, 3, ≥4‡; <4, >4†
  Echogenicity of cyst fluid: anechoic, low level, ground glass, mixed, no cyst fluid
  Color Doppler signals detectable: yes, no
  Ovarian crescent sign: yes, no
Diagnosis based on LR1 and LR2
  Calculated risk of malignancy using LR1: ≤10%, >10%
  Calculated risk of malignancy using LR2: ≤10%, >10%
___________________________________________________________________________
IOTA, international ovarian tumor analysis group; LR1, logistic regression model 1; LR2, logistic regression model 2.
The risk of malignancy using LR1 is derived as $y = 1/(1 + e^{-z})$, where
$z = -6.7468 + 1.5985\,(a) - 0.9983\,(b) + 0.0326\,(c) + 0.00841\,(d) - 0.8577\,(e) + 1.5513\,(f) + 1.1737\,(g) + 0.9281\,(h) + 0.0496\,(i) + 1.1421\,(j) - 2.3550\,(k) + 0.4916\,(l)$
and e is the mathematical constant and base value of natural logarithms. The 12 variables included in LR1 are:
  (a) personal history of ovarian cancer (yes = 1, no = 0)
  (b) current hormonal therapy (yes = 1, no = 0)
  (c) age of the patient (in years)
  (d) maximum diameter of the lesion (in millimeters)
  (e) mass tender at scan (yes = 1, no = 0)
  (f) the presence of ascites (yes = 1, no = 0)
  (g) the presence of blood flow within a solid papillary projection (yes = 1, no = 0)
  (h) the presence of a purely solid tumor (yes = 1, no = 0)
  (i) maximal diameter of the solid component (expressed in millimeters, but with no increase > 50 mm)
  (j) irregular internal cyst walls (yes = 1, no = 0)
  (k) the presence of acoustic shadows (yes = 1, no = 0)
  (l) color score (1, 2, 3 or 4)
The risk of malignancy using LR2 is derived as $y = 1/(1 + e^{-z})$, where
$z = -5.3718 + 0.0354\,(c) + 1.6159\,(f) + 1.1768\,(g) + 0.0697\,(i) + 0.9586\,(j) - 2.9486\,(k)$ (12).
A calculated risk of malignancy of more than 10% classified the adnexal mass as malignant.
† includes 0. ‡ includes only cases where papillary projections were registered at both examinations.
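The two formulas in the footnote to Table 1 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' or the IOTA group's software: the function and parameter names are hypothetical, and the coefficients are those of the footnote read with their decimal points restored. The solid component is capped at 50 mm, following the note for variable (i).

```python
import math


def _logistic(z: float) -> float:
    """Map the linear predictor z to a probability y = 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))


def lr1_risk(history_ovarian_cancer: bool, hormonal_therapy: bool,
             age_years: float, max_diameter_mm: float, tender: bool,
             ascites: bool, flow_in_papillary_projection: bool,
             purely_solid: bool, max_solid_mm: float,
             irregular_walls: bool, acoustic_shadows: bool,
             color_score: int) -> float:
    """Risk of malignancy (0..1) from the 12 LR1 variables (a)-(l)."""
    solid = min(max_solid_mm, 50.0)  # variable (i): no increase above 50 mm
    z = (-6.7468
         + 1.5985 * int(history_ovarian_cancer)         # (a)
         - 0.9983 * int(hormonal_therapy)               # (b)
         + 0.0326 * age_years                           # (c)
         + 0.00841 * max_diameter_mm                    # (d)
         - 0.8577 * int(tender)                         # (e)
         + 1.5513 * int(ascites)                        # (f)
         + 1.1737 * int(flow_in_papillary_projection)   # (g)
         + 0.9281 * int(purely_solid)                   # (h)
         + 0.0496 * solid                               # (i)
         + 1.1421 * int(irregular_walls)                # (j)
         - 2.3550 * int(acoustic_shadows)               # (k)
         + 0.4916 * color_score)                        # (l)
    return _logistic(z)


def lr2_risk(age_years: float, ascites: bool,
             flow_in_papillary_projection: bool, max_solid_mm: float,
             irregular_walls: bool, acoustic_shadows: bool) -> float:
    """Risk of malignancy (0..1) from the 6 LR2 variables (c, f, g, i, j, k)."""
    solid = min(max_solid_mm, 50.0)  # same cap as in LR1
    z = (-5.3718
         + 0.0354 * age_years
         + 1.6159 * int(ascites)
         + 1.1768 * int(flow_in_papillary_projection)
         + 0.0697 * solid
         + 0.9586 * int(irregular_walls)
         - 2.9486 * int(acoustic_shadows))
    return _logistic(z)


if __name__ == "__main__":
    # Hypothetical patient: the only disagreement between two observers
    # is whether acoustic shadows are present.
    without_shadow = lr2_risk(60, False, False, 30, True, False)
    with_shadow = lr2_risk(60, False, False, 30, True, True)
    print(f"LR2 risk without shadows: {100 * without_shadow:.1f}%")
    print(f"LR2 risk with shadows:    {100 * with_shadow:.1f}%")
```

Because every categorical variable enters as a fixed step in z, a single yes/no disagreement between observers can shift the calculated risk considerably for masses in the middle of the risk range, while leaving it almost unchanged near 0% or 100%.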
Table 2. Histological diagnoses of the masses
___________________________________________________________________________
Histological diagnosis of tumor              Number
___________________________________________________________________________
Benign tumors                                94
  Benign simple cyst                         7
  Endometrioma                               10
  Dermoid cyst                               16
  Serous cystadenoma                         16
  Mucinous cystadenoma                       18
  Myoma/fibroma                              9
  Cystadenofibroma                           11
  Paraovarian cyst                           5
  Sactosalpinx/chronic salpingitis           1
  Leydig cell tumor                          1
Borderline tumors                            4
  Serous                                     2
  Mucinous                                   1
  Endometrioid                               1
Invasive malignancy                          19
  Primary ovarian adenocarcinoma             13
  Granulosa cell tumor                       3
  Dysgerminoma                               1
  Leiomyosarcoma                             1
  Malignant aggressive B-cell lymphoma       1
___________________________________________________________________________
Table 3. Inter-observer reproducibility of continuous variables used in the risk calculation models LR1 and LR2 and of other continuous variables
used to describe adnexal masses
___________________________________________________________________________
Columns: Parameter | Measurement results, both sonologists: median (range) | Difference in mm between the two measurements made by sonologists 1 and 2 (a): mean | 95% CI | Limits of agreement | Intra-CC: point estimate (95% CI)
___________________________________________________________________________
Variables used in models LR1 and LR2
Maximum diameter of adnexal mass, mm | 70 (10 to 313), n=234 | 3.80, n=117 | 1.12 to 6.48 | -25.24 to 32.82 | 0.958 (0.937 to 0.971)
Maximum diameter of largest solid component, mm (b) | 29.50 (5 to 180), n=122 | 1.92, n=61 | -1.74 to 5.58 | -26.66 to 30.50 | 0.942 (0.905 to 0.964)
Other variables used to describe adnexal mass
Mean diameter of adnexal mass, mm | 58.5 (9 to 240), n=234 | 1.05, n=117 | -0.15 to 1.95 | -8.61 to 10.72 | 0.971 (0.958 to 0.980)
Mean diameter of largest solid component, mm (b) | 22 (4 to 156), n=122 | 0.59, n=61 | -1.82 to 2.98 | -18.16 to 19.32 | 0.962 (0.937 to 0.977)
Height of largest papillary projection, mm (c) | 8 (3 to 25), n=42 | -0.51, n=21 | -2.93 to 1.91 | -11.61 to 10.59 | 0.609 (0.245 to 0.821)
___________________________________________________________________________
CI, confidence interval; Intra-CC, intra-class correlation coefficient.
(a) The measurements made by sonologist 2 were subtracted from the measurements made by sonologist 1.
(b) Includes only cases where solid components were registered by both observers.
(c) Includes only cases where papillary projections were registered by both observers. Using logarithmically transformed data the results were as
follows: mean difference 1.03 (95% CI, 0.78 to 1.36); limits of agreement 0.31 to 3.61; Intra-CC 0.547 (0.153 to 0.789). These results can be used for
comparison with our published results on inter-observer agreement when analyzing ultrasound volumes obtained by three-dimensional
ultrasound (15).
Table 4. Inter-observer agreement for categorical variables included in the risk calculation models LR1 and LR2 and for other categorical variables used to describe adnexal masses
____________________________________________________________________________________________
Columns: Variable | Agreement, % (n/N) | Kappa value
____________________________________________________________________________________________
Variables used in models LR1 and LR2
Presence of entirely solid tumor (yes, no) | 97 (114/117) | 0.91
Maximum diameter of largest solid component (≤50 mm, >50 mm) | 95 (111/117) | 0.83
Maximum diameter of largest solid component (≤50 mm, >50 mm) (a) | 92 (56/61) | 0.82
Tenderness of adnexal mass (yes, no) | 97 (113/117) | 0.78
Ascites (yes, no) | 97 (113/117) | 0.73
Papillary projections (yes, no) | 87 (102/117) | 0.65
Solid component (yes, no) | 84 (98/117) | 0.67
Irregular cyst wall (yes, no) | 79 (92/117) | 0.56
Acoustic shadows (yes, no) | 85 (99/117) | 0.58
Color Doppler signals in papillary projections (yes, no) (b) | 90 (105/117) | 0.48
Color Doppler signals in papillary projections (yes, no) (c) | 86 (18/21) | 0.71
Color score (1, 2, 3, 4) | 40 (47/117) | 0.36 (d)
Other variables used to describe adnexal masses
Tumor type (unilocular, unilocular solid, multilocular, multilocular solid, solid) | 97 (113/117) | 0.70
Mean diameter of tumor (mm): ≤40, 41-60, 61-80, 81-100, >100 | 66 (77/117) | 0.76 (d)
Number of locules: 0, 1, 2, 3, 4, 5, 6-10, 11-20, >20 | 58 (68/117) | 0.79 (d)
Number of locules: ≤10, >10 (e) | 94 (110/117) | 0.80
Septum (yes, no) | 88 (103/117) | 0.76
Incomplete septum (yes, no) | 98 (115/117) | ¶
Number of papillary projections: 0, 1, 2, 3, ≥4 | 81 (95/117) | 0.64 (d)
Number of papillary projections: 1, 2, 3, ≥4 (c) | 67 (14/21) | 0.63 (d)
Number of papillary projections: <4, >4 (e) | 96 (112/117) | 0.68
Echogenicity of cyst fluid (anechoic, low level, ground glass, mixed, no fluid) | 67 (78/117) | 0.56
Color Doppler signals detectable (yes, no) | 84 (98/117) | 0.30
Ovarian crescent sign (yes, no) | 78 (91/117) | 0.49
Fluid in Douglas pouch (yes, no) | 87 (102/117) | 0.72
____________________________________________________________________________________________
(a) Includes only cases where solid components were registered at both examinations.
(b) This variable reflects disagreement both with regard to the presence of papillary projections and the presence of blood flow in papillary projection (i.e., if one observer recorded the presence of papillary projections and the other did not, there was disagreement with regard to this variable, and if both observers recorded the presence of papillary projections but disagreed with regard to the presence of blood flow in the papillary projections, there was disagreement with regard to this variable).
(c) Includes only cases where papillary projections were registered at both examinations.
(d) Weighted Kappa index.
(e) Includes 0.
¶ Absent kappa values are explained by asymmetric field tables making it impossible to calculate them.
Table 5. Inter-observer reproducibility of the risk of malignancy calculated using the risk calculation models LR1 and LR2
___________________________________________________________________________
Columns: Parameter | Calculated risk of malignancy (%), both sonologists: median (range) | Difference between the risk calculated by sonologist 1 and 2 (percentage units): mean | 95% CI | Limits of agreement | Intra-CC: point estimate (95% CI)
___________________________________________________________________________
Risk of malignancy calculated using LR1 | 7.85 (0.10 to 99.10), n=234 | -0.53, n=117 | -3.07 to 2.01 | -28.05 to 26.99 | 0.911 (0.874 to 0.937)
Risk of malignancy calculated using LR2 | 6.65 (0.10 to 98.40), n=234 | 0.02, n=117 | -3.06 to 3.10 | -33.22 to 33.26 | 0.832 (0.766 to 0.880)
___________________________________________________________________________
CI, confidence interval; Intra-CC, intra-class correlation coefficient.
Legends for figure
Figure 1. a) Scatterplot showing the relationship between inter-observer difference (observer
1 minus observer 2) in calculated risk and magnitude of calculated risk when using logistic
regression model LR1. The plot manifests a diamond shape, the differences being smallest for
the lowest and highest risks. For risks <2.5% and >95% the differences are very small.
LOA, limits of agreement. b) Scatterplot showing the relationship between inter-observer
difference in calculated risk and magnitude of calculated risk when using logistic regression
model LR2. The plot manifests a diamond shape, the differences being smallest for the lowest
and highest risks.
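A diamond shape is what one would expect on arithmetic grounds alone, assuming (as is usual for Bland-Altman plots, although the legend does not say so explicitly) that the horizontal axis shows the mean of the two observers' risk estimates. The symbols r_1, r_2, m and d below are introduced here for illustration only; risks are expressed in percent.

```latex
% r_1, r_2: risks (in %) calculated by observers 1 and 2
\begin{align*}
m &= \tfrac{1}{2}(r_1 + r_2), \qquad d = r_1 - r_2
  \quad\Longrightarrow\quad r_1 = m + \tfrac{d}{2}, \quad r_2 = m - \tfrac{d}{2}, \\
0 &\le m \pm \tfrac{d}{2} \le 100
  \quad\Longrightarrow\quad |d| \le 2\,\min(m,\; 100 - m).
\end{align*}
```

Under that assumption the largest possible difference is 5 percentage units at a mean risk of 2.5%, 10 percentage units at a mean risk of 95%, and 100 percentage units at a mean risk of 50%, so large inter-observer differences can only occur in the middle of the risk range.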
Povilas Sladkevicius and Lil Valentin. Inter-observer agreement in describing the ultrasound
appearance of adnexal masses and in calculating the risk of malignancy using logistic regression
models. Clin Cancer Res. Published OnlineFirst November 25, 2014.
doi: 10.1158/1078-0432.CCR-14-0906
Results
In all, 117 consecutive women with adnexal masses who underwent surgery were
examined with ultrasound by the two sonologists, as described above. Thirty-four women had
bilateral adnexal masses. The most complex mass, or the largest one if both masses had
similar ultrasound morphology, was used in our statistical analysis, the mass to be included
being selected retrospectively to ensure that both sonologists contributed the same mass (right
or left) to the analysis. Thus, 117 adnexal masses from 117 patients constitute our study
population. The women's age ranged between 14 and 88 years (median 53), and 63 (54%)
women were postmenopausal. There were 94 benign, four borderline and 19 invasively
malignant adnexal masses (Table 2).
The time elapsed between the ultrasound examinations of sonologists 1 and 2 was median 61
days (10th and 90th percentiles 13 and 132; range 1-204) for the tumors with benign histology
and median 14 days (10th and 90th percentiles 2 and 31; range 1-41) for the tumors with
malignant histology. There was no relationship between the number of days between the
scans and the differences in measurement results or inter-observer agreement for discrete
variables (Supplementary Fig. S1-S5 and Supplementary Table S1).
Inter-observer reproducibility of measurement results is shown in Table 3. Bland-Altman
plots showed no clear trend for inter-observer differences in measurement results to change
with the magnitude of the measurement values. Limits of agreement were wide for all
measurements. There was one systematic difference between the two sonologists: sonologist 1
(who always performed the first examination) obtained higher measurement values for the
maximum diameter of the mass. The least reliable measurement was the height of the largest
papillary projection.
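The mean difference and limits of agreement reported in Table 3 follow the Bland-Altman approach (20). A minimal sketch of that calculation is given below; the function name and the example measurements are hypothetical, and the limits are taken as the mean difference plus or minus 1.96 standard deviations of the differences, which is the usual convention and is assumed here rather than confirmed by the text.

```python
import statistics


def bland_altman(obs1, obs2):
    """Return (mean difference, lower limit, upper limit) of agreement.

    Differences are taken as observer 1 minus observer 2, so a positive
    mean difference means observer 1 tends to measure larger values.
    """
    if len(obs1) != len(obs2) or len(obs1) < 2:
        raise ValueError("need two equally long series with at least 2 pairs")
    diffs = [a - b for a, b in zip(obs1, obs2)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    half_width = 1.96 * sd_diff
    return mean_diff, mean_diff - half_width, mean_diff + half_width


if __name__ == "__main__":
    # Hypothetical maximum diameters (mm) of five masses, two observers.
    sonologist_1 = [70.0, 45.0, 120.0, 33.0, 88.0]
    sonologist_2 = [66.0, 47.0, 110.0, 35.0, 80.0]
    mean_diff, lower, upper = bland_altman(sonologist_1, sonologist_2)
    print(f"mean difference {mean_diff:.2f} mm, "
          f"limits of agreement {lower:.2f} to {upper:.2f} mm")
```

A systematic difference between observers shows up as a mean difference whose confidence interval excludes zero, whereas wide limits of agreement indicate large case-to-case variability even when the mean difference is small.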
Inter-observer agreement when assessing categorical ultrasound variables is shown in Table
4. For most categorical ultrasound variables, inter-observer agreement beyond chance was good
or very good (19). Inter-observer agreement beyond chance for variables included in LR1 or
LR2 was poorest for color score (agreement 40%, weighted Kappa 0.36), presence of blood
flow in papillary projection (agreement 90%, Kappa 0.48), irregular cyst wall (agreement 79%,
Kappa 0.56) and acoustic shadowing (agreement 85%, Kappa 0.58).
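The contrast between a high percentage agreement and a modest Kappa, as for blood flow in papillary projections, follows from how Cohen's Kappa (17) corrects for chance agreement. The sketch below computes unweighted Kappa from a square cross-table of the two observers' ratings. The function name and the counts are hypothetical; the weighted Kappa used for ordinal variables is not shown because the weighting scheme is not stated in this section.

```python
def cohen_kappa(table):
    """Unweighted Cohen's kappa for a square table.

    table[i][j] = number of cases rated category i by observer 1
    and category j by observer 2.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance if the observers rated independently.
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical 2x2 table for a rare finding: rows are observer 1
    # (yes, no), columns are observer 2 (yes, no).
    rare_finding = [[2, 5],
                    [5, 88]]
    n = sum(sum(row) for row in rare_finding)
    agreement = (rare_finding[0][0] + rare_finding[1][1]) / n
    print(f"agreement {agreement:.0%}, kappa {cohen_kappa(rare_finding):.2f}")
```

When a finding is rare, both observers say "no" in most cases, so agreement expected by chance is already high and little room is left for agreement beyond chance.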
Bland-Altman plots illustrating the relationship between the magnitudes of the estimated
risk of malignancy calculated using LR1 and LR2 and the interobserver difference in
calculated risk are shown in Figure 1. The plots manifest a diamond shape, i.e., the
interobserver differences are smallest for the lowest and highest risks, and they are very small
for risks <2.5% and >95%. Logarithmic transformation of the data (20) did not substantially
change the shape of the scatter plot. Therefore, we present our results as absolute
inter-observer differences in calculated risk (in percentage units), see Table 5. There were no
systematic differences in calculated risks between the two sonologists, and reliability,
reflected by the ICC values, was good (22), with ICC values of 0.911 for LR1 and 0.832 for
LR2. When classifying tumors as having a risk of malignancy <10% (benign) or >10%
(malignant) using LR1 or LR2, the inter-observer agreement was good for both models:
inter-observer agreement 84% (98/117), Kappa 0.68 for model LR1, and inter-observer
agreement 85% (99/117), Kappa 0.68 for model LR2. In the 19 cases where the two
sonologists obtained different results with regard to malignancy when using LR1, the absolute
interobserver differences in calculated risk ranged from 0.7 to 59.6 percentage units; in six of
the 19 cases the absolute interobserver difference in calculated risk was <10.0 percentage
units, in nine cases it was 10.0-24.9 percentage units, and in four cases it was >25.0
percentage units. In the 18 cases where the two sonologists obtained different results with
regard to malignancy when using LR2, the absolute interobserver difference in calculated risk
ranged from 8.8 to 67.9 percentage units; in two of the 18 cases the absolute interobserver
difference in calculated risk was <10.0 percentage units, in ten cases it was 10.0-24.9
percentage units, and in six cases it was >25 percentage units.
The Bland-Altman plots (Figure 1) illustrate that for some tumors there were substantial
interobserver differences in the calculated risk of malignancy: when using LR1, the
interobserver difference in calculated risk was >25 percentage units in 11 tumors (9% of all
tumors), and when using LR2, the interobserver difference in calculated risk was >25
percentage units in 14 tumors (12% of all tumors). To elucidate which interobserver
differences explained these largest interobserver differences in calculated risk, we scrutinized
each case where the difference was >25 percentage units. The results are shown in
Supplementary Tables S2 and S3. When using LR1, a discrepancy for one single categorical
variable explained the difference in four of the 11 cases, while a discrepancy for two
categorical variables explained the difference in one case (differences in measurements being
<5 mm in these five cases). In six cases there were differences in one or two categorical
variables but also substantial differences (6-61 mm) in at least one measurement result. In no
case was the large difference in calculated risk explained exclusively by differences in
measurement results. The categorical variables judged differently by the two sonologists in
these 11 cases were color score (n = 5), irregular cyst wall (n = 5), flow in papillary projection
(n = 3) and acoustic shadowing (n = 2).
When using LR2, a discrepancy for one single categorical variable explained the large
difference in calculated risk (>25 percentage units) in eight of the 14 cases (differences in
measurements being <5 mm in these eight cases), and in four of the eight cases the sonologists
judged acoustic shadowing differently. In five cases there were differences in one categorical
variable but also a substantial difference (9-61 mm) in the measurement of the largest solid
component. In yet another case there were differences in two categorical variables as well as in
the measurement of the largest solid component. The categorical variables judged differently
by the two sonologists in these 14 cases were acoustic shadowing (n = 5), irregular cyst wall (n
= 5), ascites (n = 3) and flow in papillary projection (n = 2).
The sensitivity with regard to malignancy when using LR1 (10% risk cutoff) was 100%
(23/23; 95% CI, 82-100) for both sonologists; the specificity was 74% (70/94; 95% CI, 64-82)
for sonologist 1 and 63% (59/94; 95% CI, 53-72) for sonologist 2. The sensitivity when using
LR2 was 100% (23/23; 95% CI, 82-100) for sonologist 1 and 91% (21/23; 95% CI, 72-98) for
sonologist 2, and the specificity was 75.5% (71/94; 95% CI, 65-84) for both sonologists.
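The sensitivity and specificity figures above come from dichotomizing each calculated risk at the 10% cutoff and comparing the result with histology. A minimal sketch of that step is shown below, with hypothetical names and data. It treats risks above 10% as test positive, as stated in the footnote to Table 1. Confidence intervals are left out because the interval method is not stated in this section.

```python
def sensitivity_specificity(risks_percent, is_malignant, cutoff_percent=10.0):
    """Return (sensitivity, specificity) for risks dichotomized at a cutoff.

    A mass is classified as malignant when its calculated risk is strictly
    greater than the cutoff. Both classes must be present in is_malignant.
    """
    tp = fn = tn = fp = 0
    for risk, malignant in zip(risks_percent, is_malignant):
        predicted_malignant = risk > cutoff_percent
        if malignant and predicted_malignant:
            tp += 1
        elif malignant:
            fn += 1
        elif predicted_malignant:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Hypothetical calculated risks (%) and histological outcomes.
    risks = [5.0, 12.0, 50.0, 3.0, 80.0, 9.0]
    malignant = [False, False, True, False, True, True]
    sens, spec = sensitivity_specificity(risks, malignant)
    print(f"sensitivity {sens:.0%}, specificity {spec:.0%}")
```

Applied to each sonologist's risks separately, the same function gives the per-observer figures, for example 70 true negatives among 94 benign masses corresponds to a specificity of 74%.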
Discussion
We have shown substantial inter-observer variability in the results of measurements taken in
adnexal masses (wide limits of agreement). Inter-observer agreement beyond chance was very
good or good for most categorical variables, but it was only moderate or fair for some.
Inter-observer agreement beyond chance was poorest for variables heavily dependent on subjective
evaluation and/or machine settings, i.e., color score, presence of color Doppler signals in
papillary projections, irregular cyst walls, acoustic shadowing (all four variables being
included in LR1 or LR2), echogenicity of cyst fluid, and ovarian crescent sign. Despite this,
there was good inter-observer agreement when classifying tumors as benign or malignant using
the predetermined risk of malignancy cut-off of 10%. However, in some cases there were
substantial differences in the calculated risk of malignancy between the two sonologists, the
difference being >25.0 percentage units in 9% of all tumors when using LR1 and in 12% of all
tumors when using LR2.
The strength of our study is that it provides new information. To the best of our knowledge,
there is only one publication reporting on interobserver agreement with regard to describing
ultrasound findings in adnexal masses using the IOTA terminology (11) when performing live
ultrasound examinations (23). However, that study (23) evaluated interobserver agreement
with regard to the ten ultrasound features in the IOTA simple rules (24, 25), not the variables
included in the IOTA logistic regression models LR1 and LR2, and agreement was estimated
between examiners with different levels of experience. The variable with poorest agreement
beyond chance in the study cited was acoustic shadowing (Kappa 0.36). We have found no
published study that has estimated inter-observer reproducibility of the calculated risk of
malignancy using LR1 or LR2 after live scanning.
It is a limitation of our study that up to 204 days elapsed between the scans of the two
sonologists (up to 41 days for malignant masses). Because days elapsed between the scans,
theoretically the inter-observer differences could be explained by the lesions having changed
in size or morphology between the scans. We find this highly unlikely for the following
reasons. First, there was no relationship between the differences in measurement results and
the number of days between the scans (Supplementary Fig. S1-S5). Nor was there a clear
tendency for inter-observer agreement for discrete variables to depend on the time between the
scans (Supplementary Table S1). Second, one would expect a lesion and its components to
increase in size with time, but sonologist 1, performing the first scan, obtained higher
measurement values than sonologist 2. Third, it is our experience, after having performed
gynecological scans for more than 20 years, that the ultrasound morphology of both benign and
malignant adnexal masses remains constant over time, that benign adnexal lesions grow
slowly, and that malignant masses do not change appreciably in size even during 1 month of
observation. Therefore, we believe that the discrepancies between the two sonologists reflect
true inter-observer differences and not a change of the masses over time. A second limitation is
that we did not include estimation of the reproducibility of retrieving anamnestic information
(current hormonal therapy, personal history of ovarian cancer), the anamnestic information
collected by the second sonologist being used in all cases. It cannot be entirely excluded that
patients would answer differently when asked by different sonologists, or that sonologists
could interpret the answers of the patients differently. A third limitation is that we did not
estimate intra-observer reproducibility. We considered four scans (two per sonologist) likely to
be unacceptable to patients. For the same reason, only two sonologists were involved in this
study, and our results are generalizable only to sonologists with a similar level of experience.
The results of this live scanning study are similar to those of another study in which the
same sonologists assessed the same variables using 3D ultrasound volumes from adnexal
masses in another tumor population (15). The similarity in results between the two studies is
surprising, because the conditions when assessing 3D ultrasound volumes are different from
those during a live scan. When evaluating ultrasound volumes, sonologists are exposed to the
same ultrasound images, so any inter-observer difference should be explained exclusively
by differences in interpreting the ultrasound information. During a live scan there are more
sources of bias. This could result in either poorer or better inter-observer agreement than when
3D ultrasound volumes are assessed: poorer, because ultrasound examiners are likely to use
different machine settings and scanning conditions may change from one minute to another;
better, because the dynamic nature of live scanning facilitates discrimination between solid
components and amorphous tissue.

Our results showed that two experienced sonologists agreed quite well in their classification
of masses as benign or malignant using the 10% risk of malignancy cutoff of LR1 and LR2,
and that the diagnostic performance of LR1 and LR2 with regard to discrimination between
benign and malignant tumors was similar for the two sonologists and similar to that reported
by others (14, 26-28). This is reassuring, because the main purpose of using models LR1 and
LR2 is to classify tumors as benign or malignant. Potentially, however, LR1 and LR2 can be
used not only to classify adnexal masses as benign or malignant but also to counsel a patient
about her individual risk of malignancy (13). To use the calculated risk for individual
counseling, one must be reasonably certain not only that the estimated risk agrees well with the
true risk (when externally validated, both LR1 and LR2 underestimated the true risk, especially
in the risk interval 30-70% (14)) but also that the risk estimates are reproducible, i.e., that
different examiners will obtain similar risk estimates. Our results show that risk estimates may
differ substantially between experienced observers, the difference in estimated risk being >25.0
percentage units in 9% and 12% of cases when using LR1 and LR2, respectively. Inter-observer
agreement above chance was poorest for those variables in the models that are heavily
dependent on subjective evaluation, i.e., color score, presence of color Doppler signals in
papillary projections, irregular cyst walls, and acoustic shadowing. Indeed, differences in these variables
explained most of the largest inter-observer differences in calculated risk of malignancy. In
models based on few variables, a change in the value of only one variable may result in large
differences in predicted risk, while a model with many variables is less vulnerable to a change
in one or even a few variables. Our results illustrate this (Supplementary Tables S2 and S3).
When using LR2 (which includes six variables), a change in value for one single categorical
variable explained an inter-observer difference in calculated risk >25 percentage units in eight
of 14 cases, while when using LR1 (which includes 12 variables), a change in value for one
single categorical variable explained an inter-observer difference in calculated risk >25
percentage units in only four of 11 cases. Acoustic shadowing is a strong variable in both LR1
and LR2 and has great impact on the calculated risk in LR2, which has only six variables. In our
hands, as well as in those of Ruiz de Gauna et al. (23), inter-observer agreement for acoustic
shadowing was at most moderate. The inter-observer agreement for color score was only fair in
our study, and color score is an important variable in LR1.
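
The influence of a single categorical variable on an LR2 risk estimate can be sketched numerically. The snippet below is only an illustration: it uses the LR2 coefficients given in the footnote to Table 1, but the patient and the findings are hypothetical and were chosen solely to show how two observers who disagree only on acoustic shadowing could arrive at very different calculated risks.

```python
import math


def lr2_risk(age_years, ascites, papillary_flow, max_solid_mm,
             irregular_walls, acoustic_shadows):
    """Risk of malignancy (0-1) from LR2, coefficients as in the Table 1 footnote."""
    z = (-5.3718
         + 0.0354 * age_years
         + 1.6159 * ascites
         + 1.1768 * papillary_flow
         + 0.0697 * min(max_solid_mm, 50)  # solid component: no increase above 50 mm
         + 0.9586 * irregular_walls
         - 2.9486 * acoustic_shadows)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical mass (not a study case): 60-year-old patient,
# 30 mm solid component, irregular cyst walls, no ascites, no papillary flow.
findings = dict(age_years=60, ascites=0, papillary_flow=0,
                max_solid_mm=30, irregular_walls=1)

risk_without_shadows = lr2_risk(acoustic_shadows=0, **findings)
risk_with_shadows = lr2_risk(acoustic_shadows=1, **findings)
difference_pct_units = 100 * (risk_without_shadows - risk_with_shadows)

print(f"Risk without shadows: {100 * risk_without_shadows:.1f}%")
print(f"Risk with shadows:    {100 * risk_with_shadows:.1f}%")
print(f"Difference:           {difference_pct_units:.1f} percentage units")
```

In this hypothetical example the two estimates fall on opposite sides of the 10% cutoff, and the difference exceeds 25 percentage units although only one variable differs.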
To improve inter-observer reproducibility of calculated risks based on LR1 and LR2,
inter-observer differences in descriptions and measurements of adnexal masses using the IOTA
terminology and measurement technique need to be reduced. One way to achieve this could be to
provide courses on, and training in, how to examine and describe adnexal masses using the
IOTA terms. Interactive courses in which a large number of ultrasound images are discussed with
the course participants are likely to be very valuable in this respect. More precise definitions of
the IOTA terms, for example by providing ample imaging material, would probably also help
improve inter-observer agreement. Special attention should be given to the variables with poorest
reproducibility, i.e., the color score, wall irregularity, acoustic shadowing, and detection of blood
flow in papillary projections. Until better inter-observer agreement in the calculated risk of
malignancy using LR1 and LR2 has been shown, one should be cautious about using the risk
estimates for individual patient counseling.
Acknowledgements
None
References
1. Granberg S, Norström A, Wikland M. Tumors in the lower pelvis as imaged by vaginal
sonography. Gynecol Oncol 1990;37:224-9.
2. Benacerraf BR, Finkler NJ, Wojciechowski C, Knapp RC. Sonographic accuracy in the
diagnosis of ovarian masses. J Reprod Med 1990;35:491-5.
3. Valentin L. Pattern recognition of pelvic masses by gray-scale ultrasound imaging: the
contribution of Doppler ultrasound. Ultrasound Obstet Gynecol 1999;14:338-47.
4. Valentin L. Prospective cross-validation of Doppler ultrasound examination and
gray-scale ultrasound imaging for discrimination of benign and malignant pelvic masses.
Ultrasound Obstet Gynecol 1999;14:273-83.
5. Timmerman D, Schwärzler P, Collins WP, Claerhout F, Coenen M, Amant F, et al.
Subjective assessment of adnexal masses with the use of ultrasonography: an analysis
of interobserver variability and experience. Ultrasound Obstet Gynecol 1999;13:11-6.
6. Sokalska A, Timmerman D, Testa AC, Van Holsbeke C, Lissoni AA, Leone FPG, et al.
Diagnostic accuracy of transvaginal ultrasound examination for assigning a specific
diagnosis to adnexal masses. Ultrasound Obstet Gynecol 2009;34:462-70.
7. Valentin L. Use of morphology to characterize and manage common adnexal masses.
Best Pract Res Clin Obstet Gynaecol 2004;18:71-89.
8. Tailor A, Jurkovic D, Bourne TH, Collins WP, Campbell S. Sonographic prediction of
malignancy in adnexal masses using multivariate logistic regression analysis.
Ultrasound Obstet Gynecol 1997;10:41-7.
9. Timmerman D, Bourne TH, Tailor A, Collins WP, Verrelst H, Vandenberghe K, et al.
A comparison of methods for the preoperative discrimination between benign and
malignant adnexal masses: the development of a new logistic regression model. Am J
Obstet Gynecol 1999;181:57-65.
10. Alcazar JL, Jurado M. Prospective evaluation of logistic model based on sonographic
morphologic and color Doppler findings developed to predict adnexal malignancy. J
Ultrasound Med 1999;18:837-42.
11. Timmerman D, Valentin L, Bourne TH, Collins WP, Verrelst H, Vergote I. Terms,
definitions and measurements to describe the sonographic features of adnexal tumors: a
consensus opinion from the International Ovarian Tumor Analysis (IOTA) Group.
Ultrasound Obstet Gynecol 2000;16:500-5.
12. Timmerman D, Testa AC, Bourne T, Ferrazzi E, Ameye L, Konstantinovic ML, et al.
Logistic regression model to distinguish between the benign and malignant adnexal
mass before surgery: a multicenter study by the International Ovarian Tumor Analysis
Group. J Clin Oncol 2005;23:8794-801.
13. Kaijser J, Bourne T, Valentin L, Sayasneh A, Van Holsbeke C, Vergote I, et al.
Improving strategies for diagnosing ovarian cancer: a summary of the International
Ovarian Tumor Analysis (IOTA) studies. Ultrasound Obstet Gynecol 2013;41:9-20.
14. Timmerman D, Van Calster B, Testa AC, Guerriero S, Fischerova D, Lissoni AA, et al.
Ovarian cancer prediction in adnexal masses using ultrasound-based logistic regression
models: a temporal and external validation study by the IOTA group. Ultrasound Obstet
Gynecol 2010;36:226-34.
15. Sladkevicius P, Valentin L. Intra- and inter-observer agreement when describing
adnexal masses using the International Ovarian Tumour Analysis (IOTA) terms and
definitions: a study on three-dimensional (3D) ultrasound volumes. Ultrasound Obstet
Gynecol 2013;41:318-27.
16. Heintz APM, Odicino F, Maisonneuve P, Beller U, Benedet JL, Creasman WT, et al.
Carcinoma of the Ovary. 25th Annual Report on the Results of Treatment in
Gynecological Cancer. Int J Gynecol Obstet 2003;83 (suppl 1):S135-S166.
17. Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas 1960;20:
37-46.
18. Kundel HL, Polansky M. Measurement of observer agreement. Radiology 2003;228:
303-8.
19. Brennan P, Silman A. Statistical methods for assessing observer variability in clinical
measures. BMJ 1992;304:1491-4.
20. Bland JM, Altman DG. Statistical methods for assessing agreement between two
methods of clinical measurement. Lancet 1986;1:307-10.
21. Bartlett JW, Frost C. Reliability, repeatability and reproducibility: analysis of
measurement errors in continuous variables. Ultrasound Obstet Gynecol 2008;31:466-75.
22. Kottner J, Audigé L, Brorson S, Donner A, Gajewski BJ, Hróbjartsson A, et al.
Guidelines for Reporting Reliability and Agreement Studies (GRRAS) were proposed. J
Clin Epidemiol 2011;64:96-106.
23. Ruiz de Gauna B, Sanchez P, Pineda L, Utrilla-Layna J, Juez L, Alcázar JL.
Inter-observer agreement with regard to describing adnexal masses using the IOTA simple
rules in a real-time setting and when using three-dimensional ultrasound volumes and
digital clips. Ultrasound Obstet Gynecol 2014;44:95-100.
24. Timmerman D, Testa AC, Bourne T, Ameye L, Jurkovic D, Van Holsbeke C, et al.
Simple ultrasound-based rules for the diagnosis of ovarian cancer. Ultrasound Obstet
Gynecol 2008;31:681-90.
25. Timmerman D, Ameye L, Fischerova D, Epstein E, Melis GB, Guerriero S, et al.
Simple ultrasound rules to distinguish between benign and malignant adnexal masses
before surgery: prospective validation by IOTA group. BMJ 2010;341:c6839.
26. Van Holsbeke C, Van Calster B, Testa AC, Domali E, Lu C, Van Huffel S, et al.
Prospective internal validation of mathematical models to predict malignancy in
adnexal masses: results from the international ovarian tumor analysis study. Clin Cancer
Res 2009;15:684-91.
27. Nunes N, Yazbek J, Ambler G, Hoo W, Naftalin J, Jurkovic D. Prospective evaluation
of the IOTA logistic regression model LR2 for the diagnosis of ovarian cancer.
Ultrasound Obstet Gynecol 2012;40:355-9.
28. Nunes N, Ambler G, Hoo WL, Naftalin J, Foo X, Widschwendter M, et al.
A prospective validation of the IOTA logistic regression models (LR1 and LR2) in
comparison to subjective pattern recognition for the diagnosis of ovarian cancer.
Int J Gynecol Cancer 2013;23:1583-9.
Table 1. Variables for which inter-observer reproducibility was estimated
___________________________________________________________________________
Variables included in logistic regression models LR1 and LR2, or of importance for these models
  Maximum diameter of adnexal mass                   mm
  Maximum diameter of largest solid component        mm
  Maximum diameter of largest solid component        ≤50, >50 mm
  Presence of an entirely solid adnexal mass         yes, no
  Papillary projections present                      yes, no
  Irregular internal cyst walls                      yes, no
  Acoustic shadows                                   yes, no
  Color Doppler signals in papillary projection      yes, no
  Color score                                        1, 2, 3, 4
  Tenderness of adnexal mass at scan                 yes, no
  Ascites                                            yes, no
Other IOTA variables
 Continuous IOTA variables
  Mean diameter of adnexal mass                      mm
  Mean diameter of largest solid component           mm
  Height of the largest papillary projection         mm
 Categorical IOTA variables
  Type of tumor                                      unilocular, unilocular solid, multilocular, multilocular solid, solid
  Mean diameter of adnexal mass                      ≤40, 41-60, 61-80, 81-100 and >100 mm
  Number of cyst locules                             0, 1, 2, 3, 4, 5, 6-10, 11-20 and >20; ≤10, >10†
  Septum present                                     yes, no
  Incomplete septum present                          yes, no
  Solid component                                    yes, no
  Papillary projections                              0, 1, 2, 3, ≥4; 1, 2, 3, ≥4‡; <4, >4†
  Echogenicity of cyst fluid                         anechoic, low level, ground glass, mixed, no cyst fluid
  Color Doppler signals detectable                   yes, no
  Ovarian crescent sign                              yes, no
Diagnosis based on LR1 and LR2
  Calculated risk of malignancy using LR1            ≤10%, >10%
  Calculated risk of malignancy using LR2            ≤10%, >10%
___________________________________________________________________________
IOTA, International Ovarian Tumor Analysis group; LR1, logistic regression model 1; LR2, logistic regression model 2.
The risk of malignancy using LR1 is derived as $y = 1/(1 + e^{-z})$, where
$z = -6.7468 + 1.5985\,(a) - 0.9983\,(b) + 0.0326\,(c) + 0.00841\,(d) - 0.8577\,(e) + 1.5513\,(f) + 1.1737\,(g) + 0.9281\,(h) + 0.0496\,(i) + 1.1421\,(j) - 2.3550\,(k) + 0.4916\,(l)$
and $e$ is the mathematical constant and base value of natural logarithms. The 12 variables included in LR1 are: (a) personal history of ovarian cancer (yes = 1, no = 0); (b) current hormonal therapy (yes = 1, no = 0); (c) age of the patient (in years); (d) maximum diameter of the lesion (in millimeters); (e) mass tender at scan (yes = 1, no = 0); (f) the presence of ascites (yes = 1, no = 0); (g) the presence of blood flow within a solid papillary projection (yes = 1, no = 0); (h) the presence of a purely solid tumor (yes = 1, no = 0); (i) maximal diameter of the solid component (expressed in millimeters, but with no increase >50 mm); (j) irregular internal cyst walls (yes = 1, no = 0); (k) the presence of acoustic shadows (yes = 1, no = 0); and (l) color score (1, 2, 3 or 4).
The risk of malignancy using LR2 is derived as $y = 1/(1 + e^{-z})$, where
$z = -5.3718 + 0.0354\,(c) + 1.6159\,(f) + 1.1768\,(g) + 0.0697\,(i) + 0.9586\,(j) - 2.9486\,(k)$ (12).
A calculated risk of malignancy of more than 10% classified the adnexal mass as malignant.
† includes 0. ‡ includes only cases where papillary projections were registered at both examinations.
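
The two formulas in the footnote above can be written out as a short calculation. The following is a minimal sketch, not the authors' or the IOTA group's software: the coefficients are those listed in the footnote, while the dictionary keys, the function names `predicted_risk` and `classify`, and the example patient are hypothetical and introduced only for illustration.

```python
import math

# Coefficients for variables (a)-(l) as listed in the Table 1 footnote.
LR1_INTERCEPT = -6.7468
LR1_COEF = {
    "history_ovarian_cancer": 1.5985,   # (a) yes = 1, no = 0
    "hormonal_therapy": -0.9983,        # (b) yes = 1, no = 0
    "age_years": 0.0326,                # (c)
    "max_lesion_diameter_mm": 0.00841,  # (d)
    "tenderness": -0.8577,              # (e) yes = 1, no = 0
    "ascites": 1.5513,                  # (f) yes = 1, no = 0
    "papillary_flow": 1.1737,           # (g) yes = 1, no = 0
    "purely_solid": 0.9281,             # (h) yes = 1, no = 0
    "max_solid_diameter_mm": 0.0496,    # (i) no increase above 50 mm
    "irregular_walls": 1.1421,          # (j) yes = 1, no = 0
    "acoustic_shadows": -2.3550,        # (k) yes = 1, no = 0
    "color_score": 0.4916,              # (l) 1, 2, 3 or 4
}

LR2_INTERCEPT = -5.3718
LR2_COEF = {
    "age_years": 0.0354,                # (c)
    "ascites": 1.6159,                  # (f)
    "papillary_flow": 1.1768,           # (g)
    "max_solid_diameter_mm": 0.0697,    # (i)
    "irregular_walls": 0.9586,          # (j)
    "acoustic_shadows": -2.9486,        # (k)
}


def predicted_risk(intercept, coef, findings):
    """Return y = 1 / (1 + e^-z) for one set of findings."""
    values = dict(findings)
    # The solid component contributes with no increase above 50 mm.
    values["max_solid_diameter_mm"] = min(values["max_solid_diameter_mm"], 50)
    z = intercept + sum(weight * values[name] for name, weight in coef.items())
    return 1.0 / (1.0 + math.exp(-z))


def classify(risk, cutoff=0.10):
    """A risk of more than 10% classifies the mass as malignant."""
    return "malignant" if risk > cutoff else "benign"


# Hypothetical patient (not a study case): 50 years old, 80 mm cyst,
# no solid component, no other positive findings, color score 1.
example = {
    "history_ovarian_cancer": 0, "hormonal_therapy": 0, "age_years": 50,
    "max_lesion_diameter_mm": 80, "tenderness": 0, "ascites": 0,
    "papillary_flow": 0, "purely_solid": 0, "max_solid_diameter_mm": 0,
    "irregular_walls": 0, "acoustic_shadows": 0, "color_score": 1,
}

risk_lr1 = predicted_risk(LR1_INTERCEPT, LR1_COEF, example)
risk_lr2 = predicted_risk(LR2_INTERCEPT, LR2_COEF, example)
print(f"LR1: {100 * risk_lr1:.1f}% ({classify(risk_lr1)})")
print(f"LR2: {100 * risk_lr2:.1f}% ({classify(risk_lr2)})")
```

Keeping the coefficients in dictionaries keyed by variable name makes it easy to recompute the risk after changing a single finding, which is how the effect of an inter-observer disagreement on one variable could be explored.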
Table 2. Histological diagnoses of the masses
___________________________________________________________________________
Histological diagnosis of tumor                     Number
___________________________________________________________________________
Benign tumors                                         94
  Benign simple cyst                                   7
  Endometrioma                                        10
  Dermoid cyst                                        16
  Serous cystadenoma                                  16
  Mucinous cystadenoma                                18
  Myoma/fibroma                                        9
  Cystadenofibroma                                    11
  Paraovarian cyst                                     5
  Sactosalpinx/chronic salpingitis                     1
  Leydig cell tumor                                    1
Borderline tumors                                      4
  Serous                                               2
  Mucinous                                             1
  Endometrioid                                         1
Invasive malignancy                                   19
  Primary ovarian adenocarcinoma                      13
  Granulosa cell tumor                                 3
  Dysgerminoma                                         1
  Leiomyosarcoma                                       1
  Malignant aggressive B-cell lymphoma                 1
___________________________________________________________________________
Table 3. Inter-observer reproducibility of continuous variables used in the risk calculation models LR1 and LR2 and of other continuous variables
used to describe adnexal masses
___________________________________________________________________________
Columns: Parameter | Measurement results, both sonologists: Median; Range | Difference in mm between the two measurements made by sonologists 1 and 2 (a): Mean; 95% CI; Limits of agreement | Intra-CC: Point estimate (95% CI)
___________________________________________________________________________
Variables used in models LR1 and LR2
  Maximum diameter of adnexal mass, mm
    Median 70 (n=234); Range 10 to 313
    Mean difference 3.80 (n=117); 95% CI 1.12 to 6.48; Limits of agreement -25.24 to 32.82
    Intra-CC 0.958 (0.937 to 0.971)
  Maximum diameter of largest solid component, mm (b)
    Median 29.50 (n=122); Range 5 to 180
    Mean difference 1.92 (n=61); 95% CI -1.74 to 5.58; Limits of agreement -26.66 to 30.50
    Intra-CC 0.942 (0.905 to 0.964)
Other variables used to describe adnexal mass
  Mean diameter of adnexal mass, mm
    Median 58.5 (n=234); Range 9 to 240
    Mean difference 1.05 (n=117); 95% CI -0.15 to 1.95; Limits of agreement -8.61 to 10.72
    Intra-CC 0.971 (0.958 to 0.980)
  Mean diameter of largest solid component, mm (b)
    Median 22 (n=122); Range 4 to 156
    Mean difference 0.59 (n=61); 95% CI -1.82 to 2.98; Limits of agreement -18.16 to 19.32
    Intra-CC 0.962 (0.937 to 0.977)
  Height of largest papillary projection, mm (c)
    Median 8 (n=42); Range 3 to 25
    Mean difference -0.51 (n=21); 95% CI -2.93 to 1.91; Limits of agreement -11.61 to 10.59
    Intra-CC 0.609 (0.245 to 0.821)
___________________________________________________________________________
CI, confidence interval; Intra-CC, intra-class correlation coefficient.
(a) The measurements made by sonologist 2 were subtracted from the measurements made by sonologist 1.
(b) Includes only cases where solid components were registered by both observers.
(c) Includes only cases where papillary projections were registered by both observers. Using logarithmically transformed data, the results were as
follows: mean difference 1.03 (95% CI, 0.78 to 1.36); limits of agreement 0.31 to 3.61; Intra-CC 0.547 (0.153 to 0.789). These results can be used for
comparison with our published results on inter-observer agreement when analyzing ultrasound volumes obtained by three-dimensional
ultrasound (15).
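
The mean difference and limits of agreement reported in Table 3 follow the Bland-Altman approach (20). As a minimal sketch of that calculation, assuming the conventional definition of the limits as the mean difference plus or minus 1.96 standard deviations of the differences: the function name `bland_altman` and the paired diameters below are hypothetical and are not study data.

```python
import statistics


def bland_altman(observer_1, observer_2):
    """Return the mean difference and the 95% limits of agreement.

    Differences are taken as observer 1 minus observer 2, matching
    footnote (a) of Table 3.
    """
    differences = [a - b for a, b in zip(observer_1, observer_2)]
    mean_difference = statistics.mean(differences)
    sd_difference = statistics.stdev(differences)  # sample standard deviation
    lower = mean_difference - 1.96 * sd_difference
    upper = mean_difference + 1.96 * sd_difference
    return mean_difference, (lower, upper)


# Hypothetical maximum diameters (mm) of five masses measured by two observers.
sonologist_1 = [52, 70, 33, 101, 64]
sonologist_2 = [50, 66, 35, 95, 60]

mean_difference, (lower, upper) = bland_altman(sonologist_1, sonologist_2)
print(f"Mean difference: {mean_difference:.2f} mm")
print(f"Limits of agreement: {lower:.2f} to {upper:.2f} mm")
```

A mean difference clearly different from zero points to a systematic difference between observers, whereas wide limits of agreement indicate large random disagreement for individual masses.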
Table 4. Inter-observer agreement for categorical variables included in the risk calculation models LR1 and LR2 and for other categorical variables used to describe adnexal masses
____________________________________________________________________________________________
Variable                                                                   Agreement, % (n/N)   Kappa value
____________________________________________________________________________________________
Variables used in models LR1 and LR2
  Presence of entirely solid tumor (yes, no)                                  97 (114/117)        0.91
  Maximum diameter of largest solid component (≤50 mm, >50 mm)                95 (111/117)        0.83
  Maximum diameter of largest solid component (≤50 mm, >50 mm) (a)            92 (56/61)          0.82
  Tenderness of adnexal mass (yes, no)                                        97 (113/117)        0.78
  Ascites (yes, no)                                                           97 (113/117)        0.73
  Papillary projections (yes, no)                                             87 (102/117)        0.65
  Solid component (yes, no)                                                   84 (98/117)         0.67
  Irregular cyst wall (yes, no)                                               79 (92/117)         0.56
  Acoustic shadows (yes, no)                                                  85 (99/117)         0.58
  Color Doppler signals in papillary projections (yes, no) (b)                90 (105/117)        0.48
  Color Doppler signals in papillary projections (yes, no) (c)                86 (18/21)          0.71
  Color score (1, 2, 3, 4)                                                    40 (47/117)         0.36 (d)
Other variables used to describe adnexal masses
  Tumor type (unilocular, unilocular solid, multilocular,
    multilocular solid, solid)                                                97 (113/117)        0.70
  Mean diameter of tumor (mm): ≤40, 41-60, 61-80, 81-100, >100                66 (77/117)         0.76 (d)
  Number of locules: 0, 1, 2, 3, 4, 5, 6-10, 11-20, >20                       58 (68/117)         0.79 (d)
  Number of locules: ≤10, >10 (e)                                             94 (110/117)        0.80
  Septum (yes, no)                                                            88 (103/117)        0.76
  Incomplete septum (yes, no)                                                 98 (115/117)        ¶
  Number of papillary projections: 0, 1, 2, 3, ≥4                             81 (95/117)         0.64 (d)
  Number of papillary projections: 1, 2, 3, ≥4 (c)                            67 (14/21)          0.63 (d)
  Number of papillary projections: <4, >4 (e)                                 96 (112/117)        0.68
  Echogenicity of cyst fluid (anechoic, low level, ground glass,
    mixed, no fluid)                                                          67 (78/117)         0.56
  Color Doppler signals detectable (yes, no)                                  84 (98/117)         0.30
  Ovarian crescent sign (yes, no)                                             78 (91/117)         0.49
  Fluid in Douglas pouch (yes, no)                                            87 (102/117)        0.72
____________________________________________________________________________________________
(a) Includes only cases where solid components were registered at both examinations.
(b) This variable reflects disagreement both with regard to the presence of papillary projections and with regard to the presence of blood flow in papillary projections (i.e., if one observer recorded the presence of papillary projections and the other did not, there was disagreement with regard to this variable, and if both observers recorded the presence of papillary projections but disagreed with regard to the presence of blood flow in the papillary projections, there was disagreement with regard to this variable).
(c) Includes only cases where papillary projections were registered at both examinations.
(d) Weighted Kappa index.
(e) Includes 0.
¶ Absent kappa values are explained by asymmetric field tables making it impossible to calculate them.
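
Table 4 shows that a high percentage agreement can coexist with only moderate agreement beyond chance. The sketch below computes the unweighted Kappa of Cohen (17) from a cross-table of the two observers' ratings. The function name `cohen_kappa` and the 2 x 2 counts are hypothetical: the counts were chosen only so that 105 of 117 ratings agree, and they are not the cross-table of this study.

```python
def cohen_kappa(table):
    """Unweighted Cohen's Kappa for a square table of counts.

    table[i][j] is the number of masses rated category i by observer 1
    and category j by observer 2.
    """
    size = len(table)
    total = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(size)) / total
    row_share = [sum(table[i]) / total for i in range(size)]
    col_share = [sum(table[i][j] for i in range(size)) / total
                 for j in range(size)]
    # Agreement expected by chance from the marginal proportions.
    expected = sum(r * c for r, c in zip(row_share, col_share))
    return (observed - expected) / (1 - expected)


# Hypothetical counts for a rare finding (rows: observer 1 yes/no,
# columns: observer 2 yes/no).
hypothetical_table = [[6, 7],
                      [5, 99]]

agreement = (6 + 99) / 117
kappa = cohen_kappa(hypothetical_table)
print(f"Agreement: {100 * agreement:.0f}%  Kappa: {kappa:.2f}")
```

Because both observers record "no" in most masses, a large part of the observed agreement is expected by chance alone, which is why Kappa is much lower than the percentage agreement for rare findings.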
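Table 5. Inter-observer reproducibility of the risk of malignancy calculated using the risk calculation models LR1 and LR2
___________________________________________________________________________
Columns: Parameter | Calculated risk of malignancy (%), both sonologists: Median; Range | Difference between the risk calculated by sonologist 1 and 2 (percentage units): Mean; 95% CI; Limits of agreement | Intra-CC: Point estimate (95% CI)
___________________________________________________________________________
Risk of malignancy calculated using LR1
  Median 7.85 (n=234); Range 0.10 to 99.10
  Mean difference -0.53 (n=117); 95% CI -3.07 to 2.01; Limits of agreement -28.05 to 26.99
  Intra-CC 0.911 (0.874 to 0.937)
Risk of malignancy calculated using LR2
  Median 6.65 (n=234); Range 0.10 to 98.40
  Mean difference 0.02 (n=117); 95% CI -3.06 to 3.10; Limits of agreement -33.22 to 33.26
  Intra-CC 0.832 (0.766 to 0.880)
___________________________________________________________________________
CI, confidence interval; Intra-CC, intra-class correlation coefficient.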
Figure legend
Figure 1. a) Scatterplot showing the relationship between inter-observer difference (observer
1 minus observer 2) in calculated risk and magnitude of calculated risk when using logistic
regression model LR1. The plot manifests a diamond shape, the differences being smallest for
the lowest and highest risks. For risks <2.5% and >95% the differences are very small.
LOA, limits of agreement. b) Scatterplot showing the relationship between inter-observer
difference in calculated risk and magnitude of calculated risk when using logistic regression
model LR2. The plot manifests a diamond shape, the differences being smallest for the lowest
and highest risks.
Clin Cancer Res. Published OnlineFirst November 25, 2014.
Povilas Sladkevicius and Lil Valentin
Inter-observer agreement in describing the ultrasound appearance of adnexal masses and in calculating the risk of malignancy using logistic regression models
Updated version: Access the most recent version of this article at doi: 10.1158/1078-0432.CCR-14-0906
Supplementary Material: Access the most recent supplemental material at
http://clincancerres.aacrjournals.org/content/suppl/2014/11/26/1078-0432.CCR-14-0906.DC1
or very good (19). Inter-observer agreement beyond chance for variables included in LR1 or
LR2 was poorest for color score (agreement 40%, weighted Kappa 0.36), presence of blood
flow in papillary projection (agreement 90%, Kappa 0.48), irregular cyst wall (agreement 79%,
Kappa 0.56), and acoustic shadowing (agreement 85%, Kappa 0.58).

Bland-Altman plots illustrating the relationship between the magnitude of the estimated
risk of malignancy calculated using LR1 and LR2 and the inter-observer difference in
calculated risk are shown in Figure 1. The plots manifest a diamond shape, i.e., the
inter-observer differences are smallest for the lowest and highest risks, and they are very small
for risks <2.5% and >95%. Logarithmic transformation of the data (20) did not substantially
change the shape of the scatter plot. Therefore, we present our results as absolute
inter-observer differences in calculated risk (in percentage units); see Table 5. There were no
systematic differences in calculated risks between the two sonologists, and reliability as
reflected by the ICC values was good (22), with ICC values of 0.911 for LR1 and 0.832 for
LR2. When classifying tumors as having a risk of malignancy ≤10% (benign) or >10%
(malignant) using LR1 or LR2, the inter-observer agreement was good for both models:
inter-observer agreement 84% (98/117), Kappa 0.68, for model LR1, and inter-observer
agreement 85% (99/117), Kappa 0.68, for model LR2. In the 19 cases where the two
sonologists obtained different results with regard to malignancy when using LR1, the absolute
inter-observer difference in calculated risk ranged from 0.7 to 59.6 percentage units: in six of
the 19 cases the absolute inter-observer difference in calculated risk was <10.0 percentage
units, in nine cases it was 10.0-24.9 percentage units, and in four cases it was >25.0
percentage units. In the 18 cases where the two sonologists obtained different results with
regard to malignancy when using LR2, the absolute inter-observer difference in calculated risk
ranged from 8.8 to 67.9 percentage units: in two of the 18 cases the absolute inter-observer
difference in calculated risk was <10.0 percentage units, in ten cases it was 10.0-24.9
percentage units, and in six cases it was >25 percentage units.

The Bland-Altman plots (Figure 1) illustrate that for some tumors there were substantial
inter-observer differences in the calculated risk of malignancy: when using LR1, the
inter-observer difference in calculated risk was >25 percentage units in 11 tumors (9% of all
tumors), and when using LR2, the inter-observer difference in calculated risk was >25
percentage units in 14 tumors (12% of all tumors). To elucidate which discrepancies in the
recorded variables explained these largest inter-observer differences in calculated risk, we scrutinized
each case where the difference was >25 percentage units. The results are shown in
Supplementary Tables S2 and S3. When using LR1, a discrepancy for one single categorical
variable explained the difference in four of the 11 cases, while a discrepancy for two
categorical variables explained the difference in one case (differences in measurements being
<5 mm in these five cases). In six cases there were differences in one or two categorical
variables but also substantial differences (6-61 mm) in at least one measurement result. In no
case was the large difference in calculated risk explained exclusively by differences in
measurement results. The categorical variables judged differently by the two sonologists in
these 11 cases were color score (n = 5), irregular cyst wall (n = 5), flow in papillary projection
(n = 3), and acoustic shadowing (n = 2).

When using LR2, a discrepancy for one single categorical variable explained the large
difference in calculated risk (>25 percentage units) in eight of the 14 cases (differences in
measurements being <5 mm in these eight cases), and in four of the eight cases the sonologists
judged acoustic shadowing differently. In five cases there were differences in one categorical
variable but also a substantial difference (9-61 mm) in the measurement of the largest solid
component. In yet another case there were differences in two categorical variables as well as in
the measurement of the largest solid component. The categorical variables judged differently
by the two sonologists in these 14 cases were acoustic shadowing (n = 5), irregular cyst wall
(n = 5), ascites (n = 3), and flow in papillary projection (n = 2).

The sensitivity with regard to malignancy when using LR1 (10% risk cutoff) was 100%
(23/23; 95% CI, 82-100) for both sonologists; the specificity was 74% (70/94; 95% CI, 64-82)
for sonologist 1 and 63% (59/94; 95% CI, 53-72) for sonologist 2. The sensitivity when using
LR2 was 100% (23/23; 95% CI, 82-100) for sonologist 1 and 91% (21/23; 95% CI, 72-98) for
sonologist 2, and the specificity was 75.5% (71/94; 95% CI, 65-84) for both sonologists.
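
The percentages above follow directly from the counts given in parentheses. As a small arithmetic check, the sketch below recomputes them; the helper `percentage` is a hypothetical name, and confidence intervals are not recomputed because the interval method is not stated in this section.

```python
def percentage(numerator, denominator):
    """Proportion expressed in percent."""
    return 100.0 * numerator / denominator


# Counts as reported in the text: 23 malignant and 94 benign masses.
lr1_sensitivity = percentage(23, 23)    # both sonologists
lr1_specificity_1 = percentage(70, 94)  # sonologist 1
lr1_specificity_2 = percentage(59, 94)  # sonologist 2
lr2_sensitivity_2 = percentage(21, 23)  # sonologist 2
lr2_specificity = percentage(71, 94)    # both sonologists

print(f"LR1 sensitivity: {lr1_sensitivity:.0f}%")
print(f"LR1 specificity: {lr1_specificity_1:.0f}% and {lr1_specificity_2:.0f}%")
print(f"LR2 sensitivity (sonologist 2): {lr2_sensitivity_2:.0f}%")
print(f"LR2 specificity: {lr2_specificity:.1f}%")
```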
Discussion
We have shown substantial inter-observer variability in the results of measurements taken in
adnexal masses (wide limits of agreement). Inter-observer agreement beyond chance was very
good or good for most categorical variables, but it was only moderate or fair for some.
Inter-observer agreement above chance was poorest for variables heavily dependent on subjective
evaluation and/or machine settings, i.e., color score, presence of color Doppler signals in
papillary projections, irregular cyst walls, acoustic shadowing (all four variables being
included in LR1 or LR2), echogenicity of cyst fluid, and ovarian crescent sign. Despite this,
there was good inter-observer agreement when classifying tumors as benign or malignant using
the predetermined risk of malignancy cut-off of 10%. However, in some cases there were
substantial differences in the calculated risk of malignancy between the two sonologists, the
difference being >25.0 percentage units in 9% of all tumors when using LR1 and in 12% of all
tumors when using LR2.
The strength of our study is that it provides new information. To the best of our knowledge,
there is only one publication reporting on interobserver agreement with regard to describing
ultrasound findings in adnexal masses using the IOTA terminology (11) when performing live
ultrasound examinations (23). However, that study (23) evaluated interobserver agreement
with regard to the ten ultrasound features in the IOTA simple rules (24, 25), not the variables
included in the IOTA logistic regression models LR1 and LR2, and agreement was estimated
between examiners with different levels of experience. The variable with poorest agreement
beyond chance in the study cited was acoustic shadowing (Kappa 0.36). We have found no
published study that has estimated inter-observer reproducibility of the calculated risk of
malignancy using LR1 or LR2 after live scanning.
It is a limitation of our study that up to 204 days elapsed between the scans of the two
sonologists (up to 41 days for malignant masses). Because days elapsed between the scans,
theoretically the inter-observer differences could be explained by the lesions having changed
in size or morphology between the scans. We find this highly unlikely for the following
reasons. First, there was no relationship between the differences in measurement results and
the number of days between the scans (Supplementary Fig. S1-S5). Nor was there a clear
tendency for inter-observer agreement for discrete variables to depend on the time between the
scans (Supplementary Table S1). Second, one would expect a lesion and its components to
increase in size with time, but sonologist 1, performing the first scan, obtained higher
measurement values than sonologist 2. Third, it is our experience, after having performed
gynecological scans for more than 20 years, that the ultrasound morphology of both benign and
malignant adnexal masses remains constant over time, that benign adnexal lesions grow
slowly, and that malignant masses do not change appreciably in size even during 1 month of
observation. Therefore, we believe that the discrepancies between the two sonologists reflect
true inter-observer differences and not a change of the masses over time. A second limitation is
that we did not include estimation of the reproducibility of retrieving anamnestic information
(current hormonal therapy, personal history of ovarian cancer), the anamnestic information
collected by the second sonologist being used in all cases. It cannot be entirely excluded that
patients would answer differently when asked by different sonologists or that sonologists
could interpret the answers of the patients differently. A third limitation is that we did not
estimate intra-observer reproducibility. We considered four scans (two per sonologist) likely to
be unacceptable to patients. For the same reason, only two sonologists were involved in this
study, and our results are generalizable only to sonologists with a similar level of experience.
The results of this live scanning study are similar to those of another study in which the
same sonologists assessed the same variables using 3D ultrasound volumes from adnexal
masses in another tumor population (15). The similarity in results between the two studies is
surprising because the conditions when assessing 3D ultrasound volumes are different from
those during a live scan. When evaluating ultrasound volumes, sonologists are exposed to the
same ultrasound images, and so any interobserver difference should be explained exclusively
by differences in interpreting the ultrasound information. During a live scan there are more
sources of bias. This could result in either poorer or better interobserver agreement than when
3D ultrasound volumes are assessed: poorer because ultrasound examiners are likely to use
different machine settings and scanning conditions may change from one minute to another,
better because the dynamic nature of live scanning facilitates discrimination between solid
components and amorphous tissue.
Our results showed that two experienced sonologists agreed quite well in their classification
of masses as benign or malignant using the 10% risk of malignancy cutoff of LR1 and LR2,
and that the diagnostic performance of LR1 and LR2 with regard to discrimination between
benign and malignant tumors was similar for the two sonologists and similar to that reported
by others (14, 26-28). This is reassuring because the main purpose of using models LR1 and
LR2 is to classify tumors as benign or malignant. Potentially, however, LR1 and LR2 can be
used not only to classify adnexal masses as benign or malignant but also to counsel a patient
about her individual risk of malignancy (13). To use the calculated risk for individual
counseling, one must be reasonably certain not only that the estimated risk agrees well with the
true risk (when externally validated, both LR1 and LR2 underestimated the true risk, especially
in the risk interval 30-70% (14)), but also that the risk estimates are reproducible, i.e., that
different examiners will obtain similar risk estimates. Our results show that risk estimates may
differ substantially between experienced observers, the difference in estimated risk being >25.0
percentage units in 9% and 12% of cases when using LR1 and LR2, respectively. Interobserver
agreement above chance was poorest for those variables in the models that are heavily
dependent on subjective evaluation, i.e., color score, presence of color Doppler signals in
papillary projections, irregular cyst walls, and acoustic shadowing. Indeed, differences in these
explained most of the largest inter-observer differences in calculated risk of malignancy. In
models based on few variables, changing values in only one variable may result in large
differences in predicted risks, while a model with many variables is less vulnerable to a change
in one or even a few variables. Our results illustrate this (Supplementary Tables S2 and S3).
When using LR2 (which includes six variables), a change in value for one single categorical
variable explained an inter-observer difference in calculated risk >25 percentage units in eight
of 14 cases, while when using LR1 (which includes 12 variables), a change in value for one
single categorical variable explained an inter-observer difference in calculated risk >25
percentage units in only four of 11 cases. Acoustic shadowing is a strong variable in both LR1
and LR2 and has great impact on the calculated risk in LR2, with only six variables. In our
hands, as well as in those of Ruiz de Gauna et al. (23), inter-observer agreement for acoustic
shadowing was at most moderate. The interobserver agreement for color score was only fair in
our study, and color score is an important variable in LR1.
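As a rough illustration of how much a single discordant yes/no variable can move the calculated risk, the sketch below adds the acoustic shadowing coefficient of LR1 and LR2 to the log-odds of a given baseline risk while holding everything else fixed. The coefficients are read from the Table 1 footnote with decimal points restored; the function names and the baseline risks are assumptions chosen for illustration, not values from the study.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability p (0 < p < 1)."""
    return math.log(p / (1.0 - p))


def inv_logit(z: float) -> float:
    """Logistic function y = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def risk_after_flip(baseline_risk: float, coefficient: float) -> float:
    """Risk after one binary variable changes from 0 to 1, all else equal."""
    return inv_logit(logit(baseline_risk) + coefficient)


# Acoustic shadowing coefficients (variable k in the Table 1 footnote).
SHADOW_LR1 = -2.3550
SHADOW_LR2 = -2.9486

# Hypothetical baseline risks with shadowing recorded as absent.
for baseline in (0.02, 0.50, 0.98):
    for label, beta in (("LR1", SHADOW_LR1), ("LR2", SHADOW_LR2)):
        new_risk = risk_after_flip(baseline, beta)
        change = 100.0 * (new_risk - baseline)
        print(f"{label}: baseline {baseline:.0%} -> {new_risk:.1%} "
              f"({change:+.1f} percentage units)")
```

Under these assumptions the shift is largest for intermediate baseline risks and small near 0% and 100%, which is in keeping with the diamond shape described for the plots in Figure 1.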
To improve inter-observer reproducibility of calculated risks based on LR1 and LR2,
inter-observer differences in descriptions and measurements of adnexal masses using the IOTA
terminology and measurement technique need to be reduced. One way to achieve this could be by
providing courses on and training in how to examine and describe adnexal masses using the
IOTA terms. Interactive courses in which a large number of ultrasound images are discussed with
the course participants are likely to be very valuable in this respect. More precise definitions of
the IOTA terms, for example by providing ample imaging material, would probably also help
improve inter-observer agreement. Special attention should be given to the variables with poorest
reproducibility, i.e., the color score, wall irregularity, acoustic shadowing, and detection of blood
flow in papillary projections. Until better inter-observer agreement in the calculated risk of
malignancy using LR1 and LR2 has been shown, one should be cautious about using the risk
estimates for individual patient counselling.
Acknowledgements
None
References
1. Granberg S, Norström A, Wikland M. Tumors in the lower pelvis as imaged by vaginal
sonography. Gynecol Oncol 1990;37:224-9.
2. Benacerraf BR, Finkler NJ, Wojciechowski C, Knapp RC. Sonographic accuracy in the
diagnosis of ovarian masses. J Reprod Med 1990;35:491-5.
3. Valentin L. Pattern recognition of pelvic masses by gray-scale ultrasound imaging: the
contribution of Doppler ultrasound. Ultrasound Obstet Gynecol 1999;14:338-47.
4. Valentin L. Prospective cross-validation of Doppler ultrasound examination and
gray-scale ultrasound imaging for discrimination of benign and malignant pelvic masses.
Ultrasound Obstet Gynecol 1999;14:273-83.
5. Timmerman D, Schwärzler P, Collins WP, Claerhout F, Coenen M, Amant F, et al.
Subjective assessment of adnexal masses with the use of ultrasonography: an analysis
of interobserver variability and experience. Ultrasound Obstet Gynecol 1999;13:11-6.
6. Sokalska A, Timmerman D, Testa AC, Van Holsbeke C, Lissoni AA, Leone FPG, et al.
Diagnostic accuracy of transvaginal ultrasound examination for assigning a specific
diagnosis to adnexal masses. Ultrasound Obstet Gynecol 2009;34:462-70.
7. Valentin L. Use of morphology to characterize and manage common adnexal masses.
Best Pract Res Clin Obstet Gynaecol 2004;18:71-89.
8. Tailor A, Jurkovic D, Bourne TH, Collins WP, Campbell S. Sonographic prediction of
malignancy in adnexal masses using multivariate logistic regression analysis.
Ultrasound Obstet Gynecol 1997;10:41-7.
9. Timmerman D, Bourne TH, Tailor A, Collins WP, Verrelst H, Vandenberghe K, et al.
A comparison of methods for the preoperative discrimination between benign and
malignant adnexal masses: the development of a new logistic regression model. Am J
Obstet Gynecol 1999;181:57-65.
10. Alcazar JL, Jurado M. Prospective evaluation of logistic model based on sonographic
morphologic and color Doppler findings developed to predict adnexal malignancy. J
Ultrasound Med 1999;18:837-42.
11. Timmerman D, Valentin L, Bourne TH, Collins WP, Verrelst H, Vergote I. Terms,
definitions and measurements to describe the sonographic features of adnexal tumors: a
consensus opinion from the International Ovarian Tumor Analysis (IOTA) Group.
Ultrasound Obstet Gynecol 2000;16:500-5.
12. Timmerman D, Testa AC, Bourne T, Ferrazzi E, Ameye L, Konstantinovic ML, et al.
Logistic regression model to distinguish between the benign and malignant adnexal
mass before surgery: a multicenter study by the International Ovarian Tumor Analysis
Group. J Clin Oncol 2005;23:8794-801.
13. Kaijser J, Bourne T, Valentin L, Sayasneh A, Van Holsbeke C, Vergote I, et al.
Improving strategies for diagnosing ovarian cancer: a summary of the International
Ovarian Tumor Analysis (IOTA) studies. Ultrasound Obstet Gynecol 2013;41:9-20.
14. Timmerman D, Van Calster B, Testa AC, Guerriero S, Fischerova D, Lissoni AA, et al.
Ovarian cancer prediction in adnexal masses using ultrasound-based logistic regression
models: a temporal and external validation study by the IOTA group. Ultrasound Obstet
Gynecol 2010;36:226-34.
15. Sladkevicius P, Valentin L. Intra- and inter-observer agreement when describing
adnexal masses using the International Ovarian Tumour Analysis (IOTA) terms and
definitions: a study on three-dimensional (3D) ultrasound volumes. Ultrasound Obstet
Gynecol 2013;41:318-27.
16. Heintz APM, Odicino F, Maisonneuve P, Beller U, Benedet JL, Creasman WT, et al.
Carcinoma of the Ovary. 25th Annual Report on the Results of Treatment in
Gynecological Cancer. Int J Gynecol Obstet 2003;83:S135-S166 (suppl 1).
17. Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas 1960;20:37-46.
18. Kundel HL, Polansky M. Measurement of observer agreement. Radiology 2003;228:303-8.
19. Brennan P, Silman A. Statistical methods for assessing observer variability in clinical
measures. BMJ 1992;304:1491-4.
20. Bland JM, Altman DG. Statistical methods for assessing agreement between two
methods of clinical measurement. Lancet 1986;1:307-10.
21. Bartlett JW, Frost C. Reliability, repeatability and reproducibility: analysis of
measurement errors in continuous variables. Ultrasound Obstet Gynecol 2008;31:466-75.
22. Kottner J, Audigé L, Brorson S, Donner A, Gajewski BJ, Hróbjartsson A, et al.
Guidelines for Reporting Reliability and Agreement Studies (GRRAS) were proposed. J
Clin Epidemiol 2011;64:96-106.
23. Ruiz de Gauna B, Sanchez P, Pineda L, Utrilla-Layna J, Juez L, Alcázar JL.
Inter-observer agreement with regard to describing adnexal masses using the IOTA simple
rules in a real-time setting and when using three-dimensional ultrasound volumes and
digital clips. Ultrasound Obstet Gynecol 2014;44:95-100.
24. Timmerman D, Testa AC, Bourne T, Ameye L, Jurkovic D, Van Holsbeke C, et al.
Simple ultrasound-based rules for the diagnosis of ovarian cancer. Ultrasound Obstet
Gynecol 2008;31:681-90.
25. Timmerman D, Ameye L, Fischerova D, Epstein E, Melis GB, Guerriero S, et al.
Simple ultrasound rules to distinguish between benign and malignant adnexal masses
before surgery: prospective validation by IOTA group. BMJ 2010;341:c6839.
26. Van Holsbeke C, Van Calster B, Testa AC, Domali E, Lu C, Van Huffel S, et al.
Prospective internal validation of mathematical models to predict malignancy in
adnexal masses: results from the international ovarian tumor analysis study. Clin Cancer
Res 2009;15:684-91.
27. Nunes N, Yazbek J, Ambler G, Hoo W, Naftalin J, Jurkovic D. Prospective evaluation
of the IOTA logistic regression model LR2 for the diagnosis of ovarian cancer.
Ultrasound Obstet Gynecol 2012;40:355-9.
28. Nunes N, Ambler G, Hoo WL, Naftalin J, Foo X, Widschwendter M, et al.
A prospective validation of the IOTA logistic regression models (LR1 and LR2) in
comparison to subjective pattern recognition for the diagnosis of ovarian cancer.
Int J Gynecol Cancer 2013;23:1583-9.
Table 1. Variables for which inter-observer reproducibility was estimated
_______________________________________________________________________
Variables included in logistic regression models LR1 and LR2, or of importance for these models
  Maximum diameter of adnexal mass: mm
  Maximum diameter of largest solid component: mm
  Maximum diameter of largest solid component: ≤50, >50 mm
  Presence of an entirely solid adnexal mass: yes, no
  Papillary projections present: yes, no
  Irregular internal cyst walls: yes, no
  Acoustic shadows: yes, no
  Color Doppler signals in papillary projection: yes, no
  Color score: 1, 2, 3, 4
  Tenderness of adnexal mass at scan: yes, no
  Ascites: yes, no
Other IOTA variables
  Continuous IOTA variables
    Mean diameter of adnexal mass: mm
    Mean diameter of largest solid component: mm
    Height of the largest papillary projection: mm
  Categorical IOTA variables
    Type of tumor: unilocular, unilocular solid, multilocular, multilocular solid, solid
    Mean diameter of adnexal mass: ≤40, 41-60, 61-80, 81-100, and >100 mm
    Number of cyst locules: 0, 1, 2, 3, 4, 5, 6-10, 11-20, and >20; ≤10, >10 †
    Septum present: yes, no
    Incomplete septum present: yes, no
    Solid component: yes, no
    Papillary projections: 0, 1, 2, 3, ≥4; 1, 2, 3, ≥4 ‡; <4, >4 †
    Echogenicity of cyst fluid: anechoic, low level, ground glass, mixed, no cyst fluid
    Color Doppler signals detectable: yes, no
    Ovarian crescent sign: yes, no
Diagnosis based on LR1 and LR2
  Calculated risk of malignancy using LR1: ≤10%, >10%
  Calculated risk of malignancy using LR2: ≤10%, >10%
___________________________________________________________________________
IOTA, International Ovarian Tumor Analysis group; LR1, logistic regression model 1; LR2, logistic regression model 2.
The risk of malignancy using LR1 is derived as $y = 1/(1 + e^{-z})$, where
$z = -6.7468 + 1.5985\,(a) - 0.9983\,(b) + 0.0326\,(c) + 0.00841\,(d) - 0.8577\,(e) + 1.5513\,(f) + 1.1737\,(g) + 0.9281\,(h) + 0.0496\,(i) + 1.1421\,(j) - 2.3550\,(k) + 0.4916\,(l)$
and $e$ is the mathematical constant and base value of natural logarithms. The 12 variables included in LR1 are:
  (a) personal history of ovarian cancer (yes = 1, no = 0)
  (b) current hormonal therapy (yes = 1, no = 0)
  (c) age of the patient (in years)
  (d) maximum diameter of the lesion (in millimeters)
  (e) mass tender at scan (yes = 1, no = 0)
  (f) the presence of ascites (yes = 1, no = 0)
  (g) the presence of blood flow within a solid papillary projection (yes = 1, no = 0)
  (h) the presence of a purely solid tumor (yes = 1, no = 0)
  (i) maximal diameter of the solid component (expressed in millimeters, but with no increase >50 mm)
  (j) irregular internal cyst walls (yes = 1, no = 0)
  (k) the presence of acoustic shadows (yes = 1, no = 0)
  (l) color score (1, 2, 3, or 4)
The risk of malignancy using LR2 is derived as $y = 1/(1 + e^{-z})$, where
$z = -5.3718 + 0.0354\,(c) + 1.6159\,(f) + 1.1768\,(g) + 0.0697\,(i) + 0.9586\,(j) - 2.9486\,(k)$ (12).
A calculated risk of malignancy of more than 10% classified the adnexal mass as malignant.
† includes 0. ‡ includes only cases where papillary projections were registered at both examinations.
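The two formulas in the Table 1 footnote can be sketched in code as follows. This is a minimal illustration, not the authors' software: the coefficients are those of the footnote with decimal points restored, while the dictionary keys, the function name `risk_of_malignancy`, and the example patient are hypothetical.

```python
import math

# Coefficients from the Table 1 footnote; key names are illustrative only.
LR1_INTERCEPT = -6.7468
LR1_COEF = {
    "history_ovarian_cancer": 1.5985,  # (a) yes = 1, no = 0
    "hormonal_therapy": -0.9983,       # (b) yes = 1, no = 0
    "age_years": 0.0326,               # (c)
    "max_diameter_mm": 0.00841,        # (d)
    "tender_at_scan": -0.8577,         # (e) yes = 1, no = 0
    "ascites": 1.5513,                 # (f) yes = 1, no = 0
    "papillary_flow": 1.1737,          # (g) yes = 1, no = 0
    "purely_solid": 0.9281,            # (h) yes = 1, no = 0
    "max_solid_mm": 0.0496,            # (i) no increase above 50 mm
    "irregular_walls": 1.1421,         # (j) yes = 1, no = 0
    "acoustic_shadows": -2.3550,       # (k) yes = 1, no = 0
    "color_score": 0.4916,             # (l) 1, 2, 3 or 4
}

LR2_INTERCEPT = -5.3718
LR2_COEF = {
    "age_years": 0.0354,
    "ascites": 1.6159,
    "papillary_flow": 1.1768,
    "max_solid_mm": 0.0697,
    "irregular_walls": 0.9586,
    "acoustic_shadows": -2.9486,
}


def risk_of_malignancy(intercept: float, coef: dict, features: dict) -> float:
    """Return y = 1 / (1 + e^(-z)) for one mass, as a probability in (0, 1)."""
    x = dict(features)
    # The solid component diameter contributes no further above 50 mm.
    x["max_solid_mm"] = min(x["max_solid_mm"], 50)
    z = intercept + sum(beta * x[name] for name, beta in coef.items())
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical example: 50-year-old, 60 mm cyst, no solid tissue, color score 1.
example = {
    "history_ovarian_cancer": 0, "hormonal_therapy": 0, "age_years": 50,
    "max_diameter_mm": 60, "tender_at_scan": 0, "ascites": 0,
    "papillary_flow": 0, "purely_solid": 0, "max_solid_mm": 0,
    "irregular_walls": 0, "acoustic_shadows": 0, "color_score": 1,
}

lr1_risk = risk_of_malignancy(LR1_INTERCEPT, LR1_COEF, example)
lr2_risk = risk_of_malignancy(LR2_INTERCEPT, LR2_COEF, example)
for name, risk in (("LR1", lr1_risk), ("LR2", lr2_risk)):
    label = "malignant" if risk > 0.10 else "benign"
    print(f"{name}: risk {risk:.1%}, classified as {label}")
```

The same `features` dictionary is passed to both models; LR2 simply ignores the keys it does not use.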
Table 2. Histological diagnoses of the masses
___________________________________________________________________________
Histological diagnosis of tumor              Number
___________________________________________________________________________
Benign tumors                                94
  Benign simple cyst                         7
  Endometrioma                               10
  Dermoid cyst                               16
  Serous cystadenoma                         16
  Mucinous cystadenoma                       18
  Myoma/fibroma                              9
  Cystadenofibroma                           11
  Paraovarian cyst                           5
  Sactosalpinx/chronic salpingitis           1
  Leydig cell tumor                          1
Borderline tumors                            4
  Serous                                     2
  Mucinous                                   1
  Endometrioid                               1
Invasive malignancy                          19
  Primary ovarian adenocarcinoma             13
  Granulosa cell tumor                       3
  Dysgerminoma                               1
  Leiomyosarcoma                             1
  Malignant aggressive B-cell lymphoma       1
___________________________________________________________________________
Table 3. Inter-observer reproducibility of continuous variables used in the risk calculation models LR1 and LR2 and of other continuous variables
used to describe adnexal masses
Median and range: measurement results of both sonologists. Mean, 95% CI, and limits of agreement: difference in mm between the two measurements
made by sonologists 1 and 2 [a]. Intra-CC: point estimate (95% CI).

Parameter                                            Median (n)       Range      Mean (n)        95% CI           Limits of agreement   Intra-CC (95% CI)
Variables used in models LR1 and LR2
  Maximum diameter of adnexal mass, mm               70 (n=234)       10 to 313  3.80 (n=117)    1.12 to 6.48     -25.24 to 32.82       0.958 (0.937 to 0.971)
  Maximum diameter of largest solid component, mm [b] 29.50 (n=122)   5 to 180   1.92 (n=61)     -1.74 to 5.58    -26.66 to 30.50       0.942 (0.905 to 0.964)
Other variables used to describe adnexal mass
  Mean diameter of adnexal mass, mm                  58.5 (n=234)     9 to 240   1.05 (n=117)    -0.15 to 1.95    -8.61 to 10.72        0.971 (0.958 to 0.980)
  Mean diameter of largest solid component, mm [b]   22 (n=122)       4 to 156   0.59 (n=61)     -1.82 to 2.98    -18.16 to 19.32       0.962 (0.937 to 0.977)
  Height of largest papillary projection, mm [c]     8 (n=42)         3 to 25    -0.51 (n=21)    -2.93 to 1.91    -11.61 to 10.59       0.609 (0.245 to 0.821)

CI, confidence interval; Intra-CC, intra-class correlation coefficient.
[a] The measurements made by sonologist 2 were subtracted from the measurements made by sonologist 1.
[b] Includes only cases where solid components were registered by both observers.
[c] Includes only cases where papillary projections were registered by both observers. Using logarithmically transformed data, the results were as
follows: mean difference 1.03 (95% CI, 0.78 to 1.36), limits of agreement 0.31 to 3.61, Intra-CC 0.547 (0.153 to 0.789). These results can be used for
comparison with our published results on inter-observer agreement when analyzing ultrasound volumes obtained by three-dimensional
ultrasound (15).
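Limits of agreement such as those in Table 3 are conventionally computed as the mean difference plus or minus 1.96 standard deviations of the paired differences (Bland and Altman, reference 20). The sketch below shows that calculation on invented measurements; the data, the function name, and the returned field names are assumptions made for illustration and do not come from the study.

```python
import statistics


def limits_of_agreement(first: list, second: list) -> dict:
    """Mean difference (first minus second) and 95% limits of agreement."""
    if len(first) != len(second):
        raise ValueError("both observers must measure the same masses")
    diffs = [a - b for a, b in zip(first, second)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation
    return {
        "mean_difference": mean_diff,
        "lower_limit": mean_diff - 1.96 * sd_diff,
        "upper_limit": mean_diff + 1.96 * sd_diff,
    }


# Hypothetical maximum diameters (mm) of five masses measured by two observers.
sonologist_1 = [70, 45, 120, 33, 58]
sonologist_2 = [66, 47, 110, 30, 60]

result = limits_of_agreement(sonologist_1, sonologist_2)
print(f"mean difference {result['mean_difference']:.2f} mm, "
      f"limits of agreement {result['lower_limit']:.2f} to "
      f"{result['upper_limit']:.2f} mm")
```

Because the limits are symmetric about the mean difference, their midpoint offers a quick consistency check on a reported row: for the maximum diameter of the adnexal mass, the midpoint of -25.24 and 32.82 is 3.79, in line with the reported mean of 3.80.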
Table 4. Inter-observer agreement for categorical variables included in the risk calculation models LR1 and LR2 and for other categorical variables used to describe adnexal masses

Variable                                                                          Agreement         Kappa value
Variables used in models LR1 and LR2
  Presence of entirely solid tumor (yes, no)                                      97% (114/117)     0.91
  Maximum diameter of largest solid component (≤50 mm, >50 mm)                    95% (111/117)     0.83
  Maximum diameter of largest solid component (≤50 mm, >50 mm) [a]                92% (56/61)       0.82
  Tenderness of adnexal mass (yes, no)                                            97% (113/117)     0.78
  Ascites (yes, no)                                                               97% (113/117)     0.73
  Papillary projections (yes, no)                                                 87% (102/117)     0.65
  Solid component (yes, no)                                                       84% (98/117)      0.67
  Irregular cyst wall (yes, no)                                                   79% (92/117)      0.56
  Acoustic shadows (yes, no)                                                      85% (99/117)      0.58
  Color Doppler signals in papillary projections (yes, no) [b]                    90% (105/117)     0.48
  Color Doppler signals in papillary projections (yes, no) [c]                    86% (18/21)       0.71
  Color score (1, 2, 3, 4)                                                        40% (47/117)      0.36 [d]
Other variables used to describe adnexal masses
  Tumor type (unilocular, unilocular solid, multilocular, multilocular solid, solid)  97% (113/117)  0.70
  Mean diameter of tumor (mm): ≤40, 41-60, 61-80, 81-100, >100                    66% (77/117)      0.76 [d]
  Number of locules: 0, 1, 2, 3, 4, 5, 6-10, 11-20, >20                           58% (68/117)      0.79 [d]
  Number of locules: ≤10, >10 [e]                                                 94% (110/117)     0.80
  Septum (yes, no)                                                                88% (103/117)     0.76
  Incomplete septum (yes, no)                                                     98% (115/117)     ¶
  Number of papillary projections: 0, 1, 2, 3, ≥4                                 81% (95/117)      0.64 [d]
  Number of papillary projections: 1, 2, 3, ≥4 [c]                                67% (14/21)       0.63 [d]
  Number of papillary projections: <4, >4 [e]                                     96% (112/117)     0.68
  Echogenicity of cyst fluid (anechoic, low level, ground glass, mixed, no fluid) 67% (78/117)      0.56
  Color Doppler signals detectable (yes, no)                                      84% (98/117)      0.30
  Ovarian crescent sign (yes, no)                                                 78% (91/117)      0.49
  Fluid in Douglas pouch (yes, no)                                                87% (102/117)     0.72
____________________________________________________________________________________________
[a] Includes only cases where solid components were registered at both examinations.
[b] This variable reflects disagreement both with regard to the presence of papillary projections and the presence of blood flow in papillary projections (i.e., if one observer recorded the presence of papillary projections and the other did not, there was disagreement with regard to this variable, and if both observers recorded the presence of papillary projections but disagreed with regard to the presence of blood flow in the papillary projections, there was disagreement with regard to this variable).
[c] Includes only cases where papillary projections were registered at both examinations.
[d] Weighted Kappa index.
[e] Includes 0.
¶ Absent kappa values are explained by asymmetric field tables, making it impossible to calculate them.
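Table 4 reports both raw percentage agreement and Kappa, which corrects agreement for what would be expected by chance (Cohen, reference 17). The sketch below computes unweighted Kappa from a square cross-tabulation of two observers' ratings. The 2 x 2 counts are invented for illustration, since the underlying cross-tabulations are not given in the table, and the function name is hypothetical; the weighted Kappa used for ordinal variables is not shown.

```python
def cohen_kappa(table: list) -> float:
    """Unweighted Cohen's kappa for a square table of counts.

    table[i][j] is the number of masses rated category i by observer 1
    and category j by observer 2.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    # Observed agreement: proportion of masses on the diagonal.
    p_observed = sum(table[i][i] for i in range(k)) / n
    # Chance agreement: product of the two observers' marginal proportions.
    row_totals = [sum(table[i]) for i in range(k)]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    p_chance = sum(row_totals[i] * col_totals[i] for i in range(k)) / (n * n)
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical yes/no ratings of 117 masses (rows: observer 1, columns: observer 2).
hypothetical = [
    [20, 5],   # observer 1 "yes": observer 2 "yes", "no"
    [10, 82],  # observer 1 "no":  observer 2 "yes", "no"
]

agreement = (hypothetical[0][0] + hypothetical[1][1]) / 117
kappa = cohen_kappa(hypothetical)
print(f"raw agreement {agreement:.0%}, kappa {kappa:.2f}")
```

With these invented counts the raw agreement is high while Kappa is clearly lower, which mirrors the pattern seen for several variables in Table 4 where one category is much more common than the other.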
Table 5. Inter-observer reproducibility of the risk of malignancy calculated using the risk calculation models LR1 and LR2
Median and range: calculated risk of malignancy (both sonologists). Mean, 95% CI, and limits of agreement: difference between the risk calculated
by sonologist 1 and 2. Intra-CC: point estimate (95% CI).

Parameter                                   Median (n)      Range            Mean (n)         95% CI           Limits of agreement   Intra-CC (95% CI)
Risk of malignancy calculated using LR1     7.85 (n=234)    0.10 to 99.10    -0.53 (n=117)    -3.07 to 2.01    -28.05 to 26.99       0.911 (0.874 to 0.937)
Risk of malignancy calculated using LR2     6.65 (n=234)    0.10 to 98.40    0.02 (n=117)     -3.06 to 3.10    -33.22 to 33.26       0.832 (0.766 to 0.880)

CI, confidence interval; Intra-CC, intra-class correlation coefficient.
Legends for figure
Figure 1. a) Scatterplot showing the relationship between inter-observer difference (observer
1 minus observer 2) in calculated risk and magnitude of calculated risk when using logistic
regression model LR1. The plot manifests a diamond shape, the differences being smallest for
the lowest and highest risks. For risks < 25 and > 95 the differences are very small.
LOA, limits of agreement. b) Scatterplot showing the relationship between inter-observer
difference in calculated risk and magnitude of calculated risk when using logistic regression
model LR2. The plot manifests a diamond shape, the differences being smallest for the lowest
and highest risks.
Povilas Sladkevicius and Lil Valentin. Inter-observer agreement in describing the ultrasound appearance of adnexal masses and in calculating the risk of malignancy using logistic regression models. Clin Cancer Res. Published OnlineFirst November 25, 2014.
doi: 10.1158/1078-0432.CCR-14-0906
component In yet another case there were differences in two categorical variables as well as in
the measurement of the largest solid component The categorical variables judged differently
by the two sonologists in these 14 cases were acoustic shadowing (n = 5), irregular cyst wall
(n = 5), ascites (n = 3), and flow in papillary projection (n = 2).
The sensitivity with regard to malignancy when using LR1 (10% risk cutoff) was 100%
(23/23; 95% CI, 82–100) for both sonologists; the specificity was 74% (70/94; 95% CI, 64–82)
for sonologist 1 and 63% (59/94; 95% CI, 53–72) for sonologist 2. The sensitivity when using
LR2 was 100% (23/23; 95% CI, 82–100) for sonologist 1 and 91% (21/23; 95% CI, 72–98) for
sonologist 2, and the specificity was 75.5% (71/94; 95% CI, 65–84) for both sonologists.
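The sensitivity and specificity figures above follow directly from the quoted counts. The Python sketch below recomputes the point estimates for LR1 and attaches a Wilson score interval to each. The choice of the Wilson interval (without continuity correction) is an assumption made here for illustration; this section does not state which interval method was used, so the bounds may differ by a percentage point or so from those reported. The function name `wilson_interval` is hypothetical.

```python
import math


def wilson_interval(successes, total, z=1.959964):
    """Wilson score interval (no continuity correction) for a proportion.

    Assumption: this interval type is chosen for illustration only; it is
    not necessarily the method used in the manuscript.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# Counts quoted in the text for LR1 at the 10% risk cutoff.
lr1_counts = {
    "sensitivity, both sonologists": (23, 23),
    "specificity, sonologist 1": (70, 94),
    "specificity, sonologist 2": (59, 94),
}

for label, (k, n) in lr1_counts.items():
    low, high = wilson_interval(k, n)
    print(f"{label}: {100 * k / n:.1f}% "
          f"(95% CI {100 * low:.0f}-{100 * high:.0f})")
```

With only 23 malignant masses, even a sensitivity of 100% leaves a lower confidence bound well below 100%, which is worth keeping in mind when comparing the two sonologists.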
Discussion
We have shown substantial inter-observer variability in the results of measurements taken in
adnexal masses (wide limits of agreement). Inter-observer agreement beyond chance was very
good or good for most categorical variables, but it was only moderate or fair for some.
Inter-observer agreement above chance was poorest for variables heavily dependent on subjective
evaluation and/or machine settings, i.e., color score, presence of color Doppler signals in
papillary projections, irregular cyst walls, acoustic shadowing (all four variables being
included in LR1 or LR2), echogenicity of cyst fluid, and ovarian crescent sign. Despite this,
there was good inter-observer agreement when classifying tumors as benign or malignant using
the predetermined risk of malignancy cut-off of 10%. However, in some cases there were
substantial differences in the calculated risk of malignancy between the two sonologists, the
difference being >25.0 percentage units in 9% of all tumors when using LR1 and in 12% of all
tumors when using LR2.
The strength of our study is that it provides new information. To the best of our knowledge,
there is only one publication reporting on interobserver agreement with regard to describing
ultrasound findings in adnexal masses using the IOTA terminology (11) when performing live
ultrasound examinations (23). However, that study (23) evaluated interobserver agreement
with regard to the ten ultrasound features in the IOTA simple rules (24, 25), not the variables
included in the IOTA logistic regression models LR1 and LR2, and agreement was estimated
between examiners with different levels of experience. The variable with poorest agreement
beyond chance in the study cited was acoustic shadowing (Kappa 0.36). We have found no
published study that has estimated inter-observer reproducibility of the calculated risk of
malignancy using LR1 or LR2 after live scanning.
It is a limitation of our study that up to 204 days elapsed between the scans of the two
sonologists (up to 41 days for malignant masses). Because days elapsed between the scans,
theoretically the inter-observer differences could be explained by the lesions having changed
in size or morphology between the scans. We find this highly unlikely for the following
reasons. First, there was no relationship between the differences in measurement results and
the number of days between the scans (Supplementary Figs. S1–S5). Nor was there a clear
tendency for inter-observer agreement for discrete variables to depend on the time between the
scans (Supplementary Table S1). Second, one would expect a lesion and its components to
increase in size with time, but sonologist 1, who performed the first scan, obtained higher
measurement values than sonologist 2. Third, it is our experience after having performed
gynecological scans for more than 20 years that the ultrasound morphology of both benign and
malignant adnexal masses remains constant over time, that benign adnexal lesions grow
slowly, and that malignant masses do not change appreciably in size even during 1 month of
observation. Therefore, we believe that the discrepancies between the two sonologists reflect
true inter-observer differences and not a change of the masses over time. A second limitation is
that we did not estimate the reproducibility of retrieving anamnestic information
(current hormonal therapy, personal history of ovarian cancer), the anamnestic information
collected by the second sonologist being used in all cases. It cannot be entirely excluded that
patients would answer differently when asked by different sonologists or that sonologists
could interpret the answers of the patients differently. A third limitation is that we did not
estimate intra-observer reproducibility. We considered four scans (two per sonologist) likely to
be unacceptable to patients. For the same reason, only two sonologists were involved in this
study, and our results are generalizable only to sonologists with a similar level of experience.
The results of this live scanning study are similar to those of another study in which the
same sonologists assessed the same variables using 3D ultrasound volumes from adnexal
masses in another tumor population (15). The similarity in results between the two studies is
surprising because the conditions when assessing 3D ultrasound volumes are different from
those during a live scan. When evaluating ultrasound volumes, sonologists are exposed to the
same ultrasound images, and so any interobserver difference should be explained exclusively
by differences in interpreting the ultrasound information. During a live scan there are more
sources of bias. This could result in either poorer or better interobserver agreement than when
3D ultrasound volumes are assessed: poorer because ultrasound examiners are likely to use
different machine settings and scanning conditions may change from one minute to another;
better because the dynamic nature of live scanning facilitates discrimination between solid
components and amorphous tissue.
Our results showed that two experienced sonologists agreed quite well in their classification
of masses as benign or malignant using the 10% risk of malignancy cutoff of LR1 and LR2,
and that the diagnostic performance of LR1 and LR2 with regard to discrimination between
benign and malignant tumors was similar for the two sonologists and similar to that reported
by others (14, 26–28). This is reassuring because the main purpose of using models LR1 and
LR2 is to classify tumors as benign or malignant. Potentially, however, LR1 and LR2 can be
used not only to classify adnexal masses as benign or malignant but also to counsel a patient
about her individual risk of malignancy (13). To use the calculated risk for individual
counseling, one must be reasonably certain not only that the estimated risk agrees well with the
true risk (when externally validated, both LR1 and LR2 underestimated the true risk, especially
in the risk interval 30–70% (14)) but also that the risk estimates are reproducible, i.e., that
different examiners will obtain similar risk estimates. Our results show that risk estimates may
differ substantially between experienced observers, the difference in estimated risk being >25.0
percentage units in 9% and 12% of cases when using LR1 and LR2, respectively. Interobserver
agreement above chance was poorest for those variables in the models that are heavily
dependent on subjective evaluation, i.e., color score, presence of color Doppler signals in
papillary projections, irregular cyst walls, and acoustic shadowing. Indeed, differences in these
explained most of the largest inter-observer differences in calculated risk of malignancy. In
models based on few variables, changing values in only one variable may result in large
differences in predicted risks, while a model with many variables is less vulnerable to a change
in one or even a few variables. Our results illustrate this (Supplementary Tables S2 and S3).
When using LR2 (which includes six variables), a change in value for one single categorical
variable explained an inter-observer difference in calculated risk >25 percentage units in eight
of 14 cases, while when using LR1 (which includes 12 variables), a change in value for one
single categorical variable explained an inter-observer difference in calculated risk >25
percentage units in only four of 11 cases. Acoustic shadowing is a strong variable in both LR1
and LR2 and has great impact on the calculated risk in LR2, which has only six variables. In our
hands, as well as in those of Ruiz de Gauna et al. (23), inter-observer agreement for acoustic
shadowing was at most moderate. The interobserver agreement for color score was only fair in
our study, and color score is an important variable in LR1.
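The effect of a single discordant categorical variable in a small model can be made concrete with a short Python sketch. It evaluates LR2, using the coefficients given in the footnote to Table 1, for one mass described twice, the two descriptions differing only in whether acoustic shadows were recorded. The patient values and all identifier names are hypothetical and chosen purely for illustration; they are not taken from the study data.

```python
import math

# LR2 intercept and coefficients as listed in the footnote to Table 1.
LR2_INTERCEPT = -5.3718
LR2_COEF = {
    "age_years": 0.0354,                     # (c)
    "ascites": 1.6159,                       # (f) yes = 1, no = 0
    "flow_in_papillary_projection": 1.1768,  # (g) yes = 1, no = 0
    "max_solid_diameter_mm": 0.0697,         # (i) no increase above 50 mm
    "irregular_cyst_walls": 0.9586,          # (j) yes = 1, no = 0
    "acoustic_shadows": -2.9486,             # (k) yes = 1, no = 0
}


def lr2_risk(findings):
    """Return the LR2 risk of malignancy as a probability between 0 and 1."""
    values = dict(findings)
    values["max_solid_diameter_mm"] = min(values["max_solid_diameter_mm"], 50)
    z = LR2_INTERCEPT + sum(coef * values[name] for name, coef in LR2_COEF.items())
    return 1 / (1 + math.exp(-z))


# Hypothetical mass as described by a first examiner (no acoustic shadows).
examiner_1 = {
    "age_years": 60,
    "ascites": 0,
    "flow_in_papillary_projection": 0,
    "max_solid_diameter_mm": 40,
    "irregular_cyst_walls": 1,
    "acoustic_shadows": 0,
}
# Same mass, but a second examiner records acoustic shadows as present.
examiner_2 = dict(examiner_1, acoustic_shadows=1)

risk_1 = lr2_risk(examiner_1)
risk_2 = lr2_risk(examiner_2)
print(f"examiner 1: {100 * risk_1:.1f}%")
print(f"examiner 2: {100 * risk_2:.1f}%")
print(f"difference: {100 * (risk_1 - risk_2):.1f} percentage units")
```

In this invented example the two risks come out at roughly 62% and 8%, so disagreement on acoustic shadowing alone moves the estimate by more than 50 percentage units and across the 10% cutoff.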
To improve inter-observer reproducibility of calculated risks based on LR1 and LR2,
inter-observer differences in descriptions and measurements of adnexal masses using the IOTA
terminology and measurement technique need to be reduced. One way to achieve this could be by
providing courses on and training in how to examine and describe adnexal masses using the
IOTA terms. Interactive courses in which a large number of ultrasound images are discussed with
the course participants are likely to be very valuable in this respect. More precise definitions of
the IOTA terms, for example by providing ample imaging material, would probably also help
improve inter-observer agreement. Special attention should be given to the variables with poorest
reproducibility, i.e., the color score, wall irregularity, acoustic shadowing, and detection of blood
flow in papillary projections. Until better inter-observer agreement in the calculated risk of
malignancy using LR1 and LR2 has been shown, one should be cautious with using the risk
estimates for individual patient counselling.
Acknowledgements
None
References
1. Granberg S, Norström A, Wikland M. Tumors in the lower pelvis as imaged by vaginal
sonography. Gynecol Oncol 1990;37:224-9.
2. Benacerraf BR, Finkler NJ, Wojciechowski C, Knapp RC. Sonographic accuracy in the
diagnosis of ovarian masses. J Reprod Med 1990;35:491-5.
3. Valentin L. Pattern recognition of pelvic masses by gray-scale ultrasound imaging: the
contribution of Doppler ultrasound. Ultrasound Obstet Gynecol 1999;14:338-47.
4. Valentin L. Prospective cross-validation of Doppler ultrasound examination and
gray-scale ultrasound imaging for discrimination of benign and malignant pelvic masses.
Ultrasound Obstet Gynecol 1999;14:273-83.
5. Timmerman D, Schwärzler P, Collins WP, Claerhout F, Coenen M, Amant F, et al.
Subjective assessment of adnexal masses with the use of ultrasonography: an analysis
of interobserver variability and experience. Ultrasound Obstet Gynecol 1999;13:11-6.
6. Sokalska A, Timmerman D, Testa AC, Van Holsbeke C, Lissoni AA, Leone FPG, et al.
Diagnostic accuracy of transvaginal ultrasound examination for assigning a specific
diagnosis to adnexal masses. Ultrasound Obstet Gynecol 2009;34:462-70.
7. Valentin L. Use of morphology to characterize and manage common adnexal masses.
Best Pract Res Clin Obstet Gynaecol 2004;18:71-89.
8. Tailor A, Jurkovic D, Bourne TH, Collins WP, Campbell S. Sonographic prediction of
malignancy in adnexal masses using multivariate logistic regression analysis.
Ultrasound Obstet Gynecol 1997;10:41-7.
9. Timmerman D, Bourne TH, Tailor A, Collins WP, Verrelst H, Vandenberghe K, et al.
A comparison of methods for the preoperative discrimination between benign and
malignant adnexal masses: the development of a new logistic regression model. Am J
Obstet Gynecol 1999;181:57-65.
10. Alcazar JL, Jurado M. Prospective evaluation of logistic model based on sonographic
morphologic and color Doppler findings developed to predict adnexal malignancy. J
Ultrasound Med 1999;18:837-42.
11. Timmerman D, Valentin L, Bourne TH, Collins WP, Verrelst H, Vergote I. Terms,
definitions and measurements to describe the sonographic features of adnexal tumors: a
consensus opinion from the International Ovarian Tumor Analysis (IOTA) Group.
Ultrasound Obstet Gynecol 2000;16:500-5.
12. Timmerman D, Testa AC, Bourne T, Ferrazzi E, Ameye L, Konstantinovic ML, et al.
Logistic regression model to distinguish between the benign and malignant adnexal
mass before surgery: a multicenter study by the International Ovarian Tumor Analysis
Group. J Clin Oncol 2005;34:8794-801.
13. Kaijser J, Bourne T, Valentin L, Sayasneh A, Van Holsbeke C, Vergote I, et al.
Improving strategies for diagnosing ovarian cancer: a summary of the International
Ovarian Tumor Analysis (IOTA) studies. Ultrasound Obstet Gynecol 2013;41:9-20.
14. Timmerman D, Van Calster B, Testa AC, Guerriero S, Fischerova D, Lissoni AA, et al.
Ovarian cancer prediction in adnexal masses using ultrasound-based logistic regression
models: a temporal and external validation study by the IOTA group. Ultrasound Obstet
Gynecol 2010;36:226-34.
15. Sladkevicius P, Valentin L. Intra- and inter-observer agreement when describing
adnexal masses using the International Ovarian Tumour Analysis (IOTA) terms and
definitions: a study on three-dimensional (3D) ultrasound volumes. Ultrasound Obstet
Gynecol 2013;41:318-27.
16. Heintz APM, Odicino F, Maisonneuve P, Beller U, Benedet JL, Creasman WT, et al.
Carcinoma of the Ovary. 25th Annual Report on the Results of Treatment in
Gynecological Cancer. Int J Gynecol Obstet 2003;83:S135-S166 (suppl 1).
17. Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas 1960;20:
37-46.
18. Kundel HL, Polansky M. Measurement of observer agreement. Radiology 2003;228:
303-8.
19. Brennan P, Silman A. Statistical methods for assessing observer variability in clinical
measures. BMJ 1992;304:1491-4.
20. Bland JM, Altman DG. Statistical methods for assessing agreement between two
methods of clinical measurement. Lancet 1986;1:307-10.
21. Bartlett JW, Frost C. Reliability, repeatability and reproducibility: analysis of
measurement errors in continuous variables. Ultrasound Obstet Gynecol 2008;31:466-75.
22. Kottner J, Audigé L, Brorson S, Donner A, Gajewski BJ, Hróbjartsson A, et al.
Guidelines for Reporting Reliability and Agreement Studies (GRRAS) were proposed. J
Clin Epidemiol 2011;64:96-106.
23. Ruiz de Gauna B, Sanchez P, Pineda L, Utrilla-Layna J, Juez L, Alcázar JL.
Inter-observer agreement with regard to describing adnexal masses using the IOTA simple
rules in a real-time setting and when using three-dimensional ultrasound volumes and
digital clips. Ultrasound Obstet Gynecol 2014;44:95-100.
24. Timmerman D, Testa AC, Bourne T, Ameye L, Jurkovic D, Van Holsbeke C, et al.
Simple ultrasound-based rules for the diagnosis of ovarian cancer. Ultrasound Obstet
Gynecol 2008;31:681-90.
25. Timmerman D, Ameye L, Fischerova D, Epstein E, Melis GB, Guerriero S, et al.
Simple ultrasound rules to distinguish between benign and malignant adnexal masses
before surgery: prospective validation by IOTA group. BMJ 2010;341:c6839.
26. Van Holsbeke C, Van Calster B, Testa AC, Domali E, Lu C, Van Huffel S, et al.
Prospective internal validation of mathematical models to predict malignancy in
adnexal masses: results from the international ovarian tumor analysis study. Clin Cancer
Res 2009;15:684-91.
27. Nunes N, Yazbek J, Ambler G, Hoo W, Naftalin J, Jurkovic D. Prospective evaluation
of the IOTA logistic regression model LR2 for the diagnosis of ovarian cancer.
Ultrasound Obstet Gynecol 2012;40:355-9.
28. Nunes N, Ambler G, Hoo WL, Naftalin J, Foo X, Widschwendter M, et al.
A prospective validation of the IOTA logistic regression models (LR1 and LR2) in
comparison to subjective pattern recognition for the diagnosis of ovarian cancer.
Int J Gynecol Cancer 2013;23:1583-9.
Table 1. Variables for which inter-observer reproducibility was estimated
___________________________________________________________________________
Variables included in logistic regression models LR1 and LR2 or of importance for these models
    Maximum diameter of adnexal mass, mm
    Maximum diameter of largest solid component, mm
    Maximum diameter of largest solid component: ≤50, >50 mm
    Presence of an entirely solid adnexal mass: yes, no
    Papillary projections present: yes, no
    Irregular internal cyst walls: yes, no
    Acoustic shadows: yes, no
    Color Doppler signals in papillary projection: yes, no
    Color score: 1, 2, 3, 4
    Tenderness of adnexal mass at scan: yes, no
    Ascites: yes, no
Other IOTA variables
  Continuous IOTA variables
    Mean diameter of adnexal mass, mm
    Mean diameter of largest solid component, mm
    Height of the largest papillary projection, mm
  Categorical IOTA variables
    Type of tumor: unilocular, unilocular solid, multilocular, multilocular solid, solid
    Mean diameter of adnexal mass: ≤40, 41–60, 61–80, 81–100, and >100 mm
    Number of cyst locules: 0, 1, 2, 3, 4, 5, 6–10, 11–20, and >20; ≤10, >10†
    Septum present: yes, no
    Incomplete septum present: yes, no
    Solid component: yes, no
    Papillary projections: 0, 1, 2, 3, ≥4; 1, 2, 3, ≥4‡; <4, >4†
    Echogenicity of cyst fluid: anechoic, low level, ground glass, mixed, no cyst fluid
    Color Doppler signals detectable: yes, no
    Ovarian crescent sign: yes, no
Diagnosis based on LR1 and LR2
    Calculated risk of malignancy using LR1: ≤10%, >10%
    Calculated risk of malignancy using LR2: ≤10%, >10%
___________________________________________________________________________
IOTA, international ovarian tumor analysis group; LR1, logistic regression model 1; LR2, logistic regression model 2.
The risk of malignancy using LR1 is derived as $y = 1/(1 + e^{-z})$, where
$$z = -6.7468 + 1.5985\,(a) - 0.9983\,(b) + 0.0326\,(c) + 0.00841\,(d) - 0.8577\,(e) + 1.5513\,(f) + 1.1737\,(g) + 0.9281\,(h) + 0.0496\,(i) + 1.1421\,(j) - 2.3550\,(k) + 0.4916\,(l)$$
and $e$ is the mathematical constant and base value of natural logarithms. The 12 variables included in LR1 are:
(a) personal history of ovarian cancer (yes = 1, no = 0);
(b) current hormonal therapy (yes = 1, no = 0);
(c) age of the patient (in years);
(d) maximum diameter of the lesion (in millimeters);
(e) mass tender at scan (yes = 1, no = 0);
(f) the presence of ascites (yes = 1, no = 0);
(g) the presence of blood flow within a solid papillary projection (yes = 1, no = 0);
(h) the presence of a purely solid tumor (yes = 1, no = 0);
(i) maximal diameter of the solid component (expressed in millimeters, but with no increase >50 mm);
(j) irregular internal cyst walls (yes = 1, no = 0);
(k) the presence of acoustic shadows (yes = 1, no = 0); and
(l) color score (1, 2, 3, or 4).
The risk of malignancy using LR2 is derived as $y = 1/(1 + e^{-z})$, where
$$z = -5.3718 + 0.0354\,(c) + 1.6159\,(f) + 1.1768\,(g) + 0.0697\,(i) + 0.9586\,(j) - 2.9486\,(k)$$ (12).
A calculated risk of malignancy of more than 10% classified the adnexal mass as malignant.
† includes 0. ‡ includes only cases where papillary projections were registered at both examinations.
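As an illustration of how the LR1 formula in the footnote above is applied, the following Python sketch evaluates it for two invented masses and classifies each at the 10% cutoff. The coefficients are those given in the footnote; the dictionary keys, function names and patient values are hypothetical and serve only to make the arithmetic explicit. This is a sketch, not a validated implementation.

```python
import math

# LR1 intercept and coefficients as listed in the footnote to Table 1.
LR1_INTERCEPT = -6.7468
LR1_COEF = {
    "history_ovarian_cancer": 1.5985,        # (a) yes = 1, no = 0
    "hormonal_therapy": -0.9983,             # (b) yes = 1, no = 0
    "age_years": 0.0326,                     # (c)
    "max_lesion_diameter_mm": 0.00841,       # (d)
    "tender_at_scan": -0.8577,               # (e) yes = 1, no = 0
    "ascites": 1.5513,                       # (f) yes = 1, no = 0
    "flow_in_papillary_projection": 1.1737,  # (g) yes = 1, no = 0
    "purely_solid": 0.9281,                  # (h) yes = 1, no = 0
    "max_solid_diameter_mm": 0.0496,         # (i) no increase above 50 mm
    "irregular_cyst_walls": 1.1421,          # (j) yes = 1, no = 0
    "acoustic_shadows": -2.3550,             # (k) yes = 1, no = 0
    "color_score": 0.4916,                   # (l) 1, 2, 3 or 4
}


def lr1_risk(findings):
    """Return the LR1 risk of malignancy as a probability between 0 and 1."""
    values = dict(findings)
    values["max_solid_diameter_mm"] = min(values["max_solid_diameter_mm"], 50)
    z = LR1_INTERCEPT + sum(coef * values[name] for name, coef in LR1_COEF.items())
    return 1 / (1 + math.exp(-z))


def classify(risk, cutoff=0.10):
    """A risk above the cutoff classifies the mass as malignant."""
    return "malignant" if risk > cutoff else "benign"


# Hypothetical unilocular cyst in a 50-year-old woman, no suspicious features.
simple_cyst = dict.fromkeys(LR1_COEF, 0)
simple_cyst.update(age_years=50, max_lesion_diameter_mm=60, color_score=1)

# Hypothetical mass with ascites, a large solid component, irregular walls
# and abundant color Doppler signals in a 65-year-old woman.
suspicious_mass = dict.fromkeys(LR1_COEF, 0)
suspicious_mass.update(age_years=65, max_lesion_diameter_mm=100, ascites=1,
                       max_solid_diameter_mm=60, irregular_cyst_walls=1,
                       color_score=4)

for name, mass in [("simple cyst", simple_cyst), ("suspicious mass", suspicious_mass)]:
    risk = lr1_risk(mass)
    print(f"{name}: {100 * risk:.1f}% -> {classify(risk)}")
```

LR2 follows the same pattern with its own intercept and the six variables (c), (f), (g), (i), (j) and (k).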
Table 2. Histological diagnoses of the masses
___________________________________________________________________________
Histological diagnosis of tumor                    Number
___________________________________________________________________________
Benign tumors                                        94
    Benign simple cyst                                7
    Endometrioma                                     10
    Dermoid cyst                                     16
    Serous cystadenoma                               16
    Mucinous cystadenoma                             18
    Myoma/fibroma                                     9
    Cystadenofibroma                                 11
    Paraovarian cyst                                  5
    Sactosalpinx/chronic salpingitis                  1
    Leydig cell tumor                                 1
Borderline tumors                                     4
    Serous                                            2
    Mucinous                                          1
    Endometrioid                                      1
Invasive malignancy                                  19
    Primary ovarian adenocarcinoma                   13
    Granulosa cell tumor                              3
    Dysgerminoma                                      1
    Leiomyosarcoma                                    1
    Malignant aggressive B-cell lymphoma              1
___________________________________________________________________________
Table 3. Inter-observer reproducibility of continuous variables used in the risk calculation models LR1 and LR2 and of other continuous variables
used to describe adnexal masses
___________________________________________________________________________________________________________________________________________________
                                              Measurement results         Difference in mm between two measurements               Intra-CC
                                              (both sonologists)          made by sonologists 1 and 2 (a)
Parameter                                     Median          Range       Mean             95% CI           Limits of agreement   Point estimate (95% CI)
___________________________________________________________________________________________________________________________________________________
Variables used in models LR1 and LR2
  Maximum diameter of adnexal mass, mm        70 (n=234)      10 to 313   3.80 (n=117)     1.12 to 6.48     -25.24 to 32.82       0.958 (0.937 to 0.971)
  Maximum diameter of largest solid
  component, mm (b)                           29.50 (n=122)   5 to 180    1.92 (n=61)      -1.74 to 5.58    -26.66 to 30.50       0.942 (0.905 to 0.964)
Other variables used to describe adnexal mass
  Mean diameter of adnexal mass, mm           58.5 (n=234)    9 to 240    1.05 (n=117)     -0.15 to 1.95    -8.61 to 10.72        0.971 (0.958 to 0.980)
  Mean diameter of largest solid
  component, mm (b)                           22 (n=122)      4 to 156    0.59 (n=61)      -1.82 to 2.98    -18.16 to 19.32       0.962 (0.937 to 0.977)
  Height of largest papillary
  projection, mm (c)                          8 (n=42)        3 to 25     -0.51 (n=21)     -2.93 to 1.91    -11.61 to 10.59       0.609 (0.245 to 0.821)
___________________________________________________________________________________________________________________________________________________
CI, confidence interval; Intra-CC, intra-class correlation coefficient.
(a) The measurements made by sonologist 2 were subtracted from the measurements made by sonologist 1.
(b) Includes only cases where solid components were registered by both observers.
(c) Includes only cases where papillary projections were registered by both observers. Using logarithmically transformed data, the results were as
follows: mean difference 1.03 (95% CI, 0.78 to 1.36); limits of agreement 0.31 to 3.61; Intra-CC 0.547 (0.153 to 0.789). These results can be used for
comparison with our published results on inter-observer agreement when analyzing ultrasound volumes obtained by three-dimensional
ultrasound (15).
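The mean difference, its confidence interval and the limits of agreement reported in Table 3 follow the Bland-Altman approach (20). A minimal Python sketch of that calculation is given below. The paired measurements are invented for illustration and are not study data, the function name `bland_altman` is hypothetical, and the normal-approximation multiplier 1.96 is an assumption (a t-quantile would be more appropriate for very small samples).

```python
import math
import statistics


def bland_altman(first, second):
    """Bland-Altman summary for paired measurements (first minus second)."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long series with at least 2 pairs")
    diffs = [a - b for a, b in zip(first, second)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)            # sample standard deviation
    half_ci = 1.96 * sd_diff / math.sqrt(len(diffs))
    return {
        "mean_difference": mean_diff,
        "ci_95": (mean_diff - half_ci, mean_diff + half_ci),
        "limits_of_agreement": (mean_diff - 1.96 * sd_diff,
                                mean_diff + 1.96 * sd_diff),
    }


# Hypothetical maximum diameters (mm) of six masses, measured by two examiners.
sonologist_1 = [70, 45, 120, 33, 88, 60]
sonologist_2 = [66, 47, 110, 35, 80, 61]

summary = bland_altman(sonologist_1, sonologist_2)
for key, value in summary.items():
    print(key, value)
```

The limits of agreement describe the range within which about 95% of the differences between two examiners are expected to fall, so they are always wider than the confidence interval of the mean difference.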
Table 4. Inter-observer agreement for categorical variables included in the risk calculation models LR1 and LR2 and for other categorical variables used to describe adnexal masses
____________________________________________________________________________________________________________________
Variable                                                                              Agreement, % (n/N)   Kappa value
____________________________________________________________________________________________________________________
Variables used in models LR1 and LR2
  Presence of entirely solid tumor (yes, no)                                          97 (114/117)         0.91
  Maximum diameter of largest solid component (≤50 mm, >50 mm)                        95 (111/117)         0.83
  Maximum diameter of largest solid component (≤50 mm, >50 mm) (a)                    92 (56/61)           0.82
  Tenderness of adnexal mass (yes, no)                                                97 (113/117)         0.78
  Ascites (yes, no)                                                                   97 (113/117)         0.73
  Papillary projections (yes, no)                                                     87 (102/117)         0.65
  Solid component (yes, no)                                                           84 (98/117)          0.67
  Irregular cyst wall (yes, no)                                                       79 (92/117)          0.56
  Acoustic shadows (yes, no)                                                          85 (99/117)          0.58
  Color Doppler signals in papillary projections (yes, no) (b)                        90 (105/117)         0.48
  Color Doppler signals in papillary projections (yes, no) (c)                        86 (18/21)           0.71
  Color score (1, 2, 3, 4)                                                            40 (47/117)          0.36 (d)
Other variables used to describe adnexal masses
  Tumor type (unilocular, unilocular solid, multilocular, multilocular solid, solid)  97 (113/117)         0.70
  Mean diameter of tumor (mm): ≤40, 41–60, 61–80, 81–100, >100                        66 (77/117)          0.76 (d)
  Number of locules: 0, 1, 2, 3, 4, 5, 6–10, 11–20, >20                               58 (68/117)          0.79 (d)
  Number of locules: ≤10, >10 (e)                                                     94 (110/117)         0.80
  Septum (yes, no)                                                                    88 (103/117)         0.76
  Incomplete septum (yes, no)                                                         98 (115/117)         ¶
  Number of papillary projections: 0, 1, 2, 3, ≥4                                     81 (95/117)          0.64 (d)
  Number of papillary projections: 1, 2, 3, ≥4 (c)                                    67 (14/21)           0.63 (d)
  Number of papillary projections: <4, >4 (e)                                         96 (112/117)         0.68
  Echogenicity of cyst fluid (anechoic, low level, ground glass, mixed, no fluid)     67 (78/117)          0.56
  Color Doppler signals detectable (yes, no)                                          84 (98/117)          0.30
  Ovarian crescent sign (yes, no)                                                     78 (91/117)          0.49
  Fluid in Douglas pouch (yes, no)                                                    87 (102/117)         0.72
____________________________________________________________________________________________________________________
(a) Includes only cases where solid components were registered at both examinations.
(b) This variable reflects disagreement both with regard to the presence of papillary projections and the presence of blood flow in papillary projections (i.e., if one observer recorded the presence of papillary projections and the other did not, there was disagreement with regard to this variable, and if both observers recorded the presence of papillary projections but disagreed with regard to the presence of blood flow in the papillary projections, there was disagreement with regard to this variable).
(c) Includes only cases where papillary projections were registered at both examinations.
(d) Weighted Kappa index.
(e) Includes 0.
¶ Absent kappa values are explained by asymmetric field tables, making it impossible to calculate them.
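The Kappa values in Table 4 express agreement beyond chance (17). The Python sketch below shows how Cohen's Kappa, and a weighted variant for ordered categories, can be computed from a cross-table of the two examiners' ratings. The cross-table is invented for illustration; the linear weighting scheme is an assumption, since this section does not state which weights were used for the weighted Kappa index; and the function name `cohen_kappa` is hypothetical.

```python
def cohen_kappa(table, weighted=False):
    """Cohen's Kappa from a square cross-table (rows: rater 1, columns: rater 2).

    With weighted=True, linear agreement weights 1 - |i - j| / (k - 1) are
    used (an assumption; other weighting schemes exist).
    """
    k = len(table)
    if k < 2 or any(len(row) != k for row in table):
        raise ValueError("table must be square with at least 2 categories")
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        if weighted:
            return 1 - abs(i - j) / (k - 1)
        return 1.0 if i == j else 0.0

    cells = [(i, j) for i in range(k) for j in range(k)]
    observed = sum(weight(i, j) * table[i][j] for i, j in cells) / n
    expected = sum(weight(i, j) * row_totals[i] * col_totals[j]
                   for i, j in cells) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical yes/no variable rated by two examiners in 100 masses:
# rows are examiner 1 (yes, no), columns are examiner 2 (yes, no).
yes_no_table = [
    [40, 10],
    [5, 45],
]
print(f"agreement: {(40 + 45) / 100:.0%}")
print(f"kappa: {cohen_kappa(yes_no_table):.2f}")
```

Because Kappa corrects for the agreement expected by chance, a variable with high raw agreement can still have a modest Kappa when one category is rare, which is the pattern seen for several rows of Table 4.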
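Table 5. Inter-observer reproducibility of the risk of malignancy calculated using the risk calculation models LR1 and LR2
___________________________________________________________________________________________________________________________________________
                                        Calculated risk of malignancy   Difference between the risk calculated                Intra-CC
                                        (both sonologists)              by sonologist 1 and 2
Parameter                               Median         Range            Mean            95% CI           Limits of agreement   Point estimate (95% CI)
___________________________________________________________________________________________________________________________________________
Risk of malignancy calculated
using LR1                               7.85 (n=234)   0.10 to 99.10    -0.53 (n=117)   -3.07 to 2.01    -28.05 to 26.99       0.911 (0.874 to 0.937)
Risk of malignancy calculated
using LR2                               6.65 (n=234)   0.10 to 98.40    0.02 (n=117)    -3.06 to 3.10    -33.22 to 33.26       0.832 (0.766 to 0.880)
___________________________________________________________________________________________________________________________________________
CI, confidence interval; Intra-CC, intra-class correlation coefficient.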
Legends for figure
Figure 1. a) Scatterplot showing the relationship between inter-observer difference (observer
1 minus observer 2) in calculated risk and magnitude of calculated risk when using logistic
regression model LR1. The plot manifests a diamond shape, the differences being smallest for
the lowest and highest risks. For risks < 25 and > 95 the differences are very small.
LOA, limits of agreement. b) Scatterplot showing the relationship between inter-observer
difference in calculated risk and magnitude of calculated risk when using logistic regression
model LR2. The plot manifests a diamond shape, the differences being smallest for the lowest
and highest risks.
Povilas Sladkevicius and Lil Valentin. Inter-observer agreement in describing the ultrasound
appearance of adnexal masses and in calculating the risk of malignancy using logistic regression
models. Clin Cancer Res. Published OnlineFirst November 25, 2014.
doi: 10.1158/1078-0432.CCR-14-0906
Access the most recent version of this article at
Material
Supplementary
httpclincancerresaacrjournalsorgcontentsuppl201411261078-0432CCR-14-0906DC1
Access the most recent supplemental material at
Manuscript
Authoredited Author manuscripts have been peer reviewed and accepted for publication but have not yet been
E-mail alerts related to this article or journalSign up to receive free email-alerts
Subscriptions
Reprints and
pubsaacrorgDepartment at
To order reprints of this article or to subscribe to the journal contact the AACR Publications
Permissions
Rightslink site Click on Request Permissions which will take you to the Copyright Clearance Centers (CCC)
httpclincancerresaacrjournalsorgcontentearly201411251078-0432CCR-14-0906To request permission to re-use all or part of this article use this link
Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906
agreement in ultrasound assessment of ovarian masses 10
10
by the two sonologists in these 14 cases were acoustic shadowing (n = 5), irregular cyst wall
(n = 5), ascites (n = 3), and flow in papillary projection (n = 2).
The sensitivity with regard to malignancy when using LR1 (10% risk cutoff) was 100%
(23/23; 95% CI, 82-100) for both sonologists; the specificity was 74% (70/94; 95% CI, 64-82)
for sonologist 1 and 63% (59/94; 95% CI, 53-72) for sonologist 2. The sensitivity when using
LR2 was 100% (23/23; 95% CI, 82-100) for sonologist 1 and 91% (21/23; 95% CI, 72-98) for
sonologist 2, and the specificity was 75.5% (71/94; 95% CI, 65-84) for both sonologists.
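The percentages above follow directly from the reported counts. The sketch below simply recomputes them; the function and variable names are illustrative and not part of the study, and the confidence intervals are not recomputed because the interval method is not stated in this section.

```python
# Counts taken from the paragraph above: 23 malignant and 94 benign masses,
# classified with the 10% risk-of-malignancy cutoff.
N_MALIGNANT = 23
N_BENIGN = 94

# (true positives, true negatives) per model and sonologist, as reported.
# true positive  = malignant mass with calculated risk > 10%
# true negative  = benign mass with calculated risk <= 10%
REPORTED = {
    ("LR1", "sonologist 1"): (23, 70),
    ("LR1", "sonologist 2"): (23, 59),
    ("LR2", "sonologist 1"): (23, 71),
    ("LR2", "sonologist 2"): (21, 71),
}


def sensitivity_specificity(true_pos, true_neg,
                            n_malignant=N_MALIGNANT, n_benign=N_BENIGN):
    """Return (sensitivity, specificity) as percentages."""
    return 100.0 * true_pos / n_malignant, 100.0 * true_neg / n_benign


if __name__ == "__main__":
    for (model, reader), (tp, tn) in REPORTED.items():
        sens, spec = sensitivity_specificity(tp, tn)
        print(f"{model}, {reader}: sensitivity {sens:.1f}%, specificity {spec:.1f}%")
```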
Discussion
We have shown substantial inter-observer variability in the results of measurements taken in
adnexal masses (wide limits of agreement). Inter-observer agreement beyond chance was very
good or good for most categorical variables, but it was only moderate or fair for some.
Inter-observer agreement above chance was poorest for variables heavily dependent on subjective
evaluation and/or machine settings, i.e., color score, presence of color Doppler signals in
papillary projections, irregular cyst walls, acoustic shadowing (all four variables being
included in LR1 or LR2), echogenicity of cyst fluid, and ovarian crescent sign. Despite this,
there was good inter-observer agreement when classifying tumors as benign or malignant using
the predetermined risk of malignancy cut-off of 10%. However, in some cases there were
substantial differences in the calculated risk of malignancy between the two sonologists, the
difference being >25.0 percentage units in 9% of all tumors when using LR1 and in 12% of all
tumors when using LR2.
The strength of our study is that it provides new information. To the best of our knowledge,
there is only one publication reporting on inter-observer agreement with regard to describing
ultrasound findings in adnexal masses using the IOTA terminology (11) when performing live
ultrasound examinations (23). However, that study (23) evaluated inter-observer agreement
with regard to the ten ultrasound features in the IOTA simple rules (24, 25), not the variables
included in the IOTA logistic regression models LR1 and LR2, and agreement was estimated
between examiners with different levels of experience. The variable with poorest agreement
beyond chance in the study cited was acoustic shadowing (Kappa 0.36). We have found no
published study that has estimated inter-observer reproducibility of the calculated risk of
malignancy using LR1 or LR2 after live scanning.
It is a limitation of our study that up to 204 days elapsed between the scans of the two
sonologists (up to 41 days for malignant masses). Because days elapsed between the scans,
theoretically, the inter-observer differences could be explained by the lesions having changed
in size or morphology between the scans. We find this highly unlikely for the following
reasons. First, there was no relationship between the differences in measurement results and
the number of days between the scans (Supplementary Figs. S1-S5). Nor was there a clear
tendency for inter-observer agreement for discrete variables to depend on the time between the
scans (Supplementary Table S1). Second, one would expect a lesion and its components to
increase in size with time, but sonologist 1, performing the first scan, obtained higher
measurement values than sonologist 2. Third, it is our experience, after having performed
gynecological scans for more than 20 years, that the ultrasound morphology of both benign and
malignant adnexal masses remains constant over time, that benign adnexal lesions grow
slowly, and that malignant masses do not change appreciably in size even during 1 month of
observation. Therefore, we believe that the discrepancies between the two sonologists reflect
true inter-observer differences and not a change of the masses over time. A second limitation is
that we did not include estimation of the reproducibility of retrieving anamnestic information
(current hormonal therapy, personal history of ovarian cancer), the anamnestic information
collected by the second sonologist being used in all cases. It cannot be entirely excluded that
patients would answer differently when asked by different sonologists or that sonologists
could interpret the answers of the patients differently. A third limitation is that we did not
estimate intra-observer reproducibility. We considered four scans (two per sonologist) likely to
be unacceptable to patients. For the same reason, only two sonologists were involved in this
study, and our results are generalizable only to sonologists with a similar level of experience.
The results of this live scanning study are similar to those of another study in which the
same sonologists assessed the same variables using 3D ultrasound volumes from adnexal
masses in another tumor population (15). The similarity in results between the two studies is
surprising, because the conditions when assessing 3D ultrasound volumes are different from
those during a live scan. When evaluating ultrasound volumes, sonologists are exposed to the
same ultrasound images, and so any inter-observer difference should be explained exclusively
by differences in interpreting the ultrasound information. During a live scan there are more
sources of bias. This could result in either poorer or better inter-observer agreement than when
3D ultrasound volumes are assessed: poorer because ultrasound examiners are likely to use
different machine settings and scanning conditions may change from one minute to another,
better because the dynamic nature of live scanning facilitates discrimination between solid
components and amorphous tissue.
Our results showed that two experienced sonologists agreed quite well in their classification
of masses as benign or malignant using the 10% risk of malignancy cutoff of LR1 and LR2,
and that the diagnostic performance of LR1 and LR2 with regard to discrimination between
benign and malignant tumors was similar for the two sonologists and similar to that reported
by others (14, 26-28). This is reassuring, because the main purpose of using models LR1 and
LR2 is to classify tumors as benign or malignant. Potentially, however, LR1 and LR2 can be
used not only to classify adnexal masses as benign or malignant but also to counsel a patient
about her individual risk of malignancy (13). If the calculated risk is to be used for individual
counseling, one must be reasonably certain not only that the estimated risk agrees well with the
true risk (when externally validated, both LR1 and LR2 underestimated the true risk, especially
in the risk interval 30-70% (14)) but also that the risk estimates are reproducible, i.e., that
different examiners will obtain similar risk estimates. Our results show that risk estimates may
differ substantially between experienced observers, the difference in estimated risk being >25.0
percentage units in 9% and 12% of cases when using LR1 and LR2, respectively. Inter-observer
agreement above chance was poorest for those variables in the models that are heavily
dependent on subjective evaluation, i.e., color score, presence of color Doppler signals in
papillary projections, irregular cyst walls, and acoustic shadowing. Indeed, differences in these
explained most of the largest inter-observer differences in calculated risk of malignancy. In
models based on few variables, changing the value of only one variable may result in large
differences in predicted risks, while a model with many variables is less vulnerable to a change
in one or even a few variables. Our results illustrate this (Supplementary Tables S2 and S3).
When using LR2 (which includes six variables), a change in value for one single categorical
variable explained an inter-observer difference in calculated risk >25 percentage units in eight
of 14 cases, while when using LR1 (which includes 12 variables), a change in value for one
single categorical variable explained an inter-observer difference in calculated risk >25
percentage units in only four of 11 cases. Acoustic shadowing is a strong variable in both LR1
and LR2 and has great impact on the calculated risk in LR2, which has only six variables. In our
hands, as well as in those of Ruiz de Gauna et al. (23), inter-observer agreement for acoustic
shadowing was at most moderate. The inter-observer agreement for color score was only fair in
our study, and color score is an important variable in LR1.
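How strongly a single disagreement can move an LR2 risk estimate can be illustrated with the LR2 coefficients given in the footnote to Table 1. The sketch below is only an illustration: the patient is hypothetical (not a case from the study), and the function and key names are our own. Two observers agree on everything except whether acoustic shadows are present.

```python
import math

# LR2 intercept and coefficients as given in the footnote to Table 1.
LR2_INTERCEPT = -5.3718
LR2_COEF = {
    "age_years": 0.0354,          # (c) age of the patient in years
    "ascites": 1.6159,            # (f) yes = 1, no = 0
    "papillary_flow": 1.1768,     # (g) blood flow in a papillary projection
    "max_solid_mm": 0.0697,       # (i) largest solid component, capped at 50 mm
    "irregular_walls": 0.9586,    # (j) yes = 1, no = 0
    "acoustic_shadows": -2.9486,  # (k) yes = 1, no = 0
}


def lr2_risk(case):
    """Return the LR2 risk of malignancy in percent for a dict of predictors."""
    values = dict(case)
    # The solid-component diameter does not increase the risk beyond 50 mm.
    values["max_solid_mm"] = min(values["max_solid_mm"], 50)
    z = LR2_INTERCEPT + sum(LR2_COEF[name] * values[name] for name in LR2_COEF)
    return 100.0 / (1.0 + math.exp(-z))


# Hypothetical mass: 50-year-old patient, 40 mm solid component, irregular walls.
observer_1 = {"age_years": 50, "ascites": 0, "papillary_flow": 0,
              "max_solid_mm": 40, "irregular_walls": 1, "acoustic_shadows": 0}
# The second observer differs only in recording acoustic shadows.
observer_2 = dict(observer_1, acoustic_shadows=1)

if __name__ == "__main__":
    r1, r2 = lr2_risk(observer_1), lr2_risk(observer_2)
    print(f"observer 1: {r1:.1f}%  observer 2: {r2:.1f}%  difference: {r1 - r2:.1f}")
```

In this hypothetical case the two estimates fall on opposite sides of the 10% cutoff and differ by well over 25 percentage units, although only one binary variable was recorded differently.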
To improve inter-observer reproducibility of calculated risks based on LR1 and LR2,
inter-observer differences in descriptions and measurements of adnexal masses using the IOTA
terminology and measurement technique need to be reduced. One way to achieve this could be by
providing courses on and training in how to examine and describe adnexal masses using the
IOTA terms. Interactive courses in which a large number of ultrasound images are discussed with
the course participants are likely to be very valuable in this respect. More precise definitions of
the IOTA terms, for example by providing ample imaging material, would probably also help
improve inter-observer agreement. Special attention should be given to the variables with poorest
reproducibility, i.e., the color score, wall irregularity, acoustic shadowing, and detection of blood
flow in papillary projections. Until better inter-observer agreement in the calculated risk of
malignancy using LR1 and LR2 has been shown, one should be cautious about using the risk
estimates for individual patient counselling.
Acknowledgements
None
References
1. Granberg S, Norström A, Wikland M. Tumors in the lower pelvis as imaged by vaginal
sonography. Gynecol Oncol 1990;37:224-9.
2. Benacerraf BR, Finkler NJ, Wojciechowski C, Knapp RC. Sonographic accuracy in the
diagnosis of ovarian masses. J Reprod Med 1990;35:491-5.
3. Valentin L. Pattern recognition of pelvic masses by gray-scale ultrasound imaging: the
contribution of Doppler ultrasound. Ultrasound Obstet Gynecol 1999;14:338-47.
4. Valentin L. Prospective cross-validation of Doppler ultrasound examination and gray-
scale ultrasound imaging for discrimination of benign and malignant pelvic masses.
Ultrasound Obstet Gynecol 1999;14:273-83.
5. Timmerman D, Schwärzler P, Collins WP, Claerhout F, Coenen M, Amant F, et al.
Subjective assessment of adnexal masses with the use of ultrasonography: an analysis
of interobserver variability and experience. Ultrasound Obstet Gynecol 1999;13:11-6.
6. Sokalska A, Timmerman D, Testa AC, Van Holsbeke C, Lissoni AA, Leone FPG, et al.
Diagnostic accuracy of transvaginal ultrasound examination for assigning a specific
diagnosis to adnexal masses. Ultrasound Obstet Gynecol 2009;34:462-70.
7. Valentin L. Use of morphology to characterize and manage common adnexal masses.
Best Pract Res Clin Obstet Gynaecol 2004;18:71-89.
8. Tailor A, Jurkovic D, Bourne TH, Collins WP, Campbell S. Sonographic prediction of
malignancy in adnexal masses using multivariate logistic regression analysis.
Ultrasound Obstet Gynecol 1997;10:41-7.
9. Timmerman D, Bourne TH, Tailor A, Collins WP, Verrelst H, Vandenberghe K, et al.
A comparison of methods for the preoperative discrimination between benign and
malignant adnexal masses: the development of a new logistic regression model. Am J
Obstet Gynecol 1999;181:57-65.
10. Alcazar JL, Jurado M. Prospective evaluation of logistic model based on sonographic
morphologic and color Doppler findings developed to predict adnexal malignancy. J
Ultrasound Med 1999;18:837-42.
11. Timmerman D, Valentin L, Bourne TH, Collins WP, Verrelst H, Vergote I. Terms,
definitions and measurements to describe the sonographic features of adnexal tumors: a
consensus opinion from the International Ovarian Tumor Analysis (IOTA) Group.
Ultrasound Obstet Gynecol 2000;16:500-5.
12. Timmerman D, Testa AC, Bourne T, Ferrazzi E, Ameye L, Konstantinovic ML, et al.
Logistic regression model to distinguish between the benign and malignant adnexal
mass before surgery: a multicenter study by the International Ovarian Tumor Analysis
Group. J Clin Oncol 2005;34:8794-801.
13. Kaijser J, Bourne T, Valentin L, Sayasneh A, Van Holsbeke C, Vergote I, et al.
Improving strategies for diagnosing ovarian cancer: a summary of the International
Ovarian Tumor Analysis (IOTA) studies. Ultrasound Obstet Gynecol 2013;41:9-20.
14. Timmerman D, Van Calster B, Testa AC, Guerriero S, Fischerova D, Lissoni AA, et al.
Ovarian cancer prediction in adnexal masses using ultrasound-based logistic regression
models: a temporal and external validation study by the IOTA group. Ultrasound Obstet
Gynecol 2010;36:226-34.
15. Sladkevicius P, Valentin L. Intra- and inter-observer agreement when describing
adnexal masses using the International Ovarian Tumour Analysis (IOTA) terms and
definitions: a study on three-dimensional (3D) ultrasound volumes. Ultrasound Obstet
Gynecol 2013;41:318-27.
16. Heintz APM, Odicino F, Maisonneuve P, Beller U, Benedet JL, Creasman WT, et al.
Carcinoma of the Ovary. 25th Annual Report on the Results of Treatment in
Gynecological Cancer. Int J Gynecol Obstet 2003;83:S135-S166 (suppl 1).
17. Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas 1960;20:
37-46.
18. Kundel HL, Polansky M. Measurement of observer agreement. Radiology 2003;228:
303-8.
19. Brennan P, Silman A. Statistical methods for assessing observer variability in clinical
measures. BMJ 1992;304:1491-4.
20. Bland JM, Altman DG. Statistical methods for assessing agreement between two
methods of clinical measurement. Lancet 1986;1:307-10.
21. Bartlett JW, Frost C. Reliability, repeatability and reproducibility: analysis of
measurement errors in continuous variables. Ultrasound Obstet Gynecol 2008;31:466-75.
22. Kottner J, Audigé L, Brorson S, Donner A, Gajewski BJ, Hróbjartsson A, et al.
Guidelines for Reporting Reliability and Agreement Studies (GRRAS) were proposed. J
Clin Epidemiol 2011;64:96-106.
23. Ruiz de Gauna B, Sanchez P, Pineda L, Utrilla-Layna J, Juez L, Alcázar JL.
Inter-observer agreement with regard to describing adnexal masses using the IOTA simple
rules in a real-time setting and when using three-dimensional ultrasound volumes and
digital clips. Ultrasound Obstet Gynecol 2014;44:95-100.
24. Timmerman D, Testa AC, Bourne T, Ameye L, Jurkovic D, Van Holsbeke C, et al.
Simple ultrasound-based rules for the diagnosis of ovarian cancer. Ultrasound Obstet
Gynecol 2008;31:681-90.
25. Timmerman D, Ameye L, Fischerova D, Epstein E, Melis GB, Guerriero S, et al.
Simple ultrasound rules to distinguish between benign and malignant adnexal masses
before surgery: prospective validation by IOTA group. BMJ 2010;341:c6839.
26. Van Holsbeke C, Van Calster B, Testa AC, Domali E, Lu C, Van Huffel S, et al.
Prospective internal validation of mathematical models to predict malignancy in
adnexal masses: results from the international ovarian tumor analysis study. Clin Cancer
Res 2009;15:684-91.
27. Nunes N, Yazbek J, Ambler G, Hoo W, Naftalin J, Jurkovic D. Prospective evaluation
of the IOTA logistic regression model LR2 for the diagnosis of ovarian cancer.
Ultrasound Obstet Gynecol 2012;40:355-9.
28. Nunes N, Ambler G, Hoo WL, Naftalin J, Foo X, Widschwendter M, et al.
A prospective validation of the IOTA logistic regression models (LR1 and LR2) in
comparison to subjective pattern recognition for the diagnosis of ovarian cancer.
Int J Gynecol Cancer 2013;23:1583-9.
Table 1. Variables for which inter-observer reproducibility was estimated
___________________________________________________________________________
Variables included in logistic regression models LR1 and LR2 or of importance for these models
  Maximum diameter of adnexal mass, mm
  Maximum diameter of largest solid component, mm
  Maximum diameter of largest solid component: ≤50, >50 mm
  Presence of an entirely solid adnexal mass: yes, no
  Papillary projections present: yes, no
  Irregular internal cyst walls: yes, no
  Acoustic shadows: yes, no
  Color Doppler signals in papillary projection: yes, no
  Color score: 1, 2, 3, 4
  Tenderness of adnexal mass at scan: yes, no
  Ascites: yes, no
Other IOTA variables
 Continuous IOTA variables
  Mean diameter of adnexal mass, mm
  Mean diameter of largest solid component, mm
  Height of the largest papillary projection, mm
 Categorical IOTA variables
  Type of tumor: unilocular, unilocular solid, multilocular, multilocular solid, solid
  Mean diameter of adnexal mass: ≤40, 41-60, 61-80, 81-100, and >100 mm
  Number of cyst locules: 0, 1, 2, 3, 4, 5, 6-10, 11-20, and >20; ≤10, >10†
  Septum present: yes, no
  Incomplete septum present: yes, no
  Solid component: yes, no
  Papillary projections: 0, 1, 2, 3, ≥4; 1, 2, 3, ≥4‡; <4, >4†
  Echogenicity of cyst fluid: anechoic, low level, ground glass, mixed, no cyst fluid
  Color Doppler signals detectable: yes, no
  Ovarian crescent sign: yes, no
Diagnosis based on LR1 and LR2
  Calculated risk of malignancy using LR1: ≤10%, >10%
  Calculated risk of malignancy using LR2: ≤10%, >10%
___________________________________________________________________________
IOTA, international ovarian tumor analysis group; LR1, logistic regression model 1; LR2, logistic regression model 2.
The risk of malignancy using LR1 is derived as $y = 1/(1 + e^{-z})$, where
$z = -6.7468 + 1.5985\,(a) - 0.9983\,(b) + 0.0326\,(c) + 0.00841\,(d) - 0.8577\,(e) + 1.5513\,(f) + 1.1737\,(g) + 0.9281\,(h) + 0.0496\,(i) + 1.1421\,(j) - 2.3550\,(k) + 0.4916\,(l)$
and $e$ is the mathematical constant and base value of natural logarithms. The 12 variables included in LR1 are:
  (a) personal history of ovarian cancer (yes = 1, no = 0)
  (b) current hormonal therapy (yes = 1, no = 0)
  (c) age of the patient (in years)
  (d) maximum diameter of the lesion (in millimeters)
  (e) mass tender at scan (yes = 1, no = 0)
  (f) the presence of ascites (yes = 1, no = 0)
  (g) the presence of blood flow within a solid papillary projection (yes = 1, no = 0)
  (h) the presence of a purely solid tumor (yes = 1, no = 0)
  (i) maximal diameter of the solid component (expressed in millimeters, but with no increase >50 mm)
  (j) irregular internal cyst walls (yes = 1, no = 0)
  (k) the presence of acoustic shadows (yes = 1, no = 0)
  (l) color score (1, 2, 3, or 4)
The risk of malignancy using LR2 is derived as $y = 1/(1 + e^{-z})$, where
$z = -5.3718 + 0.0354\,(c) + 1.6159\,(f) + 1.1768\,(g) + 0.0697\,(i) + 0.9586\,(j) - 2.9486\,(k)$ (12).
A calculated risk of malignancy of more than 10% classified the adnexal mass as malignant.
† includes 0. ‡ includes only cases where papillary projections were registered at both examinations.
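As a worked illustration of the LR1 formula in the footnote above, the following sketch evaluates it for a made-up patient and applies the 10% cutoff. The coefficients are those listed in the footnote; the dictionary keys, function names, and example values are assumptions introduced here for illustration only, and the code is not a validated clinical tool. LR2 follows the same pattern with its six coefficients.

```python
import math

# LR1 intercept and coefficients, keyed by hypothetical variable names.
LR1_INTERCEPT = -6.7468
LR1_COEF = {
    "history_ovarian_cancer": 1.5985,  # (a) yes = 1, no = 0
    "hormonal_therapy": -0.9983,       # (b) yes = 1, no = 0
    "age_years": 0.0326,               # (c)
    "max_lesion_mm": 0.00841,          # (d)
    "tender_at_scan": -0.8577,         # (e) yes = 1, no = 0
    "ascites": 1.5513,                 # (f) yes = 1, no = 0
    "papillary_flow": 1.1737,          # (g) yes = 1, no = 0
    "purely_solid": 0.9281,            # (h) yes = 1, no = 0
    "max_solid_mm": 0.0496,            # (i) no increase above 50 mm
    "irregular_walls": 1.1421,         # (j) yes = 1, no = 0
    "acoustic_shadows": -2.3550,       # (k) yes = 1, no = 0
    "color_score": 0.4916,             # (l) 1, 2, 3 or 4
}


def lr1_risk(case):
    """Return the LR1 risk of malignancy in percent."""
    values = dict(case)
    values["max_solid_mm"] = min(values["max_solid_mm"], 50)  # cap per footnote (i)
    z = LR1_INTERCEPT + sum(LR1_COEF[name] * values[name] for name in LR1_COEF)
    return 100.0 / (1.0 + math.exp(-z))


def classify(risk_percent, cutoff=10.0):
    """Classify as malignant when the calculated risk exceeds the cutoff."""
    return "malignant" if risk_percent > cutoff else "benign"


# Made-up example: 60-year-old with an 80 mm cyst, no solid parts, color score 1.
example = {"history_ovarian_cancer": 0, "hormonal_therapy": 0, "age_years": 60,
           "max_lesion_mm": 80, "tender_at_scan": 0, "ascites": 0,
           "papillary_flow": 0, "purely_solid": 0, "max_solid_mm": 0,
           "irregular_walls": 0, "acoustic_shadows": 0, "color_score": 1}

if __name__ == "__main__":
    risk = lr1_risk(example)
    print(f"LR1 risk {risk:.1f}% -> {classify(risk)}")
```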
Table 2. Histological diagnoses of the masses
___________________________________________________________________________
Histological diagnosis of tumor              Number
___________________________________________________________________________
Benign tumors                                  94
  Benign simple cyst                            7
  Endometrioma                                 10
  Dermoid cyst                                 16
  Serous cystadenoma                           16
  Mucinous cystadenoma                         18
  Myoma/fibroma                                 9
  Cystadenofibroma                             11
  Paraovarian cyst                              5
  Sactosalpinx/chronic salpingitis              1
  Leydig cell tumor                             1
Borderline tumors                               4
  Serous                                        2
  Mucinous                                      1
  Endometrioid                                  1
Invasive malignancy                            19
  Primary ovarian adenocarcinoma               13
  Granulosa cell tumor                          3
  Dysgerminoma                                  1
  Leiomyosarcoma                                1
  Malignant aggressive B-cell lymphoma          1
___________________________________________________________________________
Table 3. Inter-observer reproducibility of continuous variables used in the risk calculation models LR1 and LR2 and of other continuous variables used to describe adnexal masses
___________________________________________________________________________
Column groups: measurement results of both sonologists (median, range); difference in mm between the two measurements made by sonologists 1 and 2 (a) (mean, 95% CI, limits of agreement); Intra-CC (point estimate and 95% CI).

Parameter                                            Median (n)      Range     Mean (n)       95% CI          Limits of agreement   Intra-CC (95% CI)
Variables used in models LR1 and LR2
  Maximum diameter of adnexal mass, mm               70 (n=234)      10-313    3.80 (n=117)   1.12 to 6.48    -25.24 to 32.82       0.958 (0.937-0.971)
  Maximum diameter of largest solid component, mm b  29.50 (n=122)   5-180     1.92 (n=61)    -1.74 to 5.58   -26.66 to 30.50       0.942 (0.905-0.964)
Other variables used to describe adnexal mass
  Mean diameter of adnexal mass, mm                  58.5 (n=234)    9-240     1.05 (n=117)   -0.15 to 1.95   -8.61 to 10.72        0.971 (0.958-0.980)
  Mean diameter of largest solid component, mm b     22 (n=122)      4-156     0.59 (n=61)    -1.82 to 2.98   -18.16 to 19.32       0.962 (0.937-0.977)
  Height of largest papillary projection, mm c       8 (n=42)        3-25      -0.51 (n=21)   -2.93 to 1.91   -11.61 to 10.59       0.609 (0.245-0.821)
___________________________________________________________________________
CI, confidence interval; Intra-CC, intra-class correlation coefficient.
a The measurements made by sonologist 2 were subtracted from the measurements made by sonologist 1.
b Includes only cases where solid components were registered by both observers.
c Includes only cases where papillary projections were registered by both observers. Using logarithmically transformed data, the results were as
follows: mean difference 1.03 (95% CI, 0.78-1.36); limits of agreement 0.31-3.61; Intra-CC 0.547 (0.153-0.789). These results can be used for
comparison with our published results on inter-observer agreement when analyzing ultrasound volumes obtained by three-dimensional
ultrasound (15).
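The limits of agreement in Table 3 can be read with a short sketch. It assumes the conventional Bland-Altman definition (mean difference ± 1.96 standard deviations of the differences; reference 20); this section of the paper does not spell out the computation, so treat that as an assumption. The paired measurements in the example are invented for illustration and are not study data.

```python
import statistics


def limits_of_agreement(observer_1, observer_2):
    """Return (mean difference, lower limit, upper limit) for paired measurements.

    Differences are observer 1 minus observer 2, as in footnote a of Table 3.
    """
    diffs = [a - b for a, b in zip(observer_1, observer_2)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation
    return mean_diff, mean_diff - 1.96 * sd_diff, mean_diff + 1.96 * sd_diff


def implied_sd(lower_limit, upper_limit):
    """Back-calculate the SD of the differences from reported limits of agreement."""
    return (upper_limit - lower_limit) / (2 * 1.96)


# Invented paired diameters in mm (not from the study).
obs1 = [52, 70, 33, 88, 120]
obs2 = [50, 64, 35, 80, 121]

if __name__ == "__main__":
    mean_diff, low, high = limits_of_agreement(obs1, obs2)
    print(f"mean difference {mean_diff:.2f} mm, limits {low:.2f} to {high:.2f} mm")
    # Reported limits for maximum diameter of the adnexal mass: -25.24 to 32.82 mm.
    print(f"implied SD of differences: {implied_sd(-25.24, 32.82):.2f} mm")
```

Under this assumption, the reported limits for the maximum diameter of the mass correspond to a standard deviation of the inter-observer differences of roughly 15 mm, which conveys how wide the limits are relative to a median diameter of 70 mm.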
Table 4. Inter-observer agreement for categorical variables included in the risk calculation models LR1 and LR2 and for other categorical variables used to describe adnexal masses
____________________________________________________________________________________________
Variable                                                                          Agreement        Kappa value
____________________________________________________________________________________________
Variables used in models LR1 and LR2
  Presence of entirely solid tumor (yes, no)                                      97% (114/117)    0.91
  Maximum diameter of largest solid component (≤50 mm, >50 mm)                    95% (111/117)    0.83
  Maximum diameter of largest solid component (≤50 mm, >50 mm) a                  92% (56/61)      0.82
  Tenderness of adnexal mass (yes, no)                                            97% (113/117)    0.78
  Ascites (yes, no)                                                               97% (113/117)    0.73
  Papillary projections (yes, no)                                                 87% (102/117)    0.65
  Solid component (yes, no)                                                       84% (98/117)     0.67
  Irregular cyst wall (yes, no)                                                   79% (92/117)     0.56
  Acoustic shadows (yes, no)                                                      85% (99/117)     0.58
  Color Doppler signals in papillary projections (yes, no) b                      90% (105/117)    0.48
  Color Doppler signals in papillary projections (yes, no) c                      86% (18/21)      0.71
  Color score (1, 2, 3, 4)                                                        40% (47/117)     0.36 d
Other variables used to describe adnexal masses
  Tumor type (unilocular, unilocular solid, multilocular, multilocular solid, solid)  97% (113/117)  0.70
  Mean diameter of tumor (mm): ≤40, 41-60, 61-80, 81-100, >100                    66% (77/117)     0.76 d
  Number of locules: 0, 1, 2, 3, 4, 5, 6-10, 11-20, >20                           58% (68/117)     0.79 d
  Number of locules: ≤10, >10 e                                                   94% (110/117)    0.80
  Septum (yes, no)                                                                88% (103/117)    0.76
  Incomplete septum (yes, no)                                                     98% (115/117)    ¶
  Number of papillary projections: 0, 1, 2, 3, ≥4                                 81% (95/117)     0.64 d
  Number of papillary projections: 1, 2, 3, ≥4 c                                  67% (14/21)      0.63 d
  Number of papillary projections: <4, >4 e                                       96% (112/117)    0.68
  Echogenicity of cyst fluid (anechoic, low level, ground glass, mixed, no fluid) 67% (78/117)     0.56
  Color Doppler signals detectable (yes, no)                                      84% (98/117)     0.30
  Ovarian crescent sign (yes, no)                                                 78% (91/117)     0.49
  Fluid in Douglas pouch (yes, no)                                                87% (102/117)    0.72
____________________________________________________________________________________________
a Includes only cases where solid components were registered at both examinations.
b This variable reflects disagreement both with regard to the presence of papillary projections and the presence of blood flow in papillary projections (i.e., if one observer recorded the presence of papillary projections and the other did not, there was disagreement with regard to this variable, and if both observers recorded the presence of papillary projections but disagreed with regard to the presence of blood flow in the papillary projections, there was disagreement with regard to this variable).
c Includes only cases where papillary projections were registered at both examinations.
d Weighted Kappa index.
e Includes 0.
¶ Absent kappa values are explained by asymmetric field tables making it impossible to calculate them.
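Table 4 reports both raw percentage agreement and kappa, and the two can diverge considerably (for example, 84% agreement with a kappa of 0.30 for detectable color Doppler signals). The sketch below shows the unweighted Cohen's kappa calculation (reference 17) on two invented 2 x 2 tables that share the same 84% raw agreement but differ in how common the finding is. The counts are hypothetical and are not the study's cross-tabulations, which are not given in this section.

```python
def cohen_kappa(table):
    """Return (observed agreement, unweighted Cohen's kappa).

    table[i][j] is the number of masses placed in category i by observer 1
    and in category j by observer 2.
    """
    n = sum(sum(row) for row in table)
    k = len(table)
    p_observed = sum(table[i][i] for i in range(k)) / n
    row_share = [sum(table[i]) / n for i in range(k)]
    col_share = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    # Agreement expected by chance from the two observers' marginal frequencies.
    p_expected = sum(r * c for r, c in zip(row_share, col_share))
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return p_observed, (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical tables, rows = observer 1 (yes, no), columns = observer 2 (yes, no).
balanced_finding = [[40, 8], [8, 44]]  # finding present in about half of the masses
rare_finding = [[4, 8], [8, 80]]       # finding present in about one mass in eight

if __name__ == "__main__":
    for name, table in (("balanced", balanced_finding), ("rare", rare_finding)):
        agreement, kappa = cohen_kappa(table)
        print(f"{name}: agreement {agreement:.0%}, kappa {kappa:.2f}")
```

With identical raw agreement, kappa is much lower for the rare finding because most of the agreement is already expected by chance. This is one reason why percentage agreement and kappa in Table 4 should be read together.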
Table 5. Inter-observer reproducibility of the risk of malignancy calculated using the risk calculation models LR1 and LR2
___________________________________________________________________________
Column groups: calculated risk of malignancy, % (both sonologists) (median, range); difference between the risk calculated by sonologist 1 and 2, percentage units (mean, 95% CI, limits of agreement); Intra-CC (point estimate and 95% CI).

Parameter                                  Median (n)     Range        Mean (n)        95% CI          Limits of agreement   Intra-CC (95% CI)
Risk of malignancy calculated using LR1    7.85 (n=234)   0.10-99.10   -0.53 (n=117)   -3.07 to 2.01   -28.05 to 26.99       0.911 (0.874-0.937)
Risk of malignancy calculated using LR2    6.65 (n=234)   0.10-98.40   0.02 (n=117)    -3.06 to 3.10   -33.22 to 33.26       0.832 (0.766-0.880)
___________________________________________________________________________
CI, confidence interval; Intra-CC, intra-class correlation coefficient.
Legends for figure
Figure 1. a) Scatterplot showing the relationship between inter-observer difference (observer
1 minus observer 2) in calculated risk and magnitude of calculated risk when using logistic
regression model LR1. The plot manifests a diamond shape, the differences being smallest for
the lowest and highest risks. For risks < 25 and > 95 the differences are very small.
LOA, limits of agreement. b) Scatterplot showing the relationship between inter-observer
difference in calculated risk and magnitude of calculated risk when using logistic regression
model LR2. The plot manifests a diamond shape, the differences being smallest for the lowest
and highest risks.
agreement in ultrasound assessment of ovarian masses 11
11
Discussion
We have shown substantial inter-observer variability in the results of measurements taken in
adnexal masses (wide limits of agreement). Inter-observer agreement beyond chance was very
good or good for most categorical variables, but it was only moderate or fair for some.
Inter-observer agreement above chance was poorest for variables heavily dependent on subjective
evaluation and/or machine settings, i.e., color score, presence of color Doppler signals in
papillary projections, irregular cyst walls, acoustic shadowing (all four variables being
included in LR1 or LR2), echogenicity of cyst fluid, and ovarian crescent sign. Despite this,
there was good inter-observer agreement when classifying tumors as benign or malignant using
the predetermined risk of malignancy cut-off of 10%. However, in some cases there were
substantial differences in the calculated risk of malignancy between the two sonologists, the
difference being >25.0 percentage units in 9% of all tumors when using LR1 and in 12% of all
tumors when using LR2.

The strength of our study is that it provides new information. To the best of our knowledge,
there is only one publication reporting on interobserver agreement with regard to describing
ultrasound findings in adnexal masses using the IOTA terminology (11) when performing live
ultrasound examinations (23). However, that study (23) evaluated interobserver agreement
with regard to the ten ultrasound features in the IOTA simple rules (24, 25), not the variables
included in the IOTA logistic regression models LR1 and LR2, and agreement was estimated
between examiners with different levels of experience. The variable with poorest agreement
beyond chance in the study cited was acoustic shadowing (Kappa 0.36). We have found no
published study that has estimated inter-observer reproducibility of the calculated risk of
malignancy using LR1 or LR2 after live scanning.

It is a limitation of our study that up to 204 days elapsed between the scans of the two
sonologists (up to 41 days for malignant masses). Because days elapsed between the scans,
theoretically, the inter-observer differences could be explained by the lesions having changed
in size or morphology between the scans. We find this highly unlikely for the following
reasons. First, there was no relationship between the differences in measurement results and
the number of days between the scans (Supplementary Fig. S1-S5). Nor was there a clear
tendency for inter-observer agreement for discrete variables to depend on the time between the
scans (Supplementary Table S1). Second, one would expect a lesion and its components to
increase in size with time, but sonologist 1, performing the first scan, obtained higher
measurement values than sonologist 2. Third, it is our experience, after having performed
gynecological scans for more than 20 years, that the ultrasound morphology of both benign and
malignant adnexal masses remains constant over time, that benign adnexal lesions grow
slowly, and that malignant masses do not change appreciably in size even during 1 month of
observation. Therefore, we believe that the discrepancies between the two sonologists reflect
true inter-observer differences and not a change of the masses over time. A second limitation is
that we did not include estimation of the reproducibility of retrieving anamnestic information
(current hormonal therapy, personal history of ovarian cancer), the anamnestic information
collected by the second sonologist being used in all cases. It cannot be entirely excluded that
patients would answer differently when asked by different sonologists, or that sonologists
could interpret the answers of the patients differently. A third limitation is that we did not
estimate intra-observer reproducibility. We considered four scans (two per sonologist) likely to
be unacceptable to patients. For the same reason, only two sonologists were involved in this
study, and our results are generalizable only to sonologists with a similar level of experience.

The results of this live scanning study are similar to those of another study in which the
same sonologists assessed the same variables using 3D ultrasound volumes from adnexal
masses in another tumor population (15). The similarity in results between the two studies is
surprising, because the conditions when assessing 3D ultrasound volumes are different from
those during a live scan. When evaluating ultrasound volumes, sonologists are exposed to the
same ultrasound images, and so any interobserver difference should be explained exclusively
by differences in interpreting the ultrasound information. During a live scan there are more
sources of bias. This could result either in poorer or better interobserver agreement than when
3D ultrasound volumes are assessed: poorer because ultrasound examiners are likely to use
different machine settings and scanning conditions may change from one minute to another;
better because the dynamic nature of live scanning facilitates discrimination between solid
components and amorphous tissue.

Our results showed that two experienced sonologists agreed quite well in their classification
of masses as benign or malignant using the 10% risk of malignancy cutoff of LR1 and LR2,
and that the diagnostic performance of LR1 and LR2 with regard to discrimination between
benign and malignant tumors was similar for the two sonologists and similar to that reported
by others (14, 26-28). This is reassuring, because the main purpose of using models LR1 and
LR2 is to classify tumors as benign or malignant. Potentially, however, LR1 and LR2 can be
used not only to classify adnexal masses as benign or malignant but also to counsel a patient
about her individual risk of malignancy (13). To use the calculated risk for individual
counseling, one must be reasonably certain not only that the estimated risk agrees well with the
true risk (when externally validated, both LR1 and LR2 underestimated the true risk, especially
in the risk interval 30%-70% (14)), but also that the risk estimates are reproducible, i.e., that
different examiners will obtain similar risk estimates. Our results show that risk estimates may
differ substantially between experienced observers, the difference in estimated risk being >25.0
percentage units in 9% and 12% of cases when using LR1 and LR2, respectively. Interobserver
agreement above chance was poorest for those variables in the models that are heavily
dependent on subjective evaluation, i.e., color score, presence of color Doppler signals in
papillary projections, irregular cyst walls, and acoustic shadowing. Indeed, differences in these
explained most of the largest inter-observer differences in calculated risk of malignancy. In
models based on few variables, changing values in only one variable may result in large
differences in predicted risks, while a model with many variables is less vulnerable to a change
in one or even a few variables. Our results illustrate this (Supplementary Tables S2 and S3).
When using LR2 (which includes six variables), a change in value for one single categorical
variable explained an inter-observer difference in calculated risk >25 percentage units in eight
of 14 cases, while when using LR1 (which includes 12 variables), a change in value for one
single categorical variable explained an inter-observer difference in calculated risk >25
percentage units in only four of 11 cases. Acoustic shadowing is a strong variable in both LR1
and LR2 and has great impact on the calculated risk in LR2, which has only six variables. In our
hands, as well as in those of Ruiz de Gauna et al. (23), inter-observer agreement for acoustic
shadowing was at most moderate. The interobserver agreement for color score was only fair in
our study, and color score is an important variable in LR1.
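The sensitivity of LR2 to a single categorical variable can be illustrated with a minimal sketch. The coefficients are those given in the footnote to Table 1; the dictionary keys and the patient values are hypothetical and chosen only for illustration. The two "observers" record identical findings except for acoustic shadowing.

```python
import math

# LR2 coefficients from the footnote to Table 1; key names are hypothetical labels.
LR2_INTERCEPT = -5.3718
LR2_COEFFS = {
    "age_years": 0.0354,          # (c) age of the patient
    "ascites": 1.6159,            # (f) yes = 1, no = 0
    "papillary_flow": 1.1768,     # (g) blood flow in a papillary projection
    "max_solid_mm": 0.0697,       # (i) largest solid component, no increase above 50 mm
    "irregular_walls": 0.9586,    # (j) irregular internal cyst walls
    "acoustic_shadows": -2.9486,  # (k) yes = 1, no = 0
}


def lr2_risk(findings):
    """Return the LR2 risk of malignancy as a probability between 0 and 1."""
    values = dict(findings)
    values["max_solid_mm"] = min(values["max_solid_mm"], 50)
    z = LR2_INTERCEPT + sum(coef * values[name] for name, coef in LR2_COEFFS.items())
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical mass: both observers agree on everything except acoustic shadowing.
observer_1 = {
    "age_years": 55, "ascites": 0, "papillary_flow": 0,
    "max_solid_mm": 30, "irregular_walls": 1, "acoustic_shadows": 0,
}
observer_2 = dict(observer_1, acoustic_shadows=1)

risk_1 = lr2_risk(observer_1)
risk_2 = lr2_risk(observer_2)
difference_units = (risk_1 - risk_2) * 100  # difference in percentage units

print(f"Observer 1: {risk_1:.1%}, observer 2: {risk_2:.1%}, "
      f"difference: {difference_units:.1f} percentage units")
```

In this hypothetical case the disagreement on one yes/no variable moves the mass across the 10% cut-off and changes the estimated risk by more than 25 percentage units, which is the kind of discrepancy discussed above.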
To improve inter-observer reproducibility of calculated risks based on LR1 and LR2,
inter-observer differences in descriptions and measurements of adnexal masses using the IOTA
terminology and measurement technique need to be reduced. One way to achieve this could be by
providing courses on, and training in, how to examine and describe adnexal masses using the
IOTA terms. Interactive courses in which a large number of ultrasound images are discussed with
the course participants are likely to be very valuable in this respect. More precise definitions of
the IOTA terms, for example by providing ample imaging material, would probably also help
improve inter-observer agreement. Special attention should be given to the variables with poorest
reproducibility, i.e., the color score, wall irregularity, acoustic shadowing, and detection of blood
flow in papillary projections. Until better inter-observer agreement in the calculated risk of
malignancy using LR1 and LR2 has been shown, one should be cautious about using the risk
estimates for individual patient counselling.
Acknowledgements
None
References
1. Granberg S, Norström A, Wikland M. Tumors in the lower pelvis as imaged by vaginal
sonography. Gynecol Oncol 1990;37:224-9.
2. Benacerraf BR, Finkler NJ, Wojciechowski C, Knapp RC. Sonographic accuracy in the
diagnosis of ovarian masses. J Reprod Med 1990;35:491-5.
3. Valentin L. Pattern recognition of pelvic masses by gray-scale ultrasound imaging: the
contribution of Doppler ultrasound. Ultrasound Obstet Gynecol 1999;14:338-47.
4. Valentin L. Prospective cross-validation of Doppler ultrasound examination and
gray-scale ultrasound imaging for discrimination of benign and malignant pelvic masses.
Ultrasound Obstet Gynecol 1999;14:273-83.
5. Timmerman D, Schwärzler P, Collins WP, Claerhout F, Coenen M, Amant F, et al.
Subjective assessment of adnexal masses with the use of ultrasonography: an analysis
of interobserver variability and experience. Ultrasound Obstet Gynecol 1999;13:11-6.
6. Sokalska A, Timmerman D, Testa AC, Van Holsbeke C, Lissoni AA, Leone FPG, et al.
Diagnostic accuracy of transvaginal ultrasound examination for assigning a specific
diagnosis to adnexal masses. Ultrasound Obstet Gynecol 2009;34:462-70.
7. Valentin L. Use of morphology to characterize and manage common adnexal masses.
Best Pract Res Clin Obstet Gynaecol 2004;18:71-89.
8. Tailor A, Jurkovic D, Bourne TH, Collins WP, Campbell S. Sonographic prediction of
malignancy in adnexal masses using multivariate logistic regression analysis.
Ultrasound Obstet Gynecol 1997;10:41-7.
9. Timmerman D, Bourne TH, Tailor A, Collins WP, Verrelst H, Vandenberghe K, et al.
A comparison of methods for the preoperative discrimination between benign and
malignant adnexal masses: the development of a new logistic regression model. Am J
Obstet Gynecol 1999;181:57-65.
10. Alcazar JL, Jurado M. Prospective evaluation of logistic model based on sonographic
morphologic and color Doppler findings developed to predict adnexal malignancy. J
Ultrasound Med 1999;18:837-42.
11. Timmerman D, Valentin L, Bourne TH, Collins WP, Verrelst H, Vergote I. Terms,
definitions and measurements to describe the sonographic features of adnexal tumors: a
consensus opinion from the International Ovarian Tumor Analysis (IOTA) Group.
Ultrasound Obstet Gynecol 2000;16:500-5.
12. Timmerman D, Testa AC, Bourne T, Ferrazzi E, Ameye L, Konstantinovic ML, et al.
Logistic regression model to distinguish between the benign and malignant adnexal
mass before surgery: a multicenter study by the International Ovarian Tumor Analysis
Group. J Clin Oncol 2005;34:8794-801.
13. Kaijser J, Bourne T, Valentin L, Sayasneh A, Van Holsbeke C, Vergote I, et al.
Improving strategies for diagnosing ovarian cancer: a summary of the International
Ovarian Tumor Analysis (IOTA) studies. Ultrasound Obstet Gynecol 2013;41:9-20.
14. Timmerman D, Van Calster B, Testa AC, Guerriero S, Fischerova D, Lissoni AA, et al.
Ovarian cancer prediction in adnexal masses using ultrasound-based logistic regression
models: a temporal and external validation study by the IOTA group. Ultrasound Obstet
Gynecol 2010;36:226-34.
15. Sladkevicius P, Valentin L. Intra- and inter-observer agreement when describing
adnexal masses using the International Ovarian Tumour Analysis (IOTA) terms and
definitions: a study on three-dimensional (3D) ultrasound volumes. Ultrasound Obstet
Gynecol 2013;41:318-27.
16. Heintz APM, Odicino F, Maisonneuve P, Beller U, Benedet JL, Creasman WT, et al.
Carcinoma of the Ovary. 25th Annual Report on the Results of Treatment in
Gynecological Cancer. Int J Gynecol Obstet 2003;83:S135-S166 (suppl 1).
17. Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas 1960;20:37-46.
18. Kundel HL, Polansky M. Measurement of observer agreement. Radiology 2003;228:303-8.
19. Brennan P, Silman A. Statistical methods for assessing observer variability in clinical
measures. BMJ 1992;304:1491-4.
20. Bland JM, Altman DG. Statistical methods for assessing agreement between two
methods of clinical measurement. Lancet 1986;1:307-10.
21. Bartlett JW, Frost C. Reliability, repeatability and reproducibility: analysis of
measurement errors in continuous variables. Ultrasound Obstet Gynecol 2008;31:466-75.
22. Kottner J, Audigé L, Brorson S, Donner A, Gajewski BJ, Hróbjartsson A, et al.
Guidelines for Reporting Reliability and Agreement Studies (GRRAS) were proposed. J
Clin Epidemiol 2011;64:96-106.
23. Ruiz de Gauna B, Sanchez P, Pineda L, Utrilla-Layna J, Juez L, Alcázar JL.
Inter-observer agreement with regard to describing adnexal masses using the IOTA simple
rules in a real-time setting and when using three-dimensional ultrasound volumes and
digital clips. Ultrasound Obstet Gynecol 2014;44:95-100.
24. Timmerman D, Testa AC, Bourne T, Ameye L, Jurkovic D, Van Holsbeke C, et al.
Simple ultrasound-based rules for the diagnosis of ovarian cancer. Ultrasound Obstet
Gynecol 2008;31:681-90.
25. Timmerman D, Ameye L, Fischerova D, Epstein E, Melis GB, Guerriero S, et al.
Simple ultrasound rules to distinguish between benign and malignant adnexal masses
before surgery: prospective validation by IOTA group. BMJ 2010;341:c6839.
26. Van Holsbeke C, Van Calster B, Testa AC, Domali E, Lu C, Van Huffel S, et al.
Prospective internal validation of mathematical models to predict malignancy in
adnexal masses: results from the international ovarian tumor analysis study. Clin Cancer
Res 2009;15:684-91.
27. Nunes N, Yazbek J, Ambler G, Hoo W, Naftalin J, Jurkovic D. Prospective evaluation
of the IOTA logistic regression model LR2 for the diagnosis of ovarian cancer.
Ultrasound Obstet Gynecol 2012;40:355-9.
28. Nunes N, Ambler G, Hoo WL, Naftalin J, Foo X, Widschwendter M, et al.
A prospective validation of the IOTA logistic regression models (LR1 and LR2) in
comparison to subjective pattern recognition for the diagnosis of ovarian cancer.
Int J Gynecol Cancer 2013;23:1583-9.
Table 1. Variables for which inter-observer reproducibility was estimated
___________________________________________________________________________
Variables included in logistic regression models LR1 and LR2, or of importance for these models
  Maximum diameter of adnexal mass, mm
  Maximum diameter of largest solid component, mm
  Maximum diameter of largest solid component: ≤50, >50 mm
  Presence of an entirely solid adnexal mass: yes, no
  Papillary projections present: yes, no
  Irregular internal cyst walls: yes, no
  Acoustic shadows: yes, no
  Color Doppler signals in papillary projection: yes, no
  Color score: 1, 2, 3, 4
  Tenderness of adnexal mass at scan: yes, no
  Ascites: yes, no
Other IOTA variables
  Continuous IOTA variables
    Mean diameter of adnexal mass, mm
    Mean diameter of largest solid component, mm
    Height of the largest papillary projection, mm
  Categorical IOTA variables
    Type of tumor: unilocular, unilocular solid, multilocular, multilocular solid, solid
    Mean diameter of adnexal mass: ≤40, 41-60, 61-80, 81-100, and >100 mm
    Number of cyst locules: 0, 1, 2, 3, 4, 5, 6-10, 11-20, and >20; ≤10, >10†
    Septum present: yes, no
    Incomplete septum present: yes, no
    Solid component: yes, no
    Papillary projections: 0, 1, 2, 3, ≥4; 1, 2, 3, ≥4‡; <4, >4†
    Echogenicity of cyst fluid: anechoic, low level, ground glass, mixed, no cyst fluid
    Color Doppler signals detectable: yes, no
    Ovarian crescent sign: yes, no
Diagnosis based on LR1 and LR2
  Calculated risk of malignancy using LR1: ≤10%, >10%
  Calculated risk of malignancy using LR2: ≤10%, >10%
___________________________________________________________________________
IOTA, international ovarian tumor analysis group; LR1, logistic regression model 1; LR2, logistic regression model 2.
The risk of malignancy using LR1 is derived as $y = 1/(1 + e^{-z})$, where
$z = -6.7468 + 1.5985\,(a) - 0.9983\,(b) + 0.0326\,(c) + 0.00841\,(d) - 0.8577\,(e) + 1.5513\,(f) + 1.1737\,(g) + 0.9281\,(h) + 0.0496\,(i) + 1.1421\,(j) - 2.3550\,(k) + 0.4916\,(l)$
and $e$ is the mathematical constant and base value of natural logarithms. The 12 variables included in LR1 are:
  (a) personal history of ovarian cancer (yes = 1, no = 0)
  (b) current hormonal therapy (yes = 1, no = 0)
  (c) age of the patient (in years)
  (d) maximum diameter of the lesion (in millimeters)
  (e) mass tender at scan (yes = 1, no = 0)
  (f) the presence of ascites (yes = 1, no = 0)
  (g) the presence of blood flow within a solid papillary projection (yes = 1, no = 0)
  (h) the presence of a purely solid tumor (yes = 1, no = 0)
  (i) maximal diameter of the solid component (expressed in millimeters, but with no increase > 50 mm)
  (j) irregular internal cyst walls (yes = 1, no = 0)
  (k) the presence of acoustic shadows (yes = 1, no = 0)
  (l) color score (1, 2, 3, or 4)
The risk of malignancy using LR2 is derived as $y = 1/(1 + e^{-z})$, where
$z = -5.3718 + 0.0354\,(c) + 1.6159\,(f) + 1.1768\,(g) + 0.0697\,(i) + 0.9586\,(j) - 2.9486\,(k)$ (12).
A calculated risk of malignancy of more than 10% classified the adnexal mass as malignant.
† includes 0. ‡ includes only cases where papillary projections were registered at both examinations.
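The two formulas in the footnote to Table 1 can be turned into a short calculation. The following is a minimal sketch, not the authors' software: the coefficients are copied from the footnote, while the dictionary keys, the function names, and the example findings are hypothetical and introduced only for illustration.

```python
import math

# Coefficients from the footnote to Table 1. Key names are hypothetical labels
# for the variables the paper calls (a) to (l).
LR1 = {
    "intercept": -6.7468,
    "history_ovarian_cancer": 1.5985,  # (a) yes = 1, no = 0
    "hormonal_therapy": -0.9983,       # (b) yes = 1, no = 0
    "age_years": 0.0326,               # (c)
    "max_lesion_mm": 0.00841,          # (d)
    "tender_at_scan": -0.8577,         # (e) yes = 1, no = 0
    "ascites": 1.5513,                 # (f) yes = 1, no = 0
    "papillary_flow": 1.1737,          # (g) yes = 1, no = 0
    "purely_solid": 0.9281,            # (h) yes = 1, no = 0
    "max_solid_mm": 0.0496,            # (i) no increase above 50 mm
    "irregular_walls": 1.1421,         # (j) yes = 1, no = 0
    "acoustic_shadows": -2.3550,       # (k) yes = 1, no = 0
    "color_score": 0.4916,             # (l) 1, 2, 3 or 4
}
LR2 = {
    "intercept": -5.3718,
    "age_years": 0.0354,
    "ascites": 1.6159,
    "papillary_flow": 1.1768,
    "max_solid_mm": 0.0697,
    "irregular_walls": 0.9586,
    "acoustic_shadows": -2.9486,
}


def risk_of_malignancy(model, findings):
    """Apply y = 1 / (1 + e^-z) for the given model; returns a value in (0, 1)."""
    values = dict(findings)
    # Variable (i) is defined "with no increase > 50 mm", so cap it here.
    values["max_solid_mm"] = min(values["max_solid_mm"], 50)
    z = model["intercept"] + sum(
        coef * values[name] for name, coef in model.items() if name != "intercept"
    )
    return 1.0 / (1.0 + math.exp(-z))


def classify(risk, cutoff=0.10):
    """A risk above the 10% cut-off classifies the mass as malignant."""
    return "malignant" if risk > cutoff else "benign"


# Hypothetical findings for one mass (not data from the study).
example = {
    "history_ovarian_cancer": 0, "hormonal_therapy": 0, "age_years": 55,
    "max_lesion_mm": 80, "tender_at_scan": 0, "ascites": 0, "papillary_flow": 0,
    "purely_solid": 0, "max_solid_mm": 30, "irregular_walls": 1,
    "acoustic_shadows": 0, "color_score": 2,
}

risk_lr1 = risk_of_malignancy(LR1, example)
risk_lr2 = risk_of_malignancy(LR2, example)
print(f"LR1: {risk_lr1:.1%} ({classify(risk_lr1)})")
print(f"LR2: {risk_lr2:.1%} ({classify(risk_lr2)})")
```

LR2 reads only the six variables it needs from the same dictionary, so one set of recorded findings can be passed to both models.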
Table 2. Histological diagnoses of the masses
___________________________________________________________________________
Histological diagnosis of tumor                     Number
___________________________________________________________________________
Benign tumors                                           94
  Benign simple cyst                                     7
  Endometrioma                                          10
  Dermoid cyst                                          16
  Serous cystadenoma                                    16
  Mucinous cystadenoma                                  18
  Myoma/fibroma                                          9
  Cystadenofibroma                                      11
  Paraovarian cyst                                       5
  Sactosalpinx, chronic salpingitis                      1
  Leydig cell tumor                                      1
Borderline tumors                                        4
  Serous                                                 2
  Mucinous                                               1
  Endometrioid                                           1
Invasive malignancy                                     19
  Primary ovarian adenocarcinoma                        13
  Granulosa cell tumor                                   3
  Dysgerminoma                                           1
  Leiomyosarcoma                                         1
  Malignant aggressive B-cell lymphoma                   1
___________________________________________________________________________
Table 3. Inter-observer reproducibility of continuous variables used in the risk calculation models LR1 and LR2 and of other continuous variables used to describe adnexal masses
___________________________________________________________________________
                                                      Measurement results       Difference in mm between two measurements           Intra-CC
                                                      (both sonologists)        made by sonologists 1 and 2^a
Parameter                                             Median (n)    Range       Mean (n)       95% CI          Limits of agreement  Point estimate (95% CI)
___________________________________________________________________________
Variables used in models LR1 and LR2
  Maximum diameter of adnexal mass, mm                70 (n=234)    10 – 313    3.80 (n=117)   1.12 – 6.48     -25.24 – 32.82       0.958 (0.937 – 0.971)
  Maximum diameter of largest solid component, mm^b   29.50 (n=122) 5 – 180     1.92 (n=61)    -1.74 – 5.58    -26.66 – 30.50       0.942 (0.905 – 0.964)
Other variables used to describe adnexal mass
  Mean diameter of adnexal mass, mm                   58.5 (n=234)  9 – 240     1.05 (n=117)   -0.15 – 1.95    -8.61 – 10.72        0.971 (0.958 – 0.980)
  Mean diameter of largest solid component, mm^b      22 (n=122)    4 – 156     0.59 (n=61)    -1.82 – 2.98    -18.16 – 19.32       0.962 (0.937 – 0.977)
  Height of largest papillary projection, mm^c        8 (n=42)      3 – 25      -0.51 (n=21)   -2.93 – 1.91    -11.61 – 10.59       0.609 (0.245 – 0.821)
___________________________________________________________________________
CI, confidence interval; Intra-CC, intra-class correlation coefficient.
a The measurements made by sonologist 2 were subtracted from the measurements made by sonologist 1.
b Includes only cases where solid components were registered by both observers.
c Includes only cases where papillary projections were registered by both observers. Using logarithmically transformed data, the results were as
follows: mean difference 1.03 (95% CI, 0.78 – 1.36); limits of agreement 0.31 – 3.61; Intra-CC 0.547 (0.153 – 0.789). These results can be used for
comparison with our published results on inter-observer agreement when analyzing ultrasound volumes obtained by three-dimensional
ultrasound (15).
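For readers who want to see how a mean difference, its confidence interval, and limits of agreement of the kind shown in Table 3 are usually obtained, here is a minimal sketch following the approach of Bland and Altman (20). It assumes the conventional definitions (limits of agreement = mean difference ± 1.96 SD of the differences; 95% CI = mean difference ± 1.96 SE), and the tabulated values appear numerically consistent with them, but the exact computation used in the study is not restated in this section. The function name and the measurements are hypothetical.

```python
import math
import statistics


def limits_of_agreement(first_observer, second_observer):
    """Summarize paired differences (first minus second) in the Bland-Altman style."""
    diffs = [a - b for a, b in zip(first_observer, second_observer)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)            # sample SD of the differences
    se_mean = sd_diff / math.sqrt(len(diffs))    # standard error of the mean difference
    return {
        "mean": mean_diff,
        "sd": sd_diff,
        "ci95": (mean_diff - 1.96 * se_mean, mean_diff + 1.96 * se_mean),
        "loa": (mean_diff - 1.96 * sd_diff, mean_diff + 1.96 * sd_diff),
    }


# Hypothetical maximum diameters (mm) of six masses measured by two sonologists.
sonologist_1 = [70, 45, 120, 33, 88, 61]
sonologist_2 = [66, 47, 110, 35, 80, 60]

summary = limits_of_agreement(sonologist_1, sonologist_2)
print(f"Mean difference: {summary['mean']:.2f} mm")
print(f"95% CI: {summary['ci95'][0]:.2f} to {summary['ci95'][1]:.2f} mm")
print(f"Limits of agreement: {summary['loa'][0]:.2f} to {summary['loa'][1]:.2f} mm")
```

The confidence interval describes the uncertainty of the average difference between observers, whereas the limits of agreement describe how far apart two measurements of the same mass can be expected to lie; the latter is the quantity referred to as "wide" in the Discussion.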
Table 4. Inter-observer agreement for categorical variables included in the risk calculation models LR1 and LR2 and for other categorical variables used to describe adnexal masses
____________________________________________________________________________________________
Variable                                                                                Agreement         Kappa value
____________________________________________________________________________________________
Variables used in models LR1 and LR2
  Presence of entirely solid tumor (yes, no)                                            97% (114/117)     0.91
  Maximum diameter of largest solid component (≤50 mm, >50 mm)                          95% (111/117)     0.83
  Maximum diameter of largest solid component (≤50 mm, >50 mm)^a                        92% (56/61)       0.82
  Tenderness of adnexal mass (yes, no)                                                  97% (113/117)     0.78
  Ascites (yes, no)                                                                     97% (113/117)     0.73
  Papillary projections (yes, no)                                                       87% (102/117)     0.65
  Solid component (yes, no)                                                             84% (98/117)      0.67
  Irregular cyst wall (yes, no)                                                         79% (92/117)      0.56
  Acoustic shadows (yes, no)                                                            85% (99/117)      0.58
  Color Doppler signals in papillary projections (yes, no)^b                            90% (105/117)     0.48
  Color Doppler signals in papillary projections (yes, no)^c                            86% (18/21)       0.71
  Color score (1, 2, 3, 4)                                                              40% (47/117)      0.36^d
Other variables used to describe adnexal masses
  Tumor type (unilocular, unilocular solid, multilocular, multilocular solid, solid)    97% (113/117)     0.70
  Mean diameter of tumor (mm): ≤40, 41-60, 61-80, 81-100, >100                          66% (77/117)      0.76^d
  Number of locules: 0, 1, 2, 3, 4, 5, 6-10, 11-20, >20                                 58% (68/117)      0.79^d
  Number of locules: ≤10, >10^e                                                         94% (110/117)     0.80
  Septum (yes, no)                                                                      88% (103/117)     0.76
  Incomplete septum (yes, no)                                                           98% (115/117)     ¶
  Number of papillary projections: 0, 1, 2, 3, ≥4                                       81% (95/117)      0.64^d
  Number of papillary projections: 1, 2, 3, ≥4^c                                        67% (14/21)       0.63^d
  Number of papillary projections: <4, >4^e                                             96% (112/117)     0.68
  Echogenicity of cyst fluid (anechoic, low level, ground glass, mixed, no fluid)       67% (78/117)      0.56
  Color Doppler signals detectable (yes, no)                                            84% (98/117)      0.30
  Ovarian crescent sign (yes, no)                                                       78% (91/117)      0.49
  Fluid in Douglas pouch (yes, no)                                                      87% (102/117)     0.72
____________________________________________________________________________________________
a Includes only cases where solid components were registered at both examinations.
b This variable reflects disagreement both with regard to the presence of papillary projections and the presence of blood flow in papillary
projections (i.e., if one observer recorded the presence of papillary projections and the other did not, there was disagreement with regard to this
variable, and if both observers recorded the presence of papillary projections but disagreed with regard to the presence of blood flow in the
papillary projections, there was disagreement with regard to this variable).
c Includes only cases where papillary projections were registered at both examinations.
d Weighted Kappa index.
e Includes 0.
¶ Absent kappa values are explained by asymmetric field tables, making it impossible to calculate them.
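Table 4 reports both percentage agreement and Kappa, and the two can diverge considerably (for example, 90% agreement with a Kappa of 0.48 for color Doppler signals in papillary projections). The sketch below computes unweighted Cohen's kappa (17) from a cross-tabulation to show why. The function name and both tables are hypothetical; the cell counts behind Table 4 are not given in this section, and the weighted Kappa index used for the ordinal variables (footnote d) is not covered here.

```python
def agreement_and_kappa(table):
    """Return (observed agreement, unweighted Cohen's kappa).

    table[i][j] is the number of masses placed in category i by observer 1
    and in category j by observer 2.
    """
    size = len(table)
    total = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(size)) / total
    row_totals = [sum(table[i]) for i in range(size)]
    col_totals = [sum(table[i][j] for i in range(size)) for j in range(size)]
    # Agreement expected by chance if the two observers rated independently.
    expected = sum(row_totals[i] * col_totals[i] for i in range(size)) / total ** 2
    return observed, (observed - expected) / (1 - expected)


# Two hypothetical 2x2 tables (rows: observer 1 no/yes, columns: observer 2 no/yes),
# both with 90 agreements out of 100 masses.
rare_finding = [[85, 5], [5, 5]]       # the finding is recorded in about 10% of masses
common_finding = [[45, 5], [5, 45]]    # the finding is recorded in about 50% of masses

agreement_rare, kappa_rare = agreement_and_kappa(rare_finding)
agreement_common, kappa_common = agreement_and_kappa(common_finding)
print(f"Rare finding:   agreement {agreement_rare:.0%}, kappa {kappa_rare:.2f}")
print(f"Common finding: agreement {agreement_common:.0%}, kappa {kappa_common:.2f}")
```

With identical raw agreement, Kappa is lower when one category dominates, because more of the agreement is then expected by chance. This is one reason why variables that are present in few masses can show high percentage agreement but only moderate Kappa.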
Table 5. Inter-observer reproducibility of the risk of malignancy calculated using the risk calculation models LR1 and LR2
___________________________________________________________________________
                                          Calculated risk of malignancy  Difference between the risk calculated              Intra-CC
                                          (both sonologists)             by sonologist 1 and 2
Parameter                                 Median (n)     Range           Mean (n)       95% CI          Limits of agreement  Point estimate (95% CI)
___________________________________________________________________________
Risk of malignancy calculated using LR1   7.85 (n=234)   0.10 – 99.10    -0.53 (n=117)  -3.07 – 2.01    -28.05 – 26.99       0.911 (0.874 – 0.937)
Risk of malignancy calculated using LR2   6.65 (n=234)   0.10 – 98.40    0.02 (n=117)   -3.06 – 3.10    -33.22 – 33.26       0.832 (0.766 – 0.880)
___________________________________________________________________________
CI, confidence interval; Intra-CC, intra-class correlation coefficient.
Legends for figure
Figure 1. a) Scatterplot showing the relationship between inter-observer difference (observer
1 minus observer 2) in calculated risk and magnitude of calculated risk when using logistic
regression model LR1. The plot manifests a diamond shape, the differences being smallest for
the lowest and highest risks. For risks < 25 and > 95, the differences are very small.
LOA, limits of agreement. b) Scatterplot showing the relationship between inter-observer
difference in calculated risk and magnitude of calculated risk when using logistic regression
model LR2. The plot manifests a diamond shape, the differences being smallest for the lowest
and highest risks.
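One property of the logistic transform used by LR1 and LR2 may contribute to the diamond shape described in the legend. This is offered as a plausible explanation under the model formula given with Table 1, not as an analysis performed in the study. Differentiating the risk with respect to the linear predictor gives:

```latex
y = \frac{1}{1 + e^{-z}}, \qquad
\frac{dy}{dz} = \frac{e^{-z}}{\left(1 + e^{-z}\right)^{2}} = y\,(1 - y),
\qquad\text{so}\qquad
\Delta y \approx y\,(1 - y)\,\Delta z .
```

The factor $y(1-y)$ equals 0.25 at a risk of 50% but only about 0.02 at a risk of 2% or 98%. The same inter-observer difference in the linear predictor $z$ therefore translates into a much smaller difference in calculated risk at the extremes than in the middle of the risk range.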
Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906
Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906
Published OnlineFirst November 25 2014Clin Cancer Res Povilas Sladkevicius and Lil Valentin malignancy using logistic regression modelsappearance of adnexal masses and in calculating the risk of Inter-observer agreement in describing the ultrasound
Updated version
1011581078-0432CCR-14-0906doi
Access the most recent version of this article at
Material
Supplementary
httpclincancerresaacrjournalsorgcontentsuppl201411261078-0432CCR-14-0906DC1
Access the most recent supplemental material at
Manuscript
Authoredited Author manuscripts have been peer reviewed and accepted for publication but have not yet been
E-mail alerts related to this article or journalSign up to receive free email-alerts
Subscriptions
Reprints and
pubsaacrorgDepartment at
To order reprints of this article or to subscribe to the journal contact the AACR Publications
Permissions
Rightslink site Click on Request Permissions which will take you to the Copyright Clearance Centers (CCC)
httpclincancerresaacrjournalsorgcontentearly201411251078-0432CCR-14-0906To request permission to re-use all or part of this article use this link
Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906
agreement in ultrasound assessment of ovarian masses 12
12
theoretically, the inter-observer differences could be explained by the lesions having changed
in size or morphology between the scans. We find this highly unlikely for the following
reasons. First, there was no relationship between the differences in measurement results and
the number of days between the scans (Supplementary Fig. S1-S5). Nor was there a clear
tendency for inter-observer agreement for discrete variables to depend on the time between the
scans (Supplementary Table S1). Second, one would expect a lesion and its components to
increase in size with time, but sonologist 1, who performed the first scan, obtained higher
measurement values than sonologist 2. Third, it is our experience, after having performed
gynecological scans for more than 20 years, that the ultrasound morphology of both benign and
malignant adnexal masses remains constant over time, that benign adnexal lesions grow
slowly, and that malignant masses do not change appreciably in size even during 1 month of
observation. Therefore, we believe that the discrepancies between the two sonologists reflect
true inter-observer differences and not a change of the masses over time. A second limitation is
that we did not estimate the reproducibility of retrieving anamnestic information
(current hormonal therapy, personal history of ovarian cancer); the anamnestic information
collected by the second sonologist was used in all cases. It cannot be entirely excluded that
patients would answer differently when asked by different sonologists, or that sonologists
could interpret the answers of the patients differently. A third limitation is that we did not
estimate intra-observer reproducibility. We considered four scans (two per sonologist) likely to
be unacceptable to patients. For the same reason, only two sonologists were involved in this
study, and our results are generalizable only to sonologists with a similar level of experience.
The results of this live scanning study are similar to those of another study in which the
same sonologists assessed the same variables using 3D ultrasound volumes from adnexal
masses in another tumor population (15). The similarity in results between the two studies is
surprising, because the conditions when assessing 3D ultrasound volumes are different from
those during a live scan. When evaluating ultrasound volumes, sonologists are exposed to the
same ultrasound images, so any inter-observer difference should be explained exclusively
by differences in interpreting the ultrasound information. During a live scan, there are more
sources of bias. This could result in either poorer or better inter-observer agreement than when
3D ultrasound volumes are assessed: poorer because ultrasound examiners are likely to use
different machine settings and scanning conditions may change from one minute to another,
better because the dynamic nature of live scanning facilitates discrimination between solid
components and amorphous tissue.
Our results showed that two experienced sonologists agreed quite well in their classification
of masses as benign or malignant using the 10% risk of malignancy cutoff of LR1 and LR2,
and that the diagnostic performance of LR1 and LR2 with regard to discrimination between
benign and malignant tumors was similar for the two sonologists and similar to that reported
by others (14, 26-28). This is reassuring, because the main purpose of using models LR1 and
LR2 is to classify tumors as benign or malignant. Potentially, however, LR1 and LR2 can be
used not only to classify adnexal masses as benign or malignant but also to counsel a patient
about her individual risk of malignancy (13). To use the calculated risk for individual
counseling, one must be reasonably certain not only that the estimated risk agrees well with the
true risk (when externally validated, both LR1 and LR2 underestimated the true risk, especially
in the risk interval 30-70% (14)) but also that the risk estimates are reproducible, i.e., that
different examiners will obtain similar risk estimates. Our results show that risk estimates may
differ substantially between experienced observers, the difference in estimated risk being >25.0
percentage units in 9% and 12% of cases when using LR1 and LR2, respectively. Inter-observer
agreement above chance was poorest for those variables in the models that are heavily
dependent on subjective evaluation, i.e., color score, presence of color Doppler signals in
papillary projections, irregular cyst walls, and acoustic shadowing. Indeed, differences in these variables
explained most of the largest inter-observer differences in calculated risk of malignancy. In
models based on few variables, a changed value in only one variable may result in large
differences in predicted risks, while a model with many variables is less vulnerable to a change
in one or even a few variables. Our results illustrate this (Supplementary Tables S2 and S3).
When using LR2 (which includes six variables), a change in value for one single categorical
variable explained an inter-observer difference in calculated risk >25 percentage units in eight
of 14 cases, while when using LR1 (which includes 12 variables), a change in value for one
single categorical variable explained an inter-observer difference in calculated risk >25
percentage units in only four of 11 cases. Acoustic shadowing is a strong variable in both LR1
and LR2 and has great impact on the calculated risk in LR2, which has only six variables. In our
hands, as well as in those of Ruiz de Gauna et al. (23), inter-observer agreement for acoustic
shadowing was at most moderate. The inter-observer agreement for color score was only fair in
our study, and color score is an important variable in LR1.
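The sensitivity of the calculated risk to a single recoded variable can be illustrated numerically. The sketch below is not part of the study. It assumes a hypothetical baseline risk of 5% with acoustic shadows recorded, and shifts the linear predictor z by the magnitude of the acoustic shadowing coefficient given in the footnote to Table 1 (2.9486 for LR2, 2.3550 for LR1), as would happen if the other observer recorded no shadows and all other variables were unchanged. The function names are illustrative only.

```python
import math


def logit(p):
    """Log-odds of a probability p (0 < p < 1)."""
    return math.log(p / (1.0 - p))


def sigmoid(z):
    """Inverse of logit: y = 1 / (1 + e^-z), as in the Table 1 footnote."""
    return 1.0 / (1.0 + math.exp(-z))


def risk_after_shift(baseline_risk, delta_z):
    """Risk obtained when the linear predictor z changes by delta_z."""
    return sigmoid(logit(baseline_risk) + delta_z)


# Hypothetical baseline: 5% risk with acoustic shadows recorded as present.
baseline = 0.05

# Removing the shadow term raises z by the magnitude of its coefficient.
lr2_without_shadows = risk_after_shift(baseline, 2.9486)  # LR2 coefficient (k)
lr1_without_shadows = risk_after_shift(baseline, 2.3550)  # LR1 coefficient (k)

print(f"LR2 coefficient: {baseline:.1%} -> {lr2_without_shadows:.1%}")
print(f"LR1 coefficient: {baseline:.1%} -> {lr1_without_shadows:.1%}")
```

Under these assumptions the risk rises from 5% to roughly 50% with the LR2 coefficient and to roughly 36% with the LR1 coefficient, so one disagreement on acoustic shadowing is enough to produce a difference well above 25 percentage units.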
To improve inter-observer reproducibility of calculated risks based on LR1 and LR2,
inter-observer differences in descriptions and measurements of adnexal masses using the IOTA
terminology and measurement technique need to be reduced. One way to achieve this could be to
provide courses on and training in how to examine and describe adnexal masses using the
IOTA terms. Interactive courses in which a large number of ultrasound images are discussed with
the course participants are likely to be very valuable in this respect. More precise definitions of
the IOTA terms, for example by providing ample imaging material, would probably also help
improve inter-observer agreement. Special attention should be given to the variables with poorest
reproducibility, i.e., the color score, wall irregularity, acoustic shadowing, and detection of blood
flow in papillary projections. Until better inter-observer agreement in the calculated risk of
malignancy using LR1 and LR2 has been shown, one should be cautious about using the risk
estimates for individual patient counseling.
Acknowledgements
None
References
1. Granberg S, Norström A, Wikland M. Tumors in the lower pelvis as imaged by vaginal sonography. Gynecol Oncol 1990;37:224-9.
2. Benacerraf BR, Finkler NJ, Wojciechowski C, Knapp RC. Sonographic accuracy in the diagnosis of ovarian masses. J Reprod Med 1990;35:491-5.
3. Valentin L. Pattern recognition of pelvic masses by gray-scale ultrasound imaging: the contribution of Doppler ultrasound. Ultrasound Obstet Gynecol 1999;14:338-47.
4. Valentin L. Prospective cross-validation of Doppler ultrasound examination and gray-scale ultrasound imaging for discrimination of benign and malignant pelvic masses. Ultrasound Obstet Gynecol 1999;14:273-83.
5. Timmerman D, Schwärzler P, Collins WP, Claerhout F, Coenen M, Amant F, et al. Subjective assessment of adnexal masses with the use of ultrasonography: an analysis of interobserver variability and experience. Ultrasound Obstet Gynecol 1999;13:11-6.
6. Sokalska A, Timmerman D, Testa AC, Van Holsbeke C, Lissoni AA, Leone FPG, et al. Diagnostic accuracy of transvaginal ultrasound examination for assigning a specific diagnosis to adnexal masses. Ultrasound Obstet Gynecol 2009;34:462-70.
7. Valentin L. Use of morphology to characterize and manage common adnexal masses. Best Pract Res Clin Obstet Gynaecol 2004;18:71-89.
8. Tailor A, Jurkovic D, Bourne TH, Collins WP, Campbell S. Sonographic prediction of malignancy in adnexal masses using multivariate logistic regression analysis. Ultrasound Obstet Gynecol 1997;10:41-7.
9. Timmerman D, Bourne TH, Tailor A, Collins WP, Verrelst H, Vandenberghe K, et al. A comparison of methods for the preoperative discrimination between benign and malignant adnexal masses: the development of a new logistic regression model. Am J Obstet Gynecol 1999;181:57-65.
10. Alcazar JL, Jurado M. Prospective evaluation of logistic model based on sonographic morphologic and color Doppler findings developed to predict adnexal malignancy. J Ultrasound Med 1999;18:837-42.
11. Timmerman D, Valentin L, Bourne TH, Collins WP, Verrelst H, Vergote I. Terms, definitions and measurements to describe the sonographic features of adnexal tumors: a consensus opinion from the International Ovarian Tumor Analysis (IOTA) Group. Ultrasound Obstet Gynecol 2000;16:500-5.
12. Timmerman D, Testa AC, Bourne T, Ferrazzi E, Ameye L, Konstantinovic ML, et al. Logistic regression model to distinguish between the benign and malignant adnexal mass before surgery: a multicenter study by the International Ovarian Tumor Analysis Group. J Clin Oncol 2005;34:8794-801.
13. Kaijser J, Bourne T, Valentin L, Sayasneh A, Van Holsbeke C, Vergote I, et al. Improving strategies for diagnosing ovarian cancer: a summary of the International Ovarian Tumor Analysis (IOTA) studies. Ultrasound Obstet Gynecol 2013;41:9-20.
14. Timmerman D, Van Calster B, Testa AC, Guerriero S, Fischerova D, Lissoni AA, et al. Ovarian cancer prediction in adnexal masses using ultrasound-based logistic regression models: a temporal and external validation study by the IOTA group. Ultrasound Obstet Gynecol 2010;36:226-34.
15. Sladkevicius P, Valentin L. Intra- and inter-observer agreement when describing adnexal masses using the International Ovarian Tumour Analysis (IOTA) terms and definitions: a study on three-dimensional (3D) ultrasound volumes. Ultrasound Obstet Gynecol 2013;41:318-27.
16. Heintz APM, Odicino F, Maisonneuve P, Beller U, Benedet JL, Creasman WT, et al. Carcinoma of the Ovary. 25th Annual Report on the Results of Treatment in Gynecological Cancer. Int J Gynecol Obstet 2003;83:S135-S166 (suppl 1).
17. Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas 1960;20:37–46.
18. Kundel HL, Polansky M. Measurement of observer agreement. Radiology 2003;228:303-8.
19. Brennan P, Silman A. Statistical methods for assessing observer variability in clinical measures. BMJ 1992;304:1491-4.
20. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986;1:307-10.
21. Bartlett JW, Frost C. Reliability, repeatability and reproducibility: analysis of measurement errors in continuous variables. Ultrasound Obstet Gynecol 2008;31:466-75.
22. Kottner J, Audigé L, Brorson S, Donner A, Gajewski BJ, Hróbjartsson A, et al. Guidelines for Reporting Reliability and Agreement Studies (GRRAS) were proposed. J Clin Epidemiol 2011;64:96-106.
23. Ruiz de Gauna B, Sanchez P, Pineda L, Utrilla-Layna J, Juez L, Alcázar JL. Inter-observer agreement with regard to describing adnexal masses using the IOTA simple rules in a real-time setting and when using three-dimensional ultrasound volumes and digital clips. Ultrasound Obstet Gynecol 2014;44:95-100.
24. Timmerman D, Testa AC, Bourne T, Ameye L, Jurkovic D, Van Holsbeke C, et al. Simple ultrasound-based rules for the diagnosis of ovarian cancer. Ultrasound Obstet Gynecol 2008;31:681-90.
25. Timmerman D, Ameye L, Fischerova D, Epstein E, Melis GB, Guerriero S, et al. Simple ultrasound rules to distinguish between benign and malignant adnexal masses before surgery: prospective validation by IOTA group. BMJ 2010;341:c6839.
26. Van Holsbeke C, Van Calster B, Testa AC, Domali E, Lu C, Van Huffel S, et al. Prospective internal validation of mathematical models to predict malignancy in adnexal masses: results from the international ovarian tumor analysis study. Clin Cancer Res 2009;15:684-91.
27. Nunes N, Yazbek J, Ambler G, Hoo W, Naftalin J, Jurkovic D. Prospective evaluation of the IOTA logistic regression model LR2 for the diagnosis of ovarian cancer. Ultrasound Obstet Gynecol 2012;40:355-9.
28. Nunes N, Ambler G, Hoo WL, Naftalin J, Foo X, Widschwendter M, et al. A prospective validation of the IOTA logistic regression models (LR1 and LR2) in comparison to subjective pattern recognition for the diagnosis of ovarian cancer. Int J Gynecol Cancer 2013;23:1583-9.
Table 1. Variables for which inter-observer reproducibility was estimated
___________________________________________________________________________
Variables included in logistic regression models LR1 and LR2 or of importance for these models
  Maximum diameter of adnexal mass: mm
  Maximum diameter of largest solid component: mm
  Maximum diameter of largest solid component: ≤50, >50 mm
  Presence of an entirely solid adnexal mass: yes, no
  Papillary projections present: yes, no
  Irregular internal cyst walls: yes, no
  Acoustic shadows: yes, no
  Color Doppler signals in papillary projection: yes, no
  Color score: 1, 2, 3, 4
  Tenderness of adnexal mass at scan: yes, no
  Ascites: yes, no
Other IOTA variables
  Continuous IOTA variables
    Mean diameter of adnexal mass: mm
    Mean diameter of largest solid component: mm
    Height of the largest papillary projection: mm
  Categorical IOTA variables
    Type of tumor: unilocular, unilocular solid, multilocular, multilocular solid, solid
    Mean diameter of adnexal mass: ≤40, 41-60, 61-80, 81-100 and >100 mm
    Number of cyst locules: 0, 1, 2, 3, 4, 5, 6-10, 11-20 and >20; ≤10, >10†
    Septum present: yes, no
    Incomplete septum present: yes, no
    Solid component: yes, no
    Papillary projections: 0, 1, 2, 3, ≥4; 1, 2, 3, ≥4‡; <4, >4†
    Echogenicity of cyst fluid: anechoic, low level, ground glass, mixed, no cyst fluid
    Color Doppler signals detectable: yes, no
    Ovarian crescent sign: yes, no
Diagnosis based on LR1 and LR2
  Calculated risk of malignancy using LR1: ≤10%, >10%
  Calculated risk of malignancy using LR2: ≤10%, >10%
___________________________________________________________________________
IOTA, International Ovarian Tumor Analysis group; LR1, logistic regression model 1; LR2, logistic regression model 2.
The risk of malignancy using LR1 is derived as $y = 1/(1 + e^{-z})$, where
$$z = -6.7468 + 1.5985\,(a) - 0.9983\,(b) + 0.0326\,(c) + 0.00841\,(d) - 0.8577\,(e) + 1.5513\,(f) + 1.1737\,(g) + 0.9281\,(h) + 0.0496\,(i) + 1.1421\,(j) - 2.3550\,(k) + 0.4916\,(l)$$
and $e$ is the mathematical constant and base value of natural logarithms. The 12 variables included in LR1 are:
  (a) personal history of ovarian cancer (yes = 1, no = 0)
  (b) current hormonal therapy (yes = 1, no = 0)
  (c) age of the patient (in years)
  (d) maximum diameter of the lesion (in millimeters)
  (e) mass tender at scan (yes = 1, no = 0)
  (f) the presence of ascites (yes = 1, no = 0)
  (g) the presence of blood flow within a solid papillary projection (yes = 1, no = 0)
  (h) the presence of a purely solid tumor (yes = 1, no = 0)
  (i) maximal diameter of the solid component (expressed in millimeters, but with no increase >50 mm)
  (j) irregular internal cyst walls (yes = 1, no = 0)
  (k) the presence of acoustic shadows (yes = 1, no = 0)
  (l) color score (1, 2, 3, or 4)
The risk of malignancy using LR2 is derived as $y = 1/(1 + e^{-z})$, where
$$z = -5.3718 + 0.0354\,(c) + 1.6159\,(f) + 1.1768\,(g) + 0.0697\,(i) + 0.9586\,(j) - 2.9486\,(k)$$
(12). A calculated risk of malignancy of more than 10% classified the adnexal mass as malignant.
† includes 0
‡ includes only cases where papillary projections were registered at both examinations
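The two formulas in the footnote can be written as a short function. This is a minimal sketch using the coefficients as listed above; the dictionary keys, the function name `predicted_risk`, and the example patient are hypothetical and are not taken from the study.

```python
import math

# Coefficients from the Table 1 footnote; letters refer to variables (a)-(l).
LR1_INTERCEPT = -6.7468
LR1_COEFS = {
    "history_ovarian_cancer": 1.5985,    # (a) yes = 1, no = 0
    "hormonal_therapy": -0.9983,         # (b) yes = 1, no = 0
    "age_years": 0.0326,                 # (c)
    "max_lesion_diameter_mm": 0.00841,   # (d)
    "tenderness": -0.8577,               # (e) yes = 1, no = 0
    "ascites": 1.5513,                   # (f) yes = 1, no = 0
    "flow_in_papillation": 1.1737,       # (g) yes = 1, no = 0
    "purely_solid": 0.9281,              # (h) yes = 1, no = 0
    "max_solid_diameter_mm": 0.0496,     # (i) no increase beyond 50 mm
    "irregular_walls": 1.1421,           # (j) yes = 1, no = 0
    "acoustic_shadows": -2.3550,         # (k) yes = 1, no = 0
    "color_score": 0.4916,               # (l) 1, 2, 3 or 4
}
LR2_INTERCEPT = -5.3718
LR2_COEFS = {
    "age_years": 0.0354,                 # (c)
    "ascites": 1.6159,                   # (f)
    "flow_in_papillation": 1.1768,       # (g)
    "max_solid_diameter_mm": 0.0697,     # (i) no increase beyond 50 mm
    "irregular_walls": 0.9586,           # (j)
    "acoustic_shadows": -2.9486,         # (k)
}


def predicted_risk(intercept, coefs, case):
    """Return y = 1 / (1 + e^-z) for one case (a dict of variable values)."""
    values = dict(case)
    # Variable (i) contributes no further increase above 50 mm.
    values["max_solid_diameter_mm"] = min(values["max_solid_diameter_mm"], 50)
    z = intercept + sum(beta * values[name] for name, beta in coefs.items())
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical patient: 50 years old, 60 mm lesion, no solid component,
# color score 1, all binary variables absent.
example = {name: 0 for name in LR1_COEFS}
example.update(age_years=50, max_lesion_diameter_mm=60, color_score=1)

lr1_risk = predicted_risk(LR1_INTERCEPT, LR1_COEFS, example)
lr2_risk = predicted_risk(LR2_INTERCEPT, LR2_COEFS, example)
for label, risk in (("LR1", lr1_risk), ("LR2", lr2_risk)):
    verdict = "malignant" if risk > 0.10 else "benign"
    print(f"{label}: risk {risk:.1%}, classified as {verdict}")
```

Keeping the intercept and coefficients in separate tables lets the same function serve both models; LR2 simply ignores the variables it does not use.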
Table 2. Histological diagnoses of the masses
___________________________________________________________________________
Histological diagnosis of tumor         Number
___________________________________________________________________________
Benign tumors                           94
  Benign simple cyst                    7
  Endometrioma                          10
  Dermoid cyst                          16
  Serous cystadenoma                    16
  Mucinous cystadenoma                  18
  Myoma/fibroma                         9
  Cystadenofibroma                      11
  Paraovarian cyst                      5
  Sactosalpinx/chronic salpingitis      1
  Leydig cell tumor                     1
Borderline tumors                       4
  Serous                                2
  Mucinous                              1
  Endometrioid                          1
Invasive malignancy                     19
  Primary ovarian adenocarcinoma        13
  Granulosa cell tumor                  3
  Dysgerminoma                          1
  Leiomyosarcoma                        1
  Malignant aggressive B-cell lymphoma  1
___________________________________________________________________________
Table 3. Inter-observer reproducibility of continuous variables used in the risk calculation models LR1 and LR2 and of other continuous variables used to describe adnexal masses
___________________________________________________________________________
Median and range: measurement results (both sonologists). Mean difference, 95% CI and limits of agreement: difference in mm between the two measurements made by sonologists 1 and 2 (a). Intra-CC: point estimate (95% CI).
___________________________________________________________________________
Parameter                                             Median          Range       Mean difference   95% CI          Limits of agreement   Intra-CC (95% CI)
Variables used in models LR1 and LR2
Maximum diameter of adnexal mass, mm                  70 (n=234)      10 – 313    3.80 (n=117)      1.12 – 6.48     -25.24 – 32.82        0.958 (0.937 – 0.971)
Maximum diameter of largest solid component, mm (b)   29.50 (n=122)   5 – 180     1.92 (n=61)       -1.74 – 5.58    -26.66 – 30.50        0.942 (0.905 – 0.964)
Other variables used to describe adnexal mass
Mean diameter of adnexal mass, mm                     58.5 (n=234)    9 – 240     1.05 (n=117)      -0.15 – 1.95    -8.61 – 10.72         0.971 (0.958 – 0.980)
Mean diameter of largest solid component, mm (b)      22 (n=122)      4 – 156     0.59 (n=61)       -1.82 – 2.98    -18.16 – 19.32        0.962 (0.937 – 0.977)
Height of largest papillary projection, mm (c)        8 (n=42)        3 – 25      -0.51 (n=21)      -2.93 – 1.91    -11.61 – 10.59        0.609 (0.245 – 0.821)
___________________________________________________________________________
CI, confidence interval; Intra-CC, intra-class correlation coefficient.
(a) the measurements made by sonologist 2 were subtracted from the measurements made by sonologist 1.
(b) includes only cases where solid components were registered by both observers.
(c) includes only cases where papillary projections were registered by both observers; using logarithmically transformed data, the results were as follows: mean difference 1.03 (95% CI 0.78 – 1.36), limits of agreement 0.31 – 3.61, Intra-CC 0.547 (0.153 – 0.789); these results can be used for comparison with our published results on inter-observer agreement when analyzing ultrasound volumes obtained by three-dimensional ultrasound (15).
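The mean difference, its 95% CI, and the limits of agreement reported in Table 3 follow the approach of Bland and Altman (20). A minimal sketch of that calculation is given below. The paired measurements are invented for illustration, the function name `bland_altman` is hypothetical, and the multiplier 1.96 (a normal approximation) is an assumption, since the exact multiplier used for the 95% CI in the study is not stated here.

```python
import math
import statistics


def bland_altman(first, second):
    """Mean difference, approximate 95% CI and limits of agreement.

    Differences are first minus second, matching footnote (a) of Table 3
    (sonologist 2 subtracted from sonologist 1).
    """
    diffs = [x - y for x, y in zip(first, second)]
    mean_diff = statistics.mean(diffs)
    sd = statistics.stdev(diffs)                  # sample SD of the differences
    half_ci = 1.96 * sd / math.sqrt(len(diffs))   # normal approximation (assumed)
    return {
        "mean_difference": mean_diff,
        "ci95": (mean_diff - half_ci, mean_diff + half_ci),
        "limits_of_agreement": (mean_diff - 1.96 * sd, mean_diff + 1.96 * sd),
    }


# Invented maximum diameters in mm for six masses, one value per sonologist.
sonologist_1 = [72.0, 45.0, 110.0, 38.0, 64.0, 90.0]
sonologist_2 = [70.0, 47.0, 101.0, 36.0, 65.0, 84.0]

result = bland_altman(sonologist_1, sonologist_2)
print(result)
```

The limits of agreement describe the spread of individual differences, whereas the 95% CI describes the uncertainty of the mean difference, which is why the former are much wider in Table 3.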
Table 4. Inter-observer agreement for categorical variables included in the risk calculation models LR1 and LR2 and for other categorical variables used to describe adnexal masses
____________________________________________________________________________________________
Variable                                                                            Agreement       Kappa value
____________________________________________________________________________________________
Variables used in models LR1 and LR2
Presence of entirely solid tumor (yes, no)                                          97% (114/117)   0.91
Maximum diameter of largest solid component (≤50 mm, >50 mm)                        95% (111/117)   0.83
Maximum diameter of largest solid component (≤50 mm, >50 mm) (a)                    92% (56/61)     0.82
Tenderness of adnexal mass (yes, no)                                                97% (113/117)   0.78
Ascites (yes, no)                                                                   97% (113/117)   0.73
Papillary projections (yes, no)                                                     87% (102/117)   0.65
Solid component (yes, no)                                                           84% (98/117)    0.67
Irregular cyst wall (yes, no)                                                       79% (92/117)    0.56
Acoustic shadows (yes, no)                                                          85% (99/117)    0.58
Color Doppler signals in papillary projections (yes, no) (b)                        90% (105/117)   0.48
Color Doppler signals in papillary projections (yes, no) (c)                        86% (18/21)     0.71
Color score (1, 2, 3, 4)                                                            40% (47/117)    0.36 (d)
Other variables used to describe adnexal masses
Tumor type (unilocular, unilocular solid, multilocular, multilocular solid, solid)  97% (113/117)   0.70
Mean diameter of tumor (mm): ≤40, 41-60, 61-80, 81-100, >100                        66% (77/117)    0.76 (d)
Number of locules: 0, 1, 2, 3, 4, 5, 6-10, 11-20, >20                               58% (68/117)    0.79 (d)
Number of locules: ≤10, >10 (e)                                                     94% (110/117)   0.80
Septum (yes, no)                                                                    88% (103/117)   0.76
Incomplete septum (yes, no)                                                         98% (115/117)   ¶
Number of papillary projections: 0, 1, 2, 3, ≥4                                     81% (95/117)    0.64 (d)
Number of papillary projections: 1, 2, 3, ≥4 (c)                                    67% (14/21)     0.63 (d)
Number of papillary projections: <4, >4 (e)                                         96% (112/117)   0.68
Echogenicity of cyst fluid (anechoic, low level, ground glass, mixed, no fluid)     67% (78/117)    0.56
Color Doppler signals detectable (yes, no)                                          84% (98/117)    0.30
Ovarian crescent sign (yes, no)                                                     78% (91/117)    0.49
Fluid in Douglas pouch (yes, no)                                                    87% (102/117)   0.72
____________________________________________________________________________________________
(a) includes only cases where solid components were registered at both examinations.
(b) This variable reflects disagreement both with regard to the presence of papillary projections and the presence of blood flow in papillary projections (i.e., if one observer recorded the presence of papillary projections and the other did not, there was disagreement with regard to this variable, and if both observers recorded the presence of papillary projections but disagreed with regard to the presence of blood flow in the papillary projections, there was disagreement with regard to this variable).
(c) includes only cases where papillary projections were registered at both examinations.
(d) weighted Kappa index.
(e) includes 0.
¶ absent kappa values are explained by asymmetric field tables making it impossible to calculate them.
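Table 4 reports both percentage agreement and Cohen's kappa (17), which corrects the observed agreement for the agreement expected by chance. The sketch below shows the unweighted calculation for two observers. The ratings are invented, the function name `cohen_kappa` is hypothetical, and the weighted kappa used for the ordinal variables marked (d) is not reproduced because the weighting scheme is not specified here.

```python
from collections import Counter


def cohen_kappa(ratings_1, ratings_2):
    """Return (observed agreement, unweighted Cohen's kappa) for paired ratings."""
    n = len(ratings_1)
    observed = sum(a == b for a, b in zip(ratings_1, ratings_2)) / n
    freq_1 = Counter(ratings_1)
    freq_2 = Counter(ratings_2)
    categories = set(freq_1) | set(freq_2)
    # Chance agreement: product of the two observers' marginal proportions.
    expected = sum(freq_1[c] * freq_2[c] for c in categories) / (n * n)
    if expected == 1.0:
        return observed, None  # kappa is undefined when chance agreement is 1
    return observed, (observed - expected) / (1.0 - expected)


# Invented ratings of "acoustic shadows" for ten masses.
observer_1 = ["yes", "yes", "yes", "yes", "no", "no", "no", "no", "no", "no"]
observer_2 = ["yes", "yes", "yes", "no", "no", "no", "no", "no", "no", "yes"]

agreement, kappa = cohen_kappa(observer_1, observer_2)
print(f"agreement {agreement:.0%}, kappa {kappa:.2f}")
```

In this invented example 80% agreement corresponds to a kappa of about 0.58, which shows how a high raw agreement can coexist with only moderate agreement above chance, as seen for several variables in Table 4.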
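Table 5. Inter-observer reproducibility of the risk of malignancy calculated using the risk calculation models LR1 and LR2
___________________________________________________________________________
Median and range: calculated risk of malignancy (both sonologists). Mean difference, 95% CI and limits of agreement: difference between the risk calculated by sonologist 1 and 2. Intra-CC: point estimate (95% CI).
___________________________________________________________________________
Parameter                                     Median          Range           Mean difference   95% CI          Limits of agreement   Intra-CC (95% CI)
Risk of malignancy calculated using LR1, %    7.85 (n=234)    0.10 – 99.10    -0.53 (n=117)     -3.07 – 2.01    -28.05 – 26.99        0.911 (0.874 – 0.937)
Risk of malignancy calculated using LR2, %    6.65 (n=234)    0.10 – 98.40    0.02 (n=117)      -3.06 – 3.10    -33.22 – 33.26        0.832 (0.766 – 0.880)
___________________________________________________________________________
CI, confidence interval; Intra-CC, intra-class correlation coefficient.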
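Two summaries used in the text, agreement in classification at the 10% cutoff and the proportion of cases in which the two risk estimates differ by more than 25 percentage units, can be derived directly from paired risk estimates. The sketch below uses invented risks (in %) and a hypothetical function name; it illustrates the definitions only and does not reproduce the study data.

```python
def summarize_risk_agreement(risks_1, risks_2, cutoff=10.0, large_diff=25.0):
    """Return (share classified alike at the cutoff, share with a large difference).

    Risks are in percent. A risk above the cutoff classifies the mass as
    malignant, as stated in the footnote to Table 1.
    """
    pairs = list(zip(risks_1, risks_2))
    n = len(pairs)
    same_class = sum((a > cutoff) == (b > cutoff) for a, b in pairs)
    large = sum(abs(a - b) > large_diff for a, b in pairs)
    return same_class / n, large / n


# Invented calculated risks (%) for five masses, one value per sonologist.
risks_sonologist_1 = [2.0, 8.0, 55.0, 12.0, 95.0]
risks_sonologist_2 = [3.5, 14.0, 20.0, 40.0, 97.0]

same_share, large_share = summarize_risk_agreement(risks_sonologist_1, risks_sonologist_2)
print(f"same classification: {same_share:.0%}, difference > 25 units: {large_share:.0%}")
```

In the invented data, two masses differ by more than 25 percentage units yet are classified alike, which mirrors the point made in the discussion: good agreement in classification does not guarantee reproducible individual risk estimates.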
Legends for figure
Figure 1. a) Scatterplot showing the relationship between inter-observer difference (observer
1 minus observer 2) in calculated risk and magnitude of calculated risk when using logistic
regression model LR1. The plot manifests a diamond shape, the differences being smallest for
the lowest and highest risks. For risks <25% and >95%, the differences are very small.
LOA, limits of agreement. b) Scatterplot showing the relationship between inter-observer
difference in calculated risk and magnitude of calculated risk when using logistic regression
model LR2. The plot manifests a diamond shape, the differences being smallest for the lowest
and highest risks.
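The diamond shape is in part a consequence of the bounded risk scale. The short derivation below is an illustration rather than part of the original analysis; it assumes that the horizontal axis shows the mean of the two observers' risks, with $r_1$ and $r_2$ denoting the risks (in %) calculated by observers 1 and 2, $m$ their mean, and $d$ their difference.

```latex
\begin{align*}
  m &= \tfrac{1}{2}(r_1 + r_2), \qquad d = r_1 - r_2
      \quad\Longrightarrow\quad r_1 = m + \tfrac{d}{2}, \quad r_2 = m - \tfrac{d}{2}, \\
  0 &\le m \pm \tfrac{d}{2} \le 100
      \quad\Longrightarrow\quad |d| \le 2m \ \text{ and } \ |d| \le 2\,(100 - m), \\
  |d| &\le 2 \min(m,\; 100 - m).
\end{align*}
```

The largest possible difference is therefore 100 percentage units at a mean risk of 50% and shrinks linearly to zero at mean risks of 0% and 100%, so small differences at the extremes of the scale are expected even when agreement on the underlying variables is imperfect.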
Published OnlineFirst November 25, 2014. Clin Cancer Res. Povilas Sladkevicius and Lil Valentin. Inter-observer agreement in describing the ultrasound appearance of adnexal masses and in calculating the risk of malignancy using logistic regression models.
Updated version: access the most recent version of this article at doi: 10.1158/1078-0432.CCR-14-0906
Supplementary material: access the most recent supplemental material at http://clincancerres.aacrjournals.org/content/suppl/2014/11/26/1078-0432.CCR-14-0906.DC1
Manuscript
Authoredited Author manuscripts have been peer reviewed and accepted for publication but have not yet been
E-mail alerts related to this article or journalSign up to receive free email-alerts
Subscriptions
Reprints and
pubsaacrorgDepartment at
To order reprints of this article or to subscribe to the journal contact the AACR Publications
Permissions
Rightslink site Click on Request Permissions which will take you to the Copyright Clearance Centers (CCC)
httpclincancerresaacrjournalsorgcontentearly201411251078-0432CCR-14-0906To request permission to re-use all or part of this article use this link
Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906
agreement in ultrasound assessment of ovarian masses 13
13
those during a live scan When evaluating ultrasound volumes sonologists are exposed to the
same ultrasound images and so any interobserver difference should be explained exclusively
by differences in interpreting the ultrasound information During a live scan there are more
sources of bias This could result either in poorer or better interobserver agreement than when
3D ultrasound volumes are assessed poorer because ultrasound examiners are likely to use
different machine settings and scanning conditions may change from one minute to another
better because the dynamic nature of live scanning facilitates discrimination between solid
components and amorphous tissue
Our results showed that two experienced sonologists agreed quite well in their classification
of masses as benign or malignant using the 10 risk of malignancy cutoff of LR1 and LR2
and that the diagnostic performance of LR1 and LR2 with regard to discrimination between
benign and malignant tumors was similar for the two sonologists and similar to that reported
by others (14 26-28) This is reassuring because the main purpose of using model LR1 and
LR2 is to classify tumors as benign or malignant Potentially however LR1 and LR2 can be
used not only to classify adnexal masses as benign or malignant but also to counsel a patient
about her individual risk of malignancy (13) If to use the calculated risk for individual
counseling one must be reasonably certain not only that the estimated risk agrees well with the
true risk (when externally validated both LR1 and LR2 underestimated the true risk especially
in the risk interval 30-70 (14) but also that the risk estimates are reproducible ie that
different examiners will obtain similar risk estimates Our results show that risks estimates may
differ substantially between experienced observers the difference in estimated risk being gt250
percentage units in 9 and 12 of cases when using LR1 and LR2 respectively Interobserver
agreement above chance was poorest for those variables in the models that are heavily
dependent on subjective evaluation ie color score presence of color Doppler signals in
papillary projections irregular cyst walls and acoustic shadowing Indeed differences in these
Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906
agreement in ultrasound assessment of ovarian masses 14
14
explained most of the largest inter-observer differences in calculated risk of malignancy In
models based on few variables changing values in only one variable may result in large
differences in predicted risks while a model with many variables is less vulnerable to a change
in one or even few variables Our results illustrate this (Supplementary Tables S2 and S3 )
When using LR2 (which includes six variables) a change in value for one single categorical
variable explained an inter-observer difference in calculated risk gt25 percentage units in eight
of 14 cases while when using LR1 (which includes 12 variables) a change in value for one
single categorical variable explained an inter-observer difference in calculated risk gt25
percentage units in only four of 11 cases Acoustic shadowing is a strong variable in both LR1
and LR2 and has great impact on the calculated risk in LR2 with only six variables In our
hands as well as in those of Ruiz de Gauna et al (23) inter-observer agreement for acoustic
shadowing was at most moderate The interobserver agreement for color score was only fair in
our study and color score is an important variable in LR1
To improve inter-observer reproducibility of calculated risks based on LR1 and LR2 inter-
observer differences in descriptions and measurements of adnexal masses using the IOTA
terminology and measurement technique need to be reduced One way to achieve this could be by
providing courses on and training in how to examine and describe adnexal masses using the
IOTA terms Interactive courses in which a large number of ultrasound images are discussed with
the course participants are likely to be very valuable in this respect More precise definitions of
the IOTA terms for example by providing ample imaging material would probably also help
improve inter-observer agreement Special attention should be given to the variables with poorest
reproducibility ie the color score wall irregularity acoustic shadowing and detection of blood
flow in papillary projections Until better inter-observer agreement in the calculated risk of
malignancy using LR1 and LR2 has been shown one should be cautious with using the risk
estimates for individual patient counselling
Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906
agreement in ultrasound assessment of ovarian masses 15
15
Acknowledgements
None
Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906
agreement in ultrasound assessment of ovarian masses 16
16
References
1. Granberg S, Norström A, Wikland M. Tumors in the lower pelvis as imaged by vaginal
sonography. Gynecol Oncol 1990;37:224-9.
2. Benacerraf BR, Finkler NJ, Wojciechowski C, Knapp RC. Sonographic accuracy in the
diagnosis of ovarian masses. J Reprod Med 1990;35:491-5.
3. Valentin L. Pattern recognition of pelvic masses by gray-scale ultrasound imaging: the
contribution of Doppler ultrasound. Ultrasound Obstet Gynecol 1999;14:338-47.
4. Valentin L. Prospective cross-validation of Doppler ultrasound examination and gray-scale
ultrasound imaging for discrimination of benign and malignant pelvic masses.
Ultrasound Obstet Gynecol 1999;14:273-83.
5. Timmerman D, Schwärzler P, Collins WP, Claerhout F, Coenen M, Amant F, et al.
Subjective assessment of adnexal masses with the use of ultrasonography: an analysis
of interobserver variability and experience. Ultrasound Obstet Gynecol 1999;13:11-6.
6. Sokalska A, Timmerman D, Testa AC, Van Holsbeke C, Lissoni AA, Leone FPG, et al.
Diagnostic accuracy of transvaginal ultrasound examination for assigning a specific
diagnosis to adnexal masses. Ultrasound Obstet Gynecol 2009;34:462-70.
7. Valentin L. Use of morphology to characterize and manage common adnexal masses.
Best Pract Res Clin Obstet Gynaecol 2004;18:71-89.
8. Tailor A, Jurkovic D, Bourne TH, Collins WP, Campbell S. Sonographic prediction of
malignancy in adnexal masses using multivariate logistic regression analysis.
Ultrasound Obstet Gynecol 1997;10:41-7.
9. Timmerman D, Bourne TH, Tailor A, Collins WP, Verrelst H, Vandenberghe K, et al.
A comparison of methods for the preoperative discrimination between benign and
malignant adnexal masses: the development of a new logistic regression model. Am J
Obstet Gynecol 1999;181:57-65.
10. Alcazar JL, Jurado M. Prospective evaluation of logistic model based on sonographic
morphologic and color Doppler findings developed to predict adnexal malignancy. J
Ultrasound Med 1999;18:837-42.
11. Timmerman D, Valentin L, Bourne TH, Collins WP, Verrelst H, Vergote I. Terms,
definitions and measurements to describe the sonographic features of adnexal tumors: a
consensus opinion from the International Ovarian Tumor Analysis (IOTA) Group.
Ultrasound Obstet Gynecol 2000;16:500-5.
12. Timmerman D, Testa AC, Bourne T, Ferrazzi E, Ameye L, Konstantinovic ML, et al.
Logistic regression model to distinguish between the benign and malignant adnexal
mass before surgery: a multicenter study by the International Ovarian Tumor Analysis
Group. J Clin Oncol 2005;23(34):8794-801.
13. Kaijser J, Bourne T, Valentin L, Sayasneh A, Van Holsbeke C, Vergote I, et al.
Improving strategies for diagnosing ovarian cancer: a summary of the International
Ovarian Tumor Analysis (IOTA) studies. Ultrasound Obstet Gynecol 2013;41:9-20.
14. Timmerman D, Van Calster B, Testa AC, Guerriero S, Fischerova D, Lissoni AA, et al.
Ovarian cancer prediction in adnexal masses using ultrasound-based logistic regression
models: a temporal and external validation study by the IOTA group. Ultrasound Obstet
Gynecol 2010;36:226-34.
15. Sladkevicius P, Valentin L. Intra- and inter-observer agreement when describing
adnexal masses using the International Ovarian Tumour Analysis (IOTA) terms and
definitions: a study on three-dimensional (3D) ultrasound volumes. Ultrasound Obstet
Gynecol 2013;41:318-27.
16. Heintz APM, Odicino F, Maisonneuve P, Beller U, Benedet JL, Creasman WT, et al.
Carcinoma of the Ovary. 25th Annual Report on the Results of Treatment in
Gynecological Cancer. Int J Gynecol Obstet 2003;83:S135-S166 (suppl 1).
17. Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas 1960;20:37-46.
18. Kundel HL, Polansky M. Measurement of observer agreement. Radiology 2003;228:303-8.
19. Brennan P, Silman A. Statistical methods for assessing observer variability in clinical
measures. BMJ 1992;304:1491-4.
20. Bland JM, Altman DG. Statistical methods for assessing agreement between two
methods of clinical measurement. Lancet 1986;1:307-10.
21. Bartlett JW, Frost C. Reliability, repeatability and reproducibility: analysis of
measurement errors in continuous variables. Ultrasound Obstet Gynecol 2008;31:466-75.
22. Kottner J, Audigé L, Brorson S, Donner A, Gajewski BJ, Hróbjartsson A, et al.
Guidelines for Reporting Reliability and Agreement Studies (GRRAS) were proposed. J
Clin Epidemiol 2011;64:96-106.
23. Ruiz de Gauna B, Sanchez P, Pineda L, Utrilla-Layna J, Juez L, Alcázar JL.
Inter-observer agreement with regard to describing adnexal masses using the IOTA simple
rules in a real-time setting and when using three-dimensional ultrasound volumes and
digital clips. Ultrasound Obstet Gynecol 2014;44:95-100.
24. Timmerman D, Testa AC, Bourne T, Ameye L, Jurkovic D, Van Holsbeke C, et al.
Simple ultrasound-based rules for the diagnosis of ovarian cancer. Ultrasound Obstet
Gynecol 2008;31:681-90.
25. Timmerman D, Ameye L, Fischerova D, Epstein E, Melis GB, Guerriero S, et al.
Simple ultrasound rules to distinguish between benign and malignant adnexal masses
before surgery: prospective validation by IOTA group. BMJ 2010;341:c6839.
26. Van Holsbeke C, Van Calster B, Testa AC, Domali E, Lu C, Van Huffel S, et al.
Prospective internal validation of mathematical models to predict malignancy in
adnexal masses: results from the international ovarian tumor analysis study. Clin Cancer
Res 2009;15:684-91.
27. Nunes N, Yazbek J, Ambler G, Hoo W, Naftalin J, Jurkovic D. Prospective evaluation
of the IOTA logistic regression model LR2 for the diagnosis of ovarian cancer.
Ultrasound Obstet Gynecol 2012;40:355-9.
28. Nunes N, Ambler G, Hoo WL, Naftalin J, Foo X, Widschwendter M, et al.
A prospective validation of the IOTA logistic regression models (LR1 and LR2) in
comparison to subjective pattern recognition for the diagnosis of ovarian cancer.
Int J Gynecol Cancer 2013;23:1583-9.
Table 1. Variables for which inter-observer reproducibility was estimated
___________________________________________________________________________
Variables included in logistic regression models LR1 and LR2, or of importance for these models
  Maximum diameter of adnexal mass: mm
  Maximum diameter of largest solid component: mm
  Maximum diameter of largest solid component: ≤50, >50 mm
  Presence of an entirely solid adnexal mass: yes, no
  Papillary projections present: yes, no
  Irregular internal cyst walls: yes, no
  Acoustic shadows: yes, no
  Color Doppler signals in papillary projection: yes, no
  Color score: 1, 2, 3, 4
  Tenderness of adnexal mass at scan: yes, no
  Ascites: yes, no
Other IOTA variables
  Continuous IOTA variables
    Mean diameter of adnexal mass: mm
    Mean diameter of largest solid component: mm
    Height of the largest papillary projection: mm
  Categorical IOTA variables
    Type of tumor: unilocular, unilocular solid, multilocular, multilocular solid, solid
    Mean diameter of adnexal mass: ≤40, 41-60, 61-80, 81-100 and >100 mm
    Number of cyst locules: 0, 1, 2, 3, 4, 5, 6-10, 11-20 and >20; ≤10, >10†
    Septum present: yes, no
    Incomplete septum present: yes, no
    Solid component: yes, no
    Papillary projections: 0, 1, 2, 3, ≥4; 1, 2, 3, ≥4‡; <4, >4†
    Echogenicity of cyst fluid: anechoic, low level, ground glass, mixed, no cyst fluid
    Color Doppler signals detectable: yes, no
    Ovarian crescent sign: yes, no
Diagnosis based on LR1 and LR2
  Calculated risk of malignancy using LR1: ≤10%, >10%
  Calculated risk of malignancy using LR2: ≤10%, >10%
___________________________________________________________________________
IOTA, international ovarian tumor analysis group; LR1, logistic regression model 1; LR2, logistic regression model 2.
The risk of malignancy using LR1 is derived as
$$y = \frac{1}{1 + e^{-z}}$$
where
$$z = -6.7468 + 1.5985\,(a) - 0.9983\,(b) + 0.0326\,(c) + 0.00841\,(d) - 0.8577\,(e) + 1.5513\,(f) + 1.1737\,(g) + 0.9281\,(h) + 0.0496\,(i) + 1.1421\,(j) - 2.3550\,(k) + 0.4916\,(l)$$
and e is the mathematical constant and base value of natural logarithms. The 12 variables included in LR1 are:
  (a) personal history of ovarian cancer (yes = 1, no = 0)
  (b) current hormonal therapy (yes = 1, no = 0)
  (c) age of the patient (in years)
  (d) maximum diameter of the lesion (in millimeters)
  (e) mass tender at scan (yes = 1, no = 0)
  (f) the presence of ascites (yes = 1, no = 0)
  (g) the presence of blood flow within a solid papillary projection (yes = 1, no = 0)
  (h) the presence of a purely solid tumor (yes = 1, no = 0)
  (i) maximal diameter of the solid component (expressed in millimeters, but with no increase > 50 mm)
  (j) irregular internal cyst walls (yes = 1, no = 0)
  (k) the presence of acoustic shadows (yes = 1, no = 0)
  (l) color score (1, 2, 3, or 4)
The risk of malignancy using LR2 is derived as
$$y = \frac{1}{1 + e^{-z}}$$
where
$$z = -5.3718 + 0.0354\,(c) + 1.6159\,(f) + 1.1768\,(g) + 0.0697\,(i) + 0.9586\,(j) - 2.9486\,(k)$$
(12). A calculated risk of malignancy of more than 10% classified the adnexal mass as malignant.
† includes 0
‡ includes only cases where papillary projections were registered at both examinations
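The two formulas in the footnote to Table 1 can be written out as a short calculation. The following is a minimal sketch in Python, assuming the coefficients are exactly those printed in the footnote; the function and parameter names are illustrative and do not come from the IOTA publications. The "no increase > 50 mm" rule for the solid component is implemented as a cap at 50 mm, and the 10% cut-off follows the footnote.

```python
import math


def _logistic(z):
    """Convert the linear predictor z into a probability, y = 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))


def lr1_risk(history_ovarian_cancer, hormonal_therapy, age_years,
             max_lesion_mm, tender_at_scan, ascites, papillary_flow,
             purely_solid, max_solid_mm, irregular_walls,
             acoustic_shadows, color_score):
    """Risk of malignancy (0..1) from LR1; binary inputs are 1 (yes) or 0 (no)."""
    solid_capped = min(max_solid_mm, 50)  # variable (i): no increase above 50 mm
    z = (-6.7468
         + 1.5985 * history_ovarian_cancer   # (a)
         - 0.9983 * hormonal_therapy         # (b)
         + 0.0326 * age_years                # (c)
         + 0.00841 * max_lesion_mm           # (d)
         - 0.8577 * tender_at_scan           # (e)
         + 1.5513 * ascites                  # (f)
         + 1.1737 * papillary_flow           # (g)
         + 0.9281 * purely_solid             # (h)
         + 0.0496 * solid_capped             # (i)
         + 1.1421 * irregular_walls          # (j)
         - 2.3550 * acoustic_shadows         # (k)
         + 0.4916 * color_score)             # (l)
    return _logistic(z)


def lr2_risk(age_years, ascites, papillary_flow, max_solid_mm,
             irregular_walls, acoustic_shadows):
    """Risk of malignancy (0..1) from LR2, which uses six of the LR1 variables."""
    solid_capped = min(max_solid_mm, 50)
    z = (-5.3718
         + 0.0354 * age_years
         + 1.6159 * ascites
         + 1.1768 * papillary_flow
         + 0.0697 * solid_capped
         + 0.9586 * irregular_walls
         - 2.9486 * acoustic_shadows)
    return _logistic(z)


def classify(risk, cutoff=0.10):
    """Apply the 10% cut-off: a risk above the cut-off is classified as malignant."""
    return "malignant" if risk > cutoff else "benign"
```

For example, `classify(lr2_risk(50, 0, 0, 0, 0, 0))` evaluates a hypothetical 50-year-old patient with none of the ultrasound features present. The sketch is for illustration of the arithmetic only and is not a validated clinical tool.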
Table 2. Histological diagnoses of the masses
___________________________________________________________________________
Histological diagnosis of tumor              Number
___________________________________________________________________________
Benign tumors                                94
  Benign simple cyst                          7
  Endometrioma                               10
  Dermoid cyst                               16
  Serous cystadenoma                         16
  Mucinous cystadenoma                       18
  Myoma/fibroma                               9
  Cystadenofibroma                           11
  Paraovarian cyst                            5
  Sactosalpinx/chronic salpingitis            1
  Leydig cell tumor                           1
Borderline tumors                             4
  Serous                                      2
  Mucinous                                    1
  Endometrioid                                1
Invasive malignancy                          19
  Primary ovarian adenocarcinoma             13
  Granulosa cell tumor                        3
  Dysgerminoma                                1
  Leiomyosarcoma                              1
  Malignant aggressive B-cell lymphoma        1
___________________________________________________________________________
Table 3. Inter-observer reproducibility of continuous variables used in the risk calculation models LR1 and LR2 and of other continuous variables used to describe adnexal masses

Median (range): measurement results of both sonologists. Mean difference (95% CI) and limits of agreement: difference in mm between the two measurements made by sonologists 1 and 2 (a). Intra-CC: point estimate (95% CI).

Parameter                                              Median (range)              Mean difference (95% CI)        Limits of agreement   Intra-CC (95% CI)
Variables used in models LR1 and LR2
  Maximum diameter of adnexal mass, mm                 70 (10 to 313), n=234       3.80 (1.12 to 6.48), n=117      -25.24 to 32.82       0.958 (0.937 to 0.971)
  Maximum diameter of largest solid component, mm (b)  29.50 (5 to 180), n=122     1.92 (-1.74 to 5.58), n=61      -26.66 to 30.50       0.942 (0.905 to 0.964)
Other variables used to describe adnexal mass
  Mean diameter of adnexal mass, mm                    58.5 (9 to 240), n=234      1.05 (-0.15 to 1.95), n=117     -8.61 to 10.72        0.971 (0.958 to 0.980)
  Mean diameter of largest solid component, mm (b)     22 (4 to 156), n=122        0.59 (-1.82 to 2.98), n=61      -18.16 to 19.32       0.962 (0.937 to 0.977)
  Height of largest papillary projection, mm (c)       8 (3 to 25), n=42           -0.51 (-2.93 to 1.91), n=21     -11.61 to 10.59       0.609 (0.245 to 0.821)

CI, confidence interval; Intra-CC, intra-class correlation coefficient.
(a) The measurements made by sonologist 2 were subtracted from the measurements made by sonologist 1.
(b) Includes only cases where solid components were registered by both observers.
(c) Includes only cases where papillary projections were registered by both observers. Using logarithmically transformed data, the results were as follows: mean difference 1.03 (95% CI 0.78 to 1.36), limits of agreement 0.31 to 3.61, Intra-CC 0.547 (0.153 to 0.789). These results can be used for comparison with our published results on inter-observer agreement when analyzing ultrasound volumes obtained by three-dimensional ultrasound (15).
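The mean difference, its 95% CI, and the limits of agreement reported in Table 3 are the quantities of a Bland-Altman analysis (20). A minimal sketch of that calculation follows, assuming the limits of agreement are the mean difference ± 1.96 standard deviations of the differences and that the CI of the mean uses a normal approximation; the exact multipliers used in the study are not stated here, and the function name and example data are hypothetical.

```python
import math
import statistics


def bland_altman(observer1, observer2, z=1.96):
    """Summarize paired measurements as mean difference, 95% CI, and limits of agreement.

    Differences are observer1 minus observer2, matching footnote (a) of Table 3.
    The multiplier z = 1.96 is an assumption of this sketch.
    """
    diffs = [a - b for a, b in zip(observer1, observer2)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)          # sample standard deviation
    se_mean = sd_diff / math.sqrt(n)           # standard error of the mean difference
    return {
        "n": n,
        "mean_difference": mean_diff,
        "ci95": (mean_diff - z * se_mean, mean_diff + z * se_mean),
        "limits_of_agreement": (mean_diff - z * sd_diff, mean_diff + z * sd_diff),
    }


# Hypothetical diameters in mm measured by two observers on the same four masses.
summary = bland_altman([10, 20, 30, 40], [8, 21, 27, 41])
```

The CI describes the uncertainty of the average bias between observers, whereas the limits of agreement describe how far apart two observers are expected to be for an individual mass, which is why the latter are much wider in Table 3.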
Table 4. Inter-observer agreement for categorical variables included in the risk calculation models LR1 and LR2 and for other categorical variables used to describe adnexal masses

Variable                                                                            Agreement        Kappa value
Variables used in models LR1 and LR2
  Presence of entirely solid tumor (yes, no)                                        97% (114/117)    0.91
  Maximum diameter of largest solid component (≤50 mm, >50 mm)                      95% (111/117)    0.83
  Maximum diameter of largest solid component (≤50 mm, >50 mm) (a)                  92% (56/61)      0.82
  Tenderness of adnexal mass (yes, no)                                              97% (113/117)    0.78
  Ascites (yes, no)                                                                 97% (113/117)    0.73
  Papillary projections (yes, no)                                                   87% (102/117)    0.65
  Solid component (yes, no)                                                         84% (98/117)     0.67
  Irregular cyst wall (yes, no)                                                     79% (92/117)     0.56
  Acoustic shadows (yes, no)                                                        85% (99/117)     0.58
  Color Doppler signals in papillary projections (yes, no) (b)                      90% (105/117)    0.48
  Color Doppler signals in papillary projections (yes, no) (c)                      86% (18/21)      0.71
  Color score (1, 2, 3, 4)                                                          40% (47/117)     0.36 (d)
Other variables used to describe adnexal masses
  Tumor type (unilocular, unilocular solid, multilocular, multilocular solid, solid) 97% (113/117)   0.70
  Mean diameter of tumor (mm): ≤40, 41-60, 61-80, 81-100, >100                      66% (77/117)     0.76 (d)
  Number of locules: 0, 1, 2, 3, 4, 5, 6-10, 11-20, >20                             58% (68/117)     0.79 (d)
  Number of locules: ≤10, >10 (e)                                                   94% (110/117)    0.80
  Septum (yes, no)                                                                  88% (103/117)    0.76
  Incomplete septum (yes, no)                                                       98% (115/117)    ¶
  Number of papillary projections: 0, 1, 2, 3, ≥4                                   81% (95/117)     0.64 (d)
  Number of papillary projections: 1, 2, 3, ≥4 (c)                                  67% (14/21)      0.63 (d)
  Number of papillary projections: <4, >4 (e)                                       96% (112/117)    0.68
  Echogenicity of cyst fluid (anechoic, low level, ground glass, mixed, no fluid)   67% (78/117)     0.56
  Color Doppler signals detectable (yes, no)                                        84% (98/117)     0.30
  Ovarian crescent sign (yes, no)                                                   78% (91/117)     0.49
  Fluid in Douglas pouch (yes, no)                                                  87% (102/117)    0.72
____________________________________________________________________________________________
(a) Includes only cases where solid components were registered at both examinations.
(b) This variable reflects disagreement both with regard to the presence of papillary projections and the presence of blood flow in papillary projections (i.e., if one observer recorded the presence of papillary projections and the other did not, there was disagreement with regard to this variable, and if both observers recorded the presence of papillary projections but disagreed with regard to the presence of blood flow in the papillary projections, there was disagreement with regard to this variable).
(c) Includes only cases where papillary projections were registered at both examinations.
(d) Weighted Kappa index.
(e) Includes 0.
¶ Absent kappa values are explained by asymmetric field tables making it impossible to calculate them.
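Table 4 reports both the percentage agreement and Cohen's kappa (17), which corrects the observed agreement for the agreement expected by chance. A minimal sketch of the unweighted calculation is given below; the function names and example ratings are hypothetical, and the weighted kappa used for the ordinal variables marked (d) is not reproduced because the weighting scheme is not specified in this section.

```python
from collections import Counter


def percent_agreement(ratings1, ratings2):
    """Proportion of cases in which the two observers chose the same category."""
    matches = sum(a == b for a, b in zip(ratings1, ratings2))
    return matches / len(ratings1)


def cohen_kappa(ratings1, ratings2):
    """Unweighted Cohen's kappa: (p_observed - p_expected) / (1 - p_expected)."""
    n = len(ratings1)
    p_observed = percent_agreement(ratings1, ratings2)
    counts1, counts2 = Counter(ratings1), Counter(ratings2)
    categories = set(counts1) | set(counts2)
    # Chance agreement: product of the two observers' marginal proportions per category.
    p_expected = sum(counts1[c] * counts2[c] for c in categories) / (n * n)
    if p_expected == 1.0:
        return float("nan")  # kappa is undefined when chance agreement is complete
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical yes/no ratings of ten masses by two observers.
obs1 = ["yes", "yes", "no", "no", "yes", "no", "no", "no", "no", "no"]
obs2 = ["yes", "no", "no", "no", "yes", "no", "no", "yes", "no", "no"]
kappa = cohen_kappa(obs1, obs2)
```

Because chance agreement is high when one category dominates, a variable can show high percentage agreement together with a modest kappa, as with "Color Doppler signals detectable" in Table 4.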
Table 5. Inter-observer reproducibility of the risk of malignancy calculated using the risk calculation models LR1 and LR2

Median (range): calculated risk of malignancy, both sonologists. Mean difference (95% CI) and limits of agreement: difference between the risk calculated by sonologist 1 and 2. Intra-CC: point estimate (95% CI).

Parameter                                   Median (range)                 Mean difference (95% CI)         Limits of agreement   Intra-CC (95% CI)
Risk of malignancy calculated using LR1     7.85 (0.10 to 99.10), n=234    -0.53 (-3.07 to 2.01), n=117     -28.05 to 26.99       0.911 (0.874 to 0.937)
Risk of malignancy calculated using LR2     6.65 (0.10 to 98.40), n=234    0.02 (-3.06 to 3.10), n=117      -33.22 to 33.26       0.832 (0.766 to 0.880)

CI, confidence interval; Intra-CC, intra-class correlation coefficient.
Legends for figure
Figure 1. a) Scatterplot showing the relationship between inter-observer difference (observer
1 minus observer 2) in calculated risk and magnitude of calculated risk when using logistic
regression model LR1. The plot manifests a diamond shape, the differences being smallest for
the lowest and highest risks. For risks < 25 and > 95 the differences are very small.
LOA, limits of agreement. b) Scatterplot showing the relationship between inter-observer
difference in calculated risk and magnitude of calculated risk when using logistic regression
model LR2. The plot manifests a diamond shape, the differences being smallest for the lowest
and highest risks.
Povilas Sladkevicius and Lil Valentin. Inter-observer agreement in describing the ultrasound
appearance of adnexal masses and in calculating the risk of malignancy using logistic regression
models. Clin Cancer Res. Published OnlineFirst November 25, 2014. doi: 10.1158/1078-0432.CCR-14-0906
explained most of the largest inter-observer differences in calculated risk of malignancy. In
models based on few variables, changing the value of only one variable may result in large
differences in predicted risks, while a model with many variables is less vulnerable to a change
in one or even a few variables. Our results illustrate this (Supplementary Tables S2 and S3).
When using LR2 (which includes six variables), a change in value for one single categorical
variable explained an inter-observer difference in calculated risk >25 percentage units in eight
of 14 cases, while when using LR1 (which includes 12 variables), a change in value for one
single categorical variable explained an inter-observer difference in calculated risk >25
percentage units in only four of 11 cases. Acoustic shadowing is a strong variable in both LR1
and LR2 and has great impact on the calculated risk in LR2, which has only six variables. In our
hands, as well as in those of Ruiz de Gauna et al. (23), inter-observer agreement for acoustic
shadowing was at most moderate. The inter-observer agreement for color score was only fair in
our study, and color score is an important variable in LR1.
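The effect of a single disagreement on the calculated risk can be illustrated numerically. The sketch below, in Python, uses the LR1 and LR2 coefficients from the footnote to Table 1 and compares the calculated risk for one hypothetical patient when the two observers disagree only on acoustic shadowing. The dictionary keys, the helper names, and the patient values are assumptions made for this illustration; a single invented case shows the mechanism but does not reproduce the study's results.

```python
import math

# Coefficients as printed in the Table 1 footnote; the key names are illustrative.
LR1 = {"intercept": -6.7468, "history": 1.5985, "hormonal_therapy": -0.9983,
       "age": 0.0326, "max_lesion_mm": 0.00841, "tender": -0.8577,
       "ascites": 1.5513, "papillary_flow": 1.1737, "purely_solid": 0.9281,
       "max_solid_mm": 0.0496, "irregular_walls": 1.1421,
       "acoustic_shadows": -2.3550, "color_score": 0.4916}
LR2 = {"intercept": -5.3718, "age": 0.0354, "ascites": 1.6159,
       "papillary_flow": 1.1768, "max_solid_mm": 0.0697,
       "irregular_walls": 0.9586, "acoustic_shadows": -2.9486}


def risk(coefficients, features):
    """Calculated risk (0..1); features missing from the dict are treated as 0."""
    x = dict(features)
    x["max_solid_mm"] = min(x.get("max_solid_mm", 0), 50)  # no increase above 50 mm
    z = coefficients["intercept"] + sum(
        weight * x.get(name, 0)
        for name, weight in coefficients.items() if name != "intercept")
    return 1.0 / (1.0 + math.exp(-z))


def shift_from_disagreement(coefficients, features, variable):
    """Absolute risk difference when two observers disagree on one yes/no variable."""
    seen = dict(features, **{variable: 1})
    not_seen = dict(features, **{variable: 0})
    return abs(risk(coefficients, seen) - risk(coefficients, not_seen))


# Hypothetical patient: 50 years old, 60 mm mass, 40 mm solid component,
# irregular walls, color score 2, no other findings.
patient = {"age": 50, "max_lesion_mm": 60, "max_solid_mm": 40,
           "irregular_walls": 1, "color_score": 2}

shift_lr1 = shift_from_disagreement(LR1, patient, "acoustic_shadows")
shift_lr2 = shift_from_disagreement(LR2, patient, "acoustic_shadows")
```

For this invented patient, the disagreement on acoustic shadowing moves the calculated risk by more than 25 percentage units in both models, and by more in LR2 than in LR1.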
To improve inter-observer reproducibility of calculated risks based on LR1 and LR2 inter-
observer differences in descriptions and measurements of adnexal masses using the IOTA
terminology and measurement technique need to be reduced One way to achieve this could be by
providing courses on and training in how to examine and describe adnexal masses using the
IOTA terms Interactive courses in which a large number of ultrasound images are discussed with
the course participants are likely to be very valuable in this respect More precise definitions of
the IOTA terms for example by providing ample imaging material would probably also help
improve inter-observer agreement Special attention should be given to the variables with poorest
reproducibility ie the color score wall irregularity acoustic shadowing and detection of blood
flow in papillary projections Until better inter-observer agreement in the calculated risk of
malignancy using LR1 and LR2 has been shown one should be cautious with using the risk
estimates for individual patient counselling
Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906
agreement in ultrasound assessment of ovarian masses 15
15
Acknowledgements
None
Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906
agreement in ultrasound assessment of ovarian masses 16
16
References
1 Granberg S Norstroumlm A Wikland M Tumors in the lower pelvis as imaged by vaginal
sonography Gynecol Oncol 199037224-9
2 Benacerraf BR Finkler NJ Wojciechowski C Knapp RC Sonographic accuracy in the
diagnosis of ovarian masses J Reprod Med 199035491-5
3 Valentin L Pattern recognition of pelvic masses by gray-scale ultrasound imaging the
contribution of Doppler ultrasound Ultrasound Obstet Gynecol 199914338-47
4 Valentin L Prospective cross-validation of Doppler ultrasound examination and gray-
scale ultrasound imaging for discrimination of benign and malignant pelvic masses
Ultrasound Obstet Gynecol 199914273-83
5 Timmerman D Schwaumlrzler P Collins WP Claerhout F Coenen M Amant F et al
Subjective assessment of adnexal masses with the use of ultrasonography an analysis
of interobserver variability and experience Ultrasound Obstet Gynecol 19991311-6
6 Sokalska A Timmerman D Testa AC Van Holsbeke C Lissoni AA Leone FPG et al
Diagnostic accuracy of transvaginal ultrasound examination for assigning a specific
diagnosis to adnexal masses Ultrasound Obstet Gynecol 200934462-70
7 Valentin L Use of morphology to characterize and manage common adnexal masses
Best Pract Res Clin Obstet Gynaecol 20041871-89
8 Tailor A Jurkovic D Bourne TH Collins WP Campbell S Sonographic prediction of
malignancy in adnexal masses using multivariate logistic regression analysis
Ultrasound Obstet Gynecol 19971041-7
9 Timmerman D Bourne TH Tailor A Collins WP Verrelst H Vandenberghe K et al
A comparison of methods for the preoperative discrimination between benign and
malignant adnexal masses the development of a new logistic regression model Am J
Obstet Gynecol 199918157-65
Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906
agreement in ultrasound assessment of ovarian masses 17
17
10 Alcazar JL Jurado M Prospective evaluation of logistic model based on sonographic
morphologic and color Doppler findings developed to predict adnexal malignancy J
Ultrasound Med 199918837-42
11 Timmerman D Valentin L Bourne TH Collins WP Verrelst H Vergote I Terms
definitions and measurements to describe the sonographic features of adnexal tumors a
consensus opinion from the International Ovarian Tumor Analysis (IOTA) Group
Ultrasound Obstet Gynecol 200016500-5
12 Timmerman D Testa AC Bourne T Ferrazzi E Ameye L Konstantinovic ML et al
Logistic regression model to distinguish between the benign and malignant adnexal
mass before surgery a multicenter study by the International Ovarian Tumor Analysis
Group J Clin Oncol 2005348794-801
13 Kaijser J Bourne T Valentin L Sayasneh A Van Holsbeke C Vergote I et al
Improving strategies for diagnosing ovarian cancer a summary of the International
Ovarian Tumor Analysis (IOTA) studies Ultrasound Obstet Gynecol 201341 9-20
14 Timmerman D Van Calster B Testa AC Guerriero S Fischerova D Lissoni AA et al
Ovarian cancer prediction in adnexal masses using ultrasound-based logistic regression
models a temporal and external validation study by the IOTA group Ultrasound Obstet
Gynecol 201036226-34
15 Sladkevicius P Valentin L Intra- and inter-observer agreement when describing
adnexal masses using the International Ovarian Tumour Analysis (IOTA) terms and
definitions a study on three-dimensional (3D) ultrasound volumes Ultrasound Obstet
Gynecol 201341318-27
16 Heintz APM Odicino F Maisonneuve P Beller U Benedet JL Creasman WT et al
Carcinoma of the Ovary 25th Annual Report on the Results of Treatment in
Gynecological Cancer Int J Gynecol Obstet 200383S135-S166 (suppl 1)
Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906
agreement in ultrasound assessment of ovarian masses 18
18
17 Cohen J A coefficient of agreement for nominal scales Educ Psychol Meas 196020
37ndash46
18 Kundel HL Polansky M Measurement of observer agreement Radiology 2003228
303-8
19 Brennan P Silman A Statistical methods for assessing observer variability in clinical
measures BMJ 1992304 1491-4
20 Bland JM Altman DG Statistical methods for assessing agreement between two
methods of clinical measurement Lancet 19861307-10
21 Bartlett JW Frost C Reliability repeatability and reproducibility analysis of
measurement errors in continuous variables Ultrasound Obstet Gynecol 200831466-
75
22 Kottner J Audigeacute L Brorson S Donner A Gajewski BJ Hroacutebjartsson A et al
Guidelines for Reporting Reliability and Agreement Studies (GRRAS) were proposed J
Clin Epidemiol 20116496-106
23 Ruiz de Gauna B Sanchez P Pineda L Utrilla-Layna J Juez L Alcaacutezar JLInter-
observer agreement with regard to describing adnexal masses using the IOTA simple
rules in a real-time setting and when using three-dimensional ultrasound volumes and
digital clips Ultrasound Obstet Gynecol 20144495-100
24 Timmerman D Testa AC Bourne T Ameye L Jurkovic D Van Holsbeke C et al
Simple ultrasound-based rules for the diagnosis of ovarian cancer Ultrasound Obstet
Gynecol 200831681-90
25 Timmerman D Ameye L Fischerova D Epstein E Melis GB Guerriero S et al
Simple ultrasound rules to distinguish between benign and malignant adnexal masses
before surgery prospective validation by IOTA group BMJ 2010341c6839
Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906
agreement in ultrasound assessment of ovarian masses 19
19
26 Van Holsbeke C Van Calster B Testa AC Domali E Lu C Van Huffel S et al
Prospective internal validation of mathematical models to predict malignancy in
adnexal masses results from the international ovarian tumor analysis study Clin Cancer
Res 200915684-91
27 Nunes N Yazbek J Ambler G Hoo W Naftalin J Jurkovic D Prospective evaluation
of the IOTA logistic regression model LR2 for the diagnosis of ovarian cancer
Ultrasound Obstet Gynecol 201240355-9
28 Nunes N Ambler G Hoo WL Naftalin J Foo X Widschwendter M et al
A prospective validation of the IOTA logistic regression models (LR1 and LR2) in
comparison to subjective pattern recognition for the diagnosis of ovarian cancer
Int J Gynecol Cancer 2013231583-9
Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906
Table 1 Variables for which inter-observer reproducibility was estimated _______________________________________________________________________ Variables included in logistic regression models LR1 and LR2 or of importance for these models Maximum diameter of adnexal mass mm Maximum diameter of largest solid component mm Maximum diameter of largest solid component le50 gt50 mm Presence of an entirely solid adnexal mass yes no Papillary projections present yes no Irregular internal cyst walls yes no Acoustic shadows yes no Color Doppler signals in papillary projection yes no Color score 1 2 3 4 Tenderness of adnexal mass at scan yes no Ascites yes no Other IOTA variables Continuous IOTA variables Mean diameter of adnexal mass mm Mean diameter of largest solid component mm Height of the largest papillary projection mm Categorical IOTA variables Type of tumor unilocular unilocular solid multilocular multilocular solid solid Mean diameter of adnexal mass le40 41-60 61-80 81-100 and gt100 mm Number of cyst locules 0 1 2 3 4 5 6-10 11-20 and gt20 le10 gt10dagger Septum present yes no Incomplete septum present yes no Solid component yes no Papillary projections 0 1 2 3 ge4 1 2 3 ge4Dagger lt4 gt4dagger Echogenicity of cyst fluid anechoic low level ground glass mixed no cyst fluid Color Doppler signals detectable yes no Ovarian crescent sign yes no Diagnosis based on LR1 and LR2 Calculated risk of malignancy using LR1 le10 gt10 Calculated risk of malignancy using LR2 le10 gt10 ___________________________________________________________________________ IOTA international ovarian tumor analysis group LR1 logistic regression model 1 LR2 logistic regression model 2 The risk of malignancy using the LR1 is derived as y =1(1+e-z) where z = -67468 + 15985 (a) ndash 09983 (b) + 00326 (c) + 000841 (d) ndash 08577 (e) + 15513 (f) + 11737 (g) + 09281 (h) + 00496 (i) + 11421 (j) ndash 23550 (k) + 04916 (l) and e is the mathematical constant and base value of natural logarithms The 12 
variables included in LR1 are (a) personal history of ovarian cancer (yes = 1 no = 0) (b) current hormonal therapy (yes = 1 no = 0) (c) age of the patient (in years) (d) maximum diameter of the lesion (in millimeters) (e) mass tender at scan (yes = 1 no = 0) (f) the presence of ascites (yes = 1 no = 0) (g) the presence of blood flow within a solid papillary projection (yes = 1 no = 0) (h) the presence of a purely solid tumor (yes = 1 no = 0) (i) maximal diameter of the solid component (expressed in millimeters but with no increase gt 50 mm) (j) irregular internal cyst walls (yes = 1 no = 0) (k) the presence of acoustic shadows (yes = 1 no = 0) and (l) color score (1 2 3 or 4) The risk of malignancy using LR2 is derived as y = 1(1 + e-z) where z = - 53718 + 00354 (c) + 16159 (f) + 11768 (g) + 00697 (i) + 09586 (j) - 29486 (k)12 A calculated risk of malignancy of more than 10 classified the adnexal mass as malignant dagger includes 0 Dagger includes only cases where papillary projections were registered at both examinations
Table 2. Histological diagnoses of the masses
___________________________________________________________________________
Histological diagnosis of tumor                  Number
___________________________________________________________________________
Benign tumors                                      94
    Benign simple cyst                              7
    Endometrioma                                   10
    Dermoid cyst                                   16
    Serous cystadenoma                             16
    Mucinous cystadenoma                           18
    Myoma/fibroma                                   9
    Cystadenofibroma                               11
    Paraovarian cyst                                5
    Sactosalpinx/chronic salpingitis                1
    Leydig cell tumor                               1
Borderline tumors                                   4
    Serous                                          2
    Mucinous                                        1
    Endometrioid                                    1
Invasive malignancy                                19
    Primary ovarian adenocarcinoma                 13
    Granulosa cell tumor                            3
    Dysgerminoma                                    1
    Leiomyosarcoma                                  1
    Malignant aggressive B-cell lymphoma            1
___________________________________________________________________________
Table 3. Inter-observer reproducibility of continuous variables used in the risk calculation models LR1 and LR2 and of other continuous variables used to describe adnexal masses
________________________________________________________________________________________________________________________________________________
                                                    Measurement results           Difference in mm between two measurements
                                                    (both sonologists)            made by sonologists 1 and 2 a                      Intra-CC
Parameter                                           Median          Range         Mean            95% CI          Limits of agreement   Point estimate (95% CI)
________________________________________________________________________________________________________________________________________________
Variables used in models LR1 and LR2
  Maximum diameter of adnexal mass, mm              70 (n=234)      10 – 313      3.80 (n=117)    1.12 – 6.48     -25.24 – 32.82        0.958 (0.937 – 0.971)
  Maximum diameter of largest solid component, mm b 29.50 (n=122)   5 – 180       1.92 (n=61)     -1.74 – 5.58    -26.66 – 30.50        0.942 (0.905 – 0.964)
Other variables used to describe adnexal mass
  Mean diameter of adnexal mass, mm                 58.5 (n=234)    9 – 240       1.05 (n=117)    -0.15 – 1.95    -8.61 – 10.72         0.971 (0.958 – 0.980)
  Mean diameter of largest solid component, mm b    22 (n=122)      4 – 156       0.59 (n=61)     -1.82 – 2.98    -18.16 – 19.32        0.962 (0.937 – 0.977)
  Height of largest papillary projection, mm c      8 (n=42)        3 – 25        -0.51 (n=21)    -2.93 – 1.91    -11.61 – 10.59        0.609 (0.245 – 0.821)
________________________________________________________________________________________________________________________________________________
CI, confidence interval; Intra-CC, intra-class correlation coefficient.
a The measurements made by sonologist 2 were subtracted from the measurements made by sonologist 1.
b Includes only cases where solid components were registered by both observers.
c Includes only cases where papillary projections were registered by both observers. Using logarithmically transformed data, the results were as follows: mean difference 1.03 (95% CI 0.78 – 1.36), limits of agreement 0.31 – 3.61, Intra-CC 0.547 (0.153 – 0.789). These results can be used for comparison with our published results on inter-observer agreement when analyzing ultrasound volumes obtained by three-dimensional ultrasound (15).
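The mean difference and limits of agreement reported in Table 3 follow the Bland and Altman approach (20). The sketch below shows one way such values could be computed from paired measurements. It assumes a normal approximation (multiplier 1.96) for both the confidence interval of the mean difference and the limits of agreement; the exact method used by the authors is not stated in this section, and the function name and sample data are hypothetical.

```python
import math
import statistics


def bland_altman(obs1, obs2):
    """Mean difference, its 95% CI, and limits of agreement for paired data.

    Differences are observer 1 minus observer 2, as in footnote a of Table 3.
    """
    diffs = [a - b for a, b in zip(obs1, obs2)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd = statistics.stdev(diffs)          # sample standard deviation
    half_ci = 1.96 * sd / math.sqrt(n)    # normal approximation (assumption)
    return {
        "mean_diff": mean_diff,
        "ci95": (mean_diff - half_ci, mean_diff + half_ci),
        "limits_of_agreement": (mean_diff - 1.96 * sd, mean_diff + 1.96 * sd),
    }


if __name__ == "__main__":
    # Hypothetical diameters in mm measured by two sonologists.
    sonologist_1 = [70, 45, 120, 33, 88]
    sonologist_2 = [66, 47, 110, 35, 90]
    print(bland_altman(sonologist_1, sonologist_2))
```

The limits of agreement are much wider than the confidence interval of the mean difference because they describe the spread of individual differences rather than the uncertainty of their mean.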
Table 4. Inter-observer agreement for categorical variables included in the risk calculation models LR1 and LR2 and for other categorical variables used to describe adnexal masses
____________________________________________________________________________________________
Variable                                                                Agreement, % (n/N)   Kappa value
____________________________________________________________________________________________
Variables used in models LR1 and LR2
  Presence of entirely solid tumor (yes, no)                            97 (114/117)         0.91
  Maximum diameter of largest solid component (≤50 mm, >50 mm)          95 (111/117)         0.83
  Maximum diameter of largest solid component (≤50 mm, >50 mm) a        92 (56/61)           0.82
  Tenderness of adnexal mass (yes, no)                                  97 (113/117)         0.78
  Ascites (yes, no)                                                     97 (113/117)         0.73
  Papillary projections (yes, no)                                       87 (102/117)         0.65
  Solid component (yes, no)                                             84 (98/117)          0.67
  Irregular cyst wall (yes, no)                                         79 (92/117)          0.56
  Acoustic shadows (yes, no)                                            85 (99/117)          0.58
  Color Doppler signals in papillary projections (yes, no) b            90 (105/117)         0.48
  Color Doppler signals in papillary projections (yes, no) c            86 (18/21)           0.71
  Color score (1, 2, 3, 4)                                              40 (47/117)          0.36 d
Other variables used to describe adnexal masses
  Tumor type (unilocular, unilocular solid, multilocular,
    multilocular solid, solid)                                          97 (113/117)         0.70
  Mean diameter of tumor (mm): ≤40, 41-60, 61-80, 81-100, >100          66 (77/117)          0.76 d
  Number of locules
    0, 1, 2, 3, 4, 5, 6-10, 11-20, >20                                  58 (68/117)          0.79 d
    ≤10, >10 e                                                          94 (110/117)         0.80
  Septum (yes, no)                                                      88 (103/117)         0.76
  Incomplete septum (yes, no)                                           98 (115/117)         ¶
  Number of papillary projections
    0, 1, 2, 3, ≥4                                                      81 (95/117)          0.64 d
    1, 2, 3, ≥4 c                                                       67 (14/21)           0.63 d
    <4, >4 e                                                            96 (112/117)         0.68
  Echogenicity of cyst fluid (anechoic, low level, ground glass,
    mixed, no fluid)                                                    67 (78/117)          0.56
  Color Doppler signals detectable (yes, no)                            84 (98/117)          0.30
  Ovarian crescent sign (yes, no)                                       78 (91/117)          0.49
  Fluid in Douglas pouch (yes, no)                                      87 (102/117)         0.72
____________________________________________________________________________________________
a Includes only cases where solid components were registered at both examinations.
b This variable reflects disagreement both with regard to the presence of papillary projections and the presence of blood flow in papillary projections (i.e., if one observer recorded the presence of papillary projections and the other did not, there was disagreement with regard to this variable; and if both observers recorded the presence of papillary projections but disagreed with regard to the presence of blood flow in the papillary projections, there was also disagreement with regard to this variable).
c Includes only cases where papillary projections were registered at both examinations.
d Weighted Kappa index.
e Includes 0.
¶ Absent kappa values are explained by asymmetric field tables, making it impossible to calculate them.
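To make the Kappa values in Table 4 concrete, the following Python sketch computes Cohen's kappa (17) and a weighted kappa for two observers' ratings. The weighting scheme used by the authors is not given in this section, so linear weights are an assumption here, and the function name and example ratings are hypothetical.

```python
from collections import Counter


def cohen_kappa(r1, r2, categories=None, weighted=False):
    """Cohen's kappa for two raters; linear weights if weighted=True.

    categories fixes the (ordered) category list; if omitted, the sorted
    union of observed ratings is used.
    """
    cats = list(categories) if categories is not None else sorted(set(r1) | set(r2))
    k = len(cats)
    idx = {c: i for i, c in enumerate(cats)}
    n = len(r1)

    joint = Counter((idx[a], idx[b]) for a, b in zip(r1, r2))
    row = Counter(idx[a] for a in r1)   # marginal counts, observer 1
    col = Counter(idx[b] for b in r2)   # marginal counts, observer 2

    def w(i, j):
        # Agreement weight: 1 on the diagonal; linear decay if weighted.
        if weighted and k > 1:
            return 1.0 - abs(i - j) / (k - 1)
        return 1.0 if i == j else 0.0

    p_obs = sum(w(i, j) * c for (i, j), c in joint.items()) / n
    p_exp = sum(w(i, j) * row[i] * col[j]
                for i in range(k) for j in range(k)) / (n * n)
    if p_exp == 1.0:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (p_obs - p_exp) / (1.0 - p_exp)


if __name__ == "__main__":
    # Hypothetical color scores (1-4) from two observers.
    obs1 = [1, 2, 3, 4, 2, 3]
    obs2 = [1, 2, 3, 3, 3, 3]
    print(cohen_kappa(obs1, obs2, categories=[1, 2, 3, 4]))
    print(cohen_kappa(obs1, obs2, categories=[1, 2, 3, 4], weighted=True))
```

A weighted kappa gives partial credit for near misses on ordered scales such as color score, which is why percentage agreement and kappa can diverge for the variables marked d.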
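Table 5. Inter-observer reproducibility of the risk of malignancy calculated using the risk calculation models LR1 and LR2
________________________________________________________________________________________________________________________________________________
                                              Calculated risk of malignancy, %    Difference between the risk calculated
                                              (both sonologists)                  by sonologist 1 and 2                              Intra-CC
Parameter                                     Median          Range               Mean            95% CI          Limits of agreement   Point estimate (95% CI)
________________________________________________________________________________________________________________________________________________
Risk of malignancy calculated using LR1       7.85 (n=234)    0.10 – 99.10        -0.53 (n=117)   -3.07 – 2.01    -28.05 – 26.99        0.911 (0.874 – 0.937)
Risk of malignancy calculated using LR2       6.65 (n=234)    0.10 – 98.40        0.02 (n=117)    -3.06 – 3.10    -33.22 – 33.26        0.832 (0.766 – 0.880)
________________________________________________________________________________________________________________________________________________
CI, confidence interval; Intra-CC, intra-class correlation coefficient.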
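The intra-class correlation coefficients in Tables 3 and 5 summarize how much of the total variation lies between masses rather than between observers. This section does not state which ICC form the authors used, so the sketch below assumes the two-way random-effects, absolute-agreement, single-measurement form, often written ICC(2,1). The function name and the sample risks are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings is a list of rows, one per subject, each holding one value
    per observer (e.g. [(risk_obs1, risk_obs2), ...]).
    """
    n = len(ratings)        # number of subjects
    k = len(ratings[0])     # number of observers
    grand = sum(sum(r) for r in ratings) / (n * k)
    row_means = [sum(r) / k for r in ratings]
    col_means = [sum(r[j] for r in ratings) / n for j in range(k)]

    # Sums of squares from a two-way ANOVA without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for r in ratings for x in r)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)              # between-subject mean square
    msc = ss_cols / (k - 1)              # between-observer mean square
    mse = ss_err / ((n - 1) * (k - 1))   # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


if __name__ == "__main__":
    # Hypothetical calculated risks (%) for five masses from two sonologists.
    risks = [(2.1, 3.0), (7.9, 6.5), (45.0, 52.3), (90.2, 88.7), (12.4, 20.1)]
    print(round(icc_2_1(risks), 3))
```

Under this form a systematic offset between observers lowers the coefficient, whereas a consistency-type ICC would ignore it.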
Legends for figure
Figure 1. a) Scatterplot showing the relationship between inter-observer difference (observer
1 minus observer 2) in calculated risk and magnitude of calculated risk when using logistic
regression model LR1. The plot manifests a diamond shape, the differences being smallest for
the lowest and highest risks. For risks < 25 and > 95 the differences are very small.
LOA, limits of agreement. b) Scatterplot showing the relationship between inter-observer
difference in calculated risk and magnitude of calculated risk when using logistic regression
model LR2. The plot manifests a diamond shape, the differences being smallest for the lowest
and highest risks.
Povilas Sladkevicius and Lil Valentin. Inter-observer agreement in describing the ultrasound appearance of adnexal masses and in calculating the risk of malignancy using logistic regression models. Clin Cancer Res. Published OnlineFirst November 25, 2014. doi: 10.1158/1078-0432.CCR-14-0906
Acknowledgements
None
References
1. Granberg S, Norström A, Wikland M. Tumors in the lower pelvis as imaged by vaginal sonography. Gynecol Oncol 1990;37:224-9.
2. Benacerraf BR, Finkler NJ, Wojciechowski C, Knapp RC. Sonographic accuracy in the diagnosis of ovarian masses. J Reprod Med 1990;35:491-5.
3. Valentin L. Pattern recognition of pelvic masses by gray-scale ultrasound imaging: the contribution of Doppler ultrasound. Ultrasound Obstet Gynecol 1999;14:338-47.
4. Valentin L. Prospective cross-validation of Doppler ultrasound examination and gray-scale ultrasound imaging for discrimination of benign and malignant pelvic masses. Ultrasound Obstet Gynecol 1999;14:273-83.
5. Timmerman D, Schwärzler P, Collins WP, Claerhout F, Coenen M, Amant F, et al. Subjective assessment of adnexal masses with the use of ultrasonography: an analysis of interobserver variability and experience. Ultrasound Obstet Gynecol 1999;13:11-6.
6. Sokalska A, Timmerman D, Testa AC, Van Holsbeke C, Lissoni AA, Leone FPG, et al. Diagnostic accuracy of transvaginal ultrasound examination for assigning a specific diagnosis to adnexal masses. Ultrasound Obstet Gynecol 2009;34:462-70.
7. Valentin L. Use of morphology to characterize and manage common adnexal masses. Best Pract Res Clin Obstet Gynaecol 2004;18:71-89.
8. Tailor A, Jurkovic D, Bourne TH, Collins WP, Campbell S. Sonographic prediction of malignancy in adnexal masses using multivariate logistic regression analysis. Ultrasound Obstet Gynecol 1997;10:41-7.
9. Timmerman D, Bourne TH, Tailor A, Collins WP, Verrelst H, Vandenberghe K, et al. A comparison of methods for the preoperative discrimination between benign and malignant adnexal masses: the development of a new logistic regression model. Am J Obstet Gynecol 1999;181:57-65.
10. Alcazar JL, Jurado M. Prospective evaluation of logistic model based on sonographic morphologic and color Doppler findings developed to predict adnexal malignancy. J Ultrasound Med 1999;18:837-42.
11. Timmerman D, Valentin L, Bourne TH, Collins WP, Verrelst H, Vergote I. Terms, definitions and measurements to describe the sonographic features of adnexal tumors: a consensus opinion from the International Ovarian Tumor Analysis (IOTA) Group. Ultrasound Obstet Gynecol 2000;16:500-5.
12. Timmerman D, Testa AC, Bourne T, Ferrazzi E, Ameye L, Konstantinovic ML, et al. Logistic regression model to distinguish between the benign and malignant adnexal mass before surgery: a multicenter study by the International Ovarian Tumor Analysis Group. J Clin Oncol 2005;23:8794-801.
13. Kaijser J, Bourne T, Valentin L, Sayasneh A, Van Holsbeke C, Vergote I, et al. Improving strategies for diagnosing ovarian cancer: a summary of the International Ovarian Tumor Analysis (IOTA) studies. Ultrasound Obstet Gynecol 2013;41:9-20.
14. Timmerman D, Van Calster B, Testa AC, Guerriero S, Fischerova D, Lissoni AA, et al. Ovarian cancer prediction in adnexal masses using ultrasound-based logistic regression models: a temporal and external validation study by the IOTA group. Ultrasound Obstet Gynecol 2010;36:226-34.
15. Sladkevicius P, Valentin L. Intra- and inter-observer agreement when describing adnexal masses using the International Ovarian Tumour Analysis (IOTA) terms and definitions: a study on three-dimensional (3D) ultrasound volumes. Ultrasound Obstet Gynecol 2013;41:318-27.
16. Heintz APM, Odicino F, Maisonneuve P, Beller U, Benedet JL, Creasman WT, et al. Carcinoma of the Ovary. 25th Annual Report on the Results of Treatment in Gynecological Cancer. Int J Gynecol Obstet 2003;83(suppl 1):S135-S166.
17. Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas 1960;20:37–46.
18. Kundel HL, Polansky M. Measurement of observer agreement. Radiology 2003;228:303-8.
19. Brennan P, Silman A. Statistical methods for assessing observer variability in clinical measures. BMJ 1992;304:1491-4.
20. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986;1:307-10.
21. Bartlett JW, Frost C. Reliability, repeatability and reproducibility: analysis of measurement errors in continuous variables. Ultrasound Obstet Gynecol 2008;31:466-75.
22. Kottner J, Audigé L, Brorson S, Donner A, Gajewski BJ, Hróbjartsson A, et al. Guidelines for Reporting Reliability and Agreement Studies (GRRAS) were proposed. J Clin Epidemiol 2011;64:96-106.
23. Ruiz de Gauna B, Sanchez P, Pineda L, Utrilla-Layna J, Juez L, Alcázar JL. Inter-observer agreement with regard to describing adnexal masses using the IOTA simple rules in a real-time setting and when using three-dimensional ultrasound volumes and digital clips. Ultrasound Obstet Gynecol 2014;44:95-100.
24. Timmerman D, Testa AC, Bourne T, Ameye L, Jurkovic D, Van Holsbeke C, et al. Simple ultrasound-based rules for the diagnosis of ovarian cancer. Ultrasound Obstet Gynecol 2008;31:681-90.
25. Timmerman D, Ameye L, Fischerova D, Epstein E, Melis GB, Guerriero S, et al. Simple ultrasound rules to distinguish between benign and malignant adnexal masses before surgery: prospective validation by IOTA group. BMJ 2010;341:c6839.
26. Van Holsbeke C, Van Calster B, Testa AC, Domali E, Lu C, Van Huffel S, et al. Prospective internal validation of mathematical models to predict malignancy in adnexal masses: results from the international ovarian tumor analysis study. Clin Cancer Res 2009;15:684-91.
27. Nunes N, Yazbek J, Ambler G, Hoo W, Naftalin J, Jurkovic D. Prospective evaluation of the IOTA logistic regression model LR2 for the diagnosis of ovarian cancer. Ultrasound Obstet Gynecol 2012;40:355-9.
28. Nunes N, Ambler G, Hoo WL, Naftalin J, Foo X, Widschwendter M, et al. A prospective validation of the IOTA logistic regression models (LR1 and LR2) in comparison to subjective pattern recognition for the diagnosis of ovarian cancer. Int J Gynecol Cancer 2013;23:1583-9.
agreement in ultrasound assessment of ovarian masses 16
16
References
1. Granberg S, Norström A, Wikland M. Tumors in the lower pelvis as imaged by vaginal sonography. Gynecol Oncol 1990;37:224-9.
2. Benacerraf BR, Finkler NJ, Wojciechowski C, Knapp RC. Sonographic accuracy in the diagnosis of ovarian masses. J Reprod Med 1990;35:491-5.
3. Valentin L. Pattern recognition of pelvic masses by gray-scale ultrasound imaging: the contribution of Doppler ultrasound. Ultrasound Obstet Gynecol 1999;14:338-47.
4. Valentin L. Prospective cross-validation of Doppler ultrasound examination and gray-scale ultrasound imaging for discrimination of benign and malignant pelvic masses. Ultrasound Obstet Gynecol 1999;14:273-83.
5. Timmerman D, Schwärzler P, Collins WP, Claerhout F, Coenen M, Amant F, et al. Subjective assessment of adnexal masses with the use of ultrasonography: an analysis of interobserver variability and experience. Ultrasound Obstet Gynecol 1999;13:11-6.
6. Sokalska A, Timmerman D, Testa AC, Van Holsbeke C, Lissoni AA, Leone FPG, et al. Diagnostic accuracy of transvaginal ultrasound examination for assigning a specific diagnosis to adnexal masses. Ultrasound Obstet Gynecol 2009;34:462-70.
7. Valentin L. Use of morphology to characterize and manage common adnexal masses. Best Pract Res Clin Obstet Gynaecol 2004;18:71-89.
8. Tailor A, Jurkovic D, Bourne TH, Collins WP, Campbell S. Sonographic prediction of malignancy in adnexal masses using multivariate logistic regression analysis. Ultrasound Obstet Gynecol 1997;10:41-7.
9. Timmerman D, Bourne TH, Tailor A, Collins WP, Verrelst H, Vandenberghe K, et al. A comparison of methods for the preoperative discrimination between benign and malignant adnexal masses: the development of a new logistic regression model. Am J Obstet Gynecol 1999;181:57-65.
10. Alcazar JL, Jurado M. Prospective evaluation of logistic model based on sonographic morphologic and color Doppler findings developed to predict adnexal malignancy. J Ultrasound Med 1999;18:837-42.
11. Timmerman D, Valentin L, Bourne TH, Collins WP, Verrelst H, Vergote I. Terms, definitions and measurements to describe the sonographic features of adnexal tumors: a consensus opinion from the International Ovarian Tumor Analysis (IOTA) Group. Ultrasound Obstet Gynecol 2000;16:500-5.
12. Timmerman D, Testa AC, Bourne T, Ferrazzi E, Ameye L, Konstantinovic ML, et al. Logistic regression model to distinguish between the benign and malignant adnexal mass before surgery: a multicenter study by the International Ovarian Tumor Analysis Group. J Clin Oncol 2005;23:8794-801.
13. Kaijser J, Bourne T, Valentin L, Sayasneh A, Van Holsbeke C, Vergote I, et al. Improving strategies for diagnosing ovarian cancer: a summary of the International Ovarian Tumor Analysis (IOTA) studies. Ultrasound Obstet Gynecol 2013;41:9-20.
14. Timmerman D, Van Calster B, Testa AC, Guerriero S, Fischerova D, Lissoni AA, et al. Ovarian cancer prediction in adnexal masses using ultrasound-based logistic regression models: a temporal and external validation study by the IOTA group. Ultrasound Obstet Gynecol 2010;36:226-34.
15. Sladkevicius P, Valentin L. Intra- and inter-observer agreement when describing adnexal masses using the International Ovarian Tumour Analysis (IOTA) terms and definitions: a study on three-dimensional (3D) ultrasound volumes. Ultrasound Obstet Gynecol 2013;41:318-27.
16. Heintz APM, Odicino F, Maisonneuve P, Beller U, Benedet JL, Creasman WT, et al. Carcinoma of the Ovary. 25th Annual Report on the Results of Treatment in Gynecological Cancer. Int J Gynecol Obstet 2003;83(suppl 1):S135-S166.
17. Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas 1960;20:37-46.
18. Kundel HL, Polansky M. Measurement of observer agreement. Radiology 2003;228:303-8.
19. Brennan P, Silman A. Statistical methods for assessing observer variability in clinical measures. BMJ 1992;304:1491-4.
20. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986;1:307-10.
21. Bartlett JW, Frost C. Reliability, repeatability and reproducibility: analysis of measurement errors in continuous variables. Ultrasound Obstet Gynecol 2008;31:466-75.
22. Kottner J, Audigé L, Brorson S, Donner A, Gajewski BJ, Hróbjartsson A, et al. Guidelines for Reporting Reliability and Agreement Studies (GRRAS) were proposed. J Clin Epidemiol 2011;64:96-106.
23. Ruiz de Gauna B, Sanchez P, Pineda L, Utrilla-Layna J, Juez L, Alcázar JL. Inter-observer agreement with regard to describing adnexal masses using the IOTA simple rules in a real-time setting and when using three-dimensional ultrasound volumes and digital clips. Ultrasound Obstet Gynecol 2014;44:95-100.
24. Timmerman D, Testa AC, Bourne T, Ameye L, Jurkovic D, Van Holsbeke C, et al. Simple ultrasound-based rules for the diagnosis of ovarian cancer. Ultrasound Obstet Gynecol 2008;31:681-90.
25. Timmerman D, Ameye L, Fischerova D, Epstein E, Melis GB, Guerriero S, et al. Simple ultrasound rules to distinguish between benign and malignant adnexal masses before surgery: prospective validation by IOTA group. BMJ 2010;341:c6839.
26. Van Holsbeke C, Van Calster B, Testa AC, Domali E, Lu C, Van Huffel S, et al. Prospective internal validation of mathematical models to predict malignancy in adnexal masses: results from the international ovarian tumor analysis study. Clin Cancer Res 2009;15:684-91.
27. Nunes N, Yazbek J, Ambler G, Hoo W, Naftalin J, Jurkovic D. Prospective evaluation of the IOTA logistic regression model LR2 for the diagnosis of ovarian cancer. Ultrasound Obstet Gynecol 2012;40:355-9.
28. Nunes N, Ambler G, Hoo WL, Naftalin J, Foo X, Widschwendter M, et al. A prospective validation of the IOTA logistic regression models (LR1 and LR2) in comparison to subjective pattern recognition for the diagnosis of ovarian cancer. Int J Gynecol Cancer 2013;23:1583-9.
Table 1. Variables for which inter-observer reproducibility was estimated
_______________________________________________________________________
Variables included in logistic regression models LR1 and LR2, or of importance for these models
  Maximum diameter of adnexal mass: mm
  Maximum diameter of largest solid component: mm
  Maximum diameter of largest solid component: ≤50, >50 mm
  Presence of an entirely solid adnexal mass: yes, no
  Papillary projections present: yes, no
  Irregular internal cyst walls: yes, no
  Acoustic shadows: yes, no
  Color Doppler signals in papillary projection: yes, no
  Color score: 1, 2, 3, 4
  Tenderness of adnexal mass at scan: yes, no
  Ascites: yes, no
Other IOTA variables
 Continuous IOTA variables
  Mean diameter of adnexal mass: mm
  Mean diameter of largest solid component: mm
  Height of the largest papillary projection: mm
 Categorical IOTA variables
  Type of tumor: unilocular, unilocular solid, multilocular, multilocular solid, solid
  Mean diameter of adnexal mass: ≤40, 41-60, 61-80, 81-100 and >100 mm
  Number of cyst locules: 0, 1, 2, 3, 4, 5, 6-10, 11-20 and >20; ≤10, >10†
  Septum present: yes, no
  Incomplete septum present: yes, no
  Solid component: yes, no
  Papillary projections: 0, 1, 2, 3, ≥4; 1, 2, 3, ≥4‡; <4, >4†
  Echogenicity of cyst fluid: anechoic, low level, ground glass, mixed, no cyst fluid
  Color Doppler signals detectable: yes, no
  Ovarian crescent sign: yes, no
Diagnosis based on LR1 and LR2
  Calculated risk of malignancy using LR1: ≤10%, >10%
  Calculated risk of malignancy using LR2: ≤10%, >10%
___________________________________________________________________________
IOTA, International Ovarian Tumor Analysis group; LR1, logistic regression model 1; LR2, logistic regression model 2.

The risk of malignancy using LR1 is derived as

$$y = \frac{1}{1 + e^{-z}}$$

where

$$z = -6.7468 + 1.5985\,(a) - 0.9983\,(b) + 0.0326\,(c) + 0.00841\,(d) - 0.8577\,(e) + 1.5513\,(f) + 1.1737\,(g) + 0.9281\,(h) + 0.0496\,(i) + 1.1421\,(j) - 2.3550\,(k) + 0.4916\,(l)$$

and e is the mathematical constant and base value of natural logarithms. The 12 variables included in LR1 are:
  (a) personal history of ovarian cancer (yes = 1, no = 0)
  (b) current hormonal therapy (yes = 1, no = 0)
  (c) age of the patient (in years)
  (d) maximum diameter of the lesion (in millimeters)
  (e) mass tender at scan (yes = 1, no = 0)
  (f) the presence of ascites (yes = 1, no = 0)
  (g) the presence of blood flow within a solid papillary projection (yes = 1, no = 0)
  (h) the presence of a purely solid tumor (yes = 1, no = 0)
  (i) maximal diameter of the solid component (expressed in millimeters, but with no increase > 50 mm)
  (j) irregular internal cyst walls (yes = 1, no = 0)
  (k) the presence of acoustic shadows (yes = 1, no = 0)
  (l) color score (1, 2, 3, or 4)

The risk of malignancy using LR2 is derived as

$$y = \frac{1}{1 + e^{-z}}$$

where

$$z = -5.3718 + 0.0354\,(c) + 1.6159\,(f) + 1.1768\,(g) + 0.0697\,(i) + 0.9586\,(j) - 2.9486\,(k)$$

(12). A calculated risk of malignancy of more than 10% classified the adnexal mass as malignant.
† includes 0
‡ includes only cases where papillary projections were registered at both examinations
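The LR1 and LR2 formulas in the Table 1 footnote can be evaluated directly. The Python sketch below is illustrative only: the dictionary keys and function names are hypothetical labels introduced here, the coefficients are those printed in the footnote, and the example patient is invented. It assumes that "no increase > 50 mm" means that the solid-component diameter is capped at 50 before it enters the model.

```python
import math

# Coefficients as printed in the Table 1 footnote; letters refer to variables (a) to (l).
LR1_INTERCEPT = -6.7468
LR1_COEF = {
    "history_ovarian_cancer": 1.5985,        # (a) yes = 1, no = 0
    "hormonal_therapy": -0.9983,             # (b) yes = 1, no = 0
    "age_years": 0.0326,                     # (c)
    "max_diameter_lesion_mm": 0.00841,       # (d)
    "tender_at_scan": -0.8577,               # (e) yes = 1, no = 0
    "ascites": 1.5513,                       # (f) yes = 1, no = 0
    "flow_in_papillary_projection": 1.1737,  # (g) yes = 1, no = 0
    "purely_solid": 0.9281,                  # (h) yes = 1, no = 0
    "max_diameter_solid_mm": 0.0496,         # (i) capped at 50 mm
    "irregular_cyst_walls": 1.1421,          # (j) yes = 1, no = 0
    "acoustic_shadows": -2.3550,             # (k) yes = 1, no = 0
    "color_score": 0.4916,                   # (l) 1, 2, 3 or 4
}

LR2_INTERCEPT = -5.3718
LR2_COEF = {
    "age_years": 0.0354,                     # (c)
    "ascites": 1.6159,                       # (f)
    "flow_in_papillary_projection": 1.1768,  # (g)
    "max_diameter_solid_mm": 0.0697,         # (i) capped at 50 mm
    "irregular_cyst_walls": 0.9586,          # (j)
    "acoustic_shadows": -2.9486,             # (k)
}


def predicted_risk(intercept, coef, values):
    """Return y = 1 / (1 + e^-z) for one mass; `values` maps variable name to value."""
    x = dict(values)
    # Assumption: "no increase > 50 mm" is implemented as a cap at 50.
    x["max_diameter_solid_mm"] = min(x["max_diameter_solid_mm"], 50)
    z = intercept + sum(c * x[name] for name, c in coef.items())
    return 1.0 / (1.0 + math.exp(-z))


def is_malignant(risk, cutoff=0.10):
    """Apply the cutoff from the footnote: risk of more than 10% is classified as malignant."""
    return risk > cutoff


# Invented example patient (not taken from the study data).
example = {
    "history_ovarian_cancer": 0, "hormonal_therapy": 0, "age_years": 55,
    "max_diameter_lesion_mm": 70, "tender_at_scan": 0, "ascites": 0,
    "flow_in_papillary_projection": 0, "purely_solid": 0,
    "max_diameter_solid_mm": 20, "irregular_cyst_walls": 1,
    "acoustic_shadows": 0, "color_score": 2,
}
for label, b0, coef in (("LR1", LR1_INTERCEPT, LR1_COEF), ("LR2", LR2_INTERCEPT, LR2_COEF)):
    risk = predicted_risk(b0, coef, example)
    print(f"{label}: risk = {100 * risk:.1f}%, malignant = {is_malignant(risk)}")
```

Because LR2 uses only six of the twelve LR1 variables, the same input record can be passed to both models; keys that a model does not use are ignored.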
Table 2. Histological diagnoses of the masses
___________________________________________________________________________
Histological diagnosis of tumor             Number
___________________________________________________________________________
Benign tumors                               94
  Benign simple cyst                        7
  Endometrioma                              10
  Dermoid cyst                              16
  Serous cystadenoma                        16
  Mucinous cystadenoma                      18
  Myoma/fibroma                             9
  Cystadenofibroma                          11
  Paraovarian cyst                          5
  Sactosalpinx, chronic salpingitis         1
  Leydig cell tumor                         1
Borderline tumors                           4
  Serous                                    2
  Mucinous                                  1
  Endometrioid                              1
Invasive malignancy                         19
  Primary ovarian adenocarcinoma            13
  Granulosa cell tumor                      3
  Dysgerminoma                              1
  Leiomyosarcoma                            1
  Malignant aggressive B-cell lymphoma      1
___________________________________________________________________________
Table 3. Inter-observer reproducibility of continuous variables used in the risk calculation models LR1 and LR2, and of other continuous variables used to describe adnexal masses

Median and range: measurement results (both sonologists). Mean, 95% CI and limits of agreement: difference in mm between two measurements made by sonologists 1 and 2 [a]. Intra-CC: point estimate (95% CI).

Parameter                                             Median (n)     Range      Mean (n)       95% CI           Limits of agreement Intra-CC (95% CI)
Variables used in models LR1 and LR2
Maximum diameter of adnexal mass, mm                  70 (n=234)     10 to 313  3.80 (n=117)   1.12 to 6.48     -25.24 to 32.82     0.958 (0.937 to 0.971)
Maximum diameter of largest solid component, mm [b]   29.50 (n=122)  5 to 180   1.92 (n=61)    -1.74 to 5.58    -26.66 to 30.50     0.942 (0.905 to 0.964)
Other variables used to describe adnexal mass
Mean diameter of adnexal mass, mm                     58.5 (n=234)   9 to 240   1.05 (n=117)   -0.15 to 1.95    -8.61 to 10.72      0.971 (0.958 to 0.980)
Mean diameter of largest solid component, mm [b]      22 (n=122)     4 to 156   0.59 (n=61)    -1.82 to 2.98    -18.16 to 19.32     0.962 (0.937 to 0.977)
Height of largest papillary projection, mm [c]        8 (n=42)       3 to 25    -0.51 (n=21)   -2.93 to 1.91    -11.61 to 10.59     0.609 (0.245 to 0.821)

CI, confidence interval; Intra-CC, intra-class correlation coefficient.
[a] The measurements made by sonologist 2 were subtracted from the measurements made by sonologist 1.
[b] Includes only cases where solid components were registered by both observers.
[c] Includes only cases where papillary projections were registered by both observers. Using logarithmically transformed data, the results were as follows: mean difference 1.03 (95% CI, 0.78 to 1.36); limits of agreement 0.31 to 3.61; Intra-CC 0.547 (0.153 to 0.789). These results can be used for comparison with our published results on inter-observer agreement when analyzing ultrasound volumes obtained by three-dimensional ultrasound (15).
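The mean difference, its 95% CI and the limits of agreement reported in Table 3 follow the approach of Bland and Altman (20). A minimal Python sketch is given below, assuming the conventional definitions: limits of agreement equal the mean difference ± 1.96 SD of the differences, and the CI of the mean uses a normal approximation (the authors' software may have used a t distribution instead). The function name and the sample measurements are hypothetical.

```python
import math
import statistics


def bland_altman(observer1, observer2):
    """Summarise paired measurements as in a Bland-Altman agreement analysis.

    Differences are computed as observer 1 minus observer 2,
    matching footnote [a] of Table 3.
    """
    if len(observer1) != len(observer2):
        raise ValueError("both observers must measure the same cases")
    diffs = [a - b for a, b in zip(observer1, observer2)]
    n = len(diffs)
    mean_diff = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)   # sample SD (n - 1 in the denominator)
    se = sd / math.sqrt(n)         # standard error of the mean difference
    return {
        "n": n,
        "mean_difference": mean_diff,
        "ci95_mean": (mean_diff - 1.96 * se, mean_diff + 1.96 * se),
        "limits_of_agreement": (mean_diff - 1.96 * sd, mean_diff + 1.96 * sd),
    }


# Invented diameters in mm for four masses, not study data.
summary = bland_altman([10, 20, 30, 40], [9, 22, 29, 41])
print(summary)
```

Under these definitions the limits of agreement are symmetric around the mean difference, which is consistent with the rows of Table 3 (for example, the midpoint of -25.24 and 32.82 is 3.79, close to the reported mean of 3.80).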
Table 4. Inter-observer agreement for categorical variables included in the risk calculation models LR1 and LR2, and for other categorical variables used to describe adnexal masses

Variable                                                            Agreement, % (n/N)  Kappa value
Variables used in models LR1 and LR2
Presence of entirely solid tumor (yes, no)                          97 (114/117)        0.91
Maximum diameter of largest solid component (≤50 mm, >50 mm)        95 (111/117)        0.83
Maximum diameter of largest solid component (≤50 mm, >50 mm) [a]    92 (56/61)          0.82
Tenderness of adnexal mass (yes, no)                                97 (113/117)        0.78
Ascites (yes, no)                                                   97 (113/117)        0.73
Papillary projections (yes, no)                                     87 (102/117)        0.65
Solid component (yes, no)                                           84 (98/117)         0.67
Irregular cyst wall (yes, no)                                       79 (92/117)         0.56
Acoustic shadows (yes, no)                                          85 (99/117)         0.58
Color Doppler signals in papillary projections (yes, no) [b]        90 (105/117)        0.48
Color Doppler signals in papillary projections (yes, no) [c]        86 (18/21)          0.71
Color score (1, 2, 3, 4)                                            40 (47/117)         0.36 [d]
Other variables used to describe adnexal masses
Tumor type (unilocular, unilocular solid, multilocular,             97 (113/117)        0.70
  multilocular solid, solid)
Mean diameter of tumor, mm (≤40, 41-60, 61-80, 81-100, >100)        66 (77/117)         0.76 [d]
Number of locules (0, 1, 2, 3, 4, 5, 6-10, 11-20, >20)              58 (68/117)         0.79 [d]
  ≤10, >10 [e]                                                      94 (110/117)        0.80
Septum (yes, no)                                                    88 (103/117)        0.76
Incomplete septum (yes, no)                                         98 (115/117)        ¶
Number of papillary projections (0, 1, 2, 3, ≥4)                    81 (95/117)         0.64 [d]
  1, 2, 3, ≥4 [c]                                                   67 (14/21)          0.63 [d]
  <4, >4 [e]                                                        96 (112/117)        0.68
Echogenicity of cyst fluid (anechoic, low level, ground glass,      67 (78/117)         0.56
  mixed, no fluid)
Color Doppler signals detectable (yes, no)                          84 (98/117)         0.30
Ovarian crescent sign (yes, no)                                     78 (91/117)         0.49
Fluid in Douglas pouch (yes, no)                                    87 (102/117)        0.72
____________________________________________________________________________________________
[a] Includes only cases where solid components were registered at both examinations.
[b] This variable reflects disagreement both with regard to the presence of papillary projections and the presence of blood flow in papillary projections (i.e., if one observer recorded the presence of papillary projections and the other did not, there was disagreement with regard to this variable, and if both observers recorded the presence of papillary projections but disagreed with regard to the presence of blood flow in the papillary projections, there was disagreement with regard to this variable).
[c] Includes only cases where papillary projections were registered at both examinations.
[d] Weighted Kappa index.
[e] Includes 0.
¶ Absent kappa values are explained by asymmetric field tables, making it impossible to calculate them.
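Table 4 reports percentage agreement alongside Cohen's kappa (17), which corrects the observed agreement for the agreement expected by chance. The Python sketch below computes the unweighted kappa for two observers; it is an illustration with invented ratings and a hypothetical function name, not the authors' software, and it does not cover the weighted kappa marked [d].

```python
from collections import Counter


def cohen_kappa(ratings1, ratings2):
    """Unweighted Cohen's kappa for two observers rating the same cases."""
    if len(ratings1) != len(ratings2):
        raise ValueError("both observers must rate the same cases")
    n = len(ratings1)
    # Observed agreement: proportion of cases with identical ratings.
    p_observed = sum(a == b for a, b in zip(ratings1, ratings2)) / n
    # Chance agreement: product of the two observers' marginal proportions, per category.
    counts1, counts2 = Counter(ratings1), Counter(ratings2)
    categories = set(counts1) | set(counts2)
    p_chance = sum(counts1[c] * counts2[c] for c in categories) / (n * n)
    if p_chance == 1.0:
        return float("nan")  # both observers used a single, identical category
    return (p_observed - p_chance) / (1.0 - p_chance)


# Invented yes/no ratings for ten masses, not study data.
observer1 = ["yes"] * 4 + ["no"] * 6
observer2 = ["yes", "yes", "yes", "no", "no", "no", "no", "no", "no", "yes"]
agreement = sum(a == b for a, b in zip(observer1, observer2)) / len(observer1)
print(f"agreement = {agreement:.0%}, kappa = {cohen_kappa(observer1, observer2):.2f}")
```

In this invented example the observers agree on 8 of 10 cases, yet kappa is lower than the raw agreement because much of that agreement is expected by chance. The same effect explains rows in Table 4 where high percentage agreement coexists with a modest kappa.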
Table 5. Inter-observer reproducibility of the risk of malignancy calculated using the risk calculation models LR1 and LR2

Median and range: calculated risk of malignancy, % (both sonologists). Mean, 95% CI and limits of agreement: difference between the risk calculated by sonologist 1 and 2. Intra-CC: point estimate (95% CI).

Parameter                                 Median (n)     Range           Mean (n)       95% CI           Limits of agreement Intra-CC (95% CI)
Risk of malignancy calculated using LR1   7.85 (n=234)   0.10 to 99.10   -0.53 (n=117)  -3.07 to 2.01    -28.05 to 26.99     0.911 (0.874 to 0.937)
Risk of malignancy calculated using LR2   6.65 (n=234)   0.10 to 98.40   0.02 (n=117)   -3.06 to 3.10    -33.22 to 33.26     0.832 (0.766 to 0.880)

CI, confidence interval; Intra-CC, intra-class correlation coefficient.
Legends for figure
Figure 1. a) Scatterplot showing the relationship between inter-observer difference (observer
1 minus observer 2) in calculated risk and magnitude of calculated risk when using logistic
regression model LR1. The plot manifests a diamond shape, the differences being smallest for
the lowest and highest risks. For risks < 25 and > 95 the differences are very small.
LOA, limits of agreement. b) Scatterplot showing the relationship between inter-observer
difference in calculated risk and magnitude of calculated risk when using logistic regression
model LR2. The plot manifests a diamond shape, the differences being smallest for the lowest
and highest risks.
Povilas Sladkevicius and Lil Valentin. Inter-observer agreement in describing the ultrasound appearance of adnexal masses and in calculating the risk of malignancy using logistic regression models. Clin Cancer Res. Published OnlineFirst November 25, 2014.
doi: 10.1158/1078-0432.CCR-14-0906
Access the most recent version of this article at
Material
Supplementary
httpclincancerresaacrjournalsorgcontentsuppl201411261078-0432CCR-14-0906DC1
Access the most recent supplemental material at
Manuscript
Authoredited Author manuscripts have been peer reviewed and accepted for publication but have not yet been
E-mail alerts related to this article or journalSign up to receive free email-alerts
Subscriptions
Reprints and
pubsaacrorgDepartment at
To order reprints of this article or to subscribe to the journal contact the AACR Publications
Permissions
Rightslink site Click on Request Permissions which will take you to the Copyright Clearance Centers (CCC)
httpclincancerresaacrjournalsorgcontentearly201411251078-0432CCR-14-0906To request permission to re-use all or part of this article use this link
Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906
agreement in ultrasound assessment of ovarian masses 17
17
10 Alcazar JL Jurado M Prospective evaluation of logistic model based on sonographic
morphologic and color Doppler findings developed to predict adnexal malignancy J
Ultrasound Med 199918837-42
11 Timmerman D Valentin L Bourne TH Collins WP Verrelst H Vergote I Terms
definitions and measurements to describe the sonographic features of adnexal tumors a
consensus opinion from the International Ovarian Tumor Analysis (IOTA) Group
Ultrasound Obstet Gynecol 200016500-5
12 Timmerman D Testa AC Bourne T Ferrazzi E Ameye L Konstantinovic ML et al
Logistic regression model to distinguish between the benign and malignant adnexal
mass before surgery a multicenter study by the International Ovarian Tumor Analysis
Group J Clin Oncol 2005348794-801
13 Kaijser J Bourne T Valentin L Sayasneh A Van Holsbeke C Vergote I et al
Improving strategies for diagnosing ovarian cancer a summary of the International
Ovarian Tumor Analysis (IOTA) studies Ultrasound Obstet Gynecol 201341 9-20
14 Timmerman D Van Calster B Testa AC Guerriero S Fischerova D Lissoni AA et al
Ovarian cancer prediction in adnexal masses using ultrasound-based logistic regression
models a temporal and external validation study by the IOTA group Ultrasound Obstet
Gynecol 201036226-34
15 Sladkevicius P Valentin L Intra- and inter-observer agreement when describing
adnexal masses using the International Ovarian Tumour Analysis (IOTA) terms and
definitions a study on three-dimensional (3D) ultrasound volumes Ultrasound Obstet
Gynecol 201341318-27
16 Heintz APM Odicino F Maisonneuve P Beller U Benedet JL Creasman WT et al
Carcinoma of the Ovary 25th Annual Report on the Results of Treatment in
Gynecological Cancer Int J Gynecol Obstet 200383S135-S166 (suppl 1)
Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906
agreement in ultrasound assessment of ovarian masses 18
18
17 Cohen J A coefficient of agreement for nominal scales Educ Psychol Meas 196020
37ndash46
18 Kundel HL Polansky M Measurement of observer agreement Radiology 2003228
303-8
19 Brennan P Silman A Statistical methods for assessing observer variability in clinical
measures BMJ 1992304 1491-4
20 Bland JM Altman DG Statistical methods for assessing agreement between two
methods of clinical measurement Lancet 19861307-10
21 Bartlett JW Frost C Reliability repeatability and reproducibility analysis of
measurement errors in continuous variables Ultrasound Obstet Gynecol 200831466-
75
22 Kottner J Audigeacute L Brorson S Donner A Gajewski BJ Hroacutebjartsson A et al
Guidelines for Reporting Reliability and Agreement Studies (GRRAS) were proposed J
Clin Epidemiol 20116496-106
23 Ruiz de Gauna B Sanchez P Pineda L Utrilla-Layna J Juez L Alcaacutezar JLInter-
observer agreement with regard to describing adnexal masses using the IOTA simple
rules in a real-time setting and when using three-dimensional ultrasound volumes and
digital clips Ultrasound Obstet Gynecol 20144495-100
24 Timmerman D Testa AC Bourne T Ameye L Jurkovic D Van Holsbeke C et al
Simple ultrasound-based rules for the diagnosis of ovarian cancer Ultrasound Obstet
Gynecol 200831681-90
25 Timmerman D Ameye L Fischerova D Epstein E Melis GB Guerriero S et al
Simple ultrasound rules to distinguish between benign and malignant adnexal masses
before surgery prospective validation by IOTA group BMJ 2010341c6839
Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906
agreement in ultrasound assessment of ovarian masses 19
19
26 Van Holsbeke C Van Calster B Testa AC Domali E Lu C Van Huffel S et al
Prospective internal validation of mathematical models to predict malignancy in
adnexal masses results from the international ovarian tumor analysis study Clin Cancer
Res 200915684-91
27 Nunes N Yazbek J Ambler G Hoo W Naftalin J Jurkovic D Prospective evaluation
of the IOTA logistic regression model LR2 for the diagnosis of ovarian cancer
Ultrasound Obstet Gynecol 201240355-9
28 Nunes N Ambler G Hoo WL Naftalin J Foo X Widschwendter M et al
A prospective validation of the IOTA logistic regression models (LR1 and LR2) in
comparison to subjective pattern recognition for the diagnosis of ovarian cancer
Int J Gynecol Cancer 2013231583-9
Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906
Table 1 Variables for which inter-observer reproducibility was estimated _______________________________________________________________________ Variables included in logistic regression models LR1 and LR2 or of importance for these models Maximum diameter of adnexal mass mm Maximum diameter of largest solid component mm Maximum diameter of largest solid component le50 gt50 mm Presence of an entirely solid adnexal mass yes no Papillary projections present yes no Irregular internal cyst walls yes no Acoustic shadows yes no Color Doppler signals in papillary projection yes no Color score 1 2 3 4 Tenderness of adnexal mass at scan yes no Ascites yes no Other IOTA variables Continuous IOTA variables Mean diameter of adnexal mass mm Mean diameter of largest solid component mm Height of the largest papillary projection mm Categorical IOTA variables Type of tumor unilocular unilocular solid multilocular multilocular solid solid Mean diameter of adnexal mass le40 41-60 61-80 81-100 and gt100 mm Number of cyst locules 0 1 2 3 4 5 6-10 11-20 and gt20 le10 gt10dagger Septum present yes no Incomplete septum present yes no Solid component yes no Papillary projections 0 1 2 3 ge4 1 2 3 ge4Dagger lt4 gt4dagger Echogenicity of cyst fluid anechoic low level ground glass mixed no cyst fluid Color Doppler signals detectable yes no Ovarian crescent sign yes no Diagnosis based on LR1 and LR2 Calculated risk of malignancy using LR1 le10 gt10 Calculated risk of malignancy using LR2 le10 gt10 ___________________________________________________________________________ IOTA international ovarian tumor analysis group LR1 logistic regression model 1 LR2 logistic regression model 2 The risk of malignancy using the LR1 is derived as y =1(1+e-z) where z = -67468 + 15985 (a) ndash 09983 (b) + 00326 (c) + 000841 (d) ndash 08577 (e) + 15513 (f) + 11737 (g) + 09281 (h) + 00496 (i) + 11421 (j) ndash 23550 (k) + 04916 (l) and e is the mathematical constant and base value of natural logarithms The 12 
variables included in LR1 are (a) personal history of ovarian cancer (yes = 1 no = 0) (b) current hormonal therapy (yes = 1 no = 0) (c) age of the patient (in years) (d) maximum diameter of the lesion (in millimeters) (e) mass tender at scan (yes = 1 no = 0) (f) the presence of ascites (yes = 1 no = 0) (g) the presence of blood flow within a solid papillary projection (yes = 1 no = 0) (h) the presence of a purely solid tumor (yes = 1 no = 0) (i) maximal diameter of the solid component (expressed in millimeters but with no increase gt 50 mm) (j) irregular internal cyst walls (yes = 1 no = 0) (k) the presence of acoustic shadows (yes = 1 no = 0) and (l) color score (1 2 3 or 4) The risk of malignancy using LR2 is derived as y = 1(1 + e-z) where z = - 53718 + 00354 (c) + 16159 (f) + 11768 (g) + 00697 (i) + 09586 (j) - 29486 (k)12 A calculated risk of malignancy of more than 10 classified the adnexal mass as malignant dagger includes 0 Dagger includes only cases where papillary projections were registered at both examinations
Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906
Table 2 Histological diagnoses of the masses
___________________________________________________________________________ Histological diagnosis of tumor Number ___________________________________________________________________________ Benign tumors 94
Benign simple cyst 7
Endometrioma 10
Dermoid cyst 16
Serous cystadenoma 16
Mucinous cystadenoma 18
Myomafibroma 9
Cystadenofibroma 11
Paraovarian cyst 5
Sactosalpinx chronic salpingitis 1
Leydig cell tumor 1
Borderline tumors 4
Serous 2
Mucinous 1
Endometrioid 1
Invasive malignancy 19
Primary ovarian adenocarcinoma 13
Granulosa cell tumor 3
Dysgerminoma 1
Leiomyosarcoma 1
Malignant aggressive B-cell lymphoma 1
___________________________________________________________________________
Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906
Table 3 Inter-observer reproducibility of continuous variables used in the risk calculation models LR1 and LR2 and of other continuous variables
used to describe adnexal masses
Measurement results
(both sonologists)
Difference in mm between two measurements
made by sonologists 1 and 2a
Intra-CC
Parameter Median
Range
Mean 95 CI Limits of
agreement
Point estimate
(95 CI)
Variables used in
models LR1 and LR2
Maximum diameter of
adnexal mass mm
70 (n=234)
10 ndash 313
380 (n=117)
112 ndash 648
-2524 ndash 3282
0958 (0937 ndash 0971)
Maximum diameter of
largest solid component
mmb
2950 (n=122)
5 ndash 180
192 (n=61)
-174 ndash 558
-2666 ndash 3050
0942 (0905 ndash 0-964)
Other variables used to
describe adnexal mass
Mean diameter
of adnexal mass mm
585 (n=234)
9 ndash 240
105 (n=117)
-015 ndash 195
-861 ndash 1072
0971 (0958 ndash 0980)
Mean diameter
of largest solid
component mmb
22 (n=122)
4 ndash 156
059 (n=61)
-182 ndash 298
-1816 ndash 1932
0962 (0937 ndash 0977)
Height of largest papillary
projection mmc
8 (n=42)
3 ndash 25
-051 (n=21)
-293 ndash 191
-1161 ndash 1059
0609 (0245 ndash 0821)
a the measurements made by sonologist 2 were subtracted from the measurements made by sonologist 1
CI confidence interval Intra-CC intra-class correlation coefficient bincludes only cases where solid components were registered by both observers
c includes only cases where papillary projections were registered by both observers using logarithmically transformed data the results were as
follows mean difference 103 (95CI 078 ndash 136) limits of agreement 031 ndash 361 Intra-CC 0547 (0153 ndash 0789) these results can be used for
comparison with our published results on inter-observer agreement when analyzing ultrasound volumes obtained by three-dimensional
ultrasound15
Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906
Table 4 Inter-observer agreement for categorical variables included in the risk calculation models LR1 and LR2 and for other categorical variables used to describe adnexal masses
Agreement Kappa value
Variables used in models LR1 and LR2 Presence of entirely solid tumor (yes no) 97 (114117) 091 Maximum diameter of largest solid component (le50 mm gt50 mm) 95 (111117) 083 Maximum diameter of largest solid component (le50 mm gt50 mm)a 92 (5661) 082 Tenderness of adnexal mass (yes no) 97 (113117) 078 Ascites (yes no) 97 (113117) 073 Papillary projections (yes no) 87 (102117) 065 Solid component (yes no) 84 (98117) 067 Irregular cyst wall (yes no) 79 (92117) 056 Acoustic shadows (yes no) 85 (99117) 058 Color Doppler signals in papillary projections (yes no)b 90 (105117) 048 Color Doppler signals in papillary projections (yes no)c 86 (1821) 071 Color score (1 2 3 4) 40 (47117) 036d Other variables used to describe adnexal masses Tumor type (unilocular unilocular solid multilocular multilocular solid solid) 97 (113117) 070 Mean diameter of tumor (mm) le40 41-60 61-80 81-100 gt100 66 (77117) 076d Number of locules 0 1 2 3 4 5 6-10 11-20 gt20 58 (68117) 079d
le10 gt10e 94 (110117) 080 Septum (yes no) 88 (103117) 076 Incomplete septum (yes no) 98 (115117) para Number of papillary projections 0 1 2 3 ge4 81 (95117) 064d 1 2 3 ge4c 67 (1421) 063d lt4 gt4e 96 (112117) 068 Echogenicity of cyst fluid (anechoic low level ground glass mixed no fluid) 67 (78117) 056 Color Doppler signals detectable (yes no) 84 (98117) 030 Ovarian crescent sign (yes no) 78 (91117) 049 Fluid in Douglas pouch (yes no) 87 (102117) 072 ____________________________________________________________________________________________ a includes only cases where solid components were registered at both examinations b This variable reflects disagreement both with regard to the presence of papillary projections and the presence of blood flow in papillary projection (ie if one observer recorded the presence of papillary projections and the other not there was disagreement with regard to this variable and if both observes recorded the presence of papillary projections but disagreed with regard to the presence of blood flow in the papillary projections there was disagreement with regard to this variable) c includes only cases where papillary projections were registered at both examinations d weighted Kappa index e includes 0 para absent kappa values are explained by asymmetric field tables making it impossible to calculate them
17. Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas 1960;20:37-46.
18. Kundel HL, Polansky M. Measurement of observer agreement. Radiology 2003;228:303-8.
19. Brennan P, Silman A. Statistical methods for assessing observer variability in clinical measures. BMJ 1992;304:1491-4.
20. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986;1:307-10.
21. Bartlett JW, Frost C. Reliability, repeatability and reproducibility: analysis of measurement errors in continuous variables. Ultrasound Obstet Gynecol 2008;31:466-75.
22. Kottner J, Audigé L, Brorson S, Donner A, Gajewski BJ, Hróbjartsson A, et al. Guidelines for Reporting Reliability and Agreement Studies (GRRAS) were proposed. J Clin Epidemiol 2011;64:96-106.
23. Ruiz de Gauna B, Sanchez P, Pineda L, Utrilla-Layna J, Juez L, Alcázar JL. Inter-observer agreement with regard to describing adnexal masses using the IOTA simple rules in a real-time setting and when using three-dimensional ultrasound volumes and digital clips. Ultrasound Obstet Gynecol 2014;44:95-100.
24. Timmerman D, Testa AC, Bourne T, Ameye L, Jurkovic D, Van Holsbeke C, et al. Simple ultrasound-based rules for the diagnosis of ovarian cancer. Ultrasound Obstet Gynecol 2008;31:681-90.
25. Timmerman D, Ameye L, Fischerova D, Epstein E, Melis GB, Guerriero S, et al. Simple ultrasound rules to distinguish between benign and malignant adnexal masses before surgery: prospective validation by IOTA group. BMJ 2010;341:c6839.
26. Van Holsbeke C, Van Calster B, Testa AC, Domali E, Lu C, Van Huffel S, et al. Prospective internal validation of mathematical models to predict malignancy in adnexal masses: results from the international ovarian tumor analysis study. Clin Cancer Res 2009;15:684-91.
27. Nunes N, Yazbek J, Ambler G, Hoo W, Naftalin J, Jurkovic D. Prospective evaluation of the IOTA logistic regression model LR2 for the diagnosis of ovarian cancer. Ultrasound Obstet Gynecol 2012;40:355-9.
28. Nunes N, Ambler G, Hoo WL, Naftalin J, Foo X, Widschwendter M, et al. A prospective validation of the IOTA logistic regression models (LR1 and LR2) in comparison to subjective pattern recognition for the diagnosis of ovarian cancer. Int J Gynecol Cancer 2013;23:1583-9.
Table 1. Variables for which inter-observer reproducibility was estimated
___________________________________________________________________________
Variables included in logistic regression models LR1 and LR2, or of importance for these models
  Maximum diameter of adnexal mass (mm)
  Maximum diameter of largest solid component (mm)
  Maximum diameter of largest solid component (≤50, >50 mm)
  Presence of an entirely solid adnexal mass (yes, no)
  Papillary projections present (yes, no)
  Irregular internal cyst walls (yes, no)
  Acoustic shadows (yes, no)
  Color Doppler signals in papillary projection (yes, no)
  Color score (1, 2, 3, 4)
  Tenderness of adnexal mass at scan (yes, no)
  Ascites (yes, no)
Other IOTA variables
  Continuous IOTA variables
    Mean diameter of adnexal mass (mm)
    Mean diameter of largest solid component (mm)
    Height of the largest papillary projection (mm)
  Categorical IOTA variables
    Type of tumor (unilocular, unilocular solid, multilocular, multilocular solid, solid)
    Mean diameter of adnexal mass (≤40, 41-60, 61-80, 81-100 and >100 mm)
    Number of cyst locules (0, 1, 2, 3, 4, 5, 6-10, 11-20 and >20; ≤10, >10†)
    Septum present (yes, no)
    Incomplete septum present (yes, no)
    Solid component (yes, no)
    Papillary projections (0, 1, 2, 3, ≥4; 1, 2, 3, ≥4‡; <4, >4†)
    Echogenicity of cyst fluid (anechoic, low level, ground glass, mixed, no cyst fluid)
    Color Doppler signals detectable (yes, no)
    Ovarian crescent sign (yes, no)
Diagnosis based on LR1 and LR2
  Calculated risk of malignancy using LR1 (≤10%, >10%)
  Calculated risk of malignancy using LR2 (≤10%, >10%)
___________________________________________________________________________
IOTA, international ovarian tumor analysis group; LR1, logistic regression model 1; LR2, logistic regression model 2.

The risk of malignancy using LR1 is derived as $y = 1/(1 + e^{-z})$, where
$z = -6.7468 + 1.5985\,(a) - 0.9983\,(b) + 0.0326\,(c) + 0.00841\,(d) - 0.8577\,(e) + 1.5513\,(f) + 1.1737\,(g) + 0.9281\,(h) + 0.0496\,(i) + 1.1421\,(j) - 2.3550\,(k) + 0.4916\,(l)$
and $e$ is the mathematical constant and base value of natural logarithms. The 12 variables included in LR1 are:
  (a) personal history of ovarian cancer (yes = 1, no = 0)
  (b) current hormonal therapy (yes = 1, no = 0)
  (c) age of the patient (in years)
  (d) maximum diameter of the lesion (in millimeters)
  (e) mass tender at scan (yes = 1, no = 0)
  (f) the presence of ascites (yes = 1, no = 0)
  (g) the presence of blood flow within a solid papillary projection (yes = 1, no = 0)
  (h) the presence of a purely solid tumor (yes = 1, no = 0)
  (i) maximal diameter of the solid component (expressed in millimeters, but with no increase > 50 mm)
  (j) irregular internal cyst walls (yes = 1, no = 0)
  (k) the presence of acoustic shadows (yes = 1, no = 0)
  (l) color score (1, 2, 3, or 4)

The risk of malignancy using LR2 is derived as $y = 1/(1 + e^{-z})$, where
$z = -5.3718 + 0.0354\,(c) + 1.6159\,(f) + 1.1768\,(g) + 0.0697\,(i) + 0.9586\,(j) - 2.9486\,(k)$ (12).

A calculated risk of malignancy of more than 10% classified the adnexal mass as malignant.
† includes 0
‡ includes only cases where papillary projections were registered at both examinations
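The two formulas in the Table 1 footnote can be evaluated with a few lines of code. The following is only a minimal sketch, not the authors' software: the dictionary keys, the function names `malignancy_risk` and `classify`, the label "benign" for masses at or below the cut-off, and the example patient are all invented for illustration. The coefficients are read from the footnote on the assumption that the extracted text lost its decimal points, so they should be checked against the original publication before any use.

```python
import math

# Coefficients transcribed from the Table 1 footnote. Decimal placement is an
# assumption (the extracted text had lost its decimal points).
LR1_INTERCEPT = -6.7468
LR1_COEFFICIENTS = {
    "history_ovarian_cancer": 1.5985,   # (a) yes = 1, no = 0
    "hormonal_therapy": -0.9983,        # (b) yes = 1, no = 0
    "age_years": 0.0326,                # (c)
    "max_lesion_diameter_mm": 0.00841,  # (d)
    "tender_at_scan": -0.8577,          # (e) yes = 1, no = 0
    "ascites": 1.5513,                  # (f) yes = 1, no = 0
    "papillary_flow": 1.1737,           # (g) yes = 1, no = 0
    "purely_solid": 0.9281,             # (h) yes = 1, no = 0
    "max_solid_diameter_mm": 0.0496,    # (i) no increase above 50 mm
    "irregular_walls": 1.1421,          # (j) yes = 1, no = 0
    "acoustic_shadows": -2.3550,        # (k) yes = 1, no = 0
    "color_score": 0.4916,              # (l) 1, 2, 3 or 4
}

LR2_INTERCEPT = -5.3718
LR2_COEFFICIENTS = {
    "age_years": 0.0354,                # (c)
    "ascites": 1.6159,                  # (f)
    "papillary_flow": 1.1768,           # (g)
    "max_solid_diameter_mm": 0.0697,    # (i)
    "irregular_walls": 0.9586,          # (j)
    "acoustic_shadows": -2.9486,        # (k)
}

SOLID_DIAMETER_CAP_MM = 50  # footnote: "no increase > 50 mm"


def malignancy_risk(intercept, coefficients, findings):
    """Return y = 1 / (1 + exp(-z)) for one set of findings."""
    values = dict(findings)
    values["max_solid_diameter_mm"] = min(
        values["max_solid_diameter_mm"], SOLID_DIAMETER_CAP_MM
    )
    z = intercept + sum(w * values[name] for name, w in coefficients.items())
    return 1.0 / (1.0 + math.exp(-z))


def classify(risk, threshold=0.10):
    """A risk of more than 10% classifies the mass as malignant."""
    return "malignant" if risk > threshold else "benign"


# Hypothetical examination, invented purely for illustration.
example = {
    "history_ovarian_cancer": 0, "hormonal_therapy": 0, "age_years": 55,
    "max_lesion_diameter_mm": 70, "tender_at_scan": 0, "ascites": 0,
    "papillary_flow": 1, "purely_solid": 0, "max_solid_diameter_mm": 30,
    "irregular_walls": 1, "acoustic_shadows": 0, "color_score": 3,
}
lr1 = malignancy_risk(LR1_INTERCEPT, LR1_COEFFICIENTS, example)
lr2 = malignancy_risk(LR2_INTERCEPT, LR2_COEFFICIENTS, example)
print(f"LR1: {lr1:.3f} ({classify(lr1)}), LR2: {lr2:.3f} ({classify(lr2)})")
```

Because both models share one function and differ only in their coefficient tables, two observers' findings for the same mass can be passed through the same code and the resulting risks compared directly.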
Table 2. Histological diagnoses of the masses
___________________________________________________________________________
Histological diagnosis of tumor           Number
___________________________________________________________________________
Benign tumors                             94
  Benign simple cyst                      7
  Endometrioma                            10
  Dermoid cyst                            16
  Serous cystadenoma                      16
  Mucinous cystadenoma                    18
  Myoma/fibroma                           9
  Cystadenofibroma                        11
  Paraovarian cyst                        5
  Sactosalpinx chronic salpingitis        1
  Leydig cell tumor                       1
Borderline tumors                         4
  Serous                                  2
  Mucinous                                1
  Endometrioid                            1
Invasive malignancy                       19
  Primary ovarian adenocarcinoma          13
  Granulosa cell tumor                    3
  Dysgerminoma                            1
  Leiomyosarcoma                          1
  Malignant aggressive B-cell lymphoma    1
___________________________________________________________________________
Table 3. Inter-observer reproducibility of continuous variables used in the risk calculation models LR1 and LR2 and of other continuous variables used to describe adnexal masses

Median and range: measurement results (both sonologists). Mean diff., 95% CI and limits of agreement: difference in mm between two measurements made by sonologists 1 and 2ᵃ. Intra-CC: point estimate (95% CI).
___________________________________________________________________________
Parameter                                             Median (n)    Range       Mean diff. (n)   95% CI          Limits of agreement Intra-CC (95% CI)
___________________________________________________________________________
Variables used in models LR1 and LR2
  Maximum diameter of adnexal mass, mm                70 (n=234)    10 to 313   3.80 (n=117)     1.12 to 6.48    -25.24 to 32.82     0.958 (0.937 to 0.971)
  Maximum diameter of largest solid component, mmᵇ    29.50 (n=122) 5 to 180    1.92 (n=61)      -1.74 to 5.58   -26.66 to 30.50     0.942 (0.905 to 0.964)
Other variables used to describe adnexal mass
  Mean diameter of adnexal mass, mm                   58.5 (n=234)  9 to 240    1.05 (n=117)     -0.15 to 1.95   -8.61 to 10.72      0.971 (0.958 to 0.980)
  Mean diameter of largest solid component, mmᵇ       22 (n=122)    4 to 156    0.59 (n=61)      -1.82 to 2.98   -18.16 to 19.32     0.962 (0.937 to 0.977)
  Height of largest papillary projection, mmᶜ         8 (n=42)      3 to 25     -0.51 (n=21)     -2.93 to 1.91   -11.61 to 10.59     0.609 (0.245 to 0.821)
___________________________________________________________________________
CI, confidence interval; Intra-CC, intra-class correlation coefficient.
ᵃ the measurements made by sonologist 2 were subtracted from the measurements made by sonologist 1
ᵇ includes only cases where solid components were registered by both observers
ᶜ includes only cases where papillary projections were registered by both observers; using logarithmically transformed data the results were as follows: mean difference 1.03 (95% CI, 0.78 to 1.36), limits of agreement 0.31 to 3.61, Intra-CC 0.547 (0.153 to 0.789); these results can be used for comparison with our published results on inter-observer agreement when analyzing ultrasound volumes obtained by three-dimensional ultrasound (15)
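The mean difference, its 95% CI and the limits of agreement reported in Table 3 follow the Bland and Altman approach cited in the reference list (20). The sketch below shows one way such figures can be computed from paired measurements. It assumes the normal-approximation factor 1.96 for both the CI and the limits of agreement, which may differ from the exact procedure the authors used, and the function name `bland_altman` and the sample measurements are invented for illustration.

```python
import math
import statistics


def bland_altman(observer1, observer2):
    """Mean difference, its 95% CI and the 95% limits of agreement.

    Differences are taken as observer 1 minus observer 2, matching the
    direction stated in the Table 3 footnote.
    """
    if len(observer1) != len(observer2) or len(observer1) < 2:
        raise ValueError("need two equally long series of at least 2 values")
    diffs = [a - b for a, b in zip(observer1, observer2)]
    mean_diff = statistics.fmean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD of the differences
    half_ci = 1.96 * sd_diff / math.sqrt(len(diffs))
    half_loa = 1.96 * sd_diff
    return {
        "mean": mean_diff,
        "ci95": (mean_diff - half_ci, mean_diff + half_ci),
        "loa": (mean_diff - half_loa, mean_diff + half_loa),
    }


# Made-up diameters in mm, not data from the study.
sonologist1 = [10, 20, 30, 40]
sonologist2 = [8, 21, 27, 41]
result = bland_altman(sonologist1, sonologist2)
print(result)
```

Under this approach the limits of agreement are always centred on the mean difference, so the midpoint of each reported interval should equal the reported mean.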
Table 4. Inter-observer agreement for categorical variables included in the risk calculation models LR1 and LR2 and for other categorical variables used to describe adnexal masses
____________________________________________________________________________________________
Variable                                                                              Agreement, % (n/N)  Kappa value
____________________________________________________________________________________________
Variables used in models LR1 and LR2
  Presence of entirely solid tumor (yes, no)                                          97 (114/117)        0.91
  Maximum diameter of largest solid component (≤50 mm, >50 mm)                        95 (111/117)        0.83
  Maximum diameter of largest solid component (≤50 mm, >50 mm)ᵃ                       92 (56/61)          0.82
  Tenderness of adnexal mass (yes, no)                                                97 (113/117)        0.78
  Ascites (yes, no)                                                                   97 (113/117)        0.73
  Papillary projections (yes, no)                                                     87 (102/117)        0.65
  Solid component (yes, no)                                                           84 (98/117)         0.67
  Irregular cyst wall (yes, no)                                                       79 (92/117)         0.56
  Acoustic shadows (yes, no)                                                          85 (99/117)         0.58
  Color Doppler signals in papillary projections (yes, no)ᵇ                           90 (105/117)        0.48
  Color Doppler signals in papillary projections (yes, no)ᶜ                           86 (18/21)          0.71
  Color score (1, 2, 3, 4)                                                            40 (47/117)         0.36ᵈ
Other variables used to describe adnexal masses
  Tumor type (unilocular, unilocular solid, multilocular, multilocular solid, solid)  97 (113/117)        0.70
  Mean diameter of tumor, mm (≤40, 41-60, 61-80, 81-100, >100)                        66 (77/117)         0.76ᵈ
  Number of locules (0, 1, 2, 3, 4, 5, 6-10, 11-20, >20)                              58 (68/117)         0.79ᵈ
    (≤10, >10)ᵉ                                                                       94 (110/117)        0.80
  Septum (yes, no)                                                                    88 (103/117)        0.76
  Incomplete septum (yes, no)                                                         98 (115/117)        ¶
  Number of papillary projections (0, 1, 2, 3, ≥4)                                    81 (95/117)         0.64ᵈ
    (1, 2, 3, ≥4)ᶜ                                                                    67 (14/21)          0.63ᵈ
    (<4, >4)ᵉ                                                                         96 (112/117)        0.68
  Echogenicity of cyst fluid (anechoic, low level, ground glass, mixed, no fluid)     67 (78/117)         0.56
  Color Doppler signals detectable (yes, no)                                          84 (98/117)         0.30
  Ovarian crescent sign (yes, no)                                                     78 (91/117)         0.49
  Fluid in Douglas pouch (yes, no)                                                    87 (102/117)        0.72
____________________________________________________________________________________________
ᵃ includes only cases where solid components were registered at both examinations
ᵇ This variable reflects disagreement both with regard to the presence of papillary projections and the presence of blood flow in papillary projections (i.e., if one observer recorded the presence of papillary projections and the other did not, there was disagreement with regard to this variable; and if both observers recorded the presence of papillary projections but disagreed with regard to the presence of blood flow in the papillary projections, there was disagreement with regard to this variable)
ᶜ includes only cases where papillary projections were registered at both examinations
ᵈ weighted Kappa index
ᵉ includes 0
¶ absent kappa values are explained by asymmetric field tables making it impossible to calculate them
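Table 4 reports both raw agreement and Cohen's kappa (reference 17), and the two can diverge: a variable with high percentage agreement may still have a modest kappa when one category dominates. The sketch below illustrates the unweighted statistic only; the weighted kappa used for the ordinal variables marked ᵈ is not covered. The function name `cohen_kappa` and the ratings are made up for illustration and are not the authors' code or data.

```python
from collections import Counter


def cohen_kappa(rater1, rater2):
    """Unweighted Cohen's kappa for two raters scoring the same cases."""
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("need two equally long, non-empty rating lists")
    n = len(rater1)
    # Observed proportion of cases on which the raters agree.
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    counts1, counts2 = Counter(rater1), Counter(rater2)
    categories = counts1.keys() | counts2.keys()
    expected = sum(counts1[c] * counts2[c] for c in categories) / n ** 2
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Made-up yes/no ratings for four masses, not data from the study.
observer1 = ["yes", "yes", "no", "no"]
observer2 = ["yes", "no", "no", "no"]
kappa = cohen_kappa(observer1, observer2)
print(f"agreement 3/4, kappa {kappa:.2f}")
```

In this toy case the observers agree on 3 of 4 masses (75%), but half of that agreement is expected by chance, which is why kappa is lower than the raw agreement.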
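Table 5. Inter-observer reproducibility of the risk of malignancy calculated using the risk calculation models LR1 and LR2

Median and range: calculated risk of malignancy (both sonologists). Mean diff., 95% CI and limits of agreement: difference between the risk calculated by sonologist 1 and 2. Intra-CC: point estimate (95% CI).
___________________________________________________________________________
Parameter                                   Median (n)    Range           Mean diff. (n)  95% CI          Limits of agreement Intra-CC (95% CI)
___________________________________________________________________________
Risk of malignancy calculated using LR1     7.85 (n=234)  0.10 to 99.10   -0.53 (n=117)   -3.07 to 2.01   -28.05 to 26.99     0.911 (0.874 to 0.937)
Risk of malignancy calculated using LR2     6.65 (n=234)  0.10 to 98.40   0.02 (n=117)    -3.06 to 3.10   -33.22 to 33.26     0.832 (0.766 to 0.880)
___________________________________________________________________________
CI, confidence interval; Intra-CC, intra-class correlation coefficient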
Legends for figure
Figure 1. a) Scatterplot showing the relationship between inter-observer difference (observer
1 minus observer 2) in calculated risk and magnitude of calculated risk when using logistic
regression model LR1. The plot manifests a diamond shape, the differences being smallest for
the lowest and highest risks. For risks < 25 and > 95 the differences are very small.
LOA, limits of agreement. b) Scatterplot showing the relationship between inter-observer
difference in calculated risk and magnitude of calculated risk when using logistic regression
model LR2. The plot manifests a diamond shape, the differences being smallest for the lowest
and highest risks.
Published OnlineFirst November 25, 2014; Clin Cancer Res
Povilas Sladkevicius and Lil Valentin
Inter-observer agreement in describing the ultrasound appearance of adnexal masses and in calculating the risk of malignancy using logistic regression models
doi: 10.1158/1078-0432.CCR-14-0906
Povilas Sladkevicius and Lil Valentin. Inter-observer agreement in describing the ultrasound appearance of adnexal masses and in calculating the risk of malignancy using logistic regression models. Clin Cancer Res. Published OnlineFirst November 25, 2014.
Updated version: access the most recent version of this article at doi: 10.1158/1078-0432.CCR-14-0906
Supplementary material: access the most recent supplemental material at http://clincancerres.aacrjournals.org/content/suppl/2014/11/26/1078-0432.CCR-14-0906.DC1
Table 1. Variables for which inter-observer reproducibility was estimated
___________________________________________________________________________
Variables included in logistic regression models LR1 and LR2 or of importance for these models
    Maximum diameter of adnexal mass, mm
    Maximum diameter of largest solid component, mm
    Maximum diameter of largest solid component: ≤50, >50 mm
    Presence of an entirely solid adnexal mass: yes, no
    Papillary projections present: yes, no
    Irregular internal cyst walls: yes, no
    Acoustic shadows: yes, no
    Color Doppler signals in papillary projection: yes, no
    Color score: 1, 2, 3, 4
    Tenderness of adnexal mass at scan: yes, no
    Ascites: yes, no
Other IOTA variables
  Continuous IOTA variables
    Mean diameter of adnexal mass, mm
    Mean diameter of largest solid component, mm
    Height of the largest papillary projection, mm
  Categorical IOTA variables
    Type of tumor: unilocular, unilocular solid, multilocular, multilocular solid, solid
    Mean diameter of adnexal mass: ≤40, 41-60, 61-80, 81-100 and >100 mm
    Number of cyst locules: 0, 1, 2, 3, 4, 5, 6-10, 11-20 and >20; ≤10, >10†
    Septum present: yes, no
    Incomplete septum present: yes, no
    Solid component: yes, no
    Papillary projections: 0, 1, 2, 3, ≥4; 1, 2, 3, ≥4‡; <4, >4†
    Echogenicity of cyst fluid: anechoic, low level, ground glass, mixed, no cyst fluid
    Color Doppler signals detectable: yes, no
    Ovarian crescent sign: yes, no
Diagnosis based on LR1 and LR2
    Calculated risk of malignancy using LR1: ≤10%, >10%
    Calculated risk of malignancy using LR2: ≤10%, >10%
___________________________________________________________________________
IOTA, International Ovarian Tumor Analysis group; LR1, logistic regression model 1; LR2, logistic regression model 2.

The risk of malignancy using LR1 is derived as $y = 1/(1 + e^{-z})$, where

$$z = -6.7468 + 1.5985\,(a) - 0.9983\,(b) + 0.0326\,(c) + 0.00841\,(d) - 0.8577\,(e) + 1.5513\,(f) + 1.1737\,(g) + 0.9281\,(h) + 0.0496\,(i) + 1.1421\,(j) - 2.3550\,(k) + 0.4916\,(l)$$

and $e$ is the mathematical constant and base value of natural logarithms. The 12 variables included in LR1 are:
    (a) personal history of ovarian cancer (yes = 1, no = 0)
    (b) current hormonal therapy (yes = 1, no = 0)
    (c) age of the patient (in years)
    (d) maximum diameter of the lesion (in millimeters)
    (e) mass tender at scan (yes = 1, no = 0)
    (f) the presence of ascites (yes = 1, no = 0)
    (g) the presence of blood flow within a solid papillary projection (yes = 1, no = 0)
    (h) the presence of a purely solid tumor (yes = 1, no = 0)
    (i) maximal diameter of the solid component (expressed in millimeters, but with no increase > 50 mm)
    (j) irregular internal cyst walls (yes = 1, no = 0)
    (k) the presence of acoustic shadows (yes = 1, no = 0)
    (l) color score (1, 2, 3 or 4)

The risk of malignancy using LR2 is derived as $y = 1/(1 + e^{-z})$, where

$$z = -5.3718 + 0.0354\,(c) + 1.6159\,(f) + 1.1768\,(g) + 0.0697\,(i) + 0.9586\,(j) - 2.9486\,(k)$$

(ref. 12). A calculated risk of malignancy of more than 10% classified the adnexal mass as malignant.
† includes 0. ‡ includes only cases where papillary projections were registered at both examinations.
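The two formulas in the note to Table 1 can be written out as a short calculation. This is a minimal sketch, not software used in the study. The coefficients are those in the table note, read with their decimal points (for example, an LR1 intercept of -6.7468). The class `Findings` and the functions `lr1_risk`, `lr2_risk` and `is_malignant` are hypothetical names introduced here for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Findings:
    """Hypothetical container for predictors (a) to (l) of Table 1."""
    history_ovarian_cancer: bool        # (a)
    hormonal_therapy: bool              # (b)
    age_years: float                    # (c)
    max_lesion_diameter_mm: float       # (d)
    tender_at_scan: bool                # (e)
    ascites: bool                       # (f)
    flow_in_papillary_projection: bool  # (g)
    purely_solid: bool                  # (h)
    max_solid_diameter_mm: float        # (i)
    irregular_cyst_walls: bool          # (j)
    acoustic_shadows: bool              # (k)
    color_score: int                    # (l), 1 to 4


def _logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def _capped_solid(f):
    # Predictor (i): "no increase > 50 mm", i.e. the value is capped at 50.
    return min(f.max_solid_diameter_mm, 50.0)


def lr1_risk(f):
    """Risk of malignancy (0 to 1) according to LR1."""
    z = (-6.7468
         + 1.5985 * f.history_ovarian_cancer
         - 0.9983 * f.hormonal_therapy
         + 0.0326 * f.age_years
         + 0.00841 * f.max_lesion_diameter_mm
         - 0.8577 * f.tender_at_scan
         + 1.5513 * f.ascites
         + 1.1737 * f.flow_in_papillary_projection
         + 0.9281 * f.purely_solid
         + 0.0496 * _capped_solid(f)
         + 1.1421 * f.irregular_cyst_walls
         - 2.3550 * f.acoustic_shadows
         + 0.4916 * f.color_score)
    return _logistic(z)


def lr2_risk(f):
    """Risk of malignancy (0 to 1) according to LR2 (six predictors)."""
    z = (-5.3718
         + 0.0354 * f.age_years
         + 1.6159 * f.ascites
         + 1.1768 * f.flow_in_papillary_projection
         + 0.0697 * _capped_solid(f)
         + 0.9586 * f.irregular_cyst_walls
         - 2.9486 * f.acoustic_shadows)
    return _logistic(z)


def is_malignant(risk, cutoff=0.10):
    """Table 1: a calculated risk of more than 10% classifies the mass as malignant."""
    return risk > cutoff


# Invented example input, for illustration only (not a patient from the study).
example = Findings(False, False, 50, 60, False, False, False, False, 0, False, False, 1)
for name, risk in (("LR1", lr1_risk(example)), ("LR2", lr2_risk(example))):
    print(f"{name}: {100 * risk:.1f}% -> {'malignant' if is_malignant(risk) else 'benign'}")
```

Because each observer's findings enter the model as predictors, a disagreement on one binary item shifts z by the size of its coefficient. For example, disagreement on acoustic shadows changes z by 2.3550 in LR1 and by 2.9486 in LR2.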
Table 2. Histological diagnoses of the masses
___________________________________________________________________________
Histological diagnosis of tumor                 Number
___________________________________________________________________________
Benign tumors                                   94
    Benign simple cyst                           7
    Endometrioma                                10
    Dermoid cyst                                16
    Serous cystadenoma                          16
    Mucinous cystadenoma                        18
    Myoma/fibroma                                9
    Cystadenofibroma                            11
    Paraovarian cyst                             5
    Sactosalpinx/chronic salpingitis             1
    Leydig cell tumor                            1
Borderline tumors                                4
    Serous                                       2
    Mucinous                                     1
    Endometrioid                                 1
Invasive malignancy                             19
    Primary ovarian adenocarcinoma              13
    Granulosa cell tumor                         3
    Dysgerminoma                                 1
    Leiomyosarcoma                               1
    Malignant aggressive B-cell lymphoma         1
___________________________________________________________________________
Table 3. Inter-observer reproducibility of continuous variables used in the risk calculation models LR1 and LR2 and of other continuous variables used to describe adnexal masses
___________________________________________________________________________
                                                        Measurement results         Difference in mm between two measurements
                                                        (both sonologists)          made by sonologists 1 and 2 (a)                         Intra-CC
Parameter                                               Median (n)      Range       Mean (n)        95% CI           Limits of agreement    Point estimate (95% CI)
___________________________________________________________________________
Variables used in models LR1 and LR2
  Maximum diameter of adnexal mass, mm                  70 (n=234)      10 to 313   3.80 (n=117)    1.12 to 6.48     -25.24 to 32.82        0.958 (0.937 to 0.971)
  Maximum diameter of largest solid component, mm (b)   29.50 (n=122)   5 to 180    1.92 (n=61)     -1.74 to 5.58    -26.66 to 30.50        0.942 (0.905 to 0.964)
Other variables used to describe adnexal mass
  Mean diameter of adnexal mass, mm                     58.5 (n=234)    9 to 240    1.05 (n=117)    -0.15 to 1.95    -8.61 to 10.72         0.971 (0.958 to 0.980)
  Mean diameter of largest solid component, mm (b)      22 (n=122)      4 to 156    0.59 (n=61)     -1.82 to 2.98    -18.16 to 19.32        0.962 (0.937 to 0.977)
  Height of largest papillary projection, mm (c)        8 (n=42)        3 to 25     -0.51 (n=21)    -2.93 to 1.91    -11.61 to 10.59        0.609 (0.245 to 0.821)
___________________________________________________________________________
CI, confidence interval; Intra-CC, intra-class correlation coefficient.
(a) The measurements made by sonologist 2 were subtracted from the measurements made by sonologist 1.
(b) Includes only cases where solid components were registered by both observers.
(c) Includes only cases where papillary projections were registered by both observers. Using logarithmically transformed data, the results were as follows: mean difference 1.03 (95% CI 0.78 to 1.36), limits of agreement 0.31 to 3.61, Intra-CC 0.547 (0.153 to 0.789). These results can be used for comparison with our published results on inter-observer agreement when analyzing ultrasound volumes obtained by three-dimensional ultrasound (15).
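The log-transformed results in the last footnote of Table 3 are ratios rather than differences in mm: a back-transformed mean difference of 1.03 corresponds to observer 1 measuring about 3% higher on average. The sketch below shows one common way such ratio limits of agreement are obtained. It assumes limits of mean ± 1.96 SD on the log scale followed by exponentiation, which the source does not spell out. The function name and the input values are hypothetical and are not data from the study.

```python
import math
from statistics import mean, stdev


def ratio_limits_of_agreement(obs1, obs2, z=1.96):
    """Limits of agreement on log-transformed paired measurements,
    back-transformed so that the results are ratios obs1/obs2.
    The base of the logarithm does not affect the back-transformed values."""
    log_diffs = [math.log(a) - math.log(b) for a, b in zip(obs1, obs2)]
    m = mean(log_diffs)
    s = stdev(log_diffs)
    return math.exp(m), (math.exp(m - z * s), math.exp(m + z * s))


# Invented heights in mm, for illustration only.
heights_obs1 = [6.0, 10.0, 8.0, 15.0, 5.0]
heights_obs2 = [5.0, 12.0, 8.0, 9.0, 7.0]
ratio, (low, high) = ratio_limits_of_agreement(heights_obs1, heights_obs2)
print(f"mean ratio {ratio:.2f}, limits of agreement {low:.2f} to {high:.2f}")
```

Read this way, limits of agreement of 0.31 to 3.61 mean that one observer's measurement of papillary projection height could be roughly a third of, or more than three times, the other observer's measurement.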
Table 4. Inter-observer agreement for categorical variables included in the risk calculation models LR1 and LR2 and for other categorical variables used to describe adnexal masses
____________________________________________________________________________________________
Variable                                                                              Agreement, % (n/N)   Kappa value
____________________________________________________________________________________________
Variables used in models LR1 and LR2
  Presence of entirely solid tumor (yes, no)                                          97 (114/117)         0.91
  Maximum diameter of largest solid component (≤50 mm, >50 mm)                        95 (111/117)         0.83
  Maximum diameter of largest solid component (≤50 mm, >50 mm) (a)                    92 (56/61)           0.82
  Tenderness of adnexal mass (yes, no)                                                97 (113/117)         0.78
  Ascites (yes, no)                                                                   97 (113/117)         0.73
  Papillary projections (yes, no)                                                     87 (102/117)         0.65
  Solid component (yes, no)                                                           84 (98/117)          0.67
  Irregular cyst wall (yes, no)                                                       79 (92/117)          0.56
  Acoustic shadows (yes, no)                                                          85 (99/117)          0.58
  Color Doppler signals in papillary projections (yes, no) (b)                        90 (105/117)         0.48
  Color Doppler signals in papillary projections (yes, no) (c)                        86 (18/21)           0.71
  Color score (1, 2, 3, 4)                                                            40 (47/117)          0.36 (d)
Other variables used to describe adnexal masses
  Tumor type (unilocular, unilocular solid, multilocular, multilocular solid, solid)  97 (113/117)         0.70
  Mean diameter of tumor (mm): ≤40, 41-60, 61-80, 81-100, >100                        66 (77/117)          0.76 (d)
  Number of locules: 0, 1, 2, 3, 4, 5, 6-10, 11-20, >20                               58 (68/117)          0.79 (d)
  Number of locules: ≤10, >10 (e)                                                     94 (110/117)         0.80
  Septum (yes, no)                                                                    88 (103/117)         0.76
  Incomplete septum (yes, no)                                                         98 (115/117)         ¶
  Number of papillary projections: 0, 1, 2, 3, ≥4                                     81 (95/117)          0.64 (d)
  Number of papillary projections: 1, 2, 3, ≥4 (c)                                    67 (14/21)           0.63 (d)
  Number of papillary projections: <4, >4 (e)                                         96 (112/117)         0.68
  Echogenicity of cyst fluid (anechoic, low level, ground glass, mixed, no fluid)     67 (78/117)          0.56
  Color Doppler signals detectable (yes, no)                                          84 (98/117)          0.30
  Ovarian crescent sign (yes, no)                                                     78 (91/117)          0.49
  Fluid in Douglas pouch (yes, no)                                                    87 (102/117)         0.72
____________________________________________________________________________________________
(a) Includes only cases where solid components were registered at both examinations.
(b) This variable reflects disagreement both with regard to the presence of papillary projections and the presence of blood flow in papillary projections (i.e., if one observer recorded the presence of papillary projections and the other did not, there was disagreement with regard to this variable, and if both observers recorded the presence of papillary projections but disagreed with regard to the presence of blood flow in the papillary projections, there was disagreement with regard to this variable).
(c) Includes only cases where papillary projections were registered at both examinations.
(d) Weighted Kappa index.
(e) Includes 0.
¶ Absent kappa values are explained by asymmetric field tables, making it impossible to calculate them.
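Several rows of Table 4 combine high percentage agreement with a modest kappa (for example, Color Doppler signals detectable: 98 of 117 in agreement, kappa 0.30). The sketch below illustrates why, using the standard definition of unweighted Cohen's kappa, kappa = (p_o - p_e)/(1 - p_e). It is not the authors' code: the function names are hypothetical, the 2×2 counts in the usage example are invented, and the chance agreement backed out from the table is only approximate because the published kappa values are rounded. It does not apply to the rows reporting a weighted kappa.

```python
def cohen_kappa(table):
    """Unweighted Cohen's kappa for a square table of counts,
    rows = observer 1 category, columns = observer 2 category."""
    n = sum(sum(row) for row in table)
    p_observed = sum(table[i][i] for i in range(len(table))) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    p_chance = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    return (p_observed - p_chance) / (1 - p_chance)


def implied_chance_agreement(p_observed, kappa):
    """Solve kappa = (p_o - p_e) / (1 - p_e) for the chance agreement p_e."""
    return (p_observed - kappa) / (1 - kappa)


# Invented 2x2 counts, for illustration only.
print(f"kappa for invented counts: {cohen_kappa([[20, 5], [10, 15]]):.2f}")

# Values as read from Table 4 (observed agreement n/N and published kappa).
for label, agree, total, kappa in (
    ("Color Doppler signals detectable", 98, 117, 0.30),
    ("Presence of entirely solid tumor", 114, 117, 0.91),
):
    p_e = implied_chance_agreement(agree / total, kappa)
    print(f"{label}: observed {agree / total:.2f}, implied chance agreement {p_e:.2f}")
```

When one category dominates, chance agreement is already high, so little room is left above it and kappa stays low even though the observers agree in most cases.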
Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906
Table 5 Inter-observer reproducibility of the risk of malignancy calculated using the risk calculation models LR1 and LR2
Calculated risk of malignancy
(both sonologists)
Difference between the risk calculated
by sonologist 1 and 2
Intra-CC
Parameter Median
Range
Mean 95 CI Limits of
agreement
Point estimate
(95 CI)
Risk of malignancy
calculated using LR1
785 (n=234)
010 ndash 9910
-053 (n=117)
-307 ndash 201
-2805 ndash 2699
0911 (0874 ndash 0937)
Risk of malignancy
calculated using LR2
665 (n=234)
010 ndash 9840
002 (n=117)
-306 ndash 310
-3322 ndash 3326
0832 (0766 ndash 0880)
CI confidence interval Intra-CC intra-class correlation coefficient
Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906
Legends for figure
Figure 1 a) Scatterplot showing the relationship between inter-observer difference (observer
1 minus observer 2) in calculated risk and magnitude of calculated risk when using logistic
regression model LR1 The plot manifests a diamond shape the differences being smallest for
the lowest and highest risks For risks lt 25 and gt 95 the differences are very small
LOA limits of agreement b) Scatterplot showing the relationship between inter-observer
difference in calculated risk and magnitude of calculated risk when using logistic regression
model LR2 The plot manifests a diamond shape the differences being smallest for the lowest
and highest risks
Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906
Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906
Published OnlineFirst November 25 2014Clin Cancer Res Povilas Sladkevicius and Lil Valentin malignancy using logistic regression modelsappearance of adnexal masses and in calculating the risk of Inter-observer agreement in describing the ultrasound
Updated version
1011581078-0432CCR-14-0906doi
Access the most recent version of this article at
Material
Supplementary
httpclincancerresaacrjournalsorgcontentsuppl201411261078-0432CCR-14-0906DC1
Access the most recent supplemental material at
Manuscript
Authoredited Author manuscripts have been peer reviewed and accepted for publication but have not yet been
E-mail alerts related to this article or journalSign up to receive free email-alerts
Subscriptions
Reprints and
pubsaacrorgDepartment at
To order reprints of this article or to subscribe to the journal contact the AACR Publications
Permissions
Rightslink site Click on Request Permissions which will take you to the Copyright Clearance Centers (CCC)
httpclincancerresaacrjournalsorgcontentearly201411251078-0432CCR-14-0906To request permission to re-use all or part of this article use this link
Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906
Table 2 Histological diagnoses of the masses
___________________________________________________________________________ Histological diagnosis of tumor Number ___________________________________________________________________________ Benign tumors 94
Benign simple cyst 7
Endometrioma 10
Dermoid cyst 16
Serous cystadenoma 16
Mucinous cystadenoma 18
Myomafibroma 9
Cystadenofibroma 11
Paraovarian cyst 5
Sactosalpinx chronic salpingitis 1
Leydig cell tumor 1
Borderline tumors 4
Serous 2
Mucinous 1
Endometrioid 1
Invasive malignancy 19
Primary ovarian adenocarcinoma 13
Granulosa cell tumor 3
Dysgerminoma 1
Leiomyosarcoma 1
Malignant aggressive B-cell lymphoma 1
___________________________________________________________________________
Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906
Table 3 Inter-observer reproducibility of continuous variables used in the risk calculation models LR1 and LR2 and of other continuous variables
used to describe adnexal masses
Measurement results
(both sonologists)
Difference in mm between two measurements
made by sonologists 1 and 2a
Intra-CC
Parameter Median
Range
Mean 95 CI Limits of
agreement
Point estimate
(95 CI)
Variables used in
models LR1 and LR2
Maximum diameter of
adnexal mass mm
70 (n=234)
10 ndash 313
380 (n=117)
112 ndash 648
-2524 ndash 3282
0958 (0937 ndash 0971)
Maximum diameter of
largest solid component
mmb
2950 (n=122)
5 ndash 180
192 (n=61)
-174 ndash 558
-2666 ndash 3050
0942 (0905 ndash 0-964)
Other variables used to
describe adnexal mass
Mean diameter
of adnexal mass mm
585 (n=234)
9 ndash 240
105 (n=117)
-015 ndash 195
-861 ndash 1072
0971 (0958 ndash 0980)
Mean diameter
of largest solid
component mmb
22 (n=122)
4 ndash 156
059 (n=61)
-182 ndash 298
-1816 ndash 1932
0962 (0937 ndash 0977)
Height of largest papillary
projection mmc
8 (n=42)
3 ndash 25
-051 (n=21)
-293 ndash 191
-1161 ndash 1059
0609 (0245 ndash 0821)
a the measurements made by sonologist 2 were subtracted from the measurements made by sonologist 1
CI confidence interval Intra-CC intra-class correlation coefficient bincludes only cases where solid components were registered by both observers
c includes only cases where papillary projections were registered by both observers using logarithmically transformed data the results were as
follows mean difference 103 (95CI 078 ndash 136) limits of agreement 031 ndash 361 Intra-CC 0547 (0153 ndash 0789) these results can be used for
comparison with our published results on inter-observer agreement when analyzing ultrasound volumes obtained by three-dimensional
ultrasound15
Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906
Table 4 Inter-observer agreement for categorical variables included in the risk calculation models LR1 and LR2 and for other categorical variables used to describe adnexal masses
Agreement Kappa value
Variables used in models LR1 and LR2 Presence of entirely solid tumor (yes no) 97 (114117) 091 Maximum diameter of largest solid component (le50 mm gt50 mm) 95 (111117) 083 Maximum diameter of largest solid component (le50 mm gt50 mm)a 92 (5661) 082 Tenderness of adnexal mass (yes no) 97 (113117) 078 Ascites (yes no) 97 (113117) 073 Papillary projections (yes no) 87 (102117) 065 Solid component (yes no) 84 (98117) 067 Irregular cyst wall (yes no) 79 (92117) 056 Acoustic shadows (yes no) 85 (99117) 058 Color Doppler signals in papillary projections (yes no)b 90 (105117) 048 Color Doppler signals in papillary projections (yes no)c 86 (1821) 071 Color score (1 2 3 4) 40 (47117) 036d Other variables used to describe adnexal masses Tumor type (unilocular unilocular solid multilocular multilocular solid solid) 97 (113117) 070 Mean diameter of tumor (mm) le40 41-60 61-80 81-100 gt100 66 (77117) 076d Number of locules 0 1 2 3 4 5 6-10 11-20 gt20 58 (68117) 079d
le10 gt10e 94 (110117) 080 Septum (yes no) 88 (103117) 076 Incomplete septum (yes no) 98 (115117) para Number of papillary projections 0 1 2 3 ge4 81 (95117) 064d 1 2 3 ge4c 67 (1421) 063d lt4 gt4e 96 (112117) 068 Echogenicity of cyst fluid (anechoic low level ground glass mixed no fluid) 67 (78117) 056 Color Doppler signals detectable (yes no) 84 (98117) 030 Ovarian crescent sign (yes no) 78 (91117) 049 Fluid in Douglas pouch (yes no) 87 (102117) 072 ____________________________________________________________________________________________ a includes only cases where solid components were registered at both examinations b This variable reflects disagreement both with regard to the presence of papillary projections and the presence of blood flow in papillary projection (ie if one observer recorded the presence of papillary projections and the other not there was disagreement with regard to this variable and if both observes recorded the presence of papillary projections but disagreed with regard to the presence of blood flow in the papillary projections there was disagreement with regard to this variable) c includes only cases where papillary projections were registered at both examinations d weighted Kappa index e includes 0 para absent kappa values are explained by asymmetric field tables making it impossible to calculate them
Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906
Table 5 Inter-observer reproducibility of the risk of malignancy calculated using the risk calculation models LR1 and LR2
Calculated risk of malignancy
(both sonologists)
Difference between the risk calculated
by sonologist 1 and 2
Intra-CC
Parameter Median
Range
Mean 95 CI Limits of
agreement
Point estimate
(95 CI)
Risk of malignancy
calculated using LR1
785 (n=234)
010 ndash 9910
-053 (n=117)
-307 ndash 201
-2805 ndash 2699
0911 (0874 ndash 0937)
Risk of malignancy
calculated using LR2
665 (n=234)
010 ndash 9840
002 (n=117)
-306 ndash 310
-3322 ndash 3326
0832 (0766 ndash 0880)
CI confidence interval Intra-CC intra-class correlation coefficient
Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906
Legends for figure
Figure 1 a) Scatterplot showing the relationship between inter-observer difference (observer
1 minus observer 2) in calculated risk and magnitude of calculated risk when using logistic
regression model LR1 The plot manifests a diamond shape the differences being smallest for
the lowest and highest risks For risks lt 25 and gt 95 the differences are very small
LOA limits of agreement b) Scatterplot showing the relationship between inter-observer
difference in calculated risk and magnitude of calculated risk when using logistic regression
model LR2 The plot manifests a diamond shape the differences being smallest for the lowest
and highest risks
Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906
Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906
Published OnlineFirst November 25 2014Clin Cancer Res Povilas Sladkevicius and Lil Valentin malignancy using logistic regression modelsappearance of adnexal masses and in calculating the risk of Inter-observer agreement in describing the ultrasound
Updated version
1011581078-0432CCR-14-0906doi
Access the most recent version of this article at
Material
Supplementary
httpclincancerresaacrjournalsorgcontentsuppl201411261078-0432CCR-14-0906DC1
Access the most recent supplemental material at
Manuscript
Authoredited Author manuscripts have been peer reviewed and accepted for publication but have not yet been
E-mail alerts related to this article or journalSign up to receive free email-alerts
Subscriptions
Reprints and
pubsaacrorgDepartment at
To order reprints of this article or to subscribe to the journal contact the AACR Publications
Permissions
Rightslink site Click on Request Permissions which will take you to the Copyright Clearance Centers (CCC)
httpclincancerresaacrjournalsorgcontentearly201411251078-0432CCR-14-0906To request permission to re-use all or part of this article use this link
Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906
Table 3 Inter-observer reproducibility of continuous variables used in the risk calculation models LR1 and LR2 and of other continuous variables
used to describe adnexal masses
Measurement results
(both sonologists)
Difference in mm between two measurements
made by sonologists 1 and 2a
Intra-CC
Parameter Median
Range
Mean 95 CI Limits of
agreement
Point estimate
(95 CI)
Variables used in
models LR1 and LR2
Maximum diameter of
adnexal mass mm
70 (n=234)
10 ndash 313
380 (n=117)
112 ndash 648
-2524 ndash 3282
0958 (0937 ndash 0971)
Maximum diameter of
largest solid component
mmb
2950 (n=122)
5 ndash 180
192 (n=61)
-174 ndash 558
-2666 ndash 3050
0942 (0905 ndash 0-964)
Other variables used to
describe adnexal mass
Mean diameter
of adnexal mass mm
585 (n=234)
9 ndash 240
105 (n=117)
-015 ndash 195
-861 ndash 1072
0971 (0958 ndash 0980)
Mean diameter
of largest solid
component mmb
22 (n=122)
4 ndash 156
059 (n=61)
-182 ndash 298
-1816 ndash 1932
0962 (0937 ndash 0977)
Height of largest papillary
projection mmc
8 (n=42)
3 ndash 25
-051 (n=21)
-293 ndash 191
-1161 ndash 1059
0609 (0245 ndash 0821)
a the measurements made by sonologist 2 were subtracted from the measurements made by sonologist 1
CI confidence interval Intra-CC intra-class correlation coefficient bincludes only cases where solid components were registered by both observers
c includes only cases where papillary projections were registered by both observers using logarithmically transformed data the results were as
follows mean difference 103 (95CI 078 ndash 136) limits of agreement 031 ndash 361 Intra-CC 0547 (0153 ndash 0789) these results can be used for
comparison with our published results on inter-observer agreement when analyzing ultrasound volumes obtained by three-dimensional
ultrasound15
Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906
Table 4 Inter-observer agreement for categorical variables included in the risk calculation models LR1 and LR2 and for other categorical variables used to describe adnexal masses
Agreement Kappa value
Variables used in models LR1 and LR2 Presence of entirely solid tumor (yes no) 97 (114117) 091 Maximum diameter of largest solid component (le50 mm gt50 mm) 95 (111117) 083 Maximum diameter of largest solid component (le50 mm gt50 mm)a 92 (5661) 082 Tenderness of adnexal mass (yes no) 97 (113117) 078 Ascites (yes no) 97 (113117) 073 Papillary projections (yes no) 87 (102117) 065 Solid component (yes no) 84 (98117) 067 Irregular cyst wall (yes no) 79 (92117) 056 Acoustic shadows (yes no) 85 (99117) 058 Color Doppler signals in papillary projections (yes no)b 90 (105117) 048 Color Doppler signals in papillary projections (yes no)c 86 (1821) 071 Color score (1 2 3 4) 40 (47117) 036d Other variables used to describe adnexal masses Tumor type (unilocular unilocular solid multilocular multilocular solid solid) 97 (113117) 070 Mean diameter of tumor (mm) le40 41-60 61-80 81-100 gt100 66 (77117) 076d Number of locules 0 1 2 3 4 5 6-10 11-20 gt20 58 (68117) 079d
le10 gt10e 94 (110117) 080 Septum (yes no) 88 (103117) 076 Incomplete septum (yes no) 98 (115117) para Number of papillary projections 0 1 2 3 ge4 81 (95117) 064d 1 2 3 ge4c 67 (1421) 063d lt4 gt4e 96 (112117) 068 Echogenicity of cyst fluid (anechoic low level ground glass mixed no fluid) 67 (78117) 056 Color Doppler signals detectable (yes no) 84 (98117) 030 Ovarian crescent sign (yes no) 78 (91117) 049 Fluid in Douglas pouch (yes no) 87 (102117) 072 ____________________________________________________________________________________________ a includes only cases where solid components were registered at both examinations b This variable reflects disagreement both with regard to the presence of papillary projections and the presence of blood flow in papillary projection (ie if one observer recorded the presence of papillary projections and the other not there was disagreement with regard to this variable and if both observes recorded the presence of papillary projections but disagreed with regard to the presence of blood flow in the papillary projections there was disagreement with regard to this variable) c includes only cases where papillary projections were registered at both examinations d weighted Kappa index e includes 0 para absent kappa values are explained by asymmetric field tables making it impossible to calculate them
Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906
Table 5 Inter-observer reproducibility of the risk of malignancy calculated using the risk calculation models LR1 and LR2
Calculated risk of malignancy
(both sonologists)
Difference between the risk calculated
by sonologist 1 and 2
Intra-CC
Parameter Median
Range
Mean 95 CI Limits of
agreement
Point estimate
(95 CI)
Risk of malignancy
calculated using LR1
785 (n=234)
010 ndash 9910
-053 (n=117)
-307 ndash 201
-2805 ndash 2699
0911 (0874 ndash 0937)
Risk of malignancy
calculated using LR2
665 (n=234)
010 ndash 9840
002 (n=117)
-306 ndash 310
-3322 ndash 3326
0832 (0766 ndash 0880)
CI confidence interval Intra-CC intra-class correlation coefficient
Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906
Legends for figure
Figure 1 a) Scatterplot showing the relationship between inter-observer difference (observer
1 minus observer 2) in calculated risk and magnitude of calculated risk when using logistic
regression model LR1 The plot manifests a diamond shape the differences being smallest for
the lowest and highest risks For risks lt 25 and gt 95 the differences are very small
LOA limits of agreement b) Scatterplot showing the relationship between inter-observer
difference in calculated risk and magnitude of calculated risk when using logistic regression
model LR2 The plot manifests a diamond shape the differences being smallest for the lowest
and highest risks
Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906
Research on June 14 2018 copy 2014 American Association for Cancerclincancerresaacrjournalsorg Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited Author Manuscript Published OnlineFirst on November 25 2014 DOI 1011581078-0432CCR-14-0906
Published OnlineFirst November 25 2014Clin Cancer Res Povilas Sladkevicius and Lil Valentin malignancy using logistic regression modelsappearance of adnexal masses and in calculating the risk of Inter-observer agreement in describing the ultrasound
Updated version
1011581078-0432CCR-14-0906doi
Access the most recent version of this article at
Material
Supplementary
httpclincancerresaacrjournalsorgcontentsuppl201411261078-0432CCR-14-0906DC1
Access the most recent supplemental material at
Manuscript
Authoredited Author manuscripts have been peer reviewed and accepted for publication but have not yet been
Table 4. Inter-observer agreement for categorical variables included in the risk calculation models LR1 and LR2 and for other categorical variables used to describe adnexal masses

Variable                                                                              Agreement, % (n/N)    Kappa value

Variables used in models LR1 and LR2
  Presence of entirely solid tumor (yes, no)                                          97 (114/117)          0.91
  Maximum diameter of largest solid component (≤50 mm, >50 mm)                        95 (111/117)          0.83
  Maximum diameter of largest solid component (≤50 mm, >50 mm) a                      92 (56/61)            0.82
  Tenderness of adnexal mass (yes, no)                                                97 (113/117)          0.78
  Ascites (yes, no)                                                                   97 (113/117)          0.73
  Papillary projections (yes, no)                                                     87 (102/117)          0.65
  Solid component (yes, no)                                                           84 (98/117)           0.67
  Irregular cyst wall (yes, no)                                                       79 (92/117)           0.56
  Acoustic shadows (yes, no)                                                          85 (99/117)           0.58
  Color Doppler signals in papillary projections (yes, no) b                          90 (105/117)          0.48
  Color Doppler signals in papillary projections (yes, no) c                          86 (18/21)            0.71
  Color score (1, 2, 3, 4)                                                            40 (47/117)           0.36 d

Other variables used to describe adnexal masses
  Tumor type (unilocular, unilocular solid, multilocular, multilocular solid, solid)  97 (113/117)          0.70
  Mean diameter of tumor (mm): ≤40, 41-60, 61-80, 81-100, >100                        66 (77/117)           0.76 d
  Number of locules: 0, 1, 2, 3, 4, 5, 6-10, 11-20, >20                               58 (68/117)           0.79 d
  Number of locules: ≤10, >10 e                                                       94 (110/117)          0.80
  Septum (yes, no)                                                                    88 (103/117)          0.76
  Incomplete septum (yes, no)                                                         98 (115/117)          ¶
  Number of papillary projections: 0, 1, 2, 3, ≥4                                     81 (95/117)           0.64 d
  Number of papillary projections: 1, 2, 3, ≥4 c                                      67 (14/21)            0.63 d
  Number of papillary projections: <4, >4 e                                           96 (112/117)          0.68
  Echogenicity of cyst fluid (anechoic, low level, ground glass, mixed, no fluid)     67 (78/117)           0.56
  Color Doppler signals detectable (yes, no)                                          84 (98/117)           0.30
  Ovarian crescent sign (yes, no)                                                     78 (91/117)           0.49
  Fluid in Douglas pouch (yes, no)                                                    87 (102/117)          0.72

a Includes only cases where solid components were registered at both examinations.
b This variable reflects disagreement both with regard to the presence of papillary projections and with regard to the presence of blood flow in papillary projections (i.e., if one observer recorded the presence of papillary projections and the other did not, there was disagreement with regard to this variable, and if both observers recorded the presence of papillary projections but disagreed with regard to the presence of blood flow in the papillary projections, there was also disagreement with regard to this variable).
c Includes only cases where papillary projections were registered at both examinations.
d Weighted kappa index.
e Includes 0.
¶ Absent kappa values are explained by asymmetric field tables, making it impossible to calculate them.
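Table 4 reports percentage agreement alongside kappa, and the two can diverge: color Doppler signals detectable shows 84% agreement (98/117) but a kappa of only 0.30. The following minimal Python sketch illustrates how unweighted Cohen's kappa corrects raw agreement for chance. The cell counts in `table_a` and `table_b` are hypothetical, chosen only to reproduce agreement totals of 114/117 and 98/117; they are not the study's cross-tabulations, which are not given here. The weighted kappa used for ordinal variables (footnote d) is not covered.

```python
def percent_agreement(table):
    """Percentage of cases on the diagonal of a square cross-tabulation."""
    n = sum(sum(row) for row in table)
    agree = sum(table[i][i] for i in range(len(table)))
    return 100.0 * agree / n


def cohen_kappa(table):
    """Unweighted Cohen's kappa for a square table (rows: observer 1, columns: observer 2)."""
    k = len(table)
    n = sum(sum(row) for row in table)
    p_obs = sum(table[i][i] for i in range(k)) / n
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance if the two observers rated independently
    p_exp = sum(row_tot[i] * col_tot[i] for i in range(k)) / n ** 2
    if p_exp == 1.0:
        raise ValueError("kappa is undefined when expected agreement equals 1")
    return (p_obs - p_exp) / (1.0 - p_exp)


# HYPOTHETICAL counts (not from the study): both categories reasonably represented
table_a = [[20, 1],
           [2, 94]]

# HYPOTHETICAL counts (not from the study): one category dominates
table_b = [[95, 10],
           [9, 3]]

for name, table in (("table_a", table_a), ("table_b", table_b)):
    print(name, round(percent_agreement(table), 1), round(cohen_kappa(table), 2))
```

In the second hypothetical table, most of the observed agreement is already expected by chance because nearly all cases fall in one category, so kappa is low despite agreement above 80%.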
Table 5. Inter-observer reproducibility of the risk of malignancy calculated using the risk calculation models LR1 and LR2

                                          Calculated risk of malignancy    Difference between the risk calculated                   Intra-CC
                                          (both sonologists)               by sonologist 1 and 2
Parameter                                 Median          Range            Mean             95% CI           Limits of agreement    Point estimate (95% CI)
Risk of malignancy calculated using LR1   7.85 (n=234)    0.10 to 99.10    -0.53 (n=117)    -3.07 to 2.01    -28.05 to 26.99        0.911 (0.874 to 0.937)
Risk of malignancy calculated using LR2   6.65 (n=234)    0.10 to 98.40    0.02 (n=117)     -3.06 to 3.10    -33.22 to 33.26        0.832 (0.766 to 0.880)

CI, confidence interval; Intra-CC, intra-class correlation coefficient.
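The mean difference, its 95% CI and the limits of agreement in Table 5 are related to one another. The sketch below is a consistency check under two assumptions that the table itself does not state: that the limits of agreement are the mean difference ± 1.96 standard deviations of the paired differences, and that the 95% CI of the mean uses the normal multiplier 1.96. The function names are illustrative only.

```python
import math
import statistics


def bland_altman(diffs, z=1.96):
    """Mean difference, 95% CI of the mean, and limits of agreement from paired differences."""
    n = len(diffs)
    mean = statistics.mean(diffs)
    sd = statistics.stdev(diffs)      # sample standard deviation of the differences
    se = sd / math.sqrt(n)            # standard error of the mean difference
    return {
        "mean": mean,
        "ci": (mean - z * se, mean + z * se),
        "loa": (mean - z * sd, mean + z * sd),
    }


def implied_ci_from_loa(mean_diff, loa_low, loa_high, n, z=1.96):
    """Back-calculate the SD from reported limits of agreement, then the implied CI of the mean."""
    sd = (loa_high - loa_low) / (2 * z)
    se = sd / math.sqrt(n)
    return sd, (mean_diff - z * se, mean_diff + z * se)


# Values taken from Table 5 (n = 117 paired examinations)
sd_lr1, ci_lr1 = implied_ci_from_loa(-0.53, -28.05, 26.99, 117)
sd_lr2, ci_lr2 = implied_ci_from_loa(0.02, -33.22, 33.26, 117)
print("LR1: implied SD", round(sd_lr1, 2), "implied CI", [round(x, 2) for x in ci_lr1])
print("LR2: implied SD", round(sd_lr2, 2), "implied CI", [round(x, 2) for x in ci_lr2])
```

Under these assumptions the implied intervals agree with the tabulated 95% CIs to within rounding, which suggests the reported figures are internally consistent. The wider limits of agreement for LR2 correspond to a larger standard deviation of the inter-observer differences.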