ON THE EQUIVALENCE BETWEEN GRAPHICAL AND TABULAR REPRESENTATIONS FOR SECURITY RISK ASSESSMENT
Katsiaryna Labunets1, Fabio Massacci1, Federica Paci2
1University of Trento, Italy ([email protected])
2University of Southampton, UK ([email protected])
REFSQ'17, Essen, Germany, March 2nd, 2017
The Problem [1/2]
• Several security risk assessment (SRA) methods and standards are available to identify threats and possible security requirements
• Academia relies on graphical methods (e.g., Anti-Goals, Secure Tropos, CORAS)
• Industry opts for tabular methods (OCTAVE, ISO 27005, NIST 800-30)
• REFSQ'17 representation stats:
  • 5 papers discuss graphical notations (i*, Use Cases, BPMN diagrams),
  • 3 papers on mixed methods,
  • 1 paper studies requirements in natural language.
The Problem [2/2]
• Are graphical methods actually better?
• No clear winner from past experiments:
• [ESEM 2013]:
  • Graph > Table w.r.t. # of threats (p < 5%)
  • Table > Graph w.r.t. # of security controls (p < 5%)
  • Graph =? Table w.r.t. perceived efficacy (not statistically significant)
• [EmpiRE at RE 2014]:
  • Graph =? Table w.r.t. # of threats and controls (not statistically significant)
  • Graph > Table w.r.t. perceived efficacy (p < 5%)
• Are they really different?
Both methods have a clear process
The tabular method has a less clear process
Research Questions
• RQ1: Are tabular and graphical SRA methods equivalent w.r.t. actual efficacy?
• RQ2: Are tabular and graphical SRA methods equivalent w.r.t. perceived efficacy?
How to answer?
Difference tests
• Problem
  • H0: μA = μB
  • Ha: μA ≠ μB
• Test: t-test, Wilcoxon, Mann-Whitney, etc.
  • We can only reject the null hypothesis H0.
  • We cannot accept the alternative hypothesis Ha.
• Lack of evidence for a difference ≠ evidence for equivalence
• How different should two methods be in order to be considered different?
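The point above can be illustrated with a short sketch. The sample data below are hypothetical, generated only to show the mechanics of a standard two-sample difference test:

```python
# Sketch: a non-significant difference test is not evidence of equivalence.
# The data below are hypothetical, generated only to show the mechanics.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
a = rng.normal(3.0, 1.0, 8)   # method A scores (small, hypothetical sample)
b = rng.normal(3.5, 1.0, 8)   # method B scores (small, hypothetical sample)

t_stat, p_value = stats.ttest_ind(a, b)  # H0: mu_A == mu_B
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
# A large p only means we failed to reject H0 (no evidence of a difference);
# it does not show the two methods are equivalent within any margin.
```

With small samples the test often fails to reject H0 even when the true means differ, which is exactly why "no significant difference" cannot be read as "equivalent".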
Equivalence test
• Two One-Sided Tests (TOST) [Schuirmann, 1981]
• Problem
  • H0: |μA − μB| ≥ ẟ
  • Ha: |μA − μB| < ẟ
• ẟ defines the range within which two methods are considered to be equivalent:
  • a percentage ([80%; 125%] by the FDA or [70%; 143%] by the EU) for ratio data
  • a fixed value (e.g., 0.6 for ordinal values on a 1–5 Likert scale with 3 as the mid value) for ordinal data
• We can use the t-test, Wilcoxon, Mann-Whitney, etc.
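As a sketch, the TOST procedure can be implemented with two one-sided t-tests. The data, group sizes, and margin below are hypothetical (a simple pooled degrees-of-freedom t-test is used for brevity):

```python
# Two One-Sided Tests (TOST, Schuirmann 1981) sketch.
# H0: |mean(a) - mean(b)| >= delta; rejecting H0 supports equivalence.
import numpy as np
from scipy import stats

def tost_p(a, b, delta):
    """Return the TOST p-value: the larger of the two one-sided p-values."""
    diff = np.mean(a) - np.mean(b)
    se = np.sqrt(np.var(a, ddof=1) / len(a) + np.var(b, ddof=1) / len(b))
    df = len(a) + len(b) - 2                        # simple pooled df for brevity
    p_lower = stats.t.sf((diff + delta) / se, df)   # one-sided H0: diff <= -delta
    p_upper = stats.t.cdf((diff - delta) / se, df)  # one-sided H0: diff >= +delta
    return max(p_lower, p_upper)  # both one-sided tests must reject

rng = np.random.default_rng(42)
tabular = rng.normal(3.3, 0.5, 30)    # hypothetical 1-5 Likert-like scores
graphical = rng.normal(3.2, 0.5, 30)
p = tost_p(tabular, graphical, delta=0.6)  # 0.6 = fixed margin for ordinal data
print(f"TOST p-value: {p:.4f}")
```

A small TOST p-value means both one-sided tests reject, i.e., the mean difference lies inside (−ẟ, +ẟ) and the methods can be declared equivalent within that margin.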
Experimental Design
• Goal
  • Compare graphical and tabular representations w.r.t. the actual and perceived efficacy of an SRA method when applied by novices.
• Treatments
  • Method: graphical and tabular SRA methods used in industry
  • Task: conduct an SRA for each of four security tasks:
    1. Identity Management security (IM),
    2. Access Management security (AM),
    3. Web Application and Database security (WebApp/DB),
    4. Network and Infrastructure security (Network/Infr).
• Experiments: two controlled experiments, conducted in 2015 and 2016.
Experimental Execution
• ATM Domain
  • Remotely Operated Tower (ROT) scenario by Eurocontrol
  • Unmanned Air Traffic Management (UTM) scenario by NASA
• Methods:
  • Graphical: CORAS by SINTEF
  • Tabular: SecRAM by SESAR
• Participants were provided with catalogues of security threats and controls*
• Participants: 35 and 48 MSc students in Computer Science took part in the ROT2015 and UTM2016 controlled experiments, respectively
* M. de Gramatica, K. Labunets, F. Massacci, F. Paci and A. Tedeschi. “The Role of Catalogues of Threats and Security Controls in Security Risk Assessment: An Empirical Study with ATM Professionals”. In Proc. of REFSQ’15.
Experimental Protocol
[Figure: experimental protocol. Participants (Group 1 … Group X) fill in a background questionnaire (Q1), then go through training and application phases on the ROT/UTM scenario, applying the graphical and tabular methods to the four security topics (IM, AM, WebApp/DB, Network/Infr) in an order that depends on group type (A or B). Participants report an initial method impression (Q31) and a final method impression (Q32). Groups deliver reports on the methods; researchers run focus group interviews for evaluation; method designers and domain experts assess report quality.]
         Type A      Type B
ROT2015  9 groups    9 groups
UTM2016  13 groups   11 groups
Results: Actual Efficacy
Exp      Act. Efficacy   Tabular Mean   Graphical Mean   ẟmean (Tab−Graph)   TOST p-value
ROT2015  Threats         3.17           2.95             +0.22               0.0009
ROT2015  SC              3.28           2.97             +0.31               0.001
UTM2016  Threats         3.28           3.24             +0.04               6.3×10⁻⁶
UTM2016  SC              3.31           3.29             +0.02               2.4×10⁻⁷
Table ≈ Graph (both experiments) w.r.t. quality of threats and controls
Actual Efficacy: whether the treatment improves performance of the task
Results: Perceived Efficacy
Exp      Perc. Efficacy   Tabular Mean   Graphical Mean   ẟmean (Tab−Graph)   TOST p-value
ROT2015  PEOU             3.63           3.20             +0.43               0.08
ROT2015  PU               3.54           3.05             +0.37               0.18
UTM2016  PEOU             3.74           3.60             +0.14               2.6×10⁻⁵
UTM2016  PU               3.67           3.29             +0.38               0.03
• ROT2015: PEOU & PU: Tabular ? Graphical (inconclusive)
• UTM2016: PEOU & PU: Tabular ≈ Graphical
Threats to Validity
• Differences between the two experiments (internal validity)
• Low statistical significance (conclusion validity)
• Use of students instead of practitioners (external validity)
• Simple scenario (external validity)
Conclusions
• No difference? Check with an equivalence test
• How to measure actual efficacy: quantity vs. quality?
• Both graphical and tabular methods provide similar support for SRA
• A clear process matters!
• What is next?
  • Comprehensibility of risk modeling notations
  • Labunets et al. "Model Comprehension for Security Risk Assessment: An Empirical Comparison of Tabular vs. Graphical Representations". Empirical Software Engineering, 2017.