The FITTEST project is funded by the European Commission (FP7-ICT-257574)

Tanja E.J. Vos
Centro de Investigación en Métodos de Producción de Software
Universidad Politecnica de Valencia, Valencia, Spain
tvos@pros.upv.es

European initiatives where academia and industry get together.
Overview
• EU projects (with SBST) that I have been coordinating:
  – EvoTest (2006-2009)
  – FITTEST (2010-2013)
• What it means to coordinate them and how they are structured
• How we evaluate their results through academia-industry projects
EvoTest
• Evolutionary Testing for Complex Systems
• September 2006 – September 2009
• Total costs: 4,300,000 euros
• Partners:
  – Universidad Politecnica de Valencia (Spain)
  – University College London (United Kingdom)
  – DaimlerChrysler (Germany)
  – Berner & Mattner (Germany)
  – Fraunhofer FIRST (Germany)
  – Motorola (UK)
  – Rila Solutions (Bulgaria)
• The project website is no longer online
EvoTest objectives/results
• Apply Evolutionary Search-Based Testing techniques to solve testing problems from a wide spectrum of complex real-world systems in an industrial context.
• Improve the power of evolutionary algorithms for searching important test scenarios, hybridising with other techniques:
  – other general-purpose search techniques,
  – other advanced software engineering techniques, such as slicing and program transformation.
• An extensible and open Automated Evolutionary Testing Architecture and Framework was developed. It provides general components and interfaces to facilitate the automatic generation, execution, monitoring and evaluation of effective test scenarios.
FITTEST
• Future Internet Testing
• September 2010 – December 2013
• Total costs: 5,845,000 euros
• Partners:
  – Universidad Politecnica de Valencia (Spain)
  – University College London (United Kingdom)
  – Berner & Mattner (Germany)
  – IBM (Israel)
  – Fondazione Bruno Kessler (Italy)
  – Universiteit Utrecht (The Netherlands)
  – Softeam (France)
• http://www.pros.upv.es/fittest/
• Future Internet Applications
  – Characterized by an extremely high level of dynamism
  – Adaptation to usage context (context awareness)
  – Dynamic discovery and composition of services
  – Etc.
• Testing of these applications gets extremely important
  – Society depends more and more on them
  – Critical activities such as social services, learning, finance, business
• Traditional testing is not enough
  – Testwares are fixed
• Continuous testing is needed
  – Testwares that automatically adapt to the dynamic behavior of the Future Internet application
  – This is the objective of FITTEST
FITTEST objectives/results
[Figure: the FITTEST continuous testing cycle. LOGGING: instrument the SUT, run the SUT, collect & preprocess the logs. TEST-WARE GENERATION: analyse the logs to infer models, domain input specifications and oracles (model-based, properties-based, pattern-based and log-based oracles, with optional manual extensions by domain experts and end-users). TEST EXECUTION: generate, automate and execute the test cases. TEST EVALUATION: evaluate the test cases against the generated oracles, free oracles and human oracles, producing the test results.]
How does it work?
LOGGING
1. Run the target System Under Test (SUT)
2. Collect the logs it generates. This can be done by:
   • real usage by end users of the application in the production environment
   • test case execution in the test environment
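The FITTEST project shipped its own instrumentation and logging components; purely to illustrate the kind of event logging meant here, a minimal sketch (all names hypothetical, not FITTEST code) that appends one JSON line per UI event:

```python
import json
import time
from functools import wraps

LOG_PATH = "sut_events.log"  # hypothetical log file, not a FITTEST artefact

def log_event(event_name):
    """Append one JSON line per invocation of the wrapped event handler."""
    def decorator(handler):
        @wraps(handler)
        def wrapper(*args, **kwargs):
            entry = {"ts": time.time(), "event": event_name, "args": repr(args)}
            with open(LOG_PATH, "a") as logfile:
                logfile.write(json.dumps(entry) + "\n")
            return handler(*args, **kwargs)
        return wrapper
    return decorator

@log_event("add_to_cart")  # hypothetical UI event of the SUT
def add_to_cart(item_id):
    pass  # the application's real logic would go here
```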
How does it work?
GENERATION
1. Analyse the logs
2. Generate different testwares:
   • Models
   • Domain Input Specifications
   • Oracles
3. Use these to generate and automate a test suite consisting of:
   • Abstract test cases
   • Concrete test cases
   • Pass/Fail evaluation criteria
How does it work?
Execute the test cases and start a new test cycle for continuous testing and adaptation of the testwares!
What does it mean to be an EU project coordinator
• Understand what the project is about, what needs to be done and what is most important.
• Do NOT be afraid to get your hands dirty.
• Do NOT assume people are working as hard on the project as you are (or are as enthusiastic as you ;-)
• Do NOT assume that the people that ARE responsible for some tasks TAKE this responsibility.
• Get CC-ed in all emails and deal with it.
• Stalk people (email, sms, whatsapp, skype, voicemail messages).
• Be patient when explaining the same things over and over again.
EU project structures
• We have: Work Packages (WP)
• These are composed of: Tasks
• These result in: Deliverables

[Figure: a typical project structure, with a Project Management WP and an Exploitation and Dissemination WP spanning the technical WPs:]
• WP: Do some research
• WP: Do more research
• WP: And some more
• WP: And more……..
• WP: Integrate it all together in a superduper solution that industry needs
• WP: Evaluate Your Results through Case Studies
EU projects: how to evaluate your results
• You need to do studies that evaluate the resulting testing tools/techniques within a real industrial environment.
WE NEED MORE THAN …
Glenford Myers 1979! The Triangle Problem:
test a program which returns the type of a triangle based on the lengths of its 3 sides.
For the evaluation of testing tools we need more!
WE NEED….
Empirical studies with:
• Real people
• Real faults
• Real systems
• Real Testing Environments
• Real Testing Processes
What are empirical studies
• [Wikipedia] Empirical research is a way of gaining knowledge by means of direct and indirect observation or experience.
• [PPV00] An empirical study is really just a test that compares what we believe to what we observe. Such tests, when wisely constructed and executed, play a fundamental role in software engineering, helping us understand how and why things work.
• Collecting data:
  – Quantitative data -> numeric data
  – Qualitative data -> observations, interviews, opinions, diaries, etc.
• Different kinds are distinguished in the literature [WRH+00, RH09]:
  – Controlled experiments
  – Surveys
  – Case studies
  – Action research
Controlled experiments
What: Experimental investigation of a hypothesis in a laboratory setting, in which conditions are set up to isolate the variables of interest ("independent variables") and test how they affect certain measurable outcomes (the "dependent variables").
Good for:
– Quantitative analysis of the benefits of a testing tool or technique
– We can use methods showing statistical significance
– We can demonstrate how scientific we are! [EA06]
Disadvantages:
– Limited confidence that the laboratory set-up reflects the real situation
– Ignores contextual factors (e.g. social/organizational/political factors)
– Extremely time-consuming
See: [BSH86, CWH05, PPV00, Ple95]
Surveys
What: Collecting information to describe, compare or explain knowledge, attitudes and behaviour over large populations using interviews or questionnaires.
Good for:
– Quantitative and qualitative data
– Investigating the nature of a large population
– Testing theories where there is little control over the variables
Disadvantages:
– Difficulties of sampling and selection of participants
– Collected information tends to be subjective opinion
See: [PK01-03]
Case Studies
What: A technique for detailed exploratory investigations that attempt to understand and explain phenomena or test theories within their context.
Good for:
– Quantitative and qualitative data
– Investigating the capability of a tool within a specific real context
– Gaining insights into chains of cause and effect
– Testing theories in complex settings where there is little control over the variables (companies!)
Disadvantages:
– Hard to find good, appropriate case studies
– Hard to quantify findings
– Hard to build generalizations (context only)
See: [RH09, EA06, Fly06]
Action Research
What: Research initiated to solve an immediate problem (or a reflective process of progressive problem solving), involving a process of actively participating in an organization's change situation whilst conducting research.
Good for:
– Quantitative and qualitative data
– When the goal is solving a problem
– When the goal of the study is change
Disadvantages:
– Hard to quantify findings
– Hard to build generalizations (context only)
See: [RH09, EA06, Fly06, Rob02]
EU projects: how to evaluate your results
• You need to do empirical studies that evaluate the resulting testing tools/techniques within a real industrial environment.
• The empirical study that best fits our purposes is the Case Study:
  – Evaluate the capability of our testing techniques/tools
  – In a real industrial context
  – Comparing to current testing practice
Case studies: not only for justifying research projects
• We need to apply our results in industry to understand the problems they have and whether we are going in the right direction to solve them.
• There is a real need in the software engineering industry for general guidelines on what testing techniques and tools to use for different testing objectives, and how usable these techniques are.
• To date, these guidelines do not exist.
• What is needed is a body of documented experiences and knowledge from which these guidelines can be extracted.
• With such guidelines, testing practitioners could make informed decisions about which techniques to use and estimate the time/effort that is needed.
The challenge
TO DO THIS WE HAVE TO:
• Perform more evaluative empirical case studies in industrial environments.
• Carry out these studies by following the same methodology, to enhance the comparison among the testing techniques and tools.
• Involve realistic systems, environments and subjects (and not toy-programs and students, as is the case in most current work).
• Do the studies thoroughly, to ensure that any benefit identified during the evaluation study is clearly derived from the testing technique studied, and also to ensure that different studies can be compared.
FOR THIS WE NEED:
• A general methodological evaluation framework that can simplify the design of case studies for comparing software testing techniques and make the results more precise, reliable, and easy to compare.
The goal: to create a body of evidence consisting of evaluative studies of testing techniques and tools that can be used to understand the needs of industry and derive general guidelines about their usability and applicability in industry.
The idea
[Figure: a General Framework for Evaluating Testing Techniques and Tools is instantiated into primary case studies (CS1, CS2, … CSn) for each of the testing techniques and tools (TT1, TT2, … TTn). Together these studies form a Body of Evidence, over which a secondary study (EBSE) is performed to obtain answers to general questions about adopting different software testing techniques and/or tools.]
Very brief… what is EBSE: Evidence-Based Software Engineering
• The essence of the evidence-based paradigm is that of systematically collecting and analysing all of the available empirical data about a given phenomenon, in order to obtain a much wider and more complete perspective than would be obtained from an individual study, not least because each study takes place within a particular context and involves a specific set of participants.
• The core tool of the evidence-based paradigm is the Systematic Literature Review (SLR):
  – Secondary study
  – Gathering and analysing primary studies
• See: http://www.dur.ac.uk/ebse/about.php
Empirical research with industry = difficult
• Not "rocket science"-difficult
• But "communication science"-difficult
"communication science"-difficult
[Cartoon: a researcher and a practitioner in a company attach very different meanings to estimates like "only takes 1 hour", "not so long" and "5 minutes".]
Communication is extremely difficult
Examples…..
Academia:
• Wants to empirically evaluate T
• What techniques/tools can we compare with?
• Why don't you know that?
• That does not take so much time!
• Finding real faults would be great!
• Can we then inject faults?
• How many people can use it?
• Is there historical data?
• But you do have that information?

Industry:
• Wants to execute and use T to see what happens
• We use intuition!
• You want me to know all that?
• That much time!?
• We cannot give this information
• Not artificial ones, we really need to know if this would work for real faults
• We can assign 1 person
• That is confidential
• Oh.., I thought you did not need that
With the objective to improve communication and reduce some barriers:
• Use a general methodological framework
• To use as a vehicle of communication
• To simplify the design
• To make sure that studies can be compared and aggregated
Existing work
• By Lott & Rombach, Eldh, Basili, Do et al., Kitchenham et al.
• They describe organizational frameworks, i.e.:
  – General steps
  – Warnings when designing
  – Confounding factors that should be minimized
• We intended to define a methodological framework that defines how to evaluate software testing techniques, i.e.:
  – The research questions that can be posed
  – The variables that can be measured
  – Etc.
The methodological framework
• Imagine a company C wants to evaluate a testing technique/tool T to see whether it is useful and worthwhile to incorporate in the company.
• Components of the framework (each case study will be an instantiation):
  – Objectives: effectiveness, efficiency, satisfaction
  – Cases or treatments (= the testing techniques/tools)
  – The subjects (= practitioners that will do the study)
  – The objects or pilot projects: selection criteria
  – The variables and metrics: which data to collect?
  – Protocol that defines how to execute and collect data
  – How to analyse the data
  – Threats to validity
  – Toolbox
Definition of the framework – research questions
• RQ1: How does T contribute to the effectiveness of testing when it is being used in real testing environments of C and compared to the current practices of C?
• RQ2: How does T contribute to the efficiency of testing when it is being used in real testing environments of C and compared to the current practices of C?
• RQ3: How satisfied (subjective satisfaction) are testing practitioners of C during the learning, installing, configuring and usage of T when used in real testing environments?
Definition of the framework – the cases
• Reused a taxonomy from Vegas and Basili, adapted to software testing tools and augmented with results from Tonella.
• The case, i.e. the testing technique or tool, should be described by:
  – Prerequisites: type, life-cycle, environment (platform and languages), scalability, input, knowledge needed, experience needed
  – Results: output, completeness, effectiveness, defect types, number of generated test cases
  – Operation: interaction modes, user guidance, maturity, etc.
  – Obtaining the tool: license, cost, support
Definition of the framework – subjects
• Workers of C that normally use the techniques and tools with which T is being compared.
Definition of the framework – the objects/pilot projects
• In order to see how we can compare and what type of study we will do, we need answers to the following questions:
  A. Will we have access to a system with known faults? What information is present about these faults?
  B. Are we allowed/able to inject faults into the system?
  C. Does your company gather data from projects as standard practice? What data is this? Can this data be made available for comparison? Do you have a company baseline? Do we have access to a sister project?
  D. Does company C have enough time and resources to execute various rounds of tests? Or, more concretely:
     • Is company C willing to make a new test suite TSna with some technique/tool Ta already used in the company?
     • Is company C willing to make a new test suite TSnn with some technique/tool Tn that is also new to company C?
     • Is there an existing test suite TSe that we can use to compare? (Do we know the techniques that were used to create that test suite, and how much time it took?)
Definition of the framework – Protocol
[Figure: a decision tree over the questions above that leads to one of 7 study set-ups, numbered 1–7 and matching the scenarios below. Its branch points are: Can we inject faults? (if yes: train on T, inject the faults, apply T to make TST and collect data); Can we compare with an existing test suite from company C (i.e. TSC)?; Can company C make another test suite TSN for comparison, using Tknown or Tunknown? (if yes: apply Tknown or Tunknown to make TSN and collect data); Do we have a version with known faults?; Do we have a company baseline?]
Definition of the framework – Scenarios
• Remember:
  o Does company C have enough time and resources to execute various rounds of tests? Or, more concretely:
    • Is company C willing to make a new test suite TSna with some technique/tool Ta already used in the company?
    • Is company C willing to make a new test suite TSnn with some technique/tool Tn that is also new to company C?
    • Is there an existing test suite TSe that we can use to compare? (Do we know the techniques that were used to create that test suite, and how much time it took?)
• Scenario 1 (qualitative assessment only) (Qualitative Effects Analysis)
• Scenario 2 (Scenario 1 /\ quantitative analysis based on company baseline)
• Scenario 3 ((Scenario 1 \/ Scenario 2) /\ quantitative analysis of FDR)
• Scenario 4 ((Scenario 1 \/ Scenario 2) /\ quantitative comparison of T and TSe)
• Scenario 5 (Scenario 4 /\ FDR of T and TSe)
• Scenario 6 ((Scenario 1 \/ Scenario 2) /\ quantitative comparison of T and (Ta or Tn))
• Scenario 7 (Scenario 6 /\ FDR of T and (Ta or Tn))
(FDR = fault detection rate; /\ = and, \/ = or)
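As a reading aid, the choice between these scenarios can be sketched as a tiny decision function. This is a rough, illustrative encoding of the protocol, not the framework's normative definition, and it omits the branches on versions with known faults:

```python
def choose_scenario(can_inject_faults, has_existing_suite_TSe,
                    can_build_new_suite, has_company_baseline):
    """Rough sketch of the protocol's decision tree (illustrative only)."""
    if has_existing_suite_TSe:
        # Quantitative comparison of T against TSe; add fault detection
        # rate (FDR) if we are allowed to seed faults.
        return 5 if can_inject_faults else 4
    if can_build_new_suite:
        # Compare T against a suite built with Ta (known) or Tn (new).
        return 7 if can_inject_faults else 6
    if has_company_baseline:
        return 3 if can_inject_faults else 2
    return 3 if can_inject_faults else 1  # otherwise qualitative only

print(choose_scenario(False, True, False, False))  # 4, as in the Sulake study
print(choose_scenario(True, True, False, False))   # 5, as in the IBM study
```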
If we can inject faults, take care that:
1. The artificially seeded faults are similar to real faults that naturally occur in real programs due to mistakes made by developers.
   – To identify realistic fault types, a history-based approach can be used: "real" faults can be fetched from the bug tracking system, making sure that these reported faults are representative of the faults that developers introduce during implementation.
2. The faults should be injected in code that is covered by an adequate number of test cases.
   – E.g., they may be seeded in code that is executed by more than 20 percent and less than 80 percent of the test cases.
3. The faults should be injected "fairly", i.e., an adequate number of instances of each fault type is seeded.
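To make guideline 2 concrete, here is a minimal sketch (hypothetical data structures, not one of the FITTEST tools) that filters candidate seeding locations by the share of test cases covering them:

```python
def seedable_locations(coverage, n_tests, low=0.2, high=0.8):
    """Return the locations covered by more than `low` and less than
    `high` of the test suite. `coverage` maps a code location to the
    set of test case ids that execute it (hypothetical structure)."""
    return [loc for loc, tests in coverage.items()
            if low < len(tests) / n_tests < high]

# Example: only `checkout` falls in the 20-80% band (2 of 5 tests).
cov = {"login": {1, 2, 3, 4, 5}, "checkout": {1, 2}, "admin": set()}
print(seedable_locations(cov, n_tests=5))  # ['checkout']
```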
Definition of the framework – data to collect
• Effectiveness
  – Number of test cases designed or generated
  – How many invalid test cases are generated
  – How many repeated test cases are generated
  – Number of failures observed
  – Number of faults found
  – Number of false positives (the test is marked as Failed when the functionality is working)
  – Number of false negatives (the test is marked as Passed when the functionality is not working)
  – Type and cause of the faults that were found
  – Estimation (or, when possible, measurement) of the coverage reached
• Efficiency
• Subjective satisfaction
Definition of the framework – data to collect
• Effectiveness
• Efficiency
  – Time needed to learn the testing method
  – Time needed to design or generate the test cases
  – Time needed to set up the testing infrastructure (install, configure, develop test drivers, etc.) (quantitative). (Note: if software is to be developed, or other major configuration/installation efforts are needed, it might be a good idea to maintain working diaries.)
  – Time needed to test the system and observe the failures (i.e. planning, implementation and execution) in hours (quantitative)
  – Time needed to identify the fault type and cause for each observed failure (i.e. time to isolate) (quantitative)
• Subjective satisfaction
Definition of the framework – data to collect
• Effectiveness
• Efficiency
• Subjective satisfaction (a mixture of methods):
  – SUS score (10-question questionnaire with a 5-point Likert scale and a total score)
  – 5 reactions (through reaction cards) that will be used to create a word cloud and Venn diagrams
  – Emotional face reactions during semi-structured interviews (faces will be evaluated on a Likert scale from "not at all like this" to "very much like this")
  – Subjective opinions about the tool
System Usability Scale (SUS)
© Digital Equipment Corporation, 1986. www.usability.serco.com/trump/documents/Suschapt.doc
[Questionnaire: ten items, each rated from "Strongly disagree" to "Strongly agree":
1. I think that I would like to use this system frequently
2. I found the system unnecessarily complex
3. I thought the system was easy to use
4. I think that I would need the support of a technical person to be able to use this system
5. I found the various functions in this system were well integrated
6. I thought there was too much inconsistency in this system
7. I would imagine that most people would learn to use this system very quickly
8. I found the system very cumbersome to use
9. I felt very confident using the system
10. I needed to learn a lot of things before I could get going with this system]
Why SUS
• Studies (e.g. [TS04, BKM08]) have shown that this simple questionnaire gives the most reliable results.
• SUS is technology-agnostic, making it flexible enough to assess a wide range of interface technologies.
• The survey is relatively quick and easy to use by both study participants and administrators.
• The survey provides a single score on a scale that is easily understood by the wide range of people (from project managers to computer programmers) who are typically involved in the development of products and services and who may have little or no experience in human factors and usability.
• The survey is nonproprietary, making it a cost-effective tool as well.
SUS Scoring
• SUS yields a single number representing a composite measure of the overall usability of the system being studied. Note that scores for individual items are not meaningful on their own.
• To calculate the SUS score, first sum the score contributions from each item:
  – Each item's score contribution will range from 0 to 4.
  – For items 1, 3, 5, 7 and 9 the score contribution is the scale position minus 1.
  – For items 2, 4, 6, 8 and 10, the contribution is 5 minus the scale position.
  – Multiply the sum of the scores by 2.5 to obtain the overall SUS value.
• SUS scores have a range of 0 to 100.
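The scoring rule is mechanical enough to express in a few lines; an illustrative sketch (not part of the framework's toolbox):

```python
def sus_score(responses):
    """Compute the SUS score from ten 1-5 Likert responses, item 1 first.
    Odd items contribute (position - 1), even items (5 - position);
    the sum is multiplied by 2.5 to land on the 0-100 scale."""
    assert len(responses) == 10
    total = 0
    for i, r in enumerate(responses, start=1):
        total += (r - 1) if i % 2 == 1 else (5 - r)
    return total * 2.5

# All-neutral answers (3 everywhere) give a score of 50, which happens
# to match the value the Sulake subject reported later in this talk.
print(sus_score([3] * 10))  # 50.0
```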
SUS is not enough…
• In a literature review of 180 published usability studies, Hornbaek [Horn06] concludes that measures of satisfaction should be extended beyond questionnaires.
• So we add more….
Reaction Cards
[Image: the product reaction cards. Developed by and © 2002 Microsoft Corporation. All rights reserved.]
Emotional face reactions
• The idea is to elicit feedback about the product, particularly emotions that arose for the participants while talking about the product (e.g. frustration, happiness).
• We will videotape the users when they respond to the following two questions during a semi-structured interview:
• Would you recommend this tool to other colleagues?
  – If not, why?
  – If yes, what arguments would you use?
• Do you think you can persuade your management to invest in a tool like this?
  – If not, why?
  – If yes, what arguments would you use?
[Rating scale: 1 to 7, from "Not at all like this" to "Very much like this".]
Analysing and interpreting the data
• Depends on the amount of data we have.
• If we only have 1 value for each variable, no analysis techniques are available and we just present and interpret the data.
• If we have sets of values for a variable, then we need to use statistical methods:
  – Descriptive statistics
  – Statistical tests (or significance testing). In statistics, a result is called statistically significant if it is unlikely to have occurred by chance alone, according to a pre-determined threshold probability, the significance level.
• When evaluating SBST tools we will always have sets of values for most of the variables, to deal with randomness.
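As an illustration of what that looks like in practice (hypothetical numbers; assumes SciPy is available), descriptive statistics followed by a non-parametric significance test over repeated runs of two randomized tools:

```python
import statistics
from scipy import stats  # assumes SciPy is installed

# Hypothetical fault counts from 10 runs of two randomized (SBST) tools.
tool_a = [12, 14, 11, 15, 13, 12, 14, 16, 13, 12]
tool_b = [10, 11, 9, 12, 10, 11, 10, 13, 9, 11]

# Descriptive statistics first...
print(statistics.mean(tool_a), statistics.median(tool_a))

# ...then a significance test. Mann-Whitney U is a common choice for
# SBST data because it does not assume normally distributed samples.
u, p = stats.mannwhitneyu(tool_a, tool_b, alternative="two-sided")
print(f"U = {u}, p = {p:.4f}")  # "significant" if p < chosen level, e.g. 0.05
```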
Descriptive statistics
• Mean, median, mode, standard deviation, frequency, correlation, etc.
• Graphical visualisation: scatter plot, box plot, histogram, pie charts
Statistical tests
Most important is finding the right method!
[Figure: decision chart for selecting the right statistical test, from http://science.leidenuniv.nl/index.php/ibl/pep/people/Tom_de_Jong/Teaching]
Definition of the framework – Threats
• Threats to validity (of confounding factors) have to be minimized.
• These are the effects or situations that might jeopardize the validity of your results….
• Those that cannot be prevented have to be reported.
• When working with people we have to consider many sociological effects.
• Let us just look at a couple of well-known ones to give you an idea….
  – The learning curve effect
  – The Hawthorne effect
  – The placebo effect
  – Etc….
The learning curve effect
• When using new methods/tools, people gain familiarity with their application over time (= learning curve).
  – Initially they are likely to use them more ineffectively than they might after a period of familiarisation/learning.
• Thus the learning curve effect will tend to counteract any positive effects inherent in the new method/tool.
• In the context of evaluating methods/tools there are two basic strategies to minimise the learning curve effect (they are not mutually exclusive):
  1. Provide appropriate training before undertaking an evaluation exercise;
  2. Separate pilot projects aimed at gaining experience of using a method/tool from pilot projects that are part of an evaluation exercise.
The Hawthorne effect
• When an evaluation is performed, staff working on the pilot project(s) may have the perception that they are working under more management scrutiny than normal and may therefore work more conscientiously.
• The name comes from the Hawthorne factory experiments (lights low or high?).
• The Hawthorne effect would tend to exaggerate positive effects inherent in a new method/tool.
• A strategy to minimise the Hawthorne effect is to ensure that a similar level of management scrutiny is applied to control projects in your case study (i.e. project(s) using the current method/tool) as is applied to the projects that are using the new method/tool.
The placebo effect
• In medical research, patients who are deliberately given ineffectual treatments recover if they believe that the treatment will cure them.
• Likewise, a software engineer who believes that adopting some practice (e.g., wearing a pink t-shirt) will improve the reliability of his code may succeed in producing more reliable code.
• Such a result could not be generalised.
• In medicine, the placebo effect is minimized by not informing the subjects.
• This cannot be done in the context of testing tool evaluations.
• When evaluating methods and tools the best you can do is to:
  – Assign staff to pilot projects using your normal project staffing methods and hope that the actual selection of staff includes the normal mix of enthusiasts, cynics and no-hopers that normally comprise your project teams.
  – Make a special effort to avoid staffing pilot projects with staff who have a vested interest in the method/tool (i.e. staff who developed or championed it) or a vested interest in seeing it fail (i.e. staff who really hate change rather than just resent it).
• This is a bit like selecting a jury: initially the selection of potential jurors is at random, but there is additional screening to avoid jurors with identifiable bias.
Definition of the framework – Threats
• There are many more factors that all need to be identified.
• Not only the people but also the technology:
  – Did the measurement tools (e.g. coverage) really measure what we thought?
  – Are the injected faults really representative?
  – Were the faults injected fairly?
  – Are the pilot project and software representative?
  – Were the used oracles reliable?
  – Etc….
Definition of the framework – Toolbox
• Toolbox:
  – Demographic questionnaire: to be answered by the testers before performing the test. It aims to capture the characteristics of the testers: level of experience in using the tool, years, job, knowledge of similar tools.
  – Satisfaction questionnaire SUS: to extract the testers' satisfaction when they perform the evaluation.
  – Reaction cards
  – Questions for (taped) semi-structured interviews
  – Process for investigating the face reactions in the videos
  – A fault taxonomy to classify the found faults
  – A software testing technique and tools classification/taxonomy
  – Working diaries
  – A fault template to classify each fault detected by the tests. The template contains the following information:
    • Time spent to detect the fault
    • Test case that found the fault
    • Cause of the fault: mistake in the implementation, mistake in the design, mistake in the analysis
    • Manifestation of the fault in the code
Applying or instantiating the framework
• Search-Based Structural Testing tool [Vos et al. 2012]
• Search-Based Functional Testing tool [Vos et al. 2013]
• Web testing techniques for AJAX applications [MRT08]
• Commercial combinatorial testing tool at Sulake (to be presented at the ISSTA workshop next week)
• Automated Test Case Generation at IBM (has been sent to ESEM)
• We have finished other instances:
  – Commercial combinatorial testing tool at SOFTEAM (has been sent to ESEM)
• Currently we are working on more instantiations:
  – Regression testing prioritization technique
  – Continuous testing tool at SOFTEAM
  – Rogue User Testing at SOFTEAM
  – …
Combinatorial Testing example case study Sulake: Context
• Sulake is a Finnish company
• Develops social entertainment games
• Main product: Habbo Hotel
  – World's largest virtual community for teenagers
  – Millions of teenagers a week all over the world (ages 13-18)
  – Accessed directly through the browser or Facebook
  – Available in 11 languages
  – 218,000,000 registered users
  – 11,000,000 visitors / month
• The system can be accessed through a wide variety of browsers and Flash players (and their versions) that run on different operating systems!
• Which combinations to use when testing the system?!
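To see why this question matters, consider a toy configuration space (hypothetical values, far smaller than Habbo's real browser/OS/Flash matrix). Exhaustive testing grows multiplicatively with each parameter, while covering every pair of parameter values needs far fewer configurations; a naive greedy pairwise sketch:

```python
from itertools import combinations, product

# Hypothetical configuration space of the kind Sulake faces.
params = {
    "browser": ["Firefox", "Chrome", "IE", "Safari"],
    "os": ["Windows", "macOS", "Linux"],
    "flash": ["10.x", "11.x"],
}

names = list(params)
all_configs = list(product(*params.values()))
print(len(all_configs))  # 24 exhaustive combinations

def pairs(config):
    """All parameter-value pairs exercised by one configuration."""
    return {((names[i], config[i]), (names[j], config[j]))
            for i, j in combinations(range(len(config)), 2)}

# Greedy pairwise selection: repeatedly pick the configuration that
# covers the most still-uncovered pairs (26 pairs in total here).
uncovered = set().union(*(pairs(c) for c in all_configs))
suite = []
while uncovered:
    best = max(all_configs, key=lambda c: len(pairs(c) & uncovered))
    suite.append(best)
    uncovered -= pairs(best)
print(len(suite))  # around 12 configurations cover all pairs
```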
Can a tool help?
• What about CTE XL Professional, or CTE for short?
• Classification Tree Editor:
  – Model your combinatorial problem in a tree
  – Indicate the priorities of the combinations
  – The tool automatically generates the best test cases!
Combinatorial Testing example case study Sulake: Research Questions
• What do we want to find out?
• Research questions:
  – RQ1: Compared to the current test suites used for testing in Sulake, can the test cases generated by the CTE contribute to the effectiveness of testing when it is used in real testing environments at Sulake?
  – RQ2: How much effort would be required to introduce the CTE into the testing processes currently implanted at Sulake?
  – RQ3: How much effort would be required to add the generated test cases into the testing infrastructure currently used at Sulake?
  – RQ4: How satisfied are Sulake testing practitioners during the learning, installing, configuring and usage of the CTE when it is used in a real testing environment?
The testing tools evaluated (the cases or treatments)
• Current combinatorial testing practice at Sulake:
  – Exploratory testing with feature coverage as objective
  – Based on real user information (i.e. browsers, OS, Flash, etc.), combinatorial aspects are taken into account
COMPARED TO
• Classification Tree Editor (CTE):
  – Classify the combinatorial aspects as a classification tree
  – Generate prioritized test cases and select
Who is doing the study (the subjects)
• Subjects: 1 senior tester from Sulake (6 years software development, 8 years testing experience)
Systems Under Test (objects or pilot projects)
• Objects:
  – 2 nightly builds of Habbo
  – The existing test suite that Sulake uses (TSsulake), with 42 automated test cases
  – No known faults, no injection of faults
The protocol (scenario 4)
[Same protocol decision tree as before; with no fault injection and an existing test suite TSsulake to compare against, the path through the tree ends in scenario 4.]
Collected data

Variables                                                      TSsulake    TSCTE
Measuring effectiveness:
  Number of test cases                                         42          68 (selected 42 high-priority)
  Number of invalid test cases                                 0           0
  Number of repeated test cases                                0           26
  Number of failures observed                                  0           12
  Number of faults found                                       0           2
  Type and cause of the faults                                 N/A         1. critical, browser hang
                                                                           2. minor, broken UI element
  Feature coverage reached                                     100%        40%
  All-pairs coverage reached                                   N/A         80%
Measuring efficiency:
  Time needed to learn the CTE testing method                  N/A         116 min
  Time needed to design and generate the test suite with CTE   N/A         95 min (62 for tree, 33 for removing duplicates)
  Time needed to set up testing infrastructure specific to CTE N/A         74 min
  Time needed to automate the test suite generated by the CTE  N/A         357 min
  Time needed to execute the test suite                        114 min     183 min (both builds)
  Time needed to identify fault types and causes               0 min       116 min
Measuring subjective satisfaction:
  SUS                                                          N/A         50
  Reaction cards                                               N/A         Comprehensive, Dated, Old, Sterile, Unattractive
  Informal interview                                           N/A         video
Descriptive statistics for efficiency (1) and (2)
[Charts with the efficiency measurements.]
Emotional face reactions
[Video stills of the interview, covering four topics: 1. duplicate test cases; 2. readability of the CTE trees; 3. technical support and user manuals; 4. appearance of the tool.]
Combinatorial Testing example case study Sulake: Conclusions
• RQ1: Compared to the current test suites used for testing in Sulake, can the test cases generated by the CTE contribute to the effectiveness of testing when it is used in real testing environments at Sulake?
  – 2 new faults were found!!
  – Sulake confirmed the need for more structured combinatorial testing.
• RQ2: How much effort would be required to introduce the CTE into the testing processes currently implanted at Sulake?
  – The effort for learning and installing is medium, but can be justified within Sulake since it is needed only once.
  – Designing and generating test cases suffers from duplicates, which cost time to remove. The total effort can be accepted within Sulake because critical faults were discovered.
  – Executing the test suite generated by the CTE takes 1 hour more than the Sulake test suite (due to more combinatorial aspects being tested). This means that Sulake cannot include these tests in the daily build, but will have to add them to nightly builds only.
• RQ3: How much effort would be required to add the generated test cases into the testing infrastructure currently used at Sulake?
  – Automating the generated test cases takes effort, but this can be justified within Sulake.
• RQ4: How satisfied are Sulake testing practitioners during the learning, installing, configuring and usage of the CTE when it is used in a real testing environment?
  – It seems to have everything that is needed, but looks unattractive.
Automated Test Case Generation example case study IBM: context
• IBM Research Labs in Haifa, Israel
• Develops a system (denoted IMP ;-) for resource management in a networked environment (servers, virtual machines, switches, storage devices, etc.)
• The IBM Lab has a designated team that is responsible for testing new versions of the product.
• The testing is done within a simulated testing environment developed by this testing team.
• The IBM Lab is interested in evaluating the Automated Test Case Generation tools developed in the FITTEST project!
Automated Test Case Generation example case study IBM: the research questions
• What do we want to find out?
• Research questions:
  – RQ1: Compared to the current test suite used for testing at IBM Research, can the FITTEST tools contribute to the effectiveness of testing when they are used in real testing environments at IBM?
  – RQ2: Compared to the current test suite used for testing at IBM Research, can the FITTEST tools contribute to the efficiency of testing when they are used in real testing environments at IBM?
  – RQ3: How much effort would be required to deploy the FITTEST tools within the testing processes currently implanted at IBM Research?
The testing tools evaluated (the cases or treatments)
• Current test case design practice at IBM:
  – Exploratory test case design
  – The objective of the test cases is to maximise the coverage of the system use-cases
COMPARED TO
• FITTEST Automated Test Case Generation tools:
  – Only part of the whole continuous testing approach of FITTEST
FITTEST Automated Test Case Generation

Logs2FSM
• Infers FSM models from the logs, applying an event-based model inference approach (read more in [MTR08]).
• The model-based oracles that also result from this tool refer to the use of the paths generated from the inferred FSM as oracles. If these paths, when transformed to test cases, cannot be fully executed, then the tester needs to inspect the failing paths to see whether that is due to some fault, or the paths themselves are infeasible.

FSM2Tests
• Takes FSMs and a Domain Input Specification (DIS) file, created by a tester for the IBM Research SUT, to generate concrete test cases.
• This component implements a technique that combines model-based and combinatorial testing (see [NMT12]):
  1. generate test paths from the FSM (using various simple and advanced graph visit algorithms);
  2. transform these paths into classification trees in the CTE XL format, enriched with the DIS, such as data types and partitions;
  3. generate test combinations from those trees using combinatorial criteria.
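As a flavour of what event-based model inference does (a deliberately crude sketch; the actual Logs2FSM algorithm described in [MTR08] is more sophisticated), one can abstract each log position by its last observed event and collect the transitions:

```python
from collections import defaultdict

def infer_fsm(traces):
    """Crude event-based inference: abstract each state by the last
    observed event, so a transition on event e always leads to state e.
    Returns a map: state -> set of outgoing event labels."""
    transitions = defaultdict(set)
    for trace in traces:
        state = "START"
        for event in trace:
            transitions[state].add(event)
            state = event
    return transitions

# Two hypothetical log traces of REST calls against the SUT.
traces = [
    ["GET /virtualAppliances", "POST /workloads", "GET /workloads/{id}"],
    ["GET /virtualAppliances", "GET /virtualAppliances/{id}"],
]
for state, events in infer_fsm(traces).items():
    print(state, "->", sorted(events))
```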
Who is doing the study (the subjects)
• Subjects:
  – 1 senior tester from IBM (10 years software development, 5 years testing experience, of which 4 years with the IMP system)
  – 1 researcher from FBK (10 years of experience with software development, 5 years of experience with research in testing)
Pilot project
• Objects:
  – SUT: distributed application for managing system resources in a networked environment
    • A management server that communicates with multiple managed clients (= physical or virtual resources)
    • Important product for IBM with real customers
    • The case study will be performed on a new version of this system
  – Existing test suite that IBM uses (TSibm)
    • Selected from what they call System Validation Tests (SVT) -> tests for high-level, complex customer use-cases
    • Manually designed
    • Automatically executed through activation scripts
  – 10 representative faults to inject into the system
The protocol (scenario 5)
[Same protocol decision tree as before; with fault injection and an existing test suite TSibm to compare against, the path through the tree ends in scenario 5 (comparison plus fault detection rate).]
More detailed steps
1. Configure the simulated environment and create the logs [IBM]
2. Select test suite TSibm [IBM]
3. Write activation scripts for each test case in TSibm [IBM]
4. Generate TSfittest [FBK subject]
   a. Instantiate the FITTEST components for the IMP
   b. Generate the FSM with Logs2FSM
   c. Define the Domain Input Specification (DIS)
   d. Generate the concrete test data with FSM2Tests
5. Select and inject the faults [IBM]
6. Develop a tool that transforms the concrete test cases generated by the FITTEST tool FSM2Tests to an executable format [IBM]
7. Execute TSibm [IBM]
8. Execute TSfittest [IBM]
Collected Measures

TABLE IV. DESCRIPTIVE MEASURES FOR THE TEST SUITES TSibm AND TSfittest

                                   TSibm                          TSfittest
size
  number of abstract test cases    NA                             84
  number of concrete test cases    4                              3054
  number of commands (or events)   1814                           18520
construction
  design of the test cases         manual (cf. Section III-B1)    automated (cf. Section III-B2)
effort to create the test suite
                                   design: 5 hours                set up FITTEST tools: 8 hours
                                   activation scripts: 4 hours    generate the FSM: automated, less than 1 second CPU time
                                                                  specify the DIS: 2 hours
                                                                  generate concrete tests: automated, less than 1 minute CPU time
                                                                  transform into executable format: 20 hours
…what the researchers have in mind and what is investigated according to the research questions. This type of threat is mainly related to the use of injected faults to measure the fault-finding capability of our testing strategies. This is because the types of faults seeded may not be representative enough of real faults. In order to mitigate this threat, the IBM team identified representative faults that were based on real faults, identified earlier in the development. Although this identification was done by a senior tester, the list was revised by the whole IBM team that participated in this case study.
V. CONCLUSIONS
We have presented a "which is better" [9] case study for evaluating FITTEST testing tools with a real user and real tasks within a realistic industrial environment of IBM Research. The design of the case study has been done according to the methodological framework for defining case studies presented in [14]. Although a one-subject case study will never provide general conclusions with statistical significance, the obtained results can be generalized to other similar software in similar testing environments of IBM Research [15], [8]. Moreover, the study was very useful for technology transfer purposes: some remarks during the study indicate that the FITTEST techniques would not have been evaluated in so much depth if it had not been backed up by our case study design. Finally, having only a limited number of subjects available, this study took several weeks to complete, and hence we overcame the problem of getting too much information too late.
The objective of this research was to examine the advancements of the FITTEST tools and validate their potential to improve current testing practices at IBM Research. The following were the results of the case study:
• The FITTEST tools can increase the effectiveness of the current practice of the IBM Research team for testing the IMP within the simulated environment.
• The efficiency of the FITTEST tools is found acceptable by IBM Research for testing the IMP within the simulated environment.
• The test cases automatically generated by the FITTEST tools better support the identification of the source of the faults when testing the IMP within the simulated environment.
• The effort for deploying the FITTEST tools within a real industry case has been found reasonable by IBM Research.
Moreover, from the FITTEST project's point of view we have the following results:
• The FITTEST tools have shown to be useful within the context of a real industrial case.
• The FITTEST tools have the ability to automate the testing process within a real industrial case.
ACKNOWLEDGMENT
This work was financed by the FITTEST project, ICT-2009.1.2 no 257574. Also, we would like to thank the great help we got from Alon Aradi for conducting the experiments.
REFERENCES
[1] http://pic.dhe.ibm.com/infocenter/director/pubs/index.jsp?topic=%2Fcom.ibm.director.vim.helps.doc%2Ffsd0_vim_main.html.
[2] B. Beizer. Software Testing Techniques. International Thomson Computer Press, 1990.
[3] A. Biermann and J. Feldman. On the synthesis of finite-state machines from samples of their behavior. IEEE Trans. on Computers, 21(6), 1972.
[4] A. C. Dias Neto, R. Subramanyan, M. Vieira, and G. H. Travassos. A survey on model-based testing approaches: a systematic review. In 1st ACM Int. Workshop on Empirical Assessment of Software Engineering Languages and Technologies (held in conjunction with the 22nd IEEE/ACM ASE 2007), WEASELTech '07, pages 31–36, New York, NY, USA, 2007. ACM.
[5] M. Fewster and D. Graham. Software Test Automation. Addison-Wesley, 1999.
[6] M. Grochtmann and J. Wegener. Test case design using classification trees and the classification-tree editor CTE. In Proceedings of the 8th International Software Quality Week, San Francisco, USA, May 1995.
[7] M. Harman and B. F. Jones. Search-based software engineering. Information and Software Technology, 43(14):833–839, 2001.
[8] W. Harrison. Editorial (N=1: an alternative for software engineering research). Empirical Software Engineering, 2(1):7–10, 1997.
[9] B. Kitchenham, L. Pickard, and S. Pfleeger. Case studies for method and tool evaluation. Software, IEEE, 12(4):52–62, July 1995.
[10] G. J. Myers and C. Sandler. The Art of Software Testing. John Wiley & Sons, 2004.
[11] C. D. Nguyen, A. Marchetto, and P. Tonella. Combining model-based and combinatorial testing for effective test case generation. In Proceedings of the 2012 International Symposium on Software Testing and Analysis, pages 100–110. ACM, 2012.
[12] C. Nie and H. Leung. A survey of combinatorial testing. ACM Comput. Surv., 43:11:1–11:29, February 2011.
[13] T. Vos, P. Tonella, J. Wegener, M. Harman, W. Prasetya, and S. Ur. Testing of future internet applications running in the cloud. In S. Tilley and T. Parveen, editors, Software Testing in the Cloud: Perspectives on an Emerging Discipline, pages 305–321. 2013.
[Figure: the FSM inferred by Logs2FSM from the IMP logs, with states START and S1–S50 and transitions labelled with REST calls such as GET/VMControl/virtualAppliances, GET/VMControl/virtualAppliances/{id}/progress, GET/VMControl/virtualServers/{id}/customization, GET/VMControl/workloads/{id}, POST/VMControl/workloads, PUT/VMControl/virtualServers/{id} and GET/isdsettings/restcompatibilities.]
TABLE II. DESCRIPTIVE MEASURES FOR THE FSM THAT IS USED TO GENERATE TSfittest.

Variable                                   Value
Number of traces used to infer FSM         6
Average trace length                       100
Number of nodes in generated FSM           51
Number of transitions in generated FSM     134
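The FSM in Table II is inferred automatically from the six logged traces. As an illustration only (this is not the FITTEST implementation, and the one-call-per-event trace format is a hypothetical simplification), the following sketch turns flat call logs into states and transitions by abstracting each state to a sliding window of the k most recent calls:

    # Minimal sketch (not the FITTEST tooling): infer an FSM from logged
    # REST call traces using a k-window history abstraction of the state.
    from collections import defaultdict

    def infer_fsm(traces, k=2):
        """States are tuples of the k most recent events; transitions
        record which call leads from one state to the next."""
        transitions = defaultdict(set)           # (state, event) -> {next_state}
        for trace in traces:
            state = ("<start>",) * k             # initial state
            for event in trace:
                nxt = (state + (event,))[-k:]    # slide the window forward
                transitions[(state, event)].add(nxt)
                state = nxt
        return transitions

    traces = [
        ["GET /VMControl/virtualAppliances",
         "GET /VMControl/virtualAppliances/{id}",
         "GET /VMControl/virtualAppliances/{id}/progress"],
        ["GET /VMControl/virtualAppliances",
         "POST /VMControl/workloads",
         "GET /VMControl/workloads/{id}/progress"],
    ]

    fsm = infer_fsm(traces)
    states = {s for (s, _) in fsm} | {t for ts in fsm.values() for t in ts}
    print(len(states), "states,", sum(len(v) for v in fsm.values()), "transitions")

Richer inference techniques (e.g. k-tails state merging) produce more compact models; the window abstraction above is just the simplest heuristic that yields states and transitions like those counted in Table II.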
RQ1: Compared to the current test suite used for testing at IBM Research, can the FITTEST technologies contribute to the effectiveness of testing when used in the testing environments at IBM Research?
            IF1  IF2  IF3  IF4  IF5  IF6  IF7  IF8  IF9  IF10
TS_ibm       1    0    0    0    1    1    0    0    1    1
TS_fittest   1    1    1    0    0    1    0    1    1    1

Fig. 4. Effectiveness measures for both test suites with respect to the 10 injected faults. "0" means that the corresponding fault was not detected, while "1" means it was detected.
As can be seen from Table IV, TSibm is substantially smaller than TSfittest on all parameters; this is one of the evident results of automation. However, TSfittest is not only bigger: its effectiveness, measured as injected-fault coverage (see Figure 4), is also significantly higher (70% vs. 50%). Moreover, combining the TSibm and TSfittest suites increases the effectiveness to 80%. Therefore, within the context of the studied environment, the FITTEST technologies can contribute to the effectiveness of testing at IBM Research, and IBM Research has decided that, to optimize fault-finding capability, the two techniques are best combined.
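The coverage figures follow directly from the detection matrix in Fig. 4; a quick sanity check (matrix transcribed by hand):

    # Recompute the fault-coverage figures from the Fig. 4 detection matrix.
    ts_ibm     = [1, 0, 0, 0, 1, 1, 0, 0, 1, 1]   # 5/10 faults -> 50%
    ts_fittest = [1, 1, 1, 0, 0, 1, 0, 1, 1, 1]   # 7/10 faults -> 70%

    combined = [max(a, b) for a, b in zip(ts_ibm, ts_fittest)]  # detected by either suite
    print(sum(ts_ibm) / 10, sum(ts_fittest) / 10, sum(combined) / 10)  # 0.5 0.7 0.8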
RQ2: Compared to the current test suite used for testing at IBM Research, can the FITTEST technologies contribute to the efficiency of testing when used in the testing environments at IBM Research?
TABLE III. EFFICIENCY MEASURES FOR EXECUTION OF BOTH TEST SUITES (IN MINUTES).

Variable                                  TSibm    normalized by size    TSfittest    normalized by size
Execution time with fault injection       36.75    9.18                  127.87       1.52
Execution time without fault injection    27.97    6.99                  50.72        0.60
It can be seen from Table III that the time to execute TSibm is smaller than the time to execute TSfittest. This is due to the number of concrete tests in TSfittest. When we normalize the execution time by the number of tests in the test suite, we see that per test the TSfittest execution time is much smaller (1.52 vs. 9.18 minutes with the injected faults and 0.60 vs. 6.99 minutes without), because the TSfittest suite consists of much shorter tests. The execution time is acceptable for IBM Research, considering that the effectiveness of the tests can be improved and more faults can be detected in an efficient way (as discussed under RQ1). Moreover, the shorter tests of which TSfittest is composed can help identify faults faster.
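The normalized columns in Table III are simply total execution time divided by the number of tests in the suite (the suite sizes come from Table IV, which is not reproduced here). As a back-of-the-envelope check, the ratios in Table III imply roughly 4 tests in TSibm and about 84 in TSfittest; these counts are derived, not quoted:

    # Normalized execution time = total minutes / number of tests in the suite.
    # Suite sizes implied by Table III's ratios (Table IV is not shown here).
    for name, total, per_test in [("TSibm", 36.75, 9.18), ("TSfittest", 127.87, 1.52)]:
        print(name, "implied size ~", round(total / per_test))   # ~4 and ~84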
RQ3: How much effort would be required to deploy the FITTEST technologies within the testing processes in place at IBM Research?
As can be seen from Table IV, setting up the FITTEST components for the SUT and specifying the Domain Input Specification took the FBK subject 10 hours. Generating the FSM and the concrete test cases was automated by the tools; the whole run needed about 1 minute of CPU time on a moderate personal computer. Converting the concrete tests produced by the FITTEST tools into executable tests for IBM Research and writing the automated activation scripts took the experienced IBM Research subject about 2.5 days.
The amount of effort needed to deploy and execute the FITTEST tools was found reasonable by IBM Research, considering that these tasks need to be done only once, during deployment. Moreover, since the tools and the format of the tests are new to the team, some learning is required. Once everything has been set up, generating a new FITTEST test suite when new logs become available is fully automatic.
B. Threats to validity
Internal validity. This is of concern when causal relations are examined. In our case study, one internal validity threat relates to the logs generated by the IBM simulation environment that are used to automatically construct the test models, because the quality of the models can be affected by the content of the input logs. We are aware of this threat and have asked IBM for a diverse set of logs. A similar threat is that the quality of the concrete test cases can be affected by the completeness of the Domain Input Specification (DIS) file, because an incomplete specification weakens the efficiency of TSfittest. This threat might affect the overall number of faults detected by TSfittest, but improving the specification can only increase that number, so the conclusion about the effectiveness of TSfittest remains unchanged. Regarding the subjects involved from IBM, although they had a high level of expertise and experience working in industry as testers, they had no previous knowledge of the FITTEST tools. This threat was reduced through close collaboration between FBK and IBM, complementing their competences in order to avoid possible mistakes or misunderstandings.
External validity. This concerns to what extent it is possible to generalize the findings, and to what extent the findings are of interest to people outside the investigated case. Our results rely on one industrial case study using a given set of artificial faults. Although running such studies is expensive in terms of time, we plan to replicate the study in order to reach more generalizable conclusions. However, as discussed earlier, the system under test is typical of a broad category of industrial systems that communicate with multiple managed clients and with users of the management system.
Construct validity. This aspect of validity reflects to what extent the operational measures that are studied really represent what the researchers have in mind and what is investigated according to the research questions.
The FITTEST project is funded by the European Commission (FP7-ICT-257574)
Collected measures
87
[Figure: fault-detection matrix for both test suites, as in Fig. 4]

            IF1  IF2  IF3  IF4  IF5  IF6  IF7  IF8  IF9  IF10
TS_ibm       1    0    0    0    1    1    0    0    1    1
TS_fittest   1    1    1    0    0    1    0    1    1    1
The FITTEST project is funded by the European Commission (FP7-ICT-257574)
• RQ1: Compared to the current test suite used for testing at IBM Research, can the FITTEST tools contribute to the effectiveness of testing when used in real testing environments at IBM?
– TSibm finds 50% of injected faults, TSfittest finds 70%
– Together they find 80%! -> IBM will consider combining the techniques
• RQ2: Compared to the current test suite used for testing at IBM Research, can the FITTEST tools contribute to the efficiency of testing when used in real testing environments at IBM?
– FITTEST test cases execute faster because they are smaller
– Shorter tests were good for IBM -> easier to identify faults
• RQ3: How much effort would be required to deploy the FITTEST tools within the testing processes currently in place at IBM Research?
– Found reasonable by IBM, considering that the manual tasks need to be done only once and more faults were found.
Automated Test Case Generation example case study IBM: Conclusions
The FITTEST project is funded by the European Commission (FP7-ICT-257574)
Threats to validity
• The learning curve effect
• (and basically all the other human factors ;-)
• Missing information in the logs leads to weak FSMs.
• Incomplete specification of the DIS leads to weak concrete test cases.
• The representativeness of the injected faults with respect to real faults.
89
The FITTEST project is funded by the European Commission (FP7-ICT-257574)
Final Things ……
• As researchers, we should concentrate on the future problems that industry will face. No?
• How can we claim to know future needs without understanding current ones?
• Go to industry and evaluate your results!
90
The FITTEST project is funded by the European Commission (FP7-ICT-257574)
Need any help? More information? Want to do an instantiation?
Contact: Tanja E. J. Vos
email: tvos@pros.upv.es
skype: tanja_vos
web: http://tanvopol.webs.upv.es/
project: http://www.facebook.com/FITTESTproject
telephone: +34 690 917 971
The FITTEST project is funded by the European Commission (FP7-ICT-257574)
References: empirical research in software engineering
[BSH86] V. R. Basili, R. W. Selby, and D. H. Hutchens. Experimentation in software engineering. IEEE Trans. Softw. Eng., 12:733–743, July 1986.
[BSL99] Victor R. Basili, Forrest Shull, and Filippo Lanubile. Building knowledge through families of experiments. IEEE Trans. Softw. Eng., 25(4):456–473, 1999.
[CWH05] C. Wohlin, M. Höst, and K. Henningsson. Empirical research methods in software and web engineering. In E. Mendes and N. Mosley, editors, Web Engineering - Theory and Practice of Metrics and Measurement for Web Development, pages 409–429, 2005.
[Fly04] Bent Flyvbjerg. Five misunderstandings about case-study research. In Clive Seale, Giampietro Gobo, Jaber F. Gubrium, and David Silverman, editors, Qualitative Research Practice, pages 420–434. London and Thousand Oaks, CA, 2004.
[KLL97] B. Kitchenham, S. Linkman, and D. Law. DESMET: a methodology for evaluating software engineering methods and tools. Computing & Control Engineering Journal, 8(3):120–126, 1997.
[KPP+02] Barbara A. Kitchenham, Shari Lawrence Pfleeger, Lesley M. Pickard, Peter W. Jones, David C. Hoaglin, Khaled El Emam, and Jarrett Rosenberg. Preliminary guidelines for empirical research in software engineering. IEEE Trans. Softw. Eng., 28(8):721–734, 2002.
[LSS05] Timothy C. Lethbridge, Susan Elliott Sim, and Janice Singer. Studying software engineers: Data collection techniques for software field studies. Empirical Software Engineering, 10:311–341, 2005. 10.1007/s10664-005-1290-x.
[TTDBS07] Paolo Tonella, Marco Torchiano, Bart Du Bois, and Tarja Systä. Empirical studies in reverse engineering: state of the art and future trends. Empirical Softw. Engg., 12(5):551–571, 2007.
[WRH+00] Claes Wohlin, Per Runeson, Martin Höst, Magnus C. Ohlsson, Björn Regnell, and Anders Wesslén. Experimentation in software engineering: an introduction. Kluwer Academic Publishers, Norwell, MA, USA, 2000.
[Rob02] Colin Robson. Real World Research. Blackwell Publishing Limited, 2002.
[RH09] Per Runeson and Martin Höst. Guidelines for conducting and reporting case study research in software engineering. Empirical Softw. Engg., 14(2):131–164, 2009.
[PPV00] Dewayne E. Perry, Adam A. Porter, and Lawrence G. Votta. Empirical studies of software engineering: a roadmap. In Proceedings of the Conference on The Future of Software Engineering (ICSE '00), pages 345–355. ACM, New York, NY, USA, 2000.
[EA06] http://www.cs.toronto.edu/~sme/case-studies/case_study_tutorial_slides.pdf
[Ple95] S. L. Pfleeger. Experimental design and analysis in software engineering. Annals of Software Engineering, 1:219–253, 1995.
[PK01-03] S. L. Pfleeger and B. A. Kitchenham. Principles of Survey Research. Software Engineering Notes (6 parts), Nov 2001 - Mar 2003.
[Fly06] B. Flyvbjerg. Five Misunderstandings about Case Study Research. Qualitative Inquiry, 12(2):219–245, April 2006.
[KPP95] B. Kitchenham, L. Pickard, and S. L. Pfleeger. Case studies for method and tool evaluation. IEEE Software, 12(4):52–62, July 1995.
92
The FITTEST project is funded by the European Commission (FP7-ICT-257574)
[BS87] Victor R. Basili and Richard W. Selby. Comparing the effectiveness of software testing strategies. IEEE Trans. Softw. Eng., 13(12):1278–1296, 1987.
[DRE04] Hyunsook Do, Gregg Rothermel, and Sebastian Elbaum. Infrastructure support for controlled experimentation with software testing and regression testing techniques. In Proc. Int. Symp. on Empirical Software Engineering, ISESE '04, 2004.
[EHP+06] Sigrid Eldh, Hans Hansson, Sasikumar Punnekkat, Anders Pettersson, and Daniel Sundmark. A framework for comparing efficiency, effectiveness and applicability of software testing techniques. Testing: Academic & Industrial Conference on Practice And Research Techniques (TAIC PART), 0:159–170, 2006.
[JMV04] Natalia Juristo, Ana M. Moreno, and Sira Vegas. Reviewing 25 years of testing technique experiments. Empirical Softw. Engg., 9(1-2):7–44, 2004.
[RAT+06] Per Runeson, Carina Andersson, Thomas Thelin, Anneliese Andrews, and Tomas Berling. What do we know about defect detection methods? IEEE Softw., 23(3):82–90, 2006.
[VB05] Sira Vegas and Victor Basili. A characterisation schema for software testing techniques. Empirical Softw. Engg., 10(4):437–466, 2005.
[LR96] Christopher M. Lott and H. Dieter Rombach. Repeatable software engineering experiments for comparing defect-detection techniques. Empirical Software Engineering, 1:241–277, 1996. 10.1007/BF00127447.
93
References: empirical research in software testing
The FITTEST project is funded by the European Commission (FP7-ICT-257574)
[Vos et al. 2010] T. E. J. Vos, A. I. Baars, F. F. Lindlar, P. M. Kruse, A. Windisch, and J. Wegener. Industrial scaled automated structural testing with the evolutionary testing tool. In ICST, 2010, pp. 175–184.
[Vos et al. 2012] Tanja E. J. Vos, Arthur I. Baars, Felix F. Lindlar, Andreas Windisch, Benjamin Wilmes, Hamilton Gross, Peter M. Kruse, and Joachim Wegener. Industrial Case Studies for Evaluating Search Based Structural Testing. International Journal of Software Engineering and Knowledge Engineering, 22(8):1123–, 2012.
[Vos et al. 2013] T. E. J. Vos, F. Lindlar, B. Wilmes, A. Windisch, A. Baars, P. Kruse, H. Gross, and J. Wegener. Evolutionary functional black-box testing in an industrial setting. Software Quality Journal, 21(2):259–288, 2013.
[MRT08] A. Marchetto, F. Ricca, and P. Tonella. A case study-based comparison of web testing techniques applied to ajax web applications. International Journal on Software Tools for Technology Transfer (STTT), 10:477–492, 2008.
94
References: descriptions of empirical studies done in software testing
The FITTEST project is funded by the European Commission (FP7-ICT-257574)
Other references
[TS04] T. S. Tullis and J. N. Stetson. A comparison of questionnaires for assessing website usability. In Proceedings of the Usability Professionals Association Conference, 2004.
[BKM08] Aaron Bangor, Philip T. Kortum, and James T. Miller. An empirical evaluation of the system usability scale. International Journal of Human-Computer Interaction, 24:574–594, 2008.
[Horn06] K. Hornbæk. Current Practice in Measuring Usability: Challenges to Usability Studies and Research. International Journal of Human-Computer Studies, 64(2):79–102, February 2006.
[MTR08] Alessandro Marchetto, Paolo Tonella, and Filippo Ricca. State-based testing of Ajax web applications. In Software Testing, Verification, and Validation, 2008 1st International Conference on, pages 121–130. IEEE, 2008.
[NMT12] C. D. Nguyen, A. Marchetto, and P. Tonella. Combining model-based and combinatorial testing for effective test case generation. In Proceedings of the 2012 International Symposium on Software Testing and Analysis, pages 100–110. ACM, 2012.
95