Introduction to V&V Techniques and Principles

Prepared by

Stephen M. Thebaut, Ph.D.

University of Florida

Software Testing and Verification

Lecture 2

Verification and Validation…

• …at the unit/component-level of system development, is what this course is mostly about.

• Adds value...by improving product quality and reducing risks.

• Three approaches:

– Human-based testing

– Formal correctness proofs

– Machine-based testing

Human-Based Testing

• Desk Checking, Walkthroughs, Reviews / Inspections

• Applicable to requirements / specifications, design, code, proof of correctness arguments, test plans / designs / cases, maintenance plans, etc.

• Can be extremely effective...

Formal Correctness Proofs

“pre-condition”: { X ≥ 2 }

SP2 := 4
while SP2 <= X
    SP2 := SP2 * 2
end_while

loop “invariant”: { (∃i)(SP2 = 2^i ∧ 2^(i-1) ≤ X) ∧ (X = X’) }

“post-condition”: { (∃i)(SP2 = 2^i > X ∧ 2^(i-1) ≤ X) ∧ (X = X’) }
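
To make the assertions concrete, here is a minimal runnable sketch (my own illustration in Java, not part of the original proof): the same loop with the invariant and post-condition checked at run time for sample inputs. Such checks can only spot-check the assertions for particular values of X; they do not replace the proof.

// Sketch only: run with "java -ea" so that assert statements are enabled.
public class SmallestPowerOfTwo {

    static long sp2(long x) {
        assert x >= 2 : "pre-condition: X >= 2";
        final long xPrime = x;            // X' : X must remain unchanged
        long sp2 = 4;                     // SP2 := 4  (i.e., 2^2)
        int i = 2;
        while (sp2 <= x) {
            sp2 *= 2;                     // SP2 := SP2 * 2
            i++;
            // loop "invariant": SP2 = 2^i  and  2^(i-1) <= X  and  X = X'
            assert sp2 == (1L << i) && (1L << (i - 1)) <= x && x == xPrime;
        }
        // post-condition: SP2 = 2^i > X  and  2^(i-1) <= X  and  X = X'
        assert sp2 == (1L << i) && sp2 > x && (1L << (i - 1)) <= x && x == xPrime;
        return sp2;
    }

    public static void main(String[] args) {
        System.out.println(sp2(2));       // prints 4
        System.out.println(sp2(9));       // prints 16
    }
}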

Formal Correctness Proofs (cont’d)

• Require formal specifications and appropriate proof methods.

• Do not eliminate the need for testing.

Machine-Based Testing

• Execution of (“crafted”) test cases

• Actual and expected results (i.e., program behaviors) are compared.
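
For example, with an automated framework such as JUnit (mentioned later in these notes), each crafted test case executes the unit and the framework compares the actual result against the expected one. A sketch only; MathUtil.max() is a hypothetical unit under test:

import static org.junit.jupiter.api.Assertions.assertEquals;
import org.junit.jupiter.api.Test;

// Sketch: MathUtil.max() stands in for whatever unit is being tested.
class MaxTest {

    @Test
    void maxOfTwoNumbers() {
        int actual = MathUtil.max(3, 7);   // execute the crafted test case
        assertEquals(7, actual);           // compare expected vs. actual behavior
    }
}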

Definitions of “TESTING”

• Hetzel: Any activity aimed at evaluating an attribute or capability of a program or system. It is the measurement of software quality.

• Beizer: The act of executing tests. Tests are designed and then executed to demonstrate the correspondence between an element and its specification.

Definitions of “TESTING” (cont’d)

• Myers: The process of executing a program with the intent of finding errors.

• IEEE: The process of exercising or evaluating a system or system component by manual or automated means to verify that it satisfies specified requirements or to identify differences between expected and actual results.

Definitions of “TESTING” (cont’d)

• Testing undertaken to demonstrate that a system performs correctly is sometimes referred to as validation testing.

• Testing undertaken to expose defects is sometimes referred to as defect testing.

Evolving Attitudes About Testing

• 1950’s

– Machine languages used

– Testing is debugging

• 1960’s

– Compilers developed

– Testing is separate from debugging

Evolving Attitudes About Testing (cont’d)

• 1970’s

– Software engineering concepts introduced

– Testing begins to evolve as a technical discipline

• 1980’s

– CASE tools developed

– Testing expands to Verification and Validation (V&V)

Evolving Attitudes About Testing (cont’d)

• 1990’s

– Increased focus on shorter development cycles

– Quality focus increases

– Testing skills and knowledge in greater demand

– Increased acceptance of testing as a discipline

Evolving Attitudes About Testing (cont’d)

• 2000’s

– More focus on shorter development cycles

– Increased use of Agile methods: Test-First / Test-Driven Development, customer involvement in testing, and the use of automated testing frameworks (e.g., JUnit)

– Better integration of testing / verification / reliability ideas

– Growing interest in software safety, protection, and security

Let’s Pause for a Moment…

Imagine that it’s summertime and that a 3-day weekend is just starting… Wouldn’t it be great to just grab a fishin’ pole and head on out to the lake!

Fisherman’s Dilemma

• You have 3 days for fishing and 2 lakes from which to choose. Day 1 at lake X nets 8 fish. Day 2 at lake Y nets 32 fish. Which lake do you return to for day 3?

• Does your answer depend on any assumptions?

Di Lemma

• In general, the probability of the existence of more errors in a section of a program is directly related to the number of errors already found in that section.

Invalid and Unexpected Inputs

• Test cases must be written for INVALID and UNEXPECTED, as well as valid and expected, input conditions.

• In many systems, MOST of the code is concerned with input error checking and handling.
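
For example (a sketch assuming a hypothetical Account class whose withdraw() method rejects negative amounts), a test case for an INVALID input condition checks that the error is detected and handled rather than silently accepted:

import static org.junit.jupiter.api.Assertions.assertThrows;
import org.junit.jupiter.api.Test;

// Sketch only: Account and its withdraw() method are hypothetical.
class WithdrawInvalidInputTest {

    @Test
    void negativeAmountIsRejected() {
        Account account = new Account(100);
        // Invalid/unexpected input: the expected behavior is a rejected request,
        // not a silent (and incorrect) change to the balance.
        assertThrows(IllegalArgumentException.class, () -> account.withdraw(-50));
    }
}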

Anatomy of a Test Case

• What are the parts of a test case?

1. a description of input condition(s)

2. a description of expected results

• Where do ‘‘expected results’’ come from?
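
One common convention records each test case as an (input condition, expected result) pair, with the expected result taken from the specification rather than from the code. A sketch, assuming a hypothetical IntMath.sqrtFloor() unit specified to return the integer part of the square root:

import static org.junit.jupiter.api.Assertions.assertEquals;
import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.CsvSource;

// Sketch only: IntMath.sqrtFloor() is a hypothetical unit under test.
class SqrtFloorTest {

    // Each row is one test case: input condition, expected result (from the spec).
    @ParameterizedTest
    @CsvSource({
        "0, 0",     // smallest valid input
        "1, 1",     // perfect square
        "15, 3",    // just below a perfect-square boundary
        "16, 4"     // perfect-square boundary
    })
    void floorOfSquareRoot(int input, int expected) {
        assertEquals(expected, IntMath.sqrtFloor(input));
    }
}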

Who Should Test Your Program?

• Most people are inclined to defend what they produce – not find fault with it.

• Thus, programmers should in principle avoid testing their own programs.

• But what if this is not possible or appropriate?

Become Mr. Hyde... i.e., adopt a “tester’s mindset” that mitigates your ego-attachment to the program.

Testing Techniques

• Black-Box: Testing based solely on analysis of requirements (unit/component specification, user documentation, etc.). Also known as functional testing.

• White-Box: Testing based on analysis of internal logic (design, code, etc.). (But expected results still come from requirements.) Also known as structural testing.
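
As a small sketch (Calendars.isLeapYear() is a hypothetical unit under test): the first test is derived purely from the stated requirement (black-box, functional), while the second is chosen to exercise a particular branch of a typical implementation (white-box, structural), though its expected result still comes from the requirements:

import static org.junit.jupiter.api.Assertions.*;
import org.junit.jupiter.api.Test;

// Sketch only: Calendars.isLeapYear() is a hypothetical unit under test.
class LeapYearTest {

    // Black-box: derived from the requirement "years divisible by 4 are leap
    // years, except century years, which must also be divisible by 400."
    @Test
    void typicalLeapYear() {
        assertTrue(Calendars.isLeapYear(2024));
    }

    // White-box: exercises the "divisible by 100 but not by 400" branch of a
    // typical implementation; the expected result still comes from the spec.
    @Test
    void centuryNonLeapYear() {
        assertFalse(Calendars.isLeapYear(1900));
    }
}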

Levels or Phases of Testing

• Unit: testing of the smallest programmer work assignments that can reasonably be planned and tracked (e.g., function, procedure, module, object class, etc.)

• Component: testing a collection of units that make up a component (e.g., program, package, task, interacting object classes, etc.)

Levels or Phases of Testing (cont’d)

• Product: testing a collection of components that make up a product (e.g., subsystem, application, etc.)

• System: testing a collection of products that make up a deliverable system

Levels or Phases of Testing (cont’d)

• Testing usually:

– begins with functional (black-box) tests,

– is supplemented by structural (white-box) tests, and

– progresses from the unit level toward the system level with one or more integration steps.

Other Types of Testing

• Integration: testing which takes place as sub-elements are combined (i.e., integrated) to form higher-level elements

• Regression: re-testing to detect problems caused by the adverse effects of program change

• Acceptance: formal testing conducted to enable the customer to determine whether or not to accept the system (acceptance criteria may be defined in a contract)

Other Types of Testing (cont’d)

• Alpha: actual end-user testing performed within the development environment

• Beta: end-user testing performed within the user environment prior to general release

• System Test Acceptance: testing conducted to ensure that a system is “ready” for the system-level test phase

Other Types of Testing (cont’d)

• Soak: testing a system version over a significant period of time to discover latent errors or performance problems (due to memory leaks, buffer/file overflow, etc.)

• Smoke (build verification): the first test after a software build to detect catastrophic failure (Term comes from hardware testing…)

• Lights out: testing conducted without human intervention – e.g., after normal working hours

Testing in Plan-Driven Software Development

Requirements definition → System and software design → Implementation and unit testing → Integration and system testing → Operation and maintenance

Plan-Based Testing Process Activities

Test Planning → Test Design → Test Implementation → Test Execution → Execution Analysis → Result Documentation → Final Reporting

Testing in Incremental (e.g., Agile) Software Development

Outline description → concurrent activities of Specification, Development, and Validation, producing an initial version, intermediate versions, and the final version.

Test-Driven† Development (TDD)

• TDD was introduced in support of agile methods such as Extreme Programming. However, it can also be used in plan-driven development processes.

• Code is developed incrementally, along with a test for that increment. You don’t move on to the next increment until the code that you have developed passes its test.

† TDD is taken to be synonymous with “Test-FIRST Development (TFD)” by some. Others use the terms differently. See below.

TDD Process Activities

1. Identify the next required increment of functionality.

2. Write a test for this functionality and implement the test as a program (i.e., an automated test).

3. Run the test, along with all other tests that have been implemented. (The new test will “fail” since you have not yet implemented the functionality.)†

4. REPEAT: Implement the functionality, refactor, and re-run the test(s) UNTIL all tests “pass.”

5. Go back to step 1.

† Why do you think RUNNING (not just implementing) the test is required BEFORE the functionality is implemented?
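
A minimal sketch of one pass through the steps, assuming a JUnit-style framework and a hypothetical ShoppingCart increment: the test is written and run first (where it fails), and then just enough functionality is implemented, and refactored, until it passes.

// Step 2: write the test for the next increment as an automated JUnit test.
import static org.junit.jupiter.api.Assertions.assertEquals;
import org.junit.jupiter.api.Test;

class ShoppingCartTest {
    @Test
    void totalIsSumOfItemPrices() {
        ShoppingCart cart = new ShoppingCart();
        cart.add("book", 2000);             // prices in cents
        cart.add("pen", 150);
        assertEquals(2150, cart.total());   // Step 3: run it; it fails until the code exists
    }
}

// Step 4: implement just enough functionality (then refactor) so all tests pass.
class ShoppingCart {
    private int total = 0;
    void add(String item, int priceInCents) { total += priceInCents; }
    int total() { return total; }
}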

Rationale for TDD

“When you test first, you capture your intent in an automatable and executable form. You focus on what you are about to write in a way that works to prevent defects rather than create them. The tests you write serve as a persistent reinforcement of that intent going forward. In addition to helping you do the thing right, it helps you to do the right thing.”

─ Stephen Vance, in Quality Code, 2014

TDD versus TFD

• A distinction is sometimes drawn between “Test-Driven” and “Test-First” Development...

• For some, “Test-driven” simply means that a program’s design is influenced by testing, for example by refactoring the code to make it easier to test. “Test-first” would further imply that test cases are written and run before features are implemented.

• Others use the terms synonymously...

What Doe$ Te$ting Co$t?

• About 50% of the total life-cycle effort is spent on testing.

• About 50% of the total life-cycle time is spent on testing.

When Might Testing Guarantee an Error-Free Program?

a. When branch, condition, and loop coverage are achieved

b. When dataflow testing is utilized

c. When path and compound condition coverage are achieved

d. When all combinations of all possible input and state variable values are covered (= “EXHAUSTIVE Testing”)

e. (None of the above.)

Exhaustive Testing is Exhausting

• Situation:

– A module has 2 input parameters.

– Word size is 32 bits.

– Testing is completely automated: 100 nanoseconds are required for each test case.

• Question: How long would it take to test this module exhaustively, i.e., covering every possible combination of input values?

Exhaustive Testing is Exhausting (cont’d)

• Short Answer: too long…

• Long Answer:

(2^64 test cases × 100 × 10^-9 seconds per test) / (3600 × 24 × 365 seconds per year) > 57,000 years!
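
The arithmetic behind the estimate can be checked with a few lines of throwaway code (my own sketch): two 32-bit inputs give 2^64 combinations, each taking 100 nanoseconds.

// Rough check of the estimate: 2^64 test cases at 100 ns each, expressed in years.
public class ExhaustiveEstimate {
    public static void main(String[] args) {
        double combinations = Math.pow(2, 64);           // two 32-bit input parameters
        double seconds = combinations * 100e-9;          // 100 ns per automated test case
        double years = seconds / (3600.0 * 24 * 365);
        System.out.printf("about %.0f years%n", years);  // roughly 58,000 years
    }
}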

Note that the term “exhaustive testing” is often used rather loosely…

• Feb. 3, 2010 (Bloomberg) – Electronic throttle systems are under review by safety officials as a possible cause of sudden acceleration in Toyota Motor Corp. vehicles, as alleged in at least seven lawsuits.

• Toyota has said it ruled out electronics as a cause of sudden acceleration that has resulted in recalls of millions of its cars and trucks.

• “In terms of electronics of the vehicle, we’ve done exhaustive testing and we’ve found no issues with the electronics,” Toyota’s Lentz said on a conference call with reporters Feb. 1.

(Image: electronic throttle testing)

The ECONOMICS of Testing

• In general, we can’t test everything (i.e., test exhaustively).

– We need to weigh COST and RISK.

– When is it not cost effective to continue testing?

• Conventional wisdom: V&V should continue until there is confidence that the software is “fit for purpose.”

• The level of confidence required depends on at least three factors...

Factors affecting level of confidence required

• Software function/purpose: Safety-critical systems, for example, require a much higher level of confidence than demonstration-of-concept prototypes.

• User expectations: Users may have low expectations or may tolerate shortcomings when the benefits of use are high.

• Market environment: Getting a product to market early may be more important than finding additional defects.

Ken Johnston’s “Minimum Viable Quality (MVQ)” testing model†

• Builds on premise that some companies test their web-based software services too much before releasing them to production:

“You need to be comfortable with testing less and knowingly shipping buggier software faster than ever before. Speed of release is the vital competitive advantage in the world of connected services and devices.”

†Ken Johnston is a former Director of Test Excellence at Microsoft Corp.

MVQ testing model (cont’d)

• “...new code is pushed to a subset of users; if it’s too buggy, a quick fail back to last known good takes the code out of use with minimum negative user impact.”

• “...you start to get data about how the code is functioning in production with real users more quickly.”

• “The key aspect is to balance...the feature set and (their) quality... If it is too low, then you won’t discover and learn the harder to find bugs because the code won't be exercised.”

MVQ testing model (cont’d)

• See Ken Johnston’s Guest Lecture from Spring ’14 on the CEN 4072/6070 E-Learning website (via the Course Video Lecture tab):

CEN6070_Supplement_01-hh.mp4

• The lecture is entitled, “The Future of Testing is EaaSy,” and describes his MVQ model in a more general context.

• Note that Exam 1 will include questions based on Ken’s recorded lecture.

Costs of Errors Over Life Cycle

• In general, the sooner an error can be found and corrected, the lower the cost.

• Costs can increase exponentially with time between injection and discovery.

(Figure: COST of correcting an error rises steeply with TIME between injection and discovery.)

Costs of Errors Over Life Cycle (cont’d)

• An industry survey showed that it is 75 times more expensive to correct errors discovered during “installation” than during “analysis.”

• One organization reported an average cost of $91 per defect found during “inspections” versus $25,000 per defect found after product delivery.

V&V for Software Engineers

• V&V techniques have evolved considerably and require specialized knowledge, disciplined creativity, and ingenuity.

• Software engineers should be familiar with all V&V techniques, and should be able to employ (and assess the effectiveness of) those techniques appropriate to their responsibilities.

Vehicles for Continuous Process Improvement

• Post-Test Analysis: reviewing the results of a testing activity with the intent to improve its effectiveness

• Causal Analysis: identifying the causes of errors and approaches to eliminate future occurrences

• Benchmarking: general practice of recording and comparing indices of performance, quality, cost, etc., to help identify “best practices”
