Massive Stochastic Testing of SQL Don Slutz Microsoft Research Presented By Manan Shah.

Date post: 03-Jan-2016
Page 1

Massive Stochastic Testing of SQL

Don Slutz, Microsoft Research

Presented by Manan Shah

Page 2

Testing a Database

Two broad aspects:

1. Testing the correctness of the SQL engine

2. Testing the correctness of the SQL output

Focus of this paper: testing the correctness of the SQL engine.

Page 3

Test Coverage Problem

[Figure: diagram with regions labeled "All possible SQL", "Used by customers", "SQL test library", and "Detectable software bugs"]

Page 4

Background

SQL test groups focus on deterministic testing to cover individual features of the language.

Typical SQL test libraries require an estimated ½ person-hour per statement to compose.

Software engineering requires a substantial investment.

Commercial SQL vendors with tight schedules tend to use a more ad hoc process.

Page 5

Motivation

Deterministic testing of SQL database systems is human intensive.

The input domain, all SQL statements, from any number of users, with all states of the database, is gigantic.

These test libraries cover an important, but tiny, fraction of the SQL input domain.

Large increases in test coverage must come from automating the generation of tests.

Page 6

Our Aim

[Figure: diagram with regions labeled "All possible SQL", "Used by customers", "SQL test library", and "Detectable software bugs"]

Page 7

Introduction

This paper describes a method to rapidly create a very large number of SQL statements without human intervention.

The SQL statements are generated stochastically.

Stochastic testing has the advantage that the quality of the tests improves as the test size increases.

The paper also introduces the idea of cross-validating query results against existing SQL implementations.

RAGS (Random Generation Of SQL) is currently used by the Microsoft SQL Server testing group.

Page 8

SQL Grammar

SELECT selectpart FROM frompart WHERE expression

selectpart: [TOP term] [DISTINCT | ALL] selectExpression [,...]

selectExpression: * | expression [[AS] columnAlias] | tableAlias.*

frompart: tableExpression [,...]

tableExpression: [schemaName.] tableName [{{LEFT | RIGHT} [OUTER] | [INNER] | CROSS | NATURAL} JOIN tableExpression]

expression: andcondition [OR andcondition]

condition: operand [conditionRightHandSide] | NOT condition | EXISTS (select)

Page 9

Example

Emp_no  Name  Sex  Dept.  Job    Salary  Comm
1       John  M    R&D    Engg.  25000   2500
2       Mary  F    Sales  Clerk  15000   1500
3       Andy  M    R&D    Mgr.   50000   5000
4       Sam   M    Sales  Mgr.   50000   5000
5       Pete  M    R&D    Clerk  15000   1500

Page 10

Stochastic generation

[Figure: parse tree of a generated statement: Select name, salary + comm from Emp where salary > 10000 and Dept = "sales"]

Page 11

SQL Statement Generation

RAGS follows the semantic rules of SQL.

It carries state information and directives on its walk down the tree, and the results of stochastic outcomes as it walks back up.

RAGS makes all its stochastic decisions at the last possible moment.
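The walk described on this slide can be sketched in Python. This is a minimal illustration under stated assumptions, not the RAGS implementation: the toy schema `SCHEMA` and the functions `gen_operand`, `gen_condition`, and `gen_select` are hypothetical names. The columns in scope and the remaining depth are passed down the recursion, each random choice is made only when the node that needs it is emitted, and the generated SQL text is returned on the way back up.

```python
import random

# Hypothetical schema modeled on the Emp example: column name -> type.
SCHEMA = {"Emp": {"name": "str", "dept": "str", "salary": "num", "comm": "num"}}


def gen_operand(rng, columns, want):
    """Pick a column of the wanted type, or a literal of that type."""
    candidates = [c for c, t in columns.items() if t == want]
    if candidates and rng.random() < 0.7:
        return rng.choice(candidates)
    return str(rng.randint(0, 50000)) if want == "num" else "'Sales'"


def gen_condition(rng, columns, depth):
    """Walk down with the columns in scope; return SQL text on the way up."""
    if depth > 0 and rng.random() < 0.4:
        op = rng.choice(["AND", "OR"])
        left = gen_condition(rng, columns, depth - 1)
        right = gen_condition(rng, columns, depth - 1)
        return f"({left} {op} {right})"
    # Both operands share one type, so the comparison is semantically valid.
    want = rng.choice(["num", "str"])
    cmp_op = rng.choice(["=", "<>", "<", ">"])
    return f"{gen_operand(rng, columns, want)} {cmp_op} {gen_operand(rng, columns, want)}"


def gen_select(rng, schema, max_depth=2):
    """Generate one SELECT statement over a randomly chosen table."""
    table = rng.choice(sorted(schema))
    columns = schema[table]
    picked = rng.sample(sorted(columns), rng.randint(1, len(columns)))
    sql = f"SELECT {', '.join(picked)} FROM {table}"
    if rng.random() < 0.8:
        sql += f" WHERE {gen_condition(rng, columns, max_depth)}"
    return sql


if __name__ == "__main__":
    rng = random.Random(42)
    for _ in range(3):
        print(gen_select(rng, SCHEMA))
```

Passing an explicit `random.Random` instance keeps every decision reproducible from a single seed, which matters when a failing statement has to be regenerated.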

Page 12

The RAGS System

The RAGS approach is:

1. Greatly enlarge the shaded circle in the figure by stochastic SQL statement generation.

2. Make all aspects of the generated SQL statements configurable.

3. Experiment with configurations to maximize the bug detection rate.

Page 13

The RAGS System (cont.)

RAGS is an experiment to see how effective a million-fold increase in the size of a SQL test library can be.

RAGS can be used to drive a SQL system and look for observable errors.

The output of successful Select statements can be saved for regression testing.

Page 14

The RAGS System

[Figure: RAGS flowchart. Steps: read configuration; connect to DBMSs; read table schema; loop (generate SQL statement, execute on SYS A, execute on SYS B, compare results, record errors); print report.]

Config. file: DBMS1=SYS A; DBMS2=SYS B; 100000 Stmts; 65% Select; 35% Update

Report: 100000 Stmts exec.; Stmt 156: error in SYS A; Stmt 765: error in SYS B

Page 15

Components of RAGS System

The configuration file has several parameters:

1. Frequency of occurrence of different statements

2. Limits (max no. of tables in a join, entries in group by, etc.)

3. Frequency of occurrence of features (group by, order by)

4. Execution parameters (no. of rows to fetch per query)

When the RAGS program is started, it first reads the configuration file.

It then uses ODBC to connect to the first DBMS and read the schema information.
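The four parameter groups could be represented roughly as follows. This is a hedged Python sketch: the class `RagsConfig`, its field names, the limit and probability values, and the helper `pick_statement_kind` are all illustrative assumptions, not the actual RAGS configuration file format. Only the 65%/35% Select/Update mix and the count of 100000 statements are taken from the system figure.

```python
import random
from dataclasses import dataclass, field


@dataclass
class RagsConfig:
    # 1. Frequency of occurrence of different statements (relative weights).
    statement_weights: dict = field(
        default_factory=lambda: {"SELECT": 65, "UPDATE": 35}
    )
    # 2. Limits (values are made up for illustration).
    max_join_tables: int = 4
    max_group_by_entries: int = 3
    # 3. Frequency of occurrence of features (made-up probabilities).
    p_group_by: float = 0.2
    p_order_by: float = 0.3
    # 4. Execution parameters.
    rows_to_fetch: int = 100
    statements: int = 100000


def pick_statement_kind(cfg, rng):
    """Draw a statement kind according to the configured weights."""
    kinds = list(cfg.statement_weights)
    weights = [cfg.statement_weights[k] for k in kinds]
    return rng.choices(kinds, weights=weights, k=1)[0]
```

A generator would call `pick_statement_kind(cfg, rng)` once per loop iteration and then consult the limits and feature probabilities while building that statement.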

Page 16

Execution of RAGS

RAGS loops to generate SQL statements and optionally execute them.

If the statement is executed on more than one system, the execution results are compared.

At the end of the run, RAGS produces a report.

A utility is provided that compares the reports from several runs and summarizes the differences.

The comparison can be between different vendors or different versions of the same system (regression testing).
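The execute-and-compare loop can be sketched as below. The slides state that RAGS connects through ODBC; this sketch instead assumes two in-memory SQLite databases from Python's standard `sqlite3` module as stand-ins for SYS A and SYS B, and the names `execute`, `run`, and `make_system` are hypothetical.

```python
import sqlite3


def execute(conn, stmt):
    """Return ("ok", rows) or ("error", message) for one statement."""
    try:
        return "ok", conn.execute(stmt).fetchall()
    except sqlite3.Error as exc:
        return "error", str(exc)


def run(statements, systems):
    """Execute each statement on every system and collect a report."""
    report = {"executed": 0, "errors": [], "mismatches": []}
    for number, stmt in enumerate(statements, start=1):
        outcomes = {name: execute(conn, stmt) for name, conn in systems.items()}
        report["executed"] += 1
        for name, (status, _) in outcomes.items():
            if status == "error":
                report["errors"].append((number, name))
        # Compare only successful results; sort so row order does not matter.
        results = [
            sorted(rows, key=repr)
            for status, rows in outcomes.values()
            if status == "ok"
        ]
        if len(results) > 1 and any(r != results[0] for r in results[1:]):
            report["mismatches"].append(number)
    return report


def make_system(rows):
    """Create an in-memory stand-in DBMS holding a small Emp table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Emp (name TEXT, dept TEXT, salary INTEGER)")
    conn.executemany("INSERT INTO Emp VALUES (?, ?, ?)", rows)
    return conn


if __name__ == "__main__":
    data = [("John", "R&D", 25000), ("Mary", "Sales", 15000)]
    systems = {"SYS A": make_system(data), "SYS B": make_system(data)}
    stmts = ["SELECT name FROM Emp WHERE salary > 10000", "SELECT nope FROM Emp"]
    print(run(stmts, systems))
```

The report keeps statement numbers rather than statement text, mirroring the "Stmt 156: error in SYS A" style of the report in the system figure.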

Page 17

Validation Issues

If a SQL Select executes without errors, there is no easy method to validate the returned values.

The author's approach is to execute the same query on multiple vendors' DBMSs and then compare the results.

First, the number of rows returned is compared.

To avoid sorts, a special checksum over all the column values in all the rows is compared.

The method only works for SQL statements that will execute on more than one vendor’s database.
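A sort-free comparison of this kind can be sketched in Python. The slides do not specify the checksum RAGS uses, so this sketch assumes a simple scheme: a CRC-32 per row, summed modulo 2^32 so that the total does not depend on row order. The function names are hypothetical.

```python
import zlib


def row_checksum(row):
    """CRC-32 of a canonical text form of one row."""
    text = "\x1f".join("NULL" if v is None else repr(v) for v in row)
    return zlib.crc32(text.encode("utf-8"))


def result_signature(rows):
    """Return (row count, order-independent checksum) for a result set."""
    count = 0
    total = 0
    for row in rows:
        count += 1
        # Addition is commutative, so the row order does not affect the total.
        total = (total + row_checksum(row)) % 2**32
    return count, total


def same_result(rows_a, rows_b):
    """Compare two result sets by row count first, then by checksum."""
    return result_signature(rows_a) == result_signature(rows_b)
```

One caveat of this sketch: `repr` distinguishes `1` from `1.0`, so values would need to be normalized before comparing systems that return different numeric types for the same column.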

Page 18

Example RAGS Query

Page 19

Execution Facts

On a 200 MHz Pentium, RAGS can generate 833 moderate-size SQL statements per second.

At that rate (833 × 3,600 seconds), RAGS can generate about 3 million different SQL statements in one hour.

The starting random seed for a RAGS run can be specified in the configuration file.

If the starting seed is not specified, RAGS obtains a seed by hashing the time of day.
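The seed rule can be sketched as follows. The slides say only that an unspecified seed is obtained by hashing the time of day; the use of SHA-256, the 32-bit seed width, and the names `choose_seed` and `make_rng` are assumptions made for this illustration.

```python
import hashlib
import random
import time


def choose_seed(configured_seed=None):
    """Use the configured seed if present, else hash the current time."""
    if configured_seed is not None:
        return configured_seed
    digest = hashlib.sha256(str(time.time_ns()).encode("ascii")).digest()
    return int.from_bytes(digest[:4], "big")


def make_rng(configured_seed=None):
    """Return the seed actually used together with a generator seeded by it."""
    seed = choose_seed(configured_seed)
    return seed, random.Random(seed)
```

Returning the seed alongside the generator lets a run record it in its report, so that the same statement sequence can be regenerated later by putting that seed in the configuration.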

Page 20

Testing Experiences

Each of the 10 clients executed 2500 SQL statements in transactions that contained an average of 5 statements.

86.1% of the statements executed without error, 13.8% had expected errors, and 0.07% indicated possible bugs.

Page 21

Comparison Tests

Page 22

Automatic Statement Simplification

When a RAGS-generated statement caused an error, the debugging process was difficult if the statement was complex.

The offending statement can usually be vastly simplified by hand.

The simplification process itself was tedious, so RAGS was extended to simplify the statement automatically.

RAGS walks a parse tree for the statement and tries to remove terms in expressions and certain clauses (Where and Having).
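The idea of removing pieces while the error persists can be sketched in Python. RAGS works on a parse tree; this simplified sketch instead assumes a flat statement made of a select list and AND-ed Where terms. The functions `build` and `simplify` and the caller-supplied predicate `still_fails` are hypothetical.

```python
def build(select_list, where_terms, table="Emp"):
    """Assemble SQL text from a select list and AND-ed WHERE terms."""
    sql = f"SELECT {', '.join(select_list)} FROM {table}"
    if where_terms:
        sql += " WHERE " + " AND ".join(where_terms)
    return sql


def simplify(select_list, where_terms, still_fails):
    """Greedily remove fragments while still_fails(sql) stays True."""
    select_list, where_terms = list(select_list), list(where_terms)
    changed = True
    while changed:
        changed = False
        # Try dropping each WHERE term; dropping the last removes the clause.
        for i in range(len(where_terms)):
            trial = where_terms[:i] + where_terms[i + 1:]
            if still_fails(build(select_list, trial)):
                where_terms, changed = trial, True
                break
        if changed:
            continue
        # Try dropping each select-list item, always keeping at least one.
        for i in range(len(select_list)):
            if len(select_list) == 1:
                break
            trial = select_list[:i] + select_list[i + 1:]
            if still_fails(build(trial, where_terms)):
                select_list, changed = trial, True
                break
    return build(select_list, where_terms)
```

In practice `still_fails` would re-execute the candidate statement and check that the same error is reported; every accepted removal restarts the scan, so the result is a statement from which no single fragment can be dropped without losing the failure.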

Page 23

Page 24

Visualization

To investigate the relationship between two metrics, a set of sample pairs is collected and analyzed.

RAGS presents an opportunity to scale up the size of such samples by several orders of magnitude.

It allows one to plot the sample points and visualize the relationship.

Page 25

[Figure: plot with axes "Execution time of V1 (sec)" and "Execution time of V2 (sec)", each ticked from 0 to 0.4]

Page 26

Summary

RAGS is an experiment in massive stochastic testing of SQL systems.

Its main contribution is to generate entire SQL statements stochastically.

The problem of validating outputs remains a tough issue. Output comparisons for different vendors' database systems proved to be extremely useful, but only for the small set of common SQL.

The outcome of the experiments was encouraging since RAGS could steadily generate errors in released SQL products.

Page 27

Future Work

SQL coverage can be extended to more data types, more DDL, stored procedures, utilities, etc.

Robustness tests can be performed by stochastically generating a family of equivalent SQL statements and comparing their outputs.

Testing with equivalent statements has the important advantage of providing a way to help validate the outputs.

The optimizer estimates of execution metrics, together with the measured execution metrics, can be compared.
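The equivalent-statement idea can be illustrated with a small Python sketch. It assumes two rewrites that preserve SQL semantics (reordering AND-ed terms and wrapping a term in a double NOT) and uses an in-memory SQLite database as a stand-in system; the slides do not say which rewrites are intended, and the function names are hypothetical.

```python
import random
import sqlite3


def equivalent_family(select_list, where_terms, rng, size=4):
    """Yield statements that should return the same rows as each other.

    where_terms must be non-empty.
    """
    for _ in range(size):
        terms = list(where_terms)
        rng.shuffle(terms)  # AND is commutative
        # NOT (NOT (x)) has the same truth value as x, including for NULL.
        terms = [f"NOT (NOT ({t}))" if rng.random() < 0.5 else t for t in terms]
        yield f"SELECT {', '.join(select_list)} FROM Emp WHERE " + " AND ".join(terms)


def outputs_agree(conn, statements):
    """Run every statement on one system and compare the sorted results."""
    results = [sorted(conn.execute(s).fetchall(), key=repr) for s in statements]
    return all(r == results[0] for r in results)
```

Because all members of the family run on a single system, a disagreement points at that system without needing a second vendor's DBMS for comparison.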

Page 28

Thank You!

