Relational Symbolic Execution

Gian Pietro Farina
University at Buffalo, SUNY
gianpiet@buffalo.edu

Stephen Chong
Harvard University
[email protected]

Marco Gaboardi
University at Buffalo, SUNY
gaboardi@buffalo.edu

ABSTRACT

Symbolic execution is a classical program analysis technique used to show that programs satisfy or violate given specifications. In this work we generalize symbolic execution to support program analysis for relational specifications in the form of relational properties: properties about two runs of two programs on related inputs, or about two executions of a single program on related inputs. Relational properties are useful to formalize notions in security and privacy, and to reason about program optimizations. We design a relational symbolic execution engine, named RelSym, which supports interactive refutation as well as proving of relational properties for programs written in a language with arrays and for-like loops.

1 INTRODUCTION

Relational properties capture the relations between the behavior of two programs when run on two inputs and, as a special case, the behavior of one program on two different inputs. Several safety and security properties can be described as relational properties: noninterference [Goguen and Meseguer 1982, 1984], compiler optimizations [Benton 2004], sensitivity and continuity analysis [Chaudhuri et al. 2010, 2012; Reed and Pierce 2010], and relative cost [Çiçek et al. 2017] are just some examples.

In order to prove a relational property, one must ensure that all pairs of related executions satisfy it, rather than just single executions. Similarly, to find violations of relational properties, we need to find pairs of related executions that violate the property. A natural way to approach the verification and testing of relational properties is to reduce them to standard (unary) properties through ideas like self-composition [Barthe et al. 2004; Butler and Schulte 2011; Terauchi and Aiken 2005] and product programs [Barthe et al. 2011; Eilers et al. 2018]. This approach makes it possible to use standard program verification and bug-finding techniques [Hritcu et al. 2013; Milushev et al. 2012], and it reduces the problem to designing convenient and efficient self-compositions and product programs.

Another way to approach the verification and testing of relational properties is through relational extensions of standard, non-relational techniques for these tasks. Several works have explored this approach for techniques such as type systems [Barthe et al. 2014b, 2015; Nanevski et al. 2013; Pottier and Simonet 2003], program logics [Barthe et al. 2012; Benton 2004; Sousa and Dillig 2016], program analysis [Kwon et al. 2017], and abstract interpretation [Assaf et al. 2017; Feret 2001; Giacobazzi and Mastroeni 2004]. In this approach, one often aims at giving the user the choice of how to explore the use of relational assumptions (i.e., relational preconditions, relational intermediate assumptions, and relational invariants), together with a way to relate two programs in order to prove relational properties. Relational assumptions have a different flavor than non-relational ones, since they allow one to consider only a subset of the product relation between inputs, and so only a subset of the pairs of executions of a program. These are often the key ingredients for reasoning in a natural way about relational properties. In this paper, we follow this approach and propose relational symbolic execution (RelSym): a foundational technique combining the ideas of relational analysis of programs and symbolic execution.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
PPDP ’19, October 7–9, 2019, Porto, Portugal
© 2019 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 978-1-4503-7249-7/19/10...$15.00
https://doi.org/10.1145/3354166.3354175

RelSym is a relational symbolic execution engine for a language with arrays and for-loops. The target applications we have in mind are data analysis and statistics, so we focused on a core calculus that constitutes the basis of languages like R [R Core Team 2013]. In fact, the design of RelSym was informed at an early stage by the work of [Morandat et al. 2012] on a subset of that language, Core R. For-loops and arrays pose interesting challenges both for the design of the operational semantics and for the representation of the different execution paths as constraints.

RelSym combines proving and interactive refutation of relational properties, with the option of providing loop invariants to effectively prove or refute properties of programs containing loops. RelSym is built on a hierarchy of four languages (two relational and two unary, two concrete and two symbolic) whose operational semantics are built on each other in a well-founded manner. In particular, the two relational languages are based on their unary versions, and the two symbolic languages are, as usually happens in symbolic execution, the symbolic versions (i.e., extended with symbolic values) of the concrete ones. The symbolic operational semantics collect constraints about the execution of a program, or about pairs of executions of programs, that can be used to prove or refute relational properties. This gives the user the ability to experiment with different ways of proving and interactively refuting relational properties, e.g., using either a single symbolic relational execution or a pair of unary symbolic executions.

We implemented RelSym as a prototype, and we used it to experiment with different examples of interactive refutation and verification for several relational properties coming from different domains. The range of properties and examples we considered shows the flexibility and the feasibility of our approach.

We also compare RelSym with non-relational methods such as self-composition and product programs (which can also be defined using our tool) in their basic form with no optimization. We find that our approach, thanks to the use of relational assumptions, improves in efficiency with respect to self-composition. Product programs give verification conditions that are often comparable to the ones obtained using relational methods, and they can use standard symbolic execution tools, but a challenge in using this technique is the additional design cost of building the product, even if recent developments have considerably eased this task, e.g., [Eilers et al. 2018]. In relational symbolic execution, we do not need any pre-processing and we can directly analyze a program in a relational way. This shows a trade-off between the different techniques, which can be exploited according to the concrete target application. At the current stage, RelSym users need to provide invariants for loops with symbolic guards. In the future we envision combining our approach with invariant synthesis techniques, especially relational ones, e.g., [Chen et al. 2017, 2011; Qin et al. 2013; Sigurbjarnarson et al. 2018].

Summarizing, the main contributions of our work are:

• The design of a relational symbolic execution technique, RelSym, for a language containing for-loops and arrays. This technique is based on relational and unary symbolic operational semantics that make it possible to explore the different execution paths of programs, maintaining constraints about pairs of executions that can be used to prove or refute relational properties.
• The extension of relational symbolic execution to support relational and unary invariants to completely explore a loop with symbolic guards.
• An implementation of RelSym as a prototype. The implementation uses an SMT solver to discharge the generated constraints. We show the effectiveness of our approach by analyzing several examples for different relational properties.

Outline. The paper is structured as follows. In Section 2 we introduce the different design choices behind RelSym in an informal way. Using four running examples, Section 3 shows at a high level how RelSym works and how relational assumptions help in cutting the search space for proofs and refutation witnesses. In Sections 4, 5, and 6 we provide the main technical material, describing the four languages behind RelSym and the metatheoretical results that connect them. Section 7 provides some details about the RelSym implementation. In Section 8 we provide an experimental comparison of the relational symbolic approach with other standard techniques for the verification and bug finding of relational properties, such as self-composition and product programs. Finally, in Section 9 we discuss related work and in Section 10 we conclude with a summary of this work.

2 RELATIONAL SYMBOLIC EXECUTION: INFORMALLY

In this section, we give a high-level introduction to the main characteristics of RelSym.

Relational semantics. RelSym is based on a relational operational semantics, which describes the execution of two, potentially different, programs in two, potentially different, memories. In this semantics a memory, e.g., $M$, can map a variable, e.g., $x$, either to a single value, for instance $M(x) = 4$, or to a pair of values, for instance $M(x) = (3, 4)$. In the first case, we know that in the two executions $x$ will take the same value 4. In the second case, $x$ will take two different values in the two executions, namely 3 and 4. In assertions, when we refer to one of the two executions of the program we use indexed objects. For instance, by writing $x_1$ we mean the variable $x$ interpreted in the first (left) execution. When we instead have a precondition that implies that the variable has the same value in both runs, we omit indexes and write, for example, just $x$. The relational character of memories is extended to the operational semantics of commands and expressions thanks to a pairing construct $\langle \cdot \mid \cdot \rangle$. In the spirit of [Pottier and Simonet 2003], with $\langle c_1 \mid c_2 \rangle$ we denote a pair of commands that might differ in the two runs. These are needed, for instance, when the guard $e$ of a conditional if $e$ then $c_1$ else $c_2$ evaluates to different values in the two executions, so that the two executions need to take different branches. For instance, when evaluating if $e$ then $c_1$ else $c_2$, if $e$ evaluates to $\langle 1 \mid 0 \rangle$, the first execution needs to evaluate $c_1$, while the second one needs to evaluate $c_2$. This situation is resolved by using the command pair $\langle c_1 \mid c_2 \rangle$. To relationally execute a paired command $\langle c_1 \mid c_2 \rangle$ we execute both $c_1$ and $c_2$ in a unary fashion on two different memories independently, and when they both terminate we merge the two final unary memories into one final relational memory.
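The project, execute, and merge steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the RelSym semantics: memories are dictionaries, a Python tuple stands for a pair of values, commands are functions from unary memories to unary memories, and the names `project`, `merge`, `run_pair`, `conditional`, and `set_y` are hypothetical. In particular, collapsing equal values back to a single value in `merge` is a choice made for this sketch.

```python
def project(memory, side):
    """Return the unary memory seen by one run (side 0 = left, 1 = right)."""
    return {var: (val[side] if isinstance(val, tuple) else val)
            for var, val in memory.items()}


def merge(left, right):
    """Rebuild a relational memory from two unary memories over the same variables."""
    return {var: (left[var] if left[var] == right[var] else (left[var], right[var]))
            for var in left}


def run_pair(c1, c2, memory):
    """Execute the command pair <c1 | c2>: c1 on the left projection, c2 on the right."""
    return merge(c1(project(memory, 0)), c2(project(memory, 1)))


def conditional(guard, c1, c2, memory):
    """if guard then c1 else c2, where each run may pick a different branch."""
    branch_left = c1 if guard(project(memory, 0)) else c2
    branch_right = c1 if guard(project(memory, 1)) else c2
    return run_pair(branch_left, branch_right, memory)


def set_y(value):
    """Hypothetical command `y <- value` on a unary memory."""
    return lambda mem: {**mem, "y": value}


# x differs in the two runs, so the guard x == 3 evaluates to <1 | 0>.
M = {"x": (3, 4), "y": 0}
result = conditional(lambda mem: mem["x"] == 3, set_y(1), set_y(2), M)
print(result)
```

Here the left run takes the true branch and the right run takes the false branch, so after merging `y` is mapped to a pair while `x` keeps its original pair.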

Symbolic semantics. To enable symbolic execution, the RelSym engine also supports symbolic values $X, Y, \dots$ As in standard symbolic execution, a symbolic value $X$ represents a set of possible concrete values. However, in relational symbolic execution, symbolic values can also appear in pairs $\langle X \mid Y \rangle$. During the computation, symbolic values are refined through constraints coming from pre- and postconditions, invariants, and conditionals. At each step, the constraints describe all the possible concrete values that symbolic values, and pairs of symbolic values, can assume. As a simple example, consider symbolic execution of the program if $x = 0$ then $c_1$ else $c_2$ starting with a memory $M$ where $M(x) = X$. Note that the symbolic value $X$ represents an arbitrary concrete value, but the value is the same for both executions. Symbolic execution of the program would follow both the first branch (collecting the constraint $X = 0$) and the second branch (collecting the constraint $X \neq 0$). The two constraints restrict the set of concrete values that $X$ can represent in the two branches, respectively. Consider instead executing the same program but with an initial memory where $M(x) = \langle X_1 \mid X_2 \rangle$. Here, the two executions map the variable $x$ to different symbolic values, meaning that the value of variable $x$ may differ in the two executions. Symbolic execution of the conditional would generate four possible configurations, based on all possible combinations of the left and right executions taking the true and false branches. Using relational assumptions, we can prune the set of branches to explore while keeping the analysis relational in nature, rather than reducing it to a unary one.
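To make the four configurations concrete, the sketch below enumerates the branch combinations for a guard evaluated on a pair of symbolic values and keeps only those that are satisfiable under a relational assumption. It is an illustration, not RelSym's procedure: satisfiability is decided by brute force over a small hypothetical integer domain rather than by an SMT solver, and `DOMAIN` and `feasible_branches` are names introduced here.

```python
from itertools import product

# Hypothetical finite domain standing in for the values X1 and X2 may take.
DOMAIN = range(-2, 3)


def feasible_branches(guard, assumption):
    """List the (left, right) branch choices that some related pair of values can take."""
    feasible = []
    for left_true, right_true in product([True, False], repeat=2):
        satisfiable = any(
            assumption(x1, x2)
            and guard(x1) == left_true
            and guard(x2) == right_true
            for x1, x2 in product(DOMAIN, repeat=2)
        )
        if satisfiable:
            feasible.append((left_true, right_true))
    return feasible


def is_zero(x):
    """The guard x = 0 of the conditional."""
    return x == 0


# No relational assumption: all four configurations are explored.
unconstrained = feasible_branches(is_zero, lambda x1, x2: True)
# Relational assumption X1 = X2: the two mixed configurations are pruned.
related = feasible_branches(is_zero, lambda x1, x2: x1 == x2)
print(unconstrained)
print(related)
```

With the assumption $X_1 = X_2$ only the two configurations in which both runs take the same branch survive.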

Relational ghost variables. We will make use of (relational) ghost variables [Hofmann and Pavlova 2008] to annotate programs or to give specifications for them. Ghost variables are variables that do not correspond to real program entities but appear only in the specification of a program. For instance, when we reason about relational cost we will use a relational variable $\gamma$ which counts the cost of the two runs. Other ghost variables can be used to reason about other properties, for instance covert channels or trace equivalence. The operational semantics of the languages does not cover ghost variables by itself, but it can easily be extended by adding conditions to the rules describing how they evolve during the computation. For instance, when reasoning about cost we can select a (potentially proper) subset of the rules of the semantics that cover the cost model we have in mind, and extend them with conditions describing how $\gamma$ evolves. For simplicity, in Section 3 we will measure the cost of a program by the number of assignments it performs.
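A ghost counter for the assignment cost model can be sketched as follows. The class `CountingMemory` and the two small programs are hypothetical and introduced only for illustration; they are not taken from the paper. The point is that the counter lives outside the program's own variables and is updated only by the rule for assignment.

```python
class CountingMemory(dict):
    """A unary memory with a ghost counter `gamma` bumped by every assignment."""

    def __init__(self):
        super().__init__()
        self.gamma = 0

    def assign(self, var, value):
        self[var] = value
        self.gamma += 1  # cost model: one unit per assignment


def sum_with_temp(a):
    """Hypothetical program: sums `a` through a temporary variable."""
    m = CountingMemory()
    m.assign("t", 0)
    for v in a:
        m.assign("tmp", v)
        m.assign("t", m["t"] + m["tmp"])
    return m


def sum_direct(a):
    """Hypothetical program: sums `a` directly."""
    m = CountingMemory()
    m.assign("t", 0)
    for v in a:
        m.assign("t", m["t"] + v)
    return m


left = sum_with_temp([3, 1, 4])
right = sum_direct([3, 1, 4])
print(left["t"], right["t"], left.gamma, right.gamma)
```

Both runs compute the same sum, but the ghost counters differ, which is the kind of relational fact that a postcondition such as $\gamma_1 = \gamma_2$ would talk about.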

Proving relational specifications. Throughout the paper we will use (relational) Hoare triples to denote specifications of programs. That is, we will say that a program satisfies (or does not satisfy) the triple $\{\Phi\}\, c\, \{\Psi\}$. Symbolic execution can be used to prove valid specifications. In general, if starting from a symbolic initial state that satisfies a precondition $\Phi$ we (relationally and) symbolically execute a program $c$ and only reach final states where the path constraints imply the postcondition $\Psi$, then we know that the triple $\{\Phi\}\, c\, \{\Psi\}$ is valid.

Interactive refutation and counterexample generation. The dual way of reasoning is what symbolic execution is mostly used for. Symbolic execution searches for final states whose associated path constraints do not imply the desired postcondition; if such states are found, there is at least one state where the desired postcondition might not hold. Symbolic execution has proved useful for generating concrete test cases that demonstrate violations of specifications. This is usually done by using constraint solvers to find substitutions for symbolic values that satisfy at the same time the negation of the postcondition on the final states (the violation of the specification) and some path condition (i.e., constraints over symbolic values based on the control flow of the symbolic execution) guaranteeing the reachability of the violation. RelSym can be used in the same way to find violations of relational properties.
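The search for a violating pair of executions can be imitated, for very small input spaces, by plain enumeration. The sketch below is an assumption-laden stand-in for what a constraint solver does symbolically: `find_counterexample`, `leaky`, and `secure` are hypothetical names, the programs take a `(high, low)` input, and the relational property checked is a noninterference-style triple with precondition `low_1 = low_2` and postcondition `out_1 = out_2`.

```python
from itertools import product


def find_counterexample(program, pre, post, domain):
    """Return a pair of related inputs whose outputs violate `post`, or None."""
    for in1, in2 in product(domain, repeat=2):
        if pre(in1, in2):
            out1, out2 = program(*in1), program(*in2)
            if not post(out1, out2):
                return in1, in2
    return None


def leaky(high, low):
    """Hypothetical program whose output depends on the secret `high`."""
    return low + 1 if high > 0 else low


def secure(high, low):
    """Hypothetical program whose output ignores `high`."""
    return low


def same_low(in1, in2):
    """Relational precondition: the public inputs agree."""
    return in1[1] == in2[1]


def same_output(out1, out2):
    """Relational postcondition: the outputs agree."""
    return out1 == out2


inputs = list(product(range(2), repeat=2))  # all (high, low) pairs over {0, 1}
witness = find_counterexample(leaky, same_low, same_output, inputs)
no_witness = find_counterexample(secure, same_low, same_output, inputs)
print(witness, no_witness)
```

A returned pair is a concrete refutation of the triple. A result of `None` only says that the triple holds on the enumerated domain, which is much weaker than the proof obtained when symbolic execution covers all paths.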

Loops. Traditionally, symbolic execution has been used more for bug finding and testing [Khurshid et al. 2003; King 1976] than for proving. One of the reasons for this is that conditionals and loops may create state explosion and long (possibly infinite) traces of configurations. To improve this situation we extend relational symbolic execution with loop invariants [Hentschel et al. 2014], so that the symbolic execution of a loop can be performed by jumping over the loop in one step and adding an invariant to the path condition. We design two rules, for unary and relational invariants, which allow one to reason in one step about loops both for proving and for finding counterexamples. We will see in Section 3 that using an invariant allows us to reason about arrays with symbolic length, proving in this way that a program satisfies a relational property (Lipschitz continuity) for arrays of arbitrary length. When searching for counterexamples, the situation is a bit more delicate. Indeed, just providing an inductive invariant may lead to unrealizable counterexamples: satisfying substitutions that are not produced by any concrete execution. This can happen when the invariant does not determine precisely enough the states that can be reached after the loop. To avoid this situation, in Subsection 6.3 we formalize a notion of strength of an invariant. RelSym uses this notion to check whether the invariant provided is strong enough, ensuring that if a counterexample is found, then it indeed corresponds to a concrete execution (or a pair of concrete executions) violating the (possibly relational) specification of the program. Using loops with invariants mitigates the state explosion problem in part, but it does not solve it entirely. A lot of research has focused, and still focuses, on taming state explosion in traditional symbolic execution. These techniques can also be used for relational symbolic execution in order to tame this complexity. Since RelSym is intended as a foundational work, we will not concern ourselves here with integrating the framework with standard techniques for reducing state explosion or with loop invariant synthesis, as our goal is to present a different approach to the verification and interactive refutation of relational properties.
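The following sketch illustrates, informally, how a weak invariant can yield unrealizable counterexamples. It does not reproduce the notion of invariant strength formalized in the paper; the loop, the two invariants, and the names `run_loop`, `states_after_jump`, and `spurious_counterexamples` are hypothetical. The loop increments `s` exactly `n` times, the postcondition is `s = n`, and jumping over the loop is modeled by admitting every state (within a small bound) that satisfies the invariant.

```python
def run_loop(n):
    """Concrete execution of: s <- 0; for i in 1..n do s <- s + 1."""
    s = 0
    for _ in range(n):
        s += 1
    return s


def states_after_jump(n, invariant, bound=6):
    """Values of s admitted after the loop when it is summarized by `invariant`."""
    return [s for s in range(bound + 1) if invariant(s, n)]


def spurious_counterexamples(n, invariant):
    """Admitted states that violate s = n but that no concrete run produces."""
    return [s for s in states_after_jump(n, invariant)
            if s != n and s != run_loop(n)]


def weak(s, n):
    """Inductive but imprecise invariant."""
    return s >= 0


def strong(s, n):
    """Invariant that pins down the state reached at loop exit."""
    return s == n


print(spurious_counterexamples(3, weak))
print(spurious_counterexamples(3, strong))
```

With the weak invariant, states such as `s = 4` look like violations of the postcondition although no execution reaches them. With the precise invariant no such state is admitted.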

Comparison with self-composition and product programs. We already discussed how self-composition and product programs are standard approaches that reduce relational properties to unary properties, and that allow one to use standard program verification and bug-finding techniques. At the design level, we do not propose our approach in contrast with these techniques but as an alternative. Indeed, one can also use RelSym as a standard symbolic execution engine and use these techniques as a pre-processing phase that transforms the program into its self-composition or product program. However, we believe that at the technical level relational symbolic execution offers, in several situations, some key advantages that make it possible to maximize relational reasoning. Indeed, in the next section we will see that we do not need to reason about the functional correctness of programs in order to prove or disprove relational properties (even though, to effectively find counterexamples, strong invariants involving a functional description might be needed). This is very useful in relational reasoning since it allows one to reduce the complexity of the constraints that one needs to consider. Self-composition cannot directly support this, for example, for arrays with symbolic length, while product programs can support it but require more complex invariants than relational symbolic execution does. To better understand this kind of trade-off, we perform an experimental evaluation comparing RelSym with self-composition and product programs in Section 8.

3 EXAMPLES

In this section we present a few examples of proving and disproving relational properties of programs. We will hide many details in order not to distract the reader from the main point of the section, which is to provide a general understanding of the way RelSym works. For example, in the following we use assertions and constraints interchangeably, but later on (i.e., in Sections 4 and 5) they will be treated differently.

Proving anti-monotonicity of the inverse of the cumulative distribution function (c.d.f.) - concrete bounds. As a first motivating example we consider the program in Figure 1. The program takes as input a real number $q \in [0, 1]$ and an array $d$ of size $k \geq 1$ such that $\forall i.\ 1 \leq i \leq k.\ d[i] = P[X \leq i]$, where $X$ is some unspecified random variable. That is, $d$ represents the c.d.f. of a random variable $X$ whose realizations lie in the set $\{1, \dots, k\}$. The program then proceeds to compute the smallest $x$ such that $P[X \leq x] \geq q$. If we consider $d$ as its input and $x$ as its output, then the program implements the function $F_q^{-1}$, i.e., the inverse c.d.f. It is natural to consider the pointwise order on c.d.f.s described in Figure 1. The function $F_q^{-1}$ then obeys the following relational property: $\forall d_1, d_2, q.\ d_1 \preceq_{cdf} d_2 \implies F_q^{-1}(d_1) \geq F_q^{-1}(d_2)$. This property should hence be true for the program considered. Let us see how to establish this using RelSym. RelSym will start executing the program in a relational memory with two arrays $d_1$ and $d_2$ of the same length, say 5 for instance. Every value in the arrays will be symbolic. These arrays will be related by the following relational assumption (the precondition): $\Phi \equiv \forall i.\ 1 \leq i \leq 5 \implies d_1[i] \leq d_2[i]$. What we want to show is that in every final state $x_1 \geq x_2$. At the $i$-th iteration (when $x_h = 0$, for $h \in \{1, 2\}$) the constraint set will contain the constraints $cum_h = d_h[1] + \cdots + d_h[i-1]$.¹ RelSym now has four possible paths to explore given by the outer if-then-else, and for three of these there are four others given by the inner one, for a total of 13. Instead of following a brute-force approach and continuing to explore all the paths, we can see that one of the paths is already unsatisfiable. This is because $\Phi$ implies $cum_2 \geq cum_1$, and hence the path characterized by the constraint $cum_1 \geq q \wedge cum_2 < q$ is not satisfiable, and hence not reachable, so it can safely be pruned at every iteration. This pruning is possible thanks to the relational assumption $\Phi$. Similarly, at every iteration, from the symbolic state characterized by $cum_1 \geq q \wedge cum_2 \geq q$ we can disregard the path with constraints $x_1 > 0$ and $x_2 \leq 0$. Relational reasoning allows us to reduce the number of paths to follow at every iteration from 13 to 8. It is easy to see how, following the remaining paths, RelSym only reaches final states where $x_1 \geq x_2$ and hence proves the specification.

    cum ← 0;
    x ← 0;
    for(i in 1 : len(cdf)) do
      if(cum ≥ q) then
        if(x ≤ 0) then
          x ← i
      else
        cum ← cum + cdf[i]

Figure 1: Let CDF be the set of c.d.f.s. The program implements $F_q^{-1} : \mathrm{CDF} \to \mathbb{R}$. $F_q^{-1}$ is monotonically decreasing, where the order $\preceq_{cdf}$ on CDF, encoded in finite arrays, is defined as $d_1 \preceq_{cdf} d_2 \iff \forall x.\ d_1[x] \leq d_2[x]$, and we consider the standard order on $\mathbb{R}$.
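The two pruning steps in this example can be cross-checked on concrete inputs. The sketch below is not RelSym and proves nothing symbolically: it runs the loop of Figure 1 on two arrays in lockstep and reports whether either of the two pruned guard combinations is ever observed. The function name, the array length 3, and the small integer grid (array entries and `q` expressed in the same hypothetical integer units, to avoid floating-point comparisons) are assumptions of this sketch.

```python
from itertools import product


def pruned_path_reached(d1, d2, q):
    """Run the Figure 1 loop on d1 and d2 in lockstep; report whether a pruned path occurs."""
    cum = [0, 0]
    x = [0, 0]
    arrays = (d1, d2)
    for i in range(1, len(d1) + 1):
        guard1, guard2 = cum[0] >= q, cum[1] >= q
        if guard1 and not guard2:
            return True  # path with cum1 >= q and cum2 < q
        if guard1 and guard2 and x[0] > 0 and x[1] <= 0:
            return True  # path with x1 > 0 and x2 <= 0
        # One iteration of the loop body for each of the two runs.
        for h, guard in enumerate((guard1, guard2)):
            if guard:
                if x[h] <= 0:
                    x[h] = i
            else:
                cum[h] += arrays[h][i - 1]
    return False


VALUES = range(3)         # array entries, in hypothetical integer units
THRESHOLDS = range(1, 5)  # values of q, in the same units

# All pairs of length-3 arrays related by the precondition d1[i] <= d2[i].
related_pairs = [(d1, d2)
                 for d1, d2 in product(product(VALUES, repeat=3), repeat=2)
                 if all(a <= b for a, b in zip(d1, d2))]

violations = [(d1, d2, q)
              for d1, d2 in related_pairs
              for q in THRESHOLDS
              if pruned_path_reached(d1, d2, q)]
print(len(related_pairs), len(violations))
```

On this grid no related pair ever reaches a pruned path, whereas an unrelated pair such as `d1 = (2, 0, 0)`, `d2 = (0, 0, 0)` with `q = 1` does, which is consistent with the role the text assigns to the relational assumption.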

Proving k-Lipschitz continuity of sorting - symbolic bounds. In the second running example (code in Figure 2) we will again prove a relational property of a program acting on arrays. The difference from the previous example is that we will do it for arrays of symbolic (arbitrary) size $n$. To achieve that we will use a very natural relational invariant. In general, given a sorting algorithm run on two arrays $a_1, a_2$ of integers with the same length $n$ and related by the relational precondition $\Phi \equiv \forall t.\ 1 \leq t \leq n \implies |a_1[t] - a_2[t]| \leq k$, we expect the sorted arrays to still satisfy the same condition. We can see this property as $k$-Lipschitz continuity of a sorting algorithm with respect to the $\ell_\infty$ norm in both the input and output space. In the program under scrutiny, at every iteration of the inner loop we select the smallest element $a[j]$ in the subarray $[i+1 \dots n]$ and we swap it, if necessary, with $a[i]$. In order to make sense of this example it is important to understand that the three lines 4), 5), and 6), which implement the swap, are actually continuous. Indeed, when RelSym is executing the branching instruction, there are four possible ways the two executions can proceed: either both take the same branch, or they take different branches. When the two executions take the same branch, $\Phi$ obviously still holds. The following Observation 1 guarantees that this is also the case when the two executions follow different branches.

    1) for(i in 1 : len(a) − 1) do
    2)   for(j in i + 1 : len(a)) do
    3)     if(a[i] > a[j]) then
    4)       z ← a[i]
    5)       a[i] ← a[j]
    6)       a[j] ← z

Figure 2: k-Lipschitz continuity of a sorting algorithm.

¹ Actually it will contain the translation of this assertion into a constraint, but this is a technical detail.

Observation 1. $\forall x, y, z, w, k.\ |x - y| \leq k,\ |z - w| \leq k,\ x > z,\ y \leq w \implies |z - y| \leq k,\ |x - w| \leq k$.
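Observation 1 is elementary, and it can be sanity-checked by exhaustive enumeration over a small range of integers. The sketch below does that, and additionally runs a direct Python transcription of the loops in Figure 2 on all pairs of short arrays satisfying the precondition, to confirm on those inputs that the distance bound is preserved. The bounds, array length, and function names are assumptions chosen for this illustration; a bounded check of this kind is not a proof.

```python
from itertools import product


def observation_1_holds(bound=4):
    """Check Observation 1 for all integers in [-bound, bound] and 0 <= k <= 2 * bound."""
    values = range(-bound, bound + 1)
    for x, y, z, w in product(values, repeat=4):
        for k in range(2 * bound + 1):
            premises = (abs(x - y) <= k and abs(z - w) <= k
                        and x > z and y <= w)
            if premises and not (abs(z - y) <= k and abs(x - w) <= k):
                return False
    return True


def figure_2_sort(a):
    """The nested-loop sort of Figure 2, applied to a copy of the input."""
    a = list(a)
    n = len(a)
    for i in range(n - 1):
        for j in range(i + 1, n):
            if a[i] > a[j]:
                a[i], a[j] = a[j], a[i]
    return a


def sorting_preserves_distance(k=1, length=3, values=range(4)):
    """Check that inputs within distance k pointwise stay within k after sorting."""
    for a1, a2 in product(product(values, repeat=length), repeat=2):
        if all(abs(u - v) <= k for u, v in zip(a1, a2)):
            s1, s2 = figure_2_sort(a1), figure_2_sort(a2)
            if any(abs(u - v) > k for u, v in zip(s1, s2)):
                return False
    return True


print(observation_1_holds(), sorting_preserves_distance())
```

Note that the second check never inspects whether the output is sorted: it only compares the two runs with each other.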

For instance, instantiating $x = a_1[i]$, $y = a_2[i]$, $z = a_1[i+1]$, $w = a_2[i+1]$ ensures that $\Phi$ still holds when the left execution takes the true branch and the right execution takes the false branch. So, omitting synchronization of the loop variables, by using the invariant $I \equiv \forall t.\ 1 \leq t < i \implies |a_1[t] - a_2[t]| \leq k$ for both loops, we can jump outside of the external loop to a unique state where $I[len(a) + 1/i]$ holds. This state trivially implies the postcondition. The important fact to notice here is the very natural invariant that relational reasoning allows us to specify. In a unary execution we would instead have to come up with nontrivial invariants allowing us to prove the functional correctness of the program. We would need to prove not only that the program produces a sorted sequence but also that the output is a ...

    Refuting cost equivalence - concrete bounds. In the next example we will use RelSym to refute a property about a pair of programs c1, c2. Let’s consider the programs in Figure 3. As we mentioned, RelSym rules can be extended to use ghost variables that can be updated at every step of execution of the abstract machine. In this way we can reason about relational cost [Çiçek et al. 2017], by using the relational ghost variable γ which gets incremented at every assignment. Let’s see this in an example where the two programs both take as input an array of non-negative symbolic integers of size 5, for instance. The two programs sum in the variable t all the array elements up to some value, and save in the variable o the first index in the array that made t ≥ k true. Obviously the first program has a higher cost in terms of assignments performed. We want to refute that the two programs have the same cost, that is, our postcondition to falsify is γ1 = γ2, while our precondition is ∀i. 1 ≤ i ≤ 5 ⟹ a1[i] = a2[i]. At every iteration of the body, for i ranging from 1 to 5, RelSym would perform, using a specific rule, one step on the left execution updating t and no


  • Relational Symbolic Execution PPDP ’19, October 7–9, 2019, Porto, Portugal

    steps in the right execution. So γ1 would be incremented but γ2 would not. Now the two runs are both about to execute a branching instruction. If on the left execution the guard is true we perform the assignment, and the same assignment is performed on the right. Hence the difference in cost is preserved. If the guard on the left is false we loop, performing another assignment, while on the second run we don’t. RelSym would explore these paths, finding an initial state, that is, a set of concrete values for the array, for which the execution of the two programs would lead to a final relational state where γ1 > γ2. We stress here how RelSym can, with specific rules, relationally analyze programs with different syntactical structures by looking for synchronization points, i.e., branching instructions, to maximise relational reasoning.

    Version 1

        t←0; o←0
        for (i in 1 : len(a)) do
          t←a[i] + t
          if (t ≥ k ∧ o ≤ 0) then
            o←i

    Version 2

        t←0; o←0
        for (i in 1 : len(a)) do
          if (t ≥ k) then
            if (o ≤ 0) then
              o←i
          else t←a[i] + t

    Figure 3: The two versions of the program are not cost equivalent.
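    The cost difference can be observed concretely by running both versions and counting assignments. The following is a minimal sketch, not the RelSym rules themselves: the ghost counter `gamma` counts only the assignments written in the program text (updates of the loop counter are ignored, since they are the same in both versions), and the threshold `k` is passed as a parameter, which is an assumption of this illustration.

```python
def version_1(a, k):
    """Run Version 1 of Figure 3; gamma counts the assignments performed."""
    gamma = 0
    t = 0
    o = 0
    gamma += 2
    for i in range(1, len(a) + 1):  # arrays are 1-indexed in the paper
        t = a[i - 1] + t
        gamma += 1
        if t >= k and o <= 0:
            o = i
            gamma += 1
    return t, o, gamma


def version_2(a, k):
    """Run Version 2 of Figure 3; gamma counts the assignments performed."""
    gamma = 0
    t = 0
    o = 0
    gamma += 2
    for i in range(1, len(a) + 1):
        if t >= k:
            if o <= 0:
                o = i
                gamma += 1
        else:
            t = a[i - 1] + t
            gamma += 1
    return t, o, gamma


# Equal inputs on both sides, as required by the precondition.
cost_1 = version_1([1, 1, 1, 1, 1], k=2)[2]
cost_2 = version_2([1, 1, 1, 1, 1], k=2)[2]
```

    With this input `cost_1` is 8 and `cost_2` is 5, so the array `[1, 1, 1, 1, 1]` with k = 2 is one concrete witness against γ1 = γ2. When the threshold is never reached (for example k = 100) both versions perform the same number of assignments, which is why a search for a suitable initial state is needed.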

    Refuting non-interference - symbolic bounds with weak invariant. The next running example involves non-interference [Goguen and Meseguer 1982]. Non-interference was introduced as a strong confidentiality guarantee preventing information from flowing from secret values to public observable values. Non-interference can be formally stated as a relational property of two executions of a single program with different inputs: a program c is non-interferent if, given two input memories M1 and M2 that agree on public data and possibly differ on confidential data, the execution of c on M1 and M2 results in memories M′1 and M′2, respectively, that agree on public data. That is, secret variables don’t interfere with observable public variables. Let’s consider the program c in Figure 4. The program takes as input a secret vector of integers s and a password vector of integers p of the same length. It then scans the arrays and checks whether they are pointwise equal. If not, it saves in t the index of the first difference and sets the flag o. If we assume s to be a high-level variable and p, o, t low-level variables, this program is obviously interferent. Starting from two memories where len(s) = len(p) ∧ p1 = p2 (see footnote 2) we can very well reach a final state where o1 = o2 ∧ t1 = t2 does not hold. We can check this (i.e., refute non-interference) for arrays of arbitrary size n, in particular by using the relational invariant Iw ≡ (t1 = t2 ∧ o1 = o2) ⇔ s1 = s2. Using RelSym with that

        t←0; o←0
        for (i in 1 : len(s)) do
          if (s[i] ≠ p[i] ∧ o ≤ 0) then
            o←1; t←i

    Figure 4: Interferent program

    invariant will allow us to disprove the postcondition o1 = o2 ∧ t1 = t2, but the initial memories that RelSym would find might not correspond to real counterexamples, because the relational invariant is not strong enough.

    Footnote 2: Equality on arrays is pointwise equality, and can be easily encoded in a first-order logic formula with one universal quantifier.

    Counterexample generation for non-interference - symbolic bounds with strong invariant. In the above program we can get exact counterexamples by choosing the stronger relational invariant Is ≡ Iw ∧ t1 = min_h s1[h] ≠ p1[h] ∧ t2 = min_h s2[h] ≠ p2[h] ∧ o1 ∈ {0, 1} ∧ o2 ∈ {0, 1} (see footnote 3). As we can see, we need to specify the functional (unary) behavior of the two programs in the relational invariant in order to strengthen it. RelSym would then disprove the specification by providing a relational initial memory M for which the precondition holds and a final relational memory M′ related by the operational semantics of the program. For instance: M(p) = ([0], [0]), M(s) = ([0], [1]), M′(o) = (0, 1), M′(t) = (0, 1).
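    The counterexample above can be replayed concretely. The sketch below simply transcribes the program of Figure 4 into Python (the function name `run_fig4` is hypothetical) and runs it on the left and right projections of the initial memory M.

```python
def run_fig4(s, p):
    """Concrete run of the program in Figure 4; returns the public outputs (o, t)."""
    t, o = 0, 0
    for i in range(1, len(s) + 1):  # arrays are 1-indexed in the paper
        if s[i - 1] != p[i - 1] and o <= 0:
            o, t = 1, i
    return o, t


# Left and right projections of M: the public input p agrees, the secret s differs.
left = run_fig4(s=[0], p=[0])
right = run_fig4(s=[1], p=[0])
```

    Here `left` is `(0, 0)` and `right` is `(1, 1)`, matching M′(o) = (0, 1) and M′(t) = (0, 1): the two runs agree on the public input but disagree on the public outputs.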

    4 CONCRETE LANGUAGES: FOR, RFOR

    As already mentioned, RelSym is composed of four languages. That is, we extend the semantics of the simplest language FOR in two different directions: relationally (RFOR), and symbolically (SFOR). We then extend them both to obtain RSFOR. In this section we describe the simplest language, an imperative language (FOR) that contains for-loops and computes over integers and arrays of integers, and then extend it to a relational language (RFOR). We refer to these two languages as concrete to distinguish them from the symbolic languages that we will build on top of them in Section 5.

    4.1 FOR

    Programs in FOR have the following grammar, where v ∈ Z are values:

        e ::= e ⊕ e | a[e] | len(a) | x | v
        c ::= skip | c; c | x←e | a[e]←e | if e then c else c | for (x in e:e) do c

    A variable x ∈ Var denotes an integer, while an array name a ∈ Arrvar denotes a function which maps the set of natural numbers {1, ..., l} to the set Z, with l denoting the length of the array. The set of such functions is denoted by Array. The symbol ⊕ denotes an arithmetic operation in {+, −, . . . }. Expressions are evaluated in the standard way using a big-step judgment ⟨M, e⟩ ⇓_F v whose defining rules we omit. Programs c are evaluated through a, mainly standard, small-step judgment (M, c) →_F (M′, c′), where memories M, M′ ∈ Mem are partial functions with type (Var → Z) ∪ (Arrvar → Array). We only show one rule for for-loop construct evaluation in Figure 5. Note that for-loops, and thus FOR programs, are always terminating.
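    To make the small-step reading of for-loops concrete, here is a minimal sketch of a stepper for a fragment of FOR (no arrays), including the for-unroll rule of Figure 5. The tuple encoding of commands, the use of Python functions for expressions, and the names `step` and `run` are all assumptions of this illustration. Guards are treated as true when they evaluate to a value greater than 0, and the case v1 > v2, for which the paper shows no rule here, is assumed to step to skip.

```python
SKIP = ("skip",)


def const(v):
    """Expression that ignores the memory and returns the value v."""
    return lambda M: v


def step(M, c):
    """One small step (M, c) -> (M', c') for a fragment of FOR."""
    tag = c[0]
    if tag == "assign":                      # ("assign", x, e)
        _, x, e = c
        return {**M, x: e(M)}, SKIP
    if tag == "seq":                         # ("seq", c1, c2)
        _, c1, c2 = c
        if c1 == SKIP:
            return M, c2
        M_next, c1_next = step(M, c1)
        return M_next, ("seq", c1_next, c2)
    if tag == "if":                          # ("if", e, c_tt, c_ff)
        _, e, c_tt, c_ff = c
        return M, (c_tt if e(M) > 0 else c_ff)
    if tag == "for":                         # ("for", x, e1, e2, body)
        _, x, e1, e2, body = c
        v1, v2 = e1(M), e2(M)
        if v1 > v2:                          # assumption: empty range steps to skip
            return M, SKIP
        # for-unroll: x <- v1; body; if v2 - v1 then for (x in v1+1:v2) do body else skip
        rest = ("if", const(v2 - v1),
                ("for", x, const(v1 + 1), const(v2), body), SKIP)
        return M, ("seq", ("assign", x, const(v1)), ("seq", body, rest))
    raise ValueError(f"unknown command: {tag}")


def run(M, c):
    """Iterate the step function until the command is skip."""
    while c != SKIP:
        M, c = step(M, c)
    return M


# for (i in 1:n) do x <- x + i
body = ("assign", "x", lambda M: M["x"] + M["i"])
program = ("for", "i", const(1), lambda M: M["n"], body)
final = run({"x": 0, "n": 4}, program)
```

    Because the bounds are evaluated once, before the loop is unrolled, the body cannot change the number of remaining iterations, which is the reason FOR programs always terminate.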

    for-unroll
        ⟨M, e1⟩ ⇓_F v1    ⟨M, e2⟩ ⇓_F v2    v1 ≤ v2
        ─────────────────────────────────────────────────────────────
        (M, for (x in e1:e2) do c) →_F (M, x←v1; c; if v2 − v1 then for (x in v1 + 1:v2) do c else skip)

    Figure 5: A rule for loops in FOR

    Footnote 3: Again, this invariant is expressible in the language, but it can be expressed easily in the language of our assertions.

    4.2 Assertions, triples, validity

    We state and validate program specifications using Hoare triples {Φ}c{Ψ}, where c is a command in FOR and Φ and Ψ (respectively, the pre- and post-condition of the triple) are assertions. Assertions are first-order logical formulas with primitive predicates that compare arithmetic expressions aexp. The latter are built from expressions in FOR extended with integer-valued logical variables (i ∈ Lvar) and array expressions α. Array expressions include array names a, and array update expressions α[aexp1 ↦ aexp2], which denotes the array α with the value at index aexp1 updated to aexp2. Array expressions allow us to express and reason about updates on arrays using the extensional theory of arrays [McCarthy 1961]. The truth of a unary assertion Φ is evaluated against a memory M ∈ Mem and a logical interpretation I ∈ Intlog ≡ Z^Lvar. We write M ⊨_I Φ to denote that Φ holds in memory M with interpretation I. The following definition, although standard, is given because it will later be extended to a relational setting.

    Definition 1. Let Φ and Ψ be unary assertions and c be a FOR command. We say that the triple {Φ}c{Ψ} is valid, and we write ⊨ {Φ}c{Ψ}, if and only if ∀M1, M2 ∈ Mem, I ∈ Intlog, if M1 ⊨_I Φ and (M1, c) →_F* (M2, skip) then M2 ⊨_I Ψ.

    4.3 RFOR

    To enable relational reasoning we first build a relational language RFOR on top of FOR. Intuitively, execution of a single RFOR program represents the execution of two FOR programs. Inspired by the approach of Pottier and Simonet [2003], we extend the grammar of FOR with a pair constructor ⟨· | ·⟩ which can be used at the level of values ⟨v1 | v2⟩, expressions ⟨e1 | e2⟩, or commands ⟨c1 | c2⟩. Notice that ci, ei, vi for i ∈ {1, 2} are commands, expressions, and values in FOR, hence nested pairing is not allowed. This syntactic invariant is preserved by the rules handling the branching instruction. Pair constructs are used to indicate where commands, values, or expressions might be different in the two unary executions represented by a single RFOR execution. To define the semantics for RFOR, we first extend memories to allow program variables to map to pairs of integers, and array variables to map to pairs of arrays. That is, the type of memories for RFOR is (Var → Z ∪ Z²) ∪ (Arrvar → Array ∪ Array²). The semantics of RFOR is defined as a big-step judgment ⟨M, e⟩ ⇓_RF v for expressions and a small-step judgment (M, c) →_RF (M′, c′) for commands, where M, M′ are relational memories, c, c′ are commands in RFOR, v ranges over Z ∪ Z², and e is a relational expression. Figure 6 shows a selection of the inference rules for these judgments. The rules use auxiliary functions ⌊·⌋_1 and ⌊·⌋_2, which project, respectively, the first (left) and second (right) elements of a pair construct (i.e., ⌊⟨c1 | c2⟩⌋_i = ci, ⌊⟨e1 | e2⟩⌋_i = ei, with ⌊v⌋_i = v when v ∈ Z), and are homomorphic for other constructs. For a relational memory M, we write ⌊M⌋_i for the (unary) memory that projects the co-domain appropriately: ∀n ∈ dom(M). ⌊M⌋_i(n) = ⌊M(n)⌋_i. Rule r-lift is the only evaluation rule for RFOR expressions. It evaluates the left and right projections of the memory and expression, and combines the results into either a single value, if both projections produce the same result, or a pair value otherwise. Rule r-if-false-false shows what happens if the left and right executions both agree on taking the false branch: the command if e then ctt else cff steps to command cff. However, if the left and right executions disagree on which branch to take, we need to introduce a command pair construct to indicate that the command being executed differs in the left and right executions. One instance of this is rule r-if-false-true. We ensure well-formedness of the paired commands by projecting ctt and cff before pairing them up. Rule r-pair-step evaluates a pair command by picking one projection, nondeterministically, and evaluating it one step, using the semantics of FOR. The helper function merge(·, ·) merges two FOR memories M1 and M2 into a RFOR memory, using as few pair values as possible:

        merge(M1, M2) = λm.  M1(m)             if M1(m) = M2(m)
                             (M1(m), M2(m))    otherwise
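    The projection and merge functions are inverse to each other in the sense sketched below. This is an illustration only: memories are encoded as Python dictionaries with the same domain on both sides, arrays as tuples, and the `Pair` class and helper names are assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Pair:
    """A pair value: left and right components differ between the two runs."""
    left: Any
    right: Any


def proj(v, i):
    """Projection of a value on side i (1 = left, 2 = right)."""
    if isinstance(v, Pair):
        return v.left if i == 1 else v.right
    return v  # unary values project to themselves


def proj_mem(M, i):
    """Projection of a relational memory, applied to every entry."""
    return {n: proj(v, i) for n, v in M.items()}


def merge(M1, M2):
    """Merge two unary memories, pairing only the entries that differ."""
    return {n: (M1[n] if M1[n] == M2[n] else Pair(M1[n], M2[n])) for n in M1}


M = merge({"x": 1, "a": (1, 2)}, {"x": 1, "a": (1, 3)})
```

    In this example `x` stays unary because both runs agree on it, while `a` becomes a pair of arrays; projecting `M` on either side gives back the corresponding unary memory.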

    Another rule, not shown in the figure, reduces (M, ⟨skip | skip⟩) to (M, skip). The rules regarding array assignments now have to take into account that arrays might differ in the two runs. In particular, given the command a[el]←eh, the two expressions el and eh might evaluate differently in the left and right projections. In the case where M(a) is a unary array but the index expression evaluates to a pair value, the updated array will be a pair of arrays, as shown in r-arr-ass-split.
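    The splitting of a unary array into a pair of arrays can be sketched as follows. The encoding (tuples for arrays, a `Pair` class for pair values, already-evaluated index and right-hand side passed as arguments) is an assumption of this illustration and not the paper's implementation.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Pair:
    left: Any
    right: Any


def proj(v, i):
    """Projection of a value on side i (1 = left, 2 = right)."""
    if isinstance(v, Pair):
        return v.left if i == 1 else v.right
    return v


def update(f, index, value):
    """Functional update f[index -> value] of a 1-indexed array."""
    if not 1 <= index <= len(f):
        raise IndexError("index outside dom(f)")
    return f[:index - 1] + (value,) + f[index:]


def arr_assign_split(M, a, v_l, v_h):
    """Sketch of r-arr-ass-split: a is unary, the index v_l is a pair value."""
    f = M[a]
    assert not isinstance(f, Pair), "the rule applies to a unary array"
    assert proj(v_l, 1) != proj(v_l, 2), "the index must differ in the two runs"
    f1 = update(f, proj(v_l, 1), proj(v_h, 1))
    f2 = update(f, proj(v_l, 2), proj(v_h, 2))
    return {**M, a: Pair(f1, f2)}


# a[<1 | 3>] <- 9 on the unary array (5, 6, 7)
M_after = arr_assign_split({"a": (5, 6, 7)}, "a", Pair(1, 3), 9)
```

    After the step the left run sees `(9, 6, 7)` and the right run sees `(5, 6, 9)`, so the memory entry for `a` must become a pair even though the right-hand side is the same on both sides.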

    4.4 Relational assertions, relational triples, and relational validity

    We again use Hoare triples to provide specifications of RFOR programs. However, assertions for RFOR must be able to express properties of both executions of a program, and the relationship between them. To achieve this, we extend expressions in the language to include indexed program variables and array variables, that is, we equip an array name a or a program variable x with an index i ∈ {1, 2} so that, for example, a1 denotes the array a in the left execution, or x2 denotes the variable x in the right execution. We refer to the extended language as relational assertions. We extend relational operators (=, ≤,


    r-if-false-false
        ⟨M, e⟩ ⇓_RF ⟨v1 | v2⟩    v1 ≤ 0    v2 ≤ 0
        ─────────────────────────────────────────────
        (M, if e then ctt else cff) →_RF (M, cff)

    r-if-false-true
        ⟨M, e⟩ ⇓_RF ⟨v1 | v2⟩    v1 ≤ 0    v2 > 0
        ─────────────────────────────────────────────
        (M, if e then ctt else cff) →_RF (M, ⟨⌊cff⌋_1 | ⌊ctt⌋_2⟩)

    r-arr-ass-split
        ⟨M, el⟩ ⇓_RF vl    ⟨M, eh⟩ ⇓_RF vh    M(a) = f
        ⌊vl⌋_1, ⌊vl⌋_2 ∈ dom(f)    ⌊vl⌋_1 ≠ ⌊vl⌋_2
        f1 = f[⌊vl⌋_1 ↦ ⌊vh⌋_1]    f2 = f[⌊vl⌋_2 ↦ ⌊vh⌋_2]
        ─────────────────────────────────────────────
        (M, a[el]←eh) →_RF (M[a ↦ ⟨f1 | f2⟩], skip)

    r-pair-step
        {i, j} = {1, 2}    (⌊M⌋_i, ci) →_F (M′i, c′i)
        c′j = cj    M′j = ⌊M⌋_j    M′ = merge(M′1, M′2)
        ─────────────────────────────────────────────
        (M, ⟨c1 | c2⟩) →_RF (M′, ⟨c′1 | c′2⟩)

    r-lift
        ⟨⌊M⌋_1, ⌊e⌋_1⟩ ⇓_F v1    ⟨⌊M⌋_2, ⌊e⌋_2⟩ ⇓_F v2
        v = v1 if v1 = v2, (v1, v2) otherwise
        ─────────────────────────────────────────────
        ⟨M, e⟩ ⇓_RF v

    Figure 6: Semantics of RFOR (selected rules).

    that the iterations of a loop go in lockstep until at least one side terminates (after which the other side may continue). In fact, in order to keep the design of the language simple we only allow pair commands to be introduced by a branching instruction. In general this causes RFOR to not be complete with respect to FOR. So it is not possible to use invariants that hold between different iterations by using rules such as the dissonant loop rule in [Beringer 2011]. Indeed in RelSym the following two programs cannot be proved equivalent, for arbitrary positive n: for (i in 1:2 ∗ n) do x←x + 1 and for (i in 1:n) do x←x + 1; for (i in 1:n) do x←x + 1.

    5 SYMBOLIC LANGUAGES: SFOR, RSFOR

    Symbolic execution [King 1976] extends a language with symbolic values that represent unknown or undetermined concrete values. Symbolic execution uses symbolic values in logical formulas that track the conditions under which a particular execution path is taken. By exploring different execution paths and finding satisfying assignments to these logical formulas (i.e., finding concrete values to substitute for symbolic values such that the formulas will be satisfied), symbolic execution of a program can be used to find concrete test cases that demonstrate an assertion violation in a program. Conversely, if all execution paths of a program are explored and no violation is found, then symbolic execution shows that a program is guaranteed to meet its specification. In this section, we extend the FOR and RFOR languages with symbolic execution, giving us, respectively, the languages SFOR and RSFOR. In particular, RSFOR allows us to reason symbolically about two executions of a FOR program, and thus enables us to look for violations of relational assertions of FOR programs. However, we need to define SFOR in order to fully specify the semantics of RSFOR. Indeed, similarly to how the semantics of RFOR relies on the semantics of FOR, the semantics of RSFOR relies on the semantics of SFOR.

    The main insight of symbolic execution is to represent sets of concrete values (in this case integers) and sets of concrete runs of a program with symbolic values drawn from a set Symval. Symbolic values can be refined during the computation using constraints expressed as formulas in some formal theory. For instance, when the guard X of an if construct is symbolic, we might choose to symbolically execute the true branch and refine the set of possible concrete values that X denotes by adding the constraint X > 0 to the path condition. The connection between symbolic languages and concrete languages is given by ground substitutions σ ∈ Σ ≡ Symval → Z ∪ Array. We say that a constraint ϕ is satisfiable if there exists a σ ∈ Σ that makes it true, that is, if substituting all the symbolic values X appearing in ϕ with σ(X) gives us a true statement. If that’s the case we write σ ⊨ ϕ. When we are only interested in expressing the satisfiability of ϕ, with no interest in specifying the actual substitution, we will write SAT(ϕ). Given a set of constraints S, abusing notation, we denote by S the constraint ⋀_{s ∈ S} s. Satisfiable path conditions denote actual concrete executions, that is, all those concrete executions which assign to the symbolic values concrete values that make the path condition true. If a path condition is unsatisfiable then it does not represent any concrete execution. A set of constraints is valid if it is true under every possible substitution. We denote the validity of a constraint ϕ by ⊨ ϕ. Building on the previous section we can now define the two symbolic languages SFOR and RSFOR.
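    The notions of ground substitution, σ ⊨ S, and SAT(S) can be illustrated with a toy checker. In this sketch constraints are encoded as Python predicates over a substitution (a dictionary from symbol names to integers), and satisfiability is decided by enumerating a small finite domain. Both choices are assumptions made for illustration: a bounded enumeration is not a decision procedure, and a real engine would delegate this check to a constraint solver.

```python
from itertools import product


def satisfies(sigma, constraints):
    """sigma |= S: every constraint in S is true under the substitution sigma."""
    return all(phi(sigma) for phi in constraints)


def find_model(constraints, symbols, domain=range(-3, 4)):
    """Bounded stand-in for SAT(S): return a satisfying sigma, or None."""
    for values in product(domain, repeat=len(symbols)):
        sigma = dict(zip(symbols, values))
        if satisfies(sigma, constraints):
            return sigma
    return None


# Path condition of a true branch guarded by X, followed by Y = X + 1.
path_condition = [lambda s: s["X"] > 0, lambda s: s["Y"] == s["X"] + 1]
model = find_model(path_condition, ["X", "Y"])

# Contradictory path condition: it represents no concrete execution.
dead_path = [lambda s: s["X"] > 0, lambda s: s["X"] <= 0]
no_model = find_model(dead_path, ["X"])
```

    A model returned for `path_condition` is a ground substitution describing one concrete execution along that path, while `dead_path` has no model and therefore denotes no concrete execution.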

    5.1 SFOR

    We extend the syntax of FOR expressions by adding to its values elements X ∈ Symval, denoting symbolic values. Now memories in SFOR map program variables to either integers or symbolic values. We also represent symbolic arrays in memory as pairs (X, v), where v is a (concrete or symbolic) integer value representing the length of the array, and X is a symbolic value representing the array contents, as in the standard theory of arrays [McCarthy 1961]. The content of the arrays can be refined in a set of constraints described below. Thus, memories in SFOR have the type (Var → Vs) ∪ (Arrvar → Arrays), where Vs ≡ Z ∪ Symval and Arrays ≡ Symval × Vs. Configurations in SFOR are triples (M, c, S) where M is a memory, c is a SFOR command, and S is a set of constraints. Constraints are first-order logical formulas with primitive predicates that compare expressions (e) over concrete values (n ∈ Z), symbolic values (X ∈ Symval) and logical variables (i ∈ Lvar). The constraint expression select(e1, e2) represents the (integer) result of reading the array denoted by e1 at the index denoted by e2, while store(e1, e2, e3) represents the (array) result of updating the array denoted by e1 at index e2 with value e3. A set of constraints S is used to record restrictions on symbolic values that must hold in order for program execution to reach a specific configuration.

    Note that although both assertions and constraints are logical formulas that include comparisons of expressions, they differ because assertions may contain program variables and array names but may not contain symbolic values; constraints on the other hand may contain symbolic values (including select(·, ·) and store(·, ·, ·) expressions) and may not contain program variables or array variables. Given a memory M, we can translate assertions to constraints, using M to replace program variables and array names with the (symbolic or concrete) values M maps them to. We write ⟦·⟧_M for this translation function, defined inductively on the shape of the expression. Symbolic values can now appear in expressions, so a for-loop executed by unrolling might not terminate. For this reason we extend the category of commands to also contain commands of the form for (x in e1:e2) do_I c, where I is an assertion intended to be a loop invariant. The two kinds (with and without invariant) of for-loops are treated as distinct syntactic forms.

    The semantics of SFOR is defined through a big-step judgment, ⟨M, e, S⟩ ⇓_SF ⟨v, S′⟩, for expressions, and a small-step judgment (M, c, S) →_SF (M′, c′, S′) for commands. Figure 7 shows some selected rules defining the judgments. Notice that evaluating an expression might generate new symbolic values, and this is why ⇓_SF also returns an updated set of constraints S′. In rules for conditionals, like the rule s-if-true, we record in the constraint set the information about the control flow path. Rules handling the conditionals make the small-step operational semantics non-deterministic, since we have to consider both the case when the guard reduces to a value greater than 0 and the case when it reduces to a value less than or equal to 0. In rules for arrays, we record in the constraint set the description of arrays. For example, the rule s-arr-read records the selection in the constraint set using a fresh symbol Y which has never occurred in the computation before that point. Rule s-arr-write evaluates the index of the array to update and the right-hand side of the assignment; after updating the memory it records the array update in the set of constraints. As already mentioned, we allow the user to specify invariants for loops and use the rule s-for-inv. This rule allows us to skip in one step the whole unrolling of the for-loop, provided that the user has specified an actual inductive invariant. Specifically, the semantic judgment ⊨ {I ∧ e1 ≤ x ∧ x ≤ e2} cb {I[x + 1/x]} imposes that I holds before and after every iteration of the body of the loop, provided that the counter variable x is between the bounds. Checking that e1, e2 ∈ aexp makes sure that the premise of the triple is actually an assertion and does not contain symbolic values, as could be the case since e1, e2 are expressions in SFOR. The additional check, ⊨ S′′ ⟹ ⟦I[e1/x] ∧ e1 ≤ e2⟧_M, imposes that the constraints collected before executing the loop are strong enough to imply the invariant right before the start of the loop. The configuration to which the for-loop with invariant steps has a set of constraints Sf which records the fact that the for-loop has terminated, and so includes the constraint ⟦I[e2 + 1/x]⟧_Mf. The final memory Mf maps to fresh symbolic values all the variables or array names which might have been updated in the body cb (Upd(·) performs a syntactic check on cb, soundly approximating the set of variables updated by cb). Notice that we don’t update the length of the arrays, because we consider only arrays of fixed (static or concrete) length. At the exit of the loop the counter variable has to map to the value to which the second guard of the for-loop was reduced.
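    The way conditionals and array reads extend the constraint set can be sketched as follows. Constraints are encoded as plain tuples and symbolic values as strings. These encodings, and the function names, are assumptions of this illustration. The function `s_if` returns both successors of a conditional, corresponding to s-if-true and its symmetric false case.

```python
import itertools

_fresh_ids = itertools.count()


def fresh(prefix="Y"):
    """Return a symbolic value that has not been used before."""
    return f"{prefix}{next(_fresh_ids)}"


def s_if(M, guard_value, c_tt, c_ff, S):
    """Both successors of a conditional whose guard evaluated to guard_value."""
    return [
        (M, c_tt, S | {(guard_value, ">", 0)}),   # true branch
        (M, c_ff, S | {(guard_value, "<=", 0)}),  # false branch
    ]


def s_arr_read(M, a, index_value, S):
    """Sketch of s-arr-read: name the result with a fresh symbol Y."""
    X, length = M[a]
    Y = fresh()
    S_new = S | {
        (Y, "=", ("select", X, index_value)),
        (index_value, ">", 0),
        (index_value, "<=", length),
    }
    return Y, S_new


# if a[I] then c_tt else c_ff, with a symbolic array (A, N) and symbolic index I
M = {"a": ("A", "N")}
Y, S1 = s_arr_read(M, "a", "I", frozenset())
successors = s_if(M, Y, "c_tt", "c_ff", S1)
```

    Each successor carries the bounds check on the index, the definition of `Y` as a `select`, and one of the two guard constraints. Whether a successor is actually reachable is decided later by checking satisfiability of its constraint set.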

    s-arr-read
        ⟨M, e, S⟩ ⇓_SF ⟨vs, S′⟩    M(a) = (X, v′s)    Y fresh
        ─────────────────────────────────────────────
        ⟨M, a[e], S⟩ ⇓_SF ⟨Y, S′ ∪ {Y = select(X, vs), vs > 0, vs ≤ v′s}⟩

    s-arr-write
        ⟨M, e1, S⟩ ⇓_SF ⟨v1, S′⟩    ⟨M, e2, S′⟩ ⇓_SF ⟨v2, S′′⟩
        M(a) = (X, l)    Y fresh    M′ = M[a ↦ (Y, l)]
        S′′′ ≡ S′′ ∪ {Y = store(X, v1, v2), v1 > 0, v1 ≤ l}
        ─────────────────────────────────────────────
        (M, a[e1]←e2, S) →_SF (M′, skip, S′′′)

    s-if-true
        ⟨M, e, S⟩ ⇓_SF ⟨vs, S′⟩
        ─────────────────────────────────────────────
        (M, if e then ctt else cff, S) →_SF (M, ctt, S′ ∪ {vs > 0})

    s-for-inv
        ⟨M, e1, S⟩ ⇓_SF ⟨v1, S′⟩    ⟨M, e2, S′⟩ ⇓_SF ⟨v2, S′′⟩
        e1, e2 ∈ aexp    ⊨ {I ∧ e1 ≤ x ∧ x ≤ e2} cb {I[x + 1/x]}
        Mf = λn.  v2        if n = x
                  X         if n ∈ Upd(cb), n ∈ Var, X fresh
                  (X, l)    if n ∈ Upd(cb), M(n) = (Z, l), X fresh
                  M(n)      otherwise
        ⊨ S′′ ⟹ ⟦I[e1/x] ∧ e1 ≤ e2⟧_M    Sf = S′′ ∪ {⟦I[e2 + 1/x]⟧_Mf}
        ─────────────────────────────────────────────
        (M, for (x in e1:e2) do_I cb, S) →_SF (Mf, skip, Sf)

    Figure 7: Semantics of SFOR (selected rules).
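    The construction of the final memory Mf in s-for-inv can be sketched as below. The tuple encoding of commands, the helper names `upd` and `havoc_memory`, and the naming of fresh symbols are assumptions of this illustration. The checks on the invariant, which the rule also requires, are not modelled here.

```python
import itertools

_fresh_ids = itertools.count()


def fresh():
    """Return a symbolic value that has not been used before."""
    return f"X{next(_fresh_ids)}"


def upd(c):
    """Syntactic over-approximation of the names that command c may update."""
    tag = c[0]
    if tag in ("assign", "arr_assign"):   # ("assign", x, e) / ("arr_assign", a, e1, e2)
        return {c[1]}
    if tag == "seq":                      # ("seq", c1, c2)
        return upd(c[1]) | upd(c[2])
    if tag == "if":                       # ("if", e, c_tt, c_ff)
        return upd(c[2]) | upd(c[3])
    if tag == "for":                      # ("for", x, e1, e2, body)
        return {c[1]} | upd(c[4])
    return set()                          # skip


def havoc_memory(M, x, v2, body):
    """Memory after the loop: updated names become fresh, the counter maps to v2."""
    updated = upd(body)
    Mf = {}
    for n, value in M.items():
        if n in updated and isinstance(value, tuple):
            Mf[n] = (fresh(), value[1])   # symbolic array (Z, l): the length l is kept
        elif n in updated:
            Mf[n] = fresh()
        else:
            Mf[n] = value
    Mf[x] = v2
    return Mf


body = ("seq", ("assign", "z", None), ("arr_assign", "a", None, None))
Mf = havoc_memory({"z": 0, "y": 7, "a": ("A", "N")}, "i", "V2", body)
```

    In `Mf` the variable `y` keeps its value because the body never assigns it, `z` and the contents of `a` are replaced by fresh symbols, and the length `N` of `a` is unchanged. All knowledge about the fresh symbols has to come from the invariant added to the constraint set.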

    5.2 RSFOR

    Similarly to what we did in the previous section, we now extend the language RFOR to RSFOR using symbolic values X. The symbolic extension of the relational language follows the same steps as the unary one, with the difference that now symbolic values can also appear in pairs of expressions ⟨e1 | e2⟩, pairs of commands ⟨c1 | c2⟩, and pairs of values in a memory (v1, v2). As in the case of the previous languages we give the semantics of RSFOR by means of a big-step semantics for symbolic relational expressions, proving judgments of the shape ⟨M, e, S⟩ ⇓_RSF ⟨v, S′⟩, and a small-step semantics for symbolic relational commands, proving judgments of the shape (M, c, S) →_RSF (M′, c′, S′). We provide a selection of the rules to prove those judgments in Figure 8. Projection functions now need to be extended to relational assertions. This is particularly useful, for example, when a for-loop with invariant I appears in one of the branches of an if construct with a guard which evaluates to a relational value ⟨v1 | v2⟩, since both cases v > 0, v ≤ 0 have to be considered. For this reason we extend projection functions for basic relational assertions in the following way (where {p, q} = {a, b}, and where the function Idx(·) returns the set (potentially empty) of indices i ∈ {1, 2} appearing in a relational expression):

        ⌊ea ⊗ eb⌋_i = ⌊ea⌋_i ⊗ ⌊eb⌋_i    if Idx(eq) ⊆ Idx(ep) = {i} or Idx(eq) = Idx(ep) = ∅
        ⌊ea ⊗ eb⌋_i = true               otherwise
        ⌊x_i⌋_i = x
        ⌊x⌋_i = x

    For other forms of assertions projection functions behave homomorphically. So for i ∈ {1, 2} we can now define ⌊for (x in e1:e2) do_I cb⌋_i ≡ for (x in ⌊e1⌋_i:⌊e2⌋_i) do_⌊I⌋_i ⌊cb⌋_i. Also, merge-s(·, ·) plays a similar role in the relational symbolic semantics to what merge(·, ·) does in the concrete one. The rule r-s-lift relies on SFOR. It evaluates a relational symbolic expression and returns a single symbolic value if the two unary symbolic executions reduce to the same integer value, otherwise it splits. Rule r-s-arr-ass-split takes care of an array assignment when the array is symbolic unary but the right-hand side of the assignment is different, and hence the array needs to be split. Rule r-s-if-false-true is similar to the analogous rule for the concrete semantics we presented in Figure 6; the main difference is that now the path conditions are recorded in the constraint set. Rule r-s-if-right takes care of a pair command with a branching instruction on the right and a different command on the left. This rule, and a similar one for the left execution, helps synchronization of the two runs. Rule r-s-pair-step takes care of the general case, where c1 ≡ c2 means structural equality, for instance c1 and c2 are both assignments. Similarly to the analogous concrete rule, one of the two sides is chosen nondeterministically, and one step on that side is performed using the unary symbolic semantics. Finally, the rule r-s-for-inv allows the user to specify a relational invariant for a for-loop which might diverge because one of the guards evaluates to a value containing a symbolic value. The rule r-s-for-inv behaves similarly to s-for-inv but in a relational setting.

    5.3 Unary and relational collecting semantics

    Building on the →_SF and →_RSF semantics, we now define two collecting semantics which consider only reachable configurations, namely those whose set of constraints is satisfiable. Overloading the symbol ⇒, we use it to denote both the unary and the relational collecting semantics. Both semantics are defined through only one rule, presented in Figure 9. In rule set-step we remove the current configuration from the set of configurations under consideration, and we add to the set all the configurations reachable from it in one step that are satisfiable.

    set-step
        (M, c, S) ∈ F
        Ft = {(M′, c′, S′) | (M, c, S) →_† (M′, c′, S′) ∧ SAT(S′)}
        F′ = (F \ {(M, c, S)}) ∪ Ft
        ─────────────────────────────────────────────
        F ⇒ F′

    Figure 9: Unary and relational collecting semantics rule schema. † ∈ {SF, RSF}
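    The rule set-step can be read as one iteration of a worklist loop that prunes unsatisfiable successors. The sketch below is an illustration under several assumptions: the step relation is a toy one (a command is just a tuple of symbolic guards still to be branched on), constraints are triples `(symbol, op, bound)`, and satisfiability is checked by bounded enumeration instead of a solver.

```python
from itertools import product


def holds(constraint, sigma):
    """Truth of one constraint (symbol, op, bound) under the substitution sigma."""
    symbol, op, bound = constraint
    return sigma[symbol] > bound if op == ">" else sigma[symbol] <= bound


def sat(S, domain=range(-2, 3)):
    """Bounded stand-in for SAT(S)."""
    symbols = sorted({symbol for symbol, _, _ in S})
    return any(
        all(holds(c, dict(zip(symbols, values))) for c in S)
        for values in product(domain, repeat=len(symbols))
    )


def toy_step(config):
    """Toy step relation: branch on the first pending guard, both ways."""
    M, c, S = config
    if not c:
        return []
    guard, rest = c[0], c[1:]
    return [(M, rest, S | {(guard, ">", 0)}), (M, rest, S | {(guard, "<=", 0)})]


def set_step(F, config, step):
    """One application of set-step: replace config by its satisfiable successors."""
    assert config in F
    Ft = {succ for succ in step(config) if sat(succ[2])}
    return (F - {config}) | Ft


def explore(F, step, is_final):
    """Apply set-step until every configuration in F is final."""
    F = set(F)
    while True:
        pending = [s for s in F if not is_final(s)]
        if not pending:
            return F
        F = set_step(F, pending[0], step)


initial = (frozenset(), ("X", "X", "Y"), frozenset({("X", ">", 0)}))
final_set = explore({initial}, toy_step, lambda s: s[1] == ())
```

    Starting from the constraint X > 0, every branch that adds X ≤ 0 is discarded, so only the two paths that differ on the sign of Y survive in `final_set`.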

    6 META THEORY

    In this section we will make more precise the connection between concrete and symbolic languages. In order to do this, we need to reason about ground substitutions turning objects containing symbolic values into concrete objects. Given a command c or an expression e in SFOR (or in RSFOR) and a ground substitution σ ∈ Σ, we write σ(c) (and σ(e)) for the application of σ to c (and e). We can also apply a substitution to a unary symbolic memory:

    Definition 2. Given a ground substitution σ ∈ Σ we define its application to a unary symbolic memory as

    r-s-lift
        ⟨⌊M⌋_1, ⌊e⌋_1, S⟩ ⇓_SF ⟨v1, S′⟩    ⟨⌊M⌋_2, ⌊e⌋_2, S′⟩ ⇓_SF ⟨v2, S′′⟩
        ⟨v, S′′′⟩ = ⟨v1, S′′⟩          if (v1, v2) ∈ Z² ∧ v1 = v2
                    ⟨(v1, v2), S′′⟩    otherwise
        ─────────────────────────────────────────────
        ⟨M, e, S⟩ ⇓_RSF ⟨v, S′′′⟩

    r-s-arr-ass-split
        M(a) = (X, l) ∈ Symval × Vs    Z fresh    W fresh
        ⟨M, ei, S⟩ ⇓_RSF ⟨vi, S′⟩    ⟨M, eh, S′⟩ ⇓_RSF ⟨vh, S′′⟩
        S′′′ = S′ ∪ {⌊vh⌋_1 ≠ ⌊vh⌋_2, Z = store(X, ⌊vi⌋_1, ⌊vh⌋_1)}
        S′′′′ = S′′′ ∪ {W = store(X, ⌊vi⌋_2, ⌊vh⌋_2), 0 < ⌊vi⌋_1 ≤ l, 0 < ⌊vi⌋_2 ≤ l}
        ─────────────────────────────────────────────
        (M, a[ei]←eh, S) →_RSF (M[a ↦ ((Z, l), (W, l))], skip, S′′′′)

    r-s-if-false-true
        ⟨M, e, S⟩ ⇓_RSF ⟨v, S′⟩    S′′ = S′ ∪ {⌊v⌋_1 ≤ 0, ⌊v⌋_2 > 0}
        ─────────────────────────────────────────────
        (M, if e then ctt else cff, S) →_RSF (M, ⟨⌊cff⌋_1 | ⌊ctt⌋_2⟩, S′′)

    r-s-pair-step
        (⌊M⌋_i, ci, S) →_SF (M′i, c′i, S′′)    (if · then · else · ≠ cj = c′j or c1 ≡ c2)
        {1, 2} = {i, j}    M′j = ⌊M⌋_j    M′ = merge-s(M′1, M′2)
        ─────────────────────────────────────────────
        (M, ⟨c1 | c2⟩, S) →_RSF (M′, ⟨c′1 | c′2⟩, S′′)

    r-s-if-right
        c1 ≡ if · then · else ·    c2 ∉ {if · then · else ·, skip}
        (⌊M⌋_2, c2, S) →_SF (M′2, c′2, S′′)    M′ = merge-s(⌊M⌋_1, M′2)
        ─────────────────────────────────────────────
        (M, ⟨c1 | c2⟩, S) →_RSF (M′, ⟨c1 | c′2⟩, S′′)

    r-s-for-inv
        ⟨M, ea, S⟩ ⇓_RSF ⟨va, S′⟩    ⟨M, eb, S′⟩ ⇓_RSF ⟨vb, S′′⟩
        ⊨ {I ∧ ea ≤ x ∧ x ≤ eb} c {I[x1 + 1/x1][x2 + 1/x2]}
        ⊨ S′′ ⇒ ⟦I[⌊ea⌋_1/x1][⌊ea⌋_2/x2] ∧ ea ≤ eb⟧_M
        Sf = S′′ ∪ {⟦I[⌊vb⌋_1 + 1/x1][⌊vb⌋_2 + 1/x2]⟧_Mf}
        Mf = λn.  vb                  if n = x
                  (X, Y)              if n ∈ Updr(c), M(n) ∈ Vs ∪ Vs², X fresh, Y fresh
                  ((X, l), (Y, l))    if n ∈ Updr(c), M(n) ∈ Arrays, π2(M(n)) = l
                  ((X, l), (Y, l))    if n ∈ Updr(c), M(n) ∈ Arrays², π2(π2(M(n))) = l
                  M(n)                otherwise
        ─────────────────────────────────────────────
        (M, for (x in ea:eb) do_I c, S) →_RSF (Mf, skip, Sf)

    Figure 8: Semantics of RSFOR (simplified selected rules).

        σ(M) = λm.  σ(M(m))        if M(m) ∈ Symval
                    σ(π1(M(m)))    if M(m) ∈ Symval × Vs
                    M(m)           otherwise

    where m ranges over Var ∪ Arrvar.
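    Applying a ground substitution to a unary symbolic memory can be sketched directly from this definition. In the sketch, symbolic values are encoded as strings, symbolic arrays as tuples `(X, v)`, and concrete arrays as tuples of integers in the image of σ. This encoding and the function name are assumptions of the illustration.

```python
def apply_subst(sigma, M):
    """sigma(M): replace symbolic entries of the memory by their concrete values."""
    concrete = {}
    for m, value in M.items():
        if isinstance(value, tuple):    # symbolic array (X, v): take sigma of the contents X
            concrete[m] = sigma[value[0]]
        elif isinstance(value, str):    # symbolic value
            concrete[m] = sigma[value]
        else:                           # already a concrete integer
            concrete[m] = value
    return concrete


sigma = {"X": 3, "A": (4, 5), "N": 2}
M = {"x": "X", "y": 7, "a": ("A", "N")}
M_concrete = apply_subst(sigma, M)
```

    The result maps `x` to 3, leaves `y` at 7, and maps `a` to the concrete array `(4, 5)`. The length symbol `N` is not consulted by the definition itself, so consistency between `sigma["N"]` and the length of `sigma["A"]` has to be required separately.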

    We have a similar definition for relational symbolic memories which we omit here. From now on, we consider only substitutions σ which respect the type of the program variables and array names appearing in a symbolic expression or command. That is, given an expression e (or command c) we consider substitutions σ for which σ(e) (σ(c)) is an expression (command) in FOR (RFOR) whenever e (c) is an expression (command) in SFOR (RSFOR). We also want to consider only substitutions mapping symbolic values to objects of their type. This is characterized by the following definition.

    Definition 3. We say that a ground substitution σ ∈ Σ validates a configuration (M, c, S), and we write σ ⊨ (M, c, S), iff σ ⊨ S, ∀a ∈ Arrvar. M(a) = (X, v) ⇒ σ(X) ∈ {1, . . . , σ(v)} → Z, ∀x ∈ Var. M(x) = X ⇒ σ(X) ∈ Z, and σ respects the type of array names and program variables in c.

    We also consider the natural partial order ⪯ over Σ given by the relation {(σ1, σ2) ∈ Σ² | ∀X ∈ dom(σ1). σ1(X) = σ2(X)}.

    6.1 Coverage

    We now want to formalize the idea that a run of the set semantics can capture (cover) many concrete runs. To do this we formalize what a final configuration (F) of the ⇒ semantics (Figure 9) is.

    Definition 4. A unary (or relational) configuration s is final, and we write Final(s), when s = (M, skip, S). A set of configurations F is final, denoted Final(F), if and only if ∀s ∈ F. Final(s).

    The following lemma states that any concrete execution can be covered by a symbolic path. This symbolic path will have a satisfiable set of constraints which will make it possible to map back symbolic final configurations to the concrete final configuration of the concrete path.

    Lemma 6.1. If F ⇒* F′, (M1, c1, S1) ∈ F, and σ1 ⊨ (M1, c1, S1), then ∃k_c1, ∃(M2, c2, S2) ∈ F′, ∃σ2 ∈ Σ such that (σ1(M1), c1) →_F^(k_c1) (σ2(M2), c2) (or (σ1(M1), c1) →_RF^(k_c1) (σ2(M2), c2)), σ2 ⊨ (M2, c2, S2), and σ1 ⪯ σ2.

    6.2 Proving and soundness

    In symbolic execution we want to execute a program symbolically in order to reason about multiple concrete executions. In order to do this we need to specify an initial memory from which the symbolic execution can start. Without loss of generality we choose as initial memory the most abstract one. This leads to the following definition:

    Definition 5. Let Φ be a unary assertion, and c a command in FOR. Define the following symbolic memory:

        Mem_{Φ,c} ≡ λn ∈ VarOf(Φ) ∪ VarOf(c).  X         if n ∈ Var
                                                (X, L)    if n ∈ Arrvar

    where all the variables X, L are meant to be distinct and fresh, and the function VarOf(·) returns the set of program variables and array names appearing in the argument.
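    A minimal sketch of this construction is shown below. It assumes that the sets of program variables and array names (the result of VarOf) have already been computed and are passed in. The function name and the naming scheme for fresh symbols are assumptions of this illustration.

```python
import itertools


def initial_memory(variables, arrays):
    """Most abstract memory: every name is mapped to distinct fresh symbols."""
    ids = itertools.count()

    def fresh():
        return f"X{next(ids)}"

    M = {x: fresh() for x in sorted(variables)}
    # Arrays get one fresh symbol for the contents and one for the length.
    M.update({a: (fresh(), fresh()) for a in sorted(arrays)})
    return M


M0 = initial_memory(variables={"t", "o"}, arrays={"a"})
```

    Since no two entries share a symbol, the memory itself imposes no relation between the inputs. Any restriction on the initial state comes only from the translated precondition placed in the initial constraint set.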

    The previous definition can be easily extended to relational mem-ories, assertions, and commands. As we already discussed, we areinterested in using RelSym for proving valid specifications of pro-grams. If we want to prove that a triple {Φ}c{Ψ} is valid, we canexecute symbolically c starting from an initial symbolic configu-ration which satisfies the precondition Φ. If we reach only finalconfigurations whose set of constraints imply the postcondition Ψ,then the triple is valid. Formally:

    Definition 6. Let c be a command in FOR (or RFOR) and Φ and Ψ unary (or relational) assertions. We say that c symbolically proves Ψ from Φ, and we write c : Φ =⇒ Ψ, iff there exists F such that
    • {(Mem_{Φ,c}, c, {⟦Φ⟧_{Mem_{Φ,c}}})} ⇒*_set F
    • Final(F)
    • ∀(M, skip, S) ∈ F. ⊨ S =⇒ ⟦Ψ⟧_M

    It now makes sense to formulate the following soundness theorem:

    Theorem 6.2 (Soundness of verification). Let Φ and Ψ be unary (or relational) assertions and let c be a command in FOR (or RFOR). Then, if c : Φ =⇒ Ψ then ⊨ {Φ}c{Ψ}.

    Proof. By structural induction on c, using Lemma 6.1. □
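The last condition of Definition 6 can be sketched in Python as follows. This is only an illustration under strong assumptions: validity is checked by enumerating a small finite integer domain rather than by an SMT solver, and the final configurations are written by hand for a toy conditional instead of being produced by the symbolic semantics. All function names are hypothetical.

```python
from itertools import product

def valid(constraints, post, symbols, domain):
    """Check |= S ==> post by enumerating all assignments over a finite domain."""
    for values in product(domain, repeat=len(symbols)):
        sigma = dict(zip(symbols, values))
        if all(c(sigma) for c in constraints) and not post(sigma):
            return False
    return True

def symbolically_proves(final_configs, post, symbols, domain):
    """Every final constraint set must imply the postcondition read in that memory."""
    return all(valid(S, lambda s, M=M: post(M, s), symbols, domain)
               for (M, S) in final_configs)

# Hand-written final configurations for the toy triple
#   {x >= 0}  if x > 0 then y <- x else y <- 1  {y > 0}
# Each configuration is (memory, constraints); memory values are functions of sigma.
final_configs = [
    ({"y": lambda s: s["X"]}, [lambda s: s["X"] >= 0, lambda s: s["X"] > 0]),
    ({"y": lambda s: 1},      [lambda s: s["X"] >= 0, lambda s: not s["X"] > 0]),
]

post_ok = lambda M, s: M["y"](s) > 0     # postcondition y > 0
post_bad = lambda M, s: M["y"](s) > 1    # postcondition y > 1, too strong

proved = symbolically_proves(final_configs, post_ok, ["X"], range(-5, 6))
not_proved = symbolically_proves(final_configs, post_bad, ["X"], range(-5, 6))
```

Enumeration over a bounded domain is of course not a proof of validity over all integers; it only stands in for the solver query so that the shape of the check is visible.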

    6.3 Finding counterexamples: strength of invariants and soundness

    We now want to formalize the fact that we can use RelSym for finding counterexamples. Let us consider a program c, a precondition Φ, and a postcondition Ψ. If, starting to evaluate c from an initial symbolic configuration and a set of constraints that satisfy the precondition Φ, we arrive at a final configuration whose set of constraints is consistent with the negation of the postcondition Ψ (interpreted in the memory of the final configuration), then we know that the postcondition does not hold. This argument motivates the following definition.

    Definition 7. Let c be a command in FOR (or RFOR) and Φ, Ψ unary (or relational) assertions. We say that c symbolically disproves Ψ from Φ, and we write c : Φ ≠⇒ Ψ, if and only if there exists F such that
    • {(Mem_{Φ,c}, c, {⟦Φ⟧_{Mem_{Φ,c}}})} ⇒*_set F
    • ∃(M, skip, S) ∈ F. SAT(S ∪ {⟦¬Ψ⟧_M})

    A counterexample to the validity of a unary triple {Φ}c{Ψ} consists of a pair of concrete memories M1, M2 and I ∈ Intlog such that M1 ⊨_I Φ and $(M_1, c) \xrightarrow{F}{}^{*} (M_2, \mathsf{skip})$ but M2 ⊭_I Ψ.

    We would like to be able to extract, from an execution showing c : Φ ≠⇒ Ψ, a counterexample for {Φ}c{Ψ}. Unfortunately, this cannot always be done.

    Indeed, in the presence of loops, invariants might only approximate the state after the loop has terminated. That is, the invariants might not specify precisely enough the state after the loop body has been executed n times, for arbitrary n. For instance, {z = 0 ∧ x > 0} for (i in 1:x) do_true z ← z + 1 {false} is obviously an invalid triple, but the invariant true does not say much about the value of z after the loop has been executed x times. The invariant I_s ≡ z = i would instead do the job, specifying exactly the final state. When invariants have this property we say they are strong. With Definition 8 we capture the notion of strength of an invariant.

  • Relational Symbolic Execution PPDP ’19, October 7–9, 2019, Porto, Portugal

    Definition 8. Given a command c ≡ for (x in e1:e2) do_I c_b in FOR (or in RFOR), we say that the invariant I is strong iff ∀σ1, σ2 ∈ Σ, if σ1 ⊨ (Mf, skip, Sf), σ2 ⊨ (Mf, skip, Sf), and σ1(M) =_R σ2(M), then σ1(Mf) =_U σ2(Mf),

    where R = ((Var ∪ Arrvar) \ U) ∪ {x}, U = Upd(c_b) (or U = Updr(c_b)), and M, Mf, Sf are respectively the memory right before the execution of the for-loop, and the memory and the set of constraints after the application of the rule s-for-inv (or r-s-for-inv).
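Returning to the invalid triple {z = 0 ∧ x > 0} for (i in 1:x) do z ← z + 1 {false} used to motivate strong invariants, the sketch below searches for a concrete counterexample in the sense defined above: a pair of memories M1, M2 such that M1 satisfies the precondition, the command runs from M1 to M2, and M2 violates the postcondition. The interpretation I of logical variables is omitted because this triple has none, and the helper names and the candidate list are assumptions made for the example.

```python
def run_loop(mem):
    """Concrete execution of: for (i in 1:x) do z <- z + 1."""
    out = dict(mem)
    for i in range(1, out["x"] + 1):   # i ranges over 1..x inclusive
        out["i"] = i
        out["z"] = out["z"] + 1
    return out

def find_counterexample(pre, command, post, candidates):
    """Return (M1, M2) with M1 |= pre, command run from M1 to M2, and M2 violating post."""
    for m1 in candidates:
        if pre(m1):
            m2 = command(m1)
            if not post(m2):
                return m1, m2
    return None

pre = lambda m: m["z"] == 0 and m["x"] > 0     # z = 0 and x > 0
post = lambda m: False                          # the postcondition false
candidates = [{"z": 0, "x": x} for x in range(0, 4)]

cex = find_counterexample(pre, run_loop, post, candidates)
```

Any initial memory satisfying the precondition yields a counterexample here. The difficulty described in the text arises only on the symbolic side, where a weak invariant leaves the final value of z unconstrained.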

    The following theorem allows us to avoid false positives in interactive refutation.

    Theorem 6.3 (Soundness of counterexample finding). Let Φ, Ψ be unary (or relational) assertions and c a command in FOR (or RFOR). Then, if c : Φ ≠⇒ Ψ and all the invariants in c (if any) are strong, then ⊭ {Φ}c{Ψ}.

    Theorem 6.3 is a soundness result for counterexample finding which implies (relative) completeness of the proving system w.r.t. the semantics of FOR (and RFOR). Indeed, provided the program c is annotated with strong enough invariants, if RelSym cannot derive c : Φ ≠⇒ Ψ then it has to be the case that ⊨ {Φ}c{Ψ}. The completeness just mentioned concerns the proving system and has nothing to do with the semantic completeness of RFOR w.r.t. FOR, which has already been ruled out in Section 4.4.

    7 IMPLEMENTATION

    RelSym has been implemented in OCaml 4.06 in about 4k LOC. The queries on satisfiability of sets of constraints are discharged using the SMT solver Z3 [De Moura and Bjørner 2008]. The implementation is not fully optimized.

    7.1 Checking the semantic judgment

    Rules s-for-inv and r-s-for-inv include among their premises a Hoare triple validity judgment, which ensures that the assertion provided is an inductive invariant of the loop. By using semantic validity we allow other potential implementations to use different analysis techniques for the verification of that triple, e.g., a sound Hoare logic for FOR (or RFOR). Since we want RelSym to be a self-contained tool, in the implementation we prove this judgment by recursively calling RelSym. In particular, while executing the rule s-for-inv (or r-s-for-inv) on the command for (x in e1:e2) do_I c, we recursively use RelSym to prove ⊨ {I ∧ e1 ≤ x ∧ x ≤ e2} c {I[x+1/x]}, by checking that indeed c : I ∧ e1 ≤ x ∧ x ≤ e2 =⇒ I[x+1/x]. This can also help in practice in finding the right invariant, by giving the user prompt feedback on why the assertion currently in use is not an inductive invariant.
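The premise {I ∧ e1 ≤ x ∧ x ≤ e2} c {I[x+1/x]} can be illustrated with a small Python sketch. It is not how RelSym discharges the judgment (RelSym calls itself recursively and uses Z3). Here the check is a brute-force enumeration over a small domain, the loop bounds are fixed numbers, and the function name `is_inductive` and the candidate invariant z = 0 are assumptions made for the example. The loop body z ← z + 1 and the invariant z = i are the ones from Section 6.3.

```python
from itertools import product

def is_inductive(inv, body, lo, hi, domain):
    """Brute-force check of {I and lo <= i <= hi} body {I[i+1/i]} on a finite domain."""
    for z, i in product(domain, repeat=2):
        state = {"z": z, "i": i}
        if inv(state) and lo <= i <= hi:
            after = body(state)
            # I[i+1/i]: evaluate the invariant with i replaced by i + 1.
            shifted = dict(after, i=after["i"] + 1)
            if not inv(shifted):
                return False, state   # feedback: a state where inductiveness fails
    return True, None

body = lambda s: dict(s, z=s["z"] + 1)          # loop body z <- z + 1

ok, _ = is_inductive(lambda s: s["z"] == s["i"], body, 1, 3, range(-4, 5))
bad, witness = is_inductive(lambda s: s["z"] == 0, body, 1, 3, range(-4, 5))
```

The returned `witness` plays the role of the feedback mentioned above: it is a state that satisfies the candidate assertion and the loop bounds but from which one iteration breaks the assertion.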

    7.2 Checking the strength of the invariant

    If we want to use RelSym for finding counterexamples to specifications, we might need to check that the invariant is strong as in Definition 8, so that by Theorem 6.3 we can be sure that the ground substitution provided (if any) by the SMT solver is indeed a counterexample. In particular, this ensures that, if the SMT solver returns a σ such that σ ⊨ Sf ∪ {¬⟦Ψ⟧_{Mf}}, then indeed:
    • σ(Mem_{Φ,c}) ⊨ Φ
    • $(\sigma(\mathrm{Mem}_{\Phi,c}), \sigma(c)) \xrightarrow{F}{}^{*} (\sigma(M_f), \mathsf{skip})$ (or $(\sigma_1(M_1), c_1) \xrightarrow{RF}{}^{*} (\sigma_2(M_2), c_2)$)
    • σ(Mf) ⊨ ¬Ψ

    A way to check this property is to check the following set of constraints for unsatisfiability:

    \[
    S_f \;\cup\; \{\, \llbracket I[v_2 + 1/x] \rrbracket^{F}_{M_f} \,\} \;\cup\; \Bigl\{\, \bigvee_{X \in F} X \neq X' \,\Bigr\}
    \]

    where F is the set of fresh symbols generated during the execution of the rule s-for-inv (or r-s-for-inv), and ⟦I[v2 + 1/x]⟧^F_{Mf} is the result of taking the invariant where x has been substituted with v2 + 1, interpreted as a constraint through Mf, with all the symbols in F substituted with their primed versions. If this set of constraints is not satisfiable, then there is only one possible way to satisfy Sf once the symbols generated before the loop have been fixed; that is, given a ground substitution σ for which σ ⊨ S′′, there is only one possible σ′ such that σ ⪯ σ′ and σ′ ⊨ Sf. This implies the strength of the invariant I.
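The idea behind this query can be sketched in Python as a uniqueness check: once the symbols that existed before the loop are fixed, the post-loop constraints should admit at most one assignment to the fresh symbols. The sketch below replaces the SMT query with enumeration over a small domain. The function `strength_query_sat`, the symbols A and B, and the two example constraints are hypothetical and chosen only for illustration.

```python
from itertools import product

def strength_query_sat(post_loop, pre_syms, fresh_syms, domain):
    """Is there a pre-loop assignment with two distinct fresh-symbol models of post_loop?"""
    for pre_vals in product(domain, repeat=len(pre_syms)):
        pre = dict(zip(pre_syms, pre_vals))
        models = [vals for vals in product(domain, repeat=len(fresh_syms))
                  if post_loop(pre, dict(zip(fresh_syms, vals)))]
        if len(models) > 1:      # two distinct models: some X differs from its primed copy
            return True
    return False

domain = range(-3, 4)

# A weak post-loop constraint leaves the fresh symbol B underdetermined.
weak = lambda pre, f: f["B"] >= pre["A"]
# An exact post-loop constraint pins B to a single value for each A.
exact = lambda pre, f: f["B"] == pre["A"] + 1

weak_is_sat = strength_query_sat(weak, ["A"], ["B"], domain)
exact_is_sat = strength_query_sat(exact, ["A"], ["B"], domain)
```

A satisfiable query (the weak case) means a solver model might not correspond to a real execution, so a reported counterexample could be spurious. An unsatisfiable query (the exact case) corresponds to the situation in which the invariant is strong.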

    8 EXPERIMENTAL RESULTS

    We compared our relational symbolic semantics with other techniques used to prove, or find counterexamples to, relational properties; in particular, with naive self-composition, simple product programs, and the product program construction of [Eilers et al. 2018]. Since our implementation does not use any heuristics to try to improve efficiency, it makes sense to compare it with vanilla versions of these techniques. Also, notice that product programs and self-composition can be easily embedded in our framework by just executing self-composed programs and product programs in SFOR, that is, by just using the unary symbolic semantics. The experimental results in this section show that relational symbolic execution is comparable to self-composition and product programs in terms of execution time, calls to the solver, and number of steps. The results in Table 1 are about proving relational properties, while the results in Table 2 are about finding counterexamples to relational properties. Some of the examples are taken from the standard literature (sometimes adapted to our language). In the tables, an R (Relational) means that relational symbolic execution was used, a U denotes that the self-composed program was analyzed with the unary symbolic semantics, and a P denotes a product program symbolically executed with the unary semantics. For space reasons we only show information which showed discernible differences in resource usage. An ↑ denotes that the symbolic execution had to be terminated because it was running for too long, while an ✗ means that the SMT solver was not able to discharge a query and so the result is unknown. Finally, a ? denotes absence of information, necessary when RelSym ran out of time limits. The results regarding execution time are an average over 50 runs executed on a 2.80 GHz Intel CPU with 16 GB of RAM.

    Example              R/U/P  #BS    #SS    #SMT   #S    tm (s)
    Darvas               R      8      5      3      2     0.39
    Darvas               U      15     18     5      3     0.04
    Costanzo             R      30032  42315  10921  4096  139
    Costanzo             U      ?      ?      ?      ?     ↑
    Antonopoulos         R      68     101    20     10    0.15
    Antonopoulos         U      70     94     22     10    0.16
    Terauchi[1]          R      34     63     24     4     0.22
    Terauchi[1]          P      309    2472   179    9     1.49
    Terauchi[1]          U      46     141    22     4     0.22
    Terauchi[2]          R      55     91     34     9     0.36
    Terauchi[2]          U      31     155    10     3     ✗
    n-inter. example 1   R      5      4      1      1     0.01
    n-inter. example 1   P      33     56     26     4     0.01
    n-inter. example 1   U      9      16     1      1     0.01
    n-inter. example 2   R      5      4      1      1     0.01
    n-inter. example 2   U      9      16     1      1     0.01
    n-inter. example 3   R      7      9      1      1     0.01
    n-inter. example 3   P      30     87     24     2     0.16
    n-inter. example 3   U      13     36     1      1     0.02
    n-inter. example 4   R      14     13     8      4     0.08
    n-inter. example 4   P      51     109    35     9     0.2
    n-inter. example 4   U      16     14     10     4     0.1
    sum-k-lip-cont       R      8      5      3      2     0.04
    sum-k-lip-cont       U      8      11     3      2     ✗
    sort-k-lip-cont      U      55     72     23     12    0.12
    sort-k-lip-cont      R      31     45     12     6     0.11
    sort-k-lip-cont      P      50     66     15     12    0.12

    Table 1: Experimental results of proving relational properties. Darvas stands for [Darvas et al. 2005], Costanzo for [Costanzo and Shao 2014], Antonopoulos for [Antonopoulos et al. 2017], and Terauchi for [Terauchi and Aiken 2005].

    Example              R/P/U  #BS    #SS    #SMT   #S    tm (s)
    ni-array             R      291    380    132    37    1.08
    ni-array             U      342    674    90     16    0.7
    [Eilers et al. 2018] R      9      7      3      2     0.03
    [Eilers et al. 2018] U      13     25     1      1     0.01
    inter. example1      R      3      1      1      1     0.01
    inter. example1      P      13     10     10     4     0.08
    inter. example1      U      21     33     5      3     0.04
    inter. example2      R      3      1      1      1     0.01
    inter. example2      P      13     10     10     4     0.07
    inter. example2      U      5      4      1      1     0.02
    inter-password       R      485    714    169    64    1.40
    inter-password       U      703    960    190    64    1.78

    Table 2: Experimental results for finding counterexamples to relational properties.

    The examples concern properties such as non-interference (e.g., the n-inter. example and inter. example series, and the ni-array example), execution time independence (e.g., [Antonopoulos et al. 2017]), or continuity (e.g., sum-k-lip-cont or sort-k-lip-cont). On this benchmark, relational symbolic execution overall performs better than standard unary self-composition and comparably to product programs in terms of execution time. Besides execution times (unary and relational semantics), we can also consider as measures other information such as the number of steps of the semantics performed (small-step #SS, big-step #BS), calls to the solver (#SMT), and the number of final states reached (#S). Using these metrics shows more clearly how a relational approach can, at times, outperform other approaches for the verification or interactive refutation of relational properties. The [Eilers et al. 2018] construction for product programs introduces new variables and new branching instructions. This is the main reason why the number of SMT calls increases. More generally, consider the base product program construction in [Butler and Schulte 2011] and the number of basic instructions performed (e.g., assignments) as a measure: commands are duplicated even when it does not help. Product self-composition is a generic syntactic technique. For example, take c ≡ p ← p + 1, and suppose we want to show that ⊨ {p1 = p2} c {p1 = p2}. Under product programs we could reduce the problem to verifying ⊨ {p1 = p2} c1 × c2 {p1 = p2}, that is, ⊨ {p1 = p2} p1 ← p1 + 1; p2 ← p2 + 1 {p1 = p2}. In the unary symbolic execution of the product program, two assignments will necessarily be performed, whereas executing p ← p + 1 relationally might perform only one assignment.
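The assignment-count argument for c ≡ p ← p + 1 can be mimicked with a small Python sketch. It is a concrete, simplified stand-in for the symbolic semantics. A plain integer represents a value on which both runs agree and a tuple represents a paired value on which they may differ. This encoding, and the names `run_product` and `run_relational`, are assumptions made for this example rather than the paper's definitions.

```python
class Counter:
    """Counts the basic assignments performed during an execution."""
    def __init__(self):
        self.assignments = 0

def assign(mem, var, value, counter):
    counter.assignments += 1
    mem[var] = value

def run_product(p1, p2):
    """Unary execution of the product program p1 <- p1 + 1; p2 <- p2 + 1."""
    mem, counter = {"p1": p1, "p2": p2}, Counter()
    assign(mem, "p1", mem["p1"] + 1, counter)
    assign(mem, "p2", mem["p2"] + 1, counter)
    return mem, counter.assignments

def run_relational(p):
    """Relational execution of p <- p + 1 on a single relational memory.

    A plain integer stands for a value shared by the two runs; a tuple
    (left, right) stands for a paired value.
    """
    mem, counter = {"p": p}, Counter()
    v = mem["p"]
    new = (v[0] + 1, v[1] + 1) if isinstance(v, tuple) else v + 1
    assign(mem, "p", new, counter)
    return mem, counter.assignments

product_result = run_product(3, 3)
shared_result = run_relational(3)
paired_result = run_relational((3, 5))
```

In this toy encoding the product program always performs two assignments, while the relational run performs one, whether or not the two sides agree on p.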

    This evaluation shows that although we have trade-offs between the different techniques and none of them is always better, in several situations relational symbolic execution brings clear improvements.

    9 RELATED WORKS

    The works most closely related to ours are the ones that have used symbolic execution for relational properties. [Milushev et al. 2012] use symbolic execution to check non-interference by means of an analysis based on a type-directed transformation of the program first presented in [Terauchi and Aiken 2005]. The analysis targets programs written in a subset of C which includes procedure calls and dynamically allocated data structures modeled through a heap. A main difference with our work is that they focus only on non-interference, while we focus on arbitrary relational properties. Additionally, they use self-composition, while we focus on the design of a formal relational semantics. Finally, they use a generic approach based on heaps; instead, we focus on arrays as concrete data structures and we leverage their properties in the design of our semantics.

    In [Person et al. 2008] symbolic execution is used to check differences between program versions. The property they analyze, although relational, can be easily described with two separate executions of the two programs. Indeed, in their work symbolic execution is used separately for the two programs.

    Relational properties have also been studied through many other techniques. We already mentioned different works that reduce the verification of relational properties to that of standard (unary) properties through self-composition [Barthe et al. 2004; Terauchi and Aiken 2005] and product programs [Asada et al. 2017; Butler and Schulte 2011; Eilers et al. 2018]. Several works have studied relational versions of Hoare logics. For example, Benton [Benton 2004] studies relational Hoare logics for noninterference and program equivalence, and Barthe et al. [Barthe et al. 2014b, 2012] study relational Hoare logics for relational probabilistic properties, such as differential privacy. Their work is based on a denotational semantics based on couplings and probabilistic liftings, while ours is operational in nature. Other works such as [Banerjee et al. 2016] have focused on relational Hoare logics with frame rules to deal with heap-based semantics, and on situations where keeping the traces not aligned might be beneficial, in the same spirit as the dissonant loop rules introduced in [Beringer 2011]. Other works instead tried to maximize the amount of synchronicity between the two runs [Pick et al. 2018]. Several works have studied type systems for the verification of different relational properties; some examples are noninterference [Nanevski et al. 2013; Pottier and Simonet 2003; Volpano et al. 1996], security of cryptographic implementations [Barthe et al. 2014a], differential privacy and mechanism design [Barthe et al. 2015], and relational cost [Çiçek et al. 2017]. These approaches are quite different from ours. For instance, [Çiçek et al. 2017] focuses on functional programs and uses a type discipline which requires a lot of domain expertise. Other works have applied abstract interpretation techniques to noninterference [Assaf et al. 2017; Feret 2001; Giacobazzi and Mastroeni 2004]. While symbolic execution and abstract interpretation share several similarities, the techniques that the approaches rely on are quite different. In [Austin and Flanagan 2012] the authors introduce faceted values, which resemble our paired values. They do this to simulate simultaneous runs of the same program on different security levels, in order to provide information flow security with a dynamic approach, as opposed to a static one as we do in this work. Cartesian Hoare Logic [Sousa and Dillig 2016] and its quantitative extension [Chen et al. 2017] can be used for reasoning about generic k-safety properties and their quantitative analogues. The language that Cartesian Hoare Logic considers includes arrays and while loops with breaks. The class of properties they consider goes beyond relational properties, and their analysis is automated. The main difference between their approach and ours is that we perform symbolic execution, which can also be used for finding bugs, while they only focus, at least in the theoretical part, on proving correctness via Hoare logic. Kwon et al. [Kwon et al. 2017] recently proposed a program analysis for checking information flow policies over streams based on a technique for synthesizing relational invariants. This analysis is not based on symbolic execution, but we plan to explore whether their algorithm for synthesizing relational invariants can be used in our setting.

    Similar to our work, their semantics is based on couplings and the probabilistic lifting of relations. Close to our work is also [Albarghouthi and Hsu 2018], where a proof technique is presented that casts differential privacy proofs as a strategy in a game encoded as a set of constraints. In that work the authors again focus on finding proofs and not on finding counterexamples to differential privacy.

    10 CONCLUSIONS

    In this work we presented RelSym, a foundational framework for relational symbolic execution. The framework supports interactive refutation as well as proving of relational properties for a language with arrays and loops. We provided some metatheoretical results about the use of symbolic execution for proving the validity of triples and for disproving them, and we provided necessary conditions under which disproving is actually sound. We have shown the flexibility of this approach by analyzing examples for a range of different relational properties. We compared the analysis of these properties using different approaches, i.e., self-composition, product programs, and the relational approach. We have implemented the tool, and in the future we plan to address more complex features like functions, promises, and closures, as well as exploring the generation of relational loop invariants [Chen et al. 2011; Hoder et al. 2011; Khurshid et al. 2003; Kwon et al. 2017; Qin et al. 2013], limiting in this way the need for annotations provided by the user.

    REFERENCES

    Aws Albarghouthi and Justin Hsu. 2018. Synthesizing coupling proofs of differential privacy. PACMPL 2, POPL (2018), 58:1–58:30. https://doi.org/10.1145/3158146

    Timos Antonopoulos, Paul Gazzillo, Michael Hicks, Eric Koskinen, Tachio Terauchi, and Shiyi Wei. 2017. Decomposition Instead of Self-composition for Proving the Absence of Timing Channels. In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2017). ACM, New York, NY, USA, 362–375. https://doi.org/10.1145/3062341.3062378

    Kazuyuki Asada, Ryosuke Sato, and Naoki Kobayashi. 2017. Verifying Relational Properties of Functional Programs by First-order Refinement. Sci. Comput. Program. 137, C (April 2017), 2–62.

    Mounir Assaf, David A. Naumann, Julien Signoles, Eric Totel, and Frédéric Tronel. 2017. Hypercollecting semantics and its application to static analysis of information flow. In Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages, POPL 2017, Paris, France, January 18-20, 2017. 874–887.

    Thomas H. Austin and Cormac Flanagan. 2012. Multiple Facets for Dynamic Information Flow. In Proceedings of the 39th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL ’12). ACM, New York, NY, USA, 165–178. https://doi.org/10.1145/2103656.2103677

    Anindya Banerjee, David A. Naumann, and Mohammad Nikouei. 2016. Relational Logic with Framing and Hypotheses. In 36th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science, FSTTCS 2016, December 13-15, 2016, Chennai, India. 11:1–11:16.

    Gilles Barthe, Juan Manuel Crespo, and César Kunz. 2011. Relational Verification Using Product Programs. In Proceedings of the 17th International Conference on Formal Methods (FM’11). Springer-Verlag, Berlin, Heidelberg, 200–214. http://dl.acm.org/citation.cfm?id=2021296.2021319

    Gilles Barthe, Pedro R. D’Argenio, and Tamara Rezk. 2004. Secure Information Flow by Self-Composition. In Proceedings of the 17th IEEE Workshop on Computer Security Foundations (CSFW ’04).

    Gilles Barthe, Cédric Fournet, Benjamin Grégoire, Pierre-Yves Strub, Nikhil Swamy, and Santiago Zanella Béguelin. 2014a. Probabilistic relational verification for cryptographic implementations. In The 41st Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL ’14, San Diego, CA, USA, January 20-21, 2014. 193–206.

    Gilles Barthe, Cédric Fournet, Benjamin Grégoire, Pierre-Yves Strub, Nikhil Swamy, and Santiago Zanella-Béguelin. 2014b. Probabilistic Relational Verification for Cryptographic Implementations. SIGPLAN Not. 49, 1 (Jan. 2014), 193–205.

    Gilles Barthe, Marco Gaboardi, Emilio Jesús Gallego Arias, Justin Hsu, Aaron Roth, and Pierre-Yves Strub. 2015. Higher-Order Approximate Relational Refinement Types for Mechanism Design and Differential Privacy. In Proceedings of the 42nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2015, Mumbai, India, January 15-17, 2015. 55–68. https://doi.org/10.1145/2676726.2677000

    Gilles Barthe, Boris Köpf, Federico Olmedo, and Santiago Zanella Béguelin. 2012. Probabilistic relational reasoning for differential privacy. (2012), 97–110. https://doi.org/10.1145/2103656.2103670

    Nick Benton. 2004. Simple Relational Correctness Proofs for Static Analyses and Program Transformations. In Proceedings of the 31st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL ’04). ACM, New York, NY, USA, 14–25.

    Lennart Beringer. 2011. Relational Decomposition. In Interactive Theorem Proving. Springer Berlin Heidelberg, Berlin, Heidelberg, 39–54.

    Michael J. Butler and Wolfram Schulte (Eds.). 2011. FM 2011: Formal Methods - 17th International Symposium on Formal Methods, Limerick, Ireland, June 20-24, 2011. Proceedings. Lecture Notes in Computer Science, Vol. 6664. Springer.

    Swarat Chaudhuri, Sumit Gulwani, and Roberto Lublinerman. 2010. Continuity Analysis of Programs. In Proceedings of the 37th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL ’10). ACM, New York, NY, USA, 57–70. https://doi.org/10.1145/1706299.1706308

    Swarat Chaudhuri, Sumit Gulwani, and Roberto Lublinerman. 2012. Continuity and Robustness of Programs. Commun. ACM 55, 8 (Aug. 2012), 107–115.

    Jia Chen, Yu Feng, and Isil Dillig. 2017. Precise Detection of Side-Channel Vulnerabilities using Quantitative Cartesian Hoare Logic. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, CCS 2017, Dallas, TX, USA, October 30 - November 03, 2017. 875–890.

    Shikun Chen, Zhoujun Li, Xiaoyu Song, and Mengjun Li. 2011. An Iterative Method for Generating Loop Invariants. In Proceedings of the 5th Joint International Frontiers in Algorithmics, and 7th International Conference on Algorithmic Aspects in Information and Management (FAW-AAIM’11).

    Ezgi Çiçek, Gilles Barthe, Marco Gaboardi, Deepak Garg, and Jan Hoffmann. 2017. Relational cost analysis. In Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages, POPL 2017, Paris, France, January 18-20, 2017. 316–329.

    David Costanzo and Zhong Shao. 2014. A Separation Logic for Enforcing Declarative Information Flow Control Policies. In Principles of Security and Trust - Third International Conference, POST 2014, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2014, Grenoble, France, April 5-13, 2014, Proceedings. 179–198. https://doi.org/10.1007/978-3-642-54792-8_10


    Ádám Darvas, Reiner Hähnle, and David Sands. 2005. A Theorem Proving Approach to Analysis of Secure Information Flow. In Proceedings of the Second International Conference on Security in Pervasive Computing (SPC’05). Springer-Verlag, Berlin, Heidelberg, 193–209. https://doi.org/10.1007/978-3-540-32004-3_20

    Leonardo De Moura and Nikolaj Bjørner. 2008. Z3: An Efficient SMT Solver. In Proceedings of the Theory and Practice of Software, 14th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS’08/ETAPS’08). 337–340.

    Marco Eilers, Peter Müller, and Samuel Hitz. 2018. Modular Product Programs. In Programming Languages and Systems, Amal Ahmed (Ed.). Springer International Publishing, Cham, 502–529.

    Jérôme Feret. 2001. Abstract Interpretation-Based Static Analysis of Mobile Ambients. In Eighth International Static Analysis Symposium (SAS’01) (LNCS). Springer-Verlag.

    Roberto Giacobazzi and Isabella Mastroeni. 2004. Abstract non-interference: parameterizing non-interference by abstract interpretation. In Proceedings of the 31st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2004, Venice, Italy, January 14-16, 2004. 186–197.

    Joseph A. Goguen and José Meseguer. 1982. Security Policies and Security Models. In 1982 IEEE Symposium on Security and Privacy, Oakland, CA, USA, April 26-28, 1982. 11–20.

    Joseph A. Goguen and José Meseguer. 1984. Unwinding and Inference Control. In Proceedings of the 1984 IEEE Symposium on Security and Privacy, Oakland, California, USA, April 29 - May 2, 1984. 75–87.

    Martin Hentschel, Richard Bubel, and Reiner Hähnle. 2014. Symbolic Execution Debugger (SED). In Proceedings of Runtime Verification 2014 (2014-01-01) (LNCS), Borzoo Bonakdarpour and Scott A. Smolka (Eds.). Springer, 255–262.

    Krystof Hoder, Laura Kovács, and Andrei Voronkov. 2011. Invariant Generation in Vampire. In Tools and Algorithms for the Construction and Analysis of Systems - 17th International Conference, TACAS 2011, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2011, Saarbrücken, Germany, March 26-April 3, 2011. Proceedings. 60–64.

    Martin Hofmann and Mariela Pavlova. 2008. Elimination of Ghost Variables in Program Logics. In Trustworthy Global Computing. Springer Berlin Heidelberg, Berlin, Heidelberg.

    Catalin Hritcu, John Hughes, Benjamin C. Pierce, Antal Spector-Zabusky, Dimitrios Vytiniotis, Arthur Azevedo de Amorim, and Leonidas Lampropoulos. 2013. Testing Noninterference, Quickly. In Proceedings of the 18th ACM SIGPLAN International Conference on Functional Programming (ICFP ’13). 455–468.

    Sarfraz Khurshid, Corina S. Păsăreanu, and Willem Visser. 2003. Generalized Symbolic Execution for Model Checking and Testing. In Proceedings of the 9th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS’03). 553–568.

    James C. King. 1976. Symbolic Execution and Program Testing. Commun. ACM 19, 7 (July 1976), 385–394.

    Hyoukjun Kwon, William Harris, and Hadi Esmaeilzadeh. 2017. Proving Flow Security of Sequential Logic via Automatically-Synthesized Relational Invariants. In 30th IEEE Computer Security Foundations Symposium, CSF 2017, Santa Barbara, CA, USA, August 21-25, 2017. 420–435.

    John McCarthy. 1961. A Basis for a Mathematical Theory of Computation (preliminary report). In Proceedings of the Western Joint Computer Conference, Cicely M. Popplewell (Ed.). IRE, AIEE, ACM, 225–238.

    Dimiter Milushev, Wim Beck, and Dave Clarke. 2012. Noninterference via Symbolic Execution. In Proceedings of the IFIP WG 6.1 International Conference on Formal Techniques for Distributed Systems (FMOODS’12/FORTE’12). 152–168.

    Floréal Morandat, Brandon Hill, Leo Osvald, and Jan Vitek. 2012. Evaluating the Design of the R Language - Objects and Functions for Data Analysis. In ECOOP 2012 - Object-Oriented Programming - 26th European Conference, Beijing, China, June 11-16, 2012. Proceedings. 104–131.

    Aleksandar Nanevski, Anindya Banerjee, and Deepak Garg. 2013. Dependent Type Theory for Verification of Information Flow and Access Control Policies. ACM Trans. Program. Lang. Syst. 35 (2013), 6:1–6:41.

    Suzette Person, Matthew B. Dwyer, Sebastian Elbaum, and Corina S. Pǎsǎreanu. 2008. Differential Symbolic Execution. In Proceedings of the 16th ACM SIGSOFT International Symposium on Foundations of Software Engineering (SIGSOFT ’08/FSE-16). 226–237.

    Lauren Pick, Grigory Fedyukovich, and Aarti Gupta. 2018. Exploiting Synchrony and Symmetry in Relational Verification. In Computer Aided Verification.

    François Pottier and Vincent Simonet. 2003. Information Flow Inference for ML. ACM Trans. Program. Lang. Syst. 25, 1 (Jan. 2003), 117–158.

    Shengchao Qin, Guanhua He, Chenguang Luo, Wei-Ngan Chin, and Xin Chen. 2013. Loop invariant synthesis in a combined abstract domain. J. Symb. Comput. 50 (2013), 386–408.

    R Core Team. 2013. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.R-p

