arXiv:2010.13593v2 [cs.PL] 27 Oct 2020


Taming x86-TSO Persistency (Extended Version)

ARTEM KHYZHA, Tel Aviv University, Israel

ORI LAHAV, Tel Aviv University, Israel

We study the formal semantics of non-volatile memory in the x86-TSO architecture. We show that while the explicit persist operations in the recent model of Raad et al. from POPL’20 only enforce order between writes to the non-volatile memory, it is equivalent, in terms of reachable states, to a model whose explicit persist operations mandate that prior writes are actually written to the non-volatile memory. The latter provides a novel model that is much closer to common developers’ understanding of persistency semantics. We further introduce a simpler and stronger sequentially consistent persistency model, develop a sound mapping from this model to x86, and establish a data-race-freedom guarantee providing programmers with a safe programming discipline. Our operational models are accompanied with equivalent declarative formulations, which facilitate our formal arguments, and may prove useful for program verification under x86 persistency.

CCS Concepts: • Computer systems organization → Multicore architectures; • Software and its engineering → Semantics; • Theory of computation → Concurrency; Program semantics.

Additional Key Words and Phrases: persistency, non-volatile memory, x86-TSO, weak memory models, concurrency

ACM Reference Format:

Artem Khyzha and Ori Lahav. 2021. Taming x86-TSO Persistency (Extended Version). Proc. ACM Program. Lang. 1, CONF, Article 1 (January 2021), 56 pages.

Authors’ addresses: Artem Khyzha, Tel Aviv University, Israel, [email protected]; Ori Lahav, Tel Aviv University, Israel, [email protected].

2021. 2475-1421/2021/1-ART1 $15.00 https://doi.org/

Proc. ACM Program. Lang., Vol. 1, No. CONF, Article 1. Publication date: January 2021.

1 INTRODUCTION

Non-volatile memory (a.k.a. persistent memory) preserves its contents in case of a system failure and thus allows the implementation of crash-safe systems. On new Intel machines non-volatile memory coexists with standard (volatile) memory. Their performance is largely comparable, and it is believed that non-volatile memory may replace standard memory in the future [Pelley et al. 2014]. Nevertheless, in all modern machines, writes are not performed directly to memory, and the caches in between the CPU and the memory are expected to remain volatile (losing their contents upon a crash) [Izraelevitz et al. 2016b]. Thus, writes may propagate to the non-volatile memory later than the time they were issued by the processor, and possibly not even in the order in which they were issued, which may easily compromise the system’s ability to recover to a consistent state upon a failure [Bhandari et al. 2012]. This complexity, which, for concurrent programs, comes on top of the complexity of the memory consistency model, results in counterintuitive behaviors, and makes programming on such machines very challenging.

As history has shown for consistency models in multicore systems, having a formal semantics of the underlying persistency model is a paramount precondition for understanding such intricate systems, as well as for programming and reasoning about programs under such systems, and for mapping (i.e., compiling) from one model to another.

The starting point for this paper is the recent work of Raad et al. [2020] that, in extensive collaboration with engineers at Intel, formalized an extension of the x86-TSO memory model of Owens et al. [2009] to account for Intel-x86 persistency semantics [Intel 2019]. Roughly speaking, in order to formally justify certain outcomes that are possible after a crash but can never be observed in normal (non-crashing) executions, their model, called Px86, employs two levels of buffers: per-thread store buffers and a global persistence buffer sitting between the store buffers and the non-volatile memory.

There are, however, significant gaps between the Px86 model and developers and researchers’ common (often informal) understanding of persistent memory systems.

First, Px86’s explicit persist instructions are “asynchronous”. These are instructions that allow different levels of control over how writes persist (i.e., propagate to the non-volatile memory): flush instructions for persisting single cache lines and more efficient flush-optimal instructions that require a following store fence (sfence) to ensure their completion. In Px86 these instructions are asynchronous: propagating these instructions from the store buffer (making them globally visible) does not block until certain writes persist, but rather enforces restrictions on the order in which writes persist. For example, rather than guaranteeing that a certain cache line has to persist when flush is propagated from the store buffer, it only ensures that prior writes to that cache line must persist before any subsequent writes (under some appropriate definition of “prior” and “subsequent”). Similarly, Px86’s sfence instructions provide such guarantees for flush-optimal instructions executed before the sfence, but do not ensure that any cache line actually persisted. In fact, for any program under Px86, it is always possible that writes do not persist at all: the system may always crash with the contents of the very initial non-volatile memory.

We observe that Px86’s asynchronous explicit persist instructions lie in sharp contrast with a variety of previous work and developers’ guides, ranging from theory to practice, that assumed, sometimes implicitly, “synchronous” explicit persist instructions that allow the programmer to assert that certain writes must have persisted at certain program points (e.g., [Arulraj et al. 2018; Chen and Jin 2015; David et al. 2018; Friedman et al. 2020, 2018; Gogte et al. 2018; Izraelevitz et al. 2016b; Kolli et al. 2017, 2016; Lersch et al. 2019; Liu et al. 2020; Oukid et al. 2016; Scargall 2020; Venkataraman et al. 2011; Wang et al. 2018; Yang et al. 2015; Zuriel et al. 2019]). For example, Izraelevitz et al. [2016b]’s psync instruction blocks until all previous explicit persist instructions “have actually reached persistent memory”, but such an instruction cannot be implemented in Px86.

Second, the store buffers of Px86 are not standard first-in-first-out (FIFO) buffers. In addition to pending writes, as in usual TSO store buffers, store buffers of Px86 include pending explicit persist instructions. While pending writes preserve their order in the store buffers, the order involving the pending persist instructions is not necessarily maintained. For example, a pending flush-optimal instruction may propagate from the store buffer after a pending write even if the flush-optimal instruction was issued by the processor before the write. Indeed, without this (and similar) out-of-order propagation steps, Px86 becomes too strong and forbids certain observable behaviors. We find the exact conditions on the store buffers’ propagation order to be rather intricate, making manual reasoning about possible outcomes rather cumbersome.

Third, Px86 lacks a formal connection to an SC-based model. Developers often prefer sequentially consistent concurrency semantics (SC). They may trust a compiler to place sufficient (preferably not excessive) barriers for ensuring SC when programming against an underlying relaxed memory model, or rely on a data-race-freedom guarantee (DRF) ensuring that well synchronized programs cannot expose weak memory behaviors. However, it is unclear how to derive a simpler well-behaved SC persistency model from Px86. The straightforward solution of discarding the store buffers from the model, thus creating direct links between the processors and the persistence buffer, is senseless for Px86. Indeed, if applied to Px86, it would result in an overly strong semantics, which, in particular, completely identifies the two kinds of explicit persist instructions
(“flush” and “flush-optimal”), since the difference between them in Px86 emerges solely from propagation restrictions from the store buffers. In fact, in Px86, even certain behaviors of single-threaded programs can only be accounted for by the effect of the store buffer.

Does this mean that the data structures, algorithms, and principled approaches developed before having the formal Px86 model are futile w.r.t. Px86? The main goal of the current paper is to bridge the gap between Px86 and developers and researchers’ common understanding, and establish a negative answer to this question.

Our first contribution is an alternative x86-TSO operational persistency model that is provably equivalent to Px86, and is closer, to the best of our understanding, to developers’ mental model of x86 persistency. Our model, which we call PTSOsyn, has synchronous explicit persist instructions, which, when they are propagated from the store buffer, do block the execution until certain writes persist. (In the case of flush-optimal, the subsequent sfence instruction is the one blocking.) Out-of-order propagation from the store buffers is also significantly confined in our PTSOsyn model (but not avoided altogether, see Ex. 4.3). In addition, PTSOsyn employs per-cache-line persistence FIFO buffers, which, we believe, reflect the guarantees on the persistence order of writes more directly than the persistence (non-FIFO) buffer of Px86. (This is not a mere technicality: due to the way explicit persist instructions are handled in Px86, its persistence buffer has to include pending writes of all cache lines.)

The equivalence notion we use to relate Px86 and PTSOsyn is state-based: it deems two models equivalent if the sets of reachable program states (possibly with crashes) in the models coincide. Since a program may always start by inspecting the memory, this equivalence notion is sufficiently strong to ensure that every content of the non-volatile memory after a crash that is observable in one model is also observable in the other. Roughly speaking, our equivalence argument builds on the intuition that crashing before an asynchronous flush instruction completes is observationally indistinguishable from crashing before a synchronous flush instruction propagates from the store buffer. Making this intuition into a proof and applying it for the full model including both kinds of explicit persist instructions is technically challenging (we use two additional intermediate systems between Px86 and PTSOsyn).

Our second contribution is an SC persistency model that is formally related to our TSO persistency model. The SC model, which we call PSC, is naturally obtained by discarding the store buffers in PTSOsyn. Unlike for Px86, the resulting model, to our best understanding, precisely captures the developers’ understanding. In particular, the difficulties described above for Px86 are addressed by PTSOsyn: even without store buffers the different kinds of explicit persist instructions (flush and flush-optimal) have different semantics in PTSOsyn, and store buffers are never needed in single-threaded programs.

We establish two results relating PSC and PTSOsyn. The first is a sound mapping from PSC to PTSOsyn, intended to be used as a compilation scheme that ensures simpler and more well-behaved semantics on x86 machines. This mapping extends the standard mapping of SC to TSO: in addition to placing a memory fence (mfence) between writes and subsequent reads to different locations, it also places store fences (sfence) between writes and subsequent flush-optimal instructions to different locations (the latter is only required when there is no intervening write or read operation between the write and the flush-optimal, thus allowing a barrier-free compilation of standard uses of flush-optimal). The second result is a DRF-guarantee for PTSOsyn w.r.t. PSC. This guarantee ensures PSC-semantics for programs that are race-free under PSC semantics, and thus provides a safe programming discipline against PTSOsyn that can be followed without even knowing PTSOsyn. To achieve this, the standard notion of a data race is extended to include races between flush-optimal instructions and writes. We note that following our precise definition of a data race, RMW (atomic read-modify-write) instructions do not induce races, so that with a standard lock implementation, properly locked programs (using locks to avoid data races) are not considered racy. In fact, both the mapping of PSC to PTSOsyn and the DRF-guarantee are corollaries of a stronger and more precise theorem relating PSC and PTSOsyn (see Thm. 7.8).

Finally, as a by-product of our work, we provide declarative (a.k.a. axiomatic) formulations for PTSOsyn and PSC (which we have used for formally relating them). Our PTSOsyn declarative model is more abstract than the one in [Raad et al. 2020]. In particular, its execution graphs do not record a total persistence order on so-called “durable” events (the ‘non-volatile-order’ of [Raad et al. 2020]). Instead, execution graphs are accompanied by a mapping that assigns to every location the latest persisted write to that location. From that mapping, we derive an additional partial order on events that is used in our acyclicity consistency constraints. We believe that, by avoiding the existential quantification on all possible persistence orders, our declarative presentation of the persistency model may lend itself more easily to automatic verification using execution graphs, e.g., in the style of [Abdulla et al. 2018; Kokologiannakis et al. 2017].
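To make the fence placement of the PSC-to-PTSOsyn mapping described in the introduction more tangible, the following Python sketch instruments a single thread's straight-line instruction list. It is a simplified and deliberately conservative illustration, not the mapping as formally defined in §7: the tuple encoding of instructions and the name `insert_fences` are assumptions made here, RMWs are not handled, and flush instructions are ignored when deciding whether a fence is needed.

```python
def insert_fences(thread):
    """Instrument one thread (a list of instruction tuples) with fences.

    Hypothetical encoding: ("W", x), ("R", x), ("FO", x), ("FL", x),
    ("MF",) and ("SF",), where x is a location name.
    """
    out = []
    unfenced_writes = set()  # locations written since the last mfence
    last_write = None        # location of a write not yet followed by a
                             # read, a write, or a fence
    for ins in thread:
        kind = ins[0]
        if kind == "R":
            # mfence between a write and a later read of a different location
            if any(loc != ins[1] for loc in unfenced_writes):
                out.append(("MF",))
                unfenced_writes.clear()
            last_write = None
        elif kind == "FO":
            # sfence between a write and a later flush-optimal to a different
            # location, unless a read or write intervenes
            if last_write is not None and last_write != ins[1]:
                out.append(("SF",))
                last_write = None
        elif kind == "W":
            unfenced_writes.add(ins[1])
            last_write = ins[1]
        elif kind == "MF":
            unfenced_writes.clear()
            last_write = None
        elif kind == "SF":
            last_write = None
        out.append(ins)
    return out


example = [("W", "x"), ("FO", "x"), ("W", "y"), ("R", "x")]
instrumented = insert_fences(example)
```

In this example the flush-optimal follows a write to the same location, so no sfence is inserted (the "standard use" of flush-optimal), while the read of `x` after the write to `y` triggers an mfence.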

Outline. The rest of this paper is organized as follows. In §2 we present our general formal framework for operational persistency models. In §3 we present Raad et al. [2020]’s Px86 persistency model. In §4 we introduce PTSOsyn and outline the proof of equivalence of PTSOsyn and Px86. In §5 we present our declarative formulation of PTSOsyn and relate it to the operational semantics. In §6 we present the persistency SC-model derived from PTSOsyn, as well as its declarative formulation. In §7 we use the declarative semantics to formally relate PSC and PTSOsyn. In §8 we present the related work and conclude.

Additional Material. Proofs of the theorems in the paper are given in the accompanying technical appendix.

2 AN OPERATIONAL FRAMEWORK FOR PERSISTENCY SPECIFICATIONS

In this section we present our general framework for defining operational persistency models. As is standard in weak memory semantics, the operational semantics is obtained by synchronizing a program (a.k.a. thread subsystem) and a memory subsystem (a.k.a. storage subsystem). The novelty lies in the definition of persistent memory subsystems whose states have distinguished non-volatile components. When running a program under a persistent memory subsystem, we include non-deterministic “full system” crash transitions that initialize all volatile parts of the state.

We start with some notational preliminaries (§2.1), briefly discuss program semantics (§2.2), and then define persistent memory subsystems and their synchronization with programs (§2.3).

2.1 Preliminaries

Sequences. For a finite alphabet $\Sigma$, we denote by $\Sigma^*$ (respectively, $\Sigma^+$) the set of all sequences (non-empty sequences) over $\Sigma$. We use $\varepsilon$ to denote the empty sequence. The length of a sequence $s$ is denoted by $|s|$ (in particular $|\varepsilon| = 0$). We often identify a sequence $s$ over $\Sigma$ with its underlying function in $\{1, \ldots, |s|\} \to \Sigma$, and write $s(k)$ for the symbol at position $1 \leq k \leq |s|$ in $s$. We write $\sigma \in s$ if $\sigma$ appears in $s$, that is if $s(k) = \sigma$ for some $1 \leq k \leq |s|$. We use “$\cdot$” for the concatenation of sequences, which is lifted to concatenation of sets of sequences in the obvious way. We identify symbols with sequences of length 1 or their singletons when needed (e.g., in expressions like $\sigma \cdot S$).
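As a small illustration of the sequence notation, sequences can be modeled as Python tuples. This is only a sketch; the helper names `at`, `lift`, and `concat_sets` are assumptions introduced here and do not appear in the paper.

```python
EPS = ()  # the empty sequence


def at(s, k):
    """s(k): the symbol at position k, with positions counted from 1."""
    if not 1 <= k <= len(s):
        raise IndexError("position out of range")
    return s[k - 1]


def lift(x):
    """Identify a symbol with the sequence of length 1."""
    return x if isinstance(x, tuple) else (x,)


def concat_sets(set1, set2):
    """Concatenation lifted to sets of sequences."""
    return {s1 + s2 for s1 in set1 for s2 in set2}


s = ("a", "b")
lifted = concat_sets({lift("a")}, {EPS, ("b",)})
```

Tuple concatenation `s1 + s2` plays the role of "·", and the membership test `"a" in s` corresponds to writing that a symbol appears in a sequence.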

Relations. Given a relation $R$, $\mathit{dom}(R)$ denotes its domain; and $R^?$, $R^+$, and $R^*$ denote its reflexive, transitive, and reflexive-transitive closures. The inverse of $R$ is denoted by $R^{-1}$. The (left) composition of relations $R_1, R_2$ is denoted by $R_1 ; R_2$. We assume that $;$ binds tighter than $\cup$ and $\setminus$. We denote by $[A]$ the identity relation on a set $A$, and so $[A] ; R ; [A] = R \cap (A \times A)$.
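The relational operators above can be checked on small finite examples by representing relations as sets of pairs. The following Python sketch is illustrative only; the function names are assumptions made here.

```python
def compose(r1, r2):
    """Left composition r1 ; r2."""
    return {(a, c) for (a, b) in r1 for (b2, c) in r2 if b == b2}


def identity_on(s):
    """The identity relation [S] on a set S."""
    return {(a, a) for a in s}


def inverse(r):
    return {(b, a) for (a, b) in r}


def domain(r):
    return {a for (a, _) in r}


def transitive_closure(r):
    """R+: computed as a naive fixpoint."""
    closure = set(r)
    while True:
        extra = compose(closure, closure) - closure
        if not extra:
            return closure
        closure |= extra


R = {(1, 2), (2, 3), (3, 4)}
A = {2, 3, 4}
# [A] ; R ; [A] keeps exactly the pairs of R with both ends in A.
restricted = compose(compose(identity_on(A), R), identity_on(A))
```

Here `restricted` coincides with the intersection of `R` with `A × A`, matching the identity stated for `[A] ; R ; [A]`.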

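Labeled transition systems. A labeled transition system (LTS) $A$ is a tuple $\langle Q, \Sigma, Q_{\mathsf{Init}}, T \rangle$, where $Q$ is a set of states, $\Sigma$ is a finite alphabet (whose symbols are called transition labels), $Q_{\mathsf{Init}} \subseteq Q$ is a set of initial states, and $T \subseteq Q \times \Sigma \times Q$ is a set of transitions. We denote by $A.\mathtt{Q}$, $A.\Sigma$, $A.\mathtt{Q}_{\mathsf{Init}}$, and $A.\mathtt{T}$ the components of an LTS $A$. We write $\xrightarrow{\sigma}_A$ for the relation $\{\langle q, q' \rangle \mid \langle q, \sigma, q' \rangle \in A.\mathtt{T}\}$, and $\rightarrow_A$ for $\bigcup_{\sigma \in \Sigma} \xrightarrow{\sigma}_A$. For a sequence $t \in A.\Sigma^*$, we write $\xrightarrow{t}_A$ for the composition $\xrightarrow{t(1)}_A ; \ldots ; \xrightarrow{t(|t|)}_A$.

A sequence $t \in A.\Sigma^*$ such that $q_{\mathsf{Init}} \xrightarrow{t}_A q$ for some $q_{\mathsf{Init}} \in A.\mathtt{Q}_{\mathsf{Init}}$ and $q \in A.\mathtt{Q}$ is called a trace of $A$ (or an $A$-trace). We denote by $\mathsf{traces}(A)$ the set of all traces of $A$. A state $q \in A.\mathtt{Q}$ is called reachable in $A$ if $q_{\mathsf{Init}} \xrightarrow{t}_A q$ for some $q_{\mathsf{Init}} \in A.\mathtt{Q}_{\mathsf{Init}}$ and $t \in \mathsf{traces}(A)$.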
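For finite systems, the definitions of traces and reachable states can be executed directly. The sketch below is a minimal Python rendering under that finiteness assumption; the class name `LTS`, its methods, and the toy system are illustrative choices, not part of the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LTS:
    states: frozenset
    alphabet: frozenset
    initial: frozenset
    transitions: frozenset  # triples (q, sigma, q')

    def step(self, sources, sigma):
        """Image of a set of states under the sigma-labeled transition relation."""
        return {q2 for (q1, s, q2) in self.transitions
                if q1 in sources and s == sigma}

    def run(self, t):
        """States reachable from an initial state by the label sequence t."""
        current = set(self.initial)
        for sigma in t:
            current = self.step(current, sigma)
        return current

    def is_trace(self, t):
        return bool(self.run(t))

    def reachable(self):
        """All states reachable from some initial state (worklist search)."""
        seen, work = set(self.initial), list(self.initial)
        while work:
            q = work.pop()
            for (q1, _, q2) in self.transitions:
                if q1 == q and q2 not in seen:
                    seen.add(q2)
                    work.append(q2)
        return seen


toy = LTS(
    states=frozenset({"q0", "q1", "q2"}),
    alphabet=frozenset({"a", "b"}),
    initial=frozenset({"q0"}),
    transitions=frozenset({("q0", "a", "q1"), ("q1", "b", "q0"), ("q2", "a", "q2")}),
)
```

In the toy system, `("a", "b", "a")` is a trace, `("b",)` is not, and `q2` is a state that is not reachable.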

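Observable traces. Given an LTS $A$, we usually have a distinguished symbol $\varepsilon$ included in $A.\Sigma$. We refer to transitions labeled with $\varepsilon$ as silent transitions, while the other transitions are called observable transitions. For a sequence $t \in (A.\Sigma \setminus \{\varepsilon\})^*$, we write $\xRightarrow{t}_A$ for the relation

$$\{\langle q, q' \rangle \mid q \xrightarrow{\varepsilon}^*_A \xrightarrow{t(1)}_A \xrightarrow{\varepsilon}^*_A \cdots \xrightarrow{\varepsilon}^*_A \xrightarrow{t(|t|)}_A \xrightarrow{\varepsilon}^*_A q'\}.$$

A sequence $t \in (A.\Sigma \setminus \{\varepsilon\})^*$ such that $q_{\mathsf{Init}} \xRightarrow{t}_A q$ for some $q_{\mathsf{Init}} \in A.\mathtt{Q}_{\mathsf{Init}}$ and $q \in A.\mathtt{Q}$ is called an observable trace of $A$ (or an $A$-observable-trace). We denote by $\mathsf{otraces}(A)$ the set of all observable traces of $A$.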
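The relation used for observable traces interleaves each observable step with arbitrarily many silent steps. For a finite transition set this can be sketched in Python as follows; the string `"eps"` standing for the silent label and all function names are assumptions made for this illustration.

```python
EPS = "eps"  # stands for the silent label


def eps_closure(transitions, states):
    """States reachable from `states` by zero or more silent transitions."""
    seen, work = set(states), list(states)
    while work:
        q = work.pop()
        for (q1, label, q2) in transitions:
            if q1 == q and label == EPS and q2 not in seen:
                seen.add(q2)
                work.append(q2)
    return seen


def weak_run(transitions, initial, t):
    """States reached from `initial` along the observable sequence t,
    allowing silent steps before, between, and after the labels of t."""
    current = eps_closure(transitions, initial)
    for sigma in t:
        if sigma == EPS:
            raise ValueError("observable sequences exclude the silent label")
        stepped = {q2 for (q1, label, q2) in transitions
                   if q1 in current and label == sigma}
        current = eps_closure(transitions, stepped)
    return current


def is_observable_trace(transitions, initial, t):
    return bool(weak_run(transitions, initial, t))


T = {("q0", EPS, "q1"), ("q1", "a", "q2"), ("q2", EPS, "q3"), ("q3", "b", "q0")}
```

In this toy transition set, `("a", "b")` is an observable trace from `q0` even though neither `a` nor `b` is enabled without first taking a silent step.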

2.2 Concurrent Programs Representation

To keep the presentation abstract, we do not provide here a concrete programming language, but rather represent programs as LTSs. For this matter, we let $\mathsf{Val} \subseteq \mathbb{N}$, $\mathsf{Loc} \subseteq \{x, y, \ldots\}$, and $\mathsf{Tid} \subseteq \{T_1, T_2, \ldots, T_N\}$ be sets of values, (shared) memory locations, and thread identifiers. We assume that $\mathsf{Val}$ contains a distinguished value 0, used as the initial value for all locations.

Sequential programs are identified with LTSs whose transition labels are event labels, extended with $\varepsilon$ for silent program transitions, as defined next.¹

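Definition 2.1. An event label is either a read label $\mathtt{R}(x, v_\mathtt{R})$, a write label $\mathtt{W}(x, v_\mathtt{W})$, a read-modify-write (RMW) label $\mathtt{RMW}(x, v_\mathtt{R}, v_\mathtt{W})$, a failed compare-and-swap (CAS) label $\mathtt{R\text{-}ex}(x, v_\mathtt{R})$, an mfence label $\mathtt{MF}$, a flush label $\mathtt{FL}(x)$, a flush-opt label $\mathtt{FO}(x)$, or an sfence label $\mathtt{SF}$, where $x \in \mathsf{Loc}$ and $v_\mathtt{R}, v_\mathtt{W} \in \mathsf{Val}$. We denote by $\mathsf{Lab}$ the set of all event labels. The functions $\mathtt{typ}$, $\mathtt{loc}$, $\mathtt{val_R}$, and $\mathtt{val_W}$ retrieve (when applicable) the type (R/W/RMW/R-ex/MF/FL/FO/SF), location ($x$), read value ($v_\mathtt{R}$), and written value ($v_\mathtt{W}$) of an event label.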

Event labels correspond to the different interactions that a program may have with the persistent memory subsystem. In particular, we have several types of barrier labels: a memory fence ($\mathtt{MF}$), a persistency per-location flush barrier ($\mathtt{FL}(x)$), an optimized persistency per-location flush barrier, called “flush-optimal” ($\mathtt{FO}(x)$), and a store fence ($\mathtt{SF}$).² Roughly speaking, memory fences ($\mathtt{MF}$) ensure the completion of all prior instructions, while store fences ($\mathtt{SF}$) ensure that prior flush-optimal instructions have taken their effect. Memory access labels include plain reads and writes, as well as RMWs ($\mathtt{RMW}(x, v_\mathtt{R}, v_\mathtt{W})$) resulting from operations like compare-and-swap (CAS) and fetch-and-add. For failed CAS (a CAS that did not read the expected value) we use a special read label $\mathtt{R\text{-}ex}(x, v_\mathtt{R})$, which allows us to distinguish such transitions from plain reads and provide them with stronger semantics.³ We note that our event labels are specific for the x86 persistency, but they can be easily extended and adapted for other models.
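The event labels of Definition 2.1 can be represented as a small tagged record in which absent components are left undefined. The following Python sketch is one possible encoding; the class `Label`, the `SHAPE` table, and the constructor helpers are assumptions introduced for illustration only.

```python
from dataclasses import dataclass
from typing import Optional

# For each label type: (has a location, has a read value, has a written value).
SHAPE = {
    "R": (True, True, False),
    "W": (True, False, True),
    "RMW": (True, True, True),
    "R-ex": (True, True, False),
    "MF": (False, False, False),
    "FL": (True, False, False),
    "FO": (True, False, False),
    "SF": (False, False, False),
}


@dataclass(frozen=True)
class Label:
    typ: str
    loc: Optional[str] = None
    val_r: Optional[int] = None
    val_w: Optional[int] = None

    def __post_init__(self):
        if self.typ not in SHAPE:
            raise ValueError(f"unknown label type {self.typ!r}")
        present = (self.loc is not None,
                   self.val_r is not None,
                   self.val_w is not None)
        if present != SHAPE[self.typ]:
            raise ValueError(f"ill-formed {self.typ} label")


def R(x, v): return Label("R", x, val_r=v)
def W(x, v): return Label("W", x, val_w=v)
def RMW(x, vr, vw): return Label("RMW", x, val_r=vr, val_w=vw)
def R_ex(x, v): return Label("R-ex", x, val_r=v)
def MF(): return Label("MF")
def FL(x): return Label("FL", x)
def FO(x): return Label("FO", x)
def SF(): return Label("SF")
```

The fields `typ`, `loc`, `val_r`, and `val_w` play the role of the functions typ, loc, valR, and valW; a field holding `None` corresponds to the function being undefined for that label.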

¹ In our examples we use a standard program syntax and assume a standard reading of programs as LTSs. To assist the reader, Appendix H provides a concrete example of how this can be done.

² In [Intel 2019], flush is referred to as CLFLUSH, flush-optimal is referred to as CLFLUSHOPT. Intel’s CLWB instruction is equivalent to CLFLUSHOPT and may improve performance in certain cases [Raad et al. 2020].

³ Some previous work, e.g., [Lahav et al. 2016; Raad et al. 2020], considers failed RMWs (arising from lock cmpxchg instructions) as plain reads, although failed RMWs induce a memory fence in TSO.

In turn, a (concurrent) program $\mathit{Pr}$ is a top-level parallel composition of sequential programs, defined as a mapping assigning a sequential program to every $\tau \in \mathsf{Tid}$. A program $\mathit{Pr}$ is also identified with an LTS, which is obtained by standard lifting of the LTSs representing its component sequential programs. The transition labels of this LTS record the thread identifier of non-silent transitions, as defined next.

Definition 2.2. A program transition label is either $\langle \tau, l \rangle$ for $\tau \in \mathsf{Tid}$ and $l \in \mathsf{Lab}$ (observable transition) or $\varepsilon$ (silent transition). We denote by $\mathsf{PTLab}$ the set of all program transition labels. We use the functions $\mathtt{tid}$ and $\mathtt{lab}$ to return the thread identifier ($\tau$) and event label ($l$) of a given transition label (when applicable). The functions $\mathtt{typ}$, $\mathtt{loc}$, $\mathtt{val_R}$, and $\mathtt{val_W}$ are lifted to transition labels in the obvious way (undefined for $\varepsilon$-transitions).

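The LTS induced by a (concurrent) program $\mathit{Pr}$ is over the alphabet $\mathsf{PTLab}$; its states are functions, denoted by $\overline{q}$, assigning a state in $\mathit{Pr}(\tau).\mathtt{Q}$ to every $\tau \in \mathsf{Tid}$; its initial states set is $\prod_{\tau} \mathit{Pr}(\tau).\mathtt{Q}_{\mathsf{Init}}$; and its transitions are “interleaved transitions” of $\mathit{Pr}$’s components, given by:

$$\dfrac{l \in \mathsf{Lab} \qquad \overline{q}(\tau) \xrightarrow{l}_{\mathit{Pr}(\tau)} q'}{\overline{q} \xrightarrow{\tau, l}_{\mathit{Pr}} \overline{q}[\tau \mapsto q']} \qquad\qquad \dfrac{\overline{q}(\tau) \xrightarrow{\varepsilon}_{\mathit{Pr}(\tau)} q'}{\overline{q} \xrightarrow{\varepsilon}_{\mathit{Pr}} \overline{q}[\tau \mapsto q']}$$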

We refer to sequences over $\mathsf{PTLab} \setminus \{\varepsilon\} = \mathsf{Tid} \times \mathsf{Lab}$ as observable program traces. Clearly, observable program traces are closed under “per-thread prefixes”:

Definition 2.3. We denote by $t|_\tau$ the restriction of an observable program trace $t$ to transition labels of the form $\langle \tau, \_ \rangle$. An observable program trace $t'$ is per-thread equivalent to an observable program trace $t$, denoted by $t' \sim t$, if $t'|_\tau = t|_\tau$ for every $\tau \in \mathsf{Tid}$. In turn, $t'$ is a per-thread prefix of $t$, denoted by $t' \lesssim t$, if $t'$ is a (possibly trivial) prefix of some $t'' \sim t$ (equivalently, $t'|_\tau$ is a prefix of $t|_\tau$ for every $\tau \in \mathsf{Tid}$).

Proposition 2.4. If $t$ is a $\mathit{Pr}$-observable-trace, then so is every $t' \lesssim t$.
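The per-thread notions of Definition 2.3 are easy to compute on concrete traces. In the Python sketch below, a trace is a tuple of pairs (thread identifier, label), with labels written as plain strings; this encoding and the function names are assumptions made for illustration. The check uses the equivalent formulation from the definition: every per-thread restriction of the first trace is a prefix of the corresponding restriction of the second.

```python
def restrict(t, tid):
    """The restriction of trace t to the labels of thread tid."""
    return tuple(lbl for lbl in t if lbl[0] == tid)


def thread_ids(*traces):
    return {lbl[0] for t in traces for lbl in t}


def is_prefix(s1, s2):
    return s2[:len(s1)] == s1


def per_thread_equivalent(t1, t2):
    return all(restrict(t1, tid) == restrict(t2, tid)
               for tid in thread_ids(t1, t2))


def per_thread_prefix(t1, t2):
    return all(is_prefix(restrict(t1, tid), restrict(t2, tid))
               for tid in thread_ids(t1, t2))


t = (("T1", "W(x,1)"), ("T2", "W(y,1)"), ("T1", "FL(x)"))
reordered = (("T2", "W(y,1)"), ("T1", "W(x,1)"), ("T1", "FL(x)"))
shorter = (("T2", "W(y,1)"), ("T1", "W(x,1)"))
skipping = (("T1", "FL(x)"),)
```

Here `shorter` is a per-thread prefix of `t` although it is not an ordinary prefix of `t`, while `skipping` is not, because it drops an earlier instruction of thread `T1`.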

2.3 Persistent Systems

At the program level, the read values are arbitrary. It is the responsibility of the memory subsystem to specify what values can be read from each location at each point. Formally, the memory subsystem is another LTS over $\mathsf{PTLab}$, whose synchronization with the program gives us the possible behaviors of the whole system. For persistent memory subsystems, we require that each memory state is composed of a persistent memory $\mathsf{Loc} \to \mathsf{Val}$, which survives crashes, and a volatile part, whose exact structure varies from one system to another (e.g., TSO-based models will have store buffers in the volatile part and SC-based systems will not).

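Definition 2.5. A persistent memory subsystem is an LTS $M$ that satisfies the following:

• $M.\Sigma = \mathsf{PTLab}$.

• $M.\mathtt{Q} = (\mathsf{Loc} \to \mathsf{Val}) \times \tilde{Q}$ where $\tilde{Q}$ is some set. We denote by $M.\tilde{\mathtt{Q}}$ the particular set $\tilde{Q}$ used in a persistent memory subsystem $M$. We usually denote states in $M.\mathtt{Q}$ as $q = \langle m, \tilde{m} \rangle$, where the two components ($m$ and $\tilde{m}$) of a state $q$ are respectively called the non-volatile state and the volatile state.⁴

• $M.\mathtt{Q}_{\mathsf{Init}} = (\mathsf{Loc} \to \mathsf{Val}) \times \tilde{Q}_{\mathsf{Init}}$ where $\tilde{Q}_{\mathsf{Init}}$ is some subset of $M.\tilde{\mathtt{Q}}$. We denote by $M.\tilde{\mathtt{Q}}_{\mathsf{Init}}$ the particular set $\tilde{Q}_{\mathsf{Init}}$ used in a persistent memory subsystem $M$.

⁴ When the elements of $M.\tilde{\mathtt{Q}}$ are tuples themselves, we often simplify the writing by flattening the states, e.g., $\langle m, \alpha, \beta \rangle$ instead of $\langle m, \langle \alpha, \beta \rangle \rangle$.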

In the systems defined below, the volatile states in $M.\tilde{\mathtt{Q}}$ consist of multiple buffers (store buffers and persistence buffers) that lose their contents upon crash. The transition labels of a persistent memory subsystem are pairs in $\mathsf{Tid} \times \mathsf{Lab}$, representing the thread identifier and the event label of the operation, or $\varepsilon$ for internal (silent) memory actions (e.g., propagation from the store buffers). We note that, given the requirements of Def. 2.5, to define a persistent memory subsystem $M$ it suffices to give its sets $M.\tilde{\mathtt{Q}}$ and $M.\tilde{\mathtt{Q}}_{\mathsf{Init}}$ of volatile states and initial volatile states, and its transition relation.

By synchronizing a program $\mathit{Pr}$ and a persistent memory subsystem $M$, and including non-deterministic crash transitions (labeled with $\lightning$), we obtain a persistent system, which we denote by $\mathit{Pr} \amalg M$:

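Definition 2.6. A program $\mathit{Pr}$ and a persistent memory subsystem $M$ form a persistent system, denoted by $\mathit{Pr} \amalg M$. It is an LTS over the alphabet $\mathsf{PTLab} \cup \{\lightning\}$ whose set of states is $\mathit{Pr}.\mathtt{Q} \times (\mathsf{Loc} \to \mathsf{Val}) \times M.\tilde{\mathtt{Q}}$; its initial states set is $\mathit{Pr}.\mathtt{Q}_{\mathsf{Init}} \times \{m_{\mathsf{Init}}\} \times M.\tilde{\mathtt{Q}}_{\mathsf{Init}}$, where $m_{\mathsf{Init}} = \lambda x \in \mathsf{Loc}.\ 0$; and its transitions are “synchronized transitions” of $\mathit{Pr}$ and $M$, given by:

$$\dfrac{\overline{q} \xrightarrow{\tau, l}_{\mathit{Pr}} \overline{q}' \qquad \langle m, \tilde{m} \rangle \xrightarrow{\tau, l}_{M} \langle m', \tilde{m}' \rangle}{\langle \overline{q}, m, \tilde{m} \rangle \xrightarrow{\tau, l}_{\mathit{Pr} \amalg M} \langle \overline{q}', m', \tilde{m}' \rangle} \qquad\qquad \dfrac{\overline{q} \xrightarrow{\varepsilon}_{\mathit{Pr}} \overline{q}'}{\langle \overline{q}, m, \tilde{m} \rangle \xrightarrow{\varepsilon}_{\mathit{Pr} \amalg M} \langle \overline{q}', m, \tilde{m} \rangle}$$

$$\dfrac{\langle m, \tilde{m} \rangle \xrightarrow{\varepsilon}_{M} \langle m', \tilde{m}' \rangle}{\langle \overline{q}, m, \tilde{m} \rangle \xrightarrow{\varepsilon}_{\mathit{Pr} \amalg M} \langle \overline{q}, m', \tilde{m}' \rangle} \qquad\qquad \dfrac{\overline{q}_{\mathsf{Init}} \in \mathit{Pr}.\mathtt{Q}_{\mathsf{Init}} \qquad \tilde{m}_{\mathsf{Init}} \in M.\tilde{\mathtt{Q}}_{\mathsf{Init}}}{\langle \overline{q}, m, \tilde{m} \rangle \xrightarrow{\lightning}_{\mathit{Pr} \amalg M} \langle \overline{q}_{\mathsf{Init}}, m, \tilde{m}_{\mathsf{Init}} \rangle}$$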

Crash transitions reinitialize the program state $\overline{q}$ (which corresponds to losing the program counter and the local stores) and the volatile component of the memory state $\tilde{m}$. The persistent memory $m$ is left intact.

Given the above definition of persistent system, we can define the set of reachable program states under a given persistent memory subsystem. Focused on safety properties, we use this notion to define when one persistent memory subsystem observationally refines another.

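Definition 2.7. A program state $\overline{q} \in \mathit{Pr}.\mathtt{Q}$ is reachable under a persistent memory subsystem $M$ if $\langle \overline{q}, m, \tilde{m} \rangle$ is reachable in $\mathit{Pr} \amalg M$ for some $\langle m, \tilde{m} \rangle \in M.\mathtt{Q}$.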
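To illustrate how synchronized steps, silent memory steps, and crash transitions combine, the following Python sketch explores the reachable states of a toy persistent system. The memory subsystem used here is a hypothetical one invented for this example (a single volatile buffer whose writes persist in order per location); it is neither Px86 nor PTSOsyn. The program is a fixed single-threaded sequence of two writes, reads are omitted for brevity, and all names are assumptions.

```python
from collections import deque

LOCS = ("x", "y")
# Hypothetical straight-line program: W(x,1); W(y,1). Program state = pc.
PROGRAM = (("W", "x", 1), ("W", "y", 1))


def freeze(m):
    """Hashable snapshot of a memory dictionary."""
    return tuple(sorted(m.items()))


def successors(state):
    """One-step successors of <pc, m, buffer> in the toy persistent system."""
    pc, m, buf = state
    out = []
    # Synchronized step: the program issues its next write into the buffer.
    if pc < len(PROGRAM):
        _, loc, val = PROGRAM[pc]
        out.append((pc + 1, m, buf + ((loc, val),)))
    # Silent memory step: persist the oldest buffered write of some location.
    for loc in LOCS:
        idx = next((i for i, (l, _) in enumerate(buf) if l == loc), None)
        if idx is not None:
            new_mem = dict(m)
            new_mem[loc] = buf[idx][1]
            out.append((pc, freeze(new_mem), buf[:idx] + buf[idx + 1:]))
    # Crash step: program state and volatile buffer are reset; m is kept.
    out.append((0, m, ()))
    return out


def reachable_states():
    init = (0, freeze({loc: 0 for loc in LOCS}), ())
    seen, work = {init}, deque([init])
    while work:
        for nxt in successors(work.popleft()):
            if nxt not in seen:
                seen.add(nxt)
                work.append(nxt)
    return seen


states = reachable_states()
persisted = {m for (_, m, _) in states}
```

In this toy system the write to `y` may persist before the earlier write to `x`, so a crash can leave the non-volatile memory with `y = 1` and `x = 0`, and the restarted program (with `pc` reset to 0) can observe that memory.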

Definition 2.8. A persistent memory subsystem $M_1$ observationally refines a persistent memory subsystem $M_2$ if for every program $\mathit{Pr}$, every program state $\overline{q} \in \mathit{Pr}.\mathtt{Q}$ that is reachable under $M_1$ is also reachable under $M_2$. We say that $M_1$ and $M_2$ are observationally equivalent if $M_1$ observationally refines $M_2$ and $M_2$ observationally refines $M_1$.

While the above refinement notion refers to reachable program states, it is also applicable for the reachable non-volatile memories. Indeed, a program may always start by asserting certain conditions reflecting the fact that the memory is in a certain consistent state (which usually vacuously hold for the very initial memory $m_{\mathsf{Init}}$), thus capturing the state of the non-volatile memory in the program state itself.

Remark 1. Our notions of observational refinement and equivalence above are state-based. This is standard in formalizations of weak memory models, intended to support reasoning about safety properties (e.g., detect program assertion violations). In particular, if $M_1$ observationally refines $M_2$, the developer may safely assume $M_2$’s semantics when reasoning about reachable non-volatile memories under $M_1$. We note that a more refined notion of observation in a richer language, e.g., with I/O side-effects, may expose behaviors of $M_1$ that are not observable in $M_2$ even when $M_1$ and $M_2$ are observationally equivalent according to the definition above.

The following lemma allows us to establish refinements without considering all programs and crashes.

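Definition 2.9. An observable trace $t$ of a persistent memory subsystem $M$ is called $m_0$-to-$m$ if $\langle m_0, \tilde{m}_{\mathsf{Init}} \rangle \xRightarrow{t}_M \langle m, \tilde{m} \rangle$ for some $\tilde{m}_{\mathsf{Init}} \in M.\tilde{\mathtt{Q}}_{\mathsf{Init}}$ and $\tilde{m} \in M.\tilde{\mathtt{Q}}$. Furthermore, $t$ is called $m_0$-initialized if it is $m_0$-to-$m$ for some $m$.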

Lemma 2.10. The following conditions together ensure that a persistent memory subsystem $M_1$ observationally refines a persistent memory subsystem $M_2$:

(i) Every $m_0$-initialized $M_1$-observable-trace is also an $m_0$-initialized $M_2$-observable-trace.

(ii) For every $m_0$-to-$m$ $M_1$-observable-trace $t_1$, some $t_2 \lesssim t_1$ is an $m_0$-to-$m$ $M_2$-observable-trace.

Proof (outline). Consider any program state q reachable under M₁ with a trace t = t₀ · ↯ · t₁ · ... · ↯ · tₙ, where ↯ marks a crash. Each crash resets the program state and the volatile state, but not the non-volatile state. We leverage condition (ii) to show that the system running Pr under M₂ can reach each crash with the same non-volatile memory state as the system running Pr under M₁ (possibly with a shorter program trace). Therefore, when the run under M₁ proceeds with tₙ after the last crash, the run under M₂ is able to proceed from exactly the same state. Then, condition (i) applied to tₙ immediately gives us that q is reachable under M₂. □

Intuitively speaking, condition (i) ensures that after the last system crash, the client can only observe behaviors of M₁ that are allowed by M₂, and condition (ii) ensures that the parts of the state that survive crashes and are observable in M₁ are also observable in M₂. Note that condition (ii) allows us (and we actually rely on it in our proofs) to reach the non-volatile memory in M₂ with a per-thread prefix of the program trace that reached that memory in M₁. Indeed, the program state is lost after the crash, and the client cannot observe what part of the program has actually been executed before the crash.
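The per-thread prefix relation used in condition (ii) can be made concrete with a small sketch. It assumes the reading suggested by the paragraph above: one trace is a per-thread prefix of another if, for every thread, its projection is a prefix of the other's projection. The trace encoding (lists of thread/label pairs with string labels) and the function names are illustrative choices, not part of the paper.

```python
from collections import defaultdict


def per_thread(trace):
    """Project a trace (a list of (thread, label) pairs) onto each thread."""
    proj = defaultdict(list)
    for tid, label in trace:
        proj[tid].append(label)
    return proj


def is_per_thread_prefix(t2, t1):
    """Check whether t2 is a per-thread prefix of t1 (assumed reading of Def. 2.3)."""
    p1, p2 = per_thread(t1), per_thread(t2)
    # Every thread's actions in t2 must be an initial segment of its actions in t1.
    return all(p1[tid][:len(labels)] == labels for tid, labels in p2.items())


t1 = [(1, "W(x,1)"), (1, "FO(x)"), (2, "W(y,1)"), (1, "SF")]
t2 = [(2, "W(y,1)"), (1, "W(x,1)")]   # keeps a prefix of each thread, reordered
print(is_per_thread_prefix(t2, t1))              # True
print(is_per_thread_prefix([(1, "FO(x)")], t1))  # False: skips W(x,1)
```

Note that the relation ignores how actions of different threads interleave, which is why a shorter and differently ordered trace can still qualify.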

3 THE Px86 PERSISTENT MEMORY SUBSYSTEM

In this section we present Px86, the persistent memory subsystem by Raad et al. [2020], which models the persistency semantics of the Intel-x86 architecture.

Remark 2. Following discussions with Intel engineers, Raad et al. [2020] introduced two models: Px86man and Px86sim. The first formalizes the (ambiguous and underspecified) reference manual specification [Intel 2019]. The latter simplifies and strengthens the first while capturing the “behavior intended by the Intel engineers”. The model studied here is Px86sim, which we simply call Px86.

Px86 is an extension of the standard TSO model [Owens et al. 2009] with another layer called the persistence buffer. This is a global buffer that contains writes that are pending to be persisted to the (non-volatile) memory, as well as certain markers governing the persistence order. Store buffers are extended to include not only store instructions but also flush and sfence instructions. Both the (per-thread) store buffers and the (global) persistence buffer are volatile.

Definition 3.1. A store buffer is a finite sequence b of event labels l with typ(l) ∈ {W, FL, FO, SF}. A store-buffer mapping is a function B assigning a store buffer to every τ ∈ Tid. We denote by B_ε the initial store-buffer mapping assigning the empty sequence to every τ ∈ Tid.

Definition 3.2. A persistence buffer is a finite sequence p of elements of the form W(x, v) or PER(x) (where x ∈ Loc and v ∈ Val).

Like the memory, the persistence buffer is accessible by all threads. When thread τ reads from a shared location x, it obtains its latest accessible value of x, which is defined using the following


Taming x86-TSO Persistency (Extended Version) 1:9

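$$m \in \mathsf{Loc} \to \mathsf{Val} \qquad p \in (\{\mathtt{W}(x,v) \mid x \in \mathsf{Loc}, v \in \mathsf{Val}\} \cup \{\mathtt{PER}(x) \mid x \in \mathsf{Loc}\})^*$$

$$B \in \mathsf{Tid} \to (\{\mathtt{W}(x,v) \mid x \in \mathsf{Loc}, v \in \mathsf{Val}\} \cup \{\mathtt{FL}(x) \mid x \in \mathsf{Loc}\} \cup \{\mathtt{FO}(x) \mid x \in \mathsf{Loc}\} \cup \{\mathtt{SF}\})^*$$

$$p_{\mathsf{Init}} \triangleq \varepsilon \qquad B_{\mathsf{Init}} \triangleq \lambda\tau.\ \varepsilon$$

$$\textsc{write/flush/flush-opt/sfence}\quad \dfrac{\mathsf{typ}(l) \in \{\mathtt{W}, \mathtt{FL}, \mathtt{FO}, \mathtt{SF}\} \qquad B' = B[\tau \mapsto B(\tau) \cdot l]}{\langle m, p, B\rangle \xrightarrow{\tau,l}_{\mathsf{Px86}} \langle m, p, B'\rangle}$$

$$\textsc{read}\quad \dfrac{l = \mathtt{R}(x,v) \qquad \mathsf{get}(m, p, B(\tau))(x) = v}{\langle m, p, B\rangle \xrightarrow{\tau,l}_{\mathsf{Px86}} \langle m, p, B\rangle}$$

$$\textsc{rmw}\quad \dfrac{l = \mathtt{RMW}(x, v_{\mathtt{R}}, v_{\mathtt{W}}) \qquad \mathsf{get}(m, p, \varepsilon)(x) = v_{\mathtt{R}} \qquad B(\tau) = \varepsilon \qquad p' = p \cdot \mathtt{W}(x, v_{\mathtt{W}})}{\langle m, p, B\rangle \xrightarrow{\tau,l}_{\mathsf{Px86}} \langle m, p', B\rangle}$$

$$\textsc{rmw-fail}\quad \dfrac{l = \mathtt{R\text{-}ex}(x,v) \qquad \mathsf{get}(m, p, \varepsilon)(x) = v \qquad B(\tau) = \varepsilon}{\langle m, p, B\rangle \xrightarrow{\tau,l}_{\mathsf{Px86}} \langle m, p, B\rangle}$$

$$\textsc{mfence}\quad \dfrac{l = \mathtt{MF} \qquad B(\tau) = \varepsilon}{\langle m, p, B\rangle \xrightarrow{\tau,l}_{\mathsf{Px86}} \langle m, p, B\rangle}$$

$$\textsc{prop-w}\quad \dfrac{B(\tau) = b_1 \cdot \mathtt{W}(x,v) \cdot b_2 \qquad \mathtt{W}(\_,\_), \mathtt{FL}(\_), \mathtt{SF} \notin b_1 \qquad B' = B[\tau \mapsto b_1 \cdot b_2] \qquad p' = p \cdot \mathtt{W}(x,v)}{\langle m, p, B\rangle \xrightarrow{\varepsilon}_{\mathsf{Px86}} \langle m, p', B'\rangle}$$

$$\textsc{prop-fl}\quad \dfrac{B(\tau) = b_1 \cdot \mathtt{FL}(x) \cdot b_2 \qquad \mathtt{W}(\_,\_), \mathtt{FL}(\_), \mathtt{FO}(x), \mathtt{SF} \notin b_1 \qquad B' = B[\tau \mapsto b_1 \cdot b_2] \qquad p' = p \cdot \mathtt{PER}(x)}{\langle m, p, B\rangle \xrightarrow{\varepsilon}_{\mathsf{Px86}} \langle m, p', B'\rangle}$$

$$\textsc{prop-fo}\quad \dfrac{B(\tau) = b_1 \cdot \mathtt{FO}(x) \cdot b_2 \qquad \mathtt{W}(x,\_), \mathtt{FL}(x), \mathtt{SF} \notin b_1 \qquad B' = B[\tau \mapsto b_1 \cdot b_2] \qquad p' = p \cdot \mathtt{PER}(x)}{\langle m, p, B\rangle \xrightarrow{\varepsilon}_{\mathsf{Px86}} \langle m, p', B'\rangle}$$

$$\textsc{prop-sf}\quad \dfrac{B(\tau) = \mathtt{SF} \cdot b \qquad B' = B[\tau \mapsto b]}{\langle m, p, B\rangle \xrightarrow{\varepsilon}_{\mathsf{Px86}} \langle m, p, B'\rangle}$$

$$\textsc{persist-w}\quad \dfrac{p = p_1 \cdot \mathtt{W}(x,v) \cdot p_2 \qquad \mathtt{W}(x,\_), \mathtt{PER}(\_) \notin p_1 \qquad p' = p_1 \cdot p_2 \qquad m' = m[x \mapsto v]}{\langle m, p, B\rangle \xrightarrow{\varepsilon}_{\mathsf{Px86}} \langle m', p', B\rangle}$$

$$\textsc{persist-per}\quad \dfrac{p = p_1 \cdot \mathtt{PER}(x) \cdot p_2 \qquad \mathtt{W}(x,\_), \mathtt{PER}(\_) \notin p_1 \qquad p' = p_1 \cdot p_2}{\langle m, p, B\rangle \xrightarrow{\varepsilon}_{\mathsf{Px86}} \langle m, p', B\rangle}$$

Fig. 1. The Px86 Persistent Memory Subsystem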

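get function applied on the current persistent memory m, persistence buffer p, and τ's store buffer b:

$$\mathsf{get}(m, p, b) \triangleq \lambda x.\ \begin{cases} v & b = b_1 \cdot \mathtt{W}(x,v) \cdot b_2 \wedge \mathtt{W}(x,\_) \notin b_2 \\ v & \mathtt{W}(x,\_) \notin b \wedge p = p_1 \cdot \mathtt{W}(x,v) \cdot p_2 \wedge \mathtt{W}(x,\_) \notin p_2 \\ m(x) & \text{otherwise} \end{cases}$$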
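The three cases of get can be read as a lookup that falls through from the most local to the most global storage. The sketch below is one possible rendering, assuming a hypothetical encoding in which buffers are Python lists and entries are tuples such as ("W", x, v), ("PER", x), ("FL", x), ("FO", x), and ("SF",); none of these names come from the paper.

```python
def get(m, p, b):
    """Return a function mapping each location to its latest accessible value."""
    def lookup(x):
        # Case 1: the most recent pending write to x in the thread's store buffer.
        for entry in reversed(b):
            if entry[0] == "W" and entry[1] == x:
                return entry[2]
        # Case 2: otherwise, the most recent write to x in the persistence buffer.
        for entry in reversed(p):
            if entry[0] == "W" and entry[1] == x:
                return entry[2]
        # Case 3: otherwise, the value stored in the non-volatile memory.
        return m[x]
    return lookup


m = {"x": 0, "y": 0}
p = [("W", "x", 1), ("PER", "x"), ("W", "x", 2)]
b = [("FO", "x"), ("W", "y", 5), ("SF",)]
print(get(m, p, b)("x"))   # 2: latest write to x in the persistence buffer
print(get(m, p, b)("y"))   # 5: pending write in the thread's own store buffer
print(get(m, p, [])("y"))  # 0: falls through to the non-volatile memory
```

Passing an empty store buffer, as in the last call, mirrors how the rmw and rmw-fail rules of Fig. 1 apply get with ε once the thread's store buffer has drained.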

Using these definitions, Px86 is presented in Fig. 1. Its set of volatile states, Px86.Q̃, consists of all pairs 〈p, B〉, where p is a persistence buffer and B is a store-buffer mapping. Initially, all buffers are empty (Px86.Q̃_Init = {〈ε, B_ε〉}).

The system's transitions are of three kinds: “issuing steps”, “propagation steps”, and “persistence steps”. Steps of the first kind are defined as in standard TSO semantics, with the only extension being the fact that flush, flush-optimal, and sfence instructions emit entries in the store buffer.

Propagation of writes from the store buffer (prop-w) both makes the writes visible to other threads and propagates them to the persistence buffer. Note that a write may propagate even


when flush-optimals precede it in the store buffer (which means that they were issued before the write by the thread). Propagation of flushes and flush-optimals (prop-fl and prop-fo) adds a “PER-marker” to the persistence buffer, which later restricts the order in which writes persist. The difference between the two kinds of flushes is reflected in the conditions on their propagation. In particular, a flush-optimal may propagate even when writes to different locations precede it in the store buffer (which means that they were issued before the flush-optimal by the thread). Propagation of sfences simply removes the sfence entry, which is only used to restrict the order of propagation of other entries, and is discarded once it reaches the head of the store buffer.

Finally, persisting a write moves a write entry from the persistence buffer to the non-volatile memory (persist-w). Writes to the same location persist in the same order in which they propagate. The PER-markers ensure that writes that propagated before some marker persist before writes that propagate after that marker. After the PER-markers play their role, they are discarded from the persistence buffer (persist-per).

We note that the step for (non-deterministic) system crashes is included in Def. 2.6 upon synchronizing the LTS of a program with the one of the Px86 memory subsystem. Without crashes, the effect of the persistence buffer is unobservable, and Px86 trivially coincides with the standard TSO semantics.

Example 3.3. Consider the following four sequential programs:

(A) ✓   x := 1 ; y := 1 ;

(B) ✗   x := 1 ; fl(x) ; y := 1 ;

(C) ✓   x := 1 ; fo(x) ; y := 1 ;

(D) ✗   x := 1 ; fo(x) ; sfence ; y := 1 ;

To refer to particular program behaviors, we use colored boxes to denote the last write that persisted for each location (inducing a possible content of the non-volatile memory in a run of the program). When some location lacks such an annotation (like x in the above examples), it means that none of its writes persisted, so that its value in the non-volatile memory is 0 (the initial value). In particular, the behaviors annotated above all have m ⊇ {x ↦ 0, y ↦ 1}. It is easy to verify that Px86 allows/forbids each of these behaviors as specified by the corresponding ✓/✗ marking. In particular, example (C) demonstrates that propagating a write before a prior flush-optimal is essential. Indeed, the annotated behavior is obtained by propagating y := 1 from the store buffer before fo(x) (but necessarily after x := 1). Otherwise, y := 1 cannot persist without x := 1 persisting before.
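The four judgments can be checked mechanically for these tiny programs by exhaustively exploring the rules of Fig. 1. The following is a minimal sketch under several simplifying assumptions: a single thread, programs made only of writes, flushes, flush-optimals, and sfences, and a crash allowed at any reachable state. The tuple encoding of instructions and the name `px86_crash_memories` are illustrative and not taken from the paper.

```python
def px86_crash_memories(prog, locs=("x", "y")):
    """Non-volatile memories a crash may leave behind (one-thread sketch of Fig. 1)."""
    init = (0, frozenset((x, 0) for x in locs), (), ())   # (pc, m, p, b)
    seen, todo = {init}, [init]
    while todo:
        pc, m, p, b = todo.pop()
        succ = []
        if pc < len(prog):                     # issuing step: append to b
            succ.append((pc + 1, m, p, b + (prog[pc],)))
        for i, e in enumerate(b):              # propagation steps
            b1 = b[:i]
            kinds = {f[0] for f in b1}
            if e[0] == "W":                    # prop-w
                ok, out = not kinds & {"W", "FL", "SF"}, (e,)
            elif e[0] == "FL":                 # prop-fl
                ok = not kinds & {"W", "FL", "SF"} and ("FO", e[1]) not in b1
                out = (("PER", e[1]),)
            elif e[0] == "FO":                 # prop-fo
                ok = "SF" not in kinds and not any(
                    f[0] in ("W", "FL") and f[1] == e[1] for f in b1)
                out = (("PER", e[1]),)
            else:                              # prop-sf: only from the head
                ok, out = i == 0, ()
            if ok:
                succ.append((pc, m, p + out, b1 + b[i + 1:]))
        for i, e in enumerate(p):              # persist-w and persist-per
            if any(f[0] == "PER" or f[1] == e[1] for f in p[:i]):
                continue                       # blocked by an earlier entry
            m2 = m
            if e[0] == "W":
                m2 = frozenset((x, e[2] if x == e[1] else v) for x, v in m)
            succ.append((pc, m2, p[:i] + p[i + 1:], b))
        for s in succ:
            if s not in seen:
                seen.add(s)
                todo.append(s)
    return {s[1] for s in seen}


EXAMPLES = {
    "A": [("W", "x", 1), ("W", "y", 1)],
    "B": [("W", "x", 1), ("FL", "x"), ("W", "y", 1)],
    "C": [("W", "x", 1), ("FO", "x"), ("W", "y", 1)],
    "D": [("W", "x", 1), ("FO", "x"), ("SF",), ("W", "y", 1)],
}
target = frozenset({("x", 0), ("y", 1)})       # y := 1 persisted, x := 1 did not
for name, prog in EXAMPLES.items():
    print(name, target in px86_crash_memories(prog))
```

Under these assumptions the search reports the target memory as reachable for (A) and (C) and unreachable for (B) and (D), in line with the ✓/✗ marks. For (C), the reachable run is the one described in the text: y := 1 leaves the store buffer before fo(x).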

Remark 3. To simplify the presentation, following Izraelevitz et al. [2016a], but unlike Raad et al. [2020], we conservatively assume that writes persist atomically at the location granularity (representing, e.g., machine words). Real machines provide granularity at the width of a cache line, and, assuming the programmer can faithfully control what locations are stored on the same cache line, may provide stronger guarantees. Nevertheless, adapting our results to support cache line granularity is straightforward.

Remark 4. Persistent systems make programs responsible for recovery from crashes: after a crash, programs restart with reinitialized program state and the volatile component of the memory state. In contrast, Raad et al. [2020] define their system assuming a separate recovery program called a recovery context, which after a crash atomically advances program state from the initial one. In our technical development, we prefer to make minimal assumptions about the recovery mechanism. Nevertheless, by adjusting crash transitions in Def. 2.6, our framework and results can be easily extended to support Raad et al. [2020]'s recovery context.


4 THE PTSOSYN PERSISTENT MEMORY SUBSYSTEM

In this section we present our alternative persistent memory subsystem, which we call PTSOsyn, that is observationally equivalent to Px86. We list the major differences between PTSOsyn and Px86:

• PTSOsyn has synchronous flush instructions: the propagation of a flush of location x from the store buffer blocks the execution until all writes to x that propagated earlier have persisted. We note that, as expected in a TSO-based model, flushes do not take their synchronous effect when they are issued by the thread, but rather have a delayed globally visible effect happening when they propagate from the store buffer.

• PTSOsyn has synchronous sfence instructions: the propagation of an sfence from the store buffer blocks the execution until all flush-optimals of the same thread that propagated earlier have taken their effect. The latter means that all writes to the location of the flush-optimal that propagated before the flush-optimal have persisted. Thus, flush-optimals serve as markers in the persistence buffer that are only meaningful when an sfence (issued by the same thread that issued the flush-optimal) propagates from the store buffer. As for flushes, the effect of an sfence is not at its issue time but at its propagation time. We note that mfence and RMW operations (both when they fail and when they succeed) induce an implicit sfence.

• Rather than a global persistence buffer, PTSOsyn employs per-location persistence buffers, directly reflecting the fact that the persistence order has to agree with the propagation order only between writes to the same location, while writes to different locations may persist out of order.

• The store buffers of PTSOsyn are “almost” FIFO buffers. With the exception of flush-optimals, entries may propagate from the store buffer only when they reach the head of the buffer. Flush-optimals may still “overtake” writes as well as flushes/flush-optimals of a different location. Example 4.3 below demonstrates why we need to allow the latter (there is a certain design choice here, see Remark 5).

To formally present PTSOsyn, we first define per-location persistence buffers and per-location-persistence-buffer mappings.

Definition 4.1. A per-location persistence buffer is a finite sequence p of elements of the form W(v) or FO(τ) (where v ∈ Val and τ ∈ Tid). A per-location-persistence-buffer mapping is a function P assigning a per-location persistence buffer to every x ∈ Loc. We denote by P_ε the initial per-location-persistence-buffer mapping assigning the empty sequence to every x ∈ Loc.

Flush instructions under PTSOsyn take effect upon their propagation, so, unlike in Px86, they do not add PER-markers into the persistence buffers. For flush-optimals, instead of PER-markers, we use (per-location) FO(τ) markers, where τ is the identifier of the thread that issued the instruction. In accordance with how Px86's sfence only blocks the propagation of the same thread's flush-optimals, the synchronous behavior of sfence must not wait for flush-optimals by different threads (see Ex. 4.4 below).

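The (overloaded) get function is updated in the obvious way:

$$\mathsf{get}(m, p, b) \triangleq \lambda x.\ \begin{cases} v & b = b_1 \cdot \mathtt{W}(x,v) \cdot b_2 \wedge \mathtt{W}(x,\_) \notin b_2 \\ v & \mathtt{W}(x,\_) \notin b \wedge p = p_1 \cdot \mathtt{W}(v) \cdot p_2 \wedge \mathtt{W}(\_) \notin p_2 \\ m(x) & \text{otherwise} \end{cases}$$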

For looking up a value for location x by thread τ, we apply get with m being the current non-volatile memory, p being x's persistence buffer, and b being τ's store buffer.

Using these definitions, PTSOsyn is presented in Fig. 2. Its set of volatile states, PTSOsyn.Q̃, consists of all pairs 〈P, B〉, where P is a per-location-persistence-buffer mapping and B is a store-buffer mapping. Initially, all buffers are empty (PTSOsyn.Q̃_Init = {〈P_ε, B_ε〉}).


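$$m \in \mathsf{Loc} \to \mathsf{Val} \qquad P \in \mathsf{Loc} \to (\{\mathtt{W}(v) \mid v \in \mathsf{Val}\} \cup \{\mathtt{FO}(\tau) \mid \tau \in \mathsf{Tid}\})^*$$

$$B \in \mathsf{Tid} \to (\{\mathtt{W}(x,v) \mid x \in \mathsf{Loc}, v \in \mathsf{Val}\} \cup \{\mathtt{FL}(x) \mid x \in \mathsf{Loc}\} \cup \{\mathtt{FO}(x) \mid x \in \mathsf{Loc}\} \cup \{\mathtt{SF}\})^*$$

$$P_{\mathsf{Init}} \triangleq \lambda x.\ \varepsilon \qquad B_{\mathsf{Init}} \triangleq \lambda\tau.\ \varepsilon$$

$$\textsc{write/flush/flush-opt/sfence}\quad \dfrac{\mathsf{typ}(l) \in \{\mathtt{W}, \mathtt{FL}, \mathtt{FO}, \mathtt{SF}\} \qquad B' = B[\tau \mapsto B(\tau) \cdot l]}{\langle m, P, B\rangle \xrightarrow{\tau,l}_{\mathsf{PTSO_{syn}}} \langle m, P, B'\rangle}$$

$$\textsc{read}\quad \dfrac{l = \mathtt{R}(x,v) \qquad \mathsf{get}(m, P(x), B(\tau))(x) = v}{\langle m, P, B\rangle \xrightarrow{\tau,l}_{\mathsf{PTSO_{syn}}} \langle m, P, B\rangle}$$

$$\textsc{rmw}\quad \dfrac{l = \mathtt{RMW}(x, v_{\mathtt{R}}, v_{\mathtt{W}}) \qquad \mathsf{get}(m, P(x), \varepsilon)(x) = v_{\mathtt{R}} \qquad B(\tau) = \varepsilon \qquad \forall y.\ \mathtt{FO}(\tau) \notin P(y) \qquad P' = P[x \mapsto P(x) \cdot \mathtt{W}(v_{\mathtt{W}})]}{\langle m, P, B\rangle \xrightarrow{\tau,l}_{\mathsf{PTSO_{syn}}} \langle m, P', B\rangle}$$

$$\textsc{rmw-fail}\quad \dfrac{l = \mathtt{R\text{-}ex}(x,v) \qquad \mathsf{get}(m, P(x), \varepsilon)(x) = v \qquad B(\tau) = \varepsilon \qquad \forall y.\ \mathtt{FO}(\tau) \notin P(y)}{\langle m, P, B\rangle \xrightarrow{\tau,l}_{\mathsf{PTSO_{syn}}} \langle m, P, B\rangle}$$

$$\textsc{mfence}\quad \dfrac{l = \mathtt{MF} \qquad B(\tau) = \varepsilon \qquad \forall y.\ \mathtt{FO}(\tau) \notin P(y)}{\langle m, P, B\rangle \xrightarrow{\tau,l}_{\mathsf{PTSO_{syn}}} \langle m, P, B\rangle}$$

$$\textsc{prop-w}\quad \dfrac{B(\tau) = \mathtt{W}(x,v) \cdot b \qquad B' = B[\tau \mapsto b] \qquad P' = P[x \mapsto P(x) \cdot \mathtt{W}(v)]}{\langle m, P, B\rangle \xrightarrow{\varepsilon}_{\mathsf{PTSO_{syn}}} \langle m, P', B'\rangle}$$

$$\textsc{prop-fl}\quad \dfrac{B(\tau) = \mathtt{FL}(x) \cdot b \qquad B' = B[\tau \mapsto b] \qquad P(x) = \varepsilon}{\langle m, P, B\rangle \xrightarrow{\varepsilon}_{\mathsf{PTSO_{syn}}} \langle m, P, B'\rangle}$$

$$\textsc{prop-fo}\quad \dfrac{B(\tau) = b_1 \cdot \mathtt{FO}(x) \cdot b_2 \qquad \mathtt{W}(x,\_), \mathtt{FL}(x), \mathtt{FO}(x), \mathtt{SF} \notin b_1 \qquad B' = B[\tau \mapsto b_1 \cdot b_2] \qquad P' = P[x \mapsto P(x) \cdot \mathtt{FO}(\tau)]}{\langle m, P, B\rangle \xrightarrow{\varepsilon}_{\mathsf{PTSO_{syn}}} \langle m, P', B'\rangle}$$

$$\textsc{prop-sf}\quad \dfrac{B(\tau) = \mathtt{SF} \cdot b \qquad B' = B[\tau \mapsto b] \qquad \forall y.\ \mathtt{FO}(\tau) \notin P(y)}{\langle m, P, B\rangle \xrightarrow{\varepsilon}_{\mathsf{PTSO_{syn}}} \langle m, P, B'\rangle}$$

$$\textsc{persist-w}\quad \dfrac{P(x) = \mathtt{W}(v) \cdot p \qquad P' = P[x \mapsto p] \qquad m' = m[x \mapsto v]}{\langle m, P, B\rangle \xrightarrow{\varepsilon}_{\mathsf{PTSO_{syn}}} \langle m', P', B\rangle}$$

$$\textsc{persist-fo}\quad \dfrac{P(x) = \mathtt{FO}(\_) \cdot p \qquad P' = P[x \mapsto p]}{\langle m, P, B\rangle \xrightarrow{\varepsilon}_{\mathsf{PTSO_{syn}}} \langle m, P', B\rangle}$$

Fig. 2. The PTSOsyn Persistent Memory Subsystem (differences w.r.t. Px86 are highlighted)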

The differences of PTSOsyn w.r.t. Px86 are highlighted in Fig. 2. First, the prop-fl transition only occurs when P(x) = ε, to ensure that all previously propagated writes have persisted. Second, the prop-sf transition (as well as rmw, rmw-fail, and mfence) only occurs when ∀y. FO(τ) ∉ P(y) holds, to ensure that propagation of each sfence blocks until previous flush-optimals of the same thread have completed. Third, the persist-w and persist-fo transitions persist the entries from the per-location persistence buffers in order. Finally, the prop-w and prop-fl transitions propagate entries from the head of a store buffer, so only prop-fo transitions may not use the store buffers as perfect FIFO queues.

Example 4.2. It is instructive to refer back to the simple programs in Ex. 3.3 and see how the same judgments are obtained for PTSOsyn, albeit in a different way. In particular, in these examples the propagation order must follow the issue order. Then, the behavior of program (C) is not explained by out-of-order propagation, but rather by using the fact that x := 1 and y := 1 are propagated to different persistence buffers, and thus can persist in an order opposite to their propagation order.
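To see the same judgments emerge from the rules of Fig. 2, one can explore PTSOsyn exhaustively on the four programs of Ex. 3.3. The sketch below makes the same simplifying assumptions as before (one thread, no reads or RMWs, a crash allowed in any reachable state); because there is a single thread, FO markers carry no thread identifier. The encoding and the name `ptsosyn_crash_memories` are illustrative assumptions.

```python
def ptsosyn_crash_memories(prog, locs=("x", "y")):
    """Non-volatile memories a crash may leave behind (one-thread sketch of Fig. 2)."""
    idx = {x: i for i, x in enumerate(locs)}

    def put(tup, i, val):
        return tup[:i] + (val,) + tup[i + 1:]

    init = (0, (0,) * len(locs), ((),) * len(locs), ())   # (pc, m, P, b)
    seen, todo = {init}, [init]
    while todo:
        pc, m, P, b = todo.pop()
        succ = []
        if pc < len(prog):                       # issuing step: append to b
            succ.append((pc + 1, m, P, b + (prog[pc],)))
        if b:                                    # head-only propagation
            e, rest = b[0], b[1:]
            if e[0] == "W":                      # prop-w
                i = idx[e[1]]
                succ.append((pc, m, put(P, i, P[i] + (("W", e[2]),)), rest))
            elif e[0] == "FL" and not P[idx[e[1]]]:          # prop-fl
                succ.append((pc, m, P, rest))
            elif e[0] == "SF" and not any(("FO",) in buf for buf in P):
                succ.append((pc, m, P, rest))    # prop-sf
        for k, e in enumerate(b):                # prop-fo may overtake
            if e[0] != "FO" or any(
                    f[0] == "SF" or f[1] == e[1] for f in b[:k]):
                continue
            i = idx[e[1]]
            succ.append((pc, m, put(P, i, P[i] + (("FO",),)), b[:k] + b[k + 1:]))
        for i, buf in enumerate(P):              # in-order persistence
            if buf:
                m2 = put(m, i, buf[0][1]) if buf[0][0] == "W" else m
                succ.append((pc, m2, put(P, i, buf[1:]), b))
        for s in succ:
            if s not in seen:
                seen.add(s)
                todo.append(s)
    return {s[1] for s in seen}                  # memories as tuples over locs


PROGRAMS = {
    "A": [("W", "x", 1), ("W", "y", 1)],
    "B": [("W", "x", 1), ("FL", "x"), ("W", "y", 1)],
    "C": [("W", "x", 1), ("FO", "x"), ("W", "y", 1)],
    "D": [("W", "x", 1), ("FO", "x"), ("SF",), ("W", "y", 1)],
}
for name, prog in PROGRAMS.items():
    print(name, (0, 1) in ptsosyn_crash_memories(prog))   # x = 0, y = 1
```

In this sketch program (C) reaches the memory with x = 0 and y = 1 even though fo(x) propagates before y := 1: the two writes sit in different per-location buffers, so y's buffer can drain first.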


Example 4.3. As mentioned above, while PTSOsyn forbids propagating writes/flushes/sfences before propagating prior entries, this is still not the case for flush-optimals, which can propagate before prior writes/flushes/flush-optimals. The two-thread program below demonstrates such a case. The annotated outcome is allowed in Px86 (and thus has to be allowed in PTSOsyn). The fact that y := 3 persisted implies that y := 2 propagated after y := 1. Now, since writes propagate in order, we obtain that y := 2 propagated after x := 1. Had we required that fo(x) must propagate after y := 2, we would obtain that fo(x) must propagate after x := 1. In turn, due to the sfence instruction, this would forbid z := 1 from persisting before x := 1 has persisted.

x := 1 ;                  ‖  y := 2 ;
y := 1 ;                  ‖  fo(x) ;
if y = 2 then y := 3 ;    ‖  sfence ;
                          ‖  z := 1 ;

Remark 5. There is an alternative formulation for PTSOsyn that always propagates flush-optimals from the head of the store buffer. This simplification comes at the expense of complicating how flush-optimals are added into the store buffer upon issuing. Concretely, we can have a flush-opt step that does not put the new FO(x) entry in the tail of the store buffer (omit FO(x) from the write/flush/flush-opt/sfence issuing step). Instead, the step looks inside the buffer and puts the FO(x)-entry immediately after the last pending entry α with loc(α) = x or typ(α) = SF (or at the head of the buffer if no such entry exists):

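$$\textsc{flush-opt1}\quad \dfrac{l = \mathtt{FO}(x) \qquad B(\tau) = b_{\mathsf{head}} \cdot \alpha \cdot b_{\mathsf{tail}} \qquad \mathsf{loc}(\alpha) = x \vee \alpha = \mathtt{SF} \qquad \mathtt{W}(x,\_), \mathtt{FL}(x), \mathtt{FO}(x), \mathtt{SF} \notin b_{\mathsf{tail}} \qquad B' = B[\tau \mapsto b_{\mathsf{head}} \cdot \alpha \cdot l \cdot b_{\mathsf{tail}}]}{\langle m, P, B\rangle \xrightarrow{\tau,l}_{\mathsf{PTSO_{syn}}} \langle m, P, B'\rangle}$$

$$\textsc{flush-opt2}\quad \dfrac{l = \mathtt{FO}(x) \qquad \mathtt{W}(x,\_), \mathtt{FL}(x), \mathtt{FO}(x), \mathtt{SF} \notin B(\tau) \qquad B' = B[\tau \mapsto l \cdot B(\tau)]}{\langle m, P, B\rangle \xrightarrow{\tau,l}_{\mathsf{PTSO_{syn}}} \langle m, P, B'\rangle}$$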

This alternative reduces the level of non-determinism in the system. Roughly speaking, it is equivalent to eagerly taking prop-fo-steps, which is sound, since delaying a prop-fo-step may only put more constraints on the rest of the run. We suspect that insertions not in the tail of the buffer (even if done in deterministic positions) may appear slightly less intuitive than eliminations not from the head of the buffer, and so we continue with PTSOsyn as formulated in Fig. 2.
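The insertion position chosen by the alternative issuing step can be sketched as a short function. It assumes a store buffer encoded as a Python list whose index 0 is the head (the oldest entry), with entries as tuples ("W", x, v), ("FL", x), ("FO", x), or ("SF",); the function name `issue_flush_opt` is hypothetical.

```python
def issue_flush_opt(buf, x):
    """Insert FO(x) right after the last entry on x or the last SF, else at the head."""
    for i in range(len(buf) - 1, -1, -1):      # scan from the tail towards the head
        e = buf[i]
        if e[0] == "SF" or e[1] == x:          # flush-opt1: entry the FO must not overtake
            return buf[:i + 1] + [("FO", x)] + buf[i + 1:]
    return [("FO", x)] + buf                   # flush-opt2: nothing blocks, go to the head


# Second thread of Ex. 4.3 after issuing y := 2: fo(x) jumps ahead of the write to y.
print(issue_flush_opt([("W", "y", 2)], "x"))
# A pending write to x pins the flush-optimal right behind it.
print(issue_flush_opt([("W", "x", 1), ("W", "y", 1)], "x"))
```

The first call yields a buffer in which FO(x) sits at the head, which corresponds to the early propagation of fo(x) that Example 4.3 relies on.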

Example 4.4. An sfence (or an sfence-inducing operation: mfence and RMW) performed by one thread does not affect flush-optimals by other threads. To achieve this, PTSOsyn records thread identifiers in FO-entries in the persistence buffer. (In Px86, this is captured by the fact that sfence only affects the propagation order from the (per-thread) store buffers.)

The two-thread program below demonstrates how this works. The annotated behavior is allowed by PTSOsyn: the flush-optimal entry in x's persistence buffer has to be in that buffer at the point the sfence is issued (since the second thread has already observed y := 1). But, since it is an sfence coming from the store buffer of the second thread, and the flush-optimal entry is by the first thread, the sfence has no effect in this case.

x := 1 ;    ‖  a := y ; //1
fo(x) ;     ‖  sfence ;
y := 1 ;    ‖  if a = 1 then z := 1 ;

The next lemma (used to prove Thm. 5.29 below) ensures that we can safely assume that crashes only happen when all store buffers are empty (i.e., ending with B_ε ≜ λτ. ε). (Clearly, such an assumption is wrong for the persistence buffers.) Intuitively, it follows from the fact that we can always remove from a trace all thread operations starting from the first write/flush/sfence operation that did not propagate from the store buffer before the crash. These can only affect the volatile part of the state.


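Lemma 4.5. Suppose that $\langle m_0, P_\varepsilon, B_\varepsilon\rangle \xRightarrow{t}_{\mathsf{PTSO_{syn}}} \langle m, P, B\rangle$. Then:

• $\langle m_0, P_\varepsilon, B_\varepsilon\rangle \xRightarrow{t}_{\mathsf{PTSO_{syn}}} \langle m', P', B_\varepsilon\rangle$ for some m′ and P′.

• $\langle m_0, P_\varepsilon, B_\varepsilon\rangle \xRightarrow{t'}_{\mathsf{PTSO_{syn}}} \langle m, P, B_\varepsilon\rangle$ for some t′ ≲ t.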

4.1 Observational Equivalence of Px86 and PTSOsyn

Our first main result is stated in the following theorem.

Theorem 4.6. Px86 and PTSOsyn are observationally equivalent.

We briefly outline the key steps in the proof of this theorem. The full proof presented in Appendix B formalizes the following ideas by using instrumented memory subsystems and employing two different intermediate systems that bridge the gap between Px86 and PTSOsyn.

We utilize Lemma 2.10, which splits the task of proving Theorem 4.6 into four parts:

(A) Every m₀-initialized PTSOsyn-observable-trace is also an m₀-initialized Px86-observable-trace.

(B) For every m₀-to-m PTSOsyn-observable-trace t, some t′ ≲ t is an m₀-to-m Px86-observable-trace.

(C) Every m₀-initialized Px86-observable-trace is also an m₀-initialized PTSOsyn-observable-trace.

(D) For every m₀-to-m Px86-observable-trace t, some t′ ≲ t is an m₀-to-m PTSOsyn-observable-trace.

Part (A) requires showing that Px86 allows the same observable behaviors as PTSOsyn regardless of the final memory. This part is straightforward: we perform silent persist-w and persist-fo steps at the end of the PTSOsyn run to completely drain the persistence buffers, and then move all the persistence steps to be immediately after the corresponding propagation steps. It is then easy to demonstrate that Px86 can simulate such a sequence of steps.

Part (B) requires showing that Px86 can survive crashes with the same non-volatile state as PTSOsyn. We note that this cannot always be achieved by executing the exact same sequence of steps under PTSOsyn and Px86. Example 3.3(C) illustrates a case in point: if PTSOsyn propagates all of the instructions and only persists the write y := 1, then to achieve the same result, Px86 needs to propagate y := 1 ahead of propagating fo(x) (otherwise, the persist-w step for y := 1 would require persisting fo(x) first, resulting in a non-volatile state different from PTSOsyn's). Our proof strategy for part (B) is to reach the same non-volatile memory by omitting all propagation steps of non-persisting flush-optimals from the run. We prove that this results in a trace that can be transformed into a Px86-observable-trace.

Part (C) requires showing that PTSOsyn allows the same observable behaviors as Px86 regardless of the final memory. In order to satisfy stronger constraints on the content of the persistence buffers upon the propagation steps of PTSOsyn, we employ a transformation like the one from part (A) and obtain a trace of Px86 in which every persisted instruction is persisted immediately after it is propagated. Unlike part (A), it is not trivial that PTSOsyn can simulate such a trace, due to its stricter constraints on the propagation from the store buffers. We overcome this challenge by eagerly propagating and persisting flush-optimals as we construct an equivalent run of PTSOsyn (as a part of a forward simulation argument).

Part (D) requires showing that PTSOsyn can survive crashes with the same non-volatile state as Px86. This cannot always be achieved by executing the exact same sequence of steps under Px86 and PTSOsyn, since they do not lead to the same non-volatile states: the synchronous semantics of flush, sfence, mfence, and RMW instructions under PTSOsyn makes instructions persist earlier. However, the program state is lost after the crash, so at that point the client cannot observe outcomes of instructions that did not persist. Therefore, crashing before a flush/flush-optimal instruction persists is observationally indistinguishable from crashing before it propagates from the


store buffer. These intuitions allow us to reach the non-volatile memory in PTSOsyn with a per-thread prefix of the program trace that reached that memory in Px86. More concretely, we trim the sequence of steps of Px86 to a per-thread prefix in order to remove all propagation steps of non-persisting flush/flush-optimal instructions, and then move the persistence steps of the persisting instructions to be immediately after their propagation, which is made possible by certain commutativity properties of persistence steps. This way, we essentially obtain a PTSOsyn-observable-trace, which, as in part (C), formally requires the eager propagation and persistence of flush-optimals.

5 DECLARATIVE SEMANTICS

In this section we provide an alternative characterization of PTSOsyn (and, due to the equivalence theorem, also of Px86) that is declarative (a.k.a. axiomatic) rather than operational. In such semantics, instead of considering machine traces that are totally ordered by definition, one aims to abstract from arbitrary choices of the order of operations, and maintain such order only when it is necessary to do so. Accordingly, behaviors of concurrent systems are represented as partial orders rather than total ones. This more abstract approach, while possibly less intuitive to work with, often leads to much more succinct presentations, and has been shown to be beneficial for comparing models and mapping from one model to another (see, e.g., [Podkopaev et al. 2019; Sarkar et al. 2012; Wickerson et al. 2017]), reasoning about sound program transformations (see, e.g., [Vafeiadis et al. 2015]), and bounded model checking (see, e.g., [Abdulla et al. 2018; Kokologiannakis et al. 2017]). In the current paper, the declarative semantics is instrumental for establishing the DRF and mapping theorems in §7.

We present two different declarative models of PTSOsyn. Roughly speaking, the first, called DPTSOsyn, is an extension of the declarative TSO model in [Lahav et al. 2016], and it is closer to the operational semantics as it tracks the propagation order. The second, called DPTSOsyn^mo, is an extension of the declarative TSO model in [Alglave et al. 2014] that employs per-location propagation orders on writes only, but ignores some of the program order edges.

5.1 A Declarative Framework for Persistency Specifications

Before introducing the declarative models, we present the general notions used to assign declarative semantics to persistent systems (see Def. 2.6). This requires several modifications of the standard declarative approach, which does not handle persistency. First, we define execution graphs, each of which represents a particular behavior. We start with their nodes, called events.

Definition 5.1. An event is a triple e = 〈τ, n, l〉, where τ ∈ Tid ∪ {⊥} is a thread identifier (⊥ is used for initialization events), n ∈ N is a serial number, and l ∈ Lab is an event label (as defined in Def. 2.1). The functions tid, #, and lab return the thread identifier, serial number, and label of an event. The functions typ, loc, val_R, and val_W are lifted to events in the obvious way. We denote by E the set of all events, and by Init the set of initialization events, i.e., Init ≜ {e ∈ E | tid(e) = ⊥}. We use W, R, RMW, R-ex, MF, FL, FO, and SF for the sets of all events of the respective type (e.g., R ≜ {e ∈ E | typ(e) = R}). Sub/superscripts are used to restrict these sets to a certain location (e.g., W_x = {w ∈ W | loc(w) = x}) and/or thread identifier (e.g., E^τ = {e ∈ E | tid(e) = τ}).

Our representation of events induces a sequenced-before partial order on events, where e₁ < e₂ holds iff (e₁ ∈ Init and e₂ ∉ Init) or (e₁, e₂ ∉ Init, tid(e₁) = tid(e₂), and #(e₁) < #(e₂)). That is, initialization events precede all non-initialization events, and events of the same thread are ordered according to their serial numbers.

Next, a (standard) mapping justifies every read with a corresponding write event:

Definition 5.2. A relation rf is a reads-from relation for a set E of events if the following hold:


• rf ⊆ (E ∩ (W ∪ RMW)) × (E ∩ (R ∪ RMW ∪ R-ex)).

• If 〈w, r〉 ∈ rf, then loc(w) = loc(r) and val_W(w) = val_R(r).

• If 〈w₁, r〉, 〈w₂, r〉 ∈ rf, then w₁ = w₂ (that is, rf⁻¹ is functional).

• ∀r ∈ E ∩ (R ∪ RMW ∪ R-ex). ∃w. 〈w, r〉 ∈ rf (each read event reads from some write event).
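The four conditions translate directly into a checker. The sketch below assumes a hypothetical flat encoding of events as named tuples with fields for thread, serial number, type, location, read value, and written value (with None standing for ⊥ or for a missing value); the paper itself keeps type, location, and values inside the event label.

```python
from collections import namedtuple

# Hypothetical flat event encoding; tid=None plays the role of ⊥.
Event = namedtuple("Event", "tid serial typ loc val_r val_w")

WRITES = {"W", "RMW"}
READS = {"R", "RMW", "R-ex"}


def is_reads_from(rf, events):
    """Check the four conditions of Def. 5.2 for a set rf of (write, read) pairs."""
    readers = [r for (_, r) in rf]
    well_typed = all(w in events and r in events
                     and w.typ in WRITES and r.typ in READS for w, r in rf)
    matching = all(w.loc == r.loc and w.val_w == r.val_r for w, r in rf)
    functional = len(readers) == len(set(readers))      # one write per read
    total = all(r in readers for r in events if r.typ in READS)
    return well_typed and matching and functional and total


init = Event(None, 0, "W", "x", None, 0)
w = Event(1, 1, "W", "x", None, 1)
r = Event(2, 1, "R", "x", 1, None)
events = {init, w, r}
print(is_reads_from({(w, r)}, events))      # True
print(is_reads_from({(init, r)}, events))   # False: value 0 does not match read of 1
print(is_reads_from(set(), events))         # False: r is not justified
```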

The “non-volatile outcome” of an execution graph is recorded in memory assignments:

Definition 5.3. A memory assignment μ for a set E of events is a function assigning an event in E ∩ (W_x ∪ RMW_x) to every location x ∈ Loc.

Intuitively speaking, μ records, for each location, the last write in the graph that persisted before the crash. Using the above notions, we formally define execution graphs.

Definition 5.4. An execution graph is a tuple G = 〈E, rf, μ〉, where E is a finite set of events, rf is a reads-from relation for E, and μ is a memory assignment for E. The components of G are denoted by G.E, G.rf, and G.M. For a set A ⊆ E of events, we write G.A for G.E ∩ A (e.g., G.W_x = G.E ∩ W_x). In addition, derived relations and functions are defined as follows:

$$G.\mathtt{po} \triangleq \{\langle e_1, e_2\rangle \in G.\mathtt{E} \times G.\mathtt{E} \mid e_1 < e_2\} \quad \text{(program order)}$$

$$G.\mathtt{rfe} \triangleq G.\mathtt{rf} \setminus G.\mathtt{po} \quad \text{(external reads-from)}$$

$$m(G) \triangleq \lambda x.\ \mathsf{val_W}(G.\mathtt{M}(x)) \quad \text{(induced persistent memory)}$$
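The derived components can be computed directly from the events, the reads-from relation, and the memory assignment. The sketch below reuses a hypothetical flat event encoding (tid=None stands for ⊥) and represents relations as Python sets of pairs; the names `sequenced_before` and `derived` are illustrative.

```python
from collections import namedtuple

# Hypothetical flat event encoding; tid=None plays the role of ⊥.
Event = namedtuple("Event", "tid serial typ loc val_r val_w")


def sequenced_before(e1, e2):
    """Initialization events first, then same-thread events by serial number."""
    if e1.tid is None or e2.tid is None:
        return e1.tid is None and e2.tid is not None
    return e1.tid == e2.tid and e1.serial < e2.serial


def derived(events, rf, mu):
    """Return (po, rfe, induced memory) for a graph given as (events, rf, mu)."""
    po = {(a, b) for a in events for b in events if sequenced_before(a, b)}
    rfe = rf - po                                   # reads-from across threads
    mem = {x: w.val_w for x, w in mu.items()}       # value of the persisted write
    return po, rfe, mem


init = Event(None, 0, "W", "x", None, 0)
w = Event(1, 1, "W", "x", None, 1)
r_int = Event(1, 2, "R", "x", 1, None)    # same thread as w
r_ext = Event(2, 1, "R", "x", 1, None)    # different thread
po, rfe, mem = derived({init, w, r_int, r_ext},
                       {(w, r_int), (w, r_ext)}, {"x": w})
print(len(po), rfe == {(w, r_ext)}, mem)
```

Here the read by the writing thread is removed from rfe because it is already ordered by po, while the read by the other thread remains external.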

Our execution graphs are always initialized with some initial memory:

Definition 5.5. Given m : Loc → Val, an execution graph G is m-initialized if G.E ∩ Init = {〈⊥, 0, W(x, m(x))〉 | x ∈ Loc}. We say that G is initialized if it is m-initialized for some m : Loc → Val. We denote by m_Init(G) the (unique) function m for which G is m-initialized.

A declarative characterization of a persistent memory subsystem is captured by the set of execution graphs that the subsystem allows. Intuitively speaking, the conditions it enforces on G.rf correspond to the consistency aspect of the memory subsystem, and those on G.M correspond to its persistency aspect.

Definition 5.6. A declarative persistency model is a set D of execution graphs. We refer to the elements of D as D-consistent execution graphs.

Now, to use a declarative persistency model for specifying the possible behaviors of programs (namely, what program states are reachable under a given model D), we need to formally associate execution graphs with programs. The next definition uses the characterization of programs as LTSs to provide this association. (Note that at this stage G.rf and G.M are completely arbitrary.)

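Notation 5.7. For a set E of events, thread identifier τ ∈ Tid, and event label l ∈ Lab, NextEvent(E, τ, l) denotes the event given by $\langle \tau, \max\{\#(e) \mid e \in E.\mathsf{E}^{\tau}\} + 1, l\rangle$ (where $E.\mathsf{E}^{\tau}$ consists of the events of E that belong to thread τ).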

Definition 5.8. An execution graph G is generated by a program Pr with final state q if 〈q_Init, E₀〉 →* 〈q, G.E〉 for some q_Init ∈ Pr.Q_Init and E₀ ⊆ Init, where → is defined by:

$$\dfrac{q \xrightarrow{\tau,l}_{Pr} q'}{\langle q, E\rangle \to \langle q', E \cup \{\mathsf{NextEvent}(E, \tau, l)\}\rangle} \qquad\qquad \dfrac{q \xrightarrow{\varepsilon}_{Pr} q'}{\langle q, E\rangle \to \langle q', E\rangle}$$

We say that G is generated by Pr if it is generated by Pr with some final state.

The following alternative characterization of the association of graphs and programs, based on traces, is useful below.


Definition 5.9. An observable program trace t ∈ (Tid × Lab)* is induced by an execution graph G if t = 〈tid(e₁), lab(e₁)〉, ... , 〈tid(eₙ), lab(eₙ)〉 for some enumeration e₁, ... , eₙ of G.E \ Init that respects G.po (i.e., 〈e_i, e_j〉 ∈ G.po implies that i < j). We denote by traces(G) the set of all observable program traces that are induced by G.
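Since program order relates only events of the same thread (and initialization events, which are excluded from traces), the induced traces are exactly the interleavings of the per-thread event sequences. The sketch below enumerates them under a hypothetical event encoding with a thread identifier, a serial number, and a label string (tid=None stands for ⊥).

```python
from collections import namedtuple

# Hypothetical event encoding; tid=None plays the role of ⊥.
Event = namedtuple("Event", "tid serial label")


def traces(events):
    """All traces induced by a graph: interleavings of its per-thread sequences."""
    threads = {}
    non_init = (e for e in events if e.tid is not None)
    for e in sorted(non_init, key=lambda e: e.serial):
        threads.setdefault(e.tid, []).append((e.tid, e.label))

    def interleave(seqs):
        if not any(seqs):            # every thread is exhausted
            yield ()
            return
        for i, s in enumerate(seqs):
            if s:                    # let thread i take its next step
                for rest in interleave(seqs[:i] + [s[1:]] + seqs[i + 1:]):
                    yield (s[0],) + rest

    return set(interleave(list(threads.values())))


events = {
    Event(None, 0, "W(x,0)"),
    Event(1, 1, "W(x,1)"), Event(1, 2, "FO(x)"),
    Event(2, 1, "R(x,1)"),
}
for t in sorted(traces(events)):
    print(t)
```

For this graph the enumeration contains three traces, one for each position of the second thread's read relative to the two events of the first thread; all of them agree on every per-thread projection, which is the content of Proposition 5.10.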

Proposition 5.10. Let t ∈ traces(G). Then, traces(G) = {t′ ∈ (Tid × Lab)* | t′ ∼ t} (where ∼ is per-thread equivalence of observable program traces, see Def. 2.3).

Proposition 5.11. If G is generated by Pr with final state q, then for every t ∈ traces(G), we have $q_{\mathsf{Init}} \xRightarrow{t}_{Pr} q$ for some q_Init ∈ Pr.Q_Init.

Proposition 5.12. If $q_{\mathsf{Init}} \xRightarrow{t}_{Pr} q$ for some q_Init ∈ Pr.Q_Init and t ∈ traces(G), then G is generated by Pr with final state q.

Now, following [Raad et al. 2020], reachability of program states under a declarative persistency model D is defined using “chains” of D-consistent execution graphs, each of which represents the behavior obtained between two consecutive crashes. Examples 5.21 and 5.22 below illustrate some execution graph chains for simple programs.

Definition 5.13. A program state $q \in Pr.\mathsf{Q}$ is reachable under a declarative persistency model $D$ if there exist $D$-consistent execution graphs $G_0, \ldots, G_n$ such that:

• For every $0 \leq i \leq n - 1$, $G_i$ is generated by $Pr$.
• $G_n$ is generated by $Pr$ with final state $q$.
• $G_0$ is $m_{\mathsf{Init}}$-initialized (where $m_{\mathsf{Init}} = \lambda x \in \mathsf{Loc}.\ 0$).
• For every $1 \leq i \leq n$, $G_i$ is $m(G_{i-1})$-initialized.

In the sequel, we provide declarative formulations for (operational) persistent memory subsystems (see Def. 2.5). Observational refinements (and equivalence) between a persistent memory subsystem $M$ and a declarative persistency model $D$ are defined just like observational refinements between persistent memory subsystems (see Def. 2.8), comparing reachable program states under $M$ (using Def. 2.7) to reachable program states under $D$ (using Def. 5.13).

The following lemmas are useful for establishing refinements without considering all programs and crashes (compare with Lemma 2.10). In both lemmas, $M$ denotes a persistent memory subsystem and $D$ denotes a declarative persistency model.

Lemma 5.14. The following conditions together ensure that $M$ observationally refines $D$:

(i) For every $m_0$-initialized $M$-observable-trace $t$, there exists a $D$-consistent $m_0$-initialized execution graph $G$ such that $t \in \mathsf{traces}(G)$.

(ii) For every $m_0$-to-$m$ $M$-observable-trace $t$, there exist $t' \lesssim t$ and a $D$-consistent $m_0$-initialized execution graph $G$ such that $t' \in \mathsf{traces}(G)$ and $m(G) = m$.

Lemma 5.15. If for every $D$-consistent initialized execution graph $G$, some $t \in \mathsf{traces}(G)$ is an $m_{\mathsf{Init}}(G)$-to-$m(G)$ $M$-observable-trace, then $D$ observationally refines $M$.

5.2 The DPTSOsyn Declarative Persistency Model

In this section we define the declarative DPTSOsyn model. As in (standard) TSO models [Lahav et al. 2016; Owens et al. 2009], DPTSOsyn-consistency requires one to justify an execution graph with a TSO propagation order ($\mathsf{tpo}$), which, roughly speaking, corresponds to the order in which the events in the graph are propagated from the store buffers.


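Definition 5.16. The set of propagated events, denoted by $\mathsf{P}$, is given by:

$$\mathsf{P} \triangleq \mathsf{W} \cup \mathsf{RMW} \cup \mathsf{R\text{-}ex} \cup \mathsf{MF} \cup \mathsf{FL} \cup \mathsf{FO} \cup \mathsf{SF} \quad (= \mathsf{E} \setminus \mathsf{R}).$$

Given an execution graph $G$, a strict total order $\mathsf{tpo}$ on $G.\mathsf{P}$ is called a TSO propagation order for $G$.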

DPTSOsyn-consistency sets several conditions on the TSO propagation order that, except for one novel condition related to persistency, are adopted from the model in [Lahav et al. 2016] (which, in turn, is a variant of the model in [Owens et al. 2009]). To define these conditions, we use the standard “from-read” derived relation, which places a read (or RMW) $r$ before a write (or RMW) $w$ when $r$ reads from a write that was propagated before $w$. We parametrize this concept by the order on writes. (Here we only need $R = \mathsf{tpo}$, but we reuse this definition in Def. 5.25 with a different $R$.)

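Definition 5.17. The from-read (a.k.a. reads-before) relation for an execution graph $G$ and a strict partial order $R$ on $G.\mathsf{E}$, denoted by $G.\mathsf{fr}(R)$, is defined by:

$$G.\mathsf{fr}(R) \triangleq \bigcup_{x \in \mathsf{Loc}} \left( [\mathsf{R}_x \cup \mathsf{RMW}_x \cup \mathsf{R\text{-}ex}_x] \,;\, G.\mathsf{rf}^{-1} \,;\, R \,;\, [\mathsf{W}_x \cup \mathsf{RMW}_x] \right) \setminus [\mathsf{E}].$$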
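As a small illustration of Def. 5.17, the following Python sketch computes the from-read relation on a toy graph. The encoding is an assumption made here: events are named by strings and mapped to a (type, location) pair, `rf` is a set of (write, read) pairs, and `order` is the relation $R$ given as a set of pairs.

```python
READERS = {"R", "RMW", "R-ex"}
WRITERS = {"W", "RMW"}


def fr(events, rf, order):
    """G.fr(R): the reader r is placed before every same-location write that
    comes after, in `order`, the write that r reads from.
    Identity pairs are dropped (the "\\ [E]" part of the definition)."""
    result = set()
    for w, r in rf:
        typ_r, x = events[r]
        if typ_r not in READERS:
            continue
        for a, b in order:
            typ_b, loc_b = events[b]
            if a == w and typ_b in WRITERS and loc_b == x and b != r:
                result.add((r, b))
    return result


events = {"w0": ("W", "x"), "w1": ("W", "x"), "r": ("R", "x"), "u": ("RMW", "x")}
rf = {("w0", "r"), ("w0", "u")}                      # both r and u read from w0
order = {("w0", "u"), ("w0", "w1"), ("u", "w1")}     # w0 < u < w1
from_read = fr(events, rf, order)
```

The RMW event `u` reads from `w0` and is itself ordered after `w0`, which is exactly the case where removing the identity pairs matters.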

Next, for persistency, we use one more derived relation. Since flushes and sfences in PTSOsyn take effect at the moment they propagate from the store buffer, we can derive the existence of a propagation order from any flush event to location $x$ (or flush-optimal to $x$ followed by sfence) to any write $w$ to $x$ that propagated from the store buffer after $G.\mathsf{M}(x)$ persisted. Indeed, if the propagation order went in the opposite direction, we would be forced to persist $w$ and overwrite $G.\mathsf{M}(x)$, but the latter corresponds to the last persisted write to $x$. This derived order is formalized as follows. (Again, we need $R = \mathsf{tpo}$, but this definition is reused in Def. 5.25 with a different $R$.)

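Definition 5.18. The derived TSO propagation order for an execution graph $G$ and a strict partial order $R$ on $G.\mathsf{E}$, denoted by $G.\mathsf{dtpo}(R)$, is defined by:

$$G.\mathsf{dtpo}(R) \triangleq \bigcup_{x \in \mathsf{Loc}} G.\mathsf{FLO}_x \times \{ w \in \mathsf{W}_x \cup \mathsf{RMW}_x \mid \langle G.\mathsf{M}(x), w \rangle \in R \}$$

where $G.\mathsf{FLO}_x$ is the following set:

$$G.\mathsf{FLO}_x \triangleq G.\mathsf{FL}_x \cup \left( \mathsf{FO}_x \cap \mathsf{dom}(G.\mathsf{po} \,;\, [\mathsf{RMW} \cup \mathsf{R\text{-}ex} \cup \mathsf{MF} \cup \mathsf{SF}]) \right).$$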
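The derived propagation order of Def. 5.18 can be sketched in Python as follows. The encoding is assumed for illustration only: events are named by strings and mapped to (type, location) pairs (fences carry location `None`), `po` and `order` are sets of pairs, and `M` maps each location to the name of its last persisted write. The example graph (initial writes, then a write to x, a flush-optimal to x, an sfence and a write to y) and the choice of `M` are assumptions of this sketch.

```python
FENCING = {"RMW", "R-ex", "MF", "SF"}
WRITERS = {"W", "RMW"}


def dtpo(events, po, M, order):
    """G.dtpo(R): pairs (f, w) where f is a flush to x (or a flush-optimal to x
    that is po-followed by an sfence-inducing event) and w is a write to x
    that `order` places after the last persisted write M[x]."""
    result = set()
    for f, (typ, x) in events.items():
        fenced = any(a == f and events[b][0] in FENCING for a, b in po)
        if not (typ == "FL" or (typ == "FO" and fenced)):
            continue
        for w, (typ_w, loc_w) in events.items():
            if typ_w in WRITERS and loc_w == x and (M[x], w) in order:
                result.add((f, w))
    return result


events = {"ix": ("W", "x"), "iy": ("W", "y"), "wx": ("W", "x"),
          "fo": ("FO", "x"), "sf": ("SF", None), "wy": ("W", "y")}
thread = ["wx", "fo", "sf", "wy"]
po = {(i, e) for i in ("ix", "iy") for e in thread}
po |= {(a, b) for k, a in enumerate(thread) for b in thread[k + 1:]}
seq = ["ix", "iy"] + thread                      # a candidate propagation order
tpo = {(a, b) for k, a in enumerate(seq) for b in seq[k + 1:]}
M = {"x": "ix", "y": "wy"}                       # assumed: x = 0 and y = 1 persisted
derived = dtpo(events, po, M, tpo)
```

Here the only derived edge goes from the flush-optimal to the write of x, while `tpo` orders them the other way around, so this candidate order fails the requirement that the derived order agree with the propagation order.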

Using fr and dtpo, DPTSOsyn-consistency is defined as follows.

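Definition 5.19. The declarative persistency model DPTSOsyn consists of all execution graphs $G$ for which there exists a propagation order $\mathsf{tpo}$ for $G$ such that the following hold:

(1) For every $a, b \in \mathsf{P}$, except for the case that $a \in \mathsf{W} \cup \mathsf{FL} \cup \mathsf{FO}$, $b \in \mathsf{FO}$, and $\mathsf{loc}(a) \neq \mathsf{loc}(b)$, if $\langle a, b \rangle \in G.\mathsf{po}$, then $\langle a, b \rangle \in \mathsf{tpo}$.
(2) $\mathsf{tpo}^? \,;\, G.\mathsf{rfe} \,;\, G.\mathsf{po}^?$ is irreflexive.
(3) $G.\mathsf{fr}(\mathsf{tpo}) \,;\, G.\mathsf{rfe}^? \,;\, G.\mathsf{po}$ is irreflexive.
(4) $G.\mathsf{fr}(\mathsf{tpo}) \,;\, \mathsf{tpo}$ is irreflexive.
(5) $G.\mathsf{fr}(\mathsf{tpo}) \,;\, \mathsf{tpo} \,;\, G.\mathsf{rfe} \,;\, G.\mathsf{po}$ is irreflexive.
(6) $G.\mathsf{fr}(\mathsf{tpo}) \,;\, \mathsf{tpo} \,;\, [\mathsf{RMW} \cup \mathsf{R\text{-}ex} \cup \mathsf{MF}] \,;\, G.\mathsf{po}$ is irreflexive.
(7) $G.\mathsf{dtpo}(\mathsf{tpo}) \,;\, \mathsf{tpo}$ is irreflexive.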

Conditions (1)-(6) take care of the concurrency part of the model. They are taken from [Lahav et al. 2016] and slightly adapted to take into account the fact that our propagation order also orders FL, FO, and SF events, which do not exist in non-persistent TSO models.⁵ The only conditions that affect the propagation order on such events are (1) and (2). Condition (1) forces the propagation order to agree with the program order, except for the order between a W/FL/FO-event and a subsequent FO-event to a different location. This corresponds to the fact that propagation from PTSOsyn’s store buffers is in-order, except for out-of-order propagation of FO’s, which can “overtake” preceding writes, flushes, and flush-optimals to different locations. In turn, condition (2) ensures that if a read event observes some write $w$ in the persistence buffer (or persistent memory) via $G.\mathsf{rfe}$, then subsequent events (including FL/FO/SF-events) are necessarily propagated from the store buffer after the write $w$.

Condition (7) is our novel constraint. It is the only condition required for the persistency part of the model. The approach in [Raad et al. 2020] for Px86 requires the existence of a persistence order, reflecting the order in which writes persist (after they propagate), and enforces certain conditions on this order. This makes the semantics less abstract (in the sense that it is closer to operational traces). Instead, we use the derived propagation order (induced by the graph component $G.\mathsf{M}$), and require that it agrees with the propagation order itself. This condition ensures that if a write $w$ to location $x$ propagated from the store buffer before some flush to $x$, then the last persisted write cannot be a write that propagated before $w$. The same holds if $w$ propagated before some flush-optimal to $x$ that is followed by an sfence by the same thread (or any other instruction that has the effect of an sfence).

The following simple lemma is useful below.

Lemma 5.20. Let $\mathsf{tpo}$ be a propagation order for an execution graph $G$ for which the conditions of Def. 5.19 hold. Then, $G.\mathsf{dtpo}(\mathsf{tpo}) \subseteq \mathsf{tpo}$.

Proof. Easily follows from the fact that $\mathsf{tpo}$ is total on $G.\mathsf{P}$ and the last condition in Def. 5.19. □

Example 5.21. The execution graphs depicted below correspond to the annotated behaviors of the simple sequential programs in Ex. 3.3. For every location $x$, the event $G.\mathsf{M}(x)$ is highlighted. The solid edges are program order edges. In each graph, we also depict the tpo-edges that are forced in order to satisfy conditions (1)-(6) above, and the $G.\mathsf{dtpo}(\mathsf{tpo})$-edges they induce. Execution graphs (A) and (C) are DPTSOsyn-consistent, while (B) and (D) violate condition (7) above.

[Figure: four execution graphs, each with initialization events W(x, 0) and W(y, 0), followed in program order by:
(A) ✓  W(x, 1); W(y, 1)
(B) ✗  W(x, 1); FL(x); W(y, 1), with a dtpo edge
(C) ✓  W(x, 1); FO(x); W(y, 1)
(D) ✗  W(x, 1); FO(x); SF; W(y, 1), with a dtpo edge]
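The verdicts of Ex. 5.21 can be reproduced with a brute-force Python sketch that searches for a propagation order. It is deliberately restricted and rests on assumptions made here: the graphs contain no reads, so $G.\mathsf{rf}$ and $G.\mathsf{fr}$ are empty and only conditions (1) and (7) of Def. 5.19 need to be checked; the last persisted writes are assumed to be the initial write to x and the final write to y; and all function names are hypothetical.

```python
from itertools import permutations

FENCING = {"RMW", "R-ex", "MF", "SF"}


def exempt(events, a, b):
    """The exception in condition (1): W/FL/FO followed by FO to another location."""
    (ta, la), (tb, lb) = events[a], events[b]
    return ta in {"W", "FL", "FO"} and tb == "FO" and la != lb


def is_flo(events, po, f):
    typ = events[f][0]
    fenced = any(a == f and events[b][0] in FENCING for a, b in po)
    return typ == "FL" or (typ == "FO" and fenced)


def consistent(events, po, M):
    """Search all total orders for one satisfying conditions (1) and (7).
    Only meant for read-free graphs, where every event is a propagated event."""
    flo = [f for f in events if is_flo(events, po, f)]
    for perm in permutations(events):
        pos = {e: i for i, e in enumerate(perm)}
        cond1 = all(pos[a] < pos[b] for a, b in po if not exempt(events, a, b))
        cond7 = all(pos[f] < pos[w]
                    for f in flo
                    for w, (tw, lw) in events.items()
                    if tw in {"W", "RMW"} and lw == events[f][1]
                    and pos[M[lw]] < pos[w])
        if cond1 and cond7:
            return True
    return False


def sequential(labels):
    """Graph of a single thread running after initial writes to x and y."""
    events = {"ix": ("W", "x"), "iy": ("W", "y")}
    thread = [f"e{i}" for i in range(len(labels))]
    events.update(zip(thread, labels))
    po = {(i, e) for i in ("ix", "iy") for e in thread}
    po |= {(a, b) for k, a in enumerate(thread) for b in thread[k + 1:]}
    M = {"x": "ix", "y": thread[-1]}   # assumed persisted writes: x = 0, y = 1
    return events, po, M


graphs = {
    "A": [("W", "x"), ("W", "y")],
    "B": [("W", "x"), ("FL", "x"), ("W", "y")],
    "C": [("W", "x"), ("FO", "x"), ("W", "y")],
    "D": [("W", "x"), ("FO", "x"), ("SF", None), ("W", "y")],
}
verdicts = {name: consistent(*sequential(ls)) for name, ls in graphs.items()}
```

Since the propagation order is total, irreflexivity of $G.\mathsf{dtpo}(\mathsf{tpo}) \,;\, \mathsf{tpo}$ is checked here as the requirement that every derived pair is ordered in the same direction by the candidate order.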

Example 5.22. The following example (variant of Ex. 4.3) demonstrates a non-volatile outcome that is justified with a sequence of two DPTSOsyn-consistent execution graphs. In the graphs below we use serial numbers $(n)$ to present a possible valid tpo relation. Note that, for the first graph, it is crucial that program order from a write to an FO-event of a different location does not enforce a tpo-order in the same direction (otherwise, the graph would violate condition (7) above).

⁵ Another technical difference is that we ensure here that failed CAS instructions, represented as R-ex events, also act as mfences, while in [Lahav et al. 2016; Raad et al. 2020] they are not distinguished from plain reads.


Program (four threads):

    if (y = 3) then if (x = 0) then if (z = 1) then z := 2 ;
  ∥ x := 1 ; y := 1 ;
  ∥ if y = 2 then y := 3 ;
  ∥ y := 2 ; fo(x) ; sfence ; z := 1 ;

[Figure: two execution graphs.
$G_0$ ✓: initialization events (1) W(x, 0), (2) W(y, 0), (3) W(z, 0); threads: R(y, 0) | (5) W(x, 1), (6) W(y, 1) | R(y, 2), (8) W(y, 3) | (7) W(y, 2), (4) FO(x), (9) SF, (10) W(z, 1); two rf edges and one dtpo edge.
$G_1$ ✓: initialization events (1) W(x, 0), (2) W(y, 3), (3) W(z, 1); thread: R(y, 3), R(x, 0), R(z, 1), (4) W(z, 2); three rf edges.]

5.3 An Equivalent Declarative Persistency Model: DPTSOmosyn

We present an equivalent, more abstract declarative model that requires existential quantification over modification orders, rather than over propagation orders (total orders of $G.\mathsf{P}$). Modification orders totally order writes (including RMWs) to the same location, leaving unspecified the order between other events, as well as the order between writes to different locations. This alternative formulation has a global nature: it identifies a “happens-before” relation and requires acyclicity of this relation. In particular, it allows us to relate PTSOsyn to an SC persistency model (see §7).

Unlike in SC, in TSO we cannot include $G.\mathsf{po}$ in the “happens-before” relation. Instead, we use a restricted subset, which consists of the program order edges that are “preserved”.

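Definition 5.23. The preserved program order relation for an execution graph $G$, denoted by $G.\mathsf{ppo}$, is defined by:

$$G.\mathsf{ppo} \triangleq \left\{ \langle a, b \rangle \in G.\mathsf{po} \;\middle|\; \begin{array}{l} (a \in \mathsf{W} \cup \mathsf{FL} \cup \mathsf{FO} \cup \mathsf{SF} \implies b \notin \mathsf{R}) \;\wedge \\ (a \in \mathsf{W} \cup \mathsf{FL} \cup \mathsf{FO} \wedge \mathsf{loc}(a) \neq \mathsf{loc}(b) \implies b \notin \mathsf{FO}) \end{array} \right\}$$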
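A direct Python transcription of Def. 5.23 may help in reading the two implications. As before, the encoding (string-named events mapped to (type, location) pairs, fences with location `None`, `po` as a set of pairs) is an assumption of this sketch, and "R" stands for plain reads only.

```python
def ppo(events, po):
    """G.ppo: the program order edges that are preserved (Def. 5.23)."""
    def preserved(a, b):
        (ta, la), (tb, lb) = events[a], events[b]
        if ta in {"W", "FL", "FO", "SF"} and tb == "R":
            return False          # write-like event followed by a plain read
        if ta in {"W", "FL", "FO"} and la != lb and tb == "FO":
            return False          # flush-optimal to a different location
        return True
    return {(a, b) for a, b in po if preserved(a, b)}


# One thread: y := 1 ; fo(x) ; sfence ; read y
events = {"wy": ("W", "y"), "fo": ("FO", "x"), "sf": ("SF", None), "ry": ("R", "y")}
thread = ["wy", "fo", "sf", "ry"]
po = {(a, b) for k, a in enumerate(thread) for b in thread[k + 1:]}
preserved_po = ppo(events, po)
```

In this example only the edges into the sfence survive: the write may be overtaken by the flush-optimal to a different location, and none of the earlier events is ordered before the later plain read.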

This definition extends the (non-persistent) preserved program order of TSO that is given by $\{\langle a, b \rangle \in G.\mathsf{po} \mid a \in \mathsf{W} \implies b \notin \mathsf{R}\}$ [Alglave et al. 2014].

Using ppo, we state a global acyclicity condition, and show that it must hold in DPTSOsyn-consistent executions.

Lemma 5.24. Let $\mathsf{tpo}$ be a propagation order for an execution graph $G$ for which the conditions of Def. 5.19 hold. Then, $G.\mathsf{ppo} \cup G.\mathsf{rfe} \cup \mathsf{tpo} \cup G.\mathsf{fr}(\mathsf{tpo})$ is acyclic.

Proof (outline). The proof considers a cycle in $G.\mathsf{ppo} \cup G.\mathsf{rfe} \cup \mathsf{tpo} \cup G.\mathsf{fr}(\mathsf{tpo})$ of minimal length. The fact that $\mathsf{tpo}$ is total on $G.\mathsf{P}$ and the minimality of the cycle imply that this cycle may contain at most two events in $\mathsf{P}$. Then, each of the possible cases is handled using one of the conditions of Def. 5.19. □

We now switch from propagation orders to modification orders and formulate the alternative declarative model.

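Definition 5.25. A relation $\mathsf{mo}$ is a modification order for an execution graph $G$ if $\mathsf{mo}$ is a disjoint union of relations $\{\mathsf{mo}_x\}_{x \in \mathsf{Loc}}$ where each $\mathsf{mo}_x$ is a strict total order on $G.\mathsf{E} \cap (\mathsf{W}_x \cup \mathsf{RMW}_x)$. Given a modification order $\mathsf{mo}$ for $G$, the PTSOsyn-happens-before relation, denoted by $G.\mathsf{hb}(\mathsf{mo})$, is defined by:

$$G.\mathsf{hb}(\mathsf{mo}) \triangleq (G.\mathsf{ppo} \cup G.\mathsf{rfe} \cup \mathsf{mo} \cup G.\mathsf{fr}(\mathsf{mo}) \cup G.\mathsf{dtpo}(\mathsf{mo}))^+.$$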

Definition 5.26. The declarative persistency model DPTSOmosyn consists of all execution graphs $G$ for which there exists a modification order $\mathsf{mo}$ for $G$ such that the following hold:

(1) $G.\mathsf{hb}(\mathsf{mo})$ is irreflexive.
(2) $G.\mathsf{fr}(\mathsf{mo}) \,;\, G.\mathsf{po}$ is irreflexive.

In addition to requiring that the PTSOsyn-happens-before relation is irreflexive, Def. 5.26 forbids $G.\mathsf{po}$ to contradict $G.\mathsf{fr}(\mathsf{mo})$. Since program order edges from writes to reads are not included in $G.\mathsf{hb}(\mathsf{mo})$, the latter condition is needed to ensure “per-location-coherence” [Alglave et al. 2014].

Example 5.27. Revisiting Ex. 5.21 (B), DPTSOmosyn-inconsistency follows from the $G.\mathsf{dtpo}(\mathsf{mo}) \,;\, \mathsf{ppo}$ loop from the flush event ($\mathsf{mo}$ is forced to agree with $G.\mathsf{po}$). In turn, the consistency of $G_0$ in Ex. 5.22 only requires providing a modification order, which can have (1) → (5) for x, (2) → (6) → (7) → (8) for y, and (3) → (10) for z. Note that $\mathsf{mo}$ orders neither writes to different locations nor the flush-optimal and sfence events.

We prove the equivalence of DPTSOsyn and DPTSOmosyn.

Theorem 5.28. DPTSOsyn = DPTSOmosyn.

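Proof. For one direction, let $G$ be a DPTSOsyn-consistent execution graph. Let $\mathsf{tpo}$ be a propagation order for $G$ that satisfies the conditions of Def. 5.19. We define $\mathsf{mo} \triangleq \bigcup_{x \in \mathsf{Loc}} [\mathsf{W}_x \cup \mathsf{RMW}_x] \,;\, \mathsf{tpo} \,;\, [\mathsf{W}_x \cup \mathsf{RMW}_x]$. By definition, we have $G.\mathsf{fr}(\mathsf{mo}) = G.\mathsf{fr}(\mathsf{tpo})$ and $G.\mathsf{dtpo}(\mathsf{mo}) = G.\mathsf{dtpo}(\mathsf{tpo})$. Using Lemma 5.24 and Lemma 5.20, it follows that $\mathsf{mo}$ satisfies the conditions of Def. 5.26, and so $G$ is DPTSOmosyn-consistent.

For the converse, let $G$ be a DPTSOmosyn-consistent execution graph. Let $\mathsf{mo}$ be a modification order for $G$ that satisfies the conditions of Def. 5.26. Let $R$ be any total order on $G.\mathsf{E}$ extending $G.\mathsf{hb}(\mathsf{mo})$. Let $\mathsf{tpo} \triangleq [\mathsf{P}] \,;\, R \,;\, [\mathsf{P}]$. Again, we have $G.\mathsf{fr}(\mathsf{tpo}) = G.\mathsf{fr}(\mathsf{mo})$ and $G.\mathsf{dtpo}(\mathsf{tpo}) = G.\mathsf{dtpo}(\mathsf{mo})$. This construction ensures that $G.\mathsf{ppo} \cup G.\mathsf{rfe} \cup \mathsf{tpo} \cup G.\mathsf{fr}(\mathsf{tpo}) \cup G.\mathsf{dtpo}(\mathsf{mo})$ is contained in $R$, and thus acyclic. Then, all conditions of Def. 5.19 follow. □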

5.4 Equivalence of PTSOsyn and DPTSOsyn

Using Lemmas 5.14 and 5.15, we show that PTSOsyn and DPTSOsyn are observationally equivalent. (Note that for showing that DPTSOsyn observationally refines PTSOsyn, we use Lemma 5.24.)

Theorem 5.29. PTSOsyn and DPTSOsyn are observationally equivalent.

The proof is given in Appendix C.

6 PERSISTENT MEMORY SUBSYSTEM: PSC

In this section we present an SC-based persistent memory subsystem, which we call PSC. This system is stronger, and thus easier to program with, than PTSOsyn. From a formal verification point of view, assuming finite-state programs, in §6.1 we show that PSC can be represented as a finite transition system (like standard SC semantics), so that reachability of program states under PSC is trivially decidable (PSPACE-complete). In §6.2, we also accompany the operational definition with an equivalent declarative one. The declarative formulation will be used in §7 to relate PTSOsyn and PSC.

The persistent memory subsystem PSC is obtained from PTSOsyn by simply discarding the store buffers, thus creating direct links between the threads and the per-location persistence buffers. More concretely, issued writes go directly to the appropriate persistence buffer (made globally visible immediately when they are issued); issued flushes to location $x$ wait until the $x$-persistence-buffer has drained; issued flush-optimals go directly to the appropriate persistence buffer; and issued sfences wait until all writes before a flush-optimal entry (of the same thread issuing the sfence) in every per-location persistence buffer have persisted. As in PTSOsyn, RMWs, failed RMWs, and mfences induce an sfence.⁶ We note that without crashes, the effect of the persistence buffers is unobservable, and PSC trivially coincides with the standard SC semantics.

We note that, unlike for PTSOsyn, discarding the store buffers in Px86 leads to a model that is stronger than PSC, where flush and flush-optimals are equivalent (which makes sfences redundant), and providing this stronger semantics even to sequential programs requires placing additional barriers.

To formally define PSC, we again use a “lookup” function (overloading again the get notation). In PSC, when thread $\tau$ reads from a shared location $x$ it obtains the latest accessible value of $x$, which is defined by applying the following get function on the current persistent memory $m$, and the current per-location persistence buffer $p$ for location $x$:

$$\mathsf{get}(m, p) \triangleq \lambda x.\ \begin{cases} v & p = p_1 \cdot \mathsf{W}(v) \cdot p_2 \wedge \mathsf{W}(\_) \notin p_2 \\ m(x) & \text{otherwise} \end{cases}$$

Using this definition, PSC is presented in Fig. 3. Its set of volatile states, $\mathsf{PSC}.\tilde{\mathsf{Q}}$, consists of all per-location-persistence-buffer mappings. Initially all buffers are empty ($\mathsf{PSC}.\tilde{\mathsf{Q}}_{\mathsf{Init}} = \{P_\varepsilon\}$).

$m \in \mathsf{Loc} \to \mathsf{Val}$ \qquad $P \in \mathsf{Loc} \to (\{\mathsf{W}(v) \mid v \in \mathsf{Val}\} \cup \{\mathsf{FO}(\tau) \mid \tau \in \mathsf{Tid}\})^*$ \qquad $P_{\mathsf{Init}} \triangleq \lambda x.\ \varepsilon$

$$\textsc{write}\quad \frac{l = \mathsf{W}(x, v) \qquad P' = P[x \mapsto P(x) \cdot \mathsf{W}(v)]}{\langle m, P \rangle \xrightarrow{\tau, l}_{\mathsf{PSC}} \langle m, P' \rangle}$$

$$\textsc{read}\quad \frac{l = \mathsf{R}(x, v) \qquad \mathsf{get}(m, P(x))(x) = v}{\langle m, P \rangle \xrightarrow{\tau, l}_{\mathsf{PSC}} \langle m, P \rangle}$$

$$\textsc{rmw}\quad \frac{l = \mathsf{RMW}(x, v_{\mathsf{R}}, v_{\mathsf{W}}) \qquad \mathsf{get}(m, P(x))(x) = v_{\mathsf{R}} \qquad \forall y.\ \mathsf{FO}(\tau) \notin P(y) \qquad P' = P[x \mapsto P(x) \cdot \mathsf{W}(v_{\mathsf{W}})]}{\langle m, P \rangle \xrightarrow{\tau, l}_{\mathsf{PSC}} \langle m, P' \rangle}$$

$$\textsc{rmw-fail}\quad \frac{l = \mathsf{R\text{-}ex}(x, v) \qquad \mathsf{get}(m, P(x))(x) = v \qquad \forall y.\ \mathsf{FO}(\tau) \notin P(y)}{\langle m, P \rangle \xrightarrow{\tau, l}_{\mathsf{PSC}} \langle m, P \rangle}$$

$$\textsc{mfence/sfence}\quad \frac{l \in \{\mathsf{MF}, \mathsf{SF}\} \qquad \forall y.\ \mathsf{FO}(\tau) \notin P(y)}{\langle m, P \rangle \xrightarrow{\tau, l}_{\mathsf{PSC}} \langle m, P \rangle}$$

$$\textsc{flush}\quad \frac{l = \mathsf{FL}(x) \qquad P(x) = \varepsilon}{\langle m, P \rangle \xrightarrow{\tau, l}_{\mathsf{PSC}} \langle m, P \rangle}$$

$$\textsc{flush-opt}\quad \frac{l = \mathsf{FO}(x) \qquad P' = P[x \mapsto P(x) \cdot \mathsf{FO}(\tau)]}{\langle m, P \rangle \xrightarrow{\tau, l}_{\mathsf{PSC}} \langle m, P' \rangle}$$

$$\textsc{persist-w}\quad \frac{P(x) = \mathsf{W}(v) \cdot p \qquad P' = P[x \mapsto p] \qquad m' = m[x \mapsto v]}{\langle m, P \rangle \xrightarrow{\varepsilon}_{\mathsf{PSC}} \langle m', P' \rangle}$$

$$\textsc{persist-fo}\quad \frac{P(x) = \mathsf{FO}(\_) \cdot p \qquad P' = P[x \mapsto p]}{\langle m, P \rangle \xrightarrow{\varepsilon}_{\mathsf{PSC}} \langle m, P' \rangle}$$

Fig. 3. The PSC Persistent Memory Subsystem
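The transitions of Fig. 3 can be mimicked by a small Python sketch. It is an illustrative model under assumptions made here, not the paper's artifact: labels are tuples such as `("W", "x", 1)`, `step` returns `False` when a transition is not enabled, the nondeterministic silent persist steps are triggered explicitly through `persist`, and a crash is assumed to empty the volatile buffers while keeping the persistent memory.

```python
class PSC:
    def __init__(self, locs):
        self.m = {x: 0 for x in locs}    # persistent memory, initially all 0
        self.P = {x: [] for x in locs}   # per-location persistence buffers

    def get(self, x):
        """Latest accessible value of x: last write in the buffer, else memory."""
        for entry in reversed(self.P[x]):
            if entry[0] == "W":
                return entry[1]
        return self.m[x]

    def fo_pending(self, tid):
        return any(("FO", tid) in buf for buf in self.P.values())

    def step(self, tid, label):
        """Take the transition (tid, label) if enabled; return whether it was."""
        typ = label[0]
        if typ == "W":
            self.P[label[1]].append(("W", label[2]))
        elif typ == "R":
            return self.get(label[1]) == label[2]
        elif typ == "RMW":
            _, x, v_r, v_w = label
            if self.get(x) != v_r or self.fo_pending(tid):
                return False
            self.P[x].append(("W", v_w))
        elif typ == "R-ex":
            return self.get(label[1]) == label[2] and not self.fo_pending(tid)
        elif typ in ("MF", "SF"):
            return not self.fo_pending(tid)
        elif typ == "FL":
            return not self.P[label[1]]
        elif typ == "FO":
            self.P[label[1]].append(("FO", tid))
        else:
            raise ValueError(f"unknown label type {typ}")
        return True

    def persist(self, x):
        """Silent step on the head of x's buffer (persist-w or persist-fo)."""
        if not self.P[x]:
            return False
        entry = self.P[x].pop(0)
        if entry[0] == "W":
            self.m[x] = entry[1]
        return True

    def crash(self):
        """Assumed crash behavior: volatile buffers are lost, memory survives."""
        self.P = {x: [] for x in self.P}


s = PSC("xy")
s.step(1, ("W", "x", 1))
s.step(1, ("FO", "x"))
blocked = not s.step(1, ("SF",))     # FO(1) is still in x's buffer
s.persist("x")                       # persists W(1) to memory
s.persist("x")                       # removes the FO(1) entry
passed = s.step(1, ("SF",))
s.step(1, ("W", "y", 1))
seen = s.step(2, ("R", "y", 1))      # writes are globally visible immediately
s.crash()
```

In this run the sfence is enabled only after the flush-optimal entry has left the buffer, which in turn forces the earlier write to x to persist first; the later write to y is lost in the crash.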

Example 6.1. With the exception of Examples 4.3 and 5.22, PSC provides the same allowed/forbidden judgments as PTSOsyn (and Px86) for all of the examples above. (Obviously, standard litmus tests, which are not related to persistency, differentiate the models.) The annotated behaviors in Examples 4.3 and 5.22 are, however, disallowed in PSC. Indeed, by removing the store buffers, PSC requires that the order of entries in each persistence buffer follows exactly the order of issuing of the corresponding instructions (even when they are issued by different threads).

⁶ In PSC there is no need for mfences, as they are equivalent to sfences; we only keep them here for the sake of uniformity.


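$m \in \mathsf{Loc} \to \mathsf{Val}$ \qquad $\tilde{m} \in \mathsf{Loc} \to \mathsf{Val}$ \qquad $L \subseteq \mathsf{Loc}$ \qquad $T \subseteq \mathsf{Tid}$

$\tilde{m}_{\mathsf{Init}} \triangleq \lambda x.\ 0$ \qquad $L_{\mathsf{Init}} \triangleq \mathsf{Loc}$ \qquad $T_{\mathsf{Init}} \triangleq \mathsf{Tid}$

$$\textsc{write-persist}\quad \frac{l = \mathsf{W}(x, v) \qquad x \in L \qquad m' = m[x \mapsto v] \qquad \tilde{m}' = \tilde{m}[x \mapsto v]}{\langle m, \tilde{m}, L, T \rangle \xrightarrow{\tau, l}_{\mathsf{PSC_{fin}}} \langle m', \tilde{m}', L, T \rangle}$$

$$\textsc{write-no-persist}\quad \frac{l = \mathsf{W}(x, v) \qquad \tilde{m}' = \tilde{m}[x \mapsto v] \qquad L' = L \setminus \{x\}}{\langle m, \tilde{m}, L, T \rangle \xrightarrow{\tau, l}_{\mathsf{PSC_{fin}}} \langle m, \tilde{m}', L', T \rangle}$$

$$\textsc{read}\quad \frac{l = \mathsf{R}(x, v) \qquad \tilde{m}(x) = v}{\langle m, \tilde{m}, L, T \rangle \xrightarrow{\tau, l}_{\mathsf{PSC_{fin}}} \langle m, \tilde{m}, L, T \rangle}$$

$$\textsc{rmw-persist}\quad \frac{l = \mathsf{RMW}(x, v_{\mathsf{R}}, v_{\mathsf{W}}) \qquad x \in L \qquad \tilde{m}(x) = v_{\mathsf{R}} \qquad \tau \in T \qquad m' = m[x \mapsto v_{\mathsf{W}}] \qquad \tilde{m}' = \tilde{m}[x \mapsto v_{\mathsf{W}}]}{\langle m, \tilde{m}, L, T \rangle \xrightarrow{\tau, l}_{\mathsf{PSC_{fin}}} \langle m', \tilde{m}', L, T \rangle}$$

$$\textsc{rmw-no-persist}\quad \frac{l = \mathsf{RMW}(x, v_{\mathsf{R}}, v_{\mathsf{W}}) \qquad \tilde{m}(x) = v_{\mathsf{R}} \qquad \tau \in T \qquad \tilde{m}' = \tilde{m}[x \mapsto v_{\mathsf{W}}] \qquad L' = L \setminus \{x\}}{\langle m, \tilde{m}, L, T \rangle \xrightarrow{\tau, l}_{\mathsf{PSC_{fin}}} \langle m, \tilde{m}', L', T \rangle}$$

$$\textsc{rmw-fail}\quad \frac{l = \mathsf{R\text{-}ex}(x, v) \qquad \tilde{m}(x) = v \qquad \tau \in T}{\langle m, \tilde{m}, L, T \rangle \xrightarrow{\tau, l}_{\mathsf{PSC_{fin}}} \langle m, \tilde{m}, L, T \rangle}$$

$$\textsc{mfence/sfence}\quad \frac{l \in \{\mathsf{MF}, \mathsf{SF}\} \qquad \tau \in T}{\langle m, \tilde{m}, L, T \rangle \xrightarrow{\tau, l}_{\mathsf{PSC_{fin}}} \langle m, \tilde{m}, L, T \rangle}$$

$$\textsc{flush}\quad \frac{l = \mathsf{FL}(x) \qquad x \in L}{\langle m, \tilde{m}, L, T \rangle \xrightarrow{\tau, l}_{\mathsf{PSC_{fin}}} \langle m, \tilde{m}, L, T \rangle}$$

$$\textsc{flush-opt-persist}\quad \frac{l = \mathsf{FO}(x) \qquad x \in L}{\langle m, \tilde{m}, L, T \rangle \xrightarrow{\tau, l}_{\mathsf{PSC_{fin}}} \langle m, \tilde{m}, L, T \rangle}$$

$$\textsc{flush-opt-no-persist}\quad \frac{l = \mathsf{FO}(x) \qquad L' = L \setminus \{x\} \qquad T' = T \setminus \{\tau\}}{\langle m, \tilde{m}, L, T \rangle \xrightarrow{\tau, l}_{\mathsf{PSC_{fin}}} \langle m, \tilde{m}, L', T' \rangle}$$

Fig. 4. The PSCfin Persistent Memory Subsystem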

6.1 An Equivalent Finite Persistent Memory Subsystem: PSCfin

From a formal verification perspective, PSC has another important advantage w.r.t. PTSOsyn. Assuming finite-state programs (i.e., finite sets of threads, values and locations, but still, possibly, loopy programs), the reachability problem under PSC (that is, checking whether a given program state $q$ is reachable under PSC according to Def. 2.7) is computationally simple (PSPACE-complete), just like under standard SC semantics [Kozen 1977]. Since PSC is an infinite-state system (the persistence buffers are unbounded), the PSPACE upper bound is not immediate. To establish this bound, we present an alternative persistent memory subsystem, called PSCfin, that is observationally equivalent to PSC, and, assuming that Tid and Loc are finite, PSCfin is a finite LTS.

The system PSCfin is presented in Fig. 4. Its states keep track of a non-volatile memory $m$, a (volatile) mapping $\tilde{m}$ of the most recent value to each location, a (volatile) set $L$ of locations that still persist, and a (volatile) set $T$ of thread identifiers that may perform an sfence (or an sfence-inducing instruction). Every write (or RMW) to some location $x$ can “choose” to not persist, removing $x$ from $L$, and thus forbidding later writes to $x$ to persist. Importantly, once some write to $x$ did not persist (so we have $x \notin L$), flushes to $x$ can no longer be executed (the system deadlocks). A similar mechanism handles flush-optimals: once a flush-optimal by thread $\tau$ “chooses” to not persist, further writes to the same location may not persist, and, moreover, $\tau$ is removed from $T$, so that thread $\tau$ can no longer execute an sfence-inducing instruction (sfence, mfence, or RMW).
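The mechanism just described can be sketched in Python as follows. This is an illustration under assumptions made here: labels are tuples such as `("RMW", "x", 0, 1)`, the nondeterministic choice between the persist and no-persist variants of a rule is passed explicitly as the `persist` flag, and `step` returns `False` when the chosen transition is not enabled.

```python
class PSCfin:
    def __init__(self, locs, tids):
        self.m = {x: 0 for x in locs}   # non-volatile memory
        self.mt = dict(self.m)          # volatile: most recent value per location
        self.L = set(locs)              # volatile: locations that still persist
        self.T = set(tids)              # volatile: threads that may still sfence

    def step(self, tid, label, persist=True):
        typ = label[0]
        if typ in ("W", "RMW"):
            x, v = label[1], label[-1]
            if typ == "RMW" and (self.mt[x] != label[2] or tid not in self.T):
                return False
            if persist:
                if x not in self.L:
                    return False
                self.m[x] = v
            else:
                self.L.discard(x)
            self.mt[x] = v
        elif typ in ("R", "R-ex"):
            if self.mt[label[1]] != label[2]:
                return False
            if typ == "R-ex" and tid not in self.T:
                return False
        elif typ in ("MF", "SF"):
            return tid in self.T
        elif typ == "FL":
            return label[1] in self.L
        elif typ == "FO":
            if persist:
                return label[1] in self.L
            self.L.discard(label[1])
            self.T.discard(tid)
        else:
            raise ValueError(f"unknown label type {typ}")
        return True


a = PSCfin("xy", [1, 2])
a.step(1, ("W", "x", 1), persist=False)      # x := 1 chooses not to persist
flush_enabled = a.step(1, ("FL", "x"))       # a flush to x is now impossible

b = PSCfin("xy", [1, 2])
b.step(1, ("W", "x", 1))                     # x := 1 persists
b.step(1, ("FO", "x"), persist=False)        # fo(x) chooses not to persist
sfence_enabled = b.step(1, ("SF",))          # thread 1 can no longer sfence
other_thread = b.step(2, ("SF",))            # thread 2 is unaffected
```

Every component of the state ranges over a finite set when locations, values and threads are finite, which is the point of the construction.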

Theorem 6.2. PSC and PSCfin are observationally equivalent.

Remark 6. One may apply a construction like PSCfin for PTSOsyn, namely replacing the persistence buffers with a standard non-volatile memory $\tilde{m}$ and sets $L$ and $T$. For PTSOsyn such a construction does not lead to a finite-state machine, as we will still have unbounded store buffers. We leave the investigation of the decidability of reachability under Px86 (equivalently, under PTSOsyn) to future work. Nevertheless, we note that the non-primitive recursive lower bound established by Atig et al. [2010] for reachability under the standard TSO semantics trivially extends to Px86. Indeed, for programs that start by resetting all memory locations to 0 (the very initial value), reachability of program states under Px86 coincides with reachability under TSO.

6.2 The DPSC Declarative Persistency Model

We present a declarative formulation of PSC, which we call DPSC. Like DPTSOmosyn, it is based on a “happens-before” relation.

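Definition 6.3. Given a modification order $\mathsf{mo}$ for an execution graph $G$, the PSC-happens-before relation, denoted by $G.\mathsf{hb}_{\mathsf{PSC}}(\mathsf{mo})$, is defined by:

$$G.\mathsf{hb}_{\mathsf{PSC}}(\mathsf{mo}) \triangleq (G.\mathsf{po} \cup G.\mathsf{rf} \cup \mathsf{mo} \cup G.\mathsf{fr}(\mathsf{mo}) \cup G.\mathsf{dtpo}(\mathsf{mo}))^+.$$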

$G.\mathsf{hb}_{\mathsf{PSC}}(\mathsf{mo})$ extends the standard happens-before relation that defines SC [Alglave et al. 2014] with the derived propagation order ($G.\mathsf{dtpo}(\mathsf{mo})$). In turn, it extends the PTSOsyn-happens-before relation (see Def. 5.25) by including all program order edges rather than only the “preserved” ones. Consistency simply enforces the acyclicity of $G.\mathsf{hb}_{\mathsf{PSC}}(\mathsf{mo})$:

Definition 6.4. The declarative persistency model DPSC consists of all execution graphs $G$ for which there exists a modification order $\mathsf{mo}$ for $G$ such that $G.\mathsf{hb}_{\mathsf{PSC}}(\mathsf{mo})$ is irreflexive.

Next, we establish the equivalence of PSC and DPSC (the proof is given in Appendix F).

Theorem 6.5. PSC and DPSC are observationally equivalent.

7 RELATING PTSOSYN AND PSC

In this section we develop a data-race-freedom (DRF) guarantee for PTSOsyn w.r.t. the stronger and simpler PSC model. This guarantee identifies certain forms of races and ensures that if all executions of a given program do not exhibit such races, then the program’s states that are reachable under PTSOsyn are also reachable under PSC. Importantly, as standard in DRF guarantees, it suffices to verify the absence of races under PSC. Thus, programmers can adhere to a safe programming discipline that is formulated solely in terms of PSC.

To facilitate the exposition, we start with a simplified version of the DRF guarantee, and later strengthen the theorem by further restricting the notion of a race. The strengthened theorem is instrumental in deriving a sound mapping of programs from PSC to PTSOsyn, which can be followed by compilers to ensure PSC semantics under x86-TSO.

7.1 A Simplified DRF Guarantee

The premise of the DRF result requires the absence of two kinds of races: (i) races between a write/RMW operation and a read accessing the same location; and (ii) races between a write/RMW operation and a flush-optimal instruction to the same location. Write-write races are allowed. Similarly, racy reads are only plain reads, and not “R-ex’s” that arise from failed CAS operations. In particular, this ensures that standard locks, implemented using a CAS for acquiring the lock (in a spinloop) and a plain write for releasing the lock, are race free and can be safely used to avoid races in programs. This frees us from the need to have lock and unlock primitives (e.g., as in [Owens 2010]), while still obtaining an applicable DRF guarantee.

For the formal statement of the theorem, we define races and racy programs.

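Definition 7.1. Given a read or a flush-optimal label $l$, we say that thread $\tau$ exhibits an $l$-race in a program state $q \in Pr.\mathsf{Q}$ if $q(\tau)$ enables $l$, while there exists a thread $\tau_{\mathsf{W}} \neq \tau$ such that $q(\tau_{\mathsf{W}})$ enables an event label $l_{\mathsf{W}}$ with $\mathsf{typ}(l_{\mathsf{W}}) \in \{\mathsf{W}, \mathsf{RMW}\}$ and $\mathsf{loc}(l_{\mathsf{W}}) = \mathsf{loc}(l)$.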


Definition 7.2. A program $Pr$ is racy if for some program state $q \in Pr.\mathsf{Q}$ that is reachable under PSC, some thread $\tau$ exhibits an $l$-race for some read or flush-optimal label $l$.

The above notion of racy programs is operational (we believe it may be more easily applicable by developers compared to a declarative notion). It requires that under PSC, the program $Pr$ can reach a state $q$, possibly after multiple crashes, where $q$ enables both a write/RMW by some thread $\tau_{\mathsf{W}}$ and a read/flush-optimal of the same location by some other thread $\tau$. As mentioned above, Def. 7.2 formulates a property of programs under the PSC model.

Theorem 7.3. For a non-racy program $Pr$, a program state $q \in Pr.\mathsf{Q}$ is reachable under PTSOsyn iff it is reachable under PSC.

The theorem is a direct corollary of the more general result in Thm. 7.8 below. A simple corollary of Thm. 7.3 is that single-threaded programs (e.g., those in Ex. 3.3) cannot observe the difference between PTSOsyn and PSC (due to the non-FIFO propagation of flush-optimals in PTSOsyn, even this is not completely trivial).

Example 7.4. Since PTSOsyn allows the propagation of flush-optimals before previously issued writes to different locations, it is essential to include races on flush-optimals in the definition above. Indeed, if races between writes and flush-optimals are not counted, then the following program is clearly race free. However, the annotated persistent memory (z = w = 1 but x = y = 0) is reachable under PTSOsyn (by propagating each flush-optimal before the prior write), but not under PSC.

    x := 1 ; fo(y) ; sfence ; z := 1 ;
  ∥ y := 1 ; fo(x) ; sfence ; w := 1 ;

7.2 A Generalized DRF Guarantee and a PSC to PTSOsyn Mapping

We refine our definition of races to be sufficiently precise for deriving a mapping scheme from PSC to PTSOsyn as a corollary of the DRF guarantee. To do so, reads and flush-optimals are only considered racy if they are unprotected, as defined next.

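Definition 7.5. Let $\rho = l_1, \ldots, l_n$ be a sequence of event labels.

• A read label $\mathsf{R}(x, \_)$ is unprotected after $\rho$ if there is some $1 \leq i_{\mathsf{W}} \leq n$ such that $l_{i_{\mathsf{W}}} = \mathsf{W}(y, \_)$ with $y \neq x$ and for every $i_{\mathsf{W}} < j \leq n$ we have $l_j \notin \{\mathsf{W}(x, \_), \mathsf{RMW}(\_, \_, \_), \mathsf{R\text{-}ex}(\_, \_), \mathsf{MF}\}$.

• A flush-optimal label $\mathsf{FO}(x)$ is unprotected after $\rho$ if there is some $1 \leq i_{\mathsf{W}} \leq n$ such that $l_{i_{\mathsf{W}}} = \mathsf{W}(y, \_)$ with $y \neq x$ and for every $i_{\mathsf{W}} < j \leq n$ we have $l_j \notin \{\mathsf{W}(x, \_), \mathsf{RMW}(\_, \_, \_), \mathsf{R\text{-}ex}(\_, \_), \mathsf{MF}, \mathsf{SF}\}$.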
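Def. 7.5 can be read as a simple scan over the label sequence, as in the following Python sketch. The tuple encoding of labels (type first, location second) and the function name are assumptions made for illustration.

```python
def unprotected(label, rho):
    """Is the read or flush-optimal `label` unprotected after the sequence `rho`?

    Some earlier write to a different location must have no later barrier:
    a write to the same location, an RMW, a failed RMW, an mfence, or
    (for flush-optimals only) an sfence.
    """
    typ, x = label[0], label[1]
    assert typ in ("R", "FO")
    barriers = {"RMW", "R-ex", "MF"} | ({"SF"} if typ == "FO" else set())
    for i, l in enumerate(rho):
        if l[0] == "W" and l[1] != x:
            rest = rho[i + 1:]
            if not any(r[0] in barriers or (r[0] == "W" and r[1] == x)
                       for r in rest):
                return True
    return False


fo_after_write = unprotected(("FO", "x"), [("W", "y", 2)])
fo_after_sfence = unprotected(("FO", "x"), [("W", "y", 2), ("SF",)])
read_after_sfence = unprotected(("R", "x", 0), [("W", "y", 1), ("SF",)])
read_after_mfence = unprotected(("R", "x", 0), [("W", "y", 1), ("MF",)])
read_after_same_loc = unprotected(("R", "x", 1), [("W", "y", 1), ("W", "x", 1)])
```

Note the asymmetry between the two cases: an sfence protects a later flush-optimal but not a later read.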

Roughly speaking, unprotected labels are induced by read/flush-optimal instructions of location $x$ that follow some write instruction to a different location with no barrier, which can be either an RMW instruction, an mfence, or a write to $x$, intervening in between. Flush-optimal instructions are also protected if an sfence barrier is placed between that preceding write and the flush-optimal instruction.

Using the last definitions, we define strongly racy programs.

Notation 7.6. For an observable program trace $t$ and thread $\tau$, we denote by $\mathsf{suffix}_\tau(t)$ the sequence of event labels corresponding to the maximal crashless suffix of $t|_\tau$ (i.e., $\mathsf{suffix}_\tau(t) = l_1, \ldots, l_n$ when $\langle \tau, l_1 \rangle, \ldots, \langle \tau, l_n \rangle$ is the maximal crashless suffix of the restriction of $t$ to transition labels of the form $\langle \tau, \_ \rangle$).

Definition 7.7. A program $Pr$ is strongly racy if there exist $q \in Pr.\mathsf{Q}$, trace $t$, thread $\tau$, and a read or a flush-optimal label $l$ such that the following hold:

• $q$ is reachable under PSC via the trace $t$ (i.e., $\langle q_{\mathsf{Init}}, m_{\mathsf{Init}}, P_\varepsilon \rangle \overset{t}{\Longrightarrow}_{Pr \parallel \mathsf{PSC}} \langle q, m, P \rangle$ for some $q_{\mathsf{Init}} \in Pr.\mathsf{Q}_{\mathsf{Init}}$ and $\langle m, P \rangle \in \mathsf{PSC}.\mathsf{Q}$).
• $\tau$ exhibits an $l$-race in $q$.
• $l$ is unprotected after $\mathsf{suffix}_\tau(t)$.

The generalized DRF result is stated in the next theorem.

Theorem 7.8. For a program $Pr$ that is not strongly racy, a program state $q \in Pr.\mathsf{Q}$ is reachable under PTSOsyn iff it is reachable under PSC.

Example 4.4 is an example of a program that is racy but not strongly racy. By Thm. 7.8, that program has only PSC-behaviors. Example 4.3 can be made not strongly racy: by adding an sfence instruction between y := 2 and fo(x); by strengthening fo(x) to fl(x); or by replacing y := 2 with an atomic exchange instruction (an RMW).

An immediate corollary of Thm. 7.8 is that programs that only use RMWs when writing to shared locations (e.g., [Morrison and Afek 2013]) may safely assume PSC semantics (all labels will be protected). More generally, by “protecting” all racy reads and flush-optimals, we can transform a given program and make it non-racy according to the definition above. In other words, we obtain a compilation scheme from a language with PSC semantics to x86. Since precise static analysis of races is hard, such a scheme may over-approximate. Concretely, a sound scheme can:

(i) like the standard compilation from SC to TSO [Mapping 2019], place mfences separating allread-after-write pairs of different locations (when there is no RMW already in between); and

(ii) place sfences separating all flush-optimal-after-write pairs of different locations (when thereis no RMW or other sfence already in between).
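A minimal sketch of such an over-approximating scheme, for straight-line code of a single thread, is given below in Python. It is not the paper's compiler: it assumes instructions are encoded as tuples such as `("W", "y")` or `("FO", "x")`, ignores control flow, treats every read and flush-optimal as potentially racy, and uses the hypothetical helper names `unprotected` and `protect`. A barrier is inserted exactly when the instruction would otherwise be unprotected in the sense of Def. 7.5.

```python
def unprotected(ins, prefix):
    """Def. 7.5 on instruction tuples (type, location); values are omitted."""
    typ, x = ins[0], ins[1]
    barriers = {"RMW", "R-ex", "MF"} | ({"SF"} if typ == "FO" else set())
    for i, l in enumerate(prefix):
        if l[0] == "W" and l[1] != x:
            if not any(r[0] in barriers or (r[0] == "W" and r[1] == x)
                       for r in prefix[i + 1:]):
                return True
    return False


def protect(thread):
    """Insert an mfence before each unprotected read and an sfence before
    each unprotected flush-optimal."""
    out = []
    for ins in thread:
        if ins[0] in ("R", "FO") and unprotected(ins, out):
            out.append(("MF",) if ins[0] == "R" else ("SF",))
        out.append(ins)
    return out


# y := 2 ; fo(x) ; sfence ; z := 1
racy_thread = protect([("W", "y"), ("FO", "x"), ("SF",), ("W", "z")])
# x := 1 ; fo(x)          (flush-optimal right after a write to the same location)
flush_after_write = protect([("W", "x"), ("FO", "x")])
# y := 1 ; read x ; fo(x) (flush-on-read)
flush_on_read = protect([("W", "y"), ("R", "x"), ("FO", "x")])
```

On the first thread the sketch inserts an sfence between the write to y and the flush-optimal to x; the other two threads need no barrier beyond the mfence that the usual SC-to-TSO mapping already places before the read.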

Moreover, since a write to $x$ between a write to some location $y \neq x$ and a flush-optimal to $x$ makes the flush protected, in the standard case where a flush-optimal to some location $x$ immediately follows a write to $x$ (for ensuring a persistence order for that write), flush-optimals can be compiled without additional barriers. Similarly, the other standard use of a flush-optimal to $x$ after reading from $x$ (known as “flush-on-read” for ensuring a persistence order for writes that the thread relies on) does not require additional barriers either: an mfence is anyway placed between writes to locations different than $x$ and the read from $x$ that precedes the flush-optimal. Thus, we believe that for most “real-world” programs the above scheme will not incur additional runtime overhead compared to standard mappings from SC to x86 (see, e.g., [Liu et al. 2017; Marino et al. 2011; Singh et al. 2012] for performance studies).

To prove Thm. 7.8 we use the declarative formulations of PTSOsyn and PSC. First, we relate unprotected labels as defined in Def. 7.5 with unprotected events in the corresponding execution graph, as defined next.

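Definition 7.9. Let $G$ be an execution graph. An event $e \in \mathsf{R} \cup \mathsf{FO}$ with $x = \mathsf{loc}(e)$ is $G$-unprotected if one of the following holds:

• $e \in G.\mathsf{R}$ and $\langle w, e \rangle \in G.\mathsf{po} \setminus (G.\mathsf{po} \,;\, [\mathsf{W}_x \cup \mathsf{RMW} \cup \mathsf{R\text{-}ex} \cup \mathsf{MF}] \,;\, G.\mathsf{po})$ for some $w \in \mathsf{W} \setminus \mathsf{Init}$ with $\mathsf{loc}(w) \neq x$.
• $e \in G.\mathsf{FO}$ and $\langle w, e \rangle \in G.\mathsf{po} \setminus (G.\mathsf{po} \,;\, [\mathsf{W}_x \cup \mathsf{RMW} \cup \mathsf{R\text{-}ex} \cup \mathsf{MF} \cup \mathsf{SF}] \,;\, G.\mathsf{po})$ for some $w \in \mathsf{W} \setminus \mathsf{Init}$ with $\mathsf{loc}(w) \neq x$.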

Proposition 7.10. Let $\tau \in \mathsf{Tid}$. Let $G$ and $G'$ be execution graphs such that $G'.\mathsf{E}^\tau = G.\mathsf{E}^\tau \cup \{e\}$ for some $G.\mathsf{po} \cup G.\mathsf{rf}$-maximal event $e$. If $e$ is $G'$-unprotected, then $\mathsf{lab}(e)$ is unprotected after $\mathsf{suffix}_\tau(t)$ for some observable program trace $t \in \mathsf{traces}(G)$.

The next key lemma, establishing the DRF-guarantee “on the execution graph level”, is needed for proving Thm. 7.8. Its proof utilizes DPTSOmosyn, which is closer to DPSC than DPTSOsyn.


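Lemma 7.11. Let $G$ be a DPTSOsyn-consistent execution graph. Suppose that for every $w \in G.\mathsf{W} \cup G.\mathsf{RMW}$ and $G$-unprotected event $e \in \mathsf{R}_{\mathsf{loc}(w)} \cup \mathsf{FO}_{\mathsf{loc}(w)}$, we have either $\langle w, e\rangle \in (G.\mathsf{po} \cup G.\mathsf{rf})^+$ or $\langle e, w\rangle \in (G.\mathsf{po} \cup G.\mathsf{rf})^+$. Then, $G$ is DPSC-consistent.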
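The premise of Lemma 7.11 is a reachability condition over $G.\mathsf{po} \cup G.\mathsf{rf}$. As a rough sketch (under the assumption that a graph is given by hypothetical inputs: a list of event names, a list of po/rf edges, a location map, the set of write/RMW events, and the set of events already known to be unprotected), it could be checked as follows.

```python
def transitive_closure(nodes, edges):
    """Return reach[a] = {b | (a, b) is in the transitive closure of edges}."""
    reach = {n: set() for n in nodes}
    for a, b in edges:
        reach[a].add(b)
    # Warshall-style closure: the intermediate node k is the outer loop.
    for k in nodes:
        for i in nodes:
            if k in reach[i]:
                reach[i] |= reach[k]
    return reach


def premise_holds(nodes, po_rf, loc, writes, unprotected):
    """Every write w and unprotected e on loc(w) must be (po ∪ rf)+-ordered."""
    reach = transitive_closure(nodes, po_rf)
    for w in writes:
        for e in unprotected:
            if loc[e] != loc[w]:
                continue
            if e not in reach[w] and w not in reach[e]:
                return False  # w and e race: neither is ordered before the other
    return True
```

The sketch only checks the premise; it does not decide DPTSOsyn- or DPSC-consistency, which the lemma relates.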

With Lemma 7.11, the proof of Thm. 7.8 extends the standard declarative DRF argument. Roughly speaking, we consider the first DPSC-inconsistent execution graph encountered in a chain of execution graphs for reaching a certain program state. Then, we show that a minimal DPSC-inconsistent prefix of that graph must entail a strong race as defined in Def. 7.7.

8 CONCLUSION AND RELATED WORK

We have presented an alternative x86-TSO persistency model, called PTSOsyn, formulated it operationally and declaratively, and proved it to be observationally equivalent to Px86 when observations consist of reachable program states and non-volatile memories. To the best of our understanding, PTSOsyn captures the intuitive persistence guarantees (of flush-optimal and sfence instructions, in particular) widely present in the literature on data-structure design as well as on programming persistent memory (see [Intel 2015; Intel 2019; Scargall 2020]). We have also presented a formalization of an SC-based persistency model, called PSC, which is simpler and stronger than PTSOsyn, and related it to PTSOsyn via a sound compilation scheme and a DRF-guarantee. We believe that the developments of data structures and language-level persistency constructs for non-volatile memory, such as those listed in §1, may adopt PTSOsyn and PSC as their formal semantic foundations. Our models may also simplify reasoning about persistency under x86-TSO both for programmers and automated verification tools.

We have already discussed at length the relation of our work to [Raad et al. 2020]. Next, we describe the relation to several other related works.

Pelley et al. [2014] (informally) explore a hardware co-design for memory persistency and memory consistency and propose a model of epoch persistency under sequential consistency, which splits thread executions into epochs with special persist barriers, so that the order of persistence is only enforced for writes from different epochs. Condit et al. [2009]; Joshi et al. [2015] propose hardware implementations for persist barriers to enable epoch persistency under x86-TSO. While x86-TSO does not provide a persist barrier, flush-optimals combined with an sfence instruction could be used to this end.

Kolli et al. [2016] conducted the first analysis of persistency under x86. They described the semantics induced by the use of CLWB and sfence instructions as synchronous, reaffirming our observation about the common understanding of persistency models. The PTSO model [Raad and Vafeiadis 2018], which was published before Px86, is a proposal for integrating epoch persistency with the x86-TSO semantics. It has synchronous explicit persist instructions and per-location persistence buffers like our PTSOsyn model, but it is more complex (its persistence buffers are queues of persistence sub-buffers, each of which records pending writes of a given epoch), and uses coarse-grained instructions for persisting all pending writes, which were deprecated in x86 [Rudoff 2019].

Kolli et al. [2017] propose a declarative language-level acquire-release persistency model offering new abstractions for programming for persistent memory in C/C++. In comparison, our work aims at providing a formal foundation for reasoning about the underlying architecture. Gogte et al. [2018] improved the model of [Kolli et al. 2017] by proposing a generic logging mechanism for synchronization-free regions that aims to achieve failure atomicity for data-race-free programs. We conjecture that our results (in particular our DRF guarantee relating PTSOsyn and PSC) can serve as a semantic foundation in formally proving the failure-atomicity properties of their implementation.


Raad et al. [2019] proposed a general declarative framework for specifying persistency semantics and formulated a persistency model for ARM in this framework (which is less expressive than in x86). Our declarative models follow their framework, accounting for specific outcomes using chains of execution graphs, but we refrain from employing an additional “non-volatile-order” for tracking the order in which stores are committed to the non-volatile memory. Instead, in the spirit of a theoretical model of [Izraelevitz et al. 2016b], which gives a declarative semantics of epoch persistency under release consistency (assuming both an analogue of the synchronous sfence and also an analogue of a deprecated coarse-grained flush instruction), we track the last persisted write for each location, and use it to derive constraints on existing partial orders. Thus, we believe that our declarative model is more abstract, and may provide a suitable basis for partial order reduction verification techniques (e.g., [Abdulla et al. 2018; Kokologiannakis et al. 2017]).

ACKNOWLEDGMENTS

We thank the POPL’21 reviewers for their helpful feedback and insights. This research was supported by the Israel Science Foundation (grant number 5166651). The second author was also supported by the Alon Young Faculty Fellowship.

REFERENCES

Parosh Aziz Abdulla, Mohamed Faouzi Atig, Bengt Jonsson, and Tuan Phong Ngo. 2018. Optimal Stateless Model Checking under the Release-Acquire Semantics. Proc. ACM Program. Lang. 2, OOPSLA, Article 135 (Oct. 2018), 29 pages. https://doi.org/10.1145/3276505

Jade Alglave, Luc Maranget, and Michael Tautschnig. 2014. Herding Cats: Modelling, Simulation, Testing, and Data Mining for Weak Memory. ACM Trans. Program. Lang. Syst. 36, 2, Article 7 (July 2014), 74 pages. https://doi.org/10.1145/2627752

Joy Arulraj, Justin Levandoski, Umar Farooq Minhas, and Per-Ake Larson. 2018. Bztree: A High-Performance Latch-Free Range Index for Non-Volatile Memory. Proc. VLDB Endow. 11, 5 (Jan. 2018), 553–565. https://doi.org/10.1145/3164135.3164147

Mohamed Faouzi Atig, Ahmed Bouajjani, Sebastian Burckhardt, and Madanlal Musuvathi. 2010. On the Verification Problem for Weak Memory Models. In POPL. ACM, New York, NY, USA, 7–18. https://doi.org/10.1145/1706299.1706303

Kumud Bhandari, Dhruva R. Chakrabarti, and Hans-Juergen Boehm. 2012. Implications of CPU Caching on Byte-addressable Non-Volatile Memory Programming. Technical Report HPL-2012-236. Hewlett-Packard.

Shimin Chen and Qin Jin. 2015. Persistent B+-Trees in Non-Volatile Main Memory. Proc. VLDB Endow. 8, 7 (Feb. 2015), 786–797. https://doi.org/10.14778/2752939.2752947

Jeremy Condit, Edmund B. Nightingale, Christopher Frost, Engin Ipek, Benjamin Lee, Doug Burger, and Derrick Coetzee. 2009. Better I/O Through Byte-addressable, Persistent Memory. In SOSP. ACM, New York, NY, USA, 133–146. https://doi.org/10.1145/1629575.1629589

Tudor David, Aleksandar Dragojević, Rachid Guerraoui, and Igor Zablotchi. 2018. Log-Free Concurrent Data Structures. In USENIX ATC. USENIX Association, USA, 373–385.

Michal Friedman, Naama Ben-David, Yuanhao Wei, Guy E. Blelloch, and Erez Petrank. 2020. NVTraverse: In NVRAM Data Structures, the Destination is More Important than the Journey. In PLDI. ACM, New York, NY, USA, 377–392. https://doi.org/10.1145/3385412.3386031

Michal Friedman, Maurice Herlihy, Virendra Marathe, and Erez Petrank. 2018. A Persistent Lock-free Queue for Non-volatile Memory. In PPoPP. ACM, New York, NY, USA, 28–40. https://doi.org/10.1145/3178487.3178490

Vaibhav Gogte, Stephan Diestelhorst, William Wang, Satish Narayanasamy, Peter M. Chen, and Thomas F. Wenisch. 2018. Persistency for Synchronization-free Regions. In PLDI. ACM, New York, NY, USA, 46–61. https://doi.org/10.1145/3192366.3192367

Intel. 2015. Persistent Memory Programming. http://pmem.io/

Intel. 2019. Intel 64 and IA-32 Architectures Software Developer’s Manual (Combined Volumes). https://software.intel.com/sites/default/files/managed/39/c5/325462-sdm-vol-1-2abcd-3abcd.pdf Order Number: 325462-069US.

Joseph Izraelevitz, Hammurabi Mendes, and Michael L. Scott. 2016a. Brief Announcement: Preserving Happens-before in Persistent Memory. In SPAA. ACM, New York, NY, USA, 157–159. https://doi.org/10.1145/2935764.2935810

Joseph Izraelevitz, Hammurabi Mendes, and Michael L. Scott. 2016b. Linearizability of Persistent Memory Objects Under a Full-System-Crash Failure Model. In DISC. Springer Berlin Heidelberg, Berlin, Heidelberg, 313–327.


Arpit Joshi, Vijay Nagarajan, Marcelo Cintra, and Stratis Viglas. 2015. Efficient Persist Barriers for Multicores. In MICRO. ACM, New York, NY, USA, 660–671. https://doi.org/10.1145/2830772.2830805

Michalis Kokologiannakis, Ori Lahav, Konstantinos Sagonas, and Viktor Vafeiadis. 2017. Effective Stateless Model Checking for C/C++ Concurrency. Proc. ACM Program. Lang. 2, POPL, Article 17 (Dec. 2017), 32 pages. https://doi.org/10.1145/3158105

Aasheesh Kolli, Vaibhav Gogte, Ali Saidi, Stephan Diestelhorst, Peter M. Chen, Satish Narayanasamy, and Thomas F. Wenisch. 2017. Language-level Persistency. In ISCA. ACM, New York, NY, USA, 481–493. https://doi.org/10.1145/3079856.3080229

Aasheesh Kolli, Jeff Rosen, Stephan Diestelhorst, Ali Saidi, Steven Pelley, Sihang Liu, Peter M. Chen, and Thomas F. Wenisch. 2016. Delegated Persist Ordering. In MICRO. IEEE Press, Piscataway, NJ, USA, Article 58, 13 pages. http://dl.acm.org/citation.cfm?id=3195638.3195709

Dexter Kozen. 1977. Lower bounds for natural proof systems. In SFCS. IEEE Computer Society, Washington, 254–266. https://doi.org/10.1109/SFCS.1977.16

Ori Lahav, Nick Giannarakis, and Viktor Vafeiadis. 2016. Taming Release-Acquire Consistency. In POPL. ACM, New York, NY, USA, 649–662. https://doi.org/10.1145/2837614.2837643

Lucas Lersch, Xiangpeng Hao, Ismail Oukid, Tianzheng Wang, and Thomas Willhalm. 2019. Evaluating Persistent Memory Range Indexes. Proc. VLDB Endow. 13, 4 (Dec. 2019), 574–587. https://doi.org/10.14778/3372716.3372728

Jihang Liu, Shimin Chen, and Lujun Wang. 2020. LB+Trees: Optimizing Persistent Index Performance on 3DXPoint Memory. Proc. VLDB Endow. 13, 7 (March 2020), 1078–1090. https://doi.org/10.14778/3384345.3384355

Lun Liu, Todd Millstein, and Madanlal Musuvathi. 2017. A Volatile-by-Default JVM for Server Applications. Proc. ACM Program. Lang. 1, OOPSLA, Article 49 (Oct. 2017), 25 pages. https://doi.org/10.1145/3133873

Mapping 2019. C/C++11 mappings to processors. Retrieved July 3, 2019 from http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html

Daniel Marino, Abhayendra Singh, Todd Millstein, Madanlal Musuvathi, and Satish Narayanasamy. 2011. A Case for an SC-Preserving Compiler. In PLDI. ACM, New York, NY, USA, 199–210. https://doi.org/10.1145/1993498.1993522

Adam Morrison and Yehuda Afek. 2013. Fast Concurrent Queues for X86 Processors. In PPoPP. ACM, New York, NY, USA, 103–112. https://doi.org/10.1145/2442516.2442527

Ismail Oukid, Johan Lasperas, Anisoara Nica, Thomas Willhalm, and Wolfgang Lehner. 2016. FPTree: A Hybrid SCM-DRAM Persistent and Concurrent B-Tree for Storage Class Memory. In SIGMOD. ACM, New York, NY, USA, 371–386. https://doi.org/10.1145/2882903.2915251

Scott Owens. 2010. Reasoning About the Implementation of Concurrency Abstractions on x86-TSO. In ECOOP. Springer-Verlag, Berlin, Heidelberg, 478–503. http://dl.acm.org/citation.cfm?id=1883978.1884011

Scott Owens, Susmit Sarkar, and Peter Sewell. 2009. A Better x86 Memory Model: x86-TSO. In TPHOLs. Springer, Heidelberg, 391–407. https://doi.org/10.1007/978-3-642-03359-9_27

Steven Pelley, Peter M. Chen, and Thomas F. Wenisch. 2014. Memory Persistency. In ISCA. IEEE Press, Piscataway, NJ, USA, 265–276. http://dl.acm.org/citation.cfm?id=2665671.2665712

Anton Podkopaev, Ori Lahav, and Viktor Vafeiadis. 2019. Bridging the Gap Between Programming Languages and Hardware Weak Memory Models. Proc. ACM Program. Lang. 3, POPL, Article 69 (Jan. 2019), 31 pages. https://doi.org/10.1145/3290382

Azalea Raad and Viktor Vafeiadis. 2018. Persistence Semantics for Weak Memory: Integrating Epoch Persistency with the TSO Memory Model. Proc. ACM Program. Lang. 2, OOPSLA, Article 137 (Oct. 2018), 27 pages. https://doi.org/10.1145/3276507

Azalea Raad, John Wickerson, Gil Neiger, and Viktor Vafeiadis. 2020. Persistency Semantics of the Intel-x86 Architecture. Proc. ACM Program. Lang. 4, POPL, Article 11 (Jan. 2020), 31 pages. https://doi.org/10.1145/3371079

Azalea Raad, John Wickerson, and Viktor Vafeiadis. 2019. Weak Persistency Semantics from the Ground Up: Formalising the Persistency Semantics of ARMv8 and Transactional Models. Proc. ACM Program. Lang. 3, OOPSLA, Article 135 (Oct. 2019), 27 pages. https://doi.org/10.1145/3360561

Andy M. Rudoff. 2019. Deprecating the PCOMMIT Instruction. https://software.intel.com/content/www/us/en/develop/blogs/deprecate-pcommit-instruction.html

Susmit Sarkar, Kayvan Memarian, Scott Owens, Mark Batty, Peter Sewell, Luc Maranget, Jade Alglave, and Derek Williams. 2012. Synchronising C/C++ and POWER. In PLDI. ACM, New York, NY, USA, 311–322. https://doi.org/10.1145/2254064.2254102

Steve Scargall. 2020. Programming Persistent Memory: A Comprehensive Guide for Developers. Apress Media, LLC. https://doi.org/10.1007/978-1-4842-4932-1

Abhayendra Singh, Satish Narayanasamy, Daniel Marino, Todd Millstein, and Madanlal Musuvathi. 2012. End-to-End Sequential Consistency. SIGARCH Comput. Archit. News 40, 3 (June 2012), 524–535. https://doi.org/10.1145/2366231.2337220


Viktor Vafeiadis, Thibaut Balabonski, Soham Chakraborty, Robin Morisset, and Francesco Zappa Nardelli. 2015. Common Compiler Optimisations are Invalid in the C11 Memory Model and what we can do about it. In POPL. ACM, New York, NY, USA, 209–220. https://doi.org/10.1145/2676726.2676995

Shivaram Venkataraman, Niraj Tolia, Parthasarathy Ranganathan, and Roy H. Campbell. 2011. Consistent and Durable Data Structures for Non-Volatile Byte-Addressable Memory. In FAST. USENIX Association, USA, 5.

Tianzheng Wang, Justin J. Levandoski, and Per-Åke Larson. 2018. Easy Lock-Free Indexing in Non-Volatile Memory. In ICDE. IEEE Computer Society, Los Alamitos, CA, USA, 461–472. https://doi.org/10.1109/ICDE.2018.00049

John Wickerson, Mark Batty, Tyler Sorensen, and George A. Constantinides. 2017. Automatically Comparing Memory Consistency Models. In POPL. ACM, New York, NY, USA, 190–204. https://doi.org/10.1145/3009837.3009838

Jun Yang, Qingsong Wei, Cheng Chen, Chundong Wang, Khai Leong Yong, and Bingsheng He. 2015. NV-Tree: Reducing Consistency Cost for NVM-Based Single Level Systems. In FAST. USENIX Association, USA, 167–181.

Yoav Zuriel, Michal Friedman, Gali Sheffi, Nachshon Cohen, and Erez Petrank. 2019. Efficient Lock-free Durable Sets. Proc. ACM Program. Lang. 3, OOPSLA, Article 128 (Oct. 2019), 26 pages. https://doi.org/10.1145/3360554


A PROOFS FOR SECTION 2

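Proposition A.1. For every observable program trace $t$ with $\lightning \notin t$ (i.e., $t$ is crashless):

\[ \langle q, m, \tilde{m}\rangle \xRightarrow{t}_{Pr \sslash M} \langle q', m', \tilde{m}'\rangle \iff \Big( q \xRightarrow{t}_{Pr} q' \;\wedge\; \langle m, \tilde{m}\rangle \xRightarrow{t}_{M} \langle m', \tilde{m}'\rangle \Big) \]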

The proposition follows immediately from Definition 2.6.

Lemma 2.10. The following conditions together ensure that a persistent memory subsystem $M_1$ observationally refines a persistent memory subsystem $M_2$:

(i) Every $m_0$-initialized $M_1$-observable-trace is also an $m_0$-initialized $M_2$-observable-trace.

(ii) For every $m_0$-to-$m$ $M_1$-observable-trace $t_1$, some $t_2 \lesssim t_1$ is an $m_0$-to-$m$ $M_2$-observable-trace.

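Proof. Suppose that $q \in Pr.Q$ is reachable under $M_1$. Then, by Def. 2.7, $\langle q, m, \tilde{m}\rangle$ is reachable in $Pr \sslash M_1$ for some $\langle m, \tilde{m}\rangle \in M_1.Q$. Thus, there exist crashless observable program traces $t_0, \ldots, t_n$, initial program states $q_0, \ldots, q_n \in Pr.Q_{Init}$, initial non-volatile memories $m_1, \ldots, m_n \in Loc \to Val$, and initial volatile states $\tilde{m}_0, \ldots, \tilde{m}_n \in M_1.\tilde{Q}_{Init}$, such that the following hold:

• $\langle q_0, m_{Init}, \tilde{m}_0\rangle \xRightarrow{t_0}_{Pr \sslash M_1} \langle \_, m_1, \_\rangle$, and $\langle q_i, m_i, \tilde{m}_i\rangle \xRightarrow{t_i}_{Pr \sslash M_1} \langle \_, m_{i+1}, \_\rangle$ for every $1 \le i \le n-1$.

• $\langle q_n, m_n, \tilde{m}_n\rangle \xRightarrow{t_n}_{Pr \sslash M_1} \langle q, \_, \_\rangle$.

By Prop. A.1, it follows that:

• $q_i \xRightarrow{t_i}_{Pr} \_$ for every $0 \le i \le n-1$, and $q_n \xRightarrow{t_n}_{Pr} q$.

• $t_0$ is an $m_{Init}$-to-$m_1$ $M_1$-observable-trace, and $t_i$ is an $m_i$-to-$m_{i+1}$ $M_1$-observable-trace for every $1 \le i \le n-1$.

• $t_n$ is an $m_n$-initialized $M_1$-observable-trace.

Then, assumption (ii) entails that there exist $t'_0, \ldots, t'_{n-1}$ such that the following hold:

• $t'_i \lesssim t_i$ for every $0 \le i \le n-1$.

• $t'_0$ is an $m_{Init}$-to-$m_1$ $M_2$-observable-trace, and $t'_i$ is an $m_i$-to-$m_{i+1}$ $M_2$-observable-trace for every $1 \le i \le n-1$.

Therefore, there exist initial volatile states $\tilde{m}'_0, \ldots, \tilde{m}'_{n-1} \in M_2.\tilde{Q}_{Init}$ such that:

\[ \langle m_{Init}, \tilde{m}'_0\rangle \xRightarrow{t'_0}_{M_2} \langle m_1, \_\rangle \quad\text{and}\quad \langle m_i, \tilde{m}'_i\rangle \xRightarrow{t'_i}_{M_2} \langle m_{i+1}, \_\rangle \text{ for every } 1 \le i \le n-1. \]

Now, since $q_i \xRightarrow{t_i}_{Pr} \_$ and $t'_i \lesssim t_i$ for every $0 \le i \le n-1$, by Prop. 2.4, we have $q_i \xRightarrow{t'_i}_{Pr} \_$ for every $0 \le i \le n-1$. By Prop. A.1, it follows that:

\[ \langle q_0, m_{Init}, \tilde{m}'_0\rangle \xRightarrow{t'_0}_{Pr \sslash M_2} \langle \_, m_1, \_\rangle \quad\text{and}\quad \langle q_i, m_i, \tilde{m}'_i\rangle \xRightarrow{t'_i}_{Pr \sslash M_2} \langle \_, m_{i+1}, \_\rangle \text{ for every } 1 \le i \le n-1 \tag{1} \]

In addition, assumption (i) entails that $t_n$ is an $m_n$-initialized $M_2$-observable-trace. Therefore, there exists $\tilde{m}'_n \in M_2.\tilde{Q}_{Init}$ such that $\langle m_n, \tilde{m}'_n\rangle \xRightarrow{t_n}_{M_2} \langle \_, \_\rangle$. Knowing that $q_n \xRightarrow{t_n}_{Pr} q$ holds, we conclude (again by Prop. A.1):

\[ \langle q_n, m_n, \tilde{m}'_n\rangle \xRightarrow{t_n}_{Pr \sslash M_2} \langle q, \_, \_\rangle \tag{2} \]

Putting Eq. (1) and Eq. (2) together, we have shown that there exist $t' = t'_0 \cdot \lightning \cdot \ldots \cdot \lightning \cdot t'_{n-1} \cdot \lightning \cdot t_n$ and $\tilde{m}'_0, \ldots, \tilde{m}'_n \in M_2.\tilde{Q}_{Init}$ such that:

\[ \langle q_0, m_{Init}, \tilde{m}'_0\rangle \xRightarrow{t'_0}_{Pr \sslash M_2} \langle \_, m_1, \_\rangle \xrightarrow{\lightning}_{Pr \sslash M_2} \langle q_1, m_1, \tilde{m}'_1\rangle \xRightarrow{t'_1}_{Pr \sslash M_2} \cdots \xRightarrow{t'_{n-1}}_{Pr \sslash M_2} \langle \_, m_n, \_\rangle \xrightarrow{\lightning}_{Pr \sslash M_2} \langle q_n, m_n, \tilde{m}'_n\rangle \xRightarrow{t_n}_{Pr \sslash M_2} \langle q, \_, \_\rangle, \]

meaning that $q$ is reachable for $Pr$ under the persistent memory subsystem $M_2$. □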


B PROOFS FOR SECTION 4

To carry out our equivalence proofs we use instrumented versions of Px86 and PTSOsyn. We also introduce two additional (instrumented) persistent memory subsystems, iPTSO1 and iPTSO2, that serve as intermediate systems in our proof. In Appendix B.1 we generally define instrumented persistent memory subsystems. In Appendix B.2 we present the instrumented version of Px86. In Appendix B.3 we present iPTSO1 and iPTSO2. In Appendix B.4 we present the instrumented version of PTSOsyn. In Appendix B.5 we use these subsystems to establish the proof of Thm. 4.6. Finally, in Appendix B.6 we provide the proof of Lemma 4.5.

B.1 Instrumented Persistent Memory Subsystems

Instrumented persistent memory subsystems are defined similarly to persistent memory subsystems, except for their transition labels (the alphabet of the LTS), which carry more information. In particular, the observable transition labels of the form $\langle \tau, l\rangle$ of persistent memory subsystems are augmented with an identifier $s \in \mathbb{N}$, which uniquely identifies the transition. The $\varepsilon$-labels of silent transitions of persistent memory subsystems are made more informative as well. Hence, the transition labels of an instrumented persistent memory subsystem $iM$ consist of transition labels of the form $\langle \tau, l\#s\rangle$ (where $\tau \in Tid$ and $l \in Lab$) as well as a set denoted by $iM.i\Sigma$ of instrumented silent transition labels, which differs from one system to another. We assume that, like the instrumented non-silent transition labels, the instrumented silent transition labels also include an identifier $s \in \mathbb{N}$. We use the function $\#(\cdot)$ to retrieve this identifier from a given instrumented (silent or non-silent) transition label.

In the sequel, we use the same definition style and terminology that we used for persistent memory subsystems also in the context of instrumented persistent memory subsystems (e.g., defining only the volatile component of the state).

The following erasure function $\Lambda$ forgets the instrumentation in the transition labels.

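Definition B.1. For a transition label $\alpha$ of an instrumented persistent memory subsystem $iM$, $\Lambda(\alpha)$ is defined as follows:

\[ \Lambda(\alpha) \triangleq \begin{cases} \langle \tau, l\rangle & \alpha = \langle \tau, l\#s\rangle \\ \varepsilon & \alpha \in iM.i\Sigma \end{cases} \]

The erasure of a trace $it$ of an instrumented persistent memory subsystem $iM$, denoted by $\Lambda(it)$, is the sequence obtained from $\Lambda(it(1)), \ldots, \Lambda(it(|it|))$ by omitting all $\varepsilon$ labels.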
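The erasure of labels and traces can be sketched in a few lines. The tuple encoding below is an assumption made purely for illustration (it does not appear in the formal development): a non-silent instrumented label is `("obs", tid, label, ident)`, an instrumented silent label is `("silent", payload, ident)`, and `None` stands for the silent label ε.

```python
def erase_label(alpha):
    """Erasure of a single instrumented transition label (cf. Def. B.1)."""
    if alpha[0] == "obs":
        _, tid, label, _ident = alpha
        return (tid, label)   # drop the identifier
    return None               # instrumented silent labels erase to ε


def erase_trace(itrace):
    """Erase every label of the trace and omit the resulting ε labels."""
    erased = (erase_label(alpha) for alpha in itrace)
    return [label for label in erased if label is not None]
```

Under this encoding, an instrumented trace consisting of a write step, its propagation, and its persistence erases to a trace containing only the write step.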

As usual with instrumented operational semantics, it will be easy to see that the instrumentation does not affect the observable behaviors. Formally, we require the existence of an erasure (many-to-one) function from instrumented states to non-instrumented ones that satisfies certain conditions, as defined next.

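Definition B.2. Let $M$ be a persistent memory subsystem and $iM$ be an instrumented persistent memory subsystem. A function $\Lambda : iM.\tilde{Q} \to M.\tilde{Q}$ is an erasure function if the following conditions hold:

• $M.\tilde{Q}_{Init} = \{\Lambda(\tilde{m}) \mid \tilde{m} \in iM.\tilde{Q}_{Init}\}$.

• If $\langle m, i\tilde{m}\rangle \xrightarrow{\tau,\, l\#s}_{iM} \langle m', i\tilde{m}'\rangle$, then $\langle m, \Lambda(i\tilde{m})\rangle \xrightarrow{\tau,\, l}_{M} \langle m', \Lambda(i\tilde{m}')\rangle$.

• If $\langle m, i\tilde{m}\rangle \xrightarrow{\alpha}_{iM} \langle m', i\tilde{m}'\rangle$ for some $\alpha \in iM.i\Sigma$, then $\langle m, \Lambda(i\tilde{m})\rangle \xrightarrow{\varepsilon}_{M} \langle m', \Lambda(i\tilde{m}')\rangle$.

• If $\langle m, \Lambda(i\tilde{m})\rangle \xrightarrow{\tau,\, l}_{M} \langle m', \tilde{m}'\rangle$, then $\langle m, i\tilde{m}\rangle \xrightarrow{\tau,\, l\#s}_{iM} \langle m', i\tilde{m}'\rangle$ for some $s \in \mathbb{N}$ and $i\tilde{m}' \in iM.\tilde{Q}$ such that $\Lambda(i\tilde{m}') = \tilde{m}'$.

• If $\langle m, \Lambda(i\tilde{m})\rangle \xrightarrow{\varepsilon}_{M} \langle m', \tilde{m}'\rangle$, then $\langle m, i\tilde{m}\rangle \xrightarrow{\alpha}_{iM} \langle m', i\tilde{m}'\rangle$ for some $\alpha \in iM.i\Sigma$ and $i\tilde{m}' \in iM.\tilde{Q}$ such that $\Lambda(i\tilde{m}') = \tilde{m}'$.

Given such a function $\Lambda$, we say that $iM$ is a $\Lambda$-instrumentation of $M$. Furthermore, $iM$ is called an instrumentation of $M$ if it is a $\Lambda$-instrumentation of $M$ for some erasure function $\Lambda$.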

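Lemma B.3. Let $iM$ be a $\Lambda$-instrumentation of a persistent memory subsystem $M$. Then, the following hold:

• For every $m_0, m \in Loc \to Val$, $i\tilde{m}_{Init} \in iM.\tilde{Q}_{Init}$, $i\tilde{m} \in iM.\tilde{Q}$, and $it$, if $\langle m_0, i\tilde{m}_{Init}\rangle \xrightarrow{it}_{iM} \langle m, i\tilde{m}\rangle$, then $\langle m_0, \Lambda(i\tilde{m}_{Init})\rangle \xRightarrow{\Lambda(it)}_{M} \langle m, \Lambda(i\tilde{m})\rangle$.

• For every $m_0, m \in Loc \to Val$, $\tilde{m}_{Init} \in M.\tilde{Q}_{Init}$, $\tilde{m} \in M.\tilde{Q}$, and $t$, if $\langle m_0, \tilde{m}_{Init}\rangle \xRightarrow{t}_{M} \langle m, \tilde{m}\rangle$, then $\langle m_0, i\tilde{m}_{Init}\rangle \xrightarrow{it}_{iM} \langle m, i\tilde{m}\rangle$ for some $it$, $i\tilde{m}_{Init} \in iM.\tilde{Q}_{Init}$, and $i\tilde{m} \in iM.\tilde{Q}$ such that $\Lambda(it) = t$ and $\Lambda(i\tilde{m}) = \tilde{m}$.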

B.2 iPx86: Instrumented Px86

The instrumented versions of our TSO-based persistent memory subsystems augment the entries of the persistence and store buffers with the identifier $s \in \mathbb{N}$ that was used in the label of the issuing step that added the entry to the buffer. For instance, we have entries of the form $W(x, v)\#s$ in the persistence buffer instead of $W(x, v)$; and $FL(x)\#s$ in the store buffer instead of $FL(x)$. Then, when propagating an entry with identifier $s$, we include $s$ in the instrumented silent transition label. This allows us to easily relate the transitions in which events are issued, propagated from the store buffer, and persisted. For instance, a write step generates a fresh identifier $s$ (included both in the transition label and in the new store buffer entry), that is (possibly) reused in (exactly one) later prop-w step, and further (possibly) reused in (exactly one) later persist-w step.

Definition B.4. An instrumented persistence buffer is a finite sequence $ip$ of elements of the form $\alpha\#s$ where $\alpha$ is a persistence-buffer entry (of the form $W(x, v)$ or $PER(x)$) and $s \in \mathbb{N}$. An instrumented store buffer is a finite sequence $ib$ of elements of the form $\alpha\#s$ where $\alpha$ is a store-buffer entry (of the form $W(x, v)$, $FL(x)$, $FO(x)$, or $SF$) and $s \in \mathbb{N}$. An instrumented store-buffer mapping is a function $iB$ assigning an instrumented store buffer to every $\tau \in Tid$.

Definition B.5. The erasure of an instrumented persistence buffer $ip$, denoted by $\Lambda(ip)$, is the persistence buffer obtained from $ip$ by omitting the identifier $s$ from all symbols. Similarly, the erasure of an instrumented store buffer $ib$, denoted by $\Lambda(ib)$, is the store buffer obtained from $ib$ by omitting the identifier $s$ from all symbols, and it is lifted to instrumented store-buffer mappings in the obvious way.
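As a small illustration of Defs. B.4 and B.5, instrumented buffers can be modelled as lists of `(entry, identifier)` pairs, with erasure dropping the second component. The concrete encoding (Python tuples for entries, a dict from thread identifiers to buffers) is an assumption made only for this sketch.

```python
def erase_buffer(ibuf):
    """Erasure of an instrumented buffer: forget the identifier of every entry."""
    return [entry for entry, _ident in ibuf]


def erase_mapping(iB):
    """Lift buffer erasure pointwise to an instrumented store-buffer mapping."""
    return {tid: erase_buffer(ibuf) for tid, ibuf in iB.items()}
```

Erasure preserves the order and the multiplicity of entries; only the identifiers are lost, so distinct instrumented buffers may erase to the same buffer.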

Using these definitions, iPx86 (instrumented Px86) is presented in Fig. 5. The functions tid, typ, loc are extended to $iPx86.i\Sigma$ in the obvious way (in particular, for $\alpha \in iPx86.i\Sigma$, we have $typ(\alpha) \in \{PropW, PropFL, PropFO, PropSF, PerW, PerPER\}$).

It is easy to see that iPx86 is an instrumentation of Px86.

Lemma B.6. iPx86 is a $\Lambda$-instrumentation of Px86 for $\Lambda \triangleq \lambda \langle ip, iB, S\rangle.\ \langle \Lambda(ip), \Lambda(iB)\rangle$.


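\[ \begin{aligned} iPx86.i\Sigma \triangleq{} & \{\langle \tau, PropW(x)\#s\rangle \mid \tau \in Tid, x \in Loc, s \in \mathbb{N}\} \cup \{\langle \tau, PropFL(x)\#s\rangle \mid \tau \in Tid, x \in Loc, s \in \mathbb{N}\} \\ & \cup \{\langle \tau, PropFO(x)\#s\rangle \mid \tau \in Tid, x \in Loc, s \in \mathbb{N}\} \cup \{\langle \tau, PropSF\#s\rangle \mid \tau \in Tid, s \in \mathbb{N}\} \\ & \cup \{PerW(x)\#s \mid x \in Loc, s \in \mathbb{N}\} \cup \{PerPER(x)\#s \mid x \in Loc, s \in \mathbb{N}\} \end{aligned} \]

State components:

\[ \begin{aligned} & m \in Loc \to Val \qquad ip \in (\{W(x, v)\#s \mid x \in Loc, v \in Val, s \in \mathbb{N}\} \cup \{PER(x)\#s \mid x \in Loc, s \in \mathbb{N}\})^* \\ & iB \in Tid \to (\{W(x, v)\#s \mid x \in Loc, v \in Val, s \in \mathbb{N}\} \cup \{FL(x)\#s \mid x \in Loc, s \in \mathbb{N}\} \cup \{FO(x)\#s \mid x \in Loc, s \in \mathbb{N}\} \cup \{SF\#s \mid s \in \mathbb{N}\})^* \qquad S \subseteq \mathbb{N} \\ & ip_{Init} \triangleq \varepsilon \qquad iB_{Init} \triangleq \lambda \tau.\ \varepsilon \qquad S_{Init} = \emptyset \end{aligned} \]

write/flush/flush-opt/sfence:
\[ \frac{S' = S \uplus \{s\} \qquad typ(l) \in \{W, FL, FO, SF\} \qquad iB' = iB[\tau \mapsto iB(\tau) \cdot l\#s]}{\langle m, ip, iB, S\rangle \xrightarrow{\tau,\, l\#s}_{iPx86} \langle m, ip, iB', S'\rangle} \]

read:
\[ \frac{S' = S \uplus \{s\} \qquad l = R(x, v) \qquad get(m, \Lambda(ip), \Lambda(iB(\tau)))(x) = v}{\langle m, ip, iB, S\rangle \xrightarrow{\tau,\, l\#s}_{iPx86} \langle m, ip, iB, S'\rangle} \]

rmw:
\[ \frac{S' = S \uplus \{s\} \qquad l = RMW(x, v_R, v_W) \qquad get(m, \Lambda(ip), \varepsilon)(x) = v_R \qquad iB(\tau) = \varepsilon \qquad ip' = ip \cdot W(x, v_W)\#s}{\langle m, ip, iB, S\rangle \xrightarrow{\tau,\, l\#s}_{iPx86} \langle m, ip', iB, S'\rangle} \]

rmw-fail:
\[ \frac{S' = S \uplus \{s\} \qquad l = R\text{-}ex(x, v) \qquad get(m, \Lambda(ip), \varepsilon)(x) = v \qquad iB(\tau) = \varepsilon}{\langle m, ip, iB, S\rangle \xrightarrow{\tau,\, l\#s}_{iPx86} \langle m, ip, iB, S'\rangle} \]

mfence:
\[ \frac{S' = S \uplus \{s\} \qquad l = MF \qquad iB(\tau) = \varepsilon}{\langle m, ip, iB, S\rangle \xrightarrow{\tau,\, l\#s}_{iPx86} \langle m, ip, iB, S'\rangle} \]

prop-w:
\[ \frac{L = PropW(x)\#s \qquad iB(\tau) = ib_1 \cdot W(x, v)\#s \cdot ib_2 \qquad W(\_, \_)\#\_,\ FL(\_)\#\_,\ SF\#\_ \notin ib_1 \qquad iB' = iB[\tau \mapsto ib_1 \cdot ib_2] \qquad ip' = ip \cdot W(x, v)\#s}{\langle m, ip, iB, S\rangle \xrightarrow{\tau,\, L}_{iPx86} \langle m, ip', iB', S\rangle} \]

prop-fl:
\[ \frac{L = PropFL(x)\#s \qquad iB(\tau) = ib_1 \cdot FL(x)\#s \cdot ib_2 \qquad W(\_, \_)\#\_,\ FL(\_)\#\_,\ FO(x)\#\_,\ SF\#\_ \notin ib_1 \qquad iB' = iB[\tau \mapsto ib_1 \cdot ib_2] \qquad ip' = ip \cdot PER(x)\#s}{\langle m, ip, iB, S\rangle \xrightarrow{\tau,\, L}_{iPx86} \langle m, ip', iB', S\rangle} \]

prop-fo:
\[ \frac{L = PropFO(x)\#s \qquad iB(\tau) = ib_1 \cdot FO(x)\#s \cdot ib_2 \qquad W(x, \_)\#\_,\ FL(x)\#\_,\ SF\#\_ \notin ib_1 \qquad iB' = iB[\tau \mapsto ib_1 \cdot ib_2] \qquad ip' = ip \cdot PER(x)\#s}{\langle m, ip, iB, S\rangle \xrightarrow{\tau,\, L}_{iPx86} \langle m, ip', iB', S\rangle} \]

prop-sf:
\[ \frac{L = PropSF\#s \qquad iB(\tau) = SF\#s \cdot ib \qquad iB' = iB[\tau \mapsto ib]}{\langle m, ip, iB, S\rangle \xrightarrow{\tau,\, L}_{iPx86} \langle m, ip, iB', S\rangle} \]

persist-w:
\[ \frac{L = PerW(x)\#s \qquad ip = ip_1 \cdot W(x, v)\#s \cdot ip_2 \qquad W(x, \_)\#\_,\ PER(\_)\#\_ \notin ip_1 \qquad ip' = ip_1 \cdot ip_2 \qquad m' = m[x \mapsto v]}{\langle m, ip, iB, S\rangle \xrightarrow{L}_{iPx86} \langle m', ip', iB, S\rangle} \]

persist-per:
\[ \frac{L = PerPER(x)\#s \qquad ip = ip_1 \cdot PER(x)\#s \cdot ip_2 \qquad W(x, \_)\#\_,\ PER(\_)\#\_ \notin ip_1 \qquad ip' = ip_1 \cdot ip_2}{\langle m, ip, iB, S\rangle \xrightarrow{L}_{iPx86} \langle m, ip', iB, S\rangle} \]

Fig. 5. The iPx86 Instrumented Persistent Memory Subsystem (the instrumentation is colored).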
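To make the role of the identifiers concrete, the following sketch simulates only three of the iPx86 steps (write, prop-w and persist-w) and shows how the identifier chosen at the write step is reused when the entry is propagated and persisted. The class name, the tuple encoding of entries, and the `NotEnabled` exception are hypothetical choices for this illustration; the remaining rules of the system are omitted.

```python
import itertools


class NotEnabled(Exception):
    """Raised when the premises of a step do not hold in the current state."""


class IPx86Sketch:
    def __init__(self):
        self.m = {}     # non-volatile memory: location -> value
        self.ip = []    # instrumented persistence buffer: [(entry, ident)]
        self.iB = {}    # thread id -> instrumented store buffer
        self.S = set()  # identifiers used so far
        self._fresh = itertools.count()

    @staticmethod
    def _index_of(buf, s):
        for i, (_entry, ident) in enumerate(buf):
            if ident == s:
                return i
        raise NotEnabled(f"no entry with identifier {s}")

    def write(self, tid, x, v):
        """write step: pick a fresh s and append W(x, v)#s to iB(tid)."""
        s = next(self._fresh)
        self.S.add(s)
        self.iB.setdefault(tid, []).append((("W", x, v), s))
        return s

    def prop_w(self, tid, s):
        """prop-w step: move W(x, v)#s from iB(tid) to the end of ip."""
        buf = self.iB.get(tid, [])
        i = self._index_of(buf, s)
        entry, _ = buf[i]
        # The prefix ib1 may contain neither writes, flushes, nor sfences.
        if entry[0] != "W" or any(e[0] in ("W", "FL", "SF") for e, _ in buf[:i]):
            raise NotEnabled("prop-w premises do not hold")
        del buf[i]
        self.ip.append((entry, s))

    def persist_w(self, s):
        """persist-w step: remove W(x, v)#s from ip and update m."""
        i = self._index_of(self.ip, s)
        entry, _ = self.ip[i]
        if entry[0] != "W":
            raise NotEnabled("entry is not a write")
        x, v = entry[1], entry[2]
        # The prefix ip1 may contain neither writes to x nor PER entries.
        if any(e[0] == "PER" or (e[0] == "W" and e[1] == x) for e, _ in self.ip[:i]):
            raise NotEnabled("persist-w premises do not hold")
        del self.ip[i]
        self.m[x] = v
```

In this sketch, two writes by the same thread must propagate in order, but once both are in the persistence buffer the write to the second location may persist first, since the persistence buffer only orders writes to the same location (and entries behind PER entries).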


B.3 Intermediate Systems iPTSO1 and iPTSO2

For the proof of equivalence of Px86 and PTSOsyn, we use two intermediate instrumented persistent memory subsystems: iPTSO1 and iPTSO2. Next, we present these systems.

Definition B.7. An instrumented per-location persistence buffer is a finite sequence $ip$ of elements of the form $\alpha\#s$ where $\alpha$ is a per-location persistence buffer entry (of the form $W(v)$ or $FO(\tau)$) and $s \in \mathbb{N}$. An instrumented per-location-persistence-buffer mapping is a function $iP$ assigning an instrumented per-location persistence buffer to every $x \in Loc$.

Definition B.8. The erasure of an instrumented per-location persistence buffer $ip$, denoted by $\Lambda(ip)$, is the per-location persistence buffer obtained from $ip$ by omitting the identifier $s$ from all symbols. It is lifted to instrumented per-location-persistence-buffer mappings in the obvious way.

iPTSO1 is presented in Fig. 6. Note that the per-location persistence buffers of iPTSO1 do not include $FO(\tau)$-entries (these are used in the other systems below). The rules write/flush/flush-opt/sfence, mfence and prop-sf are identical to the rules of iPx86. The rules read, rmw, rmw-fail and prop-w are analogous to those of iPx86 (they are trivially adjusted to operate with per-location persistence buffers).

The main feature of iPTSO1 is that it makes all flush and flush-optimal instructions blocking. To this end, propagation of $FO(x)$ and $FL(x)$ is predicated upon $iP(x)$ being empty, and persistence steps for writes persist writes from the heads of the buffers.

iPTSO2 is presented in Fig. 7. This instrumented persistent memory subsystem is similar to (the instrumented version of) PTSOsyn with the exception that its store buffers do not have the "almost" FIFO behavior of PTSOsyn and propagate entries out-of-order. We further highlight the differences w.r.t. iPTSO1. Like iPTSO1, iPTSO2 also has synchronous flush instructions; however, flush-optimal instructions are asynchronous. The prop-fo transition is analogous to that of iPx86 (adjusted to the type of persistence buffers). iPTSO2 makes sfence instructions synchronous, as well as other serializing instructions, which results in rmw, rmw-fail, mfence and prop-sf enforcing persistence of all flush-optimal instructions preceding the given one in program order, as required by the constraint $(\forall y.\ FO(\tau)\#\_ \notin iP(y))$. Finally, persist-fo simply ensures that writes to a given location persist before the subsequent flush-optimal instruction.
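The two guards that distinguish the intermediate systems can be phrased as simple predicates over the per-location persistence buffers. The sketch below assumes a hypothetical encoding in which `iP` is a dict from locations to lists of `(entry, identifier)` pairs, with entries `("W", v)` and `("FO", tid)`.

```python
def can_prop_flush(iP, x):
    """Blocking guard iP(x) = ε: used for prop-fl and prop-fo in iPTSO1,
    and for prop-fl in iPTSO2."""
    return len(iP.get(x, [])) == 0


def can_serialize(iP, tid):
    """iPTSO2 guard for rmw, rmw-fail, mfence and prop-sf:
    no buffer may still hold a flush-optimal entry FO(tid)."""
    return all(
        entry != ("FO", tid)
        for buf in iP.values()
        for entry, _ident in buf
    )
```

The first predicate makes a flush wait until all pending writes to its location have persisted; the second makes a serializing instruction of thread `tid` wait until all of that thread's earlier flush-optimals have persisted, while pending flush-optimals of other threads do not block it.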

B.4 iPTSOsyn: Instrumented PTSOsyn

We will also need an instrumented version of PTSOsyn, called iPTSOsyn. This system is presented in Fig. 8. It is identical to iPTSO2, except for some transitions (as highlighted in the figure). It is easy to see that iPTSOsyn is an instrumentation of PTSOsyn.

Lemma B.9. iPTSOsyn is a $\Lambda$-instrumentation of PTSOsyn for $\Lambda \triangleq \lambda \langle iP, iB, S\rangle.\ \langle \Lambda(iP), \Lambda(iB)\rangle$.


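\[ \begin{aligned} iPTSO_1.i\Sigma \triangleq{} & \{\langle \tau, PropW(x)\#s\rangle \mid \tau \in Tid, x \in Loc, s \in \mathbb{N}\} \cup \{\langle \tau, PropFL(x)\#s\rangle \mid \tau \in Tid, x \in Loc, s \in \mathbb{N}\} \\ & \cup \{\langle \tau, PropFO(x)\#s\rangle \mid \tau \in Tid, x \in Loc, s \in \mathbb{N}\} \cup \{\langle \tau, PropSF\#s\rangle \mid \tau \in Tid, s \in \mathbb{N}\} \\ & \cup \{PerW(x)\#s \mid x \in Loc, s \in \mathbb{N}\} \end{aligned} \]

State components:

\[ \begin{aligned} & m \in Loc \to Val \qquad iP \in Loc \to \{W(x, v)\#s \mid x \in Loc, v \in Val, s \in \mathbb{N}\}^* \\ & iB \in Tid \to (\{W(x, v)\#s \mid x \in Loc, v \in Val, s \in \mathbb{N}\} \cup \{FL(x)\#s \mid x \in Loc, s \in \mathbb{N}\} \cup \{FO(x)\#s \mid x \in Loc, s \in \mathbb{N}\} \cup \{SF\#s \mid s \in \mathbb{N}\})^* \qquad S \subseteq \mathbb{N} \\ & iP_{Init} \triangleq \lambda x.\ \varepsilon \qquad iB_{Init} \triangleq \lambda \tau.\ \varepsilon \qquad S_{Init} = \emptyset \end{aligned} \]

write/flush/flush-opt/sfence:
\[ \frac{S' = S \uplus \{s\} \qquad typ(l) \in \{W, FL, FO, SF\} \qquad iB' = iB[\tau \mapsto iB(\tau) \cdot l\#s]}{\langle m, iP, iB, S\rangle \xrightarrow{\tau,\, l\#s}_{iPTSO_1} \langle m, iP, iB', S'\rangle} \]

read:
\[ \frac{S' = S \uplus \{s\} \qquad l = R(x, v) \qquad get(m, \Lambda(iP(x)), \Lambda(iB(\tau)))(x) = v}{\langle m, iP, iB, S\rangle \xrightarrow{\tau,\, l\#s}_{iPTSO_1} \langle m, iP, iB, S'\rangle} \]

rmw:
\[ \frac{S' = S \uplus \{s\} \qquad l = RMW(x, v_R, v_W) \qquad get(m, \Lambda(iP(x)), \varepsilon)(x) = v_R \qquad iB(\tau) = \varepsilon \qquad iP' = iP[x \mapsto iP(x) \cdot W(v_W)\#s]}{\langle m, iP, iB, S\rangle \xrightarrow{\tau,\, l\#s}_{iPTSO_1} \langle m, iP', iB, S'\rangle} \]

rmw-fail:
\[ \frac{S' = S \uplus \{s\} \qquad l = R\text{-}ex(x, v) \qquad get(m, \Lambda(iP(x)), \varepsilon)(x) = v \qquad iB(\tau) = \varepsilon}{\langle m, iP, iB, S\rangle \xrightarrow{\tau,\, l\#s}_{iPTSO_1} \langle m, iP, iB, S'\rangle} \]

mfence:
\[ \frac{S' = S \uplus \{s\} \qquad l = MF \qquad iB(\tau) = \varepsilon}{\langle m, iP, iB, S\rangle \xrightarrow{\tau,\, l\#s}_{iPTSO_1} \langle m, iP, iB, S'\rangle} \]

prop-w:
\[ \frac{L = PropW(x)\#s \qquad iB(\tau) = ib_1 \cdot W(x, v)\#s \cdot ib_2 \qquad W(\_, \_)\#\_,\ FL(\_)\#\_,\ SF\#\_ \notin ib_1 \qquad iB' = iB[\tau \mapsto ib_1 \cdot ib_2] \qquad iP' = iP[x \mapsto iP(x) \cdot W(v)\#s]}{\langle m, iP, iB, S\rangle \xrightarrow{\tau,\, L}_{iPTSO_1} \langle m, iP', iB', S\rangle} \]

prop-fl:
\[ \frac{L = PropFL(x)\#s \qquad iB(\tau) = ib_1 \cdot FL(x)\#s \cdot ib_2 \qquad W(\_, \_)\#\_,\ FL(\_)\#\_,\ FO(x)\#\_,\ SF\#\_ \notin ib_1 \qquad iP(x) = \varepsilon \qquad iB' = iB[\tau \mapsto ib_1 \cdot ib_2]}{\langle m, iP, iB, S\rangle \xrightarrow{\tau,\, L}_{iPTSO_1} \langle m, iP, iB', S\rangle} \]

prop-fo:
\[ \frac{L = PropFO(x)\#s \qquad iB(\tau) = ib_1 \cdot FO(x)\#s \cdot ib_2 \qquad W(x, \_)\#\_,\ FL(x)\#\_,\ SF\#\_ \notin ib_1 \qquad iP(x) = \varepsilon \qquad iB' = iB[\tau \mapsto ib_1 \cdot ib_2]}{\langle m, iP, iB, S\rangle \xrightarrow{\tau,\, L}_{iPTSO_1} \langle m, iP, iB', S\rangle} \]

prop-sf:
\[ \frac{L = PropSF\#s \qquad iB(\tau) = SF\#s \cdot ib \qquad iB' = iB[\tau \mapsto ib]}{\langle m, iP, iB, S\rangle \xrightarrow{\tau,\, L}_{iPTSO_1} \langle m, iP, iB', S\rangle} \]

persist-w:
\[ \frac{L = PerW(x)\#s \qquad iP(x) = W(v)\#s \cdot ip \qquad iP' = iP[x \mapsto ip] \qquad m' = m[x \mapsto v]}{\langle m, iP, iB, S\rangle \xrightarrow{L}_{iPTSO_1} \langle m', iP', iB, S\rangle} \]

Fig. 6. The iPTSO1 Instrumented Persistent Memory Subsystem (differences w.r.t. iPx86 are highlighted)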


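\[
\mathsf{iPTSO2}.\mathit{i}\Sigma \triangleq \{\langle \tau, \mathtt{PropW}(x)\#s \rangle \mid \tau \in \mathsf{Tid}, x \in \mathsf{Loc}, s \in \mathbb{N}\} \cup \{\langle \tau, \mathtt{PropFL}(x)\#s \rangle \mid \tau \in \mathsf{Tid}, x \in \mathsf{Loc}, s \in \mathbb{N}\} \cup \{\langle \tau, \mathtt{PropFO}(x)\#s \rangle \mid \tau \in \mathsf{Tid}, x \in \mathsf{Loc}, s \in \mathbb{N}\} \cup \{\langle \tau, \mathtt{PropSF}\#s \rangle \mid \tau \in \mathsf{Tid}, s \in \mathbb{N}\} \cup \{\mathtt{PerW}(x)\#s \mid x \in \mathsf{Loc}, s \in \mathbb{N}\} \cup \{\mathtt{PerFO}(x)\#s \mid x \in \mathsf{Loc}, s \in \mathbb{N}\}
\]

\[
m \in \mathsf{Loc} \to \mathsf{Val} \qquad \mathit{iP} \in \mathsf{Loc} \to \big(\{\mathtt{W}(x,v)\#s \mid x \in \mathsf{Loc}, v \in \mathsf{Val}, s \in \mathbb{N}\} \cup \{\mathtt{FO}(\tau)\#s \mid \tau \in \mathsf{Tid}, s \in \mathbb{N}\}\big)^{*}
\]

\[
\mathit{iB} \in \mathsf{Tid} \to \big(\{\mathtt{W}(x,v)\#s \mid x \in \mathsf{Loc}, v \in \mathsf{Val}, s \in \mathbb{N}\} \cup \{\mathtt{FL}(x)\#s \mid x \in \mathsf{Loc}, s \in \mathbb{N}\} \cup \{\mathtt{FO}(x)\#s \mid x \in \mathsf{Loc}, s \in \mathbb{N}\} \cup \{\mathtt{SF}\#s \mid s \in \mathbb{N}\}\big)^{*}
\qquad S \subseteq \mathbb{N}
\]

\[
\mathit{iP}_{\mathsf{Init}} \triangleq \lambda x.\, \epsilon \qquad \mathit{iB}_{\mathsf{Init}} \triangleq \lambda \tau.\, \epsilon \qquad S_{\mathsf{Init}} = \emptyset
\]

write/flush/flush-opt/sfence:
\[
\frac{S' = S \uplus \{s\} \qquad \mathsf{typ}(l) \in \{\mathtt{W}, \mathtt{FL}, \mathtt{FO}, \mathtt{SF}\} \qquad \mathit{iB}' = \mathit{iB}[\tau \mapsto \mathit{iB}(\tau) \cdot l\#s]}
{\langle m, \mathit{iP}, \mathit{iB}, S \rangle \xrightarrow{\tau,\, l\#s}_{\mathsf{iPTSO2}} \langle m, \mathit{iP}, \mathit{iB}', S' \rangle}
\]

read:
\[
\frac{S' = S \uplus \{s\} \qquad l = \mathtt{R}(x,v) \qquad \mathsf{get}(m, \Lambda(\mathit{iP}(x)), \Lambda(\mathit{iB}(\tau)))(x) = v}
{\langle m, \mathit{iP}, \mathit{iB}, S \rangle \xrightarrow{\tau,\, l\#s}_{\mathsf{iPTSO2}} \langle m, \mathit{iP}, \mathit{iB}, S' \rangle}
\]

rmw:
\[
\frac{S' = S \uplus \{s\} \qquad l = \mathtt{RMW}(x, v_R, v_W) \qquad \mathsf{get}(m, \Lambda(\mathit{iP}(x)), \epsilon)(x) = v_R \qquad \mathit{iB}(\tau) = \epsilon \qquad \forall y.\ \mathtt{FO}(\tau)\#\_ \notin \mathit{iP}(y) \qquad \mathit{iP}' = \mathit{iP}[x \mapsto \mathit{iP}(x) \cdot \mathtt{W}(v_W)\#s]}
{\langle m, \mathit{iP}, \mathit{iB}, S \rangle \xrightarrow{\tau,\, l\#s}_{\mathsf{iPTSO2}} \langle m, \mathit{iP}', \mathit{iB}, S' \rangle}
\]

rmw-fail:
\[
\frac{S' = S \uplus \{s\} \qquad l = \mathtt{R\text{-}ex}(x,v) \qquad \mathsf{get}(m, \Lambda(\mathit{iP}(x)), \epsilon)(x) = v \qquad \mathit{iB}(\tau) = \epsilon \qquad \forall y.\ \mathtt{FO}(\tau)\#\_ \notin \mathit{iP}(y)}
{\langle m, \mathit{iP}, \mathit{iB}, S \rangle \xrightarrow{\tau,\, l\#s}_{\mathsf{iPTSO2}} \langle m, \mathit{iP}, \mathit{iB}, S' \rangle}
\]

mfence:
\[
\frac{S' = S \uplus \{s\} \qquad l = \mathtt{MF} \qquad \mathit{iB}(\tau) = \epsilon \qquad \forall y.\ \mathtt{FO}(\tau)\#\_ \notin \mathit{iP}(y)}
{\langle m, \mathit{iP}, \mathit{iB}, S \rangle \xrightarrow{\tau,\, l\#s}_{\mathsf{iPTSO2}} \langle m, \mathit{iP}, \mathit{iB}, S' \rangle}
\]

prop-w:
\[
\frac{\mathit{iB}(\tau) = \mathit{ib}_1 \cdot \mathtt{W}(x,v)\#s \cdot \mathit{ib}_2 \qquad \mathtt{W}(\_,\_)\#\_,\ \mathtt{FL}(\_)\#\_,\ \mathtt{SF}\#\_ \notin \mathit{ib}_1 \qquad \mathit{iB}' = \mathit{iB}[\tau \mapsto \mathit{ib}_1 \cdot \mathit{ib}_2] \qquad \mathit{iP}' = \mathit{iP}[x \mapsto \mathit{iP}(x) \cdot \mathtt{W}(v)\#s]}
{\langle m, \mathit{iP}, \mathit{iB}, S \rangle \xrightarrow{\tau,\, \mathtt{PropW}(x)\#s}_{\mathsf{iPTSO2}} \langle m, \mathit{iP}', \mathit{iB}', S \rangle}
\]

prop-fl:
\[
\frac{\mathit{iB}(\tau) = \mathit{ib}_1 \cdot \mathtt{FL}(x)\#s \cdot \mathit{ib}_2 \qquad \mathtt{W}(\_,\_)\#\_,\ \mathtt{FL}(\_)\#\_,\ \mathtt{FO}(x)\#\_,\ \mathtt{SF}\#\_ \notin \mathit{ib}_1 \qquad \mathit{iP}(x) = \epsilon \qquad \mathit{iB}' = \mathit{iB}[\tau \mapsto \mathit{ib}_1 \cdot \mathit{ib}_2]}
{\langle m, \mathit{iP}, \mathit{iB}, S \rangle \xrightarrow{\tau,\, \mathtt{PropFL}(x)\#s}_{\mathsf{iPTSO2}} \langle m, \mathit{iP}, \mathit{iB}', S \rangle}
\]

prop-fo:
\[
\frac{\mathit{iB}(\tau) = \mathit{ib}_1 \cdot \mathtt{FO}(x)\#s \cdot \mathit{ib}_2 \qquad \mathtt{W}(x,\_)\#\_,\ \mathtt{FL}(x)\#\_,\ \mathtt{SF}\#\_ \notin \mathit{ib}_1 \qquad \mathit{iB}' = \mathit{iB}[\tau \mapsto \mathit{ib}_1 \cdot \mathit{ib}_2] \qquad \mathit{iP}' = \mathit{iP}[x \mapsto \mathit{iP}(x) \cdot \mathtt{FO}(\tau)\#s]}
{\langle m, \mathit{iP}, \mathit{iB}, S \rangle \xrightarrow{\tau,\, \mathtt{PropFO}(x)\#s}_{\mathsf{iPTSO2}} \langle m, \mathit{iP}', \mathit{iB}', S \rangle}
\]

prop-sf:
\[
\frac{\mathit{iB}(\tau) = \mathtt{SF}\#s \cdot \mathit{ib} \qquad \forall y.\ \mathtt{FO}(\tau)\#\_ \notin \mathit{iP}(y) \qquad \mathit{iB}' = \mathit{iB}[\tau \mapsto \mathit{ib}]}
{\langle m, \mathit{iP}, \mathit{iB}, S \rangle \xrightarrow{\tau,\, \mathtt{PropSF}\#s}_{\mathsf{iPTSO2}} \langle m, \mathit{iP}, \mathit{iB}', S \rangle}
\]

persist-w:
\[
\frac{\mathit{iP}(x) = \mathtt{W}(v)\#s \cdot \mathit{ip} \qquad \mathit{iP}' = \mathit{iP}[x \mapsto \mathit{ip}] \qquad m' = m[x \mapsto v]}
{\langle m, \mathit{iP}, \mathit{iB}, S \rangle \xrightarrow{\mathtt{PerW}(x)\#s}_{\mathsf{iPTSO2}} \langle m', \mathit{iP}', \mathit{iB}, S \rangle}
\]

persist-fo:
\[
\frac{\mathit{iP}(x) = \mathtt{FO}(\tau)\#s \cdot \mathit{ip} \qquad \mathit{iP}' = \mathit{iP}[x \mapsto \mathit{ip}]}
{\langle m, \mathit{iP}, \mathit{iB}, S \rangle \xrightarrow{\mathtt{PerFO}(x)\#s}_{\mathsf{iPTSO2}} \langle m, \mathit{iP}', \mathit{iB}, S \rangle}
\]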

Fig. 7. The iPTSO2 Instrumented Persistent Memory Subsystem (differences w.r.t. iPTSO1 are highlighted)
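The interplay between prop-fo, persist-fo, and prop-sf in iPTSO2 can be illustrated with a minimal executable sketch. The encoding is an assumption made for illustration only: persistence-buffer entries are tuples `("W", value, serial)` or `("FO", tid, serial)`, store-buffer entries start with their type, and the function names `fo_pending`, `prop_sf`, and `persist_fo` are hypothetical. Only the premises shown in the figure for these two rules are modelled.

```python
def fo_pending(iP, tid):
    """True if some persistence buffer still holds an FO entry of thread tid
    (the negation of the premise: for all y, FO(tid)#_ not in iP(y))."""
    return any(typ == "FO" and owner == tid
               for buf in iP.values()
               for (typ, owner, _serial) in buf)

def prop_sf(iP, iB, tid):
    """prop-sf: remove SF from the head of tid's store buffer, if enabled."""
    buf = iB[tid]
    if not buf or buf[0][0] != "SF" or fo_pending(iP, tid):
        return False  # the step is not enabled in this state
    buf.pop(0)
    return True

def persist_fo(iP, x):
    """persist-fo: remove an FO entry from the head of x's persistence buffer."""
    buf = iP[x]
    if not buf or buf[0][0] != "FO":
        return False
    buf.pop(0)
    return True

# Thread t1 has a propagated, not yet persisted flush-optimal on x,
# followed by an sfence that is still in its store buffer.
iP = {"x": [("FO", "t1", 1)]}
iB = {"t1": [("SF", 2)]}
blocked = prop_sf(iP, iB, "t1")      # blocked by the pending FO entry
persisted = persist_fo(iP, "x")      # the FO entry leaves the buffer
released = prop_sf(iP, iB, "t1")     # now the sfence can propagate
```

In this sketch the sfence is the instruction that waits for earlier flush-optimals of the same thread, which is the behaviour the added premise in the prop-sf rule expresses.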


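\[
\mathsf{iPTSOsyn}.\mathit{i}\Sigma \triangleq \{\langle \tau, \mathtt{PropW}(x)\#s \rangle \mid \tau \in \mathsf{Tid}, x \in \mathsf{Loc}, s \in \mathbb{N}\} \cup \{\langle \tau, \mathtt{PropFL}(x)\#s \rangle \mid \tau \in \mathsf{Tid}, x \in \mathsf{Loc}, s \in \mathbb{N}\} \cup \{\langle \tau, \mathtt{PropFO}(x)\#s \rangle \mid \tau \in \mathsf{Tid}, x \in \mathsf{Loc}, s \in \mathbb{N}\} \cup \{\langle \tau, \mathtt{PropSF}\#s \rangle \mid \tau \in \mathsf{Tid}, s \in \mathbb{N}\} \cup \{\mathtt{PerW}(x)\#s \mid x \in \mathsf{Loc}, s \in \mathbb{N}\} \cup \{\mathtt{PerFO}(x)\#s \mid x \in \mathsf{Loc}, s \in \mathbb{N}\}
\]

\[
m \in \mathsf{Loc} \to \mathsf{Val} \qquad \mathit{iP} \in \mathsf{Loc} \to \big(\{\mathtt{W}(x,v)\#s \mid x \in \mathsf{Loc}, v \in \mathsf{Val}, s \in \mathbb{N}\} \cup \{\mathtt{FO}(\tau)\#s \mid \tau \in \mathsf{Tid}, s \in \mathbb{N}\}\big)^{*}
\]

\[
\mathit{iB} \in \mathsf{Tid} \to \big(\{\mathtt{W}(x,v)\#s \mid x \in \mathsf{Loc}, v \in \mathsf{Val}, s \in \mathbb{N}\} \cup \{\mathtt{FL}(x)\#s \mid x \in \mathsf{Loc}, s \in \mathbb{N}\} \cup \{\mathtt{FO}(x)\#s \mid x \in \mathsf{Loc}, s \in \mathbb{N}\} \cup \{\mathtt{SF}\#s \mid s \in \mathbb{N}\}\big)^{*}
\qquad S \subseteq \mathbb{N}
\]

\[
\mathit{iP}_{\mathsf{Init}} \triangleq \lambda x.\, \epsilon \qquad \mathit{iB}_{\mathsf{Init}} \triangleq \lambda \tau.\, \epsilon \qquad S_{\mathsf{Init}} = \emptyset
\]

write/flush/flush-opt/sfence:
\[
\frac{S' = S \uplus \{s\} \qquad \mathsf{typ}(l) \in \{\mathtt{W}, \mathtt{FL}, \mathtt{FO}, \mathtt{SF}\} \qquad \mathit{iB}' = \mathit{iB}[\tau \mapsto \mathit{iB}(\tau) \cdot l\#s]}
{\langle m, \mathit{iP}, \mathit{iB}, S \rangle \xrightarrow{\tau,\, l\#s}_{\mathsf{iPTSOsyn}} \langle m, \mathit{iP}, \mathit{iB}', S' \rangle}
\]

read:
\[
\frac{S' = S \uplus \{s\} \qquad l = \mathtt{R}(x,v) \qquad \mathsf{get}(m, \Lambda(\mathit{iP}(x)), \Lambda(\mathit{iB}(\tau)))(x) = v}
{\langle m, \mathit{iP}, \mathit{iB}, S \rangle \xrightarrow{\tau,\, l\#s}_{\mathsf{iPTSOsyn}} \langle m, \mathit{iP}, \mathit{iB}, S' \rangle}
\]

rmw:
\[
\frac{S' = S \uplus \{s\} \qquad l = \mathtt{RMW}(x, v_R, v_W) \qquad \mathsf{get}(m, \Lambda(\mathit{iP}(x)), \epsilon)(x) = v_R \qquad \mathit{iB}(\tau) = \epsilon \qquad \forall y.\ \mathtt{FO}(\tau)\#\_ \notin \mathit{iP}(y) \qquad \mathit{iP}' = \mathit{iP}[x \mapsto \mathit{iP}(x) \cdot \mathtt{W}(v_W)\#s]}
{\langle m, \mathit{iP}, \mathit{iB}, S \rangle \xrightarrow{\tau,\, l\#s}_{\mathsf{iPTSOsyn}} \langle m, \mathit{iP}', \mathit{iB}, S' \rangle}
\]

rmw-fail:
\[
\frac{S' = S \uplus \{s\} \qquad l = \mathtt{R\text{-}ex}(x,v) \qquad \mathsf{get}(m, \Lambda(\mathit{iP}(x)), \epsilon)(x) = v \qquad \mathit{iB}(\tau) = \epsilon \qquad \forall y.\ \mathtt{FO}(\tau)\#\_ \notin \mathit{iP}(y)}
{\langle m, \mathit{iP}, \mathit{iB}, S \rangle \xrightarrow{\tau,\, l\#s}_{\mathsf{iPTSOsyn}} \langle m, \mathit{iP}, \mathit{iB}, S' \rangle}
\]

mfence:
\[
\frac{S' = S \uplus \{s\} \qquad l = \mathtt{MF} \qquad \mathit{iB}(\tau) = \epsilon \qquad \forall y.\ \mathtt{FO}(\tau)\#\_ \notin \mathit{iP}(y)}
{\langle m, \mathit{iP}, \mathit{iB}, S \rangle \xrightarrow{\tau,\, l\#s}_{\mathsf{iPTSOsyn}} \langle m, \mathit{iP}, \mathit{iB}, S' \rangle}
\]

prop-w:
\[
\frac{\mathit{iB}(\tau) = \mathtt{W}(x,v)\#s \cdot \mathit{ib} \qquad \mathit{iB}' = \mathit{iB}[\tau \mapsto \mathit{ib}] \qquad \mathit{iP}' = \mathit{iP}[x \mapsto \mathit{iP}(x) \cdot \mathtt{W}(v)\#s]}
{\langle m, \mathit{iP}, \mathit{iB}, S \rangle \xrightarrow{\tau,\, \mathtt{PropW}(x)\#s}_{\mathsf{iPTSOsyn}} \langle m, \mathit{iP}', \mathit{iB}', S \rangle}
\]

prop-fl:
\[
\frac{\mathit{iB}(\tau) = \mathtt{FL}(x)\#s \cdot \mathit{ib} \qquad \mathtt{W}(\_,\_)\#\_,\ \mathtt{FL}(\_)\#\_,\ \mathtt{FO}(x)\#\_,\ \mathtt{SF}\#\_ \notin \mathit{ib}_1 \qquad \mathit{iP}(x) = \epsilon \qquad \mathit{iB}' = \mathit{iB}[\tau \mapsto \mathit{ib}]}
{\langle m, \mathit{iP}, \mathit{iB}, S \rangle \xrightarrow{\tau,\, \mathtt{PropFL}(x)\#s}_{\mathsf{iPTSOsyn}} \langle m, \mathit{iP}, \mathit{iB}', S \rangle}
\]

prop-fo:
\[
\frac{\mathit{iB}(\tau) = \mathit{ib}_1 \cdot \mathtt{FO}(x)\#s \cdot \mathit{ib}_2 \qquad \mathtt{W}(x,\_)\#\_,\ \mathtt{FL}(x)\#\_,\ \mathtt{FO}(x)\#\_,\ \mathtt{SF}\#\_ \notin \mathit{ib}_1 \qquad \mathit{iB}' = \mathit{iB}[\tau \mapsto \mathit{ib}_1 \cdot \mathit{ib}_2] \qquad \mathit{iP}' = \mathit{iP}[x \mapsto \mathit{iP}(x) \cdot \mathtt{FO}(\tau)\#s]}
{\langle m, \mathit{iP}, \mathit{iB}, S \rangle \xrightarrow{\tau,\, \mathtt{PropFO}(x)\#s}_{\mathsf{iPTSOsyn}} \langle m, \mathit{iP}', \mathit{iB}', S \rangle}
\]

prop-sf:
\[
\frac{\mathit{iB}(\tau) = \mathtt{SF}\#s \cdot \mathit{ib} \qquad \forall y.\ \mathtt{FO}(\tau)\#\_ \notin \mathit{iP}(y) \qquad \mathit{iB}' = \mathit{iB}[\tau \mapsto \mathit{ib}]}
{\langle m, \mathit{iP}, \mathit{iB}, S \rangle \xrightarrow{\tau,\, \mathtt{PropSF}\#s}_{\mathsf{iPTSOsyn}} \langle m, \mathit{iP}, \mathit{iB}', S \rangle}
\]

persist-w:
\[
\frac{\mathit{iP}(x) = \mathtt{W}(v)\#s \cdot \mathit{ip} \qquad \mathit{iP}' = \mathit{iP}[x \mapsto \mathit{ip}] \qquad m' = m[x \mapsto v]}
{\langle m, \mathit{iP}, \mathit{iB}, S \rangle \xrightarrow{\mathtt{PerW}(x)\#s}_{\mathsf{iPTSOsyn}} \langle m', \mathit{iP}', \mathit{iB}, S \rangle}
\]

persist-fo:
\[
\frac{\mathit{iP}(x) = \mathtt{FO}(\tau)\#s \cdot \mathit{ip} \qquad \mathit{iP}' = \mathit{iP}[x \mapsto \mathit{ip}]}
{\langle m, \mathit{iP}, \mathit{iB}, S \rangle \xrightarrow{\mathtt{PerFO}(x)\#s}_{\mathsf{iPTSOsyn}} \langle m, \mathit{iP}', \mathit{iB}, S \rangle}
\]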

Fig. 8. The iPTSOsyn Instrumented Persistent Memory Subsystem (differences w.r.t. iPTSO2 are highlighted)


B.5 Proof of Theorem 4.6

With the four systems above, we prove Thm. 4.6. Utilizing Lemma 2.10, we need to show:

(A) Every $m_0$-initialized PTSOsyn-observable-trace is also an $m_0$-initialized Px86-observable-trace.
(B) For every $m_0$-to-$m$ PTSOsyn-observable-trace $t$, some $t' \lesssim t$ is an $m_0$-to-$m$ Px86-observable-trace.
(C) Every $m_0$-initialized Px86-observable-trace is also an $m_0$-initialized PTSOsyn-observable-trace.
(D) For every $m_0$-to-$m$ Px86-observable-trace $t$, some $t' \lesssim t$ is an $m_0$-to-$m$ PTSOsyn-observable-trace.

In the proof outlines below, we highlight the steps whose proofs we found more interesting. The proofs of the non-highlighted steps are easier and mostly straightforward.

B.5.1 General Definitions for all Parts.

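Definition B.10. Let $A$ be an LTS. We say that a pair $\langle \sigma, \sigma' \rangle \in A.\Sigma \times A.\Sigma$ of transition labels $A$-commutes if

\[
\xrightarrow{\sigma}_{A} \,;\, \xrightarrow{\sigma'}_{A} \;\subseteq\; \xrightarrow{\sigma'}_{A} \,;\, \xrightarrow{\sigma}_{A} .
\]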
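For a finite LTS, Definition B.10 can be checked directly by composing the two step relations. The sketch below is only an illustration under assumed encodings: the LTS is given as a set of `(state, label, state)` triples, and the names `compose` and `commutes` are hypothetical.

```python
def compose(r1, r2):
    """Relational composition r1 ; r2 of two sets of (source, target) pairs."""
    return {(a, c) for (a, b) in r1 for (b2, c) in r2 if b == b2}

def commutes(transitions, sigma, sigma_prime):
    """Check that <sigma, sigma_prime> commutes in the LTS `transitions`:
    every sigma-then-sigma_prime path between two states can be replaced
    by a sigma_prime-then-sigma path between the same two states."""
    def step(label):
        return {(q, q2) for (q, lab, q2) in transitions if lab == label}
    forward = compose(step(sigma), step(sigma_prime))
    swapped = compose(step(sigma_prime), step(sigma))
    return forward <= swapped

# A diamond: both orders of a and b lead from state 0 to state 3.
diamond = {(0, "a", 1), (1, "b", 3), (0, "b", 2), (2, "a", 3)}
# Without the edge 2 -a-> 3, the swapped order is no longer available.
broken = diamond - {(2, "a", 3)}
```

Note that the definition is directional: `commutes(lts, a, b)` does not imply `commutes(lts, b, a)`.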

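Definition B.11. A trace $\mathit{it}$ of one of the systems iPx86, iPTSO2, or iPTSOsyn is called PropFO-complete if for every $i \in \mathrm{dom}(\mathit{it})$ with $\mathit{it}(i) = \langle \tau, \mathtt{PropFO}(x)\#s \rangle$, we have $\#(\mathit{it}(j)) = s$ for some $j > i$. In addition, if $\mathit{it}$ is an iPx86-trace, we also say that $\mathit{it}$ is

(1) PropFL-complete if for every $i \in \mathrm{dom}(\mathit{it})$ with $\mathit{it}(i) = \langle \tau, \mathtt{PropFL}(x)\#s \rangle$, we have $\#(\mathit{it}(j)) = s$ for some $j > i$.
(2) {PropFL, PropFO}-complete if $\mathit{it}$ is both PropFL-complete and PropFO-complete.

Definition B.12. Given a trace $\mathit{it}$ of one of the systems iPTSO2 or iPTSOsyn, the delay function $d_{\mathit{it}} : \mathrm{dom}(\mathit{it}) \to \mathbb{N}$ assigns to every $i \in \mathrm{dom}(\mathit{it})$ with $\mathsf{typ}(\mathit{it}(i)) \in \{\mathtt{RMW}, \mathtt{PropW}, \mathtt{PropFO}\}$ the difference $j - i - 1$, where $j > i$ is the (unique) index satisfying $\#(\mathit{it}(j)) = \#(\mathit{it}(i))$. If $\mathsf{typ}(\mathit{it}(i)) \notin \{\mathtt{RMW}, \mathtt{PropW}, \mathtt{PropFO}\}$ or such an index $j$ does not exist, the delay $d_{\mathit{it}}(i)$ is defined to be 0. Similarly, if $\mathit{it}$ is a trace of iPx86, the delay function $d_{\mathit{it}} : \mathrm{dom}(\mathit{it}) \to \mathbb{N}$ assigns to every $i \in \mathrm{dom}(\mathit{it})$ with $\mathsf{typ}(\mathit{it}(i)) \in \{\mathtt{RMW}, \mathtt{PropW}, \mathtt{PropFO}, \mathtt{PropFL}\}$ the difference $j - i - 1$, where $j > i$ is the (unique) index satisfying $\#(\mathit{it}(j)) = \#(\mathit{it}(i))$. If $\mathsf{typ}(\mathit{it}(i)) \notin \{\mathtt{RMW}, \mathtt{PropW}, \mathtt{PropFO}, \mathtt{PropFL}\}$ or such an index $j$ does not exist, the delay $d_{\mathit{it}}(i)$ is defined to be 0.

Definition B.13. A trace $\mathit{it}$ of one of the systems iPx86, iPTSO2, or iPTSOsyn is synchronous if $d_{\mathit{it}}(i) = 0$ for every $1 \le i \le |\mathit{it}|$.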
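The delay function and the notion of a synchronous trace can be made concrete with a small sketch for the iPTSO2/iPTSOsyn case. The encoding is an assumption for illustration: a transition label is a tuple `(typ, loc, serial)` with thread identifiers omitted, indices are 0-based (the delay is a difference of indices, so this does not change its value), and the names `delay` and `is_synchronous` are hypothetical.

```python
DELAYED_TYPES = {"RMW", "PropW", "PropFO"}  # the iPTSO2 / iPTSOsyn case

def delay(trace, i):
    """Number of labels strictly between index i and the later label
    that carries the same serial number (0 if there is none)."""
    typ, _loc, serial = trace[i]
    if typ not in DELAYED_TYPES:
        return 0
    for j in range(i + 1, len(trace)):
        if trace[j][2] == serial:
            return j - i - 1
    return 0

def is_synchronous(trace):
    """A trace is synchronous if every index has delay 0."""
    return all(delay(trace, i) == 0 for i in range(len(trace)))

# Two writes are propagated before either of them persists.
lazy = [("PropW", "x", 1), ("PropW", "y", 2), ("PerW", "x", 1), ("PerW", "y", 2)]
# Each write persists immediately after it is propagated.
eager = [("PropW", "x", 1), ("PerW", "x", 1), ("PropW", "y", 2), ("PerW", "y", 2)]
```

For the iPx86 case, the same sketch applies after adding `"PropFL"` to `DELAYED_TYPES`.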

B.5.2 Proof of (A). The proof of (A) is structured as follows:

(A.0) Let $t$ be an $m_0$-initialized PTSOsyn-observable-trace.
(A.1) By Lemmas B.3 and B.9, there exists some $m_0$-initialized iPTSOsyn-trace $\mathit{it}$ such that $\Lambda(\mathit{it}) = t$.
(A.2) By Lemma B.16, there exists some $m_0$-initialized iPx86-trace $\mathit{it}'$ such that $\Lambda(\mathit{it}') = \Lambda(\mathit{it})$.
(A.3) By Lemmas B.3 and B.6, $\Lambda(\mathit{it}')$ is an $m_0$-initialized Px86-observable-trace.
(A.4) Then, the claim follows observing that $\Lambda(\mathit{it}') = \Lambda(\mathit{it}) = t$.

Lemma B.14. For every $m_0$-initialized iPTSOsyn-trace $\mathit{it}$, there exists some PropFO-complete $m_0$-initialized iPTSOsyn-trace $\mathit{it}'$ such that $\Lambda(\mathit{it}) = \Lambda(\mathit{it}')$.

Proof. $\mathit{it}$ can be extended to some $\mathit{it}'$ so that every $\langle \_, \mathtt{RMW}(x, \_, v)\#s \rangle$, $\langle \_, \mathtt{PropW}(x)\#s \rangle$, and $\langle \_, \mathtt{PropFO}(x)\#s \rangle$ has a matching $\mathtt{PerW}(x)\#s$ or $\mathtt{PerFO}(x)\#s$. Indeed, since it is always possible to persist entries of the persistence buffer in order, we can simply append the corresponding labels in the order in which unmatched propagation events occur in $\mathit{it}$. □
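The extension step in this proof can be sketched on plain label sequences. The tuple encoding `(typ, loc, serial)` and the function name `complete` are illustrative assumptions; the sketch manipulates labels only and does not model buffers or check that the appended transitions are enabled.

```python
def complete(trace):
    """Append a persist label for every RMW, PropW, or PropFO label
    that has no matching persist label yet."""
    persisted = {s for (typ, _loc, s) in trace if typ in ("PerW", "PerFO")}
    extended = list(trace)
    for typ, loc, s in trace:  # unmatched labels, in their order of occurrence
        if s in persisted:
            continue
        if typ in ("RMW", "PropW"):
            extended.append(("PerW", loc, s))
        elif typ == "PropFO":
            extended.append(("PerFO", loc, s))
    return extended

# The write with serial 1 has persisted; the flush-optimal with serial 2 has not.
partial = [("PropW", "x", 1), ("PropFO", "x", 2), ("PerW", "x", 1)]
completed = complete(partial)
```

Appending in order of occurrence mirrors the observation that persistence-buffer entries can always be persisted in order.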


Lemma B.15. For every PropFO-complete $m_0$-initialized iPTSOsyn-trace $\mathit{it}$, there exists some synchronous PropFO-complete $m_0$-initialized iPTSOsyn-trace $\mathit{it}'$ such that $\Lambda(\mathit{it}) = \Lambda(\mathit{it}')$.

Proof sketch. We can transform $\mathit{it}$ into a synchronous PropFO-complete $m_0$-initialized iPTSOsyn-trace $\mathit{it}'$ simply by moving $\mathtt{PerW}(x)\#s$ and $\mathtt{PerFO}(x)\#s$ immediately after the matching $\langle \_, \mathtt{PropW}(x)\#s \rangle$, $\langle \_, \mathtt{RMW}(x, \_, v)\#s \rangle$, or $\langle \_, \mathtt{PropFO}(x)\#s \rangle$ labels in $\mathit{it}$. In a PropFO-complete trace, the writes to $x$ that do not persist always occur after $\mathtt{PerFO}(x)\#\_$ steps. With that observed, one can argue that considering propagation labels in order and moving their matching persist labels is possible, as the relevant persistence buffer constraints are satisfied by construction. □

Lemma B.16 (Step A.2). For every $m_0$-initialized iPTSOsyn-trace $\mathit{it}$, there exists some $m_0$-initialized iPx86-trace $\mathit{it}'$ such that $\Lambda(\mathit{it}') = \Lambda(\mathit{it})$.

Proof sketch. By Lemma B.14 applied to $\mathit{it}$, there exists some PropFO-complete $m_0$-initialized iPTSOsyn-trace $\mathit{it}_1$ such that $\Lambda(\mathit{it}) = \Lambda(\mathit{it}_1)$. Moreover, by Lemma B.15 applied to $\mathit{it}_1$, there exists some synchronous PropFO-complete $m_0$-initialized iPTSOsyn-trace $\mathit{it}'_1$ such that $\Lambda(\mathit{it}_1) = \Lambda(\mathit{it}'_1)$.

We further transform $\mathit{it}'_1$ into $\mathit{it}'$ by putting a persist step $\mathtt{PerPER}(x)\#s$ after each $\mathtt{PropFL}(x)\#s$, and by replacing $\mathtt{PerFO}(x)\#s$ after each $\mathtt{PropFO}(x)\#s$ with $\mathtt{PerPER}(x)\#s$. Note that the resulting trace is {PropFL, PropFO}-complete and synchronous.

We argue that $\mathit{it}'$ is an iPx86-trace. Indeed, for all but persistence steps, whenever iPTSOsyn performs a step, the same step is possible in iPx86. The persistence steps in $\mathit{it}'$ are enabled by construction, since their constraints on the content of the persistence buffer are trivially satisfied in a synchronous trace. Overall, we have constructed $\mathit{it}'$ that is an $m_0$-initialized iPx86-trace such that $\Lambda(\mathit{it}') = \Lambda(\mathit{it})$. □

B.5.3 Proof of (B). The proof of (B) is structured as follows:

(B.0) Let $t$ be an $m_0$-to-$m$ PTSOsyn-observable-trace.
(B.1) By Lemmas B.3 and B.9, there exists some $m_0$-to-$m$ iPTSOsyn-trace $\mathit{it}$ such that $\Lambda(\mathit{it}) = t$.
(B.2) By Lemma B.17, $\mathit{it}$ is also an $m_0$-to-$m$ iPTSO2-trace.
(B.3) By Lemma B.22, there exists some $m_0$-to-$m$ iPTSO1-trace $\mathit{it}_1$ such that $\Lambda(\mathit{it}_1) \lesssim \Lambda(\mathit{it})$.
(B.4) By Lemma B.23, there exists some $m_0$-to-$m$ iPx86-trace $\mathit{it}'$ such that $\Lambda(\mathit{it}') = \Lambda(\mathit{it}_1)$.
(B.5) By Lemmas B.3 and B.6, $\Lambda(\mathit{it}')$ is an $m_0$-to-$m$ Px86-observable-trace.
(B.6) Then, the claim follows observing that $\Lambda(\mathit{it}') = \Lambda(\mathit{it}_1) \lesssim \Lambda(\mathit{it}) = t$.

Lemma B.17. Every $m_0$-to-$m$ iPTSOsyn-trace $\mathit{it}$ is also an $m_0$-to-$m$ iPTSO2-trace.

Proof. Every transition of iPTSOsyn is also a transition of iPTSO2. �

Lemma B.18. For every $m_0$-to-$m$ iPTSO2-trace $\mathit{it}$, there exists some PropFO-complete $m_0$-to-$m$ iPTSO2-trace $\mathit{it}'$ such that $\Lambda(\mathit{it}') \lesssim \Lambda(\mathit{it})$.

Proof. We take $\mathit{it}'$ to be the trace obtained from $\mathit{it}$ by discarding all transition labels at an index $i$ with $\mathsf{typ}(\mathit{it}(i)) = \mathtt{PropFO}$ but $\#(\mathit{it}(j)) \neq \#(\mathit{it}(i))$ for every $j > i$. It is straightforward to verify that $\mathit{it}'$ is a PropFO-complete $m_0$-to-$m$ iPTSO2-trace, as well as that $\Lambda(\mathit{it}') \lesssim \Lambda(\mathit{it})$. □

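Proposition B.19. $\langle \alpha, \beta \rangle$ iPTSO2-commutes if $\mathsf{typ}(\beta) \in \{\mathtt{PerW}, \mathtt{PerFO}\}$ and one of the following conditions holds:

• $\mathsf{typ}(\alpha) \notin \{\mathtt{PerW}, \mathtt{PerFO}\}$ and $\#(\alpha) \neq \#(\beta)$.
• $\mathsf{typ}(\alpha) \in \{\mathtt{PerW}, \mathtt{PerFO}\}$ and $\mathsf{loc}(\alpha) \neq \mathsf{loc}(\beta)$.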

Lemma B.20. For every PropFO-complete $m_0$-to-$m$ iPTSO2-trace $\mathit{it}$, there exists some synchronous PropFO-complete $m_0$-to-$m$ iPTSO2-trace $\mathit{it}'$ such that $\Lambda(\mathit{it}') = \Lambda(\mathit{it})$.


Proof. By induction on the sum of delays in $\mathit{it}$ (i.e., $\sum_i d_{\mathit{it}}(i)$). If this sum is 0, then we can take $\mathit{it}' = \mathit{it}$. Otherwise, consider the minimal $1 \le i \le |\mathit{it}|$ with $d_{\mathit{it}}(i) > 0$. Then, we have $\mathsf{typ}(\mathit{it}(i)) \in \{\mathtt{RMW}, \mathtt{PropW}, \mathtt{PropFO}\}$ and $\#(\mathit{it}(j)) = \#(\mathit{it}(i))$ for $j = i + d_{\mathit{it}}(i) + 1$. Following iPTSO2's transitions, it must be the case that $\mathsf{loc}(\mathit{it}(j)) = \mathsf{loc}(\mathit{it}(i))$, $\mathsf{typ}(\mathit{it}(j)) = \mathtt{PerW}$ if $\mathsf{typ}(\mathit{it}(i)) \in \{\mathtt{RMW}, \mathtt{PropW}\}$, and $\mathsf{typ}(\mathit{it}(j)) = \mathtt{PerFO}$ if $\mathsf{typ}(\mathit{it}(i)) = \mathtt{PropFO}$. Now, it is straightforward to verify that $\langle \mathit{it}(j-1), \mathit{it}(j) \rangle$ must satisfy one of the conditions in Prop. B.19, and so this pair iPTSO2-commutes. The resulting PropFO-complete $m_0$-to-$m$ iPTSO2-trace has a smaller sum of delays, and the claim follows by applying the induction hypothesis. □
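Read as a procedure, the proof repeatedly picks the first index with a positive delay and swaps its persist label with the label just before it. The sketch below performs this rearrangement on label sequences only; the tuple encoding `(typ, loc, serial)`, the 0-based indices, and the names `delay` and `synchronize` are assumptions for illustration, and the sketch does not check that each swapped pair actually commutes in iPTSO2.

```python
DELAYED_TYPES = {"RMW", "PropW", "PropFO"}

def delay(trace, i):
    """Labels strictly between index i and its matching persist label."""
    typ, _loc, serial = trace[i]
    if typ not in DELAYED_TYPES:
        return 0
    for j in range(i + 1, len(trace)):
        if trace[j][2] == serial:
            return j - i - 1
    return 0

def synchronize(trace):
    """Move every persist label next to its propagation label by
    adjacent swaps, always working on the minimal delayed index."""
    trace = list(trace)
    while True:
        delayed = [i for i in range(len(trace)) if delay(trace, i) > 0]
        if not delayed:
            return trace
        i = delayed[0]
        j = i + delay(trace, i) + 1          # index of the matching persist label
        trace[j - 1], trace[j] = trace[j], trace[j - 1]

lazy = [("PropW", "x", 1), ("PropW", "y", 2), ("PerW", "y", 2), ("PerW", "x", 1)]
eager = synchronize(lazy)
```

The loop terminates because each swap lowers the delay of the minimal delayed index by one while leaving all earlier indices at delay 0.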

Lemma B.21. For every synchronous PropFO-complete $m_0$-to-$m$ iPTSO2-trace $\mathit{it}$, there exists some $m_0$-to-$m$ iPTSO1-trace $\mathit{it}'$ such that $\Lambda(\mathit{it}') = \Lambda(\mathit{it})$.

Proof. We obtain $\mathit{it}'$ by merging consecutive prop-fo and persist-fo steps in $\mathit{it}$ into one prop-fo step of iPTSO1, thus maintaining the persistence buffers without FO-entries. □
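On label sequences, the merge amounts to dropping each `PerFO` label that directly follows its `PropFO` label. The sketch below assumes labels encoded as `(typ, loc, serial)` tuples and a hypothetical function name `merge_fo_steps`; it is an illustration of the construction, not a check that the result is an iPTSO1-trace.

```python
def merge_fo_steps(trace):
    """Replace each adjacent pair PropFO(x)#s, PerFO(x)#s by a single
    PropFO(x)#s label, as in iPTSO1 where prop-fo persists at once."""
    merged = []
    k = 0
    while k < len(trace):
        typ, loc, s = trace[k]
        merged.append(trace[k])
        follows = k + 1 < len(trace) and trace[k + 1] == ("PerFO", loc, s)
        if typ == "PropFO" and follows:
            k += 2  # skip the matching persist-fo step
        else:
            k += 1
    return merged

synchronous = [("PropFO", "x", 1), ("PerFO", "x", 1),
               ("PropW", "x", 2), ("PerW", "x", 2)]
merged = merge_fo_steps(synchronous)
```

In a synchronous PropFO-complete trace every `PerFO` label directly follows its `PropFO` label, so no `PerFO` label survives the merge.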

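Lemma B.22 (Step B.3). For every $m_0$-to-$m$ iPTSO2-trace $\mathit{it}$, there exists some $m_0$-to-$m$ iPTSO1-trace $\mathit{it}'$ such that $\Lambda(\mathit{it}') \lesssim \Lambda(\mathit{it})$.

Proof. By Lemma B.18, there exists some PropFO-complete $m_0$-to-$m$ iPTSO2-trace $\mathit{it}_c$ such that $\Lambda(\mathit{it}_c) \lesssim \Lambda(\mathit{it})$. Then, by Lemma B.20, there exists a synchronous PropFO-complete $m_0$-to-$m$ iPTSO2-trace $\mathit{it}_s$ such that $\Lambda(\mathit{it}_s) = \Lambda(\mathit{it}_c)$. Then, by Lemma B.21, there exists an $m_0$-to-$m$ iPTSO1-trace $\mathit{it}'$ such that $\Lambda(\mathit{it}') = \Lambda(\mathit{it}_s)$. Now, since $\Lambda(\mathit{it}_c) \lesssim \Lambda(\mathit{it})$, $\Lambda(\mathit{it}_s) = \Lambda(\mathit{it}_c)$, and $\Lambda(\mathit{it}') = \Lambda(\mathit{it}_s)$, we have that $\Lambda(\mathit{it}') \lesssim \Lambda(\mathit{it})$, and the claim follows. □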

Lemma B.23 (Step B.4). For every $m_0$-to-$m$ iPTSO1-trace $\mathit{it}$, there exists some $m_0$-to-$m$ iPx86-trace $\mathit{it}'$ such that $\Lambda(\mathit{it}') = \Lambda(\mathit{it})$.

Proof sketch. We transform $\mathit{it}$ into $\mathit{it}'$ by putting a persist step $\mathtt{PerPER}(x)\#s$ after each occurrence of $\mathtt{PropFL}(x)\#s$ or $\mathtt{PropFO}(x)\#s$. All of the steps in $\mathit{it}'$ are trivially enabled in iPx86 by construction, so $\mathit{it}'$ is an $m_0$-to-$m$ iPx86-trace. □

B.5.4 Helper Lemmas for (C) and (D). To prove (C) and (D), we introduce several trace transformation properties for persisting synchronously.

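Proposition B.24. $\langle \alpha, \beta \rangle$ iPx86-commutes if $\mathsf{typ}(\beta) \in \{\mathtt{PerW}, \mathtt{PerPER}\}$ and one of the following conditions holds:

• $\mathsf{typ}(\alpha) \notin \{\mathtt{PerW}, \mathtt{PerPER}\}$ and $\#(\alpha) \neq \#(\beta)$.
• $\mathsf{typ}(\alpha) = \mathtt{PerW}$ and $\mathsf{loc}(\alpha) \neq \mathsf{loc}(\beta)$.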

Lemma B.25. For every {PropFL, PropFO}-complete $m_0$-to-$m$ iPx86-trace $\mathit{it}$, there exists some synchronous {PropFL, PropFO}-complete $m_0$-to-$m$ iPx86-trace $\mathit{it}'$ such that $\Lambda(\mathit{it}') = \Lambda(\mathit{it})$.

Proof. By induction on the sum of delays in $\mathit{it}$ (i.e., $\sum_i d_{\mathit{it}}(i)$). If this sum is 0, then we can take $\mathit{it}' = \mathit{it}$. Otherwise, consider the minimal $1 \le i \le |\mathit{it}|$ with $d_{\mathit{it}}(i) > 0$. Then, we have $\mathsf{typ}(\mathit{it}(i)) \in \{\mathtt{RMW}, \mathtt{PropW}, \mathtt{PropFL}, \mathtt{PropFO}\}$ and $\#(\mathit{it}(j)) = \#(\mathit{it}(i))$ for $j = i + d_{\mathit{it}}(i) + 1$. Following iPx86's transitions, it must be the case that $\mathsf{loc}(\mathit{it}(j)) = \mathsf{loc}(\mathit{it}(i))$, $\mathsf{typ}(\mathit{it}(j)) = \mathtt{PerW}$ if $\mathsf{typ}(\mathit{it}(i)) \in \{\mathtt{RMW}, \mathtt{PropW}\}$, and $\mathsf{typ}(\mathit{it}(j)) = \mathtt{PerPER}$ if $\mathsf{typ}(\mathit{it}(i)) \in \{\mathtt{PropFL}, \mathtt{PropFO}\}$. Consider the possible cases:

(1) $\mathsf{typ}(\mathit{it}(j-1)) \notin \{\mathtt{PerW}, \mathtt{PerPER}\}$: Then, by Prop. B.24, $\langle \mathit{it}(j-1), \mathit{it}(j) \rangle$ iPx86-commutes. The resulting PropFO-complete $m_0$-to-$m$ iPx86-trace has a smaller sum of delays, and the claim follows by applying the induction hypothesis.

(2) $\mathsf{typ}(\mathit{it}(j-1)) = \mathtt{PerW}$: The minimality of $i$ ensures that the index $i'$ with $\#(\mathit{it}(i')) = \#(\mathit{it}(j-1))$ satisfies $i' \ge i$. Following iPx86's transitions, we must have $\mathsf{loc}(\mathit{it}(j-1)) \neq \mathsf{loc}(\mathit{it}(j))$ (writes to the same location persist in their propagation order). Then, again, the claim follows using Prop. B.24 and the induction hypothesis.

(3) $\mathsf{typ}(\mathit{it}(j-1)) = \mathtt{PerPER}$: The minimality of $i$ ensures that the index $i'$ with $\#(\mathit{it}(i')) = \#(\mathit{it}(j-1))$ satisfies $i' \ge i$. Following iPx86's transitions, we must have $\mathsf{typ}(\mathit{it}(j)) = \mathtt{PerW}$ (PER-entries to the same location are removed from the persistence buffer in their propagation order), as well as $\mathsf{loc}(\mathit{it}(j-1)) \neq \mathsf{loc}(\mathit{it}(j))$ (a PER-entry cannot be removed from the persistence buffer if there is a preceding write entry to the same location). In this case we can swap $\mathit{it}(j-1)$ and $\mathit{it}(j)$, and, as before, obtain a PropFO-complete $m_0$-to-$m$ iPx86-trace, so the claim follows by the induction hypothesis. □

Lemma B.26 (Steps C.3 and D.3). For every $m_0$-to-$m$ iPTSO2-trace $\mathit{it}$, there exists some $m_0$-to-$m$ iPTSOsyn-trace $\mathit{it}'$ such that $\Lambda(\mathit{it}') = \Lambda(\mathit{it})$.

Proof (outline). We use a standard forward simulation argument, where iPTSOsyn eagerly takes prop-fo and persist-fo steps whenever possible. Then, iPTSOsyn is always at a state in which the flush-optimals are further propagated w.r.t. the corresponding state of iPTSO2 (e.g., a flush-optimal in iPTSO2's store buffer may already be in iPTSOsyn's persistence buffer). In this case, the flush-optimals impose only (possibly) weaker constraints on the transitions. For this argument to work we rely on the fact that a flush-optimal of a certain thread being further propagated does not impose constraints on actions of other threads.

More formally, we define a simulation relation $R$ between iPTSO2-states and iPTSOsyn-states. To define $R$ we use the notation $s|_T$ to restrict a sequence $s$ (which will be an instrumented per-location persistence buffer or an instrumented store buffer) to entries of type $\mathsf{X} \in T$ (yielding a possibly shorter sequence). The simulation relation $R \subseteq \mathsf{iPTSO2}.Q \times \mathsf{iPTSOsyn}.Q$ is defined as follows: $\langle \langle m_2, \mathit{iP}_2, \mathit{iB}_2, S_2 \rangle, \langle m, \mathit{iP}, \mathit{iB}, S \rangle \rangle \in R$ if the following hold:

• $m_2 = m$ and $S_2 = S$.
• For every $x \in \mathsf{Loc}$, $\mathit{iP}_2(x)|_{\{\mathtt{W}\}} = \mathit{iP}(x)|_{\{\mathtt{W}\}}$.
• For every $\tau \in \mathsf{Tid}$, $\mathit{iB}_2(\tau)|_{\{\mathtt{W},\mathtt{FL},\mathtt{SF}\}} = \mathit{iB}(\tau)|_{\{\mathtt{W},\mathtt{FL},\mathtt{SF}\}}$.
• If $\mathit{iB}(\tau)(i) \in \{\mathtt{W}(x,\_)\#\_, \mathtt{FL}(x)\#\_, \mathtt{SF}\#\_\}$ and $\mathit{iB}(\tau)(j) = \mathtt{FO}(x)\#\_$ for some $i < j$, then $\mathit{iB}_2(\tau)(i_2) = \mathit{iB}(\tau)(i)$ and $\mathit{iB}_2(\tau)(j_2) = \mathit{iB}(\tau)(j)$ for some $i_2 < j_2$.
• If $\mathit{iP}(x)(i) = \mathtt{W}(\_)\#\_$ and $\mathit{iP}(x)(j) = \mathtt{FO}(\tau)\#s$ for some $i < j$, then one of the following holds:
  – $\mathit{iP}_2(x)(i_2) = \mathit{iP}(x)(i)$ and $\mathit{iP}_2(x)(j_2) = \mathtt{FO}(\tau)\#s$ for some $i_2 < j_2$; or
  – $\mathit{iP}_2(x)(i_2) = \mathit{iP}(x)(i)$ and $\mathit{iB}_2(\tau)(j_2) = \mathtt{FO}(x)\#s$ for some $i_2$ and $j_2$.
• If $\mathit{iB}_2(\tau)(i_2) = \mathtt{SF}\#\_$ and $\mathit{iB}_2(\tau)(j_2) = \mathtt{FO}(\_)\#\_$ for some $i_2 < j_2$, then $\mathit{iB}(\tau)(i) = \mathit{iB}_2(\tau)(i_2)$ and $\mathit{iB}(\tau)(j) = \mathit{iB}_2(\tau)(j_2)$ for some $i < j$.
• If $\mathit{iB}(\tau)(j) = \mathtt{FO}(x)\#\_$, then $\mathit{iB}(\tau)(i) \in \{\mathtt{W}(x,\_)\#\_, \mathtt{FL}(x)\#\_, \mathtt{SF}\#\_\}$ for some $i < j$.
• If $\mathit{iP}(x)(j) = \mathtt{FO}(\_)\#\_$, then $\mathit{iP}(x)(i) = \mathtt{W}(\_)\#\_$ for some $i < j$.
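The restriction notation and the first three conditions of the simulation relation can be sketched as follows. The encoding is assumed for illustration: buffer entries are tuples whose first component is the entry type, a state is a tuple `(m, iP, iB, S)` with dictionaries for buffers, and the names `restrict` and `basic_conditions` are hypothetical. The remaining conditions of the relation are not modelled.

```python
def restrict(buffer, types):
    """The subsequence of `buffer` consisting of entries whose type is in `types`."""
    return [entry for entry in buffer if entry[0] in types]

def basic_conditions(state2, state_syn):
    """Check the first three conditions relating an iPTSO2 state to an
    iPTSOsyn state: equal memory and serial sets, equal write entries per
    location, and equal W/FL/SF entries per thread."""
    (m2, iP2, iB2, S2), (m, iP, iB, S) = state2, state_syn
    if m2 != m or S2 != S:
        return False
    if any(restrict(iP2[x], {"W"}) != restrict(iP[x], {"W"}) for x in iP):
        return False
    kept = {"W", "FL", "SF"}
    return all(restrict(iB2[t], kept) == restrict(iB[t], kept) for t in iB)

# In iPTSO2 the flush-optimal is still in t1's store buffer, whereas in
# iPTSOsyn it has already been propagated to x's persistence buffer.
state2 = ({"x": 0}, {"x": [("W", 1, 1)]}, {"t1": [("FO", "x", 2)]}, {1, 2})
state_syn = ({"x": 0}, {"x": [("W", 1, 1), ("FO", "t1", 2)]}, {"t1": []}, {1, 2})
```

The example reflects the intuition stated in the proof outline: the two states agree on everything except how far flush-optimals have been propagated.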

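Initially, we clearly have $\langle \langle m_0, P_\epsilon, B_\epsilon, \emptyset \rangle, \langle m_0, P_\epsilon, B_\epsilon, \emptyset \rangle \rangle \in R$. Now, suppose that $\langle m, \mathit{iP}_2, \mathit{iB}_2, S \rangle \xrightarrow{\alpha}_{\mathsf{iPTSO2}} \langle m', \mathit{iP}'_2, \mathit{iB}'_2, S' \rangle$, and let $\langle m_1, \mathit{iP}, \mathit{iB}, S_1 \rangle \in \mathsf{iPTSOsyn}.Q$ such that $\langle \langle m, \mathit{iP}_2, \mathit{iB}_2, S \rangle, \langle m_1, \mathit{iP}, \mathit{iB}, S_1 \rangle \rangle \in R$. Then, we have $m = m_1$ and $S = S_1$. We show that $\langle m, \mathit{iP}, \mathit{iB}, S \rangle \xrightarrow{t}_{\mathsf{iPTSOsyn}} \langle m', \mathit{iP}', \mathit{iB}', S' \rangle$ for some $t$, $\mathit{iP}'$, and $\mathit{iB}'$ such that $\Lambda(t) = \Lambda(\alpha)$ and $\langle \langle m', \mathit{iP}'_2, \mathit{iB}'_2, S' \rangle, \langle m', \mathit{iP}', \mathit{iB}', S' \rangle \rangle \in R$.

Roughly speaking, to obtain this we will make iPTSOsyn take persist-fo steps as eagerly as possible after every other step. (Thus, when iPTSO2 takes a prop-fo or persist-fo step, iPTSOsyn remains in the same state.) The rest of the proof continues by separately considering each possible step of iPTSO2, and establishing the simulation invariants at each step. For example, suppose that $\langle m, \mathit{iP}_2, \mathit{iB}_2, S \rangle \xrightarrow{\tau,\, \mathtt{PropW}(x)\#s}_{\mathsf{iPTSO2}} \langle m', \mathit{iP}'_2, \mathit{iB}'_2, S' \rangle$. Then, the simulation invariants ensure that $\langle m, \mathit{iP}, \mathit{iB}, S \rangle \xrightarrow{\tau,\, \mathtt{PropW}(x)\#s}_{\mathsf{iPTSOsyn}} \langle m', \mathit{iP}_{\mathrm{mid}}, \mathit{iB}_{\mathrm{mid}}, S' \rangle$ for some $\mathit{iP}_{\mathrm{mid}}$ and $\mathit{iB}_{\mathrm{mid}}$. Then, to establish the simulation invariant, we repeatedly execute prop-fo and persist-fo steps as long as it is possible and obtain the state $\langle m', \mathit{iP}', \mathit{iB}', S' \rangle$. □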

B.5.5 Proof of (C). The proof of (C) is structured as follows:

(C.0) Let $t$ be an $m_0$-initialized Px86-observable-trace.
(C.1) By Lemmas B.3 and B.6, there exists some $m_0$-initialized iPx86-trace $\mathit{it}$ such that $\Lambda(\mathit{it}) = t$.
(C.2) By Lemma B.28, there exists some $m_0$-initialized iPTSO2-trace $\mathit{it}_2$ such that $\Lambda(\mathit{it}_2) = \Lambda(\mathit{it})$.
(C.3) By Lemma B.26, there exists some $m_0$-initialized iPTSOsyn-trace $\mathit{it}'_2$ such that $\Lambda(\mathit{it}'_2) = \Lambda(\mathit{it}_2)$.
(C.4) By Lemmas B.3 and B.9, $\Lambda(\mathit{it}'_2)$ is an $m_0$-initialized PTSOsyn-observable-trace.
(C.5) Then, the claim follows observing that $\Lambda(\mathit{it}'_2) = \Lambda(\mathit{it}_2) = \Lambda(\mathit{it}) = t$.

The next lemma states that every trace can be continued to empty the content of its persistence buffer.

Lemma B.27. For every $m_0$-initialized iPx86-trace $\mathit{it}$, there exists some {PropFL, PropFO}-complete $m_0$-initialized iPx86-trace $\mathit{it}'$ such that $\Lambda(\mathit{it}) = \Lambda(\mathit{it}')$.

Proof sketch. $\mathit{it}$ can be extended to some $\mathit{it}'$ so that every $\langle \_, \mathtt{RMW}(x, \_, v)\#s \rangle$, $\langle \_, \mathtt{PropW}(x)\#s \rangle$, $\langle \_, \mathtt{PropFO}(x)\#s \rangle$, or $\langle \_, \mathtt{PropFL}(x)\#s \rangle$ has a matching $\mathtt{PerW}(x)\#s$ or $\mathtt{PerPER}(x)\#s$. Indeed, since it is always possible to persist entries of the persistence buffer in order, we can simply append the corresponding labels in the order in which unmatched propagation events occur in $\mathit{it}$. □

Lemma B.28 (Step C.2). For every $m_0$-initialized iPx86-trace $\mathit{it}$, there exists some $m_0$-initialized iPTSO2-trace $\mathit{it}_2$ such that $\Lambda(\mathit{it}_2) = \Lambda(\mathit{it})$.

Proof sketch. By Lemma B.27 applied to $\mathit{it}$, there is some {PropFL, PropFO}-complete iPx86-trace $\mathit{it}_1$ such that $\Lambda(\mathit{it}) = \Lambda(\mathit{it}_1)$. Moreover, by applying Lemma B.25 to $\mathit{it}_1$, there is some synchronous {PropFL, PropFO}-complete $m_0$-initialized iPx86-trace $\mathit{it}'_1$ such that $\Lambda(\mathit{it}_1) = \Lambda(\mathit{it}'_1)$.

We transform $\mathit{it}'_1$ further into $\mathit{it}_2$ by removing every $\mathtt{PerPER}(x)\#s$ following $\langle \_, \mathtt{PropFL}(x)\#s \rangle$, and by replacing every $\mathtt{PerPER}(x)\#s$ following $\langle \_, \mathtt{PropFO}(x)\#s \rangle$ with $\mathtt{PerFO}(x)\#s$.

We argue that $\mathit{it}_2$ is an iPTSO2-trace. Indeed, by construction of $\mathit{it}'$, each persistence buffer $\mathit{iP}(x)$ only contains $\mathtt{FO}(\tau)\#s$-entries right before the step propagating them from the buffer takes place. Moreover, each persistence buffer $\mathit{iP}(x)$ does not contain $\mathtt{W}(v)\#s$-entries upon executing $\langle \_, \mathtt{PropFL}(x)\#s \rangle$ steps, since the conditions for persisting flush instructions in $\mathit{it}'_1$ ensure that such writes previously persisted. Hence, the constraints on the content of the persistence buffers are satisfied in iPTSO2 by construction. □

B.5.6 Proof of (D). The proof of (D) is structured as follows:

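(D.0) Let $t$ be an $m_0$-to-$m$ Px86-observable-trace.
(D.1) By Lemmas B.3 and B.6, there exists some $m_0$-to-$m$ iPx86-trace $\mathit{it}$ such that $\Lambda(\mathit{it}) = t$.
(D.2) By Lemma B.31, there exists some $m_0$-to-$m$ iPTSO1-trace $\mathit{it}_1$ such that $\Lambda(\mathit{it}_1) \lesssim \Lambda(\mathit{it})$.
(D.3) By Lemma B.32, there exists some $m_0$-to-$m$ iPTSO2-trace $\mathit{it}_2$ such that $\Lambda(\mathit{it}_2) = \Lambda(\mathit{it}_1)$.
(D.4) By Lemma B.26, there exists some $m_0$-to-$m$ iPTSOsyn-trace $\mathit{it}'_2$ such that $\Lambda(\mathit{it}'_2) = \Lambda(\mathit{it}_2)$.
(D.5) By Lemmas B.3 and B.9, $\Lambda(\mathit{it}'_2)$ is an $m_0$-to-$m$ PTSOsyn-observable-trace.
(D.6) Then, the claim follows observing that $\Lambda(\mathit{it}'_2) = \Lambda(\mathit{it}_2) = \Lambda(\mathit{it}_1) \lesssim \Lambda(\mathit{it}) = t$.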

Lemma B.29. For every $m_0$-to-$m$ iPx86-trace $\mathit{it}$, there exists some {PropFL, PropFO}-complete $m_0$-to-$m$ iPx86-trace $\mathit{it}'$ such that $\Lambda(\mathit{it}') \lesssim \Lambda(\mathit{it})$.


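Proof. Let $i_0$ be the minimal index for which $\mathsf{typ}(\mathit{it}(i_0)) \in \{\mathtt{PropFL}, \mathtt{PropFO}\}$ but $\#(\mathit{it}(j)) \neq \#(\mathit{it}(i_0))$ for every $j > i_0$. Let $i_1, \ldots, i_m$ be an enumeration of all indices $i > i_0$ with $\mathsf{typ}(\mathit{it}(i)) \in \{\mathtt{PerW}, \mathtt{PerPER}\}$. We define $\mathit{it}' = \mathit{it}(1), \ldots, \mathit{it}(i_0 - 1), \mathit{it}(i_1), \ldots, \mathit{it}(i_m)$. We trivially have that $\Lambda(\mathit{it}') \lesssim \Lambda(\mathit{it})$. To see that $\mathit{it}'$ is a ({PropFL, PropFO}-complete) iPx86-trace, it suffices to note that the transitions of iPx86 ensure that for every $1 \le j \le m$ with $\mathsf{typ}(\mathit{it}(i_j)) = \mathtt{PerW}$, we have $\mathsf{typ}(\mathit{it}(k)) \in \{\mathtt{RMW}, \mathtt{PropW}\}$ and $\#(\mathit{it}(k)) = \#(\mathit{it}(i_j))$ for some $k < i_0$; and for every $1 \le j \le m$ with $\mathsf{typ}(\mathit{it}(i_j)) = \mathtt{PerPER}$, we have $\mathsf{typ}(\mathit{it}(k)) \in \{\mathtt{PropFL}, \mathtt{PropFO}\}$ and $\#(\mathit{it}(k)) = \#(\mathit{it}(i_j))$ for some $k < i_0$. Finally, since $\mathit{it}'$ includes all PerW transitions of $\mathit{it}$, it is an $m_0$-to-$m$ iPx86-trace. □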
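The construction of the shortened trace can be sketched on label sequences. The encoding `(typ, loc, serial)`, the 0-based indices, and the name `cut_at_unpersisted_flush` are assumptions for illustration; the sketch only selects labels and does not verify enabledness in iPx86.

```python
def cut_at_unpersisted_flush(trace):
    """Keep the prefix before the first PropFL/PropFO label that never
    persists, followed by all later PerW/PerPER labels in their order."""
    i0 = None
    for i, (typ, _loc, s) in enumerate(trace):
        unmatched = all(s2 != s for (_, _, s2) in trace[i + 1:])
        if typ in ("PropFL", "PropFO") and unmatched:
            i0 = i
            break
    if i0 is None:
        return list(trace)  # no unmatched flush label: nothing to cut
    later_persists = [lab for lab in trace[i0 + 1:]
                      if lab[0] in ("PerW", "PerPER")]
    return list(trace[:i0]) + later_persists

# The flush-optimal on y (serial 2) never persists; the write to x
# (serial 1) persists only after it.
original = [("PropW", "x", 1), ("PropFO", "y", 2),
            ("PropW", "z", 3), ("PerW", "x", 1)]
shortened = cut_at_unpersisted_flush(original)
```

In the example, the persist step of the earlier write is kept, which is why the final non-volatile memory is unchanged by the cut.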

Lemma B.30. For every synchronous {PropFL, PropFO}-complete $m_0$-to-$m$ iPx86-trace $\mathit{it}$, there exists some $m_0$-to-$m$ iPTSO1-trace $\mathit{it}'$ such that $\Lambda(\mathit{it}') = \Lambda(\mathit{it})$.

Proof. We obtain $\mathit{it}'$ by merging consecutive prop-fl/prop-fo and persist-per steps in $\mathit{it}$ into one prop-fl/prop-fo step of iPTSO1, thus maintaining the persistence buffers without PER-entries. □

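Lemma B.31 (Step D.2). For every $m_0$-to-$m$ iPx86-trace $\mathit{it}$, there exists some $m_0$-to-$m$ iPTSO1-trace $\mathit{it}'$ such that $\Lambda(\mathit{it}') \lesssim \Lambda(\mathit{it})$.

Proof. By Lemma B.29, there exists some {PropFL, PropFO}-complete $m_0$-to-$m$ iPx86-trace $\mathit{it}_c$ such that $\Lambda(\mathit{it}_c) \lesssim \Lambda(\mathit{it})$. Then, by Lemma B.25, there exists a synchronous {PropFL, PropFO}-complete $m_0$-to-$m$ iPx86-trace $\mathit{it}_s$ such that $\Lambda(\mathit{it}_s) = \Lambda(\mathit{it}_c)$. Then, by Lemma B.30, there exists an $m_0$-to-$m$ iPTSO1-trace $\mathit{it}'$ such that $\Lambda(\mathit{it}') = \Lambda(\mathit{it}_s)$. Now, since $\Lambda(\mathit{it}_c) \lesssim \Lambda(\mathit{it})$, $\Lambda(\mathit{it}_s) = \Lambda(\mathit{it}_c)$, and $\Lambda(\mathit{it}') = \Lambda(\mathit{it}_s)$, we have that $\Lambda(\mathit{it}') \lesssim \Lambda(\mathit{it})$, and the claim follows. □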

Lemma B.32 (Step D.3). For every $m_0$-to-$m$ iPTSO1-trace $\mathit{it}$, there exists some $m_0$-to-$m$ iPTSO2-trace $\mathit{it}'$ such that $\Lambda(\mathit{it}') = \Lambda(\mathit{it})$.

Proof sketch. iPTSO2 can simulate iPTSO1 by taking a persist-fo step immediately after every prop-fo step, keeping the persistence buffers without any $\mathtt{FO}(\_)\#\_$ entries. □

B.6 Proof of Lemma 4.5

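Lemma 4.5. Suppose that $\langle m_0, P_\epsilon, B_\epsilon \rangle \stackrel{t}{\Longrightarrow}_{\mathsf{PTSOsyn}} \langle m, P, B \rangle$. Then:

• $\langle m_0, P_\epsilon, B_\epsilon \rangle \stackrel{t}{\Longrightarrow}_{\mathsf{PTSOsyn}} \langle m', P', B_\epsilon \rangle$ for some $m'$ and $P'$.
• $\langle m_0, P_\epsilon, B_\epsilon \rangle \stackrel{t'}{\Longrightarrow}_{\mathsf{PTSOsyn}} \langle m, P, B_\epsilon \rangle$ for some $t' \lesssim t$.

Proof. The first item is trivial (we can simply propagate and persist whatever is needed at the end of the trace). We prove the second using the instrumented system iPTSOsyn. By Lemmas B.3 and B.9, there exist $\mathit{it}$, $\mathit{iP}$, $\mathit{iB}$, and $S \subseteq \mathbb{N}$, such that $\langle m_0, P_\epsilon, B_\epsilon, \emptyset \rangle \xrightarrow{\mathit{it}}_{\mathsf{iPTSOsyn}} \langle m, \mathit{iP}, \mathit{iB}, S \rangle$, $\Lambda(\mathit{it}) = t$, $\Lambda(\mathit{iP}) = P$, and $\Lambda(\mathit{iB}) = B$. For every $\tau \in \mathsf{Tid}$, let $i_\tau$ be the minimal index such that $\mathsf{tid}(\mathit{it}(i_\tau)) = \tau$, $\mathsf{typ}(\mathit{it}(i_\tau)) \in \{\mathtt{W}, \mathtt{FL}, \mathtt{FO}, \mathtt{SF}\}$, and $\#(\mathit{it}(j)) \neq \#(\mathit{it}(i_\tau))$ for every $j > i_\tau$ (that is, the operation in index $i_\tau$ never propagated from the store buffer). If such an index does not exist, we let $i_\tau = \bot$. For every $\tau \in \mathsf{Tid}$, let $I_\tau$ be the set of all indices $i \ge i_\tau$ such that $\mathsf{tid}(\mathit{it}(i)) = \tau$ and $\mathsf{typ}(\mathit{it}(i)) \in \{\mathtt{W}, \mathtt{R}, \mathtt{RMW}, \mathtt{R\text{-}ex}, \mathtt{MF}, \mathtt{FL}, \mathtt{FO}, \mathtt{SF}\}$ (that is, the operation in index $i$ was issued after an operation that never propagated from the store buffer). If $i_\tau = \bot$, we let $I_\tau = \emptyset$. Now, let $\mathit{it}'$ be the sequence obtained from $\mathit{it}$ by omitting for every $\tau \in \mathsf{Tid}$ all transition labels in indices $I_\tau$, and further omitting $\mathit{it}(j)$ if $\#(\mathit{it}(j)) = \#(\mathit{it}(i))$ for some $i \in I_\tau$ (that is, we remove the operations in $I_\tau$ and their corresponding propagation operations). Note that such $j$ can only exist if $\mathsf{typ}(\mathit{it}(i)) = \mathtt{FO}$. It is easy to see that $\langle m_0, P_\epsilon, B_\epsilon, \emptyset \rangle \xrightarrow{\mathit{it}'}_{\mathsf{iPTSOsyn}} \langle m, \mathit{iP}, B_\epsilon, S' \rangle$ for some $S'$ (in particular, all operations of threads $\pi \neq \tau$, as well as all propagation operations, are oblivious to the contents of $B(\tau)$). Going back to the non-instrumented system, by Lemmas B.3 and B.9, we obtain that $\langle m_0, P_\epsilon, B_\epsilon \rangle \stackrel{\Lambda(\mathit{it}')}{\Longrightarrow}_{\mathsf{PTSOsyn}} \langle m, \Lambda(\mathit{iP}), B_\epsilon \rangle$. It is also easy to see that our construction ensures that $\Lambda(\mathit{it}') \lesssim t$. □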
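The pruning of the instrumented trace in this proof can be sketched as follows. The encoding is assumed for illustration: a label is a tuple `(tid, typ, serial)` with `tid` set to `None` for persist steps, indices are 0-based, and the name `prune_unpropagated` is hypothetical. Because an issued operation and its propagation and persist steps share a serial number, removing by serial number removes exactly the labels described above.

```python
ISSUED = {"W", "R", "RMW", "R-ex", "MF", "FL", "FO", "SF"}
BUFFERED = {"W", "FL", "FO", "SF"}

def prune_unpropagated(trace):
    """For each thread, drop every operation issued from its first
    never-propagated buffered operation onwards, together with all
    labels carrying the serial number of a dropped operation."""
    dropped = set()
    threads = {tid for (tid, _typ, _s) in trace if tid is not None}
    for tau in threads:
        i_tau = next(
            (i for i, (tid, typ, s) in enumerate(trace)
             if tid == tau and typ in BUFFERED
             and all(s2 != s for (_, _, s2) in trace[i + 1:])),
            None)
        if i_tau is None:
            continue  # every buffered operation of tau was propagated
        dropped |= {s for (tid, typ, s) in trace[i_tau:]
                    if tid == tau and typ in ISSUED}
    return [label for label in trace if label[2] not in dropped]

# Thread t1's write with serial 2 never leaves the store buffer, while the
# later flush-optimal with serial 3 overtakes it and is propagated.
original = [("t1", "W", 1), ("t1", "PropW", 1), (None, "PerW", 1),
            ("t1", "W", 2), ("t1", "FO", 3), ("t1", "PropFO", 3),
            ("t2", "R", 4)]
pruned = prune_unpropagated(original)
```

The example shows the one case singled out in the proof: a removed operation whose propagation step also has to be removed is a flush-optimal.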

C PROOFS FOR SECTION 5

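Lemma 5.14. The following conditions together ensure that $M$ observationally refines $D$:

(i) For every $m_0$-initialized $M$-observable-trace $t$, there exists a $D$-consistent $m_0$-initialized execution graph $G$ such that $t \in \mathsf{traces}(G)$.
(ii) For every $m_0$-to-$m$ $M$-observable-trace $t$, there exist $t' \lesssim t$ and a $D$-consistent $m_0$-initialized execution graph $G$ such that $t' \in \mathsf{traces}(G)$ and $m(G) = m$.

Proof. Suppose that $q \in \mathit{Pr}.Q$ is reachable under $M$. Then, by definition, $\langle q, m, \tilde{m} \rangle$ is reachable in $\mathit{Pr} \parallel M$ for some $\langle m, \tilde{m} \rangle \in M.Q$. Thus, there exist crashless observable program traces $t_0, \ldots, t_n$, initial program states $q_0, \ldots, q_n \in \mathit{Pr}.Q_{\mathsf{Init}}$, initial non-volatile memories $m_1, \ldots, m_n \in \mathsf{Loc} \to \mathsf{Val}$, and initial volatile states $\tilde{m}_0, \ldots, \tilde{m}_n \in M.\tilde{Q}_{\mathsf{Init}}$, such that the following hold:

• $\langle q_0, m_{\mathsf{Init}}, \tilde{m}_0 \rangle \stackrel{t_0}{\Longrightarrow}_{\mathit{Pr} \parallel M} \langle \_, m_1, \_ \rangle$, and $\langle q_i, m_i, \tilde{m}_i \rangle \stackrel{t_i}{\Longrightarrow}_{\mathit{Pr} \parallel M} \langle \_, m_{i+1}, \_ \rangle$ for every $1 \le i \le n-1$.
• $\langle q_n, m_n, \tilde{m}_n \rangle \stackrel{t_n}{\Longrightarrow}_{\mathit{Pr} \parallel M} \langle q, \_, \_ \rangle$.

By Prop. A.1, it follows that:

• $q_i \stackrel{t_i}{\Longrightarrow}_{\mathit{Pr}} \_$ for every $0 \le i \le n-1$, and $q_n \stackrel{t_n}{\Longrightarrow}_{\mathit{Pr}} q$.
• $t_0$ is an $m_{\mathsf{Init}}$-to-$m_1$ $M$-observable-trace, and $t_i$ is an $m_i$-to-$m_{i+1}$ $M$-observable-trace for every $1 \le i \le n-1$.
• $t_n$ is an $m_n$-initialized $M$-observable-trace.

Then, assumption (ii) entails that there exist $t'_0, \ldots, t'_{n-1}$ and $D$-consistent execution graphs $G_0, \ldots, G_{n-1}$ such that the following hold:

• $t'_i \lesssim t_i$ for every $0 \le i \le n-1$.
• $t'_i \in \mathsf{traces}(G_i)$ for every $0 \le i \le n-1$.
• $G_0$ is $m_{\mathsf{Init}}$-initialized and $m(G_0) = m_1$.
• For every $1 \le i \le n-1$, $G_i$ is $m_i$-initialized and $m(G_i) = m_{i+1}$.

Now, since $q_i \stackrel{t_i}{\Longrightarrow}_{\mathit{Pr}} \_$ and $t'_i \lesssim t_i$ for every $0 \le i \le n-1$, by Prop. 2.4, we have $q_i \stackrel{t'_i}{\Longrightarrow}_{\mathit{Pr}} \_$ for every $0 \le i \le n-1$. Since $t'_i \in \mathsf{traces}(G_i)$ for every $0 \le i \le n-1$, by Prop. 5.12, it follows that $G_i$ is generated by $\mathit{Pr}$ for every $0 \le i \le n-1$.

In addition, assumption (i) entails that there exists a $D$-consistent $m_n$-initialized execution graph $G_n$ such that $t_n \in \mathsf{traces}(G_n)$. Since $q_n \stackrel{t_n}{\Longrightarrow}_{\mathit{Pr}} q$, by Prop. 5.12, it follows that $G_n$ is generated by $\mathit{Pr}$ with final state $q$.

It follows that $G_0, \ldots, G_n$ are $D$-consistent execution graphs that satisfy the conditions of Def. 5.13, so that $q$ is reachable under $D$. □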

Lemma 5.15. If for every $D$-consistent initialized execution graph $G$, some $t \in \mathsf{traces}(G)$ is an $m_{\mathsf{Init}}(G)$-to-$m(G)$ $M$-observable-trace, then $D$ observationally refines $M$.

Proof. Suppose that $q \in \mathit{Pr}.Q$ is reachable under $D$. Let $G_0, \ldots, G_n$ be $D$-consistent execution graphs that satisfy the conditions of Def. 5.13. Our assumption entails that there exist $t_0, \ldots, t_n$ such that for every $1 \le i \le n$, $t_i \in \mathsf{traces}(G_i)$ and $t_i$ is an $m_{\mathsf{Init}}(G_i)$-to-$m(G_i)$ $M$-observable-trace. Let $\tilde{m}_0, \ldots, \tilde{m}_n \in M.\tilde{Q}_{\mathsf{Init}}$ such that $\langle m_{\mathsf{Init}}(G_i), \tilde{m}_i \rangle \stackrel{t_i}{\Longrightarrow}_{M} \langle m(G_i), \_ \rangle$ for every $1 \le i \le n$.

By Prop. 5.11, since $G_i$ is generated by $\mathit{Pr}$ for every $0 \le i \le n-1$, there exist initial program states $q_0, \ldots, q_{n-1} \in \mathit{Pr}.Q_{\mathsf{Init}}$, such that $q_i \stackrel{t_i}{\Longrightarrow}_{\mathit{Pr}} \_$ for every $0 \le i \le n-1$. Using Prop. A.1, it follows that $\langle q_i, m_{\mathsf{Init}}(G_i), \tilde{m}_i \rangle \stackrel{t_i}{\Longrightarrow}_{\mathit{Pr} \parallel M} \langle \_, m(G_i), \_ \rangle$ for every $0 \le i \le n-1$.

In addition, since $G_n$ is generated by $\mathit{Pr}$ with final state $q$, there exists an initial program state $q_n \in \mathit{Pr}.Q_{\mathsf{Init}}$, such that $q_n \stackrel{t_n}{\Longrightarrow}_{\mathit{Pr}} q$. Using Prop. A.1, it follows that $\langle q_n, m_{\mathsf{Init}}(G_n), \tilde{m}_n \rangle \stackrel{t_n}{\Longrightarrow}_{\mathit{Pr} \parallel M} \langle q, m(G_n), \_ \rangle$.

Now, since $m_{\mathsf{Init}}(G_0) = m_{\mathsf{Init}}$ and $m_{\mathsf{Init}}(G_i) = m(G_{i-1})$ for every $1 \le i \le n$, it follows that $\langle q, m(G_n), \tilde{m} \rangle$ is reachable in $\mathit{Pr} \parallel M$ for some $\tilde{m} \in M.\tilde{Q}$. □

The following property of ppo is useful below:

Lemma C.1. $G.\mathsf{ppo} \,;\, [\mathsf{R}] \,;\, G.\mathsf{po} \subseteq G.\mathsf{ppo}$.

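Lemma 5.24. Let $\mathsf{tpo}$ be a propagation order for an execution graph $G$ for which the conditions of Def. 5.19 hold. Then, $G.\mathsf{ppo} \cup G.\mathsf{rfe} \cup \mathsf{tpo} \cup G.\mathsf{fr}(\mathsf{tpo})$ is acyclic.

Proof. In this proof we consider a single graph $G$, and thus omit the "$G.$" prefix from all notations.

Consider a cycle in $\mathsf{ppo} \cup \mathsf{rfe} \cup \mathsf{tpo} \cup \mathsf{fr}(\mathsf{tpo})$ of minimal length. The fact that $\mathsf{tpo}$ is total on $\mathsf{P}$ and the minimality of the cycle imply that this cycle may contain at most two events in $\mathsf{P}$.

If the cycle contains no events in $\mathsf{P}$, then it must consist solely of $\mathsf{ppo}$-edges, which contradicts the fact that $\mathsf{po}$ is irreflexive.

If the cycle contains one event in $\mathsf{P}$, then we must have $\langle e, e \rangle \in (\mathsf{ppo} \cup \mathsf{rfe}) \,;\, \mathsf{ppo}^{+} \,;\, (\mathsf{ppo} \cup \mathsf{fr}(\mathsf{tpo}))$ for some $e \in \mathsf{E}$, which implies that one of the following holds:

(i) $\langle e, e \rangle \in \mathsf{ppo}^{+} \subseteq \mathsf{po}$,
(ii) $\langle e, e \rangle \in \mathsf{ppo}^{+} \,;\, \mathsf{fr}(\mathsf{tpo}) \subseteq \mathsf{po} \,;\, \mathsf{fr}(\mathsf{tpo})$,
(iii) $\langle e, e \rangle \in \mathsf{rfe} \,;\, \mathsf{ppo}^{+} \subseteq \mathsf{rfe} \,;\, \mathsf{po}$, or
(iv) $\langle e, e \rangle \in \mathsf{rfe} \,;\, \mathsf{ppo}^{+} \,;\, \mathsf{fr}(\mathsf{tpo}) \subseteq \mathsf{rfe} \,;\, \mathsf{po} \,;\, \mathsf{fr}(\mathsf{tpo})$.

Each of these options contradicts one of the conditions of Def. 5.19.

Finally, suppose that the cycle contains two events in $\mathsf{P}$. Then, from the fact that $\mathsf{tpo}$ is total on $\mathsf{P}$, there must exist some $\langle e_1, e_2 \rangle \in \mathsf{tpo}$, such that $\langle e_2, e_1 \rangle \in \mathsf{ppo} \cup \mathsf{rfe} \cup \mathsf{fr}(\mathsf{tpo})$ or $\langle e_2, e_1 \rangle \in (\mathsf{ppo} \cup \mathsf{rfe}) \,;\, [\mathsf{R}] \,;\, \mathsf{ppo}^{*} \,;\, (\mathsf{ppo} \cup \mathsf{fr}(\mathsf{tpo}))$. The first case leads to a contradiction since the conditions of Def. 5.19 ensure that $\mathsf{tpo} \,;\, \mathsf{ppo}$, $\mathsf{tpo} \,;\, \mathsf{rfe}$, and $\mathsf{tpo} \,;\, \mathsf{fr}(\mathsf{tpo})$ are all irreflexive. It follows that one of the following holds:

(i) $\langle e_2, e_2 \rangle \in \mathsf{ppo} \,;\, [\mathsf{R}] \,;\, \mathsf{ppo}^{+} \,;\, \mathsf{tpo} \subseteq \mathsf{ppo} \,;\, [\mathsf{R}] \,;\, \mathsf{po} \,;\, \mathsf{tpo} \subseteq \mathsf{ppo} \,;\, \mathsf{tpo}$ (by Lemma C.1),
(ii) $\langle e_2, e_2 \rangle \in \mathsf{ppo} \,;\, [\mathsf{R}] \,;\, \mathsf{ppo}^{*} \,;\, \mathsf{fr}(\mathsf{tpo}) \,;\, \mathsf{tpo}$,
(iii) $\langle e_2, e_2 \rangle \in \mathsf{rfe} \,;\, \mathsf{ppo}^{+} \,;\, \mathsf{tpo} \subseteq \mathsf{rfe} \,;\, \mathsf{po} \,;\, \mathsf{tpo}$, or
(iv) $\langle e_2, e_2 \rangle \in \mathsf{rfe} \,;\, \mathsf{ppo}^{*} \,;\, \mathsf{fr}(\mathsf{tpo}) \,;\, \mathsf{tpo} \subseteq \mathsf{tpo} \cup \mathsf{rfe} \,;\, \mathsf{po} \,;\, \mathsf{fr}(\mathsf{tpo}) \,;\, \mathsf{tpo}$.

As before, each of these options contradicts one of the conditions of Def. 5.19. The least trivial case is (ii): suppose that $\langle e_2, e_2 \rangle \in \mathsf{ppo} \,;\, [\mathsf{R}] \,;\, \mathsf{ppo}^{*} \,;\, \mathsf{fr}(\mathsf{tpo}) \,;\, \mathsf{tpo}$. Then, it must be the case that $e_2 \in \mathsf{RMW} \cup \mathsf{R\text{-}ex} \cup \mathsf{MF}$, and so $\langle e_2, e_2 \rangle \in \mathsf{po} \,;\, \mathsf{fr}(\mathsf{tpo}) \,;\, \mathsf{tpo} \,;\, [\mathsf{RMW} \cup \mathsf{R\text{-}ex} \cup \mathsf{MF}]$, which contradicts Def. 5.19. □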
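For a concrete finite execution graph, the acyclicity stated by the lemma can be tested mechanically. The sketch below is a generic depth-first cycle check over a union of relations given as sets of pairs; the event names and the two example graphs (a store-buffering shape without and with fences) are assumptions for illustration and are not taken from the source.

```python
def is_acyclic(edges):
    """Return True if the directed graph given by `edges` has no cycle."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set())
    ON_STACK, DONE = 1, 2
    state = {}

    def visit(node):
        state[node] = ON_STACK
        for succ in graph[node]:
            if state.get(succ) == ON_STACK:
                return False  # back edge: a cycle exists
            if succ not in state and not visit(succ):
                return False
        state[node] = DONE
        return True

    return all(visit(n) for n in graph if n not in state)

# Store buffering: thread 1 writes x then reads y, thread 2 writes y then
# reads x, and both reads observe the initial values.
tpo = {("wx", "wy")}                  # propagation order on the writes
fr = {("ry", "wy"), ("rx", "wx")}     # each read is from-read before the write
ppo_plain = set()                     # write-to-read order is not preserved
ppo_fenced = {("wx", "ry"), ("wy", "rx")}  # as if fences ordered them
```

With `ppo_plain` the union is acyclic, so the weak outcome is not ruled out by this check; with `ppo_fenced` the union contains the cycle wx, ry, wy, rx, wx.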

Theorem 5.29 is obtained from the following two theorems (one for each direction):

Theorem C.2. PTSOsyn observationally refines DPTSOsyn.

Proof (outline). Using Lemma 5.14, it suffices to show that:

• For every m0-initialized PTSOsyn-observable-trace t, there exists a DPTSOsyn-consistent m0-initialized execution graph G such that t ∈ traces(G).

Proc. ACM Program. Lang., Vol. 1, No. CONF, Article 1. Publication date: January 2021.

Taming x86-TSO Persistency (Extended Version) 1:47

• For every m0-to-m PTSOsyn-observable-trace t, there exist t′ ≲ t and an m0-initialized DPTSOsyn-consistent execution graph G such that t′ ∈ traces(G) and m(G) = m.

Using Lemma 4.5, it suffices to prove that 〈m0, Pε, Bε〉 =[t]⇒PTSOsyn 〈m, P, Bε〉 implies that there exists a DPTSOsyn-consistent m0-initialized execution graph G such that t ∈ traces(G) and m(G) = m. Suppose that 〈m0, Pε, Bε〉 =[t]⇒PTSOsyn 〈m, P, Bε〉. We construct a DPTSOsyn-consistent m0-initialized execution graph G such that t ∈ traces(G) and m(G) = m.

We use the instrumented semantics (iPTSOsyn). By Lemmas B.3 and B.9, we have 〈m0, Pε, Bε, ∅〉 =[it]⇒iPTSOsyn 〈m, iP, Bε, S〉 for some instrumented trace it such that Λ(it) = t, some iP, and some S ⊆ N. We use the (instrumented) trace it to construct G:

• Events: For every 1 ≤ i ≤ |it| with it(i) = 〈τ, l#_〉 and typ(l) ∈ {W, R, RMW, R-ex, MF, FL, FO, SF}, we include the event e_i ≜ 〈τ, i, l〉 in G.E. In addition, we include the initialization events e_x ≜ 〈⊥, 0, W(x, m0(x))〉 for every x ∈ Loc. It is easy to see that we have t ∈ traces(G) and that G is m0-initialized.

• Reads-from: G.rf is constructed as follows: for every 1 ≤ i ≤ |it| with typ(e_i) ∈ {R, RMW, R-ex} and loc(e_i) = x, we locate the last index 1 ≤ j < i such that typ(e_j) = W, loc(e_j) = x, tid(e_j) = tid(e_i) and there does not exist an index j < k < i such that #(it(k)) = #(it(j)) (namely, the write that corresponds to e_j was not propagated from the store buffer when the read that corresponds to e_i was executed), and include an edge 〈e_j, e_i〉 in G.rf. If such an index j does not exist, we further locate the last index 1 ≤ k < i such that typ(it(k)) ∈ {RMW, PropW} and loc(it(k)) = x, and include an edge 〈e_j, e_i〉 in G.rf, where j is the unique index satisfying j < k and #(it(j)) = #(it(k)), or j = k in case typ(it(k)) = RMW. Finally, if such an index k does not exist either, we include the edge 〈e_x, e_i〉 in G.rf (reading from the initialization event). Using iPTSOsyn’s operational semantics, it is easy to verify that G.rf is indeed a reads-from relation for G.E.

• Memory assignment: To define G.M, for every x ∈ Loc, let i(x) be the maximal index such that typ(it(i(x))) = PerW and loc(it(i(x))) = x (that is, i(x) is the index of the last propagation to the persistent memory of a write to x). In addition, let w(i(x)) be the (unique) index k such that typ(it(k)) ∈ {W, RMW} and #(it(k)) = #(it(i(x))) (that is, w(i(x)) is the index of the write operation that persists in index i(x)). Now, we define G.M(x) ≜ e_{w(i(x))} for every x ∈ Loc for which i(x) is defined. If i(x) is undefined (typ(it(i)) = PerW and loc(it(i)) = x never hold), we set G.M(x) ≜ e_x (the initialization event of x). Then, we clearly have m(G) = m.
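The memory-assignment step above is essentially a single pass over the instrumented trace. The Python sketch below illustrates it under simplifying assumptions of ours: each trace entry is a triple `(typ, loc, ident)`, where `ident` plays the role of the identifier after `#`, and the function name `memory_assignment` is hypothetical. For every location it returns the 1-based index of the write whose propagation to persistent memory comes last, or `None` when the initialization event should be used.

```python
def memory_assignment(trace, locations):
    """Map each location to the index of the write that persisted last.

    `trace` is a list of (typ, loc, ident) triples; indices are 1-based,
    mirroring the indexing of the instrumented trace in the construction.
    """
    issued_at = {}     # identifier -> index of the issuing W/RMW step
    last_persist = {}  # location   -> identifier of the last PerW step
    for idx, (typ, loc, ident) in enumerate(trace, start=1):
        if typ in ("W", "RMW"):
            issued_at[ident] = idx
        elif typ == "PerW":
            last_persist[loc] = ident  # later PerW steps overwrite earlier ones
    return {
        x: issued_at[last_persist[x]] if x in last_persist else None
        for x in locations
    }


# Two writes to x are issued and propagated, but only the first one persists.
example = [
    ("W", "x", 1),
    ("W", "x", 2),
    ("PropW", "x", 1),
    ("PropW", "x", 2),
    ("PerW", "x", 1),
]
print(memory_assignment(example, ["x", "y"]))
```

In this toy trace the result for `x` is the index of the first write, and `y` falls back to its initialization event.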

To show that G is DPTSOsyn-consistent, we construct a propagation order tpo for G. First, for every 1 ≤ i ≤ |it| with typ(e_i) ∈ {W, FL, FO, SF}, let tp(i) denote the (unique) index k such that typ(it(k)) ∈ {PropW, PropFL, PropFO, PropSF} and #(it(k)) = #(it(i)) (that is, tp(i) is the index of the propagation from the store buffer of the operation in index i). In addition, for every 1 ≤ i ≤ |it| with typ(e_i) ∈ {RMW, R-ex, MF}, we let tp(i) ≜ i. Now, tpo is constructed as follows: for every e_i, e_j ∈ G.P, we include 〈e_i, e_j〉 ∈ tpo iff tp(i) < tp(j). In addition, we include in tpo some arbitrary total order on G.E ∩ Init, as well as pairs ordering all initialization events before all non-initialization events. It is straightforward to verify that this construction satisfies the (local) properties of Def. 5.19 yielding a DPTSOsyn-consistent graph:

(1) For every a, b ∈ P, except for the case that a ∈ W ∪ FL ∪ FO, b ∈ FO, and loc(a) ≠ loc(b), if 〈a, b〉 ∈ G.po, then 〈a, b〉 ∈ tpo: Let a, b ∈ P such that 〈a, b〉 ∈ G.po. Suppose that it is not the case that a ∈ W ∪ FL ∪ FO, b ∈ FO, and loc(a) ≠ loc(b). First, if a is an initialization event, then by definition we have 〈a, b〉 ∈ tpo (b cannot be an initialization event in this case). Otherwise, we have that a = e_i and b = e_j for some 1 ≤ i < j ≤ |it| such


1:48 Artem Khyzha and Ori Lahav

that tid(e_i) = tid(e_j). Since iPTSOsyn propagates the entries from the persistent buffer in the same order they were issued, except for the case of an FO-entry that may propagate before previously-issued W/FL/FO-entries to a different location, it must be the case that tp(i) < tp(j), and so we have 〈a, b〉 = 〈e_i, e_j〉 ∈ tpo.

(2) tpo? ; G.rfe ; G.po? is irreflexive: First, we show that G.rfe ; G.po? is irreflexive. Suppose that 〈a, b〉 ∈ G.rfe and 〈b, a〉 ∈ G.po?. Then, we have that a = e_j and b = e_i for some 1 ≤ i ≤ j ≤ |it| such that tid(e_i) = tid(e_j) (note that initialization events do not have incoming po or rf-edges). However, 〈e_j, e_i〉 ∈ G.rf implies that j < i. Now, suppose that 〈a, b〉 ∈ tpo, 〈b, c〉 ∈ G.rfe, and 〈c, a〉 ∈ G.po?. Then, it follows that a = e_i, b = e_j, and c = e_k for some 1 ≤ i, j, k ≤ |it| such that tid(e_k) = tid(e_i), k ≤ i, and tp(i) < tp(j). Then, since we do not have 〈e_j, e_k〉 ∈ G.po ∪ G.po⁻¹, we cannot have τ(e_j) = τ(e_k). Then, the construction of G.rf ensures that tp(j) < k. It follows that tp(i) < k. Since i ≤ tp(i), this contradicts the fact that k ≤ i.

(3) G.fr(tpo) ; G.rfe? ; G.po is irreflexive: From the construction of G.rf, it is easy to verify that 〈e_i, e_j〉 ∈ G.fr(tpo) implies that i < tp(j). Now, suppose that 〈a, b〉 ∈ G.fr(tpo) and 〈b, a〉 ∈ G.po. Then, a = e_j and b = e_i for some 1 ≤ i ≤ j ≤ |it| such that tid(e_i) = tid(e_j) and j < tp(i). It follows that i < tp(i), which contradicts our construction. Finally, suppose that 〈a, b〉 ∈ G.fr(tpo), 〈b, c〉 ∈ G.rfe, and 〈c, a〉 ∈ G.po. Then, it follows that a = e_i, b = e_j, and c = e_k for some 1 ≤ i, j, k ≤ |it| such that tid(e_k) = tid(e_i), k ≤ i, and i < tp(j). As in the previous item, we have that tp(j) < k, which leads to a contradiction.

(4) G.fr(tpo) ; tpo is irreflexive: Suppose that 〈a, b〉 ∈ G.fr(tpo) and 〈b, a〉 ∈ tpo. Then, a = e_j and b = e_i for some 1 ≤ i, j ≤ |it| such that i < tp(j) and tp(j) ≤ tp(i). It follows that i < tp(i), which contradicts our construction.

(5) G.fr(tpo) ; tpo ; G.rfe ; G.po is irreflexive: Suppose that 〈a, b〉 ∈ G.fr(tpo), 〈b, c〉 ∈ tpo, 〈c, d〉 ∈ G.rfe, and 〈d, a〉 ∈ G.po. Then, it follows that a = e_i, b = e_j, c = e_k, and d = e_m for some 1 ≤ i, j, k, m ≤ |it| such that tid(e_m) = tid(e_i), m < i, i < tp(j), tp(j) < tp(k), and tp(k) < m. Clearly, these inequalities lead to a contradiction.

(6) G.fr(tpo) ; tpo ; [RMW ∪ R-ex ∪ MF] ; G.po is irreflexive: Suppose that 〈a, b〉 ∈ G.fr(tpo), 〈b, c〉 ∈ tpo, c ∈ RMW ∪ R-ex ∪ MF, and 〈c, a〉 ∈ G.po. Then, it follows that a = e_i, b = e_j, c = e_k for some 1 ≤ i, j, k ≤ |it| such that tid(e_k) = tid(e_i), k < i, i < tp(j), tp(j) < tp(k). However, since c ∈ RMW ∪ R-ex ∪ MF, we have tp(k) = k, and, as before, these inequalities lead to a contradiction.

(7) G.dtpo(tpo) ; tpo is irreflexive: Suppose that 〈a, b〉 ∈ G.dtpo(tpo) and 〈b, a〉 ∈ tpo. By definition, there is a location x ∈ Loc such that a ∈ G.FLO_x = G.FL_x ∪ (FO_x ∩ dom(G.po ; [RMW ∪ R-ex ∪ MF ∪ SF])), b ∈ W_x ∪ RMW_x, and 〈G.M(x), b〉 ∈ tpo. Then, a = e_j and b = e_i for some 1 ≤ i, j ≤ |it| such that tp(i) < tp(j). Now, if a is a flush event, the flush step in index j can only exist if the write entry that corresponds to b has persisted. Hence, i(x) is defined, and we have G.M(x) = e_{w(i(x))}. In addition, 〈G.M(x), b〉 ∈ tpo implies that tp(w(i(x))) ≤ tp(i). However, since the persistence order (on each location) must follow the order in which the writes propagated from the store buffer, the write entry that corresponds to b must persist after the write entry that corresponds to G.M(x), which contradicts the construction of G.M. The case that a is a flush-optimal event followed by an RMW ∪ R-ex ∪ MF ∪ SF-event of the same thread is handled similarly. □

Theorem C.3. DPTSOsyn observationally refines PTSOsyn.

Proof (outline). By Lemma 5.15, it suffices to show that for every DPTSOsyn-consistent initialized execution graph G, some t ∈ traces(G) is an mInit(G)-to-m(G) PTSOsyn-observable-trace.


By Lemmas B.3 and B.9, we may use the instrumented system iPTSOsyn and show that there exists an mInit(G)-to-m(G) iPTSOsyn-trace it such that Λ(it) ∈ traces(G).

Let G be a DPTSOsyn-consistent execution graph, and let tpo be a propagation order for G that

satisfies the conditions of Def. 5.19. Let � be some injective function from events to N (we willuse it to assign identifiers to the different operations). For every event 4 ∈ E, we associate threetransition labels U (4), V (4), W (4):

• Issue of 4: U (4) = 〈tid(4), lab(4)#� (4)〉.• Propagation of 4 from store buffer to persistence buffer (only defined for 4 ∈ W∪FL∪FO∪

SF): V (4) =

〈tid(4), PropW(loc(4))#� (4)〉 4 ∈ W

〈tid(4), PropFL(loc(4))#� (4)〉 4 ∈ FL

〈tid(4), PropFO(loc(4))#� (4)〉 4 ∈ FO

〈tid(4), PropSF#� (4)〉 4 ∈ SF

• Propagation of 4 from persistence buffer to persistent memory (only defined for 4 ∈ W ∪

RMW ∪ FO): W (4) =

{PerW(loc(4))#� (4) 4 ∈ W ∪ RMW

PerFO(loc(4))#� (4) 4 ∈ FO

Using these definition, we construct a set � of transition labels of iPTSOsyn. Let:

• �U = �.E \ Init.• �V = (�.W \ Init) ∪�.FL ∪�.FO ∪�.SF.

• �WGW = {F ∈ (WG \ Init) ∪ RMWG | 〈F,�.M(G)〉 ∈ tpo?}.

• �WW =

⋃G ∈Loc �

WGW .

• �FOGW = FOG∩(dom(tpo? ;�.po ; [RMW ∪ R-ex ∪ MF ∪ SF])∪dom(tpo ; [FLG ∪ {�.M(G)}]).

• �FOW =

⋃G ∈Loc �

FOGW .

• �W = �WW ∪ �FO

W .

We define

� = {U (4) | 4 ∈ �U } ∪ {V (4) | 4 ∈ �V } ∪ {W (4) | 4 ∈ �W }.

Next, we construct an enumeration of � which will serve as iC . Let ' be the union of the follow-ing relations on �:

• '1 = {〈U (4), V (4)〉 | 4 ∈ �V}

• '2 = {〈V (4), W (4)〉 | 4 ∈ �W }

• '3 = {〈U (41), U (42)〉 | 〈41, 42〉 ∈ [�U ] ;�.po}

• '4 = {〈V (41), V (42)〉 | 〈41, 42〉 ∈ [�V ] ; tpo ; [�V ]}

• '5 = {〈U (41), V (42)〉 | 〈41, 42〉 ∈ [RMW ∪ R-ex ∪ MF] ; tpo ; [�V]}

• '6 = {〈V (41), U (42)〉 | 〈41, 42〉 ∈ [�V] ; tpo ; [RMW ∪ R-ex ∪ MF]}

• '7 = {〈V (41), U (42)〉 | 〈41, 42〉 ∈ [�V] ;�.rfe}

• '8 = {〈U (41), U (42)〉 | 〈41, 42〉 ∈ [RMW] ;�.rfe}

• '9 = {〈U (41), V (42)〉 | 〈41, 42〉 ∈ �.fr(tpo) ; [�V ]}

• '10 = {〈U (41), U (42)〉 | 〈41, 42〉 ∈ �.fr(tpo) ; [RMW]}

• '11 = {〈W (41), V (42)〉 | 〈41, 42〉 ∈ [�W ] ; tpo ; [FL]}

• '12 = {〈W (41), V (42)〉 | 〈41, 42〉 ∈ [�FOW ] ;�.po ; [SF]}

• '13 = {〈W (41), U (42)〉 | 〈41, 42〉 ∈ [�FOW ] ;�.po ; [RMW ∪ R-ex ∪ MF]}

• '14 = {〈W (41), W (42)〉 | 〈41, 42〉 ∈ [�W ] ; tpo ; [�W ]}
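Enumerating the transition labels in an order compatible with the ordering constraints listed above amounts to a topological sort. The Python sketch below illustrates this with the standard-library `graphlib` module (Python 3.9 or later); the label strings and the function name `enumerate_labels` are hypothetical placeholders of ours, and the edges stand for pairs of the ordering relation rather than for the paper's actual label encoding.

```python
from graphlib import CycleError, TopologicalSorter


def enumerate_labels(labels, order_edges):
    """Return a list of all labels in which `before` precedes `after` for
    every (before, after) in order_edges, or None if the constraints are cyclic."""
    sorter = TopologicalSorter()
    for lab in labels:
        sorter.add(lab)              # make sure unconstrained labels are included
    for before, after in order_edges:
        sorter.add(after, before)    # `before` is a predecessor of `after`
    try:
        return list(sorter.static_order())
    except CycleError:
        return None


# Issue, propagation, and persist labels of a single write (placeholder names).
labels = ["issue(w)", "prop(w)", "persist(w)"]
edges = [("issue(w)", "prop(w)"), ("prop(w)", "persist(w)")]
print(enumerate_labels(labels, edges))
```

The enumeration exists exactly when the constraints are acyclic, which is why the proof has to establish acyclicity of the ordering relation.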

It is standard to verify that for any enumeration iC of ', we have Λ(iC ) ∈ traces(�) and that iCis an <Init(�)-to-<(�) iPTSOsyn-trace. In particular, let G ∈ Loc and suppose that for the lasttransition label of the form PerW(G)#_ in iC is not PerW(G)#� (�.M(G)), but rather PerW(G)#� (F)


for someF ∈ �WW \ {�.M(G)}. Then, sinceF ∈ �W

W we have 〈F,�.M(G)〉 ∈ tpo?, which contradictsthe fact that '14 ⊆ '. The proof that iC is indeed an iPTSOsyn-trace is performed by induction:assume that a prefix iC ′ of iC is an iPTSOsyn-trace, show that it can be extended with one morelabel from iC . For that matter, the claim has to be strengthened to relate the prefix iC ′ with the statethat iPTSOsyn reaches. This state, denoted by 〈<iC′, iPiC′, iBiC ′, (iC′〉, is constructed as follows:

• Persistent memory: For every G ∈ Loc, let 4G ∈ �WW ∩EG such thatW (4G ) is the last occurrence

in iC ′ of a transition label of the form PerW(G)#_. If no transition of the form PerW(G)#_occurs in iC ′, let 4G be the initialization write to G in � (i.e., <Init(�) (G)). Then, <iC′ =

_G. valW(4G ).• Instrumented persistent buffers: For every location G , we include in iPiC′ (G) all entries ofthe following forms:– W(loc(4), valW(4))##(4) for some 4 ∈ �.WG such that V (4) ∈ iC ′ and W (4) ∉ iC ′.– W(loc(4), valW(4))##(4) for some 4 ∈ �.RMWG such that U (4) ∈ iC ′ and W (4) ∉ iC ′.– FO(tid(4))##(4) for some 4 ∈ �.FOG such that V (4) ∈ iC ′ and W (4) ∉ iC ′.Denote the instrumented entry related to event 4 by entry(4). Then, entry(41) appears beforeentry(42) in iPiC′ (G) iff one of the following hold:– If 41, 42 ∉ �.RMWG and V (41) appears before V (42) in iC ′.– If 41 ∉ �.RMWG , 42 ∈ �.RMWG , and V (41) appears before U (42) in iC ′.– If 41 ∈ �.RMWG , 42 ∉ �.RMWG , and U (41) appears before V (42) in iC ′.– If 41, 42 ∈ �.RMWG and U (41) appears before U (42) in iC ′.

• Instrumented store buffers: For every thread identifier g , we include in iBiC′ (g) all entriesof the following forms:– W(loc(4), valW(4))##(4) for some 4 ∈ �.Wg such that U (4) ∈ iC ′ and V (4) ∉ iC ′.– FL(loc(4))##(4) for some 4 ∈ �.FLg such that U (4) ∈ iC ′ and V (4) ∉ iC ′.– FO(loc(4))##(4) for some 4 ∈ �.FOg such that U (4) ∈ iC ′ and V (4) ∉ iC ′.– SF##(4) for some 4 ∈ �.SFg such that U (4) ∈ iC ′ and V (4) ∉ iC ′.Denote the instrumented entry related to event 4 by entry(4). Then, entry(41) appears beforeentry(42) in iBiC′ (g) iff U (41) appears before U (42) in iC ′.

• (iC′ is the set of all identifiers used in iC ′.

It remains to show that R is acyclic. Clearly, a cycle in R3 induces a G.po-cycle, and so R3 is acyclic. Now, since R3 is transitive, we can assume that any use of R3 in an R-cycle follows an Ri-step with i ≠ 3. It follows that any use of R3 in an R-cycle must start in a transition label α(e) for some e ∈ G.R ∪ G.RMW ∪ G.R-ex ∪ G.MF. Hence, any R-cycle induces a cycle in G.ppo ∪ G.rfe ∪ tpo ∪ G.fr(tpo), which is acyclic by Lemma 5.24. □

D PROOFS FOR SECTION 6

For the proofs in this section, we use the instrumented persistent memory subsystem (see Appendix B.1) iPSC, presented in Fig. 9. The functions tid, typ, loc are extended to iPSC.iΣ in the obvious way (in particular, for α ∈ iPSC.iΣ, we have typ(α) ∈ {PerW, PerFO}).

It is easy to see that iPSC is an instrumentation of PSC (see Def. B.8 for the definition of an erasure of an instrumented per-location persistence buffer).

Lemma D.1. iPSC is a Λ-instrumentation of PSC for Λ ≜ λ〈iP, S〉. Λ(iP).

E PROOFS FOR SECTION 6.1

The next lemmas are used to prove Thm. 6.2.

Lemma E.1. Every m0-to-m PSCfin-observable-trace t is also an m0-to-m PSC-observable-trace.


iPSC.iΣ ≜ {PerW(x)#s | x ∈ Loc, s ∈ N} ∪ {PerFO(x)#s | x ∈ Loc, s ∈ N}

m ∈ Loc → Val
iP ∈ Loc → ({W(x, v)#s | x ∈ Loc, v ∈ Val, s ∈ N} ∪ {FO(τ)#s | τ ∈ Tid, s ∈ N})∗
iPInit ≜ λx. ε
SInit = ∅

write
  premises: S′ = S ⊎ {s}; l = W(x, v); iP′ = iP[x ↦ iP(x) · W(v)#s]
  transition: 〈m, iP, S〉 −[τ, l#s]→iPSC 〈m, iP′, S′〉

read
  premises: S′ = S ⊎ {s}; l = R(x, v); get(m, Λ(iP(x)))(x) = v
  transition: 〈m, iP, S〉 −[τ, l#s]→iPSC 〈m, iP, S′〉

rmw
  premises: S′ = S ⊎ {s}; l = RMW(x, vR, vW); get(m, Λ(iP(x)))(x) = vR; ∀y. FO(τ)#_ ∉ iP(y); iP′ = iP[x ↦ iP(x) · W(vW)#s]
  transition: 〈m, iP, S〉 −[τ, l#s]→iPSC 〈m, iP′, S′〉

rmw-fail
  premises: S′ = S ⊎ {s}; l = R-ex(x, v); get(m, Λ(iP(x)))(x) = v; ∀y. FO(τ)#_ ∉ iP(y)
  transition: 〈m, iP, S〉 −[τ, l#s]→iPSC 〈m, iP, S′〉

mfence/sfence
  premises: S′ = S ⊎ {s}; l ∈ {MF, SF}; ∀y. FO(τ)#_ ∉ iP(y)
  transition: 〈m, iP, S〉 −[τ, l#s]→iPSC 〈m, iP, S′〉

flush
  premises: S′ = S ⊎ {s}; l = FL(x); iP(x) = ∅
  transition: 〈m, iP, S〉 −[τ, l#s]→iPSC 〈m, iP, S′〉

flush-opt
  premises: S′ = S ⊎ {s}; l = FO(x); iP′ = iP[x ↦ iP(x) · FO(τ)#s]
  transition: 〈m, iP, S〉 −[τ, l#s]→iPSC 〈m, iP′, S′〉

persist-w
  premises: iP(x) = W(v)#s · ip; iP′ = iP[x ↦ ip]; m′ = m[x ↦ v]
  transition: 〈m, iP, S〉 −[PerW(x)#s]→iPSC 〈m′, iP′, S〉

persist-fo
  premises: iP(x) = FO(_)#s · ip; iP′ = iP[x ↦ ip]
  transition: 〈m, iP, S〉 −[PerFO(x)#s]→iPSC 〈m, iP′, S〉

Fig. 9. The iPSC Instrumented Persistent Memory Subsystem (the instrumentation is colored).
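To make the rules of Fig. 9 more concrete, the following Python sketch models a simplified, uninstrumented version of the subsystem: a persistent memory plus one FIFO persistence buffer per location. It is an illustration under assumptions of ours, not the paper's definition. The class name `PSCMemory`, the `Disabled` exception (raised when a rule's premises do not hold), and the method names are hypothetical, and the identifier set S is omitted.

```python
from collections import defaultdict, deque


class Disabled(Exception):
    """Raised when the premises of the requested step do not hold."""


class PSCMemory:
    def __init__(self, init):
        self.m = dict(init)            # persistent memory: location -> value
        self.buf = defaultdict(deque)  # per-location persistence buffers

    def _get(self, x):
        # Latest buffered write to x if there is one, otherwise the persistent value.
        for kind, payload in reversed(self.buf[x]):
            if kind == "W":
                return payload
        return self.m[x]

    def _no_pending_fo(self, tid):
        return all(entry != ("FO", tid) for q in self.buf.values() for entry in q)

    def write(self, tid, x, v):
        self.buf[x].append(("W", v))

    def read(self, tid, x):
        return self._get(x)

    def rmw(self, tid, x, v_new):
        # Enabled only when the thread has no pending flush-optimal entry.
        if not self._no_pending_fo(tid):
            raise Disabled("pending FO entry")
        old = self._get(x)
        self.buf[x].append(("W", v_new))
        return old

    def fence(self, tid):
        # Covers both mfence and sfence in this sketch.
        if not self._no_pending_fo(tid):
            raise Disabled("pending FO entry")

    def flush(self, tid, x):
        # Enabled only when the persistence buffer of x is empty.
        if self.buf[x]:
            raise Disabled("buffer of location is not empty")

    def flush_opt(self, tid, x):
        self.buf[x].append(("FO", tid))

    def persist(self, x):
        # Pop the head entry of x's buffer; a write entry updates persistent memory.
        if not self.buf[x]:
            raise Disabled("nothing to persist")
        kind, payload = self.buf[x].popleft()
        if kind == "W":
            self.m[x] = payload


mem = PSCMemory({"x": 0})
mem.write(1, "x", 5)
print(mem.read(2, "x"), mem.m["x"])  # the write is visible but not yet persistent
mem.persist("x")
print(mem.m["x"])                    # now it has reached persistent memory
```

The sketch makes the key design point visible: reads consult the buffer first, so a write is visible to all threads as soon as it is issued, while persistence happens later and asynchronously.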

Proof (outline). We use a standard forward simulation argument. A simulation relation R ⊆ PSCfin.Q × PSC.Q is defined as follows: 〈〈m_f, m̃, L, T〉, 〈m, P〉〉 ∈ R if the following hold:

• m_f = m.
• For every x ∈ Loc, m̃(x) = get(m, P(x))(x).
• x ∈ L iff P(x) = ε.
• τ ∈ T iff ∀y. FO(τ) ∉ P(y).

Initially, we clearly have 〈〈m0, m̃Init, LInit, TInit〉, 〈m0, Pε〉〉 ∈ R. Now, suppose that 〈m_f, m̃, L, T〉 −[τ, l]→PSCfin 〈m′_f, m̃′, L′, T′〉, and let 〈m, P〉 ∈ PSC.Q such that 〈〈m_f, m̃, L, T〉, 〈m, P〉〉 ∈ R. Then, we have m_f = m. We show that 〈m, P〉 =[τ, l]⇒PSC 〈m′_f, P′〉 for some P′ such that 〈〈m′_f, m̃′, L′, T′〉, 〈m′_f, P′〉〉 ∈ R.

The rest of the proof continues by separately considering each possible step of PSCfin, and establishing the simulation invariants at each step. Below, we present the mapping of PSCfin-steps to PSC-steps:


• write-persist is mapped to a write-step immediately followed by a persist-w-step.
• write-no-persist is mapped to a write-step.
• rmw-persist is mapped to an rmw-step immediately followed by a persist-w-step.
• rmw-no-persist is mapped to an rmw-step.
• flush-opt-persist is mapped to a flush-opt-step immediately followed by a persist-fo-step.
• flush-opt-no-persist is mapped to a flush-opt-step.
• All other steps (read, rmw-fail, mfence, sfence, and flush) are mapped to the PSC-step of the same name.

It is straightforward to verify that this mapping induces possible sequences of steps, and preserves the simulation invariants. □

For the converse, we use the following additional proposition (see Def. B.10 for the definition of “commutes”).

Proposition E.2. 〈α, β〉 iPSC-commutes if typ(β) ∈ {PerW, PerFO} and one of the following conditions holds:

• typ(α) ∉ {PerW, PerFO} and #(α) ≠ #(β).
• typ(α) ∈ {PerW, PerFO} and loc(α) ≠ loc(β).

Lemma E.3. Every m0-to-m PSC-observable-trace t is also an m0-to-m PSCfin-observable-trace.

Proof (outline). Let t be an m0-to-m PSC-observable-trace. By Lemmas D.1 and B.3, there exists an m0-to-m iPSC-trace it such that Λ(it) = t. Using Prop. E.2, we can move all PerW-steps and PerFO-steps to immediately follow their corresponding W/RMW-step and FO-step, thus obtaining a “synchronized” instrumented trace in which every write/rmw/flush-optimal either persists immediately after it is issued or never persists. This instrumented trace easily induces an m0-to-m PSCfin-observable-trace: we take a *-persist-step for steps that are followed by a PerW-step or a PerFO-step, and otherwise we take the *-no-persist or other steps of PSCfin. □

Theorem 6.2. PSC and PSCfin are observationally equivalent.

Proof. Follows from Lemmas 2.10, E.1 and E.3. □

F PROOFS FOR SECTION 6.2

The following lemma is used to show that DPSC observationally refines PSC.

Lemma F.1. Let G be a DPSC-consistent initialized execution graph. Then, some t ∈ traces(G) is an mInit(G)-to-m(G) PSC-observable-trace.

Proof (outline). By Lemmas D.1 and B.3, wemay use the instrumented system iPSC and showthat some iC with Λ(iC ) ∈ traces(�) is an<Init (�)-to-<(�) iPSC-trace.Let mo be a modification order for � that satisfies the condition of Def. 6.4. Let � be some

injective function from events to N (we will use it to assign identifiers to the different operations).For every event 4 ∈ E, we associate two transition labels U (4), W (4):

• Issue of 4: U (4) = 〈tid(4), lab(4)#� (4)〉.• Propagation of 4 from persistence buffer to persistent memory (only defined for 4 ∈ W ∪

RMW ∪ FO): W (4) =

{PerW(loc(4))#� (4) 4 ∈ W ∪ RMW

PerFO(loc(4))#� (4) 4 ∈ FO


Let ) be any total order on �.E extending �.hbPSC(mo). We construct a set � of transitionlabels of iPSC and an enumeration of � which will serve as iC .Let:

• �U = �.E \ Init.• �WG

W = {F ∈ (WG \ Init) ∪ RMWG | 〈F,�.M(G)〉 ∈ mo?}.

• �WW =

⋃G ∈Loc �

WGW .

• �FOGW = FOG∩(dom() ? ; [FOG ] ;�.po ; [RMW ∪ R-ex ∪ MF ∪ SF])∪dom() ; [FLG ∪ �WG

W ]).

• �FOW =

⋃G ∈Loc �

FOGW .

• �W = �WW ∪ �FO

W .

We define� = {U (4) | 4 ∈ �U } ∪ {W (4) | 4 ∈ �W

W ∪ �FO

W }.

Let ' be the union of the following relations on �:

• '1 = {〈U (4), W (4)〉 | 4 ∈ �W }

• '2 = {〈U (41), U (42)〉 | 〈41, 42〉 ∈ [�U ] ;) }

• '3 = {〈W (41), U (42)〉 | 〈41, 42〉 ∈ [�W ] ;) ; [FL]}

• '4 = {〈W (41), U (42)〉 | 〈41, 42〉 ∈ [�FOW ] ;�.po ; [RMW ∪ R-ex ∪ MF ∪ SF]}

• '5 = {〈W (41), W (42)〉 | 〈41, 42〉 ∈ [�W ] ;) ; [�W ]}

It is easy to see that ' is acyclic (an '-cycle would entail a) -cycle). It is standard to verify that forany enumeration iC of ', we haveΛ(iC ) ∈ traces(�) and that iC is an<Init(�)-to-<(�) iPSC-trace.In particular, let G ∈ Loc and suppose that for the last transition label of the form PerW(G)#_ iniC is not PerW(G)#� (�.M(G)), but rather PerW(G)#� (F) for someF ∈ �W

W \ {�.M(G)}. Then, since

F ∈ �WW we have 〈F,�.M(G)〉 ∈ mo? ⊆ ) ?, which contradicts the fact that '5 ⊆ '. �

Theorem 6.5. PSC and DPSC are observationally equivalent.

Proof (outline). The fact that DPSC observationally refines PSC immediately follows from Lemmas 5.15 and F.1. Next, we show that PSC observationally refines DPSC. Let t be an m0-to-m PSC-observable-trace. We construct a DPSC-consistent m0-initialized execution graph G such that t ∈ traces(G) and m(G) = m. Then, the claim follows using Lemma 5.14.

We use the instrumented semantics (iPSC). By Lemmas D.1 and B.3, there exists an m0-to-m iPSC-trace it such that Λ(it) = t. We use the instrumented trace it to construct G:

• Events: For every 1 ≤ i ≤ |it| with it(i) of the form 〈τ, l#s〉, we include the event e_i ≜ 〈τ, i, l〉 in G.E. In addition, we include the initialization events e_x ≜ 〈⊥, 0, W(x, m0(x))〉 for every x ∈ Loc. It is easy to see that we have t ∈ traces(G) and that G is m0-initialized.

• Reads-from: G.rf is constructed as follows: for every 1 ≤ i ≤ |it| with typ(e_i) ∈ {R, RMW, R-ex} and loc(e_i) = x, we locate the maximal index 1 ≤ j < i such that typ(e_j) ∈ {W, RMW} and loc(e_j) = x (namely, the write that corresponds to e_j was the last write executed before the read that corresponds to e_i was executed), and include an edge 〈e_j, e_i〉 in G.rf. If such an index j does not exist, we include the edge 〈e_x, e_i〉 in G.rf (reading from the initialization event). Using iPSC’s operational semantics, it is easy to verify that G.rf is indeed a reads-from relation for G.E.

• Memory assignment: To define G.M, for every x ∈ Loc, let i(x) be the maximal index such that typ(it(i(x))) = PerW and loc(it(i(x))) = x (that is, i(x) is the index of the last propagation to the persistent memory of a write to x). In addition, let w(i(x)) be the (unique) index k such that typ(it(k)) ∈ {W, RMW} and #(it(k)) = #(it(i(x))) (that is, w(i(x)) is the index of the write operation that persists in index i(x)). Now, we define G.M(x) ≜ e_{w(i(x))} for every x ∈ Loc for which i(x) is defined. If i(x) is undefined (typ(it(i)) = PerW and loc(it(i)) = x never hold), we set G.M(x) ≜ e_x (the initialization event of x). Then, we clearly have m(G) = m.
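The reads-from construction for the sequentially consistent model is simple enough to sketch directly: every reading operation reads from the most recent earlier write to the same location, or from the initialization event if there is none. The Python sketch below is an illustration under assumptions of ours: trace entries are triples `(tid, typ, loc)`, persist steps may appear in the trace and are simply skipped, and the function name `reads_from` and the marker string `"init"` are hypothetical.

```python
def reads_from(trace):
    """Map the 1-based index of every R/RMW/R-ex step to the index of the
    write it reads from, or to "init" for the initialization event."""
    last_write = {}  # location -> index of the latest W/RMW step so far
    rf = {}
    for i, (tid, typ, loc) in enumerate(trace, start=1):
        if typ in ("R", "RMW", "R-ex"):
            rf[i] = last_write.get(loc, "init")
        # An RMW reads first and only then becomes the latest write.
        if typ in ("W", "RMW"):
            last_write[loc] = i
    return rf


sc_trace = [
    (1, "W", "x"),
    (2, "R", "x"),
    (2, "RMW", "x"),
    (None, "PerW", "x"),  # a persist step: neither a read nor a write event
    (1, "R", "x"),
    (1, "R", "y"),
]
print(reads_from(sc_trace))
```

Because writes take effect in trace order, the same indices also induce a natural order on the writes to each location, which is how the modification order is obtained in the next step of the proof.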

To show that G is DPSC-consistent, we construct a modification order mo for G. For every two events e_i, e_j ∈ G.E ∩ (W ∪ RMW) with loc(e_i) = loc(e_j), we include 〈e_i, e_j〉 in mo if either e_i ∈ Init or i < j (that is, the write that corresponds to e_i was executed before the write that corresponds to e_j). It is easy to verify that 〈e_i, e_j〉 ∈ G.po ∪ G.rf ∪ mo ∪ G.fr(mo) ∪ G.dtpo(mo) implies that e_i ∈ Init or i < j. It follows that G.hbPSC(mo) is acyclic and so G is DPSC-consistent. □

G PROOFS FOR SECTION 7

Lemma 7.11. Let G be a DPTSOsyn-consistent execution graph. Suppose that for every w ∈ G.W ∪ G.RMW and G-unprotected event e ∈ R_loc(w) ∪ FO_loc(w), we have either 〈w, e〉 ∈ (G.po ∪ G.rf)+ or 〈e, w〉 ∈ (G.po ∪ G.rf)+. Then, G is DPSC-consistent.

Proof. By Thm. 5.28, there exists a modification order mo for G such that G.hb(mo) and G.fr(mo) ; G.po are irreflexive. We show that G.hbPSC(mo) is irreflexive. Suppose otherwise. Let po = G.po, rf = G.rf, fr = G.fr(mo), dtpo = G.dtpo(mo), ppo = G.ppo, and hb = G.hb(mo).

Since po is transitive, (rf ∪ mo ∪ fr ∪ dtpo) ; dtpo = ∅ (because of the domains and codomains of the different relations), rf ; fr ⊆ mo, mo ; fr ⊆ mo, fr ; fr ⊆ fr, dtpo ; fr ⊆ dtpo (all these easily follow from the fact that hb is irreflexive), and dom(rf ∪ mo) ⊆ W ∪ RMW, it suffices to show that po ; [W ∪ RMW] ∪ rf ∪ mo ∪ po ; fr ∪ po ; dtpo is acyclic.

For this matter, we show that

[(R ∪ W ∪ RMW ∪ R-ex) \ Init] ; (po ; fr ∪ po ; dtpo) \ (po ; [W ∪ RMW] ∪ rf ∪ mo)+ ⊆ ppo+ ; fr ∪ ppo+ ; dtpo.

Given the latter inclusion, since po ; [W ∪ RMW] ⊆ ppo, the acyclicity of po ; [W ∪ RMW] ∪ rf ∪ mo ∪ po ; fr ∪ po ; dtpo will follow from the fact that hb is irreflexive.

Let 〈a, c〉 ∈ [(R ∪ W ∪ RMW ∪ R-ex) \ Init] ; (po ; fr ∪ po ; dtpo) \ (po ; [W ∪ RMW] ∪ rf ∪ mo)+. Let b ∈ E such that 〈a, b〉 ∈ po and 〈b, c〉 ∈ fr ∪ dtpo. Let x = loc(b). Consider the possible cases:

• a ∈ W, loc(a) ≠ x, b ∈ R, and b is G-protected: Then, we obtain that 〈a, b〉 ∈ po ; [W_x ∪ RMW ∪ R-ex ∪ MF] ; po. If 〈a, b〉 ∈ po ; [RMW ∪ R-ex ∪ MF] ; po, then we have 〈a, b〉 ∈ ppo+. Otherwise, there is some b′ ∈ W_x such that 〈a, b′〉 ∈ po and 〈b′, b〉 ∈ po. In this case it follows that 〈b′, c〉 ∈ mo, which contradicts the assumption that 〈a, c〉 ∉ (po ; [W ∪ RMW] ∪ rf ∪ mo)+.

• a ∈ W, loc(a) ≠ x, b ∈ R, and b is not G-protected: Then, we must have either 〈c, b〉 ∈ (po ∪ rf)+ or 〈b, c〉 ∈ (po ∪ rf)+. In the first case we obtain that 〈b, b〉 ∈ fr ; (po ∪ rf)+, which contradicts the fact that hb and fr ; po are irreflexive. In turn, the second case contradicts the assumption that 〈a, c〉 ∉ (po ; [W ∪ RMW] ∪ rf ∪ mo)+.

• a ∈ W, loc(a) = x, and b ∈ R: In this case, we must have 〈a, b〉 ∈ mo? ; rf and so 〈a, c〉 ∈ mo, which contradicts the assumption that 〈a, c〉 ∉ (po ; [W ∪ RMW] ∪ rf ∪ mo)+.

• a ∈ W, loc(a) ≠ x, and b ∈ FO: Then, if b is G-protected, we obtain that 〈a, b〉 ∈ po ; [W_x ∪ RMW ∪ R-ex ∪ MF ∪ SF] ; po ⊆ ppo+. Otherwise, we must have either 〈c, b〉 ∈ (po ∪ rf)+ or 〈b, c〉 ∈ (po ∪ rf)+. In the first case we obtain that 〈b, b〉 ∈ dtpo ; (po ∪ rf)+, which contradicts the fact that hb is irreflexive. In turn, the second case contradicts the assumption that 〈a, c〉 ∉ (po ; [W ∪ RMW] ∪ rf ∪ mo)+.

• Otherwise, the fact that 〈a, b〉 ∈ po directly implies that 〈a, b〉 ∈ ppo. □

Theorem 7.8. For a program Pr that is not strongly racy, a program state q ∈ Pr.Q is reachable under PTSOsyn iff it is reachable under PSC.


Proof. The right-to-left direction is trivial. For the left-to-right direction, suppose that q ∈ Pr.Q is reachable under PTSOsyn. By Theorems 5.28 and C.2, q is reachable under DPTSO^mo_syn. Let G0, ..., Gn be DPTSO^mo_syn-consistent execution graphs that satisfy the conditions of Def. 5.13 (for the program Pr and the state q). If all Gi’s are DPSC-consistent, then q is reachable under DPSC, and the claim follows using Thm. 6.5.

Suppose otherwise. We show that Pr is strongly racy, which contradicts our assumption. Let

0 ≤ 8 ≤ =−1 be theminimal index such that�8 is notDPSC-consistent. Let� = �8 . Theminimalityof 8 ensures that�0, ... ,�8−1 are allDPSC-consistent as well. Hence, using the sequence�0, ... ,�8−1,by repeatedly applying Lemma F.1 and Prop. 5.11, we obtain that for<0 , <(�8−1) or<0 , <Init

if 8 = 0, we have that 〈@Init

,<0,Pn 〉 is reachable in Pr q PSC for some @Init

∈ Pr .QInit.Let �.hb = (�.po ∪�.rf)+ and let

, =

{F ∈ W ∪ RMW

���� ∃4.4 is �-unprotected ∧ 4 ∈ Rloc(F) ∪ FOloc(F) ∧

〈F, 4〉 ∉ (�.po ∪�.rf)+ ∧ 〈4,F〉 ∉ (�.po ∪�.rf)+

}.

By Lemma 7.11, , is not empty. Let F be a �.po ∪ �.rf-minimal event in , , and let 4 be a�.po∪�.rf-minimal�-unprotected event in Rloc(F) ∪FOloc(F) such that 〈F, 4〉 ∉ (�.po∪�.rf)+

and 〈4,F〉 ∉ (�.po ∪�.rf)+.Let � ′

= {4 ′ | 〈4 ′,F〉 ∈ (�.po ∪�.rf)+ ∨ 〈4 ′, 4〉 ∈ (�.po ∪�.rf)+} and � ′ be the executiongraph given by � ′.E = � ′, � ′.rf = [� ′.E] ; �.rf ; [� ′.E], and � ′.M = _G. maxmo �

′.E ∩ (WG ∪

RMWG ), where mo is some modification order for � that satisfies the conditions of Def. 6.4. It iseasy to see that � ′ is DPTSOsyn-consistent (since � is DPTSOsyn-consistent). The minimality ofF and 4 ensures that for every F ′ ∈ � ′.W ∪ � ′.RMW and � ′-unprotected event 4 ′ ∈ Rloc(F) ∪

FOloc(F) , we have either have 〈F′, 4 ′〉 ∈ (� ′.po ∪� ′.rf)+ or 〈4 ′,F ′〉 ∈ (� ′.po ∪� ′.rf)+. Hence,

by Lemma 7.11, � ′ is DPSC-consistent.Now, since � is generated by Pr , we clearly also have that � ′ is generated by Pr with some

final state @′. Hence, by Prop. 5.11, for every C ∈ traces(� ′), we have @Init

C=⇒Pr @

′ for some @Init

Pr .QInit. By Lemma F.1, some C ∈ traces(� ′) is an<0-to-<(� ′) PSC-observable-trace. It follows

that 〈@Init

,<0,Pn 〉C=⇒PrqPSC 〈@′,<(� ′),P〉 for some P .

Furthermore, the construction of � ′ ensures that for gW = tid(F) and g = tid(4), we havethat @′(gW) enables lab(F) and @′(gR) enables lab(4). To show that Pr is strongly racy, it remainsto show that lab(4) is unprotected in suffixtid(4) (C). Let � ′

4 be the execution graph given by� ′4 .E = � ′.E ∪ {4}, � ′

4 .rf = [� ′4 .E] ;�.rf ; [� ′

4 .E], and �′4 .M = _G. maxmo �

′4 .E ∩ (WG ∪ RMWG ).

Using Prop. 7.10, it suffices to show that 4 is� ′4-unprotected. The latter easily follows from the fact

that 4 is �-unprotected. �


e ::= r | v | e + e | e = e | e ≠ e | ...

Inst ∋ inst ::= r := e | if e goto n | x := e | r := x |
                r := FADD(x, e) | r := CAS(x, e, e) |
                mfence | fl(x) | fo(x) | sfence

Fig. 10. Programming language syntax.

S(pc) = r := e
  premises: φ′ = φ[r ↦ φ(e)]
  transition: 〈pc, φ〉 −[ε]→S 〈pc + 1, φ′〉

S(pc) = if e goto n
  premises: φ(e) ≠ 0
  transition: 〈pc, φ〉 −[ε]→S 〈n, φ〉

S(pc) = if e goto n
  premises: φ(e) = 0
  transition: 〈pc, φ〉 −[ε]→S 〈pc + 1, φ〉

S(pc) = x := e
  premises: l = W(x, φ(e))
  transition: 〈pc, φ〉 −[l]→S 〈pc + 1, φ〉

S(pc) = r := x
  premises: l = R(x, v); φ′ = φ[r ↦ v]
  transition: 〈pc, φ〉 −[l]→S 〈pc + 1, φ′〉

S(pc) = r := FADD(x, e)
  premises: l = RMW(x, v, v + φ(e)); φ′ = φ[r ↦ v]
  transition: 〈pc, φ〉 −[l]→S 〈pc + 1, φ′〉

S(pc) = r := CAS(x, eR, eW)
  premises: l = RMW(x, φ(eR), φ(eW)); φ′ = φ[r ↦ φ(eR)]
  transition: 〈pc, φ〉 −[l]→S 〈pc + 1, φ′〉

S(pc) = r := CAS(x, eR, eW)
  premises: l = R-ex(x, v); v ≠ φ(eR); φ′ = φ[r ↦ v]
  transition: 〈pc, φ〉 −[l]→S 〈pc + 1, φ′〉

S(pc) = mfence
  premises: l = MF
  transition: 〈pc, φ〉 −[l]→S 〈pc + 1, φ〉

S(pc) = fl(x)
  premises: l = FL(x)
  transition: 〈pc, φ〉 −[l]→S 〈pc + 1, φ〉

S(pc) = fo(x)
  premises: l = FO(x)
  transition: 〈pc, φ〉 −[l]→S 〈pc + 1, φ〉

S(pc) = sfence
  premises: l = SF
  transition: 〈pc, φ〉 −[l]→S 〈pc + 1, φ〉

Fig. 11. Transitions of LTS induced by a sequential program S ∈ SProg.

H FROM PROGRAMS TO LABELED TRANSITION SYSTEMS

We present a concrete programming language syntax for (sequential) programs, and show how programs in this language are interpreted as LTSs in the form assumed in §2.1.

Let Reg ⊆ {a, b, ...} be a finite set of register names. Figure 10 presents our toy language. Its expressions are constructed from registers (local variables) and values. Instructions include assignments and conditional branching, as well as memory operations.

A sequential program S is a function from a set of the form {0, 1, ..., N} (the possible values of the program counter) to instructions. It induces an LTS over Lab ∪ {ε}. Its states are pairs q = 〈pc, φ〉 where pc ∈ ℕ (called program counter) and φ : Reg → Val (called local store, and extended to expressions in the obvious way). Its initial state is 〈0, λr ∈ Reg. 0〉 and its transitions are given in Fig. 11. (In particular, a read instruction in S induces |Val| transitions with different labels.)
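The interpretation of programs as LTSs can be sketched as a function that, given a state, lists the enabled transitions. The Python code below is a minimal illustration under assumptions of ours and not the paper's artifact: instructions are encoded as tuples with hypothetical tags (`"assign"`, `"goto"`, `"store"`, `"load"`, `"fadd"`, `"cas"`, `"mfence"`, `"sfence"`, `"fl"`, `"fo"`), the silent label ε is represented by `None`, and the value domain is passed explicitly as a finite list so that a read yields one transition per value.

```python
def eval_expr(e, store):
    """Expressions: an int constant, a register name, or (op, e1, e2)."""
    if isinstance(e, int):
        return e
    if isinstance(e, str):
        return store.get(e, 0)  # registers are initially 0
    op, left, right = e
    a, b = eval_expr(left, store), eval_expr(right, store)
    if op == "+":
        return a + b
    if op == "=":
        return int(a == b)
    if op == "!=":
        return int(a != b)
    raise ValueError(f"unknown operator {op!r}")


def transitions(prog, state, values):
    """Return the list of (label, next_state) pairs enabled in `state`."""
    pc, store = state
    if not 0 <= pc < len(prog):
        return []  # no instruction at this program counter
    ins = prog[pc]
    kind = ins[0]

    def upd(r, v):
        return {**store, r: v}

    if kind == "assign":  # r := e
        _, r, e = ins
        return [(None, (pc + 1, upd(r, eval_expr(e, store))))]
    if kind == "goto":    # if e goto n
        _, e, n = ins
        target = n if eval_expr(e, store) != 0 else pc + 1
        return [(None, (target, store))]
    if kind == "store":   # x := e
        _, x, e = ins
        return [(("W", x, eval_expr(e, store)), (pc + 1, store))]
    if kind == "load":    # r := x, one transition per possible value
        _, r, x = ins
        return [(("R", x, v), (pc + 1, upd(r, v))) for v in values]
    if kind == "fadd":    # r := FADD(x, e)
        _, r, x, e = ins
        d = eval_expr(e, store)
        return [(("RMW", x, v, v + d), (pc + 1, upd(r, v))) for v in values]
    if kind == "cas":     # r := CAS(x, eR, eW): one success, several failures
        _, r, x, e_r, e_w = ins
        v_r, v_w = eval_expr(e_r, store), eval_expr(e_w, store)
        ok = [(("RMW", x, v_r, v_w), (pc + 1, upd(r, v_r)))]
        fail = [(("R-ex", x, v), (pc + 1, upd(r, v))) for v in values if v != v_r]
        return ok + fail
    if kind == "mfence":
        return [(("MF",), (pc + 1, store))]
    if kind == "sfence":
        return [(("SF",), (pc + 1, store))]
    if kind in ("fl", "fo"):  # fl(x) / fo(x)
        return [((kind.upper(), ins[1]), (pc + 1, store))]
    raise ValueError(f"unknown instruction {kind!r}")


# a := x ; y := a + 1
prog = [("load", "a", "x"), ("store", "y", ("+", "a", 1))]
for label, nxt in transitions(prog, (0, {}), [0, 1]):
    print(label, nxt)
```

The program itself does not decide which value a read returns; it offers one labeled transition per value, and the memory subsystem it is composed with selects the one that is actually enabled.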
