The English Lake District, Cumbria, UK

ICAPS 2006 International Planning Competition

Table of contents

Preface

Part I: The Deterministic Track

Plan Constraints and Preferences in PDDL3
Alfonso Gerevini and Derek Long

The Benchmark Domains of the Deterministic Part of IPC-5
Yannis Dimopoulos, Alfonso Gerevini, Patrik Haslum and Alessandro Saetti

Planning with Temporally Extended Preferences by Heuristic Search
Jorge Baier, Jeremy Hussell, Fahiem Bacchus and Sheila McIlraith

YochanPS: PDDL3 Simple Preferences as Partial Satisfaction Planning
J. Benton and Subbarao Kambhampati

IPPLAN: Planning as Integer Programming
Menkes van den Briel, Subbarao Kambhampati and Thomas Vossen

Large-Scale Optimal PDDL3 Planning with MIPS-XXL
Stefan Edelkamp, Shahid Jabbar and Mohammed Nazih

Optimal Symbolic PDDL3 Planning with MIPS-BDD
Stefan Edelkamp

FDP: Filtering and Decomposition for Planning
Stephane Grandcolas and Cyril Pain-Barre

Fast (Diagonally) Downward
Malte Helmert

New Features in SGPlan for Handling Preferences and Constraints in PDDL3.0
Chih-Wei Hsu, Benjamin W. Wah, Ruoyun Huang and Yixin Chen

OCPlan - Planning for soft constraints in classical domains
Bharat Ranjan Kavuluri, Naresh Babu Saladi, Rakesh Garwal and Deepak Khemani

SATPLAN04: Planning as Satisfiability
Henry Kautz and Bart Selman

The resource YAHSP planner
Marie de Roquemaurel, Pierre Regnier and Vincent Vidal

The New Version of CPT, an Optimal Temporal POCL Planner based on Constraint Programming
Vincent Vidal and Sebastien Tabary

MaxPlan: Optimal Planning by Decomposed Satisfiability and Backward Reduction
Zhao Xing, Yixin Chen and Weixiong Zhang

Abstracting Planning Problems with Preferences and Soft Goals
Lin Zhu and Robert Givan

Part II: The Probabilistic Track

POND: The Partially-Observable and Non-Deterministic Planner
Daniel Bryce

Conformant-FF
Joerg Hoffmann

COMPLAN: A Conformant Probabilistic Planner
Jinbo Huang

cf2sat and cf2cs+cf2sat: Two Conformant Planners
Hector Palacios

The Factored Policy Gradient planner (IPC-06 Version)
Olivier Buffet and Douglas Aberdeen

Paragraph: A Graphplan-based Probabilistic Planner
Iain Little

Probabilistic Planning via Linear Value-approximation of First-order MDPs
Scott Sanner and Craig Boutilier

Symbolic Stochastic Focused Dynamic Programming with Decision Diagrams
Florent Teichteil-Koenigsbuch and Patrick Fabiani

http://icaps06.icaps-conference.org/

Preface

The international planning competition is a biennial event with several goals, including analyzing and advancing the state of the art in automated planning systems; providing new data sets to be used by the research community as benchmarks for evaluating different approaches to automated planning; emphasizing new research issues in planning; and promoting the acceptance and applicability of planning technology.

The fifth international planning competition, IPC-5 for short, has attracted many researchers. As in the fourth competition, IPC-5 and its organization are split into two parts: the Deterministic Track, which considers fully deterministic and observable planning (previously also called "classical" planning), and the Probabilistic Track, which considers non-deterministic planning.

The deterministic part is organized by two groups of people: an organizing committee, which is in charge of the various activities for running the competition, and a consulting committee, which was mainly involved in the early phase of the organization to discuss an extension to the language of the competition (PDDL) to be used in IPC-5.

The deterministic part of IPC-5 has two main novelties with respect to previous competitions. Firstly, while still considering CPU time, we intend to give more emphasis to the importance of plan quality, as defined by the problem plan metric. Partly for this reason, we significantly extended PDDL to include some new constructs, aiming at a better characterization of plan quality by allowing the user to express strong and "soft" constraints about the structure of the desired plans, as well as strong and soft problem goals. The new language, called PDDL3, was developed in close collaboration with Derek Long, a member of the IPC-5 consulting committee.

In PDDL3.0, the version of PDDL3 used in the competition, we can express problems for which only a subset of the goals and plan trajectory constraints can be achieved (because they conflict with each other, or because achieving all of them is computationally too expensive), and where the ability to distinguish the importance of different goals and constraints is critical. A planner should try to find a solution that satisfies as many soft goals and constraints as possible, taking into account their importance and their computational costs. Soft goals and constraints, or preferences, as they are called in PDDL3.0, are taken into account by the plan metric, which can give a penalty for failure to satisfy each of the preferences (or, conversely, a bonus for satisfying them). The extensions made in PDDL3.0 seem to have gained fairly wide acceptance, with more than half the competing planners in the deterministic track supporting at least some of the new features.

Another novelty of the deterministic part of IPC-5, which required considerable effort, concerns the test domains: we designed five new planning domains, together with a large collection of benchmark problems. In order to make the PDDL3.0 language more accessible to the competitors, for each test domain we developed several variants using different fragments of PDDL3.0 with increasing expressiveness. In addition, we re-used two domains from previous competitions, extended with new variants including some of the features of PDDL3.0. The IPC-5 test domains have different motivations. Some of them are inspired by real-world applications; others are aimed at exploring the applicability and effectiveness of automated planning for new applications or for problems that have been investigated in other fields of computer science; while the domains from previous competitions are used as sample references for measuring the advancement of the current planning systems with respect to the existing benchmarks.

The probabilistic track of the competition appeared for the first time in the fourth edition of the competition in 2004. The probabilistic track consists of probabilistic planning problems with complete observability specified in the PPDDL language. The focus of the competition is on planners that can deliver real-time decision making as opposed to complete policies. The planners are evaluated using the client/server architecture developed for the probabilistic track of IPC-4. Thus, any type of planner can enter the competition as long as it is able to choose and send actions to the server. The planners are evaluated over a number of episodes for each problem instance, from which an estimate of the average cost to the goal of the planner's policy is computed. The planners are then ranked using these scores.

This year's competition includes, for the first time, a conformant planning subtrack within the probabilistic track. In conformant planning, the planners are faced with non-deterministic planning problems and are required to output a contingency-safe, linear plan that solves the problem. Planners in this subtrack are evaluated in terms of the CPU time required to output a valid plan.

We have included novel and interesting domains in the probabilistic and conformant tracks, which aim to reveal interesting tradeoffs in non-deterministic planning. The domain codifications are as simple as possible, avoiding complex syntactic constructs such as nested conditional effects, disjunctive preconditions and goals, etc. Indeed, some domains are grounded codifications (as are some domains in the deterministic track of IPC-4), while others are 'lifted' first-order codifications of problems, which can be exploited by some of the planners. We have included problem generators for almost all the domains so as to allow the competitors to tune their planners. The competition benchmark consisted of a set of domains for practice and another set for the actual competition.

In the deterministic track of IPC-5, there are 14 competing teams (initially there were 18, but 4 of them had to withdraw their planners during the competition), each of which can participate with at most two planners (or variants of the same planner), and 40 participating researchers from various universities and research institutes in Europe, USA, Canada and India.

The probabilistic track consists of 8 teams divided into 2 groups of 4 teams each, for probabilistic and conformant planning respectively. The teams are from various universities and research institutes in USA, Canada, Europe and Australia.

At the time of writing the competition is still running. The results will be announced at ICAPS'06 and made available from the deterministic and probabilistic websites of the competition. This booklet contains the abstracts of the IPC-5 planners that are currently running the competition tests. The descriptions of the planners may in many cases be preliminary, since the systems continue to evolve as they are faced with new problem domains.

The planner abstracts of the deterministic part of IPC-5 are preceded by an extended abstract describing the main features of PDDL3.0, which was distributed about six months before the start of the competition, and by an extended abstract giving a short description of the benchmark domains.

The organizing committees of both tracks would like to send their best wishes and great thanks to all the competing teams: it is mainly their hard work that makes the competition such an exciting event!

Blai Bonet (Co-Chair Probabilistic Track)
Alfonso Gerevini (Chair Deterministic Track)
Bob Givan (Co-Chair Probabilistic Track)

Organizers (Deterministic track)

• Yannis Dimopoulos - University of Cyprus (Cyprus)
• Alfonso Gerevini (chair) - University of Brescia (Italy)
• Patrik Haslum - Linköping University (Sweden)
• Alessandro Saetti - University of Brescia (Italy)

Organizers (Probabilistic track)

• Blai Bonet (co-chair) - Universidad Simón Bolívar (Venezuela)
• Robert Givan (co-chair) - Purdue University (U.S.A.)

Consulting Committee (Deterministic Track)

• Stefan Edelkamp
• Maria Fox
• Joerg Hoffmann
• Derek Long
• Drew McDermott
• Len Schubert
• Ivan Serina
• David Smith
• Dan Weld

Consulting Committee (Probabilistic Track)

• Hector Geffner
• Sylvie Thiebaux

Plan Constraints and Preferences in PDDL3
The Language of the Deterministic Part of the Fifth International Planning Competition

Extended Abstract

Alfonso Gerevini+ and Derek Long∗

+ Department of Electronics for Automation, University of Brescia (Italy)
∗ Department of Computer and Information Sciences, University of Strathclyde (UK)

Abstract

We propose an extension to the PDDL language, called PDDL3.0, that aims at a better characterization of plan quality by allowing the user to express strong and soft constraints about the structure of the desired plans, as well as strong and soft problem goals. PDDL3.0 was the reference language of the 5th International Planning Competition (IPC-5). This paper contains most of the document about PDDL3.0 that was discussed by the Consulting Committee of IPC-5, and then distributed to the IPC-5 competitors.

Introduction

The notion of plan quality in automated planning is of great practical importance. In many real-world planning domains, we have to address problems with a large set of solutions, or with a set of goals that cannot all be achieved. In these problems, it is important to generate plans of good or optimal quality achieving all problem goals (if possible) or some subset of them.

In the previous international planning competitions, the plan generation CPU time played a central role in the evaluation of the competing planners. In the fifth International Planning Competition (IPC-5), while still considering CPU time, we would like to give greater emphasis to the importance of plan quality. The versions of PDDL used in the previous two competitions (PDDL2.1 and PDDL2.2) allow us to express some criteria for plan quality, such as the number of plan actions or parallel steps, and relatively complex plan metrics involving plan makespan and numerical quantities. These are powerful and expressive in domains that include metric fluents, but plan quality can still only be measured by plan size in the case of propositional planning. We believe that these criteria are insufficient, and we propose to extend PDDL with new constructs that increase its expressive power for specifying plan quality.

The proposed extended language allows us to express strong and soft constraints on plan trajectories (i.e. constraints over possible actions in the plan and intermediate states reached by the plan), as well as strong and soft problem goals (i.e. goals that must be achieved in any valid plan, and goals that we desire to achieve, but that do not necessarily have to be achieved). Strong constraints and goals must be satisfied by any valid plan, while soft constraints and goals express desired constraints and goals, some of which may be more preferred than others. Informally, in planning with soft constraints and goals, the best quality plan should satisfy "as much as possible" the soft constraints and goals according to the specified preference relation distinguishing alternative feasible plans (satisfying all strong constraints and goals). While soft constraints have been extensively studied in the CSP literature, only very recently has the planning community started to investigate them (Brafman & Chernyavsky 2005; Briel et al. 2004; Delgrande, Schaub, & Tompits 2005; Miguel, Jarvis, & Shen 2001; Smith 2004; Son & Pontelli 2004), and we believe that they deserve more research effort.

The following are some informal examples of plan trajectory constraints and soft goals. Additional formal examples will be given in the next section.

Examples in a blocksworld domain: a fragile block can never have something above it, or it can have at most one block on it; we would like the blocks forming the same tower always to have the same colour; in some state of the plan, all blocks should be on the table.

Examples in a transportation domain: we would like every airplane to be used (instead of using only a few airplanes, because it is better to distribute the workload among the available resources and limit heavy usage); whenever a ship is ready at a port to load the containers it has to transport, all such containers should be ready at that port; we would like all trucks to be clean and at their source location at the end of the plan; we would like no truck to visit any destination more than once.

When we have soft constraints and goals, it can be useful to give different priorities to them, and this should be taken into account in the plan quality evaluation. While there is more than one way to specify the importance of a soft constraint or goal, as a first attempt to tackle this issue, for IPC-5 we have chosen a simple quantitative approach: each soft constraint and goal is associated with a numerical weight representing the cost of its violation in a plan (and hence also its relative importance with respect to the other specified soft constraints and goals). Weighted soft constraints and goals are part of the plan metric expression, and the best quality plans are those optimising such an expression (more details are given in the next sections).


Using this approach we can express that certain plans are preferred over others. Some examples are (other formalised examples are given in the next sections):[1]

I prefer a plan where every airplane is used, rather than a plan using 100 units of fuel less, which could be expressed by weighting a failure to use all the planes by a number 100 times bigger than the weight associated with the fuel use in the plan metric; I prefer a plan where each city is visited at most once, rather than a plan with a shorter makespan, which could be expressed by using constraint violation costs that penalise very heavily a failure to visit each city at most once; I prefer a plan where at the end each truck is at its start location, rather than a plan where every city is visited by at most one truck, which could be expressed by using goal costs that penalise a failure to have every truck at its start location more heavily than a failure to have every city visited by at most one truck.

We also observe that the rich additional expressive power we propose to add for goal specifications allows the expression of constraints that are actually derivable necessary properties of optimal plans. By adding them as goal conditions, we have a way to express constraints that we know will lead to the planner finding optimal plans. Similarly, one can express constraints that prevent a planner from exploring parts of the plan space that are known to lead to inefficient performance.

In the next sections, we outline some extensions to PDDL2.2 that we propose for IPC-5. We call the extended language PDDL3.0. It should be noted that this is a preliminary version of the extended language, and that a more detailed description will be prepared in the future. Moreover, given that the proposed extensions are relatively new in the planning community, and that the teams participating in IPC-5 will have limited time to develop their systems, we impose some simplifying restrictions to make the language more accessible.

[1] The benchmark domains and problems of IPC-5 contain many additional examples; some of them are described in (Gerevini & Long 2006).

State Trajectory Constraints

Syntax and Intended Meaning

State trajectory constraints assert conditions that must be met by the entire sequence of states visited during the execution of a plan. They are expressed through temporal modal operators over first-order formulae involving state predicates. We recognise that there would be value in also allowing propositions asserting the occurrence of action instances in a plan, rather than simply describing properties of the states visited during execution of the plan, but we choose to restrict ourselves to state predicates in this extension of the language. The use of the extensions described here implies a new requirements flag, :constraints.

The basic modal operators we propose to use in IPC-5 are: always, sometime, at-most-once, and at end (for goal state conditions). We use a special default assumption that unadorned conditions in the goal specification are automatically taken to be "at end" conditions. This assumption is made in order to preserve the standard meaning for existing goal specifications, despite the fact that in a standard semantics for an LTL formula an unadorned proposition would be interpreted according to the current state. We add within, which can be used to express deadlines. In addition, rather than allowing arbitrary nesting of modal operators, we introduce some specific operators that offer some limited nesting: sometime-before, sometime-after and always-within. Other modalities could be added, but we believe that these are sufficiently powerful for an initial level of the sublanguage modelling constraints.

It should be noted that, by combining these modalities with timed initial literals (defined in PDDL2.2), we can express further goal constraints. In particular, one can specify the interval of time when a goal should hold, or the lower bound on the time when it should hold. Since these are interesting and useful constraints, we introduce two modal operators as "syntactic sugar" of the basic language: hold-during and hold-after.

Trajectory constraints are specified in the planning problem file in a new field, called :constraints, that will usually appear after the goal. In addition, we allow constraints to be specified in the action domain file, on the grounds that some constraints might be seen as safety conditions, or operating conditions, that are not physical limitations but are nevertheless constraints that must always be respected in any valid plan for the domain (say, legal constraints or operating procedures that must be respected). This also uses a section labelled (:constraints ...). The interpretation of (:constraints ...) in the conjunction of a domain and a problem file is that it is equivalent to having all the constraints added to the goals. The use of trajectory constraints (in the domain file or in the goal specification) implies the need for the :constraints flag in the :requirements list.

Note that no temporal modal operator is allowed in preconditions of actions. That is, all action preconditions are with respect to a state (or time interval, in the case of over all action conditions).

The specific BNF grammar of PDDL3.0 is given in (Gerevini & Long 2005). The following is a fragment of the grammar concerning the new modalities of PDDL3.0 for expressing constraints (con-GD):

<con-GD> ::= (at end <GD>) | (always <GD>) |
             (sometime <GD>) | (within <num> <GD>) |
             (at-most-once <GD>) |
             (sometime-after <GD> <GD>) |
             (sometime-before <GD> <GD>) |
             (always-within <num> <GD> <GD>) |
             (hold-during <num> <num> <GD>) |
             (hold-after <num> <GD>) | ...

where <GD> is a goal description (a first-order logic formula) and <num> is any numeric literal (in STRIPS domains it will be restricted to integer values). There is a minor complication in the interpretation of the bound for within and always-within when considering STRIPS plans (and similarly for hold-during and hold-after): the question is whether the bound refers to sequential steps (in other words, actions) or to parallel steps. For STRIPS plans, the numeric bounds will be counted in terms of plan happenings. For instance, (within 10 φ) would mean that φ must hold within 10 happenings. These would be happenings of one action or of multiple actions, depending on whether the plan is sequential or parallel.

Notes on Semantics

The semantics of goal descriptors in PDDL2.2 evaluates them only in the context of a single state (the state of application for action preconditions or conditional effects and the final state for top-level goals). In order to give meaning to temporal modalities, which assert properties of trajectories rather than individual states, it is necessary to extend the semantics to support interpretation with respect to a finite trajectory (as generated by a plan). We propose a semantics for the modal operators that is the same basic interpretation as is used in TLPlan (Bacchus & Kabanza 2000) for LT and other standard LTL treatments. Recall that a happening in a plan for a PDDL domain is the collection of all effects associated with the (start or end points of) actions that occur at the same time. This time is then the time of the happening, and a happening can be "applied" to a state by simultaneously applying all effects in the happening (which is well defined because no pair of such effects may be mutex).

Definition 1 Given a domain D, a plan π and an initial state I, π generates the trajectory

〈(S0, 0), (S1, t1), ..., (Sn, tn)〉

iff S0 = I and for each happening h generated by π, with h at time t, there is some i such that ti = t and Si is the result of applying the happening h to Si−1, and for every j ∈ {1 . . . n} there is a happening in π at tj.

Definition 2 Given a domain D, a plan π, an initial state I, and a goal G, π is valid if the trajectory it generates, 〈(S0, 0), (S1, t1), ..., (Sn, tn)〉, satisfies the goal: 〈(S0, 0), (S1, t1), ..., (Sn, tn)〉 |= G.
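Definition 1 can also be read operationally. The following Python sketch is only an illustration under simplifying assumptions that are not part of PDDL3.0: states are propositional (a frozenset of ground atoms written as strings), a happening is encoded as a hypothetical (time, add, delete) triple, and all happenings have distinct times.

```python
from typing import FrozenSet, Iterable, List, Tuple

State = FrozenSet[str]
# Hypothetical encoding of a happening: (time, atoms added, atoms deleted).
Happening = Tuple[float, FrozenSet[str], FrozenSet[str]]
Trajectory = List[Tuple[State, float]]


def generate_trajectory(initial: State, happenings: Iterable[Happening]) -> Trajectory:
    """Build <(S0, 0), (S1, t1), ..., (Sn, tn)> by applying happenings in time order.

    Assumes every happening has a distinct time stamp.
    """
    trajectory: Trajectory = [(initial, 0.0)]
    for time, add, delete in sorted(happenings, key=lambda h: h[0]):
        previous_state, _ = trajectory[-1]
        # All effects of one happening are applied simultaneously to S(i-1).
        trajectory.append(((previous_state - delete) | add, time))
    return trajectory


if __name__ == "__main__":
    s0 = frozenset({"at truck1 depot"})
    plan_happenings = [
        (1.0, frozenset({"at truck1 paris"}), frozenset({"at truck1 depot"})),
        (2.0, frozenset({"clean truck1"}), frozenset()),
    ]
    for state, t in generate_trajectory(s0, plan_happenings):
        print(t, sorted(state))
```

Under Definition 2, a plan is then valid exactly when the goal formula holds of the list returned by this function, rather than of its last state alone.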

This definition contrasts with the original semantics of goal satisfaction, where the requirement was that Sn |= G. The contrast reflects precisely the requirement that goals should now be interpreted with respect to an entire trajectory. We do not allow action preconditions to use modal operators and therefore their interpretation continues to be relative to the single state in which the action is applied. The interpretation of simple formulae, φ (containing no modalities), in a single state S continues to be as before and continues to be denoted S |= φ. In the following definition we rely on context to make clear where we are using the interpretation of non-modal formulae in single states, and where we are interpreting modal formulae in trajectories.

Definition 3 Let φ and ψ be atomic formulae over the predicates of the planning problem plus equality (between objects or numeric terms) and inequalities between numeric terms, and let t be any real constant value. The interpretation of the modal operators is as specified in Figure 1.

Note that this interpretation exploits the fact that modal operators are not nested. A more general semantics for nested modalities is a straightforward extension of this one.

Note also that the last four expressions in Figure 1 are expressible in different ways if one allows nesting of modalities and use of the standard LTL modality until (more details on this are given in (Gerevini & Long 2005)).

The constraint at-most-once is satisfied if its argument becomes true and then stays true across multiple states and then (possibly) becomes false and stays false. Thus, there is at most one interval in the plan over which the argument proposition is true.

For general formulae (which may or may not contain modalities):

〈(S0, 0), (S1, t1), ..., (Sn, tn)〉 |= (and φ1 ... φn) iff, for every i, 〈(S0, 0), (S1, t1), ..., (Sn, tn)〉 |= φi

and similarly for other connectives.

Of the constraints hold-during and hold-after, (hold-during t1 t2 φ) states that φ must be true during the interval [t1, t2), while (hold-after t φ) states that φ must be true after time t. The first can be expressed by using timed initial literals to specify that a dummy timed literal d is true during the time window [t1, t2), together with the goal (always (implies d φ)). A variant of hold-during where φ must hold exactly during the specified interval could easily be obtained in a similar way. The second can be expressed by using timed initial literals to specify that d is true only from time t, together with the goal (sometime-after d φ).
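The dummy-literal encoding of hold-during can be sketched in Python as follows. This is an illustration under stated assumptions, not a definitive implementation: states are frozensets of atoms, happenings are hypothetical (time, add, delete) triples, the atom name "d" is assumed not to occur in the domain, and φ is passed in as an ordinary Python predicate over a single state.

```python
from typing import Callable, FrozenSet, List, Tuple

State = FrozenSet[str]
Happening = Tuple[float, FrozenSet[str], FrozenSet[str]]  # (time, add, delete)


def run(initial: State, happenings: List[Happening]) -> List[Tuple[State, float]]:
    """Apply happenings in time order; simultaneous ones are merged into one step."""
    trajectory = [(initial, 0.0)]
    for time in sorted({h[0] for h in happenings}):
        state = trajectory[-1][0]
        adds = frozenset().union(*(h[1] for h in happenings if h[0] == time))
        dels = frozenset().union(*(h[2] for h in happenings if h[0] == time))
        trajectory.append(((state - dels) | adds, time))
    return trajectory


def hold_during(initial: State, happenings: List[Happening],
                t1: float, t2: float, phi: Callable[[State], bool]) -> bool:
    """Check (hold-during t1 t2 phi) through the dummy timed literal d."""
    dummy = "d"  # hypothetical fresh atom, assumed unused by the domain
    timed_literals = [
        (t1, frozenset({dummy}), frozenset()),  # d becomes true at t1
        (t2, frozenset(), frozenset({dummy})),  # d becomes false at t2
    ]
    trajectory = run(initial, happenings + timed_literals)
    # (always (implies d phi)) evaluated over the extended trajectory
    return all((dummy not in state) or phi(state) for state, _ in trajectory)


if __name__ == "__main__":
    clean = lambda s: "clean truck1" in s
    s0 = frozenset({"clean truck1"})
    plan = [(5.0, frozenset(), frozenset({"clean truck1"}))]  # truck gets dirty at 5
    print(hold_during(s0, plan, 1.0, 5.0, clean))  # True: window [1, 5) ends as d is removed
    print(hold_during(s0, plan, 3.0, 6.0, clean))  # False: d still holds at time 5
```

The timed literals add happenings at t1 and t2, which is what makes the half-open window [t1, t2) visible in the trajectory.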

Soft Constraints and Preferences

A soft constraint is a condition on the trajectory generated by a plan that the user would prefer to see satisfied rather than not satisfied, but is prepared to accept might not be satisfied because of the cost of satisfying it, or because of conflicts with other constraints or goals. When a user has multiple soft constraints, there is a need to determine which of the various constraints should take priority if there is a conflict between them or if it should prove costly to satisfy them. This could be expressed using a qualitative approach but, following careful deliberations, we have chosen to adopt a simple quantitative approach for this version of PDDL.

Syntax and Intended Meaning

The syntax for soft constraints falls into two parts. Firstly, there is the identification of the soft constraints, and secondly there is the description of how the satisfaction, or lack of it, of these constraints affects the quality of a plan.

Goal conditions, including action preconditions, can be labelled as preferences, meaning that they do not have to be true in order to achieve the corresponding goal or precondition. Thus, the semantics of these conditions is simple, as far as the correctness of plans is concerned: they are all trivially satisfied in any state. The role of these preferences is apparent when we consider the relative quality of different plans. In general, we consider plans better when they satisfy soft constraints and worse when they do not. A complication arises, however, when comparing two plans that satisfy different subsets of constraints (where neither set strictly contains the other). In this case, we rely on a specification of the violation costs associated with the preferences.


〈(S0, 0), (S1, t1), ..., (Sn, tn)〉 |= (at end φ) iff Sn |= φ
〈(S0, 0), (S1, t1), ..., (Sn, tn)〉 |= φ iff Sn |= φ
〈(S0, 0), (S1, t1), ..., (Sn, tn)〉 |= (always φ) iff ∀i : 0 ≤ i ≤ n · Si |= φ
〈(S0, 0), (S1, t1), ..., (Sn, tn)〉 |= (sometime φ) iff ∃i : 0 ≤ i ≤ n · Si |= φ
〈(S0, 0), (S1, t1), ..., (Sn, tn)〉 |= (within t φ) iff ∃i : 0 ≤ i ≤ n · Si |= φ and ti ≤ t
〈(S0, 0), (S1, t1), ..., (Sn, tn)〉 |= (at-most-once φ) iff ∀i : 0 ≤ i ≤ n · if Si |= φ then ∃j : j ≥ i · ∀k : i ≤ k ≤ j · Sk |= φ and ∀k : k > j · Sk |= ¬φ
〈(S0, 0), (S1, t1), ..., (Sn, tn)〉 |= (sometime-after φ ψ) iff ∀i · if Si |= φ then ∃j : i ≤ j ≤ n · Sj |= ψ
〈(S0, 0), (S1, t1), ..., (Sn, tn)〉 |= (sometime-before φ ψ) iff ∀i · if Si |= φ then ∃j : 0 ≤ j < i · Sj |= ψ
〈(S0, 0), (S1, t1), ..., (Sn, tn)〉 |= (always-within t φ ψ) iff ∀i · if Si |= φ then ∃j : i ≤ j ≤ n · Sj |= ψ and tj − ti ≤ t

Figure 1: Semantics of the basic modal operators in PDDL3.
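The interpretations in Figure 1 translate almost line by line into executable checks. The sketch below is a minimal illustration, assuming a trajectory is a Python list of (state, time) pairs, a state is a frozenset of atoms, and a non-modal formula is any predicate over one state; the function names and the helper atom are hypothetical and chosen only to mirror the operator names. The at_most_once check uses an equivalent reformulation (at most one maximal block of consecutive states satisfying φ) rather than the quantifier form of the figure.

```python
from typing import Callable, FrozenSet, List, Tuple

State = FrozenSet[str]
Trajectory = List[Tuple[State, float]]
Formula = Callable[[State], bool]  # non-modal formula, evaluated in a single state


def atom(name: str) -> Formula:
    """Hypothetical helper: the formula that holds when `name` is in the state."""
    return lambda state: name in state


def at_end(traj: Trajectory, phi: Formula) -> bool:
    return phi(traj[-1][0])


def always(traj: Trajectory, phi: Formula) -> bool:
    return all(phi(s) for s, _ in traj)


def sometime(traj: Trajectory, phi: Formula) -> bool:
    return any(phi(s) for s, _ in traj)


def within(traj: Trajectory, t: float, phi: Formula) -> bool:
    return any(phi(s) and ti <= t for s, ti in traj)


def at_most_once(traj: Trajectory, phi: Formula) -> bool:
    # Count maximal blocks of consecutive states in which phi holds.
    blocks, previous = 0, False
    for s, _ in traj:
        current = phi(s)
        if current and not previous:
            blocks += 1
        previous = current
    return blocks <= 1


def sometime_after(traj: Trajectory, phi: Formula, psi: Formula) -> bool:
    # For every i with Si |= phi there is j, i <= j <= n, with Sj |= psi.
    return all(any(psi(s2) for s2, _ in traj[i:])
               for i, (s, _) in enumerate(traj) if phi(s))


def sometime_before(traj: Trajectory, phi: Formula, psi: Formula) -> bool:
    # For every i with Si |= phi there is j, 0 <= j < i, with Sj |= psi.
    return all(any(psi(s2) for s2, _ in traj[:i])
               for i, (s, _) in enumerate(traj) if phi(s))


def always_within(traj: Trajectory, t: float, phi: Formula, psi: Formula) -> bool:
    # For every i with Si |= phi there is j >= i with Sj |= psi and tj - ti <= t.
    return all(any(psi(s2) and tj - ti <= t for s2, tj in traj[i:])
               for i, (s, ti) in enumerate(traj) if phi(s))


if __name__ == "__main__":
    example = [
        (frozenset(), 0.0),
        (frozenset({"loaded"}), 1.0),
        (frozenset({"loaded", "moving"}), 2.0),
        (frozenset({"delivered"}), 4.0),
    ]
    print(within(example, 3, atom("delivered")))                         # False
    print(always_within(example, 3, atom("loaded"), atom("delivered")))  # True
```

Because the operators are not nested, each check only ever calls a single-state predicate, which is the property the text above relies on.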

The syntax for labelling preferences is simple:

(preference [name] <GD>)

The definition of a goal description can be extended to include preference expressions. However, in PDDL3.0, we reject as syntactically invalid any expression in which preferences appear nested inside any connectives, or modalities, other than conjunction and universal quantifiers. We also consider it a syntax violation if a preference appears in the condition of a conditional effect. Note that where a named preference appears inside a universal quantifier, it is considered to be equivalent to a conjunction (over all legal instantiations of the quantified variable) of preferences all with the same name.

Where a name is selected for a preference it can be used to refer to the preference in the construction of penalties for the violated constraint. The same name can be shared between preferences, in which case they share the same penalty.

Penalties for violation of preferences are calculated using the expression

(is-violated <name>)

where <name> is a name associated with one or more preferences. This expression takes on a value equal to the number of distinct preferences with the given name that are not satisfied in the plan. Note that in PDDL3.0 we do not attempt to distinguish degrees of satisfaction of a soft constraint: we are only concerned with whether or not the constraint is satisfied. Note, too, that the count includes each separate constraint with the same name. This means that:

(preference VisitParis
  (forall (?x - tourist)
    (sometime (at ?x Paris))))

yields a violation count of 1 for (is-violated VisitParis) if at least one tourist fails to visit Paris during a plan, while

(forall (?x - tourist)
  (preference VisitParis
    (sometime (at ?x Paris))))

yields a violation count equal to the number of people who failed to visit Paris during the plan. The intention behind this is that each preference is considered to be a distinct preference, satisfied or not independently of other preferences. The naming of preferences is a convenience to allow different penalties to be associated with violation of different constraints.
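The difference between the two VisitParis formulations can be made concrete with a small Python sketch. It assumes we already know, for each tourist, whether (sometime (at ?x Paris)) holds on the plan's trajectory; the tourist names and the dictionary representation are hypothetical.

```python
from typing import Dict


def violations_forall_inside(visited: Dict[str, bool]) -> int:
    """(preference VisitParis (forall (?x - tourist) (sometime (at ?x Paris)))).

    One preference over the whole quantified formula: violated once or not at all.
    """
    return 0 if all(visited.values()) else 1


def violations_forall_outside(visited: Dict[str, bool]) -> int:
    """(forall (?x - tourist) (preference VisitParis (sometime (at ?x Paris)))).

    One preference per tourist, all sharing the name VisitParis.
    """
    return sum(1 for ok in visited.values() if not ok)


# Hypothetical outcome of a plan: only alice reaches Paris.
visited = {"alice": True, "bob": False, "carol": False}
print(violations_forall_inside(visited))   # 1
print(violations_forall_outside(visited))  # 2
```

Both functions return 0 when every tourist visits Paris, so the two formulations differ only in how strongly partial failure is penalised.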

Plans are awarded a value through the plan metric, introduced in PDDL2.1 (Fox & Long 2003). The constraints can be used in weighted expressions in a metric. For example,

(:metric minimize
  (+ (* 10 (fuel-used))
     (is-violated VisitParis)))

would weight fuel use as ten times more significant than violations of the VisitParis constraint. Note that the violation of a preference in the preconditions of an action is counted multiple times, depending on the number of the action occurrences in the plan. For instance, suppose that p is a preference in the precondition of an action a, which occurs three times in plan π. If the plan metric evaluating π contains the term (* k (is-violated p)), then this is interpreted as if it were (* v (* k (is-violated p))), where v is the number of separate occurrences of a in π for which the preference is not satisfied.
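One reading of the counting rule for precondition preferences is sketched below in Python. It assumes that, for each occurrence of action a, we know whether the preference p held in the state where a was applied, and it takes the contribution of the term (* k (is-violated p)) to be k times the number v of violating occurrences; the function names are hypothetical.

```python
from typing import Sequence


def precondition_violations(satisfied_at_each_occurrence: Sequence[bool]) -> int:
    """v: number of occurrences of the action at which the preference fails."""
    return sum(1 for ok in satisfied_at_each_occurrence if not ok)


def penalty_term(k: float, satisfied_at_each_occurrence: Sequence[bool]) -> float:
    """Contribution of (* k (is-violated p)) for a precondition preference p."""
    return k * precondition_violations(satisfied_at_each_occurrence)


# Action a occurs three times; p holds only at the first occurrence, so v = 2.
print(penalty_term(5, [True, False, False]))  # 10
```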

SemanticsWe say that

〈(S0, 0), (S1, t1), ..., (Sn, tn)〉 |= (preference Φ)is always true, so this allows preference statements to becombined in formulae expressing goals. The point in mak-ing the formula always true is that the preference is a softconstraint, so failure to satisfy it is not considered to falsifythe goal formula. In the context of action preconditions, wesay Si |= (preference Φ) is always true, too, for the samereasons.

We also say that a preference (preference Φ) is satisfied iff 〈(S0, 0), (S1, t1), ..., (Sn, tn)〉 |= Φ and violated otherwise. This means that (or Φ (preference Ψ)) is the same as (preference (or Φ Ψ)), both in terms of the satisfaction of the formulae and also in terms of whether the preference is satisfied. The same idea is applied to action precondition preferences. Hence, a goal such as:

    (and (at package1 london)
         (preference (clean truck1)))

would lead to the following interpretation:

    〈(S0, 0), (S1, t1), ..., (Sn, tn)〉 |= (and (at package1 london) (preference (clean truck1)))
    iff 〈(S0, 0), (S1, t1), ..., (Sn, tn)〉 |= (at package1 london)
        and 〈(S0, 0), (S1, t1), ..., (Sn, tn)〉 |= (preference (clean truck1))
    iff Sn |= (at package1 london)
    iff (at package1 london) ∈ Sn,

since the preference is always interpreted as true. In addition, the preference would be satisfied iff:

    〈(S0, 0), (S1, t1), ..., (Sn, tn)〉 |= (at end (clean truck1))
    iff (clean truck1) ∈ Sn.

If the preference is not satisfied, it is violated.

Now suppose that we have the following preferences and plan metric:

    (preference p1 (always (clean truck1)))
    (preference p2 (and (at end (at package2 paris))
                        (sometime (clean truck1))))
    (preference p3 (...))

    (:metric minimize
      (+ (* 10 (is-violated p1))
         (* 5 (is-violated p2))
         (is-violated p3)))

Suppose we have two plans, π1 and π2, such that π1 does not satisfy preferences p1 and p3 (but it satisfies preference p2) and π2 does not satisfy preferences p2 and p3 (but it satisfies preference p1). Then the metric for π1 would yield a value (11) that is higher than that for π2 (6), and we would say that π2 is better than π1.
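The arithmetic behind this comparison can be checked with a short Python sketch. It assumes each named preference is violated at most once, so that (is-violated p) is either 0 or 1; the names `weights`, `metric`, `pi1` and `pi2` are illustrative only.

```python
# Weights of the penalty terms in the metric above (p3 has an implicit weight of 1).
weights = {"p1": 10, "p2": 5, "p3": 1}


def metric(violated):
    """Sum the penalties of the violated preferences (one violation each)."""
    return sum(weights[name] for name in violated)


pi1 = {"p1", "p3"}  # pi1 violates p1 and p3
pi2 = {"p2", "p3"}  # pi2 violates p2 and p3

print(metric(pi1), metric(pi2))
```

Since the metric is minimized, the plan with the lower value, π2, is preferred.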

Formally, a preference precondition is satisfied if the state in which the corresponding action is applied satisfies the preference. Note that the restriction on where preferences may appear in precondition formulae and goals, together with the fact that they are banned from conditional effects, means that this definition is sufficient: the context of their appearance will never make it ambiguous whether it is necessary to determine the status of a preference. Similarly, a goal preference is satisfied if the proposition it contains is satisfied in the final state. Finally, an invariant (overall) condition of a durative action is satisfied if the corresponding proposition is true throughout the duration of the action.

In some cases, it can be hard to combine preferences with an appropriate weighting to achieve the intended balance between soft constraints and other factors that contribute to the value of a plan (such as plan make span, resource consumption and so on). For example, ensuring that a constraint takes priority over a plan cost associated with resource consumption (such as make span or fuel consumption) is particularly tricky: the constraint must be weighted with a value that is higher than any possible consumption cost, and this might not be possible to determine. With non-linear functions it is possible to achieve a bounded behaviour for costs associated with resources. For example, if a constraint, C, is to be considered always to have greater importance than the make span for the plan then a metric could be defined as follows:

    (:metric minimize (+ (is-violated C)
                         (- 1 (/ 1 (total-time)))))

This metric will always prefer a plan that satisfies C, but will use make span to break ties.
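A small Python sketch makes the bounding argument concrete. It rests on an assumption not stated in the text: the make span is at least 1 in the chosen time unit, so that the term 1 − 1/total-time stays in the interval [0, 1) and can never outweigh a single violation. The function name `bounded_metric` is hypothetical.

```python
def bounded_metric(violations, total_time):
    """(+ (is-violated C) (- 1 (/ 1 (total-time)))).

    Assumes total_time >= 1, which keeps the time term in [0, 1).
    """
    if total_time < 1:
        raise ValueError("sketch assumes a make span of at least 1")
    return violations + (1 - 1 / total_time)


slow_but_satisfying = bounded_metric(violations=0, total_time=1000)
fast_but_violating = bounded_metric(violations=1, total_time=1)

# A plan that satisfies C wins however long it takes...
print(slow_but_satisfying < fast_but_violating)
# ...and among plans that agree on C, the shorter make span wins.
print(bounded_metric(0, 5) < bounded_metric(0, 10))
```

If the make span could drop below 1, the time term would become negative and unbounded, and the priority of C would no longer be guaranteed; rescaling the time unit is one way to avoid this.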

Nevertheless, for the competition, where it is important to provide an unambiguous specification by which to rank plans, the use of plan metrics in this way is clearly very straightforward and convenient. We leave for later proposals the possibilities for extending the evaluation of plans in the face of soft constraints.

Some Examples

The following state trajectory constraints could be stated either as strong constraints or soft constraints.

“A fragile block can never have something above it”:

    (always (forall (?b - block)
      (implies (fragile ?b) (clear ?b))))

“A fragile block can have at most one block on it”:

    (always (forall (?b1 ?b2 - block)
      (implies (and (fragile ?b1) (on ?b2 ?b1))
               (clear ?b2))))

“The blocks forming the same tower always have the same color”:

    (always (forall (?b1 ?b2 - block ?c1 ?c2 - color)
      (implies (and (on ?b1 ?b2) (color ?b1 ?c1)
                    (color ?b2 ?c2))
               (= ?c1 ?c2))))

“Each block should be picked up at least once”:

    (forall (?b - block) (sometime (holding ?b)))

“Each block should be picked up at most once”:

    (forall (?b - block) (at-most-once (holding ?b)))

“In some state visited by the plan all blocks should be on the table”:

    (sometime (forall (?b - block) (on-table ?b)))

This constraint requires all the blocks to be on the table in the same state. In contrast, if we only require that every block should be on the table in some state we can write:

    (forall (?b - block) (sometime (on-table ?b)))

“Whenever I am at a restaurant, I want to have a reservation”:

    (always (forall (?r - restaurant)
      (implies (at ?r) (have-reservation ?r))))

“Each truck should visit each city at most once”:

    (forall (?t - truck ?c - city) (at-most-once (at ?t ?c)))

“At some point in the plan all the trucks should be at city1”:

    (sometime (forall (?t - truck) (at ?t city1)))

“Each truck should visit each city exactly once”:

    (and (forall (?t - truck ?c - city)
           (at-most-once (at ?t ?c)))
         (forall (?t - truck ?c - city)
           (sometime (at ?t ?c))))
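To make the modal operators used in these examples concrete, the following Python sketch evaluates always, sometime and at-most-once over a finite sequence of states. It is an illustration under simplifying assumptions, not the competition validator: states are modelled as sets of ground atoms, time stamps are ignored, and the atom encoding as tuples is hypothetical.

```python
def always(states, phi):
    """phi holds in every state of the trajectory."""
    return all(phi(s) for s in states)


def sometime(states, phi):
    """phi holds in at least one state of the trajectory."""
    return any(phi(s) for s in states)


def at_most_once(states, phi):
    """phi holds over at most one contiguous run of states."""
    runs = 0
    previous = False
    for s in states:
        current = phi(s)
        if current and not previous:
            runs += 1  # phi has just become true: a new run starts
        previous = current
    return runs <= 1


# Hypothetical trajectory: truck t1 leaves city c0, visits c1, returns to c0.
states = [{("at", "t1", "c0")}, {("at", "t1", "c1")}, {("at", "t1", "c0")}]


def at_c0(s):
    return ("at", "t1", "c0") in s


def at_c1(s):
    return ("at", "t1", "c1") in s


# "Exactly once" is the conjunction of at-most-once and sometime.
print(at_most_once(states, at_c1) and sometime(states, at_c1))
print(at_most_once(states, at_c0))
```

Here t1 visits c1 exactly once, while c0 is visited in two separate runs and so fails the at-most-once constraint.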

“Each city is visited by at most one truck at the same time”:

    (forall (?t1 ?t2 - truck ?c1 - city)
      (always (implies (and (at ?t1 ?c1)
                            (at ?t2 ?c1))
                       (= ?t1 ?t2))))

The following two examples use the IPC-3 Rovers domain involving numerical fluents. “We would like that the energy of every rover should always be above the threshold of 5 units”:

    (always (forall (?r - rover) (> (energy ?r) 5)))

“Whenever the energy of a rover is below 5, it should be at the recharging location within 10 time units”:

    (forall (?r - rover)
      (always-within 10 (< (energy ?r) 5)
                        (at ?r recharging-point)))
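The timed operator in the last example can be sketched in Python as follows. The sketch assumes the usual reading of always-within: whenever the first condition holds in a state reached at time ti, the second must hold in some state reached at a time tj with ti ≤ tj ≤ ti + t. The trajectory is a list of (state, time) pairs, and the dictionary encoding of rover states is hypothetical.

```python
def always_within(trajectory, t, phi, psi):
    """Whenever phi holds at time t_i, psi holds at some t_j in [t_i, t_i + t]."""
    for i, (s_i, t_i) in enumerate(trajectory):
        if phi(s_i):
            reached_in_time = any(
                psi(s_j) and t_j - t_i <= t for s_j, t_j in trajectory[i:]
            )
            if not reached_in_time:
                return False
    return True


def low_energy(state):
    return state["energy"] < 5


def at_recharging_point(state):
    return state["at"] == "recharging-point"


# Hypothetical rover trajectories: energy drops below 5 at time 5.
on_time = [
    ({"energy": 8, "at": "rock"}, 0),
    ({"energy": 4, "at": "rock"}, 5),
    ({"energy": 4, "at": "recharging-point"}, 12),
]
too_late = on_time[:2] + [({"energy": 4, "at": "recharging-point"}, 20)]

print(always_within(on_time, 10, low_energy, at_recharging_point))
print(always_within(too_late, 10, low_energy, at_recharging_point))
```

In the first trajectory the rover reaches the recharging point 7 time units after its energy drops, which satisfies the constraint; in the second it takes 15 time units, which violates it.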

The next two examples illustrate the usefulness of sometime-before and sometime-after. The first one states that “a truck can visit a certain city (where initially there is no truck) only after having visited another particular one”; the second one that “if a taxi has been used and it is at the depot, then it has to be cleaned” (if a taxi is used but it does not go back to the depot, then there is no need to clean it).

    (forall (?t - truck)
      (sometime-before (at ?t city1) (at ?t city2)))

    (forall (?t - taxi)
      (sometime-after (and (at ?t depot) (used ?t))
                      (clean ?t)))

“We want a plan moving package1 to London such that truck1 is always maintained clean, and at some point truck2 is at Paris. Moreover, we also prefer that truck3 is always clean and that at the end of the plan package2 is at London”:

    (:goal (and (at package1 london)
                (preference (at package2 london))))

    (:constraints
      (and (always (clean truck1))
           (sometime (at truck2 paris))
           (preference (always (clean truck3)))
           (preference (at end (at package2 london)))))

“We prefer that every fragile package to be transported is insured”:

    (forall (?p - package)
      (preference P1
        (always (implies (fragile ?p) (insured ?p)))))

We now consider an example with a plan metric. “We want three jobs completed. We would prefer to take a coffee-break and that we take it when everyone else takes it (at coffee-time) rather than at any time. We would also like to finish reviewing a paper, but it is less important than taking a break. Finally, we would like to be finished so that we can get home at a reasonable time, and this matters more than finishing the review or having a sociable coffee break”:

    (:goal (and (finished job1)
                (finished job2)
                (finished job3)))

    (:constraints
      (and (preference break
             (sometime (at coffee-room)))
           (preference social
             (sometime (and (at coffee-room) (coffee-time))))
           (preference reviewing (reviewed paper1))))

    (:metric minimize
      (+ (* 5 (total-time))
         (* 4 (is-violated social))
         (* 2 (is-violated break))
         (is-violated reviewing)))

Now consider three plans, π1, π2 and π3, such that all three plans complete the three jobs. Suppose π1 achieves this in 4 hours, but takes no break and does not include reviewing the paper. Suppose π2 completes the jobs in 8 hours, but takes a coffee-break at coffee-time and reviews the paper. Finally, π3 completes the jobs in 6 hours, including reviewing the paper, but only by taking a short break when the coffee room is empty. Then the values of the plans are:

    Plan    Quality
    π1      5*4 + 4*1 + 2*1 + 1 = 27
    π2      5*8 + 4*0 + 2*0 + 0 = 40
    π3      5*6 + 4*1 + 2*0 + 0 = 34

This makes π1 the best plan and π2 the worst.
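The table can be reproduced with a few lines of Python. The sketch assumes that total-time is measured in hours and that each preference is violated at most once; the names `plan_value` and `plans` are illustrative.

```python
# Penalty weights of the three named preferences in the metric above.
weights = {"social": 4, "break": 2, "reviewing": 1}


def plan_value(hours, violated):
    """5 * total-time plus the weighted preference violations."""
    return 5 * hours + sum(weights[name] for name in violated)


# (make span in hours, set of violated preferences) for each plan.
plans = {
    "pi1": (4, {"social", "break", "reviewing"}),
    "pi2": (8, set()),
    "pi3": (6, {"social"}),
}

values = {name: plan_value(hours, violated) for name, (hours, violated) in plans.items()}
best = min(values, key=values.get)
worst = max(values, key=values.get)
print(values, best, worst)
```

Because the weight on total-time dominates the preference penalties in this metric, the shortest plan wins even though it violates every preference.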

Plan Validation and Evaluation

A plan validator will be developed as an extension of the existing validator used in the previous competitions. The two key aspects of this extension are checking state trajectory constraints in the goal, which does not complicate the execution simulation for a plan, and the checking of preferences in order to compare plans. This latter extension will involve identifying the constraint violations associated with each plan and their violation times, in order to evaluate the plan quality according to the specified metric (which may include terms for the preference violations). The organizers of IPC-5 are considering the possibility of using different variants of the test problems involving only strong constraints or soft constraints, with a possible additional distinction between simple preferences, involving only goals or action preconditions, and more complex preferences involving general soft constraints. More details about this organization of the benchmarks will be announced on the web page of the deterministic track of IPC-5: http://ipc5.ing.unibs.it.

Extensions and Generalization

There is considerable scope for developing the proposed extension. First, and most obviously, modal operators could be allowed to nest. This would allow a rich expressive power in the specification of modal temporal goals. Nesting would allow constraints to be applied to parts of trajectories, as is usual in modal temporal logics. In addition, we could introduce propositions representing that an action appears in a plan.

Other modal operators could be added. We have excluded them from PDDL3.0 because we have found that many interesting and challenging goals can be captured without them, and we do not wish to add unnecessarily to the load on potential competitors. The modal operator until would be an obvious one to add. Without nesting, a related always-until and sometime-until would allow expression of goals such as “every time a truck arrives at the depot, it must stay there until loaded” or “when the truck arrives at the depot, it must stay there until cleaned and fully refuelled at least once in the plan”. The formal semantics of always-until and sometime-until can be easily derived from that of until in LTL. By combining always-until and other modalities we can express complex constraints such as “whenever the energy of a rover is below 5, it should be at the recharging location within 10 time units and remain there until recharged”:

    (and (always-until (charged ?r) (at ?r rechargingpoint))
         (always-within 10 (< (charge ?r) 5)
                           (at ?r rechargingpoint)))

〈(S0, 0), (S1, t1), ..., (Sn, tn)〉 |= (always-persist t φ) iff
    ∀i : 0 < i ≤ n · if Si |= φ and Si−1 |= ¬φ then ∃j : j − i ≥ t · ∀z : i ≤ z ≤ j · Sz |= φ, and
    if S0 |= φ then ∀z : z ≤ t · Sz |= φ

〈(S0, 0), (S1, t1), ..., (Sn, tn)〉 |= (sometime-persist t φ) iff
    ∃i : 0 < i ≤ n · if Si |= φ and Si−1 |= ¬φ then ∃j : j − i ≥ t · ∀z : i ≤ z ≤ j · Sz |= φ, or
    if S0 |= φ then ∀z : z ≤ t · Sz |= φ

Figure 2: Semantics of always-persist and sometime-persist.

Another modality that would be a useful extension of the expressive power is a complement for within, such as persist, with the semantics that a proposition once made true must persist for at least some minimal period of time. Without nesting, a related always-persist and sometime-persist would allow expression of goals such as “I want to spend at least 2 days in each of the cities on my tour”, or “every time the taxi goes to the station it must wait for at least 10 without a passenger”. The formal semantics of always-persist and sometime-persist is given in Figure 2. A generalisation that would allow within and persist to be combined would be to allow the time specification to be associated with a comparison operator to indicate whether the bound is an upper or lower bound.

We have deliberately not introduced the operator next, which is common in modal temporal logics. This is because concurrent fragments of a plan might cause a state change that is not relevant to the part of the state in which the next condition is intended to apply. Furthermore, the fact that PDDL plans are embedded on a real time line means that the intention behind next is less obviously relevant. We realise that next has been particularly useful in expressing control rules for planners like TALPlanner (Kvarnström & Magnusson 2003) and TLPlan (Bacchus & Kabanza 2000), but our intention in developing this extension is to focus on providing a language that is useful for expressing constraints that govern plan quality, rather than for control knowledge. We believe that the use of always-within captures a much more useful concept for plan quality that is actually a far more realistic constraint in modelling planning problems.

Extensions to the use of soft constraints include the definition of more complex preferences, such as conditional preferences, and a possible qualitative method for expressing priorities over preferences. Moreover, the evaluation of the soft constraints could be extended by considering a degree of constraint violation, such as the amount of time when an always constraint is violated, the delay that falsifies a within constraint, or the number of times an always-after constraint is violated.

Acknowledgments

We would like to thank Y. Dimopoulos, C. Domshlak, S. Edelkamp, M. Fox, P. Haslum, J. Hoffmann, A. Jonsson, D. McDermott, A. Saetti, L. Schubert, I. Serina, D. Smith and D. Weld for some very useful discussions about PDDL3.

References

Bacchus, F., and Kabanza, F. 2000. Using temporal logic to express search control knowledge for planning. Artificial Intelligence 116(1-2):123–191.
Brafman, R., and Chernyavsky, Y. 2005. Planning with goal preferences and constraints. In Proc. of ICAPS-05.
Briel, M.; Sanchez, R.; Do, M.; and Kambhampati, S. 2004. Effective approaches for partial satisfaction (over-subscription) planning. In Proc. of the AAAI-04.
Delgrande, P. J.; Schaub, T.; and Tompits, H. 2005. A general framework for expressing preferences in causal reasoning and planning. In Proc. of the 7th Int. Symposium on Logical Formalizations of Commonsense Reasoning.
Fox, M., and Long, D. 2003. PDDL2.1: An extension to PDDL for expressing temporal planning domains. Journal of AI Research 20:61–124.
Gerevini, A., and Long, D. 2005. Plan constraints and preferences in PDDL3. Technical Report RT-2005-08-47, Dep. di Elettronica per l’Automazione, Università di Brescia, Italy. An extension with the BNF grammar of PDDL3.0 is available from http://ipc5.ing.unibs.it.
Gerevini, A., and Long, D. 2006. Preferences and soft constraints in PDDL3. In Proc. of ICAPS Workshop on Preferences and Soft Constraints in Planning.
Kvarnström, J., and Magnusson, M. 2003. TALplanner in the 3rd international planning competition: Extensions and control rules. Journal of AI Research 20.
Miguel, I.; Jarvis, P.; and Shen, Q. 2001. Efficient flexible planning via dynamic flexible constraint satisfaction. Engineering Applications of Artificial Intelligence 14(3):301–327.
Smith, D. 2004. Choosing objectives in over-subscription planning. In Proc. of ICAPS-04.
Son, T. C., and Pontelli, E. 2004. Planning with preferences using logic programming. In Proc. of LPNMR-04. Springer-Verlag. LNAI 2923.

The Benchmark Domains of the Deterministic Part of IPC-5

Yannis Dimopoulos+, Alfonso Gerevini*, Patrik Haslum◦, Alessandro Saetti*

+ Department of Computer Science, University of Cyprus, Nicosia, Cyprus
* Department of Electronics for Automation, University of Brescia, Brescia, Italy
◦ Department of Computer and Information Science, Linköping University, Linköping, Sweden

*{gerevini,saetti}@ing.unibs.it

Abstract

We present a set of planning domains and problems that have been used as benchmarks for the fifth International Planning Competition. Some of them were inspired by different types of logistics applications, others were obtained by encoding known problems from operations research and bioinformatics. For each domain, we developed several variants using different fragments of PDDL3 with increasing expressiveness.

Introduction

The language of the fifth International Planning Competition (IPC-5), PDDL3.0 (Gerevini & Long 2005), is an extension of the previous versions of PDDL (Fox & Long 2003; Edelkamp & Hoffmann 2004) that aims at a better characterization of plan quality. The new language allows us to express strong and soft constraints on plan trajectories (i.e., constraints over intermediate states reached by the plan), as well as strong and soft problem goals. Strong trajectory constraints and goals must be satisfied by any valid plan, while soft trajectory constraints and goals (called preferences) express desired constraints and goals, which do not necessarily have to be achieved. In PDDL3.0, the plan metric expression can include weighted penalty terms associated with the violation of the soft trajectory constraints and goals in the problem.

This paper gives an informal presentation of the benchmark domains and problems that we developed for IPC-5, and that include most of the new features of PDDL3.0.¹ We designed five new domains, as well as some new variants of two domains that have been used in previous planning competitions. In order to make the language more accessible to the IPC-5 competitors, we developed for each domain several variants, using different fragments of PDDL3.0. The “propositional” and “metric-time” variants use only the constructs of PDDL2.2 (Edelkamp & Hoffmann 2004); the “simple preferences” variant extends the propositional variant with preferences over the problem goals; the “qualitative preferences” variant also includes preferences over state trajectory constraints; the “metric-time constraints” variant extends the metric-time variant with strong state trajectory constraints; and, finally, the “complex preferences” variant uses the full power of the language, including soft trajectory constraints and goals. However, not all the different variants of each domain actually use the full fragment “allowed” for that variant.

¹ A detailed description of the IPC-5 benchmarks is outside the scope of this short paper; their PDDL formalization is available from the IPC-5 website: http://ipc5.ing.unibs.it.

In the domain variants involving preferences we created for each planning problem a plan metric incorporating terms specifying the penalties for violations of the preferences. The metric is a very important part of the problem statements in such domains, since it determines the best trade-off between different, perhaps mutually exclusive, preferences, and we took great care to ensure that the metrics in the test problems give rise to challenging optimization problems.

The IPC-5 test domains have different motivations. Some of them were inspired by real world applications (e.g., storage, trucks and pathways); others were aimed at exploring the applicability and effectiveness of automated planning for new applications (pathways), or for known problems that have been addressed in other fields of computer science (TPP and openstacks); finally, two domains were taken from previous competitions, as sample references for the advancement of automated planning with respect to the existing benchmarks (rovers and pipesworld).

For some domains, the problems we generated have many solutions. In these problems, the most challenging aspect is finding plans of good quality. Other problems are challenging for different reasons: the expressiveness of the planning language used to model the problem, including some of the new features of PDDL3.0, the large size of the problem, or the known NP-hardness of the computational problem they model. In most cases, the test problems were automatically (or semi-automatically) generated by using dedicated software tools.

The Travelling Purchaser Domain

This is a relatively recent planning domain that has been investigated in operations research (OR) for several years, e.g., (Riera-Ledesma & Salazar-Gonzalez 2005). The Travelling Purchaser Problem (TPP) is a known generalization of the Travelling Salesman Problem, and is defined as follows. We have a set of products and a set of markets. Each market can provide a limited amount of each product at a known price. The TPP consists in selecting a subset of markets such that a given demand for each product can be purchased, minimizing the combined travel and purchase cost. This problem arises in several applications, mainly in routing and scheduling contexts, and it is NP-hard. In OR, computing optimal or near optimal solutions for the TPP instances is still an active research topic.
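The objective of the TPP can be illustrated by a small Python sketch that evaluates the cost of one candidate solution. The instance data (`travel`, `offers`, `demand`) and the function `tpp_cost` are hypothetical and only serve to show how travel and purchase costs combine; the sketch does not search for an optimal solution.

```python
# Hypothetical instance: symmetric travel costs between the depot and two markets.
travel = {
    ("depot", "m1"): 4, ("m1", "depot"): 4,
    ("depot", "m2"): 5, ("m2", "depot"): 5,
    ("m1", "m2"): 3, ("m2", "m1"): 3,
}
# offers[market][product] = (available quantity, unit price)
offers = {"m1": {"apples": (5, 2)}, "m2": {"apples": (10, 3), "pears": (4, 1)}}
demand = {"apples": 8, "pears": 2}


def tpp_cost(route, purchases):
    """Cost of visiting `route` from the depot and back, buying `purchases`.

    purchases[(market, product)] is the quantity bought at that market.
    """
    stops = ["depot", *route, "depot"]
    travel_cost = sum(travel[(a, b)] for a, b in zip(stops, stops[1:]))
    purchase_cost = 0
    bought = {}
    for (market, product), quantity in purchases.items():
        available, price = offers[market][product]
        if market not in route or quantity > available:
            raise ValueError("infeasible purchase")
        purchase_cost += quantity * price
        bought[product] = bought.get(product, 0) + quantity
    if any(bought.get(p, 0) < q for p, q in demand.items()):
        raise ValueError("demand not met")
    return travel_cost + purchase_cost


two_markets = tpp_cost(
    ["m1", "m2"],
    {("m1", "apples"): 5, ("m2", "apples"): 3, ("m2", "pears"): 2},
)
one_market = tpp_cost(["m2"], {("m2", "apples"): 8, ("m2", "pears"): 2})
print(two_markets, one_market)
```

In this instance the longer tour is cheaper overall because the lower apple price at m1 outweighs the extra travel, which is the kind of trade-off that makes the market selection non-trivial.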

For IPC-5, we have formalized several variants of this domain in PDDL. One of them is equivalent to the original TPP, while the others are different formulations or significant (we believe and hope) extensions. In all these domain variants, plan quality is important, although for some instances even finding an arbitrary solution could be quite difficult for a fully-automated planner.

For this domain, we developed both a metric version without time and a metric-time version. We begin the description with the metric version because it is the one equivalent to the original formulation of the TPP.

Metric

This version is equivalent to the original formulation of the TPP in OR. There are only three operators, two of which are used to model the purchasing actions: “buyall” and “buyallneeded”. The first buys at a certain market (?m) the whole amount of a type of goods (?g) sold by the market (?m and ?g are operator parameters), while the second one buys at ?m the amount of ?g that is needed to complete the purchase of ?g (as specified in the problem goals). In this version, every market is directly connected to every other market and to the depots. Moreover, there is only one depot and only one truck.

Propositional

This version models a variant of the original TPP where: (1) there can be more than one depot and more than one truck; (2) the amounts of goods are discrete and represented by qualitative levels; (3) every type of goods has the same price, independent of the market where we buy it; (4) there are two new operators for loading and unloading goods to/from trucks; (5) markets and depots can be indirectly connected.

Simple Preferences

The operators in this domain are the same as in the propositional version. The difference is in the goals, which are all soft goals (preferences). These preferences concern maximizing the level of goods that are stored in the depots, constraints between the levels of different stored goods, and the safety condition that all purchased goods are stored at some market.

Qualitative Preferences

The operators in this version are the same as in the propositional version. All goals are preferences concerning maximizing, for every type of goods, the purchased and stored levels. This version includes preferences over trajectory constraints. These are constraints between the levels of two types of stored goods, constraints about the use of the trucks for loading goods, and constraints imposing the use of every truck. Moreover, we have the preference that in the final state all purchased goods are stored at some depot.

Metric-Time

With respect to the simpler metric version, which is equivalent to the original formulation of the TPP, this version has the following main differences: points (1), (4) and (5) from the description of the propositional variant apply here too; each action has a duration, and the plan quality is a linear combination of total-time (makespan) and the total cost of traveling and purchasing; and the operator “buyall” has a “rebate” rate (if you buy the whole amount of a type of goods that is sold at a market, then you have a discount).

Metric-Time Constraints

The operators in this version are the same as in the metric-time version. In addition, in the domain file, we have some strong constraints imposing that in the final state all purchased goods are stored, that every market can be visited by at most one truck at the same time, and that every truck is used. Moreover, in the problem specification, we have several strong constraints about the relative amounts of different types of goods stored in a depot, the number of times a truck can visit a market, the order in which goods should be stored, the order in which we should store some type of goods and buy another one, and deadlines for delivering goods once they have been loaded in a truck.

Complex Preferences

The operators in this version are the same as in the metric-time version. In addition, it contains many preferences over state trajectory constraints that are similar to those used for the metric-time constraints version.

The Openstacks Domain

The openstacks domain is based on the “minimum maximum simultaneous open stacks” combinatorial optimization problem, which can be stated as follows:

A manufacturer has a number of orders, each for a combination of different products, and can only make one product at a time. The total required quantity of each product is made at the same time (because changing from making one product to making another requires a production stop). From the time that the first product included in an order is made to the time that all products included in the order have been made, the order is said to be “open” and during this time it requires a “stack” (a temporary storage space). The problem is to order the making of the different products so that the maximum number of stacks that are in use simultaneously, or equivalently the number of orders that are in simultaneous production, is minimized (because each stack takes up space in the production area).
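The objective function can be sketched in Python as follows. The sketch assumes that an order occupies a stack from the step at which its first product is made up to and including the step at which its last product is made; the order data and the function name `max_open_stacks` are hypothetical.

```python
def max_open_stacks(orders, production_sequence):
    """Peak number of simultaneously open orders for a given product ordering.

    orders maps each order name to the set of products it includes.
    """
    position = {product: i for i, product in enumerate(production_sequence)}
    peak = 0
    for step in range(len(production_sequence)):
        open_now = sum(
            1
            for products in orders.values()
            if min(position[p] for p in products)
            <= step
            <= max(position[p] for p in products)
        )
        peak = max(peak, open_now)
    return peak


# Hypothetical instance: order o1 needs two products, o2 and o3 need one each.
orders = {"o1": {"a", "d"}, "o2": {"b"}, "o3": {"c"}}

print(max_open_stacks(orders, ["a", "b", "c", "d"]))
print(max_open_stacks(orders, ["a", "d", "b", "c"]))
```

Making the two products of o1 consecutively closes that order before the others open, so the second ordering needs only one stack while the first needs two.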

This problem, and many related variants, have been studied in operations research (see, e.g., Fink & Voss 1999). It is known to be NP-hard, and equivalent to several other problems (Linhares & Yanasse 2002). This is a pure optimization problem: for any instance of the problem, every ordering of the making of products is a solution, which at worst uses as many simultaneously open stacks as there are orders. Thus, finding a plan is quite trivial (in the sense that there exists a domain-specific linear-time algorithm that solves the problem), but finding a plan of high quality is hard (even for a domain-specific algorithm).

The openstacks problem was recently posed as a challenge problem for the constraint programming community, and, as a result, a large library of problem instances, together with results on those instances for a number of different solution approaches, is available (see Smith & Gent (2005)).

Propositional

This variant is simply an encoding of the original openstacks problem as a planning problem. The encoding is done in such a way that minimizing the length (sequential or parallel) of the plan also minimizes the objective function, i.e., the maximum number of simultaneously open stacks. There are three basic actions to start orders, make products, and ship orders once they are completed, plus an action that “opens” a new stack, but in order to ensure the correspondence between parallel length and the objective function, some of these actions are split in two parts. The domain formulation uses some ADL constructs (quantified disjunctive preconditions), but these can be compiled away with only a linear increase in size.

The problems are a selection of the problems used in the constraint modelling challenge, including a few problems that could not be solved (optimally) by any of the CSP approaches, plus a small number of extra small instances.

Time

In this variant of the domain the number of available stacks is fixed, and the objective is instead to minimize makespan. Makespan is dominated by the actions that make products. For each problem, the number of stacks is chosen to be somewhere between the optimal and the trivial upper bound (equal to the number of orders).

Metric-Time

In this variant, the objective function is to minimize a (linear) combination of the number of open stacks and the plan makespan. The number of open stacks is modelled using numeric fluents.

Simple Preferences

In this variant, the goal of including all required products in each order is softened, and a “score” (or “reward”) is instead given for each product that is included in an order when it is shipped. The objective is to maximize this score. The maximum number of open stacks is fixed, as in the temporal variant, but at a number slightly less than the optimal number required to satisfy all the requirements of all orders.

This version of the domain uses an ADL construct (a quantified conditional effect) that can only be compiled away at an exponential increase in problem size.

Complex Preferences

This version, like the previous, has soft goals, but also a variable maximum number of open stacks. The objective is to maximize a linear combination of the score (positive) and the number of open stacks (negative). Also like the previous version, the formulation uses a quantified conditional effect.

The Storage Domain

“Storage” is a planning domain involving spatial reasoning. Basically, the domain is about moving a certain number of crates from some containers to some depots by hoists. Inside a depot, each hoist can move according to a specified spatial map connecting different areas of the depot. The test problems for this domain involve different numbers of depots, hoists, crates, containers, and depot areas. While in this domain it is important to generate plans of good quality, for many test problems, even finding any solution can be quite hard for domain-independent planners.

Altogether, the different variants of this domain involve almost all the new features of PDDL3.0. Note that this domain is basically a propositional domain, where the space for storing crates is represented by PDDL literals. For this domain, instead of a metric-time version, we have a “time-only” version (without numerical fluents).

Propositional

The domain has five different actions: an action forlifting a crate by a hoist, an action for dropping a crateby a hoist, an action for moving a hoist into a depot,an action for moving a hoist from one area of a depotto another one, and finally an action for moving a hoistoutside a depot.

Time

This variant is basically the propositional variant where the actions have duration and the plan quality is total-time (plan makespan).

Simple Preferences

The operators in this domain are the same as those in the propositional version. The main difference is in the goals. All goals are soft goals (preferences). These preferences concern which depots and depot areas should be used for storing the crates, the desire that only “compatible” crates are stored in the same depot, the desire that the incompatible crates stored in the same depot are located at non-adjacent areas of the depot and, finally, the desire that the hoists are located in depots different from those where we store the crates.

Qualitative Preferences

The operators in this domain are the same as those in the propositional version. The differences are in the preferences over the goals and state trajectory constraints. All goals are soft goals similar to some of the soft goals specified in the simple preferences variant. The preferences over trajectory constraints concern constraints about the use of the available hoists for moving the crates, and about the order in which crates are stored in the depots. Moreover, we have the preference that in any state crossed by the plan, the adjacent areas in a depot can be occupied only by compatible crates.

Time Constraints

The operators in this version are the same as those in the temporal version. The problem goals are specified by an “at-end” constraint imposing that all crates are stored in a depot. The problems have several constraints imposing that a crate can be lifted at most once, ordering constraints about storing certain crates before others, deadlines for storing the crates, and a maximum time a hoist can stay outside a depot. There are also constraints imposing a safety condition, that in the final state, all hoists are inside a depot; some constraints imposing that every hoist is used; and some constraints imposing that incompatible crates are not stored at adjacent areas of the depot.

Time Preferences

The operators in this version are the same as those in the temporal version. In addition, this version contains many preferences over state trajectory constraints that are similar to those used for the time constraints version.

The Trucks Domain

Essentially, this is a logistics domain about moving packages between locations by trucks under certain constraints. The loading space of each truck is organized by areas: a package can be (un)loaded onto an area of a truck only if the areas between the area under consideration and the truck door are free. Moreover, some packages must be delivered within a deadline. In this domain, it is important to find good quality plans. However, for many test problems, even finding one plan could be a rather difficult task.
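The loading rule can be illustrated with a minimal Python sketch. It is not the PDDL encoding used in the competition: the numbering of areas (area 0 next to the door) and the function names `can_access`, `can_load` and `can_unload` are assumptions made only for this illustration.

```python
def can_access(area_index, occupied):
    """True if every area between `area_index` and the door is free.

    Assumption: areas are numbered 0..n-1 and area 0 is next to the door.
    `occupied` is a list of booleans, one per area.
    """
    return not any(occupied[:area_index])


def can_load(area_index, occupied):
    """A package can be loaded onto a free area that is reachable from the door."""
    return not occupied[area_index] and can_access(area_index, occupied)


def can_unload(area_index, occupied):
    """A package can be unloaded from an occupied area reachable from the door."""
    return occupied[area_index] and can_access(area_index, occupied)
```

Under this reading, a truck behaves like a stack: the package nearest the door must be unloaded before any package behind it.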

Like the Storage domain, this domain has a “time-only” variant instead of a metric-time variant (i.e., there are no numerical fluents). The other variants make extensive use of the new features of PDDL3.0. We start the description from the time constraints version, because it is the one closest to a realistic problem.

Time Constraints

The domain has four different actions: an action for loading a package into a truck, one for unloading a package from a truck, one for moving a truck, and finally one for delivering a package. The durations of loading, unloading and delivering packages are negligible compared to the durations of the driving actions. The problem goals require that certain packages are at their final destinations by certain deadlines. For this variant, we also created an equivalent version, “Time-TIL”, in which the trajectory constraints of type “within” are compiled into timed initial literals. Each competing team is free to choose one of the two alternative variants.

Time

The operators are the same as those in the time constraints version, but there is no deadline for delivering packages. Finding a valid plan in this version is significantly easier, but finding a plan with a short makespan is still challenging.

Complex Preferences

The operators in this version are the same as those in the time constraints version. The deadlines are modeled by preferences. Moreover, this version contains preferences over trajectory constraints. These are constraints imposing some ordering on when packages are delivered, constraints about the usage of the areas in the trucks, and constraints about loading packages.

Propositional

The operators in this version are similar to those in the time constraints version, with the main difference that time is modeled as a discrete resource (with a fixed number of levels). Moreover, the driving actions cannot be executed concurrently.

Simple Preferences

The operators in this domain are the same as those in the propositional version. The difference concerns the problem goals, where the delivery deadlines are modeled by preferences.

Qualitative Preferences

The operators in this domain are the same as those in the propositional version. The difference concerns the problem goals, which include soft delivery deadlines. Moreover, this version includes many preferences over state trajectory constraints that are similar to those used for the complex preferences version.

The Pathways Domain

This domain is inspired by the field of molecular biology, specifically biochemical pathways. “A pathway is a sequence of chemical reactions in a biological organism. Such pathways specify mechanisms that explain how cells carry out their major functions by means of molecules and reactions that produce regular changes. Many diseases can be explained by defects in pathways, and new treatments often involve finding drugs that correct those defects.” (Thagard 2003) We can model parts of the functioning of a pathway as a planning problem by simply representing chemical reactions as actions. The goal in these planning problems is to construct a sequence of reactions that produces one or more substances, using a limited number of substances as input. The planner is partly free to choose which input substances to use, i.e., to choose some aspects of the initial state of the problem. This aspect of the problem is modelled by means of additional actions.

The biochemical pathway domain of the competition is based on the pathway of the Mammalian Cell Cycle Control as it is described in (Kohn 1999) and modelled in (Chabrier 2003). There are three different kinds of basic actions corresponding to the different kinds of reactions that can appear in a pathway.

Propositional

This is a simple qualitative encoding of the reactions of the pathway. The domain has five different actions: an action for choosing the initial substances, an action for increasing the quantity of a chosen substance (in the propositional version, quantity coincides with presence, and it is modeled through a predicate indicating if a substance is available or not), an action modeling biochemical association reactions, an action modeling biochemical association reactions requiring catalysts, and an action modeling biochemical synthesis reactions. Also, there is an additional set of “dummy” actions used to encode the disjunctive problem goals.

The goals refer to substances that must be synthesized by the pathway, and are disjunctive with two disjuncts each. Furthermore, there is a limit on the number of input substances that can be used by the pathway.

Simple Preferences

This is similar to the propositional version, with the difference that both the products that must be synthesized by the pathway and the number of the input reactants that are used by the network are turned into preferences. The challenge here is finding plans that achieve a good tradeoff between the different kinds of preferences.

Metric-Time

In this version of the domain, reactions have different durations. The reactions can only happen if their input reactants reach some concentration level, and reactions generate their products in specific quantities. The goals in this version are summations of substance concentrations that must be generated by the reactions of the pathway. The plan metric minimizes some linear combination of the number of input substances and the plan duration.

Complex Preferences

This is an extension of the metric-time version with different preferences concerning the concentration of substances of the pathway, or the order in which substances are produced. The metric is a combination of these preferences, the number of substances used and the plan makespan.

The Extended Rovers Domain

The Rovers domain was introduced in the 2002 planning competition (Long & Fox 2003). It models the problem of planning for a group of planetary rovers to explore the planet they are on (taking pictures and samples from interesting locations).

Propositional and Metric-Time

The propositional and metric-time versions of the domain are the same as in IPC 2002, with the addition of some planning problems.

The domain has nine different actions: an action for moving rovers on a planet surface, two actions for sampling soil and rock, an action for dropping rock or soil, an action for calibrating rover instruments, an action for taking an image of an interesting objective, and finally three actions for transmitting soil data, rock data or image data.

Qualitative Preferences

This is the IPC 2002 propositional version with soft trajectory constraints added (constraint types always, sometime and at-most-once are used). The objective is simply to maximize the number of preferences satisfied. The preferences are “artificial”, in the sense that they do not encode any “real” preferences on the plan, but are constructed so as to make the problem of maximizing the satisfaction of preferences challenging.
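The three constraint types can be checked against the sequence of states produced by a plan. The following Python sketch illustrates the intended reading of each operator; the representation of states as sets of atom names and all function names are assumptions of this illustration, not part of the benchmark.

```python
def always(trace, phi):
    """phi holds in every state of the trace."""
    return all(phi(state) for state in trace)


def sometime(trace, phi):
    """phi holds in at least one state of the trace."""
    return any(phi(state) for state in trace)


def at_most_once(trace, phi):
    """phi holds over at most one contiguous interval of states."""
    intervals = 0
    previous = False
    for state in trace:
        current = phi(state)
        if current and not previous:
            intervals += 1  # phi has just become true (again)
        previous = current
    return intervals <= 1


def count_satisfied(trace, preferences):
    """Number of satisfied preferences, each given as (operator, phi)."""
    return sum(1 for operator, phi in preferences if operator(trace, phi))
```

A planner for this variant would try to maximize the value returned by `count_satisfied` over all valid plans.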

Metric Simple Preferences

This version is a special case of the complex preferences version, which has preferences only on the goals of the problem.

This version of the domain poses a so-called “net benefit” problem: goals (atoms, and in some cases conjunctions of atoms) have values and actions have costs, and the objective is to maximize the sum of the values of achieved goals minus the sum of the costs of the actions in the plan. Only the actions that move the rovers have non-zero cost. The domain uses simple (goal state) preferences to encode goal values and fluents to encode action costs. There are three different sets of problems, with somewhat different properties. In the first, goals are interfering, meaning that the cost of achieving any two goals is greater than the sum of the costs of achieving them individually. The second has instead synergy between the goals, i.e., the cost of achieving several goals is less than the sum of the costs of achieving each of them separately, while the third contains goals with relationships of both kinds.
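The net-benefit objective can be written down directly. The Python sketch below is only an illustration: the dictionaries, the action names and the numbers in the usage note are hypothetical and do not come from the benchmark problems.

```python
def net_benefit(goal_values, achieved_goals, action_costs, plan):
    """Sum of the values of achieved goals minus the sum of action costs.

    goal_values:    dict mapping each goal to its value
    achieved_goals: set of goals that hold at the end of the plan
    action_costs:   dict mapping action names to costs (missing means zero)
    plan:           list of action names in execution order
    """
    value = sum(v for goal, v in goal_values.items() if goal in achieved_goals)
    cost = sum(action_costs.get(action, 0) for action in plan)
    return value - cost
```

For example, with hypothetical goal values `{"soil": 10, "image": 7}`, only the `soil` goal achieved, and a plan containing two moves of cost 3 each plus a zero-cost sampling action, the net benefit is 10 - 6 = 4.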

The Extended Pipesworld Domain

The Pipesworld domain was introduced in the previous planning competition (Hoffmann & Edelkamp 2005). It models the transportation of batches of petroleum products in a network of pipelines.

Propositional and Time

The propositional and temporal versions of the domain are the “tankage” variant of the domain used in IPC 2004. The domain has six actions: two actions for moving a batch from a tankage to a pipeline segment (one for the start and one for the end of the activity), two actions for moving a batch from a pipeline segment to a tankage, and two actions for moving a batch from a tankage (or pipeline segment) to a pipeline segment (or tankage) in case the pipes consist of only one segment.

Time Constraints

The time constraints variant is based on the temporal no-tankage variant from IPC 2004, but adds hard deadlines on when each of the goals must be reached. Deadlines are specified using the PDDL3 within constraint. The problems also have a number of “triggered” deadline constraints, specified with the PDDL3 always-within constraint.
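The difference between a plain deadline and a “triggered” deadline can be made concrete with a small Python sketch. It assumes a plan execution is given as a time-ordered list of `(time, state)` pairs with states represented as sets of atom names; this representation and the function names are assumptions of the illustration.

```python
def within(trace, deadline, phi):
    """phi holds in some state reached no later than `deadline`."""
    return any(time <= deadline and phi(state) for time, state in trace)


def always_within(trace, delay, phi, psi):
    """Whenever phi holds, psi holds at most `delay` time units later.

    `trace` must be sorted by time; the triggering state itself counts.
    """
    for i, (t_trigger, state) in enumerate(trace):
        if phi(state):
            answered = any(
                t_later - t_trigger <= delay and psi(later_state)
                for t_later, later_state in trace[i:]
            )
            if not answered:
                return False
    return True
```

In this reading, `within` fixes an absolute deadline measured from the start of the plan, while `always_within` starts a new deadline each time the triggering condition becomes true.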

Complex Preferences

This variant is similar to the previous one, but has soft deadlines instead, encoded with preferences on the constraints. Each goal can have several (increasing) deadlines, with different (increasing) penalties for missing them.
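One possible reading of the soft-deadline scheme is sketched below in Python. It assumes that each deadline is a separate preference with its own penalty and that the penalties of all missed deadlines are added up; the text does not state how the penalties combine, so this summation, the function name and the numbers in the usage note are assumptions.

```python
def deadline_penalty(achieved_at, deadlines):
    """Total penalty for one goal under several soft deadlines.

    achieved_at: time at which the goal is reached, or None if never reached
    deadlines:   list of (deadline, penalty) pairs
    Assumption: the penalties of all missed deadlines are summed.
    """
    total = 0
    for deadline, penalty in deadlines:
        missed = achieved_at is None or achieved_at > deadline
        if missed:
            total += penalty
    return total
```

With hypothetical deadlines `[(10, 1), (20, 2), (30, 4)]`, reaching the goal at time 15 misses only the first deadline (penalty 1), while reaching it at time 25 misses the first two (penalty 3).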

Conclusions

We have given an informal description of the benchmark domains that we developed for the deterministic part of the 2006 International Planning Competition. The general aim was to create a new set of problems for the planning community involving new and interesting – and hopefully also useful – issues, in particular planning with (possibly contradicting) preferences over problem goals and state trajectory constraints.

Several competing teams have declared that their planners are capable of handling parts of the extended PDDL3 language. At the time of writing, benchmark tests are still being run. In addition to their use for the competition, we hope that the new benchmarks will provide a challenging extension to the existing set of planning benchmarks, both those involving PDDL3 constructs and those that can be specified through the previous versions of PDDL.

References

Chabrier, N. 2003. http://contraintes.inria.fr/BIOCHAM/EXAMPLES/∼cell cycle/cell cycle.bc.

Edelkamp, S., and Hoffmann, J. 2004. PDDL2.2: The language for the classic part of the 4th international planning competition. Technical Report 195, Institut für Informatik, Freiburg, Germany.

Fink, A., and Voss, S. 1999. Applications of modern heuristic search methods to pattern sequencing problems. Computers & Operations Research 26:17–34.

Fox, M., and Long, D. 2003. PDDL2.1: An extension to PDDL for expressing temporal planning domains. Journal of Artificial Intelligence Research (JAIR) 20:61–124.

Gerevini, A., and Long, D. 2005. Plan constraints and preferences in PDDL3. Technical report rt-2005-08-47, Università di Brescia, Dipartimento di Elettronica per l’Automazione.

Hoffmann, J., and Edelkamp, S. 2005. The deterministic part of IPC-4: An overview. Journal of AI Research 24:519–579.

Kohn, K. 1999. Molecular interaction map of the mammalian cell cycle control and DNA repair systems. Mol Biol Cell 10(8).

Linhares, A., and Yanasse, H. 2002. Connection between cutting-pattern sequencing, VLSI design and flexible machines. Computers & Operations Research 29:1759–1772.

Long, D., and Fox, M. 2003. The 3rd international planning competition: Results and analysis. Journal of Artificial Intelligence Research 20:1–59.

Riera-Ledesma, J., and Salazar-Gonzalez, J. J. 2005. A heuristic approach for the travelling purchaser problem. European Journal of Operational Research 160(3):599–613.

Smith, B., and Gent, I. 2005. Constraint modelling challenge 2005. http://www.dcs.st-and.ac.uk/∼ipg/challenge/.

Thagard, P. 2003. Pathways to biomedical discovery. Philosophy of Science 70.

Planning with Temporally Extended Preferences by Heuristic Search

Jorge Baier and Jeremy Hussell and Fahiem Bacchus and Sheila McIlraith

Department of Computer Science
University of Toronto
Toronto, Canada

[jabaier|hussell|fbacchus|sheila]@cs.toronto.edu

Abstract

In this paper we describe a planner that extends the TLPLAN system to enable planning with temporally extended preferences specified in PDDL3, a variant of PDDL that includes descriptions of temporal plan preferences. We do so by compiling preferences into nondeterministic finite state automata whose accepting conditions denote achievement of the preference described by the automaton. Automata are represented in the planning problem through additional predicates and actions. With this compilation in hand, we are able to use domain-independent heuristics to guide TLPLAN towards plans that realize the preferences. We are entering our planner in the qualitative preferences track of IPC5, the 2006 International Planning Competition. As such, the planner description provided in this paper is preliminary pending final adjustments in the coming weeks.

Introduction

Standard goals in planning allow us to distinguish between plans that satisfy the goal and those that do not; however, they fail to discriminate between the quality of different successful plans. Preferences, on the other hand, express information about how “good” a plan is, thus allowing us to distinguish between desirable successful plans and less desirable successful plans.

PDDL3 (Gerevini & Long 2005) is an extension of previous planning languages that includes facilities for expressing preferences. It was designed in conjunction with the 2006 International Planning Competition. One of the key features of PDDL3 is that it supports temporally extended preference statements, i.e., statements that express preferences over sequences of events. In particular, in the qualitative preferences category of the planning competition, preferences can be expressed with temporal formulae that are a subset of LTL (linear temporal logic). A plan satisfies a preference whenever the sequence of states generated by the plan’s execution satisfies the LTL formula representing the preference.

PDDL3 allows each planning instance to specify a problem-specific metric used to compute the value of a plan. For any given plan, over the course of its execution various preferences will be violated or satisfied, with some preferences perhaps being violated multiple times. The plan value metric can depend on the preferences that are violated and the number of times that they are violated. The aim in solving the planning instance is to generate a plan that has the best metric value, and to do this the planner must be able to “monitor” the preferences to determine when and how many times different preferences are being violated. Furthermore, the planner must be able to use this information to guide its search so that it can find best-value plans.

We have crafted a preference planner that uses various techniques to find best-value plans. Our planner is based on the TLPLAN system (Bacchus & Kabanza 1998), extending TLPLAN so that fully automated heuristic-guided search for a best-value plan can be performed. We use two techniques to obtain heuristic guidance. First, we translate temporally extended preference formulae into nondeterministic finite state automata that are then encoded as a new set of predicates and action effects. When added to the existing predicates and actions, we thus obtain a new planning domain containing only standard ADL-operators. Second, once we have recovered a standard planning domain we can use a modified relaxed plan heuristic to guide search. In what follows, we describe our translation process and the heuristic search techniques we use to guide planning. We conclude with a brief discussion of related work.

Translation of LTL to Finite State Automata

TLPLAN already has the ability to evaluate LTL formulae during planning. It was originally designed to use such formulae to express search control knowledge. Thus one could simply express the temporally extended preference formulae in TLPLAN directly and have TLPLAN evaluate these formulae as it generates plans. The difficulty, however, is that this approach is by itself not able to provide any heuristic guidance. That is, there is no obvious way to use the partially evaluated LTL formulae maintained by TLPLAN to guide the planner towards satisfying these formulae (i.e., to satisfy the preferences expressed in LTL).

Instead our approach is to use the techniques presented in (Baier & McIlraith 2006) to convert the temporal formulae into nondeterministic finite state automata. Intuitively the states of the automata “monitor” progress towards satisfying the original temporal formula. In particular, as the world is updated by actions added to the plan, the state of the automaton is also updated dependent on changes made to the world. If the automaton enters an accepting state then the sequence of worlds traversed by the partial plan has satisfied the original temporal preference formula.

There are various issues involved in building efficient automata from an arbitrary temporal formula, and more details are provided in (Baier & McIlraith 2006). However, once the automaton is built, we can integrate it with the planning domain by creating an augmented planning domain. In the augmented domain there is a predicate specifying the current set of states that the automaton could be in (it is a nondeterministic automaton, so there is a set of current states). Moreover, for each automaton, we have a single predicate (the accepting predicate) that is true iff the automaton has reached an accepting condition, denoting satisfaction of the preference. In addition, we define a post-action update sequence of ADL operators, which take into account the changes just made to the world and the current state of the automaton in order to compute the new set of possible automaton states. This post-action update is performed immediately after any action of the domain is performed. TLPLAN is then asked to generate a plan using the new augmented domain.
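The post-action update can be pictured as a standard subset-style step of a nondeterministic automaton. The Python sketch below is an illustration only and is not the ADL encoding generated by the planner: the transition table layout, the guard functions and the names `step` and `accepts` are all assumptions.

```python
def step(current_states, transitions, world):
    """Compute the new set of possible automaton states after an action.

    transitions: dict mapping a state to a list of (guard, next_state) pairs,
                 where guard is a function of the world (a set of atoms).
    """
    return {
        next_state
        for state in current_states
        for guard, next_state in transitions.get(state, [])
        if guard(world)
    }


def accepts(initial_states, transitions, accepting, worlds):
    """True if some possible state is accepting after reading all worlds."""
    states = set(initial_states)
    for world in worlds:
        states = step(states, transitions, world)
    return bool(states & set(accepting))
```

The set returned by `step` plays the role of the predicate recording the current automaton states, and the final intersection test plays the role of the accepting predicate.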

To deal with multiple preference statements, we apply this method to each of the preferences in turn. This generates multiple automata, and we combine all of their updates into a single ADL action (actually, to simplify the translation, we use a pair of ADL actions that are always executed in sequence).

A number of refinements must be made, however, to deal with some of the special features of PDDL3. First, in PDDL3 a preference can be scoped by a universal quantifier. Such preferences act as parameterized preference statements, representing a set of individual preference statements, one for each object that is a legal binding of the universal variable. To avoid the explosion of automata that would occur if we were to generate a distinct automaton for each binding, we translate such preferences into “parameterized” automata. In particular, instead of having a predicate describing the current set of states the automaton could be in, we have a predicate with extra arguments which specifies what state the automaton could be in for different objects. Similarly, the automata update actions generated by our translator are modified so that they can handle the update for all of the objects through universally quantified conditional effects.

Second, PDDL3 allows preference statements in action preconditions. These preferences refer to conditions that must ideally hold true immediately before performing an action. These conditions are not temporal, i.e., they refer only to the state in which the action is performed. Therefore, we do not model these preferences using automata but rather as conditional effects of the action. If the preference formula does not hold and the action is performed, then, as an effect of the action, a counter is incremented. This counter, representing the number of times the precondition preference is violated, is used to compute the metric function, described below.
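The counter mechanism for precondition preferences can be sketched in Python as follows. This is an illustration under stated assumptions rather than the planner's actual encoding: the action is represented as a dictionary with hypothetical keys `precondition_preferences`, `add` and `delete`, and states are sets of atom names.

```python
def apply_action(state, action, counters):
    """Apply an action and count violated precondition preferences.

    action["precondition_preferences"]: list of (name, condition) pairs,
        where condition is a function of the state before the action.
    action["add"], action["delete"]: sets of atoms added and removed.
    counters: dict from preference name to number of violations so far;
        it is updated in place, mimicking a conditional effect.
    """
    for name, condition in action["precondition_preferences"]:
        if not condition(state):
            counters[name] = counters.get(name, 0) + 1
    return (set(state) - action["delete"]) | action["add"]
```

The action is still applied when the preference condition fails; only the counter records the violation, so the plan remains valid but may receive a worse metric value.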

Third, PDDL3 specifies its metric using an “is-violated” function. The is-violated function takes as an argument the name of a preference type, and returns the number of times preferences of this type were violated. Individual preferences are either satisfied or violated by the current plan. However, many different individual preferences can be grouped into a single type. For example, when a preference is scoped by a universal quantifier, all of the individual preference statements generated by different bindings of the quantifier yield a preference of the same type. Thus the is-violated function must be able to count the number of these prefe
