Strategic knowledge has traditionally been specified using procedural programming languages or dynamic logic [Harel1984] [Harmelen & Balder1992]. This paper proposes a representation scheme for the declarative formalization of strategies for action selection based on the situation calculus [McCarthy & Hayes1969] and circumscription [McCarthy1980] [McCarthy1986].
The idea of representing strategies as sets of action selection rules [Genesereth & Hsu1989] is explored. An action selection rule is an implication whose antecedent is a formula of the situation calculus, and whose consequent may take one of the following forms: or . Action selection rules are interpreted as follows: if the conditions of the antecedent hold, then performing action a at situation s is good, bad or better than performing action for the purpose of achieving the goal described by situation .
The following action selection rules describe some heuristics for moving blocks in order to solve planning problems in the blocks world: () If a block can be moved to final position, this should be done right away; () If a block is not in final position and cannot be moved to final position, it is better to move it to the table than anywhere else; () If a block is in final position, do not move it; () If there is a block that is above a block it ought to be above in the goal configuration but it is not in final position (tower-deadlock), put it on the table. A consistent set of action selection rules defines a particular strategy.
In addition to sets of action selection rules describing particular strategies, we need some axioms that allow a program to understand the predicates good, bad and better in terms of action selection. The predicate Better establishes a partial order among a set of actions with respect to a given goal and a particular situation. The following axiom says that an action is bad for a given goal and a particular situation if there is a better action for the same goal and situation.
An important issue in reasoning about strategies is to foresee their consequences on action selection. We will call the problem of determining what sequences of actions can be selected according to a given strategy the problem of projecting the strategy. The projection problem for strategies addresses the issue of characterizing the behavior of a particular strategy (or a class of strategies) when applied to solve a specific problem (or a class of problems). We talk about the projection problem of a strategy to distinguish it from the projection problem of the theory of action the strategy is about, which addresses the issue of characterizing the effects of the actions as opposed to the effects of the strategy on action selection.
The following axiom addresses this issue by defining a fluent called Selectable. A situation is selectable iff: (1) it is the initial situation; or (2) it is the result of performing a good action in a selectable situation at which the goal has not been achieved; or (3) it is the result of performing a non-bad action in a selectable situation at which the goal has not been achieved and for which there are no good actions.
The introduction of axiom describing the behavior of the fluent Selectable allows us to apply the non-monotonic machinery developed for reasoning about action to the problem of reasoning about declarative formalizations of strategies for action selection. In the same way circumscription can be used to jump to the conclusion that a fluent does not change unless stated otherwise (i.e., to solve the frame problem), it can be used to assume that an action is ``not good'' or ``not bad'' unless it can be deduced from the set of axioms describing a strategy that it is so. This use of circumscription has some representational advantages, it allows us to describe strategies: (1) succinctly, since negative information (i.e., which actions are not good, not bad or not better than others) need not be specified; (2) according to a least commitment strategy, in which it is not necessary to state that an action is good, bad or better than another unless it is known for sure; and (3) incrementally, because the application of circumscription is designed in such a way that the conclusions about the selectability of different situations adapt automatically to the addition of consistent heuristics which may become available later on.