Strategic knowledge has traditionally been specified using procedural programming languages or dynamic logic [Harel1984] [Harmelen & Balder1992]. This paper proposes a representation scheme for the declarative formalization of strategies for action selection based on the situation calculus [McCarthy & Hayes1969] and circumscription [McCarthy1980] [McCarthy1986].
The idea of representing strategies
as sets of action selection rules
[Genesereth & Hsu1989] is explored. An action
selection rule is an implication whose
antecedent is a formula of the
situation calculus, and whose consequent
may take one of the following forms:
or
.
Action selection rules are interpreted
as follows: if the conditions of the
antecedent hold, then performing action
a at situation s is good, bad
or better than performing action
for the purpose of achieving the
goal described by situation
.
The following action selection rules describe
some heuristics for moving blocks in order to
solve planning problems in the blocks world:
() If a block can be
moved to final position, this should be done
right away; (
) If a block is
not in final position and cannot be
moved to final position, it
is better to move it to the table than
anywhere else; (
) If a block is
in final position, do not move it;
(
) If
there is a block that is above a block it
ought to be above in the goal configuration
but it is not in final position
(tower-deadlock), put it on the
table
.
A consistent set of action
selection rules defines a particular
strategy.
In addition to sets of action selection rules
describing particular strategies,
we need some axioms that allow a program to
understand the predicates good, bad
and better in terms of action selection.
The predicate Better establishes a
partial order among a set of actions with
respect to a given goal and a particular
situation. The following axiom says that
an action is bad for a given goal and a
particular situation if there is a
better action for the same goal and
situation.
An important issue in reasoning about
strategies is to foresee their consequences
on action selection. We will call the
problem of determining what sequences
of actions can be selected
according to a given strategy
the problem of projecting the
strategy.
The projection problem for strategies
addresses the issue of characterizing the
behavior of a particular strategy (or a
class of strategies) when applied to solve a
specific problem (or a class of problems).
We talk about the projection problem
of a strategy to distinguish it from the
projection problem of the theory of
action the strategy is about, which
addresses the issue of characterizing the
effects of the actions as
opposed to the effects of the strategy on
action selection.
The following axiom addresses this issue by
defining a fluent called Selectable.
A situation is selectable iff: (1) it is the
initial situation; or (2) it is the result of
performing a good action in a selectable
situation at which the goal has not been
achieved;
or (3) it is the result of performing
a non-bad action in
a selectable situation at which the goal has
not been achieved and for which there are
no good actions.
The introduction of axiom
describing the behavior
of the fluent Selectable allows us to
apply the non-monotonic machinery developed
for reasoning about action to the problem of
reasoning about declarative formalizations
of strategies for action selection.
In the same way circumscription can be used
to jump to the conclusion that a fluent
does not change unless stated otherwise
(i.e., to solve the frame problem), it
can be used to assume that an action is
``not good'' or ``not bad'' unless it can be
deduced from the set of axioms describing
a strategy that it is so. This use of
circumscription has some representational
advantages, it allows us to describe strategies:
(1) succinctly, since negative information
(i.e., which actions are not good, not bad or not
better than others) need not be specified;
(2) according to a least commitment
strategy, in which it is not necessary to state
that an action is good, bad or better than
another unless it is known for sure; and (3)
incrementally, because the application of
circumscription is designed in such a way that
the conclusions about the selectability of
different situations adapt automatically to
the addition of consistent heuristics which
may become available later on.