In our opinion, a major source of problems in defining mental and intensional concepts is the weakness of the methods of definition that have been explicitly used. We introduce two kinds of definition: definition relative to an approximate theory and second order structural definition and apply them to defining mental qualities.
3.1. Definitions Relative to an Approximate Theory
It is commonplace that most scientific concepts are not defined by isolated sentences of natural languages but rather as parts of theories, and the acceptance of the theory is determined by its fit to a large collection of phenomena. We propose a similar method for explicating mental and other common sense concepts, but a certain phenomenon plays a more important role than with scientific theories: the concept is meaningful only in the theory, and cannot be defined with more precision than the theory permits.
The notion of one theory approximating another needs to be formalized. In the case of physics, one can think of various kinds of numerical or probabilistic approximation. I think this kind of approximation is untypical and misleading and won't help explicate such concepts as intentional action as meaningful in approximate theories. Instead it may go something like this:
Consider a detailed theory T that has a state variable s. We may imagine that s changes with time. The approximating theory T* has a state variable s*. There is a predicate ad(s) whose truth means that T* is applicable when the world is in state s. There is a relation corr(s,s*) which asserts that s* corresponds to the state s. We have

∀s.(ad(s) → ∃s*.corr(s,s*)).

Certain functions f1(s), f2(s), etc. have corresponding functions f1*(s*), f2*(s*), etc. We have relations like

∀s s*.(corr(s,s*) → f1(s) = f1*(s*)).

However, the approximate theory T* may have additional functions g1*(s*), etc. that do not correspond to any functions of s. Even when it is possible to construct g's corresponding to the g*'s, their definitions will often seem arbitrary, because the common sense user of T* will only have used them within the context of T*. Concepts whose definition involves counterfactuals provide examples.
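The correspondence between a detailed theory and its approximation can be sketched concretely. The following is a minimal illustration, not from the paper; the particular state components and functions (position, velocity, temperature, rounding) are invented for the example:

```python
# A toy detailed theory T whose state s is a triple (position, velocity,
# temperature), and an approximating theory T* whose state s* "knows"
# only a rounded position.  All specifics here are illustrative.

def ad(s):
    """T* is applicable in detailed state s (here: when velocity is small)."""
    position, velocity, temperature = s
    return abs(velocity) < 0.1

def corr(s, s_star):
    """The approximate state s* corresponds to the detailed state s."""
    position, velocity, temperature = s
    return s_star == round(position)

def f(s):
    """A function of the detailed state."""
    position, velocity, temperature = s
    return round(position) + 1

def f_star(s_star):
    """The corresponding function of the approximate state."""
    return s_star + 1

s = (3.2, 0.05, 20.0)
s_star = round(3.2)               # the corresponding approximate state
assert ad(s) and corr(s, s_star)
assert f(s) == f_star(s_star)     # corr(s,s*) implies f(s) = f*(s*)
```

A function like "temperature felt by the occupant", defined only within T*, would be a g* with no natural counterpart among functions of s.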
Suppose we want to ascribe intentions and free will and to distinguish a deliberate action from an occurrence. We want to call an output a deliberate action if the output would have been different if the machine's intentions had been different. This requires a criterion for the truth of the counterfactual conditional sentence ``If its intentions had been different the output wouldn't have occurred'', and we require what seems to be a novel treatment of an important class of counterfactuals.
We treat the ``relevant aspect of reality'' as a Cartesian product so that we can talk about changing one component and leaving the others unchanged. This would be straightforward if the Cartesian product structure existed in the world; however, it usually exists only in certain approximate models of the world. Consequently no single definite state of the world as a whole corresponds to changing one component. The following paragraphs present these ideas in greater detail.
Suppose A is a theory in which some aspect of reality is characterized by the values of three quantities x, y and z. Let f be a function of three arguments, let u be a quantity satisfying u = f(x,y,z), where f(1,1,1) = 3 and f(2,1,1) = 5. Consider a state of the model in which x = 1, y = 1 and z = 1. Within the theory A, the counterfactual conditional sentence ``u = 3, but if x were 2, then u would be 5'' is true, because the counterfactual condition means changing x to 2 and leaving the other variables unchanged.
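Within theory A the counterfactual can be evaluated mechanically, because the state is literally a Cartesian product and ``if x were 2'' means changing the x component while holding the others fixed. The sketch below is illustrative; the particular f is one function consistent with the stated values f(1,1,1) = 3 and f(2,1,1) = 5:

```python
# Evaluating the counterfactual *within* theory A, whose state is the
# Cartesian product of the values of x, y and z.  The law u = f(x,y,z)
# is hypothetical, chosen only to satisfy f(1,1,1)=3 and f(2,1,1)=5.

def f(x, y, z):
    return 2 * x + y * z

state = {"x": 1, "y": 1, "z": 1}
assert f(**state) == 3             # u = 3 in the actual state

counterfactual = dict(state, x=2)  # change x, leave y and z unchanged
assert f(**counterfactual) == 5    # ... but if x were 2, u would be 5
```

The operation `dict(state, x=2)` is exactly what the Cartesian product structure licenses: a definite alternative state obtained by varying one component. Outside the theory, as the next paragraph argues, no such definite alternative state of the world exists.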
Now let's go beyond the model and suppose that x, y and z are quantities depending on the state of the world. Even if u = f(x,y,z) is taken as a law of nature, the counterfactual need not be taken as true, because someone might argue that if x were 2, then y would be 3 so that u might not be 5. If the theory A has a sufficiently preferred status we may take the meaning of the counterfactual in A to be its general meaning, but it may sometimes be better to consider the counterfactual as defined solely in the theory, i.e. as syncategorematic in the Kantian jargon.
A common sense example may be helpful: Suppose a ski instructor says, ``He wouldn't have fallen if he had bent his knees when he made that turn'', and another instructor replies, ``No, the reason he fell was that he didn't put his weight on his downhill ski''. Suppose further that on reviewing a film, they agree that the first instructor was correct and the second mistaken. I contend that this agreement is based on their common acceptance of a theory of skiing, and that within the theory, the decision may well be rigorous even though no-one bothers to imagine an alternate world as much like the real world as possible but in which the student had put his weight on his downhill ski.
We suggest that this is often (I haven't yet looked for counter-examples) the common sense meaning of a counterfactual. The counterfactual has a definite meaning in a theory, because the theory has a Cartesian product structure, and the theory is sufficiently preferred that the meaning of the counterfactual in the world is taken as its meaning in the theory. This is especially likely to be true for concepts that have a natural definition in terms of counterfactuals, e.g. the concept of deliberate action with which we started this section.
In all cases that we know about, the theory is approximate and incomplete. Provided certain propositions are true, a certain quantity is approximately a given function of certain other quantities. The incompleteness lies in the fact that the theory doesn't predict states of the world but only certain functions of them. Thus a useful concept like deliberate action may seem to vanish if examined too closely, e.g. when we try to define it in terms of states of the world and not just in terms of certain functions of these states.
3.1.1. The known cases in which a concept is defined relative to an approximate theory involve counterfactuals. This may not always be the case.
3.1.2. It is important to study the nature of the approximations.

3.1.3. (McCarthy and Hayes 1969) treats the notion of ``X can do Y'' using a theory in which the world is regarded as a collection of interacting automata. That paper failed to note that sentences using ``can'' cannot necessarily be translated into single assertions about the world.

3.1.4. The attempt by old fashioned introspective psychology to analyze the mind into an interacting will, intellect and other components cannot be excluded on the methodological grounds used by behaviorists and positivists to declare them meaningless and exclude them from science. These concepts might have precise definitions within a suitable approximate theory.
3.1.5. The above treatment of counterfactuals in which they are defined in terms of the Cartesian product structure of an approximate theory may be better than the closest-possible-world treatments discussed in (Lewis 1973). The truth-values are well defined within the approximate theories, and the theories can be justified by evidence involving phenomena not mentioned in isolated counterfactual assertions.

3.1.6. Definition relative to approximate theories may help separate questions, such as some of those concerning counterfactuals, into internal questions within the approximate theory and the external question of the justification of the theory as a whole. The internal questions are likely to be technical and have definite answers on which people can agree even if they have philosophical or scientific disagreements about the external questions.
3.2. Second Order Structural Definitions
Structural definitions of qualities are given in terms of the state of the system being described while behavioral definitions are given in terms of its actual or potential behavior.
If the structure of the machine is known, one can give an ad hoc first order structural definition. This is a predicate B(s,p) where s represents a state of the machine and p represents a sentence in a suitable language, and B(s,p) is the assertion that when the machine is in state s, it believes the sentence p. (The considerations of this paper are neutral in deciding whether to regard the object of belief as a sentence or to use a modal operator or to admit propositions as abstract objects that can be believed. The paper is written as though sentences are the objects of belief, but I have more recently come to favor propositions and discuss them in (McCarthy 1979).)
A general first order structural definition of belief would be a predicate B(W,M,s,p) where W is the ``world'' in which the machine M whose beliefs are in question is situated. I do not see how to give such a definition of belief, and I think it is impossible. Therefore we turn to second order definitions.
A second order structural definition of belief is a second order predicate β(W,M,B). β(W,M,B) asserts that the first order predicate B is a ``good'' notion of belief for the machine M in the world W. Here ``good'' means that the beliefs that B ascribes to M agree with our ideas of what beliefs M would have, not that the beliefs themselves are true. The axiomatizations of belief in the literature are partial second order definitions.
In general, a second order definition gives criteria for evaluating an ascription of a quality to a system. We suggest that both our common sense and scientific usage of not-directly-observable qualities corresponds more closely to second order structural definition than to any kind of behavioral definition. Note that a second order definition cannot guarantee that there exist predicates B meeting the criterion or that such a B is unique. Some qualities are best defined jointly with related qualities, e.g. beliefs and goals may require joint treatment.
Second order definitions criticize whole belief structures rather than individual beliefs. We can treat individual beliefs by saying that a system believes p in state s provided all ``reasonably good'' B's satisfy B(s,p). Thus we are distinguishing the ``intersection'' of the reasonably good B's.
(An analogy with cryptography may be helpful. We solve a cryptogram by making hypotheses about the structure of the cipher and about the translation of parts of the cipher text. Our solution is complete when we have ``guessed'' a cipher system that produces the cryptogram from a plausible plaintext message. Though we never prove that our solution is unique, two different solutions are almost never found except for very short cryptograms. In the analogy, the second order definition corresponds to the general idea of encipherment, and B is the particular system used. While we will rarely be able to prove uniqueness, we don't expect to find two B's both satisfying β(W,M,B).) (McCarthy and Hayes 1969) discusses the improbability of there being two good decompositions of an automaton into subautomata.
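The cryptographic analogy can be made concrete with a toy cipher. In the sketch below (all specifics invented for illustration), the general notion of a Caesar cipher plays the role of the second order definition, and a particular shift k plays the role of a particular B: we accept the k whose decipherment is plausible, without ever proving uniqueness.

```python
# A toy version of the cryptographic analogy: hypothesize a cipher system
# (Caesar shift) and accept the particular key whose decipherment is
# plausible English.  WORDS is a stand-in "plausibility" dictionary.

WORDS = {"the", "machine", "believes", "it", "is", "too", "hot"}

def decipher(text, k):
    """Shift each lowercase letter back by k, leaving other characters alone."""
    return "".join(
        chr((ord(c) - ord("a") - k) % 26 + ord("a")) if c.isalpha() else c
        for c in text
    )

def plausible(text):
    """Fraction of words recognized as English (by our toy dictionary)."""
    words = text.split()
    return sum(w in WORDS for w in words) / len(words)

cryptogram = decipher("the machine believes it is too hot", -3)  # encipher
solutions = [k for k in range(26) if plausible(decipher(cryptogram, k)) > 0.8]
assert solutions == [3]   # one "good" key survives, as we expect for good B's
```

As with belief ascriptions, nothing guarantees a unique solution; it is an empirical fact about sufficiently long messages, and plausibly about sufficiently rich behavior, that rival solutions do not arise.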
It seems to me that there should be a metatheorem of mathematical logic asserting that not all second order definitions can be reduced to first order definitions and further theorems characterizing those second order definitions that admit such reductions. Such technical results, if they can be found, may be helpful in philosophy and in the construction of formal scientific theories. I would conjecture that many of the informal philosophical arguments that certain mental concepts cannot be reduced to physics will turn out to be sketches of arguments that these concepts require second (or higher) order definitions.
Here is an approximate second order definition of belief. For each state s of the machine and each sentence p in a suitable language L, we assign truth to B(s,p) if and only if the machine is considered to believe p when it is in state s. The language L is chosen for our convenience, and there is no assumption that the machine explicitly represents sentences of L in any way. Thus we can talk about the beliefs of Chinese, dogs, corporations, thermostats, and computer operating systems without assuming that they use English or our favorite first order language. L may or may not be the language we are using for making other assertions, e.g. we could, writing in English, systematically use French sentences as objects of belief. However, the best choice for artificial intelligence work may be to make L a subset of our ``outer'' language restricted so as to avoid the paradoxical self-references of (Montague 1963).
We now subject B(s,p) to certain criteria; i.e. β(W,M,B) is considered true provided the following conditions are satisfied:
3.2.1. The set Bel(s) of beliefs, i.e. the set of p's for which B(s,p) is assigned true when M is in state s contains sufficiently ``obvious'' consequences of some of its members.
3.2.2. Bel(s) changes in a reasonable way when the state changes in time. We like new beliefs to be logical or ``plausible'' consequences of old ones or to come in as communications in some language on the input lines or to be observations, i.e. beliefs about the environment the information for which comes in on the input lines. The set of beliefs should not change too rapidly as the state changes with time.
3.2.3. We prefer the set of beliefs to be as consistent as possible. (Admittedly, consistency is not a quantitative concept in mathematical logic--a system is either consistent or not, but it would seem that we will sometimes have to ascribe inconsistent sets of beliefs to machines and people. Our intuition says that we should be able to maintain areas of consistency in our beliefs and that it may be especially important to avoid inconsistencies in the machine's purely analytic beliefs).
3.2.4. Our criteria for belief systems can be strengthened if we identify some of the machine's beliefs as expressing goals, i.e. if we have beliefs of the form ``It would be good if ...''. Then we can ask that the machine's behavior be somewhat rational, i.e. it does what it believes will achieve its goals. The more of its behavior we can account for in this way, the better we will like the function B(s,p). We also would like to regard internal state changes as changes in belief in so far as this is reasonable.
3.2.5. If the machine communicates, i.e. emits sentences in some language that can be interpreted as assertions, questions and commands, we will want the assertions to be among its beliefs unless we are ascribing to it a goal or subgoal that involves lying. We will be most satisfied with our belief ascription, if we can account for its communications as furthering the goals we are ascribing.
3.2.6. Sometimes we shall want to ascribe introspective beliefs, e.g. a belief that it does not know how to fly to Boston or even that it doesn't know what it wants in a certain situation.
3.2.7. Finally, we will prefer a more economical ascription B to a less economical one. The fewer beliefs we ascribe and the less they change with state consistent with accounting for the behavior and the internal state changes, the better we will like it. In particular, if ∀s p.(B1(s,p) → B2(s,p)), but not conversely, and B1 accounts for all the state changes and outputs that B2 does, we will prefer B1 to B2. This ensures that we will prefer to assign no beliefs to stones that don't change and don't behave. A belief predicate that applies to a family of machines is preferable to one that applies to a single machine.
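The economy criterion can be sketched for the paper's thermostat example. Everything below is an invented illustration: two candidate ascriptions for a three-state thermostat, where B2 ascribes all of B1's beliefs plus an idle extra, so B1 is preferred.

```python
# A sketch of criterion 3.2.7 for a thermostat with states "hot", "cold",
# "ok".  B1 ascribes only the beliefs that account for the outputs; B2
# ascribes those plus an extra belief that does no explanatory work.

B1 = {
    "hot":  {"the room is too hot"},
    "cold": {"the room is too cold"},
    "ok":   set(),
}
B2 = {s: bel | {"it is Tuesday"} for s, bel in B1.items()}

def accounts_for_output(B):
    """Rationality check (3.2.4): it cools exactly when it believes the
    room is too hot."""
    return all(("the room is too hot" in B[s]) == (s == "hot") for s in B)

def prefer(Ba, Bb):
    """Prefer Ba to Bb if Ba ascribes strictly fewer beliefs while
    accounting for the same behavior."""
    subset = all(Ba[s] <= Bb[s] for s in Ba) and Ba != Bb
    return subset and accounts_for_output(Ba) and accounts_for_output(Bb)

assert prefer(B1, B2) and not prefer(B2, B1)
```

A stone would get the everywhere-empty ascription by the same reasoning: any nonempty ascription explains no additional behavior and so loses on economy.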
The above criteria have been formulated somewhat vaguely. This would be bad if there were widely different ascriptions of beliefs to a particular machine that all met our criteria or if the criteria allowed ascriptions that differed widely from our intuitions. My present opinion is that more thought will make the criteria somewhat more precise at no cost in applicability, but that they should still remain rather vague, i.e. we shall want to ascribe belief in a family of cases. However, even at the present level of vagueness, there probably won't be radically different equally ``good'' ascriptions of belief for systems of practical interest. If there were, we would notice unresolvable ambiguities in our ascriptions of belief to our acquaintances.
While we may not want to pin down our general idea of belief to a single axiomatization, we will need to build precise axiomatizations of belief and other mental qualities into particular intelligent computer programs.