From Minds and Machines 10 (2000): 321-360.
Reprinted with kind permission of
Kluwer Academic Publishers.
7
The Nature of Nonmonotonic Reasoning
1 Introduction
My friend Sarah is debating with herself about going to the mall. She needs to do some shopping, but she dislikes crowds. She notes that it is the middle of the week, and reasons as follows:
At this hour of the day in the middle of the week, most adults with jobs will be at work, and most children will be in school. So, the mall will not be crowded.
But then Sarah remembers that it is the 15th of the month. In her town, all the merchants in the mall give a 25% discount to seniors on the 15th of the month. She reasons as follows:
But it is seniors’ discount day at the mall. Seniors really like to take advantage of discounts because many of them are on fixed incomes, and seniors will not be at work. So, the mall will be crowded.
On further reflection, Sarah remembers reading in the paper that there is a serious flu epidemic in her area. It is so severe that just yesterday the Department of Health issued an advisory, warning all seniors to avoid crowds. She reasons as follows:
However, the health department just issued that flu advisory, and at their age, most seniors are pretty careful about their health. Hence most seniors will avoid shopping until the advisory is lifted. So, the mall will not be crowded.
Sarah’s opinions about whether or not the mall will be crowded switch back and forth as she recalls more and more relevant information. (For the very picky, we could retell the story having a concerned companion sequentially feed Sarah the additional pieces of information, rather than having Sarah recall them herself.) The pattern illustrated in our example is familiar to us all. As we add more and more information, our conclusions are apt to swing from one extreme to another, and back again. Commonsense reasoning about everyday matters is full of such examples.
Formal logics of the sorts usually studied by logicians (at least up to about 20 years ago) do not seem well suited to the analysis of this sort of reasoning. Let A be an arbitrary sentence and let Γ be an arbitrary set of sentences. We will use the notation ‘Γ ⊢ A’ to indicate that our logic sanctions the inference from premises Γ to conclusion A. We may sometimes add a subscript to specify the logic when confusion about which logic is intended is likely. The usual formal logics (classical, intuitionistic, many-valued, modal) have a property that has come to be known as ‘monotonicity’; namely, if the logic sanctions an inference from premises Γ to conclusion A, then the logic will sanction the inference to conclusion A from Γ plus any additional premises. That is, if Γ ⊢ A, then Γ ∪ Δ ⊢ A, for any set Δ.
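The monotonicity property can be spot-checked with a toy truth-table entailment tester. The sketch below is illustrative only: the atoms `w` ("workday afternoon") and `c` ("mall is crowded") and the premises are our own assumptions, not anything from the text. Note that classical monotonicity holds even when the added premise contradicts the others; the entailment then holds vacuously.

```python
from itertools import product

def entails(premises, conclusion, atoms):
    """Classical propositional entailment, checked by brute force over all
    truth-value assignments. Sentences are functions from a valuation
    (dict of atom -> bool) to bool."""
    for bits in product([False, True], repeat=len(atoms)):
        v = dict(zip(atoms, bits))
        if all(p(v) for p in premises) and not conclusion(v):
            return False  # found a counter-model: premises true, conclusion false
    return True

# Hypothetical atoms: w = "it is a workday afternoon", c = "the mall is crowded"
atoms = ["w", "c"]
rule = lambda v: (not v["w"]) or (not v["c"])   # w implies not-c
fact = lambda v: v["w"]
goal = lambda v: not v["c"]

assert entails([rule, fact], goal, atoms)
# Monotonicity: any extra premise preserves the entailment -- even one that
# contradicts the others (the premises then have no models at all).
extra = lambda v: v["c"]
assert entails([rule, fact, extra], goal, atoms)
```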
Note that in our example about Sarah, we did not reject any of our previous explicitly stated premises as we proceeded. We simply added more and more information. As each new bit of information was added, Sarah’s conclusion switched back and forth. Sarah’s reasoning was nonmonotonic. Most common sense reasoning about everyday matters seems to be nonmonotonic. Since common sense reasoning is nonmonotonic, but the usual formal logics are all monotonic, it would seem that we need to develop a new kind of logic, a nonmonotonic logic, to model common sense reasoning.
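Sarah's pattern of revision can itself be sketched computationally. The policy below — "among the applicable rules, the one with the most specific precondition set wins" — is our own assumed toy policy, not a mechanism proposed in this paper; it merely shows a conclusion flipping as facts accumulate, with no earlier fact ever retracted.

```python
# Each rule pairs a set of required facts with a conclusion. A more specific
# rule (larger precondition set) defeats a less specific one once all of its
# preconditions are known.
RULES = [
    ({"weekday_afternoon"}, "not crowded"),
    ({"weekday_afternoon", "seniors_discount"}, "crowded"),
    ({"weekday_afternoon", "seniors_discount", "flu_advisory"}, "not crowded"),
]

def conclude(facts):
    """Fire the applicable rule with the largest precondition set."""
    applicable = [r for r in RULES if r[0] <= facts]
    return max(applicable, key=lambda r: len(r[0]))[1] if applicable else "unknown"

facts = set()
verdicts = ["not crowded", "crowded", "not crowded"]
for new_fact, verdict in zip(
        ["weekday_afternoon", "seniors_discount", "flu_advisory"], verdicts):
    facts.add(new_fact)          # information only accumulates...
    assert conclude(facts) == verdict   # ...yet the conclusion swings back and forth
```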
It is the purpose of this paper to challenge not only the NEED, but also the rational basis, for moving to a special logic in order to account for the nonmonotonic character of commonsense inference. After a few brief preliminaries, we will present four proofs that it is impossible to have a nonmonotonic consequence relation that corresponds to universal constraints on rational belief structures. In brief, a rational nonmonotonic logic is impossible. We will then go on to show that one can model nonmonotonic reasoning with standard monotonic logic, indicating that there is no need for a special nonmonotonic logic anyway. Nonmonotonicity results from making alterations in our mental models of the world as a result of new information, and then using the revised mental models and standard logics to draw our conclusions. Principles of nonmonotonic reasoning are simply heuristics for the modification of the mental models we use to get around in the world. As such they are not content free, as principles of deductive reasoning are. Principles of nonmonotonic reasoning contain empirical assumptions and should be subject to empirical test and verification, like any other scientific generalizations. While these conclusions are not startlingly new, what is new are our proofs of the irrationality inherent in the alternative of viewing nonmonotonicity as deriving from some strange logic.
This paper is basically meta-theoretical. It is not intended to be a survey of the field of nonmonotonic logic, and generally we will avoid detailed commentary on specific systems. Good introductions to and surveys of the early trends in research in this area may be found in Besnard (1989), Smets et al. (1988) and Turner (1984). A large number of survey books covering more recent developments have since been published; one may profitably begin with Brewka et al. (1997).
2 Preliminary Matters
We will begin with a few mundane observations about the role of computation in the interaction of an organism with its environment. Biological organisms have a number of primary goals, among which we may list (1) energy acquisition, (2) reproduction, and (3) death avoidance. These categories are only mentioned as a rough guide and are intended to be neither mutually exclusive nor jointly exhaustive. Let us compare a very simple organism like the paramecium with a more complicated organism, like the common house cat.
For the paramecium in its usual environment, food is plentiful and randomly distributed, enemies are randomly distributed, and reproductive opportunities (at least opportunities for the sharing of genetic material) are randomly distributed. So in its environment, the paramecium has no need of great computational power. Computational power would not help it in energy acquisition, reproduction, or death avoidance. Since computational power costs energy, increased computational power for the paramecium would be a net loss.
In contrast to the environment of the paramecium, for the common cat, food is scarce and not randomly distributed, enemies are not randomly distributed, and reproductive opportunities are not randomly distributed. Nor are any of these items uniformly distributed. In order to overcome this non‑random, but non‑uniform distribution, the cat has to be able to take information in from its environment and make predictions about the consequences of various actions it might take.
Consider a cat trying to pounce on a mouse in the middle of a field. In order to be useful, the feline computations must satisfy a number of constraints. Any prediction by the cat of the movements of the mouse, in response to anticipated movements by the cat, must occur sufficiently in advance of the event to allow the cat to act on the prediction; if it takes too long to do the computation, it is useless. Further, the prediction must concern matters for which the cat possesses some behavioral strategy; it will do the cat no good to compute air flight trajectories, as an owl might do, since the cat cannot fly. And finally, the time and energy costs of the computation from the data available must not on average exceed the utility of the prediction; so the extra time and energy cost to the cat of predicting the position of the mouse correct to an angstrom, rather than correct to a centimeter, would not yield enough payoff to be worthwhile. In summary, predictions must be made relatively efficiently, they must be mostly correct, and they must be about matters that are important to the organism.
Random strategies are effective only in random environments of a certain sort. Success in a highly non‑random, non‑uniform environment requires both (1) an adequate behavioral repertoire, and (2) appropriate predictive capacity. The greater the behavioral repertoire and the better the predictive capacity, the more diverse environments an organism can exploit, and vice versa. But in general as behavioral and computational capability increase, there is a corresponding increase in the logical state space complexity (more vectors in the space). And as state space complexity increases, there must in general be an increase in the physical complexity of any physical realization of a computer with respect to that space. But as the physical complexity of an organism increases, there will be increased energy requirements. Hence, in general, organisms that can exploit more diverse environments have greater energy requirements.
Devices which can store information in the environment and recover it in a systematic way as needed will in general have an advantage over devices not so endowed. The first advantage is that fewer internal states will be required for memory, so devices with ‘environmental memory’ can be physically simpler. And the second advantage is that devices with ‘environmental memory’ will generally have logically superior computational ability. By analogy, recall that a Turing machine is just a finite state machine brain controlling a read/write head for storing and recovering information in its environment; and Turing machines are certainly computationally superior to finite state machines. Further, a universal Turing machine can perform any computation that can be done by any Turing machine, and yet the universal machine has a fixed ‘brain’ size.
Thus organisms, with the ability to exploit very diverse environments will derive an energy advantage from the use of computationally efficient languages that can be used to model the environment and to predict future events. But in general, time and energy efficiency requirements prohibit totally accurate models and totally accurate predictions.
So the human use of natural languages is based, at least in part, on their utility for modeling and predicting aspects of the environment under conditions of uncertainty. In order to be practically useful, our modeling of the world will not generally include all possible details. The most detailed map would be one with a 1:1 scale, but it would be absolutely useless because it would completely cover the terrain! Similarly, time, space, and energy bounds on our mental computations require that mental models be somewhat simpler than that which they model. And we are seldom able to make, nor have need of, measurements of length precise to an angstrom, for example. So many, perhaps most, of our models of and claims about the world are best viewed as approximations.
To repeat, our rational thought processes involve modeling and predicting aspects of our environment under conditions of uncertainty. Here we will assume that our formal language is adequate for such modeling and for the expression of claims about our world that are important to us.
To begin with, we do not wish to beg any important formal questions, so we will be as general as possible in our characterization of the language. We will assume that we are given some formal language Λ, with a denumerable set of well formed formulas:
Λ = {E1, E2, ...}
For stylistic variety, we will use the terms ‘well formed formula’, ‘WFF’, ‘sentence’, and ‘expression’ interchangeably. Initially we make no assumptions about connectives, quantifiers, or other logical particles.
We will simply assume that crucial aspects of our reasoning can be represented in an appropriate way by expressions in the language. In particular, we assume that for any argument, the premises are given as a set of sentences, and the conclusion is a sentence. We use the notation ‘Γ ⊢ A’ to indicate that the logic under study sanctions drawing conclusion A from the set of premises Γ. The consequence relation ‘⊢’ will sometimes be subscripted when it is necessary to distinguish the consequence relation of one logic from that of another. For a given logic, the set of consequences of the premises Γ is defined by:
Con(Γ) = {A : Γ ⊢ A}
We will now be more precise about the definition of monotonicity. In mathematical terminology, a real function f(x) defined over the real interval (a,b) is said to be strongly positively monotonic in the interval (a,b) iff for x ∈ (a,b), f(x) increases as x increases; f(x) is said to be weakly positively monotonic iff for x ∈ (a,b), f(x) does not decrease as x increases. For negatively monotonic, just replace ‘f(x) increases’ and ‘f(x) does not decrease’ in the above definitions by ‘f(x) decreases’ and ‘f(x) does not increase’ respectively. By an obvious extension, the terminology may be applied to any function whose domain and/or range is merely partially ordered. So for example, the power set function ℘(Γ) is strongly positively monotonic, since if Γ ⊆ Δ, then ℘(Γ) ⊆ ℘(Δ). Further, the terminology may be applied to functions of more than one argument by careful specification of the argument position. For example, we may say that over the interval (1,2) the two place function g(x,y) = x/y is strongly positively monotonic with respect to its first argument and strongly negatively monotonic with respect to its second argument.
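The two-argument example can be spot-checked numerically. A minimal sketch, sampling random points from the stated interval (1,2):

```python
import random

def g(x, y):
    """The paper's example function g(x, y) = x / y."""
    return x / y

random.seed(0)
for _ in range(1000):
    # strongly positively monotonic in the first argument:
    x1, x2 = sorted(random.uniform(1, 2) for _ in range(2))
    y = random.uniform(1, 2)
    if x1 < x2:
        assert g(x1, y) < g(x2, y)
    # strongly negatively monotonic in the second argument:
    y1, y2 = sorted(random.uniform(1, 2) for _ in range(2))
    x = random.uniform(1, 2)
    if y1 < y2:
        assert g(x, y1) > g(x, y2)
```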
With these definitions, we can see that the consequence relation of classical logic is weakly positively monotonic with respect to premises, because if Γ ⊆ Δ, then ConC(Γ) ⊆ ConC(Δ); i.e., if Γ ⊆ Δ, then for all sentences A, Δ ⊢C A if Γ ⊢C A. Or, to use the more usual formulation, the consequence relation of classical logic is (weakly positively) monotonic with respect to premises because if Γ ⊢C A, then Γ ∪ Δ ⊢C A. Many researchers in the field of computing science seem to be unaware of the mathematical origins of the terms ‘monotonic’ and ‘nonmonotonic’, and this lack of understanding sometimes leads to confusion.
For future reference, it will be useful to characterize the situation in a slightly different way. There is an equivalent way of defining monotonicity of a relation in terms of the characteristic function of the relation. For any n-adic relation R (n = 1, 2, ...), the characteristic function C[R] is defined to be the map which takes n-tuples from the field of R into {1,0}, as follows:
C[R](x1, ..., xn) = 1, if and only if R(x1, ..., xn) is true
0, otherwise
We then say that the relation R is strongly/weakly, positively/negatively monotonic with respect to position i if and only if its characteristic function is. So, we can say that the consequence relation of classical logic ⊢C is weakly positively monotonic because the characteristic function C[⊢C] is weakly positively monotonic.
In this paper we will always be talking about positively monotonic functions, so with that understanding we will no longer specify which sort of monotonicity is meant. Further, consequence relations are generally only weakly monotonic at best, so henceforth we will follow common usage in the logic literature and say ‘monotonic’ when we mean ‘weakly positively monotonic’.
Common sense reasoning takes us from premises to conclusions. So just as with classical logic, we may think of common sense reasoning as corresponding to a consequence relation ‘⊢CS’. Our simple example about Sarah’s reasoning (concerning how crowded the mall will be) clearly shows that the common sense consequence relation is not monotonic, i.e., it is nonmonotonic; in some cases we have Γ ⊢CS A, but we do NOT have Γ ∪ Δ ⊢CS A. We now want to examine the possibility of formalizing a theory of this nonmonotonic consequence relation.
Cooking up a nonmonotonic consequence relation is a completely trivial task. For example, consider the following strange consequence relation:
Γ ⊢S A iff card(Γ) ≤ n and Lng(A) ⊆ Lng(Γ) but A ∉ Γ.
By ‘card(Γ)’ we mean ‘the cardinal number of elements in Γ’, and in the above formulation ‘n’ is some arbitrary finite integer. And we use ‘Lng(A)’, the language of A, to mean the set of non-logical components used in the construction of A, i.e., the set of predicates, constants, and sentence letters that actually occur in A; similarly, ‘Lng(Γ)’ is used to designate the set of non-logical components that appear in the members of Γ. So basically, our strange consequence relation says that as long as we have no more than the specified number of premises, any sentence which uses only the language of the premises but is not itself one of the premises is a consequence of those premises.
Our strange consequence relation is clearly nonmonotonic, since if we add premises so the total exceeds the specified number n, then nothing follows. It is also paraconsistent, because if the language contains a negation ‘¬’, we may have both Γ ⊢S A and Γ ⊢S ¬A, but we will not have Γ ⊢S B for all B. But even though it is nonmonotonic, this consequence relation is still clearly uninteresting. It is trivial because it sanctions many inferences we would deem to be irrational, and it fails to sanction inferences we would deem to be rational. For example, assuming 2 ≤ n, then from the premise set {The cat sat on the mat., The dog is brown.}, we could conclude ‘The dog sat on the cat.’ We would also be entitled to conclude ‘The cat did not sit on the mat’, but we would not be entitled to conclude ‘The cat sat on the mat’. Anyone whose reasoning corresponded to our strange consequence relation would rightly be deemed irrational. To the extent that one refuses to accept logically obvious conclusions, one may be deemed to be less than rational; on the other hand, jumping to bizarre conclusions is a clear indication of irrationality.
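The strange relation ⊢S is simple enough to implement directly. In the sketch below, the non-logical vocabulary Lng is crudely approximated as the set of content words of an English sentence (an assumption made purely for illustration), and the bound n is fixed at 3:

```python
def lng(sentence):
    """Crude stand-in for Lng: the non-logical vocabulary of a sentence,
    modeled as its set of content words (articles etc. stripped)."""
    words = sentence.replace(".", "").lower().split()
    return set(words) - {"the", "not", "did"}

def strange_entails(premises, conclusion, n=3):
    """Gamma |-S A iff card(Gamma) <= n, Lng(A) is a subset of Lng(Gamma),
    and A is not itself a member of Gamma."""
    vocab = set().union(*[lng(p) for p in premises])
    return (len(premises) <= n
            and lng(conclusion) <= vocab
            and conclusion not in premises)

gamma = {"The cat sat on the mat.", "The dog is brown."}
assert strange_entails(gamma, "The dog sat on the cat.")      # bizarre, yet sanctioned
assert not strange_entails(gamma, "The cat sat on the mat.")  # a premise, so NOT sanctioned
# Nonmonotonicity: pad the premise set past n and nothing follows any more.
big = gamma | {"The mat is brown.", "The cat is brown."}
assert not strange_entails(big, "The dog sat on the cat.")
```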
So cooking up nonmonotonic consequence relations is not a problem. The problem is to formulate a nonmonotonic consequence relation that corresponds to canons of rationality. Roughly, what we want is an explication of something like ‘Given that it is rational to believe the set of premises Γ, it is rational to believe A.’ Relations which violate principles of rational belief cannot be taken seriously as candidates for a common sense consequence relation; to be non-trivial, a consequence relation must correspond to principles of rational belief. In order to be acceptable, a logic must sanction all (or at least most) of the arguments which our common sense tells us are rational. On the other hand, we will not accept as adequate a logic that sanctions just any old inference at all. To be acceptable, the logic must reject all (or at least most) of the arguments which our common sense tells us are not rational.
In short, to be acceptable, our consequence relation must correspond to some sort of theory of rational belief and inference, both in the arguments it accepts and in the arguments it rejects. When we say that nonmonotonic logic is impossible, at least part of what we are claiming is that every adequate characterization of a logical consequence relation in terms of rational belief structures is monotonic. That is, mathematically speaking, its consequence relation is weakly positively monotonic. Our first two proofs are aimed at establishing this result. Further, we also wish to claim that the nonmonotonic character of our commonsense inferences is an essentially meta-linguistic characteristic of the way in which the proof theory is used, and it cannot be mirrored by any syntactic structures in the object language; in other words, it is impossible to introduce into the syntax of the object language any structures that adequately reflect the nonmonotonic character of common sense reasoning. Our second two proofs are aimed at establishing this result.
3 First Proof
Perhaps the most detailed formal account of rational belief structures is classical probability theory. We can think of probability theory as an instantaneous snapshot of the beliefs of some ideally rational agent. But for our purposes, classical probability theory depends on too many assumptions. We do not want to assume that the language contains any particular logical particles, while classical probability theory usually assumes at least the machinery of classical logic. We do not want to assume that our beliefs can be assigned precise numerical values, as is done in probability theory. We do not want to assume that all beliefs can be linearly ordered, nor even that all beliefs are pairwise comparable.
Intuitively, we can think of a theory about some aspect of our world as just a set of sentences. An instantaneous snapshot of the belief system of a rational agent will assign some level of ‘belief’ to at least some sets; we could write ‘b(Γ)’ for ‘the degree of belief in Γ’. But degrees of belief may not correspond to numbers, and many sets may have no degree of belief because they have never been entertained. Further, it may be that there is no intersubjective way of comparing degrees of belief from one individual to another. From our point of view, what is important is that the belief system will impose some form of ordering on some of the sets. Since it is the ordering that indicates the structure, we will generally concentrate on the ordering. We will use ≤b to indicate a quasi-ordering relation relative to belief structure b; but because the ‘degree of belief’ is of no interest to us apart from the ordering relation, we will generally drop the subscript. So instead of writing something like ‘b(Γ) ≤ b(Δ)’ we will write ‘Γ ≤b Δ’, or more simply ‘Γ ≤ Δ’, for ‘the degree of joint acceptability of the members of Γ is less than or equal to the degree of joint acceptability of the members of Δ’.
Recall that we do not assume that ‘degree of acceptability’ or ‘degree of belief’ corresponds to any numerical measure. Further, we do not assume that the relations ≤ are either objectively or subjectively based; they may be either, they may be partially one and partially the other, or they may be based on some third alternative. We do not assume that the relations ≤ are in accord with any standard probability measure or that they correspond to some objective relative frequency scheme. We want to admit interpretations such as ‘If all the members of Γ have a probability above threshold r, then all the members of Δ also have probability above that threshold,’ or ‘most rational people who find all the members of Γ to be more acceptable than their denials would find the members of Δ to be more acceptable than their denials’.
With these brief preliminaries, we are now in a position to specify some restrictions which apply to all rational belief structures. For our purposes, we will take a belief structure ≤ to be a subset of ℘(Λ) × ℘(Λ), subject to the following two constraints:
b.1 Reflexivity: Γ ≤ Γ
b.2 Transitivity: If Γ ≤ Γ′ and Γ′ ≤ Γ″, then Γ ≤ Γ″.
In short, we will consider any quasi‑ordering relation defined over pairs of sets of sentences to be a belief structure.
Recall that we are thinking of a set of sentences as being a theory about the world, in the ordinary scientific sense of the word ‘theory’. For example, the set {A, B} is interpreted as meaning that both A and B hold. It would be bizarre in the extreme for someone to claim to believe the theory of thermodynamics, but then go on to say they do not believe in the second law. So sets of sentences are treated conjunctively, even if there is no conjunction operator in the language. Further, sets may contain an infinite number of sentences, and so would not correspond to any single sentence even in standard languages with a conjunction operator. Thus we do not assume that sets are closed with respect to arbitrary conjunctions of members. In addition, just as in ordinary scientific discourse, we do not assume that the sets are ‘deductively closed’, or closed with respect to any consequence relation. To write down the theory of Newtonian mechanics, we do not have to write down all its deductive consequences.
Our next constraint is motivated by simple relative frequency considerations. As an illustrative example, note that it is harder to build a house and an attached garage than it is to build just the house. Similarly, a theory which claims both A and B will be more difficult to support than a theory which claims just A. Looking at it from another point of view, there will be fewer (or no more) universe designs compatible with both A and B than there are compatible with just A. In general, if ' makes no more claims about the universe than ), then ' is at least as likely as ). Formally, we can state the condition as follows.
b.3 Subset principle: If Γ ⊆ Δ, then Δ ≤ Γ.
If Γ makes no more claims about the world than does Δ, then it would be irrational not to have at least as strong a belief in Γ as in Δ.
For any specific logic, there will no doubt be other constraints one could place on rational belief structures. For just one example, consider classical logic. In classical logic we know that from a conditional A ⊃ B and its antecedent A, one may infer the consequent B. This inference rule might translate into a number of restrictions on rational belief structures, such as either of the following:
b.mp1 {A ⊃ B, A} ≤ {B}
b.mp2 If A ⊃ B ∈ Γ and A ∈ Γ, then Γ ≤ Γ ∪ {B}
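One concrete belief structure satisfying all of these constraints can be built from model inclusion: order Γ ≤ Δ whenever every world satisfying all of Γ satisfies all of Δ (so the weaker claim is at least as believable). The sketch below is our own toy instance over a two-atom language, not a construction from the text; it verifies b.1–b.3 and both modus ponens constraints on a handful of sample sets.

```python
from itertools import product

ATOMS = ["A", "B"]
VALS = [dict(zip(ATOMS, bits)) for bits in product([False, True], repeat=2)]

def models(gamma):
    """Indices of the valuations satisfying every sentence in gamma."""
    return frozenset(i for i, v in enumerate(VALS) if all(s(v) for s in gamma))

def le(x, y):
    """x <= y: every world where all of x holds is one where all of y holds,
    so y is at least as believable as x."""
    return models(x) <= models(y)

A = lambda v: v["A"]
B = lambda v: v["B"]
AimpB = lambda v: (not v["A"]) or v["B"]

sets = [frozenset(), frozenset([A]), frozenset([B]),
        frozenset([A, B]), frozenset([AimpB, A])]

for g in sets:
    assert le(g, g)                                    # b.1 reflexivity
for g in sets:
    for d in sets:
        for e in sets:
            if le(g, d) and le(d, e):
                assert le(g, e)                        # b.2 transitivity
for g in sets:
    for d in sets:
        if g <= d:
            assert le(d, g)                            # b.3 subset principle
assert le(frozenset([AimpB, A]), frozenset([B]))       # b.mp1
g = frozenset([AimpB, A])
assert le(g, g | frozenset([B]))                       # b.mp2
```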
But again, we do not wish to prejudge any issues with regard to logical particles or inference rules. Recall that formal logics are both descriptive and prescriptive. So to avoid questionable presuppositions, we will adopt the very general view that a logic is just any prescription for what is to count as a rational belief structure. That is, a logic L is any arbitrary set of belief structures:
L = {≤1, ≤2, ...}
Each distinct logic will pick out a different set of belief structures; those for classical logic will be different from those for intuitionism, and both will be different from those for Post logic. But the important point is that from the standpoint of logic L, all and only the belief structures in L are rational.
Recall that we are concerned with the question of what kinds of entailment relations can be characterized in this way. Whatever else it may be, logic is not psychology. Logicians are not trying to codify the mental associations of some particular individual. So logical entailment does not depend on any single belief structure. Rather, logical entailment for logic L must be based on some universal properties of all rational belief structures as determined by L. Further, if the proof theory is to be adequate, it must accept all the arguments sanctioned by L and reject all the arguments deemed unacceptable by L. These considerations are expressed by two standard properties, which we will discuss in turn.
To motivate the first property, suppose we are trying to construct an artificial reasoning system. We certainly want to AVOID making a system that will sometimes, in logically unpredictable cases, draw conclusions that most would regard as irrational. Our first property captures this desideratum and is called ‘soundness.’ We can think of this principle as the ‘good path’ principle; logical entailment should never lead us astray. We can express the soundness principle formally as follows:
(1.1) Soundness: If Γ ⊢b A, then Γ ≤ {A} for all (most) rational belief structures ≤ ∈ L.
Note that we are using the notation ‘⊢b’ to indicate a proof theory which coincides with some set L of rational belief structures.
To motivate the second property, again suppose we are trying to construct an artificial reasoning system. As before, we want to AVOID making a system that will sometimes, in logically unpredictable cases, FAIL to draw conclusions that most would deem to be rational. The second property is called ‘completeness.’ Basically, we want our proof theory to be strong enough to capture any argument which is sanctioned by every rational belief structure. If, no matter what the rational belief structure is like, it is always rational to accept A on condition that we accept all of ', then ' logically entails A. Using the same notation as above, we can formally express the completeness condition as follows:
(1.2) Completeness: If Γ ≤ {A} for all (most) rational belief structures ≤ ∈ L, then Γ ⊢b A.
Note that in our statements of (1.1) and (1.2), we have included ‘most’ in parentheses in an effort to be as generous as possible. In our proof below, we will use the ‘all’ construction to keep the presentation of the proof as simple as possible. However, simple relative frequency considerations could be used to establish the same result as long as ‘most’ means ‘more than 50%.’
So far, we have been very general in all our considerations. We have allowed the formal language to have absolutely any logical particles and internal structure desired. Our characterization of belief structures was also very general, requiring only the minimal conditions dictated by relative frequency considerations. In particular, we imposed no requirements on the evolution of belief structures. And we allowed a completely arbitrary specification of which belief structures are to be considered rational. We have imposed no requirements on the sorts of inference rules contained in the proof theory. Of our proof theory, we required only (1) that it not lead us astray by sanctioning inferences deemed to be irrational, and (2) that it be strong enough to capture any argument deemed to be rational by all belief structures. But even with this simple framework, we are now in a position to demonstrate the impossibility of a nonmonotonic consequence relation.
Theorem 1: Let L be an arbitrary set of rational belief structures which are reflexive, transitive, and satisfy the subset principle. Further suppose that logical entailment ⊢b is sound and complete with respect to the set L. Then logical entailment is monotonic; that is, if Γ ⊢b A, then Γ ∪ Δ ⊢b A.
Proof:
1. Γ ⊢b A — given
2. Γ ≤ {A} for all ≤ ∈ L. — 1, soundness
3. Γ ∪ Δ ≤ Γ for all ≤ ∈ L. — subset principle
4. Γ ∪ Δ ≤ {A} for all ≤ ∈ L. — 2, 3, transitivity
5. Γ ∪ Δ ⊢b A — 4, completeness
In summary, it is trivially easy to formulate ‘proof theories’ which violate the principle of monotonicity. But these nonmonotonic algebras cannot correspond to logical entailment based on rational belief. Any entailment relation which corresponds to elementary principles of rational belief must be monotonic. Hence, a rational nonmonotonic logic is impossible, no matter how the set of rational belief structures is selected.
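In a sufficiently tiny setting, Theorem 1 can also be confirmed by exhaustive search. The sketch below enumerates every belief structure over a two-sentence language that satisfies b.1–b.3, and checks that each one's induced entailment (Γ ⊢b A iff Γ ≤ {A}) is monotonic; since soundness and completeness make ⊢b the intersection of these per-structure relations, and an intersection of monotonic relations is monotonic, this covers every choice of L over this language.

```python
from itertools import combinations, product

SENTS = ("E1", "E2")
SETS = [frozenset(s) for r in range(len(SENTS) + 1)
        for s in combinations(SENTS, r)]

# b.1 (reflexivity) and b.3 (subset principle) jointly force every pair
# (Delta, Gamma) with Gamma a subset of Delta into the ordering.
REQUIRED = {(d, g) for g in SETS for d in SETS if g <= d}
FREE = [(g, d) for g in SETS for d in SETS if (g, d) not in REQUIRED]

def transitive(rel):
    return all((a, c) in rel
               for (a, b1) in rel for (b2, c) in rel if b1 == b2)

checked = 0
for bits in product([0, 1], repeat=len(FREE)):
    rel = REQUIRED | {p for p, bit in zip(FREE, bits) if bit}
    if not transitive(rel):      # b.2 rules this candidate out
        continue
    checked += 1
    # Induced entailment: Gamma |-b A iff (Gamma, {A}) is in rel.
    # Verify monotonicity, exactly as Theorem 1 predicts.
    for g in SETS:
        for a in SENTS:
            if (g, frozenset([a])) in rel:
                for d in SETS:
                    assert (g | d, frozenset([a])) in rel

assert checked >= 1   # at least the pure superset ordering satisfies b.1-b.3
```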
(I have presented the proof in this section in a number of lectures over the years since 1988, and it appears in Morgan (1998).)
4 Second Proof
Some may feel that trying to represent belief structures by quasi‑orderings on sets of sentences does not do justice to the conditional nature of actual belief systems. It seems that almost any belief we have is conditional on a lot of background assumptions. For example, even the belief that ‘The cat is on the mat’ involves assumptions about the physical nature of cats and mats, the impenetrability of matter, and so on. So, let us consider conditional belief structures as a foundation for an entailment relation.
Theories of conditional probability are likely the most frequently encountered candidates for models of conditional belief structures. As we previously indicated, there is some controversy over the adequacy of numerically based probability theories as models of actual belief structures. Most people cannot linearly order their beliefs, much less assign precise numerical weights to them, so it seems that numerically based theories are overspecified. As before, we do not wish to beg any fundamental questions, so we will allow both numerically based theories and comparative theories as well. A ‘conditional belief unit’ will be represented by ‘c(A, Γ)’ for A any sentence in the language and Γ any set of sentences of the language. We think of A as the conclusion, and we think of the set Γ as containing the premises, the evidence, or the background assumptions. So c(A, Γ) is to be thought of as the degree of belief in A, given the assumptions (evidence) in the set Γ.
For numerically based theories, we think of ‘c’ as being a function which maps ordered pairs (A, Γ) into the closed unit interval [0,1]. A theory of rational belief must then include a set of constraints which must be satisfied by any acceptable function c. Comparative theories are sometimes represented in terms of some quasi‑ordering relation on the units c(A, Γ), and the theory of rational belief specifies constraints on the permissible ordering relations. Alternatively, comparative theories are sometimes formulated in terms of a 4‑place predicate; for example, Carnap (1962) uses the notation ‘mc(A, Γ, B, Δ)’ intuitively meaning something like ‘the argument (A, Γ) is at least as strong as the argument (B, Δ)’. Constraints are then phrased in terms of the predicate mc. But for our purposes, even for comparative theories it will be convenient to speak of ‘c’ as a function mapping ordered pairs (A, Γ) onto the elements c(A, Γ) of some quasi‑ordered set. That is, constraints on the ordering relation or the 4‑place predicate can be written equivalently as constraints on the function c. Thus in any case, at least part of a theory of rational belief is the specification of a set of constraints that determines the acceptable conditional belief functions c.
Since we wish to be as general as possible, we will not discuss the nature of the constraints that determine the acceptable conditional belief functions. Rather, we will allow any prescription for rationality. So once again, we will take a logic L to be any set of conditional belief functions:
L = {c1, c2, ...}
The c‑functions are assumed to have as domain the set of ordered pairs WFF × P(WFF), where WFF is the set of well‑formed formulas of the language and P(WFF) is its power set. However, for our purposes we need not specify the range; it could be the closed unit interval [0,1], or it could be the elements of some quasi‑ordered algebra in case we are dealing with a comparative theory.
We now wish to draw attention to an important property of conditional belief. It is of interest to note that we are rationally able to entertain a variety of alternative hypotheses. I can consider what the world will be like, on condition that I decide not to go to work tomorrow. I can consider what the world will be like, on condition that I decide I will go to work tomorrow. In each case, my conditional belief state seems perfectly rational. A casual examination of a wide variety of probability theories, whether numerical or comparative, will show that they are virtually all conditionalizable; i.e., if I conditionalize a rational belief state by adding some additional sentence to my background assumptions, the resulting belief state is itself rational. A more formal characterization follows:
C.1 A set L of conditional belief functions is closed under conditionalization iff for all c in L, for any set Δ, the function c* defined below is also a member of L.
c*(A, Γ) =df c(A, Γ ∪ Δ)
Carefully note that we are not saying that c(A, Γ) = c(A, Γ ∪ Δ). We are making no claim about the relationship between c(A, Γ) and c(A, Γ ∪ Δ). All we are saying is that a conditionalized legitimate conditional belief structure is itself a legitimate conditional belief structure.
Also note that requiring L to be closed under conditionalization does not impose any restriction on the evolution of beliefs. While some have claimed that the only rational way for beliefs to evolve is by (e.g., Bayesian) conditionalization, others have disputed such claims. Again, C.1 does not say anything about the evolution of beliefs; it simply says that a conditionalized rational belief function is itself a rational belief function.
The only problems with conditionalization that generally arise correspond to cases in which the measure of the conditioning set is 0, or alternatively, when the sentences in the conditioning set are internally inconsistent or incoherent in some way. In such cases a trivial modification of the theory will give c* some default value, usually 1 for numerically based theories. The alternative is to permit partial functions; in such a case, we would take c* to be the function indicated in C.1, but undefined when Γ ∪ Δ is problematic as evidence.
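Closure under conditionalization is easy to see for belief functions derived from relative frequencies. The following sketch is purely illustrative (the toy worlds, the atom names, and the helper `conditionalize` are my own assumptions, not anything from the text): conditionalizing a frequency-based c by a set Δ just folds Δ into the evidence, and the result is again a function of exactly the same kind.

```python
from itertools import product

# Eight toy possible worlds over three atomic sentences.
ATOMS = ("p", "q", "r")
WORLDS = [dict(zip(ATOMS, vals)) for vals in product([True, False], repeat=3)]

def c(conclusion, evidence):
    """A relative-frequency conditional belief function: the fraction of
    evidence-worlds in which the conclusion holds, with the usual default
    value 1 when the evidence has measure 0."""
    pool = [w for w in WORLDS if all(w[e] for e in evidence)]
    if not pool:
        return 1.0
    return sum(w[conclusion] for w in pool) / len(pool)

def conditionalize(c_fn, delta):
    """C.1: from c, build c*(A, G) = c(A, G union Delta)."""
    return lambda concl, evidence: c_fn(concl, set(evidence) | set(delta))

c_star = conditionalize(c, {"q"})

# c* is again a legitimate frequency-based belief function: it is just the
# frequency function computed over the worlds in which q holds.
print(c("p", set()))       # 0.5
print(c_star("p", set()))  # same value as c("p", {"q"})
```

Nothing here constrains how beliefs evolve; `c_star` is simply another member of the same family of frequency functions, which is all C.1 asserts.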
We will now consider an important general property of many characterizations of entailment. In order to motivate the property, let us consider a few proposed definitions for an entailment relation. For simplicity, we will restrict the examples to those based on numerically valued functions c.
D.1 Γ |c A iff (∀c ∈ L) c(A, Γ) > c(¬A, Γ).
D.2 Γ |c A iff (∀c ∈ L) c(A, Γ) ≥ 1 − ε (for some fixed small ε).
D.3 Γ |c A iff (∀c ∈ L) for all Σ, c(A, Γ ∪ Σ) > 0.5.
In both D.1 and D.2, the evidence position is always occupied just by the premise set Γ. However, in D.3, the evidence position is occupied by Γ ∪ Σ, and Σ is universally quantified. That is, the defining sentence may talk about not just Γ, but also some set‑theoretic function of Γ. But in all three cases, the defining sentence has a property which we call being ‘transparent to unions’; the formal characterization is as follows:
C.2 ‘Sentence[A, c(–, f(Γ))]’ is transparent to unions iff for all sets Δ, if ‘Sentence[A, c(–, f(Γ) ∪ Δ)]’ is true, then ‘Sentence[A, c(–, f(Γ ∪ Δ))]’ is true.
To forestall trivial objections based on misunderstanding, it is important to note that by ‘transparent to unions’, we do not mean any of the following:
* if ‘Sentence[A, c(–, f(Γ))]’ is true, then ‘Sentence[A, c(–, f(Γ ∪ Δ))]’ is true.
** if ‘Sentence[A, c(–, f(Γ))]’ is true, then ‘Sentence[A, c(–, f(Γ) ∪ Δ)]’ is true.
*** if ‘Sentence[A, c(–, f(Γ) ∪ Δ)]’ is true, then ‘Sentence[A, c(–, f(Γ))]’ is true.
**** if ‘Sentence[A, c(–, f(Γ ∪ Δ))]’ is true, then ‘Sentence[A, c(–, f(Γ))]’ is true.
Being transparent to unions does not mean that one can ignore unions in the evidence position.
Note that every sentence in which the assumption set Γ is mentioned only in expressions of the form ‘c(–, Γ)’ will be transparent to unions. To fail to be transparent to unions, a defining sentence must make strong exclusionary claims about Γ (e.g. ‘Γ must not contain sentences of the form X’, or ‘Γ must be finite’) or the sentence must make use of complicated set‑theoretic functions f. A casual examination of proposed definitions of entailment based on conditional belief functions shows that virtually all of them are based on properties defined by sentences transparent to unions.
We are now in a position to recast our soundness and completeness principles in terms of some defining sentence concerning conditional belief structures. Recall our previous discussion that logic is not psychology. Entailment must be a universal property of all rational belief structures, independent of the beliefs of any particular individual.
(2.1) Soundness: If Γ |c A, then for all conditional belief functions c ∈ L, Sentence[A, c(–, f(Γ))].
(2.2) Completeness: If for all conditional belief functions c ∈ L, Sentence[A, c(–, f(Γ))], then Γ |c A.
With these preliminaries behind us, we are now in a position to give our second proof that logical entailment must be monotonic. Our proof of the following theorem is purposely rather pedantic, so that there is no confusion about the role of union‑transparency.
Theorem 2: Suppose the set of conditional belief structures is closed under conditionalization. Further suppose that logical entailment is sound and complete with respect to a property of conditional belief structures that is transparent to unions. Then logical entailment is monotonic; that is, if Γ |c A, then Γ ∪ Δ |c A, for any set Δ.
Proof:
1. Γ |c A                                      given
2. Let Δ be an arbitrary set.                  assume
3. (∀c ∈ L) Sentence[A, c(–, f(Γ))]            1, soundness
4. Let c′ be an arbitrary member of L.         assume
5. c″ ∈ L, for c″(B, Σ) = c′(B, Σ ∪ Δ)         2, 4, conditionalization
6. Sentence[A, c″(–, f(Γ))]                    3, 5
7. Sentence[A, c′(–, f(Γ) ∪ Δ)]                5, 6
8. Sentence[A, c′(–, f(Γ ∪ Δ))]                7, union‑transparency
9. (∀c ∈ L) Sentence[A, c(–, f(Γ ∪ Δ))]        4–8, generalization
10. Γ ∪ Δ |c A                                 9, completeness
11. Γ ∪ Δ |c A, for all Δ                      2–10, generalization
Our proof here is extremely general, but very simple, and cannot be avoided by any characterization of entailment in terms of conditional belief functions in which the characterization depends only on the given evidence. All of our examples D.1–3 yield monotonic entailment relations. As another example, Carnap’s characterization in Carnap (1962) of his comparative relation mc is such that mc(h, e, h′, e′) holds if and only if for all his regular c‑functions, c(h, e) ≥ c(h′, e′); but it is easy to show from his characterization that, as would be expected from the above theorem, mc is monotonic in the following sense: If mc(h, e, h′, e′), then for all d, mc(h, e ∧ d, h′, e′ ∧ d) (provided that neither e ∧ d nor e′ ∧ d is logically false).
As long as the set of conditional belief functions is closed under conditionalization, then any entailment relation based on a universal claim about c(–, Γ) must be monotonic. Any entailment relation which corresponds to a universal property of rational belief functions conditioned on the given evidence is monotonic. Hence, again, a rational nonmonotonic logic is impossible.
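Theorem 2 can also be checked by brute force on a small model. The sketch below is my own construction, not the paper's: each nonempty set of toy worlds carries a relative-frequency belief function (treated as a partial function on measure-0 evidence, per the option noted earlier), the resulting family L is closed under conditionalization, and the D.1-style entailment it defines comes out monotonic, exactly as the theorem predicts.

```python
from itertools import product, chain, combinations

ATOMS = ("p", "q")
WORLDS = [dict(zip(ATOMS, v)) for v in product([True, False], repeat=2)]
LITS = [(a, v) for a in ATOMS for v in (True, False)]  # literals (atom, sign)

def holds(lit, world):
    atom, val = lit
    return world[atom] == val

def subsets(xs):
    return list(chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1)))

def make_c(support):
    """Frequency function carried by one sub-population of worlds; undefined
    (None) on measure-0 evidence -- the partial-function option."""
    def c(concl, evidence):
        pool = [w for w in support if all(holds(e, w) for e in evidence)]
        if not pool:
            return None
        return sum(holds(concl, w) for w in pool) / len(pool)
    return c

# One belief function per nonempty sub-population; restricting the evidence
# restricts the pool, so this family is closed under conditionalization.
L = [make_c(s) for s in subsets(WORLDS) if s]

def entails(premises, concl):
    """D.1-style: every c in L (where defined) favors concl over its negation."""
    neg = (concl[0], not concl[1])
    for c in L:
        hi = c(concl, premises)
        if hi is not None and not hi > c(neg, premises):
            return False
    return True

# Whenever G |c A holds, so does (G union {B}) |c A, for every literal B.
for prem in map(set, subsets(LITS)):
    for concl in LITS:
        if entails(prem, concl):
            assert all(entails(prem | {b}, concl) for b in LITS)
print("D.1 over this conditionalization-closed family is monotonic")
```

The check is exhaustive over all premise sets of literals, so the monotonicity here is a property of the whole toy family, not of a lucky example.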
One possible way to avoid the result is to try to formulate a defining sentence that is not transparent to unions. But note that even if the defining sentence is not fully transparent to unions, it may be transparent to unions when the set ) belongs to a certain class. For example, we may require that the set of assumptions be finite. In this case, the defining sentence would be transparent to unions for finite sets. Thus our proof of monotonicity would still hold, with the sets ) restricted to being finite. From the standpoint of the examples usually cited to motivate nonmonotonic logic, the proof that entailment is monotonic relative to sets of a certain class (e.g., finite sets) is just as damaging as the more general proof.
Yet another way to try to avoid the result is to argue against conditionalizability. But while many have argued that conditionalization of a given belief state (probability distribution) is not always the best, most useful, or most rational way to update our beliefs, that is not the point at issue here. We are not here concerned with the question of updating beliefs. All that is required is that a conditionalized legitimate belief state should itself be a legitimate belief state. Clearly common sense reasoning makes much use of constructions like ‘Suppose X is true ...’, and ‘Consider what will happen if Y turns out to be the case ...’ It would certainly seem bizarre to claim that when we conditionalize our belief state in this way, we suddenly fall into irrationality!
Further note that conditionalizability follows from relative frequency considerations, even relative frequency of universe designs. Any distribution derived from relative frequency can be conditionalized, as long as we incorporate the proviso to handle cases with frequency 0. In addition, let us consider the usual product rule for conjunctions.
P(A ∧ B, Γ) = P(B, Γ) × P(A, Γ ∪ {B})
If we reject conditionalizability, then this rule would seem to reduce to an absurdity, since it takes the probability of the conjunction to be a product of two terms, one of which comes from a conditionalized distribution.
There is an additional argument for conditionalizability based on the product rule for conjunctions. Assume P is some given distribution, and consider the distribution P*, which is just P conditionalized on B.
P*(A, Γ) =df P(A ∧ B, Γ) / P(B, Γ), for all WFFs A and sets Γ
There will be a problem if P(B, Γ) = 0, but otherwise the distribution P* will be as well defined as the distribution P. A trivial modification of the classical theory will permit the distribution obtained by assigning the constant value 1 to all conclusion‑evidence pairs; so we could assign P*(A, Γ) the value 1 for those cases in which P(B, Γ) = 0.
Now, note that in the definition, for each evidence set Γ, the value P(B, Γ) is just a normalization factor, a constant. In terms of order alone, for a fixed WFF B, the values of P*(A, Γ), for various WFFs A, will be related to each other in the same way as the values of P(A ∧ B, Γ). So, if there is nothing irrational about the ordering for conjunctions, there should be nothing irrational about the ordering in the conditionalized distribution for each fixed Γ. We do need to worry a bit about the ordering as Γ is varied; but again, by hypothesis the conjunctions are rationally ordered, compared over different evidence sets, by the original distribution. But recall as we said before, in terms of the beliefs of real agents, a quasi‑ordering is the best we can hope to achieve. Hence those who wish to argue against conditionalizability will be forced to question the rationality of the ordering among conjunctions in the original belief structure, contrary to the hypothesis that the original belief structure is rational. (This same justification can be given in terms of classical comparative probability theory, but is a bit more complex, and arrives at the same result.) So the set of rational belief structures must be closed under conditionalization, assuming something like the classical product rule for conjunctions holds. And something akin to the classical product rule for conjunctions is dictated by relative frequency considerations.
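The order-preservation argument can be made concrete with a toy frequency distribution; everything named below (the worlds, the atoms, the helper P_and) is my own illustrative scaffolding. P*, defined from the product rule, agrees with ordinary conditionalization on B, and for fixed evidence it orders conclusions exactly as the conjunctions A ∧ B are ordered.

```python
from fractions import Fraction
from itertools import product

WORLDS = [dict(zip("pqr", v)) for v in product([True, False], repeat=3)]
B = "r"  # the sentence we conditionalize on

def pool(evidence):
    return [w for w in WORLDS if all(w[e] for e in evidence)]

def P(a, evidence=()):
    ws = pool(evidence)
    return Fraction(sum(w[a] for w in ws), len(ws))

def P_and(a, b, evidence=()):
    ws = pool(evidence)
    return Fraction(sum(w[a] and w[b] for w in ws), len(ws))

def P_star(a, evidence=()):
    """P conditionalized on B via the product-rule definition:
    P*(A, G) = P(A and B, G) / P(B, G)."""
    return P_and(a, B, evidence) / P(B, evidence)

# P* agrees with ordinary conditionalization on B ...
assert all(P_star(a) == P(a, (B,)) for a in "pq")
# ... and P(B, G) is a mere normalization constant: for fixed evidence,
# P* orders conclusions exactly as the conjunctions A-and-B are ordered.
assert sorted("pq", key=P_star) == sorted("pq", key=lambda a: P_and(a, B))
```

The measure-0 proviso is omitted here for brevity; with the toy uniform worlds no denominator vanishes.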
(The proof in this section is a generalization of one presented in Morgan (1994).)
4 Soundness
Both of our first two proofs depend on the assumption of a principle of soundness. One proposal for avoiding the impossibility proofs is to abandon soundness. In terms of classical logic, ‘revisable’ conclusions must go beyond what is deductively sanctioned by the premises, otherwise they would not be ‘revisable’. But for such conclusions, it will always be possible to find a model in which the premises are all true but the conclusion is false. Hence, in terms of classical semantics, nonmonotonic inferences are not sound.
Before blithely abandoning the soundness principle, it should be noted that rational belief structures of the sort considered in our first two proofs are quite different from classical model theory. The two soundness principles utilized in our proofs are not equivalent, at least not by themselves, to the soundness principle of classical logic. Note that in both of our impossibility proofs, subject to the initial constraints, we allowed the selection of the belief structures deemed to be rational to be made in any way whatsoever. What we proved in each case is that it is impossible to characterize a set of rational belief structures in such a way that (1) every inference that is always deemed to be rational is sanctioned by the proof theory, and such that (2) every inference sanctioned by the proof theory is always deemed to be rational. That is, no matter what the nonmonotonic proof theory is like, and no matter how you select the set of rational belief structures, if the proof theory is complete, then it must sanction arguments that are sometimes deemed irrational. In this sense, nonmonotonic proof theory is self‑refuting. To give up on soundness is, in essence, to admit that nonmonotonic proof theory must be irrational.
Note that we cannot even beg the question and use the proof theory to select the set of belief structures that are to be considered rational. Suppose we start with a proof theory, and then select the set of belief structures that sanction (in the sense given by the appropriate soundness and completeness conditions) all and only the inferences that are sanctioned by the proof theory. Then relative to that set of belief structures, the proof theory will be monotonic. In short, there is no way to select a set of rational belief structures and specify a nonmonotonic proof theory that does not violate the canons of rationality. If you want a complete nonmonotonic proof theory, then you must be (at least sometimes) irrational.
5 Third Proof
In spite of our first two proofs, one might still harbor a bit of hope for constructing a theory with some sort of nonmonotonic operator in the syntax of the language, and then using that operator in some way to account for the nonmonotonicity of the consequence relation. It will be useful to consider an analogy with classical probability theory.
We begin with a simple example. Suppose we draw a single card from a standard deck with the usual constitution. Then the probability that the card is black, given the background information, is 1/2. Now, if we are told, in addition to the constitution of the deck, that the card drawn is a spade, then the probability that the card is black will rise to 1; but if instead the additional information is that the card is red, then the probability that the card is black will fall to 0. In other words, we can find specific probability distributions P, sentences A, and sets ', ), and E, such that:
P(A, Γ ∪ Δ) < P(A, Γ) < P(A, Γ ∪ Σ)
Some additional evidence makes our conclusion more likely, while some additional evidence makes our conclusion less likely. Mathematically speaking, conditional probability functions are in general not weakly positively monotonic in the evidence position.
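The card example is easily computed; the deck encoding below is my own sketch, using exact rational arithmetic.

```python
from fractions import Fraction

# A standard deck: spades and clubs are black, hearts and diamonds are red.
DECK = [(suit, "black" if suit in ("spades", "clubs") else "red")
        for suit in ("spades", "clubs", "hearts", "diamonds")
        for rank in range(13)]

def prob(pred, given=lambda card: True):
    """Relative frequency of pred among the cards satisfying the evidence."""
    pool = [card for card in DECK if given(card)]
    return Fraction(sum(pred(card) for card in pool), len(pool))

is_black = lambda card: card[1] == "black"

print(prob(is_black))                                    # 1/2
print(prob(is_black, given=lambda c: c[0] == "spades"))  # 1: evidence raises it
print(prob(is_black, given=lambda c: c[1] == "red"))     # 0: evidence lowers it
```

The same conclusion is pushed to each extreme by different additions of evidence, which is the failure of weak positive monotonicity in the evidence position.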
At this point, it is important to be very clear about our terminology. As we stated above, conditional probability functions are in general not weakly positively monotonic in the evidence position. For some sets Δ, P(A, Γ ∪ Δ) is less than P(A, Γ), while for other sets Σ, P(A, Γ ∪ Σ) is greater than P(A, Γ). Without trying to specify the details, we note that the assertability of a conclusion relative to a set of premises is related to the conditional probability value. For just one example, we consider the extreme cases: the sentence ‘A probably holds, provided that things are as described by Γ’ is a reasonable claim if P(A, Γ) = 1, but it is clearly unreasonable if P(A, Γ) = 0. Hence there will be cases in which the conditional ‘A probably holds, provided that things are as described by Γ’ is reasonable, but the conditional ‘A probably holds, provided that things are as described by Γ ∪ Δ’ is not reasonable. Similarly, there will be cases in which the conditional ‘A probably holds, provided that things are as described by Γ’ is not reasonable, but the conditional ‘A probably holds, provided that things are as described by Γ ∪ Σ’ is reasonable. Using the modern idiom, the conditional of conditional probability theory clearly seems to be nonmonotonic.
For an alternative view, recall our earlier discussion of the characteristic function of a consequence relation. Typically, characteristic functions are two-valued. But by a simple extension of the notion, we could admit graded characteristic functions. For any graded n-adic relation R (n = 1, 2, ...), the graded characteristic function G[R] is defined to be the map which takes n-tuples from the field of R into the range of possible gradations, as follows:
G[R](x1, ... , xn) = d, if and only if R(x1, ..., xn) holds to degree d
One very simple view of probability theory is that it is a graded characteristic function for an entailment relation. That is, probability theory specifies the degree to which a set of premises entails a conclusion. For example, we typically use significance levels as cutoffs for accepting or rejecting conclusions in statistics. Once again, from this point of view, the entailment relation associated with probability theory will be nonmonotonic (in the current idiom), because probability distributions are not in general weakly positively monotonic in the evidence position.
In summary, it seems that the conditional associated with conditional probability theory will be nonmonotonic. Similarly, it seems that if by analogy with the characteristic function for classical entailment, we think of probability theory as a graded characteristic function for an entailment relation, that entailment relation will be nonmonotonic. So it seems appropriate to simply say that probability theory is nonmonotonic, meaning by this phrase only that in general, probability functions are not weakly positively monotonic in the evidence position.
Note that not every probability function P will behave in this nonmonotonic fashion. For example, in the case of classical logic, for any maximally consistent set Γ, we can define a two‑valued distribution as follows, for all sentences A and sets Δ:
PΓ(A, Δ) = 1 iff either A ∈ Γ or not Δ ⊆ Γ
         = 0 iff Δ ⊆ Γ and not A ∈ Γ
The function so defined will satisfy all the constraints for classical probability theory (modulo the proviso for evidence of measure 0). But the distribution PΓ is strictly monotonic, in the sense that for all sentences A and all sets Δ and Σ:
PΓ(A, Δ) ≤ PΓ(A, Δ ∪ Σ)
So we cannot say that every conditional probability distribution fails to be weakly positively monotonic in the evidence position. However, the general theory of probability does admit many nonmonotonic distributions.
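A toy version of the strictly monotonic distribution PΓ can be checked by brute force. In the sketch below (the literal encoding and all names are my own), one fixed set of literals stands in for the maximally consistent set Γ, and weak positive monotonicity in the evidence position is confirmed exhaustively.

```python
from itertools import chain, combinations

LITS = ("p", "~p", "q", "~q")
GAMMA = {"p", "~q"}  # the literals of one toy world, standing in for Gamma

def p_gamma(a, delta):
    """P_Gamma(A, Delta) = 1 iff A in Gamma or Delta not a subset of Gamma;
    otherwise 0."""
    return 1 if (a in GAMMA or not set(delta) <= GAMMA) else 0

def subsets(xs):
    return list(chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1)))

# P_Gamma(A, D) <= P_Gamma(A, D union S), for every conclusion and every
# pair of evidence sets: adding evidence can never lower the value.
assert all(p_gamma(a, d) <= p_gamma(a, set(d) | set(s))
           for a in LITS for d in subsets(LITS) for s in subsets(LITS))
print("P_Gamma is weakly positively monotonic in the evidence position")
```

The monotonicity is immediate from the definition: once the value is 1, either A ∈ Γ (so it stays 1 for any evidence) or the evidence already escapes Γ (and any superset of it does too).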
So we do have general laws about how the conditional of conditional probability works, and we know that it is sometimes nonmonotonic. Some researchers feel that probability theory in some form (numerical, comparative, classificatory) is the most detailed account of rational non‑demonstrative reasoning we have. But even without adopting such a strong view, it is clear that reasoning concerning relative frequencies certainly constitutes at least a cornerstone of rationality. Perhaps we can formulate general laws about a nonmonotonic conditional in our object language that captures the conditionality of some very general version of conditional probability theory. In other words, the suggestion is that we extend our formal object language with some object language syntactic structure ‘s’ such that for all distributions P, all sentences A and B, and all sets Γ:
P(A, Γ ∪ {B}) = P(s(B, A), Γ)
Since the conditional of conditional probability is nonmonotonic, the syntactic structure ‘s(B, A)’ should then represent some sort of nonmonotonic conditional in the object language, with antecedent B and consequent A. With some such object language conditional, and a corresponding proof theory ‘ |s’, we may hope to define a corresponding nonmonotonic consequence relation ‘|nm’ for finite premise sets, perhaps as follows:
Γ |nm A iff |s s(∧Γ, A)
for ∧Γ the conjunction of the members of Γ. (We would have to do something a bit fancier for the case of infinite Γ.)
As initially attractive as the above approach may seem, it is not too difficult to prove that this strategy is doomed to failure; and the proof has a significant impact on the prospects for any nonmonotonic logic. In order that our proof should be as general as possible, we will not limit ourselves to classical probability theory, nor even to numerically valued theories. Instead, our proof will depend only on some very weak constraints of a comparative conditional theory. All of the standard numerically valued probability theories will satisfy our constraints, and hence the theorem will be very broad in its application.
As before, we assume to be given some arbitrary language consisting of a set of well formed formulas. We will use the notation ‘(A, Γ) LE (B, Δ)’ for the claim ‘A, given Γ, is no more likely than is B, given Δ’. Each distribution LE is actually a 4‑place relation:
LE ⊆ (WFF × P(WFF)) × (WFF × P(WFF))
However, as indicated, we will write the relations in infix notation, and treat them as two‑place relations on ordered pairs. We will first list the constraints on the relations LE, and then we will give a brief motivation of each one.
CCP.1 The relation LE is transitive.
CCP.2 (A, Γ ∪ {A}) LE (A, Γ ∪ {A, B})
CCP.3 If (A, Γ ∪ {A}) LE (A, Γ ∪ {B}), then (B, Γ) LE (A, Γ).
Condition CCP.1 must trivially be satisfied by any comparative relation. Next consider CCP.2. In any standard logic, the argument to conclusion A from premises that contain A itself is as strong as any argument can be; in numerical probability, the value will always be 1. Such an argument is deductively valid, and no additional premises can reduce the strength of the conclusion, relative to the premises. Our condition CCP.2 reflects this observation in our comparative theory. Finally, consider CCP.3. The following relationship always holds in numerically valued conditional probability theory.
P(A, Γ) × P(B, Γ ∪ {A}) = P(B, Γ) × P(A, Γ ∪ {B})
An elementary 3‑circle Venn diagram will easily establish that the relationship must hold of any theory in accord with relative frequency. But if P(A, Γ ∪ {B}) takes the value 1, then it immediately follows that P(B, Γ) ≤ P(A, Γ). Condition CCP.3 is just a simple comparative version of this result.
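The symmetric product identity is immediate for frequency-based distributions, since both sides reduce to the relative frequency of A ∧ B among the evidence-worlds; the toy model below is my own illustration.

```python
from fractions import Fraction
from itertools import product

WORLDS = [dict(zip("pqr", v)) for v in product([True, False], repeat=3)]

def P(a, evidence=()):
    """Relative-frequency conditional probability over the toy worlds."""
    pool = [w for w in WORLDS if all(w[e] for e in evidence)]
    return Fraction(sum(w[a] for w in pool), len(pool))

A, B = "p", "q"
for ev in [(), ("r",)]:
    # P(A, G) x P(B, G union {A}) = P(B, G) x P(A, G union {B}):
    # both sides equal the frequency of A-and-B among the evidence-worlds.
    assert P(A, ev) * P(B, ev + (A,)) == P(B, ev) * P(A, ev + (B,))
print("product identity holds on the toy frequency model")
```

With the measure-0 proviso set aside (no denominator vanishes here), the identity holds for every evidence set in the model.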
Let us now consider the possibility of attempting to introduce into the object language a syntactic structure which is meant to capture the conditional of conditional probability theory. In terms of our comparative structures, we require the following two conditions:
CCP.4.a (A, Γ ∪ {B}) LE (s(B, A), Γ)
CCP.4.b (s(B, A), Γ) LE (A, Γ ∪ {B})
But serious difficulties arise if we try to add these conditions. We will state and prove two theorems, and then discuss their significance.
Theorem 3.1: If LE satisfies CCP.1–3 and CCP.4.a, then the syntactic construction s(B, A) is monotonic; that is, the following holds for all expressions A and B and all sets Γ:
(A, Γ) LE (s(B, A), Γ)
Proof:
1. (A, Γ ∪ {A, B}) LE (s(B, A), Γ ∪ {A})                                 CCP.4.a
2. (A, Γ ∪ {A}) LE (A, Γ ∪ {A, B})                                       CCP.2
3. (A, Γ ∪ {A}) LE (s(B, A), Γ ∪ {A})                                    1, 2, CCP.1
4. (s(B, A), Γ ∪ {s(B, A)}) LE (s(B, A), Γ ∪ {s(B, A), A})               CCP.2
5. (A, Γ ∪ {s(B, A), A} ∪ {A}) LE (A, Γ ∪ {s(B, A), A} ∪ {A, s(B, A)})   CCP.2
6. (A, Γ ∪ {s(B, A), A} ∪ {A}) LE (A, Γ ∪ {s(B, A), A} ∪ {s(B, A)})      5, set theory
7. (s(B, A), Γ ∪ {s(B, A), A}) LE (A, Γ ∪ {s(B, A), A})                  6, CCP.3
8. (s(B, A), Γ ∪ {s(B, A)}) LE (A, Γ ∪ {s(B, A), A})                     4, 7, CCP.1
9. (A, Γ ∪ {A} ∪ {A}) LE (A, Γ ∪ {A} ∪ {s(s(B, A), A)})                  CCP.2
10. (s(s(B, A), A), Γ ∪ {A}) LE (A, Γ ∪ {A})                             9, CCP.3
11. (A, Γ ∪ {A, s(B, A)}) LE (s(s(B, A), A), Γ ∪ {A})                    CCP.4.a
12. (A, Γ ∪ {A, s(B, A)}) LE (A, Γ ∪ {A})                                10, 11, CCP.1
13. (s(B, A), Γ ∪ {s(B, A)}) LE (A, Γ ∪ {A})                             12, 8, CCP.1
14. (s(B, A), Γ ∪ {s(B, A)}) LE (s(B, A), Γ ∪ {A})                       3, 13, CCP.1
15. (A, Γ) LE (s(B, A), Γ)                                               14, CCP.3
Theorem 3.2: If LE satisfies CCP.1–3, 4.a, and 4.b, then LE is monotonic; that is, the following holds for all expressions A and B and all sets Γ:
(A, Γ) LE (A, Γ ∪ {B})
Proof:
The proof is immediate from Theorem 3.1, CCP.4.b, and CCP.1.
Suppose we try to introduce a conditional construction into the syntax that allows us to move sentences from the premise set of the probability distribution into the antecedent of a conditional in the object language without decrease of probability value (i.e., in accord with CCP.4.a). Theorem 3.1 immediately tells us that any such conditional must be monotonic. Looked at in another way, Theorem 3.1 tells us that any nonmonotonic conditional in the syntax must violate fundamental principles of probability theory. In other words, no nonmonotonic conditional in the syntax can serve as the syntactic structure of CCP.4.a. It is logically impossible for a nonmonotonic conditional in the syntax of the language to reflect the most elementary principles of relative frequency.
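One concrete way to see Theorem 3.1 at work: interpret s(B, A) as the classical material conditional and read LE as numerical comparison of frequencies. This choice of interpretation is mine, not the paper's. CCP.4.a then holds (in classical probability, P(B → A) ≥ P(A | B) whenever P(B) > 0), and s is indeed monotonic, just as the theorem requires.

```python
from fractions import Fraction
from itertools import product

WORLDS = [dict(zip("pq", v)) for v in product([True, False], repeat=2)]

def atom(name):
    return lambda w: w[name]

def s(b, a):
    """Material conditional B -> A, tried out as the structure of CCP.4.a."""
    return lambda w: (not b(w)) or a(w)

def P(concl, evidence=()):
    pool = [w for w in WORLDS if all(e(w) for e in evidence)]
    return Fraction(sum(concl(w) for w in pool), len(pool))

A, B = atom("p"), atom("q")
# CCP.4.a read numerically: (A, G union {B}) LE (s(B, A), G)
assert P(A, (B,)) <= P(s(B, A))
# ... and, exactly as Theorem 3.1 predicts, s is monotonic:
# (A, G) LE (s(B, A), G)
assert P(A) <= P(s(B, A))
```

The material conditional satisfies CCP.4.a and so is forced to be monotonic; a genuinely nonmonotonic conditional, by the theorem, could not satisfy CCP.4.a at all.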
But the situation is even more serious, as is indicated by Theorem 3.2. Note that the statement of Theorem 3.2 does not directly mention the new syntactic structure. The simple attempt to include a structure in the object language which captures the conditional of the probability theory forces the probability theory to be monotonic for every set of premises and every possible conclusion, whether the new conditional structure is involved or not. But not only is the probability theory forced to be monotonic, the corresponding object language conditional must itself be monotonic as well. So the attempt to capture in the object language the nonmonotonic character of conditional probability theory is doubly doomed: (1) the probability theory immediately ceases to be nonmonotonic, and (2) the new object language conditional cannot be nonmonotonic either.
No matter what the logic is like, no matter what logical particles the language contains, it is logically impossible to incorporate any syntactic structure in the object language which captures the nonmonotonic character of conditional probability theory. As soon as we try to incorporate such a structure into the object language, the probability theory is forced to be monotonic. But if the conditional of the probability theory is forced to be monotonic and the object language conditional captures the conditional of the probability theory, then the object language conditional will be monotonic too. Hence, the nonmonotonic character of conditional probability theory is essentially, necessarily meta‑theoretical, and cannot be reflected in the underlying syntax of the object language.
Thus any alleged nonmonotonic conditional in the object language must be characterized by principles that violate the most elementary considerations of relative frequency. Note that this result applies, no matter how a theory of nonmonotonic conditionals is formulated. We motivated the theorem in this section by beginning with a consideration of the conditional of conditional probability theory. But these results apply to any logic, with any logical particles, no matter what the characteristics of the underlying proof theory, no matter what the characteristics of its underlying semantics, if any.
In short, it does not matter whether or not the theory is formulated in terms of some abstract calculus completely unrelated to probabilistic considerations. No matter what the proof theory is like, no matter what the syntax of the language is like, if the language includes a conditional operator that is nonmonotonic, then that operator logically cannot correspond to the conditional of conditional probability theory.
It is fair enough for any researchers to say that, at least initially, they are not trying to capture probability theory. But the important point that cannot be avoided, regardless of the goals of the research, is that any nonmonotonic conditional logically must be governed by principles that violate elementary considerations of relative frequency. But these elementary principles are basic to most any notion of rationality.
(The proof in this section is based on considerations which appear in Morgan and Mares (1995) and Morgan (1997).)
6 Fourth Proof
From our first three theorems it is obvious that there are very severe problems associated with any attempt to account for the general nonmonotonic character of our reasoning by the development of a new consequence relation in formal logic. Such developments cannot correspond to rational belief structures. And there is no way to incorporate logical particles into the syntax of the language in such a way that those particles are governed by relative frequency considerations. However, we did indicate that it is always trivially possible to develop nonmonotonic algebras, and in spite of their shortcomings, such theories may turn out to have useful applications in some areas. For our last proof, we will consider potential difficulties that might arise when we try to reason with such logics. We will begin with a couple of simple examples of reasoning.
First example: Sherlock, the detective, is trying to figure out who shot JR. Since there was only one bullet, only one person did the deed. Initial evidence (motive, location) indicates the culprit was either Mary or Sue. Subsequent evidence (finding the gun, matching the marks on the bullet in the body to those from bullets fired from the gun, fingerprints on the gun, witnesses) strongly suggests Mary pulled the trigger. Sherlock concludes that Sue did not shoot JR.
Second example: Astronomer Spock makes observations of the motion of bodies in a certain location in space. His observations plus considerations of relativistic mechanics suggest there is a very massive object with a relatively small radius there. Such an object would be a black hole. From the theory of black holes, and from the observations of the matter in the region, Spock predicts that characteristic radiation, released by matter falling into the black hole, will be detected in the region.
Both of our examples are instances of chaining together inferences in a way indicated by the following pattern:
trans: If Γ | A and Γ ∪ {A} | B, then Γ | B.
This characteristic of consequence relations is sometimes called transitivity. We have already alluded to another property of consequence relations, which is sometimes called reflexivity.
reflx: Γ ∪ {A} | A
Both of these properties, along with monotonicity, were formulated by Tarski in his early studies of consequence relations. However, here we would like to examine the possibility of nonmonotonic reflexive and transitive consequence relations.
It turns out that there is a tight connection between our ability to reflect the consequence relation in the syntax of our logic and the principle of monotonicity. In parallel to our discussion in the previous section, let us consider the possibility of some sort of ‘deduction’ theorem. That is, let us suppose that there is some syntactic structure in the object language which reflects the consequence relation; this ‘reflection’ has two parts.
ded.1: If Γ ∪ {B} | A, then Γ | s(B, A).
ded.2: If Γ | s(B, A), then Γ ∪ {B} | A.
Of course in classical logic, the deduction theorem holds with respect to the material conditional, or equivalent syntactic structures. But it turns out that we cannot have a nonmonotonic logic whose consequence relation is reflected in the syntax.
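The classical case can be checked mechanically. The following toy script (my own illustration; the formula encoding and function names are assumptions, not from the paper) brute-forces propositional consequence over three atoms and confirms reflexivity, monotonicity on an instance, and both halves of the deduction theorem with s(B, A) read as the material conditional.

```python
# Toy brute-force propositional consequence checker (illustrative only).
from itertools import product

ATOMS = ["p", "q", "r"]

def assignments():
    """Every truth-value assignment to the atoms."""
    for vals in product([False, True], repeat=len(ATOMS)):
        yield dict(zip(ATOMS, vals))

# Formulas: an atom name, ("not", f), ("and", f, g), or ("imp", f, g).
def ev(f, v):
    if isinstance(f, str):
        return v[f]
    if f[0] == "not":
        return not ev(f[1], v)
    if f[0] == "and":
        return ev(f[1], v) and ev(f[2], v)
    if f[0] == "imp":
        return (not ev(f[1], v)) or ev(f[2], v)
    raise ValueError(f)

def entails(gamma, a):
    """Gamma | A: A holds in every assignment satisfying all of Gamma."""
    return all(ev(a, v) for v in assignments()
               if all(ev(g, v) for g in gamma))

# reflx: Gamma with A among its premises entails A
assert entails(["p", "q"], "q")
# monotonicity on an instance: adding premises preserves the conclusion
assert entails(["p"], "p") and entails(["p", "q", ("not", "r")], "p")
# ded.1 and ded.2 with s(B, A) as the material conditional, checked in batch:
formulas = ["p", "q", ("not", "p"), ("and", "p", "q"), ("imp", "p", "q")]
for b in formulas:
    for a in formulas:
        assert entails(["r", b], a) == entails(["r"], ("imp", b, a))
print("classical deduction theorem confirmed on all test formulas")
```

As the theorems below show, it is precisely this package of properties that cannot survive once the consequence relation is required to be nonmonotonic.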
Theorem 4.1: For any reflexive, transitive consequence relation satisfying ded.1, the syntactic structure ‘s’ is monotonic; that is:
If Γ | A, then Γ | s(B, A).
Proof:
1. Γ | A given
2. Γ ∪ {A, B} | A reflx
3. Γ ∪ {A} | s(B, A) ded.1
4. Γ | s(B, A) 1, 3, trans
Theorem 4.1 demonstrates that it is impossible to reflect any transitive, reflexive consequence relation by any nonmonotonic construction in the syntax of the object language. However, the situation is even more serious, as is indicated by the following result.
Theorem 4.2: Any reflexive, transitive consequence relation that satisfies both ded.1 and ded.2 with respect to any structure ‘s’ is itself monotonic; that is:
If Γ | A, then Γ ∪ {B} | A.
Proof:
The proof is immediate from Theorem 4.1 and ded.2.
Theorem 4.2 tells us that if a deduction theorem holds with respect to any syntactic construct, no matter what the characteristics of that construct, then a transitive, reflexive consequence relation must be monotonic. (This result tells us that it is impossible to obtain a nonmonotonic logic by any simple extension of a classical natural deduction system.) Thus Theorem 4.2 suggests again that nonmonotonicity of the consequence relation is independent of the syntax of the language.
Looked at from another point of view, Theorem 4.2 just says that no transitive, reflexive, nonmonotonic consequence relation can have a deduction theorem with respect to any syntactic structure in the object language. For some, this result may not seem too devastating, at first. After all, the usual formulation of the rule of necessitation for the standard modal logics precludes a proof of the deduction theorem. But, there are some important differences between the standard modal logics and attempts to formulate nonmonotonic logic. In modal logic, the rule of necessitation is validity preserving, not truth preserving. But as we have seen in our first two impossibility proofs, nonmonotonic logics cannot be formulated with validity preserving principles, i.e., with principles that are universalizable across all models. There are ways of formulating standard modal logics and the rule of necessitation so that the deduction theorem holds. But in contrast, there is no way to formulate a nonmonotonic logic so that the deduction theorem holds. So the difficulty presented by Theorem 4.2 is rather deeper than the difficulty associated with the usual formulations of modal logics.
To see how penetrating the problem is, we can reformulate Theorems 4.1 and 4.2 in terms of an axiomatic structure with a special conditional operator ‘→’ and a conjunction operator ‘∧’, both in the object language. We can think of our characterizations of the consequence relation at the meta-level as being characteristics of the object-language conditional instead. Our transitivity and reflexivity conditions now become the following object-language conditions:
otrans: If | x → y and | (y ∧ x) → z, then | x → z.
oreflx: | (x ∧ y) → x
There will be two object‑language principles, parallel to the two components of the deduction theorem, as follows:
oded.1: If | (x ∧ y) → z, then | x → s(y, z).
oded.2: If | x → s(y, z), then | (x ∧ y) → z.
Finally, we assume the conjunction operator is at least partially associative in the antecedent of the conditional:
assoc: If | (w ∧ (x ∧ y)) → z, then | ((w ∧ x) ∧ y) → z.
We can now recast Theorems 4.1 and 4.2, as follows.
Theorem 4.3: Suppose we have an axiomatic system that satisfies otrans, oreflx, oded.1, and assoc. Then for all expressions A, B, and C: if | A → B, then | A → s(C, B).
Proof:
1. | A → B given
2. | (B ∧ (A ∧ C)) → B oreflx
3. | ((B ∧ A) ∧ C) → B 2, assoc
4. | (B ∧ A) → s(C, B) 3, oded.1
5. | A → s(C, B) 1, 4, otrans
Theorem 4.4: Suppose we have an axiomatic system that satisfies otrans, oreflx, oded.1, oded.2, and assoc. Then for all expressions A, B, and C: if | A → B, then | (A ∧ C) → B.
Proof:
The proof is immediate from Theorem 4.3 and oded.2.
Any syntactic structure which serves to reflect a conditional in the consequent of that conditional, in the sense of oded.1, must be monotonic in that reflection. So in particular, if the conditional reflects itself, then in its second-order position it will be monotonic. For example, suppose our conditional ‘→’ is itself the reflecting structure of oded.1, so that the following holds:
oded.3: If | (x ∧ y) → z, then | x → (y → z).
Then Theorem 4.3 tells us that the conditional is monotonic in its second-order occurrences:
If | A → B, then | A → (C → B).
Hence, all of its nonmonotonic behavior will be limited to its first‑order occurrences.
But matters are even worse for seekers after a nonmonotonic logic: any conditional which has a uniform reflection by some syntactic structure in its consequent, in the sense of oded.1 and oded.2, must itself be monotonic. If such a conditional is provable, then so is the corresponding conditional with arbitrary conjuncts added to its antecedent. But then many moves which we make in common sense reasoning will not hold for such a logic. For example, one cannot convert a nonmonotonic conditional with a conjunctive antecedent into a conditional chain and back again.
In summary, there can be no uniform connection between nonmonotonic consequence relations and nonmonotonic conditionals. Out of necessity, researchers in nonmonotonic logics have given up the deduction theorem for various specific pet systems; thus some may be inclined to simply shrug at these results. But these results are not peculiar to some specific system and some specific connective; they are much more general. They tell us that all nonmonotonic conditionals and all nonmonotonic consequence relations are logically incompatible. Paradoxically, trying to formally force a connection between the two guarantees that neither the consequence relation nor the conditional will be nonmonotonic. But then ordinary patterns of common sense reasoning seem to have no counterpart in such logics. So a theory of nonmonotonic conditionals that broadly accords with common sense reasoning is impossible.
7 The Nature of Nonmonotonic Reasoning
Perhaps the first point that needs to be emphasized is that one cannot escape the four impossibility proofs by introducing strange, new modal operators, conditionals, or other connectives into the object language. Except for Theorems 4.3 and 4.4, none of our theorems depend in any way on the logical particles in the object‑language, and those latter two only assumed the presence of a simple conjunction operator. The theorems apply no matter what exotic syntactic structures or inference rules are introduced. I do not mean to suggest that there are no good reasons for introducing new syntactic structures or inference rules into our logical systems. But changes in our underlying logic cannot be motivated by claims that the changes will produce a nonmonotonic proof theory that accords with rational belief. What we have shown is that no matter how exotic, if a proof theory is to accord with rational belief, then it must be monotonic, period.
It seems we are now in a very strange position. We began this paper by giving a rather straightforward, uncomplicated example of common sense reasoning that is clearly nonmonotonic. Such examples can be enumerated without end. But we have just presented four rather distinct proofs that a nonmonotonic logic that corresponds with rational belief is impossible. Are we left to conclude that common sense nonmonotonic reasoning is illogical or irrational? Not at all. I do not want to suggest that nonmonotonic reasoning is irrational. What I do want to suggest is that we can account for nonmonotonic reasoning by using standard monotonic logics; we do not need a ‘weird’ logic to account for the process.
Thus far we have shown that no nonmonotonic consequence relation can correspond to universal principles of either unconditional or conditional rational belief. We have also shown that any nonmonotonic object language conditional is incompatible with elementary principles of relative frequency. And we have shown that given elementary principles of common sense reasoning, nonmonotonic conditionals in the object language are logically incompatible with nonmonotonic consequence relations. But we also assume that common sense nonmonotonic reasoning is rational. Hence, the nonmonotonicity of common sense reasoning cannot be accounted for by any nonmonotonic character of the logic that we use. The nonmonotonicity of commonsense reasoning is due to the way we use logic, and not due to the logic that we use.
Let us consider a simple analogy. The field of astrophysics changes relatively quickly for a science. For example, our views of the large scale structure of the universe have changed drastically, several times, over the last 50 years. However, no one would suggest that we need a nonmonotonic theory of thermodynamics, general relativity theory, or elementary arithmetic just because astrophysicists change their minds about some astronomical phenomenon when they get new evidence. Mathematics and the fundamental theories of physics are just tools that astrophysicists use to try to understand astrophysical phenomena. In a similar way, logic is a tool that we can use to help us understand the human process of reasoning. Logic is not itself a full account of the process of reasoning, any more than general relativity is an account of the process of the evolution of the universe. Each is a tool: logic helps us understand the process of reasoning, just as general relativity helps us understand the evolution of the universe.
Part of what makes the jump from ‘nonmonotonic reasoning’ to ‘nonmonotonic logic’ seem so appealing is a confusion of logical theory with an account of the human process of reasoning. The term ‘logic’ is commonly used in several senses. In one sense, ‘logical’ is used interchangeably with ‘reasonable’. To ask for the logic behind someone’s inferences or actions is to ask for an account that would make us see those inferences or actions as rational; that is, we want a story that would allow us to empathize, to see that we would make the same inferences or perform the same actions in similar circumstances. When we arrive at such an understanding, we feel the inferences or actions are ‘rule governed’, not random or erratic. But it would be a mistake to jump from the observation ‘Her inferences were quite logical’ to the claim that there is some different formal system with weird connectives and/or inference rules that is needed to account for her inferences. Reasoning is a human process that takes place through time. Logic provides, at best, an instantaneous snapshot of a ‘rational’ consequence relation. The difference is as fundamental as the difference in physics between dynamics and statics.
Before giving a couple of examples of how monotonic logic can be used to provide an account of nonmonotonic reasoning, we will sketch an elementary abstract account in terms of model theory and associated proof theory that may serve as a simplified general framework to aid our understanding. Consider some standard deductive consequence relation, which we will symbolize by ‘|D’. Let us also assume we have given some notion of deductive model, so that with each set Γ of sentences we have an associated set DM(Γ) of the deductive models of Γ. The usual soundness and completeness results can be stated as follows:
D-soundness: If Γ |D A, then DM(Γ) ⊆ DM({A}).
D-completeness: If DM(Γ) ⊆ DM({A}), then Γ |D A.
Can we construct a similar account for inductive logic (i.e., so‑called nonmonotonic logic)? Using similar notation, it seems we need an inductive consequence relation and some notion of inductive models, and hopefully parallel soundness and completeness results.
I-soundness: If Γ |I A, then IM(Γ) ⊆ IM({A}).
I-completeness: If IM(Γ) ⊆ IM({A}), then Γ |I A.
But as we have seen above, reasonable accounts of inductive model theory (e.g., in terms of rational belief structures, or in terms of probability theory) satisfy the following condition:
subset: IM(Γ ∪ Δ) ⊆ IM(Δ)
And this combination immediately gives rise to monotonicity, which we are trying to avoid: if Γ |I A, then I-soundness gives IM(Γ) ⊆ IM({A}); by subset (taking the union in the other order), IM(Γ ∪ Δ) ⊆ IM(Γ) ⊆ IM({A}); and I-completeness then yields Γ ∪ Δ |I A. We now turn to a consideration of how this situation may be remedied.
It would seem pretty strange if there were no relationship between the notion of a deductive model and that of an inductive model, particularly when there is a strong relationship between the two consequence relations. After all, it is usual to treat deductive consequence as just a ‘supreme’ case of inductive consequence. In fact, our formal theory would be much simpler if, instead of having both deductive models DM(Γ) and inductive models IM(Γ), we could just settle on one notion of model M(Γ) and use it throughout. But if inductive consequence is not to coincide with deductive consequence, then there must be some set Σ and some sentence C such that Σ |I C, but not Σ |D C. Hence, if we are going to have only one notion of model, it must not be the case that M(Σ) ⊆ M({C}). So the pressure is to abandon inductive soundness.
Having only one notion of model is so appealing that it is not to be abandoned lightly, especially in view of the lack of consensus in the literature concerning the notion of an inductive model. But then we are faced with the problem of providing a semantic account of inductive consequence. There is a very simple suggestion for resolving the problem that even has some basis in the facts of human reasoning. Going back to our original example of Sarah reasoning about the crowds at the mall, every time Sarah recalled some new information, she revised her model of the actions of potential shoppers. Note she did not revise her beliefs concerning the conservation of matter or the color of her car. That is, Sarah’s revisions were not random, nor did she consider all possible revisions consistent with the new information.
Suppose we associate with each set Γ of WFFs some set of ‘presupposed’ models PM(Γ). Of course we assume that the presupposed models of Γ are in fact models of Γ, so PM(Γ) ⊆ M(Γ). Our examples suggest that we define inductive consequence as follows:
Γ |I A iff PM(Γ) ⊆ M({A})
That is, when reasoning inductively about the consequences of Γ, we select some ‘presupposed’ set of models (pictures of the world, universe designs) in which Γ holds; we do not consider all possible models of Γ. If all of these presupposed models of Γ turn out to be models of A, then we say that A is an inductive consequence of Γ. From this characterization it immediately follows that if Γ |D A, then Γ |I A, but not the reverse, just as desired. Further, there is no reason to expect that PM(Γ ∪ Δ) ⊆ PM(Γ), so monotonicity does not follow in general.
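To make this concrete, here is a minimal sketch in Python (entirely my own toy construction: the atoms, the defaults, and the preference ordering are illustrative assumptions, not part of the formal account). Models are sets of true atoms, the model set of a premise set is computed by brute force, and the presupposed models are those models of the premises that best satisfy an ordered list of default expectations.

```python
# Presupposed-models sketch: Gamma inductively entails A iff PM(Gamma) ⊆ M({A}).
from itertools import product

ATOMS = ["bird", "penguin", "flies"]

def models(gamma):
    """M(Gamma): all models (sets of true atoms) satisfying every literal
    in gamma; literals are atoms like 'bird' or negations like '~flies'."""
    out = set()
    for vals in product([False, True], repeat=len(ATOMS)):
        m = frozenset(a for a, t in zip(ATOMS, vals) if t)
        if all((lit[1:] not in m) if lit.startswith("~") else (lit in m)
               for lit in gamma):
            out.add(m)
    return out

# Hypothetical default expectations, most specific first.
DEFAULTS = [
    lambda m: "penguin" not in m or "flies" not in m,  # penguins don't fly
    lambda m: "bird" not in m or "flies" in m,         # birds fly
]

def pm(gamma):
    """PM(Gamma): the models of Gamma that best respect the defaults,
    compared lexicographically (so PM(Gamma) ⊆ M(Gamma) by construction)."""
    ms = models(gamma)
    best = max(tuple(d(m) for d in DEFAULTS) for m in ms)
    return {m for m in ms if tuple(d(m) for d in DEFAULTS) == best}

def ind_entails(gamma, a):
    """Inductive consequence: every presupposed model of gamma models a."""
    return pm(gamma) <= models({a})

print(ind_entails({"bird"}, "flies"))             # True
print(ind_entails({"bird", "penguin"}, "flies"))  # False: defeated
```

Adding the premise that the thing is a penguin shrinks the presupposed set rather than the logic: the only consequence notion used anywhere in the script is ordinary classical satisfaction.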
I do not wish to insist on any more detail about how the presupposed models are determined. It could be that presupposed models differ from one individual to another. It could be that presupposed models are determined by the amount of energy required to alter the current mental model. It could be that the way in which we alter our mental models of the world in response to new information is determined by sociological conditioning, or by genetic factors, or both, or perhaps neither. There are many alternatives. I do not pretend to have the answers as to the origin, selection, or ordering of the presupposed models, nor do I wish to suggest that these questions are unimportant. On the contrary, I would agree that such questions are among the most important for our understanding of inductive (nonmonotonic) reasoning, and I would urge that they deserve a great deal of research effort. But my goal in this part of the paper is rather modest. I only want to show that, with presupposed models, one need only employ standard monotonic logic in order to construct systems which reason in a nonmonotonic manner. In the previous sections, I demonstrated that rational nonmonotonic logic is impossible. In this section and the next, I will show that nonmonotonic logic is not even needed.
As presented here, our account is much simpler than the ‘preferential models’ account of Besnard and Siegel (1988); the two accounts were developed independently but have some commonality. We are strongly sympathetic to their approach as a general framework in which to think about nonmonotonic reasoning, though we differ on specific technical details, and hence the slight shift in terminology. We do not wish to suggest that they would agree with any of the positions advocated here. Our simple account also has obvious affinities with the preferential models account of Kraus et al. (1990); however, we doubt that the technical complexities of their systems can be made to correspond to some independent account of rational belief.
Note that making inferences based on a set of presupposed models amounts to making an assumption, or a bet, that the universe will turn out to be a certain way. One is assuming, or betting, that it is safe to ignore those models of Γ that are not in the presupposed set. Because inductive inferences frequently go beyond what is deductively warranted by the premises, PM(Γ) will in general be a proper subset of M(Γ). Hence, on this bare-bones theory, even if Γ |I A, it will still be the case that Pr(A, Γ) = 0 for some probability distributions Pr. So, as we have seen, inductive consequence is not definable in terms of some universal property of all probability functions.
In terms of proof theory, the picture we are suggesting is extremely simple, and not new. We get around in the world with some mental model of how the world works; formally, uncertainties in our mental picture are equivalent to having a set of more complete models instead of just one model. When I reason inductively, I just draw standard deductive consequences from my mental models. When I get new information, I update or change my mental models, and inductive consequences of the new information are just deductive consequences of the updated models. Since the set of presupposed models does not constitute all the models, the inductive consequences are a superset of the deductive consequences.
Of course there are lots of interesting questions about how models get changed or updated, especially if the new information contradicts something in the current model. But just how models are actually updated by human beings is a matter of empirical fact, not a question of logic. If one wishes to be prescriptive about the way in which models should be changed, then clear criteria for preferring one updating scheme to another must be specified. Claims like ‘changing models this way rather than that is more likely to lead to correct conclusions’ make very strong claims about the way our world actually operates, and amount to empirical assumptions about our world which should be explicitly stated and empirically tested. Remember, it is not our goal here to answer such questions. Our goal is only to show that we do not need any nonmonotonic logic to understand nonmonotonic reasoning. In terms of understanding human reasoning, the effort spent trying to develop some nonmonotonic logic is totally wasted; that effort should be redirected to potentially more fruitful areas, such as how humans update their presupposed models or shift from one to the other.
Our bare‑bones semantic framework has a lot in common with the semantics of spheres proposed by Lewis (1973) and subsequently applied by Delgrande (1988) to nonmonotonic logic. But from the discussion in our impossibility proofs, serious problems arise with Delgrande’s approach when he attempts to use the semantics as a semantics for an object‑language conditional, and then use that conditional to define a nonmonotonic consequence relation. In light of our previous discussion, it is particularly suggestive that Delgrande does not allow his conditional to be nested, and thus it is more like a consequence relation than an object language conditional. Certainly there are nonmonotonic conditionals in English, and their logic is important. But as we have seen, nonmonotonic consequence relations cannot be generally based on object‑language constructs.
On the other hand, our associated bare‑bones proof theoretic account has a lot in common with default logics of the sorts elaborated by Reiter (1980). From our point of view, one can see the default rules as an attempt to code information about the current mental model, as well as information about how the model is to be changed when certain sorts of information are encountered. However, our simple picture does not require some special inference rules, nor some special object language connectives. From the presupposed models point of view, so‑called default rules are highly simplified encodings of empirical assumptions about our world. As such, they are best thought of as part of the corpus of common sense empirical wisdom (folk physics, folk psychology, folk biology, etc.) rather than as part of logic.
The heart of nonmonotonic reasoning is not in some new logic; rather, what is needed is more information about how our models of the world come to be originally formulated and then subsequently changed and updated in response to new information. The logic that we use to draw conclusions is always the same; nonmonotonicity is just the result of changing our background assumptions or models and using the same old logic to draw conclusions.
This simple framework also sheds light on the problems we encountered when trying to use an object-language conditional to reflect either the conditional of conditional probability or any allegedly nonmonotonic consequence relation (our third and fourth proofs, above). Both cases are impossible, as we demonstrated above. Thus nonmonotonicity seems to be essentially meta-theoretical. It is a result of the way in which logic is used; it is not the result of syntactic constructs within the language. The nonmonotonicity of common sense reasoning is a result of the way we use logic; it is not a result of the logic we use.
8 Nonmonotonic Systems
The general framework at which we have arrived is very simple. Common sense reasoning by humans uses (some) standard monotonic logic. The reasoning process involves (for all members of the animal kingdom) the use of mental models of our world. When we get new information, we update or change our models. We use our models and standard logic to draw conclusions. The nonmonotonic aspect of our reasoning is a result of the model changing process, not a result of the logic. I do not suggest that this picture is particularly original or startlingly new. However, I do want to suggest that it is richer and more powerful than has been widely acknowledged. Let us turn to a few examples of systems based on this ‘presupposed models’ account.
Our first example of a nonmonotonic reasoning system is the theory of probability itself. One way of looking at a probability distribution is as a sequence of elementary weighting functions over a σ-field of sets of models, where models are conceived of as maximally consistent sets of WFFs (see Morgan (1991) and Van Fraassen (1981)). For a given evidence set Γ, we find the first weighting scheme in the sequence in which there are models of Γ with non-zero weight. If there is no such scheme in the sequence, then all probability values, given Γ, will be 1. If there is such a weighting scheme, then the probability of each sentence A, given Γ, is just the weighted relative frequency of the models of Γ in which A holds. When we add new information Δ, we just select the first weighting scheme in the sequence in which there are models of Γ ∪ Δ with non-zero weight, and compute probabilities as before. In the simple presupposed models picture, each model gets a weight of 0 or 1, but in this generalized scheme we may allow more than just two values for the weights. Each distinct probability distribution just corresponds to a different weighting scheme over the models. Seen in this light, probability theory is a generalization of the presupposed models analysis of nonmonotonic reasoning: it just is relative frequency applied to that analysis. Probability theory makes one very important assumption to which we would like to draw attention: it assumes that the order in which we arrive at our premises does not matter. More graphically, the probability of A given the background information <B1, B2, B3> is the same as given <B2, B3, B1>, given <B1, B3, B2>, or given any other permutation. In some treatments of weird conditionals, the order of the antecedents is deemed to be important.
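This weighting-scheme picture can be rendered as a toy script (my own simplification: two atoms, hand-picked weights, and a two-scheme sequence, none of it from the paper). The probability of a sentence given the evidence is its weighted relative frequency among the evidence's models, computed under the first scheme that gives those models positive total weight.

```python
# Probability from an ordered sequence of weighting schemes over models.
from itertools import product

ATOMS = ["a", "b"]
MODELS = [frozenset(x for x, t in zip(ATOMS, v) if t)
          for v in product([False, True], repeat=len(ATOMS))]

def sat(m, gamma):
    """Model m satisfies every literal in gamma ('a' or '~a' style)."""
    return all((lit[1:] not in m) if lit.startswith("~") else (lit in m)
               for lit in gamma)

def prob(a, gamma, schemes):
    """Weighted relative frequency of a among the gamma-models, using the
    first scheme that gives those models positive total weight."""
    for w in schemes:
        total = sum(w[m] for m in MODELS if sat(m, gamma))
        if total > 0:
            return sum(w[m] for m in MODELS if sat(m, gamma | {a})) / total
    return 1.0  # no scheme weights any gamma-model: all probabilities are 1

# Scheme w0 gives weight only to models where 'a' is false; w1 is uniform.
w0 = {m: 0.0 for m in MODELS}
w0[frozenset()] = 0.9
w0[frozenset({"b"})] = 0.1
w1 = {m: 0.25 for m in MODELS}

print(prob("b", set(), [w0, w1]))    # computed under w0
print(prob("b", {"a"}, [w0, w1]))    # w0 weights no 'a'-model, so w1 is used
```

Adding the premise 'a' forces the fallback from w0 to w1, and the probability of 'b' changes accordingly; this is the weighting-scheme analogue of shifting to a new set of presupposed models.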
As we have mentioned earlier, probability theory is much too detailed for most applications; we are just not in a position to assign precise numerical measures to all evidence‑conclusion pairs. Further, the abstract theory of probability really does not tell us which distributions we should use in any given circumstance. Our second example is a rather simple scheme which at least partially avoids both of these problems. The approach is based on the pioneering work of Carnap (1962).
Many examples of nonmonotonic reasoning may be thought of as based on simple statistical information derived from experience concerning elementary classificatory properties, combined with the following ‘rule’: if Pr(A, Γ) > .5, then it is reasonable, given Γ, to act as if A is the case. (Of course the reasonableness of this rule is based on the assumption of comparable, modest potential gains and losses. We show by a simple example below the need to consider the magnitude of gains and losses when choosing how to act.) We assume the language contains predicates, individual constants, and the standard classical logical particles, but no quantifiers. Our experience is always finite, so our background information Σ concerns only a finite number of individuals:
I(Σ) = {a1, ..., an}
In addition to Σ, we are given information Γ(b) about some new individual b, and we would like to know whether or not to conclude Ψ(b); both Γ and Ψ may be very complex and may contain information about individuals other than b.
Of course the very first thing to check is whether Σ ∪ Γ(b) | Ψ(b). If Ψ(b) is just a deductive consequence of our background information, then we are done. For computational purposes, we could use any standard theorem checker for the deductive logic in question, perhaps with a resource-limitation cut-off to ensure termination. In a similar way, we may check to see if Σ ∪ Γ(b) | ~Ψ(b), for if so, our problem is solved.
If neither Ψ(b) nor ~Ψ(b) is a deductive consequence (within whatever resource limits we have imposed), we need to consider whether or not either is an inductive consequence. For the purposes of this second problem, we may act as if I(Σ) constitutes the entire universe. We then find the set of objects Γ(I(Σ)) in this universe which, according to the information in Σ, have all of the properties listed in Γ:
Γ(I(Σ)) = {x : x ∈ I(Σ) and Σ | B(x) for every WFF B(x) ∈ Γ(x)}
If Γ(I(Σ)) is empty, we refuse to draw the conclusion on the grounds of inadequate information. If it is not empty, we then find the set of objects Ψ(Γ(I(Σ))) which, according to the information in Σ, have the property Ψ:
Ψ(Γ(I(Σ))) = {x : x ∈ Γ(I(Σ)) and Σ | Ψ(x)}
Then if Ψ(Γ(I(Σ))) contains more than 50% of the objects in Γ(I(Σ)), it would make sense to predict that Ψ(b). On the other hand, if Ψ(Γ(I(Σ))) contains less than 50% of the objects in Γ(I(Σ)), it would make sense to predict that ~Ψ(b).
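The scheme can be run directly on toy data. In the sketch below, all individuals, properties, and helper names are hypothetical and mine; the background information is flattened into a table recording which atomic properties each known individual has, so the deductive tests reduce to table lookups.

```python
# The >50% prediction rule over a finite table of known individuals.

SIGMA = {  # toy background information about the known individuals
    "a1": {"bird", "flies"},
    "a2": {"bird", "flies"},
    "a3": {"bird", "flies"},
    "a4": {"bird", "penguin"},   # a bird that does not fly
    "a5": {"animal"},
}

def predict(gamma, psi, sigma=SIGMA):
    """Predict whether a new individual, known to have the properties in
    gamma, has property psi; returns 'Psi', '~Psi', or 'no prediction'."""
    # the known individuals having every property in gamma
    pool = [x for x, props in sigma.items() if gamma <= props]
    if not pool:
        return "no prediction"           # inadequate information
    # those of them that also have psi
    k = sum(1 for x in pool if psi in sigma[x])
    if 2 * k > len(pool):
        return "Psi"
    if 2 * k < len(pool):
        return "~Psi"
    return "no prediction"               # an exact tie

print(predict({"bird"}, "flies"))             # 3 of the 4 birds fly
print(predict({"bird", "penguin"}, "flies"))  # the one penguin does not
```

Adding the premise that the new individual is a penguin shrinks the reference class from four individuals to one and flips the prediction; that is the nonmonotonicity, with no change to the underlying lookup logic.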
This simple scheme is an elementary formalization of reasoning by analogy. It is similar to circumscription and is applicable to many of the same kinds of problems. But our scheme is easy to implement, is computationally tractable, and corresponds well with our intuitions. It easily handles the standard examples about flying and birds, and similar cases as well. It does not require the introduction of weird logical particles nor the use of dubious ‘default’ rules.
Obviously the scheme is not monotonic. Adding information Θ about new objects not in I(Σ) may change the distributions, since I(Σ ∪ Θ) will not be the same as I(Σ). Further, if we add more information Δ(b) about the individual in question, we may find that Γ(I(Σ)) differs from (Γ ∪ Δ)(I(Σ)). The scheme has obvious limitations, since it deals only with very simple, non-quantified languages. However, we could extend the scheme to quantified languages by requiring the expansion of all quantifiers using the objects in I(Σ). (With quantifiers it becomes possible to make claims about the number of objects without naming them, so in some cases difficulties may result from such a treatment of the quantifiers.) There are many other ways to use probability in a nonmonotonic inferencing system; see Kyburg (1994) for a good alternative.
As a final example of a nonmonotonic system, we will consider a technique which is similar to so‑called default logic. The technique has nothing directly to do with probability theory, although it is actually derived from the method presented in Morgan (1996) for constructing canonical probability distributions. The intuitive picture behind the technique is that part of common sense is the possession of an ordered sequence of hypotheses, each of which is a potential background assumption which may be used to fill in holes in our knowledge. When we are given some set of premises, we augment those premises by trying to consistently add the hypotheses, in order, to the premises. Any hypothesis that cannot be consistently added is just abandoned. Common sense conclusions are then just the deductive conclusions of this expanded premise set.
For a slightly more formal account, we assume we are given some deductive consequence relation ‘|’ and a corresponding notion of consistency. Let S be an arbitrary (preference) ordered set of WFFs, which will be used as potential background assumptions:
S = {H1, H2, ...}
Consider some given premise set Γ. We use S to determine an expanded premise set Γ(S), as follows:
Γ0 = Γ
Γi+1 = Γi ∪ {Hi+1}, if Γi ∪ {Hi+1} is consistent
     = Γi, otherwise
Γ(S) = ∪i Γi
There is no requirement that the set S be consistent. The members of S are considered to be alternatives. Thus even if S is itself inconsistent, it may well be that individual members of S are consistent with the premise set. As long as the premise set Γ is consistent, the expanded premise set Γ(S) will be consistent, no matter what S contains. We use the expanded premise set to determine the (inductive) S-consequences of Γ, as follows:
Γ ⊢S E iff Γ(S) ⊢ E
Clearly even for a fixed sequence S, the S‑consequence relation will be nonmonotonic. If we add information Δ to the set Γ, we may well find that different hypotheses from the sequence S get put into the expanded premise set. In general, we will not have Γ(S) ⊆ (Γ ∪ Δ)(S).
For a simple, but illustrative example, we will use the following symbols for the corresponding English phrases:
Px: x is a penguin
Bx: x is a bird
Ax: x is an animal
Fx: x is capable of flight
t: Tweety
For the example, we will use the following sequence of ordered background assumptions:
S = {(x)((Px ∧ Bx ∧ Ax) ⊃ ¬Fx), (x)((Bx ∧ Ax) ⊃ Fx), (x)(Ax ⊃ ¬Fx)}
Using the definitions above, we have the following expanded premise sets:
{At}(S) = {At} ∪ S
{At, Bt}(S) = {At, Bt} ∪ {(x)((Px ∧ Bx ∧ Ax) ⊃ ¬Fx), (x)((Bx ∧ Ax) ⊃ Fx)}
{At, Bt, Pt}(S) = {At, Bt, Pt} ∪ {(x)((Px ∧ Bx ∧ Ax) ⊃ ¬Fx), (x)(Ax ⊃ ¬Fx)}
When told only that Tweety is an animal, we conclude that Tweety cannot fly, since {At}(S) ⊢ ¬Ft. Similarly, when told that Tweety is an animal but also a bird, we conclude that Tweety can fly, since {At, Bt}(S) ⊢ Ft. And finally, when told that Tweety is an animal and a bird but also a penguin, we conclude that Tweety cannot fly, since {At, Bt, Pt}(S) ⊢ ¬Ft.
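The expansion scheme is straightforward to mechanize once a consistency test is available. The following sketch is a hypothetical illustration, not part of the original text: it encodes only the ground instances of the hypotheses for Tweety, tests consistency by brute-force enumeration of truth assignments to the four atoms, and reproduces the three expanded premise sets above.

```python
from itertools import product

ATOMS = ["P", "B", "A", "F"]  # the ground atoms Pt, Bt, At, Ft for Tweety

# Formulas are represented as functions from a truth assignment (dict) to bool.
def lit(a):        return lambda v: v[a]
def neg(f):        return lambda v: not f(v)
def conj(*fs):     return lambda v: all(f(v) for f in fs)
def implies(p, q): return lambda v: (not p(v)) or q(v)

def consistent(formulas):
    """Brute-force satisfiability test over the four ground atoms."""
    return any(all(f(dict(zip(ATOMS, vals))) for f in formulas)
               for vals in product([False, True], repeat=len(ATOMS)))

def expand(premises, S):
    """Gamma(S): add each hypothesis, in order, iff it keeps the set consistent."""
    current = list(premises)
    for h in S:
        if consistent(current + [h]):
            current.append(h)
    return current

def entails(formulas, goal):
    """formulas entail goal iff formulas plus the negation of goal are unsatisfiable."""
    return not consistent(formulas + [neg(goal)])

# Ground instances of the ordered background assumptions:
S = [implies(conj(lit("P"), lit("B"), lit("A")), neg(lit("F"))),
     implies(conj(lit("B"), lit("A")), lit("F")),
     implies(lit("A"), neg(lit("F")))]

print(entails(expand([lit("A")], S), neg(lit("F"))))                      # True
print(entails(expand([lit("A"), lit("B")], S), lit("F")))                 # True
print(entails(expand([lit("A"), lit("B"), lit("P")], S), neg(lit("F"))))  # True
```

Note that the scheme itself is defined over a full first-order consequence relation; the propositional encoding here works only because a single named individual is involved.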
This scheme makes use of a deductive consequence relation and the related notion of consistency, both of which may be computationally problematic, especially for a first‑order language. But these notions are problematic for humans as well. In any real application, reasonable results could be obtained by relying on resource-limited computational processes. We have said nothing about the origin of the sequence S; but then, default logics generally have nothing to say about the origin of the defaults either. Perhaps some general principles could be formulated, like ‘when comparing two universal conditionals, prefer the one with the logically more complex antecedent over the one with the simpler antecedent’. From our point of view, the important points to note are that (i) we rely on ordinary deductive logic, and (ii) we do not require any weird connectives, special inference rules, or appeals to complex and powerful fixed point theorems.
9 Conclusions
We have given four formal proofs that a rational nonmonotonic logic is impossible. In the first proof, we showed that it is impossible to formulate a nonmonotonic consequence relation that corresponds to any characterization of unconditioned rational belief structures. In the second proof, we showed that it is impossible to formulate a nonmonotonic consequence relation that corresponds to any characterization of conditional rational belief structures. In our third proof, we showed that no matter how complex the formal language, no object‑language construct (e.g., a nonmonotonic conditional) can reflect elementary principles of relative frequency. And in our fourth proof, we showed that given standard accepted patterns of reasoning, there is a fundamental logical incompatibility between any nonmonotonic consequence relation and any object‑language nonmonotonic conditional; any attempt to reflect one by the other forces both to be monotonic.
Careful attention to the proofs suggests that nonmonotonicity is a function of the way we use logic, rather than being a function of the logic we use. We suggested a very elementary semantic account, the presupposed models account, of non‑demonstrative reasoning. Using this account, we were able to explain how nonmonotonic reasoning arises from the application of standard monotonic logic to situations in which additional acquired background information leads us to change our presupposed world model. We then discussed three examples of nonmonotonic reasoning systems based on standard monotonic logic.
We certainly do not want to claim that our three examples of nonmonotonic reasoning systems exhaust the possibilities. Our primary purpose was to demonstrate the variety of systems which can be constructed without resorting to weird logics. The simple presupposed models picture of nonmonotonic reasoning is very general and deserves to be more widely investigated.
A secondary goal was to illustrate how nonmonotonic reasoning systems depend on empirical assumptions. Recall that in earlier sections, we proved that no nonmonotonic proof system could cohere with universal principles of rational belief; every nonmonotonic proof system logically must violate elementary principles of relative frequency. Semantically speaking then, nonmonotonic reasoning must violate soundness with respect to rational belief structures. So specific examples of nonmonotonic reasoning are only rational relative to a proper subset of rational belief structures. It is the selection of that proper subset which constitutes the empirical assumption behind the reasoning. It is impossible to construct a nonmonotonic reasoning system without building in empirical assumptions. It is important that researchers in the field clearly recognize these logical requirements and try to be explicit about the empirical assumptions built into their systems. These logical facts about nonmonotonic reasoning are clearly incorporated in the presupposed model. In terms of formal structures, nonmonotonic reasoning is specific to the presupposed world model being considered. That is, to know what inductive conclusions to draw, one must know the details of the presupposed world model structure.
A third important item to note is that all of our different example reasoning systems are applicable to many of the same sorts of problems (e.g., the question of Tweety’s flight capabilities). As with automated (deductive) theorem proving systems, there is no one ‘correct’ nonmonotonic reasoning system. Each system will have its advantages and its disadvantages; each may do better on certain kinds of problems than others.
Finally, I would like to make a disclaimer. I do not claim to have shown here that there are no general principles of reasoning that would result in changing the conclusions that we draw when we change our background assumptions; I have not shown that there are no nonmonotonic inference patterns. However, I suspect that most such patterns, when fleshed out with all qualifications, are more reasonably viewed as part of a higher order theory of rationality rather than as some sort of inference rules.
As a simple example, consider the following elementary principle of ‘statistical syllogism’:
SG.1.1 P(A, Γ) > P(¬A, Γ)
SG.1.2 Γ
- - - - - - - - - - - - -
SG.1.3 (probably) A
Now, this simple exposition does not begin to do justice to the verbiage that generally accompanies its presentation. At the very least, we must qualify the pattern by some sort of requirement of total evidence, i.e., we must require that Γ contain all the available evidence. Note that the principle of total evidence cannot just be expressed by:
P(A, Γ ∪ Δ) > P(¬A, Γ ∪ Δ), for all Δ.
This latter requirement is met by almost none of the examples to which we want to apply the statistical syllogism pattern, since probability distributions are in general not weakly positively monotonic in the evidence position. For example, given only that Tweety is a bird, it is reasonable to hold that it is more likely that Tweety can fly than that Tweety cannot fly; but the relation is reversed if we are given the fact that Tweety is a bird and a penguin. So the total evidence requirement has to do with what we know, accept, or believe.
The following is a more reasonable characterization of statistical syllogism:
SG.2.1 X believes P(A, Γ) > P(¬A, Γ).
SG.2.2 X believes the members of Γ.
SG.2.3 X believes there is no Δ such that:
(2.3a) X believes the members of Δ, and
(2.3b) X believes P(¬A, Γ ∪ Δ) > P(A, Γ ∪ Δ).
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
SG.2.4 X should believe A.
For ‘belief’ here, I do not have anything fancy in mind, just good old ordinary reasonable belief. Certainly if any of the premises changes in a fundamental way, we would not want to insist on the same conclusion. But now as we have just phrased it, this pattern is part of a higher order theory about rational belief.
But if we are going to use statistical syllogism to update a data base, we might phrase it as follows:
SG.3.1 P(A, Γ) > P(¬A, Γ)
SG.3.2 Γ is the total content of the data base.
- - - - - - - - - - - - - - - - - - - - -
SG.3.3 A should be added to the data base.
Now, if we add additional information Δ to the data base, we have falsified SG.3.2, and thereby undercut the reason for adding A to the data base. Hence, we may have to remove A on the basis of subsequent information. (This example well illustrates the need for the currently common practice of distinguishing between information given ‘externally’ and information which results from ‘internal’ computations based on the external material.) But this inference pattern is no different in principle from observing that: (1) if you give me p ⊃ q and p, then I am entitled to conclude q; but (2) if you subsequently retract p, I can no longer conclude q from p ⊃ q alone. In other words, this rule looks suspiciously like the presupposed models picture of nonmonotonic inference!
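The data-base reading of the pattern can be sketched in a few lines; the predicate names and frequency numbers below are invented for illustration only. The point of the sketch is the external/internal distinction just mentioned: the derived conclusion is recomputed from the total external content of the data base, so adding Δ automatically retracts A rather than leaving a stale entry.

```python
# Hypothetical conditional-probability table (illustrative numbers only):
FREQ = {frozenset({"bird"}): 0.9,               # P(flies | bird)
        frozenset({"bird", "penguin"}): 0.0}    # P(flies | bird & penguin)

def derived(external):
    """Re-derive 'flies' from scratch from the *total* external content
    (SG.3: the conclusion is licensed only relative to the whole data base)."""
    p = FREQ.get(frozenset(external))
    if p is not None and p > 0.5:
        return {"flies"}
    return set()

db = {"bird"}
print(derived(db))   # {'flies'}
db.add("penguin")    # new external information Delta falsifies SG.3.2
print(derived(db))   # set() -- 'flies' is retracted, not patched by hand
```

Because the internal conclusions are a function of the external facts, retraction requires no special nonmonotonic machinery, exactly as in the p ⊃ q example above.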
There remains one aspect of nonmonotonic reasoning which I would like to stress. In many actual cases, the nonmonotonicity of our reasoning is a function (at least in part) of our perceptions concerning the utility of acting as if various conclusions were true. As the perceived utilities change, the conclusions drawn from the same premises may change as well. As a simple example, suppose we are told only that Tweety is a bird, and we are asked whether or not Tweety can fly. Consider the two following ‘pay off’ regimes:
Regime 1: You answer ‘yes’.
A. You are correct. You receive $10,000.
B. You are incorrect. You receive nothing.
You answer ‘no’.
A. You are correct. You receive nothing.
B. You are incorrect. You receive nothing.
Regime 2: You answer ‘yes’.
A. You are correct. You receive nothing.
B. You are incorrect. You must pay $10,000.
You answer ‘no’.
A. You are correct. You receive nothing.
B. You are incorrect. You receive nothing.
Clearly under regime 1 the rational person ought to answer ‘yes’, since there is a potential gain for that answer and no potential gain for answering ‘no’. However, under regime 2, the rational person ought to answer ‘no’ because an answer of ‘yes’ exposes one to a potential loss and no gain, while the ‘no’ answer does not expose one to any loss. So in this example, the utilities completely outweigh any statistical or other considerations.
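The verdict in each regime is ordinary expected-utility maximization. In the following sketch (not from the original text) the probability estimate p is an invented placeholder; only the payoff tables come from the two regimes above.

```python
def best_answer(p_fly, payoff):
    """payoff[answer][correct?] -> dollars; pick the answer with higher expectation."""
    def eu(ans):
        p_correct = p_fly if ans == "yes" else 1 - p_fly
        return p_correct * payoff[ans][True] + (1 - p_correct) * payoff[ans][False]
    return max(["yes", "no"], key=eu)

regime1 = {"yes": {True: 10000, False: 0}, "no": {True: 0, False: 0}}
regime2 = {"yes": {True: 0, False: -10000}, "no": {True: 0, False: 0}}

p = 0.9  # hypothetical estimate that a bird can fly
print(best_answer(p, regime1))  # 'yes'
print(best_answer(p, regime2))  # 'no'
```

Note that the two answers differ while p is held fixed, which is the point: the utilities, not the statistics, drive the switch in conclusion.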
The presupposed models view has been around for some time. See Daniels and Freeman (1980) for the details of a formal analysis of subjunctive conditionals based on this idea, as well as for older references to the literature. Quite independently of our work,
Acknowledgement
Over the years I have benefited greatly from discussions of this material with many people, especially with those who disagree strongly with my views. In particular I would like to thank Romas Aleliunas, Jim Delgrande, David Etherington, Robert Hadley, Henry Kyburg, and Don Nute.
References
Besnard, P., and Siegel, P. (1988), ‘The preferential‑models approach to nonmonotonic logics’, in Smets et al., 1988, pp. 137‑161.
Besnard, P. (1989), An Introduction to Default Logic.
Brewka, G., Dix, J., and Konolige, K. (1997) Nonmonotonic Reasoning: An Overview. Stanford: Center for the Study of Language and Information.
Carnap, R. (1962), Logical Foundations of Probability, 2nd ed. Chicago: University of Chicago Press.
Daniels, C., and Freeman, J. (1980) ‘An analysis of the subjunctive conditional’, Notre Dame Journal of Formal Logic 21, pp.639‑655.
Delgrande, J. (1988), ‘An approach to default reasoning based on a first‑order conditional logic’, Artificial Intelligence 36, pp. 63‑90.
Kraus, S., Lehmann, D., and Magidor, M. (1990), ‘Nonmonotonic reasoning, preferential models and cumulative logics’, Artificial Intelligence 44, pp. 167‑207.
Kyburg, H. (1994), ‘Believing on the basis of the evidence’, Computational Intelligence 10, pp. 3‑20.
Lewis, D. (1973), Counterfactuals.
Morgan, C. and Mares, E. (1991), ‘Conditionals, probability, and non‑triviality’, Journal of Philosophical Logic 24, pp. 455‑467.
Morgan, C. (1991), ‘Logic, probability theory, and artificial intelligence – Part 1: the probabilistic foundations of logic’, Computational Intelligence 7, pp. 94‑109.
Morgan, C. (1994), ‘Evidence, belief, and inference’, Computational Intelligence 10, pp. 79‑84.
Morgan, C. (1996) ‘Canonical models and probabilistic semantics’, presented at the annual meeting of the Society for Exact Philosophy, held at East Tennessee State University; published in Logic, Probability and Science, N. Shanks and R. Gardner, eds., pp. 17-35, in the series Poznan Studies in the Philosophy of Science and the Humanities (2000).
Morgan, C. (1997), ‘Conditionals, comparative probability, and triviality’, presented at the annual meeting of the Society for Exact Philosophy, held at McGill University, published in Topoi, vol. 18 (1999), pp. 97-116.
Morgan, C. (1998), ‘Non‑monotonic logic is impossible’, Canadian Artificial Intelligence 42, pp. 19‑25.
Reiter, R. (1980), ‘A logic for default reasoning’, Artificial Intelligence 13, pp. 81‑132.
Smets, P., et al., eds. (1988), Non‑Standard Logics for Automated Reasoning.
Turner, R. (1984), Logics for Artificial Intelligence.
Van Fraassen, B. (1981) ‘Probabilistic semantics objectified: 1. Postulates and logics’, Journal of Philosophical Logic 10, pp. 371‑394.