Jump to ContentJump to Main Navigation
Thinking as ComputationA First Course$

Hector J. Levesque

Print publication date: 2012

Print ISBN-13: 9780262016995

Published to MIT Press Scholarship Online: August 2013

DOI: 10.7551/mitpress/9780262016995.001.0001

Show Summary Details
Page of

PRINTED FROM MIT PRESS SCHOLARSHIP ONLINE (www.mitpress.universitypressscholarship.com). (c) Copyright The MIT Press, 2018. All Rights Reserved. An individual user may print out a PDF of a single chapter of a monograph in MITSO for personal use. Subscriber: null; date: 23 October 2019

Case Study: Other Ways of Thinking

Case Study: Other Ways of Thinking

Chapter:
(p.238) (p.239) * 11 Case Study: Other Ways of Thinking
Source:
Thinking as Computation
Author(s):

Levesque Hector J.

Publisher:
The MIT Press
DOI:10.7551/mitpress/9780262016995.003.0011

Abstract and Keywords

This chapter considers other ways of thinking, that is to say, other ways of using what is known. Section 11.1 reexamines back-chaining as the starting point for discussing the other ways of thinking. Sections 11.2, 11.3, and 11.4 discuss explanation, learning, and propositional reasoning, respectively. There is something of a paradox in using Prolog and back-chaining to look at new ways of thinking that go beyond back-chaining. And as the paradox is resolved, thinking itself will end up becoming the subject matter of the thinking. This reinterpretation will require a major conceptual shift, and this chapter is the most technically challenging one in the book.

Keywords:   Prolog programming, back-chaining, thinking, learning, explanation, propositional reasoning

The discussion in previous chapters demonstrated how thinking supports intelligent behavior in a variety of domains, including puzzles, visual scenes, English sentences, actions, and games. Thinking uses what is known, and across the domains, the knowledge varied, but the thinking procedure was always the same: back-chaining. Even with complex extensions, like situations and fluents or minimax, the thinking procedure remained the same. This chapter, however, considers other ways of thinking, that is to say, other ways of using what is known.

These new modes of thinking can be illustrated with a simple example. Consider the following four sentences:

  1. 1. If X is a polar bear, then X is white.

  2. 2. Thornton is a polar bear.

  3. 3. Thornton is white.

  4. 4. If X is a swan, then X Is whIte.

From (1) and (2), the sort of thinking considered so far would allow (3) to be concluded, which is a logical entailment of (1) and (2). This sort of reasoning is often called deduction: from (1) and (2), one can deduce (3).

But suppose only (1) and (4) are known. Nothing at all is known about the individual, Thornton. If he were actually observed to be white, one might conjecture that he is a polar bear, since it is known that polar bears are white. This sort of thinking is not deductive; it does not come with the guarantee of logical entailment. Thornton might be a swan or something completely different. Nonetheless, the conjecture that Thornton is a polar bear accounts for the color, as does the conjecture that he is a swan. This type of nondeductive thinking was called abduction by the philosopher C. S. Peirce: from (1) and (3), one can abduce (2). This is the sort of reasoning that underlies explanation: according is what is known, Thornton being a polar bear would explain his being white.

Now assume that at first only (2) and (3) are known. Nothing at all is known about the color of polar bears or of swans. But then imagine (5) and (6) are observed:

  • (p.240) 5. Shako is a polar bear, and Shako is white.

  • 6. Peppy is a polar bear, and Peppy is white.

One might want to generalize from these and surmise that all polar bears are white. Again, this sort of thinking is not deductive. The facts about the individuals do not logically entail that all polar bears are white; the very next polar bear could turn out to be a different color. C. S. Peirce called this type of nondeductive thinking induction: from (2) and (3) (and others like it), one can induce (1). This is the sort of reasoning that underlies a form of learning: from observing the color of polar bears Thornton, Shako, Peppy, and perhaps others, one learns that as a rule, polar bears are white.

So, to summarize:

  • from (1) and (2), one can deduce (3);

  • from (1) and (3), one can abduce (2);

  • from (2) and (3), one can induce (1).

Finally, suppose (7) is added:

  • 7. Freddy is a either a polar bear or a swan.

Based on what is known, one can conclude (that is, deduce) that Freddy is white. This is logically entailed by (1), (4), and (7). If (7) is true, then there are only two possibilities, and in either case, given (1) and (4), the color must be white. However, this conclusion could not be drawn using the current thinking procedure, back-chaining. There is no way to even represent a sentence like (7), since it is neither an atomic sentence nor a conditional. This chapter also explores what is involved with this sort of thinking, called propositional reasoning. (It is sometimes also known as Boolean reasoning.) Back-chaining is a special case.

The chapter is divided into four sections. The first section reexamines back-chaining as the starting point for discussing the other ways of thinking. Sections 2, 3, and 4 discuss explanation, learning, and propositional reasoning, respectively.

There is something of a paradox in using Prolog and back-chaining to look at new ways of thinking that go beyond back-chaining. And as the paradox is resolved, thinking itself will end up becoming the subject matter of the thinking. This reinterpretation will require a major conceptual shift, and this chapter is the most technically challenging one in the book. (p.241)

Case Study: Other Ways of Thinking

Figure 11.1. A recap of back-chaining

11.1 Back-chaining as subject matter

The back-chaining procedure that Prolog uses is summarized in figure 11.1. (This simplified version omits negation, equality, and any mention of variables or unification in the atoms.) In talking about back-chaining, it is assumed that any knowledge to be applied in thinking is represented symbolically in the knowledge base (KB). The KB can be about a world of family relationships, or a visual scene, or a game; the clauses in the KB will talk about these items and the relationships among them. An example is the clause in the family program in figure 3.1 that relates the father predicate to the child and male predicates.

But suppose one wants to think about back-chaining itself rather than about fathers and families. The clauses in the KB would have to talk not about people but about knowledge bases, clauses and queries. Instead of a predicate like father(x, y) that holds if person x is a father of person y, there might be a predicate like est(k, q) (short for establish) that holds when the query q can be established by back-chaining from the knowledge base k. The arguments to the predicate est would be Prolog terms standing for a knowledge base and a query. In sum, clauses, knowledge bases, and queries would be represented as Prolog terms so that they can then be used as arguments to predicates like est.

How is this done? To simplify matters, assume for now that predicates have no arguments. Then a clause can be represented as a Prolog term using a list of constants: the head of the list will represent the head of the clause, and the tail will represent the body of the clause.

(p.242) Consider, for example, the Prolog clause

u :- p, b.

This can be represented by the Prolog term [u,p,b]. Similarly, a knowledge base can be represented as a list whose elements represent the clauses. Consider, for example, the following simple Prolog knowledge base:

a.

b.

u :- p, b.

p :- a.

This knowledge base can be represented using the following Prolog term:

[ [a], [b], [u,p,b], [p,a] ].

Of course, this list is not a knowledge base itself, but it can be used to represent one as a Prolog term. Each element of this list is a nonempty list representing a clause. (Atomic clauses are represented as lists with a single element.) Similarly, a query is represented as a list of atoms.

With these list representations, there are now sentences in Prolog that talk about the properties of knowledge bases and queries, just as previous sentences talked about properties of people. In particular, the predicate est(k, q) should hold if the query represented by q can be established by back-chaining from the knowledge base represented by k. In other words, the following behavior is desired:

?- est([[a],[b],[u,p,b],[p,a]], [p,b]).

Yes

?- est([[a],[b],[u,p,b],[p,a]], [c]).

No

The first query should succeed because the knowledge base represented by the first argument (that is, the four clauses just listed) logically entails both p and b, the query represented by the second argument. The second query should fail because that same knowledge base does not logically entail c.

How should this est predicate be defined? Keeping in mind that one is working not with knowledge bases and queries but with symbolic representations of them as lists, one can encode what was said about back-chaining in figure 11.1:

The predicate est(k, q) holds if one of the following holds:

  • 1. q is the empty list (where k can be anything). (p.243)

    Case Study: Other Ways of Thinking

    Figure 11.2. Querying the est predicate

    Case Study: Other Ways of Thinking

    Figure 11.3. Tracing the est predicate

  • 2. q has head a and tail t, and there is a list in k whose head is also a and whose tail is b, and est(k, q′) holds (recursively), where q′ is the list formed by joining b and t.

This leads to the following two clauses for the est predicate:

% est(K,Q): query Q can be established from knowledge base K

est(_,[]).

est(K,[A|T]) :- member([A|B],K), append(B,T,Q), est(K,Q).

In the first clause, est holds trivially when the query is empty. In the second clause, the query is nonempty; it has head A and tail T; and one must find (using member) an element of the knowledge base K whose head is also A and whose tail (the body of the clause) is B. To establish the original query, one must be able to establish (recursively) a new query Q that is the result of joining B and T (using append). Note that these two clauses for est form a very concise description of back-chaining.

Some additional examples of using this predicate appear in figure 11.2. It is not hard to see that this program works just the way Prolog does, that is, est(k, q) returns (p.244) success when the query represented by q would return success in Prolog for the knowledge base represented by k. The partial trace in figure 11.3 shows that est does its work in the same order as Prolog’s back-chaining would.

But so what? What is the point of all this? It shows that Prolog is well-suited for describing, among other things, how Prolog works. Once one can talk about thinking the way that Prolog does, one can also talk about thinking in other ways. So Prolog can be used to express other thinking procedures that work quite differently from Prolog itself. Understanding what those two simple clauses for the predicate est do and why is crucial to understanding the rest of this chapter.

11.1.1 A breadth-first thinking procedure

Let us examine a thinking procedure that differs from Prolog in a very minor way.

Perhaps the simplest part of back-chaining to modify is the order in which atoms are considered in a query. Consider a partial trace of a query that eventually fails:

Call: (18) est([[a],[b],[u,p,b],[p,a]], [u,c])

Call: (11) est([[a],[b],[u,p,b],[p,a]], [p,b,c])

Call: (12) est([[a],[b],[u,p,b],[p,a]], [a,b,c])

Call: (13) est([[a],[b],[u,p,b],[p,a]], [b,c])

Call: (14) est([[a],[b],[u,p,b],[p,a]], [c])

Fail: (14) est([[a],[b],[u,p,b],[p,a]], [c])

Note that before est can get to the atom c, it has to first deal with u. To deal with u, it has to go deeper and deal with p and b. To deal with p, it has to go deeper still and deal with a. Once it is done with a, it can go on to b. Once it is done with b, it finally gets to c, and then the query fails.

This sort of procedure is called a depth-first procedure, since the program handles new atomic queries that arise from the bodies of clauses before returning to the existing atoms in the query. In a breadth-first procedure, the program would consider existing atoms in a query before looking at new atoms that come from the bodies of clauses. It turns out that a breadth-first variant of est is very easy to express in Prolog:

% estbf(K,Q): query Q can be established from knowledge base K.

% The method is a breadth-first variant of back-chaining.

estbf(_,[]).

estbf(K,[A|T]) :- member([A|B],K), append(T,B,Q), estbf(K,Q).

The only difference between est and estbf is the order of arguments to append. A partial trace of the same query shows that the predicate estbf does not consider atomic queries in the same order as est does:

(p.245) Call: (6) estbf([[a],[b],[u,p,b],[p,a]], [u,c])

Call: (7) estbf([[a],[b],[u,p,b],[p,a]], [c,p,b])

Fail: (7) estbf([[a],[b],[u,p,b],[p,a]], [c,p,b])

Fail: (6) estbf([[a],[b],[u,p,b],[p,a]], [u,c])

The procedure estbf detects the failure before even getting to the atom p.

So estbf is a thinking procedure that does not work like Prolog. There is no contradiction in saying this. Prolog uses a depth-first procedure, and its atomic queries (like member and append and even estbf) will always be handled in this way. The estbf procedure is breadth-first, however, and its atomic queries (like u and c) are handled in that way. So Prolog is reasoning in a depth-first way about a thinking procedure that reasons in a breadth-first way. This way of handling thinking procedures that are different from the back-chaining of Prolog is at the root of everything else discussed in this chapter.

Note that the breadth-first variant of back-chaining avoids some of the difficulties encountered with a depth-first procedure:

?- est([[a],[p,p]], [p,b]).     % Gets stuck in a loop.

ERROR: Out of global stack

?- estbf([[a],[p,p]], [p,b]).     % Does not get stuck.

No

Although the estbf predicate terminates here without getting stuck, it is not guaranteed to always terminate:

?- estbf([[a],[p,p],[p,a]], [p]).

ERROR: Out of global stack

It is possible to revise back-chaining to ensure termination. But it is easier to move to a very different sort of procedure that is guaranteed to terminate.

* 11.1.2 A forward-chaining thinking procedure

Let us now turn to a more substantial change in how to deal with a knowledge base. Instead of working backward from a query the way that Prolog does, one can work forward from the knowledge base. This is what was termed forward-chaining in section 2.5.

The idea with forward-chaining is to start by focusing on the clauses whose bodies are empty, that is, the atomic sentences of the knowledge base. Call an atom that is the head of such a clause solved (meaning “known to be true”). To establish a query by forward-chaining, proceed as follows: (p.246)

Case Study: Other Ways of Thinking

Figure 11.4. A forward-chaining procedure        estfc.pl

  1. 1. If all the query atoms are solved, the procedure returns success.

  2. 2. Otherwise, look for a clause in the knowledge base whose head is not solved but whose body consists of atoms that are all solved.

    1. a. If none is found, the procedure returns failure.

    2. b. If one is found, extend the knowledge base to include the head as a new atomic clause in the KB, and go back to step 1.

A Prolog program that realizes this procedure is shown in figure 11.4.

Whenever the predicates est and estbf return an answer, the predicate estfc will return the same answer. Here is a trace of a query:

Call: (6) estfc([[a],[b],[u,p,b],[p,a]], [u])

Call: (7) estfc([[p],[a],[b],[u,p,b],[p,a]], [u])

Call: (8) estfc([[u],[p],[a],[b],[u,p,b],[p,a]], [u])

Exit: (8) estfc([[u],[p],[a],[b],[u,p,b],[p,a]], [u])

Exit: (7) estfc([[p],[a],[b],[u,p,b],[p,a]], [u])

Exit: (6) estfc([[a],[b],[u,p,b],[p,a]], [u])

Note that in the recursive calls to estfc, the KB first grows to include p (since the atom in its body, a, is already solved), and then u (since the atoms in its body, p and b, are then solved). At this point, the resulting new KB contains atomic clauses for all the atoms in the query, and so the query is established.

Unlike the predicates est and estbf, the forward-chaining estfc procedure will never get stuck in a loop. To see this, note that each time estfc is used recursively, it is on a new KB that includes an additional solved atom. Once all the atoms mentioned in the KB have been considered, nkb must then fail, and the procedure terminates. (p.247)

Case Study: Other Ways of Thinking

Figure 11.5. Making copies of terms

11.1.3 Back-chaining with variables, negation, and equality

Now consider a final version of est that will correctly handle variables, negation, and equality. As before, a knowledge base is represented by a list of the representations of clauses, where a clause is represented by a nonempty list of the representations of the literals. Recall from section 9.2.3 that an atom can be used as a Prolog term; this means that an atom can be used to represent itself. Equalities are represented using a predicate eq; and negations, using a predicate not. For example, the clause

q(a,X) :- p(Y,Z), \+ r(Y), Z=Y, s(X,Y).

is represented as a Prolog term by the following list:

[q(a,X), p(Y,Z), not(r(Y)), eq(Z,Y), s(X,Y)]

So knowledge bases with variables can be represented, as in the following:

?- est([[p(a)],[p(b)],[q(X),p(X)]], [q(a)]).

X = a

Yes

The predicate est appears to work with variables. But consider this query:

?- est([[p(a)],[p(b)],[q(X),p(X)]], [q(a),q(b)]).

No

The trouble here is that once q(X) unifies with q(a), the variable X is instantiated, and so q(X) will not unify with q(b).

A Prolog predicate that copies terms

The solution is a special built-in Prolog predicate copy_term(x, y), which holds if x and y are terms that are identical except that the variables appearing in y are all new, unrelated to those in x. Some examples of its use are shown in figure 11.5. As can be (p.248)

Case Study: Other Ways of Thinking

Figure 11.6. The final version of back-chaining        est.pl

seen, copy_term behaves just like Prolog = except in the presence of (uninstantiated) variables.

A final version of the est predicate that handles all the new elements (equality, negation, variables) appears in figure 11.6. Here is how it works:

  • The first clause is as before;

  • The second clause handles a query whose first literal is an equality by unifying the two arguments and then continuing with the rest of the query.

  • The third clause handles a query whose first literal is a negation by ensuring that the unnegated literal fails and then continuing with the rest of the query.

  • The fourth clause handles a query whose first literal is an atom. It is as before except that member_copy is used instead of member.

The member_copy predicate here behaves just like member when the first argument happens to be a list with just one element (representing a clause that has no body). Otherwise, member_copy uses copy_term. This has the effect of finding a matching clause in the knowledge base but leaving its variables uninstantiated for later use. (Assume here that variables only appear in clauses with bodies.)

The following queries illustrate the behavior of the new est predicate:

?- est([[p(a)],[p(b)],[q(X),p(X)]], [q(a),q(b)]).

Yes

?- est([[p(a)],[p(b)],[q(X),p(X)]], [q(Y),not(eq(Y,a))]).

Y = b

Yes

(p.249) This version of est now handles almost all of Prolog. (It does not yet work with numbers or built-in operations like write and assert. Also, it does not do the right thing when the same variable appears in both the query and knowledge base.) With a bit more work, any Prolog program and any query can be represented as terms and used as arguments to a predicate like est. So this program is a language processor for Prolog: a program that can take a symbolic representation of a Prolog program and execute it. What takes some getting used to, perhaps, is the fact that this processor for Prolog is written in the Prolog language itself.

11.2 Explanation

Explanation involves finding an atomic sentence such that if it were true, it would account for a given sentence’s being true. For example, assuming it is known that all polar bears are white, one would like to be able to explain Thornton’s being white by abducing that he is a polar bear. As it turns out, the final version of the est predicate from figure 11.6 makes this possible by using a variable A to stand for the new atomic sentence to be abduced:

?- est([[A],[white(X),polar_bear(X)]],[white(thornton)]).

A = white(thornton)     ;

A = polar_bear(thornton)  ;

No

The query asks for an atomic sentence A such that if A is added to the knowledge base, the desired conclusion then follows. There are two such atoms. The first one, white(thornton) is technically correct but only in a trivial way, so one might not want to consider it as a possible explanation.

11.2.1 Diagnosis

Imagine that a knowledge base has the following facts:

A person will have sore elbows if she has a tennis elbow or sore joints in general. Arthritis will also lead to sore joints. A person will have sore hips if she has sore joints in general or a hip fracture. Only a tennis player will get tennis elbow.

A representation of this knowledge base is shown in figure 11.7. (The predicate background indicates that this list represents background knowledge.) One can then look for an explanation for a person Sue’s having a sore elbow: (p.250)

Case Study: Other Ways of Thinking

Figure 11.7. A knowledge base about sore joints        joints.pl

?- background(K), est([[A]|K],[sore_elbow(sue)]).

A = sore_elbow(sue)    ;

A = tennis_elbow(sue) ;

A = sore_joints(sue)   ;

A = arthritis(sue)    ;

No

Thus there are four possible atoms A that could be added as clauses to the background knowledge K to get the desired conclusion. The first atom, sore_elbow, is the trivial explanation as before. The third atom, sore_joints, may or may not be useful as an explanation. The other two, tennis_elbow and arthritis, are likely the desired explanations for sore_elbow.

Often only explanations drawn from some prespecified assumption set or hypothesis set are of interest. A typical case of explanation is (medical) diagnosis. In diagnosis, some of the atoms are considered to be possible symptoms, and some of the atoms are considered to be possible diseases. Observations that are among the symptoms are given to a diagnostician, and an explanation is sought among the diseases. In general, what one is willing to assume depends on the problem, and that is what the assumable predicate in figure 11.7 is for:

?- background(K), assumable(A),

est([[A]|K], [sore_elbow(sue)]).

A = tennis_elbow(sue)   ;

A = arthritis(sue)    ;

No

Note that the explanations that are generated depend on what needs to be explained. For example, suppose Sue also has sore hips: (p.251)

Case Study: Other Ways of Thinking

Figure 11.8. A general explanation program        explain.pl

?- background(K), assumable(A),

est([[A]|K], [sore_elbow(sue),sore_hips(sue)]).

A = arthritis(sue)    ;

No

In this case, only one explanation does the job; tennis elbow is no longer sufficient. In other cases, there may be no single atom that gives the desired explanation, and one may need to look for multiple atoms to assume.

11.2.2 A general explanation program

The general explanation program shown in figure 11.8 attempts to find one or more atoms to assume. The predicate explain takes two arguments: a query Q that is to be explained and an explanation E that it will generate as a list of atoms to assume. The merge predicate takes the list of atoms and adds them as atomic clauses to the knowledge base K, checking along the way that they are assumable and that there are no repeats in the list.

The explain predicate can be seen in action by looking at another diagnostic example, this time involving car trouble (see the knowledge base in figure 11.9). Some queries for this example appear in figure 11.10.

  • The first query, without additional assumptions, cannot get an explanation of the car’s not starting.

  • The second query shows that a dead battery and being out of gas are both single-atom explanations for the car’s not starting.

  • The third query shows that being out of gas is the only single-atom explanation for the car’s not starting when the radio is working. (p.252)

    Case Study: Other Ways of Thinking

    Figure 11.9. A knowledge base about car trouble        cars.pl

    Case Study: Other Ways of Thinking

    Figure 11.10. Explanations for car trouble

  • The fourth query shows that no single-atom suffices to explain all three facts: the car not starting, the radio not working, the wipers working.

  • (p.253) The fifth query shows that there is a double-atom explanation for these three facts. The explanation is that the car is out of gas and the radio speaker is broken (in either order).

  • The sixth query looks for an arbitrary explanation under the same conditions. It finds the double explanation, but then continues to search indefinitely until all memory is exhausted.

The last query shows that it is possible to use explain without specifying how many atomic sentences are desired, but some care is needed.

11.3 Learning

As noted at the start of this chapter, (inductive) learning is a form of thinking that is similar to explanation except that instead of seeking an atomic sentence to account for a query’s being true, it looks for a general rule, that is, a conditional sentence with variables to explain some observations. If it is known that Thornton and his friends are polar bears, and they are all observed to be white, it is reasonable to induce that all polar bears are white. Then on hearing of a new polar bear, one may wish to conclude that she also is white.

The general format for this type of thinking is similar to the explain predicate. Instead of

explain(Q,E) :- background(K), merge(E,K,K1), est(K1,Q).

which generates an explanation E (a list of atoms) that accounts for a given query Q, a predicate induce is defined as follows:

induce(Q,R) :- background(K), rule(R), est([R|K],Q).

The predicate rule generates a conditional sentence R with the property that if it is added to the background knowledge K, then Q can be established. For example, if the background knowledge contains polar_bear(thornton) and polar_bear(shako), the desired behavior is the following:

?- induce([white(thornton),white(shako)],R).

R = [white(_G17), polar_bear(_G17)]

This induced rule (with the variable _G17 in it) says that polar bears are white. The next section discusses how to generate rules like this.

(p.254) 11.3.1 Inducing general rules

In the simplest case, a rule that explains some observations should have the following characteristics:

  • The head of the rule should be a predicate in the relevant domain (like polar_bear or white) with variables as arguments.

  • The body of the rule should be a sequence of literals. Each literal should be a possibly negated predicate in the domain. Moreover, the predicate should be different from the one in the head (to avoid recursion), and the arguments of the predicate should be among the variables in the head.

So for a rule whose head is r(X,Y,Z), the body can contain the literals not(p(Y)) and q(X,Z,X) but it should not contain r(X,Z,X), which would make the rule recursive, nor q(X,a,W), which uses arguments a and W that do not appear in the head.

The simplest way to satisfy these requirements is to assume that each problem domain specifies which predicates and which variables can be used. To do this, a special predicate called predicate can be used. It plays a role similar to that of the assumable predicate:

predicate(polar_bear(X),polar_bear,[X]).

predicate(white(X),white,[X]).

predicate(bigger_than(X,Y),bigger_than,[X,Y]).

The first argument to predicate is an atom that can be used in the head of a rule; the second argument is the predicate part of the atom as a constant; and the third argument is a list of the variables used in the head.

A definition for an induce predicate is presented in figure 11.11. The rule predicate uses the predicate hb (for “head-body”) to generate an atom for the head and a list of literals for the body. To deal with the body, in the second clause of hb, each literal L (an atom or its negation) is checked to ensure that its predicate Q is different from the predicate in the head P, and its arguments U are among the arguments in the head V (using the auxiliary predicate subl). In addition, a negative literal is generated only if the rest of the body B is empty or begins with a negative literal.

11.3.2 An example of induction

To show the induce predicate in action, a new knowledge base about animals is provided in figure 11.12. There are a number of individuals that are animals of different types, and a number of predicates to learn about. (p.255)

Case Study: Other Ways of Thinking

Figure 11.11. The induce predicate in Prolog

Case Study: Other Ways of Thinking

Figure 11.12. A knowledge base about animals        animals.pl

What rule can be learned from observing that Fido barks:

?- induce([barks(fido)], R).

R = [barks(_G404), animal(_G404)]    % Every animal barks?

(p.256) The answer is somewhat disappointing: the program induces that every animal barks. While this rule is certainly sufficient to explain Fido’s barking, something more specific is desired. To find a more specific rule, the induce predicate must be given some negative examples:

?- induce([barks(fido),barks(rover),not(barks(kitty)),

not(barks(donald))], R).

R = [barks(_G425), dog(_G425)]    % Every dog barks.

This is a much better rule. Similarly, what rule can be learned from observing that Fido and Kitty are four-legged but Daffy (the duck) is not:

?- induce([four_legged(fido),four_legged(kitty),

not(four_legged(daffy))], R).

R = [four_legged(_G411), not(duck(_G411))]

% Everything that is not a duck has four legs?

This induced rule may or may not be acceptable given what is known. At the very least, it should be restricted to animals only, so induce needs to be given information about a nonanimal individual, say, a tree:

?- induce([four_legged(fido),four_legged(kitty),

not(four_legged(daffy)),

not(four_legged(tree17))], R).

R = [four_legged(_G428), animal(_G428), not(duck(_G428))]

% Every animal that is not a duck has four legs.

This induced rule has a body with two literals: the individual must be an animal and must not be a duck.

* 11.3.3 Classification: Training and testing

What is the point of inducing new rules? In general, one wants to observe the world, notice the properties of some individuals, induce some general rules about them, and finally, use the general rule to predict the properties of new individuals. So, for example, after observing that Fido and Rover both bark (but that Kitty and Daffy do not), the induced rule should be able to predict that Spot the dog will also bark.

This type of learning is often called classification: some training examples are given, some of which have a property (like barking) and some of which do not; then new test examples are classified into those that do and those that do not have the property.

The induce predicate can be used to find out if new individuals have a property. In fact, the new atoms learned from the training examples can be generated:

(p.257) % Given a query Q as training data (as before), find a new

% atom A that can be established from an induced rule.

learned_atom(Q,A) :-

background(K), induce(Q,R), est([R|K],[A]),

\+ est(K,[A]), \+ member(A,Q).

The learned_atom predicate uses induce to generate a rule as before, then looks for an atom A that is not an element of the training data Q, and that can be established given that rule, but not without the rule. The result is the following:

?- learned_atom([four_legged(fido),four_legged(kitty),

not(four_legged(daffy)),

not(four_legged(tree17))], A).

A = four_legged(spot)   ;

A = four_legged(rover)  ;

A = four_legged(kelly)

In this case, enough is learned from the training examples to generate new examples: Spot, Rover, and Kelly are four-legged.

Unfortunately, this simple version of induce looks for ever longer rules to explain the observations it is given and never actually fails. So if learned_atom is asked to classify Donald or Huey as four-legged given the same training examples, instead of failing as it should, it would simply run forever.

There are ways of dealing with this issue. The easiest perhaps is to rewrite induce so that it eventually gives up and fails. For example, the length of the desired rule could be limited. But a better approach is to commit to the first induced rule found by the program. In other words, if a rule R has been found that accounts for the training data Q, the testing should be done with R, with no backtracking to look for other rules. Prolog provides a built-in predicate called once that does this: if induce(Q,R) in the body of learned_atom is replaced by once(induce(Q,R)), the program commits to the first rule found and learned_atom will then fail as it should, for instance, on classifying the test examples Donald and Huey as four-legged.

In performing such classification, the bulk of the work involves finding a rule that works well. When there are a number of predicates to choose from and a number of variables that are needed, it can be very time-consuming to find a rule that does the job properly. This makes it especially important to have training examples that include near-misses, individuals that do not have the desired property but are very similar to those that do. With near-misses, it is much easier to locate the negative literals that need to be included in the body of the induced rule.

(p.258) 11.4 Propositional reasoning

So far, it has been assumed that a knowledge base is made up only of atomic or conditional sentences. This allowed the use of a thinking procedure like back-chaining to draw all the necessary conclusions, and even to do explanation and learning.

But, of course, there are sentences of English that do not fit this pattern. Consider the following three sentences:

At least one of Alice, Bob, or Carol is guilty.

If Alice is guilty, then so is Bob.

If Alice is not guilty, then neither is Carol.

From these sentences alone, there is no way to tell whether Alice and Carol are guilty. However, it is not hard to see that Bob must be guilty. The reasoning is as follows. If Alice is guilty, then Bob is also guilty. But if Alice is not guilty, then neither is Carol, and since one of the three must be guilty, it must be Bob. So either way, Bob must be guilty. In other words, the three sentences logically entail that Bob is guilty.

To be able to use knowledge in this sophisticated way, it is necessary to go beyond atomic and conditional sentences to represent what is known, and to go beyond back-chaining to calculate logical entailments. So it is useful to consider a new language for representing knowledge, a propositional language, which allows us the expression of negations and disjunctions in addition to the previous conjunctions and conditionals.

11.4.1 Conjunctive normal form

Perhaps the simplest representation of a propositional language in Prolog is to use the conjunctive normal form, or CNF. For present purposes, a CNF formula is a list of disjunctive clauses, or dclauses for short, where a dclause is a list of literals, and a literal is either an atom or its negation. A dclause is interpreted as the disjunction of its elements, and the entire CNF formula is interpreted as the conjunction of its elements. So the dclause [p,q] is interpreted as saying that p or q is true, and the CNF formula [[p,q],[not(r)]] is interpreted as saying that p or q is true and r is false.

Note that dclauses are interpreted differently from clauses. Previously, a list like [p,q,r,s] represented the conditional “If q and r and s then p.” That same list as a dclause represents the disjunction “p or q or r or s.”

And yet there is a definite connection between the two interpretations. Consider a conditional like this:

If q and r and s then p.

(p.259) That conditional says the same thing as the following disjunction:

not(q) or not(r) or not(s) or p.

To see why, ask what it takes for the conditional to be false. This happens when q, r, and s are true, but p is false. Now what does it take for the disjunction to be false? This happens when all the disjuncts are false, that is, when q, r, and s are true, but p is false. So the conditional and the disjunction are equivalent: they are either both false or both true. This means that the conditional “If q and r and s then p” can also be represented by a dclause: [p,not(q),not(r),not(s)]. (The order of the literals in a dclause does not matter.)

In the example with Alice, Bob, and Carol, let the atoms a, b, and c stand for the guilt of Alice, Bob, and Carol, respectively. Then the three given facts can be represented as a CNF formula:

[ [a,b,c], [not(a),b], [a,not(c)] ]

This gives the following:

  • The first dclause says that Alice is guilty or Bob is guilty or Carol is guilty (which is just another way of saying that at least one of them is guilty).

  • The second dclause says that Alice is not guilty or Bob is guilty (which is an equivalent way of saying that if Alice is guilty, then so is Bob).

  • The third dclause says that Alice is guilty or Carol is not guilty (which is an equivalent way of saying that if Alice is not guilty, then neither is Carol).

So every conditional can be represented as a dclause (containing precisely one unnegated literal), and once it is determined how to calculate logical entailments for CNF formulas in general, that procedure will work for conditionals too.

Although every clause can be rewritten as a dclause, the converse is not true. For example, there is no way to represent [not(p)] as a clause. Although there was a form of negation in queries and in the bodies of clauses, there is no way to have a simple negative fact (like Alice is not guilty) in a Prolog knowledge base.

11.4.2 Satisfiability

How does logical entailment work for CNF formulas? So far, a collection of sentences S1, S2, …, Sn logically entails a sentence S if S has to be true when the Si are all true. Another way of saying this is that the sentences {S1, S2, …, Sn, not(S)} cannot all be true. A set of sentences is said to be satisfiable if they can all be true, and unsatisfiable otherwise. So to compute if a collection of sentences Si entails another S, it is sufficient (p.260) to compute whether a certain set of sentences (formed by adding the negation of S to the other Si) is satisfiable or not.

But how does one test for satisfiability? For a set of dclauses without variables, there is a very simple account of satisfiability:

A set of dclauses (without variables) is satisfiable if and only if there is a way to pick a literal from each dclause without picking both an atom and its negation.

This derives from the fact that a disjunction will be true if one of the disjuncts is true (that is, the literal chosen for that dclause), and a conjunction will be true if all of the conjuncts are true (that is, all the dclauses are true). So, for example, the CNF formula

[ [p], [not(p),not(q)], [not(q)] ]

is satisfiable: pick p from the first dclause, not(q) from the second, and not(q) (already picked) from the third. This means that the dclauses can all be true: it happens when p is true and q is false.

However, the CNF formula

[ [p], [not(p),not(q)], [q] ]

is unsatisfiable: pick p from the first dclause, which forces the choice of not(q) from the second (to avoid having an atom and its negation), and then there is nothing to pick from the third. (In other words, the first two dclauses logically entail not(q).) Similarly, the CNF formula

[ [p,q], [not(p),q], [not(q),p], [not(p),not(q)] ]

is unsatisfiable: no matter what is chosen for the first three dclauses, both p and q are picked, with nothing left to choose from the last one. (In other words, the first three dclauses logically entail both p and q.)

11.4.3 Computing satisfiability

A predicate for testing whether a list of dclauses (without variables) is satisfiable is presented in figure 11.13. It uses the predicate satpick to go though the dclauses, picking a literal from each one using the predicate pickone. For each dclause, there are three possibilities (and so three clauses for the pickone predicate):

  • No new picking is needed if the head of the dclause has already been picked.

  • The head of the dclause can be picked if its negation has not been picked.

  • A literal from the tail of the dclause can also be picked. (p.261)

    Case Study: Other Ways of Thinking

    Figure 11.13. A satisfiability program        sat.pl

The predicate member_neg is used to check if the negation of a literal has already been picked. It does this by either adding or removing a not on the given literal.

It is not hard to confirm that this predicate does the right thing:

?- sat([[p],[not(p),not(q)],[not(q)]],L).

L = [not(q), p]

Yes

?- sat([[p],[not(p),not(q)],[q]],L).

No

?- sat([[p,q],[not(p),q],[not(q),p],[not(p),not(q)]],_).

No

11.4.4 Logical entailment reconsidered

The notion of logical entailment can now be enlarged to deal with knowledge bases that are lists of dclauses. Continue to assume that there are no variables in the dclauses, and for simplicity, that the queries are as before, a list of atoms or their negations, all of which are to be established. It is then easy to do logical entailment: negate the literals in the query and let the sat predicate do the heavy lifting, as shown in figure 11.14.

With this procedure, one can go beyond what was possible with back-chaining. For example, the puzzle with Alice, Bob, and Carol can now be solved:

?- estsat([[a,b,c],[not(a),b],[a,not(c)]], [a]).

No (p.262)

Case Study: Other Ways of Thinking

Figure 11.14. Entailment using satisfiability        estsat.pl

?- estsat([[a,b,c],[not(a),b],[a,not(c)]], [b]).

Yes

?- estsat([[a,b,c],[not(a),b],[a,not(c)]], [c]).

No

The given facts entail that Bob is guilty, but they do not entail the guilt (or innocence) of Alice or Carol.

The back-chaining examples shown in figure 11.2 can be repeated with the conditionals translated into dclauses:

?- estsat([[a],[b],[u,not(p),not(b)],[p,not(a)]], [b,a]).

Yes

?- estsat([[a],[b],[u,not(p),not(b)],[p,not(a)]], [u]).

Yes

?- estsat([[a],[b],[u,not(p),not(b)],[p,not(a)]], [p,d]).

No

Of course, the way these queries are answered (using the sat predicate) is very different from the back-chaining method used until now.

* 11.4.5 First-order reasoning

When a propositional language is enlarged to include variables, it is usually called a first-order language. Reasoning correctly with a first-order language is much more complicated than what has been done so far. For satisfiability, it is no longer sufficient to pick a literal from every dclause; a literal must be picked from every dclause for every value of the variables in that dclause.

For example, the formula [[not(p(a))],[not(q(b))],[p(X),q(X)]] is considered to be satisfiable, since q(X) can be chosen when X is a, and p(X) otherwise. But the formula [[not(p(a))],[not(q(a))],[p(X),q(X)]] is unsatisfiable, since (p.263)

Case Study: Other Ways of Thinking

Figure 11.15. An unsatisfiability program        unsat.pl

there is no literal to choose when X is a. (Specifically, the CNF formula without variables, [[not(p(a))],[not(q(a))],[p(a),q(a)]], is unsatisfiable according to what was done before.) One complication here is that more than one value per variable may have to be considered to handle cases like the following: the formula [[not(p(a)),not(p(b))],[p(X)],[p(Y)]] is unsatisfiable (when X is a and Y is b), and so [[not(p(a)),not(p(b))],[p(X)]] is unsatisfiable too.

This suggests that for first-order reasoning it is easier to work with unsatisfiability than with satisfiability:

A set of dclauses (with variables) is unsatisfiable if and only if there are (possibly multiple) values for the variables such that the resulting set of dclauses without variables is not satisfiable.

A program that realizes this definition appears in figure 11.15. To deal with the multiple values for variables, it begins by making copies of all the dclauses using the built-in (p.264) copy_term predicate. (It uses a parameter N to decide how many copies to make.) It then uses the predicate nopick to confirm that there is no way to pick a literal from each dclause in the resulting list. Below the dashed line in the figure, the estunsat predicate is like the estsat predicate in figure 11.14 except that it uses the success of unsat rather than the failure of sat to do all the hard work:

?- estunsat(1, [[p(a)],[not(p(X)),q(X)]], [q(a)])

Yes

?- estunsat(1, [[p(a)],[not(p(X)),q(X)]], [q(b)])

No

Note that for some queries it is necessary to use a parameter N 〉 1:

?- unsat(1, [[not(p(a)),not(p(b))],[p(X)]]).   % Wrong.

No

?- unsat(2, [[not(p(a)),not(p(b))],[p(X)]]).   % Right.

Yes

Unfortunately, there is no way to know in advance how large a value of N will be needed to obtain the correct answer.

With a first-order language, more advanced reasoning problems can be handled. Consider the following puzzle (a generalized version was discussed in section 4.4):

There are three blocks, A, B, C, in a stack, with the top one (A) green

and the bottom one (C) not green.

Is there a green block directly on top of a non-green one?

It is clear that the answer must be yes. If B is green, then it is a green one directly on a non-green one (C); if, on the other hand, B is not green, then the top block (A) is a green one directly on a non-green one (B). Either way, the conclusion follows.

What makes this puzzle challenging is that we know there has to be a green block directly on top of a non-green one, and yet we cannot say which one it is. With back-chaining, whenever it was known that something had a property, enough information was always given to figure out what that something was. First-order reasoning using estunsat is more general and does not have this limitation:

?- estunsat(2,

[ [on(a,b)], [on(b,c)],    % The facts

[green(a)], [not(green(c))] ],   % More facts

[ on(X,Y), green(X), not(green(Y))] ).  % The query

Yes

(p.265) Observe that the only thing that prevents representing all the given facts as a set of clauses in Prolog (and using normal back-chaining) is the fact about the bottom block not being green. But to handle this negative fact properly and to fully use what is known requires thinking in this new advanced way.

Want to read more?

This chapter explored ways of thinking that went beyond simple back-chaining. Remarkably enough, these new ways of thinking were all formulated in terms of back-chaining.

The idea of using a programming language to write a language processor for that very language (a meta-interpreter) is a practice developed to a fine art in the LISP community [53], which uses this idea in a number of ways, including adding new primitive constructs to the language. In this chapter it was used to explore variants of the Prolog processor and to handle abduction and induction by making temporary additions to the underlying knowledge base.

The section on explanation took a first glance at an area of research called abductive logic programming (for example, see [54]). The section on learning introduced the area of inductive logic programming (for example, see [56]). These two continue to be active areas of research, although techniques more closely allied with probability theory have gradually come to dominate the study of both diagnosis and learning.

The section on propositional reasoning is perhaps the most open-ended part of the book. Its generalization of clauses to dclauses allowed representing a much wider range of knowledge, in particular, incomplete knowledge, when certain facts must be considered even though their truth is unknown (like the guilt of Alice). When variables and a few other features are included, the resulting first-order logic is the typical starting point for logical reasoning within AI, for example in [4].

Computing satisfiability (for CNF formulas without variables) has become an active area of AI research. It turns out that many of the problems considered in this book (including constraint satisfaction and planning) can be reformulated directly in terms of satisfiability. This is significant because there have been great advances in recent programs that compute satisfiability [55].

Computing unsatisfiability (for CNF formulas with variables) is also an active area of research called automated theorem-proving [57]. The emphasis is on taking (small) parts of mathematics, formalizing them as a set of axioms in a first-order logic, and (p.266) then asking if they entail some mathematical conjecture of interest. In 1996 an automated theorem-proving program written by William McCune and colleagues was able to prove Robbins’ Conjecture, a statement in algebra whose status had remained open for sixty years.

All the reasoning in this book comes out as a special case of first-order reasoning. This means (in theory) leaving Prolog behind and continuing the work using a first-order logic instead. Prolog is a conceptual ladder of great value, but now one can kick the ladder away and explore along different paths.