Preface: these notes are primarily based on Chapter 3 of Ertel's text.
In propositional logic, every fact must be stated as its own (independent) proposition, and so there is no way to reason at a higher level about "similar" facts, or to write more general statements of implication.
Application: music recommendations
Knowledge base:
A = Michael likes Coldplay
B = Michael likes Maroon 5
C = Michael does not like alternative music
D = Susan likes Maroon 5
E = Susan likes alternative music
F = Susan does not like Miley Cyrus
G = Maroon 5 plays Pop Rock
If we wish to reason about such knowledge, we cannot easily do so in propositional logic, because there is no clear similarity between facts such as A and B, or B and D.
Assume that there are two robots A and B in a 100×100 grid. We want to express that B is further to the right than A.
We might define a separate propositional literal for every single possible position of each robot, for example defining proposition A0,0 to be true when A is currently at (0,0).
Unfortunately, we would then need a huge number of formulas to reason about the environment. For example, to express that each robot occupies at most one cell, and that the two robots never occupy the same cell, we need formulas such as:
¬(A0,0 ∧ A0,1)
¬(A0,0 ∧ A0,2)
¬(A0,0 ∧ A0,3)
...
¬(A1,0 ∧ A0,1)
...
¬(A0,0 ∧ B0,0)
¬(A0,1 ∧ B0,1)
¬(A1,0 ∧ B1,0)
Worse yet, to express the original fact that B is further to the right than A, we would need a huge formula, starting as:
(A0,0 ∧ B0,1) ∨ (A0,0 ∧ B0,2) ∨ (A0,0 ∧ B1,1) ∨ ... ∨ (A3,3 ∧ B0,5) ∨ ...
with, eventually, on the order of 10,000 × 10,000 = 100,000,000 pairs of robot positions to consider, since each robot has 10,000 possible locations.
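As a quick sanity check on these numbers, here is a rough count in Python. This is only a sketch; the literal names such as A0,0 follow the hypothetical encoding above.

```python
# Rough count of the propositional encoding for two robots on an n x n grid.
n = 100
cells = n * n                                  # 10,000 possible positions per robot

# "Each robot is in at most one cell": one clause per pair of distinct cells.
at_most_one = cells * (cells - 1) // 2
print(at_most_one)                             # 49,995,000 clauses per robot

# "B is further right than A": one disjunct per (A-position, B-position) pair
# whose columns satisfy colB > colA; the rows of A and B are unconstrained.
col_pairs = sum(1 for ca in range(n) for cb in range(n) if cb > ca)   # 4950
print(col_pairs * n * n)                       # 49,500,000 disjuncts
```

So roughly half of the 100,000,000 position pairs appear in the disjunction, and the "at most one cell" constraints are of a similar magnitude.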
In predicate logic, we consider the world to be a (possibly infinite) collection of objects (known as the domain of discourse).
We then express knowledge in the form of predicates which can be applied to any number of objects, such as likes(Michael, Coldplay) or plays(Maroon5, PopRock).
In first-order predicate logic, we allow the use of quantifiers to express knowledge about one or more objects in the world. Specifically, the symbol ∀ represents the universal quantifier, applying to every object, and the symbol ∃ represents the existential quantifier, meaning that a statement applies to at least one object from the domain.
Examples
Before we get bogged down in formal syntax and semantics, let's look
at some examples of statements in PL1.
| Formula | Description |
|---|---|
| ∀x likes(x, cake) | Everyone likes cake |
| ¬∀x likes(x, cake) | Not everyone likes cake (but some might) |
| ∀x ¬likes(x, cake) | No one likes cake |
| ¬∃x likes(x, cake) | No one likes cake (equivalent to previous formula) |
| ∀x frog(x) ⇒ green(x) | All frogs are green |
| ∃x ∀y likes(x, y) | There is someone who likes everything |
| ∃y ∀x likes(x, y) | There is something that everyone likes |
| ∀x ∃y likes(x, y) | Everyone likes at least one thing |
| ∀y ∃x likes(x, y) | Everything is liked by at least one person |
- A set K of constants (e.g., cake).
- A set V of variables (e.g., x, y). In a formula, a variable can either be bound by a quantifier or free.
While predicate formulas evaluate to true or false, it is convenient to also consider a set F of functions, which evaluate to an object from the domain. For example, we might want a function mother(x) that evaluates to the object representing the mother of object x. Functions are syntactic sugar for some other formal predicate that happens to describe a functional relationship (meaning that there is a single well-defined y for any f(x)).
For example, the formula ∀x likes(x, mother(x)) could instead be written without the function as the more complex ∀x ∃y (likes(x, y) ∧ mother_of(y, x)). The function syntax simplifies the expression of the statement.
We will also allow infix notation for familiar functions and predicates, for example writing x + y rather than plus(x, y), or x < y rather than less_than(x, y).
We now define a term recursively, as either:

- a constant from K,
- a variable from V, or
- a function application f(t1, ..., tn), where f ∈ F and each ti is itself a term.
We now define a PL1 formula recursively, as follows:

- If p is a predicate and t1, ..., tn are terms, then p(t1, ..., tn) is an (atomic) formula.
- If φ and ψ are formulas, then ¬φ, (φ ∧ ψ), (φ ∨ ψ), (φ ⇒ ψ), and (φ ⇔ ψ) are formulas.
- If φ is a formula and x is a variable, then ∀x φ and ∃x φ are formulas.
We refer to expressions such as ∀x and ∃x as quantifiers; an occurrence of variable x within the scope of such a quantifier is bound, while any other occurrence is free.
A formula in which there are no free variables is known as a first-order sentence or closed formula.
Definitions of CNF and Horn clauses extend to PL1.
An interpretation is a mapping of all variables to objects from the set W of objects in the world, together with a mapping from the set of all constants, functions, and predicates to actual objects, functions, and relations in the world.
A formula is true under an interpretation according to the following rules:

- If the formula is atomic (i.e., a single predicate), it is true so long as the corresponding relation holds for the objects of the world assigned in that interpretation.
- If the formula has no quantifiers, then the rules of propositional logic extend to determine its truth.
- Formula ∀x F is true under an interpretation if it would remain true for an arbitrary change to the assignment of variable x.
- Formula ∃x F is true under an interpretation if there is at least one assignment for variable x that makes it true.
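To make these rules concrete, here is a toy evaluator over a small finite domain. The domain, the relation assigned to likes, and all names below are invented purely for illustration.

```python
# A toy check of the quantifier semantics over a small finite domain.
# The domain and the relation assigned to likes/2 are invented examples.
domain = {'michael', 'susan', 'cake'}
likes = {('michael', 'cake'), ('susan', 'cake')}

def forall(pred):
    # True if the subformula holds under every assignment to the variable.
    return all(pred(obj) for obj in domain)

def exists(pred):
    # True if at least one assignment makes the subformula true.
    return any(pred(obj) for obj in domain)

# ∃y ∀x likes(x, y): something that everyone likes. False here, because the
# object cake would need to like cake as well.
print(exists(lambda y: forall(lambda x: (x, y) in likes)))   # False

# ∃x ∀y likes(x, y): someone who likes everything. Also false here.
print(exists(lambda x: forall(lambda y: (x, y) in likes)))   # False
```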
Because we will often want to talk about making a replacement of every free occurrence of a variable x in a formula φ with some term t, we introduce the corresponding notation:
φ[x/t]
For simplicity, we will assume that all quantifiers in a given formula use a different variable (renaming variables in subformula to avoid unintended name collisions).
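As a sketch of how the substitution φ[x/t] might be implemented over terms, assume terms are represented as nested Python tuples. This representation is not from the text, only an illustration.

```python
# Substitution term[x/t]: replace every occurrence of variable `var` by `repl`.
# A term is a string (variable or constant), or a tuple ('f', t1, ..., tn)
# representing the function application f(t1, ..., tn).
def substitute(term, var, repl):
    if term == var:                              # the variable being replaced
        return repl
    if isinstance(term, tuple):                  # recurse into function arguments
        name, *args = term
        return (name, *(substitute(a, var, repl) for a in args))
    return term                                  # a constant or a different variable

# mother(x)[x/kermit] yields mother(kermit)
print(substitute(('mother', 'x'), 'x', 'kermit'))   # ('mother', 'kermit')
```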
If we view ∀x as a (possibly infinite) conjunction over all objects in the domain, and ∃x as the corresponding disjunction, then an extension of de Morgan's laws states that
∀x φ ≡ ¬∃x ¬φ
∃x φ ≡ ¬∀x ¬φ
A predicate logic formula φ is in prenex normal form if it holds that:

φ = Q1x1 Q2x2 ... Qnxn ψ

where each Qi ∈ {∀, ∃} is a quantifier for i = 1, ..., n, and where ψ is quantifierless.
Theorem: Every predicate logic formula can be transformed into an equivalent formula in prenex normal form (and of course, the quantifierless portion can itself be converted to CNF).
The algorithm is rather direct, so long as you use renaming to ensure that each original quantifier uses a distinct variable.
Example: (from book)
... (spoiler alert) ...
∃x ∃y p(x) ⇒ q(y)
Example: (Wikipedia)
... (spoiler alert) ...
If we only care about testing satisfiability of a formula, we can convert it to an equisatisfiable (but not necessarily equivalent) formula that does not use any existential quantifiers. This process is known as Skolemization.
This is accomplished by replacing each existentially quantified variable by either a constant or by a newly created function that depends on any universally quantified variables with surrounding scope.
As a very simple example, if we have a formula ∃x φ with no enclosing universal quantifiers, we can replace every occurrence of x in φ with a newly created (Skolem) constant and drop the quantifier.
As a more typical example, if we have an expression that starts:

∀x ∀y ∃z φ

the choice of z might depend upon the surrounding choices of x and y. We therefore replace every use of variable z with a new function f(x, y), and drop the ∃z quantifier.
The advantage of Skolemization is that it provides a more restrictive normal form. In fact, once we know that all remaining variables are universally quantified, we can even drop the ∀ quantifiers from the syntax and treat every variable in the formula as a free (implicitly universally quantified) variable.
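A minimal sketch of the Skolemization bookkeeping, assuming the formula is already in prenex normal form and its quantifier prefix is given as a list. The representation and the sk0 naming scheme are invented for illustration.

```python
from itertools import count

counter = count()

def skolemize(prefix):
    """Map each existentially quantified variable to a Skolem term.

    prefix: list of (quantifier, variable) pairs, outermost quantifier first.
    An exists-variable becomes a fresh function of all forall-variables to its
    left, or a fresh constant if there are none.
    """
    universals, subst = [], {}
    for quantifier, var in prefix:
        if quantifier == 'forall':
            universals.append(var)
        else:
            name = f'sk{next(counter)}'          # fresh Skolem symbol
            subst[var] = (name, *universals) if universals else name
    return subst

# ∀x ∀y ∃z φ : z is replaced by sk0(x, y)
print(skolemize([('forall', 'x'), ('forall', 'y'), ('exists', 'z')]))
# {'z': ('sk0', 'x', 'y')}
```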
Another Example: (Russell's barber paradox)
"There is a barber who shaves precisely those who do not shave themselves"
∃y ∀x shaves(y,x) ⇔ ¬shaves(x,x)
This is already in prenex normal form. Converting the quantifier-free portion to CNF gives:

∃y ∀x (shaves(y,x) ∨ shaves(x,x)) ∧ (¬shaves(y,x) ∨ ¬shaves(x,x))

If we pick barber as the Skolem constant for y, we get:

(shaves(barber,x) ∨ shaves(x,x)) ∧ (¬shaves(barber,x) ∨ ¬shaves(x,x))
Gödel's Completeness Theorem (1929)
There exists a sound and complete proof calculus for PL1. That is, a derivation system that is sound, and in which every valid formula has a finite derivation.
Church/Turing Undecidability Theorem (1936, 1937)
PL1 is undecidable, assuming there is at least one predicate of arity
at least 2 (other than equality).
Taken together: there is a process that could enumerate the countably infinite set of possible derivations, so that any valid formula will eventually be proven. However, there is no finite way to conclude that an arbitrary formula is not valid (there might always be a proof out there that simply hasn't yet been generated by the enumeration).
When working with a first-order knowledge base, it is common to make what is known as the closed-world assumption; that is, even though the universe may be infinite, we assume that objects exist only if the KB knows about them. In this case, the domain appears to be finite, and we could consider converting all PL1 formulas into propositional logic and using our existing mechanisms. For example, if the only known objects are kermit and piggy, then ∀x frog(x) ⇒ green(x) grounds out to the two propositional implications frog_kermit ⇒ green_kermit and frog_piggy ⇒ green_piggy.
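A small sketch of this grounding idea, using an invented two-object domain:

```python
# Grounding a universally quantified formula over a finite, closed domain:
# each quantifier expands into one propositional formula per object.
from itertools import product

domain = ['kermit', 'piggy']                     # invented closed-world domain

def ground(template, variables):
    """Instantiate a formula template for every assignment of variables to objects."""
    for values in product(domain, repeat=len(variables)):
        yield template.format(**dict(zip(variables, values)))

# ∀x frog(x) ⇒ green(x) becomes two propositional implications.
for formula in ground('frog({x}) => green({x})', ['x']):
    print(formula)
# frog(kermit) => green(kermit)
# frog(piggy) => green(piggy)
```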
So we consider PL1 proof calculi. As with propositional logic, the resolution calculus can be shown to be sound and complete for PL1 in CNF (with one additional rule added to the calculus).
What is new about performing resolution in PL1 (versus propositional logic) is that we need to allow for replacing universally quantified variables with terms, using a process known as substitution. Specifically, we allow the following derivation step:
∀x φ(x) ⊢ φ[x/t]
To perform resolution steps, we need to be able to match a pair of complementary literals that might each be expressed using a variety of variables and function applications. As a simple example, assume that we have knowledge base:
∀x frog(x) ⇒ green(x)
frog(kermit)

Converted to CNF, and assuming free variables are universally quantified, we would rewrite this KB as:

¬frog(x) ∨ green(x)
frog(kermit)
We would certainly hope to conclude that kermit is green, but the complementary literals do not quite match, as we have frog(kermit) and ¬frog(x). But since x is a universally quantified variable, we can apply the substitution [x/kermit], after which we can resolve:
frog(kermit), ¬frog(kermit) ∨ green(kermit)
⊢ green(kermit)
Unification
If performing forward chaining, we wish to derive the most general
form of new knowledge. This is done by working with the substitution
known as the most general unifier (MGU). Two literals are
unifiable if there is a substitution σ for all
variables which makes the literals equal. Such a substitution is a
unifier. A unifier is an MGU if all other unifiers can be
obtained from it by further substitution.
Example: consider literals

p(f(g(x)), y, z)
p(u, u, f(u))

The MGU for these literals is [y/f(g(x)), z/f(f(g(x))), u/f(g(x))].
There are a variety of algorithms for computing the MGU.
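One classic choice is Robinson's algorithm. Below is a compact sketch; the tuple representation of terms, the single-letter convention for variables, and the omission of the occurs check are toy assumptions for illustration.

```python
# A compact sketch of Robinson-style unification. A term is a string (a single
# letter is treated as a variable, anything longer as a constant -- a toy
# convention) or a tuple ('p', t1, ..., tn) for an application p(t1, ..., tn).
def is_var(t):
    return isinstance(t, str) and len(t) == 1

def apply_subst(t, sigma):
    """Apply substitution sigma to term t, following chains of bindings."""
    if is_var(t):
        return apply_subst(sigma[t], sigma) if t in sigma else t
    if isinstance(t, tuple):
        return (t[0], *(apply_subst(a, sigma) for a in t[1:]))
    return t

def unify(s, t, sigma=None):
    """Return a most general unifier extending sigma, or None if none exists."""
    sigma = dict(sigma or {})
    s, t = apply_subst(s, sigma), apply_subst(t, sigma)
    if s == t:
        return sigma
    if is_var(s):
        sigma[s] = t                    # bind variable (occurs check omitted)
        return sigma
    if is_var(t):
        return unify(t, s, sigma)
    if isinstance(s, tuple) and isinstance(t, tuple) \
            and s[0] == t[0] and len(s) == len(t):
        for a, b in zip(s[1:], t[1:]):  # unify arguments left to right
            sigma = unify(a, b, sigma)
            if sigma is None:
                return None
        return sigma
    return None

# The example above: p(f(g(x)), y, z) and p(u, u, f(u)).
print(unify(('p', ('f', ('g', 'x')), 'y', 'z'),
            ('p', 'u', 'u', ('f', 'u'))))
# {'u': ('f', ('g', 'x')), 'y': ('f', ('g', 'x')), 'z': ('f', ('f', ('g', 'x')))}
```

The printed substitution matches the MGU given above, up to the order in which the bindings are listed.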
Resolution
The generalized resolution step for PL1 can be written as follows:

(A1 ∨ A2 ∨ ... ∨ Am ∨ B), (¬B' ∨ C1 ∨ C2 ∨ ... ∨ Cn), σ(B) = σ(B')
⊢ (σ(A1) ∨ σ(A2) ∨ ... ∨ σ(Am) ∨ σ(C1) ∨ σ(C2) ∨ ... ∨ σ(Cn))

where σ is the MGU of B and B', and the derived clause is known as the resolvent.
Essentially, if we were to first apply the substitution to both clauses, then the resolvent is what results from the standard resolution rule.
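Continuing the sketch above (and reusing unify and apply_subst from the unification sketch), one resolution step might look like the following, where a clause is a list of literals and ('not', A) marks a negated literal. This is illustrative only.

```python
# One generalized resolution step: find a literal B in one clause and a negated
# literal ('not', B') in the other, compute sigma = MGU(B, B'), and join the
# remaining literals under sigma. Assumes the clauses are standardized apart.
def resolvents(clause1, clause2):
    """Yield every resolvent of two clauses (lists of literals)."""
    for i, lit1 in enumerate(clause1):
        for j, lit2 in enumerate(clause2):
            if lit1[0] != 'not' and lit2[0] == 'not':
                b, b_prime = lit1, lit2[1]
            elif lit1[0] == 'not' and lit2[0] != 'not':
                b, b_prime = lit2, lit1[1]
            else:
                continue                # need one positive and one negated literal
            sigma = unify(b, b_prime)
            if sigma is None:
                continue
            rest = ([l for k, l in enumerate(clause1) if k != i]
                    + [l for k, l in enumerate(clause2) if k != j])
            yield [apply_subst(l, sigma) for l in rest]

# frog(kermit) resolved with ¬frog(x) ∨ green(x) yields green(kermit).
kb1 = [('frog', 'kermit')]
kb2 = [('not', ('frog', 'x')), ('green', 'x')]
print(list(resolvents(kb1, kb2)))    # [[('green', 'kermit')]]
```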
The resolution step alone is sound, but not complete for PL1. To see why, we consider the barber paradox above. That statement is unsatisfiable. But we cannot derive a contradiction using only the resolution rule given above.
Fact 1: shaves(barber,x) ∨ shaves(x,x)
Fact 2: ¬shaves(barber,x) ∨ ¬shaves(x,x)
What substitution should we use to resolve the two clauses above? If we apply [x/barber], then anything we get from resolution turns out to be a tautology.
The extra rule we need is known as factorization. Essentially, if a substitution causes two different literals of the same clause to become identical, then we keep only one copy after substituting. That is, if we apply [x/barber] to Fact 1 and factor, we derive

shaves(barber,x) ∨ shaves(x,x), [x/barber]
⊢ shaves(barber,barber)

(rather than the redundant shaves(barber,barber) ∨ shaves(barber,barber)). Factoring Fact 2 under the same substitution yields ¬shaves(barber,barber), and resolving these two unit clauses produces the empty clause, the contradiction we sought.

Theorem: Resolution combined with factorization is sound and refutation-complete for PL1: from any unsatisfiable set of clauses, the empty clause can be derived.
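To round out the example, here is a sketch of factorization in the same toy representation (reusing apply_subst and resolvents from the sketches above), which mechanically completes the barber refutation:

```python
# Factorization: if a substitution makes two literals of one clause identical,
# keep a single copy. Combined with resolvents() above, this closes the
# barber refutation.
def factor(clause, sigma):
    seen, result = set(), []
    for lit in clause:
        lit = apply_subst(lit, sigma)
        if lit not in seen:            # drop duplicate literals after substitution
            seen.add(lit)
            result.append(lit)
    return result

fact1 = [('shaves', 'barber', 'x'), ('shaves', 'x', 'x')]
fact2 = [('not', ('shaves', 'barber', 'x')), ('not', ('shaves', 'x', 'x'))]

unit1 = factor(fact1, {'x': 'barber'})   # [('shaves', 'barber', 'barber')]
unit2 = factor(fact2, {'x': 'barber'})   # [('not', ('shaves', 'barber', 'barber'))]
print(list(resolvents(unit1, unit2)))    # [[]] -- the empty clause
```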