First Order Predicate Logic

Preface: these notes are primarily based on Chapter 3 of Ertel's text.


Limits of Propositional Logic

Every fact must be stated as its own (independent) proposition, so there is no way to reason at a higher level about "similar" facts, or to express more general statements of implication.


First Order Predicate Logic ("PL1")

In predicate logic, we consider the world to be a (possibly infinite) collection of objects (known as the domain of discourse).

We then express knowledge in the form of predicates which can be applied to any number of objects, such as  likes(a, b), and position(r, x, y). A predicate formally evaluates to either true or false, and it expresses a relationship between objects in the domain. For example, there exists a set of triples (r,x,y) for which position(r, x, y) evaluates to true.

In first-order predicate logic, we allow the use of quantifiers to express knowledge about one or more objects in the world. Specifically, the symbol ∀ represents the universal quantifier, applying to every object, and symbol  ∃ represents the existential quantifier, meaning that a statement applies to at least one object from the domain.

Examples
Before we get bogged down in formal syntax and semantics, let's look at some examples of statements in PL1.
Formula                        Description
∀x likes(x, cake)              Everyone likes cake
¬∀x likes(x, cake)             Not everyone likes cake (but some might)
∀x ¬likes(x, cake)             No one likes cake
¬∃x likes(x, cake)             No one likes cake (equivalent to previous formula)
∀x (frog(x) ⇒ green(x))        All frogs are green
∃x ∀y likes(x, y)              There is someone who likes everything
∃y ∀x likes(x, y)              There is something that everyone likes
∀x ∃y likes(x, y)              Everyone likes at least one thing
∀y ∃x likes(x, y)              Everything is liked by at least one person
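As a sanity check on how such formulas are read, here is a toy illustration (my own sketch, not from the text) that interprets a few of them over a small finite domain, reading Python's all(...) as ∀ and any(...) as ∃. The names people, things, and likes are invented for the example; for readability the domain is split into people and things.

```python
# Toy finite domain (my own illustration): two people, two things,
# and one particular interpretation of the likes relation.
people = ["alice", "bob"]
things = ["cake", "tea"]
likes = {("alice", "cake"), ("alice", "tea"), ("bob", "tea")}

# ∀x likes(x, cake): everyone likes cake
everyone_likes_cake = all((p, "cake") in likes for p in people)

# ∃x ∀y likes(x, y): there is someone who likes everything
someone_likes_everything = any(all((p, t) in likes for t in things)
                               for p in people)

# ∀x ∃y likes(x, y): everyone likes at least one thing
everyone_likes_something = all(any((p, t) in likes for t in things)
                               for p in people)
```

Under this particular interpretation, bob does not like cake, so the first formula is false, while the other two are true.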


Syntax for PL1 Formulas

A set K of constants (e.g., cake).

A set V of variables (e.g., x, y). In a formula, a variable can either be bound (within the scope of a quantifier that names it) or free.

While predicate formulas evaluate to true or false, it is convenient to also consider a set F of functions, each of which evaluates to an object from the domain. For example, we might want to have a function, mother(x), that evaluates to the object representing the mother of object x. Functions are syntactic sugar for some other formal predicate that happens to describe a functional relationship (meaning that there is a unique, well-defined y = f(x) for any x).

For example, we might have a formal predicate mother_of(y,x) that is true if y is the mother of x, and such that for any x, there is one and only one y for which mother_of(y,x) is true. The use of function mother(x) in the formula:

∀x likes(x, mother(x))
could be avoided by writing the more complex
∀x ∃y (likes(x, y) ∧ mother_of(y, x))
The use of the function syntax simplifies the expression of the statement.

We also will allow infix notation "x = y" to be a standard equality relation, that could otherwise be expressed as equals(x,y), along with axioms to enforce typical reflexive, symmetric, and transitive properties. This will allow us to denote things such as Mary = mother(John).

We now define a term recursively: a term is either a constant from K, a variable from V, or a function application f(t1, t2, ..., tn), where f ∈ F and each ti is itself a term.

We now define a PL1 formula recursively, as follows: if p is an n-ary predicate and t1, ..., tn are terms, then p(t1, ..., tn) is an (atomic) formula; if φ and ψ are formulas, then so are ¬φ, (φ ∧ ψ), (φ ∨ ψ), (φ ⇒ ψ), and (φ ⇔ ψ); and if x is a variable and φ is a formula, then ∀x φ and ∃x φ are formulas.

We refer to expressions such as p(t1, t2, ..., tn) and ¬p(t1, t2, ..., tn) as literals.

A formula in which there are no free variables is known as a first-order sentence or closed formula.

Definitions of CNF and Horn clauses extend to PL1.


Semantics for PL1 Formulas

An interpretation is a mapping of all variables to objects in the set W (the world), as well as a mapping from the set of all function and predicate symbols to actual functions and relations on W.

A formula is valid (i.e., true) under an interpretation when it evaluates to true by the usual recursive rules: an atomic formula p(t1, ..., tn) is true when the tuple of objects denoted by its terms belongs to the relation assigned to p; the logical connectives are evaluated as in propositional logic; ∀x φ is true when φ holds for every possible assignment of x to an object of W; and ∃x φ is true when φ holds for at least one such assignment.

Because we will often want to talk about making a replacement of every free occurrence of a variable x in a formula φ with some term t, we introduce the corresponding notation:

φ[x/t]

For simplicity, we will assume that all quantifiers in a given formula use different variables (renaming variables in subformulas to avoid unintended name collisions).


Quantifiers and Normal Forms

An extension of De Morgan's laws (intuitively, over a finite world ∀ behaves like one large conjunction and ∃ like one large disjunction, though the equivalences hold for infinite domains as well) states that

∀x φ ≡ ¬∃x ¬φ
∃x φ ≡ ¬∀x ¬φ
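The equivalences above can be verified by brute force over a small finite domain; the following check is my own illustration (not from the text), again reading all(...) as ∀ and any(...) as ∃.

```python
from itertools import product

def dualities_hold(domain):
    """Check ∀x φ ≡ ¬∃x ¬φ and ∃x φ ≡ ¬∀x ¬φ for every unary predicate φ."""
    # enumerate every possible interpretation of a unary predicate φ
    for bits in product([False, True], repeat=len(domain)):
        phi = dict(zip(domain, bits))
        if all(phi[x] for x in domain) != (not any(not phi[x] for x in domain)):
            return False
        if any(phi[x] for x in domain) != (not all(not phi[x] for x in domain)):
            return False
    return True
```

For example, dualities_hold([0, 1, 2]) exhaustively checks all eight interpretations of φ over a three-element domain.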

A predicate logic formula φ is in prenex normal form if it holds that:

φ = Q1x1 Q2x2 ... Qnxn ψ
where each Qi ∈ {∀, ∃} is a quantifier for i = 1, ..., n, and where ψ is quantifier-free.

Theorem: Every predicate logic formula can be transformed into an equivalent formula in prenex normal form (and of course, the quantifier-free portion can itself be converted to CNF).

The algorithm is rather direct, so long as you use renaming to ensure that each original quantifier binds a distinct variable.

Example: (from book)
(∀x p(x)) ⇒ (∃y q(y))

... (spoiler alert) ...

∃x ∃y (p(x) ⇒ q(y))
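Since the two formulas are equivalent, they must agree under every interpretation. The following brute-force check (my own illustration, not from the book) confirms this over a 2-element domain by enumerating all interpretations of p and q.

```python
from itertools import product

def prenex_example_equivalent(domain):
    """Compare (∀x p(x)) ⇒ (∃y q(y)) against ∃x ∃y (p(x) ⇒ q(y))."""
    n = len(domain)
    for p_bits in product([False, True], repeat=n):
        for q_bits in product([False, True], repeat=n):
            p, q = dict(zip(domain, p_bits)), dict(zip(domain, q_bits))
            # original: (∀x p(x)) ⇒ (∃y q(y)), written as ¬∀ ∨ ∃
            original = (not all(p[x] for x in domain)) or any(q[y] for y in domain)
            # prenex form: ∃x ∃y (p(x) ⇒ q(y))
            prenex = any((not p[x]) or q[y] for x in domain for y in domain)
            if original != prenex:
                return False
    return True
```

Over the 2-element domain this examines all 16 combinations of p and q; the same style of check works for any (small) finite domain.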

Example: (Wikipedia)
( P() ∨ ∃x Q(x) ) ⇒ ∀y R(x, y)

Note that the x in R(x, y) is free (the ∃x applies only to Q(x)), so that bound x must first be renamed, say to z.

... (spoiler alert) ...

∀z ∀y ( (P() ∨ Q(z)) ⇒ R(x, y) )


Skolemization

If we only care about testing satisfiability of a formula, we can convert it to an equally satisfiable (but not equivalent) formula that does not use any existential quantifiers. This process is known as Skolemization.

This is accomplished by replacing each existentially quantified variable by either a constant or by a newly created function that depends on any universally quantified variables with surrounding scope.

As a very simple example, if we have a formula ∃x P(x) that is not within the scope of any universal quantifier, we replace it by P(c) for a newly chosen constant c. In effect, this simply shifts the burden of finding an object satisfying the predicate onto the task of finding a satisfying interpretation (as the constant must eventually be assigned to some object in the domain).

As a more typical example, if we have an expression that starts:

∀x ∀y ∃z φ
the choice of z might depend upon the surrounding choice of x and y. We replace any use of variable z with a new function, g(x, y). Again, for this to have a satisfying interpretation, there is a burden of identifying a well-defined function g, and therefore there must be some z chosen as the value of g(x, y) for any pair (x, y).

The advantage of Skolemization is that it provides a more restrictive normal form. In fact, once we know that all remaining variables are universally quantified, we can even drop the quantifiers from the syntax and treat all remaining variables as (implicitly universally quantified) free variables in a formula.
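The equisatisfiability claim can be sanity-checked by brute force on a small instance (my own sketch, not from the text): over a 2-element domain, ∀x ∃y likes(x, y) holds in an interpretation exactly when some choice of Skolem function f makes ∀x likes(x, f(x)) hold.

```python
from itertools import product

def skolemization_preserves_sat(domain):
    """∀x ∃y likes(x,y) vs. its Skolemized form ∀x likes(x, f(x))."""
    pairs = [(x, y) for x in domain for y in domain]
    # every candidate Skolem function f: domain -> domain
    functions = [dict(zip(domain, vals))
                 for vals in product(domain, repeat=len(domain))]
    # every interpretation of the likes relation
    for bits in product([False, True], repeat=len(pairs)):
        likes = dict(zip(pairs, bits))
        original = all(any(likes[(x, y)] for y in domain) for x in domain)
        skolemized = any(all(likes[(x, f[x])] for x in domain)
                         for f in functions)
        if original != skolemized:
            return False
    return True
```

Of course this is only a finite-domain check of one formula; the general theorem is that the two formulas are satisfiable over exactly the same class of interpretations, not that they are equivalent.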


Another Example: (Russell's paradox)
"There is a barber who shaves precisely those who do not shave themselves"

∃y ∀x (shaves(y,x) ⇔ ¬shaves(x,x))

This is already in prenex normal form. But we can convert the rest to CNF as:

∃y ∀x (shaves(y,x) ∨ shaves(x,x)) ∧ (¬shaves(y,x) ∨ ¬shaves(x,x))
If we pick barber as the Skolem constant for y, we get
(shaves(barber,x) ∨ shaves(x,x)) ∧ (¬shaves(barber,x) ∨ ¬shaves(x,x))

Limits of Computation

Gödel's Completeness Theorem (1929)
There exists a sound and complete proof calculus for PL1. That is, a derivation system that is sound, and in which any valid formula has a finite derivation.

But...

Church/Turing Undecidability Theorem (1936, 1937)
PL1 is undecidable, assuming there is at least one predicate of arity at least 2 (other than equality).

Taken together: there is a process which could enumerate the countably infinite set of possible derivations, so that any valid formula will eventually be proven. However, there is no finite way to conclude that an arbitrary formula is not valid (there might still be a proof out there that hasn't yet been generated by the enumeration).


Resolution and First-Order Predicate Logic

When working with a first-order knowledge base, it is common to make what is known as the closed-world assumption; that is, even though the universe may be infinite, we assume that objects exist only if the KB knows about them. In this case, the domain appears to be finite and we could consider converting all PL1 formulas into propositional logic and using our existing mechanisms. For example, ∀x P(x) becomes P(a) ∧ P(b) ∧ P(c) ∧ ... . However, there are two problems. First, even if we do this, we get an outrageously sized KB, so we prefer to have a proof calculus that can work directly with PL1 formulas. Second, the above breaks down entirely when functions are allowed in the system, since if we assume mother(x) is defined for all x and that mother(x) ≠ x, then there must be infinitely many objects, as x, mother(x), mother(mother(x)), ... . (This is what leads to the Undecidability Theorem.)

So we consider PL1 proof calculi. As with propositional logic, the resolution calculus can be shown to be sound and complete for PL1 in CNF (with one additional rule added to the calculus).

What is new about performing resolution in PL1 (versus propositional logic) is that we need to allow replacement of universally quantified variables by terms, using a process known as substitution. Specifically, we allow the following derivation step:

∀x φ(x) ⊢ φ[x/t]

To perform resolution steps, we need to be able to match a pair of negated literals that might each be expressed using a variety of bound and unbound variables and functions. As a simple example, assume that we have knowledge base:

frog(kermit)
∀x (frog(x) ⇒ green(x))
Converted to CNF, and assuming free variables are universally quantified, we would rewrite this KB as:
frog(kermit)
¬frog(x) ∨ green(x)

We would certainly hope to conclude that kermit is green, but the negated terms are not quite matching, as we have frog(kermit) and ¬frog(x). But since x is a variable, we can apply the substitution [x/kermit], in which case we can resolve on

frog(kermit), ¬frog(kermit) ∨ green(kermit)
⊢ green(kermit)
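This two-step derivation (substitute, then resolve) can be sketched in code. The following is my own toy illustration, not Ertel's: a literal is a triple (negated?, predicate, argument tuple), a clause is a set of literals, and resolution is performed only after the clause has been made ground by the substitution [x/kermit].

```python
def substitute_clause(clause, sigma):
    """Replace variables appearing in argument positions according to sigma."""
    return {(neg, pred, tuple(sigma.get(a, a) for a in args))
            for (neg, pred, args) in clause}

def resolve(c1, c2):
    """Return all resolvents of two ground clauses on complementary literals."""
    resolvents = []
    for lit in c1:
        neg, pred, args = lit
        if (not neg, pred, args) in c2:          # complementary literal found
            resolvents.append((c1 - {lit}) | (c2 - {(not neg, pred, args)}))
    return resolvents

fact = {(False, "frog", ("kermit",))}                       # frog(kermit)
rule = {(True, "frog", ("x",)), (False, "green", ("x",))}   # ¬frog(x) ∨ green(x)

ground_rule = substitute_clause(rule, {"x": "kermit"})
result = resolve(fact, ground_rule)    # a single resolvent: {green(kermit)}
```

In a real prover the substitution would not be guessed by hand; it is computed by unification, as described next.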

Unification
When performing forward chaining, we wish to derive the most general form of new knowledge. This is done by working with the substitution known as the most general unifier (MGU). Two literals are unifiable if there is a substitution σ for all variables which makes the literals equal. Such a substitution is called a unifier. A unifier is an MGU if all other unifiers can be obtained from it by further substitution.

Example: consider literals

p(f(g(x)), y, z)
p(u, u, f(u))
The MGU for these literals is [y/f(g(x)), z/f(f(g(x))), u/f(g(x))]

There are a variety of algorithms for computing the MGU.
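One classic approach is Robinson-style unification. The sketch below is my own illustration (the text only states that such algorithms exist); the term representation is an assumption of the example: a string is a variable if it appears in VARS and a constant otherwise, and a tuple (f, t1, ..., tn) is the function or predicate f applied to terms t1..tn.

```python
VARS = {"x", "y", "z", "u"}   # assumed variable names for this example

def substitute(term, sigma):
    """Apply substitution sigma (a dict mapping variables to terms)."""
    if isinstance(term, tuple):
        return (term[0],) + tuple(substitute(t, sigma) for t in term[1:])
    if term in sigma:
        return substitute(sigma[term], sigma)   # follow binding chains
    return term

def occurs(var, term, sigma):
    """Occurs check: does var appear inside term (under sigma)?"""
    term = substitute(term, sigma)
    if term == var:
        return True
    return isinstance(term, tuple) and any(occurs(var, t, sigma)
                                           for t in term[1:])

def unify(s, t, sigma=None):
    """Return an MGU extending sigma, or None if the terms do not unify."""
    sigma = {} if sigma is None else sigma
    s, t = substitute(s, sigma), substitute(t, sigma)
    if s == t:
        return sigma
    if s in VARS:
        if occurs(s, t, sigma):
            return None                         # e.g., x vs. f(x) fails
        sigma[s] = t
        return sigma
    if t in VARS:
        return unify(t, s, sigma)
    if isinstance(s, tuple) and isinstance(t, tuple) \
            and s[0] == t[0] and len(s) == len(t):
        for a, b in zip(s[1:], t[1:]):          # unify argument by argument
            sigma = unify(a, b, sigma)
            if sigma is None:
                return None
        return sigma
    return None                                 # clash of distinct symbols

# The example from above: p(f(g(x)), y, z) and p(u, u, f(u)).
lit1 = ("p", ("f", ("g", "x")), "y", "z")
lit2 = ("p", "u", "u", ("f", "u"))
mgu = unify(lit1, lit2)
```

Running this yields the MGU [y/f(g(x)), z/f(f(g(x))), u/f(g(x))] stated above, and applying it makes the two literals identical.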

Resolution
The generalized resolution step for PL1 can be written as follows.

(A1 ∨ A2 ∨ ... ∨ Am ∨ B), (¬B' ∨ C1 ∨ C2 ∨ ... ∨ Cn), σ(B) = σ(B')
derives the resolvent
(σ(A1) ∨ σ(A2) ∨ ... ∨ σ(Am) ∨ σ(C1) ∨ σ(C2) ∨ ... ∨ σ(Cn))
where σ is the MGU of B and B' (with the two clauses first renamed, if necessary, so that they share no variables).

Essentially, if we were to first apply the substitution to both clauses, then the resolvent is what results from the standard resolution rule.

The resolution step alone is sound, but not complete for PL1. To see why, we consider the barber paradox above. That statement is unsatisfiable. But we cannot derive a contradiction using only the resolution rule given above.

Fact 1: shaves(barber,x) ∨ shaves(x,x)
Fact 2: ¬shaves(barber,x) ∨ ¬shaves(x,x)

What substitution should we use to resolve the two clauses above? If we apply [x/barber], then anything we get from resolution turns out to be a tautology.

The extra rule we need is known as factorization. Essentially, if a substitution causes two different literals of the same clause to become identical, then we can drop one of those duplicates when substituting. That is, if we perform [x/barber], then we can see that

shaves(barber,x) ∨ shaves(x,x), [x/barber]
⊢ shaves(barber,barber)
(rather than the redundant "shaves(barber,barber) ∨ shaves(barber,barber)")
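One reason factorization feels natural is that if clauses are represented as sets of literals, the collapse of duplicates happens automatically. A tiny illustration (my own sketch, using an ad hoc literal encoding):

```python
# A clause as a set of (predicate, argument-tuple) literals:
# { shaves(barber, x), shaves(x, x) }
clause = {("shaves", ("barber", "x")), ("shaves", ("x", "x"))}
sigma = {"x": "barber"}

# Applying [x/barber] makes both literals identical, and the set
# representation keeps only one copy of shaves(barber, barber).
factored = {(pred, tuple(sigma.get(a, a) for a in args))
            for (pred, args) in clause}
print(factored)  # {('shaves', ('barber', 'barber'))}
```

In a calculus stated over disjunctions rather than sets, the factorization rule makes this collapse an explicit derivation step.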

Theorem:
Resolution (when combined with factorization) is a sound and refutation-complete calculus for PL1 formulas in CNF.
Michael Goldwasser
Last modified: Tuesday, 10 September 2013