Automated Theorem Proving

Throughout the course, you have learned the mechanical, low-level perspective on rigorous mathematical reasoning. Keeping in mind the rules of Natural Deduction, you might have wondered if you could build a computer program that could check proofs or maybe even prove propositions automatically! Indeed such programs exist; they are called automated theorem provers. Given a logical statement and a collection of axioms, an automated theorem prover determines whether they logical statement is provable.

The process of automatically deriving a proof of a proposition is, as you can imagine, very computational expensive. One way to work around this limitation is to relax our expectation that the proving process is fully automated. Instead, we can prove support for people developing proofs, for example, by offering useful suggestions or and checking their work.

In this lab and the subsequent one, we’ll try out the Coq as an example of such a proof assistant. Coq is used in both academia and industry to write machine-checked, formally verified software. Rather than installing Coq, we’ll use the excellent CoqJs tool which allows us to run Coq in a web browser.

You’ll notice that to the right of this reading is the Coq proving interface which reports the output of Coq commands. The Coq commands are embedded in this reading as editable code boxes. You can go through this reading and the lab, execute the proofs line-by-line, and observe the output of Coq in the proving interface. In addition to the buttons in the UI, the following shortcuts will be useful in syncing Coq to your reading:

alt-↓ or alt-n: move to the next line.
alt-↑ or alt-p: move to the previous line.
alt-enter or alt-→: move to the current position of the cursor.
F8: toggle the goal panel.

As a first example of using Coq, here is a proposition from our discussion of mathematical logic:

Lemma (Example Logic Prop): for any propositions \(P\), \(Q\), and \(R\), \((Q \rightarrow R) \rightarrow (P \rightarrow Q) \rightarrow (P \rightarrow R)\).

Coq consists of languages for writing programs in a functional style, propositions, and proofs. With our limited time with Coq, we won’t try to learn all of these different commands. Instead, we’ll focus on observing the similarities between our paper proofs and Coq.

Here is the statement of the lemma in Coq as well as the start of the proof of this lemma. Try evaluating it with the “move to next line” or “move to current position” shortcuts described above.

Lemma example_logic_prop : forall (P Q R:Prop), (Q -> R) -> (P -> Q) -> (P -> R).
Proof.

You’ll note that the proving interface reports the following after evaluating this line:

1 goal

--------------------------------------
∀ P Q R : ℙ, (Q → R) → (P → Q) → P → R

This is the current proof state as understood by Coq. Recall that the state of our proofs consist of a goal proposition and a collection of assumptions. Here, the goal proposition appears below the line and our assumptions appear above the line. When we initially prove a proposition, we frequently start with just a goal propositions and no assumptions.

At this point, let’s think about how we would prove this claim on our own. Then we will see how this reasoning maps onto Coq’s proof language, Gallina.

First, observe that our lemma is in the form of a universally-quantified proposition. Recall to prove such a proposition, we assume that the quantified variables are arbitrary/unknown constants of the appropriate type. In other words, we might say:

Let \(P\), \(Q\), and \(R\) be arbitrary propositions.

In Coq, we accomplish this with the intros command, naming each of the quantified variables in left-to-right order:

intros P Q R.

(Also note that commands in Coq end with a period, a fact that is easy to miss!)

After executing this Coq command, note that our proof state updates:

1 goal

P,Q,R : ℙ
-------------------------
(Q → R) → (P → Q) → P → R

That is, \(P\), \(Q\), and \(R\) are now propositions that are assumed to be provable.

Next, we are left with a nested implication. Recall that in such a scenario, the left-(→) rule tells us that the right-most proposition is our eventual goal, and every other proposition nested in the implication becomes an assumption. Thus we might write:

Assume that \(Q \rightarrow R\), \(P \rightarrow Q\), and that \(P\) are provable.

Like the universally quantified variables, we can use intros to introduce these additional assumptions. Unlike our prose, we must give them identifiers so that we can reference them later in our proof.

intros Hqr Hpq Hp.

Now, we must prove \(R\) assuming a whole bunch of stuff. Note how Coq keeps tracks of our assumptions for us, so we know what we have at our disposal!

At this point, we can observe that to prove \(R\), we can utilize \(Q \rightarrow R\) via the right-(→) rule. This will require us to prove \(Q\).

Because \(Q \rightarrow R\), it is sufficient to prove \(Q\) in order to prove \(R\).

In Coq, we can apply the hypothesis \(Q \rightarrow R\), named Hqr in our proof, with the apply command.

apply Hqr.

Note how Coq automatically updates the goal to be \(Q\) because \(Q\) is the premise of the implication we used.

Now, we can apply the \(P \rightarrow Q\) implication, Hpq, which will leave us with proving \(P\). But we already are assuming that \(P\) holds, Hp, completing the proof.

Because \(P \rightarrow Q\), it is sufficient to prove \(P\) in order to prove \(Q\), but we already know \(P\) from assumption.

Two applications of apply complete the Coq version of the proof. At this point, Coq reports that there are no proof goal left to prove; we are done! We then use the Qed command to finish the proof.

apply Hpq.
apply Hp.
Qed.

The preceding proposition was a statement in first-order logic. However, we can do more than this. We can both write functional programs and prove their correctness within Coq!

As an example of this, let’s consider the proposition that motivated our study of induction:

Claim (Length of Appended Lists): for all lists l1 and l2, (+ (length l1) (length l2)) ≡ (length (append l1 l2)).

Coq doesn’t support reasoning about Racket programs, but it has its own typed, functional programming language that behaves similarly to the subset of Racket that we looked at. To keep our discussion self-contained, we’ll define a simple list datatype as well as appropriate length and append functions.

Inductive list :=
| nil : list
| cons : nat -> list -> list.

Fixpoint length (l:list) : nat :=
  match l with
  | nil => 0
  | cons _ l' => 1 + length l'
  end.
  
Fixpoint append (l1 l2:list) : list :=
  match l1 with
  | nil => l2
  | cons x l1' => cons x (append l1' l2)
  end.

While the keywords are likely unfamiliar to you, you should recognize the first Inductive definition as defining the list type with two possibilities nil and cons. The second two Fixpoint definitions define list length and list append in terms of pattern matching.

With these definitions, we can now state our lemma in Coq:

Lemma length_append : forall (l1 l2 : list), length l1 + length l2 = length (append l1 l2).
Proof.

Recall that we proved this lemma by induction on \(l_1\), producing two cases: \(l_1\) is empty or non-empty.

Let \(l_1\) and \(l_2\) be arbitrary lists. We proceed by induction on \(l_1\).

We achieve the same effect in Coq with the induction command.

intros l1 l2.
induction l1 as [ | x l1'].

With the induction command, we create two proof goals in Coq: one for when l1 = nil and another for when l1 = cons x l1' for some value x and some sublist l1'.

In the case when \(l_1\) is nil, we can evaluate both sides of the equality to length l2, proving the case.

\(l_1\) is nil. Both the left- and right-hand sides of the equality evaluate to length l2.

In Coq, we can use the simpl command to evaluate the goal as much as possible. From here, we can use reflexivity to prove the goal because it becomes an equality between two identical expressions.

+ simpl.
  reflexivity.

When \(l_1\) is a cons, we gained an induction hypothesis specialized to l1. We evaluated the goal and then applied the induction hypothesis to complete the proof.

\(l_1\) is cons x l1' for some element x and sublist l1'. Our induction hypothesis says:

IH: length l1' + length l2 = length (append l1' l2).

The left-hand side of the goal evaluates to:
```
length (cons x l1') + length l2 -->* 1 + length l1' + length l2
```
The right-hand side of the goal evaluates to:
```
length (append (cons x l1') l2) -->* 1 + length (append l1' l2)
```
At this point, our goal is true via our induction hypothesis.

Importantly, Coq generates our induction hypothesis for us in this case (labeled IHl1') so we can’t get it wrong! From here, we use simpl to simplify the goal and use the induction hypothesis to rewrite our goal with the rewrite command before using reflexivity.

+ simpl.
  rewrite IHl1'.
  reflexivity.
Qed.

Exercise (Beginning to fly):

Try completing these Coq proofs on your own! Note, to break apart an assumed conjunction or disjunction—by “assuming both sides” of a conjunction or “performing case analysis” on a disjunction—use destruct v where v is the object, e.g., an expression or a proposition, you wish to break apart. Like the example above, you should use bullets to separate out the cases when they arise.

Lemma exercise_logic1: forall (P Q:Prop), P /\ Q -> P.
Proof.
  (* TODO: replace "Admitted." and fill me in, ending in the "Qed." command! *)
  Admitted.

Lemma exercise_logic2: forall (P Q R:Prop), P \/ Q -> (P -> R) -> (Q -> R) -> R.
Proof.
  (* TODO: replace "Admitted." and fill me in, ending in the "Qed." command! *)
  Admitted.

Definition xor (b1 b2:bool) : bool :=
  match b1 with
  | true => (negb b2)
  | false => b2
  end.

Lemma xor_cancellation: forall (b:bool), xor b b = false.
Proof.
  (* TODO: replace "Admitted." and fill me in, ending in the "Qed." command! *)
  Admitted.

Lemma append_nil: forall (l:list), append nil l = l.
Proof.
  (* TODO: replace "Admitted." and fill me in, ending in the "Qed." command! *)
  (* NOTE: this proof should be simple... *)
  Admitted.

Lemma nil_append: forall (l:list), append l nil = l.
Proof.
  (* TODO: replace "Admitted." and fill me in, ending in the "Qed." command! *)
  (* NOTE: this proof should require induction! Why was that the case again? *)
  Admitted.

Exercise (Building up Knowledge):

Complete the following pair of proofs that shows that rev preserves the length of its argument. To prove this claim, you first need to prove an auxiliary claim about snoc, the function that rev uses!

Fixpoint snoc (v:nat) (l:list): list :=
  match l with
  | nil => cons v nil
  | cons x l' => cons x (snoc v l')
  end.

Fixpoint rev (l:list): list :=
  match l with
  | nil => nil
  | cons x l' => snoc x (rev l')
  end.

Lemma snoc_length: forall (v:nat) (l:list), length (snoc v l) = 1 + (length l).
Proof.
  (* TODO: replace "Admitted." and fill me in, ending in the "Qed." command! *)
  Admitted.

Lemma rev_length: forall (l:list), length (rev l) = length l.
Proof.
  (* TODO: replace "Admitted." and fill me in, ending in the "Qed." command! *)
  (* NOTE: you can `apply snoc_length.` to use it in this proof! *)
  Admitted.

CSC/MAT 208 (Spring 2024)

Lab: Automated Theorem Proving