A Provably-Correct Optimizer

In this lab, we’ll walk through an example of using the Coq proof assistant to define a small programming language of arithmetic expressions and an optimizer over that language. We will then use Coq to prove the correctness of the optimizer.

First, let’s define our language. This is a simple toy programming language that consists of only two constructs: numbers and addition. To define the language, we use an inductive data type. This inductive datatype, exp, defines a new type as well as two constructors of that type. We can think of these constructors as functions that produce exps:

ELit takes a natural number (of type nat) as input and produces an exp as output.
EPlus takes two expressions as input and produces an exp as output.

Import Nat.

Inductive exp : Type :=
| ELit : nat -> exp
| EPlus : exp -> exp -> exp.

In Coq, we write a function invocation by specifying the function and then the arguments, separated by spaces. For example ELit 5 produces an exp which we can verify with the Check command in Coq:

Check ELit 5.

To express a more complicated expression, e.g., \(1 + 1\), we can nest calls to the various constructors:

Check (EPlus (ELit 1) (ELit 1)).

Next let’s define an evaluator for our language. We define evaluation for this language as a function that takes an exp as input and produces a nat as output. Because expressions are recursive—an addition has two sub-expressions—we can define evaluation recursively as follows:

\[\begin{align*} \mathsf{eval}(n) =&\; n \\ \mathsf{eval}(e_1 + e_2) =&\; \mathsf{eval}(e_1) + \mathsf{eval}(e_2) \end{align*}\]

Translating this algorithm into English:

If the expression is a number \(n\), we just evaluate to that number.
If the expression is an addition \(e_1 + e_2\), then we recursively evaluate the sub-expressions \(e_1\) and \(e_2\) and then add their resulting values together.

We can express this algorithm directly in Coq using pattern matching:

Fixpoint eval (e:exp) : nat :=
  match e with
  | ELit n => n
  | EPlus e1 e2 => add (eval e1) (eval e2)
  end.

We can use the Eval Coq command to see how our eval function behaves:

Eval simpl in (ELit 5).
Eval simpl in (EPlus (ELit 1) (ELit 1)).

With the evaluator defined, we can move onto the optimizer! Our optimizer will be fairly straightforward: it will rewrite all occurrences of \(n + 0\) and \(0 + n\) to \(n\). We can also define our optimizer recursively as follows:

\[\begin{align*} \mathsf{optimize}(n) =&\; n \\ \mathsf{optimize}(0 + e) =&\; \mathsf{optimize}(e) \\ \mathsf{optimize}(e + 0) =&\; \mathsf{optimize}(e) \\ \mathsf{optimize}(e_1 + e_2) =&\; \mathsf{opitmize}(e_1) + \mathsf{optimize}(e_2) \end{align*}\]

Note that the second and third cases are more-specific instances of the general case of addition. Traditionally, we interpret the pattern of a recursive definition in top-down order, so these patterns would match before the final “catch-all” pattern.

Like eval, we can translate this recursive definition into Coq via pattern matching:

Fixpoint optimize (e:exp) : exp :=
  match e with
  | ELit n => ELit n
  | EPlus (ELit O) e => e
  | EPlus e (ELit O) => e
  | EPlus e1 e2 => EPlus (optimize e1) (optimize e2)
  end.

With eval and optimize defined, we can now prove the correctness of our optimizer. What does correctness mean in this context? We would like to ensure that our optimized program produces the same output as our unoptimized program, a property called observational equivalence.

Lemma (Observation Equivalence of \(\mathsf{optimize}\)): for any expression \(e\), \(\mathsf{eval}(e) = \mathsf{eval}(\mathsf{optimize}(e))\).

Lemma optimize_correct : forall (e:exp), eval e = eval (optimize e).

To prove observational equivalence of our optimizer, we proceed by induction on our expression:

Proof.

intros e.
induction e.

There are two cases, the case where e is a number (ELit) and the case where e is an addition (EPlus). First, let’s consider the ELit case:

* simpl.
  reflexivity.

Problem 1: Step through this case of the proof and observe how the proof state changes. From this, give a proof of this case in prose.

Now let’s consider the EPlus case. Our expression \(e\) is now an addition \(e_1 + e_2\). Recall that our optimizer does different things to an addition depending on the arguments:

If either \(e_1\) or \(e_2\) is zero, the optimizer recursively optimizes the other side and then returns that optimized expression.
Otherwise the optimizer recursively optimizes \(e_1\) and \(e_2\) and returns an updated addition with the sub-expressions optimized.

Note that our induction hypotheses applies to \(e_1\) and \(e_2\) in this case. They are the IHe1 and IHe2 assumptions in this proof.

Our analysis must proceed by further case analysis on \(e_1\) and \(e_2\). First, we destruct \(e_1\) and continue our analysis.

Problem 2: Step through the case of the proof where \(e_1\) is an ELit and observe how the proof state changes. From this, give a proof of this case in prose.

* simpl.
  destruct e1 as [n1 | e11 e12].
    (* e1 = ELit n *)
    + destruct n1 as [ | n1'].
      (* n1 = O *)
      - auto.
      (* n1 = S n1' *)
      - destruct e2 as [n2 | e21 e22].
        (* e2 = ELit n *)
        ** destruct n2 as [ | n2'].
           (* n2 = O *)
           ++ simpl. rewrite <- plus_n_O. reflexivity.
           (* n2 = S n2' *)
           ++ rewrite IHe1. rewrite IHe2. reflexivity.
        (* e2 = EPlus e21 e22 *)
        ** rewrite IHe1. rewrite IHe2. reflexivity.

Now try writing some Coq on your own! Use what you’ve seen to complete the case where \(e_1\) is an EPlus. I’ve started you off with a case analysis on e2 below.

Problem 3: Remove the line admit and complete the proof in Coq!

    (* e1 = EPlus e11 e12 *)
    + destruct e2 as [n2 | e21 e22].
      (* TODO: remove the admits below and complete the proof! *)
      (* e2 = EInt n2 *)
      - admit.
      (* e2 = EPlus e21 e22 *)
      - admit.

(* Once you are done, replace the "Admitted." with a "Qed." to complete the proof! *)
Admitted.

The proof above contained significant redundancy. For reference, here is an alternative version of the proof that uses Coq proof automation to prove the claim without the need for laborious case analysis. In practice, we use a mixture of explicit proof steps and automation to engineer a proof in Coq!

Lemma optimize_correct' : forall (e:exp), eval e = eval (optimize e).
  intros e; induction e as [n | e1 IHe1 e2 IHe2]; auto.
  destruct e1 as [n1 | e1_1 IHe1_1 e1_2 IHe1_2];
  destruct e2 as [n2 | e2_1 IHe2_1 e2_2 IHe2_2];
  try destruct n1; try destruct n2; simpl;
  (try rewrite <- plus_n_O; auto || rewrite IHe1; rewrite IHe2; auto).
Qed.

CSC/MAT 208 (Spring 2024)

Lab: A Provably-Correct Optimizer