Generalized fold for inductive datatypes in coq - fold

I've found myself repeating a pattern over and over again, and I'd like to abstract it. I'm fairly confident that coq is sufficiently expressive to capture the pattern, but I'm having a bit of trouble figuring out how to do so. I'm defining a programming language, which has mutually recursive inductive datatypes representing the syntactic terms:
Inductive Expr : Set :=
| eLambda (x:TermVar) (e:Expr)
| eVar (x:TermVar)
| eAscribe (e:Expr) (t:IFType)
| ePlus (e1:Expr) (e2:Expr)
| ... many other forms ...
with DType : Set :=
| tArrow (x:TermVar) (t:DType) (c:Constraint) (t':DType)
| tInt
| ... many other forms ...
with Constraint : Set :=
| cEq (e1:Expr) (e2:Expr)
| ...
Now, there are a number of functions that I need to define over these types. For example, I'd like a function to find all of the free variables, a function to perform substitution, and a function to pull out the set of all constraints. These functions all have the following form:
Fixpoint doExpr (e:Expr) := match e with
(* one or two Interesting cases *)
| ...
(* lots and lots of boring cases,
** all of which just recurse on the subterms
** and then combine the results in the same way
*)
| ....
with doIFType (t:IFType) := match t with
(* same structure as above *)
with doConstraint (c:Constraint) := match c with
(* ditto *)
For example, to find free variables, I need to do something interesting in the variable cases and the cases that do binding, but for everything else I just recursively find all of the free variables of the subexpressions and then union those lists together. Similarly for the function that produces a list of all of the constraints. The substitution case is a little bit more tricky, because the result types of the three functions are different, and the constructors used to combine the subexpressions are also different:
Variable x:TermVar, v:Expr.
Fixpoint substInExpr (e:Expr) : **Expr** := match e with
(* interesting cases *)
| eLambda y e' =>
if x = y then eLambda y e' else eLambda y (substInExpr e')
| eVar y =>
if x = y then v else y
(* boring cases *)
| eAscribe e' t => **eAscribe** (substInExpr e') (substInType t)
| ePlus e1 e2 => **ePlus** (substInExpr e1) (substInExpr e2)
| ...
with substInType (t:Type) : **Type** := match t with ...
with substInConstraint (c:Constraint) : **Constraint** := ...
.
Writing these functions is tedious and error prone, because I have to write out all of the uninteresting cases for each function, and I need to make sure I recurse on all of the subterms. What I would like to write is something like the following:
Fixpoint freeVars X:syntax := match X with
| syntaxExpr eVar x => [x]
| syntaxExpr eLambda x e => remove x (freeVars e)
| syntaxType tArrow x t1 c t2 => remove x (freeVars t1)++(freeVars c)++(freeVars t2)
| _ _ args => fold (++) (map freeVars args)
end.
Variable x:TermVar, v:Expr.
Fixpoint subst X:syntax := match X with
| syntaxExpr eVar y => if y = x then v else eVar y
| syntaxExpr eLambda y e => eLambda y (if y = x then e else (subst e))
| syntaxType tArrow ...
| _ cons args => cons (map subst args)
end.
The key to this idea is the ability to generally apply a constructor to some number of arguments, and to have some kind of "map" that that preserves the type and number of arguments.
Clearly this pseudocode doesn't work, because the _ cases just aren't right. So my question is, is it possible to write code that is organized this way, or am I doomed to just manually listing out all of the boring cases?

Here's another way, though it's not everyone's cup of tea.
The idea is to move recursion out of the types and the evaluators, parameterizing it instead, and turning your expression values into folds. This offers convenience in some ways, but more effort in others -- it's really a question of where you end up spending the most time. The nice aspect is that evaluators can be easy to write, and you won't have to deal with mutually recursive definitions. However, some things that are simpler the other way can become brain-twisters in this style.
Require Import Ssreflect.ssreflect.
Require Import Ssreflect.ssrbool.
Require Import Ssreflect.eqtype.
Require Import Ssreflect.seq.
Require Import Ssreflect.ssrnat.
Inductive ExprF (d : (Type -> Type) -> Type -> Type)
(c : Type -> Type) (e : Type) : Type :=
| eLambda (x:nat) (e':e)
| eVar (x:nat)
| eAscribe (e':e) (t:d c e)
| ePlus (e1:e) (e2:e).
Inductive DTypeF (c : Type -> Type) (e : Type) : Type :=
| tArrow (x:nat) (t:e) (c':c e) (t':e)
| tInt.
Inductive ConstraintF (e : Type) : Type :=
| cEq (e1:e) (e2:e).
Definition Mu (f : Type -> Type) := forall a, (f a -> a) -> a.
Definition Constraint := Mu ConstraintF.
Definition DType := Mu (DTypeF ConstraintF).
Definition Expr := Mu (ExprF DTypeF ConstraintF).
Definition substInExpr (x:nat) (v:Expr) (e':Expr) : Expr := fun a phi =>
e' a (fun e => match e return a with
(* interesting cases *)
| eLambda y e' =>
if (x == y) then e' else phi e
| eVar y =>
if (x == y) then v _ phi else phi e
(* boring cases *)
| _ => phi e
end).
Definition varNum (x:ExprF DTypeF ConstraintF nat) : nat :=
match x with
| eLambda _ e => e
| eVar y => y
| _ => 0
end.
Compute (substInExpr 2 (fun a psi => psi (eVar _ _ _ 3))
(fun _ phi =>
phi (eLambda _ _ _ 1 (phi (eVar _ _ _ 2)))))
nat varNum.
Compute (substInExpr 1 (fun a psi => psi (eVar _ _ _ 3))
(fun _ phi =>
phi (eLambda _ _ _ 1 (phi (eVar _ _ _ 2)))))
nat varNum.

Here is a way to go, but it does not give very readable code: use tactics.
Let's say I have a language with many constructors of various arity, and I want to apply a specific goal only to the case given by constructor aaa, and I want to traverse all the other constructors, to get down to the aaa's that may appear under them. I can do the following:
Say you want to define a function A -> B (A is the type of the language), you will need to keep track of what case you are in,
so you should define a phantom type over A, reducing to B.
Definition phant (x : A) : Type := B.
I suppose that the union function has type B -> B -> B and that you have a default value in B, called empty_B
Ltac generic_process f acc :=
match goal with
|- context [phan (aaa _)] => (* assume aaa has arith 1 *)
intros val_of_aaa_component; exact process_this_value val_of_aaa_component
| |- _ =>
(* This should be used when the next argument of the current
constructor is in type A, you want to process recursively
down this argument, using the function f, and keep this result
in the accumulator. *)
let v := fresh "val_in_A" in
intros v; generic_process f (union acc (f v))
(* This clause will fail if val_in_A is not in type A *)
| |- _ => let v := fresh "val_not_in_A" in
(* This should be used when the next argument of the current
constructor is not in type A, you want to ignore it *)
intros v; generic_process f acc
| |- phant _ =>
(* this rule should be used at the end, when all
the arguments of the constructor have been used. *)
exact acc
end.
Now, you define the function by a proof. Let's say the function is called process_aaa.
Definition process_aaa (x : A) : phant x.
fix process_aaa 1.
(* This adds process_add : forall x:A, phant x. in the context. *)
intros x; case x; generic_process process_aaa empty_B.
Defined.
Note that the definition of generic_process only mention one constructor by name, aaa, all others
are treated in a systematic way. We use the type information to detect those sub-components in which we want to perform a recursive descent. If you have several mutually inductive types, you can add arguments to the generic_process function to indicate which function will be used for each type and have more clauses, one for each argument of each type.
Here is a test of this idea, where the language has 4 constructors, values to be processed are the ones that appear in the constructor var and the type nat is also used in another constructor (c2). We use the type of lists of natural numbers as the type B, with nil as the empty and singleton lists as result when encountering variables. The function collects all occurrences of var.
Require Import List.
Inductive expr : Type :=
var : nat -> expr
| c1 : expr -> expr -> expr -> expr
| c2 : expr -> nat -> expr
| c3 : expr -> expr -> expr
| c4 : expr -> expr -> expr
.
Definition phant (x : expr) : Type := list nat.
Definition union := (#List.app nat).
Ltac generic_process f acc :=
match goal with
|- context[phant (var _)] => exact (fun y => y::nil)
| |- _ => let v := fresh "val_in_expr" in
intros v; generic_process f (union acc (f v))
| |- _ => let v := fresh "val_not_in_expr" in
intros v; generic_process f acc
| |- phant _ => exact acc
end.
Definition collect_vars : forall x : expr, phant x.
fix collect_vars 1.
intros x; case x; generic_process collect_vars (#nil nat).
Defined.
Compute collect_vars (c1 (var 0) (c2 (var 4) 1)
(c3 (var 2) (var 3))).
The last computation returns a list containing values 0 4 2 and 3 as expected, but not 1, which did not occur inside a var constructor.

Related

Why Peano numbers in OCaml not working due to scope error?

I have the following peano number written with GADTs:
type z = Z of z
type 'a s = Z | S of 'a
type _ t = Z : z t | S : 'n t -> 'n s t
module T = struct
type nonrec 'a t = 'a t
end
type 'a nat = 'a t
type e = T : 'n nat -> e
The following function to decode a 'a nat (or 'a t) into the number it encoded, works:
let to_int : type n. n t -> int =
let rec go : type n. int -> n t -> int =
fun acc n -> match n with Z -> acc | S n -> go (acc + 1) n
in
fun x -> go 0 x
but if I try to rewrite it almost exactly the same this way:
let to_int2 (type a) (a: a nat) : int =
let rec go (type a) (acc : int) (x : a nat) : int =
match x with
| Z -> acc
| S v -> go (acc + 1) v
in
go 0 a
I get a scope error. What's the difference between the two functions?
138 | | S v -> go (acc + 1) v
^
Error: This expression has type $0 t but an expression was expected of type
'a
The type constructor $0 would escape its scope
The root issue is polymorphic recursion, GADTs are a red herring here.
Without an explicit annotation, recursive functions are not polymorphic in their own definition.
For instance, the following function has type int -> int
let rec id x =
let _discard = lazy (id 0) in
x;;
because id is not polymorphic in
let _discard = lazy (id 0) in
and thus id 0 implies that the type of id is int -> 'a which leads to id having type int -> int.
In order to define polymorphic recursive function, one need to add an explicit universally quantified annotation
let rec id : 'a. 'a -> 'a = fun x ->
let _discard = lazy (id 0) in
x
With this change, id recovers its expected 'a -> 'a type.
This requirement does not change with GADTs. Simplifying your code
let rec to_int (type a) (x : a nat) : int =
match x with
| Z -> 0
| S v -> 1 + to_int v
the annotation x: a nat implies that the function to_int only works with a nat, but you are applying to an incompatible type (and ones that lives in a too narrow scope but that is secondary).
Like in the non-GADT case, the solution is to add an explicit polymorphic annotation:
let rec to_int: 'a. 'a nat -> int = fun (type a) (x : a nat) ->
match x with
| Z -> 0
| S v -> 1 + to_int v
Since the form 'a. 'a nat -> int = fun (type a) (x : a nat) -> is both a mouthful and quite often needed with recursive function on GADTs, there is a shortcut notation available:
let rec to_int: type a. a nat -> int = fun x ->
match x with
| Z -> 0
| S v -> 1 + to_int v
For people not very familiar with GADTs, this form is the one to prefer whenever one write a GADT function. Indeed, not only this avoids the issue with polymorphic recursion, writing down the explicit type of a function before trying to implement it is generally a good idea with GADTs.
See also https://ocaml.org/manual/polymorphism.html#s:polymorphic-recursion , https://ocaml.org/manual/gadts-tutorial.html#s%3Agadts-recfun , and https://v2.ocaml.org/manual/locallyabstract.html#p:polymorpic-locally-abstract .

Why does OCaml think that this function takes an int parameter when nothing suggests that it should be the case?

I was working on chapter 1 of Modern Compiler Implementation in ML by Andrew Appel and I decided to implement it in OCaml instead of SML. I'm new to OCaml and I came across a very frustrating problem. OCaml seems to think that the below function has the signature int * (int * 'a) -> 'a option.
let rec lookupTable = function
| name, (i, v) :: _ when name = i -> Some v
| name, (_, _) :: rest -> lookupTable (name, rest)
| _, [] -> None
But as far as I can tell, there should be nothing that suggests that the first element in the tuple is an int. This is a problem because when the lookupTable function down the line, the compiler complains that I am not passing it an integer. Perhaps I am missing something incredibly obvious, but it has been pretty mind-boggling. Here is the rest of the program
open Base
type id = string
type binop = Plus | Minus | Times | Div
type stm =
| CompoundStm of stm * stm
| AssignStm of id * exp
| PrintStm of exp list
and exp =
| IdExp of id
| NumExp of int
| OpExp of exp * binop * exp
| EseqExp of stm * exp
(* Returns the maximum number of arguments of any print
statement within any subexpression of a given statement *)
let rec maxargs s =
match s with
| CompoundStm (stm1, stm2) -> Int.max (maxargs stm1) (maxargs stm2)
| AssignStm (_, exp) -> maxargs_exp exp
(* Might be more nested expressions *)
| PrintStm exps -> Int.max (List.length exps) (maxargs_explist exps)
and maxargs_exp e = match e with EseqExp (stm, _) -> maxargs stm | _ -> 0
and maxargs_explist exps =
match exps with
| exp :: rest -> Int.max (maxargs_exp exp) (maxargs_explist rest)
| [] -> 0
type table = (id * int) list
let updateTable name value t : table = (name, value) :: t
let rec lookupTable = function
| name, (i, v) :: _ when name = i -> Some v
| name, (_, _) :: rest -> lookupTable (name, rest)
| _, [] -> None
exception UndefinedVariable of string
let rec interp s =
let t = [] in
interpStm s t
and interpStm s t =
match s with
| CompoundStm (stm1, stm2) -> interpStm stm2 (interpStm stm1 t)
| AssignStm (id, exp) ->
let v, t' = interpExp exp t in
updateTable id v t'
(* Might be more nested expressions *)
| PrintStm exps ->
let interpretAndPrint t e =
let v, t' = interpExp e t in
Stdio.print_endline (Int.to_string v);
t'
in
List.fold_left exps ~init:t ~f:interpretAndPrint
and interpExp e t =
match e with
| IdExp i -> (
match lookupTable (i, t) with
| Some v -> (v, t)
| None -> raise (UndefinedVariable i))
| NumExp i -> (i, t)
| OpExp (exp1, binop, exp2) ->
let exp1_val, t' = interpExp exp1 t in
let exp2_val, _ = interpExp exp2 t' in
let res =
match binop with
| Plus -> exp1_val + exp2_val
| Minus -> exp1_val - exp2_val
| Times -> exp1_val * exp2_val
| Div -> exp1_val / exp2_val
in
(res, t')
| EseqExp (s, e) -> interpExp e (interpStm s t)
Base defines = as int -> int -> bool, so when you have the expression name = i the compiler will infer them as ints.
You can access the polymorphic functions and operators through the Poly module, or use a type-specific operator by locally opening the relevant module, e.g. String.(name = i).
The reason Base does not expose polymorphic operators by default is briefly explained in the documentation's introduction:
The comparison operators exposed by the OCaml standard library are polymorphic:
What they implement is structural comparison of the runtime representation of values. Since these are often error-prone, i.e., they don't correspond to what the user expects, they are not exposed directly by Base.
There's also a performance-argument to be made, because the polymorphic/structural operators need to also inspect what kind of value it is at runtime in order to compare them correctly.

Ocaml user-defined type pattern matching

Here is my type definition:
type ('type1, 'type2) symbol =
| N of 'type1
| T of 'type2
Here are some types:
type mysub =| Abc | Bcd | Def
I also have a list [N Abc;N Bcd; T"("].
What I want to do is that throw away all items of type T, and also throw away 'type1 or 'type2. So the desired result is [Abc; Bcd]
But when I try this piece of code:
List.map (fun x-> match x with N (a)->a |T (b) ->b ) (List.filter (fun x->match x with
N (type1) ->true |T (type2) -> false) [N Abc;N Bcd; T"("]);;
it gives me the following message:
Error: This expression has type (mysub, string) symbol list
but an expression was expected of type
(mysub, mysub) symbol list
Type string is not compatible with type mysub.
How can I fix it?
In this fragment
List.map
(fun x -> match x with
N a -> a
T b -> b
)
The return types aren't the same for the two match cases. If there were any T b elements in the list, the b would be a string, and the compiler doesn't know there aren't any such elements. Since you know there aren't any, you could fix this by supplying something other than b as the result for that case. Something like this:
List.map
(fun x -> match x with
N a -> a
T _ -> Abc
)
Or even this:
List.map
(fun x -> match x with
N a -> a
T _ -> failwith "This can't happen"
)

When is OCaml's warning 27 "Innocuous unused variable" useful?

This is the description of warning 27 from the OCaml manual:
27 Innocuous unused variable: unused variable that is not bound with let nor as, and doesn't start with an underscore (_) character.
This warning is turned on by jbuilder --dev, and I'm curious to know in which cases people find it useful. For me, it's an annoyance to get warnings when I write code like this:
$ utop -w +27
utop # fun (x, y) -> x;;
Characters 8-9:
Warning 27: unused variable y.
- : 'a * 'b -> 'a = <fun>
or like that:
utop # let error loc msg = failwith (loc ^ ": " ^ msg);;
val error : string -> string -> 'a = <fun>
utop # let rec eval = function
| `Plus (loc, a, b) -> eval a + eval b
| `Minus (loc, a, b) -> eval a - eval b
| `Star (loc, a, b) -> eval a * eval b
| `Slash (loc, a, b) ->
let denom = eval b in
if denom = 0 then
error loc "division by zero"
else
eval a / denom
| `Int (loc, x) -> x
;;
Characters 33-36:
Warning 27: unused variable loc.
Characters 73-76:
Warning 27: unused variable loc.
Characters 112-115:
Warning 27: unused variable loc.
Characters 287-290:
Warning 27: unused variable loc.
val eval :
([< `Int of 'b * int
| `Minus of 'c * 'a * 'a
| `Plus of 'd * 'a * 'a
| `Slash of 'e * 'a * 'a
| `Star of 'f * 'a * 'a ]
as 'a) ->
int = <fun>
I know that prepending an underscore to the identifiers as in _loc suppresses the warnings, but it's not compatible with my notions that:
variables starting with an underscore are ugly and are meant for use in generated code, hidden from the programmer;
a name given to something should not have to change based on how it's used (including unused).
Using underscores, the code becomes:
(* Here we have _loc or loc depending on whether it's used. *)
let rec eval = function
| `Plus (_loc, a, b) -> eval a + eval b
| `Minus (_loc, a, b) -> eval a - eval b
| `Star (_loc, a, b) -> eval a * eval b
| `Slash (loc, a, b) ->
let denom = eval b in
if denom = 0 then
error loc "division by zero"
else
eval a / denom
| `Int (_loc, x) -> x
or
(* Here it can be hard to know what _ stands for. *)
let rec eval = function
| `Plus (_, a, b) -> eval a + eval b
| `Minus (_, a, b) -> eval a - eval b
| `Star (_, a, b) -> eval a * eval b
| `Slash (loc, a, b) ->
let denom = eval b in
if denom = 0 then
error loc "division by zero"
else
eval a / denom
| `Int (_, x) -> x
It is very useful in the monadic code, where instead of the common syntactic let bindings you're forced to use monadic >>= bind operator. Basically, where
let x = something in
code
translates to
something >>= fun x ->
code
If x is not used in code then only with the 27 warning enabled the latter will be highlighted, while the former will produce a warning by default. Enabling this warning, revealed lots of bugs for us. For example, it showed us that this code is buggy :)
Another source of use cases are higher-order functions, i.e., map, fold, etc. It captures one of the most common bugs:
let bug init =
List.fold ~init ~f:(fun acc xs ->
List.fold ~init ~f:(fun acc x -> x :: acc))
Concerning the ugliness, I totally agree that underscores are ugly, but in most cases, this is the main purpose of them - to highlight the suspicious code. Concerning the example, that you're showing, in the modern OCaml it could be easily addressed with the inline records, e.g.,
type exp =
| Plus of {loc : loc; lhs : exp; rhs: exp}
| ...
so that instead of using the underscores, you can just omit the unused field,
let rec eval = function
| Plus {lhs; rhs} -> eval lhs + eval rhs
You can use the same approach without using inline records by sparing some extra space in your program and defining all those records separately. The real-world example.
For me this warning is useful in order to remind me to explicit more my intention. If we take your example :
fun (x, y) -> x;;
Your intention is to use only the first element. If we rewrite it this way :
fun (x, _ ) -> x;;
You use a pattern matching in the parameter to make your code more concise, but you explain your intention of using only the first element. The added value in this example is small, related to the very simple implementation. But in real life functions, this warning promote a good habit in coding.

how to implement lambda-calculus in OCaml?

In OCaml, it seems that "fun" is the binding operator to me. Does OCaml have built-in substitution? If does, how it is implemented? is it implemented using de Bruijn index?
Just want to know how the untyped lambda-calculus can be implemented in OCaml but did not find such implementation.
As Bromind, I also don't exactly understand what you mean by saying "Does OCaml have built-in substitution?"
About lambda-calculus once again I'm not really understand but, if you talking about writing some sort of lambda-calculus interpreter then you need first define your "syntax":
(* Bruijn index *)
type index = int
type term =
| Var of index
| Lam of term
| App of term * term
So (λx.x) y will be (λ 0) 1 and in our syntax App(Lam (Var 0), Var 1).
And now you need to implement your reduction, substitution and so on. For example you may have something like this:
(* identity substitution: 0 1 2 3 ... *)
let id i = Var i
(* particular case of lift substitution: 1 2 3 4 ... *)
let lift_one i = Var (i + 1)
(* cons substitution: t σ(0) σ(1) σ(2) ... *)
let cons (sigma: index -> term) t = function
| 0 -> t
| x -> sigma (x - 1)
(* by definition of substitution:
1) x[σ] = σ(x)
2) (λ t)[σ] = λ(t[cons(0, (σ; lift_one))])
where (σ1; σ2)(x) = (σ1(x))[σ2]
3) (t1 t2)[σ] = t1[σ] t2[σ]
*)
let rec apply_subs (sigma: index -> term) = function
| Var i -> sigma i
| Lam t -> Lam (apply_subs (function
| 0 -> Var 0
| i -> apply_subs lift_one (sigma (i - 1))
) t)
| App (t1, t2) -> App (apply_subs sigma t1, apply_subs sigma t2)
As you can see OCaml code is just direct rewriting of definition.
And now small-step reduction:
let is_value = function
| Lam _ | Var _ -> true
| _ -> false
let rec small_step = function
| App (Lam t, v) when is_value v ->
apply_subs (cons id v) t
| App (t, u) when is_value t ->
App (t, small_step u)
| App (t, u) ->
App (small_step t, u)
| t when is_value t ->
t
| _ -> failwith "You will never see me"
let rec eval = function
| t when is_value t -> t
| t -> let t' = small_step t in
if t' = t then t
else eval t'
For example you can evaluate (λx.x) y:
eval (App(Lam (Var 0), Var 1))
- : term = Var 1
OCaml does not perform normal-order reduction and uses call-by-value semantics. Some terms of lambda calculus have a normal form than cannot be reached with this evaluation strategy.
See The Substitution Model of Evaluation, as well as How would you implement a beta-reduction function in F#?.
I don't exactly understand what you mean by saying "Does OCaml have built-in substitution? ...", but concerning how the lambda-calculus can be implemented in OCaml, you can indeed use fun : just replace all the lambdas by fun, e.g.:
for the church numerals: you know that zero = \f -> (\x -> x), one = \f -> (\x -> f x), so in Ocaml, you'd have
let zero = fun f -> (fun x -> x)
let succ = fun n -> (fun f -> (fun x -> f (n f x)))
and succ zero gives you one as you expect it, i.e. fun f -> (fun x -> f x) (to highlight it, you can for instance try (succ zero) (fun s -> "s" ^ s) ("0") or (succ zero) (fun s -> s + 1) (0)).
As far as I remember, you can play with let and fun to change the evaluation strategy, but to be confirmed...
N.B.: I put all parenthesis just to make it clear, maybe some can be removed.