Using external type declarations with OCamlyacc - ocaml

I have a type expr in an expr.ml file. In parser.mly (an OCamlyacc file), I define the expr rule and declare its type:
%start expr
%type <expr> expr
However, I get:
File "parser.mli", line 34, characters 48-52:
Error: Unbound type constructor expr
I tried adding
%{
open Expr
%}
at the beginning of the .mly file but it still doesn't work. How may I define this expr type in an external file and use it as the return value of my rule? Thanks.

You need to qualify the expr type with its module name. That is, if it is defined in expression.ml (as type expr = ...), you should write
%type <Expression.expr> main
Note the capital E in the module name. With the expr.ml from the question, the qualified type is Expr.expr.

I'm not sure I understand correctly, but are you struggling with a circular dependency? Let's say T contains your type and calls the parser, P. P cannot produce type T.t, since T depends on P and not the other way around. Normally, I create a third file, T', that contains the type information.
For example,
T.ml
let parse filename : T'.t =
filename
|> open_in
|> Lexing.from_channel
|> P.command L.token
P.mly
%type <T'.t> command
%start command
%%
T'.ml
type t = Label of string
| Integer of string
| Float of string
| Star of t

Ocamlyacc doesn't let you specify text to be generated in the interface (.mli) file. So wherever you specify a type that goes into the interface (the type of a token or rule), you need to use a fully-qualified type.
Here it looks like you can use a fully-qualified type, but sometimes that's not possible because the type involves a functor application. There are several workarounds:
Arrange to build all functors in a separate compilation unit. This is easy, but doesn't work e.g. if the functors involve the token type.
Do post-processing on the ocamlyacc-generated .mli file to add a header. You can do pretty much anything this way, but it's ugly and annoying.
Use Menhir, an improved replacement of Ocamlyacc. It's an additional dependency, but it does solve Ocamlyacc's main shortcomings.
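To see why the open in the header does not help, here is a minimal sketch in plain OCaml that imitates the situation with ordinary modules. The names PARSER_MLI and Parser are hypothetical stand-ins for the generated parser.mli and parser.ml, and the string-based expr function is only a placeholder: a real generated parser takes a lexer function and a Lexing.lexbuf.

```ocaml
(* Stand-in for expr.ml: the externally defined type. *)
module Expr = struct
  type expr = Int of int | Add of expr * expr
end

(* Stand-in for the generated parser.mli. Nothing from the %{ ... %}
   header is copied here, so the type has to be written as Expr.expr. *)
module type PARSER_MLI = sig
  val expr : string -> Expr.expr
end

(* Stand-in for the generated parser.ml. The open below plays the role
   of the header: it only affects the implementation. *)
module Parser : PARSER_MLI = struct
  open Expr

  (* Placeholder "parser": reads a single integer literal. *)
  let expr (input : string) : expr = Int (int_of_string input)
end
```

Writing expr instead of Expr.expr inside PARSER_MLI would fail with the same "Unbound type constructor expr" error, because the signature never sees the open.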

OCaml module contains type variables that cannot be generalized

Code:
let size = 10
let getTbl = Array.init size ~f:(fun _ -> Avltree.empty )
end
Error:
Error: The type of this module,
sig val size : int val getTbl : ('_weak1, '_weak2) Avltree.t array end,
contains type variables that cannot be generalized
How do I let the OCaml compiler know that I plan to store both my keys and values as ints?
I have tried a few different approaches, none of which have worked.
Weak type variables denote types that are not yet inferred, typically because you have defined a value but never used it in a way that determines its type, so the type checker has no idea what it contains. That is fine in general, as the first use of the value will fix its type. However, since type checking in OCaml is bounded by the compilation unit (i.e., a file), such variables must be resolved by the end of the file you are compiling.
Therefore, you have to either (1) use the variable, (2) constrain it to some type in the implementation (.ml) file, e.g., let getTbl : (int, int) Avltree.t array = ..., or (3) constrain it in the .mli file. You can even just create an empty .mli file (with the same name as your .ml file); this hides all values defined in your module and lets it compile.
It might work to change Avltree.empty to (Avltree.empty : (int, int) Avltree.t)
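A minimal sketch of option (2), with the standard library's Hashtbl swapped in for Avltree so that the example has no external dependency. The name tables is hypothetical. Note that the stdlib Array.init takes its function positionally rather than through the ~f label used in the question.

```ocaml
let size = 10

(* Without the annotation, and with no later use in this file, the type
   would be ('_weak1, '_weak2) Hashtbl.t array, which cannot be exported
   from a compilation unit. The annotation fixes keys and values to int. *)
let tables : (int, int) Hashtbl.t array =
  Array.init size (fun _ -> Hashtbl.create 16)

(* Each array slot holds its own table. *)
let () = Hashtbl.replace tables.(0) 1 100
```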

OCaml: warning 40 when using a constructor of a variant type from a simple module

I have a simple module in a text file mpd.ml with variant types:
type ack_error =
| Not_list
| Arg
| Password
| Permission
| Unknown
| No_exist
| Playlist_max
| System
| Playlist_load
| Update_already
| Player_sync
| Exist
type response = Ok | Error of (ack_error * int * string * string)
And when I use them :
let test_ok test_ctxt = assert_equal Mpd.Ok (Mpd.parse_response "OK\n")
Even if everything works, I have those warnings:
ocamlfind ocamlc -o test -package oUnit,str -linkpkg -g mpd.ml test.ml
File "test.ml", line 7, characters 2-4:
Warning 40: Ok was selected from type Mpd.response.
It is not visible in the current scope, and will not
be selected if the type becomes unknown.
File "test.ml", line 8, characters 2-7:
Warning 40: Error was selected from type Mpd.response.
It is not visible in the current scope, and will not
be selected if the type becomes unknown.
What does it mean, and how can I improve my code so that those warnings disappear?
** edit **
full code : https://gist.github.com/cedlemo/8806f367a971bacfaa0f59b1c78a3605
It looks like you are not showing the line that provoked the warning. The warning says that the Ok constructor is at characters 2-4 of line 7, but there is nothing like that in the code you posted.
In general, I would suggest using an editor with OCaml support, such as Emacs or Vim, as it will jump directly to the source of the error.
Since the warning is quite common, I will still explain the reasoning behind it. In OCaml, constructors and field names are identifiers and, like any other identifier, they have a scope, which here is the module. So whenever you define a variant type, you are actually defining several constructors in the scope of the module. To refer to a constructor, you need either to use a fully qualified name or to make sure that it is in scope. If you are inside the module that defines it, you are fine; otherwise you need to bring the name into scope somehow.
In previous versions of OCaml, using a constructor that is not in scope was an error, just a regular unbound identifier. More recently, a heuristic was added that infers from the expected type which scope the constructor comes from. But it is still guarded by a warning, so people actually try not to rely on it. (Digression: I wonder why a feature was added and then immediately discouraged with a warning, so that no one will actually use it.)
So, to fix the warning you need to qualify all constructors with the module name or, alternatively, open the module to bring all its definitions into scope, e.g., open Mpd.
Update
So, the full code reveals that at line 7, as the compiler indeed pointed out, there is an unqualified constructor:
match response with
| Ok -> false
| Error ...
Here Ok is unqualified; the correct way is to write:
match response with
| Mpd.Ok -> false
| Mpd.Error ...
My general advice, and the policy I follow myself, is to define a module that contains only types, so that you can open it rather safely. This also spares you from repeating type definitions in an .mli file, as it is considered acceptable not to have an .mli file for a module that defines only types.
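The two fixes can be sketched in a single file as follows. The inline Mpd module is a simplified stand-in for mpd.ml: its Error payload is reduced to a string and parse_response is a placeholder, not the code from the gist. The function names are hypothetical.

```ocaml
(* Simplified stand-in for mpd.ml. *)
module Mpd = struct
  type response = Ok | Error of string

  (* Placeholder parser: only recognizes the literal "OK\n". *)
  let parse_response s = if s = "OK\n" then Ok else Error s
end

(* Fix 1: qualify every constructor with the module name. *)
let is_error_qualified response =
  match response with
  | Mpd.Ok -> false
  | Mpd.Error _ -> true

(* Fix 2: bring the constructors into scope with a local open. *)
let is_error_local_open response =
  let open Mpd in
  match response with
  | Ok -> false
  | Error _ -> true
```

A local open keeps the constructors in scope only where they are needed, which limits accidental shadowing of other names such as the standard Ok and Error of the result type.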

Functors in OCaml: triple code duplication necessary?

I'd like to clarify one point: currently it seems to me that triple signature duplication is necessary while declaring a functor, provided we export it in the .mli file. Here is an example:
Suppose we have a functor Make, which produces a module A parametrized by SigA (simplest example I could think of). Consequently, in the .mli file we have:
module type A = sig
type a
val identity : a -> a
end
module type SigA = sig
type a
end
module Make (MA:SigA) :
A with type a := MA.a
Now I understand that we have to write an implementation in the .ml file:
module Make (MA:SigA) = struct
type a = MA.a
let identity obj = obj
end
So far so good, right? No! Turns out we have to copy the declaration of A and SigA verbatim into the .ml file:
module type A = sig
type a
val identity : a -> a
end
module type SigA = sig
type a
end
module Make (MA:SigA) = struct
type a = MA.a
let identity obj = obj
end
While I (vaguely) understand the rationale behind copying SigA (after all, it is mentioned in the source code), copying the definition of A seems like a completely pointless exercise to me.
I've had a brief look through the Core codebase, and they seem either to duplicate it for small modules or, for larger ones, to export it to a separate .mli, which is used from both the .ml and the .mli.
So is this just the state of affairs? Is everyone fine with copying the module signature THREE times (once in the .mli file and twice in the .ml file: the declaration and the definition)?
Currently I'm considering just ditching .mli files altogether and restricting the modules export using signatures in the .ml files.
EDIT: yes I know that I can avoid this problem by declaring the interface for A inline inside Make in the .mli file. However this doesn't help me if I want to use that interface from outside of that module.
That's because a pair of .ml and .mli files acts like a structure and the signature it is matched against.
The usual way to avoid writing out the module type twice is to define it in a separate ML file. For example,
(* sig.ml *)
module type A = sig
type a
end
module type B = sig
type b
val identity : b -> b
end
(* make.mli *)
module Make (A : Sig.A) : Sig.B with type b = A.a
(* make.ml *)
module Make (A : Sig.A) =
struct
type b = A.a
let identity x = x
end
It is fine to leave out an MLI file in the case where it does not hide anything, like for the Sig module above.
In other cases, writing out the signature separately from the implementation is a feature, and not really duplication -- it defines the export of a module, and usually, that is a small subset of what's in the implementation.
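The same layout can be tried out in a single file, as a sketch. Here the nested module Sig stands in for sig.ml, and the signature constraint on Make plays the role of make.mli. IntId is a hypothetical instantiation added for illustration.

```ocaml
(* Stand-in for sig.ml: a module that contains only module types,
   so it needs no separate interface. *)
module Sig = struct
  module type A = sig
    type a
  end

  module type B = sig
    type b
    val identity : b -> b
  end
end

(* Stand-in for make.ml checked against make.mli: the module types are
   referenced by name instead of being written out again. *)
module Make (A : Sig.A) : Sig.B with type b = A.a = struct
  type b = A.a
  let identity x = x
end

(* Hypothetical use: b is known to be int thanks to "with type b = A.a". *)
module IntId = Make (struct type a = int end)

let answer : int = IntId.identity 42
```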

Cannot use backticks in term names as backtick quotes are being used by camlp5 (OCaml)

I'm using the Yojson library and one of the constructors used is called `Bool (with a backtick). I'm working with OCaml source where camlp5 is used so that text surrounded by backticks is interpreted differently (e.g. the text is converted to an OCaml data structure).
The problem I'm having is that when `Bool appears in my source code, camlp5/OCaml sees the backtick and thinks it is the start of a quote, causing an error. How can I make sure this is interpreted as the OCaml term `Bool instead? Is there some way to temporarily turn off what camlp5 does? Some kind of escape character I can use?
Since you are using a syntax extension that overrides the behavior of backquotes, you cannot use polymorphic variants like `Bool in the same file.
I would advise you first to change the syntax extension so that it uses a delimiter other than the backquote. Why not %%, for example?
The other solution is simple, but more verbose: use two different files, one where you don't use the syntax extension, and another one where you use the syntax extension.
In the first file (without the syntax extension), you define a type with normal variants that mirror the ones used in Yojson, and functions to translate from and to polymorphic variants:
type t =
| Bool of ...
| ...
let to_yojson x =
match x with
| Bool v -> `Bool v
| ...
let from_yojson x =
match x with
| `Bool v -> Bool v
| ...
This way, you can manipulate this new type in the code that uses the syntax extension without writing backquotes, and then use the translation functions to call Yojson. The translation has a cost; if that cost matters in your case, you should modify the syntax extension instead.
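A filled-in sketch of that first file, assuming only three cases. To keep it independent of the library, the polymorphic variant type json is defined locally as a small subset of the shape Yojson uses; in real code you would use the library's own type instead. The names json, to_json and of_json are hypothetical.

```ocaml
(* Local stand-in for a subset of the Yojson type (an assumption made
   so that this sketch compiles without the library). *)
type json = [ `Null | `Bool of bool | `Int of int ]

(* Ordinary variants: these can be written without backquotes in the
   files that are processed by the syntax extension. *)
type t = Null | Bool of bool | Int of int

let to_json : t -> json = function
  | Null -> `Null
  | Bool b -> `Bool b
  | Int i -> `Int i

let of_json : json -> t = function
  | `Null -> Null
  | `Bool b -> Bool b
  | `Int i -> Int i
```

Only this file contains backquotes, so it is the only one that has to be compiled without the syntax extension.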

Resolving typedefs in C and C++

I'm trying to automatically resolve typedefs in arbitrary C++ or C projects.
Because some of the typedefs are defined in system header files (for example uint32), I'm currently trying to achieve this by running the gcc preprocessor on my code files and then scanning the preprocessed files for typedefs. I should then be able to replace the typedefs in the project's code files.
I'm wondering whether there is another, perhaps simpler, way that I'm missing. Can you think of one?
The reason I want to do this: I'm extracting code metrics from C/C++ projects with different tools. The metrics are method-based. After extracting the metrics, I have to merge the data produced by the different tools. The problem is that one of the tools resolves typedefs and the others don't. If typedefs are used for the parameter types of methods, I end up with metrics mapped to different method names that actually refer to the same method in the source code.
Think of this method in the source code: int test(uint32 par1, int par2)
After running my tools, some of my metrics are mapped to a method named int test(uint32 par1, int par2) and others are mapped to int test(unsigned int par1, int par2).
If you do not care about figuring out where they are defined, you can use objdump to dump the C++ symbol table which resolves typedefs.
lorien$ objdump --demangle --syms foo
foo: file format mach-o-i386
SYMBOL TABLE:
00001a24 g 1e SECT 01 0000 .text dyld_stub_binding_helper
00001a38 g 1e SECT 01 0000 .text _dyld_func_lookup
...
00001c7c g 0f SECT 01 0080 .text foo::foo(char const*)
...
This snippet is from the following structure definition:
typedef char const* c_string;
struct foo {
typedef c_string ntcstring;
foo(ntcstring s): buf(s) {}
std::string buf;
};
This does require that you compile everything, and it only shows symbols present in the resulting executable, so there are a few limitations. The other option is to have the linker dump a symbol map. For GNU tools add -Wl,-map and -Wl,name where name is the name of the file to generate (see note). This approach does not demangle the names, but with a little work you can reverse engineer the compiler's mangling conventions. The output for the previous snippet will include something like:
0x00001CBE 0x0000005E [ 2] __ZN3fooC2EPKc
0x00001D1C 0x0000001A [ 2] __ZN3fooC1EPKc
You can decode these using the C++ ABI specification. Once you get comfortable with how this works, the mangling table included with the ABI becomes priceless. The derivation in this case is:
<mangled-name> ::= '_Z' <encoding>
<encoding> ::= <name> <bare-function-type>
<name> ::= <nested-name>
<nested-name> ::= 'N' <source-name> <ctor-dtor-name> 'E'
<source-name> ::= <number> <identifier>
<ctor-dtor-name> ::= 'C2' # base object constructor
<bare-function-type> ::= <type>+
<type> ::= 'P' <type> # pointer to
<type> ::= <cv-qualifier> <type>
<cv-qualifier> ::= 'K' # constant
<type> ::= 'c' # character
Note: it looks like GNU changes the arguments to ld so you may want to check your local manual (man ld) to make sure that the map file generation commands are -mapfilename in your version. In recent versions, use -Wl,-M and redirect stdout to a file.
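To make the derivation above concrete, here is a toy decoder written in OCaml (the language used for the other sketches on this page). It covers only the productions listed in the derivation, plus 'i' for int, and it is not a general demangler. The function name demangle is hypothetical, and the input is expected without the extra leading underscore that the Mach-O symbol table adds.

```ocaml
(* Decode a tiny subset of mangled names:
   _Z N <number><identifier> (C1|C2) E <type>+
   where each type is built from P (pointer), K (const), c (char), i (int). *)
let demangle (s : string) : string option =
  let len = String.length s in
  (* Parse one <type> at index i; return its spelling and the next index. *)
  let rec parse_type i =
    if i >= len then None
    else
      match s.[i] with
      | 'c' -> Some ("char", i + 1)
      | 'i' -> Some ("int", i + 1)
      | 'K' -> Option.map (fun (t, j) -> (t ^ " const", j)) (parse_type (i + 1))
      | 'P' -> Option.map (fun (t, j) -> (t ^ "*", j)) (parse_type (i + 1))
      | _ -> None
  in
  (* Parse types until the end of the string. *)
  let rec parse_types i acc =
    if i = len then Some (List.rev acc)
    else
      match parse_type i with
      | Some (t, j) -> parse_types j (t :: acc)
      | None -> None
  in
  (* Skip over the decimal digits of <number>. *)
  let rec digits i =
    if i < len && s.[i] >= '0' && s.[i] <= '9' then digits (i + 1) else i
  in
  if len < 3 || String.sub s 0 3 <> "_ZN" then None
  else
    let d_end = digits 3 in
    if d_end = 3 then None
    else
      let id_len = int_of_string (String.sub s 3 (d_end - 3)) in
      let ctor_at = d_end + id_len in
      if ctor_at + 3 > len then None
      else
        let name = String.sub s d_end id_len in
        let ctor = String.sub s ctor_at 3 in
        if ctor <> "C1E" && ctor <> "C2E" then None
        else
          Option.map
            (fun ts ->
              Printf.sprintf "%s::%s(%s)" name name (String.concat ", " ts))
            (parse_types (ctor_at + 3) [])
```

For the symbol above, demangle "_ZN3fooC2EPKc" evaluates to Some "foo::foo(char const*)", which matches the objdump line shown earlier.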
You can use Clang (the LLVM C/C++ compiler front-end) to parse code in a way that preserves information on typedefs and even macros. It has a very nice C++ API for reading the data after the source code is read into the AST (abstract syntax tree). http://clang.llvm.org/
If you are instead looking for a simple program that already does the resolving for you (instead of the Clang programming API), I think you are out of luck, as I have never seen such a thing.
GCC-XML can help with resolving the typedefs; you'd have to follow the type-ids of <Typedef> elements until you reach a <FundamentalType>, <Struct> or <Class> element.
For replacing the typedefs in your project you have a more fundamental problem though: you can't simply search and replace as you'd have to respect the scope of names - think of e.g. function-local typedefs, namespace aliases or using directives.
Depending on what you're actually trying to achieve, there has to be a better way.
Update: Actually, in the given context of fixing metrics data, the replacement for the typenames using gcc-xml should work fine if it supports your code-base.