Programmatically load code in sml/nj - sml

I am trying to load an external .sml file - let's say a.sml - and execute a function (add: int -> int -> int) defined in that file.
I know perfectly well how to do this in the interactive shell: use "a.sml";
But how can I achieve this from within a .sml file? I tried the following:
val doTest =
  let
    val _ = print ("Loading..." ^ "\n")
    val _ = use "a.sml"
    val _ = print ("1 + 2 = " ^ Int.toString (add 1 2) ^ "\n")
  in
    1
  end
But the compiler's reaction is:
test.sml:7.49-7.52 Error: unbound variable or constructor: add
BTW: I know that using the CM is the more appropriate way. But in my case I do not know the file a.sml prior to the compilation.

You can't do this. The compiler must know the types of the functions you are calling at compile time. What you are asking is for SML to load a file at run time (use ...) and subsequently run the code therein. This isn't possible due to the phase distinction; type checking occurs during compilation, after which all type information can be forgotten.
If you're generating code and know the file name, you can still use the CM and compile in two steps using your build system. Then you'd get the type errors from the generated code in the second compilation step. Please describe your situation if such an approach doesn't work for you.

OCaml string length limitation when reading from stdin\file

As part of a Compiler Principles course I'm taking in my university, we're writing a compiler that's implemented in OCaml, which compiles Scheme code into CISC-like assembly (which is just C macros).
the basic operation of the compiler is such:
Read a *.scm file and convert it to an OCaml string.
Parse the string and perform various analyses.
Run a code generator on the AST output from the semantic analyzer, that outputs text into a *.c file.
Compile that file with GCC and run it in the terminal.
Well, all is well and good, except for this: I'm trying to read an input file that's around 4000 lines long and is basically one huge expression that mixes Scheme if and and.
I'm executing the compiler via utop. When I try to read the input file, I immediately get a stack overflow error message. My initial guess is that the file is just too large for OCaml to handle, but I wasn't able to find any documentation that would support this theory.
Any suggestions?
The maximum string length is given by Sys.max_string_length. For a 32-bit system, it's quite short: 16777211. For a 64-bit system, it's 144115188075855863.
Unless you're using a 32-bit system, and your 4000-line file is over 16MB, I don't think you're hitting the string length limit.
A stack overflow is not what you'd expect to see when a string is too long.
It's more likely that you have infinite recursion, or possibly just a very deeply nested computation.
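As a quick sanity check of the numbers above, the limit can be compared against an estimated file size. This is only a sketch: fits_in_string is a hypothetical helper, and the figure of 200 bytes per line is an assumption made for illustration.

```ocaml
(* True when a file of [size_in_bytes] bytes can be held in one OCaml string. *)
let fits_in_string (size_in_bytes : int) : bool =
  size_in_bytes >= 0 && size_in_bytes <= Sys.max_string_length

let () =
  Printf.printf "Sys.max_string_length = %d\n" Sys.max_string_length;
  (* Assumed figure: 4000 lines of at most 200 bytes each. *)
  Printf.printf "4000-line file fits: %b\n" (fits_in_string (4000 * 200))
```

Even on a 32-bit system such a file is far below the limit, which points towards recursion depth rather than string length.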
Well, it turns out that the limitation was the maximum stack size the OCaml runtime is configured to use (the l parameter of OCAMLRUNPARAM sets the stack limit, in words, not the amount of RAM).
I ran the following command in the terminal in order to increase the quota:
export OCAMLRUNPARAM="l=5555555555"
This worked like a charm - I managed to read and compile the input file almost instantaneously.
For reference purposes, this is the code that reads the file:
let file_to_string input_file =
  let in_channel = open_in input_file in
  let rec run () =
    try
      let ch = input_char in_channel in ch :: (run ())
    with End_of_file ->
      ( close_in in_channel;
        [] )
  in list_to_string (run ());;
where list_to_string is:
let list_to_string s =
  let rec loop s n =
    match s with
    | [] -> String.make n '?'
    | car :: cdr ->
      let result = loop cdr (n + 1) in
      String.set result n car;
      result
  in
  loop s 0;;
Funny thing is, I also wrote a tail-recursive version of file_to_string. It prevented the stack overflow, but for some reason went into an infinite loop. Oh, well...
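For comparison, here is a minimal sketch of two ways to read a whole file without deep recursion, so no larger stack limit should be needed. It is not the poster's code: the names file_to_string and file_to_string_buffered are reused or invented for illustration, and it assumes OCaml 4.08 or later (for Fun.protect). The first version reads the file in one call; the second keeps the character-by-character loop but makes the recursive call a true tail call by handling End_of_file with a match ... with exception branch instead of wrapping the call in try.

```ocaml
(* Read the whole file in one call; no recursion at all. *)
let file_to_string (path : string) : string =
  let ic = open_in_bin path in
  Fun.protect
    ~finally:(fun () -> close_in_noerr ic)
    (fun () -> really_input_string ic (in_channel_length ic))

(* Character-by-character variant. The recursive call sits in a value
   branch of the match, outside the exception handler, so it is a tail
   call and uses constant stack space. *)
let file_to_string_buffered (path : string) : string =
  let ic = open_in_bin path in
  let buf = Buffer.create 4096 in
  let rec loop () =
    match input_char ic with
    | c -> Buffer.add_char buf c; loop ()
    | exception End_of_file -> ()
  in
  Fun.protect
    ~finally:(fun () -> close_in_noerr ic)
    (fun () -> loop (); Buffer.contents buf)
```

A call inside try ... with is never a tail call, because the handler has to stay installed until the call returns; that is why the original run function consumes one stack frame per character.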

"Unbound type constructor _no_unused_value_warning" (only when #use ing file)

Consider this very basic module definition:
module type My_test = sig
type config with sexp
end;;
When I directly enter this on the utop prompt, everything works fine:
utop # module type My_test = sig
type config with sexp
end;;
module type My_test =
sig type config val config_of_sexp : Sexp.t -> config val sexp_of_config : config -> Sexp.t end
But when I try to #use a file containing the exact same definition, I get an Unbound type constructor _no_unused_value_warning_ error:
utop # #use "dummy.mli";;
File "dummy.mli", line 2, characters 7-13:
Error: Unbound type constructor _no_unused_value_warning_
(line 2 is type config with sexp )
Version info: The universal toplevel for OCaml, version 1.7, compiled for OCaml version 4.01.0
UPDATE:
I'm starting a bounty since I'd really be interested in knowing whether this is an OCaml bug, and in sensible workarounds / fixes for my code.
1) The ocaml top-level has two undocumented options named: -dsource & -dparsetree
2) If I enable -dsource and then try #use "dummy.mli", the source that is generated looks like this:
$ ocaml -dsource
# #use "dummy.mli";;
module type My_test =
sig
type config
val config_of_sexp : (Sexplib.Sexp.t -> config) _no_unused_value_warning_
val sexp_of_config : (config -> Sexplib.Sexp.t) _no_unused_value_warning_
end;;
File "dummy.mli", line 1, characters 31-37:
Error: Unbound type constructor _no_unused_value_warning_
3) However, when I enter the type declaration directly into the toplevel, the generated source does not have the "_no_unused_value_warning_"
4) The parse-tree that is generated for these two cases is slightly different, due to the presence of _no_unused_value_warning_.
5) After some grepping I saw that the type_conv library inserts 'val name : _no_unused_value_warning_' as a sort of hack to deactivate warnings -- https://github.com/janestreet/type_conv/blob/master/lib/pa_type_conv.ml -- there is a comment starting on line 916 that explains this stuff (I am still learning OCaml, so I don't yet understand everything about these parts)
Since sexplib uses type_conv, this signature was added in this case.
6) But the real issue here has to be the difference in how the OCaml toplevel handles the #use directive and directly entered lines of code.
In this file: https://github.com/diml/ocaml-3.12.1-print/blob/master/toplevel/opttoploop.ml
-- use_file (at line 316) uses List.iter to loop over a list of Parsetree.toplevel_phrase and calls execute_phrase on each element.
The REPL loop (at line 427) calls execute_phrase on a single Parsetree.toplevel_phrase
7) I am still not sure what is really causing the difference in the parse-tree - but trying to figure it out was interesting.
It would be awesome if someone who understands these parts more posted an answer.
I ran into this today using utop 1.17 with OCaml 4.02.1. After reading gautamc's excellent answer, I tried this simple workaround:
utop # type 'a _no_unused_value_warning_ = 'a;;
This allowed me to successfully #use the module I had been having a problem with.

Datalog require field `unlock'

While compiling an OCaml application I get the following error:
File "/tmp/ocamlpp466ee0", line 308, characters 34-233:
Error: Signature mismatch:
...
The field `unlock' is required but not provided
The field `lock' is required but not provided
Command exited with code 2.
My guess is that the error is related to the OCaml library Datalog (I've installed version 0.3 from here), because line 308 of the file /tmp/ocamlpp466ee0 is the first line of the following code:
module Logic = Datalog.Logic.Make(struct
type t = atom
let equal = eq_atom
let hash = hash_atom
let to_string a = Utils.sprintf "%a" pp_atom a
let of_string s = atom_of_json (Json.from_string s)
end)
I would really appreciate it if someone could help me figure out what I am doing wrong.
Moreover, I would like to understand why the file /tmp/ocamlpp466ee0 is generated each time I execute 'make'. I tried to work it out by reading the Makefile, but I did not succeed.
I think that something has changed in the Datalog library, and in some version > 0.3 the functor Datalog.Logic.Make requires a module argument that declares the values lock and unlock. So it's a version problem.
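To illustrate what "required but not provided" means, here is a self-contained sketch with a made-up functor. The signature ATOM and the functor Make below are hypothetical and are not Datalog's actual API; they only reproduce the shape of the problem. If the argument module omitted lock and unlock, the compiler would reject the functor application with the same kind of signature mismatch. Supplying no-op implementations is one possible way to satisfy such a signature in a single-threaded program.

```ocaml
(* Hypothetical signature: a newer library version asks for lock/unlock. *)
module type ATOM = sig
  type t
  val equal : t -> t -> bool
  val hash : t -> int
  val to_string : t -> string
  val lock : unit -> unit
  val unlock : unit -> unit
end

(* Hypothetical functor standing in for something like Datalog.Logic.Make. *)
module Make (A : ATOM) = struct
  (* Run [f] between A.lock and A.unlock, releasing even on exceptions. *)
  let with_lock f = A.lock (); Fun.protect ~finally:A.unlock f
  let show (a : A.t) : string = with_lock (fun () -> A.to_string a)
end

(* Argument module with no-op locking (assumes no concurrent access). *)
module String_atom = struct
  type t = string
  let equal = String.equal
  let hash = Hashtbl.hash
  let to_string s = s
  let lock () = ()
  let unlock () = ()
end

module Logic = Make (String_atom)

let () = print_endline (Logic.show "parent(alice, bob)")
```

Whether no-op locks are appropriate for the real library depends on its documentation for the version in use; installing the version the application was written against avoids the question altogether.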
About the temporary file: as you can see, its name consists of the literal ocaml, then pp, which stands for preprocessor, and some number. Preprocessors in OCaml usually work this way: they read the input source file and write an output source file. That's why a temporary file is created on each build.

SML-NJ, how to compile standalone executable

I have started to learn Standard ML, and I am now trying to use the Standard ML of New Jersey compiler.
I can use the interactive loop, but how can I compile a source file to a standalone executable?
In C, for example, one can just write
$ gcc hello_world.c -o helloworld
and then run helloworld binary.
I read the documentation for the SML/NJ Compilation Manager, but it doesn't have any clear examples.
Also, is there another SML compiler available that can create standalone binaries?
Both MosML and MLton can also create standalone binary files: MosML through the mosmlc command and MLton through the mlton command.
Note that MLton doesn't have an interactive loop; it is a whole-program optimising compiler. Basically, this means that it takes quite some time to compile, but in turn it generates incredibly fast SML programs.
For SML/NJ you can use the CM.mk_standalone function, but the CM User Manual (page 45) advises against this. Instead they recommend that you use the ml-build command. This will generate an SML/NJ heap image. The heap image must be run with the @SMLload parameter, or you can use the heap2exec program, provided that you have a supported system. If you don't, then I would suggest that you use MLton instead.
The following can be used to generate a valid SML/NJ heap image:
test.cm:
Group is
test.sml
$/basis.cm
test.sml:
structure Test =
struct
  fun main (prog_name, args) =
    let
      val _ = print ("Program name: " ^ prog_name ^ "\n")
      val _ = print "Arguments:\n"
      val _ = map (fn s => print ("\t" ^ s ^ "\n")) args
    in
      (* exit status reported to the shell; a non-zero value signals failure *)
      OS.Process.success
    end
end
And to generate the heap image you can use: ml-build test.cm Test.main test-image and then run it by sml @SMLload=test-image.XXXXX arg1 arg2 "this is one argument" where XXXXX is your architecture.
If you decide to use MLton at some point, then you don't need to have any main function. It evaluates everything at toplevel, so you can create a main function and have it called by something like this:
fun main () = print "this is the main function\n"
val foo = 4
val _ = print ((Int.toString 4) ^ "\n")
val _ = main ()
Then you can compile it by mlton foo.sml which will produce an executable named "foo". When you run it, it will produce this as result:
./foo
4
this is the main function
Note that this is only one file. When you have multiple files, you will either need to use MLB (ML Basis files), which are MLton's project files, or you can use cm files, and then compile with mlton project.mlb

ocamlyacc parse error: what token?

I'm using ocamlyacc and ocamllex. I have an error production in my grammar that signals a custom exception. So far, I can get it to report the error position:
| error { raise (Parse_failure (string_of_position (symbol_start_pos ()))) }
But, I also want to know which token was read. There must be a way---anyone know?
Thanks.
The best way to debug your ocamlyacc parser is to set the OCAMLRUNPARAM param to include the character p - this will make the parser print all the states that it goes through, and each shift / reduce it performs.
If you are using bash, you can do this with the following command:
$ export OCAMLRUNPARAM='p'
Tokens are generated by the lexer, hence you can use the current lexer token when an error occurs:
let parse_buf_exn lexbuf =
  try
    T.input T.rule lexbuf
  with exn ->
    begin
      let curr = lexbuf.Lexing.lex_curr_p in
      let line = curr.Lexing.pos_lnum in
      let cnum = curr.Lexing.pos_cnum - curr.Lexing.pos_bol in
      let tok = Lexing.lexeme lexbuf in
      let tail = Sql_lexer.ruleTail "" lexbuf in
      raise (Error (exn,(line,cnum,tok,tail)))
    end
Lexing.lexeme lexbuf is what you need. Other parts are not necessary but useful.
ruleTail will concat all remaining tokens into string for the user to easily locate error position. lexbuf.Lexing.lex_curr_p should be updated in the lexer to contain correct positions. (source)
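The position arithmetic used above can be tried in isolation. The following is a small sketch, not part of the original answer: string_of_position here is an assumed definition (the question uses a function of that name without showing it), describe_error is a hypothetical helper, and the position record is built by hand purely for demonstration. In a real parser the position would come from lexbuf.Lexing.lex_curr_p and the token from Lexing.lexeme lexbuf.

```ocaml
(* Format a lexer position as "line L, column C"; the column is the byte
   offset from the start of the current line (pos_cnum - pos_bol). *)
let string_of_position (p : Lexing.position) : string =
  Printf.sprintf "line %d, column %d"
    p.Lexing.pos_lnum
    (p.Lexing.pos_cnum - p.Lexing.pos_bol)

(* Hypothetical helper: combine the position with the offending lexeme. *)
let describe_error (p : Lexing.position) (token : string) : string =
  Printf.sprintf "%s: unexpected token %S" (string_of_position p) token

let () =
  (* Hand-built position, for illustration only. *)
  let p =
    { Lexing.pos_fname = "query.sql"; pos_lnum = 3; pos_bol = 20; pos_cnum = 27 }
  in
  print_endline (describe_error p "FORM")
```

As noted above, pos_lnum and pos_bol are only meaningful if the lexer updates them on each newline, for example by calling Lexing.new_line lexbuf in the newline rule.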
I think that, similar to yacc, the tokens are stored in variables corresponding to the symbols in your grammar rule. Here since there is one symbol (error), you may be able to simply output $1 using printf, etc.
Edit: responding to comment.
Why do you use an error terminal? I'm reading an ocamlyacc tutorial that says a special error-handling routine is called when a parse error happens. Like so:
3.1.5. The Error Reporting Routine
When the parser function detects a syntax error, it calls a function named parse_error with the string "syntax error" as argument. The default parse_error function does nothing and returns, thus initiating error recovery (see Error Recovery). The user can define a customized parse_error function in the header section of the grammar file such as:
let parse_error s = (* Called by the parser function on error *)
print_endline s;
flush stdout
Well, looks like you only get "syntax error" with that function though. Stay tuned for more info.