OCaml string length limitation when reading from stdin\file - ocaml

As part of a Compiler Principles course I'm taking at my university, we're writing a compiler, implemented in OCaml, that compiles Scheme code into CISC-like assembly (which is just C macros).
The basic operation of the compiler is as follows:
Read a *.scm file and convert it to an OCaml string.
Parse the string and perform various analyses.
Run a code generator on the AST produced by the semantic analyzer; it writes text into a *.c file.
Compile that file with GCC and run it in the terminal.
All is well, except for this: I'm trying to read an input file that's around 4000 lines long and is basically one huge expression that mixes Scheme if and and forms.
I'm running the compiler from utop. When I try to read the input file, I immediately get a stack overflow error. My initial guess is that the file is just too large for OCaml to handle, but I wasn't able to find any documentation to support this theory.
Any suggestions?

The maximum string length is given by Sys.max_string_length. For a 32-bit system, it's quite short: 16777211. For a 64-bit system, it's 144115188075855863.
Unless you're using a 32-bit system, and your 4000-line file is over 16MB, I don't think you're hitting the string length limit.
A stack overflow is not what you'd expect to see when a string is too long.
It's more likely that you have infinite recursion, or possibly just a very deeply nested computation.
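To rule out the string limit, one could compare the file's size against Sys.max_string_length before reading it. This is only a minimal sketch: fits_in_string is a hypothetical helper name, not something from the question.

```ocaml
(* Hypothetical helper: does the file at [path] fit into a single OCaml string? *)
let fits_in_string path =
  let ic = open_in_bin path in
  let len = in_channel_length ic in
  close_in ic;
  len <= Sys.max_string_length

(* Print the limits of the current platform for reference. *)
let () =
  Printf.printf "word size: %d bits, max string length: %d\n"
    Sys.word_size Sys.max_string_length
```

If fits_in_string returns true for the input file, the string limit is not the problem and the recursion depth is the more likely suspect.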

Well, it turns out that the limitation was the stack size the OCaml runtime is configured to use (the l parameter of OCAMLRUNPARAM sets the stack limit, in words).
I ran the following command in the terminal in order to increase the quota:
export OCAMLRUNPARAM="l=5555555555"
This worked like a charm - I managed to read and compile the input file almost instantaneously.
For reference purposes, this is the code that reads the file:
let file_to_string input_file =
  let in_channel = open_in input_file in
  let rec run () =
    try
      let ch = input_char in_channel in ch :: (run ())
    with End_of_file ->
      ( close_in in_channel;
        [] )
  in list_to_string (run ());;
where list_to_string is:
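let list_to_string s =
  (* Strings are immutable in current OCaml, so fill a Bytes buffer instead of using String.set. *)
  let rec loop s n =
    match s with
    | [] -> Bytes.make n '?'
    | car :: cdr ->
      let result = loop cdr (n + 1) in
      Bytes.set result n car;
      result
  in
  Bytes.to_string (loop s 0);;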
Funny thing is, I also wrote file_to_string with tail recursion. That prevented the stack overflow, but for some reason it went into an infinite loop. Oh, well...
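For comparison, here is a sketch of two ways to read a file without deep recursion, so that no stack limit needs raising. Neither is the code from the post above. The first reads the whole file in one call. The second, under the assumed name file_to_string_buffered, keeps the character-by-character loop but handles End_of_file in a match, so the recursive call stays in tail position.

```ocaml
(* Read the whole file with a single library call; no recursion involved. *)
let file_to_string input_file =
  let in_channel = open_in_bin input_file in
  match really_input_string in_channel (in_channel_length in_channel) with
  | contents -> close_in in_channel; contents
  | exception e -> close_in_noerr in_channel; raise e

(* Character-by-character variant (hypothetical name). The exception case is
   part of the match, so [loop ()] is a tail call and the stack does not grow. *)
let file_to_string_buffered input_file =
  let ic = open_in_bin input_file in
  let buf = Buffer.create 4096 in
  let rec loop () =
    match input_char ic with
    | ch -> Buffer.add_char buf ch; loop ()
    | exception End_of_file -> close_in ic; Buffer.contents buf
  in
  loop ()
```

Wrapping the recursive call in try ... with, as in the original run function, keeps an exception handler on the stack for every character, which is what makes that version overflow on large inputs.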

Related

Create list from input using interact in Haskell

I'm starting out with Haskell and was looking into I/O mechanisms. I read up on the interact function, which takes a function of type String -> String as its parameter. I tried to write a simple program that takes two numbers from stdin, creates a list from them, and prints it line by line.
import Data.List
readIn = sort . map read . words
writeOut = unlines . map show
rangeList [n,m] = [n .. m]
main = interact (writeOut . rangeList . readIn)
For some reason it won't print the numbers. Could you help me out?
interact requires you to input an end-of-file (EOF) to stdin with Ctrl+D (or Ctrl+Z on Windows); when I type that combination, the output appears as required. This is necessary because, as the documentation for interact states, ‘the entire input from the standard input device is passed to [interact] as its argument’; due to this, you need to explicitly signal the place where stdin ends.
(By the way, I’m not even sure how you got your program to compile; GHC gives me lots of ‘ambiguous type’ errors when I try. I had to add type signatures to get it working, at which point I found the solution above to work.)

OCAMLRUNPARAM does not affect stack size

I would like to change my stack size to allow a project with many non-tail-recursive functions to run on larger data. To do so, I tried to set OCAMLRUNPARAM="l=xxx" for varying values of xxx (in the range 0 through 10G), but it did not have any effect. Is setting OCAMLRUNPARAM even the right approach?
In case it is relevant: The project I am interested in is built using OCamlMakefile, target native-code.
Here is a minimal example where simply a large list is created without tail recursion. To quickly check whether the setting of OCAMLRUNPARAM has an effect, I compiled the program stacktest.ml:
let rec create l =
match l with
| 0 -> []
| _ -> "00"::(create (l-1))
let l = create (int_of_string (Sys.argv.(1)))
let _ = print_endline("List of size " ^ string_of_int (List.length l) ^ " created.")
using the command
ocamlbuild stacktest.native
and found out roughly at which length of the list a stack overflow occurs by (more or less) binary search with the following bash script foo.sh:
#!/bin/bash
export OCAMLRUNPARAM="l=$1"
increment=1000000
length=1
# Arithmetic comparison; [[ $increment > 0 ]] would compare the values as strings.
while (( increment > 0 )); do
  while [[ $(./stacktest.native $length) ]]; do
    length=$(($length+$increment))
  done
  length=$(($length-$increment))
  increment=$(($increment/2))
  length=$(($length+$increment))
done
length=$(($length-$increment))
echo "Largest list without overflow: $length"
echo $OCAMLRUNPARAM
The results vary between runs of this script (and the intermediate results are not even consistent within one run, but let's ignore that for now), but they are similar no matter whether I call
bash foo.sh 1
or
bash foo.sh 1G
i.e. whether the stack size is set to 1 or 2^30 words.
Changing the stack limit via OCAMLRUNPARAM works only for bytecode executables, which are run by the OCaml interpreter. A native program is handled by the operating system and executed directly on the CPU. Thus, in order to change the stack limit, you need to use the facilities provided by your operating system.
For example, on Linux there is the ulimit command, which controls many process parameters, including the stack limit. Note that ulimit -s takes the size in kilobytes, not words. Add the following to your script:
ulimit -s $1
and you will see the results change.
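Where rewriting a function is feasible, another option is to avoid the stack limit entirely by making the recursion a tail call. The following is a sketch of the create function from the question in accumulator style. The helper name go is made up for illustration, and this version returns the empty list for negative arguments instead of recursing forever.

```ocaml
(* Tail-recursive variant of [create]: the list is built in an accumulator,
   so stack use stays constant regardless of [l]. *)
let create l =
  let rec go acc n =
    if n <= 0 then acc else go ("00" :: acc) (n - 1)
  in
  go [] l

let () =
  let l = create 1_000_000 in
  Printf.printf "List of size %d created.\n" (List.length l)
```

Since all elements are identical here, the order in which the accumulator is filled does not matter; in general an accumulator builds the list in reverse, and a final List.rev would be needed.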

Haskell: odd difference between compiled vs interpreted functions which print concatenated infinite lists

I'm just exploring Haskell for fun, and to learn about the language. I thought the following behavior was interesting, and I can't find the reason why this happens.
This is an often quoted piece of Haskell code which keeps calculating pi until interrupted, slightly modified to give a concatenated list of chars instead of a list of integers:
main :: IO ()
main = do putStrLn pi'
pi' :: [Char]
pi' = concat . map show $ g(1,0,1,1,3,3) where
  g(q,r,t,k,n,l) =
    if 4*q+r-t<n*t
      then n : g(10*q,10*(r-n*t),t,k,div(10*(3*q+r))t-10*n,l)
      else g(q*k,(2*q+r)*l,t*l,k+1,div(q*(7*k+2)+r*l)(t*l),l+2)
If I run it from prelude, it starts concatenating a string resembling the digits of pi:
λ> putStrLn pi'
31415926535897932384626433832795028841971 ...etc
Works as expected, it instantly starts spewing out digits.
Now this is a piece of code I just quickly wrote which has the same structure. It's completely useless from a mathematical point of view, I was just messing around to find out how Haskell works. The operations are much simpler, but it does have the same type, and so does the sub function (except for the smaller tuple).
main :: IO ()
main = do putStrLn func
func :: [Char]
func = concat . map show $ h(1,2,1) where
  h(a,b,c) =
    if a <= 1000
      then a : h((div a 1)+2*b,b,1)
      else h(b,div (b-3) (-1),div a a)
Same type of result from prelude:
λ> putStrLn func
1591317212529333741454953576165697377818589 ...etc
Works as expected, although much faster than the pi function of course, because the calculations are less complex.
Now for the part which confuses me:
If I compile with ghc pi.hs and run my program with ./pi, the output stays blank forever, until I send an interrupt signal. At that moment, the whole calculated string of pi is instantly displayed. It doesn't "stream" the output into stdout like GHCi does. OK, I know they don't always behave in the same way.
But next I run: ghc func.hs, and run my program: ./func... and it immediately starts printing the list of characters.
Where does this difference come from? I thought it might be because my stupid useless little function is (eventually) repeating, so the compiler can "predict" the output better?
Or is there another fundamental difference between the way the functions work? Or am I doing something utterly stupid?
Solution / Answer
As Thomas & Daniel point out below, I was:
Impatient. Large chunks eventually show up with the pi function; it's just a bit slow on my simple old coding machine.
Not handling buffering in any way.
So after rewriting the main function:
import System.IO
main :: IO ()
main = do hSetBuffering stdout NoBuffering
          putStrLn pi'
It was fixed!

Programmatically load code in sml/nj

I am trying to load an external .sml file, say a.sml, and execute a function (add: int -> int -> int) defined in that file.
I know perfectly well how to do this in the interactive shell: use "a.sml";
But how can I achieve this from within a .sml file? I tried the following:
val doTest =
let
val _ = print ("Loading..." ^ "\n")
val _ = use "a.sml"
val _ = print ("1 + 2 = " ^ Int.toString (add 1 2) ^ "\n")
in
1
end
But the compiler's reaction is:
test.sml:7.49-7.52 Error: unbound variable or constructor: add
BTW: I know that using the CM is the more appropriate way. But in my case I do not know the file a.sml prior to the compilation.
You can't do this. The compiler must know the types of the functions you are calling at compile time. What you are asking is for SML to load a file at run time (use ...) and subsequently run the code therein. This isn't possible due to the phase distinction; type checking occurs during compilation, after which all type information can be forgotten.
If you're generating code and know the file name, you can still use the CM and compile in two steps using your build system. Then you'd get the type errors from the generated code in the second compilation step. Please describe your situation if such an approach doesn't work for you.

SML-NJ, how to compile standalone executable

I have started learning Standard ML, and I'm trying to use the Standard ML of New Jersey compiler.
I can use the interactive loop, but how can I compile a source file to a standalone executable?
In C, for example, one can just write
$ gcc hello_world.c -o helloworld
and then run helloworld binary.
I read the documentation for the SML/NJ Compilation Manager, but it doesn't have any clear examples.
Also, is there another SML compiler available that can create standalone binaries?
Both MosML and MLton can also create standalone binaries: MosML through the mosmlc command and MLton through the mlton command.
Note that MLton doesn't have an interactive loop; it is a whole-program optimising compiler. This basically means that it takes quite some time to compile, but in turn it generates incredibly fast SML programs.
For SML/NJ you can use the CM.mk_standalone function, but the CM User Manual (page 45) advises against it. Instead it recommends the ml-build command, which generates an SML/NJ heap image. The heap image must be run with the @SMLload parameter, or you can use the heap2exec program, provided that you have a supported system. If you don't, then I would suggest that you use MLton instead.
The following can be used to generate a valid SML/NJ heap image:
test.cm:
Group is
test.sml
$/basis.cm
test.sml:
structure Test =
struct
fun main (prog_name, args) =
let
val _ = print ("Program name: " ^ prog_name ^ "\n")
val _ = print "Arguments:\n"
val _ = map (fn s => print ("\t" ^ s ^ "\n")) args
in
1
end
end
To generate the heap image you can use ml-build test.cm Test.main test-image, and then run it with sml @SMLload=test-image.XXXXX arg1 arg2 "this is one argument", where XXXXX is your architecture.
If you decide to use MLton at some point, you don't need a main function at all. It evaluates everything at top level, so you can create a main function and have it called like this:
fun main () = print "this is the main function\n"
val foo = 4
val _ = print ((Int.toString 4) ^ "\n")
val _ = main ()
Then you can compile it with mlton foo.sml, which will produce an executable named "foo". When you run it, it produces this output:
./foo
4
this is the main function
Note that this is only one file. When you have multiple files, you will need to use either MLB (ML Basis) files, which are MLton's project files, or cm files, and then compile with mlton projectr.mlb