I have the following code block, which I modified from the MirageOS GitHub repo:
open Lwt.Infix
module Main (KV: Mirage_kv.RO) = struct
let start kv =
let read_from_file kv =
KV.get kv (Mirage_kv.Key.v "secret") >|= function
| Error e ->
Logs.warn (fun f -> f "Could not compare the secret against a known constant: %a"
KV.pp_error e)
| Ok stored_secret ->
Logs.info (fun f -> f "Data -> %a" Format.pp_print_string stored_secret);
in
read_from_file kv
end
This code reads data from a file called "secret" and outputs it once. I want to read the file and output its contents repeatedly, with a sleep in between.
The use case is this: while this program is running I will update the secret file from other processes, so I want to see the change in the output.
What I tried:
I tried to put the last statement in a while loop with
in
while true do
read_from_file kv
done
But it gives the error "This expression has type unit Lwt.t but an expression was expected of type unit because it is in the body of a while loop".
I just know that Lwt is a threading library, but I'm not an OCaml developer and am not trying to become one (I'm interested in MirageOS), so I can't find the functional syntax to write it.
You need to write the loop as a function, e.g.
let rec loop () =
read_from_file kv >>= fun () ->
(* wait here? *)
loop ()
in
loop ()
I am building a floating-point calculator and I'm stuck. The calculator works as a prompt, and my problem is that when I handle an exception I leave the recursive function that keeps the prompt showing, which ends the execution:
let initialDictionary = ref EmptyDictionary;;
let launcher () =
print_string ("Welcome");
let rec aux dic =
try
print_string ("->");
aux ( execute dic (s token (Lexing.from_string (read_line () ))));
with
| End_of_exec -> print_endline ("closing")
| Var_not_assigned s -> printf "var %s not assigned" s
in aux !initialDictionary;;
Where:
val exec : dictionary -> Instruction -> dictionary;
type dictionary = (string * float) list
The point here is that, as lists are immutable in OCaml, the only way I have to keep my dictionary accumulating variables with their values is recursion, passing along the dictionary that exec returns.
So any ideas on how not to leave the "prompt like" execution while showing exceptions?
A read eval print loop interpreter has all these different phases which must be handled separately, and I think your implementation tried to do too much in a single step. Just breaking down the second part of your function will help untangle the different error handlers:
let launcher () =
  let rec prompt dic =
    print_string "->";
    parse dic
  and parse dic =
    let res =
      try
        s token (Lexing.from_string @@ read_line ())
      with End_of_exec -> (print_endline "closing"; raise End_of_exec)
    in evalloop dic res
  and evalloop dic rep =
    let dic =
      try execute dic rep
      with Var_not_assigned s -> (printf "var %s not assigned\n" s; dic)
    in prompt dic
  in prompt !initialDictionary;;
Since each sub function will either tail call the next one or fail with an exception, the overall code should be tail recursive and optimised to a loop.
The good thing about this is that now you get a better picture of what is going on, and it makes it easier to change an individual step of the loop without touching the rest.
By the way, recall that read_line may raise an End_of_file exception as well, but it should now be trivial to handle (left as an exercise).
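For what it's worth, here is one way the exercise could go: a minimal sketch, not the asker's actual calculator. The `execute` below is a hypothetical toy evaluator standing in for the real lexer, parser and evaluator, and `repl` takes the line reader as a parameter (an assumption made here so the loop can be driven without a terminal). It uses `match ... with exception`, so the recursive calls stay in tail position.

```ocaml
(* Hypothetical stand-ins for the calculator's own definitions. *)
exception End_of_exec
exception Var_not_assigned of string

type dictionary = (string * float) list

(* Toy evaluator: "x = 1.5" binds x, "quit" stops, anything else is a
   variable lookup. The real program would parse and evaluate here. *)
let execute (dic : dictionary) (line : string) : dictionary =
  match String.split_on_char '=' line with
  | [ name; value ] ->
      (String.trim name, float_of_string (String.trim value)) :: dic
  | _ ->
      let name = String.trim line in
      if name = "quit" then raise End_of_exec
      else if List.mem_assoc name dic then begin
        Printf.printf "%g\n" (List.assoc name dic);
        dic
      end
      else raise (Var_not_assigned name)

(* [read] returns the next input line or raises End_of_file.
   Recoverable errors print a message and keep the old dictionary;
   End_of_exec and End_of_file stop the loop and return the dictionary. *)
let rec repl read dic =
  print_string "-> ";
  match read () with
  | exception End_of_file -> print_endline "closing"; dic
  | line -> (
      match execute dic line with
      | exception End_of_exec -> print_endline "closing"; dic
      | exception Var_not_assigned s ->
          Printf.printf "var %s not assigned\n" s;
          repl read dic
      | dic' -> repl read dic')
```

From a terminal this would be started with something like `repl read_line []`. Note that the toy `execute` does not guard against a malformed number on the right-hand side of `=`.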
OK, a friend of mine gave me a solution, and thinking about it I can't figure out a more elegant way.
let exec () =
  let _ = print_string "->" in
  ( try
      let inst = s token (Lexing.from_string (read_line ())) in
      ( try
          let r = exec_instruction !initialDictionary inst in
          initialDictionary := r
        with
        | Var_not_assigned s -> printf "var %s not assigned. \n" s
        | Com_not_implemented s -> printf " command %s not implemented. \n" s
        | Function_not_implemented s -> printf " function %s not implemented. \n" s )
    with
    (* assuming Lexic_error is the lexer's exception; a lowercase name
       here would be a variable pattern and match every exception *)
    | Lexic_error -> print_endline "lexic error"
    | Parsing.Parse_error -> print_endline "syntax error"
  );;
let rec program cont =
  if cont then
    try
      exec (); program true
    with
    | End_of_exec -> print_endline "closing"; program false;;
program true
The only thing that bothers me is the initialDictionary := r assignment, because exec_instruction returns a dictionary and so this could be done recursively, but it works anyway. I'll figure something out, hah.
Thank you for the help; if someone can see a brighter solution, let me know.
I would like to build a string list by prompting the user for input. My end goal is to be able to parse a string list against a simple hash table using a simple routine.
`let list_find tbl ls =
List.iter (fun x ->
let mbr = if Hashtbl.mem tbl x then "aok" else "not found"
in
Printf.printf "%s %s\n" x mbr) ls ;;`
Building a string list is accomplished with the cons operator ::, but somehow I am not able to get the prompt to generate a string list. A simple list function returns anything that is put into it as a list:
`let build_strlist x =
let rec aux x = match x with
| [] -> []
| hd :: tl -> hd :: aux tl
in
aux x ;;`
Thus far, I have been able to set the prompt, but building the string list did not go so well. I am inclined to think I should be using Buffer or Scanning.in_channel. This is what I have thus far:
`#load "unix.cma" ;;
let prompt () = Unix.isatty Unix.stdin && Unix.isatty Unix.stdout ;;
let build_strlist () =
let rec loop () =
let eof = ref false in
try
while not !eof do
if prompt () then print_endline "enter input ";
let line = read_line () in
if line = "-1" then eof := true
else
let rec build x = match x with
| [] -> []
| hd :: tl -> hd :: build tl
in
Printf.printf "you've entered %s\n" (List.iter (build line));
done
with End_of_file -> ()
in
loop () ;;`
I am getting an error saying that "line" has type string, but an expression was expected of type 'a list. Should I be building the string list using Buffer.create buf and then Buffer.add_string buf, prepending [ followed by quotes " another " and a semicolon? This seems to be overkill. Maybe I should just return a string list and ignore any attempts to "peek at what we have"? Printing will be done after checking the hash table.
I would like to have a prompt routine so that I can use ocaml for scripting and user interaction. I found some ideas on-line which allowed me to write the skeleton above.
I would probably break the problem down into several steps:
get the list of strings
process it (in your example, simply print it back)
The first step can be achieved with a recursive function as follows:
let build_strlist' () =
let rec loop l =
if prompt () then (
print_string "enter input: ";
match read_line () with
"-1" -> l
| s -> loop (s::l)
) else l
in loop [];;
See how that function loops on itself and builds up the list l as it goes. As you mentioned in your comment, I dropped the imperative part of your code to keep only the functional recursion. You could have achieved the same by keeping the imperative part instead and leaving out the recursion, but recursion feels more natural to me and, if written correctly, leads to mostly the same machine code.
Once you have the list, simply apply a List.iter to it with the ad hoc printing function as you did in your original function.
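To make the two steps concrete, here is a small sketch under a few assumptions: `read_strlist`, `check` and `report` are names made up for this illustration, the reader function is passed in so it can be anything that behaves like `read_line`, and the list is reversed at the end because consing with `::` collects the lines in reverse order.

```ocaml
(* Step 1: collect lines until the sentinel "-1" or end of input.
   [read] is any function that behaves like read_line. *)
let read_strlist read =
  let rec loop acc =
    match read () with
    | exception End_of_file -> List.rev acc
    | "-1" -> List.rev acc
    | s -> loop (s :: acc)
  in
  loop []

(* Step 2: pair each string with its membership status in the table. *)
let check tbl ls =
  List.map (fun x -> (x, if Hashtbl.mem tbl x then "aok" else "not found")) ls

(* Print the result, one "string status" pair per line. *)
let report tbl ls =
  List.iter (fun (x, status) -> Printf.printf "%s %s\n" x status) (check tbl ls)
```

Interactively this could be used as `report tbl (read_strlist read_line)`, with the prompt printed inside the reader function if one is wanted.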
I'm a beginner with OCaml and I want to read lines from a file and then examine all characters in each line.
As a dummy example, let's say we want to count the occurrences of the character 'A' in a file.
I tried the following
open Core.Std
let count_a acc string =
let rec count_help res stream =
match Stream.peek stream with
| None -> res
| Some char -> Stream.junk stream; if char = 'A' then count_help (res+1) stream else count_help res stream
in acc + count_help 0 (Stream.of_string string)
let count_a = In_channel.fold_lines stdin ~init:0 ~f:count_a
let () = print_string ((string_of_int count_a)^"\n")
I compile it with
ocamlfind ocamlc -linkpkg -thread -package core -o solution solution.ml
run it with
$./solution < huge_file.txt
on a file with one million lines, which gives me the following times
real 0m16.337s
user 0m16.302s
sys 0m0.027s
which is 4 times slower than my Python implementation. I'm fairly sure that it should be possible to make this go faster, but how should I go about doing this?
To count the number of A chars in a string you can just use the String.count function. Indeed, the simplest solution would be:
open Core.Std
let () =
In_channel.input_all stdin |>
String.count ~f:(fun c -> c = 'A') |>
printf "we have %d A's\n"
Update
A slightly more complicated (and less memory-hungry) solution, using fold_lines, looks like this:
let () =
In_channel.fold_lines stdin ~init:0 ~f:(fun n s ->
n + String.count ~f:(fun c -> c = 'A') s) |>
printf "we have %d A's\n"
It is indeed slower than the previous one: on my 8-year-old laptop it takes 7.3 seconds to count the 'A's in a 20-megabyte text file, versus 3 seconds for the former solution.
Also, you can find this post interesting, I hope.
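For comparison, the same counting can be sketched with the standard library only, without Core or Stream. The function names below are made up for this illustration, and no timing claim is attached to it.

```ocaml
(* Count occurrences of [c] in [s] with a single pass over the string. *)
let count_char c s =
  let n = ref 0 in
  String.iter (fun ch -> if ch = c then incr n) s;
  !n

(* Sum the per-line counts over a channel, keeping one line in memory
   at a time. The recursive call is in tail position. *)
let count_in_channel c ic =
  let rec loop total =
    match input_line ic with
    | line -> loop (total + count_char c line)
    | exception End_of_file -> total
  in
  loop 0
```

A main program would then be something like `Printf.printf "%d\n" (count_in_channel 'A' stdin)`. Note also that the question compiles with ocamlc, which produces bytecode; building with ocamlopt instead usually makes a noticeable difference for loops like this.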
Suppose I am writing an OCaml program and my input will be a large stream of integers separated by spaces i.e.
let string = input_line stdin;;
will return a string which looks like e.g. "2 4 34 765 5 ..." Now, the program itself will take a further two values i and j which specify a small subsequence of this input on which the main procedure will take place (let's say that the main procedure is the find the maximum of this sublist). In other words, the whole stream will be inputted into the program but the program will only end up acting on a small subset of the input.
My question is: what is the best way to translate the relevant part of the input stream into something usable, i.e. a list of ints? One option would be to convert the whole input string into a list of ints using
let list = List.map int_of_string (Str.split (Str.regexp_string " ") string);;
and then once the bounds i and j have been entered one easily locates the relevant sublist and its maximum. The problem is that the initial pre-processing of the large stream is immensely time-consuming.
Is there an efficient way of locating the small sublist directly from the large stream i.e. processing the input along with the main procedure?
OCaml's standard library is rather small. It provides a necessary and sufficient set of orthogonal features, as any good standard library should. But this is usually not enough for a casual user, which is why there are libraries that cover the more common tasks.
I would like to mention the two most prominent ones: Jane Street's Core library and Batteries Included (aka Core and Batteries).
Both libraries provide a bunch of high-level I/O functions, but there is a small problem. It is not possible, or even reasonable, for a library to address every use case; otherwise its interface would not be terse and comprehensible. And your case is non-standard. There is a convention, a tacit agreement between data engineers, to represent a set of things as a set of lines in a file, with one "thing" (or feature) per line. So if you have a dataset where each element is a scalar, you should represent it as a sequence of scalars separated by newlines. Several elements on a single line are only for multidimensional features.
So, with a proper representation, your problem can be solved as simply as this (with Core):
open Core.Std
let () =
let filename = "data" in
let max_number =
let open In_channel in
with_file filename
~f:(fold_lines ~init:0
~f:(fun m s -> Int.(max m @@ of_string s))) in
printf "Max number in %s is %d\n" filename max_number
You can compile and run this program with corebuild test.byte --, assuming that the code is in a file named test.ml and the core library is installed (with opam install core if you're using opam).
There is also an excellent library, Lwt, that provides a monadic high-level interface to I/O. With this library, you can parse a set of scalars in the following way:
open Lwt
let program =
let filename = "data" in
let lines = Lwt_io.lines_of_file filename in
Lwt_stream.fold (fun s m -> max m @@ int_of_string s) lines 0 >>=
Lwt_io.printf "Max number in %s is %d\n" filename
let () = Lwt_main.run program
This program can be compiled and run with ocamlbuild -package lwt.unix test.byte --, if lwt library is installed on your system (opam install lwt).
So this is not to say that your problem cannot be solved (or is hard to solve) in OCaml, only that you should start with a proper representation. But suppose you do not own the representation and cannot change it. Let's look at how this can be solved efficiently in OCaml. As the previous examples show, your problem can in general be described as a channel fold, i.e. a sequential application of a function f to each value in a file. So we can define a function fold_channel that reads an integer value from a channel and applies a function to it and the previously accumulated value. Of course, this function can be abstracted further by lifting out the format argument, but for demonstration purposes this should be enough.
let rec fold_channel f init ic =
try Scanf.fscanf ic "%u " (fun s -> fold_channel f (f s init) ic)
with End_of_file -> init
let () =
let max_value = open_in "atad" |> fold_channel max 0 in
Printf.printf "max value is %u\n" max_value
I should note, though, that this implementation is not meant for heavy-duty work. It is not even tail-recursive. If you need a really efficient lexer, you can use OCaml's lexer generator, for example.
Update 1
Since the word "efficient" is in the title, and everybody likes benchmarks, I've decided to compare these three implementations. Of course, since the pure OCaml implementation is not tail-recursive, it is not comparable to the others. You may wonder why it is not tail-recursive, as all calls to fold_channel are in tail position. The problem is the exception handler: on each call to fold_channel we need to remember the init value, since we may have to return it, and the handler stays installed until the recursive call returns. This is a common issue with recursion and exceptions; you can search for more examples and explanations.
So, first we need to fix the third implementation. We will use a common trick with an option value.
let id x = x
let read_int ic =
try Some (Scanf.fscanf ic "%u " id) with End_of_file -> None
let rec fold_channel f init ic =
match read_int ic with
| Some s -> fold_channel f (f s init) ic
| None -> init
let () =
let max_value = open_in "atad" |> fold_channel max 0 in
Printf.printf "max value is %u\n" max_value
So, with the new tail-recursive implementation, let's try them all on big data. 100_000_000 numbers is big data for my 7-year-old laptop. I've also added a C implementation as a baseline, and an OCaml clone of the C implementation:
let () =
  let m = ref 0 in
  (* open the channel outside the try so the handler can still see ic *)
  let ic = open_in "atad" in
  try
    while true do
      let n = Scanf.fscanf ic "%d " (fun x -> x) in
      m := max n !m
    done
  with End_of_file ->
    Printf.printf "max value is %u\n" !m;
    close_in ic
Update 2
Yet another implementation, that uses ocamllex. It consists of two files, a lexer specification lex_int.mll
{}
let digit = ['0'-'9']
let space = [' ' '\t' '\n']*
rule next = parse
| eof {None}
| space {next lexbuf}
| digit+ as n {Some (int_of_string n)}
{}
And the implementation:
let rec fold_channel f init buf =
match Lex_int.next buf with
| Some s -> fold_channel f (f s init) buf
| None -> init
let () =
let max_value = open_in "atad" |>
Lexing.from_channel |>
fold_channel max 0 in
Printf.printf "max value is %u\n" max_value
And here are the results:
implementation   time    ratio   rate (MB/s)
plain C           22 s   1.0     12.5
ocamllex          33 s   1.5      8.4
Core              62 s   2.8      4.5
C-like OCaml      83 s   3.7      3.3
fold_channel      84 s   3.8      3.3
Lwt              143 s   6.5      1.9
P.S. You can see that in this particular case Lwt is an outlier. This doesn't mean that Lwt is slow; this workload is just not at the granularity it is designed for. And I would like to assure you that, in my experience, Lwt is a well-suited tool for HPC. For example, in one of my programs it processes a 30 MB/s network stream in real time.
Update 3
By the way, I've tried to address the problem in an abstract way, and I didn't provide a solution for your particular example (with i and j). Since folding is a generalization of iteration, it can easily be solved by extending the state (the init parameter) to hold a counter and checking whether the counter falls within the range specified by the user. But this leads to an interesting question: what to do once you have run past the range? Of course, you can continue to the end, just ignoring the remaining input. Or you can exit non-locally from the function with an exception, something like raise (Done m). The Core library provides such a facility with the with_return function, which allows you to break out of your computation at any point.
open Core.Std
let () =
let filename = "data" in
let b1,b2 = Int.(of_string Sys.argv.(1), of_string Sys.argv.(2)) in
let range = Interval.Int.create b1 b2 in
let _,max_number =
let open In_channel in
with_return begin fun call ->
with_file filename
~f:(fold_lines ~init:(0,0)
~f:(fun (i,m) s ->
match Interval.Int.compare_value range i with
| `Below -> i+1,m
| `Within -> i+1, Int.(max m @@ of_string s)
| `Above -> call.return (i,m)
| `Interval_is_empty -> failwith "empty interval"))
end in
printf "Max number in %s is %d\n" filename max_number
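The exception-based variant mentioned above, raise (Done m), can be sketched with the standard library alone. This is only an illustration under stated assumptions: the names `fold_ints` and `max_in_range` are made up here, positions are counted from 0 with both bounds inclusive, and the input is a `Scanf.Scanning.in_channel` of whitespace-separated integers on a single line or several.

```ocaml
(* Raised by the step function to leave the fold early with the result. *)
exception Done of int

(* Tail-recursive fold over whitespace-separated integers. The leading
   space in the format skips any blanks before the next number. *)
let rec fold_ints f acc ib =
  match Scanf.bscanf ib " %d" (fun x -> x) with
  | x -> fold_ints f (f acc x) ib
  | exception End_of_file -> acc

(* Maximum of the integers at positions lo..hi (0-based, inclusive).
   The state is a pair (position, maximum so far); once the position
   passes hi, the rest of the input is never read. *)
let max_in_range ib lo hi =
  let step (i, m) x =
    if i > hi then raise (Done m)
    else (i + 1, if i >= lo then max m x else m)
  in
  try snd (fold_ints step (0, min_int) ib) with Done m -> m
```

On standard input this would be called as `max_in_range Scanf.Scanning.stdin lo hi`. A token that is not an integer raises Scanf.Scan_failure, which this sketch does not handle.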
You may use the Scanf module's family of functions. For instance, Scanf.fscanf lets you read tokens from a channel according to a format string (which is a special type in OCaml).
Your program can be decomposed into two functions:
one which skips a number i of tokens from the input channel,
one which extracts the maximum integer out of a number j of tokens from a channel
Let's write these:
let rec skip_tokens c i =
match i with
| i when i > 0 -> Scanf.fscanf c "%s " (fun _ -> skip_tokens c @@ pred i)
| _ -> ()
let rec get_max c j m =
match j with
| j when j > 0 -> Scanf.fscanf c "%d " (fun x -> max m x |> get_max c (pred j))
| _ -> m
Note the space after the token format indicator in the string which tells the scanner to also swallow all the spaces and carriage returns in between tokens.
All you need to do now is combine them. Here's a small program you can run from the CLI which takes the i and j parameters, expects a stream of tokens, and prints out the maximum value as wanted:
let _ =
let i = int_of_string Sys.argv.(1)
and j = int_of_string Sys.argv.(2) in
skip_tokens stdin (pred i);
get_max stdin j min_int |> print_int;
print_newline ()
You could probably write more flexible combinators by extracting the recursive part out. I'll leave this as an exercise for the reader.
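One possible shape for that exercise, as a sketch rather than the answer's intended solution: the recursion is pulled out into a single helper, and skipping and taking the maximum become two instances of it. The names `fold_n`, `skip_n` and `max_of_n` are hypothetical, and the sketch reads from a `Scanf.Scanning.in_channel` with `Scanf.bscanf`.

```ocaml
(* Apply [f] to the next [n] values produced by [read], threading an
   accumulator through. This is the only recursive function. *)
let rec fold_n read n f acc =
  if n <= 0 then acc else fold_n read (n - 1) f (f acc (read ()))

(* Discard the next [n] whitespace-separated tokens from [ib]. *)
let skip_n ib n =
  fold_n (fun () -> Scanf.bscanf ib " %s" (fun s -> s)) n (fun () _ -> ()) ()

(* Maximum of the next [n] integers from [ib], or min_int if n <= 0. *)
let max_of_n ib n =
  fold_n (fun () -> Scanf.bscanf ib " %d" (fun x -> x)) n max min_int
```

With these, the main program above would reduce to something like `skip_n ib (pred i); print_int (max_of_n ib j)` with `ib` bound to `Scanf.Scanning.stdin`.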