Logistic Regression in OCaml - ocaml

I was trying to use Logistic regression in OCaml. I need to use it as a blackbox for another problem I'm solving. I found the following site:
http://math.umons.ac.be/anum/en/software/OCaml/Logistic_Regression/
I pasted the following code (with a few modifications - I defined my own iris_features and iris_label) from this site into a file named logistic_regression.ml:
open Scanf
open Format
open Bigarray
open Lacaml.D
let log_reg ?(lambda=0.1) x y =
(* [f_df] returns the value of the function to maximize and store
its gradient in [g]. *)
let f_df w g =
let s = ref 0. in
ignore(copy ~y:g w); (* g ← w *)
scal (-. lambda) g; (* g = -λ w *)
for i = 0 to Array.length x - 1 do
let yi = float y.(i) in
let e = exp(-. yi *. dot w x.(i)) in
s := !s +. log1p e;
axpy g ~alpha:(yi *. e /. (1. +. e)) ~x:x.(i);
done;
-. !s -. 0.5 *. lambda *. dot w w
in
let w = Vec.make0 (Vec.dim x.(0)) in
ignore(Lbfgs.F.max f_df w);
w
let iris_features = [1 ; 2 ; 3] ;;
let iris_labels = 2 ;;
let proba w x y = 1. /. (1. +. exp(-. float y *. dot w x))
let () =
let sol = log_reg iris_features iris_labels in
printf "w = %a\n" Lacaml.Io.pp_fvec sol;
let nwrongs = ref 0 in
for i = 0 to Array.length iris_features - 1 do
let p = proba sol iris_features.(i) iris_labels.(i) in
printf "Label = %i prob = %g => %s\n" iris_labels.(i) p
(if p > 0.5 then "correct" else (incr nwrongs; "wrong"))
done;
printf "Number of wrong labels: %i\n" !nwrongs
I have the following questions:
On trying to compile the code, I get the error message: "Error: Unbound module Lacaml". I've installed Lacaml, run opam init several times, and tried passing a flag -package = Lacaml, but I don't know how to solve this.
As you can see, I've defined my own version of iris_features and iris_labels. Are the types correct, i.e., in the function log_reg, is x of type int list and y of type int?

Both iris_features and iris_labels are arrays and array literals in OCaml are delimited with the [|, |] style parentheses, e.g.,
let iris_features = [|(* I don't know what to put here*)|]
let iris_labels = [|2|]
The iris_features array has type vec array, i.e., an array of vectors, not an array of integers. I didn't dig deep enough to know what values to put there, but the syntax is the following:
let iris_features =[|
Vec.of_list [1.; 2.; 3.;];
Vec.of_list [4.; 5.; 6.;];
|]
The Lacaml interface has changed a bit since the code was written, and axpy no longer accepts a labeled ~x argument (both the x and y vectors are positional now). So you need to remove ~x and fix the order. I presume that x.(i) is x in the a*x + y expression and that g corresponds to y, e.g.,
axpy ~alpha:(yi *. e /. (1. +. e)) x.(i) g;
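To make the argument order concrete, here is a minimal sketch of what axpy computes, written on plain float arrays using only the standard library. It is not Lacaml's implementation; the name axpy_ref is hypothetical and exists only for illustration.

```ocaml
(* Reference semantics of axpy on plain float arrays: y <- alpha * x + y.
   [axpy_ref] is a hypothetical name, not part of Lacaml. *)
let axpy_ref ?(alpha = 1.0) (x : float array) (y : float array) : unit =
  if Array.length x <> Array.length y then
    invalid_arg "axpy_ref: dimension mismatch";
  Array.iteri (fun i xi -> y.(i) <- alpha *. xi +. y.(i)) x

let () =
  let x = [| 1.; 2.; 3. |] in
  let g = [| 10.; 20.; 30. |] in
  axpy_ref ~alpha:2. x g;  (* g is updated in place *)
  Array.iter (Printf.printf "%g ") g;
  print_newline ()
```

The second positional argument plays the role of y and is the one that gets overwritten, which is why g comes last in the corrected call.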
This code also depends on lbfgs, so you need to install it as well,
opam depext --install lbfgs
I would suggest using dune as your default build system, but for fast prototyping you can use ocamlbuild. Put your code into an empty folder in a file named regress.ml (you can pick another name, just update the build instructions correspondingly). Now you can build it to a native executable with
ocamlbuild -pkg lacaml -pkg lbfgs regress.native
run it as
./regress.native
If you're playing in the OCaml toplevel (aka the interpreter), you can load lacaml and lbfgs using the following directives:
#use "topfind";;
#require "lacaml.top";;
#require "lbfgs";;
(The # is not a prompt but a part of the directive syntax, so don't forget to type it as well).
Now you can copy-paste your code into the interpreter and play with it.
Bonus Track - building with dune
create an empty folder and put regress.ml there.
remove open Bigarray and open Scanf as dune is very strict on warnings and turns them into errors (and it will warn you on those lines as they are, in fact, unused)
create the dune project
dune init exe regress --libs lacaml,lbfgs
build and run
dune exec ./regress.exe

Related

inconsistencies between the original expression and the result one in apron

I'm using the OCaml interface of the Apron library.
When I want to reduce the expression [| x + y - 2 >= 0; x + y - 3 >= 0 |], the result is [|-3 + 1 * x + 1 * y >= 0|]. How can I get the original expression x + y - 3 >= 0?
let _ =
let vx = Var.of_string "x" in
let vy = Var.of_string "y" in
let env = Environment.make [||] [|vx;vy|] in
let c = Texpr1.cst env (Coeff.s_of_int 2) in
let c' = Texpr1.cst env (Coeff.s_of_int 3) in
let vx' = Texpr1.var env vx in
let vy' = Texpr1.var env vy in
let texpr = Texpr1.binop Add vx' vy' Real Near in
let texpr1 = Texpr1.binop Sub texpr c Real Near in
let texpr2 = Texpr1.binop Sub texpr c' Real Near in
(* let sum' = Texpr1.(Binop(Sub,x2,Cst c,Int,Near)) in *)
Format.printf "env = %a@." (fun x -> Environment.print x) env;
Format.printf "expr = %a@." (fun x -> Texpr1.print x) texpr;
let cons1 = Tcons1.make texpr1 Lincons0.SUPEQ in
let cons2 = Tcons1.make texpr2 Lincons0.SUPEQ in
let tab = Tcons1.array_make env 2 in
Tcons1.array_set tab 0 cons1;
Tcons1.array_set tab 1 cons2;
let abs = Abstract1.of_tcons_array manpk env tab in
let tab' = Abstract1.to_tcons_array manpk abs in
Format.printf "tab = %a@." (fun x -> Tcons1.array_print x) tab;
Format.printf "tab1 = %a@." (fun x -> Tcons1.array_print x) tab'
It seems to me that there is no inconsistency as the expressions -3 + 1 * x + 1 * y >= 0 and x + y - 3 >= 0 are semantically equivalent.
Why is the expression printed this way?
You are building a polyhedron (I'm guessing manpk refers to the Polka manager), and even though it is built from tree constraints, it is represented internally using linear constraints. So when you convert it back to tree constraints, you are actually converting a Lincons1.earray to a Tcons1.earray, hence the representation as a sum of monomials.
If by "get the origin expression" you mean making Apron print it in a human-friendly way, I suggest you convert your polyhedron to a linear-constraint array (using to_lincons_array) and then define your own pretty-printing utility over linear constraints.
Alternatively, you can use the Apronext library, a small wrapper I wrote around the Apron library that provides pp_print functions. On your specific example, using Linconsext.pp_print, you get: x+y>=3. Disclaimer: Apronext is neither efficient, nor reliable, nor maintained, so I suggest you don't use it extensively, but only for understanding purposes.
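The "define your own pretty-printing utility" route could look roughly like the sketch below. To stay self-contained it does not touch Apron at all: the lincons record is a hypothetical, simplified stand-in for a linear constraint (integer coefficients, named variables), and you would have to fill it from whatever the real Lincons1 accessors give you.

```ocaml
(* A hypothetical, simplified stand-in for a linear constraint:
   sum of coeff * var, plus a constant, compared with 0.
   This is NOT Apron's Lincons1 type. *)
type cmp = Ge | Gt | Eq

type lincons = {
  terms : (int * string) list;  (* (coefficient, variable name) *)
  const : int;
  cmp : cmp;
}

let string_of_cmp = function Ge -> ">=" | Gt -> ">" | Eq -> "="

(* Render one monomial without its sign: "x", "2x". *)
let string_of_term (c, v) =
  let a = abs c in
  if a = 1 then v else string_of_int a ^ v

(* Print variables first, then the constant, dropping zero terms and
   unit coefficients, so -3 + 1*x + 1*y >= 0 becomes x + y - 3 >= 0. *)
let to_string { terms; const; cmp } =
  let buf = Buffer.create 32 in
  let add sign body =
    if Buffer.length buf = 0 then
      Buffer.add_string buf (if sign < 0 then "-" ^ body else body)
    else
      Buffer.add_string buf ((if sign < 0 then " - " else " + ") ^ body)
  in
  List.iter (fun (c, v) -> if c <> 0 then add c (string_of_term (c, v))) terms;
  if const <> 0 || Buffer.length buf = 0 then
    add const (string_of_int (abs const));
  Buffer.contents buf ^ " " ^ string_of_cmp cmp ^ " 0"

let () =
  print_endline
    (to_string { terms = [ (1, "x"); (1, "y") ]; const = -3; cmp = Ge })
```

The only design decision here is ordering: variables are emitted before the constant, which is what makes the output read like the expression that was originally written.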

ocamlbuild with Toploop/TopLevel

I'm looking to implement an eval function like in this answer here: https://stackoverflow.com/a/33293116/
However, when I go to compile my code sample:
let eval code =
let as_buf = Lexing.from_string code in
let parsed = !Toploop.parse_toplevel_phrase as_buf in
ignore (Toploop.execute_phrase true Format.std_formatter parsed)
let rec sum_until n =
if n = 0
then 0
else n + sum_until (n - 1);;
let a = print_string "Enter sum_until x where x = an int: "; read_line ();;
print_int eval a;;
with the following:
ocamlbuild UserInputEval.native -pkgs compiler-libs,compiler-libs.toplevel
I am getting the error:
File "_none_", line 1: Error: Cannot find file
/usr/lib/ocaml/compiler-libs/ocamltoplevel.cmxa Command exited with
code 2.
I have checked the compiler-libs directory and I don't have an ocamltoplevel.cmxa file but I do have an ocamltoplevel.cma file.
I'm wondering if this is a simple fix? I'm a bit new to ocaml so I'm not sure how to go about fixing this. Thanks!
The toplevel library is only available in bytecode mode:
ocamlbuild UserInputEval.byte -pkgs compiler-libs,compiler-libs.toplevel
Note also that the compiler-libs package may need to be installed separately (this is at least the case for archlinux).
Nevertheless, your code is probably not doing what you expect: you are only feeding the user input to the toplevel interpreter, without reading anything back from the toplevel state.
If you just want to read an integer, you can simply do:
let a = print_string "Enter sum_until x where x = an int: \n"; read_int ();;
print_int (sum_until a);;
without any need for compiler-libs.
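If the intent was to let the user type a call such as sum_until 5, one lightweight alternative to embedding the toplevel is to parse that single command shape by hand. This is only a sketch, assuming the input always has the form "sum_until <int>"; parse_command is a hypothetical helper name, and the input line is hard-coded where read_line () would normally go.

```ocaml
let rec sum_until n = if n = 0 then 0 else n + sum_until (n - 1)

(* Parse input of the assumed form "sum_until <int>".
   Returns None when the line does not match that shape. *)
let parse_command line =
  try Scanf.sscanf line " sum_until %d %!" (fun n -> Some n)
  with Scanf.Scan_failure _ | Failure _ | End_of_file -> None

let () =
  let line = "sum_until 4" in  (* stand-in for read_line () *)
  match parse_command line with
  | Some n when n >= 0 -> Printf.printf "%d\n" (sum_until n)
  | _ -> prerr_endline "expected: sum_until <non-negative int>"
```

This handles exactly one command and nothing else, which is the trade-off against the Toploop approach: no arbitrary code evaluation, but also no dependency on compiler-libs and no bytecode-only restriction.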

How to create a module OCaml and use it?

I'm new to this language/technology.
I have a simple question, but I cannot find an answer:
I would like to create my own module containing simple OCaml functions and definitions such as the following
let rec gcd (m, n) = if m = 0 then n
   else gcd (n mod m, n);;
 
let one = 1;;
let two = 2;;
and then use these functions from other OCaml programs.
Every OCaml source file forms a module named the same as the file (with first character upper case). So one way to do what you want is to have a file named (say) numtheory.ml:
$ cat numtheory.ml
let rec gcd (m, n) = if m = 0 then n
else gcd (n mod m, n)
let one = 1
let two = 2
This forms a module named Numtheory. You can compile it and link into projects. Or you can compile it and use it from the OCaml toplevel:
$ ocamlc -c numtheory.ml
$ ocaml
OCaml version 4.01.0
# #load "numtheory.cmo";;
# Numtheory.one;;
- : int = 1
# Numtheory.gcd (4, 8);;
- : int = 8
(For what it's worth, this doesn't look like the correct definition of gcd.)
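For reference, a version that follows Euclid's algorithm differs only in the second component of the recursive call: it should be m, not n. A minimal sketch, assuming non-negative arguments:

```ocaml
(* Euclid's algorithm: gcd (m, n) = gcd (n mod m, m), with gcd (0, n) = n.
   Assumes both arguments are non-negative. *)
let rec gcd (m, n) = if m = 0 then n else gcd (n mod m, m)

let () = Printf.printf "%d\n" (gcd (4, 8))
```

With this definition, gcd (4, 8) evaluates to 4 rather than 8.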

Efficient input in OCaml

Suppose I am writing an OCaml program and my input will be a large stream of integers separated by spaces i.e.
let string = input_line stdin;;
will return a string which looks like e.g. "2 4 34 765 5 ..." Now, the program itself will take a further two values i and j which specify a small subsequence of this input on which the main procedure will take place (let's say that the main procedure is the find the maximum of this sublist). In other words, the whole stream will be inputted into the program but the program will only end up acting on a small subset of the input.
My question is: what is the best way to translate the relevant part of the input stream into something usable, i.e. a list of ints? One option would be to convert the whole input string into a list of ints using
let list = List.map int_of_string (Str.split (Str.regexp_string " ") string);;
and then once the bounds i and j have been entered one easily locates the relevant sublist and its maximum. The problem is that the initial pre-processing of the large stream is immensely time-consuming.
Is there an efficient way of locating the small sublist directly from the large stream i.e. processing the input along with the main procedure?
OCaml's standard library is rather small. It provides a necessary and sufficient set of orthogonal features, as any good standard library should. But this is usually not enough for a casual user. That's why there are libraries that cover the more common tasks.
I would like to mention the two most prominent ones: Jane Street's Core library and Batteries Included (aka Core and Batteries).
Both libraries provide a bunch of high-level I/O functions, but there is a small problem. It is neither possible nor reasonable for a library to address every use case; otherwise its interface won't be terse and comprehensible. And your case is non-standard. There is a convention, a tacit agreement between data engineers, to represent a set of things as a set of lines in a file, with one "thing" (or feature) per line. So, if you have a dataset where each element is a scalar, you should represent it as a sequence of scalars separated by newlines. Several elements on a single line are only for multidimensional features.
So, with a proper representation, your problem can be solved as simply as this (with Core):
open Core.Std
let () =
let filename = "data" in
let max_number =
let open In_channel in
with_file filename
~f:(fold_lines ~init:0
~f:(fun m s -> Int.(max m @@ of_string s))) in
printf "Max number in %s is %d\n" filename max_number
You can compile and run this program with corebuild test.byte --, assuming that the code is in a file named test.ml and the core library is installed (with opam install core if you're using opam).
There is also an excellent library, Lwt, that provides a monadic high-level interface to I/O. With this library, you can parse a set of scalars in the following way:
open Lwt
let program =
let filename = "data" in
let lines = Lwt_io.lines_of_file filename in
Lwt_stream.fold (fun s m -> max m @@ int_of_string s) lines 0 >>=
Lwt_io.printf "Max number in %s is %d\n" filename
let () = Lwt_main.run program
This program can be compiled and run with ocamlbuild -package lwt.unix test.byte --, if lwt library is installed on your system (opam install lwt).
This is not to say that your problem cannot be solved (or is hard to solve) in OCaml; it is just to point out that you should start with a proper representation. But suppose you do not own the representation and cannot change it. Let's look at how this can be solved efficiently in OCaml. As the previous examples show, your problem can in general be described as a fold over a channel, i.e. a sequential application of a function f to each value in a file. So we can define a function fold_channel that reads an integer value from a channel and applies a function to it and the previously accumulated value. Of course, this function can be further abstracted by lifting the format argument, but for demonstration purposes this will be enough.
let rec fold_channel f init ic =
try Scanf.fscanf ic "%u " (fun s -> fold_channel f (f s init) ic)
with End_of_file -> init
let () =
let max_value = open_in "atad" |> fold_channel max 0 in
Printf.printf "max value is %u\n" max_value
I should note, though, that this implementation is not meant for heavy-duty work. It is not even tail-recursive. If you need a really efficient lexer, you can use OCaml's lexer generator, for example.
Update 1
Since the word "efficient" is in the title, and everybody likes benchmarks, I've decided to compare these three implementations. Of course, since the pure OCaml implementation is not tail-recursive, it is not comparable to the others. You may wonder why it is not tail-recursive, as every call to fold_channel looks like it is in tail position. The problem is the exception handler: the recursive call happens inside the try block, so on each call to fold_channel we need to keep the handler and remember the init value, since we may still have to return it. This is a common issue with recursion and exceptions; you can search for more examples and explanations.
So, first we need to fix the third implementation. We will use a common trick with an option value.
let id x = x
let read_int ic =
try Some (Scanf.fscanf ic "%u " id) with End_of_file -> None
let rec fold_channel f init ic =
match read_int ic with
| Some s -> fold_channel f (f s init) ic
| None -> init
let () =
let max_value = open_in "atad" |> fold_channel max 0 in
Printf.printf "max value is %u\n" max_value
So, with the new tail-recursive implementation, let's try them all on big data. 100_000_000 numbers is big data for my 7-year-old laptop. I've also added a C implementation as a baseline, and an OCaml clone of the C implementation:
let () =
  let m = ref 0 in
  (* open the channel outside [try] so the handler can still see [ic] *)
  let ic = open_in "atad" in
  try
    while true do
      let n = Scanf.fscanf ic "%d " (fun x -> x) in
      m := max n !m
    done
  with End_of_file ->
    Printf.printf "max value is %u\n" !m;
    close_in ic
Update 2
Yet another implementation, that uses ocamllex. It consists of two files, a lexer specification lex_int.mll
{}
let digit = ['0'-'9']
let space = [' ' '\t' '\n']*
rule next = parse
| eof {None}
| space {next lexbuf}
| digit+ as n {Some (int_of_string n)}
{}
And the implementation:
let rec fold_channel f init buf =
match Lex_int.next buf with
| Some s -> fold_channel f (f s init) buf
| None -> init
let () =
let max_value = open_in "atad" |>
Lexing.from_channel |>
fold_channel max 0 in
Printf.printf "max value is %u\n" max_value
And here are the results:
implementation   time    ratio   rate (MB/s)
plain C           22 s   1.0     12.5
ocamllex          33 s   1.5      8.4
Core              62 s   2.8      4.5
C-like OCaml      83 s   3.7      3.3
fold_channel      84 s   3.8      3.3
Lwt              143 s   6.5      1.9
P.S. You can see that in this particular case Lwt is an outlier. This doesn't mean that Lwt is slow; this is just not the granularity it is designed for. And I would like to assure you that, in my experience, Lwt is a tool well suited for HPC. For example, in one of my programs it processes a 30 MB/s network stream in real time.
Update 3
By the way, I've tried to address the problem in an abstract way, and I didn't provide a solution for your particular example (with i and j). Since folding is a generalization of iteration, this can easily be solved by extending the state (the init parameter) to hold a counter and checking whether it falls within the range specified by the user. But this leads to an interesting question: what should you do once you have run past the range? Of course, you can continue to the end, just ignoring the remaining input. Or you can exit from the function non-locally with an exception, something like raise (Done m). The Core library provides such a facility with the with_return function, which allows you to break out of your computation at any point.
open Core.Std
let () =
let filename = "data" in
let b1,b2 = Int.(of_string Sys.argv.(1), of_string Sys.argv.(2)) in
let range = Interval.Int.create b1 b2 in
let _,max_number =
let open In_channel in
with_return begin fun call ->
with_file filename
~f:(fold_lines ~init:(0,0)
~f:(fun (i,m) s ->
match Interval.Int.compare_value range i with
| `Below -> i+1,m
| `Within -> i+1, Int.(max m @@ of_string s)
| `Above -> call.return (i,m)
| `Interval_is_empty -> failwith "empty interval"))
end in
printf "Max number in %s is %d\n" filename max_number
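The raise (Done m) variant mentioned above can also be sketched with the standard library alone, on the original space-separated format. The names fold_ints and max_in_range are hypothetical, positions are assumed to be 0-based and inclusive, and the input comes from a string here only to keep the example self-contained (Scanf.Scanning.stdin would be used for real input).

```ocaml
(* Raised by the folding function to stop reading early. *)
exception Done of int

(* Tail-recursive fold over whitespace-separated integers.
   The recursive call sits outside the exception handler, so the
   stack does not grow with the input. *)
let rec fold_ints f acc ib =
  match Scanf.bscanf ib " %d" (fun x -> x) with
  | x -> fold_ints f (f acc x) ib
  | exception End_of_file -> acc

(* Maximum of the tokens at positions lo..hi (0-based, inclusive). *)
let max_in_range lo hi ib =
  let step (i, m) x =
    if i > hi then raise (Done m)  (* past the range: stop reading *)
    else if i >= lo then (i + 1, max m x)
    else (i + 1, m)
  in
  try snd (fold_ints step (0, min_int) ib) with Done m -> m

let () =
  let ib = Scanf.Scanning.from_string "2 4 34 765 5 9" in
  Printf.printf "%d\n" (max_in_range 1 2 ib)
```

Note that this sketch reads one token beyond the end of the range before it notices that it can stop; the rest of the stream is never parsed.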
You may use the Scanf family of functions. For instance, Scanf.fscanf lets you read tokens from a channel according to a format string (which is a special type in OCaml).
Your program can be decomposed into two functions:
one which skips a number i of tokens from the input channel,
one which extracts the maximum integer out of the next j tokens from a channel
Let's write these:
let rec skip_tokens c i =
match i with
| i when i > 0 -> Scanf.fscanf c "%s " (fun _ -> skip_tokens c @@ pred i)
| _ -> ()
let rec get_max c j m =
match j with
| j when j > 0 -> Scanf.fscanf c "%d " (fun x -> max m x |> get_max c (pred j))
| _ -> m
Note the space after the token format indicator in the string, which tells the scanner to also swallow all the spaces and newlines between tokens.
All you need to do now is combine them. Here's a small program you can run from the CLI which takes the i and j parameters, expects a stream of tokens on stdin, and prints out the maximum value as wanted:
let _ =
let i = int_of_string Sys.argv.(1)
and j = int_of_string Sys.argv.(2) in
skip_tokens stdin (pred i);
get_max stdin j min_int |> print_int;
print_newline ()
You could probably write more flexible combinators by extracting the recursive part out. I'll leave this as an exercise for the reader.

How to use List.nth inside a function

I am new to OCaml. I am trying to use List.nth just like List.length but it keeps giving me a syntax error or complains about not matching the interface defined in another file. Everything seems to work fine if I comment out using List.nth
Thanks
It's hard to help unless you show the code that's not working. Here is a session that uses List.nth:
$ ocaml
OCaml version 4.00.0
# let x = [3;5;7;9];;
val x : int list = [3; 5; 7; 9]
# List.nth x 2;;
- : int = 7
#
Here's a session that defines a function that uses List.nth. (There's nothing special about this.)
# let name_of_day k =
List.nth ["Mon";"Tue";"Wed";"Thu";"Fri";"Sat";"Sun"] k;;
val name_of_day : int -> string = <fun>
# name_of_day 3;;
- : string = "Thu"
# 
(As a side comment: using List.nth is often inappropriate. It takes time proportional to n to find the nth element of a list. People just starting with OCaml often think of it like accessing an array--i.e., constant time--but it's not.)
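When indexed access is really what you need, an array is usually the better fit. A minimal sketch of the same function on an array, with an explicit bounds check so that an invalid index yields None instead of raising (returning an option is a choice made for this example, not something the original function did):

```ocaml
(* An array gives constant-time indexing. *)
let days = [| "Mon"; "Tue"; "Wed"; "Thu"; "Fri"; "Sat"; "Sun" |]

(* Returns None for an out-of-range day number. *)
let name_of_day k =
  if k >= 0 && k < Array.length days then Some days.(k) else None

let () =
  match name_of_day 3 with
  | Some d -> print_endline d
  | None -> print_endline "no such day"
```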