Printing abstract syntax tree using ppx_deriving - ocaml

Could someone please tell me why this code is not compiling? I am trying to print the abstract syntax tree using the ppx_deriving library.
type prog = command list
[@@deriving show]
and command =
  | Incv | Decv
  | Incp | Decp
  | Input | Output
  | Loop of command list
[@@deriving show]
let _ = Format.printf "%s" (show_prog ([Incv, Incv]))
hello:brainfuckinter mukeshtiwari$ ocamlbuild -package ppx_deriving.std ast.byte
+ /Users/mukeshtiwari/.opam/4.02.1/bin/ocamlc.opt -c -I /Users/mukeshtiwari/.opam/4.02.1/lib/ppx_deriving -o ast.cmo ast.ml
File "ast.ml", line 10, characters 28-37:
Error: Unbound value show_prog
Command exited with code 2.
Compilation unsuccessful after building 2 targets (1 cached) in 00:00:00.
hello:brainfuckinter mukeshtiwari$ ocaml
OCaml version 4.02.1

Add -use-ocamlfind as the first argument to ocamlbuild. That should solve the issue.
(You also have a typo in [Incv, Incv]: the , should be a ;.)
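For reference, here is a dependency-free sketch of roughly what the derived printers compute, with the list literal written using ; as noted above. It is hand-written, not the code ppx_deriving generates: the names show_command and show_prog mirror the ones [@@deriving show] would produce, but the exact output format of the real plugin may differ (for instance, it may qualify constructor names with the module path).

```ocaml
(* Hand-written stand-in for [@@deriving show]; an illustration only. *)
type command =
  | Incv | Decv
  | Incp | Decp
  | Input | Output
  | Loop of command list

type prog = command list

let rec show_command = function
  | Incv -> "Incv"
  | Decv -> "Decv"
  | Incp -> "Incp"
  | Decp -> "Decp"
  | Input -> "Input"
  | Output -> "Output"
  | Loop body -> "Loop " ^ show_prog body

(* A program is printed as an OCaml-style list literal. *)
and show_prog (p : prog) =
  "[" ^ String.concat "; " (List.map show_command p) ^ "]"

let () = print_endline (show_prog [Incv; Incv; Loop [Decv]])
```

With [Incv, Incv] instead of [Incv; Incv], the list would contain a single pair rather than two commands, which is why the comma is rejected by the type checker.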

Related

Why are rational numbers from Num printed as <abstr>?

I am continuing my exploration of OCaml's Num library, because a whole library about logics was written using it.
Today, I would like to make the negative of a rational number. Obtain -1/2, from 1/2.
To do so, I think that, given an a of type Ratio.ratio, I can compute the negative of it (and return a ratio, not a num) this way:
ratio_of_num (minus_num (num_of_ratio a))
(Functions from: https://ocaml.org/releases/4.05/htmlman/libref/Num.html#TYPEnum)
Now, I would like to check the result, but I always get this output: Ratio.ratio = <abstr>
I now realize that I always get this output when I use ratio_of_num. For instance:
ratio_of_num (Int 2);;
- : Ratio.ratio = <abstr>
I have searched a bit and found this question (OCaml toplevel output formatting) where a different function (ratio_of_int 2) was used, but seems no longer possible. Maybe that ratio is a different library.
Any help?
PS: By the way, in order to replace num in the future, I am trying to install Zarith with opam, but cannot.
My problem is I do opam install zarith and this is displayed:
┌─ The following actions failed
│ λ build conf-gmp 3
└─
╶─ No changes have been performed
The packages you requested declare the following system dependencies. Please
make sure they are installed before retrying:
gmp
So I do opam install gmp and I get:
┌─ The following actions failed
│ λ build gmp 6.2.1
└─
╶─ No changes have been performed
Which offers me no clue on how to continue trying. Any help with this also?
I would appreciate any answer whether for the first question or the second one!!
Below, I post some edits that have been added to the question as a result of the conversation below:
EDIT (Solved adding the needed #require)
I have done what @ivg has suggested, but it still does not work (I do the initial open Num, because it will ask for it otherwise):
─( 23:12:59 )─< command 0 >──────────────────────────────────────{ counter: 0 }─
utop # open Num;;
─( 23:13:00 )─< command 1 >──────────────────────────────────────{ counter: 0 }─
utop # let pp_num ppf x = Format.fprintf ppf "%s" (Num.string_of_num x);;
val pp_num : Format.formatter -> num -> unit = <fun>
─( 23:14:11 )─< command 2 >──────────────────────────────────────{ counter: 0 }─
utop # #install_printer pp_num;;
─( 23:14:16 )─< command 3 >──────────────────────────────────────{ counter: 0 }─
utop # ratio_of_num (Int 2);;
- : Ratio.ratio = <abstr>
EDIT 2 (Also needed a #require)
I have also tried the plain OCaml toplevel instead of utop, but the error is worse:
OCaml version 4.10.2
Findlib has been successfully loaded. Additional directives:
#require "package";; to load a package
#list;; to list the available packages
#camlp4o;; to load camlp4 (standard syntax)
#camlp4r;; to load camlp4 (revised syntax)
#predicates "p,q,...";; to set these predicates
Topfind.reset();; to force that packages will be reloaded
#thread;; to enable threads
# open Num;;
# let pp_num ppf x = Format.fprintf ppf "%s" (Num.string_of_num x);;
Error: Reference to undefined global `Num'
#
EDIT 3 (Works in OCaml, instead of utop)
# #require "num";;
# let pp_num ppf x = Format.fprintf ppf "%s" (Num.string_of_num x);;
val pp_num : Format.formatter -> Num.num -> unit = <fun>
# #install_printer pp_num;;
# ratio_of_num (Int 2);;
- : Ratio.ratio = <ratio 2/1>
#
EDIT 4 (Works in utop, note that printing simplifies the result when it is an integer)
utop # let pp_ratio ppf r = Format.fprintf ppf "%a" pp_num (num_of_ratio r);;
val pp_ratio : Format.formatter -> Ratio.ratio -> unit = <fun>
─( 23:28:07 )─< command 6 >──────────────────────────────────────{ counter: 0 }─
utop # #install_printer pp_ratio;;
─( 23:28:22 )─< command 7 >──────────────────────────────────────{ counter: 0 }─
utop # ratio_of_num (Int 2);;
- : Ratio.ratio = 2
─( 23:28:29 )─< command 8 >──────────────────────────────────────{ counter: 0 }─
utop #
The reason why you have <abstr> instead of the actual representation is that the top-level (aka interpreter) doesn't know how to print the num object. It is easy to teach the top-level, using the #install_printer directive, e.g.,
# let pp_num ppf x = Format.fprintf ppf "%s" (Num.string_of_num x);;
val pp_num : Format.formatter -> Num.num -> unit = <fun>
# #install_printer pp_num;;
# ratio_of_num (Int 2);;
- : Ratio.ratio = <ratio 2/1>
#
So we defined the pretty-printing function,
let pp_num ppf x = Format.fprintf ppf "%s" (Num.string_of_num x)
And then used the #install_printer directive to install it in the top-level,
# #install_printer pp_num;;
and now every time we have num it will be printed for us.
You can also use this pp_num function together with other Format module functions (that are used for pretty printing), e.g.,
Format.printf "my num = %a" pp_num (ratio_of_num (Int 2))
It might be that an older version of OCaml is unable to work out how to print a ratio from the num printer alone, so we can help it by defining an additional printer,
# let pp_ratio ppf r = Format.fprintf ppf "%a" pp_num (num_of_ratio r);;
val pp_ratio : Format.formatter -> Ratio.ratio -> unit = <fun>
# #install_printer pp_ratio;;
# ratio_of_num (Int 2);;
- : Ratio.ratio = 2
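The same %a mechanism works outside the toplevel too. Below is a small sketch that runs without the num library: the record type rat, and the helpers pp_rat, neg and show_rat, are hypothetical stand-ins invented for this illustration (they are not part of Num or Ratio). It shows a printer with the Format.formatter -> t -> unit shape that #install_printer expects, used to check a negated value.

```ocaml
(* Toy rational type, a hypothetical stand-in for Ratio.ratio. *)
type rat = { num : int; den : int }

(* Printer with the signature expected by %a and by #install_printer. *)
let pp_rat ppf r =
  if r.den = 1 then Format.fprintf ppf "%d" r.num
  else Format.fprintf ppf "%d/%d" r.num r.den

(* Negate by flipping the sign of the numerator. *)
let neg r = { r with num = - r.num }

(* Format.asprintf turns any %a printer into a string-returning function. *)
let show_rat r = Format.asprintf "%a" pp_rat r

let () = Format.printf "negated = %a@." pp_rat (neg { num = 1; den = 2 })
```

As in EDIT 4 above, a value with denominator 1 is printed as a plain integer by this printer.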
Re: P.S.
For zarith you need to install system dependencies. You can use opam for that, e.g.,
opam depext --install zarith
This installs the system dependencies (the gmp library) using your operating system's package manager and then installs the zarith library.

How to make Spark session read all the files recursively?

Displaying the directories under which JSON files are stored:
$ tree -d try/
try/
├── 10thOct_logs1
├── 11thOct
│   └── logs2
└── Oct
    └── 12th
        └── logs3
The task is to read all logs using SparkSession.
Is there an elegant way to read through all the files in the directories and their sub-directories recursively?
The few commands that I tried are prone to unintentionally excluding files.
spark.read.json("file:///var/foo/try/<exp>")
+----------+---+-----+-------+
| <exp> -> | * | */* | */*/* |
+----------+---+-----+-------+
| logs1    | y | y   | n     |
| logs2    | n | y   | y     |
| logs3    | n | n   | y     |
+----------+---+-----+-------+
You can see in the above table that none of the three expressions matches all the directories (located at 3 different depths) at the same time. Frankly speaking, I wasn't expecting the exclusion of 10thOct_logs1 while using the third expression */*/*.
This makes me conclude that only the files or directories whose path matches the expression following the last / are treated as a match, and everything else is ignored.
Update
A new option, recursiveFileLookup, was introduced in Spark 3 to read from nested folders:
spark.read.option("recursiveFileLookup", "true").json("file:///var/foo/try")
For older versions, alternatively, you can use Hadoop listFiles to list recursively all the file paths and then pass them to Spark read:
import org.apache.hadoop.fs.Path
import org.apache.spark.sql.functions.input_file_name
val conf = sc.hadoopConfiguration
// get all file paths (the second argument of listFiles enables recursion)
val fromFolder = new Path("file:///var/foo/try/")
val logfiles = fromFolder.getFileSystem(conf).listFiles(fromFolder, true)
var files = Seq[String]()
while (logfiles.hasNext) {
  // one can filter here some specific files
  files = files :+ logfiles.next().getPath().toString
}
// read multiple paths
val df = spark.read.csv(files: _*)
df.select(input_file_name()).distinct().show(false)
+-------------------------------------+
|input_file_name() |
+-------------------------------------+
|file:///var/foo/try/11thOct/log2.csv |
|file:///var/foo/try/10thOct_logs1.csv|
|file:///var/foo/try/Oct/12th/log3.csv|
+-------------------------------------+
Unfortunately, Hadoop globs do not support recursive globs. See Querying the Filesystem#File Patterns
There is an option to list multiple globs, one for each directory level:
{a,b} (alternation): matches either expression a or b
You have to be careful not to match the same file twice, otherwise it will appear as a duplicate.
spark.read.json("./try/{*logs*,*/*logs*,*/*/*logs*}")
You can also load multiple dataframes and union them:
val dfs = List(
  spark.read.json("./try/*logs*"),
  spark.read.json("./try/*/*logs*"),
  spark.read.json("./try/*/*/*logs*")
)
val df = dfs.reduce(_ union _)
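When the maximum depth is known, the brace alternation pattern can be generated instead of typed by hand. A minimal sketch in OCaml, used here only to build the pattern string (the helper name depth_glob and its parameters are hypothetical; the resulting string would still be passed to spark.read.json as above):

```ocaml
(* Build a brace alternation with one alternative per directory depth,
   from depth 0 up to max_depth - 1. The name depth_glob is hypothetical. *)
let depth_glob ~leaf ~max_depth =
  (* Alternative for depth d: d wildcard directories, then the leaf. *)
  let level d = String.concat "/" (List.init d (fun _ -> "*") @ [leaf]) in
  "{" ^ String.concat "," (List.init max_depth level) ^ "}"

let () = print_endline ("./try/" ^ depth_glob ~leaf:"*logs*" ~max_depth:3)
```

Each alternative has a fixed number of path components, so a given path can match at most one of them, which avoids the duplicate problem mentioned above.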

Haskell Regex performance

I've been looking at the existing options for regex in Haskell, and I wanted to understand where the gap in performance came from when comparing the various options with each other and especially with a simple call to grep...
I have a relatively small (~ 110M, compared to a usual several 10s of G in most of my use cases) trace file :
$ du radixtracefile
113120 radixtracefile
$ wc -l radixtracefile
1051565 radixtracefile
I first tried to find how many matches of the (arbitrary) pattern .*504.*ll were in there through grep :
$ time grep -nE ".*504.*ll" radixtracefile | wc -l
309
real 0m0.211s
user 0m0.202s
sys 0m0.010s
I looked at Text.Regex.TDFA (version 1.2.1) with Data.ByteString :
import Control.Monad.Loops
import Data.Maybe
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import Text.Regex.TDFA
import qualified Data.ByteString as B
main = do
  f <- B.readFile "radixtracefile"
  matches :: [[B.ByteString]] <- f =~~ ".*504.*ll"
  mapM_ (putStrLn . show . head) matches
Building and running :
$ ghc -O2 test-TDFA.hs -XScopedTypeVariables
[1 of 1] Compiling Main ( test-TDFA.hs, test-TDFA.o )
Linking test-TDFA ...
$ time ./test-TDFA | wc -l
309
real 0m4.463s
user 0m4.431s
sys 0m0.036s
Then, I looked at Data.Text.ICU.Regex (version 0.7.0.1) with Unicode support:
import Control.Monad.Loops
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import Data.Text.ICU.Regex
main = do
  re <- regex [] $ T.pack ".*504.*ll"
  f <- TIO.readFile "radixtracefile"
  setText re f
  whileM_ (findNext re) $ do
    a <- start re 0
    putStrLn $ "last match at :"++(show a)
Building and running :
$ ghc -O2 test-ICU.hs
[1 of 1] Compiling Main ( test-ICU.hs, test-ICU.o )
Linking test-ICU ...
$ time ./test-ICU | wc -l
309
real 1m36.407s
user 1m36.090s
sys 0m0.169s
I use ghc version 7.6.3. I haven't had the occasion to test other Haskell regex options. I knew that I would not get the performance that I had with grep and was more than happy with that, but roughly 20 times slower for TDFA with ByteString... That is very scary. And I can't really understand why, as I naively thought that this was a wrapper around a native backend... Am I somehow not using the module correctly?
(And let's not mention the ICU + Text combo which is going through the roof)
Is there an option that I haven't tested yet that would make me happier ?
EDIT :
Text.Regex.PCRE (version 0.94.4) with Data.ByteString :
import Control.Monad.Loops
import Data.Maybe
import Text.Regex.PCRE
import qualified Data.ByteString as B
main = do
  f <- B.readFile "radixtracefile"
  matches :: [[B.ByteString]] <- f =~~ ".*504.*ll"
  mapM_ (putStrLn . show . head) matches
Building and running :
$ ghc -O2 test-PCRE.hs -XScopedTypeVariables
[1 of 1] Compiling Main ( test-PCRE.hs, test-PCRE.o )
Linking test-PCRE ...
$ time ./test-PCRE | wc -l
309
real 0m1.442s
user 0m1.412s
sys 0m0.031s
Better, but still with a factor of ~7-ish ...
So, after looking at other libraries for a bit, I ended up trying PCRE.Light (version 0.4.0.4):
import Control.Monad
import Text.Regex.PCRE.Light
import qualified Data.ByteString.Char8 as B
main = do
  f <- B.readFile "radixtracefile"
  let lines = B.split '\n' f
  let re = compile (B.pack ".*504.*ll") []
  forM_ lines $ \l -> maybe (return ()) print $ match re l []
Here is what I get out of that :
$ ghc -O2 test-PCRELight.hs -XScopedTypeVariables
[1 of 1] Compiling Main ( test-PCRELight.hs, test-PCRELight.o )
Linking test-PCRELight ...
$ time ./test-PCRELight | wc -l
309
real 0m0.832s
user 0m0.803s
sys 0m0.027s
I think this is decent enough for my purposes. I might try to see what happens with the other libs when I manually do the line splitting like I did here, although I doubt it's going to make a big difference.

How to take the output of Sys.command as string in OCaml?

In OCaml, I have this piece of code:
let s =Sys.command ("minisat test.txt | grep 'SATIS' ");;
I want to take the output of minisat test.txt | grep "SATIS" , which is SATISFIABLE/UNSATISFIABLE to the string s.
I am getting the following output:
SATISFIABLE
val s : int = 0
So, how can I capture the output of this command as a string?
Also, is it possible to even import time?
This is the output I get when I try minisat test.txt in terminal
WARNING: for repeatability, setting FPU to use double precision
============================[ Problem Statistics ]=============================
| |
| Number of variables: 5 |
| Number of clauses: 3 |
| Parse time: 0.00 s |
| Eliminated clauses: 0.00 Mb |
| Simplification time: 0.00 s |
| |
============================[ Search Statistics ]==============================
| Conflicts | ORIGINAL | LEARNT | Progress |
| | Vars Clauses Literals | Limit Clauses Lit/Cl | |
===============================================================================
===============================================================================
restarts : 1
conflicts : 0 (-nan /sec)
decisions : 1 (0.00 % random) (inf /sec)
propagations : 0 (-nan /sec)
conflict literals : 0 (-nan % deleted)
Memory used : 8.00 MB
CPU time : 0 s
SATISFIABLE
If you use just Sys, you can't.
However, you can create a temporary file (see the Filename module's documentation) and have the command write its output to it:
let string_of_command () =
  let tmp_file = Filename.temp_file "" ".txt" in
  let _ = Sys.command @@ "minisat test.txt | grep 'SATIS' >" ^ tmp_file in
  let chan = open_in tmp_file in
  let s = input_line chan in
  close_in chan;
  s
Note that this function is only a rough draft: you have to handle potential errors properly. Anyway, you can adapt it to your needs, I guess.
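One possible way to tighten the temporary-file approach, as a sketch only: the function name first_line_of_command is hypothetical, it takes the shell command as a parameter, and it assumes OCaml 4.08 or later for Fun.protect. It removes the temporary file afterwards, returns the exit status instead of discarding it, and returns None rather than raising when the command prints nothing.

```ocaml
(* Run a shell command, redirecting its standard output to a temporary
   file, and return (exit status, first output line if any). *)
let first_line_of_command cmd =
  let tmp_file = Filename.temp_file "cmd" ".txt" in
  Fun.protect
    (* Always delete the temporary file, even if something fails. *)
    ~finally:(fun () -> try Sys.remove tmp_file with Sys_error _ -> ())
    (fun () ->
      let status = Sys.command (cmd ^ " > " ^ Filename.quote tmp_file) in
      let chan = open_in tmp_file in
      let first_line =
        Fun.protect
          ~finally:(fun () -> close_in_noerr chan)
          (* An empty file means the command produced no output. *)
          (fun () -> try Some (input_line chan) with End_of_file -> None)
      in
      (status, first_line))
```

It would be called as first_line_of_command "minisat test.txt | grep 'SATIS'". For a pipeline, the status is the one the shell reports for the whole pipeline, which is normally that of the last command.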
You can avoid the temporary file trick by using the Unix library or more advanced libraries.
You have to use Unix.open_process_in or Unix.create_process, if you want to capture the output.
Or better use a higher level wrapper like 'shell' (from ocamlnet):
http://projects.camlcity.org/projects/dl/ocamlnet-4.0.2/doc/html-main/Shell_intro.html
But I wouldn't pipe it to grep (not portable). Parse the output with your favorite regex library inside OCaml.
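A minimal sketch of the Unix.open_process_in route without grep, assuming the unix library that ships with OCaml is linked and that the verdict appears on its own line of standard output, as in the transcript in the question. The helper names read_lines and verdict are made up for this example.

```ocaml
(* Needs the unix library (for example via ocamlfind with -package unix). *)

(* Run a shell command and collect its standard output line by line. *)
let read_lines cmd =
  let ic = Unix.open_process_in cmd in
  let rec loop acc =
    match input_line ic with
    | line -> loop (line :: acc)
    | exception End_of_file -> List.rev acc
  in
  let lines = loop [] in
  (* close_process_in waits for the command and returns its status. *)
  let status = Unix.close_process_in ic in
  (status, lines)

(* Find the verdict line; exact comparison keeps the two verdicts apart. *)
let verdict lines =
  List.find_opt
    (fun l -> l = "SATISFIABLE" || l = "UNSATISFIABLE")
    (List.map String.trim lines)
```

Usage would look like verdict (snd (read_lines "minisat test.txt")), which yields Some "SATISFIABLE", Some "UNSATISFIABLE", or None if no verdict line was found. Since all output lines are kept, other fields such as the CPU time line could be extracted from the same list.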

OCaml compile error with ocamlfind

Here is the code:
class parser =
  let test1 = function
    | 1 -> print_int 1
    | 2 -> print_int 2
    | _ -> print_int 3 in
  let test = function
    | 1 -> print_int 1
    | 2 -> print_int 2
    | _ -> print_int 3 in
  object(self)
  end
Here is the _tags
true: syntax(camlp4o)
true: package(deriving,deriving.syntax)
true: thread,debug,annot
true: bin_annot
Here is the compile command:
ocamlbuild -use-ocamlfind test.native
Here is the compile error:
Warning: tag "package" does not expect a parameter, but is used with parameter "deriving,deriving.syntax"
Warning: tag "syntax" does not expect a parameter, but is used with parameter "camlp4o"
+ /usr/local/bin/ocamldep.opt -modules test.ml > test.ml.depends
File "test.ml", line 8, characters 0-3:
Error: Syntax error
Command exited with code 2.
Compilation unsuccessful after building 1 target (0 cached) in 00:00:00.
However, when I use this:
ocamlbuild test.native
Then the code can be successfully compiled...
This is because ocamlbuild -use-ocamlfind test.native directs the compiler to use the camlp4 parser, which differs slightly from the standard OCaml parser. In particular, parser is a keyword in camlp4, so you can't use it as a class name. Just rename it.
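For illustration, a renamed variant of the class might look like the sketch below. The name my_parser and the method show are arbitrary choices for this example; any identifier that is not a camlp4 keyword should do. The local function returns a string instead of printing so that it is actually used by the object.

```ocaml
(* Same shape as the class in the question, with a non-keyword name. *)
class my_parser =
  let describe = function
    | 1 -> "1"
    | 2 -> "2"
    | _ -> "3" in
  object
    (* Expose the local function so that the binding is not unused. *)
    method show (n : int) : string = describe n
  end

let () = print_endline ((new my_parser)#show 2)
```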