Why are rational numbers from Num printed as <abstr>? - ocaml

I am continuing my exploration of OCaml's Num library, because a whole library about logic was written using it.
Today, I would like to negate a rational number: obtain -1/2 from 1/2.
To do so, I think that, given an a of type Ratio.ratio, I can compute its negative (and return a ratio, not a num) this way:
ratio_of_num (minus_num (num_of_ratio a))
(Functions from: https://ocaml.org/releases/4.05/htmlman/libref/Num.html#TYPEnum)
Now, I would like to check the result, but I always get this output: Ratio.ratio = <abstr>
In fact, I now realize that I always get this output when I use ratio_of_num. For instance:
ratio_of_num (Int 2);;
- : Ratio.ratio = <abstr>
I have searched a bit and found this question (OCaml toplevel output formatting) where a different function (ratio_of_int 2) was used, but that no longer seems possible. Maybe that ratio comes from a different library.
Any help?
PS: By the way, in order to replace num in the future, I am trying to install Zarith with opam, but cannot.
My problem is I do opam install zarith and this is displayed:
┌─ The following actions failed
│ λ build conf-gmp 3
└─
╶─ No changes have been performed
The packages you requested declare the following system dependencies. Please
make sure they are installed before retrying:
gmp
So I do opam install gmp and I get:
┌─ The following actions failed
│ λ build gmp 6.2.1
└─
╶─ No changes have been performed
Which offers me no clue on how to continue trying. Any help with this also?
I would appreciate any answer whether for the first question or the second one!!
Below, I post some edits that have been added to the question as a result of the conversation below:
EDIT (Solved adding the needed #require)
I have done what @ivg suggested, but it still does not work (I do the initial open Num, because it is required otherwise):
─( 23:12:59 )─< command 0 >──────────────────────────────────────{ counter: 0 }─
utop # open Num;;
─( 23:13:00 )─< command 1 >──────────────────────────────────────{ counter: 0 }─
utop # let pp_num ppf x = Format.fprintf ppf "%s" (Num.string_of_num x);;
val pp_num : Format.formatter -> num -> unit = <fun>
─( 23:14:11 )─< command 2 >──────────────────────────────────────{ counter: 0 }─
utop # #install_printer pp_num;;
─( 23:14:16 )─< command 3 >──────────────────────────────────────{ counter: 0 }─
utop # ratio_of_num (Int 2);;
- : Ratio.ratio = <abstr>
EDIT 2 (Also needed a #require)
I have also tried the plain OCaml top-level instead of utop, but the error is worse:
OCaml version 4.10.2
Findlib has been successfully loaded. Additional directives:
#require "package";; to load a package
#list;; to list the available packages
#camlp4o;; to load camlp4 (standard syntax)
#camlp4r;; to load camlp4 (revised syntax)
#predicates "p,q,...";; to set these predicates
Topfind.reset();; to force that packages will be reloaded
#thread;; to enable threads
# open Num;;
# let pp_num ppf x = Format.fprintf ppf "%s" (Num.string_of_num x);;
Error: Reference to undefined global `Num'
#
EDIT 3 (Works in OCaml, instead of utop)
# #require "num";;
# let pp_num ppf x = Format.fprintf ppf "%s" (Num.string_of_num x);;
val pp_num : Format.formatter -> Num.num -> unit = <fun>
# #install_printer pp_num;;
# ratio_of_num (Int 2);;
- : Ratio.ratio = <ratio 2/1>
#
EDIT 4 (Works in utop, note that printing simplifies the result when it is an integer)
utop # let pp_ratio ppf r = Format.fprintf ppf "%a" pp_num (num_of_ratio r);;
val pp_ratio : Format.formatter -> Ratio.ratio -> unit = <fun>
─( 23:28:07 )─< command 6 >──────────────────────────────────────{ counter: 0 }─
utop # #install_printer pp_ratio;;
─( 23:28:22 )─< command 7 >──────────────────────────────────────{ counter: 0 }─
utop # ratio_of_num (Int 2);;
- : Ratio.ratio = 2
─( 23:28:29 )─< command 8 >──────────────────────────────────────{ counter: 0 }─
utop #

The reason why you have <abstr> instead of the actual representation is that the top-level (aka interpreter) doesn't know how to print the num object. It is easy to teach the top-level, using the #install_printer directive, e.g.,
# let pp_num ppf x = Format.fprintf ppf "%s" (Num.string_of_num x);;
val pp_num : Format.formatter -> Num.num -> unit = <fun>
# #install_printer pp_num;;
# ratio_of_num (Int 2);;
- : Ratio.ratio = <ratio 2/1>
#
So we defined the pretty-printing function,
let pp_num ppf x = Format.fprintf ppf "%s" (Num.string_of_num x)
And then used the #install_printer directive to install it in the top-level,
# #install_printer pp_num;;
and now every time we have num it will be printed for us.
You can also use this pp_num function together with other Format module functions (that are used for pretty printing), e.g.,
Format.printf "my num = %a" pp_num (Int 2)
If the top-level still shows <abstr> for Ratio.ratio values (it may not be able to work out how to print a ratio from the num printer alone), we can help it by defining an additional printer for ratios,
# let pp_ratio ppf r = Format.fprintf ppf "%a" pp_num (num_of_ratio r);;
val pp_ratio : Format.formatter -> Ratio.ratio -> unit = <fun>
# #install_printer pp_ratio;;
# ratio_of_num (Int 2);;
- : Ratio.ratio = 2
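To make the mechanism concrete without depending on the num package, here is a minimal stdlib-only sketch. The module Q and its functions make, neg and pp are hypothetical stand-ins for Ratio, not part of the Num API; the point is only that a type hidden behind a signature is opaque to the top-level, and that a printer of type Format.formatter -> t -> unit is what both %a and #install_printer expect.

```ocaml
(* Hypothetical stand-in for Ratio: the signature hides the representation,
   which is why a top-level would display such values as <abstr>. *)
module Q : sig
  type t
  val make : int -> int -> t
  val neg : t -> t
  val pp : Format.formatter -> t -> unit
end = struct
  type t = { num : int; den : int }

  let rec gcd a b = if b = 0 then abs a else gcd b (a mod b)

  (* Reduce the fraction and keep the denominator positive. *)
  let make n d =
    if d = 0 then invalid_arg "Q.make: zero denominator";
    let g = gcd n d in
    let s = if d < 0 then -1 else 1 in
    { num = s * n / g; den = s * d / g }

  (* Negation only flips the sign of the numerator. *)
  let neg q = { q with num = -q.num }

  (* Printer with the shape expected by %a and by #install_printer. *)
  let pp ppf q =
    if q.den = 1 then Format.fprintf ppf "%d" q.num
    else Format.fprintf ppf "%d/%d" q.num q.den
end

let half = Q.make 1 2
let minus_half_text = Format.asprintf "%a" Q.pp (Q.neg half)
let two_text = Format.asprintf "%a" Q.pp (Q.make 4 2)
let () = Format.printf "neg 1/2 = %s, 4/2 = %s@." minus_half_text two_text
```

In a top-level session, #install_printer Q.pp;; would then make values of type Q.t print through pp instead of as <abstr>, in the same way pp_ratio does for Ratio.ratio above.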
Re: P.S.
For zarith you need to install system dependencies. You can use opam for that, e.g.,
opam depext --install zarith
it will install the system dependencies (the gmp library) using your operating system package manager and then install the zarith library.

Related

Haskell Regex performance

I've been looking at the existing options for regex in Haskell, and I wanted to understand where the gap in performance came from when comparing the various options with each other and especially with a simple call to grep...
I have a relatively small (~ 110M, compared to a usual several 10s of G in most of my use cases) trace file :
$ du radixtracefile
113120 radixtracefile
$ wc -l radixtracefile
1051565 radixtracefile
I first tried to find how many matches of the (arbitrary) pattern .*504.*ll were in there through grep :
$ time grep -nE ".*504.*ll" radixtracefile | wc -l
309
real 0m0.211s
user 0m0.202s
sys 0m0.010s
I looked at Text.Regex.TDFA (version 1.2.1) with Data.ByteString :
import Control.Monad.Loops
import Data.Maybe
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import Text.Regex.TDFA
import qualified Data.ByteString as B
main = do
  f <- B.readFile "radixtracefile"
  matches :: [[B.ByteString]] <- f =~~ ".*504.*ll"
  mapM_ (putStrLn . show . head) matches
Building and running :
$ ghc -O2 test-TDFA.hs -XScopedTypeVariables
[1 of 1] Compiling Main ( test-TDFA.hs, test-TDFA.o )
Linking test-TDFA ...
$ time ./test-TDFA | wc -l
309
real 0m4.463s
user 0m4.431s
sys 0m0.036s
Then, I looked at Data.Text.ICU.Regex (version 0.7.0.1) with Unicode support:
import Control.Monad.Loops
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import Data.Text.ICU.Regex
main = do
  re <- regex [] $ T.pack ".*504.*ll"
  f <- TIO.readFile "radixtracefile"
  setText re f
  whileM_ (findNext re) $ do
    a <- start re 0
    putStrLn $ "last match at :"++(show a)
Building and running :
$ ghc -O2 test-ICU.hs
[1 of 1] Compiling Main ( test-ICU.hs, test-ICU.o )
Linking test-ICU ...
$ time ./test-ICU | wc -l
309
real 1m36.407s
user 1m36.090s
sys 0m0.169s
I use ghc version 7.6.3. I haven't had the occasion to test other Haskell regex options. I knew that I would not get the performance that I had with grep and was more than happy with that, but roughly 20 times slower for TDFA with ByteString... That is very scary. And I can't really understand why, as I naively thought that this was a wrapper around a native backend... Am I somehow not using the module correctly?
(And let's not mention the ICU + Text combo which is going through the roof)
Is there an option that I haven't tested yet that would make me happier ?
EDIT :
Text.Regex.PCRE (version 0.94.4) with Data.ByteString :
import Control.Monad.Loops
import Data.Maybe
import Text.Regex.PCRE
import qualified Data.ByteString as B
main = do
  f <- B.readFile "radixtracefile"
  matches :: [[B.ByteString]] <- f =~~ ".*504.*ll"
  mapM_ (putStrLn . show . head) matches
Building and running :
$ ghc -O2 test-PCRE.hs -XScopedTypeVariables
[1 of 1] Compiling Main ( test-PCRE.hs, test-PCRE.o )
Linking test-PCRE ...
$ time ./test-PCRE | wc -l
309
real 0m1.442s
user 0m1.412s
sys 0m0.031s
Better, but still with a factor of ~7-ish ...
So, after looking at other libraries for a bit, I ended up trying PCRE.Light (version 0.4.0.4) :
import Control.Monad
import Text.Regex.PCRE.Light
import qualified Data.ByteString.Char8 as B
main = do
  f <- B.readFile "radixtracefile"
  let lines = B.split '\n' f
  let re = compile (B.pack ".*504.*ll") []
  forM_ lines $ \l -> maybe (return ()) print $ match re l []
Here is what I get out of that :
$ ghc -O2 test-PCRELight.hs -XScopedTypeVariables
[1 of 1] Compiling Main ( test-PCRELight.hs, test-PCRELight.o )
Linking test-PCRELight ...
$ time ./test-PCRELight | wc -l
309
real 0m0.832s
user 0m0.803s
sys 0m0.027s
I think this is decent enough for my purposes. I might try to see what happens with the other libs when I manually do the line splitting like I did here, although I doubt it's going to make a big difference.

Printing abstract syntax tree using ppx_deriving

Could someone please tell me why this code is not compiling? I am trying to print the abstract syntax tree using the ppx_deriving library.
type prog = command list
[@@deriving show]
and command =
  | Incv | Decv
  | Incp | Decp
  | Input | Output
  | Loop of command list
[@@deriving show]
let _ = Format.printf "%s" (show_prog ([Incv, Incv]))
hello:brainfuckinter mukeshtiwari$ ocamlbuild -package ppx_deriving.std ast.byte
+ /Users/mukeshtiwari/.opam/4.02.1/bin/ocamlc.opt -c -I /Users/mukeshtiwari/.opam/4.02.1/lib/ppx_deriving -o ast.cmo ast.ml
File "ast.ml", line 10, characters 28-37:
Error: Unbound value show_prog
Command exited with code 2.
Compilation unsuccessful after building 2 targets (1 cached) in 00:00:00.
hello:brainfuckinter mukeshtiwari$ ocaml
OCaml version 4.02.1
Add -use-ocamlfind as first argument of ocamlbuild. It should solve the issue.
(You also have a typo in [Incv, Incv]: the , should be a ;.)
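The comma matters because OCaml reads [Incv, Incv] as a one-element list containing a pair, not as a list of two commands. The sketch below shows this with plain OCaml and no ppx. The functions show_command and show_prog here are handwritten stand-ins, written only so that the example runs without ppx_deriving; the text produced by the real derived printer may be formatted differently.

```ocaml
type command =
  | Incv | Decv
  | Incp | Decp
  | Input | Output
  | Loop of command list

(* With a comma, the brackets hold ONE element, which is a pair. *)
let with_comma : (command * command) list = [Incv, Incv]

(* With a semicolon, the list holds two commands, which is what prog needs. *)
let with_semicolon : command list = [Incv; Incv]

(* Handwritten stand-ins for the derived printers (assumption: the real
   ppx_deriving output format may differ from this one). *)
let rec show_command = function
  | Incv -> "Incv" | Decv -> "Decv"
  | Incp -> "Incp" | Decp -> "Decp"
  | Input -> "Input" | Output -> "Output"
  | Loop body -> "Loop " ^ show_prog body

and show_prog cmds =
  "[" ^ String.concat "; " (List.map show_command cmds) ^ "]"

let () = print_endline (show_prog with_semicolon)
```

Passing with_comma to show_prog is rejected by the type checker, which is the error that would appear next once -use-ocamlfind makes show_prog available.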

Printing only print output with SML/NJ

I'm trying to use SML/NJ, and I use sml < source.sml to run the code, but it prints out too much information.
For example, this is the source.sml:
fun fac 0 = 1
| fac n = n * fac (n - 1)
val r = fac 10 ;
print(Int.toString(r));
This is the output:
Standard ML of New Jersey v110.77 [built: Tue Mar 10 07:03:24 2015]
- val fac = fn : int -> int
val r = 3628800 : int
[autoloading]
[library $SMLNJ-BASIS/basis.cm is stable]
[autoloading done]
3628800val it = () : unit
From Suppress "val it" output in Standard ML, How to disable SMLNJ warnings?, and SMLNJ want to remove "val it = () : unit" from every print statement execution, I got some hints how to suppress them.
I ran CM_VERBOSE=false sml < $filename and added the line Control.Print.out := {say=fn _=>(), flush=fn()=>()}; to the code, but I still get some extra output:
Standard ML of New Jersey v110.77 [built: Tue Mar 10 07:03:24 2015]
- 3628800
How can I print out only the output?
The sml command is intended to be used interactively. It sounds to me like you would be better off building a standalone executable from your program instead.
There are a few options:
If you are relying on SML/NJ extensions, or if you simply cannot use another ML implementation, you can follow the instructions in this post to build an SML/NJ heap image that can be turned into a standalone executable using heap2exec.
A better option might be to use the MLton compiler, another implementation of Standard ML. It lacks a REPL, but unlike SML/NJ it requires no boilerplate to generate a standalone executable. Building is as simple as issuing:
$ mlton your-program.sml
$ ./your-program
3628800

OCaml compile error with ocamlfind

Here is the code:
class parser =
let test1 = function
| 1 -> print_int 1
| 2 -> print_int 2
| _ -> print_int 3 in
let test = function
| 1 -> print_int 1
| 2 -> print_int 2
| _ -> print_int 3 in
object(self)
end
Here is the _tags
true: syntax(camlp4o)
true: package(deriving,deriving.syntax)
true: thread,debug,annot
true: bin_annot
Here is the compile command:
ocamlbuild -use-ocamlfind test.native
Here is the compile error:
Warning: tag "package" does not expect a parameter, but is used with parameter "deriving,deriving.syntax"
Warning: tag "syntax" does not expect a parameter, but is used with parameter "camlp4o"
+ /usr/local/bin/ocamldep.opt -modules test.ml > test.ml.depends
File "test.ml", line 8, characters 0-3:
Error: Syntax error
Command exited with code 2.
Compilation unsuccessful after building 1 target (0 cached) in 00:00:00.
However, when I use this:
ocamlbuild test.native
Then the code can be successfully compiled...
This is because ocamlbuild -use-ocamlfind test.native directs the compiler to use the camlp4 parser, which differs slightly from the standard OCaml parser. In particular, parser is a keyword in camlp4, so you can't use it as a class name. Just rename the class.
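As a minimal sketch of the rename, the class below keeps the same shape as the one in the question (let-bindings before object) but avoids the reserved word. The name token_reader and the describe method are hypothetical, chosen only for illustration, and the helper returns strings instead of printing so the result is easy to check.

```ocaml
(* Same structure as the class in the question, renamed so that it does
   not collide with the camlp4 keyword `parser`. *)
class token_reader =
  let describe = function
    | 1 -> "one"
    | 2 -> "two"
    | _ -> "other" in
  object
    (* Expose the local helper through a method. *)
    method describe n = describe n
  end

let reader = new token_reader
let () = print_endline (reader#describe 2)
```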

Missing characters using Text.Regex.PCRE to parse web page title

I recently made a website that needs to retrieve talk titles from the TED website.
So far, the problem is specific to this talk: Francis Collins: We need better drugs -- now
From the web page source, I get:
<title>Francis Collins: We need better drugs -- now | Video on TED.com</title>
<span id="altHeadline" >Francis Collins: We need better drugs -- now</span>
Now, in ghci, I tried this:
λ> :m +Network.HTTP Text.Regex.PCRE
λ> let uri = "http://www.ted.com/talks/francis_collins_we_need_better_drugs_now.html"
λ> body <- (simpleHTTP $ getRequest uri) >>= getResponseBody
λ> body =~ "<span id=\"altHeadline\" >(.+)</span>" :: [[String]]
[["id=\"altHeadline\" >Francis Collins: We need better drugs -- now</span>\n\t\t</h","s Collins: We need better drugs -- now</span"]]
λ> body =~ "<title>(.+)</title>" :: [[String]]
[["tle>Francis Collins: We need better drugs -- now | Video on TED.com</title>\n<l","ncis Collins: We need better drugs -- now | Video on TED.com</t"]]
Either way, the parsed title misses some characters on the left, and has some unintended characters on the right. It seems to have something to do with the -- in talk title. However,
λ> let body' = "<title>Francis Collins: We need better drugs -- now | Video on TED.com</title>"
λ> body' =~ "<title>(.+)</title>" :: [[String]]
[["<title>Francis Collins: We need better drugs -- now | Video on TED.com</title>","Francis Collins: We need better drugs -- now | Video on TED.com"]]
Luckily, this is not a problem with Text.Regex.Posix.
λ> import qualified Text.Regex.Posix as P
λ> body P.=~ "<title>(.+)</title>" :: [[String]]
[["<title>Francis Collins: We need better drugs -- now | Video on TED.com</title>","Francis Collins: We need better drugs -- now | Video on TED.com"]]
My recommendation would be: don't use a regex for parsing HTML. Use a proper HTML parser instead. Here's an example using the html-conduit parser together with the xml-conduit cursor library (and http-conduit for download).
{-# LANGUAGE OverloadedStrings #-}
import Data.Monoid (mconcat)
import Network.HTTP.Conduit (simpleHttp)
import Text.HTML.DOM (parseLBS)
import Text.XML.Cursor (attributeIs, content, element,
                        fromDocument, ($//), (&//), (>=>))
main = do
  lbs <- simpleHttp "http://www.ted.com/talks/francis_collins_we_need_better_drugs_now.html"
  let doc = parseLBS lbs
      cursor = fromDocument doc
  print $ mconcat $ cursor $// element "title" &// content
  print $ mconcat $ cursor $// element "span" >=> attributeIs "id" "altHeadline" &// content
The code is also available as active code on the School of Haskell.