How to safely discard an unused variable in OCaml

Here's the relevant part of my code:
(* Read the input file *)
let a = input_char inc in
(* Check if a is a number *)
if char_is_number a then
(* Read the second letter *)
let b = input_char inc in
(* Discard the space *)
input_char inc;
Here inc is an in_channel reading from a .map file (by the way, if you know of good libraries that can handle .map files, I would gladly hear about them). input_char reads the next character.
Basically, I'm reading one digit and one character. The third character should be a space (I will add these checks later) and will be discarded.
My current code raises a warning saying that the expression on the last line should have type unit.
Is there a safe/elegant/right way to discard the next character read?

To ignore the return value of an expression, use the ignore function, which serves exactly this purpose.
let b = input_char inc in
ignore(input_char inc);
To parse sufficiently complex files, you should probably consider OCamllex + Menhir, especially if you have ever used lex/flex and yacc/bison.

While ignore will do what you want, it looks like using the wildcard pattern, _, might suit you better in this case, since you're otherwise assigning to "variables".
Consider
let b = input_char inc in
let _ = input_char inc in
let c = input_char inc in
...
vs
let b = input_char inc in
ignore (input_char inc);
let c = input_char inc in
...
The wildcard pattern, which you might have come across when using match, matches anything and then simply discards the value without binding it to a name. You can use any pattern with the let <pattern> = <expression> in <expression> construct.
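A minimal, self-contained sketch of both styles follows. The temporary file, its contents, and the helper names (sample_path, read_with_wildcard, read_with_ignore, with_sample) are assumptions made up for illustration; only input_char, ignore and the wildcard pattern come from the answers above.

```ocaml
(* Write a tiny sample file so the example does not depend on a real .map file. *)
let sample_path =
  let path = Filename.temp_file "sample" ".map" in
  let oc = open_out path in
  output_string oc "1a b";
  close_out oc;
  path

(* Discard the third character by binding it to the wildcard pattern. *)
let read_with_wildcard inc =
  let a = input_char inc in
  let b = input_char inc in
  let _ = input_char inc in
  let c = input_char inc in
  (a, b, c)

(* Discard the third character with ignore, which returns unit. *)
let read_with_ignore inc =
  let a = input_char inc in
  let b = input_char inc in
  ignore (input_char inc);
  let c = input_char inc in
  (a, b, c)

(* Open the sample file, run a reader on it, and close the channel. *)
let with_sample reader =
  let inc = open_in sample_path in
  let result = reader inc in
  close_in inc;
  result

let () =
  let (a, b, c) = with_sample read_with_wildcard in
  Printf.printf "%c %c %c\n" a b c   (* prints: 1 a b *)
```

Both readers compile without the "should have type unit" warning, because the discarded value is either matched by _ or passed to ignore.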

Related

Pretty-printing with a comment string prefixing a box

I am trying to generate a text file for use in another program. This program only has line-style comments. I want to pretty-print a comment so that, whenever a line is broken, the new line is prefixed with //.
Here is what I have so far:
type elaborate_type = A | B
let elaborate_to_string = function
| A -> "OK, this is type A, but long"
| B -> "B"
let pp_elaborate chan v = Format.pp_print_string chan (elaborate_to_string v)
let () =
  Format.printf "@[<hv2>{@,@[<hov>// Here is a long comment I want to break@ // \
                 here, but also indent. It should also be the case that anything@ // \
                 I put here (such as some complex printable term \"%a\") should@ // \
                 only break if it has //, too).@]@,\
                 @[...@]\
                 @]@,}@."
    pp_elaborate A
which gives the output
{
// Here is a long comment I want to break
// here, but also indent. It should also be the case that anything
// I put here (such as some complex printable term "OK, this is type A, but long") should
// only break if it has //, too).
...
}
Is there a way to do this without adding the @ // at the end of each line I want to break?
One option is to replace the newline function of the formatter so that it prints // right after each newline:
let add_double_slash_after_linebreak_and_before_indents fmt =
  let fns = Format.pp_get_formatter_out_functions fmt () in
  let out_newline () =
    fns.out_newline ();
    fns.out_string "//" 0 2
  in
  Format.pp_set_formatter_out_functions fmt { fns with out_newline }
let () =
  let () =
    add_double_slash_after_linebreak_and_before_indents Format.std_formatter
  in
  Format.printf "@[<v 2>This tests the formatting@,One line@,two line @]"
This tests the formatting
//  One line
//  two line
However, the double slashes // will then appear at the very start of the line, before any indentation. If you prefer them to appear after the indentation, you can replace the indentation function of the formatter instead:
let add_double_slash_after_linebreak_and_indents fmt =
  let fns = Format.pp_get_formatter_out_functions fmt () in
  let out_indent n =
    fns.out_indent n;
    fns.out_string "//" 0 2
  in
  Format.pp_set_formatter_out_functions fmt { fns with out_indent }
let () =
  let () =
    add_double_slash_after_linebreak_and_indents Format.std_formatter
  in
  Format.printf "@[<v 2>This tests the formatting@,One line@,two line @]"
This tests the formatting
  //One line
  //two line
Concerning your follow-up question, any \n in a string will mess up the formatting if the string is printed with %s. You can avoid this issue by using pp_print_text, which replaces spaces and \n in the string with calls to pp_print_space and pp_force_newline.
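As a rough sketch of how pp_print_text combines with a customised newline function, the following renders a multi-line string as a line comment into a buffer. The names prefix_lines_with_slashes and render_comment, the "// " prefix (with a trailing space) and the margin of 40 are assumptions chosen for this example. Note that the formatter does not count the prefix when it computes line widths.

```ocaml
(* Make [fmt] print "// " after every line break it emits. *)
let prefix_lines_with_slashes fmt =
  let fns = Format.pp_get_formatter_out_functions fmt () in
  let out_newline () =
    fns.Format.out_newline ();
    fns.Format.out_string "// " 0 3
  in
  Format.pp_set_formatter_out_functions fmt
    { fns with Format.out_newline = out_newline }

(* Render [text] as a line comment. Spaces in [text] become break hints and
   '\n' becomes a forced newline, both handled by Format.pp_print_text. *)
let render_comment ?(margin = 40) text =
  let buf = Buffer.create 128 in
  let fmt = Format.formatter_of_buffer buf in
  Format.pp_set_margin fmt margin;
  prefix_lines_with_slashes fmt;
  (* The first line gets its prefix by hand; later lines get it from out_newline. *)
  Format.fprintf fmt "@[<hov>// %a@]" Format.pp_print_text text;
  (* Flush without a final newline, which would otherwise add a dangling prefix. *)
  Format.pp_print_flush fmt ();
  Buffer.contents buf

let () =
  print_endline
    (render_comment
       "first line\nsecond line that is long enough to be wrapped by the formatter")
```

Every line of the result starts with "// ", whether the break came from the \n in the string or from the formatter wrapping a long line.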

What is this OCaml function returning?

As I understand it, OCaml doesn't require explicit return statements to yield a value from a function. The last line of the function is what returns something.
In that case, could someone please let me know what the following function foo is returning? It seems that it's returning a stream of data. Is it returning the lexer?
and foo ?(input = false) =
lexer
| 'x' _
-> let y = get_func lexbuf
get_text y
| ',' -> get_func lexbuf
| _ -> get_text lexbuf
I'm trying to edit the following function, bar, so that it returns a data stream as well and I can replace foo with bar in another function. However, it seems that bar has multiple lexers, which prevents this. How can I rewrite bar to return a data stream in the way that foo appears to?
let bar cmd lexbuf =
let buff = Buffer.create 0 in
let quot plus =
lexer
| "<" -> if plus then Buffer.add_string b "<" quot plus lexbuf
and unquot plus =
lexer
| ">" -> if plus then Buffer.add_string b ">" unquot plus lexbuf
in
match unquot true lexbuf with
| e -> force_text cmd e
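First, your code is probably using one of the old camlp4 syntax extensions; you should say so explicitly.
Second, foo returns the same type of value as get_text and get_func, since every branch of the lexer must have the same type. Without the code for those functions, it is not really possible to say more than that.
Third,
Buffer.add_string b ">" unquot plus lexbuf
is ill-typed: Buffer.add_string takes a buffer and a string and returns unit, so it cannot be applied to the extra arguments unquot, plus and lexbuf. (Note also that the buffer is bound as buff but used as b.) Are you missing a semicolon between the two calls:
Buffer.add_string b ">"; unquot plus lexbuf
?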
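To illustrate how a recursive function built around a Buffer gets its value, here is a small sketch in plain OCaml rather than in the camlp4 lexer syntax of the question. The function strip_closers is hypothetical and only mimics the shape of unquot: a conditional side effect on the buffer, sequenced with a recursive call.

```ocaml
(* Copy [s] into a buffer, dropping every '>' character. *)
let strip_closers (s : string) : string =
  let buf = Buffer.create (String.length s) in
  let rec go i =
    if i >= String.length s then Buffer.contents buf
    else begin
      (* An if without else has type unit; the semicolon sequences it
         with the recursive call that follows. *)
      if s.[i] <> '>' then Buffer.add_char buf s.[i];
      go (i + 1)
    end
  in
  go 0

let () = print_endline (strip_closers "a>b>>c")   (* prints: abc *)
```

The value of go is the value of whichever branch is evaluated last, here either Buffer.contents buf or the result of the recursive call, so both branches must have the same type.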

Changing the State of Lexing.lexbuf

I am writing a lexer for Brainfuck with ocamllex, and to implement its loop, I need to change the state of lexbuf so that it can return to a previous position in the stream.
Background info on Brainfuck (skippable)
In Brainfuck, a loop is written as a pair of square brackets with the following rules:
[ -> proceed and evaluate the next token
] -> if the current cell's value is not 0, return to the matching [
Thus, the following code evaluates to 15:
+++ [ > +++++ < - ] > .
it reads:
In the first cell, assign 3 (increment 3 times)
Enter loop, move to the next cell
Assign 5 (increment 5 times)
Move back to the first cell, and subtract 1 from its value
Hit the closing square bracket; the current (first) cell is now equal to 2, so jump back to [ and proceed into the loop again
Keep going until the first cell is equal to 0, then exit the loop
Move to the second cell and output the value with .
The value in the second cell would have been incremented to 15
(incremented by 5 for 3 times).
Problem:
In the header section of my brainfuck.mll file, I wrote two functions, push_curr_p and pop_last_p, which push and pop the lexbuf's current position (the position of the last [) on an int list ref named loopstack:
{ (* Header *)
let tape = Array.make 100 0
let tape_pos = ref 0
let loopstack = ref []
let push_curr_p (lexbuf: Lexing.lexbuf) =
let p = lexbuf.Lexing.lex_curr_p in
let curr_pos = p.Lexing.pos_cnum in
(* Saving / pushing the position of `[` to loopstack *)
( loopstack := curr_pos :: !loopstack
; lexbuf
)
let pop_last_p (lexbuf: Lexing.lexbuf) =
match !loopstack with
| [] -> lexbuf
| hd :: tl ->
(* This is where I attempt to bring lexbuf back *)
( lexbuf.Lexing.lex_curr_p <- { lexbuf.Lexing.lex_curr_p with Lexing.pos_cnum = hd }
; loopstack := tl
; lexbuf
)
}
{ (* Rules *)
rule brainfuck = parse
| '[' { brainfuck (push_curr_p lexbuf) }
| ']' { (* current cell's value must be 0 to exit the loop *)
if tape.(!tape_pos) = 0
then brainfuck lexbuf
(* this needs to bring lexbuf back to the previous `[`
* and proceed with the parsing
*)
else brainfuck (pop_last_p lexbuf)
}
(* ... other rules ... *)
}
The other rules work just fine, but it seems to ignore [ and ]. The problem is obviously at the loopstack and how I get and set lex_curr_p state. Would appreciate any leads.
lex_curr_p is meant to keep track of the current position, so that you can use it in error messages and the like. Setting it to a new value won't make the lexer actually seek back to an earlier position in the file. In fact I'm 99% sure that you can't make the lexer loop like that no matter what you do.
So you can't use ocamllex to implement the whole interpreter like you're trying to do. What you can do (and what ocamllex is designed to do) is to translate the input stream of characters into a stream of tokens.
In other languages that means translating a character stream like var xyz = /* comment */ 123 into a token stream like VAR, ID("xyz"), EQ, INT(123). So lexing helps in three ways: it finds where one token ends and the next begins, it categorizes tokens into different types (identifiers vs. keywords etc.) and discards tokens you don't need (white space and comments). This can simplify further processing a lot.
Lexing Brainfuck is a lot less helpful as all Brainfuck tokens only consist of a single character anyway. So finding out where each token ends and the next begins is a no-op and finding out the type of the token just means comparing the character against '[', '+' etc. So the only useful thing a Brainfuck lexer does is to discard whitespace and comments.
So what your lexer would do is turn the input [,[+-. lala comment ]>] into something like LOOP_START, IN, LOOP_START, INC, DEC, OUT, LOOP_END, MOVE_RIGHT, LOOP_END, where LOOP_START etc. belong to a discriminated union that you (or your parser generator if you use one) defined and made the lexer output.
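A hand-written sketch of that token stream in plain OCaml (not ocamllex output) could look as follows. The constructor names follow the ones used above; MOVE_LEFT is assumed by analogy with MOVE_RIGHT, and token_of_char and tokenize are hypothetical helper names.

```ocaml
type token =
  | LOOP_START | LOOP_END
  | INC | DEC
  | MOVE_LEFT | MOVE_RIGHT
  | IN | OUT

(* Map a command character to its token; anything else is a comment. *)
let token_of_char = function
  | '[' -> Some LOOP_START
  | ']' -> Some LOOP_END
  | '+' -> Some INC
  | '-' -> Some DEC
  | '<' -> Some MOVE_LEFT
  | '>' -> Some MOVE_RIGHT
  | ',' -> Some IN
  | '.' -> Some OUT
  | _ -> None

(* Turn the source text into a list of tokens, dropping comment characters. *)
let tokenize (src : string) : token list =
  String.to_seq src |> Seq.filter_map token_of_char |> List.of_seq

let () =
  Printf.printf "%d tokens\n" (List.length (tokenize "[,[+-. lala comment ]>]"))
  (* prints: 9 tokens *)
```

An ocamllex rule would do the same job with one action per character and a catch-all rule that skips everything else.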
If you want to use a parser generator, you'd define the token types in the parser's grammar and make the lexer produce values of those types. Then the parser can just parse the token stream.
If you want to do the parsing by hand, you'd call the lexer's token function by hand in a loop to get all the tokens. In order to implement loops, you'd have to store the already-consumed tokens somewhere to be able to loop back. In the end it'd end up being more work than just reading the input into a string, but for a learning exercise I suppose that doesn't matter.
That said, I would recommend going all the way and using a parser generator to create an AST. That way you don't have to create a buffer of tokens for looping and having an AST actually saves you some work (you no longer need a stack to keep track of which [ belongs to which ]).
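The AST approach can be sketched by hand as below, without a parser generator. Everything here is an assumption made for illustration: the instr type, the Unbalanced exception, the parse and run functions, the 8-bit wrapping cells, and the convention that reading past the end of the input stores 0. A loop becomes a nested list of instructions, so no stack of [ positions is needed at run time.

```ocaml
type instr =
  | Inc | Dec | Left | Right | In | Out
  | Loop of instr list

exception Unbalanced of string

(* Parse the source into an AST. [go] returns the instructions parsed so far
   and the index just after them; [in_loop] says whether a ']' is expected. *)
let parse (src : string) : instr list =
  let n = String.length src in
  let rec go i acc in_loop =
    if i >= n then begin
      if in_loop then raise (Unbalanced "missing ]");
      (List.rev acc, i)
    end else
      match src.[i] with
      | '+' -> go (i + 1) (Inc :: acc) in_loop
      | '-' -> go (i + 1) (Dec :: acc) in_loop
      | '<' -> go (i + 1) (Left :: acc) in_loop
      | '>' -> go (i + 1) (Right :: acc) in_loop
      | ',' -> go (i + 1) (In :: acc) in_loop
      | '.' -> go (i + 1) (Out :: acc) in_loop
      | '[' ->
        let body, next = go (i + 1) [] true in
        go next (Loop body :: acc) in_loop
      | ']' ->
        if in_loop then (List.rev acc, i + 1)
        else raise (Unbalanced "unexpected ]")
      | _ -> go (i + 1) acc in_loop   (* comment character *)
  in
  fst (go 0 [] false)

(* Run a program on [input] and return everything it printed. *)
let run ?(tape_size = 100) (prog : instr list) (input : string) : string =
  let tape = Array.make tape_size 0 in
  let pos = ref 0 in
  let in_pos = ref 0 in
  let out = Buffer.create 16 in
  let rec exec = function
    | Inc -> tape.(!pos) <- (tape.(!pos) + 1) land 255
    | Dec -> tape.(!pos) <- (tape.(!pos) - 1) land 255
    | Left -> decr pos
    | Right -> incr pos
    | In ->
      if !in_pos < String.length input then begin
        tape.(!pos) <- Char.code input.[!in_pos];
        incr in_pos
      end else tape.(!pos) <- 0
    | Out -> Buffer.add_char out (Char.chr tape.(!pos))
    | Loop body ->
      while tape.(!pos) <> 0 do List.iter exec body done
  in
  List.iter exec prog;
  Buffer.contents out

let () =
  let out = run (parse "+++ [ > +++++ < - ] > .") "" in
  Printf.printf "%d\n" (Char.code out.[0])   (* prints: 15 *)
```

The example program is the one from the question: the loop body runs three times and the second cell ends up holding 15.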

Finding permutations using regular expressions

I need to create a regular expression (for a program in Haskell) that matches strings made up of "X" and "." containing exactly four "X" and exactly one ".". It must not match strings with any other ratio of X to dot.
I have thought about something like
[X\.]{5}
But it also matches "XXXXX" and ".....", so it isn't what I need.
That's called permutation parsing, and while "pure" regular expressions can't parse permutations it's possible if your regex engine supports lookahead. (See this answer for an example.)
However I find the regex in the linked answer difficult to understand. It's cleaner in my opinion to use a library designed for permutation parsing, such as megaparsec.
You use the Text.Megaparsec.Perm module by building a PermParser in a quasi-Applicative style using the <||> operator, then converting it into a regular MonadParsec action using makePermParser.
So here's a parser which recognises any combination of four Xs and one .:
import Control.Applicative
import Data.Ord
import Data.List
import Text.Megaparsec
import Text.Megaparsec.Perm
fourXoneDot :: Parsec Dec String String
fourXoneDot = makePermParser $ mkFive <$$> x <||> x <||> x <||> x <||> dot
    where mkFive a b c d e = [a, b, c, d, e]
          x = char 'X'
          dot = char '.'
I'm applying the mkFive function, which just stuffs its arguments into a five-element list, to four instances of the x parser and one dot, combined with <||>.
ghci> parse fourXoneDot "" "XXXX."
Right "XXXX."
ghci> parse fourXoneDot "" "XX.XX"
Right "XXXX."
ghci> parse fourXoneDot "" "XX.X"
Left {- ... -}
This parser always returns "XXXX." because that's the order I combined the parsers in: I'm mapping mkFive over the five parsers and it doesn't reorder its arguments. If you want the permutation parser to return its input string exactly, the trick is to track the current position within the component parsers, and then sort the output.
fourXoneDotSorted :: Parsec Dec String String
fourXoneDotSorted = makePermParser $ mkFive <$$> x <||> x <||> x <||> x <||> dot
    where mkFive a b c d e = map snd $ sortBy (comparing fst) [a, b, c, d, e]
          x = withPos (char 'X')
          dot = withPos (char '.')
          withPos = liftA2 (,) getPosition
ghci> parse fourXoneDotSorted "" "XX.XX"
Right "XX.XX"
As the megaparsec docs note, the implementation of the Text.Megaparsec.Perm module is based on Parsing Permutation Phrases; the idea is described in detail in the paper and the accompanying slides.
The other answers look quite complicated to me, given that there are only five strings in this language. Here's a perfectly fine and very readable regex for this:
\.XXXX|X\.XXX|XX\.XX|XXX\.X|XXXX\.
Are you attached to regex, or did you just end up at regex because this was a question you didn't want to try answering with applicative parsers?
Here's the simplest possible attoparsec implementation I can think of:
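-- Assumes Text input; for ByteString, import Data.Attoparsec.ByteString.Char8 instead.
import Data.Attoparsec.Text (Parser, count, inClass, satisfy)
import Data.List (sort)

parseDotXs :: Parser ()
parseDotXs = do
    -- Read exactly five characters, each of which is '.' or 'X'.
    dotXs <- count 5 (satisfy (inClass ".X"))
    -- '.' sorts before 'X', so span splits the sorted list into dots and Xs.
    let (dots, xS) = span (== '.') . sort $ dotXs
    if length dots == 1 && length xS == 4
        then return ()
        else fail "Mismatch between dots and Xs"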
You may need to adjust slightly depending on your input type.
There are tons of fancy ways to do stuff in applicative parsing land, but there is no rule saying you can't just do things the rock-stupid simple way.
Try the following regex:
(?<=^| )(?=[^. ]*\.)(?=(?:[^X ]*X){4}).{5}(?=$| )
If you have one word per string, you can simplify the regex to this one:
^(?=[^. \n]*\.)(?=(?:[^X \n]*X){4}).{5}$

Writing main() to call a function

I have an OCaml function that converts a string to an array. What is the canonical way of writing a "main" function to call this and print the array?
let createArray pattern patArray =
  (* some unimportant way of setting all the elements in the array patArray
     based on the string pattern *)
let main () =
  let pattern = "Pattern" in
  let patArray = Array.create (String.length pattern) 0 in
  let res = createArray pattern patArray in
  Array.iter ~f:(printf "%d ") patArray;;   (* <-- the ;; in question *)
main ()
1) In the above, if I leave out the ';;', it does not work. What is the significance of that?
2) Instead of using a dummy binding "res", can I somehow just write two statements to be executed sequentially, like so:
createArray pattern patArray
Array.iter ~f:(printf "%d ") patArray
Without the ;;, the parser cannot know that the main () call following that line is supposed to be a stand-alone expression (whitespace is not significant here).
You can use the following idiom instead:
let main () = ...
let () = main ()
The let () = expr idiom will evaluate an expression of type unit at that point. The initial let informs the parser that a new top-level let construct begins. Using ;; is an alternative way to tell the parser about the end of a top-level construct, but is primarily intended for interactive use.
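A complete program using this idiom might look like the sketch below. It uses only the standard library rather than the labelled Array functions from the question, and fill_codes is a hypothetical stand-in for createArray, whose body the question does not show.

```ocaml
(* Hypothetical stand-in for createArray: store the character code of each
   character of [pattern] in the corresponding cell of [arr]. *)
let fill_codes pattern arr =
  String.iteri (fun i c -> arr.(i) <- Char.code c) pattern

let main () =
  let pattern = "abc" in
  let arr = Array.make (String.length pattern) 0 in
  fill_codes pattern arr;
  Array.iter (Printf.printf "%d ") arr;
  print_newline ()

(* No ;; is needed: the leading let starts a new top-level definition. *)
let () = main ()
```

Because main () has type unit, the pattern () on the left-hand side also acts as a check that nothing is silently discarded.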
In order to evaluate two expressions sequentially, separate them with a semicolon (use parentheses or begin ... end if you're unsure about precedence rules). For example:
let patArray = Array.create (String.length pattern) 0 in
createArray pattern patArray;
Array.iter ~f:(printf "%d ") patArray
Or, using begin and end to make precedence clearer:
let patArray = Array.create (String.length pattern) 0 in begin
  createArray pattern patArray;
  Array.iter ~f:(printf "%d ") patArray
end
Without the ;, the parser would not know whether Array.iter on the next line is supposed to be an additional argument to the createArray call.
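Precedence matters most when a sequence follows an if. The sketch below contrasts the two groupings; the function names without_block and with_block are made up for this example, and a Buffer stands in for real side effects so the difference is visible in the result.

```ocaml
(* The if ends at the first semicolon, so "b" is appended unconditionally. *)
let without_block flag =
  let buf = Buffer.create 4 in
  if flag then Buffer.add_string buf "a";
  Buffer.add_string buf "b";
  Buffer.contents buf

(* begin ... end groups both calls under the if. *)
let with_block flag =
  let buf = Buffer.create 4 in
  if flag then begin
    Buffer.add_string buf "a";
    Buffer.add_string buf "b"
  end;
  Buffer.contents buf

let () =
  Printf.printf "%S %S\n" (without_block false) (with_block false)
  (* prints: "b" "" *)
```

Parentheses work the same way as begin ... end here; which one to use is a matter of taste.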