Building a lexical analyser using ML-Lex - SML

I need to create a new instance of a lexer tied to the standard input stream.
However, when I type in
val lexer = makeLexer( fn n => inputLine( stdIn ) );
I get an error that I don't understand:
stdIn:1.5-11.13 Error: operator and operand don't agree [tycon mismatch]
operator domain: int -> string
operand: int -> string option
in expression:
(makeLexer is a function name present in my source code)

inputLine returns a string option, and my guess is a string is expected.
What you want to do is either have makeLexer take a string option, like so:
fun makeLexer NONE = <whatever you want to do when stream is empty>
| makeLexer (SOME s) = <the normal body makeLexer, working on the string s>
or change your line to:
val lexer = makeLexer( fn n => valOf ( inputLine( stdIn ) ) );
valOf takes an option type and unpacks it.
Note that, since inputLine returns NONE at end of stream, it's probably a better idea to use the first approach rather than the second: valOf raises the Option exception when it is given NONE.

An example of how to make an interactive stream is given on page 38 (or 32 in the paper) of the User's Guide to ML-Lex and ML-Yacc.
The example code could be made simpler by using inputLine.
So I would use the example given by Sebastian, keeping in mind that inputLine on stdIn can return NONE, at least if the user presses CTRL-D.
val lexer =
  let
    fun input f =
      case TextIO.inputLine f of
        SOME s => s
      | NONE => raise Fail "Implement proper error handling."
  in
    Mlex.makeLexer (fn (n:int) => input TextIO.stdIn)
  end
Also, the calculator example on page 40 (34 in the paper) shows how to use this in a whole program.
In general, the user guide contains some nice examples and explanations.

I am trying to read a string from stdin and flush it out to stdout but I can't find a Standard ML way

NOTE: I'm a total newbie in Standard ML. I only have basic F# knowledge.
Here is some good ol' C code:
#include <stdio.h>
int main()
{
    char str[100]; // size whatever you want
    scanf("%s", str);
    printf("%s\n", str);
    return 0;
}
Now I want to write the Standard ML equivalent of this code, so I tried this:
val str = valOf (TextIO.inputLine TextIO.stdIn)
val _ = print str
but my SML/NJ says this:
uncaught exception Option
raised at: smlnj/init/pre-perv.sml:21.28-21.34
I googled it, and I also searched this site, but I cannot find any solution that doesn't cause an error.
Does anyone know how to do this?
EDIT: I tried this code:
fun main =
let val str = valOf (TextIO.inputLine TextIO.stdIn)
in
case str
of NONE => print "NONE\n"
| _ => print str
end
but it also produces errors:
stdIn:1.6-1.10 Error: can't find function arguments in clause
stdIn:4.9-6.33 Error: case object and rules don't agree [tycon mismatch]
rule domain: 'Z option
object: string
in expression:
(case str
of NONE => print "NONE\n"
| _ => print str)
This answer was pretty much given in the next-most recent question tagged sml: How to read string from user keyboard in SML language? -- you can just replace the user keyboard with stdin, since stdin is how you interact with the keyboard using a terminal.
So you have two problems with this code:
fun main =
let val str = valOf (TextIO.inputLine TextIO.stdIn)
in
case str
of NONE => print "NONE\n"
| _ => print str
end
One problem is that a function defined with fun has to take an argument, e.g. fun main () = .... The () part does not represent "nothing" but rather exactly one thing, being the unit value.
The other problem is eagerness. The Option.valOf function raises the Option exception when there is no value, and it does this before you reach the case-of, making the case-of rather pointless. (After valOf, str is also a plain string rather than a string option, which is why the case-of does not type-check.) So what you can do instead is:
fun main () =
  case TextIO.inputLine TextIO.stdIn of
    SOME s => print s
  | NONE => print "NONE\n"
Using the standard library this can be shortened to:
fun main () =
  print (Option.getOpt (TextIO.inputLine TextIO.stdIn, "NONE\n"))
I encourage you to read How to read string from user keyboard in SML language?

How to parse a string into a code structure with TemplateHaskell?

Right now, I have the following piece of code in my project:
embedNarration :: String -> Q Exp
embedNarration file =
  let text = unsafePerformIO $ readFile file
      parsedMaybe = parseNarration text
      succ x = case x of
        Left x -> throw $ ErrorCall x
        Right x' -> x'
      parsed = succ parsedMaybe
      res = do parser <- [|((fromRight undefined) . parseNarration)|]
               d <- return $ seq parsed text
               return $ AppE parser $! LitE $! StringL d
  in res
It is jury-rigged from the source code of the Data.FileEmbed module. The intention of the code is to generate a Narration (a data structure defined in my code) from a resource file.
Right now this quite ugly piece of code, which I don't fully understand, tries to parse the resource file. It throws a compile-time error if the parse is unsuccessful; if the parse is successful, it embeds the following piece into the source code:
((fromRight undefined) . parseNarration $ "THE ENTIRE RESOURCE FILE")
where parseNarration is a function of type String -> Either String Narration.
The problem here is the double parsing: the resource file is parsed once at compile time to ensure it's valid, and then a second time at runtime from a string literal. Ideally, instead of a string literal and a call to the parser, I want Template Haskell to substitute a Narration value directly, so that the parser is only needed at compile time. But I have no idea how to do this. Surface-level guides to Template Haskell and further attempts at jury-rigging the code haven't been successful. Is it possible? If yes, how?

Changing the State of Lexing.lexbuf

I am writing a lexer for Brainfuck with Ocamllex, and to implement its loop, I need to change the state of lexbuf so that it can return to a previous position in the stream.
Background info on Brainfuck (skippable)
In Brainfuck, a loop is written as a pair of square brackets with
the following rules:
[ -> proceed and evaluate the next token
] -> if the current cell's value is not 0, return to the matching [
Thus, the following code evaluates to 15:
+++ [ > +++++ < - ] > .
It reads:
In the first cell, assign 3 (increment 3 times)
Enter the loop, move to the next cell
Add 5 (increment 5 times)
Move back to the first cell, and subtract 1 from its value
Hit the closing square bracket; the current (first) cell now equals 2, so jump back to [ and go through the loop again
Keep going until the first cell equals 0, then exit the loop
Move to the second cell and output its value with .
The value in the second cell will have been incremented to 15
(incremented by 5, 3 times).
Problem:
Basically, I wrote two functions in the header section of my brainfuck.mll file to take care of pushing and popping the position of the last [, namely push_curr_p and pop_last_p, which push and pop the lexbuf's current position to an int list ref named loopstack:
{ (* Header *)
  let tape = Array.make 100 0
  let tape_pos = ref 0
  let loopstack = ref []
  let push_curr_p (lexbuf: Lexing.lexbuf) =
    let p = lexbuf.Lexing.lex_curr_p in
    let curr_pos = p.Lexing.pos_cnum in
    (* Saving / pushing the position of `[` to loopstack *)
    ( loopstack := curr_pos :: !loopstack
    ; lexbuf
    )
  let pop_last_p (lexbuf: Lexing.lexbuf) =
    match !loopstack with
    | [] -> lexbuf
    | hd :: tl ->
      (* This is where I attempt to bring lexbuf back *)
      ( lexbuf.Lexing.lex_curr_p <- { lexbuf.Lexing.lex_curr_p with Lexing.pos_cnum = hd }
      ; loopstack := tl
      ; lexbuf
      )
}
{ (* Rules *)
  rule brainfuck = parse
  | '[' { brainfuck (push_curr_p lexbuf) }
  | ']' { (* current cell's value must be 0 to exit the loop *)
          if tape.(!tape_pos) = 0
          then brainfuck lexbuf
          (* this needs to bring lexbuf back to the previous `[`
           * and proceed with the parsing
           *)
          else brainfuck (pop_last_p lexbuf)
        }
  (* ... other rules ... *)
}
The other rules work just fine, but it seems to ignore [ and ]. The problem is obviously at the loopstack and how I get and set lex_curr_p state. Would appreciate any leads.
lex_curr_p is meant to keep track of the current position, so that you can use it in error messages and the like. Setting it to a new value won't make the lexer actually seek back to an earlier position in the file. In fact I'm 99% sure that you can't make the lexer loop like that no matter what you do.
So you can't use ocamllex to implement the whole interpreter like you're trying to do. What you can do (and what ocamllex is designed to do) is to translate the input stream of characters into a stream of tokens.
In other languages that means translating a character stream like var xyz = /* comment */ 123 into a token stream like VAR, ID("xyz"), EQ, INT(123). So lexing helps in three ways: it finds where one token ends and the next begins, it categorizes tokens into different types (identifiers vs. keywords etc.) and discards tokens you don't need (white space and comments). This can simplify further processing a lot.
Lexing Brainfuck is a lot less helpful as all Brainfuck tokens only consist of a single character anyway. So finding out where each token ends and the next begins is a no-op and finding out the type of the token just means comparing the character against '[', '+' etc. So the only useful thing a Brainfuck lexer does is to discard whitespace and comments.
So what your lexer would do is turn the input [,[+-. lala comment ]>] into something like LOOP_START, IN, LOOP_START, INC, DEC, OUT, LOOP_END, MOVE_RIGHT, LOOP_END, where LOOP_START etc. belong to a discriminated union that you (or your parser generator if you use one) defined and made the lexer output.
If you want to use a parser generator, you'd define the token types in the parser's grammar and make the lexer produce values of those types. Then the parser can just parse the token stream.
If you want to do the parsing by hand, you'd call the lexer's token function by hand in a loop to get all the tokens. In order to implement loops, you'd have to store the already-consumed tokens somewhere to be able to loop back. In the end it'd end up being more work than just reading the input into a string, but for a learning exercise I suppose that doesn't matter.
That said, I would recommend going all the way and using a parser generator to create an AST. That way you don't have to create a buffer of tokens for looping and having an AST actually saves you some work (you no longer need a stack to keep track of which [ belongs to which ]).
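To make the lex / parse / evaluate split concrete, here is a minimal sketch in plain OCaml. It is not ocamllex output: the lexer is hand-written over a string, and every name in it (token, instr, tokenize, parse, run) is made up for illustration. The 100-cell tape is taken from the question; wrapping cells at 256 and reading 0 at end of input are my assumptions.

```ocaml
(* Hypothetical token type: one constructor per Brainfuck command. *)
type token =
  | INC | DEC | MOVE_LEFT | MOVE_RIGHT | IN | OUT | LOOP_START | LOOP_END

(* Hypothetical AST: a loop owns its body, so no bracket stack is needed later. *)
type instr =
  | Inc | Dec | MoveLeft | MoveRight | Input | Output
  | Loop of instr list

(* Lexing: keep the eight command characters, discard everything else. *)
let tokenize (src : string) : token list =
  let tok = function
    | '+' -> Some INC | '-' -> Some DEC
    | '<' -> Some MOVE_LEFT | '>' -> Some MOVE_RIGHT
    | ',' -> Some IN | '.' -> Some OUT
    | '[' -> Some LOOP_START | ']' -> Some LOOP_END
    | _ -> None
  in
  List.filter_map tok (List.of_seq (String.to_seq src))

(* Parsing: returns the instructions read so far and the unconsumed tokens.
   Stops without consuming when it sees a LOOP_END. *)
let rec parse_seq (toks : token list) : instr list * token list =
  match toks with
  | [] -> ([], [])
  | LOOP_END :: _ -> ([], toks)
  | LOOP_START :: rest ->
    let body, after = parse_seq rest in
    (match after with
     | LOOP_END :: rest' ->
       let tail, remaining = parse_seq rest' in
       (Loop body :: tail, remaining)
     | _ -> failwith "unmatched [")
  | t :: rest ->
    let i =
      match t with
      | INC -> Inc | DEC -> Dec
      | MOVE_LEFT -> MoveLeft | MOVE_RIGHT -> MoveRight
      | IN -> Input | OUT -> Output
      | LOOP_START | LOOP_END -> assert false (* handled above *)
    in
    let tail, remaining = parse_seq rest in
    (i :: tail, remaining)

let parse (toks : token list) : instr list =
  match parse_seq toks with
  | prog, [] -> prog
  | _, _ -> failwith "unmatched ]"

(* Evaluation: walk the AST; output is collected into a string. *)
let run ?(input = "") (prog : instr list) : string =
  let tape = Array.make 100 0 in
  let pos = ref 0 in
  let next_in = ref 0 in
  let out = Buffer.create 16 in
  let rec exec = function
    | Inc -> tape.(!pos) <- (tape.(!pos) + 1) land 255
    | Dec -> tape.(!pos) <- (tape.(!pos) - 1) land 255
    | MoveLeft -> decr pos
    | MoveRight -> incr pos
    | Input ->
      if !next_in < String.length input then begin
        tape.(!pos) <- Char.code input.[!next_in];
        incr next_in
      end else tape.(!pos) <- 0
    | Output -> Buffer.add_char out (Char.chr tape.(!pos))
    | Loop body ->
      while tape.(!pos) <> 0 do List.iter exec body done
  in
  List.iter exec prog;
  Buffer.contents out
```

With this, run (parse (tokenize "+++ [ > +++++ < - ] > .")) yields a one-character string whose character code is 15. If you use ocamllex, it would take over the job of tokenize, and a parser generator would replace parse_seq; the looping itself lives entirely in run.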

How to run program in OCaml toplevel with input from file?

I know that in order to load a program in OCaml one has to type #use "source_code_file.ml" in toplevel where source_code_file.ml is the file we want to use.
My program reads input from stdin. On the command line I have a text file that I redirect to act as stdin. Can I do this in the toplevel? I would like to do this because in the toplevel I can easily see what types variables have and whether things are initialized with the correct values.
If you're on a Unix-like system you can use Unix.dup2 to do almost any kind of input redirection. Here is a function with_stdin that takes an input file name, a function, and a value. It calls the function with standard input redirected from the named file.
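(* Needs the Unix library; in the plain toplevel, load it first with #load "unix.cma" *)
let with_stdin fname f x =
  (* Save the current stdin so it can be restored afterwards *)
  let oldstdin = Unix.dup Unix.stdin in
  let newstdin = Unix.openfile fname [Unix.O_RDONLY] 0 in
  (* Make file descriptor 0 refer to the file *)
  Unix.dup2 newstdin Unix.stdin;
  Unix.close newstdin;
  let res = f x in
  (* Restore the original stdin *)
  Unix.dup2 oldstdin Unix.stdin;
  Unix.close oldstdin;
  res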
If your function doesn't consume the entire input the leftover input will confuse the toplevel. Here's an example that does consume its entire input:
# let rec linecount c =
    try ignore (read_line ()); linecount (c + 1)
    with End_of_file -> c;;
val linecount : int -> int = <fun>
# with_stdin "/etc/passwd" linecount 0;;
- : int = 86
#
This technique is too simple if you want to interleave interactions with the toplevel with calls to your function that consume just part of the input. I suspect that would make things too complicated to be worth the effort. It would be much easier (and perhaps better overall) to rewrite your code to work with an explicitly specified input channel.
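As a rough sketch of that last suggestion, here is the line counter from above rewritten to take its input channel as a parameter. The names linecount_from and with_input_file are made up for this example, and Fun.protect assumes OCaml 4.08 or later.

```ocaml
(* Count the lines available on an explicitly supplied input channel. *)
let linecount_from (ic : in_channel) : int =
  let rec go c =
    match input_line ic with
    | _ -> go (c + 1)
    | exception End_of_file -> c
  in
  go 0

(* Hypothetical helper: open a file, pass the channel to f, and close the
   channel afterwards even if f raises. *)
let with_input_file (fname : string) (f : in_channel -> 'a) : 'a =
  let ic = open_in fname in
  Fun.protect ~finally:(fun () -> close_in ic) (fun () -> f ic)
```

In the toplevel you would then call something like with_input_file "input.txt" linecount_from, while the compiled program keeps its old behaviour by calling linecount_from stdin. The toplevel's own stdin is never touched, so leftover input cannot confuse it.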

Haskell - Concat a list of strings

I'm trying to create a list of strings using some recursion.
Basically, I want to take part of a string up to a certain point, create a list from that, and then process the rest of the string through recursion.
type DocName = FilePath
type Line = (Int,String)
type Document = [Line]
splitLines :: String -> Document
splitLines [] = []
splitLines str | length str == 0 = []
               | otherwise = zip [0..(length listStr)] listStr
  where
    listStr = [getLine] ++ splitLines getRest
    getLine = (takeWhile (/='\n') str)
    getRest = (dropWhile (=='\n') (dropWhile (/='\n') str))
That's what I've got. But it just concats the strings back together, since they are lists of characters themselves. I want to create a list of strings instead:
["test","123"] if the input was "test\n123\n"
Thanks
If you try to compile your code, you'll get an error message telling you that in the line
listStr = [getLine] ++ splitLines getRest
splitLines getRest has type Document, but it should have type [String]. This is easy enough to understand, since [getLine] is a list of strings (well a list of one string) and so it can only be concatenated with another list of strings, not a list of int-string-tuples.
So to fix this we can use map to replace each int-string-tuple in the Document with only the string to get a list of strings, i.e.:
listStr = [getLine] ++ map snd (splitLines getRest)
After changing the line to the above your code will compile and run just fine.
You wrote: "But it just concats the strings back together since they are list of characters themselves."
I'm not sure why you think that.
The reason your code did not compile was because of the type of splitLines as I explained above. Once you fix that error, the code behaves exactly as you want it to, returning a list of integer-string-tuples. At no point are strings concatenated.
Well, if you wrote this just to practice recursion, then it is fine once you fix the error mentioned by sepp2k. But in real code, I would prefer:
splitLines str = zip [0..] (lines str)
Or even
splitLines = zip [0..] . lines