Receiving Stdlib.Scanf.Scan_failure : character '\\n' - ocaml

I downloaded and executed utop as guided here, and I ran the following code:
Scanf.scanf "%d %d" (fun a b -> Printf.printf "%d\n" (a - b));;
The first time I entered 3 1, it worked fine, printing 2 and returning - : unit = (),
but on the second try with the same input, it keeps giving this message:
Exception:
Stdlib.Scanf.Scan_failure
"scanf: bad input at char number 3: character '\\n' is not a decimal digit".

Scanf consumes as little input as possible.
If you evaluate
Scanf.scanf "%d %d" (fun a b -> Printf.printf "%d\n" (a - b))
and send to the standard input
3 1\n
Scanf reads and consumes the 3 1 prefix and leaves the newline character \n in the input buffer.
Then the next call to
Scanf.scanf "%d %d" (fun a b -> Printf.printf "%d\n" (a - b))
will be stuck on this remaining character and fail with
Stdlib.Scanf.Scan_failure
"scanf: bad input at char number 3: character '\\n' is not a decimal digit".
In this situation, you can consume this \n character with either
Scanf.scanf "\n" ();
Scanf.scanf "%d %d" (-);;
or
Scanf.scanf "\n%d %d" (-);;
However, a better solution is probably to add a newline to your input format:
Scanf.scanf "%d %d\n" (fun a b -> Printf.printf "%d\n" (a - b))
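The behaviour described above can be reproduced without a terminal by scanning an in-memory buffer. The following is a minimal sketch, not part of the original answer: the helper names diff, naive and skipping are made up for illustration, and the leading-space format " %d %d" is an additional variant that relies on the documented rule that a space in a Scanf format matches any amount of whitespace, newlines included.

```ocaml
(* Receiver shared by both scanners. *)
let diff a b = a - b

(* Scan two pairs with "%d %d": the second scan meets the leftover '\n'. *)
let naive input =
  let ib = Scanf.Scanning.from_string input in
  let first = Scanf.bscanf ib "%d %d" diff in
  let second =
    try Some (Scanf.bscanf ib "%d %d" diff)
    with Scanf.Scan_failure _ -> None
  in
  (first, second)

(* Same input, but a leading space in the format skips pending whitespace. *)
let skipping input =
  let ib = Scanf.Scanning.from_string input in
  let first = Scanf.bscanf ib " %d %d" diff in
  let second = Scanf.bscanf ib " %d %d" diff in
  (first, second)
```

Here naive "3 1\n5 2\n" is expected to report a failure for the second pair, while skipping reads both pairs.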

To extend on the existing answer, I think it is better if the scanner doesn't expect a newline at the beginning ("\n%d %d") or the end ("%d %d\n") of the pattern, because typically you read data from a stream that doesn't start with a newline and may not end with a newline (e.g. end of file).
I'd suggest splitting the input into lines and scanning them individually. Also, it is preferable to define smaller functions that do one thing each instead of trying to mix everything in a single big function.
For example, first let's define a function that scans two integers:
# let scan_couple str =
    Scanf.sscanf str "%d %d" (fun a b -> (a, b));;
val scan_couple : string -> int * int = <fun>
It works as follows:
# scan_couple "42 69";;
- : int * int = (42, 69)
Nice, now let's define a function that scans the next couple from a channel, assuming couples are entries separated by newlines:
# let scan_next_couple c =
    match In_channel.input_line c with
    | None -> None
    | Some line -> Some (scan_couple line);;
val scan_next_couple : In_channel.t -> (int * int) option = <fun>
Hopefully each definition is a bit simpler to understand.
You still have to handle exceptions if an entry does not match the scanner format, etc. You may want to handle all the possible exceptions listed in the Scanf manual:
let scan_next_couple c =
  match In_channel.input_line c with
  | None -> None
  | Some line ->
    try Some (scan_couple line) with
    | Scanf.Scan_failure _
    | Failure _
    | End_of_file
    | Invalid_argument _ -> None
But then you cannot distinguish between an end of file and a single line that is malformed, which can be a problem. Depending on how much effort you want to spend on it, you can be more or less robust to errors here (e.g. maybe wrap option values in a result type, or define another type).
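One way to keep the two cases apart is sketched below, assuming a result value nested inside the option: None still means end of file, while a malformed line becomes an Error carrying the offending text. The type scan_error and the function scan_couple_result are hypothetical names introduced only for this illustration.

```ocaml
(* Hypothetical error type: remembers the line that failed to parse. *)
type scan_error = Malformed of string

let scan_couple str =
  Scanf.sscanf str "%d %d" (fun a b -> (a, b))

(* Turn the exceptions raised by scanning into an Error value. *)
let scan_couple_result line =
  try Ok (scan_couple line)
  with
  | Scanf.Scan_failure _
  | Failure _
  | End_of_file
  | Invalid_argument _ -> Error (Malformed line)

(* None: end of file. Some (Error _): bad line. Some (Ok _): a couple. *)
let scan_next_couple c =
  Option.map scan_couple_result (In_channel.input_line c)
```

A caller can then decide whether to skip a bad line, report it, or stop, which the option-only version does not allow.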

Related

Create a list reading a file with SML

I'm trying to create a List reading a text file, for example I have a text file like this "1 5 12 9 2 6" and I want to create a list like this [1,5,12,9,2,6] using SML
You can divide this task into several sub-problems:
Reading a file into a string can be done with
type filepath = string
(* filepath -> string *)
fun readFile filePath =
    let val fd = TextIO.openIn filePath
        val s = TextIO.inputAll fd
        val _ = TextIO.closeIn fd
    in s end
See the TextIO library.
Converting a string into a list of strings separated by whitespace can be done with
(* string -> string list *)
fun split s =
String.tokens Char.isSpace s
See the String.tokens function.
Converting a list of strings into a list of integers can be done with
(* 'a option list -> 'a list option *)
fun sequence (SOME x :: rest) = Option.map (fn xs => x :: xs) (sequence rest)
  | sequence (NONE :: _) = NONE
  | sequence [] = SOME []
fun convert ss = sequence (List.map Int.fromString ss)
Since any one string-to-integer conversion with Int.fromString may fail and produce a NONE, List.map Int.fromString will produce an "int option list" rather than an "int list". This list of "int option" may be converted to an optional "int list", i.e., remove the SOME of all the "int option", but if there's a single NONE, the entire result is discarded and becomes NONE. This gives the final type "int list option" (either NONE or SOME [1,2,...]).
See the Option.map function which was useful for this kind of recursion.
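For readers coming from the OCaml questions on this page, the same sequencing step can be sketched in OCaml. This is only a comparison and not part of the SML answer. It assumes the standard library function int_of_string_opt, whose parsing rules differ from those of Int.fromString (for instance, it rejects "123a" instead of returning 123).

```ocaml
(* 'a option list -> 'a list option: None as soon as one element is None. *)
let rec sequence = function
  | [] -> Some []
  | None :: _ -> None
  | Some x :: rest -> Option.map (fun xs -> x :: xs) (sequence rest)

(* string list -> int list option *)
let convert ss = sequence (List.map int_of_string_opt ss)
```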
Combining these,
(* filepath -> int list option *)
fun readIntegers filePath =
    convert (split (readFile filePath))
This approach does yield some potentially unwanted behavior:
Filesystem errors will make readIntegers throw an Io exception
The string ~5 inside the file will be interpreted as negative five
The string -5 will produce a failure (NONE)
The string 123a will produce the number 123 (Int.fromString is a bit too forgiving)
You may want to address those.

How to read lists of numbers from a file using string formats in OCaml

I want to get the list of numbers present in a file in a specific format. But I did not get any format (like %s %d) for list of numbers.
My file contains text as follows:
[1;2] [2] 5
[45;37] [9] 33
[3] [2;4] 1000
I tried the following
value split_input str fmt = Scanf.sscanf str fmt (fun x y z -> (x,y,z));
value rec read_file chin acc fmt =
  try let line = input_line chin in
    let (a,b,c) = split_input line fmt in
    let acc = List.append acc [(a,b,c)] in
    read_file chin acc fmt
  with
    [ End_of_file -> do { close_in chin; acc}
    ];
value read_list =
  let chin = open_in "filepath/filename" in
  read_file chin [] "%s %s %d";
The problem is with the format that is specified towards the end. I used the same code for getting data from some other file, where the data was in the format (string * string * int).
To reuse the same code I have to receive the above text in string and then split according to my requirement. My question is: is there a format like %s %d for a list of integers, so that I get the list directly from the file instead of writing another code to convert string to list.
There is no built-in specifier for lists in Scanf. It is possible to use the %r specifier to delegate parsing to a custom scanner, but Scanf is not really designed for parsing complex formats:
let int_list b = Scanf.bscanf b "[%s@]" (fun s ->
    List.map int_of_string @@ String.split_on_char ';' s
  )
Then with this int_list parser, we can write
let test = Scanf.sscanf "[1;2]@[3;4]" "%r@%r" int_list int_list (@)
and obtain
val test : int list = [1; 2; 3; 4]
as expected. But at the same time, it was easier to use String.split_on_char to do the splitting. In general, parsing complicated formats is better done with a regexp library, a parser combinator library or a parser generator.
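For the three-column file in the question, a Scanf-free version built on String.split_on_char might look like the sketch below. The names int_list_of_string and parse_line are hypothetical, and the code assumes that columns are separated by exactly one space and that no spaces appear inside the brackets. A non-numeric element makes int_of_string raise Failure.

```ocaml
(* Parse "[1;2]" into [1; 2]; "[]" gives the empty list. *)
let int_list_of_string s =
  let n = String.length s in
  if n < 2 || s.[0] <> '[' || s.[n - 1] <> ']' then
    invalid_arg ("int_list_of_string: " ^ s)
  else
    match String.sub s 1 (n - 2) with
    | "" -> []
    | body -> List.map int_of_string (String.split_on_char ';' body)

(* Parse one line such as "[1;2] [2] 5" into (int list * int list * int). *)
let parse_line line =
  match String.split_on_char ' ' line with
  | [a; b; c] -> (int_list_of_string a, int_list_of_string b, int_of_string c)
  | _ -> invalid_arg ("parse_line: " ^ line)
```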
P.S: you should probably avoid the revised syntax, it has fallen into disuse.

How to easily read lines from stdin?

Some time ago, I decided to solve a simple task on HackerRank but using OCaml and Core, in order to learn them. In one of the tasks, I'm supposed to read data from standard input:
The first line contains an integer, denoting the number of entries
in the phone book. Each of the subsequent lines describes an entry in
the form of space-separated values on a single line. The first value
is a friend's name, and the second value is an -digit phone number.
After the lines of phone book entries, there are an unknown number of
lines of queries. Each line (query) contains a name to look up, and you
must continue reading lines until there is no more input.
The main issues:
I don't know how many lines there will be
The last line doesn't end with a newline, so I can't just read with scanf "%s\n" until End_of_file
And my code became messy:
open Core.Std
open Printf
open Scanf
let read_numbers n =
  let phone_book = String.Table.create () ~size:n in
  for i = 0 to (n - 1) do
    match In_channel.input_line stdin with
    | Some line -> (
        match (String.split line ~on:' ') with
        | key :: data :: _ -> Hashtbl.set phone_book ~key ~data
        | _ -> failwith "This shouldn't happen"
      )
    | None -> failwith "This shouldn't happen"
  done;
  phone_book
let () =
  let rec loop phone_book =
    match In_channel.input_line stdin with
    | Some line -> (
        let s = match Hashtbl.find phone_book line with
          | Some number -> sprintf "%s=%s" line number
          | None -> "Not found"
        in
        printf "%s\n%!" s;
        loop phone_book
      )
    | None -> ()
  in
  match In_channel.input_line stdin with
  | Some n -> (
      let phone_book = read_numbers (int_of_string n) in
      loop phone_book
    )
  | None -> failwith "This shouldn't happen"
If I solve this task in Python, then code looks like this:
n = int(input())
book = dict([tuple(input().split(' ')) for _ in range(n)])
while True:
    try:
        name = input()
    except EOFError:
        break
    else:
        if name in book:
            print('{}={}'.format(name, book[name]))
        else:
            print('Not found')
This is shorter and clearer than the OCaml code. Any advice on how to improve my OCaml code? There are two important things: first, I don't want to abandon OCaml, I just want to learn it; second, I want to use Core for the same reason.
The direct implementation of the Python code in OCaml would look like this:
let exec name =
  In_channel.(with_file name ~f:input_lines) |> function
  | [] -> invalid_arg "Got empty file"
  | x :: xs ->
    let es,qs = List.split_n xs (Int.of_string x) in
    let es = List.map es ~f:(fun entry -> match String.split ~on:' ' entry with
        | [name; phone] -> name,phone
        | _ -> invalid_arg "bad entry format") in
    List.iter qs ~f:(fun name ->
        match List.Assoc.find es name with
        | None -> printf "Not found\n"
        | Some phone -> printf "%s=%s\n" name phone)
However, OCaml is not a scripting language for writing small scripts and one-shot prototypes. It is a language for writing real software that must be readable, supportable, testable, and maintainable. That's why we have types, modules, and all the stuff. So, if I were writing a production-quality program that is responsible for working with such input, it would look very different.
The general style that I personally employ, when I'm writing a program in a functional language is to follow these two simple rules:
When in doubt use more types.
Have fun (lots of fun).
I.e., allocate a type for each concept in the program domain, and use lots of small functions.
The following code is twice as big, but is more readable, maintainable, and robust.
So, first of all, let's type: the entry is simply a record. I used a string type to represent a phone for simplicity.
type entry = {
name : string;
phone : string;
}
The query is not specified in the task, so let's just stub it with a string:
type query = Q of string
Now our parser state. We have three possible states: the Start state, a state Entry n, where we're parsing entries with n entries left so far, and Query state, when we're parsing queries.
type state =
| Start
| Entry of int
| Query
Now we need to write a function for each state, but first of all, let's define an error handling policy. For a simple program, I would suggest just to fail on a parser error. We will call a function named expect when our expectations fail:
let expect what got =
failwithf "Parser error: expected %s got %s\n" what got ()
Now the three parsing functions:
let parse_query s = Q s
let parse_entry line = match String.split ~on:' ' line with
  | [name;phone] -> {name;phone}
  | _ -> expect "<name> <phone>" line
let parse_expected s =
  try int_of_string s with _ ->
    expect "<number-of-entries>" s
Now let's write the parser:
let parse (es,qs,state) input = match state with
  | Start -> es,qs,Entry (parse_expected input)
  (* all entries are consumed, so this line is already the first query *)
  | Entry 0 -> es,parse_query input :: qs,Query
  | Entry n -> parse_entry input :: es,qs,Entry (n-1)
  | Query -> es,parse_query input :: qs,Query
And finally, let's read data from file:
let of_file name =
  let es,qs,state =
    In_channel.with_file name ~f:(fun ch ->
        In_channel.fold_lines ch ~init:([],[],Start) ~f:parse) in
  match state with
  (* the lists were accumulated in reverse, so restore the input order *)
  | Entry 0 | Query -> List.rev es, List.rev qs
  | Start -> expect "<number-of-entries><br>..." "<empty>"
  | Entry n -> expect (sprintf "%d entries" n) "fewer"
We also check that our state machine reached a proper finish state, that is it is either in Query or Entry 0 state.
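To try the state machine in a toplevel without Core or a file on disk, it can be run over an in-memory list of lines. The following is a sketch using only the standard library, not the answer's own code: step and of_lines are hypothetical names, failwith with Printf.sprintf stands in for Core's failwithf, and the sample names and numbers used to exercise it are invented.

```ocaml
type entry = { name : string; phone : string }
type query = Q of string
type state = Start | Entry of int | Query

let expect what got =
  failwith (Printf.sprintf "Parser error: expected %s got %s" what got)

let parse_entry line =
  match String.split_on_char ' ' line with
  | [name; phone] -> { name; phone }
  | _ -> expect "<name> <phone>" line

let parse_expected s =
  match int_of_string_opt s with
  | Some n -> n
  | None -> expect "<number-of-entries>" s

(* One transition: consume a single input line in the current state. *)
let step (es, qs, state) input =
  match state with
  | Start -> (es, qs, Entry (parse_expected input))
  | Entry n when n > 0 -> (parse_entry input :: es, qs, Entry (n - 1))
  | Entry _ | Query -> (es, Q input :: qs, Query)

(* Fold the transitions over all lines, then check the final state. *)
let of_lines lines =
  let es, qs, state = List.fold_left step ([], [], Start) lines in
  match state with
  | Entry 0 | Query -> (List.rev es, List.rev qs)
  | Start -> expect "<number-of-entries>" "<empty>"
  | Entry n -> expect (Printf.sprintf "%d more entries" n) "end of input"
```

For example, of_lines ["2"; "sam 99912222"; "tom 11122222"; "sam"; "harry"] yields two entries followed by two queries, in input order.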
As in Python, the key to a concise implementation is to let the standard library do most of the work; the following code uses Sequence.fold in lieu of Python's list comprehension. Also, using Pervasives.input_line rather than In_channel.input_line allows you to cut down on extraneous pattern matching (it will report an end of file condition as an exception rather than a None result).
open Core.Std
module Dict = Map.Make(String)
let n = int_of_string (input_line stdin)
let d = Sequence.fold
    (Sequence.range 0 n)
    ~init:Dict.empty
    ~f:(fun d _ -> let line = input_line stdin in
         Scanf.sscanf line "%s %s" (fun k v -> Dict.add d ~key:k ~data:v))
let () =
  try while true do
      let name = input_line stdin in
      match Dict.find d name with
      | Some number -> Printf.printf "%s=%s\n" name number
      | None -> Printf.printf "Not found.\n"
    done with End_of_file -> ()
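For comparison, the same program can be sketched with only the standard library (OCaml 4.14 or later for In_channel). This is an illustration rather than either answer's code: read_book, answer, answer_queries and run are hypothetical names, and entries are assumed to be separated by exactly one space. In_channel.input_line returns the final line even when it has no trailing newline, which covers the concern raised in the question.

```ocaml
(* Read n "name phone" lines into a hash table. *)
let read_book ic n =
  let book = Hashtbl.create (max n 1) in
  for _i = 1 to n do
    match In_channel.input_line ic with
    | Some line ->
      (match String.split_on_char ' ' line with
       | [name; phone] -> Hashtbl.replace book name phone
       | _ -> failwith ("bad entry: " ^ line))
    | None -> failwith "unexpected end of input"
  done;
  book

(* Format the reply for one query. *)
let answer book name =
  match Hashtbl.find_opt book name with
  | Some number -> Printf.sprintf "%s=%s" name number
  | None -> "Not found"

(* Answer queries until the channel is exhausted. *)
let rec answer_queries ic book =
  match In_channel.input_line ic with
  | None -> ()
  | Some name ->
    print_endline (answer book name);
    answer_queries ic book

let run ic =
  match Option.bind (In_channel.input_line ic) int_of_string_opt with
  | None -> failwith "expected the number of entries"
  | Some n ->
    let book = read_book ic n in
    answer_queries ic book
```

Calling run stdin would process the standard input in the way the task describes.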

Reading all characters in OCaml is too slow

I'm a beginner with OCaml and I want to read lines from a file and then examine all characters in each line.
As a dummy example, let's say we want to count the occurrences of the character 'A' in a file.
I tried the following
open Core.Std
let count_a acc string =
  let rec count_help res stream =
    match Stream.peek stream with
    | None -> res
    | Some char -> Stream.junk stream; if char = 'A' then count_help (res+1) stream else count_help res stream
  in acc + count_help 0 (Stream.of_string string)
let count_a = In_channel.fold_lines stdin ~init:0 ~f:count_a
let () = print_string ((string_of_int count_a)^"\n")
I compile it with
ocamlfind ocamlc -linkpkg -thread -package core -o solution solution.ml
run it with
$./solution < huge_file.txt
on a file with one million lines, which gives me the following times
real 0m16.337s
user 0m16.302s
sys 0m0.027s
which is 4 times more than my Python implementation. I'm fairly sure that it should be possible to make this go faster, but how should I go about doing this?
To count the number of A chars in a string you can just use the String.count function. Indeed, the simplest solution will be:
open Core.Std
let () =
  In_channel.input_all stdin |>
  String.count ~f:(fun c -> c = 'A') |>
  printf "we have %d A's\n"
update
A slightly more complicated (and less memory-hungry) solution, with [fold_lines], will look like this:
let () =
  In_channel.fold_lines stdin ~init:0 ~f:(fun n s ->
      n + String.count ~f:(fun c -> c = 'A') s) |>
  printf "we have %d A's\n"
Indeed, it is slower than the previous one. It takes 7.3 seconds on my 8-year-old laptop to count 'A' in a 20-megabyte text file, versus 3 seconds for the former solution.
Also, you can find this post interesting, I hope.
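The line-by-line approach can also be written without Core or Stream. The sketch below uses only the standard library (OCaml 4.14 or later for In_channel); count_char and count_in_channel are hypothetical names, and no timing is claimed for it. Separately, the command in the question uses ocamlc, which produces bytecode; native compilation with ocamlopt is typically faster.

```ocaml
(* Count the occurrences of [c] in [s] with a single pass over the string. *)
let count_char c s =
  let n = ref 0 in
  String.iter (fun x -> if x = c then incr n) s;
  !n

(* Sum the per-line counts until the channel is exhausted. *)
let count_in_channel c ic =
  let rec loop acc =
    match In_channel.input_line ic with
    | None -> acc
    | Some line -> loop (acc + count_char c line)
  in
  loop 0
```

Printing count_in_channel 'A' stdin would then report the total for the standard input.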

OCaml error: wrong type of expression in constructor

I have a function save that takes standard input, which is used individually like this:
./try < input.txt (* save function is in try file *)
input.txt
2
3
10 29 23
22 14 9
Now I have put the function into another file called path.ml, which is part of my interpreter. I have a problem defining the type of the Save constructor. This is because the save function works on an in_channel, but when I write
type term = Save of in_channel
ocamlc complains about the parameter in the command function.
How can I fix this error? This is the reason why, in my last question posted on Stack Overflow, I asked for a way to express a variable that accepts any type. I understand the answers, but they don't actually help much in getting the code running.
This is my code:
(* Data types *)
open Printf
type term = Print_line_in_file of int*string
| Print of string
| Save of in_channel (* error here *)
;;
let input_line_opt ic =
  try Some (input_line ic)
  with End_of_file -> None
let nth_line n filename =
  let ic = open_in filename in
  let rec aux i =
    match input_line_opt ic with
    | Some line ->
      if i = n then begin
        close_in ic;
        (line)
      end else aux (succ i)
    | None ->
      close_in ic;
      failwith "end of file reached"
  in
  aux 1
(* get all lines *)
let k = ref 1
let first = ref ""
let second = ref ""
let sequence = ref []
let append_item lst a = lst @ [a]
let save () =
  try
    while true do
      let line = input_line stdin in
      if k = ref 1
      then
        begin
          first := line;
          incr k;
        end else
      if k = ref 2
      then
        begin
          second := line;
          incr k;
        end else
        begin
          sequence := append_item !sequence line;
          incr k;
        end
    done;
    None
  with
    End_of_file -> None;;
let rec command term = match term with
  | Print (n) -> print_endline n
  | Print_line_in_file (n, f) -> print_endline (nth_line n f)
  | Save () -> save ()
;;
EDIT
Error in code:
Save of in_channel:
Error: This pattern matches values of type unit
but a pattern was expected which matches values of type in_channel
Save of unit:
Error: This expression has type 'a option
but an expression was expected of type unit
There are many errors in this code, so it's hard to know where to start.
One problem is this: your save function has type unit -> 'a option. So it's not the same type as the other branches of your final match. The fix is straightforward: save should return (), not None. In OCaml these are completely different things.
The immediate problem seems to be that you have Save () in your match, but have declared Save as taking an input channel. Your current code doesn't have any way to pass the input channel to the save function, but if it did, you would want something more like this in your match:
| Save ch -> save ch
Errors like this suggest (to me) that you're not so familiar with OCaml's type system. It would probably save you a lot of trouble if you went through a tutorial of some kind before writing much more code. You can find tutorials at http://ocaml.org.
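To make the two points above concrete, here is a reduced sketch in which every branch of command has type unit and the channel is passed through the Save constructor. It is not the asker's full interpreter: the Print_line_in_file case is omitted, and the stored reference and the list-returning save are assumptions made for this illustration.

```ocaml
type term =
  | Print of string
  | Save of in_channel

(* Read every remaining line of the channel, in order. *)
let save ic =
  let rec loop acc =
    match input_line ic with
    | line -> loop (line :: acc)
    | exception End_of_file -> List.rev acc
  in
  loop []

(* Hypothetical place to keep the lines that were read. *)
let stored : string list ref = ref []

(* Both branches return unit, so the match is well typed. *)
let command term =
  match term with
  | Print s -> print_endline s
  | Save ch -> stored := save ch
```

With this shape, command (Save stdin) reads the standard input, and any other open channel can be passed in the same way.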