Scanning a file as a string - OCaml

I have built a function that takes a string as input and outputs a string.
Let's call it f.
I would like to read the contents of a file input.txt into a string, apply my function to this string, and write the result to another file output.txt.
Another question: if the file is too big, reading it all at once may be impossible. So I also have a function f_line, and I would like to read input.txt line by line, apply this function to each line, and write each result to output.txt.
How can I do that?

You basically want to map a file with your function to another file, much like you map lists, e.g.,
# List.map String.uppercase_ascii ["hello"; "world"];;
- : string list = ["HELLO"; "WORLD"]
In OCaml, files are read and written via an abstraction called a channel. Channels have directions, i.e., input channels are distinguished from output channels. To open an input channel use the open_in function; to close it, use close_in. The corresponding functions for output channels have the _out suffix (open_out, close_out).
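The answer below concentrates on the line-by-line case. For the first case in the question (applying f to the whole file as one string), the same channel functions suffice. Here is a minimal sketch, assuming the file fits in memory; map_whole_file is a hypothetical name, not a library function. It reads the entire contents with in_channel_length and really_input_string, and uses Fun.protect (OCaml 4.08 or later) so the channels are closed even if f raises.

```ocaml
(* Read all of [input] into a single string, apply [f], and write the
   result to [output]. Binary mode is used so that in_channel_length
   matches the number of bytes actually read on every platform. *)
let map_whole_file input output (f : string -> string) =
  let ic = open_in_bin input in
  let contents =
    Fun.protect ~finally:(fun () -> close_in_noerr ic) (fun () ->
      really_input_string ic (in_channel_length ic))
  in
  let oc = open_out_bin output in
  Fun.protect ~finally:(fun () -> close_out_noerr oc) (fun () ->
    output_string oc (f contents);
    (* close_out flushes; closing again in [finally] does nothing *)
    close_out oc)
```

Usage would be map_whole_file "input.txt" "output.txt" f. Because the input is read completely and closed before the output file is opened, this version does not destroy the input even when both names are the same.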
To map two channels line by line, we need to read a line from one channel, apply our transformation f to each line and write to the output channel, until the first channel raises the End_of_file exception that indicates that there is no more data, e.g.,
let rec map_channels input output f =
  match f (input_line input) with
  | exception End_of_file -> flush output
  | r ->
    output_string output r;
    output_char output '\n';
    map_channels input output f
Now we can use this function to write a function that takes filenames, instead of channels, e.g.,
let map_files input output f =
  if input = output
  then invalid_arg "the input and output files must differ";
  let input = open_in input in
  let output = open_out output in
  map_channels input output f;
  close_in input;
  close_out output
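Notice that we check that the input and output files differ, to prevent mapping a file onto itself, which would corrupt it: open_out truncates the file before a single line has been read. The check compares file names only, so two different paths to the same file would still slip through.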
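One limitation of map_files is that if f (or a read or write) raises an exception other than End_of_file, the channels are never closed. A possible variant, assuming OCaml 4.08 or later for Fun.protect, wraps each channel in a cleanup handler. map_files_safe is a hypothetical name, and map_channels is repeated so the block compiles on its own.

```ocaml
(* Copy of map_channels: apply [f] to each line of [input] and write
   the results, one per line, to [output]. *)
let rec map_channels input output f =
  match f (input_line input) with
  | exception End_of_file -> flush output
  | r ->
    output_string output r;
    output_char output '\n';
    map_channels input output f

(* Like map_files, but both channels are closed even if [f] or an I/O
   operation raises. *)
let map_files_safe input output f =
  if input = output then invalid_arg "the input and output files must differ";
  let ic = open_in input in
  Fun.protect ~finally:(fun () -> close_in_noerr ic) (fun () ->
    let oc = open_out output in
    Fun.protect ~finally:(fun () -> close_out_noerr oc) (fun () ->
      map_channels ic oc f;
      close_out oc))
```

Usage mirrors map_files, e.g. map_files_safe "input.txt" "output.txt" f_line.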

I've finally found an easy solution with the following code:
let transform_files_by_line
    (f_line : string -> string) (in_filename : string)
    (out_filename : string) =
  let input_chan = open_in in_filename
  and output_chan = open_out out_filename
  in
  let rec transform_rec () =
    let str = input_line input_chan in
    output_string output_chan (f_line str);
    (* input_line strips the newline, so write it back *)
    output_char output_chan '\n';
    transform_rec ()
  in
  try transform_rec () with
  | End_of_file ->
    close_in input_chan;
    close_out output_chan;;

Related

How to process elements of an OCaml list?

I want to process the data in the file "persons.txt".
But I have tried everything to process all the lines of the text file.
The only way I can process the data is by creating the list manually.
let myList = ["John";"23"]
I want the program to iterate through all the lines of the text file.
I have managed to get all the content of the text file into a list, but I can't seem to move on from that stage.
My way of thinking is:
1. Read content from the text file
2. Convert it to an OCaml list
3. Separate the list into sublists
4. Iterate through the sublists
5. Only print to screen the text respecting the conditions
Can you please guide me?
Thanks!!
open Printf
(* FILE CONTENTS *)
(*
John;23;
Mary;16;
Anne;21;
*)
let file = "data/persons.txt"
;;
(* READ FROM EXTERNAL FILE *)
let read_lines name : string list =
  if Sys.file_exists (name) then
    begin
      let ic = open_in name in
      try
        let try_read () =
          try Some (input_line ic) with End_of_file -> None in
        let rec loop acc = match try_read () with
          | Some s -> loop (s :: acc)
          | None -> close_in_noerr ic; List.rev acc in
        loop []
      with e ->
        close_in_noerr ic;
        []
    end
  else
    []
;;
(...)
Your question is not at all clear. Here are some observations:
First, your read_lines function doesn't return the input in the form you need.
What read_lines returns looks like this:
["John;23;"; "Mary;16;"; "Anne;21;"]
But what you want is something more like this:
[("John", "23"); ("Mary", "16"); ("Anne", "21")]
The key here is to split the strings into pieces using ; as a separator. You can probably use String.split_on_char for this.
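A minimal sketch of that split, assuming each line has the form name;age; as in the question's comment; parse_person and parse_people are hypothetical names. Because of the trailing ;, String.split_on_char returns an extra empty field, which the pattern simply ignores; lines with fewer than two fields are dropped.

```ocaml
(* "John;23;" splits into ["John"; "23"; ""]; keep the first two fields. *)
let parse_person line =
  match String.split_on_char ';' line with
  | name :: age :: _ -> Some (name, age)
  | _ -> None

(* Convert every line, silently skipping malformed ones. *)
let parse_people lines = List.filter_map parse_person lines
```

With the question's read_lines, parse_people (read_lines file) would give the list of (name, age) pairs.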
Second, you are not defining a function that calculates an answer from parameters. Instead, your calculation is based on global variables, which won't generalize.
Instead of saying this:
let adult_check_condition =
... using global age and name ...
You need to define a function:
let adult_check_condition age name =
... use parameters age and name ...
Then you can call this function with different ages and names.
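As an illustration, here is a sketch of such a parameterized function together with code that applies it to every pair. The question's actual condition is not shown, so the age >= 18 rule below is an assumption, and adults is a hypothetical helper name.

```ocaml
(* Hypothetical rule (not from the question): a person is an adult when
   age >= 18. Returns the name of adults and None for everyone else. *)
let adult_check_condition age name =
  if age >= 18 then Some name else None

(* Apply the check to every (name, age) pair, skipping ages that are
   not valid integers. *)
let adults people =
  List.filter_map
    (fun (name, age) ->
      match int_of_string_opt age with
      | Some a -> adult_check_condition a name
      | None -> None)
    people
```

For example, List.iter print_endline (adults [("John", "23"); ("Mary", "16"); ("Anne", "21")]) prints John and Anne.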

Read a file line by line and store every line read in a single list

I'm a student and I've been given an exercise I've been struggling with for about a month or so.
I'm trying to write a function in OCaml. This function must read a text file which has one word per line, and it must store all the words in a list.
But the problem is that this program must be recursive (which means no loops, no "while").
All I've been able to do so far is to create a function which reads the text file (pretty much like the BASH command "cat")
let dico filename =
  let f = open_in filename in
  let rec dico_rec () =
    try
      print_string (input_line f);
      print_newline ();
      dico_rec ();
    with End_of_file -> close_in f
  in dico_rec () ;;
I just don't know how to do it. OCaml is hardly my favourite language.
Here's an alternate definition of build_list that is tail recursive. You can use it instead of @MitchellGouzenko's definition if your inputs can have many lines.
(* Call as: build_list ic [] *)
let rec build_list ic l =
  match input_line ic with
  | line -> build_list ic (line :: l)
  | exception End_of_file -> close_in ic; List.rev l
open Printf
let file = "example.dat"
let () =
  let ic = open_in file in
  let rec build_list infile =
    try
      let line = input_line infile in
      line :: build_list infile
    with End_of_file ->
      close_in infile;
      [] in
  let rec print_list = function
    | [] -> ()
    | e :: l -> print_string e ; print_string " " ; print_list l in
  print_list (build_list ic)
Edit: The algorithm I previously proposed was unnecessarily complicated. Try to understand this one instead.
To understand this recursion, we assume that build_list works correctly. That is, assume build_list correctly takes an open file as an argument and returns a list of lines in the file.
Now, let's look at the function's body. It reads a line from the file and calls build_list again. If there are N lines in the file, calling build_list again should return a list of the remaining N-1 lines in the file (since we just read the first line ourselves). We put the line we just read at the front of the list returned from build_list and return the resulting list, which has all N lines.
The recursion continues until it hits the base case. The base case is when there's an End_of_file. In this case we return an empty list.

Read a large file into string lines OCaml

I am basically trying to read a large file (around 10 GB) into a list of lines. The file contains a sequence of integers, like this:
0x123456
0x123123
0x123123
.....
By default my codebase uses the method below to read files, but it turns out to be quite slow (~12 minutes) in this scenario:
let lines_from_file (filename : string) : string list =
  let lines = ref [] in
  let chan = open_in filename in
  try
    while true do
      lines := input_line chan :: !lines
    done; []
  with End_of_file ->
    close_in chan;
    List.rev !lines;;
I guess I need to read the whole file into memory and then split it into lines (I am using a 128 GB server, so memory should not be a problem). But after searching the documentation, I still don't know whether OCaml provides such a facility.
So here is my question:
Given my situation, how can I read a file into a string list quickly?
How about using streams? That would require adjusting the related application code, which could take some time.
First of all, you should consider whether you really need to have all the information in memory at once. Maybe it is better to process the file line by line?
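Processing line by line can be expressed as a fold over the lines, so that only the accumulator stays in memory. This is a sketch, not a standard-library function (fold_lines is a hypothetical name), and it assumes OCaml 4.08 or later for Fun.protect.

```ocaml
(* Fold [f] over the lines of [filename] without keeping them in memory.
   The channel is closed even if [f] raises. *)
let fold_lines f init filename =
  let ic = open_in filename in
  Fun.protect ~finally:(fun () -> close_in_noerr ic) (fun () ->
    let rec loop acc =
      match input_line ic with
      | line -> loop (f acc line)   (* tail call: constant stack *)
      | exception End_of_file -> acc
    in
    loop init)
```

For instance, fold_lines (fun acc line -> acc + int_of_string line) 0 "data.txt" would sum the numbers without ever building a list ("data.txt" is a placeholder; int_of_string accepts hexadecimal literals such as 0x123456).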
If you really want to have it all in memory at once, then you can use Bigarray's map_file function (moved to Unix.map_file in OCaml 4.06) to map the file into memory as an array of characters, and then process it.
Also, as I see, this file contains numbers. Maybe it is better to allocate an array (or even better, a bigarray), then process each line in order and store the integers in the (big)array.
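A sketch of that idea with an ordinary int array (a Bigarray would work similarly): each line is converted with int_of_string as it is read, and the array doubles its capacity when it fills up. int_array_from_file and the initial capacity of 1024 are assumptions for illustration; blank lines are skipped and a malformed line raises Failure. It assumes OCaml 4.08 or later for Fun.protect.

```ocaml
(* Read one integer per line (int_of_string accepts "0x123456") into a
   growable array, then trim it to the number of values read. *)
let int_array_from_file filename =
  let ic = open_in filename in
  Fun.protect ~finally:(fun () -> close_in_noerr ic) (fun () ->
    let data = ref (Array.make 1024 0) in
    let len = ref 0 in
    let rec loop () =
      match input_line ic with
      | exception End_of_file -> ()
      | line ->
        let line = String.trim line in
        if line <> "" then begin
          (* Double the capacity when the array is full. *)
          if !len = Array.length !data then begin
            let bigger = Array.make (2 * !len) 0 in
            Array.blit !data 0 bigger 0 !len;
            data := bigger
          end;
          (!data).(!len) <- int_of_string line;
          incr len
        end;
        loop ()
    in
    loop ();
    Array.sub !data 0 !len)
```

An unboxed int array avoids allocating a string and a list cell per line, at the cost of a temporary copy each time the array grows.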
I often use the following two functions to read the lines of a file. Note that the function lines_from_files is tail-recursive.
(* Shadows Stdlib.read_line; returns None at end of file *)
let read_line i = try Some (input_line i) with End_of_file -> None
let lines_from_files filename =
  let i = open_in filename in
  let rec lines_from_files_aux i acc = match read_line i with
    | None -> close_in i; List.rev acc
    | Some s -> lines_from_files_aux i (s :: acc) in
  lines_from_files_aux i []
let () =
  lines_from_files "foo"
  |> List.iter (Printf.printf "lines = %s\n")
This should work:
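(* Not tail-recursive: each line adds a stack frame. Only End_of_file
   ends the list; a malformed line raises Failure instead of silently
   truncating the result. *)
let rec ints_from_file fdesc =
  try
    let l = input_line fdesc in
    let l' = int_of_string l in
    l' :: ints_from_file fdesc
  with End_of_file -> []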
This solution converts the strings to integers as they are read (which should be a bit more memory efficient; I assume this conversion was going to be done eventually).
Also, because the function recurses on the channel, the file must be opened (and later closed) by the caller.

copying files in SML

I am trying to learn input/output in SML. In an effort to copy the strings of ls that are the same as s1 into the file l2, I did the following. I am getting some errors I cannot really understand. Can someone help me out?
fun test(l2:string,ls:string list,s1:string) = if (String.isSubstring(s1 hd(ls))) then
(TextIO.openOut l2; TextIO.inputLine hd(ls))::test(l2,tl(ls),s1) else
test(l2,tl(ls),s1);
Here are some general hints:
Name your variables something meaningful, like filename, lines and line.
The function TextIO.inputLine takes as argument a value of type instream.
When you write TextIO.inputLine hd(ls), it is actually interpreted as (TextIO.inputLine hd) ls, which means "treat hd as if it were an instream and try and read a line from it, take that line and treat it as if it were a function, and apply it on ls", which is of course complete nonsense.
The proper parenthesising in this case would be TextIO.inputLine (hd ls), which still does not make sense, since we decided that ls is a string list, and so hd ls will be a string and not an instream.
Here is something that resembles what you want to do, but opposite:
(* Open a file, read each line from file and return those that contain mySubstr *)
fun test (filename, mySubstr) =
    let val instr = TextIO.openIn filename
        fun loop () = case TextIO.inputLine instr of
                          SOME line => if String.isSubstring mySubstr line
                                       then line :: loop () else loop ()
                        | NONE => []
        val lines = loop ()
        val _ = TextIO.closeIn instr
    in lines end
You need to use TextIO.openOut and TextIO.output instead. TextIO.inputLine reads from files; it does not write to them.

return values from file - OCaml

I am trying to read a file and return the element read from the file as an input to another function.
How can I return a value when I am reading from the file??
I tried everything I am aware of and am still hopelessly lost.
My code is as follows:
let file = "code.txt";;
let oc = open_out file in (* create or truncate file, return channel *)
fprintf oc "%s\n" (play); (* write code to file returned from calling (play) function *)
close_out oc ;;
(*read from file*)
let read l =
  let f x =
    let ic = open_in file in
    let line = input_line ic in (* read line from in_channel and discard \n *)
    print_endline line; (* write the result to stdout *)
    ((x ^ line) :: l);
    flush stdout;
    close_in ic ;
  in
  f l
;;
At the prompt, the function call read;; outputs:
- : unit = ()
My file contains a string which is a code needed as input for another function.
Please help. I am not sure where I am going wrong.
Thank you.
If multiple expressions are sequenced together using ; the value of the whole expression is the value of the last expression in the sequence.
So if you have something like ((x ^ line) :: l); close_in ic the value of that expression is the value of close_in ic, which is ().
Obviously that's not what you want. In order to make ((x ^ line) :: l) the result of the whole expression, you should place it after close_in ic.
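Applied to the question, the fix is to make the value you want the last expression of the sequence. A minimal sketch, assuming the code you need is on the first line of the file; read_first_line is a hypothetical name:

```ocaml
(* Return the first line of [filename]. The sequence ends with [line],
   so that is the value of the whole function. *)
let read_first_line filename =
  let ic = open_in filename in
  let line = input_line ic in  (* raises End_of_file if the file is empty *)
  close_in ic;
  line
```

The result of read_first_line file can then be passed directly to the other function that needs the code.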